Object Detection – What does it mean?
Object detection is a computer vision technique for identifying and locating objects in an image or a video. Suppose an image contains a cat and a person. Object detection both classifies these objects and locates them, typically by drawing a bounding box around each detected object. The bounding boxes tell us where the desired objects are in the image.
Object recognition is often confused with image recognition. The difference is that image recognition assigns a label to the whole image, whereas object recognition labels the individual objects within the image.
Object detection identifies both the class of each object (such as a person or a table) and the coordinates of its location in the image. The performance of an object detection algorithm therefore depends on how well it locates objects within an image. Face detection is one example of object detection.
Why Object Detection?
Because it can both recognize and locate objects, object detection is used in many important tasks, such as:
- Counting people in a crowd.
- Self-driving cars.
- Video surveillance.
- Face detection.
- Anomaly detection.
Object Detection Approaches
Object detection algorithms generally follow one of two approaches:
- Machine learning-based approaches first extract hand-crafted features from the image, such as color histograms or edges. Models such as regression models are then built on these features to predict the location of the object.
- Deep learning-based approaches use convolutional neural networks (CNNs) to detect objects end to end, learning the relevant features directly from the training data instead of relying on hand-crafted ones.
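The machine learning-based approach can be sketched in a few lines of NumPy. This is a minimal illustration, assuming per-channel color histograms as the hand-crafted feature and an ordinary least-squares linear regression from features to box coordinates; the function names are hypothetical and not part of any library.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Hand-crafted feature: per-channel intensity histograms of an
    (H, W, C) uint8 image, concatenated and normalized to sum to 1."""
    features = []
    for channel in range(image.shape[2]):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        features.append(hist)
    vector = np.concatenate(features).astype(float)
    return vector / vector.sum()


def _with_bias(features):
    # Append a constant column so the linear model can learn an offset.
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_box_regressor(features, boxes):
    """Least-squares linear regression from feature vectors (n, d)
    to box coordinates (n, 4)."""
    weights, *_ = np.linalg.lstsq(_with_bias(features), boxes, rcond=None)
    return weights


def predict_boxes(weights, features):
    """Predict (n, 4) box coordinates for new feature vectors."""
    return _with_bias(features) @ weights
```

Real systems of this kind typically used richer features (for example edge-based descriptors) and stronger models, but the two-stage structure, extract features and then regress locations, is the same.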
How an Object Detection Method Works
An object detection task can be carried out in the following steps:
- The input image is divided into many small regions, producing a set of candidate boxes that spans the whole image.
- Features are extracted from each region, and the model predicts whether a valid object is present in the box, i.e. whether the box contains the visual features of any target object class.
- Overlapping boxes that detect the same object are combined into a single box.
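The first and last steps above can be sketched in plain Python. This sketch assumes a fixed-size square sliding window for generating candidate boxes and greedy non-maximum suppression (NMS) with an intersection-over-union (IoU) threshold of 0.5 for merging overlaps; the classifier that scores each window (the middle step) is left out, and all names are illustrative.

```python
def sliding_windows(width, height, size, stride):
    """Yield (x1, y1, x2, y2) candidate boxes of side `size` covering the image."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, x + size, y + size)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box in each group of overlapping boxes.
    Returns indices into `boxes`, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping detections of the same object collapse to the higher-scoring one, while a distant box is kept.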
Object detection using TensorFlow
TensorFlow is an open-source machine learning library that has been widely used in applications such as image recognition, voice search and object recognition. It provides both Python and C++ APIs.
- It supports both deep learning and other machine learning algorithms.
- Python serves as the front-end language, while the core runs efficiently in C++.
- Developers use TensorFlow to build computation graphs.
- In such a graph, nodes represent mathematical operations and the connections between them represent the data (tensors).
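The idea of a computation graph can be illustrated with a toy, pure-Python dataflow graph. This is only a conceptual sketch of "nodes are operations, edges carry data"; the `Node` class and `constant` helper are hypothetical and are not TensorFlow's API (in TensorFlow 2, graphs are usually built by decorating a Python function with `tf.function`).

```python
import operator


class Node:
    """A graph node: an operation applied to the outputs of its input nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Data flows along the edges: evaluate the inputs first, then apply op.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    """A source node with no inputs that always produces `value`."""
    return Node(lambda: value)


# Build the graph for (a + b) * c; nothing is computed until evaluate() is called.
a, b, c = constant(2), constant(3), constant(4)
total = Node(operator.add, a, b)
result = Node(operator.mul, total, c)
```

Calling `result.evaluate()` walks the graph from the output back to the constants and returns 20. Separating graph construction from execution is what lets a framework optimize or distribute the computation before running it.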
TensorFlow was developed by the Google Brain team within Google's Machine Intelligence Research organization for research in machine learning and deep neural networks.
An API (Application Programming Interface) saves developers from writing code from scratch by providing a set of common operations. The TensorFlow Object Detection API, a framework built on top of TensorFlow, is used to train object detection models. Its features include:
- Pre-trained models, collectively called the "Model Zoo", are available in the framework.
- The pre-trained models were trained on datasets such as:
- COCO dataset.
- KITTI dataset.
- Open Images dataset.
The TensorFlow Object Detection framework offers models with different architectures and, as a result, different prediction accuracies. The architectures of the pre-trained models include:
1. MobileNet-SSD
SSD (Single Shot Detector) uses a single convolutional network that predicts the locations of the bounding boxes in a single pass. The architecture consists of a base network (MobileNet) followed by several convolutional layers, and the bounding box locations are predicted by operating on the resulting feature maps. Each bounding box carries the following information:
- Offsets of the box relative to its default box, given as center coordinates, width and height (cx, cy, w, h).
- Probabilities for each of the p classes (c1, c2, …, cp).
SSD does not predict box shapes from scratch; instead, it predicts box locations relative to a fixed set of default boxes. For each location on a feature map, k default boxes are used, and the shapes of these k boxes are fixed before training begins.
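A minimal sketch of how SSD-style default boxes and predicted offsets fit together, assuming a square feature map, boxes normalized to [0, 1], one scale per feature map and k = len(aspect_ratios) boxes per cell. The decoding follows the usual center/log-size parameterization; the extra aspect-ratio-1 box and the "variance" scaling used in the original SSD and many implementations are deliberately omitted, and all function names are illustrative.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h) for a fmap_size x fmap_size feature map,
    k = len(aspect_ratios) boxes per cell, all fixed before training."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size  # centre of the feature-map cell
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes


def decode(default_box, offsets):
    """Apply predicted offsets (t_cx, t_cy, t_w, t_h) to a default box."""
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    return (
        d_cx + t_cx * d_w,    # shift the centre by a fraction of the box size
        d_cy + t_cy * d_h,
        d_w * math.exp(t_w),  # width and height are scaled in log space
        d_h * math.exp(t_h),
    )


def softmax(logits):
    """Turn the raw class scores of one box into probabilities c1..cp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With all offsets equal to zero, `decode` returns the default box unchanged, which is why the network only has to learn corrections to shapes that were chosen in advance.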
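The overall loss is computed with the following equation:
$$L = \frac{1}{N}\left(L_{\text{class}} + L_{\text{box}}\right)$$
where $N$ is the number of matched default boxes, $L_{\text{class}}$ is the softmax (confidence) loss, and $L_{\text{box}}$ is the localization error of the matched boxes (a smooth L1 loss in the original SSD formulation, which also allows a weight on this term that is set to 1).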
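The loss formula can be made concrete with a small, dependency-free computation. This is a simplified sketch under stated assumptions: it only covers matched (positive) boxes, uses softmax cross-entropy for the class term and smooth L1 for the box term, and omits the hard negative mining that SSD applies to unmatched boxes. The `ssd_loss` name and the tuple layout of its input are hypothetical.

```python
import math


def softmax_cross_entropy(logits, true_class):
    """Softmax (confidence) loss for one box: -log p(true_class)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


def smooth_l1(pred, target):
    """Smooth L1 localization error summed over (cx, cy, w, h)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def ssd_loss(matched):
    """L = (L_class + L_box) / N over the N matched boxes.

    `matched` is a list of (class_logits, true_class, pred_offsets,
    true_offsets) tuples, one per matched default box. Simplified: no hard
    negative mining and no extra weight on L_box."""
    n = len(matched)
    if n == 0:
        return 0.0  # with no matched boxes, the loss is defined as 0
    l_class = sum(softmax_cross_entropy(lg, c) for lg, c, _, _ in matched)
    l_box = sum(smooth_l1(p, t) for _, _, p, t in matched)
    return (l_class + l_box) / n
```

Dividing by N keeps the loss on a comparable scale regardless of how many default boxes happened to match ground-truth objects in a given image.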
2. MobileNet
MobileNet factorizes a standard convolution into a depthwise convolution and a pointwise (1×1) convolution. This factorization reduces both the amount of computation and the model size.
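The savings from this factorization follow from simple arithmetic. The sketch below counts multiply-adds, assuming a Dk × Dk kernel, M input channels, N output channels and a Df × Df output feature map (the symbol names follow the MobileNet paper's notation; the function names are illustrative).

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution mapping m input
    channels to n output channels over a df x df output feature map."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df):
    """The same mapping factorized into depthwise + 1x1 pointwise convolutions."""
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution mixes the channels
    return depthwise + pointwise
```

Dividing the two costs gives a ratio of 1/N + 1/Dk². For a 3×3 kernel with 32 input channels, 64 output channels and a 112 × 112 output map, the factorized version needs 29,302,784 multiply-adds versus 231,211,008 for the standard convolution, roughly 8 times fewer.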
3. Inception-SSD
Inception-SSD has the same architecture as MobileNet-SSD, except that its base network is the Inception model instead of MobileNet.
4. Faster R-CNN
Faster R-CNN predicts object locations with a region proposal approach. Advances such as SPPnet and Fast R-CNN had already reduced the running time of detection networks. In Faster R-CNN, the input image is fed into a convolutional neural network, which generates a convolutional feature map.
Region proposals are then identified from this feature map (by a Region Proposal Network that shares it) and warped into squares. An RoI (Region of Interest) pooling layer reshapes them to a fixed size so that they can be used as input to a fully connected layer.
From the resulting RoI feature vector, a softmax layer predicts the class of each proposed region, and the network also predicts offset values for its bounding box.
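The role of RoI pooling, turning regions of different sizes into a fixed-size input, can be sketched with NumPy. This minimal version assumes a single-channel feature map, an integer region given in feature-map coordinates as a half-open (x1, y1, x2, y2) box, and max pooling over a grid of roughly equal bins; production implementations handle channels, batches and sub-pixel coordinates differently. The function name is hypothetical.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi  # half-open integer box on the feature map
    region = feature_map[y1:y2, x1:x2]
    if region.size == 0:
        raise ValueError("empty region of interest")
    out_h, out_w = output_size
    h, w = region.shape
    # Bin edges; floor/ceil guarantee every bin covers at least one cell.
    ys = np.linspace(0, h, out_h + 1)
    xs = np.linspace(0, w, out_w + 1)
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r0, r1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            c0, c1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            out[i, j] = region[r0:r1, c0:c1].max()
    return out
```

Whatever the size of the proposed region, the output always has shape `output_size`, so it can be flattened and passed to the same fully connected layer.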
Selection of the Object Detection TensorFlow Model
The right object detection model can be chosen from the TensorFlow API based on the user's specific requirements. If a high-speed model is needed, the single-shot detection (SSD) network is a good option: it is comparatively fast and can process video feeds at a high frame rate (fps).
If higher accuracy is required, Faster R-CNN may be the better choice, as it is more accurate but comparatively slower. Users can therefore explore the available models and pick the one that fits their requirements.
An example of TensorFlow for object detection
Using the TensorFlow API for object detection doesn't require prior knowledge of machine learning or neural networks, since we will mostly use the files provided by the API. The only requirement is knowing the fundamentals of Python.
1. Downloading the TensorFlow models
- The TensorFlow models can be downloaded either with git or manually.
- Downloading with git is one of the easiest ways, but git must already be installed on the system. Once it is, clone the TensorFlow models repository from the terminal with the git clone command.
- To download manually instead, visit the repository's GitHub page, click the green button, and download and extract the zipped files.
- Rename the extracted folder from models-master to models.
- Next, create a virtual environment. A Python virtual environment is an isolated Python environment, so each project can keep its own dependencies without affecting other projects.
- Run the following commands in the Anaconda Prompt. In this example, the virtual environment is named obj_detection:
conda create -n obj_detection -> creates the virtual environment
conda activate obj_detection -> activates the virtual environment
2. Installing dependencies
- Install the dependencies required by the API on the local PC.
- Activate the virtual environment before installing them.
- Install TensorFlow with the following command:
pip install tensorflow
- Older TensorFlow releases needed a separate package on machines with a GPU, installed with the command below. Since TensorFlow 2.1 the tensorflow package itself includes GPU support, and tensorflow-gpu has been deprecated.
pip install tensorflow-gpu
- Install the other dependencies with the following command:
pip install pillow Cython lxml jupyter matplotlib contextlib2 tf_slim
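After installation, a short script run inside the activated environment can confirm that the packages are importable. This is a sketch that only uses the standard library's `importlib.util.find_spec`; the list of import names is an assumption mapped from the pip packages above (for example, pillow is imported as PIL, and jupyter is left out because it is a command-line tool rather than a single module), and the helper name is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust as needed.
REQUIRED_MODULES = ["tensorflow", "PIL", "Cython", "lxml", "matplotlib",
                    "contextlib2", "tf_slim"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result means every listed module can be found; any name it reports points to a package that still needs to be installed in this environment.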
3. Downloading Protocol Buffers (Protobuf)
- Protocol buffers are a mechanism for serializing structured data, similar to XML.
- Download Protobuf (the protoc compiler) from its official release page.
- Extract the files and copy them into the subfolder named "research" in the previously downloaded "models" folder.
- Navigate to the research folder, which now contains the Protobuf files, and run the following command:
protoc object_detection/protos/*.proto --python_out=.
- If the command succeeds, a Python file (named <name>_pb2.py) is generated for each .proto file in the protos folder under object_detection in the models folder.
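To check that the compilation step worked, a short script can compare the .proto files with the generated modules. This sketch assumes protoc's standard Python naming (foo.proto produces foo_pb2.py in the same folder, as with the command above); the helper name and the example path are illustrative.

```python
from pathlib import Path


def missing_pb2_files(protos_dir):
    """List the .proto files in `protos_dir` that have no generated
    <name>_pb2.py module next to them."""
    protos_dir = Path(protos_dir)
    return sorted(
        proto.name
        for proto in protos_dir.glob("*.proto")
        if not (protos_dir / f"{proto.stem}_pb2.py").exists()
    )
```

Running `missing_pb2_files("models/research/object_detection/protos")` should return an empty list when every proto file has been compiled.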
Object detection is widely applied in real-time applications. We have seen that it can be achieved with either machine learning or deep learning algorithms.
To use the TensorFlow Object Detection API, users need prior knowledge of Python programming; only then can object detection methods built on the API be properly understood. Since TensorFlow is an open-source machine learning platform, it also helps to learn the underlying machine learning concepts in order to understand how TensorFlow works and how it is applied.