Hello Codings: Detecting Objects using Machine Learning I

Detecting Objects using YOLO

This post covers my small project on YOLO. It is a great project which, if linked with an Arduino, will certainly make you win the Google Science Fair. Pardon 😁

It can also localize an object in the image, so if you have lost your specs, this might just find them.

So YOLO stands for "You Only Look Once". Yes, YOLO looks at the image only once. It works by dividing the image into K x K cells.

A bit like this

Fig 1 - Image divided into cells
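To make the grid idea concrete, here is a minimal sketch in Python with NumPy. It is only an illustration of the K x K division, not code taken from YOLO or darkflow. The function names `split_into_cells` and `cell_of_point`, the 448 x 448 image size and K = 7 are all assumptions picked for the example.

```python
import numpy as np

def split_into_cells(image, k):
    """Split an H x W (x channels) array into a k x k grid of cells.

    Assumes H and W are divisible by k; a real pipeline would resize first.
    """
    h, w = image.shape[:2]
    if h % k or w % k:
        raise ValueError("image height and width must be divisible by k")
    ch, cw = h // k, w // k  # height and width of one cell
    return [[image[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
             for c in range(k)]
            for r in range(k)]

def cell_of_point(x, y, width, height, k):
    """Return (row, col) of the grid cell that contains pixel (x, y)."""
    col = min(int(x * k / width), k - 1)
    row = min(int(y * k / height), k - 1)
    return row, col

# Hypothetical blank image, used only to show the shapes involved.
image = np.zeros((448, 448, 3), dtype=np.uint8)
cells = split_into_cells(image, 7)
print(len(cells), len(cells[0]), cells[0][0].shape)
print(cell_of_point(224, 224, 448, 448, 7))
```

The second helper is the more useful one in practice: given the center of an object, it tells you which cell that center falls into.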

Before working with YOLO, have a look at its output. This is what I got when I ran an edited version of YOLO over the above image.

Fig 2 - YOLO Output using classification.

Each of these yellow boxes is called a bounding box in YOLO language. Each cell in Fig 1 generates bounding boxes. A CNN is run over the image to extract features, and each cell is treated individually: if the features in a cell are significant, a bounding box is drawn over that portion of the cell, together with its bounding information, or confidence score.

Remember

The higher the significance, the higher the boundary information, or confidence score.

Boundary information is shown as the thickness of the bounding box. More significant items get a thicker box; less significant items get a thinner one. When the cells are merged, all bounding boxes with approximately the same boundary information are combined into a bigger box called a bounding box group. However, this box does not classify any object; it only provides the significance score. This process continues, and the result is Fig 2, or it can even be Fig 3 below.

Fig 3

Back to work now.
If you are trying to picture the boundary score, refer to the image below.

Fig 4 Confidence Score a.k.a Boundary Information

The bounding boxes with a higher score are used for classification. So first YOLO finds out whether a bounding box is present; second, it predicts the class of whatever is inside the bounding box. The version of YOLO trained on the PASCAL VOC dataset can detect 20 different object classes, such as dogs, people and cars (versions trained on COCO cover 80 classes, including traffic lights).
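The two steps above (keep the confident boxes, then pick a class for each) can be sketched roughly like this. The class list, the 0.5 threshold, the dict layout and the function name `classify_confident_boxes` are all made up for illustration. Multiplying the box confidence by the class probability follows the YOLO paper's class-specific score, but this is not darkflow's actual code.

```python
# Hypothetical class names and threshold, chosen only for illustration.
CLASS_NAMES = ["dog", "person", "car", "traffic light"]

def classify_confident_boxes(boxes, threshold=0.5):
    """Keep boxes whose confidence reaches the threshold and label each one.

    Each box is a dict with "confidence" (float) and "class_probs"
    (one probability per entry in CLASS_NAMES).
    """
    results = []
    for box in boxes:
        if box["confidence"] < threshold:
            continue  # step 1: no convincing box here, skip it
        probs = box["class_probs"]
        best = max(range(len(probs)), key=probs.__getitem__)  # step 2: most likely class
        results.append({
            "label": CLASS_NAMES[best],
            "score": box["confidence"] * probs[best],
        })
    return results

boxes = [
    {"confidence": 0.9, "class_probs": [0.8, 0.1, 0.05, 0.05]},
    {"confidence": 0.2, "class_probs": [0.1, 0.7, 0.1, 0.1]},
]
detections = classify_confident_boxes(boxes)
print(detections)
```

Only the first box survives here, because the second one never clears the confidence threshold.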

Now YOLO combines the results of image classification and marks the bounding box group which contains the complete object. After this, only those boxes are kept whose score is highest, i.e. those that contain the full object, or at least more than 80% of it. The remaining insignificant boxes are removed.

The result is in Fig 2.
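Removing the leftover boxes is commonly done with non-maximum suppression (NMS). I have not named the technique above, so treat this as an assumption about how the pruning works. Here is a minimal pure-Python sketch; the corner-based box format (x1, y1, x2, y2) and the 0.5 overlap threshold are also assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring
    box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two nearly identical boxes around one object, plus a separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.6, 0.8]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The weaker of the two overlapping boxes is thrown away, while the box around the separate object stays.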

Every bounding box has 5 parameters, namely x, y, w, h and its score. x and y are the center of the bounding box within the cell. w and h are the width and height of the bounding box respectively (in the original YOLO formulation these are expressed relative to the whole image, since a box can be larger than one cell). So if we feed an image into TensorFlow, we would get

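a tensor of shape K x K x (B * 5 + C), i.e. K * K * (B * 5 + C) values. B is the number of bounding boxes predicted per cell, C is the total number of classes, and K is the number of cells along each side of the grid.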
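As a quick sanity check on that formula, here is a small NumPy sketch. I am assuming B means the number of boxes predicted per cell, and using K = 7, B = 2, C = 20 (the settings of the original YOLO on PASCAL VOC). The per-cell layout [x, y, w, h, score] for each box followed by the class values is a simplification I chose for readability; darkflow's real output ordering may differ, and `decode_box` is a hypothetical helper.

```python
import numpy as np

K, B, C = 7, 2, 20  # assumed values, see the note above

def output_shape(k, b, c):
    """Shape of the prediction tensor: one (b * 5 + c) vector per cell."""
    return (k, k, b * 5 + c)

def decode_box(pred, row, col, box_index, k):
    """Convert one box to image-relative coordinates in [0, 1].

    Assumes x, y are offsets inside the cell and w, h are already
    relative to the whole image.
    """
    start = box_index * 5
    x, y, w, h, score = pred[row, col, start:start + 5]
    cx = (col + x) / k  # cell column plus offset, scaled to the image
    cy = (row + y) / k
    return float(cx), float(cy), float(w), float(h), float(score)

pred = np.zeros(output_shape(K, B, C), dtype=np.float32)
pred[3, 3, 0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]  # made-up prediction for the middle cell

print(pred.shape, pred.size)
print(decode_box(pred, 3, 3, 0, K))
```

With these numbers every cell carries 2 * 5 + 20 = 30 values, so the whole grid holds 7 * 7 * 30 = 1470 of them.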

To build our own, we need:

A long nap
Microsoft Visual Studio 2015 (click here to download, or here)
OpenCV 3.0
CUDA 8.0 Click here to download

If you already have the 2nd item, I guess you can skip the 1st. Otherwise, start downloading the 2nd item and follow the 1st while you wait.

**Important**

Install Visual Studio in "custom" mode. Then select Visual C++ under programming languages, and also Common Tools.

**Important**

Then clone the following repository.

https://github.com/thtrieu/darkflow.git

Extract the folder into the default Python folder of your OS. My extracted folder was named "Darkflow-masters". Do check yours; it may differ.

Now open Command Prompt as admin and cd to your Scripts folder, which lies inside the Python folder. My Python folder is located at E:\PyPy, so I opened CMD as an admin and cd'ed there. The screenshot below shows how to reach your Scripts folder.