Modern Information Technologies / 2. Computer Engineering and Programming
Gunawardana R.S.J.
National Technical
University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute",
Ukraine
Modern computer vision
algorithms for microcomputers
First, it is necessary to define the term "computer vision". Computer vision is a way of identifying the objects captured in a particular image or video. Many actions that seem simple to a human being can be hard, or even impossible, for a computer to accomplish.
The most common place where you may encounter computer vision is a supermarket, where a scanner reads the barcodes of goods. Barcodes are designed to be recognized by a computer: as soon as the computer receives the black-and-white image, it understands what is shown in it. The machine processes the image data and extracts numbers, words, names and any other information it has been programmed to understand.
In addition, computer vision is often applied on roads. Cameras are usually mounted above the road to monitor speed; as soon as a car passes the camera, the computer automatically reads its license plate. This is still computer vision, even though it is a narrow and frequently repeated task. License plate recognition is used worldwide.
Now let us compare how humans and computers differ, and what they have in common, when it comes to recognition.
First, let us talk about people. Right after birth, a baby begins to explore the world, looking at things from every possible angle and learning to remember objects. In other words, throughout our lives we train to see every possible object and to recognize it. If we have seen a certain object at least once, we can compare everything we see later with it. For example, if a car is shown to us in profile, we can immediately imagine how it looks from all other sides.
We accumulate an amount of information that cannot be measured in bytes, kilobytes, megabytes, gigabytes or terabytes, and this process never stops for a second: we keep expanding the capabilities of our own neural network and learn to reconstruct a 3D scene from a flat picture. If we are shown a photo of, say, a house, we understand that it is a house, roughly where it is located, and so on. A computer cannot do any of this by itself; to do it, it needs a database of examples, and such databases are still difficult to collect.
The features by which we distinguish objects can be local or global. Take, for example, a picture of a white car in a forest. If you ask a person what is shown in the picture, they will answer "a white car": they rely on local features. Overall the picture is mostly green, it shows a forest and a road, and the car occupies only a small part of it. Nevertheless, our brain perceives the picture, understands that the main object in it is the car, and only then describes the rest. If we search for similar pictures, the main similarity criterion will be the car. Unlike us, a computer cannot pick out and classify objects at a comparable speed, so it is guided by global features, analyzing the whole picture. That is why a search for pictures similar to this one may return a picture that shows a forest, a road, and a white umbrella. You immediately think this is a mistake, because the first picture shows a car and the second one an umbrella. Nevertheless, the computer is guided by global features and looks for pictures with a forest and a road, not just pictures with a car.
This task is called detection. Detection means finding certain objects in an image. It is important to distinguish between detection and recognition: the detector says that there is a car in the picture, while the recognizer determines the car's make, model, color and other attributes. The process that precedes detection is image segmentation.
Image segmentation is the process of splitting a digital image into multiple segments (sets of pixels). Its goal is to simplify further image processing. The result of segmentation is a set of superpixels with associated labels that together completely cover the original picture.
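As a minimal sketch of this idea, the following Python/OpenCV fragment splits an image into foreground and background with a threshold and then labels each connected region, so every pixel receives a segment label. The file name "scene.jpg" is only a placeholder for some input image.

import cv2

image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu's threshold splits the image into foreground and background pixels.
_, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# connectedComponents assigns one label per contiguous region;
# together the labels cover the whole picture.
num_labels, labels = cv2.connectedComponents(binary)
print("Found", num_labels, "segments (including background)")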
Image recognition is a more complex form of detection that combines a detection algorithm with an identification stage. Therefore, recognition allows not only detecting an object but also identifying it.
The next task faced in computer vision is tracking. Tracking is the process of discovering an object or some event in the first frame and identifying it in the following frames. During this process, inter-frame relationships are established; for that, every frame must carry a sequence number or timestamp. Tracking is similar to detection and recognition, except that those two processes do not need to know anything about inter-frame relationships.
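One very simple way to follow an object across frames, shown here only as an illustrative sketch, is to crop the object from the first frame and locate it in every later frame by template matching. The video name "video.mp4" and the initial box coordinates are hypothetical values.

import cv2

cap = cv2.VideoCapture("video.mp4")
ok, first = cap.read()

# Region of the object in the first frame (x, y, width, height) - placeholder values.
x, y, w, h = 100, 80, 60, 40
template = first[y:y + h, x:x + w]

frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_no += 1  # the inter-frame relationship: a sequence number per frame
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    print(f"frame {frame_no}: object near {top_left}, score {score:.2f}")

cap.release()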
Currently, computer vision algorithms are being adopted on small and cheap devices. The computing power of such devices does not allow heavy computations, so we need simple, time-deterministic algorithms.
An example of such an algorithm is the Haar feature-based cascade classifier for object detection. This algorithm was used in the first real-time face detector, and it differs from its competitors mostly in calculation speed: thanks to integral images, each Haar feature can be computed in constant time. The Haar classifier is based on machine learning, so it has to be trained on positive and negative datasets in order to work properly. Only after that can it be used as an object detection algorithm.
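The following short Python/OpenCV sketch illustrates why Haar features are so cheap: once the integral image is built, the sum of any rectangle takes four lookups, regardless of the rectangle size. The input file "face.png" and the rectangle coordinates are placeholders.

import cv2

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

# cv2.integral returns an image one pixel larger in each dimension, where
# integral[y, x] is the sum of all pixels above and to the left of (x, y).
integral = cv2.integral(gray)

def rect_sum(x, y, w, h):
    # Constant-time sum of the rectangle (x, y, w, h) via four corner lookups.
    return int(integral[y + h, x + w] - integral[y, x + w]
               - integral[y + h, x] + integral[y, x])

# A two-rectangle Haar-like feature: the difference between adjacent halves.
left = rect_sum(10, 10, 12, 24)
right = rect_sum(22, 10, 12, 24)
print("Haar feature value:", left - right)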
One of the most popular implementations of computer vision algorithms for microcomputers such as Arduino is OpenCV. The library has been ported to various platforms, including Linux, Windows, macOS, Android and iOS, and it provides bindings for many programming languages, including Python, Java, C and C++. That kind of portability allows computer vision applications to run even on mobile phones.
One of the applications of the Haar cascade is face detection. You can use this feature in OpenCV with the pretrained cascades it ships with. If you prefer to use your own dataset for classifier training, you can train the classifier with OpenCV as well, since it ships with a trainer; in that way, you can train it to detect planes, spaceships and so on.
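A minimal face-detection sketch in Python, using one of the frontal-face Haar cascades bundled with OpenCV, might look as follows. It assumes the pip-installed opencv-python package, where cv2.data.haarcascades points to the bundled cascade files; the input file "photo.jpg" is a placeholder.

import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a list of (x, y, w, h) boxes for detected faces.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("photo_with_faces.jpg", image)
print("Detected", len(faces), "face(s)")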