Modern Information Technologies / 2. Computer Engineering and Programming
Gunawardana R.S.J.
National Technical
University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute",
Ukraine
Modern computer vision
algorithms for microcomputers
First, it is necessary to define the term "computer vision". Computer vision is a way of identifying the objects captured in a particular image or video. Many actions that seem simple to a human being can be hard, or even impossible, for a computer to accomplish.
The most common place where you may encounter computer vision is a supermarket, where a scanner reads the barcodes of goods. Barcodes are designed to be recognized by a computer: as soon as the computer receives the black-and-white image, it understands what is shown in it. The machine processes the image data and extracts numbers, words, names and any other information it has been programmed to understand.
In addition, computer vision is often applied on roads. Cameras are usually mounted above the road to monitor speed; as soon as a car passes the camera, the computer automatically reads its license plate. This is still computer vision, even though it is a narrow and frequently repeated task. License plate recognition is used worldwide.
Now let us compare how humans and computers differ, and what they have in common, when it comes to recognition.
First, let us talk about people. Right after birth, a baby begins to explore the world, looking at things from every possible angle and learning to remember objects. In other words, throughout our lives we train to see every possible object and to recognize it. If we have seen a certain object at least once, we can compare everything we see later with it. For example, if a car is shown to us in profile, we can immediately imagine how it looks from all other sides.
We accumulate an amount of information that cannot be measured in bytes, kilobytes, megabytes, gigabytes or terabytes, and this process never stops for a second: we keep expanding the capabilities of our own neural network and learn to reconstruct a 3D scene from a flat picture. If we are shown a photo of, say, a house, we understand that it is a house, roughly where it is located, and so on. A computer cannot do any of this by itself; to do it, it needs a database of examples, and such databases are still difficult to collect.
The features by which we distinguish objects can be local or global. Take, for example, a picture of a white car in a forest. If you ask a person what is shown in the picture, they will answer "a white car": they rely on local features. Overall the picture is mostly green, it shows a forest and a road, and the car occupies only a small part of it. Nevertheless, our brain perceives the picture, understands that the main object in it is the car, and only then describes the rest. If we search for similar pictures, the main similarity criterion will be the car. Unlike us, a computer cannot pick out and classify objects at a comparable speed, so it is guided by global features, analyzing the whole picture. That is why a search for pictures similar to this one may return a picture that shows a forest, a road, and a white umbrella. You immediately think this is a mistake, because the first picture shows a car and the second one an umbrella. Nevertheless, the computer is guided by global features and looks for pictures with a forest and a road, not just pictures with a car.
This task is called detection. Detection means finding certain objects in an image. It is important to distinguish between detection and recognition: the detector says that there is a car in the picture, while the recognizer determines the car's make, model, color and other attributes. The process that precedes detection is image segmentation.
Image segmentation is the process of splitting a digital image into multiple segments (sets of pixels). Its goal is to simplify further image processing. The result of segmentation is a set of superpixels with associated labels that together completely cover the original picture.
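As a minimal sketch of this idea, the following Python/OpenCV fragment splits an image into foreground and background with a threshold and then labels each connected region, so every pixel receives a segment label. The file name "scene.jpg" is only a placeholder for some input image.

import cv2

image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu's threshold splits the image into foreground and background pixels.
_, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# connectedComponents assigns one label per contiguous region;
# together the labels cover the whole picture.
num_labels, labels = cv2.connectedComponents(binary)
print("Found", num_labels, "segments (including background)")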
Image recognition is a more complex form of detection that combines a detection algorithm with an identification stage. Therefore, recognition allows not only detecting an object but also identifying it.
The next task faced in computer vision is tracking. Tracking is the process of discovering an object or some event in the first frame and identifying it in the following frames. During this process, inter-frame relationships are established; for that, every frame must carry a sequence number or timestamp. Tracking is similar to detection and recognition, except that those two processes do not need to know anything about inter-frame relationships.
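One very simple way to follow an object across frames, shown here only as an illustrative sketch, is to crop the object from the first frame and locate it in every later frame by template matching. The video name "video.mp4" and the initial box coordinates are hypothetical values.

import cv2

cap = cv2.VideoCapture("video.mp4")
ok, first = cap.read()

# Region of the object in the first frame (x, y, width, height) - placeholder values.
x, y, w, h = 100, 80, 60, 40
template = first[y:y + h, x:x + w]

frame_no = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_no += 1  # the inter-frame relationship: a sequence number per frame
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    print(f"frame {frame_no}: object near {top_left}, score {score:.2f}")

cap.release()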
Currently, computer vision algorithms are being adopted on small and cheap devices. The computing power of such devices does not allow heavy computations, so we need simple, time-deterministic algorithms.
An example of such an algorithm is the Haar feature-based cascade classifier for object detection. This algorithm was used in the first real-time face detector, and it differs from its competitors mostly in calculation speed: thanks to integral images, each Haar feature can be computed in constant time. The Haar classifier is based on machine learning, so it has to be trained on positive and negative datasets in order to work properly. Only after that can it be used as an object detection algorithm.
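The following short Python/OpenCV sketch illustrates why Haar features are so cheap: once the integral image is built, the sum of any rectangle takes four lookups, regardless of the rectangle size. The input file "face.png" and the rectangle coordinates are placeholders.

import cv2

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

# cv2.integral returns an image one pixel larger in each dimension, where
# integral[y, x] is the sum of all pixels above and to the left of (x, y).
integral = cv2.integral(gray)

def rect_sum(x, y, w, h):
    # Constant-time sum of the rectangle (x, y, w, h) via four corner lookups.
    return int(integral[y + h, x + w] - integral[y, x + w]
               - integral[y + h, x] + integral[y, x])

# A two-rectangle Haar-like feature: the difference between adjacent halves.
left = rect_sum(10, 10, 12, 24)
right = rect_sum(22, 10, 12, 24)
print("Haar feature value:", left - right)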
One of the most popular implementations of computer vision algorithms for microcomputers such as Arduino is OpenCV. The library has been ported to various platforms, including Linux, Windows, macOS, Android and iOS, and it provides bindings for many programming languages, including Python, Java, C and C++. That kind of portability allows computer vision applications to run even on mobile phones.
One of the applications of the Haar cascade is face detection. You can use this feature in OpenCV with the pretrained cascades it ships with. If you prefer to use your own dataset for classifier training, you can train the classifier with OpenCV as well, since it ships with a trainer; in that way, you can train it to detect planes, spaceships and so on.
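A minimal face-detection sketch in Python, using one of the frontal-face Haar cascades bundled with OpenCV, might look as follows. It assumes the pip-installed opencv-python package, where cv2.data.haarcascades points to the bundled cascade files; the input file "photo.jpg" is a placeholder.

import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a list of (x, y, w, h) boxes for detected faces.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("photo_with_faces.jpg", image)
print("Detected", len(faces), "face(s)")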