Deineko A. A., Shumak A. D.
Kharkiv National University of Radio Electronics
Pattern recognition using deep convolutional neural
network
As the amount of digital
information and the performance of computer systems grow, the problem of
processing and subsequently using all the accumulated data becomes ever more
pressing. One type of information processing is pattern recognition, a branch of
machine learning that focuses on detecting patterns and regularities in data
and is used as a method of classifying and identifying objects, phenomena, and
processes. One of its subcategories is image recognition. Today, the problem of
correctly recognizing huge numbers of images arises in many areas, from text
recognition for digitization to identifying the faces of criminals captured by
video surveillance systems. Current
approaches to object recognition make essential use of machine learning
methods. To improve their performance,
we can collect larger datasets, use more powerful models, and introduce better
techniques for preventing overfitting. If we have to learn about thousands of
objects from millions of images, we really need a model with a large learning
capacity. Unfortunately, the immense complexity of the object recognition task
means that this problem cannot be specified even by a dataset as large as
ImageNet [2], so our model also needs a great deal of prior knowledge to
compensate for all the data we do not have. To solve problems of this kind, it
is necessary to use well-designed and trained systems, and convolutional neural
networks (CNNs) [1] constitute one such class of models.
Human effectiveness in
visual perception far exceeds the performance of modern computer vision
systems. Therefore, the idea of convolutional neural networks is to exploit
certain features of the visual cortex, in which so-called simple cells, which
react to straight lines at different angles, and complex cells, whose activation
is associated with a certain set of simple cells, were discovered. The work of a
convolutional neural network can be interpreted as a transition from specific
image features to more abstract details, and further to still more abstract
details, up to concepts of the highest level. In the process, the network tunes
and adjusts its weights, producing the necessary hierarchy of abstract
attributes, the so-called sequence of feature maps, cutting off unimportant
details and highlighting the most essential ones. Looking at the architecture
of different neural networks, we can highlight one of the main features of
CNNs: compared to standard feedforward neural networks with similarly sized
layers, CNNs have far fewer connections and parameters and so are easier to
train, while their theoretically best performance is likely to be only slightly
worse.
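The saving from weight sharing can be made concrete with a small back-of-the-envelope calculation. The layer sizes below (a 32x32 single-channel input, 16 filters of size 3x3) are illustrative assumptions, not taken from the text:

```python
# Parameter count: a small convolutional layer vs. a fully connected
# layer over the same 32x32 single-channel input (sizes are assumed).

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    # Each output channel owns one kernel per input channel plus a bias;
    # the same kernel is reused at every spatial position (weight sharing).
    return out_channels * (kernel_h * kernel_w * in_channels + 1)

def dense_params(in_features, out_features):
    # Every input unit connects to every output unit, plus one bias each.
    return out_features * (in_features + 1)

conv = conv_params(3, 3, 1, 16)         # 16 filters of size 3x3
dense = dense_params(32 * 32, 32 * 32)  # 1024 inputs -> 1024 outputs

print(conv)   # 160
print(dense)  # 1049600
```

Even in this toy setting the convolutional layer uses several thousand times fewer parameters, which is the source of the easier training mentioned above.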
Considering the architecture
of CNNs in detail, it is possible to distinguish the following stages of image
processing, applied one after another:
- the convolution operation: the process of applying a small matrix of weights
(a filter) that is moved along the entire processed layer, producing after each
shift an activation signal for the neuron of the next layer at the same
position;
- the subsampling operation, which performs a dimensionality reduction of the
generated sequences of feature maps.
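Both operations can be sketched in a few lines of NumPy. This is a minimal illustration, not an efficient implementation; as in most CNN libraries, the "convolution" below is technically a cross-correlation (the kernel is not flipped), and the subsampling is max-pooling over non-overlapping blocks:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the sum of elementwise products at each position (no kernel flip)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(feature_map, size=2):
    """Max-pooling subsampling: keep the maximum of each
    non-overlapping size x size block."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % size, :w - w % size]
    return cropped.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                    # toy 2x2 filter
fmap = convolve2d(image, kernel)                    # 3x3 feature map
pooled = subsample(fmap, 2)                         # reduced to 1x1
```

The shrinking spatial size after each pooling step is exactly the dimension reduction described above.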
The network consists of a
large number of layers. After the initial layer, which receives the input
image, the signal passes through a series of convolutional layers, in which
convolution and subsampling operations alternate. Alternating the layers allows
us to compose feature maps from previous feature maps, whereby on each
subsequent layer the map decreases in size while the number of channels
increases. In practice, this means the ability to recognize complex hierarchies
of object features. In most cases, after passing through several layers, a
feature map degenerates into a vector or, sometimes, a scalar, while the number
of feature maps grows to hundreds or thousands, depending on the size and
detail of the image. At the output of the convolutional part of the network,
the final feature maps are fed to the input of a multilayer perceptron, i.e. a
fully connected neural network. In addition, between the convolution and
subsampling operations an extra ReLU [3] layer can be inserted, which applies a
non-saturating activation function. Most implementations of CNNs use the
function f(x) = ReLU(x) = max(0, x), applied elementwise. It increases the
nonlinear properties of the decision function and of the overall network
without affecting the receptive fields of the convolution layer.
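Since ReLU acts elementwise on a feature map, it simply zeroes out negative activations while leaving positive ones unchanged, as a short sketch shows (the sample values are arbitrary):

```python
import numpy as np

def relu(x):
    # Non-saturating activation applied elementwise: f(x) = max(0, x).
    return np.maximum(0.0, x)

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)   # negatives become 0, positives pass through
```

Note that, unlike subsampling, this step does not change the size of the feature map; only the values are transformed.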
In addition to standard CNNs, there are several alternative implementations
that make it possible to use them even on low-performance systems and small
devices such as cell phones. For instance, consider two efficient
approximations to standard CNNs: Binary-Weight-Networks and XNOR-Networks [6].
In Binary-Weight-Networks, the filters are approximated with binary values,
resulting in a 32x memory saving. In XNOR-Networks, both the filters and the
inputs to the convolutional layers are binary, and the convolutions are
approximated using primarily binary operations; this also makes the convolution
operations about 58x faster. When CNNs run on a full desktop system, their
architecture allows the operations to be computed in parallel, for example on a
high-performance GPU [5] with a large number of cores, such as GPUs supporting
NVIDIA CUDA. This yields very good processing efficiency compared with other
neural networks. To get an idea of the accuracy of CNNs compared with other
approaches, we can refer to the table below, which contains ILSVRC 2017
Detection Challenge results.
Table 1 – Detection performance
Team name      | Mean AP  | Approach
---------------+----------+---------
BDAT           | 0.731392 | CNN
DeepView       | 0.593084 | RCNN
NUS-Qihoo_DPNs | 0.656932 | DPN+CNN
KAISTNIA       | 0.61022  | RPN
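The weight binarization behind Binary-Weight-Networks can be sketched directly from the scheme in [6]: each real-valued filter W is approximated as alpha * B, where B = sign(W) holds only the values +1 and -1 (hence one bit per weight instead of a 32-bit float, the 32x saving above) and alpha = mean(|W|) is the scaling factor derived in [6]. The example weights below are arbitrary:

```python
import numpy as np

def binarize_weights(w):
    """Binary-Weight-Network style approximation: W ~= alpha * B,
    with B = sign(W) in {-1, +1} and alpha = mean(|W|) as in [6]."""
    alpha = float(np.mean(np.abs(w)))
    b = np.where(w >= 0, 1.0, -1.0)
    return alpha, b

w = np.array([0.5, -1.0, 0.25, -0.25])   # toy real-valued filter
alpha, b = binarize_weights(w)
approx = alpha * b                        # binary approximation of w
```

At inference time, multiplications by B reduce to sign changes, which is what enables the mostly-binary arithmetic of XNOR-Networks.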
Bibliography:
1. LeCun, Y.; Bengio, Y.; Hinton, G. (2015). "Deep learning". Nature. 521
(7553): 436-444.
2. Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). "ImageNet
classification with deep convolutional neural networks". Advances in Neural
Information Processing Systems. 25: 1097-1105.
3. Simonyan, K.; Zisserman, A. (2014). "Very deep convolutional networks for
large-scale image recognition". arXiv preprint arXiv:1409.1556.
4. Courbariaux, M.; Bengio, Y. (2016). "BinaryNet: Training deep neural
networks with weights and activations constrained to +1 or -1". CoRR.
5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan,
D.; Vanhoucke, V.; Rabinovich, A. (2014). "Going deeper with convolutions".
Computing Research Repository.
6. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. (2016). "XNOR-Net:
ImageNet classification using binary convolutional neural networks".
arXiv:1603.05279.