Deineko A. A., Shumak A. D.
Kharkiv National University of Radio Electronics
Pattern recognition using deep convolutional neural
network
As the amount of digital
information and the performance of computer systems grow, the problem of
processing and subsequently using all the accumulated data becomes ever more
pressing. One type of information processing is pattern recognition, a branch of
machine learning that focuses on detecting patterns and regularities in data
and is used as a method of classifying and identifying objects, phenomena, and
processes. One of its subcategories is image recognition. Today, the problem of
correctly recognizing huge numbers of images arises in many areas, from text
recognition for digitization to identifying the faces of criminals captured by
video surveillance systems. Current
approaches to object recognition make essential use of machine learning
methods. To improve their performance,
we can collect larger datasets, use more powerful models, and introduce better
techniques for preventing overfitting. If we have to learn about thousands of
objects from millions of images, we really need a model with a large learning
capacity. Unfortunately, the immense complexity of the object recognition task
means that this problem cannot be specified even by a dataset as large as
ImageNet [2], so our model also needs a great deal of prior knowledge to
compensate for all the data we do not have. To solve problems of this kind, it
is necessary to use well-designed and trained systems, and convolutional neural
networks (CNNs) [1] constitute one such class of models.
Human effectiveness in
visual perception far exceeds the performance of modern computer vision
systems. Therefore, the idea of convolutional neural networks is to exploit
certain features of the visual cortex, in which so-called simple cells, which
react to straight lines at different angles, and complex cells, whose activation
is associated with a certain set of simple cells, were discovered. The work of a
convolutional neural network can be interpreted as a transition from specific
image features to more abstract details, and further to still more abstract
details, up to concepts of the highest level. In the process, the network tunes
and adjusts its weights, producing the necessary hierarchy of abstract
attributes, the so-called sequence of feature maps, cutting off unimportant
details and highlighting the most essential ones. Looking at the architecture
of different neural networks, we can highlight one of the main features of
CNNs: compared to standard feedforward neural networks with similarly sized
layers, CNNs have far fewer connections and parameters and so are easier to
train, while their theoretically best performance is likely to be only slightly
worse.
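The saving from weight sharing can be made concrete with a small back-of-the-envelope calculation. The layer sizes below (a 32x32 single-channel input, 16 filters of size 3x3) are illustrative assumptions, not taken from the text:

```python
# Parameter count: a small convolutional layer vs. a fully connected
# layer over the same 32x32 single-channel input (sizes are assumed).

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    # Each output channel owns one kernel per input channel plus a bias;
    # the same kernel is reused at every spatial position (weight sharing).
    return out_channels * (kernel_h * kernel_w * in_channels + 1)

def dense_params(in_features, out_features):
    # Every input unit connects to every output unit, plus one bias each.
    return out_features * (in_features + 1)

conv = conv_params(3, 3, 1, 16)         # 16 filters of size 3x3
dense = dense_params(32 * 32, 32 * 32)  # 1024 inputs -> 1024 outputs

print(conv)   # 160
print(dense)  # 1049600
```

Even in this toy setting the convolutional layer uses several thousand times fewer parameters, which is the source of the easier training mentioned above.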
Considering the architecture
of CNNs in detail, it is possible to distinguish the following stages of image
processing, applied one after another:
- the convolution operation: the process of applying a small matrix of weights
(a filter) that is moved along the entire processed layer, producing after each
shift an activation signal for the neuron of the next layer at the same
position;
- the subsampling operation, which performs a dimensionality reduction of the
generated sequences of feature maps.
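Both operations can be sketched in a few lines of NumPy. This is a minimal illustration, not an efficient implementation; as in most CNN libraries, the "convolution" below is technically a cross-correlation (the kernel is not flipped), and the subsampling is max-pooling over non-overlapping blocks:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the sum of elementwise products at each position (no kernel flip)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(feature_map, size=2):
    """Max-pooling subsampling: keep the maximum of each
    non-overlapping size x size block."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % size, :w - w % size]
    return cropped.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                    # toy 2x2 filter
fmap = convolve2d(image, kernel)                    # 3x3 feature map
pooled = subsample(fmap, 2)                         # reduced to 1x1
```

The shrinking spatial size after each pooling step is exactly the dimension reduction described above.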
The network consists of a
large number of layers. After the initial layer, which receives the input
image, the signal passes through a series of convolutional layers, in which
convolution and subsampling operations alternate. Alternating the layers allows
us to compose feature maps from previous feature maps, whereby on each
subsequent layer the map decreases in size while the number of channels
increases. In practice, this means the ability to recognize complex hierarchies
of object features. In most cases, after passing through several layers, a
feature map degenerates into a vector or, sometimes, a scalar, while the number
of feature maps grows to hundreds or thousands, depending on the size and
detail of the image. At the output of the convolutional part of the network,
the final feature maps are fed to the input of a multilayer perceptron, i.e. a
fully connected neural network. In addition, between the convolution and
subsampling operations an extra ReLU [3] layer can be inserted, which applies a
non-saturating activation function. Most implementations of CNNs use the
function f(x) = ReLU(x) = max(0, x), applied elementwise. It increases the
nonlinear properties of the decision function and of the overall network
without affecting the receptive fields of the convolution layer.
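Since ReLU acts elementwise on a feature map, it simply zeroes out negative activations while leaving positive ones unchanged, as a short sketch shows (the sample values are arbitrary):

```python
import numpy as np

def relu(x):
    # Non-saturating activation applied elementwise: f(x) = max(0, x).
    return np.maximum(0.0, x)

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)   # negatives become 0, positives pass through
```

Note that, unlike subsampling, this step does not change the size of the feature map; only the values are transformed.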
In addition to standard CNNs, there are several alternative implementations
that make it possible to use them even on low-performance systems and small
devices such as cell phones. For instance, consider two efficient
approximations to standard CNNs: Binary-Weight-Networks and XNOR-Networks [6].
In Binary-Weight-Networks, the filters are approximated with binary values,
resulting in a 32x memory saving. In XNOR-Networks, both the filters and the
inputs to the convolutional layers are binary, and the convolutions are
approximated using primarily binary operations; this also makes the convolution
operations about 58x faster. When CNNs run on a full desktop system, their
architecture allows the operations to be computed in parallel, for example on a
high-performance GPU [5] with a large number of cores, such as GPUs supporting
NVIDIA CUDA. This yields very good processing efficiency compared with other
neural networks. To get an idea of the accuracy of CNNs compared with other
approaches, we can refer to the table below, which contains ILSVRC 2017
Detection Challenge results.
Table 1 – Detection performance
Team name      | Mean AP  | Approach
---------------+----------+---------
BDAT           | 0.731392 | CNN
DeepView       | 0.593084 | RCNN
NUS-Qihoo_DPNs | 0.656932 | DPN+CNN
KAISTNIA       | 0.61022  | RPN
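The weight binarization behind Binary-Weight-Networks can be sketched directly from the scheme in [6]: each real-valued filter W is approximated as alpha * B, where B = sign(W) holds only the values +1 and -1 (hence one bit per weight instead of a 32-bit float, the 32x saving above) and alpha = mean(|W|) is the scaling factor derived in [6]. The example weights below are arbitrary:

```python
import numpy as np

def binarize_weights(w):
    """Binary-Weight-Network style approximation: W ~= alpha * B,
    with B = sign(W) in {-1, +1} and alpha = mean(|W|) as in [6]."""
    alpha = float(np.mean(np.abs(w)))
    b = np.where(w >= 0, 1.0, -1.0)
    return alpha, b

w = np.array([0.5, -1.0, 0.25, -0.25])   # toy real-valued filter
alpha, b = binarize_weights(w)
approx = alpha * b                        # binary approximation of w
```

At inference time, multiplications by B reduce to sign changes, which is what enables the mostly-binary arithmetic of XNOR-Networks.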
Bibliography:
1. LeCun, Y.; Bengio, Y.; Hinton, G. (2015). "Deep learning". Nature. 521
(7553): 436-444.
2. Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). "ImageNet
classification with deep convolutional neural networks". Advances in Neural
Information Processing Systems. 25: 1097-1105.
3. Simonyan, K.; Zisserman, A. (2014). "Very deep convolutional networks for
large-scale image recognition". arXiv preprint arXiv:1409.1556.
4. Courbariaux, M.; Bengio, Y. (2016). "BinaryNet: Training deep neural
networks with weights and activations constrained to +1 or -1". CoRR.
5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan,
D.; Vanhoucke, V.; Rabinovich, A. (2014). "Going deeper with convolutions".
Computing Research Repository.
6. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. (2016). "XNOR-Net:
ImageNet classification using binary convolutional neural networks".
arXiv:1603.05279.