P.h.D. Kryuchin O.V.

Tambov State University named after G.R. Derzhavin

The analytic model of the artificial neural networks training using the parallel gradient method

 

In this paper we will analyze formulas which are used by gradient algorithms (for example the method of the steepest descent, RPROP or QuickProp). We can see that the -th weight at the -th iteration can be calculated by formula

,

(1)

where  is the function for calculating weight value ,  is the step value (for the i-th weight at the I-th iteration) and is the gradient (for its calculation by formulas

(2)

(3)

we need vector only) [1, 2]. It means that if weights at the -th iteration is known that it is possibly to calculate values at the -th iteration and it is not necessary to know other weight.


Fig. 1. The scheme of the weights location in IR-elements.

 

This means that we can divide the weights to number of information resources elements (IR-elements). Such elements can be processes or node of a computer cluster for example. This scheme is shown in picture 1.

If we analyse the information process for the lead and non-lead IR-elements then we can see that the training algorithm consist of few steps [1-3]:

1.     forming the ANN structure (with the weights initialization);

2.     sending the ANN structure to all non-lead IR-elements;

3.     calculating weights vector ;

4.     receiving  values from all IR-elements;

5.     creating vector  and calculating inaccuracy value ;

6.     checking the stop necessity;

7.     sending the stop (or not stop) command to all non-lead IR-elements;

8.     if the training has not stop then the sending weights to all IR-elements and go to the third step.

It was the algorithm for the lead IR-element, non-lead use other:

1.     receiving the ANN structure from the lead IR-element;

2.     calculating weights ;

3.     sending weights  to the lead IR-element;

4.     receiving the stop command;

5.     if the training has not end then the receiving full weights vector and go to the second step [4].

If we will analyze formulas which are used by gradient methods then we can see that the the steepest descent and QuickProp execute 3 multiplicative and 2 additive operations and RPROP method executes three multiplicative and one additive operation [1-2, 4]. We should remember that it is necessary () multiplicative and () additive operations for one gradient element calculation. Thus we can count multiplicative and additive operations for one gradient methods iteration (tab. 1).

 

Table 1. Number of operations for one iteration.

Method

Mult operations number

Add operations number

The steepest descent

QuickProp

RPROP

 

Each parallel iteration of the information process which uses gradient methods consist of few steps:

·                   sending weights  by the lead IR-element;

·                   calculating part of gradient and weights;

·                   sending new weights to the lead IR-element [1].

The first step needs  multiplicative and  additive operations at the lead IR-element (we will symbol it as ) and  at the non-lead (we will symbol it as ). This value consists of  operations for the data sending and  for the waiting. Operations number which are necessary for the second step are shown in table 2. At the third step the non-lead IR-element executes  multiplication and  additive operations. The lead IR-element executes   multiplicative operations ( for the sending and  for the waiting) and  additive operations.

 

Table 2. Number of operations which are executed at the second iteration of information process.

Method

Operations number

Lead IR-element

Non-lead IR-element

multiplicative

additive

multiplicative

additive

The st. descent

QuickProp

RPROP

 

For reduction of addition operations to multiplication operations the coefficient  is used. This coefficient value is directly proportional to the time spent for one addition operation and inversely related to the time spent for one multiplication operation. So one multiplication operation needs the time which is necessary for  addition operations and one addition operation can be changed to  multiplication operations [4].

The first step needs  operations at the lead IR-element and  operations at other. Values of the second step ( and ) is show in table 2. The third step executes  operations at the non-lead IR-element that is why it needs to wait  operations before receiving.

So the parallel training information process executes

(4)

operations. Here information process executes  operations of the parallel algorithms ( operations are executed at the last step and  is other operations). The efficiency of parallel information process can be calculated by formula

So the analitic model can be wrriten as

for the steepest descent and for QuickProp formula

(6)

(7)

for the RPROP formula. Here  is the iterations number,  is the other operations number.

 

Bibliography

1.     Крючин О.В. Разработка параллельных градиентных алгоритмов обучения искусственной нейронной сети // Электронный журнал "Исследовано в России", 096, стр. 1208-1221, 2009 г. // Режим доступа: http://zhurnal.ape.relarn.ru/articles/2009/096.pdf Загл. с экрана.

2.     Крючин, О.В. Разработка параллельных эвристических алгоритмов подбора весовых коэффициентов искусственной нейронной сети / О.В. Крючин // Информатика и ее применение. 2010. Т. 4, Вып. 2. C. 53-56.

3.     Крючин, О.В. Параллельные алгоритмы обучения искусственных нейронных сетей с использованием градиентных методов / О.В. Крючин // Актуальные вопросы современной науки, техники и технологий: матер. II Всерос. науч.-практ. (заоч.) конф. М: 2010. C. 81-86.

4.     Крючин, О.В. Параллельные алгоритмы обучения искусственных нейронных сетей / О.В. Крючин // Информационные технологии и математическое моделирование (ИТММ-2009) : матер. VIII Всерос. науч.-практ. конф. с междунар. участием, 12-13 ноября 2009 года. Томск:, 2009. Ч. 2. С. 241-244.