Ph.D. Kryuchin O.V.
Tambov State University named after G.R. Derzhavin
An analytic model of artificial neural network
training using parallel gradient methods
In this paper we analyze the formulas used by gradient algorithms (for example, the method of the steepest descent, RPROP or QuickProp). The $i$-th weight at the $k$-th iteration can be calculated by the formula

$$w_i^k = f\left(w_i^{k-1}, \eta_i^k, g_i^k\right), \qquad (1)$$

where $f$ is the function for calculating the weight value $w_i^k$, $\eta_i^k$ is the step value (for the $i$-th weight at the $k$-th iteration) and $g_i^k$ is the gradient element, for whose calculation by the formulas

$$g_i^k = \frac{\partial \varepsilon(\bar{w})}{\partial w_i}, \qquad (2)$$

$$g_i^k \approx \frac{\varepsilon(w_0, \ldots, w_i + \Delta w, \ldots, w_{n-1}) - \varepsilon(\bar{w})}{\Delta w}, \qquad (3)$$

only the weight vector $\bar{w} = (w_0, \ldots, w_{n-1})$ is needed ($\varepsilon$ is the ANN inaccuracy) [1, 2]. This means that if the weights at the $(k-1)$-th iteration are known, the $i$-th weight at the $k$-th iteration can be calculated without knowing the other weights of the $k$-th iteration.
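For the steepest descent, for instance, $f$ reduces to $w_i^k = w_i^{k-1} - \eta_i^k g_i^k$. The following minimal Python sketch implements this update together with the finite-difference gradient (3); the quadratic inaccuracy function is an assumed toy stand-in for a real ANN error, not the paper's implementation.

    import numpy as np

    def inaccuracy(w):
        # toy stand-in for the ANN inaccuracy eps(w); a real network would
        # compare its outputs with the training sample here
        return float(np.sum(w ** 2))

    def gradient_element(w, i, dw=1e-6):
        # formula (3): finite-difference approximation of g_i
        shifted = w.copy()
        shifted[i] += dw
        return (inaccuracy(shifted) - inaccuracy(w)) / dw

    def steepest_descent_step(w, eta=0.1):
        # formula (1) with f(w, eta, g) = w - eta * g, applied element-wise
        return np.array([w[i] - eta * gradient_element(w, i)
                         for i in range(w.size)])

    w = np.array([0.5, -0.3, 0.8])
    print(steepest_descent_step(w))

Note that every new $w_i^k$ depends only on the previous vector $\bar{w}$; this independence is exactly what allows the weights to be divided among computing elements below.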

Fig. 1. The scheme of the weight placement in the IR-elements.
This means that we can divide the weights among a number of information resource elements (IR-elements). Such elements can be, for example, processes or nodes of a computer cluster. This scheme is shown in Fig. 1.
If we analyse the information process for the lead and non-lead IR-elements, we can see that the training algorithm for the lead IR-element consists of a few steps [1-3]:
1. forming the ANN structure (with the weights initialization);
2. sending the ANN structure to all non-lead IR-elements;
3. calculating its own part of the weight vector $\bar{w}$;
4. receiving the calculated weight values from all non-lead IR-elements;
5. assembling the vector $\bar{w}$ and calculating the inaccuracy value $\varepsilon$;
6. checking whether training should stop;
7. sending the stop (or continue) command to all non-lead IR-elements;
8. if the training has not stopped, sending the weights to all IR-elements and going to the third step.
That was the algorithm for the lead IR-element; the non-lead IR-elements use another one (a sketch of both roles follows the list):
1. receiving the ANN structure from the lead IR-element;
2. calculating its part of the weights;
3. sending these weights to the lead IR-element;
4. receiving the stop command;
5. if the training has not ended, receiving the full weight vector and going to the second step [4].
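Below is a minimal sketch of both roles under stated assumptions: mpi4py is used for the message passing, rank 0 plays the lead IR-element, the stop criterion is a fixed inaccuracy threshold, the update rule is plain steepest descent, and the toy inaccuracy function stands in for a real ANN error. None of these choices is prescribed by the paper.

    import numpy as np
    from mpi4py import MPI

    def inaccuracy(w):                       # toy stand-in for eps(w)
        return float(np.sum(w ** 2))

    def gradient_element(w, i, dw=1e-6):     # formula (3)
        shifted = w.copy()
        shifted[i] += dw
        return (inaccuracy(shifted) - inaccuracy(w)) / dw

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    eta, n = 0.1, 64                         # assumed step value, weights number

    # lead step 1: form the structure (here just the weight vector)
    w = np.random.uniform(-1, 1, n) if rank == 0 else None
    # lead step 2 / non-lead step 1: distribute the structure
    w = comm.bcast(w, root=0)
    my_idx = np.array_split(np.arange(n), size)[rank]  # this element's weights

    while True:
        # lead step 3 / non-lead step 2: update own part of the weights
        part = np.array([w[i] - eta * gradient_element(w, i) for i in my_idx])
        # lead step 4 / non-lead step 3: collect the parts at the lead
        parts = comm.gather(part, root=0)
        if rank == 0:
            w = np.concatenate(parts)                  # step 5: assemble w
            stop = inaccuracy(w) < 1e-6                # step 6: stop criterion
        else:
            stop = None
        # lead step 7 / non-lead step 4: broadcast the stop command
        stop = comm.bcast(stop, root=0)
        if stop:
            break
        # lead step 8 / non-lead step 5: broadcast the full weight vector
        w = comm.bcast(w, root=0)

Run with, e.g., mpirun -n 4 python train.py; the rank-0 process follows the lead algorithm and every other rank follows the non-lead one.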
Analyzing the formulas used by the gradient methods, we can see that the steepest descent and QuickProp execute 3 multiplicative and 2 additive operations per weight update, while the RPROP method executes three multiplicative and one additive operation [1-2, 4]. We should also remember that calculating one gradient element requires $m_g$ multiplicative and $a_g$ additive operations. Thus we can count the multiplicative and additive operations for one iteration of a gradient method (Table 1).
Table 1. Number of operations for one iteration ($n$ is the number of weights).

| Method               | Multiplicative operations | Additive operations |
|----------------------|---------------------------|---------------------|
| The steepest descent | $n(m_g + 3)$              | $n(a_g + 2)$        |
| QuickProp            | $n(m_g + 3)$              | $n(a_g + 2)$        |
| RPROP                | $n(m_g + 3)$              | $n(a_g + 1)$        |
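The per-iteration totals in Table 1 follow by adding the gradient cost to the per-weight update cost and multiplying by the number of weights. A small helper makes the counting explicit; the function name and the numeric values of $m_g$, $a_g$ are illustrative assumptions.

    # Operation counts for one sequential iteration, as in Table 1.
    # m_g, a_g: cost of one gradient element; n: number of weights.
    PER_WEIGHT = {                 # (multiplicative, additive) update costs
        "steepest_descent": (3, 2),
        "quickprop": (3, 2),
        "rprop": (3, 1),
    }

    def ops_per_iteration(method: str, n: int, m_g: int, a_g: int):
        m_u, a_u = PER_WEIGHT[method]
        return n * (m_g + m_u), n * (a_g + a_u)

    print(ops_per_iteration("rprop", n=100, m_g=10, a_g=8))  # (1300, 900)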
Each parallel iteration of the information process which uses gradient methods consists of a few steps:
· sending the weights by the lead IR-element;
· calculating a part of the gradient and of the weights;
· sending the new weights to the lead IR-element [1].
The first step needs multiplicative and additive operations at the lead IR-element (we denote their total cost $o_1^l$) and at a non-lead one (we denote it $o_1^s$); this cost consists of the operations for the data sending and the operations for the waiting. The numbers of operations necessary for the second step are shown in Table 2, where $n_l$ and $n_s$ stand for the numbers of weights processed by the lead and by a non-lead IR-element. At the third step a non-lead IR-element executes multiplicative and additive operations with the total cost $o_3^s$, while the lead IR-element executes multiplicative operations (partly for the data transfer and partly for the waiting) and additive operations with the total cost $o_3^l$.
Table 2. Number of operations executed at the second step of the information process.

| Method               | Lead IR-element |                | Non-lead IR-element |                |
|                      | multiplicative  | additive       | multiplicative      | additive       |
|----------------------|-----------------|----------------|---------------------|----------------|
| The steepest descent | $n_l(m_g + 3)$  | $n_l(a_g + 2)$ | $n_s(m_g + 3)$      | $n_s(a_g + 2)$ |
| QuickProp            | $n_l(m_g + 3)$  | $n_l(a_g + 2)$ | $n_s(m_g + 3)$      | $n_s(a_g + 2)$ |
| RPROP                | $n_l(m_g + 3)$  | $n_l(a_g + 1)$ | $n_s(m_g + 3)$      | $n_s(a_g + 1)$ |
To reduce addition operations to multiplication operations, the coefficient $\kappa$ is used. Its value is directly proportional to the time spent on one addition operation and inversely proportional to the time spent on one multiplication operation. So one multiplication operation takes the time of $1/\kappa$ addition operations, and one addition operation can be replaced by $\kappa$ multiplication operations [4].
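For instance, with assumed timings $t_{mult} = 4\,$ns and $t_{add} = 1\,$ns we get $\kappa = 0.25$, and the mixed counts of Table 1 collapse into a single multiplication-equivalent figure. A short sketch (the function name and timings are illustrative):

    # Reduce a mixed (multiplicative, additive) operation count to
    # multiplication-equivalent operations using kappa = t_add / t_mult.
    def reduce_ops(mult_ops: float, add_ops: float,
                   t_add: float, t_mult: float) -> float:
        kappa = t_add / t_mult        # one addition = kappa multiplications
        return mult_ops + kappa * add_ops

    # e.g. the RPROP iteration counted above: 1300 mult + 900 add
    print(reduce_ops(1300, 900, t_add=1e-9, t_mult=4e-9))   # 1525.0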
In these units the first step needs $o_1^l$ operations at the lead IR-element and $o_1^s$ operations at the others. The values for the second step ($o_2^l$ and $o_2^s$) follow from Table 2. The third step executes $o_3^s$ operations at a non-lead IR-element, which is why the lead IR-element needs to wait that many operations before the receiving is finished.
So the parallel training information process executes

$$o = K\left(o_1^l + o_2^l + o_3^l\right) + o_r \qquad (4)$$

operations. Here the information process executes $K(o_1^l + o_2^l + o_3^l)$ operations of the parallel algorithm ($o_3^l$ of them at the last step) and $o_r$ covers the other operations. The efficiency of the parallel information process can be calculated by the formula

$$E = \frac{o_s}{p\,o}, \qquad (5)$$

where $o_s$ is the number of operations of the sequential algorithm and $p$ is the number of IR-elements. So the analytic model can be written as

$$o = K\left(o_1^l + n_l\bigl(m_g + 3 + \kappa(a_g + 2)\bigr) + o_3^l\right) + o_r \qquad (6)$$

for the steepest descent and QuickProp, and as

$$o = K\left(o_1^l + n_l\bigl(m_g + 3 + \kappa(a_g + 1)\bigr) + o_3^l\right) + o_r \qquad (7)$$

for RPROP. Here $K$ is the number of iterations and $o_r$ is the number of the other operations.
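A short numeric sketch of the model: the code below evaluates (4)-(7) and the efficiency (5), with all inputs (the communication costs $o_1^l$ and $o_3^l$, the gradient costs $m_g$ and $a_g$, the coefficient $\kappa$, the iteration count and the weight split) chosen purely for illustration.

    # Evaluate the analytic model (4)-(7) for a parallel gradient method.
    ADD_OPS = {"steepest_descent": 2, "quickprop": 2, "rprop": 1}

    def parallel_ops(method, K, n_l, m_g, a_g, kappa, o1_l, o3_l, o_r):
        # formulas (6)-(7): per-iteration cost at the lead IR-element
        o2_l = n_l * (m_g + 3 + kappa * (a_g + ADD_OPS[method]))
        return K * (o1_l + o2_l + o3_l) + o_r           # formula (4)

    def efficiency(o_seq, o_par, p):
        return o_seq / (p * o_par)                      # formula (5)

    # illustrative numbers: 1000 iterations, 4 IR-elements, even weight split
    n, p, m_g, a_g, kappa = 128, 4, 10, 8, 0.25
    o_par = parallel_ops("rprop", K=1000, n_l=n // p, m_g=m_g, a_g=a_g,
                         kappa=kappa, o1_l=500, o3_l=500, o_r=10_000)
    o_seq = 1000 * n * (m_g + 3 + kappa * (a_g + 1))    # sequential count
    print(efficiency(o_seq, o_par, p))                  # ~0.326 here

With these toy inputs the communication terms $o_1^l$ and $o_3^l$ dominate the per-element computation, so the efficiency stays well below 1; shrinking them relative to $o_2^l$ drives $E$ upward, which is the behaviour the model is meant to expose.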
Bibliography
1. Kryuchin O.V. Development of parallel gradient algorithms for training an artificial neural network // Electronic journal "Issledovano v Rossii", 096, pp. 1208-1221, 2009. Available at: http://zhurnal.ape.relarn.ru/articles/2009/096.pdf
2. Kryuchin O.V. Development of parallel heuristic algorithms for selecting the weight coefficients of an artificial neural network // Informatika i ee primeneniya. 2010. Vol. 4, No. 2. pp. 53-56.
3. Kryuchin O.V. Parallel algorithms for training artificial neural networks using gradient methods // Topical Issues of Modern Science, Engineering and Technologies: proceedings of the II All-Russian scientific-practical (distance) conference. Moscow, 2010. pp. 81-86.
4. Kryuchin O.V. Parallel algorithms for training artificial neural networks // Information Technologies and Mathematical Modelling (ITMM-2009): proceedings of the VIII All-Russian scientific-practical conference with international participation, November 12-13, 2009. Tomsk, 2009. Part 2. pp. 241-244.