UDC 007

UDC 007.001.362

Savvina E.A.

A GENERAL METHOD FOR THE DIAGNOSTICS OF INVARIANT INFORMATION FLOWS

Voronezh State University of Engineering Technologies

A wide range of methods is used to diagnose the state of biotechnological systems. For example, cluster analysis is applied to classify biological and geographical systems, fuzzy logic methods are used in the social sphere, and neural networks provide high-accuracy biomedical diagnostics. However, these methods share a drawback: each is applicable only within the narrow task for which it was developed, and none can classify biotechnological systems with different application domains. This is what motivates the present work.

The data were processed using cluster, discriminant, and neural network methods [1]. Two-step cluster analysis (TwoStepCluster) clusters several groups separately and then combines the results into a final cluster structure. The distance between objects is measured with the Euclidean metric:

$$d_{kl} = \sqrt{\sum_{j=1}^{n} \left(x_{kj} - x_{lj}\right)^2},$$

where $d_{kl}$ is the distance between objects $k$ and $l$, $x_{kj}$ and $x_{lj}$ are the values of the $j$-th property of objects $k$ and $l$, respectively, and $n$ is the number of properties.
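As a minimal illustration of the metric, the following Python function computes $d_{kl}$ for two objects given as equal-length lists of property values. The function name `euclidean_distance` is our own and does not come from any particular package.

```python
import math
from typing import Sequence


def euclidean_distance(x_k: Sequence[float], x_l: Sequence[float]) -> float:
    """Return d_kl = sqrt(sum_j (x_kj - x_lj)^2) for two objects k and l."""
    if len(x_k) != len(x_l):
        raise ValueError("objects must have the same number of properties")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_k, x_l)))


# Two hypothetical objects described by three properties each.
print(euclidean_distance([1.0, 2.0, 2.0], [4.0, 6.0, 2.0]))  # 5.0
```

In practice the properties are usually standardized first, since features measured in large units (for example, falling number) would otherwise dominate the sum.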

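The number of clusters in two-step cluster analysis can be set by the user or determined automatically from Akaike's information criterion, $AIC = -2\ln L + 2m$, where $L$ is the maximized likelihood of the model and $m$ is the number of parameters, or from the Bayesian information criterion, $BIC = -2\ln L + m\ln N$, where $N$ is the number of observations.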
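A small sketch of how an automatic choice of the cluster count works, using the standard definitions $AIC = -2\ln L + 2m$ and $BIC = -2\ln L + m\ln N$: each candidate number of clusters is fitted, and the one with the smallest criterion value wins. The log-likelihoods and parameter counts below are hypothetical; in the actual TwoStep procedure they are computed internally by the software.

```python
import math


def aic(log_likelihood: float, m: int) -> float:
    """Akaike information criterion: AIC = -2 ln L + 2m."""
    return -2.0 * log_likelihood + 2.0 * m


def bic(log_likelihood: float, m: int, n_obs: int) -> float:
    """Bayesian information criterion: BIC = -2 ln L + m ln N."""
    return -2.0 * log_likelihood + m * math.log(n_obs)


def best_cluster_count(fits: dict, n_obs: int, criterion: str = "bic") -> int:
    """Return the cluster count with the smallest criterion value.

    `fits` maps a candidate number of clusters to (ln L, number of parameters m).
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(k: int) -> float:
        log_l, m = fits[k]
        return aic(log_l, m) if criterion == "aic" else bic(log_l, m, n_obs)

    return min(fits, key=score)


# Hypothetical fits for 2, 3 and 4 clusters on 595 observations.
fits = {2: (-1500.0, 10), 3: (-1400.0, 15), 4: (-1392.0, 20)}
print(best_cluster_count(fits, n_obs=595, criterion="aic"))  # 4
print(best_cluster_count(fits, n_obs=595, criterion="bic"))  # 3
```

The two criteria can disagree: BIC penalizes each extra parameter by $\ln N \approx 6.4$ instead of 2, so it tends to prefer fewer clusters.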

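The canonical discriminant function is calculated by the formula

$$D = a_1 x_1 + a_2 x_2 + \dots + a_p x_p = \sum_{i=1}^{p} a_i x_i,$$

where $x_i$ are the features and $p$ is their number.

The coefficients $a_i$ of the discriminant function are chosen so that the mean values of the function for the two classes, $\bar{D}_1$ and $\bar{D}_2$, differ as much as possible, i.e. so that for two sets (classes) the following expression is maximized:

$$\bar{D}_1 - \bar{D}_2 = \sum_{i=1}^{p} a_i \bar{x}_{i1} - \sum_{i=1}^{p} a_i \bar{x}_{i2},$$

where $\bar{x}_{i1}$ and $\bar{x}_{i2}$ are the mean values of feature $i$ in the first and second class.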
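The difference of class means of $D = \sum a_i x_i$ can be made arbitrarily large just by scaling the $a_i$, so it is normally maximized relative to the within-class spread of $D$. The sketch below assumes Fisher's classical solution for two classes, $a = S_w^{-1}(\bar{x}_1 - \bar{x}_2)$, where $S_w$ is the pooled within-class scatter matrix; the source does not state which normalization was used, and the data are synthetic.

```python
import numpy as np


def fisher_coefficients(class1: np.ndarray, class2: np.ndarray) -> np.ndarray:
    """Coefficients a_i of D = sum_i a_i x_i separating two classes (Fisher's rule)."""
    mean1, mean2 = class1.mean(axis=0), class2.mean(axis=0)
    # Pooled within-class scatter: sums of outer products of centred observations.
    c1, c2 = class1 - mean1, class2 - mean2
    s_w = c1.T @ c1 + c2.T @ c2
    # Solve S_w a = (mean1 - mean2) instead of forming the inverse explicitly.
    return np.linalg.solve(s_w, mean1 - mean2)


def discriminant_scores(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Evaluate D = a_1 x_1 + ... + a_p x_p for every row of x."""
    return x @ a


# Hypothetical two-feature data for two classes.
rng = np.random.default_rng(0)
class_1 = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(50, 2))
class_2 = rng.normal(loc=[0.0, 1.0], scale=0.5, size=(50, 2))
a = fisher_coefficients(class_1, class_2)
d1_bar = discriminant_scores(class_1, a).mean()
d2_bar = discriminant_scores(class_2, a).mean()
print(a, d1_bar - d2_bar)
```

Because $\bar{D}_1 - \bar{D}_2 = (\bar{x}_1 - \bar{x}_2)^\top S_w^{-1}(\bar{x}_1 - \bar{x}_2)$ and $S_w$ is positive definite, the printed difference is always positive.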

Neural network methods model the function of a biological neuron, which forms an output signal depending on the signals arriving at its inputs. The state of a neuron is characterized by the strengths of its synaptic connections (weights $w_i$) and is determined by the formula

$$NET = \sum_{i=1}^{n} x_i w_i,$$

where $NET$ is the summing block that algebraically adds the weighted inputs to create the output, $x_i$ are the input signals received by the artificial neuron, $w_i$ are the signal weights, and $n$ is the number of inputs.
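A minimal sketch of a single artificial neuron: the summing block computes $NET = \sum_i x_i w_i$, and an activation function $F$ turns it into the output. The source does not name $F$ at this point; the logistic sigmoid used here is our assumption, chosen because it is differentiable everywhere.

```python
import math
from typing import Sequence


def net(x: Sequence[float], w: Sequence[float]) -> float:
    """Summing block: NET = sum_i x_i * w_i."""
    if len(x) != len(w):
        raise ValueError("each input needs exactly one weight")
    return sum(xi * wi for xi, wi in zip(x, w))


def sigmoid(s: float) -> float:
    """Assumed activation F(NET) = 1 / (1 + exp(-NET))."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(x: Sequence[float], w: Sequence[float]) -> float:
    """Output signal of the neuron: F(NET)."""
    return sigmoid(net(x, w))


# Hypothetical inputs and weights.
print(net([1.0, 0.5, -2.0], [0.2, 0.4, 0.1]))  # 0.2
print(neuron_output([0.0, 0.0], [1.0, 1.0]))   # 0.5
```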

In the course of the work, we constructed a database of 595 analyses characterizing the quality of bread by 20 features. Quality was described by organoleptic and physicochemical indicators of flour (moisture, mass fraction and quality of gluten, etc.), chemical indicators of flour (mass fraction of fat, fiber, carbohydrates, etc.), and indicators of bread (crumb moisture, porosity, and acidity). In accordance with the classification proposed in [3], flour quality is divided into 4 main groups: group 1 (highest quality), 140 observations (23.5%); group 2 (good quality), 195 (32.8%); group 3 (poor quality), 140 (23.5%); group 4 (very poor quality), 120 (20.2%).

The database contains not only quantitative characteristics (flour moisture, active and titratable acidity, mass fraction of gluten, gluten quality, etc.) but also qualitative ones (presence of a crunch, sour taste, pest infestation, etc.). The values of the qualitative features were coded with numbers and letters. The original categorical features were then converted into binary features, each with 2 states (0 = the sign is absent, 1 = present). As a result, the number of features in the database increased to 28.
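The conversion of a coded categorical feature into binary signs can be sketched as follows: each category becomes its own 0/1 column. The feature name, category codes and values are hypothetical illustrations, not the codes used in the database.

```python
def to_binary_features(values: list, categories: list) -> list:
    """Expand one coded categorical feature into one 0/1 column per category.

    For each observation a column holds 1 if that sign is present and 0 if absent;
    a value outside `categories` yields a row of zeros.
    """
    return [[1 if value == category else 0 for category in categories] for value in values]


# Hypothetical codes of a "taste" feature for four flour samples.
taste = ["sour", "normal", "musty", "sour"]
columns = ["normal", "sour", "musty"]
print(to_binary_features(taste, columns))
# [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

A categorical feature with $c$ categories turns into $c$ binary columns, which is why the total number of features grows after this step.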

Pearson's correlation coefficient between a feature and the flour quality class was adopted as the criterion of feature informativeness.
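A sketch of the informativeness criterion. The source does not say how the quality class was encoded for the correlation; here we assume a 0/1 indicator of membership in one class, so that each class gets its own set of specific features. The cut-offs for "strong" (|r| > 0.7) and "moderate" (|r| > 0.5) are also our assumption, chosen to match the values quoted in the text.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two samples of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / (sx * sy)


def class_indicator(quality_class: Sequence[int], target: int) -> list:
    """1 if the observation belongs to the target quality class, else 0."""
    return [1 if c == target else 0 for c in quality_class]


def strength(r: float) -> str:
    """Closeness of the relationship by |r| (assumed cut-offs 0.7 and 0.5)."""
    if abs(r) > 0.7:
        return "strong"
    if abs(r) > 0.5:
        return "moderate"
    return "weak"


# Hypothetical binary feature and quality classes of six observations.
feature = [1, 1, 0, 0, 1, 0]
quality_class = [1, 1, 2, 3, 1, 4]
r = pearson(feature, class_indicator(quality_class, target=1))
print(round(r, 3), strength(r))  # 1.0 strong
```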

For the first class, 2 specific features were identified: white flour color X₁₀ and the content of water-soluble carbohydrates X₂₈; their correlation coefficients range from 0.755 to 0.819, i.e. the relationship is strong. For 6 features (titratable acidity X₂, active acidity X₃, mass fraction of gluten X₄, falling number X₇, flour color X₈, whiteness X₂₆, ash content X₂₇) the absolute value of the correlation coefficient exceeds 0.5, i.e. the relationship is moderate.

For the second class, 1 specific feature was detected: flour color with a yellowish tint X₁₂, with a correlation coefficient of 0.826, i.e. a strong relationship. For 3 features (characteristic taste X₁₄, absence of a musty taste X₁₈, characteristic smell X₁₉, absence of crunch X₂₃) the absolute value of Pearson's correlation coefficient ranges from 0.508 to 0.655, i.e. the relationship is moderate.

The third class was found to have 1 specific feature, grey flour color X₁₁, with a correlation coefficient of 0.748. For 4 features (uncharacteristic taste X₁₄, sour taste X₁₅, musty taste X₁₈, uncharacteristic smell X₁₉) the correlation coefficient ranges from 0.550 to 0.691, i.e. the relationship is moderate.

For the fourth class, 7 specific features were obtained: titratable acidity X₂, mass fraction of gluten X₄, falling number X₇, fineness of grinding X₉, moldy taste X₁₇, ash content X₂₇; their correlation coefficients range from 0.717 to 0.952.

At the second stage, flour quality was classified by two-step cluster analysis using the features selected at the first stage by correlation analysis, i.e. those significantly correlated with the quality class. The data structure adopted for this is shown in Fig. 1.

Fig. 1. Data structure

Analysis of the results showed 41 (6.9%) errors of the first and second kind. There were 34 (5.7%) signal misses: 32 (5.4%) observations of good quality were mistakenly diagnosed as very good, and 2 (0.3%) cases of very poor quality were attributed to poor quality. There were 7 (1.2%) false alarms, i.e. poor-quality flour assigned to very poor. Class 1 was classified exactly: 140 (100.0%) observations. In the second class, 163 (83.6%) observations were detected correctly, in the third 133 (95.0%), and in the fourth 118 (98.3%). In total, 554 (93.1%) of 595 observations were classified correctly.
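The percentages in this paragraph follow directly from the class sizes and the numbers of correctly classified observations. The short script below uses only the counts reported for two-step cluster analysis and reproduces that arithmetic.

```python
# Class sizes and correctly classified counts for two-step cluster analysis.
class_sizes = {1: 140, 2: 195, 3: 140, 4: 120}
correct = {1: 140, 2: 163, 3: 133, 4: 118}

for cls, size in class_sizes.items():
    print(cls, f"{100 * correct[cls] / size:.1f}%")
# 1 100.0%, 2 83.6%, 3 95.0%, 4 98.3%

total = sum(class_sizes.values())
total_correct = sum(correct.values())
errors = total - total_correct
print(total_correct, f"{100 * total_correct / total:.1f}%")  # 554 93.1%
print(errors, f"{100 * errors / total:.1f}%")  # 41 6.9%
```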

For discriminant analysis, a training sample of 50 observations (8.4%) containing objects of all flour classes was formed. The results of discriminant analysis are presented in Table 2.

Table 2.

Results of discriminant analysis

Function	Eigenvalue	% of variance	Canonical correlation	Wilks' λ	χ²
D₁	9987.18	99.9	1.00	0.00	7790.71
D₂	9.07	0.1	0.94	0.02	2393.74
D₃	4.84	0.0	0.94	0.17	1040.60

The discriminant functions are listed in descending order of their eigenvalues. Function D₁ has the greatest discriminating power, since its eigenvalue is 9987.18. Function D₂ provides the greatest discrimination after D₁. The percentage of variance (99.9%), the canonical correlation of 1.00, and the χ² value of 7790.71 confirm the discriminating power of D₁ and show that this function is statistically significant. Since the values of D₂ and D₃ are lower on several criteria, these functions have little influence on discrimination.
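Two columns of Table 2 follow from the eigenvalues alone, by the standard relations of canonical discriminant analysis: the share of variance of function $k$ is $\lambda_k / \sum_i \lambda_i$, and Wilks' $\Lambda$ for testing functions $k$ onward is $\prod_{i \ge k} 1/(1 + \lambda_i)$. The check below uses the eigenvalues from the table.

```python
import math

eigenvalues = [9987.18, 9.07, 4.84]  # D1, D2, D3 from Table 2


def percent_variance(eigs):
    """Share of the total discriminating variance explained by each function, in %."""
    total = sum(eigs)
    return [100 * e / total for e in eigs]


def wilks_lambda(eigs):
    """Wilks' lambda for functions k..end: product of 1 / (1 + eigenvalue_i)."""
    return [math.prod(1 / (1 + e) for e in eigs[k:]) for k in range(len(eigs))]


print([round(p, 1) for p in percent_variance(eigenvalues)])  # [99.9, 0.1, 0.0]
print([round(w, 2) for w in wilks_lambda(eigenvalues)])      # [0.0, 0.02, 0.17]
```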

The equations of the linear classification functions $D_{1k}$, $D_{2k}$, $D_{3k}$, $D_{4k}$, which assign objects to one of the 4 classes, are:

$$D_{1k} = -3346.35 + 85.73X_{2} + 48.19X_{4} + 2.29X_{7} + 1.57X_{9} - 31.29X_{12} + 115.86X_{27} + 28.91X_{29} + 29.21X_{30} - 60.80X_{31} + 70.56X_{33} + 13936.45X_{35}; \tag{1}$$

$$D_{2k} = -5034.72 + 77.66X_{2} + 44.98X_{4} + 1.99X_{7} + 1.44X_{9} - 31.74X_{12} + 137.58X_{27} + 59.14X_{29} + 41.51X_{30} - 63.9X_{31} + 69.42X_{33} + 47887.02X_{35}; \tag{2}$$

$$D_{3k} = -39368.68 + 115.34X_{2} + 47.81X_{4} + 1.074X_{7} + 1.71X_{9} - 144.32X_{12} + 164.55X_{27} - 15.2X_{29} + 18.39X_{30} - 143.9X_{31} - 13.02X_{33} + 155295.13X_{35}; \tag{3}$$

$$D_{4k} = -39421.92 + 113.14X_{2} + 47.94X_{4} + 0.79X_{7} + 1.72X_{9} - 146.33X_{12} + 201.51X_{27} - 25.4X_{29} + 25.79X_{30} - 149.62X_{31} - 11.99X_{33} + 155276.21X_{35}. \tag{4}$$
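Classification functions of this kind are normally used by evaluating all of them for an object and assigning the object to the class whose function value is largest. The source does not spell out the decision rule, so the max rule here is our assumption. The coefficients below are small hypothetical numbers for illustration, not those of equations (1)-(4).

```python
from typing import Sequence

# Hypothetical classification functions: class -> (intercept, coefficients).
FUNCTIONS = {
    1: (-10.0, [2.0, 1.0]),
    2: (-12.0, [1.0, 3.0]),
    3: (-5.0, [0.5, 0.5]),
}


def function_value(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Value of one linear classification function at feature vector x."""
    return intercept + sum(c * xi for c, xi in zip(coefs, x))


def classify(x: Sequence[float], functions: dict = FUNCTIONS) -> int:
    """Assign x to the class whose classification function value is largest."""
    return max(functions, key=lambda cls: function_value(*functions[cls], x))


print(classify([4.0, 2.0]))  # 1
print(classify([1.0, 5.0]))  # 2
print(classify([0.0, 0.0]))  # 3
```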

Discriminant analysis produced 7 (5.0%) false alarms, i.e. poor-quality flour assigned to the worst class. There were 3 (2.3%) signal misses: in 1 case (0.5%) good-quality flour was incorrectly classified as very good, and in the other 2 cases (1.7%) very poor flour was classified as poor. The first class was classified correctly in 140 (100%) observations, the second in 194 (99.5%), the third in 124 (94.3%), and the fourth in 118 (98.3%). Thus, the accuracy of classification by discriminant analysis was 576 (98.02%). The error of assigning poor quality to very poor can be disregarded, since neither poor nor very poor flour should be used in baking.

Classification was also carried out with neural networks. From the total sample, 400 observations were randomly selected for the training sample, 95 for the control sample, and 94 for the test sample; the software package randomly set aside the remaining 6 cases. The diagnostic system was evaluated on the test sample, which contains all four quality classes; the network was trained by gradient descent with error backpropagation.
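A sketch of the random partition described above, using only the standard library. The function name and the seed are hypothetical, and here the observations left over after the three subsets are filled play the role of the 6 ignored cases.

```python
import random
from typing import Iterable, Optional, Tuple


def split_sample(ids: Iterable[int], sizes: Tuple[int, int, int] = (400, 95, 94),
                 seed: Optional[int] = None):
    """Randomly split observation ids into training, control, test and ignored parts."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train, n_control, n_test = sizes
    train = ids[:n_train]
    control = ids[n_train:n_train + n_control]
    test = ids[n_train + n_control:n_train + n_control + n_test]
    ignored = ids[n_train + n_control + n_test:]
    return train, control, test, ignored


train, control, test, ignored = split_sample(range(595), seed=42)
print(len(train), len(control), len(test), len(ignored))  # 400 95 94 6
```

A plain shuffle does not by itself guarantee that the test sample contains all four quality classes; a stratified split (shuffling within each class and taking proportional shares) would.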

In the course of the work, a feedforward neural network model was constructed. The network has 11 features at the input layer (selected by correlation analysis) and two hidden layers: 11 neurons in the first hidden layer (A) and 4 neurons in the second hidden layer (S). The number of layers was chosen so that the first layer uses all input characteristics and the second automatically extracts those most important for classification. The activation function was chosen to be differentiable over its entire domain.
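A minimal NumPy sketch of a feedforward network with the layer sizes given above (11 inputs, hidden layers of 11 and 4 neurons), trained by full-batch gradient descent with backpropagation. Several details are our assumptions, not the author's settings: logistic sigmoid hidden activations, an extra 4-unit softmax output layer (one unit per quality class), cross-entropy loss, the learning rate, the number of epochs, and the synthetic data.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic activation; differentiable everywhere, derivative s * (1 - s)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng: np.random.Generator, sizes=(11, 11, 4, 4)):
    """Random weights, zero biases: 11 inputs -> 11 (A) -> 4 (S) -> 4 outputs."""
    return [(rng.normal(0.0, 0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x: np.ndarray):
    """Return both hidden activations and softmax class probabilities."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = sigmoid(x @ w1 + b1)    # first hidden layer (A), 11 neurons
    h2 = sigmoid(h1 @ w2 + b2)   # second hidden layer (S), 4 neurons
    logits = h2 @ w3 + b3        # assumed output layer, one unit per class
    logits -= logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    p = np.exp(logits)
    return h1, h2, p / p.sum(axis=1, keepdims=True)


def train(params, x, y_onehot, lr=0.5, epochs=300):
    """Full-batch gradient descent; gradients obtained by backpropagation."""
    n = x.shape[0]
    losses = []
    for _ in range(epochs):
        (w1, b1), (w2, b2), (w3, b3) = params
        h1, h2, p = forward(params, x)
        losses.append(float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))))
        d3 = (p - y_onehot) / n              # dLoss/dlogits for softmax + cross-entropy
        d2 = (d3 @ w3.T) * h2 * (1.0 - h2)   # error propagated back to layer S
        d1 = (d2 @ w2.T) * h1 * (1.0 - h1)   # error propagated back to layer A
        params = [
            (w1 - lr * (x.T @ d1), b1 - lr * d1.sum(axis=0)),
            (w2 - lr * (h1.T @ d2), b2 - lr * d2.sum(axis=0)),
            (w3 - lr * (h2.T @ d3), b3 - lr * d3.sum(axis=0)),
        ]
    return params, losses


# Synthetic, hypothetical data: 200 observations of 11 features in 4 classes.
rng = np.random.default_rng(1)
labels = rng.integers(0, 4, size=200)
centres = rng.normal(0.0, 2.0, size=(4, 11))
x = centres[labels] + rng.normal(0.0, 0.5, size=(200, 11))
y = np.eye(4)[labels]

params, losses = train(init_params(rng), x, y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

All gradients in an epoch are computed from the weights at the start of that epoch before any layer is updated, which is what makes this plain (full-batch) gradient descent.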

Neural network classification showed that the network reaches its best results at iteration 201, where the classification error is minimal. On the test sample, the numbers of false alarms and signal misses dropped to 0 (0%) and 1 (1.06%), respectively. Full accuracy was achieved in the first class, 140 (100%), the third, 140 (100%), and the fourth, 120 (100%). In the good-quality group, 1 (1.06%) observation was incorrectly classified as very good, so the classification accuracy in this group was 194 (98.94%). The classification accuracy of the neural network over the whole sample was 594 (99.73%).

Thus, organizing a system of informative features makes it possible to classify flour quality with high precision by two-step cluster, discriminant, and neural network analysis. A comparison of these three methods, each preceded by correlation analysis to select the specific features of every class, showed that the neural network correctly classified the largest number of observations: 594 (99.73%) of 595. Cluster and discriminant analysis gave worse diagnostic results, 554 (93.1%) and 576 (98.02%), respectively. However, these methods are somewhat faster than the neural network method.

LIST OF SOURCES USED

1. Klecka W.R. Cluster analysis // Factor, Discriminant and Cluster Analysis. Moscow: Statistics, 1989.

2. Sanina T.V., Ponomareva E.I. A points-based assessment of the quality of bakery products. VGTA, 2004.