Bozhday A.S.*, Gnusin P.V.**

* Bozhday A.S., Doctor of Technical Sciences, Professor of CAD, Penza State University, Penza, e-mail: bozhday@yandex.ru

** Gnusin P.V., graduate student in CAD, Penza State University, Penza, e-mail: pavel.sin.gnusin@gmail.com

 

Methods of information retrieval in CAD project archives

 

The rapid development of science and technology leads to a sharp increase in the complexity of design objects and of the associated computer-aided design (CAD) systems. The operation of modern CAD systems involves the creation and storage of a large number of project documents of many types (major CAD systems enable the creation of about 10^6 to 10^7 project documents per year). Another feature of modern CAD systems is the distributed nature of their architecture and the prevalence of multi-user operation. Large industrial enterprises and corporations with geographically distributed design departments work with a single centralized repository of design data managed by a PDM (Product Data Management) system. The typological heterogeneity of the information and the volume of these repositories are so large that the resulting search problem is comparable to information retrieval on the Internet. At the same time, the traditional search techniques still used in CAD (SQL queries, system directories, keywords, etc.) are becoming less and less effective because they focus mainly on text formats.

Thus, the pressing task is to develop methods for retrieving design information and documents that satisfy the following requirements:

- efficient work with the huge volumes of modern CAD project archives through continuous indexing of their content;

- search for information in storage based on different data models (multidimensional storage, object-oriented model, relational model, network models, etc.);

- an acceptable level of relevance for different types of documentation (text, raster, vector, digital mock-ups, various languages for describing objects and tasks, mathematical and geometric models);

- intelligent data search by identifying implicit relationships in the information;

- data search based on recognition algorithms that find content semantically relevant to the search query.

Structurally, such a search engine is divided into two basic components: a system of crawlers and an analytical host. The crawlers are a collection of autonomous software agents that scan and index the content of the distributed project archive. The analytical host provides the user interface for building queries and processes those queries analytically in order to find relevant index information.
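
The two-component structure can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of a real distributed archive and a plain inverted index over text terms. The names ARCHIVE, Crawler, AnalyticalHost, and build_index are hypothetical and are not taken from any PDM product.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a distributed project archive:
# node name -> {document id -> extracted text}.
ARCHIVE = {
    "node_a": {
        "doc1": "Gear housing assembly drawing",
        "doc2": "Shaft tolerance report",
    },
    "node_b": {
        "doc3": "Gear shaft strength calculation",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Crawler:
    """Autonomous agent that scans one archive node and indexes its content."""

    def __init__(self, node):
        self.node = node

    def crawl(self, index):
        for doc_id, text in ARCHIVE[self.node].items():
            for term in tokenize(text):
                index[term].add(doc_id)


class AnalyticalHost:
    """Accepts user queries and looks up relevant entries in the shared index."""

    def __init__(self, index):
        self.index = index

    def search(self, query):
        # Score each document by the number of distinct query terms it contains.
        scores = defaultdict(int)
        for term in set(tokenize(query)):
            for doc_id in self.index.get(term, ()):
                scores[doc_id] += 1
        # Highest score first; ties are broken by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))


def build_index():
    """Run one crawler per archive node and return the combined inverted index."""
    index = defaultdict(set)
    for node in ARCHIVE:
        Crawler(node).crawl(index)
    return index


host = AnalyticalHost(build_index())
print(host.search("gear shaft"))
```

In a real system the crawlers would run continuously and independently on the archive nodes, and the index would cover non-text content as well; the sketch only shows how the roles are separated.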

To solve this information retrieval problem, we propose integrating the traditional apparatus of PDM systems with some principles of Internet search engines, intelligent Data Mining methods, and elements of expert systems (recognition of text and graphic images, linguistic analysis of documents).

 

Intelligent Data Mining methods

The traditional purpose of Data Mining technology is to find implicit, hidden trends and patterns in large amounts of statistical data. In a general sense, the term Data Mining denotes a set of different knowledge discovery methods. The choice of method often depends on the type of data available and on the type of information to be retrieved. Intelligent Data Mining techniques are also well suited to problems of semantic search in large design archives and multidimensional CAD repositories.

Data Mining techniques can be divided into two groups: statistical and cybernetic. The statistical group includes time series analysis, correlation and regression analysis, factor analysis, analysis of variance, descriptive analysis, component analysis, and discriminant analysis. The cybernetic group includes techniques such as neural networks, fuzzy logic, evolutionary programming, decision trees, genetic algorithms, associative memory, and expert knowledge processing systems.

A key advantage of Data Mining is the automatic generation of hypotheses about the relationships between the various components of the data. For example, the parameters of a (specially formulated) search query can be compared with fragments of the design content in order to find explicit and implicit correlations. The variety of Data Mining methods makes it possible to choose the most appropriate method for each type of query and project document. In particular, this expands the arsenal of formal methods that account for the linguistic uncertainty of user queries or of some wording in the project documentation.
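
As a rough illustration of comparing query parameters with content fragments, the sketch below ranks fragments by the Pearson correlation between numeric feature vectors. It assumes that the query and the fragments have already been converted to such vectors (for example, term weights); the vectors and the names pearson and rank_fragments are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature vectors for a query and three content fragments.
query = [3, 0, 1, 2]
fragments = {
    "frag_a": [6, 0, 2, 4],  # proportional to the query
    "frag_b": [0, 3, 2, 1],  # varies opposite to the query
    "frag_c": [1, 1, 1, 1],  # constant, carries no signal
}


def rank_fragments(query, fragments):
    """Return (fragment name, correlation) pairs, strongest correlation first."""
    scored = [(name, pearson(query, vec)) for name, vec in fragments.items()]
    return sorted(scored, key=lambda item: -item[1])


for name, r in rank_fragments(query, fragments):
    print(name, round(r, 3))
```

Correlation is only one of the statistical methods listed above; other methods from either group could take its place in the same ranking scheme.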

Thus, Data Mining technology offers PDM systems the prospect of automatic intelligent search for templates (patterns) characteristic of arbitrary fragments of heterogeneous multidimensional data. Unlike online analytical processing (OLAP) methods, Data Mining methods automate the formulation of search hypotheses and the identification of implicit or unusual (unexpected) patterns [1, 2].

 

Searching project documents using expert systems

Information search using elements of expert systems relies mainly on pattern recognition methods. The variety of pattern recognition methods can be divided into the following groups:

- statistical methods;

- linear classifier methods;

- logical methods;

- structural and syntactic methods.

A classification method based on constructing a linear separating surface between two classes of data is called a linear classifier. Such methods include the Bayesian classifier, the perceptron, the support vector machine, and a number of others [3]. In the case of two classes, the separating surface is a hyperplane that divides the feature space into two half-spaces. New data can then be assigned to (recognized as) one of the two linearly separated classes on the basis of specific numerical attributes. A larger number of classes (reference patterns) is handled by giving the separating surface piecewise-linear properties.
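
A separating hyperplane can be illustrated with the classic perceptron learning rule. The sketch below assumes two linearly separable classes labelled +1 and -1; the sample points and the function names are hypothetical.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training sample is on the correct side
            break
    return w, b


def classify(w, b, x):
    """Assign x to class +1 or -1 depending on its side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical two-dimensional feature vectors of two document classes.
samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
labels = [1, 1, -1, -1]

w, b = train_perceptron(samples, labels)
print(w, b, classify(w, b, (1.0, 1.0)))
```

The learned pair (w, b) defines the hyperplane w.x + b = 0; recognizing a new object reduces to checking on which side of it the object's feature vector lies.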

Statistical methods are fundamental to pattern recognition theory. The statistical formulation of the problem draws on the powerful apparatus of mathematical statistics and probability theory. It is based on constructing the probability distribution for each class and applying the Bayes classification rule [4].
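
The Bayes classification rule can be sketched as follows: a distribution is estimated for each class, and a new object is assigned to the class with the largest posterior probability. The sketch assumes independent Gaussian features (the "naive" Bayes simplification, which the text above does not prescribe), and the training data and class names are hypothetical.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        prior = len(rows) / len(samples)
        params = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            params.append((mean, var + 1e-9))  # small term avoids zero variance
        model[y] = (prior, params)
    return model


def log_posterior(model, y, x):
    """Unnormalized log posterior: log prior plus the sum of log likelihoods."""
    prior, params = model[y]
    total = math.log(prior)
    for value, (mean, var) in zip(x, params):
        total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
    return total


def predict(model, x):
    """Bayes rule: choose the class with the largest posterior."""
    return max(model, key=lambda y: log_posterior(model, y, x))


# Hypothetical numeric features of two document classes.
samples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.7), (2.8, 0.3)]
labels = ["drawing", "drawing", "drawing", "report", "report", "report"]

model = fit(samples, labels)
print(predict(model, (1.1, 2.0)), predict(model, (3.0, 0.4)))
```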

Logical methods of pattern recognition use the algebra of logic and operate on information contained not only in individual attributes but also in combinations of attribute values. In these methods, the values of an attribute are treated as elementary events. Logical methods are appropriate when there is no a priori information about the distribution of the search objects in space, in time, or in the feature space, but there is evidence of deterministic logical connections between the objects and their attributes [5].
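
A logical recognizer of this kind can be sketched with rules written as disjunctions of conjunctions over attribute values. The attributes, class names, and rules below are hypothetical examples, not rules taken from a real archive.

```python
# Hypothetical rule base: each class is described in disjunctive normal form,
# i.e. a list of conjunctions, each mapping attribute -> required value.
RULES = {
    "assembly_drawing": [
        {"format": "vector", "has_bom": True},
    ],
    "scanned_sheet": [
        {"format": "raster", "has_text_layer": False},
        {"format": "raster", "source": "scanner"},
    ],
}


def matches(conjunction, obj):
    """True if every attribute in the conjunction has the required value."""
    return all(obj.get(attr) == value for attr, value in conjunction.items())


def recognize(obj):
    """Return the sorted names of all classes whose rule holds for the object."""
    return sorted(
        cls for cls, dnf in RULES.items() if any(matches(c, obj) for c in dnf)
    )


print(recognize({"format": "vector", "has_bom": True}))
print(recognize({"format": "raster", "source": "scanner", "has_text_layer": True}))
```

The decision depends on combinations of attribute values rather than on any single attribute, and no statistical information about the objects is required.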

Structural and syntactic methods, also called linguistic methods, are applied to objects whose features take the form of primitive (non-derivative) elements and the relationships between them. The construction rules are defined by a grammar obtained either from a priori information about the patterns or by grammatical inference on a representative sample. The grammar specifies the rules for composing primitive elements. Thus, the syntactic approach to pattern recognition makes it possible to describe a large set of complex objects using a small set of primitive elements and grammatical rules [6].
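
The syntactic approach can be illustrated with a toy grammar. The sketch assumes that a contour has already been encoded as a string of primitives 'u' (up), 'r' (right), and 'd' (down), and recognizes a symmetric "pulse" shape with the hypothetical grammar S -> 'u' S 'd' | 'u' 'r' 'd'.

```python
def parse_pulse(s, i=0):
    """Try to derive S -> 'u' S 'd' | 'u' 'r' 'd' starting at position i.

    Returns the index just past the derived substring, or None on failure.
    """
    if i >= len(s) or s[i] != "u":
        return None
    # Alternative 1: 'u' 'r' 'd'
    if s[i + 1:i + 3] == "rd":
        return i + 3
    # Alternative 2: 'u' S 'd'
    j = parse_pulse(s, i + 1)
    if j is not None and j < len(s) and s[j] == "d":
        return j + 1
    return None


def is_pulse(s):
    """True if the whole primitive string belongs to the pulse language."""
    return parse_pulse(s) == len(s)


for contour in ["urd", "uurdd", "uurd", "urdd"]:
    print(contour, is_pulse(contour))
```

Two grammar rules over three primitives describe an unbounded family of pulse shapes of any height, which is the economy of description that the syntactic approach provides.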

 

Conclusions

This article explores the idea of integrating traditional PDM systems with intelligent Data Mining and expert-system methods in order to provide effective mechanisms for accessing the content of complex, typologically and structurally heterogeneous CAD project archives. Using Data Mining, pattern recognition, and linguistic analysis methods should make it possible to work efficiently with huge volumes of stored project data and to provide an acceptable level of relevance for different types of documents.

 

Bibliography:

1 Bolshakov, P.S. The unique capabilities of STATISTICA Data Miner [electronic resource]. - Access mode: http://www.statistica.ru/home/applications/dataminer.htm (rus);

2 Bozhday, A.S. The concept of intersectoral monitoring by integrating OLAP, Data Mining, and GIS technologies [text] // Problems of modern science and practice. University named after Vernadsky. - 2011. - No. 1 (32). - p. 59-72 (rus);

3 Vorontsov, K.V. Machine learning: a course [electronic resource]. - Access mode: http://www.machinelearning.ru (rus);

4 Vasiliev, V.I. The problem of training pattern recognition [text]. - Kiev: High school, 1989. - 64 p. - ISBN 5-11-00119-1 (rus);

5 Gorelik, A.L. Current status of recognition [text] / A.L. Gorelik, I.B. Gurevich, V.A. Skripkin. - Moscow: Radio and Communications, 1985. - 160 p. (rus);

6 Verhagen, K. Pattern recognition. Status and Prospects [text] / K. Verhagen, P. Deyn; translated from English. - Moscow: Radio and Communications, 1985. - 104 p. (rus).