Bozhday A.S.*, Gnusin P.V.**
* Bozhday A.S., Doctor of Technical Sciences, Professor of CAD, Penza State University, Penza, e-mail: bozhday@yandex.ru
** Gnusin P.V., graduate student in CAD, Penza State University, Penza, e-mail: pavel.sin.gnusin@gmail.com
Methods of information retrieval in CAD project archives
The active development of science and technology is sharply increasing the
complexity of design objects and of the computer-aided design (CAD) systems
associated with them. The operation of a modern CAD system involves creating
and storing a large number of heterogeneous project documents (major CAD
systems generate about 10^6 to 10^7 project documents per year). Another
feature of modern CAD systems is their distributed architecture and
predominantly multi-user operation. Large industrial enterprises and
corporations with geographically distributed design departments work with a
single centralized repository of design data managed by a PDM (Project Data
Management) system. The typological heterogeneity and the volume of these
repositories are so great that searching them is comparable to the problem of
information retrieval on the Internet. At the same time, the search techniques
traditionally used in CAD (SQL queries, system directories, keywords, etc.)
are becoming less and less effective because they focus mainly on text formats.
Thus, the pressing task is to develop methods for retrieving information and
design documents that satisfy the following requirements:
- fast operation on the huge volumes of modern CAD project archives through
continuous indexing of their content;
- search in repositories based on different data models (multidimensional
storage, the object-oriented model, the relational model, network models, etc.);
- an acceptable level of relevance for different types of documentation (text,
raster, vector, digital mock-ups, various languages for describing objects and
tasks, mathematical and geometrical models);
- intelligent search that identifies implicit relationships in the information;
- search based on algorithms that recognize data semantically relevant to the
search query.
Structurally, the proposed search engine is divided into two basic components:
a system of crawlers and an analytical host. The system of crawlers is a
collection of autonomous software agents that scan and index the content of
the distributed project archive. The analytical host provides the user
interface for building queries and processes them analytically in order to
find the relevant index information.
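The division into crawlers and an analytical host can be illustrated with a
minimal sketch. It assumes that each crawler sees the documents of one archive
node as already extracted text and builds an inverted index that the host
merges and queries. The names Crawler, AnalyticalHost, the node names, and the
sample documents are hypothetical and are not taken from the described system;
plain text stands in here for whatever features a real crawler would extract
from graphic or model data.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Crawler:
    """Hypothetical agent that scans one archive node and indexes its documents."""

    def __init__(self, node_name, documents):
        self.node_name = node_name
        self.documents = documents  # mapping: document id -> extracted text

    def build_index(self):
        # Inverted index: token -> set of (node, document id) postings.
        index = defaultdict(set)
        for doc_id, text in self.documents.items():
            for token in tokenize(text):
                index[token].add((self.node_name, doc_id))
        return index


class AnalyticalHost:
    """Hypothetical host that merges crawler indexes and answers queries."""

    def __init__(self):
        self.index = defaultdict(set)

    def merge(self, crawler_index):
        for token, postings in crawler_index.items():
            self.index[token] |= postings

    def search(self, query):
        # Return the documents that contain every query token (AND semantics).
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.index.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.index.get(token, set())
        return result


host = AnalyticalHost()
host.merge(Crawler("penza", {"d1": "Gear housing drawing",
                             "d2": "Gear shaft model"}).build_index())
host.merge(Crawler("moscow", {"d7": "Housing assembly specification"}).build_index())
print(sorted(host.search("gear housing")))  # [('penza', 'd1')]
```

Keeping the index on the host and the scanning in the crawlers means that new
archive nodes can be added without changing the query side.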
To solve this information retrieval problem, it is proposed to integrate the
traditional apparatus of PDM systems with some principles of Internet search
engines, intelligent Data Mining methods, and elements of expert systems
(recognition of text and graphic images, linguistic analysis of documents).
Intelligent Data Mining methods
The traditional purpose of Data Mining technology is to find implicit, hidden
trends and patterns in large amounts of statistical data. In a general sense,
the term Data Mining denotes a set of different methods of knowledge
discovery. The choice of method often depends on the type of data available
and on the type of information to be retrieved. Intelligent Data Mining
techniques are also well suited to problems of semantic search in the project
archives and multidimensional repositories of major CAD systems.
Data Mining techniques can be divided into two groups: statistical and
cybernetic. The statistical group includes, in particular, time series
analysis, correlation and regression analysis, factor analysis, analysis of
variance, descriptive analysis, component analysis, and discriminant analysis.
The cybernetic group includes techniques such as neural networks, fuzzy logic,
evolutionary programming, decision trees, genetic algorithms, associative
memory, and expert knowledge processing systems.
A key advantage of Data Mining is the automatic generation of hypotheses about
the relationships between the various components of the data. For example, the
parameters of a (specially formulated) search query can be compared with
fragments of the design content in order to find explicit and implicit
correlations. The variety of Data Mining methods makes it possible to choose
the most appropriate method for each type of query and project document. In
particular, this expands the arsenal of formal methods for handling the
linguistic uncertainty of user queries or of some wording in the project
documentation.
Thus, the use of Data Mining technology gives PDM systems the prospect of
automatic intelligent search for templates (patterns) characteristic of any
fragments of heterogeneous multidimensional data. Unlike methods of online
analytical processing (OLAP), Data Mining methods automate the formulation of
hypotheses and the identification of implicit or unusual (unexpected)
patterns [1, 2].
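As one possible illustration of automatic pattern search, the sketch below
counts pairs of attributes that frequently occur together in document
descriptions, which is the first step of association-rule mining. The choice
of this particular technique, the function frequent_pairs, and the attribute
sets are assumptions made for illustration; the article does not prescribe a
specific Data Mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(records, min_support):
    """Count attribute pairs that co-occur in at least min_support records."""
    counts = Counter()
    for attributes in records:
        # Sorting makes each pair canonical, so (a, b) and (b, a) are one key.
        for pair in combinations(sorted(set(attributes)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical attribute sets extracted from four project documents.
records = [
    {"steel", "gearbox", "drawing"},
    {"steel", "gearbox", "3d-model"},
    {"aluminium", "housing", "drawing"},
    {"steel", "gearbox", "specification"},
]
patterns = frequent_pairs(records, min_support=3)
print(patterns)  # {('gearbox', 'steel'): 3}
```

No hypothesis about "steel" and "gearbox" was supplied in advance: the pair is
reported only because it recurs in the data, which is the contrast with OLAP
drawn above.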
Searching project documents using expert systems
Information search using elements of expert systems relies mainly on methods
of pattern recognition. The variety of pattern recognition methods can be
divided into the following groups:
- statistical methods;
- linear classifier methods;
- logical methods;
- structural and syntactic methods.
A classification method based on constructing a linear separating surface
between two classes of data is called a linear classifier. Such methods
include the perceptron, support vector machines, linear forms of the Bayesian
classifier, and a number of others [3]. In the two-class case the separating
surface is a hyperplane that divides the feature space into two half-spaces.
A new object is then assigned (recognized), on the basis of its numerical
attributes, to one of the two linearly separated classes. The number of
classes (reference patterns) can be increased by making the separating
surface piecewise linear.
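The two-class case can be sketched with the classical perceptron learning
rule, which searches for a hyperplane w.x + b = 0 separating the training
samples. The feature meanings and sample values below are hypothetical; the
sketch assumes that the classes are linearly separable, otherwise training
simply stops after the given number of epochs.

```python
def train_perceptron(samples, labels, epochs=20, rate=1.0):
    """Fit weights w and bias b so that sign(w.x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified sample: move the hyperplane towards it.
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
                errors += 1
        if errors == 0:
            break  # every training sample is on the correct side
    return w, b


def classify(w, b, x):
    """Assign x to class +1 or -1 depending on its side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical features of documents: (share of text, share of vector graphics).
samples = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(samples, labels)
print(classify(w, b, (0.85, 0.15)), classify(w, b, (0.15, 0.85)))  # 1 -1
```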
Statistical methods are fundamental to the theory of pattern recognition. The
statistical formulation of the problem draws on the powerful apparatus of
mathematical statistics and probability theory. It is based on constructing a
probability distribution for each class and applying the Bayes classification
rule [4].
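In its standard form, the Bayes classification rule assigns an object to the
class with the highest posterior probability. The notation is introduced here
for illustration and does not come from the article: x is the feature vector
of the object, C is the set of classes, P(c) is the prior probability of
class c, and p(x | c) is the distribution constructed for that class.

```latex
\hat{c}(x) = \arg\max_{c \in C} P(c \mid x)
           = \arg\max_{c \in C} \frac{P(c)\, p(x \mid c)}{p(x)}
           = \arg\max_{c \in C} P(c)\, p(x \mid c)
```

The denominator p(x) is the same for every class, so it can be dropped when
only the winning class is needed.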
Logical methods of pattern recognition use the algebra of logic and operate on
information contained not only in individual features but also in combinations
of feature values. In these methods, the values of a feature are treated as
elementary events. Logical methods are appropriate when there is no a priori
information about the distribution of objects in the search space, in time, or
in the feature space, but there is evidence of deterministic logical
connections between the objects and their attributes [5].
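A logical method of this kind can be sketched as a set of Boolean rules over
attribute values, where each test such as "format is vector" plays the role of
an elementary event. The class names, attributes, and rules below are
hypothetical examples, not rules taken from a real project archive.

```python
# Hypothetical deterministic rules: each class is described by a logical
# combination of elementary events (tests on attribute values).
RULES = {
    "assembly drawing": lambda d: d["format"] == "vector" and d["has_bom"],
    "scanned legacy sheet": lambda d: d["format"] == "raster" and not d["has_text_layer"],
    "specification": lambda d: d["format"] == "text" and d["has_bom"],
}


def recognize(document):
    """Return every class whose logical rule holds for the document."""
    return [name for name, rule in RULES.items() if rule(document)]


doc = {"format": "vector", "has_bom": True, "has_text_layer": True}
print(recognize(doc))  # ['assembly drawing']
```

No distribution is estimated here: recognition depends only on whether the
combination of attribute values satisfies a rule.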
Structural and syntactic methods, also called linguistic methods, are applied
to objects whose features take the form of non-derivative elements
(primitives) and the relationships between them. The rules for composing the
non-derivative elements are specified by a grammar, which is obtained either
from a priori information about the patterns or by grammatical inference on a
representative sample. Thus, the syntactic approach to pattern recognition
makes it possible to describe a large set of complex objects using a small set
of non-derivative elements and grammatical rules [6].
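The syntactic approach can be sketched with a toy grammar over two
non-derivative elements. It assumes that a contour has already been encoded as
a string of primitives, "r" for a unit stroke to the right and "u" for a unit
stroke up; the grammar and the "staircase" pattern are hypothetical and chosen
only because two rules describe staircases of any length.

```python
# Hypothetical grammar over primitives "r" (stroke right) and "u" (stroke up):
#   Staircase -> Step Staircase | Step
#   Step      -> "r" "u"


def parse_step(s, i):
    """Return the position after one Step starting at i, or None if absent."""
    return i + 2 if s[i:i + 2] == "ru" else None


def parse_staircase(s, i=0):
    """Return the position after the longest Staircase starting at i, or None."""
    j = parse_step(s, i)
    if j is None:
        return None
    rest = parse_staircase(s, j)
    return j if rest is None else rest


def is_staircase(primitives):
    """An object is recognized if the grammar derives its whole primitive string."""
    return parse_staircase(primitives) == len(primitives)


print(is_staircase("rururu"), is_staircase("ruur"))  # True False
```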
Conclusions
This article explores the idea of integrating traditional PDM systems with
intelligent Data Mining and expert-system methods in order to provide
effective mechanisms for accessing the content of complex, typologically and
structurally heterogeneous CAD project archives. Using the methods of Data
Mining, pattern recognition, and linguistic analysis is expected to make fast
work with huge volumes of project data possible and to provide an acceptable
level of relevance for different types of documents.
Bibliography:
1 Bolshakov, P.S. The unique capabilities of STATISTICA Data Miner [Electronic resource]. - Access mode: http://www.statistica.ru/home/applications/dataminer.htm (rus);
2 Bozhday, A.S. The concept of intersectoral monitoring by integrating technology OLAP, Data Mining, GIS [Text] // Problems of modern science and practice. University named after Vernadsky. - 2011. - № 1 (32). - P. 59-72 (rus);
3 Vorontsov, K.V. Machine learning: a course [Electronic resource]. - Access mode: http://www.machinelearning.ru (rus);
4 Vasiliev, V.I. The problem of training pattern recognition [Text]. - Kiev: High school, 1989. - 64 p. - ISBN 5-11-00119-1 (rus);
5 Gorelik, A.L. Current status of recognition [Text] / A.L. Gorelik, I.B. Gurevich, V.A. Skripkin. - Moscow: Radio and Communications, 1985. - 160 p. (rus);
6 Verhagen, K. Pattern recognition. Status and Prospects [Text] / K. Verhagen, P. Deyn; translated from English. - Moscow: Radio and Communications, 1985. - 104 p. (rus).