Bisikalo Oleg, Kravchuk Irina
Vinnitsa National Technical University
Methods of obtaining knowledge from natural language texts
The task of obtaining useful information from a natural language text
consists in identifying certain elements in it. The natural language text is
the input of the information extraction system, and the output is a set of
filled data structures that allow further automated or manual processing of
the information.
The most widely used approaches are the following:
4. Associative-statistical approach to obtaining knowledge.
Syntactic-semantic approach to obtaining knowledge
The syntactic-semantic approach to obtaining knowledge from natural
language text is based on a linguistic model. According to this model, the
basis of the semantic structure of an utterance is the propositional component
of its content.
The main elements in the structure of a sentence are predicates, which
represent relationships between objects: the required participants of a
situation, called the arguments or actants of the predicate. In general, a
predicate represents a situation with several bound participants (actants),
each of which appears in its own semantic role.
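The predicate-actant structure described above can be sketched as a small data structure. This is only an illustration: the class names Actant and Proposition, the role labels "agent" and "object", and the example sentence are assumptions made here, not part of the linguistic model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actant:
    """A required participant of the situation named by a predicate."""
    role: str  # semantic role label (labels here are illustrative)
    text: str  # the words that fill the role in the sentence


@dataclass
class Proposition:
    """A predicate together with the actants bound to it."""
    predicate: str
    actants: List[Actant] = field(default_factory=list)

    def filler(self, role: str) -> Optional[str]:
        """Return the text filling the given role, or None if it is absent."""
        for actant in self.actants:
            if actant.role == role:
                return actant.text
        return None


# Hypothetical example: "The company bought a factory" as one proposition.
bought = Proposition(
    predicate="buy",
    actants=[Actant("agent", "the company"), Actant("object", "a factory")],
)
print(bought.filler("agent"))
```

Keeping roles as explicit labels, rather than relying on word order, is what lets later processing compare propositions independently of how the sentence was phrased.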
The syntactic-semantic approach to obtaining knowledge involves
selecting semantic phrases from this structure. To this end, a text parser is
used that relies on knowledge of the general rules of grammar and on a
dictionary of government models, which describe the ways each predicate can be
expressed in the language.
Applying syntactic synthesis, post-syntactic transformations of relations
and a thesaurus makes it possible to reduce different syntactic-semantic
structures to a unified form and to identify similar elements of content
despite differences in how they are expressed.
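As a rough illustration of reducing different surface structures to one form, the sketch below maps an active and a passive clause with synonymous verbs to the same triple. The function unify, the toy THESAURUS entries and the assumption that the parser already supplies the subject, object and voice are all hypothetical simplifications.

```python
# Toy thesaurus: maps synonyms to one canonical predicate (entries are invented).
THESAURUS = {"purchase": "buy", "acquire": "buy", "buy": "buy"}


def unify(predicate, subject, obj, passive=False):
    """Reduce a parsed clause to a canonical (predicate, agent, object) triple.

    In a passive clause the grammatical subject is the semantic object,
    so the two slots are swapped.
    """
    agent, patient = (obj, subject) if passive else (subject, obj)
    canonical = THESAURUS.get(predicate, predicate)
    return (canonical, agent, patient)


# "The company acquired the factory."
active = unify("acquire", "company", "factory")
# "The factory was purchased by the company."
passive = unify("purchase", "factory", "company", passive=True)
print(active == passive)
```

Both clauses end up as the same triple, which is the sense in which similar content can be recognized behind different expressions.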
The significance of the elements of a sentence is commonly believed to
be characterized by their communicative rank, which is determined by their
relatedness to the topic and their correlation with the parts of the sentence.
This information, along with usage statistics and other factors, makes it
possible to highlight the key elements of a text and to compare documents for
retrieval and classification.
Databases are used to facilitate the collection and storage of information
describing the dictionary entries used in the linguistic analysis of text. A
database is a collection of information about certain objects, organized in a
certain way. Currently there are no specialized lexicographic databases in
computational linguistics, but general-purpose database systems such as
Access, FoxBase, Paradox and dBase can be used for dictionary files.
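A dictionary file of this kind can be sketched with any relational database. The example below uses SQLite through Python's standard sqlite3 module purely for convenience; it is not one of the systems named above, and the table name lexicon, its columns and the sample entries are assumptions made for illustration.

```python
import sqlite3

# In-memory database; a real dictionary would be stored in a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lexicon (
        lemma          TEXT NOT NULL,
        part_of_speech TEXT NOT NULL,
        government     TEXT,  -- roles the word requires (illustrative field)
        PRIMARY KEY (lemma, part_of_speech)
    )
    """
)
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    [
        ("buy", "verb", "agent object"),
        ("factory", "noun", None),
    ],
)


def lookup(lemma):
    """Return all (part_of_speech, government) entries stored for a lemma."""
    rows = conn.execute(
        "SELECT part_of_speech, government FROM lexicon WHERE lemma = ?",
        (lemma,),
    )
    return rows.fetchall()


print(lookup("buy"))
```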
Within computational linguistics, an ontology is understood as a system of
abstract concepts that exist only in the mind and can be expressed in a
natural language (or by means of another system of symbols). Assumptions about
the accuracy or consistency of such a system are usually not made. The
linguistic approach to creating and studying ontologies is based on the study
of natural language (e.g., its semantics) and on the construction of
ontologies from large text arrays.
The main characteristic of linguistic ontologies is that they are tied to
the meanings (the semantics) of linguistic expressions (e.g., words).
Linguistic ontologies cover most words of the language and also have an
ontological structure, expressed in the relations between concepts. Linguistic
ontologies can therefore be seen both as a special kind of lexical database
and as a special type of ontology. It is assumed that the hierarchy of lexical
meanings of the natural language is constructed during the development of
resources for ontologies.
For high-quality text processing, an ontology should contain a detailed description of the problem domain with a set of logical connections that show the relationships between the terms of the domain. The use of ontologies makes it possible to present a natural language text in a form suitable for automatic processing. In addition, ontologies can serve as an intermediary between the user and the information system, allowing the terms used to be formalized for all users of a project.
Another widely adopted task is ontological analysis, in which valuable information about the operation of complex systems is accumulated through ontological research. Such analysis usually begins with creating a dictionary of the terms used in the discussion and investigating the characteristics of the objects and processes considered by the system. In addition, the basic logical relationships between related terms and concepts are documented. The result of this analysis is a glossary that contains the terms, their precise definitions and the relationships between them. The collected information is used when reorganizing existing systems or constructing new ones.
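A linguistic ontology in the sense described above combines two layers: a hierarchy of concepts and a mapping from words to those concepts. The following minimal sketch assumes a tiny hand-made hierarchy; the dictionaries IS_A and WORD_TO_CONCEPT and all of their entries are invented for illustration.

```python
# Concept hierarchy: each concept points to its parent (None marks the root).
IS_A = {
    "entity": None,
    "organization": "entity",
    "company": "organization",
    "building": "entity",
    "factory": "building",
}

# Lexical layer: words of the language tied to concepts.
WORD_TO_CONCEPT = {
    "firm": "company",
    "company": "company",
    "plant": "factory",
    "factory": "factory",
}


def ancestors(concept):
    """Return the chain of more general concepts, nearest first."""
    chain = []
    parent = IS_A.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = IS_A.get(parent)
    return chain


def same_concept(word_a, word_b):
    """True when both words are known and tied to the same concept."""
    concept_a = WORD_TO_CONCEPT.get(word_a)
    return concept_a is not None and concept_a == WORD_TO_CONCEPT.get(word_b)


print(ancestors(WORD_TO_CONCEPT["plant"]))
print(same_concept("firm", "company"))
```

Separating the word layer from the concept layer is what allows the same structure to act both as a lexical database and as an ontology.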
For many years, designers of artificial intelligence tried to teach
computers logical thinking based on manipulating formalized knowledge and
rules for transforming it. This type of thinking is typical of processing in
the left hemisphere of the brain. The simplest examples of left-hemispheric
knowledge models are hierarchical categories, which are used in retrieval
systems for classifying information. However, because computers are incapable
of verbal thinking, their capabilities are limited. The problem is that
self-learning without human intervention is impossible. At the same time, the
brain contains other, more ancient mechanisms that solve most of the problems
of everyday life without deliberate thinking. Such mechanisms, inherent in the
right hemisphere, are called associative-statistical processing of imagery
data.
The physiological basis of this approach is that the right hemisphere
performs statistical analysis; during this process, repeated pieces of
information are extracted, and they form the foundation for future knowledge.
Gradually the concepts acquire meaning: certain associations appear for
familiar words. In this way the associative semantic network is formed and put
to work. Such a network is a set of connections between concepts, in which
every element gets its content through its relationships with the others.
Connections also emerge from the statistical, frequency analysis performed by
the right hemisphere, which remembers and evaluates which combinations of
concepts occur in the text.
This approach is used in the software products RCO and UCO for Oracle and RCO and UCO Semantic Network (http://www.rco.ru, http://uco.ua), which are used for search in Oracle and Microsoft environments.
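The frequency analysis behind an associative semantic network can be illustrated by counting how often words occur together. The sketch below is a simplified assumption, not the algorithm of the products named above: it treats each sentence as the co-occurrence window, and the function names build_network and associations as well as the sample corpus are invented here.

```python
from collections import Counter
from itertools import combinations


def build_network(sentences):
    """Count how often each pair of words occurs in the same sentence."""
    links = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            links[pair] += 1
    return links


def associations(links, word):
    """Return the words linked to `word`, strongest association first."""
    related = Counter()
    for (a, b), weight in links.items():
        if a == word:
            related[b] += weight
        elif b == word:
            related[a] += weight
    return related.most_common()


corpus = [
    "bank grants credit",
    "bank raises credit rate",
    "river bank floods",
]
network = build_network(corpus)
print(associations(network, "bank")[0])
```

In this toy corpus the strongest association of "bank" is "credit", because that pair repeats; a word gets its content only through such weighted links to other words.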
References:
1. Kravchuk I.A. Using morphological analysis to improve the presentation of educational material for distance learning courses / I.A. Kravchuk // Proceedings of the Electronic resources and technologies: creation, use, access Conference, May 2011. – Kiev, 2011. – Pp. 105-108.
2. Kalinichenko A.V. Essence problems in text analysis of text in search engines. Approaches and solutions / A.V. Kalinichenko // Journal of scientific publications of postgraduate students. – 2010. – V. 5.
3. Karpova G.D. Computer syntactic analysis: description of models and direction of developments / G.D. Karpova, Y.K. Pirogova, T.Yu. Kobzareva, E.V. Mikaelyan // Results of science and techniques: Computational science, V. 6. – Moscow: VINITI, 1991.
4. Ermakov A.E. Associative model of text meaning in applied problems of computer analysis of documents / A.E. Ermakov, V.V. Pleshko // Proceedings of the Russian language: historical fate and modernity Conference, 2001. – Moscow, 2001. – Pp. 403-405.
5. Russell S. Artificial intelligence: modern approach / S. Russell, P. Norvig. – Moscow: Williams, 2006. – 1408 p.
6. Reformatskyy A.A. Introduction to linguistics / A.A. Reformatskyy. – Moscow: Aspect Press, 1996. – 536 p.
7. Gorodetskyy B.Yu. Computer linguistics: modeling language communication / B.Yu. Gorodetskyy // Computer linguistics. – Moscow, 1989. – V. 24. – Pp. 5-31.
8. Ermakov A.E. Associative semantic network: statistical model of perception and creation of texts [electronic resource] / A.E. Ermakov, V.V. Pleshko. – Mode of access: http://www.dialog-21.ru/Archive/2001/volume2/2_20.htm.