Methods of obtaining knowledge from natural language texts

Bisikalo Oleg, Kravchuk Irina

Vinnitsa National Technical University

The task of obtaining useful information from natural language text consists in identifying certain elements in it. The input of an information extraction system is natural language text, and the output is a set of data structures filled in so as to allow further automated or manual processing of the information.

The most widely used approaches are the following:

1. The syntactic-semantic approach to obtaining knowledge.

2. The approach to obtaining knowledge based on database queries.

3. The approach to obtaining knowledge based on ontologies.

4. The associative-statistical approach to obtaining knowledge.

Syntactic-semantic approach to obtaining knowledge

The syntactic-semantic approach to obtaining knowledge from natural language text is based on a linguistic model. According to this model, the basis of the semantic structure of an utterance is the propositional component of its content.

The main elements in the structure of a sentence are predicates, which represent relationships between objects: the required participants of a situation, called the arguments or actants of the predicate. In general, a predicate represents a situation with several obligatory participants (actants), each of which appears in its own semantic role.

The syntactic-semantic approach to obtaining knowledge involves extracting semantic phrases from the sentence structure. To this end, a text parser is used that relies on knowledge of general grammar rules and on a dictionary of government (control) models, which describe how each predicate is expressed in the language.

Applying syntactic synthesis, post-syntactic transformations of relations, and a thesaurus makes it possible to reduce different syntactic-semantic structures to a unified form and to identify similar elements of content despite differences in how they are expressed.
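The reduction to a unified form can be illustrated with a minimal sketch. It assumes that a parser has already produced a predicate, a syntactic subject, and an object for each clause; the `THESAURUS` table, the `Proposition` class, and the `normalize` function are hypothetical names introduced only for this illustration and are not part of any system described here.

```python
from dataclasses import dataclass

# Hypothetical toy thesaurus: maps synonymous predicates to one canonical lemma.
THESAURUS = {"buy": "buy", "purchase": "buy", "acquire": "buy"}


@dataclass(frozen=True)
class Proposition:
    predicate: str  # canonical predicate lemma
    agent: str      # semantic role: who performs the action
    patient: str    # semantic role: what the action affects


def normalize(predicate: str, subject: str, obj: str, passive: bool = False) -> Proposition:
    """Reduce one parsed clause to a unified predicate-argument form."""
    lemma = THESAURUS.get(predicate, predicate)
    # In a passive clause the syntactic subject fills the patient role.
    agent, patient = (obj, subject) if passive else (subject, obj)
    return Proposition(lemma, agent, patient)


# "The company buys the startup."
active = normalize("buy", "company", "startup")
# "The startup is acquired by the company."
passive = normalize("acquire", "startup", "company", passive=True)

print(active == passive)
```

Both clauses map to the same proposition, so a comparison of documents can treat them as the same element of content even though their surface expressions differ.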

The significance of the elements of a sentence is commonly believed to be characterized by their communicative rank, which is determined by their relation to the topic and their correlation with the parts of the sentence. This information, along with usage statistics and other factors, makes it possible to highlight the key elements of a text and to compare documents for retrieval and classification.

The approach to obtaining knowledge based on database queries

Databases are used to facilitate the collection and storage of information that describes the dictionary entries used in the linguistic analysis of text. A database is a collection of information about certain objects, organized in a certain way. Currently there are no specialized lexicographical databases in computational linguistics, but systems such as ACCESS, FOX-Base, PARADOX, and D-Base can be used for dictionary files.
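A dictionary stored in a general-purpose database can be queried during linguistic analysis roughly as follows. This is only a sketch: it uses Python's built-in `sqlite3` module instead of the systems named above, and the `lexicon` table, its columns, the sample rows, and the `lookup` function are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for a dictionary file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lexicon ("
    "word_form TEXT PRIMARY KEY, lemma TEXT NOT NULL, pos TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    [
        ("texts", "text", "noun"),
        ("obtained", "obtain", "verb"),
        ("languages", "language", "noun"),
    ],
)


def lookup(word_form: str):
    """Return (lemma, part of speech) for a word form, or None if unknown."""
    return conn.execute(
        "SELECT lemma, pos FROM lexicon WHERE word_form = ?",
        (word_form.lower(),),
    ).fetchone()


print(lookup("Texts"))
print(lookup("ontology"))
```

A real lexicographical database would also have to handle ambiguous word forms, which this toy schema rules out by making the word form the primary key.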

Approach to obtaining knowledge based on ontologies

Within computational linguistics, an ontology is understood as a system of abstract concepts that exist only in the mind and can be expressed in a natural language (or by means of another system of symbols). Assumptions about the accuracy or consistency of such a system are not usually made. The linguistic approach to creating and studying ontologies is based on the study of natural language (e.g., its semantics) and on the construction of ontologies from large text collections.

The main characteristic of linguistic ontologies is that they are tied to the meanings (the semantics) of linguistic expressions such as words. Linguistic ontologies cover most words in the language and also have an ontological structure, which is expressed in the relations between concepts. Linguistic ontologies can therefore be seen both as a special kind of lexical database and as a special type of ontology. It is assumed that the hierarchy of lexical meanings of a natural language is constructed while the resources for the ontology are being developed.

High-quality text processing with ontologies requires a detailed description of the problem domain with many logical connections that show the relationships between the terms of the domain. The use of ontologies makes it possible to present a natural language text in a form suitable for automatic processing. In addition, an ontology can serve as an intermediary between the user and the information system, allowing the terms in use to be formalized consistently for all users of a project. Another widespread task is ontological analysis, in which valuable information about the operation of complex systems is accumulated through ontological research. Such analysis usually begins with creating a dictionary of the terms used in the discussion and investigating the characteristics of the objects and processes considered by the system. The basic logical relationships between related terms and concepts are also documented. The result of this analysis is a glossary that contains the terms, their precise definitions, and the relationships between them. The collected information is used when existing systems are reorganized or new ones are built.
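The glossary produced by ontological analysis (terms, definitions, and relationships between them) can be pictured as a simple data structure. The sketch below is an illustration under stated assumptions: the `Term` class, the `is_a` relation name, the sample entries, and the `ancestors` function are hypothetical and do not reproduce any particular ontology format.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    definition: str
    # Relation name -> names of related terms, e.g. {"is_a": ["mammal"]}.
    relations: dict = field(default_factory=dict)


# Hypothetical three-entry glossary with a small concept hierarchy.
glossary = {
    "animal": Term("animal", "A living organism that feeds on organic matter."),
    "mammal": Term("mammal", "A warm-blooded vertebrate animal.", {"is_a": ["animal"]}),
    "dog": Term("dog", "A domesticated carnivorous mammal.", {"is_a": ["mammal"]}),
}


def ancestors(name: str, terms: dict) -> list:
    """Collect all more general concepts by following 'is_a' links upward."""
    result, seen = [], set()
    queue = list(terms[name].relations.get("is_a", []))
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue  # guard against cycles and repeated parents
        seen.add(parent)
        result.append(parent)
        if parent in terms:
            queue.extend(terms[parent].relations.get("is_a", []))
    return result


print(ancestors("dog", glossary))
```

Walking the hierarchy in this way is what lets a text processor treat a mention of a specific concept as a mention of its more general concepts as well.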

Associative-statistical approach to obtaining knowledge

For many years, designers of artificial intelligence tried to teach computers logical thinking based on manipulating formalized knowledge and rules for transforming it. This type of thinking is typical of processing in the left hemisphere of the brain. The simplest examples of left-hemispheric knowledge models are hierarchies of categories, which are used in retrieval systems to classify information. However, because computers are incapable of verbal thinking, their capabilities are limited. The problem is that self-learning without human intervention is impossible. At the same time, the brain contains other, more ancient mechanisms that solve most of the problems of everyday life without deliberate thought. Such mechanisms, inherent in the right hemisphere, are called associative-statistical processing of image data.

The physiological basis of this approach is that the right hemisphere performs statistical analysis, during which repeated pieces of information are identified that form the foundation for future knowledge. Gradually, concepts acquire meaning: certain associations appear for familiar words. In this way an associative semantic network is formed and put to work. Such a network is a set of connections between concepts, in which every element receives its content through its relationships with the others. The emergence of connections is also a matter of statistics: the frequency analysis performed by the right hemisphere remembers and evaluates, in particular, the combinations in which concepts occur in the text.
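The idea of connections that emerge from frequency analysis can be sketched with a plain co-occurrence count. This is a deliberately simplified stand-in, not the model used in the cited work or in any product: it assumes that two concepts are associated when they occur in the same sentence, and the stop-word list, sample text, and function names are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical minimal stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of"}


def build_network(text: str) -> Counter:
    """Count how often each pair of words occurs in the same sentence."""
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)) - STOP_WORDS)
        for pair in combinations(words, 2):
            edges[pair] += 1  # pair is alphabetically ordered
    return edges


def associations(word: str, edges: Counter, top: int = 3) -> list:
    """Return the words most strongly connected to the given word."""
    scores = Counter()
    for (first, second), weight in edges.items():
        if first == word:
            scores[second] += weight
        elif second == word:
            scores[first] += weight
    return scores.most_common(top)


text = (
    "The bank approved the loan. "
    "The bank refused the loan. "
    "The river flooded the bank."
)
edges = build_network(text)
print(associations("bank", edges, top=1))
```

In this sample, "loan" co-occurs with "bank" in two sentences and every other word only once, so it becomes the strongest association. The content of a node is thus given entirely by the weights of its links to other nodes.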

This approach is used in the software products RCO and UCO for Oracle and RCO and UCO Semantic Network (http://www.rco.ru, http://uco.ua), which provide search on Oracle and Microsoft platforms.

References:

1. Kravchuk I.A. Using morphological analysis to improve the presentation of educational material for distance learning courses / I.A. Kravchuk // Proceedings of the Electronic resources and technologies: creation, use, access Conference, May 2011. – Kiev, 2011. – Pp. 105-108.

2. Kalinichenko A.V. Essence problems in text analysis of text in search engines. Approaches and solutions / A.V. Kalinichenko // Journal of scientific publications of postgraduate students. – 2010. – V. 5.

3. Karpova G.D. Computer syntactic analysis: description of models and direction of developments / G.D. Karpova, Y.K. Pirogova, T.Yu. Kobzareva, E.V. Mikaelyan // Results of science and techniques: Computational science, V. 6. – Moscow: VINITI, 1991.

4. Ermakov A.E. Associative model of text meaning in applied problems of computer analysis of documents / A.E. Ermakov, V.V. Pleshko // Proceedings of the Russian language: historical fate and modernity Conference, 2001. – Moscow, 2001. – Pp. 403-405.

5. Russell S. Artificial intelligence: a modern approach / S. Russell, P. Norvig. – Moscow: Williams, 2006. – 1408 p.

6. Reformatskyy A.A. Introduction to linguistics / A.A. Reformatskyy. – Moscow: Aspect Press, 1996. – 536 p.

7. Gorodetskyy B.Yu. Computer linguistics: modeling language communication / B.Yu. Gorodetskyy // Computer linguistics. – Moscow, 1989. – V. 24. – Pp. 5-31.

8. Ermakov A.E. Associative semantic network: statistical model of perception and creation of texts [electronic resource] / A.E. Ermakov, V.V. Pleshko. – Mode of access: http://www.dialog-21.ru/Archive/2001/volume2/2_20.htm.