Philology/3. Theoretical and methodological problems in language studies

Oksana Kunikevych

Lviv Polytechnic National University, Ukraine

Methods of homonymy disambiguation in the Ukrainian language

It is hard to overestimate the significance of human-computer interaction in modern society. Over the last century, linguistics, and especially natural language processing (NLP), has become one of the leading scientific fields. NLP has significantly influenced the development of modern science and contributed to it with both theoretical results and practical applications. As computers and the Internet become ever more affordable all over the world, the need for robust, fast and user-friendly interfaces becomes more pronounced. Since natural language is the most effective, effortless and natural means of interaction, its potential in human-computer interaction is of great importance. NLP is becoming a way to bridge the gap between human communication and digital data.

Morphological analysis is one of the crucial steps in natural language processing. It involves identifying and analyzing the structure of a given language's morphemes. One of the most pressing problems at the stage of morphological analysis and POS tagging is ambiguity, which is among the most crucial problems in natural language processing as a whole.

Linguistic theories have identified two main types of ambiguity:

1. Syntactic Ambiguity: This type of ambiguity is also known as structural ambiguity. Syntactic ambiguity arises when the role a word plays in a sentence is unclear.

2. Lexical Ambiguity: This type of ambiguity is also known as semantic ambiguity. Lexical ambiguity arises when a word has more than one generally accepted meaning.

Disambiguation is the process of resolving the conflicts that arise when a single term is ambiguous. Existing approaches to disambiguation are traditionally divided into deterministic ones (developed since the 1960s), which are based on local and global syntactic parsing and on dictionaries, and probabilistic ones [4, p. 48], which use statistics on the co-occurrence of grammatical features of words in a large text corpus from which homonymy has been removed beforehand.
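To make the probabilistic approach more concrete, here is a minimal sketch that assumes the "co-occurrence of grammatical features" can be approximated by counts of adjacent tag pairs (tag bigrams) taken from a hand-disambiguated corpus. The corpus, the tag notation (`PREP`, `NOUN.gen.pl`) and the function names `count_tag_bigrams` and `choose_tags` are hypothetical illustrations, not the method of any of the cited works; real taggers add smoothing and search over the whole sentence instead of choosing greedily.

```python
from collections import Counter

# Tiny hypothetical training corpus in which homonymy has already been removed by hand:
# each sentence is a list of (word form, tag) pairs.
TRAINING_CORPUS = [
    [("серед", "PREP"), ("людей", "NOUN.gen.pl")],
    [("до", "PREP"), ("середи", "NOUN.gen.sg")],
    [("серед", "PREP"), ("полів", "NOUN.gen.pl")],
]


def count_tag_bigrams(corpus):
    """Count how often each tag follows another; "<s>" marks the sentence start."""
    counts = Counter()
    for sentence in corpus:
        prev = "<s>"
        for _, tag in sentence:
            counts[(prev, tag)] += 1
            prev = tag
    return counts


def choose_tags(candidates, counts):
    """Greedy left-to-right choice: for each word keep the candidate tag that
    co-occurred most often with the previously chosen tag (ties: first listed)."""
    chosen, prev = [], "<s>"
    for options in candidates:
        best = max(options, key=lambda tag: counts[(prev, tag)])
        chosen.append(best)
        prev = best
    return chosen


if __name__ == "__main__":
    counts = count_tag_bigrams(TRAINING_CORPUS)
    # Candidate tags for "серед людей" as a context-free analyzer might return them.
    print(choose_tags([["NOUN.gen.pl", "PREP"], ["NOUN.gen.pl", "NOUN.acc.pl"]], counts))
```

Run on the candidates for "серед людей", the sketch prints `['PREP', 'NOUN.gen.pl']`, because in the toy corpus a sentence-initial preposition followed by a genitive noun is the most frequent pattern.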

Homonymy as a lexical phenomenon of the Ukrainian language

Two or more words that are identical in sound and spelling but differ in meaning, distribution and, in many cases, origin are called homonyms. The term is derived from the Greek homōnymos (homos "the same" and onoma "name") and thus expresses very well the sameness of name combined with the difference in meaning.

For example:

дід-1 (father's or mother's father; an old man), дід-2 (thistle), дід-3 (a sheaf that is placed in the house), дід-4 (a meal made of wheat and flour), дід-5 (name of a dance); лава-1 (bench, desk), лава-2 (row, rank), лава-3 (volcanic mass), лава-4 (a longwall face in a mine, worked by a continuous mining system), лава-5 (water surface among swamps); коза-1 (animal), коза-2 (a prop for supporting a cart while it is being greased), коза-3 (bagpipe), коза-4 (large sea duck), коза-5 (prison cell); рись-1 (lynx, a large predatory mammal), рись-2 (trot, a gait intermediate between a gallop and a walk); рейд-1 (roadstead, a coastal water area suitable for anchoring), рейд-2 (raid, a short-term active military operation or attack); рація-1 (portable radio), рація-2 (good reason, rightness); коса-1 (braided hair), коса-2 (scythe, a tool for mowing grass), коса-3 (a narrow strip of land in a sea, river, etc., connected at one end to the shore), коса-4 (spleen), коса-5 (tendon); корнет-1 (brass instrument), корнет-2 (the first officer's rank in the cavalry); лев-1 (lion, a large predatory mammal), лев-2 (monetary unit of Bulgaria), Лев-3 (personal name); марина-1 (paintings depicting sea views), марина-2 (plant marina gabled), Марина-3 (personal name); лютий-1 (ravenous, bloodthirsty, evil), лютий-2 (February, the second month of the year); ставний-1 (having a tall, slender figure and a sturdy build), ставний-2 (adjective from the noun став "pond"); наколоти-1 (to chop wood), наколоти-2 (to damage or injure with a sharp object: to prick one's finger with a needle); насаджувати-1 (to plant a forest), насаджувати-2 (to put a tip on a spear) [1, p. 42].

Homonyms differ from one another in their semantic structure and in the relations between their systems of forms. Their meanings are qualitatively distinct, unlike the interrelated main and derived, direct and figurative meanings of a polysemous word. It is also essential that homonyms do not share common structural features in the way they name an object or phenomenon.

These qualitative criteria for comparing polysemy and homonymy, however, do not provide a convincing and consistent separation of the two lexical-semantic phenomena, even in specialized analysis and particularly in lexicographic practice.

The Main Types and Sorts of Morphological Ambiguity in Ukrainian

Manual disambiguation has revealed that ca. 47% of Ukrainian word forms are morphologically ambiguous. The morphological ambiguity of other inflected languages is comparable; for example, the morphological ambiguity of Czech is ca. 46% [5, p.173]. There are two types of morphologically ambiguous words or word forms: lemma ambiguity and word form ambiguity. An example of lemma ambiguity is серед, which is either a noun (Gen. pl. of середа "Wednesday") or a preposition (серед "among"). An example of word form ambiguity is мами (mother's or mothers), which may be sg. Gen., pl. Nom. or pl. Voc.
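The ambiguity rate and the lemma/word-form distinction can be expressed directly over an analyzer's output. The sketch below assumes a hypothetical analyzer result (`ANALYSES`) that maps each word form to its (lemma, tag) readings: a form is ambiguous if it has more than one reading, it shows lemma ambiguity if the readings belong to different lemmas, and word form ambiguity if they share one lemma. The tag notation and function names are illustrative only; this sketch counts running tokens, and counting distinct forms would apply the same test to a set of forms.

```python
# Hypothetical analyzer output: word form -> list of (lemma, tag) readings.
ANALYSES = {
    "серед": [("середа", "NOUN.gen.pl"), ("серед", "PREP")],
    "мами": [("мама", "NOUN.gen.sg"), ("мама", "NOUN.nom.pl"), ("мама", "NOUN.voc.pl")],
    "книжка": [("книжка", "NOUN.nom.sg")],
}


def ambiguity_kind(readings):
    """Return "lemma", "word form", or None for an unambiguous form."""
    if len(readings) < 2:
        return None
    lemmas = {lemma for lemma, _ in readings}
    return "lemma" if len(lemmas) > 1 else "word form"


def ambiguity_rate(tokens, analyses):
    """Share of running tokens that receive more than one reading."""
    if not tokens:
        return 0.0
    ambiguous = sum(1 for token in tokens if len(analyses.get(token, [])) > 1)
    return ambiguous / len(tokens)


if __name__ == "__main__":
    for form, readings in ANALYSES.items():
        print(form, ambiguity_kind(readings))
    print(ambiguity_rate(["серед", "мами", "книжка", "книжка"], ANALYSES))
```

For the toy token list, two of four tokens are ambiguous, so the rate is 0.5; on a real corpus the same ratio is the kind of figure behind the 47% estimate.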

Three main types of morphological ambiguity (MA) can be distinguished:

1) ambiguity of inflected POS

2) ambiguity of inflected and uninflected POS

3) ambiguity of uninflected POS
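The three types depend only on whether the competing readings belong to inflected POS, uninflected POS, or both. A minimal sketch, assuming a simplified, hypothetical split of POS labels into inflected and uninflected classes (the labels and `ma_type` are illustrative, not a standard tagset):

```python
# Simplified, assumed split of POS labels; anything not listed counts as uninflected.
INFLECTED_POS = {"NOUN", "ADJ", "VERB", "PRON", "NUM", "PARTICIPLE"}

MA_TYPES = {
    1: "ambiguity of inflected POS",
    2: "ambiguity of inflected and uninflected POS",
    3: "ambiguity of uninflected POS",
}


def ma_type(readings_pos):
    """Classify an ambiguous form by the POS labels of its readings."""
    pos = set(readings_pos)
    has_inflected = bool(pos & INFLECTED_POS)
    has_uninflected = bool(pos - INFLECTED_POS)
    if has_inflected and has_uninflected:
        return 2
    return 1 if has_inflected else 3


if __name__ == "__main__":
    print(MA_TYPES[ma_type(["NOUN", "NOUN", "NOUN"])])  # мами: three noun readings
    print(MA_TYPES[ma_type(["NOUN", "PREP"])])          # серед: noun or preposition
    print(MA_TYPES[ma_type(["ADV", "PARTICLE"])])       # так: adverb or particle
```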

These types of MA were further classified into 43 sorts, distributed as follows: 34 sorts of morphologically ambiguous word forms of inflected POS, 8 sorts of ambiguous word forms of inflected and uninflected POS, and 1 sort of morphologically ambiguous forms of uninflected POS. A large part of the uninflected POS consists of multiword units that are used as separate lexical units.

The most frequent sorts of MA are:

1) syncretism of singular and plural in third-person verb forms;

2) syncretism of uninflected POS (most often among the functional words: a conjunction, an adverb and a particle);

3) case syncretism of nouns, adjectives, participles, pronouns and some numerals.
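Ranking sorts by frequency, as in the list above, amounts to counting sort labels over annotated ambiguous tokens. The sketch below assumes a hypothetical annotation format of (word form, sort label) pairs; the labels and the toy data are illustrative and are not the paper's results.

```python
from collections import Counter


def rank_sorts(annotated_tokens):
    """annotated_tokens: (word form, MA sort label) pairs for ambiguous tokens.
    Returns (label, count, share) tuples, most frequent first; labels with
    equal counts keep the order in which they were first seen."""
    counts = Counter(label for _, label in annotated_tokens)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


if __name__ == "__main__":
    toy = [
        ("мами", "case syncretism"),
        ("так", "uninflected POS syncretism"),
        ("книги", "case syncretism"),
        ("серед", "noun/preposition"),
    ]
    for row in rank_sorts(toy):
        print(row)
```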

The importance of research on ambiguity in the Ukrainian language can hardly be overstated. Its theoretical value lies in the definitions of ambiguous units it presents, the types of Ukrainian ambiguity it identifies, and the approaches and rules it proposes for homonymy disambiguation.

In addition to its theoretical importance, this work is of significant practical value: it can considerably improve the quality of morphological analysis for the Ukrainian language.

Sources of Morphological Ambiguity

The problem of part-of-speech homonymy disambiguation can be stated as finding the correct morphological parse of each word in a text, given all possible parses of these words. Morphological parsing of a word may yield multiple parses because of ambiguity in the roots and morphemes and because of the complex morphophonemic interactions between them, which are ordered according to the morphotactics. Even to decide the part-of-speech tag of a word, we may need to disambiguate its parses if they assign different part-of-speech tags to the final derived word form.
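Stated this way, disambiguation is a search over all combinations of one parse per word, and the number of combinations is the product of the per-word candidate counts. The brute-force sketch below makes that explicit; `score_pair` and `toy_score` are hypothetical stand-ins for whatever rules or statistics a real system would use, and practical taggers avoid exhaustive enumeration (for example with dynamic programming such as the Viterbi algorithm) precisely because this product grows so quickly.

```python
from itertools import product
from math import prod


def search_space_size(candidates):
    """Number of distinct parse sequences: the product of per-word candidate counts."""
    return prod(len(options) for options in candidates)


def best_parse_sequence(candidates, score_pair):
    """Brute force: score every combination of one parse per word and keep the best.

    score_pair(left, right) scores two adjacent parses; the score of a whole
    sequence is the sum over all adjacent pairs.
    """
    best, best_score = None, float("-inf")
    for sequence in product(*candidates):
        score = sum(score_pair(a, b) for a, b in zip(sequence, sequence[1:]))
        if score > best_score:
            best, best_score = sequence, score
    return list(best)


def toy_score(left, right):
    """Hypothetical scorer: reward a preposition followed by a genitive noun."""
    return 1 if left == "PREP" and ".gen." in right else 0


if __name__ == "__main__":
    # Candidate parses for "серед людей" (hypothetical tag notation).
    candidates = [["NOUN.gen.pl", "PREP"], ["NOUN.gen.pl", "NOUN.acc.pl"]]
    print(search_space_size(candidates))
    print(best_parse_sequence(candidates, toy_score))
```

Here only 2 × 2 = 4 sequences exist, but a sentence of ten words with three readings each already yields 3^10 = 59,049 combinations.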

Agglutinative and inflective languages such as Turkish, Ukrainian, Czech, Finnish and Hungarian pose difficulties for language processing because of their more complex morphology and relatively free word order compared with languages like English. Their morphemes carry syntactic and semantic information, called morphosyntactic and morphosemantic features, respectively. For these morphologically productive languages, the part-of-speech homonymy problem can therefore also be treated as morphosyntactic tagging, by analogy with part-of-speech tagging in other languages.
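In morphosyntactic tagging a tag is a bundle of feature values rather than a bare POS label, so the tagset grows multiplicatively with the number of features. The sketch below assumes simplified feature inventories (the seven Ukrainian cases, two numbers, and gender distinguished only in the singular for adjectives); `build_tags` and the dotted tag notation are illustrative, not a published Ukrainian tagset.

```python
from itertools import product

# Simplified feature inventories (an assumption for illustration, not a full tagset).
CASES = ["nom", "gen", "dat", "acc", "ins", "loc", "voc"]
NUMBERS = ["sg", "pl"]
# Adjectives distinguish gender only in the singular.
ADJ_AGREEMENT = ["m.sg", "f.sg", "n.sg", "pl"]


def build_tags(pos, *feature_lists):
    """Combine a POS label with every combination of the given feature values."""
    return [".".join((pos,) + combo) for combo in product(*feature_lists)]


noun_tags = build_tags("NOUN", CASES, NUMBERS)
adj_tags = build_tags("ADJ", CASES, ADJ_AGREEMENT)

if __name__ == "__main__":
    print(len(noun_tags), noun_tags[:3])
    print(len(adj_tags), adj_tags[:2])
```

Even these two simplified classes give 7 × 2 = 14 noun tags and 7 × 4 = 28 adjective tags, which is why a single word form can match several morphosyntactic tags at once.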

There are two main causes of morphological ambiguity:

1) Natural ambiguity. The lemmatizer returns all possible morphological analyses of each analyzed word. Context is not taken into account, which is why such a large number of morphologically ambiguous forms is produced. There are three types of ambiguous forms: homoforms, homonyms and homographs. For example, the homonym лава receives three analyses:

·         лава (Noun) - bench, desk

·         лава (Noun) - row, rank

·         лава (Noun) - volcanic mass

2) Transposition. The most frequent cases are transpositions between independent parts of speech and functional parts of speech. However, if we adhere to the traditional view that independent parts of speech "relate to concepts", while functional parts of speech are "free root morphemes, mainly used to express syntactic relations in a sentence" [34, p. 723], the question arises of what happens to linguistic units that have a certain lexical meaning when they pass from one part of speech to another. Consider the following example:

·         Серед (Noun, pl.) - Wednesdays

·         Серед (Preposition) - among, amid

Therefore, homonymy is a phenomenon that cannot be analyzed without taking into account the lexical meaning of words.
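Both sources of ambiguity can be reproduced with a deliberately tiny, hypothetical lexicon: a context-free lookup (`analyze`) returns every entry, exactly as a lemmatizer that ignores context would, and two hand-written removal rules in the spirit of the deterministic approach then resolve the noun/preposition transposition of серед from its neighbours. The lexicon, tag notation and rules are illustrative assumptions, not the author's rule set.

```python
# Hypothetical mini-lexicon: (word form, lemma, tag, gloss). Homonyms are separate entries.
LEXICON = [
    ("лава", "лава-1", "NOUN.nom.sg", "bench, desk"),
    ("лава", "лава-2", "NOUN.nom.sg", "row, rank"),
    ("лава", "лава-3", "NOUN.nom.sg", "volcanic mass"),
    ("серед", "середа", "NOUN.gen.pl", "Wednesdays"),
    ("серед", "серед", "PREP", "among"),
    ("людей", "люди", "NOUN.gen.pl", "people"),
    ("людей", "люди", "NOUN.acc.pl", "people"),
    ("після", "після", "PREP", "after"),
]


def analyze(word):
    """Context-free lookup: every entry for the form, whatever the sentence."""
    return [entry for entry in LEXICON if entry[0] == word.lower()]


def candidate_tags(words):
    """Pair each word with the set of tags the lookup assigns to it."""
    return [(w, {tag for _, _, tag, _ in analyze(w)}) for w in words]


def disambiguate(words):
    """Apply two illustrative removal rules to the noun/preposition homonym."""
    sentence = candidate_tags(words)
    for i, (_, tags) in enumerate(sentence):
        if not {"PREP", "NOUN.gen.pl"} <= tags:
            continue
        prev_tags = sentence[i - 1][1] if i > 0 else set()
        next_tags = sentence[i + 1][1] if i + 1 < len(sentence) else set()
        if "PREP" in prev_tags:
            tags.discard("PREP")  # rule 1: right after a preposition, prefer the noun
        elif any(".gen." in t for t in next_tags):
            tags.discard("NOUN.gen.pl")  # rule 2: before a genitive, prefer the preposition
    return sentence


if __name__ == "__main__":
    print(len(analyze("лава")))
    print(disambiguate(["серед", "людей"]))
    print(disambiguate(["після", "серед"]))
```

The three лава entries share a single tag, so tag-level rules like these leave that lexical homonymy untouched; separating them requires the lexical meaning of the word, as the paragraph above argues.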

References:

1. Груба Т. Тематичний словник омонімів / Т. Груба // Українська мова і література в школі. – 2004. – No. 1. – 56 p.

2. Етимологічний словник української мови : у 7 т. – К. : Наук. думка, 1985. – Т. 2 : Д–Копці. – 570 p.

3. Русанівський В. М. Структура лексичної і граматичної семантики / В. М. Русанівський. – К., 1988. – 243 p.

4. Brill E. A Simple Rule-Based Part-of-Speech Tagger / E. Brill // Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy. – 1992. – P. 143–148.

5. Hajič J. Probabilistic and rule-based tagger of an inflective language: a comparison / J. Hajič, B. Hladká // Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLC '97). – 1997. – P. 111–118.

6. Kveton P. Rule-based morphological disambiguation: On computational complexity of the LanGR formalism / P. Kveton // Prague Bulletin of Mathematical Linguistics. – 2006. – No. 85. – P. 57–72.

7. Simov K. A Hybrid System for MorphoSyntactic Disambiguation in Bulgarian / K. Simov, P. Osenova // Proceedings of the RANLP 2001 Conference, Tzigov Chark. – 2001.