Modern information technologies/

Computer engineering and programming

Kameshova S.S., Master of Natural Sciences

Rauyl Olzhas, first-year student, specialty "Informatics"

Kostanay State University named after A. Baytursynov, Kostanay, Kazakhstan

Speech signal recognition technology

The introduction of highly complex but highly capable information and computer technologies into human activity requires a change in how automated systems are controlled, so that they become more convenient and efficient to use. This need is felt most strongly in specific areas of computing where voice commands are the most appropriate means of control. Examples include telephone access to self-service help systems and control of a remote computer or a mobile handheld device while driving.

Creating full-fledged speech interfaces that support a spoken "user-computer" dialogue is a very promising but difficult direction in the development of modern computer systems.

Two key problems of speech recognition remain unresolved despite almost half a century of work on them: achieving absolute accuracy on a limited set of commands for at least one speaker's voice, and speaker-independent recognition of continuous speech at any acceptable quality.

There are doubts about whether both tasks can be solved in principle, because even people cannot always fully recognize what their interlocutor says. Until recently speech was treated as a signal in the range from about 300 to 3500 Hz with characteristic properties (e.g., pauses between words); from the standpoint of modern technology it is first and foremost a signal.

What is speech recognition? You say a phrase, and the technical system responds to it adequately: the machine executes the command contained in the phrase, types the dictated text, or otherwise makes use of the information extracted from the phrase. Which of these happens depends on the particular implementation.

What is speech? When talking about speech, we must distinguish between such concepts as "speech", "sound speech", "sound signal", "message", and "text". In our case, as applied to the recognition problem, the concepts "speech" and "sound speech" mean the same thing: a voice message generated by a person, which can be objectively recorded, measured, stored, processed, and reproduced by means of instruments and algorithms. Here the term "message" may cover any information useful to the recipient, not just text.

Text, as is known, consists of letters, words, and sentences; it is discrete. Ordinary spoken speech, by contrast, is continuous. Human speech, unlike text, does not consist of letters. If we record the sound of each letter on tape or disk and then try to assemble speech from these sounds, nothing will come of it.

A speech recognition system consists of two parts: acoustic and linguistic. The latter is not strictly linguistic: in general, it may include phonetic, phonological, morphological, lexical, syntactic, and semantic models of the language. The acoustic model is responsible for representing the speech signal, or rather for converting it from the traditional time-domain form into a form in which the information about the content of the spoken message is present more explicitly.

The linguistic model interprets the information coming from the acoustic model and is responsible for presenting the recognition result to the consumer, which may be not only a person but also a technical system controlled by speech.

It is difficult to choose a suitable quality indicator for a speech recognition system. A quality indicator is simplest to introduce for command systems. During testing, all possible commands are pronounced in random order a sufficiently large number of times. The number of correctly recognized commands is counted and divided by the total number of spoken commands.

The result is an estimate of the probability of correct command recognition in the given experiment and the given acoustic environment. For dictation systems, a similar quality score can be computed by dictating some test text. Obviously, this is not always a convenient quality indicator. In practice, we are confronted with a variety of acoustic situations.
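The ratio described above for command systems can be written down directly. The following is a minimal sketch in Python; the function name command_accuracy, the "<unknown>" label, and the test log are illustrative assumptions, not part of any particular system.

```python
def command_accuracy(spoken, recognized):
    """Fraction of spoken commands that were recognized correctly."""
    if len(spoken) != len(recognized):
        raise ValueError("both lists must have one entry per utterance")
    if not spoken:
        raise ValueError("at least one utterance is required")
    correct = sum(1 for s, r in zip(spoken, recognized) if s == r)
    return correct / len(spoken)


# Hypothetical test log: what was said vs. what the system returned.
spoken = ["start", "stop", "left", "right", "stop", "start"]
recognized = ["start", "stop", "left", "left", "stop", "<unknown>"]

accuracy = command_accuracy(spoken, recognized)
print(f"accuracy = {accuracy:.3f}")
```

The value obtained this way is only an estimate for one speaker and one acoustic environment; repeating the test under other conditions will in general give a different number.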

And what about a change of speaker and the accompanying retraining of the system? Different systems may require different amounts of tuning, which greatly affects ease of use. The standard way out is to use a multi-criteria, so-called composite quality index.

As an example, consider a simple command speech recognition system. Its operation is based on the hypothesis that the spectral and temporal characteristics of command words vary only slightly for a single speaker.

The acoustic model of the system converts the speech signal into a spectral-temporal matrix. In the simplest case, a command is located in time by the pauses in the speech signal. The linguistic block is able to detect a limited number of commands plus one more, which stands for all other words unknown to the system.
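One possible way to build such a matrix and to locate a command by the surrounding pauses is sketched below with NumPy. This is an illustration under assumptions rather than the method of any specific system: the frame length, hop size, Hann window, relative energy threshold, and the function names spectral_time_matrix and find_segments are all chosen here for the example, and a synthetic tone stands in for a spoken command.

```python
import numpy as np


def spectral_time_matrix(signal, frame_len=256, hop=128):
    """Magnitude spectrum of each windowed frame: rows = time, columns = frequency."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def find_segments(signal, frame_len=256, hop=128, rel_threshold=0.1):
    """Return (start_frame, end_frame) pairs of consecutive high-energy frames.

    Frames whose energy is below rel_threshold * max energy are treated
    as pauses. end_frame is exclusive.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i * hop: i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    active = energy > rel_threshold * energy.max()
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # a segment begins
        elif not is_active and start is not None:
            segments.append((start, i))    # a pause ends the segment
            start = None
    if start is not None:
        segments.append((start, n_frames))
    return segments


# Synthetic test signal (assumption): silence, a 1 kHz tone, silence, at 8 kHz.
rate = 8000
t = np.arange(rate // 2) / rate
tone = np.sin(2 * np.pi * 1000 * t)
silence = np.zeros(rate // 4)
signal = np.concatenate([silence, tone, silence])

matrix = spectral_time_matrix(signal)
segments = find_segments(signal)
print(matrix.shape, segments)
```

The rows of matrix that fall inside a detected segment form the spectral-temporal description of one command. Real recordings contain background noise, so a fixed relative threshold like the one assumed here would usually need tuning.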

As a rule, the linguistic model is built as an algorithm that searches for the maximum of some functional of the input sample and the samples from the system's entire "vocabulary".

Often this functional is ordinary two-dimensional correlation, although the choice of the dimensionality of the description and of its metric varies widely from developer to developer. The linguistic blocks of modern systems implement a complex model of natural language.

Sometimes this model is based on the mathematical apparatus of hidden Markov models, and sometimes it uses the latest neural network technology.
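For the simple command system described earlier, the search for the best match by two-dimensional correlation, together with the extra "unknown word" outcome, can be sketched as follows. The sketch assumes that every spectral-temporal matrix has already been brought to the same shape; the function names, the 0.7 rejection threshold, and the random matrices standing in for stored templates are illustrative assumptions.

```python
import numpy as np


def correlation(a, b):
    """Normalized two-dimensional correlation coefficient of two equal-shape matrices."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def recognize(sample, vocabulary, threshold=0.7):
    """Return the best-matching command, or "<unknown>" if no score reaches the threshold."""
    scores = {name: correlation(sample, ref) for name, ref in vocabulary.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "<unknown>"


# Hypothetical vocabulary: random matrices stand in for stored command templates.
rng = np.random.default_rng(0)
vocabulary = {name: rng.random((20, 16)) for name in ("start", "stop", "left")}

# A slightly noisy repetition of "stop" and a matrix unrelated to any template.
noisy_stop = vocabulary["stop"] + 0.05 * rng.standard_normal((20, 16))
unrelated = rng.random((20, 16))

result_known = recognize(noisy_stop, vocabulary)
result_unknown = recognize(unrelated, vocabulary)
print(result_known, result_unknown)
```

The threshold implements the "plus one" class: an input that does not correlate strongly with any stored sample is reported as unknown instead of being forced onto the nearest command.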

REFERENCES

1. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, vol. 77, no. 2, February 1989.

2. Rabiner, L.R., Juang, B.H. Fundamentals of speech recognition, 1993.