Investigation of speech processing algorithms

 

Salykova Olga - Head of the Department of Software, Candidate of Technical Sciences, Associate Professor.

Kim O.I. - Undergraduate, Kostanay State University named after A. Baitursynov, Kostanay.

 

Summary

Growing computing capabilities stimulate the development of speech recognition systems. These systems can significantly expand the range of services available to a user over the telephone: browsing the internet by voice, controlling the phone, and working with applications. Developing such services requires making the speech recognition subsystem of computer telephony systems more intelligent. One area of application is the use of automatic speech recognition in modern telephones. Such systems let the user communicate with a mobile device by ordinary speech, without the involvement of operators.

A modern speech recognition system for computer telephony is a complex structure that combines hardware and software components.

 

A system of automatic speech recognition (SASR) is one element of the speech processing chain; its purpose is to provide a comfortable dialogue between the user and the machine. In the broadest sense, the term refers to systems that perform phonemic decoding of the acoustic speech signal for messages spoken in a free style by any speaker, with no restriction to a problem domain and no limit on vocabulary size. In a narrow sense, SASR solve particular problems by imposing certain restrictions on the requirements of speech recognition in this classical sense. Thus the range of SASR extends from simple stand-alone devices and toys that can recognize or synthesize separately spoken words, numbers, city names, etc., to sophisticated systems that recognize and synthesize natural-sounding speech for use, for example, as an administrative assistant (IBM VoiceType Simply Speaking Gold).

As a major component of any user-friendly interface between machine and human, SASR can be embedded in various applications, such as voice control systems, voice access to information resources, computer-based learning, assistance for people with disabilities, and access control through voice verification/identification.

SASR is also very useful as a tool for searching and sorting recorded audio data. Speech recognition is also used for entering information, especially when a person's eyes or hands are occupied. SASR allows people working in demanding conditions (doctors in hospitals, workers in manufacturing, drivers) to use a computer to receive or enter the necessary information.

SASR is typically used in telephone applications, embedded systems (voice dialing, operating a pocket computer, driving), and multimedia applications (language-learning systems).

Voice keys

Voice keys are sometimes called systems of automatic speaker recognition by speech. These are usually biometric systems that authorize access to information or physical access to objects. Two kinds of such systems must be distinguished: verification systems and identification systems. In verification, the user first presents a code, that is, claims an identity in one way or another, and then says a password or some arbitrary phrase aloud. The system checks whether the voice matches the reference templates retrieved from computer memory for the presented code.

In identification, the user makes no prior claim of identity. Instead, the voice is compared with all stored templates, and the system determines to whom it belongs. Today there are many approaches and methods for implementing such systems, and they usually differ from one another: there are as many varieties as there are developers. The same can be said of speech recognition systems. Therefore, the characteristics of particular speech recognition and speaker identification systems can be judged only with the help of a special test database.
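The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each voice is assumed to be already reduced to a fixed-length feature vector, the comparison uses cosine similarity, and the names `verify`, `identify`, `references`, and the threshold value 0.9 are hypothetical and do not come from any particular system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(claimed_id, voice, references, threshold=0.9):
    """Verification: compare the voice only with the template of the claimed identity."""
    template = references.get(claimed_id)
    if template is None:
        return False
    return cosine_similarity(voice, template) >= threshold


def identify(voice, references, threshold=0.9):
    """Identification: compare the voice with every template and return the best match.

    Returns None when no template is similar enough.
    """
    best_id, best_score = None, threshold
    for user_id, template in references.items():
        score = cosine_similarity(voice, template)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


# Hypothetical reference templates and an incoming voice sample (toy vectors).
references = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
sample = [0.88, 0.12, 0.31]

print(verify("alice", sample, references))  # claim is checked against one template
print(identify(sample, references))         # no claim: all templates are searched
```

Verification performs a single comparison against the claimed template, while identification searches the whole set of templates, which is why its cost and its error behaviour depend on the number of enrolled users.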

The first developments in speech recognition date back to the 1920s, and the first system was built in 1952 at Bell Laboratories (today part of Lucent Technologies). The first commercial system came later: in 1960, IBM announced the development of such a system, but the program was never released.

Then, in the 1970s, Eastern Airlines in the US installed a speaker-dependent system for routing luggage: the operator spoke the destination, and the baggage was sent on its way. However, because of the number of errors, the system did not pass its trial period.

After this, work in the area, where it was carried out at all, proceeded rather sluggishly. Even in the 1980s there were very few real commercial applications of speech recognition systems.

Today there are not dozens but hundreds of research groups in this area, in scientific and educational institutions as well as in large corporations. This is evident at international forums of scientists and specialists in speech technology such as ICASSP, EuroSpeech, and ICPhS. The results of this work, which has, as the saying goes, "spread over the whole world", are difficult to overestimate.

For several years now, voice navigation systems, that is, systems that recognize voice commands, have been applied successfully in various fields. For example, the OmniTouch call center supplied to the Vatican by Alcatel was used to support the events celebrating the 2000th anniversary of the birth of Christ. A pilgrim calling the call center stated a question, and the automatic speech recognition system "listened" to it. If the system determined that the question concerned a common topic, such as the schedule of events or hotel addresses, a prerecorded message was played. If the question needed clarification, a voice menu was offered, and the caller had to select one of its items by voice. If the recognition system determined that there was no prerecorded answer to the question, the pilgrim was connected to a human operator.

In Sweden, an automatic telephone inquiry service using speech recognition software from Philips was opened not long ago. In its first month of operation the Autosvar service, which was launched without a formal announcement, was used by 200 thousand customers. The caller dials a certain number and, after the auto attendant answers, names the section of the information directory of interest.

The new service is aimed mainly at private clients, who prefer it because of its much lower cost. Autosvar is the first system of its kind in Europe.
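The three outcomes of such a call can be sketched as a small routing function. This is a hedged illustration, not a description of the OmniTouch product: the assumption here is that the recognizer returns a list of candidate topics for the caller's question, and the names `PRERECORDED_ANSWERS` and `route_call` are hypothetical.

```python
# Hypothetical table of topics that have a prerecorded answer.
PRERECORDED_ANSWERS = {
    "schedule": "Recording: schedule of events.",
    "hotels": "Recording: hotel addresses.",
}


def route_call(candidate_topics):
    """Decide what to do with a call, given the topics the recognizer proposed.

    Assumed rules (an illustration only):
      - exactly one known topic   -> play its prerecorded answer;
      - several known topics      -> offer a voice menu to clarify;
      - no known topic            -> connect to a human operator.
    """
    known = [topic for topic in candidate_topics if topic in PRERECORDED_ANSWERS]
    if len(known) == 1:
        return ("play_recording", PRERECORDED_ANSWERS[known[0]])
    if len(known) > 1:
        return ("voice_menu", known)
    return ("human_operator", None)


print(route_call(["schedule"]))
print(route_call(["hotels", "schedule"]))
print(route_call(["lost passport"]))
```

Keeping the human operator as the fallback branch means that a recognition failure degrades the service to an ordinary call center rather than leaving the caller without an answer.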

The main problem in developing SASR is the variability in the pronunciation of the same words by different people and by the same person in different situations. This does not confuse a person, but it can confuse a computer. In addition, the incoming signal is affected by many factors, such as ambient noise, reflections, echo, and interference in the channel. Matters are complicated by the fact that the noise and distortion are not known in advance, so the system cannot be tuned to them before it starts working.

Nevertheless, more than half a century of work on various SASR has borne fruit. Almost any modern system can operate in several modes. First, it can be speaker-dependent or speaker-independent. A speaker-dependent system requires special training for a particular user in order to recognize accurately what that user says. To train the system, the user says several specific words or phrases, which the system analyzes, storing the results. This mode is usually used in dictation systems, where a single user works with the system.
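The training step of a speaker-dependent system can be illustrated with a toy template matcher. The sketch below is an assumption, not the method of any system named in this article: it stores the user's training utterances as templates and compares new input with them by dynamic time warping (DTW), a classical technique for aligning sequences spoken at different speeds. Utterances are represented by one-dimensional feature sequences for brevity, and the class and method names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class SpeakerDependentRecognizer:
    """Toy recognizer trained on one user's own utterances (illustration only)."""

    def __init__(self):
        self.templates = {}  # word -> list of feature sequences

    def train(self, word, features):
        """Store one training utterance of `word` spoken by the user."""
        self.templates.setdefault(word, []).append(list(features))

    def recognize(self, features):
        """Return the word whose stored template is closest, or None if untrained."""
        best_word, best_distance = None, float("inf")
        for word, examples in self.templates.items():
            for example in examples:
                distance = dtw_distance(features, example)
                if distance < best_distance:
                    best_word, best_distance = word, distance
        return best_word


recognizer = SpeakerDependentRecognizer()
recognizer.train("yes", [1, 2, 3, 2, 1])
recognizer.train("no", [5, 5, 4, 4])

# The same word spoken more slowly still aligns with its template.
print(recognizer.recognize([1, 1, 2, 3, 3, 2, 1]))
```

Because the templates come from a single user, such a recognizer tends to work well for that user and poorly for others, which is the trade-off between the two modes.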

A speaker-independent system can be used by any user without a training procedure. This mode is usually applied where training is impossible, for example in telephone applications. Obviously, the recognition accuracy of such a system is lower than that of a speaker-dependent one. However, a speaker-independent system is easier to use: it works with an unlimited number of users and requires no training.

Second, systems are divided into those that work only with isolated commands and those able to recognize connected speech. Recognizing connected speech is significantly more difficult than recognizing separately spoken words. For example, in moving from isolated-word recognition to connected-speech recognition with a vocabulary of 1000 words, the error rate rose from 3.1% to 8.7%; in addition, processing connected speech takes three times longer.

Additional variation in speech also arises from random intonation, accents, poorly structured sentences, pauses, and repetitions.

At the boundary between connected-speech and isolated-word recognition, the keyword search mode emerged. In this mode, SASR finds a predetermined word or group of words in the general flow of speech. Where can it be used? For example, in listening devices that switch on and begin recording when certain words appear in speech, or in electronic directory services: after receiving a request in free form, the system picks out the meaningful words and, having recognized them, provides the necessary information.

SASR from various companies are on the market today. Consider some of them.
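The text does not say how the error percentages above were measured. One widely used metric is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. A minimal sketch, assuming this metric and a hypothetical function name `word_error_rate`:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two sentences, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") and one deletion ("desk") over four reference words.
print(word_error_rate("call the front desk", "call a front"))
```

In these terms, the reported increase from 3.1% to 8.7% is 5.6 percentage points, that is, roughly 2.8 times as many errors.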
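The directory-service use of keyword search can be sketched as follows. This is a simplification under stated assumptions: a real keyword-search system works on the acoustic signal, whereas this sketch scans a stream of already recognized words, and the function name `spot_keywords` and the sample request are hypothetical.

```python
def spot_keywords(words, keywords):
    """Return (position, keyword) for every keyword found in a stream of recognized words."""
    wanted = {keyword.lower() for keyword in keywords}
    hits = []
    for position, word in enumerate(words):
        token = word.lower().strip(".,?!")  # ignore case and trailing punctuation
        if token in wanted:
            hits.append((position, token))
    return hits


request = "Could you please tell me the departure time of the train to Astana?"
hits = spot_keywords(request.split(), ["train", "departure", "Astana"])
print(hits)  # [(6, 'departure'), (10, 'train'), (12, 'astana')]
```

The rest of the free-form request is ignored; only the spotted keywords would be passed on to the directory lookup.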

Siri is a personal assistant and question-answering system developed for iOS. The application uses natural speech processing to answer questions and give recommendations. Siri adapts to each user individually, studying the user's preferences over time.

 

Google Now is a personalized search service from Google Inc.

In late 2011, reports began to appear on the Internet that the "Google Search" app would be improved with the release of Android 4.1. The original code name of Google Now was "Majel".

On 27 June 2012, Google Now was shown during the demonstration of Android 4.1 at Google I/O. At that time the application had 10 information cards.

On 9 July of the same year, the application was made publicly available together with the new Android 4.1.

On 21 March 2013, Google Executive Chairman Eric Schmidt said that Google Now for iOS had been submitted for review in iTunes, but later denied this. Despite this, Google Now for iOS appeared in iTunes on 29 April 2013.

In December 2012, information appeared that Google Now might be integrated into Google Chrome and that the assistant would serve as a replacement for the iGoogle service (this information was later disproved). On 15 May 2013, the integration of the service into the browser was confirmed at the Google I/O conference. An alpha version of the assistant appeared in the Google Chrome Canary build of 16 January 2014, and the final version appeared in the stable version of Google Chrome of 24 March of the same year. Earlier, the browser had gained the ability to search by the voice command "OK Google"; this was removed in October 2015 because it was not popular.

In 2012, the app was named "Innovation of the Year".

The app displays information based on the user's current location, personal calendar, search history, location history, browsing history, and so on. The user can configure the cards as needed and delete unnecessary ones. For developers, this interface is the most convenient for constantly updating information.

At the moment Google Now has 36 information cards (Birthday, Concert, Currency, News, Events, Reminders, Friends' Birthdays, etc.).

 

 

 

Microsoft Cortana is a virtual voice assistant with elements of artificial intelligence from Microsoft for Windows Phone 8.1, Microsoft Band, Windows 10, Android, Xbox One, and iOS.

The personal assistant Cortana is designed to anticipate the needs of the user. If desired, the user can give it access to personal data such as email, the address book, and web search history; it uses all of this data to anticipate the user's needs. Cortana replaces the standard search engine and is invoked by pressing the "Search" button. A query can be typed manually or spoken. Cortana finds the necessary information in search results from Bing and Foursquare and among the user's personal files. The virtual assistant is also not without a sense of humor: it can hold a conversation, sing songs, and tell jokes. It reminds the user in advance of a planned meeting, a friend's birthday, and other important events, and it notifies the user if a flight is cancelled or there is heavy traffic on the road.