Investigation of speech processing algorithms
Salykova Olga - Head of the
Department of Software, Candidate of Technical Sciences, Associate Professor.
Kim O.I. -
Undergraduate, Kostanay State University named after A. Baitursynov, Kostanay.
Summary
Growing computing capabilities stimulate the development of speech
recognition systems. These systems can significantly expand the range of
services available to the user over the telephone: free browsing of the
Internet by voice, control of the phone itself, and work with
applications. Developing these services requires making the speech recognition
subsystem of computer telephony systems more intelligent. One
application area of speech recognition is the use of automatic speech recognition
systems in modern telephones. Such systems let the user communicate with
the mobile device by ordinary voice, without going through
operators.
A modern speech recognition system for computer telephony is a complex
structure that combines hardware and software components.
A system of automatic speech recognition (SASR) is an
element of the speech processing chain whose purpose is to provide
a comfortable dialogue between the user and the machine. In the broadest sense,
the term covers systems that decode the acoustic speech signal into phonemes
when messages are spoken in a free style by any speaker,
with no restriction on subject area or on the size of the
dictionary. In a narrow sense, SASR solve particular problems
by imposing certain restrictions on the requirements for recognizing
natural-sounding speech in its classical sense. Thus, the range of SASR extends from
simple stand-alone devices and toys that can recognize or synthesize
separately spoken words, numbers, city names, etc., to sophisticated
systems that recognize and synthesize natural-sounding speech for use, for
example, as an administrative assistant (IBM VoiceType
Simply Speaking Gold).
As a major component of any user-friendly interface
between machine and human, SASR can be embedded in various applications, such
as voice control systems, voice access to information resources, computer-assisted
learning, assistance for people with disabilities, and access control through
voice verification/identification.
SASR is very useful as a tool for searching and sorting
recorded audio data. Speech recognition is also used for entering information,
especially when a person's eyes or hands are busy. SASR allows people
working in stressful conditions (doctors in hospitals, workers in manufacturing,
drivers) to use a computer to receive or enter the necessary information.
SASR is typically used in telephone
applications, embedded systems (voice dialing, operating a pocket computer, driving),
and multimedia applications (language learning systems).
Voice keys
Voice keys are sometimes called systems for
automatic recognition of a person by speech. Usually this is a biometric
system for authorized access to information or physical access to facilities.
Two kinds of such systems must be distinguished: verification systems and
identification systems. In verification, the user first enters a personal code,
that is, claims an identity in one way or another, and then says a password
or some random phrase aloud. The system checks whether the voice matches the
reference templates retrieved from computer memory for the entered code.
In identification, no prior claim about the user is
made. Instead, the voice is compared with all stored templates, and the system
then determines who the speaker is. Today there are many approaches and
methods for implementing such systems, and they usually differ from
one another: there are as many varieties as there are developers. The same can be
said about speech recognition systems. Therefore, the
characteristics of a particular speech recognition or speaker
identification system can be judged only with a special test database.
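The difference between the two modes can be illustrated with a minimal Python sketch. It assumes that each voice has already been reduced to a fixed-length feature vector and that similarity is measured by cosine similarity against a threshold; the vectors, user codes, threshold value, and function names are hypothetical and are not taken from any particular system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def verify(templates, claimed_code, sample, threshold=0.9):
    """Verification: compare the sample only with the template of the claimed code."""
    reference = templates.get(claimed_code)
    if reference is None:
        return False
    return cosine(reference, sample) >= threshold

def identify(templates, sample, threshold=0.9):
    """Identification: compare the sample with all templates and return the best match."""
    best_code, best_score = None, threshold
    for code, reference in templates.items():
        score = cosine(reference, sample)
        if score >= best_score:
            best_code, best_score = code, score
    return best_code  # None if no template is similar enough

# Hypothetical enrolled templates, keyed by user code.
templates = {"1001": [0.9, 0.1, 0.3], "1002": [0.1, 0.8, 0.5]}
sample = [0.88, 0.12, 0.31]

print(verify(templates, "1001", sample))  # claim matches the voice
print(verify(templates, "1002", sample))  # claim does not match the voice
print(identify(templates, sample))        # no claim: search all templates
```

Verification performs a single comparison, whereas identification performs one comparison per enrolled user, so its cost and its chance of a false match grow with the size of the template database.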
The first developments in speech recognition date
back to the 1920s; the first system was built in 1952 at Bell
Laboratories (today part of Lucent Technologies). The first
commercial system was attempted later: in 1960 IBM announced the development
of such a system, but the program was never released.
Then, in the 1970s, Eastern Airlines in
the US installed a speaker-dependent baggage dispatch system: the operator
named the destination, and the baggage was sent on its way. However, because of the number of
errors, the system did not pass its trial period.
After that, development in this area, if carried out at all,
was quite sluggish. Even in the 1980s there were very few real commercial
applications of speech recognition systems.
Today not dozens but hundreds of research groups
work in this area in scientific and educational institutions, as well as in
large corporations. This can be seen at international forums of scientists
and specialists in speech technology such as ICASSP, EuroSpeech,
and ICPhS. The results of this work, which has, as the saying goes, been taken up by the
whole world, are difficult to overestimate.
For several years, voice navigation and voice command recognition
systems have been applied successfully in various fields. For example,
the OmniTouch call center supplied to the Vatican by
Alcatel was used to support the events held to celebrate the
2000th anniversary of the birth of Christ. A pilgrim calling the call center stated a
question, and the automatic speech recognition system "listened" to it.
If the system determined that the question concerned a common topic, such as the
schedule of events or hotel addresses, a pre-recorded answer was played. If
the question needed clarification, a voice menu was offered, and the caller had to
choose one of its items by voice. If the recognition system determined that there was no
pre-recorded answer to the question, the pilgrim was connected
with a human operator.
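The three-way routing described above (recorded answer, clarifying menu, human operator) can be sketched as follows. This is only an illustration under simple assumptions: the recognizer is assumed to have already produced text, topics are matched by single hypothetical keywords, and the names RECORDED_ANSWERS and route_call are invented for the example and do not describe the actual OmniTouch software.

```python
# Hypothetical topics and the recordings that answer them.
RECORDED_ANSWERS = {
    "schedule": "recording: schedule of events",
    "hotel": "recording: hotel addresses",
}

def route_call(recognized_text):
    """Decide how to handle a caller's question from its recognized text."""
    words = set(recognized_text.lower().split())
    matches = sorted(topic for topic in RECORDED_ANSWERS if topic in words)
    if len(matches) == 1:
        # Exactly one common topic: play the pre-recorded answer.
        return ("play", RECORDED_ANSWERS[matches[0]])
    if len(matches) > 1:
        # Ambiguous question: offer a voice menu with the candidate topics.
        return ("menu", matches)
    # No recorded answer exists: connect the caller to a human operator.
    return ("operator", None)

print(route_call("what is the schedule for tomorrow"))
print(route_call("schedule and hotel please"))
print(route_call("I lost my passport"))
```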
Not long ago, an automatic telephone inquiry service
using speech recognition software from Philips was opened
in Sweden. In the first month of the Autosvar service,
which started without a formal announcement, 200 thousand
customers used it. A caller dials a certain number and, after the auto
attendant answers, names the section of the information directory of interest.
The new service is aimed mainly at private clients, who
prefer it because of its much lower cost. Autosvar is the first system of its kind in Europe.
The main problem in developing SASR is
the variability in how the same words are pronounced by different people and by the same
person in different situations. This does not confuse a person, but it can confuse a computer.
In addition, the incoming signal is affected by many factors such as ambient
noise, reflections, echo, and interference in the channel. Matters are complicated by
the fact that the noise and distortion are not known in advance, so the system
cannot be tuned to them before it starts working.
However, more than half a century of work on
various SASR has borne fruit. Almost any modern system can operate in several
modes. First, it can be speaker-dependent or speaker-independent. A
speaker-dependent system requires special training for a particular user in order to
recognize accurately what that user says. To train the system, the user says several specific
words or phrases, which the system analyzes, storing the results. This
mode is usually used in dictation systems, where a single user works with
the system.
A speaker-independent system can be used by any user without
a training procedure. This mode is usually applied where
training is impossible, for example in telephone applications. Naturally, the
recognition accuracy of such a system is lower than that of a
speaker-dependent system. However, a speaker-independent system is more convenient to use: for example,
it can work with an unlimited number of users and requires no training.
Second, systems are divided into those that work only with
isolated commands and those able to recognize continuous speech. Recognizing continuous speech is
significantly more difficult than recognizing separately spoken words.
For example, when moving from isolated-word recognition to continuous speech
recognition with a dictionary of 1000 words, the error rate rose from
3.1% to 8.7%, and processing continuous speech took three times longer.
Additional variation in speech also arises from
random intonation, accents, poorly structured sentences, pauses, and repetitions.
At the junction of continuous and isolated delivery of
words arose the keyword search mode. In this mode, SASR finds a
predetermined word or group of words in the general flow of speech. Where can it be used?
For example, in listening devices that switch on and begin recording when
certain words appear in speech, or in electronic reference services. Having
received a request in free form, the system picks out the meaningful words and,
having recognized them, provides the necessary information.
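A minimal sketch of the keyword search mode is given below. It assumes, for simplicity, that the speech has already been converted into a sequence of recognized words; real keyword spotters usually work on the acoustic signal directly. The function names and the sample phrase are hypothetical.

```python
def spot_keywords(word_stream, keywords):
    """Return (position, word) pairs for every keyword found in the stream."""
    keywords = {k.lower() for k in keywords}
    hits = []
    for position, word in enumerate(word_stream):
        token = word.lower().strip(".,!?")  # ignore case and edge punctuation
        if token in keywords:
            hits.append((position, token))
    return hits

def start_recording_at(word_stream, keywords):
    """Position at which a listening device would switch on, or None."""
    hits = spot_keywords(word_stream, keywords)
    return hits[0][0] if hits else None

stream = "please tell me the train schedule to Kostanay".split()
print(spot_keywords(stream, {"train", "schedule"}))
print(start_recording_at(stream, {"train", "schedule"}))
```

In a reference service the detected keywords would be passed on as the query, while the remaining words of the free-form request are ignored.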
Today SASR from various companies are available on the
market. Let us consider some of them.
Siri is a personal
assistant and question-answering system developed for iOS.
The application uses natural language processing to answer questions and
give recommendations. Siri adapts to each user
individually, studying the user's preferences over time.
Google Now is a personalized search service from Google Inc.
In late 2011, reports began to appear on the Internet
that the "Google Search" app would be improved with the release of
Android 4.1. The original code name of Google Now was "Majel".
On 27 June 2012, Google Now was shown during the
demonstration of Android 4.1 at Google I/O. At that time the application had 10
information cards.
On 9 July of the same year the application was
released publicly together with the new Android 4.1.
On 21 March 2013, Google Executive Chairman Eric Schmidt
said that Google Now for iOS had been submitted for
review to iTunes, but later denied this. Nevertheless, Google Now for iOS appeared in iTunes on 29 April 2013.
In December 2012, information appeared that
Google Now could be integrated into Google Chrome, and that the assistant would
serve as a replacement for the iGoogle service (this
was later disproved). On 15 May 2013, the integration of the service into the
browser was confirmed at the Google I/O conference.
An alpha version of the assistant appeared in Google Chrome Canary on 16 January 2014,
and the final version appeared in the stable version of Google Chrome on 24 March
of the same year. The browser
previously also offered search through the voice command "OK
Google", which was removed in October 2015 because it was not popular.
In 2012, the app was named "Innovation of the
Year".
The app displays information based on the user's
current location, personal calendar, search history, location
history, browsing history, etc. The user can configure the cards as
needed and delete unnecessary ones. For developers, this interface is the most
convenient for constantly updating information.
At the moment there are 36 Google Now information
cards (Birthday, Concert, Currency, News, Events, Reminders, Friends' Birthdays,
etc.).
Microsoft Cortana is a virtual voice
assistant with elements of artificial intelligence from Microsoft for Windows
Phone 8.1, Microsoft Band, Windows 10, Android, Xbox One and iOS.
The personal assistant Cortana
is designed to anticipate the needs of the user. If desired, the user can
give it access to personal data such as email, the address book, and the history of
web searches; it will use all of this data to anticipate the user's
needs. Cortana will replace the standard search
engine and will be invoked by pressing the "Search" button. A query
can be typed manually or spoken. It will find the necessary
information using search results from Bing, Foursquare, and the
user's personal files. The virtual assistant is also not without a sense
of humor: it can hold a conversation, sing songs, and tell
jokes. It will remind the user in advance of a planned meeting, a friend's
birthday, and other important events. Cortana will also notify
the user if a flight has been cancelled or if there is heavy traffic on the road.