Boranbayev A.S.

L.N. Gumilyov Eurasian National University, Astana, Kazakhstan

METHODS OF DEVELOPMENT OF MULTILINGUAL VOICE-ENABLED INFORMATION SYSTEMS

 

This paper addresses issues related to the development of multilingual voice portals in Kazakhstan, where Russian is one of the main languages of the country and English is widely studied by the population.  I will discuss the technology behind these voice applications, explain why I consider VoiceXML (Voice Extensible Markup Language) a strong way to develop voice recognition applications over the telephone, and show an approach for building a multilingual voice application using technologies such as VoiceXML [1].

Instead of talking to your computer, you are essentially talking to a web site, and you are doing this over the phone.  Not all people have easy access to the Internet, due to geographical location and lack of computer knowledge.  On the other hand, many people have access to a telephone or a wireless network.  Many enterprises are already integrating Internet and voice recognition technologies into services that allow easy access to the Internet via the telephone.  This allows users to access the Internet from anywhere using devices such as a telephone or Voice over Internet Protocol [2].

A lot of people in Kazakhstan have access to telephone lines and personal computers, but very few have Internet access.  Voice portals allow us to access Web information using a voice interface and as such have a major role to play in Kazakhstan.  Until now, voice portals have been almost nonexistent in Kazakhstan, unlike in developed countries such as the United States.

A voice application consists of technology components that enable interaction between the voice application server and the user.  These components are a dialogue manager, a speech recognition module, a language understanding module, and a text-to-speech module that performs speech synthesis.  The dialogue manager dictates the sequence of prompts and responses (dialogue states), interfaces to information and audio databases, and manages telephony calls, for instance call transfer in the case of a directory enquiries system.  Additional modules, such as speaker verification, can be included.  In the case of a multilingual voice portal, a language identification module would be added and would be responsible for automatic switching between Russian and English.
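
As an illustration of how the dialogue manager drives a single prompt-and-response turn, the following simplified VoiceXML fragment collects one spoken choice from the caller and confirms it.  It is only a sketch: the form and field names are chosen for illustration, and the inline grammar is written in the same abbreviated style as the examples later in this paper.

<vxml version="2.0">
  <form id="mainMenu">
    <field name="service">
      <prompt>Would you like billing or support?</prompt>
      <grammar>billing | support</grammar>
      <filled>
        <prompt>You chose <value expr="service"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>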

VoiceXML technologies have multilingualism built in, but since Russian belongs to a class of languages that are not written in Roman script, implementing our voice portals presents certain difficulties.  VoiceXML was originally designed to enable audio dialogues that include voice recognition, speech synthesis, audio playback, and telephony.  VoiceXML supports mixed-initiative conversations, where the caller and the system take turns driving the conversation [3].  An important component of the overall voice portal system is speaker verification, which checks the voice signature of the speaker.
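
As a sketch of how mixed initiative looks in VoiceXML 2.0, a form-level grammar combined with an initial element lets the caller fill several fields in one utterance, while the individual field prompts collect whatever was not said.  The grammar file name below is hypothetical.

<form id="payment">
  <grammar src="payment.grxml" type="application/srgs+xml"/>
  <initial>
    <prompt>Say the amount and the account you want to pay from.</prompt>
  </initial>
  <field name="amount">
    <prompt>How much would you like to pay?</prompt>
  </field>
  <field name="account">
    <prompt>Which account should the payment come from?</prompt>
  </field>
</form>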

Adopting a standard such as VoiceXML when developing a multilingual voice portal for the Russian language raises several problems: dialogue modes are limited to directed dialogue, semantic interpretations are confined to key/value pairs in the grammar, support for multilingualism is limited, and there is no place for language identification.

VoiceXML's limited support for multilingualism is expressed through the xml:lang attribute, which can be specified for prompts and grammars [3]. For example, the system could prompt in both English and Russian as follows:

<prompt xml:lang="en-US">How are you doing?</prompt>

<prompt xml:lang="ru">Как у вас дела?</prompt>

Also, in order to recognize the above two phrases in English and Russian, one needs to specify the grammars as:

<grammar xml:lang="en-US">How are you doing?</grammar>

<grammar xml:lang="ru">Как у вас дела?</grammar>

Although Russian is one of the major languages in terms of the number of speakers, research in the area of speech processing has lagged behind English.  A very promising approach to Russian orthography is romanization (transliteration) of the Russian language, which can be combined with automatic diacritization into an approach that can be described as auto-romanization.  The biggest advantage of auto-romanization is that English and Russian can be mixed in the same VoiceXML document without worrying about which encoding standard to use.  The main disadvantage is that it is harder to write Russian in romanized form, especially for native speakers in Kazakhstan.  Auto-romanization in VoiceXML can be implemented either by exploiting the object tag or by including it as part of the text-to-speech module.
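
One possible way to hook auto-romanization into a VoiceXML document is through the object tag, which exposes platform-specific functionality.  The following is only a sketch: the classid, parameter name, and result field are hypothetical and would depend on the particular platform.

<object name="romanizer" classid="builtin://auto-romanizer">
  <param name="text" expr="'Как у вас дела?'"/>
</object>
<prompt>
  <value expr="romanizer.romanizedText"/>
</prompt>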

To demonstrate the ideas presented in this paper, an example voice application was built: the Bill Payment Demo.  The bill payment process is automated: the balance due is recorded and then posted on the Internet.  The speaker can enquire about his or her balance due by providing a unique account ID number.  This is a good example of a mixed-initiative dialogue, since the speaker is expected to say the ID number.  CGI/Perl scripts running on an Apache server were used to extract information from a central database.
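
A minimal sketch of the account-lookup step might look as follows; the server path and script name are placeholders rather than the actual demo configuration.  The collected ID is passed to a CGI/Perl script that queries the central database and returns the next VoiceXML document.

<form id="balanceEnquiry">
  <field name="accountId" type="digits">
    <prompt>Please say your account I D number.</prompt>
    <filled>
      <submit next="http://localhost/cgi-bin/balance.pl" namelist="accountId"/>
    </filled>
  </field>
</form>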

Next I will describe what needed to be done for the multilingual Russian/English implementation.  Whenever the application switched to a different language, the construct xml:lang="language" had to be used (where language was either en-US or ru).  This had to be done for both the prompt and the grammar tags.  Russian recognition was implemented using Cyrillic script.  The voice web server that I used did not have a Russian TTS engine linked to the VoiceXML environment, so the audio had to be pre-recorded.  To aid with the pronunciation of words for the recognition dictionary and the recording of prompts, I used a commercial diacritizer.  A solution still needs to be found for integrating a text-to-speech engine with a diacritization and text normalization front-end.
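
A simplified sketch of this language switching is shown below.  The audio and grammar file names are hypothetical; the Russian prompt plays pre-recorded audio because no Russian TTS engine was available in the VoiceXML environment.

<prompt xml:lang="en-US">Your balance is <value expr="balance"/> tenge.</prompt>
<prompt xml:lang="ru">
  <audio src="balance_ru.wav"/>
</prompt>
<grammar xml:lang="ru" src="balance_ru.grxml" type="application/srgs+xml"/>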

 

References:

 

1. Boranbayev A.S. Developing applications using speech recognition and VoiceXML // Proceedings of the international conference "The theory of functions and computing methods". Astana, 2007, pp. 66-68.

2. Boranbayev A.S. The future of IVR and the continuous progress in speech recognition technology // Proceedings of the VI Kazakhstan-Russian international scientific and practical conference "Mathematical modeling of scientific-technological and ecological problems in the oil and gas industry". Astana, 2007, pp. 82-86.

3. W3C Voice Browser: http://www.w3.org/Voice/