Nugumanova M.A.
Karaganda State University named after Academician E.A. Buketov
Computer technologies in translation
Translation has a long history. Its roots go back to the distant times when the parent language began to break up into separate languages and there arose a need for people who knew several of them and could act as intermediaries in communication between members of different language communities.
Translation refers to the process, and the result, of creating, on the basis of a source text in one language, a text in another language that is communicatively equivalent to it. Communicative equivalence is understood here as that quality of a translated text which allows it to serve, in communication between speakers of different languages, as a full replacement for the source text (the original) within the sphere of the target language. Communicative equivalence of the new text with respect to the original is ensured by meeting three basic requirements: the translated text must convey the content of the original as fully as possible, which above all rules out any omission or addition of information; the translated text must conform to the norms of the target language, since violating them at the very least hampers the perception of the information and sometimes distorts it; and the translated text must be roughly comparable to the original in length, so that a similar stylistic effect is achieved in terms of conciseness or expansiveness of expression.
However, meeting these requirements frequently involves overcoming various objectively existing difficulties. In this paper we consider those of them that are encountered in machine translation.
Translation can be carried out: from one language into another, whether unrelated, related, or closely related; from a literary language into one of its dialects and vice versa, or from a dialect of one language into another literary language; and from an earlier stage of a language into its modern form (for example, from Old Russian into modern Russian, from Old English into modern English, etc.).
About fifty years ago, Warren Weaver, a former director of the Division of Natural Sciences at the Rockefeller Foundation (1932-55), wrote his famous memorandum, which launched research on machine translation, at first primarily in the United States but, before the end of the 1950s, throughout the world.
In those early days, and for many years afterwards, computers were quite different from those we have today. They were very expensive machines housed in large rooms with reinforced flooring and ventilation systems to reduce excess heat, and they required a large staff of maintenance engineers, operators and programmers. Most of the work done on them was, in fact, mathematical, either directly for military institutions or for university departments of physics and applied mathematics with strong links to the armed forces. It was perhaps natural in these circumstances that much of the earliest work on machine translation was supported, directly or indirectly, by military or intelligence funds and was intended for use by such organizations – hence the emphasis in the United States on Russian-to-English translation, and in the Soviet Union on English-to-Russian translation.
Although machine translation attracted a great deal of funding in the 1950s and 1960s, particularly once the arms and space races began in earnest after the launch of the first satellite in 1957 and the first space flight by Gagarin in 1961, the results of this period of activity were disappointing. In the United States, research was all but shut down after the publication of the damning ALPAC (Automatic Language Processing Advisory Committee) report (1966), which concluded that the United States had no need of machine translation even if the prospect of reasonable translations were realistic – which then seemed unlikely. The authors of the report compared the quality of the output produced by current systems unfavourably with the artificially high quality of the first public demonstration of machine translation in 1954 – the Russian-English program developed jointly by IBM and Georgetown University. The linguistic problems encountered by machine translation researchers had proved to be much greater than anticipated, and progress had been painfully slow. It should be mentioned that just over five years earlier Joshua Bar-Hillel, one of the first enthusiasts for machine translation, who had since become disillusioned with it, had published a critical review of machine translation research in which he rejected the implicit aim of fully automatic high-quality translation (FAHQT); indeed, he offered a proof of its "non-feasibility". The writers of the ALPAC report agreed with this diagnosis and recommended that research on fully automatic systems should stop and that attention should be directed instead to lower-level aids for translators.
For some years after ALPAC, research continued on much-reduced funding. By the mid-1970s, some successes could be shown: in 1970 the US Air Force began to use the Systran system for Russian-English translation, in 1976 the Canadians began public use of weather reports translated by the Meteo sublanguage machine translation system, and the Commission of the European Communities adopted the English-French version of Systran to help with its heavy translation burden – soon followed by the development of systems for other European languages. In the 1980s, machine translation emerged from its post-ALPAC doldrums: activity began again all over the world – most notably in Japan – with new ideas for research (particularly on knowledge-based and interlingua-based systems), new sources of financial support (the European Union, computer companies), and, in particular, the appearance of the first commercial machine translation systems on the market.
Initially, however, attention in this renewed activity was still focused almost entirely on automatic translation with human assistance, whether before (pre-editing), during (interactive resolution of problems) or after (post-editing) the translation process itself. The development of computer-based aids or tools for use by human translators remained relatively neglected – despite the explicit requests of translators.
Nearly all research activity in the 1980s was devoted to exploring methods of linguistic analysis in order to develop a generation of programs based on traditional rule-based transfer and interlingua approaches (with AI-type knowledge bases representing the more innovative tendency). The needs of translators were left to commercial interests: software for terminology management became available, and ALPNET produced a series of translator tools during the 1980s – among them, notably, an early version of a "translation memory" program (a bilingual database).
The real emergence of translator aids came in the early 1990s with the "translator workstation" – programs such as "Trados Translator Workbench", "IBM Translation Manager 2", "STAR Transit" and "Eurolang Optimizer" – which combined sophisticated text processing and publishing software, terminology management and translation memories.
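As a rough illustration of the "bilingual database" idea behind translation memories, the following toy Python sketch stores previously translated segments and retrieves the closest earlier match for a new sentence. It is an assumption of ours for illustration only, not a description of any of the products named above; the class name, the example segments and the similarity threshold are invented.

    # Toy translation memory: a store of (source, target) segment pairs
    # queried by fuzzy string similarity. Purely illustrative; commercial
    # tools use far more elaborate matching and storage.
    from difflib import SequenceMatcher

    class TranslationMemory:
        def __init__(self, threshold=0.75):
            self.segments = []          # list of (source, target) pairs
            self.threshold = threshold  # minimum similarity for a fuzzy match

        def add(self, source, target):
            """Store a translated segment pair."""
            self.segments.append((source, target))

        def lookup(self, sentence):
            """Return the stored pair whose source is most similar to `sentence`."""
            best, best_score = None, 0.0
            for source, target in self.segments:
                score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
                if score > best_score:
                    best, best_score = (source, target), score
            return best if best_score >= self.threshold else None

    tm = TranslationMemory()
    tm.add("The printer is out of paper.", "В принтере закончилась бумага.")
    print(tm.lookup("The printer is out of ink."))  # close enough for a fuzzy match

A real workstation combines such lookups with terminology management and the editing environment itself.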
In the early 1990s, research on machine translation was invigorated by the arrival of corpus-based methods, especially the introduction of statistical methods ("IBM Candide") and of example-based translation. Statistical (stochastic) techniques brought relief from the increasingly evident limitations and inadequacies of previous, exclusively rule-based (often syntax-oriented) approaches. Problems of disambiguation, avoidance of repetition and more idiomatic generation have become more tractable with corpus-based techniques. On their own, statistical methods are no more the complete answer than rule-based methods were, but there are now prospects of improved output quality that did not seem attainable 15 years ago. As many observers have noted, the most promising approaches will probably integrate rule-based and corpus-based methods. Even outside research environments such integration is already evident: many commercial machine translation systems now incorporate translation memories, and many translation memory systems are being enriched with machine translation methods.
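To make the statistical idea concrete, here is a minimal sketch of the general noisy-channel scheme that systems of the Candide type built on: each candidate translation is scored by the product of a language-model probability (fluency) and a translation-model probability (faithfulness), and the highest-scoring candidate is chosen. This is not a description of Candide itself; the candidates and probabilities below are invented purely for illustration.

    # Minimal noisy-channel scoring: choose the target sentence e that
    # maximizes P(e) * P(f | e). All numbers are invented for illustration.

    def best_translation(candidates):
        """candidates: list of (target_sentence, p_lm, p_tm) tuples."""
        return max(candidates, key=lambda c: c[1] * c[2])[0]

    candidates = [
        # (English candidate,      P(e) from LM,  P(f|e) from TM)
        ("The spirit is willing.",  0.020,         0.40),
        ("The vodka is strong.",    0.015,         0.10),
        ("The ghost is ready.",     0.001,         0.45),
    ]
    print(best_translation(candidates))  # -> "The spirit is willing."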
The perfect translation system, be it human or machine, does not exist. However, the dream of something like the Babel fish from the Hitchhiker's series or the universal translator on Star Trek haunts us, and it might go something like this. Your personal computer will have a translation module, maintained from some central database created by the publisher of the system. When email comes in, it will automatically and almost instantly be translated into whatever language you desire (presumably your native tongue). When you send email, it will be translated into whatever language you choose. You will be able to configure it so that when email goes out to Japan it is translated into Japanese, when it goes to France it is translated into French, and so on (or you can configure it on a person-by-person basis, taking into account the linguistic skills of individuals). Similar systems will exist for businesses, but they will be faster and more comprehensive. A book will be scanned into a computer and rendered into another language in a matter of minutes. The computer might even attend to the graphics and desktop publishing tasks, assuming you want it to. The finished translation will need the same amount of editing and proofreading that any piece of writing does, that is to say, a lot.
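Purely as an illustration of the per-recipient configuration imagined here, such routing could look like the sketch below; everything in it, including the translate() stub that stands in for the translation module, is hypothetical.

    # Hypothetical per-recipient routing for the imagined email translator.
    RECIPIENT_LANGUAGE = {
        "tanaka@example.jp": "ja",   # mail to Japan goes out in Japanese
        "dupont@example.fr": "fr",   # mail to France goes out in French
    }
    DEFAULT_LANGUAGE = "en"

    def translate(text, target_lang):
        """Stub standing in for the translation module maintained by its publisher."""
        return f"[{target_lang}] {text}"

    def send_email(recipient, body):
        lang = RECIPIENT_LANGUAGE.get(recipient, DEFAULT_LANGUAGE)
        print(f"To {recipient}: {translate(body, lang)}")

    send_email("tanaka@example.jp", "The meeting is at 10 a.m.")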
Such technology would make communication with anyone, anywhere, possible. You could travel in remote parts of Tibet and speak and read with the locals. You could walk into a conference and listen to an interpretation of the speaker given by a machine that never tires or loses interest in the task. You could go to a doctor, a hotel or a restaurant anywhere and communicate everything you need to, whether verbally or in writing.
Despite these prospects for the future, it has to be said that the new approaches of the present have not yet resulted in notable improvements in the quality of the raw output of translation systems. Such improvements may come in time, but at present the actual translations produced do not represent major advances over those made by the machine translation systems of the 1970s. We still see the same errors: wrong pronouns, wrong prepositions, anomalous syntax, incorrect choice of terms, plurals instead of singulars, wrong tenses, etc. – errors that no human translator would ever commit. Unfortunately, this situation will probably not change in the near future. There is little sign that basic general-purpose machine translation programs will soon show significant advances in translation quality. And I think that if producers of machine translation systems continue to saturate the market with low-quality software (as at present), the general public may come to dismiss the whole machine translation industry as a producer of essentially poor-quality software, which could damage research and development in the field or even bring it to a halt.