Modern Information Technologies / 2. Computer Engineering
and Programming
M.Sc. Ospanov M.
Kostanay State University named after A.Baytursynov, Kazakhstan
Automated recognition of handwritten
curves structure
The rapid pace of scientific and technological progress has brought computer
technology to a qualitatively new stage. Hardware has reached high levels of
processing speed, storage capacity and data transfer rate. This suggests a
rapid increase in the number of potential computer users, who work with
constantly updated hardware and software.
Entering documents into a computer requires a digitizing device (a scanner,
digital camera or video camera), long-term storage (a hard disk,
magneto-optical disk or CD) and programs for processing the digitized image of
the document. At a minimum, processing consists of saving the digital image of
the document in one of the standard image storage formats. For digitized
images that contain text, however, it is useful to recognize the text, that
is, to locate the text data in the image and present it in a text format.
From the point of view of a personal computer user, text recognition is a
process that makes printed documents symmetrical with electronic ones, putting
hard copies and electronic documents on an equal footing. After scanning and
recognition, an arbitrary document containing text can be converted into an
electronic document and then processed in a word processor just like a newly
created document. The documents to be recognized may be typewritten pages,
pages of books, magazines and newspapers, hard copies of electronic documents
produced by various printing devices, and printed faxes. Digital images, such
as received faxes and screen captures, can also be recognized. Given the
steady increase in the number of personal computers and scanners [1], the need
for OCR becomes evident.
Professional uses of text entry are diverse. One is, of course, publishing,
which needs to reissue texts that have no electronic version. Another
application is the creation of digital archives, including those of financial
institutions, and of digital libraries for long-term storage.
The problem of pattern recognition has been widely known for a long time. In
1932 the engineer V.E. Agapov developed a machine designed to enter numbers
into a counting device [2]. The method was based on applying a system of
reference standards. In , the author refers to Jacob Rabinow, who
began research on optical recognition in 1940. A large number of well-known
works treat the recognition task as a classification problem. This stage, in
which a large number of approaches to pattern recognition accumulated,
including questions of optimal training and classification, was characterized
by research conducted independently of the general problem of entering a text
document into a computer [3].
Both early and later works gravitate toward studying recognition problems with
fixed feature systems, that is, toward finding optimal conditions for feature
computation, training and classification. Of course, optimizing the feature
system and the recognition speed are important issues. But, first, the growth
in RAM capacity and computer speed over the last decade has become
irreversible (the annual change in these characteristics for home personal
computers can be estimated as a doubling of memory and speed), and second, no
one restricts the developers of a recognition system to any particular set of
features. It follows that the number of features can grow, and that multiple
feature systems and other recognition mechanisms can also be used, all, of
course, in order to achieve better recognition of documents.
The way recognition quality is tested has also changed fundamentally. The
number of errors is now evaluated after recognizing real documents, as the
number of corrections needed to bring the recognition results into a form
acceptable for further processing. No less important in systems for mass input
of structured documents is the reliability criterion: the results must
highlight the text fragments that need to be checked by an operator. In other
words, the reliability of the system is determined by its ability to flag
doubtful results. Objective quality criteria, together with the availability
of computational experiments that consist in recognizing large quantities of
scanned documents, make training and testing more reliable, not only for the
entire system but also for the individual components of the recognition
algorithms.
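
The two criteria above can be illustrated with a minimal sketch. It assumes
that the "number of corrections" is measured as the Levenshtein edit distance
between the recognized text and a reference transcription, and that the
recognizer returns a confidence value per word; both the metric and the names
correction_count, flag_for_review and threshold are illustrative assumptions,
not taken from the study.

```python
from typing import List, Tuple


def correction_count(recognized: str, reference: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning `recognized`
    into `reference` (assumed here as the correction metric)."""
    prev = list(range(len(reference) + 1))
    for i, rec_ch in enumerate(recognized, start=1):
        curr = [i]
        for j, ref_ch in enumerate(reference, start=1):
            cost = 0 if rec_ch == ref_ch else 1
            curr.append(min(prev[j] + 1,          # delete rec_ch
                            curr[j - 1] + 1,      # insert ref_ch
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_for_review(words: List[Tuple[str, float]],
                    threshold: float = 0.8) -> List[str]:
    """Return the words whose (hypothetical) recognizer confidence is
    below `threshold`, i.e. the fragments an operator should check."""
    return [text for text, confidence in words if confidence < threshold]
```

For example, correction_count("rec0gniti0n", "recognition") counts the two
substituted characters, and flag_for_review([("total", 0.95), ("l00", 0.4)])
returns only the low-confidence word.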
Unlike stand-alone pattern recognition algorithms, the input of scanned
documents calls for a systematic approach to the problem. Namely, the natural
components, such as segmentation of the page into text fragments and lines,
detection of figures and tables, segmentation and recognition of individual
characters, the use of geometric and linguistic considerations, and other
algorithms, are considered interdependent; the connections between components
are both logical and informational. Implementing this approach requires each
recognition algorithm to be built so that it can pass on all the information
it has accumulated and, in turn, use the results of the preceding components.
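
One possible way to organize such interdependent components is a shared
context object that every stage reads from and writes to. The sketch below is
only an illustration of that idea: PageContext, the stage functions and the
use of a plain string as a stand-in for a scanned page are all hypothetical
and do not describe the system discussed in the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PageContext:
    """Information accumulated for one page and shared by all stages."""
    image: Any                                   # stand-in for a scanned page
    results: Dict[str, Any] = field(default_factory=dict)


Stage = Callable[[PageContext], None]


def segment_lines(ctx: PageContext) -> None:
    # Toy line segmentation: the "image" is a string, one text line per row.
    ctx.results["lines"] = ctx.image.splitlines()


def segment_characters(ctx: PageContext) -> None:
    # Uses the output of the previous stage instead of re-reading the image.
    ctx.results["chars"] = [list(line) for line in ctx.results["lines"]]


def run_pipeline(ctx: PageContext, stages: List[Stage]) -> PageContext:
    """Run the stages in order; each one may use all earlier results."""
    for stage in stages:
        stage(ctx)
    return ctx


page = run_pipeline(PageContext("ab\ncd"), [segment_lines, segment_characters])
```

Because every stage leaves its output in the shared context, a later stage
(for example, a linguistic check) could consult both the line and the
character segmentation without recomputing them.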
Programs for recognizing text documents, which are the central part of
document management systems and of systems for entering paper documents into a
computer, must be designed with today's users in mind.
The study produced the following main results:
- The grammatical, orthographic and graphological features of Arabic texts
that affect recognition algorithms were identified.
- To improve the reliability of recognition, line width is used as an
additional parameter.
- It is proposed to recognize symbols and marks separately and then link them
by spatial relations.
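
The last result can be illustrated with a minimal sketch, assuming that
symbols and marks have already been recognized separately and are available as
labeled bounding boxes. The linking rule used here (attach each mark to the
base symbol with the nearest horizontal center, then record whether it lies
above or below) and the names Box and link_marks are illustrative assumptions,
not the rule used in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Labeled bounding box in image coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def link_marks(bases: List[Box], marks: List[Box]) -> List[Tuple[str, str, str]]:
    """Attach each mark to the base symbol with the nearest horizontal
    center and describe the spatial relation as 'above' or 'below'.
    Assumes `bases` is not empty."""
    links = []
    for mark in marks:
        base = min(bases, key=lambda b: abs(b.cx - mark.cx))
        relation = "above" if mark.cy < base.cy else "below"
        links.append((base.label, mark.label, relation))
    return links


# Hypothetical example: two letter bodies, one dot below, two dots above.
bases = [Box("body_1", 0, 10, 10, 20), Box("body_2", 12, 10, 22, 20)]
marks = [Box("dot", 4, 22, 6, 24), Box("two_dots", 15, 4, 19, 6)]
links = link_marks(bases, marks)
```

A classifier could then use each (symbol, mark, relation) triple to decide
which letter the combination represents.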
Literature:
1. Arlazarov V.L., Slavin O.A. Algorithms for recognition and technologies for
entering texts into a computer. Informatsionnye tekhnologii i vychislitelnye
sistemy (Information Technologies and Computing Systems), No. 1, 1996,
pp. 48-54.
2. Braun E.W. Applying Neural Networks to Character Recognition.
http://www.ccs.neu.edu/home/feneric/charrecnn.html
3. Slavin O.A. Tools for managing databases of graphic character images and
their place in a recognition system. In: "Razvitie bezbumazhnykh tekhnologiy v
organizatsiyakh" (Development of Paperless Technologies in Organizations),
1999, pp. 277-289.