Modern information technologies / 2. Computer engineering and programming

 

M.Sc. Ospanov M.

Kostanay State University named after A.Baytursynov, Kazakhstan

Automated recognition of the structure of handwritten curves

 

Rapid scientific and technological progress has brought computer technology to a qualitatively new stage. Hardware has reached high levels of processing speed, storage capacity and data transfer rate. This implies a rapid increase in the number of potential computer users, whose hardware and software are constantly being updated.

Entering documents into a computer requires a digitizing device (a scanner, digital camera or video camera), long-term storage (a hard disk, magneto-optical disk or CD) and software for processing the digitized image of the document. At a minimum, processing consists of recording the digital image of the document in one of the standard formats and storing it. For digitized images that contain text, however, it is useful to recognize the text, that is, to find the textual data in the image and present it in a text format.

From the point of view of a personal computer user, text recognition is the process symmetrical to printing a document: it puts electronic documents and hard copies on an equal footing. An arbitrary document containing text can, after scanning and recognition, be converted into an electronic document and then processed in a word processor just like a newly created one. Recognizable documents include typewritten pages, pages of books, magazines and newspapers, hard copies of electronic documents produced by various kinds of printing devices, and printed faxes. Digital images can also be recognized, such as received faxes and copies of the computer screen. Given the steady increase in the number of personal computers and scanners [1], the need for OCR becomes evident.

Professional uses of text input are diverse. One, of course, is publishing, which needs to reissue texts that have no electronic version. Another is the creation of digital archives, including those of financial institutions, and of digital libraries for long-term storage.

The problem of pattern recognition has long been widely known. In 1932, the engineer V. E. Agapov developed a machine designed to enter numbers into a counting device [2]. It used the method of comparison with a system of reference templates. In , the author refers to Jacob Rabinow, who began research on optical recognition in 1940. Many well-known works treat the recognition task as a classification problem. This stage, in which a large number of approaches to pattern recognition accumulated, including questions of optimal training and classification, was characterized by research conducted in isolation from the general problem of entering the text of a document into a computer [3].

Both early and later works gravitate toward studying recognition with a fixed feature system, that is, toward optimal conditions for computing features, training and classification. Of course, optimizing the feature system and the recognition speed are important issues. But, first, the growth in the amount of RAM and in computer speed over the last decade has become irreversible (for home personal computers the annual change can be estimated as a doubling of both memory and speed), and second, nobody restricts the developers of a recognition system to any particular set of features. It is understood that the number of features can grow, and that multiple feature systems and other recognition mechanisms can be used as well, with the aim, of course, of recognizing documents better.

The way recognition quality is tested has also changed fundamentally. The number of errors is now evaluated after recognizing real documents, as the number of corrections needed to bring the recognition result into a form acceptable for further processing. No less important in systems for mass input of structured documents is the reliability criterion: the system must mark the fragments of the resulting text that need to be checked by an operator, so the reliability of the system is determined by its ability to generate rejections. Objective quality criteria, together with the feasibility of computational experiments that recognize large numbers of scanned documents, lead to more reliable training and testing not only of the system as a whole but also of the individual components of the recognition algorithms.
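
The correction-count criterion can be approximated by the edit distance between the recognized text and a manually verified reference, and the rejection mechanism by marking characters whose confidence falls below a threshold. The Python sketch below illustrates both ideas. The function names, the per-character confidence values and the 0.8 threshold are illustrative assumptions, not part of any system discussed here.

```python
def correction_count(recognized: str, reference: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn `recognized`
    into `reference`."""
    prev = list(range(len(reference) + 1))
    for i, rc in enumerate(recognized, start=1):
        curr = [i]
        for j, fc in enumerate(reference, start=1):
            cost = 0 if rc == fc else 1
            curr.append(min(
                prev[j] + 1,         # delete rc from the recognized text
                curr[j - 1] + 1,     # insert fc into the recognized text
                prev[j - 1] + cost,  # substitute rc by fc (or keep a match)
            ))
        prev = curr
    return prev[-1]


def flag_for_operator(confidences, threshold=0.8):
    """Return the positions of characters whose confidence is below the
    (assumed) threshold, i.e. the fragments an operator should check."""
    return [i for i, p in enumerate(confidences) if p < threshold]
```

For example, `correction_count("rec0gnition", "recognition")` counts one substitution, and `flag_for_operator([0.99, 0.45, 0.97])` marks only the second character for checking.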

Unlike stand-alone pattern recognition algorithms, the input of scanned documents calls for a systems approach to the problem. Namely, natural components such as segmenting the page into text fragments and lines, finding figures and tables, segmenting and recognizing individual characters, applying geometric and linguistic considerations, and other algorithms are treated as interdependent; the links between components are both logical and informational. Implementing this approach requires building each recognition algorithm so that, when used within the overall mechanism, it can pass all the information it has accumulated to the other algorithms and can in turn use the results of the preceding components.
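
One way to organize this kind of inter-component communication is a shared page context that every stage reads and extends. The sketch below only illustrates that idea under simplifying assumptions: the `PageContext` class, the stage functions and the toy "image" (a list of text rows) are hypothetical placeholders, not the components of an actual recognition system.

```python
from dataclasses import dataclass, field


@dataclass
class PageContext:
    """Shared store: each stage reads earlier results and adds its own."""
    image: list                                   # toy "image": rows of text
    results: dict = field(default_factory=dict)   # stage name -> stage output


def segment_lines(ctx: PageContext) -> None:
    # Toy line segmentation: every non-blank row counts as a text line.
    ctx.results["lines"] = [row for row in ctx.image if row.strip()]


def segment_characters(ctx: PageContext) -> None:
    # Relies on the output of the previous stage instead of the raw image.
    ctx.results["chars"] = [
        [ch for ch in line if not ch.isspace()]
        for line in ctx.results["lines"]
    ]


def run_pipeline(ctx: PageContext, stages) -> PageContext:
    # Stages run in order; each one sees everything stored before it.
    for stage in stages:
        stage(ctx)
    return ctx


page = run_pipeline(PageContext(image=["ab c", "   ", "de"]),
                    [segment_lines, segment_characters])
```

Because every stage writes into the same `results` dictionary, a later component (for instance, a linguistic check) could consult both the line and the character segmentation without the earlier stages knowing about it.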

Programs for recognizing text documents, which are a central part of document management systems and of systems for entering paper documents into a computer, must be designed with the needs of today's users in mind.

The study produced the following main results:

- The grammatical, orthographic and graphical features of Arabic texts that affect the algorithms for their recognition were identified.

- To improve the reliability of recognition, the width of the line was used as an additional parameter.

- It was proposed to recognize symbols and signs separately and then link them through spatial relations.
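
The last result, recognizing symbols and signs separately and then linking them by spatial relations, could look roughly as follows. This is a hedged sketch only: the bounding-box representation, the `Box` type, the labels, and the rule of attaching each sign to the symbol with the nearest horizontal center are assumptions made for illustration, not the method of the study.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def link_signs(symbols, signs):
    """Attach each separately recognized sign to the symbol whose horizontal
    center is closest, and record whether the sign lies above or below it.
    Assumes `symbols` is non-empty."""
    links = []
    for sign in signs:
        base = min(symbols, key=lambda s: abs(s.cx - sign.cx))
        position = "above" if sign.cy < base.cy else "below"
        links.append((base.label, sign.label, position))
    return links


# Hypothetical labels and coordinates, for illustration only.
symbols = [Box("body_1", 0, 10, 10, 20), Box("body_2", 12, 10, 22, 20)]
signs = [Box("dot", 4, 22, 6, 24), Box("two_dots", 15, 4, 19, 6)]
links = link_signs(symbols, signs)
```

A classifier could then combine each symbol with its linked signs, so that shapes sharing the same body but differing in their dots are told apart by the spatial relation rather than by the body alone.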

References:

1. Arlazarov V.L., Slavin O.A. Recognition algorithms and technologies for entering texts into a computer. Information Technologies and Computing Systems, No. 1, 1996, pp. 48-54 (in Russian).

2. Braun E.W. Applying Neural Networks to Character Recognition. http://www.ccs.neu.edu/home/feneric/charrecnn.html

3. Slavin O.A. Tools for managing databases of graphical character images and their place in a recognition system. In: "Development of Paperless Technologies in Organizations", 1999, pp. 277-289 (in Russian).