Oleksii V. Samoilenko, Cand. Sc. (Eng.), Assoc. Prof.

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Technical Problems of Identifying Academic Plagiarism
in Higher Engineering Education

No one doubts that academic plagiarism is a shameful phenomenon that must be fought without compromise.

Academic plagiarism is less common in the technical sciences and in higher engineering education. This is primarily due to two reasons:

– high labour intensity of the work;

– high visibility of the academic results obtained.

Academic plagiarism increasingly has to be detected in large bodies of information, so it is logical to use computer technology for this purpose. A great deal of software (including free software) is now available for finding borrowings on the Internet and on a local computer.

However, works in the technical sciences and engineering education have features that significantly limit automated processing or even make it impossible:

– large amount of graphic information that determines the essence of the work;

– presence of mathematical and chemical formulas, graphs, etc. in the text;

– limited lexical variety due to the standardization of terminology;

– presence of formally identical fragments in different places.

For example, a bachelor's degree project contains, on average, 8 blueprints in A1 format and an explanatory note of about 70 pages. The blueprints are the main part of the work.

However, software tests conducted by the author have shown that anti-plagiarism programs work very poorly with graphic information. While the programs cope, albeit poorly, with finding matches in full-colour images, processing monochrome images causes problems.

For example, the programs do not distinguish a monochrome image of text from a monochrome sketch. Nor do they distinguish significant elements from insignificant ones (the frame and the title block) when processing blueprints.

A further difficulty is the abundance of vector graphics formats in which blueprints can be stored.

Anti-plagiarism software recognizes mathematical and chemical formulas very poorly, especially because the same mathematical formula can be written in several ways (for example, with its terms permuted). Anti-plagiarism software should therefore recognize not only the appearance but also the meaning of mathematical and chemical formulas. This capability has not yet been observed.
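The point about permuted terms can be illustrated with a minimal sketch. It assumes a deliberately simplified formula syntax (a flat sum of products, without parentheses, powers or division); the function names canonical_form and same_formula are hypothetical and do not come from any existing anti-plagiarism tool.

```python
def canonical_form(expr: str) -> str:
    """Return an order-independent form of a flat sum of products.

    Assumption: the formula uses only '+' and '*' and has no parentheses.
    This is an illustrative simplification, not a general formula parser.
    """
    terms = []
    for term in expr.replace(" ", "").split("+"):
        # Sort the factors inside each product: b*a and a*b become a*b.
        factors = sorted(term.split("*"))
        terms.append("*".join(factors))
    # Sort the terms of the sum: c+a*b and a*b+c become a*b+c.
    return "+".join(sorted(terms))


def same_formula(left: str, right: str) -> bool:
    """Compare two formulas by canonical form rather than by appearance."""
    return canonical_form(left) == canonical_form(right)
```

Under these assumptions, "a*b + c" and "c + b*a" are treated as the same formula, while "a + b*c" is not. A practical tool would need a real expression parser, and for chemical formulas a separate notation model.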

The language used in the technical sciences and engineering education is highly formalized. For example, a first search-engine query finds 23 Ukrainian state standards (which are mandatory for use) that regulate terminology in the engineering industry, and this is not a complete list.

In addition, the explanatory notes of several diploma projects can contain the same elements. These can be, for example, a description of the basic equipment (the same for several students who share a common object of modernization) or typical calculations (which are also often standardized).
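One way a checker could avoid counting such legitimately shared fragments is sketched below. It assumes a simple word-shingle (word n-gram) comparison with a Jaccard score, which is a common textbook technique and not necessarily what existing anti-plagiarism programs use; the names shingles and similarity and the common_fragments parameter are hypothetical.

```python
def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping word n-grams (shingles)."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(doc_a: str, doc_b: str, common_fragments=()) -> float:
    """Jaccard similarity of two texts, ignoring known shared fragments.

    Assumption: common_fragments holds texts that are allowed to coincide,
    such as a shared equipment description or a standardized calculation.
    """
    ignored = set()
    for fragment in common_fragments:
        ignored |= shingles(fragment)
    a = shingles(doc_a) - ignored
    b = shingles(doc_b) - ignored
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With an empty common_fragments argument the shared equipment description raises the score; passing that description as a common fragment removes its contribution, so only the students' own text is compared.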

The citation problem has a satisfactory solution mainly for modern publications. At the same time, publications from the countries of the former USSR are not yet sufficiently available in open access.

Citation checking also faces the following difficulties:

– incomplete bibliographic descriptions of information sources, or even typographical or grammatical errors in them;

– placement of the list of sources in a separate file;

– variety of ways to specify the information sources and their location in the text of the investigated document.
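The first difficulty in the list above (errors and inconsistencies in bibliographic descriptions) can be softened by approximate matching. The following is a minimal sketch using Python's standard difflib module; the function names, the 0.9 threshold and the sample reference in the usage note are illustrative assumptions, not tested recommendations.

```python
import re
from difflib import SequenceMatcher


def normalize_reference(entry: str) -> str:
    """Lowercase an entry and replace punctuation runs with single spaces."""
    return re.sub(r"[^\w]+", " ", entry.lower()).strip()


def same_source(entry_a: str, entry_b: str, threshold: float = 0.9) -> bool:
    """Treat two bibliographic entries as one source if they are nearly equal.

    Assumption: the threshold of 0.9 is arbitrary and would need tuning
    on real reference lists.
    """
    a = normalize_reference(entry_a)
    b = normalize_reference(entry_b)
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

For instance, a made-up entry "Ivanov I. I. Machine Design. Kyiv: Tekhnika, 2015." and a variant with different punctuation and the misprint "Desing" would be matched, while an entry for a different book would not.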

One solution to this problem may be the future establishment of a single identifier for scientific publications and technical documents.

In the short term there are two possible solutions:

– development of special high-performance anti-plagiarism software;

– adaptation of the work being checked to the analytical capabilities of existing anti-plagiarism software.

Obviously, the second approach is preferable.

The author suggests the following ways of adapting the information under study:

– conversion of graphic information from vector to raster form;

– presentation of textual information in the simplest possible form.
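Why conversion to raster form helps can be shown with a small sketch: the same drawing may be described by different sets of vector primitives, but it produces the same pixels. The example assumes a drawing made only of straight segments with integer coordinates and uses the classical Bresenham line algorithm; the function name rasterize is hypothetical, and a real workflow would rely on an existing converter for the vector format in question.

```python
def rasterize(segments, width, height):
    """Draw integer-coordinate line segments onto a monochrome pixel grid.

    Assumption: the drawing consists only of straight segments given as
    ((x0, y0), (x1, y1)); curves, line widths and text are ignored.
    """
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        # Integer Bresenham line algorithm, valid for all directions.
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        x, y = x0, y0
        while True:
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = 1
            if x == x1 and y == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x += sx
            if e2 <= dx:
                err += dx
                y += sy
    return grid
```

A rectangle stored as four segments and the same rectangle stored with a different segment order, reversed directions and one split edge differ as vector data but give identical grids, so they can be compared pixel by pixel.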

The optimal choice is the HTML markup language with attached CSS style sheets; mathematical formulas can be written in LaTeX.

Advantages of the HTML/CSS format:

– complete document formatting;

– ease of learning;

– large number of free editors, including the WYSIWYG class;

– openness and lack of encryption;

– placement of information in appropriate containers (tags).

Requirements for editing a document in the HTML/CSS format:

– strictly uniform references to information sources;

– placement of quotations and other borrowings in special tags;

– minimal formatting.
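The requirement to place quotations in special tags could be exploited by a checker roughly as follows. This is a minimal sketch using Python's standard html.parser module; it assumes that quotations are marked with the standard HTML q and blockquote elements and that the cite attribute holds a source reference. The class name QuoteSeparator and the sample value "ref-3" are hypothetical.

```python
from html.parser import HTMLParser


class QuoteSeparator(HTMLParser):
    """Separate the author's own text from explicitly marked quotations."""

    QUOTE_TAGS = {"q", "blockquote"}

    def __init__(self):
        super().__init__()
        self.own_text = []   # text outside quotation tags
        self.quoted = []     # (source, text) pairs found inside quotation tags
        self._sources = []   # stack of cite values for currently open quotations

    def handle_starttag(self, tag, attrs):
        if tag in self.QUOTE_TAGS:
            self._sources.append(dict(attrs).get("cite"))

    def handle_endtag(self, tag):
        if tag in self.QUOTE_TAGS and self._sources:
            self._sources.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._sources:
            self.quoted.append((self._sources[-1], text))
        else:
            self.own_text.append(text)


# Usage with a made-up fragment of an explanatory note.
sample = ('<p>The spindle is rebuilt. <q cite="ref-3">Bearing preload '
          'shall be checked.</q> The result is verified.</p>')
separator = QuoteSeparator()
separator.feed(sample)
separator.close()
```

After parsing, own_text holds only the author's sentences and quoted holds the marked borrowing together with its source reference, so a checker could search for unmarked coincidences in own_text alone.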

However, anti-plagiarism software should be upgraded to support the HTML/CSS format.

The problems described can lead to false positive results. Therefore, supervisory authorities should not make decisions solely on the basis of the results produced by anti-plagiarism software. Each case must be considered individually. Any doubts should be interpreted in favour of the author.