Modern information technology / 2. Computer Science and Programming

 

 

Lishchuk O.O.

Vinnitsa National Technical University, Ukraine

Analysis of lossless image compression algorithms

 

Among the lossless algorithms, two compression schemes – Huffman coding and LZW coding (named from the initials of Lempel and Ziv, the original authors, and Welch, who substantially modified their method) – form the basis of many compression methods. These schemes represent two different approaches to data compression:

- statistical compression methods;

- dictionary (heuristic) compression methods.

Statistical algorithms (Huffman, Shannon-Fano, arithmetic coding) require knowledge of the probabilities with which symbols occur in the image; these are estimated from the symbol frequencies in the input data. In practice the true probabilities are unknown, so statistical algorithms can be divided into three classes [1]:

1. Non-adaptive – use a fixed, predetermined set of symbol probabilities. The table of symbol probabilities is not transmitted with the file, since it is known in advance. Disadvantage: only a narrow class of files achieves an acceptable compression ratio.

2. Semi-adaptive – for each file a table of symbol frequencies is built and used to compress that file. The symbol frequency table is transmitted together with the compressed file. These algorithms compress most files fairly well, but they require the additional transmission of the frequency table and two passes over the source file during encoding.

3. Adaptive – start with an initial fixed table of symbol frequencies (usually all symbols are at first considered equally probable), and this table is modified during compression according to the symbols actually encountered in the file. Advantages: the algorithm is single-pass, does not require transmission of the frequency table, and compresses a large class of files efficiently (a rough sketch of such a model follows this list).
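As an illustration of the adaptive approach, a minimal Python sketch is given below; the 256-symbol alphabet and the simple +1 increment rule are assumptions made for illustration, not details taken from the cited works. The model starts with all symbols equally probable and updates its frequency table as symbols arrive; the decoder repeats the same updates, so no table has to be transmitted.

# Minimal sketch of an adaptive symbol model (assumed details: 256-symbol
# alphabet, +1 increment per occurrence). A real coder would feed these
# probabilities to an entropy coder (Huffman or arithmetic) on the fly.
class AdaptiveModel:
    def __init__(self, alphabet_size=256):
        # Start with every symbol considered equally probable.
        self.freq = [1] * alphabet_size
        self.total = alphabet_size

    def probability(self, symbol):
        # Current probability estimate for the given symbol.
        return self.freq[symbol] / self.total

    def update(self, symbol):
        # Adjust the table after each symbol; the decoder performs the same
        # update, so both sides stay in sync without transmitting the table.
        self.freq[symbol] += 1
        self.total += 1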

Heuristic (dictionary) compression algorithms (such as LZ77 and LZ78) search the file for duplicate strings and build a dictionary of the phrases encountered.

Typically, these algorithms have a number of specific parameters (buffer size, maximum phrase length, etc.) whose selection depends on the experience of the author; the parameters are chosen to achieve an optimal balance between the running time of the algorithm, the compression ratio, and the class of files that compress well.
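As an illustration of the string-matching step on which LZ77-style algorithms are built, a minimal Python sketch is given below; the window size, the restriction to non-overlapping matches, and the brute-force search are simplifications assumed here for clarity.

def find_longest_match(data, pos, window_size=4096):
    # Search the sliding window (the last window_size bytes before pos)
    # for the longest string duplicating the text that starts at pos.
    # A real implementation would use hash chains or similar structures.
    best_offset, best_length = 0, 0
    start = max(0, pos - window_size)
    for candidate in range(start, pos):
        length = 0
        while (candidate + length < pos
               and pos + length < len(data)
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = pos - candidate, length
    # The match is encoded as an (offset, length) phrase.
    return best_offset, best_length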

RLE (Run-Length Encoding), or group encoding, is one of the oldest algorithms for coding graphics. Its essence is to replace runs of repeated symbols with a counter and the repeated symbol. The problem is that, during decompression, the decoder must be able to distinguish such coded runs from literal symbols in the resulting stream. The obvious solution is to attach headers to all chains (e.g., to use the first bit as a flag marking a coded run). The method is quite effective for graphics in the "byte per pixel" format (e.g., the RSH format uses RLE encoding).
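A minimal Python sketch of this idea is given below; the use of the high bit of the header byte as the run flag and the 127-byte run limit are assumptions made for illustration.

def rle_encode(data: bytes) -> bytes:
    # Replace each run of repeated bytes with a header byte and the byte itself.
    # Header with the high bit set: coded run, low 7 bits give the run length.
    # Header with the high bit clear: that many literal bytes follow
    # (a real coder would group consecutive literals under one header).
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 127:
            run += 1
        if run > 1:
            out += bytes([0x80 | run, data[i]])   # coded run
            i += run
        else:
            out += bytes([1, data[i]])            # single literal byte
            i += 1
    return bytes(out)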

However, the shortcomings of the algorithm are also obvious: in particular, it is poorly suited to many commonly encountered types of files, such as text. It can therefore be used effectively only in conjunction with a secondary encoding. This approach is reflected in fax coding: the image is first split into black and white dots, which the RLE algorithm turns into a stream of run lengths, and these run lengths are then Huffman-encoded with a specially chosen (experimentally derived) tree [2].

Statistical Huffman coding maps input symbols, represented by bit strings of equal length (for example, eight-bit bytes), to bit strings of variable length. The length of a symbol's binary code is proportional to the logarithm of its frequency taken with the opposite sign. This coding is a prefix code, which makes it easy to decode with a single-pass algorithm. A prefix code is conveniently represented as a binary tree in which the symbols are the leaves and the edges are labeled 0 or 1; the code of a symbol is then given by the path from the root of the tree to the leaf containing that symbol.

 Compression is performed in two passes:

1. During the first pass, the frequencies of the symbols in the source file are estimated.

2. During the second pass, these symbols are replaced by prefix codes of variable length.

The disadvantages of Huffman coding are that the compression ratio depends on how close the symbol probabilities are to negative powers of two, and that hardware implementation is complicated, since the image frame must be stored because encoding is performed in two passes.
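A minimal two-pass Huffman sketch in Python is given below; the tie-breaking order and the assignment of codes by prepending bits to whole subtrees are implementation choices assumed here, not prescribed by the method itself.

import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    # Pass 1: estimate symbol frequencies from the source data.
    freq = Counter(data)
    # Each heap entry: (subtree frequency, tie-breaker, symbols in the subtree).
    heap = [(f, idx, [sym]) for idx, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in freq}
    next_id = len(heap)
    # Build the code tree: repeatedly merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        # Symbols in one subtree get a leading 0, in the other a leading 1.
        for sym in s1:
            codes[sym] = "0" + codes[sym]
        for sym in s2:
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, (f1 + f2, next_id, s1 + s2))
        next_id += 1
    # Pass 2 would replace each input symbol with codes[symbol].
    return codes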

LZW compression replaces strings of characters with codes. This is done without any analysis of the input text; instead, each new string encountered is added to a table of strings. Compression occurs when a code replaces a string of characters. The codes generated by the LZW algorithm can be of any length, but they must contain more bits than a single character. The first 256 codes (when 8-bit characters are used) correspond by default to the standard character set; the remaining codes correspond to the strings processed by the algorithm.

During both compression and decompression, LZW manipulates three objects: a stream of characters, a stream of codes, and a string table. During compression the character stream is the input and the code stream is the output; during decompression the code stream is the input and the character stream is the output. The string table is generated both during compression and during decompression, but it is never passed from the compressor to the decompressor or vice versa.
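A minimal Python sketch of the LZW compressor described above is given below; for simplicity the codes are returned as a list of integers, and packing them into a bit stream of growing code width is omitted.

def lzw_encode(data: bytes) -> list:
    # The string table initially holds the 256 single-byte strings; new
    # strings found in the input are added as they are encountered. The
    # decoder rebuilds the same table itself, so it is never transmitted.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    result = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate             # keep extending the current string
        else:
            result.append(table[current])   # output the code of the known string
            table[candidate] = next_code    # add the new string to the table
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(table[current])
    return result  # a stream of codes, each wider than one 8-bit character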

Arithmetic coding achieves a high degree of data compression, especially for data in which the frequencies of the different symbols differ greatly from one another. At the same time, the arithmetic coding procedure requires substantial computing resources, and until recently the method was little used for image coding because of the slow operation of the algorithm and, accordingly, the significant delay in data transmission, although high compression ratios can be expected from its application to image coding.
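To illustrate the principle, a floating-point Python sketch is given below; it is suitable only for short messages because of rounding, a practical coder uses integer arithmetic with renormalization, and the probability table is assumed to be given rather than estimated.

def arithmetic_encode(message, probs):
    # probs: dict mapping symbol -> probability.
    # Each symbol narrows the interval [low, high) to the sub-interval
    # assigned to it, so rare symbols narrow the interval more and
    # therefore cost more bits.
    cumulative = {}
    running = 0.0
    for sym, p in probs.items():
        cumulative[sym] = (running, running + p)
        running += p
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        sym_low, sym_high = cumulative[sym]
        low, high = low + span * sym_low, low + span * sym_high
    return (low + high) / 2  # any number in [low, high) identifies the message

# Example: strongly skewed probabilities squeeze repeated 'a's into a
# narrow interval, i.e. a short code.
code = arithmetic_encode("aaab", {"a": 0.9, "b": 0.1})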

 

Literature

1. Kozhemiako V.P., Maydanyuk V.P., Zhukov K.M. Analysis and prospects of image encoding // Bulletin of VPI, 1999, No. 3. – P. 42-48.

2. Vatolin D.S. Image compression algorithms. – Moscow: Publishing Department of the Faculty of Computational Mathematics and Cybernetics, MSU, 1999. – 76 p.