Modern information technology / 2. Computer Science and Programming
Lishchuk O.O.
Vinnitsa National Technical University, Ukraine
Analysis of lossless image compression algorithms
Among lossless algorithms, two compression schemes – Huffman coding and LZW encoding (named with the initials of Lempel and Ziv, its authors, and Welch, who substantially modified it) – form the basis of many compression methods. These schemes represent two different approaches to data compression:
- statistical compression methods;
- dictionary (heuristic) compression methods.
Statistical algorithms (Huffman, Shannon-Fano, arithmetic coding) require knowledge of the probabilities with which characters occur in the image, which are estimated from the character frequencies in the input. Typically these probabilities are unknown in advance. Accordingly, statistical algorithms can be divided into three classes [1]:
1. Non-adaptive – use a fixed set of symbol probabilities defined beforehand. The table of character probabilities is not transmitted with the file, since it is known in advance. Disadvantage: only a short list of file types achieves an acceptable compression ratio.
2. Semi-adaptive – a table of symbol frequencies is built for each file and used to compress that file. The character frequency table is transmitted together with the compressed file. These algorithms compress most files fairly well, but they require the additional transmission of the frequency table and two passes over the source file during encoding.
3. Adaptive – begin with a fixed initial frequency table (usually all characters are at first considered equally probable), and this table changes during processing depending on the characters encountered in the file (see the sketch after this list). Advantages: the algorithm is single-pass, does not require transmission of the frequency table, and efficiently compresses a large class of files.
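As a rough illustration of the adaptive class, the Python sketch below (an example assumed for this text, not taken from the cited works) maintains a frequency table that starts with all symbols equally probable and is updated after every processed symbol; the decoder repeats the same updates in the same order, so the table never needs to be transmitted:

from collections import defaultdict

class AdaptiveModel:
    """Starts with all symbols equally probable; adapts as symbols arrive."""

    def __init__(self, alphabet_size=256):
        # Every symbol begins with a count of 1, i.e. equally probable.
        self.counts = defaultdict(lambda: 1)
        self.total = alphabet_size

    def probability(self, symbol):
        # Current probability estimate for the symbol.
        return self.counts[symbol] / self.total

    def update(self, symbol):
        # Adapt the table after each symbol; no frequency table is
        # transmitted, since the decoder performs identical updates.
        self.counts[symbol] += 1
        self.total += 1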
Heuristic (dictionary) compression algorithms (such as LZ77 and LZ78) look for repeated strings in the file and build a dictionary of phrases that have already been encountered. Typically these algorithms have a number of specific parameters (buffer size, maximum phrase length, etc.) whose selection depends on the author's experience; the parameters are chosen to achieve an optimal balance between the running time of the algorithm, the compression ratio, and the class of files that compress well.
RLE (Run-Length Encoding), or group encoding, is one of the oldest graphics coding algorithms. Its essence is to replace repeated characters with a counter and the repeated symbol. The problem is that during decompression the decoder must be able to distinguish coded runs from ordinary characters in the resulting stream. The obvious solution is to give all chains headers (e.g., to use the first bit as a flag marking a coded run). The method is quite effective for graphics in "byte per pixel" format (for example, the PCX format uses RLE encoding).
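A minimal Python sketch of such a scheme (illustrative only; the packet layout is an assumption chosen to match the "first bit as a run flag" idea above, not the layout of any particular format):

def rle_encode(data: bytes) -> bytes:
    """Pack bytes into run packets (0x80 | n, value) and literal packets
    (n, b1 .. bn), with n in 1..127; the first bit of the header byte
    marks a coded run, as described in the text."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 127:
            run += 1
        if run > 1:
            out += bytes([0x80 | run, data[i]])    # coded run: flag bit set
            i += run
        else:
            j = i + 1                              # collect literal bytes
            while (j < len(data) and j - i < 127
                   and not (j + 1 < len(data) and data[j + 1] == data[j])):
                j += 1
            out.append(j - i)                      # flag bit clear: literals
            out += data[i:j]
            i = j
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        if header & 0x80:                          # run packet
            out += bytes([packed[i + 1]]) * (header & 0x7F)
            i += 2
        else:                                      # literal packet
            out += packed[i + 1:i + 1 + header]
            i += 1 + header
    return bytes(out)

Each run of two or more identical bytes shrinks to two bytes, which is why the method suits "byte per pixel" graphics with large uniform areas.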
However, the shortcomings of the algorithm are obvious: in particular, it is poorly adapted to many common types of files, such as text. Therefore it can be used effectively only in combination with a secondary encoding. This approach is reflected in fax coding: the image is first divided into black and white dots, the RLE algorithm turns this into a stream of run lengths, and these run lengths are then encoded by Huffman coding with a specially chosen (experimentally derived) tree [2].
Statistical Huffman encoding maps input characters represented by bit strings of equal length (for example, eight-bit bytes) to bit strings of variable length. The length of a symbol's binary code is proportional to the logarithm of its frequency taken with the opposite sign; for instance, a symbol occurring with probability 1/8 receives a code of about -log2(1/8) = 3 bits. This code is a prefix code, which makes it easy to decode with a single-pass algorithm. A prefix code is conveniently represented as a binary tree in which the characters are the leaves and the edges are labeled 0 or 1; the code of a character is then given by the path from the root of the tree to the leaf containing that character.
Compression is performed in two passes (see the sketch after this list):
1. During the first pass, the frequencies of the characters in the original file are estimated.
2. During the second pass, these characters are replaced by prefix codes of variable length.
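The following Python sketch (assumed for illustration, not an implementation from the cited literature) shows both passes: Counter performs the frequency estimation, and a heap-based tree construction assigns shorter codes to more frequent bytes:

import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """First pass: count frequencies, build the code tree, and return a
    {byte: bit-string} table of prefix codes. Assumes a non-empty input
    with at least two distinct byte values."""
    freq = Counter(data)
    # Each heap entry: (frequency, tie-breaker, {byte: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prepend 0/1 to their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_encode(data: bytes) -> str:
    """Second pass: replace each byte by its variable-length prefix code."""
    codes = huffman_codes(data)
    return "".join(codes[b] for b in data)

For example, huffman_encode(b"abracadabra") assigns the one-bit code to the frequent byte "a" and longer codes to the rare bytes.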
The disadvantages of Huffman coding are that the compression ratio depends on how close the symbol probabilities are to negative powers of two, and that hardware implementation is complicated by the need to store the image frame in memory, since encoding is performed in two passes.
LZW compression replaces strings of characters with codes. This is done without any analysis of the input text: instead, each newly seen string is added to a table of strings. Compression occurs when a code replaces a string of characters. The codes generated by the LZW algorithm can be of any length, but must contain more bits than a single character. The first 256 codes (when 8-bit characters are used) correspond by default to the standard character set; the remaining codes correspond to strings processed by the algorithm.
During compression and restoration, LZW manipulates three objects: a stream of characters, a stream of codes, and a table of strings. During compression the character stream is the input and the code stream is the output; during restoration the code stream is the input and the character stream is the output. The string table is generated both during compression and during restoration, but it is never passed from the compressor to the restorer or vice versa.
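A compact sketch of the LZW compressor in Python (an illustration assumed for this text; the fixed code widths and table-reset policies of real implementations are omitted):

def lzw_encode(data: bytes) -> list:
    """Minimal LZW compressor. The first 256 codes are the single-byte
    strings; every new string seen gets the next free code. The table is
    rebuilt identically by the decoder, so it is never transmitted."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate           # keep extending the match
        else:
            codes.append(table[current])  # emit code for the longest match
            table[candidate] = next_code  # add the new string to the table
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

Running lzw_encode(b"ABABAB") yields [65, 66, 256, 256]: the string "AB" is added to the table after its first occurrence and is afterwards emitted as the single code 256.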
Arithmetic coding achieves a high degree of data compression, especially on data where the frequencies of the different characters differ greatly from one another. At the same time, the procedure of arithmetic coding requires substantial computing resources, and until recently the method saw little use in image encoding because of the slow operation of the algorithm and, accordingly, the significant delay it introduces into data transmission. Nevertheless, high compression ratios can be expected from its application to image coding.
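To make the interval-narrowing idea concrete, here is a toy float-based sketch (assumed for illustration; practical coders use integer arithmetic with renormalization, since floating-point precision is exhausted after a few dozen symbols):

def arithmetic_encode(data: str, probs: dict) -> float:
    """Narrow the interval [low, high) by each symbol's probability slice;
    any number inside the final interval encodes the whole message."""
    # Cumulative probability of all symbols preceding each symbol.
    cum, acc = {}, 0.0
    for sym in sorted(probs):
        cum[sym] = acc
        acc += probs[sym]
    low, high = 0.0, 1.0
    for sym in data:
        width = high - low
        high = low + width * (cum[sym] + probs[sym])
        low = low + width * cum[sym]
    return (low + high) / 2  # any value in [low, high) would do

# Frequent symbols shrink the interval less, so they cost fewer bits.
print(arithmetic_encode("AAAB", {"A": 0.75, "B": 0.25}))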
Literature
1. Kozhemiako V.P., Maydanyuk V.P., Zhukov K.M. Analysis and prospects of image encoding // Bulletin of VPI. 1999. No. 3. P. 42-48.
2. Vatolin D.S. Image compression algorithms. Moscow: Publishing Department of the Faculty of Computational Mathematics and Cybernetics, Moscow State University (MSU), 1999. 76 p.