Novikov A.O.
National Research
Irkutsk State Technical University
Lossless and lossy data compression
Nowadays we have high-capacity storage media
and high-speed data links. At the same time, the volume of transmitted data is
constantly growing. Take any HD film as an example: its size on disk
may reach dozens of gigabytes. One might ask why we still need data
compression, but there are situations where we cannot work without it, for
example:
‒ Mailing documents (especially large documents sent from mobile devices);
‒ Publishing documents on websites, where traffic economy matters;
‒ Saving disk space in cases where replacing or extending storage hardware is
difficult, for example when capital expenditure cannot be approved and free
disk space has run out.
All compression methods can be divided into two
groups: lossy compression techniques and lossless compression techniques.
Lossless compression is used when the information must be reconstructed
exactly, bit for bit; for text, for example, only lossless compression is
acceptable. In other cases an exact reconstruction is not required, so lossy
compression can be used. Compared with lossless compression, lossy compression
is simpler to implement and gives a higher compression ratio.
Lossy compression
technique. This type of compression gives the best compression ratio while
preserving good data quality. It is used for compressing analogue data such as
sound or images. The decompressed file may differ from the original at the
bit level, but in most cases the difference is indistinguishable to the human
ear or eye.
Lossless compression
technique. The data is reconstructed exactly, bit for bit, so no information
is lost. On the other hand, lossless compression gives a lower compression
ratio.
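The bit-exact guarantee of lossless compression can be shown with a short Python sketch using the standard zlib module (a DEFLATE codec, combining an LZ method with Huffman coding):

```python
import zlib

# Lossless round trip: decompress(compress(x)) must equal x exactly.
text = b"The quick brown fox jumps over the lazy dog. " * 100
compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

assert restored == text  # bit-exact reconstruction, no information lost
print(f"{len(text)} -> {len(compressed)} bytes")
```

The repetitive input compresses to a small fraction of its size, yet the assertion confirms the original is recovered exactly.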
So which of these types of compression is better, and
which should we choose for a given kind of data? Let us examine the
main methods of each type. In general, compression algorithms fall into
three basic groups.
The first group is stream (dictionary) methods. New incoming
uncompressed data is described in terms of data already processed. No
probability estimates are computed here; symbol encoding is based only on the
data observed so far. Typical examples are the LZ methods (named after the
first letters of their inventors' surnames, Abraham Lempel and Jacob Ziv). In
these methods, the second and further occurrences of a substring already known
to the coder are replaced with references to its first occurrence. LZ
compression is used in GIF and many other formats.
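The back-reference idea behind the LZ methods can be sketched in a few lines of Python. This is a toy LZ77 variant, not the exact algorithm used in GIF (which uses LZW); it emits (offset, length, next character) triples, where a nonzero offset points back to an earlier occurrence of the same substring:

```python
def lz77_compress(data, window=255):
    """Toy LZ77: replace repeated substrings with back-references
    (offset, length) to their earlier occurrence, plus one literal char."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for off in range(max(0, i - window), i):
            length = 0
            # Matches may overlap the current position (classic LZ77).
            while (i + length < len(data) - 1
                   and data[off + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - off, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # copy from the earlier occurrence
        out.append(ch)
    return "".join(out)
```

On repetitive input the triple list is much shorter than the input, and decompression rebuilds the data exactly, illustrating that LZ methods are themselves lossless.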
The second group is statistical compression methods. These in turn are
divided into adaptive data compression and block-oriented data compression. In
the adaptive variant, probabilities for new data are estimated from the data
already processed; this group includes the adaptive Huffman and Shannon–Fano
algorithms. In the block-oriented variant, the statistics of each block of
data are computed separately and attached to the compressed block; this group
includes the static Huffman and Shannon–Fano methods and arithmetic coding.
Huffman coding is probably the most popular data compression method. Its
simplicity and clarity have made it an academic front-runner, but Huffman
codes also have practical uses: static Huffman codes are applied at the last
stage of JPEG compression, and the MNP-5 modem compression standard uses
dynamic Huffman compression as part of its process. Finally, Shannon–Fano
coding, which is close to Huffman coding, is used as one of the stages of the
imploding algorithm in the PkZip program.
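A minimal Huffman code construction can be written with a heap that repeatedly merges the two least frequent subtrees. This is a sketch of the principle, not the exact table layout used by JPEG or PkZip:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent symbols get shorter codes."""
    # Heap entries: [frequency, tiebreaker, tree]; a tree is either
    # a symbol (leaf) or a (left, right) pair of subtrees.
    heap = [[f, i, s] for i, (s, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, [f1 + f2, count, (a, b)])
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For the string "abracadabra", the most frequent symbol 'a' receives the shortest code, so the encoded bit string is far shorter than the 8 bits per character of plain ASCII.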
The third group is the so-called block-transform methods.
Incoming data is divided into blocks, which are then transformed as a whole.
Some of these transforms, especially those based on block permutation, do not
by themselves reduce the data size significantly. However, after such a
transform the structure of the data improves considerably, and subsequent
compression by other algorithms becomes more effective and faster.
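The text names no specific block transform; the classic example of a block permutation that does not shrink the data but makes it much easier to compress is the Burrows-Wheeler transform, sketched here:

```python
def bwt(block):
    """Burrows-Wheeler transform of one block: sort all rotations of
    the block and take the last column. The output is a permutation of
    the input (same length), but tends to contain long runs of equal
    characters that later stages (RLE, Huffman) compress well."""
    s = block + "\0"  # unique end marker makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)
```

For example, `bwt("banana")` groups the repeated letters together, which is exactly the "better structure" that helps the following compression stage. Real compressors (such as bzip2) pair this transform with run-length and entropy coding.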
There are two main schemes of lossy compression:
1. In transform coders, frames of a picture or sound are mapped into a new
basis and then quantized. The transform can cover the whole frame (for
example, in schemes based on the wavelet transform) or be applied block by
block (the typical example is JPEG). The result is then compressed by an
entropy method.
2. In predictive coders, previous and/or future data are used to predict the
current sample of the picture or sound. The error between the predicted and
the real data, together with additional information, is then compressed; in
other words, predictive coders compress the error signal generated at the
prediction stage.
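The quantization step shared by both schemes can be illustrated in isolation. This sketch shows uniform quantization of transform coefficients; the step size of 10 is an arbitrary choice for illustration, and controls the trade-off between reconstruction error and compressibility:

```python
def quantize(coeffs, step):
    """Uniform quantization: the lossy stage of a transform coder.
    Small coefficients collapse to zero, producing the long zero runs
    that the entropy stage compresses well."""
    return [round(c / step) for c in coeffs]

def dequantize(qcoeffs, step):
    """Approximate reconstruction; the rounding error is unrecoverable."""
    return [q * step for q in qcoeffs]

coeffs = [3.2, -7.9, 0.4, 12.0]
q = quantize(coeffs, 10)          # -> [0, -1, 0, 1]
restored = dequantize(q, 10)      # -> [0, -10, 0, 10]
```

The reconstruction differs from the original, but each coefficient is off by at most half the step size: this bounded, perceptually tuned error is what makes the compression "lossy but indistinguishable".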
Lossy compression is used for graphics
(JPEG), sound (MP3), and video (MPEG); in short, in cases where, because of
the large file sizes, the compression ratio is very important and we can
discard details that are not essential for understanding the information.
Video compression offers special opportunities: in many cases the main part of
a picture passes from frame to frame without change, which allows compression
algorithms based on tracking parts of the picture. In the particular case of a
speaking person who does not change position, only the area of the face or
mouth (the part with the most frequent frame-to-frame changes) needs to be
updated.
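The frame-to-frame idea can be sketched as a simple delta coder that stores only the pixels that changed since the previous frame. This is a toy model; real codecs such as MPEG use motion-compensated blocks rather than per-pixel deltas:

```python
def frame_delta(prev, curr):
    """Inter-frame coding sketch: store only (index, new_value) pairs
    for pixels that differ from the previous frame."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    """Rebuild the current frame from the previous one plus the delta."""
    frame = list(prev)
    for i, v in delta:
        frame[i] = v
    return frame
```

For a frame where only the mouth region changes, the delta contains only those few pixels, so a nearly static scene costs almost nothing per frame.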
In conclusion, each of these types of compression suits its own kinds of
data: lossless compression where exact reconstruction matters, and lossy
compression where some quality can be traded for a smaller size.