Novikov A.O.

National Research Irkutsk State Technical University

Lossless and lossy data compression

Nowadays we have high-capacity storage media and high-speed data links. At the same time, the amount of data being transmitted is constantly growing: a single HD film, for example, can occupy dozens of gigabytes on disk. One might therefore ask why we still need data compression, but there are situations where we cannot do without it, for example:

- Sending documents by e-mail (especially large documents sent from mobile devices);

- Publishing documents on websites, where saving traffic matters;

- Saving disk space in cases where replacing or extending storage hardware is difficult, for example when there is no budget for capital expenditure and free disk space has run out.

All compression methods can be divided into two groups: lossy techniques and lossless techniques. Lossless compression is used when the information must be reconstructed accurately to the bit; for text, for example, it is the only acceptable option.

In some cases exact reconstruction is unnecessary, and lossy compression can be used instead. Compared with lossless compression, it is simpler to implement and achieves much higher compression ratios.

Lossy compression. This type of compression gives the best compression ratio while preserving good quality of the data. It is used for compressing analogue data such as sound or images. The decompressed file may differ from the original at the bit level, but in most cases the difference is indistinguishable to the human ear or eye.

Lossless compression. The data is reconstructed accurately to the bit, so no information is lost. In exchange, lossless compression achieves a lower compression ratio.

So which of these types of compression is better, and which should be chosen for a given kind of data? Let us examine the main methods used in each case. On the whole, compression algorithms fall into three basic groups.

The first group of methods is stream (dictionary) reduction. New incoming uncompressed data is described in terms of the data already processed. No explicit probabilities are calculated here; encoding is based directly on the data observed so far. Typical examples are the LZ methods (named after the first letters of their inventors' surnames, Abraham Lempel and Jacob Ziv). In this approach, the second and subsequent occurrences of a string already known to the encoder are replaced with references to its first occurrence. LZ compression is used in GIF and many other formats.
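The LZ idea of replacing repeated substrings with back-references can be illustrated with a toy sketch (a simplified LZ77-style coder written for this article; it is not the exact variant used in GIF, which is LZW-based):

```python
def lz77_compress(data, window=255):
    """Toy LZ77: emit (offset, length, next_char) tuples.
    Repeated substrings become back-references into already-seen data."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        for j in range(start, i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and j + length < i):  # match stays inside seen data
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len] if i + best_len < len(data) else ""
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decompress(tokens):
    buf = []
    for off, length, nxt in tokens:
        for _ in range(length):
            buf.append(buf[-off])  # copy from the earlier occurrence
        if nxt:
            buf.append(nxt)
    return "".join(buf)
```

For the string "abcabcabc", the second and third "abc" collapse into a single (offset, length) reference to the first occurrence.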

The second group of methods is statistical compression. These methods are in turn divided into adaptive (stream) compression and block compression. In the adaptive variant, symbol probabilities for new data are estimated from the data already processed; examples are the adaptive versions of the Huffman and Shannon–Fano algorithms. In the block variant, statistics are computed separately for each block of data and appended to the compressed block itself; this group includes the static variants of Huffman and Shannon–Fano coding as well as arithmetic coding. Huffman coding is probably the most popular data-compression method: its simplicity and clarity have made it an academic front-runner, but Huffman codes also have plenty of practical uses. Static Huffman codes are applied at the final stage of JPEG compression, and the MNP-5 modem compression standard uses dynamic Huffman coding as part of its pipeline. Finally, Shannon–Fano coding, which is closely related to Huffman coding, is used as one of the stages of the imploding algorithm in the PkZip program.
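The construction of a Huffman code can be sketched in a few lines (a minimal illustration; the function name and representation are made up for the example). The two least frequent subtrees are repeatedly merged, so frequent symbols end up near the root and receive short codewords:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter codewords."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, tree), where a tree is
    # either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For "abracadabra" the letter "a" (5 occurrences) gets a one-bit codeword, while the rare "c" and "d" get longer ones, and no codeword is a prefix of another.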

The third group is the so-called block-transform methods. The incoming data is split into blocks, each of which is then transformed. Some of these methods, especially those based on symbol permutation within a block, do not by themselves reduce the data size much. However, they considerably improve the structure of the data, so that subsequent compression by other algorithms becomes more effective and faster.
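A well-known transform of exactly this kind is the Burrows–Wheeler transform used in bzip2 (an illustrative choice, not one the text names). It permutes the symbols of a block reversibly without shrinking it, but clusters equal symbols together so that a later compressor does better:

```python
def bwt(block):
    """Burrows-Wheeler transform of one block: sort all rotations of
    the block and take the last column. Reversible, size-preserving."""
    s = block + "\x00"  # sentinel marks the original rotation
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(last_col):
    """Rebuild the rotation table column by column, then pick the row
    that ends with the sentinel: that row is the original block."""
    table = [""] * len(last_col)
    for _ in range(len(last_col)):
        table = sorted(c + row for c, row in zip(last_col, table))
    row = next(r for r in table if r.endswith("\x00"))
    return row.rstrip("\x00")
```

The output of `bwt("mississippi")` contains long runs of identical letters, which run-length or entropy coders then compress easily.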

There are two main schemes of lossy compression:

1. In transform coders, frames of image or sound data are mapped into a new basis, and quantization takes place. The transform may cover a whole frame (for example, in schemes based on the wavelet transform) or proceed block by block (the typical example is JPEG). The result is then compressed by an entropy coder.

2. In predictive coders, previous and/or subsequent data are used to predict the current sample of the image or sound. The error between the predicted and the actual data, together with any additional information needed, is then compressed; entropy coders are applied to the error signal generated at the prediction stage.
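Both schemes can be sketched in miniature (a toy illustration, not any particular codec: the step size and the one-tap previous-sample predictor are assumptions made for the example). The quantizer stands in for the lossy step of a transform coder; the predictor shows why prediction errors compress well:

```python
def quantize(coeffs, step):
    """Lossy step of a transform coder: snap each coefficient to the
    nearest multiple of `step`; fine detail is discarded here."""
    return [round(x / step) for x in coeffs]

def dequantize(indices, step):
    """Approximate reconstruction: close to, but not equal to, the input."""
    return [i * step for i in indices]

def predict_errors(samples):
    """Predictive coder: predict each sample as the previous one and
    keep only the prediction errors, which are small for smooth data."""
    prev, errors = 0, []
    for x in samples:
        errors.append(x - prev)
        prev = x
    return errors

def reconstruct(errors):
    """Undo the prediction by accumulating the errors."""
    prev, out = 0, []
    for e in errors:
        prev += e
        out.append(prev)
    return out
```

After quantization the coefficients take only a few distinct values, which an entropy coder exploits; likewise, a smooth signal such as [100, 101, 103, 104] turns into the small errors [100, 1, 2, 1].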

Lossy compression is used for graphics (JPEG), sound (MP3) and video (MPEG), in short, wherever the large size of the files makes the compression ratio crucial and we can sacrifice details that are not essential for understanding the content. Video compression offers additional opportunities. Often most of the picture passes from frame to frame almost unchanged, which makes it possible to build compression algorithms that track the movement of individual parts of the picture. In a particular case, the image of a talking person who does not change position may need updating only in the region of the face or mouth, the part where changes from frame to frame are most frequent.
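The idea of updating only the changed parts of a frame can be sketched as block-wise frame differencing (a hypothetical minimal version written for this article; real video codecs such as MPEG use full motion compensation, which is considerably more elaborate):

```python
def changed_blocks(prev_frame, cur_frame, block=8, threshold=0):
    """Compare two frames block by block and return only the blocks
    that changed. A static background costs nothing to transmit;
    a moving region (e.g. a speaker's face) is re-sent."""
    updates = []
    h, w = len(cur_frame), len(cur_frame[0])
    for y in range(0, h, block):
        for x in range(0, w, block):
            diff = sum(
                abs(cur_frame[r][c] - prev_frame[r][c])
                for r in range(y, min(y + block, h))
                for c in range(x, min(x + block, w))
            )
            if diff > threshold:
                rows = [cur_frame[r][x:x + block]
                        for r in range(y, min(y + block, h))]
                updates.append((y, x, rows))  # position + new pixels
    return updates
```

On a 16x16 frame where a single pixel changed, only one of the four 8x8 blocks needs to be transmitted; the other three are reused from the previous frame.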

In conclusion, it is worth saying that each of these types of compression is suited to its own kinds of data.

 
