2.7.3 MPEG-1 Compression Stages

Encoding MPEG is a much more complex process than decoding it. Software encoders exist [Labriola, 1995] but are much slower than their hardware counterparts. Quality can be improved by starting from high-quality sources; for Video CDs, filtering followed by artifact checks and additional post-processing improves visual quality considerably. Producing MPEG-encoded video generally involves all of the steps shown in figure 14.

Figure 14. Compression stages for MPEG-1 compressed video.

Stage 1: Preliminary Scaling and Colour Conversion
Original video signals in NTSC or PAL format are first sampled and then decimated by 2:1 in both the horizontal and vertical directions. The result is a format known as SIF (Source Input Format or Standard Interchange Format). SIF from NTSC video is 352 pixels by 240 lines at 30 fps. In Europe, for PAL/SECAM, SIF is 352 pixels by 288 lines at 25 fps. Both versions carry the same number of samples per second (352 x 240 x 30 = 352 x 288 x 25 = 2,534,400), which leads to identical bitrates. MPEG-2 avoids the decimation used by MPEG-1 and improves quality by compressing the full CCIR-601 original. RGB colour information is converted to luminance (Y) and chrominance (Cb and Cr) values.
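The arithmetic and the colour conversion in this stage can be sketched in a few lines of Python. The function names are hypothetical, and the conversion assumes full-range 8-bit values with the ITU-R BT.601 luminance weights; the source does not say which variant of the conversion an encoder uses, so this is an illustration rather than the encoder's actual method.

```python
def sif_sample_rate(width, height, fps):
    """Luminance samples per second for a given picture format."""
    return width * height * fps


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumes full-range values (0-255) and BT.601 luminance weights.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (b - y) / 1.772  # scaled blue-difference signal
    cr = 128 + (r - y) / 1.402  # scaled red-difference signal
    # Round and clamp each component to the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


ntsc_rate = sif_sample_rate(352, 240, 30)
pal_rate = sif_sample_rate(352, 288, 25)
```

Both rates evaluate to the same number, and a neutral grey or white pixel maps to chrominance values of 128, the "no colour" midpoint under this assumed convention.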

Stage 2: Colour Subsampling
The human eye is more sensitive to changes in brightness than in hue [Labriola, 1995]. Since reducing the amount of colour data within an image has little effect on perceived quality, 75% of the colour data is discarded by 2:1 horizontal and 2:1 vertical downsampling. The resulting format is known as 4:2:0 in A:B:C notation: each 2 x 2 group of four Y samples shares a single Cb sample and a single Cr sample. Other possible formats are 4:2:2 and 4:4:4, but both require higher bitrates.
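A minimal sketch of the 2:1 by 2:1 chrominance downsampling follows. It averages each 2 x 2 group of samples, which is an assumption made for illustration (the text does not specify the filter), and the function name is hypothetical. The plane is a list of rows with even width and height.

```python
def subsample_420(plane):
    """Halve a chrominance plane in both directions.

    Each output sample is the rounded average of a 2 x 2 group of
    input samples. Assumes even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


cb = [
    [100, 102, 50, 50],
    [104, 106, 50, 50],
    [0, 0, 200, 200],
    [0, 0, 200, 204],
]
cb_420 = subsample_420(cb)
```

The 16 input samples become 4 output samples, so 75% of the data for that colour plane is discarded while the Y plane is left at full resolution.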

Stage 3: Discrete Cosine Transform (DCT)
A two-dimensional DCT is applied to 8 pixel by 8 line blocks of either the image itself or the difference between two images. This produces blocks of DCT coefficients with 11 bits of amplitude information for 8-bit pixels [Tudor, 1992], an increase in the amount of data. Having transformed the image from the spatial domain to the frequency domain in this way, the DCT coefficients are scanned to form an output stream in a specific way (see section 2.4). For typical blocks of natural images, the distribution of coefficients is non-uniform and energy is concentrated in the lower frequency components. Scanning in the zigzag pattern of figure 8 (section 2.4) clusters these significant coefficients together and leaves the low-value coefficients in long strings, which maximises the efficiency of the run-length encoding.
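The transform and the scan can be illustrated with a direct, unoptimised Python sketch. It assumes the orthonormal DCT-II and the conventional zigzag order starting at the DC term; the function names are hypothetical, and real encoders use fast factorised transforms rather than this quadruple loop.

```python
import math

N = 8  # block size: 8 pixels by 8 lines


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of rows)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, col) positions in zigzag order, lowest frequencies first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals, alternating direction on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


flat = [[255] * N for _ in range(N)]
coeffs = dct_2d(flat)
scanned = [coeffs[r][c] for r, c in zigzag_order()]
```

With this normalisation a flat block of 255s gives a DC term of 8 x 255 = 2040 and (numerically) zero everywhere else. Representing 2040 needs 11 bits, which is consistent with the figure quoted from [Tudor, 1992].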

Stage 4: Quantisation
Quantisation, the lossy step of the process, is then applied with non-uniform quantisation levels. In practice, quantisation levels are coarser at higher frequencies because quantisation noise at lower frequencies is more noticeable to a human observer [Tudor, 1992]. Quantisation reduces the number of values a coefficient can take (this is where quantisation noise is introduced) and makes it more likely that adjacent coefficients are identical. Quantisation can also be used to control the bitrate of the output stream: fewer quantisation levels increase the compression ratio, since fewer bits are required to represent the levels.
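The effect of frequency-dependent step sizes can be sketched as follows. The step-size formula is purely illustrative and is not the MPEG-1 quantisation matrix; the function names and the `scale` parameter are likewise hypothetical stand-ins for an encoder's bitrate control.

```python
def step_size(u, v, scale=1):
    """Illustrative step size that grows with frequency (u + v).

    NOT the MPEG-1 matrix; 'scale' mimics a bitrate-control factor.
    """
    return (8 + 4 * (u + v)) * scale


def quantise(coeffs, scale=1):
    """Map each DCT coefficient to an integer level."""
    return [[round(coeffs[u][v] / step_size(u, v, scale)) for v in range(8)]
            for u in range(8)]


def dequantise(levels, scale=1):
    """Reconstruct approximate coefficients from the levels."""
    return [[levels[u][v] * step_size(u, v, scale) for v in range(8)]
            for u in range(8)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0   # DC term
coeffs[0][1] = 20.0    # low-frequency detail
coeffs[7][7] = 30.0    # high-frequency detail

fine = quantise(coeffs)
coarse = quantise(coeffs, scale=4)
```

In this example the high-frequency coefficient is quantised to zero even at the fine setting, and the coarser setting also removes the low-frequency detail. Reconstructing the fine levels gives 24 rather than 20 for that coefficient; the difference is the quantisation noise.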

Stage 5: Variable-length Coding (VLC)
Variable-length codes are then applied using a fixed dictionary of codewords. VLC exploits the fact that short runs of zeroes are more likely than long runs and that small coefficients are more probable than large ones. The shortest codes are therefore assigned to small coefficients preceded by short runs of zeroes, which results in an overall decrease in the number of bits required to represent the data.
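A toy version of this stage is sketched below. The codeword table, the escape format and the function names are all hypothetical and chosen only so that the codes are prefix-free; they are not the MPEG-1 tables. Each non-zero coefficient is described by a (zero run, level) pair, common pairs receive short codewords plus a sign bit, and anything else falls back to a longer fixed-length escape.

```python
def run_level_pairs(scanned):
    """Turn a zigzag-scanned list into (zero_run, level) pairs."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeroes are implied by the end-of-block code


# Hypothetical prefix-free dictionary keyed by (run, |level|).
TOY_VLC = {
    "EOB": "10",
    (0, 1): "11",
    (1, 1): "011",
    (0, 2): "0100",
    (2, 1): "0101",
}
ESCAPE = "000001"  # followed by a 6-bit run and an 8-bit level


def encode_block(scanned):
    """Encode one scanned block as a string of '0'/'1' characters."""
    bits = []
    for run, level in run_level_pairs(scanned):
        code = TOY_VLC.get((run, abs(level)))
        if code is not None:
            bits.append(code + ("1" if level < 0 else "0"))  # sign bit
        else:
            bits.append(ESCAPE + format(run, "06b")
                        + format(level & 0xFF, "08b"))
    bits.append(TOY_VLC["EOB"])
    return "".join(bits)


scanned = [5, 1, 0, -1, 0, 0, 1] + [0] * 57
bitstream = encode_block(scanned)
```

Here the 64 coefficients collapse to four pairs and an end-of-block marker, 34 bits in total, and the one uncommon pair (0, 5) accounts for 20 of them.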

Stage 6: Huffman Coding
A form of Huffman coding is then applied to the variable-length encoded data stream to produce a further-compressed stream. Huffman coding also assigns shorter codes to more probable symbols within the stream, and this achieves a good compression ratio [Crowcroft, 1995]. Combined with the variable-length encoding, this all but eliminates spatial redundancy within the frames.
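The principle of giving shorter codes to more probable symbols can be shown with a generic Huffman code builder. This sketch derives the code from observed symbol counts, which is an assumption made for illustration; the text does not say how the codes used by an MPEG encoder are obtained, and the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code (symbol -> bit string) from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table).
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


message = "aaaabbc"
code = huffman_code(message)
encoded = "".join(code[s] for s in message)
```

The most frequent symbol receives a 1-bit code and the two rarer symbols receive 2-bit codes, so the seven symbols occupy 10 bits instead of the 14 that a fixed 2-bit code would need.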

Stage 7: Motion-Compensated Interframe Prediction
MPEG achieves even greater compression ratios by eliminating redundant data that appears in more than one frame, known as temporal redundancy. If a block of pixels is identical to a block in an earlier frame, the second block is replaced with a pointer to the data. If an identical block appears in a later frame, it is again replaced by a pointer, but this time the order of transmission is reversed so that the referenced data is received before the block that points to it, and the decoder is given the task of displaying all frames in the correct order. Motion-compensation techniques describe the path of moving objects rather than storing repeated images of the object, and result in compression ratios three times higher than those achieved with I-frames [C-Cube, 1996].
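The search for a matching block in another frame can be sketched with an exhaustive block-matching routine. The cost measure (sum of absolute differences), the block size, the search range and the function names are all assumptions chosen for illustration; the text does not describe how an encoder carries out this search.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in the current
    frame and a displaced block in the reference frame."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Return (cost, dx, dy) for the best match within +/- search."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with distinct values; current frame is the same
# picture moved one pixel to the right.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
match = best_motion_vector(cur, ref, bx=2, by=2)
```

For this example the search finds a perfect match (cost 0) one pixel to the left in the reference frame, so the block can be represented by the displacement (-1, 0) instead of its 16 pixel values.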
