Figure 14. Compression stages for MPEG-1 compressed video.
Stage 1: Preliminary Scaling and Colour Conversion
Original video signals in NTSC or PAL format are first sampled and then
decimated by 2:1 in both the horizontal and vertical directions. The result is
a format known as SIF (Source Input Format or Standard Interchange Format). SIF
from NTSC video is 352 pixels by 240 lines at 30 fps; in Europe, for PAL/SECAM,
SIF is 352 pixels by 288 lines at 25 fps. Both versions therefore carry the
same number of pixels per second, which leads to identical bitrates. MPEG-2
avoids the decimation used by MPEG-1 and improves quality by compressing the
full CCIR-601 original. RGB colour information is converted to Luminance (Y)
and Chrominance (Cb and Cr) values.
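The arithmetic behind the equal bitrates, and the colour conversion, can be sketched in a few lines of Python. This is a minimal illustration rather than the MPEG-1 procedure: `sif_pixel_rate` and `rgb_to_ycbcr` are hypothetical helper names, and the conversion assumes ITU-R BT.601 luma weights with full-range (0 to 255) JPEG-style offsets, whereas MPEG-1 video is normally coded in the narrower CCIR-601 studio range, so the exact offsets are illustrative only.

```python
# Minimal sketch; helper names are hypothetical, not from any MPEG library.
# Assumption: BT.601 luma weights with full-range (0-255) JPEG-style offsets;
# MPEG-1 video itself is normally coded in the narrower CCIR-601 studio range.


def sif_pixel_rate(width: int, lines: int, fps: int) -> int:
    """Luminance samples per second for a SIF frame size and frame rate."""
    return width * lines * fps


NTSC_SIF = sif_pixel_rate(352, 240, 30)
PAL_SIF = sif_pixel_rate(352, 288, 25)


def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple:
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


if __name__ == "__main__":
    print(NTSC_SIF, PAL_SIF)  # 2534400 2534400
    print([round(v) for v in rgb_to_ycbcr(255, 255, 255)])  # [255, 128, 128]
```

Both SIF variants come to 2,534,400 luminance samples per second, which is why the two versions end up at the same bitrate.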
Stage 2: Colour Subsampling
The human eye is more sensitive to changes in brightness than in hue
[Labriola, 1995], so reducing the amount of colour data within an image has
little effect on perceived quality. MPEG-1 therefore discards 75% of the colour
data by downsampling the Cb and Cr planes 2:1 horizontally and 2:1 vertically.
In A:B:C notation this format is known as 4:2:0: there is one Cb and one Cr
sample for every four Y samples (one 2×2 block of luminance). Other formats,
such as 4:2:2 and 4:4:4, retain more colour information but require higher
bitrates.
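The 2:1 by 2:1 chroma downsampling can be sketched as below. The sketch assumes a plain 2×2 box-filter average over planes with even dimensions; real encoders may use other filters and chroma siting, and `subsample_420` is a hypothetical name.

```python
# Sketch of 4:2:0 chroma subsampling.
# Assumption: a simple 2x2 average (box filter); planes are lists of rows
# with even width and height. Real encoders may filter differently.


def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically (2:1 x 2:1)."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total / 4)  # one chroma sample per 2x2 luma block
        out.append(row)
    return out


if __name__ == "__main__":
    cb = [[float(x + y) for x in range(352)] for y in range(240)]  # SIF-sized
    small = subsample_420(cb)
    kept = len(small) * len(small[0]) / (len(cb) * len(cb[0]))
    print(len(small[0]), len(small), kept)  # 176 120 0.25
```

Each chroma plane keeps a quarter of its samples, which is the 75% reduction described above.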
Stage 3: Discrete Cosine Transform (DCT)
A two-dimensional DCT is applied to blocks of 8 pixels by 8 lines, taken
either from the image itself or from the difference between two images. For
8-bit pixels this produces blocks of DCT coefficients with 11 bits of amplitude
information [Tudor, 1992], so the DCT on its own increases the amount of data.
Having transformed the image from the spatial domain to the frequency domain in
this way, the DCT coefficients are scanned in a specific order to form an
output stream (see section 2.4). For typical blocks of natural images, the
distribution of coefficients is non-uniform and energy is concentrated in the
lower-frequency components. Scanning with the zigzag pattern of figure 8
(section 2.4) clusters these significant coefficients together at the start of
the stream and leaves the low-value, high-frequency coefficients in long runs,
which maximises the efficiency of the run-length encoding.
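A direct, unoptimised implementation of the 8×8 DCT and of the zigzag scan order may make this concrete. It is a sketch under stated assumptions: it uses the common orthonormal DCT-II scaling, real codecs use fast factorised transforms, and `dct_2d` and `zigzag_order` are hypothetical names.

```python
import math

N = 8  # block size: 8 pixels by 8 lines


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 rows of 8 numbers."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, col) positions in zigzag order, starting from the DC term."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals are walked with the row rising, even ones with it falling.
        return (diagonal, row if diagonal % 2 else -row)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0]))  # 1024: all energy lands in the DC term
    print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

A flat block of 255s yields a DC coefficient of 2040 and zero AC coefficients; 2040 fits in 11 bits, consistent with the figure quoted above.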
Stage 4: Quantisation
Quantisation, which unlike entropy coding discards information, is then
applied with non-uniform quantisation levels. In practice, quantisation levels
are coarser at higher frequencies because quantisation noise at lower
frequencies is more noticeable to a human observer [Tudor, 1992]. Quantisation
reduces the number of values a coefficient can take, which introduces
quantisation noise but makes it more likely that adjacent coefficients are
identical (typically zero). Quantisation can also be used to control the
bitrate of the output stream: fewer quantisation levels increase the
compression ratio, since fewer bits are required to represent the levels.
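Frequency-dependent quantisation, and the effect of a coarser scale on the number of zero coefficients, can be sketched as follows. The step-size matrix `STEP` is hypothetical (it simply grows with frequency) and is not the MPEG-1 default quantiser matrix; `scale` stands in loosely for the encoder's bitrate-control knob.

```python
N = 8
# Hypothetical step sizes that grow with frequency (NOT the MPEG-1 matrix).
STEP = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantise(coeffs, scale=1.0):
    """Map each coefficient to an integer level; a larger scale is coarser."""
    return [[round(coeffs[u][v] / (STEP[u][v] * scale)) for v in range(N)]
            for u in range(N)]


def dequantise(levels, scale=1.0):
    """Decoder-side reconstruction; the rounding error is quantisation noise."""
    return [[levels[u][v] * STEP[u][v] * scale for v in range(N)]
            for u in range(N)]


if __name__ == "__main__":
    # Hypothetical coefficients whose energy falls off with frequency.
    coeffs = [[400 / (1 + u + v) ** 2 for v in range(N)] for u in range(N)]
    for scale in (1.0, 4.0):
        zeros = sum(row.count(0) for row in quantise(coeffs, scale))
        print(f"scale={scale}: {zeros} of 64 levels are zero")
    # scale=1.0: 49 of 64 levels are zero
    # scale=4.0: 58 of 64 levels are zero
```

The coarser scale leaves fewer distinct levels and longer runs of zeros, at the cost of larger reconstruction error.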
Stage 5: Variable-length Coding (VLC)
Variable-length codes are then applied using a fixed dictionary of codewords.
Variable-length coding exploits the fact that short runs of zeroes are more
likely than long runs and that small coefficients are more probable than large
ones. The shortest codes are therefore assigned to small coefficients preceded
by short runs of zeroes, which reduces the overall number of bits required to
represent the data.
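The idea of coding (run of zeroes, coefficient value) pairs from a fixed dictionary can be sketched as below. `CODEBOOK`, `ESCAPE` and `END_OF_BLOCK` are a tiny hypothetical prefix-free table, not the real MPEG-1 VLC table, and the escape layout (6-bit run, 8-bit level) is likewise an assumption for illustration.

```python
# Hypothetical prefix-free codebook: shortest codes for small levels after
# short runs. NOT the real MPEG-1 table.
CODEBOOK = {
    (0, 1): "11", (0, -1): "011",
    (1, 1): "0101", (1, -1): "0100",
    (0, 2): "00101", (2, 1): "00100",
}
ESCAPE = "000001"      # hypothetical escape prefix for pairs not in the table
END_OF_BLOCK = "10"    # hypothetical marker: the rest of the block is zero


def run_level_pairs(scanned):
    """Pair each non-zero value with the number of zeroes preceding it."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeroes are implied by the end-of-block code


def vlc_encode(scanned):
    """Encode zigzag-scanned quantised levels as a bit string."""
    bits = []
    for run, level in run_level_pairs(scanned):
        code = CODEBOOK.get((run, level))
        if code is None:
            # Hypothetical escape: 6-bit run, 8-bit two's-complement level.
            code = ESCAPE + format(run, "06b") + format(level & 0xFF, "08b")
        bits.append(code)
    bits.append(END_OF_BLOCK)
    return "".join(bits)


if __name__ == "__main__":
    bits = vlc_encode([1, 0, 1, 0, 0, -1, 2, 0, 0, 0])
    print(bits)
    print(len(bits))  # 33
```

Rare pairs pay for an escape sequence, while common short-run, small-level pairs cost only a few bits each.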
Stage 6: Huffman Coding
A form of Huffman coding is then used, which takes the variable-length encoded
data stream and produces a further-compressed stream. Like variable-length
coding, Huffman coding assigns shorter codes to more probable symbols within
the stream, and this achieves a good compression ratio [Crowcroft, 1995].
Together with the DCT and quantisation, these two entropy-coding stages all but
eliminate spatial redundancy within the frames.
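A generic Huffman construction illustrates how shorter codes go to more probable symbols. This sketch assumes the symbol frequencies are counted from the data itself; it is not necessarily the exact form of Huffman coding MPEG-1 uses, and `huffman_codes` is a hypothetical name.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Build a prefix-free Huffman code from observed symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    stream = ["A"] * 8 + ["B"] * 4 + ["C"] * 2 + ["D"] + ["E"]
    codes = huffman_codes(stream)
    total = sum(len(codes[s]) for s in stream)
    print(codes)
    print(total, "bits versus", 3 * len(stream), "with a fixed 3-bit code")
    # 30 bits versus 48 with a fixed 3-bit code
```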
Stage 7: Motion-Compensated Interframe Prediction
MPEG achieves even greater compression ratios by eliminating redundant data
that appears in more than one frame; this is known as temporal redundancy. If
a block of pixels is identical (or very similar) to a block in an earlier
frame, it is replaced with a pointer to the earlier data, plus the small
difference between the two, which is coded as described in Stage 3. If the
block instead matches one in a later frame, it is again replaced by a pointer,
but this time the frames are transmitted out of display order so that the
later frame reaches the decoder before the frame that refers to it; the decoder
is then responsible for displaying all frames in the correct order.
Motion-compensation techniques describe the path of moving objects rather than
storing repeated images of them, resulting in compression ratios three times
higher than those achieved with I-frames [C-Cube, 1996].
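The "pointer" described above is usually called a motion vector. The sketch below finds one by full-search block matching with the sum of absolute differences (SAD), and also shows the display-to-transmission reordering using MPEG's I-, P- and B-frame labels. Block size, search range and all function names are hypothetical illustration choices; MPEG-1 itself matches 16×16 macroblocks, and real encoders use faster search strategies.

```python
BLOCK = 4   # hypothetical; MPEG-1 uses 16x16 macroblocks
SEARCH = 2  # hypothetical search range in pixels, in each direction


def sad(ref, cur, ry, rx, cy, cx, size=BLOCK):
    """Sum of absolute differences between a reference and a current block."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(size) for j in range(size))


def best_motion_vector(ref, cur, cy, cx, size=BLOCK, search=SEARCH):
    """Return (dy, dx, cost) of the best match in ref for cur's block at (cy, cx).

    Assumes ref and cur have the same dimensions and the block fits in both.
    """
    best = (0, 0, sad(ref, cur, cy, cx, cy, cx, size))
    h, w = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(ref, cur, ry, rx, cy, cx, size)
                if cost < best[2]:
                    best = (dy, dx, cost)
    return best


def transmission_order(display):
    """Send each B-frame after the later anchor (I or P) it depends on."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b


if __name__ == "__main__":
    ref = [[y * 10 + x for x in range(8)] for y in range(8)]
    cur = [[(y + 1) * 10 + (x + 2) for x in range(8)] for y in range(8)]
    print(best_motion_vector(ref, cur, 2, 2))  # (1, 2, 0)
    print(transmission_order(["I1", "B2", "B3", "P4", "B5", "B6", "P7"]))
    # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The decoder receives P4 before B2 and B3 so that both reference frames are available, then restores display order before showing the frames.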