Video Encoding Course

Before learning about video encoding, it is important to acquire some terminology and concepts. Most of them you might already know, but a refresh never hurts!

Since video is by definition a succession of images, we will start by understanding image formats.

Basic concepts

A pixel (picture element) is the smallest unit of an image. It stores the intensity of each color component, usually as a numeric value. For example:

Red pixel = 0 of green + 0 of blue + 255 (maximum) of red
Pink pixel = 192 of green + 203 of blue + 255 of red

We work in a three-dimensional space in which the intensities of the three RGB components are added together to produce the desired color.
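As a minimal sketch of this representation, assuming one byte (0 to 255) per component, a pixel can be held as an (R, G, B) triple and packed into a single 24-bit number. The helper names `pack_rgb` and `unpack_rgb` are hypothetical and chosen only for illustration.

```python
# A pixel as an (R, G, B) triple, one byte (0-255) per component.
RED = (255, 0, 0)
PINK = (255, 192, 203)

# With 8 bits per component there are 2**24 distinct colors.
DISTINCT_COLORS = 2 ** 24  # 16,777,216


def pack_rgb(pixel):
    """Pack an (R, G, B) triple into one 24-bit integer."""
    r, g, b = pixel
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24-bit integer back into its (R, G, B) components."""
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


print(hex(pack_rgb(PINK)))  # 0xffc0cb
```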

pixels

There are other ways to encode a color image. For example, we can use an indexed palette where we will only need a single byte to represent each pixel, instead of the 3 needed when using the RGB model. In this kind of model, we can use a 2D matrix instead of a 3D one to represent the color, saving memory but limiting our color options.
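The difference between the two models can be sketched as follows. The tiny 2x2 image, the four-entry palette, and the function names are assumptions made up for this example; the size estimate ignores any file headers.

```python
# Hypothetical palette: each entry is a full (R, G, B) color.
palette = [(0, 0, 0), (255, 0, 0), (255, 192, 203), (255, 255, 255)]

# Indexed image: a 2D matrix where each cell is one byte, an index into the palette.
indexed_image = [
    [1, 2],
    [0, 3],
]


def to_rgb(image, colors):
    """Expand an indexed image into rows of (R, G, B) triples."""
    return [[colors[index] for index in row] for row in image]


def sizes_in_bytes(width, height, palette_entries=256):
    """Return (rgb_size, indexed_size) for raw pixel data, in bytes."""
    rgb_size = width * height * 3                            # 3 bytes per pixel
    indexed_size = width * height * 1 + palette_entries * 3  # 1 byte per pixel + palette
    return rgb_size, indexed_size


print(to_rgb(indexed_image, palette)[0])  # [(255, 0, 0), (255, 192, 203)]
print(sizes_in_bytes(640, 480))           # (921600, 307968)
```

The indexed version is roughly a third of the size, at the cost of being limited to the colors present in the palette.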

For the first image format, JPEG, we will only consider the first method (RGB).

Another basic concept is resolution, which is the number of pixels along each dimension of the image, usually given as width × height.

JPEG

Like all standards, the JPEG codec was created by a board of experts who, in this case, tried to agree on a standardized way of encoding images.

This group was the Joint Photographic Experts Group, or JPEG, which also became the name of the format.

It is a lossy compression method and supports true color (24 bits, or 16,777,216 different colors). The standard was first published in 1992 and later issued as ISO/IEC 10918-1.

The following diagrams show the encoding and decoding stages of JPEG compression:

codingdiagram

To better understand JPEG, let's think about how the human eye works: we recognize objects well even when fine detail is missing, because the eye is far more sensitive to overall shapes and brightness than to small, high-frequency variations. Based on this idea, JPEG achieves considerable data compression, which has made it extremely popular in computing and on the Internet.

However, since it is a lossy method, the reconstructed image is never exactly identical to the original.

comparison

In the previous comparison, we can see the difference between an uncompressed image and one compressed with JPEG. The image goes from 83 to 10 kilobytes, which is about 1/8 of its original size (a compression ratio of roughly 8:1), while keeping enough information for the human eye to understand the image.
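The arithmetic behind that figure can be checked with a few lines. The function name `compression_stats` is a hypothetical helper; the sizes are the ones quoted above.

```python
def compression_stats(original_kb, compressed_kb):
    """Return (ratio, fraction_of_original, saving) for two file sizes."""
    ratio = original_kb / compressed_kb     # how many times smaller the file is
    fraction = compressed_kb / original_kb  # compressed size relative to the original
    saving = 1 - fraction                   # share of the original size removed
    return ratio, fraction, saving


ratio, fraction, saving = compression_stats(83, 10)
print(f"{ratio:.1f}:1, {fraction:.0%} of original, {saving:.0%} saved")
# 8.3:1, 12% of original, 88% saved
```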

DCT

One of the most important parts of the encoding process is the DCT (Discrete Cosine Transform). To understand how it works, let's use an example:

Let's assume that we want to compress a grayscale (black and white) image. The first step is to fit the image into a grid. Then we assign an intensity value to each cell; in other words, we quantize the image.

dct1

Afterward, we can plot the quantized numerical values in a graph. The DCT tries to find the best approximation to the curve traced by these values: we draw a horizontal line at the average value and then fit cosine waves of increasing frequency to the remaining shape.

dct2

Therefore, the DCT can pack many numerical values into just the few coefficients that define those cosine functions.
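The idea can be sketched with a one-dimensional DCT-II in plain Python. This is a simplified illustration rather than JPEG's actual pipeline, which applies a two-dimensional DCT to 8×8 blocks of pixels. The sample values and the names `dct_ii`, `idct_ii`, and `row` are assumptions chosen for this example.

```python
import math


def dct_ii(samples):
    """Orthonormal DCT-II: turn n samples into n cosine coefficients."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii: rebuild the samples from the coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = math.sqrt(1 / n) * coeffs[0]  # contribution of the average
        total += sum(math.sqrt(2 / n) * coeffs[k]
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        samples.append(total)
    return samples


# Hypothetical row of eight intensity values.
row = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_ii(row)

# Keep only the first three coefficients and zero the rest (the lossy step).
kept = coeffs[:3] + [0.0] * (len(coeffs) - 3)
approximation = [round(v) for v in idct_ii(kept)]

print([round(c, 1) for c in coeffs])
print(approximation)
```

The first coefficient encodes the average of the row, and the later ones describe progressively finer detail. Dropping the later ones gives a smoother approximation of the original values.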

dct3

The JPEG standard defines a serpentine (zigzag) pattern to convert the block of DCT coefficients into a linear sequence of values. Additionally, run-length encoding is applied, which replaces every run of zeroes with two values, a zero and the number of consecutive zeroes, reducing the output even further.
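Both steps can be sketched as follows. This follows the simplified scheme described above; actual JPEG files combine zero runs with entropy coding in a more involved way. The 4x4 block and the function names are assumptions made up for illustration (JPEG itself works on 8×8 blocks).

```python
def zigzag_order(n):
    """Return the (row, col) cells of an n x n block in zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals (row + col constant), alternating direction:
    # odd diagonals run top-right to bottom-left, even ones the other way.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_zeros(values):
    """Replace each run of zeroes with the pair (0, run length)."""
    out = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            run = 0
            while i < len(values) and values[i] == 0:
                run += 1
                i += 1
            out.extend([0, run])
        else:
            out.append(values[i])
            i += 1
    return out


# Hypothetical block of coefficients: large values top-left, zeroes elsewhere.
block = [
    [12, 5, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

scan = zigzag_scan(block)
print(scan)                    # [12, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(run_length_zeros(scan))  # [12, 5, 3, 0, 13]
```

The zigzag order groups the non-zero values at the start of the sequence, so the zeroes end up in one long run that the run-length step collapses.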

dct4

YCbCr

Until now, we have used a black and white image as an example. But what about color images? In that case, we would just need to repeat the operation three times, once for each RGB channel. Instead, JPEG uses a different color model: YCbCr. This stands for luminance, blue-difference chroma and red-difference chroma, respectively.

These values are obtained by applying a mathematical conversion to the RGB ones. Here R, G and B are the gamma-corrected values (often written R', G' and B'), and the numeric coefficients derive from three defined constants, Kr = 0.299, Kg = 0.587 and Kb = 0.114:

YUV

    Y  = R *  0.29900 + G *  0.58700 + B *  0.11400
    Cb = R * -0.16874 + G * -0.33126 + B *  0.50000 + 128
    Cr = R *  0.50000 + G * -0.41869 + B * -0.08131 + 128

    R  = Y                         + (Cr - 128) *  1.40200
    G  = Y + (Cb - 128) * -0.34414 + (Cr - 128) * -0.71414
    B  = Y + (Cb - 128) *  1.77200

This way of representing color makes the image easier to compress. Note: you may also see it referred to as YPbPr. Although that is based on the same concept, the name applies only to analog signals.
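The conversion formulas above translate directly into code. This is a minimal sketch using the same coefficients; the function names and the `clamp8` rounding helper are assumptions added for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB components to YCbCr using the formulas above."""
    y = 0.29900 * r + 0.58700 * g + 0.11400 * b
    cb = -0.16874 * r - 0.33126 * g + 0.50000 * b + 128
    cr = 0.50000 * r - 0.41869 * g - 0.08131 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Convert YCbCr back to RGB (results are floats, not yet clamped)."""
    r = y + 1.40200 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.77200 * (cb - 128)
    return r, g, b


def clamp8(value):
    """Round to the nearest integer and clamp to the 0-255 range."""
    return max(0, min(255, round(value)))


pink = (255, 192, 203)
y, cb, cr = rgb_to_ycbcr(*pink)
restored = tuple(clamp8(v) for v in ycbcr_to_rgb(y, cb, cr))
print(restored)  # (255, 192, 203)
```

For a gray pixel, where R, G and B are equal, both chroma values come out at 128 and all the information sits in Y.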

yuv2

JPEG 2000

To address the quality loss of JPEG, the same board of experts presented a new standard in the year 2000: JPEG 2000. Technologically, it was an excellent engineering solution to the problem they had, but it failed to gain widespread adoption, as it required hardware and software to adapt to the new codec (in other words, it was not backward compatible with JPEG).

Its file extension is .jp2. More information about this codec can be found in the slides PDF at the bottom of this page.

Other Standards

Other image compression methods did achieve widespread adoption and are commonly used.

GIF (Graphics Interchange Format)

It stores 8 bits per pixel for each image, allowing up to 256 different colors chosen from the 24-bit RGB color space. It therefore uses the second method of representing color (the indexed palette) explained in the first section of this document. It uses the file extension .gif, and one of its most interesting properties is its animation support.

It is less suitable for color photographs but good enough for graphics or logos. GIF images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression method, which reduces the file size without degrading the visual quality.
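The core of LZW can be sketched in a few lines: the encoder builds a dictionary of byte sequences it has already seen and emits one code per longest known sequence. This is a minimal illustration of the general algorithm, not of the exact variant stored in GIF files, which additionally uses variable-width codes and special clear and end codes. The function names are hypothetical.

```python
def lzw_compress(data):
    """Encode bytes as a list of integer codes using basic LZW."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the known sequence
        else:
            codes.append(table[current])     # emit the longest known sequence
            table[candidate] = next_code     # learn the new, longer sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]  # sequence defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


data = b"ABABABABABAB"
codes = lzw_compress(data)
print(len(data), "bytes ->", len(codes), "codes")
print(lzw_decompress(codes) == data)  # True: no information is lost
```

Repetitive data such as flat-colored graphics produces long repeated sequences, which is why this kind of compression suits logos better than photographs.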

PNG (Portable Network Graphics)

Another image standard that was more successful than JPEG 2000 is PNG. It is a patent-free, open format that aimed to replace GIF.

It supports 24-bit RGB or 32-bit RGBA (allowing opacity). The file extension is .png, and it uses a lossless codec based on DEFLATE encoding (a mix of LZSS-style dictionary coding and Huffman coding).
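The lossless behavior of DEFLATE can be tried with Python's standard `zlib` module. This is only a sketch of the compression stage: it does not write a valid PNG file, and it skips the per-row filtering that PNG applies before compressing. The image dimensions and pixel values are assumptions made up for the example.

```python
import zlib

width, height = 64, 64

# Hypothetical 32-bit RGBA image: every pixel is opaque red (R, G, B, A).
raw = bytes([255, 0, 0, 255]) * (width * height)

compressed = zlib.compress(raw, 9)     # DEFLATE at the highest compression level
restored = zlib.decompress(compressed)

print(len(raw), "bytes ->", len(compressed), "bytes")
print(restored == raw)  # True: the pixels are recovered exactly
```

Unlike JPEG, decompressing returns exactly the bytes that went in, so no visual quality is lost regardless of how small the compressed data becomes.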

Resources

Download Slides T1

Download Seminar 1

Download Practical 1

T1. JPEG & JPEG2000