Overview of Digital Video 1 - PowerPoint PPT Presentation


1
Overview of Digital Video (1)
  • Digital video refers to video information that is stored and transmitted in digital form.
  • Analog video uses a bandwidth of only a few megahertz, but the bit rate of the digital signal for transmitting the same video content is typically over 100 Mbps, which is too high to be feasible for most of today's networks.
  • Video compression is the solution for both stored video and video transmission over the network. Video compression techniques have been evolving for the past two decades.
  • Due to advances in processor technology and the development of international video compression standards, a wide range of video communications applications have been developed in recent years.

2
Overview of Digital Video (2)
  • Representation of Video Information
  • A color image is represented in terms of component signals. A color can be synthesized by combining three color primaries: red, green and blue (RGB).
  • Each of the three primaries contains information of luminance (brightness) and chrominance (color), which can be represented separately.
  • A luminance signal (Y) can be produced from a weighted sum of the R, G, and B components.
  • The chromaticity of the color can be represented by the color difference signals:
  •     Cr = wr (R - Y)
  •     Cb = wb (B - Y)
  •     Cg = wg (G - Y)
  • where wr, wb, and wg are weighting factors.
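As a sketch of this separation, the snippet below converts an RGB triple to a luminance value and two color differences. The slide leaves the weighting factors unspecified, so the ITU-R BT.601 values are assumed here.

```python
# Luminance/chrominance separation. The weights are the ITU-R BT.601
# values (an assumption; the slide does not fix wr and wb).

def rgb_to_ycc(r, g, b):
    """Convert normalized RGB (0..1) to Y, Cr, Cb."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # weighted sum of R, G, B
    cr = 0.713 * (r - y)                    # Cr = wr * (R - Y)
    cb = 0.564 * (b - y)                    # Cb = wb * (B - Y)
    return y, cr, cb

print(rgb_to_ycc(1.0, 1.0, 1.0))  # pure white: Y = 1, Cr = Cb = 0
```

For a gray input (R = G = B) both color differences vanish, which is why the two chrominance signals can be subsampled or discarded without affecting a monochrome picture.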

3
Overview of Digital Video (3)
  • Standards for Analog Color TV
  • NTSC (National Television System Committee) is used in the USA and Japan. NTSC uses 525 lines per frame, and its field rate is 60 Hz (i.e. 30 frames per second).
  • The PAL (Phase Alternation Line) system is used in most of Western Europe.
  • SECAM (Séquentiel Couleur Avec Mémoire) is used in France and in parts of Eastern Europe.
  • In both the PAL and SECAM systems, each frame consists of 625 lines, and the field rate is 50 Hz (i.e. 25 frames per second).
  • All the above systems use three components: luminance Y, blue color difference U (equivalent to Cb), and red color difference V (equivalent to Cr).

4
Overview of Digital Video (4)
  • ITU-R (International Telecommunication Union - Radio) Standard
  • The ITU-R (formerly CCIR) provides a standard method of encoding television information in digital form.
  • The luminance and color difference components are sampled with a precision of 8 bits.
  • The luminance component of an NTSC frame is sampled to produce an image of 525 lines, each containing 858 samples. The active area of the digitized frame is 720 x 486 picture elements (pixels).
  • The luminance component of a PAL or SECAM frame is sampled with 625 lines and 864 samples per line, and the active area is 720 x 576 pixels.
  • The color difference signals are sampled with the same vertical resolution (486 or 576 active lines) but with the horizontal resolution halved.
  • Only the odd-numbered luminance pixels in each line have associated color difference pixels.

5
Overview of Digital Video (5)
Bit rate of a CCIR 601 digital TV signal

NTSC original bit rate = 30 x 8 x ((858 x 525) + (429 x 525 x 2)) = 216.216 Mbps
  - 30 frames per second
  - 858 x 525 luminance samples
  - 429 x 525 x 2 chrominance samples
  - 8 bits per sample

PAL/SECAM original bit rate = 25 x 8 x ((864 x 625) + (432 x 625 x 2)) = 216.0 Mbps
  - 25 frames per second
  - 864 x 625 luminance samples
  - 432 x 625 x 2 chrominance samples
  - 8 bits per sample
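The arithmetic above is easy to check; the small helper below (names are illustrative) reproduces both figures:

```python
# Reproduces the CCIR 601 bit-rate figures from the slide:
# bit rate = frames/s * bits/sample * samples/frame.

def ccir601_bit_rate(fps, lum_samples, chroma_samples, bits=8):
    return fps * bits * (lum_samples + chroma_samples)

ntsc = ccir601_bit_rate(30, 858 * 525, 429 * 525 * 2)
pal = ccir601_bit_rate(25, 864 * 625, 432 * 625 * 2)
print(ntsc / 1e6, pal / 1e6)  # 216.216 and 216.0 (Mbps)
```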
6
Why Video Compression?
A single digital TV signal in CCIR 601 format
needs a 216 Mbps bit rate, which is unacceptable
for most practical network transmission. On the
other hand, a traditional analog TV signal of
similar quality only requires 6-7 MHz of bandwidth.
Obviously, network transmission of the original
digital TV signal is too expensive to be
practical. Before digital TV or video can be
transmitted through the network, the data rate
needs to be reduced. This means compressing
(encoding) the digital video information prior to
network transmission and decompressing (decoding)
the received video information before displaying
it. Digital video information contains a
considerable amount of redundancy: video data is
usually highly correlated both spatially and
temporally. This redundancy can be removed by
coding the data in a more efficient way.
7
Overview of Compression Methods (1)
  • Types of redundancy
  • Spatial redundancy - the values of neighboring pixels are strongly correlated in almost all natural images.
  • Redundancy in scale - important image features such as straight edges and constant regions are invariant under rescaling.
  • Redundancy in frequency - in images composed of more than one spectral band, the spectral values for the same pixel location are often correlated; and an audio signal can completely mask a sufficiently weaker signal in its frequency vicinity.
  • Temporal redundancy - adjacent frames in a video sequence often show very little change, and a strong audio signal in a given time block can mask an adequately lower distortion in a previous or future block.
  • Stereo redundancy - audio coding methods can take advantage of the correlations between stereo channels.

8
Overview of Compression Methods (2)
Characteristics of Compression Methods
9
Overview of Compression Methods (3)
Relation between perceptible quality and required
bandwidth
(Chart: quality, from low to high, plotted against required bandwidth, from low to high, with regions marked for lossless and lossy compression.)
10
Overview of Compression Methods (4)
Coding Techniques for Multimedia Systems
11
Overview of Compression Methods (5)
  • Entropy coding (lossless)
  • Entropy is defined as the average information content of given data. It defines the minimum number of bits needed to represent the information content without information loss.
  • Entropy coding is a lossless technique; it tries to achieve this theoretical lower limit.
  • Source coding (lossy)
  • It distinguishes relevant and irrelevant data. It takes into account the semantics of the data and removes the irrelevant data, so that the original data stream can be compressed.
  • Hybrid coding (lossy)
  • This is a combination of entropy coding and source coding.

(Diagram: uncompressed data passes through preparation, source coding steps, and entropy coding steps to produce compressed data.)
12
Basic Coding Methods (1)
  • Entropy coding does not use the semantics of the data; only the bit stream is considered.
  • Run-Length Coding
  •     Uncompressed data: ABCCCCCCCDEFGH
  •     Run-length coded:  ABC!7DEFGH
  • Huffman Coding
  •     p(A) = 3/4, p(B) = 1/8, p(C) = p(D) = 1/16
  •     w(A) = 1, w(B) = 01, w(C) = 001, w(D) = 000

(Diagram: the corresponding Huffman code tree, branching 0/1 at each node.)
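Both examples can be verified in a few lines of Python. The run-length helper below follows the slide's "!"-flag convention (run lengths of 4 or more are replaced by character, flag, count), and the Huffman example's average code length is compared against the entropy lower bound:

```python
import math

# Run-length coding with the slide's "!" flag convention.
def run_length_encode(s, flag='!', min_run=4):
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # extend the current run
        run = j - i
        out.append(s[i] + flag + str(run) if run >= min_run else s[i] * run)
        i = j
    return ''.join(out)

print(run_length_encode("ABCCCCCCCDEFGH"))  # ABC!7DEFGH

# The slide's Huffman example: average code length vs. entropy.
probs = {'A': 3 / 4, 'B': 1 / 8, 'C': 1 / 16, 'D': 1 / 16}
codes = {'A': '1', 'B': '01', 'C': '001', 'D': '000'}

avg_len = sum(p * len(codes[s]) for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(avg_len, round(entropy, 3))  # 1.375 vs. ~1.186 bits/symbol
```

The average code length (1.375 bits) is above the entropy (about 1.186 bits), as it must be for any lossless code; entropy coding tries to close that gap.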
13
Basic Coding Methods (2)
  • Arithmetic Coding
  • This coding method often generates slightly better results in audio and video coding because it works with floating-point intervals instead of the per-character codes used in Huffman coding.
  • The floating-point arithmetic is computationally more expensive.
  • It has been shown that the average compression achieved by arithmetic and Huffman coding is very similar.
  • Algorithms for arithmetic coding are covered by patents held by IBM, AT&T and Mitsubishi.

14
Basic Coding Methods (3)
  • Discrete Cosine Transform Coding
  • Pixels are grouped into blocks, which are transformed into another domain to form a set of coefficients; these coefficients are coded and transmitted.
  • Compression is achieved by quantizing the coefficients so that the useful coefficients are transmitted and the remaining coefficients are discarded.
  • The most effective compaction is achieved using the Karhunen-Loeve Transform (KLT), but it is very computationally intensive; the discrete cosine transform (DCT) is a widely used alternative to the KLT.
  • A DCT-based image coding system usually consists of the following steps:
  •     - Separate the image into blocks
  •     - Discrete Cosine Transform
  •     - Quantization
  •     - Encoding

15
Basic Coding Methods (4)
Discrete Cosine Transform The DCT converts a
block of pixels into a block of transform
coefficients of the same dimensions. These
coefficients represent the spatial frequency
components that make up the original pixel
block. Each coefficient can be thought of as a
weight that is applied to an appropriate basis
function.
(Figure: the 8 x 8 DCT basis functions, with the DC basis function at top left, horizontal frequency increasing to the right, and vertical frequency increasing downward.)
16
Basic Coding Methods (5)
  • A gray-scale 8 x 8 pixel block can be fully represented by a weighted sum of these 64 basis functions.
  • The appropriate weights required to produce a particular block are the DCT coefficients for that block.
  • The two-dimensional DCT of an N x N block of pixel values:

      F(u,v) = (2/N) C(u) C(v) sum_{i=0..N-1} sum_{j=0..N-1} f(i,j) cos((2i+1)u pi / 2N) cos((2j+1)v pi / 2N)

and the inverse DCT:

      f(i,j) = (2/N) sum_{u=0..N-1} sum_{v=0..N-1} C(u) C(v) F(u,v) cos((2i+1)u pi / 2N) cos((2j+1)v pi / 2N)

where F(u,v) is the transform coefficient, f(i,j) is the pixel value, and
C(k) = 1/sqrt(2) for k = 0, C(k) = 1 otherwise.
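A direct implementation of these two formulas (a naive O(N^4) sketch, not an optimized transform) confirms that the forward and inverse DCT round-trip a pixel block:

```python
import math

# Direct N x N DCT and inverse DCT, exactly as in the formulas above.

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block):
    n = len(block)
    return [[(2 / n) * c(u) * c(v) * sum(
        block[i][j]
        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
        for i in range(n) for j in range(n))
        for v in range(n)] for u in range(n)]

def idct2(coeff):
    n = len(coeff)
    return [[(2 / n) * sum(
        c(u) * c(v) * coeff[u][v]
        * math.cos((2 * i + 1) * u * math.pi / (2 * n))
        * math.cos((2 * j + 1) * v * math.pi / (2 * n))
        for u in range(n) for v in range(n))
        for j in range(n)] for i in range(n)]

block = [[10, 20], [30, 40]]
restored = idct2(dct2(block))
print([[round(x) for x in row] for row in restored])  # [[10, 20], [30, 40]]
```

The transform itself loses nothing; the compression comes entirely from the quantization and coding of the coefficients described earlier.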
17
Basic Coding Methods (6)
  • Each coefficient in the forward transform is calculated by summing 64 separate terms; this results in a total of 64 x 64 = 4,096 calculations for transforming an 8 x 8 block of pixels.
  • The above calculation process can be replaced by a one-dimensional transform along all the rows and then all the columns of the block.
  • Since each coefficient in the one-dimensional transform requires 8 calculations, either a row or a column needs 8 x 8 = 64 calculations; this results in 64 x 8 (for 8 rows) + 64 x 8 (for 8 columns) = 1,024 calculations.
  • The computational complexity can be further reduced by replacing the cosine form of the transform with a fast algorithm which only performs a series of multiplication and addition operations.

18
Basic Coding Methods (7)
(Diagram: frequency distribution within a transformed block - the DC coefficient sits at the top-left corner, with horizontal, vertical and diagonal frequencies increasing from low through medium to high toward the opposite edges; block features map onto this frequency distribution.)
19
Basic Coding Methods (8)
Zigzag sequence: the AC coefficients are scanned in order of increasing spatial frequency.
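The zigzag order can be generated rather than stored as a table. The sketch below sorts coefficient positions by anti-diagonal (u + v), alternating the direction of traversal on odd and even diagonals as in the JPEG/H.261 scan:

```python
# Generates the zigzag scan order for an N x N coefficient block.
# Positions are visited along anti-diagonals of increasing u + v,
# i.e. roughly in order of increasing spatial frequency.

def zigzag_order(n=8):
    return sorted(
        ((u, v) for u in range(n) for v in range(n)),
        key=lambda p: (p[0] + p[1],                       # anti-diagonal
                       p[0] if (p[0] + p[1]) % 2 else p[1]))  # direction

print(zigzag_order(3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
```

After quantization, this ordering groups the many zero-valued high-frequency coefficients at the end of the scan, which makes the subsequent run-length coding effective.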
20
Basic Coding Methods (9)
  • Differential Pulse Code Modulation (DPCM)
  • The image is scanned in a raster fashion.
  • Each pixel is represented as a number with a limited precision.
  • A predicted pixel value based on the previously transmitted pixel is transmitted instead of the actual value.
  • The prediction error between the predicted pixel value and the actual value is quantized and transmitted.
  • Encoding the quantized error using variable-length codes achieves further compression.

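A minimal one-dimensional DPCM sketch of these steps, where the predictor is simply the previously reconstructed pixel and the quantizer step of 4 is an illustrative choice:

```python
# 1-D DPCM over one raster line: transmit quantized prediction errors,
# tracking the decoder's reconstruction so encoder and decoder agree.

def dpcm_encode(pixels, step=4):
    errors, prev = [], 0
    for p in pixels:
        e = round((p - prev) / step)   # quantized prediction error
        errors.append(e)
        prev += e * step               # decoder-side reconstruction
    return errors

def dpcm_decode(errors, step=4):
    out, prev = [], 0
    for e in errors:
        prev += e * step
        out.append(prev)
    return out

line = [100, 102, 105, 107, 110]
enc = dpcm_encode(line)
print(enc, dpcm_decode(enc))  # [25, 0, 1, 1, 0] [100, 100, 104, 108, 108]
```

After the first sample, the errors are small numbers clustered around zero, which is exactly what makes the variable-length coding of the final step pay off; the reconstruction differs from the input by at most half a quantizer step's worth of accumulated drift.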
21
Basic Coding Methods (10)
  • Motion-Compensated Prediction
  • Temporal redundancies between two frames in a video sequence can be exploited.
  • The idea is to look for a certain area in a previous or subsequent frame that matches very closely an area of the same size in the current frame.
  • If successful, a best-matching block can be found, and the difference signal between the intensity values of the block in the current frame and the block in the reference frame is calculated.
  • The motion vector, which represents the translation of corresponding blocks in both the x- and y-direction, is determined.
  • The difference signal and the motion vector represent the deviation between the reference block and the predicted block; together they are called the prediction error.
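A brute-force sketch of this search, using the sum of absolute differences (SAD) as the matching criterion; the actual criterion and search strategy are left to the encoder:

```python
# Exhaustive block matching: find the motion vector (dx, dy) that
# minimizes the SAD between a block of the current frame and candidate
# blocks in the reference frame, within a given search range.

def best_motion_vector(ref, cur, bx, by, bsize, search):
    """ref/cur: 2-D lists of pixels; (bx, by): block top-left in cur."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or ry + bsize > len(ref) or rx + bsize > len(ref[0]):
                continue                 # candidate falls outside the frame
            sad = sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
                      for i in range(bsize) for j in range(bsize))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best  # (SAD, motion vector)

# Toy frames: a 2x2 bright patch moves one pixel right between frames.
ref = [[0] * 4 for _ in range(4)]
ref[1][1] = ref[1][2] = ref[2][1] = ref[2][2] = 9
cur = [[0] * 4 for _ in range(4)]
cur[1][2] = cur[1][3] = cur[2][2] = cur[2][3] = 9
print(best_motion_vector(ref, cur, 2, 1, 2, 1))  # (0, (-1, 0))
```

A zero SAD means the difference signal is empty and only the motion vector needs to be sent; in real footage the residual is small but nonzero and is coded like an intraframe block.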

22
Basic Coding Methods (11)
  • There are three types of motion-compensated prediction:
  •     - Unidirectional motion-compensated prediction
  •     - Bidirectional motion-compensated prediction
  •     - Interpolative motion-compensated prediction

(Diagram: forward motion-compensated prediction.)
23
Still Image Coding (1)
Image Coding
(Diagram: the input image passes through an encoder model and entropy encoder to produce compressed data; the decoder applies an entropy decoder and decoder model to produce the image output.)
  • Image coding techniques
  • Predictive coding
  • Discrete Cosine Transform Coding
  • Subband coding
  • Fractal coding

24
Still Image Coding (2)
  • JPEG (Joint Photographic Experts Group)
  • The JPEG standard defines 4 coding modes:
  • Sequential DCT-based encoding - each component is encoded in a single scan based on the DCT.
  • Progressive DCT-based encoding - it uses multiple scans; each scan contains a partially encoded version of the image. The scans can be decoded sequentially so that a rough image is quickly decoded and is then built up using further scans.
  • Hierarchical encoding - each component is encoded at multiple resolutions, differing by a factor of two.
  • Lossless encoding - it is based on a DPCM system. This mode provides compression without any loss of quality, using more time-consuming algorithms.

25
Still Image Coding (3)
Four Coding Modes of JPEG
Progressive DCT
Sequential DCT
Sequential lossless
Hierarchical
26
Still Image Coding (4)
27
Still Image Coding (5)
  • Image preparation
  • A source image must have a rectangular format and consist of 1 to 255 planes or components, such as RGB or YUV. After the separation of components, each component is divided into data units of 8 x 8 pixel blocks.
  • Picture processing
  • The baseline mode compresses the data by applying a two-dimensional DCT, then quantizing and entropy coding the corresponding DCT coefficients. A 64-element quantization table is associated with the 64 DCT coefficients.
  • Entropy encoding
  • JPEG specifies both Huffman and arithmetic encoding for entropy coding.
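The quantization step of the baseline mode can be sketched as below; the coefficient values and table entries here are illustrative, not the example tables from the JPEG standard:

```python
# Baseline-style quantization: each DCT coefficient is divided by its
# entry in the quantization table and rounded; dequantization multiplies
# back, so small high-frequency coefficients are lost.

def quantize(coeffs, table):
    return [round(c / q) for c, q in zip(coeffs, table)]

def dequantize(levels, table):
    return [l * q for l, q in zip(levels, table)]

coeffs = [236, -23, -11, 5]   # a few DCT coefficients (illustrative)
table = [16, 11, 10, 16]      # matching quantizer step sizes
levels = quantize(coeffs, table)
print(levels, dequantize(levels, table))  # [15, -2, -1, 0] [240, -22, -10, 0]
```

Note that the last coefficient quantizes to zero and is effectively discarded; long runs of such zeros are what the zigzag scan and entropy coder exploit.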

28
Coding of Moving Images
A video CODEC can be anything from the simplest
A-to-D device through to something that does
picture pre-processing and even has network
adapters built into it. A CODEC usually does
most of its work in hardware, but there is no
reason not to implement everything in software on
a reasonably fast processor. The most
expensive and complex component of a CODEC is the
compression/decompression part. There are a
number of international standards and many
proprietary compression techniques for video.
29
Moving Image Coding (1)
  • H.261
  • H.261 is the most widely used international video compression standard for video conferencing.
  • This ITU (formerly CCITT) standard describes the video coding and decoding methods for the moving picture component of an audiovisual service at rates of p x 64 Kbps, where p is in the range of 1 to 30.
  • The standard targets, and is really only suitable for, applications using circuit-switched networks as their transmission channels, as ISDN with both basic and primary rate access was the communication channel considered within the framework of the standard.
  • H.261 is usually used in conjunction with other control and framing standards such as H.221, H.230, H.242 and H.320, of which more later.

30
Moving Image Coding (2)
Processing Steps of the H.261 Video Codec
(Diagram: the video signal passes through the source coder, video multiplex coder, transmission buffer and transmission coder to produce the coded bit stream; a coding control block, driven by external control, steers the source coder.)
31
Moving Image Coding (3)
  • H.261 Image preparation
  • The source coder operates only on non-interlaced pictures. Pictures are coded as luminance and two color difference components (Y, Cb, Cr). The Cb and Cr matrices are half the size of the Y matrix.
  • H.261 supports two image resolutions, CIF (Common Intermediate Format) and QCIF:

              CIF          QCIF
      Y       352 x 288    176 x 144
      Cb      176 x 144    88 x 72
      Cr      176 x 144    88 x 72

  • CIF and QCIF frames are divided into a hierarchical block structure consisting of picture, group of blocks (GOB), macro blocks, and blocks.

32
Moving Image Coding (4)
  • Hierarchical block structure of H.261:

      Structure          Element Description
      Picture (frame)    1 video picture
      Group of blocks    33 macro blocks
      Macro block        16 x 16 Y, 8 x 8 Cb, Cr
      Block              8 x 8 pixels (coding unit for DCT)

  • H.261 uses two types of coding for macro blocks, intraframe and interframe.
  • Intraframe coding takes no advantage of the redundancy between frames; beyond this, H.261 tries to make use of temporal redundancies by means of motion-compensated prediction.

33
Moving Image Coding (5)
  • The first frame to be transmitted is always an intraframe-coded frame (i.e. all macro blocks are intraframe coded).
  • The entire picture is divided into nonoverlapping 8 x 8 pixel blocks on which the forward DCT is applied.
  • The resulting 64 DCT coefficients are quantized and zigzag-reordered.
  • For interframe coding, the recently coded frame is decoded again within the encoder using inverse quantization and inverse DCT.
  • For the next frame to be encoded, the last previously coded and stored frame is used for deciding whether to intraframe- or interframe-code each macro block.
  • The algorithm performs a unidirectional motion-compensated prediction which uses the four luminance blocks of each macro block to find a close match in the previous frame for the macro block currently encoded.
  • If it cannot find a close match, it employs the same coding for the macro block as in intraframe coding.

34
Moving Image Coding (6)
Block Transformation H.261 supports motion
compensation in the encoder as an option. In
motion compensation a search area is constructed
in the previous (recovered) frame to determine
the best reference macroblock. Both the
prediction error and the motion vectors
specifying the value and direction of
displacement between the encoded macroblock and
the chosen reference are sent. The search area,
as well as how to compute the motion vectors, is
not subject to standardization. Both horizontal
and vertical components of the vectors must have
integer values in the range -15 to 15, though.
In block transformation, INTRA-coded frames as
well as prediction errors are composed into
8 x 8 blocks. Each block is processed by a
two-dimensional FDCT function. If this sounds
expensive, there are fast table-driven
algorithms, and it can be done in software quite
easily, as well as very easily in hardware.
35
Moving Image Coding (7)
  • The motion estimation process results in three possible decisions for coding a macro block:
  •     - Intracoding, where blocks of 8 x 8 pixels each are coded only with reference to themselves and are sent directly to the block transformation process.
  •     - Intercoding without motion compensation (the motion vector has zero value).
  •     - Intercoding with motion compensation.
  • There is an optional filter between the DCT and the entropy coding process, which can be used to improve the image quality by removing high-frequency noise as needed.
  • Quantization in H.261 is a linear function; the quantization step size depends on the amount of data in the transmission buffer, thereby generating a constant data rate at the output of the coder.

36
Moving Image Coding (8)
  • A prediction error is calculated between a 16 x 16 pixel region (macroblock) and the (recovered) corresponding macroblock in the previous frame.
  • Prediction errors of transmitted blocks (the criterion for transmission is not standardized) are then sent to the block transformation process.
  • Blocks are inter- or intra-coded:
  •     - Intra-coded blocks stand alone.
  •     - Inter-coded blocks are based on the prediction error between the previous frame and this one.
  • Intra-coded frames must be sent with a minimum frequency to avoid loss of synchronization between sender and receiver.

37
Moving Image Coding (9)
Quantization and Entropy Coding The purpose of
quantization is to achieve further compression by
representing the DCT coefficients with no
greater precision than is necessary to achieve
the required quality. The number of quantizers
is 1 for the INTRA DC coefficients and 31 for
all others. Entropy coding achieves extra
(lossless) compression by assigning
shorter code-words to frequent events and longer
code-words to less frequent events. Huffman
coding is usually used to implement this step.
In other words, for a given quality, we can
lose coefficients of the transform by using fewer
bits than would be needed for all the values; this
leads to a "coarser" picture. We can then
entropy code the final set of values by using
shorter words for the most common values and
longer ones for rarer ones (like using 8 bits
for three-letter words in English).
38
Moving Image Coding (11)
H.263 H.263 is a newer addition to the ITU H
series and is aimed at extending the repertoire
to Video Coding for Low Bit Rate Communication.
This makes it suitable for a wide variety of
Internet access line speeds, and therefore also
probably reasonably friendly to many Internet
Service Providers' backbone speeds. Existing
A/V standards, and the basic technology of the
CCD camera and of television and the general CRT,
dictate frame grabbing at some particular
resolution and rate. The choice of resolution is
complex. One could have a fixed number of pixels
and aspect ratio, or allow a range of choices of
line rate and sample rates. H.261 and MPEG
chose the latter.
39
Moving Image Coding (12)
The line rate (a.k.a. Picture Clock Frequency -
PCF) is 30,000/1001, or about 29.97 Hz, but one can
also use multiples of this. The chosen
resolution for H.263 is dx x dy for luminance;
chrominance is just one half of this in both
dimensions. H.263 then allows for sub-QCIF, which
is 128 x 96 pixels; QCIF - 176 x 144 pixels; CIF -
352 x 288 pixels; 4CIF (SCIF in the INRIA Ivs
tool) - 704 x 576 pixels; and 16CIF - 1408 x 1152
pixels. The designer can also choose a pixel
aspect ratio; the default is (288/3)/(352/4), which
is 12:11 (as per H.261). The picture area covered
by the standard formats has an aspect ratio of 4:3.
Luminance and chrominance sample positions are as
per H.261, discussed earlier in this chapter.
The structure of the coder is just the same too,
although there are now two additional modes
called the "slice" and "picture block" modes.
40
Moving Image Coding (13)
A macroblock is 16 x 16 Y and 8 x 8 each of Cb and
Cr. The Group of Blocks, or GOB, refers to k x 16
lines; GOBs are numbered using a vertical scan
starting with 0 up to k, depending on the number
of lines in the picture (e.g. normally, when
lines < 400, k is 1). The number of GOBs per
picture is then 6 for sub-QCIF, 9 for QCIF, and 18
for CIF (and for 4CIF and 16CIF because of special
rules). Prediction works on Intra, Inter, B, PB,
EI or EP pictures (the reference picture is
smaller). The macroblock is 16 lines of Y and the
corresponding 8 each of Cb and Cr; we can receive
1 motion vector per macroblock. H.263
extends H.261 to lower bit rates (not just the
p x 64 Kbps design goal) and adds more features
for better quality and services, but the basic
ideas are the same: intra and inter frame
compression, DCT block transform plus
quantization.
41
Moving Image Coding (14)
There are then a number of basic enhancements in
H.263, including:
1. Continuous Presence Multi-point and Video
Multiplex mode - basically 4-in-1 sub-bit-stream
transmission. This may be useful for conferences,
tele-presence, surveillance and so on.
2. Motion vectors can point outside the picture.
3. Arithmetic as well as variable length coding
(VLC).
4. Advanced Prediction Mode, also known as
"Overlapped Block Motion Compensation", which
uses 4 8x8 blocks instead of 1 16x16. This gives
better detail.
5. PB Frames, known as combined Predictive and
Bi-Directional frames (like MPEG II).
6. FEC to help with transmission loss; Advanced
Intra coding to help with interpolation; and
Deblocking Filter mode, to remove blocking
artifacts.
42
Moving Image Coding (15)
7. Slice Structured Mode (re-ordered blocks, so a
slice layer instead of the GOB layer) - more
delay- and loss-tolerant for packet transport.
8. Supplemental Enhancement Information,
Freeze/Freeze Release and Enhancement, and
Chroma Key (use an external picture as
merge/background etc. for mixing).
9. Improved PB mode, including 2-way motion
vectors in PB mode.
10. Reference Picture Selection.
11. Temporal, SNR and Spatial Scalability mode -
this allows receivers to drop B frames, for
example, giving potential heterogeneity amongst
receivers of a multicast.
12. Reduced Resolution Update, Independent Segment
Decoding, and Alternate INTER VLC mode.
13. Modified Quantization mode (can adjust the
amount of quantization up or down to give fine
quality/bit-rate control).
43
Moving Image Coding (16)
Chroma keying is a commonly used technology in
TV, e.g. for picture-in-picture, superimposing,
weather presenters and so on. The idea is to
define some pixels in an image as
"transparent" or "semi-transparent" and,
instead of showing these, a reference
background image is used (c.f. transparent GIFs
on the WWW). We need an octet to define the
keying color for each of Y, Cb and Cr. The actual
choice when there isn't an exact match is
implementor-defined.
44
Moving Image Coding (17)
  • MPEG
  • The aim of the MPEG-II video compression standard is to cater for the growing need for generic coding methods for moving images for various applications such as digital storage and communication. So unlike the H.261 standard, which was specifically designed for the compression of moving images for video conferencing systems at p x 64 Kbps, MPEG considers a wider scope of applications.
  •     - Aimed at storage as well as transmission
  •     - Higher cost and quality than H.261
  •     - Higher minimum bandwidth
  •     - Decoder is just about implementable in software
  •     - Target 2 Mbps to 8 Mbps really
  •     - The "CD" of Video?

45
Moving Image Coding (18)
MPEG Source Image Formats The source pictures
consist of three rectangular matrices of
integers: a luminance matrix (Y) and two
chrominance matrices (Cb and Cr). MPEG
supports three formats. 4:2:0 format - in this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in both the horizontal
and vertical dimensions. 4:2:2 format - in this
format the Cb and Cr matrices shall be one half
the size of the Y matrix in the horizontal
dimension and the same size in the vertical
dimension. 4:4:4 format - in this format the Cb
and Cr matrices will be of the same size as the
Y matrix in both the vertical and horizontal
dimensions. It may be hard to convert to this,
but then this is targeted at digital video tape
and video-on-demand really.
46
Moving Image Coding (19)
MPEG frames The output of the decoding process,
for interlaced sequences, consists of a series
of fields that are separated in time by a field
period. The two fields of a frame may be coded
independently (field pictures) or can be coded
together as a frame (frame pictures). An MPEG
source encoder will consist of the following
elements:
  - Prediction (3 frame times)
  - Block Transformation
  - Quantization and Variable Length Encoding
The diagram in the following shows the intra,
predictive and bi-directional frames that MPEG
supports.
47
Moving Image Coding (20)
MPEG GOP structure
(Diagram: a group of pictures, frames 1 to 9, with forward prediction arrows between reference frames and bidirectional prediction arrows into the intervening frames.)
48
Moving Image Coding (21)
Structure of MPEG bitstream
Sequence layer
GOP layer
Picture layer
Slice layer
Macroblock layer
Block layer
49
Moving Image Coding (22)
  • MPEG Prediction
  • MPEG defines three types of pictures:
  • Intrapictures (I-pictures)
  •     These pictures are encoded only with respect to themselves. Here each picture is composed of blocks of 8 x 8 pixels each that are encoded only with respect to themselves and are sent directly to the block transformation process.
  • Predictive pictures (P-pictures)
  •     These are pictures encoded using motion-compensated prediction from a past I-picture or P-picture. A prediction error is calculated between a 16 x 16 pixel region (macroblock) in the current picture and the past reference I- or P-picture.

50
Moving Image Coding (23)
A motion vector is also calculated to determine
the value and direction of the prediction. For
progressive sequences, and interlaced sequences
with frame-coding, only one motion vector will be
calculated for the P-pictures. For interlaced
sequences with field-coding, two motion vectors
will be calculated. The prediction error is then
composed into 8 x 8 pixel blocks and sent to the
block transformation. Bi-directional pictures
(B-pictures) These are pictures encoded using
motion-compensated prediction from a past
and/or future I-picture or P-picture. A
prediction error is calculated between a 16 x 16
pixel region in the current picture and the past
as well as the future reference I-picture or
P-picture. Two motion vectors are calculated:
one to determine the value and direction of the
forward prediction, the other to determine the
value and direction of the backward prediction.
51
Moving Image Coding (24)
For field-coded pictures in interlaced sequences,
four motion vectors will thus be calculated.
It should be noted that a B-picture can never
be used as a prediction picture. The method of
calculating the motion vectors, as well as the
search area for the best predictor, is left to be
determined by the encoder. MPEG Block
Transformation In block transformation,
INTRA-coded blocks as well as prediction errors
are processed by a two-dimensional DCT function.
Quantization The purpose of this step is to
achieve further compression by representing the
DCT coefficients with no greater precision than
is necessary to achieve the required quality.
52
Moving Image Coding (25)
Variable length encoding Here extra (lossless)
compression is achieved by assigning shorter
code-words to frequent events and longer
code-words to less frequent events. Huffman
coding is usually used to implement this step.
MPEG Picture Order It must be noted that in
MPEG the order of the pictures in the coded
stream is the order in which the decoder
processes them. The reconstructed frames are not
necessarily in the correct order for display.
53
Moving Image Coding (26)
Multiplexing and Synchronizing In networked
multimedia standards, the multiplexing function
defines the way that multiple streams of
different or the same media are carried
from source to sink over a channel. There are at
least three completely different points in this
path where we can perform this function:
  - we can design a multimedia codec which mixes
together the digitally coded (and possibly
compressed) streams as it generates them,
possibly interleaving media at a bit-by-bit
level of granularity;
  - we can design a multiplexing layer that mixes
together the different media as it packetizes
them, possibly interleaving samples of different
media in the same packets; or
  - we can let the network do the multiplexing,
packetizing different media streams completely
separately.
54
Moving Image Coding (27)
MPEG-1 Video MPEG-1 consists of several parts:
System, Video and Audio, etc. Beyond simple
playback, the MPEG-1 System part is responsible
for multiplexing and synchronization. MPEG-1
video distinguishes between four different
coding types for images: I frames, P frames, B
frames and D frames. I frames (intracoded
frames) are coded without reference to other
images. MPEG makes use of JPEG techniques for I
frames. The compression rate for I frames is the
lowest of all the defined coding types. P frames
(predictively coded frames) need information
from the previous I and/or P frame for encoding
and decoding. The achievable compression is
higher than that for I frames.
55
Moving Image Coding (28)
B frames (bidirectionally predictively coded
frames) require information from the previous and
following I and/or P frames for encoding and
decoding. They achieve the highest compression
ratio, since bidirectional motion-compensated
prediction can be used. D frames (DC-coded
frames) are encoded intraframe, whereby the AC
coefficients are neglected. D frames can never be
mixed with the other picture types. Reference
frames must be transmitted first, so the
transmission order and the display order may
differ. At the beginning there is always an I
frame. The first I frame and the first P frame
are the references for the first two B frames,
and the first I frame is also the reference of
the first P frame. Thus the I frame must be
transmitted first, followed by the P frame and
then the B frames.
56
Moving Image Coding (29)
Display and Transmission Order in MPEG-1 Video

Display order:      I B B P B B I
Transmission order: I P B B I B B

The second I frame must be transmitted early since
it serves as the reference for the second pair of
B frames.
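The reordering can be sketched in Python, assuming a simple group of pictures in which each B frame depends on the nearest preceding and following reference frame (a hypothetical helper, not MPEG reference code):

```python
def transmission_order(display):
    """Reorder display-order frames so that every reference frame
    (I or P) precedes the B frames that depend on it."""
    out, pending_b = [], []
    for frame in display:
        if frame in ("I", "P"):      # a reference frame
            out.append(frame)        # send it first...
            out.extend(pending_b)    # ...then the B frames waiting on it
            pending_b = []
        else:                        # a B frame: hold until next reference
            pending_b.append(frame)
    return out + pending_b

# The display order above maps to the transmission order above:
assert transmission_order(["I", "B", "B", "P", "B", "B", "I"]) \
    == ["I", "P", "B", "B", "I", "B", "B"]
```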
57
Moving Image Coding (30)
[Figure: the MPEG-1 video data hierarchy. A video
sequence consists of groups of pictures; each group
of pictures consists of pictures; each picture
consists of slices; each slice consists of macro
blocks; and each macro block consists of 8 x 8
pixel blocks.]
58
Moving Image Coding (31)
MPEG-1 Constrained Parameter Set

Parameter              Restriction
Horizontal resolution  <= 768 pixels
Vertical resolution    <= 576 lines
Macro blocks/s         <= 9,900 macro blocks/s
Frames/s               <= 30 Hz
Motion vector range    -64 to 63.5 pixels
Input buffer size      <= 327,680 bits
Bit rate               <= 1.856 Mbps

MPEG-1 video uses the same image format as H.261
but allows a greater choice of image sizes.
59
Moving Image Coding (32)
  • MPEG-2 Video
  • MPEG-2 video is backward compatible with
    MPEG-1. It allows data rates up to 100
  • Mbps, and it supports interlaced video formats
    as well as HDTV.
  • MPEG-2 can be used for the digital transmission
    of video over satellite,
  • cable, and other broadcast channels.
  • MPEG-2 builds upon the completed MPEG-1 standard
    and was
  • cooperatively developed by ISO/IEC and ITU
    (H.262).
  • MPEG-2 video was defined in terms of extensible
    profiles, each of which
  • supports the features needed by an important
    class of applications.
  • Initially, MPEG-3 was intended to support HDTV
    applications. During
  • development, MPEG-2 video proved adequate when
    scaled up to meet
  • HDTV requirements. As a result, MPEG-3 was
    dropped.

60
Moving Image Coding (33)
MPEG-2 Video Profiles and Levels
61
Moving Image Coding (34)
MPEG-4 Video: MPEG-4 Video supports low-bit-rate
applications. The ISO expert group developing the
MPEG-4 standard has decided to stop the
development of a new video coding method for low
bit rates. Instead, they focus on providing
enhanced functionality based on existing
compression methods, for example the coding of
audio-visual objects. It encodes objects with any
shape in a video scene. Instead of each image in
the video clip being coded as a whole, the
stationary background and the tennis player in the
foreground can be coded independently with
different methods or parameter sets. On the
audio side, audio objects are identified and coded
depending on their contents. One of the
existing video coding methods under study for
MPEG-4 is H.263.
62
Moving Image Coding (35)
  • H.263
  • The ITU-T Recommendation H.263 defines a codec
    for the compression
  • of moving picture component of audio-visual
    services at low bit rates.
  • A typical application is the transmission of
    video over a V.34 modem
  • connection using 20 kbps for video and 6.5 kbps
    for audio.
  • H.263 is based on H.261, but it supports 5 image
    formats, it has refined
  • motion-compensation, and the standard supports
    B frames.
  • B frames in H.263 have only P frames as a
    reference.
  • Up-to-date image compression methods studied
    alongside H.263 include
  • - Wavelet Image Compression
  • - Fractal Image Compression

63
Moving Image Coding (36)
The approaches have different performance
benefits and costs, and all three approaches are
in use for Internet multimedia. Some of the costs
are what engineers call "non-functional" ones,
which derive from the business cases of the
organizations defining the schemes. There are a
lot of players ("stakeholders") in the
multimedia marketplace. Many of them have
devised their own system architectures, not the
least of these being the ITU, ISO, DAVIC and the
IETF. The ITU has largely been concerned with
video telephony, whilst DAVIC has concerned
itself with digital broadcast technology, and the
IETF has slowly added multimedia (store-and-
forward and real-time) to its repertoire.
64
Moving Image Coding (37)
Each group has its own mechanism or family of
mechanisms for identifying media in a stream or
in a store, and for multiplexing over a stream.
The design criteria were different in each case,
as were the target networks and underlying
infrastructure. This has led to some confusion
which will probably persist for a few years yet.
Here we look at the four major players and their
three major architectures for a multimedia
stream. Two earlier attempts to make sense of
this jungle were the brave goals of Apple and
Microsoft, and we briefly discuss their earlier
attempts to unravel this puzzle. Microsoft have
made recent changes to their architecture at many
levels; this is discussed in their product
specifications and we will not cover it here.
65
Moving Image Coding (38)
To cut to the chase: the ITU defines a bit-level
interleave or multiplex appropriate to low-cost,
low-latency terminals and a bit-pipe model of the
network, while the ISO MPEG group defines a
codec-level interleave appropriate to digital
multimedia devices with high quality, but
possibly higher-cost terminals (it is hard to
leave out a function); finally, the DAVIC and
Internet communities define the multiplexer to be
the network, although DAVIC assumes an ATM network
whereas the Internet community obviously assumes
an IP network as the fundamental layer.
66
Moving Image Coding (39)
The Internet community tries to make use of
anything that it is possible to use, so that if an
ITU or DAVIC or ISO codec is available on an
Internet-capable host, someone, somewhere will
sometime devise a way to packetize its output
into IP datagrams. The problem is that, unless
the purist approach of separate media in separate
packets is taken, there are then potentially
several layers of multiplexing. In a classic
paper, David Tennenhouse describes reasons why
this is a very bad architecture for communicating
software systems. Note that this is not a
critique of the ISO MPEG, DAVIC or ITU H.320
architectures; they are beautiful pieces of
design fit for a particular purpose. It is merely
an observation that it is better to unpick their
multiplex in an Internet-based system. It
certainly leads to more choice of where to carry
out other functions (e.g. mixing,
re-synchronization, trans-coding, etc.).
67
Digital Signal Processing (1)
Analog to Digital Conversion: Sampling. An
input signal is first converted from some
continuously varying physical value (e.g. air
pressure) into a continuously varying electrical
signal, for example by a microphone. This signal
can then be converted to a sequence of digital
values, called samples, by an analog-to-digital
conversion circuit.
68
Digital Signal Processing (2)
There are two factors which determine the
accuracy with which the digital sequence of
values captures the original continuous signal:
the maximum rate at which we sample, and the
number of bits used in each sample. This latter
value is known as the quantization level.
69
Digital Signal Processing (3)
  • The raw (uncompressed) digital data rate
    associated with a signal then is
  • simply the sample rate times the number of bits
    per sample.
  • To capture all possible frequencies in the
    original signal, Nyquist's
  • theorem shows that the sampling rate must be at
    least twice the highest
  • frequency component in the continuous signal.
  • It is often not necessary to capture all
    frequencies in the original signal
  • for example, voice is comprehensible with a
    much smaller range of
  • frequencies than we can actually hear.
  • When the sample rate is much lower than the
    highest frequency in the
  • continuous signal, a band-pass filter which
    only allows frequencies in
  • the range actually needed, is usually put
    before the sampling circuit.
  • This avoids possible ambiguous samples
    ("aliases").

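Both the raw-rate calculation and the aliasing problem can be illustrated numerically. The figures below (8 Hz sampling, a 9 Hz alias, telephony rates) are chosen purely for illustration:

```python
import math

# Raw digital rate = sample rate x bits per sample.
# E.g. telephony: 8,000 samples/s at 8 bits/sample.
rate = 8000 * 8
assert rate == 64_000  # 64 kbps

# Sampling a 1 Hz sine and a 9 Hz sine at fs = 8 Hz yields
# identical samples: 9 Hz is above the Nyquist limit (fs/2 = 4 Hz)
# and aliases down to 1 Hz. This is why a band-pass filter is
# placed before the sampling circuit.
fs = 8
samples_1hz = [math.sin(2 * math.pi * 1 * n / fs) for n in range(16)]
samples_9hz = [math.sin(2 * math.pi * 9 * n / fs) for n in range(16)]
assert all(abs(a - b) < 1e-9 for a, b in zip(samples_1hz, samples_9hz))
```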
70
Audio Coding (1)
Audio Input and Output: Audio signals vary
depending on the application. Human speech has a
well-understood spectrum and set of
characteristics, whereas musical input is much
more varied, and the human ear and perception and
cognition systems behave rather differently in
each case. For example, when a speech signal
degrades badly, humans make use of comprehension
to interpolate. Basically, for speech, the
analog signal from a microphone is passed through
several stages. Firstly, a band-pass filter is
applied, eliminating frequencies in the signal
that we are not interested in (e.g. for
telephone-quality speech, above 3.6 kHz). Then the
signal is sampled, converting the analog signal
into a sequence of values, each of which
represents the amplitude of the analog signal
over a small discrete time interval. This is then
quantized, or mapped into one of a set of fixed
values. These values are then coded for
transmission. The process at the receiver is
simply the reverse.
71
Audio Coding (2)
  • Audio compression methods differ in the
    trade-offs between
  • Encoder and decoder complexity,
  • Quality of the compressed audio, and
  • Amount of data.
  • A basic audio compression technique employed in
    digital telephony is based
  • on a logarithmic transformation,
  • A-law transformation maps from 13-bit linearly
    quantized PCM values
  • to 8 bits; commonly used in Europe
  • µ-law transformation maps from 14 bits to 8
    bits; used in North America
  • and Japan
  • Both specifications are covered in ITU
    recommendation G.711.

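The logarithmic mapping can be sketched with the textbook µ-law formula in its continuous form, operating on normalized samples in [-1, 1]. A bit-exact G.711 implementation uses segmented 8-bit code words instead; this is only the underlying curve:

```python
import math

MU = 255  # mu-law parameter used in North America and Japan

def mu_law_compress(x):
    """Map a linear sample in [-1, 1] logarithmically into [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping back to the linear domain."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Quiet samples are boosted before uniform quantization, so they get
# proportionally finer effective quantization steps:
assert mu_law_compress(0.01) > 0.01
# The mapping is exactly invertible before any quantization:
for x in (0.01, 0.1, 0.5, 1.0):
    assert abs(mu_law_expand(mu_law_compress(x)) - x) < 1e-9
```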
72
Audio Coding (3)
  • Adaptive Differential Pulse Code Modulation
    (ADPCM)
  • ADPCM overcomes the disadvantages of DPCM.
  • It is a lossy method that codes differences
    between PCM-coded audio
  • signals using only a small number of bits.
  • It can change the step size of the quantizer,
    the predictor, and adapt to the
  • characteristics of the signal.
  • It is able to code either the high- or the
    low-frequency portion of a signal
  • exactly, and always operates in one of these
    two modes.
  • It reduces the data rate of high-quality audio
    from 1.4 Mbps to 32 kbps.
  • The ADPCM standard is covered by ITU G.721.

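The ideas above (coding prediction differences with few bits, and adapting the quantizer step size) can be sketched as a toy ADPCM loop. The 4-bit codes and the adaptation factors are illustrative, not the tables defined in G.721:

```python
# Toy ADPCM: code the difference between each sample and the
# predicted (previously reconstructed) value with 4 bits, adapting
# the quantizer step size to the recent signal activity.

def adpcm_encode(samples):
    pred, step, codes = 0.0, 1.0, []
    for s in samples:
        diff = s - pred                          # prediction error
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        pred += code * step                      # decoder-visible value
        step *= 1.5 if abs(code) >= 6 else 0.9   # adapt step size
        step = max(step, 0.01)
    return codes

def adpcm_decode(codes):
    pred, step, out = 0.0, 1.0, []
    for code in codes:                           # mirrors the encoder
        pred += code * step
        out.append(pred)
        step *= 1.5 if abs(code) >= 6 else 0.9
        step = max(step, 0.01)
    return out

samples = [0, 1, 2, 3, 2, 1, 0]
decoded = adpcm_decode(adpcm_encode(samples))
# Lossy, but the reconstruction tracks the input closely:
assert max(abs(a - b) for a, b in zip(samples, decoded)) < 0.5
```

Because encoder and decoder adapt the step size from the same code stream, no side information about the quantizer needs to be transmitted.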
73
Audio Coding (4)
  • MPEG-1 Audio
  • MPEG-1 Audio compression is lossy, but it can
    achieve transparent,
  • perceptually lossless compression.
  • The algorithm exploits perceptual limitations of
    the human hearing
  • threshold and auditory masking to determine
    which part of an audio
  • signal is acoustically irrelevant and can be
    removed in the compression.

[Figure: hearing threshold of the human ear,
plotted as amplitude (dB, 0 to 80) versus
frequency (kHz, 0.02 to 20); sounds below the
threshold curve are inaudible.]
74
Audio Coding (5)
Auditory masking Auditory masking is a perceptual
weakness of the ear that occurs whenever the
presence of a strong audio signal makes a
spectral neighborhood of weaker audio signals
imperceptible. The threshold for noise masking
at any given frequency is solely dependent on the
signal activity within a critical band of that
frequency.
[Figure: a strong tonal signal raises the masking
threshold in a spectral region around it, so
weaker signals within that region are
imperceptible (amplitude versus frequency).]
75
Audio Coding (6)
  • MPEG-1 Audio defines three layers.
  • Compression techniques for each layer are
    similar, but coder complexity
  • increases with each layer.
  • Each layer uses a separate but related way of
    compressing audio.
  • Each layer's decoder must decode audio from any
    layer below it. For
  • example, a Layer III decoder must also decode
    Layer II and Layer I audio,
  • while a Layer II decoder must decode Layer I
    audio, but not Layer III.
  • The input audio stream passes simultaneously
    through a filter bank and
  • through a psychoacoustic model.
  • The filter bank divides the input into multiple
    subbands,
  • The psychoacoustic model determines the
    signal-to-mask ratio of each
  • subband.

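The allocation step driven by the psychoacoustic model can be sketched as a greedy loop over subbands. The signal-to-mask ratios (SMR) and the rule of thumb that each extra bit buys roughly 6 dB of SNR are illustrative, not taken from the standard:

```python
# Sketch: greedy bit allocation across subbands. Bits go to the
# subband whose quantization noise is furthest above its masking
# threshold; fully masked subbands receive no bits at all.

def allocate_bits(smr_db, bit_pool, db_per_bit=6.0):
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # noise-to-mask ratio still to be covered in each subband
        nmr = [smr - b * db_per_bit for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        if nmr[worst] <= 0:      # all noise already below the mask
            break
        bits[worst] += 1
    return bits

smr = [24.0, 12.0, 3.0, -6.0]   # last subband is fully masked
assert allocate_bits(smr, bit_pool=10) == [4, 2, 1, 0]
```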
76
Audio Coding (7)
MPEG-1 Audio Encoder: the PCM audio input passes
through a time-to-frequency mapping filter bank
and, in parallel, through a psychoacoustic model;
the model drives the bit/noise allocation,
quantizer and coding stage, and bit-stream
formatting then produces the encoded bit stream.
MPEG-1 Audio Decoder: the encoded bit stream is
unpacked, the frequency samples are
reconstructed, and a frequency-to-time mapping
yields the decoded PCM audio.
77
Audio Coding (8)
  • The MPEG-2 Audio standard extends the
    functionality of MPEG-1 by
  • multichannel coding with up to five channels
    (left, right, center, and two
  • surround channels), plus an additional
    low-frequency enhancement
  • channel, and/or up to seven
    commentary/multilingual channels.
  • It extends the stereo and mono coding of the
    MPEG-1 Audio standard with
  • further sampling rates.
  • MPEG-2 Audio is backward compatible with MPEG-1
    Audio.
  • An MPEG-2 Audio decoder can process any MPEG-1
    Audio bit stream,
  • and an MPEG-1 Audio decoder can read and
    process the stereo information
  • of an MPEG-2 Audio bit stream.
  • For more information regarding MPEG, search
    engines such as
  • AltaVista, Lycos and Yahoo! can be checked.

78
Audio Coding (9)
  • Codebook Excited Linear Predictive Coding (CELP)
  • The main problem with vocoders is the simplistic
    model of the excitation
  • used. One method of circumventing this problem
    is Codebook Excited
  • Linear Prediction (CELP).
  • In the CELP coder the speech is passed through
    the cascade of the vocal
  • tract predictor and the pitch predictor. The
    output of this predictor is a
  • good approximation to Gaussian noise. This
    noise sequence has to be
  • quantized and transmitted to the receiver.
  • Multi-pulse coders quantize it using a series of
    weighted impulses.
  • CELP coders use vector quantization. The index
    of the codeword that
  • produces the best quality speech is
    transmitted along with a gain term for it.
  • The codebook search is carried out using an
    analysis-by-synthesis technique:
  • the speech is synthesized for every entry in the
    codebook.
  • The codeword that produces the lowest error is
    chosen as the excitation.
  • The error measure used is perceptually weighted
    so the chosen codeword
  • produces the speech that sounds the best.

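The analysis-by-synthesis search can be sketched as follows. For simplicity the synthesis filters (vocal tract and pitch predictors) are omitted and plain squared error is used where a real coder would apply perceptual weighting; the codebook values are purely illustrative:

```python
# Sketch: pick the codebook entry (with its best gain) whose
# synthesized output is closest to the target excitation, and
# transmit the index plus the gain term.

def best_codeword(target, codebook):
    def error(entry):
        # least-squares optimal gain for this entry
        num = sum(t * e for t, e in zip(target, entry))
        den = sum(e * e for e in entry) or 1e-12
        gain = num / den
        err = sum((t - gain * e) ** 2 for t, e in zip(target, entry))
        return err, gain

    errs = [error(entry) for entry in codebook]
    index = min(range(len(codebook)), key=lambda i: errs[i][0])
    return index, errs[index][1]

codebook = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 0, -1, 0]]
index, gain = best_codeword([2, -2, 2, -2], codebook)
assert index == 1 and abs(gain - 2.0) < 1e-9
```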
79
Audio Coding (10)
80
Audio Coding (11)
Summary of Audio and Video Input and Output:
Audio and video are loss-tolerant, so they can
use cleverer compression that discards some
information; compression of 400 times is possible
on video. There are a lot of standards for this
now, including schemes based on PCM, such as
ADPCM, or on models such as LPC, and MPEG Audio.
Note that lossy compression of audio and video is
not acceptable to some classes of user (e.g. a
radiologist, or an air traffic controller). It is
sometimes said that "the eye integrates while
the ear differentiates". What is meant by this
is that the eye responds to stronger signals or
higher frequencies with cumulative reaction,
while the ear responds less and less (i.e. to
double the pitch, you have to double the
frequency, so we hear a logarithmic scale as
linear; and to double the loudness, you have to
increase the power exponentially too).
81
References
  • F. Kuo et al., Multimedia Communications:
    Protocols and Applications,
  • Prentice Hall, 1998
  • M. Riley and I. Richardson, Digital Video
    Communications, Artech House, 1997
  • http://dnausers.d-n-a.net/dnetzNRo/mp3info.htm
  • http://www.cas.mcmaster.ca/malcolm/cs4cb3/node28.html
  • http://www.cs.ucl.ac.uk/staff/jon/mmbook/book/book.html
  • ITU-T, Video Codec for Audiovisual Services at
    p x 64 kbps, Recommendation H.261, 1993
  • ITU-T, Generic Coding of Moving Pictures and
    Associated Audio, Recommendation H.262, 1994
  • ITU-T, Video Coding for Low Bit Rate
    Communications, Recommendation H.263, 1995