Title: MPEG-4 AVC (H.264)
1 MPEG-4 AVC (H.264)
2 Introduction
 H.264 is aimed at very low bit rate, real-time, low end-to-end delay, and mobile applications such as conversational services and internet video.
 It offers enhanced visual quality at very low bit rates, particularly at rates below 24 kb/s.
3 Structure of H.264/AVC Video Coder
 VCL (Video Coding Layer): designed to efficiently represent the video content.
 NAL (Network Abstraction Layer): formats the VCL representation of the video and provides header information for conveyance by a variety of transport layers or storage media.
5 Video Coding Layer
6 Basic Structure of VCL
[Figure: block diagram of the H.264 video coder. The input video signal is split into macroblocks of 16×16 pixels. Blocks: Coder Control, Transform/Scaling/Quantization, Decoder (Scaling and Inverse Transform, Deblocking Filter), Intra-frame Prediction, Motion Compensation, Intra/Inter switch, Motion Estimation, Entropy Coding. Signals: Control Data, Quantized Transform Coefficients, Motion Data, Output Video Signal.]
7 Intraframe Prediction
[Figure: coder block diagram repeated from slide 6]
8 Intraframe encoding in H.264 supports Intra_4×4, Intra_16×16 and I_PCM.
 I_PCM allows the encoder to send the values of the encoded samples directly.
 Intra_4×4 and Intra_16×16 use intra prediction.
9 Intra 4×4
 9 modes
 Used in textured areas
 Intra 16×16
 4 modes
 Used in flat areas
10 Four modes of Intra_16×16
 Mode 0 (vertical): extrapolation from upper samples (H).
 Mode 1 (horizontal): extrapolation from left samples (V).
 Mode 2 (DC): mean of upper and left-hand samples (H and V).
 Mode 3 (plane): a linear plane function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly varying luminance.
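The first three modes can be sketched in a few lines. This is a minimal illustration, not the standard's exact procedure: the function names are hypothetical, the block size is taken from the length of the neighbour arrays so that small inputs can be tried, the plane mode is left out, and the SAD-based mode choice is only one possible encoder criterion.

```python
def intra_predict(top, left, mode):
    """Predict an n x n block from the row above (H) and the column to the left (V).

    mode 0 = vertical, 1 = horizontal, 2 = DC. Plane mode (3) is not sketched.
    """
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    if mode == 0:  # vertical: copy each upper sample down its column
        return [list(top) for _ in range(n)]
    if mode == 1:  # horizontal: copy each left sample across its row
        return [[left[r]] * n for r in range(n)]
    if mode == 2:  # DC: rounded mean of all upper and left samples
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_mode(block, top, left):
    """Pick the mode with the smallest SAD (an assumed selection criterion)."""
    return min((0, 1, 2), key=lambda m: sad(block, intra_predict(top, left, m)))
```

For a real macroblock, top and left would each hold 16 reconstructed neighbour samples.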
11 Example
[Figure: original image]
12 Nine modes of Intra_4×4
 The prediction block P is calculated based on the samples labeled A to M.
 The encoder may select, for each block, the prediction mode that minimizes the residual between P and the block to be encoded.
13 Example
Consider a 4×4 block and its neighbors labeled below.
Suppose we use mode 4 for prediction. Then
a = (A + 2M + I + 2)/4
14Example
16 Motion Estimation/Compensation
[Figure: coder block diagram repeated from slide 6]
17 Features of the H.264 motion estimation
 Various block sizes
 ¼-sample accuracy
  6-tap filtering to ½-sample accuracy
  Simplified filtering to ¼-sample accuracy
 Multiple reference pictures
 Generalized B-frames
18 Variable Block Size Block Matching
 In H.264, a video frame is first split into fixed-size macroblocks.
 Each macroblock may then be segmented into subblocks with different block sizes.
 A macroblock has a dimension of 16×16 pixels. The size of the smallest subblock is 4×4.
19 Example
This example shows the effectiveness of block-matching operations with smaller block sizes.
Frame 1
20 Frame 2
21 Difference between Frame 1 and Frame 2
22 Results of block-matching operation with size 16×16
23 Results of block-matching operation with size 8×8
24 Results of block-matching operation with size 4×4
25 To use a subblock smaller than 8×8, it is necessary to first split the macroblock into four 8×8 subblocks.
26 Example
27 Encoding a motion vector for each subblock can cost a significant number of bits, especially if small block sizes are chosen. Motion vectors for neighboring subblocks are often highly correlated, so each motion vector is predicted from the vectors of nearby, previously coded subblocks. The difference between the motion vector of the current block and its prediction is encoded and transmitted.
28 The method of forming the prediction depends on the block size and on the availability of nearby vectors. Let E be the current block, let A be the subblock immediately to the left of E, let B be the subblock immediately above E, and let C be the subblock above and to the right of E. It is not necessary that A, B, C, and E have the same size.
[Figure: current block E with neighbouring blocks A (left), B (above), C (above right) and D (above left)]
29 There are two modes for the prediction of motion vectors:
 Median prediction
  Used for all block sizes except 16×8 and 8×16
 Directional segmentation prediction
  Used for 16×8 and 8×16
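30 [Figure: current block E with neighbouring blocks A (left), B (above), C (above right) and D (above left)]
Median prediction:
 If C does not exist, then C = D.
 If B and C do not exist, then prediction = VA.
 If A and C do not exist, then prediction = VB.
 If A and B do not exist, then prediction = VC.
 Otherwise, prediction = median(VA, VB, VC).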
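The median rules above, together with the motion vector difference that is actually transmitted, can be sketched as follows. The function names are hypothetical, vectors are (x, y) tuples with None marking an unavailable block, and treating a single missing vector as (0, 0) in the median case is an assumption of this sketch that the slide does not state.

```python
def median3(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(va, vb, vc, vd=None):
    """Median motion vector prediction for block E from neighbours A, B, C (and D)."""
    if vc is None:
        vc = vd  # if C does not exist, use D in its place
    available = [v for v in (va, vb, vc) if v is not None]
    if len(available) == 1:
        return available[0]  # only one neighbour exists: use it directly
    # Assumption: any remaining missing vector is treated as (0, 0).
    va, vb, vc = (v if v is not None else (0, 0) for v in (va, vb, vc))
    return (median3(va[0], vb[0], vc[0]), median3(va[1], vb[1], vc[1]))


def mv_difference(mv, pred):
    """Difference between the actual vector and its prediction (the coded value)."""
    return (mv[0] - pred[0], mv[1] - pred[1])
```

The median is taken separately for the horizontal and vertical components, so the predicted vector need not equal any one of the three neighbours.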
31 Directional segmentation prediction
 Vector block size 8×16
  Left: prediction = VA
  Right: prediction = VC
 Vector block size 16×8
  Up: prediction = VB
  Down: prediction = VA
32 Fractional Motion Estimation
In H.264, the motion vectors between the current block and the candidate block have ¼-pel resolution. The samples at sub-pel positions do not exist in the reference frame, so it is necessary to create them by interpolation from nearby image samples.
33 Interpolation of ½-pel samples.
b = round((E - 5F + 20G + 20H - 5I + J)/32)
h = round((A - 5C + 20G + 20M - 5R + T)/32)
j = round((aa - 5bb + 20b + 20s - 5gg + hh)/32)
34 Interpolation of ¼-pel samples.
a = round((G + b)/2)
d = round((G + h)/2)
e = round((b + h)/2)
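A minimal sketch of the two interpolation steps, using integer arithmetic for the rounding. The function names are hypothetical, and the clipping of the half-pel result to the 8-bit range 0..255 is an assumption added here; the slides show only the filter taps.

```python
def clip255(value):
    """Clamp to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """6-tap filter (1, -5, 20, 20, -5, 1)/32 applied to six samples in a line.

    Adding 16 before the shift by 5 rounds to the nearest integer.
    """
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Rounded average of two neighbouring integer or half-pel samples."""
    return (p + q + 1) >> 1
```

For example, b = half_pel(E, F, G, H, I, J) followed by a = quarter_pel(G, b) mirrors the formulas for b and a above.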
35 Multiple Reference Frames
37 Motion estimation based on multiple reference frames provides opportunities for more precise inter prediction and improved robustness to lost picture data.
 The drawback of multiple reference frames is that both the encoder and the decoder have to store the reference frames used for inter-frame prediction in a multi-frame buffer.
38 Mobile Calendar (CIF, 30 fps)
[Figure: rate-distortion plot of PSNR Y (dB, axis from 26 to 38) versus R (Mbit/s, axis from 0 to 4), with four curves: PBB... with generalized B pictures; PBB... with classic B pictures; PPP... with 5 previous references; PPP... with 1 previous reference.]
39 Basic B-frames: basic B-frames cannot be used as reference frames.
40 Generalized B-frames: generalized B-frames can be used as reference frames.
43 Transformation/Quantization
[Figure: coder block diagram repeated from slide 6]
44 The Discrete Cosine Transform (DCT) operates on x, a block of N×N samples, and creates X, an N×N block of coefficients.
The forward DCT: X = A x A^T
The inverse DCT: x = A^T X A
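45 The elements of A are
A(i,j) = C_i cos( (2j + 1) i π / (2N) )
where
C_i = sqrt(1/N) for i = 0, and C_i = sqrt(2/N) for i > 0.
That is,
X(u,v) = C_u C_v Σ_i Σ_j x(i,j) cos( (2i + 1) u π / (2N) ) cos( (2j + 1) v π / (2N) )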
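46 Example
The transform matrix A for a 4×4 DCT is
A(i,j) = C_i cos( (2j + 1) i π / 8 ), for i, j = 0, ..., 3
47 That is,
A = [ a  a  a  a ]
    [ b  c -c -b ]
    [ a -a -a  a ]
    [ c -b  b -c ]
where
a = 1/2, b = sqrt(1/2) cos(π/8), c = sqrt(1/2) cos(3π/8)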
48 The H.264 transform is based on the 4×4 DCT but with some fundamental differences:
 It is an integer transform.
 The core part of the transform can be implemented using only additions and shifts.
 A scaling multiplication is integrated into the quantizer, reducing the total number of multiplications.
49 Recall that
X = A x A^T
where
A = [ a a a a ; b c -c -b ; a -a -a a ; c -b b -c ]
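50 Post-scaling
X = (C x C^T) ⊗ E (where d = c/b)
C = [ 1 1 1 1 ; 1 d -d -1 ; 1 -1 -1 1 ; d -1 1 -d ]
E = [ a² ab a² ab ; ab b² ab b² ; a² ab a² ab ; ab b² ab b² ]
1. We call (C x C^T) the core 2D transform.
2. E is a matrix of scaling factors.
3. ⊗ indicates that each element of (C x C^T) is multiplied by the scaling factor in the same position in matrix E (i.e., ⊗ is element-by-element scalar multiplication rather than matrix multiplication).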
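51 To simplify the implementation of the transform, d is approximated by 0.5. In order to ensure that the transform remains orthogonal, b also needs to be modified so that
a = 1/2, b = sqrt(2/5), d = 1/2
52 The final forward transform becomes
X = (Cf x Cf^T) ⊗ Ef
with Cf = [ 1 1 1 1 ; 2 1 -1 -2 ; 1 -1 -1 1 ; 1 -2 2 -1 ] and the elements of Ef equal to a², ab/2 or b²/4.
53 The inverse transform is given by
x' = Ci^T (X ⊗ Ei) Ci
Pre-scaling: the element-by-element multiplication of X by Ei, whose elements are a², ab or b².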
54 H.264 assumes a scalar quantizer.
 The quantization should satisfy the following requirements:
  avoid division and/or floating-point arithmetic
  incorporate the post- and pre-scaling matrices Ef and Ei.
55 The basic forward quantizer operation is
Z(u,v) = round( X(u,v)/QStep )
where X(u,v) is a transform coefficient, Z(u,v) is a quantized coefficient, and QStep is the quantizer step size.
56 There are 52 quantizers (i.e., the Quantization Parameter (QP) ranges from 0 to 51). An increase of 1 in QP means an increase of QStep by approximately 12%. An increase of 6 in QP means an increase of QStep by a factor of 2.
57 The post-scaling factor (PF) (i.e., a², ab/2 or b²/4) is incorporated into the forward quantizer in the following way:
 The input block x is transformed to give a block of unscaled coefficients W = Cf x Cf^T.
 Then, each coefficient in W is quantized and scaled in a single operation:
 Z(u,v) = round( W(u,v) PF / QStep )
 where PF is a², ab/2 or b²/4 depending on the position (u,v).
Why?
58 In order to simplify the arithmetic, the factor (PF/QStep) is implemented as a multiplication by a factor MF and a right shift, avoiding any division operations.
Z(u,v) = round( W(u,v) MF / 2^qbits )
where
MF / 2^qbits = PF / QStep
and
qbits = 15 + floor(QP/6)
59 Note that the round operation does not have to be the nearest-integer operation. In the reference model software, the round operation is realized by
|Z(u,v)| = (|W(u,v)| MF + f) >> qbits
sign(Z(u,v)) = sign(W(u,v))
where f is 2^qbits/3 for Intra blocks and 2^qbits/6 for Inter blocks.
60 Example
Suppose QP = 4 and (u,v) = (0,0). Therefore, QStep = 1.0, PF = a² = 0.25, and qbits = 15.
From
MF / 2^qbits = PF / QStep
we have MF = 8192.
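The relationship between QP, QStep, qbits and MF in this example can be checked with a short sketch. The six base step sizes for QP 0 to 5 are not given on the slides; the values below are the commonly tabulated ones and should be treated as an assumption to verify against the standard. The function names are hypothetical. Note also that the standard tabulates MF rather than computing it at run time, so tabulated values for other positions may differ slightly from this direct calculation.

```python
# Assumed base step sizes for QP = 0..5; QStep doubles for every increase of 6 in QP.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def qbits(qp):
    """Right-shift amount used by the forward quantizer."""
    return 15 + qp // 6


def mf_from_pf(pf, qp):
    """Solve MF / 2^qbits = PF / QStep for MF, rounded to an integer."""
    return round(pf / qstep(qp) * (1 << qbits(qp)))
```

With PF = a² = 0.25 this gives the same MF for QP = 4 and QP = 10, because the doubling of QStep is absorbed by the extra bit in qbits.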
61 The MF values for the lower QPs (QP ≤ 5) are shown below.
Table_for_MF
For QP > 5, the factors MF repeat (MF depends only on QP mod 6), but qbits increases by 1 for each increment of six in QP. That is, qbits = 16 for 6 ≤ QP ≤ 11, qbits = 17 for 12 ≤ QP ≤ 17, and so on.
62 The dequantized coefficient is given by
X'(u,v) = Z(u,v) QStep
The inverse transform involving pre-scaling operations proceeds in the following way:
1. The dequantized block X' is pre-scaled to a block W' for the core 2D inverse transform.
2. The reconstructed block x' is then given by
x' = Ci^T W' Ci
63 The pre-scaling factor (PF) (i.e., a², ab or b²) is incorporated in the computation of W', together with a constant scaling factor of 64 to avoid rounding errors:
W'(u,v) = Z(u,v) QStep PF 64
The values at the output of the inverse transform should be divided by 64 to remove the constant scaling factor.
64 The H.264 standard does not specify QStep or PF directly. Instead, the parameter V = QStep PF 64 is defined.
The V values for the lower QPs (QP ≤ 5) are shown below.
Table_for_V
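65 For QP > 5, the V value increases by a factor of 2 for each increment of six in QP.
That is,
W'(u,v) = Z(u,v) V(QP mod 6, u, v) 2^floor(QP/6)
where
V(QP mod 6, u, v) is the V value listed for QP mod 6 at position (u,v).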
66 The Complete Transformation, Quantization, Rescaling and Inverse Transformation
 Encoding
  Input 4×4 block x
  Forward core transform: W = Cf x Cf^T
  Post-scaling and quantization: Z(u,v) = round( W(u,v) MF / 2^qbits )
 Decoding
  Pre-scaling
  Inverse core transform
  Rescaling
67 Example
1. Suppose QP = 10, and the input block x is
  5  11   8  10
  9   8   4  12
  1  10  11   4
 19   6  15   7
2. Forward core transform W:
 140   -1   -6    7
 -19  -39    7  -92
  22   17    8   31
 -27  -32  -59  -21
68 3. MF = 8192, 3355 or 5243, qbits = 16 and f is 2^qbits/3. Z:
  17    0   -1    0
  -1   -2    0   -5
   3    1    1    2
  -2   -1   -5   -1
4. V = 32, 50 or 40 because 2^floor(QP/6) = 2. W':
 544     0   -32     0
 -40  -100     0  -250
  96    40    32    80
 -80   -50  -200   -50
69 5. Output of the inverse core transform after division by 64 is
   4  13   8  10
   8   8   4  12
   1  10  10   3
  18   5  14   7
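The five steps of this worked example can be reproduced with a small pure-Python sketch, with the minus signs of the intermediate blocks written out. Several things here are assumptions rather than content of the slides: the function and constant names, the inverse matrix CI_T, and the mapping of the three MF and V values to coefficient positions (both indices even, both odd, or mixed). The constants are hard-wired for QP = 10.

```python
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Assumed inverse core matrix Ci^T (entries of 1 and 1/2).
CI_T = [[1, 1, 1, 0.5], [1, 0.5, -1, -1], [1, -0.5, -1, 1], [1, -1, 1, -0.5]]
MF_QP10 = (8192, 3355, 5243)  # values quoted in step 3
V_QP10 = (32, 50, 40)         # values quoted in step 4
QBITS = 16                    # 15 + floor(10 / 6)


def matmul(a, b):
    """4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def position_class(u, v):
    """0: both indices even, 1: both odd, 2: mixed (assumed position mapping)."""
    if u % 2 == 0 and v % 2 == 0:
        return 0
    if u % 2 == 1 and v % 2 == 1:
        return 1
    return 2


def forward_core(x):
    """W = Cf x Cf^T."""
    return matmul(matmul(CF, x), transpose(CF))


def quantize(w, intra=True):
    """|Z| = (|W| * MF + f) >> qbits, with the sign copied from W."""
    f = (1 << QBITS) // (3 if intra else 6)
    z = [[0] * 4 for _ in range(4)]
    for u in range(4):
        for v in range(4):
            mag = (abs(w[u][v]) * MF_QP10[position_class(u, v)] + f) >> QBITS
            z[u][v] = mag if w[u][v] >= 0 else -mag
    return z


def rescale(z):
    """W' = Z * V, where V already includes the factor 2^floor(QP/6)."""
    return [[z[u][v] * V_QP10[position_class(u, v)] for v in range(4)]
            for u in range(4)]


def inverse_core(w_prime):
    """x' = Ci^T W' Ci, then divide by 64 and round."""
    x_prime = matmul(matmul(CI_T, w_prime), transpose(CI_T))
    return [[round(val / 64) for val in row] for row in x_prime]
```

Calling inverse_core(rescale(quantize(forward_core(x)))) on the input block of step 1 walks through the whole chain; the reconstructed block differs from x by at most a few units because of quantization.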
70 Entropy Coding
[Figure: coder block diagram repeated from slide 6]
71 Here we present two basic variable length coding (VLC) techniques used by H.264: the Exp-Golomb code and context-adaptive VLC (CAVLC).
 The Exp-Golomb code is used universally for all symbols except transform coefficients.
 CAVLC is used for coding of transform coefficients.
  No end-of-block symbol is sent; instead, the number of coefficients is decoded.
  Coefficients are scanned backward.
  Contexts are built dependent on transform coefficients.
72 Exp-Golomb codes are variable-length codes with a regular construction.
First 9 codewords of Exp-Golomb codes
code_num  Codeword
0         1
1         010
2         011
3         00100
4         00101
5         00110
6         00111
7         0001000
8         0001001
73 Each Exp-Golomb codeword is constructed as follows: [M zeros][1][INFO], where INFO is an M-bit field carrying information. Therefore, the length of a codeword is 2M + 1.
74 Given a code_num, the corresponding Exp-Golomb codeword can be obtained by the following procedure:
 M = floor(log2(code_num + 1))
 INFO = code_num + 1 - 2^M
Example: code_num = 6
 M = floor(log2(6 + 1)) = 2
 INFO = 6 + 1 - 2^2 = 3
 The corresponding Exp-Golomb codeword is [M zeros][1][INFO] = 00111
75 Given an Exp-Golomb codeword, its code_num can be found as follows:
 Read in M leading zeros followed by 1.
 Read the M-bit INFO field.
 code_num = 2^M + INFO - 1
 Example
  Exp-Golomb codeword = 00111
  M = 2
  INFO = 3
  code_num = 2^2 + 3 - 1 = 6
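The encoding and decoding procedures can be sketched directly from the formulas for M and INFO. Bit strings are represented as Python strings of '0' and '1' for readability, and the function names are hypothetical; a real codec would operate on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos].

    Returns (code_num, position of the first bit after the codeword).
    """
    m = 0
    while bits[pos + m] == "0":              # count the M leading zeros
        m += 1
    start = pos + m + 1                      # skip the separating '1'
    info = int(bits[start:start + m], 2) if m else 0
    return (1 << m) + info - 1, start + m
```

Because the decoder returns the position after the codeword, consecutive codewords in one string can be decoded by feeding that position back in.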
76 A parameter v to be encoded is mapped to code_num in one of 3 ways:
ue(v): Unsigned direct mapping, code_num = v. (Mainly used for macroblock type and reference frame index.)
se(v): Signed mapping. v is mapped to code_num as follows:
 code_num = 2|v| (v ≤ 0)
 code_num = 2|v| - 1 (v > 0)
 (Mainly used for motion vector difference and delta QP.)
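The signed mapping and its inverse amount to a couple of lines each. The function names below are hypothetical.

```python
def se_to_code_num(v):
    """Signed mapping: 2|v| for v <= 0, 2|v| - 1 for v > 0."""
    return 2 * v - 1 if v > 0 else -2 * v


def code_num_to_se(code_num):
    """Inverse mapping: odd code_nums are positive values, even ones are zero or negative."""
    if code_num % 2:
        return (code_num + 1) // 2
    return -(code_num // 2)
```

The values 0, 1, -1, 2, -2 therefore map to code_num 0, 1, 2, 3, 4, so small magnitudes of either sign receive short codewords.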
77 me(v): Mapped symbols. Parameter v is mapped to code_num according to a table specified in the standard. This mapping is used for coded_block_pattern parameters. An example of such a mapping is shown below.
78 CAVLC is the method used to encode residual, zigzag-ordered blocks of transform coefficients.
79 CAVLC is designed to take advantage of several characteristics of quantized 4×4 blocks:
 After prediction, transformation and quantization, blocks are typically sparse (containing mostly zeros).
 The highest nonzero coefficients after the zigzag scan are often sequences of ±1.
 The number of nonzero coefficients in neighboring blocks is correlated.
 The level (magnitude) of nonzero coefficients tends to be higher at the start of the zigzag scan, and lower towards the high frequencies.
80 The procedure described below is based on JVT Document JVT-C028: Gisle Bjøntegaard and Karl Lillevold, "Context-adaptive VLC (CVLC) coding of coefficients," Fairfax, VA, May 2002. The H.264 CAVLC is an extension of this work.
81 The CAVLC encoding of a block of transform coefficients proceeds as follows:
 Encode the number of coefficients and trailing ones.
 Encode the sign of each trailing one.
 Encode the levels of the remaining nonzero coefficients.
 Encode the total number of zeros before the last coefficient.
 Encode each run of zeros.
82 Encode the number of coefficients and trailing ones
The first step is to encode the number of coefficients (NumCoef) and trailing ones (T1s). NumCoef can be anything from 0 (no coefficients in the block) to 16 (16 nonzero coefficients). T1s can be anything from 0 to 3. If there are more than 3 trailing ±1s, only the last 3 are treated as special cases and the others are coded as normal coefficients.
83 Example: consider the 4×4 block shown below
2 4 0 1
3 0 0 0
0 0 1 0
1 1 0 0
NumCoef = 7 and T1s = 3.
84 Three tables can be used for the encoding of Num_Coeff and T1s: NumVLC0, NumVLC1 and NumVLC2.
NumVLC0
85 The selection of tables depends on the numbers of nonzero coefficients in the upper and left-hand previously coded blocks, NU and NL. A parameter N is calculated as follows:
 If blocks U and L are both available (i.e., in the same coded slice), N = (NU + NL)/2.
 If only block U is available, N = NU.
 If only block L is available, N = NL.
 If neither is available, N = 0.
86 The selection of table is based on N in the following way:
N           | Selected Table
0, 1        | NumVLC0
2, 3        | NumVLC1
4, 5, 6, 7  | NumVLC2
8 or above  | FLC
The FLC is of the following form: xxxxyy (i.e., 6 bits), where xxxx and yy represent Num_Coeff and T1s, respectively.
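The context calculation and table choice can be sketched as two small functions. The names are hypothetical, None marks an unavailable neighbour, and the use of integer (floor) division for (NU + NL)/2 is an assumption, since the slide does not say how a fractional average is rounded.

```python
def context_n(nu=None, nl=None):
    """Compute N from the coefficient counts of the upper (nu) and left (nl) blocks."""
    if nu is not None and nl is not None:
        return (nu + nl) // 2   # assumed rounding: floor of the average
    if nu is not None:
        return nu
    if nl is not None:
        return nl
    return 0


def select_num_table(n):
    """Map N to the table used for coding Num_Coeff and T1s."""
    if n <= 1:
        return "NumVLC0"
    if n <= 3:
        return "NumVLC1"
    if n <= 7:
        return "NumVLC2"
    return "FLC"
```

A block whose neighbours are both busy (large N) is therefore coded with a table that favours larger coefficient counts.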
87 Encode the sign of each trailing one
For each T1, a single bit encodes the sign (0 = +, 1 = -). These are encoded in reverse order, starting with the highest-frequency T1.
88 Encode the levels of the remaining nonzero coefficients
The level (sign and magnitude) of each remaining nonzero coefficient in the block is encoded in reverse order. There are 5 VLC tables to choose from, Lev_VLC0 to Lev_VLC4. Lev_VLC0 is biased towards lower magnitudes, Lev_VLC1 is biased towards slightly higher magnitudes, and so on.
90 This is used only when it is impossible for a coefficient to have the values ±1. It will happen when T1s < 3.
92 To improve coding efficiency, the tables are switched during the coding process according to the following procedure.
93 Encode the total number of zeros before the last coefficient
The following shows the table for encoding the total number of zeros before the last coefficient (TotZeros).
94At this stage it is known how many zeros are left
to distribute (call this ZerosLeft). When
encoding or decoding a nonzero coefficient for
the first time, ZerosLeft begins at TotZeros,
and decreases as more nonzero coefficients are
encoded or decoded. The number of preceding
zeros before each nonzero coefficient (called
RunBefore) needs to be coded to properly locate
that coefficient. Before coding the next
RunBefore, ZerosLeft is updated and used to
select one out of 7 tables.
95 ZerosLeft
Why is the maximum number 14?
96 Example
Consider the following inter-frame residual 4×4 block:
 0  3 -1  0
 0 -1  1  0
 1  0  0  0
 0  0  0  0
The zigzag reordering of the block is shown below:
0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
Therefore, NumCoeff = 5, TotZeros = 3, T1s = 3. Assume N = 0.
97 Encoding
Value                         | Code    | Comments
NumCoeff = 5, T1s = 3         | 0001011 | Use NumVLC0
Sign of T1 (+1)               | 0       | Starting at highest frequency
Sign of T1 (-1)               | 1       |
Sign of T1 (-1)               | 1       |
Level = +1                    | 1       | Use Lev_VLC0
Level = +3                    | 0010    | Use Lev_VLC1
TotZeros = 3                  | 1110    | Also depends on NumCoeff
ZerosLeft = 3, RunBefore = 1  | 00      | RunBefore of the 1st coeff
ZerosLeft = 2, RunBefore = 0  | 1       | RunBefore of the 2nd coeff
ZerosLeft = 2, RunBefore = 0  | 1       | RunBefore of the 3rd coeff
ZerosLeft = 2, RunBefore = 1  | 01      | RunBefore of the 4th coeff
ZerosLeft = 1, RunBefore = 1  |         | No code required; last coeff
The transmitted bitstream for this block is
0001011011100101110001101
98 Decoding
Code    | Value                 | Output Array              | Comments
0001011 | NumCoeff = 5, T1s = 3 | Empty                     |
0       | +                     | 1                         | T1 sign
1       | -                     | -1, 1                     | T1 sign
1       | -                     | -1, -1, 1                 | T1 sign
1       | +1                    | 1, -1, -1, 1              | Level value
0010    | +3                    | 3, 1, -1, -1, 1           | Level value
1110    | TotZeros = 3          | 3, 1, -1, -1, 1           |
00      | RunBefore = 1         | 3, 1, -1, -1, 0, 1        | RunBefore of the 1st coeff
1       | RunBefore = 0         | 3, 1, -1, -1, 0, 1        | RunBefore of the 2nd coeff
1       | RunBefore = 0         | 3, 1, -1, -1, 0, 1        | RunBefore of the 3rd coeff
01      | RunBefore = 1         | 3, 0, 1, -1, -1, 0, 1     | RunBefore of the 4th coeff
        | ZerosLeft = 1         | 0, 3, 0, 1, -1, -1, 0, 1  |
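The syntax elements used in the encoding and decoding walk-through above (NumCoeff, T1s, the trailing-one sign bits, TotZeros and the RunBefore values) can be derived from a block with the following sketch. It does not produce the VLC codewords themselves, since those tables are not reproduced on the slides. The function names are hypothetical, the zigzag order is the usual 4×4 frame scan, and the coefficient signs in the usage note are inferred from the sign bits (0, 1, 1) in the encoding table.

```python
# 4x4 zigzag scan order as (row, column) pairs.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]


def zigzag_scan(block):
    """Reorder a 4x4 block (list of rows) into a 16-element zigzag list."""
    return [block[r][c] for r, c in ZIGZAG]


def cavlc_symbols(zigzag):
    """Derive the CAVLC syntax elements from a zigzag-ordered coefficient list."""
    idx = [i for i, c in enumerate(zigzag) if c != 0]
    if not idx:
        return {"num_coeff": 0, "t1s": 0, "t1_signs": "", "levels": [],
                "tot_zeros": 0, "run_before": []}
    rev = idx[::-1]                      # highest frequency first
    t1s = 0
    for i in rev:                        # count up to 3 trailing +/-1 values
        if abs(zigzag[i]) == 1 and t1s < 3:
            t1s += 1
        else:
            break
    signs = "".join("0" if zigzag[i] > 0 else "1" for i in rev[:t1s])
    levels = [zigzag[i] for i in rev[t1s:]]   # remaining levels, reverse order
    tot_zeros = idx[-1] + 1 - len(idx)        # zeros before the last coefficient
    # Zeros immediately preceding each coefficient, in reverse order.
    run_before = [i - (rev[k + 1] if k + 1 < len(rev) else -1) - 1
                  for k, i in enumerate(rev)]
    return {"num_coeff": len(idx), "t1s": t1s, "t1_signs": signs,
            "levels": levels, "tot_zeros": tot_zeros, "run_before": run_before}
```

For the block [[0, 3, -1, 0], [0, -1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]], cavlc_symbols(zigzag_scan(block)) gives num_coeff 5, t1s 3, sign bits "011", remaining levels [1, 3], tot_zeros 3 and run_before [1, 0, 0, 1, 1], matching the tables.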
99 Deblocking Filter
[Figure: coder block diagram repeated from slide 6]
100 The deblocking filter improves subjective visual quality. The filter is highly context-adaptive. It operates on the boundaries of 4×4 subblocks, as shown below.
[Figure: boundary samples q3, q2, q1, q0 and p0, p1, p2, p3 on either side of a vertical and a horizontal 4×4 block edge]
101The choice of filtering outcome depends on the
boundary strength and on the gradient of image
samples across the boundary. The boundary
strength parameter Bs is selected according to
the following rules.
102 A group of samples from the set (p2, p1, p0, q0, q1, q2) is filtered only if
 (a) Bs > 0 and
 (b) |p0 - q0| < α and |p1 - p0| < β and |q1 - q0| < β,
 where α and β are thresholds defined in the standard.
 The threshold values increase with the average quantizer parameter QP of the two blocks q and p.
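The filtering decision in conditions (a) and (b) can be written as a single predicate. This is a minimal sketch with a hypothetical function name; alpha and beta are passed in by the caller because the QP-dependent threshold tables of the standard are not reproduced on the slides.

```python
def should_filter(bs, p, q, alpha, beta):
    """Decide whether the samples across one edge position are filtered.

    p and q hold the samples on each side of the edge, nearest first:
    p = (p0, p1, ...), q = (q0, q1, ...).
    """
    return (bs > 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)
```

A small step across the edge with flat samples on both sides passes the test and is smoothed, while a large step is taken to be a real image edge and is left alone.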
103 When QP is small, anything other than a very small gradient across the boundary is likely to be due to image features that should be preserved, and so the thresholds α and β are low. When QP is larger, blocking distortion is likely to be more significant, and α and β are higher so that more boundary samples are filtered.
104 [Figure: decoded frame without deblock filtering and with deblock filtering]
105 Data Partitioning and Network Abstraction Layer
106 A video picture is coded as one or more slices. Each slice contains an integral number of macroblocks, from 1 to the total number of macroblocks in a picture. The number of macroblocks per slice need not be constant within a picture.
107 There are five slice modes. The three commonly used modes are:
 I-slice: a slice in which all macroblocks are coded using intra prediction.
 P-slice: in addition to the coding types of the I-slice, some macroblocks of the P-slice can be coded using inter prediction (predicted from one reference picture buffer only).
 B-slice: in addition to the coding types available in a P-slice, some macroblocks of the B-slice can be predicted from two reference picture buffers.
108 Note that the coded data in a slice can be placed in three separate Data Partitions (A, B and C) for robust transmission. Partition A contains the slice header and header data for each macroblock in the slice. Partition B contains coded residual data for Intra slice macroblocks. Partition C contains coded residual data for Inter slice macroblocks.
109 In H.264, the VCL data are mapped into NAL units prior to transmission or storage. Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information. The NAL units can be delivered over a packet-based network or a bitstream transmission link, or stored in a file.
[Figure: sequence of NAL units: NAL header | RBSP | NAL header | RBSP | NAL header | RBSP]
110 RBSP type | Description
Parameter Set | Global parameters for a sequence, such as picture dimensions and video format.
Supplemental Enhancement Information | Side messages that are not essential for correct decoding of the video sequence.
Picture Delimiter | Boundary between pictures (optional). If not present, the decoder infers the boundary based on the frame number contained within each slice header.
Coded Slice | Header and data for a slice; this RBSP contains actual coded video data.
Data Partition A, B or C | Three units containing Data Partitioned slice layer data (useful for error-resilient decoding).
End of Sequence |
End of Stream |
Filler Data | Contains dummy data.
111 Example
The following figure shows an example of a sequence of RBSP elements.
Sequence parameter set | SEI | Picture parameter set | I Slice (Coded slice) | Picture delimiter | P Slice (Coded slice) | P Slice (Coded slice) | ...
112Profiles
 Baseline
 Main
 Extended
 High