Introduction to H.264/AVC Video Coding

1
Introduction to H.264/AVC Video Coding
  • Thomas Wiegand, Gary J. Sullivan, Gisle
    Bjøntegaard, and Ajay Luthra, "Overview of the
    H.264/AVC Video Coding Standard," IEEE
    Transactions on Circuits and Systems for Video
    Technology, Vol. 13, No. 7, July 2003
  • Jörn Ostermann, Jan Bormans, Peter List, Detlev
    Marpe, Matthias Narroschke, Fernando Pereira,
    Thomas Stockhammer, and Thomas Wedi, "Video
    coding with H.264/AVC: Tools, Performance, and
    Complexity," IEEE Circuits and Systems Magazine,
    Vol. 4, Issue 1, First Quarter 2004

2
Outline
  • Goals of the H.264/AVC
  • Structure of H.264/AVC video encoder
  • Design feature highlights
  • Prediction methods
  • Transform details and VLC
  • Robustness on transmission
  • Video coding layer
  • Hypothetical reference decoder
  • Profiles and Levels
  • Network abstraction layer
  • Comparisons

3
Goals of H.264/AVC
  • Video Coding Experts Group (VCEG), ITU-T SG16 Q.6
  • H.26L project (early 1998)
  • Target: double the coding efficiency in
    comparison to any other existing video coding
    standard for a broad variety of applications.
  • H.261, H.262 (MPEG-2),
  • H.263 (H.263+, H.263++)

4
Structure of H.264/AVC video encoder
[Figure: H.264/AVC conceptual layers. The Video Coding
Layer (VCL) encoder and decoder sit above the VCL-NAL
interface. Below it, the Network Abstraction Layer (NAL)
encoder and decoder expose the NAL encoder and decoder
interfaces to the transport layer (H.264 to file format
and TCP/IP, H.264 to H.320, H.264 to MPEG-2, H.264 to
H.324/M) over wired and wireless networks.]
5
Design feature highlights (1): improved prediction methods
  • Variable block-size motion compensation with
    small block sizes
  • A minimum luma motion compensation block size as
    small as 4×4.
  • Quarter-sample-accurate motion compensation
  • First found in an advanced profile of the MPEG-4
    Visual (Part 2) standard; H.264/AVC further
    reduces the complexity of the interpolation
    processing compared to the prior design.

6
Design feature highlights (2): improved prediction methods
  • Motion vectors over picture boundaries
  • First found as an optional feature in H.263,
    this technique is included in H.264/AVC.
  • Multiple reference picture motion compensation
  • Decoupling of referencing order from display
    order
  • (X) IBBPBBPBBP → IPBBPBBPBB
  • Bounded by a total memory capacity imposed to
    ensure decoding ability.
  • Enables removing the extra delay previously
    associated with bi-predictive coding.

7
Design feature highlights (3): improved prediction methods
  • Decoupling of picture representation methods from
    picture referencing capability
  • In prior standards, B-frames could not be used as
    references for prediction
  • Removing this restriction allows referencing the
    closest pictures
  • Weighted prediction
  • A new innovation in H.264/AVC allows the
    motion-compensated prediction signal to be
    weighted and offset by amounts specified by the
    encoder.
  • Useful for scene fades, etc.
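The weighting and offset idea can be sketched in a few lines. This is a simplified illustration of single-reference weighting, not the normative process: the function name, the parameter names, and the default denominator of 2^5 are assumptions made for the example.

```python
def weighted_prediction(samples, weight, offset, log2_denom=5, bit_depth=8):
    """Weight and offset motion-compensated samples, then clip to the sample range."""
    max_value = (1 << bit_depth) - 1
    # Rounding term for the right shift; none is needed when the shift is zero.
    rounding = (1 << (log2_denom - 1)) if log2_denom > 0 else 0
    result = []
    for sample in samples:
        value = ((sample * weight + rounding) >> log2_denom) + offset
        result.append(max(0, min(max_value, value)))
    return result


# Hypothetical fade toward black: weight 16/32 = 0.5, no offset.
faded = weighted_prediction([200, 100, 50], weight=16, offset=0)
```

With a weight of 32 and an offset of 0 the prediction is unchanged, which is why a fade can be coded cheaply: only the weight and offset change from picture to picture.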

8
Design feature highlights (4): improved prediction methods
  • Improved skipped and direct motion inference
  • Inferring motion in skipped areas, useful for
    global motion
  • Enhanced motion inference method for direct mode

9
Design feature highlights (5): improved prediction methods
  • Directional spatial prediction for intra coding
  • Allows prediction from neighboring areas that
    were not coded using intra coding
  • This is not possible with the transform-domain
    prediction method found in H.263 and MPEG-4
    Visual

10
Design feature highlights (6): improved prediction methods
  • In-the-loop deblocking filtering
  • Building further on a concept from an optional
    feature of H.263
  • The deblocking filter in the H.264/AVC design is
    brought within the motion-compensated prediction
    loop

11
Design feature highlights (7): other parts
  • Small block-size transform
  • The new H.264/AVC design is based primarily on a
    4×4 transform.
  • Allowing the encoder to represent signals in a
    more locally-adaptive fashion, which reduces
    artifacts known colloquially as ringing.
  • Quantization DPCM for DC terms
  • Spurious frequencies truncation mismatch periods

12
Design feature highlights (8): other parts
  • Hierarchical block transform
  • Using a hierarchical transform to extend the
    effective block size used for low-frequency
    chroma information to an 8×8 array
  • Allowing the encoder to select a special coding
    type for intra coding, enabling extension of the
    length of the luma transform for low-frequency
    information to a 16×16 block size

13
Design feature highlights (9): other parts
  • Short word-length transform
  • While previous designs have generally required
    32-bit processing, the H.264/AVC design requires
    only 16-bit arithmetic.
  • Exact-match inverse transform
  • Building on a path laid out as an optional
    feature in the H.263 effort, H.264/AVC is the
    first standard to achieve exact equality of
    decoded video content from all decoders.
  • Integer transform

14
Design feature highlights (10): other parts
  • Arithmetic entropy coding
  • While arithmetic coding was previously found as
    an optional feature of H.263, a more effective
    use of this technique is found in H.264/AVC to
    create a very powerful entropy coding method
    known as CABAC (context-adaptive binary
    arithmetic coding)

15
Design feature highlights (11): other parts
  • Context-adaptive entropy coding
  • CAVLC (context-adaptive variable-length coding)
  • CABAC (context-adaptive binary arithmetic coding)

16
Design feature highlights (12): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • Parameter set structure
  • The parameter set design provides for robust and
    efficient conveyance of header information
  • NAL unit syntax structure
  • Each syntax structure in H.264/AVC is placed into
    a logical data packet called a NAL unit
17
Design feature highlights (13): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • Flexible slice size
  • Unlike the rigid slice structure found in MPEG-2
    (which reduces coding efficiency by increasing
    the quantity of header data and decreasing the
    effectiveness of prediction), slice sizes in
    H.264/AVC are highly flexible, as was the case
    earlier in MPEG-1.

18
Design feature highlights (14): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • Flexible macroblock ordering (FMO)
  • Significantly enhances robustness to data losses
    by managing the spatial relationship between the
    regions that are coded in each slice
  • Arbitrary slice ordering (ASO)
  • Sending and receiving the slices of the picture
    in any order relative to each other
  • First found in an optional part of H.263
  • Can improve end-to-end delay in real-time
    applications, particularly when used on networks
    having out-of-order delivery behavior

19
Design feature highlights (15): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • Redundant pictures
  • Enhances robustness to data loss
  • A new ability to allow an encoder to send
    redundant representations of regions of pictures

20
Design feature highlights (15): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • Data Partitioning
  • Allows the syntax of each slice to be separated
    into up to three different partitions for
    transmission, depending on a categorization of
    syntax elements
  • This part of the design builds further on a path
    taken in MPEG-4 Visual and in an optional part of
    H.263.
  • The design is simplified by having a single
    syntax with partitioning of that same syntax
    controlled by a specified categorization of
    syntax elements.

21
Design feature highlights (16): robustness to
data errors/losses and flexibility for operation
over a variety of network environments
  • SP/SI synchronization/switching pictures
  • A new feature consisting of picture types that
    allow exact synchronization of the decoding
    process of some decoders with an ongoing video
    stream produced by other decoders without
    penalizing all decoders with the loss of
    efficiency resulting from sending an I picture
  • Enables switching a decoder between different
    data rates and recovery from data losses or
    errors, as well as trick modes such as
    fast-forward, fast-reverse, etc.

22
Coded Video Sequences
  • A coded video sequence consists of a series of
    access units that are sequential in the NAL unit
    stream and use only one sequence parameter set.
  • Can be decoded independently
  • Starts with an instantaneous decoding refresh
    (IDR) access unit, which must be intra coded.
  • A NAL unit stream may contain one or more coded
    video sequences.

23
VCL (Video Coding Layer)
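[Figure: block diagram of the VCL encoder with its
embedded decoder. Labeled blocks: input video (16×16
macroblocks), DCT, Q, VLC, output bitstream, IQ, IDCT,
intra prediction, intra/inter switch, motion
compensation, motion estimation, de-blocking filter,
frame memory, clipping, and output video. Video uses
the YCbCr color space with 4:2:0 sampling.]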
24
Pictures, Frames, and Fields
[Figure: a progressive frame compared with an interlaced
frame (top field first), whose top field and bottom
field are separated in time by Δt.]
25
Slices and Slice Groups (1)
[Figure: subdivision of a picture into slices (Slice 0,
Slice 1, Slice 2) when not using flexible macroblock
ordering (FMO).]
26
Slices and Slice Groups (2)
[Figure: subdivision of a QCIF frame into slice groups
(Slice Group 0, Slice Group 1, Slice Group 2) utilizing
FMO.]
27
Slice coding types
  • I slice
  • P slice
  • B slice
  • SP slice
  • A switching P slice
  • Efficient switching between different pre-coded
    pictures becomes possible.
  • SI slice
  • A switching I slice
  • Allows an exact match of a macroblock in an SP
    slice for random access and error recovery
    purposes.

28
Adaptive Frame/Field Coding Operation
  • Three modes can be chosen adaptively for each
    frame in a sequence.
  • Frame mode
  • Field mode
  • Frame mode with frame/field-coded macroblock pairs
  • For a frame that consists of mixed moving and
    nonmoving regions
  • The frame/field encoding decision can be made for
    each vertical pair of macroblocks (a 16×32 luma
    region) in a frame.
  • This allows coding the nonmoving regions in frame
    mode and the moving regions in field mode.
  • This is macroblock-adaptive frame/field (MBAFF)
    coding
  • Picture-adaptive frame/field (PAFF) coding was
    reported to save 16% to 20% in bit rate over
    frame-only coding for ITU-R 601 sequences such as
    Canoa, Rugby, etc.
29
Macroblock-adaptive frame/field (MBAFF)
[Figure: a pair of macroblocks in frame mode compared
with top/bottom macroblocks in field mode.]
30
PAFF vs. MBAFF
  • The main idea of MBAFF is to preserve as much
    spatial consistency as possible.
  • In MBAFF, one field cannot use the macroblocks in
    the other field of the same frame as a reference
    for motion prediction.
  • PAFF coding can be more efficient than MBAFF
    coding in the case of rapid global motion, scene
    change, or intra picture refresh.
  • MBAFF was reported to reduce bit rates by 14% to
    16% over PAFF for ITU-R 601 sequences (Mobile and
    Calendar, MPEG-4 World News)

31
Intra-Frame Prediction (1)
  • Intra_4×4
  • Well suited for coding of parts of a picture with
    significant detail.
  • Intra_16×16 together with chroma prediction
  • More suited for coding very smooth areas of a
    picture.
  • 4 prediction modes
  • I_PCM
  • Bypasses prediction and transform coding and
    sends the values of the encoded samples directly

32
Intra-Frame Prediction(2)
  • Intra_16×16
  • Vertical prediction
  • Horizontal prediction
  • DC prediction
  • Plane prediction
  • Works very well in areas of gently changing
    luminance.
  • Chrominance signals
  • 8×8 blocks
  • Very smooth in most cases.
  • Use the same modes as in Intra_16×16.
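The first three modes listed above can be sketched as follows. This is a minimal illustration that assumes all neighboring samples are available. The function names are hypothetical, and plane prediction is left out because its gradient fit is longer than a short example allows.

```python
def intra_vertical(top, left):
    """Every row repeats the reconstructed samples directly above the block."""
    return [list(top) for _ in range(len(top))]


def intra_horizontal(top, left):
    """Every column repeats the reconstructed samples directly left of the block."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]


def intra_dc(top, left):
    """The whole block takes the rounded mean of the top and left neighbors."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


# Hypothetical 16x16 luma neighbors: a bright row above, a darker column to the left.
top_row = [100] * 16
left_col = [50] * 16
dc_block = intra_dc(top_row, left_col)
```

The same functions apply to the 8×8 chroma blocks by passing 8 neighbors per side, which reflects the slide's remark that chroma uses the same modes.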

33
Intra-Frame Prediction (3)
  • In H.263 and MPEG-4 Visual
  • Intra prediction is conducted in the transform
    domain
  • In H.264/AVC
  • Intra prediction is always conducted in the
    spatial domain

34
Intra-Frame Prediction (3)
35
Intra-Frame Prediction (4)
Intra prediction across slice boundaries is not allowed.
36
Inter-Frame Prediction in P slices (1)
Segmentations of the macroblock
[Figure: macroblock partitions for motion compensation.
MB types: 16×16, 16×8, 8×16, and 8×8. 8×8 types: 8×8,
8×4, 4×8, and 4×4. P_Skip is also shown. Source:
www.vcodex.com, "H.264 / MPEG-4 Part 10: Inter
Prediction".]
37
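Inter-Frame Prediction in P slices (2): the
accuracy of motion compensation
[Figure: integer sample positions (upper-case labels A
to U) and fractional sample positions (lower-case labels
a to s and aa to hh) used for luma interpolation.]
  • Half-sample positions, 6-tap filter:
    $b_1 = E - 5F + 20G + 20H - 5I + J$,
    $h_1 = A - 5C + 20G + 20M - 5R + T$
  • $b = (b_1 + 16) \gg 5$, $h = (h_1 + 16) \gg 5$,
    clipped to 0-255
  • Center half-sample position:
    $j_1 = cc - 5dd + 20h_1 + 20m_1 - 5ee + ff$,
    $j = (j_1 + 512) \gg 10$, clipped to 0-255
  • Quarter-sample positions, by averaging:
    $a = (G + b + 1) \gg 1$, $e = (b + h + 1) \gg 1$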
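The 6-tap filter and the quarter-sample averaging on this slide translate directly into integer arithmetic. The sketch below covers the one-dimensional cases only (positions such as b, h, a, and e). The helper names are chosen for illustration and are not taken from the slides.

```python
def clip255(value):
    """Clip an intermediate result to the 8-bit sample range 0-255."""
    return max(0, min(255, value))


def six_tap(p0, p1, p2, p3, p4, p5):
    """Unscaled 6-tap filter (1, -5, 20, 20, -5, 1) over six integer samples."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_sample(p0, p1, p2, p3, p4, p5):
    """Half-sample value between p2 and p3: round, divide by 32, clip."""
    return clip255((six_tap(p0, p1, p2, p3, p4, p5) + 16) >> 5)


def quarter_sample(x, y):
    """Quarter-sample value: rounded average of the two nearest samples."""
    return (x + y + 1) >> 1


# Hypothetical row of samples E..J crossing a sharp edge.
E, F, G, H, I, J = 0, 0, 0, 255, 255, 255
b = half_sample(E, F, G, H, I, J)
a = quarter_sample(G, b)
```

The filter taps sum to 32, so a flat region passes through unchanged, and the clipping step absorbs the overshoot that the negative taps produce near strong edges.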
38
Inter-Frame Prediction in P slices (3)
Multiframe motion-compensated prediction
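[Figure: blocks in the current picture are predicted
from 4 prior decoded pictures used as references; the
example references are labeled Δ = 1, Δ = 2, and Δ = 4.]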
39
Inter-Frame Prediction in B slices
  • Other pictures can reference pictures containing
    B slices
  • Weighted averaging of two distinct
    motion-compensated prediction values
  • Utilizing two distinct lists of reference
    pictures (list0, list1)
  • 4 prediction types
  • list0, list1, bi-predictive, and direct
    prediction, plus the B_Skip macroblock mode
  • For each partition, the prediction type can be
    chosen separately.

40
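Transform, Scaling, and Quantization (1)
  • 4×4 and 2×2 DCT
  • Integer transform matrix

[Figure: transmission order of the blocks in a
macroblock: -1, 0, 1, ..., 24, 25. Block -1 carries the
luma DC coefficients in Intra_16×16 mode, blocks 0 to 15
are the 4×4 luma (Y) blocks, blocks 16 and 17 carry the
2×2 chroma DC coefficients (Cb, Cr), and blocks 18 to 25
are the 4×4 chroma blocks. The figure also labels the
transform matrices H1, H2, and H3.]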
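The slide mentions an integer transform matrix without reproducing it. The sketch below assumes the 4×4 forward core transform matrix commonly documented for H.264/AVC and applies it as Y = C·X·Cᵀ. The scaling that the standard folds into quantization is omitted, and the helper names are illustrative.

```python
# Assumed 4x4 forward core transform matrix (integer approximation of the DCT).
CORE_TRANSFORM = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two small integer matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Apply Y = C * X * C^T using only integer arithmetic."""
    return matmul(matmul(CORE_TRANSFORM, block), transpose(CORE_TRANSFORM))


# A flat residual block puts all of its energy into the DC coefficient.
flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)
```

Because every entry is a small integer, the transform needs only additions and shifts, which is what makes an exactly matching inverse possible across decoders.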
41
Transform, Scaling, and Quantization (2): repeated
transforms
  • Intra_16×16 and chroma intra modes are intended
    for coding smooth areas
  • The DC coefficients undergo a second transform,
    with the result that we have transform
    coefficients covering the whole macroblock

[Figure: a 2×2 array of chroma DC coefficients numbered
0 to 3 and indexed 00, 01, 10, 11. The indices
correspond to the indices of the 2×2 inverse Hadamard
transform. The transform is repeated for chroma blocks.]
42
Transform, Scaling, and Quantization (3)
  • Coefficients are quantized by a scalar quantizer.
    The quantization step size is chosen by a
    so-called quantization parameter (QP) that has
    52 values.
  • An increment of QP by 1 results in an increase of
    the step size of approximately 12%. (The step
    size doubles with each increment of 6 of QP.)
  • A change of step size by approximately 12% also
    means roughly a reduction of bit rate by
    approximately 12%.

$2^{1/6} \approx 1.12$
$Q_{STEP} = 2^{(QP-4)/6}$
$R' \approx (1/1.12) \cdot R$ if $Q'_{STEP} \approx 1.12 \cdot Q_{STEP}$
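The closed-form relation on this slide is easy to check numerically. The sketch below evaluates it for a few QP values. The function name is illustrative, and the range check assumes the 52 QP values are 0 to 51.

```python
def qstep(qp):
    """Quantization step size from the slide's relation QSTEP = 2^((QP - 4) / 6)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to take the 52 values 0..51")
    return 2.0 ** ((qp - 4) / 6)


# One QP step scales the step size by 2^(1/6); six QP steps double it.
ratio_one_step = qstep(11) / qstep(10)
ratio_six_steps = qstep(28) / qstep(22)
```

Here `ratio_one_step` comes out near 1.12 and `ratio_six_steps` near 2, matching the two statements on the slide.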
43
Transform, Scaling, and Quantization (4)
  • Scanning order
  • Zig-zag scan
  • For the 2×2 DC coefficients of the chroma
    component: raster-scan order
  • All inverse transform operations in H.264/AVC can
    be implemented using only additions and
    bit-shifting operations of 16-bit integer values.
    There is no drift problem between encoders and
    decoders.
  • Only 16-bit memory accesses are needed for a good
    implementation of the forward transform and
    quantization process in the encoder

44
Entropy Coding
  • Two methods of entropy coding are supported
  • An exp-Golomb code: a single infinite-extent
    codeword table for all syntax elements except
    the quantized transform coefficients.
  • For quantized transform coefficients:
    Context-Adaptive Variable Length Coding (CAVLC)
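The "single infinite-extent codeword table" can be generated rather than stored. The sketch below shows the unsigned exp-Golomb mapping, using bit strings for readability. The function names are illustrative, and the signed and mapped variants are not covered.

```python
def ue_encode(code_num):
    """Unsigned exp-Golomb codeword: N leading zeros, then code_num + 1 in N + 1 bits."""
    value = code_num + 1
    prefix_len = value.bit_length() - 1
    return "0" * prefix_len + format(value, "b")


def ue_decode(bits):
    """Decode a single unsigned exp-Golomb codeword from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value - 1


# First few rows of the codeword table.
table = [ue_encode(n) for n in range(5)]
```

Small values get short codewords ("1", "010", "011", ...), and the table extends to arbitrarily large values without any stored entries.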

45
CAVLC (1)
  • The number of nonzero quantized coefficients (N)
    and the actual size and position of the
    coefficients are coded separately

Example: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
  • 1) Number of nonzero coefficients (N) and
    trailing 1s (T1s)
  • T1s = 2, N = 5
  • These two values are coded as a combined event.
    One out of 4 VLC tables is used based on the
    number of coefficients in neighboring blocks.

46
CAVLC (2)
Example: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
2) Encoding the value of coefficients. For T1s,
only the sign needs to be coded. Coefficient values
are coded in reverse order: -2, 6, ... A
starting VLC is used for -2, and a new VLC may be
used based on the just coded coefficient. In this
way adaptation is obtained in the use of VLC
tables. Six exp-Golomb code tables are available
for this adaptation.
47
CAVLC (3)
Example: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
3) Sign information. For T1s, this is sent as a
single bit. For the other coefficients, the
sign bit is included in the exp-Golomb codes.
48
CAVLC (4)
Example: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
4) TotalZeroes: the number of zeros between
the last nonzero coefficient of the scan and
its start. Here TotalZeroes = 3. With N = 5,
the number must be in the range 0-11. 15 tables
are available for N in the range 1-15. (If
N = 16 there is no zero coefficient.)
5) RunBefore: in this example it must be
specified how the 3 zeros are distributed.
The number of 0s before the last coefficient is
coded first: 2, in the range 0-3, so a suitable VLC
is used. Next comes 1, in the range 0-1.
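The quantities that CAVLC codes for the example block can be derived mechanically. The sketch below computes them but does not produce any codewords, since the VLC tables are not given in the slides. The function name and dictionary keys are illustrative, and the cap of three trailing ones is taken from the H.264/AVC design rather than from the slides.

```python
def cavlc_symbols(coeffs):
    """Derive N, T1s, levels, TotalZeroes and RunBefore for one scanned block."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    n = len(positions)

    # Trailing ones: consecutive +/-1 values at the end of the scan (at most 3).
    t1s = 0
    for i in reversed(positions):
        if abs(coeffs[i]) != 1 or t1s == 3:
            break
        t1s += 1

    reversed_values = [coeffs[i] for i in reversed(positions)]
    t1_signs = ["-" if v < 0 else "+" for v in reversed_values[:t1s]]
    levels = reversed_values[t1s:]  # remaining values, in reverse scan order

    # Zeros located before the last nonzero coefficient.
    total_zeros = positions[-1] + 1 - n if positions else 0

    # Zeros directly preceding each coefficient, in reverse order,
    # until every zero has been accounted for.
    run_before = []
    zeros_left = total_zeros
    for k in range(n - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        run_before.append(run)
        zeros_left -= run

    return {"N": n, "T1s": t1s, "T1_signs": t1_signs, "levels": levels,
            "TotalZeroes": total_zeros, "RunBefore": run_before}


example = [7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = cavlc_symbols(example)
```

For the slide's example this yields N = 5, T1s = 2, levels -2, 6, 7 in reverse order, TotalZeroes = 3, and RunBefore values 2 and then 1.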
49
CAVLC vs CABAC
  • The efficiency of entropy coding can be improved
    further if the Context-Adaptive Binary Arithmetic
    Coding (CABAC) is used.
  • Compared to CAVLC, CABAC typically provides a
    reduction in bit rate of 5% to 15%.
  • The highest gains are typically obtained when
    coding interlaced TV signals.

50
CABAC
51
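In-Loop Deblocking filter
  • Apply the deblocking filter to p0 and q0 if each
    of the following conditions is satisfied:
  • $|p_0 - q_0| < \alpha(QP)$
  • $|p_1 - p_0| < \beta(QP)$
  • $|q_1 - q_0| < \beta(QP)$
  • where $\beta(QP) < \alpha(QP)$
  • Filter p1 or q1 if the corresponding condition
    holds: $|p_2 - p_0| < \beta(QP)$ or
    $|q_2 - q_0| < \beta(QP)$

[Figure: samples p2, p1, p0 and q0, q1, q2 on either
side of a 4×4 block edge.]
The filter typically reduces the bit rate by 5% to 10%.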
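The filtering decision on this slide is a set of threshold tests on samples either side of the edge. The sketch below implements only that decision, not the filter itself. The function names are illustrative, and the α and β values in the example are made-up numbers standing in for the QP-dependent tables, which the slides do not list.

```python
def should_filter_p0_q0(p1, p0, q0, q1, alpha, beta):
    """True when the step across the edge looks like a blocking artifact."""
    return (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


def should_filter_p1(p2, p0, beta):
    """Extra condition for also filtering p1."""
    return abs(p2 - p0) < beta


def should_filter_q1(q2, q0, beta):
    """Extra condition for also filtering q1."""
    return abs(q2 - q0) < beta


# Hypothetical thresholds for some QP, with beta smaller than alpha.
ALPHA, BETA = 10, 4

# A small step between two flat regions: likely an artifact, so filter it.
artifact = should_filter_p0_q0(p1=60, p0=60, q0=64, q1=64, alpha=ALPHA, beta=BETA)

# A large step: likely a real edge in the picture, so leave it alone.
real_edge = should_filter_p0_q0(p1=60, p0=60, q0=120, q1=120, alpha=ALPHA, beta=BETA)
```

The thresholds keep genuine picture edges sharp while smoothing small discontinuities that quantization introduces at block boundaries.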
52
Hypothetical Reference Decoder
  • In H.264/AVC, the HRD specifies operation of two
    buffers
  • The coded picture buffer (CPB)
  • Modeling the arrival and removal time of the
    coded bits.
  • The decoded picture buffer (DPB)
  • Similar in spirit to what MPEG-2 had, but more
    flexible in supporting the sending of video at a
    variety of bit rates without excessive delay.

53
Hypothetical Reference Decoder
  • The H.264 hypothetical reference decoder (HRD)
    guarantees that the buffers never overflow or
    underflow
  • Rate allocation: allocate proper bits to each
    coding unit according to the buffer status
  • Quantization parameter adjustment: how to adjust
    the encoder parameters to properly encode each
    unit with the allocated bits
  • Find the relation between the rate and the
    quantization parameter

54
The Relation between QSTEP and QP
  • In H.264/AVC, the relation between $Q_{STEP}$ and
    QP is
  • $Q_{STEP} = 2^{(QP-4)/6}$

55
The Relation between PSNR and the quantization
parameter QP
  • The Relation between PSNR and the quantization
    parameter QP is
  • where l and b are the constants.

56
Profiles and Levels
  • Baseline, Main, and Extended
  • Baseline supports all features in H.264/AVC
    except the following two feature sets:
  • Set 1: B slices, weighted prediction, CABAC,
    field coding, and picture or macroblock adaptive
    switching between frame and field coding.
  • Set 2: SP/SI slices and slice data partitioning.

57
H.264/AVC Profiles
58
Structure of H.264/AVC video encoder
[Figure: H.264/AVC conceptual layers, repeated from
slide 4: VCL encoder and decoder, VCL-NAL interface,
NAL encoder and decoder with their interfaces, and the
transport layer mappings over wired and wireless
networks.]
59
NAL (Network Abstraction Layer)
  • Designed to provide "network friendliness"
  • Facilitates the ability to map H.264/AVC VCL data
    to transport layers such as:
  • RTP/IP for any kind of real-time wire-line and
    wireless Internet services (conversational and
    streaming)
  • File formats, e.g., ISO MP4 for storage and MMS
  • H.32X for wireline and wireless conversational
    services
  • MPEG-2 systems for broadcasting services, etc.

60
Key concepts of NAL
  • NAL Units
  • Byte stream and Packet format uses of NAL units
  • Parameter sets
  • Access units

61
NAL units
  • Each NAL unit contains an integer number of
    bytes: a 1-byte header followed by the payload.
  • The payload is interleaved as necessary with
    emulation prevention bytes, which are bytes
    inserted with a specific value to prevent a
    particular pattern of data called a start code
    prefix from being accidentally generated inside
    the payload.
  • The NAL unit structure definition specifies a
    generic format for use in both packet-oriented
    and bitstream-oriented transport systems. A
    series of NAL units generated by an encoder is
    referred to as a NAL unit stream.
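The 1-byte header can be unpacked with a few shifts and masks. The field layout below (1 + 2 + 5 bits) and the field names come from the H.264/AVC syntax rather than from these slides, so treat them as an assumption of this sketch. The function name is illustrative.

```python
def parse_nal_header(header_byte):
    """Split the 1-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,  # 1 bit, expected to be 0
        "nal_ref_idc": (header_byte >> 5) & 0x03,         # 2 bits, reference importance
        "nal_unit_type": header_byte & 0x1F,              # 5 bits, payload type
    }


# 0x65 is a header byte often seen on IDR slice NAL units.
header = parse_nal_header(0x65)
```

The `nal_unit_type` field is what tells a receiver whether the payload is VCL data or non-VCL data such as a parameter set.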
62
NAL units in byte-stream format use
  • Systems such as H.320 and MPEG-2/H.222.0 require
    delivery of the entire or partial NAL unit
    stream as an ordered stream of bytes or bits.
  • Each NAL unit is prefixed by a specific pattern
    of three bytes called a start code prefix.
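Start code prefixes and emulation prevention work together, which a short sketch makes concrete. The byte values used here (prefix 0x000001, prevention byte 0x03 inserted after two zero bytes when the next byte is 0x03 or less) come from the H.264/AVC byte stream format rather than from the slides. The function names and the example payloads are illustrative.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def add_emulation_prevention(payload):
    """Insert 0x03 so that the payload never contains a start code prefix."""
    out = bytearray()
    zero_run = 0
    for byte in payload:
        if zero_run >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zero_run = 0
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0x00 else 0
    return bytes(out)


def to_byte_stream(nal_units):
    """Concatenate (header_byte, payload) pairs, each behind a start code prefix."""
    stream = bytearray()
    for header_byte, payload in nal_units:
        stream += START_CODE_PREFIX
        stream.append(header_byte)
        stream += add_emulation_prevention(payload)
    return bytes(stream)


# Two hypothetical NAL units whose raw payloads would emulate a start code.
stream = to_byte_stream([(0x67, b"\x42\x00\x00\x01"), (0x65, b"\x00\x00\x00\x00")])
```

After escaping, the only places where the three-byte prefix appears in `stream` are the two NAL unit boundaries, so a receiver can split the stream by scanning for it.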
63
NAL units in packet-transport system use
  • Internet protocol/RTP systems
  • The inclusion of start code prefixes in the data
    would be a waste of data carrying capacity, so
    instead the NAL units can be carried in data
    packets without start code prefixes.

64
VCL and non-VCL NAL units
  • VCL NAL units
  • The data that represents the values of the
    samples in the video pictures
  • Non-VCL NAL units
  • Any associated additional information such as
    parameter sets (important header data that can
    apply to a large number of VCL NAL units) and
    supplemental enhancement information (timing
    information and other supplemental data that may
    enhance usability of the decoded video signal but
    are not necessary for decoding the values of the
    samples in the video pictures).

65
Parameter Sets (1)
  • A parameter set is supposed to contain
    information that is expected to rarely change and
    that is used in decoding a large number of VCL
    NAL units.

66
Parameter Sets (2)
  • Two types of parameter sets
  • Sequence parameter sets
  • Apply to a series of consecutive coded video
    pictures called a coded video sequence
  • Picture parameter sets
  • Apply to the decoding of one or more individual
    pictures within a coded video sequence.

67
Parameter Sets (3) The Structure
[Figure: each VCL NAL unit carries an identifier that
refers to a picture parameter set, and each picture
parameter set carries an identifier that refers to a
sequence parameter set. Both kinds of parameter set
travel in non-VCL NAL units.]
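The two levels of indirection in this structure can be sketched as table lookups. The identifier names `pic_parameter_set_id` and `seq_parameter_set_id` follow H.264/AVC syntax element names. The other fields, the class names, and the lookup function are hypothetical and chosen for illustration, with example values borrowed from the later slide on parameter set exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceParameterSet:
    seq_parameter_set_id: int
    video_format: str  # illustrative field


@dataclass(frozen=True)
class PictureParameterSet:
    pic_parameter_set_id: int
    seq_parameter_set_id: int  # points at the sequence parameter set
    entropy_coder: str  # illustrative field


def resolve_parameter_sets(slice_pps_id, pps_table, sps_table):
    """Follow slice -> picture parameter set -> sequence parameter set."""
    pps = pps_table[slice_pps_id]
    sps = sps_table[pps.seq_parameter_set_id]
    return sps, pps


# Parameter sets received earlier, in-band or out-of-band.
sps_table = {0: SequenceParameterSet(0, "PAL")}
pps_table = {3: PictureParameterSet(3, 0, "CABAC")}

# A slice header only needs to carry the small identifier 3.
sps, pps = resolve_parameter_sets(3, pps_table, sps_table)
```

Because slices carry only an identifier, the rarely changing header data does not have to be repeated in every slice and can be delivered over a more reliable channel.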
68
Parameter Sets (4) Transmission
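[Figure: parameter set transmission. In-band: non-VCL
NAL units travel in the same channel as the VCL NAL
units. Out-of-band: non-VCL NAL units travel separately
from the VCL NAL units.]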
69
Parameter set use with reliable out-of-band
parameter set exchange
[Figure: the H.264/AVC encoder and the H.264/AVC decoder
each hold parameter sets 1, 2, and 3, kept in step by a
reliable parameter set exchange. A NAL unit with VCL
data encoded with PS 3 carries the address in its slice
header.]
  • Parameter Set 3
  • Video format: PAL
  • Entropy code: CABAC

70
Access Units
  • A set of NAL units in a specified form is
    referred to as an access unit.

[Figure: structure of an access unit, from start to end:
access unit delimiter, supplemental enhancement
information (SEI), primary coded picture (VCL NAL
units: slices or slice data partitions), redundant
coded picture, end of sequence, end of stream.]
71
Comparisons (1)
72
Comparisons (2)
73
Comparisons (3)
74
Comparisons (4)
75
New Features in H.264
  • Multi-mode, multi-reference MC
  • Motion vector can point out of image border
  • 1/4-, 1/8-pixel motion vector precision
  • B-frame prediction weighting
  • 4×4 integer transform
  • Multi-mode intra-prediction
  • In-loop de-blocking filter
  • UVLC (Universal Variable Length Coding)
  • NAL (Network Abstraction Layer)
  • SP-slices

76
MPEG-4: H.263 + Additions + Variable Shape Coding
  • Goal: support for interactive multimedia
  • Visual Object (VO), Audio Object (AO), and AVO
  • 18 video coding profiles
  • Roughly follows H.263 design and adds all prior
    features and (most important) shape coding
  • Includes zero-tree wavelet coding of still
    textured pictures, segmented coding of shapes,
    coding of synthetic content
  • 2D and 3D mesh coding, face animation modeling
  • 10-bit and 12-bit video
  • Contains 9 parts. Part 10 is H.264/AVC

77
SP-Slices
  • Efficiently switching between two bitstreams
  • Provides VCR-like functions

78
B-frame Prediction Weighting
  • Playback order: I0 B1 B2 B3 P4 B5 B6 ...
  • Bitstream order: I0 P4 B1 B3 B2 P8 B5 ...