Title: Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
1Overview of the Scalable Video Coding Extension
of the H.264/AVC Standard
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand
- CSVT, Sept. 2007
2Outline
- Introduction
- Problems
- Definition
- Functionality
- Goal
- Competition
- Applications
- Targets
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
3Introduction - problem
- Non-Scalable Video Streaming
- Multiple video streams are needed for heterogeneous clients
[Figure: non-scalable streaming with separate streams at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]
4Introduction - definition
- Scalable video stream
- Scalability
- Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
[Figure: a scalable stream made of sub-streams 1, 2, ..., n; reconstruction from a subset of sub-streams k1, k2, ..., ki gives a quality between low and high]
5Introduction - functionality
- Functionality of SVC
- Graceful degradation when the right parts of the bit-stream are lost
- Bit-rate adaptation to match the channel throughput
- Format adaptation for backwards compatible extension
- Power adaptation for a trade-off between runtime and quality
6Introduction - goal
- Goal of SVC
- Scalability mode
- Fidelity reduction (SNR scalability)
- Picture size reduction (spatial scalability)
- Frame rate reduction (temporal scalability)
- Sharpness reduction (frequency scalability)
- Selection of content (ROI or object-based scalability)
[Figure: the quality of sub-streams k1, k2, ..., ki shown against an H.264/AVC bit-stream]
7Introduction - competition
- SVC is an old research topic (> 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.
- Rarely used because of
- The characteristics of traditional video transmission systems
- Significant loss of coding efficiency and large increase in decoder complexity
- Competition
- Simulcast
- Transcoding
8Introduction - applications
- Applications
- Heterogeneous clients
- Unequal protection
- Surveillance
- Problems
- Increased decoder complexity
- Decreased coding efficiency
- Temporal scalability is more often supported
than spatial and quality scalability.
9Introduction - targets
- Targets
- Little decrease in coding efficiency
- Little increase in decoding complexity
- Support of temporal, spatial, and quality scalability
- A backward compatible base layer
- Simple bit-stream adaptations after encoding
10History of SVC
- October 2003: MPEG issues a call for proposals on Scalable Video Coding
- 12 wavelet-based
- 2 extensions of H.264/AVC
- October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264 extension)
- October 2004: HHI proposal adopted as starting point (due to reduced encoder and decoder complexity and improvements in coding efficiency)
- January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an Amendment of H.264/AVC
- Spring 2007: Finalization
11Structure of SVC
[Figure: SVC structure with two layers. The input and its spatially decimated version each pass through temporal scalable coding, prediction, base layer coding, and SNR scalable coding; the layer outputs are multiplexed.]
12Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Hierarchical prediction structure
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
13Temporal Scalability
- Hierarchical prediction structures
[Figure, three panels: hierarchical B pictures (pictures 0 to 16, grouped into GOPs); non-dyadic hierarchical prediction (pictures 0 to 18); hierarchical prediction with zero delay (pictures 0 to 16)]
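The dyadic hierarchical B structure can be illustrated with a minimal sketch. It assumes a GOP size that is a power of two and key pictures at multiples of the GOP size; the function names `temporal_level` and `extract_temporal_substream` are hypothetical and not part of any SVC software.

```python
def temporal_level(picture_index: int, gop_size: int) -> int:
    """Temporal level of a picture in a dyadic hierarchy (0 = key picture)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    max_level = gop_size.bit_length() - 1      # log2(gop_size)
    position = picture_index % gop_size
    if position == 0:
        return 0                               # key picture
    # Trailing zero bits of the position tell how coarse the picture is.
    trailing_zeros = (position & -position).bit_length() - 1
    return max_level - trailing_zeros


def extract_temporal_substream(num_pictures: int, gop_size: int, max_level: int):
    """Indices of the pictures that remain after dropping levels above max_level."""
    return [p for p in range(num_pictures)
            if temporal_level(p, gop_size) <= max_level]


if __name__ == "__main__":
    print([temporal_level(p, 8) for p in range(9)])
    print(extract_temporal_substream(17, 8, 1))
```

Dropping all pictures above a given level leaves a decodable sequence at a reduced frame rate, because in this structure a picture only references pictures of the same or a lower level.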
14Temporal Scalability
- Combination with multiple reference pictures
- Arbitrary modification of the prediction structure
- Issue of quantization
- Lower layers with higher fidelity → smaller QPs are used in lower layers
- Propagation of quantization error → larger QPs are used in higher layers
15Temporal Scalability
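- The flow of quantization noise from the top to the bottom of the prediction pyramid explains why the quality has to decrease from level to level
- The quantization step size should be increased by a factor of $(1.5)^{1/2}$ at each next lower level of the hierarchy
[Figure: quantization noise flow through the prediction pyramid with prediction weights 1, 0.5, and 0.25, annotated with the noise factors $1 = 1.5^0$, $1.5^1$, and $1.5^2$ for successive levels]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf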
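The step-size rule on this slide can be sketched numerically. The sketch assumes that quantization noise power grows with the square of the step size (the usual high-rate approximation for a uniform quantizer); the function names are hypothetical.

```python
import math

# Factor by which the step size grows per hierarchy level, as stated on the slide.
STEP_FACTOR_PER_LEVEL = math.sqrt(1.5)


def step_size_for_level(key_picture_step: float, level: int) -> float:
    """Quantization step size at a hierarchy level (level 0 = key pictures)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return key_picture_step * STEP_FACTOR_PER_LEVEL ** level


def relative_noise_power(level: int) -> float:
    """Noise power relative to level 0, assuming noise power ~ step size squared."""
    return step_size_for_level(1.0, level) ** 2


if __name__ == "__main__":
    for level in range(3):
        print(level, round(step_size_for_level(1.0, level), 4),
              round(relative_noise_power(level), 4))
```

Under that assumption the relative noise power per level is 1.5 raised to the level, which matches the factors 1.5^0, 1.5^1, and 1.5^2 in the figure.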
16Temporal Scalability
Video coding experiment with H.264/MPEG4-AVC: Foreman, CIF 30 Hz at 1320 kbps. Performance as a function of N. Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5
[Figure, labeled "Temporal scalability", picture types used for each N: N = 1 uses I and P; N = 2 uses I, P, and B0; N = 4 uses I, P, B0, and B1; N = 8 uses I, P, B0, B1, and B2]
This slide is copied from JVT-W132-Talk
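The cascaded QP assignment of this experiment can be written as a small helper. This is a sketch that reads the slide's rule as QP(Bk) = QP(P) + 3 + k; the function name is hypothetical, and the clamp to 51 assumes the usual 8-bit H.264/AVC QP range.

```python
MAX_QP = 51  # upper end of the H.264/AVC QP range for 8-bit video


def cascaded_qp(qp_p: int, picture_type: str) -> int:
    """QP for a picture type 'P', 'B0', 'B1', ... given the QP of P pictures."""
    if picture_type == "P":
        return qp_p
    if picture_type.startswith("B") and picture_type[1:].isdigit():
        hierarchy_index = int(picture_type[1:])
        # Each deeper B level is quantized one QP step more coarsely.
        return min(MAX_QP, qp_p + 3 + hierarchy_index)
    raise ValueError(f"unsupported picture type: {picture_type!r}")


if __name__ == "__main__":
    for ptype in ("P", "B0", "B1", "B2"):
        print(ptype, cascaded_qp(30, ptype))
```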
17Temporal Scalability
- When different prediction references are available at the encoder and the decoder, an additional penalty occurs, which is relatively small for hierarchical B pictures with optimum quantization
- It can only be avoided by using closed-loop encoding with the same references
[Figure: noise flow through the prediction pyramid with prediction weights 1, 0.5, and 0.25 and per-level factors 1, $1/(1.5)^{1/2}$, and $1/1.5$]
$1^2 + \frac{0.5^2}{1.5} + \frac{0.5^2 + 0.25^2}{2.25} + \frac{0.25^2}{2.25} = 1 + 0.167 + 0.1388 + 0.0277$
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
18Temporal Scalability
- Coding efficiency of hierarchical prediction
- JSVM11, High profile with CABAC
- Only one reference frame
[Plot: results for CIF sequences]
19Temporal Scalability
- Compared with IPPP (with and without delay constraint)
- Providing temporal scalability usually does not have any negative impact on coding efficiency
20Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Inter layer prediction
- Inter layer motion prediction
- Inter layer residual prediction
- Inter layer intra prediction
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Conclusions
21Spatial Scalability
[Figure: SVC encoder with three spatial layers connected by spatial decimation. Each layer applies hierarchical MCP and intra-prediction (H.264/AVC MCP and intra-prediction in the lowest layer) and codes texture and motion with base layer coding. Inter-layer prediction (intra, motion, residual) links a layer to the one above it. The lowest layer is an H.264/AVC compatible coder that produces the H.264/AVC compatible base layer bit-stream; all layers are multiplexed into the scalable bit-stream.]
22Spatial Scalability
- Similar to MPEG-2, H.263, and MPEG-4
- Arbitrary resolution ratio
- The same coding order in all spatial layers
- Combination with temporal scalability
- Inter-layer prediction
[Figure: spatial layers 0 and 1 with temporal levels 0, 1, and 2]
23Spatial Scalability
- The prediction signals are formed by
- MCP inside the enhancement layer (temporal) (small motion and high spatial detail)
- Up-sampling from the lower layer (spatial)
- Average of the above two predictions (temporal + spatial)
- Inter-layer prediction
- Three kinds of inter-layer prediction
- Inter-layer motion prediction
- Inter-layer residual prediction
- Inter-layer intra prediction
- Base mode MB
- Only the residual is transmitted, with no additional side information
24Spatial Scalability
- Inter-layer motion prediction
- base_mode_flag = 1
- The reference layer is inter-coded
- Data are derived from the reference layer
- MB partitioning
- Reference indices
- MVs
- motion_pred_flag
- 1: MV predictors are obtained from the reference layer
- 0: MV predictors are obtained by conventional spatial prediction
[Figure: an 8x8 block of the reference layer with motion vectors (x1, y1) and (x2, y2) maps to a 16x16 macroblock with motion vectors (2x1, 2y1) and (2x2, 2y2)]
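The scaling shown in the figure can be sketched for the dyadic case, where the enhancement layer has twice the width and height of the reference layer. The type and function names below are hypothetical and only illustrate the idea; they do not reproduce the normative derivation process.

```python
from typing import NamedTuple, Tuple


class MotionVector(NamedTuple):
    x: int
    y: int


def upscale_motion_vector(mv: MotionVector) -> MotionVector:
    """Scale a reference-layer motion vector for a 2x larger enhancement layer."""
    return MotionVector(2 * mv.x, 2 * mv.y)


def upscale_partition(width: int, height: int) -> Tuple[int, int]:
    """Map a reference-layer partition size to the enhancement layer (8x8 -> 16x16)."""
    return 2 * width, 2 * height


def mv_predictor(motion_pred_flag: int,
                 reference_layer_mv: MotionVector,
                 spatial_predictor: MotionVector) -> MotionVector:
    """Pick the MV predictor according to motion_pred_flag."""
    if motion_pred_flag == 1:
        return upscale_motion_vector(reference_layer_mv)
    return spatial_predictor


if __name__ == "__main__":
    print(upscale_motion_vector(MotionVector(3, -5)))
    print(upscale_partition(8, 8))
```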
25Spatial Scalability
- Inter-layer residual prediction
- residual_pred_flag = 1
- Predictor
- Block-wise up-sampling with a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
- Up-sampling is applied on a transform block basis
26Spatial Scalability
- Inter-layer intra prediction
- base_mode_flag = 1
- The reference layer is intra-coded
- Up-sampling from the reference layer
- Luma: one-dimensional 4-tap FIR filter
- Chroma: bi-linear filter
27Spatial Scalability
- Past spatial scalable video
- Inter-layer intra prediction requires complete decoding of the base layer.
- Multiple motion compensation and deblocking filter operations are needed.
- Full decoding + inter-layer prediction: complexity > simulcast.
- Single-loop decoding
- Inter-layer intra prediction is restricted to MBs for which the co-located base layer signal is intra-coded
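The single-loop restriction can be expressed as a simple check on the coding mode of the co-located base layer macroblock. This is a sketch with hypothetical names; macroblock modes are modeled as plain strings rather than real bit-stream syntax.

```python
from typing import List, Sequence


def can_use_inter_layer_intra(base_mb_modes: Sequence[str], mb_index: int) -> bool:
    """Inter-layer intra prediction is allowed only over intra-coded base MBs."""
    return base_mb_modes[mb_index] == "intra"


def base_mbs_to_reconstruct(base_mb_modes: Sequence[str]) -> List[int]:
    """Base layer MBs whose samples may be needed under single-loop decoding.

    Inter-coded base MBs are never reconstructed, so no motion compensation
    loop has to run in the base layer.
    """
    return [i for i, mode in enumerate(base_mb_modes) if mode == "intra"]


if __name__ == "__main__":
    modes = ["inter", "intra", "inter", "intra"]
    print(can_use_inter_layer_intra(modes, 1))
    print(base_mbs_to_reconstruct(modes))
```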
28Spatial Scalability
- Single-loop vs. multi-loop decoding
[Figure labels: Inter, I, B, P]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf
29Spatial Scalability
- Generalized spatial scalability in SVC
- Arbitrary ratio
- Only restriction: neither the horizontal nor the vertical resolution can decrease from one layer to the next.
- Cropping
- Containing new regions
- Higher quality of interesting regions
30Spatial Scalability
- Coding efficiency
- Multiple-loop > Single-loop
32Spatial Scalability
- Coding efficiency (IPPP)
- Multi-loop > Single-loop
33Spatial Scalability
- Encoder control (JSVM)
- Base layer
- p0 is optimized for the base layer
- Enhancement layer
- p1 is optimized for the enhancement layer
- Decisions on p1 depend on p0
- Efficient base layer coding but inefficient enhancement layer coding
34Spatial Scalability
- Encoder control (optimization)
- Base layer
- Considering enhancement layer coding
- Eliminating choices of p0 that disadvantage enhancement layer coding
- Enhancement layer
- No change
- Weight w
- w = 0: JSVM encoder control
- w = 1: single-loop encoder control (base layer is not controlled)
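The role of the weight w can be illustrated with a sketch of a weighted Lagrangian decision for the base layer parameters p0. The cost function below is an assumption, since the transcript does not preserve the slide's equations: it blends the base layer cost D0 + λ0·R0 with the enhancement layer cost D1 + λ1·(R0 + R1). All names and numbers are hypothetical.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    d0: float  # base layer distortion for this choice of p0
    r0: float  # base layer rate
    d1: float  # enhancement layer distortion when built on this p0
    r1: float  # enhancement layer rate


def joint_cost(c: Candidate, w: float, lambda0: float, lambda1: float) -> float:
    """Assumed cost: (1 - w) * base layer cost + w * enhancement layer cost."""
    base_cost = c.d0 + lambda0 * c.r0
    enhancement_cost = c.d1 + lambda1 * (c.r0 + c.r1)
    return (1.0 - w) * base_cost + w * enhancement_cost


def choose_base_parameters(candidates: Sequence[Candidate], w: float,
                           lambda0: float = 1.0, lambda1: float = 1.0) -> Candidate:
    """Pick the p0 candidate with the lowest joint cost for a weight in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return min(candidates, key=lambda c: joint_cost(c, w, lambda0, lambda1))


if __name__ == "__main__":
    options = [Candidate("A", 10, 5, 20, 10), Candidate("B", 12, 5, 10, 8)]
    print(choose_base_parameters(options, 0.0).name)
    print(choose_base_parameters(options, 1.0).name)
```

With w = 0 only the base layer cost counts, which corresponds to the JSVM behavior described above; with w = 1 the base layer decision is driven entirely by the enhancement layer.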
35Spatial Scalability
- Coding efficiency of optimal encoder control
- Optimized encoder vs. JSVM encoder (QPE QPB 4)
36Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- CGS
- MGS
- Drift control
- Combined Scalability
- Profiles of SVC
- Conclusions
37Quality Scalability
- Coarse-grain quality scalability (CGS)
- A special case of spatial scalability
- Identical sizes (resolution) for base and enhancement layers
- Smaller quantization step sizes for higher enhancement residual layers
- Designed for only several selected bit-rate points
- Supported bit-rate points = number of layers
- Switching can only occur at IDR access units
38Quality Scalability
- Medium-grain quality scalability (MGS)
- More enhancement layers are supported
- Refinement quality layers of residual
- Key pictures
- Drift control
- Switch can occur at any access units
- MGS = CGS + key pictures + refinement quality layers
39Quality Scalability
- Drift control
- Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
- Trade-off of MCP in quality SVC
- Coding efficiency vs. drift
40Quality Scalability
- MPEG-4 quality scalability with FGS
- Base layer is stored and used for MCP of following pictures
- Drift: drift-free
- Complexity: low
- Efficiency: efficient base layer but inefficient enhancement layer
- Refinement data are not used for MCP
[Figure: base layer and refinement (possibly lost or truncated)]
41Quality Scalability
- MPEG-2 quality scalability (without FGS)
- Only 1 reference picture is stored and used for MCP of following pictures
- Drift: both base layer and enhancement layer
- Frequent intra updates are necessary
- Complexity: low
- Efficiency: efficient enhancement layer but inefficient base layer
[Figure: base layer and refinement (possibly lost or truncated)]
42Quality Scalability
- 2-loop prediction
- Several closed encoder loops run at different bit-rate points in a layered structure
- Drift: enhancement layer
- Complexity: high
- Efficiency: efficient base layer and moderately efficient enhancement layer
[Figure: base layer and refinement (possibly lost or truncated)]
43Quality Scalability
- SVC concepts
- Key picture
- Trade-off between coding efficiency and drift
- MPEG-4 FGS: all pictures are key pictures
- MPEG-2 quality scalability: all pictures are non-key pictures
[Figure: base layer and refinement (possibly lost or truncated)]
44Quality Scalability
- Drift control with hierarchical prediction
- Key pictures
- Base layer is stored and used for the MCP of following pictures
- Other pictures
- Enhancement layer is stored and used for the MCP of following pictures
- GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
[Figure: hierarchical prediction structure with P, B1, and B2 pictures; base layer and refinement (possibly lost or truncated)]
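The key picture rule can be sketched as a per-picture choice of the reconstruction that is stored for motion-compensated prediction. The sketch assumes that key pictures occur at multiples of the GOP size; the function names are hypothetical.

```python
from typing import List, Tuple


def drift_control_plan(num_pictures: int, gop_size: int) -> List[Tuple[int, bool, str]]:
    """For each picture: (index, is_key_picture, reconstruction used for MCP)."""
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    plan = []
    for index in range(num_pictures):
        is_key = index % gop_size == 0
        # Key pictures predict from the base layer reconstruction, so any
        # drift caused by lost refinement data stops at the next key picture.
        plan.append((index, is_key, "base" if is_key else "enhancement"))
    return plan


def pictures_exposed_to_drift(gop_size: int) -> int:
    """Non-key pictures between two consecutive key pictures."""
    return gop_size - 1


if __name__ == "__main__":
    for entry in drift_control_plan(5, 4):
        print(entry)
```

A larger GOP size lets more pictures use the higher-quality enhancement reference, at the price of a longer interval over which drift can persist.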
45Quality Scalability
- Comparisons of drift control
[Figure: the drift control concepts placed between drift-free and drift and between low efficiency and high efficiency]
46Quality Scalability
- Comparisons of coding efficiency
$Q_{\mathrm{step}} = 2^{(QP-4)/6}$
[Plot labels: high dQP, low dQP]
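The relation between QP and quantization step size on this slide, read as Qstep = 2^((QP - 4) / 6), can be checked with a short sketch. The function names are hypothetical; `step_ratio` shows what a QP difference (dQP) between two layers means in terms of step size.

```python
def qstep(qp: float) -> float:
    """Quantization step size for a given QP: 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6)


def step_ratio(dqp: float) -> float:
    """Ratio of step sizes for two layers whose QPs differ by dqp."""
    return 2.0 ** (dqp / 6)


if __name__ == "__main__":
    for qp in (4, 10, 16, 22):
        print(qp, qstep(qp))
    print(step_ratio(6))
```

The step size doubles for every increase of 6 in QP, so a high dQP between base and enhancement layer corresponds to a large gap in fidelity.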
47Quality Scalability
- MGS with key pictures using optimized encoder control
[Plot label: only base layer]
48Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- SVC encoder structure
- Dependence and Quality refinement layers
- Bit-stream format
- Bit-stream switching
- Profiles of SVC
- Conclusions
49Combined Scalability
[Figure labels: dependency layer, temporal decomposition, the same motion/prediction information (shown twice)]
50Combined Scalability
- Dependency and Quality refinement layers
[Figure: the scalable bit-stream consists of dependency layers D = 0, 1, 2, each containing quality refinement layers Q = 0, 1, 2]
51Combined Scalability
[Figure: dependency layers D0 and D1, each with quality layers Q0 and Q1, and temporal levels T0, T1, and T2]
52Combined Scalability
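[Figure: a NAL unit consists of the NAL unit header, the NAL unit header extension, and the NAL unit payload. The fields P, T, D, and Q are marked; bit widths shown in the figure: 1, 1, 1, 1, 1, 3, 2, 3, 3, 6, 2]
- P (priority_id) indicates the importance of a NAL unit
- T (temporal_id) indicates the temporal level
- D (dependency_id) indicates the spatial/CGS layer
- Q (quality_id) indicates the MGS/FGS layer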
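Bit-stream adaptation based on these identifiers can be sketched as a filter over NAL units that have already been parsed. The `NalUnit` type and `extract_substream` function are hypothetical, and the rule is simplified: quality layers are trimmed only in the target dependency layer, while lower dependency layers are kept whole because they may serve as inter-layer references.

```python
from typing import Iterable, List, NamedTuple


class NalUnit(NamedTuple):
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes


def extract_substream(nal_units: Iterable[NalUnit],
                      max_d: int, max_t: int, max_q: int) -> List[NalUnit]:
    """Keep the NAL units needed for the operating point (max_d, max_t, max_q)."""
    kept = []
    for nal in nal_units:
        if nal.temporal_id > max_t or nal.dependency_id > max_d:
            continue  # above the requested frame rate or spatial/CGS layer
        if nal.dependency_id == max_d and nal.quality_id > max_q:
            continue  # refinement beyond the requested quality
        kept.append(nal)
    return kept


if __name__ == "__main__":
    stream = [
        NalUnit(0, 0, 0, 0, b"a"),
        NalUnit(0, 1, 0, 0, b"b"),
        NalUnit(0, 0, 1, 0, b"c"),
        NalUnit(0, 0, 1, 1, b"d"),
    ]
    print([n.payload for n in extract_substream(stream, 1, 0, 0)])
```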
53Combined Scalability
- Bit-stream switching
- Inside a dependency layer
- Switching everywhere
- Outside a dependency layer
- Switching up only at IDR access units
- Switching down everywhere if using multiple-loop
decoding
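The switching rules on this slide can be collected in one predicate. This is a sketch with hypothetical names; the fallback that switching down without multiple-loop decoding is possible at IDR access units is an assumption not stated on the slide.

```python
def can_switch(current_d: int, target_d: int,
               at_idr_access_unit: bool, multi_loop_decoding: bool) -> bool:
    """Whether a decoder may switch from dependency layer current_d to target_d."""
    if target_d == current_d:
        return True  # inside a dependency layer: switching everywhere
    if target_d > current_d:
        return at_idr_access_unit  # switching up only at IDR access units
    # Switching down: everywhere with multiple-loop decoding; otherwise
    # assumed to be possible at IDR access units only.
    return multi_loop_decoding or at_idr_access_unit


if __name__ == "__main__":
    print(can_switch(0, 1, at_idr_access_unit=False, multi_loop_decoding=False))
    print(can_switch(1, 0, at_idr_access_unit=False, multi_loop_decoding=True))
```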
54Outline
- Introduction
- History of SVC
- Structure of SVC
- Temporal Scalability
- Spatial Scalability
- Quality Scalability
- Combined Scalability
- Profiles of SVC
- Scalable Baseline
- Scalable High
- Scalable High Intra
- Conclusions
55Profiles of SVC
- Scalable Baseline
- For conversational and surveillance applications requiring low decoding complexity
- Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
- Temporal and quality scalability: arbitrary
- No interlaced coding tools
- Enhancement layers support B-slices, weighted prediction, CABAC, and 8x8 luma transform
- The base layer conforms to the Baseline profile of H.264/AVC
56Profiles of SVC
- Scalable High
- For broadcast, streaming, and storage
- Spatial, temporal, and quality scalability: arbitrary
- The base layer conforms to the High profile of H.264/AVC
- Scalable High Intra
- Scalable High with IDR pictures only
57Conclusions
- Temporal scalability
- Hierarchical prediction structure
- Spatial and quality scalability
- Inter-layer prediction of intra, motion, and residual information
- Single-loop MC decoding
- Identical size for each spatial layer = CGS
- CGS + key pictures + quality refinement layers = MGS
- Applications
- Power adaptation: decoding only the needed part of the video stream
- Graceful degradation when the right parts are lost
- Format adaptation: backwards compatible extension in mobile TV
- What's next in SVC?
- Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
- Color format scalability (4:2:0 → 4:4:4)
58References
- H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," CSVT, 2007.
- T. Wiegand, "Scalable Video Coding," Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
- T. Wiegand, "Scalable Video Coding," Digital Image Communication, course at Technical University of Berlin, 2006. (Available at http://iphome.hhi.de/wiegand/dic.htm)
- H. Schwarz, D. Marpe, and T. Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," Proc. of ICIP 2005.