Title: Overview of H.264/AVC
Outline
- Abstract
- Applications
- Network Abstraction Layer (NAL)
- Conclusion (I)
- Design feature highlights
- Conclusion (II)
- Video Coding Layer (VCL)
- Profiles and potential applications
- Conclusion (III)
Abstract
- H.264/AVC is the newest video coding standard
- Its main goals have been enhanced compression and the provision of a network-friendly representation, addressing conversational (video telephony) and nonconversational (storage, broadcast, or streaming) applications
- H.264/AVC achieves a significant improvement in rate-distortion efficiency
- The scope of standardization is illustrated below
Applications
- Broadcast over cable, cable modem
- Interactive or serial storage on optical media and DVD
- Conversational services over LAN, modem
- Video-on-demand or streaming services over ISDN, wireless networks
- Multimedia messaging services (MMS) over DSL, mobile networks
- How to handle this variety of applications and networks?
Applications
- To address this need for flexibility and customizability, the H.264/AVC design consists of a VCL and a NAL; the structure of the H.264/AVC encoder is shown below
Applications
- VCL (video coding layer): designed to efficiently represent the video content
- NAL (network abstraction layer): formats the VCL representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media
Network Abstraction Layer
- Provides network friendliness, enabling simple and effective customization of the use of the VCL
- Facilitates mapping H.264/AVC data to transport layers such as
  - RTP/IP for all kinds of real-time Internet services
  - File formats, e.g., ISO MP4, for storage
  - H.32X for conversational services
  - MPEG-2 systems for broadcasting services
- The design of the NAL anticipates a variety of such mappings
Network Abstraction Layer
- Some key concepts of the NAL are NAL units, byte-stream and packet-format uses of NAL units, parameter sets, and access units
- NAL units
  - A packet that contains an integer number of bytes
  - The first byte is a header byte indicating the type of data
  - The remaining bytes contain the payload data
  - Payload data is interleaved as necessary with emulation prevention bytes, preventing a start-code prefix from being generated inside the payload
  - A format is specified for use in both packet- and bitstream-oriented transport systems
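The header-byte split and the emulation prevention mechanism can be sketched as follows (a simplified illustration, not the normative encoder; the helper names are my own):

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,   # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,  # importance for referencing
        "nal_unit_type": first_byte & 0x1F,      # kind of payload
    }

def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert an emulation prevention byte (0x03) after any two zero bytes
    that are followed by a byte <= 0x03, so the start-code prefix 0x000001
    can never appear inside the payload."""
    out, zeros = bytearray(), 0
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, `parse_nal_header(0x67)` reports `nal_unit_type` 7, a sequence parameter set.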
Network Abstraction Layer
- NAL units in byte-stream format use
  - Each NAL unit is prefixed by a unique start code to identify its boundary
  - Some systems require delivery of the NAL unit stream as an ordered stream of bytes (e.g., H.320 and MPEG-2/H.222.0)
- NAL units in packet-transport system use
  - Coded data is carried in packets framed by the system transport protocol
  - NAL units can be carried in data packets without start-code prefixes
  - In such systems, including start-code prefixes in the data would be a waste
Network Abstraction Layer
- VCL and non-VCL NAL units
  - VCL NAL units contain data that represents the values of the samples in the video pictures
  - Non-VCL NAL units contain additional data such as parameter sets and supplemental enhancement information (SEI)
  - Parameter sets: important header data that applies to a large number of VCL NAL units
  - SEI: timing information and other supplemental data that enhance the usability of the decoded video signal but are not necessary for decoding the sample values in the pictures
Network Abstraction Layer
- Parameter sets
  - Contain information that is expected to change rarely and that applies to the decoding of a large number of VCL NAL units
  - Divided into two types
    - Sequence parameter sets, which apply to a series of consecutive coded video pictures
    - Picture parameter sets, which apply to the decoding of one or more individual pictures within a coded video sequence
  - These two mechanisms decouple the transmission of infrequently changing information from the coded sample data
  - Parameter sets can be sent well ahead of the VCL NAL units and repeated to provide robustness against data loss
Network Abstraction Layer
- Parameter sets
  - A small amount of data (an identifier) can be used to refer to a larger amount of information (the parameter set)
  - In some applications, parameter sets may be sent within the channel carrying the VCL NAL units (termed in-band transmission)
Network Abstraction Layer
- Parameter sets
  - In other applications, it can be advantageous to convey the parameter sets out-of-band using a reliable transport mechanism
Network Abstraction Layer
- Access units
  - The format of an access unit is shown below
Network Abstraction Layer
- Access units
  - Contain a set of VCL NAL units that together compose a primary coded picture
  - May be prefixed with an access unit delimiter to aid in locating the start of the access unit
  - SEI contains data such as picture timing information
  - The primary coded picture consists of VCL NAL units containing slices that represent the samples of the video picture
  - Redundant coded pictures are available for use by a decoder in recovering from loss of data
Network Abstraction Layer
- Access units
  - For the last coded picture of a coded video sequence, an end-of-sequence NAL unit is present to indicate the end of the sequence
  - For the last coded picture in the entire NAL unit stream, an end-of-stream NAL unit is present to indicate that the stream is ending
  - Decoders are not required to decode redundant coded pictures if they are present
  - Decoding each access unit results in one decoded picture
Network Abstraction Layer
- Coded video sequences
  - Consist of a series of access units and use only one sequence parameter set
  - Can be decoded independently of other coded video sequences, given the necessary parameter sets
  - An instantaneous decoding refresh (IDR) access unit appears at the beginning and contains an intra picture
  - The presence of an IDR access unit indicates that no subsequent picture will reference pictures prior to its intra picture
Conclusion (I)
- H.264/AVC represents a number of advances in standard video coding technology in terms of flexibility for effective use over a broad variety of network types and application domains
Design feature highlights
- Variable block-size motion compensation with small block sizes
  - Minimum luma block size as small as 4x4
  - The matching chroma block is half the length and width
Design feature highlights
- Quarter-sample-accurate motion compensation
  - Half-sample positions are generated using a 6-tap FIR filter
  - First found in the advanced profile of MPEG-4, but H.264/AVC further reduces the complexity
- Multiple reference picture motion compensation
  - Extends the enhanced reference picture selection technique found in H.263
  - The encoder selects among a large number of pictures decoded and stored in the decoder for prediction
  - The same applies to bi-prediction, which is restricted in MPEG-2
Design feature highlights
- Decoupling of referencing order from display order
  - Prior standards imposed a strict dependency between the ordering of pictures for referencing and for display
  - H.264/AVC allows the encoder to choose the ordering of pictures for referencing and display purposes with a high degree of flexibility
  - This flexibility is constrained only by the total memory capacity
  - Removing the restriction enables removing the extra delay associated with bi-predictive coding
Design feature highlights
- Motion vectors over picture boundaries
  - Motion vectors are allowed to point outside the picture
  - Especially useful for small pictures and camera movement
- Decoupling of picture representation methods from picture referencing capability
  - In prior standards, bi-predictively encoded pictures could not be used as references
  - H.264/AVC gives the encoder more flexibility to use for referencing a picture that is closer to the picture being coded
Design feature highlights
- Weighted prediction
  - Allows the motion-compensated prediction signal to be weighted and offset by specified amounts
  - Improves coding efficiency for scenes containing fades
(Figure: one grid square represents one pixel)
Design feature highlights
- Improved skipped and direct motion inference
  - In prior standards, a skipped area of a predictively coded picture could not represent motion in the scene content, which is detrimental for video with global motion
  - H.264/AVC instead infers motion in skipped areas
  - For bi-predictively coded areas, it further improves on the direct prediction found in prior designs such as H.263 and MPEG-4
Design feature highlights
- Directional spatial prediction for intra coding
  - Extrapolation of the edges of previously decoded parts of the current picture is applied in intra-coded regions of the picture
  - Improves the quality of the prediction signal
  - Allows prediction from neighboring areas that were not intra-coded
Design feature highlights
- In-the-loop deblocking filtering
  - Block-based video coding produces artifacts known as blocking artifacts, originating from both the prediction and residual difference coding stages of the decoding process
  - The resulting improvement in quality is used in inter-picture prediction, improving the ability to predict other pictures
Design feature highlights
- In addition to improved prediction methods, coding efficiency is also enhanced by the following
  - Small block-size transform
    - All major prior video coding standards used a transform block size of 8x8, while the new design is based primarily on 4x4
    - Allows the encoder to represent the signal in a more locally adaptive fashion and reduces artifacts
  - Short word-length transform
    - Arithmetic processing requires only 16-bit rather than 32-bit precision
Design feature highlights
- Hierarchical block transform
  - Extends the effective block size for low-frequency information to an 8x8 array for chroma and a 16x16 array for luma
Design feature highlights
- Exact-match inverse transform
  - Previously, the transform was specified only within an error tolerance bound, due to the impracticality of obtaining an exact match to the ideal inverse transform
  - Each decoder design would produce slightly different decoded video, causing drift between encoder and decoder
- Arithmetic entropy coding
  - Previously found as an optional feature of H.263
  - H.264/AVC uses a powerful context-adaptive binary arithmetic coding (CABAC) method
Design feature highlights
- Context-adaptive entropy coding
  - Both CAVLC (context-adaptive variable-length coding) and CABAC use context-based adaptivity to improve performance
Design feature highlights
- Robustness to data errors/losses and flexibility for operation over a variety of network environments are enabled by the following
  - Parameter set structure
    - Key information is separated for handling in a more flexible and specialized manner
    - Provides for robust and efficient conveyance of header information
  - Flexible slice size
    - The rigid slice structure of MPEG-2 reduces coding efficiency by increasing the quantity of header data and decreasing the effectiveness of prediction
Design feature highlights
- NAL unit syntax structure
  - Each syntax structure in H.264/AVC is placed into a logical data packet called a NAL unit
  - Allows greater customization of the method of carrying the video content in a manner appropriate for each specific network
- Redundant pictures
  - Enhance robustness to data loss
  - Enable a representation of regions of pictures for which the primary representation has been lost
Design feature highlights
- Flexible macroblock ordering (FMO)
  - Partitions a picture into regions called slice groups, with each slice becoming an independently decodable subset of a slice group
  - Can significantly enhance robustness by managing the spatial relationship between the regions coded in each slice
- Arbitrary slice ordering (ASO)
  - Enables sending and receiving the slices of a picture in any order relative to each other, as found in H.263
  - Can improve end-to-end delay in real-time applications, particularly on networks with out-of-order delivery behavior
Design feature highlights
- Data partitioning
  - Allows the syntax of each slice to be separated into up to three different partitions (header data, intra residual data, and inter residual data), depending on a categorization of syntax elements
- SP/SI synchronization/switching pictures
  - Allow exact synchronization of the decoding process of some decoders with an ongoing video stream
  - Enable a decoder to switch between video streams that use different data rates
  - Enable switching between different kinds of video streams and recovery from data loss or errors
Design feature highlights
- SP/SI synchronization/switching pictures (illustrated in the figures)
Conclusion (II)
- H.264/AVC represents a number of advances in standard video coding technology, in terms of both coding efficiency enhancement and flexibility for effective use over a broad variety of network types and application domains
Video Coding Layer
- Pictures, frames, and fields
  - A picture can represent either an entire frame or a single field
  - If the two fields of a frame were captured at different time instants, the frame is referred to as an interlaced frame; otherwise it is referred to as a progressive frame
Video Coding Layer
- YCbCr color space and 4:2:0 sampling
  - Y represents brightness
  - Cb and Cr represent the extent to which the color deviates from gray toward blue and red, respectively
- Division of the picture into macroblocks
- Slices and slice groups
  - Slices are a sequence of macroblocks processed in raster-scan order when not using FMO
  - Some information from other slices may be needed to apply the deblocking filter across slice boundaries
Video Coding Layer
- A picture may be split into one or more slices without FMO, as shown below
- FMO modifies the way pictures are partitioned into slices and MBs by using slice groups
- A slice group is a set of MBs defined by a macroblock-to-slice-group map, which is specified by the picture parameter set and some information from the slice headers
Video Coding Layer
- A slice group can be partitioned into one or more slices, such that a slice is a sequence of MBs within the same slice group, processed in raster-scan order
- By using FMO, a picture can be split into many macroblock scanning patterns, such as those below
Video Coding Layer
- Each slice can be coded using a different type
  - I slice: a slice in which all MBs are coded using intra prediction
  - P slice: in addition to the intra coding types, MBs can be coded using inter prediction with at most one motion-compensated prediction signal per prediction block
  - B slice: in addition to the coding types of a P slice, MBs can be coded using inter prediction with two motion-compensated prediction signals per prediction block
  - SP (switching P) slice: enables efficient switching between different pre-coded pictures
  - SI (switching I) slice: allows an exact match of a macroblock in an SP slice, for random access and error recovery purposes
Video Coding Layer
- If all slices in stream B are P slices, the decoder will not have the correct reference frames when switching; one solution is to code a frame as an I slice, as below
- I slices result in a peak in the coded bit rate at each switching point
Video Coding Layer
- SP slices are designed to support switching without the increased bit-rate penalty of I slices
- Unlike a normal P slice, the subtraction occurs in the transform domain
Video Coding Layer
- A simplified diagram of the encoding and decoding processes for SP slices A2, B2, and AB2 is shown (A' denotes a reconstructed frame)
Video Coding Layer
- If streams A and B are versions of the same original sequence coded at different bit rates, the SP slice AB2 should be coded efficiently
Video Coding Layer
- Another use of SP slices is to provide random access and VCR-like functionalities (e.g., a decoder can fast-forward from frame A0 directly to frame A10 by first decoding A0 and then decoding SP slice A0-10)
- A second type of switching slice, the SI slice, may be used to switch from one sequence to a completely different sequence
Video Coding Layer
- Encoding and decoding process for macroblocks
  - All luma and chroma samples of a MB are either spatially or temporally predicted
  - Each color component of the prediction residual is subdivided into 4x4 blocks, transformed using an integer transform, and then quantized and encoded by entropy coding methods
  - The input video signal is split into MBs, and the association of MBs to slice groups and slices is selected
  - Efficient parallel processing of MBs is possible when there are multiple slices in the picture
Video Coding Layer
- Encoding and decoding process for macroblocks
  - A block diagram of the VCL for a MB is shown in the following
Video Coding Layer
- Adaptive frame/field coding operation
  - In regions of moving objects or camera motion, two adjacent rows show a reduced degree of dependency in interlaced frames compared with progressive frames
  - To provide high coding efficiency, H.264/AVC allows the following decisions when coding a frame
    - Combine the two fields and code them as one single frame (frame mode)
    - Not combine the two fields and code them as separate coded fields (field mode)
    - Combine the two fields and compress them as a single frame, but before coding, split each pair of vertically adjacent MBs into either a pair of field MBs or a pair of frame MBs
Video Coding Layer
- The choice among the three options can be made adaptively; the first two are referred to as picture-adaptive frame/field (PAFF) coding
- When a frame is coded as two fields, each field is coded in a way similar to a frame, except for the following
  - Motion compensation utilizes reference fields rather than reference frames
  - The zig-zag scan of transform coefficients is different
  - Strong deblocking is not used for filtering horizontal edges of MBs in fields
- When a frame consists of mixed regions, it is efficient to code the nonmoving regions in frame mode and the moving regions in field mode
Video Coding Layer
- A frame/field encoding decision can also be made independently for each vertical pair of MBs. This coding option is referred to as macroblock-adaptive frame/field (MBAFF) coding. The figure below shows the MBAFF MB-pair concept.
Video Coding Layer
- An important distinction between PAFF and MBAFF is that in MBAFF, one field cannot use the MBs in the other field of the same frame
- Sometimes PAFF coding can be more efficient than MBAFF coding, particularly in the case of rapid global motion, scene changes, or intra picture refreshes
Video Coding Layer
- Intra-frame prediction
  - In all slice coding types, Intra_4x4 and Intra_16x16 are supported, together with chroma prediction and the I_PCM prediction mode
  - The Intra_4x4 mode is based on 4x4 luma blocks and is well suited for picture regions with significant detail
  - When using this mode, each 4x4 block is predicted from the neighboring samples, as shown below
Video Coding Layer
- Intra-frame prediction
  - 4x4 block prediction modes
    - Except for the DC mode, each is suited to predict textures with structure in the specified direction
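As an illustrative sketch (my own helper names, not the normative decoder), the vertical and DC modes of Intra_4x4 prediction can be written as:

```python
def intra4x4_vertical(top):
    """Mode 0 (vertical): each column is predicted by the sample above it."""
    return [list(top) for _ in range(4)]

def intra4x4_dc(top, left):
    """Mode 2 (DC): every sample is the rounded mean of the 8 neighboring
    samples above and to the left."""
    dc = (sum(top) + sum(left) + 4) >> 3  # +4 for rounding, /8
    return [[dc] * 4 for _ in range(4)]
```

The directional modes (diagonal, horizontal, etc.) follow the same pattern, extrapolating the neighbors along the mode's direction.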
Video Coding Layer
- Intra-frame prediction
  - In an earlier draft, the four samples below L were also used for some prediction modes; they were dropped due to the need to reduce memory accesses
  - Intra modes for neighboring 4x4 blocks are highly correlated. For example, if previously encoded 4x4 blocks A and B were predicted using mode 2, it is likely that the best mode for block C is also mode 2.
Video Coding Layer
- Intra-frame prediction
  - The Intra_16x16 mode is well suited for smooth areas of a picture
  - This mode supports vertical, horizontal, DC, and plane prediction
  - Plane prediction works well in areas of smoothly varying luminance
Video Coding Layer
- Intra-frame prediction
  - The chroma of a MB is predicted using a technique similar to Intra_16x16 (the same four modes)
  - The I_PCM mode allows the encoder to bypass the prediction and transform coding processes and instead directly send the values of the encoded samples
  - The I_PCM mode serves the following purposes
    - Allows the encoder to precisely represent the values of the samples
    - Provides a way to accurately represent the values of anomalous picture content
    - Enables placing a hard limit on the number of bits a decoder must handle for a MB, without harm to coding efficiency
Video Coding Layer
- Intra-frame prediction
  - A constrained intra coding mode allows prediction only from intra-coded neighboring MBs
  - Intra prediction across slice boundaries is not used
  - Referring to neighboring samples of previously coded blocks may incur error propagation in environments with transmission errors
Video Coding Layer
- Inter-frame prediction
  - In P slices
    - Each P MB type is partitioned into partitions as shown below
    - This method of partitioning a MB is known as tree-structured motion compensation
Video Coding Layer
- Inter-frame prediction
  - Choosing a larger partition size means
    - A small number of bits is required to signal the choice of motion vector(s) and the type of partition
    - The motion-compensated residual may contain a significant amount of energy in frame areas with high detail
  - Choosing a smaller partition size means
    - A lower-energy residual after motion compensation
    - A larger number of bits is required to signal the motion vectors and the type of partition
  - The accuracy of motion compensation is in units of one quarter of the distance between two luma samples
Video Coding Layer
- Inter-frame prediction
  - Half-sample values are obtained by applying a one-dimensional 6-tap FIR filter vertically and horizontally
  - The 6-tap interpolation filter is relatively complex but produces a more accurate fit to the integer-sample data, and hence better motion compensation performance
  - Quarter-sample values are generated by averaging samples at integer- and half-sample positions
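The interpolation arithmetic can be sketched as follows (the (1, -5, 20, 20, -5, 1)/32 filter of the standard; the function names are my own):

```python
def half_sample(E, F, G, H, I, J):
    """Half-sample position between integer samples G and H, via the 6-tap
    FIR filter (1, -5, 20, 20, -5, 1)/32 with rounding and clipping."""
    b = (E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5
    return max(0, min(255, b))

def quarter_sample(x, y):
    """Quarter-sample value: rounded average of the two nearest
    integer- or half-sample values."""
    return (x + y + 1) >> 1
```

On a flat region the filter is transparent: `half_sample(10, 10, 10, 10, 10, 10)` returns 10.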
Video Coding Layer
- The figure above illustrates the half-sample interpolation
Video Coding Layer
- Inter-frame prediction
  - The following illustrates the luma quarter-sample positions
    - a = round((G + b) / 2)
    - d = round((G + h) / 2)
    - e = round((h + b) / 2)
Video Coding Layer
- The predictions for the chroma components are obtained by bilinear interpolation
- The displacements used for chroma have one-eighth-sample position accuracy
- a = round(((8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dyC + dx·dy·D) / 64)
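The formula above transcribes directly into code (integer rounding done by adding 32 before the shift):

```python
def chroma_bilinear(A, B, C, D, dx, dy):
    """Bilinear chroma interpolation at eighth-sample offsets dx, dy (0..7)
    between the four surrounding integer samples A (top-left), B (top-right),
    C (bottom-left), and D (bottom-right)."""
    acc = ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
           + (8 - dx) * dy * C + dx * dy * D)
    return (acc + 32) >> 6  # round(acc / 64)
```

With dx = dy = 0 the result is exactly A, and equal neighbors are reproduced unchanged for any offset.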
Video Coding Layer
- Inter-frame prediction
  - Motion prediction using full-, half-, and quarter-sample accuracy offers improvements over previous standards for two reasons
    - More accurate motion representation
    - More flexibility in prediction filtering
  - Motion vectors over picture boundaries are allowed
  - No motion vector prediction takes place across slice boundaries
  - Motion compensation for regions smaller than 8x8 uses the same reference index for prediction of all blocks within the 8x8 region
Video Coding Layer
- Inter-frame prediction
  - Choices of neighboring partitions of the same and different sizes are shown below
  - For transmitted partitions, excluding the 16x8 and 8x16 partition sizes, MVp is the median of the motion vectors of partitions A, B, and C
Video Coding Layer
- For 16x8 partitions, MVp for the upper 16x8 partition is predicted from B, and MVp for the lower 16x8 partition is predicted from A
- For 8x16 partitions, MVp for the left 8x16 partition is predicted from A, and MVp for the right 8x16 partition is predicted from C
- For skipped macroblocks, a 16x16 vector MVp is generated as in case (1) above
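The median rule for the common case can be sketched as a component-wise median (helper names are mine):

```python
def median_mv(mv_a, mv_b, mv_c):
    """Predicted motion vector MVp as the component-wise median of the
    motion vectors of neighbors A (left), B (above), and C (above-right)."""
    med = lambda x, y, z: sorted((x, y, z))[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))
```

The median makes the predictor robust to a single outlier neighbor: one deviating vector does not drag MVp away from the other two.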
Video Coding Layer
- A P MB can be coded in the P_Skip type; this is useful because large areas with no change or constant motion, such as slow panning, can be represented with very few bits
- Multi-picture motion compensation is supported, as shown below
Video Coding Layer
- In B slices
  - Intra coding types are also supported
  - Four other types are supported: list 0, list 1, bi-predictive, and direct prediction
  - For the bi-predictive mode, the prediction signal is formed by a weighted average of the motion-compensated list 0 and list 1 prediction signals
  - The direct mode can be list 0 or list 1 prediction, or bi-predictive
  - Multi-frame motion compensation is supported
Video Coding Layer
- Transform, scaling, and quantization
  - The transform is applied to 4x4 blocks
  - Instead of the DCT, a separable integer transform with similar properties to the DCT is used
  - Inverse-transform mismatches are avoided
  - At the encoder: transform, scanning, scaling, and rounding as part of quantization, followed by entropy coding
  - At the decoder: the inverse of the encoding process is performed, except for the rounding
  - The inverse transform is implemented using only additions and bit-shifting operations on 16-bit values
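The forward core transform can be sketched as the matrix product Y = Cf · X · CfT with the well-known integer matrix (a plain-Python illustration, not the normative code; the scaling the matrix implies is folded into quantization):

```python
# H.264/AVC 4x4 forward core transform matrix, an integer
# approximation of the 4x4 DCT.
Cf = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(A, B):
    """4x4 integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_core_transform(X):
    """Y = Cf * X * Cf^T; realizable with additions and shifts only."""
    Ct = [[Cf[j][i] for j in range(4)] for i in range(4)]
    return matmul(matmul(Cf, X), Ct)
```

A constant 4x4 block maps to a single DC coefficient, as expected of a DCT-like transform.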
Video Coding Layer
- Several reasons for using the smaller-size transform
  - It removes statistical correlation efficiently
  - It has visual benefits, resulting in less noise around edges
  - It requires fewer computations and a smaller processing word length
- The quantization parameter (QP) can take 52 values
  - Qstep doubles in size for every increment of 6 in QP
  - Each increase of 1 in QP increases Qstep by approximately 12.5%
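The QP-to-Qstep rule can be sketched as follows (the six base values for QP 0-5 are the commonly cited ones; treat them as illustrative of the doubling rule):

```python
def qstep(qp: int) -> float:
    """Quantizer step size: a table of 6 base values that repeats with a
    doubling every 6 QP, so qstep(qp + 6) == 2 * qstep(qp)."""
    base = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # QP 0..5
    return base[qp % 6] * (1 << (qp // 6))
```

With 52 QP values this yields a step-size range spanning roughly three orders of magnitude, which is what gives the encoder its fine rate control.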
Video Coding Layer
- The wide range of quantizer step sizes makes it possible for the encoder to control the trade-off between bit rate and quality accurately and flexibly
- The values of QP may be different for luma and chroma; QPchroma is derived from QPY via a user-defined offset
- 4x4 luma DC coefficient transform and quantization (16x16 intra mode only)
  - The DC coefficient of each 4x4 block is transformed again using a 4x4 Hadamard transform
  - In an intra-coded MB, much of the energy is concentrated in the DC coefficients, and this extra transform helps to de-correlate the 4x4 luma DC coefficients
Video Coding Layer
- 2x2 chroma DC coefficient transform and quantization: as with the intra luma DC coefficients, the extra transform helps to de-correlate the 2x2 chroma DC coefficients and improves compression performance
- The complete process
  - Encoding
    - Input: 4x4 residual samples
    - Forward core transform
    - (followed by a forward transform for chroma DC or Intra-16 luma DC coefficients)
    - Post-scaling and quantization
    - (modified for chroma DC or Intra-16 luma DC)
Video Coding Layer
- Decoding
  - (inverse transform for chroma DC or Intra-16 luma DC coefficients)
  - Re-scaling (incorporating inverse transform pre-scaling)
  - (modified for chroma DC or Intra-16 luma DC coefficients)
  - Inverse core transform
  - Post-scaling
  - Output: 4x4 residual samples
Video Coding Layer
- Flow chart
- An additional 2x2 transform is also applied to the DC coefficients of the four 4x4 blocks of chroma
Video Coding Layer
- Entropy coding
  - The simpler method uses a single infinite-extent codeword table for all syntax elements except the transform residuals
  - The mapping to the codeword table is customized according to the data statistics
  - The chosen codeword table is an Exp-Golomb code with simple and regular decoding properties
  - In CAVLC, VLC tables for various syntax elements are switched depending on already-transmitted syntax elements
  - In CAVLC, the number of non-zero quantized coefficients and the actual sizes and positions of the coefficients are coded separately
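The Exp-Golomb construction mentioned above is simple enough to sketch: the codeword for an unsigned value v is the binary form of v + 1, preceded by one leading zero per bit after the first.

```python
def exp_golomb_encode(v: int) -> str:
    """Unsigned Exp-Golomb: 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100'."""
    info = bin(v + 1)[2:]                # binary representation of v + 1
    return "0" * (len(info) - 1) + info  # zero prefix marks the length

def exp_golomb_decode(code: str) -> int:
    """Inverse mapping: count leading zeros, read that many more bits."""
    zeros = code.index("1")
    return int(code[zeros:2 * zeros + 1], 2) - 1
```

The "infinite extent" of the table comes from this rule: every non-negative integer gets a decodable codeword without any stored table.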
Video Coding Layer
- Entropy coding
  - The VLC tables are designed to match the corresponding conditioned statistics
  - CAVLC encoding of a block of transform coefficients proceeds as follows
    - Encode the number of non-zero coefficients and trailing ones
      - Encode the total number of non-zero coefficients (TotalCoeffs, range 0-16) and the number of trailing ±1 values (T1, range 0-3), jointly as coeff_token
      - There are 4 look-up tables for coeff_token (3 VLC and 1 FLC)
    - Encode the sign of each T1
      - Coded in reverse order, starting with the highest frequency
Video Coding Layer
- Entropy coding
  - Encode the levels of the remaining non-zero coefficients
    - Coded in reverse order
    - There are 7 VLC tables to choose from
    - The choice of table adapts depending on the magnitude of each coded level
  - Encode the total number of zeros before the last coefficient
    - TotalZeros is the sum of all zeros preceding the highest non-zero coefficient in the reordered array
    - Coded with a VLC
  - Encode each run of zeros
    - Encoded in reverse order
    - The VLC table is chosen depending on ZerosLeft and run_before
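The first counting step above can be sketched as follows (a hypothetical helper of my own; the actual coeff_token table lookup is omitted):

```python
def coeff_token_stats(coeffs):
    """Scan a reordered (zig-zag) coefficient list and return
    (TotalCoeffs, T1): the number of non-zero coefficients and the
    number of trailing +/-1 values, capped at 3."""
    total = sum(1 for c in coeffs if c != 0)
    t1 = 0
    for c in reversed(coeffs):  # from the highest frequency backwards
        if c == 0:
            continue            # trailing zeros are skipped
        if abs(c) == 1 and t1 < 3:
            t1 += 1
        else:
            break               # a larger level ends the trailing-ones run
    return total, t1
```

These two counts select the coeff_token codeword; the signs of the T1 values and the remaining levels are then coded separately, as listed above.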
Video Coding Layer
- CABAC allows the assignment of a non-integer number of bits to each symbol of an alphabet
- The use of adaptive codes permits adaptation to non-stationary symbol statistics
- Statistics of already-coded syntax elements are used to estimate the conditional probabilities used for switching between several estimated probability models
- The arithmetic coding core engine and its associated probability estimation are specified as multiplication-free, low-complexity methods using only shifts and table look-ups
Video Coding Layer
- Coding a data symbol involves the following stages (taking MVDx as an example)
  - Binarization
    - For |MVDx| < 9, binarization is carried out using the following table; larger values are binarized with an Exp-Golomb codeword
    - The first bit is bin 1, the second bit is bin 2, and so on
Video Coding Layer
- Coding a data symbol involves the following stages (taking MVDx as an example)
  - Context model selection
    - Performed according to the following table
  - Arithmetic encoding
    - The selected context model supplies two probability estimates (for "1" and "0") that determine the sub-range used by the arithmetic coder
Video Coding Layer
- Coding a data symbol involves the following stages (taking MVDx as an example)
  - Probability update
    - If the value of bin 1 is 0, the frequency count of "0" is incremented
Video Coding Layer
- In-loop deblocking filter
  - Applied between the inverse transform and the reconstruction of the MB, within the prediction loop
  - A particular characteristic of block-based coding is the accidental production of visible block structures
  - Block edges are reconstructed with less accuracy than interior pixels, and blocking is among the most visible artifacts
  - The filter has two benefits
    - Block edges are smoothed
    - The filtered pictures yield smaller residuals after prediction
  - The filter is adaptive: the strength of filtering is controlled by the values of several syntax elements
Video Coding Layer
- In-loop deblocking filter
  - The basic idea is that if a relatively large absolute difference between samples near a block edge is measured, it is quite likely a blocking artifact and should be reduced
  - If the magnitude of the difference is so large that it cannot be explained by coarse quantization, it more likely reflects the actual behavior of the picture and should not be smoothed
  - Filtering is applied to the edges of 4x4 blocks
Video Coding Layer
- In-loop deblocking filter
  - Filtering is applied to the edges of 4x4 blocks
  - The choice of filtering outcome depends on the boundary strength and on the gradient of the image samples across the boundary
Video Coding Layer
- In-loop deblocking filter
  - The boundary strength Bs is chosen according to the following table
  - Filter implementation
    - Bs ∈ {1, 2, 3}: a 4-tap linear filter is applied
    - Bs = 4: 3-, 4-, or 5-tap linear filters may be used
Video Coding Layer
- The figure below shows the principle using a one-dimensional edge
- Whether samples p0 and q0, as well as p1 and q1, are filtered is determined using quantization-parameter-dependent thresholds α(QP) and β(QP), where β(QP) is smaller than α(QP)
Video Coding Layer
- Filtering of p0 and q0 takes place if each of the following is satisfied
  - 1. |p0 - q0| < α(QP)
  - 2. |p1 - p0| < β(QP)
  - 3. |q1 - q0| < β(QP)
- Filtering of p1 and q1 takes place if the corresponding condition below is satisfied
  - 1. |p2 - p0| < β(QP)
  - or 2. |q2 - q0| < β(QP)
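The sample-filtering decision above translates directly into code (a sketch of the conditions only; here α and β are passed in rather than looked up from QP):

```python
def filter_p0_q0(p1, p0, q0, q1, alpha, beta):
    """True when edge samples p0/q0 should be filtered: the step across
    the edge is small enough to be a quantization artifact, not detail."""
    return (abs(p0 - q0) < alpha and
            abs(p1 - p0) < beta and
            abs(q1 - q0) < beta)

def filter_p1(p2, p0, beta):
    """Additional condition for filtering the second sample p1
    (symmetrically, q2/q0 for q1)."""
    return abs(p2 - p0) < beta
```

A large step across the edge (e.g. a real object boundary) fails the α test and is left untouched, which is exactly the adaptivity described above.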
Video Coding Layer
- (Figures: Foreman.cif at 30 Hz and Foreman.qcif at 10 Hz)
Video Coding Layer
- Hypothetical reference decoder (HRD)
  - For a standard, it is not sufficient to provide only a coding algorithm
  - It is important in real-time systems to specify how bits are fed to a decoder and how the decoded pictures are removed from the decoder
  - Specifying input and output buffer models yields an implementation-independent model of a receiver, called the HRD
  - The HRD specifies the operation of two buffers
    - The coded picture buffer (CPB)
    - The decoded picture buffer (DPB)
Video Coding Layer
- The CPB models the arrival and removal times of the coded bits
- The HRD is more flexible in supporting the sending of video at a variety of bit rates without excessive delay
- The HRD also specifies DPB management to ensure that excessive memory capacity is not needed
Profiles and potential applications
- Profiles
  - Three profiles are defined: Baseline, Main, and Extended
  - The Baseline profile supports all features except the following two sets
    - Set 1: B slices, weighted prediction, CABAC, field coding, and picture- or MB-adaptive switching between frame and field coding
    - Set 2: SP/SI slices and slice data partitioning
  - The Main profile supports the first set above, but not FMO, ASO, or redundant pictures
  - The Extended profile supports all features of the Baseline profile and both sets above, except for CABAC
Profiles and potential applications
- Areas where the profiles of the new standard may be used
  - A list of possible application areas is given below
  - Conversational services
    - H.320 conversational video services utilizing circuit-switched ISDN-based video conferencing
    - H.323 conversational services over the Internet with best-effort IP/RTP protocols
  - Entertainment video applications
    - Broadcast via satellite, cable, or DSL
    - DVD for standard definition
    - VOD (video on demand) via various channels
Profiles and potential applications
- Streaming services
  - 3GPP streaming using IP/RTP for transport and RTSP for session setup
  - Streaming over the wired Internet using the IP/RTP protocol and RTSP for session setup
- Other services
  - 3GPP multimedia messaging services
  - Video mail
Conclusion (III)
- The VCL design of H.264/AVC is based on conventional block-based hybrid video coding concepts, but with some important differences relative to prior standards, as summarized below
  - Enhanced motion-prediction capability
  - Use of a small block-size, exact-match transform
  - Adaptive in-loop deblocking filter
  - Enhanced entropy coding methods