Overview of H.264/AVC - PowerPoint PPT Presentation

1
Overview of H.264/AVC
  • 2003.9.x
  • M.K.Tsai

2
Outline
  • Abstract
  • Applications
  • Network Abstraction Layer (NAL)
  • Conclusion (I)
  • Design feature highlight
  • Conclusion (II)
  • Video Coding Layer (VCL)
  • Profile and potential application
  • Conclusion (III)

3
Abstract
  • H.264/AVC is the newest video coding standard
  • Main goals have been enhanced compression and
    provision of a network-friendly representation
    addressing conversational (video telephony) and
    nonconversational (storage, broadcast, or
    streaming) applications
  • H.264/AVC has achieved a significant improvement
    in rate-distortion efficiency
  • Scope of standardization is illustrated below

4
Applications
  • Broadcast over cable, cable modem
  • Interactive or serial storage on optical media and DVD
  • Conversational services over LAN, modem
  • Video-on-demand or streaming services over
    ISDN, wireless networks
  • Multimedia messaging services (MMS) over DSL, mobile
    networks
  • How to handle this variety of applications and
    networks?

5
Applications
  • To address this need for flexibility and
    customizability, the H.264/AVC design comprises a VCL
    and a NAL; the structure of the H.264/AVC encoder is
    shown below

6
Applications
  • VCL (video coding layer), designed to efficiently
    represent the video content
  • NAL (network abstraction layer), formats the VCL
    representation of the video and provides header
    information in a manner appropriate for
    conveyance by a variety of transport layers or
    storage media

7
Network Abstraction Layer
  • To provide network friendliness, enabling
    simple and effective customization of the use of
    the VCL
  • To facilitate the ability to map H.264/AVC data
    to transport layers such as
  • RTP/IP for any kind of real-time Internet service
  • File formats, e.g., ISO MP4, for storage
  • H.32X for conversational services
  • MPEG-2 systems for broadcasting services
  • The design of the NAL anticipates a variety of
    such mappings

8
Network Abstraction Layer
  • Some key concepts of the NAL are NAL units, the
    byte-stream and packet-format uses of NAL units,
    parameter sets, and access units
  • NAL units
  • A packet that contains an integer number of bytes
  • The first byte is a header byte indicating the
    type of data
  • The remaining bytes contain payload data
  • Payload data is interleaved as necessary with
    emulation prevention bytes, preventing a start code
    prefix from being generated inside the payload
  • Specifies a format for use in both packet- and
    bitstream-oriented transport systems
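A minimal sketch of the emulation prevention mechanism described above (the function name is illustrative, not from the standard): whenever two zero bytes would be followed by a byte of value 3 or less, an extra 0x03 byte is inserted so the payload can never contain a start code prefix.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) so that none of the byte
    sequences 0x000000..0x000003 (which could be mistaken for a start
    code prefix) appears inside the NAL unit payload."""
    out = bytearray()
    zeros = 0                        # consecutive 0x00 bytes just emitted
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)         # break up the forbidden pattern
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, a payload 00 00 01 becomes 00 00 03 01, so a decoder scanning for the start code prefix 00 00 01 cannot find one inside the payload.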

9
Network Abstraction Layer
  • NAL units in byte-stream format use
  • Each NAL unit is prefixed by a unique start code to
    identify its boundary
  • Some systems require delivery of the NAL unit stream
    as an ordered stream of bytes (like H.320 and
    MPEG-2/H.222.0)
  • NAL units in packet-transport system use
  • Coded data is carried in packets framed by the system
    transport protocol
  • Can be carried in data packets without start code
    prefixes
  • In such systems, inclusion of start code prefixes
    in the data would be a waste

10
Network Abstraction Layer
  • VCL and non-VCL NAL units
  • VCL NAL units contain data that represents the values
    of the samples in the video pictures
  • Non-VCL NAL units contain extra data such as
    parameter sets and supplemental enhancement
    information (SEI)
  • Parameter sets: important header data applying to a
    large number of VCL NAL units
  • SEI: timing information and other supplemental
    data enhancing usability of the decoded video signal,
    but not necessary for decoding the values in the
    pictures

11
Network Abstraction Layer
  • Parameter sets
  • Contain information that is expected to rarely change
    and that applies to the decoding of a large number of
    VCL NAL units
  • Divided into two types
  • Sequence parameter sets, which apply to a series of
    consecutive coded video pictures
  • Picture parameter sets, which apply to the decoding of
    one or more individual pictures within a coded
    video sequence
  • These two mechanisms decouple the transmission of
    infrequently changing information from the coded
    sample data
  • Can be sent well ahead of the VCL NAL units and
    repeated to provide robustness against data loss

12
Network Abstraction Layer
  • Parameter sets
  • Can be sent well ahead of the VCL NAL units and
    repeated to provide robustness against data loss
  • A small amount of data (an identifier) can be used to
    refer to a larger amount of information
    (the parameter set)
  • In some applications, parameter sets may be sent within
    the channel (termed in-band transmission)

13
Network Abstraction Layer
  • Parameter sets
  • In other applications, it can be advantageous to
    convey parameter sets out-of-band using a
    reliable transport mechanism

14
Network Abstraction Layer
  • Access units
  • The format of an access unit is shown below

15
Network Abstraction Layer
  • Access units
  • Contains a set of VCL NAL units that together compose
    a primary coded picture
  • May be prefixed with an access unit delimiter to aid in
    locating the start of the access unit
  • SEI contains data such as picture timing
    information
  • The primary coded picture consists of VCL NAL units
    comprising slices that represent the samples of
    the video picture
  • Redundant coded pictures are available for use by
    the decoder in recovering from loss of data

16
Network Abstraction Layer
  • Access units
  • For the last coded picture of a video sequence, an
    end-of-sequence NAL unit is present to indicate the
    end of the sequence
  • For the last coded picture in the entire NAL unit
    stream, an end-of-stream NAL unit is present to
    indicate that the stream is ending
  • Decoders are not required to decode redundant
    coded pictures if they are present
  • Decoding of each access unit results in one
    decoded picture

17
Network Abstraction Layer
  • Coded video sequences
  • Consists of a series of access units and uses only
    one sequence parameter set
  • Can be decoded independently of other coded video
    sequences, given the necessary parameter sets
  • An instantaneous decoding refresh (IDR) access unit
    is at the beginning and contains an intra picture
  • The presence of an IDR access unit indicates that no
    subsequent picture will reference pictures
    prior to the intra picture

18
Conclusion (I)
  • H.264/AVC represents a number of advances in
    standard video coding technology in terms of
    flexibility for effective use over a broad
    variety of network types and application domains

19
Design feature highlight
  • Variable block-size motion compensation with
    small block size
  • With a minimum luma block size as small as 4x4
  • The matching chroma block is half the height and width

20
Design feature highlight
  • Quarter-sample-accurate motion compensation
  • Half-pixel values are generated using a 6-tap FIR
    filter
  • First found in an advanced profile of MPEG-4, but
    H.264/AVC further reduces the complexity
  • Multiple reference picture motion compensation
  • Extends upon the enhanced technique found in H.263
  • Selects among a large number of pictures decoded
    and stored in the decoder for prediction
  • The same applies to bi-prediction, which is restricted
    in MPEG-2

21
Design feature highlight
  • Decoupling of referencing order from display order
  • There was a strict dependency between the ordering for
    referencing and for display in prior standards
  • Allows the encoder to choose the ordering of pictures
    for referencing and display purposes with a high
    degree of flexibility
  • Flexibility is constrained only by total memory
    capacity
  • Removing this restriction also enables removing the
    extra delay associated with bi-predictive coding

22
Design feature highlight
  • Motion vectors over picture boundaries
  • Motion vectors are allowed to point outside the
    picture
  • Especially useful for small pictures and camera
    movement
  • Decoupling of picture representation methods from
    picture referencing capability
  • Bi-predictively-encoded pictures could not be
    used as references in prior standards
  • Provides the encoder more flexibility to use a
    reference picture that is closer to the
    picture being coded

23
Design feature highlight
  • Weighted prediction
  • Allows the motion-compensated prediction signal to be
    weighted and offset by amounts specified by the encoder
  • Improves coding efficiency for scenes containing
    fades

one grid means one pixel
24
Design feature highlight
  • Improved skipped and direct motion inference
  • In prior standards, a skipped area of a
    predictively-coded picture could not represent motion
    in the scene content, which is detrimental for global
    motion
  • H.264/AVC instead infers motion in skipped regions
  • For bi-predictively coded areas, it improves further
    on the direct prediction of prior standards such as
    H.263 and MPEG-4

25
Design feature highlight
  • Directional spatial prediction for intra coding
  • Extrapolation of the edges of previously decoded parts
    of the current picture is applied in intra-coded
    regions of a picture
  • Improves the quality of the prediction signal
  • Allows prediction from neighboring areas that were
    not intra-coded

26
Design feature highlight
  • In-the-loop deblocking filtering
  • Block-based video coding produces artifacts known
    as blocking artifacts, originating from both the
    prediction and residual difference coding stages
    of the decoding process
  • The quality improvement from in-loop filtering can be
    used in inter-picture prediction to improve the
    ability to predict other pictures

27
Design feature highlight
  • In addition to improved prediction methods, coding
    efficiency is also enhanced by the following
  • Small block-size transform
  • All major prior video coding standards used a
    transform block size of 8x8, while the new design is
    based primarily on 4x4
  • Allows the encoder to represent the signal in a
    more locally-adaptive fashion and reduces artifacts
  • Short word-length transform
  • Arithmetic processing requires only 16-bit rather
    than 32-bit word lengths

28
Design feature highlight
  • Hierarchical block transform
  • Extends the effective block size for low-frequency
    chroma information to an 8x8 array and luma to a
    16x16 array

29
Design feature highlight
  • Exact-match inverse transform
  • Previously, the inverse transform was specified within
    an error tolerance bound, due to the impracticality of
    obtaining an exact match to the ideal inverse transform
  • Each decoder design would produce slightly different
    decoded video, causing drift between encoder
    and decoder
  • Arithmetic entropy coding
  • Previously found as an optional feature of H.263
  • H.264/AVC uses a powerful context-adaptive binary
    arithmetic coding (CABAC) method

30
Design feature highlight
  • Context-adaptive entropy coding
  • Both CAVLC (context-adaptive variable length
    coding) and CABAC use context-based adaptivity
    to improve performance

31
Design feature highlight
  • Robustness to data errors/losses and flexibility
    for operation over a variety of network
    environments are enabled by the following
  • Parameter set structure
  • Key information is separated for handling in a
    more flexible and specialized manner
  • Provides for robust and efficient conveyance of
    header information
  • Flexible slice size
  • The rigid slice structure of MPEG-2 reduces coding
    efficiency by increasing the quantity of header data
    and decreasing the effectiveness of prediction

32
Design feature highlight
  • NAL unit syntax structure
  • Each syntax structure in H.264/AVC is placed into
    a logical data packet called a NAL unit
  • Allow greater customization of the method of
    carrying the video content in a manner for each
    specific network
  • Redundant pictures
  • Enhance robustness to data loss
  • Enable a representation of regions of pictures
    for which the primary representation has been lost

33
Design feature highlight
  • Flexible macroblock ordering (FMO)
  • Partitions a picture into regions called slice
    groups, with each slice becoming an independently
    decodable subset of a slice group
  • Significantly enhances robustness by managing the
    spatial relationship between the regions that are
    coded in each slice
  • Arbitrary slice ordering (ASO)
  • Enables sending and receiving the slices of a
    picture in any order relative to each other, as
    found in H.263
  • Can improve end-to-end delay in real-time
    applications, particularly on networks with
    out-of-order delivery behavior

34
Design feature highlight
  • Data partitioning
  • Allows the syntax of each slice to be separated
    into up to three different partitions (header
    data, intra-slice data, and inter-slice data
    partitions), depending on a categorization of
    syntax elements
  • SP/SI synchronization/switching pictures
  • Allow exact synchronization of the decoding
    process of some decoders with an ongoing video stream
  • Enable switching a decoder between video streams
    that use different data rates, and recovery from data
    loss or errors

35
Design feature highlight
  • SP/SI synchronization/switching pictures

36
Design feature highlight
  • SP/SI synchronization/switching pictures

37
Conclusion (II)
  • H.264/AVC represents a number of advances in
    standard video coding technology in terms of both
    coding efficiency enhancement and flexibility for
    effective use over a broad variety of network
    types and application domains

38
Video Coding Layer
  • Pictures, Frames, and Fields
  • A picture can represent either an entire frame or a
    single field
  • If the two fields of a frame were captured at
    different time instants, the frame is referred to
    as an interlaced frame; otherwise it is referred
    to as a progressive frame

39
Video Coding Layer
  • YCbCr color space and 4:2:0 sampling
  • Y represents brightness
  • Cb and Cr represent how much the color deviates from
    gray toward blue and red, respectively
  • Division of the picture into macroblocks
  • Slices and slice groups
  • Slices are a sequence of macroblocks processed in
    raster-scan order when not using FMO
  • Some information from other slices may be needed
    to apply the deblocking filter across slice
    boundaries

40
Video Coding Layer
  • A picture may be split into one or more slices
    without FMO, as shown below
  • FMO modifies the way pictures are
    partitioned into slices and MBs by using slice
    groups
  • A slice group is a set of MBs defined by a
    macroblock-to-slice-group map, which is specified by
    the picture parameter set and some information from
    the slice headers

41
Video Coding Layer
  • A slice group can be partitioned into one or more
    slices, such that a slice is a sequence of MBs
    within the same slice group, processed in
    raster-scan order
  • By using FMO, a picture can be split into many
    macroblock scanning patterns, such as those shown below

42
Video Coding Layer
  • Each slice can be coded using one of the following types
  • I slice
  • A slice in which all MBs are coded using intra
    prediction
  • P slice
  • In addition to intra prediction, MBs can be coded
    with inter prediction using at most one
    motion-compensated prediction signal per prediction
    block
  • B slice
  • In addition to the coding types of a P slice, MBs can
    be coded with inter prediction using two
    motion-compensated prediction signals per prediction
    block
  • SP (switching P) slice
  • Enables efficient switching between different
    pre-coded pictures
  • SI (switching I) slice
  • Allows an exact match of a macroblock in an SP slice,
    for random access and error recovery

43
Video Coding Layer
  • If all slices in stream B are P-slices, the decoder
    won't have the correct reference frame; one solution is
    to code the switching frame as an I-slice, as shown
    below
  • I-slices result in a peak in the coded bit rate at
    each switching point

44
Video Coding Layer
  • SP-slices are designed to support switching
    without the increased bit-rate penalty of I-slices
  • Unlike a normal P-slice, the subtraction occurs
    in the transform domain

45
Video Coding Layer
  • A simplified diagram of the encoding and decoding
    process for SP-slices A2, B2, and AB2 is shown below
    (primed letters denote reconstructed frames)

46
Video Coding Layer
  • If streams A and B are versions of the same
    original sequence coded at different bit-rates,
    the SP-slice AB2 should be efficient

47
Video Coding Layer
  • SP-slices can also provide random access and
    VCR-like functionality (e.g., the decoder can
    fast-forward from A0 directly to frame A10 by
    first decoding A0, then decoding SP-slice A0-10)
  • A second type of switching slice, the SI-slice, may be
    used to switch from one sequence to a completely
    different sequence

48
Video Coding Layer
  • Encoding and decoding process for macroblocks
  • All luma and chroma samples of a MB are either
    spatially or temporally predicted
  • Each color component of the prediction residual is
    subdivided into 4x4 blocks, transformed using an
    integer transform, and then quantized and encoded by
    entropy coding methods
  • The input video signal is split into MBs, and the
    association of MBs to slice groups and slices is
    selected
  • Efficient parallel processing of MBs is
    possible when there are multiple slices in the
    picture

49
Video Coding Layer
  • Encoding and decoding process for macroblocks
  • The block diagram of the VCL for a MB is shown below

50
Video Coding Layer
  • Adaptive frame/field coding operation
  • In regions of moving objects or camera motion,
    two adjacent rows show a reduced degree of
    statistical dependency in interlaced frames compared
    with progressive frames
  • To provide high coding efficiency, H.264/AVC
    allows the following decisions when coding a
    frame
  • To combine the two fields and code them as one single
    frame (frame mode)
  • To not combine the two fields and to code them as
    separate coded fields (field mode)
  • To combine the two fields and compress them as a
    single frame, but before coding, to split the
    pairs of vertically adjacent MBs into pairs of either
    field or frame MBs

51
Video Coding Layer
  • The three options can be chosen adaptively; choosing
    between the first two is referred to as
    picture-adaptive frame/field (PAFF) coding
  • When a frame is coded as two fields, each field is
    coded in a way similar to a frame, except for the
    following
  • Motion compensation utilizes reference fields
    rather than frames
  • The zig-zag scan is different
  • Strong deblocking is not used for filtering
    horizontal edges of MBs in fields
  • If a frame consists of mixed regions, it is efficient
    to code the nonmoving regions in frame mode and the
    moving regions in field mode

52
Video Coding Layer
  • A frame/field encoding decision can be made
    independently for each vertical pair of MBs. This
    coding option is referred to as macroblock-adaptive
    frame/field (MBAFF) coding. The figure below shows the
    MBAFF MB pair concept.

53
Video Coding Layer
  • An important distinction between PAFF and MBAFF
    is that in MBAFF, one field cannot use the MBs in the
    other field of the same frame as references
  • Sometimes PAFF coding can be more efficient than
    MBAFF coding, particularly in cases of rapid
    global motion, scene changes, or intra-picture
    refreshes

54
Video Coding Layer
  • Intra frame prediction
  • In all slice coding types, Intra_4x4 and Intra_16x16
    prediction are supported, together with chroma
    prediction and the I_PCM prediction mode
  • Intra_4x4 mode is based on 4x4 luma blocks and is
    suited for parts of a picture with significant detail
  • When it is used, each 4x4 block is predicted from the
    neighboring samples, as shown below

55
Video Coding Layer
  • Intra frame prediction
  • 4x4 block prediction mode
  • Except for the DC mode, each is suited to predicting
    textures with structure in the specified direction

56
Video Coding Layer
  • Intra frame prediction
  • In earlier drafts, the four samples below L were
    also used for some prediction modes. They were
    dropped to reduce memory accesses
  • Intra modes for neighboring 4x4 blocks are highly
    correlated. For example, if previously-encoded
    4x4 blocks A and B were predicted using mode 2, it is
    likely that the best mode for block C is also
    mode 2.

57
Video Coding Layer
  • Intra frame prediction
  • Intra_16x16 mode is suited for smooth areas of a
    picture
  • This mode offers vertical, horizontal, DC, and plane
    prediction
  • Plane prediction works well in areas of
    smoothly-varying luminance

58
Video Coding Layer
  • Intra frame prediction
  • The chroma of a MB is predicted similarly to
    Intra_16x16 (using the same four modes)
  • I_PCM mode allows the encoder to bypass the
    prediction and transform coding processes and
    instead directly send the values of the encoded
    samples
  • I_PCM mode serves the following purposes
  • Allows the encoder to precisely represent the
    values of the samples
  • Provides a way to accurately represent the values
    of anomalous picture content
  • Enables placing a hard limit on the number of
    bits a decoder must handle for a MB, without harm to
    coding efficiency

59
Video Coding Layer
  • Intra frame prediction
  • Constrained intra coding mode allows prediction
    only from intra-coded neighboring MBs
  • Intra prediction across slice boundaries is not
    used
  • Referring to neighboring samples of
    previously-coded blocks may otherwise incur error
    propagation in environments with transmission
    errors

60
Video Coding Layer
  • Inter frame prediction
  • In P slices
  • Each P MB type is partitioned into partitions as
    shown below
  • This method of partitioning MBs is known as
    tree-structured motion compensation

61
Video Coding Layer
  • Inter frame prediction
  • Choosing a larger partition size means
  • A small number of bits is required to signal the
    choice of MVs and the type of partition
  • But the motion-compensated residual may contain a
    significant amount of energy in frame areas with high
    detail
  • Choosing a small partition size means
  • A lower-energy residual after motion
    compensation
  • But a larger number of bits to signal the MVs and
    type of partition
  • The accuracy of motion compensation is in units
    of one quarter of the distance between two luma
    samples

62
Video Coding Layer
  • Inter frame prediction
  • Half-sample values are obtained by applying a
    one-dimensional 6-tap FIR filter vertically and
    horizontally
  • The 6-tap interpolation filter is relatively complex
    but produces a more accurate fit to the
    integer-sample data and hence better motion
    compensation performance
  • Quarter-sample values are generated by averaging
    samples at integer- and half-sample positions
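The 6-tap filter uses the well-known coefficients (1, -5, 20, 20, -5, 1)/32. A minimal sketch of both interpolation steps (function names are illustrative):

```python
def half_sample(s):
    """Half-pel value between s[2] and s[3], computed from six
    integer-position luma samples with the (1, -5, 20, 20, -5, 1)/32
    filter; +16 implements round-to-nearest before the shift."""
    val = (s[0] - 5 * s[1] + 20 * s[2] + 20 * s[3] - 5 * s[4] + s[5] + 16) >> 5
    return max(0, min(255, val))     # clip to the 8-bit sample range

def quarter_sample(a, b):
    """Quarter-pel value: rounded average of two neighboring
    integer- or half-sample values."""
    return (a + b + 1) >> 1
```

On a flat area (all samples equal) the filter reproduces the input value exactly, since its coefficients sum to 32.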

63
Video Coding Layer
64
Video Coding Layer
  • The above illustrates the half-sample interpolation

65
Video Coding Layer
  • Inter frame prediction
  • The following illustrates the luma quarter-pel
    positions
  • a = round((G + b) / 2)
  • d = round((G + h) / 2)
  • e = round((h + b) / 2)

66
Video Coding Layer
  • The predictions for the chroma components are obtained
    by bilinear interpolation
  • The displacements used for chroma have one-eighth
    sample position accuracy
  • a = round(((8 - dx)(8 - dy)A + dx(8 - dy)B +
    (8 - dx)dy C + dx dy D) / 64)
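The bilinear formula above translates directly into code (a sketch; A, B, C, D are the four neighboring chroma samples and dx, dy the eighth-sample offsets):

```python
def chroma_sample(A, B, C, D, dx, dy):
    """Bilinear interpolation of a chroma sample at eighth-sample offset
    (dx, dy), 0 <= dx, dy <= 7, between top-left A, top-right B,
    bottom-left C, and bottom-right D.  The four weights always sum to
    64, so +32 followed by >>6 rounds the division by 64."""
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B +
            (8 - dx) * dy * C + dx * dy * D + 32) >> 6
```

With dx = dy = 0 the result is exactly A, and on a flat area the interpolated value equals the surrounding samples.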

67
Video Coding Layer
  • Inter frame prediction
  • Motion prediction using full,half,and one-quarter
    sample have improvements than the previous
    standards for two reasons
  • More accurate motion representation
  • More flexibility in prediction filtering
  • Allows MV over picture boundaries
  • No MV prediction takes place across slice
    boundaries
  • Motion compensation for smaller regions than 8x8
    use the same reference index for prediction of
    all blocks within 8x8

68
Video Coding Layer
  • Inter frame prediction
  • The choices of neighboring partitions of the same and
    different sizes are shown below
  • For transmitted partitions, excluding the 16x8 and
    8x16 partition sizes, MVp is the median of the MVs
    of partitions A, B, and C

69
Video Coding Layer
  • For 16x8 partitions, MVp for the upper 16x8
    partition is predicted from B, and MVp for the lower
    16x8 partition is predicted from A
  • For 8x16 partitions, MVp for the left 8x16
    partition is predicted from A, and MVp for the right
    8x16 partition is predicted from C
  • For skipped macroblocks, a 16x16 vector MVp is
    generated as in case (1) above
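The default median prediction described above is a component-wise median over the neighbors' motion vectors; a sketch with illustrative names:

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of the motion vectors of neighboring
    partitions A (left), B (above), and C (above-right).
    Each MV is an (x, y) tuple in quarter-sample units."""
    def med3(x, y, z):
        return sorted((x, y, z))[1]   # middle of three values
    return (med3(mv_a[0], mv_b[0], mv_c[0]),
            med3(mv_a[1], mv_b[1], mv_c[1]))
```

The median makes the predictor robust to a single outlier neighbor: one large vector among the three does not move the prediction.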

70
Video Coding Layer
  • A P MB can be coded in P_Skip mode, which is useful
    for large areas with no change or constant motion
    (like slow panning) that can be represented with very
    few bits
  • Multi-picture motion compensation is supported, as
    shown below

71
Video Coding Layer
  • In B slices
  • Intra coding is also supported
  • Four other types are supported: list 0, list 1,
    bi-predictive, and direct prediction
  • For the bi-predictive mode, the prediction signal is
    formed by a weighted average of the
    motion-compensated list 0 and list 1 prediction
    signals
  • The direct mode can be list 0 or list 1
    prediction, or bi-predictive
  • Multi-frame motion compensation is supported

72
Video Coding Layer
  • Transform, scaling, and quantization
  • The transform is applied to 4x4 blocks
  • Instead of the DCT, a separable integer transform
    with similar properties to the DCT is used
  • Inverse-transform mismatches are avoided
  • At the encoder: transform, scanning, scaling, and
    rounding as quantization, followed by entropy
    coding
  • At the decoder: the inverse process is
    performed, except for the rounding
  • The inverse transform is implemented using only
    additions and bit-shifting operations on 16-bit values
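The widely-published 4x4 forward core transform matrix Cf gives a sketch of the transform Y = Cf X CfT; the per-coefficient scaling that the standard folds into quantization is omitted here, so this is the unscaled core only:

```python
# Forward core transform matrix Cf of the 4x4 integer transform.
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul4(A, B):
    """Plain 4x4 integer matrix multiply."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def core_transform(block):
    """Unscaled forward transform Y = CF * X * CF^T.  All arithmetic is
    integer, which is what makes an exact-match inverse possible."""
    cft = [list(row) for row in zip(*CF)]
    return matmul4(matmul4(CF, block), cft)
```

For a flat residual block, all energy lands in the top-left (DC) coefficient and every other coefficient is exactly zero.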

73
Video Coding Layer
  • Several reasons for using a smaller transform size
  • Removes statistical correlation efficiently
  • Has visual benefits, resulting in less noise
    around edges
  • Requires fewer computations and a smaller
    processing word-length
  • The quantization parameter (QP) can take 52 values
  • Qstep doubles in size for every increment of 6 in
    QP
  • Each increase of 1 in QP increases Qstep by
    approximately 12.5
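The QP-to-Qstep relationship can be sketched using the commonly-tabulated base step sizes for QP 0..5 (values quoted from widely-published tables, not normative text):

```python
# Base quantizer step sizes for QP = 0..5; Qstep doubles for every +6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp: int) -> float:
    """Quantizer step size for 0 <= QP <= 51."""
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

Each +1 in QP multiplies Qstep by roughly 2^(1/6), about a 12.5% increase, and qstep(51) reaches 224, the largest step size.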

74
Video Coding Layer
  • The wide range of quantizer step sizes makes it
    possible for the encoder to control the trade-off
    between bit rate and quality accurately and
    flexibly
  • The values of QP may differ between luma and
    chroma; QPchroma is derived from QPY by a
    user-defined offset
  • 4x4 luma DC coefficient transform and quantization
    (16x16 intra mode only)
  • The DC coefficient of each 4x4 block is
    transformed again using a 4x4 Hadamard transform
  • In an intra-coded MB, much energy is concentrated
    in the DC coefficients, and this extra transform
    helps to de-correlate the 4x4 luma DC coefficients
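The extra DC transform is the 4x4 Hadamard transform; a sketch (the final scaling/rounding the standard applies afterwards is omitted):

```python
# 4x4 Hadamard matrix: symmetric, so H4^T = H4, and it needs only
# additions and subtractions.
H4 = [[1,  1,  1,  1],
      [1,  1, -1, -1],
      [1, -1, -1,  1],
      [1, -1,  1, -1]]

def hadamard_dc(dc):
    """Unscaled 4x4 Hadamard transform of the 16 luma DC coefficients
    (one per 4x4 block) of an Intra_16x16 macroblock."""
    def matmul4(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]
    return matmul4(matmul4(H4, dc), H4)
```

For a smooth macroblock where all sixteen DC values are equal, the Hadamard transform concentrates everything into a single coefficient, which is exactly the de-correlation benefit described above.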

75
Video Coding Layer
  • 2x2 chroma DC coefficient transform and
    quantization: as with the intra luma DC coefficients,
    the extra transform helps to de-correlate the 2x2
    chroma DC coefficients and improves compression
    performance
  • The complete process
  • Encoding
  • Input: 4x4 residual samples
  • Forward core transform
  • (followed by a forward transform for chroma DC or
    Intra-16 luma DC coefficients)
  • Post-scaling and quantization
  • (modified for chroma DC or Intra-16 luma DC)

76
Video Coding Layer
  • Decoding
  • (inverse transform for chroma DC or Intra-16
    luma DC coefficients)
  • Re-scaling (incorporating inverse transform
    pre-scaling)
  • (modified for chroma DC or Intra-16 luma DC
    coefficients)
  • Inverse core transform
  • Post-scaling
  • Output: 4x4 residual samples

77
Video Coding Layer
  • Flow chart
  • An additional 2x2 transform is also applied to the DC
    coefficients of the four 4x4 blocks of each chroma
    component

78
Video Coding Layer
  • Entropy coding
  • The simpler method uses a single infinite-extent
    codeword table for all syntax elements except the
    residual data
  • The mapping to the codeword table is customized
    according to the data statistics
  • The chosen codeword table is an Exp-Golomb code with
    simple and regular decoding properties
  • In CAVLC, VLC tables for various syntax elements
    are switched depending on already-transmitted
    syntax elements
  • In CAVLC, the number of non-zero quantized
    coefficients and the actual sizes and positions of the
    coefficients are coded separately
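The Exp-Golomb code mentioned above maps a non-negative value v to M leading zeros followed by the (M+1)-bit binary form of v + 1, where M = floor(log2(v + 1)); a sketch of both directions:

```python
def ue_encode(v: int) -> str:
    """Unsigned Exp-Golomb codeword for v >= 0."""
    bits = bin(v + 1)[2:]               # binary of v+1 without '0b'
    return "0" * (len(bits) - 1) + bits # M zeros, then v+1 in binary

def ue_decode(code: str) -> int:
    """Inverse: count M leading zeros, read the next M+1 bits as v + 1."""
    m = code.index("1")
    return int(code[m:2 * m + 1], 2) - 1
```

The first few codewords are 1, 010, 011, 00100, 00101, ... — the regular structure is what makes decoding simple.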

79
Video Coding Layer
  • Entropy coding
  • VLC tables are designed to match the
    corresponding conditioned statistics
  • CAVLC encoding of a block of transform
    coefficients proceeds as follows
  • Encode the number of non-zero coefficients and
    trailing ones
  • The total number of non-zero coefficients
    (TotalCoeffs) and the number of trailing ±1
    values (T1s) are jointly coded as coeff_token
  • 0 ≤ TotalCoeffs ≤ 16, 0 ≤ T1s ≤ 3
  • There are 4 look-up tables for coeff_token (3 VLC
    and 1 FLC)
  • Encode the sign of each T1
  • Coded in reverse order, starting with the
    highest-frequency coefficient

80
Video Coding Layer
  • Entropy coding
  • Encode the levels of the remaining non-zero
    coefficients
  • Coded in reverse order
  • There are 7 VLC tables to choose from
  • The choice of table adapts depending on the magnitude
    of each coded level
  • Encode the total number of zeros before the last
    coefficient
  • TotalZeros is the sum of all zeros preceding the
    highest non-zero coefficient in the reordered array
  • Coded with a VLC
  • Encode each run of zeros
  • Encoded in reverse order
  • The VLC table is chosen depending on the number of
    zeros left (ZerosLeft) for each run_before
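The quantities the steps above encode can be derived from a block's coefficients in zig-zag order; a sketch (function name illustrative):

```python
def cavlc_stats(zigzag):
    """For one 4x4 block given in zig-zag order, return (TotalCoeffs,
    trailing +/-1 values scanned from the highest frequency down,
    TotalZeros)."""
    nonzero = [i for i, c in enumerate(zigzag) if c != 0]
    total_coeffs = len(nonzero)
    # zeros preceding the highest-frequency non-zero coefficient
    total_zeros = (nonzero[-1] + 1 - total_coeffs) if nonzero else 0
    trailing_ones = []
    for i in reversed(nonzero):          # at most three trailing +/-1s
        if abs(zigzag[i]) == 1 and len(trailing_ones) < 3:
            trailing_ones.append(zigzag[i])
        else:
            break
    return total_coeffs, trailing_ones, total_zeros
```

For the zig-zag sequence 0, 3, 0, 1, -1, -1, 0, 1 (rest zeros) this yields TotalCoeffs = 5, three trailing ones with signs +, -, -, and TotalZeros = 3.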

81
Video Coding Layer
  • Entropy coding
  • Example

82
Video Coding Layer
  • Entropy coding
  • Example

83
Video Coding Layer
  • Entropy coding
  • Example

84
Video Coding Layer
  • CABAC allows the assignment of a non-integer
    number of bits to each symbol of an alphabet
  • The use of adaptive codes permits adaptation to
    non-stationary symbol statistics
  • Statistics of already-coded syntax elements are
    used to estimate the conditional probabilities used
    for switching between several probability models
  • The arithmetic coding core engine and its associated
    probability estimation are specified as
    multiplication-free, low-complexity methods using
    only shifts and table look-ups

85
Video Coding Layer
  • Coding a data symbol involves the following
    stages (taking MVDx as an example)
  • Binarization
  • For |MVDx| < 9, it is carried out using the following
    table; larger values are binarized with an Exp-Golomb
    codeword
  • The first bit is bin 1, the second bit is bin 2, and
    so on

86
Video Coding Layer
  • Coding a data symbol involves the following
    stages (taking MVDx as an example)
  • Context model selection
  • Done using the following table
  • Arithmetic encoding
  • The selected context model supplies two probability
    estimates (for 1 and for 0) that determine the
    sub-range the arithmetic coder uses

87
Video Coding Layer
  • Coding a data symbol involves the following
    stages (taking MVDx as an example)
  • Probability update
  • If the value of bin 1 is 0, the frequency count of
    0 is incremented

88
Video Coding Layer
  • In-loop deblocking filter
  • Applied between the inverse transform and the
    reconstruction of the MB
  • A particular characteristic of block-based coding
    is the accidental production of visible block
    structures
  • Block edges are reconstructed with less accuracy
    than interior pixels, and blocking is among the most
    visible artifacts
  • The filter has two benefits
  • Block edges are smoothed
  • Resulting in smaller residuals after prediction
  • In the adaptive filter, the strength of filtering is
    controlled by several syntax elements

89
Video Coding Layer
  • In-loop deblocking filter
  • The basic idea is that if a relatively large
    absolute difference between samples near a block
    edge is measured, it is quite likely a blocking
    artifact and should be reduced
  • If the magnitude of the difference is so large that it
    cannot be explained by coarse quantization, it more
    likely reflects the actual behavior of the picture
  • Filtering is applied on a 4x4 block basis

90
Video Coding Layer
  • In-loop deblocking filter
  • Filtering is applied on a 4x4 block basis
  • The choice of filtering outcome depends on the
    boundary strength and the gradient of the image across
    the boundary

91
Video Coding Layer
  • In-loop de-blocking filter
  • The boundary strength Bs is chosen according to the
    following table
  • Filter implementation
  • Bs ∈ {1, 2, 3}: a 4-tap linear filter is applied
  • Bs = 4: 3-, 4-, or 5-tap linear filters may be used

92
Video Coding Layer
  • The figure below shows the principle using a
    one-dimensional edge
  • Whether samples p0 and q0, as well as p1 and q1, are
    filtered is determined using quantization-parameter
    (QP) dependent thresholds α(QP) and β(QP), where β(QP)
    is considerably smaller than α(QP)

93
Video Coding Layer
  • Filtering of p0 and q0 takes place only if all of the
    conditions below are satisfied
  • 1. |p0 − q0| < α(QP)
  • 2. |p1 − p0| < β(QP)
  • 3. |q1 − q0| < β(QP)
  • Filtering of p1 or q1 takes place if the corresponding
    condition below is satisfied
  • 1. |p2 − p0| < β(QP) (for p1)
  • 2. |q2 − q0| < β(QP) (for q1)
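The threshold tests above can be sketched directly; p2..q2 are the six samples across the edge and alpha/beta the QP-dependent thresholds (taken here as plain parameters rather than looked up from the standard's tables):

```python
def deblock_decisions(p2, p1, p0, q0, q1, q2, alpha, beta):
    """Per-line filtering decisions at a block edge.  Returns
    (filter_p0_q0, filter_p1, filter_q1): p0/q0 are filtered only if
    all three gradient tests pass; p1 (resp. q1) additionally needs
    the inner-gradient test on its own side."""
    edge = (abs(p0 - q0) < alpha and
            abs(p1 - p0) < beta and
            abs(q1 - q0) < beta)
    return (edge,
            edge and abs(p2 - p0) < beta,   # inner test, p side
            edge and abs(q2 - q0) < beta)   # inner test, q side
```

A small step across the edge with flat neighborhoods passes all tests (likely a blocking artifact), while a large step fails the alpha test and is left untouched as real picture content.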

94
Video Coding Layer
(Figures: coding results for Foreman.cif at 30 Hz and Foreman.qcif at 10 Hz)
95
Video Coding Layer
  • Hypothetical reference decoder (HRD)
  • For a standard, it is not sufficient to provide only
    a coding algorithm
  • It is important in real-time systems to specify how
    bits are fed to the decoder and how decoded
    pictures are removed from the decoder
  • Specifying input and output buffer models and
    developing an implementation-independent model of a
    receiver is called the HRD
  • It specifies the operation of two buffers
  • Coded picture buffer (CPB)
  • Decoded picture buffer (DPB)

96
Video Coding Layer
  • The CPB models the arrival and removal times of the
    coded bits
  • The HRD is more flexible in supporting the sending of
    video at a variety of bit rates without excessive
    delay
  • The HRD specifies DPB management to ensure that
    excessive memory capacity is not needed

97
Profile and potential application
  • Profiles
  • Three profiles are defined: the Baseline,
    Main, and Extended profiles
  • The Baseline profile supports all features except the
    following two sets
  • Set 1: B slices, weighted prediction, CABAC, field
    coding, and picture- or MB-adaptive switching
    between frame and field coding
  • Set 2: SP/SI slices and slice data partitioning
  • The Main profile supports the first set of the above,
    but not FMO, ASO, or redundant pictures
  • The Extended profile supports all features of the
    Baseline profile and both sets above, except for CABAC

98
Profile and potential application
  • Areas where profiles of the new standard may be used
  • A list of possible application areas is given
    below
  • Conversational services
  • H.320 conversational video services that utilize
    circuit-switched ISDN-based video conferencing
  • H.323 conversational services over the Internet with
    best-effort IP/RTP protocols
  • Entertainment video applications
  • Broadcast via satellite, cable, or DSL
  • DVD for standard
  • VOD (video on demand) via various channels

99
Profile and potential application
  • Streaming services
  • 3GPP streaming using IP/RTP for transport and
    RTSP for session setup
  • Streaming over the wired Internet using the IP/RTP
    protocol and RTSP for session setup
  • Other services
  • 3GPP multimedia messaging services
  • Video mail

100
Conclusion (III)
  • The VCL design of H.264/AVC is based on conventional
    block-based hybrid video coding concepts, but
    with some differences relative to prior standards,
    as listed below
  • Enhanced motion-prediction capability
  • Use of a small block-size exact-match transform
  • Adaptive in-loop de-blocking filter
  • Enhanced entropy coding methods