Image and Video Compression A presentation to Avocent


1
Image and Video Compression: A presentation to
Avocent

Noel O'Connor, Andrew Kinane, Daniel Larkin
19/09/2006

2
Overview

Lossless Compression
Entropy coding: a brief review
Huffman Coding
Arithmetic Coding
Lossless Compression Standards
The FAX Group Standards, JBIG, Lossless JPEG
Lossy Compression
Generic Codec Structure
DCT/IDCT
Quantization
Motion Estimation
Motion Compensation
Lossy Compression Standards
JPEG, JPEG2000, H.261 / H.263 / H.264,
MPEG-1/-2/-4
Image Analysis Techniques
Visual Feature Extraction

3
Lossless Compression: Entropy Coding
4
Entropy Coding

Also referred to as source coding
Assign each symbol a binary codeword
Allocate a specific string of bits to a symbol
Based on information theory
S = {s1, ..., sN} is the set of symbols to encode,
with probabilities {p1, ..., pN}
Entropy H(s) is a measure of the information
content
Specifies a lower bound on the average number of
bits per symbol

5
Huffman Coding

A form of Variable Length Coding
Assign shorter code-words to symbols most likely
to occur, longer to those less likely
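Problem: must choose code-words carefully!
Must obey the prefix condition so the decoder can
parse the bitstream

Example: the sequence s1, s4, s3, s2 is sent as the
bitstream 1 0 1 0 0 1 1 0 1
The decoder should recover s1, s4, s3, s2, but if
the prefix condition is violated it cannot tell,
after decoding s1, whether the next symbol is
s2 or s4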
6
Huffman Coding

Ensures instantaneously parseable code-words
100% efficient when p1 ... pN are negative
powers of 2 (0.5, 0.25, etc.)
Algorithm: generate the Huffman coding tree
Form the tree
Sort the symbols by their probabilities
Merge the two smallest probabilities by adding
them and produce a new node in the tree
Repeat until only a single node is reached
Assign bits
Traverse the tree from the root to the leaf nodes
assigning each branch encountered a one or zero.
Decoding based on storing codewords in specially
constructed LUT
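
The tree-building steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenters' implementation: the function names huffman_code and is_prefix_free are invented here, and bits are prepended while nodes are merged (bottom-up) instead of being assigned in a separate root-to-leaf traversal, which yields the same code-words.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {sym: "0" for sym in probabilities}
    while len(heap) > 1:
        # Merge the two least probable nodes into a new node.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def is_prefix_free(codes):
    """True if no code-word is a prefix of another code-word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


codes = huffman_code({"s1": 0.5, "s2": 0.25, "s3": 0.125, "s4": 0.125})
```

With probabilities that are negative powers of 2, as here, each code-word length equals -log2 of the symbol probability, which is the 100% efficient case mentioned above.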

7
Huffman Coding

Generate code-words for each grey level
S = {s1, s2, s3, s4, s5} = {0, 4, 5, 6, 7}
{p1, p2, p3, p4, p5} = {0.125, 0.484, 0.25, 0.125, 0.016}

8
Huffman Coding

Generate code-words for each grey level
S = {s1, s2, s3, s4, s5} = {0, 4, 5, 6, 7}
{p1, p2, p3, p4, p5} = {0.125, 0.484, 0.25, 0.125, 0.016}

9
Huffman Coding

Efficiency
Calculate the average coding rate R
R = sum over i of symbol probability (pi) x
code-word length (li)
Compare to entropy H(s)
Efficiency = H(s) / R
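
As a worked example of the efficiency calculation, the sketch below uses the grey-level probabilities from the earlier Huffman example. The code-word lengths are an assumption: they are the lengths a Huffman tree gives for these probabilities (the two symbols with probability 0.125 could swap lengths 3 and 4 without changing the result).

```python
import math

# Probabilities from the grey-level example.
probs = {"s1": 0.125, "s2": 0.484, "s3": 0.25, "s4": 0.125, "s5": 0.016}
# Assumed Huffman code-word lengths for those probabilities.
lengths = {"s1": 3, "s2": 1, "s3": 2, "s4": 4, "s5": 4}

# Entropy H(s) = -sum(pi * log2(pi)), in bits/symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())
# Average coding rate R = sum(pi * li), in bits/symbol.
rate = sum(probs[s] * lengths[s] for s in probs)
efficiency = entropy / rate
```

Under these assumptions the rate is 1.923 bits/symbol against an entropy of roughly 1.85 bits/symbol, so the code is about 96% efficient.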
10
Huffman Coding

Problems
Lower bound of 1 bit/symbol
Does not facilitate adaptive coding
Example

11
Arithmetic Coding

Treat groups of symbols but maintain a
symbol-by-symbol encoding mechanism
Assign a single codeword to a group of symbols
Codeword represents a half-open interval in [0.0,
1.0)
By assigning enough precision bits, one interval
can be distinguished from another
Symbols with higher probabilities correspond to
larger intervals, thereby requiring fewer
precision bits

12
Arithmetic Coding

S = {a, b}, {p1, p2} = {1/3, 2/3}
First symbol narrows the interval to that symbol's
range
Subsequent symbols further restrict the current
interval.
Decoding reverses this
Receives a number in [0.0, 1.0)
Checks which symbol's range contains this number
=> decode symbol
Since lower & upper bounds of the symbol are known,
their effects on the encoded number can be
reversed
Gives a new number
REPEAT
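
The interval narrowing and its reversal can be sketched with exact fractions for the two-symbol source S = {a, b} with probabilities 1/3 and 2/3. This is an illustrative sketch only: the names MODEL, encode and decode are invented here, the assignment of a to [0, 1/3) and b to [1/3, 1) is an assumption, and the message length is passed to the decoder explicitly in place of an end-of-message symbol.

```python
from fractions import Fraction

# Assumed model: a -> [0, 1/3), b -> [1/3, 1).
MODEL = {"a": (Fraction(0), Fraction(1, 3)),
         "b": (Fraction(1, 3), Fraction(1))}


def encode(message, model=MODEL):
    """Narrow [0, 1) once per symbol; return the final (low, high)."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = model[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(value, length, model=MODEL):
    """Recover `length` symbols from any number inside the final interval."""
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in model.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Undo this symbol's effect to obtain a new number.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)


low, high = encode("bab")
```

Any number in [low, high) decodes to the same message, which is why the encoder only needs enough precision bits to single out the interval.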

13
Arithmetic Coding

Incremental transmission
Example message: BILL GATES

Output digits as they become available
2
25
257
2572
257216
2572167
14
Arithmetic Coding

Can be performed very efficiently using 16/32 bit
integer mathematics
Bits are transmitted as they become available
Simplification: use the value 0.999... rather than
1.0
In binary arithmetic this corresponds to 0.111...
Only use fractional part => only need integers
High initially stores 0xFFFF, whilst Low stores
0x0000
For each symbol encoded, examine most significant
bit of both High and Low
If these bits are the same, output bit

15
Lossless Compression: Standards
16
ITU-T Facsimile

ITU-T Rec. T4 (Group 3)
Targets scanned business documents
Binary images: white (1), black (0)
Two modes
Modified Huffman (MH)
Run-length encoding is used to form runs of 1s
and 0s for each line in the image
Huffman coding applied to these (run,symbol)
pairs
Different Huffman codes for runs of 1s and 0s
A special end-of-line (EOL) symbol is encoded for
error detection purposes.
Modified Read (MR)
Pixel values from the previous line used as
predictors for current pixels to be encoded
Prediction residual is then encoded using Huffman
coding.
MR mode is periodically interspersed with MH
mode.
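
The run-length step of MH mode can be illustrated as follows. This is a generic run-length sketch with invented function names; it does not reproduce the actual T.4 code tables, line-start conventions or EOL handling.

```python
from itertools import groupby


def run_lengths(line):
    """Turn one scan line of 0/1 pixels into (symbol, run) pairs."""
    return [(value, len(list(group))) for value, group in groupby(line)]


def expand(runs):
    """Inverse of run_lengths: rebuild the scan line."""
    return [value for value, length in runs for _ in range(length)]


line = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
runs = run_lengths(line)
```

In MH mode each of these pairs would then be mapped to a Huffman code-word, with separate tables for white and black runs.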

17
JBIG

Joint Bi-level Image Experts Group (JBIG) developed
jointly by ITU-T and ISO
Targets bi-level images
may be either business documents or grey-scale
images of natural scenes rendered as bi-level
images.
Uses adaptive arithmetic encoding
Modeling step estimates probability of next
symbol based on a context consisting of local
pixels
Probability is then used to drive the arithmetic
encoder
JBIG can be applied to grey-scale images by
treating each bit-plane of the grey-level image
as a bi-level image.

18
Lossless JPEG

Joint Photographic Experts Group (JPEG) has a
lossless image compression mode.
Prediction for pixel to be encoded based on a
context of previously encoded pixels
Different ways for forming the prediction
Method used encoded as side-information for each
scan line.
To encode the prediction residual
(length, magnitude) pair formed
length indicates the number of bits used to
encode the magnitude
A static Huffman code is used.
magnitude is the actual residual value directly
encoded.

19
Lossless JPEG

p = 190
p1 = 184, p2 = 176
P = 180
R = 180 - 190 = -10
Encoded as the event (4, 0101)
Negative residuals encoded as 1s complement
Huffman code for 4 is 001, so this gives the
final codeword 0010101
Decoder
Calculates the prediction value (180)
Parses the Huffman code, which allows decoding of
the magnitude (0101)
Detects a leading zero => knows the value must be
negative, so next four bits decoded as -10.
Reconstruction: p = P - R = 180 - (-10) = 190
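
The (length, magnitude) mapping in this example can be sketched as below. The function names are invented for illustration, and the Huffman coding of the length field is left out; only the 1s complement handling of the magnitude bits is shown.

```python
def encode_residual(r):
    """Return (length, magnitude bits) for a prediction residual."""
    if r == 0:
        return 0, ""
    length = abs(r).bit_length()
    mask = (1 << length) - 1
    # Positive values are sent as-is; negative values as the
    # 1s complement of their absolute value (leading bit is 0).
    value = r if r > 0 else (~(-r)) & mask
    return length, format(value, "0{}b".format(length))


def decode_residual(length, bits):
    """Inverse mapping: a leading zero marks a negative residual."""
    if length == 0:
        return 0
    value = int(bits, 2)
    if bits[0] == "1":
        return value
    return -((~value) & ((1 << length) - 1))


length, bits = encode_residual(180 - 190)
```

For the residual -10 this gives length 4 and magnitude bits 0101, matching the event in the example; the decoder then reconstructs p = P - R = 190.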

20
Lossy Compression: Generic structure of a video
codec
21
Redundancy in Video Sequences

Video compression targets 3 kinds of redundancy
Spatial: the correlation that exists between
(groups of) pixels
Temporal: similarity between video frames
Perceptual: Human Visual System (HVS) is less
sensitive to high-frequency information.
Lossy compression throws information away as part
of these processes
Remaining information is encoded losslessly using
entropy coding

22
Redundancy in Video Sequences

Spatial redundancy
Transform data to be encoded into a new
representation where data is less correlated
Leads to a more compact representation.
Temporal redundancy
Only encode difference between 2 video frames
(lower entropy)
Form prediction of frame to be encoded and encode
prediction residual
Perceptual redundancy
Suppress/remove high frequency components
corresponding to fine image detail.

23
Coding Modes

INTRA
Encode a frame completely independently (i.e.
with no reference to previous/future frames)
Forms random access point in bitstream, resets
encoding, limits error propagation
Equivalent to having a JPEG-encoded still image
at periodic intervals in bitstream.

Frame 0
24
Coding Modes

INTER
Use a previous/future frame (termed reference
frame) as the basis for a prediction of the
current frame
Could just simply subtract reference frame from
current frame
Or use a more sophisticated prediction method
Need to use reconstructed frame as basis for
prediction so that encoder/decoder stay
synchronised.

Frame 0
25
Coding Unit

Break image/frame up into 16 x 16 macro-blocks
For YUV
4 8x8 luminance pixel blocks
2 8x8 chrominance pixel blocks.
Coding decisions made on macro-block basis
INTRA/INTER coding mode
prediction method if INTER
Loss introduced.
Decisions flagged in bitstream syntax.

26
Generic Codec Structure
27
Discrete Cosine Transform (DCT)

Why DCT?
What is it?
How does it work?
How is it computed (in reality)?
Adoption and variations
What about the DWT?
Quantisation

28
Why DCT?

Neighbouring pixels are likely to be similar
The same is true for prediction residual data
Want to exploit this spatial correlation
We want a transform that
Removes correlation from data
Packs signal energy into as few coefficients as
possible
Coefficients suitable for entropy coding

29
Why DCT?

Optimal solution
Use eigenvectors of the covariance matrix of the
input pixel data
Order based on size of eigenvalue
Based on theory of principal component analysis
(PCA)
Referred to as the Karhunen-Loeve Transform (KLT)
[rao90]
Achieves complete de-correlation
Packs most energy into fewest coefficients
Minimises MSE for a given number of coefficients
(Quantisation)
Minimises the entropy
Disadvantages
Very computationally demanding
Transform kernel is data dependent
Kernel must be sent to decoder also!
Not practical in a real compression system
Compromise => the DCT

30
What is the DCT?

Treat frame as a grid of 8x8 pixel blocks
Pixel data (intra block)
Prediction Residual (inter block)
Compute 8x8 2D DCT on each block
Formula
Basis functions derived
using Fourier theory

31
What is the DCT?

Fourier's theorem and the Nyquist sampling
criterion mean only certain discrete frequencies
can be present in an 8x8 block of sampled data.
DCT coefficients tell us how much of a
particular frequency is present in a particular
block
Very crude explanation!
Inverse DCT (IDCT) reverses this process
Essentially Fourier synthesis

32
How does the DCT work?

DCT does not compress anything in isolation!
This is achieved by quantiser and entropy coding
DCT output easier to compress though
Most natural video dominated by low frequencies

33
How does the DCT work?

Human eye less sensitive to high frequencies
Use a quantiser whose step size depends on
frequency
Effectively discard perceptually unimportant data
After quantisation there will be many zero valued
coeffs
Typically only 5 or 6 non-zero valued coeffs
[xanthopoulos99]
Suitable for run length and entropy coding

34
How does the DCT work?

Zig-zag scan
Keep statistically related coeffs together
Better run-length coding
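
A sketch of the zig-zag scan order for an n x n block follows. The function names are invented for illustration; the order generated is the conventional one that starts at the DC coefficient and walks the anti-diagonals in alternating directions.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    def key(rc):
        r, c = rc
        s = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (row ascending),
        # even diagonals run the other way (row descending).
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Quantised block with only three low-frequency coefficients left.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 50, -3, 2
scanned = zigzag_scan(block)
```

The non-zero coefficients end up at the front of the scanned list, followed by one long run of zeros, which is what makes the subsequent run-length coding effective.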

35
How is the DCT Computed?

Most implementations exploit the fact that the 2D
DCT is separable
Compute 1D DCT on each column
Compute 1D DCT on each resultant row
16 x 1D 8-point DCTs in total
Need efficient implementation of 1D 8-point DCT
30 years of research in this field
Basic implementation (64 multiplications, 56
additions)
Fast implementation [loeffler89] (11
multiplications, 29 additions)
Video codec optimised implementation AAN
[arai89] (5 multiplications, 29 additions)
Arithmetic precision a vital decision
If constraint is 1920x1080 @ 30Hz
97200 8x8 blocks per second
Need at least (17x10^6 multiplications + 45x10^6
additions) per second using Loeffler!
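
The separable computation can be sketched as follows, using the direct (basic) 1D 8-point DCT with orthonormal scaling and no fast algorithm. The function names are invented here; a real codec would use a fast factorisation such as those cited above and fixed-point arithmetic.

```python
import math

N = 8


def dct_1d(x):
    """Direct 8-point DCT-II with orthonormal scaling."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def dct_2d(block):
    """2D DCT via 8 column transforms followed by 8 row transforms."""
    # cols[c][k] is the k-th coefficient of column c.
    cols = [dct_1d([block[r][c] for r in range(N)]) for c in range(N)]
    # Regroup so that each row holds one vertical frequency k.
    rows = [[cols[c][k] for c in range(N)] for k in range(N)]
    return [dct_1d(row) for row in rows]


flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block all the signal energy lands in the DC coefficient (800 with this scaling) and every AC coefficient is numerically zero, a simple instance of the energy compaction the DCT is used for.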

36
How is the DCT Computed?

Sometimes dedicated hardware needed
Performance and/or power reasons
Hardware architecture taxonomy

37
Adoption and Variations

8x8 DCT
Used in JPEG, H.261, H.263, MPEG-1, MPEG-2,
MPEG-4 with specific quality requirements
Shape Adaptive DCT
Used in MPEG-4 Advanced Coding Efficiency (ACE)
profile
Kernel basis functions determined by object shape
Integer DCT Approximation
Used in H.264
Block size of 4x4 and 8x8 depending on mode
Avoids the IDCT mismatch problem
Less computationally demanding (16bit integer
arith)
More features (can discuss later if necessary)

38
What about the DWT

Discrete Wavelet Transform (DWT)
Used by JPEG-2000
MPEG-4 uses SA-DWT (for static shape textures)
Why? => Better than Fourier analysis for
non-stationary data
Inherently scalable
Involves successive LPF and HPF of data and
subsampling
More efficient at very low bit rates
DCT and coarse Q => Blocking artefacts
DWT and coarse Q => Blurring/smearing (much less
perceptible)
More computationally demanding than DCT

39
What is Quantisation?

A lossy process
Get rid of information
Gives compression gain
Try to minimise distortion
Try to reduce entropy
Two primary types
Scalar quantiser (one to one)
Vector quantiser (many to one)

40
Scalar Quantiser

Need to find optimal values for
Decision levels di
Reconstruction levels ri
Difficult in general!

41
Scalar Quantiser

Aim to minimise distortion
Minimise MSE => Lloyd-Max quantiser
A good quantiser design depends on probability
distribution of the input data
Want less error for more probable inputs
Case 1: Uniform distribution
Decision bands all same width
Reconstruction levels equally spaced
Referred to as a linear quantiser
Used frequently for simplicity

42
Scalar Quantiser

Case 2: Piecewise constant distribution
Used when the number of decision levels N is large
Decision level solution difficult (Use numerical
methods for Lagrange multipliers)
Reconstruction levels

43
Scalar Quantiser

Case 3: Nonuniform distribution
Need numerical methods for di and ri
Tables available for standard distributions
(Gaussian, Laplacian, Rayleigh, ...) for popular N
This is a true Lloyd-Max quantiser (or optimum
mean square quantiser)
Case 4: Uniform quantiser
Uniform refers to equal spacing between decision
levels regardless of distribution
Similar structure to Case 1 but different
performance because distribution not uniform
Commonly used (e.g. in JPEG, ...)

44
Scalar Quantiser Performance

MSE correlates well with subjective degradation
Don't rely on MSE minimisation in isolation
though
Need to consider overall rate-distortion
Measures MSE as a function of number of bits n
Constants a and b depend on distribution
When designing a quantiser for each DCT
coefficient i need to know ni
64 quantisers
How to determine ni (number of bits per
coefficient)?
Depends on variance of coefficient i relative to
others and specified average bitrate nav
Bit allocation algorithm paradigm

45
Bit allocation algorithms

Try to keep constant
As variance increases, distortion decreases by
using more bits
Optimal allocation for N coefficients
Often a rate controller after entropy encoder
with feedback path to quantiser

46
Scalar Quantiser Summary

Uniform quantiser most commonly used
In fact, rather than transmitting a quantised
coefficient, usually transmit the quantisation
index
This has much lower entropy
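
A minimal sketch of a uniform scalar quantiser that transmits the quantisation index, as described above. The function names and the step size of 8 are illustrative assumptions; real codecs typically use a different step size per DCT coefficient.

```python
import math


def quantise(x, step):
    """Map a coefficient to the index of the nearest reconstruction level."""
    return math.floor(x / step + 0.5)


def reconstruct(index, step):
    """Decoder side: map the transmitted index back to a coefficient."""
    return index * step


step = 8
index = quantise(37, step)
value = reconstruct(index, step)
```

The coefficient 37 is sent as index 5 and reconstructed as 40. The error never exceeds half a step, and the small integer indices have far fewer distinct values than the original coefficients, which is what lowers the entropy.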

47
Vector Quantiser

Quantise blocks of samples together
Each block assigned a single code
A code book used to find code for block
Code book can be dynamic or pre-defined
Each pattern has specific encoding
Can give very good performance
Quite computationally expensive
Difficult to design tables
Used by GIF standard

48
Demo

Compression gain vs. perceptual quality

49
Motion Estimation & Compensation

Exploiting temporal redundancy
Motion Estimation
Block matching algorithm overview
Matching Criteria
Selection of Search Strategies
More advanced motion estimation techniques
Software / Hardware Considerations
Motion Compensation
Adoption in standards discussed later

50
Exploiting Temporal Redundancy

Very slight change between successive frames (e.g.
A & B)
Camera & object motion
Temporal prediction model at encoder & decoder
provides compression if
(model parameters + correction terms) < raw pixel
information

e.g. Frame differencing (C)
Entropy
B: 7.15 bits/pixel
C: 4.38 bits/pixel
More complex models can reduce entropy further
Computational expense, memory and prediction
performance trade off
Temporal Prediction model
Motion estimation
Motion compensation

51
Taxonomy of Motion Estimation Algorithms

Good Motion Estimation reviews
[Mitchell96] [Furht97] [Kuhn99]

52
Block Matching Algorithm

For each MxN block in the current frame, find the
associated best matching block within a
predetermined or adaptive ±S pel search range in
one or more reference frames
Estimates motion of a group of pixels
Assumes translational motion only
Typically operates on luminance component only
Good trade off between computational complexity
& prediction accuracy
Motion vector (relative offsets to the best
match) undergoes VLC
Prediction Residual undergoes further processing
(DCT, VLC, etc)

53
Matching Criteria

At each MxN block search position a matching
criterion is evaluated
Wide variety of matching criteria
Mean Squared Error
Mean Absolute Differences
Sum of Absolute Differences
Reduced complexity matching criteria
Binary Block Match
Others
Cross correlation
SAD summation truncation
SAD estimation
Reduced Bit Mean Absolute Difference
Minimised Maximum Error function
Etc
Choice of matching criterion is a
complexity/prediction performance trade off

54
Search Strategies (1/4)

Many possible search strategies!
Full Search: search every position
Best results, but very computationally expensive
Operations required to generate 1 MV for 1
current block
(2S+1)^2 block matches
For each pixel in an M x N block match: subtract,
absolute, accumulate
After each block match, minimum SAD comparison
Therefore total operations
(2S+1)^2 x (M x N x 3 + 1), e.g. S = 8: 289 x (M x N
x 3 + 1)
Reduce computational expense
Logarithmic: reduces number of search positions
Assumes matching criterion monotonically increases
moving away from minimum point; iteratively
converge to minimum point
Possibility of getting stuck in local minimum
Yields higher energy prediction residual
Pseudocode for the Three Step Search
1. R = 2^(log2(S) - 1)
2. Search positions within the search window
defined using R
3. R = R/2
4. If R < 1 finished, else go to 2.
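
The full search and the Three Step Search pseudocode above can be sketched with SAD as the matching criterion. This is an illustrative sketch under stated assumptions: frames are lists of rows of luminance values, a motion vector is the (dx, dy) offset into the reference frame, the function names are invented here, and the caller must keep the search window inside the frame because no bounds checking is done.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the n x n block at (bx, by)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=16, s=8):
    """Evaluate all (2s+1)^2 positions; return (mv, sad, matches)."""
    best, matches = None, 0
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            matches += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0], matches


def three_step_search(cur, ref, bx, by, n=16, s=8):
    """Logarithmic search: R = s/2, then halve R until it drops below 1."""
    cx, cy = 0, 0
    best = sad(cur, ref, bx, by, 0, 0, n)
    matches = 1
    r = s // 2
    while r >= 1:
        nx, ny = cx, cy
        for dy in (-r, 0, r):
            for dx in (-r, 0, r):
                if dx == 0 and dy == 0:
                    continue  # the centre was already evaluated
                cost = sad(cur, ref, bx, by, cx + dx, cy + dy, n)
                matches += 1
                if cost < best:
                    best, nx, ny = cost, cx + dx, cy + dy
        cx, cy = nx, ny
        r //= 2
    return (cx, cy), best, matches
```

For S = 8 the full search performs 289 block matches, whereas the Three Step Search performs 9 + 8 + 8 = 25. The price is that the logarithmic search can settle in a local minimum when the SAD surface is not monotonic around the true match.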

55
Search Strategies (2/4)

Logarithmic searches contd.
Three Step Search [Koga81]
S = 8, initial R = 4
Search positions defined using R
(x-R,y-R), (x,y-R), (x+R,y-R), ..., (x,y), ..., (x+R,y+R)
Operations required to generate 1 MV
(9 + 8 + 8) x (M x N x 3 + 1)
Variants
2-D logarithmic [Jain81], Parallel 1-D [Chen91],
CDS [Rao83], N3SS [Li94], 4SS [Po96]

Hierarchical Search Strategies
Search fewer positions & use fewer pixels in the
matching criteria
Achieved via sub-sampling current & reference
frames
Disadvantage increased memory
Best match in lower resolution seeds search for
subsequent resolutions
Can help to avoid local minima due to low pass
filtering effect
Local minima still possible for small regions
which disappear during sub-sampling

56
Search Strategies (3/4)

3 Level Hierarchical Search Example
Level 1: Original
Sub-sampled by factor of 2 generating level 2
Level 1 sub-sampled by 4 generating level 3
Motion Estimation starts at level 3
block size N/4 x M/4
Search window ±S/4
FS or TSS employed within this window
Produces motion vector (Vx3, Vy3)
Motion Estimation level 2
block size N/2 x M/2
Centered on (x/2 + 2Vx3, y/2 + 2Vy3)
Search window ±1 around this point
Produces motion vector (Vx2, Vy2)
Motion Estimation level 1
Centered on (x + 2Vx2, y + 2Vy2)
Search window ±1 around this point
Produces final motion vector (Vx1, Vy1)

Operations required to generate 1 MV using a FS
at level 3
(2(S/4)+1)^2 x (M/4 x N/4 x 3 + 1) + 9 x (M/2 x N/2
x 3 + 1) + 9 x (M x N x 3 + 1)

57
Search Strategies (4/4)

Scene adaptive search area
Zone based search strategies
Can employ stopping threshold in each zone
Advantageous in a rate/distortion sense
[chan95] [Jung96] [Zhe97]
Spiral Search
Dynamic search window size
Many techniques used to adjust range
Spatial correlation of MV [Chain95] [In97]
Gradient based methods
Block based gradient descent search [Liu96]
Stops after 4 steps
Diamond search [Cote97]
Early stopping technique
Skip to next block match when the minimum SAD has
been exceeded
Successive elimination algorithm [Li95]
Conservative block SAD [Do98]

58
Different Search Strategy Performance

Frame Differencing
0 Motion Vector
Entropy 4.38 bits/pixel
1 operation/pixel (subtraction)
Full Search
Block size 16x16
Search range ±8
Entropy 2.61 bits/pixel
868 operations/pixel
Hierarchical Search
Block size 4x4, 8x8, 16x16
Search window ±2, ±4, ±8
Entropy 3.08 bits/pixel
39 operations/pixel
Hierarchical Search
Block size 4x4, 16x16, 32x32
Search window ±2, ±4, ±8
Entropy 2.91 bits/pixel
35 operations/pixel

59
More advanced techniques (1/2)

Bi-directional (Forward and Reverse) Prediction
Termed B-frames
Not feasible for real-time systems
Multiple Reference Frames
Improves prediction
Increases computational expense & memory
requirements
Unrestricted Motion Vectors
Allow block matches outside the reference frame
Pixel padding used to extend beyond frame
boundaries
Predictive Motion Vectors
Rather than start at collocated block use a MV
predictor
Temporal and/or Spatial prediction
[Lee97] [Kos97] [Zheng97]
Can improve prediction residual quality
Can employ thresholds to gate-off motion
estimation
H/W: reduces pixel reusability between current
block positions

Global Motion Compensation
Default motion for the frame/object

60
More advanced techniques (2/2)

Sub-pel Motion Estimation
Real motion is not constrained by integer pixel
amounts
Half-pel & quarter-pel frequently used
But memory increases
H.264
6-tap FIR filter for ½ pel
Bilinear for ¼ pel

Variable Block Size Motion
Smaller block size will lead to smaller residual
But number of motion vectors & signalling info
increases
41 MVs per 16x16 block in H.264
MPEG-4 & H.263 Advanced Prediction Motion
Estimation (4MV)
H.264
Dynamically adapts between multiple block sizes
(16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
Rate/Distortion Optimised
Motion Vector Coding Prediction
Adding MVs to bitstream can be costly,
particularly if block size is small
DPCM used to exploit spatial MV redundancies

61
ME Software/Hardware considerations

Software algorithmic complexity (simplified
analysis)
To support 1920x1280 at 30 frames/s: 9600 x 30 =
288K 16x16 blocks/sec
±8 search window => 289 block matches per current
block
Total block matches: 289 x 288K = 83,232,000
matches/sec
Operations: 83,232,000 x (256 pixels x 3 + 1) ≈ 64
GOPS
Hardware implementations can be attractive
Systolic Array (1D/2D) approaches typically
employed
Memory bandwidth efficient high throughput
Full Search commonly used
Architectures also available for heuristic search
strategies
Architectures for H.264 Variable Block Size
emerging
Ball park figures for H.264 VBSME core
1-D 16 PE SA
Area 40-60K gates, memory bandwidth 3 pixels
per clock cycle
1 16x16 block match every 4096 clock cycles (±8
search range)
2-D 256 PE SA
Area 100-200K gates, memory bandwidth 48
pixels per clock cycle
1 16x16 block match every 256 clock cycles (±8
search range)
To support 1920x1280 at 30 frames/s: 9600 x 30 =
288K 16x16 blocks/sec
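
The simplified software complexity estimate for full search can be reproduced with a few lines of arithmetic. The cost model is the one used on the slides (3 operations per pixel plus 1 comparison per block match); the variable names are illustrative.

```python
width, height, fps = 1920, 1280, 30
block = 16   # macro-block size in pixels
search = 8   # search range of +/- 8 pels

blocks_per_frame = (width // block) * (height // block)
blocks_per_sec = blocks_per_frame * fps
positions = (2 * search + 1) ** 2            # block matches per block
matches_per_sec = positions * blocks_per_sec
ops_per_match = block * block * 3 + 1        # subtract, absolute, accumulate + compare
ops_per_sec = matches_per_sec * ops_per_match
```

Under this model the total comes to about 64 x 10^9 operations per second, which is why dedicated hardware or a reduced-complexity search strategy is attractive for motion estimation.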

62
Motion Compensation

Straightforward relative to motion estimation
Reconstructed MB = Residual + Mot. Comp. MB
(pointed to by MVs)
Copy block of pixels from displaced block in the
reference frame into the current frame
Reference frame must be stored in decoder
For encoder and decoder to remain synchronised
Encoder also needs to do motion compensation
Considerations
Additional frame memory at the decoder
Low computational requirements

63
Lossy Compression: Standards
64
Standards Evolution
65
JPEG

Flexible image coding standard
4 Modes of operation
Lossless encoding (earlier)
Baseline sequential encoding
Progressive encoding
Hierarchical encoding (towards JPEG-2000)
Motion JPEG
Baseline encoding of each frame
No motion estimation
Not properly standardised

66
JPEG-2000

JPEG not optimised for a wide range of apps
JPEG-2000 even more flexible
Interesting features
Uses DWT instead of DCT
Region of Interest (ROI) coding
Scalability
Spatial scalability
SNR scalability
More resilient to channel errors
Individual quality packets independently decoded
Also supports lossless coding
Added flexibility comes at computational cost

67
JPEG/JPEG-2000 Summary

JPEG capable of average compression of 15:1 for
subjectively transparent quality
JPEG-2000 better compression @ fixed rate
For Foreman
Gain of 1.5 to 4 dB for range of 1.2 to 0.12 bpp
Applications
Internet
Digital photography
Many more

68
ITU-T H.261

ITU-T: narrow bandwidth real-time apps
H.261: (p x 64) kb/s over ISDN (1 <= p <= 30)
CIF and QCIF resolution
Real time video telephony/conferencing
Up to 3 frames interpolated by decoder
Supports framerates of 30Hz, 15Hz, 10Hz, 7.5Hz
Video compression tools
8x8 DCT
Uniform scalar quantiser (rate control optional)
Entropy coder is modified run length and Huffman
Motion Estimation
Only forward direction
Search window limited to ±15
Integer pixel accuracy only
Motion Compensation is optional
Loop filter (alleviate blocking)

69
ISO/IEC MPEG-1

Storage of AV content for delivery at 1.5Mb/s
Flexible
Resolutions typically 768x586
Framerate typically 30Hz
H.261 was starting point for the standard
Compression gain at expense of latency
Specific features
Standard VLCs determined by Huffman coding
DCT DC coeffs are differentially predicted
Bi-directional prediction (I,P,B frames)
Motion compensation with half-pixel accuracy
Maximum MV range of (-512,511.5) for half pixel
and (-1024,1023) for integer pixel
Weighted quantisation (H.261 does not have this)
Random access to bitstream, FF, FR

70
ISO/IEC MPEG-1
71
ISO/IEC MPEG-2

High quality video @ 4-15Mb/s
VOD, Broadcast TV, DVD, HDTV, Satellite TV
Major differences w.r.t. MPEG-1
More resolutions, framerates, qualities and
bitrates
SIF (352x288 @ 25Hz) to HDTV (1920x1250 @ 60Hz)
Profiles and levels
Has interlaced/progressive option
Frame/Field based ME, MC and DCT
Scalability (temporal, spatial, SNR)
Minor differences
More bits for quantisation
Alternate scan (as well as zigzag)

72
ITU-T H.263

Very low bitrate apps (< 64kb/s)
Video telephony over PSTN, mobile telephony
Recommended resolutions subQCIF, QCIF, CIF,
4CIF, 16CIF
Non-interlaced @ 29.97Hz
Similar to H.261
Extensions (Some optional in Annex but included
in H.264)
MVs differentially encoded
Half-pixel accurate motion estimation
Extensions support quarter and one eighth
Unrestricted motion vector mode
MVs can point outside image, edge pixels form
prediction
Advanced prediction mode
MB can have 4 MVs associated with it
Syntax-based arithmetic encoding (SAC)
Optional mode to replace VLCs with arithmetic
encoding
PB frames
Error resilience
Synchronisation markers
Reversible VLCs

73
ISO/IEC MPEG-4

An all encompassing standard!
Improved compression at 5kb/s to 1Gb/s
Resolutions of sub-QCIF to studio
Content-based interactivity (semantic objects)
Universal access (scalability, error resilience)
Synthetic and natural hybrid coding (SNHC)

74
ISO/IEC MPEG-4
75
ISO/IEC MPEG-4

Video coding tools
Integer, half and quarter pixel ME
Boundary MB ME padding or polygon matching
Global ME
Shape Adaptive DCT
AC/DC intra prediction
Enhanced scalability FGS
Still texture coding (uses SA-DWT)
Shape Coding tools
Context-based arithmetic encoding (CAE)
Compute context
Index into LUT for probability of 0,1
Drive arithmetic encoder

76
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)

Targets enhanced compression for wide range of
apps
Improved prediction
Variable block-size MC with small block sizes
Up to quarter-pixel MC
Unrestricted motion vector mode
Multiple reference picture MC
Weighted prediction (generalised B-pictures)
Directional intra prediction (9 4x4 modes, 4
16x16 modes)
In the loop adaptive deblocking filter
Improved coding efficiency tools
Small block size transform
Hierarchical block transform
Short word length transform (16 bit integer
arith)
Exact match inverse transform
CAVLC, CABAC
Enhanced error robustness and network
friendliness

77
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
78
ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)

H.264 Version 1 has 3 profiles
Baseline
Main
Extended
Fidelity Range Extension (FRExt) Amendment
High Profile
High 10 Profile
High 4:2:2 Profile
High 4:4:4 Profile
Up to 12 bits per sample
Supports lossless region coding
Codes RGB to avoid colour space transformation
error

79
Comparing Standards

Video conferencing applications
Low latency real-time requirement
H.264/AVC MP would improve by a further 10-20%
Using low delay bi-prediction, CABAC

80
Comparing Standards

Video streaming applications
Less of delay constraint

81
Comparing Standards

Entertainment-quality applications
High resolution, delay tolerable

82
Comparing Standards

Professional motion picture production
Random access to individual frames
Up to HDTV, H.264/AVC MP comparable or better
than Motion-JPEG2000

83
Comparing Standards

PSNR while good does not take into account
intricacies of the human eye
Need subjective video tests
Other metrics
MPQM,
Experiments show that H.264 gives lowest bitrate
for subjectively equivalent video over a range of
apps
Improved performance comes at the cost of
computational complexity
Main bottleneck is ME (very memory intensive)

84
Image Analysis: Visual Feature Extraction
85
Visual Features - Still Images

What features are important?
Colour
Texture
The feel, appearance, consistency of a surface
In an image
Distribution over the entire image?
Of specific parts of the image?

No texture
Highly textured
86
Visual Features - Colour

Colour is visually important to humans
Colour features and similarity metrics easy to
compute
Histogram [Swain and Ballard, 1992]
Most commonly used structure to represent global
image features.
Invariant to translation and rotation and can be
made invariant to scale by normalisation
MPEG-7 Scalable Colour Description
H(16 levels) S(4 levels) V(4 Levels) histogram
encoded with a Haar transform for efficiency
scaling

87
Visual Features - Texture

Simple texture descriptors [Pratt, 1991]
Autocorrelation function
Co-occurrence matrices
Edge frequency
Primitive length
More sophisticated (based on transforms and/or
filtering)
Wavelet [Mallat, 1990], Haar [Theodoridis, 1999],
Gabor [Bovik, 1990]
Others
Mathematical morphology
Fractals

88
Visual Features - Texture

Example MPEG-7 Edge Histogram
Represents the global (and possibly local [Won,
2002]) spatial distribution of edges
Need to first generate edge map
Roberts, Sobel and Prewitt, Canny, ...
Build histogram based on 5 edge types

89
Change Detection

Compare 2 temporally adjacent images and
determine how different they are
Why?
Surveillance-type applications
Assume static camera & background
Anything changing between one image and the next
must be an object!
In fact, this is naïve but starting point of many
object segmentation techniques
Temporal video structuring
Breaking video up into chunks for non-linear
browsing: shots, scenes, events, story-lines

90
Temporal Video Structuring

Shot boundary detection

a video document
A set of keyframes
Keyframe-based video browsers
92
Temporal Video Structuring

Shot boundary detection
A shot is a continuous piece of video taken with
one camera
A shot cut is the abrupt or gradual transition
between two shots
Uncompressed domain
Calculate colour histogram for each frame
Calculate difference between histograms using a
suitable metric: L1 (city-block), L2 (Euclidean),
Mahalanobis, etc.
Threshold
Compressed domain
Parse features directly from bitstream
E.g. use DCT coefficients for each frame to
reconstruct approximation of image
E.g. motion vectors for each pair of frame and
detect changes in global statistics
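
The uncompressed-domain approach can be sketched as follows. For brevity this sketch uses a grey-level histogram in place of a colour histogram, the L1 (city-block) metric and a fixed threshold; the function names, the bin count and the threshold value are all illustrative assumptions.

```python
def histogram(frame, bins=16, max_value=256):
    """Normalised grey-level histogram of a frame (list of pixel rows)."""
    hist = [0] * bins
    for row in frame:
        for pixel in row:
            hist[pixel * bins // max_value] += 1
    total = sum(hist)
    return [h / total for h in hist]


def l1_distance(h1, h2):
    """City-block distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def shot_boundaries(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous one."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if l1_distance(hists[i - 1], hists[i]) > threshold]


dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
cuts = shot_boundaries([dark, dark, bright, bright])
```

A single fixed threshold like this only catches abrupt cuts; gradual transitions spread the change over many frames and usually need a windowed or adaptive test.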