ECEC453 Image Processing Architecture - PowerPoint PPT Presentation

Title: ECEC453 Image Processing Architecture

1
ECE-C453 Image Processing Architecture

Lecture 6, 2/3/04
Lossy Video Coding Ideas
Technology of DCT and Motion Estimation
Oleh Tretiak
Drexel University

2
Decorrelation Ideas

Orthogonal Transforms (KLT, DCT)
Main method for intra-frame coding
Wavelet
New stuff (JPEG 2000)
Predictive coding
Simple
Used for inter-frame coding (video)

Review
3
Lossy Predictive Coding

How to decorrelate?
Predict values
Block coding (DFT)
wavelet
Predictive (sample based, feedback)
encoder, Differential Pulse Code Modulation (DPCM)
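
The feedback structure of a DPCM coder can be sketched as follows. This is a minimal illustration, assuming a previous-sample predictor and a uniform quantizer with step `step`; both choices, and the function names, are assumptions for the example and are not taken from the lecture.

```python
def dpcm_encode(samples, step=4):
    """Quantize prediction errors; the predictor is the previous reconstructed sample."""
    codes = []
    prediction = 0  # assumed initial predictor state, shared with the decoder
    for s in samples:
        error = s - prediction
        q = round(error / step)      # quantized prediction error (what gets transmitted)
        codes.append(q)
        prediction += q * step       # feed back the decoder-side reconstruction
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild samples by accumulating the dequantized prediction errors."""
    out = []
    prediction = 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out
```

Because the encoder predicts from its own reconstruction rather than from the original samples, the quantization error stays within half a step and does not accumulate.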

Review
4
Review: Image Decorrelation

x = (x1, x2, ..., xn), a sequence of image gray
values
Preprocess: convert to y = (y1, y2, ..., yn), y =
Ax, A an orthogonal matrix (A^-1 = A^T)
Theoretical best (for Gaussian process) A is the
Karhunen-Loeve transformation matrix
Images are not Gaussian processes
Karhunen-Loeve matrix is image-dependent,
computationally expensive to find
Evaluating y = Ax with K-L transformation is
computationally expensive
In practice, we use DCT (discrete cosine
transform) for decorrelation
Computationally efficient
Almost as good as the K-L transformation

Review
5
Review: Block-Based Coding

Full image DCT - one set of decorrelated
coefficients for whole image
Block-based coding
Image divided into small blocks
Each block is decorrelated separately
Block decorrelation performs almost as well as
(better than?) full image decorrelation
Current standards (JPEG, MPEG) use 8x8 DCT blocks

Review
6
Rate-Distortion 1D vs. 2D coding

Theory on tradeoff between distortion and least
number of bits
Interesting tradeoff only if samples are
correlated
Water-filling construction to compute R(d)

Review
7
Wavelet Transform

Filterbank and wavelets
2 D wavelets
Wavelet Pyramid

Review
8
Filterbank Pyramid
[Figure: filterbank pyramid, band labels 125, 125, 250, 500, 1000]
Review
9
Lena Top Level, next level
Review
10
This Lecture

Idea
Video Coding by Pixel Prediction
Motion Estimation
Technology: DCT, and how much it costs
Technology: Motion Estimation Algorithms

11
Video Coding

Video: sequence of images
Reason for changes between successive images
Edits
Camera pan, zoom
Intra-frame motion
Intra-frame texture
Noise
Model: successive images are similar
Video coding uses inter-frame redundancy to
achieve lossy compression

12
Predicting sequential images
f(t-1)
f(t)
f(t) - f(t-1)
13
Motion Compensation

Macroblock size
MxN
Matching criterion
MAE (mean absolute error)
Search window
+/- p pixel locations
Search algorithm
Full search
Logarithmic search
Parallel Hierarchical One-Dimensional Search
Pixel subsampling and projection
Hierarchical downsampling

14
Motion Estimation Methods
No compensation
Full search
logarithmic search
3 level hierarchical
15
DCT Technology

DCT Formula
How it works
DCT plus quantization
DCT implementations and cost
Direct
Separable
Fast
Refinements

16
What is the DCT?
Note: in these equations, p stands for π.

One-dimensional 8-point DCT
Input x0, ..., x7; output y0, ..., y7
One-dimensional inverse DCT
Input y0, ..., y7; output x0, ..., x7
Matrix form of equations: x, y are one-column
matrices
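
The slide's equations did not survive in this transcript. As a point of reference, one common orthonormal form of the 8-point DCT pair is written out below. The normalization (the factor 1/2 and c_0 = 1/sqrt(2)) is an assumption and may differ from the one used on the original slide.

```latex
\begin{aligned}
y_k &= \frac{c_k}{2}\sum_{i=0}^{7} x_i \cos\frac{(2i+1)k\pi}{16}, \qquad k = 0,\dots,7,\\
x_i &= \sum_{k=0}^{7} \frac{c_k}{2}\, y_k \cos\frac{(2i+1)k\pi}{16}, \qquad i = 0,\dots,7,\\
c_0 &= \tfrac{1}{\sqrt{2}}, \qquad c_k = 1 \ \text{for } k > 0 .
\end{aligned}
```

In matrix form this reads y = Tx and x = T^T y, with T_{ki} = (c_k/2) cos((2i+1)kπ/16), so T is orthogonal.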

17
Two-Dimensional DCT

Forward 2D DCT. Input x_ij, i = 0, ..., 7, j = 0,
..., 7. Output y_kl, k = 0, ..., 7, l = 0, ..., 7
Matrix form: X, Y are 8x8 matrices with
coefficients x_ij, y_kl
The 2D DCT is separable!

Note: in these equations, p stands for π.
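
Separability means the 2D transform can be computed as two passes of the 1-D transform, Y = T X T^T. The sketch below illustrates this in plain Python. It assumes the orthonormal DCT-II normalization (which may differ from the slide's), and the helper names `dct_matrix`, `matmul`, `dct2_separable` and `idct2_separable` are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: T[k][i] = c_k * sqrt(2/n) * cos((2i+1) k pi / (2n))."""
    t = []
    for k in range(n):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        t.append([ck * math.sqrt(2.0 / n) * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n)])
    return t


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2_separable(x):
    """Y = T X T^T: transform the columns (Z = T X), then the rows (Y = Z T^T)."""
    t = dct_matrix(len(x))
    z = matmul(t, x)
    return matmul(z, transpose(t))


def idct2_separable(y):
    """X = T^T Y T, valid because T is orthogonal."""
    t = dct_matrix(len(y))
    return matmul(matmul(transpose(t), y), t)
```

With this normalization a constant 8x8 block of value v yields a single nonzero coefficient, y_00 = 8v.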
18
General DCT

One dimension
Two dimensions

19
Example 4x4 DCT

See 06IPA.xls

20
Computational Complexity

1D DCT
N input and output samples: N^2 operations (64 for N = 8)
(additions + multiplications)
2D DCT - direct implementation
M = N^2 input values, M output values -> M^2 = N^4 operations
2D DCT - separable implementation: Y = T X T^T =
Z T^T, where Z = T X, all matrices are NxN -> 2N^3
operations
For N = 8:
2D DCT direct: 4096 operations, 64 operations
per pixel
2D DCT separable: 1024 operations, 16 ops/pixel
Big savings due to separable transform
Inverse DCT: same story.
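
The arithmetic on this slide can be checked with a few lines. The function name is hypothetical, and "operation" is counted here as one multiply-add pair, as on the slide.

```python
def dct2_operation_counts(n=8):
    """Operation counts for an n x n block: direct (N^4) vs. separable (2 N^3)."""
    pixels = n * n
    direct = pixels ** 2          # M^2 with M = n^2 inputs and M outputs
    separable = 2 * n ** 3        # two n x n matrix products, n^3 operations each
    return {
        "direct": direct,
        "separable": separable,
        "direct_per_pixel": direct // pixels,
        "separable_per_pixel": separable // pixels,
    }
```

For n = 8 the ratio between the two is N/2 = 4, which is the saving quoted above.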

21
DCT Encoding in JPEG, MPEG

Take 8x8 blocks of pixels
Subtract range mean value
Compute 8x8 DCT
Quantize the DCT coefficients
Typically, many of the samples are equal to zero
Lossless entropy coding of the quantized samples
Different quantization step is used for different
DCT coefficients
y_kl: DCT coefficients, q_kl: quantizer steps,
z_kl: quantized values
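
The quantization equation itself is missing from the transcript. The sketch below assumes the usual rule of rounding to the nearest integer, z_kl = round(y_kl / q_kl), with reconstruction z_kl * q_kl; the function names are hypothetical.

```python
def quantize(y, q):
    """z[k][l] = round(y[k][l] / q[k][l]) (assumed rounding rule)."""
    return [[round(y[k][l] / q[k][l]) for l in range(len(y[0]))]
            for k in range(len(y))]


def dequantize(z, q):
    """Decoder-side reconstruction of the DCT coefficients: z[k][l] * q[k][l]."""
    return [[z[k][l] * q[k][l] for l in range(len(z[0]))]
            for k in range(len(z))]
```

Coefficients smaller than half their quantizer step become zero, which is why many quantized samples vanish.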

22
DCT Example

Data from lena, smooth area. RMS error 3.5

DCT
Original
DCT, quantized
Reconstructed
23
DCT example

Data from lena, busy area. RMS error 7.3

Original
DCT
DCT, quantized
Reconstructed
24
Overview: DCT coding

Transformation decorrelates samples
Transformed samples are quantized, quantization
step depends on the coefficient. Degree of
compression and loss can be changed by scaling
the quantization steps
Many quantized samples are zero -> run-length
coding
At receiver, perform inverse DCT
Many calculations!
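
The run-length step in this overview can be sketched as below. This is a simplified illustration loosely modeled on JPEG-style (zero run, value) pairs with an end-of-block marker; it is not the exact JPEG bitstream format, and the function name is hypothetical.

```python
def run_length_zeros(values):
    """Encode a coefficient sequence as (zero_run, nonzero_value) pairs.

    Trailing zeros are collapsed into a single "EOB" (end of block) marker.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs
```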

JPEG standard quantization steps
25
Speeding up the DCT

Separable transform - basic speedup
Fast DCT transform - like FFT
Further speedup through Scaled DCT

26
Optimized (fast) DCT

1-D Chen DCT diagram. Legend: dashed lines indicate
subtraction; other symbols mark multiplication by a
constant and multiplication by 0.5 (shift).

Characteristics of optimized DCT algorithms
27
DCT Complexity

Direct DCT computation
64 DCT values, each requires 64 multiplications +
additions -> 4096 multiply-accumulate (MA)
operations per block
Separable algorithm (operate on rows, then on
columns) -> 16 one-dimensional 8-point DCT
operations -> 1024 MA operations
Fast implementation: N log2 N operations per 1-D DCT, 16 x 24 =
384 MA ops
Special methods: many operations involve
multiplication by 1 or -1, take advantage of this!

28
Fast Scaled DCT

Picture of a butterfly at last stage of DCT
following quantizer

29
DCT refinements
Complexity of scaled DCT algorithms, excluding
quantization

Multiply-accumulate architectures
Basic operation is a = bc + d, well suited for
DCT
Super-scalar architectures
Multi-register, multi-ALU processors
Perform several operations in parallel

30
Motion Estimation

Architecture of Motion Estimation
Algorithms and Costs
Full Search
Logarithmic Search
PHODS
Downsample, projection
Hierarchical motion estimation
Other criteria
Multi-image estimation

31
Baseline Models

Previous frame predicts current frame
I(x, y, t) = I(x, y, t-1) + e(x, y, t)
Not effective in presence of motion: zoom, pan,
etc.
Prediction to account for motion
I(x, y, t) = I(x+u, y+v, t-1) + e(x, y, t)
(u, v): motion (displacement) vector
Model works (somewhat) for pan, not for other
motion
Compromise: compute independent motion estimates
for rectangular image regions (macroblocks).
Macroblocks are, in general, bigger than DCT
blocks

32
Generic Encoder - simplified
33
Generic Decoder
34
Motion Compensation

Macroblock size
MxN
Matching criterion
MAE (mean absolute error)
Search window
+/- p pixel locations
Search algorithm
Full search
Logarithmic search
Parallel Hierarchical One-Dimensional Search
Pixel subsampling and projection
Hierarchical downsampling

35
Motion Estimation Terminology

Issues
Size of macroblock
Size of search region
In video coding standards, M = N = 16

36
Matching Criterion

Ideal matching criterion: whatever produces the fewest
coded bits for the error image
Coding for each value of motion vector (u, v) is
too time consuming (expensive)
In practice, mean absolute error (MAE) is most
popular
C - current image, R - reference image, (x, y) -
macroblock origin
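
The MAE equation is missing from the transcript. Using the symbols listed above (C, R, macroblock origin (x, y), macroblock size MxN, displacement (i, j) within a window of +/- p), the usual definition is sketched below. The exact index convention of the original slide is an assumption.

```latex
\mathrm{MAE}(i,j) = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1}
\bigl|\,C(x+k,\;y+l) - R(x+i+k,\;y+j+l)\,\bigr|,
\qquad -p \le i,\, j \le p
```

The motion vector is the displacement (i, j) that minimizes MAE(i, j).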

37
Full-Search Method

Compute MAE for (2p+1)^2 values of (i, j).
Each location requires 3MN operations
Picture dimensions IxJ, F pictures per second:
3IJF(2p+1)^2 operations per second
I = 720, J = 480, F = 30, p = 15 -> 30 GOPS
Guaranteed to find best (MAE) displacement
How to do it?
Special computers
Smaller p
Faster (suboptimal) algorithm
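
A minimal sketch of full search with the MAE criterion follows. It assumes images stored as lists of rows indexed `img[x][y]` and skips candidate blocks that fall outside the reference image; these conventions and the function names are assumptions for illustration.

```python
def mae(cur, ref, x, y, i, j, m, n):
    """Mean absolute error between the m x n block of cur at (x, y)
    and the block of ref displaced by (i, j)."""
    total = 0
    for k in range(m):
        for l in range(n):
            total += abs(cur[x + k][y + l] - ref[x + i + k][y + j + l])
    return total / (m * n)


def full_search(cur, ref, x, y, m, n, p):
    """Try all (2p+1)^2 displacements; return (best_mae, i, j)."""
    best = None
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            # Skip candidates whose block is not fully inside the reference image.
            if x + i < 0 or y + j < 0 or x + i + m > len(ref) or y + j + n > len(ref[0]):
                continue
            cost = mae(cur, ref, x, y, i, j, m, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best


def full_search_rate(width, height, fps, p):
    """Operations per second from the slide's formula 3 I J F (2p+1)^2."""
    return 3 * width * height * fps * (2 * p + 1) ** 2
```

With I = 720, J = 480, F = 30 and p = 15, `full_search_rate` gives about 2.99e10 operations per second, matching the 30 GOPS figure.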

38
Logarithmic Search (1D)

Goal: find minimum over u in [-p, p]
First step: evaluate at -p/2, 0, p/2 (interval
p)
Next step: choose interval of length p/2 around
minimum (2 more evaluations)
Continue until interval length is equal to 2.
This takes k = ceiling(log2 p) iterations
Example: p = 7
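
The 1-D procedure can be sketched as follows. This is an illustrative variant that uses power-of-two step sizes starting at 2^(k-1), with k = ceiling(log2 p); the function name and the `cost` callback are hypothetical. With these steps, when p is an exact power of two the extreme positions +/- p are not reachable.

```python
def log_search_1d(cost, p):
    """Approximate argmin of cost(u) over integers u in [-p, p].

    Returns (best_u, number_of_cost_evaluations).
    """
    k = max(1, (p - 1).bit_length())   # equals ceiling(log2 p) for p >= 2
    step = 2 ** (k - 1)                # first spacing, roughly p/2
    best_u = 0
    best_cost = cost(0)
    evaluations = 1
    while step >= 1:
        center = best_u
        for u in (center - step, center + step):
            if -p <= u <= p:
                c = cost(u)
                evaluations += 1
                if c < best_cost:
                    best_cost, best_u = c, u
        step //= 2                     # halve the interval around the current minimum
    return best_u, evaluations
```

For p = 7 this evaluates 3 points in the first iteration and 2 more in each of the next two, 7 evaluations in total.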

39
Logarithmic Search - 2D

First stage requires 3x3 = 9 evaluations
Subsequent stages require 8 evaluations
k = ceiling(log2 p) stages (iterations)
Rate: 3IJF(8k+1) operations per second
p = 15, I = 720, J = 480, F = 30 -> 1 GOPS
Can fail to find minimum
Bottom line: faster method, more error than full
search
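
A 2-D version can be sketched in the same spirit: each stage examines the 8 neighbours of the current best point at the current step size, then halves the step. The power-of-two step schedule, the function name and the `cost(i, j)` callback are assumptions for illustration.

```python
def log_search_2d(cost, p):
    """Approximate argmin of cost(i, j) over |i|, |j| <= p.

    Returns ((best_i, best_j), number_of_cost_evaluations).
    """
    k = max(1, (p - 1).bit_length())   # equals ceiling(log2 p) for p >= 2
    step = 2 ** (k - 1)
    best_i, best_j = 0, 0
    best_cost = cost(0, 0)
    evaluations = 1
    while step >= 1:
        ci, cj = best_i, best_j
        for di in (-step, 0, step):
            for dj in (-step, 0, step):
                if di == 0 and dj == 0:
                    continue           # the center was evaluated in an earlier stage
                i, j = ci + di, cj + dj
                if abs(i) > p or abs(j) > p:
                    continue
                c = cost(i, j)
                evaluations += 1
                if c < best_cost:
                    best_cost, best_i, best_j = c, i, j
        step //= 2
    return (best_i, best_j), evaluations
```

For p = 15 there are k = 4 stages and 8k + 1 = 33 evaluations, compared with (2p+1)^2 = 961 for full search. On a cost surface with several local minima this procedure can settle on the wrong one.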

40
PHODS

Parallel Hierarchical One-Dimensional Search
1st: blue, 2nd: green, 3rd: red

Twice as fast as logarithmic; less reliable
41
Other Fast Methods

Subsample (do not use all points in macroblock)
Projection: row and column projection of pixels,
follow with 1-D search
Hierarchical motion estimation
Downsample reference image and current image
Perform low resolution search
Refine

42
Hierarchical Search

Prepare downsampled versions of current and
reference images
Full: macroblock 16x16
Down 2: macroblock 8x8
Down 4: macroblock 4x4
Full search in Down 4 reference image
16x speedup, smaller macroblock
16x speedup, fewer displacement vectors
p = 16 becomes p = 4
Around point of best match, do local search in
Down 2 reference image (3x3 search zone)
Repeat for Full reference image (3x3 search zone)
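
The three-level procedure above can be sketched as follows. This is a minimal illustration under several assumptions: downsampling is done by averaging 2x2 neighbourhoods, images are lists of rows indexed `img[x][y]`, and x, y, m, n and the image dimensions are multiples of 4. All function names are hypothetical.

```python
def downsample2(img):
    """Halve resolution by averaging 2x2 neighbourhoods (one possible choice)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]


def mae(cur, ref, x, y, i, j, m, n):
    """Mean absolute error for displacement (i, j) of the m x n block at (x, y)."""
    return sum(abs(cur[x + k][y + l] - ref[x + i + k][y + j + l])
               for k in range(m) for l in range(n)) / (m * n)


def local_search(cur, ref, x, y, m, n, ci, cj, radius):
    """Best (mae, i, j) within +/- radius of (ci, cj); off-image candidates are skipped."""
    best = None
    for i in range(ci - radius, ci + radius + 1):
        for j in range(cj - radius, cj + radius + 1):
            if x + i < 0 or y + j < 0 or x + i + m > len(ref) or y + j + n > len(ref[0]):
                continue
            cost = mae(cur, ref, x, y, i, j, m, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best


def hierarchical_search(cur, ref, x, y, m=16, n=16, p=16):
    """Full search at Down 4, then 3x3 refinements at Down 2 and full resolution."""
    cur1, ref1 = downsample2(cur), downsample2(ref)      # Down 2
    cur2, ref2 = downsample2(cur1), downsample2(ref1)    # Down 4
    # Down 4: 4x4 macroblock, displacement range p/4.
    _, i, j = local_search(cur2, ref2, x // 4, y // 4, m // 4, n // 4, 0, 0, p // 4)
    # Down 2: double the vector, search a 3x3 zone around it.
    _, i, j = local_search(cur1, ref1, x // 2, y // 2, m // 2, n // 2, 2 * i, 2 * j, 1)
    # Full resolution: double again, 3x3 zone.
    _, i, j = local_search(cur, ref, x, y, m, n, 2 * i, 2 * j, 1)
    return i, j
```

The coarse search evaluates (2 * 4 + 1)^2 = 81 candidates on 16 pixels each, instead of (2 * 16 + 1)^2 = 1089 candidates on 256 pixels each; the two refinements add 9 candidates apiece.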