Title: CHAPTER 3 Fundamentals of Lossy Image Compression
1 CHAPTER 3 Fundamentals of Lossy Image Compression
2 Lossy Compression System
 Lossy compression of images deals with compression processes in which decompression yields an imperfect reconstruction of the original image data.
 There is always a bound on the minimum bit rate of the compressed bit stream.
 Image data tend to have a high degree of spatial redundancy.
 Within such a system, compression is achieved by exploiting both the spatial redundancies within the image and the perceptual characteristics of the human visual system, so that the loss due to compression may not be discernible to the viewer.
3 Sample-Based Coding
 There are two classes of lossy compression schemes for images:
   Sample-based coding
   Block-based coding
     Spatial-domain block coding
     Transform-domain block coding
 In sample-based coding, the image samples are compressed on a sample-by-sample basis. The samples can be either in the spatial domain or in the frequency domain.
 Differential pulse code modulation (DPCM)
 [Figure: DPCM encoder and decoder. Blocks: Quantizer, Predictor. Signals: input $x_{ij}$, residue $e_{ij}$, quantizer output $q_{ij}$, prediction $P_{ij}$.]
4 Quantizer
 If the image is highly correlated, $P_{ij}$ will track $x_{ij}$, and $e_{ij}$ will consequently be quite small.
 The residue signal $e_{ij}$ is quantized. The quantizer maps several of its inputs into a single output. This process is irreversible and is the main cause of information loss.
 For a uniform quantizer, the quantization process can be expressed as
 [Equation: uniform quantizer output $q_{ij}$ as a function of $e_{ij}$]
 Since the variance of $e_{ij}$ is lower than the variance of $x_{ij}$, quantizing $e_{ij}$ will not introduce significant distortion. Furthermore, the lower variance corresponds to lower entropy and thus to higher compression.
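The DPCM loop described above can be sketched in a few lines. This is a minimal illustration rather than the slide's own formulation: it assumes a one-dimensional previous-sample predictor, a uniform quantizer that rounds the residue to the nearest multiple of a step size, and an initial prediction of 0. The names `dpcm_encode`, `dpcm_decode`, and `step` are hypothetical.

```python
def dpcm_encode(samples, step):
    """Return the quantizer indices for one row of samples."""
    indices = []
    prediction = 0  # assumed initial prediction
    for x in samples:
        residue = x - prediction         # e = x - P
        q = int(round(residue / step))   # uniform quantizer (assumed form)
        indices.append(q)
        prediction += q * step           # track what the decoder will rebuild
    return indices


def dpcm_decode(indices, step):
    """Rebuild the samples from the quantizer indices."""
    samples = []
    prediction = 0
    for q in indices:
        prediction += q * step
        samples.append(prediction)
    return samples


row = [100, 102, 105, 104, 110, 120, 119]
codes = dpcm_encode(row, step=4)
decoded = dpcm_decode(codes, step=4)
```

Because the encoder predicts from the reconstructed value rather than from the original sample, the reconstruction error of every sample stays within half a step. For this correlated row, every index after the first is small, which is what makes the residue cheap to entropy-code.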
5 Block-Based Coding
 In spatial-domain block coding, the pixels are grouped into blocks, and the blocks are then compressed in the spatial domain.
 In transform-domain block coding, the pixels are grouped into blocks, and the blocks are then transformed to another domain, such as the frequency domain.
 The motivation for transform coding is a more compact representation of the data.
 Some of the most commonly used transforms include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the discrete sine transform (DST), the discrete Hadamard transform (DHT), and the Karhunen-Loève transform (KLT).
6 Compaction Efficiency for Various Image Transforms
7 Compaction Efficiency for Various Image Transforms (Cont.)
 The KLT basis is the most efficient in terms of compaction efficiency, since the energy is compacted into the top left corner.
   It packs the most energy into the fewest elements of Y.
   It minimizes the total entropy of the sequence.
   It completely decorrelates the elements of X.
 The KLT has several implementation-related deficiencies:
   The basis functions are image dependent. The other basis functions (DFT, DCT, DST, and DHT) are image independent.
 The compaction efficiency of the DCT basis is close to that produced by the KLT. Therefore, the DCT is widely used in image and video compression standards.
8 Basic Transformation Forms
9 Transform Coding
 Spatial image data (an image or a motion-compensated residual image) are transformed into a different representation, the transform domain.
 This makes the image data easier to compress.
 Techniques
   Discrete cosine transform (DCT)
     Usually applied to small regular blocks of the image, e.g., 8 x 8 squares.
     JPEG, H.26x, MPEG-x
   Discrete wavelet transform (DWT)
     Usually applied to larger image sections, e.g., tiles, or to the complete image.
     JPEG 2000, MPEG-4 still texture
10 Blocks
 Process the data in blocks of 8 x 8 samples.
 Convert Red-Green-Blue (RGB) into Luminance (greyscale) and Chrominance (Blue colour difference and Red colour difference).
 Use half resolution for Chrominance (because the eye is more sensitive to greyscale than to colour).
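The colour conversion and chrominance subsampling steps can be illustrated as follows. The slides do not give the conversion formula, so this sketch assumes the widely used ITU-R BT.601 luma weights and the matching colour-difference scale factors, and it assumes that "half resolution" means averaging each 2 x 2 neighbourhood (halving both dimensions). The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to luminance and two colour differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (BT.601 weights, assumed)
    cb = 0.564 * (b - y)                    # blue colour difference
    cr = 0.713 * (r - y)                    # red colour difference
    return y, cb, cr


def subsample_2x2(plane):
    """Halve the resolution of a plane by averaging 2 x 2 neighbourhoods.

    Assumes the plane has an even number of rows and columns.
    """
    height, width = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
             for j in range(0, width, 2)]
            for i in range(0, height, 2)]


grey = rgb_to_ycbcr(128, 128, 128)
small = subsample_2x2([[1, 3], [5, 7]])
```

A neutral grey pixel has (near) zero colour differences, so all of its information sits in the luminance channel, which is kept at full resolution.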
11 Discrete Cosine Transform
 Transform each block of 8 x 8 samples into a block of 8 x 8 spatial frequency coefficients.
12 Discrete Cosine Transform
13 An Example of Energy Compaction
14 Two-Dimensional DCT (1974)
15 Discrete Cosine Transform
 Any 8 x 8 block of pixels can be represented as a sum of 64 basis patterns (black and white patterns).
 The output of the DCT is the set of weights for these basis patterns (the DCT coefficients).
 Multiply each basis pattern by its weight and add them together; the result is the original block.
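The basis-pattern view can be checked numerically. The sketch below assumes the orthonormal DCT-II scaling (the slides' own scaling is not recoverable from the text) and uses deliberately unoptimized loops; `basis`, `dct2`, and `reconstruct` are hypothetical helper names.

```python
import math

N = 8


def alpha(k):
    # Orthonormal scaling (an assumption; other conventions exist).
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def basis(u, v):
    """8 x 8 basis pattern for vertical frequency u and horizontal frequency v."""
    return [[alpha(u) * alpha(v)
             * math.cos((2 * i + 1) * u * math.pi / (2 * N))
             * math.cos((2 * j + 1) * v * math.pi / (2 * N))
             for j in range(N)] for i in range(N)]


def dct2(block):
    """Weight of each pattern = inner product of the block with that pattern."""
    return [[sum(block[i][j] * b[i][j] for i in range(N) for j in range(N))
             for b in (basis(u, v) for v in range(N))]
            for u in range(N)]


def reconstruct(coeffs):
    """Multiply each basis pattern by its weight and add them together."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            b = basis(u, v)
            for i in range(N):
                for j in range(N):
                    out[i][j] += coeffs[u][v] * b[i][j]
    return out


block = [[3 * i + 5 * j for j in range(N)] for i in range(N)]  # smooth ramp
coeffs = dct2(block)
restored = reconstruct(coeffs)
```

For this smooth ramp the weighted sum reproduces the block to within floating-point error, and the only non-negligible weights lie in the first row and first column of `coeffs`.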
16 Discrete Cosine Transform
 Most image blocks contain only a few significant coefficients (usually the lowest frequencies).
17 Hardware Architectures of Discrete Cosine Transform
18 Hardware/Software Tradeoff
 For low-end applications, a software implementation is powerful enough.
 For high-end applications, a hardware approach must be used.
 For mid-range applications, either a software or a hardware approach is possible, depending on the target design platform.
19 DCT Algorithm Classification
 Direct 2D Method
   The 2D transforms, DCT and IDCT, are applied directly to the N x N input data items.
 Row-Column Method
   The 2D transform can be carried out with two passes of 1D transforms.
   The separability property of the 2D DCT/IDCT allows the transform to be applied along one dimension (row) and then along the other (column).
   Requires 2N instances of an N-point 1D DCT to implement an N x N 2D DCT.
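The separability claim can be verified with a short sketch that computes the same 2D DCT both ways. It assumes the orthonormal DCT-II definition; the call counter `calls` exists only to show that the row-column method uses 2N one-dimensional transforms, and all function names are hypothetical.

```python
import math

calls = 0  # number of 1D transforms performed (illustration only)


def dct_1d(x):
    """N-point 1D DCT by direct summation."""
    global calls
    calls += 1
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i in range(n)))
    return out


def dct_2d_row_column(block):
    rows = [dct_1d(r) for r in block]              # N transforms on rows
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # transpose, N on columns
    return [list(r) for r in zip(*cols)]           # transpose back


def dct_2d_direct(block):
    n = len(block)

    def basis(k, i):
        return (math.sqrt((1.0 if k == 0 else 2.0) / n)
                * math.cos((2 * i + 1) * k * math.pi / (2 * n)))

    return [[sum(block[i][j] * basis(u, i) * basis(v, j)
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


block = [[(7 * i * i + 3 * j + i * j) % 64 for j in range(8)] for i in range(8)]
fast = dct_2d_row_column(block)
slow = dct_2d_direct(block)
```

Both functions return the same coefficients up to rounding, and for N = 8 the row-column version performs 16 one-dimensional transforms.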
20 Straightforward Approach
 Carry out the computation as full matrix-vector multiplications.
   A 1D transform requires $N^2$ multiplications and $N(N-1)$ additions.
   A 2D transform requires $N^4$ multiplications and $N^2(N^2-1)$ additions.
 Although it requires the largest number of operations, this method is very regular.
 Most suitable for vector processors or deeply pipelined architectures, for high PE utilization.
 1D fast algorithm: $O(N \log N)$
 2D fast algorithm: $O(N^2 \log N)$
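As a worked instance of these counts, take N = 8, the block size used throughout this chapter. The row-column figure is included for comparison and follows from 2N one-dimensional transforms of $N^2$ multiplications each.

```latex
\begin{align*}
\text{1D, matrix-vector:}\quad & N^2 = 64 \text{ multiplications}, & N(N-1) &= 56 \text{ additions}\\
\text{2D, direct:}\quad & N^4 = 4096 \text{ multiplications}, & N^2(N^2-1) &= 4032 \text{ additions}\\
\text{2D, row-column:}\quad & 2N \cdot N^2 = 2N^3 = 1024 \text{ multiplications}
\end{align*}
```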
21 1D DCT Definition
22 4-Point DCT (N = 4)
23 4-Point DCT Matrix Form
24 4-Point DCT
25 4-Point DCT
 16 multiplications reduced to 6
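The equations on these slides did not survive extraction. For reference, the standard orthonormal DCT-II definition and its 4-point matrix form are given below; the scaling convention and the shorthand $c_k = \cos(k\pi/8)$ are assumptions and may differ from the original slides.

```latex
\begin{align*}
X(k) &= \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N},
\qquad \alpha(0) = \sqrt{\tfrac{1}{N}},\quad \alpha(k) = \sqrt{\tfrac{2}{N}} \ (k \ge 1)\\[4pt]
\begin{bmatrix} X(0)\\ X(1)\\ X(2)\\ X(3) \end{bmatrix}
&= \frac{1}{\sqrt{2}}
\begin{bmatrix}
c_2 & c_2 & c_2 & c_2\\
c_1 & c_3 & -c_3 & -c_1\\
c_2 & -c_2 & -c_2 & c_2\\
c_3 & -c_1 & c_1 & -c_3
\end{bmatrix}
\begin{bmatrix} x(0)\\ x(1)\\ x(2)\\ x(3) \end{bmatrix}
\end{align*}
```

Evaluated as a plain matrix-vector product, the 4-point form costs 16 multiplications; the symmetry between columns 0 and 3 and between columns 1 and 2 is what allows the reduction to 6.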
26 Butterfly First DCT Stage
 $P_0 = x(0) + x(3)$, $M_0 = x(0) - x(3)$
 $P_1 = x(1) + x(2)$, $M_1 = x(1) - x(2)$
 Reversed input order
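27 Butterfly Second Stage
 $X(0) = (P_0 + P_1)\,c_2$, $X(1) = M_0 c_1 + M_1 c_3$
 $X(2) = (P_0 - P_1)\,c_2$, $X(3) = M_0 c_3 - M_1 c_1$
 [Figure: second-stage butterfly from $P_0$, $M_0$, $P_1$, $M_1$ to $X(0)$, $X(1)$, $X(2)$, $X(3)$, with coefficient multipliers such as $c_1$]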
28 4-Point DCT
 [Figure: 4-point DCT flow graph with intermediate signals $P_0$, $M_0$, $P_1$, $M_1$]
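The two butterfly stages can be checked against the direct 4-point definition. This sketch assumes $c_k = \cos(k\pi/8)$ and an orthonormal scale factor of $1/\sqrt{2}$ folded into the three constants, so that exactly six multiplications remain; these conventions and the function names are assumptions, since the slides' coefficient definitions are not recoverable.

```python
import math

# Constants with the assumed 1/sqrt(2) orthonormal scale folded in.
K1, K2, K3 = (math.cos(k * math.pi / 8) / math.sqrt(2) for k in (1, 2, 3))


def dct4_butterfly(x):
    # First stage: sums and differences with reversed input order.
    p0, m0 = x[0] + x[3], x[0] - x[3]
    p1, m1 = x[1] + x[2], x[1] - x[2]
    # Second stage: six multiplications in total.
    return [
        (p0 + p1) * K2,
        m0 * K1 + m1 * K3,
        (p0 - p1) * K2,
        m0 * K3 - m1 * K1,
    ]


def dct4_direct(x):
    """Reference: 4-point DCT-II by direct summation (16 multiplications)."""
    out = []
    for k in range(4):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / 4)
        out.append(scale * sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / 8) for n in range(4)))
    return out


samples = [10, 20, 30, 40]
fast = dct4_butterfly(samples)
reference = dct4_direct(samples)
```

With this scaling, `fast[0]` is half the sum of the inputs, and the two functions agree on all four outputs up to rounding.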
29 8-Point DCT
30 Row-Column Method Example
 A. Madisetti and A. N. Willson Jr., "A 100 MHz 2D 8 x 8 DCT/IDCT Processor for HDTV Applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 2, pp. 158-165, Apr. 1995.
31 Description of Algorithms
32 Description of Algorithms (Cont.)
 A straightforward implementation requires $N^4$ multiplications for the evaluation of the DCT and of the IDCT, respectively.
 Decomposition into a triple matrix product reduces the computational complexity to $2N^3$ multiplications.
 Since $2N^3$ multiplications must be performed in $N^2$ clock cycles (or input sample periods), the computational requirement of such an implementation is 2N multiplies per input sample.
 For an input sample rate of 100 MHz (with N = 8), the computation requirement is 1.6 GOPS, where each operation is a multiply-accumulate.
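The 1.6 GOPS figure follows directly from the counts above, taking N = 8 for the 8 x 8 processor:

```latex
\begin{align*}
\frac{2N^3 \text{ multiplications}}{N^2 \text{ sample periods}} &= 2N = 16 \text{ multiplies per input sample}\\
16 \times 100 \times 10^{6}\ \text{samples/s} &= 1.6 \times 10^{9}\ \text{multiply-accumulates/s} = 1.6\ \text{GOPS}
\end{align*}
```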
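33 Row-Column Method
 Basic concept
   2D DCT = 1D DCT (row) followed by 1D DCT (column)
   Each 1D DCT unit must be capable of computing N multiplies per input sample.
 $Y = AX$
 $Z = YA^T$
 [Figure: row-column architecture. X enters a 1D DCT/IDCT unit (DCT for row), passes through the Transpose Memory, and a second 1D DCT/IDCT unit (DCT for column) produces Z.]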
34 Row-Column Method (Cont.)
 Let us first consider the computation of the triple matrix product $Z = AXA^T$ for the DCT or $Z = A^TXA$ for the IDCT. This is computed as $Y = AX$ and $Z = YA^T$ for the DCT, and as $Y = A^TX$ and $Z = YA$ for the IDCT.
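The two-step matrix form can be exercised directly. The sketch below builds an orthonormal 8 x 8 DCT matrix A (the scaling is an assumption), applies the DCT as two matrix products, and then inverts it with the transposed products. The helpers `dct_matrix`, `matmul`, and `transpose` are hypothetical names.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; row k holds the k-th basis function."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    return [list(row) for row in zip(*m)]


A = dct_matrix(8)
At = transpose(A)
X = [[(5 * i + 11 * j + i * j) % 32 for j in range(8)] for i in range(8)]

# DCT: Y = A X, then Z = Y A^T.
Y = matmul(A, X)
Z = matmul(Y, At)

# IDCT: Y = A^T Z, then X = Y A.
X_back = matmul(matmul(At, Z), A)
identity = matmul(A, At)
```

The IDCT recovers X because A is orthogonal, which is why the inverse needs no new coefficients, only the transposed matrix.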
35 Computation of the DCT
 Even rows of A are even-symmetric and odd rows are odd-symmetric.
36 Matrix Decomposition
 Reduce an 8 x 8 matrix computation to two 4 x 4 matrix computations.
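The symmetry argument can be made concrete for one 8-point 1D DCT. In this sketch (orthonormal scaling assumed, names hypothetical), the inputs are first folded into four sums and four differences; the even-indexed outputs then need only the left half of the even rows of A, and the odd-indexed outputs only the left half of the odd rows.

```python
import math

N = 8
# Orthonormal 8 x 8 DCT-II matrix (scaling is an assumption).
A = [[math.sqrt((1.0 if k == 0 else 2.0) / N)
      * math.cos((2 * j + 1) * k * math.pi / (2 * N))
      for j in range(N)] for k in range(N)]


def dct8_full(x):
    """Reference: one 8 x 8 matrix-vector product (64 multiplications)."""
    return [sum(A[k][j] * x[j] for j in range(N)) for k in range(N)]


def dct8_decomposed(x):
    """Two 4 x 4 matrix-vector products (32 multiplications)."""
    sums = [x[i] + x[N - 1 - i] for i in range(4)]
    diffs = [x[i] - x[N - 1 - i] for i in range(4)]
    out = [0.0] * N
    for k in range(0, N, 2):   # even rows are even-symmetric
        out[k] = sum(A[k][i] * sums[i] for i in range(4))
    for k in range(1, N, 2):   # odd rows are odd-symmetric
        out[k] = sum(A[k][i] * diffs[i] for i in range(4))
    return out


signal = [12, 15, 19, 26, 30, 28, 21, 17]
full = dct8_full(signal)
half = dct8_decomposed(signal)
```

The decomposition halves the multiplication count at the price of eight extra additions and subtractions in the folding step.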
37 Computation of the IDCT
38 System Architecture
39 System Architecture (Cont.)
 [Figure: system architecture; labelled signals X, Y, Z]
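40 Architecture of Data Reorder Unit (DRU)
 [Figure: DRU architecture; labelled signal INSEL]
41 Data Flow of DRU
 [Figure: DRU data flow; sequences X(3) X(2) X(1) X(0), Y(3) Y(2) Y(1) Y(0), and x0 x1 x2 x3]
42 Data Flow of DRU (Cont.)
 [Figure: DRU data flow during the first four clock cycles; inputs X0 X1 X2 X3 and X7 X6 X5 X4 are combined in the pairs (X0, X7), (X1, X6), (X2, X5), (X3, X4)]
43 Data Flow of DRU (Cont.)
 [Figure: DRU data flow during the next four clock cycles]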
44 ACF Matrix-Vector Multiplication
45 ACF Matrix-Vector Multiplier
 [Figure: ACF matrix-vector multiplier. Blocks: Mult a, Mult c, Mult f, ACC 0 to ACC 3, 4-to-1 MUX, Timing and Control. Signals: xe, Ye. Annotation: broadcasting to the a, c, and f multipliers.]
46 BDEG Matrix-Vector Multiplication
47 BDEG Matrix-Vector Multiplier
48 Hardwired Multiplier
 Signed Digit Representation of the DCT Coefficients
49 Accumulator
50 Transpose Memory
51 Transpose Memory (Cont.)
52 Finite Wordlength Analysis
53 Implementation Results
54 1D Approach with DA
55 DCT Algorithm
56 DCT Algorithm (Cont.)
57 DCT Algorithm (Cont.)
58 Block Diagram
59 Input Data Format Converter
60 Pre-Add and Post-Add
61 DA-Based DCT Core
62 DA-Based DCT Core (Cont.)
63 DA-Based DCT Core (Cont.)
64 Transpose Memory
65 1D Approach with Systolic Array
 IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 2, pp. 150-157, Apr. 1995.
66 DCT Algorithm
67 Three Steps
68 Systolic Array
69 Systolic Array (Cont.)
70 Features of 1D Approach with Systolic Array
71 Direct 2D DCT Architecture
72 Direct 2D DCT Architecture
73 Data Flow Graph