Data Compression presentation

Title: Data Compression

1
Data Compression
2
Terminology

Physical versus logical
Physical
Performed on data regardless of what information
it contains
Translates a series of bits to another series of
bits
Logical
Knowledge-based
Change United Kingdom to UK

3
Terminology

Symmetric
Compression and decompression roughly use the
same techniques and take just as long
Data transmission which requires compression and
decompression on-the-fly will require these types
of algorithms

4
Terminology

Asymmetric
Most common is where compression takes a lot more
time than decompression
In an image database, each image will be
compressed once and decompressed many times
Less common is where decompression takes a lot
more time than compression
Creating many backup files which will hardly ever
be read

5
Terminology

Non-adaptive
Contain a static dictionary of predefined
substrings to encode which are known to occur
with high frequency
Adaptive
Dictionary is built from scratch

6
Terminology

Semi-adaptive
In pass 1, an optimal dictionary is constructed
In pass 2, the actual compression occurs

7
Terminology

Lossless
decompress(compress(data)) = data
Lossy
decompress(compress(data)) ≠ data
A small change in pixel values may be invisible,
however
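The difference between the two round trips can be sketched in a few lines. This is only an illustration: zlib stands in for any lossless coder, and lossy_compress / lossy_decompress are hypothetical helpers that simply bucket pixel values in steps of 16.

```python
import zlib

def lossy_compress(pixels, step=16):
    """Hypothetical lossy step: keep only the bucket index of each pixel value."""
    return bytes(p // step for p in pixels)

def lossy_decompress(buckets, step=16):
    """Map each bucket back to its midpoint; the exact original value is gone."""
    return bytes(b * step + step // 2 for b in buckets)

pixels = bytes([10, 11, 12, 200, 201, 202])

# Lossless: the round trip gives back exactly the input.
lossless_round_trip = zlib.decompress(zlib.compress(pixels))

# Lossy: the round trip gives back values near the input, but not the input itself.
lossy_round_trip = lossy_decompress(lossy_compress(pixels))
max_error = max(abs(a - b) for a, b in zip(pixels, lossy_round_trip))
```

With a step of 16 no pixel moves by more than 8 levels, which is the kind of small change the slide describes as possibly invisible.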

8
Pixel Packing
9
Run-Length Encoding

Repeating string of characters, called a run, is
coded into two bytes
First byte contains the run count, one less than
the number of repetitions
Second byte contains the run value, the character
being repeated

10
Run-Length Encoding

77777zzzyyyyyyV becomes 472z5y0V
15 byte string becomes 8 bytes long
Compression ratio of almost 2 to 1
Some strings become twice as long
7fu5JLY9jhYIujG
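A minimal sketch of the scheme on the two slides above, assuming each run is stored as a (count, value) pair with count = repetitions - 1. The function names are illustrative, and as_slide_notation only mimics the slide's compact printing, which is unambiguous for counts 0-9.

```python
def rle_encode(text):
    """Encode text as (count, value) pairs, where count = repetitions - 1."""
    pairs = []
    i = 0
    while i < len(text):
        run_length = 1
        # Extend the run while the next character matches.
        # Cap at 256 repetitions so that count (0..255) fits in one byte.
        while (i + run_length < len(text)
               and text[i + run_length] == text[i]
               and run_length < 256):
            run_length += 1
        pairs.append((run_length - 1, text[i]))
        i += run_length
    return pairs

def rle_decode(pairs):
    """Expand each (count, value) pair back into count + 1 copies of value."""
    return "".join(value * (count + 1) for count, value in pairs)

def as_slide_notation(pairs):
    """Print pairs the way the slide does, e.g. (4, '7') -> '47'."""
    return "".join(f"{count}{value}" for count, value in pairs)

encoded = rle_encode("77777zzzyyyyyyV")
compact = as_slide_notation(encoded)                   # "472z5y0V"
worst_case_bytes = 2 * len(rle_encode("7fu5JLY9jhYIujG"))  # 30 bytes for 15 input bytes
```

The second string has no repeated neighbours, so every character costs two bytes, which is the doubling the slide warns about.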

12
Lempel-Ziv-Welch (LZW)

Lossless
GIF, TIFF, V.42bis modem compression standard,
PostScript Level 2
Substitutional or dictionary-based
Algorithm builds a data dictionary
Code emitted for longest pattern found in dictionary;
a pattern not already in dictionary is added to it
Not necessary to transmit dictionary for
decompression, since decoder rebuilds it from the codes
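The following is a textbook-style sketch of LZW, not the exact variant used by GIF or TIFF (those add variable code widths and clear codes, which are omitted here). It assumes an initial dictionary of the 256 single-byte strings and shows that the decoder can rebuild the same dictionary from the code stream alone.

```python
def lzw_compress(data):
    """Return a list of integer codes for the given bytes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate            # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new pattern
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the bytes, reconstructing the dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The encoder used a pattern it had only just added.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
sample_codes = lzw_compress(sample)
```

The decoder always lags the encoder by one dictionary entry, which is why the code == next_code branch is needed.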

13
Lempel-Ziv-Welch (LZW)

History
1977
Abraham Lempel and Jacob Ziv published a paper on
a universal data compression algorithm
Called LZ77
1978
Lempel and Ziv formulated an improved,
dictionary-based data compression algorithm
Called LZ78

14
Lempel-Ziv-Welch (LZW)

History
1981
While working for Sperry, Lempel and Ziv, with
some other researchers, filed for a patent for
LZ78
Granted in 1984
1984
While working for Sperry, Terry Welch modified
LZ78
Result was LZW algorithm
Published in IEEE Computer

15
Lempel-Ziv-Welch (LZW)

History
1985
Sperry was granted a patent for Welch's modification
and for implementation of LZW
1986
Sperry and Burroughs merged to form Unisys
Ownership of Sperry patent transferred to Unisys

16
Lempel-Ziv-Welch (LZW)

History
1987
CompuServe created GIF file format
Required use of LZW algorithm
Didn't check patents for LZW
Unisys also didn't realize GIF used LZW
1988
Aldus released Revision 5.0 of TIFF file format
Used LZW algorithm
1990
Unisys licensed Adobe for use of LZW patent for
PostScript

17
Lempel-Ziv-Welch (LZW)

History
1991
Unisys licensed Aldus for use of LZW patent in
TIFF
1993
Unisys became aware the GIF file format used LZW
Negotiations began with CompuServe

18
Lempel-Ziv-Welch (LZW)

History
1994
Unisys and CompuServe came to an understanding
that CompuServe would license the LZW algorithm
for use of the GIF file format in software used
primarily to access the CompuServe Information
Service
1995
America Online and Prodigy also entered into
license agreements with Unisys for LZW

19
Lempel-Ziv-Welch (LZW)

GIF is not in public domain
Some people were suspicious regarding the
announcement of CompuServe that it was getting a
license from Unisys
In programming community it was known for many
years prior to this that GIF used LZW and that
LZW was patented by Unisys

20
Lempel-Ziv-Welch (LZW)

Some people were suspicious regarding the
announcement of CompuServe that it was getting a
license from Unisys
Unisys claimed that CompuServe only found out
rather late that this was the case
GIF was becoming an integral part of WWW for
exchanging low-resolution graphics

21
Lempel-Ziv-Welch (LZW)

Eventually, Unisys LZW patent and licensing
agreements held
Unisys reduced license fees after 1995
Unisys wouldn't charge anything for inadvertent
infringement by GIF software products delivered
prior to 1995
License fees still required for updates delivered
after 1995

22
Lempel-Ziv-Welch (LZW)

Not illegal to own, transmit, or receive GIF
files, just to compress or decompress them
without a license

23
Lempel-Ziv-Welch (LZW)
24
Lempel-Ziv-Welch (LZW)
25
Lempel-Ziv-Welch (LZW)
26
JPEG

Joint Photographic Experts Group
1982
ISO (International Organization for Standardization)
formed Photographic Experts Group (PEG)
Develop methods of transmitting video, images and
text over ISDN (Integrated Services Digital
Network) lines

27
JPEG

1986
Subgroup of CCITT (International Telegraph and
Telephone Consultative Committee) began to look
at methods of compressing color and gray-scale
data for fax transmission
Methods for this were similar to those being
considered by PEG

28
JPEG

1987
Two groups combined into JPEG
Most previous compression methods did poor job of
compressing continuous-tone image data

29
JPEG

Very few file formats can support 24-bit raster
images
GIF only works for 256 colors
LZW doesn't work well on scanned image data
TIFF and BMP didn't compress this type of image
data very well

30
JPEG

JPEG compresses continuous tone image data with a
pixel depth of 6-24 bits with good efficiency
JPEG itself doesn't define standard file format

31
JPEG

Toolkit of methods with quality-compression
trade-off
Lossy
Discards information that human eye cannot easily
see
Slight changes in color not perceived well
Slight changes in intensity are well perceived

32
JPEG

Works well with color or gray-scale continuous
tone images: photographs, video stills, complex
graphics which resemble natural objects
Doesn't work well for animations, ray tracing,
line art, black-and-white documents, and typical
vector graphics

33
JPEG

End-user can tune quality of JPEG encoder through
use of Q-factor, which ranges from 1 to 100
Q-factor 1 produces smallest, worst quality
images
Q-factor 100 produces largest, best quality
images
Optimal value of Q-factor is image dependent

34
JPEG

JPEG introduces artifacts in images containing
large areas of a single color
JPEG is slow if implemented in software
Baseline JPEG
Minimal subset of JPEG which all JPEG-aware
applications are required to support

35
JPEG
36
JPEG

Color transform
Encodes each component in a color model
separately
Is independent of any color space model

37
JPEG

Color transform
Best compression ratios result if a luminance
(gray scale)/chrominance (color) color space,
such as YUV, is used
Human eyes more sensitive to luminance
information (Y) than to chrominance information
(U, V)
The other models spread human sensitive
information across each of their 3 components
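As a rough illustration of a luminance/chrominance transform, the sketch below converts one RGB pixel to YCbCr. It assumes the full-range BT.601 weights used by JFIF files; YCbCr is a scaled and offset relative of the YUV space named on the slide, and the slide itself does not give coefficients.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr), JFIF-style full range.

    Y carries the luminance; Cb and Cr carry the chrominance, centered on 128.
    Results are floats and are not rounded or clamped here.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

gray = rgb_to_ycbcr(128, 128, 128)  # neutral gray: chrominance stays at the center
red = rgb_to_ycbcr(255, 0, 0)       # pure red: Cr rises above 128, Cb falls below
```

A gray pixel puts all of its information into Y, which is the property that makes the chrominance channels cheap to compress.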

38
JPEG

Down-sampling
Average groups of pixels together
To exploit humans' lesser sensitivity to
chrominance information, we use fewer pixels for
the chrominance channels
In an image of 1000 × 1000 pixels, we might use
1000 × 1000 luminance pixels, but only
500 × 500 chrominance pixels
Each chrominance pixel covers the same area as a
2 × 2 block of luminance pixels

39
JPEG

Down-sampling
For each 2 × 2 block, we can store 6 pixel values
(4 luminance values and 2 chrominance values, 1
for each of 2 channels)
instead of 12
(4 pixel values for each of 3 channels)
This 50% reduction in data has almost no
perceivable effect
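The arithmetic on the two down-sampling slides can be checked with a short sketch. It assumes simple 2 × 2 averaging and even image dimensions; real encoders may use other filters, and the function names are illustrative.

```python
def downsample_2x2(channel):
    """Average each 2 x 2 block of a channel given as a list of rows."""
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[r][c] + channel[r][c + 1]
             + channel[r + 1][c] + channel[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]

def stored_values(width, height):
    """Full-resolution luminance plus two half-resolution chrominance channels."""
    luminance = width * height
    chrominance = 2 * (width // 2) * (height // 2)
    return luminance + chrominance

per_block = stored_values(2, 2)            # 4 luminance + 2 chrominance = 6
full_image = stored_values(1000, 1000)     # 1,500,000 instead of 3,000,000
averaged = downsample_2x2([[1, 3], [5, 7]])  # one chrominance value: 4.0
```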

40
JPEG

Discrete cosine transform
For each color channel, the image data is divided
into 8 × 8 blocks
DCT applied to each block
Low-order, or DC, term represents average value
in the block
Successive higher-order, or AC, terms represent
the strength of more rapid changes across the
block
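A direct, unoptimized sketch of the 8 × 8 forward DCT may make the DC and AC terms concrete. It assumes the usual JPEG normalization (DCT-II with a 1/4 factor and 1/sqrt(2) on the zero-frequency terms), under which the DC term equals 8 times the block average rather than the average itself. Production encoders use fast factorizations instead of these four nested loops.

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Forward 2-D DCT-II of an N x N block (list of rows)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs

# A perfectly flat block has only a DC term; every AC term is (numerically) zero.
flat_block = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat_block)
```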

41
JPEG

Discrete cosine transform
Can discard high-frequency data
DCT is lossless except for roundoff errors
DCT is most costly step in JPEG

42
JPEG

Scan-order of each 8 × 8 block of pixels for DCT
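The figure for this slide is not in the transcript. Assuming it showed the zigzag order that baseline JPEG uses to read out the 64 positions of a block (low frequencies first), that order can be generated as follows; zigzag_order is an illustrative name.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order

first_positions = zigzag_order()[:6]
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Reading the block this way tends to group the high-frequency positions, which are often zero after quantization, at the end of the sequence.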

43
JPEG

An 8 × 8 block from an 8 bit image

44
JPEG

The DCT coefficients corresponding to the
previous 8 × 8 block

45
JPEG

Quantization
Divide DCT output by a quantization coefficient
and round result to integer
The larger the coefficient, the more data is lost
Each of the 64 positions of the DCT output block
has its own coefficient
Higher order terms have a larger coefficient
Different coefficients for luminance and
chrominance channels

46
JPEG

Quantization
This is the step controlled by the quality-factor
Selecting quantization coefficients is an art

47
JPEG

Sample quantization table
Coefficients based on human perception

48
JPEG

Labels
Label labij corresponding to the quantized value
of the transform coefficient cij is
labij = ⌊cij / Qij + 0.5⌋
where Qij is the (i,j)th element of the
quantization table
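The quantization step (divide each coefficient by its table entry and round to the nearest integer) can be sketched as below. The 2 × 2 coefficient and table values are made up for illustration; a real block has 64 entries, and the rounding rule floor(c / Q + 0.5) is assumed.

```python
import math

def quantize(coeffs, table):
    """Label each coefficient as floor(c / Q + 0.5), i.e. round to nearest."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def dequantize(labels, table):
    """Decoder side: multiply each label by its table entry."""
    return [[label * q for label, q in zip(l_row, q_row)]
            for l_row, q_row in zip(labels, table)]

# Hypothetical corner of a coefficient block and of a quantization table.
coeffs = [[39.88, 6.56], [-102.43, 4.56]]
table = [[16, 11], [12, 12]]

labels = quantize(coeffs, table)       # [[2, 1], [-9, 0]]
restored = dequantize(labels, table)   # [[32, 11], [-108, 0]]
```

The gap between coeffs and restored is the information lost; a larger table entry widens that gap, and small coefficients collapse to a label of 0.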

49
JPEG

Quantizer labels corresponding to the previous
8 × 8 block

50
Encoding

Huffman compress resulting coefficients
Can use arithmetic coding as well

51
Huffman Coding

Lossless
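The Huffman slides that follow are figures without transcript text, so here is a generic sketch of building a Huffman code from symbol frequencies. It is the standard greedy construction, not JPEG's table-driven variant, and the exact bit patterns depend on how ties are broken.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return {symbol: bit string} for a sequence of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
encoded_bits = sum(len(codes[s]) for s in "aaaabbc")  # 10 bits instead of 56
```

Frequent symbols get short codes and no code is a prefix of another, so the bit stream decodes without separators.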

52
Huffman Coding
53
Huffman Coding
54
Arithmetic Coding

Lossless

String cadd
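The worked figure for this slide is missing from the transcript. As a stand-in, the sketch below encodes the string cadd with exact fractions. The probability model is an assumption: each symbol gets a slice of [0, 1) proportional to its count in the string itself, with symbols in alphabetical order, which may differ from the model on the original slide.

```python
from collections import Counter
from fractions import Fraction

def build_model(text):
    """Give each symbol a sub-interval of [0, 1) proportional to its frequency."""
    counts = Counter(text)
    total = len(text)
    model, low = {}, Fraction(0)
    for symbol in sorted(counts):
        width = Fraction(counts[symbol], total)
        model[symbol] = (low, low + width)
        low += width
    return model

def arithmetic_encode(text, model):
    """Narrow [0, 1) once per symbol; any number in the result identifies text."""
    low, high = Fraction(0), Fraction(1)
    for symbol in text:
        span = high - low
        sym_low, sym_high = model[symbol]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

def arithmetic_decode(value, length, model):
    """Recover `length` symbols from a number inside the final interval."""
    out = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in model.items():
            if sym_low <= value < sym_high:
                out.append(symbol)
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)

model = build_model("cadd")   # a: [0, 1/4), c: [1/4, 1/2), d: [1/2, 1)
low, high = arithmetic_encode("cadd", model)   # [19/64, 5/16)
decoded = arithmetic_decode(low, 4, model)     # "cadd"
```

Practical coders use integer arithmetic with renormalization rather than unbounded fractions; the fractions here only keep the interval arithmetic easy to follow.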
55
Arithmetic Coding
56
JPEG Extensions

Progressive
For applications that need to receive JPEG data
streams and display them on the fly
Baseline JPEG image can be displayed only after
all of the image data has been received

57
JPEG Extensions

Progressive
Instead of interlacing, where a majority of the
image must be sent to be able to tell what it is,
we send successively better resolution images
Lossless JPEG

58
Fractal Compression

Suppose we have a linear, non-identity function
of one variable, g, having xf as a fixed point
g(xf) = xf
We can compute the fixed point by the
approximation x, g(x), g(g(x)), g(g(g(x))),
..., where x is any initial approximation

59
Fractal Compression

Example
g(x) = ax + b
Fixed point is solution to xf = a·xf + b, or
xf = b / (1 − a)
For a = 0.5, b = 1, we have that xf = 2

60
Fractal Compression

Example
To calculate the fixed point by the previous
approximation, use the initial guess 1 and
calculate g(1), g(g(1)), g(g(g(1))), ..., where
g(x) = x/2 + 1
The approximations are 1.5, 1.75, 1.875, 1.9375,
..., which converges to 2, the fixed point
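The iteration in this example is easy to reproduce. The sketch below applies g(x) = x/2 + 1 repeatedly; iterate is an illustrative helper. Convergence relies on g being contractive (slope of magnitude below 1), which holds here because the slope is 0.5.

```python
def iterate(g, x0, steps):
    """Return [g(x0), g(g(x0)), ...] with `steps` entries."""
    values = []
    x = x0
    for _ in range(steps):
        x = g(x)
        values.append(x)
    return values

def g(x):
    return x / 2 + 1

approximations = iterate(g, 1, 4)     # [1.5, 1.75, 1.875, 1.9375]
far_start = iterate(g, 100, 80)[-1]   # a different initial guess also approaches 2
```

Each step halves the distance to 2, so the starting value affects only how many steps are needed, not the limit.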

61
Fractal Compression

Given an image I, treated as an array of
integers, suppose we have a non-identity function
g(I) = I
If it was cheaper to encode g than to encode I,
we could communicate g and reconstruct I by the
sequence of approximations I0, g(I0), g(g(I0)),
g(g(g(I0))), ..., where I0 is the all zero image

62
Fractal Compression

Partition image into equal size range blocks
For each range block, Rk, find a domain block,
Dk, twice the size of a range block, and a
function gk such that gk(Dk) is as close as
possible to Rk

63
Fractal Compression

Consider the function
This function has a fixed point If = g(If), where

64
Fractal Compression

gk is a composition of a geometric transformation
followed by a massic transformation
Geometric transformation
Moves domain block
Changes the size of the domain block
Massic transformation
Adjusts intensity and orientation of pixels

65
Fractal Compression
