Data Compression - PowerPoint PPT Presentation

Slides: 66
Provided by: billg96
Learn more at: http://web.cs.wpi.edu

1
Data Compression
2
Terminology
  • Physical versus logical
  • Physical
  • Performed on data regardless of what information
    it contains
  • Translates a series of bits to another series of
    bits
  • Logical
  • Knowledge-based
  • Change United Kingdom to UK

3
Terminology
  • Symmetric
  • Compression and decompression roughly use the
    same techniques and take just as long
  • Data transmission which requires compression and
    decompression on-the-fly will require these types
    of algorithms

4
Terminology
  • Asymmetric
  • Most common is where compression takes a lot more
    time than decompression
  • In an image database, each image will be
    compressed once and decompressed many times
  • Less common is where decompression takes a lot
    more time than compression
  • Creating many backup files which will hardly ever
    be read

5
Terminology
  • Non-adaptive
  • Contain a static dictionary of predefined
    substrings to encode which are known to occur
    with high frequency
  • Adaptive
  • Dictionary is built from scratch

6
Terminology
  • Semi-adaptive
  • In pass 1, an optimal dictionary is constructed
  • In pass 2, the actual compression occurs

7
Terminology
  • Lossless
  • decompress(compress(data)) = data
  • Lossy
  • decompress(compress(data)) ≠ data
  • A small change in pixel values may be invisible,
    however
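The two definitions above can be checked directly. The following is a minimal sketch, assuming Python's standard zlib module as the lossless codec. The lossy scheme (dropping the two low bits of each pixel value) and the sample pixel values are made up for illustration and do not come from the slides.

```python
import zlib


def lossy_compress(pixels):
    # Hypothetical lossy step: keep only the top 6 bits of each 8-bit value.
    return [p >> 2 for p in pixels]


def lossy_decompress(codes):
    # The dropped low bits cannot be recovered; they come back as zeros.
    return [c << 2 for c in codes]


# Lossless: the round trip reproduces the input exactly.
data = b"United Kingdom " * 100
assert zlib.decompress(zlib.compress(data)) == data

# Lossy: the round trip changes the data, but only by a small amount.
pixels = [200, 201, 203, 198]
restored = lossy_decompress(lossy_compress(pixels))
assert restored != pixels
assert max(abs(a - b) for a, b in zip(pixels, restored)) <= 3
```

Each restored pixel differs from the original by at most 3 intensity levels, the kind of small change the last bullet describes as possibly invisible.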

8
Pixel Packing
9
Run-Length Encoding
  • Repeating string of characters, called a run, is
    coded into two bytes
  • First byte contains the run count, one less than
    the number of repetitions
  • Second byte contains the run value, the character
    being repeated

10
Run-Length Encoding
  • 77777zzzyyyyyyV becomes 472z5y0V
  • 15 byte string becomes 8 bytes long
  • Compression ratio of almost 2 to 1
  • Some strings become twice as long
  • 7fu5JLY9jhYIujG
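The run-length scheme described on these two slides can be sketched as follows. This is an illustrative version, not the presenter's code. It stores the count as a raw byte value (0 to 255) rather than as the digit characters shown in the example, and it caps each run at 256 repetitions so the count fits in one byte; both choices are assumptions.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as two bytes: (repetitions - 1, value)."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # One count byte holds 0..255, so a pair covers at most 256 repetitions.
        while i + run < len(data) and data[i + run] == data[i] and run < 256:
            run += 1
        out.append(run - 1)
        out.append(data[i])
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand (count, value) pairs back into the original runs."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        out.extend(bytes([packed[k + 1]]) * (packed[k] + 1))
    return bytes(out)


good = rle_encode(b"77777zzzyyyyyyV")   # 15 bytes in, 8 bytes out
bad = rle_encode(b"7fu5JLY9jhYIujG")    # 15 bytes in, 30 bytes out
```

The second string has no repeated neighbours, so every input byte costs two output bytes, which is the doubling the slide warns about.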

12
Lempel-Ziv-Welch (LZW)
  • Lossless
  • GIF, TIFF, V.42bis modem compression standard,
    PostScript Level 2
  • Substitutional or dictionary-based
  • Algorithm builds a data dictionary
  • Code emitted if pattern found in dictionary,
    while if not already in dictionary, it is added
  • Not necessary to have dictionary to do
    decompression
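The dictionary-building idea above can be sketched in a few lines. This is a textbook-style illustration, assuming byte input, an unbounded dictionary, and plain integer codes; real formats such as GIF and TIFF add variable code widths, clear codes, and dictionary size limits that are omitted here.

```python
def lzw_compress(data: bytes) -> list:
    """Return a list of integer codes for the input bytes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code for the longest match
            table[candidate] = len(table)    # add the new pattern
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the data; the dictionary is reconstructed, never transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry the compressor had just created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)


text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
assert lzw_decompress(codes) == text
```

The decompressor starts from the same 256 single-byte entries and adds one entry per code it reads, which is why the dictionary does not have to accompany the compressed data.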

13
Lempel-Ziv-Welch (LZW)
  • History
  • 1977
  • Abraham Lempel and Jacob Ziv published a paper on
    a universal data compression algorithm
  • Called LZ77
  • 1978
  • Lempel and Ziv formulated an improved,
    dictionary-based data compression algorithm
  • Called LZ78

14
Lempel-Ziv-Welch (LZW)
  • History
  • 1981
  • While working for Sperry, Lempel and Ziv, with
    some other researchers, filed for a patent for
    LZ78
  • Granted in 1984
  • 1984
  • While working for Sperry, Terry Welch modified
    LZ78
  • Result was LZW algorithm
  • Published in IEEE Computer

15
Lempel-Ziv-Welch (LZW)
  • History
  • 1985
  • Sperry granted a patent for Welch's modification
    and for implementation of LZW
  • 1986
  • Sperry and Burroughs merged to form Unisys
  • Ownership of Sperry patent transferred to Unisys

16
Lempel-Ziv-Welch (LZW)
  • History
  • 1987
  • CompuServe created GIF file format
  • Required use of LZW algorithm
  • Didn't check patents for LZW
  • Unisys also didn't realize GIF used LZW
  • 1988
  • Aldus released Revision 5.0 of TIFF file format
  • Used LZW algorithm
  • 1990
  • Unisys licensed Adobe for use of LZW patent for
    PostScript

17
Lempel-Ziv-Welch (LZW)
  • History
  • 1991
  • Unisys licensed Aldus for use of LZW patent in
    TIFF
  • 1993
  • Unisys became aware the GIF file format used LZW
  • Negotiations began with CompuServe

18
Lempel-Ziv-Welch (LZW)
  • History
  • 1994
  • Unisys and CompuServe came to an understanding
    under which CompuServe would license the LZW
    algorithm for the application of the GIF file
    format in software used primarily to access the
    CompuServe Information Service
  • 1995
  • America Online and Prodigy also entered into
    license agreements with Unisys for LZW

19
Lempel-Ziv-Welch (LZW)
  • GIF is not in the public domain
  • Some people were suspicious regarding the
    announcement of CompuServe that it was getting a
    license from Unisys
  • In the programming community it was known for many
    years prior to this that GIF used LZW and that
    LZW was patented by Unisys

20
Lempel-Ziv-Welch (LZW)
  • Some people were suspicious regarding the
    announcement of CompuServe that it was getting a
    license from Unisys
  • Unisys claimed that CompuServe only found out
    rather late that this was the case
  • GIF was becoming an integral part of WWW for
    exchanging low-resolution graphics

21
Lempel-Ziv-Welch (LZW)
  • Eventually, Unisys's LZW patent and licensing
    agreements held
  • Unisys reduced license fees after 1995
  • Unisys wouldn't charge anything for inadvertent
    infringement by GIF software products delivered
    prior to 1995
  • License fees still required for updates delivered
    after 1995

22
Lempel-Ziv-Welch (LZW)
  • Not illegal to own, transmit, or receive GIF
    files, just to compress or decompress them
    without a license

23
Lempel-Ziv-Welch (LZW)
24
Lempel-Ziv-Welch (LZW)
25
Lempel-Ziv-Welch (LZW)
26
JPEG
  • Joint Photographic Experts Group
  • 1982
  • ISO (International Organization for Standardization)
    formed Photographic Experts Group (PEG)
  • Develop methods of transmitting video, images and
    text over ISDN (Integrated Services Digital
    Network) lines

27
JPEG
  • 1986
  • Subgroup of CCITT (International Telegraph and
    Telephone Consultative Committee) began to look
    at methods of compressing color and gray-scale
    data for fax transmission
  • Methods for this were similar to those being
    considered by PEG

28
JPEG
  • 1987
  • Two groups combined into JPEG
  • Most previous compression methods did a poor job of
    compressing continuous-tone image data

29
JPEG
  • Very few file formats can support 24-bit raster
    images
  • GIF only works for 256 colors
  • LZW doesn't work well on scanned image data
  • TIFF and BMP didn't compress this type of image
    data very well

30
JPEG
  • JPEG compresses continuous tone image data with a
    pixel depth of 6-24 bits with good efficiency
  • JPEG itself doesn't define a standard file format

31
JPEG
  • Toolkit of methods with quality-compression
    trade-off
  • Lossy
  • Discards information that human eye cannot easily
    see
  • Slight changes in color not perceived well
  • Slight changes in intensity are well perceived

32
JPEG
  • Works well with color or gray-scale continuous
    tone images: photographs, video stills, complex
    graphics which resemble natural objects
  • Doesn't work well for animations, ray tracing,
    line art, black-and-white documents, and typical
    vector graphics

33
JPEG
  • End-user can tune quality of JPEG encoder through
    use of Q-factor, which ranges from 1-100
  • Q-factor 1 produces smallest, worst quality
    images
  • Q-factor 100 produces largest, best quality
    images
  • Optimal value of Q-factor is image dependent

34
JPEG
  • JPEG introduces artifacts in images containing
    large areas of a single color
  • JPEG is slow if implemented in software
  • Baseline JPEG
  • Minimal subset of JPEG which all JPEG-aware
    applications are required to support

35
JPEG
36
JPEG
  • Color transform
  • Encodes each component in a color model
    separately
  • Is independent of any color space model

37
JPEG
  • Color transform
  • Best compression ratios result if a luminance
    (gray scale)/chrominance (color) color space,
    such as YUV, is used
  • Human eyes more sensitive to luminance
    information (Y) than to chrominance information
    (U, V)
  • The other models spread human sensitive
    information across each of their 3 components
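A luminance/chrominance split of this kind can be sketched as below. The weights are the common analog YUV (ITU-R BT.601) ones, chosen here as an assumption; the slides do not say which variant they have in mind, and JPEG files in practice usually carry a scaled and offset relative of this transform.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using BT.601 luminance weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (gray-scale value)
    u = 0.492 * (b - y)                     # blue-difference chrominance
    v = 0.877 * (r - y)                     # red-difference chrominance
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv (up to floating-point roundoff)."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


gray = rgb_to_yuv(128, 128, 128)   # a neutral gray has no chrominance
```

For a gray pixel both chrominance components come out as essentially zero, so all of the information sits in Y, the component the eye is most sensitive to.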

38
JPEG
  • Down-sampling
  • Average groups of pixels together
  • To exploit humans' lesser sensitivity to
    chrominance information, we use fewer pixels for
    the chrominance channels
  • In an image of 1000 × 1000 pixels, we might use
    1000 × 1000 luminance pixels, but only
    500 × 500 chrominance pixels
  • Each chrominance pixel covers the same area as a
    2 × 2 block of luminance pixels

39
JPEG
  • Down-sampling
  • For each 2 × 2 block, we can store 6 pixel values
  • 4 luminance values and 2 chrominance values (1
    for each of 2 channels)
  • instead of 12
  • 4 pixel values for each of 3 channels
  • This 50% reduction in data has almost no
    perceivable effect
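The averaging and the storage arithmetic can be sketched as follows. This is a sketch assuming each channel is a list of rows with even height and width; the function names are hypothetical.

```python
def downsample_2x2(plane):
    """Average each 2 x 2 block of a channel (even height and width assumed)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def stored_values(height, width):
    """Values kept with full-size luminance and two half-size chrominance planes."""
    luminance = height * width
    chrominance = 2 * (height // 2) * (width // 2)
    return luminance + chrominance


small = downsample_2x2([[1, 3], [5, 7]])             # one averaged value: 4.0
ratio = stored_values(1000, 1000) / (3 * 1000 * 1000)  # 0.5
```

For the 1000 × 1000 example this keeps 1,500,000 values instead of 3,000,000, matching the 6-versus-12 count per 2 × 2 block.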

40
JPEG
  • Discrete cosine transform
  • For each color channel, the image data is divided
    into 8 × 8 blocks
  • DCT applied to each block
  • Low-order, or DC, term represents average value
    in the block
  • Successive higher-order, or AC, terms represent
    the strength of more rapid changes across the
    block

41
JPEG
  • Discrete cosine transform
  • Can discard high-frequency data
  • DCT is lossless except for roundoff errors
  • DCT is most costly step in JPEG
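The 8 × 8 transform can be written out directly from its definition. The sketch below uses the usual DCT-II normalization, which is an assumption since the slides do not show the formula, and it is a plain quadruple loop rather than the fast factorizations real encoders use.

```python
import math

N = 8  # JPEG block size


def _scale(k):
    # Normalization factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise.
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(position, frequency):
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))


def dct_8x8(block):
    """Forward 2-D DCT of an 8 x 8 block (a list of 8 rows of 8 numbers)."""
    return [
        [
            0.25 * _scale(u) * _scale(v) * sum(
                block[x][y] * _basis(x, u) * _basis(y, v)
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def idct_8x8(coeffs):
    """Inverse transform; recovers the block up to roundoff."""
    return [
        [
            0.25 * sum(
                _scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                for u in range(N) for v in range(N)
            )
            for y in range(N)
        ]
        for x in range(N)
    ]


flat = [[100] * N for _ in range(N)]
coeffs = dct_8x8(flat)   # DC term is about 800; every AC term is about 0
```

With this normalization the DC term equals 8 times the block average, so a uniform block of 100s gives roughly 800 in position (0, 0) and nothing elsewhere. Feeding any block through `dct_8x8` and then `idct_8x8` returns the original values to within floating-point roundoff, which is the sense in which the DCT itself is lossless.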

42
JPEG
  • Scan-order of each 8 × 8 block of pixels for DCT

43
JPEG
  • An 8 × 8 block from an 8 bit image

44
JPEG
  • The DCT coefficients corresponding to the
    previous 8 × 8 block

45
JPEG
  • Quantization
  • Divide DCT output by a quantization coefficient
    and round result to integer
  • The larger the coefficient, the more data is lost
  • Each of the 64 positions of the DCT output block
    has its own coefficient
  • Higher order terms have a larger coefficient
  • Different coefficients for luminance and
    chrominance channels

46
JPEG
  • Quantization
  • This is the step controlled by the quality-factor
  • Selecting quantization coefficients is an art

47
JPEG
  • Sample quantization table
  • Coefficients based on human perception

48
JPEG
  • Labels
  • Label $\mathrm{lab}_{ij}$ corresponding to the quantized value
    of the transform coefficient $c_{ij}$ is
    $\mathrm{lab}_{ij} = \lfloor c_{ij} / Q_{ij} + 0.5 \rfloor$
  • where $Q_{ij}$ is the (i,j)th element of the
    quantization table
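The divide-and-round step can be sketched as below, assuming round-half-up, that is, floor(c / Q + 0.5). The coefficient and table values in the usage lines are made-up illustration values, and a 2 × 2 corner is used only to keep the example short; the functions work on any rectangular block.

```python
import math


def quantize(coeffs, table):
    """Map each DCT coefficient to its integer label: floor(c / Q + 0.5)."""
    return [
        [math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(labels, table):
    """Decoder side: reconstruct approximate coefficients as label * Q."""
    return [
        [label * q for label, q in zip(l_row, q_row)]
        for l_row, q_row in zip(labels, table)
    ]


# Hypothetical values for illustration only.
coeffs = [[39.88, 6.56], [-102.43, 4.56]]
table = [[16, 11], [12, 12]]
labels = quantize(coeffs, table)          # [[2, 1], [-9, 0]]
approx = dequantize(labels, table)        # [[32, 11], [-108, 0]]
```

The reconstructed coefficients differ from the originals by at most half a quantization step, and a larger step loses more, which is the trade-off the quality factor controls.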

49
JPEG
  • Quantizer labels corresponding to the previous
    8 × 8 block

50
Encoding
  • Huffman-compress the resulting coefficients
  • Can use arithmetic coding as well

51
Huffman Coding
  • Lossless
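Since the slide diagrams are not in the transcript, here is a minimal sketch of how a Huffman code is built: repeatedly merge the two least frequent subtrees, prefixing 0 to the codes on one side and 1 to the other. It uses Python's heapq and collections.Counter; the tie-breaking order and the sample string are assumptions, and JPEG itself applies Huffman coding to run/size symbols rather than to raw characters.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}   # a lone symbol still needs one bit
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # least frequent subtree
        w2, _, right = heapq.heappop(heap)   # second least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_code("aaaabbc")
bits = "".join(codes[ch] for ch in "aaaabbc")   # 10 bits instead of 7 bytes
```

The most frequent symbol gets the shortest code, and because no code is a prefix of another, the bit stream can be decoded without separators.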

52
Huffman Coding
53
Huffman Coding
54
Arithmetic Coding
  • Lossless
  • Example string: cadd

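The interval-narrowing idea behind arithmetic coding can be sketched with exact fractions. The symbol probabilities below (a: 1/4, c: 1/4, d: 1/2) are an assumption taken from the symbol counts of the string cadd itself, since the slide's own table is not in the transcript. Practical coders use fixed-width integer arithmetic with renormalization instead of unbounded fractions.

```python
from fractions import Fraction


def build_ranges(probabilities):
    """Assign each symbol a sub-interval of [0, 1) sized by its probability."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges


def arithmetic_encode(message, ranges):
    """Narrow [0, 1) once per symbol; any number in the result encodes the message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = ranges[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def arithmetic_decode(value, length, ranges):
    """Recover `length` symbols from a number inside the final interval."""
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)


# Assumed model, not taken from the slides.
ranges = build_ranges({"a": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(1, 2)})
low, high = arithmetic_encode("cadd", ranges)
decoded = arithmetic_decode(low, 4, ranges)
```

Under this model the final interval is [19/64, 5/16). Its width, 1/64, is the product of the four symbol probabilities, so about 6 bits are enough to name a number inside it.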
55
Arithmetic Coding
56
JPEG Extensions
  • Progressive
  • For applications that need to receive JPEG data
    streams and display them on the fly
  • Baseline JPEG image can be displayed only after
    all of the image data has been received

57
JPEG Extensions
  • Progressive
  • Instead of interlacing, where a majority of the
    image must be sent to be able to tell what it is,
    we send successively better resolution images
  • Lossless JPEG

58
Fractal Compression
  • Suppose we have a linear, non-identity function
    of one variable, g, having $x_f$ as a fixed point
  • $g(x_f) = x_f$
  • We can compute the fixed point by the
    approximation x, g(x), g(g(x)), g(g(g(x))),
    ..., where x is any initial approximation

59
Fractal Compression
  • Example
  • $f(x) = ax + b$
  • Fixed point is solution to $x_f = a x_f + b$, or
    $x_f = b / (1 - a)$
  • For $a = 0.5$, $b = 1$, we have that $x_f = 2$

60
Fractal Compression
  • Example
  • To calculate the fixed point by the previous
    approximation, use the initial guess 1 and
    calculate g(1), g(g(1)), g(g(g(1))), ..., where
    $g(x) = x/2 + 1$
  • The approximations are 1.5, 1.75, 1.875, 1.9375,
    ..., which converge to 2, the fixed point
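The iteration in this example is short enough to run directly. A minimal sketch, with a hypothetical helper name:

```python
def iterate(g, x0, steps):
    """Return g(x0), g(g(x0)), ... for the given number of steps."""
    values = []
    x = x0
    for _ in range(steps):
        x = g(x)
        values.append(x)
    return values


def g(x):
    return x / 2 + 1   # a = 0.5, b = 1


approximations = iterate(g, 1, 4)    # [1.5, 1.75, 1.875, 1.9375]
from_zero = iterate(g, 0, 50)[-1]    # a different starting guess, same limit
```

Each step halves the distance to 2, so the starting guess does not matter. That property is what lets a decoder start from an arbitrary initial value and still arrive at the same fixed point.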

61
Fractal Compression
  • Given an image I, treated as an array of
    integers, suppose we have a non-identity function
    g such that $g(I) = I$
  • If it were cheaper to encode g than to encode I,
    we could communicate g and reconstruct I by the
    sequence of approximations $I_0$, $g(I_0)$, $g(g(I_0))$,
    $g(g(g(I_0)))$, ..., where $I_0$ is the all-zero image

62
Fractal Compression
  • Partition image into equal size range blocks
  • For each range block, $R_k$, find a domain block,
    $D_k$, twice the size of a range block, and a
    function $g_k$ such that $g_k(D_k) \approx R_k$

63
Fractal Compression
  • Consider the function
  • This function has a fixed point $I_f = g(I_f)$, where

64
Fractal Compression
  • $g_k$ is a composition of a geometric transformation
    followed by a massic transformation
  • Geometric transformation
  • Moves domain block
  • Changes the size of the domain block
  • Massic transformation
  • Adjusts intensity and orientation of pixels

65
Fractal Compression