Data Compression Basics - PowerPoint PPT Presentation

Slides: 21
Provided by: hat2

Transcript and Presenter's Notes


1
Data Compression Basics: Huffman Coding
  • Motivation of Data Compression.
  • Lossless and Lossy Compression Techniques.
  • Static Lossless Compression: Huffman Coding.
  • Correctness of Huffman Coding: the prefix property.

2
Why Data Compression?
  • Data storage and transmission cost money. This
    cost increases with the amount of data available.
  • This cost can be reduced by processing the data
    so that it takes less memory and less
    transmission time.
  • Data transmission can be made faster by using
    better transmission media or by compressing the
    data.
  • The compression algorithms covered here reduce
    the size of the data without changing its
    content. Examples:
      • Huffman coding
      • Run-Length coding
      • Lempel-Ziv coding

3
Lossless and Lossy Compression Techniques
  • Data compression techniques are broadly
    classified into lossless and lossy.
  • Lossless techniques enable exact reconstruction
    of the original document from the compressed
    information while lossy techniques do not.
  • Run-length, Huffman and Lempel-Ziv are lossless
    while JPEG and MPEG are lossy techniques.
  • Lossy techniques usually achieve higher
    compression ratios than lossless ones, but only
    lossless techniques reproduce the data exactly.
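
To make "exact reconstruction" concrete, here is a minimal run-length coding sketch in Python. The function names `rle_encode` and `rle_decode` and the sample string are illustrative assumptions, not part of the original slides.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (char, run_length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand (char, run_length) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


message = "aaaabbbcca"  # hypothetical sample input
encoded = rle_encode(message)
restored = rle_decode(encoded)

print(encoded)
print(restored == message)  # lossless: decoding gives back the exact input
```

A lossy scheme, by contrast, would be allowed to return something that only approximates `message`.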

4
Lossless and Lossy Compression Techniques (cont'd)
  • Lempel-Ziv maps variable-length input strings to
    fixed-length output codes, while Huffman coding
    does the opposite: it maps fixed-length input
    symbols to variable-length codewords.
  • Lossless techniques are classified into static
    and adaptive.
  • In a static scheme, like Huffman coding, the data
    is first scanned to obtain statistical
    information before compression begins.
  • Adaptive models like Lempel-Ziv begin with an
    initial statistical distribution of the text
    symbols but modify this distribution as each
    character or word is encoded.
  • Adaptive schemes fit the text more closely, but
    static schemes involve fewer computations and
    are faster.

5
Introduction to Huffman Coding
  • What is the likelihood that all symbols in a
    message to be transmitted have the same number of
    occurrences?
  • Huffman coding assigns codewords of different
    lengths to characters based on their frequency
    of occurrence in the given message: the more
    frequent a character, the shorter its codeword.
  • The string to be transmitted is first analysed to
    find the relative frequencies of its constituent
    characters.
  • The coding process generates a binary tree, the
    Huffman code tree, with branches labeled with
    bits (0 and 1).
  • The Huffman tree must be sent with the compressed
    information so that the receiver can decode the
    message.
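
The frequency analysis described above can be sketched in a few lines of Python. The message string is a made-up example and the variable names are assumptions chosen for illustration.

```python
from collections import Counter

message = "tennessee"  # hypothetical message, not taken from the slides

counts = Counter(message)  # absolute number of occurrences per character
total = len(message)
relative = {ch: n / total for ch, n in counts.items()}

for ch, n in counts.most_common():
    print(ch, n, round(relative[ch], 3))
```

Characters near the top of this listing are the ones that should receive the shortest codewords.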

6
Example 1 Huffman Coding
  • Example 1: Information to be transmitted over the
    internet contains the following characters with
    their associated frequencies, as shown in the
    table below.
  • Use the Huffman technique to answer the following
    questions:
  • 1. Build the Huffman code tree for the message.
  • 2. Use the Huffman tree to find the codeword for
    each character.
  • 3. If the data consists of only these characters,
    what is the total number of bits to be
    transmitted? What is the percentage saving
    compared with sending the data as 8-bit ASCII
    values without compression?
  • 4. Verify that your computed Huffman codewords are
    correct.

Characters   a    e    l    n    o    s    t
Frequency    45   65   13   45   18   22   53

7
Example 1 Huffman Coding (Solution)
  • Solution: The Huffman coding process uses a
    priority queue of binary trees keyed by
    frequency.
  • We begin by filling the priority queue with
    one-node binary trees, each containing a symbol
    and its frequency count.
  • The initial priority queue is built by arranging
    the one-node binary trees in increasing order of
    frequency.
  • The tree with the lowest frequency is at the
    front of the queue.
  • At each step, the priority queue is manipulated
    as outlined next.

8
Example 1 Huffman Coding (Solution)
  • The priority queue is manipulated as follows:
  • 1. Dequeue two trees from the front of the queue.
  • 2. Construct a new binary tree from the two trees
    as follows:
      • a. Use the two dequeued trees as the left and
        right subtrees of the new tree.
      • b. Give the new tree the priority that is the
        sum of the priorities of its left and right
        subtrees.
  • 3. Enqueue the new tree with that priority.
  • 4. Repeat this process until only one tree is
    left in the priority queue.
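
The procedure above can be sketched in Python with the standard `heapq` module standing in for the priority queue. This is a minimal sketch, not the presenter's implementation: the names `build_huffman_tree` and `assign_codes` and the tuple layout of the tree are assumptions. The frequencies are those of Example 1.

```python
import heapq
from itertools import count


def build_huffman_tree(freqs):
    """Return the root of a Huffman tree as nested tuples.

    A leaf is (weight, symbol); an internal node is (weight, left, right).
    """
    tiebreak = count()  # unique counter so equal weights never compare the trees
    heap = [(w, next(tiebreak), (w, sym)) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1 and 2: dequeue the two lowest-priority trees and merge them.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = (w1 + w2, left, right)
        # Step 3: enqueue the merged tree with the summed priority.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def assign_codes(node, prefix="", table=None):
    """Label left branches 0 and right branches 1, collecting leaf codewords."""
    if table is None:
        table = {}
    if len(node) == 2:  # leaf: (weight, symbol)
        table[node[1]] = prefix or "0"
    else:  # internal node: (weight, left, right)
        assign_codes(node[1], prefix + "0", table)
        assign_codes(node[2], prefix + "1", table)
    return table


freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
root = build_huffman_tree(freqs)
codes = assign_codes(root)
total_bits = sum(freqs[sym] * len(codes[sym]) for sym in freqs)
print(codes)
print(total_bits)
```

Ties between equal frequencies and the choice of left versus right subtree are arbitrary, so the individual codewords printed here can differ from those in the slides. The codeword lengths and the total number of bits are the same.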

9
Example 1 Huffman Coding Step 1
  • Initial priority queue, front first:
    l(13) o(18) s(22) n(45) a(45) t(53) e(65)

10
Example 1 Solution (cont'd)
  • Queue after merging l and o into a tree of
    weight 31, front first:
    s(22) 31[l, o] n(45) a(45) t(53) e(65)

11
Example 1 Solution (cont'd)
  • Queue after merging s and the weight-31 tree into
    a tree of weight 53, front first:
    n(45) a(45) 53[s, 31[l, o]] t(53) e(65)

12
Example 1 Solution (cont'd)
  • Queue after merging n and a into a tree of
    weight 90, front first:
    53[s, 31[l, o]] t(53) e(65) 90[n, a]

13
Example 1 Solution (cont'd)
  • Queue after merging the weight-53 tree and t into
    a tree of weight 106, front first:
    e(65) 90[n, a] 106[53[s, 31[l, o]], t]

14
Example 1 Solution (cont'd)
  • Queue after merging e and the weight-90 tree into
    a tree of weight 155, front first:
    106[53[s, 31[l, o]], t] 155[e, 90[n, a]]

15
Example 1 Solution (cont'd)
  • Merging the last two trees gives the final
    Huffman tree, of weight 261:
    261[106[53[s, 31[l, o]], t], 155[e, 90[n, a]]]

16
Example 1 Solution (cont'd)
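  • Final Huffman tree with every left branch labeled
    0 and every right branch labeled 1:
  • 261
      0: 106
        0: 53
          0: s
          1: 31
            0: l
            1: o
        1: t
      1: 155
        0: e
        1: 90
          0: n
          1: a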

18
Example 1 Solution (cont'd)
  • The sequence of zeros and ones on the arcs of the
    path from the root to each terminal node is the
    codeword for that character:

    Character   a     e    l      n     o      s     t
    Codeword    111   10   0010   110   0011   000   01

  • If we assume the message consists of only the
    characters a, e, l, n, o, s and t, then the
    number of bits transmitted will be
  • 2×65 + 2×53 + 3×45 + 3×45 + 3×22 + 4×18 + 4×13
    = 696 bits
  • If the message is sent uncompressed with 8-bit
    ASCII representation for the characters, we have
  • 261 × 8 = 2088 bits, i.e. we saved about 67% of
    the transmission time.

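
The bit counts on this slide can be checked with a short Python calculation. The dictionaries restate the frequencies and codewords of Example 1. The variable names are assumptions made for this sketch.

```python
freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = {"a": "111", "e": "10", "l": "0010", "n": "110",
         "o": "0011", "s": "000", "t": "01"}

# Each character contributes (frequency x codeword length) bits.
compressed_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)

# Uncompressed: 8 bits per character.
ascii_bits = 8 * sum(freqs.values())

saving = 1 - compressed_bits / ascii_bits
print(compressed_bits, ascii_bits, f"{saving:.1%}")  # 696 2088 66.7%
```

The saving is 1392 of 2088 bits, roughly two thirds. This count leaves out the cost of sending the Huffman tree itself.
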
19
Example 1 Solution The Prefix Property
  • Data encoded using Huffman coding is uniquely
    decodable. This is because Huffman codes satisfy
    an important property called the prefix property.
  • This property guarantees that no codeword is a
    prefix of another Huffman codeword.
  • For example, 10 and 101 cannot simultaneously be
    valid Huffman codewords because the first is a
    prefix of the second.
  • Thus, any bitstream produced with a given Huffman
    code can be decoded in only one way.
  • We can see by inspection that the codewords we
    generated (shown in the preceding slide) satisfy
    the prefix property.
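
The inspection can also be done mechanically. The Python sketch below checks the prefix property for the codewords of Example 1 and then uses it to decode a bitstream one bit at a time. The helper names `is_prefix_free` and `decode` and the sample word are illustrative assumptions.

```python
def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode(bits, codes):
    """Decode a bit string by emitting a symbol as soon as a codeword matches.

    This greedy matching is only safe because the code is prefix-free.
    """
    lookup = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"trailing bits {current!r} do not form a codeword")
    return "".join(out)


codes = {"a": "111", "e": "10", "l": "0010", "n": "110",
         "o": "0011", "s": "000", "t": "01"}

print(is_prefix_free(codes.values()))   # True
print(is_prefix_free(["10", "101"]))    # False: 10 is a prefix of 101

encoded = "".join(codes[ch] for ch in "lean")  # hypothetical sample word
print(encoded)                 # 001010111110
print(decode(encoded, codes))  # lean
```

If a bitstream ends in the middle of a codeword, `decode` raises an error instead of guessing.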

20
Exercises
  • Using the Huffman tree constructed in this
    session, decode the following sequence of bits,
    if possible. Otherwise, where does the decoding
    fail?
  • 10100010111010001000010011
  • Using the Huffman tree constructed in this
    session, write the bit sequences that encode the
    messages
  • test, state, telnet, notes
  • Mention one disadvantage of a lossless
    compression scheme and one disadvantage of a
    lossy compression scheme.
  • Write a Java program that implements the Huffman
    coding algorithm.