Data Compression and Data Structures - PowerPoint PPT Presentation

Slides: 12
Provided by: sandyg

1
Data Compression and Data Structures
  • Computing Away from the Computer
  • Sandy Graham
  • University of Waterloo

2
Outline
  • How is data represented?
  • What is data/text compression?
  • How does Huffman coding work?
  • What are the required data structures?

3
How is data represented?
  • All data (text, numbers, colours) is represented
    by a binary number
  • Binary numbers have only two digits: 0 and 1
  • Eight binary digits grouped together form a byte
  • Simple text (i.e. letters, numbers, punctuation)
    can be represented with a single byte (ASCII
    code)
  • Other text (e.g. Chinese characters, emoticons)
    is represented with Unicode, which needs two or
    more bytes per character depending on the
    encoding
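
To make the byte counts concrete, here is a small Python sketch. The sample characters and the choice of the UTF-16 encoding are assumptions made for illustration; they are not part of the original slides.

```python
# Each character maps to a number, which is stored as binary digits.
letter = "A"
code_point = ord(letter)            # the number behind the letter (65)
bits = format(code_point, "08b")    # eight binary digits = one byte

# Count the bytes needed for three sample characters.
ascii_bytes = letter.encode("ascii")        # simple text: one byte
chinese_bytes = "中".encode("utf-16-be")     # a Chinese character: two bytes
emoji_bytes = "😀".encode("utf-16-be")       # an emoticon: four bytes

print(bits, len(ascii_bytes), len(chinese_bytes), len(emoji_bytes))
# prints: 01000001 1 2 4
```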

4
What is data compression?
  • Reduce the size of the original file
  • examples: zip, jpeg, mp3
  • lossless vs. lossy
  • Text compression often relies upon patterns in
    the English language
  • different frequencies of letters - Huffman coding
  • patterns of letters - LZW
  • Compression ratio = (size of the output
    stream) / (size of the input stream)
  • efficiency of the algorithm often depends on the
    data
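
The compression ratio, and how much it depends on the data, can be sketched in Python. This sketch uses the standard zlib module (the method behind zip files) rather than Huffman coding alone; the helper name compression_ratio and the two sample inputs are assumptions made for illustration.

```python
import zlib

def compression_ratio(data: bytes) -> float:
    """Return (size of the output stream) / (size of the input stream)."""
    compressed = zlib.compress(data)
    return len(compressed) / len(data)

# Highly repetitive input: many patterns to exploit, so the ratio is small.
repetitive = b"abc" * 1000

# Every byte value exactly once: no patterns, so the output is not smaller.
varied = bytes(range(256))

print(compression_ratio(repetitive))
print(compression_ratio(varied))
```

A ratio below 1 means the file shrank; a ratio above 1 means the "compressed" version is actually larger than the original.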

5
How does Huffman coding work?
  • n bits can provide codes for 2^n different
    characters (e.g. 4 bits can provide enough codes
    for 2^4 = 16 different characters)
  • Letters that appear more frequently in the text
    use shorter codes than letters that appear less
    frequently
  • The code for any letter must not be the prefix of
    any other code
  • Use a binary tree to determine a prefix-free code
    for all characters in the original text
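
The prefix rule can be checked mechanically. The following Python sketch is illustrative only: the function name is_prefix_free and both code tables are hypothetical examples, not taken from the slides.

```python
from itertools import permutations

def is_prefix_free(codes: dict) -> bool:
    """True if no code is the prefix of another code in the table."""
    words = list(codes.values())
    # Compare every ordered pair of codes taken from different entries.
    return not any(b.startswith(a) for a, b in permutations(words, 2))

# Hypothetical table: frequent letters get the shorter codes.
good = {"e": "0", "t": "10", "a": "110", "q": "111"}

# Hypothetical broken table: "0" is a prefix of "01".
bad = {"e": "0", "t": "01", "a": "11"}

print(is_prefix_free(good))  # prints: True
print(is_prefix_free(bad))   # prints: False
```

With the broken table, a decoder that reads a 0 cannot tell whether it has seen an "e" or the start of a "t", which is why the rule is needed.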

6
What are the required data structures? - Tree
[Figure: e.g. a binary tree with three levels,
labelled: root, branch, data, node, leaf]

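
A binary tree like the one in the figure could be represented as follows. This is a minimal Python sketch; the class name Node, its field names, the levels helper, and the labels stored in each node are assumptions chosen to match the labels in the figure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    data: str                        # the data stored in this node
    left: Optional["Node"] = None    # left branch (None if absent)
    right: Optional["Node"] = None   # right branch (None if absent)

    def is_leaf(self) -> bool:
        """A leaf is a node with no branches below it."""
        return self.left is None and self.right is None

def levels(node: Optional[Node]) -> int:
    """Count the levels in the tree, with the root as level 1."""
    if node is None:
        return 0
    return 1 + max(levels(node.left), levels(node.right))

# A binary tree with three levels: a root, two inner nodes, four leaves.
root = Node("root",
            Node("inner 1", Node("leaf 1"), Node("leaf 2")),
            Node("inner 2", Node("leaf 3"), Node("leaf 4")))

print(levels(root))  # prints: 3
```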
7
Activity 1
  • Fill in the chart of characters and their codes
  • use the binary tree to find the path to each
    letter
  • the left branch in the tree represents a 0, the
    right branch in the tree represents a 1
  • Encode characters using the chart
  • ensure the codes are prefix-free
  • Decode the message using the tree or chart
  • calculate the frequencies of the characters
  • ensure the more frequent characters have the
    shorter codes
  • Calculate the compression ratio
  • how many bits would it take to code the message
    without compression?
  • how many bits are in the compressed version of
    the text?
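
The steps of this activity can be sketched in Python. The tree used on the original slide is not reproduced in this transcript, so the tree and the message below are hypothetical; a leaf is written as a character and an inner node as a (left, right) pair.

```python
# Hypothetical tree: "e" is nearest the root, so it gets the shortest code.
tree = ("e", ("t", ("a", "s")))

def build_chart(node, path=""):
    """Follow every path from the root: left branch = 0, right branch = 1."""
    if isinstance(node, str):        # reached a leaf: the path is its code
        return {node: path}
    left, right = node
    chart = build_chart(left, path + "0")
    chart.update(build_chart(right, path + "1"))
    return chart

def encode(message, chart):
    return "".join(chart[ch] for ch in message)

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):    # a leaf ends the current character
            out.append(node)
            node = tree              # start again from the root
    return "".join(out)

chart = build_chart(tree)            # {'e': '0', 't': '10', 'a': '110', 's': '111'}
message = "tease"
bits = encode(message, chart)        # '1001101110'

uncompressed_bits = 8 * len(message) # one byte per character
ratio = len(bits) / uncompressed_bits
print(bits, decode(bits, tree), ratio)
# prints: 1001101110 tease 0.25
```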

8
More data structures - Queue
[Figure: a queue, labelled: front, rear]
  • You can only affect data in the queue at the
    front or at the rear
  • Enqueue - add a value to the rear of the queue
  • Dequeue - remove the value from the front of the
    queue
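
The two queue operations can be tried out in Python with the standard collections.deque type. The variable names and sample values are assumptions made for illustration.

```python
from collections import deque

queue = deque()

# Enqueue: add values to the rear of the queue.
queue.append("first")
queue.append("second")
queue.append("third")

# Dequeue: remove the value at the front of the queue.
front = queue.popleft()

print(front)        # prints: first
print(list(queue))  # prints: ['second', 'third']
```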

9
More data structures - Priority Queue
  • Similar to a regular queue
  • Enqueue (with an associated priority)
  • Dequeue the highest priority item
  • Intuitive implementation - keep a list sorted in
    order by priority
  • Efficient implementation - a heap
  • a complete binary tree in which no parent node
    has lower priority than its children
  • nicely implemented with an array
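
A heap stored in an array can be sketched with Python's standard heapq module. One assumption to note: heapq treats the smallest number as the highest priority, which suits Huffman coding, where the lowest frequencies are removed first. The sample items are made up for illustration.

```python
import heapq

heap = []  # the heap lives in an ordinary list (array)

# Enqueue each item with an associated priority.
for priority, item in [(3, "c"), (1, "a"), (2, "b")]:
    heapq.heappush(heap, (priority, item))

# In the array, the parent of position i sits at position (i - 1) // 2,
# and no parent comes after its child in priority order.
heap_property_holds = all(
    heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap))
)

# Dequeue the highest priority item (here: the smallest number).
first = heapq.heappop(heap)

print(heap_property_holds, first)  # prints: True (1, 'a')
```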

10
Activity 2 - Creating the Tree
  • 1. Create a single node tree for each character.
    Include the frequency of each character in the
    node. Arrange the trees in order by frequency
    from lowest frequency to highest. You have
    created a queue.
  • 2. While there is more than one tree in the
    queue:
  • a. Remove the two trees from the front of the
    queue (lowest frequencies in their root nodes).
  • b. Create a new tree with these two subtrees as
    the left and right branches. The new root node
    contains the sum of the frequencies of the roots
    of the left and right branches.
  • c. Insert the new tree into the queue in a
    position that will keep the roots in order by
    frequency. In the case of a tie, insert the new
    tree towards the end of the queue.
  • 3. The remaining tree is the Huffman Coding tree.
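
The steps above can be sketched in Python. This is an illustration under stated assumptions, not the presenter's own code: the function name build_huffman_tree is hypothetical, the text is assumed to be non-empty, each tree is stored as a (frequency, node) pair, and a node is either a character or a (left, right) pair.

```python
from collections import Counter

def build_huffman_tree(text):
    # Step 1: one single-node tree per character, lowest frequency first.
    queue = sorted(((freq, ch) for ch, freq in Counter(text).items()),
                   key=lambda tree: tree[0])

    # Step 2: repeat while there is more than one tree in the queue.
    while len(queue) > 1:
        # a. Remove the two trees at the front (lowest frequencies).
        freq1, left = queue.pop(0)
        freq2, right = queue.pop(0)

        # b. Join them under a new root holding the sum of the frequencies.
        new_tree = (freq1 + freq2, (left, right))

        # c. Insert in frequency order; on a tie, go after the equal trees.
        position = 0
        while position < len(queue) and queue[position][0] <= new_tree[0]:
            position += 1
        queue.insert(position, new_tree)

    # Step 3: the remaining tree is the Huffman coding tree.
    return queue[0]

total, tree = build_huffman_tree("aaaabbbccd")
print(total, tree)
# prints: 10 ('a', ('b', ('d', 'c')))
```

In the result, "a" (the most frequent letter) sits directly below the root and gets a one-bit code, while "c" and "d" sit three branches down.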

11
Activity 3 - Another algorithm to build the
Huffman code tree
  • 1. Start with as many leaves as there are
    symbols.
  • 2. Enqueue all leaf nodes into the first queue
    (by frequency in increasing order so that the
    lowest frequency letter is at the head of the
    queue). The second queue starts out empty.
  • 3. While there is more than one tree in the
    queues:
  • a. Dequeue the two trees with the
    lowest frequencies by examining the roots of the
    trees at the fronts of both queues. Note that
    you may end up dequeuing from the same queue
    twice. In the case of a tie, choose the tree
    with fewer levels.
  • b. Create a new tree, with the two
    just-removed trees as children (either tree can
    be either child) and the sum of the frequencies
    of the roots as the value of the root of the new
    tree.
  • c. Enqueue the new tree into the rear
    of the second queue.
  • 4. The remaining node is the root node; the tree
    has now been generated.
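
The two-queue version can be sketched in Python as follows. The function name is hypothetical and the text is assumed to be non-empty. The tie rule is handled by preferring the first queue: it only ever holds single leaves, which have fewer levels than any tree in the second queue.

```python
from collections import Counter, deque

def build_huffman_tree_two_queues(text):
    # Steps 1-2: one leaf per symbol, enqueued lowest frequency first.
    # Each tree is a (frequency, node) pair.
    leaves = deque(sorted(((freq, ch) for ch, freq in Counter(text).items()),
                          key=lambda tree: tree[0]))
    merged = deque()  # the second queue starts out empty

    def dequeue_lowest():
        # Compare the roots at the fronts of both queues; on a tie take
        # the leaf, since it has fewer levels than a merged tree.
        if not merged or (leaves and leaves[0][0] <= merged[0][0]):
            return leaves.popleft()
        return merged.popleft()

    # Step 3: repeat while there is more than one tree in the queues.
    while len(leaves) + len(merged) > 1:
        freq1, first = dequeue_lowest()
        freq2, second = dequeue_lowest()
        # New trees always go to the rear of the second queue.
        merged.append((freq1 + freq2, (first, second)))

    # Step 4: the single remaining tree is the finished tree.
    return (merged or leaves)[0]

total, tree = build_huffman_tree_two_queues("aaaabbbccd")
print(total, tree)
# prints: 10 ('a', ('b', ('d', 'c')))
```

Because new trees are created in increasing order of frequency, the second queue stays sorted without any searching, so each step only looks at the two fronts.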