CSC 2300 Data Structures

Provided by: stude6
Learn more at: http://www.cs.rpi.edu
Transcript and Presenter's Notes


1
CSC 2300 Data Structures and Algorithms
  • April 27, 2007
  • Chap. 10. Algorithm Design Techniques

2
Today
  • File Compression
  • Huffman Code

3
ASCII
  • What does ASCII stand for?
  • The ASCII character set consists of about 100
    printable characters.
  • How many bits to represent these characters?
  • The set includes some nonprintable characters.
  • An 8th bit is added as a parity bit.

4
Example
  • A file with only the characters a, e, i, s, t,
    blankspace, newline.
  • There are seven characters, and so three bits are
    sufficient.
  • i see a seat
  • 010101011001001101000101011001000100110 (39
    bits)
  • How to do better?
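
The fixed-length encoding on this slide can be reproduced with a short sketch. The alphabet order (a, e, i, s, t, blank, newline, numbered from 0) and the trailing newline on the message are assumptions inferred from the 39-bit string above; the helper name `fixed_length_code` is made up for illustration.

```python
from math import ceil, log2

# Assumed alphabet order, inferred from the bit string on the slide.
ALPHABET = ["a", "e", "i", "s", "t", " ", "\n"]

def fixed_length_code(alphabet):
    """Give every symbol a code of the same width: ceil(log2(n)) bits."""
    width = max(1, ceil(log2(len(alphabet))))
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}

code = fixed_length_code(ALPHABET)

# Assumed message: the slide's text followed by a newline (13 symbols).
message = "i see a seat\n"
bits = "".join(code[ch] for ch in message)
print(bits, len(bits))
```

Seven symbols need ceil(log2 7) = 3 bits each, so 13 symbols cost 39 bits, whatever their frequencies are.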

5
Binary Tree
  • Binary tree
  • The data reside only at the leaves.
  • Can you improve this representation?

6
Example
  • newline becomes 11
  • i see a seat
  • 01010101100100110100010101100100010011 (38 bits)
  • A reduction of 1 bit.
  • Want more significant improvement.
  • How?

7
The Two Trees
  • What can you say about the structure of the
    better tree?
  • It is a full tree.
  • All nodes either are leaves or have two children.
  • An optimal code will always have this property.
  • Why?
  • Nodes with only one child can always move up one
    level.

8
Prefix Code
  • If the characters are placed only at the leaves,
    the given sequence of bits can be decoded
    unambiguously.
  • Prefix code: no character code is a prefix of
    another character code.
  • Example: 0100111100010110001000111
  • What is it?
  • is
  • a tie
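
A minimal sketch of why a prefix code decodes unambiguously: read bits one at a time and emit a character as soon as the buffer matches a codeword. The table `CODE` is an assumption reconstructed from the earlier slides (the 3-bit codes, with newline shortened to 11), and `decode` is a hypothetical helper, not something from the course materials.

```python
# Assumed code table: 3-bit codes for six symbols, newline shortened to "11".
CODE = {"a": "000", "e": "001", "i": "010", "s": "011",
        "t": "100", " ": "101", "\n": "11"}

def decode(bits, code):
    """Decode a bit string with a prefix code by greedy matching."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        # No codeword is a prefix of another, so the first match is the only one.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

print(repr(decode("0100111100010110001000111", CODE)))
```

Under this table the 25 bits decode to "is", newline, "a tie", newline. The greedy loop is only correct because of the prefix property; with an arbitrary variable-length code, an early match could be wrong.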

9
Optimal Prefix Code
  • Binary tree
  • How to find optimal code?

10
Our Example
  • i see a seat
  • 1011000000101110011100000010010001 (34 bits)
  • The code in the table is not optimal for our
    example.
  • Why not?
  • Exercise. Find the optimal code for our example.

11
Huffman's Algorithm
  • Assume that there are C characters.
  • Maintain a forest of trees.
  • The weight of a tree is equal to the sum of the
    frequencies of its leaves.
  • For C - 1 times, select the two trees T1 and T2
    of smallest weights, breaking ties arbitrarily,
    and form a new tree with subtrees T1 and T2.
  • At the beginning, there are C single-node trees.
    At the end, there is one single tree, which is
    the optimal Huffman coding tree.
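
The merging loop described above can be sketched with Python's `heapq` as the priority queue. This is an illustration under stated assumptions, not the course's implementation: the function name `huffman_tree` and the tuple-based tree representation are made up here, the input must contain at least one symbol, and the frequencies are counted from the message "i see a seat" plus a newline rather than taken from the slides' figures.

```python
import heapq
from itertools import count

def huffman_tree(freqs):
    """Build a Huffman tree from a {symbol: frequency} mapping.

    A leaf is a symbol (a string); an internal node is a (left, right) tuple.
    Returns (tree, total_bits), where total_bits is the encoded length.
    """
    tiebreak = count()  # unique ids so equal weights never compare trees
    forest = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(forest)                      # C single-node trees
    total_bits = 0
    for _ in range(len(freqs) - 1):            # exactly C - 1 merges
        w1, _, t1 = heapq.heappop(forest)      # smallest weight
        w2, _, t2 = heapq.heappop(forest)      # second smallest
        total_bits += w1 + w2                  # each merge adds one bit per leaf below it
        heapq.heappush(forest, (w1 + w2, next(tiebreak), (t1, t2)))
    return forest[0][2], total_bits

# Assumed frequencies: counts in "i see a seat" followed by a newline.
freqs = {"a": 2, "e": 3, "i": 1, "s": 2, "t": 1, " ": 3, "\n": 1}
tree, total_bits = huffman_tree(freqs)
print(total_bits)
```

Ties may be broken differently by another implementation, which changes the shape of the tree but not the total cost.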

12
Example
  • Initial stage
  • After first merge

13
Example
  • After first merge
  • After second merge
  • After third merge

14
Example
  • After third merge
  • After fourth merge

15
Example
  • After fourth merge
  • After fifth merge

16
Example
  • After fifth merge
  • After final merge

17
Implementation
  • If we maintain the trees in a priority queue,
    ordered by weight, what is the running time?
  • O(C log C).
  • We say that Huffman's method is a two-pass
    algorithm. What are the two passes?
  • The first pass collects the frequency data and the
    second pass performs the encoding.
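
The two passes can be made explicit in a small end-to-end sketch. The names `huffman_codes` and `compress` are hypothetical, and the input is assumed to contain at least two distinct symbols. For brevity this version copies a code table at every merge, which costs more than the O(C log C) tree-based construction.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Return {symbol: bit string}; assumes at least two distinct symbols."""
    # Each heap entry carries the partial code table of one tree in the forest.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)  # unique tiebreaker so dicts are never compared
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Symbols in the first tree gain a leading 0, in the second a leading 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def compress(text):
    freqs = Counter(text)                      # pass 1: collect frequencies
    codes = huffman_codes(freqs)
    bits = "".join(codes[ch] for ch in text)   # pass 2: encode
    return codes, bits

codes, bits = compress("i see a seat\n")
print(len(bits))
```

A decoder needs the code table (or the frequencies) as well as the bits, so a real compressed file would also have to store one of them.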