Huffman code and Lossless Decomposition - PowerPoint PPT Presentation

Provided by: Lee144

Transcript and Presenter's Notes

1
Huffman code and Lossless Decomposition
CS 157 B Lecture 14

Prof. Sin-Min Lee
Department of Computer Science

2
Data Compression

The data representations discussed so far use a
FIXED length for each symbol.
For data transfer in particular, this method is
inefficient.
For speed and storage efficiency, data symbols
should be represented with the minimum number of
bits possible.

3
Data Compression

Methods Used For Compression
1. Encode high-probability symbols with fewer bits:
   Shannon-Fano, Huffman, UNIX compact
2. Encode sequences of symbols with the location of
   the sequence in a dictionary:
   PKZIP, ARC, GIF, UNIX compress, V.42bis
3. Lossy compression:
   JPEG and MPEG

4
Data Compression

Average code length
Instead of the length of individual code
symbols or words, we want to know the
behavior of the complete information source

5
Data Compression

Average code length
Assume that the symbols of a source alphabet
$a_1, a_2, \dots, a_M$ are generated with probabilities
$p_1, p_2, \dots, p_M$:
$P(a_i) = p_i \quad (i = 1, 2, \dots, M)$
Assume that each symbol of the source alphabet is
encoded with a code of length $l_1, l_2, \dots, l_M$, respectively.

6
Data Compression

Average code length
Then the average code length, $L$, of an
information source is given by
$L = \sum_{i=1}^{M} p_i l_i$
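
The average code length is the probability-weighted sum of the code lengths, $L = \sum_i p_i l_i$. A minimal Python sketch of that sum follows; the function name `average_code_length` and the four-symbol source are illustrative assumptions, not values taken from the slides.

```python
def average_code_length(probabilities, lengths):
    """Return the weighted sum of code lengths, sum(p_i * l_i)."""
    if len(probabilities) != len(lengths):
        raise ValueError("need one code length per symbol probability")
    return sum(p * l for p, l in zip(probabilities, lengths))


# Hypothetical source: four symbols with code lengths of 1, 2, 3 and 3 bits.
probs = [0.5, 0.25, 0.125, 0.125]
lens = [1, 2, 3, 3]
print(average_code_length(probs, lens))  # 1.75
```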

7
Data Compression

Variable Length Bit Codings
Rules
1. Use minimum number of bits
AND
2. No code is the prefix of another code
AND
3. Enables left-to-right, unambiguous decoding

8
Data Compression

Variable Length Bit Codings
No code is a prefix of another
For example, we can't have A map to 10 and B map
to 100, because 10 is a prefix (the start) of 100.
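
The prefix rule can be checked mechanically. The sketch below compares every ordered pair of codewords; the helper name `is_prefix_free` and the second code table are assumptions made for illustration.

```python
from itertools import permutations


def is_prefix_free(codes):
    """Return True if no codeword in the {symbol: bits} table starts another."""
    words = list(codes.values())
    return not any(b.startswith(a) for a, b in permutations(words, 2))


print(is_prefix_free({"A": "10", "B": "100"}))           # False: 10 starts 100
print(is_prefix_free({"A": "0", "B": "10", "C": "11"}))  # True
```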

9
Data Compression

Variable Length Bit Codings
Enables left-to-right, unambiguous decoding
That is, if you see 10, you know it's A, not
the start of another character.
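
With a prefix-free code, a decoder can emit a symbol as soon as the bits read so far match a codeword. The following is a minimal sketch of such a left-to-right decoder; the function name `decode` and the three-symbol code table are hypothetical.

```python
def decode(bits, codes):
    """Decode a bit string left to right using a prefix-free {symbol: bits} table."""
    inverse = {word: symbol for symbol, word in codes.items()}
    decoded = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:          # a complete codeword has been read
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(decoded)


codes = {"A": "10", "B": "0", "C": "11"}   # illustrative prefix-free code
print(decode("10011", codes))              # ABC
```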

10
Data Compression

Variable Length Bit Codings
Suppose A appears 50 times in text, but B
appears only 10 times
ASCII coding assigns 8 bits per character, so
total bits for A and B is 60 × 8 = 480
If A gets a 4-bit code and B gets a 12-bit
code, total is 50 × 4 + 10 × 12 = 320
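
The arithmetic on this slide can be reproduced in a few lines. The variable names are illustrative; the counts and code lengths are the ones stated above.

```python
counts = {"A": 50, "B": 10}            # occurrences in the text
variable_lengths = {"A": 4, "B": 12}   # code lengths used in the example

fixed_bits = sum(counts.values()) * 8  # 8-bit ASCII for every character
variable_bits = sum(counts[s] * variable_lengths[s] for s in counts)

print(fixed_bits, variable_bits)       # 480 320
```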

11
Data Compression

Variable Length Bit Codings
Example

Average code length = 1.75

12
Data Compression

Variable Length Bit Codings
Question
Is this the best that we can get?

13
Data Compression

Huffman code
Constructed by using a code tree, but starting at
the leaves
A compact code constructed using the binary
Huffman code construction method

14
Data Compression

Huffman code Algorithm
1. Make a leaf node for each code symbol
   Add the generation probability of each symbol to
   the leaf node
2. Take the two nodes with the smallest
   probability and connect them into a new node
   Add 1 or 0 to each of the two branches
   The probability of the new node is the sum of the
   probabilities of the two connected nodes
3. If there is only one node left, the code
   construction is completed. If not, go back to (2)
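
The construction above can be sketched in Python. This is one possible implementation under stated assumptions, not the lecture's own code: the function name `huffman_codes`, the use of `heapq` as a priority queue, and the example weights are all illustrative choices. Each heap entry carries the partial codes of the symbols below that node, so joining two nodes just prepends a branch bit.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Build a binary Huffman code for a {symbol: weight} mapping."""
    if not weights:
        return {}
    if len(weights) == 1:                      # a lone symbol still needs one bit
        return {next(iter(weights)): "0"}
    tiebreak = count()                         # keeps heap tuples comparable on ties
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)      # the two smallest nodes
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical weights; counts work as well as probabilities.
codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
print({s: len(c) for s, c in sorted(codes.items())})  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

Which branch receives 0 and which receives 1 is arbitrary, so the exact bit patterns may differ from a hand-built tree while the code lengths here stay the same.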

15
Data Compression

Huffman code Example
Character (or symbol) frequencies
A 20% (.20)
B 9% (.09)
C 15% (.15)
D 11% (.11)
E 40% (.40)
F 5% (.05)
e.g., A occurs 20 times in a 100-character
document, 1000 times in a 5000-character
document, etc.
Also works if you use character counts
Must know frequency of every character in the
document

16
Data Compression

Huffman code Example
Symbols and their associated frequencies.
Now we combine the two least common symbols
(those with the smallest frequencies) to make a
new symbol string and corresponding frequency.

[Figure: leaf nodes C .15, A .20, D .11, F .05, B .09, E .40]

17
Data Compression

Huffman code Example
Here's the result of combining symbols once.
Now repeat until you've combined all the symbols
into a single string.

[Figure: nodes C .15, A .20, D .11, E .40, and the new node BF .14 with children F .05 and B .09]

18
Data Compression

Huffman code Example

[Figure: nodes E .40, C .15, A .20, and the new node BFD .25 with children D .11 and BF .14 (BF has children F .05 and B .09)]

19
Data Compression

Now assign 0s/1s to each branch
Codes (reading from top to bottom)
A 010
B 0000
C 011
D 001
E 1
F 0001
Note
No code is a prefix of another

[Figure: completed code tree. Root ABCDEF 1.0 splits into E .40 and ABCDF .60; ABCDF splits into AC .35 (children C .15, A .20) and BFD .25 (children D .11, BF .14); BF splits into F .05 and B .09]

Average Code Length ?
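
One way to work out the average code length asked for here is to weight each codeword length by its symbol's frequency. The sketch below uses the codes and frequencies listed in this example; the variable names are illustrative, and integer counts per 100 characters are used to avoid rounding noise.

```python
codes = {"A": "010", "B": "0000", "C": "011", "D": "001", "E": "1", "F": "0001"}
counts = {"A": 20, "B": 9, "C": 15, "D": 11, "E": 40, "F": 5}  # per 100 characters

total_bits = sum(counts[s] * len(codes[s]) for s in codes)
average = total_bits / sum(counts.values())

print(total_bits, average)  # 234 2.34
```

For comparison, a fixed-length code for six symbols needs 3 bits per symbol.
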
20
Data Compression

Huffman code
There is no unique Huffman code
Assigning 0 and 1 to the branches is arbitrary
If several nodes have the same
probability, it doesn't matter how they are
connected
Every Huffman code has the same average code
length!

21
Data Compression

Huffman code
Quiz
Symbols A, B, C, D, E, F are being produced by
the information source with probabilities 0.3,
0.4, 0.06, 0.1, 0.1, 0.04 respectively.
What is the binary Huffman code?
(a) A 00, B 1, C 0110, D 0100, E 0101, F 0111
(b) A 00, B 1, C 01000, D 011, E 0101, F 01001
(c) A 11, B 0, C 10111, D 100, E 1010, F 10110
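
A quick way to examine the candidate answers is to check each one for the prefix property and compute its average length. The sketch below does this for the three codes in the order listed; the helper names are assumptions, and the probabilities are scaled to integer weights per 100 symbols.

```python
from itertools import permutations

weights = {"A": 30, "B": 40, "C": 6, "D": 10, "E": 10, "F": 4}  # per 100 symbols
candidates = [
    {"A": "00", "B": "1", "C": "0110", "D": "0100", "E": "0101", "F": "0111"},
    {"A": "00", "B": "1", "C": "01000", "D": "011", "E": "0101", "F": "01001"},
    {"A": "11", "B": "0", "C": "10111", "D": "100", "E": "1010", "F": "10110"},
]


def is_prefix_free(code):
    """True if no codeword is the start of another codeword."""
    return not any(b.startswith(a) for a, b in permutations(code.values(), 2))


def total_bits(code):
    """Bits needed to encode 100 symbols drawn with the weights above."""
    return sum(weights[s] * len(code[s]) for s in code)


for code in candidates:
    print(is_prefix_free(code), total_bits(code) / 100)  # True 2.2 for each
```

Under these weights every candidate is prefix-free and averages 2.2 bits per symbol, which fits the earlier remark that different Huffman codes for one source share the same average code length.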

22
Data Compression

Huffman code
Applied extensively
Network data transfer
MP3 audio format
JPEG and PNG image formats
HDTV
Modelling algorithms

26
Lossless Decompositions

Definition: A decomposition of R into (R1, R2) is
called lossless if, for all legal instances
r(R),
$r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$
In other words, projecting on R1 and R2, and
joining back, results in the relation you started
with
Rule: A decomposition of R into (R1, R2) is
lossless iff
$R_1 \cap R_2 \to R_1$ or $R_1 \cap R_2 \to R_2$
is in $F^+$.
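
The rule can be tested by computing the attribute closure of the common attributes R1 ∩ R2 and checking whether it covers R1 or R2. The sketch below assumes single-letter attribute names, schemas written as strings, and functional dependencies given as (left, right) pairs; the function names and the sample schema are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: common attributes must determine R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


fds = [("A", "B"), ("B", "C")]          # A -> B, B -> C on R(A, B, C)
print(is_lossless("AB", "AC", fds))     # True: A determines everything
print(is_lossless("AB", "BC", fds))     # True: B determines BC
print(is_lossless("AC", "BC", fds))     # False: C determines nothing else
```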

28
Exercise
29
Answer
30
Dependency-preserving Decompositions

Is it easy to check if the dependencies in F hold?
Okay as long as the dependencies can be checked
in the same table.
Consider $R = (A, B, C)$ and $F = \{A \to B, B \to C\}$
1. Decompose into $R_1 = (A, B)$ and $R_2 = (A, C)$
Lossless? Yes.
But this makes it hard to check $B \to C$:
the data is in multiple tables.
2. On the other hand, $R_1 = (A, B)$ and $R_2 = (B, C)$
is both lossless and dependency-preserving
Really? What about $A \to C$?
If we can check $A \to B$ and $B \to C$, then $A \to C$ is
implied.

31
Dependency-preserving Decompositions

Definition
Consider a decomposition of R into $R_1, \dots, R_n$.
Let $F_i$ be the set of dependencies in $F^+$ that
include only attributes in $R_i$.
The decomposition is dependency preserving
if
$(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$
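
Dependency preservation can be tested without enumerating the full closure of F. For each dependency X → Y, the sketch below grows X using only what can be derived inside one schema at a time, then checks whether Y was reached. Single-letter attribute names, string schemas, and the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(schemas, fds):
    """True if every FD can be enforced using the decomposed schemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                attrs = set(schema)
                # Attributes of this schema derivable from what it already holds.
                gained = closure(result & attrs, fds) & attrs
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


fds = [("A", "B"), ("B", "C")]                    # A -> B, B -> C
print(preserves_dependencies(["AB", "AC"], fds))  # False: B -> C is lost
print(preserves_dependencies(["AB", "BC"], fds))  # True
```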

32
Example decomposition: lossless but not dependency
preserving
Why?
34
BCNF

Given a relation schema R and a set of
functional dependencies F, if every FD $A \to B$ is
either
1. Trivial, or
2. A is a superkey of R
then R is in BCNF (Boyce-Codd Normal Form)
Why is BCNF good?
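
The two conditions translate directly into a check over the given dependencies: a dependency is acceptable if its right side is contained in its left side (trivial) or if the closure of its left side covers all of R (superkey). The sketch below assumes single-letter attribute names and that every FD in the list applies to the schema being tested; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(schema, fds):
    """True if every given FD is trivial or has a superkey on its left side."""
    attributes = set(schema)
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = attributes <= closure(lhs, fds)
        if not (trivial or superkey):
            return False
    return True


print(is_bcnf("ABC", [("A", "B"), ("B", "C")]))  # False: B is not a superkey
print(is_bcnf("AB", [("A", "B")]))               # True
```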

35
BCNF