1
Review of Information Theory
  • Chapter 2

2
Information
  • Suppose that we have a source alphabet of q
    symbols s1, s2, ..., sq, each with its probability
    p(si) = pi. How much information do we get when we
    receive one of these symbols ?
  • For example, if p1 = 1, then there is no
    surprise and no information, since you know what
    the message must be.

3
Information
  • On the other hand, if the probabilities are all
    very different, then when a symbol with a low
    probability arrives, you feel more surprised and get
    more information than when a symbol with a
    higher probability arrives.
  • I(p): a function which measures the amount of
    information (surprise, uncertainty) in the
    occurrence of an event of probability p.

4

Information
  • Properties of I(p)
  • 1) I(p) ≥ 0 (a real nonnegative measure)
  • 2) I(p1 p2) = I(p1) + I(p2) for independent events.
  • 3) I(p) is a continuous function of p.
  • Define I(p) = log(1/p) = -log p

5
Entropy
  • If we get I(si) units of information when we
    receive the symbol si, how much do we get on the
    average ?
  • Since pi is the probability of getting the
    information I(si), on the average we get, for
    each symbol si,
  • pi I(si) = pi log(1/pi)

6

Entropy
  • Thus, on the average, over the whole alphabet of
    symbols si, we will get Σi pi I(si).
  • Conventionally, we write
  • Hr(S) = Σi pi logr(1/pi)
  • as the entropy of the signaling system S having
    symbols si with probabilities pi.

7
Entropy
  • Note that this is the entropy function for a
    distribution when all that is considered are the
    probabilities pi of the symbols si.
  • Inter-pixel correlation is not considered here.
  • If it is to be considered, we should compute the
    entropy by means of a Markov process.

8
Entropy
  • Example: S = {s1, s2, s3, s4}, p1 = 1/2, p2 = 1/4,
    p3 = p4 = 1/8.
  • Then H2(S) = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8
    + (1/8)log2 8
  • = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3)
  • = 1 3/4 bits of information.
  • In this example, Lavg of the Huffman code can reach
    Hr(S), but this is not the general case.
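  • The entropy sum above is easy to check numerically. Below is a
    minimal Python sketch (the function name entropy is my own, not
    from the slides) that reproduces the 1 3/4-bit result for this
    example.

```python
from math import log

def entropy(probs, r=2):
    """H_r(S) = sum_i p_i * log_r(1/p_i); terms with p_i = 0 contribute 0."""
    return sum(p * log(1.0 / p, r) for p in probs if p > 0)

# Example from the slide: p = 1/2, 1/4, 1/8, 1/8
print(round(entropy([0.5, 0.25, 0.125, 0.125]), 6))  # 1.75 bits
```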

9
Entropy
  • Consider the distribution consisting of just two
    events. Let p be the probability of the first
    symbol (event). Then, the entropy function is
  • H2(p) = p log2(1/p) + (1-p) log2(1/(1-p))
  • The maximum of H2(p) occurs when p = 1/2.
  • Note that H2(p) ≠ I(p).
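  • A quick numerical check of the binary entropy function and its
    maximum at p = 1/2 (a sketch; the helper name H2 is my own):

```python
from math import log2

def H2(p):
    """Binary entropy H2(p) = p*log2(1/p) + (1-p)*log2(1/(1-p))."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1.0 / p) + (1.0 - p) * log2(1.0 / (1.0 - p))

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(H2(p), 4))   # peaks at p = 0.5, where H2 = 1.0
```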

10
Entropy
11
Entropy and Coding
  • Instantaneous codes: a code is said to be
    instantaneous if, when a complete code word is
    received, the receiver immediately knows this
    and does not have to look further before deciding
    which message symbol was received.
  • No code word of such a code is a prefix of any
    other code word.

12
Entropy and Coding
  • Example:
        instantaneous        non-instantaneous
        s1 → 0               s1 → 0
        s2 → 10              s2 → 01
        s3 → 110             s3 → 011
        s4 → 111             s4 → 111
  • 101100111 → s2 s3 s1 s4 (instantaneous code)
    011111111 → s3 s4 s4 (non-instantaneous code)
  • It is necessary and sufficient that an
    instantaneous code have no code word si which is
    a prefix of another code word sj.
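  • The prefix condition and the decoding behaviour can be checked
    directly. The sketch below uses the two codes from this slide;
    the helper names is_prefix_free and decode are my own.

```python
def is_prefix_free(code):
    """True if no code word is a prefix of another (instantaneous code)."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, code):
    """Greedy left-to-right decoding; works symbol by symbol for a prefix-free code."""
    rev = {w: s for s, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in rev:
            out.append(rev[buf])
            buf = ""
    return out

inst     = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
non_inst = {"s1": "0", "s2": "01", "s3": "011", "s4": "111"}

print(is_prefix_free(inst))       # True
print(is_prefix_free(non_inst))   # False ("0" is a prefix of "01" and "011")
print(decode("101100111", inst))  # ['s2', 's3', 's1', 's4']
```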

13
Entropy and Coding
  • Theorem (The Kraft Inequality). A necessary and
    sufficient condition for the existence of an
    instantaneous code S of q symbols si with
    encoded words of lengths l1 ≤ l2 ≤ ... ≤ lq is
  • Σi r^(-li) ≤ 1,
  • where r is the radix (number of symbols) of
    the alphabet of the encoded symbols.
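  • A direct check of the Kraft sum (a sketch; kraft_sum is my own
    helper name):

```python
def kraft_sum(lengths, r=2):
    """Sum of r**(-l) over the code word lengths; an instantaneous code exists iff this is <= 1."""
    return sum(r ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> an instantaneous code exists (e.g. 0, 10, 110, 111)
print(kraft_sum([1, 1, 2]))     # 1.25 -> no instantaneous binary code with these lengths
```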

14
Entropy and Coding
  • Theorem.
  • The entropy supplies a lower bound on the average
    code length Lavg for any instantaneously decodable
    system: Hr(S) ≤ Lavg.

15
Shannon-Fano Coding
  • Shannon-Fano coding is less efficient than
    Huffman coding, but has the advantage that you
    can go directly from the probability pi to the
    code word length li.

16
Shannon-Fano Coding
  • Given the source symbols s1, s2, ..., sq and their
    corresponding probabilities p1, p2, ..., pq, then
    for each pi there is an integer li such that
  • logr(1/pi) ≤ li < logr(1/pi) + 1
  • Then r^(-li) ≤ pi, so Σi r^(-li) ≤ Σi pi = 1: the
    Kraft inequality holds.
  • Therefore, there is an instantaneously decodable
    code having these Shannon-Fano lengths.

17
Shannon-Fano Coding
  • Entropy of the distribution pi:
  • logr(1/pi) ≤ li < logr(1/pi) + 1
  • ⇒ Hr(S) = Σi pi logr(1/pi) ≤ Σi pi li < Hr(S) + 1
  • Thus,
  • Hr(S) ≤ Lavg < Hr(S) + 1
  • The entropy serves as a lower bound and as part of
    the upper bound on the average length for
    Shannon-Fano coding.

18
Shannon-Fano Coding
  • Example: p1 = p2 = 1/4, p3 = p4 = p5 = p6 = 1/8
  • Shannon-Fano lengths:
  • l1 = l2 = 2, l3 = l4 = l5 = l6 = 3
  • We then assign
  • s1 → 00        s3 → 100
  • s2 → 01        s4 → 101
  •                s5 → 110
  •                s6 → 111
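  • The Shannon-Fano lengths li = ⌈logr(1/pi)⌉ for this example can be
    computed directly. A sketch for the binary case; the helper name
    shannon_fano_lengths is my own, and it only produces the lengths,
    not a particular code word assignment.

```python
from math import ceil, log2

def shannon_fano_lengths(probs):
    """Binary Shannon-Fano lengths l_i = ceil(log2(1/p_i)).
    A tiny tolerance guards against floating-point noise when 1/p_i is a power of two."""
    return [ceil(log2(1.0 / p) - 1e-12) for p in probs]

probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]
lengths = shannon_fano_lengths(probs)
print(lengths)                          # [2, 2, 3, 3, 3, 3]
print(sum(2 ** -l for l in lengths))    # 1.0 (Kraft sum, so an instantaneous code exists)
```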

19
Shannon-Fano Coding
  • Since Shannon-Fano coding obeys the Kraft
    inequality, there are always enough code words
    available to assign for an instantaneous code.
  • Thus, an orderly assignment leads readily to the
    decoding tree, and the prefix condition is met.

20
How Bad Is Shannon-Fano Coding ?
  • Consider the source alphabet s1, s2 with
    probabilities
  • p1 = 1 - 1/2^K, p2 = 1/2^K (K ≥ 2)
  • We get
  • log2(1/p1) < log2 2 = 1
  • Thus, l1 = 1. But for l2, we have
  • log2(1/p2) = log2 2^K = K = l2
  • Hence, where Huffman encoding gives both code
    words one binary digit, Shannon-Fano gives s1 a
    1-bit word and s2 a K-bit word.

21
How Bad Is Shannon-Fano Coding ?
  • Consider Lavg for Huffman encoding and for
    Shannon-Fano encoding.
  • Clearly, Lavg(Huffman) = 1.

22
How Bad Is Shannon-Fano Coding ?
  • For Shannon-Fano, we have
  • Lavg = 1·(1 - 1/2^K) + K·(1/2^K) = 1 + (K-1)/2^K

    K    Lavg = 1 + (K-1)/2^K
    2    1 + 1/4  = 1.25
    3    1 + 1/4  = 1.25
    4    1 + 3/16 = 1.1875
    5    1 + 1/8  = 1.125
    6    1 + 5/64 = 1.078125
    etc.
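  • The table above can be reproduced with a few lines (a sketch with
    my own variable names): Huffman gives both code words length 1,
    while Shannon-Fano gives s2 a K-bit word.

```python
from math import ceil, log2

for K in range(2, 7):
    p1, p2 = 1 - 2 ** -K, 2 ** -K
    l1 = ceil(log2(1 / p1) - 1e-12)   # = 1, since p1 > 1/2
    l2 = ceil(log2(1 / p2) - 1e-12)   # = K
    L_sf = p1 * l1 + p2 * l2          # = 1 + (K-1)/2**K
    L_huffman = 1.0                   # both code words get one binary digit
    print(K, L_sf, L_huffman)
```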

23
Extensions of a Code
  • If we encode not one symbol at a time, but make
    up a code for blocks of source symbols at a
    time, then we can expect to get a better
    approximation to the lower bound Hr(S),
    because the probabilities of the extension are
    more variable than the original probabilities.
  • Definition: The n-th extension of a source
    alphabet s1, s2, ..., sq consists of the symbols of
    the form si1 si2 ... sin, having the probabilities
    Qi = pi1 pi2 ... pin. Each block of n original
    symbols now becomes a single symbol ti with
    probability Qi. We label this alphabet S^n = T.

24
Extensions of a Code
  • Example:                    Encoding
    s1    p1 = 2/3              s1 → 0
    s2    p2 = 1/3              s2 → 1        Lavg = 1
  • The second extension takes two symbols at a time.
                                Encoding
    s1s1    p11 = 4/9           s1s1 → 1
    s1s2    p12 = 2/9           s1s2 → 01
    s2s1    p21 = 2/9           s2s1 → 000
    s2s2    p22 = 1/9           s2s2 → 001
  • Lavg = 17/18 ≈ 0.9444
  • Lavg for the third extension is 76/81 ≈ 0.93827
  • Lavg for the fourth extension ≈ 0.93827.
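  • The per-symbol averages quoted here (1, 17/18 ≈ 0.9444,
    76/81 ≈ 0.93827, ...) can be reproduced by building a Huffman code
    on the n-th extension. A sketch with my own helper
    huffman_avg_length; it only computes the average length, not the
    code words themselves.

```python
import heapq
from itertools import product
from math import prod

def huffman_avg_length(probs):
    """Average code word length of a binary Huffman code for the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b              # each merge adds (a + b) to the weighted path length
        heapq.heappush(heap, a + b)
    return total

p = [2/3, 1/3]
for n in range(1, 6):
    ext = [prod(block) for block in product(p, repeat=n)]  # probabilities of the n-th extension
    print(n, round(huffman_avg_length(ext) / n, 5))
# 1.0, 0.94444, 0.93827, 0.93827, 0.92263 -- the per-symbol averages quoted in the slides
```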

25
Extensions of a Code
  • Theorem. The entropy of the n-th extension of
    an alphabet is n times the entropy of the
    original alphabet, i.e.,
  • Hr(T) = n Hr(S).
  • Proof. Hr(T) = Hr(S^n) = Σi Qi logr(1/Qi), where
    Qi = pi1 pi2 ... pin.

26
Extensions of a Code
  • Thus, Hr(T) = n Hr(S)

27
Extensions of a Code
  • Hr(S^n) ≤ Lavg(T) < Hr(S^n) + 1
  • ⇒ n Hr(S) ≤ Lavg(T) < n Hr(S) + 1
  • ⇒ Hr(S) ≤ Lavg(T)/n < Hr(S) + 1/n ---------(A)
  • Thus, for a sufficiently large n-th extension of
    the code, we can bring the average code word length
    as close as we please to the entropy Hr(S).
  • Shannon's noiseless coding theorem: the n-th
    extension of a code satisfies inequality (A).

28
Extensions of a Code
  • Since Huffman coding is optimal, we have
  • Hr(S) ≤ Lavg,Huffman(T)/n ≤ Lavg,Shannon-Fano(T)/n
    < Hr(S) + 1/n.
  • Note: a proof that the Huffman code is optimal can
    be found in many texts, including Fundamentals of
    Computer Algorithms by Horowitz and Sahni
    (Greedy Method).

29
Extensions of a Code
  • Example: s1 with p1 = 2/3, s2 with p2 = 1/3.
  • The entropy is H2(S) = (2/3)log2(3/2) + (1/3)log2 3
    = 0.9182958 (lower bound).

    n    Huffman Lavg(T)/n    Shannon-Fano Lavg(T)/n
    1    1.00000              1.33333
    2    0.94444              1.33333
    3    0.93827              1.00000
    4    0.93827              1.08333
    5    0.92263              0.93333

30
Entropy of a Markov Process
  • For an input stream of data, a Markov process is
    used to handle the correlation between successive
    symbols.
  • p(si | sj1, sj2, ..., sjm): given that we
    have already seen the previous m symbols
    sj1, sj2, ..., sjm, this is the probability that
    the next symbol will be si.

31
Entropy of a Markov Process
  • Given that we have already seen (are in the
    state) sj1, sj2, ..., sjm,
  • define the amount of information we get when we
    receive a symbol si as
  • I(si | sj1, ..., sjm) = log(1/p(si | sj1, ..., sjm))
  • For example, in English, the letter q is almost
    always followed by the letter u. Thus,
    I(u | q) ≈ 0 and I(p | q) ≈ ∞.

32
Entropy of a Markov Process
  • In computer language, a zero-memory source uses
    each source symbol independently of what has
    preceded it. An m-memory source uses the
    previous m symbols, and this idea corresponds
    exactly to an m-th order Markov process.
  • In general, for an m-memory source, there are q^m
    states in the Markov process.

33
Entropy of a Markov Process
  • Example:
  • p(a|a) = 1/3    p(b|a) = 1/3    p(c|a) = 1/3
  • p(a|b) = 1/4    p(b|b) = 1/2    p(c|b) = 1/4
  • p(a|c) = 1/4    p(b|c) = 1/4    p(c|c) = 1/2

34
Entropy of a Markov Process
  • What is the probability distribution of the
    states a, b, and c ?
  • pa p(a|a) + pb p(a|b) + pc p(a|c) = pa
  • pa p(b|a) + pb p(b|b) + pc p(b|c) = pb
  • pa p(c|a) + pb p(c|b) + pc p(c|c) = pc
  • pa + pb + pc = 1
  • ⇒ pa = 3/11, pb = pc = 4/11 (equilibrium
    solution)
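  • The equilibrium probabilities can be checked by iterating the
    transition matrix (a sketch with my own variable names; rows are
    indexed by the current state, columns by the next state).

```python
# Transition probabilities p(next | current) for states a, b, c.
P = [
    [1/3, 1/3, 1/3],   # from a
    [1/4, 1/2, 1/4],   # from b
    [1/4, 1/4, 1/2],   # from c
]

dist = [1/3, 1/3, 1/3]          # start from any distribution
for _ in range(200):            # power iteration converges to the equilibrium
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 6) for x in dist])  # [0.272727, 0.363636, 0.363636] = [3/11, 4/11, 4/11]
```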

35
Entropy of a Markov Process
  • The conditional entropy of the source alphabet S,
    given that we have seen the sequence of symbols
    sj1, sj2, ..., sjm, is defined as
  • H(S | sj1, ..., sjm)
    = Σi p(si | sj1, ..., sjm) log(1/p(si | sj1, ..., sjm))

36
Entropy of a Markov Process
  • Define the entropy of the Markov system as the
    probability of being in a state times the
    conditional entropy of that state, summed over the
    states:
  • H(S) = Σstates p(sj1, ..., sjm) H(S | sj1, ..., sjm)

37
Entropy of a Markov Process
  • Example (a 2-memory binary source):
    p(0|0,0) = 0.8 = p(1|1,1)
    p(1|0,0) = 0.2 = p(0|1,1)
    p(0|0,1) = 0.5 = p(1|1,0)
    p(1|0,1) = 0.5 = p(0|1,0)
  • Using the obvious symmetry, we see that
    p(0,0) = p(1,1) and p(0,1) = p(1,0).
  • Therefore, the equilibrium equations reduce to
  • 0.4 p(0,0) - p(0,1) = 0

38
Entropy of a Markov Process
  • Since we must be in some state, we also have
  • p(0,0) + p(0,1) + p(1,0) + p(1,1) = 1, or
    2 p(0,0) + 2 p(0,1) = 1
  • ⇒ p(0,0) = 5/14 = p(1,1), p(0,1) = 2/14 = p(1,0).

    state, next    p(next | state)    p(state)    p(state, next)
    0 0   0        0.8                5/14        4/14
    0 0   1        0.2                5/14        1/14
    0 1   0        0.5                2/14        1/14
    0 1   1        0.5                2/14        1/14
    1 0   0        0.5                2/14        1/14
    1 0   1        0.5                2/14        1/14
    1 1   0        0.2                5/14        1/14
    1 1   1        0.8                5/14        4/14

39
Entropy of a Markov Process
  • From the table we can now compute the entropy of
    the Markov process:
  • H = Σ p(state, next) log2(1/p(next | state))
    = (10/14) H2(0.8) + (4/14) H2(0.5) ≈ 0.801 bits per symbol.
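  • The full computation for this example fits in a few lines (a
    sketch; the dictionaries and names are mine): weight each state's
    conditional entropy by its equilibrium probability.

```python
from math import log2

# Equilibrium state probabilities and conditional symbol probabilities from the slides.
state_prob = {"00": 5/14, "01": 2/14, "10": 2/14, "11": 5/14}
cond = {   # p(next symbol | previous two symbols)
    "00": {"0": 0.8, "1": 0.2},
    "01": {"0": 0.5, "1": 0.5},
    "10": {"0": 0.5, "1": 0.5},
    "11": {"0": 0.2, "1": 0.8},
}

H = sum(
    state_prob[s] * sum(p * log2(1.0 / p) for p in cond[s].values() if p > 0)
    for s in state_prob
)
print(round(H, 4))   # about 0.8014 bits per symbol
```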

40
Entropy of a Markov Process
  • Since the inter-sample correlations are very high
    for natural digital signals, we must use a Markov
    process to measure the entropy of such signals.
  • m ?