Recent Results in Non-Asymptotic Shannon Theory

Transcript and Presenter's Notes

1
Recent Results in Non-Asymptotic Shannon Theory
  • Dror Baron
  • Supported by AFOSR, DARPA, NSF, ONR, and Texas
    Instruments
  • Joint work with M. A. Khojastepour, R. G.
    Baraniuk, and S. Sarvotham

2
"We may someday see the end of wireline"
  • S. Cherry, "Edholm's law of bandwidth," IEEE
    Spectrum, vol. 41, no. 7, July 2004, pp. 58-60

3
But will there ever be enough data rate?
  • R. Lucky, 1989
  • "We are not very good at predicting uses until
    the actual service becomes available. I am not
    worried; we will think of something when it
    happens."
  • There will always be new applications that gobble
    up more data rate!

4
How much can we improve wireless?
  • Spectrum is a limited natural resource
  • Information theory says we need lots of power for
    high data rates - even with infinite bandwidth!
  • Solution: transmit more power, BUT
  • Limited by environmental concerns
  • Will batteries support all that power?
  • Sooner or later, wireless rates will hit a wall!

5
Where can we improve?
  • Algorithms and hardware gains
  • Power-efficient computation
  • Efficient power amplifiers
  • Advances in batteries
  • Directional antennas
  • Communication gains
  • Channel coding
  • Source coding
  • Better source and channel models

6
Where will the last dB of communication
gains come from? Network information theory
(Shannon theory)
7
Traditional point to point information theory
[Diagram: Encoder → Channel → Decoder]
  • Single source
  • Single transmitter
  • Single receiver
  • Single communication stream
  • Most aspects are well-understood

8
Network information theory
[Diagram: network of multiple encoders, channels, and decoders]
  • Network of
  • Multiple sources
  • Multiple transmitters
  • Multiple receivers
  • Multiple communication streams
  • Few results
  • My goal: understand the various costs of network
    information theory

9
What costs has information theory overlooked?
10
Channel coding has arrived
[Diagram: Encoder → Channel → Decoder]
  • Turbo codes [Berrou et al., 1993]
  • 0.5 dB gap to capacity (rate R below capacity)
  • BER ≈ 10^-5
  • Block length n ≈ 6.5×10^4
  • Regular LDPC codes [Gallager, 1963]
  • Irregular LDPC [Richardson et al., 2001]
  • 0.13 dB gap to capacity
  • BER ≈ 10^-6
  • n ≈ 10^6

11
Distributed source coding has also arrived
[Diagram: x → Encoder → syndrome; Decoder also observes side information y]
  • Encoder for x based on the syndrome of a channel code
  • Decoder for x has correlated side information y
  • Various types of channel codes can be used
  • Slepian-Wolf via LDPC codes [Xiong et al., 2004]
  • H(X|Y) ≈ 0.47
  • R = 0.5 (rate above the Slepian-Wolf limit)
  • BER ≈ 10^-6
  • Block length n ≈ 10^5

12
Hey! Did you notice those block lengths?
  • Information theory provides results in the
    asymptotic regime
  • Channel coding: ∀δ>0, rate R = C − δ is achievable
    with ε → 0 as n → ∞
  • Slepian-Wolf coding: ∀δ>0, rate R = H(X|Y) + δ is
    achievable with ε → 0 as n → ∞
  • Best practical results achieved for n ≈ 10^5
  • Do those results require large n?

13
But we live in a finite world
  • Real-world data doesn't always have n ≈ 10^6
  • IP packets
  • Emails, text messages
  • Sensornet applications
  • (small battery → small n)
  • How do those methods perform for n ≈ 10^4? 10^3?
  • How quickly can we approach the performance
    limits of information theory?

14
And we don't know the statistics either!
  • Lossless coding (single source)
  • Length-n input x ~ Bernoulli(p)
  • Encode with the wrong parameter q
  • Kullback-Leibler divergence penalty with variable-rate codes
  • Performance loss (minor bit-rate penalty)
  • Channel coding, distributed source coding
  • Encode with the wrong parameter q < p < 0.5
  • Fixed-rate codes based on joint typicality
  • Typical set T_q for q is smaller than T_p for p
  • As n → ∞, Pr(error) → 1
  • Performance collapse! (see the sketch below)
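A minimal Python sketch of these two effects, under assumed parameters p = 0.3 and q = 0.2 (illustrative values, not from the talk): with variable-rate codes the penalty is the Kullback-Leibler divergence D(p||q) per symbol, while a fixed-rate code built on the typical set for q captures a vanishing fraction of Bernoulli(p) sequences as n grows.

    import math

    def kl_bernoulli(p, q):
        # Kullback-Leibler divergence D(p || q) in bits per symbol
        return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

    def prob_in_wrong_typical_set(p, q, n, delta=0.01):
        # Probability that a length-n Bernoulli(p) sequence has empirical
        # frequency of ones inside [q - delta, q + delta], the typical range
        # for q, using a normal approximation to the binomial
        mu, sigma = n * p, math.sqrt(n * p * (1 - p))
        Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
        return Phi((n * (q + delta) - mu) / sigma) - Phi((n * (q - delta) - mu) / sigma)

    p, q = 0.3, 0.2
    print(kl_bernoulli(p, q))                          # ~0.04 bit/symbol: minor penalty
    for n in (10**2, 10**3, 10**4):
        print(n, prob_in_wrong_typical_set(p, q, n))   # -> 0: fixed-rate codes collapse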

15
Main challenges
  • How quickly can we approach the performance
    limits of information theory?
  • Will address for channel coding and Slepian-Wolf
  • What can we do when the source statistics are
    unknown?
  • Will address for Slepian-Wolf

16
But first . . . What does the prior art indicate?
17
Underlying problem
  • Shannon [1958]
  • "This inverse problem is perhaps the more
    natural in applications: given a required level
    of probability of error, how long must the code
    be?"
  • Motivation may have been phone and space
    communication
  • Small probability of codeword error ε
  • Wireless paradigm
  • Given k bits, what are the minimal channel
    resources needed to attain probability of error ε?
  • Can retransmit a packet ⇒ fix a large ε
  • n depends on the packet length
  • Need to characterize R(n,ε)

18
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C

19
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C
  • Shannon's regime
  • "This inverse problem is perhaps the more
    natural in applications: given a required level
    of probability of error, how long must the code
    be?"
  • Fix R < C
  • E(R) = O(1)
  • -log(ε) = O(n): good for small ε

20
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C
  • Wireless paradigm
  • Given k bits, what are the minimal channel
    resources needed to attain probability of error ε?
  • Fix ε
  • n·E(R) = O(1)
  • o(n) term dominates
  • Bounds diverge (see the numerical sketch below)
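A numerical sketch of the two regimes, using Gallager's random-coding exponent for the BSC with an assumed crossover p = 0.11 (not a value from the talk): for a fixed gap to capacity, E_r(R) is a positive constant, so the exponent bound n·E_r(R) grows linearly in n; when the gap shrinks like 1/n^0.5, n·E_r(R) stays bounded, which is why the o(n) term dominates and the bounds diverge.

    from math import log2

    def gallager_E0(rho, p):
        # Gallager's E_0 function for the BSC with uniform inputs (in bits)
        a = 1.0 / (1.0 + rho)
        return rho - (1.0 + rho) * log2(p ** a + (1.0 - p) ** a)

    def random_coding_exponent(R, p, grid=2000):
        # E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R
        return max(gallager_E0(i / grid, p) - (i / grid) * R for i in range(grid + 1))

    p = 0.11
    C = 1 + p * log2(p) + (1 - p) * log2(1 - p)   # BSC capacity, about 0.5 bit
    # Shannon's regime: fixed R < C, so E_r(R) is a positive constant
    print(random_coding_exponent(C - 0.05, p))
    # Wireless regime: R = C - 1/sqrt(n), so n * E_r(R) stays roughly constant
    for n in (10**3, 10**4, 10**5):
        print(n, n * random_coding_exponent(C - 1 / n ** 0.5, p))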

21
Error exponents fail for R = C − δ/n^0.5
22
How quickly can we approach the channel
capacity? (known statistics)
23
Binary symmetric channel (BSC) setup
[Diagram: s → Encoder f → x = f(s) → channel adds noise z ~ Bernoulli(n,p) → y → Decoder g → ŝ = g(y)]
  • s ∈ {1,…,M} input message
  • x, y, and z: binary length-n sequences
  • z ~ Bernoulli(n,p) implies crossover probability p
  • Code (f,g,n,M,ε) includes
  • Encoder x = f(s), s ∈ {1,…,M}
  • Rate R = log(M)/n
  • Channel y = x ⊕ z
  • Decoder g reconstructs s by ŝ = g(y)
  • Error probability Pr[g(y) ≠ s] ≤ ε

24
Non-asymptotic capacity
25
Key to solution: Packing typical sets
  • Need to encode the typical set T_Z of z
  • Code needs to cover z ∈ T_Z
  • Need Pr(z ∈ T_Z) ≈ 1 − ε
  • Probability ε of codeword error
  • What about rate?
  • Output space: 2^n possible sequences
  • Can't pack more than 2^n/|T_Z| sets into the output space
  • M ≤ 2^n/|T_Z|
  • Minimal-cardinality T_min covers probability 1 − ε
  • C_NA ≤ 1 − log(|T_min|)/n (numerical sketch below)

[Figure: typical sets T_z packed into the 2^n output space]
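A rough Python sketch of this packing argument, with assumed illustrative values p = 0.11 and ε = 10^-3 (not from the talk) rather than the authors' derivation: it grows T_min by adding Hamming weights n_z in order of decreasing probability until 1 − ε of the noise distribution is covered, then evaluates the packing bound C_NA ≤ 1 − log2(|T_min|)/n.

    import math

    def packing_bound(n, p, eps):
        # Build the smallest noise set T_min covering probability 1 - eps by
        # adding weights n_z = 0, 1, 2, ... (most probable first when p < 0.5),
        # then apply M <= 2^n / |T_min|, i.e. C_NA <= 1 - log2(|T_min|) / n
        size = 0          # |T_min|, kept as an exact integer
        cover = 0.0       # Pr(z in T_min)
        for nz in range(n + 1):
            count = math.comb(n, nz)
            size += count
            # binomial pmf computed in the log domain to avoid underflow
            cover += math.exp(math.log(count) + nz * math.log(p) + (n - nz) * math.log(1 - p))
            if cover >= 1 - eps:
                break
        return 1 - math.log2(size) / n

    p, eps = 0.11, 1e-3
    C = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # capacity 1 - H(p)
    for n in (10**3, 10**4):
        print(n, round(packing_bound(n, p, eps), 4), "vs C =", round(C, 4))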
26
What's the cardinality of T_min?
  • Consider the empirical statistics n_z = Σ_i z_i, P_Z = n_z/n
  • p < 0.5 ⇒ Pr(z) is monotone decreasing in n_z
  • Minimal T_min has the form T_min = {z : P_Z ≤ τ(ε)}
  • Determine τ(ε) with the central limit theorem (CLT)
  • E[P_Z] = p, Var(P_Z) = p(1−p)/n
  • P_Z ≈ N(p, p(1−p)/n)
  • Asymptotic
  • τ = p + δ
  • LLN ⇒ ε → 0
  • Non-asymptotic
  • τ = p + φ·(p(1−p)/n)^0.5
  • CLT ⇒ φ = φ(ε)

27
Tight non-asymptotic capacity
  • Theorem
  • C_NA(n,ε) = C − K(ε)/n^0.5 + o(n^-0.5)
  • K(ε) = Φ^-1(1−ε) · (p(1−p))^0.5 · log((1−p)/p)
  • Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)
  • Note: o(n^-0.5) is asymptotically negligible w.r.t.
    K/n^0.5
  • Tightened Wolfowitz's bounds up to o(n^-0.5)
  • Gap to capacity of LDPC codes is 2-3x greater
  • We know how quickly we can approach C (evaluation
    sketch below)
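A short sketch evaluating the theorem's leading terms, dropping the o(n^-0.5) term and assuming illustrative values p = 0.11 and ε = 10^-3 (not from the talk); Φ^-1 is approximated by bisection so no external libraries are needed.

    from math import sqrt, log2, erf

    def phi_inv(u, lo=-10.0, hi=10.0):
        # Inverse standard normal CDF via bisection (adequate for a sketch)
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 + erf(mid / sqrt(2))) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def c_na(n, eps, p):
        # Leading terms of the theorem: C_NA(n, eps) ~= C - K(eps)/sqrt(n)
        C = 1 + p * log2(p) + (1 - p) * log2(1 - p)
        K = phi_inv(1 - eps) * sqrt(p * (1 - p)) * log2((1 - p) / p)
        return C - K / sqrt(n)

    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, round(c_na(n, eps=1e-3, p=0.11), 4))   # gap shrinks like 1/sqrt(n)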

28
Non-asymptotic capacity of BSC
29
Gaussian channel results
[Diagram: s → Encoder f → x = f(s) → y = x + z, z ~ N(0,σ²) → Decoder g → ŝ = g(y)]
  • Continuous channel
  • Power constraint Σ_i (x_i)² ≤ nP
  • Shannon [1958] derived C_NA(n,ε) for the Gaussian
    channel via cone packing (non-i.i.d. codebook)
  • Information spectrum bounds on probabilities of
    error indicate Gaussian codebooks are sub-optimal
  • i.i.d. codebooks aren't good enough!

30
Excess power of Gaussian channel
31
How quickly can we approach the Slepian-Wolf
limit? (known statistics)
32
But first . . . Slepian-Wolf Review
33
Slepian-Wolf setup
[Diagram: x → Encoder f_X → f_X(x); y → Encoder f_Y → f_Y(y); joint Decoder g outputs g_X(f_X(x),f_Y(y)) and g_Y(f_X(x),f_Y(y))]
  • x and y are correlated length-n sequences
  • Code (f_X,f_Y,g_X,g_Y,n,M_X,M_Y,ε_X,ε_Y) includes
  • Encoders f_X(x) ∈ {1,…,M_X}, f_Y(y) ∈ {1,…,M_Y}
  • Rates R_X = log(M_X)/n, R_Y = log(M_Y)/n
  • Decoder g reconstructs x and y by g_X(f_X(x),f_Y(y))
    and g_Y(f_X(x),f_Y(y))
  • Error probabilities Pr[g_X(f_X(x),f_Y(y)) ≠ x] ≤ ε_X
    and Pr[g_Y(f_X(x),f_Y(y)) ≠ y] ≤ ε_Y

34
Slepian-Wolf theorem
  • Theorem [Slepian & Wolf, 1973]
  • R_X ≥ H(X|Y) (conditional entropy)
  • R_Y ≥ H(Y|X)
  • R_X + R_Y ≥ H(X,Y) (joint entropy)

[Figure: Slepian-Wolf rate region in the (R_X, R_Y) plane, with corner points at R_X = H(X|Y), H(X) and R_Y = H(Y|X), H(Y)]
35
Slepian-Wolf with binary symmetric correlation
structure (known statistics)
36
Binary symmetric correlation setup
  • x, y, and z are length-n Bernoulli sequences
  • Correlation channel: x = y ⊕ z, where z is independent of y
  • Bernoulli parameters p, q ∈ [0, 0.5), r = p(1−q) + (1−p)q
  • Code (f,g,n,M,ε) includes
  • Encoder f(x) ∈ {1,…,M}
  • Rate R = log(M)/n
  • Decoder g(f(x),y) ∈ {0,1}^n
  • Error probability Pr[g(f(x),y) ≠ x] ≤ ε

37
Relation to general Slepian-Wolf setup
  • x, y, and z are Bernoulli
  • Correlation z is independent of y, so H(X|Y) = H(Z)
  • Focus on encoding x at a rate approaching H(Z)
  • Neglect the well-known encoding of y at rate R_Y ≥ H(Y)

[Figure: our setup is the corner point R_X = H(Z), R_Y = H(Y) of the Slepian-Wolf rate region]
38
Non-asymptotic Slepian-Wolf rate
  • Definition: R_NA(n,ε) = min {log(M)/n : ∃ a code (f,g,n,M,ε)}
  • Prior art [Wolfowitz, 1978]
  • Converse result: R_NA(n,ε) ≥ H(X|Y) + K_C(ε)/n^0.5
  • Achievable result: R_NA(n,ε) ≤ H(X|Y) + K_A(ε)/n^0.5
  • Bounds are loose: K_A(ε) > K_C(ε)
  • Can we tighten Wolfowitz's bounds?

39
Tight non-asymptotic rate
  • Theorem
  • R_NA(n,ε) = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
  • K(ε) = Φ^-1(1−ε) · (q(1−q))^0.5 · log((1−q)/q)
  • Redundancy rate is K(ε)/n^0.5 + o(n^-0.5)
  • Note: o(n^-0.5) decays faster than K/n^0.5
  • Tightened Wolfowitz's bounds up to o(n^-0.5)
  • We know how quickly we can approach H(Z) with
    known statistics

[Figure: R_NA(n,ε) vs. n, with the tight bound converging to H(X|Y)]
40
What can we do when the source statistics are
unknown? (universality)
41
Universal setup
  • Unknown Bernoulli parameters p, q, r
  • Encoder observes x and n_y = Σ_i y_i
  • Communication of n_y requires log(n) bits
  • Variable rate used
  • Need a distribution for n_z
  • Distribution depends on n_x and n_y (not on x itself)
  • Codebook size M_{n_x,n_y}

42
Distribution of n_z
  • The CLT was key to the solution with known statistics
  • How can we apply the CLT when q is unknown?
  • Consider a numerical example
  • p = 0.3, q = 0.1, r = p(1−q) + (1−p)q
  • P_X ≈ r, P_Y ≈ p, P_Z ≈ q (empirical ≈ true)
  • We plot Pr(n_z|n_x,n_y) as a function of n_z ∈ {0,…,n}

43
Pr(n_z|n_x,n_y) for n = 10^2
44
Pr(n_z|n_x,n_y) for n = 10^3
45
Pr(n_z|n_x,n_y) for n = 10^4
46
Pr(n_z|n_x,n_y) for n = 10^4
47
Universal rate
  • Theorem
  • R_NA(n,ε) = H(P_Z) + K'(ε)/n^0.5 + o(n^-0.5)
  • K'(ε) = f(P_Y) · K(ε)
  • f(P_Y) = 2·(P_Y(1−P_Y))^0.5 / (1 − 2P_Y) (see the
    sketch below)

[Figure: f(P_Y) → 0 as P_Y → 0; f(P_Y) → ∞ as P_Y → 0.5]
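A quick tabulation of the factor f(P_Y) from the theorem (the sample P_Y values are chosen for illustration), showing the behavior in the figure: f(P_Y) → 0 as P_Y → 0 and f(P_Y) → ∞ as P_Y → 0.5.

    from math import sqrt

    def f(py):
        # Universal redundancy factor from the theorem: 2*sqrt(py*(1-py))/(1-2*py)
        return 2 * sqrt(py * (1 - py)) / (1 - 2 * py)

    for py in (0.01, 0.1, 0.3, 0.4, 0.45, 0.49):
        print(py, round(f(py), 2))   # small for py near 0, blows up as py -> 0.5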
48
Why is f(P_Y) small when P_Y is small?
  • Known statistics ⇒ Var(n_z) = nq(1−q) regardless of
    the empirical statistics
  • P_Y → 0 ⇒ can estimate n_z with small variance
  • Universal scheme outperforms known statistics
    when P_Y is small
  • Key issue: variable-rate coding (universal) beats
    fixed-rate coding (known statistics)
  • Can cut down the expected redundancy (known
    statistics) by communicating n_y to the encoder
  • log(n) bits for n_y will save O(n^0.5) bits

49
Redundancy for P_Y ≈ 0.5
  • f(P_Y) blows up as P_Y approaches 0.5
  • Redundancy is O(n^-0.5) with an enormous constant
  • Another scheme has O(n^-1/3) redundancy
  • Better performance for P_Y = 0.5 − O(n^-1/6)
  • Universal redundancy can be huge!
  • Ongoing research: improving the O(n^-1/3) term

50
Numerical example
  • n = 10^4
  • q = 0.1
  • Slepian-Wolf requires nH_2(q) ≈ 4690 bits
  • Non-asymptotic approach (known statistics) with
    ε = 10^-2 requires nR_NA(n,ε) ≈ 4907 bits
  • Universal approach with P_Y = 0.3 requires 5224 bits
  • With P_Y = 0.4 we need 5863 bits
  • In practice, the penalty for universality can be
    huge! (see the sketch below)
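These bit counts can be roughly reproduced from the theorems on slides 39 and 47; the sketch below drops the o(·) terms and uses a bisection-based Φ^-1, so it lands near, but not exactly on, the slide's values.

    from math import sqrt, log2, erf

    def phi_inv(u, lo=-10.0, hi=10.0):
        # Inverse standard normal CDF via bisection (adequate for a sketch)
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 + erf(mid / sqrt(2))) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def H2(q):
        # Binary entropy function in bits
        return -q * log2(q) - (1 - q) * log2(1 - q)

    n, q, eps = 10**4, 0.1, 1e-2
    K = phi_inv(1 - eps) * sqrt(q * (1 - q)) * log2((1 - q) / q)   # slide 39 constant
    f = lambda py: 2 * sqrt(py * (1 - py)) / (1 - 2 * py)          # slide 47 factor

    print("Slepian-Wolf limit:", round(n * H2(q)))                 # ~4690 bits
    print("Known statistics:  ", round(n * H2(q) + sqrt(n) * K))   # ~4910 bits
    for py in (0.3, 0.4):
        print("Universal, P_Y =", py,
              round(n * H2(q) + sqrt(n) * f(py) * K))              # ~5200 / ~5770 bits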

51
Summary
  • Network information theory (Shannon theory) may
    enable increased wireless data rates
  • Practical channel codes and distributed source
    codes approach the limits but rely on large n
  • How quickly can we approach the performance
    limits of information theory?
  • C_NA = C − K(ε)/n^0.5 + o(n^-0.5)
  • R_NA = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
  • Gap to capacity of LDPC codes is 2-3x greater

52
Universality
  • What can we do when the source statistics are
    unknown? (Slepian-Wolf)
  • P_Y < 0.5: H(P_Z) + K'(ε)/n^0.5 + o(n^-0.5)
  • P_Y ≈ 0.5: H(P_Z) + O(n^-1/3), can be huge!
  • Universal channel coding with feedback for the BSC
  • Capacity-achieving code requires P_Y = 0.5
  • Universality with the current scheme costs O(n^-1/3)

[Diagram: Encoder → Channel → Decoder, with a feedback link from decoder to encoder]
53
Further directions
  • Gaussian channel (briefly discussed)
  • Shannon [1958] derived C_NA(n,ε) for the Gaussian
    channel with cone packing (non-i.i.d. codebook)
  • Gaussian codebooks are sub-optimal!
  • Other channels
  • C_NA(n,ε) ≥ C − K_A(ε)/n^0.5 via information spectrum
  • Gaussian codebook distribution is sub-optimal
  • Must consider non-i.i.d. codebook constructions
  • Penalties for finite n and unknown statistics
    exist everywhere in Shannon theory!!
  • www.dsp.rice.edu