Recent Results in Non-Asymptotic Shannon Theory

Transcript and Presenter's Notes

1
Recent Results in Non-Asymptotic Shannon Theory
  • Dror Baron
  • Supported by AFOSR, DARPA, NSF, ONR, and Texas
    Instruments
  • Joint work with M. A. Khojastepour, R. G.
    Baraniuk, and S. Sarvotham

2
"We may someday see the end of wireline"
  • S. Cherry, "Edholm's law of bandwidth," IEEE
    Spectrum, vol. 41, no. 7, July 2004, pp. 58-60

3
But will there ever be enough data rate?
  • R. Lucky, 1989
  • "We are not very good at predicting uses until
    the actual service becomes available. I am not
    worried; we will think of something when it
    happens."
  • There will always be new applications that gobble
    up more data rate!

4
How much can we improve wireless?
  • Spectrum is a limited natural resource
  • Information theory says we need lots of power for
    high data rates - even with infinite bandwidth!
  • Solution: transmit more power, BUT
  • Limited by environmental concerns
  • Will batteries support all that power?
  • Sooner or later, wireless rates will hit a wall!

5
Where can we improve?
  • Algorithms and hardware gains
  • Power-efficient computation
  • Efficient power amplifiers
  • Advances in batteries
  • Directional antennas
  • Communication gains
  • Channel coding
  • Source coding
  • Better source and channel models

6
Where will the last dB of communication
gains come from? Network information theory
(Shannon theory)
7
Traditional point to point information theory
[Diagram: Encoder → Channel → Decoder]
  • Single source
  • Single transmitter
  • Single receiver
  • Single communication stream
  • Most aspects are well-understood

8
Network information theory
[Diagram: network of multiple encoders, channels, and decoders]
  • Network of
  • Multiple sources
  • Multiple transmitters
  • Multiple receivers
  • Multiple communication streams
  • Few results
  • My goal: understand the various costs of network
    information theory

9
What costs has information theory overlooked?
10
Channel coding has arrived
[Diagram: Encoder → Channel → Decoder]
  • Turbo codes [Berrou et al., 1993]
  • 0.5 dB gap to capacity (rate R below capacity)
  • BER ≈ 10^-5
  • Block length n ≈ 6.5×10^4
  • Regular LDPC codes [Gallager, 1963]
  • Irregular LDPC [Richardson et al., 2001]
  • 0.13 dB gap to capacity
  • BER ≈ 10^-6
  • n ≈ 10^6

11
Distributed source coding has also arrived
[Diagram: x → Encoder → syndrome; Decoder also observes side information y]
  • Encoder for x based on the syndrome of a channel code
  • Decoder for x has correlated side information y
  • Various types of channel codes can be used
  • Slepian-Wolf via LDPC codes [Xiong et al., 2004]
  • H(X|Y) ≈ 0.47
  • R = 0.5 (rate above the Slepian-Wolf limit)
  • BER ≈ 10^-6
  • Block length n ≈ 10^5

12
Hey! Did you notice those block lengths?
  • Information theory provides results in the
    asymptotic regime
  • Channel coding: ∀δ>0, rate R = C − δ is achievable
    with ε → 0 as n → ∞
  • Slepian-Wolf coding: ∀δ>0, rate R = H(X|Y) + δ is
    achievable with ε → 0 as n → ∞
  • Best practical results achieved for n ≈ 10^5
  • Do those results require large n?

13
But we live in a finite world
  • Real-world data doesn't always have n ≈ 10^6
  • IP packets
  • Emails, text messages
  • Sensornet applications
  • (small battery → small n)
  • How do those methods perform for n ≈ 10^4? 10^3?
  • How quickly can we approach the performance
    limits of information theory?

14
And we don't know the statistics either!
  • Lossless coding (single source)
  • Length-n input x ~ Bernoulli(p)
  • Encode with the wrong parameter q
  • Kullback-Leibler divergence penalty with variable-rate codes
  • Performance loss (minor bit-rate penalty)
  • Channel coding, distributed source coding
  • Encode with the wrong parameter q < p < 0.5
  • Fixed-rate codes based on joint typicality
  • Typical set T_q for q is smaller than T_p for p
  • As n → ∞, Pr(error) → 1
  • Performance collapse! (see the sketch below)
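A minimal Python sketch of these two effects, under assumed parameters p = 0.3 and q = 0.2 (illustrative values, not from the talk): with variable-rate codes the penalty is the Kullback-Leibler divergence D(p||q) per symbol, while a fixed-rate code built on the typical set for q captures a vanishing fraction of Bernoulli(p) sequences as n grows.

    import math

    def kl_bernoulli(p, q):
        # Kullback-Leibler divergence D(p || q) in bits per symbol
        return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

    def prob_in_wrong_typical_set(p, q, n, delta=0.01):
        # Probability that a length-n Bernoulli(p) sequence has empirical
        # frequency of ones inside [q - delta, q + delta], the typical range
        # for q, using a normal approximation to the binomial
        mu, sigma = n * p, math.sqrt(n * p * (1 - p))
        Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
        return Phi((n * (q + delta) - mu) / sigma) - Phi((n * (q - delta) - mu) / sigma)

    p, q = 0.3, 0.2
    print(kl_bernoulli(p, q))                          # ~0.04 bit/symbol: minor penalty
    for n in (10**2, 10**3, 10**4):
        print(n, prob_in_wrong_typical_set(p, q, n))   # -> 0: fixed-rate codes collapse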

15
Main challenges
  • How quickly can we approach the performance
    limits of information theory?
  • Will address for channel coding and Slepian-Wolf
  • What can we do when the source statistics are
    unknown?
  • Will address for Slepian-Wolf

16
But first . . . What does the prior art indicate?
17
Underlying problem
  • Shannon [1958]
  • "This inverse problem is perhaps the more
    natural in applications: given a required level
    of probability of error, how long must the code
    be?"
  • Motivation may have been phone and space
    communication
  • Small probability of codeword error ε
  • Wireless paradigm
  • Given k bits, what are the minimal channel
    resources needed to attain probability of error ε?
  • Can retransmit a packet ⇒ fix a large ε
  • n depends on the packet length
  • Need to characterize R(n,ε)

18
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C

19
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C
  • Shannon's regime
  • "This inverse problem is perhaps the more
    natural in applications: given a required level
    of probability of error, how long must the code
    be?"
  • Fix R < C
  • E(R) = O(1)
  • -log(ε) = O(n): good for small ε

20
Error exponents
  • Fix rate R < C and codeword length n
  • Bounds on the probability of error
  • Random coding: Pr[error] ≤ 2^{-n·E_r(R)}
  • Sphere packing: Pr[error] ≥ 2^{-n·E_sp(R)+o(n)}
  • E_r(R) = E_sp(R) for R near C
  • Wireless paradigm
  • Given k bits, what are the minimal channel
    resources needed to attain probability of error ε?
  • Fix ε
  • n·E(R) = O(1)
  • o(n) term dominates
  • Bounds diverge (see the numerical sketch below)
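A numerical sketch of the two regimes, using Gallager's random-coding exponent for the BSC with an assumed crossover p = 0.11 (not a value from the talk): for a fixed gap to capacity, E_r(R) is a positive constant, so the exponent bound n·E_r(R) grows linearly in n; when the gap shrinks like 1/n^0.5, n·E_r(R) stays bounded, which is why the o(n) term dominates and the bounds diverge.

    from math import log2

    def gallager_E0(rho, p):
        # Gallager's E_0 function for the BSC with uniform inputs (in bits)
        a = 1.0 / (1.0 + rho)
        return rho - (1.0 + rho) * log2(p ** a + (1.0 - p) ** a)

    def random_coding_exponent(R, p, grid=2000):
        # E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R
        return max(gallager_E0(i / grid, p) - (i / grid) * R for i in range(grid + 1))

    p = 0.11
    C = 1 + p * log2(p) + (1 - p) * log2(1 - p)   # BSC capacity, about 0.5 bit
    # Shannon's regime: fixed R < C, so E_r(R) is a positive constant
    print(random_coding_exponent(C - 0.05, p))
    # Wireless regime: R = C - 1/sqrt(n), so n * E_r(R) stays roughly constant
    for n in (10**3, 10**4, 10**5):
        print(n, n * random_coding_exponent(C - 1 / n ** 0.5, p))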

21
Error exponents fail for R = C − δ/n^0.5
22
How quickly can we approach the channel
capacity? (known statistics)
23
Binary symmetric channel (BSC) setup
[Diagram: s → Encoder f → x = f(s) → channel adds noise z ~ Bernoulli(n,p) → y → Decoder g → ŝ = g(y)]
  • s ∈ {1,…,M} input message
  • x, y, and z: binary length-n sequences
  • z ~ Bernoulli(n,p) implies crossover probability p
  • Code (f,g,n,M,ε) includes
  • Encoder x = f(s), s ∈ {1,…,M}
  • Rate R = log(M)/n
  • Channel y = x ⊕ z
  • Decoder g reconstructs s by ŝ = g(y)
  • Error probability Pr[g(y) ≠ s] ≤ ε

24
Non-asymptotic capacity
25
Key to solution: Packing typical sets
  • Need to encode the typical set T_Z of z
  • Code needs to cover z ∈ T_Z
  • Need Pr(z ∈ T_Z) ≈ 1 − ε
  • Probability ε of codeword error
  • What about rate?
  • Output space: 2^n possible sequences
  • Can't pack more than 2^n/|T_Z| sets into the output space
  • M ≤ 2^n/|T_Z|
  • Minimal-cardinality T_min covers probability 1 − ε
  • C_NA ≤ 1 − log(|T_min|)/n (numerical sketch below)

[Figure: typical sets T_z packed into the 2^n output space]
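A rough Python sketch of this packing argument, with assumed illustrative values p = 0.11 and ε = 10^-3 (not from the talk) rather than the authors' derivation: it grows T_min by adding Hamming weights n_z in order of decreasing probability until 1 − ε of the noise distribution is covered, then evaluates the packing bound C_NA ≤ 1 − log2(|T_min|)/n.

    import math

    def packing_bound(n, p, eps):
        # Build the smallest noise set T_min covering probability 1 - eps by
        # adding weights n_z = 0, 1, 2, ... (most probable first when p < 0.5),
        # then apply M <= 2^n / |T_min|, i.e. C_NA <= 1 - log2(|T_min|) / n
        size = 0          # |T_min|, kept as an exact integer
        cover = 0.0       # Pr(z in T_min)
        for nz in range(n + 1):
            count = math.comb(n, nz)
            size += count
            # binomial pmf computed in the log domain to avoid underflow
            cover += math.exp(math.log(count) + nz * math.log(p) + (n - nz) * math.log(1 - p))
            if cover >= 1 - eps:
                break
        return 1 - math.log2(size) / n

    p, eps = 0.11, 1e-3
    C = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # capacity 1 - H(p)
    for n in (10**3, 10**4):
        print(n, round(packing_bound(n, p, eps), 4), "vs C =", round(C, 4))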
26
What's the cardinality of T_min?
  • Consider the empirical statistics n_z = Σ_i z_i, P_Z = n_z/n
  • p < 0.5 ⇒ Pr(z) is monotone decreasing in n_z
  • Minimal T_min has the form T_min = {z : P_Z ≤ τ(ε)}
  • Determine τ(ε) with the central limit theorem (CLT)
  • E[P_Z] = p, Var(P_Z) = p(1−p)/n
  • P_Z ≈ N(p, p(1−p)/n)
  • Asymptotic
  • τ = p + δ
  • LLN ⇒ ε → 0
  • Non-asymptotic
  • τ = p + φ·(p(1−p)/n)^0.5
  • CLT ⇒ φ = φ(ε)

27
Tight non-asymptotic capacity
  • Theorem
  • C_NA(n,ε) = C − K(ε)/n^0.5 + o(n^-0.5)
  • K(ε) = Φ^-1(1−ε) · (p(1−p))^0.5 · log((1−p)/p)
  • Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)
  • Note: o(n^-0.5) is asymptotically negligible w.r.t.
    K/n^0.5
  • Tightened Wolfowitz's bounds up to o(n^-0.5)
  • Gap to capacity of LDPC codes is 2-3x greater
  • We know how quickly we can approach C (evaluation
    sketch below)
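A short sketch evaluating the theorem's leading terms, dropping the o(n^-0.5) term and assuming illustrative values p = 0.11 and ε = 10^-3 (not from the talk); Φ^-1 is approximated by bisection so no external libraries are needed.

    from math import sqrt, log2, erf

    def phi_inv(u, lo=-10.0, hi=10.0):
        # Inverse standard normal CDF via bisection (adequate for a sketch)
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 + erf(mid / sqrt(2))) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def c_na(n, eps, p):
        # Leading terms of the theorem: C_NA(n, eps) ~= C - K(eps)/sqrt(n)
        C = 1 + p * log2(p) + (1 - p) * log2(1 - p)
        K = phi_inv(1 - eps) * sqrt(p * (1 - p)) * log2((1 - p) / p)
        return C - K / sqrt(n)

    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, round(c_na(n, eps=1e-3, p=0.11), 4))   # gap shrinks like 1/sqrt(n)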

28
Non-asymptotic capacity of BSC
29
Gaussian channel results
[Diagram: s → Encoder f → x = f(s) → y = x + z, z ~ N(0,σ²) → Decoder g → ŝ = g(y)]
  • Continuous channel
  • Power constraint Σ_i (x_i)² ≤ nP
  • Shannon [1958] derived C_NA(n,ε) for the Gaussian
    channel via cone packing (non-i.i.d. codebook)
  • Information spectrum bounds on probabilities of
    error indicate Gaussian codebooks are sub-optimal
  • i.i.d. codebooks aren't good enough!

30
Excess power of Gaussian channel
31
How quickly can we approach the Slepian-Wolf
limit? (known statistics)
32
But first . . . Slepian-Wolf Review
33
Slepian-Wolf setup
[Diagram: x → Encoder f_X → f_X(x); y → Encoder f_Y → f_Y(y); joint Decoder g outputs g_X(f_X(x),f_Y(y)) and g_Y(f_X(x),f_Y(y))]
  • x and y are correlated length-n sequences
  • Code (f_X,f_Y,g_X,g_Y,n,M_X,M_Y,ε_X,ε_Y) includes
  • Encoders f_X(x) ∈ {1,…,M_X}, f_Y(y) ∈ {1,…,M_Y}
  • Rates R_X = log(M_X)/n, R_Y = log(M_Y)/n
  • Decoder g reconstructs x and y by g_X(f_X(x),f_Y(y))
    and g_Y(f_X(x),f_Y(y))
  • Error probabilities Pr[g_X(f_X(x),f_Y(y)) ≠ x] ≤ ε_X
    and Pr[g_Y(f_X(x),f_Y(y)) ≠ y] ≤ ε_Y

34
Slepian-Wolf theorem
  • Theorem [Slepian & Wolf, 1973]
  • R_X ≥ H(X|Y) (conditional entropy)
  • R_Y ≥ H(Y|X)
  • R_X + R_Y ≥ H(X,Y) (joint entropy)

[Figure: Slepian-Wolf rate region in the (R_X, R_Y) plane, with corner points at R_X = H(X|Y), H(X) and R_Y = H(Y|X), H(Y)]
35
Slepian-Wolf with binary symmetric correlation
structure (known statistics)
36
Binary symmetric correlation setup
  • x, y, and z are length-n Bernoulli sequences
  • Correlation channel: x = y ⊕ z, where z is independent of y
  • Bernoulli parameters p, q ∈ [0, 0.5), r = p(1−q) + (1−p)q
  • Code (f,g,n,M,ε) includes
  • Encoder f(x) ∈ {1,…,M}
  • Rate R = log(M)/n
  • Decoder g(f(x),y) ∈ {0,1}^n
  • Error probability Pr[g(f(x),y) ≠ x] ≤ ε

37
Relation to general Slepian-Wolf setup
  • x, y, and z are Bernoulli
  • Correlation z is independent of y, so H(X|Y) = H(Z)
  • Focus on encoding x at a rate approaching H(Z)
  • Neglect the well-known encoding of y at rate R_Y ≥ H(Y)

[Figure: our setup is the corner point R_X = H(Z), R_Y = H(Y) of the Slepian-Wolf rate region]
38
Non-asymptotic Slepian-Wolf rate
  • Definition: R_NA(n,ε) = min {log(M)/n : ∃ a code (f,g,n,M,ε)}
  • Prior art [Wolfowitz, 1978]
  • Converse result: R_NA(n,ε) ≥ H(X|Y) + K_C(ε)/n^0.5
  • Achievable result: R_NA(n,ε) ≤ H(X|Y) + K_A(ε)/n^0.5
  • Bounds are loose: K_A(ε) > K_C(ε)
  • Can we tighten Wolfowitz's bounds?

39
Tight non-asymptotic rate
  • Theorem
  • R_NA(n,ε) = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
  • K(ε) = Φ^-1(1−ε) · (q(1−q))^0.5 · log((1−q)/q)
  • Redundancy rate is K(ε)/n^0.5 + o(n^-0.5)
  • Note: o(n^-0.5) decays faster than K/n^0.5
  • Tightened Wolfowitz's bounds up to o(n^-0.5)
  • We know how quickly we can approach H(Z) with
    known statistics

[Figure: R_NA(n,ε) vs. n, with the tight bound converging to H(X|Y)]
40
What can we do when the source statistics are
unknown? (universality)
41
Universal setup
  • Unknown Bernoulli parameters p, q, r
  • Encoder observes x and n_y = Σ_i y_i
  • Communication of n_y requires log(n) bits
  • Variable rate used
  • Need a distribution for n_z
  • Distribution depends on n_x and n_y (not on x itself)
  • Codebook size M_{n_x,n_y}

42
Distribution of n_z
  • The CLT was key to the solution with known statistics
  • How can we apply the CLT when q is unknown?
  • Consider a numerical example
  • p = 0.3, q = 0.1, r = p(1−q) + (1−p)q
  • P_X ≈ r, P_Y ≈ p, P_Z ≈ q (empirical ≈ true)
  • We plot Pr(n_z|n_x,n_y) as a function of n_z ∈ {0,…,n}

43
Pr(n_z|n_x,n_y) for n = 10^2
44
Pr(n_z|n_x,n_y) for n = 10^3
45
Pr(n_z|n_x,n_y) for n = 10^4
46
Pr(n_z|n_x,n_y) for n = 10^4
47
Universal rate
  • Theorem
  • R_NA(n,ε) = H(P_Z) + K'(ε)/n^0.5 + o(n^-0.5)
  • K'(ε) = f(P_Y) · K(ε)
  • f(P_Y) = 2·(P_Y(1−P_Y))^0.5 / (1 − 2P_Y) (see the
    sketch below)

[Figure: f(P_Y) → 0 as P_Y → 0; f(P_Y) → ∞ as P_Y → 0.5]
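A quick tabulation of the factor f(P_Y) from the theorem (the sample P_Y values are chosen for illustration), showing the behavior in the figure: f(P_Y) → 0 as P_Y → 0 and f(P_Y) → ∞ as P_Y → 0.5.

    from math import sqrt

    def f(py):
        # Universal redundancy factor from the theorem: 2*sqrt(py*(1-py))/(1-2*py)
        return 2 * sqrt(py * (1 - py)) / (1 - 2 * py)

    for py in (0.01, 0.1, 0.3, 0.4, 0.45, 0.49):
        print(py, round(f(py), 2))   # small for py near 0, blows up as py -> 0.5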
48
Why is f(P_Y) small when P_Y is small?
  • Known statistics ⇒ Var(n_z) = nq(1−q) regardless of
    the empirical statistics
  • P_Y → 0 ⇒ can estimate n_z with small variance
  • Universal scheme outperforms known statistics
    when P_Y is small
  • Key issue: variable-rate coding (universal) beats
    fixed-rate coding (known statistics)
  • Can cut down the expected redundancy (known
    statistics) by communicating n_y to the encoder
  • log(n) bits for n_y will save O(n^0.5) bits

49
Redundancy for P_Y ≈ 0.5
  • f(P_Y) blows up as P_Y approaches 0.5
  • Redundancy is O(n^-0.5) with an enormous constant
  • Another scheme has O(n^-1/3) redundancy
  • Better performance for P_Y = 0.5 − O(n^-1/6)
  • Universal redundancy can be huge!
  • Ongoing research: improving the O(n^-1/3) term

50
Numerical example
  • n = 10^4
  • q = 0.1
  • Slepian-Wolf requires nH_2(q) ≈ 4690 bits
  • Non-asymptotic approach (known statistics) with
    ε = 10^-2 requires nR_NA(n,ε) ≈ 4907 bits
  • Universal approach with P_Y = 0.3 requires 5224 bits
  • With P_Y = 0.4 we need 5863 bits
  • In practice, the penalty for universality can be
    huge! (see the sketch below)
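These bit counts can be roughly reproduced from the theorems on slides 39 and 47; the sketch below drops the o(·) terms and uses a bisection-based Φ^-1, so it lands near, but not exactly on, the slide's values.

    from math import sqrt, log2, erf

    def phi_inv(u, lo=-10.0, hi=10.0):
        # Inverse standard normal CDF via bisection (adequate for a sketch)
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 + erf(mid / sqrt(2))) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def H2(q):
        # Binary entropy function in bits
        return -q * log2(q) - (1 - q) * log2(1 - q)

    n, q, eps = 10**4, 0.1, 1e-2
    K = phi_inv(1 - eps) * sqrt(q * (1 - q)) * log2((1 - q) / q)   # slide 39 constant
    f = lambda py: 2 * sqrt(py * (1 - py)) / (1 - 2 * py)          # slide 47 factor

    print("Slepian-Wolf limit:", round(n * H2(q)))                 # ~4690 bits
    print("Known statistics:  ", round(n * H2(q) + sqrt(n) * K))   # ~4910 bits
    for py in (0.3, 0.4):
        print("Universal, P_Y =", py,
              round(n * H2(q) + sqrt(n) * f(py) * K))              # ~5200 / ~5770 bits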

51
Summary
  • Network information theory (Shannon theory) may
    enable increased wireless data rates
  • Practical channel codes and distributed source
    codes approach the limits but rely on large n
  • How quickly can we approach the performance
    limits of information theory?
  • C_NA = C − K(ε)/n^0.5 + o(n^-0.5)
  • R_NA = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
  • Gap to capacity of LDPC codes is 2-3x greater

52
Universality
  • What can we do when the source statistics are
    unknown? (Slepian-Wolf)
  • P_Y < 0.5: H(P_Z) + K'(ε)/n^0.5 + o(n^-0.5)
  • P_Y ≈ 0.5: H(P_Z) + O(n^-1/3), can be huge!
  • Universal channel coding with feedback for the BSC
  • Capacity-achieving code requires P_Y = 0.5
  • Universality with the current scheme costs O(n^-1/3)

[Diagram: Encoder → Channel → Decoder, with a feedback link from decoder to encoder]
53
Further directions
  • Gaussian channel (briefly discussed)
  • Shannon [1958] derived C_NA(n,ε) for the Gaussian
    channel with cone packing (non-i.i.d. codebook)
  • Gaussian codebooks are sub-optimal!
  • Other channels
  • C_NA(n,ε) ≥ C − K_A(ε)/n^0.5 via information spectrum
  • Gaussian codebook distribution is sub-optimal
  • Must consider non-i.i.d. codebook constructions
  • Penalties for finite n and unknown statistics
    exist everywhere in Shannon theory!!
  • www.dsp.rice.edu