Title: Network Coding for Error Correction and Security
1Network Coding for Error Correction and Security
- Raymond W. Yeung
- The Chinese University of Hong Kong
2Outline
- Introduction
- Network Coding vs Algebraic Coding
- Network Error Correction
- Secure Network Coding
- Applications of Random Network Coding in P2P
- Concluding Remarks
3Introduction
4A Network Coding Example
5[Figure: network coding example; channels labeled b1, b2, and b1+b2]
6A Network Coding Example
7[Figure: network coding example; channels labeled b1, b2, and b1+b2]
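The coding idea in the example can be sketched in a few lines. This is only an illustration under assumptions: it assumes the figure is the standard butterfly network, that b1 and b2 are single bits, and that "b1+b2" means addition over GF(2) (XOR). The function name and the sink labels t1, t2 are hypothetical.

```python
def butterfly(b1: int, b2: int) -> dict:
    """Deliver bits b1 and b2 to both sinks of a butterfly network (sketch)."""
    coded = b1 ^ b2  # the bottleneck channel carries b1 + b2 over GF(2)
    # Sink t1 hears b1 directly and the coded symbol; it recovers b2 by XOR.
    t1 = (b1, b1 ^ coded)
    # Sink t2 hears b2 directly and the coded symbol; it recovers b1 by XOR.
    t2 = (b2 ^ coded, b2)
    return {"t1": t1, "t2": t2}


# Every message pair reaches both sinks with one use of each channel.
for b1 in (0, 1):
    for b2 in (0, 1):
        out = butterfly(b1, b2)
        assert out["t1"] == (b1, b2) and out["t2"] == (b1, b2)
```

With routing alone, the bottleneck channel could carry only one of b1 or b2 per use, so one of the sinks would miss a bit.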
8Wireless/Satellite Application
50% saving for downlink bandwidth!
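A possible reading of the 50% figure, sketched below under stated assumptions: two ground stations exchange one bit each through a satellite, and the satellite broadcasts a single XOR instead of forwarding each bit separately. The scenario, the function name, and the counts are illustrative assumptions, since the slide's figure did not survive.

```python
def exchange_via_relay(b1: int, b2: int) -> tuple:
    """Two stations swap bits through a relay that broadcasts b1 XOR b2."""
    coded = b1 ^ b2                 # one broadcast on the downlink
    learned_by_station_1 = coded ^ b1   # station 1 knows b1, recovers b2
    learned_by_station_2 = coded ^ b2   # station 2 knows b2, recovers b1
    return learned_by_station_1, learned_by_station_2


DOWNLINK_USES_FORWARDING = 2   # relay sends b1 and b2 separately
DOWNLINK_USES_CODING = 1       # relay sends b1 XOR b2 once
saving = 1 - DOWNLINK_USES_CODING / DOWNLINK_USES_FORWARDING
```

Under these assumptions `saving` evaluates to 0.5, matching the slide's claim for the downlink.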
9Two Themes of Network Coding
- When there is 1 source to be multicast in a network, store-and-forward may fail to optimize bandwidth.
- When there are 2 or more independent sources to be transmitted in a network (even for unicast), store-and-forward may fail to optimize bandwidth.
- In short, information is NOT a commodity!
10Model of a Point-to-Point Network
- A network is represented by a graph G = (V,E) with node set V and edge (channel) set E.
- A symbol from an alphabet F can be transmitted on each channel.
- There can be multiple edges between a pair of nodes.
11Single-Source Network Coding
- The source node s generates an information vector x = (x1, x2, ..., xk) ∈ F^k.
- What is the condition for a node t to be able to receive the information vector x?
- Max-Flow Bound: If maxflow(t) < k, then node t cannot possibly receive x.
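To make the max-flow bound concrete, here is a minimal sketch that computes maxflow(t) with breadth-first augmenting paths on a unit-capacity multigraph. The butterfly topology and the node names (s, a, b, c, d, t1, t2) are assumptions chosen for illustration; they are not taken from the slides.

```python
from collections import defaultdict, deque


def max_flow(edges, source, sink):
    """Max flow where each listed edge is one unit-capacity channel."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1      # parallel edges simply add capacity
        adj[u].add(v)
        adj[v].add(u)         # reverse direction for the residual graph
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow       # no augmenting path is left
        v = sink
        while parent[v] is not None:   # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


BUTTERFLY = [("s", "a"), ("s", "b"), ("a", "t1"), ("b", "t2"),
             ("a", "c"), ("b", "c"), ("c", "d"), ("d", "t1"), ("d", "t2")]

k = 2  # number of information symbols
can_possibly_receive = {t: max_flow(BUTTERFLY, "s", t) >= k for t in ("t1", "t2")}
```

Both sinks have max-flow 2 here, so the bound does not rule out k = 2; it would rule out k = 3.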
12The Basic Results
- If network coding is allowed, a node t can receive the information vector x iff maxflow(t) ≥ k, i.e., the max-flow bound can be achieved simultaneously by all such nodes t. (Ahlswede et al. 00)
- Moreover, this can be achieved by linear network coding for a sufficiently large base field. (Li, Y and Cai; Koetter and Medard, 03)
13Global Encoding Kernels of a Linear Network Code
- Recall that x = (x1, x2, ..., xk) is the multicast message.
- For each channel e, assign a column vector fe such that the symbol sent on channel e is x fe. The vector fe is called the global encoding kernel of channel e.
- The global encoding kernel of a channel is analogous to a column in the generator matrix of a classical block code.
- The global encoding kernel of an output channel at a node must be a linear combination of the global encoding kernels of the input channels.
14An Example
15[Figure: example network; channels labeled b1, b2, and b1+b2]
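Global encoding kernels can be illustrated on a butterfly network over GF(2). In this sketch the topology, the node names, and the kernel assignment are assumptions for illustration; the symbol on channel e is computed as x fe, as defined on the previous slide.

```python
from itertools import product

# Hypothetical kernel assignment: each channel maps to its column vector fe.
KERNELS = {
    ("s", "a"): (1, 0), ("s", "b"): (0, 1),
    ("a", "t1"): (1, 0), ("a", "c"): (1, 0),
    ("b", "t2"): (0, 1), ("b", "c"): (0, 1),
    ("c", "d"): (1, 1),   # (1,0) + (0,1): combination of c's input kernels
    ("d", "t1"): (1, 1), ("d", "t2"): (1, 1),
}
IN_CHANNELS = {"t1": [("a", "t1"), ("d", "t1")],
               "t2": [("b", "t2"), ("d", "t2")]}


def symbol(x, f):
    """Symbol sent on a channel with kernel f: the product x f over GF(2)."""
    return sum(xi * fi for xi, fi in zip(x, f)) % 2


def decode(sink, received):
    """Brute-force decoding: all messages consistent with what the sink saw."""
    return [x for x in product((0, 1), repeat=2)
            if [symbol(x, KERNELS[e]) for e in IN_CHANNELS[sink]] == received]


for x in product((0, 1), repeat=2):
    for sink in IN_CHANNELS:
        seen = [symbol(x, KERNELS[e]) for e in IN_CHANNELS[sink]]
        assert decode(sink, seen) == [x]   # the message is uniquely determined
```

Decoding is unique because the kernels on each sink's input channels are linearly independent, which is the same condition as a generator submatrix being invertible in a block code.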
16Network Coding vs Algebraic Coding
17A Linear Multicast
- A message of k symbols from a base field F is generated at the source node s.
- A k-dimensional linear multicast has the following property: a non-source node t can decode the message correctly if and only if maxflow(t) ≥ k.
- By the Max-flow bound, this is also a necessary condition for a node t to decode (for any given network code).
- Thus the tightness of the Max-flow bound is achieved by a linear multicast, which always exists for sufficiently large base fields.
18An (n,k) Code with dmin = d
- Consider an (n,k) classical block code with minimum distance d.
- Regard it as a network code on a combination network.
- Since the (n,k) code can correct d-1 erasures, all the nodes at the bottom can decode.
19The Combination Network
[Figure: combination network; labels: s, n, n-d+1]
20- For the nodes at the bottom, maxflow(t) = n-d+1.
- By the Max-flow bound, k ≤ maxflow(t) = n-d+1, or d ≤ n-k+1, the Singleton bound.
- Therefore, the Singleton bound is a special case of the Max-flow bound for network coding.
- An MDS code is a classical block code that achieves tightness of the Singleton bound.
- Since a linear multicast achieves tightness of the Max-flow bound, it is formally a network generalization of an MDS code.
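The link between erasure correction and the combination network can be checked on a small case. The sketch below assumes the (3,2) single-parity-check code over GF(2), which has d = 2 and is MDS; each bottom node is modeled as seeing n-d+1 = 2 of the 3 code symbols. The code choice and the function names are illustrative assumptions.

```python
from itertools import combinations, product

n, k, d = 3, 2, 2   # (3,2) single-parity-check code, minimum distance 2


def encode(msg):
    """Map a 2-bit message to a 3-bit codeword by appending the parity bit."""
    b1, b2 = msg
    return (b1, b2, b1 ^ b2)


def bottom_node_can_decode(positions):
    """A bottom node sees only `positions`; check its view fixes the message."""
    seen = set()
    for msg in product((0, 1), repeat=k):
        view = tuple(encode(msg)[i] for i in positions)
        if view in seen:
            return False    # two messages look the same from this node
        seen.add(view)
    return True


# One bottom node per choice of n-d+1 middle nodes.
all_decode = all(bottom_node_can_decode(p)
                 for p in combinations(range(n), n - d + 1))
singleton_holds_with_equality = (k == n - d + 1)
```

Every bottom node decodes, and k equals n-d+1, so this small code meets the Singleton bound, which here coincides with the max-flow bound at the bottom nodes.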
21Two Ramifications of Single-Source Network Coding
- The starting point of classical coding theory and information-theoretic cryptography is the existence of a conduit through which we can transmit information from Point A to Point B without error.
- Single-source network coding provides a new such conduit.
- Therefore, we expect that both classical coding theory and information-theoretic cryptography can be extended to networks.
22Network Error Correction
23Point-to-Point Error Correction in a Network
- Classical error-correcting codes are devised for point-to-point communications.
- Such codes are applied to networks on a link-by-link basis.
24[Figure: blocks labeled Channel Decoder (twice), Network Encoder, and Channel Encoder]
25A Motivation for Network Error Correction
- Observation: Only the receiving nodes have to know the message transmitted; the intermediate nodes don't.
- In general, channel coding and network coding do not need to be separated ⇒ Network Error Correction.
- Network error correction generalizes classical point-to-point error correction.
26Network Codec
27What Does Network Error Correction Do?
- A distributed error-correcting scheme over the network.
- Does not explicitly decode at intermediate nodes as in point-to-point error correction.
- At a sink node t, if c errors can be corrected, it means that the transmitted message can be decoded correctly as long as the total number of errors, which can happen anywhere in the network, is at most c.
28Classical Algebraic Coding
y = x + z
- y: received vector
- x: codeword
- z: error vector
y, x, and z are all in the same space.
29Minimum Distance: Classical Case
- Hamming distance is the most natural distance measure.
- For a code C, dmin = min d(v1,v2), where v1,v2 ∈ C and v1 ≠ v2.
- If dmin = 2c+1, then C can
- Correct c errors
- Detect 2c errors
- Correct 2c erasures
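A short sketch of the classical case: compute dmin by brute force and correct c errors by nearest-codeword decoding. The four-word binary code below is a hypothetical example chosen so that dmin = 3 and c = 1.

```python
from itertools import combinations

CODE = ["00000", "11100", "00111", "11011"]   # hypothetical example code


def hamming(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def minimum_distance(code):
    return min(hamming(u, v) for u, v in combinations(code, 2))


def decode(received, code):
    """Minimum-distance decoding: return the closest codeword."""
    return min(code, key=lambda cw: hamming(cw, received))


d_min = minimum_distance(CODE)
c = (d_min - 1) // 2    # number of correctable errors when dmin = 2c+1

# Flip any single bit of any codeword and check that decoding undoes it.
for cw in CODE:
    for i in range(len(cw)):
        corrupted = cw[:i] + ("1" if cw[i] == "0" else "0") + cw[i + 1:]
        assert decode(corrupted, CODE) == cw
```

The loop succeeds because spheres of radius c around the codewords are disjoint when dmin = 2c+1.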
30Sphere Packing
[Figure: sphere packing; label: dmin]
31Coding Bounds: Classical Case
- Upper bounds
- Hamming bound
- Singleton bound
- Lower bound
- Gilbert-Varshamov bound
32Network Coding
[Figure: network with source node s and nodes u, v; labels: codeword x, error vector z, received vectors yt, yu, yv]
33Input/Output Relation
- The network code is specified by the local encoding kernels at each non-source node.
- Fix a sink node t.
- The codeword x, the error vector z, and the received vector yt are all in different spaces.
- In this tutorial, we consider only linear network codes. Then yt = x Fs,t + z Ft, where Fs,t and Ft depend on t.
- In the classical case, Fs,t and Ft are both the identity matrix.
34Distance Properties of Linear Network Codes (Yang, Y, Zhang 07)
- The network Hamming distance can be defined for linear network codes.
- Many concepts in algebraic coding based on the Hamming distance can be extended to network coding.
35How to Measure the Distance between Two Codewords?
- Fix both the network code and the codebook C, i.e., the set of all possible codewords transmitted into the network.
- For a sink node t, yt(x,z) = x Fs,t + z Ft.
- For two codewords x1, x2 ∈ C, define their distance by Dtmsg(x1,x2) = min wH(z), where the minimum is taken over all error vectors z such that yt(x1,0) = yt(x2,z) or yt(x1,z) = yt(x2,0).
- Idea: Dtmsg(x1,x2) is the minimum Hamming weight of an error vector z that makes x1 and x2 indistinguishable at node t.
- Dtmsg defines a metric on the input space of the linear network code.
36Minimum Distance for a Sink Node
- For a sink node t, dmin,t = min Dtmsg(x1,x2), where the minimum is taken over all x1, x2 ∈ C with x1 ≠ x2.
- Each sink node has a different view of the codebook as each is associated with a different distance measure.
- dmin,t is the minimum distance as seen by sink node t.
- If the codebook C is linear, dmin,t has the following equivalent definition: dmin,t = min { wH(z) : z ∈ At }, where At = { z : yt(x,z) = 0 for some nonzero x ∈ C }.
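The definitions of yt, Dtmsg, and dmin,t can be tried out by brute force on a toy network over GF(2). Everything concrete below is an assumption for illustration: the source has three outgoing channels, one goes straight to the sink, the other two go to a node that adds them and forwards the sum, so the network has four channels and the sink receives two symbols. Over GF(2) the two conditions in the definition of Dtmsg coincide, so only one is tested.

```python
from itertools import combinations, product

# Rows are indexed by source channels (F_ST) or by all channels (F_T);
# columns are indexed by the two symbols received at the sink.
F_ST = [[1, 0], [0, 1], [0, 1]]            # 3 x 2, hypothetical
F_T = [[1, 0], [0, 1], [0, 1], [0, 1]]     # 4 x 2, hypothetical


def vec_mat(v, m):
    """Row vector times matrix over GF(2)."""
    return tuple(sum(vi * row[j] for vi, row in zip(v, m)) % 2
                 for j in range(len(m[0])))


def y_t(x, z):
    """Received vector yt = x Fs,t + z Ft."""
    return tuple((a + b) % 2
                 for a, b in zip(vec_mat(x, F_ST), vec_mat(z, F_T)))


def network_distance(x1, x2):
    """Least Hamming weight of z that makes x2 look like x1 at the sink."""
    no_error = (0,) * len(F_T)
    return min(sum(z) for z in product((0, 1), repeat=len(F_T))
               if y_t(x1, no_error) == y_t(x2, z))


def d_min_t(codebook):
    return min(network_distance(a, b) for a, b in combinations(codebook, 2))


CODEBOOK = [(0, 0, 0), (1, 1, 0)]
```

In this toy case the codeword has length 3, the error vector length 4, and the received vector length 2, which illustrates that the three live in different spaces. The pair (1,1,0) and (1,0,1) has distance 0 because both produce the same received vector, so they cannot both belong to a useful codebook.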
37Error Correction/Detection and Erasure Correction for a Linear Network Code
- If dmin,t = 2c+1, then sink node t can
- Correct c errors
- Detect 2c errors
- Correct 2c erasures
- Some form of sphere packing is at work.
- Much more complicated when the network code is
nonlinear.
38Sphere Packing
[Figure: sphere packing; label: dmin]
39Remark on Error Detection
- In network coding, some error patterns have no effect on the sink nodes. These are invisible error patterns that cannot be (or do not need to be) detected.
- Error detection is also called Byzantine modification detection (Ho et al., ISIT 04).
40Remark on Erasure Correction
- In classical algebraic coding, erasure correction has three equivalent interpretations:
- A symbol is erased means that it is not available.
- A symbol is erased means that the erasure symbol is received.
- The error locations are known.
- In our context, erasure correction means that the locations of the errors are known by the sink nodes but not the intermediate nodes.
41Coding Bounds for Network Codes
- Cai & Y (02, 06) obtained the Hamming bound, the Singleton bound and the Gilbert-Varshamov bound for network codes.
- These bounds are natural extensions of the bounds for algebraic codes.
- Let the base field be GF(q), n = min_t maxflow(t), and dmin = min_t dmin,t.
42Upper Bounds
- Hamming bound
- where .
- Singleton bound
- The Singleton bound is asymptotically tight,
i.e., when q is sufficiently large.
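The formulas on this slide did not survive extraction. The sketch below is therefore an assumption, not a transcription: it evaluates the classical Hamming and Singleton bounds with the block length replaced by n = min_t maxflow(t) and the distance replaced by dmin = min_t dmin,t, which is the form one would expect if the network bounds extend the classical ones directly. Function names are hypothetical.

```python
from math import comb


def hamming_bound(q: int, n: int, d_min: int) -> int:
    """Assumed form: |C| <= q^n / sum_{i<=tau} C(n,i)(q-1)^i, tau=(d_min-1)//2."""
    tau = (d_min - 1) // 2
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(tau + 1))
    return q ** n // ball     # codebook size is an integer, so round down


def singleton_bound(q: int, n: int, d_min: int) -> int:
    """Assumed form: |C| <= q^(n - d_min + 1)."""
    return q ** (n - d_min + 1)


# Example with q = 2, n = 7, d_min = 3.
limits = (hamming_bound(2, 7, 3), singleton_bound(2, 7, 3))
```

For these parameters the Hamming-type limit is 16 and the Singleton-type limit is 32, so the Hamming-type bound is the tighter one over a small field, while the Singleton-type bound is the one the slide describes as tight for large q.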
43Refined Coding Bounds
- Observation: Sink nodes with larger maximum flow can have better error correction capability.
- For a given linear network code, refined Hamming bounds and Singleton bounds specific to the individual sink nodes can be obtained.
44Refined Hamming Bound
- A network code with rank(Fs,t) = mt, codebook C, and dmin,t > 0 satisfies
- where , for all sink nodes t.
45Refined Singleton Bound
- A network code with rank(Fs,t) = mt, codebook C, and dmin,t > 0 satisfies
- for all sink nodes t.
46Remark
- Note that mt ≤ maxflow(t) for all sink nodes t.
- Thus the refined Hamming bounds imply the Hamming bound, and the refined Singleton bounds imply the Singleton bound.
47Tightness of the Refined Singleton Bounds
- These bounds are shown to be asymptotically tight for linear network codes by construction, i.e., it is possible to construct a codebook that achieves tightness of the individual bound at every sink node t for any given linear network code provided that q is sufficiently large.
- This implies that for large base fields, only linear transformations need to be performed at the intermediate nodes! No decoding needed.
48Construction of Network Codes that Achieve the Refined Singleton Bounds
- Deterministic algorithms
- Alg1: Yang, Ngai and Y (ISIT 07)
- Alg2: Matsumoto (IEICE, 07) obtained an algorithm based on robust network codes.
- Alg3: Yang and Y (ITW, Bergen 07)
- All these algorithms have almost the same complexity in terms of the field size requirement and time complexity.
- These algorithms imply that when q is very large, network codes satisfying these bounds can be constructed randomly with high probability.
49Gilbert Bound
- Let ns be the outgoing degree of source node s.
- Let
- be the d-ball about x with respect to the metric
Dtmsg.
50Gilbert Bound
- Given a network code, let Cmax be the maximum possible size of the codebook such that dmin,t ≥ dt for each sink node t. Then,
- where
51Idea of the Gilbert Bound
- If dmin,t ≥ dt for each sink node t, then for any x, there exists a codeword v such that Dtmsg(v,x) < dt; otherwise one more codeword can be added to the codebook.
- Thus all the (dt-1)-balls around the codewords cover the whole input space.
[Figure: a codeword v, a point x, and a ball of radius dt-1]
52Varshamov Bound
- Given a set of local encoding kernels, let ?max be the maximum possible dimension of the linear codebook such that dmin,t ≥ dt for each sink node t. Then,
- where
53Error Correction Capability of Random Network Codes
- Balli, Yan and Zhang 07
- Study the distribution of dmin,t for random
network codes based on a refined bound on the
probability of decoding error for a random linear
network code for multicast.
54Algorithms for Network Error Correction
55For Deterministic and Random Network Codes
- Zhang 07 (to appear in IT)
- Proposed the minimum rank decoding principle, which is equivalent to minimum distance decoding.
- Can decode up to ⌊(dmin,t - 1)/2⌋ errors for each sink node t.
- A fast decoding algorithm for packet networks with random network coding (the same network code is used repeatedly).
- Yan, Balli and Zhang 07
- Decoding beyond the error correction capability.
- Balli, Yan and Zhang 07
- A hybrid approach that combines link-by-link error detection and network erasure correction.
56For Random Network Codes
- Jaggi, Langberg et al. (INFOCOM 07)
- Consider packet networks (the same network code is used repeatedly).
- Scenario 1: Alice and Bob have a low-rate secret channel.
- A polynomial-time algorithm that achieves the optimal rate asymptotically.
- Scenario 2: Alice and Bob have no shared secret.
- A polynomial-time algorithm that achieves the Singleton bound asymptotically.
- Extendable to the refined Singleton bounds?
57For Random Network Codes
- Koetter and Kschischang (ISIT 07)
- Let the input space of the random network code be F^n, where n = min_t maxflow(t).
- At a sink node t, the transfer matrix is likely to be full rank.
- The codebook is the collection of all k-dimensional subspaces of F^n, each called a codeword.
- If a codeword A is chosen, then transmit a set of vectors in A that span A. It does not matter which set.
- If the transfer matrix at a sink node t is full-rank (with high probability), the received vectors also span A.
- Can be regarded as a more general theoretical framework for random linear network coding.
58- Koetter and Kschischang (cont.)
- Thus the codeword can be decoded correctly in the absence of error.
- In the presence of error, decoding is done according to a distance measure between subspaces.
- Yet to understand the performance of such codes in a given network.
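The subspace idea can be sketched over GF(2): a codeword is a subspace, the network multiplies the transmitted basis by an unknown full-rank matrix, and the sink compares row spaces. The vectors, the mixing matrix, and the particular distance used (dimension of the sum minus dimension of the intersection) are assumptions chosen for illustration; the slides do not specify the distance measure.

```python
def rref_gf2(rows):
    """Reduced row echelon form over GF(2); canonical for the row space."""
    rows = [list(r) for r in rows]
    width = len(rows[0]) if rows else 0
    rank = 0
    for col in range(width):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return tuple(tuple(r) for r in rows[:rank])


def mix(transfer, vectors):
    """Each received vector is a GF(2) combination of the sent vectors."""
    return [tuple(sum(t * v[j] for t, v in zip(row, vectors)) % 2
                  for j in range(len(vectors[0]))) for row in transfer]


def subspace_distance(u, v):
    """dim(U+V) - dim(U intersect V) = 2 dim(U+V) - dim U - dim V."""
    dim = lambda rows: len(rref_gf2(rows))
    return 2 * dim(list(u) + list(v)) - dim(u) - dim(v)


SENT = [(1, 0, 1, 1), (0, 1, 1, 0)]      # a basis of the codeword A
TRANSFER = [[1, 1], [0, 1]]              # full rank, unknown to the sink
received = mix(TRANSFER, SENT)
corrupted = [received[0], (0, 1, 1, 1)]  # second vector hit by an error
```

Without errors the received vectors span the same subspace as the sent ones, so the sink identifies A without knowing the transfer matrix. With the corrupted vector the distance to A becomes positive, and decoding would pick the codeword subspace closest to what was received.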
59Applications of Network Error Correction
60Errors due to Noise in Channels
- Separation of channel coding and network coding is asymptotically optimal provided two conditions are satisfied:
- All channels are memoryless.
- The channels are independent.
- (Borade 02, Song & Y 06)
- If not, there is no separation theorem.
- Then applying turbo codes link-by-link does not guarantee optimality.
- A linear network error-correcting code is an attractive solution for its low encoding complexity.
61Malicious Injection of Errors
- Malicious nodes in the network may inject errors deliberately to disturb data transmission.
- Classical error correction does not help because redundancy is injected only in time.
- Network error correction is a natural solution because redundancy is injected in both time and space.
62Further Reading for Network Error Correction
- R. W. Yeung and N. Cai, "Network error correction, Part I & II," Communications in Information and Systems, 2006. First presented at ITW 2002.
- Ho et al., "Byzantine modification detection in multicast networks using randomized network coding," ISIT 2004.
- R. W. Yeung, S.-Y. R. Li, N. Cai and Z. Zhang, "Network Coding Theory," now Publishers, 2005 (Foundations and Trends in Communications and Information Theory).
- S. Yang and R. W. Yeung, "Characterizations of network error correction/detection and erasure correction," NetCod 2007.
- Z. Zhang, "Linear network error correction codes in packet networks," to appear in IEEE IT.
- S. Yang, C. K. Ngai, and R. W. Yeung, "Construction of linear network codes that achieve a refined Singleton bound," ISIT 2007.
- R. Koetter and F. Kschischang, "Coding for errors and erasures in random network coding," ISIT 2007.
- S. Yang and R. W. Yeung, "Refined coding bounds for network error correction," ITW, Bergen 2007.
- S. Jaggi et al., "Resilient network coding in the presence of Byzantine adversaries," INFOCOM 2007.
- Z. Zhang, "Some recent progress in network error correction progress," NetCod 2008.
63- H. Balli, X. Yan, and Z. Zhang, "Error correction capability of random network error correction codes," submitted to IT.
- X. Yan, H. Balli, and Z. Zhang, "Decode network error correction codes beyond error correction capability," submitted to IT.
- H. Balli, X. Yan, and Z. Zhang, "A hybrid network error correction coding system," in preparation.
64Secure Network Coding
65Problem Formulation
- The underlying model is the same as network multicast using network coding except that some sets of channels can be wiretapped.
- Let A be a collection of subsets of the edge set E.
- A subset in A is called a wiretap set.
- Each wiretap set may be fully accessed by a wiretapper.
- No wiretapper can access more than one wiretap set.
- The network code needs to be designed in a way such that no matter which wiretap set the wiretapper has access to, the multicast message is information-theoretically secure.
- The model is a network generalization of secret sharing (Blakley, Shamir, 79) and wiretap channel II (Ozarow and Wyner 84).
66A Coding Scheme (Cai-Y 02)
- The multicast message is (m,k), where
- m is the secure message
- k is the key (randomness)
- Both m and k are generated at the source node.
67An Example of a Secure Network Code
68[Figure: secure network code; channels labeled m+k, m-k, and k]
- One of the 3 red channels can be wiretapped
- m is the secure message
- k is the key
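The security of the symbols m+k, m-k, and k can be checked exhaustively over a small field. The sketch assumes GF(3), since the slide does not name the field and the scheme needs a field where m+k and m-k differ (any characteristic other than 2). It also assumes the key is uniform and that some sink observes both m+k and m-k; the function names are hypothetical.

```python
from collections import Counter

P = 3   # assumed field size; arithmetic is modulo P


def channel_symbols(m: int, k: int) -> dict:
    """Symbols that a wiretapper might see, one per tappable channel."""
    return {"m+k": (m + k) % P, "m-k": (m - k) % P, "k": k % P}


def leaks_nothing(name: str) -> bool:
    """True if the symbol's distribution over a uniform key ignores m."""
    dists = [Counter(channel_symbols(m, k)[name] for k in range(P))
             for m in range(P)]
    return all(d == dists[0] for d in dists)


def decode(sum_symbol: int, diff_symbol: int) -> int:
    """A sink holding m+k and m-k recovers m as (sum + diff) / 2."""
    inverse_of_two = pow(2, -1, P)   # modular inverse, Python 3.8+
    return ((sum_symbol + diff_symbol) * inverse_of_two) % P


single_channel_secure = all(leaks_nothing(name) for name in ("m+k", "m-k", "k"))
```

Each single symbol is uniformly distributed whatever the message is, so one tapped channel reveals nothing about m, while two of the symbols together determine m.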
69Another Example of Secure Network Coding
- The (1,2)-threshold Secret Sharing Scheme
70[Figure: channels labeled k, m-k, and m+k]
- One of the 3 red channels can be wiretapped
- m is the secure message
- k is the key
71Construction of Secure Network Codes
- Let n = min_t maxflow(t).
- A sufficient condition under which a secure linear network code can be constructed has been obtained (Cai and Y, 02 and 07).
- Important Special Case: If A consists of all the r-subsets of E, where r < n, then we can construct a secure network code with multicast message (m,k) such that |m| = n - r and |k| = r.
- For this case, the condition is also necessary.
- Interpretation: For a sink node t, if r channels in the network are wiretapped, the number of secure paths from the source node to t is still at least n - r. So n - r symbols can go through securely.
72Idea of Code Construction
- Start with a linear network code for multicasting n symbols.
- For every wiretap set A in the collection of wiretap sets, let fA = { fe : e ∈ A }, the set of global encoding kernels of the channels in A.
- Let dim(span(fA)) ≤ r for all wiretap sets A. [sufficient condition]
- When the base field F is sufficiently large, we can find b1, b2, ..., bn-r ∈ F^n such that b1, b2, ..., bn-r are linearly independent of fA for all wiretap sets A.
- Extend b1, b2, ..., bn-r to b1, b2, ..., bn-r, bn-r+1, ..., bn to form a basis for F^n, and let M = [ b1 b2 ... bn ].
- M is invertible.
73- Let the multicast message be (m,k), with |m| = n-r and |k| = r.
- Take a linear transformation of the given linear network code by the matrix M^-1 to obtain the desired secure network code.
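The construction can be traced on a tiny case with n = 2 and r = 1 over GF(3). The starting kernels (1,0), (0,1), (1,1), the choice b1 = (1,2), b2 = (0,1), and the field are all assumptions for illustration. Each new kernel is M^-1 fe, and the symbol on the channel is (m,k) times that kernel, so a channel is safe against a single tap when the key coefficient (second entry) is nonzero.

```python
P = 3                                   # assumed field size
KERNELS = [(1, 0), (0, 1), (1, 1)]      # distinct kernels of the original code
B1, B2 = (1, 2), (0, 1)                 # b1 avoids every span{fe}; b2 extends it


def inverse_2x2(mat):
    """Inverse of a 2x2 matrix modulo P (requires an invertible determinant)."""
    (a, b), (c, d) = mat
    det_inv = pow((a * d - b * c) % P, -1, P)
    return [[(d * det_inv) % P, (-b * det_inv) % P],
            [(-c * det_inv) % P, (a * det_inv) % P]]


M = [[B1[0], B2[0]], [B1[1], B2[1]]]    # columns are b1 and b2
M_INV = inverse_2x2(M)


def transform(f):
    """New global encoding kernel M^-1 f for a channel with old kernel f."""
    return tuple(sum(M_INV[i][j] * f[j] for j in range(2)) % P
                 for i in range(2))


SECURE_KERNELS = [transform(f) for f in KERNELS]
every_channel_masked = all(kernel[1] != 0 for kernel in SECURE_KERNELS)
```

The transformed kernels are (1,1), (0,1), and (1,2), so the channels carry m+k, k, and m+2k, and m+2k equals m-k in GF(3). Under these assumptions the construction reproduces the symbols in the earlier example of a secure network code.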
74Optimality of the Cai-Yeung Construction
- When the collection of wiretap sets consists of all r-subsets of E, the construction is optimal in terms of
- the size of the message (maximum)
- the size of the key (minimum).
- The proof of the latter involves a set of inequalities due to T. S. Han.
75Han's Inequalities
76Algorithms for Secure Network Coding
- Jain 2004
- A security protocol that uses both network coding and one-way functions.
- Feldman et al, 2004
- A characterization of secure network codes in terms of a generalized distance measure.
- A smaller field size can be used by giving up a small amount of overall capacity.
77Algorithms for Secure Network Coding
- Bhattad and Narayanan 05
- Propose weakly secure network coding for which the wiretapper cannot obtain any useful information.
- Very simple scheme.
- Not information-theoretically secure.
78Algorithms for Secure Network Coding
- Jaggi, Langberg et al., 07
- An efficient algorithm using random network coding in an unknown network topology that achieves asymptotically the same optimal rate as Cai-Yeung.
- Requires repeated use of the same random network code.
79Further Reading for Secure Network Coding
- N. Cai and R. W. Yeung, "Secure network coding," ISIT 02. Full version available upon request.
- K. Jain, "Security based on network topology against the wiretapping attack," IEEE Wireless Comm., Feb 2004.
- J. Feldman, T. Malkin, C. Stein, R. A. Servedio, "On the capacity of secure network coding," 2004 Allerton Conference.
- K. Bhattad and K. R. Narayanan, "Weakly secure network coding," NetCod 2005.
- N. Cai and R. W. Yeung, "A security condition for multi-source linear network coding," ISIT 2007.
- S. Jaggi et al., "Resilient network coding in the presence of Byzantine adversaries," INFOCOM 2007.
- E. Soljanin and S. El Rouayheb, "On wiretap networks II," ISIT 2007.
80Applications of Random Network Coding in P2P
81What is Peer-to-Peer (P2P)?
- Client-Server is the traditional architecture for content distribution in a computer network.
- P2P is the new architecture in which users who download the file also help disseminate it.
- Extremely efficient for large-scale content distribution, i.e., when there are a lot of clients.
- P2P traffic occupies at least 70% of Internet bandwidth.
- BitTorrent is the most popular P2P system.
82What is Avalanche?
- Avalanche is a Microsoft P2P prototype that uses random linear network coding.
- It is one of the first applications / implementations of network coding, by Gkantsidis and Rodriguez 05.
- It has recently been further developed into Microsoft Secure Content Distribution (MSCD).
83How Avalanche Works?
- When the server or a client uploads to a neighbor, it transmits a random linear combination of the blocks it possesses. The linear coefficients are attached to the transmitted block.
- Analogy: Color-mixing.
- Each transmitted block is some linear combination of the original blocks of the seed file.
- Download is complete if enough linearly independent blocks have been received, and decoding can be done accordingly.
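The upload and decode steps can be sketched as random linear combinations over a prime field followed by Gaussian elimination. This is not Avalanche's actual implementation: the field GF(257), the block layout as lists of small integers, and the function names are assumptions for illustration.

```python
import random

P = 257   # assumed prime field, large enough to hold byte values


def random_combination(blocks, rng):
    """Return (coefficients, payload): one coded block with its header."""
    coeffs = [rng.randrange(P) for _ in blocks]
    payload = [sum(c * b[i] for c, b in zip(coeffs, blocks)) % P
               for i in range(len(blocks[0]))]
    return coeffs, payload


def decode(packets, k):
    """Gaussian elimination mod P; None until k independent blocks arrive."""
    rows = [list(c) + list(p) for c, p in packets]
    for col in range(k):
        pivot = next((i for i in range(col, len(rows)) if rows[i][col]), None)
        if pivot is None:
            return None                       # not enough independent blocks
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)      # modular inverse, Python 3.8+
        rows[col] = [(v * inv) % P for v in rows[col]]
        for i in range(len(rows)):
            if i != col and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % P
                           for a, b in zip(rows[i], rows[col])]
    return [row[k:] for row in rows[:k]]      # coefficient part is now identity


blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # the seed file, three blocks
rng = random.Random(0)
packets = []
while decode(packets, len(blocks)) is None:   # keep downloading coded blocks
    packets.append(random_combination(blocks, rng))
recovered = decode(packets, len(blocks))
```

The receiver never needs specific blocks, only enough linearly independent ones, which is why the coefficient header travels with each block. A peer that holds only coded blocks could re-mix them the same way, combining the headers along with the payloads.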
84The Butterfly Network: A Review
[Figure: butterfly network with the callout "Synchronization here"]
85What Exactly is Avalanche Doing?
- In Avalanche, there does not seem to be any need of synchronization.
- Is Avalanche doing the same kind of network coding we have been talking about?
- If not, what is it doing and is it optimal in any sense?
86A Time-Parametrized Graph
[Figure: time-parametrized graph over times t=0, t=1, t=2, t=3 with nodes Server, Client A, Client B, and Client C; edges carry integer labels]
87Analysis of Avalanche (Y, NetCod 2007)
- The time-parametrized graph, not the physical network, is the graph to look at.
- By examining the maximum flows in this graph, the following questions can be answered:
- When can a client receive the whole file?
- If the server and/or some clients leave the system, can the remaining clients recover the whole file?
- If some blocks are lost at some clients, how does it affect the recovery of the whole file?
88Some Remarks
- Avalanche is not doing the usual kind of random network coding, but it can be analyzed by the tools we are familiar with.
- Avalanche minimizes delay with respect to the given transmission schedule if computation is ignored.
- Extra computation is the price to pay.
- Avalanche provides the maximum possible robustness for the system.
- P2P is perhaps the most natural environment for applying random network coding because the subnet is formed on the fly.
89Networks with Packet Loss: A Toy Example
- One packet is sent on each channel per unit time.
- Packet loss rate = 0.1 on each channel.
- By using a fountain code, information can be transmitted from A to C at rate (0.9)^2 = 0.81.
- By using an Avalanche-type system, information can be transmitted from A to C at rate 0.9 = max-flow from A to C.
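The two rates can be reproduced with a short calculation. The sketch assumes a two-hop line A to B to C with independent losses, where in the fountain-code case B only forwards what it receives, and in the Avalanche-type case B re-encodes, so each hop independently sustains its own delivery rate. The function names are hypothetical.

```python
def forwarding_rate(loss: float, hops: int) -> float:
    """End-to-end coding only: a packet must survive every hop."""
    return (1 - loss) ** hops


def recoding_rate(loss: float, hops: int) -> float:
    """Re-encoding at each node: limited by the worst single hop."""
    return min(1 - loss for _ in range(hops))


fountain = forwarding_rate(0.1, 2)    # approximately 0.81
avalanche = recoding_rate(0.1, 2)     # 0.9
```

The gap widens with path length: over more hops the forwarding rate keeps shrinking geometrically while the re-encoding rate stays at the capacity of a single lossy channel.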
90An Explanation
91Networks with Packet Loss
- By using an Avalanche-type system, the max-flow from the source to the sink (amortized by the packet loss rate) can be achieved automatically, which is the fundamental limit.
- Virtually nothing needs to be done.
92Concluding Remarks
- The theory of network coding naturally ramifies in the direction of error correction and information-theoretic cryptography.
- The development in these areas of network coding is still in its infancy.
- Many potential applications in networking, wireless, information security, etc.
- Application-driven theory.
- A lot of very exciting research ahead.