Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures
1
Data Persistence in Sensor Networks: Towards
Optimal Encoding for Data Recovery in Partial
Network Failures
  • Abhinav Kamra, Jon Feldman, Vishal Misra and Dan
    Rubenstein
  • DNA Research Group, Columbia University

2
Motivation and Model
  • Typical scenario of sensor networks:
  • Large number of nodes deployed to sense the
    environment
  • Data collected periodically, pulled/pushed through
    a sink/gateway node
  • Nodes prone to failure (disaster, battery life,
    targeted attack)
  • Want data to survive individual node failures:
    "data persistence"

3
Overview
  • Erasure codes
  • LT-Codes
  • Soliton distribution
  • Coding for failure-prone sensor networks
  • Major results
  • A brief sketch of proofs
  • A case study of failure-prone sensor networks

4
Erasure Codes
[Figure: a message of n blocks is encoded into cn blocks; after transmission with losses, the received blocks are decoded back into the original n-block message]
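As a toy illustration of the erasure-code idea in the figure (not the scheme from this talk), a single XOR parity block already lets a receiver recover any one lost block:

```python
from functools import reduce

def encode_with_parity(blocks):
    """Append one parity block: the XOR of all n message blocks (n+1 total)."""
    return blocks + [reduce(lambda a, b: a ^ b, blocks)]

def recover_lost(received):
    """With exactly one erased block (marked None), XOR-ing every
    surviving block (parity included) yields the lost block."""
    return reduce(lambda a, b: a ^ b, (v for v in received if v is not None))
```

For example, encoding `[5, 9, 12]` gives four blocks, and if the second block is erased, `recover_lost` reconstructs it from the other three.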
5
Luby Transform Codes
  • Simple Linear Codes
  • Improvement over Tornado codes
  • Rateless Codes

6
Erasure Codes: LT-Codes
[Figure: file F divided into n = 5 input blocks b1 … b5]
7
LT-Codes: Encoding
  1. Pick degree d1 from a pre-specified distribution
    (here d1 = 2).
  2. Select d1 input blocks uniformly at random (pick
    b1 and b4).
  3. Compute their sum (XOR).
  4. Output the sum together with the component block IDs.

[Figure: encoded symbol c1 = b1 ⊕ b4 produced from the input blocks b1 … b5 of F]
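The four encoding steps above can be sketched as follows (a minimal illustration, assuming blocks are integers XOR-ed bitwise and the degree distribution is supplied as (degree, probability) pairs):

```python
import random

def lt_encode_symbol(blocks, degree_dist, rng=random):
    """Emit one LT-encoded symbol: (component block IDs, XOR of those blocks)."""
    degrees, probs = zip(*degree_dist)
    d = rng.choices(degrees, weights=probs, k=1)[0]   # step 1: pick degree d
    ids = rng.sample(range(len(blocks)), d)           # step 2: d distinct blocks
    value = 0
    for i in ids:                                     # step 3: XOR them together
        value ^= blocks[i]
    return frozenset(ids), value                      # step 4: output sum + IDs
```

Because the code is rateless, this routine can be called as many times as needed to generate fresh encoded symbols.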
8
LT-Codes: Encoding
[Figure: the complete encoding E(F) of file F]
9
LT-Codes: Decoding
10
Degree Distribution for LT-Codes
  • Soliton distribution
  • Avg degree = H(N) ≈ ln(N)
  • In expectation, exactly one degree-1 symbol in
    each round of decoding
  • Distribution very fragile in practice
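The ideal soliton distribution referenced above has a closed form, ρ(1) = 1/N and ρ(d) = 1/(d(d−1)) for d = 2 … N; a quick sketch, including a check that the average degree is the harmonic number H(N) ≈ ln(N):

```python
def ideal_soliton(n):
    """Ideal soliton distribution over degrees 1..n (index 0 unused)."""
    rho = [0.0] * (n + 1)
    rho[1] = 1.0 / n
    for d in range(2, n + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho

n = 1000
rho = ideal_soliton(n)
avg_degree = sum(d * rho[d] for d in range(1, n + 1))
# The probabilities telescope to 1, and avg_degree = 1/n + H(n-1) ~ ln(n)
```

The fragility noted on the slide is why Luby's practical construction perturbs this into the robust soliton distribution.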

11
Failure-prone Sensor Networks
  • All earlier works:
  • How many encoded symbols are needed to recover all
    original symbols (all-or-nothing decoding)?
  • Failure-prone networks:
  • How many original symbols can be recovered from the
    given surviving encoded symbols?

12
Iterative Decoder
[Figure: the four received encoded symbols, each shown with its component originals among x1 … x5]
  • 5 original symbols x1 … x5
  • 4 encoded symbols received
  • Each encoded symbol is the XOR of its component
    original symbols
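The iterative (peeling) decoder on this slide can be sketched as follows: repeatedly find an encoded symbol with exactly one unresolved component, recover that component, and substitute it back into the remaining symbols (a minimal illustration, assuming integer-valued symbols):

```python
def peel_decode(symbols):
    """Iterative (peeling) decoder.

    symbols: iterable of (ids, xor_value) pairs, where ids is the set of
    original-symbol indices XOR-ed into that encoded symbol.
    Returns a dict mapping each recovered index to its value."""
    symbols = [(set(ids), val) for ids, val in symbols]
    recovered = {}
    progress = True
    while progress:
        progress = False
        for ids, val in symbols:
            unresolved = ids - set(recovered)
            if len(unresolved) == 1:          # degree-1 w.r.t. what is known
                x = val
                for i in ids & set(recovered):
                    x ^= recovered[i]         # substitute known components back
                recovered[unresolved.pop()] = x
                progress = True
    return recovered
```

The decoder stops when no symbol has exactly one unresolved component; with fewer symbols than originals it recovers a partial set, which is precisely the regime this talk studies.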

13
Sensor Network Model
  • k encoded symbols remain after failures
  • Want to maximize r, the number of recovered original
    data symbols
  • No idea a priori what k will be

14
Coding is bad for small k
  • N original symbols
  • k encoded symbols received
  • If k ≤ 0.75N, no coding required

15
Proof Sketch
  • Theorem: To recover the first N/2 symbols, it is
    best not to do any encoding
  • Proof:
  • Let C(i, j) = expected number of symbols recovered
    from i degree-1 symbols and j symbols of degree 2
    or more
  • Claim: C(i, j) ≤ C(i+1, j−1) if C(i, j) ≤ N/2
  • Sort the given symbols in decoding order
  • All degree-1 symbols will be decoded before other
    symbols
  • The last symbol in decoding order will be of degree
    > 1 (by the previous point)
  • Replace this symbol by a random degree-1 symbol
  • The new degree-1 symbol is more likely to be useful
  • Hence, more degree-1 symbols ⇒ better output
  • No coding is best for recovering any first N/2
    symbols
  • All degree 1 ⇒ coupon collector ⇒ about 3N/4
    symbols to recover N/2 distinct symbols
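The coupon-collector step in this argument is easy to sanity-check by simulation: draw uniform degree-1 symbols with replacement until N/2 distinct originals have been seen (an illustrative sketch, not code from the paper):

```python
import random

def draws_until_distinct(n, target, rng):
    """Uniform draws with replacement from 0..n-1 until `target` distinct seen."""
    seen, draws = set(), 0
    while len(seen) < target:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(42)
n = 128
trials = [draws_until_distinct(n, n // 2, rng) for _ in range(500)]
avg = sum(trials) / len(trials)
# Classical expectation: N * (H(N) - H(N/2)) = N ln 2 + o(N),
# i.e. on the order of 3N/4 draws to collect N/2 distinct symbols
```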

16
Ideal Degree Distribution
  • Theorem: To recover r data units such that
    r < jN/(j+1), the optimal degree distribution
    has symbols of degree j or less only

17
Lower degrees are better for small k
  • For k up to a threshold kj, use symbols of degree
    at most j
  • So use (kj − kj−1) symbols of degree j to get close
    to the optimal distribution

18
Case Study: Single-sink Sensor Network
[Figure: storage in a single-sink sensor network]
19
Case Study: Single-sink Sensor Network
  • Network prone to failure
  • Nodes store unencoded symbols at first, and
    higher-degree symbols as time passes
  • Sink receives low-degree symbols first and
    higher-degree symbols as time goes on

20
Distributed Simulation: Clique Topology
  • N = 128 nodes in a clique topology
  • Sink receives one symbol per unit time

21
Distributed Simulation: Chain Topology
[Figure: chain topology 1 - 2 - 3 - ... - N]
  • N = 128 nodes in a chain topology

22
Related Work
  • Bulk data distribution: coding is useful
  • Tornado codes ("Efficient Erasure Correcting Codes"
    by M. Luby et al., IEEE Transactions on Information
    Theory, vol. 47, no. 2, 2001)
  • LT-Codes ("LT Codes" by M. Luby, FOCS 2002)
  • Reliable storage in sensor networks
  • Decentralized erasure codes ("Ubiquitous Access to
    Distributed Data in Large-Scale Sensor Networks
    through Decentralized Erasure Codes" by A.
    Dimakis et al., IPSN 2005)
  • Random linear coding ("How Good is Random Linear
    Coding Based Distributed Networked Storage?" by
    M. Medard et al., NetCod 2005)