Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks - PowerPoint PPT Presentation

About This Presentation
Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University – PowerPoint PPT presentation

Slides: 89
Transcript and Presenter's Notes


1
Digital Fountains, and Their Application to
Informed Content Delivery over Adaptive Overlay
Networks
  • Michael Mitzenmacher
  • Harvard University

2
The Talk
  • Survey of the area
  • My work, and work of others
  • History, perspective
  • Less on theoretical details, more on big ideas
  • Start with digital fountains
  • What they are
  • How they work
  • Simple applications
  • Content delivery
  • Digital fountains, and other tools

3
Data in the TCP/IP World
  • Data is an ordered sequence of bytes
  • Generally split into packets
  • Typical download transaction
  • "I need the file: packets 1-100,000."
  • Sender sends packets in order (windows).
  • "Packet 75 is missing, please re-send."
  • Clean semantics
  • File is stored this way
  • Reliability is easy
  • Works for point-to-point downloads

4
Problem Case: Multicast
  • One sender, many downloaders
  • Midnight madness problem: new software
  • Video-on-demand (not real time)
  • Can download to each individual separately
  • Doesn't scale
  • Can broadcast
  • All users must start at the same time?
  • Heterogeneous packet loss
  • Heterogeneous download rates

5
Digital Fountain Paradigm
  • Stop thinking of data as an ordered stream of bytes.
  • Data is like water from a fountain.
  • Put out your cup, stop when the cup is full.
  • You don't care which drops of water you get.
  • You don't care what order the drops get to your
    cup.

6
What is a Digital Fountain?
  • For this talk, a digital fountain is an
    ideal/paradigm for data transmission.
  • Vs. the standard (TCP) paradigm: data is an
    ordered finite sequence of bytes.
  • Instead, with a digital fountain, a k-symbol file
    yields an infinite data stream; once you have
    received any k symbols from this stream, you can
    quickly reconstruct the original file.

7
Digital Fountains for Multicast
  • Packets sent from a single source along a tree.
  • Everyone grabs what they can.
  • Starting time does not matter: start whenever.
  • Packet loss does not matter: avoids feedback
    explosion of lost packets.
  • Heterogeneous download rates do not matter: drop
    packets at routers as needed for proper rate.
  • When a user has filled their cup, they leave the
    multicast session.

8
Digital Fountains for Parallel Downloads
  • Download from multiple sources simultaneously and
    seamlessly.
  • All sources fill the cup: since each fountain
    has an "infinite" collection of packets, no
    duplicates.
  • Relative fountain speeds unimportant: just need
    to get enough.
  • No coordination among sources necessary.
  • Combine multicast and parallel downloading.
  • Wireless networks, multiple stations and antennas.

9
Digital Fountains for Point-to-Point Data
Transmission
  • TCP has problems over long-distance connections.
  • Packets must be acknowledged to increase sending
    window (packets in flight).
  • Long round-trip time leads to slow acks, bounding
    transmission window.
  • Any loss increases the problem.
  • Using a digital fountain plus TCP-friendly congestion
    control can greatly speed up connections.
  • Separates the "what you send" from "how much you
    send".
  • Do not need to buffer for retransmission.

10
One-to-Many TCP
  • Setting: Web server with popular files, may have
    many open connections serving same file.
  • Problem: has to have a separate buffer, state
    for each connection to handle retransmissions.
  • Limits number of connections per server.
  • Instead, use a digital fountain to generate
    packets useful for all connections for that file.
  • Separates the "what you send" from "how much you
    send".
  • Do not need to buffer for retransmission.
  • Keeps TCP semantics, congestion control.

11
  • Digital fountains seem great!
  • But do they really exist?

12
How Do We Build a Digital Fountain?
  • We can construct (approximate) digital fountains
    using erasure codes.
  • Including Reed-Solomon, Tornado, LT, fountain
    codes.
  • Generally, we only come close to the ideal of the
    paradigm.
  • Streams not truly infinite; encoding or decoding
    times; coding overhead.

13
Digital Fountains through Erasure Codes
[Diagram: Message (n symbols) → Encoding Algorithm → Encoding (cn symbols) → Transmission → Received → Decoding Algorithm → Message (n symbols)]
14
Reed-Solomon Codes
  • In theory, can produce an unlimited number of
    encoding symbols, only need k to recover.
  • In practice, limited by:
  • Field size (usually 256 or 65,536)
  • Quadratic encoding/decoding times
  • These problems ameliorated by striping data.
  • But raises overhead: now many more than k
    packets required to recover.
  • Conclusion: may be suitable for some
    applications, but far from practical or
    theoretical goals of a digital fountain.

15
Tornado Codes
  • Irregular low-density parity check codes.
  • Based on graphs: k input symbols lead to n
    encoding symbols, using XORs.
  • Sparse set of equations derived from input
    symbols.
  • Solve received set of equations using back
    substitution.
  • Properties:
  • Graph of size n agreed on by encoder, decoder,
    and stored.
  • Need $k(1+\epsilon)$ symbols to decode, for some $\epsilon > 0$.
  • Encoding/decoding time proportional to
    $n \ln(1/\epsilon)$.

16
Tornado Codes: An Example
17
Encoding Process
18
Decoding Process: Substitution Recovery
[Figure legend: marker indicates right node has one edge]
19
Regular Graphs
[Figure: nodes of degree 3 and nodes of degree 6, joined by a random permutation of the edges]
20
Decoding Process: Analysis
[Figure: induced graph; legend: recovered, missing/not yet recovered]
21
3-6 Regular Graph Analysis
[Figure: levels labeled Left, Right, Left]
22
3-6 Regular Graph Equation
Want $y < x$ for all $0 < x < \alpha$.
Works for $\alpha < 0.43$.
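
The 0.43 figure can be checked numerically. The sketch below is a minimal illustration, assuming the standard erasure-decoding recursion for a 3-6 regular graph, y = α(1 − (1 − x)^5)^2, where α is the erasure fraction. That formula and the function names `decodes` and `threshold` are assumptions made here; the transcript does not preserve the slide's own equation.

```python
def decodes(alpha, grid=10_000):
    """True if y < x at every grid point x in (0, alpha].

    Assumed recursion for a 3-6 regular graph (not taken from the slide):
        y = alpha * (1 - (1 - x)**5)**2
    """
    for i in range(1, grid + 1):
        x = alpha * i / grid
        y = alpha * (1 - (1 - x) ** 5) ** 2
        if y >= x:
            return False
    return True


def threshold(lo=0.0, hi=1.0, iters=40):
    """Binary search for the largest erasure fraction that still decodes."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if decodes(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(f"3-6 regular threshold is roughly {threshold():.4f}")
```

Under these assumptions the search should settle just below 0.43, in line with the bound quoted on the slide.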
23
Irregular Graphs
  • 3-6 regular graphs can correct up to 0.43
    fraction of erasures.
  • Best possible, with n/2 constraints for n
    symbols, would be 0.5.
  • 3-6 gives best performance of all regular graphs.
  • Need irregular graphs, with varying degrees, to
    reach optimality.

24
Tornado Codes: Weaknesses
  • Encoding size n must be fixed ahead of time.
  • Memory, encoding and decoding times proportional
    to n, not k.
  • Overhead factor of $(1+\epsilon)$.
  • Hard to design around. In practice $\epsilon = 0.05$.
  • Conclusion: Tornado codes a dramatic step
    forward, allowing good approximations to digital
    fountains for many applications.
  • Key problem: fixed encoding size.

25
Decoding Process: Direct Recovery
26
Digital Fountains through Erasure Codes: Problem
[Diagram: Message (n symbols) → Encoding Algorithm → Encoding (cn symbols) → Transmission → Received → Decoding Algorithm → Message (n symbols)]
27
Digital Fountains through Erasure Codes: Solution
[Diagram: Message (n symbols) → Encoding Algorithm → Encoding → Transmission → Received → Decoding Algorithm → Message (n symbols)]
28
LT Codes
  • Key idea: graph is implicit, rather than
    explicit.
  • Each encoding symbol is the XOR of a random
    subset of neighbors, independent of other
    symbols.
  • Each encoding symbol carries a small header,
    telling what message symbols it is the XOR of.
  • No initial graph: graph derived from received
    symbols.
  • Properties:
  • Infinite supply of packets possible.
  • Need $k + o(k)$ symbols to decode.
  • Decoding time proportional to $k \ln k$.
  • On average, $\ln k$ time to produce an encoding
    symbol.
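
The encode and decode steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the codes as deployed: the degree distribution used here (the ideal soliton distribution, whose mean degree is about ln k) is an assumed choice the slide does not name, symbols are plain integers, and the names `soliton_degree`, `fountain`, and `decode` are hypothetical.

```python
import random


def soliton_degree(k, rng):
    """Sample a degree from the ideal soliton distribution (an assumed choice)."""
    u = rng.random()
    cumulative = 1.0 / k  # P(degree = 1) = 1/k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))  # P(degree = d) = 1/(d(d-1))
        if u < cumulative:
            return d
    return k  # guard against floating-point round-off


def fountain(message, seed=0):
    """Endless stream of (header, payload) encoding symbols."""
    rng = random.Random(seed)
    k = len(message)
    while True:
        neighbors = rng.sample(range(k), soliton_degree(k, rng))
        payload = 0
        for i in neighbors:
            payload ^= message[i]
        # The header lists which message symbols were XORed together.
        yield frozenset(neighbors), payload


def decode(k, stream):
    """Consume symbols until all k message symbols are recovered by substitution."""
    recovered = {}
    pending = []  # buffered symbols as [set of unknown neighbors, payload]
    for used, (header, payload) in enumerate(stream, start=1):
        pending.append([set(header), payload])
        progress = True
        while progress:
            progress = False
            for sym in pending:
                # Substitute every message symbol that is already known.
                for i in [i for i in sym[0] if i in recovered]:
                    sym[0].discard(i)
                    sym[1] ^= recovered[i]
                # A symbol with one unknown neighbor reveals that neighbor.
                if len(sym[0]) == 1:
                    recovered[sym[0].pop()] = sym[1]
                    progress = True
            pending = [sym for sym in pending if sym[0]]
        if len(recovered) == k:
            return [recovered[i] for i in range(k)], used


if __name__ == "__main__":
    message = list(range(100, 132))
    decoded, used = decode(len(message), fountain(message, seed=1))
    print(decoded == message, f"{used} symbols used for k = {len(message)}")
```

The receiver never asks for a specific symbol: it keeps draining the stream until the cup is full, which is the fountain behavior the slides describe.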

29
LT Codes
  • Conclusion: making the graph implicit gives us
    an almost ideal digital fountain.
  • One remaining issue: why does average degree
    need to be around $\ln k$?
  • Standard coupon collector's problem: for each
    message symbol to be hit by some equation, need
    $k \ln k$ variables in the equations.
  • Can remove this problem by pre-coding.
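
The coupon collector argument can be made explicit. The following is a rough sketch, assuming each variable in an equation picks one of the k message symbols uniformly at random; the symbol m (total number of variables across all received equations) is introduced here for illustration.

```latex
\Pr[\text{a given message symbol is never hit}]
  = \left(1 - \frac{1}{k}\right)^{m} \approx e^{-m/k},
\qquad
\mathbb{E}[\text{number of symbols never hit}] \approx k\, e^{-m/k}.
```

Pushing that expectation below 1 requires m of about k ln k. Spread over roughly k received symbols, that is an average degree of about ln k.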

30
Rateless/Raptor Codes
  • Pre-coding independently described by
    Shokrollahi, Maymounkov.
  • Rough idea:
  • Expand original k message symbols to $k(1+\epsilon)$
    symbols using (for example) a Tornado code.
  • Now use an LT code on the expanded message.
  • Don't need to recover all of the expanded message
    symbols, just enough to recover original message.

31
Raptor/Rateless Codes
  • Properties:
  • Infinite supply of packets possible.
  • Need $k(1+\epsilon)$ symbols to decode, for some $\epsilon > 0$.
  • Decoding time proportional to $k \ln(1/\epsilon)$.
  • On average, $\ln(1/\epsilon)$ (constant) time to produce
    an encoding symbol.
  • Very efficient.

Raptor codes give, in practice, a digital fountain.
32
Impact on Coding
  • These codes are examples of low-density
    parity-check (LDPC) codes.
  • Subsequent work designed LDPC codes for
    error-correction using these techniques.
  • Recent developments: LDPC codes approaching
    Shannon capacity for most basic channels.

33
Putting Digital Fountains To Use
  • Digital fountains are out there.
  • Digital Fountain, Inc. sells them.
  • Limitations to their use
  • Patent issues.
  • Perceived complexity.
  • Lack of reference implementation.
  • What is the killer app?

34
Patent Issues
  • Several patents / patents pending on irregular
    LDPC codes, LT codes, Raptor codes by Digital
    Fountain, Inc.
  • Supposition: this stifles external innovation.
  • Potential threat of being sued.
  • Potential lack of commercial outlet for research.
  • Suggestion: unpatented alternatives that lead to
    good approximations of a digital fountain would
    be useful.
  • There is work going on in this area, but more is
    needed to keep up with recent developments in
    rateless codes.

35
Perceived Complexity
  • Digital fountains are now not that hard...
  • ...but networking people do not want to deal with
    developing codes.
  • A research need:
  • A publicly available, easy to use, reasonably
    good black box digital fountain implementation
    that can be plugged in to research prototypes.
  • Issue: patents.
  • Legal risk suggests such a black box would need
    to be based on unpatented codes.

36
What's the Killer App?
  • Multicast was supposed to be the killer app.
  • But IP multicast was/is a disaster.
  • Distribution now handled by content distribution
    companies, e.g. Akamai.
  • Possibilities:
  • Overlay multicast.
  • Big wireless: e.g. automobiles, satellites.
  • Others???

37
Conclusions, Part I
Stop thinking of data as an ordered stream of bytes.
Think of data as a digital fountain.
Digital fountains are implementable in practice with erasure codes.
38
A Short Breather
  • We've covered digital fountains.
  • Next up
  • Digital fountains for overlay networks.
  • And other tricks!
  • Pause for questions, 30 second stretch.

39
Overlays for Content Delivery
  • A substitute for IP multicast.
  • Build distribution topology out of unicast
    connections (tunnels).
  • Requires active participation of end-systems.
  • Native IP multicast unnecessary.
  • Saves considerable bandwidth over N unicast
    solution.
  • Basic paradigm easy to build and deploy.

40
Limitations of Existing Schemes
  • Tree-like topologies.
  • Rooted in history (IP Multicast).
  • Limitations
  • bandwidth decreases monotonically from the
    source.
  • losses increase monotonically along a path.
  • Does this matter in practice?
  • Anecdotal and experimental evidence says yes:
  • Downloads from multiple mirror sites in parallel.
  • Availability of better routes.
  • Peer-to-peer: Morpheus, Kazaa and Grokster.

41
An Illustrative Example
[Figure 1: A basic tree topology.]
42
Our Philosophy
  • Go beyond trees.
  • Use additional links and bandwidth by
  • downloading from multiple peers in parallel
  • taking advantage of perpendicular bandwidth
  • Has potential to significantly speed up
    downloads
  • But only effective if
  • collaboration is carefully orchestrated
  • methods are amenable to frequent adaptation of
    the overlay topology

43
Suitable Applications
  • Prerequisite conditions
  • Available bandwidth between peers.
  • Differences in content received by peers.
  • Rich overlay topology.
  • Applications
  • Downloads of large, popular files.
  • Video-on-demand or nearly real-time streams.
  • Shared virtual environments.

44
Use Digital Fountains!
  • Intrinsic resilience to packet loss, reordering.
  • Better support for transient connections via
    stateless migration, suspension.
  • Peers with full content can always generate
    useful symbols.
  • Peers with partial content are more likely to
    have content to share.
  • But using a digital fountain comes at a price
  • Content is no longer an ordered stream.
  • Therefore, collaboration is more difficult.

45
Informed Content Delivery: Definitions and
Problem Statement
  • Peers A and B have working sets of symbols $S_A$, $S_B$
    drawn from a large universe U and want to
    collaborate effectively.
  • Key components:
  • Summarize: Furnish a concise and useful sample
    of a working set to a peer.
  • Approximately Reconcile: Compute as many
    elements in $S_A - S_B$ as possible and transmit
    them.
  • Do so with minimal control messaging overhead.

46
Summarization
  • Goal: each peer has a 1-packet calling card.
  • Can be used to estimate the overlap between $S_A$ and $S_B$.
  • One possibility: random sampling.
  • B sends A a random sample of k elements of $S_B$.
  • Each element is in $S_A$ with probability $|S_A \cap S_B| / |S_B|$.
  • Negative: must search $S_A$.
  • Negative: hard to work with multiple summaries.
  • Alternative: min-wise independent sampling.

47
Min-Wise Summaries
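  • Let U be the set of 64-bit numbers, and $\pi$ be a
    random permutation on U. Then
    $\Pr[\min \pi(S_A) = \min \pi(S_B)] = |S_A \cap S_B| / |S_A \cup S_B|$.
  • Calling card for A: keep vector of k values
    $\min \pi_j(S_A)$, $j = 1, \dots, k$.
  • To estimate $|S_A \cap S_B| / |S_A \cup S_B|$, count the
    j for which $\min \pi_j(S_A) = \min \pi_j(S_B)$, divide by k.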
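
The calling-card construction can be sketched as follows. This is a minimal illustration, assuming a salted hash as a stand-in for each random permutation (an approximation, since a hash is not a true permutation of U); the names `pi`, `calling_card`, and `estimate_resemblance` are hypothetical.

```python
import hashlib


def pi(j, x):
    """Stand-in for the j-th random permutation of 64-bit values.

    A salted BLAKE2b hash is an assumption made for illustration; it only
    approximates a random permutation.
    """
    digest = hashlib.blake2b(
        x.to_bytes(8, "big"), digest_size=8, salt=j.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def calling_card(working_set, k):
    """Vector of k minima, one per (simulated) permutation."""
    return [min(pi(j, x) for x in working_set) for j in range(k)]


def estimate_resemblance(card_a, card_b):
    """Fraction of positions where the two minima agree."""
    matches = sum(1 for a, b in zip(card_a, card_b) if a == b)
    return matches / len(card_a)


if __name__ == "__main__":
    s_a = set(range(0, 1000))
    s_b = set(range(500, 1500))
    k = 400
    est = estimate_resemblance(calling_card(s_a, k), calling_card(s_b, k))
    print(f"estimated resemblance {est:.3f}, exact {500 / 1500:.3f}")
```

Each peer only ships k numbers regardless of the size of its working set, and the estimate needs no search through the other peer's set.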

48
Min-Wise Summaries: Example
49
Recoding: An Intermediate Solution
  • Problem: What to transmit when peers have
    similar content?
  • Solution: Allow peers to probabilistically
    "hedge their bets", minimizing chance of
    transmission of useless content.
  • Example:
  • Suppose the resemblance between $S_A$ and $S_B$ is
    0.9. If A sends a symbol at random, the
    probability of it being useful to B is 0.1.
  • A better strategy is to XOR 10 random symbols
    together.
  • B can extract one useful symbol with
    probability $10 \times \frac{1}{10} \times \left(\frac{9}{10}\right)^{9} > 1/e \approx 0.37$.
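
The arithmetic in this example generalizes to any number of XORed symbols. A small sketch, assuming each of the d symbols is independently useful to B with the same probability (the function name `p_exactly_one_useful` is hypothetical):

```python
import math


def p_exactly_one_useful(d, p_useful):
    """Probability that exactly one of d XORed symbols is new to the receiver,
    assuming each symbol is independently useful with probability p_useful."""
    return d * p_useful * (1 - p_useful) ** (d - 1)


if __name__ == "__main__":
    for d in (1, 5, 10, 20):
        print(d, round(p_exactly_one_useful(d, 0.1), 4))
    print("1/e =", round(1 / math.e, 4))
```

B can recover the one new symbol because it already holds the other d − 1 and can XOR them out of the recoded packet.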

50
Approximate Reconciliation
  • Suppose summarization suggests collaboration is
    worthwhile.
  • Goal: compute as many elements in $S_A - S_B$ as
    possible, with low communication.
  • Idea: we do not need all of $S_A - S_B$, just as
    much as possible.
  • Use Bloom filters.

51
Lookup Problem
  • Given a set $S_A = \{x_1, x_2, x_3, \dots, x_n\}$ on a universe U,
    want to answer queries of the form: is $y \in S_A$?
  • Bloom filter provides an answer in:
  • Constant time (time to hash).
  • Small amount of space.
  • But with some probability of being wrong.

52
Bloom Filters
Start with an m-bit array, filled with 0s.
Hash each item $x_j$ in S k times. If $H_i(x_j) = a$, set $B[a] = 1$.
To check if y is in S, check B at $H_i(y)$. All k values must be 1.
Possible to have a false positive: all k values are 1, but y is not in S.
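
The three steps above map directly onto a small class. This is a minimal sketch, assuming salted BLAKE2b as the k hash functions (the slide does not specify the hashes); the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k hash functions (salted BLAKE2b, an assumed choice)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity over compactness

    def _positions(self, item):
        """Yield the k array positions H_i(item)."""
        data = repr(item).encode()
        for i in range(self.k):
            h = hashlib.blake2b(data, digest_size=8, salt=i.to_bytes(8, "big"))
            yield int.from_bytes(h.digest(), "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item):
        # All k positions must be 1; a single 0 proves the item was never added.
        return all(self.bits[a] for a in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=8000, k=6)
    for x in range(1000):
        bf.add(x)
    false_pos = sum(1 for y in range(1000, 11000) if y in bf)
    print("all members found:", all(x in bf for x in range(1000)))
    print("false positive rate:", false_pos / 10000)
```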
53
Errors
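  • Assumption: We have good hash functions, look
    random.
  • Given m bits for filter and n elements, choose
    number k of hash functions to minimize false
    positives:
  • Let $p = \Pr[\text{cell is empty}] = (1 - 1/m)^{kn} \approx e^{-kn/m}$.
  • Let $f = \Pr[\text{false positive}] = (1 - p)^k \approx (1 - e^{-kn/m})^k$.
  • As k increases, more chances to find a 0, but
    more 1s in the array.
  • Find optimal at $k = (\ln 2)\, m/n$ by calculus.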

54
Example
$m/n = 8$
Optimal $k = 8 \ln 2 = 5.545...$
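
Since k must be an integer in practice, it helps to tabulate the false positive rate around the optimum. A short sketch, assuming the standard approximation f = (1 − e^(−kn/m))^k; the function names are hypothetical.

```python
import math


def false_positive_rate(bits_per_item, k):
    """Approximate rate f = (1 - exp(-k n / m))**k, with bits_per_item = m / n."""
    return (1 - math.exp(-k / bits_per_item)) ** k


def best_integer_k(bits_per_item, k_max=20):
    """Integer k in 1..k_max with the lowest approximate false positive rate."""
    return min(
        range(1, k_max + 1),
        key=lambda k: false_positive_rate(bits_per_item, k),
    )


if __name__ == "__main__":
    ratio = 8
    print("real-valued optimum k =", round(ratio * math.log(2), 3))
    for k in range(3, 9):
        print(k, round(false_positive_rate(ratio, k), 4))
    print("best integer k =", best_integer_k(ratio))
```

For m/n = 8 the curve is flat near the optimum: k = 5 and k = 6 both give a rate of about 2%, with k = 6 slightly lower.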
55
Bloom Filters for Reconciliation
  • B transmits a Bloom filter of its set to A; A
    then sends packets from the set difference.
  • All elements sent will be in the difference: no
    false negatives.
  • Not all elements in the difference are found:
    false positives.
  • Improvements:
  • Compressed Bloom filters
  • Approximate Reconciliation Trees
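
The exchange between the two peers can be sketched end to end. This is an illustration only, assuming salted BLAKE2b hashes and a Python set of 1-positions standing in for the transmitted bit array; the names `positions`, `build_filter`, and `symbols_to_send` are hypothetical.

```python
import hashlib


def positions(item, m, k):
    """k hash positions in an m-bit filter (salted BLAKE2b, an assumed choice)."""
    data = repr(item).encode()
    return [
        int.from_bytes(
            hashlib.blake2b(data, digest_size=8, salt=i.to_bytes(8, "big")).digest(),
            "big",
        )
        % m
        for i in range(k)
    ]


def build_filter(items, m, k):
    """B's side: the set of bit positions that are 1 (stands in for the bit array)."""
    ones = set()
    for item in items:
        ones.update(positions(item, m, k))
    return ones


def symbols_to_send(s_a, filter_b, m, k):
    """A's side: elements of S_A that the filter says are definitely not in S_B."""
    return {x for x in s_a if not all(a in filter_b for a in positions(x, m, k))}


if __name__ == "__main__":
    s_a = set(range(0, 2000))
    s_b = set(range(1000, 3000))
    m, k = 8 * len(s_b), 6
    sent = symbols_to_send(s_a, build_filter(s_b, m, k), m, k)
    print("everything sent is useful:", sent <= (s_a - s_b))
    print(f"found {len(sent)} of {len(s_a - s_b)} elements in the difference")
```

False positives only cost A some missed opportunities: a symbol wrongly judged to be in B's set is withheld, but nothing useless is ever sent.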

56
Experimental Scenarios
  • Three methods for collaboration:
  • Uninformed: A transmits symbols at random to B.
  • Speculative: B transmits a min-wise summary to
    A; A then sends recoded symbols to B.
  • Reconciled: B transmits a Bloom filter of its
    set to A; A then sends packets from the set
    difference.
  • Overhead:
  • Decoding overhead: with erasure codes, fixed
    2.5%.
  • Reception overhead: useless duplicate packets.
  • Recoding overhead: useless recoding packets.

57
Pairwise Reconciliation
128 MB file; 96K input symbols; 115K distinct symbols in system initially.
58
Four peers in parallel
128 MB file; 96K input symbols; 105K distinct symbols in system initially.
59
Four peers, periodic updates
128 MB file; 96K input symbols; 105K distinct symbols in system initially. Filters updated at every 10%.
60
Subsequent Work
  • Maymounkov: each source sends a stream of
    consecutive encoded packets.
  • Possibly simplifies collaboration, with loss of
    flexibility.
  • Bullet (SOSP '03)
  • An implementation with our ideas, plus purposeful
    distribution of different content.
  • Network coding
  • Nodes inside the network can compute on the
    input, rather than just the endpoints.
  • Potentially more powerful paradigm
  • Practice?

61
Conclusions
  • Even with ultimate routing topology optimization,
    the choice of what to send is paramount to
    content delivery.
  • Digital fountain model ideal for fluid and
    ephemeral network environments.
  • Collaborations with coded content worthwhile.
  • Richly connected topologies are key to harnessing
    perpendicular bandwidth.
  • Wanted: more algorithms for intelligent
    collaboration.

62
Why regular graphs are bad
Right degree 2d implies Pr[right degree = 1]
d
Left node has on average
neighbors of degree one.
63
Irregular Graphs
64
Degree Sequence Functions
  • Left Side
  • $\lambda_i$: fraction of edges of degree i on the left in
    the original graph.
  • Right Side
  • $\rho_i$: fraction of edges of degree i on the right
    in the original graph.

65
Irregular Graph Analysis
[Figure: levels labeled Left, Right, Left]
66
Irregular Graph Condition
Want $y < x$ for all $0 < x < \alpha$
67
Good Left Degree Sequence: Truncated Heavy Tail
D 9, N
Fraction of nodes of degree i is
Average node degree is
68
Good Right Degree Sequence: Poisson
Average node degree is
69
Good Degree Sequence Functions
Want $y < x$ for all $0 < x < \alpha$
Works for
70
Tornado Code Performance
[Figure: reception efficiency against time overhead (average left degree)]
71
Why irregular graphs are good
Average right degree $2\ln(D)$ implies $\Pr[\text{right degree} = 1] \approx 1/(D+1)$
$D+1$
Left node of max degree has on average one
neighbor of degree one.
72
Digital Fountains: A Survey and Look Forward
  • Michael Mitzenmacher

73
Goals of the Talk
  • Explain the digital fountain paradigm for network
    communication.
  • Examine related advances in coding.
  • Summarize work on applications.
  • Speculate on what comes next.

81
Raptor/Rateless Codes
  • Properties:
  • Infinite supply of packets possible.
  • Need $k(1+\epsilon)$ symbols to decode, for some $\epsilon > 0$.
  • Decoding time proportional to $k \ln(1/\epsilon)$.
  • On average, $\ln(1/\epsilon)$ (constant) time to produce
    an encoding symbol.
  • Conclusion: these codes can be made very
    efficient and deliver on the promise of the
    digital fountain paradigm.

82
Applications
  • Long-distance transmission (avoiding TCP)
  • Reliable multicast
  • Parallel downloads
  • One-to-many TCP
  • Content distribution on overlay networks
  • Streaming video

84
Reliable Multicast
  • Many potential problems when multicasting to
    large audience.
  • Feedback explosion of lost packets.
  • Start time heterogeneity.
  • Loss/bandwidth heterogeneity.
  • A digital fountain solves these problems.
  • Each user gets what they can, and stops when they
    have enough.

85
Downloading in Parallel
  • Can collect data from multiple digital fountains
    for the same source seamlessly.
  • Since each fountain has an "infinite" collection
    of packets, no duplicates.
  • Relative fountain speeds unimportant: just need
    to get enough.
  • Combined multicast/multigather possible.

87
Distribution on Overlay Networks
  • Encoded data make sense for overlay networks.
  • Changing, heterogeneous network conditions.
  • Allows multicast.
  • Allows downloading from multiple sources, as well
    as peers.
  • Problem: peers may be getting same encoded
    packets as you, via the multicast.
  • Not standard digital fountain paradigm.
  • Requires reconciliation techniques to find peers
    with useful packets.

88
Video Streaming
  • For near-real-time video:
  • Latency issue.
  • Solution: break into smaller blocks, and encode
    over these blocks.
  • Equal-size blocks.
  • Blocks increase in size geometrically, for only
    logarithmically many blocks.
  • Engineering to get right latency, ensure blocks
    arrive on time for display.