Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker - PowerPoint PPT Presentation

1
A Scalable, Content-Addressable Network
  • Sylvia Ratnasamy (1,2), Paul Francis (3), Mark Handley (1),
    Richard Karp (1,2), Scott Shenker (1)

(1) ACIRI
(2) U.C.Berkeley
(3) Tahoe Networks
2
Outline
  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses

3
Internet-scale hash tables
  • Hash tables
  • essential building block in software systems
  • Internet-scale distributed hash tables
  • equally valuable to large-scale distributed
    systems?
  • peer-to-peer systems
  • Napster, Gnutella, Groove, FreeNet, MojoNation
  • large-scale storage management systems
  • Publius, OceanStore, PAST, Farsite, CFS ...
  • mirroring on the Web

4
Content-Addressable Network(CAN)
  • CAN: Internet-scale hash table
  • Interface
  • insert(key,value)
  • value = retrieve(key)
  • Properties
  • scalable
  • operationally simple
  • good performance
  • Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...
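
A minimal sketch of the two-call interface listed on this slide, assuming a single in-process dictionary stands in for the distributed table. The class name `LocalHashTable` and the sample key and value are hypothetical, not part of CAN.

```python
from typing import Dict, Optional


class LocalHashTable:
    """Single-process stand-in exposing the two operations (illustrative only)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        # Store (or overwrite) the value under the key.
        self._store[key] = value

    def retrieve(self, key: str) -> Optional[str]:
        # Return the stored value, or None if the key is unknown.
        return self._store.get(key)


table = LocalHashTable()
table.insert("song.mp3", "10.0.0.7")
```

In a CAN the same two calls would be served by whichever node owns the point the key hashes to, rather than by one local dictionary.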

5
Problem Scope
  • Design a system that provides the interface
  • scalability
  • robustness
  • performance
  • security
  • Application-specific, higher level primitives
  • keyword searching
  • mutable content
  • anonymity

6
Outline
  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work

7
CAN basic idea
8
CAN basic idea
insert(K1,V1)
9
CAN basic idea
insert(K1,V1)
10
CAN basic idea
(K1,V1)
11
CAN basic idea
retrieve (K1)
12
CAN solution
  • virtual Cartesian coordinate space
  • entire space is partitioned amongst all the nodes
  • every node owns a zone in the overall space
  • abstraction
  • can store data at points in the space
  • can route from one point to another
  • point = node that owns the enclosing zone
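
The "point = owning node" abstraction can be sketched as follows, assuming a 2-d unit square and axis-aligned rectangular zones. The `Zone` type, `owner_of`, and the three-node partition are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Zone:
    owner: str
    lo: Point  # inclusive lower corner
    hi: Point  # exclusive upper corner

    def contains(self, p: Point) -> bool:
        return all(l <= c < h for l, c, h in zip(self.lo, p, self.hi))


def owner_of(zones: List[Zone], p: Point) -> str:
    # The zones partition the space, so exactly one zone contains p.
    for z in zones:
        if z.contains(p):
            return z.owner
    raise ValueError("point lies outside the coordinate space")


# Hypothetical partition of the unit square among three nodes.
zones = [
    Zone("A", (0.0, 0.0), (0.5, 1.0)),
    Zone("B", (0.5, 0.0), (1.0, 0.5)),
    Zone("C", (0.5, 0.5), (1.0, 1.0)),
]
```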

13
CAN simple example
node I::insert(K,V)
I
14
CAN simple example
node I::insert(K,V)
I
(1) a = hx(K)
x = a
15
CAN simple example
node I::insert(K,V)
I
(1) a = hx(K), b = hy(K)
y = b
x = a
16
CAN simple example
node I::insert(K,V)
I
(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)
17
CAN simple example
node I::insert(K,V)
I
(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)
(3) (a,b) stores (K,V)
18
CAN simple example
node J::retrieve(K)
J
(1) a = hx(K), b = hy(K)
(2) route "retrieve(K)" to (a,b)
19
CAN
  • Data stored in the CAN is addressed by name
    (i.e. key), not location (i.e. IP address)
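
A sketch of how a key could be mapped to a point, assuming two salted SHA-256 hashes play the role of hx and hy. The slides do not name the hash functions, so `_unit_hash`, the salts, and the 48-bit truncation are illustrative choices.

```python
import hashlib
from typing import Tuple


def _unit_hash(key: str, salt: str) -> float:
    # Map (salt, key) to a float in [0, 1) using the first 6 bytes of SHA-256.
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def key_to_point(key: str) -> Tuple[float, float]:
    # a = hx(K), b = hy(K): two differently salted hashes give the coordinates.
    return _unit_hash(key, "x"), _unit_hash(key, "y")


a, b = key_to_point("K")
```

Because the mapping is deterministic, any node can compute the same (a,b) for a key without knowing which node stores it.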

20
CAN routing table
21
CAN routing
(a,b)
(x,y)
22
CAN routing
  • A node only maintains state for its immediate
    neighboring nodes
  • Compared to geographical routing: CAN routing
    is greedy forwarding in Cartesian space instead
    of physical space.
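
Greedy forwarding with neighbor-only state can be sketched as below, assuming equal-sized zones laid out as a `side` x `side` grid that wraps around at the edges. The grid layout and the function names are simplifying assumptions for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def torus_dist(a: Cell, b: Cell, side: int) -> int:
    # Per-axis distance with wrap-around, summed over both axes.
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))


def neighbors(c: Cell, side: int) -> List[Cell]:
    # Each zone abuts four others in 2-d (2d neighbors in general).
    x, y = c
    return [((x + 1) % side, y), ((x - 1) % side, y),
            (x, (y + 1) % side), (x, (y - 1) % side)]


def greedy_route(src: Cell, dst: Cell, side: int) -> List[Cell]:
    path = [src]
    while path[-1] != dst:
        # Forward to the neighbor that is closest to the destination.
        path.append(min(neighbors(path[-1], side),
                        key=lambda n: torus_dist(n, dst, side)))
    return path


path = greedy_route((0, 0), (3, 1), side=4)
```

Each hop consults only the current zone's own neighbor list, which is the point of the slide: no node needs a global view.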

23
CAN node insertion
Bootstrap node
new node
1) Discover some node I already in CAN
24
CAN node insertion
I
new node
1) discover some node I already in CAN
25
CAN node insertion
(p,q)
2) pick random point in space
I
new node
26
CAN node insertion
(p,q)
J
I
new node
3) I routes to (p,q), discovers node J
27
CAN node insertion
new
J
4) split J's zone in half; new node owns one half
28
CAN node insertion
  • Inserting a new node affects only a single other
    node and its immediate neighbors
  • Problem
  • Inefficient if the new node and its neighbor (J)
    are far from each other in the underlying
    network.
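
Step 4 of the join (split J's zone in half) can be sketched as follows, assuming rectangular 2-d zones. Splitting along the longer side is an illustrative rule chosen here; the slides do not say which axis is used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Zone:
    lo: Tuple[float, float]
    hi: Tuple[float, float]


def split_zone(zone: Zone) -> Tuple[Zone, Zone]:
    # Halve along the longer side (x on ties); an illustrative choice of axis.
    widths = [h - l for l, h in zip(zone.lo, zone.hi)]
    axis = 0 if widths[0] >= widths[1] else 1
    mid = (zone.lo[axis] + zone.hi[axis]) / 2
    hi_kept = list(zone.hi)
    lo_new = list(zone.lo)
    hi_kept[axis] = mid
    lo_new[axis] = mid
    return Zone(zone.lo, tuple(hi_kept)), Zone(tuple(lo_new), zone.hi)


# J owns the whole unit square; the new node takes the upper half along x.
j_zone, new_zone = split_zone(Zone((0.0, 0.0), (1.0, 1.0)))
```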

29
CAN node failures
  • Need to repair the space
  • recover database (weak point)
  • soft-state updates
  • use replication, rebuild database from replicas
  • repair routing
  • takeover algorithm

30
CAN takeover algorithm
  • Simple failures
  • know your neighbors' neighbors
  • a node periodically broadcasts its zone
    coordinates and a list of its neighbors and their
    zone coordinates.
  • when a node fails, one of its neighbors takes
    over its zone
  • a self-set timer decides which neighbor takes
    over.
  • More complex failure modes
  • simultaneous failure of multiple adjacent nodes
  • scoped flooding to discover neighbors
  • hopefully, a rare event
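
A sketch of the timer-based choice for simple failures. The slide only says the timer is self-set; the rule used here, a timer proportional to the neighbor's own zone volume so that the smallest zone fires first, is an assumption, as are the function name, the `scale` parameter, and the name-based tie-break.

```python
from typing import Dict


def takeover_winner(neighbor_volumes: Dict[str, float], scale: float = 1.0) -> str:
    # Each neighbor starts a timer proportional to its own zone volume;
    # the first timer to expire wins (ties broken by node name here).
    timers = {node: scale * vol for node, vol in neighbor_volumes.items()}
    return min(timers, key=lambda node: (timers[node], node))


# Hypothetical neighbors of a failed node and their zone volumes.
winner = takeover_winner({"A": 0.25, "B": 0.125, "C": 0.5})
```

Under this assumption the neighbor with the least space takes over the zone, which tends to even out zone volumes.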

31
CAN node failures
  • Only the failed node's immediate neighbors are
    required for recovery

32
Design recap
  • Basic CAN
  • completely distributed
  • self-organizing
  • nodes only maintain state for their immediate
    neighbors
  • Comment
  • basic CAN does not work very well, additional
    design features are necessary

33
Design improvements
  • The neighboring relationship in coordinate space
    may be completely different from that in
    underlying IP network.
  • How can coordinate space approximately map to
    physical space?
  • Topologically-sensitive CAN construction
  • distributed binning

34
Distributed Binning
  • Goal
  • bin nodes such that co-located nodes land in same
    bin
  • neighbors in the coordinate space are likely
    close in IP network
  • reduce per-hop latency, prevent overlay network
    routing anomalies
  • Idea
  • well known set of landmark machines
  • each CAN node measures its RTT to each landmark
  • orders the landmarks in order of increasing RTT
  • CAN construction
  • place nodes from the same bin close together on
    the CAN
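
The binning idea can be sketched as below: a node's bin is the order of the landmarks by increasing RTT. The landmark names and RTT values are made up for illustration.

```python
from typing import Dict, Tuple


def landmark_bin(rtt_ms: Dict[str, float]) -> Tuple[str, ...]:
    # The bin is the landmark ordering by increasing RTT (name breaks ties).
    return tuple(sorted(rtt_ms, key=lambda lm: (rtt_ms[lm], lm)))


# Hypothetical RTT measurements (milliseconds) from two nearby nodes and a distant one.
node_a = landmark_bin({"L1": 12.0, "L2": 80.0, "L3": 45.0})
node_b = landmark_bin({"L1": 15.0, "L2": 95.0, "L3": 40.0})
node_c = landmark_bin({"L1": 90.0, "L2": 10.0, "L3": 60.0})
```

Here `node_a` and `node_b` get the same ordering and would be placed close together on the CAN, while `node_c` lands in a different bin.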

35
Distributed Binning
  • 4 Landmarks (placed at 5 hops away from each
    other)
  • naïve partitioning

[Figure: latency stretch vs. number of nodes (256, 1K, 4K), w/o binning and w/ binning; one panel for dimensions = 2, one for dimensions = 4]
36
Design improvements
  • Multi-dimensional coordinate spaces
  • To reduce path length
  • path length is O(d n^(1/d))
  • Hash function more complex?

37
Design improvements
  • Multiple, independent spaces (realities)
  • To forward a message, a node checks its
    neighbors in every reality, not just one, and
    forwards greedily to the closest.
  • Reduce routing path length
  • other benefits
  • Improve data availability (the hash table is
    replicated in each reality).
  • Improve routing fault tolerance.
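
Forwarding across realities can be sketched as follows, assuming each neighbor is represented by the centre of its zone and ignoring wrap-around for brevity. The `NeighborSets` layout, `best_hop`, and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# reality index -> list of (neighbor name, neighbor's zone centre in that reality)
NeighborSets = Dict[int, List[Tuple[str, Point]]]


def best_hop(neighbors: NeighborSets, dest: Point) -> Tuple[int, str]:
    # Scan the neighbors in every reality and keep the one closest to dest.
    candidates = [
        (math.dist(coord, dest), reality, name)
        for reality, entries in neighbors.items()
        for name, coord in entries
    ]
    _, reality, name = min(candidates)
    return reality, name


# Hypothetical neighbor sets for one node in two realities.
hop = best_hop(
    {0: [("A", (0.2, 0.2)), ("B", (0.4, 0.1))],
     1: [("C", (0.8, 0.7)), ("D", (0.6, 0.9))]},
    dest=(0.9, 0.8),
)
```

In this example the best next hop is found in the second reality, which is how extra realities can shorten a path.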

38
Design improvements
  • Better CAN routing metrics.
  • Use RTT instead of Cartesian distance when
    selecting a next hop neighbor.
  • To improve per-hop (CAN hop) latency
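
One way to fold RTT into next-hop selection is sketched below: score each neighbor by progress toward the destination per millisecond of RTT. This ratio is an assumed scoring rule for illustration; the slide only says RTT is used, and the names and sample values are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def rtt_weighted_hop(here: Point, dest: Point,
                     neighbors: Dict[str, Tuple[Point, float]]) -> str:
    # neighbors maps name -> (zone centre, measured RTT in ms).
    def score(name: str) -> float:
        coord, rtt_ms = neighbors[name]
        progress = math.dist(here, dest) - math.dist(coord, dest)
        return progress / rtt_ms

    # Pick the neighbor with the most progress toward dest per millisecond.
    return max(neighbors, key=score)


# Hypothetical neighbors: "slow" makes more progress but has ten times the RTT.
choice = rtt_weighted_hop(
    here=(0.0, 0.0), dest=(1.0, 0.0),
    neighbors={"fast": ((0.2, 0.0), 5.0), "slow": ((0.3, 0.0), 50.0)},
)
```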

39
Design improvements
  • Overloading coordinate zones
  • allow multiple nodes to share the same zone.
  • A node maintains a list of its peers in addition
    to its neighbors.
  • A node selects one neighbor from the peers of
    each neighboring zone
  • The contents of the hash table may be either
    divided or replicated across the nodes in a zone.
  • Reduce path length
  • reduces the number of zones
  • reduce per-hop latency
  • more choice when selecting a neighbor.

40
CAN load balancing
  • Two pieces
  • Dealing with hot-spots
  • popular (key,value) pairs
  • nodes cache recently requested entries
  • overloaded node replicates popular entries at
    neighbors
  • Need to deal with cache consistency and update
    policy problems.
  • Uniform coordinate space partitioning
  • uniformly spread (key,value) entries
  • uniformly spread out routing load

41
Uniform Partitioning
  • Added check
  • at join time, pick a zone
  • check neighboring zones
  • pick the largest zone and split that one
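
The added join-time check can be sketched as below, assuming the landing node already knows its neighbors' zone volumes. The node names, volumes, and the rule that ties go to the landing zone are illustrative.

```python
from typing import Dict, List


def zone_to_split(landing: str, volumes: Dict[str, float],
                  neighbors: Dict[str, List[str]]) -> str:
    # Compare the landing zone with its neighbors and split the largest one.
    candidates = [landing] + neighbors[landing]
    return max(candidates, key=lambda z: volumes[z])


# Hypothetical zone volumes and adjacency around the zone the join request reached.
volumes = {"J": 0.125, "N1": 0.25, "N2": 0.5, "N3": 0.125}
neighbors = {"J": ["N1", "N2", "N3"]}
target = zone_to_split("J", volumes, neighbors)
```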

42
Uniform Partitioning
[Figure: percentage of nodes vs. zone volume (V, 2V, 4V, 8V), w/o check and w/ check; 65,000 nodes, 3 dimensions]
43
CAN Robustness
  • Completely distributed
  • no single point of failure (does not apply to
    the pieces of the database lost when a node fails)
  • Database recovery is not explored (relevant when
    there are multiple copies of the database)
  • Resilience of routing
  • can route around trouble

44
Outline
  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses

45
Evaluation
  • Scalability
  • Low-latency
  • Load balancing
  • Robustness

46
CAN scalability
  • For a uniformly partitioned space with n nodes
    and d dimensions
  • per node, number of neighbors is 2d
  • average routing path is (d n^(1/d))/4 hops
  • simulations show that the above results hold in
    practice
  • Can scale the network without increasing per-node
    state
  • Chord/Plaxton/Tapestry/Buzz: log(n) neighbors
    with log(n) hops
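
The scaling figures on this slide can be tabulated with a short sketch. The function names are hypothetical, and base 2 for the logarithm is an assumption, since the slide gives only the order of growth.

```python
import math


def can_neighbors(d: int) -> int:
    # Two neighbors per dimension.
    return 2 * d


def can_avg_path(n: int, d: int) -> float:
    # Average routing path length (d / 4) * n^(1/d), in hops.
    return (d / 4) * n ** (1 / d)


def log_hops(n: int) -> float:
    # log2(n), the order of growth cited for Chord/Plaxton/Tapestry/Buzz (base assumed).
    return math.log2(n)


n = 2 ** 20
rows = [(d, can_neighbors(d), can_avg_path(n, d)) for d in (2, 4, 10)]
```

For n = 2^20 nodes, d = 2 gives 4 neighbors and about 512 hops, while d = 10 gives 20 neighbors and about 10 hops, against log2(n) = 20 for the logarithmic schemes.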

47
CAN low-latency
  • Problem
  • latency stretch = (CAN routing delay) /
    (IP routing delay)
  • application-level routing may lead to high
    stretch
  • Solution
  • increase dimensions, realities (reduce the path
    length)
  • Heuristics (reduce the per-CAN-hop latency)
  • RTT-weighted routing
  • multiple nodes per zone (peer nodes)
  • deterministically replicate entries

48
CAN low-latency
[Figure: latency stretch vs. number of nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics; dimensions = 2]
49
CAN low-latency
[Figure: latency stretch vs. number of nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics; dimensions = 10]
50
Outline
  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses

51
Strengths
  • More resilient than flooding broadcast networks
  • Efficient at locating information
  • Fault tolerant routing
  • Node & Data High Availability (w/ improvement)
  • Manageable routing table size & network traffic

52
Weaknesses
  • Impossible to perform a fuzzy search
  • Susceptible to malicious activity
  • Must maintain coherence of all the indexed data
    (network overhead, efficient distribution)
  • Still relatively high routing latency
  • Poor performance w/o improvement

53
Compare CAN and Pastry
  • CAN is greedy forwarding in Cartesian coordinate
    space.
  • Pastry is maximum address prefix matching in a
    tree-like routing table.
  • The routing table at each node in Pastry
    maintains more information
  • Routing table maintenance
  • Both are local.

54
Compare CAN and Pastry
  • State maintained at each node
  • CAN: 2d
  • Pastry: O(log N)
  • Overlay network path length
  • CAN: average routing path is (d n^(1/d))/4
  • Pastry: O(log N)
  • Latency stretch
  • Pastry: 50% longer than underlying IP network.
  • CAN: usually much longer, unless CAN is used
    with many of the additional features.