Failure Recovery for Structured P2P Network: Protocol Design and Performance Evaluation

1
Failure Recovery for Structured P2P Network
Protocol Design and Performance Evaluation
  • Huaiyu Liu
  • Joint work with Prof. Simon S. Lam

2
Structured P2P Network
  • The routing scheme: hypercube routing, as used
    by PRR, Pastry, Tapestry, etc.
  • Important issue: design of protocols to construct
    and maintain consistent neighbor tables under
    node dynamics
  • Question: how high a rate of node dynamics can be
    supported by a structured P2P network?

3
Outline
  • The problem
  • Overview of hypercube routing scheme
  • Our approach
  • K-consistent network
  • Basic failure recovery protocol
  • Integrate failure recovery with a join protocol
  • Churn experiments
  • Conclusions

4
Overview of Hypercube Routing Scheme
  • Each node has an ID, represented by d digits of
    base b.
  • E.g., 10323 (d = 5, b = 4)
  • Routing to a destination node is resolved digit
    by digit.

Example: routing from source 21233 to destination 03231
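Digit-by-digit resolution can be sketched in a few lines of Python. This is a minimal sketch. It assumes suffix routing, where the rightmost digit is resolved first, which is consistent with the "required suffix" of neighbor-table entries. The helpers `shared_suffix_len` and `route_plan` are hypothetical names. The sketch only lists which table level, entry, and suffix each hop uses; it does not choose actual nodes.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of rightmost digits that IDs a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def route_plan(source: str, dest: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: for each hop, return (level, digit, suffix).

    level  -- table level consulted at the current node
    digit  -- next destination digit resolved (counting from the right)
    suffix -- suffix the next node must have after this hop
    Assumes each intermediate node matches exactly the suffix resolved so far
    (the worst case, which needs the most hops).
    """
    if len(source) != len(dest):
        raise ValueError("IDs must have the same number of digits d")
    d = len(dest)
    plan = []
    for level in range(shared_suffix_len(source, dest), d):
        digit = dest[d - 1 - level]
        suffix = dest[d - 1 - level:]
        plan.append((level, digit, suffix))
    return plan


for level, digit, suffix in route_plan("21233", "03231"):
    print(f"level {level}, entry {digit}: next node ends in {suffix}")
```

Each hop fixes one more digit, so a route needs at most d hops. The example needs five: the suffixes are 1, 31, 231, 3231, and 03231.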
5
Neighbor Table
  • d levels, b entries at each level
  • Neighbors stored in an entry must have the
    required suffix of the entry

Example: neighbor table of node 21233 (d = 5, b = 4), Levels 0 to 4
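The table layout can be sketched as follows. This is a minimal sketch resting on three assumptions:

- Entry (i, j) of node x requires the suffix formed by digit j followed by x's rightmost i digits.
- Tables are filled from a globally known set of live IDs, whereas a real protocol only has local knowledge.
- Each entry keeps min(K, H) matching nodes.

`required_suffix` and `build_table` are hypothetical names. Digits are drawn from `0-9a-f`, so bases up to 16 work.

```python
# Digits for bases up to 16 (an assumption of this sketch).
DIGITS = "0123456789abcdef"


def required_suffix(node: str, level: int, digit: str) -> str:
    """Hypothetical helper: suffix required of every neighbor stored in entry
    (level, digit) of `node`, i.e. the digit followed by node's rightmost
    `level` digits."""
    return digit + node[len(node) - level:]


def build_table(node: str, nodes: set[str], b: int, k: int) -> dict[tuple[int, str], list[str]]:
    """Hypothetical helper: fill all d x b entries of `node`'s table.

    Each entry gets min(K, H) nodes, H being the number of nodes with the
    required suffix (`node` itself included). Sorting is only a deterministic
    stand-in for whatever choice rule a real protocol uses (e.g. proximity).
    """
    table = {}
    for level in range(len(node)):
        for digit in DIGITS[:b]:
            suffix = required_suffix(node, level, digit)
            candidates = sorted(y for y in nodes if y.endswith(suffix))
            table[(level, digit)] = candidates[:k]
    return table


nodes = {"21233", "03231", "10323", "33233", "00231", "12003"}
table = build_table("21233", nodes, b=4, k=2)
print(table[(0, "1")])   # entry with required suffix "1"
print(table[(2, "2")])   # entry with required suffix "233"
```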
6
Outline
  • The problem
  • Overview of hypercube routing scheme
  • Our approach
  • K-consistent network
  • Basic failure recovery protocol
  • Integrate failure recovery with a join protocol
  • Churn experiments
  • Conclusions

7
K-consistent Network Definition
  • A network is K-consistent iff every table entry
    stores min(K, H) neighbors, where H is the number
    of nodes in the network with the entry's required suffix
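The definition translates directly into a check over a simulated network. This is a minimal sketch with two assumptions. First, tables are given as a mapping from node ID to a dict keyed by (level, digit). Second, H counts every node with the entry's required suffix, the table's owner included. `is_k_consistent` and the table format are hypothetical.

```python
DIGITS = "0123456789abcdef"  # digits for bases up to 16 (sketch assumption)

Tables = dict[str, dict[tuple[int, str], list[str]]]


def is_k_consistent(tables: Tables, b: int, k: int) -> bool:
    """Hypothetical checker for the K-consistency definition.

    `tables` maps every node ID to its neighbor table. An entry passes if it
    holds exactly min(K, H) distinct nodes, all of which exist and carry the
    entry's required suffix; H counts every node with that suffix.
    """
    nodes = set(tables)
    for x, table in tables.items():
        d = len(x)
        for level in range(d):
            for digit in DIGITS[:b]:
                suffix = digit + x[d - level:]
                h = sum(1 for y in nodes if y.endswith(suffix))
                entry = set(table.get((level, digit), []))
                if len(entry) != min(k, h):
                    return False
                if not all(y in nodes and y.endswith(suffix) for y in entry):
                    return False
    return True


# Tiny network with d = 2, b = 2, K = 1.
ids = ["00", "10", "01"]
tables = {
    x: {(lvl, j): [y for y in ids if y.endswith(j + x[2 - lvl:])][:1]
        for lvl in range(2) for j in "01"}
    for x in ids
}
print(is_k_consistent(tables, b=2, k=1))   # True
tables["00"][(0, "1")] = []                # entry loses its only neighbor
print(is_k_consistent(tables, b=2, k=1))   # False
```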

8
K-consistent Network Benefits
  • K-consistency
    • implies consistency, which guarantees a path for
      any source-destination pair
    • provides K disjoint paths for each
      source-destination pair with probability close to 1
    • facilitates failure recovery

9
Basic Failure Recovery
  • Assumptions:
    • a network of n nodes, initially K-consistent
    • f out of n nodes fail (fail-stop)
  • Objective: when all failure recovery processes
    terminate,
    • all recoverable holes are repaired
    • the network is K-consistent again

10
Basic Failure Recovery Protocol
  • A sequence of search steps, based on local
    information (neighbors and reverse neighbors)

Neighbors of node 21233
Reverse neighbors of 21233: the set of nodes
that store 21233 as a neighbor
STEP (a): search among neighbors and
reverse neighbors
11
Basic Failure Recovery Protocol
STEP (b): query the remaining neighbors in the same
entry
12
Basic Failure Recovery Protocol
STEP (c): query the remaining neighbors at the same
level
13
Basic Failure Recovery Protocol
STEP (d): query all remaining neighbors
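Steps (a) to (d) can be simulated in a small sketch. The sketch makes several assumptions, none of which come from the protocol specification:

- Every node's table and the set of live nodes are visible to the simulator.
- A queried node answers with all the neighbors in its own table.
- Each query costs one request and one reply.
- A failed node costs a timeout rather than messages.
- Search stops as soon as the entry again holds K nodes.

`repair_hole` and its return convention are hypothetical.

```python
from typing import Optional

Tables = dict[str, dict[tuple[int, str], list[str]]]


def repair_hole(x: str, level: int, digit: str, tables: Tables,
                alive: set[str], k: int) -> tuple[Optional[str], int]:
    """Hypothetical sketch: refill entry (level, digit) of x's table in steps (a)-(d).

    Returns (step that refilled the entry to K nodes, or None; messages sent).
    """
    suffix = digit + x[len(x) - level:]            # required suffix of the entry
    entry = [y for y in tables[x][(level, digit)] if y in alive]
    queried: set[str] = set()
    messages = 0

    def full() -> bool:
        return len(entry) >= k

    def consider(candidates) -> None:
        # Store live nodes with the required suffix that are not stored yet.
        for y in sorted(set(candidates)):
            if full():
                return
            if y in alive and y.endswith(suffix) and y not in entry:
                entry.append(y)

    def query(targets) -> None:
        # Ask each live, not-yet-queried node; it replies with its neighbors.
        nonlocal messages
        for y in sorted(set(targets) - queried):
            if full():
                return
            if y == x or y not in alive:           # a failed node only times out
                continue
            queried.add(y)
            messages += 2                          # one request + one reply
            consider(z for zs in tables[y].values() for z in zs)

    neighbors = {y for zs in tables[x].values() for y in zs}
    # Simulation shortcut: a real node maintains its reverse-neighbor set itself.
    reverse = {y for y, t in tables.items() if any(x in zs for zs in t.values())}

    steps = [
        ("a", lambda: consider(neighbors | reverse)),      # local, no messages
        ("b", lambda: query(tables[x][(level, digit)])),   # same entry
        ("c", lambda: query(y for (lv, _), zs in tables[x].items()
                            if lv == level for y in zs)),  # same level
        ("d", lambda: query(neighbors)),                   # all neighbors
    ]
    result = None
    for name, run in steps:
        run()
        if full():
            result = name
            break
    tables[x][(level, digit)] = entry
    return result, messages


ids = [format(i, "03b") for i in range(8)]                 # d = 3, b = 2
tables = {x: {(lv, j): sorted(y for y in ids if y.endswith(j + x[3 - lv:]))[:2]
              for lv in range(3) for j in "01"} for x in ids}   # K = 2
alive = set(ids) - {"010", "100"}                          # two fail-stop failures

print(repair_hole("001", 0, "0", tables, alive, k=2))   # ('a', 0)
print(tables["001"][(0, "0")])                          # ['000', '110']
print(repair_hole("000", 2, "1", tables, alive, k=2))   # (None, 6)
```

In the first call, the local search in step (a) finds a replacement for free. In the second call, no live node ends in 100, so the hole is unrecoverable. Steps (b) to (d) still cost 6 messages before the search gives up.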
14
Failure Recovery is Effective
  • 2,080 experiments, K = 1 to 5, n = 1000 to 8000
  • 5 - 50 nodes fail
  • All recoverable holes are repaired in every
    experiment for K ≥ 2

15
Failure Recovery is Efficient
  • The majority of holes are repaired in step (a), at
    no communication cost
  • Almost all holes are repaired by step (c), with at
    most 2Kb messages per hole

Cumulative percentage of holes repaired
Example: 800 out of 4000 nodes fail, b = 16, d = 40
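The 2Kb bound can be read off the table layout. This assumes each query costs one request and one reply. Steps (b) and (c) only query neighbors at the hole's level, and a level has b entries holding at most K neighbors each. The worked value below uses the example's b = 16 and an assumed K = 3, since the example does not state K:

```latex
\text{messages per hole through step (c)} \;\le\; 2 \cdot \underbrace{K \cdot b}_{\text{neighbors at one level}} = 2Kb,
\qquad K = 3,\; b = 16 \;\Rightarrow\; 2 \cdot 3 \cdot 16 = 96
```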
16
Integrated Protocols
  • Integrate failure recovery with our join protocol
    (ICDCS 2003)
  • Distinguish T-nodes from S-nodes
    • S-nodes: nodes that have finished joining
    • T-nodes: nodes that are joining the network
  • Requires extensions to both protocols
  • Give failure recovery actions higher priority to
    prevent circular reasoning
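One way to picture "failure recovery actions first" is a priority-ordered queue of pending actions at each node. This is a minimal sketch, assuming a node processes its pending protocol actions one at a time. The action kinds, priority values, and the `ActionQueue` class are hypothetical and not taken from the protocols themselves.

```python
import heapq
import itertools

# Hypothetical priorities: lower number is served first, so failure
# recovery outranks join-protocol actions.
PRIORITY = {"recovery": 0, "join": 1}


class ActionQueue:
    """Hypothetical per-node queue: served by priority, then FIFO."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a class

    def push(self, kind: str, description: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, description))

    def pop(self) -> tuple[str, str]:
        _, _, kind, description = heapq.heappop(self._heap)
        return kind, description

    def __len__(self) -> int:
        return len(self._heap)


q = ActionQueue()
q.push("join", "reply to a T-node's join request")
q.push("recovery", "repair hole at entry (2, 1)")
q.push("join", "notify new neighbor")
while q:
    print(q.pop())   # the recovery action comes out first
```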

17
Results for Concurrent Joins and Failures
  • 980 experiments
  • Start with a K-consistent network
  • Massive joins and failures occur concurrently
  • For K ≥ 2, K-consistency is maintained at the end
    in every experiment

18
Outline
  • The problem
  • Background
  • Our approach
  • K-consistent network
  • Basic failure recovery
  • Integrate failure recovery with a join protocol
  • Churn experiments
  • Conclusions

19
Churn Experiments
  • How high a rate of node dynamics can be
    sustained?
  • Start with a K-consistent network of 2000 nodes
  • Generate join and failure events for 10,000
    simulation seconds
  • Join rate = failure rate (the churn rate)
  • Take a snapshot every 50 seconds
  • Evaluate connectivity and consistency measures
  • Check convergence to K-consistency at the end
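The churn workload can be sketched as two balanced event streams sampled at fixed snapshot times. This is a minimal sketch, assuming joins and failures arrive as independent Poisson processes; the slides only say the two rates are equal. It tracks the node count, not node identities. `churn_trace` and its parameters are hypothetical.

```python
import random


def churn_trace(n0: int, rate: float, duration: float = 10_000.0,
                snapshot: float = 50.0, seed: int = 1) -> list[tuple[float, int]]:
    """Hypothetical helper: node count at every snapshot under balanced churn.

    Joins and failures are each drawn as a Poisson process with the same
    `rate` (events per second), i.e. join rate = failure rate = churn rate.
    """
    rng = random.Random(seed)
    events = []
    for kind in ("join", "fail"):
        t = rng.expovariate(rate)              # exponential inter-arrival times
        while t < duration:
            events.append((t, kind))
            t += rng.expovariate(rate)
    events.sort()

    count, trace, i = n0, [], 0
    for step in range(1, int(duration // snapshot) + 1):
        now = step * snapshot
        while i < len(events) and events[i][0] <= now:
            if events[i][1] == "join":
                count += 1
            elif count > 0:                    # a failure needs a live node
                count -= 1
            i += 1
        trace.append((now, count))
    return trace


trace = churn_trace(n0=2000, rate=2.0)
print(len(trace), trace[0], trace[-1])        # 200 snapshots, t = 50 ... 10000
```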

20
Observations
  • The sustainable churn rate is upper bounded by the
    network's join capacity
  • Join capacity: the rate at which new nodes can
    join the network successfully
  • The limiting factors:
    • K
    • failure rate
    • timeout value in each failure recovery step

21
Number of Nodes and S-nodes vs. Time
Timeout = 10 sec, K = 3
Timeout = 5 sec, K = 3
22
When Join Capacity is Exceeded
  • Number of T-nodes keeps increasing
  • Unable to converge to K-consistency at the end

K = 3, timeout = 10 sec
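A deliberately crude fluid model shows why the T-node count grows once join capacity is exceeded. In this model, joins arrive at rate λ, and at most μ joining nodes can finish per second, with μ standing in for the join capacity. This is an illustrative assumption, not the model behind the experiments. `t_node_backlog` is a hypothetical name.

```python
def t_node_backlog(join_rate: float, capacity: float, seconds: int) -> list[float]:
    """Hypothetical fluid model: T-nodes (joins started but not finished)
    after each second, when at most `capacity` joins complete per second."""
    backlog, history = 0.0, []
    for _ in range(seconds):
        backlog += join_rate                 # new T-nodes arrive
        backlog -= min(backlog, capacity)    # at most `capacity` become S-nodes
        history.append(backlog)
    return history


print(t_node_backlog(join_rate=3.0, capacity=4.0, seconds=5))   # stays at 0.0
print(t_node_backlog(join_rate=5.0, capacity=4.0, seconds=5))   # grows by 1.0 per second
```

Below capacity the backlog stays empty. Above it, the backlog grows by λ - μ every second without bound, mirroring the qualitative behavior described above.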
23
How to Increase Join Capacity
  • Choose a smaller K or a smaller timeout value

K = 2, timeout = 10 sec
K = 3, timeout = 5 sec
24
Churn Experiment Summary
25
Churn Experiment Summary
n = 2000, K = 3, timeout = 10 sec
26
Max Churn Rate vs. Network Size
  • Max sustainable churn rate increases at least
    linearly with network size
  • Stability improves as the number of S-nodes
    increases
  • A smaller K leads to a higher join capacity

27
Min Avg. Lifetime vs. Network Size
  • The trend suggests that when n > 2000, the minimum
    sustainable average lifetime is
    • below 12.1 min for K = 3
    • below 8.3 min for K = 2
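Average lifetime and churn rate describe the same quantity from two sides. Assume the network is in steady state with about n live nodes and failure rate λ (nodes per second). Little's law then gives an average lifetime of n / λ. That the slides convert between the two this way is our assumption. `avg_lifetime_min` and `churn_rate_for` are hypothetical helpers.

```python
def avg_lifetime_min(n: int, churn_rate: float) -> float:
    """Hypothetical helper: average node lifetime in minutes, L = n / lambda."""
    return n / churn_rate / 60.0


def churn_rate_for(n: int, lifetime_min: float) -> float:
    """Hypothetical helper: failure (= join) rate in nodes/s for a given lifetime."""
    return n / (lifetime_min * 60.0)


# Lifetimes from the slide, evaluated at n = 2000.
for k, lifetime in ((3, 12.1), (2, 8.3)):
    rate = churn_rate_for(2000, lifetime)
    print(f"K = {k}: {lifetime} min at n = 2000 -> {rate:.2f} nodes/s")
```

Under this assumption, 12.1 min and 8.3 min at n = 2000 correspond to about 2.75 and 4.02 joins (and failures) per second.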

28
Conclusions
  • Our protocols are effective, efficient, and
    stable for average node lifetimes as short as 8.3
    min, given n = 2000, K = 2, timeout = 5 sec
  • Each network has a join capacity that
    • upper bounds its join rate
    • decreases when the failure rate increases
    • can be increased by a smaller K or a smaller
      timeout value
  • Recommended values for K
    • for a network with a high churn rate, K = 2 or 3
    • for a network with a low churn rate, K = 3 or
      higher (say, 4 or 5)