Failure Recovery for Structured P2P Network: Protocol Design and Performance Evaluation

1
Failure Recovery for Structured P2P Network
Protocol Design and Performance Evaluation

Huaiyu Liu
Joint work with Prof. Simon S. Lam

2
Structured P2P Network

The routing scheme: hypercube routing, as used
by PRR, Pastry, Tapestry, etc.
Important issue: design of protocols to construct
and maintain consistent neighbor tables under
node dynamics
Question: how high a rate of node dynamics can be
supported by a structured P2P network?

3
Outline

The problem
Overview of hypercube routing scheme
Our approach
K-consistent network
Basic failure recovery protocol
Integrate failure recovery with a join protocol
Churn experiments
Conclusions

4
Overview of Hypercube Routing Scheme

Each node has an ID, represented by d digits of
base b.
E.g., 10323 (d = 5, b = 4)
Routing to a destination node is resolved digit
by digit.

Example: source 21233, destination 03231
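
A minimal sketch of digit-by-digit resolution for the example above. It assumes digits are resolved from the rightmost digit (suffix matching), which fits the "required suffix" wording on the next slide; the function names are illustrative and not from the deck.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of rightmost digits that two IDs have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffixes_to_resolve(src: str, dst: str) -> list:
    """Suffix that the node reached at each hop must match (one more digit per hop)."""
    start = shared_suffix_len(src, dst)
    return [dst[len(dst) - k:] for k in range(start + 1, len(dst) + 1)]


# Example from the slide: source 21233, destination 03231.
print(suffixes_to_resolve("21233", "03231"))
# ['1', '31', '231', '3231', '03231']
```

Each hop fixes one more digit of the destination ID, so a route takes at most d hops.
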
5
Neighbor Table

d levels, b entries at each level
Neighbors stored in an entry must have the
required suffix of the entry

Example: neighbor table of node 21233 (d = 5, b = 4), levels 0 through 4 (figure not reproduced)
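
The table layout can be sketched as follows. The slide only says that each entry has a "required suffix"; the convention used here (entry j at level i requires digit j followed by the owner's rightmost i digits) is an assumption borrowed from PRR-style suffix routing, and the helper names are hypothetical.

```python
def required_suffix(node_id: str, level: int, digit: str) -> str:
    """Suffix required of neighbors stored in entry (level, digit) of node_id's table."""
    if not 0 <= level < len(node_id):
        raise ValueError("level out of range")
    tail = node_id[len(node_id) - level:]  # rightmost `level` digits of the owner
    return digit + tail


def table_layout(node_id: str, base: int) -> list:
    """Required suffixes for all d levels and b entries (assumes base <= 10)."""
    digits = [str(j) for j in range(base)]
    return [[required_suffix(node_id, i, j) for j in digits]
            for i in range(len(node_id))]


for level, row in enumerate(table_layout("21233", 4)):
    print(f"Level {level}: {row}")
# Level 0: ['0', '1', '2', '3']
# Level 1: ['03', '13', '23', '33']
# Level 2: ['033', '133', '233', '333']
# Level 3: ['0233', '1233', '2233', '3233']
# Level 4: ['01233', '11233', '21233', '31233']
```
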
6
Outline

The problem
Overview of hypercube routing scheme
Our approach
K-consistent network
Basic failure recovery protocol
Integrate failure recovery with a join protocol
Churn experiments
Conclusions

7
K-consistent Network Definition

A network is K-consistent iff every table entry
stores min(K, H) neighbors, where H is the number
of nodes with the required suffix of the entry
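
The definition can be checked mechanically. The sketch below is illustrative only: it assumes the entry convention "digit followed by the owner's rightmost `level` digits", counts the owner itself among the H qualifying nodes, and uses hypothetical helpers (`build_tables`, `is_k_consistent`) on a tiny binary ID space.

```python
from itertools import product


def required_suffix(node_id, level, digit):
    # Assumed convention: digit followed by the owner's rightmost `level` digits.
    return digit + node_id[len(node_id) - level:]


def build_tables(nodes, base, k):
    """Fill every entry with up to k qualifying nodes (a K-consistent construction)."""
    digits = [str(j) for j in range(base)]
    tables = {}
    for x in nodes:
        table = {}
        for level, digit in product(range(len(x)), digits):
            suffix = required_suffix(x, level, digit)
            qualified = sorted(y for y in nodes if y.endswith(suffix))
            table[(level, digit)] = set(qualified[:k])
        tables[x] = table
    return tables


def is_k_consistent(nodes, tables, base, k):
    """True iff every entry stores exactly min(k, H) nodes with the required suffix."""
    digits = [str(j) for j in range(base)]
    for x in nodes:
        for level, digit in product(range(len(x)), digits):
            suffix = required_suffix(x, level, digit)
            h = sum(1 for y in nodes if y.endswith(suffix))  # H for this entry
            stored = tables[x].get((level, digit), set())
            if any(y not in nodes or not y.endswith(suffix) for y in stored):
                return False
            if len(stored) != min(k, h):
                return False
    return True


nodes = ["000", "011", "101", "110", "111"]       # d = 3, b = 2
tables = build_tables(nodes, base=2, k=2)
print(is_k_consistent(nodes, tables, base=2, k=2))  # True
tables["000"][(0, "1")].discard("011")            # create a hole in one entry
print(is_k_consistent(nodes, tables, base=2, k=2))  # False
```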

8
K-consistent Network Benefits

K-consistency:
implies consistency, which guarantees a path for
any source-destination pair
provides K disjoint paths for each
source-destination pair with probability close to 1
facilitates failure recovery

9
Basic Failure Recovery

Assumptions:
A network of n nodes, initially K-consistent
f out of n nodes fail (fail-stop)
Objective: when all failure recovery processes
terminate,
all recoverable holes are repaired
the network is K-consistent again

10
Basic Failure Recovery Protocol

A sequence of search steps, based on local
information (neighbors and reverse neighbors)

Neighbors of node 21233
Reverse neighbors of 21233: the set of nodes
that store 21233 as a neighbor
Step (a): search among neighbors and
reverse neighbors
11
Basic Failure Recovery Protocol

A sequence of search steps, based on local
information (neighbors and reverse neighbors)

Neighbors of node 21233
Step (b): query remaining neighbors in the same
entry
12
Basic Failure Recovery Protocol

A sequence of search steps, based on local
information (neighbors and reverse neighbors)

Neighbors of node 21233
Step (c): query remaining neighbors at the same
level
13
Basic Failure Recovery Protocol

A sequence of search steps, based on local
information (neighbors and reverse neighbors)

Neighbors of node 21233
Step (d): query all remaining neighbors
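
Steps (a) through (d) can be sketched as one widening search. This is a simplified illustration, not the protocol as specified: it assumes a queried node answers from its own neighbor table, uses the entry convention "digit followed by the owner's rightmost `level` digits", and all function and variable names are hypothetical.

```python
def required_suffix(node_id, level, digit):
    # Assumed convention: digit followed by the owner's rightmost `level` digits.
    return digit + node_id[len(node_id) - level:]


def neighbors_of(node, tables):
    """All nodes stored anywhere in `node`'s neighbor table."""
    return set().union(*tables.get(node, {}).values())


def reverse_neighbors_of(node, tables):
    """All nodes that store `node` in their own table."""
    return {y for y in tables if node in neighbors_of(y, tables)}


def pick(pool, suffix, exclude, alive):
    """First live node in `pool` with the required suffix that is not excluded."""
    for y in sorted(pool):
        if y in alive and y.endswith(suffix) and y not in exclude:
            return y
    return None


def repair_hole(x, entry, tables, alive):
    """Return (replacement, step, nodes_queried) for a hole in x's `entry`."""
    level, digit = entry
    suffix = required_suffix(x, level, digit)
    current = {y for y in tables[x][entry] if y in alive}  # surviving neighbors
    # Step (a): purely local search, no messages are sent.
    local = neighbors_of(x, tables) | reverse_neighbors_of(x, tables)
    found = pick(local, suffix, current, alive)
    if found is not None:
        return found, "a", 0
    same_level = set().union(
        *(nbrs for (lv, _), nbrs in tables[x].items() if lv == level))
    groups = (("b", current),                  # same entry
              ("c", same_level),               # same level
              ("d", neighbors_of(x, tables)))  # everything else
    queried = set()
    for step, group in groups:
        for y in sorted(group - queried - {x}):
            if y not in alive:
                continue
            queried.add(y)
            # Assumption: y replies with a qualifying node from its own table.
            found = pick(neighbors_of(y, tables), suffix, current, alive)
            if found is not None:
                return found, step, len(queried)
    return None, None, len(queried)


tables = {
    "000": {(0, "0"): {"000", "110"}, (0, "1"): {"011", "101"}},
    "011": {(0, "1"): {"011", "111"}},
    "110": {(0, "0"): {"000", "110"}},
    "111": {(0, "1"): {"111"}},
}
alive = {"000", "011", "110", "111"}  # node 101 has failed
print(repair_hole("000", (0, "1"), tables, alive))  # ('111', 'b', 1)
```

In this toy network the hole is repaired in step (b) after querying one node; if 111 had stored 000 as a neighbor, step (a) would have found it with no queries at all.
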
14
Failure Recovery is Effective

2,080 experiments, K from 1 to 5, n from 1,000 to 8,000
5% to 50% of the nodes fail
All recoverable holes are repaired in each
experiment, for K ≥ 2

15
Failure Recovery is Efficient

The majority of holes are repaired in step (a), at no
communication cost
Almost all holes are repaired by step (c), with at most
2Kb messages for repairing a hole

Cumulative percentage of holes repaired
Example: 800 out of 4000 nodes fail, b = 16, d = 40
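
A quick arithmetic check of the 2Kb bound. The reading used here is an assumption: a level holds at most K neighbors in each of its b entries, and each queried neighbor costs one request plus one reply. The value K = 3 is illustrative; b = 16 is taken from the example above.

```python
def max_repair_messages(k: int, b: int) -> int:
    """Upper bound on messages to repair one hole through step (c)."""
    nodes_at_level = k * b     # at most K neighbors in each of b entries
    return 2 * nodes_at_level  # assumed: one query and one reply per node


print(max_repair_messages(k=3, b=16))  # 96
```
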
16
Integrated Protocols

Integrate failure recovery with our join protocol
(ICDCS 2003)
Distinguish T-nodes from S-nodes
S-nodes: nodes that have finished joining
T-nodes: nodes still joining the network
Requires extensions to both protocols
Give failure recovery actions higher priority, to
prevent circular reasoning

17
Results for Concurrent Joins and Failures

980 experiments
Start with a K-consistent network
Massive joins and failures occur concurrently
For K ≥ 2, K-consistency is maintained at the end
in every experiment

18
Outline

The problem
Background
Our approach
K-consistent network
Basic failure recovery
Integrate failure recovery with a join protocol
Churn experiments
Conclusions

19
Churn Experiments

How high a rate of node dynamics can be
sustained?

Start with a K-consistent network of 2000 nodes
Generate join and failure events for 10,000
simulation seconds
Join rate = failure rate (churn rate)
Take a snapshot every 50 seconds
Evaluate connectivity and consistency measures
Check convergence to K-consistency at the end
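
A rough sketch of how such an event schedule could be generated. The slides do not say how arrival times are drawn; exponential inter-arrival times (Poisson processes) are an assumption here, and the function names are hypothetical.

```python
import random


def generate_churn_events(rate, duration=10_000.0, seed=0):
    """Join and failure events at the same rate (events per simulated second).

    Assumption: each stream is a Poisson process; the slides do not specify this.
    """
    rng = random.Random(seed)
    events = []
    for kind in ("join", "fail"):
        t = rng.expovariate(rate)
        while t < duration:
            events.append((t, kind))
            t += rng.expovariate(rate)
    events.sort()  # merge both streams into one timeline
    return events


def snapshot_times(duration=10_000.0, interval=50.0):
    """Times at which connectivity and consistency would be measured."""
    count = int(duration // interval)
    return [interval * i for i in range(1, count + 1)]


events = generate_churn_events(rate=1.0)
joins = sum(1 for _, kind in events if kind == "join")
print(joins, len(events) - joins, len(snapshot_times()))
```

With equal join and failure rates the network size stays near its starting value, which is what lets the experiment isolate the effect of the churn rate.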

20
Observations

Sustainable churn rate is upper bounded by the
network's join capacity
Join capacity: the rate at which new nodes can
join the network successfully
The limiting factors:
K
failure rate
timeout value in each failure recovery step

21
Number of Nodes and S-nodes vs. Time
Timeout = 10 sec, K = 3
Timeout = 5 sec, K = 3
22
When Join Capacity is Exceeded

Number of T-nodes keeps increasing
Unable to converge to K-consistency at the end

K = 3, timeout = 10 sec
23
How to Increase Join Capacity

Choose a smaller K or a smaller timeout value

K = 2, timeout = 10 sec
K = 3, timeout = 5 sec
24
Churn Experiment Summary
25
Churn Experiment Summary
n = 2000, K = 3, timeout = 10 sec
26
Max Churn Rate vs. Network Size

Max sustainable churn rate increases at least
linearly with network size
Stability improves when the number of S-nodes
increases
Smaller K leads to higher join capacity

27
Min Avg. Lifetime vs. Network Size

The trend suggests that when n > 2000, avg. lifetime
< 12.1 min for K = 3, and
< 8.3 min for K = 2
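
To relate these lifetimes back to churn rates, the sketch below assumes the steady-state relation avg. lifetime = n / failure rate (Little's law with join rate equal to failure rate). The slides do not state this formula, so treat it as an assumption; the function name is hypothetical.

```python
def churn_rate_for_lifetime(n: int, avg_lifetime_min: float) -> float:
    """Failure (= join) rate in nodes per second implied by an average lifetime."""
    return n / (avg_lifetime_min * 60.0)


# n = 2000 with the two lifetimes quoted on the slide.
print(round(churn_rate_for_lifetime(2000, 12.1), 2))  # 2.75
print(round(churn_rate_for_lifetime(2000, 8.3), 2))   # 4.02
```

Under this assumption, an 8.3 min average lifetime at n = 2000 corresponds to roughly four joins and four failures per second.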

28
Conclusions

Our protocols are effective, efficient, and
stable, for average node lifetime as short as 8.3
min, given n = 2000, K = 2, timeout = 5 sec
Each network has a join capacity that
upper bounds its join rate
decreases when failure rate increases
can be increased by a smaller K or a smaller
timeout value
Recommended values for K:
for a network with a high churn rate, K = 2 or 3
for a network with a low churn rate, K = 3 or higher
(say, 4 or 5)