Making P2P Networks Scalable - PowerPoint PPT Presentation

About This Presentation

Title:

Making P2P Networks Scalable

Description:

Making P2P Networks Scalable: a paper presentation by Derek Tingle

Slides: 35

Provided by: swar95

Learn more at: https://www.cs.swarthmore.edu

1
Making P2P Networks Scalable

a paper presentation by
Derek Tingle

2
P2P Basics

Files stored on clients' machines
Typically read-only
Search mechanism
Download mechanism
Wildly popular

3
Gnutella

Decentralized
Unstructured
Flood search
Routing Table

5
Gnutella

Decentralized
Unstructured
Flood search
Routing table
DANGER!
Not scalable
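
A minimal sketch of TTL-limited flooding, to make the scaling problem concrete. The graph, the function name, and the message-counting rule are illustrative assumptions, not Gnutella's actual wire protocol.

```python
from collections import deque

def flood_search(graph, start, has_file, ttl):
    """Flood a query from `start`; return (nodes holding the file, messages sent)."""
    seen = {start}                      # nodes that already handled this query
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        if node in has_file:
            hits.append(node)
        if remaining == 0:
            continue                    # TTL exhausted: do not forward further
        for nbr in graph[node]:
            messages += 1               # every forward costs a message, even a duplicate
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, remaining - 1))
    return hits, messages

# Hypothetical 5-node overlay; node 4 holds the file.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
hits, messages = flood_search(graph, start=0, has_file={4}, ttl=3)
```

Even on this tiny overlay a single query costs 11 messages to find one copy, because every node forwards to every neighbor.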

6
Design Goals

Allow a Gnutella-like P2P system to handle a
higher volume of queries
Make it scalable
Utilize heterogeneity of machines

7
Search Protocol

GIA search is based on random walks
Like floods, but fewer messages
But because random walks are blind, there are
scaling issues
So GIA uses biased walks
Toward high degree or high capacity?

8
High degree AND high capacity

Using a dynamic topology adaptation algorithm
Ensures
high capacity nodes have a high degree
low capacity nodes are close to high capacity
nodes
Level of satisfaction S
Measures how close the sum of capacities of a
node's neighbors normalized by degrees is to that
node's own capacity
The lower the satisfaction level, the shorter the
adaptation interval
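
A rough sketch of the satisfaction level and the interval it drives. The slide gives only the qualitative definition, so the cap at 1, the max_nbrs shortcut, the exponential interval formula, and the default values of max_interval and aggressiveness are all assumptions made for illustration.

```python
def satisfaction(capacity, neighbors, max_nbrs):
    """neighbors: list of (capacity, degree) pairs, one per neighbor."""
    if not neighbors:
        return 0.0                       # no neighbors: completely unsatisfied
    if len(neighbors) >= max_nbrs:
        return 1.0                       # assumed: a full neighbor list counts as satisfied
    # Sum of neighbor capacities, each normalized by that neighbor's degree
    total = sum(c / d for c, d in neighbors)
    return min(1.0, total / capacity)

def adaptation_interval(s, max_interval=10.0, aggressiveness=256.0):
    # Assumed form: the interval shrinks exponentially as satisfaction drops
    return max_interval * aggressiveness ** -(1.0 - s)
```

With these assumed defaults, a fully satisfied node (S = 1) waits the full 10 time units between adaptation attempts, while a node with S = 0 retries 256 times as often.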

9
To add a node n...

Accept if num_nbrs is still < max_nbrs
Otherwise, select the node with the highest degree
out of the subset of neighbors with a capacity no
greater than that of n
Only drop that node if it has more neighbors than
n
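
The decision can be sketched as below. The dict layout is hypothetical. The drop condition assumes the candidate is dropped only when it is better connected than n, so that poorly connected neighbors are not cut off. Treat that reading as an assumption.

```python
def handle_connect_request(me, n):
    """Decide whether `me` accepts `n`; returns (accepted, dropped_neighbor).

    `me` has 'max_nbrs' and 'nbrs' (a list of neighbor dicts);
    every node dict has 'capacity' and 'degree'.
    """
    if len(me["nbrs"]) < me["max_nbrs"]:
        return True, None                        # room left: accept outright
    # Candidates for dropping: neighbors whose capacity does not exceed n's
    subset = [x for x in me["nbrs"] if x["capacity"] <= n["capacity"]]
    if not subset:
        return False, None                       # every neighbor outranks n
    victim = max(subset, key=lambda x: x["degree"])
    # Assumed rule: only drop a neighbor that is better connected than n
    if victim["degree"] > n["degree"]:
        return True, victim
    return False, None
```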

10
One-hop replication

Each node records an index of neighbor nodes'
content
Ensures that high capacity nodes can respond to a
greater number of queries
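
One way to picture one-hop replication is the sketch below. The Node class and its method names are hypothetical. Content lists are exchanged once at connect time, and index maintenance on disconnect is left out.

```python
class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbor_index = {}     # filename -> names of neighbors holding it

    def connect(self, other):
        # Both ends record the other's content when the link is formed
        self._index(other)
        other._index(self)

    def _index(self, other):
        for f in other.files:
            self.neighbor_index.setdefault(f, set()).add(other.name)

    def lookup(self, filename):
        """Answer for this node's own content and for content one hop away."""
        owners = set(self.neighbor_index.get(filename, ()))
        if filename in self.files:
            owners.add(self.name)
        return owners
```

A high-degree node ends up indexing many neighbors, which is why it can answer more queries without forwarding them.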

11
Flow control

To avoid overloading a node
Can only direct queries to a neighbor if that
neighbor is ready
Node uses tokens to signify it can handle queries
Node gives out tokens at the rate it can process
queries
If queries are being queued, decrease allocation
rate
Weights the allocation of tokens for capacity
If a node isn't using tokens, they are allocated
to other neighbors
Tokens can be piggybacked on other messages
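
A small sketch of capacity-weighted token allocation. The function names, the proportional split, and the 0.5 backoff factor are assumptions chosen for illustration. The slide does not give the exact scheduling rule.

```python
def allocate_tokens(rate, neighbor_capacity, idle=frozenset()):
    """Split `rate` tokens per interval among neighbors, weighted by capacity.

    neighbor_capacity: dict of neighbor name -> advertised capacity.
    idle: neighbors not using their tokens; their share goes to the others.
    """
    active = {n: c for n, c in neighbor_capacity.items() if n not in idle}
    total = sum(active.values())
    if total == 0:
        return {}
    return {n: rate * c / total for n, c in active.items()}

def adjusted_rate(rate, queue_length, backoff=0.5):
    # Hypothetical backoff: cut the token rate while queries are queued
    return rate * backoff if queue_length > 0 else rate
```

For example, `allocate_tokens(10, {"a": 1, "b": 4})` gives neighbor b four times as many tokens as neighbor a, and marking a as idle hands all ten to b.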

12
Search Protocol (again)

Biased random walks aren't random
Send queries to highest capacity neighbor with
tokens
Time To Live decremented at each node
Book-keeping limits same path traversal
MAX_RESPONSES decremented for each found answer
Append address of owning node to the forwarded
query
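
The bullets above can be tied together in a short sketch. The node layout, the token bookkeeping, and the choice to stop when no neighbor is ready are simplifying assumptions. A real node would queue the query instead of dropping it.

```python
def example_nodes():
    # Hypothetical overlay: 'tokens' holds tokens received from each neighbor
    return {
        "a": {"capacity": 1, "nbrs": ["b", "c"], "files": set(), "tokens": {"b": 1, "c": 1}},
        "b": {"capacity": 10, "nbrs": ["a"], "files": set(), "tokens": {"a": 1}},
        "c": {"capacity": 100, "nbrs": ["a", "d"], "files": {"song"}, "tokens": {"a": 1, "d": 1}},
        "d": {"capacity": 50, "nbrs": ["c"], "files": {"song"}, "tokens": {"c": 1}},
    }

def biased_walk(nodes, start, filename, ttl, max_responses):
    """Walk toward high-capacity neighbors; return the owning nodes found."""
    found, current = [], start
    used = set()                          # (node, neighbor) hops already taken
    while True:
        node = nodes[current]
        if filename in node["files"] and current not in found:
            found.append(current)         # owner's address travels with the query
            max_responses -= 1
        if max_responses <= 0 or ttl <= 0:
            break
        # Neighbors we hold tokens for and have not yet used from this node
        ready = [n for n in node["nbrs"]
                 if node["tokens"].get(n, 0) > 0 and (current, n) not in used]
        if not ready:
            break
        nxt = max(ready, key=lambda n: nodes[n]["capacity"])
        node["tokens"][nxt] -= 1          # forwarding a query spends one token
        used.add((current, nxt))
        current, ttl = nxt, ttl - 1
    return found
```

Starting at node a, the walk goes straight to c (the highest-capacity neighbor) and then to d, skipping the low-capacity node b entirely.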

13
Query Resilience

Can't let a query die with a node
Keep-alive messages
query responses
dummy query responses
Originator can resend query if no keep-alive
messages arrive for a while
When the topology adapts, the previous
connections are maintained for a while

14
Simulations

GIA compared to
Flood
Random Walks over Random Topologies
Supernode mechanisms
queries only flooded between supernodes

15
Assumptions

All nodes produce queries at same rate
Capacity = number of messages processed per unit
time
queues have infinite length
specific keyword searches
min_alloc = min(C/num_nbrs) = 4
For Flood and Super
average diameter is 7
TTL is 10
Look at relative performance, not absolute

16
Metrics

Success rate = fraction of queries issued that
reach the file
Hop-count
Delay = time taken from query's start to finish
Collapse Point (CP) = per-node query rate beyond
which the success rate drops below 90%
Average hop-count before collapse
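
Read as a computation, the collapse point is the last measured query rate before the success rate falls under the threshold. A minimal sketch, with a hypothetical function name and made-up sample numbers:

```python
def collapse_point(samples, threshold=0.90):
    """samples: (query_rate, success_rate) pairs from a simulation sweep.

    Returns the highest rate reached before success first drops below
    `threshold`, or None if even the lowest rate is already below it.
    """
    cp = None
    for rate, success in sorted(samples):
        if success < threshold:
            break
        cp = rate
    return cp

# Invented sweep, for illustration only (not results from the paper)
sweep = [(1, 1.00), (10, 0.99), (100, 0.93), (1000, 0.41)]
cp = collapse_point(sweep)
```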

17
Performance Comparison

Search terminates after finding a single answer
5,000 and 10,000 nodes for each system
0.01%, 0.05%, 0.1%, 0.5%, and 1% replication rates

18
Performance results

RWRT better than Flood at high replication, equal
at low replication
GIA has higher hop counts than Flood and Super
GIA hop counts lower as replication goes up
Flood and Super aren't scalable... duh.

19
Multiple Search Responses

Same tests, MAX_RESPONSES = 1, 10, 20
Flood and Super unchanged
GIA and RWRT decline as M_R increases

21
Node Failure model

Force nodes to fail at a uniformly random time
between 0 and MAXLIFETIME
MAXLIFETIME = 10s, 100s, 1000s, forever
Even at 10s, GIA is 2-4 orders of magnitude
better than RWRT, Super, and Flood when they
aren't failing.
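
The failure model is simple to sketch. The function names are hypothetical, and passing None for "forever" is a convention chosen here.

```python
import random

def assign_failure_times(node_ids, max_lifetime, seed=None):
    """Give each node a failure time drawn uniformly from [0, max_lifetime]."""
    rng = random.Random(seed)
    if max_lifetime is None:              # "forever": nodes never fail
        return {n: float("inf") for n in node_ids}
    return {n: rng.uniform(0, max_lifetime) for n in node_ids}

def alive(failure_times, t):
    """Nodes that have not yet failed at simulation time t."""
    return [n for n, ft in failure_times.items() if ft > t]
```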

22
Types of P2P searching

Centralized (Napster)
Based on user-provided file lists
Decentralized
Queries are distributed to peers
Unstructured (Gnutella)
Structured (Chord)

23
Distributed Hash Tables

Pros
Scalable
Quick lookup
O(log n) steps
O(n) steps for Gnutella
Can find needles

Cons (why not DHTs)
P2P Clients are transient
Require O(log n) repair operations after each
failure
DHTs only support exact match searches
P2Ps look for hay
(Not really a con)

24
Analysis and Comparison of P2P Search Models

Dimitrios Tsoumakos
Nick Roussopoulos

25
Blind

Gnutella (flood)
Modified-BFS
Iterative Deepening
Random Walks

26
Informed

Gnutella2 (Super-peer)
Intelligent-BFS
APS
Local Indices
Routing Indices
Distributed Resource Location Protocol
Gnutella with Shortcuts
GIA

27
Gnutella2

Uses super-peers (hubs)
They act as local servers for their peers
Hubs are connected
Queries the hubs sequentially

28
Intelligent-BFS

Query similarity metric to find similar queries
Forwards to neighbors most likely to answer that
query
Focuses on object discovery rather than message
reduction
Increased number of hits
Does not handle node departures well
Assumes a node specializes in one file type

29
APS

Uses indices to weight random walks
Each index value represents a query for a
specific object directed toward a specific node
Index value is raised or lowered based on outcome
of query
Optimistic and pessimistic update approaches
Originating node sends query to all neighbors,
those neighbors send query to one neighbor
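
A sketch of the per-node index behind APS. The class name, the initial value of 30, and the step of 10 are illustrative assumptions. For simplicity the index is updated only after the outcome is known. The optimistic and pessimistic variants differ in adjusting it up or down already at forwarding time.

```python
import random

class ApsIndex:
    """Per-node table: (object, neighbor) -> weight for forwarding a walker."""

    def __init__(self, neighbors, initial=30, step=10):
        self.neighbors = list(neighbors)
        self.initial, self.step = initial, step
        self.index = {}

    def weight(self, obj, nbr):
        return self.index.get((obj, nbr), self.initial)

    def choose(self, obj, rng=random):
        # Pick one neighbor with probability proportional to its index value
        weights = [self.weight(obj, n) for n in self.neighbors]
        return rng.choices(self.neighbors, weights=weights, k=1)[0]

    def update(self, obj, nbr, success):
        # Raise the index after a hit, lower it after a miss (never below 1)
        delta = self.step if success else -self.step
        self.index[(obj, nbr)] = max(1, self.weight(obj, nbr) + delta)
```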

30
Local Indices

Each node indexes objects stored on nodes within
a radius r and can answer queries for them
A BFS like search is performed
Queries hop a distance of 2r+1 nodes
Accuracy and hits are very high
Decreases actual processing time
Floods the network with messages
Churn is very costly b/c flooding is used to
update the repository for all joins/leaves
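
Building one node's local index amounts to a breadth-first scan out to radius r. The sketch below uses hypothetical names and a static overlay, so it ignores the join/leave updates that make churn costly.

```python
from collections import deque

def build_local_index(graph, files, node, r):
    """Index every object stored within r hops of `node`.

    graph: node -> list of neighbors; files: node -> iterable of object names.
    Returns object name -> set of nodes (within radius r) that store it.
    """
    index, seen, queue = {}, {node}, deque([(node, 0)])
    while queue:
        cur, dist = queue.popleft()
        for f in files.get(cur, ()):
            index.setdefault(f, set()).add(cur)
        if dist < r:                       # expand only inside the radius
            for nbr in graph[cur]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, dist + 1))
    return index
```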

31
Routing Indices

Files are assumed to fall into themes
Each node stores the number of files of each
theme reachable from each outgoing path
Three functions used to determine best outgoing
path
Queries forwarded to best outgoing path
Flooding is used for creation and update, so
serious issues with dynamic networks
Bloom filters...
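
A toy version of a routing index lookup. The slide does not name the three functions used to pick the best path, so the raw per-theme file count below is only a stand-in. The table contents and names are invented.

```python
def rank_paths(routing_index, theme):
    """routing_index: neighbor -> {theme: files reachable through that neighbor}.

    Returns neighbors ordered from most to fewest reachable files on `theme`.
    """
    return sorted(routing_index,
                  key=lambda nbr: routing_index[nbr].get(theme, 0),
                  reverse=True)

# Hypothetical table held by one node
routing_index = {
    "n1": {"databases": 20, "networks": 100},
    "n2": {"databases": 60, "networks": 5},
    "n3": {"networks": 30},
}
best = rank_paths(routing_index, "databases")[0]
```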

32
Distributed Resource Location Protocol

Initially, random flooding is used to find
objects
When an object is discovered, the query
backtracks, storing the location of the found
object on those nodes
If a node knows where a queried object is
located, it can directly contact that node
Depending on specificity of queries, only one
replica of a certain object is ever found
In a dynamic network, much flooding

33
Gnutella with Shortcuts

Uses standard flooding initially
If a peer provides an answer, it is indexed on
the requesting nodes
Nodes forward queries to the ranked shortcuts
first, then flood if necessary
Shortcuts ranked by success rate
Very high success rate
Works well when users make related queries
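
The shortcut-first lookup can be sketched as follows. ShortcutList, search, and the caller-supplied ask and flood callables are hypothetical names. Ranking by success rate is taken from the slide.

```python
class ShortcutList:
    def __init__(self):
        self.stats = {}                    # peer -> [successes, attempts]

    def record(self, peer, success):
        s = self.stats.setdefault(peer, [0, 0])
        s[0] += int(success)
        s[1] += 1

    def ranked(self):
        # Best success rate first
        return sorted(self.stats,
                      key=lambda p: self.stats[p][0] / self.stats[p][1],
                      reverse=True)

def search(shortcuts, ask, flood, filename):
    """Try ranked shortcuts first, then fall back to flooding.

    ask(peer, filename) -> bool and flood(filename) -> peer or None
    are supplied by the caller.
    """
    for peer in shortcuts.ranked():
        ok = ask(peer, filename)
        shortcuts.record(peer, ok)
        if ok:
            return peer
    peer = flood(filename)
    if peer is not None:
        shortcuts.record(peer, True)       # a flood hit becomes a new shortcut
    return peer
```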

34
Results

All algorithms that implement flooding in some
fashion have high success rates
Systems that use shortcuts aren't hurt as badly
by departures as expected, because the more flood
searches are used, the more accurate the
shortcuts become
GIA is middle of the pack
No collapse point test
