1
Hybrid Search Schemes for Unstructured
Peer-to-Peer Networks
Random Walks in Peer-to-Peer Networks

Christos Gkantsidis, Milena Mihail, Amin Saberi
Presented by Paul Bogdan
February 28th, 2007

2
Hybrid Search Schemes for Unstructured
Peer-to-Peer Networks
Christos Gkantsidis, Milena Mihail, Amin Saberi
3
Outline

Random Graph Models
Flooding and Normalization
Random Walks and Replication
Generalized Search Schemes
Experimental evaluation

4
Motivation

Flooding with a small time-to-live (TTL) performs well
in regular graphs
Performance metric: number of exchanged
messages per distinct response
Its performance degrades as the TTL increases and
in irregular networks
Random walks perform better than flooding in terms of
scalability and granularity
Hybrid (generalized) search schemes:
random walks with lookahead, random walks with
1-step replication

5
Contribution

Random walks (RW) with shallow flooding offer
good performance (analytic justification)
R1: In a random graph model with $O(n)$ nodes of
constant degree and
$O(n^{1/2})$ nodes of degree $O(n^{1/2})$, the expected time
to discover $O(n)$ nodes is $O(n^{1/2})$.
R2: Random walks with lookahead 1 or 1-step
replication perform better
when there is a discrepancy in the degrees of the
underlying topology.
Normalized Flooding (NF) solution
R3: NF achieves performance comparable to
flooding in regular graphs.
R4: NF with 1-step replication achieves
performance comparable to RW
with 1-step replication.
R5: Local information about the network (node
degrees) offers a global benefit.
Generalized Search Schemes

6
Random Graph Models

Random Regular Graphs $G_{n,d}$
$G_{n,d}$ represents a graph with $n$ nodes, each
of degree $d$.
$G_{n,d}$ has degree sum $D = nd$.
Random Graphs with supernodes: $G_{n,d,\alpha,\beta}$
Given constants $\alpha$ and $\beta$, $G_{n,d,\alpha,\beta}$ denotes a
graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (i.e., large
vertices) and the remaining nodes of degree $d$
(i.e., small vertices).
$G_{n,d,\alpha,\beta}$ has degree sum $D \approx (\alpha\beta + d)n$.
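The slides do not say how instances of $G_{n,d}$ and $G_{n,d,\alpha,\beta}$ are generated. The sketch below assumes the standard configuration (pairing) model. That model reproduces a prescribed degree sequence exactly and allows occasional self-loops and parallel edges. The function names are hypothetical.

```python
import math
import random


def configuration_model(degrees, rng):
    """Pair degree "stubs" uniformly at random (configuration model).

    Self-loops and parallel edges are kept, so every vertex ends up with
    exactly the degree requested in `degrees`.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even")
    stubs = [v for v, deg in enumerate(degrees) for _ in range(deg)]
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    for a, b in zip(stubs[0::2], stubs[1::2]):
        adj[a].append(b)
        adj[b].append(a)
    return adj


def random_regular(n, d, rng):
    """G_{n,d}: n nodes, each of degree d (requires n*d even)."""
    return configuration_model([d] * n, rng)


def supernode_graph(n, d, alpha, beta, rng):
    """G_{n,d,alpha,beta}: about alpha*sqrt(n) large vertices of degree
    about beta*sqrt(n); every other vertex is a small vertex of degree d."""
    root = math.isqrt(n)
    n_large = int(alpha * root)
    degrees = [int(beta * root)] * n_large + [d] * (n - n_large)
    if sum(degrees) % 2:  # parity fix: one small vertex gets degree d + 1
        degrees[-1] += 1
    return configuration_model(degrees, rng)


rng = random.Random(1)
g = random_regular(1000, 4, rng)
print(sum(len(nbrs) for nbrs in g))  # D = nd = 4000
h = supernode_graph(10_000, 3, 1.0, 2.0, rng)
print(sum(len(nbrs) for nbrs in h))  # 49700, close to (alpha*beta + d)*n = 50000
```

Two details are illustrative choices made here, not taken from the slides: rounding $\alpha n^{1/2}$ and $\beta n^{1/2}$ down to integers, and the parity fix.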

7
Flooding and Normalization

Theorem 3.1. Consider a random regular graph $G_{n,d}$,
a flooding scenario from node $v$ with
time-to-live $t$, and let $S$ be the number of distinct nodes
queried by flooding, with $S \le |V|/2$.
Claims
(1)
(2)
(3)

(1)
Proof

(2)
Proof

11

(3)
Proof

12
Flooding and Normalization

Theorem 3.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with
supernodes and consider a flooding scenario from node $v$ of
degree $d$ with time-to-live $t$.
Claim: For some $t = O(\log\log n)$, the number of
distinct responses is $O(n)$.
Proof
Consider flooding with $t = c\log_{d-1}(\log n) + 1$ and the
vertices visited with TTL $t-1$.
Assumption: this set (of visited nodes) does not
contain a large-degree vertex.
From $d$-regular graphs, we know that this set
contains at least $(d-1)^{t-1}$ edges.
The probability that no vertex in $G(S_{t-1}(v))$ is adjacent to a large vertex is
bounded by $\left(\frac{d}{d+\alpha\beta}\right)^{(d-1)^{t-1}} \le \left(\frac{d}{d+\alpha\beta}\right)^{c\log n}$, so with high probability
a large vertex is seen within the first $O(\log\log n)$ steps.
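The following is a quick numerical check of the bound in the proof. It assumes natural logarithms and the illustrative constants $d = 4$, $\alpha = 1$, $\beta = 2$, $c = 2$; none of these values come from the slides. Since $(d/(d+\alpha\beta))^{c\log n} = n^{-c\ln(1+\alpha\beta/d)}$, the bound decays polynomially in $n$. Meanwhile, the TTL grows only like $\log\log n$. The function names are hypothetical.

```python
import math


def flooding_ttl(n, d, c=1.0):
    """TTL used in the proof: t = c * log_{d-1}(log n) + 1."""
    return c * math.log(math.log(n), d - 1) + 1


def miss_bound(n, d, alpha, beta, c=1.0):
    """Bound (d / (d + alpha*beta)) ** (c * log n) on the probability that the
    flooded region never touches a large vertex."""
    return (d / (d + alpha * beta)) ** (c * math.log(n))


for n in (10**4, 10**6, 10**8):
    # TTL grows very slowly while the failure bound shrinks polynomially
    print(n, round(flooding_ttl(n, 4, c=2.0), 2), miss_bound(n, 4, 1.0, 2.0, c=2.0))
```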

13
Flooding and Normalization

Theorem 3.3. Let $G_{n,d,\alpha,\beta}$ be a random graph
with supernodes, and consider a normalized
flooding scenario from node $v$ with TTL
$t$. Then the number of distinct
responses is $O((d-1)^{t-1})$ and the number of
messages per response is $O(1)$.
Proof
From Theorem 3.1, the number of minigroups seen
is $(d-1)^{t-1}$.
The expected number of small vertices is $Q = \frac{d\,(d-1)^{t-1}}{d+\alpha\beta}$.
Let $X_i$, $i = 1, \dots, N$, be independent random variables with $\Pr[X_i = 1] = p_i$ and
$\Pr[X_i = 0] = 1 - p_i$.
Using a Chernoff bound, the probability
that fewer than $Q/2$ are seen is
vanishingly small.
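The Chernoff bound shown on the original slide is missing from the transcript, and the authors' exact form may differ. For reference, a standard multiplicative lower-tail form suffices for this argument. Here $X = \sum_i X_i$ for independent $X_i$, and $\mu = \mathbb{E}[X] = Q$:

```latex
\Pr\left[X \le (1-\delta)\mu\right] \le \exp\left(-\frac{\delta^{2}\mu}{2}\right), \quad 0 < \delta < 1;
\qquad
\delta = \tfrac{1}{2}: \quad \Pr\left[X \le \tfrac{Q}{2}\right] \le e^{-Q/8},
\quad Q = \frac{d\,(d-1)^{t-1}}{d+\alpha\beta}.
```

Because $Q$ grows geometrically in $t$, this failure probability vanishes as the TTL grows.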

14
Random Walks and Replication

Random Walk with Look-Ahead
a random walk with shallow flooding at each step
of the walk
RW with lookahead 1 visits $O(n)$ nodes with
response time $O(n^{1/2})$
Theorem 4.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with
supernodes and consider a
random walk from a node $v$. Then, in the 1-step
replication scenario, the expected
number of messages and the response time to obtain
distinct
responses is

Theorem 4.3. Let $G_{n,d,\alpha,\beta}$ be a random graph with
supernodes and consider
normalized flooding from $v$ with TTL $t = \frac{\log n}{2\log(d-1)}$. Then, in the 1-step
replication scenario, the number of distinct
responses is at least
and the number of messages is at most
Proof
The number of minigroups seen is $(d-1)^{t+1}$, and
using Chernoff bounds,
there will be
minigroups corresponding to large vertices.
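The sketch below simulates the three random-walk variants discussed here. It assumes a cost model the slides do not state:

- one message per hop;
- one message per neighbour queried by the lookahead;
- no extra messages under 1-step replication, where each node is assumed to index its neighbours' content.

`rw_search` and its `mode` values are hypothetical names.

```python
import random


def rw_search(adj, source, steps, rng, mode="lookahead"):
    """Random-walk search variants; returns (distinct nodes found, messages).

    mode="plain":       only nodes the walker visits are found.
    mode="lookahead":   at each visited node the walker also queries every
                        neighbour (shallow flooding, one message each).
    mode="replication": each node already indexes its neighbours' content
                        (1-step replication), so neighbours are found at no
                        extra message cost.
    """
    found, messages, node = {source}, 0, source
    for step in range(steps + 1):
        if step > 0:  # hop to a uniformly random neighbour
            node = rng.choice(adj[node])
            messages += 1
            found.add(node)
        if mode in ("lookahead", "replication"):
            found.update(adj[node])  # neighbourhood of the current node
            if mode == "lookahead":
                messages += len(adj[node])
        elif mode != "plain":
            raise ValueError(f"unknown mode {mode!r}")
    return len(found - {source}), messages


# Star: centre 0 with leaves 1..5; one hop from the centre.
star = [[1, 2, 3, 4, 5]] + [[0] for _ in range(5)]
print(rw_search(star, 0, 1, random.Random(0), "plain"))        # (1, 1)
print(rw_search(star, 0, 1, random.Random(0), "lookahead"))    # (5, 7)
print(rw_search(star, 0, 1, random.Random(0), "replication"))  # (5, 1)
```

On the star, lookahead and 1-step replication both find all five leaves after a single hop. Replication does so without paying for the per-neighbour queries.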

16
Generalized Search Schemes

Searching procedure
A node of degree $d$ initiates a search based on a
budget $k$
(budget: the number of messages that are propagated
in the network)
Among its $d$ neighbors, the node picks certain
quantities $k_1, k_2, \dots, k_d$ such that $k_1 + k_2 + \dots + k_d = k$
For every neighbor $i$, the master node forwards the
message with budget $k_i$ (for $k_i = 0$ the message
is not transmitted)
Each neighbor $i$ reduces the budget by 1 unit and
repeats the process as long as the budget is greater
than 0
Every node that receives the message for the
second time from another neighbor forwards the
message with the corresponding budget
Special cases: random walks and flooding
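The budget-splitting procedure above can be sketched as follows. The two splitting policies are illustrative assumptions:

- `"walk"` hands the whole budget to one random neighbour, which gives a random walk of length $k$.
- `"flood"` splits the budget as evenly as possible, which approximates flooding.

Every transmission consumes one unit, so exactly $k$ messages are sent as long as every node has at least one neighbour. All names are hypothetical.

```python
import random
from collections import deque


def split_budget(budget, degree, policy, rng):
    """Pick shares k_1..k_degree >= 0 summing to `budget`."""
    shares = [0] * degree
    if policy == "walk":  # whole budget to one random neighbour
        shares[rng.randrange(degree)] = budget
    elif policy == "flood":  # split as evenly as possible
        q, r = divmod(budget, degree)
        for rank, i in enumerate(rng.sample(range(degree), degree)):
            shares[i] = q + (1 if rank < r else 0)
    else:
        raise ValueError(f"unknown policy {policy!r}")
    return shares


def generalized_search(adj, source, k, policy, rng):
    """Budget-based search; returns (distinct nodes reached, messages sent)."""
    reached, messages = {source}, 0
    pending = deque([(source, k)])  # (node, budget it may hand out)
    while pending:
        node, budget = pending.popleft()
        shares = split_budget(budget, len(adj[node]), policy, rng)
        for nbr, k_i in zip(adj[node], shares):
            if k_i == 0:  # k_i = 0: the message is not transmitted
                continue
            messages += 1
            reached.add(nbr)
            # the receiver spends one unit; repeated receipts are forwarded too
            if k_i - 1 > 0:
                pending.append((nbr, k_i - 1))
    return len(reached) - 1, messages


ring = [[(i - 1) % 10, (i + 1) % 10] for i in range(10)]
k4 = [[j for j in range(4) if j != i] for i in range(4)]
print(generalized_search(ring, 0, 5, "walk", random.Random(3)))
print(generalized_search(k4, 0, 3, "flood", random.Random(3)))  # (3, 3)
```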

17
Experimental Evaluation

Methodology
Performance Metrics
Median and Mean number of distinct peers
discovered (hits)
Minimum, Maximum, Standard Deviation of the
number of hits
Number of messages
Granularity of number of messages
Response time
Topologies
Random d-Regular Graphs
Power Law Graphs
Bimodal topologies
Clustered topologies

18
Normalized Flooding (NF)

Mean number of unique peers discovered as a
function of the initial TTL
NF and Standard Flooding behave similarly in
Regular Graphs
NF controls the number of messages and provides
higher efficiency
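The sketch below contrasts standard flooding with normalized flooding. The slides do not spell out the NF forwarding rule, so it is an assumption here. Every node forwards the query to at most $d_{\min}$ randomly chosen neighbours other than the sender, where $d_{\min}$ is the minimum degree. Under this rule, NF coincides with flooding on a regular graph and only trims the fan-out of high-degree nodes. The `fanout` parameter is a hypothetical name.

```python
import random
from collections import deque


def flood(adj, source, ttl, rng, fanout=None):
    """Flooding with a TTL; returns (distinct nodes reached, messages sent).

    fanout=None -> standard flooding: forward to every neighbour except
                   the sender.
    fanout=m    -> normalized flooding (as assumed here): forward to at most
                   m neighbours other than the sender, chosen at random.
    A node forwards only the first copy of the query it receives.
    """
    seen, messages = {source}, 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        node, sender, t = frontier.popleft()
        if t == 0:
            continue
        targets = [v for v in adj[node] if v != sender]
        if fanout is not None and len(targets) > fanout:
            targets = rng.sample(targets, fanout)
        for nbr in targets:
            messages += 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, node, t - 1))
    return len(seen) - 1, messages


ring = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
star = [[1, 2, 3, 4, 5]] + [[0] for _ in range(5)]
d_min = min(len(nbrs) for nbrs in star)  # 1 for the star
print(flood(ring, 0, 3, random.Random(0)))            # (5, 6)
print(flood(ring, 0, 3, random.Random(0), fanout=2))  # (5, 6): ring is 2-regular
print(flood(star, 0, 1, random.Random(0)))            # (5, 5)
print(flood(star, 0, 1, random.Random(0), d_min))     # (1, 1)
```

On the 2-regular ring both variants behave identically. On the star, the hub's fan-out drops from 5 messages to 1.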

19
Normalized Flooding (NF)

The number of unique peers increases
exponentially with TTL in the NF case
The number of peers increases faster than
exponentially with TTL in topologies with high
degrees

20
Random Walk with 1-step replication
21
Random Walk with LookAhead (RWLA)

RWLA performance is similar to long RW without
lookahead (in terms of unique peers discovered)
RWLA response time is much smaller compared to
standard RW

22
Edge Criticality: Searching with Weights

Generalized Searching performs similarly to
Standard Flooding in regular graphs
Generalized Searching behaves similarly to
Standard Flooding in other topologies if
normalized edge criticality is used.

23
Conclusions

Normalized Flooding (NF) could replace
standard flooding in irregular graphs
RW with 1-step replication performs better than
RW and NF in irregular graphs
Open for improvements
Generalized schemes (analytic investigation)
Quantifying Directional flooding

24
Random Walks in Peer-to-Peer (P2P)
Networks

Christos Gkantsidis, Milena Mihail, Amin Saberi

25
Outline

Motivation
Statistical Estimation and Random Walks (RW)
Searching
Methodology and Importance of Topologies
Construction and Summary

26
Motivation

Random Walks (RW) have been proposed for building
search and topology-maintenance protocols in
P2P networks
RW improve search performance compared to
flooding (Cao et al., 2002)
A RW approach to constructing and maintaining
unstructured topologies provides good
connectivity properties (i.e., constant degree,
constant expansion)
Claim: the RW approach is a good candidate
for simulating uniform sampling;
the number of simulation steps required can be as
low as the number of samples in independent
uniform sampling
Searching and Overlay Topology Construction
RW searching performs better than flooding for
the same number of messages, in clustered and
slowly changing topologies
Construction of P2P networks by random walks

27
Statistical Estimation Random Walks

Coupon collection and Chernoff bounds
$n$ types of coupons; at each step one coupon is drawn
uniformly at random
$T_n$: time by which coupons of
all $n$ types have been drawn
$T_{an}$: time by which $an$ distinct
types have been encountered, $0 < a < 1$
$X_1, \dots, X_k$: independent Bernoulli trials, $\Pr[X_i = 1] = p_i$
and $\Pr[X_i = 0] = 1 - p_i$
$p$: probability that a randomly drawn object has a
particular property
Chernoff bounds limit the probability that the property is found
substantially less often than its frequency in
the search space suggests, and the deviation of the estimator
$X/k$ (with $X = \sum_i X_i$) from $p$
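The coupon-collector quantities above can be computed exactly and also simulated. After $i$ distinct types have been seen, the next new type arrives after a geometric number of draws with mean $n/(n-i)$. Hence $\mathbb{E}[T_n] = nH_n \approx n\ln n$ and $\mathbb{E}[T_{an}] \approx n\ln\frac{1}{1-a}$. The last function is the estimator $X/k$ under independent uniform sampling. All function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_draws(n, target):
    """Exact E[T]: expected uniform draws until `target` of the n types are seen.

    After i distinct types have been seen, a new type appears with
    probability (n - i)/n, so the wait is geometric with mean n/(n - i).
    target = n gives E[T_n] = n*H_n; target = a*n gives E[T_{an}].
    """
    return sum(Fraction(n, n - i) for i in range(target))


def simulate_draws(n, target, rng):
    """One sample of T: draw uniform types until `target` distinct are seen."""
    seen, draws = set(), 0
    while len(seen) < target:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def estimate_p(f_values, k, rng):
    """Estimator X/k from k independent uniform draws of a 0/1 property."""
    return sum(rng.choice(f_values) for _ in range(k)) / k


rng = random.Random(7)
print(float(expected_draws(100, 100)), simulate_draws(100, 100, rng))
print(float(expected_draws(100, 50)), simulate_draws(100, 50, rng))
```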

28
Statistical Estimation Random Walks

Random Walks (RW), Convergence and Cover Time
$G = (V,E)$: undirected graph, $|V| = n$, $d_i$:
degree of vertex $i$
$A_{ij}$: adjacency matrix; $P$: transition matrix,
which satisfies $P_{ij} = A_{ij}/d_i$
$f: V \to \{0,1\}$: the indicator of the property of interest
Convergence rate: the rate at which the
RW approaches the stationary distribution
Cover time: the time by which all nodes
have been visited
Trajectory sample average: the rate at which the
value of $f$ averaged over successive vertices of
the RW trajectory approaches $p$
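The sketch below computes the three quantities for the simple random walk $P_{ij} = A_{ij}/d_i$; the function names are hypothetical. One caveat: on a non-regular graph the trajectory average converges to $\sum_v \pi_v f(v)$ with $\pi_v = d_v/(2|E|)$, which equals $p$ only when the graph is regular. The example uses the complete graph $K_5$, which is regular.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk, P_ij = A_ij / d_i: returns y_0 = start, ..., y_steps."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adj[path[-1]]))
    return path


def cover_time(adj, start, rng):
    """One sample of the number of steps until every vertex is visited."""
    unvisited = set(range(len(adj))) - {start}
    node, steps = start, 0
    while unvisited:
        node = rng.choice(adj[node])
        unvisited.discard(node)
        steps += 1
    return steps


def trajectory_average(adj, f, start, steps, rng):
    """Average of f(y_1), ..., f(y_steps) along one walk."""
    path = random_walk(adj, start, steps, rng)[1:]
    return sum(f[v] for v in path) / len(path)


k5 = [[j for j in range(5) if j != i] for i in range(5)]
f = [1, 0, 0, 0, 0]  # property held by 1 of 5 vertices, so p = 0.2
rng = random.Random(42)
print(cover_time(k5, 0, rng), trajectory_average(k5, f, 0, 20_000, rng))
```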

29
Statistical Estimation Random Walks

Convergence rate is related to the second
eigenvalue of P
(1)
$y_t$: the vertex visited by the RW at time $t$
Cover time
(2)
Trajectory sample average
(3)

(1): [11]; (2): [12], [13]; (3): [3], [4], [5], [6]
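Equations (1) to (3) are not reproduced in this transcript. As a reference point only (not necessarily the authors' statement), here is a standard bound of the kind behind (1). It applies to the simple random walk on a connected, non-bipartite undirected graph, where $P$ has eigenvalues $1 = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$:

```latex
\left| (P^t)_{ij} - \pi_j \right| \le \sqrt{\frac{d_j}{d_i}}\,\lambda^{t},
\qquad \lambda = \max\{|\lambda_2|, |\lambda_n|\} < 1,
\qquad \pi_j = \frac{d_j}{2|E|}.
```

The smaller $\lambda$ is (the larger the spectral gap), the faster the walk forgets its starting vertex.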
30
Statistical Estimation Random Walks

Second Eigenvalue, Expansion and Conductance
$S \subseteq V$; $C(S)$: the cutset of $S$ (i.e., edges with
one endpoint in $S$ and the other in $V \setminus S$); $\mathrm{vol}(S)$:
the sum of the degrees of the vertices in $S$
Expansion
Conductance
Known bound

[11], [14], [15], [16], [17], [18], [19]
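The formulas for expansion, conductance and the known bound are not in the transcript. The brute-force sketch below assumes the usual definitions: $\alpha(G) = \min_{|S| \le n/2} |C(S)|/|S|$ and $\Phi(G) = \min_{\mathrm{vol}(S) \le \mathrm{vol}(V)/2} |C(S)|/\mathrm{vol}(S)$. A commonly quoted bound of this kind is $1 - 2\Phi \le \lambda_2 \le 1 - \Phi^2/2$, though the slide's exact statement may differ. The search is exponential in $n$, so it only works on toy graphs. The function names are hypothetical.

```python
from itertools import combinations


def cut_size(adj, S):
    """|C(S)|: number of edges with one endpoint in S and the other outside S."""
    return sum(1 for u in S for v in adj[u] if v not in S)


def expansion_and_conductance(adj):
    """Brute force over all non-empty proper subsets S (tiny graphs only).

    expansion   = min |C(S)| / |S|       over |S| <= n/2
    conductance = min |C(S)| / vol(S)    over vol(S) <= vol(V)/2
    Assumes no isolated vertices, so vol(S) > 0.
    """
    n = len(adj)
    vol_v = sum(len(nbrs) for nbrs in adj)
    best_exp = best_cond = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            cut = cut_size(adj, S)
            if size <= n / 2:
                best_exp = min(best_exp, cut / size)
            vol_s = sum(len(adj[u]) for u in S)
            if vol_s <= vol_v / 2:
                best_cond = min(best_cond, cut / vol_s)
    return best_exp, best_cond


c4 = [[(i - 1) % 4, (i + 1) % 4] for i in range(4)]
print(expansion_and_conductance(c4))  # (1.0, 0.5)
```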
31
Searching

Performance metrics for Flooding and RW:
average number of distinct copies of an item
located in the search
number of messages used by the search
algorithm
RW performs better than flooding in the case of
multiple search requests for the same item with a
slowly changing topology
peer clustering (see [20], [21], [22], [23], [24], [25]
for details)
Searching analysis
Methodology
Flat topologies with Uniformly Distributed
Content
Topologies with Peer Clustering
Re-issuing the Same Query
Real topologies

32
Searching - Methodology

Performance Metrics
mean of the number of distinct copies (i.e., Mean),
deviation around the mean (i.e., Std), and the
failure probability
Cost:
number of messages or queries performed during the
search
Peer-to-peer topologies ( 1 million nodes)
Flat regular expanders, Two tier topologies with
clustering, Power law graphs, Samples from real
topologies
Dynamic topologies
rewiring
Content placement
Content clustering affects the performance of
searching

33
Searching Flat Topologies

Experiment
one request in a network of 500K peers
Mean hits, Minimum of hits and Std are similar
for Flooding and RW
the entire distribution of hits is similar for
Flooding and RW

34
Searching -Topologies with Peer Clustering

Cluster topology consists of
5 flat regular graphs of size 40K; from each one,
1000 nodes are picked at random to construct another
flat regular graph
Number of hits for RW is more concentrated around
the mean compared to Flooding

35
Searching - Reissuing the Same Query

Experiment setup: repeat the following
procedure 4 times:
each peer sends a request and waits for the response;
between requests, 2 of the links are rewired;
each peer initiates a new search
RW performs better than Flooding in terms of
Mean Hits and Failure Probability

36
Searching - Reissuing the Same Query

Performance of successive searches depends
on the number of topology changes considered
between consecutive searches
Performance of Flooding increases as the rate of
topological changes increases
RW Performance remains the same for small
variations

37
Searching Real Topologies

The number of hits for RW is more concentrated
around the mean than in Flooding
P2P topologies have good expansion properties

38
Construction

P2P network construction is concerned with:
peers arriving and leaving the network dynamically
strong and weak decentralization
low network overhead per addition or deletion

39
Baseline Construction of Expander Graphs

$A_{BASE}$ (undirected graph) consists of
$n$ vertices, each of which chooses $d$
vertices at random
total number of edges $nd$ and expected vertex
degree $2d$
Theorem 4.1. Let $G(V,E)$ be a graph constructed by
$A_{BASE}$.
Then, $G$ is an expander with high probability and,
for a positive
constant $\alpha < 1$,
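Here is a minimal sketch of $A_{BASE}$ as described above. It assumes each vertex picks $d$ distinct vertices other than itself; the slide does not say whether repeats are allowed. Parallel edges can still arise when two vertices pick each other. Exact expansion checking is exponential, so `sampled_expansion` (a hypothetical helper, like `a_base`) only probes random half-size cuts. It can expose a bad cut but cannot certify that $G$ is an expander.

```python
import random


def a_base(n, d, rng):
    """Each vertex picks d distinct other vertices uniformly at random and links
    to them; the nd edges are undirected, so the average degree is 2d."""
    adj = [[] for _ in range(n)]
    for v in range(n):
        for j in rng.sample(range(n - 1), d):
            u = j if j < v else j + 1  # skip v itself (no self-loops)
            adj[v].append(u)
            adj[u].append(v)
    return adj


def sampled_expansion(adj, trials, rng):
    """Smallest |C(S)|/|S| over random subsets with |S| = n/2.

    Sampling can expose a bad cut but cannot certify expansion."""
    n = len(adj)
    best = float("inf")
    for _ in range(trials):
        S = set(rng.sample(range(n), n // 2))
        cut = sum(1 for u in S for v in adj[u] if v not in S)
        best = min(best, cut / len(S))
    return best


rng = random.Random(5)
g = a_base(200, 3, rng)
print(sum(len(nbrs) for nbrs in g) / 200)  # average degree 2d = 6.0
print(sampled_expansion(g, 100, rng))
```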

40
Baseline Construction of Expander Graphs with
Constant Overhead in Random Bits

$A_{BASE}$ construction algorithm:
start a RW at a random vertex of $H$ (a constant-degree
expander graph)
when $A_{BASE}$ needs a random number, it is taken
from the RW on $H$
Theorem 4.2. Let $G(V,E)$ be a graph constructed by
$A_{BASE}$.
There are positive constants $\alpha$ and $0 < \beta < 0.5$ such
that any
subset $S$ with $\beta|V| \le |S| \le 0.5|V|$ has
cutset
expansion $\alpha$ almost surely.

41
Distributed Construction of Expanders with
Constant Overhead on Network Resources

$A_H$ construction
$d$ daemons, one for each Hamilton cycle
a newly arriving node contacts the daemon
associated with the $i$-th Hamilton cycle
after $c$ steps, it attaches between the
peer that currently hosts daemon $i$ and one of its
neighbors in cycle $i$
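The following sketches the $A_H$ idea. The overlay is a union of $d$ Hamilton cycles, each with a daemon. A newcomer is spliced into every cycle next to that cycle's daemon host. How the daemon moves during its $c$ steps is an assumption here: a random walk on the current union graph. The class and method names are hypothetical.

```python
import random


class HGraph:
    """Union of d Hamilton cycles, one daemon per cycle (hypothetical sketch)."""

    def __init__(self, d, c, rng, initial=3):
        self.c, self.rng, self.n = c, rng, initial
        self.succ, self.pred = [], []  # per cycle: successor / predecessor maps
        for _ in range(d):
            order = list(range(initial))
            rng.shuffle(order)
            nxt = {order[j]: order[(j + 1) % initial] for j in range(initial)}
            self.succ.append(nxt)
            self.pred.append({w: v for v, w in nxt.items()})
        self.host = [rng.randrange(initial) for _ in range(d)]  # daemon locations

    def neighbours(self, v):
        return [m[v] for maps in zip(self.succ, self.pred) for m in maps]

    def add_node(self):
        x = self.n
        self.n += 1
        # Assumption: each daemon first takes c random-walk steps on the current graph.
        for i in range(len(self.host)):
            for _ in range(self.c):
                self.host[i] = self.rng.choice(self.neighbours(self.host[i]))
        # Splice x into cycle i between daemon i's host u and u's successor w.
        for i, u in enumerate(self.host):
            w = self.succ[i][u]
            self.succ[i][u], self.succ[i][x] = x, w
            self.pred[i][w], self.pred[i][x] = x, u
        return x


def is_hamilton_cycle(nxt, n):
    """True if following successors from 0 visits all n nodes once and returns."""
    v, seen = 0, set()
    for _ in range(n):
        seen.add(v)
        v = nxt[v]
    return v == 0 and len(seen) == n


g = HGraph(d=3, c=2, rng=random.Random(0))
for _ in range(50):
    g.add_node()
print(g.n, all(is_hamilton_cycle(nxt, g.n) for nxt in g.succ))  # 53 True
```

Splicing a node between two consecutive members of a cycle keeps that cycle Hamiltonian. As a result, every peer always has exactly $2d$ neighbour slots.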

42
Distributed Construction of Expanders with
Constant Overhead on Network Resources

$A_M$ construction
$d$ daemons, one for each Hamilton cycle
the arrival of a new node involves
two nodes, $X$ and $Y$; $X$ and $Y$ contact the central
server to discover the location of the $d$ daemons
$X$ becomes the neighbor of daemon $i$, and $Y$ the
neighbor of daemon $i$'s initial neighbor

43
Summary