Hybrid Search Schemes for Unstructured Peer-to-Peer Networks / Random Walks in Peer-to-Peer Networks, by Christos Gkantsidis, Milena Mihail, Amin Saberi

Learn more at: https://www.math.cmu.edu
1
Hybrid Search Schemes for Unstructured
Peer-to-Peer Networks / Random Walks in
Peer-to-Peer Networks
  • Christos Gkantsidis, Milena Mihail, Amin Saberi
  • Presented by Paul Bogdan
  • February 28th, 2007

2
Hybrid Search Schemes for Unstructured
Peer-to-Peer Networks
Christos Gkantsidis, Milena Mihail, Amin Saberi
3
Outline
  • Random Graph Models
  • Flooding and Normalization
  • Random Walks and Replication
  • Generalized Search Schemes
  • Experimental evaluation

4
Motivation
  • Flooding with a small time-to-live (TTL) performs well
    in regular graphs
  • Performance metric: number of exchanged
    messages per distinct response
  • Flooding performance degrades as the TTL increases or
    when the network is irregular
  • Random walks perform better than flooding in
    scalability and granularity
  • Hybrid (generalized) search schemes:
    random walks with lookahead, random walks with
    1-step replication

5
Contribution
  • Random walks (RW) with shallow flooding offer
    good performance (analytic justification)
  • R1: In a random graph model with $O(n)$ nodes of
    constant degree and $O(n^{1/2})$ nodes of degree
    $O(n^{1/2})$, the expected time to discover $O(n)$
    nodes is $O(n^{1/2})$.
  • R2: Random walks with lookahead 1 or 1-step
    replication perform better when the degrees of the
    underlying topology are unevenly distributed.
  • Normalized Flooding (NF) solution
  • R3: NF achieves performance comparable to
    flooding in regular graphs.
  • R4: NF with 1-step replication achieves
    performance comparable to RW with 1-step replication.
  • R5: Local information about the network (node
    degrees) offers a global benefit.
  • Generalized Search Schemes

6
Random Graph Models
  • Random regular graphs $G_{n,d}$
  • $G_{n,d}$ denotes a graph with $n$ nodes in which
    every node has degree $d$.
  • $G_{n,d}$ has degree sum $D = nd$.
  • Random graphs with super-nodes $G_{n,d,\alpha,\beta}$
  • Given constants $\alpha$ and $\beta$, $G_{n,d,\alpha,\beta}$ denotes a
    graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (the large
    vertices) and the remaining nodes of degree $d$
    (the small vertices).
  • $G_{n,d,\alpha,\beta}$ has degree sum $D = (\alpha\beta + d)n$.
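
The degree-sum formula for the super-node model can be checked with a few lines of Python. This is only a sketch: the function name and the parameter values are made up for illustration, and it assumes that $n$ counts the small vertices and that $n$ is a perfect square, so that the sum comes out exactly as $(\alpha\beta + d)n$.

```python
import math

def supernode_degree_sequence(n, d, alpha, beta):
    """Degree sequence of a super-node graph (illustrative helper, not from the slides)."""
    root = math.isqrt(n)                # n^(1/2); assumes n is a perfect square
    num_large = alpha * root            # alpha * n^(1/2) large vertices
    large_degree = beta * root          # each of degree beta * n^(1/2)
    # Assumption: n counts the small vertices, each of degree d.
    return [large_degree] * num_large + [d] * n

degrees = supernode_degree_sequence(n=10_000, d=3, alpha=2, beta=5)
total_degree = sum(degrees)
print(total_degree, (2 * 5 + 3) * 10_000)
```

If $n$ instead counts all vertices, the sum falls short of $(\alpha\beta + d)n$ by $d\alpha n^{1/2}$, which is a lower-order term.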

7
Flooding and Normalization
  • Theorem 3.1. Consider a random regular graph $G_{n,d}$
    and flooding from a node $v$ with
    time-to-live $t$, with $|S|$ the number of distinct nodes
    queried by flooding and $|S| \le |V|/2$
  • Claims


  • (1)


  • (2)


  • (3)

8


  • (1)
  • Proof

9

  • (2)
  • Proof

10
(No Transcript)
11



  • (3)
  • Proof

12
Flooding and Normalization
  • Theorem 3.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with
    supernodes, and consider flooding from a node $v$ of
    degree $d$ with time-to-live $t$.
  • Claim: for some $t = O(\log\log n)$, the number of
    distinct responses is $O(n)$.
  • Proof
  • Consider flooding with $t = c\log_{d-1}(\log n) + 1$ and
    the vertices visited with TTL $t-1$.
  • Assumption: this set of visited nodes does not
    contain a large-degree vertex.
  • From the result for d-regular graphs, this set
    contains at least $(d-1)^{t-1}$ edges.
  • The probability that no vertex in $\Gamma(S_{t-1}(v))$ is large is
    bounded by $(d/(d+\alpha\beta))^{(d-1)^{t-1}} \le (d/(d+\alpha\beta))^{c\log n}$
    (for $c \ge 1$ and large $n$), so within the first $O(\log\log n)$
    steps we see a large vertex.

13
Flooding and Normalization
  • Theorem 3.3. Let $G_{n,d,\alpha,\beta}$ be a random graph
    with supernodes, and consider normalized flooding
    from a node $v$ with TTL $t$. Then the number of distinct
    responses is $O((d-1)^{t-1})$ and the number of
    messages per response is $O(1)$.
  • Proof
  • From Theorem 3.1, the number of minigroups seen
    is $(d-1)^{t-1}$.
  • The expected number of small vertices is
    $Q = d(d-1)^{t-1}/(d+\alpha\beta)$.
  • Let $X_i$, $i = 1,\dots,N$, be random variables with
    $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$.
  • By the Chernoff bound, the probability
    that fewer than $Q/2$ small vertices are seen is
    vanishingly small.

14
Random Walks and Replication
  • Random Walk with Look-Ahead
  • A random walk with shallow flooding at each step
    of the walk
  • RW with lookahead 1 visits $O(n)$ nodes with
    response time $O(n^{1/2})$
  • Theorem 4.2. Let $G_{n,d,\alpha,\beta}$ be a random graph with
    supernodes and consider a random walk from a node $v$.
    Then, in the 1-step replication scenario, the expected
    number of messages and the response time to obtain
    distinct responses is [formula missing]
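
A random walk with lookahead 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the hub-plus-ring test topology, and the message accounting (one message per queried neighbor plus one per hop) are all assumptions made for the example.

```python
import random

def random_walk_with_lookahead(adj, start, steps, rng):
    """Walk `steps` hops; before each hop, query every neighbor of the current node."""
    discovered = {start}
    messages = 0
    current = start
    for _ in range(steps):
        # Lookahead 1: shallow flood to all neighbors of the current node.
        discovered.update(adj[current])
        messages += len(adj[current])
        # Move the walker to a uniformly random neighbor.
        current = rng.choice(adj[current])
        messages += 1
    return discovered, messages

# Hypothetical irregular topology: hub 0 connected to a ring over nodes 1..20.
ring = list(range(1, 21))
adj = {0: ring[:]}
for i, u in enumerate(ring):
    adj[u] = [0, ring[i - 1], ring[(i + 1) % len(ring)]]

rng = random.Random(7)
found, msgs = random_walk_with_lookahead(adj, start=1, steps=5, rng=rng)
print(len(found), msgs)
```

Once the walker stands on the high-degree hub, a single lookahead step discovers every node, which is the intuition behind the benefit of lookahead in topologies with uneven degrees.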

15
  • Theorem 4.3. Let $G_{n,d,\alpha,\beta}$ be a random graph with
    supernodes and consider normalized flooding from $v$ with
    TTL $t = (\log n)/(2\log(d-1))$. Then, in the 1-step
    replication scenario, the number of distinct
    responses is at least [formula missing]
    and the number of messages is at most [formula missing].
  • Proof
  • The number of minigroups seen is $(d-1)^{t-1}$ and,
    by the Chernoff bounds,
    there will be [formula missing]
    minigroups corresponding to large vertices.

16
Generalized Search Schemes
  • Searching procedure
  • A node of degree $d$ initiates a search with a
    budget $k$
  • Budget: the number of messages that are propagated
    in the network
  • Among its $d$ neighbors the node picks
    quantities $k_1, k_2, \dots, k_d$ such that
    $k_1 + k_2 + \dots + k_d = k$
  • To every neighbor $i$ the initiating node forwards the
    message with budget $k_i$ (for $k_i = 0$ the message
    is not transmitted)
  • Each neighbor $i$ reduces the budget by 1 unit and
    repeats the process as long as the budget is greater
    than 0
  • A node that receives the message a
    second time, from another neighbor, again forwards the
    message with the corresponding budget
  • Special cases: random walks, flooding
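
The budget-based procedure can be sketched in Python as below. The function names and the two splitting rules are hypothetical: `walk_split` hands the whole budget to one random neighbor (random-walk behaviour), and `flood_split` spreads it as evenly as possible (flooding-like behaviour). The sketch assumes the shares sum exactly to the budget and that every receipt of the message is forwarded again.

```python
import random

def budget_search(adj, source, budget, split, rng):
    """Propagate a query under a message budget; return (visited nodes, messages sent)."""
    visited = {source}
    messages = 0
    pending = [(source, budget)]
    while pending:
        node, k = pending.pop()
        if k <= 0:
            continue
        shares = split(adj[node], k, rng)         # k_1..k_d with sum k
        for neighbor, k_i in zip(adj[node], shares):
            if k_i == 0:
                continue                          # zero budget: nothing is sent
            messages += 1
            visited.add(neighbor)
            pending.append((neighbor, k_i - 1))   # receiver spends 1 unit
    return visited, messages

def walk_split(neighbors, k, rng):
    """Random-walk extreme: the whole budget goes to one random neighbor."""
    shares = [0] * len(neighbors)
    shares[rng.randrange(len(neighbors))] = k
    return shares

def flood_split(neighbors, k, rng):
    """Flooding-like extreme: spread the budget as evenly as possible."""
    base, extra = divmod(k, len(neighbors))
    return [base + (1 if i < extra else 0) for i in range(len(neighbors))]

# Hypothetical test topology: a ring of 10 nodes.
ring_adj = {u: [(u - 1) % 10, (u + 1) % 10] for u in range(10)}
rng = random.Random(3)
walk_nodes, walk_msgs = budget_search(ring_adj, 0, 6, walk_split, rng)
flood_nodes, flood_msgs = budget_search(ring_adj, 0, 6, flood_split, rng)
print(walk_msgs, flood_msgs)
```

Because each forwarded message consumes exactly one unit, the total number of messages equals the initial budget whichever splitting rule is used; only the shape of the explored region changes.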

17
Experimental Evaluation
  • Methodology
  • Performance Metrics
  • Median and Mean number of distinct peers
    discovered (hits)
  • Minimum, Maximum, Standard Deviation of the
    number of hits
  • Number of messages
  • Granularity of number of messages
  • Response time
  • Topologies
  • Random d-Regular Graphs
  • Power Law Graphs
  • Bimodal topologies
  • Clustered topologies

18
Normalized Flooding (NF)
  • Mean number of unique peers discovered as a
    function of the initial TTL
  • NF and Standard Flooding behave similarly in
    Regular Graphs
  • NF controls the number of messages and provides
    higher efficiency
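
The difference in message count between standard and normalized flooding can be illustrated with a small sketch. The slides do not define NF, so the rule used here is an assumption: each node forwards to at most `fanout` randomly chosen neighbors, with `fanout` set to the minimum degree of the graph. The topology and all names are hypothetical.

```python
import random

def flood(adj, source, ttl, rng, fanout=None):
    """Flood from `source`; with `fanout` set, forward to at most that many random neighbors."""
    visited = {source}
    frontier = [(source, None)]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            targets = [v for v in adj[node] if v != parent]
            if fanout is not None and len(targets) > fanout:
                targets = rng.sample(targets, fanout)   # normalization step (assumed rule)
            for v in targets:
                messages += 1
                if v not in visited:                    # nodes forward only on first receipt
                    visited.add(v)
                    next_frontier.append((v, node))
        frontier = next_frontier
    return visited, messages

# Hypothetical irregular topology: hub 0 plus a ring over nodes 1..12.
ring = list(range(1, 13))
adj = {0: ring[:]}
for i, u in enumerate(ring):
    adj[u] = [0, ring[i - 1], ring[(i + 1) % len(ring)]]

rng = random.Random(0)
min_degree = min(len(nbrs) for nbrs in adj.values())
plain_hits, plain_msgs = flood(adj, 1, ttl=2, rng=rng)
nf_hits, nf_msgs = flood(adj, 1, ttl=2, rng=rng, fanout=min_degree)
print(len(plain_hits), plain_msgs, len(nf_hits), nf_msgs)
```

In this example the hub alone accounts for most of the messages under standard flooding, while the normalized variant caps its contribution at the minimum degree.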

19
Normalized Flooding (NF)
  • The number of unique peers increases
    exponentially with TTL in NF case
  • The number of peers increases faster than
    exponentially with TTL in topologies with high
    degrees

20
Random Walk with 1-step replication
21
Random Walk with LookAhead (RWLA)
  • RWLA performance is similar to long RW without
    lookahead (in terms of unique peers discovered)
  • RWLA response time is much smaller compared to
    standard RW

22
Edge Criticality Searching with weights
  • Generalized Searching performs similarly to
    Standard Flooding in regular graphs
  • Generalized Searching behaves similarly to
    Standard Flooding in other topologies if
    normalized edge criticality is used.

23
Conclusions
  • Normalized Flooding (NF) could replace
    standard flooding in irregular graphs
  • RW with 1-step replication performs better than
    RW and NF in irregular graphs
  • Open for improvements
  • Generalized schemes (analytic investigation)
  • Quantifying Directional flooding

24
Random Walks in Peer-to-Peer (P2P)
Networks
  • Christos Gkantsidis, Milena Mihail, Amin Saberi

25
Outline
  • Motivation
  • Statistical Estimation and Random Walks (RW)
  • Searching
  • Methodology and Topologies importance
  • Construction and Summary

26
Motivation
  • Random walks (RW) have been proposed for building
    search and topology-maintenance protocols in
    P2P networks
  • RW improve search performance compared to
    flooding (Cao et al., 2002)
  • An RW approach to constructing and maintaining
    unstructured topologies provides good
    connectivity properties (i.e. constant degree,
    constant expansion)
  • Claim: the RW approach is a good candidate
    to simulate uniform sampling
  • The number of simulation steps required can be as
    low as the number of samples in independent
    uniform sampling
  • Searching and overlay topology construction
  • RW searching performs better than flooding for
    the same number of messages in clustered and
    slowly changing topologies
  • Construction of P2P networks by random walks

27
Statistical Estimation and Random Walks
  • Coupon collection and Chernoff bounds
  • $n$ - number of coupon types; each time, one coupon is drawn
    uniformly at random
  • $T_n$ - time by which we have drawn coupons belonging
    to all $n$ types
  • $T_{\alpha n}$ - time by which we have encountered $\alpha n$ distinct
    types, $0 < \alpha < 1$
  • $X_1, \dots, X_k$ - independent Bernoulli trials, $P[X_i = 1] = p_i$
    and $P[X_i = 0] = 1 - p_i$
  • $p$ - probability that a randomly drawn object has a
    particular property
  • the probability that the property is found in
    substantially fewer draws than its frequency in
    the search space and the quality of the estimator
    $X/k$ are bounded by [formulas missing]
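
The coupon-collection quantities can be explored with a short simulation. This is a sketch with made-up function names; it relies only on the standard facts that, for uniform draws, the expected time to collect all $n$ types is $n H_n$ (about $n \ln n$), while collecting a constant fraction $\alpha n$ takes about $n \ln(1/(1-\alpha))$ draws, which is linear in $n$.

```python
import random

def draws_until(n, target, rng):
    """Number of uniform draws until `target` distinct coupon types have been seen."""
    seen = set()
    draws = 0
    while len(seen) < target:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

def expected_full_collection(n):
    """Closed form E[T_n] = n * H_n for uniform draws."""
    return n * sum(1.0 / i for i in range(1, n + 1))

rng = random.Random(0)
n = 200
full = draws_until(n, n, rng)        # one sample of T_n
half = draws_until(n, n // 2, rng)   # one sample of T_{alpha n} with alpha = 1/2
print(full, half, expected_full_collection(n))
```

Averaging `draws_until` over many runs should approach the closed form; a single run, as here, can deviate noticeably.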

28
Statistical Estimation and Random Walks
  • Random walks (RW), convergence and cover time
  • $G = (V, E)$ - undirected graph, $|V| = n$, and $d_i$ - the
    degree of vertex $i$
  • $A_{ij}$ - adjacency matrix, $P$ - transition matrix
    which satisfies [formula missing]
  • $f : V \to \{0, 1\}$ which satisfies [formula missing]
  • Convergence rate metric - the rate at which the
    RW approaches the stationary distribution
  • Cover time metric - the time by which all nodes
    have been visited
  • Trajectory sample average - the rate at which the
    value of $f$ averaged over successive vertices of
    the RW trajectory approaches $p$
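
Convergence to the stationary distribution can be seen on a tiny example. The transcript does not preserve the definition of the transition matrix, so this sketch assumes the standard simple random walk, $P_{ij} = A_{ij}/d_i$, whose stationary distribution on a connected non-bipartite graph is $\pi_i = d_i / (2|E|)$. The graph and function names are chosen for illustration only.

```python
def transition_matrix(adj_matrix):
    """Simple random walk (assumed definition): P[i][j] = A[i][j] / d_i."""
    P = []
    for row in adj_matrix:
        d = sum(row)
        P.append([a / d for a in row])
    return P

def step(dist, P):
    """Apply one step of the walk to a probability distribution over vertices."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2 (non-bipartite, connected).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(A)
degrees = [sum(row) for row in A]
stationary = [d / sum(degrees) for d in degrees]   # pi_i = d_i / (2|E|)

dist = [1.0, 0.0, 0.0, 0.0]                        # walk starts at vertex 0
for _ in range(200):
    dist = step(dist, P)
print(dist, stationary)
```

The speed at which `dist` approaches `stationary` is governed by the second-largest eigenvalue modulus of `P`; for larger graphs a linear-algebra library would be the natural tool to compute it.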

29
Statistical Estimation and Random Walks
  • Convergence rate is related to the second
    eigenvalue of P

  • (1)
  • $y_t$ - the vertex that the RW visits at time $t$
  • Cover time

  • (2)
  • Trajectory sample average

  • (3)

(1) [11], (2) [12, 13], (3) [3, 4, 5, 6]
30
Statistical Estimation and Random Walks
  • Second eigenvalue, expansion and conductance
  • $S$ - subset of $V$, $C(S)$ - cutset of $S$ (i.e. edges with
    one endpoint in $S$ and the other in $V \setminus S$), vol(S)
    - the sum of degrees of the vertices in $S$
  • Expansion
  • Conductance
  • Known bound
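
The slide's formulas for expansion and conductance are not preserved, so the sketch below uses one common convention as an assumption: for a subset $S$, expansion is $|C(S)|/|S|$ and conductance is $|C(S)|/\mathrm{vol}(S)$; the graph-level quantities are the minima over subsets containing at most half of the vertices (respectively half of the volume). Function names and the example graph are illustrative.

```python
def cut_size(adj, S):
    """|C(S)|: number of edges with one endpoint in S and the other outside S."""
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)

def expansion(adj, S):
    return cut_size(adj, S) / len(S)

def conductance(adj, S):
    return cut_size(adj, S) / volume(adj, S)

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
S = {0, 1, 2}
print(cut_size(adj, S), expansion(adj, S), conductance(adj, S))
```

A single bridging edge gives this cut a low conductance, which is the kind of bottleneck that slows the convergence of a random walk.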

[11, 14, 15, 16, 17, 18, 19]
31
Searching
  • Performance metrics for Flooding and RW
  • average number of distinct copies of an item
    located in the search
  • number of messages used by the searching
    algorithm
  • RW performs better than flooding in the case of
  • multiple search requests for the same item in a
    slowly changing topology
  • peer clustering (see [20, 21, 22, 23, 24, 25]
    for details)
  • Searching analysis
  • Methodology
  • Flat topologies with Uniformly Distributed
    Content
  • Topologies with Peer Clustering
  • Re-issuing the Same Query
  • Real topologies

32
Searching - Methodology
  • Performance Metrics
  • mean of the number of distinct copies (i.e. Mean)
  • discrepancy around the mean (i.e. Std) and the
    failure probability
  • Cost
  • number of messages or queries performed during
    search
  • Peer-to-peer topologies ( 1 million nodes)
  • Flat regular expanders, Two tier topologies with
    clustering, Power law graphs, Samples from real
    topologies
  • Dynamic topologies
  • rewiring
  • Content placement
  • Content clustering affects the performance of
    searching

33
Searching - Flat Topologies
  • Experiment
  • one request in a network of 500K peers
  • Mean hits, Minimum of hits and Std are similar
    for Flooding and RW
  • the entire distribution of hits is similar for
    Flooding and RW

34
Searching - Topologies with Peer Clustering
  • Cluster topology consists of
  • 5 flat regular graphs of size 40K from each one
    pick randomly 1000 nodes to construct another
    flat regular graph
  • Number of hits for RW is more concentrated around
    the mean compared to Flooding

35
Searching - Reissuing the Same Query
  • Experiment setup: repeat the following procedure
    4 times
  • each peer sends a request and waits for the response
  • between requests, 2 of the links are rewired
  • each peer initiates a new search
  • RW perform better than Flooding
  • Mean Hits and Failure Probability

36
Searching - Reissuing the Same Query
  • Performance of successive searches depends
  • on the number of topology changes considered
    between consecutive searches
  • Performance of Flooding increases as the rate of
    topological changes increases
  • RW Performance remains the same for small
    variations

37
Searching - Real Topologies
  • The number of hits for RW is more concentrated
    around the mean than in Flooding
  • P2P networks have good expansion properties

38
Construction
  • P2P network construction is concerned with
  • peers arrive and leave the network dynamically
  • strong and weak decentralization
  • low network overhead per addition or deletion

39
Baseline Construction of Expander Graphs
  • ABASE (undirected graph) consists of
  • n vertices, each of which chooses d
    vertices at random
  • total number of edges nd and expected vertex
    degree 2d
  • Theorem 4.1. Let G(V,E) be a graph constructed by
    ABASE. Then G is an expander with high probability, and
    for a positive constant $\alpha < 1$ [formula missing]
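
The edge and degree counts of the baseline construction can be verified with a short sketch. The details are assumptions, since the slide only says that each vertex chooses d random vertices: here choices are made uniformly with replacement among the other vertices, self-loops are excluded, and parallel edges are kept. The function names are illustrative.

```python
import random

def build_abase(n, d, rng):
    """Each vertex picks d random other vertices; edges are kept as a list (multigraph)."""
    edges = []
    for u in range(n):
        for _ in range(d):
            v = rng.randrange(n - 1)
            if v >= u:
                v += 1              # shift past u so that no self-loop is created
            edges.append((u, v))
    return edges

def degree_sequence(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

rng = random.Random(42)
n, d = 1000, 3
edges = build_abase(n, d, rng)
deg = degree_sequence(n, edges)
print(len(edges), sum(deg) / n, min(deg), max(deg))
```

Every vertex has degree at least d from its own choices; the other d on average come from being chosen by others, which gives the expected degree 2d.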

40
Baseline Construction of Expander Graphs with
Constant Overhead in Random Bits
  • ABASE construction algorithm
  • start a RW at a random vertex of H (a constant-degree
    expander graph)
  • whenever ABASE needs a random number, it is taken
    from the RW on H
  • Theorem 4.2. Let G(V,E) be a graph constructed by
    ABASE. There are constants $\alpha > 0$ and $0 < \beta < 0.5$ such
    that any subset S of size at least $\beta|V|$ and at most $0.5|V|$
    has cutset expansion $\alpha$ almost surely.

41
Distributed Construction of Expanders with
Constant Overhead on Network Resources
  • AH construction
  • d daemons, one for each Hamilton cycle
  • a newly arriving node contacts the daemon
    associated with the i-th Hamilton cycle
  • after c steps, it attaches between the
    peer that currently hosts daemon i and one of that peer's
    neighbors in cycle i
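
The splice operation at the heart of this construction can be sketched as follows. Only the insertion of a node between a host and its cycle neighbor is taken from the slide; how the daemon moves during its c steps is not modelled, so the host is simply picked at random here as a stand-in. All names are hypothetical.

```python
import random

def insert_into_cycle(succ, host, new_node):
    """Splice new_node between host and host's current successor in one cycle."""
    succ[new_node] = succ[host]
    succ[host] = new_node

def cycle_length(succ, start):
    """Length of the cycle through `start`, following successor pointers."""
    length, node = 1, succ[start]
    while node != start:
        node = succ[node]
        length += 1
    return length

rng = random.Random(1)
d = 3
# Start from d copies of the 3-cycle 0 -> 1 -> 2 -> 0, stored as successor maps.
cycles = [{0: 1, 1: 2, 2: 0} for _ in range(d)]
for new_node in range(3, 20):
    for succ in cycles:
        host = rng.choice(list(succ))   # stand-in for the peer hosting daemon i
        insert_into_cycle(succ, host, new_node)
print([cycle_length(succ, 0) for succ in cycles])
```

Each splice touches two pointers per cycle, so an arrival costs a constant number of edge updates for constant d, and every cycle remains a single Hamilton cycle over all current nodes.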

42
Distributed Construction of Expanders with
Constant Overhead on Network Resources
  • AM construction
  • d daemons, one for each Hamilton cycle
  • an arrival consists of two new nodes, X and Y;
    X and Y contact the central
    server to discover the location of the d daemons
  • X becomes the neighbor of daemon i and Y the
    neighbor of the initial daemon's neighbor

43
Summary
  • For Searching
  • Random Walks (RW) are superior to Flooding
  • For Construction
  • RW add new peers with constant overhead
  • Open Problems
  • Strong Decentralized Construction algorithm
  • Can we better handle deletions and expansions of
    small sets?
  • How do the P2P network parameters (e.g. capacities)
    affect the performance of RW?