ALGORITHMS FOR PERFORMANCE AND TRUST IN PEER-TO-PEER SYSTEMS

1
ALGORITHMS FOR PERFORMANCE AND TRUST IN
PEER-TO-PEER SYSTEMS
Hui Zhang, University of Southern California
2
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for performance on the overlay routing
    layer.
  • Algorithms for performance on the underlying
    network layer.
  • Algorithms for trust at the application layer.
  • Conclusion and future work.

3
P2P Networked systems
  • A collaborating group of Internet end-hosts which
    overlay their own special-purpose network atop
    the Internet.
  • Examples:
  • file sharing [Napster, Gnutella]; anonymous
    publishing [Freenet]
  • distributed storage [Dabek2001, Kubiatowicz2000,
    IVY, PAST]
  • web caching [Iyer2002]; DoS attack
    prevention [Keromytis2002]
  • application-layer multicast [Castro2003,
    Ratnasamy2001, Zhuang2001]
  • naming [Cox2002, Balakrishnan2004]; event
    notification [Scribe]
  • indirection services [Stoica2002], etc.

4
Challenges in P2P system design
  • Allow rapid deployment through self organization
  • Scale with increasing network size
  • Adapt to dynamics from both the underlying
    network and the application layer.

5
Three layers in a P2P system design
Application layer
Overlay routing layer
Underlying network layer
6
Algorithmic issues in P2P system design
  • Overlay routing layer: distributed hash table
    algorithms.
  • Underlying network layer: exploiting network
    proximity and Internet topology.
  • Application layer: distributed and robust rating
    algorithms.

7
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for performance on the overlay routing
    layer.
  • Algorithms for performance on the underlying
    network layer.
  • Algorithms for trust at the application layer.
  • Conclusion and future work.

8
Distributed Hash Tables
  • Support hash-table-like functionality at
    Internet scale.
  • A global key space: each data item is a key in
    the space, and each node is responsible for a
    portion of the key space.
  • Given a key, map it onto a node.
  • Examples: Freenet [Clarke et al. 2001],
    CAN [Ratnasamy et al. 2001], Chord [Stoica et al.
    2001], Pastry [Rowstron et al. 2001],
    Tapestry [Zhao et al. 2001], Kademlia [Maymounkov
    et al. 2002], Viceroy [Malkhi et al. 2002],
    Koorde [Kaashoek et al. 2003], etc.
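
As a rough illustration of "given a key, map it onto a node", the sketch below hashes node names and keys onto one circular key space and assigns each key to its successor node, in the style of Chord. The 16-bit key space, the SHA-1 hash, the `Ring` class, and the node names are assumptions made for the example, not details from the talk.

```python
import bisect
import hashlib

KEY_BITS = 16  # assumed key-space size for the example (2**16 identifiers)


def key_id(name: str) -> int:
    """Hash an arbitrary string onto the circular key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** KEY_BITS)


class Ring:
    """Toy DHT view: each node owns the keys between its predecessor and itself."""

    def __init__(self, node_names):
        self.nodes = sorted((key_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def lookup(self, key: str) -> str:
        """Return the node responsible for `key` (its successor on the ring)."""
        idx = bisect.bisect_left(self.ids, key_id(key))
        return self.nodes[idx % len(self.nodes)][1]  # wrap around past the last node


ring = Ring([f"node-{i}" for i in range(8)])
owner = ring.lookup("some-file.txt")
```

A real DHT resolves the same mapping without any node holding the full sorted list; that is what the routing schemes on the following slides are about.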

9
Small-world model [Kleinberg1999]
  • Small-world graphs have
  • short-distance clustering (like a regular graph)
  • long-distance shortcuts (resulting in a short
    global path length, like a random graph)
  • How to find short paths in a distributed fashion?

[Figure: a one-dimensional small-world network example. Node i has local contacts and a shortcut to node i+j, chosen with probability proportional to 1/j; the average routing path length is log^2(N).]
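
The following is a minimal simulation sketch of greedy routing on such a one-dimensional network. The ring layout, a single shortcut per node, and the function names are assumptions chosen for illustration; each node only uses its own contacts, which is the "distributed fashion" asked about above.

```python
import random


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_shortcuts(n: int, rng: random.Random) -> list[int]:
    """Give each node one shortcut at offset j, with probability proportional to 1/j."""
    offsets = list(range(1, n // 2 + 1))
    weights = [1.0 / j for j in offsets]
    shortcuts = []
    for i in range(n):
        j = rng.choices(offsets, weights=weights)[0]
        direction = rng.choice((-1, 1))
        shortcuts.append((i + direction * j) % n)
    return shortcuts


def greedy_route(src: int, dst: int, shortcuts: list[int], n: int) -> int:
    """Forward to whichever contact is closest to dst; return the hop count."""
    hops, cur = 0, src
    while cur != dst:
        contacts = ((cur - 1) % n, (cur + 1) % n, shortcuts[cur])
        cur = min(contacts, key=lambda c: ring_distance(c, dst, n))
        hops += 1
    return hops
```

Because a local contact always reduces the remaining distance by one, the loop terminates; the shortcuts are what make the hop count much smaller than the ring distance on average.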
10
Small-world Freenet
  • Files are identified by binary file keys obtained
    by applying a hash function.
  • Each node maintains a datastore and a routing
    table of <key, pointer> values.
  • Greedy forwarding search with backtracking.
  • Enhanced-clustering cache replacement:
  • Each node chooses a seed randomly when joining
    the network.
  • When a new key (file) u is to be cached, the node
    chooses from the current datastore the key v
    farthest from the seed.
  • If Distance(u, seed) < Distance(v, seed), cache
    u and evict v with probability 1-P (clustering).
  • If Distance(u, seed) > Distance(v, seed), cache
    u and evict v with probability P (randomness).
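
The replacement rule above can be sketched as follows. This is an illustrative reading of the slide, not the Freenet implementation: the circular distance metric, the fixed capacity, the handling of ties (treated like the "farther" case), and the `ClusteringCache` name are all assumptions.

```python
import random


class ClusteringCache:
    """Datastore that keeps keys clustered around a per-node random seed."""

    def __init__(self, capacity, key_space, p, rng=None):
        self.capacity = capacity
        self.key_space = key_space      # assumed circular key space of this size
        self.p = p                      # the slide's probability P
        self.rng = rng or random.Random()
        self.seed = self.rng.randrange(key_space)  # chosen once, when the node joins
        self.keys = set()

    def distance(self, key):
        """Assumed metric: circular distance from `key` to the seed."""
        d = abs(key - self.seed)
        return min(d, self.key_space - d)

    def offer(self, u):
        """Try to cache key u; return True if u ends up in the datastore."""
        if u in self.keys:
            return True
        if len(self.keys) < self.capacity:
            self.keys.add(u)            # free slot: nothing to evict
            return True
        v = max(self.keys, key=self.distance)  # stored key farthest from the seed
        if self.distance(u) < self.distance(v):
            accept = 1.0 - self.p       # u is closer to the seed: clustering
        else:
            accept = self.p             # u is farther (ties included): randomness
        if self.rng.random() < accept:
            self.keys.remove(v)
            self.keys.add(u)
            return True
        return False
```

With a small P, the datastore drifts toward keys near the seed while still admitting an occasional distant key, which is what gives the routing table its long-range shortcuts.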

11
Small-world Freenet - analysis
  • The expected steps to deliver a message in the
    idealized small-world Freenet is O(log N) if the
    routing table size is ?(log2 N), where N is the
    network size.

12
Small-world Freenet routing structure
  • When i has log(N) shortcuts, its routing table
    resembles that of a Chord node in a probabilistic
    way!

[Figure: node i's shortcut distribution on the distance space, with marks at i, i+1, i+N/4, i+N/2, and i+N.]
13
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for performance on the overlay routing
    layer.
  • Algorithms for performance on the underlying
    network layer.
  • Algorithms for trust at the application layer.
  • Conclusion and future work.

14
Frugal routing
  • A Distributed Hash Table routing scheme is frugal
    if
  • The search space for the key decreases by a
    constant factor after each lookup hop
  • The next hop after each intermediate node x in
    the route depends only on x and the destination.
  • Examples: small-world Freenet, Chord, Pastry,
    Tapestry.
  • Latency stretch [Ratnasamy et al. 2001]:

(the lookup latency on the overlay topology
between two nodes) / (the unicast latency on the
underlying topology)
15
Latency stretch vs. latency expansion
  • Theorem 1
  • If the underlying topology G is drawn from
    a family of graphs with exponential latency
    expansion, then the expected latency stretch of
    any frugal routing scheme is ?(logN), where N is
    the network size.
  • Theorem 2
  • If
  • (1) the underlying topology G is drawn from a
    family of graphs with d-power-law latency
    expansion, and
  • (2) each node u in a frugal routing network
    samples (log N)^d nodes in each range uniformly
    at random and keeps a pointer to the
    nearest node for future routing,
  • then the expected latency stretch of a
    request is O(1).

16
Lookup-Parasitic Random Sampling
1. Recursive lookup.
2. Each intermediate hop appends its IP address to
   the lookup message.
3. When the lookup reaches its target, the target
   informs each listed hop of its identity.
4. Each intermediate hop then sends one (or a small
   number of) pings to get a reasonable estimate of
   the latency to the target, and updates its
   routing table accordingly.
5. When the target key is random to the initial
   node, a sample on each range (of some node)
   happens with the same probability 1/2.
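
A minimal sketch of steps 2 to 4, assuming Chord-style ranges where range k of a node covers clockwise identifier distances in [2^k, 2^(k+1)). The `Hop` class, the `ping` callback, and the rule of keeping a single lowest-latency pointer per range are illustrative assumptions rather than the LPRS-Chord code.

```python
def range_index(node_id: int, target_id: int, bits: int) -> int:
    """Index k of the range [2**k, 2**(k+1)) holding the clockwise distance to target."""
    gap = (target_id - node_id) % (2 ** bits)
    return gap.bit_length() - 1          # -1 when target_id == node_id


class Hop:
    """Routing state of one node: the lowest-latency known node per range."""

    def __init__(self, node_id: int, bits: int):
        self.node_id = node_id
        self.bits = bits
        self.best = {}                   # range index -> (latency, target id)

    def on_target_notice(self, target_id: int, ping) -> bool:
        """Steps 3-4: ping the target and keep it if it is the nearest seen in its range."""
        k = range_index(self.node_id, target_id, self.bits)
        if k < 0:
            return False                 # the target is this node itself
        latency = ping(self.node_id, target_id)
        if k not in self.best or latency < self.best[k][0]:
            self.best[k] = (latency, target_id)
            return True
        return False


def finish_lookup(path, target_id: int, ping) -> None:
    """Steps 2-4: `path` is the list of hops recorded in the lookup message."""
    for hop in path:
        hop.on_target_notice(target_id, ping)
```

The sampling is "parasitic" in the sense that no extra lookups are issued: every ordinary lookup hands each hop on its path one more candidate for one of its ranges.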
17
Internet latency expansion
  • The performance of many other networking
    algorithms relies on the latency expansion
    characteristic of the underlying network.
  • Request latency reduction in web cache systems
    [Plaxton et al. 1997].
  • Nearest neighbor search [Karger et al. 2002].
  • Locality-aware DHT design [Abraham et al. 2004].
  • Gossip-based communication mechanisms [Kempe et
    al. 2002].
  • The Internet router-level topology has an
    exponential expansion [Phillips et al. 1999].
  • The expansion is defined on router-level hops.
  • How about the expansion in terms of latency?

18
Internet latency expansion measurement
methodology
  • Collected two router-level topologies:
  • one in May 2002 with 328,378 routers, and the
    other in November 2003 with 356,648 routers.
  • Randomly sampled about 100,000 node pairs from
    each topology and used their latency to estimate
    Internet latency expansion.
  • Approximated link latency between any two nodes
    by the accumulated geographic distance of the
    path between the two nodes in shortest path
    routing.
  • Geo-locations were assigned to nodes using the
    Geotrack tool.

19
Internet router-level topology latency expansion
  • The hop-count expansion of the Internet
    router-level topology is exponential
  • The latency expansion of the Internet
    router-level topology is power-law and has an
    exponent between 1 and 2.

[Figures: Internet latency expansion (log-log); comparison of the two topologies (log-log).]
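
One way to read a power-law exponent off such a log-log plot is a least-squares line fit, sketched below. This is a generic illustration and not the measurement code behind the slides; the function names, the thresholds, and the synthetic latencies used to exercise it are assumptions.

```python
import math


def expansion_counts(pair_latencies, thresholds):
    """For each threshold x, count sampled pairs whose latency is at most x."""
    return [sum(1 for lat in pair_latencies if lat <= x) for x in thresholds]


def loglog_slope(xs, counts):
    """Least-squares slope of log(count) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(c) for c in counts]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic latencies whose cumulative count grows like x**1.5 (assumed data).
n = 20000
latencies = [100.0 * ((i + 0.5) / n) ** (1 / 1.5) for i in range(n)]
thresholds = [10, 20, 40, 80]
slope = loglog_slope(thresholds, expansion_counts(latencies, thresholds))
```

A slope that stays roughly constant across the plotted range indicates power-law expansion; exponential expansion would instead show the curve bending upward on log-log axes.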
20
Latency expansion at the city level
  • Nu(x) ? Cu(x)
  • Nu(x): the latency expansion function defined
    on routers.
  • Cu(x): the latency expansion function defined
    on cities.

The Internet link topology has low expansion rate
after being embedded into the two-dimensional
geographical sphere.
21
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for performance on the overlay routing
    layer.
  • Algorithms for performance on the underlying
    network layer.
  • Algorithms for trust at the application layer.
  • Conclusion and future work.

22
Autonomous peers and selfish behaviors
  • Peers have many options to control their own
    participation in an open P2P system.
  • Resource configuration parameters, file
    re-sharing decisions, on-off switch, etc.
  • Peers are prone to selfish behavior when
    there is no incentive to cooperate.
  • Free-riding phenomenon [Adar et al. 2000; Saroiu
    et al. 2001].
  • Most files (e.g., 98% reported in [Adar et al.
    2000]) belong to a small percentage of the users
    (20%, respectively).
  • A majority of the users are "one-time,
    one-hour" users.

23
Building social trust in P2P users
  • Reputation systems [Okita2003]
  • A means of describing social trust networks.
  • A rating system is used to produce a consensus
    about the merit of any given member.
  • Effective at:
  • Incentivizing user cooperation.
  • Isolating malicious users.
  • Adjudging node reliability.
  • On-line examples:
  • LiveJournal, Friendster, eBay, Advogato.

24
Eigenvector-based reputation systems
  • A referential link structure
  • Nodes represent entities (users, merchants,
    authors of blogs).
  • Links represent endorsement of one user by
    another.
  • Eigenvector or stationary distribution based
    rating schemes.
  • HITS [Kleinberg1999], PageRank [Brin et al.
    1998], etc.

25
PageRank [Brin1998]
  • A rating scheme to rank hypertext documents on
    the WWW.
  • An iterative algorithm to calculate the
    importance of a web page based on the importance
    of its parent pages.
  • Can be applied to systems other than the WWW.

26
PageRank random walk model
[Figure: a walker moving over nodes X, Y, and Z along referential links, with transition probabilities such as 1/2 and 1/3.]
  • As time goes on, the expected percentage of steps
    the walker is at each node v converges to the
    PageRank weight PR(v).
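
The stationary weights of this random walk can be approximated by power iteration, as in the sketch below. The uniform reset step (with probability `reset`) follows the usual PageRank formulation and is not spelled out on this slide; the handling of nodes without outgoing links and the three-node example graph are assumptions.

```python
def pagerank(out_links, reset=0.15, iters=100):
    """Power iteration for the random-walk model with a uniform reset.

    out_links maps every node to the list of nodes it endorses.
    With probability `reset` the walker jumps to a uniformly random node;
    otherwise it follows a random outgoing link (assumption: a node with
    no outgoing links also jumps uniformly).
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: reset / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = (1.0 - reset) * pr[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                for t in nodes:
                    nxt[t] += (1.0 - reset) * pr[v] / n
        pr = nxt
    return pr


graph = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
weights = pagerank(graph)
```

In this small graph Z collects endorsements from both X and Y, so it ends up with the largest weight, followed by X (endorsed by Z) and then Y.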

27
P2P rating schemes
  • Design issues
  • distributed rating
  • collusion-proof rating

28
A supervisor-based distributed implementation of
PageRank
  • In a supervisory directed graph
  • Each user v has a designated supervisor to
    calculate PR(v).
  • Any user is random to its supervisors.
  • No small supervising loop exists.
  • There is a fast reactive approach for any user j
    to deliver a message to any other user i's
    supervisors, and the path never includes i.

29
A Chord supervising overlay
[Figure: a Chord network with 8 users and an 8-bit key space; legend: network user, supervising pointer.]
30
PageRank: is it collusion-proof?
  • Can a node easily boost its rank by manipulating
    its out-going (endorsing) links with others?

31
Amp(G): a metric on group collusion
WG(G) PR(i)PR(j)
Win(G)
32
Answer for (11 ?) in PageRank
  • In the original PageRank system,
  • where ? is the resetting probability.

33
Two experimental topologies
  • W, a Web link topology
  • Contains the link structure of upwards of 80
    million URLs.
  • Source: the Stanford WebBase.
  • B, a weblog blogrolling topology
  • Contains the blogrolling structure of upwards of
    72,000 blogs.
  • Source: www.blogstreet.com, the XML-RPC weblog
    service.

34
Experiment: Collusion200
  • Model a small number of web pages simultaneously
    colluding.
  • Methodology
  • 100 colluding groups
  • Each colluding group has a circle topology
    consisting of two nodes with adjacent ranks
  • Arbitrarily chose nodes originally ranked around
    1000th, 2000th, ..., 100000th.
  • Resetting probability: 0.15.

35
Experiment result of Collusion200 (I)
W - Amplification factors of the 100
colluding groups in Collusion200.
36
Experiment result of Collusion200 (III)
W new PR rank after Collusion200.
37
There is a long flat portion
The PR weight distribution of 4 topologies.
38
Next step: how to detect collusions?
  • Theorem on group detection hardness.
  • Max G?G Amp(G) is an NP-hard
    problem.

39
An observation on collusion behaviors
  • To increase their PR weight, i.e., the stationary
    weight in the random walk, the colluding nodes
    will stall the random walk.
  • When the resetting probability increases, the
    colluding nodes must suffer a significant drop in
    PR weight.
  • Therefore, we expect the PR weight of colluding
    nodes to be highly correlated with the average
    walk length (the reciprocal of the resetting
    probability), while that of non-colluding
    nodes is relatively insensitive to changes in
    the resetting probability.

40
Adaptive-resetting scheme
  • Part I: collusion detection
  • Given the topology, calculate the PR vector under
    different resetting probabilities.
  • Resetting probabilities used: 0.0375, 0.05,
    0.075, 0.15, 0.3, 0.45, 0.6 (default 0.15).
  • Calculate the correlation coefficient between the
    curve of each node x's PR weight and the curve of
    the reciprocal of the resetting probability.
    Label it as co-co(x).
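
The detection step can be sketched as below: compute PageRank once per resetting probability, then correlate each node's weights with the reciprocals of those probabilities. The Pearson coefficient, the compact `pagerank` helper (which assumes every node has at least one outgoing link), and the small test topology are assumptions for illustration; only the list of resetting probabilities comes from the slide.

```python
import math

RESETS = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]  # values from the slide


def pagerank(out_links, reset, iters=500):
    """Power iteration; assumes every node has at least one outgoing link."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: reset / n for v in nodes}
        for v in nodes:
            share = (1.0 - reset) * pr[v] / len(out_links[v])
            for t in out_links[v]:
                nxt[t] += share
        pr = nxt
    return pr


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def co_co(out_links, resets=RESETS):
    """Correlate each node's PR curve with the curve of 1/reset."""
    inverse = [1.0 / r for r in resets]
    runs = [pagerank(out_links, r) for r in resets]
    return {v: pearson([run[v] for run in runs], inverse) for v in out_links}


# Assumed toy topology: a 10-node ring plus two nodes that endorse only each other.
graph = {i: [(i + 1) % 10] for i in range(10)}
graph[0] = [1, "c1"]
graph["c1"] = ["c2"]
graph["c2"] = ["c1"]
scores = co_co(graph)
```

In this toy topology the two mutually endorsing nodes trap the walker until a reset occurs, so their scores come out close to 1, while the ring nodes lose weight as walks get longer and score below zero.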

41
Experiment result of Collusion200 (IV)
W - Amplification factors of the 100
colluding groups in Collusion200.
42
Experiment result of Collusion200 (VI)
W new PR rank after Collusion200.
43
Topology analysis on W and B
a small loop of two top nodes in W
a star-like sub-graph in B
44
[Table: new top-25 URL list in W, with entries marked as dropped out, dropping, or new.]
45
Conclusion
  • A set of algorithms for performance and trust in
    P2P systems.
  • Their technical merits were well acknowledged.
  • They have been or will be implemented in real
    applications:
  • Small-world Freenet: source code implemented and
    tested in Freenet.
  • LPRS-Chord in grid computing [Min et al. 2004].
  • Collusion-robust reputation systems for the Weblog
    community.

46
Future work
  • Building dynamic virtual communities with P2P
    techniques.
  • A unified information management infrastructure
    to accommodate and connect diverse virtual
    communities.
  • Web study from an algorithmic perspective:
  • Web ranking.
  • Web community detection.

47
Publications
  • Improving Eigenvector-based Reputation Systems
    Against Collusion. With Ashish Goel, Ramesh
    Govindan, Kahn Mason, and Benjamin Van Roy.
    Invited for Special issue of Journal of Internet
    Mathematics for WAW04, under review.
  • Improving Lookup Latency in Distributed Hash
    Table Systems using Random Sampling. With Ashish
    Goel and Ramesh Govindan. To appear in ACM/IEEE
    Transactions on Networking.
  • An Empirical Evaluation of Internet Latency
    Expansion. With Ashish Goel and Ramesh Govindan.
    To appear in ACM SIGCOMM Computer Communication
    Review.
  • Using the Small-World Model to Improve Freenet
    Performance. With Ashish Goel and Ramesh
    Govindan. Computer Networks Journal. Volume 46,
    Issue 4, Page 555-574, November 2004.
  • Advanced Query Techniques for Wide-Area Network
    Monitoring. With Xin Li, Fang Bian, Christophe
    Diot, Ramesh Govindan, Wei Hong, and Gianluca
    Iannaccone. The first IEEE International Workshop
    on Networking Meets Databases, 2005.
  • Fast, Memory-Efficient Traffic Estimation by
    Coincidence Counting. With Fang Hao,
    Muralidharan S. Kodialam, T. V. Lakshman. IEEE
    INFOCOM 2005.
  • Making Eigenvector-based Reputation Systems
    Robust to Collusion. With Ashish Goel, Ramesh
    Govindan, Kahn Mason, and Benjamin Van Roy. The
    third Workshop on Algorithms and Models for the
    Web Graph, October 2004.
  • The Design of A Distributed Rating Scheme for
    Peer-to-peer Systems. With Debojyoti Dutta,
    Ashish Goel and Ramesh Govindan. The first
    Workshop on Economic Issues in Peer-to-Peer
    Systems, Berkeley, CA (June 5-6, 2003).
  • Incrementally Improving Lookup Latency in
    Distributed Hash Table Systems. With Ashish Goel
    and Ramesh Govindan. Appeared in ACM SIGMETRICS,
    2003.
  • Using the Small-World Model to Improve Freenet
    Performance. With Ashish Goel and Ramesh
    Govindan. Appeared in IEEE INFOCOM, 2002.

48
Thanks!
49
backup
50-62
In a Freenet node
[Animation frames: Node x with its datastore and routing table; later frames add a request for file 70 and Node B.]
63
Latency stretch in frugal routing
(latency for each lookup on the overlay topology) /
(average latency on the underlying topology)
  • In frugal routing, ?(logN) hops per lookup in
    average
  • ?(logN) stretch with no optimization.
  • Could it be done better, e.g., O(1) stretch,
    without much change?

64
Term definition: latency expansion
  • Let Nu(x) denote the number of nodes in the
    network G that are within latency x of node u.
  • Power-law latency expansion: Nu(x) grows (i.e.,
    "expands") proportionally to x^d, for all nodes
    u.
  • Examples: ring (d = 1), mesh (d = 2).
  • Exponential latency expansion: Nu(x)
    grows proportionally to α^x for some constant α >
    1.
  • Examples: random graphs.
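
To make the definition concrete, the sketch below counts Nu(x) by breadth-first search on a ring and on a two-dimensional mesh. Unit link latency, the helper names, and the graph sizes are assumptions for the example.

```python
from collections import deque


def ball_sizes(neighbors, source, max_x):
    """Return [Nu(0), Nu(1), ..., Nu(max_x)] for unit-latency links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_x:
            continue                     # do not explore beyond latency max_x
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    sizes = [0] * (max_x + 1)
    for d in dist.values():
        sizes[d] += 1
    for x in range(1, max_x + 1):
        sizes[x] += sizes[x - 1]         # cumulative: nodes within latency x
    return sizes


def ring_neighbors(n):
    """Neighbor function for a ring of n nodes."""
    return lambda v: ((v - 1) % n, (v + 1) % n)


def mesh_neighbors(v):
    """Neighbor function for an unbounded 2-D grid; max_x keeps the search finite."""
    x, y = v
    return ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))


ring_counts = ball_sizes(ring_neighbors(1000), 0, 10)
mesh_counts = ball_sizes(mesh_neighbors, (0, 0), 10)
```

On the ring the count is 2x + 1, which grows linearly in x (d = 1); on the mesh it is 2x^2 + 2x + 1, which grows quadratically (d = 2).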

65
LPRS-Chord topology with power-law expansion
Ring Stretch
(at time 2logN)
66
LPRS-Chord convergence time
Convergence Time
67
LPRS-Chord on Internet subgraphs
Stretch on the router-level graph (at time 2logN)