ALGORITHMS FOR PERFORMANCE AND TRUST IN PEER-TO-PEER SYSTEMS

1
ALGORITHMS FOR PERFORMANCE AND TRUST IN
PEER-TO-PEER SYSTEMS
Hui Zhang, University of Southern California
2
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for Performance in P2P systems:
    overlay routing layer.
  • Algorithms for Performance in P2P systems:
    underlying network layer.
  • Algorithms for Trust in P2P systems: application
    layer.
  • Conclusion and future work.

3
P2P Networked systems
  • A collaborating group of Internet end-hosts which
    overlay their own special-purpose network atop
    the Internet.
  • Examples:
  • file sharing [Napster, Gnutella]; anonymous
    publishing [Freenet]
  • distributed storage [Dabek2001, Kubiatowicz2000,
    IVY, PAST]
  • web caching [Iyer2002]; DoS attack
    prevention [Keromytis2002]
  • application-layer multicast [Castro2003,
    Ratnasamy2001, Zhuang2001]
  • naming [Cox2002, Balakrishnan2004]; event
    notification [Scribe]
  • indirection services [Stoica2002], etc.

4
What challenges does a P2P system designer face?
  • Allows rapid deployment through self
    organization
  • Scales with increasing network size
  • Adapts to dynamics from both the underlying
    network and the application layer.

5
Three layers in a P2P system design
Application layer
Overlay routing layer
Underlying network layer
6
Algorithmic issues in P2P system design
  • Overlay routing layer: distributed hash table
    algorithms.
  • Underlying network layer: network proximity
    exploitation and Internet topology.
  • Application layer: distributed and robust rating
    algorithms.

7
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for Performance in P2P systems:
    overlay routing layer.
  • Algorithms for Performance in P2P systems:
    underlying network layer.
  • Algorithms for Trust in P2P systems: application
    layer.
  • Conclusion and future work.

8
Distributed Hash Table algorithms
  • Support hash-table-like functionality at
    Internet scale.
  • A global key space: each data item is a key in
    the space, and each node is responsible for a
    portion of the key space.
  • Given a key, map it onto a node.
  • Examples: Freenet, Chord, Pastry, Tapestry,
    Kademlia, Viceroy, Koorde, etc.
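
The "given a key, map it onto a node" step can be sketched as follows. This is a minimal illustration, not the scheme of any particular system listed above: it assumes a 16-bit circular key space, SHA-1 for hashing, and the convention that a key belongs to the first node ID at or after it. The names key_of and responsible_node are hypothetical.

```python
import hashlib
from bisect import bisect_left

KEY_BITS = 16  # illustrative key-space size (assumption)

def key_of(name: str) -> int:
    """Hash an arbitrary name into the global key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** KEY_BITS)

def responsible_node(key: int, node_ids: list[int]) -> int:
    """Return the first node ID at or after `key`, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]

# Eight nodes with IDs drawn from the same hash function as the data keys.
nodes = [key_of(f"node-{i}") for i in range(8)]
owner = responsible_node(key_of("some-file.txt"), nodes)
```

Each node is then responsible for the arc of keys between its predecessor's ID and its own ID, which is the "portion of the key space" mentioned above.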

9
Small-world model [Kleinberg1999]
  • A network between order and randomness:
  • short-distance clustering (like a regular graph)
  • long-distance shortcuts (resulting in a short global
    path length, like a random graph)

[Figure: a one-dimensional small-world network example. Node i has local contacts and a shortcut to node i+j, labeled with probability 1/j; average routing path length log^2(N).]
10
Small-world Freenet
  • Files are identified by binary file keys obtained
    by applying a hash function.
  • Each node maintains a datastore and a routing
    table of <key, pointer> values.
  • Greedy forwarding search with backtracking.
  • Enhanced-clustering cache replacement:
  • Each node chooses a seed randomly when joining
    the network.
  • When a new key (file) u is to be cached, the node
    chooses in the current datastore the key v
    farthest from the seed.
  • If Distance(u, seed) < Distance(v, seed), cache
    u and evict v with probability 1-P (clustering).
  • If Distance(u, seed) > Distance(v, seed), cache
    u and evict v with probability P (randomness).
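
A minimal sketch of the enhanced-clustering replacement rule, under stated assumptions: keys are integers, Distance is the absolute difference, a datastore with free space simply accepts the key, and a tie in distance leaves the datastore unchanged. None of these details is given on the slide, and the function names are hypothetical.

```python
import random

def distance(a: int, b: int) -> int:
    """Key-space distance; the plain absolute difference is an assumption."""
    return abs(a - b)

def maybe_cache(datastore: set[int], capacity: int, seed: int, u: int,
                p: float, rng: random.Random) -> bool:
    """Apply the enhanced-clustering rule; return True if u ends up cached."""
    if u in datastore:
        return True
    if len(datastore) < capacity:      # free space: no eviction needed (assumption)
        datastore.add(u)
        return True
    v = max(datastore, key=lambda k: distance(k, seed))  # key farthest from the seed
    du, dv = distance(u, seed), distance(v, seed)
    if du < dv:
        accept = rng.random() < 1 - p  # clustering: replace with probability 1-P
    elif du > dv:
        accept = rng.random() < p      # randomness: replace with probability P
    else:
        accept = False                 # tie: not specified on the slide
    if accept:
        datastore.remove(v)
        datastore.add(u)
    return accept
```

With a small P, the datastore drifts toward keys near the seed (clustering) while still occasionally admitting a distant key (randomness).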

11
Small-world Freenet: inside the routing table
[Figure panels: random Freenet; regular Freenet; small-world Freenet]
12
Small-world Freenet - performance
[Figures: average hops per successful request vs. workload; hit ratio vs. workload]
13
Small-world Freenet - analysis
  • An idealized small-world network model for
    Freenet.
  • The expected number of steps to deliver a message
    in the network model is O(log N) if the routing table
    size is Θ(log^2 N), where N is the network size.

14
Small-world Freenet routing structure
  • Node x views the rest of the nodes as log(N)
    groups G_0, G_1, ..., G_{log(N)-1}, where G_i consists
    of the nodes whose distance to x is between 2^i and
    2^(i+1); x has one shortcut to each group with
    the same probability.
  • When x has log(N) shortcuts, its routing table
    resembles that of Chord [Stoica et al. 2001] in a
    probabilistic way!
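
The grouping by distance can be sketched as below. The sketch assumes node IDs 0..n-1 on a ring, clockwise distance, and half-open ranges [2^i, 2^(i+1)); these conventions and the function names are illustrative choices, not taken from the slides.

```python
import math
import random

def shortcut_groups(x: int, n: int) -> list[list[int]]:
    """Group the other nodes of an n-node ring by clockwise distance to x:
    group i holds the nodes at distance d with 2^i <= d < 2^(i+1)."""
    groups = [[] for _ in range(math.ceil(math.log2(n)))]
    for d in range(1, n):
        groups[d.bit_length() - 1].append((x + d) % n)
    return groups

def pick_shortcuts(x: int, n: int, rng: random.Random) -> list[int]:
    """Choose one shortcut uniformly at random from every non-empty group."""
    return [rng.choice(group) for group in shortcut_groups(x, n) if group]

shortcuts = pick_shortcuts(3, 16, random.Random(0))
```

A Chord node keeps a deterministic pointer into each such range; here the pointer is a random member of the range, which is the sense in which the table resembles Chord's probabilistically.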

15
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for Performance in P2P systems:
    overlay routing layer.
  • Algorithms for Performance in P2P systems:
    underlying network layer.
  • Algorithms for Trust in P2P systems: application
    layer.
  • Conclusion and future work.

16
Frugal routing
  • A Distributed Hash Table routing scheme is frugal
    if:
  • The search space for the key decreases by a
    constant factor after each lookup hop.
  • The next hop after each intermediate node x in
    the route depends only on x and the destination.
  • Examples: small-world Freenet, Chord, Pastry,
    Tapestry.

17
Chord key space
[Figure: a Chord network with 8 nodes and an 8-bit key space. Legend: network node.]
18
Chord routing table setup
[Figure: a Chord network with 8 nodes and an 8-bit key space (keys 0 to 255). Legend: network node, pointer.]
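
As a rough illustration of the routing table setup, the sketch below builds Chord-style pointers for 8 nodes in an 8-bit key space and routes greedily. The evenly spaced node IDs are an assumption (the figure's actual IDs are not recoverable), and the helper names are hypothetical.

```python
from bisect import bisect_left

M = 8  # key-space bits, matching the 8-bit example

def successor(key, ring):
    """First node at or after key on the sorted ring, with wrap-around."""
    return ring[bisect_left(ring, key % 2 ** M) % len(ring)]

def finger_table(x, ring):
    """Pointer i of node x targets the successor of x + 2^i."""
    return [successor(x + 2 ** i, ring) for i in range(M)]

def lookup_path(start, key, ring):
    """Greedy routing: hop to the pointer that most closely precedes the key."""
    target = successor(key, ring)
    path, cur = [start], start
    while cur != target:
        gap = (key - cur) % 2 ** M
        # Pointers that lie clockwise between cur and the key (no overshoot).
        ahead = [f for f in finger_table(cur, ring)
                 if 0 < (f - cur) % 2 ** M <= gap]
        # If no pointer precedes the key, the key's owner is cur's successor.
        cur = max(ahead, key=lambda f: (f - cur) % 2 ** M) if ahead else target
        path.append(cur)
    return path

ring = [0, 32, 64, 96, 128, 160, 192, 224]  # assumed node IDs
path = lookup_path(0, 200, ring)
```

Each hop at least halves the remaining clockwise distance to the key, which is the "constant factor" reduction that makes the scheme frugal.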
19
Large routing latency in Chord
[Figure: a Chord network with 8 nodes and an 8-bit key space (keys 0 to 255). Legend: network node, overlay routing, physical link.]
20
Latency stretch in frugal routing
Latency stretch = (latency of each lookup on the overlay topology) /
(average latency on the underlying topology)
  • In frugal routing, Θ(log N) hops per lookup on
    average.
  • Θ(log N) stretch with no optimization.
  • Could it be done better, e.g., O(1) stretch,
    without much change?
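
The stretch ratio can be computed as in the sketch below. It assumes a symmetric latency function and takes the "average latency on the underlying topology" to be the mean over all node pairs; both are illustrative assumptions, as are the function names.

```python
from itertools import combinations
from statistics import mean

def path_latency(path, latency):
    """Total latency of an overlay route given as a list of nodes."""
    return sum(latency(a, b) for a, b in zip(path, path[1:]))

def latency_stretch(path, nodes, latency):
    """Overlay route latency divided by the average pairwise latency."""
    avg_direct = mean(latency(a, b) for a, b in combinations(nodes, 2))
    return path_latency(path, latency) / avg_direct

# Toy underlay: four nodes on a line, latency equal to the distance between them.
line_latency = lambda a, b: abs(a - b)
stretch = latency_stretch([0, 3, 1, 2], [0, 1, 2, 3], line_latency)
```

In the toy case the route 0, 3, 1, 2 costs 6 against an average pairwise latency of 10/6, giving a stretch of 3.6.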

21
Term definition: latency expansion
  • Let N_u(x) denote the number of nodes in the
    network G that are within latency x of node u.
  • Power-law latency expansion: N_u(x) grows (i.e.,
    "expands") proportionally to x^d, for all nodes
    u.
  • Examples: ring (d = 1), mesh (d = 2).
  • Exponential latency expansion: N_u(x)
    grows proportionally to α^x for some constant α >
    1.
  • Examples: random graphs.
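
A small sketch of N_u(x) on the two power-law examples. It assumes unit-latency links, a 100-node ring, and a 41 x 41 mesh with Manhattan distance, and it counts u itself; all of these are illustrative choices.

```python
def ball_size(u, x, nodes, latency):
    """N_u(x): number of nodes within latency x of u (u itself included)."""
    return sum(1 for v in nodes if latency(u, v) <= x)

def ring_latency(n):
    """Hop distance between two positions on an n-node ring."""
    return lambda a, b: min(abs(a - b), n - abs(a - b))

def grid_latency(a, b):
    """Manhattan distance between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

ring = list(range(100))
grid = [(i, j) for i in range(41) for j in range(41)]
ring_counts = [ball_size(0, x, ring, ring_latency(100)) for x in (5, 10)]
grid_counts = [ball_size((20, 20), x, grid, grid_latency) for x in (5, 10)]
```

Doubling x roughly doubles the count on the ring (11 to 21, d = 1) and roughly quadruples it on the mesh (61 to 221, d = 2).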

22
Latency expansion vs. latency stretch
  • Theorem 1
  • If the underlying topology G is drawn from
    a family of graphs with exponential latency
    expansion, then the expected latency of Chord is
    Θ(L log N), where L is the expected latency
    between pairs of nodes in G.
  • Theorem 2
  • If
  • (1) the underlying topology G is drawn from a
    family of graphs with d-power-law latency
    expansion, and
  • (2) each node u in the Chord network
    samples (log N)^d nodes in each range uniformly
    at random and keeps a pointer to the nearest
    node for future routing,
  • then the expected latency of a request is
    O(L), where L is the expected latency between
    pairs of nodes in G.
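
Condition (2) of Theorem 2 can be sketched for a single range as follows. The base-2 logarithm, the rounding of (log N)^d, and the function name are assumptions made for illustration; the theorem only fixes the order of the sample size.

```python
import math
import random

def nearest_sampled_pointer(u, candidates, latency, n, d, rng):
    """Sample about (log N)^d nodes of one range uniformly at random
    and keep the one with the lowest latency to u."""
    k = min(len(candidates), max(1, math.ceil(math.log2(n) ** d)))
    sample = rng.sample(candidates, k)
    return min(sample, key=lambda v: latency(u, v))

# Toy setting: the range holds nodes 100..199 and latency is |a - b|.
candidates = list(range(100, 200))
toy_latency = lambda a, b: abs(a - b)
pointer = nearest_sampled_pointer(0, candidates, toy_latency, 1024, 1, random.Random(1))
```

With N = 1024 and d = 1 the node inspects 10 candidates; with d = 2 it would inspect 100, which in this toy range is every candidate.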

23
An optimal proximity-aware routing
[Figure: a Chord network with an m-bit key space (keys 0 to 2^m - 1). Legend: network node, pointer, log^d(N) samples, distance measurement.]
24
An optimal proximity-aware routing
[Figure: a Chord network with an m-bit key space (keys 0 to 2^m - 1). Legend: network node, pointer, distance measurement.]
25
Two remaining questions
  • How does each node efficiently obtain (log N)^d
    samples from each range?
  • Do real networks have the power-law latency
    expansion characteristic?

26
Uniform sampling in terms of ranges
  • Node x: the node at hop x. Node 0: the request
    initiator. Node t: the request terminator.
  • Along the routing path, node t is in range-d of
    node 0, in range-x (x < d) of node 1, and in
    range-y (y < x) of node 2.
  • For a routing request with Θ(log N) hops, the final
    node t will be a random node in Θ(log N)
    different ranges.
27
Lookup-Parasitic Random Sampling
1. Recursive lookup.
2. Each intermediate hop appends its IP address to the lookup message.
3. When the lookup reaches its target, the target informs each listed hop of its identity.
4. Each intermediate hop then sends one (or a small number) of pings to get a reasonable estimate of the latency to the target, and updates its routing table accordingly.
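
Steps 2 to 4 can be sketched as a single update pass, assuming an 8-bit key space, routing tables stored as one pointer per range, and a latency function standing in for the pings. The table layout and the names lprs_update and range_of are hypothetical.

```python
M = 8  # key-space bits (assumption)

def range_of(x, t):
    """Index i of the range of x that holds t: 2^i <= clockwise distance < 2^(i+1)."""
    return ((t - x) % 2 ** M).bit_length() - 1

def lprs_update(path, target, latency, routing_tables):
    """After a lookup along `path` ends at `target`, every recorded hop
    considers `target` as the pointer for the range it falls into."""
    for hop in path:                     # hops recorded in the lookup message
        if hop == target:
            continue
        r = range_of(hop, target)
        measured = latency(hop, target)  # stand-in for one or a few pings
        table = routing_tables.setdefault(hop, {})
        best = table.get(r)
        if best is None or measured < latency(hop, best):
            table[r] = target            # keep the nearest node seen so far
    return routing_tables

toy_latency = lambda a, b: abs(a - b)    # toy latency model
tables = lprs_update([0, 128, 192], 224, toy_latency, {0: {7: 130}})
```

Because every lookup yields one sample per hop, the sampling rides on normal traffic instead of requiring dedicated probe lookups.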
28
LPRS-Chord convergence time
[Figure: convergence time]
29
LPRS-Chord topology with power-law expansion
[Figure: stretch on a ring topology at time 2 log N]
30
Measurement on Internet latency expansion
  • The performance of many other networking
    algorithms relies on the latency expansion
    characteristic of the underlying network.
  • Request latency reduction in web cache systems
    [Plaxton et al. 1997].
  • Nearest neighbor search [Karger et al. 2002].
  • Locality-aware DHT design [Abraham et al. 2004].
  • Gossip-based communication mechanisms [Kempe et
    al. 2002].

31
Internet latency expansion measurement
methodology
  • Collected two router-level topologies:
    one in May 2002 with 328,378 routers, and the
    other in November 2003 with 356,648 routers.
  • Randomly sampled about 100,000 node pairs from
    each topology and calculated their latency to
    estimate Internet latency expansion.
  • Approximated the latency between any two nodes
    by the accumulated geographic distance of the
    shortest-path route between them;
    geo-locations were assigned to nodes using the
    Geotrack tool.

32
Internet router-level topology latency expansion
[Figures: Internet latency expansion (log-log); comparison of the two topologies (log-log)]
33
LPRS-Chord Internet subgraphs
[Figure: stretch on the router-level graph at time 2 log N]
34
Outline
  • Introduction to Peer-to-Peer (P2P) systems: three
    layers in system design.
  • Algorithms for Performance in P2P systems:
    overlay routing layer.
  • Algorithms for Performance in P2P systems:
    underlying network layer.
  • Algorithms for Trust in P2P systems: application
    layer.
  • Conclusion and future work.

35
Building social trust in P2P users
  • Reputation systems are effective at
  • 1. Incentivizing user participation.
  • Free-riding phenomenon [Adar et al. 2000; Saroiu
    et al. 2001].
  • 2. Isolating malicious users.
  • Propagation of viruses or inauthentic files
    [VBS.Gnutella].

36
Eigenvector-based reputation systems
  • A referential link structure:
  • Nodes represent entities (users, merchants,
    authors of blogs).
  • Links represent endorsement of one user by
    another.
  • Eigenvector or stationary distribution based
    rating schemes:
  • HITS [Kleinberg1999], PageRank [Brin et al.
    1998].

37
PageRank [Brin1998]
  • A rating scheme to rank hypertext documents on
    the WWW.
  • An iterative algorithm to calculate the
    importance of a web page based on the importance
    of its parent pages.
  • Can be applied to systems other than the WWW.

38
PageRank random walk model
[Figure: a walker moving over nodes X, Y, and Z along referential links; edge labels 1/2 and 1/3. Legend: node, referential link, the walker.]
  • As time goes on, the expected fraction of steps
    the walker spends at each node v converges to the
    PageRank weight PR(v).
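
The stationary weights of such a walk can be approximated by power iteration, as in the minimal sketch below. It assumes a resetting probability eps with a uniform restart, treats a node without out-links as jumping uniformly, and uses a three-node graph that is illustrative only (it is not the graph from the figure).

```python
def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration for a random walk that resets with probability eps.
    `out_links` maps every node to the list of nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / n for v in nodes}       # mass arriving through resets
        for u in nodes:
            targets = out_links[u] or nodes     # no out-links: jump uniformly (assumption)
            share = (1 - eps) * pr[u] / len(targets)
            for v in targets:
                nxt[v] += share                 # mass arriving along links
        pr = nxt
    return pr

graph = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
weights = pagerank(graph)
```

In this toy graph Z collects weight from both X and Y, so it ends up with a larger weight than Y, which is fed only by half of X's weight.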

39
PageRank: is it collusion-proof?
  • Can a node easily boost its rank by manipulating
    its out-going links with others?

40
Amp(G): a metric on group collusion
[Figure and formula residue: W_G(G) = PR(i) + PR(j); W_in(G)]
41
Answer for (11 ?) in PageRank
  • In the original PageRank system,
  • where ε is the resetting probability.

42
Two experimental topologies
  • W, a Web link topology
  • Contains the link structure of upwards of 80
    million URLs.
  • Source: the Stanford WebBase.
  • B, a weblog blogrolling topology
  • Contains the blogrolling structure of upwards of
    72,000 blogs.
  • Source: www.blogstreet.com, the XML-RPC weblog
    service.

43
Experiment 1: Collusion200
  • Model a small number of web pages simultaneously
    colluding.
  • Methodology:
  • 100 colluding groups.
  • Each colluding group has the circle topology
    consisting of two nodes with adjacent ranks.
  • Arbitrarily chose nodes originally ranked around
    1000th, 2000th, ..., 100000th.
  • ε = 0.15.

44
Experiment result of Collusion200 (I)
W - Amplification factors of the 100
colluding groups in Collusion200.
45
Experiment result of Collusion200 (III)
W: new PR rank after Collusion200.
46
There is a long flat portion
The PR weight distribution of 4 topologies.
47
Next step: how to detect collusions?
  • Identifying colluding groups is unlikely to be
    computationally tractable.
  • The densest k-subgraph problem [Feige et al.
    1997].
  • The classical CLIQUE problem.
  • The problem of finding large hidden cliques in
    random graphs [Juels 1998].

48
Hardness on Amp
  • Theorem on hardness:
  • Finding max over G' ⊆ G of Amp(G') is an NP-hard
    problem.

49
An observation on collusion behaviors
  • To increase their PR weight, i.e., the stationary
    weight in the random walk, the colluding nodes
    will stall the random walk.
  • When the resetting probability ε increases, the
    colluding nodes must suffer a significant drop in
    PR weight.
  • Therefore, we expect the PR weight of colluding
    nodes to be highly correlated with 1/ε (the
    average walk length), while that of non-colluding
    nodes is relatively insensitive to the change in
    ε.

50
An intuitive example
[Figure legend: node, referential link]
51
An intuitive example
[Figure legend: node, referential link, a colluding group]
52
An intuitive example
[Figure legend: node, referential link, a colluding group]
53
Co-co distribution in real-world graphs
The co-co PDF distribution in W and B; the [0, 0.1] range actually corresponds to the [-1, 0.1] range.
54
Adaptive-resetting scheme
  • Part I: collusion detection
  • Given the topology, calculate the PR vector under
    different ε values.
  • ε = 0.0375, 0.05, 0.075, 0.15, 0.3, 0.45,
    0.6; default ε = 0.15.
  • Calculate the correlation coefficient between the
    curve of each node x's PR weight and the curve of
    1/ε. Label it as co-co(x).
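
The detection step can be sketched as below, assuming the Pearson correlation coefficient, a simple power-iteration PageRank with uniform restart, and a five-node toy graph; the graph and all function names are hypothetical and not taken from the experiments.

```python
from math import sqrt

EPSILONS = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]  # values listed on the slide

def pagerank(out_links, eps, iters=500):
    """Power iteration; a node with no out-links jumps uniformly (assumption)."""
    nodes = list(out_links)
    pr = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / len(nodes) for v in nodes}
        for u in nodes:
            targets = out_links[u] or nodes
            for v in targets:
                nxt[v] += (1 - eps) * pr[u] / len(targets)
        pr = nxt
    return pr

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def co_co(out_links):
    """co-co(x): correlation between x's PR weight curve and the 1/eps curve."""
    runs = [pagerank(out_links, eps) for eps in EPSILONS]
    inverse_eps = [1.0 / eps for eps in EPSILONS]
    return {v: pearson([run[v] for run in runs], inverse_eps) for v in out_links}

# Toy graph: O1 -> O2 -> O3 -> O1 is an ordinary cycle; C1 <-> C2 link only to each other.
graph = {"O1": ["O2", "C1"], "O2": ["O3"], "O3": ["O1"],
         "C1": ["C2"], "C2": ["C1"]}
scores = co_co(graph)
```

In the toy graph the two mutually linked nodes gain weight as 1/ε grows and so get a positive co-co, while the nodes of the ordinary cycle lose weight and get a negative one. On a graph this small every node is sensitive to ε, so only the sign is meaningful here.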

55
Experiment result of Collusion200 (IV)
W - Amplification factors of the 100
colluding groups in Collusion200.
56
Experiment result of Collusion200 (V)
W: new PR weight after Collusion200.
57
Experiment result of Collusion200 (VI)
W: new PR rank after Collusion200.
58
Experiment 2: Collusion22
  • Model various colluding subgraphs.
  • Methodology:
  • 3 colluding groups:
  • G1: 10-node ring
  • G2: 10-node star topology
  • G3: 2-node ring

[Figure legend: node, referential link]
59
Experiment result of Collusion22 (I)
Amplification factors of the 3 colluding groups
in Collusion22.
60
Experiment result of Collusion22 (II)
W: new PR weight after Collusion22.
61
New top-25 URL list in W
[Table legend: dropped out, dropping, new]
62
Conclusion
  • A set of algorithms for performance and trust in
    P2P systems.
  • Applicable to other fields.

63
Future work
  • Integrating P2P networking and web techniques.
  • Dynamic communities
  • Web study from algorithmic perspective.
  • Web Spamming.
  • Community detection.

64
Backup slides
65
Reputation systems [Okita2003]
  • A means of describing social trust networks.
  • The basic concept is a democratic meritocracy.
  • A rating system is used to evaluate individual
    members, and those results are then collated to
    produce a consensus about the merit of any given
    member.
  • Examples: Livejournal, Friendster, eBay, Advogato

66
PageRank algorithm [Brin1998]
  • Assume N pages.
  • Assign all pages the initial value 1/N.
  • Let N_u be the out-degree of page u, Rank(v) the
    importance of page v, and B_v the set of pages
    pointing to v.
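
The update rule that uses these symbols does not appear in the transcript. A commonly used form of the PageRank iteration with a resetting probability is shown below as an assumption, written with the symbols defined above and with ε taken to be the resetting probability mentioned elsewhere in the talk; the formula on the original slide may differ in normalization.

```latex
\mathrm{Rank}(v) \;=\; \frac{\varepsilon}{N} \;+\; (1-\varepsilon) \sum_{u \in B_v} \frac{\mathrm{Rank}(u)}{N_u}
```

Starting from Rank(v) = 1/N for all pages, the rule is applied repeatedly until the values stop changing.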

67
Experiment result of Collusion200 (II)
Figure A: W, new PR weight after Collusion200.
68
Experiment result of Collusion200 (VII)
Figure B: B, new PR rank after Collusion200.
69
Experiment result of Collusion200 (X)
Figure C: B, new PR weight after Collusion200.
70
Correlation coefficient
71
Experiment result of Collusion22 (III)
Figure D: W, new PR rank after Collusion22.
72
How about using finer statistics of the random
walk?
  • The revisit intervals of the random walk on a
    colluding node will likely have a large
    variance compared to their expectation.