ALGORITHMS FOR PERFORMANCE AND TRUST IN PEER-TO-PEER SYSTEMS

Hui Zhang, University of Southern California

Outline

- Introduction to Peer-to-Peer (P2P) systems: three layers in system design.
- Algorithms for Performance in P2P systems: overlay routing layer.
- Algorithms for Performance in P2P systems: underlying network layer.
- Algorithms for Trust in P2P systems: application layer.
- Conclusion and future work.

P2P Networked systems

- A collaborating group of Internet end-hosts that overlay their own special-purpose network atop the Internet.
- Examples
  - file sharing: Napster, Gnutella
  - anonymous publishing: Freenet
  - distributed storage: [Dabek2001, Kubiatowicz2000], IVY, PAST
  - web caching [Iyer2002]
  - DoS attack prevention [Keromytis2002]
  - application-layer multicast [Castro2003, Ratnasamy2001, Zhuang2001]
  - naming [Cox2002, Balakrishnan2004]
  - event notification: Scribe
  - indirection services [Stoica2002], etc.

What challenges does a P2P system designer face?

- Allows rapid deployment through self-organization
- Scales with increasing network size
- Adapts to dynamics from both the underlying network and the application layer.

Three layers in a P2P system design

Application layer

Overlay routing layer

Underlying network layer

Algorithmic issues in P2P system design

- Overlay routing layer
  - Distributed hash table algorithms.
- Underlying network layer
  - Network proximity exploitation and Internet topology.
- Application layer
  - Distributed and robust rating algorithms.

Distributed Hash Table algorithms

- Support a hash table-like functionality at Internet-like scale.
- A global key space: each data item is a key in the space, and each node is responsible for a portion of the key space.
- Given a key, map it onto a node.
- Examples: Freenet, Chord, Pastry, Tapestry, Kademlia, Viceroy, Koorde, etc.
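
The "given a key, map it onto a node" step can be sketched as follows. This is a minimal illustration, not any particular system's scheme: it assumes a ring-shaped key space where a key belongs to the first node ID at or after it (the rule Chord uses), and the names `KEY_BITS`, `key_of`, and `responsible_node` are hypothetical.

```python
import hashlib
from bisect import bisect_left

KEY_BITS = 16  # size of the illustrative key space (an assumption)


def key_of(name: str) -> int:
    """Hash a name (file or node label) into the global key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << KEY_BITS)


def responsible_node(node_ids, key: int) -> int:
    """Return the first node ID at or after `key`, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key) % len(ring)]


# Eight nodes and one data item, all hashed into the same key space.
nodes = [key_of(f"node-{i}") for i in range(8)]
owner = responsible_node(nodes, key_of("some-file.txt"))
```

Because nodes and data items share one key space, a node joining or leaving only changes ownership of the keys adjacent to it on the ring.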

Small-world model [Kleinberg1999]

- A network between order and randomness
  - short-distance clustering (like a regular graph)
  - long-distance shortcuts (resulting in a short global path length, like a random graph)

[Figure: a one-dimensional small-world network example. Each node i has local contacts plus a shortcut to node i+j, chosen with probability proportional to 1/j; the average routing path length is $\log^2(N)$.]

Small-world Freenet

- Files are identified by binary file keys obtained by applying a hash function.
- Each node maintains a datastore and a routing table of <key, pointer> values.
- Greedy forwarding search with backtracking.
- Enhanced-clustering cache replacement
  - Each node chooses a seed randomly when joining the network.
  - When a new key (file) u is to be cached, the node chooses the key v in its current datastore that is farthest from the seed.
  - If Distance(u, seed) < Distance(v, seed), cache u and evict v with probability 1-P (clustering).
  - If Distance(u, seed) > Distance(v, seed), cache u and evict v with probability P (randomness).
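
The enhanced-clustering replacement rule can be sketched as below. It is a sketch under stated assumptions: keys are integers on a circular key space, `distance` is a circular distance (the slides do not define the metric), ties keep the current contents (the slides do not cover them), and the function and parameter names are hypothetical.

```python
import random


def distance(a: int, b: int, key_space: int) -> int:
    """Circular distance between two keys (an assumed metric)."""
    d = abs(a - b) % key_space
    return min(d, key_space - d)


def maybe_cache(datastore: set, u: int, seed: int, p: float,
                key_space: int, rng: random.Random) -> bool:
    """Apply the replacement rule to a full datastore; True if u was cached."""
    # v is the stored key farthest from this node's seed.
    v = max(datastore, key=lambda k: distance(k, seed, key_space))
    du = distance(u, seed, key_space)
    dv = distance(v, seed, key_space)
    if du < dv:
        accept = rng.random() < 1 - p   # clustering move
    elif du > dv:
        accept = rng.random() < p       # randomness move
    else:
        accept = False                  # tie: not specified, keep v (assumption)
    if accept:
        datastore.remove(v)
        datastore.add(u)
    return accept


store = {10, 50, 90}
cached = maybe_cache(store, 5, seed=0, p=0.0, key_space=256, rng=random.Random(1))
```

With P = 0 the datastore only ever moves closer to the seed; a small positive P keeps a few far-away keys, which act as the long-distance shortcuts of the small-world model.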

Small-world Freenet: inside the routing table

[Figure: routing table contents for three variants: random Freenet, regular Freenet, and small-world Freenet.]

Small-world Freenet - performance

[Figure: two plots, average hops per successful request vs. workload and hit ratio vs. workload.]

Small-world Freenet - analysis

- An idealized small-world network model for Freenet.
- The expected number of steps to deliver a message in the network model is $O(\log N)$ if the routing table size is $\Theta(\log^2 N)$, where N is the network size.

Small-world Freenet routing structure

- When x views the rest of the nodes as $\log(n)$ groups $G_0, G_1, \ldots, G_{\log(n)-1}$, where $G_i$ consists of the nodes whose distance to x is between $2^i$ and $2^{i+1}$, x will have one shortcut to each group with the same probability.
- When x has $\log(N)$ shortcuts, its routing table resembles that of Chord [Stoica et al. 2001] in a probabilistic way!
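
A short calculation shows why the groups are (nearly) equally likely. It assumes the one-dimensional small-world model in which a shortcut to a node at distance $d$ is chosen with probability proportional to $1/d$; the normalizing constant $H$ is notation introduced here.

```latex
% Probability that one shortcut of x lands in group G_i,
% assuming Pr[shortcut at distance d] = (1/d) / H with H = \sum_{d=1}^{n-1} 1/d = \Theta(\log n).
\[
\Pr[\text{shortcut} \in G_i] \;=\; \frac{1}{H} \sum_{d=2^i}^{2^{i+1}-1} \frac{1}{d}
\]
% The sum has 2^i terms, each between 1/2^{i+1} and 1/2^i:
\[
\frac{1}{2} \;=\; 2^i \cdot \frac{1}{2^{i+1}} \;<\; \sum_{d=2^i}^{2^{i+1}-1} \frac{1}{d} \;\le\; 2^i \cdot \frac{1}{2^i} \;=\; 1
\]
% Hence every group receives probability \Theta(1/\log n), independent of i.
```

So, under this assumption, all groups are hit with the same probability up to a factor of two, which is what makes the shortcut set look like a randomized Chord finger table.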

Frugal routing

- A Distributed Hash Table routing scheme is frugal if
  - the search space for the key decreases by a constant factor after each lookup hop, and
  - the next hop after each intermediate node x in the route depends only on x and the destination.
- Examples: small-world Freenet, Chord, Pastry, Tapestry.
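
Both frugal properties can be seen in a small Chord-style simulation. This is a simplified sketch, not the Chord protocol itself: it has global knowledge of the ring, pointers are recomputed on demand, and the names `successor`, `fingers`, `cw`, and `lookup` are hypothetical. The 8-node, 8-bit setup mirrors the example network in the slides.

```python
from bisect import bisect_left

M = 8            # bits in the key space
SPACE = 1 << M   # keys run from 0 to 255


def successor(ring, key):
    """First node at or after `key` on the sorted ring, with wrap-around."""
    return ring[bisect_left(ring, key % SPACE) % len(ring)]


def fingers(ring, n):
    """Chord-style pointers of node n: successor of n + 2^i for i = 0..M-1."""
    return [successor(ring, n + (1 << i)) for i in range(M)]


def cw(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % SPACE


def lookup(ring, start, key):
    """Route greedily towards the node responsible for `key`."""
    target = successor(ring, key)
    node, path = start, [start]
    while node != target:
        # The next hop depends only on the current node and the destination:
        # take the pointer that gets closest to the target without passing it.
        candidates = [f for f in fingers(ring, node)
                      if 0 < cw(node, f) <= cw(node, target)]
        node = max(candidates, key=lambda f: cw(path[-1], f))
        path.append(node)
    return target, path


ring = [0, 32, 64, 96, 128, 160, 192, 224]
target, path = lookup(ring, 0, 200)
```

In this run the remaining clockwise distance to the target shrinks from 224 to 96 to 32 to 0, so the search space is at least halved at every hop.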

Chord key space

[Figure: a Chord network with 8 nodes and an 8-bit key space.]

Chord routing table setup

[Figure: the pointers of one network node across the key space 0 to 255, in a Chord network with 8 nodes and an 8-bit key space.]

Large routing latency in Chord

[Figure: an overlay routing path and the physical links beneath it, in a Chord network with 8 nodes and an 8-bit key space (keys 0 to 255).]

Latency stretch in frugal routing

Stretch = (latency for each lookup on the overlay topology) / (average latency on the underlying topology)

- In frugal routing, $\Theta(\log N)$ hops per lookup on average.
- $\Theta(\log N)$ stretch with no optimization.
- Could it be done better, e.g., $O(1)$ stretch, without much change?

Term definition: latency expansion

- Let $N_u(x)$ denote the number of nodes in the network G that are within latency x of node u.
- Power-law latency expansion: $N_u(x)$ grows (i.e., "expands") proportionally to $x^d$, for all nodes u.
  - Examples: ring ($d = 1$), mesh ($d = 2$).
- Exponential latency expansion: $N_u(x)$ grows proportionally to $\alpha^x$ for some constant $\alpha > 1$.
  - Examples: random graphs.
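
The ring and mesh examples can be checked numerically. The sketch below assumes unit-latency links and hop-count (Manhattan) distance; `ring_ball`, `mesh_ball`, and `growth_exponent` are hypothetical helper names, and the exponent is estimated as the slope of $\log N_u(x)$ against $\log x$.

```python
import math


def ring_ball(n: int, x: int) -> int:
    """N_u(x) on an n-node ring with unit-latency links."""
    return min(n, 2 * x + 1)


def mesh_ball(side: int, x: int) -> int:
    """N_u(x) for the centre node of a side-by-side mesh (Manhattan distance)."""
    c = side // 2
    return sum(1 for i in range(side) for j in range(side)
               if abs(i - c) + abs(j - c) <= x)


def growth_exponent(ball, x1: int, x2: int) -> float:
    """Slope of log N_u(x) versus log x between two radii: an estimate of d."""
    return (math.log(ball(x2)) - math.log(ball(x1))) / (math.log(x2) - math.log(x1))


d_ring = growth_exponent(lambda x: ring_ball(10 ** 6, x), 50, 100)
d_mesh = growth_exponent(lambda x: mesh_ball(401, x), 50, 100)
```

The estimates come out close to 1 for the ring and close to 2 for the mesh. The same slope, read off a log-log plot, is one way to test a measured topology for power-law expansion.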

Latency expansion vs. latency stretch

- Theorem 1
  - If the underlying topology G is drawn from a family of graphs with exponential latency expansion, then the expected latency of Chord is $\Theta(L \log N)$, where L is the expected latency between pairs of nodes in G.

- Theorem 2
  - If
    - (1) the underlying topology G is drawn from a family of graphs with d-power-law latency expansion, and
    - (2) each node u in the Chord network samples $(\log N)^d$ nodes in each range uniformly at random and keeps the pointer to the nearest node for future routing,
  - then the expected latency of a request is $O(L)$, where L is the expected latency between pairs of nodes in G.

An optimal proximity-aware routing

[Figure: a Chord network with an m-bit key space (keys 0 to $2^m - 1$). A network node takes $\log^d(N)$ samples in a range, measures its distance to them, and sets its pointer.]

An optimal proximity-aware routing (continued)

[Figure: the same Chord network, showing the network node, its pointer, and the distance measurement.]

Two remaining questions

- How does each node efficiently obtain $(\log N)^d$ samples from each range?
- Do real networks have the power-law latency expansion characteristic?

Uniform sampling in terms of ranges

[Figure: a routing path from node 0 to node t.]

- Node x: the node at hop x
- Node 0: the request initiator; node t is in its range-d
- Node 1: node t is in its range-x (< d)
- Node 2: node t is in its range-y (< x)
- Node t: the request terminator

For a routing request with $\Theta(\log N)$ hops, the final node t will be a random node in $\Theta(\log N)$ different ranges.

Lookup-Parasitic Random Sampling (LPRS)

1. Recursive lookup.
2. Each intermediate hop appends its IP address to the lookup message.
3. When the lookup reaches its target, the target informs each listed hop of its identity.
4. Each intermediate hop then sends one ping (or a small number of pings) to get a reasonable estimate of the latency to the target, and updates its routing table accordingly.
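
Steps 2 to 4 can be sketched as a single update pass after a lookup completes. Everything here is illustrative: the routing table layout (one pointer per range, keyed by range index), the `latency` callback standing in for the ping measurement, and the names `range_index` and `lprs_update` are assumptions rather than the authors' implementation.

```python
def range_index(node: int, other: int, bits: int) -> int:
    """Index i such that `other` lies in [node + 2^i, node + 2^(i+1)) on the ring."""
    gap = (other - node) % (1 << bits)
    if gap == 0:
        raise ValueError("a node has no range for itself")
    return gap.bit_length() - 1


def lprs_update(tables, path, target, latency, bits):
    """Let every hop on `path` consider `target` as the pointer for its range.

    tables:  {node: {range index: pointer}}
    path:    hops recorded in the lookup message (step 2)
    latency: callable (a, b) -> measured latency, standing in for pings (step 4)
    """
    for hop in path:
        if hop == target:
            continue
        i = range_index(hop, target, bits)
        current = tables[hop].get(i)
        # Keep whichever candidate for this range is nearer in latency.
        if current is None or latency(hop, target) < latency(hop, current):
            tables[hop][i] = target


# Hypothetical measurements: node 0 currently points at 40 for range 5.
measured = {(0, 40): 30.0, (0, 50): 10.0, (128, 50): 5.0}
tables = {0: {5: 40}, 128: {}}
lprs_update(tables, [0, 128], 50, lambda a, b: measured[(a, b)], bits=8)
```

The sampling is "parasitic" because it reuses lookups that happen anyway: each completed lookup hands every hop on the path one more candidate for one of its ranges.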

LPRS-Chord convergence time

[Figure: convergence time.]

LPRS-Chord topology with power-law expansion

[Figure: stretch on the ring topology (at time 2 log N).]

Measurement on Internet latency expansion

- The performance of many other networking algorithms relies on the latency expansion characteristic of the underlying network.
  - Request latency reduction in web cache systems [Plaxton et al. 1997].
  - Nearest neighbor search [Karger et al. 2002].
  - Locality-aware DHT design [Abraham et al. 2004].
  - Gossip-based communication mechanisms [Kempe et al. 2002].

Internet latency expansion measurement methodology

- Collected two router-level topologies:
  - one in May 2002 with 328,378 routers, and the other in November 2003 with 356,648 routers.
- Randomly sampled about 100,000 node pairs from each topology and calculated their latency to estimate Internet latency expansion.
- Approximated the latency between any two nodes by the accumulated geographic distance of the path between them under shortest-path routing.
  - Geo-locations were assigned to nodes using the Geotrack tool.

Internet router-level topology latency expansion

[Figures: Internet latency expansion (log-log scale); comparison of the two topologies (log-log scale).]

LPRS-Chord: Internet subgraphs

[Figure: stretch on the router-level graph (at time 2 log N).]

Building social trust in P2P users

- Reputation systems are effective at
  - 1. Incentivizing user participation.
    - Free-riding phenomenon [Adar et al. 2000; Saroiu et al. 2001].
  - 2. Isolating malicious users.
    - Propagation of viruses or inauthentic files (VBS.Gnutella).

Eigenvector-based reputation systems

- A referential link structure
  - Nodes represent entities (users, merchants, authors of blogs).
  - Links represent endorsement of one user by another.
- Eigenvector or stationary distribution based rating schemes.
  - HITS [Kleinberg1999], PageRank [Brin et al. 1998].

PageRank [Brin1998]

- A rating scheme to rank hypertext documents on the WWW.
- An iterative algorithm to calculate the importance of a web page based on the importance of its parent pages.
- Can be applied to systems other than the WWW.

PageRank random walk model

[Figure: a walker moving over nodes X, Y, and Z along referential links, with transition probabilities 1/2 and 1/3 marked on the links.]

- As time goes on, the expected percentage of steps the walker spends at each node v converges to the PageRank weight PR(v).

PageRank: is it collusion-proof?

- Can a node easily boost its rank by manipulating its outgoing links with others?

Amp(G): a metric on group collusion

[Figure: a colluding group G containing nodes i and j, annotated with $W_G(G) = PR(i) + PR(j)$ and $W_{in}(G)$.]

Answer for (11 ?) in PageRank

- In the original PageRank system,
- where $\epsilon$ is the resetting probability.
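
The equation on this slide did not survive, so the following is only one formalization consistent with the surviving labels, not necessarily the author's. It assumes $W_G(G)$ is the total PR weight held by the group, $W_{in}(G)$ is the weight entering the group per step from outside (resets included), and Amp is their ratio; $N_u$ denotes the out-degree of node u.

```latex
% Assumed definitions (the slide's own formula is not recoverable):
%   W_G(G)    = \sum_{v \in G} PR(v)
%   W_{in}(G) = weight entering G per step from outside G, resets included
\[
\mathrm{Amp}(G) \;=\; \frac{W_G(G)}{W_{in}(G)}
\]
% In the stationary state, what G holds equals what enters G plus what G passes to itself:
\[
W_G(G) \;=\; W_{in}(G) + (1-\epsilon) \sum_{u \in G} PR(u)\,
        \frac{\bigl|\{\, u \to v : v \in G \,\}\bigr|}{N_u}
\;\le\; W_{in}(G) + (1-\epsilon)\, W_G(G)
\]
\[
\Longrightarrow \quad \mathrm{Amp}(G) \;\le\; \frac{1}{\epsilon}
\]
```

Under these assumptions the bound is met with equality when every link leaving a member of G points back into G, which is exactly what a colluding group can arrange. With $\epsilon = 0.15$ that is an amplification of about 6.7.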

Two experimental topologies

- W, a Web link topology
  - Contains the link structure of upwards of 80 million URLs.
  - Source: the Stanford WebBase.
- B, a weblog blogrolling topology
  - Contains the blogrolling structure of upwards of 72,000 blogs.
  - Source: www.blogstreet.com, the XML-RPC weblog service.

Experiment 1: Collusion200

- Models a small number of web pages simultaneously colluding.
- Methodology
  - 100 colluding groups
  - Each colluding group has the circle topology consisting of two nodes with adjacent ranks.
  - Arbitrarily chose nodes originally ranked around 1000th, 2000th, ..., 100000th.
  - $\epsilon = 0.15$.

Experiment result of Collusion200 (I)

[Figure: W, amplification factors of the 100 colluding groups in Collusion200.]

Experiment result of Collusion200 (III)

[Figure: W, new PR rank after Collusion200.]

There is a long flat portion

[Figure: the PR weight distribution of 4 topologies.]

Next step: how to detect collusions?

- Identifying colluding groups is unlikely to be computationally tractable.
  - The densest k-subgraph problem [Feige et al. 1997].
  - The classical CLIQUE problem.
  - The problem of finding large hidden cliques in random graphs [Juels 1998].

Hardness on Amp

- Theorem on hardness.
  - $\max_{G' \subseteq G} \mathrm{Amp}(G')$ is an NP-hard problem.

An observation on collusion behaviors

- To increase their PR weight, i.e., the stationary weight in the random walk, the colluding nodes will stall the random walk.

- When the resetting probability $\epsilon$ increases, the colluding nodes must suffer a significant drop in PR weight.
- Therefore, we expect the PR weight of colluding nodes to be highly correlated with $1/\epsilon$ (the average walk length), while that of non-colluding nodes is relatively insensitive to the change in $\epsilon$.

An intuitive example

[Figure: a graph of nodes and referential links, shown over three slides; the second and third mark a colluding group.]

Co-co distribution in real-world graphs

[Figure: the co-co PDF distribution in W and B. The [0, 0.1] range actually corresponds to the [-1, 0.1] range.]

Adaptive-resetting scheme

- Part I: collusion detection
  - Given the topology, calculate the PR vector under different $\epsilon$ values.
    - $\epsilon \in \{0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6\}$, default $\epsilon = 0.15$.
  - Calculate the correlation coefficient between the curve of each node x's PR weight and the curve of $1/\epsilon$. Label it as co-co(x).
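
The detection step can be sketched on a toy graph. This is a minimal illustration under assumptions: PageRank is computed by plain power iteration with uniform resets, every node has at least one outgoing link, the correlation is Pearson's, and the graph and function names are hypothetical. Only the list of $\epsilon$ values comes from the slide.

```python
import math

EPSILONS = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]


def pagerank(out_links, eps, iters=500):
    """Power iteration with reset probability eps; assumes no dangling nodes."""
    n = len(out_links)
    pr = {v: 1.0 / n for v in out_links}
    for _ in range(iters):
        nxt = {v: eps / n for v in out_links}
        for u, outs in out_links.items():
            share = (1 - eps) * pr[u] / len(outs)
            for v in outs:
                nxt[v] += share
        pr = nxt
    return pr


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def co_co(out_links):
    """Correlate each node's PR weight across EPSILONS with the curve 1/eps."""
    inverse = [1 / e for e in EPSILONS]
    runs = [pagerank(out_links, e) for e in EPSILONS]
    return {v: pearson([run[v] for run in runs], inverse) for v in out_links}


# Toy graph: five honest nodes in a ring, plus a two-node colluding ring
# (c0, c1) that receives one link from h0 and links only to itself.
graph = {
    "h0": ["h1", "c0"], "h1": ["h2"], "h2": ["h3"], "h3": ["h4"], "h4": ["h0"],
    "c0": ["c1"], "c1": ["c0"],
}
scores = co_co(graph)
```

On this toy graph the colluders' scores are strongly positive and the honest nodes' scores are negative. The effect is exaggerated because the colluding pair is the only sink in a seven-node graph, so real topologies should show a less clean separation.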

Experiment result of Collusion200 (IV)

[Figure: W, amplification factors of the 100 colluding groups in Collusion200.]

Experiment result of Collusion200 (V)

[Figure: W, new PR weight after Collusion200.]

Experiment result of Collusion200 (VI)

[Figure: W, new PR rank after Collusion200.]

Experiment 2: Collusion22

- Models various colluding subgraphs.
- Methodology
  - 3 colluding groups
    - G1: 10-node ring
    - G2: 10-node star topology
    - G3: 2-node ring

[Figure: the three colluding groups drawn as nodes and referential links.]

Experiment result of Collusion22 (I)

[Figure: amplification factors of the 3 colluding groups in Collusion22.]

Experiment result of Collusion22 (II)

[Figure: W, new PR weight after Collusion22, with an annotation "Dropped out".]

New top-25 URL list in W

[Table: the new top-25 list, with entries labeled "Dropping" and "New".]

Conclusion

- A set of algorithms for performance and trust in P2P systems.
- Applicable to other fields.

Future work

- Integrating P2P networking and web techniques.
  - Dynamic communities
- Web study from an algorithmic perspective.
  - Web spamming.
  - Community detection.

Backup slides

Reputation systems [Okita2003]

- A means of describing social trust networks.
- The basic concept is a democratic meritocracy.
- A rating system is used to evaluate individual members, and those results are then collated to produce a consensus about the merit of any given member.
- Examples
  - Livejournal, Friendster, eBay, Advogato

PageRank algorithm [Brin1998]

- Assume N pages.
- Assign all pages the initial value 1/N.
- Let $N_u$ be the out-degree of page u, Rank(v) the importance of page v, and $B_v$ the set of pages pointing to v.
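
The update rule itself is not in the surviving text. A commonly used form that fits the notation above and the random walk with resetting probability $\epsilon$ is shown here; treat it as an assumed reconstruction, since the slide may have used a different normalization.

```latex
% Standard PageRank iteration with uniform resets (assumed form):
% each page keeps a reset share \epsilon / N and receives an equal split
% of the rank of every page that points to it.
\[
\mathrm{Rank}(v) \;\leftarrow\; \frac{\epsilon}{N}
  \;+\; (1-\epsilon) \sum_{u \in B_v} \frac{\mathrm{Rank}(u)}{N_u}
\]
```

Starting from Rank(v) = 1/N for all pages and repeating the update until the values stop changing yields the stationary distribution of the random walk, provided every page has at least one outgoing link.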

Experiment result of Collusion200 (II)

[Figure A: W, new PR weight after Collusion200.]

Experiment result of Collusion200 (VII)

[Figure B: B, new PR rank after Collusion200.]

Experiment result of Collusion200 (X)

[Figure C: B, new PR weight after Collusion200, plotted against the correlation coefficient.]

Experiment result of Collusion22 (III)

[Figure D: W, new PR rank after Collusion22.]

How about using finer statistics of the random walk?

- The revisit intervals of the random walk on a colluding node are likely to have a large variance compared to their expectation.