A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble

Slides: 65
Provided by: Digit77
Learn more at: http://cecs.wright.edu

1
A Measurement Study of Peer-to-Peer File Sharing Systems
by Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble
  • Presentation by Nanda Kishore Lella
  • Lella.2_at_wright.edu

2
Outline
  • P2P Overview
  • What is a peer?
  • Example applications
  • Benefits of P2P
  • P2P Content Sharing
  • Challenges
  • Group management/data placement approaches
  • Measurement studies
  • Conclusion

3
What is Peer-to-Peer (P2P)?
  • Most people think of P2P as music sharing
  • Examples
  • Napster
  • Gnutella

4
What is a peer?
  • Contrasted with Client-Server model
  • Servers are centrally maintained and administered
  • Client has fewer resources than a server

5
What is a peer?
  • A peer's resources are similar to the resources
    of the other participants
  • Peers communicate directly with other peers and
    share resources

6
P2P Application Taxonomy
  • P2P Systems
  • Distributed Computing
  • File Sharing
  • Collaboration
  • Platforms (JXTA)
7
P2P Goals/Benefits
  • Cost sharing
  • Resource aggregation
  • Improved scalability/reliability
  • Increased autonomy
  • Anonymity/privacy
  • Dynamism
  • Ad-hoc communication

8
P2P File Sharing
  • Content exchange
  • Gnutella
  • File systems
  • OceanStore
  • Filtering/mining
  • OpenCola

9
Research Areas
  • Peer discovery and group management
  • Data location and placement
  • Reliable and efficient file exchange
  • Security/privacy/anonymity/trust

10
Current Research
  • Group management and data placement
  • Chord, CAN, Tapestry, Pastry
  • Anonymity
  • Publius
  • Performance studies
  • Gnutella measurement study etc.

11
Management/Placement Challenges
  • Per-node state
  • Bandwidth usage
  • Search time
  • Fault tolerance/resiliency

12
Approaches
  • Centralized
  • Flooding
  • Document Routing

13
Centralized
  • Napster model
  • Benefits
  • Efficient search
  • Limited bandwidth usage
  • Efficient network handling
  • Drawbacks
  • Central point of failure
  • Limited scale

[Figure: peers Bob, Alice, Jane, and Judy]
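
A minimal sketch of the centralized (Napster-style) lookup, assuming a single in-memory index. The class name `CentralIndex` and the file names are hypothetical and are not taken from Napster's actual protocol. Peers register what they share, a search is one lookup at the server, and the download itself would then happen directly between peers.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: maps file names to the peers sharing them."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer, files):
        """A peer announces the files it shares."""
        for name in files:
            self._peers_by_file[name].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every entry."""
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def search(self, name):
        """One lookup at the server; no peer-to-peer query traffic is needed."""
        return sorted(self._peers_by_file.get(name, set()))


index = CentralIndex()
index.register("Alice", ["song.mp3", "talk.mp3"])
index.register("Bob", ["song.mp3"])
index.register("Judy", ["paper.pdf"])

print(index.search("song.mp3"))  # peers to download from directly
index.unregister("Alice")
print(index.search("song.mp3"))
```

If the one `CentralIndex` instance goes away, every search fails, which is the central point of failure listed under the drawbacks.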
14
Flooding
  • Gnutella model
  • Benefits
  • No central point of failure
  • Limited per-node state
  • Drawbacks
  • Slow searches
  • Bandwidth intensive

[Figure: peers Carl, Jane, Bob, Alice, and Judy]
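
A minimal sketch of flooding search with a time-to-live (TTL), assuming a small hand-made overlay. The links between the peers and the shared files are hypothetical, and unlike a real Gnutella client this sketch also counts the message sent back toward the peer a query came from. It shows why flooding is bandwidth intensive: the message count grows with every extra hop, while a TTL that is too small misses the file.

```python
from collections import deque

# Hypothetical overlay: each peer knows only its direct neighbors.
OVERLAY = {
    "Alice": ["Bob", "Carl"],
    "Bob": ["Alice", "Jane"],
    "Carl": ["Alice", "Jane"],
    "Jane": ["Bob", "Carl", "Judy"],
    "Judy": ["Jane"],
}
# Hypothetical shared content per peer.
SHARED = {"Judy": {"song.mp3"}, "Carl": {"talk.mp3"}}


def flood_search(start, filename, ttl):
    """Flood a query from `start`; return (peers holding the file, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if filename in SHARED.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbor in OVERLAY[peer]:
            messages += 1  # every forward costs bandwidth, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


print(flood_search("Alice", "song.mp3", ttl=2))
print(flood_search("Alice", "song.mp3", ttl=3))
```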
15
Document Routing
  • FreeNet, Chord, CAN, Tapestry, Pastry model
  • Benefits
  • More efficient searching
  • Limited per-node state
  • Drawbacks
  • Limited fault-tolerance vs redundancy

[Figure: a query for id 212 forwarded between nodes 001, 012, 332, 305, and 212]
16
Document Routing: CAN
  • Associate to each node and item a unique id in a
    d-dimensional space
  • Goals
  • Scales to hundreds of thousands of nodes
  • Handles rapid arrival and failure of nodes
  • Properties
  • Routing table size O(d)
  • Guarantees that a file is found in at most
    $d \cdot n^{1/d}$ steps, where n is the total number of nodes

Slide modified from another presentation
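
A small arithmetic sketch of the two CAN properties listed above: O(d) routing state and at most d·n^(1/d) lookup steps. The network size and the choice of two neighbors per dimension (an evenly partitioned space) are assumptions made for illustration. It shows the trade-off: a larger d costs a few more neighbors per node but shortens the worst-case path considerably.

```python
def can_neighbors(d):
    """Per-node routing state, assuming two neighbors per dimension: O(d)."""
    return 2 * d


def can_path_bound(n, d):
    """Upper bound on lookup steps quoted on the slide: d * n**(1/d)."""
    return d * n ** (1.0 / d)


n = 1_000_000  # hypothetical network size
for d in (2, 3, 6):
    bound = can_path_bound(n, d)
    print(f"d={d}: neighbors={can_neighbors(d)}, steps<={bound:.0f}")
```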
17
CAN Example: Two-Dimensional Space
  • Space divided between nodes
  • Together, the nodes cover the entire space
  • Each node covers either a square or a rectangular
    area
  • Example
  • Node n1(1, 2) is the first node to join → it covers
    the entire space

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing n1]
18
CAN Example: Two-Dimensional Space
  • Node n2(4, 2) joins → space is divided between
    n1 and n2

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing n1 and n2]
19
CAN Example: Two-Dimensional Space
  • Node n3(3, 5) joins → space is divided between
    n1 and n3

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing n1, n2, and n3]
20
CAN Example: Two-Dimensional Space
  • Nodes n4(5, 5) and n5(6, 6) join

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing n1 through n5]
21
CAN Example: Two-Dimensional Space
  • Nodes: n1(1, 2), n2(4, 2), n3(3, 5),
    n4(5, 5), n5(6, 6)
  • Items: f1(2, 3), f2(5, 1), f3(2, 1), f4(7, 5)

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing nodes n1 through n5 and items f1 through f4]
22
CAN Example: Two-Dimensional Space
  • Each item is stored by the node that owns its
    mapping in the space

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing nodes n1 through n5 and items f1 through f4]
23
CAN Query Example
  • Each node knows its neighbors in the d-space
  • Forward query to the neighbor that is closest to
    the query id
  • Can route around some failures
  • Some failures require local flooding
  • Example: assume n1 queries f4

[Figure: two-dimensional coordinate space, both axes 0 to 7, containing nodes n1 through n5 and items f1 through f4]
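
A minimal sketch of the greedy forwarding rule above, using the item coordinates from the example. The zone rectangles are an assumption: the transcript gives only the node and item points, so the splits below (at x = 4, y = 4, and x = 6 on an 8 x 8 space) are one layout consistent with the join order, not necessarily the one drawn on the slides. There is no failure handling.

```python
# Hypothetical zones (x0, x1, y0, y1), half-open, on an 8 x 8 space.
ZONES = {
    "n1": (0, 4, 0, 4),
    "n2": (4, 8, 0, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 6, 4, 8),
    "n5": (6, 8, 4, 8),
}
ITEMS = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}


def owner(point):
    """The node whose zone contains the point stores the item."""
    x, y = point
    for node, (x0, x1, y0, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("point outside the space")


def are_neighbors(a, b):
    """Zones are neighbors if they abut along one axis and overlap along the other."""
    ax0, ax1, ay0, ay1 = ZONES[a]
    bx0, bx1, by0, by1 = ZONES[b]
    x_overlap = min(ax1, bx1) > max(ax0, bx0)
    y_overlap = min(ay1, by1) > max(ay0, by0)
    x_abut = ax1 == bx0 or bx1 == ax0
    y_abut = ay1 == by0 or by1 == ay0
    return (x_abut and y_overlap) or (y_abut and x_overlap)


def distance_to_zone(node, point):
    """Euclidean distance from the point to the node's rectangle (0 if inside)."""
    x, y = point
    x0, x1, y0, y1 = ZONES[node]
    dx = max(x0 - x, 0, x - x1)
    dy = max(y0 - y, 0, y - y1)
    return (dx * dx + dy * dy) ** 0.5


def can_route(start, point):
    """Greedy forwarding: hand the query to the neighbor zone closest to the point."""
    target = owner(point)
    path = [start]
    while path[-1] != target:
        here = path[-1]
        candidates = [n for n in ZONES if n not in path and are_neighbors(here, n)]
        if not candidates:
            raise RuntimeError("greedy routing got stuck")
        path.append(min(candidates, key=lambda n: distance_to_zone(n, point)))
    return path


print(owner(ITEMS["f4"]))            # node that stores f4 under the assumed zones
print(can_route("n1", ITEMS["f4"]))  # hops taken when n1 queries f4
```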
24
CAN Query Example
25
CAN Query Example
26
CAN Query Example
27
Node Failure Recovery
  • Simple failures
  • know your neighbors' neighbors
  • when a node fails, one of its neighbors takes
    over its zone
  • More complex failure modes
  • simultaneous failure of multiple adjacent nodes
  • scoped flooding to discover neighbors
  • hopefully, a rare event

28
Document Routing: Chord
  • MIT project
  • Uni-dimensional ID space
  • Keep track of log N nodes
  • Search through log N nodes to find desired key

29
Document Routing: Chord (2)
  • Each node and key is assigned an id
  • If a node needs a key, it searches its table of
    nodes for the key
  • If that fails, it goes to the last node in its table
    and repeats until it finds the key
  • Search through log N nodes to find desired key
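
A minimal sketch of a Chord-style lookup on a small ring. It uses the standard finger-table rule (finger i points to the successor of node + 2^i, and a query jumps to the farthest finger that still precedes the key) rather than the simplified "go to the last node of the table" wording above. The ring size, the node ids, and the use of a global node list to build the fingers are illustrative assumptions; a real node learns its fingers through the protocol.

```python
M = 6                       # identifier bits; the ring has 2**M positions
RING = 2 ** M
CHORD_NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])  # illustrative ids


def successor(ident):
    """First node clockwise from `ident` (inclusive)."""
    ident %= RING
    for node in CHORD_NODES:
        if node >= ident:
            return node
    return CHORD_NODES[0]  # wrap around the ring


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def fingers(node):
    """Finger i points to successor(node + 2**i): about log N distinct nodes."""
    return [successor(node + 2 ** i) for i in range(M)]


def lookup(start, key):
    """Return (node responsible for key, nodes visited along the way)."""
    node, path = start, [start]
    while True:
        nxt = successor(node + 1)  # immediate successor on the ring
        if in_interval(key, node, nxt):
            return nxt, path
        # Jump to the farthest finger that still precedes the key.
        for f in reversed(fingers(node)):
            if in_interval(f, node, key) and f != key:
                node = f
                break
        else:
            node = nxt
        path.append(node)


print(lookup(8, 54))
```

Each jump at least halves the remaining distance to the key on the ring, which is where the log N search cost comes from.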

30
Doc Routing: Tapestry/Pastry
  • Global mesh of meshes
  • Suffix-based routing
  • Uses underlying network distance in constructing
    mesh

31
Naming in Tapestry
  • Every node has a 4-digit name, similar to an IP
    address
  • Each digit in the name can take 16 values
  • Keys present at a node correspond to the node's
    name
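
A minimal sketch of suffix-based routing over 4-digit hexadecimal names. The node names are hypothetical, and the global `TAPESTRY_NODES` list stands in for the per-node neighbor maps a real Tapestry node would keep. Each hop moves to a node that matches at least one more trailing digit of the destination, so a 4-digit name is resolved in at most four hops.

```python
# Hypothetical 4-digit hexadecimal node names.
TAPESTRY_NODES = ["0325", "B4F8", "9098", "7598", "4598", "2118", "1598"]


def shared_suffix_len(a, b):
    """Number of trailing digits that two equal-length names have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_route(source, dest):
    """Resolve the destination one trailing digit at a time."""
    path = [source]
    while path[-1] != dest:
        need = shared_suffix_len(path[-1], dest) + 1
        candidates = [n for n in TAPESTRY_NODES if shared_suffix_len(n, dest) >= need]
        # Prefer the candidate that fixes the fewest extra digits, to show each step.
        path.append(min(candidates, key=lambda n: shared_suffix_len(n, dest)))
    return path


print(suffix_route("0325", "4598"))
```

The destination must itself be in `TAPESTRY_NODES`; otherwise the candidate list can be empty and `min` raises `ValueError`.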

32
Tapestry Routing
33
Remaining Problems?
  • Hard to handle highly dynamic environments
  • Methods don't consider peer characteristics

34
Measurement Studies
  • Gnutella vs. Napster

35
[Figure: Napster (napster.com) and Gnutella architectures. Legend: P = peer, S = server, Q = query, R = response, D = file download]
36
Methodology
  • 2 stages
  • periodically crawl Gnutella/Napster
  • discover peers and their metadata
  • feed output from crawl into measurement tools
  • bottleneck bandwidth: SProbe
  • latency: SProbe
  • peer availability: LF
  • degree of content sharing: Napster crawler

37
Crawling
  • May 2001
  • Napster crawl
  • query index server and keep track of results
  • query about returned peers
  • doesn't capture users sharing unpopular content
  • Gnutella crawl
  • send out ping messages with large TTL

39
Measurement Study
  • How many peers are server-like vs. client-like?
  • Bandwidth, latency
  • Connectivity
  • Who is sharing what?

41
Graph results
  • CDF: cumulative distribution function
  • From this graph, we see that while 78% of the
    participating peers have downstream bottleneck
    bandwidths of at least 1000Kbps, only 8% of the
    peers have upstream bottleneck bandwidths of at
    least 10Mbps
  • 22% of the participating peers have upstream
    bottleneck bandwidths of 100Kbps or less
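
A minimal sketch of how statements such as "X% of peers have at least Y Kbps" are read off an empirical CDF. The sample values are made up for illustration and are not the study's measurements.

```python
def cdf_at(samples, x):
    """Empirical CDF: fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def fraction_at_least(samples, threshold):
    """Fraction of samples that are >= threshold."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


# Hypothetical bottleneck bandwidths in Kbps (illustrative only).
bandwidths_kbps = [56, 64, 100, 128, 768, 1500, 1500, 3000, 10000, 45000]

print(cdf_at(bandwidths_kbps, 100))               # share with 100Kbps or less
print(fraction_at_least(bandwidths_kbps, 1000))   # share with at least 1000Kbps
print(fraction_at_least(bandwidths_kbps, 10000))  # share with at least 10Mbps
```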

43
Reported Bandwidth
44
Graph results
  • The percentage of Napster users connected with
    modems (of 64Kbps or less) is about 25%, while the
    percentage of Gnutella users with similar
    connectivity is as low as 8%
  • 50% of the users in Napster and 60% of the users
    in Gnutella use broadband connections
  • Only about 20% of the users in Napster and 30% of
    the users in Gnutella have very high bandwidth
    connections
  • Overall, Gnutella users on average tend to have
    higher downstream bottleneck bandwidths than
    Napster users

46
Graph results
  • Approximately 20% of the peers have latencies of
    at least 280ms
  • Another 20% have latencies of at most 70ms

48
Graph results
  • This graph illustrates the presence of two
    clusters: a smaller one situated at (20-60Kbps,
    100-1,000ms) and a larger one at over (1,000Kbps,
    60-300ms)
  • The horizontal lines in the graph indicate that
    latency also depends on the location of the peer
    relative to the measuring system

49
Measured Uptime
51
Number of Shared Files
53
Correlation of Free-Riding with B/W
57
Power law
  • A connected cluster of peers that spans the
    entire network survives even in the presence of a
    large percentage p of random peer breakdowns,
    where p can be as large as

$$p \le 1 + \left(1 - m^{\alpha-2} K^{-\alpha+3} \, \frac{\alpha-2}{3-\alpha}\right)^{-1}$$

where m is the minimum node degree, K is the
maximum node degree, and α < 3.
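
A hedged numeric sketch of the breakdown threshold discussed on this slide, assuming it takes the form p ≤ 1 + (1 − m^(α−2) · K^(3−α) · (α−2)/(3−α))^(−1) from the percolation analysis of power-law networks with 2 < α < 3. The parameter values below are hypothetical. It shows that a larger maximum degree K lets the overlay tolerate a larger share of random breakdowns.

```python
def breakdown_threshold(alpha, m, K):
    """Largest fraction p of random breakdowns the spanning cluster survives
    (assumed formula; only meaningful for 2 < alpha < 3)."""
    if not 2 < alpha < 3:
        raise ValueError("formula assumes 2 < alpha < 3")
    kappa = m ** (alpha - 2) * K ** (3 - alpha) * (alpha - 2) / (3 - alpha)
    return 1 + 1 / (1 - kappa)


# Hypothetical parameters: exponent 2.3, minimum degree 1, two maximum degrees.
for K in (20, 100):
    print(K, round(breakdown_threshold(2.3, 1, K), 3))
```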
59
Gnutella
  • Popular sites
  • 212.239.171.174
  • adams-00-305a.Stanford.EDU
  • 0.0.0.0

Fri Feb 16 05:21:52-05:23:22 PST
1771 hosts
60
30% random failures
1771 - 471 - 294 hosts
Fri Feb 16 05:21:52-05:23:22 PST
61
4% orchestrated failures
Fri Feb 16 05:21:52-05:23:22 PST
1771 - 63 hosts
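
A minimal simulation sketch contrasting random failures with orchestrated failures that target the best-connected peers. The overlay is synthetic: a preferential-attachment graph is an assumption standing in for the crawled Gnutella topology, so the printed sizes depend on the generated graph and will not reproduce the host counts above. The idea it illustrates is that removing a small share of high-degree peers can fragment a hub-dominated overlay more than removing a much larger random share.

```python
import random


def preferential_attachment(n, rng):
    """Synthetic overlay: each new peer links to up to two existing peers,
    chosen with probability proportional to their current degree."""
    edges = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n):
        targets = {rng.choice(endpoints), rng.choice(endpoints)}
        edges[new] = set(targets)
        for t in targets:
            edges[t].add(new)
            endpoints += [new, t]
    return edges


def remove_nodes(edges, removed):
    """Return a copy of the graph without the removed nodes and their links."""
    removed = set(removed)
    return {u: {v for v in nbrs if v not in removed}
            for u, nbrs in edges.items() if u not in removed}


def largest_component(edges):
    """Size of the largest connected component (iterative depth-first search)."""
    seen, best = set(), 0
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def random_failures(edges, fraction, rng):
    """Remove a random share of the peers."""
    k = int(len(edges) * fraction)
    return remove_nodes(edges, rng.sample(sorted(edges), k))


def orchestrated_failures(edges, fraction):
    """Remove the highest-degree share of the peers."""
    k = int(len(edges) * fraction)
    by_degree = sorted(edges, key=lambda u: len(edges[u]), reverse=True)
    return remove_nodes(edges, by_degree[:k])


rng = random.Random(0)
overlay = preferential_attachment(1771, rng)
print("intact:", largest_component(overlay))
print("30% random:", largest_component(random_failures(overlay, 0.30, rng)))
print("4% orchestrated:", largest_component(orchestrated_failures(overlay, 0.04)))
```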
62
Results Overview
  • Lots of heterogeneity between peers
  • Systems should consider peer capabilities
  • Peers lie
  • Systems must be able to verify reported peer
    capabilities or measure true capabilities

63
Points of Discussion
  • Is it all hype?
  • Should P2P be a research area?
  • Do P2P applications/systems have common research
    questions?
  • What are the killer apps for P2P systems?

64
Conclusion
  • P2P is an interesting and useful model
  • There are lots of technical challenges to be
    solved