Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems

1
Characterizing Unstructured Overlay Topologies in
Modern P2P File-Sharing Systems
  • Daniel Stutzbach, University of Oregon
  • Reza Rejaie, University of Oregon
  • Subhabrata Sen, AT&T Labs

Internet Measurement Conference, Berkeley, CA,
USA, October 19th, 2005
2
Motivation
  • P2P file-sharing systems are very popular in
    practice.
      • Several million simultaneous users collectively
      • 60% of all Internet traffic (CacheLogic Research,
        2005)
  • Most use an unstructured overlay.
  • Understanding overlay properties is important for:
      • Understanding how existing P2P systems function
      • Developing and evaluating new systems
  • Unstructured overlays are not well understood.
  • We studied overlay properties in Gnutella.
      • Size: one of the largest P2P systems, with more
        than 1 million users
      • Mature: in use for several years, with older
        studies available for comparison
      • Open: no reverse-engineering needed

3
Defining the Problem
[Figure labels: Ultrapeer, Top-level overlay]
  • Gnutella uses a two-tier overlay.
      • This improves scalability.
      • Ultrapeers form an unstructured mesh.
      • Leaf peers connect to the ultrapeers.
      • eDonkey and FastTrack are similar.
  • Studying the overlay requires snapshots.
      • Snapshots capture the overlay as a graph.
      • Individual snapshots reveal graph properties.
      • Consecutive snapshots reveal dynamics.
  • However, capturing accurate snapshots is
    difficult.

[Figure label: Leaf]
4
Challenges in Capturing Accurate Snapshots
  • Snapshots are captured iteratively by a crawler.
  • An ideal snapshot is instantaneous.
  • But the overlay is large and rapidly changing.
  • Therefore, captured snapshots are distorted.
  • Sampling:
      • Partial snapshots are less distorted, but may be
        unrepresentative.
      • For some types of analysis, the whole graph is
        needed.
  • Previous studies capture either:
      • Complete snapshots slowly, or
      • Partial snapshots.
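
The iterative capture described above can be sketched as a breadth-first crawl. This is only a minimal illustration, not Cruiser's implementation: the function name `crawl`, the `get_neighbors` callback, and the tiny `overlay` dictionary are all hypothetical. Because peers are contacted one after another, a real overlay can change between the first and last contact, which is the source of the distortion discussed on this slide.

```python
from collections import deque

def crawl(seed, get_neighbors):
    """Breadth-first crawl. Returns a snapshot as {peer: set of neighbors}.

    get_neighbors(peer) is a hypothetical callback standing in for a
    network request that asks one peer for its current neighbor list.
    """
    snapshot = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        neighbors = set(get_neighbors(peer))  # one contact per peer
        snapshot[peer] = neighbors
        for n in neighbors:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return snapshot

# Hypothetical static overlay, used only for illustration.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
snapshot = crawl("A", lambda peer: overlay.get(peer, []))
```

A faster crawler shortens the interval between the first and last contact, so fewer peers arrive or depart while the snapshot is being assembled.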

5
Cruiser: a Fast Gnutella Crawler
  • Features
      • Distributed, highly parallelized implementation
      • Dynamic adaptation to bandwidth and CPU
        constraints
  • Cruiser is orders of magnitude faster than earlier
    crawlers.
      • Captures one million nodes in around 7 minutes
      • 140,000 peers/min, compared to 2,500 peers/min
        (Saroiu 02)
  • We investigated the effects of speed on
    distortion.
      • Daniel Stutzbach and Reza Rejaie, "Capturing
        Accurate Snapshots of the Gnutella Network," the
        Global Internet Symposium, March 2005.
      • 4% node distortion
      • 15% edge distortion
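
The crawl-rate figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide.
nodes = 1_000_000          # peers in one snapshot
crawl_minutes = 7          # "around 7 minutes"
cruiser_rate = 140_000     # peers/min, as quoted
earlier_rate = 2_500       # peers/min, as quoted for the earlier crawler

# Rate implied by "one million nodes in around 7 minutes".
implied_rate = nodes / crawl_minutes

# Ratio between the two quoted rates.
speedup = cruiser_rate / earlier_rate

# Time the earlier rate would need for the same one million peers.
earlier_minutes = nodes / earlier_rate

print(f"implied rate: {implied_rate:,.0f} peers/min")
print(f"speedup: {speedup:.0f}x")
print(f"earlier crawler: {earlier_minutes:.0f} min ({earlier_minutes / 60:.1f} h)")
```

At the earlier rate, a crawl of the same size would take several hours, during which a large share of short-lived peers would arrive or depart.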

6
Data Set
  • More than 80,000 snapshots, captured over the past
    year.
  • To examine static properties, we focus on four
    snapshots.
  • To examine dynamic properties, we use slices.
      • Each slice is 2 days of 500 back-to-back
        snapshots.
      • Captured starting 10/14/04, 10/21/04, 11/25/04,
        12/21/04, and 12/27/04

7
Summary of Characterizations
  • Graph Properties
      • Implementation heterogeneity
      • Degree Distribution:
          • Top-level degree distribution
          • Ultrapeer-leaf connectivity
          • Degree-distance correlation
      • Reachability:
          • Path lengths
          • Eccentricity
      • Small world properties
      • Resiliency
  • Dynamic Properties
      • Existence of stable core:
          • Uptime distribution
          • Biased connectivity
      • Properties of stable core:
          • Largest connected component
          • Path lengths
          • Clustering coefficient

8
Top-level Degree
[Figure annotations: Max 30 in most clients; Max 75 in some clients; Custom]
  • This is the degree distribution among ultrapeers.
  • There are obvious peaks at 30 and 70 neighbors.
  • A substantial number of ultrapeers have fewer
    than 30.
  • What happened to the power-law seen in prior
    studies?

9
What happened to the power law?
[Figure label: Ripeanu 02, ICJ]
  • When a crawl is slow, many short-lived peers
    report long-lived peers as neighbors.
  • However, those neighbors are not all present at
    the same time.
  • Degree distribution from a slow crawl resembles
    prior results.
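
The inflation effect described above can be illustrated with a toy example. The data below is entirely hypothetical: one long-lived peer keeps exactly three neighbors at every instant, while short-lived peers (`s1` to `s6`) come and go. A slow crawler contacts one short-lived peer per time step and records every report that names the long-lived peer as a neighbor.

```python
# Hypothetical data: the long-lived peer's true neighbor set at each time step.
neighbors_at = {
    0: {"s1", "s2", "s3"},
    1: {"s2", "s3", "s4"},
    2: {"s3", "s4", "s5"},
    3: {"s4", "s5", "s6"},
}

# Hypothetical slow crawl: which short-lived peer is contacted at each step.
contact_schedule = {0: "s1", 1: "s2", 2: "s5", 3: "s6"}

def apparent_neighbors(neighbors_at, contact_schedule):
    """Neighbors attributed to the long-lived peer by a slow crawl."""
    reported = set()
    for t, peer in contact_schedule.items():
        # The contacted peer reports the long-lived peer only if the two
        # are connected at the moment of contact.
        if peer in neighbors_at[t]:
            reported.add(peer)
    return reported

true_max_degree = max(len(s) for s in neighbors_at.values())
apparent_degree = len(apparent_neighbors(neighbors_at, contact_schedule))
print(true_max_degree, apparent_degree)
```

The long-lived peer never has more than three neighbors at once, yet the slow crawl attributes four to it, because the reports were collected at different times.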

10
Shortest-Path Distances
  • Distribution of distances among ultrapeers and
    among all peers
  • In the top-level overlay, 70% of distances are
    exactly 4 hops.
  • Across all peers, most distances are 5 or 6 hops.
  • This shows the effect of the two-tier structure
    with multiple parents.
  • Despite large size, distances are short.
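
A distance distribution of this kind can be computed with one breadth-first search per node. The sketch below is a generic illustration on a hypothetical four-node path graph, not the authors' analysis code; the function names are illustrative, and for a graph with a million nodes one would typically sample source nodes rather than use all of them.

```python
from collections import Counter, deque

def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_distribution(graph):
    """Fraction of connected node pairs at each shortest-path distance."""
    counts = Counter()
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if s < t:  # count each unordered pair once
                counts[d] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}

# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
distribution = distance_distribution(path)
print(distribution)
```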

11
Is Gnutella a Small World?
  • Small worlds arise naturally in many places.
      • Movie actors, power grids, co-authors of papers
  • They have short distances but significant
    clustering, compared to a similar random graph.
  • Conclusion: Gnutella is a small world.
  • Very high clustering adversely affects flooding
    queries.
  • But Gnutella isn't clustered enough to affect
    performance.
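
The clustering side of the small-world test can be sketched as follows. This computes the standard local clustering coefficient (the fraction of a node's neighbor pairs that are themselves connected) and averages it over all nodes; a small-world check would compare that average against a random graph with the same number of nodes and edges. The graph below is a hypothetical example, and the function names are illustrative.

```python
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of node's neighbors that are directly connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)

# Hypothetical graph: a triangle (a, b, c) with one pendant node d.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
clustering = average_clustering(example)
print(clustering)
```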

12
Resiliency to Node Failure
  • After removing nodes, this figure shows how many
    remain connected.
  • The Gnutella topology is extremely resilient to
    random node failure.
  • It's resilient even when the highest-degree nodes
    are removed first.
  • Complex algorithms are not necessary for ensuring
    resilience.
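
A resilience measurement of this kind can be sketched as: remove a set of nodes, then report the fraction of surviving nodes that sit in the largest connected component. That metric is an assumption about what the figure plots, and the function names and star graph below are hypothetical. The degree ordering is computed once on the original graph rather than recomputed after each removal.

```python
import random
from collections import deque

def largest_component_size(graph, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def remaining_connected_fraction(graph, removal_order, count):
    """Fraction of surviving nodes in the largest component."""
    removed = set(removal_order[:count])
    survivors = len(graph) - len(removed)
    if survivors == 0:
        return 0.0
    return largest_component_size(graph, removed) / survivors

def random_order(graph, seed=0):
    """Removal order for random node failure."""
    order = list(graph)
    random.Random(seed).shuffle(order)
    return order

def highest_degree_first(graph):
    """Removal order for a targeted attack on high-degree nodes."""
    return sorted(graph, key=lambda n: len(graph[n]), reverse=True)

# Hypothetical star graph: the worst case for targeted removal.
star = {"hub": ["a", "b", "c", "d"],
        "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
targeted = remaining_connected_fraction(star, highest_degree_first(star), 1)
leaf_removed = remaining_connected_fraction(star, ["a"], 1)
print(targeted, leaf_removed)
```

A star collapses as soon as its hub is removed. The slide's point is that the Gnutella topology behaves differently and stays largely connected even under highest-degree-first removal.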

13
What about Dynamic Properties?
  • Prior work suggests many peers are short-lived
    while others are very long-lived.
  • How do these nodes interact?
  • Methodology
      • Capture a long series of back-to-back snapshots
      • Annotate the last snapshot with the uptime of
        each peer
      • Examine the properties of the annotated topology
      • Group peers by uptime
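
The annotation and grouping steps above can be sketched as follows. The sketch assumes that uptime is measured as the number of consecutive snapshots, ending with the last one, in which a peer is present; converting that count to hours would require the snapshot interval. The function names and the peer data are hypothetical.

```python
def annotate_uptime(snapshots):
    """Uptime of each peer in the last snapshot.

    snapshots: list of sets of peer IDs, oldest first.
    Uptime is counted in consecutive snapshots ending at the last one
    (an assumption made for this sketch).
    """
    uptime = {}
    for peer in snapshots[-1]:
        count = 0
        for snap in reversed(snapshots):
            if peer not in snap:
                break
            count += 1
        uptime[peer] = count
    return uptime

def group_by_uptime(uptime):
    """Map each uptime value to the set of peers with that uptime."""
    groups = {}
    for peer, count in uptime.items():
        groups.setdefault(count, set()).add(peer)
    return groups

# Hypothetical series of four back-to-back snapshots.
snapshots = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "c", "d"},
]
uptime = annotate_uptime(snapshots)
groups = group_by_uptime(uptime)
print(uptime, groups)
```

Peer `b` departed before the last snapshot, so it does not appear in the annotated topology at all, while `d` is a new arrival with an uptime of one snapshot.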

[Figure labels: Present for 5 snapshots; Present for 2 snapshots; Departed peer; Newly arrived peer; Time]
14
Stable Core
[Figure label: > 20 h]
  • Most peers are recent arrivals.
  • Other peers have been around for a long time.
  • We can select a set of peers based on a minimum
    uptime threshold.
  • We call this the stable core.
  • Does the longevity of a peer affect who its
    neighbors are?

[Figure label: > 10 h]
15
Biased Connectivity
  • Hypothesis: long-lived nodes tend to be more
    connected to other long-lived nodes.
      • Rationale: once connected, they stay connected.
      • The longer they're around, the more
        opportunities they have to become neighbors.
  • Approach: check for biased connectivity.
      • Randomize the edges to create a graph without
        biased connectivity.
      • Then compare.
      • Are there more edges in the observed stable core
        compared to random?
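
One way to carry out the comparison above is degree-preserving edge swapping, sketched below. The slides do not say which randomization procedure was used, so the swap method, the function names, and the small edge list are all assumptions made for illustration. In practice the core edge count would be averaged over many randomized graphs.

```python
import random

def count_core_edges(edges, core):
    """Number of edges with both endpoints in the core."""
    return sum(1 for u, v in edges if u in core and v in core)

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def randomize_edges(edges, swaps, seed=0):
    """Degree-preserving randomization by repeated double edge swaps.

    This is an assumed procedure, not necessarily the one used in the study.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        if len({a, b, c, d}) < 4:
            continue  # edges share a node; swap could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Hypothetical overlay: a, b, c are long-lived; x, y, z are recent arrivals.
observed = [("a", "b"), ("a", "c"), ("b", "c"),
            ("c", "x"), ("x", "y"), ("y", "z"), ("a", "z")]
core = {"a", "b", "c"}
randomized = randomize_edges(observed, swaps=100)
print(count_core_edges(observed, core), count_core_edges(randomized, core))
```

Because every node keeps its degree, any surplus of core edges in the observed graph over the randomized one cannot be explained by long-lived peers simply having more connections.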

16
Stable Core Edges
  • 20% to 40% more edges in the stable core compared
    to random.
  • There is an onion-like bias where long-lived
    peers are more likely to be connected to other
    long-lived peers.
  • We examined other properties of the stable core.
  • Despite high churn, there is a relatively stable
    backbone.

17
Summary
  • Characterizations of recent and accurate
    snapshots
  • Graph properties
      • The degree distribution in Gnutella is not a
        power law.
      • Gnutella exhibits small world characteristics.
      • Gnutella is resilient.
  • Dynamic properties
      • There is a stable core within the topology.
      • Peer churn causes the stable core to have an
        onion-like shape.
      • This effect is likely to occur in any
        unstructured system.

18
Future Work
  • Examining long-term trends in Gnutella using many
    snapshots.
  • Characterizing churn
  • Characterizing properties of other
    widely-deployed P2P systems
      • Kad (a DHT with more than 1 million users)
      • BitTorrent
  • Developing sampling techniques for P2P

19
Ultrapeer->Leaf Degree
[Figure legend: LimeWire, BearShare, Other, Custom]
  • LimeWire ultrapeers have a limit of 30 leaf
    peers.
  • BearShare ultrapeers have a limit of 45 leaf
    peers.
  • There are distinct spikes at those points, with
    an even distribution of fewer leaf peers.