We developed a fast and tunable crawler, Cruiser.

Daniel Stutzbach and Reza Rejaie, University of Oregon
http://mirage.cs.uoregon.edu/P2P
1. Motivation
  • Peer-to-Peer (P2P) applications have millions of
    users and make up a significant and growing
    fraction of Internet traffic.
  • Little is known about the properties and dynamics
    of unstructured overlays in deployed P2P
    applications.
  • Characterizing P2P overlays requires capturing
    accurate, fine-grained snapshots of the overlays.
  • Snapshots (as graphs) are captured with a
    crawler, which records peers (as nodes) and their
    connections (as edges).
  • Snapshots captured by a crawler can be distorted
    (or stretched) for two reasons:
      • Dynamic changes of the overlay during a crawl
      • Peers unreachable by the crawler
  • Previous studies used slow crawlers and did not
    examine the accuracy of their snapshots:
      • 8K peers at 133 peers/min, in 1 hr (Clip2 '00)
      • 30K peers at 250 peers/min, in 2 hrs
        (Ripeanu '02)
  • However, average peer uptime is just minutes, so
    the overlay changes substantially during a crawl
    that lasts an hour or more.

2. Approach
  • We developed a fast and tunable crawler, Cruiser.
  • Cruiser uses a master-slave architecture and
    parallel crawling, and leverages the two-tier
    topology adopted in popular P2P applications.
  • Cruiser captures a Gnutella snapshot with
    1 million nodes in about 7 minutes (roughly
    140,000 peers/min).
  • Cruiser enables us to examine the effect of
    various crawling parameters on snapshot accuracy.
  • There are two dimensions of snapshot accuracy:
      • Completeness: the fraction of the topology
        captured
      • Distortion: the percentage difference between
        the snapshot and the real topology
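The two accuracy dimensions and the crawl speeds quoted above reduce to simple set arithmetic. The sketch below is one possible reading, not Cruiser's code: it assumes a topology is a set of undirected edges, treats completeness as the fraction of true edges present in the snapshot, and treats distortion as the symmetric difference (missing plus stale edges) relative to the size of the true topology. The names `edge`, `completeness`, `distortion_pct`, and `crawl_rate` are hypothetical. In a live network the true topology is unknown, so the reference would have to be approximated (for example, by a faster crawl); that choice is also an assumption here.

```python
# Hypothetical sketch of snapshot-accuracy metrics over edge sets.
# Assumption: a topology is a set of undirected edges, each edge a
# frozenset of two peer identifiers (e.g. "ip:port" strings).

from typing import FrozenSet, Set

Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Build an undirected edge so that (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def completeness(snapshot: Set[Edge], truth: Set[Edge]) -> float:
    """Fraction of the true topology's edges that the snapshot captured."""
    if not truth:
        return 1.0
    return len(snapshot & truth) / len(truth)


def distortion_pct(snapshot: Set[Edge], truth: Set[Edge]) -> float:
    """Percentage difference between the snapshot and the true topology.

    Assumption: measured as the symmetric difference (true edges missing
    from the snapshot plus stale snapshot edges that no longer exist)
    relative to the number of true edges.
    """
    if not truth:
        return 0.0 if not snapshot else 100.0
    return 100.0 * len(snapshot ^ truth) / len(truth)


def crawl_rate(peers: int, minutes: float) -> float:
    """Average crawl speed in peers per minute."""
    return peers / minutes


if __name__ == "__main__":
    truth = {edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "a")}
    # Made-up snapshot: misses c-d and d-a, and keeps a stale link c-e.
    snapshot = {edge("a", "b"), edge("b", "c"), edge("c", "e")}
    print(completeness(snapshot, truth))    # -> 0.5
    print(distortion_pct(snapshot, truth))  # -> 75.0
    print(round(crawl_rate(1_000_000, 7)))  # -> 142857
```

The speed figures above are consistent with this arithmetic: `crawl_rate(8000, 60)` is about 133 peers/min, `crawl_rate(30000, 120)` is 250, and 1 million peers in 7 minutes is about 143,000 peers/min.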

3. Two-Tier Topologies
  • Gnutella, FastTrack and eDonkey use a two-tier
    overlay topology.
  • Top-level nodes form the core overlay.
  • Leaves connect to a few top-level nodes.
  • We initially focus on Gnutella.
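The two-tier structure is what lets a crawler avoid contacting most peers. The sketch below assumes, rather than reproduces, Cruiser's design: a master thread keeps a frontier of top-level peers and hands them to a pool of worker threads, and leaves are recorded from their top-level neighbors' reports but never contacted. `crawl` and `query_peer` are hypothetical names; `query_peer` stands in for the real Gnutella handshake and is backed here by a made-up in-memory overlay.

```python
# Hypothetical sketch of a master/worker crawl that exploits a two-tier overlay.
# Assumption: query_peer(peer) returns (neighbor_id, is_top_level) pairs, or
# raises OSError if the peer is unreachable.

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Set, Tuple

Neighbors = List[Tuple[str, bool]]  # (peer id, is_top_level)


def crawl(
    seeds: List[str],
    query_peer: Callable[[str], Neighbors],
    workers: int = 8,
) -> Tuple[Set[str], Set[str], Set[Tuple[str, str]]]:
    """Master loop: dispatch top-level peers to a pool of worker threads."""
    top: Set[str] = set(seeds)           # top-level peers discovered so far
    leaves: Set[str] = set()             # leaves, learned only from reports
    edges: Set[Tuple[str, str]] = set()  # undirected links, stored sorted
    frontier = list(seeds)               # top-level peers not yet queried

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Workers only run query_peer; every shared set below is updated
            # by this master thread alone, so no locking is needed.
            futures = {pool.submit(query_peer, p): p for p in frontier}
            frontier = []
            for fut in as_completed(futures):
                peer = futures[fut]
                try:
                    neighbors = fut.result()
                except OSError:
                    continue  # unreachable peer: its own links stay unknown
                for nbr, is_top in neighbors:
                    edges.add(tuple(sorted((peer, nbr))))
                    if is_top:
                        if nbr not in top:
                            top.add(nbr)
                            frontier.append(nbr)
                    else:
                        leaves.add(nbr)  # leaves are never contacted
    return top, leaves, edges


if __name__ == "__main__":
    # Made-up toy overlay: top-level peers U1-U3, leaves L1-L3.
    overlay = {
        "U1": [("U2", True), ("L1", False), ("L2", False)],
        "U2": [("U1", True), ("U3", True), ("L2", False)],
        "U3": [("U2", True), ("L3", False)],
    }

    def fake_query(peer: str) -> Neighbors:
        if peer not in overlay:
            raise OSError(f"{peer} unreachable")
        return overlay[peer]

    top, leaves, edges = crawl(["U1"], fake_query, workers=2)
    print(sorted(top), sorted(leaves), len(edges))  # -> ['U1', 'U2', 'U3'] ['L1', 'L2', 'L3'] 6
```

A real crawler would also need per-peer timeouts and a cap on open connections; those are omitted to keep the sketch focused on the frontier logic.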

4. Results
Distortion and Speed
  • Distortion and granularity are determined
    primarily by crawl speed (Fig. 3).
  • Decreasing speed significantly increases
    distortion.
  • Cruiser captures complete, accurate snapshots
    (Figs. 1, 2, and 3).

Effects on Derived Characterization
  • Distorted snapshots lead to inaccurate
    characterization of the overlay topology (Fig. 4).
      • For example, a slow crawler reports a power-law
        tail for the node degree distribution.

Completeness
  • The low incremental value of contacting more
    peers indicates that snapshots are reasonably
    complete (Fig. 1).
  • Top-level peers are discovered quickly.
  • Leaf nodes and top-level links are largely
    discovered by the end of the crawl.

Completeness-Granularity Tradeoff
  • There is a fundamental tradeoff between
    completeness and granularity. Longer crawls are
    more complete but reduce the temporal granularity
    available for studying dynamics. There is a sweet
    spot between the two (Fig. 2).
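A completeness-over-time curve in the spirit of Fig. 1 can be derived from a crawl log. The sketch below assumes a hypothetical log of `(elapsed_seconds, peer_id, kind)` records, with `kind` either "top" or "leaf", and reports, for each kind, the fraction of peers found by the full crawl that had already been discovered at each checkpoint. Stopping the crawl at an earlier checkpoint gives a shorter, finer-grained snapshot at the cost of the completeness missing at that point. The fractions are relative to what the full crawl found, not to the (unknown) true network size; `discovery_curve` is an illustrative name.

```python
# Hypothetical sketch: completeness-over-time curve from a crawl log.
# Assumption: each record is (elapsed_seconds, peer_id, kind) with kind in
# {"top", "leaf"}; a peer may appear several times, and only its first
# sighting counts.

from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[float, str, str]


def discovery_curve(
    log: Iterable[Record], checkpoints: List[float]
) -> Dict[str, List[float]]:
    """Per kind, fraction of the final peer count discovered by each checkpoint."""
    first_seen: Dict[str, Tuple[float, str]] = {}
    for t, peer, kind in log:
        if peer not in first_seen or t < first_seen[peer][0]:
            first_seen[peer] = (t, kind)

    times_by_kind: Dict[str, List[float]] = defaultdict(list)
    for t, kind in first_seen.values():
        times_by_kind[kind].append(t)

    curve: Dict[str, List[float]] = {}
    for kind, times in times_by_kind.items():
        times.sort()
        total = len(times)
        # bisect_right counts discoveries at or before each checkpoint.
        curve[kind] = [bisect_right(times, c) / total for c in checkpoints]
    return curve


if __name__ == "__main__":
    # Made-up log: top-level peers appear early, leaves trickle in later.
    log = [
        (5.0, "U1", "top"), (10.0, "U2", "top"), (20.0, "L1", "leaf"),
        (30.0, "U3", "top"), (200.0, "L2", "leaf"), (400.0, "L3", "leaf"),
        (410.0, "L4", "leaf"),
    ]
    print(discovery_curve(log, [60.0, 420.0]))
```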

5. Future Work
  • Characterizing dynamics of overlay topologies
      • Peer arrival and departure (churn)
      • Changes in connectivity among peers
      • Properties of long-lived versus short-lived
        peers
  • Building an overlay topology generator for
    simulation
  • Characterizing graph-related properties of
    individual snapshots of the overlay topology
      • Degree distribution
      • Resiliency of the overlay to node departures
      • Distribution of pair-wise path lengths
      • Small-world properties
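Two of the graph properties listed above, degree distribution and pair-wise path lengths, can be computed directly from a single snapshot. The sketch below is a straightforward illustration, not the authors' tooling: it assumes a snapshot is stored as an undirected adjacency dict (peer ID to set of neighbor IDs) and runs one breadth-first search per peer. `degree_distribution` and `path_length_distribution` are hypothetical names.

```python
# Hypothetical sketch: degree and shortest-path-length distributions of a
# snapshot stored as an adjacency dict (peer id -> set of neighbor ids).

from collections import Counter, deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def degree_distribution(g: Graph) -> Counter:
    """Map each degree to the number of peers with that degree."""
    return Counter(len(nbrs) for nbrs in g.values())


def path_length_distribution(g: Graph) -> Counter:
    """Count shortest-path hop lengths over all reachable ordered peer pairs.

    One breadth-first search per peer costs O(V * (V + E)), which is fine
    for small or sampled graphs but too slow for a full million-node snapshot.
    """
    lengths: Counter = Counter()
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.update(d for node, d in dist.items() if node != src)
    return lengths


if __name__ == "__main__":
    # Made-up three-peer chain a - b - c.
    g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(degree_distribution(g))       # -> Counter({1: 2, 2: 1})
    print(path_length_distribution(g))  # -> Counter({1: 4, 2: 2})
```

For a million-peer snapshot, running the BFS from a random sample of source peers is the usual way to estimate the path-length distribution at reasonable cost.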