Measurements of Peer-to-Peer Systems

1
Measurements of Peer-to-Peer Systems
  • Pradnya Karbhari
  • Nov 25th, 2003
  • CS 8803: Network Measurements Seminar

2
Introduction to Peer-to-Peer (P2P) systems
  • End-systems (or peers) can act as both clients
    and servers of data, hence the system is
    scalable and reliable
  • Peer participation is voluntary and membership is
    dynamic, hence the topology keeps changing
  • Most popularly used for file sharing, hence
    peer-to-peer systems have become synonymous with
    peer-to-peer file sharing networks

3
Classification of P2P systems
  • P2P computation (e.g. SETI@home)
  • P2P communication (instant messaging)
  • P2P file-sharing networks
    • Centralized (e.g. Napster)
    • Decentralized
      • Structured (e.g. Chord, CAN, Pastry, Tapestry)
      • Unstructured (e.g. Gnutella, Kazaa, Freenet,
        eDonkey, eMule, Direct Connect)

4
Popularity of unstructured decentralized P2P networks
  • Gnutella host count, maintained by Limewire
    (http://www.limewire.com)
  • good scope for measurement studies because these
    networks
    • are deployed and widely used
    • use a lot of bandwidth during data transfer,
      hence are a concern for network operators
  • quite a few measurement studies have been done on
    these systems, some of which we will discuss in
    this seminar

5
Outline
  • Characterization of users of P2P systems
    • Saroiu et al., A Measurement Study of
      Peer-to-Peer File Sharing Systems, MMCN 2002.
  • Effect of P2P traffic on the underlying network
    • Sen et al., Analyzing peer-to-peer traffic
      across large networks, IMW 2002.
  • Peer-to-Peer Topologies
    • Ripeanu et al., Mapping the Gnutella Network:
      Properties of Large-Scale Peer-to-Peer Systems
      and Implications for System Design, IEEE
      Internet Computing, 2002.
  • Searching on the P2P network
    • Sripanidkulchai, The popularity of Gnutella
      queries and its implications on scalability,
      2001.
  • Deciphering proprietary P2P systems (like Kazaa)
    • Leibowitz et al., Deconstructing the Kazaa
      Network, WIAPP 2003.

6
Gnutella protocol overview
  • Connecting to the Gnutella network
    • bootstrap using the GWebCache system and a
      locally cached host list
    • Ping/Pong messages are exchanged with potential
      neighbors
  • Searching on the network
    • Query messages are flooded on the network
    • QueryHit messages are received (back-propagated
      along the Query path) from peers having the
      requested content
  • Downloading the content
    • peers download files directly from peers having
      the requested content
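
A minimal sketch of the search step described above, assuming a toy in-memory overlay rather than real Gnutella descriptors. The function name flood_query, the graph layout and the default TTL of 7 are illustrative assumptions, not part of the presentation. It floods a Query hop by hop, drops duplicates, and builds the reverse path a QueryHit would follow back to the requester.

```python
from collections import deque


def flood_query(graph, source, has_content, ttl=7):
    """Flood a Query from `source` and return one reverse path per hit.

    graph:       dict mapping peer -> list of neighbor peers (toy overlay)
    has_content: set of peers that hold the requested file
    ttl:         maximum number of hops the Query may travel (assumed default)
    """
    parent = {source: None}  # the peer each node first heard the Query from
    queue = deque([(source, ttl)])
    hits = []
    while queue:
        peer, remaining = queue.popleft()
        if peer != source and peer in has_content:
            # A QueryHit is back-propagated along the path the Query took.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits.append(path)
        if remaining == 0:
            continue  # TTL exhausted, do not forward further
        for neighbor in graph[peer]:
            if neighbor not in parent:  # duplicate Queries are dropped
                parent[neighbor] = peer
                queue.append((neighbor, remaining - 1))
    return hits
```

With a small TTL the Query reaches only nearby peers, which is why content held further away can go unfound even though it exists in the network.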

7
Characterization of Users of P2P systems
  • S. Saroiu, P. Gummadi and S. Gribble, A
    Measurement Study of Peer-to-Peer File Sharing
    Systems, MMCN 2002.
  • first paper to characterize P2P file sharing
    systems
  • Goal: to analyze the following user
    characteristics
    • latency
    • lifetime of peers
    • bottleneck bandwidth
    • number of files shared and downloaded
    • degree of cooperation
  • Methodology: active crawling
  • Systems studied: Napster and Gnutella
  • Data collection: May 2001

8
Measurement Methodology
  • active crawling of the Napster and Gnutella
    systems
    • Napster: issued queries for popular content, and
      then queried the central server for peer
      information
    • Gnutella: used ping/pong messages in the protocol
      to get metadata about peers, then about their
      neighbors, and so on
  • parallel measurements for
    • peer lifetime: periodic probing of peers obtained
      from the crawlers
      • offline if no response to TCP SYN
      • inactive if the response to TCP SYN is a TCP RST
      • active if the peer accepts the incoming TCP
        connection on that port
    • latency: RTT measurements from one host
    • bottleneck link bandwidth: active probing using
      Sprobe, a tool they developed based on the
      packet-pair dispersion technique
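
The probing rules and the packet-pair idea above can be sketched as follows. This is an illustration under stated assumptions, not the authors' crawler or Sprobe itself: classify_peer and packet_pair_bandwidth are hypothetical names, an ordinary TCP connect stands in for raw SYN probing, and any failure other than a refused connection is treated as "offline".

```python
import socket


def classify_peer(host, port, timeout=3.0):
    """Classify a peer from the outcome of one TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "active"    # connection accepted on the application port
    except ConnectionRefusedError:
        return "inactive"      # host is up but answered the SYN with a RST
    except OSError:
        return "offline"       # no answer: timeout, unreachable host, etc.


def packet_pair_bandwidth(packet_size_bytes, dispersion_seconds):
    """Basic packet-pair relation: bottleneck bandwidth in bits per second.

    Two back-to-back packets are spread out by the bottleneck link, so the
    gap between their arrivals is the time the link needs to send one packet.
    """
    return packet_size_bytes * 8 / dispersion_seconds
```

For example, two 1500-byte packets arriving 12 ms apart suggest a bottleneck of about 1 Mbps. Real tools send many pairs and filter the samples, because cross traffic distorts individual measurements.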

9
Host Lifetime analysis
  • 20% of peers in Napster and Gnutella have an
    IP-level uptime of 93% or more
  • Napster peers have higher application uptimes
    than Gnutella peers
  • the best 20% of Napster peers have an uptime of
    83% or more and the best 20% of Gnutella peers
    have an uptime of 45% or more
  • median session duration is 60 minutes for Napster
    and Gnutella

10
Latency analysis (Gnutella)
  • 20% of peers have a latency of at most 70ms and
    20% have a latency of at least 280ms
  • correlation between downstream bottleneck
    bandwidth and latency: two clusters, for modems
    (20-60Kbps, 100-1000ms) and broadband (1Mbps,
    60-300ms)

11
Bottleneck Bandwidth Analysis (Gnutella)
  • 92% of Gnutella peers have downstream bottleneck
    bandwidth of at least 100Kbps
  • 22% of peers have upstream bottleneck bandwidth of
    100Kbps or less
    • these peers are unsuitable to serve content
12
Downloads, Uploads and Shared Files
  • relative number of downloads and uploads varies
    significantly across bandwidth classes
  • clear client/server behavior of different classes

13
Shared Files vs. Shared Data (Napster and Gnutella)
  • Strong correlation between number of files shared
    and amount of shared data in MB
  • slope of both lines is 3.7MB per file, the size
    of a typical MP3 audio file

14
Degree of Cooperation (Napster)
  • 30% of the peers report bandwidth as 64Kbps or
    less, but actually have significantly higher
    bandwidths
  • 10% of the peers reporting higher bandwidths
    (3Mbps or higher) actually have significantly
    lower bandwidth

15
Effect of P2P traffic on underlying network
  • S. Sen and J. Wang, Analyzing peer-to-peer
    traffic across large networks, IMW 2002.
  • Goal: to characterize P2P traffic at three
    aggregation levels: IP, prefix and AS
    • host distribution and host connectivity
    • traffic volume and mean bandwidth usage
    • traffic patterns over time
    • connection duration and on-time
  • Methodology: passive measurements at routers
    (port based)
  • Systems studied: FastTrack (Kazaa), Gnutella,
    Direct Connect
  • analysis of flow-level data collected from
    multiple border routers across a large tier-1
    ISP's backbone

16
Measurement Methodology
  • flow records from multiple border routers,
    matched on ports
    • 6346/6347: Gnutella
    • 1214: FastTrack
    • 411/412: Direct Connect
  • processed data to eliminate
    • private IP addresses
    • invalid AS numbers
  • final data set contained 800 million flow records
17
Datasets used for analysis
  • FastTrack is most popular in terms of number of
    hosts participating and average traffic volume
    per day
  • rapid growth of P2P traffic is mainly caused by
    increasing number of hosts in the system
  • Direct Connect systems have higher traffic volume
    per IP address

18
Host distribution analysis
  • number of IP addresses in FastTrack ranges from
    0.5 to 2 million
  • ratio of the number of IP addresses in
    FastTrack : Gnutella : Direct Connect is 150:30:1
  • Density of a prefix is the number of unique
    active IP addresses belonging to it
  • Density of an AS is the number of unique prefixes
    belonging to it
  • FastTrack hosts are distributed more densely than
    Gnutella and Direct Connect hosts (64164)
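
The two density definitions above can be computed directly from a list of active hosts. The sketch below assumes each host has already been mapped to its routing prefix and AS. The function name and the tuple layout are illustrative only.

```python
from collections import defaultdict


def densities(hosts):
    """Compute prefix and AS densities.

    hosts: iterable of (ip, prefix, asn) tuples for active P2P hosts.
    Returns (prefix_density, as_density), where prefix density counts the
    unique active IPs in a prefix and AS density counts the unique prefixes
    in an AS.
    """
    ips_per_prefix = defaultdict(set)
    prefixes_per_as = defaultdict(set)
    for ip, prefix, asn in hosts:
        ips_per_prefix[prefix].add(ip)
        prefixes_per_as[asn].add(prefix)
    prefix_density = {p: len(ips) for p, ips in ips_per_prefix.items()}
    as_density = {a: len(ps) for a, ps in prefixes_per_as.items()}
    return prefix_density, as_density
```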

19
Host connectivity analysis (FastTrack)
  • 48% of individual IPs communicate with at most
    one IP and 89% with at most 10 IPs
  • 75% of prefixes and ASes communicate with at
    least 2 prefixes or ASes
  • very few hosts have very high connectivity and
    most hosts have very low connectivity

20
Traffic volume analysis
  • CDF of traffic volume per IP/prefix/AS for
    FastTrack (one day)
  • distribution of P2P upstream traffic volume
    across three months

21
Mean bandwidth usage (FastTrack and Direct Connect)
  • FastTrack: 33% of IP addresses have mean
    downstream b/w of 56Kbps or less; 50% have mean
    upstream b/w of 56Kbps or less
  • Direct Connect: 20% of IP addresses have mean
    downstream b/w of 56Kbps or less; 33% have mean
    upstream b/w of 56Kbps or less

22
Traffic patterns over time (FastTrack)
  • traffic volume transferred every hour among
    FastTrack hosts
  • number of unique IP addresses, prefixes, ASes
    active every hour
  • number of active unique IP addresses in each bin
    of various sizes
  • system is very dynamic: hosts join and leave
    frequently

23
Connection duration and On-time (FastTrack)
  • 50% of the IPs are online for less than one
    minute/day
  • 60% of IPs, 40% of prefixes and 30% of ASes stay
    for less than 10 mins/day
  • 65% of the IPs join only once
  • at the AS and prefix level, membership is not
    very transient

24
Peer-to-Peer Topologies
  • M. Ripeanu, I. Foster and A. Iamnitchi, Mapping
    the Gnutella Network: Properties of Large-Scale
    Peer-to-Peer Systems and Implications for System
    Design, IEEE Internet Computing Journal, 2002.
  • Goal: to discover and analyze the Gnutella
    overlay topology and evaluate generated traffic
  • Methodology: active crawling
  • Datasets: Nov 2000, March 2001 and May 2001

25
Gnutella Network Growth
  • number of nodes in the largest connected
    component in the Gnutella network
  • significantly larger network found during
    Memorial Day and Thanksgiving
  • 50 times increase within 6 months

26
Distribution of node-to-node shortest paths
  • more than 95% of node pairs are at most 7 hops away
  • longest node-to-node path is 12 hops
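
A distribution like this can be derived from a crawled topology with one breadth-first search per node. The sketch below assumes an undirected overlay stored as an adjacency dict; the function name hop_distribution is hypothetical.

```python
from collections import Counter, deque


def hop_distribution(graph):
    """Count unordered node pairs by shortest-path length in hops.

    graph: dict mapping node -> list of neighbors, assumed symmetric
           (if a lists b, then b lists a).
    """
    counts = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain BFS from this source
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for d in dist.values():
            if d > 0:
                counts[d] += 1
    # Every reachable pair was counted once from each endpoint.
    return {d: c // 2 for d, c in counts.items()}
```

The largest key in the result is the longest node-to-node path, and dividing each count by the total gives the fraction of pairs at each distance.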

27
Average node connectivity
  • average number of connections per node remains
    constant at about 3.4

28
Node connectivity distribution
  • Nov 2000: node connectivity in Gnutella follows
    a power law
  • March 2001: connectivity does not look like a
    power law for all nodes
    • the power law distribution is preserved for
      nodes with more than 10 links
    • for nodes with fewer than 10 links, the
      distribution is almost constant

29
Searching on the P2P network
  • K. Sripanidkulchai, The popularity of Gnutella
    queries and its implications on scalability,
    2001,
    http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
  • Methodology: passive measurements at one or two
    peers, made part of the Gnutella network, to log
    queries and query messages routed through them
  • Data sets: Dec 2000, Jan 2001

30
Top 20 most popular query types
  • 17% of queries contained non-ASCII strings; these
    were filtered out
  • most queries for artists, adult content and file
    extensions (audio)
  • some queries for books, software etc.

31
Query popularity distribution
  • two distinct distributions of document
    popularity, with a break at query rank 100
  • most popular documents are equally popular
  • less popular documents follow a Zipf-like
    distribution, with alpha between 0.63 and 1.24
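
One common way to estimate a Zipf exponent is a least-squares line through the points (log rank, log frequency), whose slope is minus alpha. The sketch below uses that approach as an assumption. The presentation does not say how alpha was fitted in the original study, and zipf_alpha is a hypothetical name.

```python
import math


def zipf_alpha(ranks, frequencies):
    """Estimate alpha for frequency ~ C / rank**alpha by log-log regression.

    ranks:       popularity ranks (positive integers)
    frequencies: positive request counts, one per rank
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the fitted slope is -alpha
```

Because the popularity curve has a break around rank 100, the fit would be applied only to ranks beyond the break; including the flat head would pull the estimate toward zero.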

32
Deciphering proprietary P2P systems
  • N. Leibowitz, M. Ripeanu and A. Wierzbicki,
    Deconstructing the Kazaa Network, WIAPP 2003.
  • Methodology: passive content-based data
    collection at a caching server installed at the
    border of a large ISP
    • L4 switch inspects the first few packets of each
      TCP connection to detect Kazaa download traffic
    • redirects Kazaa download traffic through the
      caching server
  • focus on download traffic only, not control
    traffic (since it is encrypted)

33
Characteristics of Collected Traces
  • 38% of all download sessions do not use the
    standard Kazaa port (1214)

34
File download distribution by bytes
  • CDF of byte popularity distribution for the 10%
    and 1% most popular files
  • 0.8% of all files account for 80% of the
    generated traffic
  • the 0.1% most bandwidth-hungry files (top 1‰
    of all files) generate 50% of the traffic
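
Concentration figures of this kind come from ranking files by the bytes they generated and summing the head of that ranking. A minimal sketch, assuming one byte total per distinct file; the function name is illustrative.

```python
def traffic_share_of_top(bytes_per_file, top_fraction):
    """Fraction of all bytes generated by the top `top_fraction` of files.

    bytes_per_file: iterable with the total bytes transferred for each file
    top_fraction:   e.g. 0.008 for the top 0.8% of files
    """
    ranked = sorted(bytes_per_file, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # at least one file
    return sum(ranked[:k]) / sum(ranked)
```

Evaluating this for every value of k and plotting the results gives the CDF of byte popularity.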

35
File size distribution
  • note the log-scale on X-axis
  • 3 distinct modes
  • 100KB for pictures
  • 2-5MB for music files
  • 700MB for movies

36
Quantity and Rate of Distinct Files
  • new files seen at different time scales: every
    day, hour, minute
  • 150,000 distinct files during a 17-day period
  • daily graph: number of new files seen continued
    to decrease, but no steady-state value (rate of
    injection of files in the network) was reached
  • hourly graph: time-of-day effect
  • per-minute graph: 50 new files seen every minute
    on average

37
Rate of change of popularity of files
  • percentage of files that make it to the list of
    the N most popular files: (a) in consecutive
    intervals and (b) after T intervals, compared
    with the first list
  • measurement interval is 24 hours
  • 15% of the highly popular files remain popular
    throughout the experiment, and the rest are
    popular only for short time intervals
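
The two comparisons, (a) against the previous interval and (b) against the first interval, can be sketched as set overlaps between top-N lists. The input layout (one dict of per-file download counts per 24-hour interval) and the function names are assumptions made for illustration.

```python
def top_n(counts, n):
    """Return the set of the n most downloaded files (ties broken by name)."""
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return set(ranked[:n])


def popularity_persistence(interval_counts, n):
    """Overlap of each interval's top-n list with earlier lists.

    interval_counts: list of dicts, one per interval, file -> download count
    Returns (consecutive, vs_first): for every interval after the first, the
    fraction of its top-n files that were also in the previous interval's
    top-n, and the fraction that were in the first interval's top-n.
    """
    first = top_n(interval_counts[0], n)
    consecutive, vs_first = [], []
    for prev, cur in zip(interval_counts, interval_counts[1:]):
        cur_top = top_n(cur, n)
        consecutive.append(len(cur_top & top_n(prev, n)) / n)
        vs_first.append(len(cur_top & first) / n)
    return consecutive, vs_first
```

A consecutive overlap that stays high while the overlap with the first list decays would indicate popularity that drifts gradually rather than turning over all at once.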

38
Open Questions
  • Mapping a global snapshot of the entire Gnutella
    topology
  • Bootstrapping of peers in unstructured
    peer-to-peer systems (work in progress)
  • More efficient searching on P2P networks: efforts
    in this direction include random walks,
    Bloom-filter based techniques etc.
  • End-point privacy/anonymity is absent in most of
    these peer-to-peer networks

39
References
  • Papers covered in the seminar
    • S. Saroiu, P. Gummadi and S. Gribble, A
      Measurement Study of Peer-to-Peer File Sharing
      Systems, MMCN 2002.
    • S. Sen and J. Wang, Analyzing peer-to-peer
      traffic across large networks, IMW 2002.
    • M. Ripeanu, I. Foster and A. Iamnitchi, Mapping
      the Gnutella Network: Properties of Large-Scale
      Peer-to-Peer Systems and Implications for System
      Design, IEEE Internet Computing, 2002.
    • K. Sripanidkulchai, The popularity of Gnutella
      queries and its implications on scalability,
      2001.
    • N. Leibowitz, M. Ripeanu and A. Wierzbicki,
      Deconstructing the Kazaa Network, WIAPP 2003.
  • Papers not covered in the seminar
    • J. Chu, K. Labonte and B. Levine, Availability
      and Locality Measurements of Peer-to-Peer File
      Systems, SPIE, July 2002.
    • F. Bustamante and Y. Qiao, Friendships that
      last: Peer lifespan and its role in P2P
      protocols, WCW 2003.
    • R. Bhagwan, S. Savage and G. Voelker,
      Understanding Availability, IPTPS 2003.
    • Saroiu et al., An Analysis of Internet Content
      Delivery Systems, OSDI 2002.
    • Markatos et al., Tracing a large-scale
      Peer-to-Peer System: An hour in the life of
      Gnutella, CCGrid 2002.