Title: Understanding Peer-level Performance in BitTorrent: A Measurement Study
1. Understanding Peer-level Performance in BitTorrent: A Measurement Study
Dept. of Computer Science, University of Oregon
2. Introduction
- Peer-to-peer systems have become increasingly popular
  - Millions of simultaneous users
  - Significant percentage of Internet traffic
- BitTorrent is one of the most popular p2p applications
  - Responsible for 35% of all Internet traffic [Parker05]
- BitTorrent is important because of
  - Its popularity
  - Its impact on the network
3. BitTorrent: A brief overview
Introduction
- Scalable one-to-many peer-to-peer file distribution
- Overlay: unstructured, random, high degree
- Swarming
  - File is divided into segments
  - Segments are randomly distributed among peers (get rarest segment first)
- Contribution
  - Peers exchange segments and contribute their outgoing bandwidth
  - Incentive: Tit-for-Tat
- Tracker
  - Torrent coordinator
  - Periodic peer status updates
- Performance intuitively depends on
  - Peer properties (BW, contribution, etc.)
  - Group properties (population, content availability, churn)
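The "get rarest segment first" policy can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of any BitTorrent client: the function name `pick_rarest_segment` and the set-based view of what each neighbor holds are hypothetical.

```python
from collections import Counter


def pick_rarest_segment(have, neighbor_segments):
    """Return the segment we lack that the fewest neighbors hold, or None.

    have: set of segment indices this peer already holds.
    neighbor_segments: list of sets, one per neighbor (hypothetical layout).
    """
    counts = Counter()
    for segments in neighbor_segments:
        counts.update(segments - have)  # only count segments we still need
    if not counts:
        return None  # nothing useful is on offer
    # Rarest first; break ties by lowest index so the result is deterministic.
    return min(counts, key=lambda seg: (counts[seg], seg))


neighbors = [{0, 1, 2}, {1, 2}, {2, 3}]
choice = pick_rarest_segment({2}, neighbors)
```

Here segments 0 and 3 are each held by one neighbor, so the sketch picks segment 0. The deterministic tie-break is a simplification chosen for the example; a random choice among equally rare segments would spread requests more evenly.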
4. Previous Studies on BitTorrent
Related work
- Modeling and analytical studies
- Simulation studies
- Empirical studies
  - Capture BitTorrent system properties in operation through measurement (instrumented clients) [Legout06]
  - Group properties [Izal04]: population, average content availability, ...
  - No explicit notion of performance
  - No study on the effects of underlying factors of peer performance
- Characterization
  - Understanding group-level and peer-level properties in a torrent
- Analysis
  - What are the main factors that affect observed performance by individual peers?
5. Methodology
Methodology/Approach
- Common approach: instrumented clients
  - Detailed and flexible
  - Representative?
- Our approach: tracker logs
  - Coarse granularity (30 min)
  - Global view
- Data sets

Tracker log sets
Source   Torrents   Start Date   End Date   Reports   Sessions
RedHat   1          3/03         8/03       2M        170k
Debian   1599       2/05         3/05       32M       1268k
Games    2585       8/03         12/04      38M       4416k

Selected torrents
Torrent   File Size   Sessions, rank   Duration
RedHat    1.8GB       170k, 3rd        146d
Debian    677MB       139k, 6th        51d
Games     363MB       195k, 2nd        66d
6. Peer-level properties
Methodology
- Session
  - Set of all updates from a particular peer from its arrival until its departure
- Peer-level properties
  - Represent the peer's status during a session
  - Average download rate
  - Average upload rate
[Figure: session timeline from Session Start to Download Complete, with the studied zone (leeching) marked; curve slopes give the download and upload rates, and the average download rate is indicated]
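A session's average rates over the leeching phase could be derived from its tracker updates roughly as below. This is a sketch, not the study's actual tooling: the tuple layout and the name `leeching_rates` are assumptions, although the counters themselves (downloaded, uploaded, left) are the ones BitTorrent clients report in tracker announces.

```python
def leeching_rates(updates):
    """Average download and upload rates (bytes/s) over the leeching phase.

    updates: list of (time_s, downloaded_bytes, uploaded_bytes, left_bytes)
    tuples for one session, sorted by time (hypothetical log layout).
    """
    if not updates:
        return None
    start = updates[0]
    end = updates[-1]  # used when the download never completes
    for update in updates:
        if update[3] == 0:  # first update reporting nothing left to download
            end = update
            break
    elapsed = end[0] - start[0]
    if elapsed <= 0:
        return None  # too few updates to estimate a rate
    return ((end[1] - start[1]) / elapsed, (end[2] - start[2]) / elapsed)


# Illustrative values only, not taken from the tracker logs.
session = [
    (0, 0, 0, 3600),
    (1800, 1800, 900, 1800),
    (3600, 3600, 2700, 0),
    (5400, 3600, 6300, 0),  # seeding; outside the studied zone
]
rates = leeching_rates(session)
```

The final update is ignored because the peer is already seeding by then, which matches restricting the analysis to the leeching zone.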
7. Group-level properties
Measurement methodology
- Population, avg. content availability, churn
- Sampling approach
  - Once every t minutes
  - Last update before and first update after each sample
  - Interpolation
  - Averaging across peers
- t determines sampling resolution
  - t > average update interval
- Peer view
  - Average of the samples during the peer's download time
[Figure: peer updates along a time axis with sampling interval t]
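The sampling approach (interpolate between the last update before and the first update after each sample, then average across peers) can be sketched as follows. The data layout, the function names, and the use of linear interpolation are assumptions made for illustration; the slides do not specify the interpolation method.

```python
from bisect import bisect_right


def interpolate(updates, t):
    """Linearly interpolate a peer's value at time t, or return None if t
    falls outside the peer's session.

    updates: list of (time, value) pairs sorted by time (hypothetical layout).
    """
    times = [u[0] for u in updates]
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(times):
        return updates[-1][1]  # t coincides with the final update
    (t0, v0), (t1, v1) = updates[i - 1], updates[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def group_samples(peers, start, stop, step):
    """Sample every `step` time units: (time, population, mean value)."""
    samples = []
    t = start
    while t <= stop:
        values = [v for v in (interpolate(u, t) for u in peers) if v is not None]
        mean = sum(values) / len(values) if values else None
        samples.append((t, len(values), mean))
        t += step
    return samples


# Illustrative values: each peer reports the fraction of the file it holds.
peer_a = [(0, 0.0), (30, 0.5), (60, 1.0)]
peer_b = [(20, 0.2), (50, 0.8)]
samples = group_samples([peer_a, peer_b], 0, 60, 20)
```

Each sample yields the number of peers present (population) and the mean interpolated value across them. A peer's own view of the group would then be the average of the samples that fall within its download time.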
8. Performance metrics
Methodology
- Is download rate a good performance metric?
  - A reference is needed to evaluate a peer's download rate
- Ideally, peer performance is the utilization of incoming bandwidth
  - Accurate measurement of utilization is difficult
  - We use the maximum observed download rate as a (lower-bound) estimate of incoming bandwidth
- Standard deviation of download rate captures stability of download rate
  - Rates close to avg. -> higher performance
  - Normalization -> comparability
- Two performance metrics: utilization and stability
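A minimal sketch of the two metrics, assuming a list of per-interval download rates for one peer. Utilization follows the slide's estimate (average rate over the maximum observed rate). Normalizing the standard deviation by the average rate is an assumption of this example, since the slides do not state which normalization was used; the name `performance_metrics` is likewise hypothetical.

```python
from statistics import mean, pstdev


def performance_metrics(download_rates):
    """Return (utilization, normalized deviation) for one peer.

    download_rates: per-interval download rates observed while leeching.
    """
    avg = mean(download_rates)
    peak = max(download_rates)  # lower-bound estimate of incoming bandwidth
    if peak <= 0:
        return None  # peer never downloaded; metrics are undefined
    utilization = avg / peak
    # Assumption: normalize the deviation by the average rate.
    deviation = pstdev(download_rates) / avg
    return utilization, deviation


# Illustrative rates only, not taken from the tracker logs.
utilization, deviation = performance_metrics([50.0, 100.0, 75.0, 75.0])
```

Because the maximum observed rate can only underestimate the true incoming bandwidth, utilization computed this way is an upper bound on the true utilization.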
9. Distribution of Performance Metrics
Characterization Results/Peer-Properties
- Similar distribution across 3 different torrents
- Utilization has an almost uniform distribution
  - Nearly fixed probability density
  - 90% show closely uniform distribution
- Diverse performance
  - No dominant modes
10. Peer-View of Group Properties
Characterization Results/Peer-view of group properties
- Content availability
  - 75% of peers in RH observe an average content availability of 50%
  - No content shortage
- Avg. population
  - Very different
  - Flash crowd in RH
[Figure annotation: initial flash crowd]
11. Underlying factors
- Remember the second question
  - What are the peer- or group-level properties that primarily determine the observed performance by individual peers in a torrent?
- Performance metrics
  - Utilization and stability
- Possible underlying factors
  - Group-level properties: population, churn, content availability
  - Peer-level properties: upload rate, etc.
- Approach to identify underlying factors
  - Scatter plot
  - Linear regression (using S-Plus)
  - Spearman's rank correlation (S-Plus)
12. Scatterplots
Statistical Analysis/Scatter-plots
- Utilization vs. average group content availability
  - No obvious correlation
- Utilization vs. average group population
  - Vertical patterns
  - No obvious correlation
13. Sample Regression Result: Utilization in RedHat torrent
Statistical Analysis/Linear Regression
- Several values to consider
  - R-squared determines goodness of fit (0 to 1)
  - P-value determines the probability of obtaining a result as impressive just by chance
- Suggested techniques result in marginal improvement (R-squared)
- No single parameter with dominant effect
- Seed percentage was removed by step() -> suggests number of seeds is sufficient
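To make the R-squared value concrete, here is a sketch of a single-predictor least-squares fit. It is a simplification: the study used multiple regression with stepwise selection in S-Plus, which this example does not reproduce. The function name `fit_line` and the sample values are illustrative assumptions, not results from the study.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only.
intercept, slope, r_squared = fit_line([1, 2, 3, 4], [1, 3, 2, 4])
```

For these values the fit is y = 0.5 + 0.8x with R-squared 0.64, meaning the line explains 64% of the variance in y. An R-squared near 0 would indicate that the predictor explains almost none of it.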
14. Spearman's rank correlation coefficients
Statistical Analysis/Spearman's Rank correlation
- Highest correlation with deviation of upload rate for all torrents -> Tit-for-tat effect
- Two perf. metrics are similarly affected with opposite signs
- GA: Little correlation with util. -> unreliable metric
- DE: Slightly larger effect from content avail.
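Spearman's rank correlation is the Pearson correlation computed on ranks, so it captures monotonic relationships that need not be linear. The sketch below is a plain-Python illustration with average ranks for ties. The study used S-Plus, and the function names here are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# A monotonic but non-linear relationship still gives a coefficient of 1.
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```

The sign of the coefficient gives the direction of the relationship, which is how two metrics can be "similarly affected with opposite signs" by the same factor.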
15. Conclusion and Future Work
- Conclusions
  - No single factor determines observed performance by peers
  - Outgoing bandwidth seems to have the largest effect
    - Tit-for-tat is working
  - There often appears to be a sufficient number of seeds available (non-factor on performance)
  - Capturing comparable performance is hard
  - Performance of the peers in a torrent is rather diverse
    - Instrumented clients cannot reflect a representative picture
- Future work
  - Active monitoring of BitTorrent
  - BitTorrent overlay topology using peer exchange feature
  - Characterizing new features
    - DHT, super-seeding, peer exchange
16. Thank you!