Title: Performance Comparison of Scheduling Algorithms for Peer-to-Peer Collaborative File Distribution
1Performance Comparison of Scheduling Algorithms
for Peer-to-Peer Collaborative File Distribution
- Presented by Chan Siu Kei, Jonathan
- Supervisors Prof. VOK Li, Dr. KS Lui
2Overview
- Introduction
- Communication Model
- Analysis
- Scheduling Algorithms
- - Rarest Piece First
- - Most Demanding Node First
- - Maximum-Flow Algorithms
- Simulation Results
- Future Work
- Conclusion
3Introduction
- P2P file sharing applications are highly popular on the Internet, e.g. BitTorrent, Gnutella, Kazaa, Napster, etc.
- More scalable (faster) compared with the traditional client/server approach (e.g. FTP)
- Former research focuses on topics like overlay topology formation, peer discovery, content search, fairness and incentive issues, etc., but seldom looks into the data distribution scheduling problem
- We present the first effort and propose a novel Maximum-Flow algorithm to better solve the problem
4Communication Model
- Synchronous Scheduling
- - same transmission time for every pair of nodes
- Asymmetric Bandwidth
- - send p pieces out, receive q pieces in for
each cycle
5Notations and Definitions
- $N$ = no. of peers, $M$ = no. of file pieces
- $F = \{F_1, F_2, \ldots, F_M\}$
- $P$ = $N \times M$ possession matrix
- $P_{ij} = 1$ iff node $i$ possesses file piece $F_j$, otherwise $P_{ij} = 0$
- $P^t$ = possession matrix at time $t$
- $p = (p_1, p_2, \ldots, p_N)$ (upload limit vector)
- $q = (q_1, q_2, \ldots, q_N)$ (download limit vector)
e.g. $p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$
6Schedule (1)
- Specifies which file pieces each peer has to send out and to whom
- A possible schedule for $P^0$ with $p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$:
- - Node 1 sends piece 3 to node 2
- - Node 2 sends piece 4 to node 1
- - Node 3 sends piece 5 to node 1 and piece 5 to node 2
- - Node 4 sends piece 6 to node 2 and piece 6 to node 3
- - Node 5 sends piece 2 to node 4 and piece 7 to node 4
- Formally, we use an $N \times M$ matrix $S^k$ to represent the schedule at cycle $k$. From $S^k$, we can derive the transmission matrix $T^k$ ($N \times M$)
- e.g. Node 1 receives piece 4 from Node 2 and piece 5 from Node 3 [the resulting matrix entries are missing from the source]
7Schedule (2)
- Given $P^{k-1}$ and the schedule ($S^{k-1}$, $T^{k-1}$), the possession matrix at the next cycle $k$ is $P^k = P^{k-1} + T^{k-1}$ ($k > 0$)
- The distribution terminates after a certain number of cycles, say $k_0$, when every node holds every piece, i.e. $P^{k_0}_{ij} = 1$ for all $i, j$
- Our goal is to minimize $k_0$, which is the time needed for complete distribution
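The update from one cycle to the next can be sketched in a few lines of Python. This is only an illustration: the possession matrix `P0` below is a hypothetical 5 x 7 example chosen so that the schedule from the previous slide is valid for it, and the function name `apply_schedule` and the (sender, piece, receiver) triple format are assumptions, not part of the original slides.

```python
def apply_schedule(P, schedule, p, q):
    """Apply one cycle of (sender, piece, receiver) triples (all 1-based).

    Returns (P_next, T), where T marks the pieces received in this cycle
    and P_next = P + T. Raises ValueError if the schedule is infeasible.
    """
    n, m = len(P), len(P[0])
    T = [[0] * m for _ in range(n)]
    sent, received = [0] * n, [0] * n
    for sender, piece, receiver in schedule:
        s, j, r = sender - 1, piece - 1, receiver - 1
        if P[s][j] != 1:
            raise ValueError("sender does not hold the piece")
        if P[r][j] == 1 or T[r][j] == 1:
            raise ValueError("receiver already has or is receiving the piece")
        T[r][j] = 1
        sent[s] += 1
        received[r] += 1
    for i in range(n):
        if sent[i] > p[i] or received[i] > q[i]:
            raise ValueError("upload or download limit exceeded")
    P_next = [[P[i][j] + T[i][j] for j in range(m)] for i in range(n)]
    return P_next, T


# Hypothetical possession matrix (rows = nodes 1..5, columns = pieces 1..7).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
p = [1, 1, 2, 2, 2]   # upload limits
q = [2, 3, 2, 3, 3]   # download limits
schedule = [
    (1, 3, 2), (2, 4, 1), (3, 5, 1), (3, 5, 2),
    (4, 6, 2), (4, 6, 3), (5, 2, 4), (5, 7, 4),
]
P1, T0 = apply_schedule(P0, schedule, p, q)
```

Here `P1[0]` shows that node 1 now also holds pieces 4 and 5, and the number of 1s in `T0` equals the number of transmissions scheduled in the cycle.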
8Analysis on Lower Bound (1)
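- Let $p = (p_1, p_2, \ldots, p_N)$, $q = (q_1, q_2, \ldots, q_N)$ be the upload and download limit vectors.
- Let $r_i$ be the total no. of 0s across row $i$, i.e. $r_i = \sum_{j=1}^{M} (1 - P_{ij})$; the min. value of $k_0$ is given by (1)
- Let $c_j$ be the total no. of 1s along column $j$, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$; we can find the minimum no. of 1s along all columns, $\min_j c_j$; the min. value of $k_0$ is given by (2)
- Let $z$ be the total no. of 0s in $P$, i.e. $z = \sum_{i=1}^{N} \sum_{j=1}^{M} (1 - P_{ij})$; the min. value of $k_0$ is given by (3)
[Equations (1), (2), (3) are missing from the source]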
9Analysis on Lower Bound (2)
- Combining (1), (2), (3), the lower bound on $k_0$ is given by (4)
[Equation (4) and the worked values from (1), (2), (3) are missing from the source]
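Since the bound expressions themselves did not survive, the sketch below computes three lower bounds that follow directly from the communication model. They are valid bounds under that model, but they are assumptions about the form of (1), (2), (3), not a reproduction of the slides' equations: a download bound from $r_i$ and $q_i$, a rarest-piece bound that assumes the number of holders of a piece can grow by at most a factor of $1 + \max_i p_i$ per cycle, and a total-traffic bound from $z$. The matrix `P0` and the function name are hypothetical.

```python
def lower_bounds(P, p, q):
    """Return (download_bound, rarest_piece_bound, total_bound) on k0."""
    n, m = len(P), len(P[0])
    r = [row.count(0) for row in P]                          # missing pieces per node
    c = [sum(P[i][j] for i in range(n)) for j in range(m)]   # holders per piece
    z = sum(r)                                               # total missing entries

    def ceil_div(a, b):
        return -(-a // b)

    # Node i needs r_i pieces and can download at most q_i per cycle.
    download_bound = max(ceil_div(r[i], q[i]) for i in range(n))

    # Every holder can make at most max(p) new copies of a piece per cycle,
    # so the holder count grows by at most a factor of (1 + max(p)).
    c_min = min(c)
    if c_min == 0:
        raise ValueError("some piece is held by no node")
    rarest_piece_bound, holders = 0, c_min
    while holders < n:
        holders *= 1 + max(p)
        rarest_piece_bound += 1

    # At most min(sum(p), sum(q)) transmissions fit into one cycle.
    total_bound = ceil_div(z, min(sum(p), sum(q)))
    return download_bound, rarest_piece_bound, total_bound


# Hypothetical possession matrix (rows = nodes 1..5, columns = pieces 1..7).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
bounds = lower_bounds(P0, [1, 1, 2, 2, 2], [2, 3, 2, 3, 3])
k0_lower = max(bounds)   # combined bound: the largest of the three
```

Taking the maximum of the three values gives a combined lower bound, because each of them must hold simultaneously.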
10Rarest Piece First (RPF)
- Borrowed from the Rarest Element First algorithm employed in BitTorrent
- Rarity $c_j$ of piece $j$ is the no. of peers who have piece $j$, i.e. $c_j = \sum_{i=1}^{N} P_{ij}$
[Figure: RPF Node-Oriented ($p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$)]
[Figure: RPF Piece-Oriented ($p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$)]
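A minimal sketch of one rarest-piece-first cycle follows. The slides' figures for the node-oriented and piece-oriented variants are not available, so the reading used here is an assumption: senders are visited in index order, and each sender spends its upload slots on the rarest piece it holds that some receiver with spare download capacity still needs. Ties are broken by index, rarity is not updated within the cycle, and `P0` and `rpf_cycle` are hypothetical names.

```python
def rpf_cycle(P, p, q):
    """Greedy rarest-piece-first schedule for one cycle.

    Returns a list of (sender, piece, receiver) triples, all 1-based.
    """
    n, m = len(P), len(P[0])
    rarity = [sum(P[i][j] for i in range(n)) for j in range(m)]
    down_left = list(q)      # remaining download slots per node
    pending = set()          # (receiver, piece) pairs already scheduled
    schedule = []
    for s in range(n):
        # Pieces held by sender s, rarest first (ties broken by piece index).
        pieces = sorted((j for j in range(m) if P[s][j] == 1),
                        key=lambda j: (rarity[j], j))
        for _ in range(p[s]):
            choice = None
            for j in pieces:
                for r in range(n):
                    if (r != s and P[r][j] == 0 and down_left[r] > 0
                            and (r, j) not in pending):
                        choice = (j, r)
                        break
                if choice is not None:
                    break
            if choice is None:
                break        # nothing useful left for this sender
            j, r = choice
            pending.add((r, j))
            down_left[r] -= 1
            schedule.append((s + 1, j + 1, r + 1))
    return schedule


# Hypothetical possession matrix (rows = nodes 1..5, columns = pieces 1..7).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
schedule = rpf_cycle(P0, [1, 1, 2, 2, 2], [2, 3, 2, 3, 3])
```

On this example every upload slot is used, but a greedy pass of this kind carries no guarantee that the number of transmissions per cycle is maximal.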
11Most Demanding Node First (MDNF)
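- Demand $d_i$ of node $i$ is the no. of un-received pieces for node $i$, i.e. $d_i = \sum_{j=1}^{M} (1 - P_{ij})$
- When choosing recipients, prefer sending to the node with the largest $d_i$
[Figure: MDNF Node-Oriented ($p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$); labels in figure: 6, 6, 4, 4, 5]
[Figure: MDNF Piece-Oriented ($p = (1,1,2,2,2)$, $q = (2,3,2,3,3)$); labels in figure: 6, 6, 4, 4, 5]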
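The recipient preference can be illustrated with a short greedy sketch. As with any reconstruction from these slides, the details are assumptions: senders are visited in index order, each upload slot goes to the most demanding node that still has download capacity and needs some piece the sender holds, ties are broken by index, and demand is not updated within the cycle. `P0` and `mdnf_cycle` are hypothetical names.

```python
def mdnf_cycle(P, p, q):
    """Greedy most-demanding-node-first schedule for one cycle.

    Returns a list of (sender, piece, receiver) triples, all 1-based.
    """
    n, m = len(P), len(P[0])
    demand = [row.count(0) for row in P]   # d_i = number of missing pieces
    down_left = list(q)                    # remaining download slots per node
    pending = set()                        # (receiver, piece) pairs already scheduled
    schedule = []
    for s in range(n):
        for _ in range(p[s]):
            # Candidate receivers, most demanding first (ties by node index).
            receivers = sorted(
                (r for r in range(n) if r != s and down_left[r] > 0),
                key=lambda r: (-demand[r], r))
            choice = None
            for r in receivers:
                for j in range(m):
                    if P[s][j] == 1 and P[r][j] == 0 and (r, j) not in pending:
                        choice = (j, r)
                        break
                if choice is not None:
                    break
            if choice is None:
                break                      # nothing useful left for this sender
            j, r = choice
            pending.add((r, j))
            down_left[r] -= 1
            schedule.append((s + 1, j + 1, r + 1))
    return schedule


# Hypothetical possession matrix with demands (6, 6, 4, 4, 5).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
schedule = mdnf_cycle(P0, [1, 1, 2, 2, 2], [2, 3, 2, 3, 3])
```

Nodes 1 and 2, which have the largest demand, are served first until their download limits are exhausted.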
12Problem with RPF and MDNF
- The max. no. of transmissions for each cycle is not always achieved
- Using MDNF Piece-Oriented ($p = (2,2,2,1)$, $q = (2,1,2,2)$), only 6 transmissions can be scheduled (but the max. is 7)
[Figure: MDNF (only 6 transmissions)]
[Figure: Maximum is 7 transmissions]
13Maximum-Flow (MaxFlow)
Let $G = (V, E)$ be the flow network graph
$L = \{L_1, L_2, \ldots, L_N\}$
$R = \{R_1, R_2, \ldots, R_N\}$
14Maximum-Flow (MaxFlow)
- Edmonds-Karp Algorithm
- Find augmenting paths using BFS
- Guaranteed to find the maximum no. of transmissions in each cycle
- Complexity [expression missing from the source; the standard Edmonds-Karp bound is $O(|V| |E|^2)$]
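The sketch below shows how one cycle could be cast as a max-flow problem and solved with Edmonds-Karp (BFS augmenting paths). The network layout is an assumption, since the slide's graph did not survive: source to sender node $L_i$ with capacity $p_i$, $L_i$ to a request node $B_{kj}$ with capacity 1 whenever node $i$ holds piece $j$ and node $k$ lacks it, $B_{kj}$ to receiver node $R_k$ with capacity 1, and $R_k$ to sink with capacity $q_k$. `P0` and `max_transmissions` are hypothetical names.

```python
from collections import deque


def max_transmissions(P, p, q):
    """Maximum number of transmissions in one cycle, via Edmonds-Karp."""
    n, m = len(P), len(P[0])
    cap = {}   # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0)   # reverse residual edge

    for i in range(n):
        add_edge("s", ("L", i), p[i])
        add_edge(("R", i), "t", q[i])
    for k in range(n):
        for j in range(m):
            if P[k][j] == 0:                     # node k still needs piece j
                add_edge(("B", k, j), ("R", k), 1)
                for i in range(n):
                    if P[i][j] == 1:             # node i could supply it
                        add_edge(("L", i), ("B", k, j), 1)

    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {"s": None}
        queue = deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "t" not in parent:
            return flow
        # Find the bottleneck along the path, then augment.
        bottleneck, v = float("inf"), "t"
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = "t"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical possession matrix (rows = nodes 1..5, columns = pieces 1..7).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
best = max_transmissions(P0, [1, 1, 2, 2, 2], [2, 3, 2, 3, 3])
```

The function returns only the count; the actual schedule could be read off the saturated $L_i \to B_{kj}$ edges once the search terminates.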
15MaxFlow Counter Example
- Pure MaxFlow performance is unsatisfactory, as it
does not consider whether we can match more in
subsequent cycles
Using MaxFlow, a total of 3 cycles are needed ($p = (2,2,2,2,2)$, $q = (3,3,3,3,3)$)
Using RPF Node-Oriented, only 2 cycles are needed ($p = (2,2,2,2,2)$, $q = (3,3,3,3,3)$)
16MaxFlow - Weighted
- Put weights on both sides to give priorities to some nodes during searching
- Weights on $L_i$: sum of the no. of 0s in other peers for those pieces that peer $i$ has
- Weights on $B_{ij}$: $d_{ij}$ (sum of the no. of 0s across row $i$ and column $j$)
- E.g. $d_{42} = 7$
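The two weight definitions can be computed directly from the possession matrix. The sketch below is one reading of the wording and rests on an assumption: for $d_{ij}$ it simply adds the zeros in row $i$ to the zeros in column $j$, so the entry $P_{ij}$ itself is counted in both terms. The slide's own matrix is not available, so the values here come from a hypothetical `P0` and will not match the $d_{42}$ value quoted above.

```python
def maxflow_weights(P):
    """Return (w, d): sender weights w[i] and request weights d[(i, j)].

    w[i]      = zeros held by other peers in the columns of pieces peer i has
    d[(i, j)] = zeros in row i plus zeros in column j, for each missing P[i][j]
    Indices are 0-based.
    """
    n, m = len(P), len(P[0])
    row_zeros = [row.count(0) for row in P]
    col_zeros = [sum(1 for i in range(n) if P[i][j] == 0) for j in range(m)]
    w = [sum(col_zeros[j] for j in range(m) if P[i][j] == 1) for i in range(n)]
    d = {(i, j): row_zeros[i] + col_zeros[j]
         for i in range(n) for j in range(m) if P[i][j] == 0}
    return w, d


# Hypothetical possession matrix (rows = nodes 1..5, columns = pieces 1..7).
P0 = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
]
w, d = maxflow_weights(P0)
```

A larger `w[i]` marks a sender whose pieces are widely missing, and a larger `d[(i, j)]` marks a request that combines a needy node with a scarce piece.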
17MaxFlow Weighted Counter Example
For $p = (2,2,2,2,2)$, $q = (3,3,3,3,3)$:
Using MaxFlow Weighted, a total of 3 cycles are needed ($P^3 = \mathbf{1}$)
Using MDNF Piece-Oriented, only 2 cycles are needed ($P^2 = \mathbf{1}$)
18MaxFlow Dynamically-Weighted
- Allows the weights to be dynamically varied
within each scheduling cycle
? = (15, 14, 25, 13, 15, 10, 16, 16) and $d_{43} = 9$, which is the greatest value among all $d_{ij}$
19Simulation Results (1)
Fig. 1: Performance comparison of various scheduling algorithms (All) with varying peer sizes (file size = 100, $p_i = 2$, $q_i = 3$, equal probability for 1s and 0s)
20Simulation Results (2)
Fig. 2: Performance comparison of various scheduling algorithms (Representative) with varying peer sizes (file size = 100, $p_i = 2$, $q_i = 3$, equal probability for 1s and 0s)
21Simulation Results (3)
Fig. 3: Performance comparison of various scheduling algorithms (Representative) with varying file sizes (peer size = 10, $p_i = 2$, $q_i = 3$, equal probability for 1s and 0s)
22Future Work
- Study the case of asynchronous scheduling, where the transmission time is different for different pairs of nodes
- Study the case when the network is dynamic in nature, where peers can come and go at any instant and they may shift to communicate with different sets of peers during the distribution process
23Conclusion
- The data distribution problem in P2P networks is not well studied in previous research
- We formally define the collaborative file distribution problem with the possession and transmission matrix formulations
- We also deduce a theoretical bound for the minimum distribution time required
- We develop several types of algorithms (RPF, MDNF, MaxFlow) for solving the problem
- Our novel dynamically-weighted max-flow algorithm outperforms all other algorithms in simulations
24Thank You!