Title: Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet
1. Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet
- Allen Miu, Eugene Shih
- 6.892 Class Project
- December 3, 1999
2. Overview
- Problem Statement
- Advantages/Disadvantages
- Operation of Paraloading
- Goals of Experiment
- Setup of Experiment
- Current Results
- Summary
- Questions
3. Problem Statement: Is Paraloading Good?
Paraloading is downloading a file from multiple mirror sites in parallel.
(Diagram: Paraloader and Mirrors A, B, and C)
4. Advantages of Paraloading
- Performance is proportional to the realized aggregate bandwidth of the parallel connections
- Less prone to complete download failures compared to the single connection download
- Facilitates dynamic load balancing among parallel connections
- Facilitates reliable, out-of-order delivery (similar to Netscape)
5. Disadvantages of Paraloading
- Can be overly aggressive
- Consumes more server resources
- Overhead costs for scheduling, maintaining buffers, and sending block request messages
- Only effective when mirror servers are available
6. Step 1: Obtain Mirror List
(Diagram: Paraloader, Mirror List, and Mirrors A, B, and C)
7. Step 2: Obtain File Length
(Diagram: Paraloader and Mirrors A, B, and C)
8. Step 3: Send Block Requests
(Diagram: Paraloader and Mirrors A, B, and C)
9. Step 4: Re-order
(Diagram: Paraloader and Mirrors A, B, and C)
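Because blocks can arrive from different mirrors out of order, the client has to hold early blocks until the gap before them is filled. The following is a minimal Java sketch of that re-ordering step, assuming blocks carry a zero-based index and that "delivery" means appending to an in-memory stream. The class name ReorderBuffer and its methods are hypothetical and are not taken from the paraloader described in these slides.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of in-order delivery of out-of-order blocks.
final class ReorderBuffer {
    // Blocks that arrived before their predecessors, keyed by block index.
    private final Map<Integer, byte[]> pending = new HashMap<>();
    // Stand-in for the output file: receives blocks strictly in index order.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Index of the next block that may be delivered.
    private int nextIndex = 0;

    /** Accepts a block in any order and flushes every block that is now contiguous. */
    void accept(int blockIndex, byte[] data) {
        pending.put(blockIndex, data);
        byte[] ready;
        while ((ready = pending.remove(nextIndex)) != null) {
            out.write(ready, 0, ready.length);
            nextIndex++;
        }
    }

    /** Bytes delivered in order so far. */
    byte[] delivered() {
        return out.toByteArray();
    }

    /** Number of blocks still waiting for an earlier block. */
    int buffered() {
        return pending.size();
    }
}
```

In this sketch the buffer grows only while an earlier block is outstanding, which is one source of the buffer-maintenance overhead listed among the disadvantages.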
10. Step 5: Send Next Request
(Diagram: Paraloader and Mirrors A, B, and C)
11. Goals of Experiment
- Main goal: To compare the performance of serial and parallel downloading
- To verify the results of Rodriguez et al.
- To examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
- To gain experience with paraloading and to find out what issues are involved in designing efficient paraloading systems
12. Experiment Setup
- Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
- Files are downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
- Degree of parallelism examined: M = 1, 3, 5, 7
- Downloaded a 1MB and a 300KB file (S = 1MB, 300KB) at 1-hour intervals for 7 days
- Block size: 32KB
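To make the range-request setup concrete, the sketch below computes the HTTP/1.1 Range header value for each block of a file. It is only an illustration: the class BlockRanges and the method rangeHeaders are hypothetical names, and it assumes 1MB means 1,048,576 bytes and 32KB means 32,768 bytes, which the slides do not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the authors' code.
final class BlockRanges {
    /** Returns one Range header value per block, for example "bytes=0-32767". */
    static List<String> rangeHeaders(long fileLength, int blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<String> headers = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            // HTTP byte ranges are inclusive on both ends; the last block may be short.
            long end = Math.min(start + blockSize, fileLength) - 1;
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Assumed sizes: 1MB = 1,048,576 bytes, 32KB = 32,768 bytes.
        List<String> headers = rangeHeaders(1_048_576L, 32_768);
        System.out.println(headers.size() + " blocks, first: " + headers.get(0));
    }
}
```

Under these assumed sizes the 1MB file splits into 32 blocks, so each of the M connections issues several requests over its persistent connection.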
13. Results
- Paraloading decreases download time relative to the average single-connection case
- Speedup is far from the optimal case (aggregate bandwidth)
- Block request gaps result in wasted bandwidth
- Gaps are proportional to RTT
- Congestion at client? Possible but unlikely.
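One way to see why request gaps cost bandwidth is a simple back-of-the-envelope model. This is an assumption for illustration, not an analysis from the experiment: suppose a connection with throughput r requests the next block of size B only after the previous block has fully arrived, so it idles for about one round-trip time per block. The fraction of time spent transferring data is then roughly:

```latex
\eta \approx \frac{B/r}{B/r + \mathrm{RTT}}
% Hypothetical numbers: B = 32\,\mathrm{KB},\ r = 100\,\mathrm{KB/s},\ \mathrm{RTT} = 0.1\,\mathrm{s}
% \eta \approx \frac{0.32}{0.32 + 0.10} \approx 0.76
```

Under this model, a smaller block or a longer RTT lowers the utilization of each connection. The throughput and RTT values in the comment are hypothetical and were not measured in this project.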
14. S = 1MB
15. S = 1MB
16. S = 763K
S = 763KB, B = 30, M = 4
17. Acknowledgements
- Dave Anderson
- Dorothy Curtis
- Wendi Heinzelman
- WIND Group
18. Questions
20. Summary of Contributions
- Implemented a paraloader
- Verified that paraloading provides a performance gain in some cases
- Increasing the degree of parallelism improves overall performance
- Performance gains are not as good as those reported by Rodriguez et al.
21. Future Work
- Examine how block size affects performance gain
- Examine cost of paraloading
- Implement and test various optimization techniques
- Perform measurements at different client sites
22. Paraloading Will Not Be Effective In All Situations
- Clients should have enough slack bandwidth capacity to open more than one connection
- Parallel connections are bottleneck disjoint
- Target data on mirror servers is consistent and static
- Security and authentication services are installed where appropriate
- Data transport is reliable
- Mirror locations are quickly and easily obtained
23. Step-by-step Process of the Block Scheduling Paraloading Scheme
- 1. Obtain a list of mirror sites
- 2. Open a connection to a mirror server and obtain file length
- 3. Divide file length into blocks
- 4. Send a block request to each open connection
- 5. Wait for a response
- 6. Send a new block request to the first connection that finished downloading a block
- 7. Loop back to 5 until all blocks are retrieved
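Steps 4 through 7 above can be sketched as a small simulation. This is a hedged illustration rather than the authors' implementation: it assumes each connection takes a fixed, known time per block (the secondsPerBlock array, a hypothetical input) and it does no real network I/O. The class name BlockScheduler is also hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical simulation of block scheduling; no network traffic is generated.
final class BlockScheduler {
    /** Returns how many blocks each connection ends up fetching. */
    static int[] schedule(int numBlocks, double[] secondsPerBlock) {
        int[] blocksFetched = new int[secondsPerBlock.length];
        // Entries are {finishTime, connectionIndex}, ordered by finish time, then index.
        PriorityQueue<double[]> busy = new PriorityQueue<>((a, b) -> {
            int c = Double.compare(a[0], b[0]);
            return c != 0 ? c : Double.compare(a[1], b[1]);
        });
        int next = 0;
        // Step 4: send one block request on each open connection.
        for (int c = 0; c < secondsPerBlock.length && next < numBlocks; c++, next++) {
            busy.add(new double[] {secondsPerBlock[c], c});
            blocksFetched[c]++;
        }
        // Steps 5 to 7: the first connection to finish receives the next block request.
        while (next < numBlocks) {
            double[] done = busy.poll();
            int c = (int) done[1];
            busy.add(new double[] {done[0] + secondsPerBlock[c], c});
            blocksFetched[c]++;
            next++;
        }
        return blocksFetched;
    }

    public static void main(String[] args) {
        // Assumed per-block times: a fast mirror (1.0 s) and a slow mirror (2.5 s).
        System.out.println(Arrays.toString(schedule(7, new double[] {1.0, 2.5})));
    }
}
```

In this sketch the faster connection naturally ends up with more blocks, which is the dynamic load balancing that the scheme relies on.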
24. Paraloading is Not a Well-studied Concept
- Byers et al. proposed using Tornado codes to facilitate paraloading.
- Rodriguez et al. proposed the block scheduling paraloading scheme that is used in our project