1
Rethinking Internet Bulk Data Transfers
  • Krishna P. Gummadi
  • Networked Systems Group
  • Max-Planck Institute for Software Systems
  • Saarbruecken, Germany

2
Why rethink Internet bulk data transfers?
  • The demand for bulk content in the Internet is
    rapidly rising
  • e.g., movies, home videos, software, backups,
    scientific data
  • Yet, bulk transfers are expensive and inefficient
  • postal mail is often preferred for delivering
    bulk content
  • e.g., Netflix mails DVDs; even Google uses sneaker-nets
  • The problem is inherent to the current Internet
    design
  • it was designed for connecting end hosts, not
    content delivery
  • How to design a future network for bulk content
    delivery?

3
Lots of demand for bulk data transfers
  • Bulk transfers make up a large fraction of Internet traffic
  • they are few in number, but account for a lot of
    bytes
  • Lots more bulk content is transferred using
    postal mail
  • Netflix ships more bytes on DVDs than the Internet carries in the U.S.
  • Growing demand to transfer all bulk data over the
    Internet


Share of bulk (> 100 MB) TCP flows:

            % of flows   % of bytes
  Toronto      0.008         36
  Munich       0.003         44
  Abilene      0.08          50
4
Internet bulk transfers are expensive and
inefficient
  • Attempted 1.2 TB transfer between U.S. and
    Germany
  • between Univ. of Washington and Max-Planck
    Institute
  • MPI's happy hours were UW's unhappy hours
  • network transfer would have lasted several weeks
  • Ultimately, used physical disks and DHL to
    transfer data
  • Other examples
  • Amazon's on-demand Simple Storage Service (S3)
  • storage: $0.15/GB, data transfer: $0.28/GB
  • ISPs routinely rate-limit bulk transfers
  • even academic networks rate-limit YouTube and
    file-sharing
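
A back-of-the-envelope check makes "several weeks" concrete; the 5 Mbps sustained wide-area rate below is an assumed, illustrative figure, not one from the talk:

```python
# Time to move 1.2 TB at an assumed sustained wide-area rate of 5 Mbps.
# (The rate is an illustrative guess; actual UW-MPI throughput isn't given.)
data_bits = 1.2e12 * 8          # 1.2 TB expressed in bits
rate_bps = 5e6                  # assumed sustained throughput: 5 Mbps
days = data_bits / rate_bps / 86400
print(f"{days:.0f} days")       # ~22 days, i.e., several weeks
```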

5
The Internet is not designed for bulk content
delivery
  • For many bulk transfers, users primarily care
    about completion time
  • i.e., when the last data packet was delivered
  • But Internet routing and transport focus on individual packets
  • such as their latency, jitter, loss, and ordered
    delivery
  • What would an Internet for bulk transfers focus
    on?
  • optimizing routes and transfer schedules for efficiency and cost
  • finding routes with the most bandwidth, not minimum delay (see the sketch after this list)
  • delaying transfers to save cost, if deadlines can
    be met
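
As one illustration of bandwidth-oriented route selection, here is a minimal widest-path (maximum-bottleneck-bandwidth) variant of Dijkstra's algorithm; the topology and capacities are invented for the example:

```python
import heapq

def widest_path(graph, src, dst):
    """Dijkstra variant that maximizes the minimum link capacity along the
    path (the bottleneck bandwidth) instead of minimizing delay."""
    best = {src: float("inf")}
    heap = [(-best[src], src, [src])]          # max-heap keyed on bottleneck
    while heap:
        neg_bw, node, path = heapq.heappop(heap)
        if node == dst:
            return -neg_bw, path
        for nbr, capacity in graph.get(node, []):
            bottleneck = min(-neg_bw, capacity)
            if bottleneck > best.get(nbr, 0):
                best[nbr] = bottleneck
                heapq.heappush(heap, (-bottleneck, nbr, path + [nbr]))
    return 0, None

# Toy topology with made-up capacities in Gbps (names are illustrative).
g = {"UW": [("ATT", 10), ("DT", 2)], "ATT": [("MPI", 4)], "DT": [("MPI", 9)]}
print(widest_path(g, "UW", "MPI"))   # -> (4, ['UW', 'ATT', 'MPI'])
```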

6
Rest of the talk
  • Identify opportunities for inexpensive and
    efficient data transfers
  • Explore network designs that exploit the
    opportunity

7
Opportunity 1: Spare network capacity
  • Average utilization of most Internet links is
    very low
  • backbone links typically run at 30% utilization
  • peering links are used more, but nowhere near
    capacity
  • residential access links are unused most of the
    time

8
Opportunity 1: Spare network capacity
  • Why do ISPs over-provision?
  • we do not know how to manage congested networks
  • TCP reacts badly to congestion
  • Over-provisioning avoids congestion and guarantees QoS!
  • backbone ISPs offer strict SLAs on reliability,
    jitter, and loss
  • customers buy bigger access pipes than they need
  • Why not use spare link capacities for bulk
    transfers?

9
Opportunity 2: Large diurnal variation in network load
  • Internet traffic shows large diurnal variation
  • much of it driven by human activity

10
Opportunity 2: Large diurnal variation in network load
  • ISPs charge customers based on their peak load
  • the 95th percentile of load, averaged over 5-minute intervals (computed as in the sketch after this list)
  • Because networks have to be built to withstand
    peak load
  • Why not delay bulk transfers till periods of low
    load?
  • reduces peak load and evens out load variations
  • predictable load is easier for ISPs to handle
  • lower costs for customers
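
The billing rule takes only a few lines to state precisely; a minimal sketch on a synthetic traffic trace:

```python
# 95th-percentile billing: sample load every 5 minutes, sort, and bill the
# sample at the 95% mark, ignoring the top 5% of peaks. Trace is synthetic.
import random

samples = [random.uniform(0.2, 1.0) for _ in range(8640)]  # 30 days x 288/day, Gbps
samples.sort()
billed = samples[int(0.95 * len(samples)) - 1]
print(f"billed at {billed:.2f} Gbps; absolute peak was {samples[-1]:.2f} Gbps")
```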

11
Opportunity 2: Large diurnal variation in network load
  • Can significantly reduce peak load by delaying bulk traffic
  • allowing a 2-hour delay can decrease peak load by 50% (the sketch below illustrates the mechanism on toy data)
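
A minimal sketch of the mechanism on invented hourly profiles; the 50% figure in the bullet comes from the talk's traces, not from this toy example:

```python
# Toy illustration: shift deferrable bulk demand to the least-loaded slot
# within a 2-hour window and compare peaks. The hourly profiles are invented.
interactive = [2, 2, 2, 2, 8, 8, 2, 2]   # Gbps, cannot be delayed
bulk        = [0, 0, 0, 0, 4, 4, 0, 0]   # Gbps, tolerates a 2-hour delay

load = interactive[:]
for hour, demand in enumerate(bulk):
    window = range(hour, min(hour + 3, len(load)))      # now or next 2 hours
    load[min(window, key=lambda h: load[h])] += demand  # fill the valley

print("peak, bulk sent immediately:", max(i + b for i, b in zip(interactive, bulk)))
print("peak, bulk delayed <= 2 h:  ", max(load))        # 12 -> 8 Gbps here
```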

12
Opportunity 3: Non-shortest paths with the most bandwidth
  • The Internet picks a single short route between
    end hosts
  • Why not use non-shortest, multiple paths with the most bandwidth?

13
Rest of the talk
  • Identify opportunities for inexpensive and
    efficient data transfers
  • Explore network designs that exploit the
    opportunity

14
Many potential network designs
  • How to exploit spare bandwidth for bulk
    transfers?
  • network-level differentiation using router queues? (one option is sketched after this list)
  • transport-level separation using TCP-NICE?
  • How and where to delay bulk data for scheduling?
  • at every router? at selected routers at peering points?
  • storage-enabled routers? data centers attached to
    routers?
  • How to select non-shortest multiple paths?
  • separately within each ISP? or across different
    ISPs?
  • But all designs share a few key architectural features
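
One possible form of the first design option, network-level differentiation: two router queues with strict priority for interactive traffic. This is a sketch of one candidate mechanism, not the talk's chosen design:

```python
from collections import deque

# Strict-priority isolation: bulk packets are served only when no
# interactive packet is waiting, so bulk traffic never delays the rest.
interactive_q, bulk_q = deque(), deque()

def enqueue(packet, is_bulk):
    (bulk_q if is_bulk else interactive_q).append(packet)

def dequeue():
    if interactive_q:                 # interactive traffic always goes first
        return interactive_q.popleft()
    return bulk_q.popleft() if bulk_q else None

enqueue("bulk-1", True); enqueue("web-1", False)
print(dequeue(), dequeue())           # -> web-1 bulk-1
```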

15
Key architectural features
  • Isolation: separate bulk traffic from the rest
  • Staging: break end-to-end transfers into multiple stages

[Figure: a staged transfer crossing AT&T and DT, relayed through stages S1-S6; UW → MPI: S1, S2, S3, S4, S5, S6]
16
How to guarantee end-to-end reliability?
  • How to recover when an intermediate stage fails?
  • option 1: the client times out and asks the sender to resend
  • option 2: the client directly queries the network to locate the data
  • analogous to FedEx or DHL package tracking (a minimal sketch follows the figure below)

[Figure: the UW → MPI transfer with a failed stage (marked X); Content-Addressed Tracking (CAT) lets the client locate a surviving copy of the data at another stage, e.g., S5]
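
A hypothetical sketch of option 2: the network keeps a content-addressed map from a block's hash to the stages currently holding it. The register/locate names and stage IDs below are invented for illustration:

```python
import hashlib

# CAT sketch: content hash -> set of stages holding the data, so a client
# can find a surviving copy after an intermediate stage fails.
tracker = {}

def content_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def register(data: bytes, stage: str):
    tracker.setdefault(content_id(data), set()).add(stage)

def locate(cid: str) -> set:
    return tracker.get(cid, set())

blob = b"stand-in for a bulk data block"
register(blob, "S2"); register(blob, "S5")
print(locate(content_id(blob)))       # recover from S2 or S5, not from UW
```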
17
CAT allows opportunistic multiplexing of transfer
stages
[Figure: two transfers sharing stages via CAT. UW → MPI: S1, S2, S3, S4, S5, S6; UW → Fraunhofer: S1, S2, S3, S4, S5, S7]
  • Eliminates redundant data transfers, independently of the end hosts
  • Huge potential benefits for popular data transfers
  • e.g., 80% of P2P file-sharing traffic is repeated bytes (see the sketch after this list)
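
A sketch of the multiplexing idea, reusing the hypothetical register/locate helpers above: before shipping a block over the wide area, check whether some stage already holds it and skip the redundant transfer if so:

```python
def send_block(data: bytes, dest_stage: str) -> str:
    cid = content_id(data)
    holders = locate(cid)
    if holders:                                   # duplicate content
        return f"{cid[:8]}: already at {holders}, no wide-area transfer"
    register(data, dest_stage)
    return f"{cid[:8]}: transferred over the wide area to {dest_stage}"

print(send_block(b"popular file", "S6"))          # first copy goes end to end
print(send_block(b"popular file", "S7"))          # repeat is served in-network
```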

18
Service model of the new architecture
  • Our architecture has 3 key features
  • isolation, staging, and content-addressed tracking
  • Data transfers in our architecture are not
    end-to-end
  • packets are not individually acked by end hosts
  • Our architecture provides an offline data transfer service (a hypothetical interface is sketched after this list)
  • the sender pushes the data into the network
  • the network delivers the data to the receiver
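
A hypothetical interface for this offline service model: the sender pushes data into the network and disconnects, and the network, not the end hosts, is responsible for eventual delivery. All names here are invented:

```python
import uuid

class BulkTransferService:
    def __init__(self):
        self.staged = {}                        # tracking ID -> (receiver, data)

    def push(self, data: bytes, receiver: str, deadline_hours: int) -> str:
        """Sender side: hand the data to the network, get a tracking ID."""
        tid = str(uuid.uuid4())
        self.staged[tid] = (receiver, data)     # held at in-network stages
        return tid

    def deliver(self, tid: str) -> bytes:
        """Network side: push the data to the receiver when scheduled."""
        receiver, data = self.staged.pop(tid)
        return data

svc = BulkTransferService()
tid = svc.push(b"nightly backup", receiver="MPI", deadline_hours=48)
print(len(svc.deliver(tid)), "bytes delivered; no end-to-end acks involved")
```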

19
Evaluating the architecture
  • CERN's LHC workload
  • 15 petabytes of data per year, i.e., ~41 TB/day
  • transfer data between the Tier-1 site at FNAL and 6 Tier-2 sites

20
Benefits of our architecture
  • The Abilene (10 Gbps) backbone was deemed insufficient
  • the plan was to use NSF's new TeraGrid (40 Gbps) network
  • Could our new architecture help?

[Figure: maximum time (days) required to transfer 41 TB over Abilene]
  • With our architecture, the CERN data could be transferred using less than 20% of the spare bandwidth in Abilene! (the arithmetic is checked below)
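
A quick check of the arithmetic behind these numbers (the under-20% claim additionally depends on how the talk spreads load over multiple Abilene links):

```python
# Sanity check of the workload arithmetic: 15 PB/year is ~41 TB/day, which
# is a sustained rate of ~3.8 Gbps before multi-path spreading.
bytes_per_day = 15e15 / 365
gbps = bytes_per_day * 8 / 86400 / 1e9
print(f"{bytes_per_day / 1e12:.0f} TB/day = {gbps:.1f} Gbps sustained")
```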

21
Are bulk transfers worth a new network
architecture?
  • The economics of distributed computing
  • computation costs falling at the Moore's-law rate
  • storage costs falling faster than Moore's law
  • networking costs falling slower than Moore's law
  • Wide-area bandwidth is the most expensive
    resource today
  • strong incentive to put computation near data
  • Lowering bulk transfer costs enables new
    distributed systems
  • encourages data replication near computation
  • e.g., "the Web on my disk"; Google might become PC software

22
Conclusion
  • A large and growing demand for bulk data in the
    Internet
  • Yet, bulk transfers are expensive and inefficient
  • The Internet is not designed for bulk content
    transfers
  • Proposed an offline data transfer architecture
  • to exploit spare resources, to schedule
    transfers, to route efficiently, and to eliminate
    redundant transfers
  • It relies on isolation, staging, and content-addressed tracking
  • preliminary evaluation shows lots of promise

23
Acknowledgements
  • Students at MPI-SWS
  • Massimiliano Marcon
  • Andreas Haeberlen
  • Marcel Dischinger
  • Faculty
  • Stefan Savage, UCSD
  • Amin Vahdat, UCSD
  • Peter Druschel, MPI-SWS