Achieving High Throughput on Fast Networks (Bandwidth Challenges and World Records)

1
Achieving High Throughput on Fast Networks
(Bandwidth Challenges and Internet World Records)
Presented at the ICTP "Optimization Technologies for Low Bandwidth Networks" workshop, Trieste, October 2006
  • Les Cottrell, Yee-Ting Li
  • Stanford Linear Accelerator Center

2
Driver: LHC Network Requirements
3
Internet2 Land Speed Records SC2003-2005
  • Record is the product of distance × speed using TCP with routers
  • IPv4 multi-stream record with FAST TCP: 6.86 Gbps × 27 kkm, Nov 2004
  • PCI-X 2.0: 9.3 Gbps, Caltech-StarLight, Dec 2005
  • PCI Express: 9.8 Gbps, Caltech-Sunnyvale, July 2006

(Chart by H. Newman: Internet2 LSRs, blue = HEP; best 7.2 Gbps × 20.7 kkm; throughput in Petabit-m/sec)
4
Internet2 Land Speed Record 02-03 Outline
  • Breaking the Internet2 Land Speed Record
    (2002/03)
  • Not to be confused with:
  • "Rocket-powered sled breaks 1982 world land speed record", San Francisco Chronicle, May 1, 2003
  • Who did it?
  • What was done?
  • How was it done?
  • What was special about this anyway?
  • Who needs it?

5
Who did it? Collaborators and sponsors
  • Caltech Harvey Newman, Steven Low, Sylvain
    Ravot, Cheng Jin, Xiaoling Wei, Suresh Singh,
    Julian Bunn
  • SLAC Les Cottrell, Gary Buhrmaster, Fabrizio
    Coccetti (SISSA)
  • LANL Wu-chun Feng, Eric Weigle, Gus Hurwitz,
    Adam Englehart
  • CERN Olivier Martin, Paolo Moroni
  • ANL Linda Winkler
  • DataTAG, StarLight, TeraGrid, SURFnet,
    NetherLight, Deutsche Telecom, Information
    Society Technologies
  • Cisco, Level(3), Intel
  • DoE, European Commission, NSF

6
What was done?
  • Set a new Internet2 TCP land speed record: 10,619 Tbit-meters/sec (see http://lsr.internet2.edu/)
  • With 10 streams achieved 8.6 Gbps across the US
  • Beat the 1 Gbps limit for a single TCP stream across the Atlantic; transferred a TByte in an hour

One Terabyte transferred in less than one hour
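For orientation, the implied single-stream rate behind that last claim is a one-line calculation (decimal TB assumed):

```python
# Implied rate for 1 TByte moved in one hour (decimal terabyte assumed).
print(f"{1e12 * 8 / 3600 / 1e9:.2f} Gbit/s")   # ~2.22 Gbit/s, comfortably above 1 Gbit/s
```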
7
Typical Components
(Photo: disk servers, compute servers, GSR router, earthquake strap, heat sink; note the bootees)
  • CPU: Pentium 4 (Xeon) at 2.4 GHz
  • For GE used SysKonnect NICs
  • For 10GE used Intel NICs
  • Linux 2.4.19 or 2.4.20
  • Routers:
  • Cisco GSR 12406 with OC192/POS, 1GE and 10GE server interfaces (loaned; list price > $1M)
  • Cisco 760x
  • Juniper T640 (Chicago)
  • Level(3) OC192/POS fibers (loaned; SNV-CHI monthly lease cost $220K)
  • All borrowed, off the shelf
8
Challenges
  • PCI bus limitations (66 MHz × 64 bit ≈ 4.2 Gbits/s at best)
  • At 10 Gbits/s and 180 msec RTT, requires a 500 MByte window
  • Slow-start problem: at 1 Gbits/s it takes about 5-6 secs for a 180 msec link,
  • i.e. if we want 90% of the measurement in the stable (non-slow-start) phase, we need to measure for 60 secs
  • need to ship > 700 MBytes at 1 Gbits/s

Sunnyvale-Geneva, 1500Byte MTU, stock TCP
  • After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s (see the back-of-envelope sketch below)
  • i.e. a loss rate of 1 in 2 Gpkts (3 Tbits), or a BER of 1 in 3.6×10^12

(Plot: throughput in Mbits/s vs. time in seconds)
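The window and recovery figures above follow from the bandwidth-delay product and Reno's one-segment-per-RTT growth. A minimal back-of-envelope sketch, assuming a 1500 byte MTU (~1460 bytes of TCP payload) and the 180 ms RTT quoted above; the slide's exact figures use slightly different assumptions and rounding:

```python
# Back-of-envelope TCP sizing (a sketch, not a precise model).
MSS = 1460          # assumed TCP payload bytes per 1500 B frame
RTT = 0.180         # Sunnyvale-Geneva round-trip time in seconds

def bdp_bytes(rate_bps, rtt_s=RTT):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return rate_bps * rtt_s / 8

def reno_recovery_s(rate_bps, rtt_s=RTT, mss=MSS):
    """Lower bound on Reno recovery after one loss: regain cwnd/2 segments at
    one MSS per RTT. Timeouts and delayed ACKs can multiply this by 2-4x."""
    cwnd_pkts = bdp_bytes(rate_bps, rtt_s) / mss
    return (cwnd_pkts / 2) * rtt_s

print(f"BDP at 10 Gbit/s, 180 ms: {bdp_bytes(10e9)/1e6:.0f} MB "
      "(buffers are often provisioned at ~2x BDP, hence the ~500 MB window)")
print(f"Reno recovery at  1 Gbit/s: >= {reno_recovery_s(1e9)/60:.0f} minutes")
print(f"Reno recovery at 10 Gbit/s: >= {reno_recovery_s(10e9)/3600:.1f} hours")
```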
9
What was special? 1/2
  • End-to-end application-to-application, single and
    multi-streams (not just internal backbone
    aggregate speeds)
  • TCP has not run out of steam yet; it scales from modem speeds into the multi-Gbits/s region
  • TCP is well understood and mature, with many good features (reliability etc.)
  • Friendly on shared networks
  • New TCP stacks only need to be deployed at the sender (see the sketch after this list)
  • Often just a few data sources, many destinations
  • No modifications to backbone routers etc
  • No need for jumbo frames
  • Used Commercial Off The Shelf (COTS) hardware and
    software
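One reason sender-only deployment works: on a Linux sender the congestion-control algorithm can be chosen per socket, with no change at the receiver or in the routers. A minimal sketch, assuming Linux and Python 3.6+ and that the chosen module (here "htcp", an assumption) is available; FAST TCP itself was a separate research kernel patch, not selectable this way:

```python
import socket

# Sender-side selection of a TCP congestion-control module (Linux only).
# The receiver and the network need no changes; loss response is driven
# entirely by the sender's stack.
ALGO = b"htcp"   # assumption: listed in /proc/sys/net/ipv4/tcp_available_congestion_control

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, ALGO)

# Confirm which algorithm the kernel actually attached to this socket.
in_use = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print("congestion control in use:", in_use.split(b"\x00")[0].decode())
sock.close()
```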

10
What was special? 2/2
  • Raise the bar on expectations for applications
    and users
  • Some applications can use Internet backbone
    speeds
  • Provide planning information
  • The network is looking less like a bottleneck and
    more like a catalyst/enabler
  • Reduce the need to co-locate data and CPU
  • No longer need to literally ship truck- or plane-loads of data around the world
  • Worldwide collaborations of people working with
    large amounts of data become increasingly possible

11
Who needs it?
  • HENP is the current driver
  • Multi-hundreds of Mbits/s and multi-TByte files/day transferred across the Atlantic today
  • SLAC's BaBar experiment already has almost a PByte stored
  • Tbits/s and ExaBytes (10^18 bytes) stored in a decade
  • Data intensive science
  • Astrophysics, Global weather, Bioinformatics,
    Fusion, seismology
  • Industries such as aerospace, medicine, security
  • Future
  • Media distribution
  • A Gbit/s ≈ 2 full-length DVD movies/minute
  • 2.36 Gbits/s is equivalent to (checked in the sketch after this list):
  • Transferring a full CD in 2.3 seconds (i.e. 1565
    CDs/hour)
  • Transferring 200 full length DVD movies in one
    hour (i.e. 1 DVD in 18 seconds)
  • Will sharing movies be like sharing music today?
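A quick sanity check of those equivalences, assuming ~680 MB per CD and ~4.7 GB per single-layer DVD (the slide's 18 s / 200-per-hour DVD figures presumably include a somewhat larger image or transfer overhead):

```python
# Rough media-transfer equivalences at 2.36 Gbit/s (media sizes are assumptions).
RATE = 2.36e9          # bits per second
CD   = 680e6 * 8       # bits in a ~680 MB CD
DVD  = 4.7e9 * 8       # bits in a ~4.7 GB single-layer DVD

print(f"one CD in  {CD / RATE:.1f} s  (~{3600 * RATE / CD:.0f} CDs/hour)")
print(f"one DVD in {DVD / RATE:.0f} s   (~{3600 * RATE / DVD:.0f} DVDs/hour)")
```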

12
When will it have an impact?
  • ESnet traffic doubling/year since 1990
  • SLAC capacity increasing by 90%/year since 1982
  • SLAC Internet traffic increased by a factor of 2.5 in the last year
  • International throughput increased by a factor of 10 in 4 years
  • So traffic increases by a factor of 10 every 3.5 to 4 years (see the sketch after this list), so in:
  • 3.5 to 5 years: 622 Mbps → 10 Gbps
  • 3-4 years: 155 Mbps → 1 Gbps
  • 3.5-5 years: 45 Mbps → 622 Mbps
  • 2010-2012:
  • 100s of Gbits/s for high-speed production network end connections
  • 10 Gbps will be mundane for R&E and business
  • Home broadband doubling every year, 100 Mbits/s by end of decade (if it doubles each year then 10 Mbits/s by 2012)?
  • Aggressive goal: 1 Gbps to all Californians by 2010
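The "factor 10 in 3.5 to 4 years" extrapolation is just the growth rates above run through a logarithm; a small sketch:

```python
import math

# Years to grow by a factor of 10 at a given annual growth factor.
def years_to_10x(annual_factor):
    return math.log(10) / math.log(annual_factor)

print(f"doubling each year (x2.0): 10x in {years_to_10x(2.0):.1f} years")
print(f"growing 90%/year   (x1.9): 10x in {years_to_10x(1.9):.1f} years")
print(f"x2.5 in one year:          10x in {years_to_10x(2.5):.1f} years")
```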

13
Impact
  • Caught technical press attention
  • On TechTV and ABC Radio
  • Reported in places such as CNN, the BBC, Times of
    India, Wired, Nature
  • Reported in English, Spanish, Portuguese, French,
    Dutch, Japanese
  • Guinness Book of Records (2004)

14
SC Bandwidth Challenge
  • Bandwidth Challenge
  • Yearly challenge at the SuperComputing (SC) show
  • The Bandwidth Challenge highlights the best and
    brightest in new techniques for creating and
    utilizing vast rivers of data that can be carried
    across advanced networks.
  • Transfer as much data as possible using real
    applications over a 2 hour window

15
BWC History
  • 2002, Extreme Bandwidth: Caltech, SLAC, CERN
  • 12.4 Gbits/s peak, 2nd place; LBNL video stream with UDP won
  • 2003, Bandwidth Lust: Caltech, SLAC, LANL, CERN, Manchester, NIKHEF
  • 23 Gbits/s peaks (6.6 TBytes in < 1 hour), 1st place
  • 2004, Terabyte data transfers for physics: Caltech, SLAC, FNAL
  • Achieved 101 Gbits/s, 1st place
  • 2005, Global Lambda for Particle Physics:
  • Sustained > 100 Gbits/s for many hours, peak > 150 Gbits/s, 1st place

16
BWC Overview 2005
  • Distributed TeraByte Particle Physics Data Sample
    Analysis
  • Demonstrated high speed transfers of particle
    physics data between host labs and collaborating
    institutes in the USA and worldwide. Using state
    of the art WAN infrastructure and Grid Web
    Services based on the LHC Tiered Architecture,
    they showed real-time particle event analysis
    requiring transfers of Terabyte-scale datasets.
  • In detail, during the bandwidth challenge (2
    hours)
  • 131 Gbps measured by the SCInet BWC team on 17 of our 21 waves (15 minute average); > 150 Gbps on all 22
  • 95.37 TB of data transferred (average rate checked in the sketch after this list)
  • (3.8 DVDs per second)
  • 90-150Gbps (peak 150.7Gbps)
  • On day of challenge
  • Transferred 475TB practising (waves were
    shared, still tuning applications and hardware)
  • Peak one-way utilisation observed on a single link was 9.1 Gbps (Caltech) and 8.4 Gbps (SLAC)
  • Also wrote to StorCloud:
  • SLAC wrote 3.2 TB in 1649 files during the BWC
  • Caltech: 6 GB/sec with 20 nodes
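For reference, the average rate implied by the headline volume over the 2 hour window is a trivial check (decimal terabytes assumed, as on the slide):

```python
# Average rate implied by 95.37 TB moved during the 2-hour challenge window.
bits = 95.37 * 1e12 * 8
seconds = 2 * 3600
print(f"average: {bits / seconds / 1e9:.0f} Gbit/s")   # ~106 Gbit/s, within the 90-150 Gbps range
```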

17
Participants Worldwide
  • Caltech/HEP/CACR/ NetLab Harvey Newman, Julian
    Bunn - Contact, Dan Nae, Sylvain Ravot, Conrad
    Steenberg, Yang Xia, Michael Thomas
  • SLAC/IEPM Les Cottrell, Gary Buhrmaster,
    Yee-Ting Li, Connie Logg
  • FNAL Matt Crawford, Don Petravick, Vyto
    Grigaliunas, Dan Yocum
  • University of Michigan Shawn McKee, Andy Adamson,
    Roy Hockett, Bob Ball, Richard French, Dean
    Hildebrand, Erik Hofer, David Lee, Ali Lotia, Ted
    Hanss, Scott Gerstenberger
  • U Florida Paul Avery, Dimitri Bourilkov,
  • University of Manchester Richard Hughes-Jones
  • CERN, Switzerland David Foster
  • KAIST, Korea Yusung Kim,
  • Kyungpook University, Korea, Kihwan Kwon,
  • UERJ, Brazil Alberto Santoro,
  • UNESP, Brazil Sergio Novaes,
  • USP, Brazil Luis Fernandez Lopez
  • GLORIAD, USA Greg Cole, Natasha Bulashova

(Logos: Sun, Chelsio, ESnet, Neterion)
18
(No Transcript)
19
Networking Overview
  • We had 22 x 10 Gbits/s waves to the Caltech and SLAC/FNAL booths. Of these:
  • 15 waves to the Caltech booth (from Florida (1), Korea/GLORIAD (1), Brazil (1 x 2.5 Gbits/s), Caltech (2), LA (2), UCSD, CERN (2), U Michigan (3), FNAL (2)).
  • 7 x 10Gbits/s waves to the SLAC/FNAL booth (2
    from SLAC, 1 from the UK, and 4 from FNAL).
  • The waves were provided by Abilene, Canarie,
    Cisco (5), ESnet (3), GLORIAD (1), HOPI (1),
    Michigan Light Rail (MiLR), National Lambda Rail
    (NLR), TeraGrid (3) and UltraScienceNet (4).

20
Network Overview
21
Hardware (SLAC only)
  • At SLAC:
  • 14 x 1.8 GHz Sun v20z (dual Opteron)
  • 2 x Sun 3500 disk trays (2 TB of storage)
  • 12 x Chelsio T110 10Gb NICs (LR)
  • 2 x Neterion/S2io Xframe I (SR)
  • Dedicated Cisco 6509 with 4 x 4-port 10GE blades
  • At SC05:
  • 14 x 2.6 GHz Sun v20z (dual Opteron)
  • 10 QLogic HBAs for StorCloud access
  • 50 TB storage at SC05 provided by 3PAR (shared with Caltech)
  • 12 x Neterion/S2io Xframe I NICs (SR)
  • 2 x Chelsio T110 NICs (LR)
  • Shared Cisco 6509 with 6 x 4-port 10GE blades

22
Hardware at SC05
  • Industrial fans to keep things cool around the back

23
Software
  • BBCP (BaBar File Copy)
  • Uses ssh for authentication
  • Multiple-stream capable (the idea is sketched after this list)
  • Features rate synchronisation to reduce byte
    retransmissions
  • Sustained over 9Gbps on a single session
  • XrootD
  • Library for transparent file access (standard
    unix file functions)
  • Designed primarily for LAN access (transaction
    based protocol)
  • Managed over 35Gbit/sec (in two directions) on 2
    x 10Gbps waves
  • Transferred 18TBytes in 257,913 files
  • dCache
  • 20Gbps production and test cluster traffic
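The multi-stream idea these tools exploit is simply to split one transfer across several TCP connections, so that a single loss only halves one stream's window. The following is a conceptual sender-side sketch of that idea, not bbcp's actual implementation; the hostname, port, stream count and file name are made up for illustration, and a matching receiver would reassemble the ranges by offset:

```python
import os
import socket
import struct
import threading

# Conceptual multi-stream sender: each stream gets a contiguous byte range of
# the file and its own TCP connection, so a loss on one connection only
# throttles 1/N of the transfer.
HOST, PORT, STREAMS = "receiver.example.org", 5000, 8   # illustrative values

def send_range(path, offset, length):
    with socket.create_connection((HOST, PORT)) as s, open(path, "rb") as f:
        s.sendall(struct.pack("!QQ", offset, length))   # tell the receiver where these bytes go
        f.seek(offset)
        remaining = length
        while remaining:
            chunk = f.read(min(1 << 20, remaining))     # 1 MB reads
            s.sendall(chunk)
            remaining -= len(chunk)

def send_file(path, streams=STREAMS):
    size = os.path.getsize(path)
    step = (size + streams - 1) // streams
    threads = [threading.Thread(target=send_range,
                                args=(path, i * step, min(step, size - i * step)))
               for i in range(streams) if i * step < size]
    for t in threads: t.start()
    for t in threads: t.join()

if __name__ == "__main__":
    send_file("terabyte.dat")   # hypothetical file name
```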

24
BWC Aggregate Bandwidth
Previous year (SC04)
25
Cumulative Data Transferred
Bandwidth Challenge period
26
Component Traffic
  • Note instability

27
SLAC Cluster Contributions
(Plot: traffic in to and out from the booth during the Bandwidth Challenge period; annotations: router crashes, ESnet routed, ESnet SDN layer 2 via USN)
28
Problems
  • Managerial/PR
  • Initial request for loan hardware took place 6
    months in advance!
  • Lots and lots of paperwork to keep account of all
    loan equipment (over 100 items loaned from 7
    vendors)
  • Thank/acknowledge all contributors, press release
    clearances
  • Logistical
  • Set up and tore down a pseudo-production network and servers in the space of a week!
  • Testing could not begin until waves were alight
  • Most waves lit day before challenge!
  • Shipping so much hardware is not cheap!
  • Setting up monitoring

29
Problems
  • Tried to configure hardware and software prior to
    show
  • Hardware
  • NICS
  • We had 3 bad Chelsios (bad memory)
  • Neterion's Xframe IIs did not work in UKLight's Boston machines
  • Hard-disks
  • 3 dead 10K disks (had to ship in spare)
  • 1 x 4Port 10Gb blade DOA
  • MTU mismatch between domains
  • Router blade died during stress testing day
    before BWC!
  • Cables! Cables! Cables!
  • Software
  • Used golden disks for duplication (still takes 30
    minutes per disk to replicate!)
  • Linux kernels
  • Initially used 2.6.14, found severe performance
    problems compared to 2.6.12.
  • (New) Router firmware caused crashes under heavy
    load
  • Unfortunately, only discovered just before BWC
  • Had to manually restart the affected ports during
    BWC

30
Problems
  • Most transfers were from memory to memory
    (Ramdisk etc).
  • Local caching of (small) files in memory
  • Reading and writing to disk will be the next
    bottleneck to overcome

31
SC05 BWC Takeaways Lessons
  • Substantive take-aways from this Marathon
    exercise
  • An optimized Linux kernel (2.6.12 + FAST + NFSv4) for data transport, after 7 full kernel-build cycles in 4 days
  • Scaling up SRM/gridftp to near 10 Gbps per wave, using Fermilab's production clusters
  • A newly optimized application-level copy program,
    bbcp, that matches the performance of iperf under
    some conditions
  • Extensions of SLAC's Xrootd, an optimized
    low-latency file access application for clusters,
    across the wide area
  • Understanding of the limits of 10 Gbps-capable
    computer systems, network switches and interfaces
    under stress
  • A lot of work remains to put this into production use (for example the Caltech/CERN/FNAL/SLAC/Michigan collaboration)

32
Conclusion
  • Previewed the IT Challenges of the next
    generation Data Intensive Science Applications
    (High Energy Physics, astronomy etc)
  • Petabyte-scale datasets
  • Tens of national and transoceanic links at 10
    Gbps (and up)
  • 100 Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data (see the check after this list)
  • Learned to gauge difficulty of the global
    networks and transport systems required for the
    LHC mission
  • Set up, shook down and successfully ran the systems in < 1 week
  • Understood and optimized the configurations of
    various components (Network interfaces,
    router/switches, OS, TCP kernels, applications)
    for high performance over the wide area network.
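The Petabyte/day figure follows directly from the sustained rate; a one-line check (decimal units assumed):

```python
# 100 Gbit/s sustained for one day, expressed in petabytes.
print(f"{100e9 * 86400 / 8 / 1e15:.2f} PB/day")   # ~1.08 PB/day
```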

33
What's next?
  • Break 10Gbits/s single stream limit, distance
    capped
  • Evaluate new stacks with real-world links, and
    other equipment
  • Other NICs
  • Response to congestion, pathologies
  • Fairness, robustness, stability
  • Deploy for some major (e.g. HENP/Grid) customer
    applications
  • Disk-to-disk throughput and useful applications
  • Need faster CPUs (an extra 60 MHz per Mbit/s over TCP for disk-to-disk); understand how to use multi-processors
  • Disk-to-disk marks: 536 MBytes/sec (Windows), 500 MBytes/sec (Linux)
  • Concentrate now on reliable Terabyte-scale file
    transfers
  • System Issues PCI-X Bus, Network Interfaces,
    Disk I/O Controllers, Linux Kernel, CPU
    utilization
  • Move from hero demonstrations to commonplace

34
Press and PR SC05
  • 11/8/05 - Brit Boffins aim to Beat LAN speed
    record from vnunet.com
  • SC05 Bandwidth Challenge SLAC Interaction Point.
  • Top Researchers, Projects in High Performance
    Computing Honored at SC/05 ... Business Wire
    (press release) - San Francisco, CA, USA
  • 11/18/05 - Official Winner Announcement
  • 11/18/05 - SC05 Bandwidth Challenge Slide
    Presentation
  • 11/23/05 - Bandwidth Challenge Results from
    Slashdot
  • 12/6/05 - Caltech press release
  • 12/6/05 - Neterion Enables High Energy Physics
    Team to Beat World Record Speed at SC05
    Conference CCN Matthews News Distribution Experts
  • High energy physics team captures network prize
    at SC05 from SLAC
  • High energy physics team captures network prize at SC05, EurekAlert!
  • 12/7/05 - High Energy Physics Team Smashes
    Network Record, from Science Grid this Week.
  • Congratulations to our Research Partners for a
    New Bandwidth Record at SuperComputing 2005, from
    Neterion.