1
Characterization and Evaluation of TCP and
UDP-based Transport on Real Networks
  • Les Cottrell, Saad Ansari, Parakram Khandpur,
    Ruchi Gupta, Richard Hughes-Jones, Michael Chen,
    Larry McIntosh, Frank Leers
  • SLAC, Manchester University, Chelsio and Sun
  • Protocols for Fast Long Distance Networks, Lyon,
    France
  • February, 2005
  • www.slac.stanford.edu/grp/scs/net/talk05/pfld-feb05.ppt

Partially funded by DOE/MICS Field Work Proposal
on Internet End-to-end Performance Monitoring
(IEPM), also supported by IUPAP
2
Project goals
  • Evaluate various techniques for achieving high
    bulk-throughput on fast long-distance real
    production WAN links
  • Compare and contrast ease of configuration,
    throughput, convergence, fairness, stability, etc.
  • For different RTTs
  • Recommend optimum techniques for data intensive
    science (BaBar) transfers using bbftp, bbcp,
    GridFTP
  • Validate simulator and emulator findings and provide
    feedback

3
Techniques rejected
  • Jumbo frames
    • Not an IEEE standard
    • May break some UDP applications
    • Not supported on the SLAC LAN
  • Sender-side modifications only: the HENP model is a few
    big senders and lots of smaller receivers
    • Simplifies deployment: only a few hosts at a few
      sending sites
    • So no Dynamic Right Sizing (DRS)
  • Must run on production networks
    • No router modifications (XCP/ECN)

4
Software Transports
  • Advanced TCP stacks
  • To overcome AIMD congestion behavior of Reno
    based TCPs
  • BUT
  • SLAC datamovers are all based on Solaris, while
    advanced TCP stacks are currently Linux only
  • SLAC production systems staff are concerned about
    non-standard kernels and about keeping TCP patches
    current with security patches for the SLAC-supported
    Linux version
  • So we are also very interested in transports that run
    in user space (no kernel modifications)
  • Evaluate UDT from the UIC group

5
Hardware Assists
  • For 1 Gbit/s paths, CPU, bus, etc. are not a problem
  • For 10 Gbits/s they are more important
  • NIC assistance to the CPU is becoming popular
  • Checksum offload
  • Interrupt coalescence
  • Large send/receive offload (LSO/LRO)
  • TCP Offload Engine (TOE)
  • Several vendors for 10Gbits/s NICs, at least one
    for 1Gbits/s NIC
  • But TOE currently restricts you to the NIC vendor's
    TCP implementation
  • Most focus is on the LAN
  • Cheap alternative to InfiniBand, Myrinet, etc.

6
Protocols Evaluated
  • TCP (implementations as of April 2004)
  • Linux 2.4 New Reno with SACK single and parallel
    streams (Reno)
  • Scalable TCP (Scalable)
  • Fast TCP
  • HighSpeed TCP (HSTCP)
  • HighSpeed TCP Low Priority (HSTCP-LP)
  • Binary Increase Control TCP (BICTCP)
  • Hamilton TCP (HTCP)
  • Layered TCP (LTCP)
  • UDP
  • UDT v2.

7
Methodology (1Gbit/s)
  • Chose 3 paths from SLAC
  • Caltech (10ms), Univ Florida (80ms), CERN (180ms)
  • Used iperf/TCP and UDT/UDP to generate traffic
  • Each run was 16 minutes, in 7 regions
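A minimal sketch of how such a run could be scripted, assuming iperf and ping are available on the sending host (host name and log file names are hypothetical, and the schedule of flows inside the 16-minute run is not reproduced here):

    import subprocess

    REMOTE = "receiver.example.net"   # hypothetical receiver at Caltech/UFL/CERN
    RUN_SECS = 16 * 60                # each run lasts 16 minutes

    # 1 Hz ping running for the whole test to track RTT alongside the transfer
    with open("ping.log", "w") as ping_log, open("iperf.log", "w") as iperf_log:
        ping = subprocess.Popen(
            ["ping", "-i", "1", "-w", str(RUN_SECS), REMOTE], stdout=ping_log)
        # iperf TCP sender: -t sets the duration, -P the number of parallel streams
        iperf = subprocess.Popen(
            ["iperf", "-c", REMOTE, "-t", str(RUN_SECS), "-P", "1"], stdout=iperf_log)
        iperf.wait()
        ping.wait()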

[Diagram: iperf or UDT generates TCP/UDP traffic from SLAC across the
bottleneck link to Caltech/UFL/CERN, with 1/s ICMP ping traffic alongside;
flows are added and removed in 4-minute and 2-minute regions]
8
Behavior Indicators
  • Achievable throughput
  • Stability S = σ/µ (standard deviation / average)
  • Intra-protocol fairness F
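For reference, a small sketch of how these indicators could be computed from per-interval throughput samples; using Jain's index for F is an assumption, since the slide only names the quantity:

    import statistics

    def stability(throughput_samples):
        """S = standard deviation / average of throughput; smaller is more stable."""
        return statistics.pstdev(throughput_samples) / statistics.mean(throughput_samples)

    def jain_fairness(per_flow_avgs):
        """Jain's index F = (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
        n = len(per_flow_avgs)
        total = sum(per_flow_avgs)
        return total * total / (n * sum(x * x for x in per_flow_avgs))

    # Example: two flows averaging 400 and 320 Mbits/s give F ~ 0.99
    print(jain_fairness([400.0, 320.0]))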

9
Behavior wrt RTT
  • 10 ms (Caltech): throughput, stability (small is
    good), and minimum fairness over regions 2 through 6
    (closer to 1 is better)
  • Excluding FAST: 720±64 Mbps, S=0.18±0.04, F=0.95
  • FAST: 400±120 Mbps, S=0.33, F=0.88
  • 80 ms (U. Florida): throughput, stability
  • All: 350±103 Mbps, S=0.3±0.12, F=0.82
  • 180 ms (CERN)
  • All: 340±130 Mbps, S=0.42±0.17, F=0.81
  • The stability and fairness effects are more evident at
    longer RTTs, so we focus on CERN

10
Reno single stream
  • Low performance on fast long-distance paths
  • AIMD (add a=1 packet to cwnd per RTT, decrease cwnd by
    a factor b=0.5 on congestion); see the sketch after
    this list
  • Net effect: recovers slowly and does not effectively
    use the available bandwidth, so throughput is poor
  • Remaining flows do not take up the slack when a flow
    is removed
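A minimal sketch of the AIMD rule above (a = 1 packet per RTT, b = 0.5 as stated on the slide):

    def aimd_on_rtt_without_loss(cwnd, a=1.0):
        # Additive increase: roughly a packets are added to cwnd each RTT
        return cwnd + a

    def aimd_on_congestion(cwnd, b=0.5):
        # Multiplicative decrease: cwnd is cut by the factor b on congestion
        return cwnd * b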

[Plot, SLAC to CERN: congestion has a dramatic effect and recovery is slow;
multiple streams increase the recovery rate; RTT increases when the flow
achieves its best throughput]
11
Fast
  • Also uses RTT to detect congestion (window update
    sketch after this list)
  • RTT is very stable: σ(RTT) ≈ 9 ms vs. 37±0.14 ms for
    the others
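For reference, the published FAST TCP window update has roughly the following form (a sketch, not the implementation measured here; the alpha and gamma values are illustrative assumptions):

    def fast_window_update(w, rtt, base_rtt, alpha=200.0, gamma=0.5):
        """Delay-based update: w <- min(2w, (1-gamma)*w + gamma*(base_rtt/rtt*w + alpha)).
        base_rtt is the smallest RTT observed; alpha is the target number of queued packets."""
        return min(2.0 * w, (1.0 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))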

[Plot, SLAC-CERN: the 2nd flow never gets an equal share of the bandwidth;
big drops in throughput take several seconds to recover from]
12
HTCP
  • One of the best performers
  • Throughput is high
  • Big effects on RTT when it achieves its best throughput
  • Flows share equally

[Plot, SLAC-CERN: appears to need >1 flow to achieve best throughput; two
flows share equally]
13
BICTCP
  • Needs >1 flow for best throughput

14
UDTv2
  • Similar behavior to better TCP stacks
  • RTT very variable at best throughputs
  • Intra-protocol sharing is good
  • Behaves well as flows are added and removed

15
Overall
  • Scalable is one of the best, but its inter-protocol
    fairness is poor (see Bullot et al.)
  • BIC and HTCP are about equal
  • UDT is close, BUT is CPU intensive (it used to be much
    worse, by a factor of 10)
  • Fast gives low RTT values and low RTT variability
  • All the TCP protocols use similar CPU (HSTCP looks
    poor because its throughput is low)
16
10Gbps tests
  • At SC2004 using two 10Gbps dedicated paths
    between Pittsburgh and Sunnyvale
  • Using Solaris 10 (build 69) and Linux 2.6
  • On Sun Fire Vx0z servers (dual/quad 2.4 GHz 64-bit AMD
    Opterons) with 133 MHz, 64-bit PCI-X
  • Only 1500 Byte MTUs
  • Achievable performance limits (using iperf)
  • Reno TCP (multi-flows) vs UDTv2,
  • TOE (Chelsio) vs no TOE (S2io)

17
Results
  • UDT limit was 4.45 Gbits/s
  • CPU limited
  • TCP limit was about 7.5±0.07 Gbps, regardless of
  • Whether LAN (back to back) or WAN
  • WAN used a 2 MB window and 16 streams
  • Whether Solaris 10 or Linux 2.6
  • Whether S2io or Chelsio NIC
  • Gating factor: PCI-X
  • Raw bandwidth is 8.53 Gbps (see the check after this
    list)
  • But each transfer is broken into segments to allow
    interleaving
  • E.g. with a maximum memory read byte count of 4096
    bytes, the Intel Pro/10GbE LR NIC limit is 6.83 Gbits/s
  • One host with 4 CPUs and 2 NICs sent 11.5±0.2 Gbps
    to two dual-CPU hosts with 1 NIC each
  • Two hosts to two hosts (1 NIC/host): 9.07 Gbps goodput
    forward, 5.6 Gbps reverse
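A quick back-of-the-envelope check of the quoted raw PCI-X bandwidth (the 6.83 Gbits/s figure additionally reflects the per-transaction overhead of 4096-byte reads, which is not modelled here):

    clock_hz = 133.33e6           # PCI-X clock, 133 MHz
    bus_width_bits = 64           # 64-bit wide bus
    raw_bps = clock_hz * bus_width_bits
    print(f"{raw_bps / 1e9:.2f} Gbits/s")   # ~8.53 Gbits/s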

18
TCP CPU Utilization
  • CPU power important
  • Each CPU is 2.4 GHz
  • Throughput increases with the number of flows
  • Utilization is not a linear function of throughput
  • It depends on the number of flows too

[Plot: CPU utilization vs. throughput, Chelsio (TOE)]
  • Normalize as GHz/Gbps (see the sketch after this list)
  • Chelsio TOE with Linux 2.6.6
  • S2io checksum offload with Solaris 10
  • S2io supports LSO but Solaris 10 did not, so it was
    not used
  • Microsoft reports 0.017 GHz/Gbps with Windows +
    S2io/LSO, 1 flow
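One plausible way to form the GHz/Gbps figure from measured utilization (an assumption about the exact normalization used on the slide; the numbers below are only an example):

    def ghz_per_gbps(cpu_util_fraction, n_cpus, cpu_ghz, throughput_gbps):
        """CPU cycles consumed (in GHz) per Gbit/s of achieved throughput; lower is better."""
        return cpu_util_fraction * n_cpus * cpu_ghz / throughput_gbps

    # e.g. 40% of two 2.4 GHz CPUs driving 7.5 Gbits/s -> ~0.26 GHz/Gbps
    print(ghz_per_gbps(0.4, 2, 2.4, 7.5))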

19
Conclusions
  • Need testing on real networks
  • Controlled simulation and emulation are critical for
    understanding
  • BUT we need to verify them, and results can look
    different than expected (e.g. Fast)
  • Most important for transoceanic paths
  • UDT looks promising, but still needs work for
    >6 Gbits/s
  • Need to evaluate various offloads (TOE, LSO ...)
  • Need to repeat inter-protocol fairness vs Reno
  • New buses are important; we need NICs that support
    them, then we can evaluate

20
Further Information
  • Web site with lots of plots and analysis
  • www.slac.stanford.edu/grp/scs/net/papers/pfld05/ruchig/Fairness/
  • Inter-protocol comparison (Journal of Grid Computing,
    PFLD04)
  • www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-10402.pdf
  • SC2004 details
  • www-iepm.slac.stanford.edu/monitoring/bulk/sc2004/