Title: The Performance Bottleneck: Application, Computer, or Network
The Performance Bottleneck: Application, Computer, or Network
- Richard Carlson
- Internet2
- Part 1
Outline
- Why there is a problem
- What can be done to find/fix problems
- Tools you can use
- Ramblings on what's next
Basic Premise
- Application performance should meet your expectations!
- If it doesn't, you should complain!
Questions
- How many times have you said:
- What's wrong with the network?
- Why is the network so slow?
- Do you have any way to find out?
- Tools to check local host
- Tools to check local network
- Tools to check end-to-end path
Underlying Assumption
- When problems exist, it's the network's fault!
NDT Demo First
Simple Network Picture
Bob's Host
Network Infrastructure
Carol's Host
Network Infrastructure
Possible Bottlenecks
- Network infrastructure
- Host computer
- Application design
Network Infrastructure Bottlenecks
- Links too small
- Using standard Ethernet instead of FastEthernet
- Links congested
- Too many hosts crossing this link
- Scenic routing
- End-to-end path is longer than it needs to be
- Broken equipment
- Bad NIC, broken wire/cable, cross-talk
- Administrative restrictions
- Firewalls, Filters, shapers, restrictors
Host Computer Bottlenecks
- CPU utilization
- What else is the processor doing?
- Memory limitations
- Main memory and network buffers
- I/O bus speed
- Getting data into and out of the NIC
- Disk access speed
Application Behavior Bottlenecks
- Chatty protocol
- Lots of short messages between peers
- High reliability protocol
- Send packet and wait for reply before continuing
- No run-time tuning options
- Use only default settings
- Blaster protocol
- Ignore congestion control feedback
TCP 101
- Transmission Control Protocol (TCP)
- Provides applications with a reliable, in-order delivery service
- The most widely used Internet transport protocol
- Web, File transfers, email, P2P, Remote login
- User Datagram Protocol (UDP)
- Provides applications with an unreliable delivery service
- RTP, DNS
Summary Part 1
- Problems can exist at multiple levels
- Network infrastructure
- Host computer
- Application design
- Multiple problems can exist at the same time
- All problems must be found and fixed before
things get better
Summary Part 2
- Every problem exhibits the same symptom
- The application performance doesn't meet the user's expectations!
Outline
- Why there is a problem
- What can be done to find/fix problems
- Tools you can use
- Ramblings on what's next
Real Life Examples
- "I know what the problem is"
- Bulk transfer with multiple problems
Example 1 - SC04 experience
- Booth having trouble getting application to run from Amsterdam to Pittsburgh
- Tests between remote SGI and local PC showed throughput limited to < 20 Mbps
- Assumption is PC buffers are too small
- Question: How do we set the WinXP send/receive window size?
SC04: Determine WinXP info
http://www.dslreports.com/drtcp
SC04: Confirm PC settings
- DrTCP reported 16 MB buffers, but test program still slow. Q: How to confirm?
- Run test to SC NDT server (PC has Fast Ethernet connection)
- Client-to-Server: 90 Mbps
- Server-to-Client: 95 Mbps
- PC Send/Recv window size: 16 Mbytes (wscale 8)
- NDT Send/Recv window size: 8 Mbytes (wscale 7)
- Reported TCP RTT: 46.2 msec
- Approximately 600 Kbytes of data in TCP buffer
- Min window size / RTT = 1.3 Gbps
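The "window size / RTT" arithmetic used throughout these tests is the bandwidth-delay product; a minimal Python sketch, using the NDT-reported 8 Mbyte window and 46.2 msec RTT from above:

```python
def window_limited_rate_mbps(window_bytes, rtt_sec):
    """Upper bound on TCP throughput imposed by a window, in Mbps."""
    return window_bytes * 8 / rtt_sec / 1e6

# Smaller of the two windows (8 Mbytes) over the 46.2 msec path:
rate = window_limited_rate_mbps(8 * 1024 * 1024, 0.0462)
print(f"{rate:.0f} Mbps")  # well above the 100 Mbps Fast Ethernet NIC
```

Since this ceiling is far above the NIC's line rate, the PC's window settings cannot be the bottleneck here.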
SC04: Local PC Configured OK
- No problem found
- Able to run at line rate
- Confirmed that PC's TCP window values were set correctly
SC04: Remote SGI
- Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected)
- Client-to-Server: 17 Mbps
- Server-to-Client: 16 Mbps
- SGI Send/Recv window size: 256 Kbytes (wscale 3)
- NDT Send/Recv window size: 8 Mbytes (wscale 7)
- Reported RTT: 106.7 msec
- Min window size / RTT = 19 Mbps
SC04: Remote SGI Results
- Needed to download and compile command-line client
- SGI TCP window is too small to fill transatlantic pipe (19 Mbps max)
- User reluctant to make changes to SGI network interface from SC show floor
- NDT client tool allows application to change buffer (setsockopt() function call)
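Any application can request larger buffers the same way, via setsockopt() before connecting; a minimal sketch (the 2 MB value matches the maximum the SGI later allowed; the kernel may clamp the request to its own per-socket limits):

```python
import socket

# Ask the OS for larger TCP send/receive buffers before connecting.
# The kernel may silently clamp the request to its per-socket maximums.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 2 * 1024 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2 * 1024 * 1024)

# getsockopt() reports what was actually granted (Linux returns roughly
# double the requested value to account for bookkeeping overhead).
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(granted)
sock.close()
```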
SC04: Remote SGI (tuned)
- Re-run test from remote SGI to SC show floor
- Client-to-Server: 107 Mbps
- Server-to-Client: 109 Mbps
- SGI Send/Recv window size: 2 Mbytes (wscale 5)
- NDT Send/Recv window size: 8 Mbytes (wscale 7)
- Reported RTT: 104 msec
- Min window size / RTT = 153.8 Mbps
SC04: Debugging Results
- Team spent over 1 hour looking at Win XP config, trying to verify window size
- Single NDT test verified this in under 30 seconds
- 10 minutes to download and install NDT client on SGI
- 15 minutes to discuss options and run client test with set-buffer option
SC04: Debugging Results
- 8 minutes to find SGI limits and determine maximum allowable window setting (2 MB)
- Total time: 34 minutes to verify problem was with remote SGI's TCP send/receive window size
- Network path verified, but application still performed poorly until it was also tuned
Example 2 - SCP file transfer
- Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take?
- 5 minutes?
- 1 minute?
- 5 seconds?
What should we expect?
- Assumptions
- 100 Mbps Fast Ethernet is the slowest link
- 50 msec round-trip time
- Bob & Carol calculate:
- 50 MB × 8 = 400 Mbits
- 400 Mb / 100 Mb/sec = 4 seconds
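Their back-of-the-envelope estimate can be checked in a couple of lines:

```python
# File size over line rate gives the ideal transfer time.
file_mbits = 50 * 8        # 50 MB = 400 Mbits
link_mbps = 100            # Fast Ethernet, the slowest link in the path
ideal_seconds = file_mbits / link_mbps
print(ideal_seconds)       # 4.0
```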
Initial SCP Test Results
Initial Test Results
- This is unacceptable!
- First look for network infrastructure problem
- Use NDT tester to examine both hosts
Initial NDT testing shows Duplex Mismatch at one end
NDT Found Duplex Mismatch
- Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation
- Network administrator corrects configuration and asks for re-test
Duplex Mismatch Corrected
SCP results after Duplex Mismatch Corrected
Intermediate Results
- Time dropped from 18 minutes to 40 seconds
- But our calculations said it should take 4 seconds!
- 400 Mb / 40 sec = 10 Mbps
- Why are we limited to 10 Mbps?
- Are you satisfied with 1/10th of the possible performance?
Default TCP window settings
Calculating the Window Size
- Remember: Bob found the round-trip time was 50 msec
- Calculate window size limit
- 85.3 KB × 8 b/B = 698777 b
- 698777 b / .050 s = 13.98 Mbps
- Calculate new window size
- (100 Mb/s × .050 s) / 8 b/B = 610.3 KB
- Use 1 MB as a minimum
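The same bandwidth-delay arithmetic as a small sketch, using the 50 msec RTT:

```python
def rate_mbps(window_bytes, rtt_sec):
    """Throughput ceiling imposed by a TCP window, in Mbps."""
    return window_bytes * 8 / rtt_sec / 1e6

def window_kb(rate_bps, rtt_sec):
    """Window (KB) needed to sustain a given rate over a given RTT."""
    return rate_bps * rtt_sec / 8 / 1024

# Default 85.3 KB window limits Bob to ~14 Mbps:
print(round(rate_mbps(85.3 * 1024, 0.050), 2))   # 13.98
# Filling 100 Mbps needs ~610 KB, hence the 1 MB recommendation:
print(round(window_kb(100e6, 0.050), 1))         # 610.4
```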
Resetting Window Value
With TCP windows tuned
Steps so far
- Found and fixed Duplex Mismatch
- Network Infrastructure problem
- Found and fixed TCP window values
- Host configuration problem
- Are we done yet?
SCP results with tuned windows
Intermediate Results
- SCP still runs slower than expected
- Hint SCP uses internal buffers
- Patch available from PSC
SCP Results with tuned SCP
Final Results
- Fixed infrastructure problem
- Fixed host configuration problem
- Fixed Application configuration problem
- Achieved target time of 4 seconds to transfer 50
MB file over 2000 miles
Why is it hard to Find/Fix Problems?
- Network infrastructure is complex
- Network infrastructure is shared
- Network infrastructure consists of multiple
components
Shared Infrastructure
- Other applications accessing the network
- Remote disk access
- Automatic email checking
- Heartbeat facilities
- Other computers are attached to the closet switch
- Uplink to campus infrastructure
- Other users on and off site
- Uplink from campus to gigapop/backbone
Other Network Components
- DHCP (Dynamic Host Configuration Protocol)
- At least 2 packets exchanged to configure your host
- DNS (Domain Name System)
- At least 2 packets exchanged to translate a FQDN into an IP address
- Network Security Devices
- Intrusion Detection, VPN, Firewall
Network Infrastructure
- Large complex system with potentially many
problem areas
Why is it hard to Find/Fix Problems?
- Computers have multiple components
- Each Operating System (OS) has a unique set of tools to tune the network stack
- Application appliances come with few knobs and limited options
Computer Components
- Main CPU (clock speed)
- Front- and back-side bus
- Main Memory
- I/O Bus (ATA, SCSI, SATA)
- Disk (access speed and size)
Computer Issues
- Lots of internal components with multi-tasking OS
- Lots of tunable TCP/IP parameters that need to be
right for each possible connection
Why is it hard to Find/Fix Problems?
- Applications depend on default system settings
- Problems scale with distance
- More access to remote resources
Default System Settings
- For Linux 2.6.13 there are:
- 11 tunable IP parameters
- 45 tunable TCP parameters
- 148 Web100 variables (TCP MIB)
- Currently no OS ships with default settings that work well over trans-continental distances
- Some applications allow run-time setting of some options
- 30 settable/viewable IP parameters
- 24 settable/viewable TCP parameters
- There are no standard ways to set run-time option flags
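A hedged sketch of inspecting those tunables on a Linux host: the /proc/sys/net/ipv4 layout is standard, but the exact counts vary by kernel version, so the 11/45 figures above apply to 2.6.13 specifically.

```python
import os

def split_tunables(names):
    """Partition a sysctl directory listing into TCP vs other IP tunables."""
    tcp = [n for n in names if n.startswith("tcp_")]
    other = [n for n in names if not n.startswith("tcp_")]
    return tcp, other

base = "/proc/sys/net/ipv4"  # standard location on Linux
if os.path.isdir(base):
    tcp, other = split_tunables(sorted(os.listdir(base)))
    print(f"{len(tcp)} TCP tunables, {len(other)} other IP tunables")
```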
Application Issues
- Setting tunable parameters to the right value
- Getting the protocol right
How do you set realistic Expectations?
- Assume network bandwidth exists, or find out what the limits are
- Local LAN connection
- Site Access link
- Monitor the link utilization occasionally
- Weathermap
- MRTG graphs
- Look at your host config/utilization
- What is the CPU utilization
Ethernet, FastEthernet, Gigabit Ethernet
- 10/100/1000 auto-sensing NICs are common today
- Most campuses have installed 10/100 switched infrastructure
- Access network links are currently the limiting factor in most networks
- Backbone networks are 10 Gigabit/sec
Site Access and Backbone
- Campus access via Regional GigaPoP
- Confirm with campus admin
- Abilene Backbone
- 10 Gbps POS links coast-to-coast
- Other Federal backbone networks
- Other Commercial network
- Other institutions, sites, and networks
Tools, Tools, Tools
- Ping
- Traceroute
- Iperf
- Tcpdump
- Tcptrace
- BWCTL
- NDT
- OWAMP
- AMP
- Advisor
- Thrulay
- Web100
- MonaLisa
- pathchar
- NPAD
- Pathdiag
- Surveyor
- Ethereal
- CoralReef
- MRTG
- Skitter
- Cflowd
- Cricket
- Net100
Active Measurement Tools
- Tools that inject packets into the network to measure some value
- Available Bandwidth
- Delay/Jitter
- Loss
- Requires bi-directional traffic or synchronized
hosts
Passive Measurement Tools
- Tools that monitor existing traffic on the network and extract some information
- Bandwidth used
- Jitter
- Loss rate
- May generate some privacy and/or security concerns
Abilene Weather Map
MRTG Graphs
Windows XP Performance
Outline
- Why there is a problem
- What can be done to find/fix problems
- Tools you can use
- Ramblings on what's next
Focus on 3 tools
- Existing NDT tool
- Allows users to test a network path for a limited number of common problems
- Existing NPAD tool
- Allows users to test local network infrastructure while simulating a long path
- Emerging perfSONAR tool
- Allows users to retrieve network path data from major national and international REN networks
Network Diagnostic Tool (NDT)
- Measure performance to user's desktop
- Identify real problems for real users
- Network infrastructure is the problem
- Host tuning issues are the problem
- Make tool simple to use and understand
- Make tool useful for users and network
administrators
NDT user interface
- Web-based Java applet allows testing from any browser
- Command-line client allows testing from a remote login shell
NDT test suite
- Looks for specific problems that affect a large number of users
- Duplex Mismatch
- Faulty Cables
- Bottleneck link capacity
- Achievable throughput
- Ethernet duplex setting
- Congestion on this network path
Duplex Mismatch Detection
- Developing analytical model to describe how the network operates (no prior art?)
- Expanding model to describe UDP and TCP flows
- Test models in LAN, MAN, and WAN environments
- NIH/NLM grant funding
Four Cases of Duplex Setting
Bottleneck Link Detection
- What is the slowest link in the end-to-end path?
- Monitors packet arrival times using libpcap routines
- Uses TCP dynamics to create packet pairs
- Quantize results into link-type bins (no fractional or bonded links)
- Cisco URP grant work
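The packet-pair idea behind this detection can be sketched as follows (the 120 µs gap is an illustrative number, not NDT output): the bottleneck link serializes back-to-back packets, so the arrival gap encodes its capacity.

```python
def capacity_mbps(packet_bytes, gap_sec):
    """Bottleneck capacity estimate from one packet pair's arrival gap."""
    return packet_bytes * 8 / gap_sec / 1e6

# 1500-byte packets arriving 120 microseconds apart:
est = capacity_mbps(1500, 120e-6)

# Quantize into link-type bins, as the slide describes (no fractional links):
bins = [10, 100, 1000, 10000]  # Ethernet through 10 GigE, in Mbps
link = min(bins, key=lambda b: abs(b - est))
print(est, "->", link, "Mbps")  # lands in the Fast Ethernet bin
```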
Normal congestion detection
- Shared network infrastructures will cause periodic congestion episodes
- Detect/report when TCP throughput is limited by cross traffic
- Detect/report when TCP throughput is limited by own traffic
Faulty Hardware/Link Detection
- Detect non-congestive loss due to:
- Faulty NIC/switch interface
- Bad Cat-5 cable
- Dirty optical connector
- Preliminary work shows that it is possible to distinguish between congestive and non-congestive loss
Full/Half Link Duplex setting
- Detect half-duplex link in E2E path
- Identify when throughput is limited by half-duplex operation
- Preliminary work shows detection possible when link transitions between blocking states
Finding Results of Interest
- Duplex Mismatch
- This is a serious error and nothing will work right. Reported on main page and on Statistics page
- Packet Arrival Order
- Inferred value based on TCP operation. Reported on Statistics page (with loss statistics), and order value on More Details page
Finding Results of Interest
- Packet Loss Rates
- Calculated value based on TCP operation. Reported on Statistics page (with out-of-order statistics), and loss value on More Details page
- Path Bottleneck Capacity
- Measured value based on TCP operation. Reported on main page
Additional Functions and Features
- Provide basic tuning information
- Basic Features
- Basic configuration file
- FIFO scheduling of tests
- Simple server discovery protocol
- Federation mode support
- Command line client support
- Created sourceforge.net project page
NPAD/pathdiag
- A new tool from researchers at the Pittsburgh Supercomputing Center
- Finds problems that affect long network paths
- Uses a Web100-enhanced Linux-based server
- Web-based Java client
Long Path Problem
- E2E application performance is dependent on distance between hosts
- Full-size frame time at 100 Mbps
- Frame = 1500 Bytes
- Time = 0.12 msec
- In flight for 1 msec RTT = 8 packets
- In flight for 70 msec RTT = 583 packets
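The frame-time and in-flight figures can be reproduced directly:

```python
# Serialization time of a full-size frame, and how many such frames
# fit "in flight" during one round trip at 100 Mbps.
frame_time = 1500 * 8 / 100e6        # seconds on the wire per frame
print(frame_time * 1000)             # 0.12 msec

for rtt_ms in (1, 70):
    print(rtt_ms, "msec RTT:", round(rtt_ms / 1000 / frame_time), "packets")
```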
Long Path Problem
(Diagram: H1 reaches H2 over a 1 msec path and H3 over a 70 msec path)
TCP Congestion Avoidance
- Cut number of packets in flight by ½ on loss
- Increase by 1 per RTT
- LAN (RTT = 1 msec)
- In flight changes to 4 packets
- Time to increase back to 8 is 4 msec
- WAN (RTT = 70 msec)
- In flight changes to 292 packets
- Time to increase back to 583 is 20.4 seconds
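The recovery times above follow from Reno's additive increase of one packet per RTT; a small sketch:

```python
def recovery_seconds(window_pkts, rtt_sec):
    """Time for TCP Reno to climb from window/2 back to the full window,
    growing by one packet per RTT after a single loss."""
    halved = (window_pkts + 1) // 2
    return (window_pkts - halved) * rtt_sec

print(recovery_seconds(8, 0.001))    # LAN: 0.004 s
print(recovery_seconds(583, 0.070))  # WAN: ~20.4 s
```

This asymmetry is why a loss rate that is harmless on a LAN cripples a long WAN path.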
perfSONAR: Next Steps in Performance Monitoring
- New Initiative involving multiple partners
- ESnet (DOE labs)
- GEANT (European Research and Education network)
- Internet2 (Abilene and connectors)
perfSONAR: Router stats on a path
- Demo ESnet tool
- https://performance.es.net/cgi-bin/perfsonar-trace.cgi
- Paste output from Traceroute into the window and view the MRTG graphs for the routers in the path
- Author: Joe Metzger, ESnet
Traceroute Visualizer
The Wizard Gap
Courtesy of Matt Mathis (PSC)
Google it!
- Enter "tuning tcp" into the Google search engine
- Top 2 hits are:
- http://www.psc.edu/networking/perf_tune.html
- http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
PSC Tuning Page
LBNL Tuning Page
Internet2 Land Speed Record
- Challenge to community to demonstrate how to run fast long-distance flows
- 2000 record: 751 Mbps over 5,262 km
- 2005 record: 7.2 Gbps over 30,000 km
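The record is scored as a bandwidth × distance product, which makes the two entries comparable; a quick check using the figures above (the petabit-meters/sec framing is my own normalization):

```python
# Land Speed Record metric: throughput times distance (bit-meters/sec).
records = {2000: (751e6, 5262e3), 2005: (7.2e9, 30000e3)}  # (bps, meters)
for year in sorted(records):
    bps, meters = records[year]
    print(year, round(bps * meters / 1e15, 1), "petabit-meters/sec")
```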
Conclusions
- Applications can fully utilize the network
- All problems have a single symptom
- All problems must be found and fixed before things get better
- Some people stop investigating before finding all problems
- Tools exist, and more are being developed, to make it easier to find problems
Outline
- Why there is a problem
- What can be done to find/fix problems
- Tools you can use
- Ramblings on what's next
Introduction
- Where have we been and where are we headed?
- Technology and hardware
- Transport Protocols
Basic Assumption
- The Internet was designed to improve
communications between people
What does the future hold?
- Moore's Law shows no signs of slowing down
- The original law says the number of transistors on a chip doubles every 18 months
- Now it simply means that everything gets faster
PC Hardware
- CPU processing power (flops) is increasing
- Front/back side bus clock rate is increasing
- Memory size is increasing
- HD size is increasing too
- For the past 10 years, every HD I've purchased cost $130
Scientific Workstation
- PC or Sparc class computer
- Fast CPU
- 1 GB RAM
- 1 TB disk
- 10 Gbps NIC
- Today's cost: $5,000
Network Capability
- LAN networks (includes campus)
- MAN/RON network
- WAN network
- Remember the 80/20 rule
Network NIC costs
- 10 Mbps NICs were $50 - $150 circa 1985
- 100 Mbps NICs were $50 - $150 circa 1995
- 1,000 Mbps NICs are $50 - $150 circa 2005
- 10 Gbps NICs are $1,500 - $2,500 today
- Note: today 10/100/1000 cards are common and 10/100 cards are < $10
Ethernet Switches
- Unmanaged 5-port 10/100 switch: $25
- Unmanaged 5-port 10/100/1000 switch: $50
- Managed switches have more ports and are more expensive ($150 - $400 per port)
Network Infrastructure
- Campus
- Regional
- National
- International
Campus Infrastructure
- Consists of switches, routers, and cables
- Limited funds make it hard to upgrade
Regional Infrastructure
- Many states have optical networks
- Illinois has I-Wire
- Metro area optical gear is reasonably priced
- Move by some to own fiber
- Flexible way to cut operating costs, but requires
larger up-front investment
National Infrastructure
- Commercial vendors have pulled fiber to major metro areas
- NLR: n x 10 Gbps
- Abilene: 1 x 10 Gbps (Qwest core)
- FedNets (DoE, DoD, and NASA all run national networks)
- CAnet: n x 10 Gbps
- Almost 500 Gbps into SC05 conference in Seattle
International Infrastructure
- Multiple trans-atlantic 10 Gbps links
- Multiple trans-pacific 10 Gbps links
- Gloriad
Interesting sidebar
- China's demand for copper, aluminum, and steel has caused an increase in theft
- Manhole covers
- Street lamps
- Parking meters
- Phone cable
- One possible solution is to replace copper wires with FTTH solutions
Transport Protocol
- TCP Reno has known problems with loss at high speeds
- Linear growth following packet loss
- No memory of past achievements
- TCP research groups are actively working on solutions
- HighSpeed-TCP, Scalable-TCP, Hamilton-TCP, BIC, CUBIC, FAST, UDT, Westwood
- Linux (2.6.13) has run-time support for these stacks
What drives prices?
- Electronic component prices are driven by units produced
- Try buying a brand NEW i386 CPU
- Try upgrading your PC's CPU
- NICs are no different