Title: Experiences in Design and Implementation of a High Performance Transport Protocol

1. Experiences in Design and Implementation of a High Performance Transport Protocol
- Yunhong Gu, Xinwei Hong, and Robert L. Grossman
- National Center for Data Mining
2. Outline
- TCP's inefficiency in grid applications
- UDT
- Design issues
- Implementation issues
- Conclusion and future work
3. TCP and AIMD
- TCP has been very successful in the Internet
- AIMD (Additive Increase Multiplicative Decrease)
  - Fair: max-min fairness
  - Stable: globally asynchronously stable
- But inefficient and not scalable in grid networks (with high bandwidth-delay product), and biased against flows with long RTTs
4. Efficiency of TCP
- On a 1 Gb/s link with 200 ms RTT, between Tokyo and Chicago, TCP needs 28 minutes to recover from a single loss.
- On a 10 Gb/s link with 200 ms RTT, it will take 4 hours 43 minutes to recover from a single loss.
- TCP's throughput model: it needs an extremely low loss rate on high bandwidth-delay product networks.
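These recovery times follow from a back-of-the-envelope AIMD model: after a single loss TCP halves its congestion window and then regains one packet per RTT, so recovery takes roughly W/2 RTTs, where W is the window in packets. A minimal sketch, assuming 1500-byte packets (`recovery_seconds` is an illustrative helper, not from the paper):

```cpp
#include <cmath>

// Time for standard TCP to recover from a single loss (sketch).
// After a loss the window halves; congestion avoidance then adds one
// packet per RTT, so recovery takes ~W/2 RTTs, where
// W = link_bps * rtt_s / (packet size in bits).
double recovery_seconds(double link_bps, double rtt_s, double pkt_bytes) {
    double window_pkts = link_bps * rtt_s / (pkt_bytes * 8.0);
    return (window_pkts / 2.0) * rtt_s;
}
// 1 Gb/s, 200 ms RTT: ~28 minutes; 10 Gb/s, 200 ms RTT: ~4.6 hours
```

With 1500-byte packets this gives about 28 minutes at 1 Gb/s and about 4.6 hours at 10 Gb/s, close to the slide's figures.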
5. Fairness of TCP
- Merging two real-time data streams:
  - From Chicago 1 to Chicago 2 (1 ms RTT, 1 Gb/s link): 800 Mb/s
  - From Amsterdam to Chicago 2 (100 ms RTT, 1 Gb/s link): 80 Mb/s
- The merged throughput is limited by the slowest stream!
6. UDT: UDP-based Data Transfer Protocol
- Application-level transport protocol built on top of UDP
- Reliable data delivery
- End-to-end approach
- Bi-directional
- General transport API, not a (file transfer) tool
- Open source
7. UDT Architecture
8. UDT Objectives
- Goals
  - Easy to install and use
  - Efficient for bulk data transfer
  - Fair
  - Friendly to TCP
- Non-goals
  - TCP replacement
  - Messaging service
9. Design Issues
- Reliability/acknowledging
- Congestion/flow control
- Performance evaluation
  - Efficiency
  - Fairness and friendliness
  - Stability
10. Reliability/Acknowledging
- Acknowledging is expensive
  - Packet processing at end hosts and routers
  - Buffer processing
- Timer-based selective acknowledgment
  - Send an acknowledgment at a constant interval (if there are packets to be acknowledged)
- Explicit negative acknowledgment
11. Congestion Control
- AIMD with decreasing increases
- Increase formula: per-SYN increment (see the table on the next slide)
- Decrease factor: 1/9
- Control interval is constant
  - SYN = 0.01 second
12. UDT Algorithm
L = 10 Gb/s, S = 1500 bytes

  C (Mb/s)          L - C (Mb/s)      Increment (pkts/SYN)
  [0, 9000)         (1000, 10000]     10
  [9000, 9900)      (100, 1000]       1
  [9900, 9990)      (10, 100]         0.1
  [9990, 9999)      (1, 10]           0.01
  [9999, 9999.9)    (0.1, 1]          0.001
  >= 9999.9         < 0.1             0.00067
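The table rows can be generated by a closed-form rule. The following is a sketch inferred from the table itself for S = 1500 bytes (not necessarily the paper's exact formula): the increment is 10^(ceil(log10(B)) - 3) packets per SYN, where B = L - C is the estimated spare capacity in Mb/s, floored at 1/1500 packets.

```cpp
#include <cmath>

// Per-SYN packet increment as a function of spare bandwidth B = L - C
// in Mb/s. Inferred from the table above for S = 1500 bytes; a sketch,
// not the paper's exact formula.
double increment_pkts_per_syn(double spare_mbps) {
    if (spare_mbps <= 0.1)
        return 1.0 / 1500.0;  // minimum step, ~0.00067 pkts/SYN
    double inc = std::pow(10.0, std::ceil(std::log10(spare_mbps)) - 3.0);
    return inc > 1.0 / 1500.0 ? inc : 1.0 / 1500.0;
}
```

For example, 2000 Mb/s of spare capacity yields 10 packets per SYN, and 5 Mb/s yields 0.01, matching the corresponding table rows.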
13. UDT Efficiency and Fairness Characteristics
- Takes 7.5 seconds to reach 90% of the link capacity, independent of BDP
- Satisfies max-min fairness if all the flows have the same end-to-end link capacity
- Otherwise, any flow will obtain at least half of its fair share
- Does not take more bandwidth than a concurrent TCP flow as long as a condition (given in the paper) holds
14. Efficiency
- UDT bandwidth utilization
  - 960 Mb/s on a 1 Gb/s link
  - 580 Mb/s on OC-12 (622 Mb/s)
15. Fairness
- Fair bandwidth sharing between networks with different RTTs and bottleneck capacities
- 330 Mb/s each for the 3 flows from Chicago: to Chicago Local via 1 Gb/s, to Amsterdam via 1 Gb/s, and to Ottawa via 622 Mb/s
16. Fairness
- Fairness index
- Simulation: Jain's fairness index for 10 UDT and TCP flows over a 100 Mb/s link with different RTTs
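For reference, Jain's fairness index for n flow throughputs x_i is J = (sum x_i)^2 / (n * sum x_i^2): J = 1 means perfectly equal shares, and J = 1/n means one flow monopolizes the link. A small sketch (`jain_index` is an illustrative name):

```cpp
#include <vector>

// Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1].
double jain_index(const std::vector<double>& x) {
    double sum = 0.0, sumsq = 0.0;
    for (double v : x) { sum += v; sumsq += v * v; }
    return (sum * sum) / (x.size() * sumsq);
}
```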
17. RTT Fairness
- Fairness index of TCP flows with different RTTs
- 2 flows: one has 1 ms RTT, the other varies from 1 ms to 1000 ms
18. Fairness and Friendliness
- 50 TCP flows and 4 UDT flows between SARA and StarLight
- Real-time snapshot of the throughput
- The 4 UDT flows have similar performance and leave enough space for the TCP flows
19. TCP Friendliness
- Impact on short-lived TCP flows
- 500 1 MB TCP flows with 1-10 bulk UDT flows, over a 1 Gb/s link between Chicago and Amsterdam
20. Stability
- Stability index of UDT and TCP
  - Stability index: average standard deviation of throughput per unit time
- 10 UDT flows and 10 TCP flows with different RTTs
21. Implementation Issues
- Efficiency and CPU utilization
- Loss information processing
- Memory management
- API
- Conformance
22. Efficiency and CPU Utilization
- Efficiency = Mb/s per MHz
- Maximize throughput
- Use as little CPU time as possible, so that the CPU won't be used up before the network bottleneck is reached
- Remove CPU bursts, which can cause packet loss (even distribution of processing)
- Minimize CPU utilization
23. Loss Processing
- On high-BDP networks, the number of lost packets can be very large during a loss event
- Access to the loss information may take a long time
- An acknowledgment may span several packets
24. Loss Processing
- UDT loss processing
  - Most losses are contiguous
  - Record loss events rather than individual lost packets
  - Access time is almost constant
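The idea of recording loss events rather than individual packets can be sketched with ranges of sequence numbers (an illustrative structure, not UDT's actual implementation; it assumes inserted ranges do not overlap):

```cpp
#include <map>

// Loss list keyed by the first sequence number of each contiguous loss
// event. A burst of thousands of lost packets costs one map entry, and
// membership lookup is O(log #events), not O(#lost packets).
struct LossList {
    std::map<int, int> events;  // first seq -> last seq (inclusive)

    void insert(int first, int last) { events[first] = last; }

    bool contains(int seq) const {
        auto it = events.upper_bound(seq);  // first event starting after seq
        if (it == events.begin()) return false;
        --it;                               // latest event starting at or before seq
        return seq <= it->second;
    }
};
```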
25. Memory Processing
- Memory copy avoidance
  - Overlapped IO
  - Data scattering/gathering
  - Speculation of the next packet
[Diagram: paths of new data into the user buffer and the protocol buffers]
26. API
- Socket-like API
- Supports overlapped IO
- File transfer API
  - sendfile/recvfile
- Thread safe
- Performance monitoring
27. API - Example

TCP:
   int client = socket(AF_INET, SOCK_STREAM, 0);
   connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
   if (-1 == send(client, data, size, 0))
      ; // error processing

UDT:
   UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);
   UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
   if (UDT::ERROR == UDT::send(client, data, size, 0))
      ; // error processing
28. Implementation Efficiency
- CPU usage of UDT and TCP
- UDT takes about 10% more CPU than TCP
- More code optimization is still ongoing
29. Conclusion
- TCP is not suitable for distributed data-intensive applications over grid networks
- We introduced a new application-level protocol, named UDT, to overcome the shortcomings of TCP
- We explained the design rationale and implementation details in this paper
30. Future Work
- Bandwidth estimation
- CPU utilization
- Self-clocking
- Code optimization
- Theoretical work
31. References
- More details can be found in our paper.
- UDT specification
  - draft-gg-udt-01.txt
- Congestion control
  - Paper in the GridNets '04 workshop
- UDT open source project
  - http://udt.sf.net
32. Thank You!
- Questions and comments are welcome!
- For more information, please visit
  - Booth 653 (UIC/NCDM) on the exhibition floor
  - UDT project: http://udt.sf.net
  - NCDM: http://www.ncdm.uic.edu