1. Remote Direct Memory Access (RDMA) over IP
PFLDNet 2003, Geneva
Stephen Bailey, Sandburst Corp., steph@sandburst.com
Allyn Romanow, Cisco Systems, allyn@cisco.com
2. RDDP Is Coming Soon
- ST: "RDMA Is The Wave Of The Future" (S. Bailey & C. Good, CERN, 1999)
- Need:
  - standard protocols
  - host software
  - accelerated NICs (RNICs)
  - faster host buses (for > 1 Gb/s)
- Vendors are finally serious:
  - Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM, HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle, Cisco, and many, many others
3. Overview
- Motivation
- Architecture
- Open Issues
4. CFP: SIGCOMM Workshop
- NICELI, SIGCOMM '03 workshop
- Workshop on Network-I/O Convergence: Experience, Lessons, Implications
- http://www.acm.org/sigcomm/sigcomm2003/workshop/niceli/index.html
5. High Speed Data Transfer
- Bottlenecks
- Protocol performance
- Router performance
- End station performance, host processing
- CPU Utilization
- The I/O Bottleneck
- Interrupts
- TCP checksum
- Copies
6. What is RDMA?
- Avoids copying by allowing the network adapter, under control of the application, to steer data directly into application buffers (see the sketch after this list)
- Bulk data transfer, or kernel bypass for small messages
- Grid, cluster, supercomputing, data centers
- Historically, special purpose fabrics: Fibre Channel, VIA, InfiniBand, Quadrics, ServerNet
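The "steer data directly into application buffers" idea can be made concrete with a small sketch. The code below is illustrative only and is not from the slides: it uses the modern libibverbs API, assumes a queue pair has already been connected, and assumes the peer's buffer address and rkey were exchanged out of band; setup and error handling are omitted.

    /* Illustrative sketch (not from the slides): post an RDMA WRITE with
     * libibverbs.  Assumes an already-connected queue pair and that the
     * peer's buffer address and rkey were exchanged out of band. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    static int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                                  uint64_t remote_addr, uint32_t rkey)
    {
        const size_t len = 4096;
        void *buf = malloc(len);
        if (!buf)
            return -1;
        memset(buf, 0xab, len);

        /* Register the application buffer so the adapter can DMA it directly. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
        if (!mr)
            return -1;

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };

        /* RDMA WRITE: the adapter places the data directly into the remote
         * application buffer named by (remote_addr, rkey); neither host's
         * CPU copies the payload. */
        struct ibv_send_wr wr = {
            .wr_id      = 1,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_RDMA_WRITE,
            .send_flags = IBV_SEND_SIGNALED,
        };
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        struct ibv_send_wr *bad_wr = NULL;
        return ibv_post_send(qp, &wr, &bad_wr);
    }

The point the slide makes is visible here: once the buffer is registered and named to the adapter, the transfer needs no intermediate packet buffers and no per-byte work on the host CPU.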
7. Traditional Data Center
[Diagram: data center servers connected to "The World"]
8. Why RDMA over IP? Business Case
- TCP/IP is not used for high bandwidth interconnection; host processing costs are too high
- High bandwidth transfer to become more prevalent: 10 GE, data centers
- Special purpose interfaces are expensive
- IP NICs are cheap and produced in volume
9. The Technical Problem: The I/O Bottleneck
- With TCP/IP, host processing can't keep up with link bandwidth, especially on receive
- Per-byte costs dominate, Clark (89)
- Well researched by the distributed systems community in the mid 1990s; industry experience
- Memory bandwidth doesn't scale; processor-memory performance gap, Hennessy (97), D. Patterson, T. Anderson (97)
- STREAM benchmark
10. Copying
- Using IP transports (TCP, SCTP) requires data copying
[Diagram: data copies on receive: (1) NIC to packet buffer, (2) packet buffer to user buffer]
11. Why Is Copying Important?
- Heavy resource consumption at high speed (1 Gb/s and up)
- Uses a large % of available CPU
- Uses a large fraction of available bus bandwidth: minimum 3 trips across the bus (see the worked example after the table)

  Test                    Throughput (Mb/s)   Tx CPUs   Rx CPUs
  1 GbE, TCP              769                 0.5       1.2
  1 Gb/s RDMA SAN (VIA)   891                 0.2       0.2

  (64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU)
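To see why the 3-trips figure matters, here is a rough back-of-envelope sketch (illustrative only, not from the slides): it multiplies the link rate by the minimum number of bus crossings to estimate the memory-bus traffic the copy path generates.

    /* Back-of-envelope sketch (illustrative only): memory-bus traffic when
     * every received byte crosses the bus a minimum of 3 times, per the
     * slide above. */
    #include <stdio.h>

    int main(void)
    {
        const double link_gbps[] = { 1.0, 10.0 };
        const int bus_trips = 3;                            /* minimum trips per byte */

        for (int i = 0; i < 2; i++) {
            double link_bytes = link_gbps[i] * 1e9 / 8.0;   /* bytes/s on the wire */
            double bus_bytes  = bus_trips * link_bytes;     /* bytes/s on the bus  */
            printf("%4.0f Gb/s link -> %.2f GB/s of memory-bus traffic\n",
                   link_gbps[i], bus_bytes / 1e9);
        }
        /* Prints 0.38 GB/s at 1 Gb/s and 3.75 GB/s at 10 Gb/s; the latter is
         * on the order of the sustainable memory bandwidth of hosts of that
         * era (cf. the STREAM benchmark on the earlier slide). */
        return 0;
    }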
12. What's In RDMA For Us?
- Network I/O becomes free (still have latency, though)
- 1750 machines using 0% CPU for I/O vs. 2500 machines using 30% CPU for I/O
13. Approaches to Copy Reduction
- On-host: special purpose software and/or hardware, e.g., zero-copy TCP, page flipping
  - Unreliable, idiosyncratic, expensive
- Memory-to-memory copies, using network protocols to carry placement information
  - Satisfactory experience: Fibre Channel, VIA, ServerNet
  - FOR HARDWARE, not software
14. RDMA over IP Standardization
- IETF RDDP (Remote Direct Data Placement) WG
  - http://ietf.org/html.charters/rddp-charter.html
- RDMAC (RDMA Consortium)
  - http://www.rdmaconsortium.org/home
15. RDMA over IP Architecture
- Two layers:
  - DDP: Direct Data Placement (placement information illustrated below)
  - RDMA: control
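To make "placement information" concrete, the sketch below shows, for illustration only, the kind of fields a tagged DDP segment carries so the receiving RNIC can place the payload directly. Field names and widths loosely follow the later DDP specification (RFC 5041) rather than anything in these slides, and the layout is not wire-accurate.

    /* Illustrative view of tagged DDP placement information (loosely after
     * the later RFC 5041 tagged buffer model; not a wire-accurate header). */
    #include <stdint.h>

    struct ddp_tagged_placement {
        uint8_t  control;    /* Tagged flag, Last flag, DDP version */
        uint8_t  rsvd_ulp;   /* reserved for the upper-layer protocol */
        uint32_t stag;       /* Steering Tag: names the registered destination buffer */
        uint64_t to;         /* Tagged Offset: byte offset within that buffer */
        /* payload follows; the RNIC can DMA it straight to (stag, to) */
    };

Because each segment is self-describing in this way, data can be placed as it arrives, even out of order, without intermediate packet buffers.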
16. Upper and Lower Layers
- ULPs: SDP (Sockets Direct Protocol), iSCSI, MPI
- DAFS is standardized NFSv4 on RDMA
- SDP provides the SOCK_STREAM API (see the sketch below)
- Over reliable transports: TCP, SCTP
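A short sketch of what "SDP provides the SOCK_STREAM API" means in practice: the client below is ordinary, unmodified sockets code, and an SDP implementation maps these same calls onto RDMA underneath (how SDP is selected, e.g. a preloaded library or a separate address family, varies by platform and is not specified in these slides). The address and port are placeholders.

    /* Ordinary SOCK_STREAM client: SDP's goal is that code like this runs
     * over RDMA unchanged.  Address and port below are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(5001);                    /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);  /* example address */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        const char msg[] = "hello over SOCK_STREAM";
        send(fd, msg, sizeof(msg), 0);   /* under SDP, moved via RDMA */
        close(fd);
        return 0;
    }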
17. Open Issues
- Security
- TCP order processing, framing
- Atomic ops
- Ordering constraints: performance vs. predictability
- Other transports: SCTP, TCP, unreliable
- Impact on network protocol behaviors
- Next performance bottleneck?
- What new applications?
- Eliminates the need for large MTUs (jumbos)?