Title: Assessment of Data Path Implementations for Download and Streaming
1Assessment of Data Path Implementations for
Download and Streaming
Pål Halvorsen
2Overview
- RELAY overview???
- Existing mechanisms in Linux
- Tested enhancements
- Ongoing
- Summary and Conclusions
3RELAY: Resource Utilization in Large-Scale Time-Dependent Systems
4RELAY people
- Internal
- Wladimir Palant
- Knut-Helge Vik
- Andreas Petlund
- Håkon Stensland
- Carsten Griwodz
- Pål Halvorsen
- External
- Svetlana Boudko
- Haakon Riiser
5Picture Today
[Figure labels: network (4x), P2P.]
6RELAY
- System support for improved resource utilization QoS
- Multimedia (game and video) servers
- Some current areas
- protocols for interactive applications
- multicast group maintenance
- latency hiding
- resource availability adaptation
- hybrid P2P streaming / streaming to mobile devices
- asymmetric multiprocessor scheduling
7Linux Data Path Implementations
8Delivery Systems
Network
9Delivery Systems
10Intel Hub Architecture
- several in-memory data movements and context switches
[Figure: Intel hub architecture. Labels: Pentium 4 Processor (registers, cache(s)), RDRAM (4x), PCI slots (3x).]
11Cost of Data Transfers
- Data copy operations are expensive
- consume CPU, memory, hub, bus and interface resources (proportional to size)
- profiling shows that 40 % of CPU time is consumed by copying data in a disk-network scenario
- the speed gap between memory and CPU is increasing
- different access times to different banks
- System calls cause many switches between user and kernel space
- 450 ns on 933MHz PentiumIII
- 920 ns on 1.7GHz PentiumIV
12Observation and Question
A lot of research has been performed in this area!
BUT, what is the status today of commodity OSes?
[Figure labels: IO-Lite, splice, MMBUF, stream, sendfile.]
13Content Download
bus(es)
14Content Download read / send
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (2x), DMA transfer (2x).]
- 2n copy operations
- 2n system calls
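The read / send path can be sketched in a few lines. This is a minimal illustration, not the code measured in the talk: `read_send` and `demo` are hypothetical names, a local socket pair stands in for the network, and the counter tracks only the calls that actually move data (one read and one send per chunk, matching the 2n figure).

```python
import os
import socket
import tempfile


def read_send(path, sock, chunk_size=4096):
    """Send a file with a read()/send() loop.

    Returns (bytes_sent, data_calls), counting only calls that moved data.
    """
    bytes_sent = 0
    data_calls = 0
    with open(path, "rb", buffering=0) as f:   # unbuffered: one read() per chunk
        while True:
            chunk = f.read(chunk_size)         # copy: page cache -> application buffer
            if not chunk:
                break
            data_calls += 1
            sock.sendall(chunk)                # copy: application buffer -> socket buffer
            data_calls += 1
            bytes_sent += len(chunk)
    return bytes_sent, data_calls


def demo():
    """Push a 3000-byte temporary file through a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 3000)
    sender, receiver = socket.socketpair()
    try:
        result = read_send(tmp.name, sender, chunk_size=1000)
        sender.close()                         # signal end of stream to the receiver
        received = b""
        while True:
            part = receiver.recv(4096)
            if not part:
                break
            received += part
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return result, len(received)
```

With three 1000-byte chunks, the loop issues three reads and three sends, so both counters grow linearly with the number of chunks.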
15Content Download mmap / send
[Figure: data path. Components: application, kernel, page cache, socket buffer. Labelled operations: copy (1x), DMA transfer (2x).]
- n copy operations
- 1 + n system calls
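A sketch of the mmap / send variant, under the same assumptions as a toy setup (hypothetical function names, a local socket pair instead of a real network). The file is mapped once, and each chunk is then handed to the socket directly from the mapping, so the application-buffer copy disappears and only the copy into the socket buffer remains.

```python
import mmap
import os
import socket
import tempfile


def mmap_send(path, sock, chunk_size=4096):
    """Map a non-empty file once, then send it chunk by chunk.

    Returns (bytes_sent, send_calls).
    """
    bytes_sent = 0
    send_calls = 0
    with open(path, "rb") as f:
        # One mmap() call up front: the "1" in "1 + n system calls".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for start in range(0, len(view), chunk_size):
                    with view[start:start + chunk_size] as piece:
                        sock.sendall(piece)    # copy: page cache -> socket buffer
                        send_calls += 1
                        bytes_sent += len(piece)
    return bytes_sent, send_calls


def demo():
    """Send a 3000-byte temporary file over a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"y" * 3000)
    sender, receiver = socket.socketpair()
    try:
        result = mmap_send(tmp.name, sender, chunk_size=1000)
        sender.close()
        received = b""
        while True:
            part = receiver.recv(4096)
            if not part:
                break
            received += part
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return result, received
```

The memoryview slices avoid creating intermediate `bytes` objects in user space; mapping an empty file with length 0 raises an error, so the sketch assumes a non-empty file.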
16Content Download sendfile
[Figure: data path. Components: application, kernel, page cache, socket buffer. Labelled operations: append descriptor, DMA transfer, gather DMA transfer.]
- 0 copy operations
- 1 system call
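For comparison, a hedged sketch of the sendfile path using Python's `os.sendfile`, which wraps the system call of the same name. The function names and the socket pair are illustrative assumptions. The loop is there because the kernel may transfer fewer bytes than requested; for a small file a single call is typically enough, which corresponds to the single system call counted above.

```python
import os
import socket
import tempfile


def sendfile_all(path, sock):
    """Send a whole file with sendfile(); returns (bytes_sent, calls)."""
    calls = 0
    offset = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while offset < size:
            # The kernel moves data from the page cache to the socket;
            # the payload never enters an application buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            calls += 1
            if sent == 0:
                break
            offset += sent
    return offset, calls


def demo():
    """Send a 3000-byte temporary file over a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"z" * 3000)
    sender, receiver = socket.socketpair()
    try:
        bytes_sent, calls = sendfile_all(tmp.name, sender)
        sender.close()
        received = b""
        while True:
            part = receiver.recv(4096)
            if not part:
                break
            received += part
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return bytes_sent, calls, received
```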
17Content Download Results
- Tested transfer of 1 GB file on Linux 2.6
- Both UDP (with enhancements) and TCP
[Figure: result charts for UDP and TCP.]
18Streaming
bus(es)
19Streaming read / send
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (2x), DMA transfer (2x).]
- 2n (3n) copy operations
- 2n system calls
20Streaming read / writev
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (3x), DMA transfer (2x).]
- 3n copy operations
- 2n system calls
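The streaming case differs from download in that every packet needs an RTP header in front of the file data. A minimal sketch of the read / writev idea follows, using `socket.sendmsg` with a list of buffers as Python's gather-write on sockets. The 12-byte fixed header layout follows RFC 3550; the payload type 96, the SSRC value, the timestamp step and the function names are placeholder assumptions, and a datagram socket pair stands in for UDP.

```python
import os
import socket
import struct
import tempfile

RTP_VERSION = 2


def rtp_header(seq, timestamp, ssrc, payload_type=96):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    first = RTP_VERSION << 6
    second = payload_type & 0x7F               # marker bit cleared
    return struct.pack("!BBHII", first, second,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def read_writev_stream(path, sock, payload_size=1024, ssrc=0x1234):
    """Per packet: one read for the payload, one gather-send for header + payload."""
    seq = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            payload = f.read(payload_size)     # system call 1: read()
            if not payload:
                break
            header = rtp_header(seq, seq * payload_size, ssrc)  # placeholder timestamp
            sock.sendmsg([header, payload])    # system call 2: gather write
            seq += 1
    return seq


def demo():
    """Stream a 2500-byte file as three packets and parse them back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"p" * 2500)
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        count = read_writev_stream(tmp.name, sender, payload_size=1000)
        parsed = []
        for _ in range(count):
            packet = receiver.recv(2048)
            first, _, seq, _, _ = struct.unpack("!BBHII", packet[:12])
            parsed.append((first >> 6, seq, len(packet) - 12))
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return parsed
```

Sending header and payload in one gather call avoids a separate send per header, which is why this path stays at two system calls per packet.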
21Streaming mmap / send
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (2x), DMA transfer (2x).]
- 2n copy operations
- 1 + 4n system calls
22Streaming mmap / writev
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (2x), DMA transfer (2x).]
- 2n copy operations
- 1 + n system calls
23Streaming sendfile
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (1x), append descriptor, DMA transfer, gather DMA transfer.]
- n copy operations
- 4n system calls
24Streaming Results
- Tested streaming of 1 GB file on Linux 2.6
- RTP over UDP
Compared to not sending an RTP header over UDP, we get an increase of 29 % (additional send call)
More copy operations and system calls are required, hence potential for improvements
[Chart label: TCP sendfile (content download).]
25Enhanced Streaming Data Paths
26Enhanced Streaming mmap / msend
msend allows sending data from an mmapped file without a copy
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (2x), append descriptor, DMA transfer (2x), gather DMA transfer.]
- n copy operations
- 1 + 4n system calls
27Enhanced Streaming mmap / rtpmsend
RTP header copy integrated into the msend system call
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (1x), append descriptor, DMA transfer, gather DMA transfer.]
- n copy operations
- 1 + n system calls
28Enhanced Streaming mmap / krtpmsend
An RTP engine in the kernel adds RTP headers
[Figure: data path. Components: application, application buffer, kernel, RTP engine, page cache, socket buffer. Labels: copy, append descriptor, DMA transfer, gather DMA transfer.]
- 0 copy operations
- 1 system call
29Enhanced Streaming rtpsendfile
RTP header copy integrated into the sendfile system call
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (1x), append descriptor, DMA transfer, gather DMA transfer.]
- n copy operations
- n system calls
30Enhanced Streaming krtpsendfile
An RTP engine in the kernel adds RTP headers
[Figure: data path. Components: application, application buffer, kernel, RTP engine, page cache, socket buffer. Labels: copy, append descriptor, DMA transfer, gather DMA transfer.]
- 0 copy operations
- 1 system call
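The per-mechanism counts from the streaming slides can be collected into a small model for side-by-side comparison. This is only a transcription aid: the table reads the counts for the mmap-based paths as one mmap call plus a per-packet term (1 + 4n, 1 + n), which is an interpretation of the slides, and it uses the 2n figure for read / send. The names `STREAMING_COSTS` and `cost` are made up for this sketch.

```python
# Each entry: (copies per packet, fixed system calls, system calls per packet).
STREAMING_COSTS = {
    # existing mechanisms
    "read / send": (2, 0, 2),
    "read / writev": (3, 0, 2),
    "mmap / send": (2, 1, 4),
    "mmap / writev": (2, 1, 1),
    "sendfile": (1, 0, 4),
    # enhanced mechanisms
    "mmap / msend": (1, 1, 4),
    "mmap / rtpmsend": (1, 1, 1),
    "mmap / krtpmsend": (0, 1, 0),
    "rtpsendfile": (1, 0, 1),
    "krtpsendfile": (0, 1, 0),
}


def cost(mechanism, n):
    """Return (copy operations, system calls) for streaming n packets."""
    copies_per_packet, fixed_calls, calls_per_packet = STREAMING_COSTS[mechanism]
    return copies_per_packet * n, fixed_calls + calls_per_packet * n


if __name__ == "__main__":
    packets = 1000
    for name in STREAMING_COSTS:
        copies, calls = cost(name, packets)
        print(f"{name:18s} copies={copies:5d} syscalls={calls:5d}")
```

The model makes the gap visible: for existing sendfile-based streaming the system call count grows four times as fast as the packet count, while the in-kernel RTP variants stay constant.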
31Enhanced Streaming Results
- Tested streaming of 1 GB file on Linux 2.6
- RTP over UDP
[Chart labels: mmap based mechanisms; sendfile based mechanisms; existing mechanism (streaming); TCP sendfile (content download). Annotations: 27 % improvement, 25 % improvement.]
32Ongoing Work
33Enhanced Streaming rtpsendfile
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy (1x), append descriptor, DMA transfer, gather DMA transfer.]
- n copy operations
- n system calls
Calls like writev and sendfilev exist
34Enhanced Streaming sendfilew
[Figure: data path. Components: application, application buffer, kernel, page cache, socket buffer. Labelled operations: copy, append descriptor, DMA transfer, gather DMA transfer. Additional label: len, off, src_fd, flags.]
- Batched system call enabling an arbitrary interleaving of blocks from files and user-space buffers to be sent as one or more packets
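To make the batching idea concrete, here is a user-space emulation of what such an interleaved block list could look like. Everything in it is an assumption for illustration: the slides only name the fields len, off, src_fd and flags, so the `Block` class, the `END_OF_PACKET` flag and `emulate_sendfilew` are hypothetical, and the emulation copies data in user space, which the proposed system call is meant to avoid.

```python
import os
import socket
import tempfile
from dataclasses import dataclass
from typing import Optional

END_OF_PACKET = 0x1   # hypothetical flag: this block closes the current packet


@dataclass
class Block:
    length: int                    # "len" on the slide
    offset: int = 0                # "off"
    src_fd: Optional[int] = None   # file to read from; None means use `buffer`
    flags: int = 0
    buffer: bytes = b""            # user-space data when src_fd is None


def emulate_sendfilew(sock, blocks):
    """Walk the block list once and emit one datagram per END_OF_PACKET."""
    packets = 0
    parts = []
    for blk in blocks:
        if blk.src_fd is None:
            parts.append(blk.buffer[blk.offset:blk.offset + blk.length])
        else:
            parts.append(os.pread(blk.src_fd, blk.length, blk.offset))
        if blk.flags & END_OF_PACKET:
            sock.sendmsg(parts)
            parts = []
            packets += 1
    if parts:                      # flush a trailing packet without the flag
        sock.sendmsg(parts)
        packets += 1
    return packets


def demo():
    """Interleave two user-space headers with two file blocks."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"AAAABBBB")
    fd = os.open(tmp.name, os.O_RDONLY)
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        blocks = [
            Block(length=2, buffer=b"h1"),
            Block(length=4, offset=0, src_fd=fd, flags=END_OF_PACKET),
            Block(length=2, buffer=b"h2"),
            Block(length=4, offset=4, src_fd=fd, flags=END_OF_PACKET),
        ]
        count = emulate_sendfilew(sender, blocks)
        return [receiver.recv(64) for _ in range(count)]
    finally:
        os.close(fd)
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

In the emulation the application makes one call with the whole list; a kernel-side version would be the point where the per-packet system calls collapse into a single batched call.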
35Conclusions
- sendfile works nicely for download scenarios
- Current commodity operating systems still pay a high price for streaming services
- However, small changes in the system call layer might be sufficient to remove most of the overhead
- In conclusion, commodity operating systems still have potential for improvement with respect to streaming support
- What can we hope to be supported?
36Questions??