Assessment of Data Path Implementations for Download and Streaming - PowerPoint PPT Presentation

Slides: 36
Provided by: paa13
Transcript and Presenter's Notes

1
Assessment of Data Path Implementations for
Download and Streaming
Pål Halvorsen
2
Overview
  • RELAY overview
  • Existing mechanisms in Linux
  • Tested enhancements
  • Ongoing
  • Summary and Conclusions

3
RELAY: Resource Utilization in Large-Scale
Time-Dependent Systems
4
RELAY people
  • Internal
    • Wladimir Palant
    • Knut-Helge Vik
    • Andreas Petlund
    • Håkon Stensland
    • Carsten Griwodz
    • Pål Halvorsen
  • External
    • Svetlana Boudko
    • Haakon Riiser

5
Picture Today
[Figure: four networks and P2P]
6
RELAY
  • System support for improved resource
    utilization and QoS
  • Multimedia (game and video) servers
  • Some current areas
    • protocols for interactive applications
    • multicast group maintenance
    • latency hiding
    • resource availability adaptation
    • hybrid P2P streaming / streaming to mobile
      devices
    • asymmetric multiprocessor scheduling

7
Linux Data Path Implementations
8
Delivery Systems
Network
9
Delivery Systems
10
Intel Hub Architecture
  • several in-memory data movements and context
    switches

[Figure: Intel Hub Architecture with Pentium 4 processor (registers, caches), RDRAM banks, and PCI slots]
11
Cost of Data Transfers
  • Data copy operations are expensive
    • consume CPU, memory, hub, bus, and interface
      resources (proportional to size)
    • profiling shows that 40% of CPU time is
      consumed by copying data in a disk-network
      scenario
    • the speed gap between memory and CPU is increasing
    • different access times to different banks
  • System calls cause many switches between user
    and kernel space
    • 450 ns on a 933 MHz Pentium III
    • 920 ns on a 1.7 GHz Pentium 4

12
Observation and Question
A lot of research has been performed in this area:
IO-Lite, splice, MMBUF, stream, sendfile, ...
But what is the status of commodity OSes today?
13
Content Download
bus(es)
14
Content Download: read / send
[Figure: data path through the application buffer, page cache, and socket buffer: 2 copies, 2 DMA transfers]
  • 2n copy operations
  • 2n system calls
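The read / send pattern above can be sketched in Python, whose `os.read` and `socket.sendall` are thin wrappers around the corresponding system calls. This is only an illustration of the call pattern, not the code that was measured; the function name `read_send` and the 4096-byte chunk size are assumptions, and the Python runtime adds copies of its own on top of the two counted on the slide.

```python
import os


def read_send(path, sock, chunk=4096):
    """Send a file over a connected socket with a read()/send() loop.

    Returns (bytes_sent, calls), where calls counts the read and send
    operations issued by this loop (open/close are not counted).
    """
    sent = 0
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk)   # copy: page cache -> application buffer
            calls += 1
            if not buf:                # empty read marks end of file
                break
            sock.sendall(buf)          # copy: application buffer -> socket buffer
            calls += 1
            sent += len(buf)
    finally:
        os.close(fd)
    return sent, calls
```

For a file that needs n chunks, the loop issues n reads and n sends plus one final read that detects end of file, which matches the 2n growth on the slide.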

15
Content Download: mmap / send
[Figure: data path from the page cache to the socket buffer: 1 copy, 2 DMA transfers]
  • n copy operations
  • 1 + n system calls
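A minimal Python sketch of the mmap / send pattern follows, assuming a connected socket and a hypothetical helper name `mmap_send`. The file is mapped once and each chunk is handed to `sendall` as a view into the mapping, so no intermediate application buffer is filled. The chunk size is an arbitrary choice for the example.

```python
import mmap
import os


def mmap_send(path, sock, chunk=4096):
    """Map a file once, then send it chunk by chunk from the mapping.

    Returns (bytes_sent, calls), where calls counts the single mmap
    plus one send operation per chunk.
    """
    sent = 0
    calls = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:                  # an empty file cannot be mapped
            return 0, 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            calls += 1                 # the one mmap() call
            with memoryview(m) as view:
                for off in range(0, size, chunk):
                    # Slice of the mapping: the kernel copies straight
                    # from the mapped pages into the socket buffer.
                    with view[off:off + chunk] as part:
                        sock.sendall(part)
                    calls += 1
                    sent += min(chunk, size - off)
    return sent, calls
```

With n chunks the function reports 1 + n calls: one mapping and n sends.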

16
Content Download: sendfile
[Figure: kernel-only data path: page cache, descriptor appended to the socket buffer, gather DMA transfer]
  • 0 copy operations
  • 1 system call
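The sendfile path can be sketched with Python's `os.sendfile`, which wraps the Linux system call. The helper name `sendfile_all` is an assumption. The loop is there only because the call may transfer fewer bytes than requested; in the ideal case on the slide a single call moves the whole file.

```python
import os


def sendfile_all(path, sock):
    """Send a whole file over a connected socket using sendfile().

    Returns (bytes_sent, calls). The data never enters a user-space
    buffer; the loop only handles short transfers.
    """
    sent = 0
    calls = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            calls += 1
            if n == 0:                 # nothing more could be sent
                break
            sent += n
    return sent, calls
```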

17
Content Download Results
  • Tested transfer of 1 GB file on Linux 2.6
  • Both UDP (with enhancements) and TCP


[Figures: results for UDP and TCP]
18
Streaming
bus(es)
19
Streaming: read / send
[Figure: data path through the application buffer, page cache, and socket buffer: 2 copies, 2 DMA transfers]
  • 2n (3n) copy operations
  • 2n system calls

20
Streaming: read / writev
[Figure: data path through the application buffer, page cache, and socket buffer: 3 copies, 2 DMA transfers]
  • 3n copy operations
  • 2n system calls
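As an illustration of the read / writev pattern, the sketch below reads one payload per packet and sends header and payload with a single gather write. The 12-byte header layout is the fixed RTP header from RFC 3550; the payload type 96, the SSRC value, the timestamp step, and the function names are illustrative assumptions, not values from the presentation.

```python
import os
import struct


def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Build the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6                                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def stream_read_writev(path, sock, payload_size=1024,
                       ssrc=0x12345678, ts_step=3000):
    """Stream a file as RTP packets: one read and one writev per packet.

    Returns the number of packets sent on the connected datagram socket.
    """
    packets = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            payload = os.read(fd, payload_size)       # page cache -> application buffer
            if not payload:
                break
            header = rtp_header(packets, packets * ts_step, ssrc)
            # Gather write: header and payload are both copied into the
            # socket buffer and leave as one datagram.
            os.writev(sock.fileno(), [header, payload])
            packets += 1
    finally:
        os.close(fd)
    return packets
```

Per packet this costs one read and one writev, and the data is copied once into the application buffer and then header and payload are each copied into the socket buffer, which is where the 3n copies and 2n system calls come from.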

21
Streaming: mmap / send
[Figure: data path through the application buffer, page cache, and socket buffer: 2 copies, 2 DMA transfers]
  • 2n copy operations
  • 1 + 4n system calls

22
Streaming: mmap / writev
[Figure: data path through the application buffer, page cache, and socket buffer: 2 copies, 2 DMA transfers]
  • 2n copy operations
  • 1 + n system calls

23
Streaming: sendfile
[Figure: data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • n copy operations
  • 4n system calls

24
Streaming Results
  • Tested streaming of 1 GB file on Linux 2.6
  • RTP over UDP

Compared to not sending an RTP header over UDP,
we get an increase of 29% (additional send call).
More copy operations and system calls are required
→ potential for improvements
[Figure label: TCP sendfile (content download)]
25
Enhanced Streaming Data Paths
26
Enhanced Streaming: mmap / msend
msend allows sending data from an mmapped file
without a copy
[Figure: data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • n copy operations
  • 1 + 4n system calls

27
Enhanced Streaming: mmap / rtpmsend
RTP header copy integrated into the msend system call
[Figure: data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • n copy operations
  • 1 + n system calls

28
Enhanced Streaming: mmap / krtpmsend
An RTP engine in the kernel adds RTP headers
[Figure: data path with an in-kernel RTP engine, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • 0 copy operations
  • 1 system call

29
Enhanced Streaming: rtpsendfile
RTP header copy integrated into the sendfile system call
[Figure: data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • n copy operations
  • n system calls

30
Enhanced Streaming: krtpsendfile
An RTP engine in the kernel adds RTP headers
[Figure: data path with an in-kernel RTP engine, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • 0 copy operations
  • 1 system call

31
Enhanced Streaming Results
  • Tested streaming of 1 GB file on Linux 2.6
  • RTP over UDP

[Figure: results for mmap-based and sendfile-based mechanisms, compared with the existing mechanism (streaming) and TCP sendfile (content download); annotated improvements of 27% and 25%]
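The per-mechanism costs listed on the streaming slides can be collected in one small table for a stream of n packets. This is only a tally of the slide bullets, not a performance model. Two readings are assumptions: counts that appear as "1 n" and "1 4n" are taken to mean 1 + n and 1 + 4n (one mmap call plus per-packet calls), and read / send uses the 2n copy figure rather than the alternative 3n given in parentheses. The function name `streaming_costs` is hypothetical.

```python
def streaming_costs(n):
    """Return {mechanism: (copy_operations, system_calls)} for n packets.

    Values are transcribed from the slide bullets; see the assumptions
    stated in the text above.
    """
    return {
        # existing Linux mechanisms
        "read/send":      (2 * n, 2 * n),
        "read/writev":    (3 * n, 2 * n),
        "mmap/send":      (2 * n, 1 + 4 * n),
        "mmap/writev":    (2 * n, 1 + n),
        "sendfile":       (n, 4 * n),
        # tested enhancements
        "mmap/msend":     (n, 1 + 4 * n),
        "mmap/rtpmsend":  (n, 1 + n),
        "mmap/krtpmsend": (0, 1),
        "rtpsendfile":    (n, n),
        "krtpsendfile":   (0, 1),
    }


def print_costs(n):
    """Print the table, cheapest mechanisms first."""
    rows = sorted(streaming_costs(n).items(), key=lambda kv: (sum(kv[1]), kv[0]))
    for name, (copies, calls) in rows:
        print(f"{name:<15} copies={copies:<8} syscalls={calls}")
```

Calling `print_costs(1000)` lists the two in-kernel RTP variants first, since their cost does not grow with the number of packets.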
32
Ongoing Work
33
Enhanced Streaming: rtpsendfile
[Figure: data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • n copy operations
  • n system calls

→ Calls like writev and sendfilev exist
34
Enhanced Streaming: sendfilew
[Figure: block descriptor with fields len, off, src_fd, flags; data path with 1 copy from the application buffer, a descriptor appended from the page cache to the socket buffer, and a gather DMA transfer]
  • Batched system call enabling an arbitrary
    interleaving of blocks from files and user-space
    buffers to be sent as one or more packets
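To make the idea of a batched block list concrete, here is a user-space emulation in Python. It is not the proposed kernel call and saves no system calls; it only shows how a list of descriptors could interleave file blocks and user-space buffers into packets. The field names `length`, `offset`, `src_fd`, and `flags` mirror the "len, off, src_fd, flags" label on the slide, while the `data` field, the `END_OF_PACKET` flag and its value, and the rule that this flag closes a packet are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Optional

END_OF_PACKET = 0x1   # hypothetical flag: this block is the last one of a packet


@dataclass
class Block:
    length: int                    # "len": number of bytes in this block
    offset: int = 0                # "off": file offset (unused for buffers)
    src_fd: Optional[int] = None   # "src_fd": source file, or None for a buffer
    flags: int = 0                 # "flags"
    data: bytes = b""              # hypothetical: user-space buffer contents


def sendfilew_emulated(sock, blocks):
    """Emulate a batched send in user space; returns the number of packets."""
    packets = 0
    parts = []
    for blk in blocks:
        if blk.src_fd is not None:
            parts.append(os.pread(blk.src_fd, blk.length, blk.offset))
        else:
            parts.append(blk.data[:blk.length])
        if blk.flags & END_OF_PACKET:
            os.writev(sock.fileno(), parts)   # one datagram per packet
            packets += 1
            parts = []
    if parts:                                 # flush a trailing open packet
        os.writev(sock.fileno(), parts)
        packets += 1
    return packets
```

A caller would build one list for the whole batch, for example alternating a header buffer with a file block marked `END_OF_PACKET`, and pass it in a single call.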

35
Conclusions
  • sendfile works well for download scenarios
  • Current commodity operating systems still pay a
    high price for streaming services
  • However, small changes in the system call layer
    might be sufficient to remove most of the
    overhead
  • In conclusion, commodity operating systems still
    have potential for improvement with respect to
    streaming support
  • What can we hope to be supported?

36
Questions?