Title: LAIO: Lazy Asynchronous I/O For Event Driven Servers
LAIO: Lazy Asynchronous I/O For Event Driven Servers
- Khaled Elmeleegy
- Alan L. Cox
Outline
- Available I/O APIs and their shortcomings.
- Event-driven programming and its challenges.
- Lazy Asynchronous I/O (LAIO).
- Experiments and results.
- Conclusions.
Key Idea
- Existing I/O APIs fall short of what event-driven servers need.
- LAIO fixes that.
Non-Blocking I/O
- A system call may return without fully completing the operation.
  - Example: a write to a socket may accept only part of the buffer.
- A system call may also return with the operation complete.
- Disadvantages:
  - Not available for disk operations.
  - The program using it must maintain state to resume partially completed operations.
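A partial write, and the state it forces the program to keep, can be sketched in C. This is a minimal sketch that assumes a POSIX system. `struct pending_write`, `set_nonblocking`, and `try_flush` are hypothetical names, and a socketpair(2) can stand in for a client connection when trying it out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state: with non-blocking I/O the program,
 * not the kernel, remembers how much of the reply has been sent. */
struct pending_write {
    const char *buf;
    size_t len;
    size_t off;            /* bytes already accepted by the socket */
};

/* Puts fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Writes as much of pw as the socket accepts right now.
 * Returns 1 when everything has been sent, 0 if the socket is full
 * (call again once it is writable), or -1 on error. */
static int try_flush(int fd, struct pending_write *pw) {
    assert(pw->off <= pw->len);
    while (pw->off < pw->len) {
        ssize_t n = write(fd, pw->buf + pw->off, pw->len - pw->off);
        if (n > 0)
            pw->off += (size_t)n;          /* partial completion: record progress */
        else if (n < 0 && errno == EINTR)
            continue;                      /* interrupted: just retry */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                      /* would block: back to the event loop */
        else
            return -1;
    }
    return 1;
}
```

A handler would call `try_flush` once when it has a reply, then again each time the event loop reports the socket writable, until it returns 1. The `off` field is exactly the state the slide refers to.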
Asynchronous I/O (AIO)
- The system call returns immediately.
- The operation always runs to completion and sends a notification on completion.
  - Via a signal, an event, or polling.
- Disadvantages:
  - Lacks disk operations such as open and stat.
  - Completion is always delivered through a notification, even if the operation didn't block.
    - Lower performance.
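For comparison, the AIO pattern can be sketched with the POSIX `<aio.h>` interface. The read is queued, and its result is collected through the completion path even when the data is already cached. This is a minimal sketch that assumes POSIX AIO is available (older glibc versions need `-lrt`). `aio_read_path` is a hypothetical helper, and it polls with aio_suspend(3) instead of requesting a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Reads up to len bytes from the start of the file at path using POSIX AIO.
 * aio_read() only queues the request; the result is always collected via
 * the completion interface (here: aio_error/aio_suspend polling), even if
 * the data was already in memory. Returns bytes read, or -1 on error. */
static ssize_t aio_read_path(const char *path, void *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no signal; we poll */

    if (aio_read(&cb) != 0) {                    /* request not queued */
        close(fd);
        return -1;
    }

    const struct aiocb *const list[1] = { &cb };
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);  /* an event loop would do other work */

    ssize_t n = aio_return(&cb);           /* reap the request exactly once */
    close(fd);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```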
Event Driven Programming with I/O (What we have)
If open() blocks, the whole server stalls.

    event_loop(..)
        while (true)
            event_list = get available events
            for each event ev in event_list do
                call handler of ev

    handler()
        /* do stuff 1 */
        open(..)        /* may block */
        /* do stuff 2 */
        return          /* to event_loop */
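The "what we have" structure maps onto a conventional poll(2) loop. This is a minimal sketch that assumes POSIX. `handler`, `event_loop`, and the `handled` counter are hypothetical names. A pipe stands in for a client connection, and opening `/dev/null` stands in for a file open that might block on disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

typedef void (*handler_fn)(int fd);

static int handled = 0;  /* counts completed handler runs (for illustration) */

/* "What we have": stuff 1, a synchronous open(2), then stuff 2, all in one
 * handler. If open() had to wait for the disk, the thread would sit inside
 * this call and event_loop() could not service any other descriptor. */
static void handler(int fd) {
    char c;
    if (read(fd, &c, 1) != 1)                      /* do stuff 1 */
        return;
    int f = open("/dev/null", O_RDONLY);           /* may block on a real disk */
    if (f >= 0)
        close(f);                                  /* do stuff 2 */
    handled++;
}

/* Single-threaded event loop: wait for readable descriptors and call each
 * one's handler in turn. Stops after max_iters rounds or 100 ms of idleness
 * so that the sketch terminates. */
static void event_loop(struct pollfd *fds, handler_fn *handlers, nfds_t n, int max_iters) {
    assert(fds != NULL && handlers != NULL);
    for (int it = 0; it < max_iters; it++) {
        if (poll(fds, n, 100) <= 0)
            break;
        for (nfds_t i = 0; i < n; i++)
            if (fds[i].revents & POLLIN)
                handlers[i](fds[i].fd);
    }
}
```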
Event Driven Programming with I/O (What we want)

    handler1()
        /* do stuff 1 */
        open(..)        /* may block */
        if open blocks
            set handler2 as callback for open
            return      /* to event_loop */
        /* do stuff 2 */
        return          /* to event_loop */

    event_loop(..)
        while (true)
            event_list = get available events
            for each event ev in event_list do
                call event_handler of ev

    handler2()
        /* do stuff 2 */
        return          /* to event_loop */
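Without kernel support, the handler split can still be sketched with a continuation table. In this sketch, `fake_open` is a hypothetical stand-in for an open(2) that either completes immediately or reports `EINPROGRESS` with a handle, controlled by the `simulate_blocking` flag. `dispatch_completion` plays the part of the event loop delivering a completion. None of these names come from LAIO.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a lazily asynchronous open(2): it either
 * "completes" at once or fails with EINPROGRESS and leaves a handle in
 * last_handle, depending on the simulate_blocking flag. */
static int simulate_blocking = 0;
static int next_handle = 1;
static int last_handle = 0;

static int fake_open(void) {
    if (simulate_blocking) {
        last_handle = next_handle++;
        errno = EINPROGRESS;
        return -1;
    }
    return 3;                                 /* pretend file descriptor */
}

/* Continuation table: which handler runs when a given handle completes. */
#define MAX_PENDING 16
static struct { int handle; void (*cb)(int result); } pending[MAX_PENDING];
static int npending = 0;
static int finished = 0;                      /* requests that reached stuff 2 */

static void handler2(int fd) {
    (void)fd;                                 /* do stuff 2 with the file */
    finished++;
}                                             /* return to event_loop */

static void handler1(void) {
    /* do stuff 1 */
    int fd = fake_open();                     /* may block */
    if (fd == -1 && errno == EINPROGRESS) {   /* it blocked: */
        assert(npending < MAX_PENDING);
        pending[npending].handle = last_handle;
        pending[npending].cb = handler2;      /* set handler2 as the callback */
        npending++;
        return;                               /* to event_loop */
    }
    handler2(fd);                             /* did not block: continue now */
}

/* What the event loop does when a completion for `handle` arrives. */
static void dispatch_completion(int handle, int result) {
    for (int i = 0; i < npending; i++) {
        if (pending[i].handle == handle) {
            void (*cb)(int) = pending[i].cb;
            pending[i] = pending[--npending]; /* unregister (swap with last) */
            cb(result);
            return;
        }
    }
}
```

In a real server, each continuation would also need per-request context, such as the connection and the file name. Saving that context is the bookkeeping the handler split pushes onto the programmer.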
Lazy Asynchronous I/O (LAIO)
- Like AIO when the operation blocks: completion is reported through an asynchronous notification.
- Also like AIO: operations are done in one shot, with no partial completions.
- Like non-blocking I/O when the operation completes without blocking: the result is returned immediately.
- Based on scheduler activations.
  - A scheduler activation is an upcall delivered by the kernel when a thread blocks or unblocks.
LAIO API
Function | Description
--- | ---
int laio_syscall(int number, ...) | Performs the specified system call asynchronously.
void *laio_gethandle(void) | Returns a handle to the last LAIO operation.
laio_list laio_poll(void) | Returns a list of handles to completed LAIO operations.
laio_syscall(int number, ...)
- Enable upcalls.
- Save the context.
- Invoke the system call.
- Does the system call block?
  - No:
    - Disable upcalls.
    - Return retval.
  - Yes: upcall_handler(..), invoked via a kernel upcall:
    - . . .
    - Steals the old stack using the stored context.
    - Sets errno to EINPROGRESS.
    - Returns -1.
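The lazy contract of laio_syscall can be emulated in user space, though not with LAIO's mechanism. The contract is an immediate result when the call does not block, and -1 with errno set to EINPROGRESS plus a later completion when it does. The sketch below swaps scheduler activations for a helper thread plus a short grace period, so "blocked" here only means "did not finish in time". `lazy_call`, `lazy_poll`, and `struct lazy_op` are hypothetical names that loosely mirror laio_syscall, laio_gethandle, and laio_poll. The sketch assumes POSIX threads (compile with `-pthread`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical thread-based emulation of LAIO's semantics (NOT scheduler
 * activations): the call runs on a helper thread and the caller waits a
 * short grace period. Finishing within it counts as "did not block". */
typedef long (*op_fn)(void);

struct lazy_op {
    op_fn fn;
    long result;
    int done;               /* set by the worker when fn() returns */
    int async;              /* set by the caller when it stops waiting */
    struct lazy_op *next;   /* link in the completion list */
};

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static struct lazy_op *completed = NULL;   /* finished asynchronous ops */

static void *worker(void *arg) {
    struct lazy_op *op = arg;
    long r = op->fn();                     /* the call that may block */
    pthread_mutex_lock(&mu);
    op->result = r;
    op->done = 1;
    if (op->async) {                       /* caller already got EINPROGRESS */
        op->next = completed;
        completed = op;
    }
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);             /* op is not touched after this */
    return NULL;
}

/* Returns fn()'s result if it finishes within grace_ms (0..999). Otherwise
 * returns -1 with errno = EINPROGRESS and stores a handle in *handle; the
 * handle later appears in lazy_poll() and must be freed by the caller. */
static long lazy_call(op_fn fn, long grace_ms, struct lazy_op **handle) {
    assert(grace_ms >= 0 && grace_ms < 1000);
    struct lazy_op *op = calloc(1, sizeof *op);
    if (op == NULL) { errno = ENOMEM; return -1; }
    op->fn = fn;
    pthread_t t;
    if (pthread_create(&t, NULL, worker, op) != 0) { free(op); errno = EAGAIN; return -1; }
    pthread_detach(t);

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock for timedwait */
    deadline.tv_nsec += grace_ms * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mu);
    while (!op->done)
        if (pthread_cond_timedwait(&cv, &mu, &deadline) == ETIMEDOUT)
            break;
    if (op->done) {                 /* like non-blocking I/O: result right away */
        long r = op->result;
        pthread_mutex_unlock(&mu);
        free(op);
        return r;
    }
    op->async = 1;                  /* like AIO: completion reported later */
    pthread_mutex_unlock(&mu);
    *handle = op;
    errno = EINPROGRESS;
    return -1;
}

/* Returns (and clears) the list of completed asynchronous operations. */
static struct lazy_op *lazy_poll(void) {
    pthread_mutex_lock(&mu);
    struct lazy_op *list = completed;
    completed = NULL;
    pthread_mutex_unlock(&mu);
    return list;
}

/* Two example operations for trying the sketch out. */
static long fast_op(void) { return 42; }
static long slow_op(void) {
    struct timespec ts = { 0, 300000000L };   /* 300 ms "disk wait" */
    nanosleep(&ts, NULL);
    return 7;
}
```

Unlike LAIO, this emulation creates a thread on every call. Scheduler activations avoid that overhead when the call does not block, because they only enable and disable upcalls.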
Experiments and Experimental Setup
- Performance is evaluated using both micro-benchmarks and event-driven web servers (thttpd and Flash).
- Machines: 2.4 GHz Pentium Xeon with 2 GB of RAM.
- FreeBSD-5 with KSE, FreeBSD's scheduler activation implementation.
- Two web traces, Rice and Berkeley, with working set sizes of 1.1 GB and 6.4 GB, respectively.
Micro-benchmarks
- Read a byte from a pipe 100,000 times, in two cases: non-blocking and blocking.
  - Non-blocking (byte ready on the pipe):
    - LAIO is 320% faster than AIO.
    - LAIO is 40% slower than non-blocking I/O.
  - Blocking (byte not ready on the pipe):
    - AIO is 8% faster than LAIO.
- Call getpid(2) 1,000,000 times, in two cases: KSE enabled and KSE disabled.
  - With KSE disabled, the program was 5% faster (KSE overhead).
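A timing harness in the spirit of the first micro-benchmark (byte ready on the pipe) could look like the sketch below. It is an assumption, not the authors' harness. `time_ready_pipe_reads` is hypothetical, and it measures only the plain non-blocking read(2) baseline. For the LAIO and AIO variants, one would wrap the same read in laio_syscall or an AIO request instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Times `iters` one-byte non-blocking read(2) calls on a pipe whose byte is
 * always ready: this sketch writes it just before each read, so the write(2)
 * cost is included in the total. Returns seconds, or -1.0 on error. */
static double time_ready_pipe_reads(long iters) {
    assert(iters > 0);
    int p[2];
    if (pipe(p) != 0)
        return -1.0;
    double elapsed = -1.0;
    int flags = fcntl(p[0], F_GETFL);
    if (flags >= 0 && fcntl(p[0], F_SETFL, flags | O_NONBLOCK) >= 0) {
        struct timespec t0, t1;
        char c = 'x';
        long i;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++)
            if (write(p[1], &c, 1) != 1 || read(p[0], &c, 1) != 1)
                break;                   /* unexpected error or empty pipe */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (i == iters)
            elapsed = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    close(p[0]);
    close(p[1]);
    return elapsed;
}
```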
thttpd Experiments
- thttpd is an event-driven server, modified to use libevent, an event notification library.
- Two versions of thttpd: libevent-thttpd and LAIO-thttpd.
- For LAIO-thttpd, thttpd was modified by breaking up event handlers around blocking operations such as open.
thttpd Results (Berkeley Throughput)
thttpd Results (Berkeley Response Time)
thttpd Results (Rice Throughput)
thttpd Results (Rice Response Time)
thttpd Results (Rice Throughput, 512 MB RAM)
thttpd Results (Rice Response Time, 512 MB RAM)
Flash
- An event-driven web server.
- Three flavors:
  - Pure event-driven.
  - AMPED (Asymmetric Multiprocess Event Driven):
    - Event-driven core.
    - Potentially blocking I/O is handed off to a helper process.
    - The helper does an explicit read to bring the data into memory.
  - LAIO: uses LAIO to do all I/O asynchronously.
- For each of the three flavors, files are sent either with sendfile(2) or using mmap(2).
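An AMPED-style helper can be sketched with fork(2) and two pipes. This is a minimal sketch, not Flash's code. `start_helper`, `helper_prefetch`, `stop_helper`, the `MAX_PATH_LEN` limit, and the one-request-at-a-time protocol are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PATH_LEN 256   /* hypothetical limit for this sketch */

/* Helper process body: read one path per request, read the whole file so
 * that it is brought into memory, and reply with its size (-1 on error).
 * Assumes one outstanding request at a time, so each read() of the request
 * pipe returns exactly one path. */
static void helper_loop(int req_fd, int rep_fd) {
    char path[MAX_PATH_LEN];
    static char scratch[1 << 16];
    for (;;) {
        ssize_t n = read(req_fd, path, sizeof path - 1);
        if (n <= 0)
            return;                              /* server closed the pipe */
        path[n] = '\0';
        long total = -1;
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            ssize_t r;
            total = 0;
            while ((r = read(fd, scratch, sizeof scratch)) > 0)
                total += r;                      /* data now in the page cache */
            if (r < 0)
                total = -1;
            close(fd);
        }
        if (write(rep_fd, &total, sizeof total) != (ssize_t)sizeof total)
            return;
    }
}

struct helper { pid_t pid; int req_fd; int rep_fd; };

/* Forks the helper and connects it with two pipes. Returns 0 or -1. */
static int start_helper(struct helper *h) {
    int req[2], rep[2];
    if (pipe(req) != 0)
        return -1;
    if (pipe(rep) != 0) {
        close(req[0]); close(req[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(req[0]); close(req[1]); close(rep[0]); close(rep[1]);
        return -1;
    }
    if (pid == 0) {                              /* child: the helper */
        close(req[1]); close(rep[0]);
        helper_loop(req[0], rep[1]);
        _exit(0);
    }
    close(req[0]); close(rep[1]);                /* parent: the server */
    h->pid = pid; h->req_fd = req[1]; h->rep_fd = rep[0];
    return 0;
}

/* Hands a potentially blocking read to the helper. A real AMPED server
 * would register h->rep_fd with its event loop instead of waiting here. */
static long helper_prefetch(const struct helper *h, const char *path) {
    size_t len = strlen(path);
    assert(len > 0 && len < MAX_PATH_LEN);
    if (write(h->req_fd, path, len) != (ssize_t)len)
        return -1;
    long total;
    if (read(h->rep_fd, &total, sizeof total) != (ssize_t)sizeof total)
        return -1;
    return total;
}

static void stop_helper(struct helper *h) {
    close(h->req_fd);                            /* helper sees EOF and exits */
    close(h->rep_fd);
    waitpid(h->pid, NULL, 0);
}
```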
Flash Experiments
- All experiments are done with 500 clients.
- All sockets are blocking.
- mmap: the file is mapped into memory, then written to the socket.
  - Page faults may occur.
  - mincore(2) is used to check whether pages are in memory.
- sendfile: the file is sent via the sendfile(2) system call, which may block.
  - Optimized sendfile: the kernel is modified so that sendfile returns if it would block on disk.
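The residency check with mincore(2) can be sketched as follows. The sketch assumes the Linux prototype, which takes an `unsigned char *` vector (FreeBSD and macOS declare `char *`). `resident_pages` is a hypothetical helper, and what a server does with the answer is only outlined in the comment.

```c
#define _DEFAULT_SOURCE            /* exposes mincore(2) in glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the file at path and uses mincore(2) to count its resident pages.
 * A Flash-style server could write straight from the mapping when every
 * page is resident, and otherwise hand the file to a helper (or use LAIO),
 * since touching a non-resident page would fault and block.
 * Returns the resident count and stores the total in *npages, or returns
 * -1 on error (including an empty file, which cannot be mapped). */
static long resident_pages(const char *path, long *npages) {
    assert(path != NULL && npages != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                             /* the mapping stays valid */
    if (addr == MAP_FAILED)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (len + page - 1) / page;
    unsigned char *vec = malloc(pages);
    long resident = -1;
    if (vec != NULL && mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;        /* low bit set: page is resident */
        *npages = (long)pages;
    }
    free(vec);
    munmap(addr, len);
    return resident;
}
```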
Flash Throughput (mmap)
Configuration | Flash-event (mmap) | Flash-AMPED (mmap) | Flash-LAIO (mmap)
--- | --- | --- | ---
Rice-Cold | 203 Mbps | 386 Mbps | 299 Mbps
Rice-Warm | 830 Mbps | 800 Mbps | 797 Mbps
Berkeley-Cold | 81 Mbps | 134 Mbps | 132 Mbps
Berkeley-Warm | 78 Mbps | 127 Mbps | 131 Mbps
- For Rice-Cold, AMPED makes 41,072 callouts to the helper process, while LAIO incurs 46,486 page faults.
- The performance difference is due to prefetching.
Flash Throughput (sendfile)
Configuration | Flash-event (sendfile) | Flash-AMPED (sendfile) | Flash-LAIO (sendfile)
--- | --- | --- | ---
Rice-Cold | 277 Mbps | 398 Mbps | 382 Mbps
Rice-Warm | 845 Mbps | 843 Mbps | 815 Mbps
Berkeley-Cold | 122 Mbps | 171 Mbps | 171 Mbps
Berkeley-Warm | 125 Mbps | 180 Mbps | 179 Mbps
Conclusions
- LAIO overcomes the shortcomings of the other I/O APIs.
- LAIO is more than 3 times faster than AIO when the data is in memory.
- LAIO serves event-driven servers well.
  - LAIO increased thttpd throughput by 38%.
  - LAIO matched Flash's performance with no kernel modifications.