Web Servers: Implementation and Performance - PowerPoint PPT Presentation

1
Web Servers: Implementation and Performance
  • Erich Nahum

IBM T.J. Watson Research Center
www.research.ibm.com/people/n/nahum
nahum@us.ibm.com
2
Contents of This Tutorial
  • Introduction to HTTP
  • HTTP Servers
  • Outline of an HTTP Server Transaction
  • Server Models: Processes, Threads, Events
  • Event Notification and Asynchronous I/O
  • HTTP Server Workloads
  • Workload Characteristics
  • Workload Generation
  • Server TCP Issues
  • Introduction to TCP
  • Server TCP Dynamics
  • Server TCP Implementation Issues
  • Other Issues (time permitting)
  • Large Site Studies
  • Clusters
  • Running Experiments
  • Brief Overview of Other Topics

3
Things Not Covered in Tutorial
  • Client-side issues: DNS, HTML rendering
  • Proxies: some similarities, many differences
  • Dynamic Content: CGI, PHP, ASP, etc.
  • QoS for Web Servers
  • SSL/TLS and HTTPS
  • Content Distribution Networks (CDNs)
  • Security and Denial of Service

If time is available, may cover briefly at the end
4
Assumptions and Expectations
  • Some familiarity with WWW as a user
  • (Has anyone here not used a browser?)
  • Some familiarity with networking concepts
  • (e.g., unreliability, reordering, race
    conditions)
  • Familiarity with systems programming
  • (e.g., know what sockets, hashing, caching are)
  • Examples will be based on C and Unix
  • taken from BSD, Linux, AIX, and real servers
  • (sorry, Java and Windows fans)

5
Objectives and Takeaways
After this tutorial, hopefully we will all know
  • Basics of server implementation and performance
  • Pros and cons of various server architectures
  • Difficulties in workload generation
  • Interactions between HTTP and TCP
  • Design loop of implement, measure, profile,
    debug, and fix

Many lessons should be applicable to any
networked server, e.g., files, mail, news, DNS,
LDAP, etc.
6
Timeline
  • Intro, HTTP, server transaction 40 min
  • Server models, event notification 40 min
  • Workload characterization and generation 40 min
  • Intro to TCP, dynamics, implementation 40 min
  • Clusters, large site studies, experiments 30 min
  • Other topics time permitting

7
Acknowledgements
Many people contributed comments and suggestions
to this tutorial, including
Abhishek Chandra, Mark Crovella, Suresh Chari,
Peter Druschel, Jim Kurose, Balachander
Krishnamurthy, Vivek Pai, Jennifer Rexford, and
Anees Shaikh.
Errors are all mine, of course.
8
Chapter 1: Introduction to HTTP
9
Introduction to HTTP
[Diagram: laptop (Netscape) and desktop (Explorer) clients sending
HTTP requests to, and receiving HTTP responses from, a server
running Apache]
  • HTTP: Hypertext Transfer Protocol
  • Communication protocol between clients and
    servers
  • Application layer protocol for WWW
  • Client/Server model
  • Client: browser that requests, receives, displays
    objects
  • Server: receives requests and responds to them
  • Protocol consists of various operations
  • Few for HTTP 1.0 (RFC 1945, 1996)
  • Many more in HTTP 1.1 (RFC 2616, 1999)

10
How are Requests Generated?
  • User clicks on something
  • Uniform Resource Locator (URL)
  • http://www.nytimes.com
  • https://www.paymybills.com
  • ftp://ftp.kernel.org
  • news://news.deja.com
  • telnet://gaia.cs.umass.edu
  • mailto:nahum@us.ibm.com
  • Different URL schemes map to different services
  • Hostname is converted from a name to a 32-bit IP
    address (DNS resolve)
  • Connection is established to server

Most browser requests are HTTP requests.
11
What Happens Then?
  • Client downloads HTML document
  • Sometimes called container page
  • Typically in text format (ASCII)
  • Contains instructions for rendering
  • (e.g., background color, frames)
  • Links to other pages
  • Many have embedded objects
  • Images: GIF, JPG (logos, banner ads)
  • Usually automatically retrieved
  • i.e., without user involvement
  • user can sometimes control this
  • (e.g., browser options, junkbusters)

<html> <head>
<meta name="Author" content="Erich Nahum">
<title> Linux Web Server Performance </title>
</head> <body text="#000000">
<img width=31 height=11 src="ibmlogo.gif">
<img src="images/new.gif">
<h1>Hi There!</h1>
Here's lots of cool linux stuff!
<a href="more.html"> Click here</a> for more!
</body> </html>

sample html file
12
So What's a Web Server Do?
  • Respond to client requests, typically a browser
  • Can be a proxy, which aggregates client requests
    (e.g., AOL)
  • Could be search engine spider or custom (e.g.,
    Keynote)
  • May have work to do on client's behalf
  • Is the client's cached copy still good?
  • Is client authorized to get this document?
  • Is client a proxy on someone else's behalf?
  • Run an arbitrary program (e.g., stock trade)
  • Hundreds or thousands of simultaneous clients
  • Hard to predict how many will show up on some day
  • Many requests are in progress concurrently

Server capacity planning is non-trivial.
13
What do HTTP Requests Look Like?
GET /images/penguin.gif HTTP/1.0 User-Agent
Mozilla/0.9.4 (Linux 2.2.19) Host
www.kernel.org Accept text/html, image/gif,
image/jpeg Accept-Encoding gzip Accept-Language
en Accept-Charset iso-8859-1,,utf-8 Cookie
Bxh203jfsf Y3sdkfjej ltcrgtltlfgt
  • Messages are in ASCII (human-readable)
  • Carriage-return and line-feed indicate end of
    headers
  • Headers may communicate private information
  • (browser, OS, cookie information, etc.)
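The request line above can be pulled apart with a few lines of C. A minimal sketch (`parse_request_line` and the buffer sizes are illustrative, not taken from any real server):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "METHOD URI HTTP/x.y" out of the first request line.
 * Returns 1 on success, 0 on a malformed line.
 * %31s / %255s / %15s bound each field to its buffer size. */
int parse_request_line(const char *line,
                       char *method, char *uri, char *version)
{
    return sscanf(line, "%31s %255s %15s", method, uri, version) == 3;
}
```

For instance, parsing "GET /images/penguin.gif HTTP/1.0" yields the method "GET" and the URI "/images/penguin.gif".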

14
What Kind of Requests are there?
  • Called Methods
  • GET: retrieve a file (95% of requests)
  • HEAD: just get meta-data (e.g., mod time)
  • POST: submitting a form to a server
  • PUT: store enclosed document as URI
  • DELETE: remove named resource
  • LINK/UNLINK: in 1.0, gone in 1.1
  • TRACE: http echo for debugging (added in 1.1)
  • CONNECT: used by proxies for tunneling (1.1)
  • OPTIONS: request for server/proxy options (1.1)

15
What Do Responses Look Like?
HTTP/1.0 200 OK
Server: Tux 2.0
Content-Type: image/gif
Content-Length: 43
Last-Modified: Fri, 15 Apr 1994 02:36:21 GMT
Expires: Wed, 20 Feb 2002 18:54:46 GMT
Date: Mon, 12 Nov 2001 14:29:48 GMT
Cache-Control: no-cache
Pragma: no-cache
Connection: close
Set-Cookie: PA=wefj2we0-jfjf
<cr><lf>
<data follows>
  • Similar format to requests (i.e., ASCII)

16
What Responses are There?
  • 1XX: Informational (def'd in 1.0, used in 1.1)
  • 100 Continue, 101 Switching Protocols
  • 2XX: Success
  • 200 OK, 206 Partial Content
  • 3XX: Redirection
  • 301 Moved Permanently, 304 Not Modified
  • 4XX: Client error
  • 400 Bad Request, 403 Forbidden, 404 Not Found
  • 5XX: Server error
  • 500 Internal Server Error, 503 Service
    Unavailable, 505 HTTP Version Not Supported

17
What are all these Headers?
Specify capabilities and properties
  • General
  • Connection, Date
  • Request
  • Accept-Encoding, User-Agent
  • Response
  • Location, Server type
  • Entity
  • Content-Encoding, Last-Modified
  • Hop-by-hop
  • Proxy-Authenticate, Transfer-Encoding

Server must pay attention to respond properly.
18
Summary: Introduction to HTTP
  • The major application on the Internet
  • Majority of traffic is HTTP (or HTTP-related)
  • Client/server model
  • Clients make requests, servers respond to them
  • Done mostly in ASCII text (helps debugging!)
  • Various headers and commands
  • Too many to go into detail here
  • We'll focus on common server ones
  • Many web books/tutorials exist (e.g.,
    Krishnamurthy & Rexford 2001)

19
Chapter 2: Outline of a Typical HTTP Transaction
20
Outline of an HTTP Transaction
  • In this section we go over the basics of
    servicing an HTTP GET request from user space
  • For this example, we'll assume a single process
    running in user space, similar to Apache 1.3
  • At each stage, see what the costs/problems can be
  • Also try to think of where costs can be optimized
  • We'll describe relevant socket operations as we
    go

initialize
forever do
  get request
  process
  send response
  log request

server in a nutshell
21
Readying a Server
s = socket();          /* allocate listen socket */
bind(s, 80);           /* bind to TCP port 80 */
listen(s);             /* indicate willingness to accept */
while (1) {
  newconn = accept(s); /* accept new connection */
  ...
}
  • First thing a server does is notify the OS it is
    interested in WWW server requests; these are
    typically on TCP port 80. Other services use
    different ports (e.g., SSL is on 443)
  • Allocates a socket and bind()s it to the address
    (port 80)
  • Server calls listen() on the socket to indicate
    willingness to receive requests
  • Calls accept() to wait for a request to come in
    (and blocks)
  • When accept() returns, we have a new socket
    which represents a new connection to a client
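The socket/bind/listen sequence above can be collected into a small helper. A sketch with abbreviated error handling (`make_listen_socket` is an invented name; passing port 0 lets the kernel choose a free ephemeral port, which is handy for testing):

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Allocate a TCP listen socket bound to the given port.
 * Returns the listening fd, or -1 on error. */
int make_listen_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int s = socket(AF_INET, SOCK_STREAM, 0);   /* allocate listen socket */
    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);               /* e.g., 80 for HTTP */
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(s, 128) < 0) {                  /* willingness to accept */
        close(s);
        return -1;
    }
    return s;  /* caller then loops: newconn = accept(s, ...) */
}
```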

22
Processing a Request
remoteIP = getpeername(newconn);
remoteHost = gethostbyname(remoteIP);
gettimeofday(&currentTime);
read(newconn, reqBuffer, sizeof(reqBuffer));
reqInfo = serverParse(reqBuffer);
  • getpeername() called to get the remote client's address
  • for logging purposes (optional, but done by most)
  • gethostbyname() called to get name of other end
  • again for logging purposes
  • gettimeofday() is called to get time of request
  • both for Date header and for logging
  • read() is called on new socket to retrieve
    request
  • request is determined by parsing the data
  • GET /images/jul4/flag.gif

23
Processing a Request (cont)
fileName = parseOutFileName(requestBuffer);
fileAttr = stat(fileName);
serverCheckFileStuff(fileName, fileAttr);
open(fileName);
  • stat() called to test file path
  • to see if file exists/is accessible
  • may not be there, may only be available to
    certain people
  • "/microsoft/top-secret/plans-for-world-domination.
    html"
  • stat() also used for file meta-data
  • e.g., size of file, last modified time
  • "Have plans changed since last time I checked?
  • might have to stat() multiple files just to get
    to end
  • e.g., 4 stats in bill g example above
  • assuming all is OK, open() called to open the file
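The stat() step might look like this sketch (`file_size_and_mtime` is an invented helper; a real server would map the -1 case to a 404 or 403 response, and compare the mtime against If-Modified-Since for 304 answers):

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Return the file's size via stat(), or -1 if the path
 * doesn't exist or isn't accessible. Optionally reports
 * the last-modified time for Last-Modified / 304 checks. */
long file_size_and_mtime(const char *path, time_t *mtime)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;              /* missing or inaccessible */
    if (mtime)
        *mtime = st.st_mtime;   /* "have plans changed?" */
    return (long)st.st_size;    /* for Content-Length */
}
```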

24
Responding to a Request
read(fileName, fileBuffer);
headerBuffer = serverFigureHeaders(fileName, reqInfo);
write(newSock, headerBuffer);
write(newSock, fileBuffer);
close(newSock);
close(fileName);
write(logFile, requestInfo);
  • read() called to read the file into user space
  • write() is called to send HTTP headers on socket
  • (early servers called write() for each header!)
  • write() is called to write the file on the
    socket
  • close() is called to close the socket
  • close() is called to close the open file
    descriptor
  • write() is called on the log file
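The header-generation step (serverFigureHeaders above) could be sketched like this; `build_response_header` and the fixed header set are illustrative only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill buf with a minimal HTTP/1.0 response header for a
 * document of the given type and length. Returns the header
 * length, so the caller can write() it before the file data. */
int build_response_header(char *buf, size_t buflen,
                          const char *content_type, long content_length)
{
    return snprintf(buf, buflen,
                    "HTTP/1.0 200 OK\r\n"
                    "Content-Type: %s\r\n"
                    "Content-Length: %ld\r\n"
                    "\r\n",              /* blank line ends the headers */
                    content_type, content_length);
}
```

Building the whole header in one buffer (rather than one write() per header line, as early servers did) means a single system call can send it.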

25
Optimizing the Basic Structure
  • As we will see, a great deal of locality exists
    in web requests and web traffic.
  • Much of the work described above doesn't really
    need to be performed each time.
  • Optimizations fall under two categories: caching
    and custom OS primitives.

26
Optimizations Caching
Idea is to exploit locality in client requests.
Many files are requested over and over (e.g.,
index.html).
fileDescriptor = lookInFDCache(fileName);
metaInfo = lookInMetaInfoCache(fileName);
headerBuffer = lookInHTTPHeaderCache(fileName);
  • Why open and close files over and over again?
    Instead, cache open file FDs, manage them LRU.
  • Again, cache HTTP header info on a per-url basis,
    rather than re-generating info over and over.
  • Why stat them again and again? Cache path name
    and access characteristics.
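A toy version of such an FD cache, with a fixed-size table and LRU eviction driven by a use counter (all names and sizes here are invented for illustration; a real server would hash on the file name and size the table to its descriptor limit):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SLOTS 8

struct fd_entry { char name[256]; int fd; unsigned long last_used; };
static struct fd_entry fd_cache[FD_CACHE_SLOTS];
static unsigned long fd_clock;

/* Return a cached fd for name, opening the file (and evicting
 * the least-recently-used entry) on a miss. */
int cached_open(const char *name)
{
    int i, lru = 0;
    for (i = 0; i < FD_CACHE_SLOTS; i++) {
        if (fd_cache[i].fd > 0 && strcmp(fd_cache[i].name, name) == 0) {
            fd_cache[i].last_used = ++fd_clock;  /* hit: touch and reuse */
            return fd_cache[i].fd;
        }
        if (fd_cache[i].last_used < fd_cache[lru].last_used)
            lru = i;                             /* remember LRU slot */
    }
    int fd = open(name, O_RDONLY);               /* miss: open for real */
    if (fd < 0)
        return -1;
    if (fd_cache[lru].fd > 0)
        close(fd_cache[lru].fd);                 /* evict LRU entry */
    strncpy(fd_cache[lru].name, name, sizeof(fd_cache[lru].name) - 1);
    fd_cache[lru].name[sizeof(fd_cache[lru].name) - 1] = '\0';
    fd_cache[lru].fd = fd;
    fd_cache[lru].last_used = ++fd_clock;
    return fd;
}
```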

27
Optimizations Caching (cont)
  • Instead of reading and writing the data, cache
    data, as well as meta-data, in user space

fileData = lookInFileDataCache(fileName);
fileData = lookInMMapCache(fileName);
remoteHostName = lookRemoteHostCache(remoteIP);
  • Even better, mmap() the file so that two copies
    don't exist in both user and kernel space
  • Since we see the same clients over and over,
    cache the reverse name lookups (or better yet,
    don't do resolves at all, log only IP addresses)

28
Optimizations OS Primitives
  • Rather than call accept(), getsockname(), and
    read(), add a new primitive, acceptExtended(),
    which combines the 3 primitives

acceptExtended(listenSock, newSock, readBuffer, remoteInfo);
currentTime = mappedTimePointer;
buffer[0] = firstHTTPHeader;
buffer[1] = secondHTTPHeader;
buffer[2] = fileDataBuffer;
writev(newSock, buffer, 3);
  • Instead of calling gettimeofday(), use a
    memory-mapped counter that is cheap to access (a
    few instructions rather than a system call)
  • Instead of calling write() many times, use
    writev()
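A sketch of the writev() idea: three buffers, one system call (`send_response_v` is an invented name):

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send status line, headers, and file data with one writev()
 * instead of three write() calls. Returns total bytes written. */
ssize_t send_response_v(int sock, const char *status,
                        const char *headers,
                        const char *body, size_t bodylen)
{
    struct iovec iov[3];
    iov[0].iov_base = (void *)status;  iov[0].iov_len = strlen(status);
    iov[1].iov_base = (void *)headers; iov[1].iov_len = strlen(headers);
    iov[2].iov_base = (void *)body;    iov[2].iov_len = bodylen;
    return writev(sock, iov, 3);  /* one syscall for all three pieces */
}
```

Besides saving two user/kernel crossings, the gather write lets TCP pack headers and data into the same segments.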

29
OS Primitives (cont)
  • Rather than calling read() and write(), or write()
    with an mmap()'ed file, use a new primitive
    called sendfile() (or transmitfile()). Bytes
    stay in the kernel.
  • While we're at it, add a header option to
    sendfile() so that we don't have to call write()
    at all.

httpInfo = cacheLookup(reqBuffer);
sendfile(newConn, httpInfo->headers,
         httpInfo->fileDescriptor, OPT_CLOSE_WHEN_DONE);
  • Also add an option to close the connection so
    that we don't have to call close() explicitly.

All this assumes proper OS support. Most have it
these days.
30
An Accelerated Server Example
acceptex(socket, newConn, reqBuffer, remoteHostInfo);
httpInfo = cacheLookup(reqBuffer);
sendfile(newConn, httpInfo->headers,
         httpInfo->fileDescriptor, OPT_CLOSE_WHEN_DONE);
write(logFile, requestInfo);
  • acceptex() is called
  • gets new socket, request, remote host IP address
  • string match in hash table is done to parse
    request
  • hash table entry contains relevant meta-data,
    including modification times, file descriptors,
    permissions, etc.
  • sendfile() is called
  • pre-computed header, file descriptor, and close
    option
  • log written back asynchronously (buffered
    write()).

That's it!
31
Complications
  • Much of this assumes sharing is easy
  • but, this is dependent on the server
    architectural model
  • if multiple processes are being used, as in
    Apache, it is difficult to share data structures.
  • Take, for example, mmap()
  • mmap() maps a file into the address space of a
    process.
  • a file mmap'ed in one address space can't be
    re-used for a request for the same file served by
    another process.
  • Apache 1.3 does use mmap() instead of read().
  • in this case, mmap() eliminates one data copy
    versus a separate read() and write() combination,
    but the process will still need to open() and
    close() the file.

32
Complications (cont)
  • Similarly, meta-data info needs to be shared
  • e.g., file size, access permissions, last
    modified time, etc.
  • While locality is high, cache misses can and do
    happen sometimes
  • if previously unseen file requested, process can
    block waiting for disk.
  • OS can impose other restrictions
  • e.g., limits on number of open file descriptors.
  • e.g., sockets typically allow buffering about 64
    KB of data. If a process tries to write() a 1 MB
    file, it will block until other end receives the
    data.
  • Need to be able to cope with the misses without
    slowing down the hits

33
Summary: Outline of a Typical HTTP Transaction
  • A server can perform many steps in the process of
    servicing a request
  • Different actions depending on many factors
  • e.g., 304 not modified if client's cached copy is
    good
  • e.g., 404 not found, 401 unauthorized
  • Most requests are for a small subset of data
  • we'll see more about this in the Workload section
  • we can leverage that fact for performance
  • Architectural model affects possible
    optimizations
  • we'll go into this in more detail in the next
    section

34
Chapter 3: Server Architectural Models
35
Server Architectural Models
  • Several approaches to server structure
  • Process-based: Apache, NCSA
  • Thread-based: JAWS, IIS
  • Event-based: Flash, Zeus
  • Kernel-based: Tux, AFPA, ExoKernel
  • We will describe the advantages and disadvantages
    of each.
  • Fundamental tradeoffs exist between performance,
    protection, sharing, robustness, extensibility,
    etc.

36
Process Model (ex: Apache)
  • Process created to handle each new request
  • Process can block on appropriate actions,
  • (e.g., socket read, file read, socket write)
  • Concurrency handled via multiple processes
  • Quickly becomes unwieldy
  • Process creation is expensive.
  • Instead, pre-forked pool is created.
  • Upper limit on # of processes is enforced
  • First by the server, eventually by the operating
    system.
  • Concurrency is limited by upper bound

37
Process Model Pros and Cons
  • Advantages
  • Most importantly, consistent with programmer's
    way of thinking. Most programmers think in terms
    of a linear series of steps to accomplish a task.
  • Processes are protected from one another; can't
    nuke data in some other address space.
    Similarly, if one crashes, others are unaffected.
  • Disadvantages
  • Slow. Forking is expensive; allocating stack and
    VM data structures for each process adds up and
    puts pressure on the memory system.
  • Difficulty in sharing info across processes.
  • Have to use locking.
  • No control over scheduling decisions.

38
Thread Model (Ex: JAWS)
  • Use threads instead of processes. Threads
    consume fewer resources than processes (e.g.,
    stack, VM allocation).
  • Forking and deleting threads is cheaper than
    processes.
  • Similarly, pre-forked thread pool is created.
    May be limits to numbers but hopefully less of an
    issue than with processes since fewer resources
    required.

39
Thread Model Pros and Cons
  • Advantages
  • Faster than processes. Creating/destroying
    cheaper.
  • Maintains programmer's way of thinking.
  • Sharing is enabled by default.
  • Disadvantages
  • Less robust. Threads not protected from each
    other.
  • Requires proper OS support, otherwise, if one
    thread blocks on a file read, will block all the
    address space.
  • Can still run out of threads if servicing many
    clients concurrently.
  • Can exhaust certain per-process limits not
    encountered with processes (e.g., number of open
    file descriptors).
  • Limited or no control over scheduling decisions.

40
Event Model (Ex: Flash)
while (1) {
  accept new connections until none remaining
  call select() on all active file descriptors
  for each FD:
    if (fd ready for reading) call read()
    if (fd ready for writing) call write()
}
  • Use a single process and deal with requests in an
    event-driven manner, like a giant switchboard.
  • Use non-blocking option (O_NDELAY) on sockets, do
    everything asynchronously, never block on
    anything, and have OS notify us when something is
    ready.
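The readiness check at the heart of that loop can be isolated into a small helper around select() (`fd_readable` is an invented name; a real server would pass its whole interest set rather than a single descriptor):

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return 1 if fd is ready for reading within timeout_ms,
 * 0 on timeout, -1 on error -- one cell of the switchboard. */
int fd_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);              /* interest set: just this fd */
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

Because the fd is non-blocking, a subsequent read() returns immediately either way; select() just tells the loop where work is waiting.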

41
Event-Driven Pros and Cons
  • Advantages
  • Very fast.
  • Sharing is inherent, since there's only one
    process.
  • Don't even need locks as in thread models.
  • Can maximize concurrency in request stream
    easily.
  • No context-switch costs or extra memory
    consumption.
  • Complete control over scheduling decisions.
  • Disadvantages
  • Less robust. Failure can halt whole server.
  • Pushes per-process resource limits (like file
    descriptors).
  • Not every OS has full asynchronous I/O, so can
    still block on a file read. Flash uses helper
    processes to deal with this (AMPED architecture).

42
In-Kernel Model (Ex: Tux)
[Diagram: a user-space server sits above the user/kernel boundary;
a kernel-space server runs below it]
  • Dedicated kernel thread for HTTP requests
  • One option put whole server in kernel.
  • More likely, just deal with static GET requests
    in kernel to capture majority of requests.
  • Punt dynamic requests to full-scale server in
    user space, such as Apache.

43
In-Kernel Model Pros and Cons
  • In-kernel event model
  • Avoids transitions to user space, copies across
    u-k boundary, etc.
  • Leverages already existing asynchronous
    primitives in the kernel (kernel doesn't block on
    a file read, etc.)
  • Advantages
  • Extremely fast. Tight integration with kernel.
  • Small component without full server optimizes
    common case.
  • Disadvantages
  • Less robust. Bugs can crash whole machine, not
    just server.
  • Harder to debug and extend, since kernel
    programming required, which is not as well-known
    as sockets.
  • Similarly, harder to deploy. APIs are
    OS-specific (Linux, BSD, NT), whereas sockets and
    threads are (mostly) standardized.
  • HTTP evolving over time, have to modify kernel
    code in response.

44
So What's the Performance?
  • Graph shows server throughput for Tux, Flash, and
    Apache.
  • Experiments done on 400 MHz P/II, gigabit
    Ethernet, Linux 2.4.9-ac10, 8 client machines,
    WaspClient workload generator
  • Tux is fastest, but Flash close behind

45
Summary: Server Architectures
  • Many ways to code up a server
  • Tradeoffs in speed, safety, robustness, ease of
    programming and extensibility, etc.
  • Multiple servers exist for each kind of model
  • Not clear that a consensus exists.
  • Better case for in-kernel servers as devices
  • e.g. reverse proxy accelerator, Akamai CDN node
  • User-space servers have a role
  • OS should provide proper primitives for
    efficiency
  • Leave HTTP-protocol related actions in user-space
  • In this case, event-driven model is attractive
  • Key pieces to a fast event-driven server
  • Minimize copying
  • Efficient event notification mechanism

46
Chapter 4: Event Notification
47
Event Notification Mechanisms
  • Recall how Flash works
  • One process, many FD's, calling select() on all
    active socket descriptors.
  • All sockets are set using O_NDELAY flag
    (non-blocking)
  • Single address space aids sharing for performance
  • File reads and writes don't have non-blocking
    support, thus helper processes (AMPED
    architecture)
  • Point is to exploit concurrency/parallelism
  • Can read one socket while waiting to write on
    another
  • Event notification
  • Mechanism for kernel and application to notify
    each other of interesting/important events
  • E.g., connection arrivals, socket closes, data
    available to read, space available for writing

48
State-Based: Select and Poll
  • select() and poll()
  • State-based: "Is socket ready for reading/writing?"
  • select() interface has FD_SET bitmasks turned
    on/off based on interest
  • poll() is simple array, larger structure but
    simpler implementation
  • Performance costs
  • Kernel scans O(N) descriptors to set bits
  • User application scans O(N) descriptors
  • select() bit manipulation can be expensive
  • Problems
  • Traffic is bursty, connections not active all at
    once
  • (# active connections) << (# open connections)
  • Costs are O(total connections), not O(active
    connections)
  • Application keeps specifying interest set
    repeatedly

49
Event-Based Notification
Banga, Mogul & Druschel (USENIX '99)
  • Propose an event based approach, rather than
    state-based
  • "Something just happened on socket X", rather than
    "socket X is ready for reading or writing"
  • Server takes event as indication socket might be
    ready
  • Multiple events can happen on a single socket
    (e.g., packets draining (implying writeable) or
    accumulating (readable))
  • API has following
  • Application notifies kernel by calling
    declare_interest() once per file descriptor
    (e.g., after accept()), rather than multiple
    times like in select()/poll()
  • Kernel queues events internally
  • Application calls get_next_event() to see changes

50
Event-Based Notification (cont)
  • Problems
  • Kernel has to allocate storage for event queue.
    Little's law says it needs to be proportional to
    the event rate
  • Bursty applications could overflow queue
  • Can address multiple events by coalescing based
    on FD
  • Results in storage O(total connections).
  • Application has to change the way it thinks
  • Respond to events, instead of checking state.
  • If events are missed, connections might get
    stuck.
  • Evaluation shows it scales nicely
  • cost is O(active) not O(total)
  • Windows NT has something similar
  • called I/O completion ports

51
Notification in the Real World
  • POSIX Real-Time Signals
  • Different concept: Unix signals are invoked when
    something is ready on a file descriptor.
  • Signals are expensive and difficult to control
    (e.g., no ordering), so applications can suppress
    signals and then retrieve them via sigwaitinfo()
  • If signal queue fills up, events will be dropped.
    A separate signal is raised to notify application
    about signal queue overflow.
  • Problems
  • If signal queue overflows, then app must fall
    back on state-based approach. Chandra and
    Mosberger propose signal-per-fd (coalescing
    events per file descriptor).
  • Only one event is retrieved at a time; Provos and
    Lever propose sigtimedwait4() to retrieve
    multiple signals at once

52
Notification in the Real World
  • Sun's /dev/poll
  • App notifies kernel by writing to special file
    /dev/poll to express interest
  • App does IOCTL on /dev/poll for list of ready
    FD's
  • App and kernel are still both state based
  • Kernel still pays O(total connections) to create
    FD list
  • Libenzi's /dev/epoll (patch for Linux 2.4)
  • Uses /dev/epoll as interface, rather than
    /dev/poll
  • Application writes interest to /dev/epoll and
    IOCTL's to get events
  • Events are coalesced on a per-FD basis
  • Semantically identical to RT signals with
    sig-per-fd + sigtimedwait4().

53
Real File Asynchronous I/O
  • Like setting O_NDELAY (non-blocking) on file
    descriptors
  • Application can queue reads and writes on FDs and
    pick them up later (like dry cleaning)
  • Requires support in the file system (e.g.,
    callbacks)
  • Currently doesn't exist on many OS's
  • POSIX specification exists
  • Solaris has non-standard version
  • Linux has it slated for 2.5 kernel
  • Two current candidates on Linux
  • SGI's /dev/kaio and Ben LaHaise's /dev/aio
  • Proper implementation would allow Flash to
    eliminate helpers

54
Summary: Event Notification
  • Goal is to exploit concurrency
  • Concurrency in user workloads means host CPU can
    overlap multiple events to maximize parallelism
  • Keep network and disk busy; never block
  • Event notification changes applications
  • state-based to event-based
  • requires a change in thinking
  • Goal is to minimize costs
  • user/kernel crossings and testing idle socket
    descriptors
  • Event-based notification not yet fully deployed
  • Most mechanisms only support network I/O, not
    file I/O
  • Full deployment of Asynchronous I/O spec should
    fix this

55
Chapter 5: Workload Characterization
56
Workload Characterization
  • Why Characterize Workloads?
  • Gives an idea about traffic behavior
  • ("Which documents are users interested in?")
  • Aids in capacity planning
  • ("Is the number of clients increasing over
    time?")
  • Aids in implementation
  • ("Does caching help?")
  • How do we capture them ?
  • Through server logs (typically enabled)
  • Through packet traces (harder to obtain and to
    process)

57
Factors to Consider
client?
proxy?
server?
  • Where do I get logs from?
  • Client logs give us an idea, but not necessarily
    the same
  • Same for proxy logs
  • What we care about is the workload at the server
  • Is trace representative?
  • Corporate POP vs. News vs. Shopping site
  • What kind of time resolution?
  • e.g., second, millisecond, microsecond
  • Does trace/log capture all the traffic?
  • e.g., incoming link only, or one node out of a
    cluster

58
Probability Refresher
  • Lots of variability in workloads
  • Use probability distributions to express
  • Want to consider many factors
  • Some terminology/jargon
  • Mean: average of samples
  • Median: half are bigger, half are smaller
  • Percentiles: dump samples into N bins
  • (median is 50th percentile number)
  • Heavy-tailed:
  • P[X > x] ~ x^(-a) as x -> infinity

59
Important Distributions
  • Some Frequently-Seen Distributions
  • Normal
  • (mean mu, variance sigma^2)
  • Lognormal
  • (x > 0; sigma > 0)
  • Exponential
  • (x > 0)
  • Pareto
  • (x > k; shape a, scale k)
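The density formulas on this slide did not survive transcription; the standard forms, for reference:

```latex
\begin{align*}
\text{Normal:}      \quad & f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
                            e^{-(x-\mu)^2 / 2\sigma^2} \\
\text{Lognormal:}   \quad & f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,
                            e^{-(\ln x-\mu)^2 / 2\sigma^2}, \quad x > 0 \\
\text{Exponential:} \quad & f(x) = \lambda e^{-\lambda x}, \quad x > 0 \\
\text{Pareto:}      \quad & f(x) = a k^{a} x^{-(a+1)}, \quad x \ge k
\end{align*}
```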

60
More Probability
  • Graph shows 3 distributions with average 2.
  • Note average != median in all cases!
  • Different distributions have different "weight"
    in tail.

61
What Info is Useful?
  • Request methods
  • GET, POST, HEAD, etc.
  • Response codes
  • success, failure, not-modified, etc.
  • Size of requested files
  • Size of transferred objects
  • Popularity of requested files
  • Numbers of embedded objects
  • Inter-arrival time between requests
  • Protocol support (1.0 vs. 1.1)

62
Sample Logs for Illustration
Name         Chess 1997           Olympics 1998         IBM 1998            IBM 2001
Description  Kasparov-Deep Blue   Nagano 1998 Olympics  Corporate Presence  Corporate Presence
             Event Site           Event Site
Period       2 weeks in May 1997  2 days in Feb 1998    1 day in June 1998  1 day in Feb 2001
Hits         1,586,667            5,800,000             11,485,600          12,445,739
Bytes        14,171,711           10,515,507            54,697,108          28,804,852
Clients      256,382              80,921                86,0211             319,698
URLs         2,293                30,465                15,788              42,874

We'll use statistics generated from these logs as
examples.
63
Request Methods
         Chess 1997  Olympics 1998  IBM 1998  IBM 2001
GET      96%         99.6%          99.3%     97%
HEAD     04%         00.3%          00.08%    02%
POST     00.007%     00.04%         00.02%    00.2%
Others   noise       noise          noise     noise

  • KR01: "overwhelming majority" are GETs, few POSTs
  • IBM 2001 trace starts seeing a few 1.1 methods
    (CONNECT, OPTIONS, LINK), but still very small
    (1 in 10^5)

64
Response Codes
Code  Meaning            Chess 1997  Olympics 1998  IBM 1998  IBM 2001
200   OK                 85.32       76.02          75.28     67.72
204   NO_CONTENT         --.--       --.--          00.00001  --.--
206   PARTIAL_CONTENT    00.25       --.--          --.--     --.--
301   MOVED_PERMANENTLY  00.05       --.--          --.--     --.--
302   MOVED_TEMPORARILY  00.05       00.05          01.18     15.11
304   NOT_MODIFIED       13.73       23.24          22.84     16.26
400   BAD_REQUEST        00.001      00.0001        00.003    00.001
401   UNAUTHORIZED       --.--       00.001         00.0001   00.001
403   FORBIDDEN          00.01       00.02          00.01     00.009
404   NOT_FOUND          00.55       00.64          00.65     00.79
407   PROXY_AUTH         --.--       --.--          --.--     00.002
500   SERVER_ERROR       --.--       00.003         00.006    00.07
501   NOT_IMPLEMENTED    --.--       00.0001        00.0005   00.006
503   SERVICE_UNAVAIL    --.--       --.--          00.0001   00.0003
???   UNKNOWN            00.0003     00.00004       00.005    00.0004
  • Table shows percentage of responses.
  • Majority are OK and NOT_MODIFIED.
  • Consistent with numbers from AW96, KR01.

65
Resource (File) Sizes
  • Shows file/memory usage (not weighted by
    frequency!)
  • Lognormal body, consistent with results from
    AW96, CB96, KR01.
  • AW96, CB96: sizes have Pareto tail; Downey01:
    sizes are lognormal.

66
Tails from the File Size
  • Shows the complementary CDF (CCDF) of file sizes.
  • Haven't done the curve fitting, but looks
    Pareto-ish.

67
Response (Transfer) Sizes
  • Shows network usage (weighted by frequency of
    requests)
  • Lognormal body, Pareto tail, consistent with
    CBC95, AW96, CB96, KR01

68
Tails of Transfer Size
  • Shows the complementary CDF (CCDF) of transfer
    sizes.
  • Looks somewhat Pareto-like; certainly some big
    transfers.

69
Resource Popularity
  • Follows a Zipf model: p(r) ~ r^(-alpha)
  • (alpha = 1: true Zipf; others "Zipf-like")
  • Consistent with CBC95, AW96, CB96, PQ00, KR01
  • Shows that caching popular documents is very
    effective
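A quick numeric illustration of why caching works under Zipf (taking alpha = 1): the fraction of requests covered by the k hottest of n documents is H(k)/H(n), which grows only logarithmically in k (`zipf_cache_coverage` is invented for this sketch):

```c
#include <assert.h>

/* Fraction of requests covered by caching the top k of n
 * documents under Zipf popularity with alpha = 1: H(k)/H(n),
 * where H(m) is the m-th harmonic number. */
double zipf_cache_coverage(int k, int n)
{
    double hk = 0.0, hn = 0.0;
    int r;
    for (r = 1; r <= n; r++) {
        if (r <= k)
            hk += 1.0 / r;   /* requests to the k hottest documents */
        hn += 1.0 / r;       /* all requests */
    }
    return hk / hn;
}
```

For example, caching just 10% of 10,000 documents covers roughly three-quarters of all requests.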

70
Number of Embedded Objects
  • Mah97: avg 3, 90% are 5 or less
  • BC98: Pareto distr., median 0.8, mean 1.7
  • Arlitt98 World Cup study: median 15 objects, 90%
    are 20 or less
  • MW00: median 7-17, mean 11-18, 90% 40 or less
  • STA00: median 5, 30 (2 traces), 90% 50 or less
  • Mah97, BC98, SCJO01: embedded objects tend to be
    smaller than container objects
  • KR01: median is 8-20, Pareto distribution

Trend seems to be that number is increasing over
time.
71
Session Inter-Arrivals
  • Inter-arrival time between successive requests
  • "Think time"
  • Difference between user requests vs. ALL requests
  • Partly depends on definition of session boundary
  • CB96: variability across multiple timescales,
    "self-similarity"; average load very different
    from peak or heavy load
  • SCJO01: log-normal, 90% less than 1 minute.
  • AW96: independent and exponentially distributed
  • KR01: Pareto with alpha = 1.5; session arrivals
    follow a Poisson distribution, but requests
    follow a Pareto

72
Protocol Support
  • IBM.com 2001 logs
  • Show roughly 53% of client requests are 1.1
  • KA01 study:
  • 92% of servers claim to support 1.1 (as of Sep
    00)
  • Only 31% actually do; most fail to comply with
    spec
  • SCJO01 show:
  • Avg 6.5 requests per persistent connection
  • 65% have 2 connections per page, rest more.
  • 40-50% of objects downloaded by persistent
    connections

Appears that we are in the middle of a slow
transition to 1.1
73
Summary Workload Characterization
  • Traffic is variable
  • Responses vary across multiple orders of
    magnitude
  • Traffic is bursty
  • Peak loads much larger than average loads
  • Certain files more popular than others
  • Zipf-like distribution captures this well
  • Two-sided aspect of transfers
  • Most responses are small (zero pretty common)
  • Most of the bytes are from large transfers
  • Controversy over Pareto/log-normal distribution
  • Non-trivial for workload generators to replicate

74
Chapter 6 Workload Generators
75
Why Workload Generators?
  • Allows stress-testing and bug-finding
  • Gives us some idea of server capacity
  • Allows us a scientific process to compare
    approaches
  • e.g., server models, gigabit adaptors, OS
    implementations
  • Assumption is that difference in testbed
    translates to some difference in real-world
  • Allows the performance debugging cycle

The Performance Debugging Cycle: Find Problem → Reproduce → Measure → Fix and/or Improve → (repeat)
76
Problems with Workload Generators
  • Only as good as our understanding of the traffic
  • Traffic may change over time
  • generators must too
  • May not be representative
  • e.g., are file size distributions from IBM.com
    similar to mine?
  • May be ignoring important factors
  • e.g., browser behavior, WAN conditions, modem
    connectivity
  • Still, useful for diagnosing and treating
    problems

77
How does W. Generation Work?
  • Many clients, one server
  • match asymmetry of Internet
  • Server is populated with some kind of synthetic
    content
  • Simulated clients produce requests for server
  • Master process to control clients, aggregate
    results
  • Goal is to measure server
  • not the client or network
  • Must be robust to conditions
  • e.g., if server keeps sending 404 not found, will
    clients notice?

78
Evolution WebStone
  • The original workload generator from SGI in 1995
  • Process based workload generator, implemented in
    C
  • Clients talk to master via sockets
  • Configurable: number of client machines, client
    processes, run time
  • Measured several metrics: avg/max connect time,
    response time, throughput rate (bits/sec),
    pages, files
  • 1.0 only does GETs; CGI support added in 2.0
  • Static requests, 5 different file sizes:

Percentage Size
35.00 500 B
50.00 5 KB
14.00 50 KB
0.90 500 KB
0.10 5 MB
www.mindcraft.com/webstone
79
Evolution SPECWeb96
  • Developed by SPEC
  • Systems Performance Evaluation Consortium
  • Non-profit group with many benchmarks (CPU, FS)
  • Attempt to get more representative
  • Based on logs from NCSA, HP, Hal Computers
  • 4 classes of files
  • Poisson distribution between each class

Percentage Size
35.00 0-1 KB
50.00 1-10 KB
14.00 10-100 KB
1.00 100 KB - 1 MB
80
SPECWeb96 (cont)
  • Notion of scaling versus load:
  • number of directories in data set doubles as
    expected throughput quadruples (sqrt(throughput/5)
    × 10)
  • requests spread evenly across all application
    directories
  • Process based WG
  • Clients talk to master via RPC's (less robust)
  • Still only does GETs, no keep-alive
  • www.spec.org/osg/web96

81
Evolution SURGE
  • Scalable URL Reference GEnerator
  • Barford Crovella at Boston University CS Dept.
  • Much more worried about representativeness,
    captures
  • server file size distributions,
  • request size distribution,
  • relative file popularity
  • embedded file references
  • temporal locality of reference
  • idle periods ("think times") of users
  • Process/thread based WG

82
SURGE (cont)
  • Notion of user-equivalent
  • statistical model of a user
  • active off time (between URLS),
  • inactive off time (between pages)
  • Captures various levels of burstiness
  • Not validated against real traffic, but shows
    that the load generated differs from SpecWeb96
    and has more burstiness in terms of CPU and
    active connections
  • www.cs.wisc.edu/pb

83
Evolution S-client
  • Almost all workload generators are closed-loop
  • client submits a request, waits for server, maybe
    thinks for some time, repeat as necessary
  • Problem with the closed-loop approach
  • client can't generate requests faster than the
    server can respond
  • limits the generated load to the capacity of the
    server
  • in the real world, arrivals don't depend on
    server state
  • i.e., real users have no idea about load on the
    server when they click on a site, although
    successive clicks may have this property
  • in particular, can't overload the server
  • s-client tries to be open-loop
  • by generating connections at a particular rate
  • independent of server load/capacity

84
S-Client (cont)
  • How is s-client open-loop?
  • connecting asynchronously at a particular rate
  • using non-blocking connect() socket call
  • Connect complete within a particular time?
  • if yes, continue normally.
  • if not, socket is closed and new connect
    initiated.
  • Other details
  • uses single-address space event-driven model like
    Flash
  • calls select() on large numbers of file
    descriptors
  • can generate large loads
  • Problems
  • client capacity is still limited by active FD's
  • arrival is a TCP connect, not an HTTP request
  • www.cs.rice.edu/CS/Systems/Web-measurement

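The non-blocking connect trick can be sketched as below. This is our own illustrative code, not the actual s-client source: start the connect, select() on writability with a deadline, and give up (close and start a new attempt) if the handshake has not completed in time, so the offered connection rate stays independent of server speed.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* S-client-style connection attempt: non-blocking connect() with a
 * deadline.  Returns a connected fd, or -1 (socket closed) if the
 * connect did not complete within timeout_ms. */
int timed_connect(const struct sockaddr *addr, socklen_t len, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, O_NONBLOCK);

    if (connect(fd, addr, len) < 0 && errno != EINPROGRESS) {
        close(fd);                  /* immediate failure (e.g., refused) */
        return -1;
    }

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    /* Writable means the handshake finished; SO_ERROR tells us how. */
    int err = 1;
    socklen_t elen = sizeof(err);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) == 1 &&
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) == 0 && err == 0)
        return fd;

    close(fd);                      /* too slow: abandon, as s-client does */
    return -1;
}
```

A caller generates load by invoking this at a fixed rate, counting any -1 as a failed arrival rather than waiting on the server.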
85
Evolution SPECWeb99
  • In response to people "gaming" benchmark, now
    includes rules
  • IP maximum segment lifetime (MSL) must be at
    least 60 seconds (more on this later!)
  • Link-layer maximum transmission unit (MTU) must
    not be larger than 1460 bytes (Ethernet frame
    size)
  • Dynamic content may not be cached
  • not clear that this is followed
  • Servers must log requests.
  • W3C common log format is sufficient but not
    mandatory.
  • Resulting workload must be within 10% of target.
  • Error rate must be below 1%.
  • Metric has changed:
  • now "number of simultaneous conforming
    connections"; the rate of a connection must be
    greater than 320 Kbps

86
SPECWeb99 (cont)
  • Directory size has changed
  • (25 (400000/122000) simultaneous conns) /
    5.0)
  • Improved HTTP 1.0/1.1 support
  • Keep-alive requests (client closes after N
    requests)
  • Cookies
  • Back-end notion of user demographics
  • Used for ad rotation
  • Request includes user_id and last_ad
  • Request breakdown
  • 70.00 static GET
  • 12.45 dynamic GET
  • 12.60 dynamic GET with custom ad rotation
  • 04.80 dynamic POST
  • 00.15 dynamic GET calling CGI code

87
SPECWeb99 (cont)
  • Other breakdowns
  • 30% HTTP 1.0 with no keep-alive or persistence
  • 70% HTTP 1.0 with keep-alive to "model"
    persistence
  • still has 4 classes of file size with Poisson
    distribution
  • supports Zipf popularity
  • Client implementation details
  • Master-client communication now uses sockets
  • Code includes sample Perl code for CGI
  • Client configurable to use threads or processes
  • Much more info on setup, debugging, tuning
  • All results posted to web page,
  • including configuration back end code
  • www.spec.org/osg/web99

88
So how realistic is SPECWeb99?
  • Well compare a few characteristics
  • File size distribution (body)
  • File size distribution (tail)
  • Transfer size distribution (body)
  • Transfer size distribution (tail)
  • Document popularity
  • Visual comparison only
  • No curve-fitting, r-squared plots, etc.
  • Point is to give a feel for accuracy

89
SpecWeb99 vs. File Sizes
  • SpecWeb99: in the ballpark, but not very smooth

90
SpecWeb99 vs. File Size Tail
  • SpecWeb99 tail isn't as long as real logs (900 KB
    max)

91
SpecWeb99 vs.Transfer Sizes
  • Doesn't capture 304 (not modified) responses
  • Coarser distribution than real logs (i.e., not
    smooth)

92
Spec99 vs.Transfer Size Tails
  • SpecWeb99 does OK, although tail drops off
    rapidly (and in fact, no file is greater than 1
    MB in SpecWeb99!).

93
Spec99 vs. Resource Popularity
  • SpecWeb99 seems to do a good job, although the
    tail isn't long enough

94
Evolution TPC-W
  • From the Transaction Processing Performance
    Council (TPC)
  • More known for database workloads like TPC-D
  • Metrics include dollars/transaction (unlike SPEC)
  • Provides specification, not source
  • Meant to capture a large e-commerce site
  • Models online bookstore
  • web serving, searching, browsing, shopping carts
  • online transaction processing (OLTP)
  • decision support (DSS)
  • secure purchasing (SSL), best sellers, new
    products
  • customer registration, administrative updates
  • Has notion of scaling per user
  • 5 MB of DB tables per user
  • 1 KB per shopping item, 25 KB per item in static
    images

95
TPC-W (cont)
  • Remote browser emulator (RBE)
  • emulates a single user
  • send HTTP request, parse, wait for thinking,
    repeat
  • Metrics
  • WIPS: shopping
  • WIPSb: browsing
  • WIPSo: ordering
  • Setups tend to be very large
  • multiple image servers, application servers, load
    balancer
  • DB back end (typically SMP)
  • Example: IBM 12-way SMP w/DB2, 9 PCs w/IIS
    (~$1M)
  • www.tpc.org/tpcw

96
Summary Workload Generators
  • Only the beginning. Many other workload
    generators
  • httperf from HP
  • WAGON from IBM
  • WaspClient from IBM
  • Others?
  • Both workloads and generators change over time
  • Both started simple, got more complex
  • As workload changes, so must generators
  • No one single "good" generator
  • SpecWeb99 seems the favorite (2002 rumored in the
    works)
  • Implementation issues similar to servers
  • They are networked-based request producers
  • (i.e., produce GET's instead of 200 OK's).
  • Implementation affects capacity planning of
    clients!
  • (want to make sure clients are not the
    bottleneck)

97
Chapter 7 Introduction to TCP
98
Introduction to TCP
  • Layering is a common principle in network
    protocol design
  • TCP is the major transport protocol in the
    Internet
  • Since HTTP runs on top of TCP, much interaction
    between the two
  • Asymmetry in client-server model puts strain on
    server-side TCP implementations
  • Thus, major issue in web servers is TCP
    implementation and behavior

99
The TCP Protocol
  • Connection-oriented, point-to-point protocol
  • Connection establishment and teardown phases
  • Phone-like circuit abstraction
  • One sender, one receiver
  • Originally optimized for certain kinds of
    transfer
  • Telnet (interactive remote login)
  • FTP (long, slow transfers)
  • Web is like neither of these
  • Lots of work on TCP, beyond scope of this
    tutorial
  • e.g., know of 3 separate TCP tutorials!

100
TCP Protocol (cont)
  • Provides a reliable, in-order, byte stream
    abstraction
  • Recover lost packets and detect/drop duplicates
  • Detect and drop bad packets
  • Preserve order in byte stream, no message
    boundaries
  • Full-duplex bi-directional data flow in same
    connection
  • Flow and congestion controlled
  • Flow control sender will not overwhelm receiver
  • Congestion control sender will not overwhelm
    network!
  • Send and receive buffers
  • Congestion and flow control windows

101
The TCP Header
  • Fields enable the following
  • Uniquely identifying a connection
  • (4-tuple of client/server IP address and port
    numbers)
  • Identifying a byte range within that connection
  • Checksum value to detect corruption
  • Identifying protocol transitions (SYN, FIN)
  • Informing other side of your state (ACK)

102
Establishing a TCP Connection
  • Client sends SYN with initial sequence number
    (ISN)
  • Server responds with its own SYN w/seq number and
    ACK of client (ISN+1) (next expected byte)
  • Client ACKs server's ISN+1
  • The 3-way handshake
  • All modulo 32-bit arithmetic

client
server
connect()
listen() port 80
SYN (X)
SYN (Y) ACK (X+1)
ACK (Y+1)
accept()
read()
103
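The socket calls in the diagram map onto the handshake as sketched below. This is our own minimal loopback demo, not code from the tutorial: the kernel performs the SYN / SYN+ACK / ACK exchange inside connect(), and completed connections are handed out by accept().

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Minimal loopback version of the handshake diagram: server side
 * (socket/bind/listen/accept) and client side (socket/connect) in
 * one process.  Returns 0 if the handshake completed. */
int handshake_demo(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    int lfd = socket(AF_INET, SOCK_STREAM, 0);        /* server socket */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);     /* port 0: kernel picks */
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 8) < 0 ||                         /* enter LISTEN state */
        getsockname(lfd, (struct sockaddr *)&sin, &len) < 0)
        return -1;

    int cfd = socket(AF_INET, SOCK_STREAM, 0);        /* client socket */
    if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        return -1;                                    /* SYN / SYN+ACK / ACK */
    int sfd = accept(lfd, NULL, NULL);                /* ESTABLISHED conn */

    close(cfd);
    close(sfd);
    close(lfd);
    return sfd >= 0 ? 0 : -1;
}
```

Note that connect() returns once the handshake finishes, even before the server calls accept(): the kernel queues the established connection on the listen backlog.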
Sending Data
  • Sender puts data on the wire
  • Holds copy in case of loss
  • Sender must observe the receiver's flow control window
  • Sender can discard data when ACK is received
  • Receiver sends acknowledgments (ACKs)
  • ACKs can be piggybacked on data going the other
    way
  • Protocol says receiver should ACK every other
    packet in attempt to reduce ACK traffic (delayed
    ACKs)
  • Delay should not be more than 500 ms. (typically
    200)
  • We'll see how this causes problems later

104
Preventing Congestion
  • Sender may not only overrun receiver, but may
    also overrun intermediate routers
  • No way to explicitly know router buffer
    occupancy,
  • so we need to infer it from packet losses
  • Assumption is that losses stem from congestion,
    namely, that intermediate routers have no
    available buffers
  • Sender maintains a congestion window (CW)
  • Never have more than CW of un-acknowledged data
    outstanding (more precisely, the min of CW and
    RWIN)
  • Successive ACKs from receiver cause CW to grow.
  • How CW grows depends on which of 2 phases we're in:
  • Slow-start: initial state.
  • Congestion avoidance: steady-state.
  • Switch between the two when CW > slow-start
    threshold

105
Congestion Control Principles
  • Lack of congestion control would lead to
    congestion collapse (Jacobson 88).
  • Idea is to be a good network citizen.
  • Would like to transmit as fast as possible
    without loss.
  • Probe network to find available bandwidth.
  • In steady-state linear increase in CW per RTT.
  • After loss event CW is halved.
  • This is called additive increase /multiplicative
    decrease (AIMD).
  • Various papers on why AIMD leads to network
    stability.

106
Slow Start
  • Initial CW = 1.
  • After each ACK, CW += 1
  • Continue until:
  • Loss occurs, OR
  • CW > slow start threshold
  • Then switch to congestion avoidance
  • If we detect loss, cut CW in half
  • Exponential increase in window size per RTT

sender
receiver
one segment
RTT
two segments
four segments
107
Congestion Avoidance
Until (loss):
    after CW packets ACKed: CW += 1
On loss:
    ssthresh = CW / 2
    depending on loss type:
        SACK/Fast Retransmit: CW = CW / 2, continue
        coarse-grained timeout: CW = 1, go to slow start
(This is for TCP Reno/SACK; TCP Tahoe always sets CW = 1 after a loss)
108
How are losses recovered?
  • Say packet is lost (data or ACK!)
  • Coarse-grained Timeout
  • Sender does not receive ACK after some period of
    time
  • Event is called a retransmission time-out (RTO)
  • RTO value is based on estimated round-trip time
    (RTT)
  • RTT is adjusted over time using an exponentially
    weighted moving average:
  • RTT = (1 - x) × RTT + x × sample
  • (x is typically 0.1)
  • First done in TCP Tahoe

sender
receiver
Seq=92, 8 bytes data
ACK=100
timeout
X
loss
Seq=92, 8 bytes data (retransmission)
ACK=100
lost ACK scenario
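The smoothed RTT estimate above is a one-line exponentially weighted moving average, sketched here in our own code (real stacks additionally track RTT variance for the RTO, per Jacobson):

```c
/* One EWMA step: srtt = (1 - x) * srtt + x * sample, x typically 0.1. */
double srtt_update(double srtt, double sample, double x)
{
    return (1.0 - x) * srtt + x * sample;
}
```

Feeding a single 200 ms sample into a 100 ms estimate moves it only to 110 ms, so one delayed ACK does not wildly inflate the retransmission timer.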
109
Fast Retransmit
  • Receiver expects N, gets N1
  • Immediately sends ACK(N)
  • This is called a duplicate ACK
  • Does NOT delay ACKs here!
  • Continue sending dup ACKs for each subsequent
    packet (not N)
  • Sender gets 3 duplicate ACKs
  • Infers N is lost and resends
  • 3 chosen so out-of-order packets don't trigger
    Fast Retransmit accidentally
  • Called "fast" since we don't need to wait for a
    full RTT

sender
receiver
ACK 3000
SEQ=3000, size=1000
X (loss)
SEQ=4000
SEQ=5000
SEQ=6000
ACK 3000 (dup)
ACK 3000 (dup)
ACK 3000 (dup)
SEQ=3000, size=1000 (retransmit)
Introduced in TCP Reno
110
Other loss recovery methods
  • Selective Acknowledgements (SACK)
  • Returned ACKs contain option w/SACK block
  • Block says, "got up to N-1 AND got N+1 through N+3"
  • A single ACK can generate a retransmission
  • New Reno partial ACKs
  • New ACK during fast retransmit may not ACK all
    outstanding data. Ex
  • Have ACK of 1, waiting for 2-6, get 3 dup acks of
    1
  • Retransmit 2, get ACK of 3, can now infer 4 lost
    as well
  • Other schemes exist (e.g., Vegas)
  • Reno has been prevalent SACK now catching on

111
How about Connection Teardown?
  • Either side may terminate a connection. (In
    fact, the connection can stay half-closed.) Let's
    say the server closes (typical in WWW):
  • Server sends FIN with seq number (SN+1) (i.e.,
    the FIN is a byte in sequence)
  • Client ACKs the FIN with SN+2 ("next expected")
  • Client sends its own FIN when ready
  • Server ACKs the client's FIN as well with SN+1.

client
server
close()
FIN(X)
close()
ACK(X+1)
FIN(Y)
ACK(Y+1)
timed wait
closed
112
The TCP State Machine
  • TCP uses a Finite State Machine, kept by each
    side of a connection, to keep track of what state
    a connection is in.
  • State transitions reflect inherent races that can
    happen in the network, e.g., two FIN's passing
    each other in the network.
  • Certain things can go wrong along the way, i.e.,
    packets can be dropped or corrupted. In fact, the
    machine is not perfect; certain problems can
    arise that were not anticipated in the original RFC.
  • This is where timers will come in, which we will
    discuss more later.

113
TCP State Machine Connection Establishment
CLOSED
  • CLOSED: more implied than actual, i.e., no
    connection
  • LISTEN: willing to receive connections (accept
    call)
  • SYN-SENT: sent a SYN, waiting for SYN-ACK
  • SYN-RECEIVED: received a SYN, waiting for an ACK
    of our SYN
  • ESTABLISHED: connection ready for data transfer

server: application calls listen()
client: application calls connect(); send SYN
LISTEN
SYN_SENT
receive SYN / send SYN+ACK
receive SYN / send ACK
receive SYN+ACK / send ACK
SYN_RCVD
receive ACK
ESTABLISHED
114
TCP State Machine Connection Teardown
ESTABLISHED
  • FIN-WAIT-1: we closed first, waiting for ACK of
    our FIN (active close)
  • FIN-WAIT-2: we closed first, other side has ACKed
    our FIN, but not yet FIN'ed
  • CLOSING: other side closed before it received our
    FIN
  • TIME-WAIT: we closed, other side c