Title: Resource Overbooking and Application Profiling in Shared Hosting Platforms
1. Resource Overbooking and Application Profiling in Shared Hosting Platforms
- Bhuvan Urgaonkar
- Prashant Shenoy
- Timothy Roscoe
- UMass Amherst and Intel Research
2. Motivation
[Figure: clients accessing e-commerce, streaming, and game applications hosted on a cluster]
- Proliferation of Internet applications
- Electronic commerce, streaming media, online games, online trading, …
- Commonly hosted on clusters of servers
- Cheaper alternative to large multiprocessors
3. Hosting Platforms
- Hosting platform: a server cluster that runs third-party applications
- Application providers pay for server resources
- CPU, disk, network bandwidth, memory
- Platform provider guarantees resource availability
- Performance guarantees provided to applications
- Central challenge: maximize revenue while providing resource guarantees
4. Design Challenges
- How to determine an application's resource needs?
- How to provision resources to meet these needs?
- How to map applications to nodes in the platform?
- How to handle dynamic variations in the load?
5. Talk Outline
- Introduction
- Inferring Resource Requirements
- Provisioning Resources
- Handling Dynamic Load Variations
- Experimental Evaluation
- Related Work
6. Hosting Platform Model
- Hosting platforms: dedicated vs. shared
- Dedicated: applications get integral nodes
- Shared: applications may get fractional nodes
- Our focus: shared hosting platforms
- Nodes may have competing applications
- Capsule: a component of an application running on a node
- Example: an e-commerce application with HTTP server, application server, and database server capsules
7. Provisioning by Overbooking
- How should the platform allocate resources?
- Provision resources based on worst-case needs
- Worst-case provisioning is wasteful
- Low platform utilization
- Applications may be tolerant to occasional violations
- E.g., CPU guarantees should be met 99% of the time
- Possible to provide useful guarantees even after provisioning less than worst-case needs
- Idea: improve utilization by overbooking resources
8. Application Profiling
- Profiling: the process of determining an application's resource usage
- Run the application on an isolated set of nodes
- Subject the application to a real workload
- Model CPU and network usage as ON-OFF processes
[Figure: CPU usage over time as an ON-OFF process; each ON period runs from the beginning to the end of a CPU quantum]
- Use the Linux trace toolkit
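As a minimal sketch of how an ON-OFF trace turns into usage samples, the Python function below assumes a hypothetical trace format: a list of (begin, end) timestamps for each ON period, such as the beginning and end of each CPU quantum a capsule receives. This is not the Linux trace toolkit's output format or the authors' tool; `usage_per_interval` and its parameters are illustrative names.

```python
from typing import List, Tuple


def usage_per_interval(on_periods: List[Tuple[float, float]],
                       interval: float, horizon: float) -> List[float]:
    """Fraction of each measurement interval during which the resource was ON.

    on_periods: (begin, end) timestamps of ON periods (hypothetical trace format).
    interval:   length of one measurement interval.
    horizon:    total observation time; only whole intervals are reported.
    """
    n = int(horizon // interval)
    busy = [0.0] * n
    for begin, end in on_periods:
        # Split each ON period across every measurement interval it overlaps.
        i = int(begin // interval)
        while i < n and i * interval < end:
            lo = max(begin, i * interval)
            hi = min(end, (i + 1) * interval)
            if hi > lo:
                busy[i] += hi - lo
            i += 1
    return [b / interval for b in busy]
```

For example, `usage_per_interval([(0.0, 0.25), (0.5, 1.5), (2.0, 3.0)], 1.0, 3.0)` yields `[0.75, 0.5, 1.0]`: the ON period that straddles the boundary at time 1.0 is split between the first two intervals.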
9. Resource Usage Distribution
[Figure: resource usage over time, sampled once per measurement interval]
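The per-interval usage samples can then be summarized as an empirical distribution. The sketch below assumes usage is given as a fraction of capacity per measurement interval and discretizes it into whole-percent levels; the function name and the choice of 100 levels are our assumptions, not taken from the talk.

```python
from collections import Counter
from typing import Dict, List


def usage_distribution(fractions: List[float], bins: int = 100) -> Dict[int, float]:
    """Empirical probability mass function of per-interval usage.

    Each sample (a fraction of capacity) is rounded to one of `bins` levels,
    i.e. whole percent by default; samples above 1.0 are clamped to the top level.
    """
    levels = [min(bins, round(f * bins)) for f in fractions]
    counts = Counter(levels)
    n = len(fractions)
    return {level: count / n for level, count in sorted(counts.items())}
```

For instance, the samples `[0.75, 0.5, 1.0, 0.5]` give `{50: 0.5, 75: 0.25, 100: 0.25}`.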
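10. Capturing Burstiness: Token Bucket
- Token bucket $(\sigma, \rho)$
- Resource usage over any interval of length $t$ is at most $\sigma t + \rho$
[Figure: usage vs. time bounded by two token buckets, $\sigma_1 t + \rho_1$ and $\sigma_2 t + \rho_2$, with bursts $\rho_1$ and $\rho_2$ as intercepts on the usage axis]
- Algorithm by Tang et al.
- Additional parameter $T$
- Satisfy token bucket guarantees only for intervals $t \ge T$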
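The slide attributes the computation of $(\sigma, \rho)$ to an algorithm by Tang et al., which is not reproduced here. As a brute-force stand-in under stated assumptions, the sketch below fixes the rate σ and finds the smallest burst ρ such that usage over every window of at least T measurement intervals stays within σ·t + ρ. Time is measured in measurement intervals, and `min_burst` is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def min_burst(usage: List[float], sigma: float, T: int) -> float:
    """Smallest rho with usage(window) <= sigma * t + rho for every window of t >= T intervals.

    usage[i] is the resource used in measurement interval i.
    Brute force over all O(n^2) windows; an illustrative stand-in,
    not Tang et al.'s algorithm.
    """
    prefix = [0.0, *accumulate(usage)]  # prefix[j] = total usage in intervals [0, j)
    n = len(usage)
    rho = 0.0
    for i in range(n):
        # Only windows of length t >= T (and at least one interval) are constrained.
        for j in range(i + max(T, 1), n + 1):
            t = j - i
            rho = max(rho, prefix[j] - prefix[i] - sigma * t)
    return rho
```

With usage `[1.0, 1.0, 0.0, 0.0]` and σ = 0.5, requiring the bound for every window (T = 1) gives ρ = 1.0, whereas T = 3 gives ρ = 0.5: enforcing the guarantee only over longer windows absorbs a short burst with a smaller bucket.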
11. Profiles of Server Applications
- Applications exhibit different degrees of burstiness
- Usage distributions may have a long tail
- Insight: choose $(\sigma, \rho)$ based on a high percentile of usage rather than the worst case
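To see why a long tail favors a high percentile over the maximum, the sketch below uses the nearest-rank percentile definition (our choice; the talk does not say which estimator is used) on a hypothetical capsule that bursts in a single interval out of 100.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank p-th percentile (0 < p <= 100) of observed usage samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical profile: 10% usage in 99 intervals, one 90% burst.
samples = [0.1] * 99 + [0.9]
worst_case = percentile(samples, 100)  # set entirely by the single burst
p99 = percentile(samples, 99)
```

Here the worst case is 0.9 while the 99th percentile is 0.1: reserving for the worst case would set aside nine times more capacity to cover one interval in a hundred.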
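12. Resource Overbooking
- Applications specify an overbooking tolerance $O$
- Probability with which a capsule's needs may be violated
- Controlled overbooking via admission control
- $\sum_k \left(\sigma_k^{(1-O_k)} T_{\min} + \rho_k^{(1-O_k)}\right) \le C\,T_{\min}$
- $\Pr\left(\sum_k U_k > C\right) \le \min(O_1, \ldots, O_k)$
- Here $U_k$ is capsule $k$'s usage, $C$ the node's capacity, $T_{\min} = \min_k T_k$, and the superscript $(1-O_k)$ marks token bucket parameters taken at the $(1-O_k)$th percentile
- A node that has sufficient resources for a capsule (both conditions hold) is feasible for it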
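A minimal sketch of the two admission-control checks for one node, in Python. Several details are our assumptions rather than the talk's: each capsule's (σ, ρ) is taken as already computed at its (1 − O) percentile, usage distributions are discretized to integer units (e.g., percent of one CPU), capsules are treated as statistically independent so that the distribution of total usage is the convolution of the per-capsule distributions, and `Capsule`, `convolve`, and `is_feasible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Capsule:
    sigma: float            # token bucket rate, assumed taken at the (1 - O) percentile
    rho: float              # token bucket burst, assumed taken at the (1 - O) percentile
    T: float                # shortest interval over which the bucket must hold
    O: float                # overbooking tolerance (probability of violation)
    dist: Dict[int, float]  # P(usage == u), usage discretized to integer units


def convolve(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Distribution of the sum of two independent discrete random variables."""
    out: Dict[int, float] = {}
    for ua, pa in a.items():
        for ub, pb in b.items():
            out[ua + ub] = out.get(ua + ub, 0.0) + pa * pb
    return out


def is_feasible(capsules: List[Capsule], C: int) -> bool:
    """Check both admission-control conditions for capsules on one node of capacity C."""
    T_min = min(c.T for c in capsules)
    # Condition 1: combined token bucket demand over T_min fits in C * T_min.
    if sum(c.sigma * T_min + c.rho for c in capsules) > C * T_min:
        return False
    # Condition 2: Pr(total usage > C) must not exceed the tightest tolerance.
    total: Dict[int, float] = {0: 1.0}
    for c in capsules:
        total = convolve(total, c.dist)
    p_overflow = sum(p for u, p in total.items() if u > C)
    return p_overflow <= min(c.O for c in capsules)
```

For two capsules that each use 10 units 99% of the time and 60 units otherwise, with σ = 15, ρ = 20, T = 1, and O = 0.01, both checks pass on a 100-unit node: the token bucket demand is 70, and total usage exceeds capacity only when both burst at once, with probability 0.0001. A third such capsule fails the first check (105 > 100).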
13. Mapping Capsules to Nodes
[Figure: bipartite graph between capsules and nodes (numbered 1 to 4), and the final mapping]
- A bipartite graph of capsules and their feasible nodes
- Greedy mapping: consider capsules in non-decreasing order of degree, O(c · log c) for c capsules
- Guaranteed to find a placement if one exists!
- If several nodes are feasible: best fit, worst fit, or random
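The greedy mapping can be sketched as follows, under a deliberately simplified model that is our assumption: each capsule needs a single scalar reservation, a node is feasible if its remaining capacity covers it, and a capsule's degree is its number of initially feasible nodes. The placement guarantee quoted on the slide refers to the authors' formulation; this simplified sketch carries no such guarantee. `greedy_place` and its `policy` values are hypothetical.

```python
import random
from typing import Dict, Optional


def greedy_place(demands: Dict[str, float], capacity: Dict[str, float],
                 policy: str = "worst",
                 rng: Optional[random.Random] = None) -> Optional[Dict[str, str]]:
    """Map capsules to nodes, considering capsules in non-decreasing order of degree.

    demands:  capsule -> reservation; capacity: node -> free capacity.
    policy:   "best" (least capacity left over), "worst" (most left over), or "random".
    Returns capsule -> node, or None if some capsule finds no feasible node.
    """
    rng = rng or random.Random(0)
    free = dict(capacity)  # copy so the caller's dict is not modified
    # Degree = number of nodes that could host the capsule before any placement.
    degree = {cap: sum(1 for node in free if free[node] >= d)
              for cap, d in demands.items()}
    mapping: Dict[str, str] = {}
    for cap in sorted(demands, key=lambda name: degree[name]):
        d = demands[cap]
        feasible = [node for node in free if free[node] >= d]
        if not feasible:
            return None
        if policy == "best":
            chosen = min(feasible, key=lambda node: free[node] - d)
        elif policy == "worst":
            chosen = max(feasible, key=lambda node: free[node] - d)
        else:
            chosen = rng.choice(feasible)
        mapping[cap] = chosen
        free[chosen] -= d
    return mapping
```

With nodes `{"n1": 100, "n2": 50}` and capsules `{"db": 70, "web": 40, "game": 30}`, `db` has degree 1 and is placed first on `n1`, after which `web` goes to `n2` and `game` to `n1`. Had `web` been placed first under worst fit, it would have taken `n1` and left no node with room for `db`.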
14. Handling Flash Crowds
- Detect overloads by online profiling
- Reacting to overloads (ongoing work)
- Compute new allocations
- Change allocations, move capsules, add servers
15. Talk Outline
- Introduction
- Inferring Resource Requirements
- Provisioning Resources
- Handling Dynamic Load Variations
- Experimental Evaluation
- Related Work
16. The SHARC Prototype
- A Linux-based shared hosting platform
- 6 Dell PowerEdge 1550 servers
- Gigabit Ethernet link
- Software components
- Profiling: vanilla Linux + Linux trace toolkit
- Control plane: overbooking, placement
- QoS-enhanced Linux kernel: HSFQ schedulers
17. Experimental Setup
- Prototype running on a 5-node cluster
- Each server: 1 GHz Pentium III with 512 MB RAM and Gigabit Ethernet
- Control plane runs on a dedicated node
- Applications run on the other four nodes
- Workload: mix of server applications
- PostgreSQL database server with the pgbench (TPC-B) benchmark
- Apache web server with SPECWeb99 (static + dynamic HTTP)
- MPEG streaming server with 1.5 Mb/s VBR MPEG-1 clients
- Quake I game server with terminator bots
18. Resource Overbooking Benefits
- Small amounts of overbooking can yield large gains in platform utilization
- Burstier applications yield larger benefits
19. Capsule Placement Algorithms
- Diverse requirements: worst fit outperforms the others
- Similar requirements: all perform similarly
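20. Performance with Overbooking
Application  Metric                Isolated  100th  99th   95th   Avg
Apache       Throughput (req/s)    67.9      67.51  66.91  64.81  39.8
PostgreSQL   Throughput (trans/s)  22.8      22.46  22.21  21.78  9.04
Streaming    Violations (s)        0         0      0.31   0.59   5.23
(Isolated: application running alone; 100th, 99th, 95th: resources provisioned at that percentile of profiled usage; Avg: provisioned at average usage)
- Performance degradation is within the specified overbooking tolerance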
21. Related Work
- Single-node resource management
- Proportional-share schedulers: WFQ, SFQ, BVT, …
- Reservation-based schedulers: Nemesis, Rialto, …
- Cluster-based resource management
- Cluster Reserves [Aron00], Aron thesis [Aron00]
- MUSE [Chase01]: economic approach
- Oceano (IBM), Planetary computing (HP)
- Clusters for high availability: Porcupine [Saito99]
- Grid computing
22. Concluding Remarks
- Resource management in shared hosting platforms
- Application profiling to determine resource usage
- Revenue maximization using controlled overbooking
- Ability to handle dynamic workloads (ongoing work)
- URL: http://lass.cs.umass.edu