Title: Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms
1Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms
- Sriram Govindan,
- Arjun R Nath,
- Amitayu Das,
- Bhuvan Urgaonkar,
- Anand Sivasubramaniam,
- Computer Systems Laboratory,
- The Pennsylvania State University.
2Data centers
- Rent server resources
- Provide resource and performance guarantees
- Problem
- Server sprawl
- Solution: Consolidation
- Reduce resource wastage
- Reduced floor space
- Better power management
- How?
3Server virtualization
- Ability to create multiple virtual servers from a single physical server
- Allows consolidation by hosting heterogeneous OS instances over the same hardware
- Why now?
- Emergence of highly efficient virtual machine monitors
- Xen, VMware, etc.
- Hardware support
- Intel, AMD, IBM, etc.
- Real world example
- Amazon EC2
[Figure: a 2-tiered e-commerce application hosted on virtual machines. Layers from top to bottom: applications, operating systems (Linux, Windows), VMM, hardware.]
4Consolidation: How?
- Know what the applications need
- Ensure that the resource requirements of the applications are met
5Consolidation Example
- Consider a representative e-commerce benchmark
- TPC-W, an online book store application
- Measure application resource needs and record performance
- Run TPC-W tiers on dedicated servers
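[Figure: clients send requests to the Jboss tier, which sends queries to the Mysql tier; responses flow back to the clients. Each tier runs over a VMM on its own hardware. Resource usage is recorded at the servers and response times are recorded at the clients.]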
6Consolidation Example
Tier     CPU utilization (95th percentile)
Jboss    10%
Mysql    20%
[Figure: CDF of response time in seconds]
7Consolidation Example
- Consolidate the TPC-W tiers onto a single server
- Use the hypervisor to ensure resource guarantees
- Reserve for the peak requirement
- Pack more applications to utilize the remaining server capacity
[Figure: with only the 10% and 20% reservations on one server (VMM over hardware), resources are underutilized; packing more applications brings server utilization to almost 100%. Other resource requirements are also met.]
8Consolidation Example
[Figure: CDF of response time in seconds, with consolidation vs. without consolidation]
Why did this happen?
9Scheduler induced delays
[Figure: timeline of TPC-W tiers running on dedicated servers. Jboss sends query1 and query2 to the DB and receives reply1 and reply2; the delay shown is network latency.]
10Scheduler induced delays
[Figure: the same query1/query2 and reply1/reply2 timeline between Jboss and the DB, shown twice: for TPC-W tiers running on dedicated servers (network latency) and for consolidated TPC-W tiers (network latency plus scheduler induced delays).]
11Does this look familiar?
- Parallel systems: gang scheduling/co-scheduling
- Feitelson et al., Ousterhout et al., Andrea et al.
- Schedulers: low-latency dispatch
- e.g., BVT, Duda et al.
- Our contribution
- Fairness guarantees: applications pay for resources
- Self-tuning: reduced administrator intervention
- Adapt to varying application I/O behaviour
- Network I/O is virtualized, which further increases the delays
12Xen Virtual Machine Monitor
[Figure: Xen architecture with I/O virtualization. The virtual machines (Domain 0/driver domain and the guest domains, each running applications on a modified guest OS) sit on virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.). Below them is the Xen hypervisor, which contains the VM scheduler, and the physical hardware (CPU, disk, NIC, memory, etc.).]
13Network Virtualization in Xen - Reception
[Figure: packet reception path. NIC -> interrupt -> hypervisor -> notify -> hardware drivers and netback driver in domain0 -> virtual interrupt -> netfront driver in the guest VM -> packet delivery to the application.]
14Network Virtualization in Xen - Transmission
[Figure: packet transmission path. Application in the guest VM -> packet send -> netfront driver -> send over virtual NIC -> netback driver and hardware drivers in domain0 -> send over NIC -> NIC.]
15Scheduler induced Delays
- Delay associated with scheduling of Domain0
- When a guest domain transmits a packet
- When a packet is received at the physical NIC
16Scheduler induced Delays
- Delay associated with scheduling of Domain0
- Delay at the recipient
- When Domain0 sends a packet to a guest domain
17Scheduler induced Delays
- Delay associated with scheduling of Domain0
- Delay at the recipient
- Delay at the sender
- Before a domain sends a network packet (on its virtual NIC)
- Unlike reception, sending a packet can only be anticipated
18Scheduler induced Delays
- Delay associated with scheduling of Domain0
- Delay at the recipient
- Delay at the sender
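[Figure: query/reply timeline for consolidated TPC-W tiers in a virtualized environment. Each query and reply between Jboss and the DB passes through dom0 on both the sending and the receiving side, adding scheduler induced delays with virtualization overhead to the network latency.]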
19Scheduler design
- Recall: reservations must be provided
- Build on top of a reservation-based scheduler
- SEDF
- (slice, period) pair: a domain needs slice ms of CPU every period ms
- Communication-aware SEDF scheduler
- Enhance the CPU scheduler to reduce scheduler induced delays
- Change the scheduling order to preferentially schedule communicating domains
- Introduce short-term unfairness
- Still preserve reservation guarantees over a coarser time scale: the PERIOD
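To make the (slice, period) reservation concrete, here is a minimal Python sketch. It is not Xen's SEDF code; the class name, the helper names, and the numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """A (slice, period) reservation: slice_ms of CPU every period_ms."""
    domain: str
    slice_ms: float
    period_ms: float

    def cpu_share(self) -> float:
        # Fraction of one CPU guaranteed over each period.
        return self.slice_ms / self.period_ms


def admissible(reservations) -> bool:
    # The reservations fit on one CPU only if their shares sum to at most 1.
    return sum(r.cpu_share() for r in reservations) <= 1.0


# Hypothetical values, chosen for illustration only.
jboss = Reservation("jboss", slice_ms=10, period_ms=100)
mysql = Reservation("mysql", slice_ms=20, period_ms=100)
print(jboss.cpu_share(), mysql.cpu_share(), admissible([jboss, mysql]))
```

Because the guarantee is stated per period, the order in which domains run inside a period can change without violating it, which is the room the communication-aware scheduler uses.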
20Scheduler Implementation
- Key idea
- Associate impending network activity with each virtual machine
- Incorporate communication activity into decision making
- Greedy heuristic
- Prefer the VM that is likely to benefit the most: the VM with the most pending packets
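The greedy heuristic can be sketched as follows. This is an illustrative Python sketch, not the scheduler's actual code: the `Domain` fields, the rule that only domains with unused slice are eligible, and the earliest-deadline tie-break are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    pending_packets: int       # impending network activity for this VM
    remaining_slice_ms: float  # unused part of the reservation in this period
    deadline_ms: float         # end of the domain's current period


def pick_next(domains):
    """Greedy choice: the eligible domain with the most pending packets."""
    # Assumption: a domain that has used up its slice waits for its next period.
    runnable = [d for d in domains if d.remaining_slice_ms > 0]
    if not runnable:
        return None
    # Most pending packets first; earliest deadline breaks ties (assumption).
    return min(runnable, key=lambda d: (-d.pending_packets, d.deadline_ms))
```

With no pending packets anywhere, this choice falls back to earliest-deadline order, so only the order within a period changes when domains communicate.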
21Communication aware scheduler
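- Reception
[Figure: guest domains (Domain 1, Domain 2, ..., Domain n) and Domain0 above the hypervisor and the NIC; the hypervisor tracks a pending counter per domain.]
- A packet arrives at the NIC and raises an interrupt: Domain0.pending++
- Now, schedule domain0: Domain0.pending--, Domain1.pending++
- Schedule Domain 1: Domain1.pending--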
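The reception book-keeping on this slide can be traced with a small Python sketch. The function names and the dictionary of counters are hypothetical; only the sequence (packet at the NIC, Domain0 runs, the destination guest runs) follows the slide.

```python
def on_nic_interrupt(pending):
    # A packet arrived at the physical NIC: Domain0 has work to do.
    pending["Domain0"] += 1


def run_domain0(pending, destination):
    # Domain0 handles one packet and hands it to the destination guest.
    if pending["Domain0"] == 0:
        return
    pending["Domain0"] -= 1
    pending[destination] += 1


def run_guest(pending, guest):
    # The guest consumes one packet that was waiting for it.
    if pending[guest] > 0:
        pending[guest] -= 1


def next_domain(pending):
    # Prefer the domain with the most pending packets, if any has some.
    name = max(pending, key=pending.get)
    return name if pending[name] > 0 else None


pending = {"Domain0": 0, "Domain1": 0, "Domain2": 0}
on_nic_interrupt(pending)
print(next_domain(pending))
run_domain0(pending, "Domain1")
print(next_domain(pending))
run_guest(pending, "Domain1")
print(next_domain(pending))
```

The trace shows why Domain0 is preferred first and the receiving guest second: the counter moves with the packet.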
22Evaluation Environment
- Applications
- TPC-W benchmark
- Jboss and Mysql tiers
- Multi-threaded UDP streaming server
- Simultaneously streams data at 3 Mbps to a specified number of clients
- Every client is provided with an 8 MB buffer
- Clients start consuming data only when the buffer is full
- CPU intensive workloads
- Used for illustrative purposes
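As a rough worked example of the streaming setup, the Python sketch below computes how long a client buffer takes to fill and the aggregate rate the server must sustain. The helper names are hypothetical, and it assumes 1 MB = 10^6 bytes and 1 Mbps = 10^6 bits per second, which the slides do not specify.

```python
def buffer_fill_seconds(buffer_bytes: float, stream_bits_per_sec: float) -> float:
    # Time for a stream at the given rate to fill the client buffer.
    return buffer_bytes * 8 / stream_bits_per_sec


def aggregate_rate_mbps(num_clients: int, stream_mbps: float) -> float:
    # Total rate the server sends when every client gets its own stream.
    return num_clients * stream_mbps


# 8 MB buffer at 3 Mbps (decimal units assumed).
print(round(buffer_fill_seconds(8e6, 3e6), 1))
# 45 clients, the count used in the streaming experiments.
print(aggregate_rate_mbps(45, 3))
```

Under these assumptions a client waits roughly 21 seconds before it starts consuming data.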
23Streaming media experiments - performance improvement
- Streaming to 45 clients at 3 Mbps for 20 minutes
- Default scheduler suffered a playback discontinuity every 1.5 minutes
24Streaming media experiments - performance improvement
- Streaming to 45 clients at 3 Mbps for 20 minutes
- Default scheduler suffered a playback discontinuity every 1.5 minutes
- Communication-aware scheduler suffered a discontinuity only after the 18th minute
25Streaming media experiments - improved consolidation
- A single buffer underrun at the client is fixed as the Service Level Objective (SLO)
- Communication-aware scheduler is able to sustain 30 more clients than the default scheduler
[Figure: number of buffer underruns at the client (lower is better) vs. number of clients supported at the server, with the SLO marked]
26TPC-W performance
- TPC-W benchmark ran for 20 minutes
- Around 35 percent improvement in response time
compared to the default scheduler
Scheduler               Average (secs)  95th percentile (secs)  Maximum (secs)
Default SEDF            1.3             7.1                     26.1
Modified SEDF           0.8             5.7                     12.8
Percentage improvement  34.11           19.98                   51.15
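The percentage improvement row presumably uses the usual relative-reduction formula, sketched below in Python with a hypothetical helper name. Applied to the rounded table entries, it need not reproduce the reported percentages exactly, since those may have been computed from unrounded measurements.

```python
def percent_improvement(default_secs: float, modified_secs: float) -> float:
    # Relative reduction in response time, as a percentage of the default.
    return 100.0 * (default_secs - modified_secs) / default_secs


# Rounded table entries: (default SEDF, modified SEDF) response times in seconds.
table = {
    "average": (1.3, 0.8),
    "95th percentile": (7.1, 5.7),
    "maximum": (26.1, 12.8),
}
for metric, (default_secs, modified_secs) in table.items():
    print(metric, round(percent_improvement(default_secs, modified_secs), 2))
```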
27Scheduler Fairness Evaluation
- The CPU intensive VM lost less than 1% of CPU compared to the default scheduler, but was still above its reservation, which was 10%
- Just changing the order of scheduling resulted in a huge response time improvement for the streaming server
[Figure: CPU utilization of the CPU intensive virtual machine vs. time in minutes, under default SEDF and modified SEDF, with the reservation marked]
28Conclusion
- A communication-aware CPU scheduler developed for a consolidated environment
- Low-overhead run-time monitoring of network events by the hypervisor scheduler
- Addressed additional problems due to network I/O virtualization in Xen
- Source code (300 lines) and Xen 3.0.2 patch available via the software link at
- http://csl.cse.psu.edu/
29Questions
30Streaming media experiments - performance improvement
- Streaming to 45 clients at 3 Mbps for 20 minutes
- Default scheduler suffered glitches every 1.5 minutes
- Communication-aware scheduler suffered a glitch only after the 18th minute
- With only the domain0 optimization ON, a glitch occurred at the 15th minute
31Communication aware scheduler
[Figure: guest domains (Domain 1, Domain 2, ..., Domain n) and Domain0 above the hypervisor, which holds the guest domain book-keeping pages and Domain0's book-keeping page]
32Communication aware scheduler
- Reception
[Figure: guest domains (Domain 1, Domain 2, ..., Domain n), Domain0, the hypervisor with Domain0's book-keeping page, and the NIC]
- A packet arrives at the NIC and raises an interrupt; Domain0's network_reception_intensity is updated
- Now, schedule domain0
- Domain0 receives the packets, updates packet reception and updates pending activity; Domain 1's network_reception_intensity is updated
- Domain 0 is descheduled, now we are in the hypervisor
- Schedule Domain 1
- Domain 1 is descheduled, now we are in the hypervisor
33Communication aware scheduler
- Transmission
[Figure: guest domains (Domain 1, Domain 2, ..., Domain n), Domain0, and the hypervisor with Domain0's book-keeping page. Counters shown: Domain1 network_transmission_intensity, Domain0 network_transmission_intensity, Domain1 anticipated_network_transmission_intensity]
- Now domain 1 is descheduled, we are in the hypervisor
34I/O Virtualization in Xen
[Figure: the frontend driver in the guest domain and the backend driver in Domain0 transfer data through shared pages and notify each other; the hardware drivers in Domain0 access the I/O devices (disk, NIC).]