CSE 598c Virtual Machines

Description:

CSE598c - Virtual Machines - Spring 2006 - Diagnosing Performance Overheads in ... skb_copy_bits, skbuff_ctor, and tcp_collapse were the culprit functions ...

Provided by: patrickm150
Learn more at: https://www.cse.psu.edu

Transcript and Presenter's Notes

1
CSE 598c Virtual Machines
Diagnosing Performance Overheads in the Xen Virtual Machine Environment
Aravind Menon, Jose Renato Santos, Yoshio Turner, G. Janakiraman, Willy Zwaenepoel
  • Lisa Johansen
  • March 13, 2006

2
Motivation
  • The performance of an application in a VM
    environment is affected by:
    • Operating system
    • Other processes
    • Underlying VMM
    • Other VMs
  • We want a way to measure the elements that
    affect performance in a virtual machine
    environment (Xen)

3
Outline
  • Overview of Statistical Analysis in VMs
  • Xenoprof
  • Performance Debugging
  • Performance Overhead Analysis in Xen

4
Issues in VM Statistical Analysis
OS
Virtual Machine
  • Distributed computing
  • Distributed profiling
  • VMs don't have access to hardware events
  • The VMM does

5
Xenoprof
  • To handle distributed profiling, each VM runs
    its own OProfile instance for individual
    profiling
  • To monitor hardware events, Xenoprof accepts
    hypercalls from OProfile and returns samples
    through interrupts

[Figure: Xenoprof architecture diagram. Labels, top to bottom: processes P1-P4; Kernel, Kernel, Dom0; OProfile (x3); Xenoprof; VMM; Hardware.]

6
How it works
  • Each profiling domain queries Xenoprof to find
    out whether it should be the initiator
  • If there are multiple domains, Dom0 must be the
    initiator
  • The initiator collects profiling requirements
    from the participants and forwards this
    information to Xenoprof
  • Xenoprof collects program counter samples in
    accordance with those requirements
  • These samples are then passed to the OProfile
    instances, where they are mapped to the correct
    process
  • Individual or system-wide performance can then
    be determined
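
The last three steps can be illustrated with a minimal sketch. It is not OProfile's or Xenoprof's actual code: the symbol tables, addresses, function names, and the `symbolize` and `build_profile` helpers are all hypothetical. It only shows the general idea of attributing program counter samples to a function within the domain that produced them, then reading the result per domain or system-wide.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol tables: for each domain id, a list of
# (start_address, function_name) pairs sorted by start address.
SYMBOLS = {
    0: [(0x1000, "dom0_func_a"), (0x2000, "dom0_func_b")],
    1: [(0x1000, "guest_func_a"), (0x1800, "guest_func_b")],
}


def symbolize(domain_id, pc):
    """Return the function whose start address is the closest one at or below pc."""
    table = SYMBOLS[domain_id]
    starts = [start for start, _ in table]
    index = bisect_right(starts, pc) - 1
    if index < 0:
        return "<unknown>"  # pc lies below the first known symbol
    return table[index][1]


def build_profile(samples):
    """Count samples per (domain, function). Each sample is (domain_id, pc)."""
    counts = Counter()
    for domain_id, pc in samples:
        counts[(domain_id, symbolize(domain_id, pc))] += 1
    return counts


# Made-up samples, for illustration only.
samples = [(0, 0x1004), (0, 0x2010), (1, 0x1900), (1, 0x1904), (1, 0x0500)]
profile = build_profile(samples)

# Per-domain view: keep only the entries for domain 1.
guest_view = {name: n for (dom, name), n in profile.items() if dom == 1}
# System-wide view: total samples across all domains.
system_total = sum(profile.values())
```

Keeping the domain id in the key is what lets the same sample stream answer both the per-domain and the system-wide question.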

7
Performance Debugging - Networking
  • The motivating example compares receiver
    throughput between Linux and XenoLinux
  • Varying the size of the user-level buffer greatly
    affects XenoLinux. Why?
  • Using Xenoprof they found:
    • The XenoLinux kernel was the source of the
      increase in execution time
    • skb_copy_bits, skbuff_ctor, and tcp_collapse
      were the culprit functions
  • This is all due to time spent defragmenting
    memory taken up by empty space in socket buffers
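
One way to surface culprit functions from two profiles is to compare each function's share of samples. The sketch below assumes that approach; it is not necessarily how the authors did it. The sample counts are placeholders invented for illustration, not measurements from the paper, and `share` and `largest_increases` are hypothetical helper names.

```python
def share(profile):
    """Convert raw sample counts into each function's fraction of all samples."""
    total = sum(profile.values())
    return {name: count / total for name, count in profile.items()}


def largest_increases(baseline, candidate, top=3):
    """Rank functions by how much their sample share grew from baseline to candidate."""
    base = share(baseline)
    cand = share(candidate)
    deltas = {name: cand[name] - base.get(name, 0.0) for name in cand}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


# Placeholder counts (NOT real data): samples per kernel function.
linux = {"skb_copy_bits": 10, "skbuff_ctor": 5, "tcp_collapse": 5, "other": 980}
xenolinux = {"skb_copy_bits": 200, "skbuff_ctor": 100, "tcp_collapse": 150, "other": 550}

for name, delta in largest_increases(linux, xenolinux):
    print(f"{name}: {delta:+.1%} of samples")
```

Comparing shares rather than raw counts keeps the ranking meaningful when the two runs collected different numbers of samples.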

8
Performance Overhead Evaluation
  • Given this new tool, let's apply it to
    determine performance overheads
  • We focus on network communication because it is
    an important element of VMs
  • Evaluate three workloads:
    • Receiver workload
    • Sender workload
    • Web server workload
  • In three configurations:
    • Xen-domain0
    • Xen-guest0 (same CPU)
    • Xen-guest1 (different CPUs)

9
Receiver workload
  • Domain0
    • Degraded performance compared to Linux
    • Instruction TLB misses and data TLB misses are
      much higher than in Linux (the primary cause)
    • This may be due to TLB flushing or an increase
      in working set size
    • Instruction cost is higher in XenoLinux due to
      overheads that exist within Xen
  • Guest0 and Guest1
    • Degraded performance compared to Dom0
    • Significant increase in instructions
    • Page remapping and transfer from Dom0 to DomUs
    • Increased L2 cache misses caused by a larger
      working set size

10
Sender workload
  • Domain0
    • No throughput difference compared to Linux
  • Guest0
    • Huge throughput degradation caused by the high
      instruction cost (max 706 Mb/s compared to 3764
      Mb/s)
    • The TCP stack processes a larger number of
      packets than in Dom0 to transfer the same
      amount of data
    • This is due to the lack of TCP segmentation
      offload (TSO) support
    • It also computes large checksums
    • The driver domain model prevents this work from
      being offloaded to the physical interface
    • If similar abilities are taken away from Dom0,
      we see similar results
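
A rough worked example of the two points above. The throughput figures are the ones quoted on the slide. The segment sizes are assumptions that do not come from the slides: a 1460-byte TCP payload per packet (typical for a 1500-byte Ethernet MTU) and a 64 KiB maximum buffer handed to the NIC when segmentation offload is available. The `segments` helper is hypothetical.

```python
import math

# Figures quoted on the slide (Mb/s).
GUEST0_MBPS = 706
BASELINE_MBPS = 3764

slowdown = BASELINE_MBPS / GUEST0_MBPS        # how many times slower Guest0 is
degradation = 1 - GUEST0_MBPS / BASELINE_MBPS  # fraction of throughput lost

# Assumed sizes (not from the slides).
MSS_BYTES = 1460           # payload per packet when the TCP stack segments itself
TSO_MAX_BYTES = 64 * 1024  # payload per buffer when the NIC does the segmenting


def segments(payload_bytes, max_segment_bytes):
    """Number of units the TCP stack must process to send payload_bytes."""
    return math.ceil(payload_bytes / max_segment_bytes)


payload = 1_000_000  # bytes to transfer
without_tso = segments(payload, MSS_BYTES)
with_tso = segments(payload, TSO_MAX_BYTES)

print(f"slowdown: {slowdown:.2f}x, degradation: {degradation:.1%}")
print(f"units processed by the stack: {without_tso} without TSO, {with_tso} with TSO")
```

Under these assumptions the stack handles dozens of times more units per megabyte without offload, which is consistent with the slide's point that instruction cost, not the link, limits Guest0.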

11
Webserver workload
  • Overall, very similar to the receiver and sender
    workloads
  • Domain0
    • Higher TLB miss rate than Linux
  • Guest0
    • Higher instruction costs
    • Highest L2 cache miss rates
    • Highest computational overhead
    • TSO does not matter because the payloads are
      small
  • Guest1
    • Higher instruction costs
    • Higher L2 cache miss rates
    • TSO does not matter because the payloads are
      small

12
Conclusion
  • Xenoprof is a tool for examining performance
    within Xen
  • Xenoprof has been used to examine the different
    performance elements of network communication in
    Xen
  • It can be used to evaluate other aspects of
    performance within Xen