Quality of Life in the Grids: VMs meet Bioinformatics Applications

Daniel Galron (1), Tim Freeman (2), Kate Keahey (3), Stephanie Gato (4), Natalia Maltsev (5), Alex Rodriguez (6), Mike Wilde (7)

1 The Ohio State University. galron_at_cis.ohio-state.edu
2 Argonne National Laboratory. tfreeman_at_mcs.anl.gov
3 Argonne National Laboratory. keahey_at_mcs.anl.gov
4 Indiana University. sgato_at_cs.indiana.edu
5 Argonne National Laboratory. maltsev_at_mcs.anl.gov
6 Argonne National Laboratory. arodri7_at_mcs.anl.gov
7 Argonne National Laboratory. wilde_at_mcs.anl.gov
A Glossary of Terms

VMM (Virtual Machine Monitor): a third-party tool providing the interface between a virtual machine and the host machine. Some examples of VMMs are VMware and Xen.
  • Using VMs has many benefits for scientists running complex applications:
  • Broader resource base: a virtual machine can be pre-configured with a required OS, library signature, and application installation, and then deployed on many different nodes independently of that node's configuration
  • Simplified deployment/distribution: VMs can be used as distribution packages; to duplicate an installation, just copy a VM image
  • Easy migration capability: an executing VM image can be frozen, transferred to another resource, and restarted within milliseconds
  • Fine-grained resource management: one can confine resource usage within most VM implementations
  • Enhanced security: VMs provide outstanding isolation, protecting the resource from the user and isolating users from each other
  • Complex applications require customized software configurations; such environments may not be widely available on Grid nodes
  • Installing scientific applications by hand can be arduous, lengthy, and error-prone; the ability to amortize this process over many installations would help
  • Providing good isolation of Grid computations is a key security requirement; the currently used mechanism of Unix accounts is not sufficient
  • Providing a vehicle for fine-grained resource usage enforcement is critical for more efficient use of Grid resources, yet such technology is not widely available
  • The ability to migrate or restart applications would be of enormous value in a Grid environment, yet current Grid frameworks do not support it

VMManager: Grid service interface that allows a remote client to interact with the VMM.
VMRepository: Grid service that catalogues the VM images of a VO and stores them for retrieval and deployment.
Authorization Service: Grid service that the VMManager and VMRepository services call to check whether a user is authorized to perform the requested operation.
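The division of labour among the three services can be sketched as minimal Python classes. This is an illustration only: the class and method names, and the dictionary-based ACL, are assumptions for this sketch, not the actual Globus service interfaces.

```python
class AuthorizationService:
    """Checks whether a user may perform a requested operation (sketch)."""
    def __init__(self, acl):
        self.acl = acl  # maps user name -> set of allowed operations

    def is_authorized(self, user, operation):
        return operation in self.acl.get(user, set())


class VMRepository:
    """Catalogues a VO's VM images and serves descriptors for deployment."""
    def __init__(self, authz):
        self.authz = authz
        self.images = {}  # image name -> descriptor dict

    def register(self, name, descriptor):
        self.images[name] = descriptor

    def query(self, user, criteria):
        # The repository checks authorization before answering a query.
        if not self.authz.is_authorized(user, "query"):
            raise PermissionError(user)
        return [d for d in self.images.values()
                if all(d.get(k) == v for k, v in criteria.items())]


class VMManager:
    """Remote front-end to the VMM on one resource (sketch)."""
    def __init__(self, authz):
        self.authz = authz
        self.running = {}  # VM identifier -> (descriptor, lifetime in seconds)

    def deploy(self, user, descriptor, identifier, lifetime_s):
        if not self.authz.is_authorized(user, "deploy"):
            raise PermissionError(user)
        self.running[identifier] = (descriptor, lifetime_s)
```

A client would hold remote handles to these services; here they are plain in-process objects so the control flow is visible.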
Virtual Machines meet the Grids
Performance Implications
In a nutshell
The performance of applications running on a VM depends on the third-party VMM and the applications themselves. A purely CPU-bound program will see almost no performance degradation, as all instructions are executed directly on hardware. Typically, virtual machines intercept privileged instructions (such as I/O), resulting in a performance hit for those instructions, although new methods, such as those implemented by Xen, improve this factor. In our implementation, we experimented with VMware Workstation and Xen; in our experience, slowdown was never more than 30% and was often less than 5%. (The Xen slowdown was much less than 30%.)
Instead of running Grid software within VMs, we integrated VM deployment into the Grid infrastructure: mapping a client credential to a Unix account was replaced by deploying a VM and starting the client's environment within it.
We implemented the architecture using Globus Toolkit 3.2, an open-source Grid middleware toolkit which provides a framework for resource, data, and security management.
Migration
Describing VM Properties
  • Integrating virtual machines with Grid technology allows easy migration of applications from one node to another. The steps are as follows:
  • Using Grid software, the client freezes execution of the VM
  • The client then sends the "migrate" command to the VMManager, specifying the new host node as a parameter
  • After checking for the proper authorization, the VM is registered with the new host and a GridFTP call transfers the image
  • In terms of performance this is on a par with deployment: it is mainly bound by the length of the transfer. In our tests, we migrated a 2 GB VM image between two identical nodes over a Fast Ethernet connection.
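The claim that migration is transfer-bound is easy to check with a back-of-the-envelope estimate. The sketch below is not from the poster; the 80% link-efficiency factor is an assumption standing in for protocol overhead.

```python
def transfer_time_s(image_bytes: int, link_mbps: float,
                    efficiency: float = 0.8) -> float:
    """Estimated time to move a VM image over a network link.

    `efficiency` accounts for protocol overhead (an assumed figure,
    not a measurement from the poster).
    """
    usable_bits_per_s = link_mbps * 1_000_000 * efficiency
    return image_bytes * 8 / usable_bits_per_s


# A 2 GB image over Fast Ethernet (100 Mbit/s), as in the tests above:
t = transfer_time_s(2 * 1024**3, 100.0)
# a few minutes -- large compared with registration and pause/resume,
# so the transfer dominates total migration time
```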

A VM constitutes a virtual workspace configured to meet the requirements of Grid computations. We use an XML Schema to describe various aspects of such a workspace, including virtual hardware (RAM size, disk size, virtual CD-ROM drives, serial ports, parallel ports), installed software including the operating system (e.g. kernel version, distribution type) and library signature, as well as other properties such as image name and VM owner. Based on those descriptions, VMs can be selected, duplicated, or further configured.
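A workspace descriptor of the kind the schema covers might look like the following. The element names, values, and structure here are hypothetical illustrations of the properties listed above; the poster does not reproduce the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical workspace descriptor covering the properties listed above:
# virtual hardware, installed software, and identifying metadata.
descriptor_xml = """
<workspace>
  <name>emboss-node</name>
  <owner>galron</owner>
  <hardware>
    <ram-mb>512</ram-mb>
    <disk-gb>2</disk-gb>
    <serial-ports>1</serial-ports>
  </hardware>
  <software>
    <os distribution="debian" kernel="2.4.26"/>
    <library-signature>glibc-2.3</library-signature>
  </software>
</workspace>
"""

# A repository could match query criteria against fields like these:
root = ET.fromstring(descriptor_xml)
ram_mb = int(root.findtext("hardware/ram-mb"))
os_elem = root.find("software/os")
```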
Legend
- VMManager
- VMRepository
The graph to the right shows the proportion of time taken by the constituents of the migration process, measured in seconds. Note that the graph does not include time for authorization, but those times are comparable to registration time. Also, the actual migration time depends on the network latency and bandwidth. The pause and resume times depend on the third-party VMM.
VM Deployment
  • The VM deployment process has 3 major steps:
  1. The client queries the VM repository, sending a list of criteria describing a workspace. The repository returns a list of VM descriptors that match them.
  2. The client contacts the VMManager, sending it the descriptor of the VM it wants to deploy, along with an identifier and a lifetime for the VM. The VMManager authorizes the request using an access control list.
  3. The VM instance is registered with the VMManager, and the VM image is copied from the VMRepository. The VMManager then interfaces with the VMM on the resource to power on the VM.
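The client side of the three steps above can be sketched as a single function. The `repository` and `manager` objects and their methods are hypothetical stand-ins for the remote Grid service calls, not the real toolkit API.

```python
def deploy_workspace(repository, manager, user, criteria,
                     identifier, lifetime_s):
    """Client-side sketch of the three deployment steps.

    All service methods here are illustrative placeholders for the
    remote Grid service operations described in the poster.
    """
    # Step 1: query the repository with criteria describing a workspace;
    # it returns the matching VM descriptors.
    descriptors = repository.query(user, criteria)
    if not descriptors:
        raise LookupError("no workspace matches the criteria")

    # Step 2: send the chosen descriptor to the VMManager with an
    # identifier and a lifetime; the manager authorizes the request
    # against its access control list.
    manager.deploy(user, descriptors[0], identifier, lifetime_s)

    # Step 3 happens server-side: the instance is registered, the image
    # is copied from the VMRepository, and the VMM powers the VM on.
    return descriptors[0]
```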

The low-level features of our architecture are detailed in the diagram to the right. The diagram shows four nodes, each running a (potentially different) host OS. Each node runs a VMM and a VMManager Grid service. On top of that layer run the actual VMs, which are installed with Grid software, allowing them to be run as Grid nodes. The VMs could also be used as independent execution environments, without Grid middleware installed on them; instead they would run applications directly.
The graph to the right shows the proportion of time taken by the constituents of the deployment process, measured in seconds. The authorization time is not included, but it is comparable to registration time. The dominant factor in overall deployment time is the image transfer, which depends on network latency and bandwidth.
After a scientist has deployed a VM onto the
resource, he may run an application in it. For
this purpose, each of our VMs was configured with
the Globus Toolkit. This picture represents a
scientist running the TOPO program, creating an
image of a transmembrane protein.
How does using VMs help the Bioinformatics
community?
Do VMs fulfill their promise?
Issues or Problems Encountered
  • Broader base of resources: our tests show that this first promise is met. Consider the following situation: a scientist can use a testbed on the DOE Science Grid spanning several clusters. The scientist has access to 20 Solaris nodes at LBNL, 20 nodes in ANL's Jazz cluster (Linux nodes), and 20 Linux nodes on NERSC's PDSF cluster. If only the Jazz nodes have the necessary configuration to run EMBOSS, it would take a lot more work to get EMBOSS to run on the LBNL and PDSF clusters. If we install EMBOSS on a VM, and then run an instance of the VM on each node, we can use all 60 nodes instead of just 20.
  • Easier deployment/distribution: using VMs makes deployment easier and faster. In our tests we experimented with a 2 GB minimal VM image, with the following results:
  • EMBOSS installation: 45 minutes
  • VM deployment on our testbed: 6 minutes 23 seconds
  • Peace of mind (not having to debug the installation): priceless!
  • Fine-grained resource management: depending on the VMM implementation, a VM can provide fine-grained resource usage enforcement, which is critical in many Grid scenarios
  • Enhanced security: VMs offer enhanced isolation and are therefore a more secure solution for representing user environments.
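The amortization argument is easy to quantify with the two measured times above (45 minutes to install EMBOSS by hand, 6 minutes 23 seconds to deploy the pre-built image); the comparison itself is a sketch, not a measurement from the poster.

```python
# Measured times from the poster: installing EMBOSS by hand takes 45 min;
# deploying the pre-built 2 GB VM image takes 6 min 23 s per node.
INSTALL_S = 45 * 60      # 2700 s, paid once when the image is built
DEPLOY_S = 6 * 60 + 23   # 383 s per node

def total_time_s(nodes: int) -> tuple:
    """Return (hand-install on every node, build image once + deploy per node)."""
    return INSTALL_S * nodes, INSTALL_S + DEPLOY_S * nodes

by_hand, with_vms = total_time_s(60)  # the 60-node scenario above
# The VM approach wins whenever more than one node is needed, since each
# additional node costs 383 s instead of 2700 s.
```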

When developing the architecture we encountered several important and interesting issues that we had to resolve.

Clocks: While a VM image is frozen or powered off, the VM's clock does not update. We need a way to update a VM's clock as soon as it is powered on or unpaused.

IP addresses: We need a way to assign a unique IP address to each new VM instance (i.e. each time a VM is deployed) so that multiple copies of the same VM can be deployed on the same subnet.

Starting a Grid container: We also need a way to automatically start up a Grid container on startup of a VM if we want it to be a full-fledged Grid node, or at least to launch a User Hosting Environment (UHE).

We solved these issues by installing a daemon on the VM: upon deployment, it sets the IP address of the VM, launches a UHE and, if needed, updates the clock.
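The daemon's job on VM start-up can be sketched as follows. The poster does not give the daemon's interfaces, so every name and command below is a placeholder; the sketch only records the actions it would take rather than executing them.

```python
def configure_on_boot(assigned_ip: str, start_full_grid_node: bool) -> list:
    """Sketch of the per-instance start-up daemon described above.

    Returns the actions it would take as strings instead of running
    them; all commands are hypothetical placeholders.
    """
    actions = []
    # 1. Set the per-instance IP so multiple copies of the same image
    #    can coexist on one subnet.
    actions.append(f"set-ip eth0 {assigned_ip}")
    # 2. Resync the clock, which did not advance while the VM was
    #    frozen or powered off.
    actions.append("sync-clock <time-server>")  # placeholder server name
    # 3. Launch a Grid container for a full-fledged Grid node, or at
    #    least a User Hosting Environment (UHE).
    actions.append("start-grid-container" if start_full_grid_node
                   else "start-uhe")
    return actions
```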