Nomadic Grid Applications: The Cactus WORM - PowerPoint PPT Presentation

Provided by: davea76

Transcript and Presenter's Notes

Title: Nomadic Grid Applications: The Cactus WORM


1
Nomadic Grid Applications: The Cactus WORM
  • G. Lanfermann
  • Max Planck Institute for Gravitational Physics
  • Albert-Einstein-Institute, Golm
  • Dave Angulo
  • University of Chicago
  • Chicago, IL

2
Outline
  • The Worm - Migration on the Grid
  • Motivation
  • Design
  • The Worm: Adaptive Migration
  • Data Transfer using GridFTP
  • Resource Selection using MDS-2 and ClassAds
  • Contract Monitoring using MDS-2
  • Intelligent Migration using GRAM

3
  • This talk
  • http://people.cs.uchicago.edu/dangulo/grads/CactusGrADS-Aug7-GlobusRetreat.ppt
  • Other documents on GrADS in Cactus architecture
  • http://people.cs.uchicago.edu/dangulo/grads/arch/
  • http://www.cactuscode.org
  • Paper available in the back of the 2001 Globus
    Retreat book

4
Migration on the Grid
[Diagram labels: Payload; Migration; Requested / Available Resource; Resource Broker; Your Grid]

5
Large-Scale HPC Simulation: Daily Routine
  • The daily routine of doing large-scale
    numerical simulations:
  • Take an educated guess at memory requirements,
    number of processors, and disk space needed.
  • Start with the first parameter in a range of
    values to explore the behavior of your
    simulation.
  • Select a machine and submit to the queuing
    system. Wait.
  • Archive and analyze the data, make changes to the
    parameter file, resubmit to the queuing system.
    Wait.
  • For the large production run, increase the
    resolution of your experiment and take an
    educated guess at memory, ...
  • Select a big machine, submit to the queue. Wait
    3-7 days.
  • Archive data and checkpoint file, resubmit to the
    queue. Wait 3-7 days.
  • Archive data and checkpoint file, resubmit. Wait
    3-7 days.
  • Archive data and checkpoint file, resubmit. Wait
    3-7 days.
  • ...

6
Automating the Routine
  • Let the computer find out about the code's
    resource requirements.
  • Automatically contact appropriate machines, stage
    executables and submit to the queuing system.
  • Let the computer monitor the quality of the
    requested resources as the simulation progresses.
  • Perform multiple simulations over a range of
    parameters automatically and in parallel.
  • Archive the data and give the user a uniform
    access.
  • There is plenty of room to automate the way
    simulations are carried out today.

7
Cactus Grid
  • Cactus-based application thorns: the physics
    (initial data, evolution, analysis, etc.)
  • Grid-aware application thorns: drivers for
    contract management, dynamic resource detection,
    simulation relocation
  • Grid-enabled communication library: MPICH-G2
    implementation of MPI, which can run MPI programs
    across heterogeneous computing resources
  • Standard MPI
  • Single Proc

8
The Grid Layer Concept
  • Grid-enabled simulation
  • Application thorns: provide initial data,
    analysis, evolution
  • Grid-enabled computational framework
  • Grid thorns: provide migration, resource
    management
  • Cactus computational framework

9
Migrating Applications on the Grid
[Diagram labels: Resource Management; Application Information Server (AIS); Payload; Resource Selector Client; Worm Layer; Off-Site Data Server; Resource Broker; Migration Unit; Hibernation Storage]

10
The WORM at HPDC10
[Diagram labels: Migration Server; Information Server]

11
Current Architecture (Under Development)
[Diagram labels: Cactus Worm Server; GRAM; Worm Migration Module; Cactus Flesh; User-Supplied Application Payload; Data Transfer; Performance Degradation Detection; Migration Logic Manager; External Processes; Thorns Cactus Application Unit]

12
Migration of Checkpoint Files
  • Uses alpha version of GridFTP
  • Allows third-party transfer
  • Without this, two steps would be needed:
      • a GET to transfer files from the source to
        the Migrator
      • a PUT to transfer files from the Migrator to
        the destination
  • Uses GSI security
  • Allows use of a grid proxy with only a single
    sign-on while retaining tight security
  • Allows fast, efficient, reliable transfer
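
The contrast between a third-party transfer and the GET/PUT relay can be sketched as below. This is only an illustration: it assumes the `globus-url-copy` command-line client and `gsiftp://` URLs, and the slide does not say how the Worm actually drives the alpha GridFTP release. The host names, paths, and function names are hypothetical. The functions only build command lines; they do not run them.

```python
from typing import List


def gsiftp_url(host: str, path: str) -> str:
    """Build a gsiftp:// URL for an absolute path on a GridFTP server."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return f"gsiftp://{host}{path}"


def third_party_copy(src_host: str, src_path: str,
                     dst_host: str, dst_path: str) -> List[List[str]]:
    """One command: data flows server to server, not through the Migrator."""
    return [[
        "globus-url-copy",
        gsiftp_url(src_host, src_path),
        gsiftp_url(dst_host, dst_path),
    ]]


def relay_copy(src_host: str, src_path: str,
               dst_host: str, dst_path: str,
               scratch_path: str) -> List[List[str]]:
    """Two commands: GET to the Migrator's local disk, then PUT onward."""
    local = f"file://{scratch_path}"  # scratch_path is absolute, e.g. /tmp/ckpt
    return [
        ["globus-url-copy", gsiftp_url(src_host, src_path), local],
        ["globus-url-copy", local, gsiftp_url(dst_host, dst_path)],
    ]


if __name__ == "__main__":
    # Hypothetical hosts and checkpoint path, for illustration only.
    for cmd in third_party_copy("src.example.org", "/data/ckpt.h5",
                                "dst.example.org", "/scratch/ckpt.h5"):
        print(" ".join(cmd))
```

With third-party transfer the checkpoint crosses the network once; the relay moves it twice and needs scratch space on the Migrator.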

13
Resource Selector Architecture
[Diagram labels: (ClassAds) Resource Selection Client Thorn; Request in ClassAds format; Response in XML; UTk Project; NWS; Resource Selection Engine; GIIS; MDS-2; ClassAds library]

14
MDS-2 Future Plans
  • Resource selector goes to the GRIS directly after
    resources are discovered
  • Investigate strategies for managing update
    traffic
  • Would like persistent queries to support
    notification of changes in resource status

15
Resource Selection: Example Input (ClassAds format)
  • Type = "request"
  • Owner = "dangulo"
  • RequiredDomains = "cs.uiuc.edu", "ucsd.edu"
  • requirements = "other.opSys == LINUX
    && other.minMemSize > (100G / other.CPUCount)
    && Include(other.domains, RequiredDomains)"
  • Rank = other.minCPUSpeed * other.CPUCount /
    (other.maxCPULoad + 1)
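
How such a request filters and ranks candidates can be sketched in plain Python. This is not the ClassAds library: the `Candidate` type, its field names, and the sample values are hypothetical. It also assumes that the operators lost from the slide text are `==`, `>`, `&&`, `*`, and `+`, that "100G" means 100 GB, and that `Include` means the candidate's domain is one of the required domains.

```python
from dataclasses import dataclass
from typing import Iterable, List

REQUIRED_DOMAINS = ("cs.uiuc.edu", "ucsd.edu")
TOTAL_MEM_MB = 100 * 1024  # "100G" on the slide, read here as 100 GB


@dataclass
class Candidate:
    """A hypothetical resource set described by the attributes the ad uses."""
    name: str
    op_sys: str
    domain: str
    cpu_count: int
    min_mem_mb: float      # smallest per-node memory in the set
    min_cpu_speed: float   # slowest CPU in the set
    max_cpu_load: float    # highest CPU load in the set


def meets_requirements(c: Candidate) -> bool:
    """Mirror the three clauses of the requirements expression."""
    return (
        c.op_sys == "LINUX"
        and c.min_mem_mb > TOTAL_MEM_MB / c.cpu_count
        and c.domain in REQUIRED_DOMAINS
    )


def rank(c: Candidate) -> float:
    """Rank = minCPUSpeed * CPUCount / (maxCPULoad + 1)."""
    return c.min_cpu_speed * c.cpu_count / (c.max_cpu_load + 1)


def select(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep candidates that satisfy the requirements, best rank first."""
    matching = [c for c in candidates if meets_requirements(c)]
    return sorted(matching, key=rank, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("uiuc", "LINUX", "cs.uiuc.edu", 8, 16384, 1000, 1.0),
        Candidate("ucsd", "LINUX", "ucsd.edu", 16, 8192, 800, 0.0),
        Candidate("other", "SOLARIS", "ucsd.edu", 16, 8192, 800, 0.0),
    ]
    for c in select(pool):
        print(c.name, rank(c))
```

Dividing by `maxCPULoad + 1` penalizes loaded machines without dividing by zero on an idle one.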

16
Resource Selection: Example Output
  • <virtualMachine>
  • <result statusCode="200" statusMessage="OK"/>
  • <machineList>
  • <machine dns="amajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="bmajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="cmajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="dmajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="emajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="fmajor.cs.uiuc.edu" processor="1"/>
  • <machine dns="hmajor.cs.uiuc.edu" processor="1"/>
  • </machineList>
  • </virtualMachine>
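
A client could read a response of this shape with Python's standard `xml.etree.ElementTree` module, as sketched below. The sketch assumes the response is well-formed XML with self-closing `machine` elements and the attribute names shown on the slide; the `parse_selection` function and the shortened sample are illustrative, not part of the Resource Selector.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Shortened sample in the shape shown on the slide (assumed well-formed).
SAMPLE = """<virtualMachine>
  <result statusCode="200" statusMessage="OK"/>
  <machineList>
    <machine dns="amajor.cs.uiuc.edu" processor="1"/>
    <machine dns="bmajor.cs.uiuc.edu" processor="1"/>
  </machineList>
</virtualMachine>"""


def parse_selection(xml_text: str) -> List[Tuple[str, int]]:
    """Return (host name, processor count) pairs from a selector response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None or result.get("statusCode") != "200":
        message = None if result is None else result.get("statusMessage")
        raise ValueError(f"resource selection failed: {message}")
    return [
        (machine.get("dns"), int(machine.get("processor")))
        for machine in root.iter("machine")
    ]


if __name__ == "__main__":
    for host, processors in parse_selection(SAMPLE):
        print(host, processors)
```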

17
Performance Model
  • Working on putting the performance model into
    ClassAds
  • Every processor is assigned to compute XYZ/N
    grid points.
  • Requested memory > 16 (constant) + 512 ×
    10E-6 (constant) × (XYZ / N) (MB)
  • Time needed to perform an iteration = (computation
    time + communication time) × slowdown
  • 800 floating-point operations per grid point
    per iteration
  • Computation time = 800 (constant) × (XYZ / N) /
    cpuspeed
  • cpuspeed is in FLOPS
  • Communication time = (1/G) × 2 × (T1 + 2 × T2 × G × X × Y × R)
  • T1 is the communication latency between two
    processors.
      • latency from NWS
  • T2 is the transmit time for a word
      • T2 = 1 / (available bandwidth)
      • available bandwidth from NWS
  • Slowdown = 1 + cpuload
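
The model can be written out as a few functions. This is a sketch under stated assumptions: the transcript has lost the operators, so the code reads the formulas as memory = 16 + 512 × 10E-6 × (XYZ/N), iteration time = (computation + communication) × slowdown, communication = (1/G) × 2 × (T1 + 2 × T2 × G × X × Y × R), and slowdown = 1 + cpuload. G and R are not defined on the slide and are passed through as plain parameters. All function and parameter names are hypothetical.

```python
MEM_BASE_MB = 16         # constant from the slide
MEM_PER_POINT = 512      # constant from the slide
MEM_SCALE = 10e-6        # written "10E-6" on the slide; kept literally
FLOPS_PER_POINT = 800    # floating-point operations per grid point


def points_per_processor(x: int, y: int, z: int, n: int) -> float:
    """Each of the n processors computes XYZ/N grid points."""
    return x * y * z / n


def requested_memory_mb(x: int, y: int, z: int, n: int) -> float:
    """Lower bound on memory per processor, in MB (assumed operators)."""
    return MEM_BASE_MB + MEM_PER_POINT * MEM_SCALE * points_per_processor(x, y, z, n)


def computation_time(x: int, y: int, z: int, n: int, cpuspeed: float) -> float:
    """Seconds per iteration of pure computation; cpuspeed is in FLOPS."""
    return FLOPS_PER_POINT * points_per_processor(x, y, z, n) / cpuspeed


def communication_time(g: float, x: int, y: int, r: float,
                       latency: float, bandwidth: float) -> float:
    """Assumed reading of the slide; latency (T1) and bandwidth come from NWS."""
    t2 = 1.0 / bandwidth  # transmit time for one word
    return (1.0 / g) * 2 * (latency + 2 * t2 * g * x * y * r)


def iteration_time(comp: float, comm: float, cpuload: float) -> float:
    """(computation + communication) scaled by slowdown = 1 + cpuload."""
    return (comp + comm) * (1 + cpuload)


if __name__ == "__main__":
    comp = computation_time(100, 100, 100, 10, cpuspeed=1e9)
    comm = communication_time(1, 100, 100, 1, latency=1e-3, bandwidth=1e7)
    print(requested_memory_mb(100, 100, 100, 10), iteration_time(comp, comm, 0.5))
```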

18
Contract Monitor
  • Driven by three user-controllable parameters:
      • Time quantum over which time per iteration is
        averaged
      • Percentage degradation in time per iteration
        (relative to prior average) before noting a
        violation
      • Number of violations before migration
  • Potential causes of violation:
      • Competing load on CPU
      • Computation requires more processing power,
        e.g., mesh refinement or a new subcomputation
      • Hardware problems

19
Contract Monitor Details
  • The end user specifies several variables.
  • These variables can be changed at runtime by
    contacting the application through its HTTP
    interface.
  • These variables include:
      • time quantum
      • degradation
      • number of violations before migration
  • The system then calculates the average wall-clock
    time per iteration for each time quantum.
  • If the average iteration time in any time quantum
    is worse (by the specified percentage) than the
    average over all previous quanta, a violation
    is noted.
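
The violation rule above can be sketched as a small class. This is an illustration, not the Worm's code: the class and method names are hypothetical, each quantum is summarized by a single average time per iteration, the baseline is taken as the mean of all earlier quantum averages, and migration is assumed to be requested once the violation count exceeds the configured limit.

```python
from typing import List


class ContractMonitor:
    """Track per-quantum average iteration times and count violations."""

    def __init__(self, degradation_pct: float, max_violations: int) -> None:
        self.degradation_pct = degradation_pct  # allowed slowdown, in percent
        self.max_violations = max_violations    # tolerated before migrating
        self.history: List[float] = []          # averages of earlier quanta
        self.violations = 0

    def record_quantum(self, avg_iter_time: float) -> bool:
        """Record one quantum; return True if it violates the contract."""
        violated = False
        if self.history:
            baseline = sum(self.history) / len(self.history)
            limit = baseline * (1 + self.degradation_pct / 100)
            violated = avg_iter_time > limit
        if violated:
            self.violations += 1
        self.history.append(avg_iter_time)
        return violated

    @property
    def should_migrate(self) -> bool:
        """True once more than max_violations violations have been noted."""
        return self.violations > self.max_violations


if __name__ == "__main__":
    monitor = ContractMonitor(degradation_pct=20, max_violations=2)
    for t in [1.0, 1.0, 1.5, 1.5, 2.0]:
        print(t, monitor.record_quantum(t), monitor.should_migrate)
```

Because the three parameters are ordinary attributes, they could be changed while the simulation runs, which is what the HTTP interface provides.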

20
Actions Taken on Contract Violation
  • Triggered when more than the specified number of
    violations have been noted
  • Requests a new set of resources from the
    ResourceSelector
  • Checkpoints the application
  • Moves checkpoint data to the new resources, along
    with other data needed for restart
  • Restarts the application on the new resources
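
The order of these steps can be sketched as one function that takes each action as a callable. The four callables and their signatures are hypothetical stand-ins for the Resource Selector, the Cactus checkpoint routine, the file transfer, and the restart; only the ordering comes from the slide.

```python
from typing import Callable, List


def migrate(
    select_resources: Callable[[], List[str]],
    write_checkpoint: Callable[[], str],
    move_files: Callable[[str, List[str]], None],
    restart: Callable[[str, List[str]], None],
) -> List[str]:
    """Run the migration steps in order and return the new machine list."""
    machines = select_resources()        # new resources from the selector
    if not machines:
        raise RuntimeError("no suitable resources found; staying put")
    checkpoint = write_checkpoint()      # checkpoint the running application
    move_files(checkpoint, machines)     # checkpoint plus other restart data
    restart(checkpoint, machines)        # restart on the new resources
    return machines


if __name__ == "__main__":
    # Stub actions that only print, to show the call order.
    migrate(
        lambda: ["hostA.example.org"],
        lambda: "ckpt.h5",
        lambda ckpt, hosts: print("move", ckpt, hosts),
        lambda ckpt, hosts: print("restart", ckpt, hosts),
    )
```

Asking for resources before writing the checkpoint avoids checkpointing when there is nowhere to go.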

21
Migration Manager
  • Allows resource selection (RS) to occur
    asynchronously
  • Makes an intelligent choice on whether migration
    will actually help
  • Will not migrate to seemingly lower-quality
    resources
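
One way to decide whether migration will actually help is to compare the predicted time to finish in place with the predicted time after moving. The slide does not give the Migration Manager's actual criterion, so the rule below, the function name, and all of its parameters are assumptions made for illustration.

```python
def migration_helps(remaining_iters: int,
                    current_iter_time: float,
                    candidate_iter_time: float,
                    migration_overhead: float) -> bool:
    """Return True only if moving is predicted to finish sooner.

    Times are in seconds; migration_overhead covers checkpointing,
    file transfer, and restart.
    """
    # Never move to resources that look no better than the current ones.
    if candidate_iter_time >= current_iter_time:
        return False
    time_if_staying = remaining_iters * current_iter_time
    time_if_moving = migration_overhead + remaining_iters * candidate_iter_time
    return time_if_moving < time_if_staying


if __name__ == "__main__":
    print(migration_helps(1000, 1.0, 0.5, 100.0))  # long run, fast target
    print(migration_helps(100, 1.0, 0.9, 60.0))    # overhead outweighs gain
```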

22
Summary
  • The Worm lets researchers in physics and
    computational science adapt easily to changing
    grid environments
  • Data Transfer using GridFTP
  • Resource Selection using MDS-2 and ClassAds
  • Contract Monitoring using MDS-2
  • Intelligent Migration using GRAM