Managing Biomolecular Simulations in a Grid Environment with NAMD-G

1
Managing Biomolecular Simulations in a Grid
Environment with NAMD-G
  • Michelle Gower, Jordi Cohen,
  • James C. Phillips, Rick Kufrin, Klaus Schulten
  • University of Illinois at Urbana-Champaign
  • National Center for Supercomputing Applications
  • Theoretical and Computational Biophysics Group,
    NIH Resource for Macromolecular Modeling and
    Bioinformatics

2
Overview
  • Scientific Motivation: Hydrogenase O2 Problem
  • Introduction to NAMD-G software
  • Underlying Grid Middleware
  • Technical Challenges/Lessons Learned
  • NAMD-G Accomplishments
  • Future Work
  • Closing Remarks

3
The Hydrogenase O2 Problem
  • O2 permanently deactivates hydrogenase.
  • Can we engineer O2 tolerance?
  • We don't know how the O2 gets to the active
    site.
  • This led to the development of a method to study
    gas migration pathways in proteins.

Image created with VMD (http://www.ks.uiuc.edu/Research/vmd)

4
Gas Migration Pathways
  • Opportunity to study gas migration pathways in
    other proteins.
  • This means running many biomolecular
    simulations.
  • Managing these simulations becomes a problem.

Sperm Whale Myoglobin: O2 Accessibility
Image created with VMD (http://www.ks.uiuc.edu/Research/vmd)

5
NAMD
  • Highly-scalable, high-performance molecular
    dynamics code for large biomolecular simulations
    (typically 8-512 processors)
  • Developed by the Theoretical and Computational
    Biophysics Group (TCBG, University of Illinois at
    Urbana-Champaign)
  • NAMD can be told to output restart files.

6
Computation
  • A simple simulation consists of the following
    sequence of NAMD runs:
    • 2 pre-equilibration runs
    • An equilibration run (1 ns)
    • A production run (6 ns)
  • A scientist might also want to continue
    simulations for more timesteps or restart
    simulations from interesting points with
    different parameters.

7
Typical Tasks for a Run
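1. Store input files on MSS (local workstation)
2. Submit remote batch job (local workstation)
3. Retrieve input and restart files (remote HPC machine)
4. Execute NAMD (remote HPC machine)
5. Store output and restart files (remote HPC machine)
6. Retrieve restart and output files (local workstation)
Diagram: Local Workstation, Mass Storage System, Remote HPC Machine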
  • Images of remote HPC machine (Mercury) and mass
    storage system (UniTree) courtesy of the National
    Center for Supercomputing Applications (NCSA) and
    the Board of Trustees of the University of
    Illinois

8
NAMD-G: Nanny for NAMD
  • NAMD-G is a grid-based automation engine for
    biomolecular simulations.
  • Given input files and a description of the
    simulation, NAMD-G submits remote batch jobs to
    the specified remote system, handling the
    transfers of input, output, and restart files.
  • If a job dies due to hitting the wallclock limit,
    NAMD-G automatically submits another job until
    the run is complete.
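
The resubmission behaviour described above can be sketched as a small loop. This is only an illustration, not NAMD-G's actual code: `RunState`, `submit_batch_job`, `run_until_complete`, and the step counts are hypothetical names and values, and the "batch job" is a stand-in that simply advances the step counter up to an assumed wallclock budget.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    """Progress of one NAMD run, as recorded by its latest restart file."""
    total_steps: int           # timesteps the run must reach
    completed_steps: int = 0   # timesteps saved in the latest restart file
    jobs_submitted: int = 0    # batch jobs used so far


def submit_batch_job(state: RunState, steps_per_job: int) -> None:
    """Stand-in for one remote batch job that dies at the wallclock limit."""
    state.jobs_submitted += 1
    state.completed_steps = min(state.total_steps,
                                state.completed_steps + steps_per_job)


def run_until_complete(state: RunState, steps_per_job: int,
                       max_jobs: int = 100) -> RunState:
    """Resubmit from the latest restart point until the run is complete."""
    if steps_per_job <= 0:
        raise ValueError("steps_per_job must be positive")
    while state.completed_steps < state.total_steps:
        if state.jobs_submitted >= max_jobs:
            raise RuntimeError("giving up: too many resubmissions")
        submit_batch_job(state, steps_per_job)
    return state


if __name__ == "__main__":
    final = run_until_complete(RunState(total_steps=1_000_000),
                               steps_per_job=300_000)
    print(final.jobs_submitted, final.completed_steps)
```

The `max_jobs` guard is an added safety choice for the sketch, so that a run which never makes progress cannot resubmit forever.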

9
NAMD-G Commands
  • NAMD-G is a set of scripts with specific
    knowledge of NAMD wrapped around existing generic
    grid middleware.
  • Submit a simulation: ngsubmit RUNFILE
  • Monitor a simulation: ngstat
  • Delete a simulation: ngdel ID
  • Restart a simulation: ngrestart

10
Pre-defined Runs
  • There are pre-defined runs that will
    automatically work for any system.
  • This greatly reduces the learning curve for
    someone to start using NAMD-G.
  • They are very modular so the scientist can pick
    which ones they want to use.

11
Underlying Grid Middleware: Authentication
  • Globus Toolkit
    • An open source set of software developed by
      the Globus Alliance that can be used to build
      Grid applications.
  • Globus Toolkit Security Component
    • GSI-Authentication
    • Uses proxy certificates instead of passwords or
      ssh keys

12
Underlying Grid Middleware: Data Transfers
  • uberFTP
    • GridFTP-enabled interactive client
    • Developed by NCSA
  • Globus Toolkit
    • Pre-WS GridFTP Services

Diagram: GridFTP Server (Mass Storage), GridFTP Server (Remote HPC Machine)

13
Underlying Grid Middleware: Job Submission and
Monitoring
  • Condor
    • Management system developed by the Condor Team,
      led by Miron Livny, at the University of
      Wisconsin-Madison.
  • Condor: Condor-G
    • Uses Globus Toolkit behind the scenes to submit
      jobs to remote machines
  • Globus Toolkit: GRAM component
    • Pre-WS GRAM Service

Diagram: Gatekeeper, Batch Jobmanager, Batch System (all on the Remote HPC Machine)
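
To make the Condor-G to pre-WS GRAM path more concrete, the sketch below renders a Condor-G submit description as text. It is an illustration under assumptions: the keywords (`universe = grid`, `grid_resource = gt2 ...`, `globus_rsl`, `transfer_executable`) follow current HTCondor documentation and may differ from the Condor release the authors used, and the host name, jobmanager name, executable path, and file tag are hypothetical.

```python
def condor_g_submit(executable: str, gatekeeper: str, jobmanager: str,
                    rsl: str, tag: str) -> str:
    """Render a Condor-G submit description for a pre-WS GRAM (gt2) resource."""
    lines = [
        "universe = grid",
        # The gatekeeper host plus the batch jobmanager it should hand the job to.
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        # Assumption: the executable already exists on the remote machine.
        "transfer_executable = false",
        f"globus_rsl = {rsl}",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(condor_g_submit(
        executable="/path/to/run_namd.sh",   # hypothetical wrapper script
        gatekeeper="hpc.example.org",        # hypothetical gatekeeper host
        jobmanager="pbs",
        rsl="(count=8)(maxWallTime=60)",
        tag="equilibration",
    ))
```

The resulting text would be saved to a file and handed to `condor_submit`; Condor-G then contacts the gatekeeper, which passes the job to the batch jobmanager and on to the batch system.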
14
Underlying Grid Middleware: Workflow Management
  • Condor DAGMan
  • Allows the user to specify ordering of jobs
  • DAGMan keeps track of which jobs have been
    successfully completed. Upon failure, it writes
    a file allowing the user to easily restart it at
    the failed job.
  • DAGMan can be told to repeat a job.

Diagram: example workflow of Job A, Job B, Job C
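
The bookkeeping DAGMan performs (run jobs in dependency order, remember what succeeded, resume at the failed job) can be illustrated with a short sketch. This is not DAGMan's implementation: `run_dag` is a hypothetical helper, the A, B, C chain is an assumed ordering for the three jobs in the diagram, and the returned set of completed jobs merely plays the role of the file DAGMan writes on failure.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(parents, run_job, done=None):
    """Run jobs in dependency order.

    parents maps each job to the set of jobs that must finish before it.
    Returns (completed_jobs, failed_job); failed_job is None on success.
    """
    completed = set(done or ())
    for job in TopologicalSorter(parents).static_order():
        if job in completed:
            continue  # finished in an earlier attempt, do not rerun
        if not run_job(job):
            return completed, job  # stop here; caller can resume later
        completed.add(job)
    return completed, None


if __name__ == "__main__":
    chain = {"B": {"A"}, "C": {"B"}}  # assumed ordering: A, then B, then C
    attempts = []

    def flaky(job):
        """Job B fails the first time it is tried, then succeeds."""
        attempts.append(job)
        return not (job == "B" and attempts.count("B") == 1)

    done, failed = run_dag(chain, flaky)        # stops at B
    done, failed = run_dag(chain, flaky, done)  # resumes at B, then runs C
    print(attempts, failed)
```

Passing the completed set back in is what lets the second call skip Job A and restart at the failed job.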
15
Underlying Grid Middleware: DAGMan - Single Run
  • Pre
    • Copy internal files to remote machine
  • NAMD job
    • Retrieve input and restart files
    • Run NAMD
  • Post
    • Transfer output files to MSS
    • Transfer output files to local machine
    • Check whether run has completed
    • Notify user via email
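
One way the Pre / NAMD job / Post structure above could be expressed for DAGMan is a single node with PRE and POST scripts and a retry count. The sketch below only renders such a DAG input file; how NAMD-G actually arranges the repetition is not stated in the slides. The assumption here is that the POST script exits non-zero while the run is incomplete, so that `RETRY` reruns the node. The node name, file names, and retry count are hypothetical.

```python
def single_run_dag(node: str, submit_file: str, pre_script: str,
                   post_script: str, retries: int) -> str:
    """Render a one-node DAGMan input file with PRE/POST scripts and RETRY."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    lines = [
        f"JOB {node} {submit_file}",
        # PRE: e.g. copy helper files to the remote machine.
        f"SCRIPT PRE {node} {pre_script}",
        # POST: e.g. transfer output, check completion; $RETURN is the job's exit code.
        f"SCRIPT POST {node} {post_script} $RETURN",
        # Assumed repetition mechanism: rerun the node if POST reports "not done".
        f"RETRY {node} {retries}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(single_run_dag("production", "namd.sub", "pre.sh", "post.sh", 20))
```

The output would be submitted with `condor_submit_dag`; the retry count bounds how many batch jobs a single run may consume.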

16
Underlying Grid Middleware: Authentication, Part
Two
  • Globus Toolkit: MyProxy
  • Open source project started by NCSA to provide an
    online credential repository
  • Condor-G can automatically renew a proxy using
    MyProxy

17
Grid Middleware Summary
  • Local Workstation
    • Globus Toolkit (no services)
    • Condor
    • uberFTP
  • Remote HPC Machine
    • Globus Toolkit Pre-WS services: GridFTP, GRAM
    • uberFTP
  • Mass Storage System
    • Globus Toolkit Pre-WS services: GridFTP

18
Technical Challenges/Lessons Learned
  • Local machines were behind firewalls with minimal
    open incoming ports
    • Could not use built-in file staging; had to write
      code to push input files and pull output files
  • Messages from the remote batch submission command
    are not sent back to the local machine.
  • Discovered jobmanager problems
    • Some Globus installations did not correctly use
      the single jobtype with a processor count greater
      than 1
  • Difficulty distinguishing jobs on remote machine
    • Currently cannot set batch jobname through Globus
    • Currently cannot get batch jobid through Globus

19
Technical Challenges cont.
  • NAMD-G portability issues
    • Shell script portability was an issue
    • Different RSL has to be created depending upon
      remote machine
  • Not all HPC machines have:
    • remote GridFTP access to home and scratch
      directories
    • a fork jobmanager
    • access to external MSS from compute nodes
    • uberFTP installed
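
The need for machine-specific RSL can be illustrated with a small lookup table. This is a sketch under assumptions, not NAMD-G's code: the machine names, their settings, and the `build_rsl` helper are hypothetical, while `count`, `maxWallTime`, `jobType`, and `queue` are standard pre-WS GRAM RSL attributes.

```python
# Hypothetical per-machine settings; real values depend on each site's setup.
MACHINES = {
    "cluster-a": {"jobType": "single", "queue": "normal"},
    "cluster-b": {"jobType": "mpi"},
}


def build_rsl(machine: str, processors: int, walltime_minutes: int) -> str:
    """Build a GRAM RSL string, adding the attributes a given machine needs."""
    if machine not in MACHINES:
        raise KeyError(f"no RSL profile for machine {machine!r}")
    attributes = {
        "count": processors,
        "maxWallTime": walltime_minutes,
        **MACHINES[machine],  # machine-specific attributes come last
    }
    return "".join(f"({name}={value})" for name, value in attributes.items())


if __name__ == "__main__":
    for name in MACHINES:
        print(name, build_rsl(name, processors=8, walltime_minutes=60))
```

Keeping the per-machine differences in one table means that supporting a new remote system is a data change and not a new code path.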

20
NAMD-G Accomplishments
  • NAMD-G developed hand-in-hand with a pilot
    science project
  • Completed projects on gas conduction
    • Comparative O2 pathways in 15 globins, from
      plants to insects to mammals.
    • O2 pathways in two high-profile proteins:
      hydrogenase and copper amine oxidase.
  • Ongoing simulation of ribosome

Figures: hydrogenase; soy leghemoglobin
Cohen, et al., Biophys. J. 91 (Sept. 2006)
"NAMD-G saves time, especially time spent on
mindless, boring, and error-prone tasks. With
a minimal initial investment, NAMD-G makes
simulations even more convenient than I dared
hope to initially." - Dr. Emma Falck, Beckman
Fellow

21
Future Work
  • Allow simulations to be easily continued for more
    timesteps
  • Allow simulations to be easily branched from
    other simulations
  • Create NAMD-G configuration files at the system
    and user levels.

22
Closing Remarks
  • Using existing grid middleware allowed for rapid
    development of a functional system.
  • NAMD-G is a perfect example of what can be
    accomplished with tight collaboration where both
    groups provide ideas, design and implementation
    principles.

23
Acknowledgements
  • This work was supported in part by the National
    Science Foundation grants SCI-0451538,
    SCI-0504064, and SCI-0438712. Funding for the
    Resource for Macromolecular Modeling and
    Bioinformatics is provided by the National
    Institutes of Health grant NIH P41 RR05969

24
Links
  • NAMD - www.ks.uiuc.edu/Research/namd
  • Globus Toolkit - www.globus.org/toolkit
  • Condor - www.cs.wisc.edu/Condor
  • UberFTP - dims.ncsa.uiuc.edu/set/uberftp
  • MyProxy - grid.ncsa.uiuc.edu/myproxy
  • NAMD-G - www.ks.uiuc.edu/Research/namdg

25
Underlying Grid Middleware
  • Authentication
    • Globus Toolkit Security Component
    • MyProxy
  • Job Submission and Monitoring
    • Condor: Condor-G
    • Globus Toolkit Pre-WS GRAM Services
  • Workflow Management
    • Condor DAGMan
  • Data Transfer
    • UberFTP
    • Globus Toolkit Pre-WS GridFTP Service