Title: Using HTC grid infrastructures: practical experiences from the eMinerals project
1 Using HTC grid infrastructures: practical experiences from the eMinerals project
Mark Calleja (proxy for Martin Dove), University of Cambridge
2 Our view of eScience
Computing grids
Collaborative grids
Data grids
3 Science beyond the lab book
- Management of too many tasks
- Management of the resultant data deluge
- Sharing the information content with collaborators
- Maintaining accuracy and verification
4 Rock-salt structure of BaCO3
Note disordered positions of oxygen atoms
5 BaCO3 lattice parameters
Molecular dynamics simulations on the NGS
6 Usable HTC grid tools
- Easy-to-use tools
- Easy access to resources and data
- Enabling me to achieve much more than before
Can I run my jobs before breakfast?
7 Useful tools for HTC grids
- Use standard tools and interfaces, e.g. Globus, Condor
- Heterogeneous resources for heterogeneous applications
- Metascheduling
- Integrated data grid
- Give as much control as possible to the user
- The key is in the user interface
8 Globus is used
a) to provide user authentication via digital certificates
b) as job submission middleware
Our data grid is based on the San Diego Storage
Resource Broker
The application server provides databases and
server capabilities for the SRB, metadata tools,
and job submission tool
Researcher
9 Job submission process
- Central role of the data grid for data staging and data archiving
- Desktop job submission
- Automatic metadata collection
- Wrapped up in our RMCS tool
10 Researcher
4. Job runs on grid compute resources
Application server
11 RMCS input file
Executable ossia2004
pathToExe /home/bob.eminerals/OSSIA2004
preferredMachineList lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType performance
numOfProcs 1
Output trans.out
Sdir /home/bob.eminerals/RMCSdemo
Sget
Sput
GetEnvMetadata true
RDesc Test sweep of temperature using ossia
RDatasetID 263
AgentXdefault trans.xml
AgentX Energy,trans.xmlPropertyList.Propertytitle'Energy'.value
AgentX OrderParameter,trans.xmlModule.Propertytitle'Order parameter'.value
AgentX HeatCapacity,trans.xmlModule.Propertytitle'Heat capacity'.value
AgentX Susceptibility,trans.xmlModule.Propertytitle'Susceptibility'.value
12 RMCS architecture
Client layer: shell tools, GUI
Server layer: API, database, job control
Grid resources for computing and data
13 RMCS shell interface
RMCS shell commands interact with the RMCS server via web services, which removes the need for complicated middleware installation and is firewall friendly.
Examples of commands:
- rmcs_submit: submit a job
- rmcs_status: how is the job doing?
- rmcs_cancel: kill the job
- rmcs_remove: remove from status listing
14 RMCS GUI interface
15 Parameter sweeps
We have Perl programs that
- implement bulk file upload to the SRB or other data grid
- generate a set of RMCS input files
- submit all the RMCS jobs
Bulk job creation and submission is a one-command
procedure
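The sweep workflow above can be sketched in a few lines. This is a minimal Python illustration rather than the project's own Perl programs: the template text, the `{temperature}` placeholder, the `sweep_NNN.rmcs` file names and the `generate_sweep` helper are all hypothetical, and the submission step is only marked by a comment because the arguments of the submission command are not given here.

```python
from pathlib import Path
import tempfile

# Hypothetical template: the directive names echo the RMCS input file slide,
# but the exact layout and the {temperature} placeholder are assumptions.
TEMPLATE = """\
Executable ossia2004
numOfProcs 1
Output trans.out
RDesc Test sweep of temperature using ossia, T={temperature}
"""


def generate_sweep(template, temperatures, out_dir):
    """Write one input file per temperature and return the paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, temperature in enumerate(temperatures):
        path = out_dir / f"sweep_{index:03d}.rmcs"
        path.write_text(template.format(temperature=temperature))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for input_file in generate_sweep(TEMPLATE, [0.05, 0.10, 0.15], tmp):
            # A real driver would pass each file to the submission tool here.
            print(input_file.name)
```

Keeping file generation separate from submission means the generated inputs can be inspected before any job is sent to the grid.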
16 Data and information
17 Data representation: XML
Chemical Markup Language
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4" xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
    <metadata name="Code version date" content="January 8, 2007, v2007.3"/>
    ...
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
      ...
    </parameterList>
  </module>
  ...
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
      ...
    </propertyList>
  </module>
</cml>
Capturing audit metadata
Capturing initial parameters
Capturing computed properties
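To illustrate why this representation makes parameters and properties easy to pull out, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the project's own extraction tooling: the `read_scalars` helper is hypothetical, and `SAMPLE` is a trimmed copy of the slide's document in which the `xsd:` and `cmlUnits:` prefixes are assumed reconstructions.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the example document.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Trimmed, assumed reconstruction of the example CML file.
SAMPLE = """<cml xmlns="http://www.xml-cml.org/schema">
  <module title="Initial System">
    <parameterList>
      <parameter name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization">
    <propertyList>
      <property title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>
"""


def read_scalars(root, item_tag, key_attr):
    """Map each item's key attribute to a (float value, units) pair."""
    result = {}
    for item in root.iterfind(f".//cml:{item_tag}", CML_NS):
        scalar = item.find("cml:scalar", CML_NS)
        if scalar is not None:
            result[item.get(key_attr)] = (float(scalar.text), scalar.get("units"))
    return result


root = ET.fromstring(SAMPLE)
parameters = read_scalars(root, "parameter", "name")
properties = read_scalars(root, "property", "title")
print(parameters)
print(properties)
```

Because every value carries its own name and units, the same helper works for input parameters and computed properties alike.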
18 XML and Fortran
- Most of our simulation codes are written in Fortran, which has little support for XML
- Thus we have written a set of XML libraries for Fortran, called FoX, to make writing XML easy
- We have XML-ised a number of simulation codes, including SIESTA, CASTEP, DL_POLY and GULP
- We have also developed an XML-aware interface to the SRB called TobysSRB
19 What XML gives us
- Simulation code output that is self-describing (no more mere lists of numbers!)
- Data files can be transformed to give user-centric and information-centric representations, including plotted data
- Key information is easy to extract, which is essential for large combinatorial studies
- Enables automatic capture of metadata, and metadata is essential for managing data
20 XML and metadata
- RMCS automatically harvests metadata from our output XML files
- We have developed a new set of tools to access the metadata database (RCommands)
- We use metadata for locating data and datasets created by our colleagues
- We also use metadata for extracting core information from data, which is useful for analysing combinatorial studies
21 RCommands and metadata
Metadata are associated with a hierarchy of studies, datasets and data objects, both as descriptions and as name/value pairs.
Examples of commands:
- Rls: list metadata items
- Rget: get metadata
- Rannotate: add metadata
- Rgem: extract metadata from all data objects within a dataset
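The study, dataset and data object hierarchy can be pictured with a small data model. The sketch below is an illustration only, not the RCommands implementation or its database schema: the class names, the `gather` helper and all the values are hypothetical. `gather` is loosely analogous to what Rgem is described as doing, collecting named metadata from every data object in a dataset.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    description: str = ""
    metadata: dict = field(default_factory=dict)  # name/value pairs


@dataclass
class Dataset:
    name: str
    description: str = ""
    metadata: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)


@dataclass
class Study:
    name: str
    description: str = ""
    metadata: dict = field(default_factory=dict)
    datasets: list = field(default_factory=list)


def gather(dataset, names):
    """Collect the named metadata values from every data object in a dataset."""
    return [
        {"object": obj.name, **{n: obj.metadata.get(n) for n in names}}
        for obj in dataset.objects
    ]


# Illustrative values only; they are not results from the project.
sweep = Dataset(
    name="temperature sweep",
    description="Hypothetical dataset",
    objects=[
        DataObject("run_000", metadata={"Temperature": 0.05, "Energy": 0.11}),
        DataObject("run_001", metadata={"Temperature": 0.10, "Energy": 0.21}),
    ],
)
study = Study(name="demo study", datasets=[sweep])

for row in gather(study.datasets[0], ["Temperature", "Energy"]):
    print(row)
```

Returning one row per data object gives a table of parameter against property, which is the shape needed for analysing a combinatorial study.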
22 Researcher A
Researcher B
23 Summary
- The eMinerals toolset empowers scientists in their use of HTC grid resources
- Tools work from our personal computers with easy installation
- Integrates compute, data and collaborative components
24 Credits
Cambridge: Kat Austen, Richard Bruin, Mark Calleja, Gen-Tau Chiang, Ian Frame, Peter Murray-Rust, Toby White, Andrew Walker
STFC: Kerstin Kleese van Dam, Phil Couch, Tom Mortimer-Jones, Rik Tyer
Bath: Corrine Arrouvel, Arnaud Marmier, Steve Parker
Funded by NERC