Title: eScience for materials research with intensive computational, data and collaborative requirements Ma
1. eScience for materials research with intensive computational, data and collaborative requirements
Martin Dove
University of Cambridge
2. Individual collaborators
Cambridge: Peter Murray-Rust, Emilio Artacho, Richard Bruin, Andrew Walker, Kat Austen, Toby White, Andrew Walkingshaw
CCLRC: Rik Tyer, Kerstin Kleese van Dam, Tom Mortimer-Jones, Ilian Todorov
Bath: Steve Parker, Arnaud Marmier, Corinne Arrouvel
Reading: Vassil Alexandrov, Gareth Lewis, Ismael Bhana
3. The origins of eScience
- Several original ideas:
  - Linking supercomputers to share large calculations
  - Using spare computer cycles to significantly increase the amount of useful computer time
  - Sharing resources leads to the idea of the virtual organisation
1998
4. Grids: pervasive and essential
We are used to transparent services being provided by a grid system
Computing is one area for which there is not yet transparent grid provision
5. eScience in the UK
"eScience refers to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Collaborative scientific enterprises will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists."
Sir John Taylor, Director General of the Research Councils, 1999-2003
6. Our view of eScience
- eScience refers to new science opportunities that may require distributed collaborations, and which are enabled by emerging internet technologies.
- These technologies include grid computing, distributed data management and collaborative tools.
- Many tools are still in the process of rapid development, and in some cases standards are not yet established.
7. Grid computing
Computing grids
Data grids
Collaborative grids
8. A scientist's anarchic view of eScience
- There are many valid perspectives, from the user to the provider of resources
- The eMinerals approach is to focus on scientists: their work, their data and their collaborations
- Tool development is tensioned against what the scientists use
- Virtual organisations may resolve a number of technical issues pragmatically
- Scientists get their hands dirty
9. User profile
- Our users only want portals/GUIs for specific tools, not for the working environment
- Users do not want their applications pre-wrapped as services: they want to have complete control over their applications, e.g. to add capability
- Users do not want a provider/consumer model that does not provide the freedom they need
10. The eMinerals project team
11. CCLRC connection
The collaboration with the eScience centre within
STFC may lead to our tools being deployed within
Diamond and ISIS
12. Monte Carlo simulations of cation ordering in layer silicates
Simulations of cation ordering in layer silicates, based on a parameterised Hamiltonian
An example of a parametric study, which suits grid computing environments well
13. Examples of results
Each temperature from a different grid resource
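The parametric sweep described here can be sketched as follows. This is only an illustration under stated assumptions, not the project's simulation code: a small 2D Ising model with Metropolis sampling stands in for the parameterised cation-ordering Hamiltonian, and the function names (`run_temperature_point`, `sweep`), the coupling, the lattice size and the run lengths are all hypothetical. The point it tries to show is that each temperature is an independent job, so each could be sent to a different grid resource.

```python
import math
import random


def run_temperature_point(temperature, size=8, sweeps=200, coupling=1.0, seed=0):
    """Metropolis Monte Carlo for a 2D Ising model at one temperature.

    Stand-in for a real ordering Hamiltonian (an assumption of this sketch).
    Returns mean energy per site and mean |order parameter| over the
    second half of the run.
    """
    rng = random.Random(seed)
    # Start fully ordered so low-temperature runs need little equilibration.
    spins = [[1] * size for _ in range(size)]

    def neighbour_sum(i, j):
        return (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])

    def total_energy():
        # Count each bond once (right and down neighbours only).
        energy = 0.0
        for i in range(size):
            for j in range(size):
                energy -= coupling * spins[i][j] * (
                    spins[(i + 1) % size][j] + spins[i][(j + 1) % size])
        return energy

    n_sites = size * size
    energies, orders = [], []
    for sweep_index in range(sweeps):
        for _ in range(n_sites):
            i, j = rng.randrange(size), rng.randrange(size)
            # Energy change if spin (i, j) is flipped.
            delta = 2.0 * coupling * spins[i][j] * neighbour_sum(i, j)
            if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
                spins[i][j] = -spins[i][j]
        if sweep_index >= sweeps // 2:
            energies.append(total_energy() / n_sites)
            orders.append(abs(sum(map(sum, spins))) / n_sites)
    return {"temperature": temperature,
            "energy": sum(energies) / len(energies),
            "order_parameter": sum(orders) / len(orders)}


def sweep(temperatures):
    # Each call is independent of the others; on a grid, each would be one job.
    return [run_temperature_point(t, seed=k) for k, t in enumerate(temperatures)]


if __name__ == "__main__":
    for row in sweep([0.5, 1.5, 2.5, 3.5, 4.5]):
        print(row)
```

Because the points share no state, the loop in `sweep` could be replaced by one job submission per temperature without changing the per-point code.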
14. Amorphous silica, SiO2
- Amorphous silica is composed of SiO4 tetrahedra
- Tetrahedra are linked at their corners
- The amorphous silica configurations we work with
have perfect connectivity
15. Compressibility maximum in amorphous silica
- Density is not quite linear in pressure: note that the gradient is larger in the middle of the plot than at either end.
- Bulk modulus (BM)
- BM has a minimum around 2 GPa, so the compressibility, 1/BM, has a maximum
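The link between the volume curve and the compressibility can be made concrete with the standard definition of the bulk modulus, B = -V dP/dV, with the compressibility given by 1/B. The sketch below estimates B by central differences from a table of (pressure, volume) points. The function names are hypothetical and the data are synthetic (an exponential equation of state chosen because its bulk modulus is constant), not results from the silica simulations.

```python
import math


def bulk_modulus(pressures, volumes):
    """Estimate B = -V dP/dV at interior points by central differences.

    pressures must be strictly increasing; returns a list of (P, B) pairs.
    """
    if len(pressures) != len(volumes) or len(pressures) < 3:
        raise ValueError("need at least three matching (P, V) points")
    result = []
    for k in range(1, len(pressures) - 1):
        dp = pressures[k + 1] - pressures[k - 1]
        dv = volumes[k + 1] - volumes[k - 1]
        result.append((pressures[k], -volumes[k] * dp / dv))
    return result


def compressibility(pressures, volumes):
    """Compressibility 1/B at the same interior points."""
    return [(p, 1.0 / b) for p, b in bulk_modulus(pressures, volumes)]


def exponential_eos(pressures, b0, v0):
    """Synthetic test data: V = V0 exp(-P/B0), which has constant B = B0."""
    return [v0 * math.exp(-p / b0) for p in pressures]


if __name__ == "__main__":
    ps = [0.1 * k for k in range(51)]          # pressures in GPa (synthetic)
    vs = exponential_eos(ps, b0=40.0, v0=100.0)
    for p, b in bulk_modulus(ps, vs)[::10]:
        print(f"P = {p:.1f} GPa, B = {b:.3f} GPa")
```

Applied to a simulated volume curve, a minimum in the returned B values would show up as a maximum in the output of `compressibility`.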
16. Simulated volume curves
17. Compressibility curves
18. Flexibility histograms
We use our Geometric Algebra tool to measure rotational RUM flexibility
Higher positions of maxima indicate higher flexibility
Most flexible at intermediate pressures
19. Inverse modelling of neutron total scattering data
- Reverse Monte Carlo (RMC) modelling of total scattering data
- Total scattering gives information about pair distribution functions, and Bragg scattering gives information about single-particle distribution functions
- The RMC method generates atomic configurations that are consistent with the experimental data (total intensity, pair distribution function, Bragg scattering)
- Each simulation takes a long time ...
- ... and we may have a lot of simulations to run
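The RMC idea in the list above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the project's RMC code: atoms live in a periodic 1D box, the "experimental data" is a synthetic pair-distance histogram, and all function and parameter names are hypothetical. A random atom is moved, the fit to the data is recomputed as a chi-squared, and the move is accepted if the fit improves or, otherwise, with probability exp(-Δχ²/2).

```python
import math
import random


def pair_histogram(positions, box, n_bins):
    """Histogram of minimum-image pair distances in a periodic 1D box."""
    counts = [0] * n_bins
    half = box / 2.0
    n = len(positions)
    for a in range(n):
        for b in range(a + 1, n):
            d = abs(positions[a] - positions[b])
            d = min(d, box - d)
            counts[min(int(d / half * n_bins), n_bins - 1)] += 1
    return counts


def chi_squared(calculated, experimental, sigma):
    return sum((c - e) ** 2 for c, e in zip(calculated, experimental)) / sigma ** 2


def reverse_monte_carlo(experimental, n_atoms, box, steps,
                        sigma=1.0, max_move=0.5, seed=0):
    """Move atoms at random so the pair histogram approaches the data."""
    rng = random.Random(seed)
    n_bins = len(experimental)
    positions = [rng.uniform(0.0, box) for _ in range(n_atoms)]
    chi2 = chi_squared(pair_histogram(positions, box, n_bins), experimental, sigma)
    initial_chi2 = best_chi2 = chi2
    for _ in range(steps):
        atom = rng.randrange(n_atoms)
        old = positions[atom]
        positions[atom] = (old + rng.uniform(-max_move, max_move)) % box
        trial = chi_squared(pair_histogram(positions, box, n_bins),
                            experimental, sigma)
        # Always accept a better fit; accept a worse one with exp(-dchi2 / 2).
        if trial <= chi2 or rng.random() < math.exp(-(trial - chi2) / 2.0):
            chi2 = trial
            best_chi2 = min(best_chi2, chi2)
        else:
            positions[atom] = old  # reject: restore the previous position
    return {"positions": positions, "initial_chi2": initial_chi2,
            "final_chi2": chi2, "best_chi2": best_chi2}


if __name__ == "__main__":
    box, n_atoms, n_bins = 10.0, 10, 10
    # Synthetic "experiment": pair histogram of an evenly spaced chain.
    target = pair_histogram([k * box / n_atoms for k in range(n_atoms)],
                            box, n_bins)
    result = reverse_monte_carlo(target, n_atoms, box, steps=2000)
    print(result["initial_chi2"], result["final_chi2"])
```

Each run with a different seed yields a different configuration consistent with the same data, which is one reason many independent runs are needed.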
20. Data collected on GEM at ISIS
21. Orientational disorder in quartz
(JPCM 14, 4645, 2002)
22. Diffuse scattering in cristobalite
TEM measurement of diffuse scattering in the [001] zone
Diffuse scattering calculated from RMC
configurations
23. Phonons from total scattering?
Example of MgO
24. Thus we need many simulations
- Analysis from parametric sweeps in both experiments and simulations
- Need to generate many configurations for statistical accuracy and analysis
- Grid computing is excellent for this type of analysis ...
- ... but what about the data, and what about supporting collaboration?
25. eScience: science beyond the lab book
- Management of too many computing tasks
- Management of the resultant data deluge
- Sharing the information content with collaborators
eScience can help the human scientist cope,
including maintaining accuracy and accountability
26. Compute grids: some of the components
- Authentication, authorisation and job submission are handled by Globus
27. Our community grid
Researcher
28. Our user interface
Executable = ossia2004
pathToExe = /home/bob.eminerals/OSSIA2004
preferredMachineList = lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType = performance
numOfProcs = 1
Output = trans.out
Sdir = /home/bob.eminerals/RMCSdemo
Sget =
Sput =
GetEnvMetadata = true
RDesc = Test sweep of temperature using ossia
RDatasetID = 263
AgentX = Temperature,trans.xml:/ParameterList[title='Initial System']/Parameter[name='Temperature']
AgentX = Energy,trans.xml:/PropertyList[last]/Property[title='Energy']
AgentX = OrderParameter,trans.xml:/Module[last]/Property[title='Order parameter']
AgentX = HeatCapacity,trans.xml:/Module[last]/Property[title='Heat capacity']
AgentX = Susceptibility,trans.xml:/Module[last]/Property[title='Susceptibility']
29. Data grid: the San Diego Storage Resource Broker
Distributed file management
Distributed data vaults
30. SRB client tools
- Unix Scommands (e.g. Sput, Sls, Scd, Sget)
- Web interface
- GUI for MS Windows (InQ)
31. What the SRB has given us
- Our scientists now expect to be able to share their data with collaborators ...
- ... and they now expect this to be easy (i.e. not via a multi-stage process)
- Our scientists now routinely produce complete archives of files associated with a study, easily and automatically
- We now expect a single place to deposit data, and for this process to be easy and automatic
32. SRB: the good, the ambivalent and the not so good
- Good: it works; easy to use; easy to adapt to; easy to add new vaults; easy to set up and run (to a point)
- Ambivalent: gives something like a view of a file system, but not quite (two file spaces)
- Not so good: not well engineered; performance issues under high load; designed for permanent organisations rather than dynamic virtual organisations (ownership issues); need to manage points of failure
33. (Diagram labels: Researcher; 4. Job runs on grid compute resources; Application server)
34. Scientific collaboration
Classical molecular dynamics methods
Quantum mechanical methods
35. Data and information
?
36. Data and information sharing: XML data representation
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4" xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
    <metadata name="Code version date" content="January 8, 2007, v2007.3"/>
    ...
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
      ...
    </parameterList>
  </module>
  ...
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
      ...
    </propertyList>
  </module>
</cml>
Chemical Markup Language
Capturing audit metadata
Capturing initial parameters
Capturing computed properties
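To illustrate why this representation is convenient, the sketch below pulls the audit metadata, initial parameters and computed properties out of a CML document using Python's standard `xml.etree.ElementTree`. It is not one of the project's tools. The embedded sample is a trimmed copy of the file on this slide, written as well-formed XML on the assumption that the angle brackets, `=` and `:` were lost in extraction, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Trimmed sample based on the slide; punctuation restored by assumption.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4"
     xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
    <metadata name="Code version date" content="January 8, 2007, v2007.3"/>
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>
"""


def scalar_value(element):
    """Return (value, units) for the <scalar> child of a parameter or property."""
    scalar = element.find("cml:scalar", CML_NS)
    if scalar is None or scalar.text is None:
        return None, None
    text = scalar.text.strip()
    data_type = scalar.get("dataType", "")
    if data_type == "xsd:integer":
        value = int(text)
    elif data_type == "xsd:double":
        value = float(text)
    else:
        value = text
    return value, scalar.get("units")


def extract(cml_text):
    """Collect metadata, parameters and properties from a CML string."""
    root = ET.fromstring(cml_text.encode("utf-8"))
    return {
        "metadata": {m.get("name"): m.get("content")
                     for m in root.iterfind(".//cml:metadata", CML_NS)},
        "parameters": {p.get("name"): scalar_value(p)
                       for p in root.iterfind(".//cml:parameter", CML_NS)},
        "properties": {p.get("title"): scalar_value(p)
                       for p in root.iterfind(".//cml:property", CML_NS)},
    }


if __name__ == "__main__":
    for section, items in extract(SAMPLE).items():
        print(section, items)
```

Because values carry their units and data types, the same few lines work for any code that writes this kind of output, with no per-code text parsing.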
37. XML and Fortran
- Most of our simulation codes are written in Fortran, which has little support for XML
- Thus we have written a set of XML libraries for Fortran, called FoX, which makes writing XML easy
- We have XML-ised a number of simulation codes, including SIESTA, CASTEP, DL_POLY and GULP
- We have also developed an XML-aware interface to the SRB called TobysSRB
38. What XML gives us
- Simulation code output that is self-describing (no more mere lists of numbers!)
- XML files can be transformed to give user-centric and information-centric representations of data, including plotted data
- XML files can have key information extracted easily, essential for large combinatorial studies
- XML enables automatic capture of metadata, and metadata is essential for managing data
39. XML → metadata
- Our job submission tools automatically harvest metadata from our output XML files
- We have developed a new set of tools to access the metadata database (RCommands)
- We use metadata for locating data and datasets created by our colleagues
- We also use metadata for extracting core information from data, useful for analysing combinatorial studies
40. RCommands and metadata
Metadata are associated with a hierarchy of studies, datasets and data objects, both as descriptions and as name/value pairs.
Examples of commands:
- Rls list metadata items
- Rget get metadata
- Rannotate add metadata
- Rgem extract metadata from all data objects
within a dataset
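The hierarchy described on this slide could be modelled roughly as below. This is a sketch of the data model only, not the RCommands implementation or its interface: the class names, the `annotate` and `collect` helpers, and the example values are all hypothetical. `collect` loosely mirrors what the slide says Rgem does, namely gathering metadata from every data object within a dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    name: str
    description: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)  # name/value pairs


@dataclass
class Dataset:
    name: str
    description: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    description: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    datasets: List[Dataset] = field(default_factory=list)


def annotate(item, name, value):
    """Attach one name/value pair to a study, dataset or data object."""
    item.metadata[name] = value


def collect(dataset, names):
    """Tabulate the requested metadata items for every data object in a dataset.

    Missing items are returned as empty strings.
    """
    return [[obj.name] + [obj.metadata.get(n, "") for n in names]
            for obj in dataset.objects]


if __name__ == "__main__":
    study = Study(name="cation ordering", description="hypothetical example")
    sweep = Dataset(name="temperature sweep")
    study.datasets.append(sweep)
    for temperature in ("0.1", "0.2", "0.3"):   # illustrative values only
        obj = DataObject(name=f"run_T{temperature}")
        annotate(obj, "Temperature", temperature)
        sweep.objects.append(obj)
    for row in collect(sweep, ["Temperature", "Energy"]):
        print(row)
```

Tabulating a parameter against a computed property across a whole dataset is the kind of extraction that makes combinatorial studies manageable.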
41. Scientific collaboration
Classical molecular dynamics methods
Quantum mechanical methods
42. (Diagram labels: Researcher A; Researcher B)
43. Social networking sites, Web 2.0
These have the potential to revolutionise how scientists work
- MySpace, Facebook, etc.
- http://nature.network.com/
- http://www.scispace.net/
44. Summary
- Grid computing will radically change how we do simulation science
- Data grid methods and XML will change how we share information and data
- Web 2.0 technologies and social networking technologies will change how we collaborate
- I am happy to demonstrate some of this stuff until I catch the bus tomorrow