eScience for materials research with intensive computational, data and collaborative requirements Ma - PowerPoint PPT Presentation

Slides: 45
Provided by: marti89
Transcript and Presenter's Notes

1
eScience for materials research with intensive
computational, data and collaborative
requirements
Martin Dove
University of Cambridge
2
Individual collaborators
Cambridge: Peter Murray-Rust, Emilio Artacho,
Richard Bruin, Andrew Walker, Kat Austen, Toby
White, Andrew Walkingshaw
CCLRC: Rik Tyer, Kerstin Kleese van Dam, Tom
Mortimer-Jones, Ilian Todorov
Bath: Steve Parker, Arnaud Marmier, Corinne
Arrouvel
Reading: Vassil Alexandrov, Gareth Lewis, Ismael
Bhana
3
The origins of eScience
  • Several original ideas
  • Linking supercomputers to share large
    calculations
  • Using spare computer cycles to significantly
    increase the amount of useful computer time
  • Sharing resources leads to the idea of the
    virtual organisation

1998
4
Grids: pervasive and essential
We are used to provision of transparent services
provided by a grid system
Computing is one area for which there is not yet
a transparent grid provision
5
eScience in the UK
eScience refers to the large scale science that
will increasingly be carried out through
distributed global collaborations enabled by the
Internet. Collaborative scientific enterprises
will require access to very large data
collections, very large scale computing resources
and high performance visualisation back to the
individual user scientists.
Sir John Taylor, Director General of the Research
Councils, 1999-2003
6
Our view of eScience
  • eScience refers to new science opportunities that
    may require distributed collaborations, and which
    are enabled by emerging internet technologies.
  • These technologies include grid computing,
    distributed data management and collaborative
    tools.
  • Many tools are still in the process of rapid
    development, and in some cases standards are not
    yet established.

7
Grid computing
Computing grids
Data grids
Collaborative grids
8
A scientist's anarchic view of eScience
  • There are many valid perspectives, from the user
    to the provider of resources
  • The eMinerals approach is to focus on scientists:
    their work, their data and their collaborations
  • Tool development is tensioned against what the
    scientists use
  • Virtual organisations may resolve a number of
    technical issues pragmatically
  • Scientists get their hands dirty

9
User profile
  • Our users only want portals/GUIs for specific
    tools, not for the working environment
  • Users do not want their applications pre-wrapped
    as services; they want to have complete control
    over their applications, e.g. to add capability
  • Users do not want a provider/consumer model that
    does not provide the freedom they need

10
The eMinerals project team
11
CCLRC connection
The collaboration with the eScience centre within
STFC may lead to our tools being deployed within
Diamond and ISIS
12
Monte Carlo simulations of cation ordering in
layer silicates
Simulations of cation ordering in layer silicates
based on a parameterised Hamiltonian.
Example of a parametric study which suits grid
computing environments well.
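The parametric character of such a study can be sketched as follows. This is a toy illustration only, not the simulation code used in the project: it assumes a two-dimensional Ising-like model with a single exchange parameter standing in for the parameterised Hamiltonian, and the function names are invented for the example. The point is that every temperature is an independent task, so each one could be sent to a different grid resource.

```python
import math
import random


def run_temperature(temperature, size=8, sweeps=200, j_exchange=1.0, seed=0):
    """Metropolis Monte Carlo for a toy Ising-like ordering model.

    Returns the mean absolute order parameter at one temperature
    (temperature is in units of the exchange parameter).
    """
    rng = random.Random(seed)
    # Start from the fully ordered state of the toy model.
    spins = [[1] * size for _ in range(size)]
    samples = []
    for sweep in range(sweeps):
        for _ in range(size * size):
            i, j = rng.randrange(size), rng.randrange(size)
            neighbours = (
                spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
            )
            # Energy change for flipping site (i, j), with E = -J * sum(s_i * s_j).
            delta_e = 2.0 * j_exchange * spins[i][j] * neighbours
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        if sweep >= sweeps // 2:  # discard the first half as equilibration
            total = sum(sum(row) for row in spins)
            samples.append(abs(total) / (size * size))
    return sum(samples) / len(samples)


def temperature_sweep(temperatures):
    """Run one independent simulation per temperature."""
    return {t: run_temperature(t, seed=k) for k, t in enumerate(temperatures)}


if __name__ == "__main__":
    for t, order in temperature_sweep([1.0, 2.0, 3.0, 4.0]).items():
        print(f"T = {t:.1f}  order parameter = {order:.3f}")
```

Because `temperature_sweep` only collects independent results, the loop inside it could be replaced by any job-submission mechanism without changing the physics code.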
13
Examples of results
Each temperature from a different grid resource
14
Amorphous silica, SiO2
  • Amorphous silica is composed of SiO4 tetrahedra
  • Tetrahedra are linked at their corners
  • The amorphous silica configurations we work with
    have perfect connectivity

15
Compressibility maximum in amorphous silica
  • Density is not quite linear: note that the
    gradient is larger in the middle of the plot than
    at either end.
  • Bulk modulus (BM)
  • BM has a minimum around 2 GPa; the
    compressibility, 1/B, has a maximum
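The link between the density curve and the compressibility can be made explicit. The bulk modulus is B = -V dP/dV, which is the same as rho dP/drho, and the compressibility is 1/B. The sketch below estimates both from tabulated pressure and density values using central finite differences. The numbers are synthetic, chosen only so that the density rises fastest in the middle of the range; they are not the simulation results.

```python
def bulk_modulus(pressures, densities):
    """Estimate B = rho * dP/drho at interior points by central differences."""
    moduli = []
    for k in range(1, len(pressures) - 1):
        dp = pressures[k + 1] - pressures[k - 1]
        drho = densities[k + 1] - densities[k - 1]
        moduli.append(densities[k] * dp / drho)
    return moduli


def compressibility(pressures, densities):
    """Compressibility is the reciprocal of the bulk modulus."""
    return [1.0 / b for b in bulk_modulus(pressures, densities)]


if __name__ == "__main__":
    # Synthetic data (assumption): steepest density gradient mid-range.
    pressures = [0.0, 1.0, 2.0, 3.0, 4.0]        # GPa
    densities = [2.20, 2.26, 2.34, 2.41, 2.46]   # g/cm^3
    for p, b in zip(pressures[1:-1], bulk_modulus(pressures, densities)):
        print(f"P = {p:.1f} GPa  B = {b:.1f} GPa  1/B = {1.0 / b:.4f} per GPa")
```

With these synthetic values the smallest bulk modulus, and hence the largest compressibility, falls at the middle pressure, which is the behaviour the slide describes.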

16
Simulated volume curves
17
Compressibility curves
18
Flexibility histograms
We use our Geometric Algebra tool to measure
rotational RUM flexibility.
Higher positions of maxima indicate higher
flexibility.
Most flexible at intermediate pressures.
19
Inverse modelling of neutron total scattering data
  • Reverse Monte Carlo modelling of total scattering
    data
  • Total scattering gives information about pair
    distribution functions, and Bragg scattering
    gives information about single particle
    distribution functions
  • The RMC method generates atomic configurations
    that are consistent with the experimental data
    (total intensity, pair distribution function,
    Bragg scattering)
  • Each simulation takes a long time ...
  • ... and we may have a lot of simulations to run
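The acceptance rule at the heart of the method can be sketched as follows. This is a toy one-dimensional illustration, not the RMC code used for the work shown here: atoms on a periodic line are moved at random, a histogram of pair separations plays the role of the measured pair distribution function, and a move is accepted if it lowers chi-squared, or otherwise with probability exp(-delta/2). The function names, the bin layout and the value of sigma are all assumptions made for the example.

```python
import math
import random


def pair_histogram(positions, box, nbins):
    """Histogram of minimum-image pair separations on a periodic line."""
    counts = [0] * nbins
    width = (box / 2) / nbins
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            d = abs(positions[a] - positions[b])
            d = min(d, box - d)
            counts[min(int(d / width), nbins - 1)] += 1
    return counts


def chi_squared(model, target, sigma):
    """Sum of squared differences between model and target, scaled by sigma."""
    return sum((m - t) ** 2 for m, t in zip(model, target)) / sigma ** 2


def rmc_refine(positions, target, box, steps=2000, max_move=0.3, sigma=1.0, seed=1):
    """Move atoms at random, keeping moves by the RMC acceptance rule."""
    rng = random.Random(seed)
    positions = list(positions)
    nbins = len(target)
    chi2 = chi_squared(pair_histogram(positions, box, nbins), target, sigma)
    for _ in range(steps):
        k = rng.randrange(len(positions))
        old = positions[k]
        positions[k] = (old + rng.uniform(-max_move, max_move)) % box
        new_chi2 = chi_squared(pair_histogram(positions, box, nbins), target, sigma)
        delta = new_chi2 - chi2
        if delta <= 0 or rng.random() < math.exp(-delta / 2.0):
            chi2 = new_chi2      # accept the move
        else:
            positions[k] = old   # reject and restore the old position
    return positions, chi2


if __name__ == "__main__":
    box, natoms, nbins = 10.0, 10, 10
    # Evenly spaced atoms stand in for the "experimental" structure.
    reference = [i * box / natoms for i in range(natoms)]
    target = pair_histogram(reference, box, nbins)
    rng = random.Random(0)
    start = [rng.uniform(0, box) for _ in range(natoms)]
    start_chi2 = chi_squared(pair_histogram(start, box, nbins), target, 1.0)
    _, final_chi2 = rmc_refine(start, target, box)
    print(f"chi^2 at start: {start_chi2:.1f}, after refinement: {final_chi2:.1f}")
```

Recomputing the full histogram after every move is what makes each run slow even in this toy; production codes update only the affected pairs, but the run time still grows quickly with system size.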

20
Data collected on GEM at ISIS
21
Orientational disorder in quartz
(JPCM 14, 4645, 2002)
22
Diffuse scattering in cristobalite
TEM measurement of diffuse scattering in [001]
zone
Diffuse scattering calculated from RMC
configurations
23
Phonons from total scattering?
Example of MgO
24
Thus we have need for many simulations
  • Analysis from parametric sweeps in both
    experiments and simulations
  • Need to generate many configurations for
    statistical accuracy and analysis
  • Grid computing is excellent for this type of
    analysis ...
  • ... but what about the data, and what about
    supporting collaboration?

25
eScience: science beyond the lab book
  • Management of too many computing tasks
  • Management of the resultant data deluge
  • Sharing the information content with collaborators

eScience can help the human scientist cope,
including maintaining accuracy and accountability
26
Compute grids: some of the components
  • Authentication, authorisation and job
    submission, handled by Globus

27
Our community grid
Researcher
28
Our user interface
Executable = ossia2004
pathToExe = /home/bob.eminerals/OSSIA2004
preferredMachineList = lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType = performance
numOfProcs = 1
Output = trans.out
Sdir = /home/bob.eminerals/RMCSdemo
Sget =
Sput =
GetEnvMetadata = true
RDesc = Test sweep of temperature using ossia
RDatasetID = 263
AgentX = Temperature,trans.xml:/ParameterList[title='Initial System']/Parameter[name='Temperature']
AgentX = Energy,trans.xml:/PropertyList[last]/Property[title='Energy']
AgentX = OrderParameter,trans.xml:/Module[last]/Property[title='Order parameter']
AgentX = HeatCapacity,trans.xml:/Module[last]/Property[title='Heat capacity']
AgentX = Susceptibility,trans.xml:/Module[last]/Property[title='Susceptibility']
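To show how a job description of this kind might be read by a tool, here is a minimal sketch. It assumes a "keyword = value" layout with one directive per line, and assumes that repeated keywords such as AgentX are meant to accumulate; both are assumptions about the file format rather than a description of the project's own submission tool. The function name and the sample text are invented for the example.

```python
def parse_submit_description(text):
    """Parse 'keyword = value' lines into a dict of lists.

    Repeated keywords (such as several AgentX lines) accumulate in order.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first '=' only, so '=' inside a value is preserved.
        key, _, value = line.partition("=")
        directives.setdefault(key.strip(), []).append(value.strip())
    return directives


# Sample text (assumption): a shortened description in the assumed layout.
EXAMPLE = """
Executable = ossia2004
jobType = performance
numOfProcs = 1
Output = trans.out
RDatasetID = 263
AgentX = Energy,trans.xml:/PropertyList[last]/Property[title='Energy']
AgentX = OrderParameter,trans.xml:/Module[last]/Property[title='Order parameter']
"""

if __name__ == "__main__":
    parsed = parse_submit_description(EXAMPLE)
    print(parsed["Executable"])
    print(len(parsed["AgentX"]), "metadata extraction rules")
```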
29
Data grid: the San Diego Storage Resource Broker
Distributed file management
Distributed data vaults
30
SRB client tools
  • Unix Scommands (e.g. Sput, Sls, Scd, Sget)
  • Web interface
  • GUI for MS Windows (InQ)

31
What the SRB has given us
  • Our scientists now expect to be able to share
    their data with collaborators ...
  • ... and they now expect this to be easy (i.e.
    not via a multi-stage process)
  • Our scientists now routinely produce complete
    archives of files associated with a study,
    easily and automatically
  • We now expect a single place to deposit data,
    and for this process to be easy and automatic

32
SRB: the good, the ambivalent and the not so good
  • Good: it works; easy to use; easy to adapt to;
    easy to add new vaults; easy to set up and run
    (to a point)
  • Ambivalent: gives something like a view of a
    file system, but not quite (two file spaces)
  • Not so good: not well engineered; performance
    issues under high load; designed for permanent
    organisations rather than dynamic virtual
    organisations (ownership issues); need to manage
    points of failure

33
Researcher
4. Job runs on grid compute resources
Application server
34
Scientific collaboration
Classical molecular dynamics methods
Quantum mechanical methods
35
Data and information
?
36
Data and information sharing: XML data
representation
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4"
     xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
    <metadata name="Code version date" content="January 8, 2007, v2007.3"/>
    ...
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
      ...
    </parameterList>
  </module>
  ...
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
      ...
    </propertyList>
  </module>
</cml>
Chemical Markup Language
Capturing audit metadata
Capturing initial parameters
Capturing computed properties
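Because the output is self-describing, the three kinds of information named above can be pulled out with a generic XML parser. The sketch below uses Python's standard xml.etree.ElementTree module on a shortened copy of the CML document, with the elided "..." parts left out. The helper names are invented for the example, and this is only an illustration of the idea, not the project's own extraction tool.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Shortened copy of the CML output shown on the slide (elided parts omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4"
     xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="Code name" content="ossia"/>
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>
"""


def read_metadata(root):
    """Audit metadata: name/content pairs from the metadataList."""
    path = ".//cml:metadataList/cml:metadata"
    return {m.get("name"): m.get("content") for m in root.iterfind(path, CML_NS)}


def read_scalars(root, container, item, label_attr):
    """Map each labelled item to (value, units) taken from its <scalar> child."""
    values = {}
    for element in root.iterfind(f".//cml:{container}/cml:{item}", CML_NS):
        scalar = element.find("cml:scalar", CML_NS)
        if scalar is not None:
            values[element.get(label_attr)] = (float(scalar.text), scalar.get("units"))
    return values


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.encode("utf-8"))
    print(read_metadata(root))
    print(read_scalars(root, "parameterList", "parameter", "name"))
    print(read_scalars(root, "propertyList", "property", "title"))
```

Note that parameters are labelled by a name attribute and properties by a title attribute in this document, which is why the label attribute is passed in rather than fixed.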
37
XML and Fortran
  • Most of our simulation codes are written in
    Fortran, which has little support for XML
  • Thus we have written a set of XML libraries for
    Fortran, called FoX, which make writing XML
    easy
  • We have XML-ised a number of simulation codes,
    including SIESTA, CASTEP, DL_POLY and GULP
  • We have also developed an XML-aware interface to
    the SRB called TobysSRB

38
What XML gives us
  • Simulation code output that is self-describing
    (no more mere lists of numbers!)
  • XML files can be transformed to give user-centric
    and information-centric representations of data,
    including plotted data
  • XML files can have key information extracted
    easily, essential for large combinatorial studies
  • XML enables automatic capture of metadata, and
    metadata is essential for managing data

39
XML → metadata
  • Our job submission tools automatically harvest
    metadata from our output XML files
  • We have developed a new set of tools to access
    the metadata database (RCommands)
  • We use metadata for locating data and datasets
    created by our colleagues
  • We also use metadata for extracting core
    information from data, useful for analysing
    combinatorial studies

40
RCommands and metadata
Metadata are associated with a hierarchy of
studies, datasets and data objects, both as
descriptions and as name/value pairs. Examples of
commands:
  • Rls: list metadata items
  • Rget: get metadata
  • Rannotate: add metadata
  • Rgem: extract metadata from all data objects
    within a dataset
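The hierarchy described above can be pictured with a small in-memory model. This is a hypothetical sketch for illustration, not the RCommands implementation or its database schema: the class and function names are invented, and the values in the example are made up. The `annotate` and `gather` operations loosely mirror what the slide says Rannotate and Rgem do.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataNode:
    """A study, dataset or data object with a description and name/value pairs."""
    name: str
    description: str = ""
    pairs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def annotate(self, key, value):
        """Attach one name/value pair to this node."""
        self.pairs[key] = value

    def add(self, child):
        """Place a node one level down in the hierarchy and return it."""
        self.children.append(child)
        return child


def gather(dataset, key):
    """Collect one named value from every data object in a dataset."""
    return {obj.name: obj.pairs[key] for obj in dataset.children if key in obj.pairs}


if __name__ == "__main__":
    study = MetadataNode("cation ordering", "Monte Carlo study (example)")
    sweep = study.add(MetadataNode("temperature sweep", "Test sweep of temperature"))
    # Made-up values, only to show the shape of the data.
    for label, temperature in [("run1", 0.10), ("run2", 0.20)]:
        run = sweep.add(MetadataNode(label))
        run.annotate("Temperature", temperature)
    print(gather(sweep, "Temperature"))
```

Collecting one quantity across all data objects in a dataset is what makes the metadata useful for combinatorial studies: the table of results can be built without opening any output file.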

41
Scientific collaboration
Classical molecular dynamics methods
Quantum mechanical methods
42
Researcher A
Researcher B
43
Social networking sites, Web 2.0
These have the potential to revolutionise how
scientists work
  • MySpace, Facebook etc.
  • http://nature.network.com/
  • http://www.scispace.net/

44
Summary
  • Grid computing will radically change how we do
    simulation science
  • Data grid methods and XML will change how we
    share information and data
  • Web 2.0 technologies and social networking
    technologies will change how we collaborate
  • I am happy to demonstrate some of this stuff
    until I catch the bus tomorrow