Title: Collaboratory for Multi-scale Chemical Sciences (CMCS): New Informatics Capabilities for CHEMKIN Users
1 Collaboratory for Multi-scale Chemical Sciences (CMCS): New Informatics Capabilities for CHEMKIN Users
- David Leahy and Larry Rahn
- Sandia National Laboratories
- Combustion Symposium 2004
- University of Illinois at Chicago
- July 25, 2004
- http://cmcs.org
2 Outline
- Multi-Scale, Multi-Domain Science Challenges
- The Collaboratory for Multi-Scale Sciences (CMCS) - An adaptive informatics infrastructure
- Data and Metadata Services
- Examples
- Related projects
- Conclusions and Future Work
3 Combustion is a Multi-scale Chemical Science Challenge
- Science relies upon validation of information shared across physical scales
- New knowledge is assimilated from different data, tools, and disciplines at each scale
- Critical science lies at scale interfaces
- Impact through industrial application is mostly at larger scales
- Multi-scale scientific collaboration faces barriers
- Normal publication route is slow and excludes much important data
- Multi-scale information is complex, and its pedigree (associated metadata) matters
- New approaches to developing and sharing trustworthy data are needed
- Community resources are highly distributed
- Complexity of multi-scale science can lead to unnecessary duplication and impede investment
4 Challenge: Multi-scale science takes too long
Industrial researcher
Peer Reviewed Publication
Autoignition not predicted by chemical mechanism
NIST review: Publish in database
Evaluation: Need thermochemistry for new radical
New mechanism developed, validated
Read paper
Peer Reviewed Publication
Conference Presentation
Evaluation: Need computational data, collaborate with Quantum Chemist
Peer Reviewed Publication of new radical thermochemistry
Peer Reviewed Publication of computation
1 year
Time
5 Shared repository speeds multi-scale communication
Industrial researcher accesses new mechanism
Peer Reviewed Publication
NIST review, Publish in database
Annotation: autoignition not predicted by chemical mechanism
Notification: New mechanism developed, validated
Notification: Results in decision to develop new mechanism
Peer Reviewed Publication
Publicly Shared Data Repository
Parsers Translators Annotators
NIST Repository
Conference Presentation
Peer Reviewed Publication of new radical thermochemistry
Evaluation: Need computational data, collaborate with Quantum Chemist
Peer Reviewed Publication of computation
1 year
Time
6 Collaboratory for Multi-scale Chemical Science (CMCS)
- A collaboration of 8 national labs and universities
- Chemical scientists spanning the scales from electronic structure of molecules to simulations of reacting flow
- Computer and information scientists expert in emerging web-based technologies
- Funded by DOE/SC MICS office
- Part of the National Collaboratory Program
- Pilot project within DOE combustion research community
- In our third year, renewed through 2008
- Targets Chemical Science Community and BES SciDAC projects with much broader goals in the longer term
7 Multi-disciplinary CMCS Team
SNL - Larry Rahn, Christine Yang, Carmen Pancerella, David Leahy, Darrian Hale
PNL - Brett Didier, James D. Myers, Karen Schuchardt, Theresa Windus, Carina Lansing
ANL - Al Wagner, Branko Ruscic, Gregor von Laszewski, Reinhardt Pinzon, Kaizar Amin
LLNL - William Pitz
LANL - David Montoya, Rick Knight
NIST - Thomas C. Allison
MIT - William H. Green, Jr., Luwi Oluwole
UCB - Michael Frenklach
denotes Institutional Point of Contact
CMCS Development Partnerships
SAM
National Collaboratory Program
8 Goal of CMCS
New forms of data sharing, pedigree annotation
New paradigms for collaborative research
- Enhance chemical science research by providing an adaptive informatics infrastructure with an integrated set of collaboration tools, data management tools, and chemistry-specific applications and data resources.
Increased access to state-of-the-art research knowledge
More rapid and efficient multi-scale scientific progress
9 CMCS Approach
- Pilot in combustion science to enable data-centric collaboration knowledge grid
- Develop Portal supporting collaboration, community evaluation, knowledge management, and research tools
- Innovate approaches to capture and present metadata, annotation, and semantic information
- Enable data translation and data interoperability
- Emphasize lightweight just-in-time integration, aspect-oriented design, open source, Web/grid standards, technologies
10 Reacting Flow Modeler
Thermo-chemist
Thermodynamics Application
CHEMKIN Application
XML-based Web technologies enable data interoperability, metadata capture, annotation
Thermodynamics Database
Thermo data
Kinetics data
Parsers Annotators Translators
Parsers Annotators
Shared Data Repository: Distributed Authoring and Versioning (WebDAV) protocol
Annotation
Annotation
Annotation
XML Thermodynamics Data Set
XML Kinetics Data Set
XML Transport Data Set
11 CMCS Informatics
- Infrastructure Capabilities
- Collaboration
- Data/metadata management
- Annotation
- Translation
- Visualization
- Notification
- Search
- Security
12 CMCS Pilot User Groups
- HCCI University Consortium: Bill Pitz (LLNL)
- DNS Feature Tracking Detection: David Leahy and Larry Rahn (SNL)
- Reduced Chemical Mechanisms: Bill Green (MIT)
- PrIMe: led by Michael Frenklach (UCB)
- NIST/PrIMe Data Warehouse
- PrIMe Library of on-demand chemistry models
- IUPAC: led by Branko Ruscic (ANL)
- Develop and publish validated thermochemical data
- Real Fuels Project: Wing Tsang and Tom Allison (NIST)
- Lead real fuels chemistry at NIST
- Quantum Chemistry: Theresa Windus (PNNL)
13 CMCS Pilot Databases, Applications
- LLNL Chemistry Database: Bill Pitz (LLNL)
- Computational Result Database: David Feller (PNNL)
- RIOT Reduced Chemical Mechanisms: Bill Green (MIT)
- ReactionLab: Michael Frenklach (UCB)
- Development and publishing of chemical reaction models, interfaced with NIST/PrIMe Data Warehouse
- ATcT Active Thermochemistry Tables: Branko Ruscic (ANL)
- Optimizes networks of thermochemical data
- Chemical Kinetics and Thermochemistry Database for High-Temperature Materials Synthesis: Mark Allendorf (SNL)
14 Integration of Applications Enabled by Flexible Infrastructure
Browser
Active Table
Command line applications
Portlet API
Web service
Web service
CMCS/DAV API
XML/SOAP
Java Parser of ASCII data
Web service
XML/SOAP
15 Translations in CMCS
- Extensible Stylesheet Language Transformation (XSLT)
- XML → HTML for web viewing
- XML → HTML for interactive applet tools
- XML → ASCII formats for other programs
- Web Service Interface
- Command line web services, e.g. OpenBabel for geometry translations
- Java interfaces for parsing ASCII or binary files, e.g. Chemkin → XML
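To make the Chemkin-to-XML translation step concrete, here is a minimal sketch in Python. It is not the CMCS Java parser. It assumes the common CHEMKIN reaction-line layout (an equation followed by the three Arrhenius coefficients A, beta, Ea). The XML element names (`reaction`, `reactants`, `products`, `species`, `rateCoefficient`) are hypothetical and do not come from any CMCS schema. The numeric coefficients are placeholders, not real kinetics data. The sketch handles only simple neutral-species reactions without third bodies or stoichiometric coefficients.

```python
import xml.etree.ElementTree as ET


def chemkin_reaction_to_xml(line: str) -> ET.Element:
    """Convert one simple CHEMKIN-style reaction line to an XML element."""
    # Everything after '!' is a comment in CHEMKIN input files.
    fields = line.split("!", 1)[0].split()
    if len(fields) < 4:
        raise ValueError(f"expected an equation and three coefficients: {line!r}")
    # The last three fields are the Arrhenius coefficients A, beta, Ea.
    *equation_parts, a, beta, ea = fields
    equation = "".join(equation_parts)

    # '<=>' and '=' mark reversible reactions, '=>' irreversible ones.
    # The longer separators are tested first so '=' does not match them.
    for separator, reversible in (("<=>", True), ("=>", False), ("=", True)):
        if separator in equation:
            break
    else:
        raise ValueError(f"no reaction arrow found in: {equation!r}")
    lhs, rhs = equation.split(separator, 1)

    # Hypothetical element names, chosen for illustration only.
    rxn = ET.Element("reaction", {"reversible": "true" if reversible else "false"})
    for tag, side in (("reactants", lhs), ("products", rhs)):
        group = ET.SubElement(rxn, tag)
        for name in side.split("+"):
            ET.SubElement(group, "species").text = name
    rate = ET.SubElement(rxn, "rateCoefficient")
    for tag, value in (("A", a), ("beta", beta), ("Ea", ea)):
        float(value)  # raises ValueError if the field is not numeric
        ET.SubElement(rate, tag).text = value
    return rxn


# Placeholder coefficients, for illustration only.
element = chemkin_reaction_to_xml("H+O2=OH+O  1.0E+13 0.0 1.0E+04  ! placeholder")
print(ET.tostring(element, encoding="unicode"))
```

Once the data is in XML, the XSLT routes listed above (to HTML or to other ASCII formats) can operate on it without knowing the original file layout.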
16 Application Integration in CMCS
- Portlets interfaced to web services via XML/SOAP
- Active ThermoChemistry Tables (ATcT): Branko Ruscic (ANL)
- Range Identification Optimization Tool (RIOT): Bill Green (MIT)
17 Shared Applications for Collaborative Data Analysis
Thermochemical Active Tables (ATcT) functionality available as a Web service accessible from an enabled Project Team workspace in the CMCS Portal.
18 RIOT -- Reduced Kinetic Models Significantly Reduce Cost Of Reacting Flow Simulations
11 reduced models plus the full model (model 0) cover the 12,000 finite volumes
4x speed-up in 2-d laminar methane flame simulation without loss of accuracy (Lu, Bhattacharjee, Barton, Green, 2003)
19 Disk-like Access to Data Using Desktop Clients
Example: LabView application on Windows desktop
LabView writes to the WebDrive DAV client (http://www.southrivertech.com/), which deposits data directly in the CMCS archive.
20 Shared Applications for Collaborative Data Analysis
Thermochemical Active Tables (ATcT) functionality available as a Web service accessible from an enabled Project Team workspace in the CMCS Portal.
21 Data Translation and Visualization
22 Summary
- CMCS provides a public data sharing collaborative workspace for chemists
- Modern XML technologies provide better ways for scientists to share knowledge
- Web-based interfaces for data and applications
- Metadata management
- Translations
- Visualization
- CMCS Pilot Groups are providing valuable feedback to the CMCS iterative development cycle
23 CMCS Data/Metadata Philosophy
- Scientific metadata has meaning across chemical science domains
- Scientific data is generally opaque and can be somewhat meaning-free outside of a discipline
- Metadata must be understood, manipulated, and formatted in a machine-comprehensible way
- We are not enforcing standards
- There is no schema that spans the scales the CMCS addresses
- Enforcing standards across multiple chemistry communities would not be pragmatic
- Enforcing standards would alienate scientists
- When and if standards exist
- CMCS provides a technological framework for standard adoption
- We encourage the community to develop, review and adopt standards
- We can map our scientific content to and from standards, as needed
24 Chemical Science Data and Metadata
ΔH_atomiz,0(CH3OOH) in kcal/mol: calculated, G3//B3LYP, T. Windus, more at http://...
data: value and uncertainty
units: kcal/mol
quantity: enthalpy of atomization
species: methylhydroperoxide, CAS 3031-73-0
temperature: 0 K
calculated: G3//B3LYP
creator: T. Windus
using: Ecce
more info: http://avatar.emsl.pnl.gov:8080/Ecce/.../CH3OOH/.../GxEnergy
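As an illustration of keeping a number together with the metadata that gives it meaning, the sketch below bundles a value and its uncertainty with the annotation fields listed on this slide. The class name `AnnotatedValue` and its methods are hypothetical and are not CMCS code. The numeric value is not given in the slide text, so NaN is used as an explicit placeholder instead of a made-up number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedValue:
    """A numeric result carried together with its descriptive metadata."""
    value: float
    uncertainty: float
    metadata: Dict[str, str] = field(default_factory=dict)

    def label(self) -> str:
        """Build a human-readable description from the metadata alone."""
        m = self.metadata
        return f"{m['quantity']} of {m['species']} at {m['temperature']} ({m['units']})"

    def missing(self, required: List[str]) -> List[str]:
        """List the required metadata keys that have not been supplied."""
        return [key for key in required if key not in self.metadata]


# Metadata fields copied from the slide; the value itself is a NaN placeholder.
record = AnnotatedValue(
    value=float("nan"),
    uncertainty=float("nan"),
    metadata={
        "quantity": "enthalpy of atomization",
        "species": "methylhydroperoxide",
        "cas": "3031-73-0",
        "temperature": "0 K",
        "units": "kcal/mol",
        "method": "G3//B3LYP",
        "creator": "T. Windus",
    },
)
print(record.label())
print(record.missing(["units", "method", "source_url"]))
```

Without the metadata dictionary, the bare number would be uninterpretable outside the discipline that produced it, which is the point the slide makes.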
25 Scientific Data Provenance
- Data provenance (or data pedigree), meaning where a piece of data came from and the process by which it arrived in the data repository, is essential for the sharing of scientific or technical data
- Data provenance is the metadata that describes the data's context and provides a traceable path to its origin
- Provenance captures the identification of the data, the traceability of the data, possibly across scales and/or domains, as well as information about accuracy and sensitivity
- Provenance metadata is associated with CMCS resources (WebDAV protocol, XML annotation standards), and is browsable and searchable from the CMCS portal
- Pedigree may include the series of steps necessary to reproduce the data → generalized workflow development, or virtual data
- Data is linked to projects, references, inputs, and outputs
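The "traceable path to its origin" can be pictured as a walk over has-inputs links. The following is a minimal sketch, assuming provenance is available as a plain mapping from each resource to the resources it lists as inputs. The function name and all resource names are hypothetical and are not taken from the CMCS repository.

```python
from typing import Dict, List


def trace_inputs(resource: str, has_inputs: Dict[str, List[str]]) -> List[str]:
    """Return every upstream resource reachable via has-inputs links.

    Resources are returned nearest first (breadth-first order). Each resource
    appears once, so cyclic links cannot cause an endless loop.
    """
    seen = {resource}
    order: List[str] = []
    queue = [resource]
    while queue:
        current = queue.pop(0)
        for parent in has_inputs.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Hypothetical cross-scale chain: simulation <- mechanism <- thermo <- quantum chemistry.
links = {
    "flame_simulation": ["reaction_mechanism"],
    "reaction_mechanism": ["thermo_table", "rate_data"],
    "thermo_table": ["quantum_chemistry_output"],
}
print(trace_inputs("flame_simulation", links))
# → ['reaction_mechanism', 'thermo_table', 'rate_data', 'quantum_chemistry_output']
```

In a real repository the mapping would be read from the stored metadata properties, not written by hand.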
26 Metadata
Title: Active Tables Thermochemistry Data Table for Methyl peroxy
Contributors: Reinhardt Pinzon, Albert F. Wagner, Melita L. Morton, Gregor von Laszewski, Sandra Bittner, Sandeep Nijsure, Kaizar Amin, Baoshan Wang
Creation Date: 2003-11-10
Creator: Branko Ruscic
Keywords: Thermodynamics, molecule, species
MIME Type: text/xml-activetables-thermochemistry
Text
Whiteboard
Sound
Equations
CH3OOqueryResult.xml
references
hastranslations
O Atom Reference NASA7ElementsLexicon in
MainLibrary 0.004
Plot View (text/html)
JANAF format (text/plain)
Active Tables Bibliography in Main Library (0.001)
hasinputs
PolyatomicRRHOLexicon
references
NetworkEncyclopedia
pitzNotesBibliography
FixedEnthalpiesCompendium
SpeciesDictionary
issanctionedby
Data Provenance Relationships as Graph
IUPAC
27 Key Data/Metadata Management Components
- WebDAV -- Web Distributed Authoring and Versioning
- Extension to HTTP for file management and collaborative file sharing on remote web servers
- Files/collections and properties (data about data)
- Methods: GET, POST, PUT, COPY, MOVE, DELETE, MKCOL, PROPFIND, PROPPATCH, LOCK/UNLOCK, ACLs
- DASL (DAV Searching and Locating) -- search extension for DAV
- http://www.webdav.org
- SAM -- Scientific Annotation Middleware
- Built on top of WebDAV, in particular Jakarta Slide
- Automatic annotation and translation services
- Notifications (tied to email daemon in CMCS)
- Supports multiple perspectives, workflows, goals
- Different users and different applications
- Event enabled data/metadata repository
- Jim Myers, PNNL, P.I. http://www.scidac.org/SAM
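To show what reading properties over WebDAV looks like on the wire, here is a minimal sketch that builds (but does not send) a PROPFIND request using only the Python standard library. The server URL is a made-up example, and nothing here is specific to SAM or Jakarta Slide. The request body follows the general WebDAV convention of listing the wanted properties inside a `propfind`/`prop` element in the `DAV:` namespace.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)  # serialize DAV elements with a "D:" prefix


def build_propfind(url: str, props: List[Tuple[str, str]], depth: str = "0") -> urllib.request.Request:
    """Build a PROPFIND request asking for the given (namespace, name) properties."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for namespace, name in props:
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        # Depth "0" asks about the resource itself, "1" also about its children.
        headers={"Depth": depth, "Content-Type": 'application/xml; charset="utf-8"'},
    )


# Hypothetical URL; the request is only constructed here, never sent.
request = build_propfind(
    "http://dav.example.org/files/thermo.xml",
    [(DAV_NS, "getlastmodified"), ("http://purl.org/dc/elements/1.1/", "title")],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. A WebDAV server normally answers with a 207 Multi-Status document listing each requested property.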
28 Sample Metadata from XML Data file
<dc:title>Active Tables Elements Cookbook</dc:title>
<dc:description>This document contains standard thermochemical reference states for elements/isotopes.</dc:description>
<dc:creator>
  <rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag>
</dc:creator>
<dcterms:created>2003-04-06</dcterms:created>
<cmcs:ispartofproject>
  <rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple" xlink:title="methylperoxyNotes (0.001)" xlink:href="/slide/files/projects/primeThermo/methylperoxyNotes"/>
  </rdf:li></rdf:Bag>
</cmcs:ispartofproject>
<cmcs:hasinputs>
  <rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple" xlink:title="Active Tables Bibliography in Main Library (0.001)" xlink:href="/slide/files/public/ActiveTables/MainLibrary/"/>
  </rdf:li></rdf:Bag>
</cmcs:hasinputs>
<cmcs:speciescas>
  <rdf:Bag><rdf:li>183748-02-9</rdf:li></rdf:Bag>
</cmcs:speciescas>
Dublin Core Metadata Elements and Terms
CMCS Metadata Properties
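A short sketch of how a client might read such a metadata fragment follows, using Python's standard XML parser. The `dc`, `dcterms`, `rdf`, and `xlink` namespace URIs are the standard ones. The `cmcs` namespace URI and the enclosing `metadata` root element are assumptions added only so that the fragment parses as a complete document, since the slide does not show either.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xlink": "http://www.w3.org/1999/xlink",
    "cmcs": "urn:example:cmcs",  # hypothetical: the real CMCS namespace URI is not shown
}

# The <metadata> wrapper is hypothetical; the children follow the slide.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:cmcs="urn:example:cmcs">
  <dc:title>Active Tables Elements Cookbook</dc:title>
  <dc:creator><rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag></dc:creator>
  <dcterms:created>2003-04-06</dcterms:created>
  <cmcs:hasinputs><rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple"
        xlink:title="Active Tables Bibliography in Main Library (0.001)"
        xlink:href="/slide/files/public/ActiveTables/MainLibrary/"/>
  </rdf:li></rdf:Bag></cmcs:hasinputs>
</metadata>"""


def summarize(xml_text: str) -> dict:
    """Pull the title, creators, creation date, and input links out of the metadata."""
    root = ET.fromstring(xml_text)
    href_attr = f"{{{NS['xlink']}}}href"
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": [li.text for li in root.findall("dc:creator/rdf:Bag/rdf:li", NS)],
        "created": root.findtext("dcterms:created", namespaces=NS),
        "inputs": [
            link.get(href_attr)
            for link in root.findall("cmcs:hasinputs/rdf:Bag/rdf:li/cmcs:href", NS)
        ],
    }


info = summarize(SAMPLE)
print(info)
```

The Dublin Core fields and the CMCS-specific fields are read the same way, which is what lets generic tools handle both kinds of property.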
29 Related Projects: Expanding CMCS
- Reacting Flows: Feature Tracking in Numerical Simulation Datasets
- Feature detection and tracking is a data mining approach with the motivation to extract further scientific understanding from valuable DNS data sets. CMCS is working with BES/SciDAC projects here at the CRF towards adopting new standard formats for feature data/metadata.
30 Related Projects: Leveraging CMCS
- DART Metadata
- The CMCS team's experience with metadata management was relevant to the DTA team successfully reaching its Material Transparency Milestone.
- DHS Data Integration
- An integrated Rad/Nuc Countermeasures System requires the well-organized and efficient flow of data.
- C-MS3D (Outstanding NIH Proposal)
- Structure and function of biological macromolecules is a central problem in biology. MS3D, an emerging approach, uses intra-molecular chemical cross-linking followed by mass spectrometric analysis to gain insights into the structure of these macromolecules. C-MS3D would be a data-centric collaboration infrastructure for the leaders of the MS3D research community.
31 CMCS Explorer: Search of CMCS Metadata
32 Metadata at Work: Data Viewer Registered With SAM
Data translations provided automatically by SAM for this file type.
33 CMCS Metadata Stored as WebDAV Properties
A DAV property is a keyword/value pair: a namespace:tag keyword and a well-formed XML value.
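The keyword/value structure of a DAV property can be seen in the body of a PROPPATCH request, which sets properties on a resource. The sketch below only builds that XML body. The `cmcs` namespace URI is hypothetical, the CAS number is the one for methylhydroperoxide quoted earlier in this talk, and the value is plain text (the simplest well-formed case).

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
CMCS_NS = "urn:example:cmcs"  # hypothetical namespace URI, for illustration only


def proppatch_body(namespace: str, tag: str, value: str) -> str:
    """Return a PROPPATCH body that sets one property to a text value."""
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_element = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop = ET.SubElement(set_element, f"{{{DAV_NS}}}prop")
    # The property keyword is the (namespace, tag) pair; the element content is its value.
    ET.SubElement(prop, f"{{{namespace}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


body = proppatch_body(CMCS_NS, "speciescas", "3031-73-0")
print(body)
```

Because the keyword includes a namespace, properties defined by different communities can sit side by side on the same resource without name clashes.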
34 CMCS Metadata Use
- Metadata provides identification and documentation to scientific data.
- Example: Attaching an owner, creation date, abstract, type to data.
- Example: Tracking data to program versions, and possibly bugs for that version.
- Metadata documents the context and value of the data.
- Example: The theoretical atomization energy of methylhydroperoxide (and its uncertainty) from Ecce (used as input to ATcT) contains information identifying the species and the quantity, units, the theoretical method used, vibrational frequencies and geometry, reference to source file, creator, etc.
- Metadata facilitates cross-scale transfer of data.
- Example: Can show a chain of inputs, including input parameters and configuration files, across scales.
- Example: Can retrieve literature references which describe this data.
- Metadata allows users to comment on the data and its quality.
- Example: CMCS infrastructure can be used for scientific peer review of data.
- Metadata is necessary for effective collaboration.
- Example: Scientific data becomes more usable to others when it is documented.
Metadata, also referred to as data annotation, converts scientific data into knowledge.
35 CMCS Metadata Elements
- Using Dublin Core for some basic pedigree properties: creator, dates, publisher, is-referenced-by, references, replaces, is-replaced-by, has-version, etc.
- Dublin Core Element Set and Qualified Dublin Core
- Use of both XML and RDF to encode metadata values
- Use of XLink to express values of hyperlinks
- CMCS properties for chemical science to enable searching: species name, CAS, chemical properties, and chemical formula.
- CMCS properties for defining scientific data: has-inputs, has-outputs, and is-part-of-project.
- CMCS properties for scientific publication and peer review annotations: is-sanctioned-by.
- Currently defined 36 elements in the core CMCS pedigree.
- Flexible infrastructure for addition of new metadata. As new metadata is added to the infrastructure, current apps will not break!
CMCS metadata is strongly encouraged, though not required, for all CMCS data, and CMCS metadata is extensible.