Collaboratory for Multiscale Chemical Sciences (CMCS): New Informatics Capabilities for CHEMKIN Users


1
Collaboratory for Multi-scale Chemical Sciences
(CMCS) New Informatics Capabilities for CHEMKIN
users
  • David Leahy and Larry Rahn
  • Sandia National Laboratories
  • Combustion Symposium 2004
  • University of Illinois at Chicago
  • July 25, 2004
  • http://cmcs.org

2
Outline
  • Multi-Scale, Multi-Domain Science Challenges
  • The Collaboratory for Multi-Scale Chemical
    Sciences (CMCS)
  • An adaptive informatics infrastructure
  • Data and Metadata Services
  • Examples
  • Related projects
  • Conclusions and Future Work

3
Combustion is a Multi-scale Chemical Science
Challenge
  • Science relies upon validated information
    shared across physical scales
  • New knowledge is assimilated from different data,
    tools, and disciplines at each scale
  • Critical science lies at scale interfaces
  • Impact through industrial application is mostly
    at larger scales
  • Multi-scale scientific collaboration faces
    barriers
  • Normal publication route is slow and excludes
    much important data
  • Multi-scale information is complex, and its
    pedigree (associated metadata) matters
  • New approaches to developing and sharing
    trustworthy data are needed
  • Community resources are highly distributed
  • Complexity of multi-scale science can lead to
    unnecessary duplication and impede investment

4
Challenge: Multi-scale science takes too long
[Timeline diagram; axis: Time, scale marker: 1 year. Labels:]
  • Industrial researcher
  • Peer Reviewed Publication
  • Autoignition not predicted by chemical mechanism
  • NIST review; Publish in database
  • Evaluation: Need thermochemistry for new radical
  • New mechanism developed, validated
  • Read paper
  • Peer Reviewed Publication
  • Conference Presentation
  • Evaluation: Need computational data, collaborate
    with Quantum Chemist
  • Peer Reviewed Publication of new radical
    thermochemistry
  • Peer Reviewed Publication of computation
5
Shared repository speeds multi-scale communication
[Timeline diagram; axis: Time, scale marker: 1 year. Labels:]
  • Industrial researcher accesses new mechanism
  • Peer Reviewed Publication
  • NIST review; Publish in database
  • Annotation: autoignition not predicted by
    chemical mechanism
  • Notification: New mechanism developed, validated
  • Notification: Results in decision to develop new
    mechanism
  • Peer Reviewed Publication
  • Publicly Shared Data Repository
  • Parsers, Translators, Annotators
  • NIST Repository
  • Conference Presentation
  • Peer Reviewed Publication of new radical
    thermochemistry
  • Evaluation: Need computational data, collaborate
    with Quantum Chemist
  • Peer Reviewed Publication of computation
6
Collaboratory for Multi-scale Chemical Science
(CMCS)
  • A collaboration of 8 national labs and
    universities
  • Chemical scientists spanning the scales from
    electronic structure of molecules to simulations
    of reacting flow
  • Computer and information scientists expert in
    emerging web-based technologies
  • Funded by DOE/SC MICS office
  • Part of the National Collaboratory Program
  • Pilot project within DOE combustion research
    community
  • In our third year, renewed through 2008
  • Targets Chemical Science Community and BES SciDAC
    projects with much broader goals in the longer
    term

7
Multi-disciplinary CMCS Team
  • SNL: Larry Rahn, Christine Yang, Carmen
    Pancerella, David Leahy, Darrian Hale
  • PNNL: Brett Didier, James D. Myers, Karen
    Schuchardt, Theresa Windus, Carina Lansing
  • ANL: Al Wagner, Branko Ruscic, Gregor von
    Laszewski, Reinhardt Pinzon, Kaizar Amin
  • LLNL: William Pitz
  • LANL: David Montoya, Rick Knight
  • NIST: Thomas C. Allison
  • MIT: William H. Green, Jr., Luwi Oluwole
  • UCB: Michael Frenklach
denotes Institutional Point of Contact
CMCS Development Partnerships: SAM, National
Collaboratory Program
8
Goal of CMCS
New forms of data sharing, pedigree annotation
New paradigms for collaborative research
  • Enhance chemical science research by
    providing an adaptive informatics infrastructure
    with an integrated set of collaboration tools,
    data management tools, and chemistry-specific
    applications and data resources.

Increased access to state-of-the-art research
knowledge
More rapid and efficient multi-scale scientific
progress
9

CMCS Approach
  • Pilot in combustion science to enable
    data-centric collaboration (knowledge grid)
  • Develop a Portal supporting collaboration,
    community evaluation, knowledge management, and
    research tools
  • Innovate approaches to capture and present
    metadata, annotation, and semantic information
  • Enable data translation and data interoperability
  • Emphasize lightweight just-in-time integration,
    aspect-oriented design, open source, and Web/grid
    standards and technologies

10
XML-based Web technologies enable data
interoperability, metadata capture, and annotation
[Architecture diagram. Labels:]
  • Reacting Flow Modeler
  • Thermo-chemist
  • Thermodynamics Application
  • CHEMKIN Application
  • Thermodynamics Database
  • Thermo data
  • Kinetics data
  • Parsers, Annotators, Translators
  • Shared Data Repository: Distributed Authoring and
    Versioning (WebDAV) protocol
  • Annotation (shown three times)
  • XML Thermodynamics Data Set
  • XML Kinetics Data Set
  • XML Transport Data Set
11
CMCS Informatics
  • Infrastructure Capabilities
  • Collaboration
  • Data/metadata management
  • Annotation
  • Translation
  • Visualization
  • Notification
  • Search
  • Security

12
CMCS Pilot User Groups
  • HCCI University Consortium: Bill Pitz (LLNL)
  • DNS Feature Tracking and Detection: David Leahy
    and Larry Rahn (SNL)
  • Reduced Chemical Mechanisms: Bill Green (MIT)
  • PrIMe: led by Michael Frenklach (UCB)
      • NIST/PrIMe Data Warehouse
      • PrIMe Library of on-demand chemistry models
  • IUPAC: led by Branko Ruscic (ANL)
      • Develop and publish validated thermochemical
        data
  • Real Fuels Project: Wing Tsang and Tom Allison
    (NIST)
      • Lead real fuels chemistry at NIST
  • Quantum Chemistry: Theresa Windus (PNNL)

13
CMCS Pilot Databases, Applications
  • LLNL Chemistry Database: Bill Pitz (LLNL)
  • Computational Result Database: David Feller
    (PNNL)
  • RIOT, Reduced Chemical Mechanisms: Bill Green
    (MIT)
  • ReactionLab: Michael Frenklach (UCB)
      • Developing and publishing chemical reaction
        models, interfaced with NIST/PrIMe Data
        Warehouse
  • ATcT, Active Thermochemistry Tables: Branko
    Ruscic (ANL)
      • Optimizes networks of thermochemical data
  • Chemical Kinetics and Thermochemistry Database
    for High-Temperature Materials Synthesis: Mark
    Allendorf (SNL)

14
Integration of Applications Enabled by Flexible
Infrastructure
[Architecture diagram. Labels:]
  • Browser
  • Active Table
  • Command line applications
  • Portlet API
  • Web service (shown three times)
  • CMCS/DAV API
  • XML/SOAP (shown twice)
  • Java Parser of ASCII data
15
Translations in CMCS
  • Extensible Stylesheet Language Transformation
    (XSLT)
      • XML → HTML for web viewing
      • XML → HTML for interactive applet tools
      • XML → ASCII formats for other programs
  • Web Service Interface
      • Command line web services, e.g. OpenBabel for
        geometry translations
      • Java interfaces for parsing ASCII or binary
        files, e.g. CHEMKIN → XML
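
As a rough illustration of the ASCII-to-XML direction, the sketch below turns one CHEMKIN-style reaction line (a reaction string followed by the three Arrhenius parameters A, beta, and E) into a small XML element. It is written in Python rather than the Java interfaces mentioned on the slide, and the element and attribute names (`reaction`, `equation`, `arrhenius`) as well as the numbers in the sample line are placeholders chosen for the example, not the CMCS schema or real rate data.

```python
import xml.etree.ElementTree as ET


def reaction_line_to_xml(line: str) -> ET.Element:
    """Convert 'REACTION  A  beta  E' into a small XML element.

    Element and attribute names are hypothetical, not a CMCS schema.
    """
    # Drop a trailing CHEMKIN comment, which starts with '!'.
    body = line.split("!", 1)[0].strip()
    # The last three whitespace-separated fields are A, beta, E;
    # everything before them is the reaction string.
    parts = body.split()
    if len(parts) < 4:
        raise ValueError(f"not a reaction line: {line!r}")
    equation = "".join(parts[:-3])
    a, beta, e = (float(x) for x in parts[-3:])

    rxn = ET.Element("reaction")
    ET.SubElement(rxn, "equation").text = equation
    ET.SubElement(rxn, "arrhenius", A=repr(a), beta=repr(beta), E=repr(e))
    return rxn


# Placeholder line: the rate parameters are made up for the example.
sample = "H + O2 = OH + O   1.0E+13  0.0  0.0  ! illustrative only"
element = reaction_line_to_xml(sample)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Once the data is in XML, the XSLT route listed above can take over for the HTML and ASCII views; that step is not shown here because the Python standard library has no XSLT processor.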

16
Application Integration in CMCS
  • Portlets interfaced to web services via XML/SOAP
      • Active ThermoChemistry Tables (ATcT): Branko
        Ruscic (ANL)
      • Range Identification Optimization Tool (RIOT):
        Bill Green (MIT)

17
Shared Applications for Collaborative Data
Analysis
Thermochemical Active Tables (ATcT) functionality
available as a Web service accessible from an
enabled Project Team workspace in the CMCS
Portal.
18
RIOT -- Reduced Kinetic Models Significantly
Reduce Cost of Reacting Flow Simulations
  • 11 reduced models plus the full model (model 0)
    cover the 12,000 finite volumes
  • 4x speed-up in 2-D laminar methane flame
    simulation without loss of accuracy (Lu,
    Bhattacharjee, Barton, Green, 2003)
19
Disk-like Access to Data Using Desktop Clients
Example: LabVIEW application on Windows desktop
LabVIEW writes to the WebDrive DAV client
(http://www.southrivertech.com/), which deposits
data directly in the CMCS archive.
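
Behind a drive-mapping client like this, storing a file on a WebDAV server comes down to an HTTP PUT. The sketch below prepares such a request in Python without sending it. The server URL, the path, and the sample payload are hypothetical (example.org is a reserved placeholder domain), and authentication is left out.

```python
import urllib.request


def build_dav_put(base_url: str, path: str, payload: bytes,
                  content_type: str = "text/plain") -> urllib.request.Request:
    """Prepare (but do not send) an HTTP PUT that stores `payload` at `path`."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Hypothetical server, path, and data for illustration only.
request = build_dav_put(
    "https://dav.example.org/files",
    "projects/demo/run001.dat",
    b"time,signal\n0.0,1.0\n",
)
print(request.get_method(), request.full_url)
# To actually upload, pass the request to urllib.request.urlopen().
```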
21
Data Translation and Visualization
22
Summary
  • CMCS provides a public data sharing collaborative
    workspace for chemists
  • Modern XML technologies provide better ways for
    scientists to share knowledge
  • Web-based interfaces for data and applications
  • Metadata management
  • Translations
  • Visualization
  • CMCS Pilot Groups are providing valuable feedback
    to the CMCS iterative development cycle

23
CMCS Data/Metadata Philosophy
  • Scientific metadata has meaning across chemical
    science domains
  • Scientific data is generally opaque and can be
    somewhat meaning-free outside of a discipline
  • Metadata must be understood and manipulated and
    formatted in a machine-comprehensible way
  • We are not enforcing standards
      • There is no schema that spans the scales the
        CMCS addresses
      • Enforcing standards across multiple chemistry
        communities would not be pragmatic
      • Enforcing standards would alienate scientists
  • When and if standards exist
      • CMCS provides a technological framework for
        standard adoption
      • We encourage the community to develop, review,
        and adopt standards
      • We can map our scientific content to and from
        standards, as needed

24
Chemical Science Data and Metadata
$\Delta H^{0}_{\mathrm{atomiz}}(\mathrm{CH_3OOH})$ = (value ± uncertainty) kcal/mol
calculated, G3//B3LYP, T. Windus, more at
http://...
  • data: value and uncertainty
  • units: kcal/mol
  • quantity: enthalpy of atomization
  • species: methylhydroperoxide, CAS 3031-73-0
  • temperature: 0 K
  • calculated: G3//B3LYP
  • creator: T. Windus using Ecce
  • more info:
    http://avatar.emsl.pnl.gov:8080/Ecce/.../CH3OOH/.../GxEnergy
25
Scientific Data Provenance
  • Data provenance (or data pedigree), i.e. where a
    piece of data came from and the process by which
    it arrived in the data repository, is essential
    for the sharing of scientific or technical data
  • Data provenance is the metadata that describes
    the data's context and provides a traceable path
    to its origin
  • Provenance captures the identification of the
    data, the traceability of the data, possibly
    across scales and/or domains, as well as
    information about accuracy and sensitivity
  • Provenance metadata is associated with CMCS
    resources (WebDAV protocol, XML annotation
    standards), and is browsable and searchable from
    the CMCS portal
  • Pedigree may include the series of steps
    necessary to reproduce the data (generalized
    workflow development, or virtual data)
  • Data is linked to projects, references, inputs,
    and outputs
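
One way to picture the "traceable path to its origin" is as a walk over input links between resources. The sketch below is a minimal Python illustration, not CMCS code: the resource names are invented, and the dictionary simply stands in for has-inputs relationships that CMCS would keep as metadata on each resource.

```python
from collections import deque

# Hypothetical resources; each key lists the resources it was derived from.
HAS_INPUTS = {
    "flame_simulation.out": ["mechanism.xml", "transport.xml"],
    "mechanism.xml": ["thermo_table.xml", "rate_data.xml"],
    "thermo_table.xml": ["quantum_calc.out"],
    "transport.xml": [],
    "rate_data.xml": [],
    "quantum_calc.out": [],
}


def trace_origins(resource: str, has_inputs: dict) -> list:
    """Return every upstream resource reachable through input links,
    in breadth-first order, visiting each one once."""
    seen = {resource}
    order = []
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for parent in has_inputs.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


lineage = trace_origins("flame_simulation.out", HAS_INPUTS)
print(lineage)
```

The `seen` set keeps the walk finite even if two resources were recorded as inputs of each other by mistake.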

26
Data Provenance Relationships as Graph
Metadata
  • Title: Active Tables Thermochemistry Data Table
    for Methyl peroxy
  • Contributors: Reinhardt Pinzon, Albert F. Wagner,
    Melita L. Morton, Gregor von Laszewski, Sandra
    Bittner, Sandeep Nijsure, Kaizar Amin, Baoshan
    Wang
  • Creation Date: 2003-11-10
  • Creator: Branko Ruscic
  • Keywords: Thermodynamics, molecule, species
  • MIME Type: text/xml-activetables-thermochemistry
[Graph diagram. Node and edge labels:]
  • Text, Whiteboard, Sound, Equations
  • CH3OOqueryResult.xml
  • Relationships: references, hastranslations,
    hasinputs, issanctionedby
  • O Atom Reference NASA7ElementsLexicon in
    MainLibrary 0.004
  • Plot View (text/html)
  • JANAF format (text/plain)
  • Active Tables Bibliography in Main Library
    (0.001)
  • PolyatomicRRHOLexicon
  • NetworkEncyclopedia
  • pitzNotesBibliography
  • FixedEnthalpiesCompendium
  • SpeciesDictionary
  • IUPAC
27
Key Data/Metadata Management Components
  • WebDAV -- Web Distributed Authoring and Versioning
      • Extension to HTTP for file management and
        collaborative file sharing on remote web servers
      • Files/collections and properties (data about
        data)
      • Methods: GET, POST, PUT, COPY, MOVE, DELETE,
        MKCOL, PROPFIND, PROPPATCH, LOCK/UNLOCK; ACLs
      • DASL (DAV Searching and Locating) -- search
        extension for DAV
      • http://www.webdav.org
  • SAM -- Scientific Annotation Middleware
      • Built on top of WebDAV, in particular Jakarta
        Slide
      • Automatic annotation and translation services
      • Notifications (tied to email daemon in CMCS)
      • Supports multiple perspectives, workflows, goals
      • Different users and different applications
      • Event-enabled data/metadata repository
      • Jim Myers, PNNL, P.I.
        http://www.scidac.org/SAM
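
To make the PROPFIND method concrete, the sketch below builds the XML body of a property query and wraps it in an HTTP request, again without sending anything. It is an assumption-laden illustration: the URL is a placeholder, and asking for a Dublin Core `title` property is just an example of a namespaced property, not a statement about how the CMCS server names its properties.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Readable prefixes in the serialized XML (purely cosmetic).
ET.register_namespace("D", DAV_NS)
ET.register_namespace("dc", DC_NS)


def build_propfind(url: str, properties: list) -> urllib.request.Request:
    """Prepare (not send) a PROPFIND asking for the named properties.

    `properties` is a list of (namespace, local_name) pairs.
    """
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for namespace, name in properties:
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        # Depth 0 limits the query to the resource itself.
        headers={"Depth": "0",
                 "Content-Type": "application/xml; charset=utf-8"},
    )


# Hypothetical resource URL.
propfind = build_propfind(
    "https://dav.example.org/files/projects/demo/table.xml",
    [(DAV_NS, "getlastmodified"), (DC_NS, "title")],
)
print(propfind.data.decode("utf-8"))
```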

28
Sample Metadata from XML Data file
    <dc:title>Active Tables Elements Cookbook</dc:title>
    <dc:description>This document contains standard
      thermochemical reference states for
      elements/isotopes.</dc:description>
    <dc:creator>
      <rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag>
    </dc:creator>
    <dcterms:created>2003-04-06</dcterms:created>
    <cmcs:ispartofproject>
      <rdf:Bag><rdf:li>
        <cmcs:href xlink:type="simple"
          xlink:title="methylperoxyNotes (0.001)"
          xlink:href="/slide/files/projects/primeThermo/methylperoxyNotes"/>
      </rdf:li></rdf:Bag>
    </cmcs:ispartofproject>
    <cmcs:hasinputs>
      <rdf:Bag><rdf:li>
        <cmcs:href xlink:type="simple"
          xlink:title="Active Tables Bibliography in Main Library (0.001)"
          xlink:href="/slide/files/public/ActiveTables/MainLibrary/"/>
      </rdf:li></rdf:Bag>
    </cmcs:hasinputs>
    <cmcs:speciescas>
      <rdf:Bag><rdf:li>183748-02-9</rdf:li></rdf:Bag>
    </cmcs:speciescas>

Dublin Core Metadata Elements and Terms
CMCS Metadata Properties
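
For readers who want to see how such metadata could be read back, here is a short Python sketch that parses a trimmed copy of the slide's fragment. Several things are assumptions: the angle brackets, colons, and equals signs were restored from the stripped transcript text, the wrapping `metadata` element is added only so the fragment parses, and the `cmcs` namespace URI is a placeholder because the real one is not given on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xlink": "http://www.w3.org/1999/xlink",
    "cmcs": "urn:example:cmcs",  # placeholder; real CMCS URI not given
}

FRAGMENT = """<metadata
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:cmcs="urn:example:cmcs">
  <dc:title>Active Tables Elements Cookbook</dc:title>
  <dc:creator><rdf:Bag><rdf:li>Branko Ruscic</rdf:li></rdf:Bag></dc:creator>
  <dcterms:created>2003-04-06</dcterms:created>
  <cmcs:hasinputs><rdf:Bag><rdf:li>
    <cmcs:href xlink:type="simple"
      xlink:title="Active Tables Bibliography in Main Library (0.001)"
      xlink:href="/slide/files/public/ActiveTables/MainLibrary/"/>
  </rdf:li></rdf:Bag></cmcs:hasinputs>
  <cmcs:speciescas><rdf:Bag><rdf:li>183748-02-9</rdf:li></rdf:Bag></cmcs:speciescas>
</metadata>"""

root = ET.fromstring(FRAGMENT)

# Dublin Core fields.
title = root.findtext("dc:title", namespaces=NS)
creators = [li.text for li in root.findall("dc:creator/rdf:Bag/rdf:li", NS)]
created = root.findtext("dcterms:created", namespaces=NS)

# CMCS-specific fields: input links (XLink hrefs) and CAS numbers.
inputs = [
    href.get(f"{{{NS['xlink']}}}href")
    for href in root.findall("cmcs:hasinputs/rdf:Bag/rdf:li/cmcs:href", NS)
]
cas_numbers = [
    li.text for li in root.findall("cmcs:speciescas/rdf:Bag/rdf:li", NS)
]

print(title, creators, created, inputs, cas_numbers)
```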
29
Related Projects: Expanding CMCS
  • Reacting Flows: Feature Tracking in Numerical
    Simulation Datasets
      • Feature detection and tracking is a data mining
        approach, motivated by extracting further
        scientific understanding from valuable DNS data
        sets. CMCS is working with BES/SciDAC projects
        here at the CRF towards adopting new standard
        formats for feature data/metadata.

30
Related Projects: Leveraging CMCS
  • DART Metadata
      • The CMCS team's experience with metadata
        management was relevant to the DTA team
        successfully reaching its Material Transparency
        Milestone.
  • DHS Data Integration
      • An integrated Rad/Nuc Countermeasures System
        requires the well-organized and efficient flow
        of data.
  • C-MS3D (Outstanding NIH Proposal)
      • The structure and function of biological
        macromolecules is a central problem in biology.
        MS3D, an emerging approach, uses intra-molecular
        chemical cross-linking followed by mass
        spectrometric analysis to gain insights into the
        structure of these macromolecules. C-MS3D would
        be a data-centric collaboration infrastructure
        for the leaders of the MS3D research community.

31
CMCS Explorer: Search of CMCS Metadata
32
Metadata at Work: Data Viewer Registered With SAM
Data translations are provided automatically by SAM
for this file type.
33
CMCS Metadata Stored as WebDAV Properties
A DAV property is a keyword/value pair: a
namespace:tag keyword and a well-formed XML value.
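
A small Python sketch of that keyword/value idea: the keyword is a (namespace, tag) pair and the value becomes the element's content inside a PROPPATCH request body. The choice of a Dublin Core `title` property and its value are only an example taken from elsewhere in the deck; how CMCS actually names and nests its property values is not shown on the slide.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
DC = "http://purl.org/dc/elements/1.1/"


def proppatch_body(namespace: str, tag: str, value: str) -> bytes:
    """Serialize one property as a WebDAV PROPPATCH 'set' instruction."""
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV}}}set")
    prop = ET.SubElement(set_el, f"{{{DAV}}}prop")
    # The property keyword is namespace + tag; the value is its content.
    ET.SubElement(prop, f"{{{namespace}}}{tag}").text = value
    return ET.tostring(root, encoding="utf-8")


body = proppatch_body(DC, "title", "Active Tables Elements Cookbook")
print(body.decode("utf-8"))
```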
34
CMCS Metadata Use
  • Metadata provides identification and
    documentation to scientific data.
      • Example: Attaching an owner, creation date,
        abstract, type to data.
      • Example: Tracking data to program versions, and
        possibly bugs for that version.
  • Metadata documents the context and value of the
    data.
      • Example: The theoretical atomization energy of
        methylhydroperoxide (and its uncertainty) from
        Ecce (used as input to ATcT) contains
        information identifying the species and the
        quantity, units, the theoretical method used,
        vibrational frequencies and geometry, reference
        to source file, creator, etc.
  • Metadata facilitates cross-scale transfer of
    data.
      • Example: Can show a chain of inputs, including
        input parameters and configuration files, across
        scales.
      • Example: Can retrieve literature references
        which describe this data.
  • Metadata allows users to comment on the data and
    its quality.
      • Example: CMCS infrastructure can be used for
        scientific peer review of data.
  • Metadata is necessary for effective
    collaboration.
      • Example: Scientific data becomes more usable to
        others when it is documented.

Metadata, also referred to as data annotation,
converts scientific data into knowledge.
35
CMCS Metadata Elements
  • Using Dublin Core for some basic pedigree
    properties: creator, dates, publisher,
    is-referenced-by, references, replaces,
    is-replaced-by, has-version, etc.
  • Dublin Core Element Set and Qualified Dublin Core
  • Use of both XML and RDF to encode metadata values
  • Use of XLink to express values of hyperlinks
  • CMCS properties for chemical science to enable
    searching: species name, CAS, chemical
    properties, and chemical formula.
  • CMCS properties for defining scientific data:
    has-inputs, has-outputs, and is-part-of-project.
  • CMCS properties for scientific publication and
    peer review annotations: is-sanctioned-by.
  • Currently defined: 36 elements in the core CMCS
    pedigree.
  • Flexible infrastructure for addition of new
    metadata. As new metadata is added to the
    infrastructure, current apps will not break!

CMCS metadata is strongly encouraged, though not
required, for all CMCS data, and CMCS metadata is
extensible.