Title: Collaboratory for Multiscale Chemical Science: Progress and Prospects
1. Collaboratory for Multi-scale Chemical Science: Progress and Prospects
- Opportunities for Distributed Science
- A DOE National Collaboratories Workshop
- Boulder, CO
- David Leahy, Larry Rahn, Jim Myers, Carmen Pancerella, Gregor von Laszewski, Tom Allison
- December 1, 2004
2. Combustion is a Multiscale Chemical Science Challenge
- Science relies upon validated information shared across physical scales
- New information is created at each scale from different data, tools, and disciplines
- Knowledge synthesis across scales is required for reaction and fluid models
- New models enable scientific and industrial applications at larger scales
- Multi-scale collaboration faces barriers
- Community resources are highly distributed
- Multi-scale information is complex
- Pedigree and metadata matter
- Scattered in non-interoperable formats
- Scientific publication is slow and has little means for sharing supporting data
- Duplication of effort, impeded investment
- New approaches to developing and sharing trustworthy data are needed
3. CMCS is Working to Create a Chemical Science Knowledge Grid
- A collaboration including eight national labs and universities
- Chemical scientists spanning the scales from electronic structure of molecules to simulations of reacting flow
- Computer and information scientists expert in semantic, collaborative, and data management technologies
- Funded by DOE/SC MICS office
- Part of the National Collaboratory Program
- Beginning fourth year, renewed through 2007
- Targets combustion community
- Broader long-term goal: Knowledge Environment for Collaborative Science (KnECS)
4. Multi-disciplinary CMCS Team
- SNL: Larry Rahn, Christine Yang, Carmen Pancerella, David Leahy, Darrian Hale
- PNL: Brett Didier, James D. Myers, Karen Schuchardt, Jun Li, Todd Elsethagen
- ANL: Gregor von Laszewski, Reinhardt Pinzon, Branko Ruscic, Deepti Kodeboy
- LLNL: William Pitz
- LANL: David Montoya, Rick Knight
- NIST: Thomas C. Allison
- MIT: William H. Green, Jr., Luwi Oluwole
- UCB: Michael Frenklach, Zoran Djurisic
denotes Institutional Point of Contact
CMCS Development Partnerships: SAM, National Collaboratory Program
5. CMCS integrates capabilities into an adaptable infrastructure
- Infrastructure Capabilities
- Collaboration
- Data/metadata management
- Annotation
- Translation
- Visualization
- Notification
- Web service integration
- Search
- Security
6. CMCS Technologies -- Data Management
- Shared Data Repository, based on Scientific Annotation Middleware (SAM) (Jim Myers, PNNL)
- Open standard digital content management (WebDAV: Web-based Distributed Authoring and Versioning)
- Metadata management
- Search capabilities
- Access controls
- Versioning (coming soon)
- Jakarta Slide Servlet running in Jakarta Tomcat
- SAM capabilities
- Automatic extraction of metadata into WebDAV properties (XSLT, DFDL)
- Provenance graphing
- Dynamic translations, visualizations (XSLT, DFDL)
- Notifications (JMS, email with CMCS NED daemon)
- Integrated electronic notebook
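As an illustration of how metadata can be stored as WebDAV properties, the sketch below builds the XML body of a PROPPATCH request, the standard WebDAV method for setting properties on a resource. It is a minimal sketch, not the SAM implementation: the metadata namespace `http://example.org/cmcs-metadata#`, the property names, and both helper functions are assumptions made for this example. No request is sent.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Hypothetical namespace for illustration only; not taken from CMCS or SAM.
META_NS = "http://example.org/cmcs-metadata#"


def build_proppatch(properties):
    """Return a WebDAV PROPPATCH request body that sets the given properties.

    `properties` maps property names (placed in META_NS) to string values.
    """
    ET.register_namespace("D", DAV_NS)
    ET.register_namespace("m", META_NS)
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    for name, value in properties.items():
        child = ET.SubElement(prop_el, f"{{{META_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def read_properties(body):
    """Parse a PROPPATCH body back into a {name: value} dict (round-trip check)."""
    root = ET.fromstring(body)
    prefix = f"{{{META_NS}}}"
    return {
        el.tag[len(prefix):]: el.text
        for el in root.iter()
        if el.tag.startswith(prefix)
    }


# Hypothetical metadata for a thermochemical data file.
body = build_proppatch({"species": "OH", "units": "kcal/mol"})
print(body)
```

Keeping metadata in properties rather than inside the file means any WebDAV-aware client can read or search it without understanding the file format.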
7. CMCS Technologies -- Portal
- CMCS Portal, based on CHeF collaborative environment
- Portlets using Jakarta Turbine/Velocity/Jetspeed
- Data management graphical user interface
- Data explorer
- Metadata browser/editor
- Provenance graph viewer
- Translations
- Notification subscriptions
- Access control management
- Team collaboration tools (teamlets)
- Shared data workspaces
- Task lists
- Schedule/calendar
- Announcements
- Chat
8. CMCS Data Browser Portlet
- Public data
- Data URL
- Metadata
- Buttons for new folders, upload, copy, delete, notification subscription, access control management
- Translations
- Home and Public data
9. CMCS Enabling Distributed Scientific Teams
- IUPAC Task Group on Radical Thermochemistry establishes improved value for carbon atom enthalpy of formation
- Led by Branko Ruscic, Argonne National Laboratory
- International team: Branko Ruscic, James E. Boggs, Alexander Burcat, Attila G. Császár, Jean Demaison, Rudolf Janoschek, Jan M. L. Martin, Melita L. Morton, Michel J. Rossi, John F. Stanton, Péter G. Szalay, Phillip R. Westmoreland, Friedhelm Zabel, Tibor Bérces
- Uses CMCS integrated Active Thermochemistry Tables service
- Process Informatics Model (PrIMe) establishing the NIST/PrIMe Data Warehouse
- Led by Michael Frenklach, University of California, Berkeley
- Data Warehouse will launch with GRI-Mech Methane Mechanism, NIST Kinetics Database, and Leeds Methane Oxidation Mechanism
- Vision for a manifold of expert teams and dynamic model provision
- HCCI Engine Simulation Team utilizing new complex chemistry, tools, models
- Group formed from within the DOE HCCI Program
10. CMCS Adaptive Infrastructure Enables Application Integration
Architecture diagram labels: Browser, EMail; CMCS Portal; Portlet API; Shared Data Repository; SAM (mime-type assignment, metadata extraction, translation, pedigree relationships); Web service: Active Tables; CMCS/DAV API; Web service: RIOT; Grid Fabric; Federation ML; NIST Kinetics DB
11. Active Thermochemistry Tables (ATcT) portlet in CMCS Portal
12. Thermochemical Network
13. Metadata browsing, management
14. Provenance graph visualization
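One way to picture a provenance (pedigree) graph is as a set of "derived from" links between resources, which a viewer can walk to show where a result came from. The following is a minimal sketch under that assumption. The resource paths, the `DERIVED_FROM` table, and the `pedigree` function are hypothetical and do not reflect how SAM stores relationships.

```python
from collections import deque

# Hypothetical provenance records: each resource maps to the resources it was
# derived from. Paths are illustrative, not actual CMCS repository entries.
DERIVED_FROM = {
    "/data/model_output.xml": ["/data/mechanism.xml", "/data/thermo_table.xml"],
    "/data/thermo_table.xml": ["/data/raw_measurements.xml"],
    "/data/mechanism.xml": [],
    "/data/raw_measurements.xml": [],
}


def pedigree(resource, derived_from):
    """Return every upstream resource of `resource`, nearest ancestors first."""
    seen = []
    queue = deque(derived_from.get(resource, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path (also guards cycles)
        seen.append(current)
        queue.extend(derived_from.get(current, []))
    return seen


print(pedigree("/data/model_output.xml", DERIVED_FROM))
```

The breadth-first order lists direct inputs before more distant ones, which matches how a reader usually inspects the pedigree of a result.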
15. Translations, Visualization/Analysis Tools
16. ATcT, IUPAC Team: Thermochemical Science Impact
- Refined enthalpy of formation of OH radical (8.83 +/- 0.09 kcal/mol) has far-reaching implications for gas phase and solution phase chemistry
- Ruscic, B. et al., J. Phys. Chem. A 106, 2727 (2002)
- Significant improvement in the value for C atom (gas phase) enthalpy of formation (593.16 +/- 0.43 kJ/mol at 0 K) establishes a new baseline for ab initio calculations for all hydrocarbons
- Ruscic, B. et al., IUPAC Critical Evaluation of Thermochemical Properties of Selected Radicals. Part I (2004)
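The two values on this slide are quoted in different units (kcal/mol and kJ/mol). A short sketch of the conversion, using the exact definition of the thermochemical calorie (1 kcal = 4.184 kJ), puts the OH value on the same scale. The function name `kcal_to_kj` is introduced here for illustration.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie, exact by definition


def kcal_to_kj(value, uncertainty):
    """Convert a value and its uncertainty from kcal/mol to kJ/mol."""
    return value * KJ_PER_KCAL, uncertainty * KJ_PER_KCAL


# OH radical enthalpy of formation as quoted on the slide.
oh_value, oh_unc = kcal_to_kj(8.83, 0.09)
print(f"OH: {oh_value:.2f} +/- {oh_unc:.2f} kJ/mol")
```

Because the conversion is a simple scaling, the uncertainty scales by the same factor as the value.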
17. Challenges for Collaborative Science
- Adoptability: low barrier to entry
- New capabilities and technologies must be very easy to adopt for scientists without significant information technology experience
- Adaptability: rapidly evolving technological landscape
- Knowledge Grid infrastructure must present viable migration paths to accommodate new standards, technologies
- Behavioral challenges
- Scientists are unused to working in distributed teams
- Top-down program management push has proven effective, for example NIH requirements for public data
- Intellectual property issues
- Data sharing requires some sort of licensing of databases and tools, again new territory for scientists
18. Future Directions for CMCS
- Greater focus on direct support for scientists
- Data formats, metadata management
- Improved graphical user interfaces
- Expanded data entry capabilities
- Open source Knowledge Environment for Collaborative Science (KnECS)
- Three new projects adopting KnECS infrastructure, e.g., NIH project Data Sharing Tools for Protein Structure Studies, Carmen Pancerella PI ($2.5M over 5 years)
- Explore adoption of new standards
- JSR 170
- JavaServer Faces