Title: NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Management, Data Collection, Delivery and Standardization of Communications, and Critical Evaluation of Thermodynamic Data (A possible model for Kinetic Data)
1. NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Management, Data Collection, Delivery and Standardization of Communications, and Critical Evaluation of Thermodynamic Data (A possible model for Kinetic Data)
NIST/Gaithersburg, April 21, 2004
Rob Chirico, Michael Frenkel, Vladimir Diky, Qian Dong
Thermodynamics Research Center (TRC), National Institute of Standards and Technology (NIST), Boulder, Colorado
2. NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Collection, Critical Evaluation, Delivery, and Exchange of THERMODYNAMIC DATA
Talk Outline: Description and Linkage of Major Components
- SOURCE
  - Relational database for archiving experimental thermodynamic data
- Guided Data Capture (GDC)
  - Software for mass-scale data collection
- ThermoML
  - XML-based formats for efficient data delivery and exchange
- Thermodynamic Data Engine (TDE)
  - Software for on-demand critical evaluation (→ Recommended Values)
All Components are Multipurpose and Interconnected
3. The word "data" will come up a lot. What are we talking about?
- 5 Data Types
- True Data
- Experimental Data
- Predicted Data
- Derived Data
- Critically Evaluated Data
- Virtual Data
2 of these will be mentioned only once
4. True Data (Hypothetical). Exact property values for a system of defined chemical composition in a specified state, with the following characteristics:
- unique and permanent,
- independent of any experiment or sample, and
- a hypothetical concept with no known values.
5. Experimental Data. Those obtained as the result
of a particular experiment on a particular sample
by a particular investigator. The feature that
distinguishes experimental data from predicted
and critically evaluated data is use of a
chemical sample including characterization of its
origin and purity.
6. Predicted Data. Those obtained through application of a predictive model or method, such as a particular molecular dynamics, corresponding-states, or group-contribution method.
7. Derived Data. Property values calculated by mathematical operations from other data, possibly including experimental, predicted, and critically evaluated data.
- Examples: azeotropic properties, Henry's law constants, virial coefficients, activities and activity coefficients, fugacities and fugacity coefficients, and standard properties derived from high-precision adiabatic heat-capacity calorimetry
8. Critically Evaluated Data. Critically evaluated data are recommended property values generated through consideration of available experimental data, predicted data, or both.
- There is no particular sample involved with critically evaluated data.
- The feature that distinguishes critically evaluated data from predicted data is the involvement of the judgment of a data evaluator or evaluation system.
- No distinction is made between values derived by traditional static data-evaluation methods and proposed dynamic methods for critical evaluation.
- They represent the best approximation of true data based on the current state of knowledge.
9. Virtual Data. Virtual data can be defined as numerical and metadata information of unknown pedigree whose connection to reality is tenuous.
- No provision for coverage in ThermoML or any other aspect
11. SOURCE Database
- Relational Database (Oracle 9i)
- Approximately 120 Thermodynamic and Thermophysical Properties
- Chemical systems with 1, 2, or 3 components
- Reactions with up to 8 participants
- Includes
  - Full bibliographic info (with article abstracts)
  - Sample descriptions (source, purification, purity determination methods)
  - Brief experimental method descriptions
  - Complete metadata for property specification (phases, constraints, etc.)
  - Uncertainty estimates (sample, method, article info)
  - Numerical values
- Data Types
  - Experimental Data only, plus selected derived data
  - No evaluated, predicted, or virtual data
12. Structure is based on the Gibbs Phase Rule
- Accommodates a wide variety of reported data
representations (absolute, ratio, difference, a
variety of composition measures, etc.)
13. SOURCE statistics
- The largest collection in the world of experimental thermodynamic property values for pure organic compounds, mixtures of two and three components, and reactions.
- 1.5 million experimental property values covering
  - 17,000 pure compounds
  - 16,000 mixtures
  - 4000 reactions
- Expansion rate: approximately 0.4 to 0.5 million values/year (through cooperation with peer-reviewed journals and in-house activities). This is 20-fold larger than any other thermodynamic data collection operation.
16. Guided Data Capture (GDC) software
- Purpose: Mass-scale abstraction from the literature of experimental thermophysical and thermochemical property data for organic chemical systems involving one, two, and three components, chemical reactions, and chemical equilibria.
- Property values are captured with a strictly hierarchical system based upon rigorous application of the thermodynamic constraints of the Gibbs phase rule.
- Full traceability to source documents
- Emphasis on data-quality issues, both in terms of data accuracy and data integrity
  - Simple data checks
  - Enforced and flexible data specification (phases, variables, constraints, compositions, etc.)
- USERS
  - In-house undergraduates (10 from CU and CSM)
  - Journal authors worldwide (JCED, JCT, and growing)
17. What does data capture entail? (Enforced Hierarchical Structure)
- REFERENCE (title, authors, keywords, abstract, etc.)
  - Integrated database of author names and journal titles included
- COMPOUND(S) (name, CASRN, empirical formula)
  - Integrated database of >106 compound names and synonyms
- SAMPLE(S) (source, purity, purification method, analytical method)
  - Methods may be selected from pre-defined lists or entered directly
- MIXTURE IDENTIFICATION
  - Composed of previously identified COMPOUNDS
- PROPERTY SPECIFICATION (specification of phases, variables, constraints, etc.)
  - All property, phase, variable, and constraint selections from pre-defined lists
  - Brief method descriptions are entered through pre-defined lists or direct typing
  - Estimates for variable, constraint, and property values, if available
  - VLE and LLE data are entered with a special DATA TABLES form
- NUMERICAL VALUE ENTRY
  - Direct copy-and-paste operations with pre-existing tables (HTML, PDF, ASCII, EXCEL, WORD, etc.)
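The enforced hierarchy on this slide can be pictured as nested records in which each level may only refer to items already entered at the level above. The following is a minimal Python sketch of that idea, not GDC's actual data model: every class, field, and method name is hypothetical, and the numerical row is a placeholder rather than a measurement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Compound:
    name: str
    casrn: str
    formula: str


@dataclass
class Sample:
    compound: Compound
    source: str
    purity: float  # mole fraction; unit chosen only for illustration


@dataclass
class DataSet:
    # Property specification: what was measured, in which phases,
    # as a function of which variables, under which constraints.
    prop: str
    mixture: List[Compound]
    phases: List[str]
    variables: List[str]
    constraints: List[str]
    rows: List[List[float]] = field(default_factory=list)

    def add_row(self, row: List[float]) -> None:
        # Each row holds one value per variable plus the property value.
        expected = len(self.variables) + 1
        if len(row) != expected:
            raise ValueError(f"expected {expected} values, got {len(row)}")
        self.rows.append(row)


@dataclass
class Reference:
    title: str
    authors: List[str]
    samples: List[Sample] = field(default_factory=list)
    datasets: List[DataSet] = field(default_factory=list)

    def add_dataset(self, dataset: DataSet) -> None:
        # A mixture may only be composed of previously identified compounds.
        known = {s.compound.casrn for s in self.samples}
        missing = [c.name for c in dataset.mixture if c.casrn not in known]
        if missing:
            raise ValueError(f"no sample recorded for: {missing}")
        self.datasets.append(dataset)


biphenyl = Compound("biphenyl", "92-52-4", "C12H10")
ref = Reference(title="Hypothetical article", authors=["A. Author"])
ref.samples.append(Sample(biphenyl, source="commercial", purity=0.999))
vp = DataSet("vapor pressure", [biphenyl], ["liquid", "gas"], ["T"], [])
ref.add_dataset(vp)
vp.add_row([350.0, 1.0])  # placeholder numbers, not data
```

Rejecting a data set whose compounds have no recorded sample is one simple way to keep every numerical value traceable back to a sample and a reference.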
18. Navigation Tree (User Interface)
- Grows as info is added
- Any line can be accessed for editing
- Compound synonyms are available
19. Metadata: Phases, Constraints, Variables, Units, Uncertainties
Numerical Data
Names in plain English
Graphical Representation
The Navigation Tree is in the back and is not
shown.
20. All uncertainty information is captured on a single form. Terminology follows international recommendations (The GUM). Absolute or relative values can be used.
21. Check data by automatic plotting
22. Review Plot
23. Result appears in the Tree
A new entry appears in the navigation tree
Continue with next data set...
24. Multiple data sets are created automatically for over-determined systems. Common in LLE and VLE experiments
- One click plotting
- Simple typo detection
- Connecting lines are automatic
- Plot can be zoomed
- Partial plots available
25. Summary: Major Features of the GDC software
- Guides extraction of information from the literature
- Assures full traceability from bibliographic info to the numerical values
- Assures completeness through
  - Data definitions (phases, variables, etc.)
  - Consistency Checks (range, variable type, etc.)
  - Strict adherence to the Gibbs Phase Rule
- Minimizes typing errors
  - Extensive use of pre-defined lists
  - Extensive compound and author name databases are included
  - Tables of data are captured with simple cut/paste operations
- Frees compiler from all knowledge of database structure and formats
  - NO special codes
  - Formats are close to those of original documents
  - Conversion to standard names, formats, units, etc. is transparent
- Simple graphical data display for detection of anomalous values
  - One click plotting for any dataset
- Detects hidden properties automatically (one click)
  - e.g., pure component OR binary data within a VLE dataset
27. Available for free download from the Web
Extensive examples for specific data types are
available on the Web.
29. ThermoML: XML-Based Approach to Store and Exchange Thermophysical and Thermochemical Data
- Developed in close cooperation with DIPPR 901 Project
- Scope: properties of pure compounds, mixtures, and chemical reactions
- Meta- and numerical data records grouped into nested blocks
- Elements of the Gibbs Phase Rule at the core of the schema
- IUPAC terminology used for meta- and numerical data tagging
- Very limited use of abbreviations
- Various methods of numerical data presentation
- Types of data to be covered (all with uncertainties)
  - experimental
  - critically evaluated
  - predicted
  - equation representation
- Extensive validation was done with SOURCE (more than 9,000 data sets from more than 7,500 publications)
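To make "meta- and numerical data records grouped into nested blocks" concrete, here is a small Python sketch that reads a nested XML document with the standard library. The tag and attribute names are placeholders invented for this illustration and are not the element names of the ThermoML schema; the numbers are placeholders as well.

```python
import xml.etree.ElementTree as ET

# Illustrative document only: placeholder tags, placeholder values.
DOC = """
<Report>
  <Citation><Title>Hypothetical article</Title></Citation>
  <Compound id="1"><Name>biphenyl</Name></Compound>
  <DataBlock compound="1">
    <Property name="vapor pressure" unit="kPa"/>
    <Variable name="temperature" unit="K"/>
    <Point><Var>350.0</Var><Prop>0.5</Prop><Unc>0.01</Unc></Point>
    <Point><Var>360.0</Var><Prop>0.75</Prop><Unc>0.02</Unc></Point>
  </DataBlock>
</Report>
"""


def read_points(xml_text):
    """Return (property name, variable, value, uncertainty) tuples."""
    root = ET.fromstring(xml_text.strip())
    points = []
    for block in root.findall("DataBlock"):
        # Metadata sits at block level; numbers sit in the nested points.
        prop_name = block.find("Property").get("name")
        for point in block.findall("Point"):
            points.append((
                prop_name,
                float(point.findtext("Var")),
                float(point.findtext("Prop")),
                float(point.findtext("Unc")),
            ))
    return points


for row in read_points(DOC):
    print(row)
```

Keeping the property and variable definitions at block level and the numbers in nested records means each value carries its metadata and uncertainty without repeating them per point.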
30. ThermoML: XML-Based Approach to Store and Exchange Thermophysical and Thermochemical Data (continued)
- 1) ThermoML framework for experimental data published (Journal of Chemical and Engineering Data, 2003, 48, 2-13)
- 2) ThermoML extension to cover various measures of uncertainty; conforms with the Guide to the Expression of Uncertainty in Measurement, ISO (International Organization for Standardization), October 1993 (Journal of Chemical and Engineering Data, 2003, 48, 1344-1359)
- 3) Last major extension was completed and covers critically evaluated data, predicted data, and equation representations (Journal of Chemical and Engineering Data, May 2004)
- Combination of GDC and ThermoML is used to generate ThermoML files for the data submitted by the authors, posted on the TRC Web site. (In place for J. Chem. Eng. Data and J. Chem. Thermodyn.)
- Expansion of the cooperation with other journals planned to be in place by the end of 2004.
31. ThermoML was developed in cooperation with DIPPR
32. ThermoML: General structure
33. ThermoML: Citation Block
34. ThermoML: Compound and Sample Description
- Sample description
- source/initial purity
- purification method(s)
- final purity
- purity determination method(s)
Plan incorporation of the IUPAC-NIST Chemical
Identifier (INChI)
38. ThermoML: Reaction Types
General categories for easy organization
- combustion with oxygen
- combustion with other elements or compounds
- addition of various compounds to unsaturated compounds
- addition of water to a liquid or solid to produce a hydrate
- atomization or formation from atoms
- esterification
- exchange of alkyl groups
- exchange of hydrogen atoms with other groups
- formation of a compound from elements in their stable state
- halogenation (addition of or replacement by a halogen)
- hydrogenation (addition of hydrogen molecules to unsaturated compounds)
- hydrohalogenation
- hydrolysis of ions
- other reactions with water
- ion exchange
- neutralization
- oxidation with oxidizing agents other than oxygen
- oxidation with oxygen
- homonuclear dimerization
40. ThermoML: Reaction Property
Methods are always associated with each property
Experimental, Predicted, Critically Evaluated
Solvent and Catalyst can be specified
42. ThermoML: Reaction Constraints and Variables
45. Paper 2. Representation of Uncertainties
46. Representation of Uncertainty in GDC and ThermoML
- All quantities related to the expression of uncertainty conform to the Guide to the Expression of Uncertainty in Measurement (a.k.a. The GUM), ISO (International Organization for Standardization), October 1993.
  - BIPM, IEC, IFCC (Int. Fed. of Clinical Chem.), ISO, IUPAC (Int. Union of Pure and Appl. Chem.), IUPAP (Int. Union of Pure and Appl. Phys.), and OIML
- Uncertainties are represented for variables, constraints, and properties.
- Combined uncertainties (i.e., propagated) are included for properties only.
- Common representations of "precision" are included (repeatability, deviations from a fitted curve, device specifications)
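As an illustration of a combined (propagated) uncertainty in the GUM sense, the sketch below applies the law of propagation for uncorrelated inputs, u_c(y)^2 = sum_i (df/dx_i)^2 u(x_i)^2, with numerically estimated sensitivity coefficients. It is a generic example and not the GDC or ThermoML implementation; the function names, the density example, and all input numbers are assumptions made here for illustration.

```python
import math
from typing import Callable, Sequence


def combined_standard_uncertainty(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    u: Sequence[float],
    rel_step: float = 1e-6,
) -> float:
    """Combined standard uncertainty of y = f(x) for uncorrelated inputs."""
    total = 0.0
    for i, (xi, ui) in enumerate(zip(x, u)):
        # Central-difference estimate of the sensitivity coefficient df/dx_i.
        h = abs(xi) * rel_step if xi != 0.0 else rel_step
        upper = list(x)
        lower = list(x)
        upper[i] = xi + h
        lower[i] = xi - h
        sensitivity = (f(upper) - f(lower)) / (2.0 * h)
        total += (sensitivity * ui) ** 2
    return math.sqrt(total)


def density(args: Sequence[float]) -> float:
    mass, volume = args
    return mass / volume


# Placeholder inputs: mass 10.0 +/- 0.01, volume 5.0 +/- 0.02 (any units).
u_c = combined_standard_uncertainty(density, [10.0, 5.0], [0.01, 0.02])
expanded = 2.0 * u_c  # expanded uncertainty with coverage factor k = 2
print(u_c, expanded)
```

For correlated inputs the GUM adds covariance terms to this sum; they are omitted here to keep the sketch short.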
47. ThermoML: Specification of Uncertainty Information
Precisions
Uncertainties
48. ThermoML: Uncertainty Values
49. Paper 3. Representation of Critically Evaluated Data, Predicted Data, and Equation Representation
50. ThermoML: Predicted Data
51. ThermoML: Predicted Data Types
- ab initio
- molecular dynamics
- semi-empirical quantum
- statistical mechanics
- corresponding states
- correlation
- group contribution
52. ThermoML: Critically Evaluated Data
53. Structure of the ReactionData block. The arrows indicate new elements for equation representation.
54. ThermoMLEquation schema
- Provides mathematical definition of an equation
- Imports the established MathML schema (does not
reinvent the wheel for mathematics)
55. Equation Representation: Schemas and Files (exploits modularity of XML)
[Diagram; recoverable labels]
- MathML Schema: Mathml2.xsd (www.w3.org)
- ThermoML Schema: ThermoML.xsd (www.trc.nist.gov)
- ThermoML Equation Definition Schema: ThermoMLEquation.xsd (www.trc.nist.gov)
- ThermoML Data Files: data filename.xml, any location (names the Equation Definition File and stores variables, fitted parameters, constants, etc.)
- Equation Definition Files: equation filename.xml, Internet URL
- Arrow labels: Import; Reference for validation
56. Where to find the equation definition
57. Equation Definition
Thermodynamic Data Definition
58. ThermoML has been accepted as the basis for development of an XML-based IUPAC standard
59. Participants of the Task Group meeting for IUPAC project 'XML-based IUPAC standard for experimental and critically evaluated thermodynamic property data storage and capture': Prof. W. A. Wakeham, Dr. A. R. H. Goodwin, Dr. A. I. Johns, Dr. M. Satyro, Dr. D. Lide, Dr. M. Frenkel, Dr. M. Schmidt, Prof. K. N. Marsh, Dr. J. W. Magee, Dr. J. H. Dymond.
61. ThermoML files are posted on the Web
Links to ThermoML files available for free download
62. 2003 JCED: Issue 1, 3 articles; Issue 2, 21 articles; Issue 3, 30 articles; Issue 4, 35 articles
Link to ThermoML file for the individual article
63. A ThermoML file available for free download
65. Global Data Communication Process
[Diagram; recoverable labels]
- International Association of Chemical Thermodynamics (IACT)
- Committee on Printed and Electronic Publication (CPEP)
- IUPAC
- Industrial Engineers
- Reader Software
- Guided Data Capture (GDC)
- Measurements: J. Chem. Eng. Data (ACS), J. Chem. Thermodyn. (Elsevier), Fluid Phase Equilibria (Elsevier), Int. J. Thermophys. (Kluwer), Thermochimica Acta (Elsevier)
- Applications: AspenTech (USA), Virtual Materials Grp. (Canada), Natl Engineering Lab (UK), Fiz Chemie (Germany)
- WEB
- Industry: DIPPR
67. The Data Pipeline: Experimentalists → Users
[Flow diagram; recoverable stages]
- Experimental Data Sources: researchers, journals, reports, in-house research, etc. (Data Types, Chemical Systems)
- Guided Data Capture (GDC) (Structured Thermodynamic Data Collection software)
- SOURCE (Comprehensive Experimental Data Archive)
- ThermoDataEngine (TDE) (Dynamic Data Evaluation software, ON DEMAND): Captured Art of Critical Evaluation; Models, Predictions, Correlations, Consistency Enforcement, etc.
- Errors, Inconsistencies, Redundancies
- ThermoML (Data Exchange Standard)
- Data Requests: fast, flexible, on demand
- User Data Requests: Researchers, Process Simulators, etc.
68. Major TDE Project Features
- Representation of separate properties
- Consistency enforcement
- An imperfect data source is assumed (robust data rejection)
- Extension of Results and Validation with predicted values
- Fully automated transparent decisions
- Flexible default models (adjustments based on data quality)
- Secondary fitting (popular alternative representations are provided)
- Includes comprehensive uncertainty estimation
69. Property Blocks
- Phase Diagram
  - Triple Point and Critical T
  - Phase Boundary P
- Volumetric
  - Critical Density
  - Saturated and Single Phase Densities
  - Volumetric Coefficients
- Energetic
  - Energy Differences
  - Energy Derivatives
  - Speed of Sound
- Other
  - Transport Properties
  - Surface Tension
  - Refraction
70. General Algorithm
[Flowchart; box labels as extracted, order in the chart not fully recoverable]
- Load from SOURCE
- Trivial normalization
- Non-trivial normalization within block
- First property block
- Add predicted values
- Select models and fit properties
- Enforce inter-block consistency
- Enforce consistency within block
- Process Other properties
- Calculate uncertainties
- Next block? (Y / N)
- Output
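One plausible reading of the flowchart is a loop over property blocks followed by uncertainty calculation and output. The Python sketch below only records the order of steps and performs no evaluation; the step ordering, the `min_points` threshold for adding predicted values, and all names are assumptions made for illustration, not the TDE implementation.

```python
from typing import Dict, List

# Block names follow the "Property Blocks" slide; "Other" is handled last.
PROPERTY_BLOCKS = ["phase diagram", "volumetric", "energetic"]


def run_evaluation(data: Dict[str, List[float]], min_points: int = 3) -> List[str]:
    """Return the ordered list of steps taken (a trace, not real results)."""
    trace = ["load from SOURCE", "trivial normalization"]
    for block in PROPERTY_BLOCKS:
        points = data.get(block, [])
        trace.append(f"{block}: non-trivial normalization")
        if len(points) < min_points:
            # Sparse block: supplement experimental values with predictions.
            trace.append(f"{block}: add predicted values")
        trace.append(f"{block}: select models and fit properties")
        trace.append(f"{block}: enforce consistency within block")
        trace.append(f"{block}: enforce inter-block consistency")
    trace.append("process other properties")
    trace.append("calculate uncertainties")
    trace.append("output")
    return trace


for step in run_evaluation({"phase diagram": [1.0, 2.0, 3.0]}):
    print(step)
```

In this sketch a block with too few experimental points triggers the "add predicted values" step, which mirrors the slide's assumption of an imperfect data source.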
71. Property Types
[Phase diagram figure; recoverable labels]
- Single Phase Region (1 phase, 2 variables): properties, e.g., Density
- Phase Boundary (2 phases, 1 variable): properties, e.g., Vapor Pressure
- Triple Point (3 phases, 0 variables): properties, e.g., Ttp, ΔHfus
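The phase and variable counts on this slide follow from the Gibbs phase rule for a non-reacting system, F = C - P + 2, applied to a single component (C = 1). A minimal Python check is shown here; the function name is chosen for this illustration.

```python
def degrees_of_freedom(components: int, phases: int) -> int:
    """Gibbs phase rule for a non-reacting system: F = C - P + 2."""
    if components < 1 or phases < 1:
        raise ValueError("components and phases must be positive")
    freedom = components - phases + 2
    if freedom < 0:
        raise ValueError("more coexisting phases than the phase rule allows")
    return freedom


# Pure compound (C = 1), matching the three cases on the slide.
print(degrees_of_freedom(1, 1))  # single phase region
print(degrees_of_freedom(1, 2))  # phase boundary
print(degrees_of_freedom(1, 3))  # triple point
```

The same function covers mixtures: a binary system with two coexisting phases, as in a VLE experiment, has two independent variables.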
72. Automated decisions
Need for estimated data
Add estimated values
Selection of model
Fit properties
Number of parameters
Enforce consistency within block
Recognition of bad data
Enforce inter-block consistency
Successful?
73. Thermodynamic consistency conditions
- In-block
  - Equal vapor pressures at triple points; slope/ΔHtrs consistency
  - Convergence of condensed phase boundary to triple point
  - Convergence of gas and liquid saturation density curves at Tc
  - Infinite first derivatives of saturated densities at Tc
  - Single phase densities converged to saturated densities
- Inter-block
  - Vapor pressure, Saturated densities, Enthalpy of vaporization
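The three inter-block quantities are linked by the Clapeyron equation, dP/dT = ΔvapH / (T (V_gas - V_liq)), where the molar volumes follow from the saturated densities. The Python sketch below shows how such a link could be checked; it is a generic illustration rather than TDE's procedure, the function names and tolerance are assumptions, and the inputs are round placeholder numbers, not measurements.

```python
def clapeyron_enthalpy(
    temperature: float,   # K
    dp_dt: float,         # Pa/K, slope of the vapor pressure curve
    rho_gas: float,       # kg/m^3, saturated gas density
    rho_liq: float,       # kg/m^3, saturated liquid density
    molar_mass: float,    # kg/mol
) -> float:
    """Enthalpy of vaporization (J/mol) implied by the Clapeyron equation."""
    v_gas = molar_mass / rho_gas  # molar volume of saturated gas, m^3/mol
    v_liq = molar_mass / rho_liq  # molar volume of saturated liquid, m^3/mol
    return temperature * (v_gas - v_liq) * dp_dt


def is_consistent(reported: float, implied: float, rel_tol: float) -> bool:
    """True if a reported enthalpy agrees with the implied one within rel_tol."""
    return abs(reported - implied) <= rel_tol * abs(implied)


# Round placeholder inputs chosen so the arithmetic is easy to follow.
implied = clapeyron_enthalpy(400.0, 1000.0, 1.0, 1000.0, 0.1)
print(implied)
print(is_consistent(40000.0, implied, rel_tol=0.01))
print(is_consistent(30000.0, implied, rel_tol=0.01))
```

A reported enthalpy of vaporization that disagrees with the implied value by more than the combined uncertainties points to an inconsistency in at least one of the three properties.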
74. Uncertainty Calculation
- Use of uncertainties of primary experimental values
- Re-assessment of stored uncertainties
- Weighting of source data
- Account for data density
- Covariance matrices
- Combination of statistical and experimental parts
- Empirical adjustments
Uncertainties can be propagated in process and
equipment design...
75. Manual Structure Drawing
76. Experimental and critically evaluated (by TDE) vapor pressures for biphenyl
[Plot legend: marked points are Rejected Data]
77. Blue lines are critically evaluated sublimation and vapor pressures for biphenyl with consistency enforcement. Orange: Data rejected by TDE
[Figure labels: Liquid, Crystal, Ttp]
78. Deviation plots with experimental uncertainties are shown with one click. Individual data sets can be highlighted (in red) and identified (full traceability).
[Plot legend: highlighted symbol marks a particular data set]
79. Enthalpy of vaporization for biphenyl
Available data are highly limited and inconsistent (wrong slope)
This inset shows data evaluated with uncertainties by TDE
The curve is based on the vapor pressure curve,
predicted and experimental volumetric properties,
and is constrained in slope and value at Tc.
80. Application Advantages
- Automated generation of consistent recommended values (on-demand results in minutes vs. months or years for traditional static methods)
- Can be applied to hypothetical compounds
- Requests for compound data can be input as drawn structures
- Full set of properties for pure compounds is always generated (predictions w/ +/-)
- Estimated uncertainties for all recommended data
- Can be used to develop new and validate old models
- Reveals published experimental errors
- Provides a comprehensive data source for process simulation