NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Management, Data Collection, Delivery and Standardization of Communications, and Critical Evaluation of Thermodynamic Data (A possible model for Kinetic Data)

1
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient
Management, Data Collection, Delivery and
Standardization of Communications, and Critical
Evaluation of Thermodynamic Data (A possible
model for Kinetic Data)
NIST/Gaithersburg, April 21, 2004
Rob Chirico, Michael Frenkel, Vladimir Diky, Qian Dong
Thermodynamics Research Center (TRC)
National Institute of Standards and Technology (NIST)
Boulder, Colorado
2
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient
Collection, Critical Evaluation, Delivery, and
Exchange of THERMODYNAMIC DATA
  • Talk Outline: Description and Linkage of Major
    Components
  • SOURCE
    • Relational database for archiving experimental
      thermodynamic data
  • Guided Data Capture (GDC)
    • Software for mass-scale data collection
  • ThermoML
    • XML-based formats for efficient data delivery
      and exchange
  • Thermodynamic Data Engine (TDE)
    • Software for on-demand critical evaluation
      (→ Recommended Values)

All Components are Multipurpose and Interconnected
3
The word "data" will come up a lot. What are we
talking about?
  • 6 Data Types
    • True Data
    • Experimental Data
    • Predicted Data
    • Derived Data
    • Critically Evaluated Data
    • Virtual Data

2 of these will be mentioned only once
4
True Data (Hypothetical). Exact property values
for a system of defined chemical composition in a
specified state, with the following
characteristics:
  • unique and permanent,
  • independent of any experiment or sample, and
  • a hypothetical concept with no known values.

5
Experimental Data. Those obtained as the result
of a particular experiment on a particular sample
by a particular investigator. The feature that
distinguishes experimental data from predicted
and critically evaluated data is use of a
chemical sample including characterization of its
origin and purity.
6
Predicted Data. Those obtained through
application of a predictive model or method, such
as a particular molecular-dynamics,
corresponding-states, or group-contribution method.
7
Derived Data. Derived data can be defined as
property values calculated by mathematical
operations from other data, possibly including
experimental, predicted, and critically evaluated
data.
  • Examples: azeotropic properties, Henry's law
    constants, virial coefficients, activities and
    activity coefficients, fugacities and fugacity
    coefficients, and standard properties derived
    from high-precision adiabatic heat-capacity
    calorimetry

8
  • Critically Evaluated Data. Critically evaluated
    data are recommended property values generated
    through consideration of available experimental
    data, predicted data, or both.
  • There is no particular sample involved with
    critically evaluated data.
  • The feature that distinguishes critically
    evaluated data from predicted data is the
    involvement of the judgment of a data evaluator
    or evaluation system.
  • No distinction is made between values derived by
    traditional static data-evaluation methods and
    those from the proposed dynamic methods for
    critical evaluation.
  • Critically evaluated data represent the best
    approximation of true data based on the current
    state of knowledge.

9
  • Virtual Data. Virtual data can be defined as
    numerical and metadata information of unknown
    pedigree whose connection to reality is tenuous.
  • No provision is made for virtual data in
    ThermoML or in any other component of the system.

10
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient
Collection, Critical Evaluation, Delivery, and
Exchange of THERMODYNAMIC DATA
  • Talk Outline: Description and Linkage of Major
    Components
  • SOURCE
    • Relational database for comprehensive storage of
      experimental thermodynamic data
  • Guided Data Capture (GDC)
    • Software for mass-scale data collection
  • ThermoML
    • XML-based formats for efficient data delivery
      and exchange
  • Thermodynamic Data Engine (TDE)
    • Software for on-demand critical evaluation
      (→ Recommended Values)

All Components are Multipurpose and Interconnected
11
  • SOURCE Database
    • Relational Database (Oracle 9i)
    • ≈120 thermodynamic and thermophysical properties
    • Chemical systems with 1, 2, or 3 components
    • Reactions with up to 8 participants
  • Includes
    • Full bibliographic info (with article abstracts)
    • Sample descriptions (source, purification,
      purity determination methods)
    • Brief experimental method descriptions
    • Complete metadata for property specification
      (phases, constraints, etc.)
    • Uncertainty estimates (sample, method, article
      info)
    • Numerical values
  • Data Types
    • Experimental data only, plus selected derived
      data
    • No evaluated, predicted, or virtual data

12
  • Structure is based on the Gibbs Phase Rule
  • Accommodates a wide variety of reported data
    representations (absolute, ratio, difference, a
    variety of composition measures, etc.)
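Basing the structure on the Gibbs phase rule means the number of independent state variables in a property specification is fixed: F = C − P + 2 for a non-reacting system of C components in P phases. The sketch below is a minimal illustration, assuming a hypothetical `PropertySpec` record (not the actual SOURCE table layout) and assuming that each degree of freedom is either a varied variable or a fixed constraint.

```python
from dataclasses import dataclass, field


def degrees_of_freedom(n_components: int, n_phases: int) -> int:
    """Gibbs phase rule for a non-reacting system: F = C - P + 2."""
    if n_components < 1 or n_phases < 1:
        raise ValueError("need at least one component and one phase")
    f = n_components - n_phases + 2
    if f < 0:
        raise ValueError("more coexisting phases than the phase rule allows")
    return f


@dataclass
class PropertySpec:
    # Hypothetical record; the field names are illustrative only.
    prop: str
    n_components: int
    phases: list[str]
    variables: list[str] = field(default_factory=list)    # varied within the data set
    constraints: list[str] = field(default_factory=list)  # held fixed within the data set

    def is_fully_specified(self) -> bool:
        # Every degree of freedom must be accounted for as a variable or a constraint.
        f = degrees_of_freedom(self.n_components, len(self.phases))
        return len(self.variables) + len(self.constraints) == f


# Density of a binary liquid vs T and x1 at fixed pressure: F = 2 - 1 + 2 = 3.
spec = PropertySpec("mass density", 2, ["liquid"], ["T", "x1"], ["p"])
print(spec.is_fully_specified())  # True
```

A vapor-pressure data set for a pure compound (two phases, F = 1) that listed both T and p as variables would fail this check, which is the kind of over-specification a phase-rule-based structure rejects.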

13
  • SOURCE statistics
    • The world's largest collection of experimental
      thermodynamic property values for pure organic
      compounds, mixtures of two and three components,
      and reactions
    • 1.5 million experimental property values
      covering
      • 17,000 pure compounds
      • 16,000 mixtures
      • 4,000 reactions
    • Expansion rate ≈0.4 to 0.5 million values/year
      (through cooperation with peer-reviewed journals
      and in-house activities; this is 20-fold larger
      than any other thermodynamic data collection
      operation)

15
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient
Collection, Critical Evaluation, Delivery, and
Exchange of THERMODYNAMIC DATA
  • Talk Outline: Description and Linkage of Major
    Components
  • SOURCE
    • Relational database for storage of experimental
      thermodynamic data
  • Guided Data Capture (GDC)
    • Software for mass-scale data collection
  • ThermoML
    • XML-based formats for efficient data delivery
      and exchange
  • Thermodynamic Data Engine (TDE)
    • Software for on-demand critical evaluation
      (→ Recommended Values)

All Components are Multipurpose and Interconnected
16
  • Guided Data Capture (GDC) software
  • Purpose: mass-scale abstraction from the
    literature of experimental thermophysical and
    thermochemical property data for organic chemical
    systems involving one, two, and three components,
    chemical reactions, and chemical equilibria.
  • Property values are captured with a strictly
    hierarchical system based upon rigorous
    application of the thermodynamic constraints of
    the Gibbs phase rule.
  • Full traceability to source documents
  • Emphasis on data-quality issues, both in terms of
    data accuracy and data integrity
    • Simple data checks
    • Enforced and flexible data specification
      (phases, variables, constraints, compositions,
      etc.)
  • Users:
    • In-house undergraduates (10 from CU and CSM)
    • Journal authors worldwide (JCED, JCT, and
      growing)

17
What does data capture entail?
(Enforced Hierarchical Structure)
  • REFERENCE (title, authors, keywords, abstract,
    etc.)
    • Integrated database of author names and journal
      titles included
  • COMPOUND(S) (name, CASRN, empirical formula)
    • Integrated database of more than 10⁶ compound
      names and synonyms
  • SAMPLE(S) (source, purity, purification method,
    analytical method)
    • Methods may be selected from pre-defined lists or
      entered directly
  • MIXTURE IDENTIFICATION
    • Composed of previously identified COMPOUNDS
  • PROPERTY SPECIFICATION (specification of phases,
    variables, constraints, etc.)
    • All property, phase, variable, and constraint
      selections are made from pre-defined lists
    • Brief method descriptions are entered through
      pre-defined lists or direct typing
    • Estimates for variable, constraint, and property
      values, if available
    • VLE and LLE data are entered with a special DATA
      TABLES form
  • NUMERICAL VALUE ENTRY
    • Direct copy-and-paste operations from
      pre-existing tables (HTML, PDF, ASCII, EXCEL,
      WORD, etc.)
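The enforced hierarchy can be pictured as nested records where lower levels may only point at entities already defined above them. A minimal sketch, assuming hypothetical class names that are not GDC's internal types: a mixture can be built only from compounds already registered under the reference, so an undefined component is rejected at entry time rather than discovered later.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str
    casrn: str
    formula: str


@dataclass
class Reference:
    """Top of a hypothetical capture hierarchy; everything else hangs below it."""
    title: str
    authors: list[str]
    compounds: dict[str, Compound] = field(default_factory=dict)
    mixtures: list[tuple[str, ...]] = field(default_factory=list)

    def add_compound(self, compound: Compound) -> None:
        self.compounds[compound.casrn] = compound

    def add_mixture(self, *casrns: str) -> tuple[str, ...]:
        # Enforce the hierarchy: a mixture may only use compounds already identified.
        missing = [r for r in casrns if r not in self.compounds]
        if missing:
            raise KeyError(f"compound(s) not yet identified: {missing}")
        if len(casrns) not in (2, 3):
            raise ValueError("mixtures are limited to two or three components")
        self.mixtures.append(tuple(casrns))
        return tuple(casrns)


ref = Reference("Example article", ["A. Author"])
ref.add_compound(Compound("water", "7732-18-5", "H2O"))
ref.add_compound(Compound("ethanol", "64-17-5", "C2H6O"))
ref.add_mixture("7732-18-5", "64-17-5")
```

Trying `ref.add_mixture("7732-18-5", "71-43-2")` before benzene has been registered raises `KeyError`, which mirrors the rule that mixtures are composed of previously identified compounds.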

18
Navigation Tree (User Interface)
  • Grows as info is added
  • Any line can be accessed for editing
  • Compound synonyms are available
19
  • Metadata: phases, constraints, variables, units,
    uncertainties
  • Numerical data
  • Names in plain English
  • Graphical representation
The Navigation Tree is in the background and is not
shown.
20
All uncertainty information is captured on a
single form. Terminology follows international
recommendations (the GUM). Absolute or relative (%)
values can be used.
21
Check data by automatic plotting
22
Review Plot
23
Result appears in the Tree
A new entry appears in the navigation tree
Continue with next data set...
24
Multiple data sets are created automatically for
over-determined systems. This is common in LLE and
VLE experiments.
  • One-click plotting
  • Simple typo detection
  • Connecting lines are drawn automatically
  • Plot can be zoomed
  • Partial plots are available
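One way to flag a likely typo on a smooth property curve is to compare each interior point with the straight line through its two neighbors and report points whose deviation is far larger than the typical (median) deviation. This is an illustrative heuristic chosen for the sketch, not the GDC algorithm; the factor of 5 and the synthetic data are assumptions.

```python
from statistics import median


def flag_typos(x, y, factor=5.0, abs_tol=1e-9):
    """Return indices of points that deviate strongly from their neighbors.

    Assumes x is sorted and y varies smoothly with x (e.g., density vs temperature).
    """
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need at least four (x, y) pairs of equal length")
    dev = []
    for i in range(1, len(x) - 1):
        # Value expected at x[i] from the straight line through its two neighbors.
        t = (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1])
        y_line = y[i - 1] + t * (y[i + 1] - y[i - 1])
        dev.append(abs(y[i] - y_line))
    threshold = max(factor * median(dev), abs_tol)
    flagged = []
    for j, d in enumerate(dev):
        left = dev[j - 1] if j > 0 else 0.0
        right = dev[j + 1] if j + 1 < len(dev) else 0.0
        # One bad value also disturbs the check at its two neighbors,
        # so only the local maximum of the deviation is reported.
        if d > threshold and d >= left and d >= right:
            flagged.append(j + 1)
    return flagged


# Synthetic smooth curve with one deliberately corrupted value (not real data).
temps = [280.0 + 10.0 * i for i in range(11)]
values = [1000.0 - 0.01 * (t - 270.0) ** 2 for t in temps]
values[5] += 50.0
print(flag_typos(temps, values))  # [5]
```

End points are never flagged by this check because they have only one neighbor; in practice a plot, as on this slide, remains the way to catch those.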

25
Summary: Major Features of the GDC Software
  • Guides extraction of information from the
    literature
  • Assures full traceability from bibliographic
    info to the numerical values
  • Assures completeness through
    • Data definitions (phases, variables, etc.)
    • Consistency checks (range, variable type, etc.)
    • Strict adherence to the Gibbs Phase Rule
  • Minimizes typing errors
    • Extensive use of pre-defined lists
    • Extensive compound and author name databases
      are included
    • Tables of data are captured with simple
      cut/paste operations
  • Frees compiler from all knowledge of database
    structure and formats
    • NO special codes
    • Formats are close to those of original documents
    • Conversion to standard names, formats, units,
      etc. is transparent
  • Simple graphical data display for detection of
    anomalous values
    • One-click plotting for any dataset
  • Detects hidden properties automatically (one
    click)
    • e.g., pure-component OR binary data within a VLE
      dataset
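As an example of a "hidden property": in a binary VLE table, rows at x1 = 0 or x1 = 1 are really vapor pressures of the pure components. A minimal sketch of how such rows could be separated, assuming each row holds T, p, x1 and y1 (the `VLEPoint` type and its field names are hypothetical):

```python
from typing import NamedTuple


class VLEPoint(NamedTuple):
    T: float   # temperature, K
    p: float   # pressure, kPa
    x1: float  # liquid-phase mole fraction of component 1
    y1: float  # vapor-phase mole fraction of component 1


def split_hidden_pure(rows, tol=1e-9):
    """Separate pure-component vapor pressures hidden in a binary VLE table."""
    pure1, pure2, binary = [], [], []
    for r in rows:
        if abs(r.x1 - 1.0) <= tol:
            pure1.append((r.T, r.p))  # x1 = 1: vapor pressure of component 1
        elif abs(r.x1) <= tol:
            pure2.append((r.T, r.p))  # x1 = 0: vapor pressure of component 2
        else:
            binary.append(r)          # genuine two-component equilibrium point
    return pure1, pure2, binary


# Illustrative rows only, not measured values.
rows = [VLEPoint(350.0, 101.3, 0.0, 0.0),
        VLEPoint(345.0, 101.3, 0.3, 0.5),
        VLEPoint(340.0, 101.3, 1.0, 1.0)]
pure1, pure2, binary = split_hidden_pure(rows)
```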

27
Available for free download from the Web
Extensive examples for specific data types are
available on the Web.
28
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient
Collection, Critical Evaluation, Delivery, and
Exchange of THERMODYNAMIC DATA
  • Talk Outline: Description and Linkage of Major
    Components
  • SOURCE
    • Relational database for storage of experimental
      thermodynamic data
  • Guided Data Capture (GDC)
    • Software for mass-scale data collection
  • ThermoML
    • XML-based formats for efficient data delivery
      and exchange
  • Thermodynamic Data Engine (TDE)
    • Software for on-demand critical evaluation
      (→ Recommended Values)

All Components are Multipurpose and Interconnected
29
ThermoML: XML-Based Approach to Store and
Exchange Thermophysical and Thermochemical Data
  • Developed in close cooperation with the DIPPR 901
    Project
  • Scope: properties of pure compounds, mixtures,
    and chemical reactions
  • Meta- and numerical data records grouped into
    nested blocks
  • Elements of the Gibbs Phase Rule at the core of
    the schema
  • IUPAC terminology used for meta- and numerical
    data tagging
  • Very limited use of abbreviations
  • Various methods of numerical data presentation
  • Types of data to be covered, all with
    uncertainties:
    • experimental
    • critically evaluated
    • predicted
    • equation representation
  • Extensive validation was done with SOURCE
    (more than 9,000 data sets from more than 7,500
    publications)
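To make "meta- and numerical data records grouped into nested blocks" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names are placeholders invented for illustration; they are not the element names defined in the published ThermoML schema, and the numbers are placeholders rather than data.

```python
import xml.etree.ElementTree as ET


def build_record(title, compound, prop, unit, points):
    """Nest citation, compound and property blocks.

    The tag names are illustrative placeholders, NOT the ThermoML schema's element names.
    points: (temperature in K, property value, standard uncertainty) tuples.
    """
    root = ET.Element("DataFile")
    citation = ET.SubElement(root, "Citation")
    ET.SubElement(citation, "Title").text = title
    comp = ET.SubElement(root, "Compound")
    ET.SubElement(comp, "Name").text = compound
    block = ET.SubElement(root, "PropertyData")
    ET.SubElement(block, "Property", unit=unit).text = prop
    ET.SubElement(block, "Variable", unit="K").text = "temperature"
    for temp, value, u in points:
        point = ET.SubElement(block, "Point")
        ET.SubElement(point, "VariableValue").text = repr(temp)
        ET.SubElement(point, "PropertyValue").text = repr(value)
        # Every numerical value travels with its own uncertainty.
        ET.SubElement(point, "Uncertainty").text = repr(u)
    return root


# Placeholder values for illustration only.
record = build_record("Example article", "benzene", "vapor pressure", "kPa",
                      [(300.0, 13.8, 0.05)])
xml_text = ET.tostring(record, encoding="unicode")
```

Because metadata (citation, compound, property definition) sit in their own blocks, any consumer can locate the numerical values with simple paths such as `PropertyData/Point/PropertyValue`.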


  • 16

30
ThermoML: XML-Based Approach to Store and
Exchange Thermophysical and Thermochemical Data
(continued)
  • 1) ThermoML framework for experimental data
    published
    (Journal of Chemical and Engineering Data, 2003,
    48, 2-13)
  • 2) ThermoML extension to cover various measures
    of uncertainty, conforming with the Guide to the
    Expression of Uncertainty in Measurement, ISO
    (International Organization for Standardization),
    October 1993
    (Journal of Chemical and Engineering Data, 2003,
    48, 1344-1359)
  • 3) The last major extension has been completed
    and covers critically evaluated data, predicted
    data, and equation representations
    (Journal of Chemical and Engineering Data, May
    2004)
  • The combination of GDC and ThermoML is used to
    generate ThermoML files for the data submitted by
    the authors; these are posted on the TRC Web site
    (in place for J. Chem. Eng. Data and J. Chem.
    Thermodyn.).
  • Expansion of the cooperation to other journals is
    planned to be in place by the end of 2004.

31
ThermoML was developed in cooperation with DIPPR
32
ThermoML General structure
33
ThermoML Citation Block
34
ThermoML Compound Sample Description
  • Sample description
  • source/initial purity
  • purification method(s)
  • final purity
  • purity determination method(s)

Planned: incorporation of the IUPAC-NIST Chemical
Identifier (INChI)
38
ThermoML Reaction Types
General categories for easy organization
  • combustion with oxygen
  • combustion with other elements or compounds
  • addition of various compounds to unsaturated
    compounds
  • addition of water to a liquid or solid to produce
    a hydrate
  • atomization or formation from atoms
  • esterification
  • exchange of alkyl groups
  • exchange of hydrogen atoms with other groups
  • formation of a compound from elements in their
    stable state
  • halogenation - addition of or replacement by a
    halogen
  • hydrogenation - addition of hydrogen molecules to
    unsaturated compounds
  • hydrohalogenation
  • hydrolysis of ions
  • other reactions with water
  • ion exchange
  • neutralization
  • oxidation with oxidizing agents other than oxygen
  • oxidation with oxygen
  • homonuclear dimerization

40
ThermoML Reaction Property
Methods are always associated with each property
Data may be experimental, predicted, or critically evaluated
Solvent and catalyst can be specified
42
ThermoML Reaction Constraints and Variables
45
Paper 2. Representation of Uncertainties
46
Representation of Uncertainty in GDC and ThermoML
  • All quantities related to the expression of
    uncertainty conform to the Guide to the
    Expression of Uncertainty in Measurement (a.k.a.
    the GUM), ISO (International Organization for
    Standardization), October 1993.
    • Issued jointly by BIPM, IEC, IFCC (Int. Fed. of
      Clinical Chem.), ISO, IUPAC (Int. Union of Pure
      and Appl. Chem.), IUPAP (Int. Union of Pure and
      Appl. Phys.), and OIML
  • Uncertainties are represented for variables,
    constraints, and properties.
  • Combined (i.e., propagated) uncertainties are
    included for properties only.
  • Common representations of "precision" are
    included (repeatability, deviations from a fitted
    curve, device specifications)
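A combined (propagated) uncertainty in the GUM sense follows the law of propagation of uncertainty; for uncorrelated inputs, u_c²(y) = Σ (∂f/∂x_i)² u²(x_i), and an expanded uncertainty is U = k·u_c. A minimal sketch, with the sensitivity coefficients estimated by central finite differences (a shortcut chosen here for brevity, not something the GUM or ThermoML prescribes) and a density example picked purely for illustration:

```python
import math


def combined_standard_uncertainty(f, x, u, rel_step=1e-6):
    """u_c(y) for y = f(*x) with uncorrelated inputs x and standard uncertainties u.

    Sensitivity coefficients c_i = df/dx_i are estimated by central differences.
    """
    total = 0.0
    for i, (xi, ui) in enumerate(zip(x, u)):
        h = rel_step * (abs(xi) or 1.0)
        up = list(x)
        up[i] = xi + h
        down = list(x)
        down[i] = xi - h
        c_i = (f(*up) - f(*down)) / (2.0 * h)
        total += (c_i * ui) ** 2
    return math.sqrt(total)


def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c; k = 2 gives about 95 % coverage for a normal distribution."""
    return k * u_c


def density(m, V):
    return m / V


# Illustrative inputs: mass 10.0 g (u = 0.001 g), volume 12.5 cm3 (u = 0.01 cm3).
u_c = combined_standard_uncertainty(density, [10.0, 12.5], [0.001, 0.01])
U = expanded_uncertainty(u_c)
```

Correlated inputs need the full covariance form of the same law; the diagonal form above is the simplest case.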

47
ThermoML Specification of Uncertainty Information
Precisions
Uncertainties
48
ThermoML Uncertainty Values
49
Paper 3. Representation of Critically Evaluated
Data, Predicted Data, and Equation Representation
50
ThermoML Predicted Data
51
ThermoML Predicted Data Types
  • ab initio
  • molecular dynamics
  • semi-empirical quantum
  • statistical mechanics
  • corresponding states
  • correlation
  • group contribution
52
ThermoML Critically Evaluated Data
53
Structure of the ReactionData block. The arrows
indicate new elements for equation representation.
54
ThermoMLEquation schema
  • Provides mathematical definition of an equation
  • Imports the established MathML schema (does not
    reinvent the wheel for mathematics)
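Reusing MathML works because its content markup already encodes expression trees that any program can walk. As an illustration only, here is a small evaluator for a subset of MathML content elements (`apply`, `ci`, `cn`, and the `plus`, `minus`, `times`, `divide`, `power`, `exp`, `ln` operators). The equation, ln p = A − B/(T + C), and the parameter values are assumptions for the example; keeping the equation definition separate from the parameter values mirrors, in spirit only, the split between equation definitions and data files.

```python
import math
import xml.etree.ElementTree as ET

# ln p = A - B / (T + C), written as MathML content markup.
MATHML_EXPR = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><minus/>
    <ci>A</ci>
    <apply><divide/><ci>B</ci><apply><plus/><ci>T</ci><ci>C</ci></apply></apply>
  </apply>
</math>
"""


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix that ElementTree adds to qualified tags.
    return tag.rsplit("}", 1)[-1]


def evaluate(node: ET.Element, env: dict[str, float]) -> float:
    kind = _local(node.tag)
    if kind == "math":
        return evaluate(node[0], env)
    if kind == "cn":
        return float(node.text)
    if kind == "ci":
        return env[node.text.strip()]
    if kind == "apply":
        op, *args = list(node)
        vals = [evaluate(a, env) for a in args]
        name = _local(op.tag)
        if name == "plus":
            return math.fsum(vals)
        if name == "times":
            return math.prod(vals)
        if name == "minus":
            return -vals[0] if len(vals) == 1 else vals[0] - vals[1]
        if name == "divide":
            return vals[0] / vals[1]
        if name == "power":
            return vals[0] ** vals[1]
        if name == "exp":
            return math.exp(vals[0])
        if name == "ln":
            return math.log(vals[0])
        raise ValueError(f"unsupported operator: {name}")
    raise ValueError(f"unsupported element: {kind}")


params = {"A": 14.0, "B": 2800.0, "C": -50.0}  # hypothetical fitted parameters
expr = ET.fromstring(MATHML_EXPR.strip())
ln_p = evaluate(expr, {**params, "T": 350.0})
```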

55
Schemas and Equation Representation Files
(exploits the modularity of XML)
  • Schemas
    • MathML Schema: Mathml2.xsd (www.w3.org)
    • ThermoML Schema: ThermoML.xsd
      (www.trc.nist.gov)
    • ThermoML Equation Definition Schema:
      ThermoMLEquation.xsd (www.trc.nist.gov);
      imports the MathML Schema
  • Equation Representation Files
    • ThermoML Data Files: data filename.xml (any
      location); reference the ThermoML Schema for
      validation; name the Equation Definition File
      and store variables, fitted parameters,
      constants, etc.
    • Equation Definition Files: equation filename.xml
      (Internet URL); reference the ThermoML Equation
      Definition Schema for validation
56
Where to find the equation definition
57
Equation Definition
Thermodynamic Data Definition
58
ThermoML has been accepted as the basis for
development of an XML-based IUPAC standard
59
Participants of the Task Group meeting for IUPAC
project 'XML-based IUPAC standard for
experimental and critically evaluated
thermodynamic property data storage and capture':
Prof. W. A. Wakeham, Dr. A. R. H. Goodwin, Dr. A.
I. Johns, Dr. M. Satyro, Dr. D. Lide, Dr. M.
Frenkel, Dr. M. Schmidt, Prof. K. N. Marsh, Dr.
J. W. Magee, Dr. J. H. Dymond.
61
ThermoML files are posted on the Web
Links to ThermoML files available for free
download
62
2003 JCED:
  • Issue 1: 3 articles
  • Issue 2: 21 articles
  • Issue 3: 30 articles
  • Issue 4: 35 articles
Link to ThermoML file for the individual article
63
A ThermoML file available for free download
65
Global Data Communication Process
  • International Association of Chemical
    Thermodynamics (IACT)
  • Committee on Printed and Electronic Publication
    (CPEP)
  • IUPAC
  • Industrial Engineers
  • Reader Software
  • Guided Data Capture (GDC)
  • Measurements: J. Chem. Eng. Data (ACS), J. Chem.
    Thermodyn. (Elsevier), Fluid Phase Equilibria
    (Elsevier), Int. J. Thermophys. (Kluwer),
    Thermochimica Acta (Elsevier)
  • Applications: AspenTech (USA), Virtual Materials
    Grp. (Canada), Natl Engineering Lab (UK)
  • WEB
  • Fiz Chemie (Germany)
  • Industry: DIPPR
66
  • Talk Outline: Description and Linkage of Major
    Components
  • SOURCE
    • Relational database for storage of experimental
      thermodynamic data
  • Guided Data Capture (GDC)
    • Software for mass-scale data collection
  • ThermoML
    • XML-based formats for efficient data delivery
      and exchange
  • Thermodynamic Data Engine (TDE)
    • Software for critical evaluation
      (→ on-demand Recommended Values)

67
The Data Pipeline: Experimentalists → Users
  • Experimental data sources: researchers,
    journals, reports, in-house research, etc.
    (data types, chemical systems)
  • Guided Data Capture (GDC): structured
    thermodynamic data collection software
  • SOURCE: comprehensive experimental data archive
  • ThermoDataEngine (TDE): dynamic data evaluation
    software, ON DEMAND; captured art of critical
    evaluation (models, predictions, correlations,
    consistency enforcement, etc.)
  • Errors, inconsistencies, redundancies
  • ThermoML: data exchange standard
  • Data requests: fast, flexible, on demand
  • User data requests: researchers, process
    simulators, etc.
68
Major TDE Project Features
  • Representation of separate properties
  • Consistency enforcement
  • An imperfect data source is assumed (robust data
    rejection)
  • Extension of results and validation with
    predicted values
  • Fully automated and transparent decisions
  • Flexible default models (adjustments based on
    data quality)
  • Secondary fitting (popular alternative
    representations are provided)
  • Includes comprehensive uncertainty estimation
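Assuming an imperfect data source means a fit must survive bad points. A minimal sketch of one common approach: fit, scale each residual by the point's own stated uncertainty, drop the worst point if it exceeds a cutoff, refit, and repeat. The model ln p = A − B/T, the weighted least squares, and the cutoff of 3 are assumptions for illustration; the slides do not describe TDE's actual rejection rule.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def fit_with_rejection(T, ln_p, u, cutoff=3.0, min_points=3):
    """Fit ln p = A - B/T, rejecting one point per pass while |residual/u| > cutoff."""
    keep = list(range(len(T)))
    rejected = []
    while True:
        x = [1.0 / T[i] for i in keep]
        y = [ln_p[i] for i in keep]
        w = [1.0 / u[i] ** 2 for i in keep]
        a, b = weighted_line_fit(x, y, w)
        # Residuals scaled by each point's own stated uncertainty.
        z = {i: abs(ln_p[i] - (a + b / T[i])) / u[i] for i in keep}
        worst = max(z, key=z.get)
        if z[worst] <= cutoff or len(keep) <= min_points:
            return a, -b, rejected  # A, B, and indices of rejected points
        keep.remove(worst)
        rejected.append(worst)


# Synthetic data with one corrupted point (illustrative only).
T = [300.0, 310.0, 320.0, 330.0, 340.0, 350.0]
ln_p = [15.0 - 3000.0 / t for t in T]
ln_p[2] += 0.5
A, B, rejected = fit_with_rejection(T, ln_p, [0.01] * len(T))
```

Removing only the single worst point per pass matters: one bad value also inflates its neighbors' residuals, so rejecting everything above the cutoff at once would discard good data.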

69
Property Blocks
  • Phase Diagram
    • Triple point and critical T
    • Phase boundary P
  • Volumetric
    • Critical density
    • Saturated and single-phase densities
    • Volumetric coefficients
  • Energetic
    • Energy differences
    • Energy derivatives
    • Speed of sound
  • Other
    • Transport properties
    • Surface tension
    • Refraction

70
General Algorithm
Load from SOURCE
→ Trivial normalization
→ Non-trivial normalization within block
→ First property block
→ Add predicted values
→ Select models and fit properties
→ Enforce consistency within block
→ Enforce inter-block consistency
→ Next block? (Y: repeat for the next block; N: continue)
→ Process Other properties
→ Calculate uncertainties
→ Output
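The control flow of the chart can be written as a short driver loop. The sketch below is one reading of the chart, not TDE's code: the order of the two consistency steps follows the "Automated decisions" slide, normalization is assumed to run over all blocks before the block loop, and the `step` callback is a hypothetical stand-in that only records what it is asked to do.

```python
def run_evaluation(blocks, other_props, step):
    """Call step(action, target) in the order of one reading of the flowchart."""
    step("load from SOURCE", None)
    step("trivial normalization", None)
    for block in blocks:
        step("non-trivial normalization within block", block)
    for block in blocks:  # the "Next block?" loop
        step("add predicted values", block)
        step("select models and fit properties", block)
        step("enforce consistency within block", block)
        step("enforce inter-block consistency", block)
    for prop in other_props:  # e.g., transport properties, surface tension
        step("process other property", prop)
    step("calculate uncertainties", None)
    step("output", None)


# Hypothetical step function that only records the actions it is asked to perform.
log = []
run_evaluation(["phase diagram", "volumetric", "energetic"],
               ["viscosity", "surface tension"],
               lambda action, target: log.append((action, target)))
```

Processing the blocks in a fixed order (phase diagram first) lets later blocks be constrained by results already fixed for earlier ones, which is what inter-block consistency requires.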
71
Property Types
  • Single Phase Region (1 phase, 2 variables):
    properties, e.g., density
  • Phase Boundary (2 phases, 1 variable):
    properties, e.g., vapor pressure
  • Triple Point (3 phases, 0 variables):
    properties, e.g., Ttp, ΔHfus
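For a pure compound, the variable counts on this slide follow directly from the Gibbs phase rule with C = 1:

```latex
\begin{aligned}
F &= C - P + 2 = 3 - P \qquad (C = 1)\\
P = 1 &\;\Rightarrow\; F = 2 \quad \text{single-phase region, e.g. } \rho(T, p)\\
P = 2 &\;\Rightarrow\; F = 1 \quad \text{phase boundary, e.g. } p_{\mathrm{sat}}(T)\\
P = 3 &\;\Rightarrow\; F = 0 \quad \text{triple point, e.g. } T_{\mathrm{tp}},\ \Delta H_{\mathrm{fus}}(T_{\mathrm{tp}})
\end{aligned}
```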
72
Automated decisions
Need for estimated data
Add estimated values
Selection of model
Fit properties
Number of parameters
Enforce consistency within block
Recognition of bad data
Enforce inter-block consistency
Successful?
73
Thermodynamic consistency conditions
  • In-block
    • Equal vapor pressures at triple points;
      slope/ΔHtrs consistency
    • Convergence of condensed-phase boundary to the
      triple point
    • Convergence of gas and liquid saturation density
      curves at Tc
    • Infinite first derivatives of saturated densities
      at Tc
    • Single-phase densities converge to saturated
      densities
  • Inter-block
    • Vapor pressure, saturated densities, and enthalpy
      of vaporization
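The inter-block link between vapor pressure, saturated densities, and enthalpy of vaporization is the Clapeyron equation, ΔvapH = T (V_m,gas − V_m,liq) dp_sat/dT. A minimal consistency-check sketch, assuming the vapor pressure is represented as ln(p/Pa) = A − B/T (so dp/dT = pB/T²) and molar volumes come from saturated densities; the functional form, the 2-sigma criterion, and the names are assumptions, not TDE's implementation.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def dvap_h_clapeyron(T, A, B, rho_gas, rho_liq, M):
    """Enthalpy of vaporization (J/mol) from the Clapeyron equation.

    Vapor pressure: ln(p/Pa) = A - B/T, so dp/dT = p*B/T**2.
    rho_gas, rho_liq: saturated densities in kg/m3; M: molar mass in kg/mol.
    """
    p = math.exp(A - B / T)
    dp_dT = p * B / T ** 2
    dV = M / rho_gas - M / rho_liq  # molar volume change on vaporization, m3/mol
    return T * dV * dp_dT


def consistent(dvap_h_reported, dvap_h_calc, u, k=2.0):
    """True if reported and Clapeyron values agree within k standard uncertainties."""
    return abs(dvap_h_reported - dvap_h_calc) <= k * u
```

As a sanity check, with an ideal-gas vapor (rho_gas = pM/(RT)) and a negligible liquid volume, the result reduces to R·B.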

74
Uncertainty Calculation
  • Use of uncertainties of primary experimental
    values
  • Re-assessment of stored uncertainties
  • Weighting of source data
  • Account for data density
  • Covariance matrices
  • Combination of statistical and experimental parts
  • Empirical adjustments

Uncertainties can be propagated in process and
equipment design...
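Carrying fit-parameter covariance into the uncertainty of a recommended value is commonly done by first-order propagation, u²(y) = J Σ Jᵀ, where J holds the derivatives of the model with respect to its parameters. A sketch for the model ln p = A − B/T with a hypothetical covariance matrix; TDE's actual combination of statistical and experimental parts and its empirical adjustments are not reproduced here.

```python
import math


def propagate(jac, cov):
    """Standard uncertainty of y from parameter covariance: sqrt(J Sigma J^T)."""
    n = len(jac)
    var = sum(jac[i] * cov[i][j] * jac[j] for i in range(n) for j in range(n))
    # Clip tiny negative values caused by floating-point round-off.
    return math.sqrt(max(var, 0.0))


def u_ln_p(T, cov_AB):
    """Uncertainty of ln p = A - B/T at temperature T; d(ln p)/dA = 1, d(ln p)/dB = -1/T."""
    return propagate([1.0, -1.0 / T], cov_AB)
```

Off-diagonal terms matter: strongly correlated A and B can largely cancel, so treating them as independent would overstate the uncertainty of ln p.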
75
Manual Structure Drawing
76
Experimental and critically evaluated (by TDE)
vapor pressures for biphenyl
Marked points: rejected data
77
Plot regions: Crystal and Liquid, with the triple
point (Ttp) marked.
Blue lines are critically evaluated sublimation
and vapor pressures for biphenyl with consistency
enforcement. Orange: data rejected by TDE.
78
Marked points: a particular data set
Deviation plots with experimental uncertainties
are shown with one click. Individual data sets
can be highlighted (in red) and identified (full
traceability).
79
Enthalpy of vaporization for biphenyl
Available data are highly limited and inconsistent
(wrong slope).
This inset shows data evaluated with
uncertainties by TDE.
The curve is based on the vapor pressure curve and
on predicted and experimental volumetric
properties, and is constrained in slope and value
at Tc.
80
Application Advantages
  • Automated generation of consistent recommended
    values
    (on-demand results in minutes vs. months or
    years for traditional static methods)
  • Can be applied to hypothetical compounds
  • Requests for compound data can be input as drawn
    structures
  • A full set of properties for pure compounds is
    always generated (predictions with ±
    uncertainties)
  • Estimated uncertainties for all recommended data
  • Can be used to develop new models and validate
    old ones
  • Reveals published experimental errors
  • Provides a comprehensive data source for process
    simulation
