0f0ed6bb8e7d1e624da97fc11b2ff8b8.ppt
- Number of slides: 81
NIST/Gaithersburg April 21, 2004 NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Management, Data Collection, Delivery and Standardization of Communications, and Critical Evaluation of Thermodynamic Data (A possible model for Kinetic Data) Rob Chirico, Michael Frenkel, Vladimir Diky, Qian Dong Thermodynamics Research Center (TRC) National Institute of Standards and Technology (NIST) Boulder, Colorado Chemical Science and Technology Laboratory
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Collection, Critical Evaluation, Delivery, and Exchange of THERMODYNAMIC DATA
Talk Outline – Description and Linkage of Major Components
• SOURCE: Relational database for archiving experimental thermodynamic data
• Guided Data Capture (GDC): Software for mass-scale data collection
• ThermoML: XML-based formats for efficient data delivery & exchange
• Thermodynamic Data Engine (TDE): Software for on-demand critical evaluation (Recommended Values)
All Components are Multipurpose & Interconnected
The word “data” will come up a lot. What are we talking about…
6 Data Types:
• True Data
• Experimental Data
• Predicted Data
• Derived Data
• Critically Evaluated Data
• Virtual Data
2 of these will be mentioned only once…
True Data (Hypothetical). Exact property values for a system of defined chemical composition in a specified state, with the following characteristics:
• unique and permanent,
• independent of any experiment or sample, and
• a hypothetical concept with no known values.
Experimental Data. Those obtained as the result of a particular experiment on a particular sample by a particular investigator. The feature that distinguishes experimental data from predicted and critically evaluated data is the use of a chemical sample, including characterization of its origin and purity.
Predicted Data. Those obtained through application of a predictive model or method, such as a particular molecular dynamics, corresponding states, or group contribution method.
Derived Data. Derived data can be defined as property values calculated by mathematical operations from other data, possibly including experimental, predicted, and critically evaluated data.
• Examples: azeotropic properties, Henry’s Law constants, virial coefficients, activities and activity coefficients, fugacities and fugacity coefficients, and standard properties derived from high-precision adiabatic heat-capacity calorimetry
Critically Evaluated Data. Critically evaluated data are recommended property values generated through consideration of available experimental data, predicted data, or both.
• There is no particular sample involved with critically evaluated data.
• The feature that distinguishes critically evaluated data from predicted data is the involvement of the judgment of a data evaluator or evaluation system.
• No distinction is made between values derived by traditional ‘static’ data-evaluation methods and proposed ‘dynamic’ methods for critical evaluation.
• They represent the best approximation of ‘true’ data based on the current state of knowledge.
Virtual Data. Virtual data can be defined as numerical and metadata information of unknown pedigree whose connection to reality is tenuous.
• No provision for coverage in ThermoML or any other aspect
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Collection, Critical Evaluation, Delivery, and Exchange of THERMODYNAMIC DATA
Talk Outline – Description and Linkage of Major Components
• SOURCE: Relational database for comprehensive storage of experimental thermodynamic data
• Guided Data Capture (GDC): Software for mass-scale data collection
• ThermoML: XML-based formats for efficient data delivery & exchange
• Thermodynamic Data Engine (TDE): Software for on-demand critical evaluation (Recommended Values)
All Components are Multipurpose & Interconnected
SOURCE Database
• Relational Database (Oracle 9i)
• 120 Thermodynamic and Thermophysical Properties
  - Chemical systems with 1, 2, or 3 components
  - Reactions with up to 8 participants
• Includes
  - Full bibliographic info (with article abstracts)
  - Sample descriptions (source, purification, purity determination methods)
  - Brief experimental method descriptions
  - Complete metadata for property specification (phases, constraints, etc.)
  - Uncertainty estimates (sample, method, article info)
  - Numerical values
• Data Types
  - Experimental data only, plus selected derived data
  - No evaluated, predicted, or virtual data
• Purpose: Data archive for use with other components.
• Structure is based on the Gibbs Phase Rule
• Accommodates a wide variety of reported data representations (absolute, ratio, difference, a variety of composition measures, etc.)
SOURCE statistics:
• The largest collection in the world of experimental thermodynamic property values for pure organic compounds, mixtures of two and three components, and reactions.
• 1.5 million experimental property values covering:
  - 17,000 pure compounds
  - 16,000 mixtures
  - 4000 reactions
• Expansion rate: 0.4 to 0.5 million values/year, through cooperation with peer-reviewed journals and in-house activities. (This is 20-fold larger than any other thermodynamic data collection operation.)
SOURCE DATA ENTRY: PLANNED PROGRESS. TARGET: 3 MILLION DATA POINTS BY 2006
[Chart: data entry progress by year, 2001 through 2006, with future years marked]
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Collection, Critical Evaluation, Delivery, and Exchange of THERMODYNAMIC DATA
Talk Outline – Description and Linkage of Major Components
• SOURCE: Relational database for storage of experimental thermodynamic data
• Guided Data Capture (GDC): Software for mass-scale data collection
• ThermoML: XML-based formats for efficient data delivery & exchange
• Thermodynamic Data Engine (TDE): Software for on-demand critical evaluation (Recommended Values)
All Components are Multipurpose & Interconnected
Guided Data Capture (GDC) software
Purpose: Mass-scale abstraction from the literature of experimental thermophysical and thermochemical property data for organic chemical systems involving one, two, and three components, chemical reactions, and chemical equilibria.
• Property values are captured with a strictly hierarchical system based upon rigorous application of the thermodynamic constraints of the Gibbs phase rule.
• Full traceability to source documents
• Emphasis on data-quality issues, both in terms of data accuracy and data integrity
  - Simple data checks
  - Enforced and flexible data specification (phases, variables, constraints, compositions, etc.)
• USERS:
  - In-house: undergraduates (10 from CU and CSM)
  - Journal authors worldwide (JCED, JCT, and growing)
What does data “capture” entail? (Enforced Hierarchical Structure)
(1) REFERENCE (title, authors, keywords, abstract, etc.)
  • Integrated database of author names & journal titles included
(2) COMPOUND(S) (name, CASRN, empirical formula)
  • Integrated database of >10^6 compound names & synonyms
(3) SAMPLE(S) (source, purity, purification method, analytical method)
  • Methods may be selected from pre-defined lists or entered directly
(4) MIXTURE IDENTIFICATION
  • Composed of previously identified COMPOUNDS
(5) PROPERTY SPECIFICATION (specification of phases, variables, constraints, etc.)
  - All property, phase, variable, and constraint selections are made from pre-defined lists
  - Brief method descriptions are entered through pre-defined lists or direct typing
  - Estimates for variable, constraint, and property values, if available
  - VLE and LLE data are entered with a special “DATA TABLES” form
(6) NUMERICAL VALUE ENTRY
  - Direct “copy-and-paste” operations with pre-existing tables (HTML, PDF, ASCII, EXCEL, WORD, etc.)
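The enforced hierarchy on this slide can be pictured as nested records in which each level may only refer to items captured at an earlier level. The following is a minimal sketch under that reading; every class and field name is a hypothetical illustration and does not reflect the actual GDC or SOURCE data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names throughout: this is NOT the real GDC/SOURCE data model.

@dataclass
class Sample:
    source: str
    purity: Optional[float] = None      # e.g. mass fraction, if reported
    purification_method: str = ""
    analytical_method: str = ""

@dataclass
class Compound:
    name: str
    casrn: str
    formula: str
    samples: List[Sample] = field(default_factory=list)

@dataclass
class DataSet:
    prop: str                           # property name, e.g. "vapor pressure"
    phases: List[str]
    variables: List[str]
    constraints: List[str]
    rows: List[List[float]] = field(default_factory=list)  # pasted numerical table

@dataclass
class Mixture:
    components: List[Compound]
    data_sets: List[DataSet] = field(default_factory=list)

@dataclass
class Reference:
    title: str
    authors: List[str]
    compounds: List[Compound] = field(default_factory=list)
    mixtures: List[Mixture] = field(default_factory=list)

def add_mixture(ref: Reference, components: List[Compound]) -> Mixture:
    """Step (4): a mixture may only be built from compounds already identified."""
    for c in components:
        if c not in ref.compounds:
            raise ValueError(f"compound {c.name!r} must be identified first")
    mix = Mixture(components=list(components))
    ref.mixtures.append(mix)
    return mix
```

Raising an error when a mixture names an unidentified compound is one simple way to model the "enforced" part of the structure; the real software guides the user through forms instead.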
Navigation Tree (“User Interface”):
- Grows as info is added
- Any line can be accessed for editing
- Compound synonyms are available
Metadata: phases, constraints, variables, units, uncertainties. Numerical data. Names in plain English. Graphical representation. The Navigation Tree is in the back and is not shown.
All uncertainty information is captured on a single form. Terminology follows international recommendations (the GUM). Absolute or % values can be used.
Check data by automatic plotting
Review Plot
Result appears in the “Tree”: a new entry appears in the navigation tree. Continue with the next data set...
Multiple data sets are created automatically for over-determined systems (common in LLE and VLE experiments). “One click” plotting:
- Simple typo detection
- Connecting lines are automatic
- Plot can be “zoomed”
- Partial plots available
Summary: Major Features of the GDC software
• Guides extraction of information from the literature
• Assures full traceability from bibliographic info to the numerical values
• Assures completeness through
  - Data definitions (phases, variables, etc.)
  - Consistency checks (range, variable type, etc.)
  - Strict adherence to the Gibbs Phase Rule
• Minimizes typing errors
  - Extensive use of pre-defined lists
  - Extensive compound & author name databases are included
  - Tables of data are captured with simple cut/paste operations
• Frees the “compiler” from all knowledge of database structure & formats
  - NO special codes
  - Formats are close to those of original documents
  - Conversion to standard names, formats, units, etc. is transparent
• Simple graphical data display for detection of anomalous values
  - One-click plotting for any dataset
• Detects “hidden” properties automatically (one click)
  - e.g., pure component OR binary data within a VLE dataset
Available for free download from the Web. Extensive examples for specific data types are available on the Web.
NEW ERA IN TECHNICAL DATA COMMUNICATIONS: Efficient Collection, Critical Evaluation, Delivery, and Exchange of THERMODYNAMIC DATA
Talk Outline – Description and Linkage of Major Components
• SOURCE: Relational database for storage of experimental thermodynamic data
• Guided Data Capture (GDC): Software for mass-scale data collection
• ThermoML: XML-based formats for efficient data delivery & exchange
• Thermodynamic Data Engine (TDE): Software for on-demand critical evaluation (Recommended Values)
All Components are Multipurpose & Interconnected
ThermoML – XML-Based Approach to Store and Exchange Thermophysical and Thermochemical Data
• Developed in close cooperation with DIPPR 901 Project
• Scope: properties of pure compounds, mixtures, and chemical reactions
• Meta- and numerical data records grouped into ‘nested blocks’
• Elements of the Gibbs Phase Rule are at the ‘core’ of the schema
• IUPAC terminology used for meta- and numerical data tagging
• Very limited use of abbreviations
• Various methods of numerical data presentation
• Types of data to be covered: experimental, critically evaluated, predicted, and equation representation, all with uncertainties
• Extensive validation was done with SOURCE (more than 9,000 data sets from more than 7,500 publications)
ThermoML – XML-Based Approach to Store and Exchange Thermophysical and Thermochemical Data (continued)
• 1) ThermoML framework for experimental data published (Journal of Chemical and Engineering Data, 2003, 48, 2-13)
• 2) ThermoML extension to cover various measures of uncertainty; conforms with the Guide to the Expression of Uncertainty in Measurement, ISO (International Organization for Standardization), October 1993 (Journal of Chemical and Engineering Data, 2003, 48, 1344-1359)
• 3) Last major extension was completed and covers critically evaluated data, predicted data, and equation representations (Journal of Chemical and Engineering Data, May 2004)
• A combination of GDC and ThermoML is used to generate ThermoML files for the data submitted by authors; the files are posted on the TRC Web site. (In place for J. Chem. Eng. Data and J. Chem. Thermodyn.)
• Expansion of the cooperation to other journals is planned to be in place by the end of 2004.
ThermoML was developed in cooperation with DIPPR
ThermoML: General structure
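As a rough illustration of what "nested blocks" of metadata and numerical data look like in XML, the sketch below builds a small document with Python's standard `xml.etree.ElementTree`. The element names (`DataReport`, `Citation`, `Compound`, `PureOrMixtureData`, `Property`, `NumValues`, and so on) are placeholders chosen for readability and are assumptions, not the published ThermoML schema; the numbers in the usage are made up.

```python
import xml.etree.ElementTree as ET

# Element names are illustrative placeholders, NOT the real ThermoML schema.
def build_report(title, compound_name, prop_name, points):
    """Build a nested-block XML tree.

    points: iterable of (variable_value, property_value, uncertainty) tuples.
    """
    root = ET.Element("DataReport")

    # Block 1: bibliographic information
    citation = ET.SubElement(root, "Citation")
    ET.SubElement(citation, "Title").text = title

    # Block 2: compound identification
    compound = ET.SubElement(root, "Compound")
    ET.SubElement(compound, "Name").text = compound_name

    # Block 3: property metadata followed by numerical values
    data = ET.SubElement(root, "PureOrMixtureData")
    prop = ET.SubElement(data, "Property")
    ET.SubElement(prop, "Name").text = prop_name
    for variable_value, property_value, uncertainty in points:
        row = ET.SubElement(data, "NumValues")
        ET.SubElement(row, "VariableValue").text = str(variable_value)
        ET.SubElement(row, "PropertyValue").text = str(property_value)
        ET.SubElement(row, "Uncertainty").text = str(uncertainty)
    return root

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    tree = build_report("Example article", "example compound",
                        "Vapor pressure, kPa", [(300.0, 10.0, 0.1)])
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping metadata and numbers in the same tree is what lets a reader program recover the full context of every value without a separate key.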
ThermoML: Citation Block
ThermoML: Compound & Sample Description
Sample description:
• source/initial purity
• purification method(s)
• final purity
• purity determination method(s)
Planned: incorporation of the IUPAC-NIST Chemical Identifier (INChI)
ThermoML: Reaction Data Block
ThermoML: Reaction Participant
ThermoML: Reaction Data Block
ThermoML: Reaction Types
General categories for easy organization
• combustion with oxygen
• combustion with other elements or compounds
• addition of various compounds to unsaturated compounds
• addition of water to a liquid or solid to produce a hydrate
• atomization or formation from atoms
• esterification
• exchange of alkyl groups
• exchange of hydrogen atoms with other groups
• formation of a compound from elements in their stable state
• halogenation - addition of or replacement by a halogen
• hydrogenation - addition of hydrogen molecules to unsaturated compounds
• hydrohalogenation
• hydrolysis of ions
• other reactions with water
• ion exchange
• neutralization
• oxidation with oxidizing agents other than oxygen
• oxidation with oxygen
• homonuclear dimerization
• polymerization - all other types
• solvolysis - solvents other than water
• stereoisomerization
• structural isomerization
• other reactions
ThermoML: Reaction Data Block
ThermoML: Reaction Property
• Solvent & catalyst can be specified
• Methods are always associated with each property: Experimental, Predicted, Critically Evaluated
ThermoML: Reaction Data Block
ThermoML: Reaction Constraints & Variables
ThermoML: Reaction Data Block
ThermoML: Reaction Numerical Values
Paper 2. Representation of Uncertainties
Representation of Uncertainty in GDC & ThermoML
• All quantities related to the expression of uncertainty conform to the Guide to the Expression of Uncertainty in Measurement (a.k.a. the GUM), ISO (International Organization for Standardization), October 1993. Participating organizations: BIPM, IEC, IFCC (Int. Fed. of Clinical Chem.), ISO, IUPAC (Int. Union of Pure & Appl. Chem.), IUPAP (Int. Union of Pure & Appl. Phys.), and OIML.
• Uncertainties are represented for variables, constraints, and properties.
• Combined (i.e., propagated) uncertainties are included for properties only.
• Common representations of "precision" are included (repeatability, deviations from a fitted curve, device specifications).
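For readers unfamiliar with the GUM vocabulary, the sketch below shows the two quantities that matter most here: a combined standard uncertainty, formed as the root sum of squares of uncorrelated standard-uncertainty components, and an expanded uncertainty obtained by multiplying by a coverage factor k. The function names are illustrative and are not part of GDC or ThermoML; sensitivity coefficients are assumed to be already folded into each component.

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-of-squares combination of uncorrelated standard uncertainties.

    Each component is assumed to be expressed in the units of the property,
    i.e. already multiplied by its sensitivity coefficient.
    """
    return math.sqrt(sum(u * u for u in components))

def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c (k = 2 is a common choice)."""
    return k * u_c

def percent_to_absolute(value, percent):
    """Convert an uncertainty given in % of the value to an absolute one."""
    return abs(value) * percent / 100.0

if __name__ == "__main__":
    u_c = combined_standard_uncertainty([3.0, 4.0])
    print(u_c, expanded_uncertainty(u_c), percent_to_absolute(200.0, 0.5))
```

The percent conversion mirrors the GDC form, which accepts either absolute or % values.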
ThermoML: Specification of Uncertainty Information (Precisions; Uncertainties)
ThermoML: Uncertainty Values
Paper 3. Representation of Critically Evaluated Data, Predicted Data, and Equation Representation
ThermoML: Predicted Data
ThermoML: Predicted Data Types
• ab initio
• molecular dynamics
• semi-empirical quantum
• statistical mechanics
• corresponding states
• correlation
• group contribution
ThermoML: Critically Evaluated Data
Structure of the ReactionData block. The arrows indicate new elements for equation representation.
ThermoMLEquation schema
• Provides mathematical definition of an equation
• Imports the established MathML schema (does not reinvent the wheel for mathematics)
Equation Representation (exploits modularity of XML)
• MathML Schema: Mathml2.xsd (www.w3.org). Imported by the ThermoML Equation Definition Schema.
• ThermoML Equation Definition Schema: ThermoMLEquation.xsd (www.trc.nist.gov). Reference for validation of Equation Definition Files, [equation filename].xml (Internet URL & files).
• ThermoML Schema: ThermoML.xsd (www.trc.nist.gov). Reference for validation of ThermoML Data Files, [data filename].xml (any location). A data file names the Equation Definition File and stores variables, fitted parameters, constants, etc.
Where to find the equation definition
Linking the thermodynamic data definition (ThermoML) to the equation definition (ThermoMLEquation)
• Property, Constraint, Variable to EqProperty, EqConstraint, EqVariable: linked through nPropertyNumber, nConstraintNumber, nVarNumber
• EqProperty, EqConstraint, EqVariable to the equation definition: linked through sEqSymbol & indexes
• EqParameter: linked through sEqParSymbol, nEqParIndex
• EqConstant: linked through sEqConstantSymbol, nEqConstantIndex
• Covariance to EqParameter: linked through nEqParNumber
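The linking idea on this slide, where an equation record refers to data records by number and to the equation definition by symbol, can be sketched with plain dictionaries. Only the identifier names `nVarNumber` and `sEqSymbol` are taken from the slide; the record layout, the other keys, and the helper function are assumptions made for illustration.

```python
# Hypothetical, simplified records. Only the key names nVarNumber and
# sEqSymbol come from the slide; everything else is an assumption.
data_variables = [
    {"nVarNumber": 1, "name": "Temperature, K", "values": [300.0, 350.0]},
    {"nVarNumber": 2, "name": "Pressure, kPa", "values": [101.3, 101.3]},
]

equation_record = {
    # Symbol used in the equation definition -> number of the data variable
    "variables": [
        {"sEqSymbol": "T", "nVarNumber": 1},
        {"sEqSymbol": "p", "nVarNumber": 2},
    ],
}

def resolve_variables(equation, variables):
    """Map each equation symbol to the values of the data variable it names."""
    by_number = {v["nVarNumber"]: v for v in variables}
    resolved = {}
    for eq_var in equation["variables"]:
        number = eq_var["nVarNumber"]
        if number not in by_number:
            raise KeyError(f"no data variable with nVarNumber={number}")
        resolved[eq_var["sEqSymbol"]] = by_number[number]["values"]
    return resolved

if __name__ == "__main__":
    print(resolve_variables(equation_record, data_variables))
```

Linking by number rather than by name keeps the equation definition reusable: the same definition file can serve any data file that supplies matching variables.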
ThermoML has been accepted as the basis for development of an XML-based IUPAC standard
Participants of the Task Group meeting for the IUPAC project 'XML-based IUPAC standard for experimental and critically evaluated thermodynamic property data storage and capture': Prof. W. A. Wakeham, Dr. A. R. H. Goodwin, Dr. A. I. Johns, Dr. M. Satyro, Dr. D. Lide, Dr. M. Frenkel, Dr. M. Schmidt, Prof. K. N. Marsh, Dr. J. W. Magee, Dr. J. H. Dymond.
ThermoML files are posted on the Web. Links to ThermoML files are available for free download.
2003 JCED: Issue 1: 3 articles; Issue 2: 21 articles; Issue 3: 30 articles; Issue 4: 35 articles. Each entry links to the ThermoML file for the individual article.
A ThermoML file available for free download
Global Data Communication Process
• Organizations: International Association of Chemical Thermodynamics (IACT), Committee on Printed & Electronic Publication (CPEP), IUPAC
• Thermodynamicists: measurements
• Journals: J. Chem. Eng. Data (ACS), J. Chem. Thermodyn. (Elsevier), Fluid Phase Equilibria (Elsevier), Thermochimica Acta (Elsevier), Int. J. Thermophys. (Kluwer)
• Guided Data Capture (GDC), NIST/TRC, WEB
• “Reader” software and applications: AspenTech (USA), Virtual Materials Grp. (Canada), Nat’l Engineering Lab (UK), FIZ Chemie (Germany)
• Industry: DIPPR
• Industrial Engineers
Talk Outline – Description and Linkage of Major Components
• SOURCE: Relational database for storage of experimental thermodynamic data
• Guided Data Capture (GDC): Software for mass-scale data collection
• ThermoML: XML-based formats for efficient data delivery & exchange
• Thermodynamic Data Engine (TDE): Software for on-demand critical evaluation (Recommended Values)
The Data Pipeline: Experimentalists to Users (Data Types & Chemical Systems)
• Experimental Data Sources: researchers, journals, reports, in-house research, etc.
• Guided Data Capture (GDC): structured thermodynamic data collection software
• SOURCE: comprehensive experimental data archive
• ThermoDataEngine (TDE): dynamic data evaluation software, ON DEMAND. Captured art of critical evaluation: models, predictions, correlations, consistency enforcement, etc.
• Errors, inconsistencies, & redundancies
• ThermoML: data exchange standard
• User Data Requests: researchers, process simulators, etc. (fast, flexible, on demand)
Major TDE Project Features
• Representation of separate properties
• Consistency enforcement
• An imperfect data source is assumed (robust data rejection)
• Extension of results & validation with predicted values
• Fully automated & transparent decisions
• Flexible default models (adjustments based on data quality)
• Secondary fitting (popular alternative representations are provided)
• Includes comprehensive uncertainty estimation
Property Blocks
• Phase Diagram
  – Triple Point & Critical T
  – Phase Boundary P
• Volumetric
  – Critical Density
  – Saturated & Single Phase Densities
  – Volumetric Coefficients
• Energetic
  – Energy Differences
  – Energy Derivatives
  – Speed of Sound
• Other
  – Transport Properties
  – Surface Tension
  – Refraction
General Algorithm (flowchart boxes): Load from SOURCE; Trivial normalization; First property block; Non-trivial normalization within block; Add predicted values; Enforce inter-block consistency; Select models & fit properties; Process “Other” properties; Enforce consistency within block; Calculate uncertainties; Output. Decision: Next block? (Y/N)
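The flowchart can be read as a loop over property blocks followed by cross-block steps. The sketch below records only the control flow; the ordering of the boxes is an assumed reading of the diagram (the extracted text does not preserve the arrows), and the function and block names are hypothetical, not TDE code.

```python
from typing import Callable, List

# Assumed block names; "Other" properties are handled after the loop.
BLOCKS = ["phase diagram", "volumetric", "energetic"]

def run_evaluation(blocks: List[str], step: Callable[[str], None]) -> None:
    """Call `step` once per processing stage, in an ASSUMED flowchart order."""
    step("load from SOURCE")
    step("trivial normalization")
    for block in blocks:  # "First property block" ... "Next block? Y"
        step(f"{block}: non-trivial normalization within block")
        step(f"{block}: add predicted values")
        step(f"{block}: select models & fit properties")
        step(f"{block}: enforce consistency within block")
    # "Next block? N": continue with the cross-block stages
    step("enforce inter-block consistency")
    step('process "Other" properties')
    step("calculate uncertainties")
    step("output")

if __name__ == "__main__":
    trace: List[str] = []
    run_evaluation(BLOCKS, trace.append)
    print("\n".join(trace))
```

Passing the stage handler in as a callable keeps the sketch runnable without pretending to implement any of the actual fitting or consistency logic.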
Property Types
• Single Phase Region (1 phase, 2 variables): properties such as density
• Phase Boundary (2 phases, 1 variable): properties such as vapor pressure
• Triple Point (3 phases, 0 variables): properties such as Ttp, ΔHfus
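The variable counts on this slide follow from the Gibbs phase rule, F = C − P + 2, applied to a pure compound (C = 1). A minimal sketch, with an illustrative function name:

```python
def degrees_of_freedom(components: int, phases: int) -> int:
    """Gibbs phase rule for a non-reacting system: F = C - P + 2."""
    if components < 1 or phases < 1:
        raise ValueError("components and phases must be positive")
    f = components - phases + 2
    if f < 0:
        raise ValueError("too many coexisting phases for this many components")
    return f

if __name__ == "__main__":
    # Pure compound (C = 1), matching the three cases on the slide
    for label, phases in [("single phase region", 1),
                          ("phase boundary", 2),
                          ("triple point", 3)]:
        print(label, degrees_of_freedom(1, phases))
```

The same function gives F = 2 for a binary mixture at vapor-liquid equilibrium (C = 2, P = 2), which is why a data set that reports more than two independent quantities for such a system is over-determined.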
Automated decisions (processing step: decision made)
• Add estimated values: need for estimated data
• Fit properties: selection of model, number of parameters
• Enforce consistency within block: recognition of bad data
• Enforce inter-block consistency: successful?
Thermodynamic consistency conditions
In-block:
• Equal vapor pressures at triple points & slope/ΔHtrs consistency
• Convergence of condensed phase boundary to triple point
• Convergence of gas and liquid saturation density curves at Tc
• Infinite first derivatives of saturated densities at Tc
• Single phase densities converge to saturated densities
Inter-block:
• Vapor pressure + saturated densities + enthalpy of vaporization
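The inter-block condition ties three separately measured properties together. The standard relation that does this is the Clapeyron equation, written here as an illustration; the slide does not state which form TDE uses, and the symbols below are introduced for this note only.

```latex
% Clapeyron equation along the liquid-gas saturation line.
% p_sat: vapor pressure, T: temperature, \Delta_{vap}H: molar enthalpy of vaporization,
% \rho_g, \rho_l: saturated gas and liquid mass densities, M: molar mass.
\[
  \frac{\mathrm{d}p_{\mathrm{sat}}}{\mathrm{d}T}
  = \frac{\Delta_{\mathrm{vap}}H}{T\,\bigl(V_{g} - V_{l}\bigr)}
  = \frac{\Delta_{\mathrm{vap}}H}{T\,M\,\bigl(1/\rho_{g} - 1/\rho_{l}\bigr)}
\]
```

Given any two of the vapor pressure slope, the saturated densities, and the enthalpy of vaporization, the third is fixed, so independently evaluated curves for all three can be checked against each other.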
Uncertainty Calculation
• Use of uncertainties of primary experimental values
• Re-assessment of stored uncertainties
• Weighting of source data
• Account for data density
• Covariance matrices
• Combination of statistical and experimental parts
• Empirical adjustments
Uncertainties can be propagated in process and equipment design...
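One reason covariance matrices appear in this list is that fitted parameters are usually correlated, and ignoring the off-diagonal terms misstates the uncertainty of any value computed from the fit. The sketch below applies the standard first-order propagation formula u² = J C Jᵀ for a scalar result; it illustrates the general technique and is not TDE's actual procedure. The function name is hypothetical.

```python
import math
from typing import List, Sequence

def propagate_uncertainty(jacobian: Sequence[float],
                          covariance: Sequence[Sequence[float]]) -> float:
    """Standard uncertainty of a scalar y = f(p) from u^2 = J C J^T.

    jacobian[i] is the partial derivative dy/dp_i at the point of interest;
    covariance is the parameter covariance matrix (square, symmetric).
    """
    n = len(jacobian)
    if len(covariance) != n or any(len(row) != n for row in covariance):
        raise ValueError("covariance must be an n x n matrix")
    variance = 0.0
    for i in range(n):
        for j in range(n):
            variance += jacobian[i] * covariance[i][j] * jacobian[j]
    # Guard against tiny negative values caused by rounding
    return math.sqrt(max(variance, 0.0))

if __name__ == "__main__":
    # y = a + b*T, so J = [1, T]; the covariance values are made up.
    T = 2.0
    cov: List[List[float]] = [[1.0, -0.25], [-0.25, 0.25]]
    print(propagate_uncertainty([1.0, T], cov))
```

In this made-up example the negative covariance between the two parameters lowers the propagated variance from 2.0 (diagonal terms only) to 1.0.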
Manual Structure Drawing
Experimental and critically evaluated (by TDE) vapor pressures for biphenyl. Legend: rejected data.
Blue lines are critically evaluated sublimation and vapor pressures for biphenyl with consistency enforcement. Orange = data rejected by TDE. Plot labels: Crystal, Liquid, Ttp.
Deviation plots with ‘experimental’ uncertainties are shown with one click. Individual data sets can be highlighted (in red) and identified (full traceability). Legend: a particular data set.
Enthalpy of vaporization for biphenyl. Available data are highly limited and inconsistent (wrong slope). This inset shows data evaluated with uncertainties by TDE. The curve is based on the vapor pressure curve and on predicted and experimental volumetric properties, and is constrained in slope and value at Tc.
Application & Advantages
• Automated generation of consistent recommended values
  – On-demand: results in minutes vs. months or years for traditional static methods
• Can be applied to hypothetical compounds
  – Requests for compound data can be input as drawn structures
• A full set of properties for pure compounds is always generated (predictions with +/-)
• Estimated uncertainties for all recommended data
• Can be used to develop new models and validate old ones
• Reveals published experimental errors
• Provides a comprehensive data source for process simulation
The Data Pipeline: Experimentalists to Users (Data Types & Chemical Systems)
• Experimental Data Sources: researchers, journals, reports, in-house research, etc.
• Guided Data Capture (GDC): structured thermodynamic data collection software
• SOURCE: comprehensive experimental data archive (in place)
• ThermoDataEngine (TDE): dynamic data evaluation software, ON DEMAND. Captured art of critical evaluation: models, predictions, correlations, consistency enforcement, etc.
• Errors, inconsistencies, & redundancies
• ThermoML: data exchange standard
• User Data Requests: researchers, process simulators, etc. (fast, flexible, on demand)