e4551a5f6d9b4db6d962f72cf28a3f9c.ppt
- Number of slides: 37
GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR) Herschel Rabitz Department of Chemistry, Princeton University, Princeton, New Jersey 08544
HDMR Methodology • HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
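The expansion itself did not survive the text extraction. For reference, the HDMR expansion as it is usually written in the literature is given below; the notation (n inputs, component functions f_0, f_i, f_ij) is assumed here rather than copied from the slide.

```latex
f(\mathbf{x}) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12 \ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is a constant, f_i(x_i) describes the independent contribution of input x_i, and f_ij(x_i, x_j) describes the pairwise cooperative contribution of x_i and x_j.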
HDMR Methodology (Contd.) • HDMR component functions are optimally defined as: - where are unconditional and conditional probability density functions:
RS (Random Sampling) – HDMR (Contd.)
• RS-HDMR component functions are approximated by expansions of orthonormal polynomials
- Inputs can be sampled independently and/or in a correlated fashion
- Only one set of data is needed to determine all of the component functions
- Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
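A minimal sketch of how first-order expansion coefficients could be estimated from a single random sample, assuming independent inputs that are uniform on [0, 1] and shifted Legendre polynomials as the orthonormal basis. The test function `model`, the sample size, and the names `PHI`, `fit_first_order`, and `first_order_component` are illustrative assumptions, not taken from the slides; the F-test truncation step is not shown.

```python
import numpy as np

# Orthonormal shifted Legendre polynomials on [0, 1] (uniform weight), degrees 1-3.
PHI = (
    lambda x: np.sqrt(3.0) * (2.0 * x - 1.0),
    lambda x: np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0),
    lambda x: np.sqrt(7.0) * (20.0 * x**3 - 30.0 * x**2 + 12.0 * x - 1.0),
)

def model(x):
    """Hypothetical test model standing in for the real system output."""
    return x[:, 0] + 2.0 * x[:, 1] ** 2

def fit_first_order(x, y):
    """Estimate f0 and coefficients alpha[i, r] by Monte Carlo averages.

    alpha[i, r] approximates the integral of (f - f0) * phi_r(x_i),
    which is valid here because the inputs are independent and uniform.
    """
    f0 = y.mean()
    alpha = np.array([
        [np.mean((y - f0) * phi(x[:, i])) for phi in PHI]
        for i in range(x.shape[1])
    ])
    return f0, alpha

def first_order_component(alpha, i, xi):
    """Evaluate the approximated component function f_i at the points xi."""
    return sum(a * phi(xi) for a, phi in zip(alpha[i], PHI))

# One set of random samples determines every component function.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(20000, 2))
y = model(x)
f0, alpha = fit_first_order(x, y)
```

The same sample `(x, y)` is reused for every input, which is the point of the "only one set of data" bullet.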
Global Sensitivity Analysis by RS-HDMR
• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions
- where the terms of the decomposition are defined as the covariances of the corresponding component functions with f(x)
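The covariance interpretation can be illustrated with a short sketch. It assumes first-order component functions fitted by least squares on an orthonormal polynomial basis, and takes each sensitivity index to be Cov(f_i, f) / Var(f). The test model, the basis degree, and the function names are hypothetical choices for illustration.

```python
import numpy as np

def basis(xi):
    """Orthonormal shifted Legendre polynomials (degrees 1 and 2) on [0, 1]."""
    return np.column_stack([
        np.sqrt(3.0) * (2.0 * xi - 1.0),
        np.sqrt(5.0) * (6.0 * xi**2 - 6.0 * xi + 1.0),
    ])

def first_order_indexes(x, y):
    """Return S_i = Cov(f_i, y) / Var(y) for each input column of x."""
    n, d = x.shape
    blocks = [basis(x[:, i]) for i in range(d)]
    design = np.hstack([np.ones((n, 1))] + blocks)
    # Least-squares fit of all first-order coefficients at once.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    var_y = np.var(y, ddof=1)
    indexes = []
    for i, block in enumerate(blocks):
        width = block.shape[1]
        c = coef[1 + width * i: 1 + width * (i + 1)]
        f_i = block @ c  # component function evaluated at the samples
        indexes.append(np.cov(f_i, y)[0, 1] / var_y)
    return np.array(indexes)

rng = np.random.default_rng(1)
x = rng.uniform(size=(5000, 3))
# Hypothetical additive model, so first-order terms explain all of the variance.
y = x[:, 0] + 2.0 * x[:, 1] ** 2 + 0.5 * x[:, 2]
S = first_order_indexes(x, y)
```

Because this test model is purely additive, the first-order indexes sum to one; for a model with cooperative terms the remainder would be carried by higher-order indexes.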
A Propellant Ignition Model Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant
A Propellant Ignition Model • 10 independent and 44 cooperative contributions of inputs were identified as significant
A Propellant Ignition Model • Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling Microenvironmental/exposure/dose modeling system Structure of TCE-PBPK model (adapted from Fisher et al., 1998)
Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• The coupled microenvironmental/pharmacokinetic model:
- Three exposure routes (inhalation, ingestion, and dermal absorption)
- Release of TCE from water into the air within the residence
- Activities of individuals and physiological uptake processes
• Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials
• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
- The amount inhaled or ingested:
- The amount absorbed:
- C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
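The two dose expressions were lost in extraction. Given the symbols defined in the last bullet, a common form for such doses is sketched below. This is an assumption offered for orientation, not a recovery of the slide's exact equations; the integration limits t_1 and t_2 and the names D_intake and D_uptake are hypothetical.

```latex
D_{\text{intake}} = \int_{t_1}^{t_2} C(t)\, IR(t)\, dt ,
\qquad
D_{\text{uptake}} = \int_{t_1}^{t_2} K_p\, C(t)\, SA(t)\, dt
```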
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution; 10,000 input-output data were generated
The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
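A sketch of how such an input sample could be drawn with NumPy. The slide does not give the ranges or modes of the seven inputs, so the values in `uniform_ranges` and `triangular_params` are placeholders chosen only to make the example run.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000  # sample size quoted on the slide

# Placeholder (low, high) ranges for the uniformly distributed inputs x1..x4.
uniform_ranges = [(0.0, 1.0)] * 4
# Placeholder (low, mode, high) parameters for the triangular inputs x5..x7.
triangular_params = [(0.0, 0.5, 1.0)] * 3

columns = [rng.uniform(lo, hi, n_samples) for lo, hi in uniform_ranges]
columns += [rng.triangular(lo, mode, hi, n_samples)
            for lo, mode, hi in triangular_params]

# One row per sample, one column per input x1..x7.
inputs = np.column_stack(columns)
```

Each row of `inputs` would then be passed through the coupled model to produce the matching output value.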
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Seven independent, fifteen 2nd-order, and one 3rd-order cooperative contributions of inputs were identified as significant
First-order sensitivity indexes
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs
The ten largest 2nd- and 3rd-order sensitivity indexes
Identification of bionetwork model parameters
Characteristics of the problem:
§ System nonlinearity
§ Limited number & type of experiments
§ Considerable biological and measurement noise
Multiple solutions exist!
Problems with traditional identification methods:
§ Provide only one or a few solutions for each parameter
§ Assume linear propagation from data noise to parameter uncertainties
The closed-loop identification protocol (CLIP):
§ Extract the full parameter distribution by global identification
§ Iteratively look for the most informative experiments for minimizing parameter uncertainty
General operation of CLIP Pre-lab analysis and design of the most informative experiments Iterative experiment optimization and data acquisition Global parameter identification
Isoleucyl-tRNA synthetase proofreading valyl-tRNAIle
* Rate constants to be identified
Okamoto and Savageau, Biochemistry, 23: 1701-1709 (1984)
The inversion module: identifying the rate constant distribution
The Genetic Algorithm (GA)
Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
Mutation: 1101 1111+1100 0010 1101+1100 0110
The inversion cost function
Inversion quality index Q
Typical rate constant distribution after random perturbation/control
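The two genetic operators named on this slide can be sketched on bit strings as follows. The crossover call reproduces the slide's example (parents 1101 1100 and 1111 0010 exchanging their last four bits); the mutation example and the function names are illustrative, and the slide's actual cost function and encoding of rate constants are not shown.

```python
def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(bits, index):
    """Flip the bit at the given position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

# Crossover example from the slide, with the cut after the fourth bit.
children = crossover("11011100", "11110010", 4)

# Illustrative mutation: flip the last bit of one parent.
mutant = mutate("11011100", 7)
```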
The analysis module: estimating the most informative experiments Ø Estimate the best species for monitoring system behavior Ø Determine the best species for perturbing the system Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
Optimally controlled identification: squeezing on the rate constant distribution
The control cost function
Inversion quality Non-
Feng and Rabitz, Biophys. J., 86: 1270-1281 (2004)
Feng, Rabitz, Turinici, and Le Bris, J. Phys. Chem. A, 110: 7755-7762 (2006)
Network property optimization: Observed Response Biological System Control Objective Learning Algorithm Control Design Optimal Network Performance Optimal Controls Initial Guess/ Random Control
A. Identifying the best targeted network locations for intervention
B. Identifying the optimal network control
A. Molecular target identification for network engineering
Random-sampling high dimensional model representation (RS-HDMR)
Randomly sample k
Advantages of RS-HDMR:
§ Global sensitivity analysis
§ Nonlinear component functions
§ Physically meaningful representation
§ Favorable scalability
Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105: 7765-7777 (2001)
Laboratory data on the mutants (figure labels: k10-k13 fixed; k6; k10-k13) Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87: 2195-2202 (2004)
Example: Biochemical multi-component formulation mapping
• Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs)
• ATCase activity (output) was measured for 300 random NTP concentration combinations (inputs) in the laboratory
• A second-order RS-HDMR was constructed as an input -> output map. Its accuracy is comparable with the laboratory error
The absolute error of repeated measurements
Biochemical multi-component formulation mapping The comparison of the laboratory data and the 2nd-order RS-HDMR approximation for “used” and “test” data Note: The two parallel lines are absolute error ±0.2
The s-space network identification procedure (SNIP)
Laboratory data on the transcriptional cascade (figure labels: IPTG, aTc, TetR, p(lacIq), tetR, LacI, pL(tet), lacI, EYFP, P(lac), eyfp)
aTc: x1 IPTG: x2 EYFP: y(x1, x2)
Encode: x1→x1 m1(s), x2→x2 m2(s)
Response measurement: y→y(s)
Decode: Fourier transform
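The encode/measure/decode idea can be sketched numerically. The slide does not say what the encoding functions m1(s) and m2(s) are, so cosines with distinct integer frequencies are assumed here, and the response function, its coefficients, and the input levels are invented purely for illustration.

```python
import numpy as np

def response(x1, x2):
    """Hypothetical network response with a cooperative x1*x2 term."""
    return 1.0 + 2.0 * x1 + 0.5 * x2 + 3.0 * x1 * x2

n = 256
s = np.arange(n) / n          # encoding variable sampled over one period
f1, f2 = 5, 13                # assumed modulation frequencies for x1 and x2
m1 = np.cos(2.0 * np.pi * f1 * s)
m2 = np.cos(2.0 * np.pi * f2 * s)

x1, x2 = 0.4, 0.7             # illustrative input levels

# Encode the inputs, then "measure" the response along s.
y_s = response(x1 * m1, x2 * m2)

# Decode: cosine amplitudes at each integer frequency via the real FFT.
amplitude = 2.0 * np.abs(np.fft.rfft(y_s)) / n

independent_x1 = amplitude[f1]          # contribution carried at f1
independent_x2 = amplitude[f2]          # contribution carried at f2
cooperative = amplitude[f2 - f1], amplitude[f1 + f2]  # x1*x2 shows up at f2-f1 and f1+f2
```

In this toy setting the independent terms appear at the encoding frequencies and the cooperative term appears at their sum and difference, which is how a Fourier transform can separate them.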
Nonlinear property prediction by SNIP Nonlinear, cooperative behavior revealed Unmeasured region correctly predicted Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, In preparation
SNIP application to an intracellular signaling network Laboratory single cell measurement data Sachs et al., Science, 308: 523-529 (2005)
Identified network with predictive capability Network connections identified by SNIP and Bayesian analysis Reliable SNIP prediction of Akt levels
Example: Ionospheric measured data
• The ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru, from 1957 to 1987 (8694 points)
• Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE
• Output: ionospheric critical frequencies foE
• The inputs are not controllable and not independent; the pdf of the inputs is not separable, and was not explicitly known
Ionospheric measured data The dependence of foE on the input “day” Ionosonde data distribution: the dependences between normalized input variables: year and f10.7, kp and dst for the data at 12 UT
Ionospheric measured data The accuracy of the 2nd-order RS-HDMR expansion for the output, foE
Quantitative molecular property prediction
Standard QSAR
General strategy: Molecular activity is a function of its chemical/physical/structural descriptors
Problems:
§ Overfitting (choice of descriptors)
§ Underlying physics
A simple solution: y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2 (figure axes: X1, X2)
Descriptor-free quantitative molecular property interpolation
Descriptor-free property prediction from an arbitrary substituent order
Property prediction from the optimal substituent order Cost function: Complexity of the search: N1!·N2! = 14!·8! ~ 10^15 Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107: 2066 (2003)
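The size of the search space quoted on this slide can be checked directly. The sketch below simply evaluates N1!·N2! for N1 = 14 and N2 = 8; the variable names are illustrative.

```python
import math

n1, n2 = 14, 8  # numbers of substituents at the two sites, from the slide

# Every ordering of site 1 can pair with every ordering of site 2.
n_orderings = math.factorial(n1) * math.factorial(n2)

# Order of magnitude of the search space.
exponent = math.floor(math.log10(n_orderings))
```

The exact count is about 3.5 × 10^15, consistent with the slide's order-of-magnitude figure, which is why an exhaustive search over orderings is impractical.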
Application to a chromophore transition metal complex library Before reordering After reordering Cost function: Outliers captured by the reordering algorithm Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109: 5842-5854 (2005)
Application to a drug compound library 15% of data >14, 000 compounds Cost function: Reorder Prediction
THE MODERN WAY TO DO SCIENCE*
* Adaptively under high duty cycle and automated
“You should understand the physics, write down the correct equations, and let nature do the calculations.” Peter Debye