1. Advances in methods for uncertainty and sensitivity analysis
Nicolas Devictor, CEA Nuclear Energy Division, nicolas.devictor@cea.fr
In co-operation with: Nadia PEROT, Michel MARQUES and Bertrand IOOSS (CEA), Julien JACQUES (INRIA Rhône-Alpes, PhD student), Christian LAVERGNE (Montpellier 2 University, INRIA).
International Workshop on Level 2 PSA and Severe Accident Management, Köln, Germany, March 2004
2. Introduction (1/2)
- Context: the study of the influence of uncertainties on the results of severe accident computer codes, and then on the results of Level 2 PSA (responses, hierarchy of important inputs).
- Why take uncertainty into account?
  - There are many sources of uncertainty.
  - To show their impact explicitly and traceably → a decision process that is robust against uncertainties.
  - A probabilistic framework is one of the tools for a coherent and rational treatment of uncertainties in a decision-making process.
- Some applications of the treatment of uncertainty by probabilistic methods
  - For a better understanding of a phenomenon
    - To evaluate the most influential input variables. To steer R&D.
  - For the improvement of a model or a code
    - Calibration, qualification
  - In a risk decision-making process
    - Hierarchy of contributors → interest for actions to reduce uncertainty or to define a mitigation means (for example a SAM measure)
    - Confidence intervals, probability density functions or margins
- In any analysis, we must keep in mind the modelling choices and the assumptions.
  - Typical case: a variable has a big influence on the response variability, but we have low confidence in its value.
3. Sources of uncertainties
[Diagram. Labels: Real phenomenon; Human understanding; Theory; Simplified model; Mathematics; Equations; Input variables; Model parameters; Numerical schemes; Convergence criteria; Code; Output (meaning? variability?)]
4. Introduction (2/2)
- Many methods exist, but they are often not suitable, from a theoretical point of view, when
  - the phenomena modelled by the computer code are discontinuous in the variation range of the influential parameters,
  - the input variables are statistically dependent.
- For an overview of the methods → see the paper.
- The talk mainly addresses
  - sensitivity analysis in the case of dependent input variables,
  - the validation of response surfaces,
  - the estimation of the additional error that the use of a response surface introduces in the results of the uncertainty and sensitivity analysis,
  - clustering methods, which could be useful when we want to apply statistical methods based on Monte-Carlo simulation.
5. (In this talk) "influence of uncertainties" means
- Inputs for the study
  - Probabilistic models of the uncertainties on physical variables and parameters
  - Mathematical model of the ageing or failure phenomenon
  - Acceptance criterion
- Propagation of uncertainties
  - Probability of exceeding a threshold
- Sensitivity analysis
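The propagation step can be illustrated with a crude Monte-Carlo estimate of the probability of exceeding a threshold. This is only a sketch: the function `model`, the uniform input laws and the threshold value are hypothetical stand-ins for a real code and its probabilistic input models.

```python
import random


def model(x1, x2):
    """Hypothetical response, standing in for a computer code."""
    return x1 + 2.0 * x2


def exceedance_probability(threshold, n_samples=100_000, seed=0):
    """Crude Monte-Carlo estimate of P(Y > threshold)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        # Assumed probabilistic models of the inputs: independent U(0, 1).
        x1, x2 = rng.random(), rng.random()
        if model(x1, x2) > threshold:
            exceed += 1
    return exceed / n_samples


p_hat = exceedance_probability(threshold=2.5)
print(f"Estimated P(Y > 2.5) = {p_hat:.4f}")
```

For this toy model the exact probability is 1/16, so the estimate can be checked; with a real code each sample costs one code run, which is what motivates response surfaces and variance reduction.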
6. Sensitivity analyses
- $y = f(x_1, \dots, x_p)$ (where $y$ could be a probability)
- 1st question: what is the impact of a variation of the value of an input variable on the value of the response $Y$?
  - Gradient, differential analysis
  - Often a deterministic approach
- 2nd question: what part of the variance of $Y$ comes from the variance of $X_i$ (or of a set of $X_i$)?
  - Usual sensitivity indices
    - Pearson's correlation coefficient, Spearman's correlation coefficient, coefficients from a linear regression, PRCC
    - In the non-linear or non-monotonic case: Sobol's method or FAST
      - difficult with very time-consuming codes (→ use of a response surface),
      - problems with correlated uncertainties.
- All these indices are defined under the assumption that the input variables are statistically independent.
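As an illustration of the second question, first-order Sobol indices $S_i = V(E(Y \mid X_i)) / V(Y)$ can be estimated with a "pick-freeze" sampling scheme. The sketch below assumes independent inputs that are uniform on [0, 1]; `toy_model` is a hypothetical function chosen so that the exact indices (0.2, 0.8, 0) are known, and the estimator is one common variant, not necessarily the one used by the authors.

```python
import random
from statistics import pvariance


def first_order_sobol(f, p, n=50_000, seed=0):
    """Pick-freeze estimate of the first-order indices S_i = V(E(Y|X_i)) / V(Y).

    Assumes p independent inputs, each uniform on [0, 1].
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(p)] for _ in range(n)]
    B = [[rng.random() for _ in range(p)] for _ in range(n)]
    y_a = [f(x) for x in A]
    y_b = [f(x) for x in B]
    var_y = pvariance(y_a + y_b)
    indices = []
    for i in range(p):
        # Rows of A in which only coordinate i is replaced by the one from B.
        y_ab = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # f(B) and f(A_B^i) share X_i only, so this estimates V(E(Y|X_i)).
        v_i = sum(yb * (yab - ya) for yb, yab, ya in zip(y_b, y_ab, y_a)) / n
        indices.append(v_i / var_y)
    return indices


def toy_model(x):
    """Hypothetical response: the third input has no influence."""
    return x[0] + 2.0 * x[1]


S = first_order_sobol(toy_model, p=3)
print([round(s, 2) for s in S])
```

Each index costs n extra model evaluations, which is why the method is hard to apply directly to a time-consuming code.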
7. Sensitivity analyses: dependent inputs
- The problem of sensitivity analysis for models with dependent inputs is a real one, and concerns the interpretation of the values of the sensitivity indices.
- Inputs are statistically independent → the sum of these sensitivity indices = 1.
- Inputs are statistically dependent
  - the terms of the model function decomposition (Sobol's method) are not orthogonal, so a new term appears in the variance decomposition,
  - → the sum of the sensitivity indices of all orders is not equal to 1.
- Indeed, the variabilities of two correlated variables are linked, so when we quantify the sensitivity to one of these two variables we also quantify part of the sensitivity to the other. The same information is therefore counted several times in the sensitivity indices of the two variables, and the sum of all indices is thus greater than 1.
- We have studied the natural idea of defining multidimensional sensitivity indices for groups of correlated variables.
- Higher-order indices and total sensitivity indices can also be defined.
- If all input variables are independent, these sensitivity indices are the same as the usual indices for independent variables.
- The assessment is often time-consuming (extension of Sobol's method) → some computational improvements are in progress and look very promising.
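A small worked example shows how the sum of the indices departs from 1 when inputs are correlated. It is not taken from the talk: it assumes the hypothetical model $Y = X_1 + X_2$ with standard Gaussian inputs and correlation coefficient $\rho$.

```latex
\begin{align*}
E(Y \mid X_1) &= X_1 + E(X_2 \mid X_1) = (1 + \rho)\, X_1, \\
V\big(E(Y \mid X_1)\big) &= (1 + \rho)^2, \qquad V(Y) = 2\,(1 + \rho), \\
S_1 = S_2 &= \frac{(1 + \rho)^2}{2\,(1 + \rho)} = \frac{1 + \rho}{2}, \\
S_1 + S_2 &= 1 + \rho .
\end{align*}
```

With $\rho = 0$ the indices sum to 1; with a positive correlation the shared variability is counted in both indices and the sum exceeds 1. Treating $(X_1, X_2)$ as one group gives a single multidimensional index equal to 1 in this example.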
8. Response surface method
- Properties expected of a response surface (or meta-model, or surrogate model)
  - Good approximation capability (assessed on the training sample)
  - Good prediction capability
  - Low CPU time per calculation.
- Data needed in a Response Surface Method (RSM)
  - a training sample D of points (x(i), z(i)) drawn from P(X,Z), the probability law of the random vector (X,Z) (unknown in practice),
  - a family F of functions f(x,c), where c is either a parameter vector or an index vector that identifies the different elements of F.
- The best function in the family F is then the function f0 that minimizes a risk function.
- In practice, an empirical risk function is often used.
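A minimal sketch of this procedure, under stated assumptions: the family F is taken to be the straight lines f(x, c) = c0 + c1 x, the empirical risk is the mean squared error on the sample (the slide's own risk function is not in the transcript), and `code` is a hypothetical noisy function standing in for the expensive computer code.

```python
import random


def fit_linear(xs, zs):
    """Minimise the empirical risk (1/n) * sum (z - c0 - c1*x)^2 in closed form."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    c1 = sxz / sxx
    return mz - c1 * mx, c1


def empirical_risk(c, xs, zs):
    """Mean squared error of f(x, c) = c0 + c1*x on the sample (xs, zs)."""
    c0, c1 = c
    return sum((z - (c0 + c1 * x)) ** 2 for x, z in zip(xs, zs)) / len(xs)


rng = random.Random(1)


def code(x):
    """Hypothetical 'expensive code' with a small random scatter."""
    return 3.0 + 2.0 * x + rng.gauss(0.0, 0.1)


x_train = [rng.random() for _ in range(200)]
z_train = [code(x) for x in x_train]
x_test = [rng.random() for _ in range(200)]
z_test = [code(x) for x in x_test]

c_hat = fit_linear(x_train, z_train)
risk_train = empirical_risk(c_hat, x_train, z_train)  # approximation capability
risk_test = empirical_risk(c_hat, x_test, z_test)     # prediction capability
print(c_hat, risk_train, risk_test)
```

The risk on the training sample measures approximation capability, while the risk on an independent test sample measures prediction capability.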
9. Examples of response surfaces
- Polynomial models
- Generalized Linear Models (GLM)
  - Regression models (assumption: continuous function).
  - Other possibility: discriminant function (logit, probit models).
  - Qualitative and quantitative inputs.
- Thin plate splines
  - Regression models (assumption: continuous function).
- PLS (Partial Least Squares)
  - Regression models (assumption: continuous function).
  - Qualitative and quantitative inputs.
- Neural networks
  - Regression models (assumption: continuous function).
  - Other possibility: discriminant function (logit, probit models).
- A simplified physical model (3D → 1D, ...)
10. With regard to the validation step
- The characteristic "good approximation" is subjective and depends on the use of the response surface.
  - What is the future use of the response surface that is built?
  - What constraints does this use impose?
  - How to define the validity domain of a response surface?
- Possible uses: calibration, modelling, prediction, probability computation
- Specific criteria in the decision-making process
  - Conservatism / a bound on the remainder / better accuracy in an area of interest (distribution tail).
- How to define the expected accuracy?
  - Ratio of residual deviance to null deviance?
  - Calibration: representativeness of the most influential parameters,
  - Prediction: robustness, bias/variance compromise,
  - The quality of the response surface should be compatible with the accuracy of the studied code.
11. Validation of a response surface
- Statistics (often under assumptions like the Gauss-Markov assumptions)
  - Analysis of variance
  - Estimator of the variance σ²
  - R² statistic
  - Confidence region at level 1-δ for the coefficients c
  - ...
- Prediction: test base (bias), cross-validation
- Bootstrap method
  - to improve the estimate of the bias between learning error and generalization error,
  - to estimate the sensitivity of the trained model f to the available data.
- Comparison of results
  - Pdf of the output, confidence interval
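The second use of the bootstrap, estimating how sensitive the trained model is to the available data, can be sketched as follows. The data set and the straight-line model are hypothetical; the only point illustrated is the resample-and-refit loop.

```python
import random
from statistics import mean, stdev


def fit_slope(xs, zs):
    """Least-squares slope of z = c0 + c1 * x."""
    mx, mz = mean(xs), mean(zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sxx


def bootstrap_slopes(xs, zs, n_boot=200, seed=0):
    """Refit the model on n_boot resamples (with replacement) of the training pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [zs[i] for i in idx]))
    return slopes


rng = random.Random(2)
xs = [rng.random() for _ in range(100)]
zs = [1.0 + 4.0 * x + rng.gauss(0.0, 0.2) for x in xs]  # hypothetical data

slopes = bootstrap_slopes(xs, zs)
print(f"slope = {fit_slope(xs, zs):.2f}, bootstrap std = {stdev(slopes):.3f}")
```

A large spread of the refitted coefficient relative to its value would indicate that the response surface depends strongly on the particular training sample.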
12. Example: the direct containment heating (DCH)
- In the framework of a contract with the PSA Level 2 project at IRSN (in 2000).
- Code: RUPUICUV module of Escadre (→ the model has changed since 2000).
- The calculations were performed in 2000. A database of 300 calculations is available. The input vectors for these calculations were generated randomly in the variation domain.
- Responses
  - maximum pressure in the containment,
  - the presence of corium in the containment outside the reactor pit: a discrete response with value 0 (no corium) or 1 (presence).
- Input variables
  - MCOR: mass of corium, uniformly distributed between 20 and 80 tons,
  - FZRO: fraction of oxidized Zr, uniformly distributed between 0.5 and 1,
  - PVES: primary pressure, uniformly distributed between 1 and 166 bar,
  - DIAM: break size, uniformly distributed between 1 cm and 1 m,
  - ACAV: flow cross-section in the reactor pit (varies between 8 and 22 m²),
  - FRAC: fraction of corium directly ejected into the containment, uniformly distributed between 0 and 1,
  - CDIS: discharge coefficient at the break, uniformly distributed between 0.1 and 0.9,
  - KFIT: adjustment parameter, uniformly distributed between 0.1 and 0.3,
  - HWAT: water height in the reactor pit, discrete random variable (0 or 3 m).
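A design of 300 random input vectors like the one described above could be generated as in the sketch below. Two points are assumptions, since the slide does not state them: ACAV is sampled uniformly between its bounds, and the two values of HWAT are taken as equally likely.

```python
import random

# (low, high) bounds of the continuous inputs, as listed on the slide.
UNIFORM_INPUTS = {
    "MCOR": (20.0, 80.0),   # mass of corium, tons
    "FZRO": (0.5, 1.0),     # fraction of oxidized Zr
    "PVES": (1.0, 166.0),   # primary pressure, bar
    "DIAM": (0.01, 1.0),    # break size, m
    "ACAV": (8.0, 22.0),    # flow cross-section, m^2 (uniform law is an assumption)
    "FRAC": (0.0, 1.0),     # fraction of corium ejected into the containment
    "CDIS": (0.1, 0.9),     # discharge coefficient at the break
    "KFIT": (0.1, 0.3),     # adjustment parameter
}


def sample_inputs(n=300, seed=0):
    """Draw n independent input vectors in the variation domain."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = {name: rng.uniform(lo, hi) for name, (lo, hi) in UNIFORM_INPUTS.items()}
        # HWAT is discrete (0 or 3 m); equal probabilities are an assumption.
        x["HWAT"] = rng.choice([0.0, 3.0])
        sample.append(x)
    return sample


design = sample_inputs()
print(design[0])
```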
13. Example: maximum pressure (1/2)
- Use of the empirical risk function
- Approximation capabilities: all the response surfaces seem good
- Prediction capabilities
  - Non-negligible residuals
14. Example: maximum pressure (2/2)
[Figures: training sample; test sample]
15. About the impact of response surface error
- Use of a response surface (RS) in an uncertainty and sensitivity analysis (UASA) → a bias or an error on the results of the analysis.
- Usual questions are
  - What is the impact of this error on the results of an uncertainty and sensitivity analysis made on a response surface?
  - Can we deduce results on the true function from results obtained from a response surface?
- → Residual function: $\Delta(x_1, \dots, x_p) = RS(x_1, \dots, x_p) - f(x_1, \dots, x_p)$
- Assume that all $X_i$ are independent and that a sensitivity analysis has been done on the two functions $RS$ and $\Delta$; we denote by $S_{RS,i}$ and $S_{\Delta,i}$ the computed sensitivity indices.
- Since $f = RS - \Delta$, the expression of $V(E(f(X_1, \dots, X_p) \mid X_i))$ from $S_{RS,i}$ and $S_{\Delta,i}$ is
  $V(E(f \mid X_i)) = S_{RS,i}\, V(RS) + S_{\Delta,i}\, V(\Delta) - 2\, \mathrm{Cov}\big(E(RS \mid X_i),\, E(\Delta \mid X_i)\big)$
- Problem: the computation of the covariance term → it is generally impossible to deduce results on the true function from results obtained from a RS.
- The only cases where results can be deduced are
  - RS is a truncated model obtained from a decomposition in an orthogonal basis,
  - $\Delta$ is not very sensitive to the variables $X_1, \dots, X_p$:
    $S_i \approx S_{RS,i}\, V(RS) / \big(V(\Delta) + V(RS)\big)$
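The role of the covariance term can be checked numerically. Writing $\Delta = RS - f$ for the residual, the sketch below uses a hypothetical true function `f` and a hypothetical response surface `rs` with two independent U(0, 1) inputs, and estimates each conditional variance with a pick-freeze covariance (two samples that share only $X_i$).

```python
import random


def cov(u, v):
    """Empirical covariance of two equally long samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n


def f(x):
    """Hypothetical true function."""
    return x[0] + x[1] ** 2


def rs(x):
    """Hypothetical response surface approximating f."""
    return x[0] + 0.8 * x[1]


def delta(x):
    """Residual function RS - f."""
    return rs(x) - f(x)


def pick_freeze_samples(i, p=2, n=20_000, seed=0):
    """Two samples of independent U(0, 1) inputs that share coordinate i only."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    X2 = [[x[j] if j == i else rng.random() for j in range(p)] for x in X]
    return X, X2


X, X2 = pick_freeze_samples(i=1)
v_f = cov([f(x) for x in X], [f(x) for x in X2])              # V(E(f|X_i))
v_rs = cov([rs(x) for x in X], [rs(x) for x in X2])           # V(E(RS|X_i))
v_delta = cov([delta(x) for x in X], [delta(x) for x in X2])  # V(E(Delta|X_i))
# Estimates 2 * Cov(E(RS|X_i), E(Delta|X_i)).
cross = cov([rs(x) for x in X], [delta(x) for x in X2]) \
    + cov([delta(x) for x in X], [rs(x) for x in X2])
print(v_f, v_rs + v_delta - cross)
```

The two printed values agree, but only because `cross` was computed, which requires evaluating the residual and hence the true function; the indices of RS and of the residual taken separately are not enough.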
16. Discontinuous model
- No usual response surface family is suitable.
- In practice, discontinuous behaviour generally means that more than one physical phenomenon is implemented in the code.
- To avoid misleading interpretations of the results of uncertainty and sensitivity analysis, discriminant analysis should be used to define areas where the function is continuous. The analyses are then carried out on each continuous area.
- Possible methods
  - neural networks with a sigmoid activation function,
  - GLM models with a logit link, or logistic regression,
  - support vector machines,
  - decision trees, and variants like random forests.
- Practical problems are often encountered if the sample is linearly separable.
- Support vector machines and methods based on decision trees are very promising for that case.
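The idea of splitting the input domain into areas where the function is continuous can be sketched with the simplest tree-based method, a one-level regression tree (a decision stump). This is only an illustration on a hypothetical one-dimensional `code` with a single jump; it is not the discriminant analysis used in the study.

```python
import random


def code(x):
    """Hypothetical discontinuous response: two regimes separated at x = 0.6."""
    return 1.0 + x if x < 0.6 else 5.0 - 2.0 * x


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """One-level regression tree: threshold minimising the within-group SSE."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for k in range(1, len(pairs)):
        cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
        if cost < best_cost:
            best_cost = cost
            best_threshold = 0.5 * (pairs[k - 1][0] + pairs[k][0])
    return best_threshold


rng = random.Random(3)
xs = [rng.random() for _ in range(200)]
ys = [code(x) for x in xs]

threshold = best_split(xs, ys)
# Each sub-sample can then be analysed with a continuous response surface.
left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
print(f"split at x = {threshold:.3f}: {len(left)} / {len(right)} points")
```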
17. Example: presence of corium in the containment
- First tool → generalized linear model with a logit link.
  - There always exists a model that explains 100% of the dispersion of the results for the training set.
  - But there are some drawbacks:
    - the list of statistically significant terms varies strongly with the training set,
    - the prediction error is around 20%.
- Use of neural networks → similar problems.
- Other methods → SVM, decision trees and random forests
- Conclusion (for that example)
  - The most efficient method is the Random Forest method.
  - The J48 and Random Forest methods are faster than the algorithms based on an optimisation step (like Naïve Bayes, SVM, neural networks).
  - The principle of decision trees and random forests is simple and based on building a set of logical combinations of decision rules. They are often very readable, and have very good prediction capabilities (as shown by the example).
18. Example: presence of corium in the containment
A more global indicator of the quality (approximation and prediction capabilities) of the model is obtained by the cross-validation method.
19. Conclusions
- Many methods exist for UASA in the framework of Level 2 PSA and severe accident codes.
- These methods are often not suitable, from a theoretical point of view, when
  - the phenomena modelled by the computer code are discontinuous in the variation range of the influential parameters,
  - the input variables are statistically dependent.
- New results and ideas to overcome these problems have been described in the paper.
- The practical interest of these new methods should be confirmed by application to real problems.
21. Response uncertainty
- Probability distribution
  - Simulation, fit, statistical tests (asymptotic)
- First statistical moments
  - Statistics on a sample (convergence, bootstrap)
  - Approximation of the standard deviation
- Confidence interval
  - From the density function
  - Wilks' formula
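Wilks' formula gives the number of code runs needed for a distribution-free tolerance bound. The sketch below covers only the first-order, one-sided case, in which the largest of N sampled outputs bounds the chosen quantile; the function name and parameter names are illustrative.

```python
def wilks_sample_size(coverage=0.95, confidence=0.95):
    """Smallest N such that 1 - coverage**N >= confidence.

    With N runs, the sample maximum is then an upper bound of the `coverage`
    quantile of the output with probability at least `confidence`.
    """
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


n_95_95 = wilks_sample_size(0.95, 0.95)
print(f"Runs needed for a 95%/95% one-sided bound: {n_95_95}")
```

The result does not depend on the number of uncertain inputs or on the output distribution, which makes the approach attractive for expensive codes.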
22. Monte-Carlo simulations
- Variance reduction methods: conditional MC, stratified MC, Latin hypercube
- More suitable for the computation of a probability: importance sampling, directional simulation
- Practical problem with very time-consuming codes → response surface
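Among the sampling schemes listed, the Latin hypercube is easy to sketch: each input range is cut into n equal-probability strata and each stratum is used exactly once. The version below assumes independent U(0, 1) inputs; other marginal laws would be obtained by applying their inverse distribution functions.

```python
import random


def latin_hypercube(n, p, seed=0):
    """n points in [0, 1)^p with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(p):
        # One point drawn inside each of the n strata, then shuffled so that
        # strata are paired at random across dimensions.
        col = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]


points = latin_hypercube(n=10, p=3)
for row in points:
    print([round(v, 3) for v in row])
```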
23. FORM/SORM methods
- Probabilistic transformation $Z \to U$ (the $U_i$ are N(0,1)-distributed and independent)
- In U-space, a new failure surface $G(U) = H(T(Z)) = 0$
- Design point $U^*$ and Hasofer-Lind index $\beta = \|U^*\|$
- FORM approximation
- SORM approximation (Breitung)
- Sensitivity factors
24. FORM: simple case
- Random variables are N(0,1)-distributed and independent
- Limit state function: hyperplane
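In this simple case the FORM result is exact, which can be verified numerically. The sketch assumes a limit state written as G(u) = b - a·u with failure when G(u) <= 0 and b > 0; the coefficients `a` and `b` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def form_hyperplane(a, b):
    """FORM for G(u) = b - a.u with independent N(0,1) variables (b > 0 assumed)."""
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    beta = b / norm_a                                  # distance from origin to the hyperplane
    design_point = [beta * ai / norm_a for ai in a]    # closest point of the failure surface
    pf = NormalDist().cdf(-beta)                       # exact for a hyperplane
    return beta, design_point, pf


def crude_mc(a, b, n=100_000, seed=0):
    """Crude Monte-Carlo estimate of P(G(U) <= 0), for comparison."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n):
        if b - sum(ai * rng.gauss(0.0, 1.0) for ai in a) <= 0.0:
            fails += 1
    return fails / n


beta, u_star, pf_form = form_hyperplane(a=[3.0, 4.0], b=10.0)
pf_mc = crude_mc(a=[3.0, 4.0], b=10.0)
print(f"beta = {beta:.2f}, FORM Pf = {pf_form:.5f}, MC Pf = {pf_mc:.5f}")
```

For a curved limit state the same formula is only an approximation, which is why the FORM result is compared with SORM and simulation results.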
25. Validation of the FORM/SORM results
- Sets of results: FORM, SORM, conditional importance sampling, etc.
- Comparison of FORM, SORM and Conditional Importance Sampling (CIS) results
  - Coherence of all these results?
    - If yes, good confidence is obtained in the FORM result and in the geometrical assumption of the FORM method.
  - Coherence of FORM and CIS results?
    - If yes, good confidence is obtained in the FORM result and in the geometrical assumption of the FORM method.
  - Coherence of SORM and CIS results?
    - If yes, good confidence is obtained in the SORM result, and the geometrical assumption of the FORM method is false.
  - If no coherence
    - The geometrical assumptions of FORM and SORM are false.
    - Existence of other minima?
    - Monte-Carlo simulation or a variance reduction method (with or without a response surface).
- New tests have been developed to check that the computed minimum is a global minimum (non-negligible costs).
26. Conditional importance sampling
27. Comparison of methods
28. Examples of response surfaces
- Polynomial models
- Generalized Linear Models (GLM)
  - Regression models (assumption: continuous function).
  - Other possibility: discriminant function (logit, probit models).
  - Qualitative and quantitative variables.
- Thin plate splines
  - Regression models (assumption: continuous function).
  - Qualitative (if 2 factors) and quantitative variables.
- PLS (Partial Least Squares)
  - Regression models (assumption: continuous function).
  - Qualitative and quantitative variables.
- Neural networks
  - Regression models (assumption: continuous function).
  - Other possibility: discriminant function (logit, probit models).
  - Qualitative (if 2 factors) and quantitative variables.