Probability bounds analysis (PowerPoint presentation transcript)
1
Probability bounds analysis

Scott Ferson, scott@ramas.com, 2 October 2007
Stony Brook University, MAR 550, Challenger 165
2
Outline
  • P-boxes marrying intervals and probability
  • Calculations via the Cartesian product
  • Getting p-boxes
  • Moment and shape constraints
  • From empirical data
  • Dependence
  • Model uncertainty
  • Updating
  • Backcalculation
  • Sensitivity analysis
  • Differential equations
  • Conclusions

Examples: dike reliability, PCBs and duck hunters, dioxin inhalation, hydrocarbon plume
3
Typical risk analysis problems
  • Sometimes little or even no data
  • Updating is rarely used
  • Very simple arithmetic (or logical) calculations
  • Occasionally, finite element meshes or
    differential equations
  • Usually small in number of inputs
  • Nuclear power plants are the exception
  • Results are important and often high profile
  • But the approach is being used ever more widely

4
Probability vs. intervals
  • Probability theory
  • Can handle likelihoods and dependence well
  • Has an inadequate model of ignorance
  • Lying: saying more than you really know
  • Interval analysis
  • Can handle epistemic uncertainty (ignorance) well
  • Inadequately models frequency and dependence
  • Cowardice: saying less than you know

5
What's needed
  • Reliable, conservative assessments of tail risks
  • Using available information but without forcing
    analysts to make unjustified assumptions
  • Neither computationally expensive nor
    intellectually taxing

6

7
Probability box (p-box)
Interval bounds on a cumulative distribution
function (CDF)
[Figure: a p-box drawn as lower and upper bounds on a CDF; x-axis X from 0.0 to 3.0, y-axis cumulative probability from 0 to 1.]
8
Generalizes an uncertain number
[Figure: three uncertain numbers on a common axis from 0 to 40, each plotted as cumulative probability from 0 to 1: a probability distribution, a probability box, and an interval. Note that the interval, drawn in CDF space, is not a uniform distribution.]
9
Probability bounds analysis
  • Marries intervals with probability theory
  • Distinguishes variability and incertitude
  • Solves many problems in uncertainty analysis
  • Input distributions unknown
  • Imperfectly known correlation and dependency
  • Large measurement error, censoring, small sample
    sizes
  • Model uncertainty

10
Calculations
  • All standard mathematical operations
  • Arithmetic operations (+, −, ×, ÷, ^, min, max)
  • Logical operations (and, or, not, if, etc.)
  • Transformations (exp, ln, sin, tan, abs, sqrt,
    etc.)
  • Backcalculation (deconvolutions, updating)
  • Magnitude comparisons (<, ≤, >, ≥, =)
  • Other operations (envelope, mixture, etc.)
  • Faster than Monte Carlo
  • Guaranteed to bound the answer
  • Good solutions often easy to compute

11
Faithful generalization
  • When inputs are distributions, its answers
    conform with probability theory
  • When inputs are intervals, it agrees with
    interval analysis

12
Probability bounds arithmetic
[Figure: two p-boxes, A on an axis from 0 to 6 and B on an axis from 0 to 14, each plotted as cumulative probability from 0 to 1.]
What's the sum A+B?
13
Cartesian product
A = mix(1, [1,3], 1, [2,4], 1, [3,5])
B = mix(1, [2,8], 1, [6,10], 1, [8,12])
A + B (independent): range = [3,17], mean = [7.3, 14], var = [0, 22]
A + B (Fréchet, no assumption): range = [3,17], mean = [7.3, 14], var = [0, 38]

A+B under independence:

                       A ∈ [1,3], p1 = 1/3   A ∈ [2,4], p2 = 1/3   A ∈ [3,5], p3 = 1/3
  B ∈ [2,8],  q1 = 1/3   A+B ∈ [3,11], 1/9     A+B ∈ [4,12], 1/9     A+B ∈ [5,13], 1/9
  B ∈ [6,10], q2 = 1/3   A+B ∈ [7,13], 1/9     A+B ∈ [8,14], 1/9     A+B ∈ [9,15], 1/9
  B ∈ [8,12], q3 = 1/3   A+B ∈ [9,15], 1/9     A+B ∈ [10,16], 1/9    A+B ∈ [11,17], 1/9
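A minimal sketch (R) of this Cartesian-product convolution under independence, under the assumption that each p-box is represented as equal-mass interval bins as in the table above; mass stacked at the left endpoints gives one bounding CDF and at the right endpoints the other:

  A <- list(lo = c(1, 2, 3), hi = c(3, 4, 5),  p = rep(1/3, 3))
  B <- list(lo = c(2, 6, 8), hi = c(8, 10, 12), p = rep(1/3, 3))
  lo <- as.vector(outer(A$lo, B$lo, '+'))   # lower endpoints of the 9 cells
  hi <- as.vector(outer(A$hi, B$hi, '+'))   # upper endpoints of the 9 cells
  pr <- as.vector(outer(A$p,  B$p,  '*'))   # each cell carries probability 1/9
  ol <- order(lo); oh <- order(hi)
  plot(lo[ol], cumsum(pr[ol]), type = 's', xlim = c(0, 18), ylim = c(0, 1),
       xlab = 'A+B', ylab = 'Cumulative probability')   # left bounding CDF
  lines(hi[oh], cumsum(pr[oh]), type = 's')              # right bounding CDF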
14
A+B under independence
  • Rigorous
  • Best possible
[Figure: the p-box for A+B under independence, on an axis from 0 to 18, cumulative probability from 0 to 1.]
15
Case study: dike reliability
[Figure: cross-section of a dike, showing the revetment blocks of thickness D on a clay layer, the incoming wave, the sea level, and the slope angle α.]

Z = Δ D − (H tan α) / (cos α · M · √s)
16
Inputs
  • Reliability depends on the density and thickness of the dike's facing masonry:
  • relative density of the revetment blocks: Δ = [1.60, 1.65]
  • revetment block thickness: D = [0.68, 0.72] meters
  • slope of the revetment: α = arctan([0.32, 0.34]) = [0.309, 0.328] radians
  • model parameter: M = [3.0, 5.2]
  • significant wave height: H ~ Weibull(scale = [1.2, 1.5] meters, shape = [10, 12])
  • offshore peak wave steepness: s ~ normal(mean = [0.039, 0.041], stdev = [0.005, 0.006])
  • (the presenter's notes also give discretized characterizations: H as the weighted intervals ([1, 1.5], 1/8), ([1.1, 1.5], 1/4), ([1.3, 1.6], 1/4), ([1.3, 1.4]), ([1.5, 1.7], 1/8) meters, and s as ([3, 3.6], 1/20), ([3.4, 4.2], 9/20), ([3.9, 4], 9/20), ([4.5, 4.8], 1/20), presumably ×10⁻²)
17
[Figure: p-boxes for the six inputs, each plotted as cumulative probability from 0 to 1: Δ (1.5 to 1.7), D (0.6 to 0.8 meters), H (0 to 2 meters), α (0.2 to 0.4 radians), M (2 to 6), and s (0.02 to 0.06).]
18
Discretize each p-box
[Figure: the p-box for H (0 to 2 meters) discretized into horizontal probability slices.]
19
100 × 100 Cartesian product

Z(Δ, D, M, α, H, s), with the inputs treated as independent.
Each discretized distribution contributes 100 probability slices; for example, one slice of H is the interval [0, 1.02] with probability 0.01, and likewise for s. Each cell of the Cartesian product is then an interval for Z carrying probability 0.0001, e.g. [0.290, 1.19], [0.314, 1.19], [0.536, 1.19], [0.544, 1.19], ...
20
Each cell is an interval calculation
Note that the variable α is uncertain and repeated, but it does not lead to an inflation of the uncertainty because the repeated-α subexpression tan(α)/cos(α) is monotone increasing over the range of α's uncertainty.
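A minimal sketch (R) illustrating why the repetition of α is harmless here: because tan(α)/cos(α) is monotone increasing on the interval [arctan(0.32), arctan(0.34)], evaluating it at the two endpoints gives exact bounds, which a dense sampling of the interval confirms.

  alpha <- atan(c(0.32, 0.34))      # interval for the revetment slope, in radians
  tan(alpha) / cos(alpha)           # endpoint evaluation: exact lower and upper bounds
  a <- seq(alpha[1], alpha[2], length.out = 10000)
  range(tan(a) / cos(a))            # same range, confirming monotonicity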
21
Reliability function
  • Reassembled from 10,000 intervals

[Figure: the p-box for the reliability function Z, reassembled from the 10,000 cells.]
PBA says the risk that Z is less than zero is less than 0.044; Monte Carlo said this risk is about 0.000014.
The risk that Z is less than zero is less than 0.091.
// PBA (Risk Calc)
delta = [1.60, 1.65]                      // relative density of the revetment blocks
D = [0.68, 0.72]                          // revetment block thickness
alpha = atan([0.32, 0.34])                // slope of the revetment
M = [3.0, 5.2]                            // model parameter
H = weibull([1.2, 1.5], [10, 12])         // significant wave height
s = N([0.039, 0.041], [0.005, 0.006])     // offshore peak wave steepness
Zb = delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))

// precise (probabilistic) version
delta = U(1.60, 1.65)
D = U(0.68, 0.72)
alpha = atan(U(0.32, 0.34))
M = U(3.0, 5.2)
H = weibull(1.35, 11)
s = normal(0.04, 0.0055)
Zp = delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
show Zb in green; show Zp in yellow
_alpha(Zb, 0)   // [0, 0.050505050506]
_alpha(Zp, 0)   // [0, 0.010101010102]

# probabilistic calculation in R
many <- 1000000
delta <- runif(many, 1.60, 1.65)
D <- runif(many, 0.68, 0.72)
alpha <- atan(runif(many, 0.32, 0.34))
M <- runif(many, 3.0, 5.2)
H <- rweibull(many, 11, 1.35)             # parameter order reversed relative to weibull([1.2,1.5],[10,12])
s <- rnorm(many, 0.04, 0.0055)            # from [0.039, 0.041], [0.005, 0.006]
rgumbel <- function(n, a, b) return(a - b*log(log(1/runif(n))))
Zp <- delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
risk <- Zp[Zp < 0]
length(risk) / many                       # [1] 1.6e-05   SLOW!
plot(sort(Zp), (1:length(Zp))/many, t = 's', xlab = 'T', ylab = 'Probability', ylim = c(0, 1))

The PBA bound and the Monte Carlo estimate differ by over three orders of magnitude.
22
Implementation in R
  levels <- 100
  delta.1 <- 1.60; delta.2 <- 1.65
  D.1 <- 0.68;     D.2 <- 0.72
  alpha.1 <- atan(0.32); alpha.2 <- atan(0.34)
  M.1 <- 3.0;      M.2 <- 5.2
  R.1 <- tan(alpha.1) / cos(alpha.1); R.2 <- tan(alpha.2) / cos(alpha.2)
  slice <- function(p, df, a1, a2, b1, b2) {
    w <- do.call(df, list(p, a1, b1)); x <- do.call(df, list(p, a1, b2))
    y <- do.call(df, list(p, a2, b1)); z <- do.call(df, list(p, a2, b2))
    list(one = pmin(w, x, y, z)[1:levels], two = pmax(w, x, y, z)[2:(levels + 1)])
  }
  p <- c(0.005, (1:(levels - 1))/levels, 0.995)
  H <- slice(p, 'qweibull', 10, 12, 1.2, 1.5)
  s <- slice(p, 'qnorm', 0.039, 0.041, 0.005, 0.006)
  Hs.1 <- as.vector(outer(H$one, sqrt(s$two), '/'))
  Hs.2 <- as.vector(outer(H$two, sqrt(s$one), '/'))
  Zb.1 <- delta.1 * D.1 - (Hs.2 * R.2) / M.1   # lower endpoints
  Zb.2 <- delta.2 * D.2 - (Hs.1 * R.1) / M.2   # upper endpoints
  Zb.1 <- sort(Zb.1); Zb.2 <- sort(Zb.2)
23
Monte Carlo simulation
  many <- 1000000
  delta <- runif(many, 1.60, 1.65)
  D <- runif(many, 0.68, 0.72)
  alpha <- atan(runif(many, 0.32, 0.34))
  M <- runif(many, 3.0, 5.2)
  H <- rweibull(many, 11, 1.35)        # parameter order reversed
  s <- rnorm(many, 0.04, 0.0055)
  Zp <- delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
  risk <- Zp[Zp < 0]
  length(risk) / many
  # [1] 1.6e-05
  Zp <- Zp[1:2000]                     # else plotting takes too long
  plot(sort(Zp), (1:length(Zp))/many, t = 's', xlab = 'T', ylab = 'Probability', ylim = c(0, 1))

Intervals are replaced by uniform distributions or, if they're parameters, by their midpoint values.
24
Selecting p-box inputs
25
Where do we get p-boxes?
  • Assumption
  • Modeling
  • Robust Bayesian analysis
  • Constraint propagation
  • Data with incertitude
  • Measurement error
  • Sampling error
  • Censoring

26
Constraint propagation
27
(Maximum entropy erases uncertainty)
[Figure: nine panels showing the p-boxes corresponding to different partial specifications: {mean, std}, {positive, quantile}, {min, max}, {min, max, mean}, {sample data, range}, {integral, min, max}, {min, mean}, {min, max, mean, std}, and a p-box itself. A maximum-entropy analysis would replace each of these by a single precise distribution, erasing the incertitude.]
28
Data of poor repute
  • Data sets whose values are intervals

All measurements are actually intervals
Generalize empirical distributions to p-boxes
Censoring is sometimes significant
Model intervals as uniform distributions
Interval uncertainty doesn't exist in real life
Ignore intervals or model them as uniforms
- Tony O'Hagan
29
Incertitude is common in data
  • Periodic observations
  • When did the fish in my aquarium die during the
    night?
  • Plus-or-minus measurement uncertainties
  • Coarse measurements, measurements from digital
    readouts
  • Non-detects and data censoring
  • Chemical detection limits, studies prematurely
    terminated
  • Privacy requirements
  • Epidemiological or medical information, census
    data
  • Theoretical constraints
  • Concentrations, solubilities, probabilities,
    survival rates
  • Bounding studies
  • Presumed or hypothetical limits in what-if
    calculations

30
A tale of two data sets
  • Skinny: [1.00, 1.52], [2.68, 2.98], [7.52, 7.67], [7.73, 8.35], [9.44, 9.99], [3.66, 4.58]
  • Puffy: [3.5, 6.4], [6.9, 8.8], [6.1, 8.4], [2.8, 6.7], [3.5, 9.7], [6.5, 9.9], [0.15, 3.8], [4.5, 4.9], [7.1, 7.9]

31
Empirical p-boxes
[Figure: empirical p-boxes for the Skinny and Puffy interval data sets, plotted as cumulative probability against x from 0 to 10.]
32
Many possible precise data sets
[Figure: the Puffy p-box with one of the many possible precise empirical distributions (each datum chosen from within its interval) drawn inside it; x from 0 to 10.]
33
Statistics for data that are intervals
  • Some statistics are easy to compute
  • Empirical distribution, mean, median,
    percentiles, etc.
  • Some are tricky, but easy for a computer
  • Variance, upper confidence limit, correlation,
    etc.
  • Tradeoff between more versus better data
  • Review just published as a Sandia report

34
P-box parameter fitting
  • Method of matching moments
  • Regression approaches
  • Maximum likelihood

35
Method of matching moments
  • Mean, variance, etc. are now intervals
  • Very easy for one-parameter distributions
  • In general, moments are dependent
  • (The largest mean cannot be associated with the
    largest variance)
  • Envelope distributions that extremize the moments
  • Resulting p-boxes represent the assumptions about
    the shape family as informed by the available data
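A minimal sketch (R), assuming a one-parameter exponential shape with a hypothetical interval mean [m1, m2]: because the exponential CDF is monotone in its mean, enveloping the CDFs at the two extreme means gives the p-box implied by matching the (interval) first moment.

  m1 <- 4; m2 <- 6                                  # hypothetical interval mean
  x  <- seq(0, 40, length.out = 400)
  upper <- pexp(x, rate = 1/m1)                     # smallest mean gives the upper CDF bound
  lower <- pexp(x, rate = 1/m2)                     # largest mean gives the lower CDF bound
  plot(x, upper, type = 'l', ylim = c(0, 1), ylab = 'Cumulative probability')
  lines(x, lower)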

36

37
Sampling uncertainty
  • When you've measured only part of a population
  • Kolmogorov-Smirnov confidence intervals
  • Distribution-free
  • Assumes only random sampling
  • Parametric versions
  • normal, lognormal, exponential, Weibull, etc.
  • Bound the whole distribution
  • 95% of the time, the true distribution
    falls entirely inside the limits
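A minimal sketch (R) of distribution-free Kolmogorov-Smirnov confidence limits on an empirical CDF, using the large-sample critical distance (exact small-sample tables differ slightly); the sample here is hypothetical.

  ks_band <- function(x, conf = 0.95) {
    n <- length(x)
    d <- sqrt(-log((1 - conf) / 2) / (2 * n))   # asymptotic KS critical distance
    xs <- sort(x)
    Fn <- (1:n) / n                             # empirical CDF at the sorted data
    list(x = xs, lower = pmax(Fn - d, 0), upper = pmin(Fn + d, 1))
  }
  b <- ks_band(rnorm(25, 400000, 40000))
  plot(b$x, b$upper, type = 's', ylim = c(0, 1), ylab = 'Cumulative probability')
  lines(b$x, b$lower, type = 's')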

38
Distributional confidence limits
[Figure: distributional confidence limits on volumetric heat capacity, ρCp (J/(m³·°C)), from 300,000 to 500,000: the Kolmogorov-Smirnov (distribution-free) limits and the limits that assume a normal distribution, plotted as cumulative probability.]
39
Confidence limits on p-boxes
[Figure: 95% KS confidence limits on the p-box for the Skinny and Puffy data pooled (n = 15), plotted as cumulative probability against x from 0 to 10.]
40
Uncertainties expressible with p-boxes
  • Sampling uncertainty
  • Distributional confidence limits
  • Measurement incertitude
  • Intervals
  • Uncertainty about distribution shape
  • Constraints (non-parametric p-boxes)
  • Surrogacy uncertainty
  • Modeling

41
Example: PCBs and duck hunters
  • Location: Massachusetts and Connecticut
  • Receptor: adult human hunters of waterfowl
  • Contaminant: PCBs (polychlorinated biphenyls)
  • Exposure route: dietary consumption of contaminated waterfowl

Based on the assessment for non-cancer risks from
PCB to adult hunters who consume contaminated
waterfowl described in Human Health Risk
Assessment GE/Housatonic River Site Rest of
River, Volume IV, DCN GE-031203-ABMP, April
2003, Weston Solutions (West Chester,
Pennsylvania), Avatar Environmental (Exton,
Pennsylvania), and Applied Biomathematics
(Setauket, New York).
42
Hazard quotient
HQ = (EF × IR × C × (1 − LOSS)) / (AT × BW × RfD)

mmms(min, max, mean, std)
  • EF = mmms(1, 52, 5.4, 10) meals per year // exposure frequency, censored data, n = 23
  • IR = mmms(1.5, 675, 188, 113) grams per meal // poultry ingestion rate from EPA's EFH
  • C = [7.1, 9.73] mg per kg // exposure point (mean) concentration
  • LOSS = 0 // loss due to cooking
  • AT = 365.25 days per year // averaging time (not just a units conversion)
  • BW = mixture(BWfemale, BWmale) // Brainard and Burmaster (1992)
  • BWmale = lognormal(171, 30) pounds // adult male, n = 9,983
  • BWfemale = lognormal(145, 30) pounds // adult female, n = 10,339
  • RfD = 0.00002 mg per kg per day // reference dose considered tolerable

43
Inputs
[Figure: exceedance-risk (1 − CDF) plots of the inputs: EF (0 to 60 meals per year), IR (0 to 600 grams per meal), C (0 to 20 mg per kg), and BW (0 to 300 pounds, with separate curves for males and females).]
44
Results
EF = mmms(1, 52, 5.4, 10) units('meals per year')      // exposure frequency, censored data, n = 23
IR = mmms(1.5, 675, 188, 113) units('grams per meal')  // poultry ingestion rate from EPA's EFH
C = [7.1, 9.73] units('mg per kg')                     // exposure point (mean) concentration
LOSS = 0                                               // loss due to cooking
AT = 365.25 units('days per year')                     // averaging time (not just a units conversion)
BWmale = lognormal(171, 30) units('pounds')            // adult male, n = 9,983
BWfemale = lognormal(145, 30) units('pounds')          // adult female, n = 10,339
BW = mixture(BWfemale, BWmale)                         // Brainard and Burmaster (1992)
RfD = 0.00002 units('mg per kg per day')               // reference dose considered tolerable
HQ = (EF * IR * C * (1 - LOSS)) / (AT * BW * RfD)
HQ: range = [5.52769, 557906], mean = [1726, 13844], var = [0, 7e9] meals grams meal⁻¹ days⁻¹ pounds⁻¹ day
HQ (after units conversion): range = [0.0121865, 1229.97], mean = [3.8, 30.6], var = [0, 34557]
mean(HQ)     = [3.8072899212, 30.520271393]
sd(HQ)       = [0, 185.89511004]
median(HQ)   = [0.66730580299, 54.338212132]
cut(HQ, 95%) = [3.5745830321, 383.33656983]
range(HQ)    = [0.012186461038, 1229.9710013]

Summary: mean = [3.8, 31], standard deviation = [0, 186], median = [0.6, 55], 95th percentile = [3.5, 384], range = [0.01, 1230]
[Figure: exceedance-risk plot of HQ from 0 to 1000.]
45
Dependence
46
Dependence
  • Not all variables are independent
  • Body size and skin surface area
  • Common-cause variables
  • Known dependencies should be modeled
  • What can we do when we don't know them?

47
How do we handle other dependencies?
  • Independent
  • Perfect (comonotonic)
  • Opposite (countermonotonic)

48
Perfect dependence
A+B under perfect positive dependence:

                       A ∈ [1,3], p1 = 1/3      A ∈ [2,4], p2 = 1/3      A ∈ [3,5], p3 = 1/3
  B ∈ [2,8],  q1 = 1/3   A+B ∈ [3,11], prob 1/3   A+B ∈ [4,12], prob 0     A+B ∈ [5,13], prob 0
  B ∈ [6,10], q2 = 1/3   A+B ∈ [7,13], prob 0     A+B ∈ [8,14], prob 1/3   A+B ∈ [9,15], prob 0
  B ∈ [8,12], q3 = 1/3   A+B ∈ [9,15], prob 0     A+B ∈ [10,16], prob 0    A+B ∈ [11,17], prob 1/3
49
Opposite dependence
A+B under opposite dependence:

                       A ∈ [1,3], p1 = 1/3      A ∈ [2,4], p2 = 1/3      A ∈ [3,5], p3 = 1/3
  B ∈ [2,8],  q1 = 1/3   A+B ∈ [3,11], prob 0     A+B ∈ [4,12], prob 0     A+B ∈ [5,13], prob 1/3
  B ∈ [6,10], q2 = 1/3   A+B ∈ [7,13], prob 0     A+B ∈ [8,14], prob 1/3   A+B ∈ [9,15], prob 0
  B ∈ [8,12], q3 = 1/3   A+B ∈ [9,15], prob 1/3   A+B ∈ [10,16], prob 0    A+B ∈ [11,17], prob 0
50
Perfect and opposite dependencies
[Figure: the cumulative distributions of A+B under perfect and under opposite dependence, on an axis from 0 to 18.]
51
Uncertainty about dependence
  • Sensitivity analyses are usually used
  • Vary the correlation coefficient between −1 and 1
  • But this underestimates the true uncertainty
  • Example: suppose X, Y ~ uniform(0, 24) but we
    don't know the dependence between X and Y

52
Varying the correlation coefficient

[Figure: the distributions of X+Y obtained by varying the correlation coefficient, X, Y ~ uniform(0, 24), plotted as cumulative probability against X+Y from 0 to 50.]
53
Counterexample
54
What about other dependencies?
  • Independent
  • Perfectly positive
  • Opposite
  • Positively or negatively associated
  • Specified correlation coefficient
  • Nonlinear dependence (copula)
  • Unknown dependence

55
Fréchet inequalities
  • They make no assumption about dependence (Fréchet
    1935)
  • max(0, P(A) + P(B) − 1) ≤ P(A and B) ≤ min(P(A), P(B))
  • max(P(A), P(B)) ≤ P(A or B) ≤ min(1, P(A) + P(B))
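A minimal sketch (R) of these inequalities as interval bounds computed from the marginal probabilities (the numeric example is hypothetical):

  frechet_and <- function(pa, pb) c(max(0, pa + pb - 1), min(pa, pb))
  frechet_or  <- function(pa, pb) c(max(pa, pb), min(1, pa + pb))
  frechet_and(0.3, 0.8)   # [0.1, 0.3]
  frechet_or(0.3, 0.8)    # [0.8, 1.0]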

56
Fréchet case (no assumption)
A+B with no assumption about dependence (Fréchet case):

                       A ∈ [1,3], p1 = 1/3          A ∈ [2,4], p2 = 1/3          A ∈ [3,5], p3 = 1/3
  B ∈ [2,8],  q1 = 1/3   A+B ∈ [3,11], prob [0,1/3]   A+B ∈ [4,12], prob [0,1/3]   A+B ∈ [5,13], prob [0,1/3]
  B ∈ [6,10], q2 = 1/3   A+B ∈ [7,13], prob [0,1/3]   A+B ∈ [8,14], prob [0,1/3]   A+B ∈ [9,15], prob [0,1/3]
  B ∈ [8,12], q3 = 1/3   A+B ∈ [9,15], prob [0,1/3]   A+B ∈ [10,16], prob [0,1/3]  A+B ∈ [11,17], prob [0,1/3]
57
Naïve Fréchet case
This p-box is not best possible.
[Figure: the naïve Fréchet p-box for A+B, on an axis from 0 to 18.]
58
Fréchet can be improved
  • Interval estimates of probabilities don't reflect
    the fact that the sum must equal one
  • Resulting p-box is too fat
  • Linear programming needed to get the optimal
    answer using this approach
  • Frank, Nelsen and Sklar gave a way to compute the
    optimal answer directly

59
Frank, Nelsen and Sklar (1987)
If X ~ F and Y ~ G, then the distribution of X+Y is

  F_{X+Y}(z) = ∫∫_{x+y ≤ z} dC(F(x), G(y)),

where C is the copula between F and G. In any case, and irrespective of this dependence, this distribution is bounded by

  sup_{x+y=z} max(F(x) + G(y) − 1, 0)  ≤  F_{X+Y}(z)  ≤  inf_{x+y=z} min(F(x) + G(y), 1).

This formula can be generalized to work with bounds on F and G.
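A minimal sketch (R) that evaluates these dependency-free bounds numerically on a grid, here for hypothetical uniform(1, 24) marginals (the example used on the next slides); the sup/inf over x + y = z is approximated by a fine grid over x.

  Fm <- function(x) punif(x, 1, 24)
  Gm <- function(y) punif(y, 1, 24)
  zs <- seq(0, 50, length.out = 201)
  xs <- seq(-30, 80, length.out = 4001)     # grid over which to take sup/inf
  lower <- sapply(zs, function(z) max(pmax(Fm(xs) + Gm(z - xs) - 1, 0)))
  upper <- sapply(zs, function(z) min(pmin(Fm(xs) + Gm(z - xs), 1)))
  plot(zs, upper, type = 'l', xlab = 'X + Y', ylab = 'Cumulative probability')
  lines(zs, lower)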
60
Best possible bounds
[Figure: the best possible (pointwise optimal) bounds on the distribution of A+B under unknown dependence, on an axis from 0 to 18.]
61
Unknown dependence
[Figure: the Fréchet bounds on X+Y for X, Y ~ uniform(1, 24), plotted as cumulative probability against X+Y from 0 to 50.]
62
Between independence and Fréchet
  • We may know something about the dependence that tightens the bounds beyond the Fréchet case
  • Dependence is positive (PQD)
  • P(X ≤ x, Y ≤ y) ≥ P(X ≤ x) P(Y ≤ y) for all x and y
  • Variables are uncorrelated
  • Pearson correlation r is zero
  • Dependence is of a particular kind

63
Unknown but positive dependence
[Figure: bounds on X+Y when the dependence is unknown but positive, X, Y ~ uniform(1, 24), X+Y from 0 to 50.]
64
Uncorrelated variables
[Figure: bounds on X+Y when X and Y are uncorrelated, X, Y ~ uniform(1, 24), X+Y from 0 to 50.]
65
Varying correlation between −1 and 1
[Figure: bounds on X+Y when the (Pearson-normal) correlation varies between −1 and 1, X, Y ~ uniform(1, 24), X+Y from 0 to 50.]
66
Can model dependence exactly too
[Figure: distributions of X+Y for X ~ normal(5, 1) and Y ~ uniform(2, 5) under various correlations and dependence functions (copulas): Frank (medial), Mardia (Kendall), and Clayton; X+Y from 4 to 12.]
67
Example: dioxin inhalation
  • Location: Superfund site in California
  • Receptor: adults in a neighboring community
  • Contaminant: dioxin
  • Exposure route: inhalation of windborne soil

Modified from Tables II and IV in Copeland, T.L., A.M. Holbrow, J.M. Otani, K.T. Conner and D.J. Paustenbach. 1994. Use of probabilistic methods to understand the conservatism in California's approach to assessing health risks posed by air contaminants. Journal of the Air and Waste Management Association 44: 1399-1413.
68
Total daily intake from inhalation
  • R = normal(20, 2) // respiration rate, m³/day
  • CGL = 2 // concentration at ground level, mg/m³
  • Finh = uniform(0.46, 1) // fraction of particulates retained in lung, unitless
  • ED = exponential(11) // exposure duration, years
  • EF = uniform(0.58, 1) // exposure frequency, fraction of a year
  • BW = normal(64.2, 13.19) // receptor body weight, kg
  • AT = gumbel(70, 8) // averaging time, years

69
Input distributions
[Figure: the input distributions, including the point value CGL.]
70
Results
[Figure: exceedance-risk curves for total daily intake (TDI, mg kg⁻¹ day⁻¹, from 0 to 2): a single distribution when all variables are mutually independent, and bounds making no assumptions about dependencies.]
71
Uncertainty about dependence
  • Impossible with sensitivity analysis since it's
    an infinite-dimensional problem
  • Kolmogorov-Fréchet bounding lets you be sure
  • Can be a large or a small consequence

72
Model uncertainty
73
Model uncertainty
  • Doubt about the structural form of the model
  • Usually incertitude rather than variability
  • Often the elephant in the middle of the room

74
Uncertainty in probabilistic analyses
  • Parameters
  • Data surrogacy
  • Distribution shape
  • Intervariable dependence
  • Arithmetic expression
  • Level of abstraction

75
General strategies
  • Sensitivity (what-if) studies
  • Probabilistic mixture
  • Bayesian model averaging
  • Enveloping and bounding analyses

76
Sensitivity (what-if) studies
  • Simply re-computes the analysis with alternative
    assumptions
  • Intergovernmental Panel on Climate Change
  • No theory required to use or understand

77
Example
  • The function f is one of two possibilities.
    Either
  • f(A,B) = fPlus(A,B) = A + B
  • or
  • f(A,B) = fTimes(A,B) = A × B
  • is the correct model, but the analyst does not
    know which. Suppose that A ~ triangular(−2.6, 0,
    2.6) and B ~ triangular(2.4, 5, 7.6).
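A minimal sketch (R) of this what-if comparison by Monte Carlo; rtri is a hypothetical helper that samples a triangular distribution by inverting its CDF.

  rtri <- function(n, lo, mode, hi) {
    u <- runif(n)
    k <- (mode - lo) / (hi - lo)
    ifelse(u < k,
           lo + sqrt(u * (hi - lo) * (mode - lo)),
           hi - sqrt((1 - u) * (hi - lo) * (hi - mode)))
  }
  A <- rtri(1e5, -2.6, 0, 2.6)
  B <- rtri(1e5,  2.4, 5, 7.6)
  plot(ecdf(A + B), xlim = c(-15, 15), main = '', xlab = 'X', ylab = 'Cumulative probability')
  lines(ecdf(A * B), col = 'gray')      # the alternative model fTimes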

78
[Figure: cumulative distributions of f(A,B) under the two candidate models, labeled Plus and Times, on an axis from −15 to 15.]
79
Drawbacks of what-if
  • Consider a long-term model of the economy under
    global warming stress
  • 3 baseline weather trends
  • 3 emission scenarios
  • 3 population models
  • 3 mitigation plans
  • Combinatorially complex as more model components
    are considered
  • Cumbersome to summarize results

80
Probabilistic mixture
  • Identify all possible models
  • Translate model uncertainty into choices about
    distributions
  • Vertically average probability densities (or
    cumulative probabilities)
  • Can use weights to account for credibility

Show graph with both density and cdf, with equal
weights
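A minimal sketch (R) of vertically averaging cumulative probabilities; the two CDFs here are hypothetical stand-ins for the candidate models, and the equal weights are an assumption that could be replaced by credibility weights.

  z  <- seq(-15, 15, length.out = 601)
  F1 <- pnorm(z, mean = 5, sd = 2)      # stand-in CDF for one candidate model
  F2 <- pnorm(z, mean = 0, sd = 6)      # stand-in CDF for the other
  Fmix <- 0.5 * F1 + 0.5 * F2           # probabilistic mixture of the two models
  plot(z, Fmix, type = 'l', xlab = 'X', ylab = 'Cumulative probability')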
81
Example
[Figure: cumulative distributions for the two what-if curves and their probabilistic mixture, with weights P(fPlus) = 2 P(fTimes), on an axis from −15 to 15.]
82
Probabilistic mixture
  • State of the art in probabilistic risk analysis
  • Nuclear power plant risk assessments
  • Need to know what all the possibilities are
  • If you don't know the weights, assume they are equal

83
Drawbacks of mixture
  • If you cannot enumerate the possible models, you
    can't use this approach
  • Averages together incompatible theories and
    yields an answer that neither theory supports
  • Can underestimate tail risks

84
Bayesian model averaging
  • Similar to the probabilistic mixture
  • Updates prior probabilities to get weights
  • Takes account of available data

85
Bayesian model averaging
  • Assume one of the two models is correct
  • Compute probability distribution for f(A,B)
  • Read off probability density of observed data
  • That's the likelihood of the model
  • Repeat the above steps for each model
  • Compute posterior ∝ prior × likelihood
  • This gives the Bayes factors
  • Use the posteriors as weights for the mixture
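A minimal sketch (R) of these model-averaging weights for the two candidate models, using a kernel-density estimate of each model's predictive density at the observed datum (7.59, from the next slide); equal priors are assumed, and rtri is the hypothetical triangular sampler from the earlier sketch.

  rtri <- function(n, lo, mode, hi) {
    u <- runif(n); k <- (mode - lo) / (hi - lo)
    ifelse(u < k, lo + sqrt(u * (hi - lo) * (mode - lo)),
                  hi - sqrt((1 - u) * (hi - lo) * (hi - mode)))
  }
  A <- rtri(1e5, -2.6, 0, 2.6); B <- rtri(1e5, 2.4, 5, 7.6)
  datum  <- 7.59
  Lplus  <- approx(density(A + B), xout = datum)$y   # likelihood under f(A,B) = A + B
  Ltimes <- approx(density(A * B), xout = datum)$y   # likelihood under f(A,B) = A * B
  w <- c(plus = Lplus, times = Ltimes) / (Lplus + Ltimes)   # posterior model weights
  w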

86
Datum: f(A,B) = 7.59
[Figure: cumulative distributions comparing the Bayes model-averaged answer with the what-if curves and the mixture, on an axis from −15 to 15.]
87
Bounding probabilities
  • Translate model uncertainties to a choice among
    distributions
  • Envelope the cumulative distributions

88
[Figure: cumulative distributions comparing the Bayes, what-if, and mixture answers with their envelope, on an axis from −15 to 15.]
89
Strategy for enumerable models
  • What-if analysis isn't feasible in big problems
  • Probabilistic mixture is, at best, ad hoc
  • For abundant data, the Bayesian approach is best
  • Otherwise, it's probably just wishful thinking
  • Bounding is reliable, but may be too wide

90
Uncertainty about distribution shape
  • Suppose the correct model is known to be f(A,B) = A + B, but the distributions for the independent variables A and B are not precisely known.

[Figure: p-boxes for A (axis from −10 to 20) and B (axis from −10 to 10), representing the imprecisely known input distributions.]
91
Strategy for distribution shape
  • Very challenging for sensitivity analysis since
    it is an infinite-dimensional problem
  • Bayesians usually fall back on a maximum entropy
    approach, which erases uncertainty rather than
    propagates it
  • Bounding seems most reasonable, but should
    reflect all available information

92
Uncertainty about dependence
  • Neither sensitivity studies nor Monte Carlo
    simulation can comprehensively assess it
  • Bayesian model averaging can't even begin
  • Only bounding strategies work

93
What-if sensitivity analysis
  • Simple theory
  • Straightforward to implement
  • Doesn't confuse variability and incertitude
  • Must enumerate all possible models
  • Combinatorial complexity
  • Hard to summarize

94
Probabilistic mixture
  • Produces single distribution as answer
  • Can account for differential credibility
  • Must enumerate all possible models
  • Confounds variability and incertitude
  • Averages together incompatible theories
  • Underestimates tail risks

95
Bayesian model averaging
  • Produces single distribution as answer
  • Can account for differential prior credibility
  • Takes account of available data
  • Must enumerate all possible models
  • May be computationally challenging
  • Confounds variability and incertitude
  • Averages together incompatible theories
  • Underestimates tail risks

96
Bounding probability
  • Straightforward theoretically
  • Yields single mathematical object as answer
  • Doesn't confuse variability and incertitude
  • Doesn't underestimate tail risks
  • Cannot account for differential credibility
  • Cannot take account of available data
  • Optimality may be computationally expensive

97
Strategy for model uncertainty
  • Bounding seems best when data are sparse
  • When the answer is clear, strong assurance for
    the conclusion
  • If the answer is too wide, need more information
    to tighten it
  • Unless it's okay to mislead people about the
    reliability of your conclusions

98
Updating
99
Updating
  • Using knowledge of how variables are related to
    tighten their estimates
  • Removes internal inconsistency and explicates
    unrecognized knowledge
  • Also called constraint updating or editing
  • Also called natural extension

100
Example
  • Suppose
  • W = [23, 33]
  • H = [112, 150]
  • A = [2000, 3200]
  • Does knowing W × H = A let us say any more?

101
Answer
  • Yes, we can infer that
  • W = [23, 28.57]
  • H = [112, 139.13]
  • A = [2576, 3200]
  • The formulas are just W = intersect(W, A/H), etc.

To get the largest possible W, for instance, let
A be as large as possible and H as small as
possible, and solve for W = A/H.
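A minimal sketch (R) of this interval constraint updating for W × H = A, using the endpoint formulas that are valid because all three intervals are positive.

  int_intersect <- function(x, y) c(max(x[1], y[1]), min(x[2], y[2]))
  int_divide    <- function(x, y) c(x[1] / y[2], x[2] / y[1])   # positive intervals
  int_multiply  <- function(x, y) c(x[1] * y[1], x[2] * y[2])   # positive intervals
  W <- c(23, 33); H <- c(112, 150); A <- c(2000, 3200)
  int_intersect(W, int_divide(A, H))     # updated W: [23, 28.57]
  int_intersect(H, int_divide(A, W))     # updated H: [112, 139.13]
  int_intersect(A, int_multiply(W, H))   # updated A: [2576, 3200]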
102
Bayesian strategy
Prior
Likelihood
Posterior
103
Bayes rule
  • Concentrates mass onto the manifold of feasible
    combinations of W, H, and A
  • Answers have the same supports as intervals
  • Computationally complex
  • Needs specification of priors
  • Yields distributions that are not justified (come
    from the choice of priors)
  • Expresses less uncertainty than is present

104
Updating with p-boxes
[Figure: p-boxes for the inputs A (2000 to 4000), W (20 to 40), and H (120 to 160) used in the updating.]
105
Answers
106
Calculation with p-boxes
  • Agrees with interval analysis whenever inputs are
    intervals
  • Relaxes Bayesian strategy when precise priors are
    not warranted
  • Produces more reasonable answers when priors not
    well known
  • Much easier to compute than Bayes rule

107
Backcalculation
108
Backcalculation
  • Needed for cleanup and remediation planning
  • Untangles an equation in uncertain numbers when
    we know all but one of the variables
  • For instance, backcalculation finds B such that
    A + B = C, from estimates for A and C

109
Hard with probability distributions
  • Inverting the equation doesn't work
  • Available analytical algorithms are unstable for
    almost all problems
  • Except in a few special cases, Monte Carlo
    simulation cannot compute backcalculations; trial-and-error methods are required

110
Can't just invert the equation
  • Dose = Concentration × Intake
  • Concentration = Dose / Intake
  • When the concentration is put back into the forward
    equation, the resulting dose is wider than planned

111
Backcalculation with p-boxes
  • Suppose A + B = C, where
  • A = normal(5, 1)
  • C: 0 ≤ C, median ≤ 1.5, 90th percentile ≤ 35, max ≤ 50

[Figure: the p-box for the target C, on an axis from 0 to 60.]
112
Getting the answer
  • The backcalculation algorithm basically reverses
    the forward convolution
  • Not hard at all, but a little messy to show
  • Any distribution totally inside B is
    sure to satisfy the constraint;
    it's a kernel

[Figure: the backcalculated kernel B, on an axis from −10 to 50.]
113
Check it by plugging it back in
  • A + B ⊆ C

114
Precise distributions don't work
  • Precise distributions can't express the target
  • A concentration distribution giving a prescribed
    distribution of doses seems to say we want some
    doses to be high
  • Any distribution to the left would be better
  • A p-box on the dose target expresses this idea

115
Backcalculation algebra
  • Can define untanglings for all basic operations
  • e.g., if A × B = C, then B = exp(backcalc(ln A,
    ln C))
  • Can chain them together for big problems
  • Assuming independence widens the result
  • Repeated terms need special strategies
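A minimal sketch (R) of the simplest untangling, backcalculation for intervals when A + B = C: the forward sum of [a1, a2] and [b1, b2] is [a1+b1, a2+b2], so the kernel B that exactly reproduces the target C is [c1−a1, c2−a2], and it exists only if C is at least as wide as A. The values here are hypothetical.

  backcalc_add <- function(A, C) {
    stopifnot(diff(C) >= diff(A))      # otherwise no interval kernel exists
    c(C[1] - A[1], C[2] - A[2])
  }
  A <- c(2, 5); C <- c(10, 20)
  B <- backcalc_add(A, C)              # [8, 15]
  A + B                                # forward check: [10, 20], equal to C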

116
Conclusion
  • Planning cleanup requires backcalculation
  • Monte Carlo methods don't generally work except
    in a trial-and-error approach
  • Can express the dose target as a p-box

117
Sensitivity analysis
118
Sensitivity analysis with p-boxes
  • Local sensitivity via derivatives
  • Explored macroscopically over the uncertainty in
    the input
  • Describes the ensemble of tangent slopes to the
    function over the range of uncertainty

119
[Figure: the ensemble of tangent slopes over the range of the input, for a monotone function and for a nonlinear function.]
120
Sensitivity analysis of p-boxes
  • Quantifies the reduction in uncertainty of a
    result when an input is pinched
  • Pinching is hypothetically replacing it by a less
    uncertain characterization
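A minimal sketch (R) of a pinching study with plain intervals (hypothetical inputs): how much does the output's width shrink if one input is pinched to a point value?

  iwidth <- function(x) max(x) - min(x)
  f <- function(a, b) range(outer(a, b, '+'))   # output interval from input endpoints
  a <- c(2, 6); b <- c(1, 9)
  base    <- iwidth(f(a, b))                    # width with both inputs uncertain
  pinched <- iwidth(f(mean(a), b))              # pinch a to its midpoint
  100 * (1 - pinched / base)                    # percent reduction in uncertainty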

121
Pinching to a point value
[Figure: a p-box for X (0 to 3) before pinching, and the degenerate (spike) distribution after pinching X to a point value.]
122
Pinching to a (precise) distribution
[Figure: a p-box for X (0 to 3) before pinching, and the single precise distribution after pinching.]
123
Pinching to a zero-variance interval
[Figure: a p-box for X (0 to 3) pinched to a zero-variance interval.]
  • Assumes the value is constant, but unknown
  • There's no analog of this in Monte Carlo

124
Using sensitivity analyses
  • There is only one take-home message
  • Shortlisting variables for treatment is bad
  • Reduces dimensionality, but erases uncertainty

125
Differential equations
126
Uncertainty usually explodes
The explosion can be traced to numerical instabilities.
[Figure: the bounds on the state x explode as time increases.]
127
Uncertainty
  • Artifactual uncertainty
  • Too few polynomial terms
  • Numerical instability
  • Can be reduced by a better analysis
  • Authentic uncertainty
  • Genuine unpredictability due to input uncertainty
  • Cannot be reduced by a better analysis

Only by more information, data or assumptions
128
Uncertainty propagation
  • We want the prediction to break down if that's
    what should happen
  • But we don't want artifactual uncertainty
  • Numerical instabilities
  • Wrapping effect
  • Dependence problem
  • Repeated parameters

129
Problem
  • Nonlinear ordinary differential equation (ODE)
  • dx/dt = f(x, θ)
  • with uncertain θ and uncertain initial state x0
  • Information about θ and x0 comes as
  • Interval ranges
  • Probability distributions
  • Probability boxes

130
Model, initial states (ranges), and parameters (ranges) go into VSPODE (Mark Stadtherr et al., Notre Dame), an interval Taylor series code using Taylor models (see also COSY and VNODE), which returns a list of constants plus a remainder interval.
131
Example ODE
  • dx1/dt = θ1 x1 (1 − x2)
  • dx2/dt = θ2 x2 (x1 − 1)
  • What are the states at t = 10?
  • x0 = (1.2, 1.1)ᵀ
  • θ1 ∈ [2.99, 3.01]
  • θ2 ∈ [0.99, 1.01]
  • VSPODE settings
  • Constant step size h = 0.1, order of Taylor model q = 5,
  • order of interval Taylor series k = 17, QR factorization
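A minimal sketch (R, using the deSolve package) that integrates this ODE at the nominal parameter values only, just to see the state at t = 10; it does not propagate the interval uncertainty, which is what VSPODE's Taylor models provide.

  library(deSolve)
  f <- function(t, x, theta) {
    list(c(theta[1] * x[1] * (1 - x[2]),
           theta[2] * x[2] * (x[1] - 1)))
  }
  out <- ode(y = c(1.2, 1.1), times = seq(0, 10, by = 0.1), func = f, parms = c(3, 1))
  tail(out, 1)    # x1 and x2 at t = 10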

132
VSPODE tells how to compute x1
VSPODE returns x1 at t = 10 as a Taylor-model polynomial in the centered parameters (θ1 replaced by θ1 − 3 and θ2 by θ2 − 1) plus an enclosing interval:

  x1 =   1.916037656181642 θ1⁰ θ2¹  +   0.689979149231081 θ1¹ θ2⁰
       −  4.690741189299572 θ1⁰ θ2²  −   2.275734193378134 θ1¹ θ2¹
       −  0.450416914564394 θ1² θ2⁰  −  29.788252573360062 θ1⁰ θ2³
       − 35.200757076497972 θ1¹ θ2²  −  12.401600707197074 θ1² θ2¹
       −  1.349694561113611 θ1³ θ2⁰  +   6.062509834147210 θ1⁰ θ2⁴
       − 29.503128650484253 θ1¹ θ2³  −  25.744336555602068 θ1² θ2²
       −  5.563350070358247 θ1³ θ2¹  −   0.222000132892585 θ1⁴ θ2⁰
       + 218.607042326120308 θ1⁰ θ2⁵ + 390.260443722081675 θ1¹ θ2⁴
       + 256.315067368131281 θ1² θ2³ +  86.029720297509172 θ1³ θ2²
       +  15.322357274648443 θ1⁴ θ2¹ +   1.094676837431721 θ1⁵ θ2⁰
       + [1.1477537620811058, 1.1477539164945061]

(The presenter's notes enter this same polynomial in Risk Calc after constructing the centered parameters t1 = θ1 − 3 and t2 = θ2 − 1 as p-boxes.)
133
Input p-boxes
[Figure: input p-boxes for θ1 and θ2 under four characterizations: uniform, normal, {min, max, mean, var}, and precise.]
In the presenter's notes these inputs are constructed in Risk Calc: an envelope of uniforms (env(uniform(0, 0.004), uniform(0, 0.005))) for the uniform case, N(3, [0.002, 0.0038]) and N(1, [0.002, 0.0038]) for the normal case, mmms(2.99, 3.0099999999, 3, 0.005) and mmms(0.99, 1.0099999999, 1, 0.005) for the moment-constrained case, and U(2.99, 3.0099999999) and U(0.99, 1.0099999999) for the precise case.
134
Results
[Figure: p-boxes for the states x1 and x2 at t = 10 under the four input characterizations: uniform, normal, {min, max, mean, var}, and precise.]
In the presenter's notes the results are obtained by evaluating the Taylor-model maps for each characterization of the centered inputs, e.g. o = MY1map2(t1, t2) and o = MY2map2(t1, t2) with t1 = N(3, [0.002, 0.0038]) − 3 and t2 = N(1, [0.002, 0.0038]) − 1, and similarly for the mmms and uniform cases.
135
Still repeated uncertainties
  • The same Taylor-model polynomial for x1 from slide 132: the centered parameters θ1 and θ2 each appear repeatedly, with various powers, across its twenty terms (plus the enclosing interval [1.1477537620811058, 1.1477539164945061]), so naive interval evaluation still inflates the uncertainty.

136
Subinterval reconstitution
  • Subinterval reconstitution (SIR)
  • Partition the inputs into subintervals
  • Apply the function to each subinterval
  • Form the union of the results
  • Still rigorous, but often tighter
  • The finer the partition, the tighter the union
  • Many strategies for partitioning
  • Apply to each cell in the Cartesian product
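A minimal sketch (R) of subinterval reconstitution for a toy function with a repeated variable, f(x) = x(1 − x) on x = [0, 1]: naive interval arithmetic treats the two occurrences of x independently and returns [0, 1], while the union over a finer partition tightens toward the true range [0, 0.25].

  naive_f <- function(lo, hi) {                 # naive interval evaluation of x*(1-x)
    prods <- outer(c(lo, hi), c(1 - hi, 1 - lo))
    c(min(prods), max(prods))
  }
  sir <- function(lo, hi, n) {                  # partition, evaluate each cell, take the union
    cuts  <- seq(lo, hi, length.out = n + 1)
    cells <- mapply(naive_f, cuts[-(n + 1)], cuts[-1])
    c(min(cells[1, ]), max(cells[2, ]))
  }
  naive_f(0, 1)    # [0, 1]: inflated by the repeated variable
  sir(0, 1, 10)    # about [0, 0.3]; finer partitions approach [0, 0.25]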

137
Discretizations
[Figure: the interval [2.99, 3.01] for θ1 partitioned into subintervals.]
138
Contraction from SIR
[Figure: p-boxes for x1 (about 1.12 to 1.16) and x2 (about 0.87 to 0.90) contracting as the partition is refined; the presenter's notes overlay a sequence of results, Y1.4 through Y1.8 for x1 and Y2.4 through Y2.8 for x2.]
Best possible bounds reveal the authentic uncertainty.
139
Monte Carlo is more limited
  • Monte Carlo cannot propagate incertitude
  • Monte Carlo cannot produce validated results
  • Though results can be checked by repeating the simulation
  • Validated results from distributions can be
    obtained by modeling inputs with (narrow) p-boxes
    and applying probability bounds analysis
  • Results converge to narrow p-boxes obtained from
    infinitely many Monte Carlo replications

140
Conclusions
  • VSPODE is useful for bounding solutions of
    parametric nonlinear ODEs
  • Probability bounds analysis is useful when
    distributions are known imprecisely
  • Together, they rigorously propagate uncertainty
    through a nonlinear ODE
  • Intervals
  • Distributions
  • P-boxes

(for both the initial states and the parameters)
141
Conclusions
142
Rigorousness
  • Automatically verified calculations
  • The computations are guaranteed to enclose the
    true results (so long as the inputs do)
  • You can still be wrong, but the method won't be
    the reason if you are

143
How to use the results
  • When uncertainty makes no difference
    (because results are so clear), bounding gives
    confidence in the reliability of the decision
  • When uncertainty obscures the decision
  • (i) use results to identify inputs to study
    better, or
  • (ii) use other criteria within probability bounds

144
Can uncertainty swamp the answer?
  • Sure, if uncertainty is huge
  • This should happen (it's not unhelpful)
  • If you think the bounds are too wide, then put in
    whatever information is missing
  • If there isn't any such information, do you want
    to mislead your readers?

145
Monte Carlo is problematic
  • Probability doesn't accumulate gross uncertainty
    in an intuitive way
  • Precision of the answer (measured as cv) depends
    strongly on the number of inputs
  • The more inputs, the tighter the answer,
    irrespective of the distribution shape

146
A few grossly uncertain inputs
147
A lot of grossly uncertain inputs...
Where does this surety come from? What justifies
it?
148
Smoke and mirrors certainty
  • P-boxes give a vacuous answer if all you provide
    are vacuous inputs
  • Conventional probability theory, at least as it's
    naively applied, seems to manufacture certainty