1
Gaussian process modelling
2
Outline
  • Emulators
  • The basic GP emulator
  • Practical matters

3
Emulators
4
Simulator, meta-model, emulator
  • I'll refer to a computer model as a simulator
  • It aims to simulate some real-world phenomenon
  • A meta-model is a simplified representation or
    approximation of a simulator
  • Built using a training set of simulator runs
  • Importantly, it should run much more quickly than
    the simulator itself
  • So it serves as a quick surrogate for the
    simulator, for any task that would require many
    simulator runs
  • An emulator is a particular kind of meta-model
  • More than just an approximation, it makes fully
    probabilistic predictions of what the simulator
    would produce
  • And those probability statements correctly
    reflect the training information

5
Meta-models
  • Various kinds of meta-models have been proposed
    by modellers and model users
  • Notably regression models and neural networks
  • But they misrepresent the training data
  • The line does not pass through the points
  • The variance around the line also has the wrong
    form

6
Emulation
  • Desirable properties for a meta-model
  • If asked to predict the simulator output at one
    of the training data points, it returns the
    observed output with zero variance
  • Assuming the simulator output doesn't have random
    noise
  • So it must be sufficiently flexible to pass
    through all the training data points
  • Not restricted to some regression form
  • If asked to predict output at another point its
    predictions will have non-zero variance,
    reflecting realistic uncertainty
  • Given enough training data it should be able to
    predict simulator output to any desired accuracy
  • These properties characterise what we call an
    emulator

7
2 code runs
  • Consider one input and one output
  • Emulator estimate interpolates data
  • Emulator uncertainty grows between data points

8
3 code runs
  • Adding another point changes estimate and reduces
    uncertainty

9
5 code runs
  • And so on

10
The basic GP emulator
11
Gaussian processes
  • A Gaussian process (GP) is a probability
    distribution for an unknown function
  • A kind of infinite-dimensional multivariate
    normal distribution
  • If a function f(x) has a GP distribution we write
  • f(.) ~ GP(m(.), c(.,.))
  • m(.) is the mean function
  • c(.,.) is the covariance function
  • f(x) has a normal distribution with mean m(x) and
    variance c(x,x)
  • c(x,x') is the covariance between f(x) and f(x')
  • A GP emulator represents the simulator as a GP
    (sketched numerically below)
  • Conditional on some unknown parameters
  • Estimated from the training data
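A minimal numerical sketch of such predictions, assuming numpy, the
squared-exponential covariance c(x,x') = σ² exp(-((x-x')/δ)²), and
illustrative data and parameter values (the regression mean discussed
on the next slide is omitted, so the prior mean is zero):

    import numpy as np

    def sq_exp_cov(a, b, sigma2=1.0, delta=1.0):
        # Squared-exponential covariance with variance sigma2 and
        # correlation length delta
        d = a[:, None] - b[None, :]
        return sigma2 * np.exp(-(d / delta) ** 2)

    # Three training runs of the simulator (illustrative values)
    x = np.array([0.0, 0.5, 1.0])
    y = np.array([0.2, 0.9, 0.3])

    # GP posterior mean and variance at new input points
    x_new = np.array([0.0, 0.25, 4.0])
    K = sq_exp_cov(x, x)
    k = sq_exp_cov(x_new, x)
    mean = k @ np.linalg.solve(K, y)
    var = np.diag(sq_exp_cov(x_new, x_new) - k @ np.linalg.solve(K, k.T))

    print(mean)  # reproduces y[0] = 0.2 exactly at x_new = 0.0
    print(var)   # ~0 at the training point, > 0 at 0.25, ~sigma2 at 4.0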

12
The mean function
  • The emulator's mean function provides the central
    estimate for predicting the model output f(x)
  • It has two parts
  • A conventional regression component
  • r(x) = µ + β1h1(x) + β2h2(x) + ... + βphp(x)
  • The regression terms hj(x) are a modelling choice
  • Should reflect how we expect the simulator to
    respond to its inputs
  • E.g. r(x) = µ + β1x1 + β2x2 + ... + βpxp models a
    general linear trend
  • The coefficients µ and βj are estimated from the
    training data (a least-squares sketch follows
    this list)
  • A smooth interpolator of the residuals yi − r(xi)
    at the training points
  • Smoothness is controlled by correlation length
    parameters
  • Also estimated from the training data
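A short sketch of fitting the regression component, assuming numpy and
the illustrative basis h1(x) = 1, h2(x) = x (a linear trend); a full
emulator would use generalised least squares and then smoothly
interpolate these residuals:

    import numpy as np

    x = np.array([0.0, 0.5, 1.0])   # training inputs (illustrative)
    y = np.array([0.2, 0.9, 0.3])   # training outputs

    # Design matrix whose columns are the regression terms h_j(x)
    H = np.column_stack([np.ones_like(x), x])

    # Ordinary least-squares estimates of (mu, beta_1)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)

    # Residuals y_i - r(x_i), to be smoothly interpolated by the GP
    residuals = y - H @ beta
    print(beta, residuals)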

13
The mean function: example
Red dots are training data; green line is the
regression line; black line is the emulator mean.
Red dots are residuals from the regression through
the training data; black line is the smoothed
residuals.
14
The prediction variance
  • The variance of f(x) depends on where x is
    relative to training data
  • At a training data point, it is zero
  • Moving away from a training point, it grows
  • Growth depends on correlation lengths
  • When far from any training point (relative to
    correlation lengths), it resolves into two
    components
  • The usual regression variance
  • An interpolator variance
  • Estimated from observed variance of residuals
  • The mean function is then just the regression
    part (a numeric check follows)
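A numeric check of this behaviour, using the same illustrative
zero-mean setup as before (so the regression variance is absent and
the far-field variance is just the interpolator variance σ² = 1):

    import numpy as np

    def sq_exp_cov(a, b, sigma2=1.0, delta=1.0):
        d = a[:, None] - b[None, :]
        return sigma2 * np.exp(-(d / delta) ** 2)

    x = np.array([0.0, 0.5, 1.0])   # training inputs
    K = sq_exp_cov(x, x)
    for x0 in [0.0, 0.25, 1.5, 5.0]:
        k = sq_exp_cov(np.array([x0]), x)
        var = 1.0 - (k @ np.linalg.solve(K, k.T))[0, 0]
        print(x0, round(var, 4))    # 0 at a training point, ~1 far away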

15
Correlation length
  • Correlation length parameters are crucial
  • But difficult to estimate
  • There is one correlation length for each input
  • Points less than one correlation length away in a
    single input are highly correlated
  • Learning f(x') says a lot about f(x)
  • So if x' is a training point, the predictive
    uncertainty about f(x) is small
  • But if we go more than about two correlation
    lengths away, the correlation is minimal
  • We now ignore f(x') when predicting f(x)
  • Just use regression
  • Large correlation length signifies an input with
    very smooth and predictable effect on simulator
    output
  • Small correlation length denotes an input with a
    more variable, fine-scale influence on the output
    (illustrated below)
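A quick illustration, assuming the squared-exponential correlation
exp(-(d/δ)²) used in the earlier sketches, with distance d measured
against correlation length δ = 1:

    import numpy as np

    delta = 1.0
    for d in [0.5, 1.0, 2.0, 3.0]:
        print(d, np.exp(-(d / delta) ** 2))
    # 0.5 -> 0.78 (high), 1.0 -> 0.37, 2.0 -> 0.02, 3.0 -> 0.0001 (minimal)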

16
Correlation length and variance
Examples of GP realisations. GEM-SA uses a
roughness parameter b, which is the inverse square
of the correlation length (conversion sketched
below); σ² is the interpolation variance.
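The conversion between GEM-SA's roughness parameter and the
correlation length, with an illustrative value of b:

    import math

    b = 4.0                     # roughness parameter (illustrative)
    delta = 1.0 / math.sqrt(b)  # correlation length: b = 1 / delta**2
    print(delta)                # 0.5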
17
Practical matters
18
Modelling
  • The main modelling decision is to choose the
    regression terms hj(x)
  • Want to capture the broad shape of the response
    of the simulator to its inputs
  • Then residuals are small
  • Emulator predicts f(x) with small variance
  • And predicts realistically for x far from
    training data
  • If we get it wrong
  • Residuals will be unnecessarily large
  • Emulator has unnecessarily large variance when
    interpolating
  • And extrapolates wrongly

19
Design
  • Another choice is the set of training data points
  • This is a kind of experimental design problem
  • We want points spread over the part of the input
    space for which the emulator is needed
  • So that no prediction is too far from a training
    point
  • We want this to be true also when we project the
    points into lower dimensions
  • So that prediction points are not too far from
    training points in dimensions (inputs) with small
    correlation lengths
  • We also want some points closer to each other
  • To estimate correlation lengths better
  • Conventional designs don't take account of this
    yet! (A basic Latin hypercube sketch follows.)
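A sketch of the basic (random) Latin hypercube construction, assuming
numpy: each input's range is cut into n strata and each stratum is
used exactly once, so the points remain spread out when projected onto
any single input. It does not add the extra close-together pairs
mentioned above.

    import numpy as np

    def latin_hypercube(n, d, seed=0):
        # One random permutation of the n strata per input dimension
        rng = np.random.default_rng(seed)
        strata = np.column_stack([rng.permutation(n) for _ in range(d)])
        jitter = rng.random((n, d))   # random position within each stratum
        return (strata + jitter) / n  # n points in [0, 1]^d

    design = latin_hypercube(n=10, d=3)
    print(design)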

20
Validation
  • No emulator is perfect
  • The GP emulator is based on assumptions
  • A particular form of covariance function
    parametrised by just one correlation length
    parameter per input
  • Homogeneity of variance and correlation structure
  • Simulators rarely behave this nicely!
  • Getting the regression component right
  • Normality
  • Not usually a big issue
  • Estimating parameters accurately from the
    training data
  • Can be a problem for correlation lengths
  • Failure of these assumptions will mean the
    emulator does not predict faithfully
  • f(x) will too often lie outside the range of its
    predictive distribution
  • So we need to apply suitable diagnostic checks
    (one is sketched below)
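One simple check, sketched with illustrative numbers: standardise the
errors at held-out simulator runs by the emulator's predictive
standard deviations; if the emulator predicts faithfully, roughly 95%
of the |z| values should fall below 2.

    import numpy as np

    y_true = np.array([0.31, 0.55, 0.12])     # held-out simulator outputs
    pred_mean = np.array([0.30, 0.50, 0.20])  # emulator predictive means
    pred_sd = np.array([0.05, 0.04, 0.03])    # emulator predictive s.d.s

    z = (y_true - pred_mean) / pred_sd        # standardised errors
    print(z, np.mean(np.abs(z) < 2))          # large |z| values signal trouble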

21
When to use GP emulation
  • The simulator output should vary smoothly in
    response to changing its inputs
  • Discontinuities are difficult to emulate
  • Very rapid and erratic responses to inputs also
    may need unreasonably many training data points
  • The simulator is computer intensive
  • So it's not practical to run it many thousands of
    times for Monte Carlo methods
  • But not so expensive that we can't run it a few
    hundred times to build a good emulator
  • Not too many inputs
  • Fitting the emulator is hard
  • Particularly if more than a few inputs influence
    the output strongly

22
Stochastic simulators
  • Throughout this course we are assuming the
    simulator is deterministic
  • Running it again at the same inputs will produce
    the same outputs
  • If there is random noise in the outputs we can
    modify the emulation theory (see the sketch after
    this list)
  • Mean function doesn't have to pass through the
    data
  • Noise increases predictive variance
  • The benefits of the GP emulator are less
    compelling
  • But we are working on this!
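A sketch of one common modification, assuming the same illustrative
squared-exponential setup as earlier: a noise ("nugget") variance τ²
is added to the diagonal of the covariance matrix, so the mean no
longer has to interpolate the data and the predictive variance is
inflated.

    import numpy as np

    def sq_exp_cov(a, b, sigma2=1.0, delta=1.0):
        d = a[:, None] - b[None, :]
        return sigma2 * np.exp(-(d / delta) ** 2)

    x = np.array([0.0, 0.5, 1.0])
    y = np.array([0.2, 0.9, 0.3])
    tau2 = 0.05                                   # noise variance (illustrative)

    K = sq_exp_cov(x, x) + tau2 * np.eye(len(x))  # nugget on the diagonal
    mean_at_data = sq_exp_cov(x, x) @ np.linalg.solve(K, y)
    print(mean_at_data)  # no longer reproduces y exactly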

23
References
  1. O'Hagan, A. (2006). Bayesian analysis of computer
     code outputs: a tutorial. Reliability Engineering
     and System Safety 91, 1290-1300.
  2. Santner, T. J., Williams, B. J. and Notz, W. I.
     (2003). The Design and Analysis of Computer
     Experiments. New York: Springer.
  3. Rasmussen, C. E. and Williams, C. K. I. (2006).
     Gaussian Processes for Machine Learning.
     Cambridge, MA: MIT Press.