Errors and uncertainties in measuring and modelling surface-atmosphere exchanges

Slides: 55

Provided by: andrewri

Learn more at: https://www.forest.sr.unh.edu

1
Errors and uncertainties in measuring and
modelling surface-atmosphere exchanges

Andrew D. Richardson
University of New Hampshire
NSF/NCAR Summer Course on Flux Measurements
Niwot Ridge, July 17 2008

2
Outline

Introduction to errors in data
Errors in flux measurements
Different methods to quantify flux errors
Implications for modeling

3
(No Transcript)
4
32,000 how much? (note the super calculation
error!)
5
Introduction to errors in data
6
Errors are unavoidable, but errors don't have to
cause disaster
(Gare Montparnasse, Paris, 22 October 1895)
7
Errors in data

Why do we have measurement errors?

8
Errors in data

Why do we have measurement errors?
Instrument errors, glitches, bugs
Instrument calibration errors
Imperfect instrument design, less-than-ideal
application
Instrument resolution
Problem of definition: what are we trying to
measure, anyway?
Errors are unavoidable and inevitable, but they
can always be reduced
Errors are not necessarily bad, but not knowing
what they are, or having an unrealistic view of
what they are, is bad

9
Contemporary political perspective

There are known knowns. These are things we know
that we know. There are known unknowns. That is
to say, there are things that we know we don't
know. But there are also unknown unknowns. There
are things we don't know we don't know.
Donald Rumsfeld
(February 12, 2002)

10
What is measurement uncertainty?

A measurement is never perfect - data are not
'truth' (corrupted truth?)
Uncertainty describes the inevitable error of a
measurement
$x' = x + \delta + \varepsilon$

$x'$ is what is actually measured; it includes both
systematic ($\delta$) and random ($\varepsilon$) components
typically, $\varepsilon$ is assumed Gaussian, and is
characterized by its standard deviation, $\sigma(\varepsilon)$

11
Types of errors

Random error
Unpredictable, stochastic
Scatter, noise, precision
Cannot be corrected (because they are stochastic)
Example: noisy analyzer (electrical interference)
Systematic error
Deterministic, predictable
Bias, accuracy
Can be corrected (if you know what the correction
is)
Example: mis-calibrated analyzer (bad zero or bad
span)

12
Propagation of errors

Random errors
true value, $x$; measure $x_i = x + \varepsilon_i$, where $\varepsilon_i$ is
a random variable $\sim N(0, \sigma_i)$
average out over time, thus errors accumulate
in quadrature
expected error on $(x_1 + x_2)$ is
$\sqrt{\sigma_1^2 + \sigma_2^2}$, which is less than $(\sigma_1 + \sigma_2)$
Systematic errors
fixed biases don't average out, but rather
accumulate linearly
measure $x_i = x + \delta_i$, where $\delta_i$ is not a random
variable
expected error on $(x_1 + x_2)$ is just $(\delta_1 + \delta_2)$
So random and systematic errors are
fundamentally different in how they affect data
and interpretation
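
A small simulation can make the contrast concrete. The sketch below assumes Gaussian random errors and a fixed bias on each measurement. The function name `sum_with_errors` and all numbers are illustrative and are not taken from the slides.

```python
import math
import random
import statistics

def sum_with_errors(n, sigma, bias, rng):
    """Sum n measurements of a true value of 0, each with a random error
    N(0, sigma) and a fixed systematic error `bias`."""
    return sum(rng.gauss(0.0, sigma) + bias for _ in range(n))

rng = random.Random(42)
n, sigma, bias = 100, 1.0, 0.1   # hypothetical values
totals = [sum_with_errors(n, sigma, bias, rng) for _ in range(5000)]

random_part = statistics.stdev(totals)      # grows like sigma * sqrt(n)
systematic_part = statistics.mean(totals)   # grows like bias * n

print(f"spread of the sum: {random_part:.2f} (quadrature: {sigma * math.sqrt(n):.2f})")
print(f"offset of the sum: {systematic_part:.2f} (linear: {bias * n:.2f})")
```

With these assumed values, a bias one tenth the size of the random error contributes about as much to the summed error after 100 measurements as the random error does.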

13
Precision and Accuracy: Target analogy

Accuracy: how close to center
Precision: how close together

High accuracy, low precision
High precision, low accuracy
14
Evaluating Errors

Random errors and precision
Make repeated measurements of the same thing
What is the scatter in those measurements?
Systematic errors and accuracy
Measure a reference standard
What is the bias?
Not always possible to quantify some systematic
errors (know they're there, but don't have a
standard we can measure)
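
As a minimal sketch of the two checks above: scatter is estimated from repeated readings, and bias from the offset against a reference standard. The analyzer, its offset and its noise level are hypothetical.

```python
import random
import statistics

def evaluate_errors(readings, reference):
    """Return (bias, scatter) for repeated readings of a known standard."""
    bias = statistics.mean(readings) - reference   # systematic error (accuracy)
    scatter = statistics.stdev(readings)           # random error (precision)
    return bias, scatter

# Hypothetical analyzer: reads 2.0 units high, with noise of 0.5 units.
rng = random.Random(1)
reference = 400.0
readings = [reference + 2.0 + rng.gauss(0.0, 0.5) for _ in range(200)]
bias, scatter = evaluate_errors(readings, reference)
print(f"bias = {bias:.2f}, scatter = {scatter:.2f}")
```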

15
What can we do about errors?

(Honestly) quantify sources of error
Random
Systematic
Minimize/eliminate them
Identify ways to reduce specific sources of error
Examples: more frequent calibration, better QC
procedures, reduce instrument noise
Correct for biases where possible
Evaluate reductions in error

16
Why do we care about quantifying errors?
17
Why do we care about quantifying errors?

How much confidence do we have in the data?
How certain are we about a particular
measurement?
How close to the true value are we?
Are our data biased? In which direction?
Errors influence our interpretation of data
Errors reduce the usefulness of the information
in our data
Errors in data propagate to subsequent analyses

18
Random errors lead to statistical universes

The observed data are just one realization, drawn
from a statistical universe of data sets (Press
et al. 1992)
Different realizations of the random draw lead to
different estimates of true model parameters
(or other statistics calculated from data)
Parameter estimates are therefore themselves
uncertain (but we want to describe their
distributions!)

A statistical universe
19
Monte Carlo Example

Two options for propagating errors
Complicated mathematics, based on theory and
first principles
Monte Carlo simulations
1. Characterize uncertainty
2. Generate synthetic data (model + uncertainty),
i.e. new realization from statistical universe
3. Estimate statistics or parameters, P, of interest
4. Repeat (2-3) many times
5. Posterior evaluation of distribution of P
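
The Monte Carlo recipe can be sketched in a few lines. This example assumes a straight-line model, Gaussian noise of known size, and the slope as the parameter P. The data, the noise level and the function names are illustrative only.

```python
import random
import statistics

def fit_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def monte_carlo_slopes(x, y_model, sigma, n_draws, rng):
    """Add synthetic noise to the model, re-estimate the slope, repeat."""
    slopes = []
    for _ in range(n_draws):
        y_synth = [yi + rng.gauss(0.0, sigma) for yi in y_model]  # synthetic data
        slopes.append(fit_slope(x, y_synth))                      # estimate P
    return slopes

rng = random.Random(0)
x = [float(i) for i in range(20)]
y_model = [2.0 * xi + 1.0 for xi in x]   # assumed "true" relationship
sigma = 3.0                               # assumed size of the random error
slopes = monte_carlo_slopes(x, y_model, sigma, 2000, rng)

# Summarize the distribution of the parameter of interest.
slopes.sort()
lo, hi = slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))]
print(f"slope: mean {statistics.mean(slopes):.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The same loop works for any statistic: only `fit_slope` and the noise model need to change.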

20
Viva Las Vegas!
"Offered the choice between mastery of a
five-foot shelf of analytical statistics books
and middling ability at performing statistical
Monte Carlo simulations, we would surely choose
to have the latter skill."
William H. Press, Numerical Recipes in Fortran 77
Why you should learn a programming language (even
fossil languages like BASIC): it's really easy
and fast to do MC simulations. In spreadsheet
programs, it is extremely tedious (and slow),
especially with large data sets (like you have
with eddy flux data!).
21
What we want to know

What are the characteristics of the error?
What are the sources of error, and do the sources
of error change over time?
How big are the errors (10 ± 5 or 10.0001 ± 0.0001),
and which are the biggest sources?
Are the errors systematic, random, or some
combination thereof?
Can we correct or adjust for errors?
Can we reduce the errors (better instruments,
more careful technician, etc.)?

22
Characteristics of interest

How is the error distributed?
What is the (approximate) pdf?
What are its moments?
First moment: mean (average value)
Second moment: standard deviation (how variable)
Third moment: skewness (how symmetric)
Fourth moment: kurtosis (how peaky)
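
The four summaries can be computed directly from a sample. This sketch takes the standard deviation as the square root of the second central moment and reports kurtosis as excess kurtosis (zero for a Gaussian). The helper name `moments` and the sample sizes are illustrative.

```python
import math
import random

def moments(values):
    """Return mean, standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(7)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
# A Laplace (double-exponential) variate is the difference of two exponentials.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(50000)]

for name, sample in (("Gaussian", gaussian), ("Laplace", laplace)):
    mean, sd, skew, kurt = moments(sample)
    print(f"{name}: mean {mean:.2f}, sd {sd:.2f}, skew {skew:.2f}, excess kurtosis {kurt:.2f}")
```

Both samples are symmetric, so the difference between them shows up only in the fourth moment.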

23
Random error distributions

Assumptions
Normal distribution
Constant variance
Independent errors
Reality
Other distributions are possible!
Variance may not be constant
Independence?

Normal
Laplace
Lognormal
Uniform
24
Constant variance?

Homoscedastic (constant variance) vs.
Heteroscedastic (nonconstant variance)
Assume homoscedastic, but commonly error
variance scales with measurement
Large outliers more likely when error variance is
large

25
Independence

Measurement errors in period (t) are uncorrelated
with errors in period (t-1)
Independent
coin toss (white noise)
Negative autocorrelation
growth rate estimated from size measurements
Positive autocorrelation
weekly biomass estimates confounded by seasonal
variation in other factors (e.g. summer drought
and shrinking xylem)
Difficult to detect or test for independence
without a very good underlying model (with poor
model, apparent autocorrelation may be due to
model structure and not error structure!)
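
The growth-rate case can be checked numerically. In this sketch the size measurements carry independent Gaussian errors, and growth is taken as the difference between successive sizes. The setup and names are assumptions made for illustration.

```python
import random
import statistics

def lag1_autocorrelation(series):
    """Sample correlation between series[t] and series[t-1]."""
    mean = statistics.mean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series[1:], series[:-1]))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

rng = random.Random(3)
# Independent (white noise) measurement errors on a size measurement.
size_errors = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Growth is a difference of successive sizes, so each size error enters two
# consecutive growth estimates with opposite sign.
growth_errors = [b - a for a, b in zip(size_errors[:-1], size_errors[1:])]

r_size = lag1_autocorrelation(size_errors)
r_growth = lag1_autocorrelation(growth_errors)
print(f"lag-1 autocorrelation: sizes {r_size:.2f}, growth {r_growth:.2f}")
```

Under these assumptions the growth errors have a theoretical lag-1 autocorrelation of -0.5, even though the underlying size errors are independent.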

26
Take-home messages

For modelling,
More noise in data → more uncertainty in
estimated model parameters
Systematic error in data → biased estimates of
model parameters
Monte Carlo simulations as an easy way to
evaluate impact of errors

27
Errors in flux measurements
28
Why characterize flux measurement uncertainty?

Uncertainty information needed to compare
measurements, measurements and models, and to
propagate errors (scaling up in space and time)
Uncertainty information needed to set policy,
for risk analysis (what are confidence intervals
on estimated C sink strength?)
Uncertainty information needed for all aspects of
data-model fusion (correct specification of cost
function, forward prediction of states, etc.)
Small uncertainties are not necessarily good;
large uncertainties are not necessarily bad:
biased prediction is ok if truth is within
confidence limits; if truth is outside of
confidence limits, uncertainties are
under-estimated

29
Challenge of flux data

Complex
Multiple processes (but only measure NEE)
Diurnal, synoptic, seasonal, annual scales of
variation
Gaps in data (QC criteria, instrument
malfunction, unsuitable weather, etc.)
Multiple sources of error and uncertainty (known
unknowns as well as unknown unknowns!)
Random errors are large but tolerable
Systematic errors are evil, and the corrections
for them are largely uncertain (sometimes even in
sign)
Many sources of systematic error (sometimes in
different directions)

30
Systematic errors
Random errors
31
Systematic errors

examples
nocturnal biases
imperfect spectral response
advection
energy balance closure
operate at varying time scales: fully systematic
vs. selectively systematic
variety of influences: fixed offset vs. relative
offset
cannot be identified through statistical analyses
can correct for systematic errors (but
corrections themselves are uncertain)
uncorrected systematic errors will bias DMF
(data-model fusion) analyses

Random errors

examples
surface heterogeneity and time varying footprint
turbulence sampling errors
measurement equipment (IRGA and sonic anemometer)
random errors are stochastic; characteristics of
pdf can be estimated via statistical analyses
(but may be time-varying)
affect all measurements
cannot correct for random errors
random errors limit agreement between
measurements and models, but should not bias
results

32
How to estimate distributions of random flux
errors
33
Why focus on random errors?

Systematic errors can't be identified through
analysis of data
Systematic errors are harder to quantify (leave
that to the geniuses)
For modeling, must correct for systematic errors
first (or assume they are zero)
Knowing something about random errors is much
more important from modeling perspective

34
Two methods
35
Two methods

Repeated measurements of the same thing
Paired towers (rarely applicable)
Hollinger et al., 2004 GCB; Hollinger and
Richardson, 2005 Tree Phys
Paired observations (applicable everywhere)
Hollinger and Richardson, 2005 Tree Phys;
Richardson et al., 2006 AFM
Comparison with truth
Model residuals (assume model = truth)
Richardson et al., 2005 AFM; Hagen et al., 2006
JGR; Richardson et al., 2008 AFM

36
Paired tower approach

Repeated measurements: use simultaneous but
independent measurements from two towers, x1 and
x2
Howland Main and West towers
- same environmental conditions
- located in similar patches of forest
- non-overlapping footprints (independent
turbulence)

[Figure: Main and West towers, 800 m]
37
Paired measurements

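Assume we have measurements $x_1$, $x_2$ from two
towers
$\mathrm{var}(x_1 - x_2) = \mathrm{var}(x_1) + \mathrm{var}(x_2) - 2\,\mathrm{cov}(x_1, x_2)$
Since $x_1$ and $x_2$ are assumed independent,
$\mathrm{cov}(x_1, x_2) = 0$
Also, $\mathrm{var}(x_1) = \mathrm{var}(x_2) = \mathrm{var}(\varepsilon)$, where $\varepsilon$ is the
random error
So $\mathrm{var}(x_1 - x_2) = 2\,\mathrm{var}(\varepsilon)$
And thus $\sigma(\varepsilon) = \sigma(x_1 - x_2) / \sqrt{2}$
Use multiple pairs $x_1$, $x_2$ to infer distribution
of $\varepsilon$!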
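
The paired calculation can be sketched with synthetic data. Here both towers are assumed to see the same underlying flux and to have independent Gaussian random errors of equal size. The numbers and the function name `random_error_sd` are illustrative, not values from the Howland towers.

```python
import math
import random
import statistics

def random_error_sd(x1, x2):
    """Estimate the random error from paired data: sd(x1 - x2) / sqrt(2)."""
    differences = [a - b for a, b in zip(x1, x2)]
    return statistics.stdev(differences) / math.sqrt(2.0)

rng = random.Random(11)
sigma_true = 2.0   # assumed random error of each tower
truth = [rng.uniform(-20.0, 5.0) for _ in range(5000)]  # shared "true" flux
x1 = [t + rng.gauss(0.0, sigma_true) for t in truth]
x2 = [t + rng.gauss(0.0, sigma_true) for t in truth]

print(f"estimated random error sd = {random_error_sd(x1, x2):.2f}")
```

Because the shared signal cancels in the difference, the estimate does not depend on how variable the true flux is.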

38
Alternatively

Earlier, suggested quantifying random error by
the standard deviation ($s$) of multiple
independent measurements of the same thing ($x_i$)
For $i = 1, 2$, reduces to $s = \sqrt{(x_1 - x_2)^2 / 2}$, or $s = |x_1 - x_2| / \sqrt{2}$
If following this approach, mean $s$ across
multiple $x_1$, $x_2$ pairs is calculated as $\bar{s} = \sqrt{\frac{1}{n} \sum_j s_j^2}$ (i.e. as
a root mean square, the square root of the mean
variance).
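
For two measurements the sample standard deviation has a closed form. The short derivation below assumes the usual $n - 1$ denominator, and the pair index $j$ and pair count $n$ are notation introduced here.

```latex
\begin{align*}
\bar{x} &= \tfrac{1}{2}(x_1 + x_2), \qquad
x_1 - \bar{x} = \tfrac{1}{2}(x_1 - x_2), \qquad
x_2 - \bar{x} = -\tfrac{1}{2}(x_1 - x_2) \\
s^2 &= \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2 - 1}
     = 2 \cdot \frac{(x_1 - x_2)^2}{4}
     = \frac{(x_1 - x_2)^2}{2} \\
s &= \frac{|x_1 - x_2|}{\sqrt{2}} \\
\bar{s} &= \sqrt{\frac{1}{n} \sum_{j=1}^{n} s_j^2}
\end{align*}
```

Averaging the variances before taking the square root keeps the result consistent with the paired-difference estimate, which a simple arithmetic mean of the $s_j$ would not.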

39
Another paired approach

Two tower approach can only rarely be used
Alternative: substitute time for space
Use x1, x2 measured 24 h apart under similar
environmental conditions
PPFD, VPD, Air/Soil temperature, Wind speed
Tradeoff: tight filtering criteria → not many
paired measurements, poor estimates of
statistics; loose filtering → other factors
confound uncertainty estimate
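
A minimal sketch of the pairing step, assuming half-hourly records stored as dictionaries. The field names, the tolerances in `max_diff` and the four example records are hypothetical and chosen only to show the filtering logic.

```python
def paired_differences(records, max_diff, lag=48):
    """Return flux differences for records `lag` steps apart (48 half-hours
    = 24 h) whose environmental drivers agree within `max_diff`."""
    diffs = []
    for now, later in zip(records, records[lag:]):
        similar = all(abs(now[key] - later[key]) <= tol
                      for key, tol in max_diff.items())
        if similar:
            diffs.append(now["flux"] - later["flux"])
    return diffs

# Hypothetical thresholds and records, for illustration only.
max_diff = {"ppfd": 75.0, "tair": 3.0, "wind": 1.0}
records = [
    {"flux": -10.0, "ppfd": 1000.0, "tair": 20.0, "wind": 2.0},
    {"flux": -4.0, "ppfd": 300.0, "tair": 15.0, "wind": 1.0},
    {"flux": -11.5, "ppfd": 1040.0, "tair": 21.0, "wind": 2.5},
    {"flux": -9.0, "ppfd": 900.0, "tair": 15.5, "wind": 1.2},
]
print(paired_differences(records, max_diff, lag=2))
```

Tightening or loosening `max_diff` is where the tradeoff shows up: stricter tolerances return fewer differences, looser ones admit pairs whose fluxes differ for real environmental reasons.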

40
Model residuals

Common in many fields (less so in flux world)
to conduct posterior analyses of residuals to
investigate pdf of errors, homoscedasticity, etc.
Disadvantage
Model must be good or uncertainty estimates
confounded by model error
Advantages
can evaluate asymmetry in error distribution (not
possible with paired approach)
many data points with which to estimate
statistics

41
A double exponential (Laplace) pdf better
characterizes the uncertainty

Strong central peak
heavy tails (leptokurtic)
non-Gaussian pdf
Better: double-exponential pdf,
$f(x) = \exp(-|x|/\beta) / (2\beta)$

The double-exponential is characterized by the
scale parameter $\beta$
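
For a zero-centered double-exponential distribution, the scale parameter can be estimated as the mean absolute deviation, and the standard deviation equals the square root of 2 times the scale. The sketch below checks this on synthetic errors. The function names and the assumed scale of 1.5 are illustrative.

```python
import math
import random
import statistics

def fit_laplace(errors):
    """Maximum-likelihood Laplace fit: location = median, scale = mean
    absolute deviation from the median."""
    location = statistics.median(errors)
    beta = sum(abs(e - location) for e in errors) / len(errors)
    return location, beta

def laplace_pdf(x, beta, location=0.0):
    """Double-exponential density exp(-|x - location| / beta) / (2 * beta)."""
    return math.exp(-abs(x - location) / beta) / (2.0 * beta)

rng = random.Random(5)
beta_true = 1.5   # assumed scale for the synthetic errors
errors = [rng.expovariate(1.0 / beta_true) - rng.expovariate(1.0 / beta_true)
          for _ in range(20000)]
location, beta = fit_laplace(errors)
print(f"beta = {beta:.2f}, implied sd = {math.sqrt(2.0) * beta:.2f}, "
      f"sample sd = {statistics.stdev(errors):.2f}")
```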
42
The standard deviation of the uncertainty scales
with the magnitude of the flux

Larger fluxes are more uncertain than small
fluxes
Relative error decreases with flux magnitude
(even when flux = 0 there is still some
uncertainty)
Large errors are not uncommon
95% CI: ±60%
75% CI: ±30%

To obtain maximum likelihood parameter estimates,
cannot use OLS; must account for the fact that
the flux measurement errors are non-Gaussian and
have non-constant variance.
43
Generality of results

Scaling of uncertainty with flux magnitude has
been validated using data from a range of
forested CarboEurope sites: y-axis intercept
(base uncertainty) varies among sites (factors:
tower height, canopy roughness, average wind
speed), but slope is constant across sites
(Richardson et al., 2007)
Similar results (non-Gaussian, heteroscedastic)
have been demonstrated for measurements of water
and energy fluxes (H and LE) (Richardson et al.,
2006)
Results are in agreement with predictions of Mann
and Lenschow (1994) error model based on
turbulence statistics (Hollinger & Richardson,
2005; Richardson et al., 2006)
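
One way to look for an intercept and slope of this kind is to bin the errors by flux magnitude, compute the standard deviation in each bin, and fit a line. The sketch below does this on synthetic data. For simplicity it uses Gaussian errors, and the intercept, slope and bin edges are hypothetical rather than published values.

```python
import random
import statistics

def sd_by_flux_bin(flux, errors, edges):
    """Standard deviation of errors within bins of absolute flux magnitude."""
    centers, sds = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [e for f, e in zip(flux, errors) if lo <= abs(f) < hi]
        if len(in_bin) > 1:
            centers.append(0.5 * (lo + hi))
            sds.append(statistics.stdev(in_bin))
    return centers, sds

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(9)
a_true, b_true = 0.5, 0.1     # hypothetical intercept and slope
flux = [rng.uniform(-20.0, 0.0) for _ in range(40000)]
errors = [rng.gauss(0.0, a_true + b_true * abs(f)) for f in flux]

centers, sds = sd_by_flux_bin(flux, errors, edges=[0, 4, 8, 12, 16, 20])
intercept, slope = fit_line(centers, sds)
print(f"sigma = {intercept:.2f} + {slope:.2f} * |flux|")
```

The intercept is the uncertainty that remains when the flux is zero, and the slope is how quickly the uncertainty grows with flux magnitude.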

44
Generality of results
$\sigma(H)$ = 19.5 W m-2, $\sigma(LE)$ = 16.5 W m-2, $\sigma(F_{CO2})$ =
2.0 µmol m-2 s-1
Uncertainties of all fluxes increase with flux
magnitude.
45
Comparison of approaches

Error estimates vary by 10% across models, are
20% lower for paired approach than for best
model
Errors are more Gaussian for large uptake
fluxes and less Gaussian for fluxes 0 µmol m-2
s-1

46
Uncertainty at various time scales

Systematic errors accumulate linearly over time
(constant relative error)
Random errors accumulate in quadrature (so
relative uncertainty decreases as flux
measurements are aggregated over longer time
periods)

role of Central Limit Theorem as fluxes are
aggregated
Monte Carlo simulations suggest that uncertainty
in annual NEE integrals is 30 g C
m-2 y-1 at 95% confidence (combination of random
measurement error and associated uncertainty in
gap filling)
Biases due to advection, etc., are probably much
larger than this but remain very hard to quantify
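
The effect of aggregation on random error can be illustrated with a small Monte Carlo experiment. The sketch assumes independent Gaussian errors of constant size on every half-hourly flux, which is a simplification of the error structure described earlier. The mean flux and error size are hypothetical.

```python
import math
import random
import statistics

def relative_uncertainty(n, mean_flux, sigma, rng, n_draws=2000):
    """Monte Carlo relative uncertainty (sd / |mean|) of a sum of n fluxes
    with independent random errors."""
    sums = [sum(mean_flux + rng.gauss(0.0, sigma) for _ in range(n))
            for _ in range(n_draws)]
    return statistics.stdev(sums) / abs(statistics.mean(sums))

rng = random.Random(21)
mean_flux, sigma = -2.0, 4.0     # hypothetical flux and random error
for n in (1, 48, 480):
    simulated = relative_uncertainty(n, mean_flux, sigma, rng)
    expected = sigma / (abs(mean_flux) * math.sqrt(n))
    print(f"n = {n:4d}: simulated {simulated:.3f}, quadrature {expected:.3f}")
```

A constant relative bias, by contrast, would give the same relative error at every n, which is why systematic errors dominate at long time scales.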

47
Implications for modeling
48
"To put the point provocatively, providing data
and allowing another researcher to provide the
uncertainty is indistinguishable from allowing
the second researcher to make up the data in the
first place."

Raupach et al. (2005). Model-data synthesis in
terrestrial carbon observation: methods, data
requirements and data uncertainty specifications.
Global Change Biology 11: 378-397.

49
Why does it matter for modeling?

Cost function (Bayesian or not) depends on error
structure
likelihood function: the probability of actually
observing the data, given a particular
parameterization of model
appropriate form of likelihood function depends
on pdf of errors
maximum likelihood optimization: determine model
parameters that would be most likely to generate
the observed data, given what is known or assumed
about the measurement error
Ordinary least squares generates ML estimates
only when assumptions of normality and constant
variance are met

50
Maximum likelihood paradigm
what model parameter values are most likely to
have generated the observed data, given the model
and what is known about measurement errors?
Assumptions about errors affect specification of
the ML cost function. Other cost functions
are possible; depends on error structure!
For Gaussian data (weighted least squares)
For double exponential data (weighted absolute
deviations)
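
The two cost functions named above can be written out as follows. These are the standard textbook forms (squared residuals scaled by sigma for Gaussian errors, absolute residuals scaled by sigma for double-exponential errors) and are assumed here, since the slide's own equations did not survive in the transcript. The data are invented to show how one outlier affects each fit.

```python
def weighted_least_squares(observed, predicted, sigma):
    """Cost for Gaussian errors: sum of squared, sigma-scaled residuals."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))

def weighted_absolute_deviations(observed, predicted, sigma):
    """Cost for double-exponential errors: sum of sigma-scaled |residuals|."""
    return sum(abs(o - p) / s for o, p, s in zip(observed, predicted, sigma))

# Hypothetical data: fit a constant to five values, one of them an outlier.
observed = [1.0, 1.2, 0.9, 1.1, 6.0]
sigma = [1.0] * len(observed)
candidates = [i / 100.0 for i in range(0, 701)]   # simple grid search

best_ls = min(candidates, key=lambda c: weighted_least_squares(observed, [c] * 5, sigma))
best_ad = min(candidates, key=lambda c: weighted_absolute_deviations(observed, [c] * 5, sigma))
print(f"least squares: {best_ls:.2f}, absolute deviations: {best_ad:.2f}")
```

With equal weights, least squares lands on the mean of the data and absolute deviations on the median, so the outlier pulls the first estimate much further than the second.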
51
Specifying a different cost function affects
optimal parameter estimates

Lloyd & Taylor (1994) respiration model

model parameters differ depending on how the
uncertainty is treated (explanation: nocturnal
errors have slightly skewed distribution)
Why? Error assumptions influence form of
likelihood function

Reco = respiration; T = soil temperature; A, E0, T0 =
parameters
      LS      AD
A     24.9    43.9
T0    263.9   259.5
E0    33.6    58.5
(LS = least squares, AD = absolute deviations)
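
To see what the two parameter sets imply, the sketch below evaluates a respiration curve for each. The functional form Reco = A exp(-E0 / (T - T0)) with T in kelvin is an assumption: the model equation itself is not in the transcript, and this is one common three-parameter form associated with Lloyd & Taylor (1994). The evaluation temperatures are arbitrary.

```python
import math

def lloyd_taylor(temp_k, a, e0, t0):
    """Assumed model form: Reco = A * exp(-E0 / (T - T0)), T in kelvin."""
    return a * math.exp(-e0 / (temp_k - t0))

# Parameter sets from the table above (LS = least squares, AD = absolute deviations).
params_ls = {"a": 24.9, "e0": 33.6, "t0": 263.9}
params_ad = {"a": 43.9, "e0": 58.5, "t0": 259.5}

for temp_c in (5.0, 10.0, 15.0, 20.0):
    temp_k = temp_c + 273.15
    reco_ls = lloyd_taylor(temp_k, **params_ls)
    reco_ad = lloyd_taylor(temp_k, **params_ad)
    print(f"{temp_c:4.1f} C: LS {reco_ls:.2f}, AD {reco_ad:.2f}")
```

Under this assumed form the two parameter sets give visibly different respiration at the same temperature, even though both were fitted to the same data.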
52
Influence of cost function specification on
model predictions

Half-hourly model predictions depend on
parameterization; integrated annual sum
decreases by 10% (40% of NEE) when
absolute deviations are used
Influences NEE partitioning, annual sum of GPP
Trivial model but relevant example

53
and also

Random errors are stochastic noise
do not reflect real ecosystem activity
cannot be modeled because they are stochastic
ultimately limit agreement between models and
data
make it difficult
to obtain precise parameter estimates (as shown
by previous Monte Carlo example)
to select or distinguish among candidate models
(more than one model gives acceptably good fit)

54
Summary

Two types of error, random and systematic
Random errors can be inferred from data
Flux measurement errors are non-Gaussian and have
non-constant variance
These characteristics need to be taken into
account when fitting models, when comparing
models and data, and when estimating statistics
from data (annual sums, physiological parameters,
etc.)
