Uncertainty and Sampling - PowerPoint Presentation Transcript
1
Uncertainty and Sampling
  • Dr. Richard Young
  • Optronic Laboratories, Inc.

2
Introduction
  • Uncertainty budgets are a growing requirement of
    measurements.
  • Multiple measurements are generally required for
    estimates of uncertainty.
  • Multiple measurements can also decrease
    uncertainties in results.
  • How many measurement repeats are enough?

3
Random Data Simulation
Here is an example probability distribution
function of some hypothetical measurements.
We can use a random number generator with this
distribution to investigate the effects of
sampling.
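A minimal sketch of such a simulation in Python (NumPy assumed; the normal
shape and the mean of 100 with standard deviation 10 are illustrative
choices, consistent with the slides' use of 10 as the population standard
deviation):

    import numpy as np

    rng = np.random.default_rng(seed=0)        # fixed seed for repeatability
    mu, sigma = 100.0, 10.0                    # hypothetical population values
    data = rng.normal(mu, sigma, size=10_000)  # 10,000 simulated measurements
    print(data.mean(), data.std(ddof=1))       # close to 100 and 10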
4
Random Data Simulation
Here is a set of 10,000 data points
5
Random Data Simulation
Plotting the sample size on a log scale better
shows the behaviour at small samples.
6
Random Data Simulation
There is a lot of variation, but how is this
affected by the data set?
7
Sample Mean
Here we have results for 200 data sets.
8
Sample Mean
9
Sample Standard Deviation
10
Sample Standard Deviation
The most probable value for the sample standard
deviation of 2 samples is zero! Many samples are
needed to make 10 most probable.
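This is easy to check by simulation; a sketch (same illustrative population
as above):

    import numpy as np

    rng = np.random.default_rng(1)
    # draw many 2-sample data sets and compute each sample standard deviation
    pairs = rng.normal(100.0, 10.0, size=(200_000, 2))
    s = pairs.std(axis=1, ddof=1)   # ddof=1 gives the sample standard deviation
    hist, edges = np.histogram(s, bins=100)
    print(edges[hist.argmax()])     # the most probable value sits near 0, not 10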
11
Cumulative Distribution
Sometimes it is best to look at the CDF.
The 50% level is where lower or higher values are
equally likely.
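A sketch of reading the 50% level from an empirical CDF (illustrative normal
data again):

    import numpy as np

    rng = np.random.default_rng(2)
    data = np.sort(rng.normal(100.0, 10.0, size=1_000))
    cdf = np.arange(1, data.size + 1) / data.size  # empirical CDF, 0 to 1
    median = np.interp(0.5, cdf, data)             # the 50% level
    print(median)                                  # close to the mean, 100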
12
Uniform Distribution
What if the distribution were uniform instead of
normal?
The most probable value for >2 samples is ≈ 10.
13
Uniform Distribution
Underestimated values are still more probable
because the PDF is asymmetric.
14
Uniform Distribution
  • Throwing a die is an example of a uniform random
    distribution.
  • A uniform distribution is not necessarily random,
    however.
  • It may be cyclic, e.g. temperature variations due
    to air conditioning.
  • With computer controlled acquisition, data
    collection is often at regular intervals.
  • This can give interactions between the cycle
    period and acquisition interval.

15
Cyclic Variations
For symmetric cycles, any multiple of two data
points per cycle will average to the average of
the cycle.
16
Cyclic Variations
Unless synchronized, data collection may begin at
any point (phase) within the cycle.
Correct averages are obtained when full cycles
are sampled, regardless of the phase.
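A sketch of this phase independence (the interval, period, and amplitude are
illustrative assumptions):

    import numpy as np

    dt, period = 0.01, 0.2              # acquisition interval and cycle period
    n = int(period / dt)                # 20 points per cycle
    t = np.arange(0, 3.5 * period, dt)  # 3.5 cycles: not a whole number
    for phase in (0.0, 0.25, 0.6):      # collection may start at any phase
        x = 100 + 10 * np.sin(2 * np.pi * (t / period + phase))
        # the whole-cycle average (3 full cycles) is 100 for every phase;
        # the 3.5-cycle average is not
        print(phase, x.mean(), x[:3 * n].mean())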
17
Cyclic Variations
Again, whole cycles are needed to give good
values.
Standard Deviation
The value is not 10 because the sample standard
deviation has a (n−1)^0.5 term.
18
Cyclic Variations
The population standard deviation is 10 at each
complete cycle.
Each cycle contains all the data of the
population.
The standard deviation of full-cycle averages is ≈ 0.
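Continuing the sketch above, with the amplitude chosen so that the
population standard deviation over whole cycles is 10 (an assumption made to
match the slide's value):

    import numpy as np

    dt, period = 0.01, 0.2
    amp = 10 * np.sqrt(2)                  # RMS of a sine is amp/sqrt(2) = 10
    n = int(period / dt)
    t = np.arange(0, 10 * period, dt)      # ten complete cycles
    x = 100 + amp * np.sin(2 * np.pi * t / period)
    print(x.std(ddof=0))                   # population standard deviation: 10.0
    means = x.reshape(10, n).mean(axis=1)  # one average per full cycle
    print(means.std())                     # ~0: every full-cycle average is 100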
19
Smoothing
  • Smoothing involves combining adjacent data points
    to create a smoother curve than the original.
  • A basic assumption is that data contains noise,
    but the calculation does NOT allow for
    uncertainty.
  • Smoothing should be used with caution.

20
Smoothing
What is the difference?
21
Savitzky-Golay Smoothing
Here is a spectrum of a white LED.
It is recorded at very short integration time to
make it deliberately noisy.
22
Savitzky-Golay Smoothing
A 25 point Savitzky-Golay smooth gives a line
through the center of the noise.
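A sketch with SciPy's savgol_filter (the 25-point window is from the slide;
the polynomial order of 2 and the synthetic spectrum are illustrative
assumptions):

    import numpy as np
    from scipy.signal import savgol_filter

    # synthetic stand-in for a noisy white-LED spectrum: a narrow blue peak
    # plus a broad phosphor band
    wl = np.linspace(380, 780, 1024)
    clean = (np.exp(-((wl - 450) / 10) ** 2)
             + 0.6 * np.exp(-((wl - 560) / 50) ** 2))
    noisy = clean + np.random.default_rng(3).normal(0, 0.05, wl.size)

    smooth = savgol_filter(noisy, window_length=25, polyorder=2)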
23
Savitzky-Golay Smoothing
The result of the smooth is very close to the
same device measured at the optimum integration
time.
24
Spectral Sampling
But how does the number of data points affect
results?
Here we have 1024 data points.
25
Spectral Sampling
Now we have 512 data points.
26
Spectral Sampling
Now we have 256 data points.
27
Spectral Sampling
Now we have 128 data points.
28
Spectral Sampling
A 25 point smooth follows the broad peak but not
the narrower primary peak.
29
Spectral Sampling
To follow the primary peak we need to use a 7
point smooth, but it doesn't work so well on the
broad peak.
30
Spectral Sampling
This is because some of the higher signal data
have been removed.
Comparing to the optimum scan, the intensity of
the primary peak is underestimated.
31
Spectral Sampling
Beware of under-sampling peaks: you may
underestimate or overestimate intensities.
32
Exponential Smoothing
Here is the original data again.
What about other types of smoothing?
33
Exponential Smoothing
An exponential smooth shifts the peak.
Beware of asymmetric algorithms!
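The shift is easy to reproduce; a sketch (plain NumPy; the smoothing
constant is an illustrative assumption):

    import numpy as np

    def exp_smooth(x, alpha=0.2):
        # y[i] = alpha*x[i] + (1-alpha)*y[i-1]: each output mixes in earlier
        # samples only, so features lag in the scan direction
        y = np.empty_like(x, dtype=float)
        y[0] = x[0]
        for i in range(1, len(x)):
            y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
        return y

    x = np.exp(-((np.arange(200) - 100) / 10.0) ** 2)  # a symmetric peak
    print(x.argmax(), exp_smooth(x).argmax())          # smoothed peak shifts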
34
Sampling Without Noise
This is the optimum integration scan but with 128
points like the noisy example.
With lower noise, can we describe curves with
fewer points?
35
Sampling Without Noise
64 points.
36
Sampling Without Noise
32 points.
Is this enough to describe the peak?
37
Interpolation
  • Interpolation is the process of estimating data
    between given points.
  • National Laboratories often provide data that
    requires interpolation to be useful.
  • Interpolation algorithms generally estimate a
    smooth curve.

38
Interpolation
  • There are many forms of interpolation:
  • Lagrange, B-spline, Bezier, Hermite, Cardinal
    spline, cubic, etc.
  • They all have one thing in common:
  • They go through each given point and hence ignore
    uncertainty completely.
  • Generally, interpolation algorithms are local in
    nature and commonly use just 4 points.

39
Interpolation
The interesting thing about interpolating data
containing random noise is that you never know
what you will get.
40
Interpolation
Uneven sampling can cause overshoots.
The Excel curve can even double back.
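The overshoot is easy to reproduce; a sketch with SciPy's CubicSpline (the
points are a contrived illustration, not data from the slides):

    import numpy as np
    from scipy.interpolate import CubicSpline

    x = np.array([0.0, 1.0, 2.0, 2.1, 3.0, 4.0])  # unevenly sampled points
    y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # one noise spike
    cs = CubicSpline(x, y)                        # passes through every point
    xf = np.linspace(0.0, 4.0, 400)
    print(cs(xf).min())                           # < 0: overshoots the data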
41
Combining a Smooth and Interpolation
  • If a spectrum can be represented by a function,
    e.g. polynomial, the closest fit to the data
    can provide smoothing and give the values between
    points.
  • The fit is achieved by changing the
    coefficients of the function until it is closest
    to the data.
  • A least-squares fit.

42
Combining a Smooth and Interpolation
  • The squares of the differences between values
    predicted by the function and those given by the
    data are summed to give a goodness-of-fit
    measure.
  • Coefficients are changed until the goodness of
    fit is minimized.
  • Excel has a regression facility that performs
    this calculation; a minimal sketch follows.
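A minimal NumPy equivalent of that regression (synthetic data; np.polyfit
minimizes exactly this sum of squares):

    import numpy as np

    rng = np.random.default_rng(4)
    x = np.linspace(0, 10, 50)
    y = 2.0 + 0.5 * x - 0.03 * x**2 + rng.normal(0, 0.1, x.size)

    coef = np.polyfit(x, y, deg=2)                # least-squares coefficients
    sse = np.sum((np.polyval(coef, x) - y) ** 2)  # the goodness-of-fit measure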

43
Combining a Smooth and Interpolation
  • Theoretically, any simple smoothly varying curve
    can be fitted by a polynomial.
  • Sometimes it is better to extract the data you
    want to fit by some reversible calculation.
  • This means you can use, say, 9th order
    polynomials instead of 123rd order to make the
    calculations easier.

44
Polynomial Fitting
NIST provide data at uneven intervals.
To use the data, we have to interpolate to
intervals required by our measurements.
45
Method 1
NIST recommend fitting a high-order polynomial to
data values multiplied by λ⁵/exp(a+b/λ) for
interpolation.
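A sketch of Method 1 (the constants a and b, the wavelength grid, and the
stand-in lamp data are illustrative assumptions; real values come from the
NIST calibration):

    import numpy as np

    lam = np.linspace(250, 2500, 40)  # wavelengths in nm (NIST's are uneven)
    a, b = 45.0, -4600.0              # illustrative constants only
    E = lam**-5 * np.exp(a + b / lam) * (1 + 0.1 * np.sin(lam / 400.0))

    reduced = E * lam**5 / np.exp(a + b / lam)       # slowly varying now
    coef = np.polyfit(lam / 1000.0, reduced, deg=9)  # high-order polynomial
    E_fit = np.polyval(coef, lam / 1000.0) * lam**-5 * np.exp(a + b / lam)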
The result looks good, but
46
Method 1
...on a log scale, the match is very poor at
lower values.
47
Method 1
When converted back to the original scale, lower
values bear no relation to the data.
48
What went wrong?
  • The goodness-of-fit parameter is a measure of
    absolute differences, not relative differences.
  • NIST use a weighting of 1/E² to give relative
    differences, and hence closer matching, but that
    is not easy in Excel.
  • Large values tend to dominate smaller ones in the
    calculation.
  • A large dynamic range of values should be
    avoided.
  • We are trying to match data over 4 decades! (A
    weighted-fit sketch follows.)
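For comparison, NumPy's polyfit accepts a weight on each residual, so w = 1/E
gives the 1/E² weighting on the squared differences (a sketch with the same
stand-in data, not the NIST code):

    import numpy as np

    lam = np.linspace(250, 2500, 40)
    a, b = 45.0, -4600.0                  # illustrative stand-in data again
    E = lam**-5 * np.exp(a + b / lam)

    # w multiplies each residual before squaring, so w = 1/E weights the
    # squared residuals by 1/E**2, i.e. relative rather than absolute errors
    coef = np.polyfit(lam / 1000.0, E, deg=9, w=1.0 / E)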

49
How do NIST deal with it?
  • Although NIST's 1/E² weighting gives closer
    matches than the unweighted fit shown here, to
    get the best results they split the data into 2
    regions and calculate separate polynomials for
    each.
  • This is a reasonable thing to do but can lead to
    local data effects and arbitrary splits that do
    not fit all examples.
  • Is there an alternative?

50
Alternative Method 1
A plot of the log of E·λ⁵ values vs. λ⁻¹ is a
gentle curve, almost a straight line.
We can calculate a polynomial without splitting
the data.
The fact that we are fitting on a log scale means
we are effectively using relative differences in
the least-squares calculation.
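A sketch of this alternative (same stand-in data; 1000/λ is just λ⁻¹ in
convenient units):

    import numpy as np

    lam = np.linspace(250, 2500, 40)
    a, b = 45.0, -4600.0
    E = lam**-5 * np.exp(a + b / lam)

    u = 1000.0 / lam                 # inverse wavelength
    v = np.log(E * lam**5)           # nearly a straight line in u
    coef = np.polyfit(u, v, deg=3)   # one polynomial, no data splitting
    E_fit = np.exp(np.polyval(coef, u)) * lam**-5
    # fitting the log means the least squares acts on relative differences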
51
Method 2
Incandescent lamp emission is close to that of a
blackbody.
52
Method 2
If we calculate a scaled blackbody curve, as we
would to get the distribution temperature, and
then divide the data by the blackbody...
53
Method 2
...we get a smooth curve with very little dynamic
range.
The fit is not good because of the high initial
slope and almost linear falling slope.
54
Method 2
Plotting vs. λ⁻¹, as in alternative method 1,
allows close fitting of the polynomial.
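A sketch of Method 2 (SI constants; the 3100 K distribution temperature and
the stand-in lamp data are illustrative assumptions):

    import numpy as np

    h, c, k = 6.626e-34, 2.998e8, 1.381e-23  # Planck, light speed, Boltzmann

    def planck(lam_nm, T):
        # blackbody spectral shape (the overall scale is irrelevant here)
        lam = lam_nm * 1e-9
        return 1.0 / (lam**5 * (np.exp(h * c / (lam * k * T)) - 1.0))

    lam = np.linspace(250, 2500, 40)
    a, b = 45.0, -4600.0
    E = lam**-5 * np.exp(a + b / lam)              # stand-in lamp data again

    ratio = E / planck(lam, 3100.0)                # little dynamic range left
    coef = np.polyfit(1000.0 / lam, ratio, deg=5)  # fit vs. inverse wavelength
    E_fit = np.polyval(coef, 1000.0 / lam) * planck(lam, 3100.0)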
55
Comparing results
Method 2 shows lower residuals, but there is not
much difference.
56
Comparing results
All methods discussed give essentially the same
result when converted back to the original scale.
57
Algorithms and Uncertainty
  • None of the algorithms mentioned allow for
    uncertainty (or they assume it is constant).
  • If we replaced the least-squares goodness-of-fit
    parameter with a most-probable one, this would
    use the uncertainty we know is there to determine
    the best fit.
  • Why is this not done?
  • Difficult in Excel.
  • Easy with custom programs.

58
Algorithms and Uncertainty
From the data value (mean) and the standard
deviation, we can calculate the PDF.
59
Algorithms and Uncertainty
  • Multiply the probabilities at each point to give
    the goodness-of-fit parameter.
  • Use this parameter instead of the least-squares
    one in the fit calculations.
  • MAXIMIZE the goodness-of-fit parameter to
    obtain the best fit.
  • The fit will be closest where uncertainties are
    lowest, as the sketch below shows.
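A minimal sketch of the idea with a straight-line model (SciPy assumed;
maximizing the product of per-point Gaussian probabilities is done by
minimizing the negative log of that product):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    x = np.linspace(0, 10, 20)
    sigma = 0.1 + 0.05 * x                 # known, non-constant uncertainties
    y = 1.0 + 2.0 * x + rng.normal(0, sigma)

    def neg_log_prob(p):
        # -log of the product of the per-point PDF values
        return -np.sum(norm.logpdf(y, loc=p[0] + p[1] * x, scale=sigma))

    best = minimize(neg_log_prob, x0=[0.0, 1.0])
    print(best.x)                          # pulled closest to low-sigma points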

60
Conclusions
  • Standard deviations may be under-estimated with
    small samples.
  • Cyclic variations should be integrated for
    complete cycle periods.
  • Smoothing and interpolation should be used with
    caution.
  • Do not assume results are valid; check.

61
Conclusions
  • Polynomial fits can give good results, but:
  • Avoid a large dynamic range.
  • Avoid complex curvatures.
  • Avoid high initial slopes.
  • All these manipulations ignore uncertainty (or
    assume it is constant).
  • But least-squares fits can be replaced by maximum
    probability to take uncertainty into
    consideration.