Regression and Calibration

Provided by: osirisSun

1
Regression and Calibration
  • Forensic Statistics CIS205

2
Introduction
  • More on the nature of the relationship between
    covariates which are continuous.
  • This is useful because many variables are
    impossible to measure directly in a forensic
    context: age at death for human identification,
    post-mortem interval, time since discharge for a
    firearm.
  • However, changes occur which are covariates of
    these immeasurables, e.g. root dentine
    translucency, concentration of potassium in the
    vitreous humour, chemical residues in the barrels
    and chambers of firearms.
  • From measurements of these variables and exact
    knowledge of the relationships with their
    immeasurable covariables, an estimate of the
    immeasurable covariate can be made.
  • This process is called calibration.

3
Table 7.1 Values for post-mortem interval (PMI) and
vitreous potassium ion concentration K for a
sample of 8 cadavers (Munoz et al., 2001)
4
Linear Models
  • Figure 7.1 is a scatterplot of the data in Table
    7.1
  • From Figure 7.1 it is easy to imagine a single
    straight line going through the cloud of x,y
    points.
  • This line would represent a linear model of PMI
    and K. Such a line is marked as the straight
    line on Figure 7.2.

5
(No Transcript)
6
(No Transcript)
7
Parameters
  • Any linear model which describes the relationship
    between two covariates may be described by two
    parameters, the values of which are termed
    coefficients.
  • The first is the gradient or slope, the second is
    the intercept with the y axis.
  • The slope is dy/dx (the change in y divided by
    the change in x). The gradient of a simple linear
    model is conventionally denoted b.
  • In Figure 7.2 the gradient is (9.35 − 7.07) / (20
    − 7) = 2.28 / 13 ≈ 0.17.
  • Thus for an increase of 1 h in PMI there is an
    increase of 0.17 mmol/l in vitreous K. This
    can be turned round to say that for every
    increase of 1 mmol/l in K there is an increase
    of 1/0.17 ≈ 5.88 h in PMI.
  • The intercept a is the value of K when PMI = 0.
    From Figure 7.2 this is 5.85.
  • A general form for a linear model is y = a + bx.
  • E.g. K = 5.85 + (0.17 × PMI).
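
As a rough check of the arithmetic on this slide, the two points read off Figure 7.2 and the quoted intercept can be put into a few lines of Python. The function and variable names are illustrative only and do not come from the presentation.

```python
# Two points read off the fitted line in Figure 7.2:
# x is PMI in hours, y is vitreous K in mmol/l.
x1, y1 = 7.0, 7.07
x2, y2 = 20.0, 9.35

# Gradient = change in y divided by change in x.
# This comes out at about 0.175; the slide works with 0.17.
gradient = (y2 - y1) / (x2 - x1)

# Coefficients as quoted on the slide.
A = 5.85   # intercept: K when PMI = 0
B = 0.17   # gradient: rise in K per hour of PMI

def predict_k(pmi_hours, a=A, b=B):
    """Linear model y = a + b*x: predicted K for a given PMI."""
    return a + b * pmi_hours

# Hours of PMI corresponding to a 1 mmol/l rise in K.
hours_per_mmol = 1 / B

print(gradient, hours_per_mmol, predict_k(10.0))
```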

8
Calculation of a linear regression model
  • How are a and b calculated?
  • Usually in linear regression we calculate a best
    fit model which minimises the sum of squared
    errors in either the x or y direction.
  • These errors are called the residuals, and are
    the differences between the observed data and
    the values given by the model.
  • Figure 7.3 is a detail of three of the points
    from Figure 7.2 showing these residuals. The
    objective of least squares regression is to
    select the model which minimises dy1² + dy2² +
    dy3².
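
The least squares criterion can be sketched in code: for any candidate line, add up the squared residuals in the y direction, and prefer the line with the smaller total. The three points below are hypothetical and chosen only for illustration; they are not the data from Figure 7.3.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical illustrative points (NOT the Munoz et al. data).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]

# Two candidate lines: the first is the least squares line for these
# three points, the second is an arbitrary poorer choice.
sse_best = sum_squared_residuals(xs, ys, a=0.1, b=1.95)
sse_poor = sum_squared_residuals(xs, ys, a=0.0, b=1.0)

print(sse_best, sse_poor)
```

The first line leaves a far smaller sum of squares than the second, which is why least squares selects it.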

9
(No Transcript)
10
Estimating a and b
  • Without going into mathematical detail, the
    estimate of the gradient b is b = Sxy / Sxx,
  • where Sxx = Σ(x − mean x)²,
  • Sxy = Σ(x − mean x)(y − mean y).
  • Sxx is proportional to the variance of x, while
    Sxy is proportional to the covariance between x
    and y (dividing each by n − 1 gives the sample
    variance and covariance).
  • An estimate of the intercept a is given by
  • a = mean y − b(mean x).
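
A minimal sketch of these estimates in Python follows. The function name fit_line is an assumption made for illustration, and the test points are hypothetical values lying exactly on y = 1 + 2x, so the expected coefficients are known in advance.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the model y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: sum of squared deviations of x about its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Sxy: sum of products of the x and y deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # gradient
    a = mean_y - b * mean_x      # intercept
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x (not forensic data).
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```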

11
Table 7.2 Calculations for the regression y
(K) fitted to x (PMI)
12
Calculation of a and b continued
  • Sxx = Σ(x − mean x)² = 371.73
  • Sxy = Σ(x − mean x)(y − mean y) = 64.89
  • b = Sxy / Sxx = 64.89 / 371.73 = 0.1745
  • a = mean y − b(mean x) = 7.47 − (0.1745 × 9.28)
    = 5.85

13
Testing goodness of fit
  • One of the first things we may wish to know about
    our model is whether it is a good fit to the
    data. Measures of this type are known as
    goodness of fit statistics.
  • A suitable test statistic is F = Σ(ŷ − mean y)² /
    [(1/df) Σ(y − ŷ)²], where df = n − 2 and ŷ is the
    estimate of y from the model.

14
Calculations for the goodness of fit statistic
for the regression y (K) fitted to x (PMI)
15
Calculation of F continued
  • F = 11.33 / ((1/6) × 4.92) = 13.82
  • See appendix F, the F-distribution, df = n − 2 =
    6
  • For 6 df at 5% significance, F = 5.99. The
    calculated value of F is 13.82 which is greater
    than 5.99, so we act as though the model is an
    adequate fit at the 5% level of significance.
  • Other assumptions include that the residuals must
    be normally distributed.
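
The F calculation can be reproduced from the two sums quoted on this slide. The sketch assumes that 11.33 is the sum of squares explained by the regression and 4.92 is the residual sum of squares; that reading is an interpretation, although it is consistent with the arithmetic shown. The critical value is copied from the slide rather than computed.

```python
# Values quoted on the slide; the labels are an interpretation (see above).
ss_regression = 11.33   # assumed: sum of (fitted y - mean y)^2
ss_residual = 4.92      # assumed: sum of (observed y - fitted y)^2
n = 8                   # number of cadavers in the sample
df = n - 2              # residual degrees of freedom

# F = regression sum of squares / residual mean square.
f_stat = ss_regression / (ss_residual / df)

# Critical value at the 5% level, as given on the slide.
F_CRITICAL_5PCT = 5.99

model_adequate = f_stat > F_CRITICAL_5PCT
print(round(f_stat, 2), model_adequate)
```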

16
Testing coefficients a and b
17
Estimated Standard Errors
  • Using the equations on the previous slide,
  • s = 0.9
  • ESE(b) = 0.047
  • ESE(a) = 0.54
  • A confidence interval for both a and b is found
    using the t-distribution with n − 2 df, so the 99%
    confidence level occurs at 3.707 standard
    errors.
  • This means the 99% confidence interval for b is
    0.1745 ± (3.707 × 0.047) ≈ 0.0002 to 0.3487. The
    lower limit is very close to 0, so it might be
    wise not to reject the null hypothesis that the
    gradient of the model is 0, and that there is no
    relation between K and PMI.
  • The 99% confidence interval for a is 5.85 ± (3.707
    × 0.54) = 3.85 to 7.85. This interval does not
    contain 0, so can be regarded as evidence for the
    hypothesis that the intercept is non-zero.
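
The equations referred to on this slide are not in the transcript. The sketch below assumes the usual least squares formulas, s = sqrt(residual SS / (n − 2)), ESE(b) = s / sqrt(Sxx) and ESE(a) = s × sqrt(1/n + (mean x)² / Sxx). Fed with the sums quoted on earlier slides, they reproduce the values given here to the precision shown. All names are illustrative.

```python
import math

# Quantities quoted on earlier slides.
n = 8
sxx = 371.73
mean_x = 9.28
ss_residual = 4.92      # assumed to be the residual sum of squares
a, b = 5.85, 0.1745
T_99 = 3.707            # t value for 99% confidence, 6 df (from the slide)

# Assumed standard formulas (the original slide is not in the transcript).
s = math.sqrt(ss_residual / (n - 2))               # residual standard deviation
ese_b = s / math.sqrt(sxx)                         # standard error of gradient
ese_a = s * math.sqrt(1 / n + mean_x ** 2 / sxx)   # standard error of intercept

def confidence_interval(estimate, std_error, t_value):
    """Return (lower, upper) = estimate -/+ t_value * std_error."""
    half_width = t_value * std_error
    return estimate - half_width, estimate + half_width

ci_b = confidence_interval(b, ese_b, T_99)
ci_a = confidence_interval(a, ese_a, T_99)
print(s, ese_b, ese_a, ci_b, ci_a)
```

With unrounded standard errors the lower limit for b still lies only just above zero, which is the point the slide makes.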

18
Calibration
  • We have now established that there is good
    evidence to suggest that there is a linear
    relationship between K and PMI, and we know
    the parameters for the relationship.
  • In this case PMI is not directly observable, but
    a measurement of K is possible. From the
    regression model it should be possible to make an
    estimate of PMI from the measurement of K.
    This is calibration.
  • From our equation K = (0.17 × PMI) + 5.85, we
    derive PMI = (K − 5.85) / 0.17.
  • Check this by drawing a residual plot of (PMI −
    estimates of PMI) vs. PMI.
  • We will see later how to calculate the standard
    errors of the estimates.
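
Calibration is just the fitted model turned round, which can be sketched as a pair of functions. The coefficients are those quoted on the slides; the function names and the K reading of 7.55 mmol/l are hypothetical.

```python
A = 5.85   # intercept of the fitted model (mmol/l)
B = 0.17   # gradient of the fitted model (mmol/l per hour)

def predict_k(pmi_hours):
    """Forward model: K = A + B * PMI."""
    return A + B * pmi_hours

def estimate_pmi(k_mmol_per_l):
    """Calibration: invert the model to get PMI = (K - A) / B."""
    return (k_mmol_per_l - A) / B

# Hypothetical measured K; gives a PMI estimate of about 10 hours.
pmi_estimate = estimate_pmi(7.55)

# Round trip: calibrating a predicted K should recover the PMI.
round_trip = estimate_pmi(predict_k(12.0))
print(pmi_estimate, round_trip)
```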

19
(No Transcript)
20
Table 7.5. Calculation of estimated PMI values
and standard errors
21
An approximation of standard error for the point
value (y0)
22
Interpretation of Table 7.5
  • The final column of Table 7.5 gives the
    calculation on the previous slide for all 8
    points.
  • To arrive at a suitable confidence interval the
    standard error of the estimate has to be
    multiplied by the appropriate value from the
    t-distribution for n − 2 degrees of freedom, i.e.
    x0 = x̂0 ± t × Sx0, where x̂0 is the point estimate
    and Sx0 is its standard error.
  • From appendix C at 95% confidence and at n − 2 =
    6 df, the value of t is 2.447.
  • The standard error for the first value in Table
    7.5 is 5.91 and the point estimate for PMI is
    16.87, so a 95% confidence interval is 16.87 ±
    (5.91 × 2.447) = 16.87 ± 14.46 = 2.41 to 31.33.

23
Points to remember
  • Avoid using overly complex models: use linear
    modelling unless background theory suggests a
    non-linear model, or unless a linear model does
    not meet goodness of fit criteria.
  • If covariates really are related in a non-linear
    way, then usually a simple transformation such as
    taking logs of one or both of the variables will
    produce an adequate linear fit.
  • Plot the independent variable (e.g. PMI) on the x
    axis, and the dependent variable (e.g. K) on
    the y axis. This is because of the notion of
    causation. It is PMI which causes K, but
    there is no way K could cause PMI. Minimise
    the residuals in the y direction (PTO).
  • Finally, always plot covariates and residuals.
    Good eyeball statistics are more effective and
    give a better understanding of the relationships
    between variables than poorly understood and
    inappropriate tests.
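
The point about transformations can be illustrated with a small sketch. The data below are hypothetical, generated to follow y = 2 × exp(0.5x) exactly; taking logs of y gives a straight line whose gradient and intercept recover the two constants. The helper fit_line is an illustrative name, not something from the presentation.

```python
import math

def fit_line(xs, ys):
    """Least squares estimates (a, b) for the model y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical non-linear data: y = 2 * exp(0.5 * x), with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Taking logs of y turns the relationship into log(y) = log(2) + 0.5 * x.
log_ys = [math.log(y) for y in ys]
a, b = fit_line(xs, log_ys)

print(b, math.exp(a))   # gradient near 0.5, back-transformed intercept near 2
```

With real measurements the fit would not be exact, so a residual plot of the transformed data is still worth drawing.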

24
(No Transcript)
25
(No Transcript)