Nonlinear Regression - PowerPoint PPT Presentation

Provided by: CONCO6
1
Nonlinear Regression
  • Didier Concordet

2
An example
3
Questions
  • What does nonlinear mean?
  • What are nonlinear kinetics?
  • What is a nonlinear statistical model?
  • For a given model, how should the data be fitted?
  • Is this model relevant?

4
What does nonlinear mean?
  • Definition: an operator P is linear if
  • for all objects x, y on which it operates,
  • $P(x + y) = P(x) + P(y)$
  • and for all numbers a and all objects x,
  • $P(ax) = a\,P(x)$

When an operator is not linear, it is nonlinear.
5
Examples
Which of the operators below are nonlinear?
  • $P(a, b) = a\,t + b\,t^2$
  • $P(A, a) = A \exp(-a\,t)$
  • $P(A) = A \exp(-0.1\,t)$
  • $P(t) = A \exp(-a\,t)$
  • $P(t) = a\,t$
  • $P(t) = a$
  • $P(t) = a + b\,t$
  • $P(t) = a\,t + b\,t^2$
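A numerical check can make the definition concrete. The sketch below tests additivity and homogeneity at one pair of points for two of the operators listed above, read as operators on their parameters at a fixed time t. The helper name `is_linear`, the time point `T` and the test points are illustrative assumptions. A failed check proves nonlinearity; a passed check at a single pair of points is only consistent with linearity.

```python
import math

T = 2.0  # fixed time point, chosen arbitrarily for the check (assumption)

def is_linear(op, x, y, a=3.0, tol=1e-9):
    """Test P(x + y) = P(x) + P(y) and P(a x) = a P(x) at the points x, y."""
    x_plus_y = tuple(xi + yi for xi, yi in zip(x, y))
    a_times_x = tuple(a * xi for xi in x)
    additive = abs(op(*x_plus_y) - (op(*x) + op(*y))) < tol
    homogeneous = abs(op(*a_times_x) - a * op(*x)) < tol
    return additive and homogeneous

def poly(a, b):
    """P(a, b) = a t + b t^2, an operator on the parameters (a, b)."""
    return a * T + b * T**2

def expo(A, a):
    """P(A, a) = A exp(-a t), an operator on the parameters (A, a)."""
    return A * math.exp(-a * T)

print(is_linear(poly, (1.0, 2.0), (0.5, -1.0)))  # passes both tests
print(is_linear(expo, (1.0, 2.0), (0.5, 0.3)))   # fails additivity
```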

6
What are nonlinear kinetics?
Concentration at time t for a given dose D: $C(t, D)$
The kinetics are linear when the operator $P : D \mapsto C(t, D)$
is linear.
When $P(D)$ is not linear, the kinetics are nonlinear.
7
What are nonlinear kinetics?
Examples
8
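What is a nonlinear statistical model?
A statistical model: $Y_i = f(x_i, \theta) + \varepsilon_i$
  • $Y_i$: observation (dependent variable)
  • $\theta$: parameters
  • $x_i$: covariates (independent variables)
  • $\varepsilon_i$: error (residual)
  • $f$: function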
9
What is a nonlinear statistical model?
A statistical model is linear when the operator $P : \theta \mapsto f(x, \theta)$
is linear.
When $P$ is not linear,
the model is nonlinear.
10
What is a nonlinear statistical model ?
Example
Y = concentration, t = time
The model
is linear
11
Examples
Which of the statistical models below are
nonlinear?
12
Questions
  • What does nonlinear mean?
  • What are nonlinear kinetics?
  • What is a nonlinear statistical model?
  • For a given model, how should the data be fitted?
  • Is this model relevant?

13
How to fit the data ?
Proceed in three main steps
  • Write a (statistical) model
  • Choose a criterion
  • Minimize the criterion

14
Write a (statistical) model
  • Find a function of covariate(s) to describe the
    mean variation of the dependent variable (mean
    model).
  • Find a function of covariate(s) to describe the
    dispersion of the dependent variable about the
    mean (variance model).

15
Example
The error $\varepsilon_i$ is assumed Gaussian with a constant variance:
homoscedastic model
16
How to choose the criterion to optimize ?
Homoscedasticity: Ordinary Least Squares (OLS)
Under normality, OLS is equivalent to maximum
likelihood
Heteroscedasticity: Weighted Least Squares
(WLS) or Extended Least Squares (ELS)
17
Homoscedastic models
The Ordinary Least-Squares criterion
Define
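The ordinary least-squares criterion is usually written SS(θ) = Σ (Y_i − f(x_i, θ))². A minimal Python sketch follows; the mono-exponential mean model `mono_exp`, the sampling times and the parameter values are illustrative assumptions, not taken from the slides.

```python
import math

def ols_criterion(theta, xs, ys, f):
    """Ordinary least squares: SS(theta) = sum_i (y_i - f(x_i, theta))^2."""
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def mono_exp(t, theta):
    """Hypothetical mean model: A * exp(-k * t)."""
    A, k = theta
    return A * math.exp(-k * t)

times = [0.5, 1.0, 2.0, 4.0]
# Noise-free observations simulated with A = 10, k = 0.3 (assumed values)
conc = [mono_exp(t, (10.0, 0.3)) for t in times]

ss_true = ols_criterion((10.0, 0.3), times, conc, mono_exp)
ss_other = ols_criterion((8.0, 0.3), times, conc, mono_exp)
print(ss_true, ss_other)  # the criterion is smallest at the generating parameters
```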
18
Heteroscedastic models: Weighted Least-Squares
criterion
Define
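The weighted least-squares criterion is usually written WSS(θ) = Σ w_i (Y_i − f(x_i, θ))², where w_i is the weight of observation i. The sketch below assumes a straight-line mean model `line` and weights equal to 1 / f², which corresponds to an error proportional to the mean; both are illustrative assumptions.

```python
def wls_criterion(theta, xs, ys, f, weights):
    """Weighted least squares: sum_i w_i * (y_i - f(x_i, theta))^2."""
    return sum(w * (y - f(x, theta)) ** 2 for x, y, w in zip(xs, ys, weights))

def line(x, theta):
    """Hypothetical mean model: theta * x."""
    return theta * x

xs = [1.0, 2.0, 4.0]
ys = [2.2, 3.8, 8.4]
theta = 2.0

# Assumed weighting scheme: w_i = 1 / f(x_i, theta)^2 (proportional error)
weights = [1.0 / line(x, theta) ** 2 for x in xs]

wss = wls_criterion(theta, xs, ys, line, weights)
oss = wls_criterion(theta, xs, ys, line, [1.0] * len(xs))  # unit weights give OLS
print(wss, oss)
```

With unit weights the same function returns the ordinary sum of squares, so OLS is the special case w_i = 1.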
19
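How to choose the weights?
When the model $Y_i = f(x_i, \theta) + \varepsilon_i$
is heteroscedastic (i.e. $\mathrm{Var}(\varepsilon_i)$ is not constant
with $i$),
it is possible to rewrite the variance as $\mathrm{Var}(\varepsilon_i) = \sigma^2 g^2(x_i, \theta)$,
where $\sigma^2$ does not depend on $i$.
The weights are chosen as $w_i = 1 / g^2(x_i, \theta)$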
20
Example
with
The model can be rewritten as
with
The weights are chosen as
21
Extended (Weighted) Least Squares
Define
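One common form of the extended least-squares criterion is ELS = Σ [ (Y_i − f_i)² / v_i + ln v_i ], with v_i = σ² g²(x_i, θ) the variance of observation i; under normality this equals −2 ln(likelihood) up to a constant. The slide's own formula is not preserved in the transcript, so this form, the function names and the numbers below are assumptions.

```python
import math

def els_criterion(theta, sigma, xs, ys, f, g):
    """Extended least squares: sum_i [(y_i - f_i)^2 / v_i + ln(v_i)],
    with v_i = (sigma * g(x_i, theta))^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        var = (sigma * g(x, theta)) ** 2
        total += (y - f(x, theta)) ** 2 / var + math.log(var)
    return total

def line(x, theta):
    """Hypothetical mean model: theta * x."""
    return theta * x

def proportional(x, theta):
    """Hypothetical variance model: standard deviation proportional to the mean."""
    return line(x, theta)

els = els_criterion(2.0, 0.1, [1.0, 2.0], [2.2, 3.8], line, proportional)
print(els)
```

Unlike WLS, the log term penalizes large variances, so the variance parameters can be estimated together with θ.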
22
Summary
23
The criterion properties
  • It converges
  • It leads to consistent (unbiased) estimates
  • It leads to efficient estimates
  • It has several minima
24
It converges
When the sample size increases, it concentrates
about a value of the parameter
Example: consider the homoscedastic model
The criterion to use is the Least Squares
criterion
25
It converges
Small sample size
Large sample size
26
It leads to consistent estimates
The criterion concentrates about the true value
27
It leads to efficient estimates
For a fixed n, the variance of a consistent
estimator cannot fall below a limit
(the Cramér-Rao lower bound).
In other words, for a fixed n, the "precision" of a consistent
estimator is bounded.
An estimator is efficient when its
variance equals this lower bound.
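For reference, in the one-parameter case the bound is usually stated for an unbiased estimator as below. The notation is assumed here rather than taken from the slides: L_n is the likelihood of the n observations and I_n the Fisher information.

```latex
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I_n(\theta)},
\qquad
I_n(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2 \ln L_n(\theta)}{\partial \theta^2}\right]
```

The more sharply curved the log-likelihood is around θ, the larger I_n and the smaller the best achievable variance.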
28
Geometric interpretation
This ellipsoid is a confidence region of the
parameter
29
It leads to efficient estimates
For a given large n, there is no
criterion giving consistent estimates that is more
"convex" than −2 ln(likelihood)
[Figure: curves labelled "−2 ln(likelihood)" and "criterion"]
30
It has several minima
criterion
31
Minimize the criterion
Suppose that the criterion to optimize has been
chosen.
We are looking for the value of $\theta$,
denoted $\hat{\theta}$,
which achieves the minimum of the criterion.
We need an algorithm to minimize such a criterion.
32
Example
Consider the homoscedastic model
We are looking for the value of $\theta$,
denoted $\hat{\theta}$,
which achieves the minimum of the criterion
33
Isocontours
34
Different families of algorithms
  • Zero-order algorithms: computation of the
    criterion
  • First-order algorithms: computation of the
    first derivative of the criterion
  • Second-order algorithms: computation of the
    second derivative of the criterion

35
Zero order algorithms
  • Simplex algorithm
  • Grid search and Monte-Carlo methods
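A grid search only evaluates the criterion on a set of candidate parameter values and keeps the best one. The sketch below assumes a two-parameter mono-exponential model with noise-free simulated data; the grids and the names `grid_search` and `ss` are illustrative.

```python
import itertools
import math

def grid_search(criterion, grids):
    """Evaluate the criterion at every node of a Cartesian grid; return the best node."""
    return min(itertools.product(*grids), key=criterion)

times = [0.5, 1.0, 2.0, 4.0]
conc = [10.0 * math.exp(-0.3 * t) for t in times]  # simulated with A = 10, k = 0.3

def ss(theta):
    """Ordinary sum of squares for the model A * exp(-k * t)."""
    A, k = theta
    return sum((y - A * math.exp(-k * t)) ** 2 for t, y in zip(times, conc))

A_grid = [8.0 + 0.5 * i for i in range(9)]  # 8.0, 8.5, ..., 12.0
k_grid = [i / 10 for i in range(1, 7)]      # 0.1, 0.2, ..., 0.6

best = grid_search(ss, [A_grid, k_grid])
print(best)
```

The cost grows as the product of the grid sizes, which is why grid search is mostly used to find starting values for a faster algorithm.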

36
Simplex algorithm
37
Monte Carlo algorithm
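The slide's figure is not preserved. One simple Monte Carlo scheme, shown here as an assumption about what is meant, draws parameter values uniformly at random inside user-chosen bounds and keeps the draw with the smallest criterion. The model, data, bounds and seed are illustrative.

```python
import math
import random

def random_search(criterion, bounds, n_draws=5000, seed=0):
    """Uniform random search: keep the draw with the smallest criterion value."""
    rng = random.Random(seed)
    best_theta, best_value = None, float("inf")
    for _ in range(n_draws):
        theta = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        value = criterion(theta)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value

times = [0.5, 1.0, 2.0, 4.0]
conc = [10.0 * math.exp(-0.3 * t) for t in times]  # simulated with A = 10, k = 0.3

def ss(theta):
    """Ordinary sum of squares for the model A * exp(-k * t)."""
    A, k = theta
    return sum((y - A * math.exp(-k * t)) ** 2 for t, y in zip(times, conc))

theta_mc, ss_mc = random_search(ss, [(5.0, 15.0), (0.05, 1.0)])
print(theta_mc, ss_mc)
```

The result is only approximate; it is typically refined afterwards with a derivative-based algorithm.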
38
First order algorithms
  • Line search algorithm
  • Conjugate gradient

39
First order algorithms
The derivatives of the criterion vanish at its
optima.
Suppose that there is only one parameter to
estimate.
The criterion (e.g. SS) depends only on $\theta$.
How can we find the value(s) of $\theta$ where the
derivative of the criterion vanishes?
40
Line search algorithm
[Figure: derivative of the criterion plotted against θ, with iterates labelled 0, 1, 2]
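The slides do not say which line search is used. As one possible illustration, the sketch below finds the zero of the derivative of a one-parameter sum of squares by bisection, which only needs the sign of the first derivative. The model exp(−θ t), the data and the bracketing interval are assumptions.

```python
import math

times = [0.5, 1.0, 2.0, 4.0]
ys = [math.exp(-0.3 * t) for t in times]  # simulated with theta = 0.3

def d_ss(theta):
    """First derivative of SS(theta) = sum (y - exp(-theta t))^2."""
    return sum(2.0 * (y - math.exp(-theta * t)) * t * math.exp(-theta * t)
               for t, y in zip(times, ys))

def bisect_root(dfun, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a sign change of dfun until it is shorter than tol."""
    d_lo = dfun(lo)
    if d_lo * dfun(hi) > 0:
        raise ValueError("the derivative must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = dfun(mid)
        if d_lo * d_mid <= 0:
            hi = mid               # the sign change is in the left half
        else:
            lo, d_lo = mid, d_mid  # the sign change is in the right half
    return 0.5 * (lo + hi)

theta_hat = bisect_root(d_ss, 0.05, 1.0)
print(theta_hat)
```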
41
Second order algorithms
  • Gauss-Newton (steepest descent method)
  • Marquardt
42
Second order algorithms
The derivatives of the criterion vanish at its
optima. When the criterion is (locally) convex,
there is a path to reach the minimum: the
steepest direction.
43
Gauss Newton (one dimension)
[Figure: derivative of the criterion, with iterates labelled 1, 2, 3]
The criterion is convex
44
Gauss Newton (one dimension)
[Figure: derivative of the criterion plotted against θ, with iterates labelled 0, 1, 2]
The criterion is not convex
45
Gauss Newton
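The slide's formula is not preserved. In its usual form, a Gauss-Newton iteration linearizes the mean model around the current θ and solves the resulting linear least-squares problem. For one parameter this gives the update θ ← θ + Σ J_i r_i / Σ J_i², with residuals r_i = y_i − f_i and sensitivities J_i = ∂f_i/∂θ. The sketch below assumes the model exp(−θ t) and noise-free data.

```python
import math

times = [0.5, 1.0, 2.0, 4.0]
ys = [math.exp(-0.3 * t) for t in times]  # simulated with theta = 0.3

def gauss_newton(theta, n_iter=50, tol=1e-12):
    """One-parameter Gauss-Newton for the model f(t, theta) = exp(-theta t)."""
    for _ in range(n_iter):
        fitted = [math.exp(-theta * t) for t in times]
        resid = [y - f for y, f in zip(ys, fitted)]    # r_i = y_i - f_i
        jac = [-t * f for t, f in zip(times, fitted)]  # J_i = df_i / dtheta
        step = sum(j * r for j, r in zip(jac, resid)) / sum(j * j for j in jac)
        theta += step
        if abs(step) < tol:
            break
    return theta

theta_gn = gauss_newton(0.5)
print(theta_gn)
```

Σ J_i² plays the role of the second derivative of the criterion (up to a factor 2) and is never negative, but the full step can still overshoot when the starting value is far from the minimum.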
46
Marquardt
Handles the case where the criterion
is not convex.
When the second derivative is < 0 (the first derivative
decreases), it is set to a positive value.
[Figure: derivative of the criterion plotted against θ, with iterates labelled 0, 1, 2, 3]
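A common way to implement this idea is the Levenberg-Marquardt damping: a positive constant λ is added to the curvature term, increased when a step fails to reduce the criterion and decreased when it succeeds. The sketch below is a one-parameter version under assumed settings (model exp(−θ t), factor 10 for the λ updates, poor starting value 2.0), not the presenter's exact scheme.

```python
import math

times = [0.5, 1.0, 2.0, 4.0]
ys = [math.exp(-0.3 * t) for t in times]  # simulated with theta = 0.3

def sum_sq(theta):
    """Ordinary sum of squares for the model exp(-theta t)."""
    return sum((y - math.exp(-theta * t)) ** 2 for t, y in zip(times, ys))

def marquardt(theta, lam=1e-3, tol=1e-12, max_iter=200):
    """Damped Gauss-Newton: step = sum(J r) / (sum(J^2) + lam)."""
    ss = sum_sq(theta)
    for _ in range(max_iter):
        fitted = [math.exp(-theta * t) for t in times]
        grad = sum(-t * f * (y - f) for t, f, y in zip(times, fitted, ys))
        curv = sum((t * f) ** 2 for t, f in zip(times, fitted))
        step = grad / (curv + lam)
        if abs(step) < tol:
            break
        new_ss = sum_sq(theta + step)
        if new_ss < ss:    # success: accept the step and relax the damping
            theta, ss = theta + step, new_ss
            lam /= 10.0
        else:              # failure: keep theta and increase the damping
            lam *= 10.0
    return theta

theta_lm = marquardt(2.0)
print(theta_lm)
```

From the start value 2.0 an undamped Gauss-Newton step lands at a negative θ where the criterion is much larger; the damping shortens the step until the criterion decreases.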
47
Summary
48
Questions
  • What does nonlinear mean?
  • What are nonlinear kinetics?
  • What is a nonlinear statistical model?
  • For a given model, how should the data be fitted?
  • Is this model relevant?

49
Is this model relevant ?
  • Graphical inspection of the residuals
      • mean model (f)
      • variance model (g)
  • Inspection of numerical results
      • variance-covariance and correlation matrices of the estimator
      • Akaike criterion (AIC)

50
Graphical inspection of the residuals
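For the model $Y_i = f(x_i, \theta) + \varepsilon_i$ with $\mathrm{Var}(\varepsilon_i) = \sigma^2 g^2(x_i, \theta)$
Calculate the weighted residuals $\hat{e}_i = (Y_i - f(x_i, \hat{\theta})) / g(x_i, \hat{\theta})$
and draw $\hat{e}_i$
vs the fitted values $f(x_i, \hat{\theta})$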
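As a sketch of this step, the function below returns the pairs (fitted value, weighted residual) that would be plotted, where f is the mean model and g the variance model evaluated at the estimated parameters. The straight-line model, the proportional variance model and the data are illustrative assumptions.

```python
def weighted_residuals(theta_hat, xs, ys, f, g):
    """Return (fitted value, weighted residual) pairs for a residual plot."""
    pairs = []
    for x, y in zip(xs, ys):
        fitted = f(x, theta_hat)
        pairs.append((fitted, (y - fitted) / g(x, theta_hat)))
    return pairs

def line(x, theta):
    """Hypothetical mean model: theta * x."""
    return theta * x

def proportional(x, theta):
    """Hypothetical variance model: standard deviation proportional to the mean."""
    return line(x, theta)

pairs = weighted_residuals(2.0, [1.0, 2.0, 4.0], [2.2, 3.8, 8.4], line, proportional)
for fitted, resid in pairs:
    print(fitted, resid)
```

If the variance model is adequate, the weighted residuals should show a roughly constant spread across the fitted values.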
51
Check the mean model
Scatterplot of weighted residuals vs fitted values
Structure in the residuals: change the mean
model (f function)
No structure in the residuals: OK
52
Check the variance model: homoscedasticity
Scatterplot of weighted residuals vs fitted values
No structure in the residuals but
heteroscedasticity: change the variance model (g function)
Homoscedasticity: OK
53
Example
homoscedastic model
Criterion: OLS
54
Example
Structure in the residuals:
change the mean model
New model
homoscedastic model
55
Example
Heteroscedasticity:
change the variance model
New model
Needs WLS
56
Example
57
Inspection of numerical results
Correlation matrix of the estimator
  • Strong correlations between estimators can mean that
      • the model is over-parametrized
      • the parametrization is not good
      • the model is not identifiable

58
The model is over-parametrized
Change the mean and/or variance model (f and/or
g )
Example: the appropriate model is
and you fitted
Perform a test or check the AIC
59
The parametrization is not good
Change the parametrization of your model
Example: you fitted
try
Two useful indices: the parametric curvature and the intrinsic curvature
60
The model is not identifiable
The model has too many parameters compared to the
number of data points: there are many solutions to
the optimisation
Examples
Look at the eigenvalues of the correlation matrix:
if $\lambda_{\max} / \lambda_{\min}$
is too large and/or $\lambda_{\min}$ is
too small, simplify the model
61
The Akaike criterion
The Akaike criterion (AIC) selects one model among
several models in "competition".
The Akaike criterion is nothing other than the
penalized log-likelihood. That is, it chooses
the model which is the most likely.
The penalty is chosen such that the criterion is
convergent: when the sample size increases, the
criterion selects the "true" model.
n: sample size; SS: (weighted or ordinary) sum of squares; p:
number of parameters that have been estimated
The model with the smallest AIC is the best among
the compared models
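The slide's formula is not preserved in the transcript. A form often used with least-squares fits, assumed here, is AIC = n ln(SS) + 2p; some texts use n ln(SS/n) + 2p, which ranks models identically when n is the same. The SS values below are made up for illustration.

```python
import math

def aic_least_squares(n, ss, p):
    """Assumed least-squares form of the Akaike criterion: n * ln(SS) + 2 * p."""
    return n * math.log(ss) + 2 * p

# Two hypothetical models fitted to the same 20 observations
aic_small = aic_least_squares(n=20, ss=4.0, p=2)  # simpler model
aic_large = aic_least_squares(n=20, ss=3.9, p=4)  # two extra parameters, slightly better fit

print(aic_small, aic_large)
print("preferred:", "small" if aic_small < aic_large else "large")
```

Here the small reduction in SS does not pay for the two extra parameters, so the simpler model has the smaller AIC.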
62
Example
[Figure: loss plotted against iteration]
63
Example
essentially intrinsic curvature
R
64
About the ellipsoid
  • It is linked to the convexity of the criterion
  • It is linked to the variance of the estimator
  • The convexity of the criterion is linked to the
    variance of the estimator
65
Different degrees of convexity
flat criterion weakly convex
convex criterion
locally convex
convex in some directions
locally convex
66
How to measure convexity ?
One parameter: when the second derivative is positive, the
criterion is convex at the point where the
second derivative is evaluated
Several parameters: calculate the Hessian matrix (the matrix of
second partial derivatives)
67
How to measure convexity ?
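It is possible to find a linear transformation of
the parameters such that the Hessian matrix is diagonal: $\mathrm{diag}(\lambda_1, \ldots, \lambda_p)$
$\lambda_1, \ldots, \lambda_p$ are the eigenvalues of the Hessian matrix
When $\lambda_i > 0$ for all $i$, the criterion is convex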
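As an illustration with two parameters, the sketch below approximates the Hessian by central finite differences and computes its two eigenvalues in closed form; the criterion is convex at the evaluation point when both are positive. The quadratic `toy_criterion`, the step size and the function names are assumptions.

```python
import math

def hessian_2d(crit, theta, h=1e-3):
    """Central finite-difference Hessian of a two-parameter criterion."""
    x, y = theta
    f0 = crit(x, y)
    h11 = (crit(x + h, y) - 2 * f0 + crit(x - h, y)) / h**2
    h22 = (crit(x, y + h) - 2 * f0 + crit(x, y - h)) / h**2
    h12 = (crit(x + h, y + h) - crit(x + h, y - h)
           - crit(x - h, y + h) + crit(x - h, y - h)) / (4 * h**2)
    return [[h11, h12], [h12, h22]]

def eigenvalues_sym_2x2(H):
    """Eigenvalues (smallest first) of a symmetric 2x2 matrix."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b**2)
    return mean - radius, mean + radius

def toy_criterion(t1, t2):
    """Hypothetical criterion whose exact Hessian is [[4, 1], [1, 4]]."""
    return 2 * t1**2 + t1 * t2 + 2 * t2**2

H = hessian_2d(toy_criterion, (1.0, -0.5))
lam_min, lam_max = eigenvalues_sym_2x2(H)
print(lam_min, lam_max, "convex" if lam_min > 0 else "not convex")
```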
68
How to measure convexity ?
When for some ,
and
the criterion is locally convex
When the eigenvalues
are low (but > 0),
and
the criterion is flat
69
The variance-covariance matrix
The variance-covariance matrix of the estimator
(denoted V) is proportional to the inverse of the Hessian matrix, $H^{-1}$
It is possible to find a linear transformation
of the parameters such that V is diagonal
70
The variance-covariance matrix
are the eigenvalues of the variance-covariance
matrix V
71
The correlation matrix
The correlation matrix of the estimator (denoted
C) is obtained from V
correlation matrix
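The standard conversion divides each covariance by the product of the two standard deviations: C_ij = V_ij / sqrt(V_ii V_jj). A minimal sketch, with a made-up 2 × 2 matrix V:

```python
import math

def correlation_from_covariance(V):
    """C[i][j] = V[i][j] / sqrt(V[i][i] * V[j][j])."""
    sd = [math.sqrt(V[i][i]) for i in range(len(V))]
    return [[V[i][j] / (sd[i] * sd[j]) for j in range(len(V))]
            for i in range(len(V))]

V = [[4.0, 1.2],
     [1.2, 1.0]]  # hypothetical variance-covariance matrix
C = correlation_from_covariance(V)
print(C)
```

The diagonal of C is 1 by construction; off-diagonal values close to ±1 signal strongly correlated estimators.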
72
Geometric interpretation
criterion
Axes of the ellipsoid parallel to the parameter axes:
r = 0