1
Lecture 8 The Principle of Maximum Likelihood
2
Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton's Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems
3
Purpose of the Lecture
Introduce the spaces of all possible data, all possible models, and the idea of likelihood.
Use maximization of likelihood as a guiding principle for solving inverse problems.
4
Part 1: The spaces of all possible data, all possible models, and the idea of likelihood
5
viewpoint
  • the observed data is one point in the space of
    all possible observations
  • or
  • dobs is a point in S(d)

6
plot of dobs
7
plot of dobs, shown as a single point in the space of all possible observations
8
now suppose
  • the data are independent
  • each is drawn from a Gaussian distribution
  • with the same mean m1 and variance σ²
  • (but m1 and σ unknown)

9
plot of p(d)
10
plot of p(d)
cloud centered on d1 = d2 = d3, with radius proportional to σ
11
now interpret
  • p(dobs)
  • as the probability that the observed data were in fact observed

L = log p(dobs) is called the likelihood
12
find parameters in the distribution
  • maximize
  • p(dobs)
  • with respect to m1 and σ

maximize the probability that the observed data were in fact observed: the Principle of Maximum Likelihood
13
Example
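The slide's equations were not transcribed. A plausible reconstruction, assuming N independent Gaussian data with common mean m1 and variance σ²:

p(dobs) = (2π)^(-N/2) σ^(-N) exp[ -(1/(2σ²)) Σi (di^obs − m1)² ]

L = log p(dobs) = -(N/2) log(2π) − N log σ − (1/(2σ²)) Σi (di^obs − m1)²

Setting ∂L/∂m1 = 0 and ∂L/∂σ = 0 gives the two equations referred to on the next slides.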
14
solving the two equations
15
solving the two equations
usual formula for the sample mean
almost the usual formula for the sample standard
deviation
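Under the same reconstruction (the solutions themselves were not transcribed), solving ∂L/∂m1 = 0 and ∂L/∂σ = 0 gives

m1^est = (1/N) Σi di^obs                              (the sample mean)

(σ^est)² = (1/N) Σi (di^obs − m1^est)²

The second expression divides by N rather than N−1, which is why it is only "almost" the usual sample-variance formula.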
16
these two estimates are linked to the assumption that the data are Gaussian-distributed; a different p.d.f. might give different formulas
17
example of a likelihood surface
18
the likelihood maximization process will fail if the p.d.f. has no well-defined peak
19
Part 2: Using the maximization of likelihood as a guiding principle for solving inverse problems
20
linear inverse problem for Gaussian-distributed data with known covariance [cov d]; assume Gm = d gives the mean d:

p(d) ∝ exp[ -½ (d − Gm)^T [cov d]^{-1} (d − Gm) ]
21
principle of maximum likelihood: maximize L = log p(dobs), i.e. minimize

(dobs − Gm)^T [cov d]^{-1} (dobs − Gm)

with respect to m
22
principle of maximum likelihood: maximize L = log p(dobs), i.e. minimize

E = (dobs − Gm)^T [cov d]^{-1} (dobs − Gm)

This is just weighted least squares
23
principle of maximum likelihood: when the data are Gaussian-distributed, solve Gm = d with weighted least squares, with weighting [cov d]^{-1}
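For reference (this formula is not on the transcribed slide, but it follows directly from minimizing E above), the weighted least squares solution is

m^est = [ G^T [cov d]^{-1} G ]^{-1} G^T [cov d]^{-1} dobs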
24
special case of uncorrelated data, each datum with a different variance, [cov d]_ii = σ_di²; minimize

E = Σi (di^obs − (Gm)_i)² / σ_di²
25
special case of uncorrelated data, each datum with a different variance, [cov d]_ii = σ_di²; minimize

E = Σi (di^obs − (Gm)_i)² / σ_di²

errors weighted by their certainty
26
but what about a priori information?
27
probabilistic representation of a priori
information
  • probability that the model parameters are
  • near m
  • given by p.d.f.
  • pA(m)

28
probabilistic representation of a priori
information
  • probability that the model parameters are
  • near m
  • given by p.d.f.
  • pA(m)

centered at the a priori value ⟨m⟩
29
probabilistic representation of a priori
information
  • probability that the model parameters are
  • near m
  • given by p.d.f.
  • pA(m)

variance reflects uncertainty in a priori
information
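A plausible Gaussian form for this prior p.d.f. (the slide's own equation was not transcribed), writing [cov m] for its covariance:

pA(m) ∝ exp[ -½ (m − ⟨m⟩)^T [cov m]^{-1} (m − ⟨m⟩) ]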
30
(figure: two p.d.f.s pA(m) in the (m1, m2) plane, both centered at (⟨m1⟩, ⟨m2⟩); a broad one labeled "uncertain" and a narrow one labeled "certain")
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
assessing the information content in pA(m)
  • Do we know a little about m
  • or
  • a lot about m ?

35
Information Gain, S
-S is called the Relative Entropy
36
Relative Entropy, S, also called Information Gain
null p.d.f.: the state of no knowledge
37
Relative Entropy, S, also called Information Gain
a uniform p.d.f. might work for this (the null p.d.f.)
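The formula itself was not transcribed; the standard definition, which matches the annotations above up to sign convention, is

S = ∫ pA(m) log[ pA(m) / pN(m) ] dm

where pN(m) is the null p.d.f. With a uniform pN, this integral is non-negative and grows as pA becomes more sharply peaked, i.e. as the a priori information becomes more certain.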
38
probabilistic representation of data
  • probability that the data are
  • near d
  • given by p.d.f.
  • pA(d)

39
probabilistic representation of data
  • probability that the data are
  • near d
  • given by p.d.f.
  • p(d)

centered at observed data dobs
40
probabilistic representation of data
  • probability that the data are
  • near d
  • given by p.d.f.
  • p(d)

variance reflects uncertainty in measurements
41
probabilistic representation of both prior
information and observed data
  • assume observations and a priori information are
    uncorrelated
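Since the two are assumed uncorrelated, the combined p.d.f. (not transcribed on the slide) is presumably the product

p(d, m) = p(d) pA(m) ∝ exp[ -½ (d − dobs)^T [cov d]^{-1} (d − dobs) ] exp[ -½ (m − ⟨m⟩)^T [cov m]^{-1} (m − ⟨m⟩) ]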

42
Example of the combined p.d.f.
43
the theory d = g(m) is a surface in the combined space of data and model parameters on which the estimated model parameters and predicted data must lie
44
the theory d = g(m) is a surface in the combined space of data and model parameters on which the estimated model parameters and predicted data must lie; for a linear theory the surface is planar
45
the principle of maximum likelihood says
  • maximize the combined p.d.f. p(d, m)

on the surface d = g(m)
46
(No Transcript)
47
(No Transcript)
48
(No Transcript)
49
principle of maximum likelihood with Gaussian-distributed data and Gaussian-distributed a priori information: minimize

(dobs − Gm)^T [cov d]^{-1} (dobs − Gm) + (m − ⟨m⟩)^T [cov m]^{-1} (m − ⟨m⟩)
50
this is just weighted least squares with a stacked system Fm = f, so we already know the solution
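A plausible form of the stacked system (the slide's matrix was not transcribed), with [cov d]^{-1/2} and [cov m]^{-1/2} denoting square roots of the inverse covariances:

F = [ [cov d]^{-1/2} G ]        f = [ [cov d]^{-1/2} dobs ]
    [ [cov m]^{-1/2} I ]            [ [cov m]^{-1/2} ⟨m⟩  ]

so that ||Fm − f||² equals the quantity minimized on the previous slide.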
51
solve Fm = f with simple least squares
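As a minimal numerical sketch (not from the slides; the function and variable names are illustrative assumptions), the stacked system can be built and solved with numpy:

import numpy as np

def solve_with_prior(G, d_obs, cov_d, m_prior, cov_m):
    # Weight each block so that W.T @ W equals the inverse covariance,
    # making the squared misfit match the exponent of the corresponding Gaussian.
    Wd = np.linalg.cholesky(np.linalg.inv(cov_d)).T
    Wm = np.linalg.cholesky(np.linalg.inv(cov_m)).T
    F = np.vstack([Wd @ G, Wm @ np.eye(len(m_prior))])   # stacked system Fm = f
    f = np.concatenate([Wd @ d_obs, Wm @ m_prior])
    m_est, *_ = np.linalg.lstsq(F, f, rcond=None)        # simple least squares
    return m_est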
52
when [cov d] = σd² I and [cov m] = σm² I
53
this provides an answer to the question: What should be the value of ε² in damped least squares? The answer: it should be set to the ratio of the variances of the data and the a priori model parameters, ε² = σd² / σm²
54
if the a priori information is Hm = h with covariance [cov h]_A, then Fm = f becomes
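A plausible form (the slide's matrix was not transcribed): the identity and ⟨m⟩ blocks of the previous F and f are replaced by H and h:

F = [ [cov d]^{-1/2} G   ]        f = [ [cov d]^{-1/2} dobs ]
    [ [cov h]_A^{-1/2} H ]            [ [cov h]_A^{-1/2} h  ]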
55
Gm = dobs with covariance [cov d]
Hm = h with covariance [cov h]_A
m^est = [F^T F]^{-1} F^T f
the most useful formula in inverse theory
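Expanding F^T F and F^T f with the stacked definitions sketched above gives the explicit form (a reconstruction; the transcribed slide omits it):

m^est = [ G^T [cov d]^{-1} G + H^T [cov h]_A^{-1} H ]^{-1} [ G^T [cov d]^{-1} dobs + H^T [cov h]_A^{-1} h ]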