4. Maximum Likelihood

Transcript and Presenter's Notes



1
4. Maximum Likelihood
  • Prof. A.L. Yuille
  • Stat 231. Fall 2004.

2
Learning Probability Distributions.
  • Learn the likelihood functions and priors from
    datasets.
  • Two main strategies: parametric and
    non-parametric.
  • This lecture and the next concentrate on
    parametric methods.
  • (These assume a parametric form for the
    distributions.)

3
Maximum Likelihood Estimation.
  • Assume the distribution is of the form
    $p(x|\theta)$, with parameters $\theta$.
  • Independent, identically distributed (i.i.d.)
    samples $x_1, \dots, x_N$.
  • Choose $\hat{\theta} = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta)$,
    or equivalently the maximizer of the log-likelihood
    $\sum_{i=1}^N \log p(x_i|\theta)$ (see the sketch below).
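
A minimal sketch of this recipe in Python, assuming numpy is
available; the Bernoulli (coin-flip) model, the true parameter
0.7, and the grid search are illustrative choices, not from the
slides:

    import numpy as np

    # MLE recipe: draw i.i.d. samples, then pick the theta that
    # maximizes the log-likelihood sum_i log p(x_i | theta).
    rng = np.random.default_rng(42)
    x = rng.binomial(1, 0.7, size=200)       # i.i.d. coin flips, true theta = 0.7

    thetas = np.linspace(0.001, 0.999, 999)  # candidate parameter values
    log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t))
                        for t in thetas])

    theta_hat = thetas[np.argmax(log_lik)]   # arg max of the log-likelihood
    print(f"grid MLE: {theta_hat:.3f}, closed form (sample mean): {x.mean():.3f}")

For the Bernoulli family the grid search agrees with the
closed-form answer (the sample mean), which previews the
Gaussian example on slide 5.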

4
Supervised versus Unsupervised Learning.
  • Supervised Learning assumes that we know the
    class label for each datapoint.
  • I.e., we are given pairs $(x_i, c_i)$,
    where $x_i$ is the datapoint and
    $c_i$ is the class label.
  • Unsupervised Learning does not assume that the
    class labels are specified. This is a harder
    task.
  • But unsupervised methods can also be used for
    supervised data if the goal is to determine
    structure in the data (e.g. mixture of
    Gaussians).
  • Stat 231 is almost entirely concerned with
    supervised learning.

5
Example of MLE.
  • One-dimensional Gaussian distribution:
    $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
  • Solve for $\mu$ and $\sigma$ by differentiating the
    log-likelihood and setting the derivatives to zero:
    $\hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i$,
    $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu})^2$
    (sketch below).
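
A quick check of these closed-form estimates in Python; the
data-generating parameters here are our own illustrative
choices:

    import numpy as np

    # Gaussian MLE: mu_hat is the sample mean, sigma_hat^2 the
    # (1/N) sample variance, exactly as derived on this slide.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=10_000)

    mu_hat = x.mean()
    sigma2_hat = np.mean((x - mu_hat) ** 2)   # note 1/N, not 1/(N-1)

    print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {np.sqrt(sigma2_hat):.3f}")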

6
MLE
  • The Gaussian is unusual because the parameters
    of the distribution can be expressed as an
    analytic function of the data.
  • More usually, numerical algorithms are required
    (see the sketch after this list).
  • Modeling problem: for complicated patterns (the
    shape of fish, natural language, etc.), it
    requires considerable work to find a suitable
    parametric form for the probability
    distributions.
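
A sketch of the numerical case, assuming scipy is available:
fitting a Gamma distribution, whose shape parameter has no
closed-form MLE, by minimizing the negative log-likelihood.
The data and the starting point are illustrative choices:

    import numpy as np
    from scipy import optimize, special

    rng = np.random.default_rng(3)
    x = rng.gamma(shape=2.0, scale=1.5, size=5_000)

    def neg_log_lik(params):
        a, s = params
        if a <= 0 or s <= 0:
            return np.inf
        # log Gamma(x | a, s) = (a-1) log x - x/s - a log s - log Gamma(a)
        return -np.sum((a - 1) * np.log(x) - x / s
                       - a * np.log(s) - special.gammaln(a))

    result = optimize.minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
    print("numerical MLE (shape, scale):", result.x)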

7
MLE and Kullback-Leibler
  • What happens if the data is not generated by the
    model that we assume?
  • Suppose the true distribution is $f(x)$
    and our models are of form $p(x|\theta)$.
  • The Kullback-Leibler divergence is
    $D(f \,\|\, p_\theta) = \int f(x) \log\frac{f(x)}{p(x|\theta)}\,dx$.
  • This is $\geq 0$, with equality if and only if
    $f(x) = p(x|\theta)$ almost everywhere.
  • K-L is a measure of the difference between
    $f(x)$ and $p(x|\theta)$ (computed numerically below).
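
A minimal numerical sketch of the divergence, assuming numpy;
the choice of a Laplace truth and a Gaussian model is ours, not
the slides':

    import numpy as np

    # Numerical KL divergence D(f || p) between two 1-D densities,
    # approximated on a fine grid.
    x = np.linspace(-15.0, 15.0, 30001)
    dx = x[1] - x[0]

    def gaussian(x, mu, sigma):
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    def laplace(x, mu, b):
        return np.exp(-np.abs(x - mu) / b) / (2 * b)

    f = laplace(x, 0.0, 1.0)           # "true" distribution f(x)
    p = gaussian(x, 0.0, np.sqrt(2.0)) # model p(x|theta); Laplace(0,1) has variance 2

    kl = np.sum(f * np.log(f / p)) * dx   # D(f || p) >= 0
    print(f"D(f || p) = {kl:.4f}")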

8
MLE and Kullback-Leibler
  • Samples $x_1, \dots, x_N$ drawn i.i.d. from $f(x)$.
  • Approximate the integral
    $\int f(x) \log\frac{f(x)}{p(x|\theta)}\,dx$
  • by the empirical KL:
    $\frac{1}{N}\sum_{i=1}^N \log\frac{f(x_i)}{p(x_i|\theta)}
    = \frac{1}{N}\sum_{i=1}^N \log f(x_i)
    - \frac{1}{N}\sum_{i=1}^N \log p(x_i|\theta)$.
  • Minimizing the empirical KL is equivalent to MLE,
    since the first term does not depend on $\theta$.
  • We find the distribution of form $p(x|\theta)$ that is
    closest to the true $f(x)$ in the KL sense
    (demonstrated below).
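
A sketch of this equivalence under model mismatch, assuming
numpy: fitting the (wrong) Gaussian family by MLE to Laplace
data should recover the KL-closest Gaussian, which matches the
Laplace mean (0) and variance (2):

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.laplace(loc=0.0, scale=1.0, size=100_000)

    mu_hat = samples.mean()   # Gaussian MLE for the mean
    var_hat = samples.var()   # Gaussian MLE for the variance (1/N)

    print(f"mu_hat  = {mu_hat:.3f}   (KL-optimal: 0)")
    print(f"var_hat = {var_hat:.3f}  (KL-optimal: 2)")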

9
MLE example
We denote the log-likelihood as a function of $\theta$:
$\ell(\theta) = \sum_{i=1}^N \log p(x_i|\theta)$.
$\hat{\theta}$ is computed by solving the equations
$\partial \ell / \partial \theta = 0$.
For example, the Gaussian family gives a closed-form
solution.
10
Learning with a Prior.
  • We can put a prior $p(\theta)$ on the parameter values;
    the posterior is
    $p(\theta|x_1,\dots,x_N) \propto p(x_1,\dots,x_N|\theta)\,p(\theta)$.
  • We can estimate this recursively (if samples are
    i.i.d.):
    $p(\theta|x_1,\dots,x_n) \propto p(x_n|\theta)\,p(\theta|x_1,\dots,x_{n-1})$.
  • Bayes learning estimates a probability
    distribution on $\theta$, not just a point estimate
    $\hat{\theta}$ (sketch below).
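
A minimal sketch of the recursive update for a Gaussian mean
with known variance and a conjugate Normal prior; the prior and
data-generating values are our own assumptions, not from the
slides:

    import numpy as np

    rng = np.random.default_rng(1)

    sigma2 = 1.0              # known observation variance
    mu_n, tau2_n = 0.0, 10.0  # prior: theta ~ N(0, 10)
    true_theta = 2.5
    data = rng.normal(true_theta, np.sqrt(sigma2), size=20)

    for n, x in enumerate(data, start=1):
        # One conjugate update per sample: the posterior after x_n
        # is again Gaussian, so it serves as the prior for x_{n+1}.
        precision = 1.0 / tau2_n + 1.0 / sigma2
        mu_n = (mu_n / tau2_n + x / sigma2) / precision
        tau2_n = 1.0 / precision
        if n % 5 == 0:
            print(f"after {n:2d} samples: posterior N({mu_n:.3f}, {tau2_n:.4f})")

The posterior mean moves toward the true parameter and its
variance shrinks as samples arrive, which is the content of the
next slide.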

11
Recursive Bayes Learning