1
Ch. 3. Maximum-Likelihood and Bayesian Parameter
Estimation
2
Introduction
  • If we knew the prior probabilities P(ωi) and the
    class-conditional densities p(x|ωi), we could
    design an optimal classifier
  • Unfortunately, we rarely have this kind of
    complete knowledge about the probabilistic
    structure of the problem
  • In a typical case, we merely have some vague,
    general knowledge about the situation, together
    with a number of design samples (training data):
    particular representatives of the patterns

3
Introduction
  • Problem: find some way to use this information to
    design or train the classifier
  • An approach to this problem is to use the
    samples to estimate the unknown probabilities and
    probability densities
  • And then to use the resulting estimates as if
    they were the true values

4
Introduction: Maximum-Likelihood Parameter
Estimation
  • The ML approach assumes the parameters are fixed,
    but unknown
  • The ML approach estimates the parameters that
    maximize the probability of obtaining the
    (given) training set
  • That is, the ML approach seeks parameter
    estimates that maximize the likelihood function

5
Introduction: Bayesian Estimation
  • The Bayesian approach models the parameters to be
    estimated as random variables with some (assumed)
    known a priori distribution
  • The Bayesian approach uses the training set to
    update the training-set-conditioned density
    function of the unknown parameters

6
Maximum Likelihood Estimation
7
Formulation of ML estimation
  • ML estimation assumes the parameters to be
    estimated are unknown, but constant
  • The ML formulation assumes
  • We have a training set D in the form of c subsets
    of the samples or feature vectors: D1, D2, ..., Dc
  • Samples in Di are assumed to be generated by the
    underlying density function for class i, p(x|ωi),
    i.e. it is assumed that the parametric form of
    p(x|ωi) is known
  • Parameter vector θi is the set of parameters to
    be estimated for class i
  • In the Gaussian case, where x ~ N(mi, Ci), the
    components of θi are the elements of mi and Ci

8
Use of the training set, ML
  • We consider the training of each class
    separately
  • Samples in Di give no information about θj for
    j ≠ i, i.e. it is assumed that the parameters for
    the different classes are functionally independent
  • Use the set Di of training samples, drawn
    independently according to the probability
    density p(x|ωi), to estimate the unknown
    parameter vector θi

9
The likelihood function
  • Suppose Di = {x1, x2, ..., xn}
  • If the xk within Di are assumed independent, the
    joint parameter-conditional pdf of Di is
  • p(Di|θi) = p(x1|θi) p(x2|θi) ... p(xn|θi)
  • where p(Di|θi), viewed as a function of θi, is
    called the likelihood function of θi

10
Maximum-likelihood estimation
  • Given Di, the objective of ML estimation is to
    find the θi that maximizes p(Di|θi), i.e. to find
    the θi that maximizes the likelihood of θi
  • The goal is to maximize p(Di|θi) with respect to
    the parameter vector θi (a numerical sketch
    follows below)
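To make this concrete, here is a minimal numerical sketch (not from the slides) of the log-likelihood and its maximizer, assuming 1-D Gaussian samples with unknown mean θ and known σ = 1; the data, grid, and names are illustrative only:

```python
import numpy as np

def log_likelihood(theta, samples, sigma=1.0):
    """Log-likelihood l(theta) = sum_k log p(x_k|theta) for a 1-D
    Gaussian with unknown mean theta and known std. deviation sigma."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (samples - theta) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
D_i = rng.normal(loc=2.0, scale=1.0, size=100)  # illustrative training set Di

# Maximize l(theta) by a simple grid search
thetas = np.linspace(0.0, 4.0, 401)
lls = np.array([log_likelihood(t, D_i) for t in thetas])
theta_ml = thetas[np.argmax(lls)]
print(theta_ml, D_i.mean())  # grid maximizer vs. the closed form (sample mean)
```

Because the log is monotonic, maximizing the log-likelihood gives the same estimate as maximizing the likelihood itself, and it turns the product over samples into a sum.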

11
ML Estimation Example - 1D Gaussian
12
ML Estimation
13
ML Estimation, log-likelihood
14
ML estimation example: Gaussian with unknown m,
known C
15
ML estimation example: Gaussian with unknown m,
unknown C
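The slide's derivation is not reproduced here, but the well-known closed-form ML results for the Gaussian are the sample mean for m and the biased sample covariance (dividing by n, not n-1) for C; a short illustrative sketch, with all data values assumed:

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for d-dimensional Gaussian data X (n rows, d columns):
    sample mean for m, and the *biased* covariance (divide by n) for C."""
    n = X.shape[0]
    m_hat = X.mean(axis=0)
    centered = X - m_hat
    C_hat = centered.T @ centered / n  # ML estimate; n-1 would give the unbiased one
    return m_hat, C_hat

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], size=500)
m_hat, C_hat = gaussian_mle(X)
print(m_hat, C_hat, sep="\n")
```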
16
ML estimation example, 2D
17
ML estimation example, 3D
18
Maximum A Posteriori (MAP) Estimation
  • Posterior density p(θ|D): p(θ|D) ∝ p(D|θ)p(θ)
    = l(θ)p(θ)
  • MAP estimation: find the value of θ that
    maximizes l(θ)p(θ) = p(D|θ)p(θ)
  • The maximum-likelihood estimator is a MAP
    estimator for the uniform (flat) prior (see the
    sketch below)
  • The MAP estimator finds the peak (mode) of the
    posterior density
  • Generally speaking, information on p(θ) is
    derived from the designer's knowledge of the
    problem domain (beyond our study of classifier
    design)
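As a hedged illustration (not the slides' example), the sketch below assumes 1-D Gaussian data with known variance and a Gaussian prior on the mean; in this conjugate case the MAP estimate has a closed form, and letting the prior variance grow recovers the ML estimate, the flat-prior case noted above:

```python
import numpy as np

def map_gaussian_mean(samples, sigma2, m0, sigma0_2):
    """MAP estimate of the mean theta of N(theta, sigma2) under a Gaussian
    prior theta ~ N(m0, sigma0_2): the maximizer of l(theta)p(theta)."""
    n = len(samples)
    xbar = np.mean(samples)
    return (n * sigma0_2 * xbar + sigma2 * m0) / (n * sigma0_2 + sigma2)

rng = np.random.default_rng(2)
D = rng.normal(2.0, 1.0, size=20)            # illustrative data
print(map_gaussian_mean(D, 1.0, 0.0, 0.25))  # informative prior pulls toward m0 = 0
print(map_gaussian_mean(D, 1.0, 0.0, 1e9))   # near-flat prior: approaches ML
print(np.mean(D))                            # ML estimate for comparison
```

With σ0² → ∞ the prior term becomes constant in l(θ)p(θ), so the MAP and ML estimates coincide.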

19
Bayesian Parameter Estimation
20
Bayesian Parameter Estimation
  • Although the desired probability density p(x) is
    unknown, we assume that it has a known parametric
    form
  • ⇒ the function p(x|θ) is completely known
  • The only thing assumed unknown is the value of
    the parameter vector θ
  • Any prior information about θ is assumed to be
    contained in a known density p(θ)
  • Observation of the samples D converts this
    density p(θ) to a posterior density p(θ|D), which
    is sharply peaked about the true value of θ

21
Bayesian Parameter Estimation
22
Class-conditional Densities
23
Basic Assumptions of Bayesian Parameter Estimation
  • The basic assumptions are summarized as follows
  • The form of the density p(x|θ) is assumed to be
    known, but the value of the parameter vector θ is
    not known exactly
  • The initial knowledge about θ is assumed to be
    contained in a known a priori density p(θ)
  • The rest of the knowledge about θ is contained
    in a set D of n samples x1, ..., xn drawn
    independently according to the unknown
    probability density p(x)

24
The Parameter Distribution
  • The important problem is to compute the posterior
    density p(θ|D), because we can then calculate
    p(x|D) as follows
  • p(x|D) = ∫ p(x|θ) p(θ|D) dθ
  • If p(θ|D) is sharply peaked about some value θ0,
    then we obtain p(x|D) ≈ p(x|θ0)
  • If we are less certain about the exact value of θ
    (the general case), we use the above equation
    (sketched numerically below)
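A minimal numerical sketch of this predictive-density integral (not from the slides), assuming a 1-D Gaussian likelihood with known σ = 1, an assumed Gaussian prior, and a grid approximation of p(θ|D):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

thetas = np.linspace(-5.0, 5.0, 1001)  # grid over the unknown mean theta
prior = normal_pdf(thetas, 0.0, 2.0)   # assumed prior p(theta)

rng = np.random.default_rng(3)
D = rng.normal(1.5, 1.0, size=10)      # illustrative samples

# Posterior p(theta|D) proportional to p(D|theta)p(theta), normalized on the grid
log_lik = sum(np.log(normal_pdf(x, thetas, 1.0)) for x in D)
post = np.exp(log_lik - log_lik.max()) * prior
post /= np.trapz(post, thetas)

# Predictive density p(x|D) = integral of p(x|theta) p(theta|D) dtheta
x0 = 2.0
print(np.trapz(normal_pdf(x0, thetas, 1.0) * post, thetas))
```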

25
Example for Gaussian density with unknown mean
vector
  • Problem: calculate the posterior density p(m|D)
    and the desired pdf p(x|D) for p(x|m) ~ N(m, C)

26
Example for Gaussian density with unknown mean
vector
27
Example for Gaussian density with unknown mean
vector
28
Example for Gaussian density with unknown mean
vector
29
Example for Gaussian density with unknown mean
vector
30
Estimation of p(m|D)
As the training sample size increases, p(m|D)
becomes more sharply peaked
31
The Univariate Gaussian Case: p(x|D)
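The slide's formulas are not reproduced here, but the standard conjugate results for the univariate Gaussian with unknown mean m, known variance σ², and prior m ~ N(m0, σ0²) are p(m|D) = N(mn, σn²) with mn = (n σ0² x̄ + σ² m0)/(n σ0² + σ²) and σn² = σ0² σ²/(n σ0² + σ²), giving p(x|D) = N(mn, σ² + σn²). A short sketch (illustrative values) that also shows p(m|D) sharpening as n grows:

```python
import numpy as np

def posterior_params(samples, sigma2, m0, sigma0_2):
    """Conjugate update for a univariate Gaussian with unknown mean:
    returns (m_n, sigma_n^2) of the posterior p(m|D) = N(m_n, sigma_n^2)."""
    n = len(samples)
    xbar = np.mean(samples)
    m_n = (n * sigma0_2 * xbar + sigma2 * m0) / (n * sigma0_2 + sigma2)
    sigma_n2 = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return m_n, sigma_n2

rng = np.random.default_rng(4)
data = rng.normal(2.0, 1.0, size=1000)  # illustrative samples
for n in (1, 10, 100, 1000):
    m_n, s_n2 = posterior_params(data[:n], 1.0, 0.0, 1.0)
    # Predictive density p(x|D) = N(m_n, sigma^2 + sigma_n^2)
    print(n, m_n, s_n2, 1.0 + s_n2)     # sigma_n^2 shrinks toward 0 as n grows
```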
32
Bayesian Parameter Estimation: General theory
The basic problem is to compute the posterior
density p(θ|D), because from this we can
calculate p(x|D)
  • By Bayes formula: p(θ|D) = p(D|θ)p(θ) /
    ∫ p(D|θ)p(θ) dθ
  • By the independence assumption: p(D|θ) =
    p(x1|θ) p(x2|θ) ... p(xn|θ)

33
Bayesian Parameter Estimation
34
Bayesian Parameter Estimation
35
Questions remain
  • How difficult is it to carry out these
    computations?
  • Does p(x|D) converge to p(x)?

36
When ML and Bayesian Methods Differ
  • ML is easier, since it requires merely
    differential-calculus techniques or a gradient
    search for θ (cf. the complex multidimensional
    integration needed for Bayesian estimation)
  • The Bayesian method can be more accurate, since
    it also uses the prior information in p(θ)

37
Bayesian Parameter Estimation
  • The probabilities P(ωi|x), i = 1, 2, ..., c, are
    needed for classification
  • The objective is to form the posterior
    probabilities P(ωi|x, Di) for the given training
    sets Di
  • An application of Bayes rule yields
  • P(ωi|x, Di) = p(x|ωi, Di) P(ωi|Di) /
    Σj p(x|ωj, Dj) P(ωj|Dj)

38
Estimation of P(ωi|x, Di)
  • The estimation of the posterior probabilities
    P(ωi|x, Di) requires computation of the a priori
    probability P(ωi|Di) and the density function
    p(x|ωi, Di) (a classification sketch follows
    this list)
  • Assumptions for simplification
  • 1. The probability P(ωi|Di) is independent of
    the training set Di, i.e., P(ωi|Di) = P(ωi)
  • 2. The probabilities P(ωi), i = 1, 2, ..., c,
    are known
  • 3. The training set Di has information about
    only the parameters of class ωi
  • 4. The functional form of p(x|Di) is known
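Under these assumptions, classification reduces to combining per-class density estimates with the known priors: P(ωi|x, D) ∝ p(x|ωi, Di) P(ωi). A minimal sketch with hypothetical per-class density estimates (the Gaussians and priors here are assumed, not from the slides):

```python
import numpy as np

def classify(x, class_densities, priors):
    """Posterior class probabilities P(omega_i|x, D) from per-class
    density estimates p(x|omega_i, D_i) and known priors P(omega_i)."""
    scores = np.array([p(x) for p in class_densities]) * np.array(priors)
    return scores / scores.sum()

def gauss(m, sigma):
    """A per-class density estimate, e.g. fit by ML or Bayesian estimation."""
    return lambda x: np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

densities = [gauss(0.0, 1.0), gauss(3.0, 1.5)]  # p(x|omega_1,D_1), p(x|omega_2,D_2)
print(classify(1.0, densities, [0.6, 0.4]))     # posterior probabilities, sum to 1
```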

39
Estimation of P(ωi|x, Di)
  • Then P(ωi|x, Di) can be computed from p(x|Di)
  • Therefore, the problem is to estimate the random
    vector of parameters θi for the density p(x|Di)

40
Estimation Equation
p(x|θi, Di) = p(x|θi): given the parameter vector
θi, x is independent of the training set Di, so
p(x|Di) = ∫ p(x|θi) p(θi|Di) dθi