Title: Outline


1
Outline
  • Parameter estimation continued
  • Non-parametric methods

2
Maximum-Likelihood Estimation
  • Assumptions
  • We separate a collection of samples according to
    class
  • D1, D2, ....., Dc
  • Samples in Dj are drawn independently according
    to the probability density p(x|ωj)
  • We assume that p(x|ωj) has a known parametric
    form and is uniquely determined by the value of a
    parameter vector θj
  • To simplify further, we assume that samples in Di
    give no information about θj if i ≠ j

3
Maximum-Likelihood Estimation cont.
  • Suppose that D contains n samples
  • x1, ....., xn
  • Because the samples were drawn independently,
    we have p(D|θ) = ∏k p(xk|θ)
  • The maximum-likelihood estimate of θ is the value
    of θ that maximizes p(D|θ)
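The slides leave the maximization to the figures; as a minimal sketch (function names are my own), the ML estimates for a univariate Gaussian maximize p(D|θ) = ∏k p(xk|θ) in closed form via the sample mean and the biased sample variance:

```python
import math

def gaussian_mle(samples):
    """ML estimates for a univariate Gaussian: the (mu, var)
    that maximize p(D | theta) = prod_k p(x_k | theta)."""
    n = len(samples)
    mu = sum(samples) / n                          # ML estimate of the mean
    var = sum((x - mu) ** 2 for x in samples) / n  # biased ML variance (divides by n)
    return mu, var

def log_likelihood(samples, mu, var):
    """log p(D | theta) under the independence assumption."""
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (x - mu) ** 2 / (2 * var) for x in samples)

D = [2.1, 1.9, 2.4, 2.0, 1.6]
mu_hat, var_hat = gaussian_mle(D)
# The ML estimate scores at least as high as any other candidate theta:
assert log_likelihood(D, mu_hat, var_hat) >= log_likelihood(D, 2.5, 1.0)
```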

4
Bayesian Estimation
  • Assumptions
  • The form of the density p(x|θ) is assumed to be
    known, but the value of the parameter vector θ is
    not known exactly
  • Our initial knowledge about θ is assumed to be
    contained in a known prior density p(θ)
  • The rest of our knowledge about θ is contained in
    a set D of n samples x1, ....., xn drawn
    independently according to the unknown
    probability density p(x)

5
Bayesian Estimation cont.
  • General theory
  • The basic problem is to compute the posterior
    density p(θ|D)
  • By Bayes' formula we have
    p(θ|D) = p(D|θ)p(θ) / ∫ p(D|θ)p(θ) dθ
  • By the independence assumption,
    p(D|θ) = ∏k p(xk|θ)
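As a minimal sketch of the general theory (not from the slides; names are my own), the posterior can be computed numerically on a discrete grid of candidate parameter values by multiplying likelihood and prior, then normalizing:

```python
import math

def posterior_on_grid(samples, thetas, prior, sigma=1.0):
    """p(theta | D) on a discrete grid of candidate means.
    p(theta | D) ∝ p(D | theta) p(theta), with
    p(D | theta) = prod_k p(x_k | theta) by independence.
    `sigma` is an assumed known likelihood std-dev."""
    unnorm = []
    for theta, p_theta in zip(thetas, prior):
        log_lik = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
                      - (x - theta) ** 2 / (2 * sigma ** 2) for x in samples)
        unnorm.append(math.exp(log_lik) * p_theta)
    z = sum(unnorm)                    # normalizing constant p(D)
    return [u / z for u in unnorm]

D = [0.8, 1.2, 1.0]
thetas = [0.0, 0.5, 1.0, 1.5]
prior = [0.25, 0.25, 0.25, 0.25]       # uniform prior p(theta)
post = posterior_on_grid(D, thetas, prior)
```

With a uniform prior the posterior concentrates around the value of θ best supported by the data (here, near the sample mean).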

6
Bayesian Estimation cont.
  • Gaussian case
  • The univariate case p(μ|D)

7
Bayesian Estimation cont.
  • Gaussian case continued
  • The univariate case p(x|D)
  • The multivariate case
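For the univariate case with known variance, the posterior p(μ|D) stays Gaussian. A sketch of the standard conjugate-update formulas (function name is my own): with prior N(μ0, σ0²) on the mean and likelihood variance σ², the posterior is N(μn, σn²) where

```python
def gaussian_posterior(samples, mu0, var0, var):
    """Posterior p(mu | D) for a univariate Gaussian with known
    variance `var` and Gaussian prior N(mu0, var0) on the mean.
    Returns the posterior mean mu_n and variance var_n using the
    textbook conjugate-update formulas."""
    n = len(samples)
    sample_mean = sum(samples) / n
    # mu_n blends the sample mean and the prior mean, weighted by
    # their precisions; var_n shrinks as n grows.
    mu_n = (n * var0 * sample_mean + var * mu0) / (n * var0 + var)
    var_n = var0 * var / (n * var0 + var)
    return mu_n, var_n

# Four samples at 2.0 pull the posterior mean from the prior (0.0)
# toward the data, and shrink the posterior variance:
mu_n, var_n = gaussian_posterior([2.0, 2.0, 2.0, 2.0],
                                 mu0=0.0, var0=1.0, var=1.0)
```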

8
Non-parametric Methods
  • In maximum-likelihood and Bayesian estimation
  • The forms of the probability densities are
    assumed to be known
  • However, the assumed forms rarely fit the
    densities in practice
  • In particular, all of the classical parametric
    densities are unimodal, whereas many practical
    problems involve multimodal densities

9
A Multimodal Density
10
Solutions
  • More complicated parametric models
  • Mixture of Gaussians
  • More generally, a set of basis functions can be
    used to describe a probability density
  • Learning is intrinsically more difficult when we
    have more parameters
  • Non-parametric methods

11
Non-parametric Methods
  • Most of the non-parametric density estimation
    methods are based on the following fact
  • The probability P that a vector x will fall in a
    region R is given by P = ∫R p(x′) dx′

12
Non-parametric Methods cont.
  • For n samples x1, ....., xn that are drawn
    independently according to p(x), the probability
    that k of the n will fall in R is given by the
    binomial law Pk = (n choose k) Pᵏ (1 − P)ⁿ⁻ᵏ
  • Since the expected value of k is nP, the ratio
    k/n is a good estimate of P, and p(x) ≈ (k/n)/V,
    where V is the volume of R
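The fact above can be sketched directly (a minimal illustration, not from the slides): count the k of n samples that fall in a region of volume V around x and return (k/n)/V.

```python
def density_estimate(samples, x, h):
    """Basic non-parametric estimate p(x) ≈ (k/n)/V: k counts the
    samples falling in a region R around x. Univariate case, so R
    is an interval of length h and V = h."""
    n = len(samples)
    k = sum(1 for s in samples if abs(s - x) <= h / 2)
    return (k / n) / h

# 1000 evenly spaced points on [0, 1): the true density is 1 there,
# and the estimate at x = 0.5 comes out close to 1.
samples = [i / 1000 for i in range(1000)]
est = density_estimate(samples, x=0.5, h=0.2)
```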
13
Non-parametric Methods cont.
14
Non-parametric Methods cont.
  • Problems to be addressed
  • If we fix the volume V and take more samples, the
    ratio k/n will converge as desired, but then we
    only estimate a space-averaged version of p(x)
  • How can we estimate p(x) itself?
  • Should we let V approach zero?

15
Parzen Windows
  • Parzen windows
  • We use a window function for interpolation, each
    sample contributing to the estimate in accordance
    with its distance from x:
    pn(x) = (1/n) Σk (1/Vn) φ((x − xk)/hn)
  • Here hn is the window-width parameter, with
    Vn = hnᵈ in d dimensions
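A minimal univariate sketch of the Parzen-window estimate with a Gaussian window φ (the function name and test data are my own):

```python
import math

def parzen_estimate(x, samples, h):
    """Parzen-window density estimate
    p_n(x) = (1/n) * sum_k (1/V_n) * phi((x - x_k)/h)
    with a Gaussian window phi; univariate case, so V_n = h."""
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    n = len(samples)
    return sum(phi((x - xk) / h) / h for xk in samples) / n

# Samples clustered near 0 yield a high estimate there and a
# negligible estimate far away:
p_near = parzen_estimate(0.0, [0.0, 0.1, -0.1], h=0.5)
p_far = parzen_estimate(5.0, [0.0, 0.1, -0.1], h=0.5)
```

Each sample contributes a Gaussian bump of width h centered on itself; the estimate is the average of the bumps.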

16
Parzen Windows - cont.
  • Choice of hn
  • If hn is too large, the spatial resolution is low
    (the estimate is over-smoothed)
  • If hn is too small, the estimate will have a
    large variance

17
Parzen Windows - cont.
  • Properties
  • Convergence of the mean
  • As n approaches infinity, the expected estimate
    will also approach p(x) if p(x) is continuous
  • A smaller Vn gives a sharper, less biased estimate
  • Convergence of the variance
  • A smaller variance needs a larger Vn, so for
    convergence we need Vn → 0 while nVn → ∞

18
Parzen Windows - cont.
19
Parzen Windows - cont.
20
Parzen Windows - cont.
21
Parzen Windows - cont.
22
Kn-Nearest-Neighbor Estimation
  • Let the cell volume be a function of the training
    data
  • To estimate p(x) from n samples, we can center a
    cell about x and let it grow until it captures kn
    samples; the estimate is then pn(x) = kn/(nVn)
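A minimal univariate sketch of the kn-nearest-neighbor estimate (function name is my own): grow an interval around x until it contains k samples, then divide k/n by the interval's length.

```python
def knn_density(x, samples, k):
    """k_n-nearest-neighbor density estimate: grow a cell around x
    until it captures exactly k samples, then p(x) ≈ k/(n*V).
    Univariate case: the cell is an interval of radius r, V = 2r."""
    n = len(samples)
    dists = sorted(abs(s - x) for s in samples)
    r = dists[k - 1]          # radius of the cell capturing k samples
    V = 2 * r
    return k / (n * V)

# Three samples near 0 and one far away; at x = 0 the cell stays
# small, so the density estimate is high:
p = knn_density(0.0, [0.0, 0.1, 0.2, 1.0], k=2)
```

Unlike Parzen windows, the cell volume here adapts to the data: it is small where samples are dense and large where they are sparse.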

23
Kn-Nearest-Neighbor Estimation cont.
24
Kn-Nearest-Neighbor Estimation cont.
25
Kn-Nearest-Neighbor Estimation cont.
26
The Nearest-Neighbor Rule
  • The nearest-neighbor rule
  • Let Dn = {x1, ...., xn} denote a set of n labeled
    prototypes
  • Let x′ be the prototype in Dn nearest to a test
    point x
  • We classify x into the class associated with x′
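The rule above fits in a few lines (a sketch with my own names and toy data):

```python
def nearest_neighbor_classify(x, prototypes):
    """Nearest-neighbor rule: assign x the label of the closest
    labeled prototype. `prototypes` is a list of (vector, label)."""
    def dist2(a, b):
        # squared Euclidean distance (monotone in distance, so
        # minimizing it finds the same nearest prototype)
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(prototypes, key=lambda p: dist2(x, p[0]))
    return label

protos = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
```

No training is involved beyond storing the prototypes; all the work happens at classification time.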

27
The Nearest-Neighbor Rule cont.