1
Pattern Classification
All materials in these slides were taken from
Pattern Classification (2nd ed.) by R. O. Duda,
P. E. Hart and D. G. Stork, John Wiley & Sons,
2000, with the permission of the authors and the
publisher
2
Chapter 4 (Part 1): Non-Parametric Classification
(Sections 4.1-4.3)
  • Introduction
  • Density Estimation
  • Parzen Windows

3
4.1 Introduction
  • All parametric densities are unimodal (have a
    single local maximum), whereas many practical
    problems involve multi-modal densities
  • Nonparametric procedures can be used with
    arbitrary distributions and without the
    assumption that the forms of the underlying
    densities are known
  • There are two types of nonparametric methods:
  • Estimating the class-conditional densities
    p(x | ωj)
  • Bypassing density estimation and estimating the
    a posteriori probabilities P(ωj | x) directly

4
4.2 Density Estimation
  • Basic idea: the most fundamental techniques rely
    on the fact that the probability P that a vector
    x will fall in a region R is
    $P = \int_R p(x')\,dx'$   (1)
  • P is a smoothed (or averaged) version of the
    density function p(x). Clearly, the probability
    that exactly k of the n samples fall in R is
    given by the binomial law
    $P_k = \binom{n}{k} P^k (1-P)^{n-k}$   (2)
  • and the expected value for k is
    $E(k) = nP$   (3)
  • Therefore, the ratio k/n is a good estimate for
    the probability P and hence for the density
    function p(x).
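A minimal sketch of this idea, assuming NumPy; the standard-normal density and the region R = [-1, 1] are invented for illustration. The fraction k/n of samples falling in R approaches the probability mass P of equation (1):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)             # samples from p(x) = N(0, 1)
    k = np.count_nonzero(np.abs(x) <= 1.0) # k of n samples fall in R = [-1, 1]
    print(f"n = {n:>9}: k/n = {k / n:.4f}")  # approaches P ~ 0.6827
```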

5
Density Estimation (cont.)
  • If we assume that p(x) is continuous and that the
    region R is so small that p does not vary
    significantly within R, we can write
    $P = \int_R p(x')\,dx' \simeq p(x)\,V$   (4)
  • where x is a point within R and V is the volume
    enclosed by R.

6
Since $p(x') \simeq p(x)$ = constant within R, it is
not a part of the integral:
  • $P = \int_R p(x')\,dx' \simeq p(x) \int_R dx' = p(x)\,V(R)$
  • where V(R) is a surface (an area) in the
    Euclidean space R²
  • a volume in the Euclidean space R³
  • a hypervolume in the Euclidean space Rⁿ
  • Since $P \simeq k/n$ and $p(x') \simeq p(x)$ =
    constant, we obtain, for instance in the
    Euclidean space R³:
    $p(x) \simeq \frac{k/n}{V}$

7
(Figure-only slide)
8
  • If we fix the volume V and take more and more
    training samples, the ratio k/n will converge,
    but then we have only obtained an estimate of the
    space-averaged value of p(x)
  • If we fix the number of samples and let V
    approach zero, the region will eventually become
    so small that it encloses no samples, and our
    estimate $p(x) \simeq 0$ will be useless.

9
  • Practically, V cannot be allowed to become
    arbitrarily small, since the number of samples is
    always limited
  • If we still want to use this estimation, we will
    have to accept a certain amount of variance in
    the ratio k/n and a certain amount of averaging
    of the density p(x)
  • Theoretically, if an unlimited number of samples
    is available, we can estimate the density at x by
    forming a sequence of regions
  • R1, R2, ... containing x: the first region
    contains one sample, the second two samples, and
    so on.
  • Let Vn be the volume of Rn, kn the number of
    samples falling in Rn, and pn(x) the nth estimate
    of p(x):
  • $p_n(x) = \frac{k_n/n}{V_n}$   (7)
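A minimal sketch of equation (7), assuming NumPy; the 1-D standard-normal data, the point x = 0, and the interval R_n of length V_n = 0.2 are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x, V_n = 0.0, 0.2                      # region R_n = [x - 0.1, x + 0.1]
for n in (100, 10_000, 1_000_000):
    samples = rng.standard_normal(n)
    k_n = np.count_nonzero(np.abs(samples - x) <= V_n / 2)
    p_n = (k_n / n) / V_n              # equation (7): p_n(x) = (k_n/n) / V_n
    print(f"n = {n:>9}: p_n(0) = {p_n:.4f}")  # true value p(0) ~ 0.3989
```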

10
  • Three necessary conditions should apply if we
    want pn(x) to converge to p(x):
    $\lim_{n\to\infty} V_n = 0$,
    $\lim_{n\to\infty} k_n = \infty$,
    $\lim_{n\to\infty} k_n/n = 0$
  • There are two different ways of obtaining
    sequences of regions that satisfy these
    conditions (see the sketch after this list):
  • (a) Shrink an initial region, specifying the
    volume as a function of n such as $V_n = 1/\sqrt{n}$,
    and show that pn(x) converges to p(x).
    This is called the Parzen-window estimation
    method
  • (b) Specify kn as some function of n, such as
    $k_n = \sqrt{n}$; the volume Vn is grown until it
    encloses kn neighbors of x. This is called the
    kn-nearest-neighbor estimation method
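A minimal sketch contrasting the two schedules, assuming NumPy; the 1-D standard-normal data and the point x = 0 are invented for illustration. Branch (a) fixes $V_n = 1/\sqrt{n}$ and counts $k_n$; branch (b) fixes $k_n = \sqrt{n}$ and grows the interval to the $k_n$-th nearest sample:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 0.0                                   # point where p(x) is estimated
for n in (100, 10_000, 1_000_000):
    samples = rng.standard_normal(n)
    d = np.abs(samples - x)               # distances from x

    V_parzen = 1 / np.sqrt(n)             # (a) Parzen: V_n = 1/sqrt(n)
    k_parzen = np.count_nonzero(d <= V_parzen / 2)

    k_nn = int(np.sqrt(n))                # (b) kNN: k_n = sqrt(n)
    V_knn = 2 * np.sort(d)[k_nn - 1]      # interval reaching the k_n-th neighbor

    print(f"n = {n:>9}: Parzen {k_parzen / n / V_parzen:.4f}, "
          f"kNN {k_nn / n / V_knn:.4f}")  # both approach p(0) ~ 0.3989
```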

11
(Figure-only slide)
12
4.3 Parzen Windows
  • The Parzen-window approach to estimating
    densities assumes that the region Rn is a
    d-dimensional hypercube with edge length hn, so
    that its volume is $V_n = h_n^d$
  • Define the window function
    $\varphi(u) = 1$ if $|u_j| \le 1/2$, $j = 1, \dots, d$,
    and $\varphi(u) = 0$ otherwise
  • $\varphi((x - x_i)/h_n)$ is then equal to 1 if xi
    falls within the hypercube of volume Vn centered
    at x, and equal to zero otherwise.

13
  • The number of samples in this hypercube is
    $k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$
  • By substituting kn in equation (7), we obtain the
    following estimate:
    $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$
  • This equation suggests a more general approach to
    estimating density functions: pn(x) estimates
    p(x) as an average of window functions of x and
    the samples xi (i = 1, ..., n), where each sample
    contributes to the estimate in accordance with
    its distance from x.
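A minimal sketch of the hypercube estimate above, assuming NumPy; phi_hypercube and parzen_estimate are illustrative names, and the 2-D standard-normal data are invented:

```python
import numpy as np

def phi_hypercube(u):
    """Unit hypercube window: 1 inside the cube of edge 1 centered at 0."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def parzen_estimate(x, samples, h_n):
    """p_n(x) = (1/n) sum_i (1/V_n) phi((x - x_i)/h_n), with V_n = h_n**d."""
    n, d = samples.shape
    V_n = h_n ** d
    return phi_hypercube((x - samples) / h_n).sum() / (n * V_n)

rng = np.random.default_rng(3)
samples = rng.standard_normal((10_000, 2))         # n = 10000, d = 2
print(parzen_estimate(np.zeros(2), samples, 0.5))  # true p(0,0) ~ 0.1592
```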

14
  • Illustration: the behavior of the Parzen-window
    method
  • Case where p(x) ~ N(0,1) (p(x) is a zero-mean,
    unit-variance, univariate normal density)
  • Let $\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$
    and $h_n = h_1/\sqrt{n}$ (n ≥ 1),
    where h1 is a known parameter
  • Thus
    $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$
    is an average of normal densities centered at the
    samples xi
  • These results depend both on n and h1.
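A minimal sketch of this illustration, assuming NumPy; the Gaussian window with $h_n = h_1/\sqrt{n}$, the n = 10 samples, and the evaluation point x = 0 are invented to show the dependence on h1:

```python
import numpy as np

def parzen_gaussian(x, samples, h1):
    n = len(samples)
    h_n = h1 / np.sqrt(n)                         # h_n = h1 / sqrt(n)
    u = (x - samples) / h_n
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return phi.sum() / (n * h_n)                  # (1/n) sum (1/h_n) phi(.)

rng = np.random.default_rng(4)
samples = rng.standard_normal(10)                 # n = 10 samples of N(0,1)
for h1 in (1.0, 0.5, 0.1):                        # results depend on n and h1
    print(f"h1 = {h1}: p_n(0) = {parzen_gaussian(0.0, samples, h1):.4f}")
```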

15
  • Numerical results:
  • For n = 1 and h1 = 1, p1(x) is a single Gaussian
    centered on the first sample: $p_1(x) = \varphi(x - x_1)$
  • For n = 10 and h1 = 0.1, the contributions of the
    individual samples are clearly observable!

16
(Figure-only slide)
17
(Figure-only slide)
18
  • Analogous results are also obtained in two
    dimensions as illustrated

19
(Figure-only slide)
20
  • Case where p(x) is a mixture of a uniform and a
    triangle density

21
(Figure-only slide)
22
  • Classification example
  • In classifiers based on Parzen-window estimation:
  • We estimate the densities for each category and
    classify a test point by the label corresponding
    to the maximum posterior
  • The decision region for a Parzen-window
    classifier also depends upon the choice of window
    function, as illustrated in the following figure.
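A minimal sketch of such a classifier, assuming NumPy, equal priors (so the maximum posterior is the maximum estimated density), a Gaussian window, and invented 1-D class data; all names are illustrative:

```python
import numpy as np

def parzen_density(x, samples, h):
    u = (x - samples) / h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return phi.sum() / (len(samples) * h)

rng = np.random.default_rng(5)
train = {0: rng.normal(-2.0, 1.0, 200),      # category 0 training samples
         1: rng.normal(+2.0, 1.0, 200)}      # category 1 training samples

def classify(x, h=0.5):
    # equal priors assumed: pick the category with the largest density
    return max(train, key=lambda c: parzen_density(x, train[c], h))

print([classify(x) for x in (-3.0, -0.5, 0.5, 3.0)])  # expected: [0, 0, 1, 1]
```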

23
(Figure-only slide)
24
Conclusion
  • Power and some limitations of nonparametric
    methods:
  • Power: generality. We did not need to make any
    assumptions about the density ahead of time; the
    same procedure was used for the unimodal normal
    case and the bimodal mixture case.
  • Limitations:
  • The number of samples needed may be very large
    indeed.
  • The demand for a large number of samples grows
    exponentially with the dimensionality of the
    feature space.