Title: Outline
1. Outline
- Parameter estimation continued
- Non-parametric methods
2. Maximum-Likelihood Estimation
- Assumptions
- We separate a collection of samples according to class: D1, D2, ..., Dc
- Samples in Dj are drawn independently according to the probability p(x|ωj)
- We assume that p(x|ωj) has a known parametric form and is uniquely determined by the value of a parameter vector θj
- To simplify further, we assume that samples in Di give no information about θj if i ≠ j
3. Maximum-Likelihood Estimation cont.
- Suppose that D contains n samples x1, ..., xn
- Because the samples were drawn independently, we have p(D|θ) = Πk=1..n p(xk|θ)
- The maximum-likelihood estimate of θ is the value of θ that maximizes p(D|θ)
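For example, when p(x|θ) is a univariate Gaussian with θ = (μ, σ²), maximizing p(D|θ) has a closed-form solution: the sample mean and the (biased) sample variance. A minimal sketch in Python/NumPy, with made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(loc=2.0, scale=1.5, size=1000)   # hypothetical samples

    # ML estimates: the theta = (mu, sigma^2) maximizing p(D|theta)
    mu_hat = D.mean()                               # sample mean
    sigma2_hat = ((D - mu_hat) ** 2).mean()         # biased sample variance

    print(mu_hat, sigma2_hat)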
4. Bayesian Estimation
- Assumptions
- The form of the density p(x|θ) is assumed to be known, but the value of the parameter vector θ is not known exactly
- Our initial knowledge about θ is assumed to be contained in a known prior density p(θ)
- The rest of our knowledge about θ is contained in a set D of n samples x1, ..., xn drawn independently according to the unknown probability density p(x)
5. Bayesian Estimation cont.
- General theory
- The basic problem is to compute the posterior density p(θ|D)
- By Bayes' formula we have p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ
- By the independence assumption, p(D|θ) = Πk=1..n p(xk|θ)
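When p(θ|D) has no convenient closed form, Bayes' formula can be applied numerically on a grid of θ values. A sketch for a one-dimensional θ, taken here (as an assumption) to be the mean of a unit-variance Gaussian, with an N(0, 1) prior:

    import numpy as np

    rng = np.random.default_rng(1)
    D = rng.normal(loc=1.0, scale=1.0, size=20)      # hypothetical samples
    theta = np.linspace(-4.0, 4.0, 2001)             # grid of candidate means

    prior = np.exp(-0.5 * theta**2)                  # p(theta): N(0, 1), unnormalized
    # p(D|theta) = prod_k p(x_k|theta); accumulate in log space for stability
    loglik = -0.5 * ((D[:, None] - theta[None, :]) ** 2).sum(axis=0)

    post = prior * np.exp(loglik - loglik.max())     # p(theta|D) up to a constant
    post /= np.trapz(post, theta)                    # normalize to integrate to 1

    print(theta[np.argmax(post)])                    # posterior mode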
6. Bayesian Estimation cont.
- Gaussian case
- The univariate case: p(μ|D)
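For p(x|μ) ~ N(μ, σ²) with σ² known and a Gaussian prior p(μ) ~ N(μ0, σ0²), the posterior p(μ|D) is again Gaussian, N(μn, σn²). A sketch computing the posterior parameters (the prior and the data here are illustrative assumptions):

    import numpy as np

    def gaussian_mean_posterior(D, sigma2, mu0, sigma0_2):
        """p(mu|D) = N(mu_n, sigma_n^2) for Gaussian data with known variance."""
        n = len(D)
        m_n = np.mean(D)                                               # sample mean
        mu_n = (n * sigma0_2 * m_n + sigma2 * mu0) / (n * sigma0_2 + sigma2)
        sigma_n2 = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
        return mu_n, sigma_n2

    D = np.random.default_rng(2).normal(1.0, 1.0, size=50)             # hypothetical data
    print(gaussian_mean_posterior(D, sigma2=1.0, mu0=0.0, sigma0_2=1.0))

Note that as n grows, μn approaches the sample mean and σn² goes to zero, so the Bayesian and maximum-likelihood answers agree in the limit.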
7. Bayesian Estimation cont.
- Gaussian case continued
- The univariate case: p(x|D)
- The multivariate case
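Integrating out the unknown mean gives the class-conditional density p(x|D) = ∫ p(x|μ) p(μ|D) dμ, which in the univariate case is again Gaussian, N(μn, σ² + σn²); the remaining uncertainty about μ simply adds to the data variance. A small sketch evaluating it (the parameter values are illustrative; μn and σn² would come from the posterior computation above):

    import numpy as np

    def predictive_pdf(x, mu_n, sigma2, sigma_n2):
        """p(x|D) = N(mu_n, sigma^2 + sigma_n^2)."""
        var = sigma2 + sigma_n2                       # posterior uncertainty adds on
        return np.exp(-0.5 * (x - mu_n) ** 2 / var) / np.sqrt(2 * np.pi * var)

    print(predictive_pdf(0.5, mu_n=0.9, sigma2=1.0, sigma_n2=0.02))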
8. Non-parametric Methods
- In maximum-likelihood and Bayesian estimation
- The forms of the probability densities are assumed to be known
- However, the assumed forms rarely fit the densities encountered in practice
- In particular, all of the classical parametric densities are unimodal, whereas many practical problems involve multimodal densities
9. A Multimodal Density
10. Solutions
- More complicated parametric models
- Mixture of Gaussians (see the sketch after this list)
- More generally, a set of basis functions to describe a probability density
- Learning is intrinsically more difficult when we have more parameters
- Non-parametric methods
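A mixture of Gaussians represents a multimodal density as a weighted sum of unimodal components. A minimal sketch (the weights and component parameters below are illustrative assumptions; in practice they would be learned, e.g. with EM):

    import numpy as np

    def gauss(x, mu, sigma2):
        return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

    def mixture_pdf(x, weights, mus, sigma2s):
        """p(x) = sum_j w_j N(x; mu_j, sigma_j^2), with the weights summing to 1."""
        return sum(w * gauss(x, m, s2) for w, m, s2 in zip(weights, mus, sigma2s))

    x = np.linspace(-5.0, 8.0, 400)
    p = mixture_pdf(x, weights=[0.4, 0.6], mus=[-1.0, 3.0], sigma2s=[0.5, 1.2])  # two modes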
11. Non-parametric Methods
- Most of the non-parametric density estimation methods are based on the following fact
- The probability P that a vector x will fall in a region R is given by P = ∫R p(x') dx'
12. Non-parametric Methods cont.
- For n samples x1, ..., xn that are drawn independently according to p(x), the probability that exactly k of the n fall in R is given by the binomial law Pk = C(n, k) P^k (1 − P)^(n−k)
- Since E[k] = nP, the ratio k/n estimates P; if R is small enough that p(x) is nearly constant over it, P ≅ p(x) V, so p(x) ≅ (k/n) / V, where V is the volume of R
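A sketch of the basic estimate p(x) ≅ (k/n)/V in one dimension, counting the samples that fall in a small interval centered at x (the data and the interval width are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    D = rng.normal(0.0, 1.0, size=10000)        # hypothetical samples from p(x)

    def density_estimate(x, D, h):
        """p(x) ~= (k/n)/V with R = [x - h/2, x + h/2], so V = h."""
        n = len(D)
        k = np.sum(np.abs(D - x) <= h / 2)      # number of samples falling in R
        return (k / n) / h

    print(density_estimate(0.0, D, h=0.2))      # ~0.40 for a standard normal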
13. Non-parametric Methods cont.
14. Non-parametric Methods cont.
- Problems to be addressed
- If we fix the volume V and take more samples, the ratio k/n will converge as desired, but it gives only a space-averaged version of p(x)
- How to estimate p(x) itself?
- Let V approach zero? With a finite number of samples, a vanishing region will eventually contain no samples at all (estimate 0) or only a few (estimate diverges), so V must instead shrink at a controlled rate as n grows
15. Parzen Windows
- Parzen windows
- We use a window function φ for interpolation, each sample contributing to the estimate in accordance with its distance from x: pn(x) = (1/n) Σk=1..n (1/Vn) φ((x − xk)/hn), with Vn = hn^d
- Here hn, the window width, is a parameter (see the sketch after this list)
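A sketch of the Parzen-window estimate in one dimension with a Gaussian window φ (the data and the choice of hn are illustrative assumptions):

    import numpy as np

    def parzen_estimate(x, D, h):
        """pn(x) = (1/n) sum_k (1/h) phi((x - x_k)/h) with a Gaussian phi, d = 1."""
        u = (x - D[:, None]) / h                          # scaled distances to each sample
        phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # window function
        return phi.mean(axis=0) / h

    rng = np.random.default_rng(4)
    D = rng.normal(0.0, 1.0, size=500)                    # hypothetical samples
    xs = np.linspace(-4.0, 4.0, 200)
    p_hat = parzen_estimate(xs, D, h=0.3)                 # try h = 0.05 or 2.0 to see the effect

Rerunning the sketch with a much smaller or larger h illustrates the trade-off discussed on the next slide.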
16. Parzen Windows cont.
- Choice of hn
- If hn is too large, the estimate is over-smoothed and the spatial resolution is low
- If hn is too small, the estimate will have a large variance
17. Parzen Windows cont.
- Properties
- Convergence of the mean
- As n approaches infinity, the mean of the estimate approaches p(x) wherever p(x) is continuous, provided Vn → 0; for this bias, a smaller Vn is better
- Convergence of the variance
- A small variance needs a large Vn (more precisely, nVn → ∞); both requirements can be met by letting Vn shrink slowly with n, e.g. Vn = V1/√n
18. Parzen Windows cont.
19. Parzen Windows cont.
20. Parzen Windows cont.
21. Parzen Windows cont.
22. kn-Nearest-Neighbor Estimation
- Let the cell volume be a function of the training data
- To estimate p(x) from n samples, we can center a cell about x and let it grow until it captures kn samples; the estimate is then pn(x) = (kn/n)/Vn, where Vn is the volume of the resulting cell (a common choice is kn = √n)
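A sketch of the kn-nearest-neighbor estimate in one dimension, using kn = √n as the growth schedule (an assumption, as are the data):

    import numpy as np

    def knn_density(x, D, k):
        """pn(x) = (k/n)/Vn, where Vn is the smallest interval centered at x
        that captures the k nearest samples."""
        n = len(D)
        r = np.sort(np.abs(D - x))[k - 1]   # distance to the k-th nearest sample
        V = 2.0 * r                         # interval [x - r, x + r], d = 1
        return (k / n) / V

    rng = np.random.default_rng(5)
    D = rng.normal(0.0, 1.0, size=1000)     # hypothetical samples
    k = int(np.sqrt(len(D)))                # kn = sqrt(n)
    print(knn_density(0.0, D, k))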
23. kn-Nearest-Neighbor Estimation cont.
24. kn-Nearest-Neighbor Estimation cont.
25. kn-Nearest-Neighbor Estimation cont.
26. The Nearest-Neighbor Rule
- The nearest-neighbor rule
- Let Dn = {x1, ..., xn} denote a set of n labeled prototypes
- Let x' be the prototype nearest to a test point x
- We classify x into the class associated with x'
27. The Nearest-Neighbor Rule cont.