Title: Ch 2. Probability Distributions (Pattern Recognition and Machine Learning, C. M. Bishop, 2006)
1. Ch 2. Probability Distributions, Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
- Summarized by M.H. Kim
- Biointelligence Laboratory, Seoul National University - http://bi.snu.ac.kr/
2. Contents
- 2.3 The Gaussian Distribution
- 2.3.6 Bayesian inference for the Gaussian
- 2.3.7 Student's t-distribution
- 2.3.8 Periodic variables
- 2.3.9 Mixtures of Gaussians
- 2.4 The Exponential Family
- 2.4.1 Maximum likelihood and sufficient statistics
- 2.4.2 Conjugate priors
- 2.4.3 Noninformative priors
- 2.5 Nonparametric Methods
- 2.5.1 Kernel density estimators
- 2.5.2 Nearest-neighbour methods
3. 2.3.6 Bayesian inference for the Gaussian
- Bayesian inference
- Suppose that the variance σ² is known and we consider the task of inferring the mean µ given a set of N observations X = {x_1, ..., x_N}
- Likelihood function
- The likelihood p(X|µ) takes the form of the exponential of a quadratic form in µ. Thus if we choose a prior p(µ) given by a Gaussian, it will be a conjugate distribution for this likelihood function.
4. 2.3.6 Bayesian inference for the Gaussian
- Prior distribution
- Posterior distribution
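The equations on this slide were lost in extraction; they can be reconstructed from Bishop's treatment of the conjugate Gaussian prior (Eqs. 2.137-2.142):

```latex
% Likelihood (known variance \sigma^2), Gaussian prior, Gaussian posterior
p(\mathbf{X}\mid\mu) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2),
\qquad
p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)

p(\mu\mid\mathbf{X}) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2), \quad
\mu_N = \frac{\sigma^2}{N\sigma_0^2+\sigma^2}\,\mu_0
      + \frac{N\sigma_0^2}{N\sigma_0^2+\sigma^2}\,\mu_{\mathrm{ML}}, \quad
\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}
```

where µ_ML = (1/N) Σ x_n is the sample mean.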
5. 2.3.6 Bayesian inference for the Gaussian
- Likelihood function
- Prior distribution: gamma distribution
- Posterior distribution
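For a known mean µ and unknown precision τ, the equations behind this slide (Bishop Eqs. 2.145-2.151) are:

```latex
p(\mathbf{X}\mid\tau) \propto \tau^{N/2}
\exp\Big\{-\frac{\tau}{2}\sum_{n=1}^{N}(x_n-\mu)^2\Big\},
\qquad
\mathrm{Gam}(\tau\mid a_0,b_0) = \frac{1}{\Gamma(a_0)}\, b_0^{a_0}\, \tau^{a_0-1} e^{-b_0\tau}

p(\tau\mid\mathbf{X}) = \mathrm{Gam}(\tau\mid a_N, b_N), \quad
a_N = a_0 + \frac{N}{2}, \quad
b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^2
```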
6. 2.3.6 Bayesian inference for the Gaussian
- Likelihood function
- Prior distribution: normal-gamma or Gaussian-gamma distribution
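When both the mean and the precision are unknown, the conjugate normal-gamma (Gaussian-gamma) prior is (Bishop Eq. 2.154):

```latex
p(\mu,\tau) = \mathcal{N}\big(\mu \mid \mu_0, (\beta\tau)^{-1}\big)\,
\mathrm{Gam}(\tau\mid a, b)
```

Note that the precision of µ is a linear function of τ, which is what makes the joint prior conjugate.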
7. 2.3.7 Student's t-distribution
- If we have a univariate Gaussian N(x|µ, τ⁻¹) together with a Gamma prior Gam(τ|a, b) and we integrate out the precision τ, we obtain the marginal distribution of x: Student's t-distribution
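The marginalization described above gives (Bishop Eqs. 2.158-2.159):

```latex
p(x\mid\mu,a,b) = \int_0^{\infty} \mathcal{N}(x\mid\mu,\tau^{-1})\,
\mathrm{Gam}(\tau\mid a,b)\, d\tau
= \mathrm{St}(x\mid\mu,\lambda,\nu)

\mathrm{St}(x\mid\mu,\lambda,\nu) =
\frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)}
\left(\frac{\lambda}{\pi\nu}\right)^{1/2}
\left[1+\frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2},
\qquad \nu = 2a,\ \lambda = a/b
```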
8. 2.3.7 Student's t-distribution
- Gaussian vs t-distribution
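A quick numerical sketch of the Gaussian-Gamma mixture view of the t-distribution (the hyperparameter values here are illustrative, not from the slides): sampling a precision from the Gamma prior and then a Gaussian given that precision reproduces samples drawn directly from the equivalent Student's t.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, mu = 3.0, 3.0, 0.0     # illustrative Gamma shape a, rate b; Gaussian mean mu
n = 200_000

# Hierarchical sampling: tau ~ Gam(a, b), then x ~ N(mu, tau^-1)
tau = rng.gamma(shape=a, scale=1.0 / b, size=n)
x = rng.normal(loc=mu, scale=1.0 / np.sqrt(tau))

# Direct sampling from the equivalent Student's t: nu = 2a, lambda = a/b
nu = 2.0 * a
t = mu + np.sqrt(b / a) * rng.standard_t(df=nu, size=n)

# Both sample variances should be close to (1/lambda) * nu/(nu-2) = 1.5
print(np.var(x), np.var(t))
```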
9. 2.3.7 Student's t-distribution
- Multivariate t-distribution
- where D is the dimensionality of x
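The multivariate form referenced here is (Bishop Eq. 2.162):

```latex
\mathrm{St}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Lambda},\nu)
= \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)}
\frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}}
\left[1+\frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2},
\qquad
\Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^{\mathrm T}
\boldsymbol{\Lambda}(\mathbf{x}-\boldsymbol{\mu})
```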
10. 2.3.8 Periodic variables
11. 2.3.8 Periodic variables
- The simplest approach is to use a histogram of observations in which the angular coordinate is divided into fixed bins.
- Another approach starts, like the von Mises distribution, from a Gaussian distribution over a Euclidean space, but now marginalizes onto the unit circle rather than conditioning.
- However, this leads to more complex forms of distribution.
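The von Mises distribution mentioned above has density (Bishop Eq. 2.179):

```latex
p(\theta\mid\theta_0, m) = \frac{1}{2\pi I_0(m)}
\exp\{m \cos(\theta-\theta_0)\}
```

where θ₀ is the mean, m is the concentration parameter (analogous to the inverse variance), and I₀(m) is the zeroth-order modified Bessel function of the first kind, which normalizes the distribution.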
12. 2.3.9 Mixtures of Gaussians
- Example of a Gaussian mixture distribution
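A Gaussian mixture has the form p(x) = Σ_k π_k N(x|µ_k, σ_k²) with mixing coefficients π_k ≥ 0 and Σ_k π_k = 1. A minimal numerical sketch with illustrative parameters (not Bishop's figure values):

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Illustrative 3-component mixture (weights sum to 1)
pis = [0.5, 0.3, 0.2]
mus = [-2.0, 0.0, 3.0]
vrs = [0.5, 1.0, 2.0]

x = np.linspace(-12.0, 14.0, 20001)
p = sum(pk * gauss(x, mk, vk) for pk, mk, vk in zip(pis, mus, vrs))

dx = x[1] - x[0]
print(p.sum() * dx)   # total probability mass, should be close to 1
```

Because each component is normalized and the weights sum to one, the mixture itself integrates to one, which the Riemann sum above confirms numerically.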
13. 2.4 The Exponential Family
- The probability distributions that we have studied so far in this chapter are specific examples of a broad class of distributions called the exponential family.
- The exponential family of distributions over x, given parameters η, is defined to be the set of distributions of the form
- p(x|η) = h(x) g(η) exp{η^T u(x)}   (2.194)
- where x may be scalar or vector, and may be discrete or continuous.
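For instance, the Bernoulli distribution can be cast in this form (Bishop Eqs. 2.197-2.200):

```latex
\mathrm{Bern}(x\mid\mu) = \mu^{x}(1-\mu)^{1-x}
= (1-\mu)\exp\Big\{x\,\ln\frac{\mu}{1-\mu}\Big\}
```

so that η = ln(µ/(1−µ)), u(x) = x, h(x) = 1, and g(η) = σ(−η), where σ(η) = 1/(1+e^{−η}) is the logistic sigmoid.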
14. 2.4 The Exponential Family
- Here η are called the natural parameters of the distribution, and u(x) is some function of x.
- The function g(η) can be interpreted as the coefficient that ensures that the distribution is normalized and therefore satisfies
- g(η) ∫ h(x) exp{η^T u(x)} dx = 1   (2.195)
- where the integration is replaced by summation if x is a discrete variable.
15. 2.4 The Exponential Family
- Gaussian distribution
- p(x|µ, σ²)
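Writing the Gaussian in the standard exponential-family form of Eq. (2.194) gives (Bishop Eqs. 2.220-2.223):

```latex
p(x\mid\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}
\exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\},
\qquad
\boldsymbol{\eta} = \begin{pmatrix}\mu/\sigma^2\\ -1/(2\sigma^2)\end{pmatrix},
\quad
\mathbf{u}(x) = \begin{pmatrix}x\\ x^2\end{pmatrix}

h(x) = (2\pi)^{-1/2},
\qquad
g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2}
\exp\!\left(\frac{\eta_1^2}{4\eta_2}\right)
```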
16. 2.4.1 Maximum likelihood and sufficient statistics
- Problem of estimating the parameter vector η in the general exponential family distribution
- The likelihood function
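Setting the gradient of the log-likelihood to zero yields the maximum likelihood condition (Bishop Eqs. 2.226-2.228):

```latex
\ln p(\mathbf{X}\mid\boldsymbol{\eta}) =
\sum_{n=1}^{N}\ln h(x_n) + N\ln g(\boldsymbol{\eta})
+ \boldsymbol{\eta}^{\mathrm T}\sum_{n=1}^{N}\mathbf{u}(x_n)

-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}})
= \frac{1}{N}\sum_{n=1}^{N}\mathbf{u}(x_n)
```

so the ML solution depends on the data only through Σ_n u(x_n), which is therefore the sufficient statistic of the distribution.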
17. 2.4.2 Conjugate priors
- Conjugate prior
- Posterior distribution
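For the general exponential family the conjugate prior and resulting posterior are (Bishop Eqs. 2.229-2.230):

```latex
p(\boldsymbol{\eta}\mid\boldsymbol{\chi},\nu) =
f(\boldsymbol{\chi},\nu)\, g(\boldsymbol{\eta})^{\nu}
\exp\{\nu\,\boldsymbol{\eta}^{\mathrm T}\boldsymbol{\chi}\}

p(\boldsymbol{\eta}\mid\mathbf{X},\boldsymbol{\chi},\nu) \propto
g(\boldsymbol{\eta})^{\nu+N}
\exp\Big\{\boldsymbol{\eta}^{\mathrm T}
\Big(\sum_{n=1}^{N}\mathbf{u}(x_n) + \nu\boldsymbol{\chi}\Big)\Big\}
```

The posterior has the same functional form as the prior, with ν interpretable as an effective number of pseudo-observations having sufficient statistic χ.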
18. 2.4.3 Noninformative priors
- Noninformative prior
- It is intended to have as little influence on the posterior distribution as possible.
- Example: if a density takes the form p(x|µ) = f(x − µ), then the parameter µ is known as a location parameter
- As we have seen, the conjugate prior distribution for µ in this case is a Gaussian p(µ|µ₀, σ₀²) = N(µ|µ₀, σ₀²), and we obtain a noninformative prior by taking the limit σ₀² → ∞
19. 2.5 Nonparametric Methods
- Histogram methods for density estimation
- To estimate the probability density at a particular location, we should consider the data points that lie within some local neighbourhood of that point
- The value of the smoothing parameter should be neither too large nor too small
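A minimal sketch of the histogram estimator described above, using synthetic standard-normal data (the sample and bin width are illustrative). Each bin's density is p_i = n_i / (N Δ_i), where n_i is the count and Δ_i the bin width:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=5000)   # illustrative 1-D sample

bins = np.linspace(-4.0, 4.0, 41)        # fixed bins of width Delta = 0.2
counts, edges = np.histogram(data, bins=bins)
delta = edges[1] - edges[0]

# Histogram density estimate: p_i = n_i / (N * Delta_i)
density = counts / (data.size * delta)
print(density.sum() * delta)   # fraction of the sample covered by the bins
```

Shrinking Δ gives a spikier estimate; enlarging it over-smooths, which is the smoothing-parameter trade-off noted on the slide.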
20. 2.5.1 Kernel density estimators
- Probability mass associated with this region
- In order to count K, set the kernel function
21. 2.5.1 Kernel density estimators
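With a smooth Gaussian kernel in place of the hard counting window, the estimator becomes a Parzen window: p(x) = (1/N) Σ_n N(x | x_n, h²). A minimal sketch with an illustrative sample and bandwidth:

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel (Parzen window) density estimate at points x, bandwidth h."""
    d = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=2000)   # illustrative sample

x = np.linspace(-5.0, 5.0, 1001)
p = kde(x, data, h=0.3)
dx = x[1] - x[0]
print(p.sum() * dx)   # should be close to 1
```

Here h plays the role of the smoothing parameter: each data point contributes one Gaussian bump, and the bumps are averaged.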
22. 2.5.2 Nearest-neighbour methods
- Fix K and find an appropriate value for V from the data
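In the k-NN approach the density estimate is p(x) = K / (N V), where V is the volume of the smallest region around x containing K data points; in one dimension V = 2 r_K with r_K the distance to the K-th nearest neighbour. A minimal sketch (sample and K are illustrative):

```python
import numpy as np

def knn_density(x, data, K):
    """1-D k-NN density estimate: p(x) = K / (N * V), with V = 2 * r_K,
    where r_K is the distance from x to the K-th nearest data point."""
    r = np.sort(np.abs(data[None, :] - x[:, None]), axis=1)[:, K - 1]
    return K / (len(data) * 2.0 * r)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=5000)   # illustrative sample

p = knn_density(np.array([0.0]), data, K=100)
print(p[0])   # true N(0,1) density at 0 is 1/sqrt(2*pi) ~ 0.399
```

Note that, unlike the kernel estimator, the k-NN estimate is not a proper density (its integral diverges), a caveat Bishop also makes.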