Title: Ch 2. Probability Distributions (Pattern Recognition and Machine Learning, C. M. Bishop, 2006)
1. Ch 2. Probability Distributions, Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
- Summarized by M.H. Kim
- Biointelligence Laboratory, Seoul National University - http://bi.snu.ac.kr/
2. Contents
- 2.3 The Gaussian Distribution
- 2.3.6 Bayesian inference for the Gaussian
- 2.3.7 Student's t-distribution
- 2.3.8 Periodic variables
- 2.3.9 Mixtures of Gaussians
- 2.4 The Exponential Family
- 2.4.1 Maximum likelihood and sufficient statistics
- 2.4.2 Conjugate priors
- 2.4.3 Noninformative priors
- 2.5 Nonparametric Methods
- 2.5.1 Kernel density estimators
- 2.5.2 Nearest-neighbour methods
3. 2.3.6 Bayesian inference for the Gaussian
- Bayesian inference
- Suppose that the variance σ² is known and we consider the task of inferring the mean µ given a set of N observations X = {x_1, ..., x_N}
- Likelihood function
- The likelihood p(X|µ) takes the form of the exponential of a quadratic form in µ. Thus if we choose a prior p(µ) given by a Gaussian, it will be a conjugate distribution for this likelihood function.
4. 2.3.6 Bayesian inference for the Gaussian
- Prior distribution
- Posterior distribution
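The equations on this slide were lost in extraction; they can be reconstructed from Bishop's treatment of the conjugate Gaussian prior (Eqs. 2.137-2.142):

```latex
% Likelihood (known variance \sigma^2), Gaussian prior, Gaussian posterior
p(\mathbf{X}\mid\mu) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2),
\qquad
p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)

p(\mu\mid\mathbf{X}) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2), \quad
\mu_N = \frac{\sigma^2}{N\sigma_0^2+\sigma^2}\,\mu_0
      + \frac{N\sigma_0^2}{N\sigma_0^2+\sigma^2}\,\mu_{\mathrm{ML}}, \quad
\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}
```

where µ_ML = (1/N) Σ x_n is the sample mean.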
5. 2.3.6 Bayesian inference for the Gaussian
- Likelihood function
- Prior distribution: gamma distribution
- Posterior distribution
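For a known mean µ and unknown precision τ, the equations behind this slide (Bishop Eqs. 2.145-2.151) are:

```latex
p(\mathbf{X}\mid\tau) \propto \tau^{N/2}
\exp\Big\{-\frac{\tau}{2}\sum_{n=1}^{N}(x_n-\mu)^2\Big\},
\qquad
\mathrm{Gam}(\tau\mid a_0,b_0) = \frac{1}{\Gamma(a_0)}\, b_0^{a_0}\, \tau^{a_0-1} e^{-b_0\tau}

p(\tau\mid\mathbf{X}) = \mathrm{Gam}(\tau\mid a_N, b_N), \quad
a_N = a_0 + \frac{N}{2}, \quad
b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^2
```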
6. 2.3.6 Bayesian inference for the Gaussian
- Likelihood function
- Prior distribution: normal-gamma or Gaussian-gamma distribution
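When both the mean and the precision are unknown, the conjugate normal-gamma (Gaussian-gamma) prior is (Bishop Eq. 2.154):

```latex
p(\mu,\tau) = \mathcal{N}\big(\mu \mid \mu_0, (\beta\tau)^{-1}\big)\,
\mathrm{Gam}(\tau\mid a, b)
```

Note that the precision of µ is a linear function of τ, which is what makes the joint prior conjugate.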
7. 2.3.7 Student's t-distribution
- If we have a univariate Gaussian N(x|µ, τ⁻¹) together with a Gamma prior Gam(τ|a, b) and we integrate out the precision τ, we obtain the marginal distribution of x: Student's t-distribution
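The marginalization described above gives (Bishop Eqs. 2.158-2.159):

```latex
p(x\mid\mu,a,b) = \int_0^{\infty} \mathcal{N}(x\mid\mu,\tau^{-1})\,
\mathrm{Gam}(\tau\mid a,b)\, d\tau
= \mathrm{St}(x\mid\mu,\lambda,\nu)

\mathrm{St}(x\mid\mu,\lambda,\nu) =
\frac{\Gamma(\nu/2+1/2)}{\Gamma(\nu/2)}
\left(\frac{\lambda}{\pi\nu}\right)^{1/2}
\left[1+\frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2},
\qquad \nu = 2a,\ \lambda = a/b
```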
8. 2.3.7 Student's t-distribution
- Gaussian vs t-distribution
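A quick numerical sketch of the Gaussian-Gamma mixture view of the t-distribution (the hyperparameter values here are illustrative, not from the slides): sampling a precision from the Gamma prior and then a Gaussian given that precision reproduces samples drawn directly from the equivalent Student's t.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, mu = 3.0, 3.0, 0.0     # illustrative Gamma shape a, rate b; Gaussian mean mu
n = 200_000

# Hierarchical sampling: tau ~ Gam(a, b), then x ~ N(mu, tau^-1)
tau = rng.gamma(shape=a, scale=1.0 / b, size=n)
x = rng.normal(loc=mu, scale=1.0 / np.sqrt(tau))

# Direct sampling from the equivalent Student's t: nu = 2a, lambda = a/b
nu = 2.0 * a
t = mu + np.sqrt(b / a) * rng.standard_t(df=nu, size=n)

# Both sample variances should be close to (1/lambda) * nu/(nu-2) = 1.5
print(np.var(x), np.var(t))
```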
9. 2.3.7 Student's t-distribution
- Multivariate t-distribution
- where D is the dimensionality of x
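The multivariate form referenced here is (Bishop Eq. 2.162):

```latex
\mathrm{St}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Lambda},\nu)
= \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)}
\frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}}
\left[1+\frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2},
\qquad
\Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^{\mathrm T}
\boldsymbol{\Lambda}(\mathbf{x}-\boldsymbol{\mu})
```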
10. 2.3.8 Periodic variables
11. 2.3.8 Periodic variables
- The simplest approach is to use a histogram of observations in which the angular coordinate is divided into fixed bins.
- Another approach starts, like the von Mises distribution, from a Gaussian distribution over a Euclidean space, but now marginalizes onto the unit circle rather than conditioning.
- However, this leads to more complex forms of distribution.
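The von Mises distribution mentioned above has density (Bishop Eq. 2.179):

```latex
p(\theta\mid\theta_0, m) = \frac{1}{2\pi I_0(m)}
\exp\{m \cos(\theta-\theta_0)\}
```

where θ₀ is the mean, m is the concentration parameter (analogous to the inverse variance), and I₀(m) is the zeroth-order modified Bessel function of the first kind, which normalizes the distribution.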
12. 2.3.9 Mixtures of Gaussians
- Example of a Gaussian mixture distribution
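A Gaussian mixture has the form p(x) = Σ_k π_k N(x|µ_k, σ_k²) with mixing coefficients π_k ≥ 0 and Σ_k π_k = 1. A minimal numerical sketch with illustrative parameters (not Bishop's figure values):

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Illustrative 3-component mixture (weights sum to 1)
pis = [0.5, 0.3, 0.2]
mus = [-2.0, 0.0, 3.0]
vrs = [0.5, 1.0, 2.0]

x = np.linspace(-12.0, 14.0, 20001)
p = sum(pk * gauss(x, mk, vk) for pk, mk, vk in zip(pis, mus, vrs))

dx = x[1] - x[0]
print(p.sum() * dx)   # total probability mass, should be close to 1
```

Because each component is normalized and the weights sum to one, the mixture itself integrates to one, which the Riemann sum above confirms numerically.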
13. 2.4 The Exponential Family
- The probability distributions that we have studied so far in this chapter are specific examples of a broad class of distributions called the exponential family.
- The exponential family of distributions over x, given parameters η, is defined to be the set of distributions of the form
- p(x|η) = h(x) g(η) exp{η^T u(x)}   (2.194)
- where x may be scalar or vector, and may be discrete or continuous.
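For instance, the Bernoulli distribution can be cast in this form (Bishop Eqs. 2.197-2.200):

```latex
\mathrm{Bern}(x\mid\mu) = \mu^{x}(1-\mu)^{1-x}
= (1-\mu)\exp\Big\{x\,\ln\frac{\mu}{1-\mu}\Big\}
```

so that η = ln(µ/(1−µ)), u(x) = x, h(x) = 1, and g(η) = σ(−η), where σ(η) = 1/(1+e^{−η}) is the logistic sigmoid.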
14. 2.4 The Exponential Family
- Here η are called the natural parameters of the distribution, and u(x) is some function of x.
- The function g(η) can be interpreted as the coefficient that ensures that the distribution is normalized and therefore satisfies
- g(η) ∫ h(x) exp{η^T u(x)} dx = 1   (2.195)
- where the integration is replaced by summation if x is a discrete variable.
15. 2.4 The Exponential Family
- Gaussian distribution
- p(x|µ, σ²)
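Writing the Gaussian in the standard exponential-family form of Eq. (2.194) gives (Bishop Eqs. 2.220-2.223):

```latex
p(x\mid\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}
\exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\},
\qquad
\boldsymbol{\eta} = \begin{pmatrix}\mu/\sigma^2\\ -1/(2\sigma^2)\end{pmatrix},
\quad
\mathbf{u}(x) = \begin{pmatrix}x\\ x^2\end{pmatrix}

h(x) = (2\pi)^{-1/2},
\qquad
g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2}
\exp\!\left(\frac{\eta_1^2}{4\eta_2}\right)
```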
16. 2.4.1 Maximum likelihood and sufficient statistics
- Problem of estimating the parameter vector η in the general exponential family distribution
- The likelihood function
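Setting the gradient of the log-likelihood to zero yields the maximum likelihood condition (Bishop Eqs. 2.226-2.228):

```latex
\ln p(\mathbf{X}\mid\boldsymbol{\eta}) =
\sum_{n=1}^{N}\ln h(x_n) + N\ln g(\boldsymbol{\eta})
+ \boldsymbol{\eta}^{\mathrm T}\sum_{n=1}^{N}\mathbf{u}(x_n)

-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}})
= \frac{1}{N}\sum_{n=1}^{N}\mathbf{u}(x_n)
```

so the ML solution depends on the data only through Σ_n u(x_n), which is therefore the sufficient statistic of the distribution.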
17. 2.4.2 Conjugate priors
- Conjugate prior
- Posterior distribution
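For the general exponential family the conjugate prior and resulting posterior are (Bishop Eqs. 2.229-2.230):

```latex
p(\boldsymbol{\eta}\mid\boldsymbol{\chi},\nu) =
f(\boldsymbol{\chi},\nu)\, g(\boldsymbol{\eta})^{\nu}
\exp\{\nu\,\boldsymbol{\eta}^{\mathrm T}\boldsymbol{\chi}\}

p(\boldsymbol{\eta}\mid\mathbf{X},\boldsymbol{\chi},\nu) \propto
g(\boldsymbol{\eta})^{\nu+N}
\exp\Big\{\boldsymbol{\eta}^{\mathrm T}
\Big(\sum_{n=1}^{N}\mathbf{u}(x_n) + \nu\boldsymbol{\chi}\Big)\Big\}
```

The posterior has the same functional form as the prior, with ν interpretable as an effective number of pseudo-observations having sufficient statistic χ.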
18. 2.4.3 Noninformative priors
- Noninformative prior
- It is intended to have as little influence on the posterior distribution as possible.
- Example: if a density takes the form p(x|µ) = f(x − µ), then the parameter µ is known as a location parameter
- As we have seen, the conjugate prior distribution for µ in this case is a Gaussian p(µ|µ₀, σ₀²) = N(µ|µ₀, σ₀²), and we obtain a noninformative prior by taking the limit σ₀² → ∞
19. 2.5 Nonparametric Methods
- Histogram methods for density estimation
- To estimate the probability density at a particular location, we should consider the data points that lie within some local neighbourhood of that point
- The value of the smoothing parameter should be neither too large nor too small
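A minimal sketch of the histogram estimator described above, using synthetic standard-normal data (the sample and bin width are illustrative). Each bin's density is p_i = n_i / (N Δ_i), where n_i is the count and Δ_i the bin width:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=5000)   # illustrative 1-D sample

bins = np.linspace(-4.0, 4.0, 41)        # fixed bins of width Delta = 0.2
counts, edges = np.histogram(data, bins=bins)
delta = edges[1] - edges[0]

# Histogram density estimate: p_i = n_i / (N * Delta_i)
density = counts / (data.size * delta)
print(density.sum() * delta)   # fraction of the sample covered by the bins
```

Shrinking Δ gives a spikier estimate; enlarging it over-smooths, which is the smoothing-parameter trade-off noted on the slide.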
20. 2.5.1 Kernel density estimators
- Probability mass associated with this region
- In order to count K, set the kernel function
21. 2.5.1 Kernel density estimators
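With a smooth Gaussian kernel in place of the hard counting window, the estimator becomes a Parzen window: p(x) = (1/N) Σ_n N(x | x_n, h²). A minimal sketch with an illustrative sample and bandwidth:

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel (Parzen window) density estimate at points x, bandwidth h."""
    d = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=2000)   # illustrative sample

x = np.linspace(-5.0, 5.0, 1001)
p = kde(x, data, h=0.3)
dx = x[1] - x[0]
print(p.sum() * dx)   # should be close to 1
```

Here h plays the role of the smoothing parameter: each data point contributes one Gaussian bump, and the bumps are averaged.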
22. 2.5.2 Nearest-neighbour methods
- Fix K and find an appropriate value for V from the data
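In the k-NN approach the density estimate is p(x) = K / (N V), where V is the volume of the smallest region around x containing K data points; in one dimension V = 2 r_K with r_K the distance to the K-th nearest neighbour. A minimal sketch (sample and K are illustrative):

```python
import numpy as np

def knn_density(x, data, K):
    """1-D k-NN density estimate: p(x) = K / (N * V), with V = 2 * r_K,
    where r_K is the distance from x to the K-th nearest data point."""
    r = np.sort(np.abs(data[None, :] - x[:, None]), axis=1)[:, K - 1]
    return K / (len(data) * 2.0 * r)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=5000)   # illustrative sample

p = knn_density(np.array([0.0]), data, K=100)
print(p[0])   # true N(0,1) density at 0 is 1/sqrt(2*pi) ~ 0.399
```

Note that, unlike the kernel estimator, the k-NN estimate is not a proper density (its integral diverges), a caveat Bishop also makes.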