1
Ch 2. Probability Distributions
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
  • Summarized by
  • M.H. Kim
  • Biointelligence Laboratory, Seoul National
    University
  • http://bi.snu.ac.kr/

2
Content
  • 2.3 The Gaussian Distribution
  • 2.3.6 Bayesian inference for the Gaussian
  • 2.3.7 Student's t-distribution
  • 2.3.8 Periodic variables
  • 2.3.9 Mixtures of Gaussians
  • 2.4 The Exponential Family
  • 2.4.1 Maximum likelihood and sufficient
    statistics
  • 2.4.2 Conjugate priors
  • 2.4.3 Noninformative priors
  • 2.5 Nonparametric Methods
  • 2.5.1 Kernel density estimators
  • 2.5.2 Nearest-neighbour methods

3
2.3.6 Bayesian inference for the Gaussian
  • Bayesian inference
  • Suppose that the variance σ² is known and we
    consider the task of inferring the mean µ given a
    set of N observations X = {x₁, ..., x_N}
  • Likelihood function
    p(X|µ) = ∏ₙ N(xₙ|µ, σ²) = (2πσ²)^(−N/2) exp{−(1/2σ²) Σₙ (xₙ − µ)²}
  • This takes the form of the exponential of a
    quadratic form in µ. Thus if we choose a prior
    p(µ) given by a Gaussian, it will be a conjugate
    distribution for this likelihood function.

4
2.3.6 Bayesian inference for the Gaussian
  • Prior distribution p(µ) = N(µ|µ₀, σ₀²)
  • Posterior distribution p(µ|X) = N(µ|µ_N, σ_N²), with
    µ_N = (σ² µ₀ + N σ₀² x̄) / (N σ₀² + σ²),
    1/σ_N² = 1/σ₀² + N/σ², and x̄ = (1/N) Σₙ xₙ
    (a numerical sketch follows below)
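A minimal sketch of this conjugate update, assuming NumPy; the function name posterior_mean_params and the synthetic data are illustrative, not from the slides:

```python
import numpy as np

def posterior_mean_params(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu | mu_N, sigma_N^2) for the mean of a Gaussian with
    known variance sigma_sq, given a Gaussian prior N(mu | mu0, sigma0_sq)."""
    N = len(x)
    x_bar = np.mean(x)
    mu_N = (sigma_sq * mu0 + N * sigma0_sq * x_bar) / (N * sigma0_sq + sigma_sq)
    sigma_N_sq = 1.0 / (1.0 / sigma0_sq + N / sigma_sq)
    return mu_N, sigma_N_sq

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)    # sigma^2 = 1 assumed known
print(posterior_mean_params(x, mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0))
```

With a broad prior (σ₀² = 10) the posterior mean lands close to the sample mean, and σ_N² shrinks as N grows, as the update formulas predict.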

5
2.3.6 Bayesian inference for the Gaussian
  • Now suppose the mean is known and we infer the precision λ
  • Likelihood function p(X|λ) ∝ λ^(N/2) exp{−(λ/2) Σₙ (xₙ − µ)²}
  • Prior distribution: gamma distribution Gam(λ|a₀, b₀)
  • Posterior distribution Gam(λ|a_N, b_N), with
    a_N = a₀ + N/2 and b_N = b₀ + (1/2) Σₙ (xₙ − µ)²
    (see the sketch below)
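A minimal sketch of the precision update, assuming NumPy; names and parameter values are illustrative:

```python
import numpy as np

def posterior_precision_params(x, mu, a0, b0):
    """Posterior Gam(lambda | a_N, b_N) for the precision of a Gaussian
    with known mean mu, given a Gam(lambda | a0, b0) prior."""
    N = len(x)
    a_N = a0 + N / 2.0
    b_N = b0 + 0.5 * np.sum((x - mu) ** 2)
    return a_N, b_N

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=100)   # true precision = 0.25
a_N, b_N = posterior_precision_params(x, mu=0.0, a0=1.0, b0=1.0)
print(a_N / b_N)                               # posterior mean of lambda, near 0.25
```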

6
2.3.6 Bayesian inference for the Gaussian
  • Now suppose both the mean and the precision are unknown
  • Likelihood function p(X|µ,λ) = ∏ₙ N(xₙ|µ, λ⁻¹)
  • Prior distribution: normal-gamma or Gaussian-gamma
    distribution p(µ,λ) = N(µ|µ₀, (βλ)⁻¹) Gam(λ|a, b)

7
2.3.7 Student's t-distribution
  • If we have a univariate Gaussian N(x|µ, τ⁻¹)
    together with a Gamma prior Gam(τ|a, b) and we
    integrate out the precision, we obtain the
    marginal distribution of x
  • Student's t-distribution St(x|µ, λ, ν),
    with λ = a/b and ν = 2a degrees of freedom
    (see the sampling check below)
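A Monte Carlo check of this marginalization, assuming NumPy and SciPy; the parameter values are illustrative:

```python
import numpy as np
from scipy import stats

# Draw x by first sampling a precision tau ~ Gam(a, b), then x ~ N(mu, 1/tau).
# The marginal of x should be St(x | mu, lambda = a/b, nu = 2a); we check by
# comparing empirical quantiles against the equivalent scipy t-distribution.
rng = np.random.default_rng(1)
mu, a, b = 0.0, 3.0, 2.0
tau = rng.gamma(shape=a, scale=1.0 / b, size=100_000)
x = rng.normal(loc=mu, scale=1.0 / np.sqrt(tau))

nu, lam = 2 * a, a / b                         # Bishop's reparameterization
t = stats.t(df=nu, loc=mu, scale=1.0 / np.sqrt(lam))
print(np.quantile(x, [0.05, 0.5, 0.95]))       # empirical quantiles...
print(t.ppf([0.05, 0.5, 0.95]))                # ...match the t quantiles
```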

8
2.3.7 Student's t-distribution
  • Gaussian vs t-distribution: the t-distribution has
    heavier tails, so maximum likelihood estimates are much
    less sensitive to outliers (robustness)

9
2.3.7 Student's t-distribution
  • Multivariate t-distribution
    St(x|µ, Λ, ν) = [Γ(D/2 + ν/2) / Γ(ν/2)]
    × |Λ|^(1/2) / (πν)^(D/2) × [1 + Δ²/ν]^(−D/2 − ν/2)
  • where Δ² = (x − µ)ᵀ Λ (x − µ) is the squared Mahalanobis
    distance and D is the dimensionality of x

10
2.3.8 Periodic variables
  • von Mises distribution
    p(θ|θ₀, m) = (1 / (2π I₀(m))) exp{m cos(θ − θ₀)}
  • where θ₀ is the mean direction, m is the concentration
    parameter, and I₀(m) is the zeroth-order modified Bessel
    function of the first kind (an ML fit is sketched below)
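A short sketch of maximum likelihood fitting of the mean direction, θ₀_ML = atan2(Σₙ sin θₙ, Σₙ cos θₙ), assuming SciPy; the true parameter values are illustrative:

```python
import numpy as np
from scipy import stats

# ML estimate of the von Mises mean direction from the resultant vector.
rng = np.random.default_rng(2)
theta = stats.vonmises.rvs(kappa=4.0, loc=1.0, size=10_000, random_state=rng)

theta0_ml = np.arctan2(np.sin(theta).sum(), np.cos(theta).sum())
print(theta0_ml)                               # close to the true value 1.0

# The mean resultant length r_bar equals A(m) = I1(m)/I0(m), a monotone
# function of the concentration m, which can be inverted numerically for m.
r_bar = np.hypot(np.sin(theta).mean(), np.cos(theta).mean())
print(r_bar)                                   # A(4.0) is roughly 0.86
```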

11
2.3.8 Periodic variables
  • The simplest approach is to use a histogram of
    observations in which the angular coordinate is
    divided into fixed bins.
  • Another approach starts, like the von Mises
    distribution, from a Gaussian distribution over a
    Euclidean space but now marginalizes onto the
    unit circle rather than conditioning.
  • However, this leads to more complex forms of
    distribution.

12
2.3.9 Mixtures of Gaussians
  • Example of a Gaussian mixture distribution
    p(x) = Σₖ πₖ N(x|µₖ, Σₖ)
  • the mixing coefficients πₖ satisfy 0 ≤ πₖ ≤ 1 and
    Σₖ πₖ = 1 (a sampling sketch follows below)
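A minimal 1-D sketch of evaluating and sampling such a mixture, assuming NumPy and SciPy; the component parameters are illustrative:

```python
import numpy as np
from scipy import stats

# A 1-D mixture of three Gaussians, p(x) = sum_k pi_k N(x | mu_k, sigma_k^2).
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([-2.0, 0.0, 3.0])
sigma = np.array([0.5, 1.0, 0.8])

def mixture_pdf(x):
    return sum(p * stats.norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma))

# Ancestral sampling: pick a component index from pi, then sample it.
rng = np.random.default_rng(3)
k = rng.choice(3, size=5, p=pi)
samples = rng.normal(mu[k], sigma[k])
print(samples, mixture_pdf(samples))
```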

13
2.4 The Exponential Family
  • The probability distributions that we have
    studied so far in this chap. are specific
    examples of a broad class of distributions called
    the exponential family.
  • The exponential family of distributions over x,
    given parameters η, is defined to be the set of
    distributions of the form
  • p(x|η) = h(x) g(η) exp{ηᵀu(x)}   (2.194)
  • where x may be scalar or vector, and may be
    discrete or continuous.

14
2.4 The Exponential Family
  • Here η are called the natural parameters of the
    distribution, and u(x) is some function of x.
  • The function g(η) can be interpreted as the
    coefficient that ensures that the distribution is
    normalized and therefore satisfies
  • g(η) ∫ h(x) exp{ηᵀu(x)} dx = 1   (2.195)
  • where the integration is replaced by summation
    if x is a discrete variable.
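As a concrete instance, a minimal sketch (assuming NumPy) of the Bernoulli distribution written in this form, with η = ln(µ/(1−µ)), u(x) = x, h(x) = 1, and g(η) = 1 − µ; the function name is illustrative:

```python
import numpy as np

def bern_exp_family(x, mu):
    """Bernoulli probability via the exponential-family form
    p(x|eta) = h(x) g(eta) exp(eta * u(x)) with u(x) = x, h(x) = 1."""
    eta = np.log(mu / (1.0 - mu))              # natural parameter (log odds)
    g = 1.0 / (1.0 + np.exp(eta))              # normalizer, equals 1 - mu
    return g * np.exp(eta * x)

mu = 0.3
for x in (0, 1):
    # Agrees with the standard form mu^x * (1 - mu)^(1 - x).
    assert np.isclose(bern_exp_family(x, mu), mu**x * (1 - mu)**(1 - x))
```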

15
2.4 The Exponential Family
  • Gaussian distribution in exponential-family form
  • p(x|µ, σ²) = (2πσ²)^(−1/2) exp{−(x − µ)²/(2σ²)}
    = h(x) g(η) exp{ηᵀu(x)}, with natural parameters
    η = (µ/σ², −1/(2σ²))ᵀ and u(x) = (x, x²)ᵀ

16
2.4.1 Maximum likelihood and sufficient statistics
  • Problem of estimating the parameter vector η in
    the general exponential family distribution
  • The likelihood function
    p(X|η) = (∏ₙ h(xₙ)) g(η)^N exp{ηᵀ Σₙ u(xₙ)}
  • Setting the gradient of the log likelihood to zero gives
    −∇ ln g(η_ML) = (1/N) Σₙ u(xₙ), so the data affect the
    estimate only through the sufficient statistic Σₙ u(xₙ)
    (see the sketch below)
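A minimal sketch for the Gaussian case, where u(x) = (x, x²) means the two sums below are sufficient; assuming NumPy, with illustrative values:

```python
import numpy as np

# For the Gaussian, the data enter the likelihood only through sum(x_n) and
# sum(x_n^2), so ML estimates are recovered from these two numbers alone.
rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, scale=2.0, size=10_000)

s1, s2, N = x.sum(), (x ** 2).sum(), len(x)
mu_ml = s1 / N
var_ml = s2 / N - mu_ml ** 2                   # biased ML variance estimate
print(mu_ml, var_ml)                           # close to (1.5, 4.0)
```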

17
2.4.2 Conjugate priors
  • Conjugate prior p(η|χ, ν) = f(χ, ν) g(η)^ν exp{ν ηᵀχ}
  • Posterior distribution
    p(η|X, χ, ν) ∝ g(η)^(ν+N) exp{ηᵀ(Σₙ u(xₙ) + ν χ)}
  • the posterior has the same functional form as the prior,
    and ν can be interpreted as an effective number of
    pseudo-observations with sufficient statistic χ

18
2.4.3 Noninformative priors
  • Noninformative prior
  • It is intended to have as little influence on the
    posterior distribution as possible.
  • Example: if the density takes the form p(x|µ) = f(x − µ),
    then the parameter µ is known as a location parameter
  • as we have seen, the conjugate prior
    distribution for µ in this case is a Gaussian
    p(µ|µ₀, σ₀²) = N(µ|µ₀, σ₀²), and we obtain a
    noninformative prior by taking the limit σ₀² → ∞

19
2.5 Nonparametric Methods
  • Histogram methods for density estimation
  • To estimate the probability density at a particular
    location, we should consider the data points that lie
    within some local neighbourhood of that point
  • Divide x into bins of width Δᵢ, count the number nᵢ of
    observations falling in bin i, and estimate
    pᵢ = nᵢ / (N Δᵢ)  (sketched below)
  • The value of the smoothing parameter (here the bin
    width) should be neither too large nor too small
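A minimal sketch of this histogram estimator, assuming NumPy; the function name and test data are illustrative:

```python
import numpy as np

def histogram_density(x, bins):
    """Histogram density estimate p_i = n_i / (N * Delta_i), where n_i
    counts the points falling into bin i of width Delta_i."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    return counts / (len(x) * widths), edges

rng = np.random.default_rng(5)
x = rng.normal(size=1_000)
p, edges = histogram_density(x, bins=20)
print(np.sum(p * np.diff(edges)))              # integrates to 1
```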

20
2.5.1 Kernel density estimators
  • Probability mass associated with a small region R
    containing x: P = ∫_R p(x) dx ≈ p(x)V; with K of the N
    observations falling in R, K ≈ NP gives p(x) = K/(NV)
  • In order to count K, set the kernel function
    k(u) = 1 if |uᵢ| ≤ 1/2 for i = 1, ..., D, 0 otherwise
    (a Parzen window), so that K = Σₙ k((x − xₙ)/h)
  • Substituting a smooth Gaussian kernel gives the kernel
    density estimate sketched below
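A minimal Gaussian-kernel density estimator for D = 1, p(x) = (1/N) Σₙ N(x|xₙ, h²), assuming NumPy; the function name and test values are illustrative:

```python
import numpy as np

def gaussian_kde_1d(x_grid, data, h):
    """Kernel density estimate placing a Gaussian of width h on each point:
    p(x) = (1/N) * sum_n N(x | x_n, h^2)."""
    diff = x_grid[:, None] - data[None, :]
    k = np.exp(-diff**2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return k.mean(axis=1)

rng = np.random.default_rng(6)
data = rng.normal(size=500)
grid = np.linspace(-4, 4, 9)
print(gaussian_kde_1d(grid, data, h=0.3))      # roughly the N(0,1) density
```

Here h plays the same smoothing-parameter role as the histogram bin width: too small gives a noisy estimate, too large washes out structure.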

21
2.5.1 Kernel density estimators
22
2.5.2 Nearest-neighbour methods
  • Instead of fixing V and counting K, fix K and find an
    appropriate value for V: grow a sphere centred on x until
    it contains exactly K data points; then p(x) = K/(NV)
    (a 1-D sketch follows below)
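A minimal 1-D sketch of this K-nearest-neighbour estimator, assuming NumPy; the function name and test values are illustrative:

```python
import numpy as np

def knn_density_1d(x_grid, data, K):
    """K-nearest-neighbour density estimate p(x) = K / (N * V), where V is
    the smallest interval around x containing K data points."""
    dists = np.sort(np.abs(x_grid[:, None] - data[None, :]), axis=1)
    radius = dists[:, K - 1]                   # distance to the K-th neighbour
    return K / (len(data) * 2 * radius)        # V = 2 * radius in 1-D

rng = np.random.default_rng(7)
data = rng.normal(size=1_000)
grid = np.linspace(-3, 3, 7)
print(knn_density_1d(grid, data, K=30))        # roughly the N(0,1) density
```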