1
Machine Learning Techniques for Computer Vision
  • Part 2: Unsupervised Learning

Christopher M. Bishop
Microsoft Research Cambridge
ECCV 2004, Prague
2
Overview of Part 2
  • Mixture models
  • EM
  • Variational Inference
  • Bayesian model complexity
  • Continuous latent variables

3
The Gaussian Distribution
  • Multivariate Gaussian
  • Maximum likelihood estimate of the mean
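The density and the ML estimate were equation images on the slide; standard forms (D-dimensional x, data points x_1, ..., x_N) are:
N(x | \mu, \Sigma) = (2\pi)^{-D/2} \, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right\}
\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n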
4
Gaussian Mixtures
  • Linear super-position of Gaussians
  • Normalization and positivity require
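The mixture and its constraints were slide images; the standard form, with K components and mixing coefficients \pi_k, is:
p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)
\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1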

5
Example: Mixture of 3 Gaussians
6
Maximum Likelihood for the GMM
  • Log likelihood function
  • Sum over components appears inside the log
  • no closed form ML solution
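The log likelihood referred to above, in standard notation for a data set X = {x_1, ..., x_N}:
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, N(x_n \mid \mu_k, \Sigma_k) \right\}
Because the sum over k sits inside the logarithm, setting the derivatives to zero gives no closed-form solution.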

7
EM Algorithm: Informal Derivation
8
EM Algorithm: Informal Derivation
  • M step equations
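The M-step equations were images on the slide; the standard updates, writing \gamma(z_{nk}) for the responsibilities and N_k = \sum_n \gamma(z_{nk}):
\mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n
\Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\mathrm{new}})(x_n - \mu_k^{\mathrm{new}})^T
\pi_k^{\mathrm{new}} = \frac{N_k}{N}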

9
EM Algorithm: Informal Derivation
  • E step equation
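The E-step equation, in the same notation; the responsibilities are the posterior probabilities that data point n was generated by component k:
\gamma(z_{nk}) = \frac{\pi_k \, N(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x_n \mid \mu_j, \Sigma_j)}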

10
EM Algorithm: Informal Derivation
  • Can interpret the mixing coefficients as prior
    probabilities
  • Corresponding posterior probabilities
    (responsibilities)

11
Old Faithful Data Set
(Figure: duration of eruption vs. time between eruptions, in minutes)
12-17
(Figure-only slides: no text transcript)
18
Latent Variable View of EM
  • To sample from a Gaussian mixture (see the sketch below)
  • first pick one of the components with probability
    given by its mixing coefficient
  • then draw a sample from that component
  • repeat these two steps for each new data point
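A minimal NumPy sketch of this ancestral sampling procedure; the mixture parameters below are illustrative values, not taken from the slides:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D mixture with K = 3 components (made-up parameters)
pi = np.array([0.5, 0.3, 0.2])                        # mixing coefficients, sum to 1
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])  # component means
Sigma = np.stack([np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.2])])  # covariances

def sample_gmm(n_samples):
    """Ancestral sampling: pick a component with probability pi_k, then draw from it."""
    X = np.empty((n_samples, 2))
    z = rng.choice(len(pi), size=n_samples, p=pi)     # latent component labels
    for n in range(n_samples):
        X[n] = rng.multivariate_normal(mu[z[n]], Sigma[z[n]])
    return X, z

X, z = sample_gmm(500)
print(X.shape, np.bincount(z))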

19
Latent Variable View of EM
  • Goal: given a data set, find the mixture parameters
  • Suppose we knew the colours
  • maximum likelihood would involve fitting each
    component to the corresponding cluster
  • Problem: the colours are latent (hidden) variables

20
Incomplete and Complete Data
(Figure: incomplete data vs. complete data)
21
Latent Variable Viewpoint
22
Latent Variable Viewpoint
  • Binary latent variables
    describing which component generated each data
    point
  • Conditional distribution of observed variable
  • Prior distribution of latent variables
  • Marginalizing over the latent variables we obtain
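Written out in standard notation (a 1-of-K binary latent vector z with z_k in {0,1} and \sum_k z_k = 1):
p(x \mid z_k = 1) = N(x \mid \mu_k, \Sigma_k), \qquad p(z_k = 1) = \pi_k
p(x) = \sum_{z} p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)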

23
Graphical Representation of GMM
24
Latent Variable View of EM
  • Suppose we knew the values for the latent
    variables
  • maximize the complete-data log likelihood
    (written out below)
  • trivial closed-form solution: fit each component
    to the corresponding set of data points
  • We don't know the values of the latent variables
  • however, for given parameter values we can
    compute the expected values of the latent
    variables
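The complete-data log likelihood referred to above, in standard notation (z_{nk} is 1 if point n came from component k, and 0 otherwise):
\ln p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \ln N(x_n \mid \mu_k, \Sigma_k) \right\}
\mathbb{E}[z_{nk}] = \gamma(z_{nk})
Replacing each z_{nk} by its expected value gives the quantity maximized in the M step.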

25
Posterior Probabilities (colour coded)
26
Over-fitting in Gaussian Mixture Models
  • Infinities in the likelihood function when a
    component collapses onto a data point and its
    variance shrinks to zero (see the example below)
  • Also, maximum likelihood cannot determine the
    number K of components
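As an illustration (not on the slide): a component whose mean sits exactly on a data point x_n, with isotropic covariance \sigma_j^2 I, contributes a factor that grows without bound as its variance shrinks:
N(x_n \mid x_n, \sigma_j^2 I) = (2\pi\sigma_j^2)^{-D/2} \to \infty \quad \text{as } \sigma_j \to 0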

27
Cross Validation
  • Can select model complexity using an independent
    validation data set
  • If data is scarce, use S-fold cross-validation (sketch below)
  • partition data into S subsets
  • train on S-1 subsets
  • test on remainder
  • repeat and average
  • Disadvantages
  • computationally expensive
  • can only determine one or two complexity
    parameters
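A minimal sketch of this procedure for choosing the number of mixture components, assuming scikit-learn's GaussianMixture is available; the data array X and the candidate range for K are placeholders:

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def cv_score(X, K, S=5, seed=0):
    """Average held-out log likelihood per data point for a K-component GMM."""
    kf = KFold(n_splits=S, shuffle=True, random_state=seed)  # partition data into S subsets
    scores = []
    for train_idx, test_idx in kf.split(X):
        gmm = GaussianMixture(n_components=K, n_init=3, random_state=seed)
        gmm.fit(X[train_idx])                  # train on S-1 subsets
        scores.append(gmm.score(X[test_idx]))  # test on the remaining subset
    return np.mean(scores)                     # repeat and average

# Example usage on synthetic data standing in for a real data set
X = np.random.default_rng(0).normal(size=(300, 2))
best_K = max(range(1, 6), key=lambda K: cv_score(X, K))
print("selected K =", best_K)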

28
Bayesian Mixture of Gaussians
  • Parameters and latent variables appear on equal
    footing
  • Conjugate priors
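One standard choice of conjugate priors for a Bayesian Gaussian mixture: a Dirichlet prior over the mixing coefficients and a Gaussian-Wishart prior over each mean and precision (the hyperparameters \alpha_0, m_0, \beta_0, W_0, \nu_0 are user-chosen):
p(\pi) = \mathrm{Dir}(\pi \mid \alpha_0)
p(\mu, \Lambda) = \prod_{k=1}^{K} N\!\left(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\right) \mathcal{W}(\Lambda_k \mid W_0, \nu_0)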

29
Data Set Size
  • Problem 1: learn a function from 100 (slightly)
    noisy examples
  • data set is computationally small but
    statistically large
  • Problem 2: learn to recognize 1,000 everyday
    objects from 5,000,000 natural images
  • data set is computationally large but
    statistically small
  • Bayesian inference
  • computationally more demanding than ML or
    MAP (but see discussion of Gaussian mixtures
    later)
  • significant benefit for statistically small data
    sets

30
Variational Inference
  • Exact Bayesian inference intractable
  • Markov chain Monte Carlo
  • computationally expensive
  • issues of convergence
  • Variational Inference
  • broadly applicable deterministic approximation
  • let Z denote the set of all latent variables and
    parameters
  • approximate the true posterior using a
    simpler distribution
  • minimize the Kullback-Leibler divergence

31
General View of Variational Inference
  • For an arbitrary distribution q(Z), the log marginal
    likelihood ln p(X) decomposes into a lower bound plus
    a Kullback-Leibler term (see below)
  • Maximizing the bound over an unrestricted q would give
    the true posterior
  • this is intractable by definition
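The decomposition behind these bullets was an equation image on the slide; with Z denoting all latent variables and parameters and q(Z) the approximating distribution, the standard form is:
\ln p(X) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p)
\mathcal{L}(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ, \qquad \mathrm{KL}(q \,\|\, p) = -\int q(Z) \ln \frac{p(Z \mid X)}{q(Z)} \, dZ
Since the KL term is non-negative, \mathcal{L}(q) is a lower bound on \ln p(X), with equality exactly when q(Z) equals the true posterior.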

32
Variational Lower Bound
33
Factorized Approximation
  • Goal: choose a family of q distributions which are
  • sufficiently flexible to give good approximation
  • sufficiently simple to remain tractable
  • Here we consider factorized distributions
    (written out below)
  • No further assumptions are required!
  • Optimal solution for one factor, keeping the
    remainder fixed
  • coupled solutions, so initialize and then cyclically
    update
  • message passing view (Winn and Bishop, 2004)
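The factorized form and the optimal-factor equation referred to above, in standard notation (Z partitioned into disjoint groups Z_i):
q(Z) = \prod_{i} q_i(Z_i)
\ln q_j^{\star}(Z_j) = \mathbb{E}_{i \neq j}\left[ \ln p(X, Z) \right] + \mathrm{const}
where the expectation is taken over all factors other than q_j; cycling through the factors with this update never decreases the lower bound.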

34
(No Transcript)
35
Lower Bound
  • Can also be evaluated
  • Useful for maths/code verification
  • Also useful for model comparison

36
Illustration: Univariate Gaussian
  • Likelihood function
  • Conjugate prior
  • Factorized variational distribution
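The standard setup for this example, with mean \mu, precision \tau and data D = {x_1, ..., x_N} (hyperparameters \mu_0, \lambda_0, a_0, b_0 assumed given):
p(D \mid \mu, \tau) = \prod_{n=1}^{N} \left( \frac{\tau}{2\pi} \right)^{1/2} \exp\left\{ -\frac{\tau}{2}(x_n - \mu)^2 \right\}
p(\mu, \tau) = N\!\left(\mu \mid \mu_0, (\lambda_0 \tau)^{-1}\right) \mathrm{Gam}(\tau \mid a_0, b_0)
q(\mu, \tau) = q_\mu(\mu) \, q_\tau(\tau)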

37
Initial Configuration
38
After Updating
39
After Updating
40
Converged Solution
41
Variational Mixture of Gaussians
  • Assume factorized posterior distribution
  • No other approximations needed!
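The assumed factorization, writing Z for the component assignments and \pi, \mu, \Lambda for the mixture parameters:
q(Z, \pi, \mu, \Lambda) = q(Z) \, q(\pi, \mu, \Lambda)
The further factorization of the parameter posterior then follows from the structure of the model rather than from additional assumptions.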

42
Variational Equations for GMM
43
Lower Bound for GMM
44
VIBES
  • Bishop, Spiegelhalter and Winn (2002)

45
ML Limit
  • If instead we choose point estimates (delta functions)
    for the parameter factors, we recover the maximum
    likelihood EM algorithm

46
Bound vs. K for Old Faithful Data
47
Bayesian Model Complexity
48
Sparse Bayes for Gaussian Mixture
  • Corduneanu and Bishop (2001)
  • Start with large value of K
  • treat mixing coefficients as parameters
  • maximize marginal likelihood
  • prunes out excess components

49
(No Transcript)
50
(No Transcript)
51
Summary: Variational Gaussian Mixtures
  • Simple modification of maximum likelihood EM code
  • Small computational overhead compared to EM
  • No singularities
  • Automatic model order selection

52
Continuous Latent Variables
  • Conventional PCA
  • data covariance matrix
  • eigenvector decomposition (see below)
  • Minimizes the sum-of-squares projection error
  • not a probabilistic model
  • how should we choose L ?
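The quantities named in the bullets, in standard notation (sample mean \bar{x}; the L eigenvectors with the largest eigenvalues define the principal subspace):
S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T, \qquad S u_i = \lambda_i u_i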

53
Probabilistic PCA
  • Tipping and Bishop (1998)
  • L dimensional continuous latent space
  • D dimensional data space

isotropic noise model: PCA
general diagonal noise model: factor analysis
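The generative model behind these two cases, in commonly used notation (L-dimensional latent vector z, D-dimensional observation x, D x L weight matrix W, Gaussian noise \epsilon):
p(z) = N(z \mid 0, I), \qquad x = W z + \mu + \epsilon
p(x \mid z) = N(x \mid W z + \mu, \sigma^2 I) \quad \text{(isotropic noise: probabilistic PCA)}
p(x \mid z) = N(x \mid W z + \mu, \Psi) \quad \text{(diagonal } \Psi \text{: factor analysis)}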
54
Probabilistic PCA
  • Marginal distribution (written out below)
  • Advantages
  • exact ML solution
  • computationally efficient EM algorithm
  • captures dominant correlations with few
    parameters
  • mixtures of PPCA
  • Bayesian PCA
  • building block for more complex models
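The marginal distribution referred to in the first bullet, obtained by integrating out the latent variable:
p(x) = \int p(x \mid z) \, p(z) \, dz = N(x \mid \mu, C), \qquad C = W W^T + \sigma^2 I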

55
EM for PCA
56
EM for PCA
57
EM for PCA
58
EM for PCA
59
EM for PCA
60
EM for PCA
61
EM for PCA
62
Bayesian PCA
  • Bishop (1998)
  • Gaussian prior over the columns of the weight matrix W
    (see below)
  • Automatic relevance determination (ARD)
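A standard form of the ARD prior used in Bayesian PCA: one precision hyperparameter \alpha_i per column w_i of W, so that columns whose \alpha_i becomes large are effectively switched off, determining the latent dimensionality automatically:
p(W \mid \alpha) = \prod_{i=1}^{L} \left( \frac{\alpha_i}{2\pi} \right)^{D/2} \exp\left\{ -\frac{\alpha_i}{2} \, w_i^T w_i \right\}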

(Figure: comparison of ML PCA and Bayesian PCA solutions)
63
Non-linear Manifolds
  • Example: images of a rigid object

64
Bayesian Mixture of BPCA Models
65
(No Transcript)
66
Flexible Sprites
  • Jojic and Frey (2001)
  • Automatic decomposition of video sequence into
  • background model
  • ordered set of masks (one per object per frame)
  • foreground model (one per object per frame)

67
(No Transcript)
68
Transformed Component Analysis
  • Generative model
  • Now include transformations (translations)
  • Extend to L layers
  • Inference is intractable, so use a variational
    framework

69
(No Transcript)
70
Bayesian Constellation Model
  • Li, Fergus and Perona (2003)
  • Object recognition from small training sets
  • Variational treatment of fully Bayesian model

71
Bayesian Constellation Model
72
Summary of Part 2
  • Discrete and continuous latent variables
  • EM algorithm
  • Build complex models from simple components
  • represented graphically
  • incorporates prior knowledge
  • Variational inference
  • Bayesian model comparison