1
A VIEW OF THE EM ALGORITHM THAT JUSTIFIES
INCREMENTAL, SPARSE, AND OTHER VARIANTS
  • RADFORD M. NEAL
  • GEOFFREY E. HINTON
  • ?? ???

2
Abstract
  • First, the concept of the negative free energy is introduced.
  • The E step maximizes this quantity with respect to the distribution
    over the unobserved variables.
  • The M step maximizes it with respect to the model parameters.
  • From this view it is easy to justify an incremental variant of the
    EM algorithm.
  • The same view also justifies sparse and other variants.

3
Introduction
  • The EM algorithm finds maximum likelihood parameter estimates in
    problems where some variables are unobserved.
  • It can be shown that each iteration improves the true likelihood,
    or leaves it unchanged.
  • The M step can be implemented only partially.
  • The parameters are improved rather than fully maximized.
  • This gives the generalized EM (GEM) algorithm and ECM.
  • The E step can also be implemented only partially.
  • This gives the incremental EM algorithm.
  • The unobserved variables are commonly independent across data items.

4
Introduction (contd.)
  • The view of the EM algorithm taken here:
  • EM maximizes a single function of both the parameters and the
    distribution over the unobserved variables.
  • This function is analogous to the free energy function used in
    statistical physics.
  • It can also be expressed in terms of a Kullback-Leibler divergence.
  • The E step maximizes this function with respect to the distribution
    over the unobserved variables.
  • The M step maximizes it with respect to the model parameters.

5
General Theory
  • Notation:
  • Z: the observed variable
  • Y: the unobserved variable
  • P(y, z | θ): the joint probability of Y and Z, where θ is the parameter.
  • Given observed data z, we wish to find the value of θ that maximizes
    the log likelihood L(θ) = log P(z | θ).
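  • Written out (assuming a discrete Y; an integral replaces the sum for
    continuous Y), this is
    L(\theta) = \log P(z \mid \theta) = \log \sum_{y} P(y, z \mid \theta)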

6
EM Algorithm
  • The EM algorithm starts with an initial guess θ(0) at the parameter
    and then proceeds by the steps below.
  • Each iteration improves the true likelihood, or leaves it unchanged.
  • The algorithm converges to a local maximum of L(θ).
  • The GEM algorithm is also guaranteed to converge.
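  • In the standard formulation, the two steps of iteration t are
    \text{E step: } Q(\theta \mid \theta^{(t-1)}) = E\big[\log P(y, z \mid \theta) \,\big|\, z, \theta^{(t-1)}\big]
    \text{M step: } \theta^{(t)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t-1)})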

7
A View of Increasing One Function.
  • The function
  • Variational free energy
  • Kullback-Leibler divergence
  • P̃(y) = P(y | z, θ)
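  • In the notation above, the function is (as defined in the paper)
    F(\tilde P, \theta) = E_{\tilde P}\big[\log P(y, z \mid \theta)\big] + H(\tilde P)
    where H(\tilde P) is the entropy of \tilde P, so that -F has the form of a
    variational free energy; equivalently
    F(\tilde P, \theta) = L(\theta) - D\big(\tilde P \,\|\, P(\cdot \mid z, \theta)\big)
    where D is the Kullback-Leibler divergence.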

8
Lemma 1
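  • Roughly as stated in the paper: for a fixed θ, there is a unique
    distribution P̃ that maximizes F(P̃, θ), namely the posterior
    \tilde P(y) = P(y \mid z, \theta)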
9
Lemma 2
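  • Roughly as stated in the paper: if P̃(y) = P(y | z, θ), then
    F(\tilde P, \theta) = L(\theta) = \log P(z \mid \theta)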
10
EM Algorithms in New Point of View
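  • In this view, iteration t of EM is coordinate ascent on F:
    \text{E step: } \tilde P^{(t)} = \arg\max_{\tilde P} F(\tilde P, \theta^{(t-1)})
    \text{M step: } \theta^{(t)} = \arg\max_{\theta} F(\tilde P^{(t)}, \theta)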
11
The Function F and Likelihood L
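  • From the Kullback-Leibler form of F (or from the two lemmas), since the
    divergence is non-negative,
    F(\tilde P, \theta) \le L(\theta), \text{ with equality iff } \tilde P(y) = P(y \mid z, \theta)
    so F is a lower bound on the log likelihood that every E step makes
    tight, and, as shown in the paper, a (local) maximum of F yields a
    (local) maximum of L.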
12
Incremental Algorithms
  • In typical applications:
  • Z = (Z1, ..., Zn)
  • Y = (Y1, ..., Yn)
  • P(y, z | θ) = ∏i P(yi, zi | θ)
  • Then F decomposes over the data items, as sketched below.
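  • If P̃ is likewise restricted to factorize as P̃(y) = ∏i P̃i(yi) (which
    loses nothing, since the maximizing distribution has this form), F splits
    into a sum of per-item terms:
    F(\tilde P, \theta) = \sum_i F_i(\tilde P_i, \theta), \qquad
    F_i(\tilde P_i, \theta) = E_{\tilde P_i}\big[\log P(y_i, z_i \mid \theta)\big] + H(\tilde P_i)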

13
Incremental Algorithms (contd.)
  • So, the algorithm is as sketched below.
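  • Roughly as described in the paper, iteration t selects a data item i
    (for example, by cycling through the items) and performs
    \text{E step: } \tilde P_i^{(t)}(y_i) = P(y_i \mid z_i, \theta^{(t-1)}),
        \quad \tilde P_j^{(t)} = \tilde P_j^{(t-1)} \text{ for } j \neq i
    \text{M step: } \theta^{(t)} = \arg\max_{\theta} F(\tilde P^{(t)}, \theta)
    Only the single term F_i changes in the E step.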

14
Sufficient Statistics
  • Vector of sufficient statistics:
  • s(y, z) = Σi si(yi, zi)
  • Standard EM algorithm using sufficient statistics (see below).
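  • Roughly as in the paper, when the complete-data model has sufficient
    statistics s, iteration t of standard EM can be written as
    \text{E step: } \bar s^{(t)} = E\big[s(y, z) \,\big|\, z, \theta^{(t-1)}\big]
    \text{M step: } \theta^{(t)} = \hat\theta\big(\bar s^{(t)}\big)
    where \hat\theta(\bar s) is the parameter value that maximizes the
    likelihood given expected sufficient statistics \bar s.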

15
Sufficient Statistics (contd.)
  • Incremental EM using sufficient statistics (see the updates below).
  • Fast convergence.
  • An intermediate variant is about as fast as the incremental algorithm.
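  • Roughly as in the paper, iteration t of the incremental version selects
    an item i and updates only its contribution to the running totals:
    \bar s_i^{(t)} = E\big[s_i(y_i, z_i) \mid z_i, \theta^{(t-1)}\big], \qquad
    \bar s_j^{(t)} = \bar s_j^{(t-1)} \text{ for } j \neq i
    \bar s^{(t)} = \bar s^{(t-1)} - \bar s_i^{(t-1)} + \bar s_i^{(t)}, \qquad
    \theta^{(t)} = \hat\theta\big(\bar s^{(t)}\big)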

16
An Incremental Variant of EM by Nowlan (1991)
  • It reaches the vicinity of the correct answer more rapidly.

17
Demonstration for a Mixture Model
  • A simple mixture of two Gaussians
  • Zi: observed, real-valued variable
  • Yi: unobserved, binary variable
  • Yi indicates from which of the two Gaussian distributions the
    corresponding observed variable was generated.
  • The parameter θ is (α, μ0, σ0, μ1, σ1).

18
Gaussian Mixture Model
  • Sufficient statistics
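A minimal Python sketch of incremental EM for this two-Gaussian mixture,
keeping running totals of per-item expected sufficient statistics, might look
as follows; the function names and the particular five-component statistics
vector are illustrative choices, not taken from the slides.

import numpy as np

def gauss(z, mu, sd):
    # Density of a 1-D Gaussian with mean mu and standard deviation sd.
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def responsibility(z, alpha, mu0, sd0, mu1, sd1):
    # E step for one item: posterior probability that z came from component 1.
    p1 = alpha * gauss(z, mu1, sd1)
    p0 = (1.0 - alpha) * gauss(z, mu0, sd0)
    return p1 / (p0 + p1)

def item_stats(r, z):
    # Expected sufficient statistics contributed by one item, given its
    # responsibility r for component 1.
    return np.array([r, r * z, r * z ** 2, (1 - r) * z, (1 - r) * z ** 2])

def params_from_stats(S, n):
    # M step: map the totals of expected sufficient statistics to the
    # maximizing parameters (alpha, mu0, sd0, mu1, sd1).
    n1, s1, q1, s0, q0 = S
    n0 = n - n1
    alpha = n1 / n
    mu1, mu0 = s1 / n1, s0 / n0
    sd1 = np.sqrt(max(q1 / n1 - mu1 ** 2, 1e-12))
    sd0 = np.sqrt(max(q0 / n0 - mu0 ** 2, 1e-12))
    return alpha, mu0, sd0, mu1, sd1

def incremental_em(z, n_passes=20, seed=0):
    n = len(z)
    r = np.random.default_rng(seed).random(n)   # random initial responsibilities
    S = sum(item_stats(r[i], z[i]) for i in range(n))
    theta = params_from_stats(S, n)
    for _ in range(n_passes):
        for i in range(n):
            # Incremental E step: revise item i only, swapping its old
            # contribution out of the running totals and the new one in.
            S = S - item_stats(r[i], z[i])
            r[i] = responsibility(z[i], *theta)
            S = S + item_stats(r[i], z[i])
            # M step after every item: recompute parameters from the totals.
            theta = params_from_stats(S, n)
    return theta

Recomputing the parameters from the running totals after every single item is
what lets each pass use new information immediately; that is the source of the
speed advantage compared on the next slide, at the cost of roughly doubling
the work per pass.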

19
Standard EM vs. Incremental EM
  • However, the computation time for one pass of incremental EM is about
    twice that of standard EM.

20
Incremental EM vs. Incremental Variant
  • The variant and the incremental algorithm can also be combined.

21
A Sparse Algorithm
  • A sparse variant of the EM algorithm may be advantageous
  • when the unobserved variable, Y, can take on many possible values, but
    only a small set of plausible values has non-negligible probability.
  • Only the plausible values are updated.
  • At infrequent intervals, all probabilities are recomputed and the
    plausible set is revised.
  • In detail, see the sketch below.
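A minimal sketch of the per-item updates, assuming a mixture with many
components; the names sparse_e_step and full_e_step and the size of the
plausible set are illustrative, not taken from the slides.

import numpy as np

def sparse_e_step(weights, plausible, unnorm_post):
    # Normal iteration for one item: `weights` is the current distribution
    # P~ over all values, `plausible` holds the indices of the currently
    # plausible values, and unnorm_post[k] is proportional to P(y = k | z, theta).
    # Mass outside the plausible set stays frozen; mass inside it is
    # redistributed in proportion to the current posterior.
    inside_mass = weights[plausible].sum()
    w = unnorm_post[plausible]
    updated = weights.copy()
    updated[plausible] = inside_mass * w / w.sum()
    return updated

def full_e_step(unnorm_post, keep=5):
    # Occasional full iteration: recompute P~ over all values and pick a new
    # plausible set (here simply the `keep` most probable values).
    weights = unnorm_post / unnorm_post.sum()
    plausible = np.argsort(weights)[-keep:]
    return weights, plausible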

22
Iterations in Sparse Algorithm
  • A normal iteration updates P̃ only over the current plausible set.
  • A full iteration recomputes P̃ over all values and selects a new
    plausible set.

23
Other Variants
  • A winner-take-all variant of the EM algorithm, in which P̃ puts all of
    its mass on the most probable values of the unobserved variables.
  • This can be reasonable in the early stages of maximizing F.
  • It is used, for example, in estimating Hidden Markov Models for speech
    recognition.