# A VIEW OF THE EM ALGORITHM THAT JUSTIFIES INCREMENTAL, SPARSE, AND OTHER VARIANTS


### 1. Title Slide

• RADFORD M. NEAL
• GEOFFREY E. HINTON

### 2. Abstract

• First, the concept of the negative free energy function F is introduced.
• The E step maximizes F with respect to the distribution over the unobserved variables.
• The M step maximizes F with respect to the model parameters.
• From this view, it is easy to justify an incremental variant of the EM algorithm.
• The same view also justifies a sparse algorithm and other variants.

### 3. Introduction

• EM algorithms find maximum likelihood parameter estimates in problems where some variables are unobserved.
• It can be shown that each iteration improves the true likelihood, or leaves it unchanged.
• The M step can be partially implemented: not fully maximizing, but merely improving.
• Generalized EM (GEM) algorithm, ECM.
• The E step can also be partially implemented.
• Incremental EM algorithm.
• The unobserved variables are commonly independent across data items.

### 4. Introduction (cont'd)

• The view of the EM algorithm taken here:
• EM maximizes a joint function of the parameters and of the distribution over the unobserved variables.
• This function is analogous to the free energy function used in statistical physics.
• Its gap from the log likelihood can also be expressed as a Kullback-Leibler divergence.
• The E step maximizes this function with respect to the distribution over the unobserved variables.
• The M step maximizes this function with respect to the model parameters.

### 5. General Theory

• Notation:
• Z: observed variable.
• Y: unobserved variable.
• P(y, z | θ): the joint probability of Y and Z, with parameter θ.
• Given observed data z, we wish to find the value of θ that maximizes the log likelihood, L(θ) = log P(z | θ).
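Written out (reconstructed in standard notation, since the slide's equation images are not in the transcript), the quantity being maximized marginalizes over the unobserved variable:

```latex
L(\theta) \;=\; \log P(z \mid \theta) \;=\; \log \sum_{y} P(y, z \mid \theta)
```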

### 6. EM Algorithm

• The algorithm starts from an initial parameter value θ^(0) and then alternates the following steps.
• Each iteration improves, or leaves unchanged, the true likelihood.
• The algorithm converges to a local maximum of L(θ).
• The GEM algorithm is also guaranteed to converge.
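The two alternating steps referred to above (the slide's equations are missing from the transcript; this is the standard formulation):

```latex
\text{E step:}\quad Q\!\left(\theta \mid \theta^{(t)}\right) \;=\; \mathbb{E}\!\left[\log P(y, z \mid \theta) \,\middle|\, z, \theta^{(t)}\right]
\qquad
\text{M step:}\quad \theta^{(t+1)} \;=\; \arg\max_{\theta}\, Q\!\left(\theta \mid \theta^{(t)}\right)
```

A GEM algorithm relaxes the M step to merely increasing Q rather than maximizing it.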

### 7. A View of Increasing One Function

• The function F:
• the (negative) variational free energy,
• related to the Kullback-Leibler divergence.
• The E step sets P̃(y) = P(y | z, θ).
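The slide's equation images are missing; in the paper's notation, the single function being increased, over distributions P̃ and parameters θ, is:

```latex
F(\tilde{P}, \theta) \;=\; \mathbb{E}_{\tilde{P}}\!\left[\log P(y, z \mid \theta)\right] + H(\tilde{P})
\;=\; L(\theta) - D\!\left(\tilde{P}(y) \,\big\|\, P(y \mid z, \theta)\right)
```

where H is the entropy of P̃ and D is the Kullback-Leibler divergence. F is maximized over P̃ exactly when P̃(y) = P(y | z, θ), at which point F = L(θ).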

### 8. Lemma 1
### 9. Lemma 2
### 10. The EM Algorithm in the New Point of View
### 11. The Function F and the Likelihood L
### 12. Incremental Algorithms

• In typical applications:
• Z = (Z1, ..., Zn)
• Y = (Y1, ..., Yn)
• P(y, z | θ) = ∏i P(yi, zi | θ)
• Then F decomposes into a sum of per-item terms.
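The decomposition (reconstructed; the slide's equation image is missing): with independent data items and P̃ factored as ∏i P̃i(yi), F splits into a sum of local free energies,

```latex
F(\tilde{P}, \theta)
\;=\; \sum_{i=1}^{n} F_i(\tilde{P}_i, \theta)
\;=\; \sum_{i=1}^{n} \Big( \mathbb{E}_{\tilde{P}_i}\!\left[\log P(y_i, z_i \mid \theta)\right] + H(\tilde{P}_i) \Big)
```

so a single P̃i can be updated while the others are held fixed.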

### 13. Incremental Algorithms (cont'd)

• The resulting algorithm:
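The slide's algorithm box is missing from the transcript. Below is a minimal sketch of incremental EM for a two-component Gaussian mixture (function and variable names are my own, not from the slides): each step re-estimates the parameters from running sufficient statistics, then updates the responsibility of one data item and corrects the totals by the change, rather than making a full E-step pass.

```python
import numpy as np

def gauss(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def incremental_em(z, n_passes=20):
    """Incremental EM for a two-Gaussian mixture on 1-D data z.

    Running totals of the sufficient statistics for component 1:
      s0 = sum_i r_i,  s1 = sum_i r_i z_i,  s2 = sum_i r_i z_i**2,
    where r_i = P(y_i = 1 | z_i, theta).  One item's responsibility
    is updated per step; the totals are corrected incrementally.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    # crude initial responsibilities: items above the median lean to component 1
    r = np.where(z > np.median(z), 0.9, 0.1)
    s0, s1, s2 = r.sum(), (r * z).sum(), (r * z ** 2).sum()
    tz, tz2 = z.sum(), (z ** 2).sum()   # fixed data totals
    for _ in range(n_passes):
        for i in range(n):
            # partial M step: parameters from current sufficient statistics
            a = s0 / n
            mu1, mu0 = s1 / s0, (tz - s1) / (n - s0)
            v1 = max(s2 / s0 - mu1 ** 2, 1e-6)
            v0 = max((tz2 - s2) / (n - s0) - mu0 ** 2, 1e-6)
            # partial E step: update item i's responsibility only
            p1 = a * gauss(z[i], mu1, v1)
            p0 = (1 - a) * gauss(z[i], mu0, v0)
            r_new = p1 / (p0 + p1)
            d = r_new - r[i]
            s0, s1, s2 = s0 + d, s1 + d * z[i], s2 + d * z[i] ** 2
            r[i] = r_new
    return a, mu0, np.sqrt(v0), mu1, np.sqrt(v1)
```

On well-separated data, the recovered component means should approach the generating means within a few passes.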

### 14. Sufficient Statistics

• Vector of sufficient statistics: s(y, z) = Σi si(yi, zi).
• The standard EM algorithm can be expressed in terms of the expected sufficient statistics.
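In this form (reconstructed in standard notation; the slide's equations are missing), the E step computes the expected sufficient statistics and the M step sets the parameters from them:

```latex
\text{E step:}\quad \tilde{s}^{(t)} \;=\; \mathbb{E}\!\left[s(y, z) \,\middle|\, z, \theta^{(t)}\right]
\qquad
\text{M step:}\quad \theta^{(t+1)} \;=\; \theta\!\left(\tilde{s}^{(t)}\right)
```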

### 15. Sufficient Statistics (cont'd)

• Incremental EM using sufficient statistics:
• replaces only one item's contribution to the totals at each step, giving fast convergence.
• An intermediate variant, updating a batch of items at a time, is
• as fast as the incremental algorithm.

### 16. An Incremental Variant of EM by Nowlan (1991)

• In the vicinity of the correct answer,
• convergence is more rapid.

### 17. Demonstration for a Mixture Model

• A simple mixture of two Gaussians:
• Zi: observed, real-valued variable.
• Yi: unobserved, binary variable;
• indicates from which of the two Gaussian distributions the corresponding observed variable was generated.
• The parameter vector θ is
• (α, μ0, σ0, μ1, σ1).

### 18. Gaussian Mixture Model

• Sufficient statistics:
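The slide's equations are missing; for this model, a sufficient statistic vector is (reconstructed):

```latex
s(y, z) \;=\; \Big(\textstyle\sum_i y_i,\;\; \sum_i y_i z_i,\;\; \sum_i y_i z_i^2,\;\; \sum_i z_i,\;\; \sum_i z_i^2\Big)
```

and in the E step each yi is replaced by its expected value ri = P(yi = 1 | zi, θ).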

### 19. Standard EM vs. Incremental EM

• Incremental EM converges in fewer passes, but the computation time for one pass of incremental EM is about twice that of standard EM.

### 20. Incremental EM vs. Incremental Variant

• The incremental algorithm and the variant may also be combined.

### 21. A Sparse Algorithm

• A sparse variant of the EM algorithm may be used
• when the unobserved variable, Y, can take on many possible values, but only a small set of plausible values has non-negligible probability.
• Only the probabilities of the plausible values are updated at each step.
• At infrequent intervals, all probabilities are recomputed and the plausible set is revised.
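The slide's detailed procedure is missing from the transcript. A minimal sketch of the sparse E step (function and variable names are my own): responsibilities are recomputed only over a per-item plausible set, the total mass frozen outside the set is carried as a constant, and a full iteration occasionally rebuilds the set.

```python
import numpy as np

def sparse_e_step(logp, r, plausible, frozen_mass):
    """One sparse E step for a single data item.

    logp: length-K array of log joint scores for the K values of Y.
    r: current length-K responsibility vector for this item.
    plausible: indices of the plausible set S.
    frozen_mass: total responsibility frozen outside S (left unchanged).

    Responsibilities inside S are recomputed and renormalised to sum to
    1 - frozen_mass; values outside S are not touched.
    """
    w = np.exp(logp[plausible] - logp[plausible].max())  # stable softmax over S
    r[plausible] = (1.0 - frozen_mass) * w / w.sum()
    return r

def rebuild_plausible(r, threshold=1e-3):
    """Full iteration: recompute the plausible set from full responsibilities."""
    plausible = np.flatnonzero(r >= threshold)
    frozen_mass = r[np.setdiff1d(np.arange(len(r)), plausible)].sum()
    return plausible, frozen_mass
```

A normal iteration calls only `sparse_e_step`, touching |S| values instead of all K; the occasional full iteration pays the O(K) cost of `rebuild_plausible`.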

### 22. Iterations in the Sparse Algorithm

• A normal iteration: updates only the plausible values.
• A full iteration: recomputes all probabilities and revises the plausible set.

### 23. Other Variants

• A winner-take-all variant of the EM algorithm:
• the distribution over each unobserved variable is collapsed onto its most probable value.
• Useful in the early stages of maximizing F.
• Used in estimating Hidden Markov Models for speech recognition.