
A VIEW OF THE EM ALGORITHM THAT JUSTIFIES

INCREMENTAL, SPARSE, AND OTHER VARIANTS

- RADFORD M. NEAL
- GEOFFREY E. HINTON

Abstract

- First, the concept of the negative free energy is introduced.
- The E step maximizes it with respect to the distribution over unobserved variables.
- The M step maximizes it with respect to the model parameters.
- From this view it is easy to justify an incremental variant of the EM algorithm.
- The same holds for a sparse algorithm and other variants.

Introduction

- EM algorithms find maximum likelihood parameter estimates in problems where some variables are unobserved.
- It can be shown that each iteration improves the true likelihood or leaves it unchanged.
- The M step can be partially implemented.
- Not maximizing, but merely improving
- Generalized EM (GEM) algorithm, ECM
- The E step can also be partially implemented.
- Incremental EM algorithm
- The unobserved variables are commonly independent.

Introduction (contd)

- The view of the EM algorithm taken here: maximizing a joint function of the parameters and of the distribution over the unobserved variables.
- This function is analogous to the free energy function used in statistical physics.
- It can also be expressed in terms of the Kullback-Leibler divergence.
- The E step maximizes this function with respect to the distribution over unobserved variables.
- The M step maximizes this function with respect to the model parameters.

General Theory

- Notation
- Z: observed variable
- Y: unobserved variable
- P(y, z | θ): the joint probability of Y and Z
- θ: the model parameters
- Given observed data z, we wish to find the value of θ that maximizes the log likelihood, L(θ) = log P(z | θ).
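As a small numeric illustration (a hypothetical model, not from the slides: Y is a fair binary indicator and Z | Y = k is Gaussian with mean mu[k] and unit variance), the log likelihood is obtained by summing the joint over the unobserved variable:

```python
import math

# Hypothetical toy model: P(y) = 1/2 for y in {0, 1},
# and Z | Y = y is N(mu[y], 1).
mu = [0.0, 4.0]

def log_joint(y, z):
    # log P(y, z | theta) = log P(y) + log N(z; mu[y], 1)
    return math.log(0.5) - 0.5 * math.log(2 * math.pi) - 0.5 * (z - mu[y]) ** 2

def log_likelihood(z):
    # L(theta) = log P(z | theta) = log sum_y P(y, z | theta)
    return math.log(sum(math.exp(log_joint(y, z)) for y in (0, 1)))
```

The sum over y is what EM avoids maximizing directly; it works with the joint instead.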

EM Algorithm

- The EM algorithm starts with an initial guess at the parameters, θ(0), and then proceeds in iterations.
- Each iteration improves the true likelihood or leaves it unchanged.
- The algorithm converges to a local maximum of L(θ).
- The GEM algorithm is also guaranteed to converge.

A View of Increasing One Function

- The function: F(P̃, θ) = E_P̃[log P(y, z | θ)] + H(P̃)
- This is the negative of the variational free energy of statistical physics.
- Equivalently, F(P̃, θ) = L(θ) − D(P̃ ‖ P*), where D is the Kullback-Leibler divergence
- and P*(y) = P(y | z, θ) is the posterior over the unobserved variables.
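The relationship between F, L, and the KL divergence can be checked numerically on a tiny discrete model (the joint probabilities below are hypothetical numbers chosen for the check):

```python
import math

# Hypothetical discrete model: y ranges over {0, 1}, and the joint
# P(y, z | theta) at the fixed observed z is given directly.
joint = {0: 0.3, 1: 0.1}               # P(y, z | theta) for the observed z
L = math.log(sum(joint.values()))      # L(theta) = log P(z | theta)

def F(p_tilde):
    # F(P~, theta) = E_P~[log P(y, z | theta)] + H(P~)
    return sum(p * (math.log(joint[y]) - math.log(p))
               for y, p in p_tilde.items() if p > 0)

# The E step's maximizer of F is the posterior P(y | z, theta).
posterior = {y: v / sum(joint.values()) for y, v in joint.items()}
```

At the posterior, F equals L exactly; at any other distribution it is strictly smaller, which is the content of the two lemmas.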

Lemma 1

- For a fixed value of θ, F(P̃, θ) is maximized with respect to P̃ by the posterior distribution, P̃(y) = P(y | z, θ).

Lemma 2

- At this maximizing distribution, F(P̃, θ) = L(θ); in general, F(P̃, θ) ≤ L(θ).

The EM Algorithm in the New Point of View

- E step: set P̃(t) to the distribution that maximizes F(P̃, θ(t−1)), namely P(y | z, θ(t−1)).
- M step: set θ(t) to the value that maximizes F(P̃(t), θ).

The Function F and Likelihood L

- Since F = L − D(P̃ ‖ P*), F never exceeds L and touches it whenever P̃ equals the posterior, so an EM iteration cannot decrease L.

Incremental Algorithms

- In typical applications, the data items are independent:
- Z = (Z1, ..., Zn)
- Y = (Y1, ..., Yn)
- P(y, z | θ) = ∏i P(yi, zi | θ)
- Then F decomposes into a sum of per-item terms, F(P̃, θ) = Σi Fi(P̃i, θ), with P̃ factored as ∏i P̃i(yi).

Incremental Algorithms (contd)

- So the incremental algorithm is: at each iteration, pick one data item i, perform the E step for that item only (set P̃i(yi) = P(yi | zi, θ)), and then perform the M step with the other P̃j left unchanged.

Sufficient Statistics

- Vector of sufficient statistics: s(y, z) = Σi si(yi, zi)
- Standard EM using sufficient statistics: the E step computes the expected sufficient statistics under P̃, and the M step sets θ from them.

Sufficient Statistics(contd)

- Incremental EM using sufficient statistics: only item i's contribution is recomputed and swapped into the running total before each M step.
- Fast convergence
- The intermediate variant is as fast as the incremental algorithm.
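As a concrete sketch (hypothetical data, and a deliberately minimal model in which two unit-variance Gaussians have known means 0 and 4, so the mixing proportion α is the only parameter), incremental EM keeps the total sufficient statistic as a running sum and swaps in one item's recomputed contribution per iteration:

```python
import math

def resp(z, alpha, mu=(0.0, 4.0)):
    # E step for a single item: P(Y_i = 1 | z_i, theta).
    # Known means and unit variances are an assumption made here
    # so that alpha is the only parameter to re-estimate.
    w0 = (1 - alpha) * math.exp(-0.5 * (z - mu[0]) ** 2)
    w1 = alpha * math.exp(-0.5 * (z - mu[1]) ** 2)
    return w1 / (w0 + w1)

data = [-0.3, 0.1, 0.2, 3.8, 4.1]        # hypothetical observations
alpha = 0.5                               # initial guess
s_i = [resp(z, alpha) for z in data]      # per-item sufficient statistics
s = sum(s_i)                              # running total, s = sum_i s_i

for sweep in range(20):
    for i, z in enumerate(data):
        new = resp(z, alpha)              # E step for item i only
        s += new - s_i[i]                 # swap its contribution into s
        s_i[i] = new
        alpha = s / len(data)             # M step from the running total
```

The M step after every single-item E step is what lets new information take effect immediately, instead of waiting for a full pass.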

An Incremental Variant of EM by Nowlan (1991)

- In the vicinity of the correct answer, convergence is more rapid.

Demonstration for a Mixture Model

- A simple two-Gaussian mixture
- Zi: observed, real-valued variable
- Yi: unobserved, binary variable indicating from which of the two Gaussian distributions the corresponding observed variable was generated.
- The parameter vector θ is (α, μ0, σ0, μ1, σ1).

Gaussian Mixture Model

- Sufficient statistics: the expected count for each component and the weighted first and second moments of the data, e.g. Σi ri, Σi ri zi, Σi ri zi², where ri = P̃i(Yi = 1).
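A runnable sketch of standard EM for the two-Gaussian mixture (with the simplifying assumption of fixed unit variances, so only α, μ0, μ1 are re-estimated; the data below are hypothetical):

```python
import math

def em_mixture(data, alpha, mu0, mu1, iters=50):
    """Standard EM for a two-Gaussian mixture.  Unit variances are
    assumed for brevity; the full model would also update sigma0 and
    sigma1 from the second moments."""
    for _ in range(iters):
        # E step: responsibilities r_i = P(Y_i = 1 | z_i, theta)
        r = []
        for z in data:
            w0 = (1 - alpha) * math.exp(-0.5 * (z - mu0) ** 2)
            w1 = alpha * math.exp(-0.5 * (z - mu1) ** 2)
            r.append(w1 / (w0 + w1))
        # M step: re-estimate parameters from expected counts/moments
        n1 = sum(r)
        n0 = len(data) - n1
        alpha = n1 / len(data)
        mu0 = sum((1 - ri) * z for ri, z in zip(r, data)) / n0
        mu1 = sum(ri * z for ri, z in zip(r, data)) / n1
    return alpha, mu0, mu1

data = [-0.2, 0.0, 0.3, 3.7, 4.0, 4.2]    # hypothetical sample
alpha, mu0, mu1 = em_mixture(data, 0.5, -1.0, 5.0)
```

Each E step computes every responsibility from scratch; this is the full pass that the incremental variant avoids.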

Standard EM vs. Incremental EM

- However, a pass of incremental EM takes about twice the computation time of a pass of standard EM.

Incremental EM vs. Incremental Variant

- The variant and the incremental algorithm may be combined.

A Sparse Algorithm

- A sparse variant of the EM algorithm may be advantageous when the unobserved variable, Y, can take on many possible values, but only a small set of plausible values has non-negligible probability.
- Only the plausible values are updated.
- At infrequent intervals, all probabilities are recomputed and the plausible set is revised.

Iterations in Sparse Algorithm

- A normal iteration updates probabilities only within each item's plausible set.
- A full iteration recomputes all probabilities and selects a new plausible set.
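A sketch of the two iteration types in a hypothetical setting where Y ranges over ten components (assumed means 0..9, unit variance, uniform prior). A normal iteration renormalizes probability only within the plausible set, freezing the mass held by the other values:

```python
import math

MEANS = list(range(10))  # assumed component means for this sketch

def full_posterior(z):
    # full iteration: recompute P(y | z, theta) for every value of y
    w = [math.exp(-0.5 * (z - m) ** 2) for m in MEANS]
    t = sum(w)
    return [x / t for x in w]

def plausible_set(post, k=3):
    # keep the k most probable values as the plausible set
    return sorted(range(len(post)), key=lambda y: -post[y])[:k]

def sparse_update(post, plausible, z):
    # normal iteration: redistribute only the probability mass that is
    # currently inside the plausible set; frozen values keep their old
    # probabilities.  (In real use theta would have changed since the
    # last full iteration, altering the weights below.)
    w = {y: math.exp(-0.5 * (z - MEANS[y]) ** 2) for y in plausible}
    mass = sum(post[y] for y in plausible)
    t = sum(w.values())
    new = list(post)
    for y in plausible:
        new[y] = mass * w[y] / t
    return new
```

The saving comes from `sparse_update` touching k values instead of all of them, while the frozen mass keeps the distribution normalized.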

Other Variants

- Winner-take-all variant of the EM algorithm: each P̃i places all its mass on a single most probable value of Yi.
- Useful in the early stages of maximizing F.
- Used in estimating Hidden Markov Models for speech recognition (Viterbi-style training).
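A minimal sketch of the winner-take-all idea on a toy two-mean mixture (my own illustration, not the slides' HMM setting): the hard E step reduces to nearest-mean assignment, giving a k-means-like procedure:

```python
def hard_em(data, mu0, mu1, iters=10):
    """Winner-take-all EM sketch: the E step puts all of P~_i's mass
    on the single most probable value of Y_i, which for two assumed
    unit-variance Gaussians means assigning z to the nearer mean."""
    for _ in range(iters):
        # hard E step: winner-take-all assignment of each item
        assign = [0 if abs(z - mu0) < abs(z - mu1) else 1 for z in data]
        # M step: each mean becomes the average of its assigned items
        g0 = [z for z, a in zip(data, assign) if a == 0]
        g1 = [z for z, a in zip(data, assign) if a == 1]
        if g0:
            mu0 = sum(g0) / len(g0)
        if g1:
            mu1 = sum(g1) / len(g1)
    return mu0, mu1
```

Because the hard assignment discards posterior uncertainty, this only approximately maximizes F, which is why it is suggested for the early stages.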