1
Expectation-Maximization
  • Markoviana Reading Group
  • Fatih Gelgi, ASU, 2005

2
Outline
  • What is EM?
  • Intuitive Explanation
  • Example: Gaussian Mixture
  • Algorithm
  • Generalized EM
  • Discussion
  • Applications
    • HMM: Baum-Welch
    • K-means

3
What is EM?
  • Two main applications:
  • The data has missing values, due to problems with or
    limitations of the observation process.
  • Optimizing the likelihood function is extremely hard,
    but the likelihood function can be simplified by assuming
    the existence of, and values for, additional missing or
    hidden parameters.

4
Key Idea
  • The observed data U is generated by some distribution
    and is called the incomplete data.
  • Assume that a complete data set Z = (U, J) exists, where
    J is the missing or hidden data.
  • Maximize the posterior probability of the parameters θ
    given the data U, marginalizing over J, as shown below.
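A standard statement of this objective in the deck's notation (a reconstruction, following the Dellaert tutorial cited in the references):

\[
\theta^{*} \;=\; \arg\max_{\theta}\, P(\theta \mid U)
\;=\; \arg\max_{\theta} \sum_{J} P(\theta, J \mid U)
\]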

5
Intuitive Explanation of EM
  • Alternate between estimating the unknowns θ and the
    hidden variables J.
  • In each iteration, instead of finding the single best
    J ∈ 𝒥, compute a distribution over the space 𝒥.
  • EM is a lower-bound maximization process (Minka, 1998).
  • E-step: construct a local lower bound to the posterior
    distribution.
  • M-step: optimize the bound.

6
Intuitive Explanation of EM
  • Lower-bound approximation method.
  • Sometimes provides faster convergence than gradient
    descent and Newton's method.

7
Example: Mixture Components
8
Example (cont'd): True Likelihood of Parameters
9
Example (cont'd): Iterations of EM
10
Lower-bound Maximization
  • Posterior probability ∝ joint distribution: maximizing
    log P(θ|U) over θ is the same as maximizing log P(U, θ).
  • Idea: start with a guess θ^t, compute an easily computed
    lower bound B(θ; θ^t) to the function log P(θ|U), and
    maximize the bound instead.
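In symbols (a standard reconstruction in the deck's notation):

\[
\log P(\theta \mid U) \;=\; \log P(U, \theta) \;-\; \log P(U)
\]

Since log P(U) is constant in θ, it suffices to maximize the log-joint

\[
\log P(U, \theta) \;=\; \log \sum_{J} P(U, J, \theta),
\]

which is hard because the sum over the hidden data sits inside the logarithm.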

11
Lower-bound Maximization (cont.)
  • Construct a tractable lower bound B(θ; θ^t) that
    contains a sum of logarithms.
  • f^t(J) is an arbitrary probability distribution over the
    hidden data J.
  • By Jensen's inequality, the bound below holds.
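A reconstruction of the bound, following Minka (1998): for any distribution f^t(J),

\[
\log P(U, \theta)
\;=\; \log \sum_{J} f^t(J)\,\frac{P(U, J, \theta)}{f^t(J)}
\;\ge\; \sum_{J} f^t(J)\,\log \frac{P(U, J, \theta)}{f^t(J)}
\;=\; B(\theta; \theta^t),
\]

where the inequality is Jensen's, applied to the concave logarithm; the bound is now a sum of logarithms.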

12
Optimal Bound
  • B(θ; θ^t) touches the objective function log P(U, θ)
    at θ^t.
  • Maximize B(θ^t; θ^t) with respect to f^t(J).
  • Introduce a Lagrange multiplier λ to enforce the
    constraint that f^t(J) sums to one.
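The resulting constrained problem (a reconstruction in the same notation):

\[
\max_{f^t}\; \sum_{J} f^t(J)\,\log \frac{P(U, J, \theta^t)}{f^t(J)}
\;+\; \lambda \Big( \sum_{J} f^t(J) - 1 \Big)
\]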

13
Optimal Bound (cont.)
  • Take the derivative with respect to f^t(J).
  • The bound is maximized at the posterior over the hidden
    data, as derived below.
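A reconstruction of the derivation: setting the derivative of the Lagrangian to zero,

\[
\log P(U, J, \theta^t) \;-\; \log f^t(J) \;-\; 1 \;+\; \lambda \;=\; 0,
\]

so f^t(J) ∝ P(U, J, θ^t), and normalizing gives

\[
f^t(J) \;=\; \frac{P(U, J, \theta^t)}{\sum_{J'} P(U, J', \theta^t)}
\;=\; P(J \mid U, \theta^t).
\]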

14
Maximizing the Bound
  • Re-write B(θ; θ^t) in terms of expectations over f^t(J).
  • The θ-dependent part is the expected complete-data
    log-likelihood Q(θ; θ^t).
  • Finally, maximize Q(θ; θ^t) with respect to θ, as shown
    below.
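The corresponding equations, reconstructed in the same notation:

\[
B(\theta; \theta^t)
\;=\; \underbrace{\sum_{J} f^t(J)\,\log P(U, J, \theta)}_{Q(\theta;\,\theta^t)}
\;-\; \sum_{J} f^t(J)\,\log f^t(J),
\]

where the second (entropy) term does not depend on θ. Finally,

\[
\theta^{t+1} \;=\; \arg\max_{\theta}\, B(\theta; \theta^t)
\;=\; \arg\max_{\theta}\, Q(\theta; \theta^t).
\]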

15
EM Algorithm
  • EM converges to a local maximum of log P(U, θ), and
    hence to a local maximum of log P(θ|U).
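As a concrete illustration, here is a minimal EM sketch for a 1-D mixture of two Gaussians, in the spirit of the deck's running example. The data, initialization, and closed-form M-step updates are assumptions (the standard GMM updates), not taken from the slides:

```python
import numpy as np

# Observed (incomplete) data U: samples from a two-component mixture.
rng = np.random.default_rng(0)
U = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

# Initial guess theta^0 = (mixing weights w, means mu, variances var).
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for t in range(100):
    # E-step: f^t(J) = P(J | U, theta^t) -- per-point component responsibilities.
    pdf = np.exp(-((U[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = w * pdf
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: theta^{t+1} = argmax_theta Q(theta; theta^t), closed form for a GMM.
    Nk = resp.sum(axis=0)
    w = Nk / len(U)
    mu = (resp * U[:, None]).sum(axis=0) / Nk
    var = (resp * (U[:, None] - mu) ** 2).sum(axis=0) / Nk

print(w, mu, var)  # settles near a local maximum of log P(U, theta)
```

Each iteration provably does not decrease log P(U, θ), which is why EM finds a local (not necessarily global) maximum.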

16
A Relation to the Log-Posterior
  • An alternative view: compute the expected log-posterior
    over the hidden data.
  • Maximizing it with respect to θ yields the same update,
    as shown below.
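A reconstruction of the alternative form:

\[
\sum_{J} f^t(J)\,\log P(\theta \mid U, J)
\]

Since log P(θ|U, J) = log P(U, J, θ) − log P(U, J) and the second term is constant in θ, maximizing this expected log-posterior over θ gives the same θ^{t+1} as maximizing B(θ; θ^t).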

17
Generalized EM
  • Assume log P(U, θ) and the bound B(θ; θ^t) are
    differentiable in θ. The EM likelihood then converges to
    a stationary point where ∂ log P(U, θ)/∂θ = 0.
  • GEM: instead of setting θ^{t+1} = argmax_θ B(θ; θ^t),
    just find θ^{t+1} such that
    B(θ^{t+1}; θ^t) > B(θ^t; θ^t).
  • GEM is also guaranteed to converge.

18
HMM: Baum-Welch Revisited
Estimate the parameters λ = (a, b, π) such that the expected
number of correct individual states is maximized.
γ_t(i) is the probability of being in state S_i at time t.
ξ_t(i,j) is the probability of being in state S_i at time t,
and in state S_j at time t+1.
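In symbols (a reconstruction; O is the observation sequence and λ the current model):

\[
\gamma_t(i) \;=\; P(q_t = S_i \mid O, \lambda), \qquad
\xi_t(i,j) \;=\; P(q_t = S_i,\, q_{t+1} = S_j \mid O, \lambda),
\]

so that γ_t(i) = Σ_j ξ_t(i,j) for t < T.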
19
Baum-Welch E-step
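The E-step quantities in standard Rabiner notation (assumed here, since the slide's equations are not reproduced), using the forward variables α_t(i) and backward variables β_t(i) from the forward-backward algorithm:

\[
\gamma_t(i) \;=\; \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{k=1}^{N} \alpha_t(k)\,\beta_t(k)},
\qquad
\xi_t(i,j) \;=\; \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}
{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}
\]
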
20
Baum-Welch M-step
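The standard re-estimation formulas (a reconstruction in the same notation):

\[
\bar{\pi}_i \;=\; \gamma_1(i), \qquad
\bar{a}_{ij} \;=\; \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) \;=\; \frac{\sum_{t\,:\,O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\]

Each ratio is an expected count divided by an expected number of opportunities; this is the closed-form maximization of Q(θ; θ^t) for the HMM.
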
21
K-Means
  • Problem: given data X and the number of clusters K, find
    the clusters.
  • Clustering is based on centroids: a point belongs to the
    cluster with the closest centroid.
  • Hidden variables: the cluster assignments (the centroids
    play the role of the parameters θ).

22
K-Means (cont.)
  • Start with initial centroids θ^0.
  • E-step: split the data into K clusters according to
    distances to the centroids (calculate the distribution
    f^t(J)).
  • M-step: update the centroids (calculate θ^{t+1}); a
    sketch of the procedure follows below.
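A minimal K-means sketch in the EM framing above. The function name, the synthetic data, and the random initialization are illustrative assumptions, not from the slides:

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # theta^0
    for _ in range(iters):
        # "E-step": hard-assign each point to its nearest centroid
        # (a degenerate, point-mass f^t(J)).
        labels = np.linalg.norm(
            X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        # "M-step": recompute each centroid as the mean of its cluster
        # (theta^{t+1}); keep the old centroid if a cluster goes empty.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels

# Example usage on two well-separated synthetic blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
centroids, labels = kmeans(X, K=2)
print(centroids)
```

K-means is EM with hard assignments: the responsibility distribution of the E-step collapses to a point mass on the nearest centroid.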

23
K-Means Example (K = 2)
(Figure: clusters are repeatedly reassigned to the nearest
centroid until the algorithm converges.)
24
Discussion
  • Is EM a Primal-Dual algorithm?

25
Reference
  • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum
    Likelihood from Incomplete Data via the EM Algorithm,"
    Journal of the Royal Statistical Society, Series B
    (Methodological), Vol. 39, No. 1 (1977), pp. 1-38.
  • F. Dellaert, "The Expectation Maximization Algorithm,"
    Tech. Rep. GIT-GVU-02-20, 2002.
  • T. Minka, "Expectation-Maximization as Lower Bound
    Maximization," 1998.
  • Y. Chang and M. Kölsch, presentation: "Expectation
    Maximization," UCSB, 2002.
  • K. Andersson, presentation: "Model Optimization Using
    the EM Algorithm," COSC 7373, 2001.

26
Thanks!