1
Expectation Maximization Algorithm
  • Davis Zhou
College of Information Science and Technology
    Drexel University

2
Outline
  • EM Example 1
  • EM Example 2
  • EM Modeling
  • Comments on EM

3
Example1-1
  • Description
  • There is a jar with balls of three different
    colors. The probability of drawing a red ball is
    p1, a green ball p2, and a blue ball p3. After a
    ball is picked it is returned to the jar. In an
    experiment, a person picked N balls (n1 red, n2
    green and n3 blue). Assume p1 = 1/4, p2 = 1/4 + p/4,
    p3 = 1/2 - p/4. Estimate the parameter p.

4
Example1-2
  • Maximum Likelihood Estimate
  • By maximizing the likelihood, p = (2n2 - n3)/(n2 + n3)
    (a numerical check of this formula follows below)
  • Variation
  • Assume that the man doing the experiment is
    actually colorblind and cannot distinguish the red
    balls from the green ones. He draws N balls, but
    only sees m1 = n1 + n2 red/green balls and
    m2 = n3 blue balls. Can the man still estimate the
    parameter p and the numbers of red and green balls?
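
As a quick check of the closed form above, here is a minimal sketch (the counts n1, n2, n3 are illustrative assumptions, not values from the slides) comparing p = (2n2 - n3)/(n2 + n3) with a brute-force maximization of the complete-data log-likelihood:

    import numpy as np

    # Illustrative counts (assumed, not from the slides): n1 red, n2 green, n3 blue.
    n1, n2, n3 = 25, 38, 37

    # Complete-data log-likelihood (up to a constant):
    # l(p) = n1*log(1/4) + n2*log(1/4 + p/4) + n3*log(1/2 - p/4)
    def loglik(p):
        return (n1 * np.log(0.25)
                + n2 * np.log(0.25 + p / 4)
                + n3 * np.log(0.5 - p / 4))

    # Closed-form MLE from this slide.
    p_closed = (2 * n2 - n3) / (n2 + n3)

    # Brute-force check over a fine grid of admissible values 0 < p < 2.
    grid = np.linspace(1e-6, 2 - 1e-6, 200001)
    p_grid = grid[np.argmax(loglik(grid))]

    print(p_closed, p_grid)   # both about 0.52 for these counts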

5
Example1-3
  • MLE Again
  • By maximizing the likelihood of the observed data,
    p = 2(m1 - m2)/(m1 + m2)
  • E(n1|m1) = m1 · p1/(p1 + p2) = (m1 + m2)/4
  • E(n2|m1) = m1 · p2/(p1 + p2) = (3m1 - m2)/4
  • E.g. m1 = 63, m2 = 37, p = 0.52, E(n1|m1) = 25,
    E(n2|m1) = 38 (a short sketch of these computations
    follows below)
  • EM
  • Complete data: (n1, n2, n3, m1, m2); n1 and n2 are
    hidden variables
  • Observed data: (m1, m2)
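
A minimal sketch of the colorblind case, using the numbers quoted on this slide (m1 = 63, m2 = 37); it reproduces p = 0.52, E(n1|m1) = 25 and E(n2|m1) = 38:

    # Only m1 = n1 + n2 (red/green) and m2 = n3 (blue) are observed.
    m1, m2 = 63, 37

    # MLE of p from the observed data alone.
    p = 2 * (m1 - m2) / (m1 + m2)       # 0.52

    # Category probabilities implied by this p.
    p1 = 0.25                           # red
    p2 = 0.25 + p / 4                   # green

    # Conditional expectations of the hidden counts given m1.
    E_n1 = m1 * p1 / (p1 + p2)          # (m1 + m2)/4 = 25
    E_n2 = m1 * p2 / (p1 + p2)          # (3*m1 - m2)/4 = 38

    print(p, E_n1, E_n2)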

6
Example1-4
  • EM
  • E-Step
  • M-Step

7
Example1-5
  • Iteration
  • Initializations: m1 = 63, m2 = 37, p0 = 0 (a sketch
    of the resulting E-step/M-step loop follows below)
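
The iteration table on this slide was a figure. A minimal sketch of one natural form of the loop, assuming the E-step fills in E(n2 | m1, p_t) and the M-step reuses the complete-data formula p = (2n2 - n3)/(n2 + n3) with the expected count, is:

    m1, m2 = 63, 37            # observed counts (slide initialization)
    p = 0.0                    # p0 = 0

    for t in range(100):
        # E-step: expected number of green balls among the m1 red/green
        # draws under the current estimate of p.
        p1 = 0.25
        p2 = 0.25 + p / 4
        e_n2 = m1 * p2 / (p1 + p2)

        # M-step: plug the expected count into the complete-data MLE,
        # p = (2*n2 - n3)/(n2 + n3), with n3 = m2 observed.
        p_new = (2 * e_n2 - m2) / (e_n2 + m2)

        if abs(p_new - p) < 1e-10:    # stop once the update stops changing
            break
        p = p_new

    print(p)   # converges to 0.52, the same value as the observed-data MLE

Each pass can only increase the observed-data likelihood (the EM property recalled later in the slides), and here the estimates settle at p = 0.52.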

8
Example1-6
  • Conclusions
  • EM is an extension of MLE to problems with hidden
    variables.
  • We used both MLE and EM and obtained the same
    estimates.
  • In this example, it is easy to obtain the analytic
    likelihood function of the observed data. In most
    cases, the likelihood function of the observed data
    is very complex or difficult to obtain, while the
    analytic likelihood function of the complete data
    is usually simple. Therefore we turn to the EM
    algorithm instead.

9
Example2-1
  • Description
  • We observe a series of coin tosses which we
    assume have been generated in the following way:
    a person has two coins in her pocket. Coin 1 has
    probability of heads p1, coin 2 has probability of
    heads p2. At each point she chooses coin 1 with
    probability λ, coin 2 with probability 1 - λ, and
    tosses it 3 times. Thus the observed data is a
    sequence of triples of coin tosses, e.g.
    Y = <HHH>, <TTT>, <HHH>, <TTT>. The complete data
    X, if we could observe it, would additionally
    show the coin chosen at each step, e.g. X =
    <HHH,1>, <TTT,2>, <HHH,1>, <TTT,2>. The
    parameters, all of which are to be estimated, are
    Θ = (λ, p1, p2). (A simulation sketch of this
    generative process follows below.)
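
A minimal simulation sketch of this generative process (the parameter values and sample size are illustrative assumptions, not from the slides):

    import random

    lam, p1, p2 = 0.3, 0.9, 0.1        # assumed illustrative parameters
    random.seed(0)

    def draw_triple():
        # Choose coin 1 with probability lam, otherwise coin 2, then toss it 3 times.
        coin = 1 if random.random() < lam else 2
        p_heads = p1 if coin == 1 else p2
        tosses = "".join("H" if random.random() < p_heads else "T" for _ in range(3))
        return tosses, coin

    complete = [draw_triple() for _ in range(4)]     # X: each triple plus the coin used
    observed = [t for t, _ in complete]              # Y: the triples alone
    print(complete)
    print(observed)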

10
Example2-2
  • Expectation Step

11
Example2-3
  • Expectation Step

12
Example2-4
  • Maximization Step
  • Maximizing this function by setting the
    derivatives with respect to λ, p1 and p2 to zero
    gives the re-estimation formulae for the three
    parameters (one plausible form is sketched below).
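
The E-step and M-step formulae on slides 10-12 were figures that did not survive extraction. A minimal sketch of the standard updates for this two-coin mixture (posterior responsibilities in the E-step, responsibility-weighted counts in the M-step) is given below; the function name and starting values are illustrative, not from the slides.

    def em_two_coins(triples, lam, p1, p2, iters=200):
        # Each observed triple is a string such as "HHT"; keep only its head count.
        heads = [s.count("H") for s in triples]
        for _ in range(iters):
            # E-step: posterior probability that coin 1 produced each triple.
            resp = []
            for h in heads:
                a = lam * p1 ** h * (1 - p1) ** (3 - h)          # coin 1
                b = (1 - lam) * p2 ** h * (1 - p2) ** (3 - h)    # coin 2
                resp.append(a / (a + b))
            # M-step: re-estimate lam, p1, p2 from the expected counts.
            lam = sum(resp) / len(resp)
            p1 = sum(r * h for r, h in zip(resp, heads)) / (3 * sum(resp))
            p2 = (sum((1 - r) * h for r, h in zip(resp, heads))
                  / (3 * sum(1 - r for r in resp)))
        return lam, p1, p2

    # The observed sequence from slide 9: Y = <HHH>, <TTT>, <HHH>, <TTT>.
    print(em_two_coins(["HHH", "TTT", "HHH", "TTT"], lam=0.4, p1=0.6, p2=0.5))

For this data the iteration settles near λ = 0.5, p1 = 1, p2 = 0: coin 1 ends up explaining the all-heads triples and coin 2 the all-tails triples.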

13
Intuitive EM Model
  • Complete Data vs. Observed Data
  • In general, log f(X|Θ) (the p.d.f. of the complete
    data) will have an easily defined, analytically
    solvable maximum, but maximization of L(Θ) (the
    likelihood function of the observed data) has no
    analytic solution.
  • Expectation vs. Likelihood
  • If we had the complete data, we would simply
    estimate Θ to maximize log f(X|Θ). But with some
    of the complete data missing, we instead maximize
    the expectation of log f(X|Θ) given the observed
    data and the current value Θ_t.

14
Formalized EM Model
  • Model
  • Y is the observed data, X is the complete data.
  • If the distribution f(X|Θ) is well defined, the
    probability of Y given Θ is given by the expression
    reconstructed below.
  • EM attempts to solve the following problem: given
    that a sample from Y is observed but the
    corresponding X is unobserved, or hidden, find the
    maximum-likelihood estimate Θ_ML, which
    maximizes L(Θ) = log g(Y|Θ).
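
The expression for the probability of Y given Θ was a figure on this slide. In the standard formulation (Dempster, Laird and Rubin, cited on slide 19), the observed-data density is obtained by integrating (or summing) the complete-data density over all X consistent with the observed Y; a reconstruction in that form:

    g(Y \mid \Theta) = \int_{\mathcal{X}(Y)} f(X \mid \Theta)\, dX,
    \qquad
    L(\Theta) = \log g(Y \mid \Theta),

where \mathcal{X}(Y) denotes the set of complete-data values X that map to the observed Y.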

15
Solution to EM Model
  • Rationale
  • By maximizing Q(Θ, Θ_t) we achieve the goal of
    maximizing L(Θ), because each update Θ_t → Θ_(t+1)
    guarantees L(Θ_(t+1)) ≥ L(Θ_t).
  • Expectation
  • Maximization (the standard forms of both steps are
    reconstructed below)
  • Iteration
  • Iterate the expectation step and the maximization
    step until the change ΔL(Θ) falls below the given
    threshold.
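
The Expectation and Maximization formulae were figures on this slide; their standard forms, consistent with key points (4) and (5) on the next slide, are:

    \text{E-step:}\quad
    Q(\Theta, \Theta_t)
      = E\big[\log f(X \mid \Theta) \,\big|\, Y, \Theta_t\big]
      = \int \log f(X \mid \Theta)\; p(X \mid Y, \Theta_t)\, dX

    \text{M-step:}\quad
    \Theta_{t+1} = \arg\max_{\Theta} Q(\Theta, \Theta_t)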

16
Key Points of EM Modeling
  • (1) Identify Y (the observed data) and X (the
    complete data, containing the hidden variables).
  • (2) Define the probability density function of the
    complete data, f(X|Θ).
  • (3) Define the conditional probability of X
    given Y and Θ, i.e. p(X|Y, Θ).
  • (4) Compute the expectation of log f(X|Θ) with
    respect to p(X|Y, Θ_t), i.e.
    Q(Θ, Θ_t) = E[log f(X|Θ) | Y, Θ_t].
  • (5) Obtain the new estimate of Θ by maximizing Q.
  • (6) Repeat (4) and (5) until the change ΔL(Θ)
    falls below the given threshold. (A skeleton of
    this loop is sketched below.)
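
A minimal skeleton of steps (4)-(6) as a reusable loop; e_step, m_step and loglik are placeholder callables supplied by the caller (the names are mine, not from the slides):

    def em(theta0, e_step, m_step, loglik, tol=1e-6, max_iter=1000):
        # e_step(theta_t) -> statistics of p(X | Y, theta_t) needed by Q   (step 4)
        # m_step(stats)   -> new theta maximizing Q(theta, theta_t)        (step 5)
        # loglik(theta)   -> observed-data log-likelihood L(theta)         (step 6)
        theta, prev = theta0, loglik(theta0)
        for _ in range(max_iter):
            stats = e_step(theta)
            theta = m_step(stats)
            cur = loglik(theta)
            if abs(cur - prev) < tol:   # stop when the change in L(theta) is small
                break
            prev = cur
        return theta

For Example 1, e_step would return E(n2 | m1, p_t), m_step would apply p = (2·E(n2) - m2)/(E(n2) + m2), and loglik(p) would be m1·log(1/2 + p/4) + m2·log(1/2 - p/4).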

17
Advantages of EM Algorithm
  • It leads to a steady increase in the likelihood
    of the observed data.
  • It is numerically stable and avoids overshooting
    or undershooting the maximum of the likelihood
    along the current direction.
  • It handles parameter constraints in the solution
    to the M-Step, e.g., it maximizes over a set that
    satisfies the constraints.

18
Disadvantages of EM Algorithm
  • Very slow convergence in the neighborhood of the
    optimal point; the rate of convergence depends on
    the amount of missing data in the problem.
  • Convergence of the parameter estimates is
    guaranteed under mild conditions, but the estimate
    will likely converge to a local maximum; a global
    maximum can usually be reached by starting with a
    good initial estimate.

19
References
  • Dempster, A., Laird, N., and Rubin, D., "Maximum
    Likelihood from Incomplete Data via the EM
    Algorithm", Journal of the Royal Statistical
    Society, Series B, 39(1):1-38, 1977.
  • Sean Fain, "Maximum Likelihood Estimation with the
    EM Algorithm", lecture notes, 2004.
  • Frank Dellaert, "The Expectation Maximization
    Algorithm", lecture notes, 2002.
  • Jeff A. Bilmes, "A Gentle Tutorial of the EM
    Algorithm and its Application to Parameter
    Estimation for Gaussian Mixture and Hidden Markov
    Models", lecture notes, 1998.
  • Michael Collins, "The EM Algorithm", lecture notes,
    1997.
  • Jiangsheng Yu, "Expectation Maximization: An
    Approach to Parameter Estimation", presentation.
  • ChengXiang Zhai, "A Note on the
    Expectation-Maximization (EM) Algorithm", lecture
    notes, 2003.