1
Expectation Maximization Algorithm
  • Davis Zhou
College of Information Science and Technology
    Drexel University

2
Outline
  • EM Example 1
  • EM Example 2
  • EM Modeling
  • Comments on EM

3
Example1-1
  • Description
  • There is a jar with balls of three different
    colors. The probability of drawing a red ball is
    p1, a green ball p2, and a blue ball p3. After a
    ball is picked it is returned to the jar. In an
    experiment, a person picked N balls (n1 red, n2
    green and n3 blue). Assume p1 = 1/4, p2 = 1/4 + p/4,
    p3 = 1/2 - p/4. Estimate the parameter p.

4
Example1-2
  • Maximum Likelihood Estimate
  • By maximizing the likelihood, p = (2n2 - n3)/(n2 + n3)
    (a numerical check of this formula follows below)
  • Variation
  • Assume that the man doing the experiment is
    actually colorblind and cannot distinguish the red
    balls from the green ones. He draws N balls, but
    only sees m1 = n1 + n2 red/green balls and
    m2 = n3 blue balls. Can the man still estimate the
    parameter p and the numbers of red and green balls?
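
As a quick check of the closed form above, here is a minimal sketch (the counts n1, n2, n3 are illustrative assumptions, not values from the slides) comparing p = (2n2 - n3)/(n2 + n3) with a brute-force maximization of the complete-data log-likelihood:

    import numpy as np

    # Illustrative counts (assumed, not from the slides): n1 red, n2 green, n3 blue.
    n1, n2, n3 = 25, 38, 37

    # Complete-data log-likelihood (up to a constant):
    # l(p) = n1*log(1/4) + n2*log(1/4 + p/4) + n3*log(1/2 - p/4)
    def loglik(p):
        return (n1 * np.log(0.25)
                + n2 * np.log(0.25 + p / 4)
                + n3 * np.log(0.5 - p / 4))

    # Closed-form MLE from this slide.
    p_closed = (2 * n2 - n3) / (n2 + n3)

    # Brute-force check over a fine grid of admissible values 0 < p < 2.
    grid = np.linspace(1e-6, 2 - 1e-6, 200001)
    p_grid = grid[np.argmax(loglik(grid))]

    print(p_closed, p_grid)   # both about 0.52 for these counts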

5
Example1-3
  • MLE Again
  • By maximizing the likelihood of the observed data,
    p = 2(m1 - m2)/(m1 + m2)
  • E(n1|m1) = m1 · p1/(p1 + p2) = (m1 + m2)/4
  • E(n2|m1) = m1 · p2/(p1 + p2) = (3m1 - m2)/4
  • E.g. m1 = 63, m2 = 37, p = 0.52, E(n1|m1) = 25,
    E(n2|m1) = 38 (a short sketch of these computations
    follows below)
  • EM
  • Complete data: (n1, n2, n3, m1, m2); n1 and n2 are
    hidden variables
  • Observed data: (m1, m2)
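
A minimal sketch of the colorblind case, using the numbers quoted on this slide (m1 = 63, m2 = 37); it reproduces p = 0.52, E(n1|m1) = 25 and E(n2|m1) = 38:

    # Only m1 = n1 + n2 (red/green) and m2 = n3 (blue) are observed.
    m1, m2 = 63, 37

    # MLE of p from the observed data alone.
    p = 2 * (m1 - m2) / (m1 + m2)       # 0.52

    # Category probabilities implied by this p.
    p1 = 0.25                           # red
    p2 = 0.25 + p / 4                   # green

    # Conditional expectations of the hidden counts given m1.
    E_n1 = m1 * p1 / (p1 + p2)          # (m1 + m2)/4 = 25
    E_n2 = m1 * p2 / (p1 + p2)          # (3*m1 - m2)/4 = 38

    print(p, E_n1, E_n2)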

6
Example1-4
  • EM
  • E-Step
  • M-Step

7
Example1-5
  • Iteration
  • Initializations: m1 = 63, m2 = 37, p0 = 0 (a sketch
    of the resulting E-step/M-step loop follows below)
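
The iteration table on this slide was a figure. A minimal sketch of one natural form of the loop, assuming the E-step fills in E(n2 | m1, p_t) and the M-step reuses the complete-data formula p = (2n2 - n3)/(n2 + n3) with the expected count, is:

    m1, m2 = 63, 37            # observed counts (slide initialization)
    p = 0.0                    # p0 = 0

    for t in range(100):
        # E-step: expected number of green balls among the m1 red/green
        # draws under the current estimate of p.
        p1 = 0.25
        p2 = 0.25 + p / 4
        e_n2 = m1 * p2 / (p1 + p2)

        # M-step: plug the expected count into the complete-data MLE,
        # p = (2*n2 - n3)/(n2 + n3), with n3 = m2 observed.
        p_new = (2 * e_n2 - m2) / (e_n2 + m2)

        if abs(p_new - p) < 1e-10:    # stop once the update stops changing
            break
        p = p_new

    print(p)   # converges to 0.52, the same value as the observed-data MLE

Each pass can only increase the observed-data likelihood (the EM property recalled later in the slides), and here the estimates settle at p = 0.52.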

8
Example1-6
  • Conclusions
  • EM is an extension of MLE to problems with hidden
    variables.
  • We used both MLE and EM and obtained the same
    estimates.
  • In this example, it is easy to obtain the analytic
    likelihood function of the observed data. In most
    cases, the likelihood function of the observed data
    is very complex or difficult to obtain, while the
    analytic likelihood function of the complete data
    is usually simple. Therefore we turn to the EM
    algorithm instead.

9
Example2-1
  • Description
  • We observe a series of coin tosses which we
    assume have been generated in the following way:
    a person has two coins in her pocket. Coin 1 has
    probability of heads p1, coin 2 has probability of
    heads p2. At each point she chooses coin 1 with
    probability λ, coin 2 with probability 1 - λ, and
    tosses it 3 times. Thus the observed data is a
    sequence of triples of coin tosses, e.g.
    Y = <HHH>, <TTT>, <HHH>, <TTT>. The complete data
    X, if we could observe it, would additionally
    show the coin chosen at each step, e.g. X =
    <HHH,1>, <TTT,2>, <HHH,1>, <TTT,2>. The
    parameters, all of which are to be estimated, are
    Θ = (λ, p1, p2). (A simulation sketch of this
    generative process follows below.)
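
A minimal simulation sketch of this generative process (the parameter values and sample size are illustrative assumptions, not from the slides):

    import random

    lam, p1, p2 = 0.3, 0.9, 0.1        # assumed illustrative parameters
    random.seed(0)

    def draw_triple():
        # Choose coin 1 with probability lam, otherwise coin 2, then toss it 3 times.
        coin = 1 if random.random() < lam else 2
        p_heads = p1 if coin == 1 else p2
        tosses = "".join("H" if random.random() < p_heads else "T" for _ in range(3))
        return tosses, coin

    complete = [draw_triple() for _ in range(4)]     # X: each triple plus the coin used
    observed = [t for t, _ in complete]              # Y: the triples alone
    print(complete)
    print(observed)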

10
Example2-2
  • Expectation Step

11
Example2-3
  • Expectation Step

12
Example2-4
  • Maximization Step
  • Maximizing this function by setting the
    derivatives with respect to λ, p1 and p2 to zero
    gives the re-estimation formulae for the three
    parameters (one plausible form is sketched below).
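
The E-step and M-step formulae on slides 10-12 were figures that did not survive extraction. A minimal sketch of the standard updates for this two-coin mixture (posterior responsibilities in the E-step, responsibility-weighted counts in the M-step) is given below; the function name and starting values are illustrative, not from the slides.

    def em_two_coins(triples, lam, p1, p2, iters=200):
        # Each observed triple is a string such as "HHT"; keep only its head count.
        heads = [s.count("H") for s in triples]
        for _ in range(iters):
            # E-step: posterior probability that coin 1 produced each triple.
            resp = []
            for h in heads:
                a = lam * p1 ** h * (1 - p1) ** (3 - h)          # coin 1
                b = (1 - lam) * p2 ** h * (1 - p2) ** (3 - h)    # coin 2
                resp.append(a / (a + b))
            # M-step: re-estimate lam, p1, p2 from the expected counts.
            lam = sum(resp) / len(resp)
            p1 = sum(r * h for r, h in zip(resp, heads)) / (3 * sum(resp))
            p2 = (sum((1 - r) * h for r, h in zip(resp, heads))
                  / (3 * sum(1 - r for r in resp)))
        return lam, p1, p2

    # The observed sequence from slide 9: Y = <HHH>, <TTT>, <HHH>, <TTT>.
    print(em_two_coins(["HHH", "TTT", "HHH", "TTT"], lam=0.4, p1=0.6, p2=0.5))

For this data the iteration settles near λ = 0.5, p1 = 1, p2 = 0: coin 1 ends up explaining the all-heads triples and coin 2 the all-tails triples.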

13
Intuitive EM Model
  • Complete Data vs. Observed Data
  • In general, log f(X|Θ) (the p.d.f. of the complete
    data) will have an easily defined, analytically
    solvable maximum, but maximization of L(Θ) (the
    likelihood function of the observed data) has no
    analytic solution.
  • Expectation vs. Likelihood
  • If we had the complete data, we would simply
    estimate Θ to maximize log f(X|Θ). But with some
    of the complete data missing, we instead maximize
    the expectation of log f(X|Θ) given the observed
    data and the current value Θ_t.

14
Formalized EM Model
  • Model
  • Y is the observed data, X is the complete data.
  • If the distribution f(X|Θ) is well defined, the
    probability of Y given Θ is given by the expression
    reconstructed below.
  • EM attempts to solve the following problem: given
    that a sample from Y is observed but the
    corresponding X is unobserved, or hidden, find the
    maximum-likelihood estimate Θ_ML, which
    maximizes L(Θ) = log g(Y|Θ).
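
The expression for the probability of Y given Θ was a figure on this slide. In the standard formulation (Dempster, Laird and Rubin, cited on slide 19), the observed-data density is obtained by integrating (or summing) the complete-data density over all X consistent with the observed Y; a reconstruction in that form:

    g(Y \mid \Theta) = \int_{\mathcal{X}(Y)} f(X \mid \Theta)\, dX,
    \qquad
    L(\Theta) = \log g(Y \mid \Theta),

where \mathcal{X}(Y) denotes the set of complete-data values X that map to the observed Y.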

15
Solution to EM Model
  • Rationale
  • By maximizing Q(Θ, Θ_t) we achieve the goal of
    maximizing L(Θ), because each update Θ_t → Θ_(t+1)
    guarantees L(Θ_(t+1)) ≥ L(Θ_t).
  • Expectation
  • Maximization (the standard forms of both steps are
    reconstructed below)
  • Iteration
  • Iterate the expectation step and the maximization
    step until the change ΔL(Θ) falls below the given
    threshold.
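
The Expectation and Maximization formulae were figures on this slide; their standard forms, consistent with key points (4) and (5) on the next slide, are:

    \text{E-step:}\quad
    Q(\Theta, \Theta_t)
      = E\big[\log f(X \mid \Theta) \,\big|\, Y, \Theta_t\big]
      = \int \log f(X \mid \Theta)\; p(X \mid Y, \Theta_t)\, dX

    \text{M-step:}\quad
    \Theta_{t+1} = \arg\max_{\Theta} Q(\Theta, \Theta_t)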

16
Key Points of EM Modeling
  • (1) Identify Y (the observed data) and X (the
    complete data, containing the hidden variables).
  • (2) Define the probability density function of the
    complete data, f(X|Θ).
  • (3) Define the conditional probability of X
    given Y and Θ, i.e. p(X|Y, Θ).
  • (4) Compute the expectation of log f(X|Θ) with
    respect to p(X|Y, Θ_t), i.e.
    Q(Θ, Θ_t) = E[log f(X|Θ) | Y, Θ_t].
  • (5) Obtain the new estimate of Θ by maximizing Q.
  • (6) Repeat (4) and (5) until the change ΔL(Θ)
    falls below the given threshold. (A skeleton of
    this loop is sketched below.)
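
A minimal skeleton of steps (4)-(6) as a reusable loop; e_step, m_step and loglik are placeholder callables supplied by the caller (the names are mine, not from the slides):

    def em(theta0, e_step, m_step, loglik, tol=1e-6, max_iter=1000):
        # e_step(theta_t) -> statistics of p(X | Y, theta_t) needed by Q   (step 4)
        # m_step(stats)   -> new theta maximizing Q(theta, theta_t)        (step 5)
        # loglik(theta)   -> observed-data log-likelihood L(theta)         (step 6)
        theta, prev = theta0, loglik(theta0)
        for _ in range(max_iter):
            stats = e_step(theta)
            theta = m_step(stats)
            cur = loglik(theta)
            if abs(cur - prev) < tol:   # stop when the change in L(theta) is small
                break
            prev = cur
        return theta

For Example 1, e_step would return E(n2 | m1, p_t), m_step would apply p = (2·E(n2) - m2)/(E(n2) + m2), and loglik(p) would be m1·log(1/2 + p/4) + m2·log(1/2 - p/4).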

17
Advantages of EM Algorithm
  • It leads to a steady increase in the likelihood
    of the observed data.
  • It is numerically stable and avoids overshooting
    or undershooting the maximum of the likelihood
    along the current direction.
  • It handles parameter constraints in the solution
    to the M-Step, e.g., it maximizes over a set that
    satisfies the constraints.

18
Disadvantages of EM Algorithm
  • Very slow convergence in the neighborhood of the
    optimal point; the rate of convergence depends on
    the amount of missing data in the problem.
  • Convergence of the parameter estimates is
    guaranteed under mild conditions, but the estimate
    will likely converge to a local maximum; a global
    maximum can usually be reached by starting with a
    good initial estimate.

19
References
  • Dempster, A., Laird, N., and Rubin, D., "Maximum
    Likelihood from Incomplete Data via the EM
    Algorithm", Journal of the Royal Statistical
    Society, Series B, 39(1):1-38, 1977.
  • Sean Fain, "Maximum Likelihood Estimation with the
    EM Algorithm", lecture notes, 2004.
  • Frank Dellaert, "The Expectation Maximization
    Algorithm", lecture notes, 2002.
  • Jeff A. Bilmes, "A Gentle Tutorial of the EM
    Algorithm and its Application to Parameter
    Estimation for Gaussian Mixture and Hidden Markov
    Models", lecture notes, 1998.
  • Michael Collins, "The EM Algorithm", lecture notes,
    1997.
  • Jiangsheng Yu, "Expectation Maximization: An
    Approach to Parameter Estimation", presentation.
  • ChengXiang Zhai, "A Note on the
    Expectation-Maximization (EM) Algorithm", lecture
    notes, 2003.