1
EM algorithm and applications
2
Relative Entropy
  • Let p,q be two probability distributions on the
    same sample space. The relative entropy between p
    and q is defined by
  • $H(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$
  •   $= \sum_x p(x)\log\frac{1}{q(x)} - \sum_x p(x)\log\frac{1}{p(x)}$.

The inefficiency of assuming distribution q when
the correct distribution is p.
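
As a quick illustration, here is a minimal Python sketch (an assumption of this rewrite, not part of the original deck) that computes $H(p\|q)$ for two discrete distributions given as aligned probability lists:

```python
import math

def relative_entropy(p, q):
    """Relative entropy H(p||q) for discrete distributions p, q
    over the same sample space (natural log used here).
    Terms with p(x) = 0 contribute 0 by convention."""
    return sum(px * math.log(px / qx)
               for px, qx in zip(p, q) if px > 0)

# H(p||q) >= 0, with equality iff p == q.
p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(relative_entropy(p, q))  # > 0
print(relative_entropy(p, p))  # 0.0
```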
3
EM algorithm: approximating the MLE from
Incomplete Data
  • Finding the MLE parameters is a nonlinear
    optimization problem.

[Figure: the log-likelihood $\log P(x\mid\theta)$ and the
auxiliary function $E[\log P(x,y\mid\theta)]$, plotted as
functions of the parameters]
General idea of EM: use the current point $\theta$ to
construct an alternative function $Q_\theta$ (which is
"nice"); if $Q_\theta(\lambda) > Q_\theta(\theta)$, then $\lambda$ has a
higher likelihood than $\theta$.
4
The EM algorithm
  • Consider a model where, for observed data x and
    model parameters $\theta$:
  • $p(x\mid\theta) = \sum_y p(x,y\mid\theta)$
  • (y are the hidden data).
  • The EM algorithm receives x and parameters $\theta$, and
    returns new parameters $\lambda$, s.t. $p(x\mid\lambda) > p(x\mid\theta)$.

5
The EM algorithm
  • Finding $\lambda$ which maximizes
  • $p(x\mid\lambda) = \sum_y p(x,y\mid\lambda)$
  • is equivalent to finding $\lambda$ which maximizes the
    logarithm
  • $\log p(x\mid\lambda) = \log\left(\sum_y p(x,y\mid\lambda)\right)$,
  • which is what the EM algorithm attempts to do.

6
The EM algorithm
  • In each iteration the EM algorithm does the
    following (a generic sketch follows this list).
  • (E step) Calculate
  • $Q_\theta(\lambda) = \sum_y p(y\mid x,\theta)\log p(x,y\mid\lambda)$
  • (M step) Find $\lambda$ which maximizes $Q_\theta(\lambda)$.
  • (The next iteration sets $\theta \leftarrow \lambda$ and repeats.)
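
In code form, one EM iteration for a model with a finite
hidden space might look like this minimal, model-agnostic
sketch. The arguments are hypothetical stand-ins, not from
the deck: p_xy(x, y, theta) returns the complete-data
probability, and the exact M-step maximization is replaced
by a search over a finite candidate set:

```python
import math

def em_step(x, theta, hidden_values, candidates, p_xy):
    """One EM iteration for p(x|theta) = sum_y p(x,y|theta)."""
    # E step: posterior weights p(y|x, theta)
    joint = {y: p_xy(x, y, theta) for y in hidden_values}
    px = sum(joint.values())
    post = {y: pj / px for y, pj in joint.items()}

    # Q_theta(lam) = sum_y p(y|x,theta) * log p(x,y|lam)
    def Q(lam):
        return sum(w * math.log(p_xy(x, y, lam))
                   for y, w in post.items() if w > 0)

    # M step: return the candidate lam maximizing Q_theta
    return max(candidates, key=Q)
```

For models with closed-form M steps, like the examples
below, the search over candidates is replaced by the
analytic maximizer.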

7
Example: Baum-Welch EM for HMM
  • The Baum-Welch algorithm is the EM algorithm for
    HMMs.
  • E step for HMM: $Q_\theta(\lambda) = \sum_s p(s\mid x,\theta)\log p(s,x\mid\lambda)$,
    where $\lambda$ are the new parameters $\{a_{kl}\}, \{e_k(b)\}$.

(The $A_{kl}$ and $E_k(b)$ are the expected counts of state
transitions and symbol emissions in (s,x).)
8
Baum-Welch EM for HMM
  • M step for HMM: find $\lambda$ which maximizes $Q_\theta(\lambda)$.
    As we proved, $\lambda$ is given by the relative
    frequencies of the $A_{kl}$s and the $E_k(b)$s.
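
Written out, these are the standard relative-frequency
updates:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad
e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')},$$

where $A_{kl}$ and $E_k(b)$ are the expected transition and
emission counts computed in the E step.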

9
The simplest example: EM for 2 coin tosses
  • Given a coin with two possible outcomes, H (head)
    and T (tail), with probabilities $q_H$ and $q_T = 1 - q_H$.
  • The coin is tossed twice, but only the 1st
    outcome, T, is seen. So the data is x = (T, ?).
  • We wish to apply the EM algorithm to get
    parameters that increase the likelihood of the
    data.
  • Let the initial parameters be $\theta = (q_H, q_T) = (\tfrac14, \tfrac34)$.

10
EM for 2 coin tosses
The hidden data which can produce x are the
sequences $y_1 = (T,H)$ and $y_2 = (T,T)$. Hence the
likelihood of x with parameters $(q_H, q_T)$ is
$p(x\mid\theta) = P(x,y_1\mid\theta) + P(x,y_2\mid\theta) = q_Hq_T + q_T^2$.
For the initial parameters $\theta = (\tfrac14, \tfrac34)$, we have
$p(x\mid\theta) = \tfrac34\cdot\tfrac14 + \tfrac34\cdot\tfrac34 = \tfrac34$.
  • Note that in this case $P(x,y_i\mid\theta) = P(y_i\mid\theta)$, for
    i = 1,2.
  • We can always define y so that (x,y) = y
    (otherwise we set $y' = (x,y)$ and replace the y's
    by the $y'$'s).

11
EM for 2 coin tosses - E step
  • Calculate $Q_\theta(\lambda) = Q_\theta(q_H, q_T)$, where $\lambda = (q_H, q_T)$:
  • $Q_\theta(\lambda) = p(y_1\mid x,\theta)\log p(x,y_1\mid\lambda)$
  •   $+\; p(y_2\mid x,\theta)\log p(x,y_2\mid\lambda)$
  • $p(y_1\mid x,\theta) = p(y_1,x\mid\theta)/p(x\mid\theta) = (\tfrac34\cdot\tfrac14)/\tfrac34 = \tfrac14$
  • $p(y_2\mid x,\theta) = p(y_2,x\mid\theta)/p(x\mid\theta) = (\tfrac34\cdot\tfrac34)/\tfrac34 = \tfrac34$
  • Thus we have
  • $Q_\theta(\lambda) = \tfrac14\log p(x,y_1\mid\lambda) + \tfrac34\log p(x,y_2\mid\lambda)$

12
EM for 2 coin tosses - E step
  • For a sequence y of coin tosses, let $N_H(y)$ be the
    number of H's in y, and $N_T(y)$ the number of
    T's in y. Then
  • $\log p(y\mid\lambda) = N_H(y)\log q_H + N_T(y)\log q_T$
  • In our example:
  • $y_1 = (T,H)$, $y_2 = (T,T)$, hence
  • $N_H(y_1) = N_T(y_1) = 1$, $N_H(y_2) = 0$, $N_T(y_2) = 2$

13
Example: 2 coin tosses - E step
  • Thus
  • $\tfrac14\log p(x,y_1\mid\lambda) = \tfrac14(N_H(y_1)\log q_H + N_T(y_1)\log q_T) = \tfrac14(\log q_H + \log q_T)$
  • $\tfrac34\log p(x,y_2\mid\lambda) = \tfrac34(N_H(y_2)\log q_H + N_T(y_2)\log q_T) = \tfrac34(2\log q_T)$
  • Substituting in the equation for $Q_\theta(\lambda)$:
  • $Q_\theta(\lambda) = \tfrac14\log p(x,y_1\mid\lambda) + \tfrac34\log p(x,y_2\mid\lambda)$
  • $= (\tfrac14 N_H(y_1) + \tfrac34 N_H(y_2))\log q_H + (\tfrac14 N_T(y_1) + \tfrac34 N_T(y_2))\log q_T$

With $N_H = \tfrac14$ and $N_T = \tfrac74$, this is
$Q_\theta(\lambda) = N_H\log q_H + N_T\log q_T$.
14
EM for 2 coin tosses - M step
Find ? which maximizes Q? (?) Q? (?) NHlog qH
NTlog qT ¼ log qH 7/4 log qT We saw earlier
that this is maximized when
The optimal parameters (0,1), will never be
reached by the EM algorithm!
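
The whole loop fits in a few lines of Python (a sketch of
this rewrite, using the closed-form E and M steps above);
running it shows $q_H$ halving at every iteration, so the
optimum (0,1) is approached but never attained:

```python
def em_coin(qh, iterations=10):
    """EM for the two-toss example: x = (T, ?), with hidden
    completions y1 = (T,H) and y2 = (T,T)."""
    for _ in range(iterations):
        qt = 1 - qh
        px = qh * qt + qt * qt       # p(x|theta) = qH*qT + qT^2
        w1 = qh * qt / px            # p(y1|x,theta)
        w2 = qt * qt / px            # p(y2|x,theta)
        NH = w1                      # expected number of heads
        NT = w1 + 2 * w2             # expected number of tails
        qh = NH / (NH + NT)          # M step: relative frequency
        print(round(qh, 6))
    return qh

em_coin(0.25)   # 0.125, 0.0625, 0.03125, ...
```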
15
EM for a single random variable (a die)
Now the probability of each y (= (x,y)) is given
by a sequence of die tosses. The die has m
outcomes, with probabilities $q_1,\dots,q_m$. Let $N_l(y)$
be the number of times outcome l occurs in y. Then
$\log p(y\mid\lambda) = \sum_{l=1}^m N_l(y)\log q_l$.
Let $N_l$ be the expected value of $N_l(y)$, given x
and $\theta$: $N_l = E(N_l(y)\mid x,\theta) = \sum_y p(y\mid x,\theta)N_l(y)$.
16
$Q_\theta(\lambda)$ for one die
$Q_\theta(\lambda) = \sum_{l=1}^m N_l\log q_l$, which is maximized by the
relative frequencies $q_l = N_l/\sum_{l'} N_{l'}$.
17
EM algorithm for n independent observations $x_1,\dots,x_n$
Expectation step: it can be shown that, if the $x_j$
are independent, then the expected counts add up
over the observations:
$N_l = \sum_{j=1}^n N_l^j$, where $N_l^j = \sum_y p(y\mid x_j,\theta)N_l(y)$.
18
Example: The ABO locus
A locus is a particular place on the chromosome.
Each locus state (called a genotype) consists of
two alleles: one paternal and one maternal.
Some loci (plural of locus) determine
distinguishing features. The ABO locus, for
example, determines blood type.
Suppose we randomly sampled N individuals and
found that $N_{a/a}$ have genotype a/a, $N_{a/b}$ have
genotype a/b, etc. Then the MLE is given by the
relative frequencies: $q_{a/a} = N_{a/a}/N$, $q_{a/b} = N_{a/b}/N$, etc.
19
The ABO locus
However, testing individuals for their genotype
is very expensive. Can we estimate the
proportions of the genotypes using the common,
cheap blood test, whose outcome is one of the four
blood types (A, B, AB, O)?
The problem is that among individuals measured to
have blood type A, we don't know how many have
genotype a/a and how many have genotype a/o. So
what can we do?
20
The ABO locus
The Hardy-Weinberg equilibrium rule states that
in equilibrium the frequencies of the three
alleles $q_a, q_b, q_o$ in the population determine the
frequencies of the genotypes as follows:
$q_{a/b} = 2q_aq_b$, $q_{a/o} = 2q_aq_o$, $q_{b/o} = 2q_bq_o$,
$q_{a/a} = q_a^2$, $q_{b/b} = q_b^2$, $q_{o/o} = q_o^2$.
In fact, the Hardy-Weinberg equilibrium rule
follows from modeling this problem as observed
data x with hidden data y.
21
The ABO locus
The die outcomes are the three possible alleles
a, b and o. The observed data are the blood types
A, B, AB or O. Each blood type is determined by
two successive random samplings of alleles, i.e.,
an ordered pair of alleles; this is the
hidden data. For instance, blood type A
corresponds to the ordered pairs (a,a),
(a,o) and (o,a). So we have three parameters of
one die, $q_a, q_b, q_o$, that we need to estimate.
22
EM setting for the ABO locus problem
The observed data $x = (x_1,\dots,x_n)$ is a sequence of
letters (blood types) from the alphabet
{A, B, AB, O}. E.g., (B,A,B,B,O,A,B,A,O,B,AB) are
the observations $(x_1,\dots,x_{11})$. The hidden data
(i.e., the y's) for each letter $x_j$ is the set of
ordered pairs of alleles that can generate it. For
instance, for A it is the set {(a,a), (a,o), (o,a)}.
The parameters $\theta = \{q_a, q_b, q_o\}$ are the
probabilities of the alleles. We need to find the
parameters $\theta = \{q_a, q_b, q_o\}$ that maximize the
likelihood of the given data. We do this via the
EM algorithm.
23
EM for ABO locus problem
  • For each observed blood type $x_j \in \{A, B, AB, O\}$ and
    for each allele $z \in \{a, b, o\}$, we compute $N_z(x_j)$,
    the expected number of times that z appears in $x_j$:

$N_z(x_j) = \sum_{y_j} p(y_j\mid x_j,\theta)\,N_z(y_j)$,
where the sum is taken over the ordered genotype
pairs $y_j$ that can generate $x_j$, and $N_z(y_j)$ is the
number of times allele z occurs in the pair $y_j$.
E.g., $N_a((o,b)) = 0$; $N_b((o,b)) = N_o((o,b)) = 1$.
24
EM for ABO locus problem
The computation for blood type B:
$P(B\mid\theta) = P((b,b)\mid\theta) + P((b,o)\mid\theta) + P((o,b)\mid\theta) = q_b^2 + 2q_bq_o$.
Since $N_b((b,b)) = 2$ and
$N_b((b,o)) = N_b((o,b)) = N_o((o,b)) = N_o((b,o)) = 1$,
$N_o(B)$ and $N_b(B)$, the expected numbers of
occurrences of o and b in B, are given by
$N_b(B) = \frac{2q_b^2 + 2q_bq_o}{q_b^2 + 2q_bq_o}, \qquad N_o(B) = \frac{2q_bq_o}{q_b^2 + 2q_bq_o}$.
Observe that $N_b(B) + N_o(B) = 2$.
25
EM for the ABO locus
Similarly, $P(A\mid\theta) = q_a^2 + 2q_aq_o$,
$P(AB\mid\theta) = p((b,a)\mid\theta) + p((a,b)\mid\theta) = 2q_aq_b$,
$N_a(AB) = N_b(AB) = 1$,
$P(O\mid\theta) = p((o,o)\mid\theta) = q_o^2$,
$N_o(O) = 2$,
$N_b(O) = N_a(O) = N_o(AB) = N_b(A) = N_a(B) = 0$.
26
E step: compute $N_a$, $N_b$ and $N_o$.
Let $\#(A) = 3$, $\#(B) = 5$, $\#(AB) = 1$, $\#(O) = 2$ be the
numbers of observations of A, B, AB, and O,
respectively. Then
$N_a = \#(A)N_a(A) + \#(AB)N_a(AB)$, and similarly for $N_b$ and $N_o$.
M step: set $\lambda = (q_a, q_b, q_o)$ to the relative
frequencies $q_z = \frac{N_z}{N_a + N_b + N_o}$ for $z \in \{a, b, o\}$,
as in the code sketch below.
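
Putting slides 23-26 together, here is a minimal Python
sketch of this EM loop (the function name is illustrative;
the counts and the uniform initial frequencies are the
example values above):

```python
def em_abo(counts, q, iterations=100):
    """EM for ABO allele frequencies.
    counts: observed blood-type counts, e.g. {'A': 3, ...}
    q:      initial allele frequencies {'a': .., 'b': .., 'o': ..}"""
    for _ in range(iterations):
        qa, qb, qo = q['a'], q['b'], q['o']
        pA = qa*qa + 2*qa*qo              # P(A|theta)
        pB = qb*qb + 2*qb*qo              # P(B|theta)
        # E step: expected allele counts (slides 24-26)
        Na = counts['A'] * (2*qa*qa + 2*qa*qo) / pA + counts['AB']
        Nb = counts['B'] * (2*qb*qb + 2*qb*qo) / pB + counts['AB']
        No = (counts['A'] * 2*qa*qo / pA
              + counts['B'] * 2*qb*qo / pB
              + 2 * counts['O'])
        # M step: relative frequencies of the expected counts
        total = Na + Nb + No              # equals 2n
        q = {'a': Na/total, 'b': Nb/total, 'o': No/total}
    return q

print(em_abo({'A': 3, 'B': 5, 'AB': 1, 'O': 2},
             {'a': 1/3, 'b': 1/3, 'o': 1/3}))
```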
27
Example: the Motif Finding Problem
  • Given a set of DNA sequences
  • cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaat
    ctatgcgtttccaaccat
  • agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaac
    gctcagaaccagaagtgc
  • aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgat
    gtataagacgaaaatttt
  • agcctccgatgtaagtcatagctgtaactattacctgccacccctattac
    atcttacgtacgtataca
  • ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgct
    cgatcgttaacgtacgtc
  • Find the motif in each of the individual sequences

28
Motif Finding Problem: a reformulation
  • Collect all substrings of the same length k
    from the input sequences (see the sketch below).
  • From N sequences, each of length L, $n = N(L-k+1)$
    substrings can be derived.
  • Find a significant number of substrings that can
    be described by a profile model.
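
A minimal sketch of the substring-collection step, using
the first two example sequences above and an assumed motif
length k = 8:

```python
def substrings(seqs, k):
    """Collect all length-k substrings from each input sequence.
    With N sequences of length L this yields N*(L-k+1) substrings."""
    return [s[i:i+k] for s in seqs for i in range(len(s) - k + 1)]

seqs = [
    "cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat",
    "agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
]
subs = substrings(seqs, k=8)
print(len(subs))  # 2 * (68 - 8 + 1) = 122
```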

29
Fitting a mixture model by EM
  • A finite mixture model:
  • data $X = (X_1,\dots,X_n)$ arises from two or more
    groups, with g models $\theta = (\theta_1,\dots,\theta_g)$.
  • Indicator vectors $Z = (Z_1,\dots,Z_n)$, where
    $Z_i = (Z_{i1},\dots,Z_{ig})$, and $Z_{ij} = 1$ if $X_i$ is from
    group j, and 0 otherwise.
  • $P(Z_{ij} = 1\mid\lambda) = \lambda_j$. For any given i, all $Z_{ij}$ are
    0 except one.
  • For g = 2: class 1 (the motif) and class 2 (the
    background) are given by a position-specific and a
    general multinomial distribution, respectively.

30
Complete data likelihood
  • Under the assumption that the pairs $(Z_i, X_i)$
    are mutually independent, their joint density may
    be written
  • $P(Z, X\mid\lambda,\theta) = \prod_{ij}\,[\lambda_j P(X_i\mid\theta_j)]^{Z_{ij}}$
  • The log likelihood of the model is thus
  • $\log L(\lambda, \theta\mid Z, X) = \sum_i\sum_j Z_{ij}\log[\lambda_j P(X_i\mid\theta_j)]$.
  • The EM algorithm iteratively computes the
    expectation of the likelihood given the observed
    data X and initial estimates $\lambda^0$ and $\theta^0$ of $\lambda$ and
    $\theta$ (the E-step), and then maximizes the result in
    the free variables $\lambda$ and $\theta$, leading to new
    estimates $\lambda^1$ and $\theta^1$ (the M-step).

31
Mixture models: the E-step
  • Since the log likelihood is the sum over i
    and j of terms multiplying $Z_{ij}$, and these are
    independent across i, we need only consider the
    expectation of one such term, given $X_i$. Using
    initial parameter values $\lambda^0$ and $\theta^0$, and the fact
    that the $Z_{ij}$ are binary, we get
  • $E(Z_{ij}\mid X, \lambda^0, \theta^0) = \frac{\lambda^0_j P(X_i\mid\theta^0_j)}{\sum_k \lambda^0_k P(X_i\mid\theta^0_k)} = \hat{Z}_{ij}$

32
Mixture models: the M-step
  • Now we want to maximize the result of an
    E-step:
  • $\sum_i\sum_j \hat{Z}_{ij}\log\lambda_j + \sum_i\sum_j \hat{Z}_{ij}\log P(X_i\mid\theta_j)$.
  • The maximization over $\lambda$ is independent of the
    rest and is readily achieved with
  • $\hat\lambda_j = \sum_i \hat{Z}_{ij}/n$.

33
Mixture models: the M-step
  • Note $P(X_i\mid\theta_1) = \prod_j\prod_k f_{jk}^{\,I(k,X_{ij})}$ and
    $P(X_i\mid\theta_2) = \prod_j\prod_k f_{0k}^{\,I(k,X_{ij})}$,
  • where $X_{ij}$ is the letter in the jth position of
    sample i, and $I(k,a) = 1$ if $a = a_k$, and 0
    otherwise.
  • $c_{0k} = \sum_i\sum_j \hat{Z}_{i2}I(k,X_{ij})$, and $c_{jk} = \sum_i \hat{Z}_{i1}I(k,X_{ij})$.
  • Here $c_{0k}$ is the expected number of times
    letter $a_k$ appears in the background, and $c_{jk}$ the
    expected number of times $a_k$ appears at position j
    of occurrences of the motif in the data.
  • $f_{jk} = c_{jk}/\sum_{k'=1}^{L} c_{jk'}$, for $j = 0,1,\dots,w$; $k = 1,\dots,L$.
  • In practice, care must be taken to avoid zero
    frequencies, so one adds pseudo-counts: small
    constants $\beta_k$ with $\sum_k \beta_k = \beta$, giving
  • $f_{jk} = (c_{jk} + \beta_k)/(\sum_{k'=1}^{L} c_{jk'} + \beta)$, for $j = 0,1,\dots,w$; $k = 1,\dots,L$.
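
The following NumPy sketch puts the E- and M-steps of
slides 29-33 together for the DNA case (g = 2). The
function name, the shared pseudo-count beta, and the random
initialization are assumptions of this rewrite; a real
motif finder such as MEME adds many refinements:

```python
import numpy as np

ALPHABET = "acgt"

def em_motif_mixture(substrings, iters=50, beta=0.1, seed=0):
    """Two-component mixture EM: class 1 uses a position-specific
    multinomial f[0..w-1], class 2 a single background f0."""
    w = len(substrings[0])
    X = np.array([[ALPHABET.index(c) for c in s] for s in substrings])
    n = len(X)
    rng = np.random.default_rng(seed)
    f = rng.dirichlet(np.ones(4), size=w)   # motif model: w x 4
    f0 = np.full(4, 0.25)                   # background model
    lam = np.array([0.5, 0.5])              # mixture weights
    for _ in range(iters):
        # E step: responsibilities Z-hat_i1 for the motif class
        p1 = np.prod(f[np.arange(w), X], axis=1)   # P(X_i | theta_1)
        p2 = np.prod(f0[X], axis=1)                # P(X_i | theta_2)
        z1 = lam[0] * p1 / (lam[0] * p1 + lam[1] * p2)
        # M step: lambda_j = sum_i Zhat_ij / n
        lam = np.array([z1.mean(), 1.0 - z1.mean()])
        # expected letter counts c_jk, with pseudo-counts beta
        for j in range(w):
            c = np.bincount(X[:, j], weights=z1, minlength=4) + beta
            f[j] = c / c.sum()
        c0 = np.bincount(X.ravel(), weights=np.repeat(1.0 - z1, w),
                         minlength=4) + beta
        f0 = c0 / c0.sum()
    return lam, f, f0
```

Fed with the substrings collected earlier, e.g.
em_motif_mixture(substrings(seqs, k=8)), this returns the
mixture weights, the motif profile f, and the background f0.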