1
Hidden Markov Model
2
Example CpG Island
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states A, C, G, T but with different transition probabilities.
3
CpG Island: Question 2
Now we solve the second question.
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where?
For this, we need to decide which parts of a given long sequence of letters are more likely to come from the '+' model (with CpG islands), and which parts are more likely to come from the '-' model (without). This is done by using the Hidden Markov Model, to be defined.
4
Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):
A+  C+  G+  T+
A-  C-  G-  T-
The problem is that we don't know the sequence of states which is traversed, but just the sequence of letters. Therefore we use here a Hidden Markov Model.
5
Hidden Markov Models (HMM)
[Diagram: a chain of hidden variables, each emitting one item of the observed data]
6
Hidden Markov Model
A Markov chain (s1,…,sL), and for each state s and each symbol x we have the emission probability p(Xi = x | Si = s).
Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm). Compute the most likely word said.
7
Hidden Markov Model
Notation:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = ast
Emission probabilities: p(Xi = b | Si = s) = es(b)
8
Example: The Dishonest Casino
  • A casino has two dice:
  • Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
  • Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
  • The casino player switches back and forth between the fair and the loaded die once every 20 turns (on average)
  • Game:
  • You bet $1
  • You roll (always with a fair die)
  • The casino player rolls (maybe with the fair die, maybe with the loaded die)
  • Highest number wins $2

9
Question 1: Decoding
  • GIVEN
  • A sequence of rolls by the casino player
  • 12455264621461461361366616646616366163661636165156
    15115146123562344
  • QUESTION
  • What portion of the sequence was generated with
    the fair die, and what portion with the loaded
    die?
  • This is the DECODING question in HMMs

10
Question 2: Evaluation
  • GIVEN
  • A sequence of rolls by the casino player
  • 12455264621461461361366616646616366163661636165156
    15115146123562344
  • QUESTION
  • How likely is this sequence, given our model of
    how the casino works?
  • This is the EVALUATION problem in HMMs

11
Question 3: Learning
  • GIVEN
  • A sequence of rolls by the casino player
  • 12455264621461461361366616646616366163661636165156
    15115146123562344
  • QUESTION
  • How loaded is the loaded die? How fair is the
    fair die? How often does the casino player change
    from fair to loaded, and back?
  • This is the LEARNING question in HMMs

12
The dishonest casino model
[State diagram: FAIR and LOADED states; each state loops to itself with probability 0.95 and switches to the other state with probability 0.05]
FAIR: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
LOADED: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
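As a concrete reference for the algorithms that follow, here is a minimal sketch of this model's parameters as Python data structures; the names states, trans, emit and init are mine, not from the slides, and the ½/½ initial distribution is the one used in the worked examples later on.

    # Dishonest casino HMM parameters (numbers taken from the slide above).
    states = ["F", "L"]                        # F = fair die, L = loaded die

    # Transition probabilities a_st: stay with probability 0.95, switch with 0.05.
    trans = {
        "F": {"F": 0.95, "L": 0.05},
        "L": {"F": 0.05, "L": 0.95},
    }

    # Emission probabilities e_s(b) for each face b = 1..6.
    emit = {
        "F": {b: 1 / 6 for b in range(1, 7)},
        "L": {**{b: 1 / 10 for b in range(1, 6)}, 6: 1 / 2},
    }

    # Initial state distribution a_0s (the examples below use 1/2, 1/2).
    init = {"F": 0.5, "L": 0.5}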
13
A parse of a sequence
  • Given a sequence x = x1…xN,
  • a parse of x is a sequence of states π = π1, …, πN

[Diagram: a sequence of hidden states (each taking a value in 1, 2, …, K) emitting the observations x1, x2, x3, …]
14
Likelihood of a parse
  • Given a sequence x = x1…xN
  • and a parse π = π1, …, πN,
  • to find how likely the parse is
  • (given our HMM):
  • P(x, π) = P(x1, …, xN, π1, …, πN)
  • = P(xN, πN | x1…xN-1, π1, …, πN-1) P(x1…xN-1, π1, …, πN-1)
  • = P(xN, πN | πN-1) P(x1…xN-1, π1, …, πN-1)
  • = P(xN, πN | πN-1) P(xN-1, πN-1 | πN-2) … P(x2, π2 | π1) P(x1, π1)
  • = P(xN | πN) P(πN | πN-1) … P(x2 | π2) P(π2 | π1) P(x1 | π1) P(π1)
  • = a0π1 aπ1π2 … aπN-1πN eπ1(x1) … eπN(xN)
  • (A small Python sketch of this product follows below.)

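As a minimal sketch, the product above can be computed directly in Python. The function below (the name joint_likelihood is mine) uses the trans, emit and init dictionaries from the casino sketch earlier; the two calls reproduce the numbers derived on the next slides.

    def joint_likelihood(x, path, trans, emit, init):
        """P(x, path) = a_0,pi1 * e_pi1(x1) * product over i of a_pi(i-1),pi(i) * e_pi(i)(x_i)."""
        p = init[path[0]] * emit[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
        return p

    rolls = [1, 2, 1, 5, 6, 2, 1, 6, 2, 4]
    print(joint_likelihood(rolls, ["F"] * 10, trans, emit, init))   # about 5.2e-9
    print(joint_likelihood(rolls, ["L"] * 10, trans, emit, init))   # about 7.9e-10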
15
Example: the dishonest casino
  • Let the sequence of rolls be
  • x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
  • Then, what is the likelihood of
  • π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?
  • (say the initial probabilities are a0,Fair = ½, a0,Loaded = ½)
  • ½ × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) … P(4 | Fair)
  • = ½ × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9

16
Example: the dishonest casino
  • So, the likelihood that the die is fair throughout this run
  • is about 5.2 × 10^-9
  • OK, but what is the likelihood of
  • π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
  • ½ × P(1 | Loaded) P(Loaded | Loaded) … P(4 | Loaded)
  • = ½ × (1/10)^8 × (1/2)^2 × (0.95)^9 = 0.00000000078781176215 ≈ 0.79 × 10^-9
  • Therefore, it is somewhat more likely (by a factor of about 6.6) that the die was fair all the way than that it was loaded all the way

17
Example: the dishonest casino
  • Let the sequence of rolls be
  • x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
  • Now, what is the likelihood of π = F, F, …, F?
  • ½ × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, same as before
  • What is the likelihood of
  • π = L, L, …, L?
  • ½ × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7
  • So, it is about 100 times more likely that the die is loaded

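The two figures on this slide can be checked with a few lines of arithmetic (values agree to the precision shown):

    fair   = 0.5 * (1/6)**10 * 0.95**9              # about 5.2e-9  (all-Fair parse)
    loaded = 0.5 * (1/10)**4 * (1/2)**6 * 0.95**9   # about 4.9e-7  (all-Loaded parse)
    print(fair, loaded, loaded / fair)              # ratio about 95, i.e. roughly 100x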
18
C-G Islands Example
[Diagram: two 4-state chains over A, C, G, T (the '+' island model and the '-' background model), with 'change' transitions connecting them]
19
Hidden Markov Model for CpG Islands
The states: Domain(Si) = {+, -} × {A, C, G, T} (8 values)
In this representation, P(xi | si) = 0 or 1, depending on whether xi is consistent with si.
E.g., xi = G is consistent with si = (+, G) and with si = (-, G), but not with any other value of si.
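A minimal sketch of this 0/1 emission rule, assuming states are represented as (sign, letter) pairs; the function name emission is mine.

    def emission(state, x):
        """e_state(x) for the 8-state CpG model: 1 if the observed letter x
        matches the letter component of the state, 0 otherwise."""
        sign, letter = state                  # e.g. ("+", "G") or ("-", "G")
        return 1.0 if x == letter else 0.0

    print(emission(("+", "G"), "G"))          # 1.0
    print(emission(("-", "G"), "G"))          # 1.0
    print(emission(("+", "A"), "G"))          # 0.0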
20
1. Most Probable State Path (Decoding)
First question: given an output sequence x = (x1,…,xL), a most probable path s* = (s1,…,sL) is one which maximizes p(s | x).
21
Decoding
  • GIVEN x = x1x2…xN
  • We want to find π = π1, …, πN,
  • such that P[x, π] is maximized:
  • π* = argmaxπ P[x, π]
  • We can use dynamic programming!
  • Let Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
  • = probability of the most likely sequence of states ending at state πi = k

22
Decoding: main idea
  • Given that, for all states k and for a fixed position i,
  • Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k],
  • what is Vj(i+1)?
  • From the definition,
  • Vj(i+1) = max{π1,…,πi} P[x1…xi, π1, …, πi, xi+1, πi+1 = j]
  • = max{π1,…,πi} P(xi+1, πi+1 = j | x1…xi, π1, …, πi) P[x1…xi, π1, …, πi]
  • = max{π1,…,πi} P(xi+1, πi+1 = j | πi) P[x1…xi-1, π1, …, πi-1, xi, πi]
  • = maxk P(xi+1, πi+1 = j | πi = k) max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
  • = ej(xi+1) maxk akj Vk(i)

23
The Viterbi Algorithm
  • Input: x = x1…xN
  • Initialization:
  • V0(0) = 1 (0 is the imaginary first position)
  • Vk(0) = 0, for all k > 0
  • Iteration:
  • Vj(i) = ej(xi) × maxk akj Vk(i-1)
  • Ptrj(i) = argmaxk akj Vk(i-1)
  • Termination:
  • P(x, π*) = maxk Vk(N)
  • Traceback:
  • πN* = argmaxk Vk(N)
  • πi-1* = Ptrπi*(i)
  • (A small Python sketch of this algorithm follows below.)

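A minimal Python sketch of this algorithm in plain probability space, assuming the states, trans, emit and init dictionaries from the casino sketch above (the function name viterbi is mine). For long sequences the log-space variant described on slide 25 should be used instead.

    def viterbi(x, states, trans, emit, init):
        """Return (best_path, P(x, best_path)) for the observation sequence x."""
        # Position 1: V_k(1) = a_0k * e_k(x1), folding in the imaginary position 0.
        V = {k: init[k] * emit[k][x[0]] for k in states}
        ptrs = []                                   # back-pointers for positions 2..N

        # Iteration: V_j(i) = e_j(x_i) * max_k a_kj * V_k(i-1)
        for obs in x[1:]:
            V_new, ptr_i = {}, {}
            for j in states:
                best_k = max(states, key=lambda k: V[k] * trans[k][j])
                ptr_i[j] = best_k
                V_new[j] = emit[j][obs] * trans[best_k][j] * V[best_k]
            ptrs.append(ptr_i)
            V = V_new

        # Termination: P(x, pi*) = max_k V_k(N); then trace the pointers back.
        last = max(states, key=lambda k: V[k])
        path = [last]
        for ptr_i in reversed(ptrs):
            path.append(ptr_i[path[-1]])
        return path[::-1], V[last]

    path, p = viterbi([1, 6, 6, 5, 6, 2, 6, 6, 3, 6], states, trans, emit, init)
    print(path, p)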
24
The Viterbi Algorithm
[Diagram: the K × N dynamic-programming table of values Vj(i), one row per state 1, 2, …, K and one column per position x1 x2 x3 … xN]
  • Similar to aligning a set of states to a sequence
  • Time: O(K^2 N)
  • Space: O(KN)

25
Viterbi Algorithm: a practical detail
  • Underflows are a significant problem:
  • P[x1, …, xi, π1, …, πi] = a0π1 aπ1π2 … aπi-1πi eπ1(x1) … eπi(xi)
  • These numbers become extremely small: underflow
  • Solution: take the logs of all values
  • Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl]

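The log transform can be applied cell by cell; a sketch of the update for a single entry is below (the names are mine, and it assumes all transition and emission probabilities involved are nonzero, since log 0 is undefined).

    import math

    # One cell of the log-space recursion:
    # V_l(i) = log e_l(x_i) + max_k (V_k(i-1) + log a_kl)
    def viterbi_cell_log(V_prev, obs, l, states, trans, emit):
        best = max(V_prev[k] + math.log(trans[k][l]) for k in states)
        return math.log(emit[l][obs]) + best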
26
Example
  • Let x be a sequence with a portion of about 1/6 6s, followed by a portion of about ½ 6s:
  • x = 12345612345612345 66263646561626364656
  • Then it is not hard to show that the optimal parse is
  • FFF…F LLL…L
  • 6 characters 123456, parsed as F, contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
  • parsed as L, contribute 0.95^6 × (1/2)^1 × (1/10)^5 ≈ 0.4 × 10^-5
  • 162636, parsed as F, contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
  • parsed as L, contribute 0.95^6 × (1/2)^3 × (1/10)^3 ≈ 9.0 × 10^-5

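These per-run contributions can be reproduced with a quick arithmetic check:

    print(0.95**6 * (1/6)**6)                 # '123456' as F: about 1.6e-5
    print(0.95**6 * (1/2)**1 * (1/10)**5)     # '123456' as L: about 0.4e-5
    print(0.95**6 * (1/6)**6)                 # '162636' as F: about 1.6e-5
    print(0.95**6 * (1/2)**3 * (1/10)**3)     # '162636' as L: about 9.0e-5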
27
2. Computing p(x): Evaluation
Given an output sequence x = (x1,…,xL), compute the probability that this sequence was generated by the model:
p(x) = Σs p(x, s),
with the summation taken over all state paths s generating x.
28
Evaluation
  • We will develop algorithms that allow us to compute:
  • P(x), the probability of x given the model
  • P(xi…xj), the probability of a substring of x given the model
  • P(πi = k | x), the probability that the i-th state is k, given x

29
The Forward Algorithm
  • We want to calculate
  • P(x) = probability of x, given the HMM
  • Sum over all possible ways of generating x:
  • P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)
  • To avoid summing over an exponential number of paths π, define
  • fk(i) = P(x1…xi, πi = k) (the forward probability)

30
The Forward Algorithm: derivation
  • Define the forward probability:
  • fk(i) = P(x1…xi, πi = k)
  • = Σπ1…πi-1 P(x1…xi-1, π1, …, πi-1, πi = k) ek(xi)
  • = Σj Σπ1…πi-2 P(x1…xi-1, π1, …, πi-2, πi-1 = j) ajk ek(xi)
  • = ek(xi) Σj fj(i-1) ajk

31
The Forward Algorithm
  • We can compute fk(i) for all k, i, using dynamic programming!
  • Initialization:
  • f0(0) = 1
  • fk(0) = 0, for all k > 0
  • Iteration:
  • fk(i) = ek(xi) Σj fj(i-1) ajk
  • Termination:
  • P(x) = Σk fk(N) ak0
  • where ak0 is the probability that the terminating state is k (usually = a0k)
  • (A small Python sketch of this algorithm follows below.)

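A minimal Python sketch of the forward recursion, again assuming the states, trans, emit and init dictionaries from the casino sketch. The end-transition term ak0 is taken as 1 for simplicity (my assumption, not fixed by the slides), and in practice rescaling or log-sum-exp is needed to avoid the underflows discussed later.

    def forward(x, states, trans, emit, init):
        """Return (F, P(x)), where F[i-1][k] = f_k(i) = P(x_1..x_i, pi_i = k)."""
        F = [{k: init[k] * emit[k][x[0]] for k in states}]          # f_k(1)
        for obs in x[1:]:
            prev = F[-1]
            F.append({k: emit[k][obs] * sum(prev[j] * trans[j][k] for j in states)
                      for k in states})
        px = sum(F[-1][k] for k in states)       # termination, taking a_k0 = 1
        return F, px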
32
Relation between Forward and Viterbi
  • VITERBI
  • Initialization:
  • V0(0) = 1
  • Vk(0) = 0, for all k > 0
  • Iteration:
  • Vj(i) = ej(xi) maxk Vk(i-1) akj
  • Termination:
  • P(x, π*) = maxk Vk(N)
  • FORWARD
  • Initialization:
  • f0(0) = 1
  • fk(0) = 0, for all k > 0
  • Iteration:
  • fj(i) = ej(xi) Σk fk(i-1) akj
  • Termination:
  • P(x) = Σk fk(N) ak0

33
Motivation for the Backward Algorithm
  • We want to compute
  • P(πi = k | x),
  • the probability distribution of the i-th position, given x
  • We start by computing
  • P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
  • = P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
  • = P(x1…xi, πi = k) P(xi+1…xN | πi = k)
  • Then, P(πi = k | x) = P(πi = k, x) / P(x)

The first factor is the forward probability, fk(i); the second factor is the backward probability, bk(i).
34
The Backward Algorithm: derivation
  • Define the backward probability:
  • bk(i) = P(xi+1…xN | πi = k)
  • = Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
  • = Σj Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1 = j, πi+2, …, πN | πi = k)
  • = Σj ej(xi+1) akj Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = j)
  • = Σj ej(xi+1) akj bj(i+1)

35
The Backward Algorithm
  • We can compute bk(i) for all k, i, using dynamic programming
  • Initialization:
  • bk(N) = ak0, for all k
  • Iteration:
  • bk(i) = Σj ej(xi+1) akj bj(i+1)
  • Termination:
  • P(x) = Σj a0j ej(x1) bj(1)
  • (A small Python sketch of this algorithm follows below.)

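A minimal Python sketch of the backward recursion under the same assumptions as the forward sketch (end probabilities ak0 taken as 1, which is my simplification):

    def backward(x, states, trans, emit, init):
        """Return (B, P(x)), where B[i-1][k] = b_k(i) = P(x_{i+1}..x_N | pi_i = k)."""
        B = [{k: 1.0 for k in states}]                 # b_k(N) = a_k0, taken as 1 here
        for obs in reversed(x[1:]):                    # x_{i+1} for i = N-1 down to 1
            nxt = B[0]
            B.insert(0, {k: sum(emit[j][obs] * trans[k][j] * nxt[j] for j in states)
                         for k in states})
        px = sum(init[j] * emit[j][x[0]] * B[0][j] for j in states)  # termination
        return B, px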
36
Computational Complexity
  • What are the running time and space required for Forward and Backward?
  • Time: O(K^2 N)
  • Space: O(KN)
  • Useful implementation techniques to avoid underflows:
  • Viterbi: sum of logs
  • Forward/Backward: rescaling at each position by multiplying by a constant

37
Viterbi, Forward, Backward
  • VITERBI
  • Initialization:
  • V0(0) = 1
  • Vk(0) = 0, for all k > 0
  • Iteration:
  • Vj(i) = ej(xi) maxk Vk(i-1) akj
  • Termination:
  • P(x, π*) = maxk Vk(N)
  • FORWARD
  • Initialization:
  • f0(0) = 1
  • fk(0) = 0, for all k > 0
  • Iteration:
  • fj(i) = ej(xi) Σk fk(i-1) akj
  • Termination:
  • P(x) = Σk fk(N) ak0
  • BACKWARD
  • Initialization:
  • bk(N) = ak0, for all k
  • Iteration:
  • bj(i) = Σk ek(xi+1) ajk bk(i+1)
  • Termination:
  • P(x) = Σk a0k ek(x1) bk(1)
38
Posterior Decoding
  • We can now calculate
  • P(πi = k | x) = fk(i) bk(i) / P(x)
  • Then, we can ask:
  • what is the most likely state at position i of sequence x?
  • Define π^ by posterior decoding:
  • π^i = argmaxk P(πi = k | x)
  • (A small Python sketch follows below.)

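A minimal sketch of posterior decoding, built on the forward and backward sketches above (it assumes both are defined and return position-indexed lists of dictionaries, as written there):

    def posterior_decode(x, states, trans, emit, init):
        """For each position i, return argmax_k P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
        F, px = forward(x, states, trans, emit, init)
        B, _ = backward(x, states, trans, emit, init)
        return [max(states, key=lambda k: F[i][k] * B[i][k] / px)
                for i in range(len(x))]

    print(posterior_decode([1, 6, 6, 5, 6, 2, 6, 6, 3, 6], states, trans, emit, init))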
39
Posterior Decoding
  • For each state, posterior decoding gives us a curve of the likelihood of that state at each position
  • That is sometimes more informative than the Viterbi path π*
  • Posterior decoding may give an invalid sequence of states
  • Why? (Consecutive per-position argmax states need not be connected by a transition of nonzero probability.)

40
Posterior Decoding
[Diagram: the K × N table of posterior probabilities, one row per state and one column per position x1 x2 x3 … xN, with P(πi = l | x) highlighted in the row for state l]
  • P(πi = k | x) = Σπ P(π | x) 1(πi = k) = Σ{π : πi = k} P(π | x)

41
Posterior Decoding
[Diagram: the same table, with P(πi = l | x) and P(πj = l | x) highlighted at two positions i and j in the row for state l]
  • Example: how do we compute P(πi = l, πj = l | x)?
  • P(πi = l, πj = l | x) = fl(i) P(xi+1…xj, πj = l | πi = l) bl(j) / P(x)

42
Applications of the model
  • Given a DNA region x,
  • the Viterbi algorithm predicts locations of CpG islands
  • Given a nucleotide xi (say xi = A),
  • the Viterbi parse tells whether xi is in a CpG island in the most likely overall parse
  • The Forward/Backward algorithms can calculate
  • P(xi is in a CpG island) = P(πi = A+ | x)
  • Posterior decoding can assign locally optimal predictions of CpG islands:
  • π^i = argmaxk P(πi = k | x)

43
A model of CpG Islands (2): Transitions
  • What about transitions between (+) and (-) states?
  • They affect:
  • the average length of a CpG island
  • the average separation between two CpG islands
[Diagram: two states X and Y; X stays in X with probability p and moves to Y with probability 1-p; Y stays in Y with probability q and moves to X with probability 1-q]
Length distribution of region X:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
…
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p): a geometric distribution, with mean 1/(1-p)
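For completeness, the stated mean follows from a standard geometric-series manipulation (not spelled out on the slide):

    E[l_X] = \sum_{k \ge 1} k\, p^{k-1} (1-p)
           = (1-p)\, \frac{d}{dp} \sum_{k \ge 1} p^{k}
           = (1-p)\, \frac{d}{dp} \frac{p}{1-p}
           = (1-p) \cdot \frac{1}{(1-p)^{2}}
           = \frac{1}{1-p}.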