Machine Learning presentation

1
Machine Learning

Hidden Markov Models

2
The Markov Property

A stochastic process has the Markov property if
the conditional probability of future states of
the process depends only upon the present state,
i.e. what I'm likely to do next
depends only on where I am
now, NOT on how I got here.
$P(q_t \mid q_{t-1}, \ldots, q_1) = P(q_t \mid q_{t-1})$
Which processes have the Markov property?

3
Markov model for Dow Jones
4
The Dishonest Casino

A casino has two dice
Fair die
P(1) = P(2) = ... = P(5) = P(6) = 1/6
Loaded die
P(1) = P(2) = ... = P(5) = 1/10, P(6) = 1/2
I think the casino switches back and forth
between fair and loaded die once every 20 turns,
on average

5
My dishonest casino model
This is a hidden Markov model (HMM)
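[Diagram: two states, FAIR and LOADED. Each state returns to itself with probability 0.95 and switches to the other state with probability 0.05.]
P(1|F) = 1/6, P(2|F) = 1/6, P(3|F) = 1/6, P(4|F) = 1/6, P(5|F) = 1/6, P(6|F) = 1/6
P(1|L) = 1/10, P(2|L) = 1/10, P(3|L) = 1/10, P(4|L) = 1/10, P(5|L) = 1/10, P(6|L) = 1/2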
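The 0.05 and 0.95 values can be tied back to the "once every 20 turns, on average" estimate. The short derivation below is a sketch that assumes the casino decides independently on each turn whether to switch, with a fixed probability $p$ (a geometric waiting time; this independence assumption is added here for illustration and is not stated on the slide).

```latex
\mathbb{E}[\text{turns until a switch}]
  = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p
  = \frac{1}{p},
\qquad
\frac{1}{p} = 20 \;\Rightarrow\; p = 0.05, \quad 1 - p = 0.95
```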
6
Elements of a Hidden Markov Model

A finite set of states, $Q = \{q_1, \ldots, q_K\}$
A set of transition probabilities between states, $A$
each $a_{ij}$ in $A$ is the probability of going from state $i$ to state $j$
The probability of starting in each state, $\pi = (\pi_1, \ldots, \pi_K)$
each $\pi_k$ in $\pi$ is the probability of starting in state $k$
A set of emission probabilities, $B$
where each $b_i(o_j)$ in $B$ is the probability of observing output $o_j$ when in state $i$
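As a concrete illustration, the elements listed above can be written down for the dishonest casino as plain Python dictionaries. This is only a sketch: the names `STATES`, `START`, `TRANS`, `EMIT` and `is_valid_hmm` are made up for this example, and the start probabilities 0.7 and 0.3 are borrowed from the forward algorithm example slide later in the deck.

```python
from math import isclose

# Illustrative names; none of them come from the slides.
STATES = ("FAIR", "LOADED")                       # Q, the hidden states
START = {"FAIR": 0.7, "LOADED": 0.3}              # pi, taken from the forward example slide
TRANS = {                                         # A: TRANS[i][j] = P(next state j | current state i)
    "FAIR":   {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.05, "LOADED": 0.95},
}
EMIT = {                                          # B: EMIT[i][o] = P(observe o | state i)
    "FAIR":   {face: 1 / 6 for face in range(1, 7)},
    "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def is_valid_hmm(states, start, trans, emit):
    """Return True if the start vector and every row of A and B sum to 1."""
    tables = [start] + [trans[s] for s in states] + [emit[s] for s in states]
    return all(isclose(sum(table.values()), 1.0) for table in tables)
```

A check such as `is_valid_hmm(STATES, START, TRANS, EMIT)` is a cheap way to catch a row that does not sum to 1 before running any of the algorithms on the model.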

7
My dishonest casino model
This is a HIDDEN Markov model because the states
are not directly observable. If the fair die
were red and the unfair die were blue, then the
Markov model would NOT be hidden.
[Diagram: FAIR and LOADED states; each stays put with probability 0.95 and switches with probability 0.05.]
8
HMMs are good for

Speech Recognition
Gene Sequence Matching
Text Processing
Part of speech tagging
Information extraction
Handwriting recognition

9
The Three Basic Problems for HMMs

Given observation sequence $O = (o_1 o_2 \ldots o_T)$ of
events from the alphabet $\Sigma$, and HMM model $\lambda = (A, B, \pi)$
Problem 1 (Evaluation)
What is $P(O \mid \lambda)$, the probability of the
observation sequence, given the model?
Problem 2 (Decoding)
What sequence of states $Q = (q_1 q_2 \ldots q_T)$ best explains
the observations?
Problem 3 (Learning)
How do we adjust the model parameters $\lambda = (A, B, \pi)$
to maximize $P(O \mid \lambda)$?

10
The Evaluation Problem

Given observation sequence $O$ and HMM $\lambda$, compute
$P(O \mid \lambda)$
Helps us pick which model is the best one

O = 1, 6, 6, 2, 6, 3, 6, 6
11
Computing $P(O \mid \lambda)$

Naïve: try every path through the model
Sum the probabilities of all possible paths
This can be intractable: $O(N^T)$
What we do instead:
The Forward Algorithm: $O(N^2 T)$
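To make the cost of the naïve approach concrete, here is a minimal sketch that enumerates every state path and sums the path probabilities. It assumes the casino model from the earlier slides, with start probabilities 0.7 and 0.3 taken from the forward example; the function and variable names are invented for this illustration. With N = 2 states and T = 8 observations it already visits 2^8 = 256 paths, and the count doubles with every extra roll.

```python
from itertools import product

# Casino model as assumed in this sketch (names are illustrative).
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.7, "LOADED": 0.3}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
EMIT = {"FAIR": {face: 1 / 6 for face in range(1, 7)},
        "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}


def brute_force_likelihood(obs, states, start, trans, emit):
    """Sum P(O, path) over all len(states) ** len(obs) state paths."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        # Probability of starting in path[0] and emitting the first observation.
        p = start[path[0]] * emit[path[0]][obs[0]]
        # Multiply in one transition and one emission per later time step.
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


likelihood = brute_force_likelihood([1, 6, 6, 2, 6, 3, 6, 6], STATES, START, TRANS, EMIT)
```

The enumeration is only useful as a reference for checking a faster implementation on short sequences.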

12
The Forward Algorithm
13
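The inductive step

Computation of $\alpha_t(j)$ by summing all previous
values $\alpha_{t-1}(i)$ for all $i$

[Diagram: each hidden state $i$ at time $t-1$, carrying $\alpha_{t-1}(i)$, connects through its transition probability to state $j$ at time $t$, which carries $\alpha_t(j)$]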
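The inductive step can be sketched in a few lines of Python. The version below uses the standard textbook recursion, $\alpha_1(j) = \pi_j b_j(o_1)$ and $\alpha_t(j) = [\sum_i \alpha_{t-1}(i)\, a_{ij}]\, b_j(o_t)$, applied to the casino model; the dictionary layout and the name `forward` are assumptions made for this example, not something shown on the slides.

```python
# Casino model as assumed in this sketch (names are illustrative).
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.7, "LOADED": 0.3}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
EMIT = {"FAIR": {face: 1 / 6 for face in range(1, 7)},
        "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}


def forward(obs, states, start, trans, emit):
    """Return (trellis, likelihood), where trellis[t][j] is alpha at step t for state j."""
    # Initialisation: start in state j and emit the first observation.
    alpha = [{j: start[j] * emit[j][obs[0]] for j in states}]
    # Induction: sum over every predecessor i, then emit the next observation from j.
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            j: sum(prev[i] * trans[i][j] for i in states) * emit[j][o]
            for j in states
        })
    # Termination: P(O | model) is the sum of the last trellis column.
    return alpha, sum(alpha[-1].values())


trellis, likelihood = forward([1, 6, 6, 2], STATES, START, TRANS, EMIT)
```

Each column costs N^2 multiplications and there are T columns, which is where the N^2 T figure comes from. For long sequences the raw products underflow, so practical implementations usually rescale each column or work in log space.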
14
Forward Algorithm Example
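Model
P(1|F) = 1/6, P(2|F) = 1/6, P(3|F) = 1/6, P(4|F) = 1/6, P(5|F) = 1/6, P(6|F) = 1/6
P(1|L) = 1/10, P(2|L) = 1/10, P(3|L) = 1/10, P(4|L) = 1/10, P(5|L) = 1/10, P(6|L) = 1/2
Start prob
P(fair) = 0.7, P(loaded) = 0.3
Observation sequence: 1, 6, 6, 2
State 1 (fair)
$\alpha_1(1) = 0.7 \cdot 1/6$
$\alpha_2(1) = [\alpha_1(1) \cdot 0.95 + \alpha_1(2) \cdot 0.05] \cdot 1/6$
$\alpha_3(1) = [\alpha_2(1) \cdot 0.95 + \alpha_2(2) \cdot 0.05] \cdot 1/6$
$\alpha_4(1) = [\alpha_3(1) \cdot 0.95 + \alpha_3(2) \cdot 0.05] \cdot 1/6$
State 2 (loaded)
$\alpha_1(2) = 0.3 \cdot 1/10$
$\alpha_2(2) = [\alpha_1(1) \cdot 0.05 + \alpha_1(2) \cdot 0.95] \cdot 1/2$
$\alpha_3(2) = [\alpha_2(1) \cdot 0.05 + \alpha_2(2) \cdot 0.95] \cdot 1/2$
$\alpha_4(2) = [\alpha_3(1) \cdot 0.05 + \alpha_3(2) \cdot 0.95] \cdot 1/10$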
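For a numeric check, the first two trellis columns can be worked by hand. The values below assume the casino transitions from the earlier model slide (stay with probability 0.95, switch with probability 0.05), with state 1 = fair and state 2 = loaded, and are rounded to four significant figures.

```latex
\begin{aligned}
\alpha_1(1) &= 0.7 \cdot \tfrac{1}{6} \approx 0.1167
  & \alpha_1(2) &= 0.3 \cdot \tfrac{1}{10} = 0.03 \\
\alpha_2(1) &= \left[\alpha_1(1) \cdot 0.95 + \alpha_1(2) \cdot 0.05\right] \cdot \tfrac{1}{6} \approx 0.01872
  & \alpha_2(2) &= \left[\alpha_1(1) \cdot 0.05 + \alpha_1(2) \cdot 0.95\right] \cdot \tfrac{1}{2} \approx 0.01717
\end{aligned}
```

After the first 6 is observed, the loaded state has nearly caught up with the fair state even though it started with a much smaller value, because the loaded die emits a 6 three times as often.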
15
Markov model for Dow Jones
16
Forward trellis for Dow Jones
17
The Decoding Problem

What sequence of states $Q = (q_1 q_2 \ldots q_T)$ best explains
the observation sequence $O = (o_1 o_2 \ldots o_T)$?
Helps us find the path through a model.

ART  N    V    ADV
The  dog  sat  quietly
18
The Decoding Problem

What sequence of states $Q = (q_1 q_2 \ldots q_T)$ best explains
the observation sequence $O = (o_1 o_2 \ldots o_T)$?
Viterbi Decoding
slight modification of the forward algorithm
the major difference is the maximization over
previous states
Note: the most likely state sequence is not the same
as the sequence of most likely states

19
The Viterbi Algorithm
20
The Forward inductive step

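Computation of $\alpha_t(j)$

[Diagram: trellis columns for observations $o_{t-1}$ and $o_t$; the $\alpha_{t-1}$ values of all states at time $t-1$ are summed into $\alpha_t(j)$]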
21
The Viterbi inductive step

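Computation of $v_t(j)$

Keep track of who the predecessor was at each
step.
[Diagram: trellis columns for observations $o_{t-1}$ and $o_t$; $v_t(j)$ takes the maximum over the previous values $v_{t-1}(i)$]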
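The Viterbi step differs from the forward step only in replacing the sum over predecessors with a maximum and remembering which predecessor won. The sketch below uses the standard recursion $v_t(j) = \max_i [v_{t-1}(i)\, a_{ij}]\, b_j(o_t)$ on the casino model; the names `viterbi`, `back` and the dictionary layout are illustrative assumptions.

```python
# Casino model as assumed in this sketch (names are illustrative).
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.7, "LOADED": 0.3}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
EMIT = {"FAIR": {face: 1 / 6 for face in range(1, 7)},
        "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}


def viterbi(obs, states, start, trans, emit):
    """Return (best_path, probability of that path together with the observations)."""
    v = [{j: start[j] * emit[j][obs[0]] for j in states}]
    back = []  # back[t - 1][j] = best predecessor of state j at step t
    for o in obs[1:]:
        prev = v[-1]
        column, pointers = {}, {}
        for j in states:
            # Maximise over predecessors instead of summing over them.
            best_i = max(states, key=lambda i: prev[i] * trans[i][j])
            pointers[j] = best_i
            column[j] = prev[best_i] * trans[best_i][j] * emit[j][o]
        v.append(column)
        back.append(pointers)
    # Pick the best final state, then follow the predecessor pointers backwards.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, v[-1][last]


best_path, best_prob = viterbi([1, 6, 6, 2], STATES, START, TRANS, EMIT)
```

Storing the back-pointers is what lets the whole path be recovered in one backward pass once the last column has been filled in.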
22
Viterbi for Dow Jones
23
The Learning Problem

Given $O$, how do we adjust the model parameters $\lambda = (A, B, \pi)$
to maximize $P(O \mid \lambda)$?
In other words: how do we make a hidden Markov
model that best models what we observe?

24
Baum-Welch Local Maximization

1st step: you determine
The number of hidden states, $N$
The emission (observation) alphabet
2nd step: randomly assign values to
$A$ - the transition probabilities
$B$ - the observation (emission) probabilities
$\pi$ - the starting state probabilities
3rd step: let the machine re-estimate
$A$, $B$, $\pi$
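The second step can be sketched as follows. This is one possible way to draw a random starting model, assuming states and symbols are indexed from 0; the helper names `random_distribution` and `random_hmm` and the small positive floor on each weight are choices made for this example, not part of the slides.

```python
import random


def random_distribution(n, rng):
    """Return n positive random numbers that sum to 1."""
    # The small floor keeps every entry strictly positive, so no transition
    # or emission is ruled out before re-estimation has seen any data.
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_hmm(n_states, n_symbols, seed=0):
    """Randomly initialise A (N x N), B (N x M) and pi (length N)."""
    rng = random.Random(seed)
    pi = random_distribution(n_states, rng)
    A = [random_distribution(n_states, rng) for _ in range(n_states)]
    B = [random_distribution(n_symbols, rng) for _ in range(n_states)]
    return A, B, pi


A, B, pi = random_hmm(n_states=2, n_symbols=6)
```

Because the procedure only finds a local maximum, it is common to repeat it from several different seeds and keep the model with the highest likelihood.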

25
Estimation Formulae
26
Learning transitions
27
Math
28
Estimation of starting probs.
This is the number of transitions from i at time t
29
Estimation Formulae
30
Estimation Formulae
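The formulas on these slides did not survive in the transcript. For orientation, the block below gives the standard textbook form of the Baum-Welch re-estimation formulas; it is not recovered from the slides. It assumes the backward variable $\beta_t(i)$, the probability of the observations after time $t$ given state $i$ at time $t$, which the deck does not define, alongside the forward variable $\alpha_t(i)$.

```latex
\begin{aligned}
\xi_t(i,j) &= \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)},
&\qquad
\gamma_t(i) &= \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} \\[4pt]
\hat{\pi}_i &= \gamma_1(i),
&\qquad
\hat{a}_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \\[4pt]
\hat{b}_j(k) &= \frac{\sum_{t \,:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\end{aligned}
```

Here $\gamma_t(i)$ is the probability of being in state $i$ at time $t$ given the observations, and $\xi_t(i,j)$ is the probability of moving from $i$ to $j$ between times $t$ and $t+1$, so each new parameter is a ratio of expected counts.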
31
What are we maximizing again?
32
The game is

EITHER the current model is at a local maximum
and
reestimate = current model
OR our reestimate will be slightly better and
reestimate ≠ current model
SO we feed in the reestimate as the current
model, over and over until we can't improve any
more.
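The loop described above can be sketched end to end. The code below is a minimal, unscaled Baum-Welch for a single short observation sequence, using the standard textbook re-estimation formulas with forward and backward passes. States and symbols are assumed to be 0-based integers, every state is assumed to keep a nonzero posterior probability, and all function names and the stopping tolerance are choices made for this example.

```python
import math


def forward(obs, A, B, pi):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)])
    return alpha


def backward(obs, A, B):
    n = len(A)
    beta = [[1.0] * n]  # beta at the final time step is 1 for every state
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)])
    return beta


def reestimate(obs, A, B, pi):
    """One Baum-Welch update; also returns P(O | input model)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, A, B, pi), backward(obs, A, B)
    lik = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t, given O.
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of moving from i at time t to j at time t + 1, given O.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, lik


def baum_welch(obs, A, B, pi, tol=1e-6, max_iter=200):
    """Feed each re-estimate back in until the log-likelihood stops improving."""
    log_liks = []
    for _ in range(max_iter):
        new_A, new_B, new_pi, lik = reestimate(obs, A, B, pi)
        log_liks.append(math.log(lik))  # log P(O | current model)
        if len(log_liks) > 1 and log_liks[-1] - log_liks[-2] < tol:
            break  # the re-estimate is (nearly) the current model
        A, B, pi = new_A, new_B, new_pi
    return A, B, pi, log_liks
```

A typical call passes the rolls shifted to 0-based symbols, for example `baum_welch([0, 5, 5, 1, 5, 2, 5, 5], A0, B0, pi0)` with some starting guess `A0`, `B0`, `pi0`. The returned `log_liks` list should never decrease, which is a useful sanity check on any implementation. Without rescaling, the forward and backward values underflow on long sequences.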

33
Caveats

This is a kind of hill-climbing technique
Often has serious problems with local maxima
You don't know when you're done

34
So... how else could we do this?

Standard gradient descent techniques?
Hill climb?
Beam search?
Genetic Algorithm?

35
Back to the fundamental question

Which processes have the Markov property?
What if a hidden state variable is included? (as
in an HMM)
