Ch-9: Markov Models - PowerPoint PPT Presentation


About This Presentation

Title:

Ch-9: Markov Models

Description:

Ch-9: Markov Models, prepared by Qaiser Abbas (07-0906). PowerPoint PPT presentation.


Slides: 29

Provided by: cz57


Transcript and Presenter's Notes


1

Ch-9 Markov Models
Prepared by Qaiser Abbas (07-0906)

2
Outline

Markov Models
Hidden Markov Models (HMM)
Three problems in HMM and their solutions

3
Credits and References

Materials used in this presentation are taken from the following textbooks and web resources:
1. "Foundations of Statistical Natural Language Processing" by Manning and Schütze. Chapter 9, Markov Models.
2. "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by D. Jurafsky and J. H. Martin. Updated chapters are available on the authors' website. Chapter 9, Automatic Speech Recognition.
3. "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development" by X. Huang, A. Acero, and H. W. Hon. Chapter 8, Hidden Markov Models; Chapter 12, Basic Search Algorithms.
4. Dr. Andrew W. Moore, Carnegie Mellon University, http://www.cs.cmu.edu/~awm/tutorials
5. Larry Rabiner's tutorial on HMMs.

4
A Markov System
Has N states, called s_1, s_2, ..., s_N. There are discrete timesteps, t = 0, t = 1, ...
[Figure: state diagram with states s_1, s_2, s_3; N = 3, t = 0]
5
A Markov System
Has N states, called s_1, s_2, ..., s_N. There are discrete timesteps, t = 0, t = 1, ... On the t-th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2, ..., s_N}.
[Figure: state diagram with states s_1, s_2, s_3; N = 3, t = 0, q_t = q_0 = s_3]
6
A Markov System
Has N states, called s_1, s_2, ..., s_N. There are discrete timesteps, t = 0, t = 1, ... On the t-th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2, ..., s_N}. Between timesteps, the next state is chosen at random.
[Figure: state diagram with states s_1, s_2, s_3; N = 3, t = 1, q_t = q_1 = s_2]
7
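A Markov System
Has N states, called s_1, s_2, ..., s_N. There are discrete timesteps, t = 0, t = 1, ... On the t-th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2, ..., s_N}. The current state determines the probability distribution for the next state:
From s_1: P(q_{t+1} = s_1 | q_t = s_1) = 0, P(q_{t+1} = s_2 | q_t = s_1) = 0, P(q_{t+1} = s_3 | q_t = s_1) = 1
From s_2: P(q_{t+1} = s_1 | q_t = s_2) = 1/2, P(q_{t+1} = s_2 | q_t = s_2) = 1/2, P(q_{t+1} = s_3 | q_t = s_2) = 0
From s_3: P(q_{t+1} = s_1 | q_t = s_3) = 1/3, P(q_{t+1} = s_2 | q_t = s_3) = 2/3, P(q_{t+1} = s_3 | q_t = s_3) = 0
[Figure: state diagram with arcs s_1 → s_3 (1), s_2 → s_1 (1/2), s_2 → s_2 (1/2), s_3 → s_1 (1/3), s_3 → s_2 (2/3); N = 3, t = 1, q_t = q_1 = s_2]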
8
Markov Property
q_{t+1} is conditionally independent of q_{t-1}, q_{t-2}, ..., q_1, q_0 given q_t. In other words:
P(q_{t+1} = s_j | q_t = s_i) = P(q_{t+1} = s_j | q_t = s_i, any earlier history)
The sequence of states q is said to be a Markov chain, or to have the Markov property, if the next state depends only on the current state and not on any past states.
[Figure: the same three-state diagram and transition probabilities as on slide 7; N = 3, t = 1, q_t = q_1 = s_2]
9
Transition Matrix
Question: What is the probability of the state sequence ...?
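The transition probabilities from slide 7 can be collected into a 3 × 3 transition matrix whose row i is the distribution of q_{t+1} given q_t = s_i. Below is a minimal Python sketch, assuming the first state of the sequence is given; the sequence s_3 → s_2 → s_1 → s_3 is a hypothetical example, since the slide's own sequence is not preserved.

```python
from fractions import Fraction as F

# Row i holds P(q_{t+1} = s_j | q_t = s_i) for j = 1..3 (values from slide 7).
A = [
    [F(0), F(0), F(1)],        # from s_1
    [F(1, 2), F(1, 2), F(0)],  # from s_2
    [F(1, 3), F(2, 3), F(0)],  # from s_3
]

def sequence_probability(states, A):
    """Probability of states[1:] given that the chain starts in states[0].

    States are 1-based indices, matching s_1, s_2, s_3 on the slides.
    """
    p = F(1)
    for prev, nxt in zip(states, states[1:]):
        p *= A[prev - 1][nxt - 1]  # one factor per transition
    return p

# Every row of a transition matrix must be a probability distribution.
assert all(sum(row) == 1 for row in A)

print(sequence_probability([3, 2, 1, 3], A))  # 2/3 * 1/2 * 1 = 1/3
```

Exact fractions are used so the product matches the hand calculation digit for digit.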
10
Example: A Simple Markov Model for Weather Prediction

On any given day, the weather can be described as being in one of three states:
State 1: rainy (or snowy)
State 2: cloudy
State 3: sunny

[Figure: transition matrix]
11
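Question

Given that the weather on day 1 (t = 1) is sunny (state 3), what is the probability that the weather for eight consecutive days is sun-sun-sun-rain-rain-sun-cloudy-sun?
Solution
O = {sun, sun, sun, rain, rain, sun, cloudy, sun} = {3, 3, 3, 1, 1, 3, 2, 3}
P(O | Model) = P(3) P(3|3) P(3|3) P(1|3) P(1|1) P(3|1) P(2|3) P(3|2) = π_3 · a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23,
where a_ij = P(q_{t+1} = j | q_t = i) and π_3 = 1 because day 1 is known to be sunny.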
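The slide's transition matrix is not preserved in this transcript, so the sketch below assumes the weather matrix from Rabiner's tutorial (reference 5), which uses the same three states; if the slide used different values, only the matrix `A` needs to change. It multiplies π_3 = 1 by the seven transition probabilities listed in the solution.

```python
# ASSUMPTION: transition matrix taken from Rabiner's HMM tutorial; the
# slide's own matrix image is not preserved in this transcript.
# States: 1 = rainy (or snowy), 2 = cloudy, 3 = sunny.
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

def chain_probability(states, A, first_state_prob=1.0):
    """P(q_1, ..., q_T) = pi_{q_1} * product of a_{q_{t-1} q_t} (1-based states)."""
    p = first_state_prob
    for prev, nxt in zip(states, states[1:]):
        p *= A[prev - 1][nxt - 1]
    return p

O = [3, 3, 3, 1, 1, 3, 2, 3]  # sun sun sun rain rain sun cloudy sun
p = chain_probability(O, A)   # pi_3 = 1: day 1 is known to be sunny
print(f"{p:.4e}")             # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.5360e-04
```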

12
From Markov to Hidden Markov

The previous model assumes that each state can be uniquely associated with an observable event.
Once an observation is made, the state of the system is then trivially retrieved.
This model, however, is too restrictive to be of practical use for most realistic problems.
To make the model more flexible, we assume that the outcomes or observations of the model are a probabilistic function of each state.
Each state can produce a number of outputs according to a probability distribution, and each distinct output can potentially be generated at any state.
These are known as Hidden Markov Models (HMMs), because the state sequence is not directly observable; it can only be approximated from the sequence of observations produced by the system.

13
Example: A Crazy Soft Drink Machine

Suppose you have a crazy soft drink machine: it can be in two states, cola preferring (CP) and iced tea preferring (IP), but it switches between them randomly after each purchase, as shown below.

Three possible outputs (observations): cola, iced tea, lemonade.
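To see what "hidden" means here, the machine can be simulated: the customer sees only the drinks, never the CP/IP state. The diagram's probabilities are not preserved in this transcript, so this sketch assumes the values from Manning and Schütze's version of the example (reference 1); the names `draw` and `simulate` are hypothetical helpers.

```python
import random

# ASSUMPTION: parameter values from Manning & Schütze's soft drink example;
# the slide's diagram is not preserved in this transcript.
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {
    "CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
    "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2},
}

def draw(dist, rng):
    """Sample one key of a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def simulate(n_purchases, start="CP", seed=0):
    """Return (hidden_states, outputs) for n_purchases purchases."""
    rng = random.Random(seed)
    state, states, outputs = start, [], []
    for _ in range(n_purchases):
        states.append(state)
        outputs.append(draw(B[state], rng))  # emit a drink from the current state
        state = draw(A[state], rng)          # then switch state at random
    return states, outputs

states, outputs = simulate(5)
print(outputs)  # what the customer observes
print(states)   # hidden from the customer
```

Any drink can come out of either state, so the output list alone does not determine the state list.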
14
Question

What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola preferring state?
Solution
We need to consider all paths that might be taken through the HMM, and then sum over them. We know that the machine starts in state CP. There are then four possibilities to produce the observations:
CP → CP → CP
CP → CP → IP
CP → IP → CP
CP → IP → IP
So the total probability is the sum of the probabilities of these four paths.
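The four-path sum can be checked mechanically. This sketch assumes the parameters from Manning and Schütze's version of the example, and follows their convention for this question: each drink is emitted by the state the machine is in before it makes the next transition, which is why a two-drink sequence has three-state paths. `output_probability` is a hypothetical helper name.

```python
from itertools import product

# ASSUMPTION: parameter values from Manning & Schütze's soft drink example.
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {
    "CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
    "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2},
}

def output_probability(outputs, start="CP"):
    """Sum over every path start -> q_1 -> ... -> q_T of emission * transition."""
    total = 0.0
    for rest in product(A, repeat=len(outputs)):  # 2**T continuations of the path
        path = (start,) + rest
        p = 1.0
        for t, symbol in enumerate(outputs):
            # emit symbol from path[t], then move to path[t + 1]
            p *= B[path[t]][symbol] * A[path[t]][path[t + 1]]
        total += p
    return total

print(round(output_probability(["lem", "ice_t"]), 4))  # 0.084
```

For {lem, ice_t} the four terms are 0.3·0.7·0.1·0.7, 0.3·0.7·0.1·0.3, 0.3·0.3·0.7·0.5 and 0.3·0.3·0.7·0.5, which sum to 0.084.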

15
A Crazy Soft Drink Machine (Continued)
16
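General Form of an HMM

An HMM is specified by a five-tuple:
1) A set of hidden states S = {s_1, s_2, ..., s_N}, where N is the number of states and q_t is the state at time t.
2) A set of observation symbols V = {v_1, v_2, ..., v_M}, where M is the number of observation symbols.
3) The initial state distribution π = {π_i}, with π_i = P(q_1 = s_i).
4) The state transition probability distribution A = {a_ij}, with a_ij = P(q_{t+1} = s_j | q_t = s_i).
5) The observation symbol probability distribution in state j, B = {b_j(k)}, with b_j(k) = P(o_t = v_k | q_t = s_j).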
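The five-tuple maps naturally onto a small container type. This is a sketch, not a standard API: the class name `HMM` and method `validate` are hypothetical, and `B` is stored with one row per state (N × M), a layout choice made here for convenience. The example instance uses the soft drink machine with parameters assumed from Manning and Schütze.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HMM:
    states: List[str]      # S = {s_1, ..., s_N}
    symbols: List[str]     # V = {v_1, ..., v_M}
    pi: List[float]        # pi[i] = P(q_1 = s_i)
    A: List[List[float]]   # A[i][j] = P(q_{t+1} = s_j | q_t = s_i), N x N
    B: List[List[float]]   # B[j][k] = P(o_t = v_k | q_t = s_j), N x M

    def validate(self, tol: float = 1e-9) -> "HMM":
        """Raise ValueError unless pi and every row of A and B are distributions."""
        N, M = len(self.states), len(self.symbols)

        def check(dist, size, name):
            if len(dist) != size or abs(sum(dist) - 1.0) > tol:
                raise ValueError(f"{name} is not a distribution over {size} items")

        check(self.pi, N, "pi")
        if len(self.A) != N or len(self.B) != N:
            raise ValueError("A and B need one row per state")
        for i in range(N):
            check(self.A[i], N, f"A[{i}]")
            check(self.B[i], M, f"B[{i}]")
        return self

# ASSUMPTION: soft drink machine values from Manning & Schütze, starting in CP.
drink = HMM(
    states=["CP", "IP"],
    symbols=["cola", "ice_t", "lem"],
    pi=[1.0, 0.0],
    A=[[0.7, 0.3], [0.5, 0.5]],
    B=[[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]],
).validate()
```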

17
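General Form of an HMM (Continued)
Two assumptions:
1. Markov assumption: P(q_t | q_1, ..., q_{t-1}) = P(q_t | q_{t-1}). It governs the state sequence.
2. Output independence assumption: P(o_t | o_1, ..., o_{t-1}, q_1, ..., q_t) = P(o_t | q_t). It governs the output sequence.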
18
Three Basic Problems in HMM
1. The Evaluation Problem: Given a model λ and a sequence of observations O = (o_1, o_2, ..., o_T), what is the probability P(O | λ), i.e., the probability that the model generates the observations? How to evaluate an HMM: Forward Algorithm.
2. The Decoding Problem: Given a model λ and a sequence of observations O = (o_1, o_2, ..., o_T), what is the most likely state sequence Q = (q_1, q_2, ..., q_T) in the model that produces the observations? How to decode an HMM: Viterbi Algorithm.
3. The Learning Problem: Given a model λ and a set of observations, how can we adjust the model parameters λ to maximize the joint probability ∏_O P(O | λ)? How to train an HMM: Baum-Welch Algorithm.
19
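How to Evaluate an HMM: A Straightforward Method

To calculate the probability (likelihood) P(O | λ) of the observation sequence O = (o_1, o_2, ..., o_T), given the HMM λ, the most intuitive way is to sum up the probabilities of all possible state sequences Q:
P(O | λ) = Σ_Q P(Q | λ) P(O | Q, λ)
Applying the Markov assumption: P(Q | λ) = π_{q_1} ∏_{t=2}^{T} a_{q_{t-1} q_t}
Applying the output independence assumption: P(O | Q, λ) = ∏_{t=1}^{T} b_{q_t}(o_t)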
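The straightforward method translates directly into a loop over all N^T state sequences. A minimal sketch, assuming the soft drink machine parameters from Manning and Schütze with states indexed 0 = CP, 1 = IP and symbols 0 = cola, 1 = ice_t, 2 = lem; `brute_force_likelihood` is a hypothetical name. With π concentrated on CP, this state-emission formulation gives the same 0.084 as the four-path sum for {lem, ice_t}.

```python
from itertools import product

# ASSUMPTION: soft drink machine values (Manning & Schütze).
# States: 0 = CP, 1 = IP. Symbols: 0 = cola, 1 = ice_t, 2 = lem.
pi = [1.0, 0.0]                       # always starts in CP
A = [[0.7, 0.3], [0.5, 0.5]]          # A[i][j] = a_ij
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]  # B[j][k] = b_j(k)

def brute_force_likelihood(obs, pi, A, B):
    """P(O | lambda) = sum over all Q of P(Q | lambda) * P(O | Q, lambda)."""
    N = len(pi)
    total = 0.0
    for Q in product(range(N), repeat=len(obs)):  # all N**T state sequences
        p = pi[Q[0]] * B[Q[0]][obs[0]]            # pi_{q_1} * b_{q_1}(o_1)
        for t in range(1, len(obs)):
            # Markov assumption (a) and output independence (b)
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total

print(round(brute_force_likelihood([2, 1], pi, A, B), 6))  # lem, ice_t -> 0.084
```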
20
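How to Evaluate an HMM: A Straightforward Method (Complexity)
For any given state sequence, we start from the initial state q_1 with probability π_{q_1}. We take a transition from q_{t-1} to q_t with probability a_{q_{t-1} q_t} and generate the observation o_t with probability b_{q_t}(o_t), until we reach the last transition. There are N^T possible state sequences, each needing about 2T multiplications, so the direct method costs on the order of 2T·N^T operations.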
21
How to Evaluate an HMM: The Forward Algorithm

Define the forward probability α_t(i) = P(o_1, o_2, ..., o_t, q_t = s_i | λ).

α_t(i) is the probability that the HMM is in state s_i having generated the partial observation o_1, o_2, ..., o_t.
The computation is done in a time-synchronous fashion from left to right.
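Computed left to right, α needs only the previous column at each step. This sketch uses the standard initialization, induction, and termination steps from Rabiner's tutorial, α_1(i) = π_i b_i(o_1), α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(o_{t+1}), and P(O | λ) = Σ_i α_T(i). The soft drink parameters are assumed from Manning and Schütze; `forward` is a hypothetical name.

```python
# ASSUMPTION: soft drink machine values (Manning & Schütze).
# States: 0 = CP, 1 = IP. Symbols: 0 = cola, 1 = ice_t, 2 = lem.
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]

def forward(obs, pi, A, B):
    """Return P(O | lambda) using O(N^2 T) operations."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]  # alpha_1(i)
    for o in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)  # termination: sum_i alpha_T(i)

print(round(forward([2, 1], pi, A, B), 6))  # lem, ice_t -> 0.084
```

Unlike the straightforward method, each step reuses the sums already accumulated in α instead of re-walking every path.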
22
How to Evaluate an HMM: The Forward Algorithm
It needs exactly N(N+1)(T-1) + N multiplications and N(N-1)(T-1) additions, so the complexity of this algorithm is O(N^2 T). For N = 5 and T = 100, we need about 3000 computations for the forward algorithm, versus 10^72 computations for the straightforward method.
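The operation counts can be checked with plain arithmetic. The direct-method estimate 2T·N^T is the one given in Rabiner's tutorial; the function names are hypothetical.

```python
def forward_ops(N, T):
    """Exact multiplication and addition counts for the forward algorithm."""
    mults = N * (N + 1) * (T - 1) + N
    adds = N * (N - 1) * (T - 1)
    return mults, adds

def direct_ops(N, T):
    """Order-of-magnitude cost of summing over all N**T paths (~2T each)."""
    return 2 * T * N ** T

print(forward_ops(5, 100))           # (2975, 1980)
print(f"{direct_ops(5, 100):.1e}")   # 1.6e+72
```

The 2975 multiplications are the "about 3000 computations" quoted above.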
23
How to Decode an HMM: The Viterbi Algorithm

Instead of summing up probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path.
Define the best-path probability δ_t(i) = max_{q_1, ..., q_{t-1}} P(o_1, ..., o_t, q_1, ..., q_{t-1}, q_t = s_i | λ).

δ_t(i) is the probability of the most likely state sequence at time t, which has generated the observation o_1, ..., o_t (until time t) and ends in state i.
24
How to Decode an HMM: The Viterbi Algorithm
The computation is done in a time-synchronous fashion from left to right. The complexity is also O(N^2 T).
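Viterbi has the same loop structure as the forward algorithm, with the sum replaced by a max plus a backpointer recording which predecessor achieved it. A minimal sketch, assuming the soft drink parameters from Manning and Schütze; `viterbi` returns 0-based state indices and δ_T of the best final state.

```python
# ASSUMPTION: soft drink machine values (Manning & Schütze).
# States: 0 = CP, 1 = IP. Symbols: 0 = cola, 1 = ice_t, 2 = lem.
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]

def viterbi(obs, pi, A, B):
    """Return (most likely state path, its joint probability with obs)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]  # delta_1(i)
    backpointers = []
    for o in obs[1:]:
        # For each destination j, remember the best predecessor i.
        best_prev = [max(range(N), key=lambda i: delta[i] * A[i][j])
                     for j in range(N)]
        delta = [delta[best_prev[j]] * A[best_prev[j]][j] * B[j][o]
                 for j in range(N)]
        backpointers.append(best_prev)
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(backpointers):  # trace the best path backwards
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]

print(viterbi([2, 1], pi, A, B))  # path [0, 1], i.e. CP -> IP, probability about 0.063
```

For {lem, ice_t}, CP → IP (0.3·0.3·0.7) beats CP → CP (0.3·0.7·0.1).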
25
HMM Training Using the Baum-Welch Algorithm

A Hidden Markov Model is a probabilistic model of the joint probability of a collection of random variables O_1, ..., O_T, Q_1, ..., Q_T. The O_t variables are discrete observations and the Q_t variables are hidden, discrete states. An HMM makes two conditional independence assumptions:
1. The t-th hidden variable, given the (t-1)-st hidden variable, is independent of the previous variables: P(Q_t | Q_{t-1}, O_{t-1}, ..., Q_1, O_1) = P(Q_t | Q_{t-1}).
2. The t-th observation depends only on the t-th state: P(O_t | Q_t, O_{t-1}, Q_{t-1}, ..., Q_1, O_1) = P(O_t | Q_t).
The EM algorithm for finding the maximum likelihood estimate (MLE) of the parameters of an HMM, given a set of observed feature vectors, is also known as the Baum-Welch algorithm.
Q_t is a discrete random variable with N possible values {1, ..., N}. We further assume that the underlying hidden Markov chain defined by P(Q_t | Q_{t-1}) is time-homogeneous (i.e., independent of the time t). Therefore, we can represent P(Q_t | Q_{t-1}) as a time-independent stochastic transition matrix A = {a_ij}, with a_ij = P(Q_t = j | Q_{t-1} = i).
The special case of time t = 1 is described by the initial state distribution π_i = P(Q_1 = i). We say that we are in state j at time t if Q_t = j. A particular sequence of states is described by q = (q_1, ..., q_T), where q_t ∈ {1, ..., N} is the state at time t.
The observation is one of L possible observation symbols, O_t ∈ {o_1, ..., o_L}. The probability of a particular observation vector at a particular time t for state j is described by b_j(o_t) = P(O_t = o_t | Q_t = j). (B = {b_ij} is an L × N matrix.) A particular observation sequence O is described as O = (O_1 = o_1, ..., O_T = o_T).

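Therefore, we can describe an HMM by λ = (A, B, π). Given an observation O, the Baum-Welch algorithm finds λ* = argmax_λ P(O | λ), that is, the HMM λ that maximizes the probability of the observation O. In practice, EM is only guaranteed to converge to a local maximum of this likelihood.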
The Baum-Welch algorithm
Initialization: set λ = (A, B, π) with random initial conditions. The algorithm then updates the parameters of λ iteratively until convergence, following the procedure below.
The forward procedure: We define α_i(t) = P(O_1 = o_1, ..., O_t = o_t, Q_t = i | λ), which is the probability of seeing the partial sequence o_1, ..., o_t and ending up in state i at time t. We can efficiently calculate α_i(t) recursively as:
α_i(1) = π_i b_i(o_1)
α_j(t+1) = b_j(o_{t+1}) Σ_{i=1}^{N} α_i(t) a_ij
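The backward procedure: β_i(t) = P(O_{t+1} = o_{t+1}, ..., O_T = o_T | Q_t = i, λ) is the probability of the ending partial sequence o_{t+1}, ..., o_T given that we started at state i at time t. We can efficiently calculate β_i(t) as:
β_i(T) = 1
β_i(t) = Σ_{j=1}^{N} β_j(t+1) a_ij b_j(o_{t+1})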
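The two recursions can be tabulated side by side. This sketch assumes the soft drink parameters from Manning and Schütze and uses 0-based time, so `alpha[t][i]` is the slide's α_i(t+1). A useful sanity check is the standard identity Σ_i α_i(t) β_i(t) = P(O | λ), which must give the same value at every t; `forward_table` and `backward_table` are hypothetical names.

```python
# ASSUMPTION: soft drink machine values (Manning & Schütze).
# States: 0 = CP, 1 = IP. Symbols: 0 = cola, 1 = ice_t, 2 = lem.
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]

def forward_table(obs, pi, A, B):
    """alpha[t][j], 0-based t: alpha_j(1) = pi_j b_j(o_1), then the recursion."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward_table(obs, A, B):
    """beta[t][i], 0-based t: beta_i(T) = 1, then filled from right to left."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):  # o is o_{t+1} while computing beta(t)
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

obs = [2, 1]  # lem, ice_t
alpha, beta = forward_table(obs, pi, A, B), backward_table(obs, A, B)
for t in range(len(obs)):
    print(round(sum(a * b for a, b in zip(alpha[t], beta[t])), 6))  # 0.084 each time
```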
Using α and β, we can calculate the following variables: