Ch-9: Markov Models - PowerPoint PPT Presentation

Ch-9: Markov Models Prepared by Qaiser Abbas (07-0906) * – PowerPoint PPT presentation

1
  • Ch-9 Markov Models
  • Prepared by Qaiser Abbas (07-0906)

2
Outline
  • Markov Models
  • Hidden Markov Models (HMM)
  • Three problems in HMM and their solutions

3
Credits and References
  • Materials used in this presentation are taken
    from the following textbooks and web resources
  • 1. "Foundations of Statistical Natural Language
    Processing" by Manning and Schütze. Chapter 9,
    Markov Models
  • 2. "Speech and Language Processing: An
    Introduction to Natural Language Processing,
    Computational Linguistics, and Speech
    Recognition" by D. Jurafsky and J.H. Martin
    (updated chapters are available on the authors'
    website). Chapter 9, Automatic Speech
    Recognition
  • 3. "Spoken Language Processing: A Guide to
    Theory, Algorithm, and System Development" by X.
    Huang, A. Acero, and H.W. Hon. Chapter 8, Hidden
    Markov Models; Chapter 12, Basic Search
    Algorithms
  • 4. Dr. Andrew W. Moore, Carnegie Mellon University,
    http://www.cs.cmu.edu/~awm/tutorials
  • 5. Larry Rabiner's tutorial on HMMs

4
A Markov System
  • Has N states, called $s_1, s_2, \dots, s_N$
  • There are discrete timesteps, $t = 0, t = 1, \dots$
[Figure: three states $s_1$, $s_2$, $s_3$; $N = 3$, $t = 0$]
5
A Markov System
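  • Has N states, called $s_1, s_2, \dots, s_N$
  • There are discrete timesteps, $t = 0, t = 1, \dots$
  • On the t-th timestep the system is in exactly one
    of the available states. Call it $q_t$.
    Note: $q_t \in \{s_1, s_2, \dots, s_N\}$
[Figure: three states $s_1$, $s_2$, $s_3$; $N = 3$, $t = 0$, $q_t = q_0 = s_3$]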
6
A Markov System
  • Has N states, called $s_1, s_2, \dots, s_N$
  • There are discrete timesteps, $t = 0, t = 1, \dots$
  • On the t-th timestep the system is in exactly one
    of the available states. Call it $q_t$.
    Note: $q_t \in \{s_1, s_2, \dots, s_N\}$
  • Between each timestep, the next state is chosen
    randomly.
[Figure: three states $s_1$, $s_2$, $s_3$; $N = 3$, $t = 1$, $q_t = q_1 = s_2$]
7
A Markov System
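  • Has N states, called $s_1, s_2, \dots, s_N$
  • There are discrete timesteps, $t = 0, t = 1, \dots$
  • On the t-th timestep the system is in exactly one
    of the available states. Call it $q_t$.
    Note: $q_t \in \{s_1, s_2, \dots, s_N\}$
  • The current state determines the probability
    distribution for the next state.
  • From $s_1$: $P(q_{t+1} = s_1 \mid q_t = s_1) = 0$,
    $P(q_{t+1} = s_2 \mid q_t = s_1) = 0$,
    $P(q_{t+1} = s_3 \mid q_t = s_1) = 1$
  • From $s_2$: $P(q_{t+1} = s_1 \mid q_t = s_2) = 1/2$,
    $P(q_{t+1} = s_2 \mid q_t = s_2) = 1/2$,
    $P(q_{t+1} = s_3 \mid q_t = s_2) = 0$
  • From $s_3$: $P(q_{t+1} = s_1 \mid q_t = s_3) = 1/3$,
    $P(q_{t+1} = s_2 \mid q_t = s_3) = 2/3$,
    $P(q_{t+1} = s_3 \mid q_t = s_3) = 0$
[Figure: state diagram with arcs labeled 1, 1/2, 1/2, 1/3, 2/3; $N = 3$, $t = 1$, $q_t = q_1 = s_2$]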
8
Markov Property
  • $q_{t+1}$ is conditionally independent of
    $q_{t-1}, q_{t-2}, \dots, q_1, q_0$ given $q_t$.
  • In other words:
    $P(q_{t+1} = s_j \mid q_t = s_i) = P(q_{t+1} = s_j \mid q_t = s_i, \text{any earlier history})$
  • The sequence of $q$ is said to be a Markov chain,
    or to have the Markov property, if the next state
    depends only on the current state and not on any
    past states.
[Figure: the three-state diagram and transition probabilities from the previous slide; $N = 3$, $t = 1$, $q_t = q_1 = s_2$]
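The Markov property can be illustrated with a short simulation. The sketch below is only an illustration, not part of the original slides: it uses the transition probabilities of the three-state example, and the function name `simulate` and the fixed seed are assumptions chosen for this example. Each step draws the next state from the current state alone.

```python
import random

# Transition probabilities of the three-state example:
# P[current][next] = P(q_{t+1} = next | q_t = current).
P = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1 / 3, "s2": 2 / 3, "s3": 0.0},
}


def simulate(start, steps, rng):
    """Sample q_0, ..., q_steps starting from `start` (hypothetical helper)."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # Only the current state is consulted: this is the Markov property.
        nxt = rng.choices(list(P[current]), weights=list(P[current].values()))[0]
        path.append(nxt)
    return path


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
path = simulate("s3", 10, rng)
print(" -> ".join(path))
```

Whatever sequence is drawn, every visit to s1 is followed by s3, because that transition has probability 1.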
9
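Transition Matrix
  • $A = \{a_{ij}\}$, with $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$
    (rows: current state, columns: next state)

            s1    s2    s3
      s1     0     0     1
      s2   1/2   1/2     0
      s3   1/3   2/3     0

  • Question: What is the probability of the state
    sequence of ... [the sequence is missing from the transcript]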
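As a sketch of how such a question is answered, the probability of a state sequence is the product of the transition probabilities along it. The matrix below is read off the three-state diagram; the sequence s3, s2, s2, s1 is a hypothetical example (the slide's own sequence is not available), and `sequence_probability` is a name introduced here for illustration.

```python
from fractions import Fraction as F

# Rows: current state s1..s3; columns: next state s1..s3.
A = [
    [F(0), F(0), F(1)],
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 3), F(2, 3), F(0)],
]


def sequence_probability(states, A):
    """Probability of the remaining states given the first one (0-based indices)."""
    prob = F(1)
    for i, j in zip(states, states[1:]):
        prob *= A[i][j]  # one factor per transition
    return prob


# Hypothetical sequence s3 -> s2 -> s2 -> s1, i.e. indices 2, 1, 1, 0.
example = sequence_probability([2, 1, 1, 0], A)
print(example)  # 2/3 * 1/2 * 1/2 = 1/6
```

Exact fractions are used so that the result can be compared directly with a hand calculation.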
10
Example: A Simple Markov Model for Weather
Prediction
  • On any given day, the weather can be described as
    being in one of three states:
  • State 1: rainy or snowy
  • State 2: cloudy
  • State 3: sunny

[Transition matrix: figure not preserved in the transcript]
11
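Question
  • Given that the weather on day 1 ($t = 1$) is sunny
    (state 3), what is the probability that the
    weather for eight consecutive days is
    sun-sun-sun-rain-rain-sun-cloudy-sun?
  • Solution
  • $O$ = (sun, sun, sun, rain, rain, sun, cloudy, sun)
    = (3, 3, 3, 1, 1, 3, 2, 3)
  • By the Markov property,
    $P(O \mid \text{model}) = P(q_1 = 3) \cdot a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{23}$,
    where $P(q_1 = 3) = 1$ and $a_{ij}$ is the probability
    of moving from state $i$ to state $j$.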
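The product can be evaluated numerically once a transition matrix is fixed. The slide's matrix is not preserved in this transcript, so the sketch below assumes the values used in Rabiner's tutorial for this same weather example; treat them as an assumption, not as the slide's data. The helper name `chain_probability` is also introduced here.

```python
# ASSUMPTION: transition matrix from Rabiner's tutorial version of this
# example; the matrix shown on the slide itself is not available.
A = [
    [0.4, 0.3, 0.3],  # from state 1 (rain/snow)
    [0.2, 0.6, 0.2],  # from state 2 (cloudy)
    [0.1, 0.1, 0.8],  # from state 3 (sunny)
]

# sun sun sun rain rain sun cloudy sun, using the 1-based state numbers.
states = [3, 3, 3, 1, 1, 3, 2, 3]


def chain_probability(states, A, p_first=1.0):
    """P(first state) times the product of transition probabilities."""
    prob = p_first
    for i, j in zip(states, states[1:]):
        prob *= A[i - 1][j - 1]  # shift 1-based state numbers to 0-based indices
    return prob


p = chain_probability(states, A)
print(f"{p:.4e}")  # 1.5360e-04 under the assumed matrix
```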

12
From Markov To Hidden Markov
  • The previous model assumes that each state can be
    uniquely associated with an observable event
  • Once an observation is made, the state of the
    system is then trivially retrieved
  • This model, however, is too restrictive to be of
    practical use for most realistic problems
  • To make the model more flexible, we will assume
    that the outcomes or observations of the model
    are a probabilistic function of each state
  • Each state can produce a number of outputs
    according to a probability distribution, and each
    distinct output can potentially be generated at
    any state
  • These models are known as Hidden Markov Models
    (HMM) because the state sequence is not directly
    observable; it can only be estimated from the
    sequence of observations produced by the system

13
Example A Crazy Soft Drink Machine
  • Suppose you have a crazy soft drink machine: it
    can be in two states, cola preferring (CP) and
    iced tea preferring (IP), but it switches between
    them randomly after each purchase, as shown below

[Figure: state diagram of the machine with its transition and output probabilities, not preserved in the transcript]
Three possible outputs (observations): cola, iced
tea, lemonade
14
Question
  • What is the probability of seeing the output
    sequence {lem, ice_t} if the machine always
    starts off in the cola preferring state?
  • Solution
  • We need to consider all paths that might be
    taken through the HMM, and then to sum over them.
    We know that the machine starts in state CP.
    There are then four possibilities to produce the
    observations
  • CP -> CP -> CP
  • CP -> CP -> IP
  • CP -> IP -> CP
  • CP -> IP -> IP
  • So the total probability is the sum of the
    probabilities of these four paths

15
A Crazy Soft Drink Machine (Continued)
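The sum over the four paths can be sketched in code. The machine's diagram is not preserved in this transcript, so the probabilities below are an assumption: they are the values used for this example in Manning and Schütze, Chapter 9. The sketch also assumes that each drink is emitted by the state the machine is in at purchase time, after which the machine moves to its next state; the function name is hypothetical.

```python
from itertools import product

# ASSUMPTION: numbers from Manning and Schütze's version of this example;
# the figure on the slide itself is not available.
trans = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
emit = {
    "CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
    "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2},
}


def sequence_probability(observations, start="CP"):
    """Sum the probability of every state path that could emit `observations`."""
    total = 0.0
    for rest in product(trans, repeat=len(observations)):
        path = (start,) + rest
        p = 1.0
        for t, symbol in enumerate(observations):
            # Emit from the current state, then move to the next state.
            p *= emit[path[t]][symbol] * trans[path[t]][path[t + 1]]
        total += p
    return total


total = sequence_probability(["lem", "ice_t"])
print(round(total, 4))  # 0.084 under the assumed numbers
```

Enumerating paths is fine for two observations, but the number of paths doubles with every extra observation, which motivates the forward algorithm later in the deck.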
16
General Form of an HMM
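  • An HMM is specified by a five-tuple
    $(S, V, \pi, A, B)$
  • 1) $S = \{s_1, s_2, \dots, s_N\}$
  • Set of hidden states
  • $N$: the number of states; $q_t$: the state
    at time $t$
  • 2) $V = \{v_1, v_2, \dots, v_M\}$
  • Set of observation symbols
  • $M$: the number of observation symbols
  • 3) $\pi = \{\pi_i\}$, $\pi_i = P(q_1 = s_i)$
  • The initial state distribution
  • 4) $A = \{a_{ij}\}$, $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$
  • State transition probability distribution
  • 5) $B = \{b_j(k)\}$, $b_j(k) = P(o_t = v_k \mid q_t = s_j)$
  • Observation symbol probability distribution in
    state $s_j$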

17
General Form of an HMM (Continued)
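Two assumptions
  • 1. Markov assumption:
    $P(q_t \mid q_1, \dots, q_{t-1}) = P(q_t \mid q_{t-1})$,
    where $q_1, \dots, q_t$ represents the state sequence
  • 2. Output independence assumption:
    $P(o_t \mid o_1, \dots, o_{t-1}, q_1, \dots, q_t) = P(o_t \mid q_t)$,
    where $o_1, \dots, o_t$ represents the output sequence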
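The two assumptions describe a simple generative procedure: pick a state, emit a symbol that depends only on that state, then move to a next state that depends only on the current one. The sketch below is a minimal illustration with a hypothetical two-state, three-symbol HMM; the parameter values and the function name `sample` are assumptions made for this example.

```python
import random

# Hypothetical parameters, used only for illustration.
pi = [0.6, 0.4]                         # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = P(next = j | current = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][k] = P(symbol k | state i)


def sample(pi, A, B, T, rng):
    """Draw T (state, observation) pairs from the HMM."""
    states, observations = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(state)
        # Output independence: the symbol depends only on the current state.
        observations.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # Markov assumption: the next state depends only on the current state.
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, observations


hidden, visible = sample(pi, A, B, 8, random.Random(0))
print("hidden states:", hidden)
print("observations: ", visible)
```

In practice only the second list would be seen; the first is what the decoding problem tries to recover.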
18
Three Basic Problems in HMM
  • 1. The Evaluation Problem: Given a model $\lambda$
    and a sequence of observations
    $O = (o_1, o_2, \dots, o_T)$, what is the probability
    $P(O \mid \lambda)$, i.e., the probability that the
    model generates the observations?
    How to evaluate an HMM? Forward Algorithm
  • 2. The Decoding Problem: Given a model $\lambda$ and
    a sequence of observations
    $O = (o_1, o_2, \dots, o_T)$, what is the most likely
    state sequence $Q = (q_1, q_2, \dots, q_T)$ in the
    model that produces the observations?
    How to decode an HMM? Viterbi Algorithm
  • 3. The Learning Problem: Given a model $\lambda$ and
    a set of observations, how can we adjust the
    model parameters to maximize the joint
    probability (likelihood) $\prod_{O} P(O \mid \lambda)$?
    How to train an HMM? Baum-Welch Algorithm
19
How to Evaluate an HMM-A Straightforward Method
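  • To calculate the probability (likelihood)
    $P(O \mid \lambda)$ of the observation sequence
    $O = (o_1, o_2, \dots, o_T)$, given the HMM $\lambda$,
    the most intuitive way is to sum up the
    probabilities of all possible state sequences:
    $P(O \mid \lambda) = \sum_{\text{all } Q} P(Q \mid \lambda) \, P(O \mid Q, \lambda)$

  • Applying the Markov assumption:
    $P(Q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
  • Applying the output independence assumption:
    $P(O \mid Q, \lambda) = b_{q_1}(o_1) \, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$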
20
How to Evaluate an HMM-A Straightforward Method
(complexity)
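For any given state sequence, we start from the
initial state $q_1$ with probability $\pi_{q_1}$ (or
$a_{q_0 q_1}$). We take a transition from $q_{t-1}$ to
$q_t$ with probability $a_{q_{t-1} q_t}$ and generate the
observation $o_t$ with probability $b_{q_t}(o_t)$
until we reach the last transition. Since there are
$N^T$ possible state sequences, the cost of this
method grows exponentially with $T$.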
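The straightforward method can be written directly as a sum over all state sequences. The sketch below assumes a hypothetical two-state, three-symbol HMM in which each observation is emitted by the state entered at that time step; the parameter values and the name `brute_force_likelihood` are illustrative assumptions.

```python
from itertools import product

# Hypothetical parameters, used only for illustration.
pi = [0.6, 0.4]                         # pi[i] = P(q_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = P(q_{t+1} = j | q_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][k] = P(o_t = k | q_t = i)


def brute_force_likelihood(pi, A, B, obs):
    """P(O | model) as a sum over all N**T state sequences."""
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        # Start in path[0] and emit the first observation there.
        p = pi[path[0]] * B[path[0]][obs[0]]
        # Then alternate: take a transition, emit the next observation.
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


likelihood = brute_force_likelihood(pi, A, B, [0, 1, 2])
print(round(likelihood, 5))  # 0.03628 for these hypothetical parameters
```

With 2 states and 3 observations there are only 8 paths, but the loop count is N to the power T, so this approach is useful only as a reference for checking faster algorithms.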
21
How to Evaluate an HMM-The Forward Algorithm
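  • Define the forward probability
    $\alpha_i(t) = P(o_1, o_2, \dots, o_t, q_t = s_i \mid \lambda)$

$\alpha_i(t)$ is the probability that the HMM is in
state $s_i$ having generated the partial observation
$o_1, o_2, \dots, o_t$.
The computation is done in a time-synchronous
fashion from left to right.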
22
How to Evaluate an HMM-The Forward Algorithm
It needs exactly $N(N+1)(T-1) + N$ multiplications
and $N(N-1)(T-1)$ additions, so the complexity of
this algorithm is $O(N^2 T)$. For $N = 5$, $T = 100$, we
need about 3000 computations for the forward
algorithm, versus about $10^{72}$ computations for the
straightforward method.
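A minimal sketch of the forward recursion follows, under the assumption that each observation is emitted by the state occupied at that time step. The two-state, three-symbol parameters are hypothetical and the function name `forward` is introduced here for illustration.

```python
# Hypothetical parameters, used only for illustration.
pi = [0.6, 0.4]                         # pi[i] = P(q_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = P(q_{t+1} = j | q_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][k] = P(o_t = k | q_t = i)


def forward(pi, A, B, obs):
    """Return P(O | model) using the forward recursion in O(N^2 T) time."""
    n_states = len(pi)
    # Initialization: alpha_i(1) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_j(t) = (sum_i alpha_i(t-1) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: P(O | model) = sum_i alpha_i(T)
    return sum(alpha)


likelihood = forward(pi, A, B, [0, 1, 2])
print(round(likelihood, 5))  # 0.03628 for these hypothetical parameters
```

Only the previous column of forward probabilities is kept, so memory use is proportional to N rather than to N times T.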
23
How to Decode an HMM-The Viterbi Algorithm
  • Instead of summing up probabilities from
    different paths coming to the same destination
    state, the Viterbi algorithm picks and remembers
    the best path.
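  • Define the best-path probability
    $\delta_i(t) = \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1}, q_t = s_i, o_1, \dots, o_t \mid \lambda)$

$\delta_i(t)$ is the probability of the most likely
state sequence at time $t$ that has generated the
observations $o_1, \dots, o_t$ (until time $t$) and ends
in state $s_i$.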
24
How to Decode an HMM-The Viterbi Algorithm
The computation is done in a time-synchronous
fashion from left to right. The complexity is
also $O(N^2 T)$.
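A minimal sketch of Viterbi decoding follows: it has the same structure as the forward recursion, with the sum replaced by a max and a table of backpointers for recovering the path. The parameters are hypothetical and the function name `viterbi` is an illustrative choice.

```python
# Hypothetical parameters, used only for illustration.
pi = [0.6, 0.4]                         # pi[i] = P(q_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = P(q_{t+1} = j | q_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[i][k] = P(o_t = k | q_t = i)


def viterbi(pi, A, B, obs):
    """Return (most likely state path, its joint probability with obs)."""
    n_states = len(pi)
    # Initialization: best probability of starting in i and emitting o_1.
    delta = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        # For each destination j, remember the best predecessor i.
        best_prev = [
            max(range(n_states), key=lambda i, j=j: delta[i] * A[i][j])
            for j in range(n_states)
        ]
        delta = [
            delta[best_prev[j]] * A[best_prev[j]][j] * B[j][o]
            for j in range(n_states)
        ]
        backpointers.append(best_prev)
    # Termination: pick the best final state, then follow backpointers.
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


best_path, best_prob = viterbi(pi, A, B, [0, 1, 2])
print(best_path, round(best_prob, 5))  # [0, 0, 1] 0.01512
```

For long sequences the products underflow, so practical implementations usually work with log probabilities; that refinement is left out here to keep the sketch close to the definition.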
25
HMM Training Using the Baum-Welch Algorithm
  • A Hidden Markov Model is a probabilistic model of
    the joint probability of a collection of random
    variables $O_1, \dots, O_T, Q_1, \dots, Q_T$. The $O_t$
    variables are discrete observations and the $Q_t$
    variables are hidden and discrete states. Under an
    HMM, two conditional independence assumptions are
    made:
  • 1. The t-th hidden variable, given the (t-1)-st
    hidden variable, is independent of previous
    variables, or
    $P(Q_t \mid Q_{t-1}, O_{t-1}, \dots, Q_1, O_1) = P(Q_t \mid Q_{t-1})$.
  • 2. The t-th observation depends only on the t-th
    state: conditioning on all other variables,
    $P(O_t \mid Q_T, O_T, \dots, Q_1, O_1) = P(O_t \mid Q_t)$.
  • The EM algorithm can be used to find the MLE of
    the parameters of an HMM given a set of observed
    feature vectors. This algorithm is also known as
    the Baum-Welch algorithm.
  • $Q_t$ is a discrete random variable with $N$ possible
    values $\{1, \dots, N\}$. We further assume that the
    underlying hidden Markov chain defined by
    $P(Q_t \mid Q_{t-1})$ is time-homogeneous (i.e., is
    independent of the time $t$). Therefore, we can
    represent $P(Q_t \mid Q_{t-1})$ as a time-independent
    stochastic transition matrix
    $A = \{a_{ij}\} = p(Q_t = j \mid Q_{t-1} = i)$.
  • The special case of time $t = 1$ is described by the
    initial state distribution $\pi_i = P(Q_1 = i)$. We say
    that we are in state $j$ at time $t$ if $Q_t = j$. A
    particular sequence of states is described by
    $q = (q_1, \dots, q_T)$, where $q_t \in \{1, \dots, N\}$ is
    the state at time $t$.
  • The observation is one of $L$ possible observation
    symbols, $O_t \in \{o_1, \dots, o_L\}$. The probability of
    a particular observation vector at a particular
    time $t$ for state $j$ is described by
    $b_j(o_t) = p(O_t = o_t \mid Q_t = j)$
    ($B = \{b_{ij}\}$ is an $L$ by $N$ matrix). A
    particular observation sequence $O$ is described as
    $O = (O_1 = o_1, \dots, O_T = o_T)$.

26
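  • Therefore, we can describe an HMM by
    $\lambda = (A, B, \pi)$. Given an observation $O$, the
    Baum-Welch algorithm finds
    $\lambda^* = \arg\max_{\lambda} P(O \mid \lambda)$, that is, the
    HMM $\lambda$ that maximizes the probability of the
    observation $O$.
  • The Baum-Welch algorithm
  • Initialization: set $\lambda = (A, B, \pi)$ with random
    initial conditions. The algorithm updates the
    parameters of $\lambda$ iteratively until convergence,
    following the procedure below.
  • The forward procedure: We define
    $\alpha_i(t) = p(O_1 = o_1, \dots, O_t = o_t, Q_t = i \mid \lambda)$,
    which is the probability of seeing the partial
    sequence $o_1, \dots, o_t$ and ending up in state $i$
    at time $t$. We can efficiently calculate
    $\alpha_i(t)$ recursively as
    $\alpha_i(1) = \pi_i b_i(o_1)$,
    $\alpha_j(t+1) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_i(t) a_{ij}$
  • The backward procedure: We define
    $\beta_i(t) = p(O_{t+1} = o_{t+1}, \dots, O_T = o_T \mid Q_t = i, \lambda)$.
    This is the probability of the ending partial
    sequence $o_{t+1}, \dots, o_T$ given that we started
    at state $i$ at time $t$. We can efficiently
    calculate $\beta_i(t)$ as
    $\beta_i(T) = 1$,
    $\beta_i(t) = \sum_{j=1}^{N} \beta_j(t+1) a_{ij} b_j(o_{t+1})$
  • Using $\alpha$ and $\beta$, we can calculate the
    following variables:
    $\gamma_i(t) = p(Q_t = i \mid O, \lambda) = \frac{\alpha_i(t) \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \beta_j(t)}$
    $\xi_{ij}(t) = p(Q_t = i, Q_{t+1} = j \mid O, \lambda) = \frac{\alpha_i(t) a_{ij} \beta_j(t+1) b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t) a_{ij} \beta_j(t+1) b_j(o_{t+1})}$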

27
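  • Having $\gamma$ and $\xi$, one can define update rules
    as follows:
    $\pi_i^* = \gamma_i(1)$
    $a_{ij}^* = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$
    $b_i^*(k) = \frac{\sum_{t=1}^{T} 1_{O_t = o_k} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$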
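One Baum-Welch iteration can be sketched as a forward pass, a backward pass, and the re-estimation of the parameters from the resulting state and transition posteriors. The sketch below is an illustration under stated assumptions, not a production trainer: it uses a single short observation sequence, hypothetical starting parameters, no scaling against underflow, and it stores B as states by symbols (the transpose of the L by N layout mentioned in the text). All function names are introduced here.

```python
def forward_backward(pi, A, B, obs):
    """Return the alpha and beta tables, indexed [time][state]."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # the last row stays at 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta


def baum_welch_step(pi, A, B, obs):
    """One EM update; also returns P(O | model) under the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(pi, A, B, obs)
    likelihood = sum(alpha[T - 1])
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting point and observation sequence (symbols 0, 1, 2).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0, 0, 2]

history = []
for _ in range(10):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    history.append(likelihood)
print(history[0], history[-1])
```

The recorded likelihoods should never decrease from one iteration to the next, which is the usual sanity check for an EM implementation. Real training would use many sequences and scaled or log-domain arithmetic.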

28
Toolkits for HMM
  • Hidden Markov Model Toolkit (HTK)
    http://htk.eng.cam.ac.uk/
  • Hidden Markov Model (HMM) Toolbox for Matlab
    http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html