Title: Combined Lecture CS621: Artificial Intelligence (lecture 19) CS626/449: Speech-NLP-Web/Topics-in-AI (lecture 20)
Hidden Markov Models
- Pushpak Bhattacharyya
- Computer Science and Engineering Department
- IIT Bombay
2. Example: Blocks World
- STRIPS: a planning system whose rules each have a precondition, a deletion list, and an addition list
[Diagram: a robot hand above the START and GOAL block configurations]
- START: on(B, table), on(A, table), on(C, A), handempty, clear(C), clear(B)
- GOAL: on(C, table), on(B, C), on(A, B), handempty, clear(A)
3. Rules
- R1: pickup(x)
  - Precondition and Deletion List: handempty, on(x, table), clear(x)
  - Add List: holding(x)
- R2: putdown(x)
  - Precondition and Deletion List: holding(x)
  - Add List: handempty, on(x, table), clear(x)
4. Rules (contd.)
- R3: stack(x, y)
  - Precondition and Deletion List: holding(x), clear(y)
  - Add List: on(x, y), clear(x), handempty
- R4: unstack(x, y)
  - Precondition and Deletion List: on(x, y), clear(x), handempty
  - Add List: holding(x), clear(y)
5. Plan for the blocks world problem
- For the given problem, Start → Goal can be achieved by the following sequence:
  - unstack(C, A)
  - putdown(C)
  - pickup(B)
  - stack(B, C)
  - pickup(A)
  - stack(A, B)
- Execution of a plan is achieved through a data structure called a Triangular Table.
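As a sketch, the rules and the plan above can be executed directly on a set-of-facts state representation (a minimal Python rendering of STRIPS rule application, not of the Triangular Table machinery):

```python
# Minimal STRIPS-style rule application: each rule is a pair
# (precondition-and-deletion set, addition set) over ground facts.
def pickup(x):
    return ({'handempty', f'on({x},table)', f'clear({x})'},
            {f'holding({x})'})

def putdown(x):
    return ({f'holding({x})'},
            {'handempty', f'on({x},table)', f'clear({x})'})

def stack(x, y):
    return ({f'holding({x})', f'clear({y})'},
            {f'on({x},{y})', f'clear({x})', 'handempty'})

def unstack(x, y):
    return ({f'on({x},{y})', f'clear({x})', 'handempty'},
            {f'holding({x})', f'clear({y})'})

def apply_rule(state, rule):
    pre_and_delete, add = rule
    # the precondition (= deletion list here) must hold in the state
    assert pre_and_delete <= state, 'precondition not satisfied'
    return (state - pre_and_delete) | add

state = {'on(B,table)', 'on(A,table)', 'on(C,A)',
         'handempty', 'clear(C)', 'clear(B)'}
plan = [unstack('C', 'A'), putdown('C'), pickup('B'),
        stack('B', 'C'), pickup('A'), stack('A', 'B')]
for rule in plan:
    state = apply_rule(state, rule)
# state now equals the GOAL: on(C,table), on(B,C), on(A,B), handempty, clear(A)
```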
6. Why Probability?
- (Discussion based on the book Automated Planning by Dana Nau)
7. Motivation
- In many situations, actions may have more than one possible outcome:
  - Action failures (e.g., the gripper drops its load)
  - Exogenous events (e.g., a road is closed)
- We would like to be able to plan in such situations
- One approach: Markov Decision Processes
[Diagram: the action "grasp block c" may lead to the intended outcome (c lifted off a and b) or an unintended outcome (c dropped)]
8. Stochastic Systems
- Stochastic system: a triple Σ = (S, A, P)
  - S: finite set of states
  - A: finite set of actions
  - P_a(s' | s): probability of going to s' if we execute a in s
  - Σ_{s' ∈ S} P_a(s' | s) = 1
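The definition above can be sketched as nested transition tables; the action and state names below are illustrative placeholders, not taken from the robot example:

```python
# A stochastic system's P as nested tables: P[a][s][s'] = P_a(s' | s).
# Action 'move' and states 's1', 's2' are hypothetical placeholders.
P = {
    'move': {
        's1': {'s1': 0.2, 's2': 0.8},   # executing 'move' in s1 may fail and stay put
        's2': {'s2': 1.0},
    },
}

def is_valid(P):
    """Check the normalization constraint: sum over s' of P_a(s' | s) = 1."""
    return all(abs(sum(dist.values()) - 1.0) < 1e-9
               for table in P.values()
               for dist in table.values())
```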
9. Example
- Robot r1 starts at location l1 (state s1 in the diagram)
- Objective: get r1 to location l4 (state s4 in the diagram)
[Diagram: state space with Start = s1 and Goal = s4]
10. Example (contd.)
- No classical plan (sequence of actions) can be a solution, because we can't guarantee we'll be in a state where the next action is applicable
- e.g., p = ⟨move(r1, l1, l2), move(r1, l2, l3), move(r1, l3, l4)⟩
[Diagram: state space with Start = s1 and Goal = s4]
11. Another Example
A colored-ball choosing example:
- Urn 1: 30% Red, 50% Green, 20% Blue
- Urn 2: 10% Red, 40% Green, 50% Blue
- Urn 3: 60% Red, 10% Green, 30% Blue

Probability of transition to another urn after picking a ball:

       U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3
12. Example (contd.)

Given the emission probabilities:

       R     G     B
U1     0.3   0.5   0.2
U2     0.1   0.4   0.5
U3     0.6   0.1   0.3

and the transition probabilities:

       U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3

Observation: RRGGBRGR
State sequence: ??
Not so easily computable.
13. Example (contd.)
- Here:
  - S = {U1, U2, U3}
  - V = {R, G, B}
  - Observation O = o1 ... on
  - State sequence Q = q1 ... qn
  - A = the transition matrix, B = the emission matrix (both as on the previous slide)
  - π = the initial state probability distribution
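For a fixed state sequence, P(O, Q | λ) is just a product of initial, transition, and emission terms. A sketch on the urn model (π assumed uniform, since the slides leave it unspecified):

```python
# Joint probability P(O, Q | lambda) of an observation sequence and a
# candidate state sequence for the urn HMM.  pi is assumed uniform.
A = {'U1': {'U1': 0.1, 'U2': 0.4, 'U3': 0.5},
     'U2': {'U1': 0.6, 'U2': 0.2, 'U3': 0.2},
     'U3': {'U1': 0.3, 'U2': 0.4, 'U3': 0.3}}
B = {'U1': {'R': 0.3, 'G': 0.5, 'B': 0.2},
     'U2': {'R': 0.1, 'G': 0.4, 'B': 0.5},
     'U3': {'R': 0.6, 'G': 0.1, 'B': 0.3}}
pi = {'U1': 1/3, 'U2': 1/3, 'U3': 1/3}

def joint_prob(states, obs):
    # pi(q1) * b(q1, o1) * prod_t a(q_{t-1}, q_t) * b(q_t, o_t)
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p
```

Finding the best Q for RRGGBRGR would require comparing 3^8 = 6561 such products, which is why the state sequence is "not so easily computable" by hand.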
14. Hidden Markov Models

15. Model Definition
- Set of states S, where |S| = N
- Output alphabet V
- Transition probabilities A = {a_ij}
- Emission probabilities B = {b_j(o_k)}
- Initial state probabilities π
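The five ingredients can be collected in a small container; the urn example instantiates it (π assumed uniform, as the slides leave it unspecified):

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: list     # S, with |S| = N
    alphabet: list   # V
    A: list          # A[i][j] = a_ij = P(q_{t+1} = s_j | q_t = s_i)
    B: list          # B[j][k] = b_j(o_k), emission probabilities
    pi: list         # pi[i] = P(q_1 = s_i)

urn = HMM(states=['U1', 'U2', 'U3'],
          alphabet=['R', 'G', 'B'],
          A=[[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]],
          B=[[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]],
          pi=[1/3, 1/3, 1/3])   # assumed uniform start

# sanity check: every row of A and B is a probability distribution
assert all(abs(sum(row) - 1.0) < 1e-9 for row in urn.A + urn.B)
```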
16. Markov Processes
- Properties:
  - Limited Horizon: given the previous n states, the state at time t is independent of the earlier history:
    P(X_t = i | X_{t-1}, X_{t-2}, ..., X_0) = P(X_t = i | X_{t-1}, ..., X_{t-n})
  - Time invariance: transition probabilities do not depend on t:
    P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = P(X_n = i | X_{n-1} = j)
17. Three Basic Problems of HMM
- Problem 1: Given an observation sequence O = o1 ... oT, efficiently estimate P(O | λ)
- Problem 2: Given an observation sequence O = o1 ... oT, find the best state sequence Q = q1 ... qT, i.e., maximize P(Q | O, λ)
- Problem 3: How to adjust the model parameters λ to maximize P(O | λ)? Re-estimate λ
18. Three basic problems (contd.)
- Problem 1: Likelihood of a sequence
  - Forward Procedure
  - Backward Procedure
- Problem 2: Best state sequence
  - Viterbi Algorithm
- Problem 3: Re-estimation
  - Baum-Welch (Forward-Backward) Algorithm
19. Problem 2
- Given an observation sequence O = o1 ... oT, find the best state sequence Q = q1 ... qT, i.e., maximize P(Q | O, λ)
- Solution:
  - Best state individually likely at a position i
  - Best state given all the previously observed states and observations: the Viterbi Algorithm
20. Example
- Output observed: aabb
- Which state sequence is most probable? Since the state sequence cannot be predicted with certainty, the machine is given the qualification "hidden".
- Note: Σ P(out-links) = 1 for all states
21. Probabilities for different possible sequences
[Table: joint probabilities of candidate state sequences, e.g., P(1) = 1, P(1, 2) = 0.15, and so on]
22. Viterbi for higher-order HMMs
- If P(s_i | s_{i-1}, s_{i-2}) (an order-2 HMM), then the Markovian assumption takes effect only after two levels
- (Generalizing: for an order-n HMM, after n levels)
23. Forward and Backward Probability Calculation

24. A Simple HMM
[State diagram: two states s1 and s2; symbols are emitted on the arcs, each arc labelled with its symbol and probability]
- s1 → s1: a 0.4, b 0.2
- s1 → s2: a 0.3, b 0.1
- s2 → s1: a 0.2, b 0.1
- s2 → s2: a 0.2, b 0.5
25. Forward or α-probabilities
- Let α_i(t) be the probability of producing w_{1,t-1} while ending up in state s_i:
  α_i(t) = P(w_{1,t-1}, S_t = s_i), t > 1
26. Initial condition on α_i(t)
  α_i(1) = 1.0 if i = 1
           0   otherwise
27. Probability of the observation using α_i(t)
  P(w_{1,n}) = Σ_{i=1}^{s} P(w_{1,n}, S_{n+1} = s_i)
             = Σ_{i=1}^{s} α_i(n+1)
  where s is the total number of states
28. Recursive expression for α
  α_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)
           = Σ_{i=1}^{s} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)
           = Σ_{i=1}^{s} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | w_{1,t-1}, S_t = s_i)
           = Σ_{i=1}^{s} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | S_t = s_i)
           = Σ_{i=1}^{s} α_i(t) · P(w_t, S_{t+1} = s_j | S_t = s_i)
29. The forward probabilities of "bbba"

Time tick        1      2      3      4      5
Input so far     ε      b      bb     bbb    bbba
α_1(t)           1.0    0.2    0.05   0.017  0.0148
α_2(t)           0.0    0.1    0.07   0.04   0.0131
P(w_{1,t-1})     1.0    0.3    0.12   0.057  0.0279
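The table can be reproduced by coding the α-recursion directly for this arc-emission HMM (symbols emitted on transitions, process starting in s1, arc probabilities as read off the diagram):

```python
# Forward (alpha) probabilities for the two-state arc-emission HMM.
# P[sym][i][j] = probability of taking arc i -> j while emitting sym.
P = {
    'a': [[0.4, 0.3], [0.2, 0.2]],
    'b': [[0.2, 0.1], [0.1, 0.5]],
}

def forward(obs):
    alpha = [1.0, 0.0]          # alpha_i(1): the machine starts in s1
    history = [alpha]
    for sym in obs:
        # alpha_j(t+1) = sum_i alpha_i(t) * P(sym, i -> j)
        alpha = [sum(alpha[i] * P[sym][i][j] for i in range(2))
                 for j in range(2)]
        history.append(alpha)
    return history

hist = forward("bbba")
# hist[-1] is [0.0148, 0.0131] (up to float rounding); their sum, 0.0279,
# is P(bbba), matching the last column of the table.
```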
30. Backward or β-probabilities
- Let β_i(t) be the probability of seeing w_{t,n}, given that the state of the HMM at time t is s_i:
  β_i(t) = P(w_{t,n} | S_t = s_i)
31. Probability of the observation using β
  P(w_{1,n}) = β_1(1), since the HMM starts in state s_1
32. Recursive expression for β
  β_i(t-1) = P(w_{t-1,n} | S_{t-1} = s_i)
           = Σ_{j=1}^{s} P(w_{t-1,n}, S_t = s_j | S_{t-1} = s_i)
           = Σ_{j=1}^{s} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · P(w_{t,n} | w_{t-1}, S_t = s_j, S_{t-1} = s_i)
           = Σ_{j=1}^{s} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · P(w_{t,n} | S_t = s_j)   (by the Markov assumption)
           = Σ_{j=1}^{s} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · β_j(t)
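The β-recursion for the same two-state machine is the mirror image, sweeping right to left; since the machine starts in s1, β_1(1) reproduces the P(bbba) = 0.0279 from the forward table:

```python
# Backward (beta) probabilities for the same two-state arc-emission HMM.
P = {
    'a': [[0.4, 0.3], [0.2, 0.2]],
    'b': [[0.2, 0.1], [0.1, 0.5]],
}

def backward(obs):
    beta = [1.0, 1.0]           # beta_i(n+1) = 1 for all i
    for sym in reversed(obs):
        # beta_i(t-1) = sum_j P(sym, i -> j) * beta_j(t)
        beta = [sum(P[sym][i][j] * beta[j] for j in range(2))
                for i in range(2)]
    return beta

beta1 = backward("bbba")
# since the HMM starts in s1, P(bbba) = beta1[0], i.e. 0.0279
```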
33. Problem 1 of the three basic problems
- Direct computation: sum the joint probability P(O, Q | λ) over all possible state sequences Q:
  P(O | λ) = Σ_Q π_{q1} · b_{q1}(o1) · a_{q1 q2} · b_{q2}(o2) · ... · a_{q(T-1) qT} · b_{qT}(oT)
34. Problem 1 (contd.)
- Order: 2T · N^T operations
- Definitely not efficient!
- Is there a method to tackle this problem? Yes:
- the Forward or Backward Procedure
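The gap between the naive enumeration and the forward procedure can be checked by computing P(O | λ) both ways on the urn model (π assumed uniform; a short observation keeps the N^T enumeration feasible):

```python
from itertools import product

A = [[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]
B = [[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]]
pi = [1/3, 1/3, 1/3]    # assumed uniform initial distribution
N = 3

def brute_force(obs):
    """Direct computation: sum P(O, Q) over all N**T state sequences Q."""
    total = 0.0
    for q in product(range(N), repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

def forward(obs):
    """O(N^2 T) forward procedure for the same quantity."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
    return sum(alpha)

obs = [0, 0, 1, 2]      # R R G B, with symbols coded 0=R, 1=G, 2=B
# both methods agree on P(O | lambda)
```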
35. Forward Procedure
- Forward step

36. Forward Procedure

37. Backward Procedure

38. Backward Procedure

39. Forward-Backward Procedure
- Benefit:
- Order N²T, as compared to 2T · N^T for the direct computation
- Only the Forward or the Backward procedure is needed for Problem 1
40. Problem 2
- Given an observation sequence O = o1 ... oT, find the best state sequence Q = q1 ... qT, i.e., maximize P(Q | O, λ)
- Solution:
  - Best state individually likely at a position i
  - Best state given all the previously observed states and observations: the Viterbi Algorithm
41. Viterbi Algorithm
- Define δ_j(t) = max over q1 ... q(t-1) of P(q1 ... q(t-1), q_t = s_j, o1 ... o_t | λ), i.e., the state sequence which has the best joint probability so far
- By induction, we have:
  δ_j(t+1) = [max_i δ_i(t) · a_ij] · b_j(o_{t+1})

42. Viterbi Algorithm

43. Viterbi Algorithm
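A direct implementation of the δ-recursion with backpointers, sketched on the urn model (π assumed uniform), recovers the most probable state sequence for RRGGBRGR:

```python
# Viterbi for the urn HMM: best state sequence for an observation.
A = [[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]]
B = [[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]]
pi = [1/3, 1/3, 1/3]            # assumed uniform start
SYM = {'R': 0, 'G': 1, 'B': 2}
N = 3

def viterbi(observation):
    obs = [SYM[c] for c in observation]
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # delta_j(1)
    back = []                                          # backpointers
    for t in range(1, len(obs)):
        step, new = [], []
        for j in range(N):
            # delta_j(t+1) = [max_i delta_i(t) * a_ij] * b_j(o_{t+1})
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            new.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
            step.append(best_i)
        delta = new
        back.append(step)
    # backtrack from the best final state
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return ['U%d' % (i + 1) for i in path], max(delta)

path, p = viterbi("RRGGBRGR")
```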
44. Problem 3
- How to adjust λ = (A, B, π) to maximize P(O | λ)?
- Re-estimate λ
- Solution:
  - Re-estimate (iteratively update and improve) the HMM parameters A, B, π
  - Use the Baum-Welch algorithm
45. Baum-Welch Algorithm
- Define the re-estimation quantities by putting the forward and backward variables together

46. Baum-Welch Algorithm (contd.)

47. Baum-Welch Algorithm (contd.)
- Define ξ_t(i, j) = α_i(t) · a_ij · b_j(o_{t+1}) · β_j(t+1) / P(O | λ), the expected count of the transition s_i → s_j at time t
- Then, expected number of transitions from S_i = Σ_t Σ_j ξ_t(i, j)
- And, expected number of transitions from S_i to S_j = Σ_t ξ_t(i, j)
49. Baum-Welch Algorithm
- Baum et al. have proved that the above re-estimation equations lead to a model as good as or better than the previous one
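One re-estimation step can be sketched with NumPy, using the standard state-emission formulation on the urn model with an assumed uniform π; the new transition matrix is the ratio of expected transition counts to expected visit counts:

```python
import numpy as np

# One Baum-Welch re-estimation step for the urn HMM (state-emission form).
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])            # transitions U1..U3
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])            # emissions of R, G, B
pi = np.array([1/3, 1/3, 1/3])             # assumed uniform start
obs = [0, 0, 1, 1, 2, 0, 1, 0]             # R R G G B R G R
T, N = len(obs), 3

# forward and backward variables
alpha = np.zeros((T, N))
beta = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

PO = alpha[-1].sum()                       # P(O | lambda)

# xi[t, i, j] = alpha_i(t) a_ij b_j(o_{t+1}) beta_j(t+1) / P(O | lambda)
xi = np.array([np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / PO
               for t in range(T - 1)])
# gamma[t, i]: probability of being in state i at time t
gamma = alpha * beta / PO

# re-estimated A: expected i->j transitions over expected visits to i
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
```

Each row of A_new still sums to 1, since summing ξ_t(i, j) over j gives exactly γ_t(i); iterating this update (together with the analogous ones for B and π) is the Baum-Welch loop.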