Title: HMM and n-gram tagger (Recap)
1 HMM and n-gram tagger (Recap)
2 HMM
- Two types of HMM
  - Arc-emission HMM
  - State-emission HMM
- The two types are equivalent.
- We normally use state-emission HMMs to build n-gram taggers.
3 Definition of state-emission HMM
- An HMM is a tuple (S, Σ, π, A, B):
  - A set of states S = {s1, s2, ..., sN}
  - A set of output symbols Σ = {w1, ..., wM}
  - Initial state probabilities π = {π_i}
  - Transition prob A = {a_ij}
  - Emission prob B = {b_jk}
- We use si and wk to refer to what is in an HMM structure.
- We use Xi and Oi to refer to what is in a particular HMM path and its output.
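As a concrete illustration (not from the slides), the tuple can be stored directly; the class and field names are assumptions:

```python
import numpy as np

class StateEmissionHMM:
    def __init__(self, states, symbols, pi, A, B):
        self.states = states      # [s1, ..., sN]
        self.symbols = symbols    # [w1, ..., wM]
        self.pi = np.asarray(pi)  # initial state probs, shape (N,)
        self.A = np.asarray(A)    # A[i, j] = P(sj | si), shape (N, N)
        self.B = np.asarray(B)    # B[j, k] = P(wk | sj), shape (N, M)
```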
4 An HMM structure
[Figure: an HMM structure with states s1, s2, ..., sN, each emitting output symbols such as w1, w2, w3, w5]
- Two kinds of parameters:
  - Transition probability P(sj | si)
  - Emission probability P(wk | si)
- # of parameters: O(NM + N^2)
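As an illustrative calculation (the numbers are assumed, not from the slides): with N = 45 tags (the Penn Treebank tagset) and M = 50,000 words, that is about 45 x 50,000 = 2,250,000 emission parameters plus 45^2 = 2,025 transition parameters.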
5 A path for an output sequence
- State sequence X1,n+1
- Output sequence O1,n
6 Definition of arc-emission HMM
- An HMM is a tuple (S, Σ, π, A, B):
  - A set of states S = {s1, s2, ..., sN}
  - A set of output symbols Σ = {w1, ..., wM}
  - Initial state probabilities π = {π_i}
  - Transition prob A = {a_ij}
  - Emission prob B = {b_ijk}
- We use si and wk to refer to what is in an HMM structure.
- We use Xi and Oi to refer to what is in a particular HMM path and its output.
7 Arc-emission vs. state-emission
- In an arc-emission HMM, the output symbol is emitted on a transition, so the emission prob b_ijk = P(wk | si, sj) depends on both states; in a state-emission HMM, it depends on a single state: b_jk = P(wk | sj).
8 Three fundamental questions for HMMs
- To train an HMM: learn the transition and emission probabilities
- To find the best state sequence for a given observation
- To compute the probability of a given observation
9 Training an HMM: estimating the probabilities
- Supervised learning
  - The state sequences in the training data are known
  - ML estimation by simple counting (see the sketch below)
- Unsupervised learning
  - The state sequences in the training data are unknown
  - The forward-backward algorithm
  - More in later slides
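A minimal sketch of the supervised case for a bigram tagger: relative-frequency (ML) estimates from counts. The function name and the BOS/EOS symbols are illustrative choices, not the slides' exact setup.

```python
from collections import defaultdict

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def train_bigram_tagger(tagged_sentences):
    """tagged_sentences: a list of [(word, tag), ...] lists."""
    trans_count = defaultdict(lambda: defaultdict(int))
    emit_count = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "BOS"
        for word, tag in sent:
            trans_count[prev][tag] += 1   # count(t_{i-1}, t_i)
            emit_count[tag][word] += 1    # count(t_i, w_i)
            prev = tag
        trans_count[prev]["EOS"] += 1     # close off the sentence
    # ML estimation by simple counting: relative frequencies.
    trans_prob = {t1: normalize(c) for t1, c in trans_count.items()}
    emit_prob = {t: normalize(c) for t, c in emit_count.items()}
    return trans_prob, emit_prob
```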
10 HMM as a parser: finding the best state sequence
- Given the observation O1,T = o1 ... oT, find the state sequence X1,T+1 = X1 ... XT+1 that maximizes P(X1,T+1 | O1,T).
- => the Viterbi algorithm
11 Viterbi algorithm
- δ_j(t): the probability of the best path that produces O1,t-1 while ending up in state sj:
  δ_j(t) = max over X1,t-1 of P(X1,t-1, O1,t-1, X_t = sj)
- Initialization: δ_j(1) = π_j
- Induction: δ_j(t+1) = max_i [δ_i(t) * a_ij * b_j(o_t)], keeping a back-pointer to the maximizing i
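A minimal Viterbi sketch, using the common convention that δ includes the emission of o_t at time t (a slight reindexing of the slides' O1,t-1 definition); the names are illustrative.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: (N,) initial probs; A[i, j] = P(sj | si); B[j, k] = P(wk | sj);
    obs: a list of output-symbol indices. Returns (best path, its prob)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # delta[t, j]: best-path prob ending in j
    back = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * a_ij * b_j(o_t)
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]    # best final state
    for t in range(T - 1, 0, -1):       # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```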
12 HMM as an LM: computing P(o1, ..., oT)
13 Definition of the forward probability
- α_i(t): the probability of producing O1,t-1 while ending up in state si:
  α_i(t) = P(O1,t-1, X_t = si)
14 Calculating forward probability
- Initialization: α_i(1) = π_i
- Induction: α_j(t+1) = sum_i α_i(t) * a_ij * b_j(o_t)
- P(O1,T) = sum_i α_i(T+1)
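A forward-probability sketch in the same setup and conventions as the viterbi() sketch above: the identical recursion, with sum in place of max.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Returns P(o_1 ... o_T) under the HMM (pi, A, B)."""
    alpha = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        # alpha_j <- sum_i alpha_i * a_ij * b_j(o_t)
        alpha = (alpha[:, None] * A * B[:, obs[t]][None, :]).sum(axis=0)
    return float(alpha.sum())
```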
15 HMM Summary
- Definition: hidden states, output symbols
- Two types of HMMs
- Three basic questions in HMM
  - Estimating the probabilities: MLE
  - Finding the best sequence: the Viterbi algorithm
  - Finding the probability of an observation: the forward probability
16 N-gram POS tagger
17 N-gram POS tagger
18 N-gram POS tagger (cont)
- Bigram model: P(w1,n, t1,n) ≈ Π_i P(ti | ti-1) * P(wi | ti)
- Trigram model: P(w1,n, t1,n) ≈ Π_i P(ti | ti-2, ti-1) * P(wi | ti)
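As an illustration of the bigram model (reusing the assumed names from train_bigram_tagger() above), the joint probability of a tagged sentence is the product of transition and emission probabilities:

```python
def score_bigram(trans_prob, emit_prob, tagged_sent):
    """P(words, tags) for tagged_sent = [(word, tag), ...]."""
    p, prev = 1.0, "BOS"
    for word, tag in tagged_sent:
        p *= (trans_prob.get(prev, {}).get(tag, 0.0)    # P(t_i | t_{i-1})
              * emit_prob.get(tag, {}).get(word, 0.0))  # P(w_i | t_i)
        prev = tag
    return p * trans_prob.get(prev, {}).get("EOS", 0.0)
```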
19 The bigram tagger
- States: each state corresponds to a POS tag, plus a state for BOS (and a state for EOS)
- Output symbols: each output symbol is a word, <s>, or </s>
- Initial probability: π
- Transition probability: a_ij = P(sj | si)
- Emission probability: b_jk = P(wk | sj)
20 The bigram tagger (cont)
21 The trigram tagger
- States: each state corresponds to a tag pair; a tag is a POS tag, BOS, or EOS
- Output symbols: words, <s>, </s>
- Initial probability: π
- Transition probability (see the sketch below):
  - a_ij = P(t3 | t1, t2), where si = (t1, t2) and sj = (t2, t3)
  - a_ij = 0, where si = (t1, t2') and sj = (t2, t3) with t2' != t2
- Emission probability:
  - b_jk = P(wk | t2), where sj = (t1, t2), for any t1
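A sketch of the trigram tagger's state space (the names are illustrative): states are tag pairs, and a transition gets probability mass only when the second tag of the source pair matches the first tag of the target pair.

```python
from itertools import product

def trigram_states_and_transitions(tags, trigram_prob):
    """trigram_prob: a dict mapping (t1, t2, t3) -> P(t3 | t1, t2)."""
    states = list(product(tags, tags))   # si = (t1, t2)
    trans = {}
    for t1, t2 in states:
        for t2b, t3 in states:
            if t2b == t2:                # the overlap constraint
                trans[((t1, t2), (t2b, t3))] = trigram_prob.get((t1, t2, t3), 0.0)
            # all other state pairs implicitly have probability 0
    return states, trans
```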
22 The trigram tagger (cont)
23 Training an n-gram tagger
24 Estimating the probability
25 Smoothing
- To handle unseen tag sequences
  => smooth the transition prob
- To handle unknown words
  => smooth the emission prob
- To handle unseen (word, tag) pairs, where both the word and the tag are known
  - There is a very low percentage of such pairs (e.g., 0.44%) in the PTB.
26 Handling unseen tag sequences
- Ex: to smooth P(t3 | t1, t2) for a trigram tagger.
- Can we use Good-Turing (GT) smoothing for this?
- How about interpolation?
  P_smooth(t3 | t1, t2) = λ1 * P(t3 | t1, t2) + λ2 * P(t3 | t2) + λ3 * P(t3), with λ1 + λ2 + λ3 = 1
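A minimal interpolation sketch for the formula above. The λ values here are placeholders; in practice the weights are tuned on held-out data (e.g., by deleted interpolation).

```python
def smoothed_trigram(t1, t2, t3, p_tri, p_bi, p_uni,
                     lambdas=(0.6, 0.3, 0.1)):
    """Interpolated P(t3 | t1, t2); the lambda values are placeholders."""
    l1, l2, l3 = lambdas   # must sum to 1
    return (l1 * p_tri.get((t1, t2, t3), 0.0)   # P(t3 | t1, t2)
            + l2 * p_bi.get((t2, t3), 0.0)      # P(t3 | t2)
            + l3 * p_uni.get(t3, 0.0))          # P(t3)
```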
27 Handling unknown words
- Introduce a new output symbol <unk>
- Estimate P(<unk> | t) for each tag t
  - Ex: split the training data into two sets; create the vocabulary from set1, and estimate P(<unk> | t) from set2.
- Add P(<unk> | t) to the emission prob and renormalize so that sum_w P(w | t) = 1.
  - Ex: keep P(<unk> | t) the same, and scale the other P(w | t) so that they sum to 1 - P(<unk> | t).
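A sketch of the two-way-split estimate of P(<unk> | t) described above (the function name is illustrative): build the vocabulary from set1, then count how often each tag appears with an out-of-vocabulary word in set2.

```python
from collections import defaultdict

def estimate_unk_prob(set1, set2):
    """set1, set2: lists of tagged sentences [(word, tag), ...]."""
    vocab = {w for sent in set1 for w, _ in sent}   # vocabulary from set1
    tag_count = defaultdict(int)
    unk_count = defaultdict(int)
    for sent in set2:
        for word, tag in sent:
            tag_count[tag] += 1
            if word not in vocab:                   # OOV under set1's vocabulary
                unk_count[tag] += 1
    return {t: unk_count[t] / tag_count[t] for t in tag_count}
```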