1
Confidence Measure using Word Graphs
Sridhar Raghavan
2
Abstract
  • Confidence measures using word posterior probabilities.
  • There is a strong need to determine the confidence of a word
    hypothesis in an LVCSR system, because conventional Viterbi
    decoding only generates the overall one-best sequence, while the
    performance of a speech recognition system is measured by word
    error rate rather than sentence error rate.
  • The posterior probability of a word in a hypothesis is a good
    estimate of its confidence.
  • The word posteriors can be computed from a word graph where the
    links correspond to the words.
  • A forward-backward algorithm is used to compute the link
    posteriors.

3
  • Foundation

The equation for computing the posterior of a word is given below
[Wessel et al.]. The idea is to sum up the posterior probabilities of
all word hypothesis sequences that contain the word w with the same
start and end times.
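The equation itself did not survive the transcript; in the notation of
Wessel et al., writing [w; t_a, t_e] for word w with start time t_a and
end time t_e, and x_1^T for the acoustic observations, the posterior
described here is

    p([w; t_a, t_e] \mid x_1^T) \;=\; \sum_{W:\, [w; t_a, t_e] \in W} p(W \mid x_1^T)

i.e. the sum of the posteriors of all word sequences W through the
graph that contain the link [w; t_a, t_e].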
4
  • Foundation (continued)

We cannot compute the above posterior directly, so we decompose it
into a likelihood and priors using Bayes' rule. The value in the
numerator can be computed using the well-known forward-backward
algorithm. The denominator is simply the sum of the numerator over all
words w' occurring in the same time interval t_a to t_e.
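Written out, the decomposition described here is (a reconstruction in
the same notation; the slide's own rendering of the equation is
missing):

    p([w; t_a, t_e] \mid x_1^T)
      \;=\; \frac{\sum_{W:\, [w; t_a, t_e] \in W} p(x_1^T \mid W)\, p(W)}
                 {\sum_{w'} \sum_{W':\, [w'; t_a, t_e] \in W'} p(x_1^T \mid W')\, p(W')}

with the numerator computed by the forward-backward algorithm and the
denominator summing the same quantity over all words w' occurring in
the interval t_a to t_e.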
5
  • What exactly is a word posterior from a word graph?

A word posterior is a probability that is computed by considering a
word's acoustic score, its language model score, and its presence in a
particular path through the word graph. An example of a word graph is
given below; note that the nodes are the start/stop times and the
links are the words. The goal is to determine the link posterior
probabilities. Every link holds an acoustic score and a language model
probability.
[Figure: example word graph for an utterance like "this is a test
sentence", with confusable alternatives such as "quest"/"guest"/"test"
and "sense"/"sentence"; nodes are start/stop times and links are
words.]
6
  • Example

Let us consider the example shown below; the values on the links are
the likelihoods.
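As a concrete illustration, here is a minimal sketch of such a word
graph in Python. The names WordLink and WordGraph and the adjacency
helpers are assumptions made for this illustration; they do not come
from the original slides. The forward-backward sketches later in the
deck operate on this structure.

    # A minimal word-graph representation (illustrative names, not from
    # the slides): nodes are start/stop times, links carry a word, an
    # acoustic score and a language model probability.
    from dataclasses import dataclass, field

    @dataclass
    class WordLink:
        start: int        # start node (start time) of the word
        end: int          # end node (end time) of the word
        word: str
        acoustic: float   # acoustic likelihood, e.g. 3/6 in the example
        lm: float = 1.0   # language model probability of the word

    @dataclass
    class WordGraph:
        links: list = field(default_factory=list)

        def outgoing(self, node):
            return [l for l in self.links if l.start == node]

        def incoming(self, node):
            return [l for l in self.links if l.end == node]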
7
  • Forward-backward algorithm

We use the forward-backward algorithm to determine the link
probabilities. The equations used to compute the alphas and betas for
an HMM are as follows.
Computing alphas, Step 1: Initialization. In a conventional HMM
forward-backward algorithm we would perform the following:
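The equation did not survive the transcript; the standard HMM forward
initialization, which is presumably what the slide showed, is

    \alpha_1(i) = \pi_i \, b_i(o_1), \qquad 1 \le i \le N

where \pi_i is the initial probability of state i and b_i(o_1) the
emission probability of the first observation.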
We need a slightly modified version of this equation for processing a
word graph: the emission probability is replaced by the acoustic score
of the word on the link, and the initial probability is taken as 1
since we always begin with silence.
8
  • Forward-backward algorithm (continued)

The α for the first node in the word graph is computed as follows:
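This equation is also missing from the transcript. A plausible
reconstruction (an assumption, but consistent with the prose above and
with the α = 1, β = 2.8843E-14 pair shown for the first node in the
later word-graph figure) is simply

    \alpha(n_1) = 1

since the initial probability is taken as 1 and the graph always
begins with silence.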
Step 2: Induction
This step is the main reason we use the forward-backward algorithm for
computing such probabilities. The alpha values computed in the
previous step are used to compute the alphas of the succeeding nodes.
Note: unlike in HMMs, where we move from left to right at fixed
intervals of time, here we move from one word's start time to the next
closest word's start time.
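The induction equation is also missing from the transcript; the
standard HMM form, and a word-graph analogue consistent with the
description above (the latter is a hedged reconstruction, not the
slide's exact formula), are

    \alpha_t(j) = \Big[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \Big]\, b_j(o_t)

    \alpha(n) = \sum_{\ell = (m \to n)} \alpha(m)\; p(w_\ell)\; p(x_\ell \mid w_\ell)

where the second sum runs over all links ℓ entering node n, p(w_ℓ) is
the language model probability of the word on the link and
p(x_ℓ | w_ℓ) its acoustic score.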
9
  • Forward-backward algorithm (continued)

Let us look at the computation of the alphas from node 2 onwards; the
alpha for node 1 was computed in the previous step during
initialization.
[Figure: fragment of the word graph showing the alpha computation for
nodes 2, 3 and 4. The values shown include α = 1, α = 0.5, α = 0.5025
and α = 1.675E-03, with link likelihoods such as 2/6, 3/6 and 4/6 on
links labelled "Sil", "this" and "is".]
The alpha calculation continues in this manner for all the remaining
nodes. The forward-backward calculation on word graphs is similar to
the calculation used on HMMs, but in word graphs the transition matrix
is populated by the language model probabilities and the emission
probability corresponds to the acoustic score. A forward-pass sketch
is given below.
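A minimal sketch of this forward pass, assuming the WordGraph
structure sketched earlier and node ids listed in topological order
(word-graph nodes ordered by start time satisfy this). It is an
illustration of the computation described above, not the author's
code.

    # Forward pass over a word graph: alpha(n) accumulates, over all
    # partial paths reaching node n, the product of the language model
    # probabilities and acoustic scores of the links on the path.
    def forward(graph, nodes_in_topological_order, start_node):
        alpha = {n: 0.0 for n in nodes_in_topological_order}
        alpha[start_node] = 1.0   # initial probability 1: we begin with silence
        for n in nodes_in_topological_order:
            for link in graph.outgoing(n):
                alpha[link.end] += alpha[n] * link.lm * link.acoustic
        return alpha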
10
  • Forward-backward algorithm (continued)

Once we have computed the alphas using the forward algorithm, we begin
the beta computation using the backward algorithm. The backward
algorithm is similar to the forward algorithm, but we start from the
last node and proceed from right to left.
Step 1: Initialization
Step 2: Induction
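The initialization and induction equations did not survive the
transcript; the standard HMM forms are

    \beta_T(i) = 1, \qquad 1 \le i \le N

    \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)

On the word graph the backward pass mirrors the forward sketch above
but walks the incoming links from the final node; a minimal sketch
under the same assumptions:

    # Backward pass over a word graph: beta(n) accumulates the
    # probability of completing a path from node n to the final node.
    def backward(graph, nodes_in_topological_order, final_node):
        beta = {n: 0.0 for n in nodes_in_topological_order}
        beta[final_node] = 1.0    # initialization at the last node
        for n in reversed(nodes_in_topological_order):
            for link in graph.incoming(n):
                beta[link.start] += beta[n] * link.lm * link.acoustic
        return beta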
11
  • Forward-backward algorithm (continued)

Let us look at the computation of the beta values from node 14
backwards.
[Figure: fragment of the word graph around nodes 11-15 showing the
beta computation. β = 1 at the final node (node 15), and the values
shown for nodes 14, 13 and 12 include β = 0.1667, β = 1.66E-3,
β = 0.833 and β = 5.55E-3, with link likelihoods such as 1/6, 4/6 and
5/6 on links labelled "Sil", "sense" and "sentence".]
12
  • Forward-backward algorithm (continued)

Node 11: in the same manner we obtain the beta values for all the
nodes down to node 1.
We can compute the probabilities on the links (between two nodes) as
follows. Let us call this link probability G. G(t-1, t) is computed as
the product α(t-1) · a_ij · β(t), where a_ij is the transition weight
of the link. These values give the un-normalized posterior
probabilities of the word on the link, considering all possible paths
through the link.
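A minimal sketch of that product, under the same assumptions as the
forward and backward sketches above (here the link's language model
probability and acoustic score together play the role of the
transition term a_ij; this is an illustration, not the author's code):

    # Un-normalized link posterior G = alpha(start) * a_ij * beta(end)
    # for every link; a_ij is taken as lm * acoustic of the link.
    def link_posteriors(graph, alpha, beta):
        return {(l.start, l.end, l.word):
                    alpha[l.start] * l.lm * l.acoustic * beta[l.end]
                for l in graph.links}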
13
  • Word graph showing the computed alphas and betas

This is the word graph with every node labelled with its corresponding
alpha and beta values.
[Figure: the full word graph with the (α, β) pair at every node, over
the same "this is a test/quest/guest sentence/sense" lattice as
before. The pairs shown include (1, 2.8843E-14), (0.5, 2.87E-14),
(0.5025, 5.740E-12), (3.35E-3, 8.527E-10), (1.675E-3, 1.536E-11),
(1.117E-5, 2.512E-7), (1.675E-5, 4.61E-9), (7.446E-8, 3.7E-5),
(2.79E-8, 2.766E-6), (1.861E-8, 2.766E-6), (4.964E-10, 5.55E-3),
(7.751E-11, 1.66E-3), (3.438E-12, 0.833), (1.2923E-13, 0.1667) and
(2.88E-14, 1).]
The assumption here is that the probability of occurrence of any word
is 0.01, i.e. we have a loop grammar of 100 words.
14
  • Link probabilities calculated from alphas and
    betas

The following word graph shows the links with
their corresponding link posterior probabilities
(not yet normalized).
[Figure: the same word graph with the un-normalized link posterior G
on every link. The values shown include G = 8.421E-12, 5.74E-12,
4.288E-12, 4.136E-12, 6.45E-13, 4.649E-13, 3.08E-13, 1.549E-13,
1.292E-13, 7.72E-14, 7.71E-14, 3.438E-14, 2.87E-14 and 1.292E-15.]
By choosing the links with the maximum posterior probability we can be
confident that we have included the most probable words in the final
sequence.
15
  • Using it on a real application

Using the algorithm in a real application: we need to perform word
spotting without using a language model, i.e. we can only use a loop
grammar.
In order to spot the word of interest we construct a loop grammar with
just this one word. The final one-best hypothesis will then consist of
the same word repeated N times, so the challenge is to determine which
of these N instances actually corresponds to the word of interest.
This is achieved by computing the link posterior probabilities and
selecting the instance with the maximum value, as sketched below.
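A minimal sketch of that selection step, reusing the link_posteriors
helper from the earlier sketch (illustrative, not the author's code):

    # Among all lattice links carrying the spotted word, keep the one
    # whose un-normalized link posterior is largest.
    def best_instance(graph, alpha, beta, word):
        posts = link_posteriors(graph, alpha, beta)
        candidates = {k: v for k, v in posts.items() if k[2] == word}
        return max(candidates.items(), key=lambda kv: kv[1])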
16
  • 1-best output from the word spotter

The recognizer puts out the following output (start frame, end frame,
word, score):

    0000 0023 !SENT_START  -1433.434204
    0023 0081 BIG          -4029.476440
    0081 0176 BIG          -6402.677246
    0176 0237 BIG          -4080.437500
    0237 0266 !SENT_END    -1861.777344

We have to determine which of the three instances of the word actually
exists.
17
Lattice from one of the utterances
For this example we have to spot the word BIG in an utterance that
consists of three words (BIG TIED GOD). All the links in the output
lattice contain the word BIG. The values on the links are the acoustic
likelihoods in the log domain, so the forward-backward computation
just involves adding these numbers in a systematic manner.
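A minimal log-domain sketch of that computation (an illustration, not
the author's code): products along a path become sums of
log-likelihoods, and where alternative paths merge at a node their
contributions are combined with a log-add; when a node has a single
incoming link the update is indeed just an addition.

    import numpy as np

    # Log-domain forward pass over a lattice whose link scores are
    # acoustic log-likelihoods; 'outgoing' maps a node id to a list of
    # (end_node, log_score) pairs.
    def forward_log(outgoing, nodes_in_topological_order, start_node):
        log_alpha = {n: -np.inf for n in nodes_in_topological_order}
        log_alpha[start_node] = 0.0   # log(1): the initial probability is 1
        for n in nodes_in_topological_order:
            for end, log_score in outgoing.get(n, []):
                log_alpha[end] = np.logaddexp(log_alpha[end],
                                              log_alpha[n] + log_score)
        return log_alpha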
18
  • Alphas and betas for the lattice

The initial probability at both the start and end nodes is 1, so its
logarithmic value is 0. The language model probability of the word is
also 1, since it is the only word in the loop grammar.
19
  • Link posterior calculation

It is observed that we obtain greater discrimination in the confidence
levels if we also multiply the final probability by the likelihood of
the link itself, in addition to the corresponding alphas and betas. In
this example we add the likelihood, since we are working in the log
domain.
[Figure: the word-spotting lattice (nodes 0-8, from sent_start to
sent_end) with the log-domain link posteriors G on the links. The
values shown include G = -17781, -17859, -17942, -18061 (twice),
-21382, -25690, -31152 (three links) and -67344.]
20
  • Inference from the link posteriors

The link from node 1 to node 5 corresponds to the first word instance,
while the links from 5 to 6 and from 6 to 7 correspond to the second
and third instances respectively. It is clear from the link posterior
values that the first instance of the word BIG has a much higher
probability than the other two.
Note: the part that is missing in this presentation is the
normalization of these probabilities, which is needed to make the
various link posteriors comparable.
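A hedged sketch of that missing normalization (not taken from the
slides): since the forward pass already gives the total lattice
probability as the alpha of the final node, each un-normalized link
posterior can be divided by it,

    p(\ell \mid x_1^T) \;=\; \frac{G(\ell)}{\alpha(n_{\text{final}})}

Equivalently, following the Foundation slide, one can divide by the
sum of the numerator over all words occurring in the same time
interval, which makes competing hypotheses for the same time span
directly comparable.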
21
  • References
  • F. Wessel, R. Schlüter, K. Macherey, H. Ney.
    "Confidence Measures for Large Vocabulary
    Continuous Speech Recognition". IEEE Trans. on
    Speech and Audio Processing. Vol. 9, No. 3, pp.
    288-298, March 2001
  • F. Wessel, K. Macherey, and R. Schlüter, "Using Word Probabilities
    as Confidence Measures", Proc. ICASSP '97.
  • G. Evermann and P.C. Woodland, "Large Vocabulary Decoding and
    Confidence Estimation using Word Posterior Probabilities", in
    Proc. ICASSP 2000, pp. 2366-2369, Istanbul.
  • X. Huang, A. Acero, and H.W. Hon, Spoken Language
    Processing - A Guide to Theory, Algorithm, and
    System Development, Prentice Hall, ISBN
    0-13-022616-5, 2001
  • J. Deller et al., Discrete-Time Processing of Speech Signals,
    MacMillan Publishing Co., ISBN 0-7803-5386-2, 2000