Using Neural Network Language Models for LVCSR - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Using Neural Network Language Models for LVCSR
  • Holger Schwenk and Jean-Luc Gauvain
  • Presented by Erin Fitzgerald
  • CLSP Reading Group
  • December 10, 2004

2
Introduction
  • Build and use neural networks to estimate LM
    posterior probabilities for ASR tasks
  • Idea:
  • Project word indices onto a continuous space
  • The resulting smooth probability functions of the word representations
    generalize better to unseen n-grams
  • Still an n-gram approach, but posteriors are interpolated for any
    possible context; no backing off
  • Result: significant WER reduction at small computational cost

3
Architecture: Standard fully connected multilayer perceptron
4
Architecture
(Figure: the n-1 history words are projected onto a continuous space by the
projection layer c, followed by a hidden layer d of size H and an output
layer o over the N-word vocabulary; P is the projection dimension)
d = tanh(Mc + b)
o = Vd + k
p_i = exp(o_i) / sum_r exp(o_r) = P(w_j = i | h_j),  i = 1, ..., N
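A minimal NumPy sketch of this forward pass, with hypothetical variable names
(R is the N x P projection matrix, M/b the hidden-layer weights and bias, V/k
the output-layer weights and bias):

    import numpy as np

    def nnlm_forward(context_ids, R, M, b, V, k):
        """Forward pass of the neural net LM sketched in the figure above."""
        # Project each of the n-1 history words and concatenate the results.
        c = np.concatenate([R[i] for i in context_ids])
        d = np.tanh(M @ c + b)          # hidden layer: d = tanh(Mc + b)
        o = V @ d + k                   # output layer: o = Vd + k
        e = np.exp(o - o.max())         # softmax (numerically stable)
        return e / e.sum()              # p_i = P(w_j = i | h_j)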
5
Training
  • Train with the standard back-propagation algorithm
  • Error function: cross entropy
  • Weight decay regularization used
  • Targets set to 1 for w_j and to 0 otherwise
  • These outputs are shown to converge to the posterior probabilities
  • Back-propagating through the projection layer means the NN learns the
    best projection of words onto the continuous space for the probability
    estimation task
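A minimal sketch of this training criterion for a single example (the decay
value is illustrative; p is the network's softmax output and target_id the
index of the observed next word w_j):

    import numpy as np

    def cross_entropy_with_decay(p, target_id, weights, decay=1e-5):
        """Cross-entropy error against the 1-of-N target, plus weight decay."""
        ce = -np.log(p[target_id])                          # target is 1 for w_j, 0 otherwise
        l2 = decay * sum((W ** 2).sum() for W in weights)   # weight-decay regularization
        return ce + l2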

6
Optimizations
7
Fast Recognition
  • Techniques
  • Lattice Rescoring
  • Shortlists
  • Regrouping
  • Block mode
  • CPU optimization

8
Fast Recognition
  • Techniques
  • Lattice Rescoring
  • Decode with a standard backoff LM to build lattices
  • Shortlists
  • Regrouping
  • Block mode
  • CPU optimization

9
Fast Recognition
  • Techniques
  • Lattice Rescoring
  • Shortlists
  • The NN only predicts a high-frequency subset of the vocabulary
  • Regrouping
  • Block mode
  • CPU optimization

10
Shortlist optimization
(Figure: same network as before, but the output layer is restricted to a
shortlist S of the most frequent words, so the NN only computes
p_i = P(w_j = i | h_j) for words i in S)
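A sketch of how a shortlist can be combined with the back-off LM, assuming
the usual renormalization of the NN distribution by the back-off probability
mass on the shortlist; nn_prob and backoff_prob are hypothetical helpers
returning P(word | history):

    def shortlist_prob(word, history, shortlist, nn_prob, backoff_prob):
        """Score a word with the NN if it is in the shortlist, else fall back."""
        if word in shortlist:
            # Back-off probability mass of the shortlist, used to renormalize
            # the NN distribution so the combined model still sums to one.
            mass = sum(backoff_prob(w, history) for w in shortlist)
            return nn_prob(word, history) * mass
        return backoff_prob(word, history)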
11
Fast Recognition
  • Techniques
  • Lattice Rescoring
  • Shortlists
  • Regrouping (optimization of the lattice rescoring)
  • Collect and sort the LM probability requests
  • All probability requests with the same context h_t require only one
    forward pass (see the sketch after this list)
  • Block mode
  • CPU optimization
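A sketch of the regrouping step, assuming a hypothetical nn_forward(history)
helper that returns the NN distribution for one context (words are
vocabulary indices):

    from collections import defaultdict

    def regroup_requests(requests, nn_forward):
        """Group (history, word) probability requests by history so each
        distinct context needs only one forward pass."""
        by_history = defaultdict(list)
        for history, word in requests:
            by_history[tuple(history)].append(word)

        probs = {}
        for history, words in by_history.items():
            dist = nn_forward(history)        # one forward pass per distinct context
            for w in words:
                probs[(history, w)] = dist[w]
        return probs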

12
Fast Recognition
  • Techniques
  • Lattice Rescoring
  • Shortlists
  • Regrouping
  • Block mode
  • Several examples propagated through NN at once
  • Takes advantage of faster matrix operations
  • CPU optimization

13
Block mode calculations
(Figure: per-example forward pass, one context vector c at a time)
d = tanh(Mc + b)
o = Vd + k
14
Block mode calculations
(Figure: block-mode forward pass; the projected context vectors of a whole
bunch of examples form the columns of C)
D = tanh(MC + B)
O = VD + K
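A NumPy sketch of the block-mode computation, with the projected context
vectors of one bunch stacked as the columns of C (weights as in the earlier
figure):

    import numpy as np

    def block_forward(C, M, b, V, k):
        """Forward pass for a whole bunch at once: D = tanh(MC + B), O = VD + K."""
        D = np.tanh(M @ C + b[:, None])        # hidden activations, one column per example
        O = V @ D + k[:, None]                 # output activations
        O = O - O.max(axis=0, keepdims=True)   # column-wise softmax (numerically stable)
        P = np.exp(O)
        return P / P.sum(axis=0, keepdims=True)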
15
Fast Recognition Test Results
  • Techniques
  • Lattice rescoring: average of 511 nodes per lattice
  • Shortlists (2000 words): 90% prediction coverage
  • 3.8M 4-grams requested, 3.4M processed by the NN
  • Regrouping: only 1M forward passes required
  • Block mode: bunch size of 128
  • CPU optimization
  • Total processing time under 9 minutes (0.03 x RT)
  • Without the optimizations, 10x slower

16
Fast Training
  • Techniques
  • Parallel implementations
  • Full connections require low latency, which is very costly
  • Resampling techniques
  • Optimal floating-point performance requires contiguous memory locations

17
Fast Training
  • Techniques
  • Floating-point precision: 1.5x faster
  • Suppressing internal calculations: 1.3x faster
  • Bunch mode: 10x faster
  • Forward and back propagation for many examples at once
  • Multiprocessing: 1.5x faster
  • 47 hours reduced to 1 hour 27 minutes with a bunch size of 128

18
Application to CTS and BN LVCSR
19
Application to ASR
  • Neural net LM techniques focus on CTS because
  • There is far less in-domain training data, leading to data sparsity
  • The NN can only handle a small amount of training data
  • New Fisher CTS data: 20M words (vs. 7M)
  • BN data: 500M words

20
Application to CTS
  • Baseline: train standard backoff LMs for each domain and then
    interpolate them
  • Experiment 1: interpolate the CTS neural net LM with the in-domain
    back-off LM
  • Experiment 2: interpolate the CTS neural net LM with the full-data
    back-off LM
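The interpolation itself is a weighted sum of the two models' probabilities;
a minimal sketch (the weight lam is illustrative and would normally be tuned
on held-out data, e.g. by EM):

    def interpolate(p_nn, p_backoff, lam=0.5):
        """Linearly interpolate the neural net LM with a back-off LM for the
        same word and history."""
        return lam * p_nn + (1.0 - lam) * p_backoff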

21
Application to CTS - PPL
  • Baseline: train standard backoff LMs for each domain and then
    interpolate them
  • In-domain PPL: 50.1; full-data PPL: 47.5
  • Experiment 1: interpolate the CTS neural net LM with the in-domain
    back-off LM
  • In-domain PPL: 45.5
  • Experiment 2: interpolate the CTS neural net LM with the full-data
    back-off LM
  • Full-data PPL: 44.2
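For reference, the perplexity figures above come from the per-word log
probabilities of the (interpolated) model; a minimal sketch:

    import math

    def perplexity(log_probs):
        """PPL = exp(-(1/N) * sum_i log P(w_i | h_i)) over N scored words."""
        return math.exp(-sum(log_probs) / len(log_probs))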

22
Application to CTS - WER
  • Baseline: train standard backoff LMs for each domain and then
    interpolate them
  • In-domain WER: 19.9; full-data WER: 19.3
  • Experiment 1: interpolate the CTS neural net LM with the in-domain
    back-off LM
  • In-domain WER: 19.1
  • Experiment 2: interpolate the CTS neural net LM with the full-data
    back-off LM
  • Full-data WER: 18.8

23
Application to BN
  • Only a subset of the 500M available words could be used for NN
    training: a 27M-word training set
  • Still useful:
  • The NN LM gave a 12% PPL gain over the backoff LM trained on the small
    27M-word set
  • The NN LM gave a 4% PPL gain over the backoff LM trained on the full
    500M-word training set
  • Overall WER reduction of 0.3 absolute

24
Conclusion
  • Neural net LMs provide significant improvements in PPL and WER
  • The optimizations speed up NN training by 20x and allow lattice
    rescoring in less than 0.05 x RT
  • While the NN LM was developed for and works best on CTS, gains were
    found on the BN task as well