Speech recognition in MUMIS

1
Speech recognition in MUMIS

Judith Kessens, Mirjam Wester
Helmer Strik

2
Manual transcriptions

Transcriptions made by SPEX
orthographic transcriptions
transcriptions on chunk level (2-3 sec.)
Formats
.TextGrid → Praat
XML derivatives
.pri: no time information
.skp: time information

3
Manual transcriptions

Total amount of transcribed matches on ftp-site
(including the demo matches)
Dutch 6 matches
German 21 matches
English 3 matches
Extensions
Dutch (_N), German (_G), English (_E)

4
Automatic speech recognition

1. Acoustic preprocessing
Acoustic signal → features
2. Speech recognition
Acoustic models
Language models
Lexicon

5
Automatic transcriptions

Problem of recorded data
Commentaries and stadium noise are mixed
Very high noise levels
→ Recognition of such extremely noisy data is very
difficult

6
Examples of data

Yug-Ned match
Dutch: op _t ogenblik wordt in dit stadion de opstelling voorgelezen
English: and they wanna make the change before the corner
German: und die beiden Tore die die Hollaender bekommen hat haben
7
Examples of data

Eng-Dld match
Dutch: geeft nu een vrije trap in _t voordeel van Ince
English: and phil neville had to really make about three yards to stop <dreisler> pulling it down and playing it
German: wurde von allen englischen Zeitungen aus der Mannschaft
8
Evaluation of automatic transcriptions
WER (%) = 100 × (insertions + deletions + substitutions) / number of words
→ WER can be larger than 100%!
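The WER definition above can be sketched in a few lines of Python. This is only an illustration, not the scoring tool used in the project: it assumes whitespace-separated words, takes "number of words" to mean the number of words in the manual (reference) transcription, and counts errors with a standard minimum edit distance. The function name `word_error_rate` is made up for this sketch.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: 100 * (S + D + I) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the
    # reference words processed so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One reference word, two inserted words: WER exceeds 100%.
print(word_error_rate("corner", "the corner kick"))  # 200.0
```

Because insertions are counted in the numerator but not in the denominator, a recogniser that outputs many spurious words can score above 100%.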
9
WERs (all words)
           Dutch   English   German
Yug-Ned    84.5    84.5      77.4
Eng-Dld    83.2    83.3      90.8
10
WERs (player names)
           Dutch           English         German
           all    names    all    names    all    names
Yug-Ned    84.5   53.0     84.5   48.2     77.4   40.9
Eng-Dld    83.2   55.0     83.3   56.2     90.8   77.4
11
WERs versus SNR
           Dutch           English         German
           WER    SNR      WER    SNR      WER    SNR
Yug-Ned    84.5   9        84.5   12       77.4   19
Eng-Dld    83.2   8        83.3   11       90.8   7
12
Automatic transcriptions

The language model (LM) and lexicon (lex) are
adapted to a specific match
Start with a general LM and lex
Add player names of the specific match
Expand the general LM and lex when more data is
available
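The adaptation steps above might look roughly like this. It is a simplified sketch under stated assumptions: the lexicon is modelled as a plain set of words (a real recognition lexicon also holds pronunciations), the language model is reduced to unigram counts, and the function `adapt_to_match` and its `name_count` parameter are hypothetical names, not part of the MUMIS tooling.

```python
from collections import Counter


def adapt_to_match(general_lexicon, general_unigrams, player_names, name_count=1):
    """Return match-specific copies of the lexicon and unigram counts.

    The general resources are left untouched so they can be reused
    for the next match.
    """
    lexicon = set(general_lexicon)
    unigrams = Counter(general_unigrams)
    for name in player_names:
        lexicon.add(name)
        # Give unseen names a small pseudo-count so the language model
        # does not assign them zero probability (assumed smoothing choice).
        if unigrams[name] == 0:
            unigrams[name] = name_count
    return lexicon, unigrams


general_lexicon = {"vrije", "trap", "voordeel"}
general_unigrams = {"vrije": 3, "trap": 2, "voordeel": 1}
match_lexicon, match_unigrams = adapt_to_match(
    general_lexicon, general_unigrams, ["Ince", "Neville"]
)
print(sorted(match_lexicon))
```

Expanding the general resources when more data becomes available would then amount to updating `general_lexicon` and `general_unigrams` before adapting to the next match.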

13
WERs for various amounts of data
14
Oracle experiments - ICSLP02

Due to the limited amount of material, we started off
with oracle experiments
Language models are trained on the target match
Acoustic models are trained on part of the target
match or on another match
→ Much lower WERs

15
Summary of results

Acoustic model training
Leaving out non-speech chunks does not hurt
recognition performance
Using more training data is beneficial, but more
important:
the SNRs of the training and test data should be
matched

16
Summary of results

WERs are SNR-dependent

(tested on Yug-Ned match)
17
Summary of results
Split words into categories, i.e. function words,
content words and football players' names
WER function words > WER content words > WER names
(tested on Yug-Ned match)
18
Summary of results

Noise reduction tool (FTNR) → small improvement

19
Ongoing work

Techniques to lower WERs
Tuning of the generic language model
Defining different classes
Reduction of OOV words in lexicon and in the
language model (using more material)
Speaker Adaptation in HTK
(note: all other experiments are being carried
out using Phicos)
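To track whether adding material actually reduces out-of-vocabulary (OOV) words, one could measure the OOV rate of a manual transcription against the lexicon. The sketch below is an assumption about how that might be done, not the procedure used in the project; `oov_rate` is a hypothetical helper and the lexicon is again treated as a plain set of words.

```python
def oov_rate(transcript_words, lexicon):
    """Percentage of running words in the transcript missing from the lexicon."""
    if not transcript_words:
        raise ValueError("transcript must contain at least one word")
    missing = sum(1 for word in transcript_words if word not in lexicon)
    return 100.0 * missing / len(transcript_words)


words = "and phil neville had to".split()
lexicon = {"and", "had", "to"}
print(oov_rate(words, lexicon))                        # 40.0
print(oov_rate(words, lexicon | {"phil", "neville"}))  # 0.0
```

Every OOV word is guaranteed to be misrecognised, so this rate gives a lower bound on the errors that a larger lexicon could remove.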

20
Ongoing work