Application of Gender Identification to Automatic Speech Recognition

1
Application of Gender Identification to Automatic
Speech Recognition

Jen Burge
Vonetta Lewis
Intelligent Information Engineering Laboratory
Department of Computer Science and Engineering
Oakland University

2
Voice Scoring System

Enormous amount of audio data collected from
intercepted telephone conversations
Use speech recognition to transcribe text
Use transcription to decide what data should be
examined by human

3
Diagram from Sethi. Scoring Voice-stream for
Homeland Security, Oakland University. 2003.
4
Speech Recognition

Recognize speech data (convert to text,
perform commands, etc.)
Applications
Use of speech as a form of input (dictation,
control, etc.)
Content based retrieval for audio files

Feature extraction (LPC or MFCC)
Model words or individual phonemes
Attempt to match new sounds to models to
recognize speech
Can use knowledge of language to improve accuracy

Diagram from
http://www.mor.itesm.mx/omayora/Tutorial/tutorial.html
6
Training Recognizer

Speaker-dependent
Specific user trains system
Speaker-independent
Attempt to recognize speech from any speaker
Need to train for all possible speakers
Less accurate

7
Variability Across Speakers

Main obstacle in speaker-independent speech
recognition systems
Gender, accent
Approaches to Problem
Build models for variability and choose
appropriate one
Adapt to current speaker during recognition

8
Project Goals

Improve accuracy of speaker-independent
recognition by using gender-dependent models
Automatically determine gender of speaker
Use the gender recognition to select appropriate
model for transcription
Investigate Language Identification

9
Gender-related Acoustic Differences

Differences in the length of the vocal tract and
vocal folds cause differences in
Fundamental frequency
Formant frequencies
Other differences

Diagram from Kent and Read. Acoustic Analysis of
Speech. 2002.
10
Past Work on Gender Problem

Recognition by pitch
Difficulty accurately measuring fundamental
frequency
Male/Female threshold usually 160 Hz
Model-based techniques
Hidden Markov Model (HMM)
Gaussian Mixture Model (GMM)
Other
Formant positions

11
Data Collection

Used CNN and NPR radio broadcasts
16 kHz, 16 bit mono for gender recognition
experiments
22 kHz, 16 bit mono for ViaVoice tests

12
Pitch Threshold Experiments

Measured fundamental frequency using
autocorrelation algorithm
Averaged frequency over entire clip
Compared to 160 Hz threshold
Achieved approx. 89% accuracy
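
The pitch-threshold procedure on this slide can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiments: the function names, the 60-400 Hz search range, the 1024-sample frame length, and the absence of any voiced/unvoiced detection are all assumptions made for the example. Only the autocorrelation method, the averaging over the clip, and the 160 Hz threshold come from the slide.

```python
def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch estimate (Hz) for one frame from the autocorrelation peak.

    fmin/fmax bound the lag search; they are illustrative values.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def classify_gender_by_pitch(samples, sample_rate,
                             threshold_hz=160.0, frame_len=1024):
    """Average per-frame F0 over the whole clip, then apply the threshold.

    Returns (label, mean_f0), or (None, None) if no frame gave an estimate.
    """
    estimates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        f0 = estimate_f0(samples[start:start + frame_len], sample_rate)
        if f0 is not None:
            estimates.append(f0)
    if not estimates:
        return None, None
    mean_f0 = sum(estimates) / len(estimates)
    label = "female" if mean_f0 >= threshold_hz else "male"
    return label, mean_f0
```

On real speech, unvoiced frames and octave errors would distort the average, which is consistent with the difficulty of measuring fundamental frequency noted on the earlier slide.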

13
Gaussian Mixture Model

Model data as a weighted sum of Gaussian
distributions
Use Expectation-Maximization algorithm to fit GMM
to training data
New data classified by finding the likelihood it
was produced by each model
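
A minimal sketch of the three steps above (mixture model, EM fitting, likelihood-based classification). It is deliberately simplified: it works on one-dimensional features with two components per model, whereas the experiments used MFCC vectors and 32 to 128 components. All function names, the iteration count, and the variance floor are assumptions made for illustration.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, gmm):
    """Log of weight * density for each (weight, mean, var) component."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]


def log_likelihood(data, gmm):
    """Total log-likelihood of the data under the weighted sum of Gaussians."""
    total = 0.0
    for x in data:
        logs = component_logs(x, gmm)
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def fit_gmm(data, k=2, iterations=50, seed=0):
    """Fit a k-component 1-D GMM with Expectation-Maximization."""
    rng = random.Random(seed)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data) + 1e-6
    # Start from k randomly chosen points with equal weights.
    gmm = [(1.0 / k, m, var_all) for m in rng.sample(data, k)]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, gmm)
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            norm = sum(probs)
            resp.append([p / norm for p in probs])
        # M-step: re-estimate weight, mean, and variance per component.
        updated = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2
                      for r, x in zip(resp, data)) / nj + 1e-6
            updated.append((nj / len(data), mean, var))
        gmm = updated
    return gmm


def classify(data, models):
    """Return the label whose model gives the data the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(data, models[label]))
```

Usage would be along the lines of `models = {"male": fit_gmm(male_features), "female": fit_gmm(female_features)}` followed by `classify(new_features, models)`, where the feature lists are placeholders for whatever features are extracted from the audio.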

14
GMM Experiments

Features: MFCCs (10, 12, 14, 16)
GMM components (32, 64, 128)
Trained with 10 speakers per gender, 20-22 second
samples
Tested with 51 utterances (23 male, 28 female)
of varying length

15
(No Transcript)
16
Observations

Slight improvement in accuracy with GMM (14 or 16
MFCCs)
Misclassified speakers usually had accents
not represented among the training speakers

17
ViaVoice Testing

Attempt to improve accuracy using
gender-dependent models
Trained one male model and one female model
(trained by specific people)

18
Experiments

Transcribed 24 radio clips with both male and
female models as well as built-in ViaVoice
speaker-independent model
Measure of accuracy:
Accuracy (%) = (words correctly recognized /
total words spoken) x 100
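
The accuracy measure can be written as a small helper. This is only an illustrative sketch: the function name and the argument names are assumptions, and the slide does not say how recognized words were matched against the reference transcript.

```python
def word_accuracy(words_correct, words_spoken):
    """Percentage of spoken words that were recognized correctly."""
    if words_spoken <= 0:
        raise ValueError("words_spoken must be positive")
    return 100.0 * words_correct / words_spoken
```

For example, 137 correctly recognized words out of 200 spoken gives `word_accuracy(137, 200)`, which evaluates to 68.5.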
19
(No Transcript)
20
(No Transcript)
21
Observations

Accuracy improvement by using same-gender model
as opposed to opposite-gender model: 353
Average accuracy of generic model: 68.5%

22
Language Identification

Goal: automatically determine language spoken in
order to select appropriate language model for
transcription
Previous methods
HMM
GMM

23
Experiments

Hypothesized that English ViaVoice would
recognize fewer words for clips of foreign
languages
Compared words recognized per second to voiced
regions per second

24
Data Collection

Used World Radio Network, BBC, and NPR radio
broadcasts
Recorded at 22 kHz, 16 bit mono
13-30 second samples
13 foreign languages, 17 different accents in
English

25
Sample Transcriptions

English sample

Persian sample

26
(No Transcript)
27
English/Non-English Classifier

Use line (Voiced/Sec - Words/Sec) = 0 as
classifier
Negative: English, Positive: Non-English
Tested on 82 clips (65 Non-English, 22 English)
Accuracy: 80.5%
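
A rough sketch of the decision rule, reading the classifier line as Voiced/Sec minus Words/Sec equal to zero (that reading is an assumption, since the operators are missing from the slide text). The function and parameter names are hypothetical, and treating a value of exactly zero as Non-English is an arbitrary choice made for the example.

```python
def classify_language(words_recognized, voiced_regions, duration_sec):
    """Label a clip by the sign of (voiced regions/sec - words/sec).

    Negative: English (the English recognizer produced more words than
    there were voiced regions). Zero or positive: Non-English.
    """
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    words_per_sec = words_recognized / duration_sec
    voiced_per_sec = voiced_regions / duration_sec
    score = voiced_per_sec - words_per_sec
    return "English" if score < 0 else "Non-English"
```

Because both rates share the same duration, the rule reduces to comparing the two counts; the per-second form is kept to mirror the quantities named on the slide.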

28
Observations

A previous paper achieved 76.8% in an
English/Japanese decision
To distinguish between multiple languages besides
English/non-English, would need to perform
similar procedure for all expected languages

29
Conclusions

Gender classification with GMM slightly more
accurate than with pitch
ViaVoice model trained by person of same gender
as speaker more accurate than one trained by
person of opposite sex
Achieved 80% accuracy in English/non-English
decision based on the number of words produced by
English ViaVoice

30
Future Work

Possibility of accuracy improvement with
speaker-independent male and female models
Need for accent consideration
Use of grammatical and semantic information to
improve English/non-English decision

31
Acknowledgements

IIE Lab: Mingkun, Victor, Aiyesha, Dingguo,
Shuo, Rishi
Dr. Sethi
Test Subjects
Sniffy Fluffy

32
Books

Kent, R. and Read, C. Acoustic Analysis of
Speech, Second Edition. Thomson Learning,
Albany, NY, 2002.
Deller, J., Proakis, J., and Hansen, J.
Discrete-Time Processing of Speech Signals.
Macmillan, New York, NY, 1993.
Rabiner, L. and Schafer, R. Digital Processing of
Speech Signals. Prentice Hall, New Jersey, 1978.
Proakis, J. and Manolakis, D. Digital Signal
Processing: Principles, Algorithms, and
Applications. Prentice Hall, New Jersey, 1996.

33
Papers

1. Parris, E. and Carey, M. Language Independent
Gender Identification. Proc. ICASSP, 1996.
2. Sethi, I. Scoring Voice-stream for Homeland
Security. Oakland University, 2003.
3. Abdulla, W. and Kasabov, N. Improving speech
recognition performance through gender
separation. Proceedings of the Fifth Biannual
Conference on Artificial Neural Networks and
Expert Systems (ANNES2001), pp. 218-222, 2001.
4. Chen, T., Huang, C., Chang, E., and Wang, J.
Automatic Accent Identification using Gaussian
Mixture Models. Proceedings IEEE Automatic Speech
Recognition and Understanding Workshop
(ASRU2001), Italy, 2001.
5. Vergin, R., Farhat, A., and O'Shaughnessy, D.
Robust Gender-Dependent Acoustic-Phonetic
Modeling in Continuous Speech Recognition Based
On a New Automatic Male/Female Classification.
Proc. ICASSP, 1996.