Title: Speaker
1Alexandros Xafopoulos
2Presentation Outline
- Framework
- Preprocessing
- Features (Extraction, Noise Compensation-Channel Equalization, Selection)
- (Pattern) Matching-Modeling
- Decision Making
- Performance Evaluation
- Experimental Results
- References
3Introduction
Framework
- Motivation
- Speech contains speaker-specific characteristics (physiological-behavioral)
- vocal tract
- pitch range - vocal cords
- articulator movement
- Mouth
- Nasal cavity
- Lips
- Voiceprint as a biometric (distinguishing trait)
- Natural and economical way
4Introduction(2)
Framework
- Objective
- Discriminate betw. a given speaker and all others
- Definitions
- Verification: from Latin verus (true)
- Claim: speaker identity
- Proof: speech utterance
- Binary decision to establish the truth
- Client: speaker registered on the system
- Impostor: speaker who claims a false identity
- Model: set of parameters that represents a speaker or a group of speakers
5Related Research Areas
Framework
Signal processing
Analog Signal Processing
Digital Signal Processing
Speech Processing
Other Signals
Recognition
Coding / Synthesis
Analysis
Storage / Transmission
Enhancement
Speech Recognition
Speaker Recognition
Language Identification
Speaker Identification
Speaker Detection / Tracking
Speaker Verification
6Related Research Areas(2)
Framework
- (Statistical) Pattern Recognition
Data -> Feature Extractor -> Features -> Trained Classifier -> Class Label
7Related Research Areas(3)
Framework
- Biometrics Technology
- def.: automatic identification of a person based on his/her physiological or behavioral characteristics (biometrics)
- desirable properties of biometrics Jain_bk
- universality (found in every person)
- uniqueness (different value for each person)
- permanence (invariant with time)
- collectability (quantitatively measurable)
- performance (accuracy vs. resources)
- high acceptability (person's willingness)
- low circumvention (not easy to fool)
8Related Research Areas(4)
Framework
Communication by speech Somervuo
9Related Research Areas(5)
Framework
Speech Production Physiology
Picone
Block Diagram of Human Speech Production
Picone
10Related Research Areas(6)
Framework
Morgan (corrected)
General Discrete-Time Model for Speech Production
Gain for voice source
G(z)
H(z)
R(z)
Gain for noise source
11Generic Speaker Verification Process
Framework
- Enrollment (Training) module
Speaker A N utterances
Known Identity Speaker is A
Speech Pressure Wave of A
N Sets of Feature Vectors
Digital Speech
Feature Vectors
Digital Signal Acquisition
Feature Creation
Model Registration
Channel to transfer signal
Speaker Model of A
12Generic Speaker Verification Process(2)
Framework
- Enrollment module(2)
- Digital signal acquisition
- Sampling frequency
Speech Pressure Wave
Analog Voltage Signal
Conditioned Analog Signal
Antialiasing low-pass filter
Sampling Quantization (A/D converter)
Microphone
Digital Speech
13Generic Speaker Verification Process(3)
Framework
- Enrollment module(3) Feature creation
Digital Speech
Preprocessing
Noise Compensation Channel Equalization
Preprocessed Digital Speech
Plain Feature Vectors
Feature Extraction
(Clean) Feature Vectors
Feature Selection
14Generic Speaker Verification Process(4)
Framework
Claimed Identity B
Threshold of B
P Speaker Models
Speaker Model of B
Speech Pressure Wave of A
Model Selection
Digital Speech
Feature Vectors
Matching Results
Digital Signal Acquisition
Feature Creation
Pattern Matching
Decision Making
Acceptance (A = B) or Rejection (A ≠ B)
15Generic Speaker Verification Process(5)
Framework
Speaker Models of
Threshold Setting
Speaker Model of A
Threshold of A
- Cohort model: competitive clients only
- World model: all the clients
16Corpus Parameters
Framework
- Text-dependency Nedic
- Text dependent: verification done on a fixed phrase, predetermined by the recognizer (fixed phrase)
- Text prompted: verification done on a system-generated sequence of predetermined words (fixed vocabulary)
- User customized: verification done on a user-requested phrase
- Text independent: verification done on any phrase
- Language independent: verification done on any language
- Vocabulary
- Fixed or not
- Size (V)
17Corpus Parameters(2)
Framework
- Population (Speakers)
- Size (P)
- Similarity
- Speech Flow
- Discrete Utterance (pauses betw. words)
- Continuous
- Spontaneous (natural)
- Quantity (sessions, phrases, phrase duration)
- Quality of speech (Problems)
18Problems under real conditions
Framework
- Microphone / Communication channel / Digitizer quality
- Channel - Environmental mismatch (different channels - environments for enrollment and verification request)
- Mimicry by humans and tape recorders
- Bad pronunciation
- Extreme emotional states (e.g. anger)
- Sickness / Allergies / Tiredness / Thirst
- Aging
- Environmental noise / Poor room acoustics
19Errors
Framework
- False Rejection
- A client makes a request to be verified as himself/herself and the request is rejected
- High rate client: goat Koolwaaij
- Low rate client: sheep
- False Acceptance
- An impostor makes a request to be verified as a client and the request is accepted
- High rate client: lamb
- Low rate client: ram
- High rate impostor: wolf
- Low rate impostor: badger
20Applications
Framework
- Access control to databases / facilities
- Electronic commerce
- Remote access to computer networks
- Forensic
- Telephone banking James
21Preemphasis-Frame Blocking
Preprocessing
- Preemphasis: low-order digital system to
- spectrally flatten the signal (in favour of vocal tract parameters)
- make it less susceptible to later finite precision effects
- usually first order (order = 1)
- Frame blocking (short-term (st) processing)
- L successive overlapping frames (consecutive frames shifted by M samples)
- window size - length: N samples (N / sampling frequency, in sec)
- frame rate-shift-period: M samples (M / sampling frequency, in sec)
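The two preprocessing steps on this slide can be sketched in plain Python as follows. This is only an illustration: the coefficient value 0.95 and the function names are assumptions, not values taken from the slides.

```python
def preemphasize(signal, alpha=0.95):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor, so it is passed through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def block_frames(signal, frame_len, frame_shift):
    """Cut the signal into frames of frame_len (N) samples, one every frame_shift (M) samples."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames  # trailing samples that do not fill a whole frame are dropped
```

With frame_shift smaller than frame_len, consecutive frames share frame_len - frame_shift samples.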
22Frame Windowing
Preprocessing
- Used to minimize the signal discontinuities at the beginning and end of each frame
- Trade-off betw. time resolution (short window) and freq. resolution (long window)
- Window type
- Corrections
Picone
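As an illustration of frame windowing, here is a minimal sketch using the Hamming window (the window type listed later under the experimental parameters). The function names are hypothetical.

```python
import math


def hamming_window(n_samples):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame, window):
    """Multiply a frame sample by sample with the window, tapering both frame ends."""
    return [s * w for s, w in zip(frame, window)]
```

The window is close to 1 in the middle of the frame and falls to 0.08 at both ends, which attenuates the discontinuities at the frame boundaries.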
23Speech Activity Detection
Preprocessing
- Silence-speech detection
- Voiced-unvoiced discrimination
- Endpoint detection Deller_bk
- Can be applied afterward
24Signal Measures Graphs
Preprocessing
Zerocrossing rate
Speech waveform
Time-frequency plot (Spectrogram)
Energy plot
Weingessel
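Two of the plotted measures, short-term energy and zero-crossing rate, are simple to compute per frame. The sketch below is one common way to define them; the exact definitions behind the plots are not given in the slides, so treat these as assumptions.

```python
def short_term_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

Typically, silence shows low energy, voiced speech shows high energy with a low zero-crossing rate, and unvoiced speech shows a high zero-crossing rate.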
25Features - General
Feature Extraction
- Maps each speech interval-frame to a multidimensional feature space
- Order: number of coefficients in each feature vector (dimensionality)
- Several kinds of coefficients have been proposed
26Linear Prediction (LP)
Feature Extraction
- Speech sample as a linear combination of previous samples (autoregressive model)
- LP coefficients (LPC)
- normalized excitation source
- G: scale factor
- stLPC of frame l
27Linear Prediction (LP)(2)
Feature Extraction
- Calculation of stLPC
- Mean squared error minimization
- Autocorrelation method
- Levinson-Durbin (L-D) recursion
- Covariance method
- Cholesky (LU) decomposition
L-D recursion (l is implied, R
autocorrelation matrix)
Picone2
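A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, in plain Python. The sign convention is an assumption: the predictor is taken as s_hat[n] = sum over k of a_k * s[n-k]. Function names are hypothetical.

```python
def autocorrelation(frame, order):
    """Short-term autocorrelation r[0..order] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err): a[k-1] is the predictor coefficient a_k and err is the
    final mean squared prediction error.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # degenerate frame (e.g. all zeros): stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of step i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For an autocorrelation sequence that decays geometrically, such as r = [1, 0.5, 0.25], the recursion returns a first coefficient of 0.5 and a second coefficient of 0, as expected for a first-order autoregressive signal.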
28Linear Prediction (LP)(3)
Feature Extraction
- LPC
- highly correlated
- not orthonormal
- Distance: Itakura-Saito
- Computationally expensive
- LPC processor Rabiner_bk
29Cepstrum (Complex-Real)
Feature Extraction
- Special case of homomorphic signal proc.
- Focuses on voiced segments
- Short-term complex cepstrum (stCC)
- Short-term real cepstrum (stRC)
- Distance of cepstrum based coefficients
- Euclidean: vectors defined in an orthonormal space
30Mel Cepstrum
Feature Extraction
Mel-cepstral feature generation (frame l)
- Mel
- unit of measure of perceived frequency of a tone
- non-linear correspondence to the physical freq. (like the human ear)
- mel freq. cepstral coefficients (MFCC)
- generalized case Vergin
Young
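The slides do not give a formula for the mel scale. One widely used analytic approximation, assumed here, is mel = 2595 * log10(1 + f / 700); other variants exist.

```python
import math


def hz_to_mel(freq_hz):
    """Map physical frequency (Hz) to perceived frequency (mel), assumed approximation."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse mapping, used to place filter centres that are equally spaced in mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this approximation the scale is roughly linear below 1 kHz and compressive above it, so equal steps in mel correspond to ever wider bands in Hz.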
31LP derived Cepstrum
Feature Extraction
- LP Cepstral Coefficients (LPCC)
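LPCC can be obtained from the LPC directly by a recursion, without computing a spectrum. The sketch below assumes the all-pole model H(z) = G / (1 - sum over k of a_k z^-k) and omits the gain term c_0; the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a[0..p-1] (a_1..a_p) to cepstral coefficients c_1..c_n.

    c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single pole a_1 = 0.5 the result is 0.5, 0.5^2 / 2, 0.5^3 / 3, ..., which matches the series expansion of -log(1 - 0.5 z^-1).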
32Other cepstral variants
Feature Extraction
- Linear Freq. Cepstral Coefficients (LFCC)
- Like MFCC but
- filters are uniformly spaced on the Hz scale
- Mel-warped LPCC (MLPCC) Kuitert
- CC not directly derived from LPC
- 1st compute the log magn. spectrum of LPC
- then warp the freq. axis to correspond to the mel
axis
33Variants
Feature Extraction
- Discrete Wavelet Transform (DWT) instead of FFT Krishnan
- Application of filter types other than triangular
- Application of the logarithm before the triangular filters
34Delta Cepstrum
Feature Extraction
- Milner
- Inclusion of temporal information
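Delta coefficients add temporal information by estimating the local slope of each cepstral trajectory. The slides do not show the formula; the sketch below assumes the common regression form over K frames on each side, with the edge frames repeated at the utterance boundaries.

```python
def delta_features(frames, K=2):
    """Regression deltas: d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..K."""
    T = len(frames)
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    deltas = []
    for t in range(T):
        vec = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, K + 1):
                ahead = frames[min(t + k, T - 1)][d]   # clamp at the last frame
                behind = frames[max(t - k, 0)][d]      # clamp at the first frame
                num += k * (ahead - behind)
            vec.append(num / denom)
        deltas.append(vec)
    return deltas
```

The delta vectors are usually appended to the static cepstral vectors, doubling the feature order.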
35PLP - Auditory Features
Feature Extraction
- Perceptual Linear Prediction (PLP) Hermansky
- Spectral scale: non-linear Bark scale
- Spectral features smoothed within freq. bands
- Auditory Features Kumar
- Imitates signal proc. performed by the ear
- cochlear modeling
36Intra-frame cepstral proc.
Noise Compensation-Channel Equalization
Mammone
- Liftering-weighting
- low order coeffs sensitive to overall spectral slope
- high order sensitive to noise
- => tapered window (bandpass liftering)
- Adaptive Component Weighting (ACW)
- motivation: not all frames have the same distortion
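One commonly used tapered weighting is the raised-sine bandpass lifter w(n) = 1 + (Q/2) * sin(pi * n / Q), n = 1..Q. Whether the slides intend exactly this form is not stated, so the sketch below is an assumption.

```python
import math


def bandpass_lifter(cepstrum, Q=None):
    """Weight c_1..c_Q with w(n) = 1 + (Q/2) * sin(pi * n / Q); Q defaults to the vector length."""
    q = len(cepstrum) if Q is None else Q
    return [(1.0 + (q / 2.0) * math.sin(math.pi * n / q)) * c
            for n, c in enumerate(cepstrum, start=1)]
```

The weights peak in the middle of the cepstral vector and de-emphasize both the low-order coefficients (spectral slope) and the high-order ones (noise).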
37Inter-frame cepstral proc.
Noise Compensation-Channel Equalization
- Cepstral Mean Subtraction (CMS)
- mean (over a number of frames) subtraction (tackles training-testing discrepancy)
- lowpass filtering
- eliminates communication channel spectral shaping
- Pole Filtered CMS (PFCMS): cepstrum poles modification
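Plain CMS can be sketched in a few lines. The version below subtracts the mean over the whole utterance; the averaging span is an assumption, and the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, computed over all frames, from every frame."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

A fixed channel multiplies the spectrum, which becomes a constant offset in the cepstral domain, so removing the mean removes that offset.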
38RASTA proc.
Noise Compensation-Channel Equalization
- Relative Spectral Filtering (RASTA) Hermansky
- bandpass filtering in the log-spectral domain
- suppresses spectral components that change more slowly or quickly than in typical speech
- RASTA-PLP
- Microphone (type, position) robustness
39Feature Selection Introduction
Feature Selection
- Goal
- find a transformation to a relatively low-dimensional feature space that preserves the information pertinent to the application while enabling meaningful comparisons to be performed using measures of similarity
- Processing of features
- Principal Component Analysis (PCA) (or Karhunen-Loève Expansion, KLE)
- seeks a lower dimensional representation that accounts for variance of the features
- not necessarily optimum for class discrimination
- Linear Discriminant Analysis (LDA) Jin
- Nonlinear Discriminant Analysis (NLDA) (using MLP) Konig
40Matching-Modeling Introduction
Matching-Modeling
- Modeling: creation of (speaker) models
- Model: can be considered as the output of a proper processing of a speaker's set of feature vectors
- Matching: computation of a match score betw. the input feature vectors and some speaker model
- Methods Wassner
- Template Matching
- deterministic
- score: distance betw. (the feature vectors of) a test speaker utterance and a reference speaker model
- better score: min distance
41Matching-Modeling Introduction(2)
Matching-Modeling
- Methods(2)
- Stochastic Approach
- probabilistic matching
- score: prob. of generation of a speech utterance by the claimed speaker
- better score: max probability
- Parametric speaker model: a specific pdf is assumed and its appropriate parameters (e.g. mean vector, covariance matrix) can be estimated using Maximum Likelihood Estimation (MLE), e.g. multivariate normal model
42Template Matching Methods
Matching-Modeling
- Dynamic Time Warping (DTW)
- dynamic comparison betw. a test and a reference (model) matrix (set of feature vectors)
- computes a distance betw. the test and ref. patterns
- allows time alignment at different costs
- uses Dynamic Programming (DP)
43Template Matching Methods(2)
Matching-Modeling
- Dynamic Time Warping (DTW)(2)
The DP grid with test (t) and reference (r) feature vectors at respective frame indices
Picone
44Template Matching Methods(3)
Matching-Modeling
- Dynamic Time Warping (DTW)(3)
- distances-costs on the DP grid (i, j: frame indices, k: step index)
- Node
- e.g.
- Transition e.g.
- Both
- e.g.
- Global
- K number of transitions
(Type 4)
45Template Matching Methods(4)
Matching-Modeling
- Dynamic Time Warping (DTW)(4)
- DTW search constraints
- Endpoint Constraints (bottom left (S) - top right (E) corners)
- endpoint relaxation: max points allowed in each direction
- Monotonicity (going up and right)
- Global Path Constraints (global movement area)
- permissible slope or
- permissible window
46Template Matching Methods(5)
Matching-Modeling
- Dynamic Time Warping (DTW)(5)
- DTW search constraints(2)
- Local Path Constraints
- (local movement area)
Sakoe-Chiba local constraints on DTW path search
Picone
47Template Matching Methods(6)
Matching-Modeling
- Dynamic Time Warping (DTW)(6)
- The minimum cost at the final endpoint provides the distance betw. a test and a reference phrase
- Training-Modeling Deller_bk
- Casual: unaltered feature strings form models
- Averaging feature strings of utterances
- The stochastic techniques possess superior
training methods
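A minimal DTW sketch in plain Python, with Euclidean node costs, fixed endpoints and monotonic steps. It uses the simplest local constraint (horizontal, vertical and diagonal predecessors with equal weight), which is an assumption and not necessarily the Sakoe-Chiba variant or cost type used in the experiments; no global path constraint is applied.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors (the node cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(test, ref):
    """Minimum accumulated cost of aligning test and ref, from corner (S) to corner (E)."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # D[i][j]: best cost of aligning the first i test frames with the first j ref frames
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A test phrase that is only a time-stretched copy of the reference gets distance 0, which is the behaviour time alignment is meant to provide.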
48Template Matching Methods(7)
Matching-Modeling
- Vector Quantization (VQ)
- Uses intra-vector dependencies to break up a vector space into cells (unsupervised)
- follows Linde-Buzo-Gray (LBG) algorithm
- speaker model: codebook
- codebook: set of prototype vectors used to represent vector spaces
- goal: data structure "discovery" by finding how the data is clustered
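The sketch below illustrates the codebook idea with a nearest-neighbour assignment and centroid update, repeated a fixed number of times. It covers only the refinement loop and starts from a given initial codebook; the codeword-splitting stage of the full LBG algorithm is left out, and all names are hypothetical.

```python
import math


def nearest(vec, codebook):
    """Return (index, distance) of the codeword closest to vec."""
    best_i, best_d = 0, float("inf")
    for i, code in enumerate(codebook):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, code)))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def refine_codebook(data, codebook, n_iter=10):
    """Alternate between assigning vectors to cells and moving codewords to cell centroids."""
    codebook = [list(c) for c in codebook]
    for _ in range(n_iter):
        cells = [[] for _ in codebook]
        for vec in data:
            cells[nearest(vec, codebook)[0]].append(vec)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def average_distortion(data, codebook):
    """Match score of an utterance against a speaker codebook (lower is better)."""
    return sum(nearest(v, codebook)[1] for v in data) / len(data)
```

For verification, the test utterance's feature vectors would be scored with average_distortion against the claimed speaker's codebook.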
49Template Matching Methods(8)
Matching-Modeling
- Learning Vector Quantization (LVQ)
- Predefined classes, labeled data
- defines the class borders according to the nearest neighbor rule
- supervised version of VQ
- set of variants (e.g. LVQ1,2,3)
- goal: to determine a set of prototypes that best represent each class
50Statistical Measures
Matching-Modeling
- Second Order Statistical Measures (SOSM) Bimbot
- E.g. Arithmetic-Harmonic-Sphericity (AHS)
- speaker model: covariance matrix of feature vectors
- Distance = min (0) iff all eigenvalues of the test and ref. covariance matrices are equal
51Generative Models
Matching-Modeling
- Hidden Markov Models (HMMs)
- Statistical - stochastic
- Flexible
- Types
- Continuous Density (CD)
- Discrete
- SemiContinuous (SC) Falavigna
- Model: prob. distributions (e.g. mixtures of Gaussians) of the feature vectors of the speaker
52Generative Models(2)
Matching-Modeling
- Hidden Markov Models (HMMs)(2)
- Topologies
- Left-Right (LR) (self and right connections): attempts to catch the temporal structure of the speech, to link consecutive short-time observations together
- states/unit (e.g. phoneme)
- gaussian distributions (mixtures)/state
Kumar
Example of a left-right HMM
feature vectors
53Generative Models(3)
Matching-Modeling
- Hidden Markov Models (HMMs)(3)
- Topologies(2)
- Ergodic (fully connected)
- AR HMMs: the prob. distrib. associated with each state is estimated via an AR process Bourlard
Picone
Example of an ergodic HMM
54Generative Models(4)
Matching-Modeling
- Gaussian Mixture Models (GMMs)
- Single-state HMM with a multi-Gaussian (mixture) density
- Uses a mixture of Gaussian densities to model the distribution of the feature vectors of each speaker
- Local covariance info
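Scoring a feature vector against a GMM can be sketched as below. The sketch assumes diagonal covariance matrices, which is a simplification chosen for brevity, and takes the model parameters as given; training (e.g. with EM) is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the variances)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) = log sum_i w_i N(x; mean_i, var_i), computed with the log-sum-exp trick."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def utterance_score(frames, weights, means, variances):
    """Average per-frame log-likelihood of an utterance under one speaker's GMM."""
    return sum(gmm_log_likelihood(x, weights, means, variances) for x in frames) / len(frames)
```

Frames are treated as independent, so unlike a left-right HMM the model ignores the order of the feature vectors.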
55Neural Networks (NN)
Matching-Modeling
- Feed-Forward Neural Networks
- supervised learning
- each speaker has his own NN (each checked in turn to find the best match)
- classifier-matcher: NN output
- positive/negative training (rivals)
56Neural Networks (NN)(2)
Matching-Modeling
- Feed-Forward NNs(2)
- Types Haykin_bk
- Multilayer Perceptron (MLP), usually trained with the Back-Propagation (BP) algorithm
- Error Correction Learning
- Global optimization
- Time Delay NNs (TDNN)
- Radial Basis Function (RBF) Networks Lo
- Memory-Based Learning
- Local optimization
- Support Vector Machines (SVM)
- Learning by examples
- Vapnik-Chervonenkis (VC) dimension framework for
the development of SVMs
57Neural Networks (NN)(3)
Matching-Modeling
- Self Organizing Maps (SOM)
- unsupervised learning
- method to form a topologically ordered codebook
- speaker model codebook
- density of prototype vectors approaches the pdf of the input vectors during the training
- nonlinear projection
- competitive (winner neuron) learning
58NNs Combined Methods
Matching-Modeling
- DTW-SOM
- associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node (also DTW-LVQ) Somervuo
- Recurrent NNs (RNN) Shrimpton
- (self- or not) feedback
- Combined methods Genoud
59Sub-band Proc. Introduction
Matching-Modeling
- Speech signal split into band-limited channels
(freq. ranges)
Block diagram of an LPCC-based sub-band
processing system
Finan
60Decision Approaches
Decision Making
- Template approach
- threshold setting based on inter- and intra-speaker scores/distances
- comparison
- test score < threshold => acceptance Fakotakis
- Statistical approach Bengio Bourlard
- speaker RV for identity c being claimed
- utterance represented by feat. vectors
- other speakers RV
61Decision Approaches(2)
Decision Making
- Statistical approach(2)
- Claim c is true if
- decision threshold usu. found assuming Gaussian distributions for and
- => normalised likelihood - likelihood ratio
- using logs
- => Log Likelihood Ratio (LLR)
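The log likelihood ratio decision can be illustrated with a toy example. In the sketch below the client and world models are single one-dimensional Gaussians given as (mean, variance) pairs, and the threshold defaults to 0; both are assumptions made only to keep the example short.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def average_llr(frames, client, world):
    """Mean over frames of log p(x | client model) - log p(x | world model)."""
    total = sum(log_gaussian(x, *client) - log_gaussian(x, *world) for x in frames)
    return total / len(frames)


def accept_claim(frames, client, world, threshold=0.0):
    """Accept the identity claim if the average LLR exceeds the decision threshold."""
    return average_llr(frames, client, world) > threshold
```

Dividing by the world (or cohort) likelihood normalises the client score, so one threshold is less sensitive to utterance-level variation.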
62Decision Approaches(3)
Decision Making
- Statistical approach(3)
- speaker dependent model
- normalization factor
- cohort model: group of selected speakers who are more competitive with the model of the claimed id
- No well-established selection procedure
- world model: all other speakers
- less computation and storage needed
63Decision Approaches(4)
Decision Making
- Statistical approach extensions
- If
- sign(y) gives the decision
- Techniques
- Bayes Decision Rule (assumes prob.s perfectly estimated)
- Minimizes Half Total Error Rate (HTER)
- Linear Regression
- SVM Regression
64Threshold Setting
Decision Making
- speaker dependent
- P thresholds
- speaker independent
- 1 threshold
- leave one (client o) out
- PP thresholds
- a priori: computed on training set (enrollment data) Lindberg
- a posteriori: computed on test set (obtained during actual use of the system)
65Hypothesis Testing
Decision Making
Valid and impostor densities
Campbell
66Hypothesis Testing(2)
Decision Making
Probability terms definitions
Campbell
67Accuracy
Performance Evaluation
- Errors
- FAR (False Acceptance Rate)
- Prob. of false acceptance
- FRR (False Rejection Rate)
- Prob. of false rejection
- Values for FAR and FRR are adjusted by changing the threshold values: lowering FAR raises FRR and vice versa
68Accuracy(2)
Performance Evaluation
- Errors(2)
- EER (Equal Error Rate): operating point where FAR = FRR
- Choice of 2 subsequent operating points to approximate the EER value
- MDE (Minimum Decision Error): operating point where FRR = 10 FAR
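Since FAR and FRR are only known at a finite set of thresholds, the EER is approximated from the operating point where they are closest. The sketch below assumes distance scores (a claim is accepted when the distance is below the threshold) and uses the observed scores as candidate thresholds; names are hypothetical.

```python
def error_rates(client_dists, impostor_dists, threshold):
    """Return (FAR, FRR) at one threshold; a claim is accepted when distance < threshold."""
    frr = sum(1 for d in client_dists if not d < threshold) / len(client_dists)
    far = sum(1 for d in impostor_dists if d < threshold) / len(impostor_dists)
    return far, frr


def approximate_eer(client_dists, impostor_dists):
    """Scan candidate thresholds and return (FAR, FRR, threshold) where |FAR - FRR| is smallest."""
    best = None
    for thr in sorted(set(client_dists) | set(impostor_dists)):
        far, frr = error_rates(client_dists, impostor_dists, thr)
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr, thr)
    return best
```

When FAR and FRR do not coincide at any threshold, the two returned rates bracket the EER.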
69Accuracy(3)
Performance Evaluation
- Graphs
- Quantities
- speakers correctly/wrongly verified
ROC (Receiver Operating Characteristics) curve: plot of different operating points (FRR vs. FAR values). Also called DET (Detection Error Tradeoff) plot
Gauvain
70Computational Complexity
Performance Evaluation
- CPU time
- Training
- Feature creation
- Modeling
- Threshold setting
- Testing (verification throughput)
- Feature creation
- Matching
- Memory-disk storage
- Speech database, Features, Models, Thresholds
71Parameters
Experimental Results
- Text dependent
- Fixed vocab.: digits 0-9 in French or Spanish => V = 10
- P = 37 (M2VTS database)
- Discrete utterance speech flow
- sessions (shots)/speaker = 5, the 5th is for testing => S = 4
- phrases/session = 1 (0-9 utterance)
- Phrase duration: 6 sec
- Proc. Freq.: 12 kHz
- Window type: Hamming
- Coefficients: LPCC
- Liftering-weighting
72Parameters(2)-EER
Experimental Results
- Matching method: DTW
- Euclidean
- Type 4
- Local path constraint: Sakoe-Chiba (b)
- Decision approach: Template
- Threshold setting: leave one out
- P (client left out) x (P-1) (rest clients as claimants) x S (shot left out for claiming-testing) = 5328 client claims
- P (client left out as impostor) x (P-1) (claims of the impostor as one of the rest clients) x S (shot left out for claiming) = 5328 impostor claims
- EER(avg) in [0.6569, 1.5390] (FAR1 = 1.5390 > FRR1 = 0.6569)
- EER(avg) = [EER(1234) + EER(2134) + EER(3124) + EER(4123)] / 4
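The claim counts follow directly from the corpus parameters (P = 37 clients, S = 4 shots). A short check, with a hypothetical helper name:

```python
def leave_one_out_claims(n_clients, n_shots):
    """Each of P clients is left out once and tested against the P-1 others, for each of S shots."""
    return n_clients * (n_clients - 1) * n_shots


print(leave_one_out_claims(37, 4))  # 37 * 36 * 4 = 5328
print(leave_one_out_claims(37, 1))  # 37 * 36 = 1332 (single test shot)
```

The same count applies to client claims and to impostor claims.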
73Parameters(3)-EER
Experimental Results
- Shot 4 left out, shot 5 used for testing
- P x (P-1) = 1332 client and 1332 impostor claims
- EER(5123) = 2.7027
Difference
- Coefficients: MFCC
- EER(avg) = 4.1817, EER(5123) = 5.4054
74References
- Bengio: S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP Research Report, 2001
- Bimbot: F. Bimbot, I. Magrin-Chagnolleau and L. Mathan, Second-Order Statistical Measures for Text-Independent Speaker Identification, Speech Communication, vol. 17, pp. 177-192, 1995
- Bourlard: H. Bourlard and N. Morgan, Speaker Verification: A Quick Overview, IDIAP Research Report, 1998
- Campbell: J.P. Campbell Jr., Speaker Recognition: A Tutorial, Proc. of the IEEE, vol. 85, no. 9, pp. 1437-1462, 1997
- Deller_bk: J.R. Deller, J.G. Proakis and J.H. Hansen, Discrete-time Processing of Speech Signals, Macmillan, New York, 1993
- Fakotakis: N. Fakotakis, E. Dermatas and G. Kokkinakis, Optimal Decision Threshold for Speaker Verification, in Signal Processing III: Theories and Applications, editor I.T. Young et al., pp. 585-587, Elsevier Science Publishers B.V. (North Holland), 1986
75References(2)
- Falavigna: D. Falavigna, Comparison of Different HMM Based Methods for Speaker Verification (citeseer)
- Finan: R.A. Finan, R.I. Damper and A.T. Sapeluk, Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing (citeseer)
- Gauvain: J. Gauvain, L. Lamel and B. Prouts, Experiments with Speaker Verification over the Telephone, Eurospeech95, pp. 651-654, 1995
- Genoud: D. Genoud, F. Bimbot, G. Gravier and G. Chollet, Combining Methods to Improve Speaker Verification Decision, Proc. of ICSLP'96, vol. 3, pp. 1756-1759, 1996
- Haykin_bk: S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1995
- Hermansky: H. Hermansky and N. Morgan, RASTA Processing of Speech, IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994
76References(3)
- Jain_bk: A. Jain, R. Bolle and S. Pankanti, editors, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Boston, MA, 1999
- James: D. James, H. Hutter and F. Bimbot, CAVE -- Speaker Verification in Banking and Telecommunications (citeseer)
- Jin: Q. Jin and A. Waibel, Application of LDA to Speaker Recognition (citeseer)
- Konig: Y. Konig, L. Heck, M. Weintraub and K. Sonmez, Nonlinear Discriminant Feature Extraction for Robust Text-Independent Speaker Recognition, Proc. of RLA2C98 (Speaker Recognition and Its Commercial and Forensic Applications), 1998
- Koolwaaij: J.W. Koolwaaij and L. Boves, A New Procedure for Classifying Speakers in Speaker Verification Systems, Proc. of Eurospeech'97, pp. 2355-2358, 1997
- Krishnan: M. Krishnan, C. Neophytou and G. Prescott, Wavelet Transform Speech Recognition using Vector Quantization, Dynamic Time Warping and Artificial Neural Networks, 1994
77References(4)
- Kuitert: M. Kuitert and L. Boves, Speaker Verification with GSM Coded Telephone Speech, Proc. of Eurospeech'97, vol. 2, pp. 975-978, 1997
- Kumar: N. Kumar, Investigation of Silicon Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition, PhD thesis, Johns Hopkins University, 1997
- Lindberg: J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, F. Bimbot and J.-B. Pierrot, Techniques for a priori Decision Threshold Estimation in Speaker Verification, Proc. of RLA2C98, 1998
- Lo: T.F. Lo and M.W. Mak, A New Intra-Frame and Inter-Frame Cepstral Processing Method for Telephone-Based Speaker Verification, Int. Workshop on Multimedia Data Storage, Retrieval, Integration and Applications, pp. 116-122, 2000
- Mammone: R.J. Mammone, X. Zhang and R.P. Ramachandran, Robust Speaker Recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 58-71, Sep. 1996
- Milner: B. Milner, Inclusion of Temporal Information into Features for Speech Recognition, Proc. of ICSLP96, pp. 256-259, 1996
78References(5)
- Morgan: N. Morgan and B. Gold, Speech Analysis and Synthesis Overview, Lecture, Univ. of California Berkeley, 1999
- Nedic: B. Nedic and H. Bourlard, Recent Developments in Speaker Verification at IDIAP, IDIAP Research Report, 2000
- Picone: J. Picone, Fundamentals of Speech Recognition: A Short Course, Mississippi State Univ., 1996
- Picone2: J. Picone, Signal Modeling Techniques in Speech Recognition, Proc. of the IEEE, vol. 81, no. 9, pp. 1215-1247, 1993
- Rabiner_bk: L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993
- Shrimpton: D. Shrimpton and B.D. Watson, Comparison of Recurrent Neural Network Architectures for Speaker Verification, Proc. of the Fourth Australian International Conference on Speech Science and Technology, pp. 460-464, 1992
79References(6)
- Somervuo: P. Somervuo, Speech Recognition using Context Vectors and Multiple Feature Streams, Helsinki University of Technology, Faculty of Electrical Engineering, 1996
- Vergin: R. Vergin, D. O'Shaughnessy and A. Farhat, Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999
- Wassner: H. Wassner, G. Maitre and G. Chollet, Speaker Verification: A Review, Technical Report, IDIAP, 1996
- Weingessel: A. Weingessel, Speech Recognition (citeseer)
- Young: S. Young, Large Vocabulary Continuous Speech Recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 45-57, 1996