A Baseline System for Speaker Recognition presentation

1
A Baseline System for Speaker Recognition

2
Outline

3
Introduction

A baseline system was built and used in the
NIST 2002 speaker recognition evaluation
GMM-based system
Score normalization using z-norm
Adaptation technique used to estimate the speaker
model starting from the world model

4
Baseline Speaker Recognition System

5
Baseline Speaker Recognition System

GMM modeling for both hypotheses: speaker and
non-speaker (world)
EM algorithm (Baum-Welch) to train the world
model
Initialization using LBG vector quantization
Speaker model: mean vectors adapted from the
world model
Approximation of the unified adaptation
approach ("Online Adaptation of HMMs to
Real-Life Conditions: A Unified Framework", IEEE
Trans. on SAP, Vol. 9, No. 4, May 2001)

6
Baseline Speaker Recognition System

Speaker Adaptation
World model Gaussian distributions are grouped in a
binary tree
Gaussian classes are determined from the speaker
data
MLLR is applied based on these classes; only the
means of the Gaussian distributions are adapted
MAP is applied to the leaf Gaussian distributions

7
Baseline Speaker Recognition System

Building the Gaussian tree bottom-up
Grouping the closest Gaussian distributions two
by two
The distance between two Gaussian distributions
is the loss in likelihood of the associated data
when the two Gaussians are merged into a single
Gaussian
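
The merge distance described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes diagonal covariances, assumes each Gaussian is the maximum-likelihood fit to the data assigned to it (with an occupancy count n), and merges by moment matching. The function names `merge`, `merge_loss`, and `closest_pair` are hypothetical.

```python
import math


def merge(n1, mean1, var1, n2, mean2, var2):
    """Moment-matched merge of two diagonal Gaussians with counts n1 and n2."""
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(mean1, mean2)]
    var = [
        (n1 * (v1 + (a - m) ** 2) + n2 * (v2 + (b - m) ** 2)) / n
        for a, b, v1, v2, m in zip(mean1, mean2, var1, var2, mean)
    ]
    return n, mean, var


def log_det(var):
    """Log-determinant of a diagonal covariance matrix."""
    return sum(math.log(v) for v in var)


def merge_loss(n1, mean1, var1, n2, mean2, var2):
    """Log-likelihood lost when one merged Gaussian replaces the two originals."""
    n, _, var = merge(n1, mean1, var1, n2, mean2, var2)
    return 0.5 * (n * log_det(var) - n1 * log_det(var1) - n2 * log_det(var2))


def closest_pair(gaussians):
    """Return (loss, i, j) for the pair with the smallest merge loss.

    `gaussians` is a list of (count, mean, var) tuples.
    """
    best = None
    for i in range(len(gaussians)):
        for j in range(i + 1, len(gaussians)):
            loss = merge_loss(*gaussians[i], *gaussians[j])
            if best is None or loss < best[0]:
                best = (loss, i, j)
    return best
```

Repeating `closest_pair`, replacing the two selected Gaussians by their merge, and recording the parent/child links would build the binary tree from the leaves to the root.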

8
Baseline Speaker Recognition System

After the E-step of the EM algorithm, the weights
associated with the leaves of the tree are
propagated through the tree up to the root
Going from the root to the leaves, nodes are
selected whenever one of their two children has a
weight less than a threshold
This defines a partition that will be used in an
MLLR algorithm
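
The propagation and node selection above can be sketched as follows. This is one possible reading of the slide, under stated assumptions: the leaf weights are taken to be occupancy counts accumulated in the E-step, a node is selected (and the descent stops) when it is a leaf or when either child falls below the threshold, and each selected node contributes the set of leaves beneath it as one class. The `Node` class and function names are hypothetical.

```python
class Node:
    """Binary tree node; a leaf carries the index of a world-model Gaussian."""

    def __init__(self, left=None, right=None, leaf_id=None):
        self.left = left
        self.right = right
        self.leaf_id = leaf_id
        self.weight = 0.0


def propagate(node, leaf_weights):
    """Set each node's weight to the sum of the leaf weights beneath it."""
    if node.leaf_id is not None:
        node.weight = leaf_weights[node.leaf_id]
    else:
        node.weight = propagate(node.left, leaf_weights) + propagate(
            node.right, leaf_weights
        )
    return node.weight


def leaves(node):
    """List the Gaussian indices under a node."""
    if node.leaf_id is not None:
        return [node.leaf_id]
    return leaves(node.left) + leaves(node.right)


def select_classes(node, threshold):
    """Descend from the root; stop at a node when a child is under-weighted."""
    is_leaf = node.leaf_id is not None
    if is_leaf or node.left.weight < threshold or node.right.weight < threshold:
        return [leaves(node)]
    return select_classes(node.left, threshold) + select_classes(
        node.right, threshold
    )


# Usage: four Gaussians; Gaussian 2 receives little speaker data.
root = Node(
    Node(Node(leaf_id=0), Node(leaf_id=1)),
    Node(Node(leaf_id=2), Node(leaf_id=3)),
)
propagate(root, [5.0, 4.0, 0.5, 6.0])
classes = select_classes(root, threshold=3.0)
```

With these illustrative weights, Gaussians 0 and 1 each form their own class, while Gaussians 2 and 3 share one transformation because Gaussian 2 has too little data on its own.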

9
Baseline Speaker Recognition System

MAP algorithm
The estimated Gaussian mean parameters at the
leaves are smoothed, using a fixed weight, with
the parameters of the world Gaussians
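
The fixed-weight smoothing can be sketched as a linear interpolation between each adapted mean and the corresponding world mean. This is a minimal sketch; the function name `smooth_means` and the list-of-lists layout are assumptions for illustration.

```python
def smooth_means(adapted_means, world_means, weight):
    """Interpolate each adapted mean with its world-model mean.

    weight = 1.0 keeps the adapted mean; weight = 0.0 falls back to the world.
    """
    return [
        [weight * a + (1.0 - weight) * w for a, w in zip(adapted, world)]
        for adapted, world in zip(adapted_means, world_means)
    ]


# Usage with the 0.8 / 0.2 weights quoted for the NIST 2002 system.
smoothed = smooth_means([[1.0, 2.0]], [[0.0, 0.0]], weight=0.8)
```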

10
Baseline Speaker Recognition System

Given a target speaker model $\lambda_s$, the world model
$\lambda_w$ and a test utterance $X$, the score for this
utterance is computed as the log-likelihood
ratio
$s = \log \frac{p(X \mid \lambda_s)}{p(X \mid \lambda_w)}$
This score should be normalized because the
world model is imprecise
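
The log-likelihood ratio score can be sketched for diagonal-covariance GMMs as follows. This is a minimal sketch under assumptions: frames are treated as independent, each model is a list of (weight, mean, variance) tuples, and the function names are hypothetical. The score is left unnormalized by utterance length, matching the formula on the slide.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(frames, gmm):
    """Total log-likelihood of the frames under a GMM [(weight, mean, var), ...]."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def llr_score(frames, speaker_gmm, world_gmm):
    """s = log p(X | speaker model) - log p(X | world model)."""
    return gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, world_gmm)


# Usage with toy one-dimensional, single-component models.
world = [(1.0, [0.0], [1.0])]
speaker = [(1.0, [1.0], [1.0])]
score = llr_score([[1.0], [1.2]], speaker, world)
```

A positive score means the utterance is better explained by the speaker model than by the world model.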

11
Baseline Speaker Recognition System

Normalization using the z-norm
A few impostor utterances are used
A score is computed for every utterance
These scores define a distribution per
target speaker
Target speakers' distributions should be similar
so that a single decision threshold can be used
Center and scale the distribution
$ns = a \cdot s + b$
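
The normalization can be sketched as an affine map of the raw score. In the usual z-norm definition, a = 1/sigma and b = -mu/sigma, where mu and sigma are the mean and standard deviation of the impostor scores obtained against the target speaker's model. The use of the population standard deviation and the function names are assumptions of this sketch.

```python
import statistics


def znorm_params(impostor_scores):
    """Return (a, b) such that a * s + b centers and scales the impostor scores."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)  # population std is an assumption
    return 1.0 / sigma, -mu / sigma


def znorm(score, a, b):
    """Normalized score ns = a * s + b."""
    return a * score + b


# Usage: impostor scores against one target model, then a raw test score.
a, b = znorm_params([1.0, 3.0])
normalized = znorm(4.0, a, b)
```

After this step the impostor scores of every target speaker have zero mean and unit variance, which is what makes a single global threshold usable.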

12
Baseline Speaker Recognition System

Based on the data from the 2001 evaluation, a DET
curve can be plotted
Find the optimal decision threshold that minimizes
the cost defined by NIST 2002, i.e.
$C_{det} = C_{miss} \cdot P_{miss|target} \cdot P_{target} + C_{FalseAlarm} \cdot P_{FalseAlarm|NonTarget} \cdot (1 - P_{target})$
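
The threshold search can be sketched as a sweep over candidate thresholds on development scores, evaluating the detection cost at each one. This is a minimal sketch: the function names are hypothetical, a trial is accepted when its score is at or above the threshold, and the cost parameters are passed in explicitly because the slide does not state their values. The numbers in the usage line are illustrative only.

```python
import math


def detection_cost(p_miss, p_fa, c_miss, c_fa, p_target):
    """C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def best_threshold(target_scores, impostor_scores, c_miss, c_fa, p_target):
    """Return (cost, threshold) minimizing the detection cost on the given scores."""
    candidates = sorted(set(target_scores) | set(impostor_scores)) + [math.inf]
    best = None
    for t in candidates:
        # A trial is accepted when its score is >= t.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        cost = detection_cost(p_miss, p_fa, c_miss, c_fa, p_target)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best


# Usage with illustrative scores and cost parameters.
cost, threshold = best_threshold(
    [2.0, 3.0], [0.0, 1.0], c_miss=1.0, c_fa=1.0, p_target=0.5
)
```

Sweeping the same threshold over its full range and recording the (P_fa, P_miss) pairs gives the points of the DET curve.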

13
NIST 2002 evaluation

14
NIST 2002 evaluation

Target speaker model adapted from the world model
For every iteration, after the E-step:
Threshold (cumulative probability 3.0) to
select tree nodes
MLLR used to update the Gaussian means
Approximated MAP to smooth the MLLR-estimated
parameters: linear combination of the MLLR
estimate of the mean (0.8) and the world
(a priori) mean (0.2)

15
NIST 2002 evaluation

16 male and 21 female speakers (NIST 2001) used
as impostors (8 test files from each)
The pseudo-impostor scores define a distribution
used to z-normalize the score for a given target
speaker
Global threshold estimated on NIST 2001 data in
order to minimize the cost

16
NIST 2002 evaluation

17
NIST 2002 evaluation

18
NIST 2002 evaluation

19
NIST 2002 evaluation

20
NIST 2002 evaluation

21
Conclusions and perspectives
