Speaker Discrimination: The Challenge of Conversational Data presentation


1
Speaker Discrimination: The Challenge of Conversational Data
Dissertation Committee
Advisor: Robert Yantorno, Ph.D.
Members: Dennis Silage, Ph.D.; Brian Butz, Ph.D.; Iyad Obeid, Ph.D.; Eugene Kwatny, Ph.D.
Uchechukwu O. Ofoegbu
2
Presentation Outline

Problem Statement and Research Goal
Scope of Research
Distance Analysis
Feature Analysis
Data Analysis
Application Systems
Fusion of Distances
Proposal Summary

3

Problem Statement and Research Goal

4
Conventional Speaker Recognition

Speaker Identification
Who is this speaker?
Speaker Verification
Is he who he claims to be?

System Output
5
Conversation Segmentation

Broadcast News/Conference Data
Conversational Data

6
Problems with Conversational Data

No a priori information available from
participating speakers.
Training is impossible
No a priori knowledge of change points
Speakers alternate very rapidly.
Limited amounts of data for single speaker
representations
Distortion
Channel noise, co-channel data

7
Proposed Solutions

Selective creation of data models
Development of an optimal distance measure
Decision level fusion of distance measures
Development of application-specific system

Scope of Research

9
Criminal Activity Detection

Monitoring inmate conversations
Prevention of 3-way calls
Notification of suspicious contacts
Enhancement of keyword detection
Uncooperative data collection
Forensics
Voiceprints

10
Commercial Services

Automated Customer Services
Personalized contact with customers
Search/Retrieval of Audio Data

11
Homeland Security

Military Activities
Pilot-control tower communications
Detection of unidentified speakers on pilot radio
channels
Terrorist Identification

Distance Analysis

13
Distance Measures

Univariate vs. Multivariate Analysis

14
Distance Measures

Notations
Random variables being compared
$X = [X_1, X_2, \ldots, X_p]$: $n_x \times p$ matrix
$Y = [Y_1, Y_2, \ldots, Y_p]$: $n_y \times p$ matrix
Properties
$Q(X, Y) \ge 0$,
$Q(X, Y) = 0$ iff $X = Y$,
$Q(X, Y) = Q(Y, X)$,
$Q(X, Y) \le Q(X, Z) + Q(Z, Y)$

15
Distance Measures

Mahalanobis Distance
$Q_{\mathrm{Mahalanobis}}(X, Y) = (\mu_x - \mu_y)^T S^{-1} (\mu_x - \mu_y)$
$S$: combined covariance matrix of X and Y
Hotelling's T-Square Statistic
$C_{ik}$: element in the ith row and kth column of the inverse of C
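As a rough illustration of the Mahalanobis distance between two sets of feature vectors, the following minimal sketch uses only the Python standard library. The slide does not say how the "combined" covariance matrix is formed, so the pooled covariance used here is an assumption, and the helper names (`mean_vector`, `scatter_matrix`, `solve`, `mahalanobis`) are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of an n-by-p list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean (p-by-p)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis(x_rows, y_rows):
    """(mu_x - mu_y)^T S^-1 (mu_x - mu_y), with S the pooled covariance (assumed)."""
    mu_x, mu_y = mean_vector(x_rows), mean_vector(y_rows)
    sx, sy = scatter_matrix(x_rows, mu_x), scatter_matrix(y_rows, mu_y)
    dof = len(x_rows) + len(y_rows) - 2
    p = len(mu_x)
    pooled = [[(sx[i][j] + sy[i][j]) / dof for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(mu_x, mu_y)]
    # Solving S w = diff avoids forming the inverse explicitly.
    return sum(d * w for d, w in zip(diff, solve(pooled, diff)))


# Made-up 2-dimensional features: Y is X shifted by 3 along the first axis.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
Y = [[3.0, 0.0], [5.0, 0.0], [3.0, 2.0], [5.0, 2.0]]
q_xy = mahalanobis(X, Y)
```

Solving the linear system rather than inverting S is a design choice for numerical convenience; a numerical library would normally replace these helpers.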

16
Distance Measures

Kullback-Leibler (KL) Distance
Bhattacharyya Distance

17
Distance Measures

Levene's Test
Derived from T-Square statistics as follows
Each set of points is transformed along each
vector into absolute divergence from the mean
vector
The T-Square Statistic is then applied on the
transformed features.
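The transformation step described above can be sketched as follows. This is a minimal illustration: the function name `absolute_deviation` is hypothetical, and the subsequent T-Square computation on the transformed features is not shown.

```python
def absolute_deviation(rows):
    """Replace each value by its absolute deviation from the column mean."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return [[abs(v - m) for v, m in zip(row, mean)] for row in rows]


# Made-up 2-dimensional feature set with mean vector (2, 4).
x = [[1.0, 2.0], [3.0, 6.0]]
x_dev = absolute_deviation(x)
```

Each set of points (X and Y) would be transformed separately in this way before the two transformed sets are compared.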

18
Procedural Set-up

HTIMIT database used
Average utterance length: 5 seconds
Intra-speaker distance computations

Randomly Select 2 Utterances
19
Procedural Set-up

Inter-speaker, different utterances distance
computations

Randomly Select Utterance
Randomly Select Utterance
20
Analysis of Distance Measures

Mahalanobis Distance: Gaussian Estimate

21
Analysis of Distance Measures

Levene's Test: Gamma Estimate

Feature Analysis

23
Cepstral Analysis
Frequency Analysis of Speech
STFT of Speech = Excitation Component × Vocal Tract Component
Vocal tract: slowly varying formants
Excitation: fast varying harmonics
Log of STFT = Log of Excitation + Log of Vocal Tract Component
IDFT of Log of STFT = Excitation + Vocal Tract (separated in the cepstral domain)
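The chain "STFT, then log, then IDFT" can be sketched for a single frame as below. This is a minimal illustration under stated assumptions: it uses a naive O(n^2) DFT from the standard library instead of an FFT, the frame is assumed to be already windowed, and the names `dft`, `real_cepstrum`, and the `floor` parameter are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """IDFT of the log magnitude spectrum of one frame."""
    # The small floor guards against log(0) for empty spectral bins.
    log_mag = [math.log(abs(v) + floor) for v in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]


# A scaled unit impulse has a flat spectrum, so only c[0] is non-zero.
cepstrum = real_cepstrum([math.e, 0.0, 0.0, 0.0])
```

Because the logarithm turns the product of excitation and vocal tract spectra into a sum, the slowly varying vocal tract contribution ends up in the low-order cepstral coefficients and the fast varying excitation in the high-order ones.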

24
Cepstral Features

Linear Predictive Cepstral Coefficients
Obtained Recursively from LPC Coefficients
Mel-Scale Frequency Cepstral Coefficients
Nonlinear warping of frequency axis to model the
human auditory system

25
Cepstral Features

Delta Cepstral Coefficients
First and Second derivatives of cepstral
coefficients
Reflects dynamic information
Used as supplement to original cepstral features
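One common way to compute delta coefficients is a regression over neighboring frames. The sketch below assumes that formula, with the edge frames repeated at the boundaries. The slides do not state which derivative estimate was used, so the formula, the function name `delta`, and the `width` parameter are assumptions.

```python
def delta(frames, width=2):
    """Regression-based time derivative of a sequence of cepstral vectors."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                ahead = frames[min(t + k, n - 1)][d]   # repeat last frame at the end
                behind = frames[max(t - k, 0)][d]      # repeat first frame at the start
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


# Made-up one-dimensional cepstral trajectory that rises by 1 per frame.
cepstra = [[0.0], [1.0], [2.0], [3.0], [4.0]]
d1 = delta(cepstra, width=1)   # first derivative (delta)
d2 = delta(d1, width=1)        # second derivative (delta-delta)
```

The delta and delta-delta vectors would then be appended to the original cepstral vector of each frame to form the supplemented feature set.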

26
Analysis of Cepstral Features

Mahalanobis Distance

27
Analysis of Cepstral Features

Levene's Test

28
Feature Combination

Proposed Investigation
What is the best feature combination?
Will the delta and delta-delta coefficients
contribute to the speaker-differentiating ability
of the features?

29
Feature Combination Analysis

T-test Based Evaluation
Why?
Robust to departures from the Gaussian distribution,
especially for large data sizes and when the two
samples to be compared have approximately equal
sizes.
Unaffected by differences in the variances of the
compared variables
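A t-test based evaluation of a feature set could compare the intra-speaker and inter-speaker distance samples it produces. The sketch below assumes the unequal-variance (Welch) form of the t statistic, since the slide does not specify the variant. The function name `t_statistic` and the sample values are made up for illustration.

```python
import math
import statistics


def t_statistic(sample_a, sample_b):
    """Two-sample t statistic with separate variance estimates (Welch form)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


intra = [1.0, 2.0, 3.0]   # made-up intra-speaker distances
inter = [4.0, 5.0, 6.0]   # made-up inter-speaker distances
t_value = t_statistic(inter, intra)
```

Under this reading, a larger t value indicates better separation between the two distance distributions, so feature combinations could be ranked by it.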

Data Analysis

31
Traditional Speaker Modeling

Examples
Gaussian Mixture Models
Hidden Markov Models
Neural Networks
Prosody-Based Models
Disadvantages
Require large amounts of data
Sometimes require a training procedure
Relatively complex

32
Conversational Data Modeling

Current Method
Equal Segmentation of Data
Indiscriminate use of data
Poor performance
Problems
Change points unknown
Not all speech is useful

33
Proposed Speaker Modeling
[Block diagram: SEGMENT 1 ... SEGMENT M → FEATURE COMPUTATION → MODEL 1 ... MODEL M]
34
Proposed Speaker Modeling

Why voiced only?
Same speech class compared
Contains the most information
What is the appropriate number of phonemes?
Large enough to sufficiently represent speakers
Small enough to avoid speaker overlap

35
Modeling Analysis
N = 20 (about 4 seconds of voiced speech)
36
Modeling Analysis
37
Modeling Analysis
N = 5 (about 1 second of voiced speech)
38

Application Systems

39
Unsupervised Speaker Indexing

The Restrained-Relative Minimum Distance (RRMD)
Approach

REFERENCE MODELS
Distance matrix between models:
$$D = \begin{pmatrix} 0 & D_{1,2} & D_{1,3} \\ D_{2,1} & 0 & D_{2,3} \\ D_{3,1} & D_{3,2} & 0 \end{pmatrix}$$
40
Unsupervised Speaker Indexing

The Restrained-Relative Minimum Distance (RRMD)
Approach

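[Flowchart] Observe the distances to Reference 1 and Reference 2 and take the minimum distance.
Restraining Condition passed: Same Speaker.
Restraining Condition failed: check the Relative Distance Condition.
Relative Distance Condition passed: Same Speaker.
Relative Distance Condition failed: Unusable Data.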
41
RRMD Approach

Restraining Condition
Distance Likelihood Ratio
DLR > 1 → Same Speaker
DLR < 1 → Check Relative Distance Condition

42
RRMD Approach

Relative Distance Condition
Relative Distance
Drel dmax dmin
Drel > threshold → Same Speaker

dmin
dmax
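The two RRMD conditions can be combined into a single decision rule, sketched below. The Distance Likelihood Ratio and the relative distance are taken as precomputed inputs because their formulas are not reproduced in this transcript. The function name `rrmd_decision`, the string labels, and the handling of the tie case DLR = 1 (treated as not passing) are assumptions.

```python
def rrmd_decision(dlr, d_rel, threshold):
    """Return 'same speaker' or 'unusable' for one test segment."""
    if dlr > 1:                 # restraining condition passed
        return "same speaker"
    if d_rel > threshold:       # relative distance condition passed
        return "same speaker"
    return "unusable"           # both conditions failed


# Made-up values for illustration only.
first = rrmd_decision(dlr=1.8, d_rel=10.0, threshold=50.0)
second = rrmd_decision(dlr=0.6, d_rel=80.0, threshold=50.0)
third = rrmd_decision(dlr=0.6, d_rel=10.0, threshold=50.0)
```

A "same speaker" outcome means the segment is assigned to the reference model at minimum distance; an "unusable" outcome leaves the segment unlabeled.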
43
Preliminary Results

Experiments
245 telephone conversations from the SWITCHBOARD
database, with an average length of 400 seconds.
T-Square statistics used
Ground truth obtained from Mississippi State
Transcriptions

44
Preliminary Results

Best N Estimation

N = 5
45
Preliminary Results

RRMD Experiments
Drel varied from 0 to 200
Two errors defined:
Indexing Error
$I_{err} = 100 - \text{Accuracy}$
Undecided Error
$N_u$ = number of detected undecided/unusable samples,
$N_c$ = number labeled as co-channel data

46
Preliminary Results
47
Speaker Count System

The Residual Ratio Algorithm (RRA)
Process is repeated K-1 times for counting up to
K speakers

[Flowchart: DLR-based model comparison, repeated in each round; a model with too little data is removed and another model is selected.]
48
RRA Examples: 2 Speakers

49
RRA Examples: 3 Speakers
50
Comparison
[Figure: two-speaker residual vs. three-speaker residual; Residual Ratio after 2nd round of RRA]
51
Preliminary Results

Experiments
HTIMIT Database
1000 artificially generated K-speaker
conversations (each) for K = 1 to 4
Average conversation length: 1 min
Mahalanobis distance used

52
Preliminary Results

Counting Techniques
Stopped Residual Ratio (SRR)
Added Residual Ratio (ARR)
Speaker count is determined based on the sum of the
Residual Ratios for all K-1 rounds. The higher
the ARR, the higher the speaker count.

53
Preliminary Results
54
Preliminary Results
55

Fusion of Distances

56
Correlation Analysis
57
Correlation Analysis
58
Best Distance

Optimized Fusion of Distances
Minimize inter-speaker variation
Maximize intra-speaker variation
Maximize T-test value between inter-class
distance distributions

Tmax: maximized T-test value
New Distance: weighted combination of the distance measures
X: vector consisting of the distance measure values
a: vector of the weights assigned to each distance measure
59
Best Distance

Distance Measure 2
Distance Measure 1
60
Preliminary Experiments

LPCCs
61
Preliminary Experiments

LPCCs
62
Preliminary Experiments

MFCCs
63
Preliminary Experiments

MFCCs
64

Proposal Summary

65
Research Goal Revisited

To overcome the following challenges faced in
differentiating between speakers participating in
conversations:
No a priori information
Limited data size
No knowledge of change points
Co-channel speech

66
Summary of Work Accomplished

Practical demonstration of the existence of the
problem.
Analysis of distance measures and features
Development of a novel model formation technique
Development, implementation and evaluation of two
conversation-based speaker differentiation
systems
Introduction to and preliminary testing of an
optimal distance formation

67
Proposed Work

Feature Combinations
Determination of the best combination of features
using univariate tests of similarity
Enhancement of feature combinations using
Principal Component Analysis.
Fusion of Distance Measures
Enhancement of fusion technique using mutual
information suppression techniques
Decision-level distance measure fusion

68
Proposed Work

Further development of introduced systems
Use of all distance measures
Use of best feature combination
The use of the optimal distance
Implementation of decision-level fusion technique

69
Final Goal

A speaker recognition system for conversations
that yields results comparable to those of
non-conversational systems.

70
Publications

U. Ofoegbu, A. Iyer, R. Yantorno, "Detection of a Third Speaker in Telephone Conversations," ICSLP, INTERSPEECH 2006.
U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, "Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances," IEEE Aerospace Conference, March 2007.
U. Ofoegbu, A. Iyer, R. Yantorno, "A Simple Approach to Unsupervised Speaker Indexing," IEEE ISPACS, 2006.
U. Ofoegbu, A. Iyer, R. Yantorno, "A Speaker Count System for Telephone Conversations," IEEE ISPACS, 2006.
A. Iyer, U. Ofoegbu, R. Yantorno, "Speaker Discriminative Distances: Comprehensive Study," IEEE Transactions on Speech and Audio Processing (submitted).

71
72
Cepstral Features

Linear Predictive Cepstral Coefficients
Obtained Recursively from LPC Coefficients

Let LPC vector = $[a_0\ a_1\ a_2\ \ldots\ a_p]$ and LPCC
vector = $[c_0\ c_1\ c_2\ \ldots\ c_p\ \ldots\ c_{n-1}]$
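The recursion itself did not survive in this transcript. The sketch below uses the standard textbook LPC-to-cepstrum recursion and assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k); other sign conventions for the predictor coefficients flip the sign of each a_k. The function takes a_1..a_p only (a_0 is omitted) and returns c_1..c_n; the gain-dependent term c_0 is not computed. The name `lpc_to_lpcc` is hypothetical.

```python
def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_n from predictor coefficients a_1..a_p.

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)          # c[0] is a placeholder; the gain term is not modeled
    for m in range(1, n_ceps + 1):
        # Direct term a_m exists only while m <= p.
        acc = a[m - 1] if m <= p else 0.0
        # Recursive term: sum over k of (k/m) * c_k * a_(m-k), with 1 <= m-k <= p.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


# Single-pole example: the exact cepstrum is c_m = 0.5**m / m.
lpcc = lpc_to_lpcc([0.5], 3)
```

The single-pole case is a convenient check because its cepstrum has a closed form, and it shows that more cepstral coefficients than LPC coefficients can be generated (n greater than p).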
73
Conversational Data Modeling

Current Method
Equal Segmentation of Data
Indiscriminate use of data
Problems
Change points unknown
Not all speech is useful

74
Best Distance

Intra-speaker and inter-speaker distance lengths
are always equal, therefore
$P$ = sum of the covariance matrices of the two classes
$\lambda_1$ = maximum eigenvalue obtained by solving the generalized eigenvalue problem
$Q$ = square of the distance between the mean vectors of the two classes
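For two classes, a generalized eigenvalue problem of this kind has a well-known closed-form solution (Fisher's linear discriminant), in which the weight vector is proportional to P^-1 times the difference of the class means. The sketch below applies that solution to two distance measures. Treating the slide's criterion as Fisher's is an assumption, and the function names and sample values are made up.

```python
def mean_and_cov(points):
    """Mean vector and 2x2 sample covariance of a list of (d1, d2) pairs."""
    n = len(points)
    m = [sum(pt[i] for pt in points) / n for i in range(2)]
    cov = [[sum((pt[i] - m[i]) * (pt[j] - m[j]) for pt in points) / (n - 1)
            for j in range(2)] for i in range(2)]
    return m, cov


def fisher_weights(intra, inter):
    """Weights a = P^-1 (m_inter - m_intra), with P the sum of class covariances."""
    m_intra, c_intra = mean_and_cov(intra)
    m_inter, c_inter = mean_and_cov(inter)
    p = [[c_intra[i][j] + c_inter[i][j] for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    diff = [m_inter[0] - m_intra[0], m_inter[1] - m_intra[1]]
    # Closed-form 2x2 inverse applied to the mean difference.
    return [(p[1][1] * diff[0] - p[0][1] * diff[1]) / det,
            (p[0][0] * diff[1] - p[1][0] * diff[0]) / det]


def fuse(weights, distances):
    """New distance: weighted sum of the individual distance values."""
    return sum(w * d for w, d in zip(weights, distances))


# Made-up (distance measure 1, distance measure 2) pairs for each class.
intra = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
inter = [(3.0, 5.0), (4.0, 5.0), (3.0, 6.0), (4.0, 6.0)]
weights = fisher_weights(intra, inter)
```

In this toy data the second measure separates the classes twice as strongly as the first, so it receives twice the weight; with more than two distance measures a general linear solver would replace the 2x2 inverse.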
