1
Model Formation and Classification Techniques For
Conversation-based Speaker Discrimination
Advisor: Robert Yantorno, Ph.D.
Committee Members: Brian Butz, Ph.D., Dennis Silage, Ph.D., Iyad Obeid, Ph.D.
Uchechukwu O. Ofoegbu
2
Acknowledgement
My committee members, for your time and
commitment to my research
The Air Force Research Labs, for financially
supporting most of this research work
My family, for being there
Dr. Y, the best advisor one could hope for
Members and Friends of the Speech Lab, for your
valuable contributions
ECE faculty and staff, for your great support
The audience, for being a part of this
3
Presentation Outline
  • Introduction
      • Challenges of Conversational Data
      • General Applications of Research
      • Novelty of Research
  • Evaluation Databases
      • HTIMIT
      • SWITCHBOARD
      • New Conversations Database
  • Modeling Speakers
      • Traditional Speaker Modeling
      • Proposed Method
      • Features Used
      • Distances Used
  • Application Systems
      • Unsupervised Speaker Indexing
      • Speaker Count
      • Generalized Speaker Indexing
  • Fusion of Distance Measures
      • Optimized T Distance
      • Decision-Based Combination
      • Weighted Decision-Based Combination
  • Summary
  • Further Research

4
Introduction Evaluation Databases Modeling
Speakers Application Systems Fusion of Distance
Measures Summary Further Research
  • Introduction

5
Challenges of Conversational Data
  • No a priori information available from
    participating speakers
  • Training is impossible
  • No a priori knowledge of change points
  • Speakers alternate very rapidly
  • Limited amounts of data for single speaker
    representations
  • Distortion
  • Channel noise, co-channel data

6
Proposed Solutions
  • Selective creation of data models
  • Distance-Based Model Comparison
  • Development of application-specific systems

7
Novelty of this Research
  • Selective creation of data models
  • Distance-Based Model Comparison
  • Development of application-specific systems

8
Applications
  • Monitoring criminal conversations
  • Forensics
  • Automated Customer Services
  • Storage/Search/Retrieval of Audio Data
  • Military Activities
  • Conference calls

9
Databases
  • Standard Speaker Discrimination Databases
  • HTIMIT
  • Switchboard
  • Temple Conversations Database (TCD)

10
  • Modeling Speakers

11
Traditional Speaker Modeling
  • Examples
  • Gaussian Mixture Models
  • Hidden Markov Models
  • Neural Networks
  • Prosody-Based Models
  • Disadvantages
  • Require large amounts of data
  • Sometimes require a training procedure
  • Relatively complex

12
Conversational Data Modeling
  • Current Method
  • Equal segmentation of data
  • Indiscriminate use of data
  • Problems
  • Change points unknown
  • Not all speech is useful
  • Poor performance

13
Proposed Speaker Modeling
[Figure: SEGMENT 1 ... SEGMENT M → FEATURE COMPUTATION → MODEL 1 ... MODEL M]
14
Proposed Speaker Modeling
  • Why voiced only?
  • Same speech class compared
  • Contains the most information
  • What's the appropriate number of phonemes?
  • Large enough to sufficiently represent speakers
  • Small enough to avoid speaker overlap
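A minimal sketch of selective model formation, assuming an upstream detector has already split the conversation into voiced, phoneme-sized segments and discarded unvoiced speech. Each model pools N consecutive voiced segments and is summarized by the mean vector and covariance matrix of its cepstral frames. The function name `form_models`, the Gaussian summary and the dropping of leftover segments are illustrative assumptions, not the exact procedure in the presentation.

```python
import numpy as np


def form_models(voiced_segments, n_phonemes):
    """Group consecutive voiced segments into models of n_phonemes segments each.

    voiced_segments: list of (frames x coefficients) arrays, one per voiced,
    phoneme-sized segment (assumed output of an upstream voicing detector).
    Segments left over after the last full group are dropped.
    Returns a list of (mean, covariance, n_frames) tuples, one per model.
    """
    models = []
    for start in range(0, len(voiced_segments) - n_phonemes + 1, n_phonemes):
        frames = np.vstack(voiced_segments[start:start + n_phonemes])  # pool frames
        mean = frames.mean(axis=0)
        cov = np.cov(frames, rowvar=False)  # covariance across cepstral coefficients
        models.append((mean, cov, frames.shape[0]))
    return models


# Example: 23 hypothetical voiced segments of 20 frames x 12 coefficients each
rng = np.random.default_rng(0)
segments = [rng.normal(size=(20, 12)) for _ in range(23)]
models = form_models(segments, n_phonemes=5)
print(len(models))  # 4 models; the last 3 segments are dropped
```

Raising `n_phonemes` gives better-estimated covariances, while lowering it reduces the chance that one model straddles a speaker change, which is the trade-off the slide describes.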

15
Features Considered
  • Linear Predictive Cepstral Coefficients
  • Model the vocal tract
  • Mel-Scale Frequency Cepstral Coefficients
  • Model the human auditory system

16
Distance Measurements
[Figure: different-speaker distances vs. same-speaker distances]
17
Distances Used
  • Mahalanobis Distance
  • Hotelling's T-Square Statistic
  • Kullback-Leibler Distance
  • Bhattacharyya Distance
  • Levene's Test
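The sketch below computes four of the listed distances between two models, each summarized by a mean vector, a covariance matrix and a frame count. It assumes Gaussian models and uses the textbook form of each measure (pooled covariance for Mahalanobis and Hotelling's T-square, symmetric Kullback-Leibler divergence); the presentation does not show which exact variants were used. Levene's test is left out here because it works on raw samples rather than on (mean, covariance) summaries.

```python
import numpy as np


def _pooled_cov(c1, n1, c2, n2):
    # Unbiased pooled covariance of two samples
    return ((n1 - 1) * c1 + (n2 - 1) * c2) / (n1 + n2 - 2)


def mahalanobis(m1, c1, n1, m2, c2, n2):
    """Squared Mahalanobis distance between model means, using the pooled covariance."""
    d = m1 - m2
    return float(d @ np.linalg.solve(_pooled_cov(c1, n1, c2, n2), d))


def hotelling_t2(m1, c1, n1, m2, c2, n2):
    """Two-sample Hotelling T^2: the Mahalanobis distance scaled by the data lengths."""
    return (n1 * n2 / (n1 + n2)) * mahalanobis(m1, c1, n1, m2, c2, n2)


def symmetric_kl(m1, c1, m2, c2):
    """KL(1||2) + KL(2||1) for two multivariate Gaussians."""
    k = len(m1)
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    d = m1 - m2
    return 0.5 * float(np.trace(i2 @ c1) + np.trace(i1 @ c2) + d @ (i1 + i2) @ d - 2 * k)


def bhattacharyya(m1, c1, m2, c2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    c = (c1 + c2) / 2
    d = m1 - m2
    _, logdet = np.linalg.slogdet(c)
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    return float(d @ np.linalg.solve(c, d) / 8 + 0.5 * (logdet - 0.5 * (logdet1 + logdet2)))


# Example: unit-covariance models whose means differ by 1 along one axis
m1, m2, c = np.zeros(2), np.array([1.0, 0.0]), np.eye(2)
print(mahalanobis(m1, c, 50, m2, c, 50))   # 1.0
print(hotelling_t2(m1, c, 50, m2, c, 50))  # 25.0
print(symmetric_kl(m1, c, m2, c))          # 1.0
print(bhattacharyya(m1, c, m2, c))         # 0.125
```

The example shows the point made in the extra slides: Hotelling's T-square is the Mahalanobis separation weighted by the data lengths, so the same mean shift scores higher when the models are built from more frames.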

18
Analysis of Cepstral Features
  • Mahalanobis Distance

19
Best Number of Phonemes?
[Figure: performance vs. number of phonemes; features used: LPCC]
20
  • Application Systems

21
Unsupervised Speaker Indexing
  • The Restrained-Relative Minimum Distance (RRMD)
    Approach

Reference models (pairwise distance matrix):
$$\begin{bmatrix} 0 & D_{1,2} & D_{1,3} \\ D_{2,1} & 0 & D_{2,3} \\ D_{3,1} & D_{3,2} & 0 \end{bmatrix}$$
22
Unsupervised Speaker Indexing
  • The Restrained-Relative Minimum Distance (RRMD)
    Approach

[Figure: RRMD flowchart. The distances of the test model to Reference 1 and Reference 2 are observed and the minimum distance is taken. Restraining Condition: passed → same speaker; failed → Relative Distance Condition. Relative Distance Condition: passed → same speaker; failed → unusable data.]
23
RRMD Approach
  • Restraining Condition
  • Distance Likelihood Ratio (DLR)
  • DLR > 1 → Same Speaker
  • DLR < 1 → Check the Relative Distance Condition
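The Restraining Condition can be sketched as follows, assuming the same-speaker and different-speaker distance distributions (estimated from HTIMIT in the experiments) are each modeled by a single Gaussian, so that DLR = p(d | same) / p(d | different). The slide's own DLR formula did not survive extraction, so this form and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_gaussian(distances):
    """(mean, standard deviation) of a set of training distances."""
    return mean(distances), stdev(distances)


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def distance_likelihood_ratio(d, same_params, diff_params):
    """p(d | same speaker) / p(d | different speakers), assuming Gaussian fits.
    For distances far from both means, work in the log domain to avoid underflow."""
    return gaussian_pdf(d, *same_params) / gaussian_pdf(d, *diff_params)


def restraining_condition(d, same_params, diff_params):
    """True: declare same speaker. False: fall through to the Relative Distance Condition."""
    return distance_likelihood_ratio(d, same_params, diff_params) > 1.0


# Example with toy training distances (hypothetical values)
same = fit_gaussian([1.0, 2.0, 3.0])   # (2.0, 1.0)
diff = fit_gaussian([5.0, 6.0, 7.0])   # (6.0, 1.0)
print(restraining_condition(2.5, same, diff))  # True
print(restraining_condition(5.0, same, diff))  # False
```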

24
RRMD Approach
  • Relative Distance Condition
  • Relative Distance
  • D_rel, computed from d_max and d_min (the maximum
    and minimum distances)
  • D_rel > threshold → Same Speaker
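The full RRMD decision for one test model can be sketched as below. The Restraining Condition is checked first; only if it fails is the Relative Distance Condition applied. Because the exact relative-distance formula was lost in extraction, the code ASSUMES D_rel = d_max − d_min; a ratio d_max / d_min would fit the same flow. The threshold value and function name are hypothetical.

```python
def rrmd_assign(distances, dlr_of_min, rel_threshold):
    """Assign a test model to a reference speaker following the RRMD flow.

    distances: distances from the test model to each reference model.
    dlr_of_min: distance likelihood ratio evaluated at the minimum distance.
    rel_threshold: threshold for the Relative Distance Condition (hypothetical value).
    Returns the index of the closest reference, or None for unusable data.
    """
    d_min, d_max = min(distances), max(distances)
    closest = distances.index(d_min)
    # Restraining Condition: the likelihood ratio favours "same speaker".
    if dlr_of_min > 1.0:
        return closest
    # Relative Distance Condition. ASSUMPTION: D_rel = d_max - d_min,
    # since the slide's exact formula was lost in extraction.
    d_rel = d_max - d_min
    if d_rel > rel_threshold:
        return closest
    return None  # neither condition passed: treat as unusable data


print(rrmd_assign([3.0, 1.0], dlr_of_min=1.5, rel_threshold=1.0))  # 1
print(rrmd_assign([3.0, 1.0], dlr_of_min=0.5, rel_threshold=1.0))  # 1
print(rrmd_assign([3.0, 2.8], dlr_of_min=0.5, rel_threshold=1.0))  # None
```

The third call shows the point of the restraint: when the two reference distances are nearly equal and the likelihood ratio is weak, the data is set aside instead of being forced onto a speaker.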
25
Experiments and Results
  • Experiments
  • HTIMIT used for obtaining likelihood ratio
    parameters
  • 1000 same-speaker and 1000 different-speaker
    utterance distances computed
  • 100 conversations from the Switchboard database
    used for evaluation

26
Indexing Results - Mahalanobis
MFCC
LPCC
27
Indexing Results - T-Square
MFCC
LPCC
28
Indexing Results - Bhattacharyya
MFCC
LPCC
29
Indexing Results - Summary

  • Mahalanobis distance yielded best results
  • LPCCs outperformed MFCCs

30
Speaker Count System
  • The Residual Ratio Algorithm (RRA)
  • Process is repeated K-1 times for counting up to
    K speakers

[Figure: RRA block diagram. Successive DLR-based model comparison stages; if too little data is removed, another model is selected.]
31
Speaker Count
  • Added Residual Ratio
  • Is the sum of the residual ratios in all
    elimination stages
  • Should be higher for a greater number of speakers
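The transcript does not define the residual ratio itself, so the sketch below uses a HYPOTHETICAL definition: the fraction of models still remaining after each elimination stage. Each stage picks a reference model and removes it together with every model judged to be the same speaker; if too little data would be removed, the next candidate reference is tried. The added residual ratio is then the sum over the (up to K−1) stages. The toy `same_speaker` oracle stands in for the DLR-based model comparison.

```python
def residual_ratios(n_models, same_speaker, max_speakers, min_removed=2):
    """Hypothetical residual-ratio elimination for counting up to max_speakers.

    n_models: number of models formed from the conversation (indexed 0..n-1).
    same_speaker(i, j): model-comparison decision, e.g. DLR > 1 (assumed).
    Runs at most max_speakers - 1 stages. In each stage a reference model and
    every model matching it are removed; if fewer than min_removed models would
    go, the next candidate reference is tried instead.
    Returns the per-stage residual ratios (models left / models before the stage)
    and their sum, the added residual ratio.
    """
    remaining = list(range(n_models))
    ratios = []
    for _ in range(max_speakers - 1):
        for ref in remaining:
            matched = {i for i in remaining if i == ref or same_speaker(ref, i)}
            if len(matched) >= min_removed:
                before = len(remaining)
                remaining = [i for i in remaining if i not in matched]
                ratios.append(len(remaining) / before)
                break
        else:
            break  # no reference removed enough data: stop early
    return ratios, sum(ratios)


# Toy oracle: each model's (hypothetical) true speaker label
two = [0, 0, 0, 1, 1, 1]
three = [0, 0, 1, 1, 2, 2]
print(residual_ratios(6, lambda i, j: two[i] == two[j], max_speakers=3))
print(residual_ratios(6, lambda i, j: three[i] == three[j], max_speakers=3))
# The three-speaker conversation leaves more data behind at each stage,
# so its added residual ratio is larger.
```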

32
Experiments and Results
  • Experiments
  • 4000 conversations generated from HTIMIT
  • All 40 conversations from the new database (TCD) used

33
Speaker Count Results - HTIMIT
MFCC
LPCC
34
Speaker Count Results - HTIMIT
MFCC
LPCC
35
Speaker Count Results - TCD
MFCC
LPCC
36
Speaker Count Results - TCD
MFCC
LPCC
37
Cross Evaluation
[Figure panels: HTIMIT, LPCCs with the WDBC (Weighted Decision-Based Combination); TCD, MFCCs with the T-Square distance]
38
Speaker Counting-Indexing
  • The Residual Ratio speaker count algorithm is
    applied
  • Test models are associated with their matching
    reference models
  • Unmatched models are assigned to the reference
    from which they have the minimum distance.
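The counting-indexing step can be sketched as: keep every match found by the speaker-count stage, and send each remaining test model to its nearest reference. The input format (a distance matrix plus a dictionary of matches) and the name `index_models` are assumptions for illustration.

```python
import numpy as np


def index_models(dist, matches):
    """Assign every test model to a reference speaker.

    dist: (n_test x n_ref) array of distances between test and reference models.
    matches: {test_index: ref_index} pairs already matched by the speaker-count
    stage (assumed input format).
    Unmatched test models go to the reference with the minimum distance.
    """
    labels = np.argmin(dist, axis=1)  # default: nearest reference
    for test_idx, ref_idx in matches.items():
        labels[test_idx] = ref_idx    # keep matches found during counting
    return labels.tolist()


dist = np.array([[0.2, 0.9],
                 [0.8, 0.1],
                 [0.5, 0.4]])
print(index_models(dist, {2: 0}))  # [0, 1, 0]
```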

39
Speaker Counting/Indexing Results
Solid: HTIMIT; patterned: TCD
40
  • Fusion of Distance Measures

41
Correlation Analysis
Draftsman's Display - LPCC
42
Best Distance
  • Optimal Criteria for Fusion of Distances
  • Maximize inter-speaker variation
  • Minimize intra-speaker variation
  • Maximize the T-test value between the same-speaker
    and different-speaker distance distributions
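One way to score a distance measure against these criteria is a two-sample t statistic between its different-speaker and same-speaker distances: a large value means the two distributions are far apart relative to their spread. The sketch uses Welch's unequal-variance form, which is an assumption (the slide only says "T-test value"), and hypothetical distance values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of samples a and b."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_distance_measures(measures):
    """measures: {name: (same_speaker_distances, different_speaker_distances)}.

    Scores each measure by how far its different-speaker distances sit above its
    same-speaker distances, and returns the names best-first with their scores.
    """
    scores = {name: welch_t(diff, same) for name, (same, diff) in measures.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical distances for two measures
order, scores = rank_distance_measures({
    "A": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "B": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
})
print(order)  # ['A', 'B']
```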

43
Decision Level Fusion
D1 → match
D2 → no match
D3 → match
D4 → match
Match: 3/4, No match: 1/4 → Final decision: Match
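The decision-level fusion above is a majority vote over per-distance decisions. A minimal sketch, assuming each distance measure is turned into a match/no-match decision by its own threshold (hypothetical values) and that ties count as "no match":

```python
def per_distance_decisions(distances, thresholds):
    """One match/no-match decision per distance measure (True = match).
    Thresholds are hypothetical; a small distance indicates the same speaker."""
    return [d < t for d, t in zip(distances, thresholds)]


def majority_vote(decisions):
    """Final match decision by simple majority; ties count as 'no match' (assumed)."""
    n_match = sum(decisions)
    return n_match > len(decisions) - n_match


# The slide's example: D1, D3, D4 vote "match", D2 votes "no match"
print(majority_vote([True, False, True, True]))  # True (3/4 match)
```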
44
Weighted Decision Level Fusion
T_i: the T-value corresponding to distance measure i
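The weighting formula itself did not survive extraction. As a hedged sketch, each distance measure's vote below is weighted by its T-value normalized to sum to one, and "match" is declared when the weighted match score exceeds 0.5; both the normalization and the cut-off are assumptions.

```python
def weighted_vote(decisions, t_values, cutoff=0.5):
    """Weighted decision fusion (True = match).

    Each distance measure's vote is weighted by its T-value, normalized so the
    weights sum to 1. The normalization and the 0.5 cut-off are assumptions.
    Returns (final decision, weighted match score).
    """
    total = sum(t_values)
    score = sum(t for vote, t in zip(decisions, t_values) if vote) / total
    return score > cutoff, score


# One highly separable measure outvotes three weak ones
print(weighted_vote([True, False, False, False], [9.0, 1.0, 1.0, 1.0]))  # (True, 0.75)
```

With equal T-values this reduces to the plain majority vote; the weights only matter when some distances separate speakers much better than others.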
45
  • Summary

46
Research Goal
  • To differentiate between speakers in a
    conversation
  • To determine the number of speakers present
  • To determine who is speaking when
  • To overcome the following challenges
  • No a priori information
  • Limited data size
  • No knowledge of change points
  • Co-channel speech

47
Summary of Accomplishments
  • Novel model formation technique
  • Three novel approaches for conversation-based
    speaker differentiation
  • Distance combination techniques to enhance
    performance

48
Observations

  • Mahalanobis distance and LPCCs optimal for the
    standard databases
  • T-Square distance and MFCCs optimal for the new
    database
  • Best fusion technique: weighted voting
    combination (most efficient)

49
Conclusion
  • The developed system yields about 6% EER, whereas
    state-of-the-art speaker indexing systems yield
    about a 10% error rate.
  • Methods for discrimination between speakers
    (speaker count or indexing) in CONVERSATIONS with
    more than two speakers have been introduced.

50
  • Further Research

51
Further Research
  • Investigation of prosodic speaker discrimination
    features
  • Improving model formation technique by
    determining speaker change-points a priori
  • Exploring the use of individual phonemes to form
    models

52
Further Research, cont'd
  • Investigating the use of unvoiced speech,
    cautiously, in the formation of models
  • Speech enhancement techniques to handle distorted
    data
  • Implementation of other fusion techniques such as
    KL measure of divergence

53
Publications
  • U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt,
    "Unsupervised Indexing of Noisy Conversations
    with Short Speaker Utterances," IEEE Aerospace
    Conference, March 2007.
  • U. Ofoegbu, A. Iyer and R. Yantorno, "Detection of
    a Third Speaker in Telephone Conversations,"
    ICSLP, INTERSPEECH 2006.
  • U. Ofoegbu, A. Iyer and R. Yantorno, "A Simple
    Approach to Unsupervised Speaker Indexing," IEEE
    ISPACS, 2006.
  • U. Ofoegbu, A. Iyer and R. Yantorno, "A Speaker
    Count System for Telephone Conversations," IEEE
    ISPACS, 2006.

54
ACKNOWLEDGEMENT
To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.
To my best friend and the love of my life, Dr. Jude C. Abanulo.
To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula.
To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.
To my friend, Ananth Iyer.
To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini, and to the members of the Speech Processing Lab and the faculty of the electrical engineering department.
To engineering administrators Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, daytime janitress for the engineering building.
To the Temple students who volunteered as participants in the New Conversations Database.
To Temple.
To the Air Force Research Labs at Rome, financial supporters of most of the research.
To my parents, Ugo Joseph Ofoegbu; my siblings, Amaka Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji.
To God.
Thank you.
55
Advisor: Robert Yantorno, Ph.D.
Committee Members: Brian Butz, Ph.D., Dennis Silage, Ph.D., Iyad Obeid, Ph.D., Brett Smolenski, Ph.D.
Extra Slides
56
Cepstral Analysis
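Frequency Analysis of Speech
[Figure: STFT of speech, showing slowly varying formants (vocal tract component) and fast varying harmonics (excitation component)]
The speech spectrum is the product of the excitation and vocal tract components; taking the log turns the product into a sum, and the IDFT of the log spectrum separates the two:
$$S(\omega) = E(\omega) \times H(\omega)$$
$$\log|S(\omega)| = \log|E(\omega)| + \log|H(\omega)|$$
$$c[n] = \mathrm{IDFT}\{\log|S(\omega)|\}$$
[Figure: IDFT of the log of the STFT, with the vocal tract component at low quefrency and the excitation component at high quefrency]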
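A minimal NumPy sketch of this cepstral separation: the real cepstrum is the IDFT of the log magnitude spectrum, and a low-time lifter keeps only the low-quefrency (vocal tract) part. The Hamming window, FFT size and lifter length are illustrative choices, not values from the presentation.

```python
import numpy as np


def real_cepstrum(frame, n_fft=512):
    """Real cepstrum of one speech frame: IDFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # tiny floor avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)


def vocal_tract_envelope(cepstrum, n_low):
    """Low-time lifter: keep quefrencies below n_low (the slowly varying,
    vocal-tract part) and return the smoothed log magnitude spectrum."""
    n = len(cepstrum)
    lifted = np.zeros_like(cepstrum)
    lifted[:n_low] = cepstrum[:n_low]
    if n_low > 1:
        lifted[n - n_low + 1:] = cepstrum[n - n_low + 1:]  # symmetric half
    return np.fft.rfft(lifted).real


# Example: a single impulse has a flat spectrum, so its cepstrum is
# concentrated at quefrency 0 and its envelope is flat
frame = np.zeros(400)
frame[0] = 1.0
cep = real_cepstrum(frame)
env = vocal_tract_envelope(cep, n_low=20)
```

For a voiced frame, the excitation shows up as a peak near the pitch period in quefrency, which the lifter discards.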


57
Cepstral Features
  • Linear Predictive Cepstral Coefficients
  • Obtained Recursively from LPC Coefficients

Let the LPC vector be $[a_0, a_1, a_2, \ldots, a_p]$ and the LPCC
vector be $[c_0, c_1, c_2, \ldots, c_{n-1}]$.
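The recursion formulas on this slide did not survive extraction. The sketch below uses the standard LPC-to-cepstrum recursion for an all-pole model $G / A(z)$ with $A(z) = 1 - \sum_k a_k z^{-k}$: $c_0 = \ln G$, $c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m} c_k a_{m-k}$ for $1 \le m \le p$, and $c_m = \sum_{k=m-p}^{m-1} \frac{k}{m} c_k a_{m-k}$ for $m > p$. The sign convention is an assumption; flip the signs of the coefficients if an LPC routine returns the coefficients of $A(z)$ itself.

```python
import math


def lpc_to_lpcc(a, gain, n_ceps):
    """Convert LPC coefficients to cepstral coefficients (LPCC) recursively.

    a: [a_1, ..., a_p], predictor coefficients for A(z) = 1 - sum_k a_k z^-k
       (assumed sign convention).
    gain: model gain G, giving c_0 = ln(G).
    n_ceps: number of coefficients to return, c_0 ... c_{n_ceps-1}.
    """
    p = len(a)
    c = [0.0] * n_ceps
    c[0] = math.log(gain)
    for m in range(1, n_ceps):
        acc = a[m - 1] if m <= p else 0.0           # a_m term only exists for m <= p
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]    # a[m-k-1] holds a_{m-k}
        c[m] = acc
    return c


# A single pole at 0.5 has the closed-form cepstrum c_m = 0.5**m / m
print(lpc_to_lpcc([0.5], gain=1.0, n_ceps=4))
```

The single-pole case is a convenient check because $-\ln(1 - a z^{-1}) = \sum_m \frac{a^m}{m} z^{-m}$.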
58
Conversational Data Modeling
  • Current Method
  • Equal Segmentation of Data
  • Indiscriminate use of data
  • Problems
  • Change points unknown
  • Not all speech is useful

59
Best Distance

  • Intra-speaker and inter-speaker distance lengths
    are always equal; therefore:
  • P: sum of the covariance matrices of the
    two classes
  • λ1: maximum eigenvalue obtained by solving
    the generalized eigenvalue problem
  • Q: square of the distance between the mean
    vectors of the two classes
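The slide's formula did not survive extraction. As one standard criterion consistent with the definitions of P, λ1 and Q (named here as a substitute, not as the presentation's formula), Fisher's linear discriminant treats each comparison as a vector of distance measures and finds the fusion weights that best separate same-speaker from different-speaker comparisons. The largest generalized eigenvalue of $S_B w = \lambda P w$, with $S_B = \delta\delta^T$, equals $\delta^T P^{-1} \delta$. The data below are hypothetical.

```python
import numpy as np


def fisher_separation(same, diff):
    """Fisher-style separation of same- vs. different-speaker distance vectors.

    same, diff: (n x k) arrays; each row holds k distance measures for one
    same-speaker or different-speaker comparison (equal n assumed).
    Returns (lambda_1, w, Q) where lambda_1 is the largest eigenvalue of the
    generalized problem S_B w = lambda P w, with S_B = delta delta^T,
    P = cov(same) + cov(diff), delta = mean(diff) - mean(same); w is the
    corresponding fusion weight vector and Q = ||delta||^2.
    """
    delta = diff.mean(axis=0) - same.mean(axis=0)
    P = np.cov(same, rowvar=False) + np.cov(diff, rowvar=False)
    S_B = np.outer(delta, delta)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(P, S_B))  # P^-1 S_B
    i = int(np.argmax(eigvals.real))
    return float(eigvals.real[i]), eigvecs[:, i].real, float(delta @ delta)


rng = np.random.default_rng(1)
same = rng.normal(0.0, 1.0, size=(200, 3))  # hypothetical distance vectors
diff = rng.normal(2.0, 1.0, size=(200, 3))
lambda_1, w, Q = fisher_separation(same, diff)
```

Because $S_B$ has rank one, the weight vector is proportional to $P^{-1}\delta$, so the fused distance is simply a weighted sum of the individual distance measures.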

60
Best Distance


[Figure axes: Distance Measure 1, Distance Measure 2]
61
RRMD Approach
  • Relative Distance Condition

62
Modeling Analysis
N = 20 (4 seconds of voiced speech)
63
Modeling Analysis
N = 5 (1 second of voiced speech)
64
Distance Measures
  • Mahalanobis Distance
  • Measures the separation between the means of both
    classes
  • Hotelling's T-Square Statistic
  • Measures the separation between the means of both
    classes and takes into consideration the data
    lengths
  • Kullback-Leibler Distance
  • Measures the separation between the distribution
    of both classes
  • Bhattacharyya Distance
  • Derived from measuring the classification error
    between both classes
  • Levene's Test
  • Measures absolute deviation from the center of
    the class distribution
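Levene's test works on raw one-dimensional samples: it replaces each value by its absolute deviation from its group's centre and then runs a one-way ANOVA on those deviations. The sketch below computes the statistic W with NumPy; applying it separately to each cepstral coefficient of two models is an assumption about how it would be used here, and `center="median"` gives the Brown-Forsythe variant.

```python
import numpy as np


def levene_w(*groups, center="mean"):
    """Levene's test statistic W for equal variances across 1-D groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    centre_fn = np.mean if center == "mean" else np.median
    # Absolute deviations from each group's centre
    z = [np.abs(np.asarray(g, dtype=float) - centre_fn(g)) for g in groups]
    z_means = [zi.mean() for zi in z]
    z_grand = np.concatenate(z).mean()
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum(((zi - m) ** 2).sum() for zi, m in zip(z, z_means))
    return float((n_total - k) / (k - 1) * between / within)


# The second group has twice the spread of the first
print(levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # about 2.057
```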

65
Speaker Recognition
  • Speaker Identification
  • Who is this speaker?
  • Speaker Verification
  • Is he who he claims to be?

System Output
66
Speaker Segmentation
  • Broadcast News/Conference Data
  • Conversational Data

67
Procedural Set-up
  • Intra-speaker distance computations
      • 384-speaker database used
      • Average utterance length: 5 seconds
  • Inter-speaker distance computations
68
Best N Estimation
  • 245 conversations from SWITCHBOARD used
  • Results shown for T-Square distance

Addressing the Challenges Applications Methods
Modeling Speakers Speaker Indexing Speaker
Count Speaker Count-Indexing Fusion of
Distances Evaluation Summary and Further
Research
N = 5
69
RRA Examples 2 Speakers

70
RRA Examples 3 Speakers
71
Comparison
[Figure panels: two-speaker residual and three-speaker residual, each showing the residual ratio after the 2nd round of RRA]
72
Effects of Fusion

LPCCs
73
Effects of Fusion

LPCCs
74
Effects of Fusion

MFCCs
75
Effects of Fusion

MFCCs
76
Best Feature Size
77
Best Feature Size
78
Correlation Analysis
Draftsman's Display - MFCC
79