SPEAKER RECOGNITION presentation

1
SPEAKER RECOGNITION

A PRESENTATION
BY
SHAMALEE DESHPANDE

2
INTRODUCTION

Speaker Recognition
Automatically recognizing who is speaking
Uses individual information from the speaker's speech waves

3
INTRODUCTION

Two Approaches
Text-Dependent Recognition
Text-Independent Recognition

4
INTRODUCTION

Two Approaches
Text-Dependent Recognition
Uses keywords or sentences that have the same text for both the templates and the recognition
Text-Independent Recognition

5
INTRODUCTION

Two Approaches
Text-Dependent Recognition
Text-Independent Recognition
Does not rely on a specific text being spoken.

6
INTRODUCTION

Classes of Sound
Voiced, Unvoiced, Plosive
Production of Pitch Frequency
and Formants

Glottal Waveform
7
BLOCK DIAGRAM OF A SPEAKER RECOGNITION
SYSTEM
8
DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM

Feature should occur naturally and frequently in speech
Be easily measurable
Not change over time or be affected by the speaker's health
Not be affected by background noise
Not be subject to mimicry

9
SOURCES OF VARIABILITY IN SPEECH

Phonetic Identity
Two samples may correspond to different phonetic segments, e.g. a vowel and a fricative
Pitch
Pitch and other features such as breathiness and amplitude can be varied
Speaker
Differences due to source physiology and emotions
Microphone
Environment

10
POSSIBLE ACOUSTIC PARAMETERS

Formant Frequencies
LPC
Pitch
Nasal Coarticulation
Gain

11
COMMON SPEAKER RECOGNITION TECHNIQUES

DISCRETE FOURIER TRANSFORM
LINEAR PREDICTIVE CODING
CEPSTRAL ANALYSIS
DYNAMIC TIME WARPING
HIDDEN MARKOV MODELS

12
DISCRETE / FAST FOURIER TRANSFORM

Changes time-domain signals into frequency-domain representations
The FFT computes the DFT with reduced complexity for the processor

Read L speech samples from the input
Append N-L zeros to the input data
Calculation of the N-point DFT
Windowing
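
The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's implementation: the function name `zero_padded_spectrum`, the Hamming window, and the 8 kHz test tone are all assumptions, and the window is applied before zero padding, which is the usual order in practice.

```python
import numpy as np

def zero_padded_spectrum(samples, n_fft):
    """Window a frame, zero-pad it to n_fft points, return the magnitude spectrum."""
    frame = np.asarray(samples, dtype=float)
    length = len(frame)
    if length > n_fft:
        raise ValueError("frame is longer than the DFT length")
    # Taper the frame edges (Hamming window is an assumed choice).
    windowed = frame * np.hamming(length)
    # Append n_fft - length zeros so the DFT has n_fft points.
    padded = np.concatenate([windowed, np.zeros(n_fft - length)])
    # rfft uses the FFT algorithm and keeps only the non-negative frequencies.
    return np.abs(np.fft.rfft(padded))

# Usage: a 1 kHz tone sampled at 8 kHz (hypothetical test signal).
fs = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(200) / fs)
spectrum = zero_padded_spectrum(tone, 256)
peak_hz = np.argmax(spectrum) * fs / 256
```

Zero padding does not add information; it only samples the same spectrum on a finer frequency grid.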
13
LINEAR PREDICTIVE CODING

LPC model of the speech-producing organs of the body
BUZZER: Glottal excitation
Characterized by intensity and pitch
TUBE: Vocal tract
Characterized by formants
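
One common way to estimate the "tube" (vocal tract) filter is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method; the slide does not say which one the presenter had in mind, and the function name `lpc` and its return convention are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns (a, err): a[0] == 1 and a[1:] are the coefficients of the
    prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order;
    err is the remaining prediction-error energy.
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The roots of A(z) give the resonances (formants) of the modeled tube, while the prediction residual approximates the buzzer signal.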
14
CEPSTRAL ANALYSIS

Disadvantage of DFT/FFT: formant frequencies may shift the pitch peak or overlap it
In cepstral analysis, the formant (vocal tract) contribution is separated from the pitch contribution, so it can be removed from the spectrum
Defined as the inverse Fourier transform (IDFT) of the log of the power spectrum
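$s(n) = p(n) * v(n)$ (pitch excitation convolved with the vocal tract response)
$x(n) = w(n)\, s(n)$ (windowed speech)
$S(\omega) = P(\omega)\, V(\omega)$ (Fourier transform)
$\log S(\omega) = \log P(\omega) + \log V(\omega)$
$C(q) = \mathcal{F}^{-1}\{\log S(\omega)\} = \mathcal{F}^{-1}\{\log P(\omega)\} + \mathcal{F}^{-1}\{\log V(\omega)\}$
$q$: quefrency, $C(q)$: complex cepstrum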

15
CEPSTRAL ANALYSIS
Speech -> Window -> DFT -> LOG -> IDFT -> Cepstrum
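
The block diagram maps almost directly onto NumPy calls. The following is a sketch under assumptions: it computes the real (power) cepstrum rather than the complex cepstrum, uses a Hamming window, and the pitch search range of 60 to 400 Hz is an illustrative choice. The function names are hypothetical.

```python
import numpy as np

def power_cepstrum(frame, n_fft=1024):
    """Speech -> Window -> DFT -> LOG -> IDFT, using the log power spectrum."""
    x = np.asarray(frame, dtype=float)
    windowed = x * np.hamming(len(x))
    spectrum = np.fft.fft(windowed, n_fft)
    # Small constant avoids log(0) in empty frequency bins.
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    return np.fft.ifft(log_power).real

def pitch_from_cepstrum(frame, fs, fmin=60.0, fmax=400.0, n_fft=1024):
    """Estimate pitch from the largest cepstral peak in the expected quefrency range."""
    c = power_cepstrum(frame, n_fft)
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    peak = q_lo + int(np.argmax(c[q_lo:q_hi]))
    return fs / peak
```

The vocal tract contribution sits at low quefrencies, while the pitch shows up as a peak at the quefrency equal to the pitch period, so the two can be separated by keeping or discarding part of the cepstrum.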
16
DYNAMIC TIME WARPING

Incoming speech is usually compared frame by frame with a stored template
Achieved via a pairwise comparison of the feature vectors from each sequence
Disadvantage of frame-by-frame comparison: corresponding phonemes vary in length
DTW takes into account the nonlinear relation between the lengths of the two signals
Used as a matching algorithm

Example DTW grid
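
A DTW grid like the one pictured can be filled with a short dynamic-programming loop. This is a minimal sketch assuming Euclidean distance between feature vectors and the basic step pattern (match, insertion, deletion); the function name `dtw_distance` is illustrative, and real systems often add slope constraints or path-length normalization.

```python
import numpy as np

def dtw_distance(template, test):
    """Accumulated cost of the best nonlinear alignment between two sequences."""
    a = np.asarray(template, dtype=float)
    b = np.asarray(test, dtype=float)
    # Treat 1-D input as a sequence of scalar features.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Pairwise distances between every template frame and every test frame.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            cost[i, j] = local[i - 1, j - 1] + best_prev
    return cost[n, m]

# Usage: the second sequence is the first one spoken "twice as slowly".
stretched = dtw_distance([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4])  # 0.0
```

A plain frame-by-frame comparison would penalize the stretched sequence, while DTW aligns it with the template at zero cost.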
17
HIDDEN MARKOV MODELS

The speech signal is identified implicitly during the search process rather than explicitly
Comprises
A hidden Markov chain representing temporal variability
An observable process representing spectral variability
Portrayed as a stochastic pair (X, Y)
An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s
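
To make the pair (X, Y) concrete, the sketch below scores an observation sequence against an HMM with the forward algorithm. It assumes a one-dimensional Gaussian density p(x|s) per state, which the slide does not specify; the function names and the parameters `start`, `trans`, `means`, and `variances` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density p(x|s) for every state at once (assumed Gaussian)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def forward_log_likelihood(obs, start, trans, means, variances):
    """Log-likelihood of obs under the HMM, via the scaled forward algorithm."""
    obs = np.asarray(obs, dtype=float)
    start = np.asarray(start, dtype=float)
    trans = np.asarray(trans, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    alpha = start * gaussian_pdf(obs[0], means, variances)
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the hidden chain, then weight by the observation density.
            alpha = (alpha @ trans) * gaussian_pdf(obs[t], means, variances)
        scale = alpha.sum()
        log_like += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_like
```

In a recognition setting, one such model would be trained per speaker (or per keyword), and the model with the highest log-likelihood for the incoming features would be selected.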

18
FUTURE RESEARCH

To extract and apply all levels of information from the speech signal that convey speaker identity
Acoustic: use spectral features conveying vocal tract information
Prosodic: use features derived from pitch and energy tracks to classify information
Phonetic: use phone sequences to characterize speaker-specific pronunciations
Idiolect: use words to characterize user-specific word patterns
Linguistic: use linguistic patterns to characterize speaker-specific conversation style

19
APPLICATIONS

Access Control: physical facilities, computer networks and websites
PC Login and Password Reset
Secured Transactions: remote banking and online credit card purchase authentication
Time and Attendance: workplaces
Law Enforcement: forensics, parole
