Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics


Authors: Anastasis Kounoudes, Anixi Antonakoudi, Vasilis Kekatos

Slides: 17
Transcript and Presenter's Notes

Title: Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Secure-Access System via Fixed and Mobile
Telephone Networks using Voice Biometrics
  • Authors: Anastasis Kounoudes, Anixi Antonakoudi,
    Vasilis Kekatos

  • We propose a double-digit (DD) voice biometric
    system for secure access in telephone services.
  • The system combines text-dependent speaker
    authentication with text validation.
  • Main System Characteristics
  • Feature Extraction based on Perceptual Linear
    Prediction (PLP) coefficients and Mel Frequency
    Cepstral Coefficients (MFCC)
  • Concatenated phoneme HMMs for both speech
    recognition and user authentication
  • Operates in a sound-prompted mode.
  • Speech recognition and speaker verification
    performance were evaluated against:
  • The length of the training data,
  • The number of embedded re-estimations and
    Gaussian mixtures in training of the HMMs,
  • The use of world models and bootstrapping,
  • User-dependent thresholds.
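The "concatenated phoneme HMMs" listed above can be pictured as small left-to-right models chained end to end. The following is a minimal sketch, not the authors' implementation: the function names, the self-loop probability of 0.6 and the two-phoneme example are all illustrative assumptions.

```python
import numpy as np

def phoneme_hmm(n_states=3, stay=0.6):
    """Left-to-right transition matrix for one phoneme model (hypothetical values).

    Returns the matrix and the probability of leaving the last state.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = stay                  # self-loop
        if i + 1 < n_states:
            A[i, i + 1] = 1.0 - stay    # move to the next state
    return A, 1.0 - stay

def concatenate(models):
    """Chain phoneme models: the exit of model k feeds the first state of model k+1."""
    total = sum(A.shape[0] for A, _ in models)
    big = np.zeros((total, total))
    offset = 0
    for k, (A, exit_p) in enumerate(models):
        n = A.shape[0]
        big[offset:offset + n, offset:offset + n] = A
        if k + 1 < len(models):
            big[offset + n - 1, offset + n] = exit_p
        offset += n
    return big

# Two 3-state phoneme models chained into one 6-state utterance model.
composite = concatenate([phoneme_hmm(), phoneme_hmm()])
```

In this sketch the last row of `composite` sums to less than one because the remaining mass is the probability of leaving the utterance model.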

A. Kounoudes
System Overview
  • User is voice-prompted for utterances to create
    speech samples.
  • A front-end feature extractor calculates the
    voice features.
  • Input speech is validated against the prompted
    text.
  • Successful validation leads to verification.
  • During the verification phase, the system
    verifies that the captured speech matches the
    models of the enrolled user.
  • - The accumulated log-likelihood of the input
    speech frames against the registered user's
    model is compared with a threshold to decide
    whether to accept or reject the speaker.
  • The system accepts or rejects the speaker.
  • The enrolment procedure is used by the system
    to create speaker-specific phoneme HMMs for
    each user.
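The accept/reject decision described above can be sketched in a few lines. This is an illustration under stated assumptions: the per-frame log-likelihoods and the threshold value are made up, and a real system would obtain the frame scores from the enrolled user's HMMs.

```python
import numpy as np

def verify(frame_log_likelihoods, threshold):
    """Accumulate per-frame log-likelihoods and compare the total with a threshold."""
    score = float(np.sum(frame_log_likelihoods))
    return score >= threshold, score

# Hypothetical per-frame log-likelihoods of the input speech under the user's model.
frames = np.array([-4.1, -3.8, -5.0, -4.4])
accepted, score = verify(frames, threshold=-18.0)
```

Raising the threshold makes the system stricter: the same utterance is rejected with `threshold=-17.0`.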

System Architecture
[Figure: system architecture diagram]

Data Collection
  • In-house database
  • Comprises data collected over a period of four
    months over the GSM and PSTN networks.
  • Contains speech samples from 23 speakers,
    categorized for enrolment and verification.
  • YOHO-PSTN Database
  • Replica of the YOHO corpus recorded over the PSTN
    network (using an analogue modem).
  • YOHO-GSM Database
  • Replica of the YOHO corpus recorded over the GSM
    network (using an analogue modem).
  • The YOHO databases were used for initial training
    of the HMMs.

Text Validation (Speech Recognition)
  • Aim: evaluate the performance of the text
    validation over the two telephone channels.
  • The text validation performance is evaluated
    against:
  • The number of embedded re-estimations used in
    training,
  • The utilisation of bootstrapping in training,
  • The number of Gaussian mixtures of the HMMs,
  • The incorporation of PLP and MFCC coefficients.

Embedded Re-estimations
  • Evaluation of the effect of the number of
    embedded re-estimations of the Baum-Welch
    algorithm on DD recognition performance.
  • Models used
  • 12 MFCC + Normalized Energy + Delta + Delta-Delta
  • Continuous density single Gaussian mono-phone
    HMMs (18)
  • 3 left-to-right states
  • Results
  • 4 embedded re-estimations suffice.
  • Performance converges asymptotically to its
    maximum for this specific 1 GM system.
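The feature set listed above appends delta and delta-delta terms to the static coefficients. A minimal sketch of one common way to compute them, the regression formula over a window of neighbouring frames, is shown here. The window size of 2, the edge padding and the random stand-in features are assumptions, not details given in the slides.

```python
import numpy as np

def deltas(feats, window=2):
    """Regression-based delta coefficients; rows are frames, columns are coefficients."""
    T = feats.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for th in range(1, window + 1):
        ahead = padded[window + th: window + th + T]
        behind = padded[window - th: window - th + T]
        out += th * (ahead - behind)
    return out / denom

# Stand-in static features: 12 cepstral coefficients plus one energy term per frame.
static = np.random.default_rng(0).normal(size=(50, 13))
d = deltas(static)
dd = deltas(d)
full = np.hstack([static, d, dd])   # 13 static + 13 delta + 13 delta-delta columns
```

With 13 static values per frame, stacking the two derivative sets gives 39 values per frame in this sketch.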

A. Antonakoudi
Gaussian Mixtures
  • Evaluation against number of Gaussian Mixtures
    (GM) per HMM state, while keeping the number of
    embedded re-estimations at 4.
  • Results
  • Recognition performance increases with the number
    of GMs.
  • The computational complexity increases
    exponentially with the number of GMs.
  • The increase in performance from 4 to 8 GMs does
    not justify the added computational complexity,
    which almost doubles.
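To make "Gaussian mixtures per HMM state" concrete, the sketch below evaluates the log-likelihood of one feature vector under a diagonal-covariance Gaussian mixture. The diagonal covariance and all parameter values are assumptions chosen for illustration; the slides do not specify the covariance structure.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture.

    x: (D,), weights: (M,), means and variances: (M, D).
    """
    diff = x - means
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff * diff / variances, axis=1))
    # Log-sum-exp over the M components for numerical stability.
    m = np.max(log_comp)
    return float(m + np.log(np.sum(np.exp(log_comp - m))))

# One state with 4 mixture components over 2-dimensional features (made-up values).
weights = np.full(4, 0.25)
means = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]])
variances = np.ones((4, 2))
ll = gmm_log_likelihood(np.array([0.2, -0.1]), weights, means, variances)
```

Each additional component adds one more Gaussian evaluation per frame and per state in this sketch.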

Use of YOHO-trained HMMs
  • Evaluation
  • Whether pre-trained HMM prototypes can result in
    better performance.
  • Whether additional training will adapt the models
    to the Greek accent and pronunciation of the
    speakers in the in-house database.
  • Experiment Setup
  • YOHO-PSTN trained HMM models for bootstrapping
  • Additional training using the enrolment files of
    the In-house database
  • Testing using the verification files of the
    In-house database
  • Results
  • Recognition performance increased by 2-4%.

PLP and MFCC coefficients
  • Experiment Setup
  • YOHO-GSM and YOHO-PSTN databases were used to
    bootstrap additional HMM training on 80% of the
    In-house database.
  • The remaining 20% of the database was used for
    testing.
  • Results
  • PLP coefficients outperform MFCC (2-3% increase
    in performance).
  • Cepstral Mean Subtraction (CMS) improves
    performance by approximately 2%.
  • An 8 GM DD recognizer with PLP + CMS (10 embedded
    re-estimations) achieves 98.4% sentence
    recognition performance.
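Cepstral Mean Subtraction, mentioned above, removes the per-coefficient average over an utterance. A fixed channel shows up in the cepstral domain as an approximately constant offset, so subtracting the mean cancels it. The sketch below uses random stand-in features and a made-up offset to show the effect.

```python
import numpy as np

def cepstral_mean_subtraction(feats):
    """Subtract the per-coefficient mean over the utterance (rows are frames)."""
    return feats - feats.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 12))            # stand-in cepstral features
channel_offset = np.linspace(-1.0, 1.0, 12)   # hypothetical constant channel effect
observed = clean + channel_offset

# After CMS the offset is gone: both versions give the same features.
same = np.allclose(cepstral_mean_subtraction(observed),
                   cepstral_mean_subtraction(clean))
```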

Speaker Verification
  • Evaluation of the speaker verification
    performance of the system against:
  • The use of MFCC and PLP Coefficients.
  • The Number of the Utterances used for training
    speaker-specific HMM models.
  • The selection of the Speaker Authentication
    Decision Threshold.
  • The Normalization of HMM scores through the use
    of a World Model.

MFCC and PLP coefficients
  • Experiment Setup
  • Single GM HMMs were trained for each speaker
    using the five enrolment sessions (each session
    contains 10 DD utterances) of the In-house
    database.
  • Each speaker is authenticated against all 23 HMM
    speaker models using his/her 150 DD
    authentication utterances.

  • X axis: impostor speakers attacking each model.
  • Y axis: speaker-dependent HMM models.
  • Z axis: averaged HMM scores.
  • Horizontal plane: threshold for which FAR = FRR.
  • The main diagonal represents speaker
    identification for the 23 sets of the In-house
    database speakers.

SA using PSTN and GSM Enrolment Data
Using 30 authentication sessions from each
speaker, tests were performed to evaluate the
speaker authentication (SA) performance in terms
of False Acceptance Rate (FAR), False Rejection
Rate (FRR) and Equal Error Rate (EER).
  • CMS can improve speaker authentication
    performance when applied either on MFCC or PLP
    feature sets.
  • The use of PLP coefficients was found to improve
    the speaker verification performance by 1-4% when
    compared to the MFCCs.
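FAR, FRR and EER, the metrics used above, can be computed from two sets of scores: those of genuine attempts and those of impostor attempts. The sketch below scans the observed scores as candidate thresholds and picks the one where FAR and FRR are closest. The scores are invented and the scanning strategy is one simple choice among several.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: share of impostor scores accepted. FRR: share of genuine scores rejected."""
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (EER, threshold) at the candidate threshold where FAR and FRR are closest."""
    candidates = np.sort(np.concatenate([genuine, impostor]))

    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2.0, float(best)

# Hypothetical accumulated log-likelihood scores.
genuine = np.array([-10.0, -12.0, -11.0, -20.0])
impostor = np.array([-25.0, -30.0, -18.0, -28.0])
eer, thr = equal_error_rate(genuine, impostor)
```

On this toy data one impostor score sits above one genuine score, so the best achievable operating point has FAR and FRR both at 0.25.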

HMM SA Decision Threshold
  • Applying the threshold which corresponds to
    FAR = FRR, the individual FAR and FRR for each
    speaker can be estimated.
  • Observation: FAR is not equal to FRR for each
    speaker, and in some cases the deviation is
    considerably high.
  • Repeating the tests using PLPs and CMS, but
    calculating the EER as the mean of the individual
    EER of each speaker, showed that:
  • - The EER drops significantly, from 3.52% to
    1.14%.
  • - The decision threshold estimated as the
    average of the individual thresholds produces a
    much better EER than the one estimated by
    averaging the utterance scores.
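The benefit of speaker-dependent thresholds reported above can be illustrated with toy data in which two speakers have well separated scores but very different score ranges. The speaker names and all scores are invented, and the example is deliberately extreme so the effect is easy to see.

```python
import numpy as np

def eer(genuine, impostor):
    """Return (EER, threshold) where FAR and FRR are closest, scanning observed scores."""
    best_gap, best = None, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = float(np.mean(impostor >= t))
        frr = float(np.mean(genuine < t))
        if best_gap is None or abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), ((far + frr) / 2.0, float(t))
    return best

# Hypothetical (genuine, impostor) scores for two speakers.
scores = {
    "A": (np.array([-10.0, -11.0, -12.0, -13.0]), np.array([-20.0, -21.0, -22.0, -23.0])),
    "B": (np.array([-30.0, -31.0, -32.0, -33.0]), np.array([-40.0, -41.0, -42.0, -43.0])),
}

# One threshold per speaker, then the mean of the individual EERs.
per_speaker = {name: eer(g, i) for name, (g, i) in scores.items()}
mean_individual_eer = float(np.mean([e for e, _ in per_speaker.values()]))

# One shared threshold over the pooled scores of all speakers.
pooled_eer, pooled_thr = eer(np.concatenate([g for g, _ in scores.values()]),
                             np.concatenate([i for _, i in scores.values()]))
```

Each speaker is perfectly separable with an individual threshold, while a single shared threshold cannot separate speaker B's genuine scores from speaker A's impostor scores.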

Normalization using World Model
  • We investigated whether the use of a World model
    for the normalization of the HMM scores of each
    individual improves the overall SA performance.
  • The world model relies on the development of a
    universal speaker model from a pool of speech
    utterances produced by various speakers.
  • The present evaluations were based on
    pre-training a world model using all the speakers
    in the enrolment data of the In-house and the two
    YOHO databases, and evaluating speaker
    authentication using the verification part of the
    In-house database.
  • Tests showed that:
  • The EER calculated as the mean of the individual
    EER of each speaker using a world model was
    0.094%, while the EER from identical tests
    without a world model was 1.14%.
  • The use of the world model to normalize
    verification scores can significantly improve
    speaker authentication performance.
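The slides do not state how the world model is used to normalize scores. A common choice, assumed here, is a log-likelihood ratio: the utterance score under the speaker model minus its score under the world model. The sketch below uses invented frame scores to show why this helps when recording conditions shift all raw scores.

```python
import numpy as np

def normalized_score(speaker_ll, world_ll):
    """Log-likelihood ratio: speaker-model score minus world-model score (assumed scheme)."""
    return float(np.sum(speaker_ll) - np.sum(world_ll))

# Hypothetical per-frame log-likelihoods under the claimed speaker model and the world model.
genuine_clean = (np.array([-4.0, -4.0, -4.0]), np.array([-6.0, -6.0, -6.0]))
genuine_noisy = (np.array([-9.0, -9.0, -9.0]), np.array([-11.0, -11.0, -11.0]))
impostor = (np.array([-8.0, -8.0, -8.0]), np.array([-6.0, -6.0, -6.0]))

raw = [float(np.sum(s)) for s, _ in (genuine_clean, genuine_noisy, impostor)]
norm = [normalized_score(s, w) for s, w in (genuine_clean, genuine_noisy, impostor)]
```

In this toy case the noisy genuine attempt has a lower raw score than the impostor, so no raw threshold accepts both genuine attempts and rejects the impostor. After normalization both genuine attempts score above zero and the impostor scores below it.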

  • Voice Biometric System
  • Text-dependent concatenated phoneme HMM-based
    speaker verifier
  • Concatenated phoneme HMM-based speech recognizer
  • Sound-prompted operation over the PSTN and GSM
    networks
  • Evaluation using a custom In-house database and 2
    versions of YOHO.
  • Text Validation Evaluation
  • 4 GMs is a good trade-off between accuracy and
    computational complexity.
  • DD speech recognition performance converges
    asymptotically after 4 embedded re-estimations of
    the Baum-Welch algorithm.
  • Bootstrapping initial HMM training results in an
    improvement of performance.
  • CMS improves double-digit recognition performance
    by approximately 2%.
  • PLP coefficients outperform MFCCs when speech is
    recorded via different channels.
  • Speaker Verification Evaluation
  • CMS increases HMM speaker authentication
    performance (MFCC and PLP).
  • PLP coefficients perform approximately 2% better
    than MFCCs.
  • Speaker-dependent thresholds and the use of a
    world model further improve speaker verification
    performance, resulting in EER = 0.094%.
