Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments

Description:

Title: MIRACLE A Multimodal Internet Music Search Engine Author: shilo Last modified by: RogerJang Created Date – PowerPoint PPT presentation

Provided by: shilo
Transcript and Presenter's Notes

1
Robust Entropy-based Endpoint Detection for
Speech Recognition in Noisy Environments
  • Roger Jang
  • jang@cs.nthu.edu.tw
  • http://www.cs.nthu.edu.tw/jang

2
Reference
  • Jia-lin Shen, Jeih-weih Hung, and Lin-shan Lee,
    "Robust entropy-based endpoint detection for speech
    recognition in noisy environments," International
    Conference on Spoken Language Processing (ICSLP),
    Sydney, 1998.

3
Summary
  • Entropy-based algorithm for accurate and robust
    endpoint detection for speech recognition under
    noisy environments
  • Better than energy-based algorithms in both
    detection accuracy and recognition performance
  • Error reduction: 16%

4
Motivation
  • Energy-based endpoint detection becomes less
    reliable when dealing with non-stationary noise
    and sound artifacts such as lip smacks, heavy
    breathing and mouth clicks, etc.
  • Spectral entropy is effective in distinguishing
    the speech segments from the non-speech parts.

5
Spectral Entropy
  • PDF
  • Normalization
  • Spectral entropy
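
The equations for these three steps did not survive in the transcript. The sketch below follows the commonly used definition of spectral entropy, which is an assumption here and may differ in detail from the slide: the power spectrum of a frame is normalized so that it sums to one, the result is treated as a probability density function (PDF), and the entropy of that PDF is taken. The function names, the naive DFT, and the `eps` guard are illustrative choices, not part of the original slides.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power spectrum over the non-negative frequency bins (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_entropy(frame, eps=1e-12):
    """Entropy (in nats) of the normalized power spectrum of one frame."""
    s = power_spectrum(frame)
    total = sum(s) + eps                  # eps avoids division by zero on silence
    pdf = [v / total for v in s]          # normalization: values sum to (almost) one
    return -sum(p * math.log(p) for p in pdf if p > 0.0)

# A pure tone concentrates its energy in one bin; white noise spreads it out.
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
h_tone = spectral_entropy(tone)
h_noise = spectral_entropy(noise)
```

In this toy comparison the tone yields an entropy close to zero while the noise frame yields a much larger value, which is the kind of contrast the method relies on.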

6
Properties of Entropy
  • N = 2
  • entropyPlot.m
  • N = 3
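
The plots for this slide are missing from the transcript. As a rough stand-in (not a port of entropyPlot.m, whose contents are unknown), the following sketch evaluates the entropy of a discrete distribution with N outcomes and checks the standard property that it peaks at the uniform distribution, where it equals log N.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# N = 2: sweep p over [0, 1] and locate the maximum of H(p, 1 - p).
grid = [i / 100 for i in range(101)]
curve = [entropy([p, 1.0 - p]) for p in grid]
best_p = grid[curve.index(max(curve))]

# N = 3: the uniform distribution has higher entropy than a skewed one.
h_uniform3 = entropy([1 / 3, 1 / 3, 1 / 3])
h_skewed3 = entropy([0.8, 0.1, 0.1])
```

For N = 2 the maximum falls at p = 0.5 with value log 2, and the entropy drops to zero when one outcome is certain.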

7
Entropy Weighting
  • A set of weighting factors can be applied
  • These weighting factors are statistically
    estimated from a large collection of speech
    signals.
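
The weighting formula itself is not preserved in the transcript. One plausible reading, offered only as an assumption, is that each frequency component's entropy term is scaled by its own weight, i.e. H = -Σ w_k p_k log p_k. The weights in the example are made-up numbers for illustration; the slides say the real ones are estimated statistically from speech data.

```python
import math

def weighted_entropy(pdf, weights):
    """Entropy with a per-component weight on each -p*log(p) term (assumed form)."""
    if len(pdf) != len(weights):
        raise ValueError("pdf and weights must have the same length")
    return -sum(w * p * math.log(p) for p, w in zip(pdf, weights) if p > 0.0)

pdf = [0.5, 0.25, 0.125, 0.125]
uniform_weights = [1.0, 1.0, 1.0, 1.0]    # reduces to the plain entropy
emphasis_weights = [2.0, 1.0, 0.5, 0.5]   # hypothetical values, not from the paper

h_plain = weighted_entropy(pdf, uniform_weights)
h_weighted = weighted_entropy(pdf, emphasis_weights)
```

With all weights equal to one this reduces to the unweighted spectral entropy, so the weighting can be read as a way to emphasize the components that separate speech from noise best.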

8
Endpoint Detection
  • The sum of the spectral entropy values over a
    duration of frames (20 frames) is first evaluated
    and smoothed by a median filter
  • Some thresholds are used to detect the beginning
    and ending boundaries of the embedded speech
    segments
  • A short period of background noise is first taken
    as the reference for some initial boundary
    detection process.
  • Short speech segments (< 100 ms) are rejected.
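
The steps on this slide can be sketched roughly as follows. The slides leave several details open, so every parameter here is an assumption: the median filter order, the number of leading noise windows, the threshold rule (deviation from the noise reference by more than `k` standard deviations or a fixed `min_margin`), and `min_len`, which stands in for the 100 ms limit under an assumed 10 ms frame shift. It is an illustration of the pipeline's shape, not the authors' implementation.

```python
import statistics

def median_filter(values, order=5):
    """Centered median filter; the window is truncated at the edges."""
    half = order // 2
    return [statistics.median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def active_runs(flags, min_len):
    """Inclusive (start, end) index pairs of True runs at least min_len long."""
    runs, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:                  # reject short segments
                runs.append((start, i - 1))
            start = None
    return runs

def detect_segments(entropies, window=20, order=5, noise_len=10,
                    k=3.0, min_margin=5.0, min_len=10):
    """Return candidate speech segments as index pairs into the windowed sums."""
    # Sum the per-frame entropies over each full window of `window` frames.
    summed = [sum(entropies[i:i + window])
              for i in range(len(entropies) - window + 1)]
    smooth = median_filter(summed, order)
    if not smooth:
        return []
    # The first few windows are assumed to contain background noise only.
    reference = smooth[:noise_len]
    center = statistics.mean(reference)
    threshold = max(k * statistics.pstdev(reference), min_margin)
    flags = [abs(v - center) > threshold for v in smooth]
    return active_runs(flags, min_len)

# Synthetic per-frame entropies: noise, then a stretch that differs, then noise.
entropies = [3.0] * 60 + [1.5] * 50 + [3.0] * 60
segments = detect_segments(entropies)
```

Note that each returned index refers to a window starting at that frame, so mapping the boundaries back to exact frame positions needs a convention that the slides do not specify.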

9
Experiment Settings
  • Speech database
      • Isolated digits in Mandarin Chinese produced by
        100 speakers (10 speakers for testing, the others
        for training)
  • Speech features: 12th-order MFCC and 12th-order
    delta MFCC
  • Models
      • Continuous-density HMM
      • 6 states/digit, 3 mixtures/state

10
Experiment Settings
  • Noise
      • NOISEX-92 noise-in-speech database
      • White noise, pink noise, Volvo noise (car noise),
        F16 noise, machine gun noise
  • Sound artifacts
      • Breath noise, cough noise and mouse click noise

11
Example

12
Experimental Results

13
Experimental Results

14
Something Not Clear
  • What is the sample rate? Bit resolution?
  • What is the frame size and overlap?
  • What is the order of the median filter?
  • How is the short period of background noise
    used?
  • What are the spectral entropy threshold values
    used to determine the boundaries?
  • What are the values for d1 and d2?