Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments

1

Roger Jang
Jang@cs.nthu.edu.tw
http://www.cs.nthu.edu.tw/jang

2
Reference

Jialin Shen, Jeihweih Hung, and Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments," International Conference on Spoken Language Processing (ICSLP), Sydney, 1998.

3
Summary

An entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments
Outperforms energy-based algorithms in both detection accuracy and recognition performance
Error reduction: 16%

4
Motivation

Energy-based endpoint detection becomes less reliable in the presence of non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks.
Spectral entropy is effective at distinguishing speech segments from non-speech parts.

5
Spectral Entropy

PDF
Normalization
Spectral entropy
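The three steps named on this slide can be sketched as follows. Assuming s(f_i) is the spectral energy of the i-th of N frequency bins in a frame, the spectrum is normalized into a probability density p_i = s(f_i) / (s(f_1) + ... + s(f_N)), and the spectral entropy of the frame is H = -(p_1 log p_1 + ... + p_N log p_N). The Python sketch below additionally assumes a Hamming window, an FFT power spectrum, natural logarithms, and a frame length and hop of 256 and 128 samples (the slides do not give them); `frame_signal` and `spectral_entropy` are hypothetical helper names, not code from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (frame size and hop are assumptions)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_entropy(frames, eps=1e-12):
    """Spectral entropy (natural log) of each row of `frames`."""
    window = np.hamming(frames.shape[1])
    # s(f_i): energy of each frequency bin of the windowed frame
    s = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # PDF / normalization: p_i = s(f_i) / sum_k s(f_k)
    p = s / (s.sum(axis=1, keepdims=True) + eps)
    # Spectral entropy: H = -sum_i p_i log p_i (eps keeps the log finite for empty bins)
    return -np.sum(p * np.log(p + eps), axis=1)

if __name__ == "__main__":
    fs = 8000                      # assumed sample rate
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 440 * t)
    noise = np.random.default_rng(0).standard_normal(fs)
    print("tone :", spectral_entropy(frame_signal(tone)).mean())
    print("noise:", spectral_entropy(frame_signal(noise)).mean())
```

A pure tone concentrates its energy in a few bins and yields low entropy, while white noise spreads its energy across all bins and yields entropy close to log N.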

6
Properties of Entropy

N = 2
entropyPlot.m
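Assuming the slide's N refers to a two-bin distribution [p, 1 - p] (N = 2), the property the plot illustrates can be tabulated with a short Python sketch; the contents of entropyPlot.m are not in the transcript, so this is not a port of that script. Entropy is 0 when all probability sits in one bin and reaches its maximum, log N, for a uniform distribution. This is why a flat, noise-like spectrum gives high spectral entropy while a peaky speech spectrum gives lower entropy. The function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p; zero bins contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def binary_entropy_curve(num_points=11):
    """Return (p, H([p, 1 - p])) pairs for p evenly spaced over [0, 1] (the N = 2 case)."""
    curve = []
    for i in range(num_points):
        p = i / (num_points - 1)
        curve.append((p, entropy([p, 1 - p])))
    return curve

if __name__ == "__main__":
    for p, h in binary_entropy_curve():
        print(f"p = {p:.1f}   H = {h:.4f}")
    # Uniform distributions over n bins reach the maximum, log n
    for n in (2, 4, 8):
        print(n, entropy([1 / n] * n), math.log(n))
```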

7
Entropy Weighting

A set of weighting factors can be applied to the spectral entropy computation.
These weighting factors are estimated statistically from a large collection of speech signals.
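The slide does not say exactly where the weights enter. One plausible form, assumed here, scales each bin's contribution: H_w = -(w_1 p_1 log p_1 + ... + w_N p_N log p_N), which reduces to plain spectral entropy when every w_i = 1. The Python sketch takes the weights as an input; the weight arrays in the usage example are hypothetical placeholders, not the statistically estimated factors the slide refers to, and `weighted_spectral_entropy` is a hypothetical name.

```python
import numpy as np

def weighted_spectral_entropy(p, w):
    """Weighted entropy -sum_i w_i p_i log p_i for each row of p.

    p: (n_frames, n_bins) per-frame spectral PDFs (each row sums to 1).
    w: (n_bins,) per-bin weights (assumed form; values are placeholders).
    """
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    # log(1) = 0, so zero-probability bins contribute nothing
    safe = np.where(p > 0, p, 1.0)
    return -np.sum(w * p * np.log(safe), axis=1)

if __name__ == "__main__":
    p = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.25, 0.25, 0.25, 0.25]])
    print(weighted_spectral_entropy(p, np.ones(4)))                     # unweighted
    print(weighted_spectral_entropy(p, np.array([1.0, 0.0, 1.0, 0.0])))  # hypothetical weights
```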

8
Endpoint Detection

The sum of the spectral entropy values over a window of 20 frames is first evaluated and then smoothed by a median filter.
Thresholds are used to detect the beginning and ending boundaries of the embedded speech segments.
A short period of background noise is first taken as the reference for an initial boundary detection process.
Short speech segments (< 100 ms) are rejected.
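A minimal Python sketch of this post-processing, starting from a per-frame entropy sequence. The 20-frame sum and the 100 ms minimum length come from the slide; everything else is an assumption, since the slides leave it open: a 10 ms frame hop, a 5-point median filter, the first 10 frames as the background-noise reference, and a decision rule that marks a frame as speech when its smoothed score departs from the noise reference mean by more than 10%. `detect_segments` and its parameters are hypothetical names, not the paper's implementation.

```python
import numpy as np

def moving_sum(x, width=20):
    """Sum of x over a sliding window of `width` frames, with edge values replicated."""
    left = width // 2
    padded = np.pad(x, (left, width - 1 - left), mode="edge")
    return np.convolve(padded, np.ones(width), mode="valid")

def median_smooth(x, order=5):
    """Running median over an odd-length window of `order` frames (edges replicated)."""
    half = order // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + order]) for i in range(len(x))])

def detect_segments(entropy, hop_ms=10.0, sum_width=20, median_order=5,
                    noise_frames=10, rel_margin=0.1, min_len_ms=100.0):
    """Return (start, end) frame pairs, end exclusive, for detected speech segments."""
    entropy = np.asarray(entropy, dtype=float)
    # 1. Sum entropy over a 20-frame window, then median-smooth it
    score = median_smooth(moving_sum(entropy, sum_width), median_order)
    # 2. Background-noise reference from the leading frames (assumed non-speech)
    ref = score[:noise_frames].mean()
    # 3. Assumed threshold rule: relative deviation from the noise reference
    active = np.abs(score - ref) > rel_margin * abs(ref)
    # 4. Collect runs of consecutive active frames
    segments, start = [], None
    for i, is_speech in enumerate(active):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    # 5. Reject segments shorter than min_len_ms
    min_frames = int(np.ceil(min_len_ms / hop_ms))
    return [(s, e) for s, e in segments if e - s >= min_frames]
```

Because the score is a 20-frame sum, detected boundaries are smeared by up to about ten frames on each side; a real system would likely refine them afterwards.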

9
Experiment Settings

Speech database
Isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the remaining 90 for training)
Speech features: 12-order MFCC and 12-order delta MFCC
Models
Continuous-density HMM
6 states per digit, 3 mixtures per state
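Delta MFCCs are commonly computed with a regression over neighboring frames, d_t = sum_{n=1..K} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..K} n^2). The slides do not state the regression window, so the Python sketch below assumes K = 2 and replicates edge frames; stacking the 12 MFCCs with their 12 deltas gives a 24-dimensional feature vector per frame. `delta` and `stack_with_delta` are hypothetical helper names.

```python
import numpy as np

def delta(features, k=2):
    """Regression-based delta coefficients along the time axis.

    features: (n_frames, n_coeffs), e.g. 12 MFCCs per frame.
    k: half-width of the regression window (assumed; not given in the slides).
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    # Replicate the first and last frames so every frame has k neighbors on each side
    padded = np.pad(features, ((k, k), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, k + 1))
    d = np.zeros_like(features)
    for n in range(1, k + 1):
        d += n * (padded[k + n:k + n + T] - padded[k - n:k - n + T])
    return d / denom

def stack_with_delta(mfcc, k=2):
    """Concatenate MFCCs with their deltas: 12 + 12 -> 24 coefficients per frame."""
    return np.hstack([mfcc, delta(mfcc, k)])
```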

10
Experiment Settings

Noise
NOISEX-92 noise-in-speech database
White noise, pink noise, Volvo noise (car noise), F16 noise, machinegun noise
Sound artifacts
Breath noise, cough noise, and mouth click noise
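The slides do not say how the NOISEX-92 noises were combined with the digit recordings or at which signal-to-noise ratios. A common approach, assumed here, is additive mixing at a target SNR: the Python sketch scales the noise so that 10 log10(P_speech / P_noise) equals the requested value in dB. `add_noise_at_snr` is a hypothetical helper, not part of any toolkit.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Additively mix `noise` into `speech` at the given SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat the noise cyclically if it is shorter than the utterance, then trim it
    noise = np.resize(noise, len(speech))
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose scale so that p_speech / (scale**2 * p_noise) = 10**(snr_db / 10)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```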

11
Example

12
Experimental Results

13
Experimental Results

14
Something Not Clear

What is the sample rate? What is the bit resolution?
What are the frame size and overlap?
What is the order of the median filter?
How is the short period of background noise used?
What are the spectral entropy thresholds used to determine the boundaries?
What are the values of d1 and d2?
