A noise-estimation algorithm for highly non-stationary environments


1
A noise-estimation algorithm for highly
non-stationary environments
  • Sundarrajan Rangachari, Philipos C. Loizou
  • Department of Electrical Engineering, University
    of Texas at Dallas, P.O. Box 830688, EC 33
    Richardson, TX 75083-0688, USA
  • Presenter: Shih-Hsiang

SPEECH COMMUNICATION Vol. 48(2), 2006
2
Reference
  • Doblinger, G., 1995. Computationally efficient
    speech enhancement by spectral minima tracking in
    subbands. Proc. Eurospeech 2, 1513-1516.
  • Hirsch, H., Ehrlicher, C., 1995. Noise estimation
    techniques for robust speech recognition. Proc.
    IEEE Internat. Conf. on Acoust. Speech Signal
    Process., 153-156.
  • Martin, R., 2001. Noise power spectral density
    estimation based on optimal smoothing and minimum
    statistics. IEEE Trans. Speech Audio Process. 9
    (5), 504-512.
  • Cohen, I., 2002. Noise estimation by minima
    controlled recursive averaging for robust speech
    enhancement. IEEE Signal Process. Lett. 9 (1),
    12-15.
  • Hu, Y., Loizou, P., 2004. Speech enhancement
    based on wavelet thresholding the multitaper
    spectrum. IEEE Trans. Speech Audio Process. 12
    (1), 59-67.

3
Introduction
  • Most speech-enhancement algorithms assume that
    an estimate of the noise spectrum is available
  • An accurate noise estimate is critical for the
    performance of speech-enhancement algorithms
  • The noise estimate can have a major impact on the
    quality of the enhanced signal
  • If the noise estimate is too low, annoying
    residual noise will be audible
  • If the noise estimate is too high, speech will be
    distorted
  • The simplest approach is to estimate and update
    the noise spectrum during the silent segments of
    the signal, detected with a voice activity
    detection (VAD) algorithm
  • This approach works satisfactorily only in
    stationary noise; it does not work well in more
    realistic (non-stationary) noise environments
  • Hence there is a need to update the noise
    spectrum continuously over time

4
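Proposed noise-estimation algorithm: Computing the
smoothed power spectrum
Let the noisy speech signal in the time domain be
denoted as
$$y(n) = x(n) + d(n)$$
where y(n) is the noisy speech, x(n) the clean
speech, and d(n) the additive noise. With Y(λ,k)
the short-time Fourier transform of y(n), the
smoothed power spectrum of noisy speech is computed
using the following first-order recursive equation:
$$P(\lambda,k) = \eta\,P(\lambda-1,k) + (1-\eta)\,|Y(\lambda,k)|^2$$
where η is a smoothing constant, λ the frame index,
k the frequency index, and P(λ,k) the smoothed power
spectrum.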
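A minimal Python sketch of this recursion, for illustration only (not the authors' code). Assumptions: spectra are plain lists indexed as `frames_power[l][k]` = |Y(λ,k)|², the default η = 0.7 is the value listed among the experimental parameters, and P(0,k) is initialised with the first frame's power.

```python
from typing import List

Spectrum = List[float]


def smooth_power(frames_power: List[Spectrum], eta: float = 0.7) -> List[Spectrum]:
    """First-order recursive smoothing P(l,k) = eta*P(l-1,k) + (1-eta)*|Y(l,k)|^2.

    frames_power[l][k] holds |Y(l,k)|^2 for frame l and frequency bin k.
    """
    smoothed: List[Spectrum] = []
    for frame in frames_power:
        if not smoothed:
            # Assumption: initialise P(0,k) with the first frame's power.
            smoothed.append(list(frame))
            continue
        prev = smoothed[-1]
        smoothed.append([eta * p + (1.0 - eta) * y for p, y in zip(prev, frame)])
    return smoothed


if __name__ == "__main__":
    noisy_power = [[1.0, 4.0], [0.0, 4.0], [0.0, 4.0]]
    for frame in smooth_power(noisy_power):
        print([round(v, 3) for v in frame])
```

Running it shows the first bin decaying by a factor η per frame while the constant second bin stays at 4.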
5
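Proposed noise-estimation algorithm: Tracking the
minimum of noisy speech
The local minimum of the noisy speech power spectrum,
P_min(λ,k), is tracked as
$$P_{\min}(\lambda,k) = \begin{cases} \gamma\,P_{\min}(\lambda-1,k) + \dfrac{1-\gamma}{1-\beta}\left[P(\lambda,k) - \beta\,P(\lambda-1,k)\right], & \text{if } P_{\min}(\lambda-1,k) < P(\lambda,k) \\ P(\lambda,k), & \text{otherwise} \end{cases} \qquad (3)$$
β and γ are constants which are determined
experimentally. The look-ahead factor β controls
the adaptation time of the local minimum.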
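The minimum-tracking rule of Eq. (3) can be sketched as below; this is an illustrative Python version, not the authors' implementation. The function name `track_minimum`, the list-of-lists layout, and starting P_min at the first frame are assumptions; β = 0.8 and γ = 0.998 are the values listed with the experimental parameters.

```python
from typing import List

Spectrum = List[float]


def track_minimum(P: List[Spectrum], beta: float = 0.8,
                  gamma: float = 0.998) -> List[Spectrum]:
    """Continuous minimum tracking with look-ahead factor beta, following Eq. (3).

    P[l][k] is the smoothed noisy power spectrum; returns P_min with the same shape.
    """
    P_min: List[Spectrum] = []
    for l, frame in enumerate(P):
        if l == 0:
            # Assumption: start the minimum at the first smoothed frame.
            P_min.append(list(frame))
            continue
        prev_min, prev_P = P_min[l - 1], P[l - 1]
        cur: Spectrum = []
        for k, p in enumerate(frame):
            if prev_min[k] < p:
                # Power is above the minimum: let the minimum rise slowly.
                cur.append(gamma * prev_min[k]
                           + (1.0 - gamma) / (1.0 - beta) * (p - beta * prev_P[k]))
            else:
                # Power fell to or below the minimum: follow it immediately.
                cur.append(p)
        P_min.append(cur)
    return P_min


if __name__ == "__main__":
    print(track_minimum([[1.0], [2.0], [0.5]]))
```

When the power rises, the minimum only creeps upward (by roughly 1% in the `[[1.0], [2.0], [0.5]]` case); when the power falls below it, the minimum follows immediately.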
6
Proposed noise-estimation algorithm: Tracking the
minimum of noisy speech
Plot of the noisy speech power spectrum and the local
minimum computed using (3) for speech degraded by
babble noise at 5 dB SNR, at frequency bin k = 5.
7
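Proposed noise-estimation algorithm: Speech-presence
probability
Let the ratio of the noisy speech power spectrum to
its local minimum be defined as
$$S_r(\lambda,k) = \frac{P(\lambda,k)}{P_{\min}(\lambda,k)}$$
The power spectrum of noisy speech will be nearly
equal to its local minimum when speech is absent, so
S_r(λ,k) stays close to 1 in speech-absent regions.
Comparing S_r(λ,k) with a frequency-dependent
threshold δ(k) gives a speech-presence decision:
I(λ,k) = 1 if S_r(λ,k) > δ(k), and I(λ,k) = 0
otherwise.
The speech-presence probability, p(λ,k), is
updated using the following first-order recursion:
$$p(\lambda,k) = \alpha_p\,p(\lambda-1,k) + (1-\alpha_p)\,I(\lambda,k)$$
where α_p is a smoothing constant. The above
recursion implicitly exploits the correlation of
speech presence in adjacent frames.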
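A hedged sketch of the ratio test and the probability recursion. The thresholds δ(k) are passed in by the caller because the slides do not list their values; `speech_presence`, the zero initial probability, and the small guard against a zero minimum are assumptions of this sketch. α_p = 0.2 is the listed parameter value.

```python
from typing import List

Spectrum = List[float]


def speech_presence(P: List[Spectrum], P_min: List[Spectrum],
                    delta: Spectrum, alpha_p: float = 0.2) -> List[Spectrum]:
    """Speech-presence probability p(l,k) from the ratio S_r = P / P_min.

    delta[k] is the frequency-dependent threshold; its values are supplied by
    the caller (the slides do not list them). p(-1,k) = 0 is an assumption.
    """
    probs: List[Spectrum] = []
    prev = [0.0] * len(delta)
    for frame, mins in zip(P, P_min):
        cur: Spectrum = []
        for k, (p, m) in enumerate(zip(frame, mins)):
            ratio = p / max(m, 1e-12)  # guard against a zero minimum
            indicator = 1.0 if ratio > delta[k] else 0.0
            cur.append(alpha_p * prev[k] + (1.0 - alpha_p) * indicator)
        probs.append(cur)
        prev = cur
    return probs
```

For example, with δ(k) = 2 in both bins (an arbitrary illustration), a bin whose power is ten times its minimum gets p = 0.8 in that frame, and the probability decays to 0.16 in the next frame if the ratio drops back to 1.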
8
Proposed noise-estimation algorithm: Speech-presence
probability
Top panel: estimated speech-presence probability
based on the ratio S_r(λ,k). Bottom panel:
spectrogram of the clean signal.
9
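Proposed noise-estimation algorithm: Computing
frequency-dependent smoothing constants
Using the speech-presence probability estimate,
we compute the time-frequency dependent smoothing
factor as follows:
$$\alpha_s(\lambda,k) = \alpha_d + (1-\alpha_d)\,p(\lambda,k)$$
where α_d is a constant. Note that α_s(λ,k) takes
values in the range α_d ≤ α_s(λ,k) ≤ 1: when speech
is likely present (p(λ,k) near 1), α_s(λ,k) is near 1
and the noise estimate is essentially frozen; when
speech is absent (p(λ,k) near 0), the estimate is
updated with the faster smoothing constant α_d.
Finally, the noise spectrum estimate is updated as
$$D(\lambda,k) = \alpha_s(\lambda,k)\,D(\lambda-1,k) + \left[1-\alpha_s(\lambda,k)\right]|Y(\lambda,k)|^2$$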
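The smoothing factor and the noise update for one frame might look like this in Python (an illustrative sketch; `update_noise` and the argument layout are hypothetical, and α_d = 0.85 is the listed parameter value).

```python
from typing import List

Spectrum = List[float]


def update_noise(noise_prev: Spectrum, noisy_power: Spectrum,
                 presence: Spectrum, alpha_d: float = 0.85) -> Spectrum:
    """One frame of D(l,k) = a_s*D(l-1,k) + (1-a_s)*|Y(l,k)|^2,
    where a_s = alpha_d + (1-alpha_d)*p(l,k) lies in [alpha_d, 1]."""
    updated: Spectrum = []
    for d, y, p in zip(noise_prev, noisy_power, presence):
        a_s = alpha_d + (1.0 - alpha_d) * p  # time-frequency dependent factor
        updated.append(a_s * d + (1.0 - a_s) * y)
    return updated
```

With p(λ,k) = 1 the factor α_s equals 1 and the previous noise estimate is returned unchanged; with p(λ,k) = 0 the estimate moves toward |Y(λ,k)|² with weight 1 - α_d = 0.15.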
10
Proposed noise-estimation algorithm
Plot of the true noise spectrum and the noise
spectrum estimated using our proposed method for
speech degraded by babble noise at 5 dB SNR, at a
single frequency f = 250 Hz.
11
Comparison with existing algorithms: Minimum
statistics (MS) (Martin, 2001)
  • Minimum statistics (MS) (Martin, 2001)

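Quantities involved: the power spectral density of
the noise signal, and the equivalent degrees of
freedom of the smoothed power estimate (used to
compensate the bias of the minimum)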
12
Comparison with existing algorithms: Minimum
statistics (MS) (Martin, 2001)
Comparison between the noise spectrum estimated
using the proposed algorithm (thick line) and
Martin's algorithm (Martin, 2001) (dashed line)
for a sentence corrupted by car noise (t < 1.8
s) followed by a sentence corrupted by
multi-talker babble (t > 1.8 s).
13
Comparison with existing algorithms: Continuous
minima tracking (Doblinger, 1995)
  • Continuous minima tracking (Doblinger, 1995)

Drawback: because the tracked minimum is used
directly as the noise estimate, the noise estimate
increases whenever the noisy speech power increases
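To illustrate this drawback, the sketch below applies the continuous minimum-tracking rule in a single frequency bin and uses the result directly as the noise estimate. The smoothed power trace (a noise floor of 1.0 followed by a 50-frame speech burst at 20.0) is hypothetical, as are the function name and the choice β = 0.8, γ = 0.998.

```python
from typing import List


def doblinger_noise(P: List[float], beta: float = 0.8,
                    gamma: float = 0.998) -> List[float]:
    """Single-bin continuous minimum tracking used directly as the noise estimate."""
    est = [P[0]]
    for l in range(1, len(P)):
        if est[-1] < P[l]:
            # Power above the current minimum: the estimate rises slowly.
            est.append(gamma * est[-1]
                       + (1.0 - gamma) / (1.0 - beta) * (P[l] - beta * P[l - 1]))
        else:
            est.append(P[l])
    return est


if __name__ == "__main__":
    # Hypothetical smoothed power: noise floor 1.0, a 50-frame speech burst, then noise again.
    P = [1.0] * 20 + [20.0] * 50 + [1.0] * 10
    est = doblinger_noise(P)
    print(f"before burst: {est[19]:.2f}, end of burst: {est[69]:.2f}")
```

During the burst the estimate climbs well above the true noise floor of 1.0, even though the noise itself did not change.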
14
Comparison with existing algorithms: Continuous
minima tracking (Doblinger, 1995)
Top panel: true noise spectrum and the noise
spectrum estimated using the proposed method.
Bottom panel: true noise spectrum and the noise
spectrum estimated using the method of Doblinger
(1995). Arrows indicate regions where the noise is
overestimated.
15
Comparison with existing algorithms: Weighted
average technique (Hirsch and Ehrlicher, 1995)
  • Weighted average technique (Hirsch and Ehrlicher,
    1995)

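$$|\hat{N}(l,i)| = \begin{cases} \alpha\,|\hat{N}(l-1,i)| + (1-\alpha)\,|X(l,i)|, & \text{if } |X(l,i)| < \beta\,|\hat{N}(l-1,i)| \\ |\hat{N}(l-1,i)|, & \text{otherwise} \end{cases}$$
where |X(l,i)| is the spectral magnitude of the
l-th frame in the i-th subband and |N̂(l,i)| is the
estimated noise magnitude.
The method fails when there is a sudden increase in
noise level. In that case the noisy speech spectrum
will never be smaller than the threshold, since the
threshold is based on past noise estimates that are
already very low. Thus, the noise estimate will not
be updated as long as the noise power remains at
that high level.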
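A small Python sketch of this update and its failure mode. The values α = 0.9 and β = 2.0 are illustrative assumptions (the slides give none), and `hirsch_update` is a hypothetical name.

```python
from typing import List


def hirsch_update(noise_prev: List[float], mag: List[float],
                  alpha: float = 0.9, beta: float = 2.0) -> List[float]:
    """Weighted-average noise update: average in |X(l,i)| only when it is
    below beta times the previous noise estimate; otherwise keep the estimate.
    alpha and beta here are illustrative values, not taken from the slides."""
    return [alpha * n + (1.0 - alpha) * x if x < beta * n else n
            for n, x in zip(noise_prev, mag)]


if __name__ == "__main__":
    noise = [1.0]
    # Sudden, sustained noise increase from 1.0 to 5.0 in this subband.
    for _ in range(200):
        noise = hirsch_update(noise, [5.0])
    print(noise)  # prints [1.0]: the estimate is never updated
```

Because 5.0 is never below 2.0 times the old estimate, the threshold test blocks every update, which is exactly the situation described above.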
16
Comparison with existing algorithms: Weighted
average technique (Hirsch and Ehrlicher, 1995)
Comparison of the noise spectrum estimated (f = 500
Hz) by the proposed method (dashed line) with that
of Hirsch and Ehrlicher (1995) (solid line) for
noisy speech at 20 dB SNR (t < 1.8 s) followed
by noisy speech at 5 dB SNR (t > 1.8 s).
17
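Comparison with existing algorithms: Minima-
controlled recursive averaging (Cohen, 2002)
  • Minima-controlled recursive averaging (MCRA)
    (Cohen, 2002)

Given two hypotheses for the l-th frame and the
k-th subband:
$$H_0(k,l):\ Y(k,l) = D(k,l) \quad \text{(speech absence)}$$
$$H_1(k,l):\ Y(k,l) = X(k,l) + D(k,l) \quad \text{(speech presence)}$$
Let λ_d(k,l) = E[|D(k,l)|²] denote the variance of
the noise in the k-th band. Its estimate is updated as
$$\hat{\lambda}_d(k,l+1) = \alpha_d\,\hat{\lambda}_d(k,l) + (1-\alpha_d)\,|Y(k,l)|^2 \quad \text{(speech absence)}$$
$$\hat{\lambda}_d(k,l+1) = \hat{\lambda}_d(k,l) \quad \text{(speech presence)}$$
where α_d is a smoothing constant. Let
p(k,l) = P(H_1(k,l) | Y(k,l)) denote the
conditional signal presence probability. Combining
the two cases gives
$$\hat{\lambda}_d(k,l+1) = \tilde{\alpha}_d(k,l)\,\hat{\lambda}_d(k,l) + \left[1-\tilde{\alpha}_d(k,l)\right]|Y(k,l)|^2$$
where
$$\tilde{\alpha}_d(k,l) = \alpha_d + (1-\alpha_d)\,p(k,l)$$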
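The equivalence between the two-hypothesis rule and the single recursion with α̃_d(k,l) can be checked numerically. In this sketch `mcra_noise_update` implements the combined recursion and `mcra_mixture_form` the probability-weighted mixture of the speech-presence and speech-absence updates; both names and the default α_d = 0.95 are assumptions.

```python
from typing import List


def mcra_noise_update(noise_var: List[float], noisy_power: List[float],
                      presence: List[float], alpha_d: float = 0.95) -> List[float]:
    """lambda_d(k,l+1) = a~ * lambda_d(k,l) + (1 - a~) * |Y(k,l)|^2,
    with a~ = alpha_d + (1 - alpha_d) * p(k,l). alpha_d = 0.95 is assumed."""
    out = []
    for lam, y, p in zip(noise_var, noisy_power, presence):
        a_tilde = alpha_d + (1.0 - alpha_d) * p
        out.append(a_tilde * lam + (1.0 - a_tilde) * y)
    return out


def mcra_mixture_form(lam: float, y: float, p: float, alpha_d: float = 0.95) -> float:
    """Same update written as a probability-weighted mix of the two hypothesis rules."""
    keep = lam                                     # speech present: hold the estimate
    average = alpha_d * lam + (1.0 - alpha_d) * y  # speech absent: recursive average
    return p * keep + (1.0 - p) * average
```

For any p(k,l) between 0 and 1 the two functions agree up to floating-point rounding.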
18
Comparison with existing algorithms: Minima-
controlled recursive averaging (Cohen, 2002)
The local energy of the noisy speech is obtained by
smoothing the magnitude squared of its STFT in time
and frequency. In frequency, a window function b of
length 2w+1 is used:
$$S_f(k,l) = \sum_{i=-w}^{w} b(i)\,|Y(k-i,l)|^2$$
In time, the smoothing is performed by first-order
recursive averaging, given by
$$S(k,l) = \alpha_s\,S(k,l-1) + (1-\alpha_s)\,S_f(k,l)$$
The minimum S_min(k,l) of the local energy is then
tracked. Speech presence is determined by the ratio
between the local energy of the noisy speech and
its minimum within a specified time window,
S_r(k,l) = S(k,l)/S_min(k,l): I(k,l) = 1 if
S_r(k,l) > δ, and I(k,l) = 0 otherwise.
The conditional signal presence probability is
calculated as follows:
$$\hat{p}(k,l) = \alpha_p\,\hat{p}(k,l-1) + (1-\alpha_p)\,I(k,l)$$
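The time-frequency smoothing that produces the local energy can be sketched as follows. The window values b(i), the default α_s = 0.8, and skipping out-of-range bins at the spectrum edges are assumptions of this illustration, not details taken from the slides.

```python
from typing import List


def frequency_smooth(noisy_power: List[float], b: List[float]) -> List[float]:
    """S_f(k) = sum_{i=-w}^{w} b(i) |Y(k-i)|^2 for a window b of length 2w+1.
    Bins that fall outside the spectrum are skipped (an edge-handling assumption)."""
    if len(b) % 2 != 1:
        raise ValueError("window length must be odd (2w+1)")
    w = len(b) // 2
    K = len(noisy_power)
    smoothed = []
    for k in range(K):
        acc = 0.0
        for i in range(-w, w + 1):
            j = k - i
            if 0 <= j < K:
                acc += b[i + w] * noisy_power[j]  # b[i + w] stores b(i)
        smoothed.append(acc)
    return smoothed


def time_smooth(S_prev: List[float], S_f: List[float],
                alpha_s: float = 0.8) -> List[float]:
    """S(k,l) = alpha_s * S(k,l-1) + (1 - alpha_s) * S_f(k,l); alpha_s is assumed."""
    return [alpha_s * s + (1.0 - alpha_s) * f for s, f in zip(S_prev, S_f)]
```

For example, a single spectral peak of 3.0 smoothed with b = (0.25, 0.5, 0.25) spreads into [0.75, 1.5, 0.75].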
19
Comparison with existing algorithms: Minima-
controlled recursive averaging (Cohen, 2002)
  • The local minimum in (Cohen, 2002) was found by
    tracking the minimum of noisy speech over a
    search window spanning L frames. This has some
    drawbacks:
  • The minimum is sensitive to outliers
  • The minimum tracking may lag by as many as 2L
    frames
  • In the proposed method:
  • The estimate of the noise spectrum is not
    influenced by the minimum-search window
  • The threshold used for identifying speech
    presence/absence regions is frequency
    dependent, while that of Cohen (2002) is fixed
    for all frequencies

20
Experimental setup
  • The proposed noise estimate was combined with a
    Wiener-type speech-enhancement algorithm (Hu and
    Loizou, 2004)
  • Estimate the spectral gain function

where C(λ,k) is the estimated clean speech
spectrum, computed as follows
where v = 0.001 is a small positive number and
μ_max is the maximum allowable value of μ, which
was set to 10; μ_0 = (1 + 4μ_max)/5 and
s = 25/(μ_max - 1)
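As a worked check, substituting μ_max = 10 into the two definitions gives the values below. The second line assumes, since the slide omits the formula, that μ decreases linearly with an SNR estimate in dB as μ = μ_0 - SNR/s; under that assumption μ spans the range from μ_max down to 1 for SNRs between -5 dB and 20 dB.

```latex
\mu_0 = \frac{1 + 4\mu_{\max}}{5} = \frac{1 + 4\cdot 10}{5} = 8.2, \qquad
s = \frac{25}{\mu_{\max} - 1} = \frac{25}{9} \approx 2.78

\text{Assumed rule } \mu = \mu_0 - \frac{\mathrm{SNR}}{s}:\quad
\mu(-5\,\mathrm{dB}) = 8.2 + \frac{5 \cdot 9}{25} = 10 = \mu_{\max}, \qquad
\mu(20\,\mathrm{dB}) = 8.2 - \frac{20 \cdot 9}{25} = 1
```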
21
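Experimental setup (cont.)
  • Obtain the enhanced spectrum
  • Other parameters
  • α_d = 0.85, α_p = 0.2, β = 0.8, γ = 0.998, η = 0.7

where X(λ,k) is the enhanced spectrum
where LF and MF are the frequency bins corresponding
to 1 kHz and 3 kHz, and Fs is the sampling
frequency; these bin boundaries define the frequency
bands of the speech-presence threshold δ(k)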
22
Experimental results: Subjective evaluation
  • Formal listening tests were used
  • Single-noise condition
  • Sentences were degraded by either multi-talker
    babble noise or factory noise
  • Triplet-noise condition
  • Three different noise types (multi-talker
    babble, factory noise, and white noise) appear in
    succession, in that order, without any pauses in
    between
  • The listeners were asked to select, from each
    pair of stimuli presented, the sentence that was
    more natural, easier to listen to, and freer of
    artifacts
  • A preference score of 100 would indicate that
    listeners preferred the proposed method over the
    competing method in every comparison

23
Experimental results: Subjective evaluation
The preference for the proposed method is
attributed to the fact that the proposed
noise-estimation algorithm adapts quickly to
highly non-stationary environments.
24
Experimental results: Objective evaluation
  • Mean squared error (MSE) between the true noise
    spectrum and the estimated noise spectrum,
    computed from the estimated noise power spectrum
    and the true noise power spectrum and averaged
    over the total number of frames
  • Log-likelihood ratio (LLR) measure

$$d_{LLR}(\mathbf{a}_e, \mathbf{a}_c) = \log\frac{\mathbf{a}_e \mathbf{R}_c \mathbf{a}_e^T}{\mathbf{a}_c \mathbf{R}_c \mathbf{a}_c^T}$$
where a_e is the linear prediction coefficient
vector of the enhanced speech frame, R_c is the
autocorrelation matrix of the original (clean)
speech frame, and a_c is the linear prediction
coefficient vector of the original (clean) speech
frame.
The LLR is a spectral distance measure which
mainly models the mismatch between the formants
of the original and enhanced signals.
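The LLR above can be computed with a short, self-contained Python sketch. It assumes the prediction-error convention a = [1, a_1, ..., a_p], derives the LPC vectors with the Levinson-Durbin recursion from biased autocorrelation estimates, and builds R_c as the Toeplitz matrix of the clean frame's autocorrelation; the helper names and the synthetic test signals are hypothetical.

```python
import math
from typing import List


def autocorr(frame: List[float], order: int) -> List[float]:
    """Biased autocorrelation r(0..order) of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson(r: List[float], order: int) -> List[float]:
    """Levinson-Durbin recursion; returns the prediction-error filter [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def quad_form(a: List[float], r: List[float]) -> float:
    """a R a^T, where R is the symmetric Toeplitz matrix with first row r."""
    p = len(a)
    return sum(a[i] * r[abs(i - j)] * a[j] for i in range(p) for j in range(p))


def llr(a_e: List[float], a_c: List[float], r_c: List[float]) -> float:
    """d_LLR = log( a_e R_c a_e^T / a_c R_c a_c^T )."""
    return math.log(quad_form(a_e, r_c) / quad_form(a_c, r_c))


if __name__ == "__main__":
    order = 4
    clean = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
    enhanced = [x + 0.3 * math.cos(2.0 * n) for n, x in enumerate(clean)]
    r_c = autocorr(clean, order)
    a_c = levinson(r_c, order)
    a_e = levinson(autocorr(enhanced, order), order)
    print(f"LLR = {llr(a_e, a_c, r_c):.4f}")
```

Because a_c minimizes a R_c aᵀ over all vectors with a leading 1, the LLR is zero when a_e = a_c and non-negative otherwise.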
25
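Experimental results: Objective evaluation
  • Segmental SNR

$$\mathrm{SNR}_{seg} = \frac{10}{|\mathcal{F}|}\sum_{m\in\mathcal{F}}\log_{10}\frac{\sum_{n=Nm}^{Nm+N-1}x^2(n)}{\sum_{n=Nm}^{Nm+N-1}\left[x(n)-\hat{x}(n)\right]^2}$$
where x(n) is the clean speech, x̂(n) the enhanced
speech, N the frame length, and F the set of frames
that contain speech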
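A hedged Python sketch of the segmental SNR restricted to speech frames. How the set of speech frames is obtained (here simply passed in as frame indices), the non-overlapping framing, and the small epsilon guard are assumptions; many implementations also clamp each frame's SNR to a fixed range, which this sketch does not do.

```python
import math
from typing import List, Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int, speech_frames: List[int]) -> float:
    """Average per-frame SNR (dB) over the frame indices in speech_frames.

    No per-frame clamping is applied; a tiny epsilon avoids division by zero.
    """
    eps = 1e-12
    total = 0.0
    for m in speech_frames:
        start = m * frame_len
        x = clean[start:start + frame_len]
        x_hat = enhanced[start:start + frame_len]
        signal = sum(v * v for v in x)
        error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        total += 10.0 * math.log10((signal + eps) / (error + eps))
    return total / len(speech_frames)
```

For example, if the enhanced signal is the clean signal scaled by 0.9, each speech frame has an SNR of 10·log10(1/0.01) = 20 dB, and frames not listed in `speech_frames` do not enter the average.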
26
Experimental results: Objective evaluation (MSE)
The MSE results are not consistent with the
preference outcomes: lower MSE values did not
correspond to higher preference. This indicates
that the MSE might not be a reliable measure for
assessing the performance of noise-estimation
algorithms, because
  1. it is sensitive to outlier values, and
  2. it treats noise-overestimation and
     noise-underestimation errors the same, even
     though overestimation distorts speech and
     underestimation leaves audible residual noise
27
Experimental results: Objective evaluation (LLR and
segmental SNR)
The segmental SNR values and the LLR values shown
in Table 3 were found to be more consistent with
the subjective evaluation results
28
Experimental results
Panel A: clean speech; Panel B: noisy speech;
Panel C: Martin (2001); Panel D: Cohen (2003);
Panel E: proposed method
29
Conclusions
  • The noise estimate was updated continuously in
    every frame using time-frequency smoothing
    factors calculated from the speech-presence
    probability in each frequency bin of the noisy
    speech spectrum
  • The speech-presence probability was estimated
    using the ratio of the noisy speech power
    spectrum to its local minimum
  • The noise estimate adapted faster than those of
    the compared methods in rapidly varying
    non-stationary noise environments