Probabilistic Inference of Speech Signals from Phaseless Spectrograms


1
Probabilistic Inference of Speech Signals from
Phaseless Spectrograms

Kannan Achan, Sam Roweis, Brendan Frey
University of Toronto

2
Time-Frequency Representation

The spectrogram is the most common representation of
speech:
a summary of short-time spectral features
contains useful features of a speech signal
shows the energy distribution across frequencies
over time

3
Short-time Spectral Analysis

m_k: magnitude of the Fourier transform of the
samples in the kth window
Windows overlap; we assume an overlap of n/2 samples
Use Hamming/Hanning windows to reduce boundary
effects.

$m_k = |F s_k|$, where F is the Fourier matrix and s_k
holds the samples in the kth frame
Spectrogram $M = [m_1, m_2, \ldots, m_K]$
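The definitions above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the window is applied to each frame before the FFT, the hop is n/2, trailing samples that do not fill a whole frame are dropped, and `np.fft.rfft` keeps only the n/2 + 1 non-redundant bins of F s_k for a real signal. The function name `spectrogram` is hypothetical.

```python
import numpy as np

def spectrogram(s, n=256, window="hamming"):
    """Magnitude spectrogram M whose kth column is m_k = |F (w * s_k)|.

    Frames have length n and overlap by n/2 samples; trailing samples that
    do not fill a whole frame are ignored.
    """
    hop = n // 2
    w = np.hamming(n) if window == "hamming" else np.hanning(n)
    num_frames = 1 + (len(s) - n) // hop
    # Stack the overlapping frames s_k as the columns of an n x K matrix.
    frames = np.stack([s[k * hop : k * hop + n] for k in range(num_frames)], axis=1)
    # rfft returns bins 0..n/2; the other half of F s_k is their complex conjugate.
    return np.abs(np.fft.rfft(frames * w[:, None], axis=0))

# Usage: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
M = spectrogram(np.sin(2 * np.pi * 440 * t))
print(M.shape)              # (129, 61)
print(np.argmax(M[:, 10]))  # 14, i.e. 14 * 8000 / 256 = 437.5 Hz, the bin nearest 440 Hz
```

Note that the magnitude discards the sign: `spectrogram(-s)` equals `spectrogram(s)`.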
4
Applications and the Bottleneck

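Time Scale Modification
Lengthen/shorten speech without altering its
frequency content
Idea: subsample the columns of the spectrogram

[Figure: spectrogram with every other column kept, giving 2x faster speech; the corresponding waveform is unknown (?)]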
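A minimal sketch of the column-subsampling idea, assuming the spectrogram is stored as a NumPy array with one column per frame (bins x frames). Dropping every other column halves the number of frames, so speech resynthesized from it would be 2x faster, while each remaining column keeps its frequency content. The `slow_down` counterpart, which simply repeats columns, is a hypothetical illustration of lengthening rather than a method from the slides. Neither function produces a waveform.

```python
import numpy as np

def speed_up(M, factor=2):
    """Keep every `factor`-th column (frame) of a bins x frames spectrogram."""
    return M[:, ::factor]

def slow_down(M, factor=2):
    """Lengthen by repeating each column `factor` times (crude, for illustration)."""
    return np.repeat(M, factor, axis=1)

M = np.abs(np.random.default_rng(0).standard_normal((129, 61)))
print(speed_up(M).shape)   # (129, 31)
print(slow_down(M).shape)  # (129, 122)
```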
5
Applications and the Bottleneck

Speech de-noising
Spectral subtraction, Algonquin, ...
Idea: de-noise the spectrogram

[Figure: noisy waveform -> noisy spectrogram -> de-noised spectrogram -> ? -> de-noised waveform]
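The spectral subtraction idea can be sketched in a few lines. This assumes (hypothetically) that the first few frames contain only noise, so their average magnitude serves as a per-bin noise estimate, and it clips negative results to zero; practical systems track the noise more carefully, and Algonquin is not shown. The output is again a magnitude-only spectrogram with no phase.

```python
import numpy as np

def spectral_subtraction(M_noisy, noise_frames=5, floor=0.0):
    """Subtract a per-bin noise magnitude estimate from every frame.

    The noise estimate is the mean of the first `noise_frames` columns,
    which are assumed to be speech-free. Results below `floor` are clipped.
    """
    noise_est = M_noisy[:, :noise_frames].mean(axis=1, keepdims=True)
    return np.maximum(M_noisy - noise_est, floor)

# Toy spectrogram: constant noise level 1.0, "speech" energy added from frame 5 on.
M_noisy = np.ones((4, 10))
M_noisy[:, 5:] += 2.0
M_clean = spectral_subtraction(M_noisy)
print(M_clean[0])  # [0. 0. 0. 0. 0. 2. 2. 2. 2. 2.]
```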
6
What is the Bottleneck?
Techniques that modify the spectrogram need a
reliable procedure to reconstruct the underlying
time domain signal

[Figure: input speech waveform S -> windowed FFT (F) -> abs(.) gives the spectrogram and angle(.) gives the phase; time scale modification, de-noising, ... produce a modified spectrogram whose corresponding phase is unknown (?), so the windowed IFFT back to the output speech waveform S cannot be applied directly]
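The pipeline in the figure can be made concrete with a short NumPy sketch (an illustration, not the authors' implementation). The inverse below is the least-squares overlap-add of Griffin and Lim: each inverse-FFT frame is multiplied by the window again, summed, and divided by the summed squared window. With the true phase the waveform is recovered exactly; with zero phase, a naive stand-in for the unknown phase of a modified spectrogram, it is not. The helper names `stft` and `istft` and the parameter values are assumptions.

```python
import numpy as np

N_FFT, HOP = 256, 128
WIN = np.hamming(N_FFT)

def stft(s):
    """Windowed FFT of overlapping frames; returns an (N_FFT//2 + 1) x K array."""
    K = 1 + (len(s) - N_FFT) // HOP
    frames = np.stack([s[k * HOP : k * HOP + N_FFT] for k in range(K)], axis=1)
    return np.fft.rfft(frames * WIN[:, None], axis=0)

def istft(X):
    """Least-squares overlap-add inverse (windowed IFFT)."""
    K = X.shape[1]
    length = (K - 1) * HOP + N_FFT
    num, den = np.zeros(length), np.zeros(length)
    frames = np.fft.irfft(X, n=N_FFT, axis=0)  # the windowed frames w * s_k
    for k in range(K):
        num[k * HOP : k * HOP + N_FFT] += WIN * frames[:, k]
        den[k * HOP : k * HOP + N_FFT] += WIN ** 2
    return num / den  # a Hamming window never reaches zero, so den > 0

rng = np.random.default_rng(0)
s = rng.standard_normal(60 * HOP + N_FFT)  # length chosen so 61 frames tile it exactly
X = stft(s)
magnitude, phase = np.abs(X), np.angle(X)
s_true_phase = istft(magnitude * np.exp(1j * phase))
s_zero_phase = istft(magnitude)  # what we get if the phase is simply unknown
print(np.allclose(s, s_true_phase))  # True
print(np.allclose(s, s_zero_phase))  # False
```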
7
Is the Problem Solvable?

Goal: perform the frequency-to-time transformation
Instead of using magnitude and phase -> signal,
use multiple magnitude constraints

Recall the overlapping windows
Every s_i contributes to 2 columns of the
spectrogram
So there are 2 constraints for every s_i, both due to
magnitude; there is hope
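The counting argument can be checked numerically with a small sketch. It assumes a real signal of N samples, frames of length n with hop n/2 and no padding, and counts n/2 + 1 magnitudes per frame (the remaining FFT bins of a real signal are redundant). Interior samples are covered by exactly two frames, and for long signals the number of magnitudes is N + 2N/n - n/2 - 1, slightly more than the N unknowns. This only counts equations; it is not a proof of uniqueness, and s and -s always share a spectrogram. The function name is hypothetical.

```python
import numpy as np

def count_constraints(N, n):
    """Return per-sample frame coverage, number of frames K and magnitude count."""
    hop = n // 2
    K = 1 + (N - n) // hop
    coverage = np.zeros(N, dtype=int)
    for k in range(K):
        coverage[k * hop : k * hop + n] += 1
    return coverage, K, K * (n // 2 + 1)

coverage, K, C = count_constraints(N=1024, n=256)
print(K, C, 1024)                       # 7 903 1024: edge frames leave too few equations
print(set(coverage[128:-128].tolist())) # {2}: every interior sample is in two frames
_, K_long, C_long = count_constraints(N=32768, n=256)
print(K_long, C_long, 32768)            # 255 32895 32768: magnitudes outnumber unknowns
```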

8
Standard Method: Griffin and Lim

Alternate until convergence:
estimate the phase given the hypothesized time
domain signal and the observed spectrum
estimate the time domain signal given the observed
spectrum and a hypothesized phase

Issues:
Inconsistent estimates of phase and signal at any
iteration
convergence
poor perceptual quality
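For comparison, the alternating scheme above can be sketched in NumPy. This is a compact illustration of the Griffin-Lim iteration under assumed settings (Hamming window of length 256, hop 128, random initial waveform, least-squares overlap-add inverse), not a reference implementation; the names are hypothetical. Each pass takes the phase of the current signal, combines it with the observed magnitudes, and inverts; the recorded error is the distance between the current signal's spectrogram and the observed one.

```python
import numpy as np

N_FFT, HOP = 256, 128
WIN = np.hamming(N_FFT)

def stft(s):
    K = 1 + (len(s) - N_FFT) // HOP
    frames = np.stack([s[k * HOP : k * HOP + N_FFT] for k in range(K)], axis=1)
    return np.fft.rfft(frames * WIN[:, None], axis=0)

def istft(X):
    """Least-squares overlap-add: the signal whose STFT is closest to X."""
    K = X.shape[1]
    length = (K - 1) * HOP + N_FFT
    num, den = np.zeros(length), np.zeros(length)
    frames = np.fft.irfft(X, n=N_FFT, axis=0)
    for k in range(K):
        num[k * HOP : k * HOP + N_FFT] += WIN * frames[:, k]
        den[k * HOP : k * HOP + N_FFT] += WIN ** 2
    return num / den

def griffin_lim(M, n_iter=50, seed=0):
    """Alternate phase estimation and signal estimation for magnitudes M."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((M.shape[1] - 1) * HOP + N_FFT)  # initial guess
    errors = []
    for _ in range(n_iter):
        X = stft(s)
        errors.append(np.linalg.norm(np.abs(X) - M))
        phase = np.angle(X)                # phase given the hypothesized signal
        s = istft(M * np.exp(1j * phase))  # signal given observed magnitude + phase
    return s, errors

t = np.arange(60 * HOP + N_FFT) / 8000
x = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
s_hat, errors = griffin_lim(np.abs(stft(x)))
print(errors[0] > errors[-1])  # True: the spectrogram mismatch shrinks
```

Note that the phase and signal estimates are only made consistent with each other at the end of each pass, which is the inconsistency listed under Issues.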

9
Probabilistic Inference of Time-domain Speech
Signals

Given a spectrogram M, the goal is to infer the
underlying time domain signal S
Suppose we have prior knowledge about the speaker
Idea: a prior distribution P(S)
Rth-order auto-regressive model

10
Probabilistic Inference of Time-domain Speech
Signals

Probability model for spectrogram M and speech s

$P(s, M) = P(s) \prod_k P(m_k \mid s)$

P(s): prior (AR model)
s: time domain signal (to be inferred)
P(m_k | s): likelihood function
M = [m_1, m_2, ...]: observed spectrogram
11
Probability Model
Prior P(s): Rth-order auto-regressive model

Each sample is predicted as a linear combination
of the previous R samples (an all-pole filter).
The AR parameters a_r are known in advance.

Likelihood P(M|s): the probability of observing a
spectrogram M given a time domain signal s
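A minimal sketch of this prior, writing the AR model as s_t = sum_{r=1}^{R} a_r s_{t-r} + e_t. Two assumptions are not on the slide: the prediction errors e_t are taken to be Gaussian with variance `sigma2`, and the density is conditioned on the first R samples. The coefficients `a` play the role of the known a_r; the function names are hypothetical.

```python
import numpy as np

def ar_residuals(s, a):
    """Prediction errors e_t = s_t - sum_{r=1}^{R} a_r s_{t-r} for t = R, ..., N-1."""
    R, N = len(a), len(s)
    prediction = np.zeros(N - R)
    for r in range(1, R + 1):
        prediction += a[r - 1] * s[R - r : N - r]  # contribution of s_{t-r}
    return s[R:] - prediction

def ar_log_prior(s, a, sigma2=1.0):
    """log P(s) under Gaussian prediction errors, conditioned on the first R samples."""
    e = ar_residuals(s, a)
    return -0.5 * np.sum(e ** 2) / sigma2 - 0.5 * len(e) * np.log(2 * np.pi * sigma2)

# Usage: draw from a stable AR(2) model, then check that the residuals
# recover the driving noise.
rng = np.random.default_rng(0)
a = np.array([1.3, -0.6])
noise = rng.standard_normal(1000)
s = np.zeros(1000)
for t in range(1000):
    s[t] = noise[t] + sum(a[r - 1] * s[t - r] for r in (1, 2) if t - r >= 0)
print(np.allclose(ar_residuals(s, a), noise[2:]))                       # True
print(ar_log_prior(s, a) > ar_log_prior(rng.standard_normal(1000), a))  # True
```

With a 12th-order model, as in the experiments, `a` would simply hold 12 coefficients.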
12
Making the Signal Explicit in the Model
Simplify m_k(s) by introducing the Fourier
transform matrix F

Log probability is a quartic in the unknowns
s_i
Its derivative is a cubic in the unknowns
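The quartic/cubic claim can be illustrated with a sketch, under an assumption the slide does not spell out: a Gaussian likelihood on *squared* magnitudes, so the negative log-likelihood is proportional to E(s) = sum_{j,k} (M2[j,k] - |F(w * s_k)|_j^2)^2. Each |.|^2 is quadratic in s, so E is quartic and its gradient cubic. The prior is omitted, the 1/(2 sigma^2) scale is dropped, frames are tiny (n = 16) so the example runs instantly, and all names are hypothetical.

```python
import numpy as np

N_FFT = 16  # small frames so the example runs instantly (slides use 256)
HOP = N_FFT // 2
WIN = np.hamming(N_FFT)

def frames_of(s):
    """Overlapping frames s_k as the columns of an N_FFT x K matrix."""
    K = 1 + (len(s) - N_FFT) // HOP
    return np.stack([s[k * HOP : k * HOP + N_FFT] for k in range(K)], axis=1)

def energy(s, M2):
    """E(s) = sum_{j,k} (M2[j, k] - |F(w * s_k)|_j^2)^2, a quartic in s.

    M2 holds observed *squared* magnitudes (assumed likelihood form).
    """
    X = np.fft.rfft(frames_of(s) * WIN[:, None], axis=0)
    return np.sum((M2 - np.abs(X) ** 2) ** 2)

def energy_grad(s, M2):
    """Analytic gradient of E: cubic in s."""
    X = np.fft.rfft(frames_of(s) * WIN[:, None], axis=0)
    r = M2 - np.abs(X) ** 2
    # Within frame k: dE/ds_t = -4 w_t Re(sum_j r_j X_j exp(+2 pi i j t / n));
    # the sum over j = 0..n/2 equals n * ifft(r * X) zero-padded to length n.
    g_frames = -4 * WIN[:, None] * np.real(N_FFT * np.fft.ifft(r * X, n=N_FFT, axis=0))
    g = np.zeros_like(s)
    for k in range(g_frames.shape[1]):
        g[k * HOP : k * HOP + N_FFT] += g_frames[:, k]  # overlap-add frame gradients
    return g

rng = np.random.default_rng(0)
s_true = rng.standard_normal(8 * HOP + N_FFT)
M2 = np.abs(np.fft.rfft(frames_of(s_true) * WIN[:, None], axis=0)) ** 2
print(energy(s_true, M2), energy(-s_true, M2))  # both zero: the sign is not identifiable
```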

13
Inference - ICM

Iterated Conditional Modes
Iteratively select a variable and assign its MAP
estimate given the observations and the other variables
Guaranteed not to decrease P(S,M)
Issues:
Updates a single sample at a time
prone to poor local optima
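A sketch of one ICM sweep, assuming the same quartic energy as above (a Gaussian likelihood on squared magnitudes, prior omitted; an AR prior would add a quadratic term and keep the per-sample objective quartic). With all other samples fixed, E is an exact quartic in s_i, so five evaluations determine it via `np.polyfit`; its stationary points are roots of the cubic derivative (`np.roots`). Keeping the current value as a candidate guarantees the energy never increases. Energies are recomputed by brute force for clarity, not speed; names are hypothetical.

```python
import numpy as np

N_FFT = 16
HOP = N_FFT // 2
WIN = np.hamming(N_FFT)

def frames_of(s):
    K = 1 + (len(s) - N_FFT) // HOP
    return np.stack([s[k * HOP : k * HOP + N_FFT] for k in range(K)], axis=1)

def energy(s, M2):
    """Quartic mismatch between observed squared magnitudes M2 and those of s."""
    X = np.fft.rfft(frames_of(s) * WIN[:, None], axis=0)
    return np.sum((M2 - np.abs(X) ** 2) ** 2)

def icm_sweep(s, M2):
    """One ICM pass: set each s_i to the minimizer of E with all other samples fixed."""
    s = s.copy()
    offsets = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    for i in range(len(s)):
        x0 = s[i]
        values = []
        for x in x0 + offsets:
            s[i] = x
            values.append(energy(s, M2))
        coeffs = np.polyfit(x0 + offsets, values, 4)  # exact: E is quartic in s_i
        stationary = np.roots(np.polyder(coeffs))     # roots of the cubic derivative
        candidates = [x0] + [z.real for z in stationary if abs(z.imag) < 1e-8]
        scores = []
        for x in candidates:
            s[i] = x
            scores.append(energy(s, M2))
        s[i] = candidates[int(np.argmin(scores))]
    return s

rng = np.random.default_rng(0)
s_true = rng.standard_normal(8 * HOP + N_FFT)
M2 = np.abs(np.fft.rfft(frames_of(s_true) * WIN[:, None], axis=0)) ** 2
s = rng.standard_normal(len(s_true))
history = [energy(s, M2)]
for _ in range(5):
    s = icm_sweep(s, M2)
    history.append(energy(s, M2))
print(all(b <= a for a, b in zip(history, history[1:])))  # True: never increases
```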

14
Inference (joint optimization)

Directly search for max log P(S,M)
Jointly optimize s_1, s_2, ..., s_N using conjugate
gradients; this involves computing the gradient
$\partial \log P(S, M) / \partial s_i$ for every i

Algorithm makes global changes to the waveform
Avoids inconsistent phase estimates (phase is
implicit in the formulation)
Guaranteed to reduce the discrepancy between the
spectrogram of the estimated waveform and the
observed spectrogram
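A sketch of the joint optimization using SciPy's `minimize(method="CG")`, a nonlinear conjugate-gradient method, with an analytic gradient. The objective is -log P(s, M) up to constants under assumptions not stated on the slide: a Gaussian likelihood on squared magnitudes, Gaussian AR prediction errors, unit variances, hypothetical AR(2) coefficients `A` (the slides use a 12th-order model), and tiny frames (n = 16). All samples are updated together at every step.

```python
import numpy as np
from scipy.optimize import minimize

N_FFT = 16
HOP = N_FFT // 2
WIN = np.hamming(N_FFT)
A = np.array([1.3, -0.6])  # hypothetical AR(2) coefficients

def frames_of(s):
    K = 1 + (len(s) - N_FFT) // HOP
    return np.stack([s[k * HOP : k * HOP + N_FFT] for k in range(K)], axis=1)

def neg_log_posterior(s, M2, lik_var=1.0, ar_var=1.0):
    """-log P(s, M) up to constants: quartic likelihood term + quadratic AR term."""
    X = np.fft.rfft(frames_of(s) * WIN[:, None], axis=0)
    r = M2 - np.abs(X) ** 2
    e = s[2:] - A[0] * s[1:-1] - A[1] * s[:-2]  # AR(2) prediction errors
    return 0.5 * np.sum(r ** 2) / lik_var + 0.5 * np.sum(e ** 2) / ar_var

def neg_log_posterior_grad(s, M2, lik_var=1.0, ar_var=1.0):
    X = np.fft.rfft(frames_of(s) * WIN[:, None], axis=0)
    r = M2 - np.abs(X) ** 2
    # Likelihood part: per-frame cubic gradient, then overlap-add.
    g_frames = -2 * WIN[:, None] * np.real(N_FFT * np.fft.ifft(r * X, n=N_FFT, axis=0)) / lik_var
    g = np.zeros_like(s)
    for k in range(g_frames.shape[1]):
        g[k * HOP : k * HOP + N_FFT] += g_frames[:, k]
    # AR part: each error e_t depends on s_t, s_{t-1} and s_{t-2}.
    e = (s[2:] - A[0] * s[1:-1] - A[1] * s[:-2]) / ar_var
    g[2:] += e
    g[1:-1] -= A[0] * e
    g[:-2] -= A[1] * e
    return g

rng = np.random.default_rng(0)
s_true = np.zeros(8 * HOP + N_FFT)
noise = rng.standard_normal(len(s_true))
for t in range(len(s_true)):  # draw a toy signal from the AR(2) prior
    s_true[t] = noise[t] + sum(A[r - 1] * s_true[t - r] for r in (1, 2) if t - r >= 0)
M2 = np.abs(np.fft.rfft(frames_of(s_true) * WIN[:, None], axis=0)) ** 2

s0 = rng.standard_normal(len(s_true))
result = minimize(neg_log_posterior, s0, args=(M2,), jac=neg_log_posterior_grad,
                  method="CG", options={"maxiter": 500})
print(result.fun < neg_log_posterior(s0, M2))  # True
```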

15
Experiments

Setup:
Hamming window of length 256
Overlap of 128 samples
12th-order auto-regressive model as prior
Data: randomly chosen utterances from the NIST/WSJ
database
Evaluation:
Perceptual quality of the estimated signal
Audio demonstrations: http://www.psi.utoronto.ca/kannan/spectrogram/
SNR analysis
Application: time scale modification

16
Results
[Figure panels: original signal; input; Griffin-Lim; our algorithm (CG)]
17
Results: gain in dB

Phase is not explicit in the model
Use an approximation to SNR

Application: time scale modification (audio
demonstration)
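The slides do not say which SNR approximation was used. One simple possibility, shown here purely as a hypothetical sketch, is to compare spectrograms instead of waveforms: a waveform and its negation share a spectrogram, so a time-domain SNR can penalize a reconstruction that the model cannot distinguish from the original. The window settings follow the experimental setup; the function names are assumptions.

```python
import numpy as np

def spectrogram(s, n=256):
    hop = n // 2
    w = np.hamming(n)
    K = 1 + (len(s) - n) // hop
    frames = np.stack([s[k * hop : k * hop + n] for k in range(K)], axis=1)
    return np.abs(np.fft.rfft(frames * w[:, None], axis=0))

def snr_db(reference, estimate):
    """10 log10(||reference||^2 / ||reference - estimate||^2), for any array shape."""
    return 10 * np.log10(np.sum(reference ** 2) / np.sum((reference - estimate) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
print(round(snr_db(x, -x), 2))                       # -6.02 dB in the time domain
print(np.allclose(spectrogram(x), spectrogram(-x)))  # True: identical spectrograms
M = spectrogram(x)
print(round(snr_db(M, 1.1 * M), 2))                  # 20.0 dB for a 10% magnitude error
```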
18
Variational Inference

Current work: find the posterior distribution
P(S|M)
Exact inference is intractable!
Mean field inference: approximate it using a fully
factored distribution $Q(s) = \prod_i Q(s_i)$

Goal: infer the mean and variance of every time sample

Minimize the KL divergence between Q(s) and P(S,M)

19
Mean Field Inference

G(µ, σ) accounts for uncertainty in S. Estimates
with high uncertainty don't influence other
estimates.
If we set σ → 0, the first (entropy) and third terms
vanish; this is equivalent to our earlier
formulation
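One ingredient of such a free energy can be checked directly. Assuming each factor of Q is Gaussian, s_i ~ N(mu_i, sigma_i^2) independently (an assumption consistent with inferring a mean and variance per sample), the expected squared magnitude of one Fourier coefficient f^T s is |f^T mu|^2 + sum_i |f_i|^2 sigma_i^2. The variance contributes an extra term, and with sigma = 0 only the deterministic |f^T mu|^2 of the earlier formulation remains. This Monte Carlo check is an illustration, not the authors' mean-field update; names are hypothetical.

```python
import numpy as np

def expected_power(f, mu, var):
    """E_Q[|f^T s|^2] for independent s_i ~ N(mu_i, var_i) and complex weights f."""
    return np.abs(f @ mu) ** 2 + np.sum(np.abs(f) ** 2 * var)

n = 8
f = np.hamming(n) * np.exp(-2j * np.pi * 2 * np.arange(n) / n)  # windowed DFT row, bin 2
rng = np.random.default_rng(0)
mu = rng.standard_normal(n)
var = np.full(n, 0.5)

samples = mu + np.sqrt(var) * rng.standard_normal((200_000, n))
monte_carlo = np.mean(np.abs(samples @ f) ** 2)
print(np.isclose(expected_power(f, mu, var), monte_carlo, rtol=0.02))   # True
print(np.isclose(expected_power(f, mu, 0.0 * var), np.abs(f @ mu) ** 2))  # True: sigma -> 0
```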

20
Conclusion

Probabilistic model of the time domain signal and
spectrogram
Takes advantage of prior information, if
available
Joint optimization avoids poor local optima

Can easily be extended to
other types of prior information
deal with missing or noisy spectrogram frames

21
References

Griffin, D. W. and Lim, J. S. Signal estimation
from modified short-time Fourier transform. IEEE
Trans. on Acoustics, Speech and Signal
Processing, 32(2), 1984.
Roucos, S. and Wilgus, A. M. High quality
time-scale modification for speech. Proceedings
of the IEEE International Conference on Acoustics,
Speech, and Signal Processing, 1985,
pp. 493-496.
Green, P., Barker, J., Cooke, M. and Josifovski, L.
Handling missing and unreliable information in
speech recognition. AISTATS 8, 2001.
Frey, B. J., Kristjansson, T., Deng, L. and Acero, A.
Learning dynamic noise models from noisy speech
for robust speech recognition. Advances in Neural
Information Processing Systems (NIPS), 2001.
