Probabilistic Inference of Speech Signals from Phaseless Spectrograms

1
Probabilistic Inference of Speech Signals from
Phaseless Spectrograms
  • Kannan Achan, Sam Roweis, Brendan Frey
  • University of Toronto

2
Time-Frequency Representation
  • Spectrogram: the most common representation of
    speech
  • Summary of short-time spectral features
  • Contains useful features of a speech signal
  • Shows the energy distribution across frequencies
    over time

3
Short-time Spectral Analysis
  • m_k: magnitude of the Fourier transform of the
    samples in the kth window
  • Windows overlap; we assume an overlap of n/2
  • Use Hamming/Hanning windows to avoid boundary
    effects.

m_k = |F s_k|, where F is the Fourier matrix and
s_k holds the samples in the kth frame
Spectrogram M: the kth column is m_k
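
The definitions on this slide can be sketched in a few lines of NumPy. This is only an illustration: the function name `spectrogram` and the default frame length `n=256` are assumptions made here, and `np.fft.rfft` stands in for multiplication by the Fourier matrix F (it returns only the non-negative frequencies of a real frame).

```python
import numpy as np

def spectrogram(s, n=256):
    """Magnitude spectrogram M whose k-th column is m_k = |F (w * s_k)|."""
    hop = n // 2                      # consecutive frames overlap by n/2 samples
    w = np.hamming(n)                 # taper each frame to reduce boundary effects
    n_frames = 1 + (len(s) - n) // hop
    frames = np.stack([s[k * hop : k * hop + n] * w for k in range(n_frames)])
    # rfft plays the role of the Fourier matrix F for real-valued frames
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (frequencies, frames)

# Example: a pure tone concentrates its energy in one frequency row.
t = np.arange(4096)
M = spectrogram(np.sin(2 * np.pi * 32 / 256 * t))
```

The phase returned by the transform is discarded by `np.abs`, which is exactly the information the rest of the talk tries to recover.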
4
Applications and the Bottleneck
  • Time Scale Modification
  • Lengthen/shorten speech without altering
    frequency content
  • Idea: Subsample columns of the spectrogram

[Figure: spectrogram subsampled for 2x faster speech; the corresponding waveform is marked "?"]
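
As a rough sketch of the subsampling idea, assuming a spectrogram stored as a frequency-by-time array (the helper name `speed_up` and the random stand-in data are hypothetical):

```python
import numpy as np

def speed_up(M, factor=2):
    """Keep every `factor`-th column (frame) of a spectrogram M (freq x time)."""
    return M[:, ::factor]

M = np.random.default_rng(0).random((129, 30))   # stand-in spectrogram
M_fast = speed_up(M)                             # half as many frames
```

The result is a valid set of magnitudes but carries no phase, so it cannot be inverted with a plain inverse FFT.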
5
Applications and the Bottleneck
  • Speech de-noising
  • Spectral subtraction, Algonquin, ...
  • Idea: De-noise the spectrogram

[Figure: noisy waveform → noisy spectrogram → de-noised spectrogram → ? → de-noised waveform]
6
What is the Bottleneck?
Techniques that modify the spectrogram need a
reliable procedure to reconstruct the underlying
time domain signal.
[Diagram: the input speech waveform S passes through
a windowed FFT (F); abs(.) gives the spectrogram and
angle(.) gives the phase. After time scale
modification, denoising, ..., the modified
spectrogram has no corresponding phase, yet the
windowed IFFT needs one to produce the output speech
waveform S.]
7
Is the Problem Solvable?
  • Goal: perform the frequency-to-time transformation
  • Instead of using magnitude and phase → signal
  • Use multiple magnitude constraints
  • Recall the overlapping windows
  • Every s_i contributes to 2 columns in the
    spectrogram
  • 2 constraints for every s_i, both due to
    magnitude: there is hope
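
The counting argument can be checked numerically. The sketch below assumes a window length of 256, a hop of n/2 and a signal length of 1024 (all illustrative values) and counts how many frames cover each sample:

```python
import numpy as np

n, hop, N = 256, 128, 1024           # window length, hop of n/2, signal length
cover = np.zeros(N, dtype=int)
for start in range(0, N - n + 1, hop):
    cover[start:start + n] += 1      # each sample in the frame gains one constraint
interior = cover[hop:N - hop]        # everything except the first and last half-window
```

Every interior sample is covered by two frames; only the first and last n/2 samples are covered by one.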

8
Standard Method: Griffin and Lim
  • Alternate until convergence:
      • estimate the phase given the hypothesized
        time domain signal and the observed spectrum
      • estimate the time domain signal given the
        observed spectrum and a hypothesized phase
  • Issues:
      • inconsistent estimates of phase and signal at
        any iteration
      • convergence
      • poor perceptual quality
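
The alternation described above can be sketched as follows. This is a minimal NumPy illustration, not the reference implementation: the helper names `stft`, `istft` and `griffin_lim`, the random initial phase, and the iteration count are assumptions. Here the spectrogram is stored with one frame per row, and the inverse uses a window-weighted overlap-add.

```python
import numpy as np

def stft(x, n=256):
    hop, w = n // 2, np.hamming(n)
    k = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop : i * hop + n] * w for i in range(k)])
    return np.fft.rfft(frames, axis=1)             # shape: (frames, n//2 + 1)

def istft(X, n=256):
    """Window-weighted overlap-add inverse of `stft`."""
    hop, w = n // 2, np.hamming(n)
    frames = np.fft.irfft(X, n=n, axis=1)
    x = np.zeros(hop * (len(X) - 1) + n)
    norm = np.zeros_like(x)
    for i, f in enumerate(frames):
        x[i * hop : i * hop + n] += w * f
        norm[i * hop : i * hop + n] += w ** 2
    return x / norm                                # Hamming never reaches 0, so norm > 0

def griffin_lim(mag, n=256, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))   # random initial phase
    for _ in range(iters):
        x = istft(mag * phase, n)                  # signal given magnitude and phase
        phase = np.exp(1j * np.angle(stft(x, n)))  # phase given the current signal
    return istft(mag * phase, n)

# Example: try to recover a windowed tone from its magnitudes alone.
t = np.arange(2048)
x_true = np.sin(2 * np.pi * 0.05 * t) * np.hanning(2048)
mag = np.abs(stft(x_true))
x_hat = griffin_lim(mag)
```

At each iteration the combination `mag * phase` is generally not the transform of any real signal, which is the inconsistency listed under Issues.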

9
Probabilistic Inference of Time-domain Speech
Signals
  • Given a spectrogram M, the goal is to infer the
    underlying time domain signal S
  • Say we have prior knowledge about the speaker
  • Idea: prior distribution P(S)
  • Rth-order autoregressive model

10
Probabilistic Inference of Time-domain Speech
Signals
  • Probability model for spectrogram M and speech s

[Figure: graphical model. The prior (AR model) links
the time domain samples (to be inferred); the
likelihood function links them to the columns m of
the observed spectrogram.]
11
Probability Model
Prior P(s): Rth-order autoregressive model
  • each sample is predicted as a linear combination
    of the previous R samples (all-pole filter).
  • AR parameters (a_r) are known in advance.

Likelihood P(M|s): the probability of observing a
spectrogram M given a time domain signal s
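
The transcript does not preserve the formulas for the two terms, so the following is a sketch under stated assumptions rather than the authors' exact model: both terms are taken to be Gaussian, the likelihood compares energy spectra (squared magnitudes), which is consistent with the remark elsewhere in the talk that the log probability is quartic in the samples, and the noise scales `rho` and `sigma` are hypothetical parameters. Additive constants are dropped, and the spectrogram is stored with one frame per row.

```python
import numpy as np

def frames_of(s, n=256):
    hop, w = n // 2, np.hamming(n)
    k = 1 + (len(s) - n) // hop
    return np.stack([s[i * hop : i * hop + n] * w for i in range(k)])

def log_prior(s, a, rho=1.0):
    """AR(R) prior: s_t is predicted from the previous R samples with weights a."""
    R = len(a)
    # pred[j] = sum_r a[r] * s[t-1-r] for t = R + j
    pred = sum(a[r] * s[R - 1 - r : len(s) - 1 - r] for r in range(R))
    return -np.sum((s[R:] - pred) ** 2) / (2 * rho ** 2)

def log_likelihood(s, M, sigma=1.0, n=256):
    """Gaussian penalty on the gap between observed and predicted energy."""
    energy = np.abs(np.fft.rfft(frames_of(s, n), axis=1)) ** 2
    return -np.sum((M ** 2 - energy) ** 2) / (2 * sigma ** 2)

def log_joint(s, M, a, rho=1.0, sigma=1.0):
    return log_prior(s, a, rho) + log_likelihood(s, M, sigma)

# Example: a constant signal is perfectly predicted by the AR(1) weight a = [1].
s = np.ones(512)
a = np.array([1.0])
M = np.abs(np.fft.rfft(frames_of(s), axis=1))     # its own magnitude spectrogram
best = log_joint(s, M, a)
worse = log_joint(2 * s, M, a)                    # wrong energy is penalized
```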
12
Making the Signal Explicit in the Model
Simplify m_k(s) by introducing the Fourier
transform matrix, F
  • Log probability is a quartic in the unknowns,
    s_i
  • Derivative is a cubic in the unknowns
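
To make the quartic and cubic structure concrete, here is a sketch for a single frame. It assumes the mismatch is measured between energy spectra, sum_f (|F(w s)|_f^2 - m_f^2)^2, which is one form that yields a quartic; the function name `cost_and_grad` is hypothetical. The analytic gradient is compared against central finite differences as an independent check.

```python
import numpy as np

def cost_and_grad(s, m, w):
    """Quartic mismatch for one frame and its cubic gradient w.r.t. the samples."""
    n = len(s)
    X = np.fft.fft(w * s)                 # F applied to the windowed frame
    e = np.abs(X) ** 2 - m ** 2           # energy mismatch per frequency
    cost = np.sum(e ** 2)                 # degree 4 in s
    # d|X_f|^2/dx_j = 2 Re(X_f exp(+2 pi i f j / n)), summed with an inverse FFT
    grad = 4 * n * w * np.real(np.fft.ifft(e * X))   # degree 3 in s
    return cost, grad

rng = np.random.default_rng(0)
n = 16
w = np.hamming(n)
m = np.abs(np.fft.fft(w * rng.standard_normal(n)))   # observed magnitudes
s = rng.standard_normal(n)                            # current guess
cost, grad = cost_and_grad(s, m, w)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (cost_and_grad(s + eps * d, m, w)[0] - cost_and_grad(s - eps * d, m, w)[0]) / (2 * eps)
    for d in np.eye(n)
])
```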

13
Inference - ICM
  • Iterative Conditional Modes
  • Iteratively select a variable and assign it the
    MAP estimate given the observations and the
    other variables
  • Guaranteed to increase P(S,M)
  • Issues:
      • updates a single sample at a time
      • prone to poor local optima

14
Inference (joint optimization)
  • Directly search for max log P(S,M)
  • Jointly optimize s_1, s_2, ..., s_N using
    conjugate gradients; this involves computing the
    derivative of log P(S,M) with respect to each s_i
  • Algorithm makes global changes to the waveform
  • Avoids inconsistent phase estimates (phase is
    implicit in the formulation)
  • Guaranteed to reduce the discrepancy between the
    spectrogram of the estimated waveform and the
    observed spectrogram
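
A small end-to-end sketch of joint optimization follows. Several things here are assumptions made for illustration: the objective is an energy-spectrum mismatch plus a weighted AR prediction error (weight `lam`), the window is shortened to 64 samples to keep the example fast, the optimizer is a basic Polak-Ribiere nonlinear conjugate gradient with backtracking line search, and the AR(2) weights are chosen so that they predict a pure sinusoid exactly.

```python
import numpy as np

N_WIN, HOP = 64, 32
W = np.hamming(N_WIN)

def neg_log_joint(s, M, a, lam=1.0):
    """Return (cost, gradient): spectrogram mismatch plus AR prediction error."""
    grad = np.zeros_like(s)
    cost = 0.0
    for k, m in enumerate(M):                        # likelihood term, frame by frame
        sl = slice(k * HOP, k * HOP + N_WIN)
        X = np.fft.fft(W * s[sl])
        e = np.abs(X) ** 2 - m ** 2
        cost += np.sum(e ** 2)
        grad[sl] += 4 * N_WIN * W * np.real(np.fft.ifft(e * X))
    kernel = np.concatenate(([1.0], -a))             # residual r_t = s_t - sum_r a_r s_{t-r}
    r = np.convolve(s, kernel, mode="valid")
    cost += lam * np.sum(r ** 2)
    grad += 2 * lam * np.convolve(r, kernel[::-1], mode="full")
    return cost, grad

def conjugate_gradient(f, s0, iters=100):
    s = s0.copy()
    cost, g = f(s)
    d = -g
    for _ in range(iters):
        if g @ d >= 0:                               # not a descent direction: restart
            d = -g
        step = 1.0
        while True:                                  # backtracking (Armijo) line search
            new_cost, new_g = f(s + step * d)
            if new_cost <= cost + 1e-4 * step * (g @ d) or step < 1e-12:
                break
            step *= 0.5
        if new_cost >= cost:                         # no further progress
            break
        beta = max(0.0, new_g @ (new_g - g) / (g @ g))   # Polak-Ribiere, clipped at 0
        s, cost = s + step * d, new_cost
        d = -new_g + beta * d
        g = new_g
    return s, cost

omega = 2 * np.pi * 4 / N_WIN
s_true = np.sin(omega * np.arange(256))
a = np.array([2 * np.cos(omega), -1.0])              # AR(2) weights that predict a sinusoid
M = np.stack([np.abs(np.fft.fft(W * s_true[k * HOP : k * HOP + N_WIN])) for k in range(7)])
s0 = 0.1 * np.random.default_rng(0).standard_normal(256)
f = lambda s: neg_log_joint(s, M, a)
cost0 = f(s0)[0]
s_hat, cost_hat = conjugate_gradient(f, s0)
```

Every update moves all samples at once, and the line search only accepts steps that lower the cost, so the combined mismatch never increases. Like any local method, it can still stop at a local optimum.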

15
Experiments
  • Setup:
      • Hamming window of length 256
      • Overlap of 128 samples
      • 12th-order autoregressive model as prior
  • Data: randomly chosen utterances from the NIST/WSJ
    database.
  • Evaluation:
      • Perceptual quality of sound in the estimated
        signal. Audio demonstrations:
        http://www.psi.utoronto.ca/kannan/spectrogram/
      • SNR analysis
  • Application: time scale modification

16
Results
[Figure: the original signal (input), the Griffin-Lim reconstruction, and our algorithm (CG)]
17
Results in dB gain
  • Phase not implicit in the model
  • Use an approximation to SNR

Application: time scale modification (audio
demonstration)
18
Variational Inference
  • Current work: find the posterior distribution
    P(S|M)
  • Exact inference is intractable!
  • Mean field inference: approximate using a fully
    factored distribution
  • Goal: infer the mean and variance of every time
    sample
  • Minimize the KL divergence between Q(s) and P(S,M)

19
Mean Field Inference
  • G(µ, ?) accounts for uncertainty in S. Estimates
    with high uncertainty don't influence other
    estimates.
  • If we set ? to 0, the first (entropy) and third
    terms vanish; this is equivalent to our earlier
    formulation

20
Conclusion
  • Probabilistic model of the time domain signal and
    spectrogram
  • Takes advantage of prior information, if
    available
  • Joint optimization avoids poor local optima
  • Can easily be extended to:
      • other types of prior information
      • deal with missing or noisy spectrogram frames

21
References
  • Griffin, D. W. and Lim, J. S. Signal estimation
    from modified short-time Fourier transform. IEEE
    Trans. on Acoustics, Speech and Signal
    Processing, 32(2), 1984.
  • Roucos, S. and Wilgus, A. M. High quality
    time-scale modification for speech. Proceedings
    of the International Conference on Acoustics,
    Speech, and Signal Processing, IEEE, 1985,
    493-496.
  • Green, P., Barker, J., Cooke, M., and
    Josifovski, L. Handling missing and unreliable
    information in speech recognition. AISTATS 8,
    2001.
  • Frey, B. J., Kristjansson, T., Deng, L., and
    Acero, A. Learning dynamic noise models from
    noisy speech for robust speech recognition.
    Advances in Neural Information Processing
    Systems (NIPS), 2001.