Reconstruction of Speech from Whispers

Provided by: csip9

1
Reconstruction of Speech from Whispers
  • Robert Morris

2
Introduction
  • Synthesize normal-sounding speech from whispered
    speech
  • Different applications
    • Want to avoid being overheard
    • Normal speech often inappropriate
    • Voice prosthesis
  • Approach taken
    • Create model of normal speech
    • Estimate parameters given observed whispers

3
Whisper Differences
From Flanagan (1972)
4
MELP Model
[Block diagram. Labels: Inverse DFT, Shaping Filter, Excitation Generation, Shaping Filter, Noise Generator, LPC Synthesis Filter, Scale, Speech]

5
MELP Model-Whispered
[Block diagram. Labels: Excitation Generation, Noise Generator, LPC Synthesis Filter, Scale, Speech, Spectrum, Gain]
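
The whispered signal path named on this slide (noise generator, LPC synthesis filter, scale) can be sketched roughly as below. This is an illustration only, not the presenter's implementation: the function name, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the 180-sample frame length are all assumptions.

```python
import random

def synthesize_whisper_frame(lpc, gain, n_samples, seed=0):
    """Filter white Gaussian noise through the all-pole filter 1/A(z), then scale.

    lpc holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (assumed convention).
    """
    rng = random.Random(seed)
    history = [0.0] * len(lpc)  # previous filter outputs, most recent first
    out = []
    for _ in range(n_samples):
        excitation = rng.gauss(0.0, 1.0)  # noise generator
        # All-pole recursion: y[n] = e[n] - sum_k a_k * y[n-k]
        y = excitation - sum(a * h for a, h in zip(lpc, history))
        history = [y] + history[:-1]
        out.append(gain * y)  # scale by the frame gain
    return out

# Hypothetical one-pole example; 180 samples is an assumed frame length.
frame = synthesize_whisper_frame([-0.9], gain=0.5, n_samples=180)
```

The gain is applied after the filter, so changing it rescales the frame without altering the spectral envelope set by the LPC coefficients.
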
6
Linear Prediction Spectral Model
  • Short-time stationary autoregressive signal
  • Transform into Line Spectrum Pairs (LSP)
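
A minimal sketch of the two steps on this slide, under stated assumptions: the autoregressive coefficients are fitted with the Levinson-Durbin recursion on an autocorrelation sequence, and the LSPs are found as the unit-circle roots of the sum and difference polynomials P(z) and Q(z). The function names and the grid-plus-bisection root search are illustrative choices, not taken from the slides.

```python
import math

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a_1 z^-1 + ... from autocorrelation r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)  # prediction error power
    return a, err

def bisect_root(f, lo, hi, iters=60):
    """Refine a root of f bracketed by [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def lpc_to_lsp(a, grid=512):
    """a = [1, a_1, ..., a_p]; returns LSP frequencies in radians, sorted, in (0, pi)."""
    ext = list(a) + [0.0]
    rev = ext[::-1]
    p_coef = [x + y for x, y in zip(ext, rev)]  # symmetric polynomial P(z)
    q_coef = [x - y for x, y in zip(ext, rev)]  # antisymmetric polynomial Q(z)
    n = len(ext) - 1  # degree p + 1

    # On the unit circle each polynomial reduces to a real function of frequency.
    def p_real(w):
        return sum(c * math.cos((n / 2 - k) * w) for k, c in enumerate(p_coef))

    def q_real(w):
        return sum(c * math.sin((n / 2 - k) * w) for k, c in enumerate(q_coef))

    lsp = []
    for f in (p_real, q_real):
        prev_w = math.pi / grid
        prev_v = f(prev_w)
        for i in range(2, grid):  # open interval skips the trivial roots at 0 and pi
            w = math.pi * i / grid
            v = f(w)
            if v == 0.0:
                lsp.append(w)
            elif prev_v * v < 0:
                lsp.append(bisect_root(f, prev_w, w))
            prev_w, prev_v = w, v
    return sorted(lsp)

# Hypothetical second-order example.
coeffs, _ = levinson_durbin([1.0, 0.9, 0.81], 2)
lsp = lpc_to_lsp([1.0, -1.0, 0.5])
```

The grid search can miss roots that lie closer together than one grid step, so a finer grid may be needed for high-order or sharply resonant filters.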

7
Spectral Differences
  • Excited by turbulent airflow
    • Different spectrum than glottal pulse
    • Modeled by random noise → parameter estimates
      are also random
  • Altered vocal tract resonances
    • Vocal folds held open
    • Coupling with trachea and lungs
    • Different formant locations and bandwidths

8
Formant Bias
  • Resonances of vocal tract changed by tracheal
    coupling

9
Formant Shifting
  • Approximate linear relationship between LSPs and
    formant frequencies

[Block diagram. Labels: Compute Jacobians, f, fM, Formant Shift, Alter Bandwidth]
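
One way to read the approximately linear relationship stated above is as a first-order expansion around the current frame. The notation here is assumed for illustration and does not come from the slides: ω is the LSP vector, f the vector of formant frequencies, J the Jacobian of f with respect to ω, and f_target the desired formant positions. The minimum-norm pseudoinverse is one possible choice when there are fewer formants than LSPs.

```latex
\begin{align}
\Delta \mathbf{f} &\approx \mathbf{J}\,\Delta\boldsymbol{\omega},
  & J_{ij} &= \frac{\partial f_i}{\partial \omega_j} \\
\boldsymbol{\omega}' &= \boldsymbol{\omega}
  + \mathbf{J}^{+}\,(\mathbf{f}_{\mathrm{target}} - \mathbf{f}),
  & \mathbf{J}^{+} &= \mathbf{J}^{\top}\,(\mathbf{J}\mathbf{J}^{\top})^{-1}
\end{align}
```
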
10
Estimation Variance
  • Signal is noise excited
    • Spectral estimates are noisy
  • Estimation variance computed
    • Standard deviation of gain ≈ 1.8 dB

11
LSP Smoothing
  • Dynamic linear model of LSPs
  • Linear solution: Kalman filter
    • Very little smoothing
  • Nonlinear filter: median filter
    • Multiple frame delay introduced
  • Non-Gaussian model of dynamics
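
The two smoothers named above can be sketched on a single LSP track as follows. This assumes a scalar random-walk state model with hypothetical noise variances `q` (process) and `r` (observation); the slides do not give the actual model or its parameters.

```python
from statistics import median

def kalman_smooth_track(obs, q, r):
    """Scalar Kalman filter for x_n = x_{n-1} + w_n, y_n = x_n + v_n."""
    x, p = obs[0], r  # initialise from the first observation
    out = [x]
    for y in obs[1:]:
        p = p + q                # predict: variance grows by the process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (y - x)      # update with the innovation
        p = (1.0 - k) * p
        out.append(x)
    return out

def median_filter(obs, width=5):
    """Centered median filter; it needs width // 2 future frames per output."""
    half = width // 2
    out = []
    for i in range(len(obs)):
        lo, hi = max(0, i - half), min(len(obs), i + half + 1)
        out.append(median(obs[lo:hi]))
    return out

# Hypothetical LSP track (radians) with one outlier frame.
track = [0.3, 0.3, 0.3, 0.9, 0.3, 0.3, 0.3]
kalman_out = kalman_smooth_track(track, q=0.01, r=0.01)
median_out = median_filter(track, width=5)
```

The centered window is what introduces the multiple-frame delay: each output depends on frames that have not yet arrived in a real-time system.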

12
Gaussian Mixture Model
  • First LSP residual statistics

13
Gaussian Sum Filtering
  • Given
  • Also possible to perform smoothing

14
Gaussian Sum Filtering
  • Mixture history: Mix_n, Mix_{n-1}
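
A simplified sketch of one Gaussian sum filtering step for a scalar state, assuming a random-walk model whose process noise is a Gaussian mixture (for example, one fitted to LSP residuals). All names and parameters are hypothetical. This version collapses the posterior to a single Gaussian after every frame, which is a simplification; keeping a mixture history over several frames, as the slide indicates, would retain more components.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_sum_step(x, p, y, mixture, r):
    """One filtering step.

    x, p    : prior mean and variance of the state
    y       : new observation, y = state + N(0, r)
    mixture : list of (weight, mean, var) describing the process noise
    Returns the collapsed posterior mean, variance, and component weights.
    """
    comps = []
    for c, m, s2 in mixture:
        pred_mean = x + m            # predict under this component
        pred_var = p + s2
        innov_var = pred_var + r
        like = gaussian_pdf(y, pred_mean, innov_var)
        k = pred_var / innov_var     # per-component Kalman gain
        post_mean = pred_mean + k * (y - pred_mean)
        post_var = (1.0 - k) * pred_var
        comps.append((c * like, post_mean, post_var))
    total = sum(w for w, _, _ in comps)
    weights = [w / total for w, _, _ in comps]
    # Moment-match the mixture posterior to a single Gaussian.
    mean = sum(w * m for w, (_, m, _) in zip(weights, comps))
    var = sum(w * (v + (m - mean) ** 2) for w, (_, m, v) in zip(weights, comps))
    return mean, var, weights

# Hypothetical two-component noise model: mostly small changes, occasional jumps.
noise_model = [(0.9, 0.0, 0.01), (0.1, 0.0, 1.0)]
mean, var, weights = gaussian_sum_step(0.0, 0.01, 2.0, noise_model, r=0.01)
```

With a single-component mixture this reduces to an ordinary Kalman update, which gives a quick sanity check on the arithmetic.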

15
Prosodic Generation
  • Pitch and voicing not in whispers
  • Voicing
    • Use constant voicing decision
    • 0-3 kHz voiced, 3-4 kHz unvoiced
  • Pitch
    • Pitch and intensity are correlated
    • Set pitch proportional to the gain parameter
    • Smooth pitch parameter to produce more realistic
      contours
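
The prosody rules above can be sketched as follows, under several assumptions that are not in the slides: gain is taken in dB, the proportionality constant maps a hypothetical reference gain `ref_db` to `base_hz`, smoothing is a one-pole recursive filter with coefficient `alpha`, and the voicing bands are the five bandpass bands commonly used in MELP coders.

```python
# Assumed band edges in Hz for five MELP-style voicing bands.
MELP_BANDS_HZ = [(0, 500), (500, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]

def constant_voicing(cutoff_hz=3000):
    """Mark every band at or below the cutoff as voiced, the rest unvoiced."""
    return [hi <= cutoff_hz for _, hi in MELP_BANDS_HZ]

def pitch_from_gain(gains_db, base_hz=100.0, ref_db=60.0, alpha=0.8):
    """Pitch proportional to gain, followed by one-pole smoothing."""
    contour = []
    smoothed = None
    for g in gains_db:
        raw = base_hz * g / ref_db  # proportional mapping (assumed form)
        if smoothed is None:
            smoothed = raw
        else:
            smoothed = alpha * smoothed + (1.0 - alpha) * raw
        contour.append(smoothed)
    return contour

voicing = constant_voicing()
pitch = pitch_from_gain([60.0, 60.0, 90.0])
```

A larger `alpha` gives a flatter contour; the value here is arbitrary and would need tuning by ear.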

16
Spectral Differences
[Figure panels: Normal Speech, Whispered Speech]

17
Enhanced Spectrum
[Figure panels: Normal Speech, Whispered Speech]

18
Algorithm Summary
  • Use MELP model of speech for synthesis
  • Estimate parameters given whispers
  • Pitch and voicing not present
    • Can generate reasonable parameters
  • LPC spectrum distorted
    • Can be reconstructed

[Block diagram. Labels: Spectral Smoothing, Shift Formant, Linear Prediction, Pre-filter, Pitch Estimation, MELP Synthesis]

19
Future Directions
  • Generalize speech model
    • Jump Markov Linear Systems (JMLS)
    • Include pitch and voicing
    • State-dependent formant shifting
  • Develop training algorithms
    • Maximum likelihood estimate of JMLS
  • Alternative estimation algorithms
    • Particle filters