High Quality Voice Morphing

About This Presentation
Title:

High Quality Voice Morphing

Description:

High Quality Voice Morphing. Hui Ye & Steve Young, Cambridge University Engineering Department, August 2004. Baseline System: Pitch Synchronous Harmonic Model for speech ...

Slides: 18
Provided by: KK
Learn more at: http://mi.eng.cam.ac.uk
Transcript and Presenter's Notes

Title: High Quality Voice Morphing


1
High Quality Voice Morphing
  • Hui Ye & Steve Young
  • Cambridge University Engineering Department
  • August 2004

2
Baseline System
  • Pitch Synchronous Harmonic Model
  • for speech representation and modification.

Modification examples: pitch scale 1.7; time scale 1.7
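
To make the idea of modifying speech with a harmonic model concrete, here is a minimal sketch, not the authors' implementation. It assumes a frame is represented as a sum of sinusoids at multiples of the pitch frequency, and that pitch scaling resamples a spectral envelope at the new harmonic frequencies. The function names, the `envelope` callable and the 16 kHz sample rate are illustrative assumptions.

```python
import math


def scale_pitch(f0_hz, envelope, pitch_scale, max_harmonics, sample_rate=16000):
    """Scale the pitch and resample the envelope at the new harmonics.

    `envelope` is a hypothetical callable mapping frequency (Hz) to amplitude.
    Harmonics above the Nyquist frequency are dropped.
    """
    new_f0 = f0_hz * pitch_scale
    nyquist = sample_rate / 2
    n_harmonics = min(max_harmonics, int(nyquist // new_f0))
    amplitudes = [envelope((k + 1) * new_f0) for k in range(n_harmonics)]
    return new_f0, amplitudes


def synthesize_harmonic_frame(f0_hz, amplitudes, phases, n_samples, sample_rate=16000):
    """Synthesize samples as a sum of harmonics of f0 with given amplitudes and phases."""
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        frame.append(sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        ))
    return frame


# Usage: raise a 100 Hz frame by the factor 1.7 quoted on the slide.
toy_envelope = lambda f: 1.0 / (1.0 + f / 1000.0)  # made-up smooth envelope
new_f0, amps = scale_pitch(100.0, toy_envelope, 1.7, max_harmonics=50)
frame = synthesize_harmonic_frame(new_f0, amps, [0.0] * len(amps), n_samples=160)
```

Sampling the envelope at the new harmonic positions is what keeps the formant structure in place while the pitch changes; time scaling would instead change how many such frames are synthesized.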
3
  • Transform-based Conversion

4
  • Analysis of the distortion in the baseline
    suggested three problem areas:
  • Spectral Distortion
  • Unnatural Phase Dispersion
  • Transformation of Unvoiced sounds
  • Solutions have therefore been developed in each
    of these areas

5
  • 1. Spectral Distortion
  • Formant structure has been transformed
  • Spectral details lost due to reduced LSF
    dimensionality
  • Spectral peaks broadened by the averaging effect
    of least square error estimation
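
The averaging effect can be illustrated with a toy example. A least-squares estimate tends towards the mean of the plausible targets, so if the plausible targets have sharp peaks at slightly different frequencies, their mean has a lower and broader peak. The sketch below uses made-up Gaussian-shaped peaks (centres 960 Hz and 1040 Hz, spread 50 Hz); none of these values come from the presentation.

```python
import math


def gaussian_peak(center_hz, spread_hz, freqs):
    """A toy spectral peak with unit height at center_hz."""
    return [math.exp(-0.5 * ((f - center_hz) / spread_hz) ** 2) for f in freqs]


def half_max_width(spectrum, freqs):
    """Width of the band where the spectrum is at least half of its maximum.

    Assumes a single-peaked spectrum, so the band is contiguous.
    """
    half = max(spectrum) / 2.0
    above = [f for f, s in zip(freqs, spectrum) if s >= half]
    return above[-1] - above[0]


freqs = list(range(0, 2001, 10))
peak_a = gaussian_peak(960, 50, freqs)
peak_b = gaussian_peak(1040, 50, freqs)

# The mean of the two candidate envelopes, as a least-squares fit would favour.
mean_envelope = [(a + b) / 2.0 for a, b in zip(peak_a, peak_b)]

print("single peak height:", max(peak_a), "width:", half_max_width(peak_a, freqs))
print("averaged height:", max(mean_envelope), "width:", half_max_width(mean_envelope, freqs))
```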

6
  • Spectral Residual Selection
  • Idea: reintroduce the lost spectral details to
    the converted envelopes
  • Use a codebook selection method to construct a
    residual
  • Post-filtering: applies a perceptual filter to
    the converted spectral envelope
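
One simple way to picture codebook-based residual selection is a nearest-neighbour lookup. This is a sketch under stated assumptions, not the method on the slides: the codebook is assumed to hold pairs of (envelope features, spectral residual) from target training data, selection is by squared Euclidean distance, and the residual is added to a log-domain envelope. The post-filter is not shown because the slides do not specify it.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def select_residual(converted_features, codebook):
    """Return the residual of the codebook entry closest to the converted features.

    `codebook` is a hypothetical list of (features, residual) pairs.
    """
    best = min(codebook, key=lambda entry: squared_distance(converted_features, entry[0]))
    return best[1]


def add_residual(converted_log_envelope, residual):
    """Reintroduce spectral detail by adding the residual in the log domain."""
    return [e + r for e, r in zip(converted_log_envelope, residual)]


# Usage with a made-up two-entry codebook.
codebook = [
    ([0.1, 0.2], [1.0, -1.0]),
    ([0.8, 0.9], [0.5, 0.5]),
]
residual = select_residual([0.75, 0.95], codebook)
detailed_envelope = add_residual([2.0, 3.0], residual)
```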

7
  • 2. Unnatural Phase Dispersion
  • In the baseline system, the converted spectral
    envelope was combined with the original phases.
    This results in converted speech with a harsh
    quality.
  • Spectral magnitudes and phases of human speech
    are highly correlated.
  • To simultaneously model the magnitudes and phases
    and then convert them both via a single unified
    transform is extremely difficult.

8
  • Phase Prediction
  • If we can predict the waveform shape, then we can
    predict the phases.
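
The link between waveform shape and phases can be sketched as follows: if one pitch period of the waveform is known, the phase of each harmonic is the argument of the corresponding DFT coefficient. The function name and the toy signal are assumptions for illustration; how the waveform shape itself is predicted is not shown here.

```python
import cmath
import math


def harmonic_phases(period_waveform, n_harmonics):
    """Phases of harmonics 1..n_harmonics from the DFT of one pitch period."""
    n = len(period_waveform)
    phases = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(period_waveform)
        )
        phases.append(cmath.phase(coeff))
    return phases


# Usage: a toy period built from two harmonics with known phases 0.5 and -1.0.
n = 64
period = [
    math.cos(2 * math.pi * i / n + 0.5) + 0.5 * math.cos(2 * math.pi * 2 * i / n - 1.0)
    for i in range(n)
]
recovered = harmonic_phases(period, 2)
```

For this toy signal the recovered phases match the ones used to build it, which is the sense in which a predicted waveform shape determines the phases.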

9
  • Phase Prediction Implementation
  • The set of template signals (codebook entries)
    $T = [T_1, \ldots, T_M]$ can be estimated by minimizing the
    waveform shape prediction error
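
As a simplified illustration of estimating templates by minimizing a prediction error, assume each training waveform has already been assigned to exactly one template (for example by its spectral envelope class) and that the error is the squared difference. Under those assumptions, which are not taken from the slides, the error-minimizing template is the mean of the waveforms assigned to it.

```python
def prediction_error(waveforms, assignments, templates):
    """Total squared error between each waveform and its assigned template."""
    return sum(
        sum((x - t) ** 2 for x, t in zip(w, templates[a]))
        for w, a in zip(waveforms, assignments)
    )


def update_templates(waveforms, assignments, templates):
    """Replace each template by the mean of the waveforms assigned to it.

    Templates with no assigned waveforms are left unchanged.
    """
    updated = []
    for m, old in enumerate(templates):
        members = [w for w, a in zip(waveforms, assignments) if a == m]
        if not members:
            updated.append(list(old))
            continue
        updated.append([sum(col) / len(members) for col in zip(*members)])
    return updated


# Usage with made-up two-sample "waveforms" and hard assignments.
waveforms = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
assignments = [0, 0, 1, 1]
initial = [[1.0, 0.0], [0.0, 1.0]]
estimated = update_templates(waveforms, assignments, initial)
error_before = prediction_error(waveforms, assignments, initial)
error_after = prediction_error(waveforms, assignments, estimated)
```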

10
  • Phase Prediction Result
  • Phase prediction vs. copying source phases

[Figure: waveforms (amplitude vs. time) of the original signal, phase prediction, and copied source phases]
SNR: phase prediction 7.2, copied source phases 3.2
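
The slide does not say how its SNR values are defined. A common definition, assumed here, is the ratio in decibels of the original signal's energy to the energy of the difference between the original and the reconstructed waveform. A minimal sketch:

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of `estimate` measured against `reference`."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_power / noise_power)


# Usage: a toy reconstruction that is 10% too small in amplitude.
original = [1.0, -1.0, 1.0, -1.0]
reconstructed = [0.9, -0.9, 0.9, -0.9]
toy_snr = snr_db(original, reconstructed)
```

Under this definition a higher value means the reconstructed waveform follows the original more closely.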
11
  • Phase Prediction Result
  • Phase prediction vs. phase codebook

[Figure: waveforms (amplitude vs. time) of the original signal, phase prediction, and phase codebook]
SNR: phase prediction 7.2, phase codebook 6.1
12
  • 3. Transforming Unvoiced Sounds
  • In our baseline system, the unvoiced sounds are
    not transformed.
  • In reality, many unvoiced sounds have some vocal
    tract colouring which affects the speech
    characteristics.
  • A unit selection approach was therefore developed
    to transform the unvoiced sounds.

13
Experiments
  • Training data: OGI Voice Corpus, 12 speakers,
    each with about 5 minutes of parallel speech
    data.
  • Four conversion tasks: male to male, male to
    female, female to male, female to female

14
Subjective Evaluation
  • ABX test (identify target speaker)
  • Preference test (which is more natural)

             Baseline system   Enhanced system
ABX          86.4              91.8
Preference   38.9              61.1
15
Examples
  • Voice Transformation with parallel training data

Audio examples: Source | Baseline + shifted pitch | Enhanced + target prosody | Target
Conversion tasks: M to F, F to M, F to F, M to M
16
Unknown Speaker Voice Transformation
  • No pre-existing training data is available from
    the source speaker, although there is still a
    reasonable amount of speech data from the designated
    target speaker.
  • Use speech recognition to create a mapping
    between the unknown input source speech and the
    target vectors.

Audio examples: Source | Converted | Target (rows: Female, Male)
17
Summary
  • A complete solution to the voice morphing problem
    has been developed which can deliver reasonable
    quality.
  • However, there is still some way to go before these
    techniques can support high fidelity studio
    applications.
  • Future Work
  • Improve the quality of the converted speech
  • Unknown speaker voice conversion
  • Cross language voice conversion