A Bayesian Approach to HMM-Based Speech Synthesis - PowerPoint PPT Presentation

About This Presentation
Slides: 21
Provided by: spNit

Transcript and Presenter's Notes



1
A Bayesian Approach to HMM-Based Speech Synthesis
  • Kei Hashimoto¹, Heiga Zen¹,
  • Yoshihiko Nankaku¹, Takashi Masuko²,
  • and Keiichi Tokuda¹
  • ¹Nagoya Institute of Technology
  • ²Tokyo Institute of Technology

2
Background
  • HMM-based speech synthesis system
    • Spectrum, excitation and duration are modeled
    • Speech parameter seqs. are generated
  • Maximum likelihood (ML) criterion
    • Train HMMs and generate speech parameters
    • Point estimate → The over-fitting problem
  • Bayesian approach
    • Estimate posterior dist. of model parameters
    • Prior information can be used
    • → Alleviate the over-fitting problem

3
Outline
  • Bayesian speech synthesis
  • Variational Bayesian method
  • Speech parameter generation
  • Bayesian context clustering
  • Prior distribution using cross validation
  • Experiments
  • Conclusion & future work

4
Bayesian speech synthesis (1/2)
  • Model training and speech synthesis
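
The equations on this slide were lost in extraction. The following is a hedged sketch of the contrast it draws. The notation is assumed here, not taken from the slide: O and L are the training data and their labels, o and l are the synthesis data and their labels, and λ denotes the HMM parameters.

```latex
\begin{aligned}
\text{ML:}\quad
  & \hat{\lambda} = \arg\max_{\lambda}\, p(\boldsymbol{O} \mid L, \lambda),
  \qquad
  \hat{\boldsymbol{o}} = \arg\max_{\boldsymbol{o}}\, p(\boldsymbol{o} \mid l, \hat{\lambda}) \\
\text{Bayes:}\quad
  & \hat{\boldsymbol{o}} = \arg\max_{\boldsymbol{o}}\, p(\boldsymbol{o} \mid l, \boldsymbol{O}, L),
  \qquad
  p(\boldsymbol{o} \mid l, \boldsymbol{O}, L)
  = \int p(\boldsymbol{o} \mid l, \lambda)\, p(\lambda \mid \boldsymbol{O}, L)\, d\lambda
\end{aligned}
```

ML plugs a single point estimate of λ into synthesis. The Bayesian form instead averages over the posterior of λ, and that posterior is where prior information enters.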

5
Bayesian speech synthesis (2/2)
  • Predictive distribution (marginal likelihood)
    • → Variational Bayesian method [Attias '99]
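
The equation for this slide is missing. Here is a sketch under assumed notation: O and L are the training data and labels, o and l are the synthesis data and labels, λ denotes the HMM parameters, and q and Q are the hidden state sequences for the synthesis and training data. Under these assumptions, the predictive distribution is proportional to a marginal likelihood over both data sets:

```latex
p(\boldsymbol{o} \mid l, \boldsymbol{O}, L)
  = \frac{p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)}{p(\boldsymbol{O} \mid L)},
\qquad
p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)
  = \sum_{q} \sum_{Q} \int p(\boldsymbol{o}, q \mid l, \lambda)\,
    p(\boldsymbol{O}, Q \mid L, \lambda)\, p(\lambda)\, d\lambda
```

The denominator does not depend on o. Maximizing the predictive distribution over o is therefore the same as maximizing the marginal likelihood in the numerator. The sums over state sequences inside the integral make this quantity intractable, which is what motivates the variational approximation.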
6
Variational Bayesian method (1/2)
  • Estimate approximate posterior dist.
    • → Maximize a lower bound

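The bound on this slide did not survive extraction. A sketch follows, using assumed notation: o, l, O, L are the synthesis and training data and labels, q and Q are the state sequences, λ denotes the model parameters, and 𝒬 is the approximate posterior. Applying Jensen's inequality to the log marginal likelihood gives a lower bound 𝓕:

```latex
\log p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)
  \;\ge\;
  \mathcal{F}
  = \left\langle
      \log \frac{p(\boldsymbol{o}, q \mid l, \lambda)\,
                 p(\boldsymbol{O}, Q \mid L, \lambda)\, p(\lambda)}
                {\mathcal{Q}(q, Q, \lambda)}
    \right\rangle_{\mathcal{Q}(q, Q, \lambda)}
```

The gap between the two sides equals the KL divergence from 𝒬(q, Q, λ) to the true posterior p(q, Q, λ | o, O, l, L). Maximizing 𝓕 over 𝒬 therefore drives 𝒬 toward that posterior.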

7
Variational Bayesian method (2/2)
  • Random variables are assumed to be statistically independent
  • Optimal posterior distributions
    • → Iterative updates, as in the EM algorithm
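The sketch below illustrates the factorized, iterative updates on a deliberately simplified model. It uses a single 1-D Gaussian with unknown mean and precision, a Normal-Gamma prior, and no hidden state sequence. This model is an assumption made for illustration; the slides apply VB to full HMMs.

The method works as follows:
- Each factor, Q(mu) and Q(tau), is updated using expectations taken under the other factor.
- The two updates are alternated until the expected precision stops changing, mirroring the EM-like iteration.

The function name and default hyperparameters are hypothetical.

```python
import numpy as np


def vb_gaussian(x, mu0=0.0, lam0=1.0, a0=1e-3, b0=1e-3, n_iter=100, tol=1e-10):
    """Mean-field VB for a 1-D Gaussian with unknown mean and precision.

    Illustrative model (an assumption, not from the slides):
        x_n ~ N(mu, 1/tau),  mu | tau ~ N(mu0, 1/(lam0 * tau)),
        tau ~ Gamma(a0, b0)  (shape a0, rate b0)
    Factorization: Q(mu, tau) = Q(mu) Q(tau).
    Returns (mu_n, lam_n, a_n, b_n) with Q(mu) = N(mu_n, 1/lam_n)
    and Q(tau) = Gamma(a_n, b_n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)  # independent of E[tau]
    a_n = a0 + 0.5 * (n + 1)                      # independent of Q(mu)
    e_tau = a0 / b0                               # initial guess for E[tau]
    for _ in range(n_iter):
        # Update Q(mu): its precision uses the current E[tau].
        lam_n = (lam0 + n) * e_tau
        # Update Q(tau) from expectations under Q(mu):
        # E[(x_i - mu)^2] = (x_i - mu_n)^2 + 1/lam_n
        b_n = b0 + 0.5 * (np.sum((x - mu_n) ** 2) + n / lam_n
                          + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol * max(1.0, e_tau)
        e_tau = new_e_tau
        if converged:
            break
    lam_n = (lam0 + n) * e_tau  # final Q(mu) precision, consistent with Q(tau)
    return mu_n, lam_n, a_n, b_n


rng = np.random.default_rng(0)
samples = rng.normal(2.0, 0.5, size=5000)
mu_n, lam_n, a_n, b_n = vb_gaussian(samples)
print(mu_n, a_n / b_n)  # posterior mean of mu and E[tau]
```

With this much data, mu_n should be close to the sample mean and E[tau] = a_n / b_n close to the inverse sample variance. In the HMM case, the same alternation also updates the posterior over state sequences.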
8
Approximation for speech synthesis
  • The posterior dist. of model parameters depends on synthesis data
    • → Huge computational cost in the synthesis part
  • Ignore the dependency on synthesis data
    • → Estimation from training data only

9
Prior distribution
  • Conjugate prior distribution
    • → The posterior dist. belongs to the same family
      as the prior dist.
  • Hyperparameters determined using statistics of prior data
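The sketch below shows conjugacy for a single 1-D Gaussian with unknown mean and precision. Under a Normal-Gamma prior, the posterior is again Normal-Gamma, and only its four hyperparameters change.

The slides do not show how the prior is set from "statistics of prior data". The helper prior_from_statistics is therefore hypothetical: it treats xi as the amount of prior data and matches the prior mean and expected precision to the prior data. This is one common choice, not necessarily the authors' scheme.

```python
import numpy as np


def prior_from_statistics(mean, var, xi):
    """Hypothetical helper: Normal-Gamma prior (mu0, k0, a0, b0) encoding
    xi 'virtual' frames of prior data with the given mean and variance.
    The prior expectation of the precision, a0 / b0, equals 1 / var."""
    return mean, xi, 0.5 * xi, 0.5 * xi * var


def normal_gamma_posterior(prior, x):
    """Conjugate update: the posterior is again Normal-Gamma."""
    mu0, k0, a0, b0 = prior
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    kn = k0 + n
    mun = (k0 * mu0 + n * xbar) / kn          # prior mean shrunk toward the data
    an = a0 + 0.5 * n                          # shape grows with the frame count
    bn = (b0 + 0.5 * np.sum((x - xbar) ** 2)
          + k0 * n * (xbar - mu0) ** 2 / (2.0 * kn))
    return mun, kn, an, bn


prior = prior_from_statistics(0.0, 1.0, 2.0)
print(normal_gamma_posterior(prior, [1.0, 3.0]))
```

A larger xi makes the prior dominate the posterior for small data sets. A parameter of this kind behaves like the tuning parameters discussed later in the deck.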


10
Speech parameter generation
  • Speech parameter vectors
    • Consist of static and dynamic features
    • → Only the static feature seq. is generated
  • Speech parameter generation based on the Bayesian
    approach
    • → Maximize the lower bound
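The sketch below generates a static sequence c from static and delta Gaussian statistics. The observation vector is o = Wc, and the most likely c solves a linear system. For brevity, it covers a single feature dimension with diagonal precisions and a fixed state sequence; these are assumptions.

The following are illustrative choices:
- the delta window 0.5 (c[t+1] - c[t-1]);
- clamping of the window at the sequence edges;
- the function names.

```python
import numpy as np


def window_matrix(T):
    """Map a static sequence c (length T) to o = W c, interleaving
    [static_t, delta_t] per frame. Assumed delta window:
    delta_t = 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                             # static row
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)  # clamp at edges
        W[2 * t + 1, nxt] += 0.5                      # delta row
        W[2 * t + 1, prev] -= 0.5
    return W


def generate_static(mean, precision):
    """Solve (W^T P W) c = W^T P mu for the static sequence c.

    mean, precision: length-2T arrays of Gaussian means and diagonal
    precisions for the interleaved static/delta observation vector."""
    mean = np.asarray(mean, dtype=float)
    precision = np.asarray(precision, dtype=float)
    T = mean.size // 2
    W = window_matrix(T)
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mean)
    return np.linalg.solve(A, b)


# Step in the static means, delta means of zero: the result is a smoothed step.
mu = np.zeros(12)
mu[0::2] = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(generate_static(mu, np.ones(12)))
```

The Bayesian variant on the next slide could reuse the same solver. It would pass posterior expectations of the precisions and posterior means in place of point estimates. This substitution is itself a sketch under a fixed state sequence.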

11
Relation between Bayes and ML
  • Compared with the ML criterion
    • Uses expectations of model parameters
    • Can be solved in the same fashion as ML

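The comparison equations on this slide are missing. Here is a sketch of why the Bayesian generation step takes the same form as ML. It assumes o = Wc (static sequence c, window matrix W), a fixed state sequence, and a Gaussian output distribution with mean μ and precision Λ = Σ⁻¹. Angle brackets denote expectations under the approximate posterior.

```latex
\begin{aligned}
\text{ML:}\quad
  & \boldsymbol{W}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{W}\, \hat{\boldsymbol{c}}
    = \boldsymbol{W}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \\
\text{Bayes:}\quad
  & \big\langle \log \mathcal{N}(\boldsymbol{W}\boldsymbol{c} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}) \big\rangle
    = -\tfrac{1}{2} (\boldsymbol{W}\boldsymbol{c})^{\top} \langle \boldsymbol{\Lambda} \rangle \boldsymbol{W}\boldsymbol{c}
      + (\boldsymbol{W}\boldsymbol{c})^{\top} \langle \boldsymbol{\Lambda}\boldsymbol{\mu} \rangle
      + \text{const} \\
  & \Rightarrow\;
    \boldsymbol{W}^{\top} \langle \boldsymbol{\Lambda} \rangle \boldsymbol{W}\, \hat{\boldsymbol{c}}
    = \boldsymbol{W}^{\top} \langle \boldsymbol{\Lambda}\boldsymbol{\mu} \rangle
\end{aligned}
```

Only expectations of the model parameters appear, so the linear solver used for ML applies unchanged. Consider a Normal-Wishart posterior in which μ given Λ has mean m. In that case ⟨Λμ⟩ = ⟨Λ⟩m.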

12
Outline

13
Bayesian context clustering
  • Context clustering based on maximizing the lower bound
    • → Split a node based on its gain in the lower bound
  [Figure: decision tree with yes/no question branches]
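The sketch below makes a split decision for one node with one feature dimension. It assumes frames are already assigned to the node, so no hidden state is involved. Under that assumption, the variational lower bound reduces to the exact log marginal likelihood of a Gaussian with a Normal-Gamma prior. The prior values and function names are illustrative.

A positive gain means the two child nodes explain the data better than the parent, after the Bayesian Occam penalty.

```python
import math

import numpy as np


def log_marginal_likelihood(x, prior=(0.0, 1.0, 1.0, 1.0)):
    """log p(x) for a 1-D Gaussian with a Normal-Gamma prior (mu0, k0, a0, b0)."""
    mu0, k0, a0, b0 = prior
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    kn = k0 + n
    an = a0 + 0.5 * n
    bn = b0 + 0.5 * np.sum((x - xbar) ** 2) + k0 * n * (xbar - mu0) ** 2 / (2.0 * kn)
    return (math.lgamma(an) - math.lgamma(a0)
            + a0 * math.log(b0) - an * math.log(bn)
            + 0.5 * math.log(k0 / kn) - 0.5 * n * math.log(2.0 * math.pi))


def split_gain(x_yes, x_no, prior=(0.0, 1.0, 1.0, 1.0)):
    """Gain of splitting a node by a context question (positive favours the split)."""
    parent = np.concatenate([np.asarray(x_yes, float), np.asarray(x_no, float)])
    return (log_marginal_likelihood(x_yes, prior)
            + log_marginal_likelihood(x_no, prior)
            - log_marginal_likelihood(parent, prior))


print(split_gain([-5.0, -5.1, -4.9, -5.2], [5.0, 5.1, 4.9, 5.2]))   # well-separated children
print(split_gain([0.1, -0.1, 0.2, -0.2], [0.1, -0.1, 0.2, -0.2]))   # identical children
```

Unlike MDL, this criterion adds no explicit penalty term. Complexity is penalized through the marginal likelihood itself, which is why the prior hyperparameters influence how far the tree grows.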
14
Impact of prior distribution
  • The prior dist. affects model selection, acting as tuning parameters
    • → A technique for determining the prior dist. is required
  • Conventional: maximize the marginal likelihood
    • Leads to the over-fitting problem, as with ML
    • Tuning parameters are still required
  • Determination of the prior distribution
    using cross validation [Hashimoto '08]

15
Bayesian approach using CV
  • Prior distribution based on Cross Validation
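The sketch below shows one plausible reading of a cross-validation-based prior for a single-dimensional Gaussian node with a Normal-Gamma base prior. The procedure is:
1. Split the data into K folds.
2. For each fold, use the posterior computed from the other folds as the prior.
3. Sum the log marginal likelihoods of the held-out folds.

The fold assignment, base prior, and function names are assumptions, not details given on the slides.

```python
import math

import numpy as np


def ng_posterior(prior, x):
    """Conjugate Normal-Gamma update: (mu0, k0, a0, b0) -> posterior hyperparameters."""
    mu0, k0, a0, b0 = prior
    n = x.size
    xbar = x.mean()
    kn = k0 + n
    return ((k0 * mu0 + n * xbar) / kn, kn, a0 + 0.5 * n,
            b0 + 0.5 * np.sum((x - xbar) ** 2) + k0 * n * (xbar - mu0) ** 2 / (2.0 * kn))


def log_ml(x, prior):
    """log p(x) for a 1-D Gaussian under a Normal-Gamma prior."""
    _, k0, a0, b0 = prior
    _, kn, an, bn = ng_posterior(prior, x)
    return (math.lgamma(an) - math.lgamma(a0) + a0 * math.log(b0) - an * math.log(bn)
            + 0.5 * math.log(k0 / kn) - 0.5 * x.size * math.log(2.0 * math.pi))


def cv_log_ml(x, folds, n_folds, base_prior):
    """Sum over folds of log p(held-out fold | other folds)."""
    total = 0.0
    for k in range(n_folds):
        held, rest = x[folds == k], x[folds != k]
        if held.size == 0:
            continue  # node too small to populate this fold
        # Prior for fold k: posterior from the other folds (base prior if none).
        prior_k = ng_posterior(base_prior, rest) if rest.size else base_prior
        total += log_ml(held, prior_k)
    return total


def cv_split_gain(x_yes, x_no, n_folds=2, base_prior=(0.0, 1.0, 1.0, 1.0)):
    """CV-based split gain; the parent's folds are the union of the children's folds."""
    x_yes = np.asarray(x_yes, dtype=float)
    x_no = np.asarray(x_no, dtype=float)
    f_yes = np.arange(x_yes.size) % n_folds
    f_no = np.arange(x_no.size) % n_folds
    parent = np.concatenate([x_yes, x_no])
    f_parent = np.concatenate([f_yes, f_no])
    return (cv_log_ml(x_yes, f_yes, n_folds, base_prior)
            + cv_log_ml(x_no, f_no, n_folds, base_prior)
            - cv_log_ml(parent, f_parent, n_folds, base_prior))


print(cv_split_gain([-5.0, -5.1, -4.9, -5.2], [5.0, 5.1, 4.9, 5.2]))
```

Each fold's term equals log p(x_k | x_-k) under the base prior. The score therefore measures how well held-out data are predicted rather than how well the same data are fitted, which is the intended guard against over-fitting.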

16
Outline

17
Experimental conditions (1/2)
  Database            ATR Japanese speech database B-set
  Speaker             MHT
  Training data       450 utterances
  Test data           53 utterances
  Sampling rate       16 kHz
  Window              Blackman window
  Frame size / shift  25 ms / 5 ms
  Feature vector      24th-order mel-cepstrum + Δ + Δ² and log F0 + Δ + Δ² (78 dimensions)
  HMM                 5-state left-to-right HMM without skip transitions
18
Experimental conditions (2/2)
  • Compared approaches
  • Mean Opinion Score (MOS) test
    • Subjects were 10 Japanese students
    • 20 sentences were chosen at random

  System       Training  Context clustering                   # of states
  ML-MDL       ML        MDL                                        2,491
  Bayes-Bayes  Bayes     Bayes using CV                            25,911
  Bayes-MDL    Bayes     Bayes using CV, adjusted threshold         2,553
  ML-Bayes     ML        MDL, adjusted threshold                   27,106
19
Subjective listening test
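  • Mean opinion score
  [Figure: MOS of each system; bars labeled with the number of states:
  ML-MDL 2,491, Bayes-Bayes 25,911, ML-Bayes 27,106, Bayes-MDL 2,553]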
20
Conclusions and future work
  • A new framework based on the Bayesian approach
    • All processes are derived from a single
      predictive distribution
    • Improves the naturalness of synthesized speech
  • Future work
    • Introduce HSMM instead of HMM
    • Investigate the relation between speech
      quality and model structures

22
Cross valid prior distribution
  • Marginal likelihood using cross validation
    • → Alleviates the over-fitting problem
  • Cross valid prior distribution

23
Experimental conditions (2/2)
  • Compared approaches
  • Number of states

  System       Training  Context clustering
  ML-MDL       ML        MDL
  Bayes-Bayes  Bayes     Bayes using cross validation
  Bayes-MDL    Bayes     Bayes using threshold
  ML-Bayes     ML        MDL using threshold

  System       Spectrum      F0  Duration     Sum
  ML-MDL            956   1,151       280   2,491
  Bayes-Bayes     9,070  12,836     4,005  25,911
  Bayes-MDL       1,941     565        47   2,553
  ML-Bayes       15,077   8,844     3,185  27,106
24
Bayesian context clustering using CV