Title: A Bayesian Approach to HMM-Based Speech Synthesis
- Kei Hashimoto¹, Heiga Zen¹, Yoshihiko Nankaku¹, Takashi Masuko², and Keiichi Tokuda¹
- ¹ Nagoya Institute of Technology
- ² Tokyo Institute of Technology
2 Background
- HMM-based speech synthesis system
- Spectrum, excitation and duration are modeled by HMMs
- Speech parameter sequences are generated from the HMMs
- Maximum likelihood (ML) criterion
- Used both to train HMMs and to generate speech parameters
- Point estimate → the over-fitting problem
- Bayesian approach
- Estimates the posterior distribution of the model parameters
- Prior information can be used
- → Alleviates the over-fitting problem
3 Outline
- Bayesian speech synthesis
- Variational Bayesian method
- Speech parameter generation
- Bayesian context clustering
- Prior distribution using cross validation
- Experiments
- Conclusion and future work
4 Bayesian speech synthesis (1/2)
- Model training and speech synthesis
5 Bayesian speech synthesis (2/2)
- Predictive distribution (marginal likelihood)
- Approximated with the variational Bayesian method [Attias 99]
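The slide's equations were lost, so here is a sketch of the predictive distribution in assumed notation. O and L are the training data and their labels. o and l are the synthesis data and its label. Λ denotes the HMM parameters, and Z and z are the hidden state sequences. These symbols are not taken from the slide.

```latex
p(\boldsymbol{o} \mid l, \boldsymbol{O}, L)
  = \frac{p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)}{p(\boldsymbol{O} \mid L)},
\qquad
p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)
  = \int \sum_{\boldsymbol{z}, \boldsymbol{Z}}
      p(\boldsymbol{o}, \boldsymbol{z} \mid l, \Lambda)\,
      p(\boldsymbol{O}, \boldsymbol{Z} \mid L, \Lambda)\,
      p(\Lambda)\, d\Lambda
```

The denominator does not depend on o. Maximizing the predictive distribution over o is therefore the same as maximizing the joint marginal likelihood in the numerator, which explains the slide's parenthetical "(marginal likelihood)". The sum over state sequences inside the integral is what makes this quantity intractable.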
6 Variational Bayesian method (1/2)
- Estimate an approximate posterior distribution
- → Maximize a lower bound on the log marginal likelihood
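The lower bound can be sketched in assumed notation. O, L and Z are the training data, labels and state sequences. o, l and z are the synthesis data, label and state sequence. Λ is the model parameters and Q is the approximate posterior. The bound follows from Jensen's inequality:

```latex
\log p(\boldsymbol{o}, \boldsymbol{O} \mid l, L)
  \ge \mathcal{F}
  = \left\langle
      \log \frac{p(\boldsymbol{o}, \boldsymbol{z}, \boldsymbol{O}, \boldsymbol{Z} \mid l, L, \Lambda)\, p(\Lambda)}
                {Q(\boldsymbol{z}, \boldsymbol{Z}, \Lambda)}
    \right\rangle_{Q(\boldsymbol{z}, \boldsymbol{Z}, \Lambda)}
```

The gap between the two sides equals the KL divergence from Q to the true posterior. Maximizing F over Q therefore also moves Q toward the true posterior.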
7 Variational Bayesian method (2/2)
- Assume the random variables are statistically independent
- Optimal posterior distributions
- Iterative updates, as in the EM algorithm
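Assume the factorization Q(z, Z, Λ) = Q(z, Z) Q(Λ), where z and Z are the synthesis and training state sequences and Λ is the model parameters. Under that assumption, the standard variational Bayesian updates take the form below. The notation is assumed, and ⟨·⟩ denotes expectation.

```latex
\begin{aligned}
Q(\Lambda) &\propto p(\Lambda)\,
  \exp \left\langle \log p(\boldsymbol{o}, \boldsymbol{z}, \boldsymbol{O}, \boldsymbol{Z} \mid l, L, \Lambda) \right\rangle_{Q(\boldsymbol{z}, \boldsymbol{Z})} \\
Q(\boldsymbol{z}, \boldsymbol{Z}) &\propto
  \exp \left\langle \log p(\boldsymbol{o}, \boldsymbol{z}, \boldsymbol{O}, \boldsymbol{Z} \mid l, L, \Lambda) \right\rangle_{Q(\Lambda)}
\end{aligned}
```

Each update maximizes the lower bound with respect to one factor while the other is held fixed. Alternating them therefore never decreases the bound, which is the sense in which the procedure resembles EM.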
8 Approximation for speech synthesis
- The posterior distribution of the model parameters depends on the synthesis data
- → Huge computational cost at synthesis time
- Ignore the dependency on the synthesis data
- → Estimate the posterior from the training data only
9 Prior distribution
- Conjugate prior distribution
- → The posterior distribution belongs to the same family as the prior distribution
- Hyperparameters determined using statistics of prior data
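Here is a minimal sketch of conjugacy. It assumes one-dimensional Gaussian output distributions with a Normal-Gamma prior on the mean and precision. The slide does not name the family; diagonal-covariance Gaussians factor into such 1-D terms. The hypothetical `prior_from_stats` maps prior-data statistics to hyperparameters through a hypothetical equivalent-sample-size `weight`. `posterior` adds occupancy-weighted training statistics and returns hyperparameters of the same Normal-Gamma form.

```python
import numpy as np


def prior_from_stats(mean, var, weight):
    """Hypothetical mapping from prior-data statistics to Normal-Gamma hyperparameters.

    `weight` acts like an equivalent number of prior samples (a tuning parameter).
    The prior expected precision alpha0 / beta0 equals 1 / var.
    """
    mu0 = mean
    kappa0 = weight
    alpha0 = weight / 2.0
    beta0 = weight * var / 2.0
    return mu0, kappa0, alpha0, beta0


def posterior(prior, gamma, x):
    """Normal-Gamma posterior given state occupancies gamma and observations x.

    The result has the same four-parameter form as the prior (conjugacy).
    """
    mu0, kappa0, alpha0, beta0 = prior
    gamma = np.asarray(gamma, dtype=float)
    x = np.asarray(x, dtype=float)
    n = gamma.sum()                                  # soft frame count
    xbar = (gamma * x).sum() / n                     # occupancy-weighted mean
    scatter = (gamma * (x - xbar) ** 2).sum()        # weighted scatter about the mean
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * scatter + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return float(mu_n), float(kappa_n), float(alpha_n), float(beta_n)


prior = prior_from_stats(mean=0.0, var=1.0, weight=3.0)
post = posterior(prior, gamma=[1.0, 1.0, 1.0], x=[1.0, 2.0, 3.0])
print(post)  # (1.0, 6.0, 3.0, 5.5)
```

The posterior mean 1.0 lies between the prior mean 0.0 and the data mean 2.0. A larger `weight` pulls it further toward the prior, which is how the prior hyperparameters end up acting as tuning parameters.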
10 Speech parameter generation
- Speech parameter vectors
- Consist of static and dynamic features
- → Only the static feature sequence is generated
- Speech parameter generation based on the Bayesian approach
- → Maximize the lower bound
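Here is a sketch of what maximizing the lower bound means for generation. It assumes a fixed state sequence q and Gaussian output distributions with mean μ_q and covariance Σ_q. It also assumes o = Wc, where c is the static feature sequence and W is the window matrix that appends dynamic features. This notation is assumed, not taken from the slide. With Q(Λ) fixed, only the expected log output density depends on c. Setting its gradient to zero gives a linear system:

```latex
\frac{\partial}{\partial \boldsymbol{c}}
  \left\langle \log \mathcal{N}(\boldsymbol{W}\boldsymbol{c} \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q) \right\rangle_{Q(\Lambda)} = \boldsymbol{0}
\;\Longrightarrow\;
\boldsymbol{W}^{\top} \langle \boldsymbol{\Sigma}_q^{-1} \rangle \boldsymbol{W} \boldsymbol{c}
  = \boldsymbol{W}^{\top} \langle \boldsymbol{\Sigma}_q^{-1} \boldsymbol{\mu}_q \rangle
```

This follows from expanding the Gaussian exponent. The terms that depend on c are −½ (Wc)ᵀ⟨Σ_q⁻¹⟩(Wc) + (Wc)ᵀ⟨Σ_q⁻¹μ_q⟩.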
11 Relation between Bayes and ML
- Comparison with the ML criterion
- Uses expectations of the model parameters instead of point estimates
- Can be solved in the same way as ML-based generation
[Figure: output distribution]
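Because the linear system has the same shape as in ML-based parameter generation, the same solver applies; only its inputs change. This minimal sketch makes several assumptions:

- one-dimensional features with diagonal precisions;
- a delta window of [-0.5, 0, 0.5] with clamped edges;
- Normal-Gamma posteriors, so that ⟨λ⟩ = α/β and ⟨λμ⟩ = ⟨λ⟩m.

All function names are hypothetical. Passing point-estimated λ and λμ instead of these expectations gives the ML solution.

```python
import numpy as np


def window_matrix(T):
    """Hypothetical W mapping T static values to 2T rows (static, delta per frame)."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                               # static feature
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)    # clamp indices at the edges
        W[2 * t + 1, prev] += -0.5                      # delta window [-0.5, 0, 0.5]
        W[2 * t + 1, nxt] += 0.5
    return W


def normal_gamma_expectations(m, alpha, beta):
    """<lambda> and <lambda * mu> under a Normal-Gamma posterior (element-wise)."""
    exp_prec = np.asarray(alpha, dtype=float) / np.asarray(beta, dtype=float)
    return exp_prec, exp_prec * np.asarray(m, dtype=float)


def generate(exp_prec, exp_prec_mean, W):
    """Solve W^T <P> W c = W^T <P mu> for a diagonal expected precision <P>."""
    A = W.T @ (exp_prec[:, None] * W)   # scale each row of W by its precision
    b = W.T @ exp_prec_mean
    return np.linalg.solve(A, b)


T = 5
W = window_matrix(T)
# Interleaved [static, delta] posterior means per frame: static 2.0, delta 0.0.
m = np.tile([2.0, 0.0], T)
exp_prec, exp_prec_mean = normal_gamma_expectations(
    m, alpha=np.full(2 * T, 3.0), beta=np.full(2 * T, 1.5))
c = generate(exp_prec, exp_prec_mean, W)
print(np.round(c, 6))  # [2. 2. 2. 2. 2.]
```

With constant static means and zero delta means, the smoothest trajectory is the constant itself, which is a quick sanity check of the solver.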
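13 Bayesian context clustering
- Context clustering based on maximizing the lower bound
[Figure: decision tree node split into "yes" and "no" children]
- → Split a node based on the gain in the lower bound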
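Here is a minimal sketch of the split criterion under three simplifying assumptions:

- each node is one 1-D Gaussian with a Normal-Gamma prior;
- frames are hard-assigned to nodes;
- every node holds at least one frame.

Under these assumptions, a node's contribution to the lower bound reduces to its closed-form log marginal likelihood. The function names and prior values are illustrative, not the authors' settings. A split is accepted when the gain is positive.

```python
import math


def log_marginal(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of non-empty 1-D data under a Normal-Gamma prior."""
    n = len(x)
    xbar = sum(x) / n
    scatter = sum((v - xbar) ** 2 for v in x)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * scatter + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * math.log(kappa0 / kappa_n)
            - 0.5 * n * math.log(2.0 * math.pi))


def split_gain(yes, no):
    """Change in the objective when a node is split into 'yes' and 'no' children."""
    return log_marginal(yes) + log_marginal(no) - log_marginal(yes + no)


# Two well-separated groups: the split pays for its extra parameters.
print(split_gain([-5.0, -5.1, -4.9, -5.05], [5.0, 5.1, 4.9, 5.05]) > 0)   # True
# One homogeneous group split arbitrarily: the Occam penalty rejects the split.
print(split_gain([0.1, -0.2, 0.05, 0.0], [-0.1, 0.15, -0.05, 0.02]) > 0)  # False
```

Unlike an ML likelihood gain, which never decreases when a node is split, this gain can be negative. That is why no external stopping threshold is needed in principle. The prior hyperparameters still shape the result.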
14 Impact of prior distribution
- The prior distribution affects model selection, acting as tuning parameters
- → A technique for determining the prior distribution is required
- Conventional: maximize the marginal likelihood
- Leads to the over-fitting problem, as with ML
- Tuning parameters are still required
- Determining the prior distribution using cross validation [Hashimoto 08]
15 Bayesian approach using cross validation (CV)
- Prior distribution based on cross validation
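One way to realize a cross-validated objective can be sketched with a Normal-Gamma model for 1-D data. Split the data into K folds. Build the prior for fold k from the other folds, here by updating a weak base prior with them. Then sum the folds' log marginal likelihoods. The round-robin fold assignment, the base prior and the function names are hypothetical, since the slides do not give these details. The data used to score each fold never shapes that fold's own prior.

```python
import math


def update(prior, x):
    """Normal-Gamma posterior hyperparameters after observing 1-D data x."""
    mu0, kappa0, alpha0, beta0 = prior
    n = len(x)
    if n == 0:
        return prior
    xbar = sum(x) / n
    scatter = sum((v - xbar) ** 2 for v in x)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * scatter + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def log_marginal(x, prior):
    """Log marginal likelihood of x under a Normal-Gamma prior."""
    _, kappa0, alpha0, beta0 = prior
    _, kappa_n, alpha_n, beta_n = update(prior, x)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * math.log(kappa0 / kappa_n)
            - 0.5 * len(x) * math.log(2.0 * math.pi))


def cv_log_marginal(x, k_folds, base_prior=(0.0, 1.0, 1.0, 1.0)):
    """Sum over folds of log p(fold | prior built from the other folds)."""
    folds = [x[i::k_folds] for i in range(k_folds)]  # hypothetical round-robin folds
    total = 0.0
    for k, fold in enumerate(folds):
        rest = [v for j, f in enumerate(folds) if j != k for v in f]
        total += log_marginal(fold, update(base_prior, rest))
    return total


data = [0.3, -0.1, 0.4, 0.2, -0.3, 0.1, 0.0, 0.25, -0.2]
print(cv_log_marginal(data, k_folds=3))
```

Each fold term equals log p(all data) − log p(other folds) under the base prior. The cross-validated score is therefore a sum of predictive log likelihoods rather than a single marginal likelihood evaluated on the same data that built the prior.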
17 Experimental conditions (1/2)
Database: ATR Japanese speech database, B-set
Speaker: MHT
Training data: 450 utterances
Test data: 53 utterances
Sampling rate: 16 kHz
Window: Blackman window
Frame length / shift: 25 ms / 5 ms
Feature vector: 24th-order mel-cepstrum (0th to 24th coefficients) and log F0, each with Δ and Δ² (78 dimensions)
HMM: 5-state left-to-right HMM without skip transitions
18 Experimental conditions (2/2)
- Compared approaches
- Mean Opinion Score (MOS) test
- Subjects were 10 Japanese students
- 20 sentences were chosen at random
Approach | Training | Context clustering | # of states
ML-MDL | ML | MDL | 2,491
Bayes-Bayes | Bayes | Bayes using CV | 25,911
Bayes-MDL | Bayes | Bayes using CV, adjusted threshold | 2,553
ML-Bayes | ML | MDL, adjusted threshold | 27,106
19 Subjective listening test
[Figure: MOS results by approach; number of states: ML-MDL 2,491, Bayes-Bayes 25,911, ML-Bayes 27,106, Bayes-MDL 2,553]
20 Conclusions and future work
- A new framework based on the Bayesian approach
- All processes are derived from a single predictive distribution
- Improves the naturalness of synthesized speech
- Future work
- Introduce HSMMs instead of HMMs
- Investigate the relation between speech quality and model structures
22 Cross valid prior distribution
- Marginal likelihood using cross validation
- Alleviates the over-fitting problem
- Cross valid prior distribution
23 Experimental conditions (2/2)
- Compared approaches
Approach | Training | Context clustering
ML-MDL | ML | MDL
Bayes-Bayes | Bayes | Bayes using cross validation
Bayes-MDL | Bayes | Bayes using threshold
ML-Bayes | ML | MDL using threshold
- Number of states
Approach | Spectrum | F0 | Duration | Sum
ML-MDL | 956 | 1,151 | 280 | 2,491
Bayes-Bayes | 9,070 | 12,836 | 4,005 | 25,911
Bayes-MDL | 1,941 | 565 | 47 | 2,553
ML-Bayes | 15,077 | 8,844 | 3,185 | 27,106
24 Bayesian context clustering using CV
[Figure: decision tree node split into "yes" and "no" children]