Hyperparameter Estimation for Speech Recognition Based on Variational Bayesian Approach

Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee and Keiichi Tokuda (Nagoya Institute of Technology)

1. Introduction
  • Recent speech recognition systems
    • ML (Maximum Likelihood) criterion → reduced estimation accuracy
    • MDL (Minimum Description Length) criterion → based on an asymptotic approximation
  • Variational Bayesian (VB) approach
    • Higher generalization ability
    • Appropriate model structures can be selected
    • Performance depends on the hyperparameters
  • Objective: estimate the hyperparameters by maximizing the marginal likelihood

2. Bayesian Framework
  • Model parameters are regarded as probabilistic variables, and recognition is based on their posterior distributions
  • Advantages
    • Prior knowledge can be integrated
    • The model structure can be selected
    • Robust classification
  • Disadvantage
    • Includes integral and expectation calculations → an effective approximation technique is required

3. Variational Bayesian Approach
  • Variational Bayes [Attias 1999]
    • Approximate the posterior distributions by a variational method
    • Define a lower bound F on the log-likelihood
    • Maximize F w.r.t. the variational posteriors
  • Context clustering based on Variational Bayes [Watanabe et al. 2002]: the decision tree is grown by phonetic questions Q with Yes/No splits
  • Use a conjugate prior distribution
    • Output probability distribution → Gaussian distribution
    • Conjugate prior distribution → Gauss-Wishart distribution
    • Likelihood function → proportional to a Gauss-Wishart distribution

4. Hyperparameter Estimation
  • Estimate appropriate hyperparameters by maximizing F w.r.t. the hyperparameters
  • Conventional: use monophone HMM state statistics → maximize F at the root node
  • Proposed: use the statistics of all leaf nodes → maximize F of the tree structure
  • Tying structure of the prior distributions: four kinds of tying structure are considered (all / phone / state / leaf)
  • Define a new hyperparameter T representing the amount of prior data
    • If the prior distributions have a tying structure → F is good for model selection
    • Otherwise → F increases monotonically as T increases

5. Experimental Conditions
Database: JNAS (Japanese Newspaper Article Sentences)
Training data: 20,000 / 2,500 / 200 sentences
Test data: 100 sentences
Sampling rate: 16 kHz
Window: Hamming window
Frame size/shift: 25 ms / 10 ms
Feature vector: 12-order MFCC + ΔMFCC + ΔEnergy (25 dimensions)

6. Experimental Results
  • Relationship between F and recognition accuracy (ML, conventional, and proposed priors; tying structures all / phone / state / leaf)
    • F and the recognition accuracy behaved similarly
    • The proposed technique gives a consistent improvement in the value of F
  • Relationship between the tying structure and the amount of training data
    • The appropriate tying structure of the prior distributions depends on the amount of training data
    • Large training data set → tying few prior distributions
    • Small training data set → tying many prior distributions
  • The VB clustering with an appropriate prior distribution improves the recognition performance
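The poster notes that the Gaussian output distributions take a Gauss-Wishart conjugate prior, so the posterior over the mean and precision stays in the same family and only its hyperparameters change. As a minimal sketch (not the authors' code), here is the standard Normal-Wishart posterior update in 2-D with plain lists; the (m, beta, invW, nu) parameterisation and the toy data are my assumptions.

```python
def normal_wishart_update(data, m0, beta0, invW0, nu0):
    """Posterior hyperparameters of a Normal-Wishart (Gauss-Wishart)
    conjugate prior after observing `data`.

    invW is the *inverse* scale matrix: it is the quantity the update
    accumulates the data scatter into."""
    n = len(data)
    d = len(m0)
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    # Scatter matrix around the sample mean.
    S = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data)
          for j in range(d)] for i in range(d)]
    beta_n = beta0 + n                      # prior counts plus data counts
    nu_n = nu0 + n
    m_n = [(beta0 * m0[j] + n * mean[j]) / beta_n for j in range(d)]
    coef = beta0 * n / beta_n               # weight of the mean-shift term
    diff = [mean[j] - m0[j] for j in range(d)]
    invW_n = [[invW0[i][j] + S[i][j] + coef * diff[i] * diff[j]
               for j in range(d)] for i in range(d)]
    return m_n, beta_n, invW_n, nu_n

data = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
post = normal_wishart_update(data, m0=[0.0, 0.0], beta0=1.0,
                             invW0=[[1.0, 0.0], [0.0, 1.0]], nu0=2.0)
```

The posterior mean m_n is a convex combination of the prior mean and the sample mean, weighted by beta0 and n, which is why beta0 can be read as an "amount of prior data", as in the poster's hyperparameter T.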
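The poster's claim that F is only useful for choosing the prior-data hyperparameter T when the prior distributions are tied can be illustrated with a 1-D stand-in. The sketch below uses a Normal-Gamma prior (the scalar analogue of the Gauss-Wishart prior), for which the marginal likelihood, and hence the bound F in this exact conjugate case, has a closed form; the two "state" data sets and the grid of T values are invented for illustration.

```python
import math

def log_marginal_likelihood(xs, m0, beta0, a0, b0):
    """Exact log marginal likelihood of 1-D Gaussian data under a
    conjugate Normal-Gamma prior (mean m0, pseudo-count beta0,
    Gamma shape a0 and rate b0 on the precision)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # scatter around the sample mean
    beta_n = beta0 + n
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * ss + beta0 * n * (mean - m0) ** 2 / (2.0 * beta_n)
    return (math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + 0.5 * (math.log(beta0) - math.log(beta_n))
            - 0.5 * n * math.log(2.0 * math.pi))

def prior_from_stats(mean, var, T):
    """Hyperparameters encoding T pseudo-observations with the given
    mean and variance (T plays the poster's 'amount of prior data')."""
    return dict(m0=mean, beta0=T, a0=T / 2.0, b0=T * var / 2.0)

# Two HMM-state-like data sets with clearly different means.
state_a = [5.0 + 0.1 * ((i * 7) % 11 - 5) for i in range(20)]
state_b = [-5.0 + 0.1 * ((i * 5) % 11 - 5) for i in range(20)]
pooled = state_a + state_b
pool_mean = sum(pooled) / len(pooled)
pool_var = sum((x - pool_mean) ** 2 for x in pooled) / len(pooled)

def f_tied(T):
    # One prior shared (tied) by both states, built from pooled statistics.
    p = prior_from_stats(pool_mean, pool_var, T)
    return log_marginal_likelihood(state_a, **p) + log_marginal_likelihood(state_b, **p)

def f_untied(T):
    # Each state gets a prior centred on its own statistics.
    total = 0.0
    for xs in (state_a, state_b):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        total += log_marginal_likelihood(xs, **prior_from_stats(m, v, T))
    return total

grid = [0.1, 1.0, 10.0, 100.0, 1000.0]
best_tied = max(grid, key=f_tied)      # tied prior: F has an interior optimum
best_untied = max(grid, key=f_untied)  # untied prior: F keeps growing with T
```

With each prior centred on its own state's statistics, F keeps increasing toward the ML likelihood as T grows, so maximizing F picks the largest T on the grid; with the tied prior, a very large T forces every state toward the shared pooled parameters and F drops, so the maximum of F selects a finite, informative T.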
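Context clustering based on variational Bayes [Watanabe et al. 2002] grows the decision tree by choosing, at each node, the phonetic question whose Yes/No split most increases F. A toy sketch of that split criterion, again with a 1-D Normal-Gamma stand-in for the Gauss-Wishart prior; the contexts, questions, and feature values are all hypothetical.

```python
import math

def log_ml(xs, T=1.0, m0=0.0, v0=1.0):
    """Log marginal likelihood of xs under a Normal-Gamma prior that
    encodes T pseudo-observations with mean m0 and variance v0."""
    n = len(xs)
    if n == 0:
        return 0.0
    beta0, a0, b0 = T, T / 2.0, T * v0 / 2.0
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    beta_n, a_n = beta0 + n, a0 + n / 2.0
    b_n = b0 + 0.5 * ss + beta0 * n * (mean - m0) ** 2 / (2.0 * beta_n)
    return (math.lgamma(a_n) - math.lgamma(a0) + a0 * math.log(b0)
            - a_n * math.log(b_n) + 0.5 * math.log(beta0 / beta_n)
            - 0.5 * n * math.log(2.0 * math.pi))

# Toy statistics: one feature sample list per (left, right) triphone context.
contexts = {
    ("a", "k"): [4.8, 5.1, 5.0, 4.9],      # vowel left-contexts cluster high
    ("i", "i"): [5.2, 4.7, 5.1, 5.0],
    ("k", "a"): [-5.0, -4.9, -5.2, -4.8],  # consonant left-contexts cluster low
    ("t", "k"): [-5.1, -5.0, -4.7, -5.2],
}
VOWELS = {"a", "i", "u", "e", "o"}
questions = {
    "L-vowel?": lambda ctx: ctx[0] in VOWELS,  # is the left context a vowel?
    "R-vowel?": lambda ctx: ctx[1] in VOWELS,  # is the right context a vowel?
}

def split_gain(question):
    """Gain in F from splitting the pooled node data by one question."""
    yes = [x for ctx, xs in contexts.items() if question(ctx) for x in xs]
    no = [x for ctx, xs in contexts.items() if not question(ctx) for x in xs]
    return log_ml(yes) + log_ml(no) - log_ml(yes + no)

best = max(questions, key=lambda name: split_gain(questions[name]))
```

Here the left-context question separates the two well-separated clusters and yields a positive gain in F, while the right-context question mixes them and does not, so the tree accepts the former split. In the poster's setting the same comparison is done with Gauss-Wishart priors over full acoustic feature vectors.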