Learning mixture models for speech recognition - PowerPoint PPT Presentation

Provided by: jeremyj
1
Learning mixture models for speech recognition
  • Rohit Prabhavalkar, Prateeti Mohapatra

2
Outline
  • Introduction
  • Learning mixture models using EM with k-means
  • The experiment
  • Results
  • Future Work

3
Speech recognition
  • Given acoustic inputs (converted into feature
    vectors, e.g. PLP or MFCCs)
  • Find $\hat{Q} = \arg\max_{Q} P(Q \mid X)$
  • Where,
  • $\hat{Q}$ is the most likely phone sequence
  • X is the input sequence of feature vectors
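
The decoding objective above is commonly expanded with Bayes' rule, which is what connects it to generative acoustic models. This is the standard textbook decomposition, not something stated on the slide; in particular, the prior P(Q) over phone sequences is an assumed term that the slides do not mention.

```latex
\hat{Q} = \arg\max_{Q} P(Q \mid X)
        = \arg\max_{Q} \frac{p(X \mid Q)\, P(Q)}{p(X)}
        = \arg\max_{Q} p(X \mid Q)\, P(Q)
```

The last step holds because p(X) does not depend on Q, so it cannot change which Q attains the maximum.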

4
Conventional techniques for acoustic modeling
  • Generative models, such as HMMs, estimate $p(x \mid q)$
  • Usually modeled by a mixture of Gaussians for
    each phone
  • Conventionally this is done by EM, with k-means
    initialization
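
As a rough illustration of EM with k-means initialization, here is a toy sketch in pure Python. It is deliberately simplified: it fits a 1-D mixture to synthetic scalars, whereas real acoustic models use multivariate Gaussians over PLP or MFCC vectors. The function names (`kmeans_1d`, `em_gmm_1d`), the variance floor, and the synthetic data are all assumptions made for the example, not taken from the slides.

```python
import math
import random


def kmeans_1d(xs, k, iters=20, seed=0):
    """Plain k-means on scalars; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, iters=50, seed=0):
    """EM for a 1-D Gaussian mixture, with means initialized by k-means."""
    n = len(xs)
    means = kmeans_1d(xs, k, seed=seed)
    global_mean = sum(xs) / n
    global_var = sum((x - global_mean) ** 2 for x in xs) / n
    variances = [global_var] * k   # start every component at the data variance
    weights = [1.0 / k] * k
    log_liks = []
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | x_i) under current parameters.
        resp, ll = [], 0.0
        for x in xs:
            joint = [weights[j] * gauss_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        log_liks.append(ll)
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # variance floor (an assumption)
    return weights, means, variances, log_liks


# Synthetic data: two well-separated clusters (illustrative only).
rng = random.Random(1)
data = ([rng.gauss(-2.0, 0.5) for _ in range(200)]
        + [rng.gauss(3.0, 0.5) for _ in range(200)])
weights, means, variances, log_liks = em_gmm_1d(data, k=2)
print(sorted(means))
```

The log-likelihood recorded at each E-step should never decrease, which is the usual sanity check for an EM implementation. Note that `k` still has to be supplied by hand.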

5
Problems with conventional techniques
  • EM converges only to a local maximum of the
    likelihood, not necessarily the global one
  • 'Good' initialization for k-means is critical
  • It is usually hard to estimate the number of
    Gaussians to use

Image taken from http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
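
To make the initialization problem concrete, the following small sketch runs k-means (Lloyd's algorithm) twice on the same six points with two different starting positions. The data, the starting centers, and the function name `lloyd_1d` are invented for illustration; they are not from the slides or the linked applet.

```python
def lloyd_1d(points, centers, iters=100):
    """Lloyd's k-means on scalars, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable: converged
        centers = new_centers
    # Inertia: sum of squared distances to the nearest center.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]  # three obvious pairs

good_centers, good_inertia = lloyd_1d(points, [0.0, 10.0, 20.0])
bad_centers, bad_inertia = lloyd_1d(points, [0.0, 1.0, 12.0])

print(good_centers, good_inertia)  # [0.5, 10.5, 20.5] 1.5
print(bad_centers, bad_inertia)    # [0.0, 1.0, 15.5] 101.0
```

Both runs converge, but the second one spends two centers on a single pair and merges the other two pairs under one center. Since EM is then started from these centers, a poor k-means solution tends to carry over into the mixture model.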
7
The algorithm
  • Uses eigenvectors of the Kernel Matrix to
    estimate parameters of a Gaussian Mixture Model
  • Estimates the number of mixture components to use,
    along with the mean and covariance of each component
  • Choosing the bandwidth 'w' is a challenge
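
The slides do not spell out the algorithm, so the sketch below is not the authors' method. It only illustrates the two ingredients named above: a kernel matrix with a bandwidth `w`, and its leading eigenvector. The choice of a Gaussian kernel, the power-iteration solver, the toy data, and the function names are all assumptions.

```python
import math


def gaussian_kernel_matrix(xs, w):
    """K[i][j] = exp(-(x_i - x_j)^2 / (2 w^2)) for scalar data xs."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * w * w)) for b in xs]
            for a in xs]


def top_eigenpair(matrix, iters=500):
    """Power iteration for the leading eigenpair of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        mv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in mv))
        v = [c / norm for c in mv]
    # Rayleigh quotient v^T M v gives the eigenvalue estimate (v has unit norm).
    mv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # two tight clumps (toy data)

lam_small, vec_small = top_eigenpair(gaussian_kernel_matrix(xs, w=0.5))
lam_large, vec_large = top_eigenpair(gaussian_kernel_matrix(xs, w=50.0))

print(lam_small, lam_large)
```

With a small bandwidth the kernel matrix is close to block diagonal, so the leading eigenvalue is bounded by the size of one clump (below 3 here). With a very large bandwidth every entry is close to 1 and the leading eigenvalue approaches the number of points (6 here), at which point the cluster structure is no longer visible in the spectrum. This is one way to see why the choice of `w` matters.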

8
CRFs
  • Are discriminative models
  • Do not make independence assumptions about the
    input unlike HMMs
  • Have been shown to achieve better performance
    than HMMs for speech recognition
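
For reference, a linear-chain CRF models the conditional distribution directly. The slides do not give the form used in the experiments, so the equation below is the standard textbook definition; the feature functions $f_k$ and weights $\lambda_k$ are generic notation, assumed here.

```latex
P(Q \mid X) = \frac{1}{Z(X)}
  \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(q_{t-1}, q_t, X, t) \Big),
\qquad
Z(X) = \sum_{Q'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(q'_{t-1}, q'_t, X, t) \Big)
```

Because each feature function may look at the whole input sequence X, no independence assumption between input frames is needed, which is the contrast with HMMs drawn above.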

9
The experiments
  • Performed phone recognition on the TIMIT dataset
  • Used a Tandem System as the baseline
  • Test system was based on a CRF
  • Needed to randomly sample from the training set
    due to time constraints
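
A reproducible way to draw such a random subset might look like the sketch below. The slides do not say what fraction was sampled or at what granularity (frames or utterances), so the 10% fraction, the fixed seed, the `subsample` helper, and the utterance IDs are all hypothetical.

```python
import random


def subsample(items, fraction, seed=0):
    """Return a reproducible random subset with about `fraction` of the items."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(items) * fraction))
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(list(items), k)


# Hypothetical utterance IDs; these are not actual TIMIT file names.
utterance_ids = [f"utt{i:04d}" for i in range(1000)]
subset = subsample(utterance_ids, 0.1)
print(len(subset))  # 100
```

Fixing the seed means the same subset is drawn on every run, so results stay comparable across experiments.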

10
Results

11
Future work
  • Lack of time prevented us from experimenting with
    different bandwidth values
  • Could increase number of samples used in the
    algorithm
  • Could use the values predicted by the algorithm as
    initialization for EM