Learning mixture models for speech recognition

Provided by: jeremyj

1
Learning mixture models for speech recognition

Rohit Prabhavalkar, Prateeti Mohapatra

2
Outline

Introduction
Learning mixture models using EM with k-means
The experiment
Results
Future Work

3
Speech recognition

Given acoustic inputs (converted into feature
vectors, e.g. PLP or MFCCs),
find $\hat{Q} = \arg\max_{Q} P(Q \mid X)$
where
$\hat{Q}$ is the most likely phone sequence
X is the input sequence of feature vectors
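
For reference, this decoding objective is usually expanded with Bayes' rule. The identity below is the standard one and is not taken from the slides:

```latex
\hat{Q} = \arg\max_{Q} P(Q \mid X)
        = \arg\max_{Q} \frac{p(X \mid Q)\, P(Q)}{p(X)}
        = \arg\max_{Q} p(X \mid Q)\, P(Q)
```

The denominator $p(X)$ does not depend on $Q$, so it drops out of the argmax. The term $p(X \mid Q)$ is the acoustic model and $P(Q)$ is the prior over phone sequences.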

4
Conventional techniques for acoustic modeling

Generative models, such as HMMs, estimate $p(x \mid q)$
Usually modeled by a mixture of Gaussians for
each phone
Conventionally this is done by EM, with k-means
initialization
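
A minimal sketch of EM for a Gaussian mixture with k-means initialization, assuming diagonal covariances and NumPy only. The function names (`kmeans`, `fit_gmm`), the variance floor, and the toy data are illustrative assumptions, not the setup used in the presentation:

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and hard assignments."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels


def log_gauss_diag(X, means, variances):
    """Log density of each point under each diagonal Gaussian: shape (n, k)."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_det = np.log(2.0 * np.pi * variances)[None, :, :]
    return -0.5 * (log_det + diff2 / variances[None, :, :]).sum(axis=2)


def fit_gmm(X, k, n_iter=50, seed=0, var_floor=1e-3):
    """EM for a diagonal-covariance GMM, initialized from a k-means partition."""
    n, _ = X.shape
    means, labels = kmeans(X, k, seed=seed)
    weights = np.array([max((labels == j).mean(), 1e-6) for j in range(k)])
    weights /= weights.sum()
    variances = np.array([
        X[labels == j].var(axis=0) if np.sum(labels == j) > 1 else X.var(axis=0)
        for j in range(k)
    ])
    variances = np.maximum(variances, var_floor)
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        log_liks.append(log_norm.sum())
        # M-step: re-estimate weights, means and variances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)
    return weights, means, variances, log_liks


# Toy data: two well-separated 2-D clusters (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(6.0, 1.0, size=(200, 2))])
weights, means, variances, log_liks = fit_gmm(X, k=2)
print(means)
```

The number of components `k` has to be supplied by hand here, and the result depends on the random k-means start chosen by `seed`.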

5
Problems with conventional techniques

EM converges to local maxima
'Good' initialization for k-means is critical
It is usually hard to estimate the number of
gaussians to use
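
The sensitivity to initialization can be seen on a toy case. The sketch below runs Lloyd's k-means from two hand-picked starting points on four points at the corners of a wide rectangle; the data and the helper `lloyd` are hypothetical and chosen only to make the effect visible:

```python
import numpy as np


def lloyd(X, centers, n_iter=10):
    """Run Lloyd's k-means from the given centroids; return centroids and SSE."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Sum of squared distances from each point to its nearest final centroid.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.min(axis=1).sum()


# Four points on the corners of a rectangle that is 4 wide and 1 tall.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, sse_good = lloyd(X, [[0.0, 0.5], [4.0, 0.5]])  # splits left / right
_, sse_bad = lloyd(X, [[2.0, 0.0], [2.0, 1.0]])   # splits bottom / top and stays there
print(sse_good, sse_bad)
```

Both runs have converged, but the second start is stuck with a sum of squared errors of 16.0 against 1.0 for the first. EM started from the second partition inherits that poor starting point.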

Image taken from http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
7
The algorithm

Uses eigenvectors of the kernel matrix to
estimate the parameters of a Gaussian mixture model

Estimates the number of mixture components to use,
along with the mean and covariance of each component
Choosing the bandwidth 'w' is a challenge
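
A sketch of the first step only: building a kernel matrix with bandwidth `w` and taking its leading eigenvectors. The Gaussian kernel form, the function names, and the toy data are assumptions; the slides do not say how the eigenvectors are turned into mixture parameters, so that part is not shown:

```python
import numpy as np


def gaussian_kernel_matrix(X, w):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 w^2)); this kernel form is an assumption."""
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * w ** 2))


def top_eigenvectors(K, m):
    """Return the m largest eigenvalues of a symmetric matrix and their eigenvectors."""
    vals, vecs = np.linalg.eigh(K)  # eigh returns eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:m]
    return vals[idx], vecs[:, idx]


# Toy data: two tight, well-separated clusters of 50 points each (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
K = gaussian_kernel_matrix(X, w=1.0)
vals, vecs = top_eigenvectors(K, m=2)

# Fraction of each leading eigenvector's squared mass on the first cluster.
mass_on_first_cluster = (vecs[:50] ** 2).sum(axis=0)
print(vals, mass_on_first_cluster)
```

In this toy case the kernel matrix is close to block diagonal, so each leading eigenvector concentrates on one cluster. A much larger `w` blurs the two blocks together, which is one way to see why the bandwidth choice matters.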

8
CRFs

Are discriminative models
Do not make independence assumptions about the
input, unlike HMMs
Have been shown to achieve better performance
than HMMs for speech recognition
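
For comparison with the generative setup, the standard linear-chain CRF models the phone sequence directly given the input. The form below assumes a linear-chain CRF with feature functions $f_k$ and weights $\lambda_k$, which the slides do not spell out:

```latex
P(Q \mid X) = \frac{1}{Z(X)} \exp\!\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(q_{t-1}, q_t, X, t) \Big),
\qquad
Z(X) = \sum_{Q'} \exp\!\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(q'_{t-1}, q'_t, X, t) \Big)
```

Each feature function may look at the whole input sequence $X$, which is why no independence assumption on the input is needed.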

9
The experiments

Performed phone recognition on the TIMIT dataset
Used a Tandem System as the baseline
Test system was based on a CRF
Needed to randomly sample from the training set
due to time constraints

10
Results
11
Future work

Lack of time prevented us from experimenting with
different bandwidth values
Could increase number of samples used in the
algorithm
Could use the values predicted by the algorithm as
initialization for EM
