About This Presentation

Title:

MMM

Description:

Swiss Federal Institute of Technology, Lausanne. 1. Feature selection for ... Swiss Federal Institute of Technology, Lausanne. 19. Noisy AV and visual-only comparison ...

Slides: 21

Provided by: del60

Transcript and Presenter's Notes

1
Feature selection for audio-visual speech
recognition
Mihai Gurban
2
Outline

Feature selection and extraction
Why select features?
Information theoretic criteria
Our approach
The audio-visual recognizer
Audio-visual integration
Features and selection methods
Experimental results
Conclusion

3
Feature selection

Features and classification
Features (or attributes, properties,
characteristics) - different types of measures
that can be taken on the same physical phenomenon
An instance (or pattern, sample, example) -
collection of feature values representing
simultaneous measurements
For classification, each sample has an associated
class label
Feature selection
Finding, from the original feature set, a subset
that retains most of the information
relevant for a classification task
This is needed because of the curse of
dimensionality
Why dimensionality reduction?
The number of samples required to obtain accurate
models of the data grows exponentially with the
dimensionality
The computing resources required also grow with
the dimensionality of the data
Irrelevant information can decrease performance
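
A rough illustration of the exponential growth mentioned above, assuming a histogram-style density estimate with a fixed number of bins per feature. The bin count of 10 is an arbitrary choice for illustration, not a value from the slides, and `histogram_cells` is a hypothetical helper name.

```python
def histogram_cells(bins_per_feature: int, n_features: int) -> int:
    """Number of cells in a joint histogram over n_features features."""
    return bins_per_feature ** n_features


# Each added feature multiplies the number of cells that need samples.
for d in (1, 2, 5, 10):
    print(d, histogram_cells(10, d))
```

Every cell needs at least a few samples for a reliable estimate, so the amount of training data needed scales with the cell count.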

4
Feature selection

Entropy and mutual information
$H(X)$, the entropy of $X$: the amount of
uncertainty about the value of $X$
$I(X;Y)$, the mutual information between $X$ and $Y$:
the reduction in the uncertainty of $X$ due to the
knowledge of $Y$ (or vice versa)
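
For reference, the standard textbook definitions for discrete random variables can be written as follows. The slides give only the verbal descriptions, so the exact notation here is an assumption.

```latex
H(X) = -\sum_{x} p(x) \log p(x)

I(X;Y) = H(X) - H(X \mid Y)
       = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
```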
Maximum dependency
One of the frequently used criteria is mutual
information
Pick $Y_{S_1}, \ldots, Y_{S_m}$ from the set $Y_1, \ldots, Y_n$ of features, such
that $I(Y_{S_1}, Y_{S_2}, \ldots, Y_{S_m}; C)$ is maximum
How many subsets?
Checking all subsets is infeasible because the
number of combinations is too high
As an approximate solution, greedy algorithms are
used
The number of possibilities is reduced to
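
A minimal sketch of greedy forward selection under the maximum-dependency criterion, assuming discrete-valued features and a plug-in (count-based) estimate of mutual information. The function names and the toy data are hypothetical, and the sketch does not reproduce the author's own criterion.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_select(features, labels, m):
    """Add, one at a time, the feature that maximizes the joint
    mutual information between the selected set and the class label."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < m:
        def joint_mi(name):
            cols = [features[f] for f in selected + [name]]
            return mutual_information(list(zip(*cols)), labels)
        best = max(remaining, key=joint_mi)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (illustrative only): the label is a 2-bit number,
# "a" is its high bit, "b" its low bit, "a_copy" duplicates "a".
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "a": [c // 2 for c in labels],
    "a_copy": [c // 2 for c in labels],
    "b": [c % 2 for c in labels],
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],
}
print(greedy_select(features, labels, 2))
```

In this sketch the search evaluates n + (n-1) + ... + (n-m+1) candidate sets rather than every subset of size m. Because the joint mutual information is used, the redundant copy of "a" adds nothing in the second step and "b" is chosen instead.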

5
A simple example

Entropies and mutual information can be
represented by Venn diagrams
We are searching for the features $Y_{S_i}$ with
maximum mutual information with the class label
Assume the complete set of features is

6
A simple example
7
A simple example
8
A simple example
9
A simple example
10
Which criterion to penalize redundancy?

Many different criteria have been proposed in the
literature
Our criterion penalizes only relevant redundancy

11
Solutions from the literature

Natural DCT ordering
Zigzag scanning, used in compression (JPEG/MPEG)
Maximum mutual information
Typically the redundancy is not taken into
account
Linear Discriminant Analysis
A transform is applied on the features
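
The zigzag ordering mentioned above can be sketched as follows, assuming a square block of DCT coefficients and the JPEG-style traversal that starts at the top-left (lowest-frequency) coefficient. `zigzag_indices` is a hypothetical helper name.

```python
def zigzag_indices(n: int = 8):
    """Return the (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]  # listed by increasing row
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


print(zigzag_indices(8)[:6])
```

Keeping the first k entries of this ordering selects the k lowest-frequency coefficients, which is what makes it usable as a fixed, data-independent feature ranking.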

12
Our application: AVSR

Experiments on the CUAVE database
36 speakers, 10 words, 5 repetitions per speaker
Leave-one-out cross-validation
Audio features: MFCC coefficients
Visual features: DCT coefficients with first and second
temporal derivatives
Different levels of noise added to the audio
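
One way to append first and second temporal derivatives to a sequence of per-frame feature vectors is sketched below. The slides do not say how the derivatives are computed, so the central-difference scheme with replicated edge frames is an assumption, and the function names are hypothetical.

```python
def delta(frames):
    """First temporal derivative by central differences.

    `frames` is a list of equal-length feature vectors, one per video
    frame. The first and last frames are replicated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_derivatives(frames):
    """Concatenate static features with their first and second derivatives."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy sequence with a single static feature per frame (illustrative only).
frames = [[0.0], [1.0], [4.0], [9.0]]
print(add_derivatives(frames))
```

The output vectors are three times as long as the input vectors, which is one reason the visual stream's dimensionality becomes a concern.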

13
The multi-stream HMM

Audio-visual integration with multi-stream HMMs
States are modeled with Gaussian mixtures
Each modality is modeled separately
The emission likelihood is a weighted product
The optimal weights are chosen for each SNR
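
The weighted product of stream likelihoods becomes a weighted sum in the log domain. The sketch below assumes two streams whose weights sum to one, a common convention that the slides do not state explicitly. `av_log_likelihood` and its parameters are hypothetical names.

```python
def av_log_likelihood(log_b_audio, log_b_video, audio_weight):
    """Log of b_audio**w * b_video**(1 - w) for one HMM state.

    `log_b_audio` and `log_b_video` are the per-stream log emission
    likelihoods of the current observation; `audio_weight` is w.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight  # assumption: weights sum to 1
    return audio_weight * log_b_audio + video_weight * log_b_video


# With clean audio one would expect a weight near 1, with very noisy
# audio a smaller one. The values here are illustrative only.
print(av_log_likelihood(-2.0, -4.0, 0.5))
```

Choosing the weight for each SNR then amounts to repeating recognition over a grid of `audio_weight` values on held-out data and keeping the best one.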

14
Information content of different types of features
15
Visual-only recognition rate
16
Audio-visual performance
17
AV performance with clean audio
18
AV performance at 10 dB SNR
19
Noisy AV and visual-only comparison
20
Conclusion and future work

Feature selection for audio-visual speech
recognition
The visual-only recognition rate is not a good predictor
of audio-visual performance because of
dimensionality
Maximum audio-visual performance is obtained for
small video dimensionalities
Algorithms that improve performance at small
dimensionalities are needed
Future work
Better methods to compute the amount of
redundancy between features