Affiliation: Swiss Federal Institute of Technology, Lausanne


Title: MMM


1
Feature selection for audio-visual speech
recognition
Mihai Gurban
2
Outline
  • Feature selection and extraction
  • Why select features?
  • Information theoretic criteria
  • Our approach
  • The audio-visual recognizer
  • Audio-visual integration
  • Features and selection methods
  • Experimental results
  • Conclusion

3
Feature selection
  • Features and classification
  • Features (or attributes, properties,
    characteristics) - different types of measures
    that can be taken on the same physical phenomenon
  • An instance (or pattern, sample, example) -
    collection of feature values representing
    simultaneous measurements
  • For classification, each sample has an associated
    class label
  • Feature selection
  • Finding, from the original feature set, a subset
    that retains most of the information relevant
    to a classification task
  • This is needed because of the curse of
    dimensionality
  • Why dimensionality reduction?
  • The number of samples required to obtain accurate
    models of the data grows exponentially with the
    dimensionality
  • The computing resources required also grow with
    the dimensionality of the data
  • Irrelevant information can decrease performance
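A rough illustration of the exponential growth. This assumes (it is not stated in the talk) that each feature is quantized into $b$ bins and that the joint distribution of $d$ features is estimated with a histogram. The number of cells that must be populated with samples is then:

```latex
N_{\text{cells}} = b^{d}, \qquad b = 10:\quad d = 2 \Rightarrow 10^{2} \text{ cells}, \qquad d = 10 \Rightarrow 10^{10} \text{ cells}
```

Keeping even a handful of samples per cell quickly becomes impossible as $d$ grows.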

4
Feature selection
  • Entropy and mutual information
  • $H(X)$, the entropy of $X$: the amount of
    uncertainty about the value of $X$
  • $I(X;Y)$, the mutual information between $X$ and $Y$:
    the reduction in the uncertainty of $X$ due to
    knowledge of $Y$ (or vice versa)
  • Maximum dependency
  • One of the frequently used criteria is mutual
    information
  • Pick $Y_{S_1}, \dots, Y_{S_m}$ from the set $Y_1, \dots, Y_n$ of
    features such that $I(Y_{S_1}, Y_{S_2}, \dots, Y_{S_m}; C)$ is maximum,
    where $C$ is the class label
  • How many subsets?
  • Checking all subsets is impossible because of the
    high number of combinations
  • As an approximate solution, greedy algorithms are
    used
  • The number of possibilities is reduced to
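A minimal Python sketch of greedy forward selection under the maximum-dependency criterion. It assumes discrete (already quantized) features and plug-in entropy estimates from counts; the function names are hypothetical, and this is not claimed to be the estimator used in the talk. The joint estimate over many features suffers from exactly the dimensionality problem described earlier.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in entropy (bits) of a list of hashable values, from empirical counts."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_max_dependency(features, labels, m):
    """Greedy forward selection for the maximum-dependency criterion.

    features: dict mapping a (hypothetical) feature name to its list of
    discrete values, one value per sample. At each step the feature that
    maximizes the joint I(Y_S1, ..., Y_Sk; C) is added to the selection.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < m:
        def joint_mi(name):
            # Each sample becomes a tuple of the values of the chosen features.
            joint = list(zip(*(features[f] for f in selected + [name])))
            return mutual_information(joint, labels)

        best = max(remaining, key=joint_mi)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected
```

For this particular loop, with $n$ candidate features, the number of subsets scored is $n + (n-1) + \dots + (n-m+1)$ rather than all $\binom{n}{m}$ subsets of size $m$.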

5
A simple example
  • Entropies and mutual information can be
    represented by Venn diagrams
  • We are searching for the features $Y_{S_i}$ with
    maximum mutual information with the class label
  • Assume the complete set of features is

6
A simple example
7
A simple example
8
A simple example
9
A simple example
10
Which criterion to penalize redundancy?
  • Many different criteria have been proposed in the
    literature
  • Our criterion penalizes only relevant redundancy,
    i.e. redundancy that also carries information
    about the class label
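One way to make "relevant redundancy" concrete, sketched here as an assumption rather than the talk's exact criterion. In the Venn diagram picture, the redundancy between a candidate $Y$ and an already selected $Y_s$ that matters for classification is their three-way overlap with the class, $I(Y;Y_s;C) = I(Y;Y_s) - I(Y;Y_s \mid C)$. The sketch scores each candidate by $I(Y;C)$ minus `beta` times the sum of these overlaps over the selected features. The weight `beta` and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*cols):
    """Empirical joint entropy (bits) of one or more aligned discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mi(x, y, c):
    """I(X;Y|C) = H(X,C) + H(Y,C) - H(X,Y,C) - H(C)."""
    return entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)


def relevant_redundancy(x, y, c):
    """Three-way overlap I(X;Y;C) = I(X;Y) - I(X;Y|C) (interaction information)."""
    return mi(x, y) - conditional_mi(x, y, c)


def select(features, labels, m, beta=1.0):
    """Greedy selection: relevance I(Y;C) minus beta * sum of I(Y;Y_s;C)."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < m:
        def score(name):
            x = features[name]
            penalty = sum(relevant_redundancy(x, features[s], labels) for s in selected)
            return mi(x, labels) - beta * penalty

        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because $I(Y;Y;C) = I(Y;C)$, a duplicate of an already selected feature has its score reduced to $(1 - \beta)\, I(Y;C)$, which is zero for the default $\beta = 1$.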

11
Solutions from the literature
  • Natural DCT ordering
  • Zigzag scanning, used in compression (JPEG/MPEG)
  • Maximum mutual information
  • Typically, redundancy between features is not taken
    into account
  • Linear Discriminant Analysis
  • A transform is applied to the features
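The slide names zigzag scanning as the natural ordering of DCT coefficients. A minimal sketch of that ordering for an n x n block, and of keeping the first k coefficients in that order, follows. How the DCT block is computed from the mouth region is outside this sketch, and the function names are hypothetical.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal r + c. Odd diagonals are walked with the row
    # increasing; even diagonals with the row decreasing.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def first_k_zigzag(coeffs, k):
    """Keep the first k coefficients of a square 2-D DCT block in zigzag order."""
    order = zigzag_order(len(coeffs))
    return [coeffs[r][c] for r, c in order[:k]]
```

For an 8 x 8 block, the first positions in raster index (8 * row + col) are 0, 1, 8, 16, 9, 2, 3, ..., matching the usual JPEG scan. Low-frequency coefficients therefore come first, but redundancy among them is not considered.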

12
Our application: AVSR
  • Experiments on the CUAVE database
  • 36 speakers, 10 words, 5 repetitions per speaker
  • Leave-one-out cross-validation
  • Audio features: MFCCs
  • Visual features: DCT coefficients with first and
    second temporal derivatives
  • Different levels of noise added to the audio
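The slide does not say how the temporal derivatives are computed. A common choice for speech features, assumed here, is the regression formula $d_t = \sum_{\theta=1}^{\Theta} \theta\,(c_{t+\theta} - c_{t-\theta}) \,/\, (2 \sum_{\theta=1}^{\Theta} \theta^2)$, applied a second time to get the second derivative, with the first and last frames replicated at the edges. The window $\Theta = 2$ and the function names are hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based temporal derivative of a sequence of feature vectors.

    frames: list of equal-length lists, one feature vector per video frame.
    window: half-width of the regression window (assumed value, not from the talk).
    Edge frames are replicated, so the output has the same length as the input.
    """
    T = len(frames)
    denom = 2 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(T):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for theta in range(1, window + 1):
                nxt = frames[min(t + theta, T - 1)][d]  # replicate last frame
                prv = frames[max(t - theta, 0)][d]      # replicate first frame
                acc += theta * (nxt - prv)
            vec.append(acc / denom)
        out.append(vec)
    return out


def add_derivatives(frames, window=2):
    """Concatenate static features with first and second temporal derivatives."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

Appending both derivatives triples the visual dimensionality, which is one reason the selection step matters here.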

13
The multi-stream HMM
  • Audio-visual integration with multi-stream HMMs
  • States are modeled with Gaussian mixtures
  • Each modality is modeled separately
  • The emission likelihood is a weighted product of
    the audio and visual stream likelihoods
  • The optimal weights are chosen for each SNR
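In the log domain, a weighted product of stream likelihoods becomes a weighted sum of log-likelihoods. The minimal sketch below assumes diagonal-covariance Gaussian mixtures per stream and stream weights constrained to sum to one. That constraint is a common convention, not something the slide states. The function names are hypothetical, and choosing the audio weight for each SNR is not shown.

```python
import math


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    m = max(comps)
    # log-sum-exp over mixture components, for numerical stability
    return m + math.log(sum(math.exp(c - m) for c in comps))


def multistream_loglik(x_audio, x_video, gmm_audio, gmm_video, lambda_audio):
    """log b(o) = lambda_A * log b_A(o_A) + lambda_V * log b_V(o_V).

    Assumes lambda_A + lambda_V = 1. gmm_audio and gmm_video are
    (weights, means, variances) tuples for one HMM state.
    """
    lambda_video = 1.0 - lambda_audio
    return (lambda_audio * gmm_loglik(x_audio, *gmm_audio)
            + lambda_video * gmm_loglik(x_video, *gmm_video))
```

With `lambda_audio = 1.0` the model ignores the video stream entirely. Lowering it as the SNR drops shifts trust toward the visual features.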

14
Information content of different types of features
15
Visual-only recognition rate
16
Audio-visual performance
17
AV performance with clean audio
18
AV performance at 10 dB SNR
19
Noisy AV and visual-only comparison
20
Conclusion and future work
  • Feature selection for audio-visual speech
    recognition
  • The visual-only recognition rate is not a good
    predictor of audio-visual performance, because of
    the effect of the visual feature dimensionality
  • Maximum audio-visual performance is obtained for
    small video dimensionalities
  • Algorithms that improve performance at small
    dimensionalities are needed
  • Future work
  • Better methods to compute the amount of
    redundancy between features