1
HIWIRE Progress Report July 2006
  • Technical University of Crete
  • Speech Processing and Dialog Systems Group
  • Presenter: Alex Potamianos

2
Outline
  • Work package 1
    • Task 1: Blind Source Separation for ASR
    • Tasks 2, 5: Feature extraction and fusion
    • Task 4: Segment models for ASR
  • Work package 2
    • Tasks 1, 2: VTLN
    • Task 2: Bayes optimal adaptation
  • Work package 3
    • Task 1: Fixed platform integration

3
Blind Speech Separation (BSS) problem
4
Data Model and Problem Statement
  • Mixing impulse response matrix
  • Spatial signature of the i-th speaker for lag t
  • Additive noise vector
  • L: channel order
  • Objective: estimate the inverse-channel impulse
    response matrix W(t) from the observed signal
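The slide's equations did not survive extraction. A common way to write the convolutive model these terms describe is x(t) = Σ_{τ=0}^{L} H(τ) s(t − τ) + n(t). Here column i of H(τ) is taken as the spatial signature of speaker i at lag τ. Separation then applies y(t) = Σ_τ W(τ) x(t − τ).

The symbols H, s, n, y and the exact summation limits are assumptions. The sketch below is a minimal pure-Python version of that assumed model, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows x columns
Signal = List[List[float]]  # channels x samples


def mimo_fir(taps: List[Matrix], s: Signal) -> Signal:
    """out_m(t) = sum_tau sum_i taps[tau][m][i] * s_i(t - tau); s_i(t) = 0 for t < 0."""
    n_out, n_in = len(taps[0]), len(taps[0][0])
    n_samples = len(s[0])
    out = [[0.0] * n_samples for _ in range(n_out)]
    for tau, mat in enumerate(taps):  # one matrix per lag, tau = 0..L
        for t in range(tau, n_samples):
            for m in range(n_out):
                out[m][t] += sum(mat[m][i] * s[i][t - tau] for i in range(n_in))
    return out


def convolutive_mix(H: List[Matrix], s: Signal, n: Signal) -> Signal:
    """Assumed model x(t) = sum_{tau=0}^{L} H(tau) s(t - tau) + n(t), L = len(H) - 1."""
    x = mimo_fir(H, s)
    return [[xv + nv for xv, nv in zip(xr, nr)] for xr, nr in zip(x, n)]


# Example: 2 speakers, 2 microphones, channel order L = 1, no noise.
H = [[[1.0, 0.5], [0.2, 1.0]],   # H(0)
     [[0.3, 0.0], [0.0, 0.3]]]   # H(1)
s = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = convolutive_mix(H, s, [[0.0] * 3, [0.0] * 3])
# Separation would apply the same MIMO filter with an estimated W: y = mimo_fir(W, x)
```

Reusing one MIMO FIR routine for both mixing and demixing keeps the sketch short. It mirrors the fact that W(t) is itself a multichannel impulse response.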
5
BSS permutation problem
  • Permutation problem: the order of the separated
    outputs may differ from one frequency bin to the
    next
  • To solve the permutation problem, combine:
    • spatial constraints
    • continuity constraints in the frequency domain
  • The solution to the permutation problem can be
    formulated using an ILS minimization criterion
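To illustrate only the continuity idea, the sketch below aligns frequency bins greedily. For each bin it picks the output ordering whose magnitude envelopes correlate best with the already-aligned previous bin.

This envelope-correlation technique is swapped in for illustration. It is not the ILS criterion named on the slide, and it ignores the spatial constraints. The function names and the input layout are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / (va * vb) ** 0.5


def align_bins(envelopes: List[List[List[float]]]) -> List[Tuple[int, ...]]:
    """Greedy continuity-based permutation alignment (illustrative only).

    envelopes[k][j] is the magnitude envelope over time frames of separated
    output j in frequency bin k. Bin 0 is the reference. Returns perm[k] such
    that output perm[k][j] of bin k is assigned to source j.
    """
    n_src = len(envelopes[0])
    perms = [tuple(range(n_src))]
    prev = envelopes[0]
    for bin_env in envelopes[1:]:
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(correlation(prev[j], bin_env[p[j]]) for j in range(n_src)),
        )
        perms.append(best)
        prev = [bin_env[best[j]] for j in range(n_src)]  # aligned bin becomes reference
    return perms
```

Exhaustive search over permutations is only practical for a few sources. A greedy bin-to-bin chain can also propagate a single wrong decision, which is one reason to combine continuity with spatial constraints.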

6
Recent progress
  • Improved solution to the permutation problem
    • Combining spatial and continuity constraints
    • Trying out different continuity criteria
  • Created a synthetic database using typical room
    impulse responses
  • First ASR experiments using the synthetic
    database

8
Motivation
  • Combining classifiers/information sources is an
    important problem in machine learning
    applications
  • A simple yet powerful way to combine classifiers
    is the multi-stream approach, which assumes
    independent information sources
  • Unsupervised stream weight computation for
    multi-stream classifiers is an open problem
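Under the independence assumption, a multi-stream classifier is commonly written as a weighted sum of per-stream log-likelihoods: score(c) = Σ_s w_s log p(x_s | c). The sketch below assumes that form. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def multistream_decision(stream_loglik: List[Dict[str, float]],
                         weights: List[float]) -> str:
    """Return the class c maximising sum_s weights[s] * stream_loglik[s][c].

    stream_loglik[s][c] is log p(x_s | c) for stream s (e.g. audio, video).
    """
    classes = stream_loglik[0].keys()

    def score(c: str) -> float:
        return sum(w * ll[c] for w, ll in zip(weights, stream_loglik))

    return max(classes, key=score)


audio = {"yes": -1.0, "no": -3.0}
video = {"yes": -5.0, "no": -1.0}
decision = multistream_decision([audio, video], [0.9, 0.1])
```

The weights decide how much each stream can overrule the others. This is why choosing them, particularly without labels, is the open problem stated above.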

9
Problem Definition
10
Optimal Stream Weights Result I
  • If the single-stream classifiers have equal error
    rates, the optimal stream weights are inversely
    proportional to the total stream estimation error
    variance

11
Optimal Stream Weights Result II
  • If the estimation error variance is equal in each
    stream, the optimal weights are approximately
    inversely proportional to the single-stream
    classification error
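Both results set each weight proportional to 1 / (a per-stream quantity): the estimation error variance in Result I, or the single-stream classification error in Result II. The sketch below computes such weights. Normalising them to sum to one is an assumption, since the slides state only proportionality. The function name is hypothetical.

```python
from typing import List


def inverse_proportional_weights(values: List[float]) -> List[float]:
    """w_s proportional to 1 / values[s], normalised so the weights sum to 1.

    values: per-stream estimation error variances (Result I) or per-stream
    classification error rates (Result II, an approximate rule).
    """
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    inv = [1.0 / v for v in values]
    total = sum(inv)
    return [x / total for x in inv]


# Two streams with error rates 10% and 30%: the more reliable stream gets 3x the weight.
w = inverse_proportional_weights([0.1, 0.3])
```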

12
Recent Progress
  • Experiments with synthetic data (Gaussian
    distribution classification problem)
    • Results show a good match with the theoretical
      results
  • Experimental verification for Naïve Bayes
    classifiers
    • Utterance classification (NLP application)
  • First experiments with unsupervised estimates
    of stream weights
    • Intra-class-based metrics on the observations
    • AV-ASR application

14
Dynamical System Segment Model
  • Based on a linear dynamical system (LDS), where
    x is the state, y the observation, u the control
    input, and w, v are noise terms
  • The system parameters should guarantee
    identifiability, controllability, observability,
    and stability
  • We investigated more general parameter
    structures
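The state-space equations on this slide were lost. A standard discrete-time form consistent with the symbols here and on the next slide (F, H, P, R) is:

x_{k+1} = F x_k + B u_k + w_k
y_k = H x_k + v_k

with w_k ~ N(0, P) and v_k ~ N(0, R).

The input matrix name B, the assignment of P and R to the two noise terms, and the diagonal-covariance simplification below are assumptions. In this form, stability means every eigenvalue of F lies inside the unit circle. The sketch only generates data from the model and does not estimate parameters.

```python
import random
from typing import List, Optional

Vec = List[float]
Mat = List[List[float]]


def matvec(A: Mat, x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def simulate_lds(F: Mat, H: Mat, x0: Vec, n_steps: int,
                 B: Optional[Mat] = None, u: Optional[List[Vec]] = None,
                 p_var: Optional[Vec] = None, r_var: Optional[Vec] = None,
                 seed: int = 0) -> List[Vec]:
    """Generate y_0..y_{n_steps-1} from the assumed LDS:
        x_{k+1} = F x_k + B u_k + w_k,   w_k ~ N(0, diag(p_var))
        y_k     = H x_k + v_k,           v_k ~ N(0, diag(r_var))
    A noise term is skipped when its variance vector is None.
    """
    rng = random.Random(seed)
    x = list(x0)
    ys = []
    for k in range(n_steps):
        y = matvec(H, x)
        if r_var is not None:
            y = [yi + rng.gauss(0.0, var ** 0.5) for yi, var in zip(y, r_var)]
        ys.append(y)
        x = matvec(F, x)
        if B is not None and u is not None:
            x = [xi + bi for xi, bi in zip(x, matvec(B, u[k]))]
        if p_var is not None:
            x = [xi + rng.gauss(0.0, var ** 0.5) for xi, var in zip(x, p_var)]
    return ys


# Noise-free example with a stable F (eigenvalues 0.5, 0.5) and a 1-D observation.
ys = simulate_lds([[0.5, 0.0], [0.0, 0.5]], [[1.0, 0.0]], [1.0, 2.0], 3)
```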

15
Generalized forms of parameter structures
  • The system parameters have an identifiable
    canonical form:
    • F: ones on the superdiagonal and zeros
      elsewhere, except rows r_i (i = 1, …, n), which
      hold free parameters
    • H: same column dimension as F, filled with
      zeros; taking r_0 = 0, row i has a one in
      column r_{i-1} + 1
    • P, R: filled with free parameters
  • We propose a novel element-wise estimation
    procedure based on the EM algorithm for system
    identification
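The structure of F and H described above can be built mechanically from the row indices r_i. The sketch below assumes 1-based indices given in increasing order, with the last index equal to the state dimension. It uses p for the number of indices, which is an assumption about the slide's notation.

The function name is hypothetical. The element-wise EM estimation proposed on the slide is not shown.

```python
from typing import List, Sequence, Tuple

Mat = List[List[float]]


def canonical_f_h(r: Sequence[int],
                  free_rows: Sequence[Sequence[float]]) -> Tuple[Mat, Mat]:
    """Build F (n x n) and H (p x n) in the canonical form described above.

    r = [r_1, ..., r_p]: increasing 1-based row indices, assumed r_p = n.
    F: ones on the superdiagonal, zeros elsewhere, except rows r_i, which are
       replaced by the free-parameter rows free_rows[i-1].
    H: zeros, except row i has a one in column r_{i-1} + 1 (with r_0 = 0).
    """
    n = r[-1]
    F = [[1.0 if c == row + 1 else 0.0 for c in range(n)] for row in range(n)]
    for ri, params in zip(r, free_rows):
        if len(params) != n:
            raise ValueError("each free row needs n parameters")
        F[ri - 1] = list(params)          # 1-based row r_i -> 0-based index r_i - 1
    prev = [0] + list(r[:-1])             # r_0 = 0, r_1, ..., r_{p-1}
    H = [[0.0] * n for _ in r]
    for i, r_prev in enumerate(prev):
        H[i][r_prev] = 1.0                # 1-based column r_{i-1} + 1 -> 0-based r_{i-1}
    return F, H


# Example: state dimension 4, two observations, r = [2, 4].
F, H = canonical_f_h([2, 4], [[1, 2, 3, 4], [5, 6, 7, 8]])
```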

16
Application to speech
  • Experiments on clean data from AURORA 2
  • 11 word models (one to nine, zero, oh)
  • The number of segments in each model depends on
    the number of phones in the word
  • HTK used for feature extraction (14 MFCCs)
  • Alignments obtained with HTK using HMMs
  • 4000 training sentences; 600 isolated words for
    testing

17
Results
  • Fig. (a): classification performance for three
    different initializations
  • Fig. (b): the log-likelihood increases for the
    same runs

18
Conclusions and Future Work
  • Developed new forms of linear state-space models
  • Proposed a novel element-wise parameter
    estimation process
  • Performed training and classification on
    AURORA 2 based on speech segments and LDS
  • Results show a correlation between performance
    and initialization
  • Future work:
    • investigation of optimal initialization
    • feature-segment alignment (through dynamic
      programming)
    • investigation of the state-space dimension

20
Vocal Tract Length Normalization.
  • Linear and Non-Linear Frequency Warping.
  • Multi-Parameter Frequency Warping.
  • Warping and Spectral Bias Addition by ML
    Estimation.

21
Linear and Non-Linear Warping Analysis
  • An optimal warping factor α is computed for each
    phoneme so that the Euclidean spectral distance
    (MSE) between the warped spectrum g(X) and the
    corresponding unwarped spectrum X is minimized;
    the optimization is done by full search
  • The mapped spectrum is then warped according to
    this optimal warping factor
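The full search above can be sketched as follows. The sketch assumes:

- envelopes sampled on uniform frequency bins;
- a linear warp g(X)[k] = X(α k) with linear interpolation;
- a separate reference (target) envelope to compare against, because the slide's symbols for the two spectra did not survive;
- a fixed grid of candidate α values.

The optional constant bias links to the bias-removal work. The least-squares bias, which is the ML choice under an equal-variance Gaussian assumption, is the mean residual. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def resample(env: Sequence[float], positions: Sequence[float]) -> List[float]:
    """Linearly interpolate env (sampled at integer bins) at fractional positions,
    clamping positions beyond the band edges to the first/last bin."""
    last = len(env) - 1
    out = []
    for p in positions:
        p = min(max(p, 0.0), float(last))
        lo = int(p)
        hi = min(lo + 1, last)
        frac = p - lo
        out.append((1.0 - frac) * env[lo] + frac * env[hi])
    return out


def linear_warp(env: Sequence[float], alpha: float) -> List[float]:
    """g(X)[k] = X(alpha * k): re-sample the envelope at linearly warped bins."""
    return resample(env, [alpha * k for k in range(len(env))])


def best_warp(source: Sequence[float], target: Sequence[float],
              alphas: Sequence[float], remove_bias: bool = False) -> Tuple[float, float, float]:
    """Full search for the alpha minimising the MSE between the warped source
    envelope and the target envelope. Returns (alpha, bias, mse)."""
    best = None
    n = len(target)
    for a in alphas:
        warped = linear_warp(source, a)
        # Least-squares constant bias for this alpha (mean residual), if requested.
        b = sum(t - w for t, w in zip(target, warped)) / n if remove_bias else 0.0
        mse = sum((t - w - b) ** 2 for t, w in zip(target, warped)) / n
        if best is None or mse < best[2]:
            best = (a, b, mse)
    return best


source = [float(k * k % 11) for k in range(30)]
target = linear_warp(source, 1.1)   # synthetic target warped with alpha = 1.1
alpha, bias, mse = best_warp(source, target, [0.9, 1.0, 1.1, 1.2])
```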

22
Linear and Non-Linear Warping
  • Frequency warping is implemented by re-sampling
    the spectral envelope at linearly or nonlinearly
    warped frequency indices, namely:
    1. Linear
    2. Piece-Wise Non-Linear
    3. Power
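The slide does not give the warping formulas. The functions below are common forms assumed for illustration:

- linear: f' = α f
- power: f' = f_max (f / f_max)^α, which keeps 0 and f_max fixed
- piecewise: a two-segment piecewise-linear map that pins f_max

The piecewise function is a piecewise-linear stand-in and may differ from the piecewise non-linear function used in this work. The breakpoint f_break and the function names are hypothetical.

```python
# Each function maps an input frequency f in [0, f_max] to a warped frequency.


def linear(f: float, alpha: float) -> float:
    """Linear warping: f' = alpha * f."""
    return alpha * f


def piecewise(f: float, alpha: float, f_break: float, f_max: float) -> float:
    """Slope alpha up to f_break, then a straight line that maps f_max to f_max."""
    if f <= f_break:
        return alpha * f
    y_break = alpha * f_break
    return y_break + (f - f_break) * (f_max - y_break) / (f_max - f_break)


def power(f: float, alpha: float, f_max: float) -> float:
    """Power warping: f' = f_max * (f / f_max) ** alpha (0 and f_max stay fixed)."""
    return f_max * (f / f_max) ** alpha
```

Pinning the band edge, as in the piecewise and power forms, keeps warped frequencies inside the analysis band. Plain linear warping with α > 1 would otherwise push energy past f_max.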

23
Multi-Parameter Frequency Warping.
  • After computing the optimal warping factor, we
    explore alternative piecewise linear frequency
    warping strategies:
    • Bi-parametric warping function (2 pts):
      different warping factors are evaluated for the
      low (F < 3 kHz) and high (F ≥ 3 kHz) frequencies
    • Four-parametric warping function (4 pts):
      different warping factors are evaluated for the
      frequency ranges 0-1.5, 1.5-3, 3-4.5 and
      4.5-8 kHz
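A multi-parametric warp of this kind can be sketched as a continuous piecewise-linear map with one slope (warping factor) per band. Two details are assumptions: joining the segments so the map stays continuous, and using 8 kHz as the upper edge of the bi-parametric high band. The four band edges are taken from the slide. The names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def multi_param_warp(f: float, edges: Sequence[float], alphas: Sequence[float]) -> float:
    """Continuous piecewise-linear warp with one warping factor per band.

    edges = [e_0, ..., e_K] in kHz with e_0 = 0; alphas holds K factors.
    Inside band k the slope is alphas[k]; segments are joined end to end.
    """
    if len(edges) != len(alphas) + 1:
        raise ValueError("need one alpha per band")
    k = max(min(bisect_right(edges, f) - 1, len(alphas) - 1), 0)
    start = sum(alphas[j] * (edges[j + 1] - edges[j]) for j in range(k))
    return start + alphas[k] * (f - edges[k])


BI_EDGES = [0.0, 3.0, 8.0]               # low (< 3 kHz) / high (>= 3 kHz); 8 kHz edge assumed
FOUR_EDGES = [0.0, 1.5, 3.0, 4.5, 8.0]   # the four ranges listed on the slide

# 4 kHz with factors 1.1 (low band) and 0.9 (high band): 3 * 1.1 + 1 * 0.9
warped = multi_param_warp(4.0, BI_EDGES, [1.1, 0.9])
```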

24
Reduction in MSE: Non-Linear Warping
25
Reduction in MSE: Multi-Parametric Warping
26
Reduction in MSE: Bias Removal and Multi-Parametric Warping
27
Ongoing work
  • Implementation of phone-dependent warping in
    HTK
  • Implementation of multi-parametric warping and
    bias removal in HTK

29
Optimal Bayes Adaptation
  • The central problem is to determine
  • Using Bayes' rule, we have
  • Two-step process:
    • Obtain the priors from the speaker-independent
      (SI) models
    • Compute the likelihoods
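The two steps combine through Bayes' rule: posterior ∝ likelihood × prior, normalised over the candidates. The sketch assumes a discrete set of candidate parameters, such as mixture components. It assumes log-priors derived from the SI models and log-likelihoods of the adaptation data, and it normalises in the log domain for numerical stability. The function and key names are hypothetical.

```python
import math
from typing import Dict


def posterior(log_prior: Dict[str, float], log_lik: Dict[str, float]) -> Dict[str, float]:
    """p(theta | X) = p(X | theta) p(theta) / sum_theta' p(X | theta') p(theta'),
    computed with the log-sum-exp trick so very negative log-likelihoods do not underflow."""
    joint = {k: log_prior[k] + log_lik[k] for k in log_prior}
    m = max(joint.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    return {k: math.exp(v - log_z) for k, v in joint.items()}


post = posterior({"c1": math.log(0.5), "c2": math.log(0.5)},
                 {"c1": math.log(0.2), "c2": math.log(0.6)})
```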

30
Phone-Based Clustering
  • Cluster the output distributions based on a
    common central phone

Figure: mixture components 1, 2, …, M of genone 1
and genone 2 against the number of dimensions
(cepstral coefficients)
Each component of the above representation stands
for the prior
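A minimal sketch of the clustering step follows. It assumes HTK-style triphone names of the form l-c+r, where c is the central phone. Each resulting cluster would correspond to one of the genones in the figure. The function names are hypothetical, and how the clustered distributions are then tied is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def central_phone(model_name: str) -> str:
    """Central phone of an HTK-style name 'l-c+r' (also handles 'c', 'l-c', 'c+r')."""
    name = model_name.split("-", 1)[-1]   # drop the left context, if present
    return name.split("+", 1)[0]          # drop the right context, if present


def cluster_by_central_phone(models: Iterable[str]) -> Dict[str, List[str]]:
    """Group model names whose output distributions share a central phone."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for m in models:
        clusters[central_phone(m)].append(m)
    return dict(clusters)


groups = cluster_by_central_phone(["s-ih+k", "t-ih+n", "ih+ng", "s-t+aa"])
```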
31
Our Implementation
  • Computation of priors using
  • Computation of likelihoods using the Baum-Welch
    algorithm and ML estimation
  • After computing the posterior probabilities, we
    apply smoothing; the techniques used are:
    • Flooring
    • Uniform
    • Delta
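A minimal sketch of two of the smoothing options, under assumed definitions:

- flooring: raise each posterior to at least a minimum value, then renormalise;
- uniform: linearly interpolate with a uniform distribution, using an interpolation weight lam (a hypothetical parameter).

The slide does not define the delta variant, so it is not sketched. Function names are hypothetical.

```python
from typing import List


def floor_smooth(post: List[float], floor: float) -> List[float]:
    """Raise every posterior to at least `floor`, then renormalise to sum to 1."""
    clipped = [max(p, floor) for p in post]
    total = sum(clipped)
    return [p / total for p in clipped]


def uniform_smooth(post: List[float], lam: float) -> List[float]:
    """Interpolate with a uniform distribution: (1 - lam) * p + lam / K."""
    k = len(post)
    return [(1.0 - lam) * p + lam / k for p in post]


smoothed = uniform_smooth([0.0, 1.0], 0.2)
```

Both variants keep the posteriors a valid distribution while preventing any component from receiving exactly zero mass, which matters when the adaptation data are sparse.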
