A Microphone Array Beamforming Approach to Blind Speech Separation
1
A Microphone Array Beamforming Approach to Blind
Speech Separation
PASCAL Speech Separation Challenge II
  • Iain McCowan, Ivan Himawan, Mike Lincoln.
  • CSIRO, U Edin, QUT, IDIAP.

2
The Challenge
  • To separate the speech of two talkers
    simultaneously reading sentences from the Wall
    Street Journal (WSJ) speech corpus.
  • Recordings are made using 16 microphones in 2
    eight element circular arrays on a table in the
    centre of a reverberant meeting room.
  • Knowledge of the room layout and speaker and
    microphone placements may NOT be used UNLESS
    derived automatically.

3
Possible Solutions
  • Traditionally, there have been two different
    approaches to multi-channel speech enhancement:
  • Microphone Array Beamforming,
  • Blind Source Separation.
  • Beamforming is more robust for ASR, but it
    requires knowledge of microphone and speaker
    locations.

4
Our Approach
  • Automatically derive microphone and speaker
    positions.
  • Apply microphone array beamforming to separate
    the speech.

5
Our System
  • Array Shape Calibration.
  • Speaker Localisation.
  • Beamforming.
  • Post-filtering.
  • Speech Recognition.

6
1. Array Shape Calibration
  • Consists of determining the relative positions of
    array elements.
  • Existing techniques rely on:
  • A good initial estimate of the geometry,
  • Multiple explicit calibration signals at known
    locations, or
  • Purpose-built devices, e.g. with speakers
    co-located with microphones.
  • Tested on simulated data.

7
1. Array Shape Calibration
  • A diffuse noise field is a good model for a
    number of practical environments (e.g. office,
    car).
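The coherence between two microphones spaced d metres apart in a spherically isotropic (diffuse) noise field is sin(2*pi*f*d/c) / (2*pi*f*d/c). A minimal Python sketch of this model, assuming c = 343 m/s (the function name is illustrative):

import numpy as np

def diffuse_coherence(d, freqs, c=343.0):
    # Model coherence of a diffuse field between two microphones d metres
    # apart: sin(x)/x with x = 2*pi*f*d/c.  np.sinc(y) = sin(pi*y)/(pi*y),
    # so the argument is divided by pi.
    x = 2.0 * np.pi * np.asarray(freqs) * d / c
    return np.sinc(x / np.pi)

# Coherence decays faster with frequency for wider spacings.
freqs = np.linspace(100.0, 8000.0, 256)
print(diffuse_coherence(0.05, freqs)[:3], diffuse_coherence(0.20, freqs)[:3])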

8
1. Array Shape Calibration
9
1. Array Shape Calibration
  • Multi-channel microphone signals
  • a. Detect noise frames → measured noise coherence
    matrix
  • b. Fit model to measured coherence →
    inter-microphone distance matrix
  • c. Multidimensional scaling → microphone position
    vector
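Steps (b) and (c) could be sketched as follows: for each microphone pair, a one-dimensional search picks the spacing whose model coherence best fits the measurement, and classical multidimensional scaling turns the resulting distance matrix into relative coordinates. The helper names and search grid are assumptions; the sketch reuses diffuse_coherence() from the earlier block.

import numpy as np

def fit_pairwise_distance(measured_coh, freqs, c=343.0,
                          d_grid=np.linspace(0.005, 0.60, 400)):
    # Step (b): choose the spacing whose model coherence best matches
    # the measured coherence in the least-squares sense.
    errors = [np.sum((measured_coh - diffuse_coherence(d, freqs, c)) ** 2)
              for d in d_grid]
    return d_grid[int(np.argmin(errors))]

def classical_mds(D, ndim=2):
    # Step (c): classical multidimensional scaling from an N x N
    # inter-microphone distance matrix to relative coordinates.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:ndim]      # keep the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

Note that MDS recovers positions only up to rotation and reflection, which is why the sub-array estimates are later aligned to the initial global estimates (slide 11).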
10
1. Array Shape Calibration
[1] I. McCowan, M. Lincoln, and I. Himawan,
"Microphone Array Calibration in Diffuse Noise
Fields," submitted to IEEE TASLP, 2006.
11
1. Array Shape Calibration
  • Global position calibration for all 16
    microphones.
  • K-means to cluster into localised sub-arrays.
  • Increase K until every cluster's dimensions fall
    below a threshold (all elements < 5 cm apart), as
    sketched below.
  • Re-calibrate positions for each sub-array.
  • Ensure common coordinate system by aligning to
    initial global estimates.
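A minimal sketch of this clustering loop, assuming scikit-learn's KMeans and the 5 cm spread threshold mentioned above (the exact clustering details are not specified on the slide):

import numpy as np
from sklearn.cluster import KMeans

def cluster_subarrays(mic_pos, max_spread=0.05):
    # Increase K until, within every cluster, the largest pairwise
    # element spacing is below max_spread metres.
    for k in range(1, len(mic_pos) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(mic_pos)
        worst = 0.0
        for c in range(k):
            pts = mic_pos[labels == c]
            diffs = pts[:, None, :] - pts[None, :, :]   # pairwise offsets
            worst = max(worst, np.sqrt((diffs ** 2).sum(-1)).max())
        if worst < max_spread:
            return labels, k
    return labels, len(mic_pos)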

12
2. Speaker Localisation
  • Note that in this step we assume two stationary
    speakers.
  • For each sub-array:
  • Grid search over SRP-PHAT values (sketched
    below).
  • Take the 2 most prominent sources for each file.
  • Merge estimates across sub-arrays:
  • Take the globally most confident source as the
    first estimate.
  • Select the second estimate as the one with the
    greatest azimuth angular separation from the
    first (low likelihood of being the same speaker).
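A rough sketch of the SRP-PHAT grid search for one block of frames: each candidate position is scored by summing GCC-PHAT cross-spectra evaluated at the pairwise delays that position would produce, and the prominent sources are the grid points with the highest scores. The input names (frames, mic_pos, grid) and the single-block formulation are assumptions.

import numpy as np

def srp_phat_scores(frames, mic_pos, grid, fs, c=343.0):
    # frames: (n_mics, n_samples) time-domain block; mic_pos: (n_mics, 3);
    # grid: candidate source positions, shape (n_points, 3).
    n_mics, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    scores = np.zeros(len(grid))
    for gi, p in enumerate(grid):
        delays = np.linalg.norm(mic_pos - p, axis=1) / c      # propagation times
        for i in range(n_mics):
            for j in range(i + 1, n_mics):
                cross = X[i] * np.conj(X[j])
                phat = cross / (np.abs(cross) + 1e-12)        # phase transform
                tau = delays[i] - delays[j]
                # GCC-PHAT evaluated at this candidate's inter-mic delay
                scores[gi] += np.real(np.sum(phat * np.exp(2j * np.pi * freqs * tau)))
    return scores   # peaks over the grid give the prominent sources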

13
3. Beamforming
  • Filter-sum beamforming for spatial filtering of
    signals.
  • Superdirective filter weights (see the sketch
    after this list).
  • Maximise gain in the desired direction while
    minimising average gain over all other
    directions.
  • Shown to be a robust beamformer in ASR
    applications.
  • Better gain than simple delay-sum, but less
    signal distortion than many adaptive techniques.
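Superdirective weights are the MVDR solution against a diffuse noise field, w = Gamma^-1 d / (d^H Gamma^-1 d), normally computed with diagonal loading for robustness. A minimal per-bin sketch, assuming delay-only near-field steering to the estimated speaker position and an illustrative loading value mu (the exact regularisation used in the system is not stated):

import numpy as np

def superdirective_weights(mic_pos, src_pos, freqs, c=343.0, mu=1e-2):
    # mic_pos: (n_mics, 3); src_pos: estimated speaker position, shape (3,).
    n = len(mic_pos)
    dists = np.linalg.norm(mic_pos - src_pos, axis=1)
    pair_d = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    W = np.zeros((len(freqs), n), dtype=complex)
    for k, f in enumerate(freqs):
        d = np.exp(-2j * np.pi * f * dists / c)     # steering vector to the source
        Gamma = np.sinc(2.0 * f * pair_d / c)       # diffuse-field coherence matrix
        Gamma_l = Gamma + mu * np.eye(n)            # diagonal loading
        num = np.linalg.solve(Gamma_l, d)           # Gamma^-1 d
        W[k] = num / (np.conj(d) @ num)             # distortionless normalisation
    return W   # output per STFT bin k: sum_m conj(W[k, m]) * X_m[k]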

14
4. Post-filtering
  • Simple masking post-filter to separate the
    speech.
  • Motivation [2]:
  • For 2 speech signals combined additively, the log
    spectrum is well modelled as the maximum of the 2
    individual log spectra. This is due to sparsity
    of speech signal over frequency and time.

[2] S. T. Roweis, "Factorial models and
refiltering for speech separation and denoising,"
in Proc. Eurospeech, 2003, pp. 1009-1012.
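Under this log-max approximation, one simple masking post-filter is a binary mask that assigns each time-frequency bin to whichever beamformer output has the larger magnitude there. A minimal sketch (whether this matches the exact post-filter used is not stated on the slide):

import numpy as np

def logmax_masks(S1, S2):
    # S1, S2: complex STFTs (freq x frames) of the two beamformer outputs,
    # one steered at each speaker.  Each bin is kept only in the stronger one.
    m1 = np.abs(S1) >= np.abs(S2)
    return S1 * m1, S2 * (~m1)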
15
5. Speech Recognition
  • Used the provided evaluation recognition system.
  • HTK, HMM/GMM, Tri-gram LM.
  • Adaptation on Dev set to account for distant
    microphones and processing.

16
Development Results
  • Results on SSC2 Dev set.
  • Evaluation measures:
  • Array calibration and speaker localisation:
    accuracy compared to the known microphone and
    desired speaker locations.
  • Beamformer and post-filter:
  • Speech recognition word error rate (WER).
  • To avoid speaker-dependent adaptation, these dev
    results were generated using K-fold
    cross-validation.

17
1. Array Calibration Accuracy
18
2. Speaker Localisation Accuracy
19
3. Speech Recognition
[Table: WER results for the "Both" and "Best" conditions]
20
Evaluation Results
[Table: WER results for the "Both" and "Best" conditions]
21
Conclusions
  • This was a difficult challenge.
  • Array processing yields major improvement over
    single distant microphone.
  • Results show that lapel-like performance is a
    realistic target for ongoing research.
  • Ways to improve on our baseline system:
  • Better speaker localisation.
  • Investigate other post-filter strategies.
  • More sophisticated ASR.