FYP0202 Advanced Audio Information Retrieval System - PowerPoint PPT Presentation

Slides: 14
Provided by: eduh87
Transcript and Presenter's Notes

1
FYP0202 Advanced Audio Information Retrieval System
  • By Alex Fok, Shirley Ng

2
Outline
  • Overview
  • Read in the raw speech
  • MFCC processing
  • Detect the audio scene change
  • Audio Clustering
  • Interleave Audio Clustering
  • Conclusion

3
Overview
  • Automatic segmentation of an audio stream and
    automatic clustering of audio segments have
    received quite a bit of attention recently.
  • For example, in automatic transcription of
    broadcast news, the data contains clean speech,
    telephone speech, music segments, and speech
    corrupted by music or noise.

4
Overview (cont)
  • We would like to segment the audio stream into
    homogeneous regions according to speaker identity.
  • We would like to cluster speech segments into
    homogeneous clusters according to speaker
    identity.

5
Step 1: Read in the raw speech
  • Read in an MPEG file as input
  • Convert the file from .mpeg format to .wav format,
    because the MFCC library only processes .wav
    files
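
The slides do not say which tool performs the conversion. The following is a minimal sketch that assumes the ffmpeg command-line tool is installed. The function names, the mono downmix, and the 16 kHz sample rate are illustrative choices and are not taken from the presentation.

```python
import subprocess


def build_ffmpeg_command(src_path, dst_path, sample_rate=16000):
    """Return an ffmpeg argument list that decodes src_path to a mono WAV file.

    Assumption: ffmpeg is the converter; the slides do not name one.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite dst_path if it already exists
        "-i", src_path,            # input file (e.g. an .mpeg recording)
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample to the requested rate
        dst_path,
    ]


def convert_to_wav(src_path, dst_path, sample_rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src_path, dst_path, sample_rate), check=True)
```

Calling `convert_to_wav("news.mpeg", "news.wav")` (hypothetical file names) would then produce the .wav input needed by the next step.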

6
Step 2: MFCC processing
  • A wav file is viewed as a sequence of frames, each
    containing different features
  • We use the MFCC library to convert the wav file
    to MFCC features for processing
  • We extract 24 features for each frame
  • The results are stored in feature vectors

[Figure: frames labeled Frame 1, Frame 2, Frame 3]
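
The presentation relies on an unnamed MFCC library. To illustrate what such a library computes, here is a self-contained NumPy sketch of a textbook MFCC pipeline. Only the 24 coefficients per frame come from the slides; the 25 ms frame length, 10 ms step, 40 mel filters, and 512-point FFT are assumed defaults, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(signal, sample_rate, n_coeffs=24, frame_len=0.025, frame_step=0.010,
         n_filters=40, n_fft=512):
    """Return an (n_frames, n_coeffs) matrix with one feature vector per frame."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen or flen > n_fft:
        raise ValueError("signal too short or frame longer than the FFT size")

    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)

    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # Log filterbank energies, then a DCT-II to decorrelate them.
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5)
                   * np.arange(n_coeffs)[:, None])
    return log_energy @ basis.T
```

Each row of the returned matrix plays the role of one feature vector in the steps that follow.
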
7
Step 3: Detect the audio scene change
  • Use the feature vectors to detect the audio
    scene change
  • The input audio stream is modeled as a Gaussian
    process
  • A model selection criterion called BIC (Bayesian
    Information Criterion) is used to detect the
    change point

8
Step 3: Detect the audio scene change
  • Denote $x_i$ ($i = 1, \dots, N$) as the feature
    vector of frame $i$
  • $N$ is the total number of frames
  • $\mu_i$: mean vector of frame $i$
  • $\Sigma_i$: full covariance matrix of frame $i$
  • $R(i) = N \log|\Sigma| - N_1 \log|\Sigma_1| - N_2 \log|\Sigma_2|$
  • $\Sigma$, $\Sigma_1$, $\Sigma_2$ are the sample covariance
    matrices from all the data, from $x_1, \dots, x_i$,
    and from $x_{i+1}, \dots, x_N$ respectively
    (so $N_1 = i$ and $N_2 = N - i$)
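
The statistic R(i) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the covariances are maximum-likelihood estimates (divide by the frame count), and each side of the split is assumed to hold more frames than feature dimensions so the determinants are nonzero.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the rows."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def r_statistic(features, i):
    """R(i) for splitting `features` into rows [0, i) and rows [i, N)."""
    n = len(features)
    return (n * log_det_cov(features)
            - i * log_det_cov(features[:i])
            - (n - i) * log_det_cov(features[i:]))
```

For a stream whose statistics change at frame i, the pooled covariance is inflated by the shift while the two per-segment covariances stay small, so R(i) is large at the true boundary.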

9
Step 3: Detect the audio scene change
  • $\mathrm{BIC}(i) = R(i) + \text{constant}$
  • If there is only one change point, the frame
    with the highest BIC score is the change point
  • If there is more than one change point, simply
    extend the algorithm
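
The slides leave both the constant and the extension to several change points unspecified. The sketch below fills them in with common choices from the BIC change-detection literature, all of which are assumptions here: the constant is taken as the negative penalty 0.5 * (d + d(d+1)/2) * log N for d-dimensional features, a change is accepted only when the best score is positive, and multiple changes are found by growing a window until one appears and then restarting after it. The function names and the parameters `min_window`, `grow`, and `margin` are hypothetical.

```python
import numpy as np


def _logdet(frames):
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def best_change_point(features, margin, penalty_weight=1.0):
    """Return (index, score) of the highest-scoring split, or (None, -inf)."""
    n, d = features.shape
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    total = n * _logdet(features)
    best_i, best_bic = None, -np.inf
    # Keep `margin` frames on each side so both covariances are well defined.
    for i in range(margin, n - margin + 1):
        bic = (total - i * _logdet(features[:i])
               - (n - i) * _logdet(features[i:]) - penalty)
        if bic > best_bic:
            best_i, best_bic = i, bic
    return best_i, best_bic


def detect_change_points(features, min_window=60, grow=30, margin=10):
    """Scan the stream with a growing window and collect change points."""
    n = len(features)
    changes = []
    start, end = 0, min_window
    while n - start >= min_window:
        stop = min(end, n)
        i, bic = best_change_point(features[start:stop], margin)
        if i is not None and bic > 0:
            changes.append(start + i)      # accept the change
            start += i                     # restart right after it
            end = start + min_window
        elif stop == n:
            break                          # reached the end without a change
        else:
            end += grow                    # no change yet: widen the window
    return changes
```

The margin should exceed the feature dimension (24 in this project), otherwise the covariance of a short segment is singular.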

10
Step 4: Audio Clustering
  • To speed up change detection, we locate the
    change points only roughly.
  • As a result, there may be some wrongly detected
    change points.
  • In this step, we try to merge neighboring
    segments that were wrongly split.
  • Compare each segment with its neighbors; if they
    are speech from the same person, combine them.
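
The slides do not state how two neighboring segments are judged to come from the same person. One plausible sketch reuses the BIC test: if splitting the two segments does not beat the penalty, they are merged. The function names and the `penalty_weight` parameter are hypothetical, and the BIC-based comparison itself is an assumption.

```python
import numpy as np


def _logdet(frames):
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def same_source(a, b, penalty_weight=1.0):
    """True when modeling a and b jointly is preferred over modeling them apart."""
    both = np.vstack([a, b])
    n, d = both.shape
    r = n * _logdet(both) - len(a) * _logdet(a) - len(b) * _logdet(b)
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - penalty <= 0


def merge_neighbors(features, boundaries, penalty_weight=1.0):
    """Merge adjacent segments that pass same_source.

    boundaries: sorted frame indices where a change was (roughly) detected.
    Returns a list of (start, stop) frame ranges.
    """
    edges = [0] + list(boundaries) + [len(features)]
    segments = [(edges[0], edges[1])]
    for start, stop in zip(edges[1:-1], edges[2:]):
        prev_start, prev_stop = segments[-1]
        if same_source(features[prev_start:prev_stop], features[start:stop],
                       penalty_weight):
            segments[-1] = (prev_start, stop)   # spurious boundary: merge
        else:
            segments.append((start, stop))      # genuine boundary: keep
    return segments
```

A larger `penalty_weight` makes merging more likely, which suits this step because its purpose is to undo boundaries that the rough detection pass placed by mistake.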

11
Step 5: Interleave Audio Clustering
  • Group all the segments of the same speaker into
    one node.
  • Before
  • After

[Figure: before/after diagram with segments labeled Speaker 1, Speaker 2, and Combined Speaker1]
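
Grouping non-adjacent segments of the same speaker can be sketched as a greedy pass over the segments. The comparison is left as a pluggable predicate because the slides do not specify it; the function name, the tuple layout, and the label-based predicate in the usage example are hypothetical stand-ins for a real acoustic comparison.

```python
def group_by_speaker(segments, same_speaker):
    """Greedily group segments; each group is one node for one speaker.

    segments: any sequence of segment objects, in time order.
    same_speaker: callable(seg_a, seg_b) -> bool, assumed to be supplied
    by the caller (for example a BIC-based acoustic test).
    """
    clusters = []
    for seg in segments:
        for cluster in clusters:
            # Compare against the first segment of each existing group.
            if same_speaker(cluster[0], seg):
                cluster.append(seg)
                break
        else:
            clusters.append([seg])   # no match: start a new speaker node
    return clusters


# Toy usage: (start_frame, stop_frame, label), with the label standing in
# for the acoustic evidence.
segments = [(0, 100, "A"), (100, 180, "B"), (180, 300, "A")]
nodes = group_by_speaker(segments, lambda a, b: a[2] == b[2])
# nodes == [[(0, 100, "A"), (180, 300, "A")], [(100, 180, "B")]]
```

Unlike the neighbor merge in Step 4, this pass can join segments that are separated in time by another speaker.
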
12
Conclusion
  • We would like to build a precise and fast engine
    that recognizes the identity of the speaker in a
    wave file.
  • We would like to group segments from the same
    speaker in the wave file.

13
Conclusion (cont)
  • Instead of making local decisions based on the
    distance between fixed-size samples, we expand
    the decision as widely as possible
  • Avoid repeated calculation by using dynamic
    programming.
  • The detection algorithm can detect acoustic change
    points with reasonable detectability.
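
The slides do not say which calculation the dynamic programming avoids. One common reading is that per-frame sums are accumulated once, so the covariance of any segment follows from two table lookups instead of a fresh pass over its frames. The sketch below works under that assumption; the class name is hypothetical.

```python
import numpy as np


class RunningCovariance:
    """Prefix sums of x and x x^T for fast covariance of any frame range."""

    def __init__(self, features):
        x = np.asarray(features, dtype=float)
        d = x.shape[1]
        # sum1[k] holds the sum of the first k feature vectors.
        self.sum1 = np.vstack([np.zeros((1, d)), np.cumsum(x, axis=0)])
        # sum2[k] holds the sum of the first k outer products x x^T.
        outer = x[:, :, None] * x[:, None, :]
        self.sum2 = np.concatenate([np.zeros((1, d, d)),
                                    np.cumsum(outer, axis=0)])

    def covariance(self, start, stop):
        """Maximum-likelihood covariance of frames [start, stop)."""
        n = stop - start
        mean = (self.sum1[stop] - self.sum1[start]) / n
        second = (self.sum2[stop] - self.sum2[start]) / n
        return second - np.outer(mean, mean)
```

With such a table, evaluating the BIC score at every candidate frame no longer requires recomputing each covariance from the raw frames.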