FYP0202 Advanced Audio Information Retrieval System

Provided by: eduh87

1
FYP0202 Advanced Audio Information Retrieval System

2
Outline

3
Overview

Automatic segmentation of an audio stream and automatic clustering of audio segments have attracted considerable attention in recent years.
For example, in the automatic transcription of broadcast news, the data contain clean speech, telephone speech, music segments, and speech corrupted by music or noise.

4
Overview (cont)

We would like to SEGMENT the audio stream into homogeneous regions according to speaker identity.
We would like to cluster speech segments into homogeneous clusters according to speaker identity.

5
Step 1: Read in the raw speech
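
The slide does not say how the raw speech is loaded. As a minimal sketch, assuming the input is a 16-bit PCM WAV file, Python's standard `wave` module plus NumPy is enough to read the samples. The function name `read_wav_mono` and the down-mix of multi-channel audio to mono are illustrative assumptions, not part of the original system.

```python
import io
import wave

import numpy as np


def read_wav_mono(source):
    """Read a 16-bit PCM WAV (path or file object) into floats in [-1, 1) plus the sample rate.

    Hypothetical helper: assumes 16-bit PCM input and averages channels to mono.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit samples, scaled to [-1, 1).
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if n_channels > 1:
        # Channels are interleaved; average them into a single mono track.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, rate


if __name__ == "__main__":
    # Build a tiny in-memory WAV (100 samples at 8 kHz) and read it back.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(np.full(100, 16384, dtype="<i2").tobytes())
    buf.seek(0)
    signal, rate = read_wav_mono(buf)
    print(rate, len(signal), signal[0])  # 8000 100 0.5
```

The float scaling is only a convenience for later steps; MFCC extraction works equally well on the raw integer samples.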

6
Step 2: MFCC processing

A WAV file is divided into short frames, and each frame is described by its own set of features.
We use the MFCC library to convert the WAV data into MFCC (mel-frequency cepstral coefficient) features for processing.
We extract 24 features for each frame.
The results are stored as feature vectors, one per frame.

[Figure: the waveform divided into consecutive frames: Frame 1, Frame 2, Frame 3, ...]
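
The slides rely on an unnamed MFCC library. To illustrate what such a library computes, here is a minimal NumPy sketch of a common MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) that returns one 24-dimensional feature vector per frame. The 25 ms frame length, 10 ms frame step, 40 mel filters and 512-point FFT are assumed values, not taken from the project.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, rate, n_mfcc=24, frame_len=0.025, frame_step=0.010,
         n_filters=40, n_fft=512):
    """Return an array of shape (num_frames, n_mfcc): one feature vector per frame.

    Sketch only: all default parameters are assumptions, not the project's settings.
    """
    flen = int(round(frame_len * rate))
    fstep = int(round(frame_step * rate))
    # Zero-pad short inputs so that at least one full frame exists.
    if len(signal) < flen:
        signal = np.pad(signal, (0, flen - len(signal)))
    num_frames = 1 + (len(signal) - flen) // fstep
    # Row j holds the sample indices of frame j.
    idx = np.arange(flen)[None, :] + fstep * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Triangular filters equally spaced on the mel scale from 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    # Log filterbank energies (small constant avoids log(0) on silence).
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II over filterbank channels; keep the first n_mfcc coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_energy @ basis.T
```

For one second of 8 kHz audio these settings give 98 frames, so the result is a 98 × 24 matrix whose rows are the per-frame feature vectors used in the next step.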
7
Step 3: Detect the audio scene change

The feature vectors are used to detect audio scene changes.
The input audio stream is modeled as a Gaussian process.
A model selection criterion called BIC (Bayesian Information Criterion) is used to detect the change point.

8
Step3 Detect the audio scene change

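Denote $x_i$ ($i = 1, \dots, N$) as the feature vector of frame $i$, where $N$ is the total number of frames.
$\mu_i$ is the mean vector of frame $i$, and $\Sigma_i$ is the full covariance matrix of frame $i$.

$$R(i) = N \log|\Sigma| - N_1 \log|\Sigma_1| - N_2 \log|\Sigma_2|$$

where $|\cdot|$ denotes the determinant; $\Sigma$, $\Sigma_1$ and $\Sigma_2$ are the sample covariance matrices computed from all the data, from $x_1, \dots, x_i$, and from $x_{i+1}, \dots, x_N$, respectively; and $N_1 = i$ and $N_2 = N - i$ are the numbers of frames in the two parts.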

9
Step3 Detect the audio scene change

$$\mathrm{BIC}(i) = R(i) - \text{constant}$$

If there is only one change point, the frame with the highest BIC score is the change point.
If there is more than one change point, the algorithm is simply extended to find them.
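
A minimal NumPy sketch of the single-change-point search, computing $R(i)$ from maximum-likelihood sample covariances and subtracting a penalty. The slides only say "constant"; here we assume the common penalty $\lambda P$ with $P = \tfrac{1}{2}\left(d + \tfrac{1}{2}d(d+1)\right)\log N$ and a tunable $\lambda = 1$, and we additionally assume the usual convention that a change is accepted only when the best score is positive. The names `bic_scores` and `detect_change` are hypothetical.

```python
import numpy as np


def log_det_cov(x):
    """log|Sigma| of the maximum-likelihood (biased) sample covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def bic_scores(x, lam=1.0, margin=None):
    """BIC(i) = R(i) - lam * P for each split x_1..x_i | x_{i+1}..x_N.

    The penalty P is an assumed form; the slides only call it a constant.
    """
    n, d = x.shape
    if margin is None:
        # Each side needs more than d frames for a non-singular covariance.
        margin = d + 1
    penalty = lam * 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    full = n * log_det_cov(x)
    scores = {}
    for i in range(margin, n - margin + 1):
        # x[:i] holds frames 1..i (N_1 = i), x[i:] holds frames i+1..N (N_2 = N - i).
        r = full - i * log_det_cov(x[:i]) - (n - i) * log_det_cov(x[i:])
        scores[i] = r - penalty
    return scores


def detect_change(x, lam=1.0):
    """Return the frame index with the highest BIC score, or None if no score is positive."""
    scores = bic_scores(x, lam)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

One straightforward way to extend this to several change points is to rerun the search on the segment after each detected change; that extension is left out here for brevity.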

10
Step 4: Audio Clustering

To speed up audio scene change detection, we only locate the change points roughly.
As a result, some change points may be detected wrongly.
In this step, we try to combine neighboring segments that were wrongly split.
Each segment is compared with its neighbor; if both contain speech from the same person, they are combined.
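
The slide does not say how "same person" is decided. A minimal sketch, assuming the same BIC comparison as in Step 3 is reused as the test: two neighboring segments are combined when modeling them with a single Gaussian is not penalized by BIC. The names `same_speaker` and `merge_neighbours` and the penalty form are assumptions for illustration.

```python
import numpy as np


def _logdet(x):
    """log|Sigma| of the maximum-likelihood sample covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def same_speaker(a, b, lam=1.0):
    """Assumed test: True when one Gaussian for a and b together is preferred by BIC."""
    z = np.vstack([a, b])
    n, d = z.shape
    r = n * _logdet(z) - len(a) * _logdet(a) - len(b) * _logdet(b)
    penalty = lam * 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - penalty <= 0


def merge_neighbours(segments, lam=1.0):
    """One left-to-right pass that combines adjacent segments judged to be the same speaker."""
    merged = [segments[0]]
    for seg in segments[1:]:
        if same_speaker(merged[-1], seg, lam):
            # Wrongly split: absorb this segment into the previous one.
            merged[-1] = np.vstack([merged[-1], seg])
        else:
            merged.append(seg)
    return merged
```

Each segment is an array of MFCC feature vectors (one row per frame), so combining two segments is just stacking their rows.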

11
Step 5: Interleave Audio Clustering

[Figure: segments labeled Speaker 1, Speaker 1 and Speaker 2, with the Speaker 1 segments combined into a single "Combined Speaker 1" cluster]
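
The figure on this slide appears to show Speaker 1 segments being grouped even when another speaker talks in between. A minimal sketch of such interleaved clustering, assuming greedy agglomerative merging over all pairs of segments (not just neighbors) with the same assumed BIC penalty as in Step 3: the pair with the lowest score is merged repeatedly until no pair has a non-positive score. The names `delta_bic` and `cluster_segments` are hypothetical.

```python
import numpy as np


def _logdet(x):
    """log|Sigma| of the maximum-likelihood sample covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(a, b, lam=1.0):
    """Assumed score: positive means a and b are better modeled by two Gaussians."""
    z = np.vstack([a, b])
    n, d = z.shape
    r = n * _logdet(z) - len(a) * _logdet(a) - len(b) * _logdet(b)
    return r - lam * 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)


def cluster_segments(segments, lam=1.0):
    """Greedy agglomerative clustering of possibly non-adjacent segments.

    Returns one cluster label per input segment.
    """
    clusters = [[i] for i in range(len(segments))]
    data = list(segments)
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                score = delta_bic(data[p], data[q], lam)
                if best is None or score < best[0]:
                    best = (score, p, q)
        score, p, q = best
        if score > 0:
            break  # no pair is better modeled by a single Gaussian
        clusters[p] += clusters[q]
        data[p] = np.vstack([data[p], data[q]])
        del clusters[q], data[q]  # q > p, so index p stays valid
    labels = [0] * len(segments)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For a sequence Speaker 1, Speaker 2, Speaker 1, the first and third segments should receive the same label. The all-pairs search is quadratic in the number of clusters per merge, which is acceptable for the modest number of segments left after Step 4.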
12
Conclusion

We aim to build an accurate and fast engine that recognizes the identity of each speaker in a WAV file.
We would like to group together all segments spoken by the same speaker in the file.

13
Conclusion (cont)

Instead of making local decisions based on the distance between fixed-size samples, we widen each decision window as far as possible.
Repetitive calculations are avoided by using dynamic programming.
The detection algorithm detects acoustic change points with reasonable detectability.
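
The slides do not say which intermediate results the dynamic programming reuses. One common way to avoid recomputing covariances for every candidate split is to keep prefix (running) sums of $x$ and $x x^\top$, so the covariance of any frame range costs $O(d^2)$ instead of a pass over its frames. The sketch below assumes this is the kind of reuse meant; `RangeCovariance` and `r_scores` are hypothetical names.

```python
import numpy as np


class RangeCovariance:
    """Prefix sums of x and x x^T; the covariance of any frame range is then O(d^2)."""

    def __init__(self, x):
        x = np.asarray(x, dtype=np.float64)
        n, d = x.shape
        self.s1 = np.zeros((n + 1, d))
        self.s2 = np.zeros((n + 1, d, d))
        self.s1[1:] = np.cumsum(x, axis=0)
        self.s2[1:] = np.cumsum(x[:, :, None] * x[:, None, :], axis=0)

    def cov(self, start, end):
        """Maximum-likelihood covariance of frames x[start:end] (end exclusive)."""
        m = end - start
        mean = (self.s1[end] - self.s1[start]) / m
        second = (self.s2[end] - self.s2[start]) / m
        return second - np.outer(mean, mean)

    def logdet(self, start, end):
        return np.linalg.slogdet(self.cov(start, end))[1]


def r_scores(x, margin):
    """R(i) for every split i in [margin, N - margin], reusing the prefix sums."""
    rc = RangeCovariance(x)
    n = len(x)
    full = n * rc.logdet(0, n)
    return {i: full - i * rc.logdet(0, i) - (n - i) * rc.logdet(i, n)
            for i in range(margin, n - margin + 1)}
```

The prefix sums are built once in $O(N d^2)$, after which every $R(i)$ needs only two differences of stored sums and a determinant, rather than rescanning both halves of the stream.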
