Speech and Music Discrimination using Gaussian Mixture Model - PowerPoint PPT Presentation

Slides: 55

Provided by: Arth187

Transcript and Presenter's Notes

1
_________________________
Speech and Music Discrimination using Gaussian Mixture Model
Seminar Program
Project Team:
Dr. Deep Sen (Supervisor)
CHOI Arthur, Tsz Kin (3015809)
CHENG Derek, Ka Chun (3015631)
2
_________________________
Speech and Music Discrimination using GMM
3
_________________________
Motivations

Much research uses HMMs; comparatively little uses GMMs
A GMM reduces complexity compared with an HMM
Our feature extraction methods will reduce complexity
Search and storage of multimedia files are still under development
Fits the University requirement

5
_________________________
Applications

Audio Database Indexing
Automatic Bandwidth Allocation
Broadcast Browsing
Intelligent Signal Processing
Intelligent Audio Coding
Audio file Compression
Audio Clip Editing

6
_________________________
Approaches
Deterministic signals can be analysed as completely specified functions of time.
Non-deterministic signals must be analysed probabilistically. (Tele3013 notes)
7
_________________________
Procedures

Read a signal
Segment it into small frames
Extract features from each frame
Classify each frame

8
_________________________
Feature Extractions
9
_________________________
Classification
11
_________________________
Segmentation

Reasons
Get a better estimation result
Achieve real-time behaviour
Problems and solutions
Frames too big: classification accuracy decreases
Frames too small: feature extraction accuracy decreases
Chosen frame size: 20 ms
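
The segmentation step can be sketched in Python as follows. This is only an illustration: the function name `split_into_frames` is hypothetical, and non-overlapping frames are an assumption, since the slides do not say whether frames overlap.

```python
import numpy as np

def split_into_frames(signal, fs, frame_ms=20.0):
    """Cut a 1-D signal into consecutive, non-overlapping frames.

    fs is the sampling rate in Hz; frame_ms is the frame length in
    milliseconds (20 ms is the frame size chosen on the slide).
    Samples that do not fill a whole final frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)
```

At a sampling rate of 8 kHz, each 20 ms frame holds 160 samples, so one second of audio yields 50 frames.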

[Figure: Music Signal]
13
_________________________
4 Hz modulation energy

Speech has a characteristic energy modulation peak around the 4 Hz syllabic rate (Houtgast and Steeneken, 1985).
Reasons
Accurately separates speech signals and music signals (94%)
Easy to implement in Matlab
Novel and robust
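
One way to measure this feature is sketched below in Python. It is an assumption-laden illustration, not the project's Matlab code: the function name `modulation_energy_ratio`, the 2 to 6 Hz band around 4 Hz, and the use of the per-frame energy envelope are all choices made here for the example.

```python
import numpy as np

def modulation_energy_ratio(signal, fs, frame_ms=20.0, band=(2.0, 6.0)):
    """Fraction of the energy-envelope modulation that lies near 4 Hz.

    The envelope is the energy of consecutive frames; its spectrum shows
    how fast the energy fluctuates. The band limits are an assumption.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    envelope = np.sum(frames ** 2, axis=1)
    envelope = envelope - np.mean(envelope)      # drop the constant part
    envelope_rate = fs / frame_len               # envelope samples per second
    spectrum = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / envelope_rate)
    total = np.sum(spectrum)
    if total <= 0.0:
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(spectrum[in_band]) / total)
```

A signal whose loudness rises and falls about four times per second gives a ratio close to 1, while a signal with steady or randomly fluctuating energy gives a much lower value.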

14
_________________________
[Figure: Music Signal and Speech Signal]
16
_________________________
[Figure: Energy vs. Time for a music signal and a speech signal]
17
_________________________
Zero-Crossing Count (ZCC)

The zero-crossing count is the number of times a signal crosses the time axis (changes sign) within a given interval.
Speech signals: high ZCC
Music signals: low ZCC
Reasons
The ZCC of a speech signal is significantly higher than that of music
Very easy to implement in Matlab
Mature and robust
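
A minimal Python sketch of the count for one frame is given below. The function name `zero_crossing_count` is hypothetical, and treating an exact zero sample as positive is an assumption of this example.

```python
import numpy as np

def zero_crossing_count(frame):
    """Count sign changes between consecutive samples of a frame.

    np.signbit is True for negative samples, so a crossing is any place
    where neighbouring samples differ in sign. Exact zeros count as
    positive here (an assumption).
    """
    negative = np.signbit(np.asarray(frame, dtype=float))
    return int(np.count_nonzero(negative[1:] != negative[:-1]))
```

A 5 Hz sine observed for one second crosses zero twice per cycle, so the count is 10.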

20
_________________________
Spectral Roll-off Point

The spectral roll-off point measures the skewness of the spectrum.
Reasons
Music usually has more energy in the high-frequency range
Useful for separating different kinds of speech later
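
A common way to compute a roll-off point is sketched below in Python: the frequency below which a fixed fraction of the frame's spectral power lies. The function name `spectral_rolloff` and the 0.95 fraction are assumptions for illustration; the threshold used in this project is not stated in the transcript.

```python
import numpy as np

def spectral_rolloff(frame, fs, fraction=0.95):
    """Frequency (Hz) below which `fraction` of the spectral power lies.

    fraction=0.95 is an assumed value, not one taken from the slides.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    cumulative = np.cumsum(power)
    if cumulative[-1] <= 0.0:
        return 0.0
    # First bin at which the running power reaches the target fraction.
    index = int(np.searchsorted(cumulative, fraction * cumulative[-1]))
    return float(freqs[min(index, len(freqs) - 1)])
```

A frame with more high-frequency energy gives a higher roll-off value, which is the property the slide relies on.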

21
_________________________
Spectral Roll-off Point
Spectral Roll-off Point SR where,
22
_________________________
[Figure: power vs. frequency for a music signal and a speech signal]
23
_________________________
Entropy Modulation

Music appears to be more ordered than a speech signal (J. Pinquier, J.L. Rouas, R. Andre-Obrecht, 2002)
Higher entropy means less order
Higher dynamism means a higher rate of change
Reasons
Accurately separates speech signals and music signals (90%)
Novel and robust
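
The idea can be illustrated with the Python sketch below. It is one possible reading only: entropy is computed here from the normalised power spectrum of each frame, and the modulation is taken as the variance of those entropies across frames. Both choices, and the function names, are assumptions; the cited paper may define the feature differently.

```python
import numpy as np

def frame_entropy(frame):
    """Shannon entropy (bits) of a frame's normalised power spectrum."""
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2
    total = power.sum()
    if total <= 0.0:
        return 0.0
    p = power / total
    p = p[p > 0.0]                 # skip empty bins, since 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

def entropy_modulation(frames):
    """Variance of the per-frame entropy over a block of frames."""
    entropies = np.array([frame_entropy(f) for f in frames])
    return float(np.var(entropies))
```

A frame whose power sits in a single frequency bin has entropy near 0 (highly ordered), while a frame with a flat spectrum has the maximum entropy for its number of bins.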

24
_________________________
[Figure: Music Signal and Speech Signal]
25
_________________________
J. Ajmera, I.A. McCowan, H. Bourlard (2002)
26
_________________________
Instantaneous entropy
Average entropy
Average Instantaneous entropy
27
_________________________
Pulse Metric
The beat of a piece of music is one of the clearest features of the music (K.D. Martin, E.D. Scheirer, B.L. Vercoe, 1998).
28
_________________________
Other Features

Spectral Centroid
Spectral Flux
Silence Ratio
Short-Time Energy Ratio
Volume Dynamic Change
Number of Segments
Segment Duration
etc

29
_________________________
Introduction to Gaussian Mixture Model (GMM)

Differentiates speech and music from a sound source
Used in speech processing, mostly for speech recognition, speaker identification and voice conversion
Models densities and represents general spectral features

30
Why did we choose GMM?

Low complexity
Rate independence
Bit scalability
Short computation time

31
What is a Gaussian Mixture Model?

A Gaussian Mixture Model consists of a set of local Gaussian modes and an integrating network.
Different Gaussian distributions represent different domains of the feature space and have different output characteristics
A GMM tries to describe a complex system using a combination of all the Gaussian clusters, instead of a single model

32
Gaussian mixtures or clusters

Used to describe a complex system instead of a single model
Represent a dataset by a set of means and covariances

33
Gaussian Mixture Model

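A Gaussian Mixture Model is represented by
$p(\mathbf{x}) = \sum_{i=1}^{M} w_i \, b_i(\mathbf{x})$
$\mathbf{x}$ is the P-dimensional input vector
$w_i$, $i = 1, \dots, M$, are the mixture weights
$b_i(\mathbf{x})$ are the component densities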

34
Clustering

Clustering is a technique from pattern classification
A technique to group samples
Each P-dimensional feature vector is considered a point in space, and all points near it are clustered together

35
Clustering

The grey circle represents the variance of the distribution

36
Gaussian component density

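Each component density is a P-variate Gaussian function of the form
$b_i(\mathbf{x}) = \frac{1}{(2\pi)^{P/2} |\Sigma_i|^{1/2}} \exp\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right)$
$\boldsymbol{\mu}_i$ is the mean vector
$\Sigma_i$ is the covariance matrix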
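
A P-variate Gaussian density with a given mean vector and covariance matrix can be evaluated as in the Python sketch below. The function name `gaussian_density` is hypothetical, and working in the log domain is a numerical convenience chosen for this example rather than something taken from the slides.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Value of a multivariate Gaussian density at the point x.

    mean has shape (P,), cov has shape (P, P) and must be positive
    definite.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    p = mean.shape[0]
    diff = x - mean
    sign, log_det = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Squared Mahalanobis distance without forming the inverse explicitly.
    distance = diff @ np.linalg.solve(cov, diff)
    log_density = -0.5 * (p * np.log(2.0 * np.pi) + log_det + distance)
    return float(np.exp(log_density))
```

For a two-dimensional standard Gaussian, the density at the mean is 1 / (2π).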

37
Covariance matrix

Indicates the dispersion of the distribution
In mathematics, it is defined as the matrix whose (i, j)th element is the covariance of $x_i$ and $x_j$:
$\Sigma_{ij} = \operatorname{cov}(x_i, x_j)$, $i, j = 1, \dots, d$

38
Covariance matrix

The diagonal components of the covariance matrix are the variances of the individual random variables
The off-diagonal components are the covariances of two random variables, $x_i$ and $x_j$
The matrix is symmetric
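
These properties can be checked numerically with NumPy. The sketch below uses synthetic feature vectors; the data and variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 synthetic 3-dimensional feature vectors; feature 1 is made to
# depend on feature 0 so that one off-diagonal entry is clearly non-zero.
features = rng.normal(size=(500, 3))
features[:, 1] += 0.8 * features[:, 0]

# rowvar=False: each column is one variable, each row one observation.
cov = np.cov(features, rowvar=False)

diagonal_is_variance = np.allclose(np.diag(cov), features.var(axis=0, ddof=1))
is_symmetric = np.allclose(cov, cov.T)
```

Both checks hold for any dataset: the diagonal of the estimate equals the per-feature sample variances, and entry (i, j) equals entry (j, i).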

39
Full covariance matrix

The most powerful Gaussian model, as it fits the data best
Drawbacks
Needs a lot of data to estimate its parameters
Costly in high-dimensional feature spaces

40
Diagonal covariance matrix

Good compromise between quality and model size
Gaussian components can act together to model the overall probability density function
A combination of diagonal-covariance Gaussians is capable of modelling the correlations between feature vector elements
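
The size difference between full and diagonal covariance models can be made concrete by counting free parameters, as in the Python sketch below. The function name `gmm_parameter_count` is hypothetical, and the example sizes are arbitrary.

```python
def gmm_parameter_count(n_components, dim, covariance="full"):
    """Number of free parameters of a GMM.

    Counts the means, the mixture weights (one fewer than the number of
    components, because they sum to 1) and the covariance entries.
    """
    means = n_components * dim
    weights = n_components - 1
    if covariance == "full":
        # A symmetric dim x dim matrix has dim * (dim + 1) / 2 free entries.
        covariances = n_components * dim * (dim + 1) // 2
    elif covariance == "diag":
        covariances = n_components * dim
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    return means + weights + covariances
```

With 4 components and 10-dimensional features, a full-covariance model has 263 free parameters and a diagonal one has 83, and the gap widens quickly as the dimension grows.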

41
Review the Gaussian mixture density

The mixture weights must satisfy the conditions
$w_i \ge 0$ and $\sum_{i=1}^{M} w_i = 1$
Three sets of parameters make up the Gaussian mixture density: mean vectors, covariance matrices and mixture weights

42
Expectation-maximization (EM)

Estimates the mean vectors, covariance matrices and mixture weights
Iteratively updates the distribution of each Gaussian component and the conditional probabilities
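
The two alternating updates can be sketched in Python for the simplest case of one-dimensional data. This is a plain EM loop under stated assumptions, not the project's implementation: the function name `em_gmm_1d`, the quantile-based starting means and the small variance floor are all choices made for this example.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with a basic EM loop.

    Returns (weights, means, variances, log_likelihoods).
    """
    x = np.asarray(x, dtype=float)
    k = n_components
    # Assumed initialisation: means spread over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: conditional probability of each component given each sample.
        diff = x[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
        weighted = weights * dens
        total = weighted.sum(axis=1, keepdims=True)
        resp = weighted / total
        log_likelihoods.append(float(np.log(total).sum()))
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, log_likelihoods
```

On data drawn from two well-separated Gaussians, the estimated means settle near the true centres and the recorded log-likelihood does not decrease from one iteration to the next.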

43
Idea of Expectation-maximization

Instead of starting with a random configuration of all components and improving upon it with expectation-maximization, we start with the optimal one-component mixture.
We then repeat two steps until convergence
Insert a new component
Apply EM until convergence

44
Convergence Theorem

Because the sequence of likelihoods is monotonically increasing and bounded, the likelihood converges to a local maximum

45
EM algorithm

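Let $L_k$ denote the log-likelihood of the dataset under the k-component mixture $f_k$
1. Compute the optimal one-component mixture $f_1$. Set $k = 1$
2. Find the optimal new component $\phi(\mathbf{x}; \theta)$ and corresponding mixture weight $\alpha$ while keeping $f_k$ fixed

46
EM algorithm

3. Set $f_{k+1}(\mathbf{x}) = (1 - \alpha) f_k(\mathbf{x}) + \alpha \, \phi(\mathbf{x}; \theta)$ and $k = k + 1$
4. Update $f_k$ with EM until convergence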

47
Speech/music discrimination by using GMM

An interesting feature of the GMM is that the component densities of the mixture may represent
Different phonetic events, when modelling speech
Different portions of the sound, when modelling the spectra of sound from a musical instrument
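
Given one GMM trained on speech features and one trained on music features, a block of frames can be labelled by whichever model gives the higher likelihood. The Python sketch below shows this under assumptions: diagonal covariances, model parameters supplied as (weights, means, variances) tuples, and hypothetical function names. The slides do not specify the decision rule.

```python
import numpy as np

def diag_gmm_log_likelihood(x, weights, means, variances):
    """Total log-likelihood of feature vectors under a diagonal GMM.

    x: (N, P) feature vectors; weights: (M,); means, variances: (M, P).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = x[:, None, :] - means[None, :, :]                  # (N, M, P)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_weighted = np.log(weights) + log_comp                 # (N, M)
    # Log-sum-exp over components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_sample = peak[:, 0] + np.log(np.sum(np.exp(log_weighted - peak), axis=1))
    return float(per_sample.sum())

def classify(x, speech_gmm, music_gmm):
    """Return 'speech' or 'music' by comparing the two log-likelihoods."""
    speech_score = diag_gmm_log_likelihood(x, *speech_gmm)
    music_score = diag_gmm_log_likelihood(x, *music_gmm)
    return "speech" if speech_score >= music_score else "music"
```

For example, with `speech_gmm = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])` and `music_gmm = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])`, feature vectors near (5, 5) are labelled as speech.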

48
Achievements

Identified an optimized frame size
Obtained robust features
Performed a few tests
Implemented some Matlab code
Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions

49
Next year planning

Comprehensive and more in-depth research on GMMs
Model the sound source based on GMMs
Evaluate the effect of noise
Matlab implementation for speech/music separation

50
Next year planning

Investigate a novel classification method: Support Vector Machine (SVM)
Differentiate male and female speech
Differentiate classical and non-classical music
Generate a final thesis report

52
_________________________
Resources

Internet, Microsoft Sound Recorder, Matlab
Neural Networks for Pattern Recognition (Bishop 1996)
Processing and Perception of Speech and Music (Morgan 2000)
Research Papers

53
_________________________
Management Plan

Dec-Feb 04: Matlab implementations; investigate noise effect; research on Support Vector Machine; experiments
Jan 05: Separating classical and non-classical music
Feb 05: Separating male and female speech
Mar-Jun 05: Separate chamber music and orchestral music; separate baby speech (if time permits)

54
Ben Gold, Nelson Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music (2000), John Wiley & Sons, Inc., USA.
Joseph F. Hair, Jr., Rolph E. Anderson, Ronald L. Tatham, William C. Black, Multivariate Data Analysis, 4th Edition (1995), Prentice-Hall International, Inc., USA.
Keinosuke Fukunaga, Computer Science and Scientific Computing: Introduction to Statistical Pattern Recognition, 2nd Edition (1990), Academic Press, Inc., California, USA. ISBN 0-12-269851-7
Marty J. Schmidts, Understanding and Using Statistics (1975), D.C. Heath and Company, Canada. ISBN 0-669-94490-4
Norman L. Johnson, Samuel Kotz, Distributions in Statistics: Continuous Univariate Distributions, vol. 1 (1970), Houghton Mifflin Company, Boston, USA.
Richard A. Johnson, Dean W. Wichern, Applied Multivariate Statistical Analysis (1992), Prentice-Hall, Inc., New Jersey, USA. ISBN 0-13-041400-X
Richard J. Harris, A Primer of Multivariate Statistics (1975), Academic Press Inc., New York, USA. ISBN 0-12-327250-5
Thomas D. Rossing, The Science of Sound (1982), Addison-Wesley Publishing Company Inc., USA. ISBN 0-201-06505-3
Thomas D. Rossing, Neville H. Fletcher, Principles of Vibration and Sound (1995), Springer-Verlag New York Inc. ISBN 0-387-94336-6
El-Maleh K., Klein M., Petrucci G., and Kabal P., Speech/music discrimination for multimedia applications (2000), in ICASSP'00.
Houtgast, T. and Steeneken, H.J.M., A review of the MTF-concept in room acoustics (1985), J. Acoust. Soc. Am. 77, 1069-1077.
J. Ajmera, I. McCowan, and H. Bourlard, Robust HMM-based speech/music segmentation (2002), in Proceedings of ICASSP-02.
J.J. Burred, A. Lerch, Hierarchical Automatic Audio Signal Classification (2004), Journal of the Audio Engineering Society.
J. Pinquier, J. Rouas, R. Andre-Obrecht, Robust speech / music classification in audio documents (2002), 7th International Conference on Spoken Language Processing (ICSLP), pp. 2005-2008.
Martin, K.D., Scheirer, E.D., Vercoe, B.L., Music Content Analysis through Models of Audition (1998), ACM Multimedia'98 Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK.
Thank you
