Title: Speech and Music Discrimination using Gaussian Mixture Model
1_________________________
Speech and Music Discrimination using
Gaussian Mixture Model
Seminar Program
Project Team
Dr. Deep Sen (Supervisor)
CHOI Arthur, Tsz Kin (3015809)
CHENG Derek, Ka Chun (3015631)
2_________________________
Speech and Music Discrimination using GMM
3_________________________
Motivations
- Much research uses HMMs, but comparatively little uses GMMs
- GMMs reduce complexity compared to HMMs
- Our feature extraction methods will reduce complexity
- Multimedia file search/storage is still under development
- Fits the University requirement
5_________________________
Applications
- Audio Database Indexing
- Automatic Bandwidth Allocation
- Broadcast Browsing
- Intelligent Signal Processing
- Intelligent Audio Coding
- Audio file Compression
- Audio Clip Editing
6_________________________
Approaches
Deterministic signals can be analysed as completely specified functions of time.
Non-deterministic signals must be analysed probabilistically.
(Tele3013 notes)
7_________________________
Procedures
- Read a signal
- Segment it into small frames
- Extract features from each frame
- Classify each frame
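The four steps above can be sketched in plain Python as follows. This is only an illustrative outline, not the project's Matlab code: the function names, the non-overlapping framing, and the `extract_features` / `classify_frame` callables are assumptions made for the example.

```python
def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split a 1-D sample sequence into non-overlapping frames.

    frame_ms defaults to 20 ms; the incomplete tail of the signal is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def classify_signal(samples, sample_rate, extract_features, classify_frame):
    """Read -> segment -> extract features -> classify, one label per frame.

    extract_features and classify_frame are hypothetical callables supplied
    by the caller (for example a feature extractor and a GMM-based decision).
    """
    frames = split_into_frames(samples, sample_rate)
    return [classify_frame(extract_features(frame)) for frame in frames]
```

At an 8 kHz sampling rate a 20 ms frame holds 160 samples, so a 1000-sample signal yields six full frames.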
8_________________________
Feature Extraction
9_________________________
Classification
11_________________________
Segmentation
- Reasons
  - Get a better estimation result
  - Achieve real-time behaviour
- Problems and solutions
  - Frames too big: classification accuracy decreases
  - Frames too small: feature extraction accuracy decreases
  - Chosen frame size: 20 ms
[Figure: Music Signal]
13_________________________
4 Hz modulation energy
- Speech has a characteristic energy modulation peak around the 4 Hz syllabic rate (Houtgast & Steeneken, 1985)
- Reasons
  - Accurately separates speech signals from music signals (94)
  - Easy to implement in Matlab
  - Novel and robust
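One simplified way to look for the 4 Hz peak is to take the short-time energy of each 20 ms frame as an envelope (sampled at 50 Hz) and measure how much of the envelope's fluctuation sits at the DFT bin nearest 4 Hz. The sketch below does that in plain Python; it is an assumption-laden illustration (single full-band envelope, single DFT bin, hypothetical function names), not the filter-bank method of the cited papers or the project's Matlab code.

```python
import cmath
import math


def frame_energies(samples, frame_len):
    """Short-time energy of consecutive, non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def modulation_energy_ratio(envelope, envelope_rate, target_hz=4.0):
    """Share of the envelope's non-DC spectral energy at the bin nearest target_hz."""
    n = len(envelope)
    if n < 4:
        return 0.0
    mean = sum(envelope) / n
    centred = [e - mean for e in envelope]  # remove the DC component

    def bin_power(k):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centred))
        return abs(acc) ** 2

    powers = [bin_power(k) for k in range(1, n // 2 + 1)]  # bins 1 .. n/2
    total = sum(powers)
    if total == 0:
        return 0.0
    k_target = min(max(round(target_hz * n / envelope_rate), 1), n // 2)
    return powers[k_target - 1] / total
```

A ratio near 1 means the energy envelope fluctuates almost entirely at about 4 Hz; a ratio near 0 means it fluctuates at other rates.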
14_________________________
[Figure: Music Signal and Speech Signal]
16_________________________
[Figure: Energy vs. Time for a Music Signal and a Speech Signal]
17_________________________
Zero-Crossing Count (ZCC)
- The zero-crossing count is the total number of times that a signal crosses the x-axis (changes sign) over a given time interval.
- Speech signals: high ZCC
- Music signals: low ZCC
- Reasons
  - The ZCC of a speech signal is significantly higher than that of a music signal
  - Very easy to implement in Matlab
  - Mature and robust
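A minimal sketch of the zero-crossing count for one frame, written in plain Python for illustration. The function name and the choice to skip exact zeros (so that a sample sitting on the axis is not counted twice) are assumptions of this example.

```python
def zero_crossing_count(frame):
    """Count sign changes in a frame, ignoring samples that are exactly zero."""
    count = 0
    prev_sign = 0
    for sample in frame:
        sign = (sample > 0) - (sample < 0)  # +1, 0 or -1
        if sign == 0:
            continue  # a sample on the axis is neither positive nor negative
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count
```

Dividing the count by the frame duration gives a crossing rate that can be compared across different frame sizes.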
20_________________________
Spectral Roll-off Point
- The spectral roll-off point measures the skewness of the spectrum.
- Reasons
  - Music usually has more energy in the high frequency range
  - Useful for separating different kinds of speech later
21_________________________
Spectral Roll-off Point
Spectral roll-off point SR, where [equation missing]
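A common definition of the roll-off point is the frequency below which a fixed fraction of the spectral power lies. The sketch below follows that definition in plain Python with a naive DFT. The 0.95 fraction and the function names are assumptions of this example, since the slide's own formula did not survive.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum (bins 0 .. n/2) via a direct DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_rolloff(frame, sample_rate, fraction=0.95):
    """Lowest bin frequency (Hz) at which cumulative power reaches `fraction` of the total."""
    spectrum = power_spectrum(frame)
    threshold = fraction * sum(spectrum)
    cumulative = 0.0
    for k, power in enumerate(spectrum):
        cumulative += power
        if cumulative >= threshold:
            return k * sample_rate / len(frame)
    return sample_rate / 2  # fallback; only reached through rounding effects
```

A frame whose power is concentrated at low frequencies gives a low roll-off value, while added high-frequency content pushes the value up.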
22_________________________
[Figure: power vs. frequency for a Music Signal and a Speech Signal]
23_________________________
Entropy Modulation
- Music appears to be more ordered than speech (J. Pinquier, J.L. Rouas, R. Andre-Obrecht, 2002)
- Higher entropy means less order
- Higher dynamism means a higher rate of change
- Reasons
  - Accurately separates speech signals from music signals (90)
  - Novel and robust
24_________________________
[Figure: Music Signal and Speech Signal]
25_________________________
(J. Ajmera, I.A. McCowan, H. Bourlard, 2002)
26_________________________
Instantaneous entropy
Average entropy
Average Instantaneous entropy
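The formulas for these quantities are not preserved here, so the following is only a plausible sketch. It assumes that each frame comes with a probability distribution (for example class posteriors from some frame-level classifier), that the instantaneous entropy is the Shannon entropy of that distribution, and that the average is a sliding-window mean. The function names are hypothetical.

```python
import math


def instantaneous_entropy(probs):
    """Shannon entropy (in bits) of one frame's probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_entropy(prob_frames, window):
    """Sliding-window mean of the instantaneous entropy over `window` frames."""
    inst = [instantaneous_entropy(p) for p in prob_frames]
    return [
        sum(inst[i:i + window]) / window
        for i in range(len(inst) - window + 1)
    ]
```

A uniform distribution over two classes gives 1 bit, and a distribution that puts all its mass on one class gives 0 bits.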
27_________________________
Pulse Metric
The beat of a piece of music is one of the clearest features of the music (K.D. Martin, E.D. Scheirer, B.L. Vercoe, 1998)
28_________________________
Other Features
- Spectral Centroid
- Spectral Flux
- Silence Ratio
- Short-Time Energy Ratio
- Volume Dynamic Change
- Number of Segments
- Segment Duration
- etc
29_________________________
Introduction to Gaussian Mixture Model (GMM)
- Differentiates speech and music from a sound source
- Used in speech processing, mostly for speech recognition, speaker identification and voice conversion
- Models densities and represents general spectral features
30_________________________
Why did we choose GMM?
- Low complexity
- Rate independence
- Bit scalability
- Short computation time
31_________________________
What is a Gaussian Mixture Model?
- A Gaussian Mixture Model consists of a set of local Gaussian models and an integrating network. Different Gaussian distributions represent different domains of the feature space and have different output characteristics
- A GMM tries to describe a complex system using a combination of all the Gaussian clusters, instead of using a single model
32_________________________
Gaussian mixtures or clusters
- Used to describe a complex system instead of using a single model
- Represent a dataset by a set of means and covariances
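33_________________________
Gaussian Mixture Model
- A Gaussian Mixture Model with $M$ components is represented by $p(\mathbf{x}) = \sum_{i=1}^{M} w_i \, b_i(\mathbf{x})$
- $\mathbf{x}$ is the P-dimensional input vector
- $w_i$ are the mixture weights
- $b_i(\mathbf{x})$ are the component densities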
34_________________________
Clustering
- Clustering is a technique from pattern classification
- A technique to group samples
- A P-dimensional feature vector is considered as a point in space, and all points near each other are clustered together
35_________________________
Clustering
- The grey circle represents the variance of the distribution
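36_________________________
Gaussian component density
- P-variate Gaussian function of the form $b_i(\mathbf{x}) = \frac{1}{(2\pi)^{P/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \mu_i)^{T} \Sigma_i^{-1} (\mathbf{x} - \mu_i) \right\}$
- $\mu_i$ is the mean vector
- $\Sigma_i$ is the covariance matrix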
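To make the mixture density concrete, the sketch below evaluates a weighted sum of Gaussian component densities in plain Python. It assumes diagonal covariance matrices, so each component is described by a mean vector and a vector of per-dimension variances. The function names and parameter layout are illustrative choices, not the project's implementation.

```python
import math


def diag_gaussian_pdf(x, mean, variances):
    """P-variate Gaussian density with a diagonal covariance matrix."""
    p = len(x)
    log_det = sum(math.log(v) for v in variances)  # log |Sigma| for a diagonal Sigma
    mahalanobis = sum((xi - mi) ** 2 / vi
                      for xi, mi, vi in zip(x, mean, variances))
    return math.exp(-0.5 * (p * math.log(2 * math.pi) + log_det + mahalanobis))


def gmm_pdf(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))
```

For a single one-dimensional component with zero mean and unit variance, the density at the origin is 1/sqrt(2*pi), roughly 0.399.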
37_________________________
Covariance matrix
- Indicates the dispersion of the distribution
- In mathematics, it is defined as the matrix whose (i, j)th element is the covariance of $x_i$ and $x_j$, for $i, j = 1, \dots, d$
38_________________________
Covariance matrix
- The diagonal components of the covariance matrix are the variances of the individual random variables
- Off-diagonal components are the covariances of two random variables, $x_i$ and $x_j$
- Symmetric matrix
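A small worked example may help here. The sketch below estimates a covariance matrix from a list of feature vectors in plain Python. The function name and the unbiased (n - 1) normalisation are choices made for this illustration.

```python
def covariance_matrix(samples):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(samples)
    d = len(samples[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [
        [
            sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
```

For the samples `[[1, 2], [2, 4], [3, 6]]` this gives `[[1.0, 2.0], [2.0, 4.0]]`: the diagonal holds the two variances, and the off-diagonal entries are equal, so the matrix is symmetric.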
39_________________________
Full covariance matrix
- The most powerful Gaussian model, as it fits the data best
- Drawbacks
  - Needs a lot of data to estimate parameters
  - Costly in high-dimensional feature spaces
40_________________________
Diagonal covariance matrix
- Good compromise between quality and model size
- Gaussian components can act together to model the overall probability density function
- Capable of modelling the correlations between feature vector elements
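41_________________________
Review of the Gaussian mixture density
- The mixture weights $w_i$ of the $M$ components must satisfy the conditions $\sum_{i=1}^{M} w_i = 1$ and $w_i \ge 0$
- Three sets of parameters compose the Gaussian mixture density: mean vectors, covariance matrices and mixture weights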
42_________________________
Expectation-maximization (EM)
- Estimates the mean vectors, covariance matrices and mixture weights
- Iteratively updates the distribution of each Gaussian model and the conditional probabilities
43_________________________
Idea of Expectation-maximization
- Instead of starting with a random configuration of all components and improving on it with expectation-maximization, we start with the optimal one-component mixture
- Then repeat two steps until convergence
  - Insert a new component
  - Apply EM until convergence
44_________________________
Convergence Theorem
- Because the sequence of likelihoods is monotonically increasing and bounded, the likelihood converges to a local maximum
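45_________________________
EM algorithm
- Let $L_k$ denote the log-likelihood of the dataset under the k-component mixture $f_k$
- 1. Compute the optimal one-component mixture $f_1$. Set $k = 1$
- 2. Find the optimal new component $\phi(x; \theta^*)$ and corresponding mixture weight $\alpha^*$:
  $\{\theta^*, \alpha^*\} = \arg\max_{\theta, \alpha} \sum_{n} \log\left[ (1 - \alpha) f_k(x_n) + \alpha \, \phi(x_n; \theta) \right]$
  while keeping $f_k$ fixed
46_________________________
EM algorithm
- 3. Set $f_{k+1}(x) = (1 - \alpha^*) f_k(x) + \alpha^* \phi(x; \theta^*)$ and $k = k + 1$
- 4. Update $f_k$ using EM until convergence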
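The EM update itself can be illustrated with a short plain-Python sketch. Note the simplifications, all of which are assumptions of this example: it fits a one-dimensional mixture with a fixed number of components from caller-supplied initial means, so it shows the ordinary E-step and M-step rather than the component-insertion procedure listed above. The function names and the variance floor are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    # Assumes every point has non-zero density under the mixture.
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, var_floor=1e-6):
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weights, means and variances from the responsibilities.
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        n_i = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / n)
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # floor guards against collapse
    return new_w, new_m, new_v


def fit_gmm(data, initial_means, iterations=50):
    k = len(initial_means)
    weights = [1.0 / k] * k
    means = list(initial_means)
    centre = sum(data) / len(data)
    spread = sum((x - centre) ** 2 for x in data) / len(data)
    variances = [spread] * k
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(iterations):
        weights, means, variances = em_step(data, weights, means, variances)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history
```

The returned `history` holds the log-likelihood after every iteration, which makes it easy to check that the values never decrease as the fit proceeds.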
47_________________________
Speech/music discrimination by using GMM
- An interesting feature of the GMM is that the component densities of the mixture may represent
  - Different phonetic events when modelling speech
  - Different portions of the sound when modelling the spectra of sound from a musical instrument
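One plausible way to use two such models for discrimination is to train one GMM on speech features and one on music features, then label a segment by whichever model gives the higher total log-likelihood. The sketch below assumes exactly that decision rule, diagonal covariances, and a hypothetical model layout (a list of `(weight, mean_vector, variance_vector)` tuples); none of these details are confirmed by the slides.

```python
import math


def gmm_log_likelihood(x, gmm):
    """Log density of feature vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, mean_vector, variance_vector) tuples.
    """
    total = 0.0
    for weight, mean, variances in gmm:
        log_pdf = -0.5 * sum(
            math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
            for xi, mi, v in zip(x, mean, variances)
        )
        total += weight * math.exp(log_pdf)
    return math.log(total) if total > 0 else float("-inf")


def classify_segment(feature_vectors, speech_gmm, music_gmm):
    """Label a segment by comparing summed frame log-likelihoods."""
    speech_score = sum(gmm_log_likelihood(x, speech_gmm) for x in feature_vectors)
    music_score = sum(gmm_log_likelihood(x, music_gmm) for x in feature_vectors)
    return "speech" if speech_score > music_score else "music"
```

With a toy speech model centred at 5 and a toy music model centred at 0 (both with unit variance), frames with feature values near 5 are labelled "speech" and frames near 0 are labelled "music".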
48_________________________
Achievement
- Identified an optimized frame size
- Obtained robust features
- Performed a few tests
- Implemented some Matlab code
- Studied Gaussian Mixture Models (GMMs) and some of their mathematical expressions
49_________________________
Next year planning
- Comprehensive and more in-depth research on GMMs
- Model the sound source based on GMMs
- Evaluate noise effect
- Matlab implementation for speech/music separation
50_________________________
Next year planning
- Investigate a novel classification method: Support Vector Machine (SVM)
- Differentiate male and female speech
- Differentiate classical and non-classical music
- Generate a final thesis report
52_________________________
Resources
- Internet, Microsoft Sound Recorder, Matlab
- Neural Networks for Pattern Recognition (Bishop 1996)
- Processing and Perception of Speech and Music (Morgan 2000)
- Research Papers
53_________________________
Management Plan
- Dec-Feb 04: Matlab implementations
- Investigate noise effect
- Research on Support Vector Machine
- Experiments
- Jan 05: Separating classical and non-classical music
- Feb 05: Separating male and female speech
- Mar-Jun 05: Separating chamber music and orchestral music; separating baby speech (if time permits)
54_________________________
- Ben Gold, Nelson Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music (2000), John Wiley & Sons, Inc., USA.
- Joseph F. Hair, Jr., Rolph E. Anderson, Ronald L. Tatham, William C. Black, Multivariate Data Analysis, 4th Edition (1995), Prentice-Hall International, Inc., USA.
- Keinosuke Fukunaga, Introduction to Statistical Pattern Recognition, 2nd Edition (1990), Computer Science and Scientific Computing, Academic Press, Inc., California, USA. ISBN 0-12-269851-7
- Marty J. Schmidts, Understanding and Using Statistics (1975), D.C. Heath and Company, Canada. ISBN 0-669-94490-4
- Norman L. Johnson, Samuel Kotz, Distributions in Statistics: Continuous Univariate Distributions, vol. 1 (1970), Houghton Mifflin Company, Boston, USA.
- Richard A. Johnson, Dean W. Wichern, Applied Multivariate Statistical Analysis (1992), Prentice-Hall, Inc., New Jersey, USA. ISBN 0-13-041400-X
- Richard J. Harris, A Primer of Multivariate Statistics (1975), Academic Press Inc., New York, USA. ISBN 0-12-327250-5
- Thomas D. Rossing, The Science of Sound (1982), Addison-Wesley Publishing Company Inc., USA. ISBN 0-201-06505-3
- Thomas D. Rossing, Neville H. Fletcher, Principles of Vibration and Sound (1995), Springer-Verlag New York Inc. ISBN 0-387-94336-6
- El-Maleh K., Klein M., Petrucci G., and Kabal P., Speech/music discrimination for multimedia applications (2000), in ICASSP'00.
- Houtgast, T. and Steeneken, H.J.M., A review of the MTF-concept in room acoustics (1985), J. Acoust. Soc. Am. 77, 1069-1077.
- J. Ajmera, I. McCowan, and H. Bourlard, Robust HMM-based speech/music segmentation (2002), in Proceedings of ICASSP-02.
- J.J. Burred, A. Lerch, Hierarchical Automatic Audio Signal Classification (2004), Journal of the Audio Engineering Society.
- J. Pinquier, J. Rouas, R. Andre-Obrecht, Robust speech / music classification in audio documents (2002), 7th International Conference on Spoken Language Processing (ICSLP), pp. 2005-2008.
- Martin, K.D., Scheirer, E.D., Vercoe, B.L., Music Content Analysis through Models of Audition (1998), ACM Multimedia'98 Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK.
Thank you