Speech and Music Discrimination using Gaussian Mixture Model

1
_________________________
Speech and Music Discrimination using
Gaussian Mixture Model
Seminar Program
Project Team: Dr. Deep Sen (Supervisor),
CHOI Arthur, Tsz Kin (3015809),
CHENG Derek, Ka Chun (3015631)
2
_________________________
Speech and Music Discrimination using GMM
3
Motivations
  • Much research uses HMMs; comparatively little
    uses GMMs
  • GMMs are less complex than HMMs
  • Our feature extraction methods will reduce
    complexity further
  • Multimedia file search and storage are still
    under development
  • Fits the University's requirements

5
Applications
  • Audio Database Indexing
  • Automatic Bandwidth Allocation
  • Broadcast Browsing
  • Intelligent Signal Processing
  • Intelligent Audio Coding
  • Audio file Compression
  • Audio Clip Editing

6
Approaches
Deterministic signals can be analysed as
completely specified functions of time.
Non-deterministic signals must be analysed
probabilistically. (Tele3013 notes)
7
Procedures
  • Read a signal
  • Segment it into short frames
  • Extract features from each frame
  • Classify each frame

8
Feature Extraction
9
Classification
11
Segmentation
  • Reasons
    • Get a better estimation result
    • Achieve real-time behaviour
  • Problems and solutions
    • Frames too big: classification accuracy
      decreases
    • Frames too small: feature extraction accuracy
      decreases
    • Chosen frame size: 20 ms
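
A minimal sketch of the segmentation step, assuming non-overlapping frames and an 8 kHz sampling rate (neither is stated on the slide). The helper name `frame_signal` is hypothetical.

```python
def frame_signal(samples, sample_rate, frame_ms=20):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame must contain at least one sample")
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of placeholder audio at an assumed 8 kHz sampling rate.
signal = [0.0] * 8000
frames = frame_signal(signal, sample_rate=8000)
print(len(frames), len(frames[0]))  # 50 frames of 160 samples each
```

At 8 kHz a 20 ms frame holds 160 samples, so each second of audio yields 50 classification decisions.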

[Figure: Music Signal]
13
4 Hz modulation energy
  • Speech has a characteristic energy modulation
    peak around the 4 Hz syllabic rate
    (Houtgast and Steeneken, 1985).
  • Reasons
    • Accurately separates speech signals from music
      signals (94%)
    • Easy to implement in Matlab
    • Novel and robust
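
One way to measure the 4 Hz modulation described above is to take the short-time energy envelope and ask how much of its variance sits at 4 Hz. The sketch below is an illustration under assumptions, not the project's Matlab code: it uses 20 ms frames at 8 kHz, a single-bin DFT, and synthetic amplitude-modulated tones in place of real recordings. All function names are hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Short-time energy of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


def modulation_energy_ratio(envelope, frame_rate, mod_hz=4.0):
    """Fraction of the envelope's variance found at mod_hz (single-bin DFT)."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [e - mean for e in envelope]
    total = sum(c * c for c in centred)
    if total == 0.0:
        return 0.0
    step = 2.0 * math.pi * mod_hz / frame_rate
    re = sum(c * math.cos(step * i) for i, c in enumerate(centred))
    im = sum(c * math.sin(step * i) for i, c in enumerate(centred))
    return 2.0 * (re * re + im * im) / (n * total)


def am_tone(mod_hz, sample_rate=8000, seconds=2, carrier_hz=1000.0):
    """Synthetic test tone whose amplitude is modulated at mod_hz."""
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * mod_hz * t / sample_rate))
            * math.sin(2.0 * math.pi * carrier_hz * t / sample_rate)
            for t in range(sample_rate * seconds)]


frame_len = 160  # 20 ms at 8 kHz, so the envelope is sampled at 50 Hz
ratio_4hz = modulation_energy_ratio(frame_energies(am_tone(4.0), frame_len), 50.0)
ratio_11hz = modulation_energy_ratio(frame_energies(am_tone(11.0), frame_len), 50.0)
print(ratio_4hz > ratio_11hz)
```

The tone modulated at 4 Hz concentrates most of its envelope variance in the 4 Hz bin, while the tone modulated at 11 Hz leaves that bin almost empty. A real system would typically band-pass the envelope around 4 Hz instead of reading a single bin.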

14
[Figure: Music Signal and Speech Signal]
16
[Figure: Energy vs. Time for a Music Signal and a Speech Signal]
17
Zero-Crossing Count (ZCC)
  • The zero-crossing count is the total number of
    times that a signal crosses the zero level
    (the time axis) within a given time interval.
  • Speech signals: high ZCC
  • Music signals: low ZCC
  • Reasons
    • The ZCC of a speech signal is significantly
      higher
    • Very easy to implement in Matlab
    • Mature and robust
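
The definition above translates almost directly into code. This is a minimal sketch for a single frame; treating a sample of exactly zero as non-negative is an assumption made here, not something the slide specifies.

```python
def zero_crossing_count(frame):
    """Count sign changes between consecutive samples in one frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        # A crossing occurs when neighbouring samples lie on opposite sides of zero.
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


print(zero_crossing_count([1, -1, 1, -1]))  # 3
print(zero_crossing_count([1, 2, 3, 2]))    # 0
```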

20
Spectral Roll-off Point
  • The spectral roll-off point measures the
    skewness of the spectrum.
  • Reasons
    • Music usually has more energy in the high
      frequency range
    • Useful for separating different kinds of
      speech later

21
Spectral Roll-off Point
Spectral roll-off point SR [defining equation not preserved in the transcript]
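
The defining equation for SR is not legible here. A common definition, assumed in the sketch below, takes SR as the frequency below which a fixed fraction of the total spectral power lies. The fraction of 0.95 is an assumption (0.85 is also widely used), and the naive DFT is only meant for short frames.

```python
import cmath
import math


def power_spectrum(frame):
    """Power in DFT bins 0..n/2 of a real frame (naive O(n^2) transform)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def spectral_rolloff(frame, sample_rate, fraction=0.95):
    """Lowest bin frequency at which cumulative power reaches `fraction` of the total."""
    power = power_spectrum(frame)
    threshold = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= threshold:
            return k * sample_rate / len(frame)
    return sample_rate / 2.0


def tone(freq_hz, sample_rate=8000, n=160):
    """Pure sine test tone, one 20 ms frame at an assumed 8 kHz sampling rate."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


low = spectral_rolloff(tone(500.0), 8000)
high = spectral_rolloff(tone(3000.0), 8000)
print(low < high)
```

A frame whose energy sits at high frequencies gets a high roll-off point, which is the property the slide relies on when it says music tends to have more high-frequency energy.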
22
[Figure: power vs. frequency for a Music Signal and a Speech Signal]
23
Entropy Modulation
  • Music appears more ordered than speech
    (J. Pinquier, J.L. Rouas, R. Andre-Obrecht, 2002)
  • Higher entropy means less order
  • Higher dynamism means a higher rate of change
  • Reasons
    • Accurately separates speech signals from music
      signals (90%)
    • Novel and robust

24
[Figure: Music Signal and Speech Signal]
25
J. Ajmera, I.A. McCowan, H. Bourlard (2002)
26
Instantaneous entropy
Average entropy
Average Instantaneous entropy
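
The equations behind these labels did not survive in the transcript. One common formulation, assumed here, starts from per-frame class posteriors P(q_k | x_n) produced by some acoustic model: the instantaneous entropy is the entropy of one frame's posteriors, the average entropy is its mean over a window, and dynamism is the mean squared change in posteriors between neighbouring frames. The function names and toy posteriors below are hypothetical.

```python
import math


def instantaneous_entropy(posteriors):
    """Entropy (in bits) of one frame's class posteriors."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def average_entropy(posterior_frames):
    """Mean instantaneous entropy over a window of frames."""
    return sum(instantaneous_entropy(f) for f in posterior_frames) / len(posterior_frames)


def dynamism(posterior_frames):
    """Mean squared change of the posteriors between consecutive frames."""
    if len(posterior_frames) < 2:
        return 0.0
    diffs = [sum((a - b) ** 2 for a, b in zip(cur, nxt))
             for cur, nxt in zip(posterior_frames, posterior_frames[1:])]
    return sum(diffs) / len(diffs)


uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3     # flat posteriors: maximum entropy
confident = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # sharp, rapidly switching posteriors
print(average_entropy(uncertain), dynamism(uncertain))
print(average_entropy(confident), dynamism(confident))
```

Flat posteriors give high entropy and no dynamism, while sharp posteriors that switch class every frame give zero entropy and high dynamism.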
27
Pulse Metric
The beat of a piece of music is one of the
clearest features of the music (K.D. Martin,
E.D. Scheirer, B.L. Vercoe, 1998).
28
Other Features
  • Spectral Centroid
  • Spectral Flux
  • Silence Ratio
  • Short-Time Energy Ratio
  • Volume Dynamic Change
  • Number of Segments
  • Segment Duration
  • etc

29
Introduction to Gaussian Mixture Model (GMM)
  • Differentiation of speech and music from a
    sound source
  • Used in speech processing, mostly for speech
    recognition, speaker identification and voice
    conversion
  • Models densities and represents general
    spectral features

30
Why did we choose GMM?
  • Low complexity
  • Rate independence
  • Bit scalability
  • Short computation time

31
What is Gaussian Mixture Model?
  • A Gaussian Mixture Model consists of a set of
    local Gaussian modes and an integrating network.
    Different Gaussian distributions represent
    different regions of the feature space and have
    different output characteristics
  • A GMM describes a complex system using a
    combination of all the Gaussian clusters instead
    of a single model

32
Gaussian mixtures or clusters
  • Used to describe a complex system instead of
    a single model
  • Represent a dataset by a set of means and
    covariances

33
Gaussian Mixture Model
  • A Gaussian Mixture Model is represented by
    $p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(x)$
  • $x$ is the P-dimensional input vector
  • $w_i$, $i = 1, \dots, M$, are the mixture weights
  • $b_i(x)$ are the component densities

34
Clustering
  • Clustering is a technique from pattern
    classification
  • A technique to group samples
  • Each P-dimensional feature vector is treated as
    a point in space, and all points near it are
    clustered together

35
Clustering
  • The grey circle represents the variance of the
    distribution

36
Gaussian component density
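  • P-variate Gaussian function of the form
    $b_i(x) = \frac{1}{(2\pi)^{P/2} |\Sigma_i|^{1/2}} \exp\left(-\tfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)\right)$
  • $\mu_i$ is the mean vector
  • $\Sigma_i$ is the covariance matrix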

37
Covariance matrix
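  • Indicates the dispersion of the distribution
  • In mathematics, it is defined as the matrix whose
    (i, j)th element is the covariance of $x_i$ and
    $x_j$:
    $[\Sigma]_{ij} = \operatorname{cov}(x_i, x_j) = E\left[(x_i - E[x_i])(x_j - E[x_j])\right]$,
    $i, j = 1, \dots, d$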

38
Covariance matrix
  • The diagonal components of the covariance matrix
    are the variances of the individual random
    variables
  • Off-diagonal components are the covariances of
    two random variables, $x_i$ and $x_j$ ($i \neq j$)
  • The matrix is symmetric
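
These properties can be checked on a small dataset. The sketch below computes a sample covariance matrix in plain Python. The unbiased 1/(n - 1) normalisation and the toy data are assumptions made for illustration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)]
            for i in range(d)]


# Toy data: the second feature is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(data)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

The diagonal holds the two variances (1.0 and 4.0), and the off-diagonal entries are equal, so the matrix is symmetric.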

39
Full covariance matrix
  • The most powerful Gaussian model, as it fits the
    data best
  • Drawbacks
    • Needs a lot of data to estimate its parameters
    • Costly in high-dimensional feature spaces

40
Diagonal covariance matrix
  • Good compromise between quality and model size
  • Gaussian components can act together to model the
    overall probability density function
  • A mixture of diagonal-covariance components is
    still capable of modelling correlations between
    the elements of the feature vector

41
Review the Gaussian mixture density
  • The mixture weights must satisfy the conditions
    $\sum_{i=1}^{M} w_i = 1$
  • and $w_i \ge 0$
  • Three sets of parameters make up the Gaussian
    mixture density: mean vectors, covariance
    matrices and mixture weights
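
To make the mixture density concrete, the sketch below evaluates it for diagonal-covariance components, where each component density reduces to a product of one-dimensional Gaussians. The function names and the parameter values are illustrative assumptions.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Gaussian density with a diagonal covariance (var holds the diagonal)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_pdf(x, weights, means, variances):
    """Mixture density: weighted sum of the component densities."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0.0:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Two-component mixture over 2-dimensional feature vectors (made-up parameters).
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
print(gmm_pdf([0.0, 0.0], weights, means, variances))
```

The weight check mirrors the two conditions on the mixture weights stated above.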

42
Expectation-maximization (EM)
  • Estimates the mean vectors, covariance matrices
    and mixture weights
  • Iteratively updates the distribution of each
    Gaussian component and the conditional
    probabilities

43
Idea of Expectation-maximization
  • Instead of starting with a random
    configuration of all components and improving
    this configuration with expectation-maximization,
    we start with the optimal one-component mixture,
    then repeat two steps until convergence:
    • Insert a new component
    • Apply EM until convergence

44
Convergence Theorem
  • Because the sequence of likelihood values is
    monotonically increasing and bounded, the
    likelihood converges (in general to a local
    maximum)
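
The EM updates and the monotonic behaviour of the likelihood can be illustrated with a small one-dimensional example. This is a plain EM sketch under assumptions (two components, fixed iteration count, synthetic data, a small variance floor), not the incremental component-insertion scheme and not the project's implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, weights, means, variances, n_iter=30, var_floor=1e-6):
    """Run EM on a 1-D Gaussian mixture; return parameters and log-likelihood history."""
    weights, means, variances = list(weights), list(means), list(variances)
    k, n = len(weights), len(data)
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(log_lik)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances, history


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, history = em_gmm_1d(
    data, [0.5, 0.5], [min(data), max(data)], [1.0, 1.0])
print(all(b >= a - 1e-9 for a, b in zip(history, history[1:])))
```

Each entry of `history` is the log-likelihood under the parameters at the start of that iteration, so the sequence should never decrease, which is the property the convergence theorem describes.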

45
EM algorithm
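  • Let $L_k$ denote the log-likelihood of the
    dataset under the k-component mixture $p_k$
  • 1. Compute the optimal one-component mixture
    $p_1$. Set $k = 1$
  • 2. Find the optimal new component
    $b(x; \theta^*)$ and corresponding mixture
    weight $w^*$:
    $\{\theta^*, w^*\} = \arg\max_{\theta, w} \sum_{n} \log\left[(1 - w)\, p_k(x_n) + w\, b(x_n; \theta)\right]$
  • while keeping $p_k$ fixed

46
EM algorithm
  • 3. Set
    $p_{k+1}(x) = (1 - w^*)\, p_k(x) + w^*\, b(x; \theta^*)$
  • and $k = k + 1$
  • 4. Update $p_k$ using EM until convergence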

47
Speech/music discrimination using GMM
  • An interesting feature of GMMs is that the
    component densities of the mixture may represent
    • Different phonetic events when modelling speech
    • Different portions of the sound when modelling
      the spectra of sound from musical instruments
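
One way to turn class-specific GMMs into a speech/music decision, sketched here as an assumption and not as the project's classifier, is to score a run of frames under each class model and pick the class with the higher total log-likelihood. The one-dimensional feature, the model parameters and the labels below are all made up for illustration.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum of log mixture densities over a sequence of scalar feature values."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(p)
    return total


def classify(frames, models):
    """Return the label whose GMM gives the frames the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, *models[label]))


# Hypothetical per-class models: (weights, means, variances) of a scalar feature.
models = {
    "speech": ([0.5, 0.5], [0.2, 0.4], [0.01, 0.02]),
    "music": ([0.6, 0.4], [0.7, 0.9], [0.02, 0.01]),
}
print(classify([0.25, 0.30, 0.35], models))
print(classify([0.80, 0.85], models))
```

In practice each model would be trained with EM on labelled feature vectors, and the features would be multi-dimensional.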

48
Achievements
  • Identified an optimal frame size
  • Obtained robust features
  • Performed a few tests
  • Implemented some Matlab code
  • Studied the Gaussian Mixture Models (GMMs) and
    some of their mathematical expressions

49
Plans for next year
  • Comprehensive and more in-depth research on GMMs
  • Model the sound source based on GMMs
  • Evaluate the effect of noise
  • Matlab implementation for speech/music separation

50
Plans for next year
  • Investigate a novel classification method:
    Support Vector Machine (SVM)
  • Differentiate male and female speech
  • Differentiate classical and non-classical music
  • Generate a final thesis report

52
Resources
  • Internet, Microsoft Sound Recorder, Matlab
  • Neural Networks for Pattern Recognition (Bishop
    1996)
  • Processing and Perception of Speech and Music
    (Morgan 2000)
  • Research Papers

53
Management Plan
  • Dec-Feb 04: Matlab implementations
  • Investigate noise effect
  • Research on Support Vector Machine
  • Experiments
  • Jan 05: Separating classical and non-classical
    music
  • Feb 05: Separating male and female speech
  • Mar-Jun 05: Separate chamber music and orchestral
    music. Separate baby speech (if time permits)

54
  • Ben Gold, Nelson Morgan, Speech and Audio Signal Processing:
    Processing and Perception of Speech and Music (2000),
    John Wiley & Sons, Inc., USA.
  • Joseph F. Hair, Jr., Rolph E. Anderson, Ronald L. Tatham,
    William C. Black, Multivariate Data Analysis, 4th Edition (1995),
    Prentice-Hall International, Inc., USA.
  • Keinosuke Fukunaga, Introduction to Statistical Pattern
    Recognition, 2nd Edition (1990), Computer Science and Scientific
    Computing, Academic Press, Inc., California, USA.
    ISBN 0-12-269851-7
  • Marty J. Schmidts, Understanding and Using Statistics (1975),
    D.C. Heath and Company, Canada. ISBN 0-669-94490-4
  • Norman L. Johnson, Samuel Kotz, Distributions in Statistics:
    Continuous Univariate Distributions, vol. 1 (1970),
    Houghton Mifflin Company, Boston, USA.
  • Richard A. Johnson, Dean W. Wichern, Applied Multivariate
    Statistical Analysis (1992), Prentice-Hall, Inc., New Jersey, USA.
    ISBN 0-13-041400-X
  • Richard J. Harris, A Primer of Multivariate Statistics (1975),
    Academic Press Inc., New York, USA. ISBN 0-12-327250-5
  • Thomas D. Rossing, The Science of Sound (1982), Addison-Wesley
    Publishing Company Inc., USA. ISBN 0-201-06505-3
  • Thomas D. Rossing, Neville H. Fletcher, Principles of Vibration
    and Sound (1995), Springer-Verlag New York Inc.
    ISBN 0-387-94336-6
  • El-Maleh K., Klein M., Petrucci G., and Kabal P., Speech/music
    discrimination for multimedia applications (2000), in ICASSP '00.
  • Houtgast, T. and Steeneken, H.J.M., A review of the MTF-concept
    in room acoustics (1985), J. Acoust. Soc. Am. 77, 1069-1077.
  • J. Ajmera, I. McCowan, and H. Bourlard, Robust HMM-based
    speech/music segmentation (2002), in Proceedings of ICASSP-02.
  • J.J. Burred, A. Lerch, Hierarchical Automatic Audio Signal
    Classification (2004), Journal of the Audio Engineering Society.
  • J. Pinquier, J. Rouas, R. Andre-Obrecht, Robust speech / music
    classification in audio documents (2002), 7th International
    Conference On Spoken Language Processing (ICSLP), pp. 2005-2008.
  • Martin, K.D., Scheirer, E.D., Vercoe, B.L., Music Content Analysis
    through Models of Audition (1998), ACM Multimedia '98 Workshop on
    Content Processing of Music for Multimedia Applications,
    Bristol, UK.
Thank you