Title: Unpacking "Song-level features and support vector machines for music classification" by Mandel and Ellis
1. Unpacking "Song-level features and support vector machines for music classification" by Mandel and Ellis
- Presented by Rebecca Fiebrink
- IFT6080
- 13 March 2006
2. Basic idea
- Classify artists using MFCCs
- Compare song-level and artist-level features
- Compare kNN and SVM
- Compare Mahalanobis and KL-divergence as distance metrics
- Compare single Gaussians and GMMs, using appropriate computations for KL-divergence
- Look for the album effect
3. Mahalanobis distance
- Distance based on correlations between variables
- Useful for determining similarity of an unknown sample set to a known one
- Also useful for defining the dissimilarity between two random vectors of the same distribution
- Differs from Euclidean distance: it takes into account the correlations of the dataset, and it is scale-invariant
4. Mahalanobis distance defined
- Equation for two vectors: $D_M(u, v) = \sqrt{(u - v)^\top \Sigma^{-1} (u - v)}$
- Or, from a sample $x$ to a known set with mean $\mu$: $D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$
- Assumes u and v are random vectors from the same distribution, with covariance matrix $\Sigma$
- Can approximate $\Sigma$ as a diagonal matrix of individual variances, giving the variance-normalized Euclidean distance $d(u, v) = \sqrt{\sum_i (u_i - v_i)^2 / \sigma_i^2}$ (both forms are sketched in the code below)
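To make the definitions concrete, here is a minimal NumPy sketch of both forms, including the diagonal approximation (function names are mine):

```python
import numpy as np

def mahalanobis(u, v, cov):
    """Mahalanobis distance between two vectors drawn from a
    distribution with covariance matrix cov."""
    diff = u - v
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

def mahalanobis_to_set(x, samples):
    """Distance from a point x to a known sample set, using the
    set's empirical mean and covariance."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mahalanobis(x, mu, cov)

def mahalanobis_diag(u, v, variances):
    """Diagonal approximation: keep only the per-dimension variances,
    which reduces to a variance-normalized Euclidean distance."""
    return np.sqrt(np.sum((u - v) ** 2 / variances))
```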
5. Kullback-Leibler divergence
- Also called Kullback-Leibler distance, information divergence, information gain, and relative entropy
- Represents the distance from a true probability distribution P to an arbitrary probability distribution Q
- Always nonnegative; zero only if P and Q are equal
- Asymmetric, so not a true distance metric
- Several alternative symmetric versions exist
- Choosing the artist model with the smallest KL divergence w.r.t. a song's representation is equivalent to choosing the artist model under which the song's frames have the maximum likelihood.
6. KL divergence defined
- Equation: $D_{KL}(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
- Single Gaussian case has a closed form; for $d$-dimensional Gaussians $\mathcal{N}_0(\mu_0, \Sigma_0)$ and $\mathcal{N}_1(\mu_1, \Sigma_1)$: $D_{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = \frac{1}{2}\left[\log\frac{|\Sigma_1|}{|\Sigma_0|} + \operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) - d\right]$
- Mixture of Gaussians: no closed form; must approximate using Monte Carlo (both cases sketched below)
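A minimal sketch of both computations; for the mixture case I assume fitted scikit-learn GaussianMixture objects (the 500-sample default matches the Monte Carlo setting reported in the results):

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form D_KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
                  + np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d)

def kl_monte_carlo(p, q, n_samples=500):
    """Monte Carlo estimate of D_KL(p || q) for models with no closed
    form (e.g. GMMs): sample from p, average log p(x) - log q(x).
    p and q are fitted sklearn.mixture.GaussianMixture objects."""
    x, _ = p.sample(n_samples)
    return np.mean(p.score_samples(x) - q.score_samples(x))
```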
7. SVMs in Mandel & Ellis
- Uses DAG-SVM for multi-class classification
- DAG contains one node for each pair of classes
- N(N-1)/2 nodes for N classes
- Each node has one associated 1-vs-1 classifier
- The algorithm efficiently places the 1-vs-1 SVMs into the DAG
- Can derive a VC-style bound on generalization error (unlike max-wins)
- Generalization performance is about as accurate as max-wins and 1-vs-rest, but training and evaluation are faster (see the sketch below)
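A minimal sketch of DAG-SVM evaluation, assuming the pairwise classifiers are already trained and stored in a dict keyed by ordered class pairs (the data structures are mine; this list-elimination form is one standard way to realize the DAG):

```python
def dag_svm_predict(x, classes, pairwise_clf):
    """Evaluate a DAG-SVM on one sample x. pairwise_clf[(a, b)] is a
    trained 1-vs-1 classifier that returns either a or b. Each step
    eliminates one class, so only N-1 of the N(N-1)/2 classifiers
    are evaluated per prediction."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise_clf[(a, b)](x)
        # Discard the losing class and move down the DAG.
        remaining.remove(a if winner == b else b)
    return remaining[0]
```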
8. Kernels in Mandel & Ellis
- SVMs all use the standard radial basis function (RBF) kernel, with tunable $\gamma > 0$
- Distance D defined using $D_M(u, v)$ or the symmetrized KL divergence $D_{KL}(X_i, X_j) = D_{KL}(X_i \| X_j) + D_{KL}(X_j \| X_i)$
- Implementation: pre-compute the Gram matrix (the matrix of all possible inner products between vectors in a set), using K
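A sketch of the precomputed-Gram-matrix setup with scikit-learn; the exponential form $K = e^{-\gamma D}$ is my reading of the slide, and with a non-Euclidean D such a kernel is not guaranteed to be positive definite:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(dist_matrix, gamma):
    """Turn a precomputed distance matrix into an RBF-style Gram
    matrix K = exp(-gamma * D)."""
    return np.exp(-gamma * dist_matrix)

# Hypothetical usage, where D_train is the (n, n) matrix of pairwise
# distances between training songs and y_train the artist labels:
#   clf = SVC(kernel="precomputed").fit(rbf_gram(D_train, 0.1), y_train)
# Prediction needs the (m, n) matrix of test-to-train distances:
#   y_pred = clf.predict(rbf_gram(D_test_train, 0.1))
```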
9. Song-level vs. artist-level
- Song-level features: collect a "bag" of all MFCCs for a song (ignore temporal organization)
- Represent each song as a point in feature space
- Artist-level features: MFCCs are not grouped by song
10. Song-level representations
- Single Gaussian, described by the mean and covariance of each of the MFCCs over the duration of a song
  - use closed-form KL-divergence for distance
- Vector representation of the means and covariances of the MFCCs (same information as above)
  - use Mahalanobis distance
- Mixture of Gaussians, fit using the EM algorithm
  - use Monte Carlo to estimate KL-divergence for distance
- kNN and SVM can use all three of these (the first two are sketched below)
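A minimal sketch of the first two song-level representations, assuming librosa for MFCC extraction (library choice, parameters, and function names are mine):

```python
import numpy as np
import librosa

def song_features(path, n_mfcc=20):
    """Single-Gaussian song-level representation: mean and covariance
    of a song's MFCC frames, plus the equivalent vector form."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    mu = mfcc.mean(axis=1)
    cov = np.cov(mfcc)  # rows are MFCC dimensions
    # Vector representation: stack the means with the upper triangle
    # of the covariance -- the same information, usable as a point in
    # feature space with Mahalanobis distance.
    vec = np.concatenate([mu, cov[np.triu_indices(n_mfcc)]])
    return mu, cov, vec
```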
11. Example of song-level classification
12. Artist-level representations
- Single GMM representing all of an artist's songs
  - 50 Gaussians used
- Classify a new song as the artist model with the maximum likelihood of generating its frames (sketched below)
- Or, use each MFCC vector individually
  - Each MFCC vector is a training instance for the SVM
  - Classify all MFCCs from a song individually, and assign the most frequently predicted class as the song label
  - Computation-intensive!
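A minimal sketch of the max-likelihood variant, with scikit-learn's GaussianMixture standing in for the authors' GMM training (names and data layout are mine):

```python
from sklearn.mixture import GaussianMixture

def fit_artist_models(frames_by_artist, n_components=50):
    """frames_by_artist maps artist -> (n_frames, n_mfcc) array of all
    MFCC frames pooled from that artist's training songs."""
    return {artist: GaussianMixture(n_components=n_components).fit(x)
            for artist, x in frames_by_artist.items()}

def classify_song(song_frames, models):
    """Pick the artist whose GMM assigns the song's frames the
    highest total log-likelihood."""
    return max(models, key=lambda a: models[a].score_samples(song_frames).sum())
```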
13. Example of artist-level classification
14. Experiments
- Representation: artist-level vs. song-level
- Classification: max likelihood, SVM, kNN
- Two evaluation setups:
  1. Album-conscious training, testing, and validation sets
  2. Album-blind 3-fold CV
15. Results: Feature set
- Conclusion: song-level features are better
- Extraction time is higher, but training time is much lower for song-level features
16. Results: Representation/Distance measure
- Mahalanobis and KL for single Gaussian are comparable
- However, KL is better when not controlling for the album effect
- KL for GMMs (500-sample Monte Carlo) is worse than single-Gaussian KL (computed directly)
17. Uninvited commentary, part 1
- Mandel and Ellis squeeze a lot of information into a short paper.
- My experiments comparing Mahalanobis and simple min-max normalized Euclidean distance for kNN show that these are pretty comparable for MFCCs in terms of accuracy
- However, Mahalanobis distance is a bad choice for other features
18. Uninvited commentary, part 2
- My biggest criticism is this paper's use of statistical significance
  - Significant at what level? Not defined.
  - Can't simply compare differences between the best of all configurations from each category, especially when you use different numbers of configurations from each category
  - See Jensen and Cohen 2000.
- My next-biggest criticism is the smallness of the dataset
  - Only 18 artists
  - Can't really generalize about practical applications
19. References
- Duda, R., P. Hart, and D. Stork. 2001. Pattern Classification, 2nd ed. New York: John Wiley & Sons.
- Hsu, C., and C. Lin. 2002. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2): 415-425.
- Jensen, D., and P. Cohen. 2000. Multiple comparisons in induction algorithms. Machine Learning 38: 309-338.
- Mandel, M., and D. Ellis. 2005. Song-level features and support vector machines for music classification. Proceedings of the International Conference on Music Information Retrieval (ISMIR), 594-599.
- Wikipedia. www.wikipedia.org. See esp. pages on Kullback-Leibler divergence and Mahalanobis distance.