Title: Unpacking "Song-level features and support vector machines for music classification" by Mandel and Ellis
1. Unpacking "Song-level features and support vector machines for music classification" by Mandel and Ellis
- Presented by Rebecca Fiebrink
- IFT6080
- 13 March 2006
2. Basic idea
- Classify artists using MFCCs
- Compare song-level and artist-level features
- Compare kNN and SVM
- Compare Mahalanobis and KL-divergence as distance metrics
- Compare single Gaussians and GMMs, using appropriate computations for KL-divergence
- Look for the album effect
3. Mahalanobis distance
- Distance based on correlations between variables
- Useful for determining similarity of an unknown sample set to a known one
- Also useful for defining the dissimilarity between two random vectors of the same distribution
- Differs from Euclidean distance: it takes into account the correlations of the dataset, and it is scale-invariant
4. Mahalanobis distance defined
- Equation for two vectors: $D_M(u, v) = \sqrt{(u - v)^\top \Sigma^{-1} (u - v)}$
- Or, from a sample $x$ to a known set with mean $\mu$: $D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$
- Assumes u and v are random vectors from the same distribution, with covariance matrix $\Sigma$
- Can approximate $\Sigma$ as a diagonal matrix of individual variances, giving the variance-normalized Euclidean distance $d(u, v) = \sqrt{\sum_i (u_i - v_i)^2 / \sigma_i^2}$ (both forms are sketched in the code below)
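To make the definitions concrete, here is a minimal NumPy sketch of both forms, including the diagonal approximation (function names are mine):

```python
import numpy as np

def mahalanobis(u, v, cov):
    """Mahalanobis distance between two vectors drawn from a
    distribution with covariance matrix cov."""
    diff = u - v
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

def mahalanobis_to_set(x, samples):
    """Distance from a point x to a known sample set, using the
    set's empirical mean and covariance."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mahalanobis(x, mu, cov)

def mahalanobis_diag(u, v, variances):
    """Diagonal approximation: keep only the per-dimension variances,
    which reduces to a variance-normalized Euclidean distance."""
    return np.sqrt(np.sum((u - v) ** 2 / variances))
```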
5. Kullback-Leibler divergence
- Also called Kullback-Leibler distance, information divergence, information gain, and relative entropy
- Represents the distance from a true probability distribution P to an arbitrary probability distribution Q
- Always nonnegative; zero only if P and Q are equal
- Asymmetric, so not a true distance metric
- Several alternative symmetric versions exist
- Choosing the artist model with the smallest KL divergence w.r.t. a song's representation is equivalent to choosing the artist model under which the song's frames have the maximum likelihood.
6. KL divergence defined
- Equation: $D_{KL}(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
- Single Gaussian case has a closed form; for $d$-dimensional Gaussians $\mathcal{N}_0(\mu_0, \Sigma_0)$ and $\mathcal{N}_1(\mu_1, \Sigma_1)$: $D_{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = \frac{1}{2}\left[\log\frac{|\Sigma_1|}{|\Sigma_0|} + \operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) - d\right]$
- Mixture of Gaussians: no closed form; must approximate using Monte Carlo (both cases sketched below)
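A minimal sketch of both computations; for the mixture case I assume fitted scikit-learn GaussianMixture objects (the 500-sample default matches the Monte Carlo setting reported in the results):

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form D_KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
                  + np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d)

def kl_monte_carlo(p, q, n_samples=500):
    """Monte Carlo estimate of D_KL(p || q) for models with no closed
    form (e.g. GMMs): sample from p, average log p(x) - log q(x).
    p and q are fitted sklearn.mixture.GaussianMixture objects."""
    x, _ = p.sample(n_samples)
    return np.mean(p.score_samples(x) - q.score_samples(x))
```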
7. SVMs in Mandel & Ellis
- Uses DAG-SVM for multi-class classification
- DAG contains one node for each pair of classes
- N(N-1)/2 nodes for N classes
- Each node has one associated 1-vs-1 classifier
- The algorithm efficiently places the 1-vs-1 SVMs into the DAG
- Can derive a VC-style bound on generalization error (unlike max-wins)
- Generalization performance is about as accurate as max-wins and 1-vs-rest, but training and evaluation are faster (see the sketch below)
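A minimal sketch of DAG-SVM evaluation, assuming the pairwise classifiers are already trained and stored in a dict keyed by ordered class pairs (the data structures are mine; this list-elimination form is one standard way to realize the DAG):

```python
def dag_svm_predict(x, classes, pairwise_clf):
    """Evaluate a DAG-SVM on one sample x. pairwise_clf[(a, b)] is a
    trained 1-vs-1 classifier that returns either a or b. Each step
    eliminates one class, so only N-1 of the N(N-1)/2 classifiers
    are evaluated per prediction."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise_clf[(a, b)](x)
        # Discard the losing class and move down the DAG.
        remaining.remove(a if winner == b else b)
    return remaining[0]
```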
8. Kernels in Mandel & Ellis
- SVMs all use the standard radial basis function (RBF) kernel, with tunable $\gamma > 0$
- Distance D defined using $D_M(u, v)$ or the symmetrized KL divergence $D_{KL}(X_i, X_j) = D_{KL}(X_i \| X_j) + D_{KL}(X_j \| X_i)$
- Implementation: pre-compute the Gram matrix (the matrix of all possible inner products between vectors in a set), using K
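A sketch of the precomputed-Gram-matrix setup with scikit-learn; the exponential form $K = e^{-\gamma D}$ is my reading of the slide, and with a non-Euclidean D such a kernel is not guaranteed to be positive definite:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(dist_matrix, gamma):
    """Turn a precomputed distance matrix into an RBF-style Gram
    matrix K = exp(-gamma * D)."""
    return np.exp(-gamma * dist_matrix)

# Hypothetical usage, where D_train is the (n, n) matrix of pairwise
# distances between training songs and y_train the artist labels:
#   clf = SVC(kernel="precomputed").fit(rbf_gram(D_train, 0.1), y_train)
# Prediction needs the (m, n) matrix of test-to-train distances:
#   y_pred = clf.predict(rbf_gram(D_test_train, 0.1))
```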
9. Song-level vs. artist-level
- Song-level features: collect a "bag" of all MFCCs for a song (ignore temporal organization)
- Represent each song as a point in feature space
- Artist-level features: MFCCs are not grouped by song
10. Song-level representations
- Single Gaussian, described by the mean and covariance of each of the MFCCs over the duration of a song
  - use closed-form KL-divergence for distance
- Vector representation of the means and covariances of the MFCCs (same information as above)
  - use Mahalanobis distance
- Mixture of Gaussians, fit using the EM algorithm
  - use Monte Carlo to estimate KL-divergence for distance
- kNN and SVM can use all three of these (the first two are sketched below)
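A minimal sketch of the first two song-level representations, assuming librosa for MFCC extraction (library choice, parameters, and function names are mine):

```python
import numpy as np
import librosa

def song_features(path, n_mfcc=20):
    """Single-Gaussian song-level representation: mean and covariance
    of a song's MFCC frames, plus the equivalent vector form."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    mu = mfcc.mean(axis=1)
    cov = np.cov(mfcc)  # rows are MFCC dimensions
    # Vector representation: stack the means with the upper triangle
    # of the covariance -- the same information, usable as a point in
    # feature space with Mahalanobis distance.
    vec = np.concatenate([mu, cov[np.triu_indices(n_mfcc)]])
    return mu, cov, vec
```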
11. Example of song-level classification
12. Artist-level representations
- Single GMM representing all of an artist's songs
  - 50 Gaussians used
- Classify a new song as the artist model with the maximum likelihood of generating its frames (sketched below)
- Or, use each MFCC vector individually
  - Each MFCC vector is a training instance for the SVM
  - Classify all MFCCs from a song individually, and assign the most frequently predicted class as the song label
  - Computation-intensive!
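A minimal sketch of the max-likelihood variant, with scikit-learn's GaussianMixture standing in for the authors' GMM training (names and data layout are mine):

```python
from sklearn.mixture import GaussianMixture

def fit_artist_models(frames_by_artist, n_components=50):
    """frames_by_artist maps artist -> (n_frames, n_mfcc) array of all
    MFCC frames pooled from that artist's training songs."""
    return {artist: GaussianMixture(n_components=n_components).fit(x)
            for artist, x in frames_by_artist.items()}

def classify_song(song_frames, models):
    """Pick the artist whose GMM assigns the song's frames the
    highest total log-likelihood."""
    return max(models, key=lambda a: models[a].score_samples(song_frames).sum())
```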
13. Example of artist-level classification
14. Experiments
- Representation: artist-level vs. song-level
- Classification: max likelihood, SVM, kNN
- Two evaluation setups:
  1. Album-conscious training, testing, and validation sets
  2. Album-blind 3-fold CV
15. Results: Feature set
- Conclusion: song-level features are better
- Extraction time is higher, but training time is much lower for song-level features
16. Results: Representation/Distance measure
- Mahalanobis and KL for single Gaussian are comparable
- However, KL is better when not controlling for the album effect
- KL for GMMs (500-sample Monte Carlo) is worse than single-Gaussian KL (computed directly)
17. Uninvited commentary, part 1
- Mandel and Ellis squeeze a lot of information into a short paper.
- My experiments comparing Mahalanobis and simple min-max normalized Euclidean distance for kNN show that these are pretty comparable for MFCCs in terms of accuracy
- However, Mahalanobis distance is a bad choice for other features
18. Uninvited commentary, part 2
- My biggest criticism is this paper's use of statistical significance
  - Significant at what level? Not defined.
  - Can't simply compare differences between the best of all configurations from each category, especially when you use different numbers of configurations from each category
  - See Jensen and Cohen 2000.
- My next-biggest criticism is the smallness of the dataset
  - Only 18 artists
  - Can't really generalize about practical applications
19. References
- Duda, R., P. Hart, and D. Stork. 2001. Pattern Classification, 2nd ed. New York: John Wiley & Sons.
- Hsu, C., and C. Lin. 2002. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2): 415-425.
- Jensen, D., and P. Cohen. 2000. Multiple comparisons in induction algorithms. Machine Learning 38: 309-338.
- Mandel, M., and D. Ellis. 2005. Song-level features and support vector machines for music classification. Proceedings of the International Conference on Music Information Retrieval (ISMIR), 594-599.
- Wikipedia. www.wikipedia.org. See esp. pages on Kullback-Leibler divergence and Mahalanobis distance.