1
Unpacking "Song-level features and support vector
machines for music classification" by Mandel and
Ellis
  • Presented by Rebecca Fiebrink
  • IFT6080
  • 13 March 2006

2
Basic idea
  • Classify artists using MFCCs
  • Compare song-level and artist-level features
  • Compare kNN and SVM
  • Compare Mahalanobis and KL-divergence as distance
    metrics
  • Compare single Gaussians and GMMs, using
    appropriate computations for KL-divergence
  • Look for album effect

3
Mahalanobis distance
  • Distance based on correlations between variables
  • Useful for determining similarity of an unknown
    sample set to a known one
  • Also useful for defining the dissimilarity
    between two random vectors of the same
    distribution
  • Differs from Euclidean distance: it takes into
    account the correlations of the dataset, and it
    is scale-invariant

4
Mahalanobis distance defined
  • Equation for two vectors:
    D_M(u, v) = sqrt( (u - v)^T Σ^-1 (u - v) )
  • Or, for a point x relative to a known sample set
    with mean μ:
    D_M(x) = sqrt( (x - μ)^T Σ^-1 (x - μ) )
  • Assumes u and v are random vectors from the same
    distribution, with covariance matrix Σ
  • Can approximate Σ as a diagonal matrix of the
    individual variances σ_i^2, which reduces to the
    normalized Euclidean distance
    d(u, v) = sqrt( sum_i (u_i - v_i)^2 / σ_i^2 )
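  A minimal sketch of both forms in Python (NumPy assumed; the data
  and variable names are illustrative, not from the paper):

    import numpy as np

    def mahalanobis(u, v, cov):
        # Full Mahalanobis distance under covariance matrix cov
        diff = u - v
        return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

    def mahalanobis_diag(u, v, variances):
        # Diagonal approximation: per-dimension variances only,
        # i.e. a standardized (scale-invariant) Euclidean distance
        return float(np.sqrt(np.sum((u - v) ** 2 / variances)))

    # Covariance estimated from a sample set X (rows are observations)
    X = np.random.randn(200, 13)
    cov = np.cov(X, rowvar=False)
    d = mahalanobis(X[0], X[1], cov)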

5
Kullback-Leibler divergence
  • Also called Kullback-Leibler distance,
    information divergence, information gain, and
    relative entropy
  • Represents the distance from a true probability
    distribution P to an arbitrary probability
    distribution Q
  • Always nonnegative; zero only if P and Q are
    equal
  • Asymmetric, so not a true distance metric
  • Several alternative symmetric versions exist
  • Choosing the artist model with the smallest KL
    divergence w.r.t. a song's representation is
    equivalent to choosing the artist model under
    which the song's frames have the maximum
    likelihood.

6
KL divergence defined
  • Equation:
    D_KL(P || Q) = ∫ p(x) log( p(x) / q(x) ) dx
  • Single Gaussian case has a closed form:
    2 D_KL( N(μ_p, Σ_p) || N(μ_q, Σ_q) ) =
    log( |Σ_q| / |Σ_p| ) + tr( Σ_q^-1 Σ_p )
    + (μ_q - μ_p)^T Σ_q^-1 (μ_q - μ_p) - d
  • Mixture of Gaussians: no closed form; must
    approximate using Monte Carlo (draw samples from
    P and average log p(x) - log q(x))
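  A hedged sketch of both computations in Python (NumPy and
  scikit-learn assumed; the 500-sample Monte Carlo size mentioned
  later in the talk is used as the default):

    import numpy as np

    def kl_gaussians(m_p, S_p, m_q, S_q):
        # Closed-form KL divergence between two multivariate Gaussians
        d = len(m_p)
        S_q_inv = np.linalg.inv(S_q)
        diff = m_q - m_p
        return 0.5 * (np.log(np.linalg.det(S_q) / np.linalg.det(S_p))
                      + np.trace(S_q_inv @ S_p)
                      + diff @ S_q_inv @ diff
                      - d)

    def kl_gmm_monte_carlo(gmm_p, gmm_q, n_samples=500):
        # gmm_p, gmm_q: fitted sklearn GaussianMixture models.
        # Monte Carlo estimate: sample from P and average
        # log p(x) - log q(x) over the samples.
        X, _ = gmm_p.sample(n_samples)
        return np.mean(gmm_p.score_samples(X) - gmm_q.score_samples(X))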

7
SVMs in Mandel & Ellis
  • Uses DAG-SVM for multi-class classification
  • DAG contains one node for each pair of classes
  • N(N-1)/2 nodes for N classes
  • Each node may have one associated classifier
  • Algorithm efficiently places 1-vs-1 SVMs into the DAG
  • Can do VC-style bound of generalization error
    (unlike max-wins)
  • Generalization performance is about as accurate
    as max-wins and 1-vs-rest, but training and
    evaluation are faster
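  A minimal sketch of the DAG evaluation step, assuming the N(N-1)/2
  pairwise classifiers are already trained and stored in a dict clf
  keyed by class pairs in a fixed class order (hypothetical names;
  any binary SVM implementation would do):

    def dagsvm_predict(x, classes, clf):
        # Start with all N classes as candidates and eliminate one
        # class per node; a leaf is reached after N-1 decisions.
        remaining = list(classes)
        while len(remaining) > 1:
            i, j = remaining[0], remaining[-1]
            winner = clf[(i, j)].predict([x])[0]  # returns i or j
            if winner == i:
                remaining.pop()       # j is eliminated
            else:
                remaining.pop(0)      # i is eliminated
        return remaining[0]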

8
Kernels in Mandel & Ellis
  • SVMs all use the standard radial basis function
    (RBF) kernel, with tunable γ > 0
  • Distance D defined using D_M(u, v) or the
    symmetrized KL divergence D_KL(X_i, X_j) =
    D_KL(X_i || X_j) + D_KL(X_j || X_i)
  • Implementation: pre-compute the Gram matrix
    (matrix of all possible inner products between
    vectors in a set) using K, as sketched below
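  One way to realize the pre-computed kernel in scikit-learn (a
  tooling assumption, not the paper's actual implementation; the
  exponent form K = exp(-gamma * D) is one common RBF-style choice,
  and D_train, D_test_train, y_train are hypothetical precomputed
  inputs):

    import numpy as np
    from sklearn.svm import SVC

    gamma = 0.1  # tunable, gamma > 0
    # D_train[i, j]: distance between training songs i and j
    # (Mahalanobis or symmetrized KL, as defined above)
    K_train = np.exp(-gamma * D_train)   # Gram matrix, train vs. train

    svm = SVC(kernel="precomputed")
    svm.fit(K_train, y_train)

    # At test time the kernel matrix is test vs. train
    K_test = np.exp(-gamma * D_test_train)
    predictions = svm.predict(K_test)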

9
Song-level vs. artist-level
  • Song-level features: collect a bag of all MFCCs
    for a song (ignoring temporal organization)
  • Represent each song as a point in feature space
  • Artist-level features: MFCCs are not grouped by
    song

10
Song-level representations
  • Single Gaussian, described by mean and covariance
    of each of the MFCCs over the duration of a song
  • use closed-form KL-divergence for distance
  • Vector representation of the means and
    covariances for MFCCs (same information as above)
  • use Mahalanobis distance
  • Mixture of Gaussians, fit using EM algorithm
  • use Monte Carlo to estimate KL-divergence for
    distance
  • kNN and SVM can use all three of these
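  A sketch of building all three representations for one song
  (librosa and scikit-learn assumed; the file name and parameter
  values are illustrative, not the paper's):

    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    y, sr = librosa.load("song.wav")                      # hypothetical file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # frames x coeffs

    # 1. Single Gaussian: mean and covariance over the whole song
    mean = mfcc.mean(axis=0)
    cov = np.cov(mfcc, rowvar=False)

    # 2. Vector representation: stack the means and the unique
    #    (upper-triangular) covariance entries into one feature vector
    iu = np.triu_indices(cov.shape[0])
    vec = np.concatenate([mean, cov[iu]])

    # 3. Mixture of Gaussians, fit with EM
    gmm = GaussianMixture(n_components=20).fit(mfcc)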

11
Example of song-level classification
12
Artist-level representations
  • Single GMM representing all of an artist's songs
  • 50 Gaussians used
  • Classify a new song as the artist model with the
    maximum likelihood of generating its frames
    (sketched below)
  • Or, use each MFCC vector individually
  • Each MFCC vector is a training instance for the SVM
  • Classify all MFCCs from a song individually, and
    assign the most frequently predicted class as the
    song label
  • Computation-intensive!
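  A sketch of the max-likelihood variant (scikit-learn
  GaussianMixture assumed; mfccs_by_artist is a hypothetical dict of
  pooled training MFCC frames per artist):

    from sklearn.mixture import GaussianMixture

    # One 50-component GMM per artist, fit on all of that
    # artist's pooled training MFCC frames
    models = {artist: GaussianMixture(n_components=50).fit(frames)
              for artist, frames in mfccs_by_artist.items()}

    def classify_song(song_frames):
        # Pick the artist model with the maximum total
        # log-likelihood of generating the song's frames
        return max(models,
                   key=lambda a: models[a].score_samples(song_frames).sum())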

13
Example of artist-level classification
14
Experiments
  • Conditions cross Representation (artist-level,
    song-level) with Classification (max likelihood,
    SVM, kNN)
  • Two evaluation protocols:
  • 1. Album-conscious training, testing, and
    validation sets
  • 2. Album-blind 3-fold cross-validation
15
Results: Feature set
  • Conclusion: song-level features are better
  • Extraction time is higher, but training time is
    much lower for song-level features

16
Results: Representation/distance measure
  • Mahalanobis and KL for single Gaussian are
    comparable
  • However, KL is better when not controlling for
    album effect
  • KL for GMMs (500-sample Monte Carlo) is worse
    than single Gaussian KL (computed directly)

17
Uninvited commentary, part 1
  • Mandel and Ellis squeeze a lot of information
    into a short paper.
  • My experiments comparing Mahalanobis and simple
    min-max normalized Euclidean distance for kNN
    show that these are pretty comparable for MFCCs
    in terms of accuracy
  • However, Mahalanobis distance is a bad choice for
    other features

18
Uninvited commentary, Part 2
  • My biggest criticism is this paper's use of
    statistical significance
  • Significant at what level? Not defined.
  • Can't simply compare differences between the best of
    all configurations from each category,
    especially when you use different numbers of
    configurations from each category
  • See Jensen and Cohen 2000.
  • My next-biggest criticism is the smallness of the
    dataset
  • Only 18 artists
  • Can't really generalize about practical
    applications

19
References
  • Duda, R., P. Hart, and D. Stork. 2001. Pattern
    Classification. 2nd ed. New York: John Wiley & Sons.
  • Hsu, C., and C. Lin. 2002. A comparison of
    methods for multiclass support vector machines.
    IEEE Transactions on Neural Networks 13(2):
    415-425.
  • Jensen, D., and P. Cohen. 2000. Multiple
    comparisons in induction algorithms. Machine
    Learning 38: 309-338.
  • Mandel, M., and D. Ellis. 2005. Song-level
    features and support vector machines for music
    classification. Proceedings of the International
    Conference on Music Information Retrieval
    (ISMIR), 594-599.
  • Wikipedia. www.wikipedia.org. See esp. pages on
    Kullback-Leibler divergence and Mahalanobis
    distance.