1
Automatic Genre Classification Using Large
High-Level Musical Feature Sets
  • Cory McKay and Ichiro Fujinaga
  • Dept. of Music Theory
  • Music Technology Area
  • McGill University
  • Montreal, Canada

2
Topics
  • Introduction
  • Existing research
  • Taxonomy
  • Features
  • Classification methodology
  • Results
  • Conclusions

3
Introduction
  • GOAL: Automatically classify symbolic recordings
    into pre-defined genre taxonomies
  • This is the first stage of a larger project:
  • A general music classification system
  • That also classifies audio
  • With a simple interface

4
Why symbolic recordings?
  • Valuable high-level features can be used which
    cannot currently be extracted from audio
    recordings
  • Research provides groundwork that can immediately
    be taken advantage of as transcription techniques
    improve
  • Can classify music for which only scores exist
    (using OMR)
  • Can aid musicological and psychological research
    into how humans deal with the notion of musical
    genre
  • Chose MIDI because of the diversity of
    recordings available
  • Can convert to MusicXML, Humdrum, GUIDO, etc.
    relatively easily

5
Existing research
  • Automatic audio genre classification is becoming
    a well-researched field
  • Pioneering work: Tzanetakis, Essl & Cook
  • Audio results:
  • Fewer than 10 categories
  • Success rates generally below 80% for more than 5
    categories
  • Less research done with symbolic recordings:
  • 84% for 2-way classifications (Shan & Kuo)
  • 63% for 3-way classifications (Chai & Vercoe)
  • Relatively little applied musicological work on
    general feature extraction. Two standouts:
  • Lomax 1968 (ethnomusicology)
  • Tagg 1982 (popular musicology)

6
Taxonomies used
  • Used a hierarchical taxonomy:
  • A recording can belong to more than one category
  • A category can be a child of multiple parents in
    the taxonomical hierarchy (see the sketch below)
  • Chose two taxonomies:
  • Small (9 leaf categories)
  • Used to loosely compare the system to existing
    research
  • Large (38 leaf categories)
  • Used to test the system under realistic
    conditions
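The multi-label, multi-parent structure can be pictured as a small directed
acyclic graph. The following is a minimal sketch in Python, assuming a simple
dict-based representation; the category placements shown are illustrative
examples, not the paper's actual taxonomy.

# Minimal sketch of a genre taxonomy as a directed acyclic graph: each
# category maps to its parent categories, so a category may sit under more
# than one parent, and a recording may carry several leaf labels at once.
# Category placements below are illustrative, not the paper's taxonomy.

parents = {
    "Bebop": ["Jazz"],
    "Swing": ["Jazz"],
    "Jazz Soul": ["Jazz", "Popular"],  # hypothetical multi-parent case
    "Rap": ["Popular"],
    "Baroque": ["Western Classical"],
    "Jazz": [],
    "Popular": [],
    "Western Classical": [],
}

def ancestors(category):
    """Return every category above `category` in the hierarchy."""
    found, stack = set(), list(parents.get(category, []))
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(parents.get(p, []))
    return found

# A recording may belong to more than one leaf category at once.
labels = {"some_recording.mid": ["Bebop", "Jazz Soul"]}
print(ancestors("Jazz Soul"))  # {'Jazz', 'Popular'}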

7
Small taxonomy
  • Jazz
    • Bebop
    • Jazz Soul
    • Swing
  • Popular
    • Rap
    • Punk
    • Country
  • Western Classical
    • Baroque
    • Modern Classical
    • Romantic

8
Large taxonomy
9
Training and test data
  • 950 MIDI files
  • 5-fold cross-validation
  • 80% training, 20% testing per fold (see the
    sketch below)
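The split can be illustrated with a short Python sketch; this is not the
authors' code, and the file names are hypothetical.

# Minimal sketch of the 5-fold cross-validation scheme described above: the
# 950 recordings are shuffled once and split into 5 folds; each fold serves
# in turn as the 20% test set while the remaining 80% is used for training.
import random

recordings = [f"recording_{i:03d}.mid" for i in range(950)]
random.seed(0)
random.shuffle(recordings)

n_folds = 5
folds = [recordings[i::n_folds] for i in range(n_folds)]

for k in range(n_folds):
    test_set = folds[k]
    train_set = [r for j, fold in enumerate(folds) if j != k for r in fold]
    # train the classifier on train_set and evaluate on test_set here
    print(f"fold {k}: {len(train_set)} training, {len(test_set)} testing")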

10
Features
  • 111 high-level features implemented
  • Instrumentation
  • e.g. whether modern instruments are present
  • Musical Texture
  • e.g. standard deviation of the average melodic
    leap of different lines
  • Rhythm
  • e.g. standard deviation of note durations
  • Dynamics
  • e.g. average note to note change in loudness
  • Pitch Statistics
  • e.g. fraction of notes in the bass register
  • Melody
  • e.g. fraction of melodic intervals comprising a
    tritone
  • Chords
  • e.g. prevalence of most common vertical interval
  • More information is available in Cory McKay's
    master's thesis (2004); a brief feature-extraction
    sketch follows below
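Two of the listed features can be illustrated in a few lines of Python. The
note representation and the bass-register cutoff below are assumptions made
for the example; the real system computes 111 such features from MIDI files.

# Sketch of two high-level features, computed from a hypothetical note list
# of (MIDI pitch, duration in seconds, velocity) tuples.
import statistics

notes = [(40, 0.5, 64), (43, 0.25, 70), (60, 0.25, 80), (64, 1.0, 90)]

# Rhythm: standard deviation of note durations
duration_sd = statistics.pstdev(d for _, d, _ in notes)

# Pitch statistics: fraction of notes in the bass register
# (here taken to be MIDI pitches below 55, an assumed cutoff)
bass_fraction = sum(1 for p, _, _ in notes if p < 55) / len(notes)

print(duration_sd, bass_fraction)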

11
Overview of the classifier
12
A classifier ensemble
13
Feature types
  • One-dimensional features
  • Consist of a single number that represents an
    aspect of a recording in isolation
  • e.g. an average or a standard deviation
  • Multi-dimensional features
  • Consist of vectors of closely coupled statistics
  • Individual values may have limited significance
    taken alone, but together may reveal meaningful
    patterns
  • e.g. bins of a histogram, or which instruments
    are present (a sketch of both feature types
    follows below)
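The distinction can be expressed with a small container type. The class below
is an assumption made for illustration, not the system's actual data
structure.

# One-dimensional features hold a single number; multi-dimensional features
# hold a vector of closely coupled values that are meaningful together.
from dataclasses import dataclass
from typing import List

@dataclass
class Feature:
    name: str
    values: List[float]  # length 1 => one-dimensional

    @property
    def is_multidimensional(self) -> bool:
        return len(self.values) > 1

avg_note_duration = Feature("Average Note Duration", [0.41])
pitch_histogram = Feature("Pitch Class Histogram", [0.1] * 12)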

14
Classifiers used
  • K-nearest neighbour (KNN)
  • Fast
  • One classifier for all one-dimensional features
  • Feedforward neural networks
  • Can learn complex interrelationships between
    features
  • One network for each multi-dimensional feature
    (see the sketch below)
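The pairing of classifiers with feature types might look like the sketch
below, using scikit-learn as a convenient stand-in for the system's own
implementations; the data shapes and parameters are invented for
illustration.

# One KNN over the concatenated one-dimensional features, and one small
# feedforward network per multi-dimensional feature (e.g. a histogram).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_recordings, n_1d_features = 100, 20
labels = rng.integers(0, 3, n_recordings)  # 3 candidate categories

one_d = rng.random((n_recordings, n_1d_features))
knn = KNeighborsClassifier(n_neighbors=5).fit(one_d, labels)

pitch_histogram = rng.random((n_recordings, 12))
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
net.fit(pitch_histogram, labels)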

15
Simplified classifier ensemble
16
A classifier ensemble
  • Consists of one KNN classifier and multiple
    neural nets
  • An ensemble with n candidate categories
    classifies a recording into 0 to n categories
  • Input:
  • All available feature values
  • Output:
  • A score for each candidate category, based on a
    weighted average of the KNN and neural net output
    scores (see the sketch below)
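One way to picture the score combination is the sketch below. The weights,
threshold, and score values are assumptions for illustration, not values
taken from the paper.

# Each candidate category's score is a weighted average of the KNN score and
# the per-feature neural net scores; every category whose combined score
# clears a threshold is assigned, so a recording can get 0 to n categories.

def combine_scores(knn_scores, net_scores_list, knn_weight=0.5):
    """knn_scores: {category: score}; net_scores_list: list of such dicts."""
    combined = {}
    for category in knn_scores:
        net_avg = sum(s[category] for s in net_scores_list) / len(net_scores_list)
        combined[category] = (knn_weight * knn_scores[category]
                              + (1 - knn_weight) * net_avg)
    return combined

scores = combine_scores(
    {"Bebop": 0.7, "Swing": 0.2},
    [{"Bebop": 0.6, "Swing": 0.5}, {"Bebop": 0.9, "Swing": 0.1}])
assigned = [c for c, s in scores.items() if s >= 0.5]  # 0 to n categories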

17
Simplified classifier ensemble
18
Complete classifier ensemble
19
Feature and classifier selection/weighting
  • Some features are more useful than others
  • Context dependent
  • e.g. the best features for distinguishing
    Baroque from Romantic differ from those for
    distinguishing Punk from Heavy Metal
  • Hierarchical and round-robin classifiers are
    only trained on recordings belonging to their
    candidate categories
  • Feature selection allows this specialization to
    improve performance
  • Used genetic algorithms to perform
  • Feature selection (fast), followed by
  • Feature weighting of the survivors (see the
    sketch below)
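A genetic algorithm over feature subsets can be sketched as below. The
fitness function is a placeholder and the GA parameters are assumptions; the
real criterion would be classification accuracy with the selected features.

# Candidate feature subsets are bit vectors; the population evolves by
# crossover and mutation, keeping the fittest subsets each generation.
import random
random.seed(0)

N_FEATURES, POP_SIZE, GENERATIONS = 111, 20, 10

def fitness(mask):
    # Placeholder: the real system would use cross-validated classification
    # accuracy achieved using only the features where mask[i] == 1.
    return random.random()

def crossover(a, b):
    cut = random.randrange(N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.01):
    return [bit ^ (random.random() < rate) for bit in mask]

population = [[random.randint(0, 1) for _ in range(N_FEATURES)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best_subset = max(population, key=fitness)
# A second pass could then evolve real-valued weights for the surviving
# features, as described above.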

20
Complete classifier ensemble
21
Complete classifier
22
Exploration of taxonomy space
  • Three kinds of classification performed (see the
    sketch below)
  • Parent (hierarchical)
  • 1 ensemble for each category with children
  • Only promising branch(es) of the taxonomy are
    explored
  • The field is initially narrowed using relatively
    easy broad classifications before proceeding to
    more difficult specialized classifications
  • Flat
  • 1 ensemble classifying amongst all leaf
    categories
  • Round-robin
  • 1 ensemble for each pair of leaf categories
  • Final results arrived at by averaging the three
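The final averaging step might look like the sketch below; the score values
and category names are invented for illustration.

# Each strategy (parent/hierarchical, flat, round-robin) yields a score per
# leaf category; the final score per category is their average.

def average_strategies(parent, flat, round_robin):
    categories = set(parent) | set(flat) | set(round_robin)
    return {c: (parent.get(c, 0.0) + flat.get(c, 0.0)
                + round_robin.get(c, 0.0)) / 3
            for c in categories}

final = average_strategies(
    parent={"Bebop": 0.8, "Swing": 0.3},
    flat={"Bebop": 0.6, "Swing": 0.4},
    round_robin={"Bebop": 0.9, "Swing": 0.2})
best = max(final, key=final.get)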

23
Complete classifier
24
Overall average success rates across all folds
  • 9-Category Taxonomy
  • Leaf: 86%
  • Root: 96%
  • 38-Category Taxonomy
  • Leaf: 57%
  • Root: 75%

25
Importance of number of candidate features
  • Examined the effect on success rates of providing
    only subsets of the available features to the
    feature selection system

26
Conclusions
  • Success rates better than previous research with
    symbolic recordings and on the upper end of
    research involving audio recordings
  • True comparisons impossible to make without
    standardized testing
  • Effectiveness of high-level features clearly
    demonstrated
  • Large feature library combined with feature
    selection improves results
  • Not yet at a point where the system can
    effectively deal with large, realistic
    taxonomies, but it is approaching that point
