Knowledge Discovery Based Music Information Retrieval for Instrument Recognition - PowerPoint PPT Presentation

Author: yiwei cai. Created: 1/19/2006.
1
www.kdd.uncc.edu
Music Information Retrieval based on multi-label
cascade classification system
CCI, UNC-Charlotte
http://www.mir.uncc.edu
Research sponsored by NSF IIS-0414815, IIS-0968647
presented by Zbigniew W. Ras
2
Collaborators:
  • Alicja Wieczorkowska (Polish-Japanese Institute of IT, Warsaw, Poland)
  • Krzysztof Marasek (Polish-Japanese Institute of IT, Warsaw, Poland)
My former PhD students:
  • Elzbieta Kubera (Maria Curie-Sklodowska University, Lublin, Poland)
  • Rory Lewis (University of Colorado at Colorado Springs, USA)
  • Wenxin Jiang (Fred Hutchinson Cancer Research Center in Seattle, USA)
  • Xin Zhang (University of North Carolina, Pembroke, USA)
My current PhD student:
  • Amanda Cohen-Mostafavi (University of North Carolina, Charlotte, USA)
3
MIRAI - Musical Database (mostly MUMS): music pieces played by 57 different music instruments.
Goal: design and implement a system for automatic indexing of music by instruments (objective task) and emotions (subjective task).

Outcome: a musical database represented as an FS-tree, guaranteeing efficient storage and retrieval of music pieces indexed by instruments and emotions.
4
MIRAI - Musical Database: music pieces played by 57 different music instruments (see below) and described by over 910 attributes.
Alto Flute, Bach-trumpet, bass-clarinet, bassoon,
bass-trombone, Bb trumpet, b-flat clarinet,
cello, cello-bowed, cello-martele, cello-muted,
cello-pizzicato, contrabassclarinet,
contrabassoon, crotales, c-trumpet,
ctrumpet-harmonStemOut, doublebass-bowed,
doublebass-martele, doublebass-muted,
doublebass-pizzicato, eflatclarinet,
electric-bass, electric-guitar, englishhorn,
flute, frenchhorn, frenchHorn-muted,
glockenspiel, marimba-crescendo,
marimba-singlestroke, oboe, piano-9ft,
piano-hamburg, piccolo, piccolo-flutter,
saxophone-soprano, saxophone-tenor, steeldrums,
symphonic, tenor-trombone, tenor-trombone-muted,
tuba, tubular-bells, vibraphone-bowed,
vibraphone-hardmallet, viola-bowed,
viola-martele, viola-muted, viola-natural,
viola-pizzicato, violin-artificial,
violin-bowed, violin-ensemble, violin-muted,
violin-natural-harmonics, xylophone.
5
Automatic Indexing of Music
What is needed? A database of monophonic and polyphonic music signals and their descriptions in terms of new features (including temporal ones) in addition to the standard MPEG-7 features. These signals are labeled by instruments and emotions, forming additional features called decision features.
Why is it needed? To build classifiers for automatic indexing of musical sound by instruments and emotions.
6
MIRAI - Cooperative Music Information Retrieval
System based on Automatic Indexing
[Diagram: the user's query passes through a Query Adapter to the Indexed Audio Database; retrieved music objects (instruments, durations) are returned to the user, and an empty answer is routed back to the Query Adapter for query refinement.]
7
Raw data - signal representation
  • Binary file, PCM
  • Sampling rate: 44.1 kHz
  • 16 bits per sample
  • 2,646,000 values/min. per channel

PCM (Pulse Code Modulation) - the most straightforward mechanism to store audio. Analog audio is sampled; individual samples are stored sequentially in binary format.
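A minimal stdlib sketch of the numbers above: one minute of 44.1 kHz audio is 2,646,000 sample values per channel, and 16-bit PCM samples are just signed integers packed sequentially (the example byte values are mine, not from the presentation).

```python
import struct

SAMPLE_RATE = 44_100   # samples per second per channel
SECONDS = 60

# One minute of one channel at 44.1 kHz: 2,646,000 values
values_per_minute = SAMPLE_RATE * SECONDS
print(values_per_minute)  # 2646000

# Raw 16-bit little-endian PCM: each sample is a signed short
raw = struct.pack("<4h", 0, 1000, -1000, 32767)   # 4 example samples
samples = struct.unpack("<4h", raw)
print(samples)            # (0, 1000, -1000, 32767)
```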
8
Challenges to applying KDD in MIR
The nature and types of raw data:

Data source      | Organization | Volume     | Type                  | Quality
Traditional data | Structured   | Modest     | Discrete, categorical | Clean
Audio data       | Unstructured | Very large | Continuous, numeric   | Noisy
9
Feature extraction
Amplitude values at each sample point (the lower-level raw data form) → feature extraction → higher-level representations → feature database, which is manageable by traditional pattern recognition: classification, clustering, regression.
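As a toy illustration of this pipeline, the sketch below turns raw amplitude values into one higher-level feature (the spectral centroid) with NumPy. The function name and the 440 Hz test tone are illustrative assumptions, not the presentation's code.

```python
import numpy as np

def spectral_centroid(frame, sample_rate=44_100):
    """Power-weighted mean frequency of one analysis frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

# A pure 440 Hz tone should yield a centroid near 440 Hz.
t = np.arange(4096) / 44_100
frame = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(frame))
```

Real systems extract many such descriptors per frame and store them as one row of the feature database.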
10
MPEG-7 features
[Diagram: extraction pipeline. The signal envelope yields Log Attack Time and Temporal Centroid. A Hamming window and STFT (NFFT FFT points) produce the power spectrum, from which the Spectral Centroid is computed. Harmonic peak detection on the STFT, together with the fundamental frequency, yields the instantaneous Harmonic Spectral Centroid, Spread, Deviation, and Variation.]

11
Derived Database
MPEG-7 features:
  • Spectrum Centroid
  • Spectrum Spread
  • Spectrum Flatness
  • Spectrum Basis Functions
  • Spectrum Projection Functions
  • Log Attack Time
  • Harmonic Peaks
  • …
Non-MPEG-7 features, including new temporal features:
  • Roll-Off
  • Flux
  • Mel frequency cepstral coefficients (MFCC)
  • Tristimulus and similar parameters (contents of odd and even partials: Od, Ev)
  • Mean frequency deviation for low partials
  • Changing ratios of spectral spread
  • Changing ratios of spectral centroid
12
New Temporal Features: ΔS(i), ΔC(i), Δ²S(i), Δ²C(i)

ΔS(i) = [S(i+1) - S(i)] / S(i)
ΔC(i) = [C(i+1) - C(i)] / C(i)

where S(i+1), S(i) and C(i+1), C(i) are the spectral spread and spectral centroid of two consecutive frames, frame i+1 and frame i. The changing ratios of spectral spread and spectral centroid for two consecutive frames are considered as the first derivatives of the spread and the spectral centroid. Following the same method we calculate the second derivatives:

Δ²S(i) = [ΔS(i+1) - ΔS(i)] / ΔS(i)
Δ²C(i) = [ΔC(i+1) - ΔC(i)] / ΔC(i)

Remark: the sequence S(i), S(i+1), S(i+2), …, S(i+k) can be approximated by a polynomial p(x) = a0 + a1·x + a2·x² + a3·x³ + …, giving new features a0, a1, a2, a3, …
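The definitions above can be sketched directly in NumPy; the spread values below are made-up numbers (a geometric sequence, so every changing ratio is exactly 0.1):

```python
import numpy as np

def changing_ratios(x):
    """dx(i) = (x(i+1) - x(i)) / x(i), as defined for the temporal features."""
    x = np.asarray(x, dtype=float)
    return (x[1:] - x[:-1]) / x[:-1]

# Spectral spread of consecutive frames (toy values)
S = np.array([2.0, 2.2, 2.42, 2.662, 2.9282])
dS = changing_ratios(S)        # first derivative
d2S = changing_ratios(dS)      # second derivative: same rule applied to dS

# Polynomial approximation p(x) = a0 + a1*x + a2*x^2 + a3*x^3
coeffs = np.polyfit(np.arange(len(S)), S, 3)   # returns a3, a2, a1, a0
print(dS.round(3))   # [0.1 0.1 0.1 0.1]
```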
13
Experiment with WEKA, 19 instruments: flute, piano, violin, saxophone, vibraphone, trumpet, marimba, french-horn, viola, bassoon, clarinet, cello, trombone, accordion, guitar, tuba, english-horn, oboe, double-bass. J48 with a 0.25 confidence factor for pruning the tree, minimum number of instances per leaf = 10. KNN: number of neighbors = 3; Euclidean distance is used as the similarity function.

Classification confidence with temporal features:

Experiment | Features                    | Classifier    | Confidence
1          | S, C                        | Decision Tree | 80.47
2          | S, C, ΔS, ΔC                | Decision Tree | 83.68
3          | S, C, ΔS, ΔC, Δ²S, Δ²C      | Decision Tree | 84.76
4          | S, C                        | KNN           | 80.31
5          | S, C, ΔS, ΔC                | KNN           | 84.07
6          | S, C, ΔS, ΔC, Δ²S, Δ²C      | KNN           | 85.51
14
Confusion matrices: the left one is from Experiment 1, the right one from Experiment 3. Correctly classified instances are highlighted in green and incorrectly classified instances in yellow.
15
Precision of the decision tree for each instrument
Recall of the decision tree for each instrument
F-score of the decision tree for each instrument
16
  • Polyphonic sounds: how to handle them?
  • Single-label classification based on sound separation
  • Multi-labeled classifiers

Problems? Information loss during the signal subtraction.

[Sound separation flowchart: polyphonic sound → segmentation → get frame → feature extraction → classifier → get instrument → sound separation, looping back for the next instrument.]
17
Timbre estimation in polyphonic sounds and
designing multi-labeled classifiers
  • timbre relevant descriptors
  • Spectrum Centroid, Spread
  • Spectrum Flatness Band Coefficients
  • Harmonic Peaks
  • Mel frequency cepstral coefficients (MFCC)
  • Tristimulus

18
Sub-pattern of single instrument in mixture
Feature extraction
Mel-Frequency Cepstral Coefficients
19
Timbre estimation based on a multi-label classifier: get frame → feature extraction (timbre descriptors) → classifier → per-frame candidate list.

Instrument   | Confidence
Candidate 1  | 70
Candidate 2  | 50
…            | …
Candidate N  | 10

(The classifier produces one such candidate list for each analyzed frame.)
20
Timbre estimation results based on different methods. Instruments: 45. Training data (TD): 2917 single-instrument sounds from MUMS. Testing on 308 mixed sounds randomly chosen from TD. Window size 1 s, frame size 120 ms, hop size 40 ms (25 frames); Mel-frequency cepstral coefficients (MFCC) extracted from each frame.

Experiment | Pitch-based | Sound separation | N(labels) max | Recall | Precision | F-score
1          | Yes         | Yes/No           | 1             | 54.55  | 39.2      | 45.60
2          | Yes         | Yes              | 2             | 61.20  | 38.1      | 46.96
3          | Yes         | No               | 2             | 64.28  | 44.8      | 52.81
4          | Yes         | No               | 4             | 67.69  | 37.9      | 48.60
5          | Yes         | No               | 8             | 68.3   | 36.9      | 47.91

A threshold of 0.4 controls the total number of estimations for each index window.
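The thresholding step described above might be sketched as follows; the function, parameter names, and candidate confidences are hypothetical illustrations, not the system's actual code:

```python
def estimate_labels(confidences, threshold=0.4, max_labels=2):
    """Keep instrument candidates whose confidence reaches the threshold,
    capped at max_labels per indexing window (illustrative selection rule)."""
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    return [inst for inst, conf in ranked if conf >= threshold][:max_labels]

frame_confidences = {"flute": 0.70, "violin": 0.50, "oboe": 0.10}
print(estimate_labels(frame_confidences))   # ['flute', 'violin']
```

Raising `max_labels` (the N(labels) column above) admits more candidates per window, which tends to raise recall at the cost of precision.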
21
Polyphonic sounds (window): get frame → feature extraction → classifiers → multiple labels.

Compressed representations of the signal (Harmonic Peaks, Mel Frequency Cepstral Coefficients (MFCC), Spectral Flatness, …) remove irrelevant information (inharmonic frequencies or partials). Violin and viola have similar MFCC patterns; the same holds for double-bass and guitar. It is difficult to distinguish them in polyphonic sounds, so more information from the raw signal is needed.
22
Short-term power spectrum: a low-level representation of the signal (calculated by STFT). Spectrum slice: 0.12 seconds long. The power spectrum patterns of flute and trombone can be seen in the mixture.
23
Experiment: Middle C instrument sounds (pitch equal to C4 in MIDI notation, frequency 261.6 Hz).
Training set: power spectra from 3323 frames, extracted by STFT from 26 single-instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet.
Testing set: fifty-two audio files, each mixed (using Sound Forge) from two of these 26 single-instrument sounds.
Classifiers: (1) KNN with Euclidean distance (spectrum-match based classification); (2) Decision Tree (multi-label classification based on previously extracted features).
24
Timbre pattern match based on power spectrum:

Experiment | Description                                                   | Recall | Precision | F-score
1          | Feature-based Decision Tree (n=2)                             | 64.28  | 44.8      | 52.81
2          | Spectrum-match KNN (k=1, n=2)                                 | 79.41  | 50.8      | 61.96
3          | Spectrum-match KNN (k=5, n=2)                                 | 82.43  | 45.8      | 58.88
4          | Spectrum-match KNN (k=5, n=2), without percussion instruments | 87.1   |           |

n = number of labels assigned to each frame; k = parameter for KNN.
25
Schema I - Hornbostel-Sachs
[Tree: Aerophone, Chordophone, Membranophone, Idiophone at the top level; aerophone branches shown include Free, Single Reed, Side, and Lip Vibration; leaf examples: Flute, C Trumpet, Tuba, Bassoon, Alto Flute, French Horn, Oboe.]
26
Schema II - Play Methods
[Tree: play-method branches Muted, Pizzicato, Bowed, Picked, Shaken, Blown; leaf examples: Piccolo, Flute, Bassoon, Alto Flute.]
27
Decision Table

Obj | Classification attributes (CA1 … CAn) | Hornbostel-Sachs (decision) | Play Method (decision)
1   | 0.22 … 0.28                           | Aerophone, Side, Alto Flute | Blown, Alto Flute
2   | 0.31 … 0.77                           | Idiophone, Concussion, Bell | Concussive, Bell
3   | 0.05 … 0.21                           | Chordophone, Composite, Cello | Bowed, Cello
4   | 0.12 … 0.11                           | Chordophone, Composite, Violin | Martele, Violin

(Slide credit: Xin Cynthia Zhang)
28
Example
[Two-level decision hierarchy: Level I splits into decisions d1, d2, d3 via classifiers C1, C2; Level II refines d3 into d3,1 and d3,2 via classifiers C2,1, C2,2.]

X  | a  | b  | c    | d
x1 | a1 | b2 | c1   | d3
x2 | a1 | b1 | c1   | d3,1
x3 | a1 | b2 | c2,2 | d1
x4 | a2 | b2 | c2   | d1

a, b, c are classification attributes; d is the decision attribute.
29
Instrument granularity: classifiers are trained at each level of the hierarchical tree (Hornbostel-Sachs). We do not include membranophones, because instruments in this family usually do not produce harmonic sounds and thus require special techniques to be identified.
30
Modules of the cascade classifier for single instrument estimation (Hornbostel-Sachs, pitch 3B). Reported module accuracies: 96.02, 91.80, 98.94, 95.00.
31
  • New Experiment
  • Middle C instrument sounds (pitch equal to C4 in MIDI notation, frequency 261.6 Hz)
  • Training set: 2762 frames extracted from the following instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet
  • Classifiers (WEKA): (1) KNN with Euclidean distance (spectrum-match based classification); (2) Decision Tree (classification based on previously extracted features)
  • Confidence: ratio of correctly classified instances over the total number of instances
32
Classification on different feature groups:

Group | Feature description                                         | KNN   | Decision Tree
A     | 33 Spectrum Flatness band coefficients                      | 99.23 | 94.69
B     | 13 MFCC coefficients                                        | 98.19 | 93.57
C     | 28 Harmonic Peaks                                           | 86.60 | 91.29
D     | 38 Spectrum projection coefficients                         | 47.45 | 31.81
E     | Log spectral centroid, spread, flux, rolloff, zero-crossing | 99.34 | 99.77
33
Feature and classifier selection at each level of the cascade system (root: Band Coefficients, KNN):

Node        | Feature           | Classifier
chordophone | Band Coefficients | KNN
aerophone   | MFCC coefficients | KNN
idiophone   | Band Coefficients | KNN

Node             | Feature           | Classifier
chrd_composite   | Band Coefficients | KNN
aero_double-reed | MFCC coefficients | KNN
aero_lip-vibrated| MFCC coefficients | KNN
aero_side        | MFCC coefficients | KNN
aero_single-reed | Band Coefficients | Decision Tree
idio_struck      | Band Coefficients | KNN
34
Classification on the combination of different
feature groups
Classification based on KNN
Classification based on Decision Tree
35
  • From those two experiments, we see that:
  • The KNN classifier works better with feature vectors such as spectral flatness coefficients, projection coefficients, and MFCC.
  • The decision tree works better with harmonic peaks and statistical features.
  • Simply adding more features together does not improve the classifiers and sometimes even worsens classification results (such as adding harmonic peaks to other feature groups).
36
HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS

Seven common methods to calculate the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), and Ward's method.

Six most common distance functions: Euclidean, Manhattan, Canberra (examines the sum of a series of fractional differences between coordinates of a pair of objects), Pearson correlation coefficient (PCC, which measures the degree of association between objects), Spearman's rank correlation coefficient, and Kendall's tau (which counts the number of pairwise disagreements between two lists).

Clustering algorithm: HCLUST (agglomerative hierarchical clustering), R package.
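The same kind of agglomerative clustering can be sketched with SciPy's `linkage`/`fcluster` as a stand-in for R's hclust. The toy data below (two well-separated Gaussian blobs standing in for frame-level MFCC vectors of two instruments) is an assumption for illustration; Ward linkage with Euclidean distance matches the best-scoring configuration reported later.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy stand-ins for 13-dimensional MFCC frames of two instruments
frames = np.vstack([
    rng.normal(0.0, 0.1, (20, 13)),   # "instrument A" frames
    rng.normal(1.0, 0.1, (20, 13)),   # "instrument B" frames
])

# Agglomerative clustering: Ward linkage, Euclidean distance
Z = linkage(frames, method="ward", metric="euclidean")
cluster_ids = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(len(set(cluster_ids)))   # 2
```

Other linkage methods (`single`, `complete`, `average`, `centroid`, `mcquitty`-style weighted average) and distance metrics can be swapped in the same way, which is how the 3 × 7 × 6 experiment grid below arises.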
37
Testing datasets (MFCC, flatness coefficients, harmonic peaks): the middle C pitch group, which contains 46 different musical sound objects. Each sound object is segmented into multiple 0.12 s frames, and each frame is stored as an instance in the testing dataset; there are 2884 frames in total. This dataset is represented by 3 different sets of features (MFCC, flatness coefficients, and harmonic peaks). Total number of experiments: 3 × 7 × 6 = 126. Clustering: when the algorithm finishes the clustering process, a particular cluster ID is assigned to each single frame.
38
Contingency table derived from the clustering result: rows are instruments (Instrument 1 … Instrument n), columns are clusters (Cluster 1 … Cluster n), and cell Xij counts the frames of instrument i assigned to cluster j.
39

Evaluation results of the Hclust algorithm (the 14 results with the highest score among 126 experiments):

Feature                | Method   | Metric    | a    | w  | Score
Flatness coefficients  | ward     | pearson   | 87.3 | 37 | 32.30
Flatness coefficients  | ward     | euclidean | 85.8 | 37 | 31.74
Flatness coefficients  | ward     | manhattan | 85.6 | 36 | 30.83
mfcc                   | ward     | kendall   | 81.0 | 36 | 29.18
mfcc                   | ward     | pearson   | 83.0 | 35 | 29.05
Flatness coefficients  | ward     | kendall   | 82.9 | 35 | 29.03
mfcc                   | ward     | euclidean | 80.5 | 35 | 28.17
mfcc                   | ward     | manhattan | 80.1 | 35 | 28.04
mfcc                   | ward     | spearman  | 81.3 | 34 | 27.63
Flatness coefficients  | ward     | spearman  | 83.7 | 33 | 27.62
Flatness coefficients  | ward     | maximum   | 86.1 | 32 | 27.56
mfcc                   | ward     | maximum   | 79.8 | 34 | 27.12
Flatness coefficients  | mcquitty | euclidean | 88.9 | 30 | 26.67
mfcc                   | average  | manhattan | 87.3 | 30 | 26.20

w = number of clusters; a = average clustering accuracy (%) over all instruments; score = a × w (with a taken as a fraction).
40
Clustering result from the Hclust algorithm with the Ward linkage method and the Pearson distance measure; flatness coefficients are used as the selected feature.

ctrumpet and bachtrumpet are clustered in the same group. ctrumpet_harmonStemOut is clustered in a group of its own instead of merging with ctrumpet. Bassoon is considered a sibling of the regular French horn. The muted French horn is clustered in a different group, together with English horn and oboe.
41
Looking for the optimal (classification method × data representation) pair in monophonic music. Middle C pitch group: 46 different musical sound objects.

Experiment | Classification method | Description      | Recall | Precision | F-score
1          | non-cascade           | Feature-based    | 64.3   | 44.8      | 52.81
2          | non-cascade           | Spectrum-match   | 79.4   | 50.8      | 61.96
3          | cascade               | Hornbostel-Sachs | 75.0   | 43.5      | 55.06
4          | cascade               | play method      | 77.8   | 53.6      | 63.47
5          | cascade               | machine learned  | 87.5   | 62.3      | 72.78
42
Looking for the optimal (classification method × data representation) pair in polyphonic music. Middle C pitch group: 46 different musical sound objects. Testing data: 49 polyphonic sounds created by selecting three different single-instrument sounds from the training database and mixing them together. This set of sounds is used to test again our five different arrangements of classification method and data representation. KNN (k=3) is used as the classifier in each experiment.
43
Looking for the optimal (classification method × data representation) pair in polyphonic music (same setup as above).
Exp | Classifier                | Method                                       | Recall | Precision | F-score
1   | non-cascade               | single-label, based on sound separation      | 31.48  | 43.06     | 36.37
2   | non-cascade               | feature-based multi-label classification     | 69.44  | 58.64     | 63.59
3   | non-cascade               | spectrum-match multi-label classification    | 85.51  | 55.04     | 66.97
4   | cascade (Hornbostel-Sachs)| multi-label classification                   | 64.49  | 63.10     | 63.79
5   | cascade (play method)     | multi-label classification                   | 66.67  | 55.25     | 60.43
6   | cascade (machine learned) | multi-label classification                   | 63.77  | 69.67     | 66.59
44
WWW.MIR.UNCC.EDU
  • Automatic indexing system for musical instruments
  • Intelligent query answering system for musical instruments

45
Questions?
46
The user enters a query.
The user is not satisfied and enters a new query: this is where the Action Rules System comes in.
47
Action Rule
An action rule, discovered from an information system, is defined as a term

[(ω) ∧ (α → β)] ⇒ (φ → ψ)

where ω is a conjunction of fixed condition features shared by both groups, (α → β) describes the proposed changes in values of flexible features, and (φ → ψ) is the desired effect of the action.
48
Action Rules Discovery
Meta-actions based decision system S(d) = (X, A ∪ {d}, V), with A = {A1, A2, …, Am}.

Influence matrix:

   | A1  | A2  | A3  | A4  | … | Am
M1 | E11 | E12 | E13 | E14 | … | E1m
M2 | E21 | E22 | E23 | E24 | … | E2m
M3 | E31 | E32 | E33 | E34 | … | E3m
M4 | E41 | E42 | E43 | E44 | … | E4m
…
Mn | En1 | En2 | En3 | En4 | … | Enm

If E32 = (a2 → a2'), then E31 = (a1 → a1') and E34 = (a4 → a4').

Candidate action rule:
r = [(A1, a1 → a1') ∧ (A2, a2 → a2') ∧ (A4, a4 → a4')] ⇒ (d, d1 → d1')

Rule r is supported (covered) by M3.
49

"Action Rules Discovery without pre-existing
classification rules", Z.W. Ras, A. Dardzinska,
Proceedings of RSCTC 2008 Conference, in Akron,
Ohio, LNAI 5306, Springer, 2008, 181-190
http//www.cs.uncc.edu/ras/Papers/Ras-Aga-AKRON.
pdf
ROOT
50
Since the window diminishes the signal at both edges, it leads to information loss due to the narrowing of the frequency spectrum. To preserve this information, consecutive analysis frames overlap in time. Empirical experiments show the best overlap is two-thirds of the window size.

[Diagram: overlapping analysis windows along the time axis.]
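The framing scheme above can be sketched as follows; the function name and frame length are illustrative assumptions, but the two-thirds overlap and Hamming window match the slides:

```python
import numpy as np

def frame_signal(signal, frame_len=1024, overlap=2/3):
    """Split a signal into Hamming-windowed frames with the given overlap."""
    hop = int(frame_len * (1 - overlap))   # 2/3 overlap -> hop = frame_len / 3
    window = np.hamming(frame_len)         # tapers both edges of each frame
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([signal[s:s + frame_len] * window for s in starts])

frames = frame_signal(np.ones(4096), frame_len=1024)
print(frames.shape)   # (10, 1024)
```

Because each sample falls inside roughly three overlapping frames, the signal content attenuated at one frame's edge sits near the center of a neighboring frame.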
51
Windowing: the Hamming window is used to reduce spectral leakage.