Title: Locating Cover Songs and Alternate Performances in Databases of Raw Audio
1Locating Cover Songs and Alternate Performances
in Databases of Raw Audio
- Robert Turetsky
- rjt72_at_columbia.edu
- Advent Workshop
- May 17, 2002
2Technology enables liquid music
Production
Consumption
Distribution
3Content-Based Analysis Motivation
- Search on file-sharing systems (e.g. KaZaA) involves meta-data
- Meta-data prone to errors, omission, distortion
- Only works if user already knows what to look for
- Musical Content Analysis means
- Query by humming
- Query by segment/prototype
- Recommendation engines and artist discovery
- Machine feedback/collaboration in composition
- Locating cover songs is a first step
4Locating Cover Songs Prior Work
- Query By Humming
- Mature field (kiosks, applets) but limited to monophonic music or manually transcribed polyphonic music
- Jonathan Foote (FX Palo Alto)
- ARTHUR (2000): aligns RMS energy. Works only on orchestral music; pop music has less dynamic range.
- Content-Based Retrieval of Music and Audio (1997): measures acoustic similarity, not equivalence.
- Cheng Yang (Stanford)
- Music Database Retrieval Based on Spectral Similarity (2001): aligns MFCC at points of high energy using DTW.
- MACS (2001): aligns estimates of pitch likelihood. Indexing. Bad alignments discarded after linearity filter.
5Why is locating cover songs so difficult?
- Alternate performances can vary
- Studio vs. Live
- Tempo (non-linear time shifting)
- Pitch transposition
- Production technique, acoustic character
- Additions (e.g. audience interaction)
- Alternate lyrics (e.g. "Don't Cry" versions I and II)
- Cover versions, artist re-interpretations
- Vocalist, instrumentation, ornamentation
- Entire character changes (e.g. "Layla", dance remixes)
- Yet we still know these songs are the same!
6System Overview
Preprocessing: Locate Section Breaks, Identify Summary Sections, Pitch Extraction
Query: Tonic Estimation, Alignment
7Phase 1 Locate Section Breaks
- Employ Foote's Similarity Matrix
- Theory: Windows of the same section will have similar features. Windows of different sections will have different features.
- Similarity Matrix: Cosine distance between every pair of fixed-width windows of the song
- Novelty Score: measure of newness, computed as the correlation with a checkerboard matrix.
- Section breaks are peaks in the Novelty Score.
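A minimal sketch of the similarity-matrix and novelty-score steps listed on this slide, assuming NumPy and a precomputed feature matrix with one row per fixed-width window. The function names, the kernel half-width, and the peak threshold are illustrative assumptions, not details from the talk. Cosine similarity is used here, so high values mean similar windows.

```python
import numpy as np

def similarity_matrix(features):
    """Cosine similarity between every pair of windows (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero windows
    return unit @ unit.T

def checkerboard_kernel(half_width):
    """+1 in the past/past and future/future quadrants, -1 in the cross quadrants."""
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(sign, sign)

def novelty_score(sim, half_width):
    """Correlate the checkerboard kernel along the main diagonal of `sim`."""
    kernel = checkerboard_kernel(half_width)
    padded = np.pad(sim, half_width, mode="constant")  # zeros beyond the song edges
    size = 2 * half_width
    score = np.zeros(sim.shape[0])
    for i in range(sim.shape[0]):
        # Block covers windows i-half_width .. i+half_width-1 of the song.
        block = padded[i:i + size, i:i + size]
        score[i] = np.sum(block * kernel)
    return score

def section_breaks(score, threshold):
    """Indices of local maxima of the novelty score at or above `threshold` (assumed rule)."""
    return [i for i in range(1, len(score) - 1)
            if score[i] >= threshold
            and score[i] > score[i - 1]
            and score[i] >= score[i + 1]]
```

With this layout, a peak at index i suggests that window i is the first window of a new section. A larger half-width should favor coarse section boundaries over short-lived changes.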
8Phase 2 Summary Segments
- Motivation: Only transcribe and align salient segments
- Measure of salience: Repetition
- Method: Search for largest off-diagonal line in Similarity Matrix for each segment to measure extent of repetition (score)
- Summary segment is most repeated section. Prune rows/columns of similar sections in score matrix. Repeat until 45-75 sec of audio is kept
[Figure: Sec 1 through Sec 4 on both axes; callouts for Section 1 and Section 4]
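The selection loop in the bullets of this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the per-pair repetition scores are taken as a precomputed input (the slide derives them from off-diagonal lines in the similarity matrix), and the function name, the `similar_thresh` parameter, and the rule of summing a row to rank sections are all hypothetical.

```python
import numpy as np

def pick_summary_sections(scores, durations, similar_thresh,
                          min_total=45.0, max_total=75.0):
    """Greedily keep the most repeated sections until enough audio is kept.

    scores[i, j]: assumed repetition score between sections i and j.
    durations[i]: length of section i in seconds.
    """
    scores = np.array(scores, dtype=float)
    np.fill_diagonal(scores, 0.0)            # a section does not repeat itself
    alive = np.ones(len(durations), dtype=bool)
    chosen, total = [], 0.0
    while alive.any() and total < min_total:
        # Rank each remaining section by its scores against other remaining sections.
        rep = np.where(alive, (scores * alive[None, :]).sum(axis=1), -np.inf)
        best = int(np.argmax(rep))
        # Prune the winner and every section judged similar to it.
        alive &= ~(scores[best] >= similar_thresh)
        alive[best] = False
        if total + durations[best] > max_total:
            continue                          # too long to fit; try the next one
        chosen.append(best)
        total += durations[best]
    return chosen, total
```

For a verse/chorus/verse/chorus song, this sketch would be expected to keep one chorus and one verse, since each pick removes its near-duplicates from further consideration.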
9Phase 3 Pitch Extraction
- Multi-pitch extraction algorithm based on Klapuri et al., 2001.
- Works well, except in presence of drums.
[Block diagram: Noise Suppression, Predominant Pitch Estimation, Estimate Pitched Sound Characteristics, Estimate Voices and Iterate, Remove Found Sound from Mixture]
[Plot: axis labeled Time]
10Phase 3 MPE Details
Noise Reduction: RASTA-style filter
Predominant pitch estimation: Fuzzy search for harmonic peaks
Spectral Smoothing to estimate sound parameters
Resynthesis
Repeat on mixture after removal
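The estimate-then-remove loop on this slide can be illustrated with a heavily simplified stand-in. It does not reproduce the algorithm of Klapuri et al.: it skips noise reduction, spectral smoothing, and resynthesis, scores each candidate fundamental by a weighted sum of nearby harmonic peaks (a crude version of the "fuzzy search"), and removes a found sound by zeroing its harmonic bins. All names and parameter defaults are assumptions for illustration.

```python
import numpy as np

def harmonic_salience(mag, f0_bins, n_harm, tol):
    """Weighted sum of the strongest bin within +/- tol of each harmonic."""
    sal = np.zeros(len(f0_bins))
    for i, b in enumerate(f0_bins):
        for h in range(1, n_harm + 1):
            lo = h * b - tol
            if lo >= len(mag):
                break
            sal[i] += mag[lo:h * b + tol + 1].max() / h  # 1/h favors low harmonics
    return sal

def iterative_pitch_estimate(frame, sr, n_voices,
                             fmin=80, fmax=500, n_harm=5, tol=1):
    """Return up to n_voices fundamental frequencies (Hz) found in one frame."""
    mag = np.abs(np.fft.rfft(frame))
    hz_per_bin = sr / len(frame)
    f0_bins = np.arange(int(round(fmin / hz_per_bin)),
                        int(round(fmax / hz_per_bin)) + 1)
    found = []
    for _ in range(n_voices):
        sal = harmonic_salience(mag, f0_bins, n_harm, tol)
        b = int(f0_bins[np.argmax(sal)])      # predominant pitch in the mixture
        found.append(b * hz_per_bin)
        for h in range(1, n_harm + 1):        # remove the found sound, then repeat
            mag[max(h * b - tol, 0):h * b + tol + 1] = 0.0
    return found
```

Zeroing shared harmonics also removes energy that belongs to other voices, which is one reason a real system would estimate and subtract a smoothed spectrum instead.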
11Phase 4-5 Query-time alignment
- Exhaustively align summary segments
- Two alignments needed: Pitch and Time
- Pitch Alignment: Tonic Estimation
- Align two piano rolls at point of maximum cross-correlation between note histograms
- Temporal Alignment: Dynamic Programming (Dynamic Time Warp)
- Currently investigating different weights for rewarding note matches, penalizing mismatches
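A rough sketch of the two alignments on this slide, assuming each summary segment is a binary piano roll stored as a NumPy array of shape (pitches, frames). The function names, the default weights, and the exact local scoring rule (reward per shared note, penalty per differing note) are illustrative assumptions; the slide notes that the weights were still under investigation.

```python
import numpy as np

def note_histogram(roll):
    """Number of active frames per pitch."""
    return roll.sum(axis=1)

def estimate_transposition(roll_a, roll_b):
    """Semitone shift s such that roll_a resembles roll_b moved up by s."""
    xcorr = np.correlate(note_histogram(roll_a), note_histogram(roll_b), mode="full")
    return int(np.argmax(xcorr)) - (roll_b.shape[0] - 1)

def shift_pitch(roll, semitones):
    """Move all notes up (positive) or down (negative), padding with silence."""
    out = np.zeros_like(roll)
    if semitones > 0:
        out[semitones:] = roll[:-semitones]
    elif semitones < 0:
        out[:semitones] = roll[-semitones:]
    else:
        out[:] = roll
    return out

def dtw_score(roll_a, roll_b, w_match=1.0, w_miss=1.0):
    """Best total score over monotonic frame alignments of two piano rolls."""
    a, b = roll_a.astype(float), roll_b.astype(float)
    shared = a.T @ b                               # notes present in both frames
    differing = a.sum(axis=0)[:, None] + b.sum(axis=0)[None, :] - 2.0 * shared
    local = w_match * shared - w_miss * differing  # per-frame-pair score
    ta, tb = local.shape
    total = np.full((ta, tb), -np.inf)
    total[0, 0] = local[0, 0]
    for i in range(ta):
        for j in range(tb):
            if i == 0 and j == 0:
                continue
            best_prev = max(
                total[i - 1, j] if i > 0 else -np.inf,
                total[i, j - 1] if j > 0 else -np.inf,
                total[i - 1, j - 1] if i > 0 and j > 0 else -np.inf,
            )
            total[i, j] = best_prev + local[i, j]
    return float(total[-1, -1])
```

In use, one would first call `estimate_transposition`, undo the shift with `shift_pitch`, and then compare `dtw_score` values across candidate songs; raising `w_miss` should make the score less tolerant of added ornamentation.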
12Locating Cover Songs Future Work
- Indexing scheme, other alignment techniques to improve speed of query
- Thematic extraction to find only melody or harmony lines
- Include Beat Tracking as part of score
- Investigate harmonic analysis (identifying chord structure) for better features
- Speech recognition on lyrics???