Content-based retrieval of audio

Slides: 12
Provided by: thomas319
Transcript and Presenter's Notes

1
Content-based retrieval of audio
  • Francois Thibault
  • MUMT 614B
  • McGill University

2
Overview
  • Need effective ways to browse by content through
    audio databases of growing sizes
  • Using descriptive sound parameters or query by
    example systems
  • Determine similarity to query in order to rank
    search results by relevance (AudioGoogle)
  • Feature selection is the crux of the problem

3
Cheng Yang Approach (1)
  • Audio files preprocessed to identify local peaks
    in signal power (n ≈ 100-200 per minute)
  • Spectrogram computed using STFT of 2048 samples
    with Hamming window of 1024 samples and overlap
    factor of 2
  • Spectral vector extracted around each peak makes
    up (n, 180, k ≪ 2048) feature space (200-2000 Hz
    range only)
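
The peak-picking step on this slide can be sketched as follows. This is a minimal illustration, not Yang's implementation: the function names, the hop of 512 samples (reading "overlap factor of 2" as half the 1024-sample window), and the `radius` neighbourhood are all assumptions. It covers only the short-time power and local-peak steps; the spectrogram itself would need an FFT routine.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_power(signal, win_len=1024, hop=512):
    """Mean power of each Hamming-windowed frame.

    hop=512 assumes 'overlap factor of 2' means half-window overlap.
    """
    w = hamming(win_len)
    powers = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len]
        powers.append(sum((s * c) ** 2 for s, c in zip(frame, w)) / win_len)
    return powers

def local_peaks(values, radius=2):
    """Indices whose value is strictly greater than every neighbour within radius."""
    peaks = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if neighbours and all(values[i] > v for v in neighbours):
            peaks.append(i)
    return peaks
```

Calling `local_peaks(frame_power(samples))` gives frame indices around which spectral vectors would then be extracted.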

4
Yang Approach (2)
  • Given an example query, compute the feature
    vector for the query and look for similar audio
    in database
  • Compute minimum distance between query and
    database feature sets saving time using dynamic
    programming techniques (use results from previous
    pairs)
  • Linearity filtering to favor time-scaled versions
    over matches whose timing disagrees
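
As an illustration of the dynamic-programming idea, here is a generic dynamic time warping sketch. It is a stand-in technique, not necessarily the recurrence Yang uses: each cell of the cost table reuses the results of previously computed pairs, which is the time-saving point made above. The names `dtw_distance` and `rank` are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dtw_distance(query, candidate, dist=euclidean):
    """Minimum cumulative alignment cost between two feature-vector sequences."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], candidate[j - 1])
            # Reuse the three previously computed neighbouring cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def rank(query, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: dtw_distance(query, database[name]))
```

Because the alignment may stretch or compress time, a slower rendition of the same sequence can still score a distance of zero.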

5
Yang's Results
  • Used a database of 120 song excerpts (1 min each)
  • Good performance with varying tempos, audio
    quality, performance variations
  • Poor performance with transposed versions
  • Slow response, improved with indexing schemes

6
Jonathan Foote Approach
  • Calculate feature vectors of audio examples of
    desired classes (12 MFCCs plus energy)
  • Supervised training of a quantization tree
    (partitions feature space into regions with
    maximally different class populations)
  • Parameterized data is quantized using the tree
    for subsequent retrieval (creates template)
  • To retrieve similar audio content, template is
    constructed for query audio, compared with corpus
    templates using cosine distance measure
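
The template comparison can be sketched as below, assuming each template is a histogram counting how many feature vectors fall into each leaf of the tree. The tree itself is not shown; `leaf_indices` stands for the output of a hypothetical, already trained quantizer, and all function names are illustrative.

```python
import math

def template(leaf_indices, n_leaves):
    """Histogram of leaf occupancy for one audio file."""
    counts = [0] * n_leaves
    for leaf in leaf_indices:
        counts[leaf] += 1
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two templates (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty template matches nothing
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Distance form of the measure: 0.0 for templates pointing the same way."""
    return 1.0 - cosine_similarity(a, b)
```

The cosine measure ignores template length, so a short query and a long corpus file with the same leaf proportions compare as identical.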

7
Foote's Results
  • Good way of measuring subjective qualities of
    sound, without using targeted features
  • Not as accurate as other techniques that use
    psychoacoustic knowledge to find similar
    timbres (e.g. instruments)
  • Sensitive to pitch (will often return different
    timbres of same pitch)

8
Erling Wold et al. Approach (1)
  • Implemented several approaches in Muscle Fish
    software
  • More particularly, specify explicit perceptual
    features (loudness, pitch, brightness, bandwidth,
    harmonicity)
  • Statistics of corresponding acoustic correlates
    calculated for entire sample (mean, variance,
    autocorrelation) form a feature vector (the
    a-vector)
  • For training set, mean vector calculated and
    covariance matrix built from the examples;
    these become the system's model
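
Building the class model from training examples might look like the sketch below. It assumes each example is already a fixed-length feature vector and uses the unbiased (n - 1) covariance estimate; the slides do not say which normalization the Muscle Fish software uses, and the function names are illustrative.

```python
def mean_vector(examples):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(examples)
    return [sum(column) / n for column in zip(*examples)]

def covariance_matrix(examples):
    """Sample covariance matrix (divides by n - 1, an assumed convention)."""
    n = len(examples)
    mu = mean_vector(examples)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in examples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (x[i] - mu[i]) * (x[j] - mu[j])
    return [[c / (n - 1) for c in row] for row in cov]
```

The pair returned by `mean_vector` and `covariance_matrix` is what the slide calls the model of a class.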

9
Wold Approach (2)
  • Use a weighted Euclidean distance for
    classification and similarity measurements
  • Distance compared to threshold to decide if
    objects belong to the same class (optional)
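
A weighted Euclidean distance with an optional threshold test can be sketched as follows. Weighting each dimension by the inverse of its variance is an assumption made here so that features with different scales contribute comparably; the slides do not specify the weights, and the function names are hypothetical.

```python
import math

def weighted_euclidean(a, b, weights):
    """Square root of the weighted sum of squared per-dimension differences."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def same_class(sample, class_mean, class_variance, threshold):
    """True if the sample lies within threshold of the class mean.

    Weights are 1 / variance per dimension (an assumed choice).
    """
    weights = [1.0 / v for v in class_variance]
    return weighted_euclidean(sample, class_mean, weights) <= threshold
```

Without the threshold, the same distance can simply rank candidates by similarity.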

10
Wold Approach (3)
  • Segmentation is required beforehand; it is done
    with the same features, by detecting abrupt
    changes in them
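
One simple way to detect such discrepancies is to flag frames whose feature vector jumps far from the previous one. This is only a sketch under that assumption: the function name and the fixed `threshold` are hypothetical, and the actual criterion used by Wold et al. is not given in the slides.

```python
def segment_boundaries(frames, threshold):
    """Indices of frames whose features differ sharply from the previous frame."""
    boundaries = []
    for i in range(1, len(frames)):
        # Euclidean distance between consecutive per-frame feature vectors.
        jump = sum((a - b) ** 2 for a, b in zip(frames[i], frames[i - 1])) ** 0.5
        if jump > threshold:
            boundaries.append(i)
    return boundaries
```

Each returned index marks the first frame of a new segment, which can then be analyzed on its own.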

11
Wold and Foote comparison
  • Takeaway: Wold has shown that statistical methods
    can be used for flexible classification