Sound Detection - PowerPoint PPT Presentation

Learn more at: http://www.cs.cmu.edu

Title: Sound Detection


1
Sound Detection
  • Derek Hoiem
  • Rahul Sukthankar (mentor)
  • August 24, 2004

2
Objective
  • Learn model of sound object from few (10-20)
    examples and distinguish from all other sounds
  • Examples of sound classes: gunshots, screams,
    laughter, car horns, meows, dog barks, etc.

3
Applications
  • Tell me if you hear a gunshot. (monitoring)
  • Get me video clips containing dogs barking.
    (search and retrieval)
  • What's going on? (scene understanding)

4
Why it's difficult
  • Sound classes have large variations
  • Sounds are often ambiguous without context
  • Overlaid noise obscures sound

5
Sound or not?
Which of these sounds are not from their named
classes?
Car horn
Dog bark
Laser gun
6
Previous work
  • Sound Classification (Wold 1996, Casey 2001, etc.)
    • Categorize short sound clips
    • Reasonable accuracy (5-20% error)
  • Sound Detection (Dufaux 2000, Piamsa-nga 1999)
    • Localize and recognize sound objects in long
      clips
    • Poor performance or assumption of unrealistic
      conditions (e.g., very quiet background)

7
Detection via Windowed Search
  • Break the long audio track into short overlapping
    clips
  • Independently classify each short clip as object or
    non-object (clip classifier)
  • Return locations of the detected sound object
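
The windowed search described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `sliding_clips`, `detect`, and `loud`, the clip length, the hop size, and the energy-based toy classifier are all assumptions made for the example.

```python
import numpy as np

def sliding_clips(track, clip_len, hop):
    """Yield (start_index, clip) for overlapping clips of a 1-D track."""
    for start in range(0, len(track) - clip_len + 1, hop):
        yield start, track[start:start + clip_len]

def detect(track, classify_clip, clip_len, hop):
    """Return start indices of the clips the classifier labels as the object."""
    return [start for start, clip in sliding_clips(track, clip_len, hop)
            if classify_clip(clip)]

# Hypothetical stand-in for the clip classifier: flags clips with high mean energy.
def loud(clip, threshold=0.5):
    return float(np.mean(clip ** 2)) > threshold

track = np.zeros(100)
track[40:60] = 1.0  # a loud burst in an otherwise silent track
hits = detect(track, loud, clip_len=20, hop=10)
print(hits)
```

Because each clip is classified independently, any clip classifier can be passed in as `classify_clip`; a hop smaller than the clip length gives the overlap.
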
8
Representation
[Figure: raw representation of example sounds (meows, phone rings)]

9
Classification Features
  • Diverse feature set
  • Different sound classes are distinctive in
    different ways
  • Means and standard deviations of power at
    different frequencies
  • Bandwidth, peaks, loudness, etc.
  • 138 features in all
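
One of the feature families listed above, means and standard deviations of power at different frequencies, might be computed roughly as below. The function name `band_power_stats`, the frame length, the number of bands, and the Hann window are assumptions for illustration; the slides do not specify how the 138 features were extracted.

```python
import numpy as np

def band_power_stats(clip, frame_len=256, n_bands=8):
    """Mean and standard deviation over time of the power in each frequency band."""
    n_frames = len(clip) // frame_len
    frames = clip[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Power spectrum of each windowed frame: shape (n_frames, frame_len // 2 + 1)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    # Group FFT bins into n_bands contiguous bands and sum the power in each
    bands = np.array_split(power, n_bands, axis=1)
    band_power = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.concatenate([band_power.mean(axis=0), band_power.std(axis=0)])

rate = 8000
t = np.arange(rate) / rate
clip = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
features = band_power_stats(clip)   # 8 band means followed by 8 band std devs
```

For this tone, the largest mean falls in the lowest band, since 440 Hz lies below the first band edge at these settings.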

10
Classification by Decision Trees
  • Try to find simple rules that discriminate object
    from non-object
  • Each decision is based on a threshold of a
    feature value
  • Assign confidence based on likelihood of data for
    object and non-object classes at each leaf node

[Figure: example decision tree showing decision nodes and leaf nodes]
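
A single threshold decision with a confidence at each leaf could look like the sketch below. The slide does not say how the likelihoods are turned into a confidence; the smoothed log-ratio of class counts used here is an assumption, and `leaf_confidence`, `node_predict`, and the feature name `amplitude_range` are hypothetical.

```python
import math

def leaf_confidence(n_object, n_non_object, smoothing=1.0):
    """Half the log-ratio of smoothed object vs. non-object counts at a leaf."""
    return 0.5 * math.log((n_object + smoothing) / (n_non_object + smoothing))

def node_predict(x, feature, threshold, conf_below, conf_above):
    """One decision node: compare a feature to a threshold, return a leaf confidence."""
    return conf_below if x[feature] < threshold else conf_above

# Hypothetical training counts that reached each leaf
conf_below = leaf_confidence(n_object=2, n_non_object=30)   # mostly non-object
conf_above = leaf_confidence(n_object=18, n_non_object=4)   # mostly object
score = node_predict({"amplitude_range": 0.9}, "amplitude_range", 0.5,
                     conf_below, conf_above)
```

A positive score favors the object class and a negative score the non-object class; the magnitude reflects how pure the leaf is.
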
11
Boosted Trees
  • Problem: one decision tree by itself may not be a
    great classifier
  • Solution: use several trees, with each one
    focusing on the mistakes of previously learned
    trees
  • AdaBoost
    • Weight training data uniformly
    • Learn a decision tree classifier on weighted data
    • Re-weight data, giving more weight to incorrectly
      classified examples
    • Final classification based on linear combination
      of confidences from all learned decision trees
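
The boosting steps above can be sketched with the textbook discrete AdaBoost update. As a simplification, this example uses one-level trees (threshold stumps) with ±1 votes rather than full trees with leaf confidences, and the toy data and all function names are assumptions for illustration.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Vote `sign` where feature j is at or above thr, and -sign elsewhere."""
    return np.where(X[:, j] >= thr, sign, -sign)

def fit_stump(X, y, w):
    """Pick the (feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = w[stump_predict(X, j, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds):
    """Labels y must be +1 (object) or -1 (non-object)."""
    w = np.full(len(y), 1.0 / len(y))            # weight training data uniformly
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)   # learn on the weighted data
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(X, j, thr, sign))  # up-weight mistakes
        w /= w.sum()
        model.append((alpha, j, thr, sign))
    return model

def predict(model, X):
    """Sign of the linear combination of all learned stumps."""
    return np.sign(sum(a * stump_predict(X, j, t, s) for a, j, t, s in model))

# Toy data no single threshold can separate: the object occupies a middle interval.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, 1, 1, -1, -1])
model = adaboost(X, y, rounds=3)
```

On this toy set, each stump alone misclassifies at least two points, but the weighted vote of three stumps labels all six correctly.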

12
Examples of Decision Trees
[Figure: example decision trees for meow and gunshot. Node labels: "Low percentage of power in low frequencies in mid-time of sound", "High power amplitude range", "Very high power amplitude range". A second gunshot tree is more complex and focuses on examples misclassified by the tree above it.]

13
Cascade of Classifiers
  • Goal: eliminate false positives with few false
    negatives in early stages
  • Advantages
    • Allows use of large set of negative training
      examples
    • Improves classification speed
  • Danger: cannot recover from false negatives

[Diagram: Sound Clip -> Stage 1 -> Stage 2 -> Stage 3 -> Pass, with a Fail exit at each stage. Pass labels in the diagram: (5), (2), (0.005).]
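
The early-rejection behavior of the cascade can be sketched as below. The stage scoring functions, thresholds, and feature names are hypothetical; in the presentation each stage would presumably be a boosted classifier.

```python
def cascade_classify(clip, stages):
    """Run a clip through (score_fn, threshold) stages; reject at the first failure."""
    for score_fn, threshold in stages:
        if score_fn(clip) < threshold:
            return False   # Fail: later stages never see this clip
    return True            # Pass: the clip survived every stage

# Hypothetical stages operating on a dict of precomputed features
stages = [
    (lambda c: c["loudness"], 0.2),
    (lambda c: c["amplitude_range"], 0.5),
    (lambda c: c["boosted_score"], 0.0),
]
quiet = {"loudness": 0.1, "amplitude_range": 0.9, "boosted_score": 2.0}
bang = {"loudness": 0.8, "amplitude_range": 0.7, "boosted_score": 1.3}

print(cascade_classify(quiet, stages), cascade_classify(bang, stages))
```

The `quiet` clip is rejected at the first stage even though its later scores are high, which illustrates why a false negative in an early stage cannot be recovered.
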
14
Results: Classification Error

              stage 1       stage 2       stages 3
              pos    neg    pos    neg    pos    neg
meow          0.0    1.4    0.0    1.2    2.2    0.8
phone         0.0    0.4    4.3    0.1    5.9    0.0
car horn      0.0    3.9    0.6    2.2    3.6    1.3
door bell     1.4    2.1    2.1    0.4    6.3    0.1
swords        6.1    1.3    6.7    0.1    6.7    0.0
scream        0.3    5.5    2.7    1.4    5.3    1.1
dog bark      0.7    1.0    6.0    0.3    7.7    0.2
laser gun     0.0    6.8    4.4    5.1    6.7    0.9
explosion     4.1    5.2    7.5    1.5   12.0    0.5
light saber   4.8    6.8    9.7    1.0   13.9    0.2
gunshot       8.1    6.1   12.5    2.3   14.5    1.1
close door    7.9    7.8   14.5    4.8   17.6    2.3
male laugh    4.3   14.7    9.5    9.7   13.3    7.0
average       2.9    4.4    6.0    2.2    8.5    1.1

15
Results: ROC curves
Note: to approximate the negative error rate, divide
FP by 25,000
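
The conversion in the note amounts to a single division. In this sketch the divisor 25,000 comes from the note (presumably about the number of negative test clips); the false-positive count of 250 is a made-up value for illustration.

```python
N_NEGATIVE = 25_000  # divisor given in the note

def approx_negative_error_rate(false_positives):
    """Approximate fraction of negatives misclassified as the object."""
    return false_positives / N_NEGATIVE

rate = approx_negative_error_rate(250)  # 250 is a hypothetical FP count
print(f"{rate:.2%}")
```
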
16
Results: Anecdotal
Gunshots
Female Laugh
Male Laugh
Swords
Scream