Title: On Predicting Rare Classes with SVM Ensembles in Scene Classification


1
On Predicting Rare Classes with SVM Ensembles in
Scene Classification
  • Rong Yan, Yan Liu, Rong Jin, Alex Hauptmann

2
Outline
  • Motivation
  • SVM ensembles to solve the rare class problem
  • SVM ensembles
  • Combination
  • Held-out training set
  • Experimental Results
  • Conclusion

3
Scene Classification
  • Task: classify images into meaningful semantic
    scenes
  • Indoor, Outdoor, Cityscape, Landscape, Sunrise
  • Applications
  • Descriptor in the Multimedia Content Description
    Interface (MPEG-7)
  • Provide high-level features for retrieval systems

4
Scene Classification
  • Problem formulation: binary classification based
    on low-level visual features
  • Image color, texture, and shape features
  • Support Vector Machines (SVMs)
  • Maximize the margin between classes

5
Rare Class Problem
  • Small number of positive examples in real-world
    data
  • Positive examples form a coherent subset (e.g.,
    Cityscape, Landscape, Sunrise, Sunset)
  • Negative examples are less well-defined, amounting
    to "everything else"
  • The rare class problem is very common
  • Scene classification, fraud detection, network
    intrusion, text categorization, and web mining
  • However, most studies are based on balanced data
    sets
  • The number of positive examples is comparable to
    the number of negative examples

6
Why study the rare class problem?
  • Imbalance dramatically degrades classification
    performance
  • All images may be predicted as negative
  • Reason: most classifiers minimize overall
    classification error

7
Solutions
  • Better performance measures
  • Precision (P) and recall (R)
  • F1 measure: 2PR / (P + R)
  • Better classification schemes
  • Under-sampling: throw away negative data
  • Over-sampling: replicate the positive data
  • Boosting: increase the weight of positive data
    iteratively

8
Under-Sampling
  • Throw away negative data to balance the training
    data distribution

9
Over-Sampling
  • Replicate the positive data to balance the
    training data distribution
  • Equivalent to changing the cost function of the
    positive data (see the sketch below)
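
This equivalence is easy to see in code. Below is a minimal sketch, assuming scikit-learn's SVC (the talk itself used SVMLight): duplicating each positive example k times has the same effect as making positive-class errors k times more expensive.

```python
# Sketch: over-sampling vs. cost weighting (scikit-learn assumed;
# the talk itself used SVMLight).
from sklearn.svm import SVC

k = 10  # replication factor for the positive class

# Over-sampling view: duplicate every positive example k times
# before training. Cost-function view: leave the data unchanged
# and make positive-class errors k times more expensive.
clf = SVC(kernel="rbf", gamma=0.05, class_weight={1: k, 0: 1})
# clf.fit(X_train, y_train)  # no data replication needed
```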

10
Motivation for our approach
  • Drawbacks of existing approaches
  • Under-sampling: loses most of the potentially
    useful data
  • Over-sampling: much more training time
  • Critical for SVMs: time complexity O(N_SV²), where
    N_SV is the number of support vectors
  • Varying the cost function: more training time
  • Boosting: much more training time, and performance
    depends on the weak classifiers
  • Our approach: SVM ensembles

11
Outline
  • Motivation
  • SVM ensembles to solve the rare class problem
  • SVM ensembles
  • Combination
  • Held-out training set
  • Experimental Results
  • Conclusion

12
SVM ensembles
  • Training process (see the sketch below)
  • Decompose the negative examples into groups
  • Combine the positive examples with each group of
    negative examples
  • Train base classifiers (SVMs) individually
  • Combine all base classifiers
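
A minimal sketch of this training process, assuming scikit-learn; the name train_ensemble and the random grouping are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def train_ensemble(X_pos, X_neg, n_groups):
    """Train one base SVM per group of negative examples."""
    # Decompose the negative examples into n_groups random groups.
    groups = np.array_split(np.random.permutation(len(X_neg)), n_groups)
    models = []
    for idx in groups:
        # Combine ALL positives with this one group of negatives.
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        # probability=True so later combination can use posteriors.
        m = SVC(kernel="rbf", gamma=0.05, probability=True)
        models.append(m.fit(X, y))
    return models
```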

13
Why SVM ensembles?
  • Advantages over over-sampling and under-sampling
  • Keeps all of the useful information
  • Less training time than over-sampling (2-3x
    speedup)
  • Reduces the variance of classifier predictions
  • Overcomes limitations of SVM software
  • Most software produces sub-optimal results
  • Combination might provide better performance

14
Classifier Combination
  • How to combine the base classifiers?
  • Determine the aggregation function F(x)

15
Fixed combining rules
  • Fixed combining rules (see the sketch below)
  • Majority voting: each classifier votes for a class
  • Sum rule: sum the posterior probabilities
  • Averages out the mistakes of individual
    classifiers
  • They are sub-optimal
  • Base classifiers are never equally imperfectly
    trained
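
A sketch of the two fixed rules, applied to the base SVMs trained above (posterior outputs assumed enabled via probability=True):

```python
import numpy as np

def majority_vote(models, X):
    # Each base SVM casts a 0/1 vote; predict the majority class
    # (ties broken toward the positive class).
    votes = np.stack([m.predict(X) for m in models])  # shape (K, n)
    return (votes.mean(axis=0) >= 0.5).astype(int)

def sum_rule(models, X):
    # Sum (equivalently, average) the positive-class posterior
    # probabilities of all base SVMs, then threshold.
    probs = np.stack([m.predict_proba(X)[:, 1] for m in models])
    return (probs.mean(axis=0) >= 0.5).astype(int)
```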

16
Meta-classifiers
  • Treat classifier outputs as features: y = (y1,
    ..., yK)
  • Stacking: another classifier on top (see the
    sketch below)
  • Multi-layer perceptrons (MLPs)
  • SVMs
  • Advantages
  • Learn the combination weights automatically
  • Less sensitive to poorly performing classifiers
  • SVMs require less tuning effort than MLPs
  • Require a held-out set to train
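
A sketch of stacking an SVM meta-classifier on the base outputs, again assuming scikit-learn; base_outputs and train_meta_svm are illustrative names.

```python
import numpy as np
from sklearn.svm import SVC

def base_outputs(models, X):
    # Feature vector y = (y1, ..., yK): one posterior probability
    # per base SVM, treated as a K-dimensional meta-feature.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

def train_meta_svm(models, X_held, y_held):
    # Stack an SVM on top of the base outputs; its weights are the
    # learned combination weights. Trained on a held-out set.
    meta = SVC(kernel="linear")
    return meta.fit(base_outputs(models, X_held), y_held)

# Prediction: meta.predict(base_outputs(models, X_test))
```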

17
Held-out training set
  • Purpose: train the top-level classifier
  • Select training examples within the local region
    of the test set (see the sketch below)
  • The training set distribution is likely to differ
    from the test set distribution
  • A local held-out set gives a better estimate of
    the test set distribution
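
The slides do not give the exact selection rule; one plausible reading, sketched here under a nearest-neighbor assumption, keeps only training examples that lie close to some test point.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_heldout(X_train, y_train, X_test, k=5):
    # Keep only training examples that fall among the k nearest
    # neighbours of some test point, so the held-out set better
    # reflects the test set distribution.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)   # (n_test, k) neighbour indices
    sel = np.unique(idx.ravel())
    return X_train[sel], y_train[sel]
```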

18
Outline
  • Motivation
  • SVM ensembles to solve the rare class problem
  • SVM ensembles
  • Combination
  • Held-out training set
  • Experimental Results
  • Conclusion

19
Experimental Setting
  • TREC-2002 Video Track, Feature Extraction Task
  • Text REtrieval Conference (TREC) Video Track
  • http://www-nlpir.nist.gov/projects/t01v/
  • Feature Extraction Task
  • 23 hours of digital video
  • Manually labeled and sampled to more than 2,000
    images
  • Our task: detect cityscape and landscape in images

20
Experimental Setting
  • Classification setting
  • SVMLight, RBF kernel, gamma = 0.05
  • 10-fold cross-validation
  • Performance measure: F1
  • Image features (see the sketch below)
  • Color features: mean and variance in HSV space
  • Texture features: Gabor filters with 6 angles
  • Total feature length: 144
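
A rough sketch of this feature set using scikit-image; the exact 144-dimensional layout (any spatial grid, filter scales, or normalization) is not specified in the slides and is assumed here.

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.filters import gabor

def image_features(rgb):
    """Color (HSV mean/variance) + texture (Gabor, 6 angles).
    The exact 144-dim layout is an assumption, not from the slides."""
    hsv = rgb2hsv(rgb)
    color = [hsv[..., c].mean() for c in range(3)] + \
            [hsv[..., c].var() for c in range(3)]
    gray = rgb.mean(axis=2)              # simple grayscale
    texture = []
    for i in range(6):                   # 6 Gabor orientations
        real, _ = gabor(gray, frequency=0.3, theta=i * np.pi / 6)
        texture += [real.mean(), real.var()]
    return np.array(color + texture)
```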

21
Results
  • Results for cityscape

22
Results
  • Results for landscape

23
Conclusion
  • SVM ensembles are effective at addressing the rare
    class problem
  • Hierarchical SVMs are preferred
  • High effectiveness
  • High efficiency
  • Future work
  • Extend our discussion to other types of ensembles
    and base classifiers