1. Stat 231. A.L. Yuille. Fall 2004

1
1. Stat 231. A.L. Yuille. Fall 2004
  • Practical Issues with SVM.
  • Handwritten Digits: US Post Office and MNIST datasets.
  • No handout. For people seriously interested in this material, see Learning with Kernels by B. Schölkopf and A.J. Smola, MIT Press, 2002.

2
2. Practical SVM
  • Support Vector Machines first showed major success on the task of handwritten digit/character recognition.
  • US Post Office database. MNIST database.
  • Issues with real problems:
  • (1) Multiclassification: not just yes/no.
  • (2) Large datasets: quadratic programming becomes impractical.
  • (3) Invariances in the data. Prior knowledge.
  • (4) Which kernels? When do kernels generalize?

3
3. Multiclassification.
  • Two solutions for M classes.
  • (A) One versus Rest. For each class i = 1,...,M construct a binary classifier g_i(x) = w_i . x + b_i,
  • where the training labels are y_mu = +1 if x_mu belongs to class i and y_mu = -1 otherwise, mu = 1,...,n
  • (n = no. of data samples).
  • Classify x to the class with the largest score: argmax_i g_i(x).
  • Comment: simple, but heuristic.
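A minimal sketch of the one-versus-rest decision rule: assuming M = 3 binary classifiers g_i(x) = w_i . x + b_i are already trained (the weights below are illustrative stand-ins, not values from a real training run), a point is assigned to the class whose classifier scores highest.

```python
import numpy as np

# Illustrative per-class weights and biases (not trained values):
# row i holds w_i for class i.
W = np.array([[ 2.0, -1.0],   # w_0
              [-1.0,  2.0],   # w_1
              [-1.0, -1.0]])  # w_2
b = np.array([0.0, 0.0, 0.5])

def classify(x):
    """One-vs-rest: assign x to the class whose classifier scores highest."""
    scores = W @ x + b  # g_i(x) for each class i
    return int(np.argmax(scores))

print(classify(np.array([1.0, 0.0])))  # scores [2.0, -1.0, -0.5] -> class 0
```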

4
4. Multiclassification
  • (B) Hyperplanes for each class label: one (w_i, b_i) per class i = 1,...,M.
  • Data and slack variables: training data {(x_mu, y_mu)}, mu = 1,...,n, with slack variables xi_{mu,i} >= 0 allowing margin violations.
  • Quadratic programming: minimize (1/2) sum_i |w_i|^2 + C sum_{mu, i != y_mu} xi_{mu,i}, subject to (w_{y_mu} - w_i) . x_mu + (b_{y_mu} - b_i) >= 1 - xi_{mu,i} for all i != y_mu.

5
5. Multiclass and Data Size
  • Empirically, methods (A) and (B) give results of similar quality. Method (B) is more attractive, but its solution is more computationally intensive. This leads to issue:
  • (2) Large Datasets.
  • The quadratic programming problem is most easily formulated in terms of the dual: maximize sum_mu alpha_mu - (1/2) sum_{mu,nu} alpha_mu alpha_nu y_mu y_nu K(x_mu, x_nu), subject to 0 <= alpha_mu <= C and sum_mu alpha_mu y_mu = 0.
  • For large datasets n is enormous, and quadratic programming over the n dual variables is computationally expensive.

6
6. Large Datasets
  • Chunking is the favored solution. Observe that the dual variables alpha_mu will be non-zero only for the support vectors.
  • Chunk the training data into k sets of size n/k.
  • Train on these k sets and keep the support vectors from each set.
  • Then train on the combined support vectors of the k sets.
  • Note: check the original data to make sure it is correctly classified. If not, add more support vectors.
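The chunking steps above can be sketched as follows, here using scikit-learn's SVC as a modern stand-in (the lecture predates that library) on toy two-class data:

```python
import numpy as np
from sklearn.svm import SVC

def chunked_svm_train(X, y, k=4, C=1.0):
    """Chunking sketch: train an SVM on each of k chunks, pool the
    support vectors found in each chunk, and retrain on the pool."""
    sv_X, sv_y = [], []
    for Xc, yc in zip(np.array_split(X, k), np.array_split(y, k)):
        clf = SVC(kernel="linear", C=C).fit(Xc, yc)
        sv_X.append(Xc[clf.support_])   # keep this chunk's support vectors
        sv_y.append(yc[clf.support_])
    clf = SVC(kernel="linear", C=C).fit(np.concatenate(sv_X),
                                        np.concatenate(sv_y))
    # Check the original data; misclassified points would need to be
    # fed back in as further support-vector candidates.
    missed = np.flatnonzero(clf.predict(X) != y)
    return clf, missed

# Toy data: two well-separated Gaussian blobs, shuffled so that
# every chunk contains both classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
perm = rng.permutation(len(y))
clf, missed = chunked_svm_train(X[perm], y[perm])
print(len(missed))  # points the chunked model still misclassifies
```

With a large-margin separation as here, almost all dual variables are zero, so the pooled support-vector set is small and the final retraining is cheap.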

7
7. Large Datasets.
  • Chunking is successful and computationally efficient provided the number of support vectors is small.
  • This happens if there is a hyperplane/hypersurface with a large margin separating the classes.
  • It is harder if data from the classes overlap, e.g. when there are a large number of data points which need non-zero slack variables (i.e. support vectors).
  • In either case, more support vectors are needed for the combined multiclass case (B) than for the heuristic (A).
  • Note: other approximate methods exist for when chunking fails.

8
8. Invariances and Priors
  • (3) Invariances in the Data.
  • Recognizing handwritten digits: the classifier should be insensitive to small changes in the data.
  • For example, small rotations and small translations.

9
9. Invariances and Priors
  • Virtual Support Vectors (VSV).
  • Strategy:
  • (i) Train on the original dataset to get the support vectors.
  • (ii) Generate artificial examples by applying the invariance transformations to the support vectors.
  • (iii) Train again on the virtual examples generated in (ii).
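A sketch of the three VSV steps, using scikit-learn's small 8x8 digits set as a stand-in for the US Post Office / MNIST data and 1-pixel translations (via np.roll) as the invariance transformation; the polynomial kernel here is an assumption for illustration, not the lecture's exact setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()                      # 8x8 digit images
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

# (i) Train on the original dataset and extract the support vectors.
clf = SVC(kernel="poly", degree=3).fit(X, y)
sv_imgs = digits.images[clf.support_]
sv_y = y[clf.support_]

# (ii) Generate artificial examples: 1-pixel translations in four
# directions, applied to each support vector.
shifts = [(0, 1), (0, -1), (1, 0), (-1, 0)]
virtual = [np.roll(sv_imgs, s, axis=(1, 2)) for s in shifts]
X_vsv = np.concatenate([sv_imgs] + virtual).reshape(-1, 64)
y_vsv = np.concatenate([sv_y] * (len(shifts) + 1))

# (iii) Retrain on the support vectors plus their virtual copies.
vsv_clf = SVC(kernel="poly", degree=3).fit(X_vsv, y_vsv)
```

Each support vector yields four virtual examples here; adding more transformations (e.g. small rotations) gives more virtual examples per point.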

10
10. Virtual Support Vectors
11
11. Invariances and Priors
  • Other methods include:
  • (i) Hand-designing features which are invariant for the problem.
  • (ii) Training on virtual examples before constructing support vectors (computationally expensive).
  • (iii) Designing criteria that allow for data transformations.
  • (iv) Learning features which are invariants (TPA).
  • In general, it is best to select your input features using as much prior knowledge as you have about the problem.

12
12. MNIST Results
  • MNIST dataset of handwritten digits.
  • Summary of results: page 341 of Schölkopf and Smola.
  • The best classifier uses a polynomial kernel.
  • "8 VSV" means 8 invariance samples per data point (1-pixel translations, plus rotation).
  • The MNIST dataset has 60,000 training digits (plus 10,000 test digits).
  • LeNet is a multilayer network with special training plus boosting.

13
13. MNIST Results
14
14. Summary.
  • Applying SVMs to real problems requires:
  • Multiclass: Method (A) One-versus-Rest, or (B) the full multiclass solution.
  • Computational practicality: chunking, by dividing the dataset into subsets and combining the support vectors from each subset.
  • Invariance: generate new samples by applying translations to support vectors, yielding virtual support vectors.
  • Very successful on the MNIST and US Post Office datasets. Simpler than the LeNet approach (closest rival).