Concave Minimization for Support Vector Machine Classifiers

1
Concave Minimization for Support Vector Machine
Classifiers
Unlabeled Data Classification & Data Selection
Glenn Fung & O. L. Mangasarian
2
Part 1: Unlabeled Data Classification
  • Given a large unlabeled dataset
  • Use a k-Median clustering algorithm to select a
    small (5% to 10%) representative sample.
  • Representative sample is labeled by expert or
    oracle.
  • Combined labeled-unlabeled dataset is classified
    by a Semi-supervised Support Vector Machine.
  • Test set correctness is within 5.2% of that of a
    linear support vector machine trained on the
    entire dataset labeled by an expert.

3
Part 2: Data Selection for Support
Vector Machine Classifiers
  • Extract a minimal set of data points from a given
    dataset.
  • Minimal set used to generate a Minimal Support
    Vector Machine (MSVM) classifier.
  • MSVM classifier is as good as or better than that
    obtained by training on the entire dataset.
  • Feature selection is incorporated into procedure
    to obtain a minimal set of input features.
  • Data reduction was as high as 81% and averaged
    66% over seven public datasets.

4
SVM: Linear Support Vector Machine
5
1-norm Linear SVM
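The formula on this slide did not survive in the transcript. As a hedged reminder, one standard way to write a 1-norm linear SVM is given below. The symbols are assumptions, not taken from the slide: A is the m x n matrix of points, D the diagonal matrix of their +1/-1 labels, e a vector of ones, y the error vector, nu > 0 a weight, and x'w = gamma the separating plane.

```latex
% 1-norm linear SVM (sketch under the assumed notation above)
\min_{w,\gamma,y}\; \nu\, e^{\top} y + \|w\|_1
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\; y \ge 0

% Equivalent linear program, bounding |w| componentwise by v
\min_{w,\gamma,y,v}\; \nu\, e^{\top} y + e^{\top} v
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0
```

The bounding planes are x'w = gamma + 1 and x'w = gamma - 1. Minimizing the 1-norm of w pushes them apart and tends to drive many components of w to zero.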
6
Unlabeled Data Classification
  • Given a completely unlabeled large data set.
  • Costly to label points by an expert or an oracle.
  • Two questions arise:
      • How to choose a small subset for labeling?
      • How to combine labeled and unlabeled data?
  • Answers:
      • Use k-median clustering for selecting
        representative points to be labeled.
      • Use semi-supervised SVM to obtain a classifier
        based on labeled and unlabeled data.

7
Unlabeled Data Classification
[Flowchart] Unlabeled Data Set → k-Median clustering → Chosen Data and Remaining Data; Chosen Data → Expert → Labeled Data; Labeled Data and Remaining Data → Semi-supervised SVM → Separating Plane
8
K-Median Clustering Algorithm
  • Given m data points. Find k clusters of these
    points such that the sum of the 1-norm distances
    from each point to the closest cluster center is
    minimized.
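A minimal sketch of the clustering step described above, assuming the usual alternating scheme: assign each point to its nearest center in the 1-norm, then move each center to the coordinate-wise median of its cluster (the median minimizes a sum of 1-norm distances). The function names, the caller-supplied initial centers, and the stopping rule are illustrative assumptions, not taken from the slides.

```python
from statistics import median


def l1_distance(p, q):
    """1-norm distance between two points given as sequences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, initial_centers, max_iter=100):
    """Alternate between assignment and median update until stable.

    Returns (centers, assignment), where assignment[i] is the index
    of the cluster that points[i] belongs to.
    """
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest center in the 1-norm.
        new_assignment = [
            min(range(len(centers)), key=lambda j: l1_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        # Update step: coordinate-wise median of each non-empty cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [median(col) for col in zip(*members)]
    return centers, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_median(points, initial_centers=[(0, 0), (10, 10)])
```

Like any alternating scheme of this kind, the result depends on the initial centers and is only a local solution.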

9
K-Median Clustering Algorithm
10
K-Median Clustering Algorithm
11
Unlabeled Data Classification
[Flowchart] Unlabeled Data Set → k-Median clustering → Chosen Data and Remaining Data; Chosen Data → Expert → Labeled Data; Labeled Data and Remaining Data → Semi-supervised SVM → Separating Plane
12
Semi-supervised SVM (S3VM)
  • Given a dataset consisting of
      • labeled (+1, -1) points represented by the matrix A
      • unlabeled points represented by the matrix B
  • Classify the data into two classes as follows
      • Assign each unlabeled point in B to a class
        (+1, -1) so as to maximize the distance between
        the bounding planes obtained by a 1-norm linear
        SVM applied to the entire dataset.
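To make the assignment idea concrete, here is a small sketch for a fixed plane x'w = gamma. For each unlabeled point it computes the error the point would incur in class +1 (called r here) and in class -1 (called s here), and picks the class with the smaller error. In the actual method the plane and the assignment are found together; the fixed w and gamma, the names r and s, and the function name are assumptions made only for illustration.

```python
def assign_labels(unlabeled, w, gamma):
    """Assign each unlabeled point to the class with the smaller error.

    r is the violation of the +1 bounding plane x'w - gamma >= 1,
    s is the violation of the -1 bounding plane x'w - gamma <= -1.
    """
    labels = []
    for x in unlabeled:
        score = sum(wi * xi for wi, xi in zip(w, x)) - gamma
        r = max(0.0, 1.0 - score)  # error if the point is put in class +1
        s = max(0.0, 1.0 + score)  # error if the point is put in class -1
        labels.append(1 if r <= s else -1)
    return labels


labels = assign_labels([(2, 0), (-3, 1), (0.5, 0)], w=[1, 0], gamma=0)
```

Choosing the smaller of r and s is the same as taking the sign of x'w - gamma, which is why the objective can be written with a min of the two error terms.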

13
Formulation
14
A concave approach
  • The term in the objective function
    is concave because it is the minimum of two
    linear functions.
  • A local solution to this problem is obtained by
    solving a succession of linear programs (4 to 7).
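The formulation slide was not transcribed, so the following is only a sketch of how such a problem is commonly written, under assumed notation: A and D are the labeled points and their diagonal label matrix, B holds the unlabeled points, y is the error on labeled points, and r and s are the errors an unlabeled point would incur in class +1 and class -1. The single weight nu on both error terms is an assumption.

```latex
% Semi-supervised 1-norm SVM with a concave min term (assumed notation)
\begin{aligned}
\min_{w,\gamma,y,r,s}\quad & \nu\left(e^{\top}y + e^{\top}\min\{r,s\}\right) + \|w\|_1 \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \\
& Bw - e\gamma + r \ge e, \\
& -Bw + e\gamma + s \ge e, \\
& y \ge 0,\; r \ge 0,\; s \ge 0.
\end{aligned}

% Successive linearization: at iterate (r^i, s^i), replace each
% \min\{r_j, s_j\} by a supporting linear function
\min\{r_j, s_j\} \;\le\; \lambda_j^i\, r_j + (1 - \lambda_j^i)\, s_j,
\qquad
\lambda_j^i =
\begin{cases}
1 & \text{if } r_j^i < s_j^i, \\
0 & \text{if } r_j^i > s_j^i, \\
\text{any value in } [0,1] & \text{if } r_j^i = s_j^i.
\end{cases}
```

Each linearized problem is a linear program. Because the linear function overestimates the concave term and agrees with it at the current iterate, solving it cannot increase the true objective, and the iteration stops when the objective no longer decreases.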

15
S3VM Graphical Example: Separate Triangles and Circles
  • Hollow shapes represent labeled data
  • Solid shapes represent unlabeled data
[Figure panels: SVM, S3VM]
16

17
18
Numerical Tests
19
Part 2: Data Selection for Support
Vector Machine Classifiers
[Flowchart] Labeled dataset → 1-norm SVM feature selection → Smaller dimension dataset → Support vector suppression (MSVM) → Separating surface
20
Support Vectors
21
Feature Selection using 1-norm Linear SVM (ν small)
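A brief sketch of how feature selection falls out of a 1-norm SVM solution: a 1-norm penalty on the weight vector tends to set many of its components exactly to zero, and the features with nonzero weight are the ones kept. The weight vector below is made up for illustration, and the function names and tolerance are assumptions.

```python
def selected_features(w, tol=1e-8):
    """Indices of features whose weight is nonzero (above a tolerance)."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


def reduce_dataset(points, keep):
    """Keep only the selected feature columns of each point."""
    return [[x[j] for j in keep] for x in points]


w = [0.0, 1.5, 0.0, -0.2]  # hypothetical sparse weight vector
keep = selected_features(w)
reduced = reduce_dataset([[1, 2, 3, 4], [5, 6, 7, 8]], keep)
```

The reduced dataset has one column per surviving feature and can be passed on to the next stage.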
22
Motivation for the Minimal Support Vector
Machine (MSVM)
23
Motivation for the Minimal Support Vector
Machine (MSVM)
  • Suppression of error term y:
      • Minimizes the number of misclassified points.
      • Works remarkably well computationally.
      • Reduces positive components of multiplier u and
        hence number of support vectors.
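To illustrate the link between the multiplier u and support vectors, the sketch below lists the points that can carry a positive multiplier for a given plane x'w = gamma. By complementary slackness these are the points lying on or on the wrong side of their bounding plane. This is only a necessary condition, and the function name, tolerance, and sample data are assumptions.

```python
def candidate_support_vectors(points, labels, w, gamma, tol=1e-8):
    """Indices of points on or violating their bounding plane.

    A point with label d satisfies its bounding plane strictly when
    d * (x'w - gamma) > 1; such a point must have a zero multiplier.
    """
    indices = []
    for i, (x, d) in enumerate(zip(points, labels)):
        margin = d * (sum(wj * xj for wj, xj in zip(w, x)) - gamma)
        if margin <= 1.0 + tol:
            indices.append(i)
    return indices


points = [(3, 0), (1, 0), (-1, 0), (-4, 0), (0.5, 0)]
labels = [1, 1, -1, -1, -1]
support = candidate_support_vectors(points, labels, w=[1, 0], gamma=0)
```

Points well inside their own half-space (indices 0 and 3 in this sample) drop out, which is the kind of data reduction the MSVM aims for.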

24
MSVM Formulation
25
MSVM Formulation
26
27
Numerical Tests
28
Conclusions
  • Unlabeled data classification
      • A fast, finite, linear-programming-based approach
        for Semi-supervised Support Vector Machines was
        proposed for classifying large datasets that are
        mostly unlabeled.
      • Totally unlabeled datasets were classified by:
          • Labeling a small percentage of clusters by an
            expert
          • Classification by a semi-supervised SVM
      • Test set correctness is within 5.2% of that of a
        linear SVM trained on the entire dataset labeled
        by an expert.

29
Conclusions
  • Data selection for SVM classifiers
      • Minimal SVM (MSVM) extracts a minimal subset
        used to classify the entire dataset.
      • MSVM maintains or improves generalization over
        other classifiers that use the entire dataset.
      • Data reduction was as high as 81% and averaged
        66% over seven public datasets.
  • Future work
      • MSVM: promising tool for incremental algorithms.
      • Improve chunking algorithms with MSVM.
      • Nonlinear MSVM: strong potential for time and
        storage reduction.