Title: Concave Minimization for Support Vector Machine Classifiers
Concave Minimization for Support Vector Machine Classifiers
Unlabeled Data Classification and Data Selection
Glenn Fung and O. L. Mangasarian
Part 1: Unlabeled Data Classification
- Given a large unlabeled dataset.
- Use a k-Median clustering algorithm to select a small (5% to 10%) representative sample.
- The representative sample is labeled by an expert or oracle.
- The combined labeled-unlabeled dataset is classified by a Semi-supervised Support Vector Machine.
- Test set correctness is within 5.2% of a linear support vector machine trained on the entire dataset labeled by an expert.
Part 2: Data Selection for Support Vector Machine Classifiers
- Extract a minimal set of data points from a given dataset.
- The minimal set is used to generate a Minimal Support Vector Machine (MSVM) classifier.
- The MSVM classifier is as good as or better than one obtained by training on the entire dataset.
- Feature selection is incorporated into the procedure to obtain a minimal set of input features.
- Data reduction is as high as 81% and averaged 66% over seven public datasets.
SVM: Linear Support Vector Machine
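The formulas on this slide did not survive transcription. As a reconstruction (not a transcript), the standard linear SVM the talk builds on can be sketched as follows: given data A with each row a point, labels recorded on the diagonal of D (entries ±1), a vector of ones e, and slack (error) variables y, find a separating plane x^T w = γ by solving

\[
\min_{w,\gamma,y}\; \nu\, e^{T} y + \tfrac{1}{2}\|w\|_2^{2}
\qquad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\;\; y \ge 0,
\]

where ν > 0 trades off training error against margin. The bounding planes x^T w = γ ± 1 delimit the margin, and minimizing \(\|w\|_2\) maximizes the distance between them.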
1-norm Linear SVM
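Again the slide's formulas are not in the transcript. The 1-norm variant, in the style of Mangasarian's linear-programming SVMs (a sketch, with the same notation as above), replaces the quadratic margin term by the 1-norm of w, so the whole problem becomes a linear program:

\[
\min_{w,\gamma,y}\; \nu\, e^{T} y + \|w\|_{1}
\qquad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\;\; y \ge 0 .
\]

Because the \(\|w\|_1\) term tends to drive many components of w to zero, this formulation also performs feature selection, which is exploited in Part 2.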
Unlabeled Data Classification
- Given a completely unlabeled large dataset.
- It is costly to have points labeled by an expert or an oracle.
- Two questions arise:
  - How to choose a small subset for labeling?
  - How to combine labeled and unlabeled data?
- Answers:
  - Use k-median clustering to select representative points to be labeled.
  - Use a semi-supervised SVM to obtain a classifier based on the labeled and unlabeled data.
Unlabeled Data Classification
Flowchart: Unlabeled Data Set → k-Median clustering → Chosen Data and Remaining Data; Chosen Data → Expert → Labeled Data; Labeled Data + Remaining Data → Semi-supervised SVM → Separating Plane
K-Median Clustering Algorithm
- Given m data points, find k cluster centers such that the sum of the 1-norm distances from each point to its closest cluster center is minimized; a sketch of the standard alternating algorithm is given below.
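The algorithm slides themselves were not transcribed. The standard alternating k-median iteration assigns each point to its nearest center in the 1-norm and then resets each center to the coordinate-wise median of its cluster (the median minimizes a sum of 1-norm distances). The following NumPy sketch is my illustration, not code from the talk:

```python
import numpy as np

def k_median(X, k, max_iter=100, seed=0):
    """Alternating k-median clustering with 1-norm distances (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center in the 1-norm.
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stabilized
        labels = new_labels
        # Update step: coordinate-wise median minimizes the 1-norm objective.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = np.median(members, axis=0)
    return centers, labels
```

In the pipeline above, a representative point per cluster (e.g., the point nearest each center) would then be handed to the expert for labeling.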
Semi-supervised SVM (S3VM)
- Given a dataset consisting of:
  - labeled points (classes +1 and -1), and
  - unlabeled points.
- Classify the data into two classes as follows:
  - Assign each unlabeled point to a class (+1 or -1) so as to maximize the distance between the bounding planes obtained by a linear SVM applied to the entire dataset.
Formulation
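The formulation itself was not transcribed. A sketch of the kind of linear-programming S3VM meant here (my reconstruction in the spirit of Fung and Mangasarian's semi-supervised SVM, with A holding the labeled points, D their ±1 labels on a diagonal, B the unlabeled points, and ν, μ positive weights):

\[
\min_{w,\gamma,y,r,s}\;
\nu\, e^{T} y \;+\; \mu\, e^{T}\min(r,s) \;+\; \|w\|_{1}
\]
\[
\text{s.t.}\quad
D(Aw - e\gamma) + y \ge e,\qquad
(Bw - e\gamma) + r \ge e,\qquad
-(Bw - e\gamma) + s \ge e,\qquad
y,\, r,\, s \ge 0 .
\]

For each unlabeled point, \(r_j\) and \(s_j\) are its errors when assigned to class +1 and -1 respectively; charging only the smaller of the two, \(\min(r_j, s_j)\), lets the optimization choose the class assignment, and this is the concave term discussed on the next slide.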
A Concave Approach
- The minimum-error term in the objective function is concave because it is the minimum of two linear functions.
- A local solution to this problem is obtained by solving a succession of linear programs (typically 4 to 7), as sketched below.
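A sketch of the successive-linearization step (my summary of standard concave-minimization practice, not text from the slides): because each term \(\min(r_j, s_j)\) is piecewise linear and concave, at the current iterate i it is replaced by whichever linear piece is active there, and the resulting linear program yields the next iterate:

\[
\min(r_j, s_j) \;\longrightarrow\;
\begin{cases} r_j & \text{if } r_j^{\,i} \le s_j^{\,i},\\ s_j & \text{otherwise.} \end{cases}
\]

Each such linear program does not increase the original objective; since the objective is bounded below and the feasible region is polyhedral, the iteration terminates in finitely many steps at a local solution.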
S3VM Graphical Example: Separate Triangles from Circles
Figure: hollow shapes represent labeled data, solid shapes represent unlabeled data; the SVM and S3VM separating planes are shown for comparison.
Numerical Tests
Part 2: Data Selection for Support Vector Machine Classifiers
Flowchart: Labeled dataset → 1-norm SVM feature selection → Smaller-dimension dataset → Support vector suppression (MSVM) → Separating surface
Support Vectors
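The illustration on this slide is not in the transcript. As a reminder (my wording, using the 1-norm SVM notation sketched earlier, with u the multipliers of the classification constraints), complementary slackness gives

\[
u_i > 0 \;\Longrightarrow\; D_{ii}(A_i w - \gamma) + y_i = 1 ,
\]

so the support vectors are the data points \(A_i\) with \(u_i > 0\); only these points determine the separating plane, and the MSVM of Part 2 aims to make this set as small as possible.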
Feature Selection using the 1-norm Linear SVM
- The 1-norm term in the objective drives many components of w to zero; input features with zero weight are discarded, yielding the smaller-dimension dataset passed to the MSVM.
Motivation for the Minimal Support Vector Machine (MSVM)
- Suppression of the error term y:
  - Minimizes the number of misclassified points.
  - Works remarkably well computationally.
  - Reduces the positive components of the multiplier u, and hence the number of support vectors.
MSVM Formulation
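The formulation itself was not transcribed. One plausible sketch, based on the motivation bullets above and on Mangasarian's concave-minimization machinery (a reconstruction, not the slide's exact formulation): augment the 1-norm SVM objective with a concave penalty on the step vector \(y_*\) of the error y, which counts the points violating their bounding plane,

\[
\min_{w,\gamma,y}\;\; \nu\, e^{T} y \;+\; \mu\, e^{T} y_{*} \;+\; \|w\|_{1}
\qquad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\;\; y \ge 0,
\]

where \((y_*)_i = 1\) if \(y_i > 0\) and \(0\) otherwise. Since the step function is discontinuous, it is typically replaced by a smooth concave approximation such as \(e^{T}\bigl(e - \exp(-\alpha y)\bigr)\), \(\alpha > 0\), and minimized by the same successive-linear-programming scheme used for the S3VM. As the motivation slide notes, suppressing y also suppresses the positive multipliers u, so the separating plane ends up depending on a minimal set of data points.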
Numerical Tests
Conclusions
- Unlabeled data classification:
  - A fast, finite, linear-programming-based approach for Semi-supervised Support Vector Machines was proposed for classifying large datasets that are mostly unlabeled.
  - Totally unlabeled datasets were classified by:
    - labeling a small percentage of the data (cluster representatives) by an expert;
    - classification by a semi-supervised SVM.
  - Test set correctness was within 5.2% of a linear SVM trained on the entire dataset labeled by an expert.
Conclusions
- Data selection for SVM classifiers:
  - The Minimal SVM (MSVM) extracts a minimal subset of the data that is used to classify the entire dataset.
  - MSVM maintains or improves generalization over other classifiers that use the entire dataset.
  - Data reduction was as high as 81% and averaged 66% over seven public datasets.
- Future work:
  - MSVM is a promising tool for incremental algorithms.
  - Improve chunking algorithms with MSVM.
  - A nonlinear MSVM has strong potential for time and storage reduction.