Title: Incremental Support Vector Machine Classification, Second SIAM International Conference on Data Mining
1 Incremental Support Vector Machine Classification
Second SIAM International Conference on Data Mining
Arlington, Virginia, April 11-13, 2002
- Glenn Fung, Olvi Mangasarian
Data Mining Institute, University of Wisconsin - Madison
2 Key Contributions
- Fast incremental classifier based on PSVM, the Proximal Support Vector Machine
- Capable of modifying an existing linear classifier by both adding and retiring data
- Extremely simple to implement
- Small memory requirement, even for huge problems (1 billion points)
- NO optimization packages (LP, QP) needed
3 Outline of Talk
- (Standard) support vector machines (SVM)
  - Classification by halfspaces
- Proximal linear support vector machines (PSVM)
  - Classification by proximity to planes
- The incremental and decremental algorithm
  - Option of keeping or retiring old data
- Numerical results
  - 1 billion points in 10-dimensional space classified in less than 3 hours!
  - Numerical results confirm that algorithm time is linear in the number of data points
4 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two parallel bounding planes with the margin between them maximized]
5 Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[Figure: point sets A+ and A- clustered around two parallel proximal planes]
6 Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
7 Standard Support Vector Machine Formulation
8 PSVM Formulation
We start from the standard QP SVM formulation.
This simple but critical modification changes the nature of the optimization problem tremendously!
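The slide's displayed equations are not reproduced here; as a sketch following the published PSVM formulation of Fung and Mangasarian (with diagonal label matrix D, vector of ones e, slack y, and penalty parameter ν), the modification can be written as:

```latex
% Standard SVM (QP): 1-norm error, inequality constraints
\min_{w,\gamma,y}\ \nu\, e^{\top}y + \tfrac{1}{2}\, w^{\top}w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0.

% PSVM: squared 2-norm error, \gamma^2 added to the margin term,
% and the inequalities replaced by equalities
\min_{w,\gamma,y}\ \tfrac{\nu}{2}\, \|y\|^{2}
  + \tfrac{1}{2}\,\bigl(w^{\top}w + \gamma^{2}\bigr)
\quad \text{s.t.} \quad D(Aw - e\gamma) + y = e.
```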
9 Advantages of New Formulation
- Objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
- Exact leave-one-out correctness can be obtained in terms of the problem data.
10 Linear PSVM
- Setting the gradient equal to zero gives a nonsingular system of linear equations.
- Solution of the system gives the desired PSVM classifier.
11 Linear PSVM Solution
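The solution slide's equation is sketched here from the PSVM derivation: substituting y = e - D(Aw - eγ) into the objective and setting the gradient to zero gives

```latex
\left( \frac{I}{\nu} + E^{\top}E \right)
\begin{bmatrix} w \\ \gamma \end{bmatrix}
= E^{\top} D e,
\qquad
E = \begin{bmatrix} A & -e \end{bmatrix},
```

an (n+1)-by-(n+1) system whose size is independent of the number of data points m once E'E and E'De are formed.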
12 Linear Proximal SVM Algorithm
13 Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu)
[m,n] = size(A); e = ones(m,1); H = [A -e];
v = (d'*H)';                    % v = H'*D*e
r = (speye(n+1)/nu + H'*H)\v;   % solve (I/nu + H'*H)*r = v
w = r(1:n); gamma = r(n+1);     % getting w, gamma from r
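As a cross-check, here is a minimal NumPy re-expression of the same linear PSVM computation; the toy data at the bottom is illustrative and not from the talk.

```python
import numpy as np

def psvm(A, d, nu):
    """Linear PSVM: solve (I/nu + H'H) r = H'De with H = [A, -e].

    A : m-by-n data matrix; d : +1/-1 label vector (d = diag(D));
    nu: penalty parameter. Returns (w, gamma) for the plane x'w = gamma.
    """
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])   # H = [A  -e]
    v = H.T @ d                            # v = H'*D*e, since D*e = d
    r = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return r[:n], r[n]                     # w, gamma

# Tiny illustrative example: two linearly separable 2-D clusters.
A = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, -2.0], [-3.0, -2.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = psvm(A, d, nu=10.0)
pred = np.sign(A @ w - gamma)   # classify the training points
```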
14 Incremental PSVM Classification
15 Linear Incremental Proximal SVM Algorithm
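The algorithm's equations did not survive extraction; a sketch, assuming the data arrives in blocks with rows A_i, label matrices D_i, and E_i = [A_i  -e_i]:

```latex
E^{\top}E = \sum_{i=1}^{s} E_i^{\top}E_i, \qquad
E^{\top}De = \sum_{i=1}^{s} E_i^{\top}D_i e_i, \qquad
\begin{bmatrix} w \\ \gamma \end{bmatrix}
= \left( \frac{I}{\nu} + E^{\top}E \right)^{-1} E^{\top}De .
```

Each block contributes only an (n+1)-by-(n+1) matrix and an (n+1)-vector, so these small running sums are all that must be kept in memory.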
16 Linear Incremental Proximal SVM: Adding & Retiring Data
- Capable of modifying an existing linear classifier by both adding and retiring data
- Option of retiring old data is similar to adding new data
  - Financial data: old data is obsolete
- Option of keeping old data and merging it with the new data
  - Medical data: old data does not obsolesce
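The add/retire bookkeeping can be sketched in NumPy; the class name and data below are hypothetical, but the update rule (adding or subtracting each block's E_i'E_i and E_i'd_i contribution) follows the slide's description.

```python
import numpy as np

class IncrementalPSVM:
    """Incremental linear PSVM sketch: only M = sum of E_i'E_i (size
    (n+1)x(n+1)) and v = sum of E_i'd_i are kept; raw blocks are
    discarded after their contributions are accumulated."""

    def __init__(self, n, nu):
        self.M = np.zeros((n + 1, n + 1))
        self.v = np.zeros(n + 1)
        self.nu = nu

    @staticmethod
    def _contribution(A, d):
        E = np.hstack([A, -np.ones((A.shape[0], 1))])   # E_i = [A_i  -e]
        return E.T @ E, E.T @ d

    def add_block(self, A, d):
        Mi, vi = self._contribution(A, d)
        self.M += Mi
        self.v += vi

    def retire_block(self, A, d):      # block must have been added earlier
        Mi, vi = self._contribution(A, d)
        self.M -= Mi
        self.v -= vi

    def classifier(self):
        k = self.M.shape[0]
        r = np.linalg.solve(np.eye(k) / self.nu + self.M, self.v)
        return r[:-1], r[-1]           # w, gamma

# Illustrative daily simulation: two well-separated synthetic blocks.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(50, 2)) + [3.0, 3.0]; d1 = np.ones(50)
A2 = rng.normal(size=(50, 2)) - [3.0, 3.0]; d2 = -np.ones(50)
inc = IncrementalPSVM(n=2, nu=1.0)
inc.add_block(A1, d1)
inc.add_block(A2, d2)
w, gamma = inc.classifier()
```

Retiring a block simply subtracts the same contribution it added, so after `inc.retire_block(A1, d1)` the stored sums match a fresh pass over the remaining data.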
17 Numerical Experiments: One-Billion-Point Two-Class Dataset
- Synthetic dataset consisting of 1 billion points in 10-dimensional input space
- Generated by the NDC (Normally Distributed Clustered) dataset generator
- Dataset divided into 500 blocks of 2 million points each
- Solution obtained in less than 2 hours and 26 minutes
- About 30% of the time was spent reading data from disk
- Testing set correctness: 90.79%
18 Numerical Experiments: Simulation of a Two-Month 60-Million-Point Dataset
- Synthetic dataset consisting of 60 million points (1 million per day) in 10-dimensional input space
- Generated using NDC
- At the beginning, we only have data corresponding to the first month
- Every day:
  - The oldest block of data is retired (1 million points)
  - A new block is added (1 million points)
  - A new linear classifier is calculated
- Only an 11-by-11 matrix is kept in memory at the end of each day; all other data is purged.
19 Numerical Experiments: Separator Changing through Time
20 Numerical Experiments: Normals to the Separating Hyperplanes Corresponding to 5-Day Intervals
21 Conclusion
- The proposed algorithm is an extremely simple procedure for generating linear classifiers in an incremental fashion for huge datasets.
- The linear classifier is obtained by solving a single system of linear equations in the small-dimensional input space.
- The proposed algorithm has the ability to retire old data and add new data in a very simple manner.
- Only a matrix of the size of the input space is kept in memory at any time.
22 Future Work
- Extension to nonlinear classification
- Parallel formulation and implementation on remotely located servers for massive datasets
- Real-time online applications, e.g., fraud detection