Knowledge-Based Support Vector Machine Classifiers

Provided by: Mangas

Learn more at: http://pages.cs.wisc.edu
1
Knowledge-Based Support Vector Machine Classifiers
NIPS 2002, Vancouver, December 9-14, 2002

Glenn Fung
Olvi Mangasarian
Jude Shavlik

University of Wisconsin-Madison
2
Outline of Talk

Support Vector Machine (SVM) Classifiers

Standard Quadratic Programming formulation

Linear Programming formulation: 1-norm linear SVM

Polyhedral Knowledge Sets

Knowledge-Based SVMs

Incorporating knowledge sets into a classifier

Empirical Evaluation

The DNA promoter dataset

Wisconsin breast cancer prognosis dataset

Conclusion

3
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes]
4
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes]
5
Algebra of the Classification Problem: 2-Category Linearly Separable Case

Given m points in n-dimensional space

Represented by an m-by-n matrix A

More succinctly

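A minimal sketch of the matrix form this slide points to, assuming the notation common in the 1-norm SVM literature: labels sit on the diagonal of an m-by-m matrix D, e is a vector of ones, and the separating plane is x'w = gamma. The symbols D, e, w and gamma and the toy data are assumptions for illustration, not taken from the transcript.

```python
import numpy as np

# Four points in the plane (m = 4, n = 2), one row of A per point.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
# Assumed notation: labels on the diagonal of D, +1 for A+ and -1 for A-.
D = np.diag([1.0, 1.0, -1.0, -1.0])
e = np.ones(4)

# A hypothetical separating plane x'w = gamma.
w = np.array([0.5, 0.5])
gamma = 0.0

# A+ points must satisfy A_i w >= gamma + 1 and A- points A_i w <= gamma - 1.
# Both conditions collapse into the single inequality D(Aw - e*gamma) >= e.
margins = D @ (A @ w - e * gamma)
separable = bool(np.all(margins >= e))
```

Here every entry of `margins` is at least 1, so the plane separates the two classes with both bounding planes respected.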
6
Support Vector Machines: Quadratic Programming Formulation

Solve the following quadratic program
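
The quadratic program itself is not reproduced in the transcript. A sketch of the standard soft-margin form, assuming a label matrix D, a vector of ones e, a slack vector y, and a trade-off weight \nu (all assumed notation):

```latex
\min_{w,\,\gamma,\,y}\; \nu\, e^{\top} y + \tfrac{1}{2}\, w^{\top} w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .
```

Minimizing w'w maximizes the distance 2/||w||_2 between the bounding planes x'w = \gamma + 1 and x'w = \gamma - 1, while \nu e'y penalizes points that violate them.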

7
Support Vector Machines: Linear Programming Formulation

Use the 1-norm instead of the 2-norm

This is equivalent to the following linear
program
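
A minimal sketch of how the 1-norm SVM can be posed as a linear program, assuming the objective nu*e'y + ||w||_1 with margin constraints D(Aw - e*gamma) + y >= e. The 1-norm is linearized with an auxiliary vector t satisfying -t <= w <= t. The function name `one_norm_svm`, the variable ordering, and the toy data are hypothetical; `scipy.optimize.linprog` is used as a generic LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(A, d, nu=1.0):
    """Solve min nu*e'y + e't  s.t.  D(Aw - e*gamma) + y >= e, -t <= w <= t, y >= 0."""
    m, n = A.shape
    D = np.diag(d)
    # Variable order: w (n), gamma (1), y (m), t (n).
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # Margin constraints rewritten as -(DA)w + d*gamma - y <= -e.
    margin = np.hstack([-D @ A, d.reshape(-1, 1), -np.eye(m), np.zeros((m, n))])
    # w - t <= 0 and -w - t <= 0 together give |w_j| <= t_j.
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    # w and gamma are free; y and t are nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[n]

A = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = one_norm_svm(A, d)
predictions = np.sign(A @ w - gamma)
```

On this separable toy set the predicted signs match the labels in `d`. A practical benefit of the 1-norm is that it tends to drive components of w to zero, which acts as feature selection.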

8
Knowledge-Based SVM via Polyhedral Knowledge
Sets
9
Incorporating Knowledge Sets Into an SVM
Classifier

Will show that this implication is equivalent to
a set of constraints that can be imposed on the
classification problem.

10
Knowledge Set Equivalence Theorem
11
Proof of Equivalence Theorem (via Nonhomogeneous Farkas or LP Duality)
Proof by LP duality
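
The duality argument can be illustrated numerically. The sketch below assumes a nonempty polyhedral knowledge set {x | Bx <= b} and the implication Bx <= b => x'w >= gamma + 1. The symbols B, b and u and the box-shaped example set are assumptions, since the slide's formulas are not in the transcript. The implication holds exactly when the primal LP min{x'w | Bx <= b} has value at least gamma + 1, and LP duality turns this into the existence of u >= 0 with B'u + w = 0 and b'u + gamma + 1 <= 0.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical knowledge set: the box 1 <= x1 <= 2, 1 <= x2 <= 2, written as Bx <= b.
B = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([2.0, -1.0, 2.0, -1.0])

# Hypothetical classifier x'w = gamma.
w = np.array([1.0, 1.0])
gamma = 0.5

# Primal check: smallest value of x'w over the knowledge set (x is free).
primal = linprog(w, A_ub=B, b_ub=b, bounds=[(None, None)] * 2, method="highs")
implication_holds = primal.fun >= gamma + 1 - 1e-9

# Dual certificate: minimize b'u subject to B'u = -w, u >= 0.
dual = linprog(b, A_eq=B.T, b_eq=-w, bounds=[(0, None)] * 4, method="highs")
certificate_exists = dual.status == 0 and dual.fun + gamma + 1 <= 1e-9
```

Both checks agree: the primal minimum is 2, which exceeds gamma + 1 = 1.5, and the dual minimum of b'u is -2, so b'u + gamma + 1 = -0.5 <= 0. Because the certificate conditions are linear in (w, gamma, u), they can be added as constraints to the classifier's linear program.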
12
Knowledge-Based SVM Classification
13
Knowledge-Based SVM Classification
14
Knowledge-Based LP with Slack Variables: Minimize Error in Knowledge Set Constraints Satisfaction
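
The linear program for this slide is not in the transcript. The following is a sketch only, assuming one knowledge set {x | Bx <= b} for class +1 and one knowledge set {x | Cx <= c} for class -1, with slack variables r, \rho, s, \sigma on the knowledge constraints and weights \nu and \mu. All of these symbols are assumed notation:

```latex
\begin{aligned}
\min_{w,\gamma,y,u,v,r,\rho,s,\sigma}\quad
  & \nu\, e^{\top} y + \mu\,\bigl(e^{\top} r + \rho + e^{\top} s + \sigma\bigr) + \|w\|_1 \\
\text{s.t.}\quad
  & D(Aw - e\gamma) + y \ge e, \\
  & -r \le B^{\top} u + w \le r, \qquad b^{\top} u + \gamma + 1 \le \rho, \\
  & -s \le C^{\top} v - w \le s, \qquad c^{\top} v - \gamma + 1 \le \sigma, \\
  & y,\, u,\, v,\, r,\, \rho,\, s,\, \sigma \ge 0 .
\end{aligned}
```

With all knowledge slacks at zero the knowledge constraints hold exactly. A large \mu pushes the solution toward satisfying the prior knowledge, and a large \nu pushes it toward fitting the labeled data.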
15
Knowledge-Based SVM via Polyhedral Knowledge
Sets
16
Empirical Evaluation: The Promoter Recognition Dataset

Promoter: a short DNA sequence that precedes a gene sequence.
A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}.
It is important to distinguish between promoters and nonpromoters.
This distinction identifies starting locations of genes in long uncharacterized DNA sequences.

17
The Promoter Recognition Dataset: Numerical Representation

Using 1-of-4 representation

57 nominal values
57 x 4 = 228 binary values
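
A minimal sketch of the 1-of-4 encoding, turning each of the 57 nucleotides into four binary values. The column order A, G, C, T, the function name `encode`, and the sample sequence are assumptions for illustration; the dataset's actual column order is not given here.

```python
import numpy as np

ALPHABET = "AGCT"  # assumed column order

def encode(seq):
    """Map a nucleotide string to a flat 0/1 vector with 4 entries per position."""
    out = np.zeros((len(seq), len(ALPHABET)), dtype=int)
    for i, ch in enumerate(seq):
        out[i, ALPHABET.index(ch)] = 1
    return out.ravel()

# A made-up sequence of 57 nucleotides.
x = encode("ACGT" * 14 + "A")
```

Each position contributes exactly one 1, so the encoded vector has 228 entries, 57 of which are set.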
18
Promoter Recognition Dataset: Prior Knowledge Rules

Prior knowledge consists of the following 64 rules R_i

19
Promoter Recognition Dataset: Sample Rules
20
The Promoter Recognition Dataset: Comparative Test Results
21
Wisconsin Breast Cancer Prognosis Dataset: Description of the Data

110 instances, corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
32 numerical features
The domain theory: two simple rules used by doctors

22
Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results

The doctors' rules are applicable to only 32 out of 110 patients.
Only 22 of those 32 patients are classified correctly by the rules.
The KSVM linear classifier is applicable to all patients, with correctness of 66.4%.
This correctness is comparable to the best available results using conventional SVMs.
KSVM can produce classifiers based on knowledge alone, without using any data.

23
Conclusion

Prior knowledge is easily incorporated into classifiers through polyhedral knowledge sets.
The resulting problem is a simple linear program.
Knowledge sets can be used with or without conventional labeled data.
In either case, KSVM is better than most classifiers tested.

24
Future Research