1
Knowledge-Based Support Vector Machine
Classifiers
NIPS 2002, Vancouver, December 9-14, 2002
  • Glenn Fung
  • Olvi Mangasarian
  • Jude Shavlik

University of Wisconsin-Madison
2
Outline of Talk
  • Support Vector Machine (SVM) Classifiers
      • Standard Quadratic Programming formulation
      • Linear Programming formulation: 1-norm linear
        SVM
  • Polyhedral Knowledge Sets
  • Knowledge-Based SVMs
      • Incorporating knowledge sets into a classifier
  • Empirical Evaluation
      • The DNA promoter dataset
      • Wisconsin breast cancer prognosis dataset
  • Conclusion

3
Support Vector Machines: Maximizing the Margin
between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes]
5
Algebra of the Classification Problem: 2-Category
Linearly Separable Case
  • Given m points in n-dimensional space
  • Represented by an m-by-n matrix A
  • More succinctly
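
The equations on this slide did not survive in the transcript. A sketch in the notation commonly used for this formulation is given below. The symbols are assumptions, since none of them appear in the surviving text: $D$ is an m-by-m diagonal matrix of labels $\pm 1$, $e$ is a vector of ones, $w$ is the normal to the bounding planes and $\gamma$ fixes their location.

```latex
\[
A_i w \ge \gamma + 1 \ \text{ for } D_{ii} = +1,
\qquad
A_i w \le \gamma - 1 \ \text{ for } D_{ii} = -1,
\]
\[
\text{or, more succinctly:}\qquad D(Aw - e\gamma) \ge e .
\]
```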

6
Support Vector Machines: Quadratic Programming
Formulation
  • Solve the following quadratic program
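
The program itself is missing from the transcript. A standard soft-margin form is sketched here as an assumption, not as a transcription of the slide: $y \ge 0$ is a vector of slack variables, $\nu > 0$ weights the classification error, and $D$, $e$, $w$, $\gamma$ are the usual label matrix, vector of ones, plane normal and threshold.

```latex
\[
\min_{w,\,\gamma,\,y}\ \ \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .
\]
```

Minimizing $w'w$ maximizes the distance $2/\|w\|_2$ between the bounding planes $x'w = \gamma \pm 1$.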

7
Support Vector Machines: Linear Programming
Formulation
  • Use the 1-norm instead of the 2-norm
  • This is equivalent to the following linear
    program
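
The linear program is not preserved in the transcript. A common way to write the 1-norm SVM as an LP, assumed here, is to minimize `nu * e'y + e't` subject to `D(Aw - e*gamma) + y >= e`, `-t <= w <= t` and `y >= 0`, where `t` bounds `|w|` componentwise. The sketch below does not solve the LP. It only evaluates that objective for a given plane, using the fact that for fixed `w` and `gamma` the smallest feasible slacks are `y_i = max(0, 1 - d_i (A_i w - gamma))` and the smallest feasible `t` is `|w|`. The function name and the toy data are hypothetical.

```python
def one_norm_svm_objective(A, d, w, gamma, nu=1.0):
    """Return (objective, slacks) of the assumed 1-norm SVM LP for a fixed plane.

    A: list of points (rows), d: labels in {+1, -1}, w: plane normal,
    gamma: threshold, nu: weight on the total slack.
    """
    slacks = []
    for row, label in zip(A, d):
        # d_i * (A_i w - gamma) must be >= 1 for a point outside the margin
        margin = label * (sum(a * wj for a, wj in zip(row, w)) - gamma)
        slacks.append(max(0.0, 1.0 - margin))  # smallest y_i that restores feasibility
    t = [abs(wj) for wj in w]                  # smallest t with -t <= w <= t
    return nu * sum(slacks) + sum(t), slacks


# Toy data (made up): two points per class in the plane.
A = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [0.0, 0.0]]
d = [1, 1, -1, -1]
objective, slacks = one_norm_svm_objective(A, d, w=[1.0, 0.0], gamma=0.0)
print(objective, slacks)
```

Only the last point, which lies on the separating plane itself, needs a nonzero slack. An LP solver would search over `w` and `gamma` to make this objective as small as possible.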

8
Knowledge-Based SVM via Polyhedral Knowledge
Sets
9
Incorporating Knowledge Sets Into an SVM
Classifier
  • Will show that this implication is equivalent to
    a set of constraints that can be imposed on the
    classification problem.
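
The implication this bullet refers to is not preserved in the transcript. Assuming a polyhedral knowledge set $\{x \mid Bx \le b\}$ whose points are believed to belong to class +1 (the names $B$ and $b$ are assumptions), it can be written as:

```latex
\[
Bx \le b \ \Longrightarrow\ x'w \ge \gamma + 1 ,
\]
```

that is, the whole knowledge set must lie on the +1 side of the bounding plane $x'w = \gamma + 1$.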

10
Knowledge Set Equivalence Theorem
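
The theorem statement is missing from the transcript. In the form usually given for this result, and assuming the knowledge set `{x | Bx <= b}` is nonempty, the implication `Bx <= b  =>  x'w >= gamma + 1` holds exactly when some vector `u >= 0` satisfies `B'u + w = 0` and `b'u + gamma + 1 <= 0`. These conditions are linear in `(w, gamma, u)`, which is what lets them be added to a linear program. The sketch below only checks whether a given `u` is such a certificate. The function name, the tolerance and the one-dimensional example are hypothetical.

```python
def is_knowledge_certificate(B, b, w, gamma, u, tol=1e-9):
    """Check u >= 0, B'u + w = 0 and b'u + gamma + 1 <= 0, all within tol."""
    nonnegative = all(ui >= -tol for ui in u)
    # j-th component of B'u + w must vanish
    stationarity = all(
        abs(sum(B[i][j] * u[i] for i in range(len(B))) + w[j]) <= tol
        for j in range(len(w))
    )
    bound = sum(bi * ui for bi, ui in zip(b, u)) + gamma + 1 <= tol
    return nonnegative and stationarity and bound


# Knowledge set {x | -x <= -3}, i.e. x >= 3, in one dimension.
B, b = [[-1.0]], [-3.0]
inside = is_knowledge_certificate(B, b, w=[1.0], gamma=2.0, u=[1.0])   # half-space x >= 3
outside = is_knowledge_certificate(B, b, w=[1.0], gamma=3.0, u=[1.0])  # half-space x >= 4
print(inside, outside)
```

In the first call the knowledge set coincides with the half-space, so a certificate exists. In the second, the point x = 3 is in the knowledge set but not in the half-space, and the only `u` satisfying `B'u + w = 0` violates the bound.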
11
Proof of Equivalence Theorem (via Nonhomogeneous
Farkas or LP Duality)
Proof by LP Duality
12
Knowledge-Based SVM Classification
13
Knowledge-Based SVM Classification
14
Knowledge-Based LP with Slack Variables: Minimize
Error in Knowledge Set Constraint Satisfaction
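
The linear program on this slide is not preserved. One way to write a program matching the slide title is sketched below, under stated assumptions: knowledge sets $\{x \mid B^i x \le b^i\}$, $i = 1,\dots,k$, for class +1 and $\{x \mid C^j x \le c^j\}$, $j = 1,\dots,\ell$, for class -1; slack variables $r^i, \rho^i, s^j, \sigma^j$ that measure violation of the knowledge constraints; and a weight $\mu > 0$ on those slacks. All of these names are assumptions, and this is a reconstruction rather than a transcription.

```latex
\[
\begin{aligned}
\min_{w,\,\gamma,\,t,\;(y,\,u^i,\,r^i,\,\rho^i,\,v^j,\,s^j,\,\sigma^j)\,\ge\,0}\quad
 & \nu\, e'y
   + \mu \Big( \sum_{i=1}^{k} \big(e'r^i + \rho^i\big)
             + \sum_{j=1}^{\ell} \big(e's^j + \sigma^j\big) \Big)
   + e't \\
\text{s.t.}\quad
 & D(Aw - e\gamma) + y \ge e, \qquad -t \le w \le t, \\
 & -r^i \le B^{i\prime} u^i + w \le r^i, \qquad b^{i\prime} u^i + \gamma + 1 \le \rho^i, \quad i = 1,\dots,k, \\
 & -s^j \le C^{j\prime} v^j - w \le s^j, \qquad c^{j\prime} v^j - \gamma + 1 \le \sigma^j, \quad j = 1,\dots,\ell .
\end{aligned}
\]
```

Setting all knowledge slacks to zero recovers the exact knowledge constraints; positive slacks let the data override knowledge that cannot be satisfied exactly.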
15
Knowledge-Based SVM via Polyhedral Knowledge
Sets
16
Empirical Evaluation: The Promoter Recognition
Dataset
  • Promoter: a short DNA sequence that precedes a
    gene sequence.
  • A promoter consists of 57 consecutive DNA
    nucleotides belonging to {A, G, C, T}.
  • Important to distinguish between promoters and
    nonpromoters
  • This distinction identifies starting locations
    of genes in long uncharacterized DNA sequences.

17
The Promoter Recognition Dataset: Numerical
Representation
  • Using 1-of-4 representation

57 nominal values
57 x 4 = 228 binary values
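
A minimal sketch of the 1-of-4 representation: each nucleotide becomes four binary values, exactly one of which is 1, so a 57-nucleotide sequence becomes 228 binary values. The column order `AGCT` and the example sequence are assumptions; the slide only names the alphabet.

```python
NUCLEOTIDES = "AGCT"  # assumed column order; the slide does not specify one


def one_of_four(sequence):
    """Encode each nucleotide as 4 binary values with a single 1."""
    encoded = []
    for nucleotide in sequence.upper():
        position = NUCLEOTIDES.index(nucleotide)  # ValueError for symbols outside A, G, C, T
        encoded.extend(1 if k == position else 0 for k in range(4))
    return encoded


# Arbitrary 57-nucleotide string for illustration, not taken from the dataset.
example = one_of_four("ACGT" * 14 + "A")
print(len(example), sum(example))
```

The encoded vector has length 228 and contains exactly 57 ones, one per original position.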
18
Promoter Recognition Dataset: Prior Knowledge
Rules
  • Prior knowledge consists of the following 64
    rules R_i

19
Promoter Recognition Dataset: Sample Rules
20
The Promoter Recognition Dataset: Comparative Test
Results
21
Wisconsin Breast Cancer Prognosis Dataset:
Description of the Data
  • 110 instances corresponding to 41 patients
    whose cancer had recurred and 69 patients whose
    cancer had not recurred
  • 32 numerical features
  • The domain theory: two simple rules used by
    doctors

22
Wisconsin Breast Cancer Prognosis Dataset:
Numerical Testing Results
  • Doctors' rules applicable to only 32 out of 110
    patients.
  • Only 22 of 32 patients are classified correctly
    by these rules.
  • KSVM linear classifier applicable to all patients
    with correctness of 66.4%.
  • Correctness comparable to best available
    results using conventional SVMs.
  • KSVM can get classifiers based on knowledge
    without using any data.
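
The figures above can be put on a common footing with a little arithmetic. The sketch assumes that 66.4 is a percentage of all 110 patients; the implied patient count is derived from it and is not stated on the slide.

```python
total_patients = 110
rule_covered = 32   # patients the doctors' rules apply to
rule_correct = 22   # of those, patients the rules classify correctly

rule_coverage = rule_covered / total_patients   # share of patients the rules can classify at all
rule_accuracy = rule_correct / rule_covered     # accuracy on the covered patients only
ksvm_correct = round(0.664 * total_patients)    # implied count, assuming 66.4% of all 110

print(f"rule coverage: {rule_coverage:.3f}")
print(f"rule accuracy on covered patients: {rule_accuracy:.4f}")
print(f"patients KSVM classifies correctly (implied): {ksvm_correct}")
```

The rules are right on 22/32 = 68.75% of the patients they cover but say nothing about the other 78, whereas the reported KSVM correctness corresponds to about 73 of all 110 patients.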

23
Conclusion
  • Prior knowledge easily incorporated into
    classifiers through polyhedral knowledge sets.
  • Resulting problem is a simple linear program.
  • Knowledge sets can be used with or without
    conventional labeled data.
  • In either case, KSVM is better than most
    classifiers tested.

24
Future Research
  • Generate classifiers based on prior expert
    knowledge in various fields
      • Diagnostic rules for various diseases
      • Financial investment rules
      • Intrusion detection rules
  • Extend knowledge sets to general convex sets
  • Nonlinear kernel classifiers. Challenges:
      • Express prior knowledge nonlinearly
      • Extend equivalence theorem to general convex
        sets

25
Web Pages