Title: PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM

1. PEGASOS: Primal Efficient sub-GrAdient SOlver for SVM
YASSO: Yet Another Svm SOlver
- Shai Shalev-Shwartz 
- Yoram Singer 
- Nati Srebro
The Hebrew University, Jerusalem, Israel
2. Support Vector Machines
- QP form
- More natural form: a regularization term plus an empirical loss (see the sketch below)
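The formulas on this slide are not legible in this transcript; a sketch of the two standard formulations presumably shown here (notation mine):

```latex
% QP form (soft-margin SVM):
\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i\bigl(\langle \mathbf{w}, \mathbf{x}_i \rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0.

% More natural form: regularization term + empirical (hinge) loss:
\min_{\mathbf{w}} \;
  \underbrace{\frac{\lambda}{2}\|\mathbf{w}\|^2}_{\text{regularization term}}
  + \underbrace{\frac{1}{m} \sum_{i=1}^{m} \max\{0,\, 1 - y_i \langle \mathbf{w}, \mathbf{x}_i \rangle\}}_{\text{empirical loss}}.
```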
3. Outline
- Previous Work 
- The Pegasos algorithm 
- Analysis: faster convergence rates 
- Experiments: outperforms the state of the art 
- Extensions 
- kernels 
- complex prediction problems 
- bias term
4. Previous Work
- Dual-based methods 
  - Interior Point methods 
    - Memory: m², time: m³·log(log(1/ε)) 
  - Decomposition methods 
    - Memory: m, time: super-linear in m 
- Online learning & stochastic gradient 
  - Memory: O(1), time: 1/ε² (linear kernel) 
  - Memory: 1/ε², time: 1/ε⁴ (non-linear kernel) 
  - Typically, online learning algorithms do not converge to the optimal solution of the SVM
- Better rates for finite-dimensional instances (Murata, Bottou)
5. PEGASOS
- At each iteration: pick A_t ⊆ S, take a subgradient step on the objective estimated from A_t, then project onto the ball of radius 1/√λ (see the sketch below)
- A_t = S recovers the subgradient method
- |A_t| = 1 recovers stochastic gradient descent
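To make the update concrete, here is a minimal sketch of the mini-batch Pegasos step for a linear kernel, assuming NumPy; the function name and signature are illustrative, not taken from the released source code:

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, rng=None):
    """Minimal Pegasos sketch for a linear SVM (no bias term).

    X: (m, n) array of examples; y: (m,) labels in {-1, +1};
    lam: regularization parameter lambda; T: number of iterations; k: |A_t|.
    """
    rng = np.random.default_rng(rng)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)                      # step size 1/(lambda * t)
        A = rng.choice(m, size=k, replace=False)   # choose A_t uniformly at random
        margins = y[A] * (X[A] @ w)
        viol = A[margins < 1.0]                    # examples in A_t with positive hinge loss
        # subgradient step on (lam/2)||w||^2 + (1/k) * sum of hinge losses over A_t
        w = (1.0 - eta * lam) * w
        if viol.size > 0:
            w += (eta / k) * (y[viol][:, None] * X[viol]).sum(axis=0)
        # project onto the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```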
6. Run-Time of Pegasos
- Choosing |A_t| = 1 and a linear kernel over R^n
  ⇒ the run-time required for Pegasos to find an ε-accurate solution w.p. ≥ 1−δ is as sketched below
- The run-time does not depend on the number of examples
- It depends only on the difficulty of the problem (λ and ε)
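The run-time expression shown on this slide is not legible in this transcript; as I recall from the Pegasos paper, for a linear kernel over R^n it is, up to logarithmic factors and the dependence on the confidence δ:

```latex
\tilde{O}\!\left(\frac{n}{\lambda\,\epsilon}\right)
```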
7. Formal Properties
- Definition: w is ε-accurate if f(w) ≤ min_w' f(w') + ε
- Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1−δ after at most Õ(1/(δλε)) iterations
- Theorem 2: Pegasos finds log(1/δ) candidate solutions such that, w.p. ≥ 1−δ, at least one of them is ε-accurate after Õ(1/(λε)) iterations per candidate
8. Proof Sketch
A second look at the update step
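The update-step equations on this slide are not legible here; presumably the point is that the update is a projected subgradient step on the instantaneous objective defined by A_t (my reconstruction):

```latex
% Instantaneous objective on the mini-batch A_t:
f(\mathbf{w}; A_t) = \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \frac{1}{|A_t|} \sum_{(\mathbf{x}, y) \in A_t} \max\{0,\, 1 - y\langle \mathbf{w}, \mathbf{x} \rangle\}.

% The Pegasos update is a projected subgradient step on f(\cdot; A_t):
\nabla_t = \lambda \mathbf{w}_t
  - \frac{1}{|A_t|}\sum_{(\mathbf{x}, y) \in A_t \,:\, y\langle \mathbf{w}_t, \mathbf{x} \rangle < 1} y\,\mathbf{x},
\qquad
\mathbf{w}_{t+1} = \Pi_{B}\bigl(\mathbf{w}_t - \eta_t \nabla_t\bigr),
\quad \eta_t = \tfrac{1}{\lambda t},
\quad B = \{\mathbf{w} : \|\mathbf{w}\| \le 1/\sqrt{\lambda}\}.
```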
9. Proof Sketch
- Lemma (free projection) 
- Logarithmic regret for OCP (Hazan et al. '06) 
- Take expectation 
- Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives that, w.p. ≥ 1−δ, the suboptimality is at most 1/δ times its expectation
- Amplify the confidence
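A compressed version of the argument these bullets outline, as I reconstruct it (the displayed bounds on the slide were lost in this transcript):

```latex
% Logarithmic regret for online convex programming (Hazan et al. '06),
% applied to the strongly convex instantaneous objectives f(\cdot; A_t):
\frac{1}{T}\sum_{t=1}^{T} f(\mathbf{w}_t; A_t)
  - \frac{1}{T}\sum_{t=1}^{T} f(\mathbf{w}^\star; A_t)
  \le O\!\left(\frac{\log T}{\lambda T}\right).

% Taking expectation over the random draws of A_t (and of the reported iterate w_r):
\mathbb{E}\bigl[f(\mathbf{w}_r)\bigr] - f(\mathbf{w}^\star) \le O\!\left(\frac{\log T}{\lambda T}\right).

% Since f(\mathbf{w}_r) - f(\mathbf{w}^\star) \ge 0, Markov's inequality gives,
% with probability at least 1 - \delta:
f(\mathbf{w}_r) - f(\mathbf{w}^\star) \le O\!\left(\frac{\log T}{\delta\,\lambda T}\right).
```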
10. Experiments
- 3 datasets (provided by Joachims) 
- Reuters CCAT (800K examples, 47k features) 
- Physics ArXiv (62k examples, 100k features) 
- Covertype (581k examples, 54 features) 
- 4 competing algorithms 
- SVM-light (Joachims) 
- SVM-Perf (Joachims '06) 
- Norma (Kivinen, Smola, Williamson '02) 
- Zhang '04 (stochastic gradient descent) 
- Source code available online
11. Training Time (in seconds)
12. Comparison to Norma (on Physics)
[Plots: objective value and test error]
13. Comparison to Zhang (on Physics)
[Plot: objective value]
But tuning the parameter is more expensive than the learning itself.
14. Effect of k = |A_t| when T is fixed
[Plot: objective value]
15. Effect of k = |A_t| when kT is fixed
[Plot: objective value]
16. I want my kernels!
- Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
- No need to switch to the dual problem
- The number of support vectors is bounded by the number of iterations (see the sketch below)
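A minimal sketch of how the kernelized variant can run entirely in the primal, assuming |A_t| = 1; the count-based representation follows the standard kernelized Pegasos description, and the function name, signature, and kernel are illustrative, not the authors' code:

```python
import numpy as np

def kernel_pegasos(X, y, lam, T, kernel, rng=None):
    """Kernelized Pegasos sketch with |A_t| = 1.

    The weight vector is never formed explicitly; it is represented as
    w_t = (1 / (lam * t)) * sum_j alpha[j] * y[j] * phi(x_j),
    where alpha[j] counts how often example j triggered an update.
    """
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    alpha = np.zeros(m)
    for t in range(1, T + 1):
        i = rng.integers(m)                            # choose A_t = {i} uniformly at random
        idx = np.nonzero(alpha)[0]                     # current support vectors
        k_vals = np.array([kernel(X[i], X[j]) for j in idx])
        # margin of example i under the implicit w_t
        margin = (y[i] / (lam * t)) * np.sum(alpha[idx] * y[idx] * k_vals)
        if margin < 1.0:                               # positive hinge loss: add i as a support vector
            alpha[i] += 1
    return alpha                                       # support vectors are the examples with alpha > 0

# Illustrative usage with a Gaussian kernel:
# rbf = lambda a, b: np.exp(-np.linalg.norm(a - b) ** 2)
# alpha = kernel_pegasos(X, y, lam=0.1, T=1000, kernel=rbf)
```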
17. Complex Decision Problems
- Pegasos works whenever we know how to calculate subgradients of the loss function ℓ(w; (x, y))
- Example: structured output prediction (spelled out below)
- A subgradient is φ(x, ŷ) − φ(x, y), where ŷ is the maximizer in the definition of ℓ
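Spelling out the structured-output example under the usual definition of the structured hinge loss (notation mine):

```latex
% Structured hinge loss for a joint feature map \phi(x, y):
\ell(\mathbf{w}; (x, y)) = \max_{y'} \bigl[ \Delta(y, y')
  + \langle \mathbf{w}, \phi(x, y') \rangle \bigr] - \langle \mathbf{w}, \phi(x, y) \rangle.

% With \hat{y} the maximizer above, a subgradient w.r.t. \mathbf{w} is:
\phi(x, \hat{y}) - \phi(x, y) \in \partial_{\mathbf{w}}\, \ell(\mathbf{w}; (x, y)).
```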
18. Bias Term
- Popular approach: increase the dimension of x (illustrated below). Con: we pay for b in the regularization term
- Calculate subgradients w.r.t. w and w.r.t. b. Con: the convergence rate degrades to 1/ε²
- Define the loss with the bias optimized out. Con: A_t needs to be large
- Search for b in an outer loop. Con: each evaluation of the objective costs 1/ε²
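For the first option above, the standard augmentation trick (my illustration, not taken from the slide) and the reason it penalizes b:

```latex
% Append a constant feature so the bias is absorbed into the weight vector:
x' = (x, 1), \qquad w' = (\mathbf{w}, b), \qquad
\langle w', x' \rangle = \langle \mathbf{w}, x \rangle + b,
% but the regularizer becomes
\frac{\lambda}{2}\|w'\|^2 = \frac{\lambda}{2}\bigl(\|\mathbf{w}\|^2 + b^2\bigr),
% i.e., b is also penalized.
```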
19. Discussion
- Pegasos: a simple and efficient solver for SVM
- Sample vs. computational complexity
  - Sample complexity: how many examples do we need as a function of the VC-dim (λ), accuracy (ε), and confidence (δ)?
  - In Pegasos, we aim at analyzing the computational complexity based on λ, ε, and δ (as also in Bottou & Bousquet)
- Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value