Title: Online Learning with a Memory Harness using the Forgetron
Slide 1: Online Learning with a Memory Harness using the Forgetron

Shai Shalev-Shwartz, joint work with Ofer Dekel and Yoram Singer
The Hebrew University, Jerusalem, Israel
Large Scale Kernel Machines workshop, NIPS 2005, Whistler
Slide 2: Overview

- Online learning with kernels
- Goal: a strict limit on the number of support vectors
- The Forgetron algorithm
- Analysis
- Experiments
Slide 3: Kernel-based Perceptron for Online Learning

- On round t the online learner receives an instance x_t, predicts sign(f_t(x_t)), and then observes the correct label y_t
- Current classifier: f_t(x) = Σ_{i ∈ I} y_i K(x_i, x)
- The current active set I holds the indices of the rounds on which a prediction mistake was made; each new mistake adds its round to I (e.g. I = {1, 3} becomes I = {1, 3, 4} after a mistake on round 4)
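The protocol above can be sketched as a short program. This is a minimal illustration, not code from the talk; the Gaussian kernel and the toy stream are illustrative choices (note the Gaussian kernel satisfies K(x, x) = 1, which the later analysis assumes).

```python
import math

def gaussian_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2); note K(x, x) = 1."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kernel_perceptron(stream, kernel):
    """Run the kernel Perceptron online; the active set stores one example per mistake."""
    active = []    # the active set I of the slides, stored as (x_i, y_i) pairs
    mistakes = 0
    for x_t, y_t in stream:
        # f_t(x_t) = sum_{i in I} y_i K(x_i, x_t)
        f_t = sum(y_i * kernel(x_i, x_t) for x_i, y_i in active)
        if y_t * f_t <= 0:          # prediction mistake: sign(f_t(x_t)) != y_t
            active.append((x_t, y_t))
            mistakes += 1
    return active, mistakes
```

Every mistake grows the active set by one, so memory and prediction time grow with the number of mistakes, which is exactly the problem slide 5 raises.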
Slide 5: Learning on a Budget

- |I| = number of mistakes made until round t
- Memory- and time-inefficient: |I| might grow unboundedly
- Goal: construct a kernel-based online algorithm for which
  - |I| ≤ B on each round t,
  - and which still performs well, i.e. comes with a performance guarantee
Slide 6: Mistake Bound for the Perceptron

- (x_1, y_1), ..., (x_T, y_T): a sequence of examples
- A kernel K s.t. K(x_t, x_t) ≤ 1
- g: a fixed competitor classifier in the RKHS
- Define the hinge loss ℓ_t(g) = max(0, 1 − y_t g(x_t))
- Then the number of prediction mistakes M satisfies
  M ≤ ‖g‖² + 2 Σ_t ℓ_t(g)
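The bound follows from the standard telescoping argument; a sketch (all norms are RKHS norms, and the update on a mistaken round is f_{t+1} = f_t + y_t K(x_t, ·)):

```latex
\|f_t - g\|^2 - \|f_{t+1} - g\|^2
  = 2\, y_t \bigl( g(x_t) - f_t(x_t) \bigr) - K(x_t, x_t)
  \ge 2 \bigl( 1 - \ell_t(g) \bigr) - 1
  = 1 - 2\,\ell_t(g)
```

using y_t g(x_t) ≥ 1 − ℓ_t(g), y_t f_t(x_t) ≤ 0 on a mistaken round, and K(x_t, x_t) ≤ 1. Summing over the M mistaken rounds, the left-hand side telescopes to at most ‖f_1 − g‖² = ‖g‖², which rearranges to M ≤ ‖g‖² + 2 Σ_t ℓ_t(g).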
Slide 7: Previous Work

- Crammer, Kandola, Singer (2003)
- Kivinen, Smola, Williamson (2004)
- Weston, Bordes, Bottou (2005)

Previous online budget algorithms do not come with a mistake bound. Is our goal attainable?
Slide 8: Mission Impossible

- Input space: {e_1, ..., e_{B+1}}
- Linear kernel: K(e_i, e_j) = ⟨e_i, e_j⟩ = δ_{i,j} for all i, j
- Budget constraint: |I| ≤ B. Therefore, on every round there exists a j s.t. Σ_{i ∈ I} α_i K(e_i, e_j) = 0, so a budget algorithm might err on every round
- But the competitor g = Σ_i e_i (with all labels +1) never errs!
- The unbudgeted Perceptron makes only B+1 mistakes
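The adversarial construction can be simulated directly. The "drop oldest" eviction rule below is just one illustrative budget policy (not specified on the slide); any policy that keeps |I| ≤ B leaves some basis vector unsupported, so the adversary can force a mistake on every round.

```python
def adversary_round(active, B1):
    """Pick an index j (all labels are +1) on which the budget hypothesis outputs 0.
    Since |active| <= B < B1, some basis vector e_j is missing from the support."""
    supported = {j for j, _ in active}
    return next(j for j in range(B1) if j not in supported)

def run_budget_perceptron(B, rounds):
    """Budget perceptron on the orthonormal inputs e_1..e_{B+1},
    with linear kernel K(e_i, e_j) = delta_ij and 'drop oldest' eviction."""
    active = []   # the support set I, stored as (index, weight) pairs
    mistakes = 0
    for _ in range(rounds):
        j = adversary_round(active, B + 1)
        # f(e_j) = sum of weights placed on index j = 0 by the adversary's choice
        f_j = sum(w for i, w in active if i == j)
        if f_j <= 0:              # true label is +1, so this is always a mistake
            mistakes += 1
            active.append((j, 1.0))
            if len(active) > B:
                active.pop(0)     # enforce the budget |I| <= B
    return mistakes
```

For example, `run_budget_perceptron(5, 100)` errs on all 100 rounds, while g = Σ_i e_i classifies every one of those rounds correctly.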
Slide 9: Redefine the Goal

- We must restrict the competitor g somehow; one way is to restrict its norm ‖g‖
- The counter-example implies that we cannot compete with ‖g‖ = (B+1)^{1/2}
- Main result: the Forgetron algorithm can compete with any classifier g s.t. ‖g‖ ≤ (1/4) ((B+1) / ln(B+1))^{1/2}
Slide 10: The Forgetron

f_t(x) = Σ_{i ∈ I} σ_i y_i K(x_i, x)

On each mistaken round, the Forgetron performs three steps:

- Step (1), Perceptron: I ← I ∪ {t}, with new weight σ_t = 1
- Step (2), Shrinking: σ_i ← φ_t σ_i for all i ∈ I
- Step (3), Remove Oldest: if |I| > B, then r = min I and I ← I \ {r}
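The three steps can be sketched as follows. This is a simplified illustration assuming a mistake has already occurred; the shrinking coefficient `phi_t` is left as a fixed placeholder here, whereas the actual algorithm tunes it on every round (slide 20).

```python
def forgetron_update(active, x_t, y_t, B, phi_t=0.9):
    """One Forgetron update on a mistaken round. `active` holds
    [x_i, y_i, sigma_i] entries; phi_t is a fixed illustrative value."""
    # Step (1), Perceptron: add the new example with weight sigma_t = 1
    active.append([x_t, y_t, 1.0])
    # Step (2), Shrinking: sigma_i <- phi_t * sigma_i for all i in I
    for entry in active:
        entry[2] *= phi_t
    # Step (3), Remove Oldest: if the budget is exceeded, drop r = min I
    if len(active) > B:
        active.pop(0)
    return active

def predict(active, x, kernel):
    """f(x) = sum_{i in I} sigma_i y_i K(x_i, x)"""
    return sum(s * y * kernel(xi, x) for xi, y, s in active)
```

Because every round's shrinking multiplies all existing weights by φ_t, the oldest example always carries the most-shrunk weight, which is what makes its removal cheap.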
Slide 11: Shrinking, a Two-edged Sword

- φ_t small ⇒ σ_r small ⇒ the deviation due to removal is negligible
- φ_t small ⇒ the deviation due to shrinking is large
- The Forgetron formalizes deviation and automatically balances this tradeoff
Slide 12: Quantifying Deviation

- Progress measure: Δ_t = ‖f_t − g‖² − ‖f_{t+1} − g‖²
- Write f′ for the hypothesis after the Perceptron step and f″ for the hypothesis after shrinking. The progress decomposes over the three update steps as Δ_t = Λ_t − Φ_t − Ψ_t, where
  - Λ_t = ‖f_t − g‖² − ‖f′ − g‖²  (after the Perceptron step)
  - Φ_t = ‖f″ − g‖² − ‖f′ − g‖²  (after shrinking)
  - Ψ_t = ‖f_{t+1} − g‖² − ‖f″ − g‖²  (after removal)
- Deviation is measured by negative progress
Slide 13: Quantifying Deviation

- Gain from the Perceptron step: Λ_t
- Damage from shrinking: Φ_t
- Damage from removal: Ψ_t
- The Forgetron sets φ_t so as to keep the accumulated damage small relative to the number of mistakes (the precise mechanism is given on slide 20)
Slide 14: Resulting Mistake Bound

- For any g s.t. ‖g‖ ≤ (1/4) ((B+1) / ln(B+1))^{1/2},
- the number of prediction mistakes M that the Forgetron makes is at most
  M ≤ 2 ‖g‖² + 4 Σ_t ℓ_t(g)
Slide 15: Small Deviation ⇒ Mistake Bound

- Assume the total deviation is low
- The Perceptron step makes progress on each mistaken round: Λ_t ≥ 1 − 2 ℓ_t(g)
(figure: the Perceptron update moves f towards g, decreasing ‖f − g‖²)
Slide 16: Small Deviation ⇒ Mistake Bound

- On one hand: positive progress towards good competitors on every mistaken round
- On the other hand: the total possible progress is at most ‖f_1 − g‖² = ‖g‖²
- Corollary: small deviation ⇒ mistake bound
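To make the corollary concrete under my reading of the slides: write Λ_t for the Perceptron step's progress on a mistaken round and D_t for that round's total deviation. If Σ_t D_t ≤ M/2, as the self-tuned Forgetron guarantees (15/32 + 1/32 = 1/2 of M, slide 20), then:

```latex
\sum_{t} \bigl( \Lambda_t - D_t \bigr) \le \|f_1 - g\|^2 = \|g\|^2,
\qquad
\Lambda_t \ge 1 - 2\,\ell_t(g)
\;\Longrightarrow\;
M - 2 \sum_t \ell_t(g) - \frac{M}{2} \le \|g\|^2
\;\Longrightarrow\;
M \le 2\,\|g\|^2 + 4 \sum_t \ell_t(g)
```

This is the bound of slide 14: twice the Perceptron's constant on each term, which is the price paid for the budget.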
Slide 17: Deviation due to Removal

- Assume that on round t we remove example r with weight σ. Writing f″ for the hypothesis after shrinking, a direct expansion gives
  Ψ_t = σ² K(x_r, x_r) − 2σ y_r f″(x_r) + 2σ y_r g(x_r)
- Remarks:
  - φ small ⇒ σ small ⇒ Ψ_t small
  - Ψ_t decreases as y_r f″(x_r) grows: removing a confidently correctly-classified example is safer
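The expression for Ψ_t follows from expanding the squared norm after removing the term σ y_r K(x_r, ·):

```latex
f_{t+1} = f'' - \sigma\, y_r\, K(x_r, \cdot)
\;\Longrightarrow\;
\Psi_t = \|f_{t+1} - g\|^2 - \|f'' - g\|^2
       = \sigma^2 K(x_r, x_r) - 2\sigma\, y_r f''(x_r) + 2\sigma\, y_r g(x_r)
```

Both remarks read off this identity: every term scales with σ, and the term −2σ y_r f″(x_r) makes Ψ_t decrease as y_r f″(x_r) grows.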
Slide 18: Deviation due to Shrinking

- Case I: after shrinking, ‖f_t‖ ≥ ‖g‖. Then Φ_t ≤ 0: shrinking causes no deviation
(figure: ‖f − g‖² before and after shrinking)
Slide 19: Deviation due to Shrinking

- Case II: after shrinking, ‖f_t‖ ≤ ‖g‖ ≤ U. Then Φ_t ≤ U² (1 − φ_t)
(figure: ‖f − g‖² before and after shrinking)
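Both cases follow from expanding Φ_t for the scaled hypothesis φ_t f′ (Cauchy-Schwarz in the second step); this is my reconstruction of the argument the figures illustrated:

```latex
\Phi_t = \|\phi_t f' - g\|^2 - \|f' - g\|^2
       = (\phi_t^2 - 1)\|f'\|^2 + 2(1 - \phi_t)\langle f', g\rangle
       \le (1 - \phi_t)\bigl( 2\|f'\|\,\|g\| - (1 + \phi_t)\|f'\|^2 \bigr)
```

Case I: ‖φ_t f′‖ ≥ ‖g‖ gives 2‖g‖ ≤ 2φ_t‖f′‖ ≤ (1 + φ_t)‖f′‖, so the bracket is nonpositive and Φ_t ≤ 0. Case II: maximizing the bracket over ‖f′‖ (at ‖f′‖ = ‖g‖/(1 + φ_t)) and using ‖g‖ ≤ U gives Φ_t ≤ (1 − φ_t)‖g‖²/(1 + φ_t) ≤ U² (1 − φ_t).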
Slide 20: Self-tuning Shrinking Mechanism

- The Forgetron sets φ_t to the maximal value in (0, 1] for which the cumulative deviation from removal remains small
- This choice has an analytic solution
- By construction, the total deviation caused by removal is at most (15/32) M
- It can be shown (by strong induction) that the total deviation caused by shrinking is at most (1/32) M
Slide 21: Experiments

- Gaussian kernel
- Performance is compared to Crammer, Kandola and Singer (CKS), NIPS 2003
- We measure the number of prediction mistakes as a function of the budget B
- The baseline is the performance of the unbudgeted Perceptron
Slide 22: Experiment I, the MNIST dataset
Slide 23: Experiment II, the Census-Income (Adult) dataset
(the Perceptron makes 16,000 mistakes)
Slide 24: Experiment III, synthetic data with label noise
Slide 25: Summary

- No budget algorithm can compete with arbitrary hypotheses
- The Forgetron can compete with norm-bounded hypotheses
- It works well in practice
- It does not require any tuning parameters
- Future work: the Forgetron for batch learning