1
PAC-Bayesian Theorems for Gaussian Process Classifications
  • Matthias Seeger
  • University of Edinburgh

2
Overview
  • PAC-Bayesian theorem for Gibbs classifiers
  • Application to Gaussian process classification
  • Experiments
  • Conclusions

3
What Is a PAC Bound?
Sample S = {(x_i, t_i) | i = 1, ..., n}, drawn i.i.d. from an unknown distribution P
  • Algorithm: S → predictor t(x); generalisation error gen(S)
  • PAC / distribution-free bound (a generic sketch follows below)
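A generic sketch of what such a bound asserts, writing emp(S) for the training error of the learned predictor and eps for a slack term whose exact form depends on the particular bound (illustrative only):

\[
\Pr_{S \sim P^n}\!\left[\, \mathrm{gen}(S) \le \mathrm{emp}(S) + \varepsilon(n, \delta) \,\right] \ge 1 - \delta
\quad \text{for every data distribution } P .
\]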

4
Nonuniform PAC Bounds
  • A PAC bound has to hold independently of the correctness of prior knowledge
  • It does not have to be independent of prior knowledge
  • Unfortunately, most standard VC bounds are only vaguely dependent on the prior/model they are applied to, and therefore lack tightness

5
Gibbs Classifiers
  • Bayes classifier
  • Gibbs classifier: a new independent w is drawn for each prediction (see the sketch below)

(Figure: classifier in weight space, w in R^3, labels t in {-1, +1})
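A minimal sketch of the two prediction rules, writing f(x; w) for the latent discriminant and Q(w) for the distribution over weights (this notation is assumed, not taken from the slides):

\[
t_{\mathrm{Bayes}}(x) = \mathrm{sgn}\, \mathbb{E}_{w \sim Q}\!\left[ f(x; w) \right],
\qquad
t_{\mathrm{Gibbs}}(x) = \mathrm{sgn}\, f(x; w), \quad w \sim Q \text{ drawn afresh per prediction.}
\]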
6
PAC-Bayesian Theorem
  • Result for Gibbs classifiers
  • Prior P(w), independent of S
  • Posterior Q(w), may depend on S
  • Expected generalisation error gen(Q)
  • Expected empirical error emp(S, Q) (both sketched below)
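Under the 0/1 loss, the two Q-averaged error quantities can be written as follows (a sketch; t_w(x) denotes the prediction of the classifier with weights w):

\[
\mathrm{gen}(Q) = \mathbb{E}_{w \sim Q}\, \mathbb{E}_{(x,t) \sim P}\!\left[ \mathbf{1}\{ t \ne t_w(x) \} \right],
\qquad
\mathrm{emp}(S, Q) = \mathbb{E}_{w \sim Q}\!\left[ \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ t_i \ne t_w(x_i) \} \right].
\]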

7
PAC-Bayesian Theorem (II)
  • McAllester (1999); one common form of the bound is sketched below
  • D[Q || P]: relative entropy. If Q(w) is a feasible approximation to the Bayesian posterior, we can compute D[Q || P]
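One commonly cited form of McAllester's PAC-Bayesian bound (the exact constants vary between versions, so this is indicative only): with probability at least 1 − δ over S, simultaneously for all posteriors Q,

\[
\mathrm{gen}(Q) \;\le\; \mathrm{emp}(S, Q) + \sqrt{ \frac{ D[Q \,\|\, P] + \ln\frac{1}{\delta} + \ln n + 2 }{ 2n - 1 } } .
\]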

8
The Proof Idea
  • Step 1: an inequality for a "dumb" classifier
  • Let D(w) measure the deviation of emp(S, w) from gen(w). A large deviation bound holds for fixed w (use the Asymptotic Equipartition Property)
  • Since P(w) is independent of S, the bound also holds on average over w ~ P(w) (a sketch follows below)
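A sketch of what this step presumably establishes, reading D(w) as the binary relative entropy between empirical and true error of a fixed classifier w (this reading is an assumption, consistent with how D(w) is used later in the proof sketch):

\[
D(w) = \mathrm{KL}\!\left[ \mathrm{emp}(S, w) \,\|\, \mathrm{gen}(w) \right]
\;\Longrightarrow\;
\mathbb{E}_{S}\!\left[ e^{\, n D(w)} \right] \le n + 1 \quad \text{for fixed } w,
\]
\[
\text{and hence, since } P(w) \text{ does not depend on } S:\qquad
\mathbb{E}_{S}\, \mathbb{E}_{w \sim P}\!\left[ e^{\, n D(w)} \right] \le n + 1 .
\]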

9
The Proof Idea (II)
  • Could use Jensen's inequality
  • But so what? P is fixed a priori, giving a pretty dumb classifier!
  • Can we exchange P for Q? Yes!
  • What do we have to pay? A term D[Q || P] / n

10
Convex Duality
  • Could finish the proof using tricks and Jensen. Let's see what's behind it instead!
  • Convex (Legendre) duality: a very simple but powerful concept. Parameterise linear lower bounds to a convex function
  • Behind the scenes (almost) everywhere: EM, variational bounds, primal-dual optimisation, ..., the PAC-Bayesian theorem (a sketch follows below)
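The basic statement of convex (Legendre) duality: a closed convex function is the supremum of its linear lower bounds,

\[
f(x) = \sup_{\lambda} \left\{ \lambda^{\top} x - f^{*}(\lambda) \right\},
\qquad
f^{*}(\lambda) = \sup_{x} \left\{ \lambda^{\top} x - f(x) \right\}.
\]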

11
Convex Duality (II)
12
Convex Duality (III)
13
The Proof Idea (III)
  • Works just as well for spaces of functions and
    distributions.
  • For our purpose, the relevant functional is convex and has the dual sketched below
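The convex functional and its dual at work here are presumably the standard Donsker–Varadhan pair (an assumption consistent with the rest of the argument): the log moment-generating functional of l(w) under P, whose conjugate is the relative entropy,

\[
\log \mathbb{E}_{w \sim P}\!\left[ e^{\, l(w)} \right]
= \sup_{Q} \left\{ \mathbb{E}_{w \sim Q}\!\left[ l(w) \right] - D[Q \,\|\, P] \right\},
\]
so that, for every distribution Q and every function l,
\[
\mathbb{E}_{w \sim Q}\!\left[ l(w) \right] \;\le\; D[Q \,\|\, P] + \log \mathbb{E}_{w \sim P}\!\left[ e^{\, l(w)} \right].
\]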

14
The Proof Idea (IV)
  • This gives a bound holding for all Q and all l
  • Set l(w) = n D(w). Then the second term on the right has already been bounded, and on the left (Jensen again) we can pass to emp(S, Q) and gen(Q); the resulting bound is sketched below
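Combining the pieces gives, with probability at least 1 − δ over S, simultaneously for all Q, a bound of the following form (a sketch; the constants may differ from the version presented in the talk):

\[
\mathrm{KL}\!\left[ \mathrm{emp}(S, Q) \,\|\, \mathrm{gen}(Q) \right]
\;\le\;
\frac{ D[Q \,\|\, P] + \ln\frac{n+1}{\delta} }{ n } .
\]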

15
Comments
  • The PAC-Bayesian technique is generic: use specific large deviation bounds for the Q-independent term
  • Choice of Q: a trade-off between emp(S, Q) and the divergence D[Q || P]. The Bayesian posterior is a good candidate

16
Gaussian Process Classification
  • Recall yesterday: we approximate the true posterior process by a Gaussian one

17
The Relative Entropy
  • But then the relative entropy is just the divergence between two Gaussian distributions (a sketch follows below)
  • Straightforward to compute for all GPC approximations in this class
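For a Gaussian approximation Q = N(m, A) to the posterior over the latent values at the n training inputs, and the GP prior P = N(0, K) with kernel matrix K, the relative entropy is the KL divergence between two n-dimensional Gaussians. A minimal NumPy sketch under this assumed parameterisation (the names m, A, K are illustrative, not taken from the slides):

    import numpy as np

    def gauss_kl(m, A, K):
        # KL[ N(m, A) || N(0, K) ] between two n-dimensional Gaussians.
        # m: (n,) approximate posterior mean, A: (n, n) its covariance,
        # K: (n, n) prior kernel matrix; both covariances assumed full rank.
        n = m.shape[0]
        L = np.linalg.cholesky(K)                      # K = L L^T
        trace_term = np.trace(np.linalg.solve(K, A))   # tr(K^{-1} A)
        alpha = np.linalg.solve(L, m)                  # alpha^T alpha = m^T K^{-1} m
        logdet_K = 2.0 * np.sum(np.log(np.diag(L)))
        _, logdet_A = np.linalg.slogdet(A)
        return 0.5 * (trace_term + alpha @ alpha - n + logdet_K - logdet_A)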

18
Concrete GPC Methods
  • We considered so far
  • Laplace GPC (Barber/Williams)
  • Sparse greedy GPC (IVM) (Csato/Opper; Lawrence/Seeger/Herbrich)
  • Setup: downsampled MNIST (2s vs. 3s), RBF kernels, model selection using independent holdout sets (no ML-II allowed here!)

19
Results for Laplace GPC
20
Results Sparse Greedy GPC
  • Extremely tight for a kernel-classifier bound
  • Note: these results are for Gibbs classifiers. Bayes classifiers do better, but the (original) PAC-Bayesian theorem does not hold for them

21
Comparison Compression Bound
  • Compression bound for sparse greedy GPC (Bayes version, not Gibbs); a generic sketch of such a bound follows below
  • Problem: the bound is not configurable by prior knowledge and is not specific to the algorithm
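For orientation, the simplest (consistent-case) sample compression bound takes the following form; this is a generic sketch for a fixed compression size d, not necessarily the exact bound evaluated in these experiments: if the classifier is reconstructed from d of the n training examples and makes no errors on the remaining n − d, then with probability at least 1 − δ,

\[
\mathrm{gen} \;\le\; \frac{ \ln \binom{n}{d} + \ln \frac{1}{\delta} }{\, n - d \,} .
\]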

22
Comparison With SVM
  • Compression bound (best we could find!)
  • Note: the bound values are lower than for sparse GPC only because of the sparser solution. The bound does not depend on the algorithm!

23
Model Selection
24
The Bayes Classifier
  • Very recently, Meir and Zhang obtained a PAC-Bayesian bound for Bayes-type classifiers
  • It uses recent Rademacher complexity bounds together with a convex duality argument
  • It can be applied to GP classification as well (not yet done)

25
Conclusions
  • PAC-Bayesian technique (convex duality) leads to
    tighter bounds than previously available for
    Bayes-type classifiers (to our knowledge)
  • Easy extension to multi-class scenarios
  • Application to GP classification: tighter bounds than previously available for kernel machines (to our knowledge)

26
Conclusions (II)
  • Value in practice: the bound holds for any posterior approximation, not just the true posterior itself
  • Some open problems
  • Unbounded loss functions
  • Characterize the slack in the bound
  • Incorporating ML-II model selection over
    continuous hyperparameter space