1
PAC-Bayesian Theorems for Gaussian Process Classifications
  • Matthias Seeger
  • University of Edinburgh

2
Overview
  • PAC-Bayesian theorem for Gibbs classifiers
  • Application to Gaussian process classification
  • Experiments
  • Conclusions

3
What Is a PAC Bound?
Sample S = {(x_i, t_i) | i = 1, ..., n}, drawn i.i.d. from an unknown distribution P
  • Algorithm: S → predictor t(x); generalisation error gen(S)
  • PAC / distribution-free bound (a generic sketch follows below)
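A generic sketch of what such a bound asserts, writing emp(S) for the training error of the learned predictor and eps for a slack term whose exact form depends on the particular bound (illustrative only):

\[
\Pr_{S \sim P^n}\!\left[\, \mathrm{gen}(S) \le \mathrm{emp}(S) + \varepsilon(n, \delta) \,\right] \ge 1 - \delta
\quad \text{for every data distribution } P .
\]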

4
Nonuniform PAC Bounds
  • A PAC bound has to hold independently of the correctness of prior knowledge
  • It does not have to be independent of prior knowledge
  • Unfortunately, most standard VC bounds are only vaguely dependent on the prior/model they are applied to, and therefore lack tightness

5
Gibbs Classifiers
  • Bayes classifier
  • Gibbs classifier: a new independent w is drawn for each prediction (see the sketch below)

(Figure: classifier in weight space, w in R^3, labels t in {-1, +1})
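A minimal sketch of the two prediction rules, writing f(x; w) for the latent discriminant and Q(w) for the distribution over weights (this notation is assumed, not taken from the slides):

\[
t_{\mathrm{Bayes}}(x) = \mathrm{sgn}\, \mathbb{E}_{w \sim Q}\!\left[ f(x; w) \right],
\qquad
t_{\mathrm{Gibbs}}(x) = \mathrm{sgn}\, f(x; w), \quad w \sim Q \text{ drawn afresh per prediction.}
\]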
6
PAC-Bayesian Theorem
  • Result for Gibbs classifiers
  • Prior P(w), independent of S
  • Posterior Q(w), may depend on S
  • Expected generalisation error gen(Q)
  • Expected empirical error emp(S, Q) (both sketched below)
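Under the 0/1 loss, the two Q-averaged error quantities can be written as follows (a sketch; t_w(x) denotes the prediction of the classifier with weights w):

\[
\mathrm{gen}(Q) = \mathbb{E}_{w \sim Q}\, \mathbb{E}_{(x,t) \sim P}\!\left[ \mathbf{1}\{ t \ne t_w(x) \} \right],
\qquad
\mathrm{emp}(S, Q) = \mathbb{E}_{w \sim Q}\!\left[ \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ t_i \ne t_w(x_i) \} \right].
\]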

7
PAC-Bayesian Theorem (II)
  • McAllester (1999); one common form of the bound is sketched below
  • D[Q || P]: relative entropy. If Q(w) is a feasible approximation to the Bayesian posterior, we can compute D[Q || P]
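One commonly cited form of McAllester's PAC-Bayesian bound (the exact constants vary between versions, so this is indicative only): with probability at least 1 − δ over S, simultaneously for all posteriors Q,

\[
\mathrm{gen}(Q) \;\le\; \mathrm{emp}(S, Q) + \sqrt{ \frac{ D[Q \,\|\, P] + \ln\frac{1}{\delta} + \ln n + 2 }{ 2n - 1 } } .
\]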

8
The Proof Idea
  • Step 1: an inequality for a "dumb" classifier
  • Let D(w) measure the deviation of emp(S, w) from gen(w). A large deviation bound holds for fixed w (use the Asymptotic Equipartition Property)
  • Since P(w) is independent of S, the bound also holds on average over w ~ P(w) (a sketch follows below)
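A sketch of what this step presumably establishes, reading D(w) as the binary relative entropy between empirical and true error of a fixed classifier w (this reading is an assumption, consistent with how D(w) is used later in the proof sketch):

\[
D(w) = \mathrm{KL}\!\left[ \mathrm{emp}(S, w) \,\|\, \mathrm{gen}(w) \right]
\;\Longrightarrow\;
\mathbb{E}_{S}\!\left[ e^{\, n D(w)} \right] \le n + 1 \quad \text{for fixed } w,
\]
\[
\text{and hence, since } P(w) \text{ does not depend on } S:\qquad
\mathbb{E}_{S}\, \mathbb{E}_{w \sim P}\!\left[ e^{\, n D(w)} \right] \le n + 1 .
\]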

9
The Proof Idea (II)
  • Could use Jensen's inequality
  • But so what? P is fixed a priori, giving a pretty dumb classifier!
  • Can we exchange P for Q? Yes!
  • What do we have to pay? A term D[Q || P] / n

10
Convex Duality
  • Could finish the proof using tricks and Jensen. Let's see what's behind it instead!
  • Convex (Legendre) duality: a very simple but powerful concept. Parameterise linear lower bounds to a convex function
  • Behind the scenes (almost) everywhere: EM, variational bounds, primal-dual optimisation, ..., the PAC-Bayesian theorem (a sketch follows below)
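The basic statement of convex (Legendre) duality: a closed convex function is the supremum of its linear lower bounds,

\[
f(x) = \sup_{\lambda} \left\{ \lambda^{\top} x - f^{*}(\lambda) \right\},
\qquad
f^{*}(\lambda) = \sup_{x} \left\{ \lambda^{\top} x - f(x) \right\}.
\]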

11
Convex Duality (II)
12
Convex Duality (III)
13
The Proof Idea (III)
  • Works just as well for spaces of functions and
    distributions.
  • For our purpose, the relevant functional is convex and has the dual sketched below
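The convex functional and its dual at work here are presumably the standard Donsker–Varadhan pair (an assumption consistent with the rest of the argument): the log moment-generating functional of l(w) under P, whose conjugate is the relative entropy,

\[
\log \mathbb{E}_{w \sim P}\!\left[ e^{\, l(w)} \right]
= \sup_{Q} \left\{ \mathbb{E}_{w \sim Q}\!\left[ l(w) \right] - D[Q \,\|\, P] \right\},
\]
so that, for every distribution Q and every function l,
\[
\mathbb{E}_{w \sim Q}\!\left[ l(w) \right] \;\le\; D[Q \,\|\, P] + \log \mathbb{E}_{w \sim P}\!\left[ e^{\, l(w)} \right].
\]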

14
The Proof Idea (IV)
  • This gives a bound holding for all Q and all l
  • Set l(w) = n D(w). Then the second term on the right has already been bounded, and on the left (Jensen again) we can pass to emp(S, Q) and gen(Q); the resulting bound is sketched below
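Combining the pieces gives, with probability at least 1 − δ over S, simultaneously for all Q, a bound of the following form (a sketch; the constants may differ from the version presented in the talk):

\[
\mathrm{KL}\!\left[ \mathrm{emp}(S, Q) \,\|\, \mathrm{gen}(Q) \right]
\;\le\;
\frac{ D[Q \,\|\, P] + \ln\frac{n+1}{\delta} }{ n } .
\]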

15
Comments
  • The PAC-Bayesian technique is generic: use specific large deviation bounds for the Q-independent term
  • Choice of Q: a trade-off between emp(S, Q) and the divergence D[Q || P]. The Bayesian posterior is a good candidate

16
Gaussian Process Classification
  • Recall yesterday: we approximate the true posterior process by a Gaussian one

17
The Relative Entropy
  • But then the relative entropy is just the divergence between two Gaussian distributions (a sketch follows below)
  • Straightforward to compute for all GPC approximations in this class
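For a Gaussian approximation Q = N(m, A) to the posterior over the latent values at the n training inputs, and the GP prior P = N(0, K) with kernel matrix K, the relative entropy is the KL divergence between two n-dimensional Gaussians. A minimal NumPy sketch under this assumed parameterisation (the names m, A, K are illustrative, not taken from the slides):

    import numpy as np

    def gauss_kl(m, A, K):
        # KL[ N(m, A) || N(0, K) ] between two n-dimensional Gaussians.
        # m: (n,) approximate posterior mean, A: (n, n) its covariance,
        # K: (n, n) prior kernel matrix; both covariances assumed full rank.
        n = m.shape[0]
        L = np.linalg.cholesky(K)                      # K = L L^T
        trace_term = np.trace(np.linalg.solve(K, A))   # tr(K^{-1} A)
        alpha = np.linalg.solve(L, m)                  # alpha^T alpha = m^T K^{-1} m
        logdet_K = 2.0 * np.sum(np.log(np.diag(L)))
        _, logdet_A = np.linalg.slogdet(A)
        return 0.5 * (trace_term + alpha @ alpha - n + logdet_K - logdet_A)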

18
Concrete GPC Methods
  • We considered so far
  • Laplace GPC (Barber/Williams)
  • Sparse greedy GPC (IVM) (Csato/Opper; Lawrence/Seeger/Herbrich)
  • Setup: downsampled MNIST (2s vs. 3s), RBF kernels, model selection using independent holdout sets (no ML-II allowed here!)

19
Results for Laplace GPC
20
Results Sparse Greedy GPC
  • Extremely tight for a kernel-classifier bound
  • Note: these results are for Gibbs classifiers. Bayes classifiers do better, but the (original) PAC-Bayesian theorem does not hold for them

21
Comparison Compression Bound
  • Compression bound for sparse greedy GPC (Bayes version, not Gibbs); a generic sketch of such a bound follows below
  • Problem: the bound is not configurable by prior knowledge and is not specific to the algorithm
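For orientation, the simplest (consistent-case) sample compression bound takes the following form; this is a generic sketch for a fixed compression size d, not necessarily the exact bound evaluated in these experiments: if the classifier is reconstructed from d of the n training examples and makes no errors on the remaining n − d, then with probability at least 1 − δ,

\[
\mathrm{gen} \;\le\; \frac{ \ln \binom{n}{d} + \ln \frac{1}{\delta} }{\, n - d \,} .
\]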

22
Comparison With SVM
  • Compression bound (best we could find!)
  • Note: the bound values are lower than for sparse GPC only because of the sparser solution. The bound does not depend on the algorithm!

23
Model Selection
24
The Bayes Classifier
  • Very recently, Meir and Zhang obtained a PAC-Bayesian bound for Bayes-type classifiers
  • It uses recent Rademacher complexity bounds together with a convex duality argument
  • It can be applied to GP classification as well (not yet done)

25
Conclusions
  • PAC-Bayesian technique (convex duality) leads to
    tighter bounds than previously available for
    Bayes-type classifiers (to our knowledge)
  • Easy extension to multi-class scenarios
  • Application to GP classification: tighter bounds than previously available for kernel machines (to our knowledge)

26
Conclusions (II)
  • Value in practice: the bound holds for any posterior approximation, not just the true posterior itself
  • Some open problems
  • Unbounded loss functions
  • Characterize the slack in the bound
  • Incorporating ML-II model selection over
    continuous hyperparameter space