Transcript and Presenter's Notes

Title: Sparse Approximations to Bayesian Gaussian Processes


1
Sparse Approximations to Bayesian Gaussian Processes
  • Matthias Seeger
  • University of Edinburgh

2
Joint Work With
  • Christopher Williams (Edinburgh)
  • Neil Lawrence (Sheffield)
  • Builds on prior work by Lehel Csato and Manfred Opper (Birmingham)

3
Overview of the Talk
  • Gaussian processes and approximations
  • Understanding sparse schemes as likelihood approximations
  • Fast greedy selection
  • Model selection

4
The Recipe
  • Goal: a probabilistic approximation to GP inference, with scaling (in principle) of O(n)
  • Ingredients:
  • Gaussian approximations
  • m-projections (moment matching; see the sketch below)
  • e-projections (mean field)
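A minimal numpy sketch of the m-projection ingredient, assuming a 1-D site and simple grid quadrature; the probit term and the name m_project are illustrative choices, not taken from the talk.

    import numpy as np
    from scipy.stats import norm

    def m_project(cavity_mean, cavity_var, log_lik, width=8.0, n_grid=2001):
        """Moment-match a Gaussian to p(u) proportional to N(u | cavity_mean, cavity_var) * exp(log_lik(u))."""
        sd = np.sqrt(cavity_var)
        u = np.linspace(cavity_mean - width * sd, cavity_mean + width * sd, n_grid)
        log_p = norm.logpdf(u, cavity_mean, sd) + log_lik(u)
        w = np.exp(log_p - log_p.max())        # unnormalised weights, numerically stable
        w /= w.sum()                           # normalise on the grid
        mean = np.sum(w * u)                   # matched first moment
        var = np.sum(w * (u - mean) ** 2)      # matched second central moment
        return mean, var

    # Example: a probit likelihood term for label y = +1 (purely illustrative).
    print(m_project(0.0, 1.0, norm.logcdf))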

5
Gaussian Process Models
  • Likelihood factorises over sites: given u_i, y_i is independent of the rest! (sketch below)
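A small numpy sketch of this modelling assumption: a GP prior over the latent values and a likelihood that factorises over sites. The squared-exponential kernel and Gaussian noise are illustrative assumptions, not necessarily the models used in the talk.

    import numpy as np

    def rbf_kernel(X, Z, lengthscale=1.0, variance=1.0):
        """Squared-exponential covariance between two sets of 1-D inputs."""
        d2 = (X[:, None] - Z[None, :]) ** 2
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    rng = np.random.default_rng(0)
    X = np.linspace(-3.0, 3.0, 50)
    K = rbf_kernel(X, X) + 1e-8 * np.eye(len(X))      # jitter for numerical stability

    # Latent function values u = (u_1, ..., u_n) drawn from the GP prior N(0, K).
    u = np.linalg.cholesky(K) @ rng.standard_normal(len(X))

    # Likelihood factorises over sites: given u_i, each y_i ignores all other latents.
    sigma2 = 0.1
    y = u + np.sqrt(sigma2) * rng.standard_normal(len(X))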

6
Roadmap
Non-Gaussian posterior process
  → m-projection, EP →
GP approximation (finite Gaussian approximation, feasible fitting scheme)
  → likelihood approx. by e-projection →
Sparse scheme (sparse Gaussian approximation, leading to a sparse predictor)
7
Step 1: Infinite → Finite
  • Gaussian process approximation Q(u(·) | y) of the posterior P(u(·) | y) by m-projection
  • The data constrain u = (u_1, ..., u_n) only ⇒ Q is determined by a finite Gaussian Q(u | y) and the prior GP
  • The optimal Gaussian Q(u | y) is hard to find, and not sparse

8
Step 2: Expectation Propagation
  • Behind EP: an approximate variational principle (e-projections) with weak marginalisation (moment) constraints (m-projections)
  • Replace the likelihood terms P(y_i | u_i) by Gaussian-like sites t_i(u_i) ∝ N(u_i | m_i, p_i^{-1})
  • Update: change t_i(u_i) → P(y_i | u_i), m-project to a Gaussian, extract the new t_i(u_i) (sketch below)
  • The t_i(u_i) play the role of Shafer-Shenoy update factors
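A hedged numpy sketch of one such update on a single 1-D site marginal: form the cavity by dividing out the current site, moment-match the cavity times the exact likelihood term (the m-projection), and read off the new site by dividing the cavity back out. The grid quadrature and the probit example are illustrative simplifications, not the talk's implementation.

    import numpy as np
    from scipy.stats import norm

    def ep_site_update(post_mean, post_var, site_prec, site_precmean, log_lik, n_grid=2001):
        """One EP update of a single site t_i(u_i), acting on its 1-D marginal."""
        # 1. Cavity: divide the current site out of the marginal (natural parameters).
        cav_prec = 1.0 / post_var - site_prec
        cav_var = 1.0 / cav_prec
        cav_mean = (post_mean / post_var - site_precmean) * cav_var

        # 2. m-projection: moment-match cavity * exact likelihood by grid quadrature.
        sd = np.sqrt(cav_var)
        u = np.linspace(cav_mean - 8.0 * sd, cav_mean + 8.0 * sd, n_grid)
        log_w = norm.logpdf(u, cav_mean, sd) + log_lik(u)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        new_mean = np.sum(w * u)
        new_var = np.sum(w * (u - new_mean) ** 2)

        # 3. Extract the new Gaussian-like site by dividing the cavity out again.
        new_site_prec = 1.0 / new_var - cav_prec
        new_site_precmean = new_mean / new_var - cav_mean / cav_var
        return new_site_prec, new_site_precmean, new_mean, new_var

    # Example: a fresh site (zero natural parameters) and a probit term for y_i = +1.
    print(ep_site_update(0.0, 1.0, 0.0, 0.0, norm.logcdf))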

9
Likelihood Approximations
P(y | u) ≈ P(y | u_I) ⇒ sparse approximation!
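A minimal numpy sketch of the mechanism behind this: the likelihood is made to depend on the active set u_I only, with the remaining latents replaced via the conditional prior mean E[u | u_I] = K_nI K_I^{-1} u_I. The matrix names are mine, not the talk's.

    import numpy as np

    def conditional_mean_projection(K, I):
        """Return P_I^T = K_nI K_I^{-1}, so that E[u | u_I] = P_I^T u_I under the GP prior."""
        I = list(I)
        K_nI = K[:, I]                                     # Cov(u, u_I)
        L = np.linalg.cholesky(K[np.ix_(I, I)] + 1e-10 * np.eye(len(I)))
        # Only the small |I| x |I| block is factorised; the rest is a matrix product.
        return np.linalg.solve(L.T, np.linalg.solve(L, K_nI.T)).T

    # Usage: inside every likelihood term, u is replaced by conditional_mean_projection(K, I) @ u[I].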
10
Step 3: KL-optimal Projections
  • If P(u | y) ∝ N(m | u, Π^{-1}) P(u), the e-projection to the I-LH-approx. family gives Q(u | y) ∝ N(m | E_P[u | u_I], Π^{-1}) P(u) (Csato/Opper)
  • Here E_P[u | u_I] is the conditional prior mean of u given the active set u_I
  • Good news: E_P[u | u_I] = P_I^T u_I requires only the small inversion K_I^{-1}! ⇒ the O(n^3) scaling can be circumvented (predictor sketch below)
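For Gaussian noise this leads to a sparse, projected-process style predictor in which every factorisation is d x d and the overall cost is O(n d^2). The concrete formulas below follow standard sparse-GP treatments and are my reconstruction, not a verbatim transcription of the slide.

    import numpy as np

    def sparse_gp_predict(K_nI, K_I, k_sI, k_ss, y, sigma2):
        """Projected-process style predictor using only the active set I.

        K_nI : (n, d) cross-covariances between training latents and u_I
        K_I  : (d, d) covariance of the active set
        k_sI : (m, d) cross-covariances between test latents and u_I
        k_ss : (m,)   prior variances at the test points
        """
        d = K_I.shape[0]
        A = sigma2 * K_I + K_nI.T @ K_nI                   # only small matrices are factorised
        L = np.linalg.cholesky(A + 1e-10 * np.eye(d))
        b = np.linalg.solve(L.T, np.linalg.solve(L, K_nI.T @ y))
        mean = k_sI @ b
        # Predictive variance: prior, minus Nystrom term, plus sigma2-weighted correction.
        LI = np.linalg.cholesky(K_I + 1e-10 * np.eye(d))
        V1 = np.linalg.solve(LI, k_sI.T)                   # gives k_sI K_I^{-1} k_Is
        V2 = np.linalg.solve(L, k_sI.T)                    # gives k_sI A^{-1} k_Is
        var = k_ss - np.sum(V1 ** 2, axis=0) + sigma2 * np.sum(V2 ** 2, axis=0)
        return mean, var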

11
Sparse Approximation Scheme
  • Iterate between:
  • Select a new i and include it into I
  • EP updates (m-projections), followed by the e-projection to the I-LH-approx. family; skip EP if the likelihood is Gaussian
  • Exchange moves are possible (unstable?)
  • But how to select inclusion candidates i using a fast score? (loop skeleton below)
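A skeleton of that iteration in numpy, assuming a Gaussian likelihood so the EP pass can be skipped; greedy_sparse_gp and score_fn are hypothetical names, and the scoring function is left abstract because it is the subject of the next slide.

    import numpy as np

    def greedy_sparse_gp(K, y, sigma2, d_max, score_fn):
        """Greedy forward selection of the active set I (Gaussian-noise case)."""
        n = K.shape[0]
        I = []
        while len(I) < d_max:
            candidates = [j for j in range(n) if j not in I]
            scores = [score_fn(I, K, y, sigma2, j) for j in candidates]
            I.append(candidates[int(np.argmax(scores))])
            # For a Gaussian likelihood the representation is refitted exactly here;
            # for non-Gaussian likelihoods one would interleave EP site updates
            # (m-projections) and the e-projection to the I-LH-approx. family.
        return I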

12
Fast Selection Scores
  • Criteria like the information gain D[Q_new || Q] (with Q_new the posterior after including i) are too expensive: u_i is immediately coupled with all n sites!
  • Approximate the criteria by removing most of the couplings in Q_new ⇒ roughly O(|H| d^2) down to O(1) work per candidate (a simple instance is sketched below)

[Diagram: sites and latents, with the latents split into the kept set H, the candidate i, and the remainder {1, ..., n} \ (H ∪ {i}); u_I and u_i are highlighted]
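An illustrative O(1) instance of such a score for the Gaussian-noise case, assuming all couplings outside the candidate itself are dropped (H empty): the candidate's 1-D marginal is updated as if its site were included in isolation, and the score is the KL divergence between the new and old marginals. This is a hedged stand-in for the talk's criterion, not its exact formula.

    import numpy as np

    def cheap_inclusion_score(m_old, v_old, y_j, sigma2):
        """Illustrative O(1) inclusion score for one candidate site (Gaussian noise).

        m_old, v_old: current approximate posterior marginal of u_j.
        """
        # Update the marginal as if site j were included on its own.
        v_new = 1.0 / (1.0 / v_old + 1.0 / sigma2)
        m_new = v_new * (m_old / v_old + y_j / sigma2)
        # 1-D KL divergence KL[new || old] as an information-gain proxy.
        return 0.5 * (v_new / v_old + (m_new - m_old) ** 2 / v_old - 1.0 + np.log(v_old / v_new))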
13
Model Selection
  • Gaussian likelihood (regression): a sparse approximation Q(y) to the marginal likelihood P(y) by plugging in the LH approximation ⇒ iterate between descent on -log Q(y) and re-selection of I (sketch below)
  • General case: minimise the variational criterion behind EP (ADATAP) ⇒ similar to EM, using Q(u | y) instead of the posterior
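A numpy sketch of the Gaussian-noise case: the sparse marginal likelihood Q(y) = N(y | 0, K_nI K_I^{-1} K_In + sigma^2 I), evaluated in O(n d^2) via the Woodbury identity and the matrix determinant lemma; the function and variable names are mine. Descent on -log Q(y) in the kernel parameters, alternated with re-selection of I, would sit on top of this.

    import numpy as np

    def sparse_log_marginal(K_nI, K_I, y, sigma2):
        """log Q(y) = log N(y | 0, K_nI K_I^{-1} K_In + sigma2 * I), with only d x d factorisations."""
        n, d = K_nI.shape
        LI = np.linalg.cholesky(K_I + 1e-10 * np.eye(d))
        V = np.linalg.solve(LI, K_nI.T).T              # V @ V.T == K_nI K_I^{-1} K_In
        B = sigma2 * np.eye(d) + V.T @ V
        LB = np.linalg.cholesky(B)
        c = np.linalg.solve(LB, V.T @ y)
        quad = (y @ y - c @ c) / sigma2                # y^T (V V^T + sigma2 I)^{-1} y  (Woodbury)
        logdet = (n - d) * np.log(sigma2) + 2.0 * np.sum(np.log(np.diag(LB)))
        return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))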

14
Related Work
  • Csato/Opper: the same approximation, but with an online-like scheme of including/removing points instead of greedy forward selection
  • Smola/Bartlett: restricted to regression with Gaussian noise; expensive selection heuristic, O(n d) ⇒ high training cost

15
Experiments
  • Regression with Gaussian noise, using the simplest selection-score approximation (H); see the paper for details
  • Promising: a hard, low-noise task with many irrelevant attributes. The sparse scheme matches the performance of full GPR in less than 1/10 of the time. Methods with an isotropic kernel fail badly ⇒ model selection is essential here

16
Conclusions
  • Sparse approximations overcome the severe scaling problem of GP methods
  • Greedy selection based on active-learning criteria can yield very sparse solutions with errors close to, or better than, those of full GPs
  • Sparse inference is the inner loop of model selection ⇒ fast selection scores are essential for greedy schemes

17
Conclusions (II)
  • Controllable sparsity and training time
  • Staying as close as possible to the gold standard (EP), given resource constraints ⇒ transfer of its properties (error bars, model selection, embedding in other models, ...)
  • A fast, flexible C implementation will be made available