1
Bagging and Bayesian Model Averaging
Jian LI
ENEE698A seminar on statistical machine learning
10/29/03
2
Outline
  • What is bagging?
  • Example
  • Theoretical aspects of Bagging
  • Bayesian model averaging
  • Conclusion

3
What is Bagging?
  • An acronym for bootstrap aggregating
  • Recall the bootstrap:
  • Suppose we have a model fit to a set of training
    data. The training set is
    Z = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}.
  • The basic idea is to randomly draw datasets with
    replacement from the training data. Each sample
    set is the same size as the original training
    set.
  • This is done B times and we will have B bootstrap
    datasets.
  • We refit the model to each of the bootstrap
    datasets, and examine the behavior over the B
    replications.
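As an illustration of the resampling step (a minimal sketch, not from the slides; the function name and signature are made up for this example):

```python
import numpy as np

def bootstrap_datasets(X, y, B, seed=None):
    """Yield B bootstrap datasets: each is drawn with replacement from the
    training data and has the same size N as the original training set."""
    rng = np.random.default_rng(seed)
    N = len(y)
    for _ in range(B):
        idx = rng.integers(0, N, size=N)  # N draws with replacement
        yield X[idx], y[idx]
```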

4
Review on Bootstrap
  • Bootstrap: one way to produce replicate data
  • Can be used to assess the accuracy of a parameter
    estimate or prediction
  • In Bagging, we use it to improve the estimation
    or prediction itself.
  • For each bootstrap sample set Z*b, b = 1, 2, ..., B,
    we fit our model, giving prediction f*b(x). The
    bagging estimate is
    f_bag(x) = (1/B) Σ_{b=1}^{B} f*b(x).
  • Averaging over the B predictions reduces variance.
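A minimal sketch of the bagging estimate (illustrative only; `fit_model` is a placeholder for any fitting routine that returns an object with a `predict` method):

```python
import numpy as np

def bagged_prediction(fit_model, X_train, y_train, X_test, B=100, seed=None):
    """Bagging estimate f_bag(x): fit the model to B bootstrap samples of the
    training set and average the B predictions at the test points."""
    rng = np.random.default_rng(seed)
    N = len(y_train)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)               # bootstrap sample Z*b
        model = fit_model(X_train[idx], y_train[idx])  # refit on Z*b
        preds.append(model.predict(X_test))            # f*b(x)
    return np.mean(preds, axis=0)                      # (1/B) * sum_b f*b(x)
```

For example, `fit_model` could be `lambda X, y: DecisionTreeRegressor().fit(X, y)` with scikit-learn's DecisionTreeRegressor.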

5
Examples in Tree-based methods
  • Tree-based methods
  • Partition the feature space into a set of
    rectangles, and usually fit a constant in each
    one.
  • Some partitions are hard to describe
  • We restrict attention to recursive binary
    partitions
  • First split the space into two regions, and model
    the response in each region by the mean of that
    region. We will choose the variable and split
    point to achieve best fit.
  • Do the same thing on the partitioned regions
    until a stopping rule is met.
  • The regression model is then
    f(x) = Σ_{m=1}^{M} c_m I(x ∈ R_m),
    i.e., a constant c_m in each region R_m (see the
    sketch below).
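As a concrete illustration (not part of the original slides), scikit-learn's DecisionTreeRegressor fits exactly this kind of piecewise-constant model by recursive binary splitting:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))          # one-dimensional feature space
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)  # noisy response

# Each leaf corresponds to a rectangle R_m; the prediction in R_m is the
# mean c_m of the training responses that fall in that rectangle.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.5], [7.5]]))            # piecewise-constant predictions
```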

6
Importance of Bagging in CART
  • CART = Classification and Regression Tree
  • To construct the tree, usually MSE or
    Misclassification error is minimized over the
    training sample.
  • Tree-based methods have very high variance. They
    are unstable because of the hierarchical structure.
  • Bagging can average many trees to reduce the
    variance.
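One way to try this in practice (an illustration, not from the slides): scikit-learn's BaggingRegressor, whose default base estimator is a CART-style decision tree:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)

# A single deep tree has low bias but high variance; bagging fits many trees
# to bootstrap samples and averages their predictions to reduce the variance.
single_tree = DecisionTreeRegressor().fit(X, y)
bagged_trees = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
print(single_tree.predict([[5.0]]), bagged_trees.predict([[5.0]]))
```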

7
Example I: Tree-based Regression
(Figure from Ref. [1])
8
Example II: Classification Tree
(Figure from Ref. [2]: training sample, decision boundary from a single
CART, and the bagged-tree decision boundary)
9
Results from Breiman 96
Bagging is not suitable for stable estimators. (From Ref. [1])
10
Theoretical Analysis: Bootstrap
  • Bootstrap vs. ML and Bayesian Approach
  • In essence, bootstrap is a computer
    implementation of non-parametric or parametric
    maximum likelihood. The advantage over ML is that
    it allows us to compute ML estimates of standard
    errors and other quantities when no analytical
    solutions are available.
  • The bootstrap mean is approximately a posterior
    average when we assume a non-informative prior.
    Compared to the Bayesian approach, to estimate the
    posterior mean we avoid having to specify a prior
    and to draw samples from the posterior.

11
Analysis on Bagging
Denote by P̂ the empirical distribution putting
    equal probability 1/N on each of the data points
    (x_i, y_i). The true bagging estimate is defined by
    f_bag(x) = E_P̂ [ f*(x) ], the expectation over
    bootstrap samples Z* drawn from P̂.
  • The formula f_bag(x) = (1/B) Σ_{b=1}^{B} f*b(x) is
    actually a Monte Carlo estimate of the true bagging
    estimate, approaching it as B → ∞.
  • Note that the training-sample estimate corresponds
    to the mode of the posterior, while the bagged
    estimate is an approximate posterior mean. That is
    why bagging can often reduce MSE.
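A small numerical check of the Monte Carlo claim (illustrative, with made-up data): the bagged prediction at a fixed point settles down as B grows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 100)
x0 = [[5.0]]                                  # point at which we bag the prediction
N = len(y)

for B in (10, 100, 1000):
    preds = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)      # bootstrap sample from the empirical distribution
        fit = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(fit.predict(x0)[0])
    print(B, np.mean(preds))                  # Monte Carlo bagging estimate at x0
```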

12
Bayesian Model Averaging
  • A more general framework. Suppose we have a set
    of candidate models M_m, m = 1, ..., M, for our
    training set Z.
  • These models may be of the same type with
    different parameters, or different models for the
    same task.
  • Suppose ζ is some quantity of interest, for
    example a prediction f(x) at some fixed point x.
  • The posterior distribution of ζ is
    Pr(ζ | Z) = Σ_{m=1}^{M} Pr(ζ | M_m, Z) Pr(M_m | Z).

13
More on Model Averaging
  • The posterior mean is
    E(ζ | Z) = Σ_{m=1}^{M} E(ζ | M_m, Z) Pr(M_m | Z).
  • So the Bayesian prediction is a weighted average
    of the individual predictions, with weights
    proportional to the posterior probability of each
    model.
  • Different strategies follow from here:
  • Committee methods: give equal probability to
    each model. (Q: Is it the same as bagging?)
  • Use the BIC criterion to calculate the weights
    (a sketch follows after this list).
  • Minimization over the weights (next slide).
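A sketch of the BIC-weighted average (illustrative; the function name and argument shapes are assumptions). It uses the standard approximation Pr(M_m | Z) ∝ exp(-BIC_m / 2):

```python
import numpy as np

def bma_prediction(preds, bic):
    """Bayesian-model-averaged prediction.
    preds: array of shape (M, n) -- each row is one model's predictions
    bic:   array of shape (M,)   -- BIC value of each model
    """
    bic = np.asarray(bic, dtype=float)
    w = np.exp(-0.5 * (bic - bic.min()))  # shift by the minimum for numerical stability
    w /= w.sum()                          # approximate posterior model probabilities
    return w @ np.asarray(preds)          # weighted average of the predictions

# Committee method: the same weighted average, but with equal weights w_m = 1/M.
```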

14
Formulation
  • Given predictions f_1(x), ..., f_M(x), under
    squared-error loss we can seek the weights
    w = (w_1, ..., w_M) that minimize
    E_P [ Y - Σ_{m=1}^{M} w_m f_m(x) ]^2.
  • The solution is the population linear regression
    of Y on F(x) = (f_1(x), ..., f_M(x))^T, which is
    w = E_P[ F(x) F(x)^T ]^{-1} E_P[ F(x) Y ].
  • So the full regression has smaller error than any
    single model.
  • Since the population linear regression in the
    equation is not available, we can replace it with
    the linear regression over the training set (a
    sketch follows below).
  • Drawback: complicated models tend to get higher
    weights in this method. Stacking can handle that
    better.
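A minimal training-set version of this regression (illustrative names; it is just ordinary least squares of y on the M columns of model predictions):

```python
import numpy as np

def combination_weights(F, y):
    """F: (N, M) array, column m holds model m's predictions on the training set.
    y: (N,) array of training responses.
    Returns the least-squares weights, the training-set stand-in for
    E[F(x) F(x)^T]^{-1} E[F(x) Y]."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

# Combined prediction at new points: F_new @ w
```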

15
Conclusion
  • Bagging can be used to reduce variance and to deal
    with unstable estimators or predictors such as CART.
  • The bootstrap distribution essentially approximates
    the posterior distribution under certain conditions.
  • Bayesian model averaging provides a general
    framework for model selection and combination.
  • Connections with future talks:
  • Boosting can usually outperform bagging, as will
    be discussed later.
  • Stacking can take model complexity into account,
    which will be shown by Arunm in a moment.

16
References
  • [1] Ridgeway, G., et al. Lecture slides.
    http://www.datamininglab.com/pubs/kdd99_elder_ridgeway.pdf
  • [2] Higgs, R., et al. Lecture slides.
    http://miner.chem.purdue.edu/Lectures/Lecture16%20-%20Higgs_Ensembles.pdf
  • [3] Breiman, L. (1996). Bagging predictors.
    Machine Learning, 24(2), 123-140.
  • [4] LeBlanc, M. and Tibshirani, R. (1996). Combining
    estimates in regression and classification. Journal
    of the American Statistical Association, 91, 1641-1650.
  • [5] http://www-stat.stanford.edu/~jhf/
  • [6] Textbook: The Elements of Statistical Learning
    (Hastie, Tibshirani, and Friedman).