1
Bagging and Bayesian Model Averaging
Jian LI
ENEE698A seminar on statistical machine learning
10/29/03
2
Outline
  • What is bagging?
  • Example
  • Theoretical aspects of Bagging
  • Bayesian model averaging
  • Conclusion

3
What is Bagging?
  • An acronym for bootstrap aggregating
  • Recall the bootstrap:
  • Suppose we have a model fit to a set of training
    data. The training set is
    Z = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}.
  • The basic idea is to randomly draw datasets with
    replacement from the training data. Each sample
    set is the same size as the original training
    set.
  • This is done B times and we will have B bootstrap
    datasets.
  • We refit the model to each of the bootstrap
    datasets, and examine the behavior over the B
    replications.
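As an illustration of the resampling step (a minimal sketch, not from the slides; the function name and signature are made up for this example):

```python
import numpy as np

def bootstrap_datasets(X, y, B, seed=None):
    """Yield B bootstrap datasets: each is drawn with replacement from the
    training data and has the same size N as the original training set."""
    rng = np.random.default_rng(seed)
    N = len(y)
    for _ in range(B):
        idx = rng.integers(0, N, size=N)  # N draws with replacement
        yield X[idx], y[idx]
```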

4
Review on Bootstrap
  • Bootstrap: one way to produce replicate data
  • Can be used to assess the accuracy of a parameter
    estimate or prediction
  • In Bagging, we use it to improve the estimation
    or prediction itself.
  • For each bootstrap sample set Z*b, b = 1, 2, ..., B,
    we fit our model, giving prediction f*b(x). The
    bagging estimate is
    f_bag(x) = (1/B) Σ_{b=1}^{B} f*b(x).
  • Averaging over the B predictions reduces variance.
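A minimal sketch of the bagging estimate (illustrative only; `fit_model` is a placeholder for any fitting routine that returns an object with a `predict` method):

```python
import numpy as np

def bagged_prediction(fit_model, X_train, y_train, X_test, B=100, seed=None):
    """Bagging estimate f_bag(x): fit the model to B bootstrap samples of the
    training set and average the B predictions at the test points."""
    rng = np.random.default_rng(seed)
    N = len(y_train)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)               # bootstrap sample Z*b
        model = fit_model(X_train[idx], y_train[idx])  # refit on Z*b
        preds.append(model.predict(X_test))            # f*b(x)
    return np.mean(preds, axis=0)                      # (1/B) * sum_b f*b(x)
```

For example, `fit_model` could be `lambda X, y: DecisionTreeRegressor().fit(X, y)` with scikit-learn's DecisionTreeRegressor.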

5
Examples in Tree-based methods
  • Tree-based methods
  • Partition the feature space into a set of
    rectangles, and usually fit a constant in each
    one.
  • Some partitions are hard to describe
  • We restrict attention to recursive binary
    partitions
  • First split the space into two regions, and model
    the response in each region by the mean of that
    region. We will choose the variable and split
    point to achieve best fit.
  • Do the same thing on the partitioned regions
    until a stopping rule is met.
  • The regression model is then
    f(x) = Σ_{m=1}^{M} c_m I(x ∈ R_m),
    i.e., a constant c_m in each region R_m (see the
    sketch below).
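As a concrete illustration (not part of the original slides), scikit-learn's DecisionTreeRegressor fits exactly this kind of piecewise-constant model by recursive binary splitting:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))          # one-dimensional feature space
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)  # noisy response

# Each leaf corresponds to a rectangle R_m; the prediction in R_m is the
# mean c_m of the training responses that fall in that rectangle.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.5], [7.5]]))            # piecewise-constant predictions
```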

6
Importance of Bagging in CART
  • CART = Classification and Regression Tree
  • To construct the tree, usually MSE or
    Misclassification error is minimized over the
    training sample.
  • Tree-based methods have very high variance. They
    are unstable because of the hierarchical structure.
  • Bagging can average many trees to reduce the
    variance.
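One way to try this in practice (an illustration, not from the slides): scikit-learn's BaggingRegressor, whose default base estimator is a CART-style decision tree:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)

# A single deep tree has low bias but high variance; bagging fits many trees
# to bootstrap samples and averages their predictions to reduce the variance.
single_tree = DecisionTreeRegressor().fit(X, y)
bagged_trees = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
print(single_tree.predict([[5.0]]), bagged_trees.predict([[5.0]]))
```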

7
Example I: Tree-based Regression
(Figure from Ref. [1])
8
Example II: Classification Tree
(Figure from Ref. [2]: training sample, decision boundary from a single
CART, and the bagged-tree decision boundary)
9
Results from Breiman 96
Bagging is not suitable for stable estimators. (From Ref. [1])
10
Theoretical Analysis: Bootstrap
  • Bootstrap vs. ML and Bayesian Approach
  • In essence, bootstrap is a computer
    implementation of non-parametric or parametric
    maximum likelihood. The advantage over ML is that
    it allows us to compute ML estimates of standard
    errors and other quantities when no analytical
    solutions are available.
  • The bootstrap mean is approximately a posterior
    average when we assume a non-informative prior.
    Compared to the Bayesian approach, to estimate the
    posterior mean we avoid having to specify a prior
    and to draw samples from the posterior.

11
Analysis on Bagging
Denote by P̂ the empirical distribution putting
    equal probability 1/N on each of the data points
    (x_i, y_i). The true bagging estimate is defined by
    f_bag(x) = E_P̂ [ f*(x) ], the expectation over
    bootstrap samples Z* drawn from P̂.
  • The formula f_bag(x) = (1/B) Σ_{b=1}^{B} f*b(x) is
    actually a Monte Carlo estimate of the true bagging
    estimate, approaching it as B → ∞.
  • Note that the training-sample estimate corresponds
    to the mode of the posterior, while the bagged
    estimate is an approximate posterior mean. That is
    why bagging can often reduce MSE.
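A small numerical check of the Monte Carlo claim (illustrative, with made-up data): the bagged prediction at a fixed point settles down as B grows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 100)
x0 = [[5.0]]                                  # point at which we bag the prediction
N = len(y)

for B in (10, 100, 1000):
    preds = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)      # bootstrap sample from the empirical distribution
        fit = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(fit.predict(x0)[0])
    print(B, np.mean(preds))                  # Monte Carlo bagging estimate at x0
```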

12
Bayesian Model Averaging
  • A more general framework. Suppose we have a set
    of candidate models M_m, m = 1, ..., M, for our
    training set Z.
  • These models may be of the same type with
    different parameters, or different models for the
    same task.
  • Suppose ζ is some quantity of interest, for
    example a prediction f(x) at some fixed point x.
  • The posterior distribution of ζ is
    Pr(ζ | Z) = Σ_{m=1}^{M} Pr(ζ | M_m, Z) Pr(M_m | Z).

13
More on Model Averaging
  • The posterior mean is
    E(ζ | Z) = Σ_{m=1}^{M} E(ζ | M_m, Z) Pr(M_m | Z).
  • So the Bayesian prediction is a weighted average
    of the individual predictions, with weights
    proportional to the posterior probability of each
    model.
  • Different strategies follow from here:
  • Committee methods: give equal probability to
    each model. (Q: Is it the same as bagging?)
  • Use the BIC criterion to calculate the weights
    (a sketch follows after this list).
  • Minimization over the weights (next slide).
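A sketch of the BIC-weighted average (illustrative; the function name and argument shapes are assumptions). It uses the standard approximation Pr(M_m | Z) ∝ exp(-BIC_m / 2):

```python
import numpy as np

def bma_prediction(preds, bic):
    """Bayesian-model-averaged prediction.
    preds: array of shape (M, n) -- each row is one model's predictions
    bic:   array of shape (M,)   -- BIC value of each model
    """
    bic = np.asarray(bic, dtype=float)
    w = np.exp(-0.5 * (bic - bic.min()))  # shift by the minimum for numerical stability
    w /= w.sum()                          # approximate posterior model probabilities
    return w @ np.asarray(preds)          # weighted average of the predictions

# Committee method: the same weighted average, but with equal weights w_m = 1/M.
```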

14
Formulation
  • Given predictions f_1(x), ..., f_M(x), under
    squared-error loss we can seek the weights
    w = (w_1, ..., w_M) that minimize
    E_P [ Y - Σ_{m=1}^{M} w_m f_m(x) ]^2.
  • The solution is the population linear regression
    of Y on F(x) = (f_1(x), ..., f_M(x))^T, which is
    w = E_P[ F(x) F(x)^T ]^{-1} E_P[ F(x) Y ].
  • So the full regression has smaller error than any
    single model.
  • Since the population linear regression in the
    equation is not available, we can replace it with
    the linear regression over the training set (a
    sketch follows below).
  • Drawback: complicated models tend to get higher
    weights in this method. Stacking can handle that
    better.
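A minimal training-set version of this regression (illustrative names; it is just ordinary least squares of y on the M columns of model predictions):

```python
import numpy as np

def combination_weights(F, y):
    """F: (N, M) array, column m holds model m's predictions on the training set.
    y: (N,) array of training responses.
    Returns the least-squares weights, the training-set stand-in for
    E[F(x) F(x)^T]^{-1} E[F(x) Y]."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

# Combined prediction at new points: F_new @ w
```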

15
Conclusion
  • Bagging can be used to reduce variance and to deal
    with unstable estimators or predictors such as CART.
  • The bootstrap distribution essentially approximates
    the posterior distribution under certain conditions.
  • Bayesian model averaging provides a general
    framework for model selection and combination.
  • Connections with future talks:
  • Boosting can usually outperform bagging, as will
    be discussed later.
  • Stacking can take model complexity into account,
    which will be shown by Arunm in a moment.

16
References
  • [1] Ridgeway, G., et al. Lecture slides.
    http://www.datamininglab.com/pubs/kdd99_elder_ridgeway.pdf
  • [2] Higgs, R., et al. Lecture slides.
    http://miner.chem.purdue.edu/Lectures/Lecture16%20-%20Higgs_Ensembles.pdf
  • [3] Breiman, L. (1996). Bagging predictors.
    Machine Learning, 24(2), 123-140.
  • [4] LeBlanc, M. and Tibshirani, R. (1996). Combining
    estimates in regression and classification. Journal
    of the American Statistical Association, 91, 1641-1650.
  • [5] http://www-stat.stanford.edu/~jhf/
  • [6] Textbook: The Elements of Statistical Learning
    (Hastie, Tibshirani, and Friedman).