1
GREEDY FUNCTION APPROXIMATION: BOOSTING and
ADDITIVE TREES
  • Presented by Jie Shao

2
Roadmap
  • Detour: an introduction to boosting
  • Boosting Trees
  • Numerical Optimization
  • Right-sized Trees for Boosting

3
Background: Ensemble Learning
  • Some useful results:
  • Combining multiple learned models constructs
    better generalizations.
  • Classifiers that always agree won't give new
    information.
  • Combining the predictions of an ensemble is often
    more accurate than any single prediction (a toy
    illustration follows this slide).
  • Ideal learning ensemble: individually accurate
    classifiers with a high level of disagreement.
  • Applications: bagging, boosting.
  • Why do ensembles work?
  • With small training data and a large hypothesis
    space, many equally accurate classifiers remain.
  • They compensate for non-optimal search (several
    local solutions).
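A minimal sketch of the accuracy claim above, assuming NumPy is available; the data, the 65% accuracy figure, and all names are invented for illustration:

  import numpy as np

  rng = np.random.default_rng(1)
  y = rng.integers(0, 2, size=1000)  # true labels
  # Three independent weak learners, each correct only 65% of the time.
  preds = [np.where(rng.random(1000) < 0.65, y, 1 - y) for _ in range(3)]
  vote = (sum(preds) >= 2).astype(int)  # majority vote of the committee
  for i, p in enumerate(preds):
      print(f"learner {i}: {np.mean(p == y):.3f}")  # each around 0.65
  print(f"majority : {np.mean(vote == y):.3f}")     # around 0.72

With three independent 65% learners, the vote is correct whenever at least two agree with the truth: 0.65^3 + 3(0.65^2)(0.35) ≈ 0.72, which is what the simulation shows.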

4
Boosting
  • Why called boosting? Because it attempts to boost
    the accuracy of any given learning algorithm,
    however weak it is. (A weak learner is a learner
    that performs only slightly better than a coin
    flip.)
  • Motivation: combine the outputs of many weak
    classifiers to produce a powerful committee.
  • Strategy: fit the model with a set of elementary
    basis functions in an additive way.
  • General form: $f(x) = \sum_{m=1}^{M} \beta_m\,
    b(x; \gamma_m)$, where the $\beta_m$ are expansion
    coefficients and $b(x; \gamma)$ are simple basis
    functions characterized by parameters $\gamma$.

5
Performance Improvements of Boosting
  • Generates a hypothesis whose error on the
    training set is small by combining many
    hypotheses whose individual errors are large.
  • Reality: training examples exhibit different
    degrees of hardness.
  • Each learning algorithm can behave unstably,
    i.e., appear sensitive to changes in the training
    data.
  • Boosting reduces both variance and bias, while
    bagging can only significantly reduce variance.

6
Boosting Trees
  • General tree definition: $T(x; \Theta) =
    \sum_{j=1}^{J} \gamma_j I(x \in R_j)$, with
    parameters $\Theta = \{R_j, \gamma_j\}_{j=1}^{J}$
    (a sketch of this piecewise-constant form
    follows).
  • Committee of trees (boosted) model: $f_M(x) =
    \sum_{m=1}^{M} T(x; \Theta_m)$.
  • Optimization criterion: at each step, solve
    $\hat{\Theta}_m = \arg\min_{\Theta_m}
    \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) +
    T(x_i; \Theta_m))$.
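A minimal sketch of the tree definition above, assuming NumPy; the splits, regions, and constants are hard-coded and purely illustrative:

  import numpy as np

  def tree_predict(x, splits, gammas):
      """Piecewise-constant tree on a 1-D input: T(x) = gamma_j for x in R_j."""
      j = np.searchsorted(splits, x)  # index of the region R_j containing x
      return gammas[j]

  splits = np.array([0.3, 0.7])        # R_1 = (-inf, 0.3], R_2 = (0.3, 0.7], R_3 = (0.7, inf)
  gammas = np.array([-1.0, 0.5, 2.0])  # constant gamma_j assigned to each region
  print(tree_predict(np.array([0.1, 0.5, 0.9]), splits, gammas))  # [-1.  0.5  2. ]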

7
Boosting Trees (cont'd)
  • If we already know the split regions $R_j$,
    finding the constants $\gamma_j$ is quite easy for
    any loss function.
  • Finding the regions $R_j$ is difficult; the
    typical strategy is a greedy, top-down recursive
    partitioning algorithm.
  • The region constants have simple closed forms when
    the loss function is squared error or exponential
    (written out below).
  • Otherwise, we need numerical methods for the
    optimization.
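For concreteness (the transcript omits the formulas, so these are the standard closed forms): for squared-error loss, the optimal constant in region $R_j$ is the mean of the current residuals,

$\hat{\gamma}_j = \operatorname{mean}\{\, y_i - f_{m-1}(x_i) : x_i \in R_j \,\}$,

while for two-class exponential loss it is the weighted half log-odds,

$\hat{\gamma}_j = \tfrac{1}{2} \log \frac{\sum_{x_i \in R_j} w_i\, I(y_i = 1)}{\sum_{x_i \in R_j} w_i\, I(y_i = -1)}$, with weights $w_i = e^{-y_i f_{m-1}(x_i)}$.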

8
Numerical Optimization
  • Fast approximate algorithms for solving the
    optimization problem with any differentiable loss
    criterion can be derived by analogy to numerical
    optimization.
  • Loss function: $L(f) = \sum_{i=1}^{N}
    L(y_i, f(x_i))$.
  • Goal: minimize $L(f)$ with respect to $f$, where
    $f = \{f(x_1), \ldots, f(x_N)\}$ is the vector of
    predictions at the $N$ training points,
  • and $f$ is constrained to be a sum of trees.
  • Recap of numerical optimization: approximate the
    minimizer as a sum of component vectors,
    $f_M = \sum_{m=0}^{M} h_m$, where $h_0$ is an
    initial guess and each subsequent step $h_m$ is
    chosen by the method.

9
Greedy Strategy: Steepest Descent
  • Steepest descent is one of the simplest of the
    frequently used numerical minimization methods.
  • It performs a gradient-descent search (a line
    search along the steepest direction) to minimize
    the loss on the training data, as written out
    below.
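Written out (the transcript omits the formulas, so this follows the standard steepest-descent recursion): the step direction is the negative gradient with components

$g_{im} = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x_i) = f_{m-1}(x_i)}$,

the step length comes from the line search $\rho_m = \arg\min_\rho L(f_{m-1} - \rho\, g_m)$, and the update is $f_m = f_{m-1} - \rho_m\, g_m$.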

10
Gradient Boosting
  • The approach is analogous to the line search in
    steepest descent, but it performs a separate line
    search for the tree component corresponding to
    each terminal region in each iteration.
  • Q: Why not just use steepest descent?
  • A: We need to generalize to the whole data space,
    not only the training data; the gradient is
    defined only at the training points, so steepest
    descent alone cannot predict at new inputs.

11
Gradient Tree Boosting: A Feasible Approach
  • At the mth iteration, induce a tree whose
    predictions are as close as possible to the
    negative gradient.
  • Using squared error to measure closeness:
    $\tilde{\Theta}_m = \arg\min_\Theta
    \sum_{i=1}^{N} (-g_{im} - T(x_i; \Theta))^2$
    (a one-step sketch follows).
  • This is more robust and less likely to overfit.
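A minimal sketch of one such step, assuming scikit-learn is available; DecisionTreeRegressor stands in for the tree inducer, and the function name and defaults are invented for illustration:

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  def boosting_step(X, y, current_pred, J=6):
      # For L = (y - f)^2 / 2 the negative gradient is just the residual.
      neg_gradient = y - current_pred
      tree = DecisionTreeRegressor(max_leaf_nodes=J)  # J terminal regions
      tree.fit(X, neg_gradient)  # least-squares fit of the tree to -g
      return tree

  X = np.arange(10, dtype=float).reshape(-1, 1)
  y = np.sin(X[:, 0])
  print(boosting_step(X, y, np.zeros_like(y)).predict(X))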

12
Algorithm MART (Multiple Additive Regression
Trees)
  • Initialize $f_0(x) = \arg\min_\gamma
    \sum_{i=1}^{N} L(y_i, \gamma)$.
  • For m = 1, ..., M:
  • For i = 1, ..., N, compute the pseudo-residuals
    $r_{im} = -\left[ \partial L(y_i, f(x_i)) /
    \partial f(x_i) \right]_{f = f_{m-1}}$.
  • Fit a regression tree to the targets $r_{im}$,
    giving terminal regions $R_{jm}$,
    j = 1, 2, ..., J_m.
  • Compute the region constants $\gamma_{jm} =
    \arg\min_\gamma \sum_{x_i \in R_{jm}}
    L(y_i, f_{m-1}(x_i) + \gamma)$.
  • Update $f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m}
    \gamma_{jm} I(x \in R_{jm})$.
  • Output $\hat{f}(x) = f_M(x)$. (A runnable sketch
    follows.)
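A minimal runnable sketch of MART for squared-error loss, assuming scikit-learn's DecisionTreeRegressor as the tree inducer; the function names, toy data, and parameter defaults are invented for illustration, and shrinkage (which Friedman recommends in practice) is omitted to match the slide:

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  def fit_mart(X, y, M=100, J=6):
      f0 = y.mean()                  # argmin_gamma sum (y_i - gamma)^2
      pred = np.full(len(y), f0)
      trees = []
      for m in range(M):
          r = y - pred               # pseudo-residuals: -dL/df for squared error
          tree = DecisionTreeRegressor(max_leaf_nodes=J)
          tree.fit(X, r)             # gives regions R_jm; for squared error the
          pred += tree.predict(X)    # fitted leaf means are exactly the gamma_jm
          trees.append(tree)
      return f0, trees

  def predict_mart(model, X):
      f0, trees = model
      return f0 + sum(t.predict(X) for t in trees)

  rng = np.random.default_rng(0)
  X = rng.uniform(size=(200, 2))
  y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.1, size=200)
  model = fit_mart(X, y)
  print(np.mean((predict_mart(model, X) - y) ** 2))  # training MSE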

13
Remarks for MART
  • Initialization step: the model is just a
    single-terminal-node tree (a constant).
  • The parameters associated with the MART procedure
    are:
  • the number of iterations M, and
  • the sizes of the constituent trees, $J_m$.
  • The gradient search operates in a constrained
    function space in which each component is a tree.

14
Discussion on Tree Size for Boosting
  • The tree-building algorithm is regarded as a
    primitive that produces models to be combined by
    boosting.
  • During each iteration:
  • an oversized tree is induced, then pruned by a
    bottom-up procedure.
  • But the tree that is best for one step is not the
    best in the long run.
  • Disadvantages of oversized trees:
  • degraded performance, and
  • increased computation.

15
Right-Sized Trees for Boosting
  • Strategy: restrict all trees to be the same size,
    $J$.
  • Adjust $J$ to maximize estimated performance for
    the data at hand.
  • Useful property of tree size:
  • it limits the input-feature interaction level of
    the tree-based approximation;
  • i.e., no interaction effects of level greater than
    $J - 1$ are possible.

16
Estimating Tree Size via Interaction Level
  • Target function: $\eta = \arg\min_f
    E_{x,y} L(y, f(x))$.
  • The degree to which the coordinate variables
    interact with one another can be captured by the
    ANOVA expansion:
    $\eta(x) = \sum_j \eta_j(x_j) + \sum_{j,k}
    \eta_{jk}(x_j, x_k) + \sum_{j,k,l}
    \eta_{jkl}(x_j, x_k, x_l) + \cdots$
  • In practice, low-order interaction effects tend to
    dominate; empirical results indicate that
    $4 \le J \le 8$ works well (a tuning sketch
    follows).
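A minimal sketch of tuning $J$ by validation, assuming scikit-learn (GradientBoostingRegressor's max_leaf_nodes parameter plays the role of $J$ here); the data and all settings are invented for illustration:

  import numpy as np
  from sklearn.ensemble import GradientBoostingRegressor
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)
  X = rng.uniform(size=(500, 3))
  # Target with a second-order interaction, so J = 2 (stumps) should underfit.
  y = np.sin(4 * X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=500)

  X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
  for J in (2, 4, 6, 8):
      gbm = GradientBoostingRegressor(max_leaf_nodes=J, n_estimators=200,
                                      random_state=0)
      gbm.fit(X_tr, y_tr)
      mse = np.mean((gbm.predict(X_va) - y_va) ** 2)
      print(f"J = {J}: validation MSE = {mse:.4f}")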

18
Conclusion
  • We introduced the machine learning method
    boosting.
  • A boosted tree is a boosting method whose base
    learners are trees.
  • Several numerical optimization methods were
    discussed for tree prediction.
  • MART is a state-of-the-art algorithm.
  • For tuning the tree-size parameter, the simplest
    approach is the right-sized tree method.

19
References
  • Jerome H. Friedman (1999). Greedy Function
    Approximation: A Gradient Boosting Machine. IMS
    1999 Reitz Lecture.
  • Jerome H. Friedman (1999). Stochastic Gradient
    Boosting.
  • www.boosting.org