1
9.2. Tree-Based Methods
  • Principle
  • 1. Partition the feature space, X, into a set of
    rectangles: homogeneous regions Rm
    (recursive binary partitions)
  • 2. Fit a simple model for Y (e.g. a constant) in
    each rectangle
  • CART: Classification and Regression Trees
  • Y continuous → regression model:
    Regression Trees
  • Y categorical → classification model:
    Classification Trees
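An illustrative sketch (not part of the original slides): fitting the two CART
variants with scikit-learn's standard tree estimators on hypothetical data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                      # two features X1, X2

# Y continuous -> regression tree
y_reg = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=200)
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)

# Y categorical -> classification tree
y_clf = (X[:, 0] + X[:, 1] > 1).astype(int)
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_clf)

print(reg_tree.predict(X[:3]), clf_tree.predict(X[:3]))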

2
Example in the regression case: Y continuous, inputs X1 and X2
  1. Binary partition on X1 ≤ t1 → 2 regions: Ra = {X1 ≤ t1},
     Rb = {X1 > t1}
  2. Model Y in Ra and in Rb
  3. Recursive partition: split Ra again with X2 ≤ t2, and Rb
     with X1 ≤ t3, ...
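A minimal sketch of the resulting piecewise-constant predictor; the split
points t1, t2, t3 and the region constants below are hypothetical values
chosen only for illustration.

t1, t2, t3 = 0.5, 0.4, 0.8                              # assumed split points
c = {"Ra1": 1.0, "Ra2": 2.0, "Rb1": 3.0, "Rb2": 4.0}    # assumed region constants

def predict(x1, x2):
    if x1 <= t1:                                    # first binary split on X1
        return c["Ra1"] if x2 <= t2 else c["Ra2"]   # Ra split again on X2
    else:
        return c["Rb1"] if x1 <= t3 else c["Rb2"]   # Rb split again on X1

print(predict(0.3, 0.7), predict(0.9, 0.1))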
3
  • Advantage: easy to interpret
  • Disadvantage: instability
  • The error made at an upper level is
    propagated to the lower levels
  • How to grow a tree?
  • Choose a splitting variable and
    a split point
  • → minimize an impurity criterion
  • How large should we grow the tree?
  • Need a stopping rule based on an
    impurity criterion

4
9.2.2. Regression Trees
  • Data: (xi, yi) for i = 1, 2, ..., N, with xi = (xi1, ..., xip)
  • Partition the space into M regions R1, R2, ...,
    RM
  • Fit a constant cm in each region:
    f(x) = Σ_{m=1..M} cm I(x ∈ Rm), with ĉm = ave(yi | xi ∈ Rm)
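Once the regions Rm are fixed, the least-squares constant in each region is
just the mean of the responses falling in it. A minimal sketch; region_of is
an assumed helper that maps a point to its region index.

import numpy as np

def fit_region_constants(X, y, region_of):
    """Return {m: mean of y_i with x_i in R_m}; y is a NumPy array."""
    labels = np.array([region_of(x) for x in X])
    return {m: y[labels == m].mean() for m in np.unique(labels)}

def predict(x, region_of, c_hat):
    # f(x) = sum_m c_m * I(x in R_m): look up the constant of x's region
    return c_hat[region_of(x)]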

5
How to grow the regression tree
  • Find the best variable Xj and split value s
    that minimize the sum of squared
    errors
  • Finding the global minimum is computationally
    infeasible
  • Greedy algorithm:
    at each level choose variable j and split point s as
    min_{j,s} [ min_{c1} Σ_{xi ∈ R1(j,s)} (yi − c1)² + min_{c2} Σ_{xi ∈ R2(j,s)} (yi − c2)² ]
    (a sketch of this greedy scan follows)
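A sketch of one greedy step: scan every variable j and every observed value as
a candidate split point s, keeping the pair that minimizes the sum of squared
errors of the two half-regions (with c1, c2 set to the region means). This
brute-force scan is for illustration only.

import numpy as np

def best_split(X, y):
    best = (None, None, np.inf)                 # (j, s, total squared error)
    n, p = X.shape
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] > 0.5, 2.0, -1.0) + rng.normal(scale=0.1, size=100)
print(best_split(X, y))                         # recovers a split on X1 near 0.5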

6
How large should we grow the tree?
  • Trade-off between accuracy and generalization
  • A very large tree overfits
  • A small tree might not capture the structure
  • Strategies
  • 1. Split only when the decrease in error > threshold
    (short-sighted)
  • 2. Cost-complexity pruning (preferred)
  • - Grow a large tree T0 (splitting until a minimum
    node size, e.g. Nm = 5, is reached)
  • - Pruning: collapse some internal nodes
  • → find Tα ⊆ T0 minimizing the cost-complexity criterion

7
  • Minimize the cost complexity
    C_α(T) = Σ_{m=1..|T|} Nm Qm(T) + α |T|
  • |T| = number of terminal nodes in T,
    Nm = number of observations in node m,
    Qm(T) = impurity of node m
  • α ≥ 0: tuning parameter
  • For each α, a unique Tα
  • To find Tα: weakest-link pruning
  • Each time, collapse the internal node that adds the
    smallest error
  • → a sequence of subtrees containing Tα
  • Estimation of α: minimize the cross-validated
    sum of squares (p. 214)
  • Choose the final tree from this sequence
    (the one with the selected α; see the sketch below)
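An illustrative sketch of the same procedure with scikit-learn, which exposes
the weakest-link sequence through cost_complexity_pruning_path and the
ccp_alpha parameter; the data here is hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.2, size=300)

# One alpha per collapsed internal node: the nested sequence of subtrees
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# Estimate alpha by cross-validated squared error, then refit the pruned tree
cv_mse = [-cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                           X, y, cv=5, scoring="neg_mean_squared_error").mean()
          for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmin(cv_mse))]
pruned = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print("chosen alpha:", best_alpha)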

8
9.2.3. Classification Trees
  • Y ∈ {1, 2, ..., k, ..., K}
  • Classify the observations in node m to the majority
    class in the node
  • with p_mk = proportion of observations of class k
    in node m
  • Splitting: minimize Qm(T)
  • Pruning
  • In regression, Qm(T) = squared-error node impurity
  • → not suitable for classification
  • → need other choices of Qm(T)

9
  • Define Qm(T) for a node m
  • If we classify the observations in node m to class
    k(m) = argmax_k p_mk :
  • Misclassification error: 1 − p_{m,k(m)}
  • Gini index: Σ_k p_mk (1 − p_mk)
  • Cross-entropy: − Σ_k p_mk log p_mk
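A minimal sketch of the three impurities, written as functions of the vector
of class proportions p_mk in a node.

import numpy as np

def misclassification_error(p):
    return 1.0 - np.max(p)                      # 1 - p_{m,k(m)}

def gini(p):
    return np.sum(p * (1.0 - p))                # sum_k p_mk (1 - p_mk)

def cross_entropy(p):
    p = p[p > 0]                                # treat 0 * log 0 as 0
    return -np.sum(p * np.log(p))               # -sum_k p_mk log p_mk

p = np.array([0.7, 0.2, 0.1])
print(misclassification_error(p), gini(p), cross_entropy(p))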

10
  • Example: 2 classes of Y, p = the proportion of the
    second class
  • Cross-entropy and Gini are more sensitive to changes
    in the node probabilities
  • → e.g. they are lower for purer nodes
  • → To grow the tree, use cross-entropy or Gini
  • To prune the tree, use the misclassification rate
    (or any other method)
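A small numeric illustration of the two-class case, tabulating the three
impurities as functions of p.

import numpy as np

for p in (0.5, 0.3, 0.1, 0.01):
    prop = np.array([1.0 - p, p])
    print(f"p={p:5}: misclass={min(p, 1 - p):.3f}  "
          f"gini={2 * p * (1 - p):.3f}  "
          f"entropy={-(prop * np.log(prop)).sum():.3f}")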

11
9.2.4. Discussions on Tree-Based Methods
  • Categorical Predictors, X
  • Problem
  • Consider splits of a node t into tL and
    tR based on an unordered categorical predictor x
    which has q possible values: 2^(q−1) − 1
    possibilities!
  • Solution
  • Order the predictor classes by increasing mean of
    the outcome Y
  • Treat the categorical predictor as if it were
    ordered
  • → gives the optimal split, in terms of squared
    error or Gini index, among all 2^(q−1) − 1
    possible splits
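A sketch of the ordering trick with pandas; the column names and values are
hypothetical.

import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue", "red", "green"],
                   "y":     [3.0,   1.0,    2.0,     0.5,    4.0,   2.5]})

means = df.groupby("color")["y"].mean().sort_values()   # category -> mean(Y)
rank = {cat: r for r, cat in enumerate(means.index)}    # ordered integer codes
df["color_ordered"] = df["color"].map(rank)             # now splittable as numeric
print(means, df, sep="\n")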

12
  • Classification: The Loss Matrix
  • The consequences of misclassification depend on the
    class
  • Define a loss function L: a K x K loss matrix with
    L_kk' = loss for classifying a class-k observation
    as class k'
  • Modify the Gini index as Σ_{k≠k'} L_kk' p_mk p_mk'
  • → In the 2-class case: no effect
  • → instead, weight the observations in class k by
    L_kk'
  • But this alters the prior probabilities on the classes
  • → In a terminal node m, classify to the class
    k(m) = argmin_k Σ_k' L_k'k p_mk'
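A sketch of both modifications, with a hypothetical 3 x 3 loss matrix and node
proportions.

import numpy as np

L = np.array([[0.0, 1.0, 5.0],     # hypothetical K x K loss matrix:
              [1.0, 0.0, 1.0],     # L[k, k'] = loss for classifying a class-k
              [1.0, 1.0, 0.0]])    # observation as class k'
p = np.array([0.6, 0.3, 0.1])      # class proportions p_mk in node m

# Loss-weighted Gini index: sum over k != k' of L_kk' * p_mk * p_mk'
weighted_gini = sum(L[k, kp] * p[k] * p[kp]
                    for k in range(3) for kp in range(3) if k != kp)

# Terminal-node rule: k(m) = argmin_k sum_k' L_k'k * p_mk'
k_m = int(np.argmin(L.T @ p))
print(weighted_gini, k_m)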

13
  • Missing Predictor Values
  • If we have enough training data: discard the
    observations with missing values
  • Fill in (impute) the missing values,
    e.g. with the mean of the known
    values
  • For a categorical predictor: create a category called
    "missing"
  • Surrogate variables
  • Choose the primary predictor and split point
  • Build a list of surrogate predictors and split points:
  • The first surrogate predictor best mimics the
    split by the primary predictor, the second does
    second best, ...
  • When sending observations down the tree, use the
    primary predictor first
  • If the value of the primary predictor is missing, use
    the first surrogate. If the first surrogate is
    missing, use the second, and so on
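A small sketch of the two imputation options with pandas (the surrogate-split
machinery lives inside the tree-growing algorithm and is not reproduced here);
column names and values are hypothetical.

import pandas as pd

df = pd.DataFrame({"age":   [25.0, None, 40.0, 31.0],
                   "color": ["red", None, "blue", "red"]})

df["age"] = df["age"].fillna(df["age"].mean())     # numeric: fill with the mean
df["color"] = df["color"].fillna("missing")        # categorical: a "missing" level
print(df)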

14
  • Why Binary Splits?
  • Problem with multi-way splits: they fragment the
    data too quickly, leaving insufficient data at
    the next level down
  • Linear Combination Splits
  • Split the node based not on {Xj ≤ s} but on
    {Σj aj Xj ≤ s}
  • with optimization over the aj and s
  • Improves the predictive power
  • Hurts interpretability
  • Instability of Trees
  • Other trees
  • C5.0: after growing the tree, conditions can be
    dropped
  • without changing the subset of data covered