1
9.2. Tree-Based Methods
  • Principle
  • 1. Partition the feature space, X, into a set of
    rectangles: homogeneous regions Rm
    (recursive binary partitions)
  • 2. Fit a simple model for Y (e.g. a constant) in
    each rectangle
  • CART: Classification and Regression Trees
  • Y continuous → regression model:
    Regression Trees
  • Y categorical → classification model:
    Classification Trees
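An illustrative sketch (not part of the original slides): fitting the two CART
variants with scikit-learn's standard tree estimators on hypothetical data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                      # two features X1, X2

# Y continuous -> regression tree
y_reg = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=200)
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)

# Y categorical -> classification tree
y_clf = (X[:, 0] + X[:, 1] > 1).astype(int)
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_clf)

print(reg_tree.predict(X[:3]), clf_tree.predict(X[:3]))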

2
Example in the regression case: Y continuous, inputs X1 and X2
  1. Binary partition on X1 ≤ t1 → 2 regions: Ra = {X1 ≤ t1},
     Rb = {X1 > t1}
  2. Model Y in Ra and in Rb
  3. Recursive partition: split Ra again with X2 ≤ t2, and Rb
     with X1 ≤ t3, ...
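A minimal sketch of the resulting piecewise-constant predictor; the split
points t1, t2, t3 and the region constants below are hypothetical values
chosen only for illustration.

t1, t2, t3 = 0.5, 0.4, 0.8                              # assumed split points
c = {"Ra1": 1.0, "Ra2": 2.0, "Rb1": 3.0, "Rb2": 4.0}    # assumed region constants

def predict(x1, x2):
    if x1 <= t1:                                    # first binary split on X1
        return c["Ra1"] if x2 <= t2 else c["Ra2"]   # Ra split again on X2
    else:
        return c["Rb1"] if x1 <= t3 else c["Rb2"]   # Rb split again on X1

print(predict(0.3, 0.7), predict(0.9, 0.1))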
3
  • Advantage: easy to interpret
  • Disadvantage: instability
  • The error made at an upper level is
    propagated to the lower levels
  • How to grow a tree?
  • Choose a splitting variable and
    a split point
  • → minimize an impurity criterion
  • How large should we grow the tree?
  • Need a stopping rule based on an
    impurity criterion

4
9.2.2. Regression Trees
  • Data: (xi, yi) for i = 1, 2, ..., N, with xi = (xi1, ..., xip)
  • Partition the space into M regions R1, R2, ...,
    RM
  • Fit a constant cm in each region:
    f(x) = Σ_{m=1..M} cm I(x ∈ Rm), with ĉm = ave(yi | xi ∈ Rm)
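Once the regions Rm are fixed, the least-squares constant in each region is
just the mean of the responses falling in it. A minimal sketch; region_of is
an assumed helper that maps a point to its region index.

import numpy as np

def fit_region_constants(X, y, region_of):
    """Return {m: mean of y_i with x_i in R_m}; y is a NumPy array."""
    labels = np.array([region_of(x) for x in X])
    return {m: y[labels == m].mean() for m in np.unique(labels)}

def predict(x, region_of, c_hat):
    # f(x) = sum_m c_m * I(x in R_m): look up the constant of x's region
    return c_hat[region_of(x)]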

5
How to grow the regression tree
  • Find the best variable Xj and split value s
    that minimize the sum of squared
    errors
  • Finding the global minimum is computationally
    infeasible
  • Greedy algorithm:
    at each level choose variable j and split point s as
    min_{j,s} [ min_{c1} Σ_{xi ∈ R1(j,s)} (yi − c1)² + min_{c2} Σ_{xi ∈ R2(j,s)} (yi − c2)² ]
    (a sketch of this greedy scan follows)
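A sketch of one greedy step: scan every variable j and every observed value as
a candidate split point s, keeping the pair that minimizes the sum of squared
errors of the two half-regions (with c1, c2 set to the region means). This
brute-force scan is for illustration only.

import numpy as np

def best_split(X, y):
    best = (None, None, np.inf)                 # (j, s, total squared error)
    n, p = X.shape
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] > 0.5, 2.0, -1.0) + rng.normal(scale=0.1, size=100)
print(best_split(X, y))                         # recovers a split on X1 near 0.5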

6
How large should we grow the tree?
  • Trade-off between accuracy and generalization
  • A very large tree overfits
  • A small tree might not capture the structure
  • Strategies
  • 1. Split only when the decrease in error > threshold
    (short-sighted)
  • 2. Cost-complexity pruning (preferred)
  • - Grow a large tree T0 (splitting until a minimum
    node size, e.g. Nm = 5, is reached)
  • - Pruning: collapse some internal nodes
  • → find Tα ⊆ T0 minimizing the cost-complexity criterion

7
  • Minimize the cost complexity
    C_α(T) = Σ_{m=1..|T|} Nm Qm(T) + α |T|
  • |T| = number of terminal nodes in T,
    Nm = number of observations in node m,
    Qm(T) = impurity of node m
  • α ≥ 0: tuning parameter
  • For each α, a unique Tα
  • To find Tα: weakest-link pruning
  • Each time, collapse the internal node that adds the
    smallest error
  • → a sequence of subtrees containing Tα
  • Estimation of α: minimize the cross-validated
    sum of squares (p. 214)
  • Choose the final tree from this sequence
    (the one with the selected α; see the sketch below)
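An illustrative sketch of the same procedure with scikit-learn, which exposes
the weakest-link sequence through cost_complexity_pruning_path and the
ccp_alpha parameter; the data here is hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.2, size=300)

# One alpha per collapsed internal node: the nested sequence of subtrees
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# Estimate alpha by cross-validated squared error, then refit the pruned tree
cv_mse = [-cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                           X, y, cv=5, scoring="neg_mean_squared_error").mean()
          for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmin(cv_mse))]
pruned = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print("chosen alpha:", best_alpha)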

8
9.2.3. Classification Trees
  • Y ∈ {1, 2, ..., k, ..., K}
  • Classify the observations in node m to the majority
    class in the node
  • with p_mk = proportion of observations of class k
    in node m
  • Splitting: minimize Qm(T)
  • Pruning
  • In regression, Qm(T) = squared-error node impurity
  • → not suitable for classification
  • → need other choices of Qm(T)

9
  • Define Qm(T) for a node m
  • If we classify the observations in node m to class
    k(m) = argmax_k p_mk :
  • Misclassification error: 1 − p_{m,k(m)}
  • Gini index: Σ_k p_mk (1 − p_mk)
  • Cross-entropy: − Σ_k p_mk log p_mk
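A minimal sketch of the three impurities, written as functions of the vector
of class proportions p_mk in a node.

import numpy as np

def misclassification_error(p):
    return 1.0 - np.max(p)                      # 1 - p_{m,k(m)}

def gini(p):
    return np.sum(p * (1.0 - p))                # sum_k p_mk (1 - p_mk)

def cross_entropy(p):
    p = p[p > 0]                                # treat 0 * log 0 as 0
    return -np.sum(p * np.log(p))               # -sum_k p_mk log p_mk

p = np.array([0.7, 0.2, 0.1])
print(misclassification_error(p), gini(p), cross_entropy(p))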

10
  • Example: 2 classes of Y, p = the proportion of the
    second class
  • Cross-entropy and Gini are more sensitive to changes
    in the node probabilities
  • → e.g. they are lower for purer nodes
  • → To grow the tree, use cross-entropy or Gini
  • To prune the tree, use the misclassification rate
    (or any other method)
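A small numeric illustration of the two-class case, tabulating the three
impurities as functions of p.

import numpy as np

for p in (0.5, 0.3, 0.1, 0.01):
    prop = np.array([1.0 - p, p])
    print(f"p={p:5}: misclass={min(p, 1 - p):.3f}  "
          f"gini={2 * p * (1 - p):.3f}  "
          f"entropy={-(prop * np.log(prop)).sum():.3f}")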

11
9.2.4. Discussions on Tree-Based Methods
  • Categorical Predictors, X
  • Problem
  • Consider splits of a node t into tL and
    tR based on an unordered categorical predictor x
    which has q possible values: 2^(q−1) − 1
    possibilities!
  • Solution
  • Order the predictor classes by increasing mean of
    the outcome Y
  • Treat the categorical predictor as if it were
    ordered
  • → gives the optimal split, in terms of squared
    error or Gini index, among all 2^(q−1) − 1
    possible splits
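A sketch of the ordering trick with pandas; the column names and values are
hypothetical.

import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue", "red", "green"],
                   "y":     [3.0,   1.0,    2.0,     0.5,    4.0,   2.5]})

means = df.groupby("color")["y"].mean().sort_values()   # category -> mean(Y)
rank = {cat: r for r, cat in enumerate(means.index)}    # ordered integer codes
df["color_ordered"] = df["color"].map(rank)             # now splittable as numeric
print(means, df, sep="\n")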

12
  • Classification: The Loss Matrix
  • The consequences of misclassification depend on the
    class
  • Define a loss function L: a K x K loss matrix with
    L_kk' = loss for classifying a class-k observation
    as class k'
  • Modify the Gini index as Σ_{k≠k'} L_kk' p_mk p_mk'
  • → In the 2-class case: no effect
  • → instead, weight the observations in class k by
    L_kk'
  • But this alters the prior probabilities on the classes
  • → In a terminal node m, classify to the class
    k(m) = argmin_k Σ_k' L_k'k p_mk'
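A sketch of both modifications, with a hypothetical 3 x 3 loss matrix and node
proportions.

import numpy as np

L = np.array([[0.0, 1.0, 5.0],     # hypothetical K x K loss matrix:
              [1.0, 0.0, 1.0],     # L[k, k'] = loss for classifying a class-k
              [1.0, 1.0, 0.0]])    # observation as class k'
p = np.array([0.6, 0.3, 0.1])      # class proportions p_mk in node m

# Loss-weighted Gini index: sum over k != k' of L_kk' * p_mk * p_mk'
weighted_gini = sum(L[k, kp] * p[k] * p[kp]
                    for k in range(3) for kp in range(3) if k != kp)

# Terminal-node rule: k(m) = argmin_k sum_k' L_k'k * p_mk'
k_m = int(np.argmin(L.T @ p))
print(weighted_gini, k_m)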

13
  • Missing Predictor Values
  • If we have enough training data: discard the
    observations with missing values
  • Fill in (impute) the missing values,
    e.g. with the mean of the known
    values
  • For a categorical predictor: create a category called
    "missing"
  • Surrogate variables
  • Choose the primary predictor and split point
  • Build a list of surrogate predictors and split points:
  • The first surrogate predictor best mimics the
    split by the primary predictor, the second does
    second best, ...
  • When sending observations down the tree, use the
    primary predictor first
  • If the value of the primary predictor is missing, use
    the first surrogate. If the first surrogate is
    missing, use the second, and so on
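A small sketch of the two imputation options with pandas (the surrogate-split
machinery lives inside the tree-growing algorithm and is not reproduced here);
column names and values are hypothetical.

import pandas as pd

df = pd.DataFrame({"age":   [25.0, None, 40.0, 31.0],
                   "color": ["red", None, "blue", "red"]})

df["age"] = df["age"].fillna(df["age"].mean())     # numeric: fill with the mean
df["color"] = df["color"].fillna("missing")        # categorical: a "missing" level
print(df)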

14
  • Why Binary Splits?
  • Problem with multi-way splits: they fragment the
    data too quickly, leaving insufficient data at
    the next level down
  • Linear Combination Splits
  • Split the node based not on {Xj ≤ s} but on
    {Σj aj Xj ≤ s}
  • with optimization over the aj and s
  • Improves the predictive power
  • Hurts interpretability
  • Instability of Trees
  • Other trees
  • C5.0: after growing the tree, conditions can be
    dropped
  • without changing the subset of data covered