Title: Regression trees and regression graphs: Efficient estimators for Generalized Additive Models
Slide 1: Regression Trees and Regression Graphs: Efficient Estimators for Generalized Additive Models
- Adam Tauman Kalai
- TTI-Chicago
Slide 2: Outline
- Generalized Additive Models (GAM)
- Computationally efficient regression model [Valiant; Kearns-Schapire]
- Thm: the regression graph algorithm efficiently learns GAMs [new]
- Regression tree algorithm
- Regression graph algorithm [Mansour-McAllester]
- Correlation boosting [new]
Slide 3: Generalized Additive Models [Hastie-Tibshirani]
- Distribution D over X × Y, X ⊆ R^d, Y ⊆ R, with
  f(x) = E_D[y|x] = u(f_1(x^(1)) + f_2(x^(2)) + ... + f_d(x^(d))),
  for monotonic u: R → R and arbitrary f_i: R → R
- e.g., generalized linear models: u(w·x) for monotonic u
  - linear/logistic models
- e.g., f(x) = e^{||x||^2} = e^{x^(1)^2 + x^(2)^2 + ... + x^(d)^2}
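As a concrete illustration (my own sketch, not from the talk), here is a minimal Python evaluation of a GAM of this form; `gam_predict`, the weights `w`, and the sigmoid link are all hypothetical choices, exhibiting logistic regression as the special case u = sigmoid, f_i(z) = w_i·z:

```python
import numpy as np

def gam_predict(x, fs, u):
    """Evaluate a generalized additive model u(sum_i f_i(x^(i))).

    x  : 1-D array of features (x^(1), ..., x^(d))
    fs : list of d univariate functions f_i
    u  : monotonic link function u: R -> R
    """
    return u(sum(f(xi) for f, xi in zip(fs, x)))

# Special case: logistic regression is the GAM with
# f_i(z) = w_i * z and u = sigmoid (hypothetical weights).
w = np.array([0.5, -1.0, 2.0])
fs = [lambda z, wi=wi: wi * z for wi in w]   # default arg avoids late binding
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

x = np.array([1.0, 0.5, -0.25])
print(gam_predict(x, fs, sigmoid))           # equals sigmoid(w @ x)
```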
Slide 4: Non-Hodgkin's Lymphoma International Prognostic Index [NEJM '93]
- Risk factors: age > 60, # of sites > 1, performance status > 1, LDH > normal, stage > 2
Slide 5: Setup
- X ⊆ R^d, Y = [0,1]
- training sample (x_1,y_1), ..., (x_n,y_n)
- true error ε(h) = E_D[(h(x)-y)^2]
Slide 6: Computationally Efficient Regression [Kearns-Schapire]
- F: a family of target functions
- Definition: learning algorithm A efficiently learns F if, for some c > 0: for every distribution D with f(x) = E_D[y|x] ∈ F, given n examples, A outputs h such that, with probability ≥ 1-δ,
  true error ε(h) = E_D[(h(x)-y)^2] ≤ E_D[(f(x)-y)^2] + poly(|f|, 1/δ)/n^c
- A's runtime must be poly(n, |f|)
Slide 7: Properties of M.S.E.
- E[(h(x)-y)^2] = E[((h(x)-f(x)) + (f(x)-y))^2]
  = E[(h(x)-f(x))^2] + E[(f(x)-y)^2] + 2·E[(h(x)-f(x))(f(x)-y)],
  and the cross term vanishes because f(x) = E[y|x]
- ⇒ h = f minimizes E[(h(x)-y)^2]
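The slide asserts the cross term drops out; here is a worked version of that step (the standard argument, conditioning on x):

```latex
\begin{align*}
\mathbb{E}\big[(h(x)-f(x))(f(x)-y)\big]
  &= \mathbb{E}_x\Big[(h(x)-f(x))\,\mathbb{E}\big[f(x)-y \mid x\big]\Big]\\
  &= \mathbb{E}_x\Big[(h(x)-f(x))\big(f(x)-\mathbb{E}[y \mid x]\big)\Big] = 0,
\end{align*}
```

since f(x) = E[y|x] by definition. Hence E[(h(x)-y)^2] = E[(h(x)-f(x))^2] + E[(f(x)-y)^2], which is minimized exactly at h = f.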
Slide 8: Outline
- Generalized Additive Models (GAM)
- Computationally efficient regression model [Valiant; Kearns-Schapire]
- Thm: the regression graph algorithm efficiently learns GAMs [new]
- Regression tree algorithm
- Regression graph algorithm [Mansour-McAllester]
- Correlation boosting [new]
Slide 9: Results for GAMs [new]
[Figure: a training sample of n examples in X × [0,1], X ⊆ R^d, is fed to the Regression Graph Learner, which outputs a hypothesis h: R^d → [0,1].]
- Thm: the regression graph learner efficiently learns GAMs
  - ∀ dist. D over X × Y with E_D[y|x] = f(x) ∈ GAM, ∀ δ, with probability ≥ 1-δ,
    E_D[(h(x)-y)^2] ≤ E_D[(f(x)-y)^2] + O(LV·log(dn/δ)/n^{1/7})
  - runtime poly(n, d)
Slide 10: Results for GAMs [new]
- f(x) = u(Σ_i f_i(x^(i)))
- u: R → R monotonic and L-Lipschitz (L = max_z |u'(z)|)
- f_i: R → R of bounded total variation: V = Σ_i ∫ |f_i'(z)| dz
- Thm: the regression graph learner efficiently learns GAMs
  - ∀ dist. D over X × Y with E_D[y|x] = f(x) ∈ GAM, with probability ≥ 1-δ,
    E_D[(h(x)-y)^2] ≤ E_D[(f(x)-y)^2] + O(LV·log(dn/δ)/n^{1/7})
  - runtime poly(n, d)
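To make L and V tangible, a small numerical sketch (my own, not from the talk) that estimates the Lipschitz constant of a link u and the total variation of a component f_i on a fine grid:

```python
import numpy as np

def total_variation(f, lo, hi, m=10_000):
    """Numerical total variation of f on [lo, hi]: sum of |increments|."""
    z = np.linspace(lo, hi, m)
    return np.abs(np.diff(f(z))).sum()

def lipschitz_const(u, lo, hi, m=10_000):
    """Numerical Lipschitz constant of u: max slope over a fine grid."""
    z = np.linspace(lo, hi, m)
    return np.abs(np.diff(u(z)) / np.diff(z)).max()

# e.g. u = sigmoid and f_i(z) = z^2 (hypothetical choices):
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
print(lipschitz_const(sigmoid, -5, 5))           # ~ 0.25 = max sigmoid'
print(total_variation(lambda z: z ** 2, -1, 1))  # ~ 2: down 1, then up 1
```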
Slide 11: Results for GAMs [new]
[Figure: the same training sample is fed to the Regression Tree Learner, which outputs a hypothesis h: R^d → [0,1].]
- Thm: the regression tree learner inefficiently learns GAMs
  - ∀ dist. D over X × Y with E_D[y|x] = f(x) ∈ GAM, with probability ≥ 1-δ,
    E_D[(h(x)-y)^2] ≤ E_D[(f(x)-y)^2] + O(LV·(log(d)/log(n))^{1/4})
  - runtime poly(n, d)
- ("Inefficiently" because the excess error decays only polylogarithmically in n: driving it below a target ε requires exponentially many samples, even though each run takes poly(n, d) time.)
Slide 12: Regression Tree Algorithm
- Regression tree RT: R^d → [0,1]
- Training sample (x_1,y_1), (x_2,y_2), ..., (x_n,y_n) ∈ R^d × [0,1]
[Figure: a single leaf containing all of (x_1,y_1), (x_2,y_2), ..., predicting avg(y_1, y_2, ..., y_n).]
Slide 13: Regression Tree Algorithm
- Regression tree RT: R^d → [0,1]
- Training sample (x_1,y_1), (x_2,y_2), ..., (x_n,y_n) ∈ R^d × [0,1]
[Figure: the root tests "x^(j) ≥ θ?"; the left leaf holds {(x_i,y_i) : x_i^(j) < θ} and predicts avg(y_i : x_i^(j) < θ), the right leaf holds {(x_i,y_i) : x_i^(j) ≥ θ} and predicts avg(y_i : x_i^(j) ≥ θ).]
Slide 14: Regression Tree Algorithm
- Regression tree RT: R^d → [0,1]
- Training sample (x_1,y_1), (x_2,y_2), ..., (x_n,y_n) ∈ R^d × [0,1]
[Figure: the right leaf is split again on a second test "x^(j') ≥ θ'?", giving three leaves: {x^(j) < θ} predicting avg(y_i : x_i^(j) < θ), plus {x^(j) ≥ θ ∧ x^(j') < θ'} and {x^(j) ≥ θ ∧ x^(j') ≥ θ'}, each predicting the average y over its cell.]
Slide 15: Regression Tree Algorithm
- n = amount of training data
- Put all data into one leaf
- Repeat until size(RT) ≥ n/log^2(n):
  - Greedily choose a leaf and a split "x^(j) ≥ θ" to minimize ε(RT, train) = Σ_i (RT(x_i)-y_i)^2 / n
  - Divide the data in the split node into the two new leaves
- (Minimizing the training squared error this way is equivalent to the Gini splitting criterion.)
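A minimal Python sketch of this greedy loop (my own illustration, with an exhaustive threshold search and no attention to efficiency):

```python
import numpy as np

def sse(idx, y):
    """Squared error if the points in idx share one prediction (their mean)."""
    return ((y[idx] - y[idx].mean()) ** 2).sum() if len(idx) else 0.0

def grow_tree(X, y):
    """Greedily split the (leaf, feature, threshold) triple that most reduces
    training squared error, until the tree has >= n / log^2(n) leaves."""
    n, d = X.shape
    leaves = [np.arange(n)]  # each leaf = indices of the training points it holds
    while len(leaves) < n / np.log(n) ** 2:
        best = None  # (error reduction, leaf position, left indices, right indices)
        for pos, idx in enumerate(leaves):
            for j in range(d):
                for theta in np.unique(X[idx, j])[1:]:  # candidate "x^(j) >= theta"
                    left = idx[X[idx, j] < theta]
                    right = idx[X[idx, j] >= theta]
                    gain = sse(idx, y) - sse(left, y) - sse(right, y)
                    if best is None or gain > best[0]:
                        best = (gain, pos, left, right)
        if best is None or best[0] <= 0:
            break  # no split improves the training error
        _, pos, left, right = best
        leaves[pos:pos + 1] = [left, right]  # replace the leaf by its two children
    return leaves  # each leaf predicts y[leaf].mean()
```

Each leaf predicts the mean label of its cell, so minimizing summed squared error per split is exactly the slide's ε(RT, train) criterion.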
Slide 16: Regression Graph Algorithm [Mansour-McAllester]
- Regression graph RG: R^d → [0,1]
- Training sample (x_1,y_1), (x_2,y_2), ..., (x_n,y_n) ∈ R^d × [0,1]
[Figure: a depth-two structure with root test "x^(j) ≥ θ?" followed by two more tests, giving four leaves, one per outcome pair; each leaf predicts the average y over its cell, e.g. avg(y_i : x^(j) ≥ θ ∧ x^(j') ≥ θ').]
Slide 17: Regression Graph Algorithm [Mansour-McAllester]
- Regression graph RG: R^d → [0,1]
- Training sample (x_1,y_1), (x_2,y_2), ..., (x_n,y_n) ∈ R^d × [0,1]
[Figure: the two middle leaves of the previous slide are merged into a single node holding {(x_i,y_i) : exactly one of the two tests passes}, predicting avg(y_i : (x^(j) < θ ∧ x^(j') ≥ θ') ∨ (x^(j) ≥ θ ∧ x^(j') < θ')); the structure is now a DAG rather than a tree.]
Slide 18: Regression Graph Algorithm [Mansour-McAllester]
- Put all n training data into one leaf
- Repeat until size(RG) ≥ n^{3/7}:
  - Split: greedily choose a leaf and a split "x^(j) ≥ θ" to minimize ε(RG, train) = Σ_i (RG(x_i)-y_i)^2 / n; divide the data in the split node into the two new leaves
  - Let Δ be the decrease in ε(RG, train) from this split
  - Merge(s): greedily choose two leaves whose merger increases ε(RG, train) as little as possible; repeat merging while the total increase in ε(RG, train) from merges is ≤ Δ/2
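A minimal Python sketch of this split-then-merge loop (again my own illustration; `best_split` is the same exhaustive search as in the tree sketch above):

```python
import numpy as np

def sse(idx, y):
    """Squared error if the points in idx share one prediction (their mean)."""
    return ((y[idx] - y[idx].mean()) ** 2).sum() if len(idx) else 0.0

def best_split(leaves, X, y):
    """Best greedy split over all (leaf, feature, threshold) triples."""
    best = None
    for pos, idx in enumerate(leaves):
        for j in range(X.shape[1]):
            for theta in np.unique(X[idx, j])[1:]:
                left, right = idx[X[idx, j] < theta], idx[X[idx, j] >= theta]
                gain = sse(idx, y) - sse(left, y) - sse(right, y)
                if best is None or gain > best[0]:
                    best = (gain, pos, left, right)
    return best

def grow_graph(X, y):
    """Split-then-merge loop in the style of the slide's algorithm."""
    n = len(y)
    leaves = [np.arange(n)]
    while len(leaves) < n ** (3 / 7):
        split = best_split(leaves, X, y)
        if split is None or split[0] <= 0:
            break
        gain, pos, left, right = split
        leaves[pos:pos + 1] = [left, right]  # split step; Delta = gain
        budget = gain / 2                    # merges may cost at most Delta/2 total
        while len(leaves) > 2:
            # cheapest pairwise merge: increase in SSE from pooling two leaves
            cost, a, b = min(
                (sse(np.concatenate([leaves[i], leaves[k]]), y)
                 - sse(leaves[i], y) - sse(leaves[k], y), i, k)
                for i in range(len(leaves)) for k in range(i + 1, len(leaves))
            )
            if cost > budget:
                break
            leaves[a] = np.concatenate([leaves[a], leaves[b]])  # merge b into a
            del leaves[b]
            budget -= cost
    return leaves  # merged leaves are what make the structure a DAG, not a tree
```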
Slide 19: Two Useful Lemmas
- Uniform generalization bound: for any n, with high probability over training sets (x_1,y_1), ..., (x_n,y_n), it holds simultaneously for every regression graph R
- Existence of a correlated split: there always exists a split I(x^(i) ≥ θ) s.t. ...
Slide 20: Motivating Natural Example
- X = {0,1}^d, f(x) = (x^(1) + x^(2) + ... + x^(d))/d, uniform D
- Size(RT) ≈ exp(Size(RG)^c); e.g., d = 4:
[Figure: for d = 4, a regression tree must test x^(1) > 1/2, x^(2) > 1/2, x^(3) > 1/2, x^(4) > 1/2 along every path, a complete tree with 16 leaves labeled 0, .25, .5, .75, or 1 (the fraction of coordinates exceeding 1/2). A regression graph can instead merge all nodes that agree on the running count of coordinates exceeding 1/2, needing only one node per (depth, count) pair.]
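To make the gap concrete, a small count (my own, extrapolating the d = 4 picture to general d): the complete tree needs 2^d leaves, while a layered graph that only tracks the running count of ones needs one node per (depth, count) pair:

```python
def tree_size(d):
    """Leaves of the complete tree that tests x^(1), ..., x^(d) on every path."""
    return 2 ** d

def graph_size(d):
    """Nodes of the layered DAG: after reading i coordinates, only the running
    count of ones (0..i) matters for f(x) = sum(x)/d, so level i has i+1 nodes."""
    return sum(i + 1 for i in range(d + 1))  # = (d+1)(d+2)/2

for d in (4, 10, 20):
    print(d, tree_size(d), graph_size(d))    # d=20: 1,048,576 leaves vs 231 nodes
```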
Slide 21: Regression Boosting
- Incremental learning: if you find anything with positive correlation with y, then regression graphs make progress
- Weak regression implies strong regression, i.e., small correlations can efficiently be combined to achieve correlation near 1 (error near 0)
- Generalizes binary classification boosting [Kearns-Valiant; Schapire; Mansour-McAllester; ...]
Slide 22: Conclusions
- Generalized additive models are very general
- Regression graphs, i.e., regression trees with merging, provably estimate GAMs using polynomial data and runtime
- Regression boosting generalizes binary classification boosting
- Future work:
  - Improve the algorithm/analysis
  - Room for interesting work in statistics ∩ computational learning theory