Regression trees and regression graphs: Efficient estimators for Generalized Additive Models
1
Regression trees and regression graphs: Efficient
estimators for Generalized Additive Models
  • Adam Tauman Kalai
  • TTI-Chicago

2
Outline
  • Generalized Additive Models (GAM)
  • Computationally efficient regression
  • Model
  • Thm: Regression graph algorithm efficiently learns GAMs
  • Regression tree algorithm
  • Regression graph algorithm
  • Correlation boosting

(Model: Valiant; Kearns & Schapire. Theorem: new. Regression graph algorithm: Mansour & McAllester. Correlation boosting: new.)
3
Generalized Additive Models [Hastie & Tibshirani]
Distribution D over X × Y, X ⊆ R^d, Y ⊆ R, with
f(x) = E_D[y | x] = u(f_1(x^(1)) + f_2(x^(2)) + ... + f_d(x^(d))),
where u: R → R is monotonic and the f_i: R → R are arbitrary.
  • e.g., generalized linear models:
  • u(w · x), monotonic u
  • linear/logistic models
  • e.g., f(x) = e^(-||x||^2) = e^(-(x^(1))^2 - (x^(2))^2 - ... - (x^(d))^2)
    (both examples sketched in code below)
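A minimal sketch (Python/NumPy; the weight vector and input point are made-up, and the minus sign in the second example is chosen so the target stays in (0, 1], matching the [0, 1] labels used later) of the two example targets above: a generalized linear model u(w · x) with a logistic link, and e^(-||x||^2) written in additive form with u(z) = e^z and f_i(z) = -z^2.

```python
import numpy as np

def logistic(z):
    """A monotonic link u(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def glm_target(x, w):
    """Generalized linear model: f(x) = u(w . x)."""
    return logistic(np.dot(w, x))

def gam_target(x):
    """f(x) = exp(-||x||^2) in additive form: u(z) = e^z, f_i(z) = -z^2."""
    return np.exp(np.sum(-x ** 2))

x = np.array([0.3, -0.1, 0.7])
w = np.array([1.0, 2.0, -0.5])
print(glm_target(x, w), gam_target(x))   # both land in (0, 1]
```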

4
Non-Hodgkin's Lymphoma International Prognostic Index
[NEJM '93]
Risk factors: age > 60, sites > 1, performance status > 1,
LDH > normal, stage > 2
5
Setup
true error: ε(h) = E_D[(h(x) - y)^2]
X ⊆ R^d, Y = [0,1]; training sample (x_1, y_1), ..., (x_n, y_n) drawn from D
6
Computationally-efficient regression [Kearns & Schapire]
F = family of target functions.
Definition: algorithm A efficiently learns F if, for every distribution D
with f(x) = E_D[y | x] ∈ F and every δ > 0, given n examples A outputs h
such that, with probability 1 - δ,
  E_D[(h(x) - y)^2] ≤ E_D[(f(x) - y)^2] + poly(|f|, 1/δ) / n^c.
A's runtime must be poly(n, |f|).
7
Properties of M.S.E.
  • E[(h(x) - y)^2] = E[(h(x) - f(x) + f(x) - y)^2]
    = E[(h(x) - f(x))^2] + E[(f(x) - y)^2] + 2 E[(h(x) - f(x))(f(x) - y)]
  • The cross term E[(h(x) - f(x))(f(x) - y)] = 0, since f(x) = E[y | x]
  • ⇒ h = f minimizes E[(h(x) - y)^2] (numerical check below)
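A quick numerical check of this decomposition (Python/NumPy; the particular f and h are made-up): because f(x) = E[y | x], the cross term is approximately zero and E[(h(x) - y)^2] ≈ E[(h(x) - f(x))^2] + E[(f(x) - y)^2].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)
f = 0.2 + 0.6 * x                       # f(x) = E[y | x]
y = rng.binomial(1, f).astype(float)    # labels in {0, 1} with mean f(x)
h = np.full(n, 0.5)                     # an arbitrary competing predictor

lhs   = np.mean((h - y) ** 2)
rhs   = np.mean((h - f) ** 2) + np.mean((f - y) ** 2)
cross = np.mean((h - f) * (f - y))
print(lhs, rhs)    # approximately equal
print(cross)       # approximately 0
```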

8
Outline
  • Generalized Additive Models (GAM)
  • Computationally efficient regression
  • Model
  • Thm: Regression graph algorithm efficiently learns GAMs
  • Regression tree algorithm
  • Regression graph algorithm
  • Correlation boosting

(Model: Valiant; Kearns & Schapire. Theorem: new. Regression graph algorithm: Mansour & McAllester. Correlation boosting: new.)
9
Results for GAMs
(New)

[Figure: n training samples from X × [0,1], X ⊆ R^d, are fed to the
Regression Graph Learner, which outputs a hypothesis h: R^d → [0,1].]
  • Thm: the regression graph learner efficiently learns GAMs
  • for every distribution D over X × Y with E_D[y | x] = f(x) ∈ GAM,
    and every δ > 0, with probability 1 - δ,
  • E_D[(h(x) - y)^2] ≤ E_D[(f(x) - y)^2] + O(LV log(dn/δ)) / n^(1/7)
  • runtime poly(n, d)
10
Results for GAMs
(New)
  • f(x) = u(Σ_i f_i(x^(i)))
  • u: R → R monotonic, L-Lipschitz (L = max_z u'(z))
  • f_i: R → R with bounded total variation, V = Σ_i ∫ |f_i'(z)| dz
    (see the numeric sketch below)
  • Thm: the regression graph learner efficiently learns GAMs
  • for every distribution D over X × Y with E_D[y | x] = f(x) ∈ GAM,
    with probability 1 - δ,
  • E_D[(h(x) - y)^2] ≤ E_D[(f(x) - y)^2] + O(LV log(dn/δ)) / n^(1/7)
  • runtime poly(n, d)
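The bound is governed by the link's Lipschitz constant L and the summed total variation V of the component functions. A small numerical sketch (Python/NumPy; the logistic link and the single component f_1(z) = sin(z) are made-up examples, not from the talk) of estimating these two quantities on a grid:

```python
import numpy as np

z = np.linspace(-5.0, 5.0, 100_001)

# L = max_z u'(z) for a monotonic link u; for the logistic link this is 1/4.
u = 1.0 / (1.0 + np.exp(-z))
L = np.max(np.gradient(u, z))

# V = sum over i of the total variation of f_i; with one component
# f_1(z) = sin(z) on [-5, 5], V = integral of |f_1'(z)| dz (about 6.1).
f1 = np.sin(z)
V = np.sum(np.abs(np.diff(f1)))

print(L, V)   # roughly 0.25 and 6.1
```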
11
Results for GAMs
(New)

[Figure: n training samples from X × [0,1], X ⊆ R^d, are fed to the
Regression Tree Learner, which outputs a hypothesis h: R^d → [0,1].]
  • Thm: the regression tree learner inefficiently learns GAMs
  • for every distribution D over X × Y with E_D[y | x] = f(x) ∈ GAM,
  • E_D[(h(x) - y)^2] ≤ E_D[(f(x) - y)^2] + O(LV) · (log(d) / log(n))^(1/4)
  • runtime poly(n, d)
12
Regression Tree Algorithm
  • Regression tree RT: R^d → [0,1]
  • Training sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ R^d × [0,1]

[Figure: all training pairs (x_1, y_1), (x_2, y_2), ... sit in a single leaf,
which predicts avg(y_1, y_2, ..., y_n).]
13
Regression Tree Algorithm
  • Regression tree RT: R^d → [0,1]
  • Training sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ R^d × [0,1]

[Figure: a root split "x^(j) ≥ θ?"; the samples with x^(j) < θ go to one leaf,
predicting avg(y_i : x_i^(j) < θ), and those with x^(j) ≥ θ go to the other,
predicting avg(y_i : x_i^(j) ≥ θ).]
14
Regression Tree Algorithm
  • Regression tree RT: R^d → [0,1]
  • Training sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ R^d × [0,1]

[Figure: the leaf with x^(j) ≥ θ is split again on a second test x^(j') ≥ θ',
giving three leaves: x^(j) < θ, predicting avg(y_i : x_i^(j) < θ);
x^(j) ≥ θ ∧ x^(j') < θ', predicting avg(y_i : x_i^(j) ≥ θ ∧ x_i^(j') < θ');
and x^(j) ≥ θ ∧ x^(j') ≥ θ', predicting avg(y_i : x_i^(j) ≥ θ ∧ x_i^(j') ≥ θ').]
15
Regression Tree Algorithm
  • n = amount of training data
  • Put all the data into one leaf
  • Repeat until size(RT) ≥ n / log^2(n):
  • Greedily choose a leaf and a split "x^(j) ≥ θ" to minimize
    ε(RT, train) = Σ_i (RT(x_i) - y_i)^2 / n
  • Divide the data in the split node into two new leaves

(Equivalent to the Gini splitting criterion; a sketch of the procedure follows.)
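A compact sketch of the greedy growth procedure (Python/NumPy; simplified and not the talk's exact implementation: it tracks only the training-set partition and the fitted leaf averages, and rescans every leaf and threshold at each step).

```python
import numpy as np

def best_split(X, y):
    """Best single split of one leaf: returns (SSE reduction, feature, threshold)."""
    base = np.sum((y - y.mean()) ** 2)
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j])[1:]:      # skip the minimum (empty left side)
            left = X[:, j] < theta
            sse = (np.sum((y[left] - y[left].mean()) ** 2) +
                   np.sum((y[~left] - y[~left].mean()) ** 2))
            if base - sse > best[0]:
                best = (base - sse, j, theta)
    return best

def grow_tree(X, y):
    """Greedily split the leaf/threshold pair that most reduces training
    squared error, until the tree has about n / log^2(n) leaves."""
    n = len(y)
    max_leaves = max(2, int(n / np.log(n) ** 2))
    leaves = [(np.arange(n), y.mean())]           # each leaf: (row indices, prediction)
    while len(leaves) < max_leaves:
        gains = [best_split(X[idx], y[idx]) if len(idx) > 1 else (0.0, None, None)
                 for idx, _ in leaves]
        k = int(np.argmax([g[0] for g in gains]))
        gain, j, theta = gains[k]
        if j is None or gain <= 0:
            break                                  # no split improves the fit
        idx, _ = leaves.pop(k)
        left, right = idx[X[idx, j] < theta], idx[X[idx, j] >= theta]
        leaves += [(left, y[left].mean()), (right, y[right].mean())]
    return leaves

# toy usage
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (400, 3))
y = X.mean(axis=1) + 0.05 * rng.standard_normal(400)
print(len(grow_tree(X, y)))
```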
16
Regression Graph Algorithm [Mansour & McAllester]
  • Regression graph RG: R^d → [0,1]
  • Training sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ R^d × [0,1]

[Figure: a root test x^(j) ≥ θ followed by a second test x^(j') ≥ θ' on each
branch gives four leaves, one per combination of outcomes, each predicting
the average y_i of the samples reaching it.]
17
Regression Graph Algorithm [Mansour & McAllester]
  • Regression graph RG: R^d → [0,1]
  • Training sample (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ R^d × [0,1]

[Figure: the two "mixed" leaves of the previous slide are merged into one,
leaving three leaves: x^(j) ≥ θ ∧ x^(j') ≥ θ'; x^(j) < θ ∧ x^(j') < θ'; and
the merged leaf (x^(j) < θ ∧ x^(j') ≥ θ') ∨ (x^(j) ≥ θ ∧ x^(j') < θ'), each
predicting the average y_i of the samples reaching it.]
18
Regression Graph Algorithm [Mansour & McAllester]
  • Put all n training data into one leaf
  • Repeat until size(RG) ≥ n^(3/7):
  • Split: greedily choose a leaf and a split "x^(j) ≥ θ" to minimize
    ε(RG, train) = Σ_i (RG(x_i) - y_i)^2 / n
  • Divide the data in the split node into two new leaves
  • Let Δ be the decrease in ε(RG, train) from this split
  • Merge(s): greedily choose two leaves whose merger increases
    ε(RG, train) as little as possible
  • Repeat merging while the total increase in ε(RG, train) from merges
    is ≤ Δ/2
(A sketch of the merge step follows.)
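A sketch of the merge step under the same simplified leaf-as-index-set representation as the tree sketch above (Python/NumPy; `budget` would be set to half the error decrease of the preceding split; an illustration, not the talk's implementation).

```python
import numpy as np

def sse(y):
    """Sum of squared errors when a leaf predicts the mean of its labels."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def merge_leaves(leaves, y, budget):
    """Merge step (sketch): repeatedly merge the pair of leaves whose union
    raises training SSE the least, while the total increase stays within
    `budget`.  Each leaf is a numpy array of training-row indices; a merged
    leaf corresponds to joining two nodes of the graph."""
    total_increase = 0.0
    while len(leaves) > 1:
        best = None                                # (increase, a, b)
        for a in range(len(leaves)):
            for b in range(a + 1, len(leaves)):
                both = np.concatenate([leaves[a], leaves[b]])
                inc = sse(y[both]) - sse(y[leaves[a]]) - sse(y[leaves[b]])
                if best is None or inc < best[0]:
                    best = (inc, a, b)
        inc, a, b = best
        if total_increase + inc > budget:
            break                                  # further merging costs too much
        total_increase += inc
        merged = np.concatenate([leaves[a], leaves[b]])
        leaves = [l for i, l in enumerate(leaves) if i not in (a, b)] + [merged]
    return leaves
```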

19
Two useful lemmas
  • Uniform generalization bound: for any n, with high probability over
    training sets (x_1, y_1), ..., (x_n, y_n), the true error of every
    regression graph RG of bounded size is close to its training error
    [bound not shown].
  • Existence of a correlated split: there always exists a split I(x^(i) ≥ θ)
    s.t. [bound not shown].
20
Motivating natural example
  • X = {0,1}^d, f(x) = (x^(1) + x^(2) + ... + x^(d)) / d, uniform D
  • Size(RT) ≈ exp(Size(RG)^c), e.g. d = 4 (see the sketch below)

[Figure, d = 4: the full regression tree splits on x^(1) > 1/2, x^(2) > 1/2,
x^(3) > 1/2, x^(4) > 1/2 along every path and needs 16 leaves (values among
0, .25, .5, .75, 1, with repetitions), while the regression graph merges
nodes that agree on the running sum and needs only 5 leaves:
0, .25, .5, .75, 1.]
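A back-of-the-envelope size comparison (Python; the layered "track the running sum" construction of the graph is an assumed illustration consistent with the figure, not a quoted bound): an exact tree must read all d coordinates along every path, while the graph only needs one node per running sum at each depth.

```python
# Exact representations of f(x) = (x^(1) + ... + x^(d)) / d over {0,1}^d.
# The tree has 2^d leaves (2^(d+1) - 1 nodes); the graph needs only k + 1
# nodes at depth k, one per value of the partial sum.
for d in (4, 8, 16):
    tree_nodes = 2 ** (d + 1) - 1                    # complete binary tree
    graph_nodes = (d + 1) * (d + 2) // 2             # sum_{k=0}^{d} (k + 1)
    print(d, tree_nodes, graph_nodes)                # e.g. d=16: 131071 vs 153
```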
21
Regression boosting
  • Incremental learning
  • Suppose you find something with positive correlation with y; then
    regression graphs make progress
  • Weak regression implies strong regression, i.e., small correlations can
    efficiently be combined to get correlation near 1 (error near 0)
  • Generalizes binary classification boosting [Kearns & Valiant; Schapire;
    Mansour & McAllester; ...]

22
Conclusions
  • Generalized additive models are very general
  • Regression graphs, i.e., regression trees with
    merging, provably estimate GAMs using polynomial
    data and runtime
  • Regression boosting generalizes binary
    classification boosting
  • Future work
  • Improve algorithm/analysis
  • Room for interesting work at the intersection of statistics and
    computational learning theory