1
Learning with Trees
Rob Nowak
University of Wisconsin-Madison
Collaborators: Rui Castro, Clay Scott, Rebecca Willett
www.ece.wisc.edu/nowak
Artwork: Piet Mondrian
2
Basic Problem: Partitioning
Many problems in statistical learning theory boil down to finding a good partition.
(figure: a function and the partition it induces)
3
Classification
Learning and classification: build a decision rule based on labeled training data.
Labeled training features
Classification rule: a partition of feature space
4
Signal and Image Processing
MRI data: brain aneurysm
Extracted vascular network
Recover complex geometrical structure from noisy
data
5
Partitioning Schemes
Support Vector Machine
image partitions
6
Why Trees?
  • Simplicity of design
  • Interpretability
  • Ease of implementation
  • Good performance in practice

Trees are one of the most popular and widely used
machine learning / data analysis tools
CART: Breiman, Friedman, Olshen, and Stone, 1984, Classification and Regression Trees
C4.5: Quinlan, 1993, C4.5: Programs for Machine Learning
JPEG 2000: image compression standard, 2000, http://www.jpeg.org/jpeg2000/
7
Example: Gamma-Ray Burst Analysis
(figure: photon counts vs. time for one burst, with the x-ray afterglow marked)
Compton Gamma-Ray Observatory, Burst and Transient Source Experiment (BATSE)
One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years!
8
Trees and Partitions
coarse partition
9
Estimation using Pruned Tree
Piecewise constant fits to the data on each piece of the partition provide a good estimate.
Each leaf corresponds to a sample f(t_i), i = 0, ..., N-1.
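A minimal Python sketch of this estimator (illustrative, assuming noisy samples of f on a grid of dyadic length): recursively halve the interval, fit the sample mean on each cell, and keep a split only when it lowers a penalized squared error, so the total cost is the squared error plus the penalty times the number of leaves.

import numpy as np

def prune_fit(y, penalty):
    """Piecewise constant fit on a pruned recursive dyadic partition.
    Returns (cost, fit): the best penalized squared error on y and the
    corresponding piecewise constant estimate."""
    mean = y.mean()
    leaf_cost = np.sum((y - mean) ** 2) + penalty   # cost of keeping one cell
    if len(y) < 2:
        return leaf_cost, np.full_like(y, mean)
    mid = len(y) // 2
    lcost, lfit = prune_fit(y[:mid], penalty)       # best pruned fit, left half
    rcost, rfit = prune_fit(y[mid:], penalty)       # best pruned fit, right half
    if lcost + rcost < leaf_cost:                   # keep the split only if it pays
        return lcost + rcost, np.concatenate([lfit, rfit])
    return leaf_cost, np.full_like(y, mean)

# usage: recover a piecewise smooth signal from noisy samples
t = np.linspace(0, 1, 256)
y = np.where(t < 0.3, 0.0, 1.0) + 0.2 * np.random.randn(t.size)
_, fhat = prune_fit(y, penalty=0.5)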
10
Gamma-Ray Burst 845
piecewise linear fit on each cell
piecewise polynomial fit on each cell
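Swapping the constant for a low-order polynomial on each cell is a small change; a sketch of the per-cell fit (illustrative; assumes the cell boundaries are given and each cell contains more points than the degree):

import numpy as np

def fit_cells(t, y, boundaries, degree=1):
    """Least-squares polynomial fit of the given degree on each cell of a
    partition of the time axis; `boundaries` lists the cell endpoints."""
    fhat = np.empty_like(y, dtype=float)
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        idx = (t >= a) & ((t < b) | (b == boundaries[-1]))  # last cell keeps its right endpoint
        coeffs = np.polyfit(t[idx], y[idx], deg=degree)     # per-cell least squares
        fhat[idx] = np.polyval(coeffs, t[idx])
    return fhat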
11
Recursive Partitions
12
Adapted Partition
13
Image Denoising
14
Decision (Classification) Trees
Bayes decision boundary
labeled training data
complete partition
pruned partition
decision tree: majority vote at each leaf
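A minimal 2-D Python sketch (illustrative, not the exact procedure on the slides): split cells of the unit square dyadically, let each leaf predict its majority label, and keep a split only if it lowers a penalized misclassification count; the per-leaf penalty plays the role of the complexity regularization term.

import numpy as np

def ddt_fit(X, y, penalty, depth=0, max_depth=8):
    """Dyadic decision tree on [0,1]^2 with majority vote at each leaf,
    pruned by penalized empirical error. Returns (cost, predict)."""
    vote = int(2 * y.sum() >= len(y))            # majority label (ties -> 1)
    leaf_cost = np.sum(y != vote) + penalty      # misclassifications + per-leaf penalty
    if depth >= max_depth or len(y) < 2:
        return leaf_cost, lambda Z, v=vote: np.full(len(Z), v)
    d = depth % 2                                # alternate the split coordinate
    lo = X[:, d] < 0.5
    Xl, Xr = X[lo].copy(), X[~lo].copy()
    Xl[:, d] *= 2.0                              # rescale each child cell to [0,1]^2
    Xr[:, d] = 2.0 * Xr[:, d] - 1.0
    lcost, lpred = ddt_fit(Xl, y[lo], penalty, depth + 1, max_depth)
    rcost, rpred = ddt_fit(Xr, y[~lo], penalty, depth + 1, max_depth)
    if lcost + rcost >= leaf_cost:               # split does not pay: prune to a leaf
        return leaf_cost, lambda Z, v=vote: np.full(len(Z), v)

    def predict(Z):
        Z = np.asarray(Z, dtype=float)
        out = np.empty(len(Z), dtype=int)
        m = Z[:, d] < 0.5
        Zl, Zr = Z[m].copy(), Z[~m].copy()
        Zl[:, d] *= 2.0
        Zr[:, d] = 2.0 * Zr[:, d] - 1.0
        out[m], out[~m] = lpred(Zl), rpred(Zr)
        return out
    return lcost + rcost, predict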
15
Classification
Ideal classifier
Adapted partition
histogram
256 cells in each partition
16
Image Partitions
1024 cells in each partition
17
(No Transcript)
18
Image Coding
JPEG, 0.125 bpp: non-adaptive partitioning
JPEG 2000, 0.125 bpp: adaptive partitioning
19
Probabilistic Framework
20
Prediction Problem
21
Challenge
22
Empirical Risk
23
Empirical Risk Minimization
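The formulas are not in the transcript; in standard notation these two slides define

\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i), Y_i\big),
\qquad
\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f),

with \ell the 0/1 loss for classification (squared error for regression) and \mathcal{F} the class of tree-structured rules.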
24
Classification and Regression Trees
25
Classification and Regression Trees
(figure: a recursive partition with a binary label, 0 or 1, on each cell)
26
Empirical Risk Minimization on Trees
27
Overfitting Problem
coarse fits: crude but stable
fine fits: accurate but variable
28
Bias/Variance Trade-off
coarse partition: large bias, small variance
fine partition: small bias, large variance
29
Estimation and Approximation Error
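In standard notation (the slide's formulas are not transcribed), the excess risk splits as

R(\hat{f}_n) - R^*
= \Big( R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \Big)
+ \Big( \inf_{f \in \mathcal{F}} R(f) - R^* \Big),

estimation error (variance) plus approximation error (bias), where R^* is the Bayes risk: coarse partitions shrink the first term, fine partitions shrink the second.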
30
Estimation Error in Regression
31
Estimation Error in Classification
32
Partition Complexity and Overfitting
(figure: empirical risk vs. number of leaves)
33
Controlling Overfitting
34
Complexity Regularization
35
Per-Cell Variance Bounds: Regression
36
Per-Cell Variance Bounds: Classification
37
Variance Bounds
38
A Slightly Weaker Variance Bound
39
Complexity Regularization
40
Example: Image Denoising
This is a special case of wavelet denoising in the Haar wavelet basis.
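A minimal 1-D sketch of that Haar special case in Python (illustrative; assumes a signal of dyadic length and hard thresholding, which plays the role of pruning):

import numpy as np

def haar_denoise(y, thresh):
    """Hard-threshold the Haar detail coefficients of y (length a power
    of two); keeping only the large details is the analogue of pruning
    the dyadic tree with a per-leaf penalty."""
    details, approx = [], np.asarray(y, dtype=float)
    while len(approx) > 1:                        # analysis: averages and differences
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    rec = approx
    for d in reversed(details):                   # synthesis with thresholded details
        d = np.where(np.abs(d) > thresh, d, 0.0)
        even, odd = (rec + d) / np.sqrt(2), (rec - d) / np.sqrt(2)
        rec = np.empty(2 * len(even))
        rec[0::2], rec[1::2] = even, odd
    return rec

# usage: with noise level sigma, thresh = sigma * np.sqrt(2 * np.log(n))
# is the classical universal threshold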
41
Theory of Complexity Regularization
42
Coffee Break!
43
Classification
44
Probabilistic Framework
45
Learning from Data
(figure: training data labeled 0 and 1)
46
Approximation and Estimation
(figure: approximation determines BIAS; model selection determines VARIANCE)
47
Classifier Approximations
48
Approximation Error
Symmetric difference set
Error
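In standard notation, for a classifier f = 1_G and the Bayes classifier f* = 1_{G*}, the excess risk is an integral over the symmetric difference of the two decision sets:

R(f) - R(f^*) = \int_{G \,\triangle\, G^*} \big| 2\eta(x) - 1 \big| \, dP_X(x),

where \eta(x) = P(Y = 1 \mid X = x); both the smoothness of the boundary and the smoothness of the transition of \eta near it control how small this can be made.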
49
Approximation Error
boundary smoothness
risk functional (transition) smoothness
50
Boundary Smoothness
51
Transition Smoothness
52
Transition Smoothness
53
Fundamental Limit to Learning
Mammen & Tsybakov (1999)
54
Related Work
55
Box-Counting Class
56
Box-Counting Sub-Classes
57
Dyadic Decision Trees
Bayes decision boundary
labeled training data
pruned RDP
complete RDP
Dyadic decision tree: majority vote at each leaf
Joint work with Clay Scott, 2004
58
Dyadic Decision Trees
59
The Classifier Learning Problem
Training Data
Model Class
Problem
60
Empirical Risk
61
Chernoff's Bound
62
Chernoff's Bound
The actual risk is probably not much larger than the empirical risk.
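For a single fixed classifier f and n i.i.d. training samples, the bound reads

P\big( R(f) > \hat{R}_n(f) + \epsilon \big) \le e^{-2 n \epsilon^2},

so with probability at least 1 - \delta,

R(f) \le \hat{R}_n(f) + \sqrt{ \log(1/\delta) / (2n) }.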
63
Error Deviation Bounds
64
Uniform Deviation Bound
65
Setting Penalties
66
Setting Penalties
prefix codes for trees
code: 0001001111; 6 bits for leaf labels
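A sketch of the bit counting behind such a penalty (illustrative; it assumes one structure bit per node, 0 = leaf and 1 = internal, plus one label bit per leaf, which may differ from the exact code on the slide):

def code_length(tree):
    """Prefix-code length of a binary tree: one structure bit per node
    (0 = leaf, 1 = internal) plus one class-label bit per leaf.
    A tree is either ('leaf', label) or ('split', left, right)."""
    if tree[0] == 'leaf':
        return 1 + 1                      # structure bit + label bit
    _, left, right = tree
    return 1 + code_length(left) + code_length(right)

# usage: a 3-leaf tree costs 5 structure bits + 3 label bits = 8 bits
t = ('split', ('leaf', 0), ('split', ('leaf', 1), ('leaf', 0)))
assert code_length(t) == 8

By the Kraft inequality, such prefix-code lengths give valid complexity penalties.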
67
Uniform Deviation Bound
68
Decision Tree Selection
Compare with
Oracle Bound
Approximation Error (Bias)
Estimation Error (Variance)
69
Rate of Convergence
BUT: why is it too slow?
70
Balanced vs. Unbalanced Trees
same number of leaves
all T-leaf trees are equally favored
71
Spatial Adaptation
local error
local empirical error
72
Relative Chernoff Bound
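The transcript omits the formula; the standard multiplicative (relative) form is

P\big( \hat{R}_n(f) < (1 - \epsilon) R(f) \big) \le e^{- n R(f) \epsilon^2 / 2},

which is much tighter than the additive bound when R(f) itself is small, as for the local errors on small-volume cells considered next.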
73
Designing Leaf Penalties
(figure: prefix code construction for a tree, e.g. 01 = right branch, 11 = terminate, with a 0/1 label per leaf; example code: 010001110)
74
Uniform Deviation Bound
Compare with
75
Spatial Adaptivity
Key: local complexity is offset by small volumes!
76
Bound Comparison for Unbalanced Tree
J leaves, depth J-1
77
Balanced vs. Unbalanced Trees
same number of leaves
78
Decision Tree Selection
Oracle Bound
Approximation Error
Estimation Error
79
Rate of Convergence
80
Computable Penalty
achieves the same rate of convergence
81
Adapting to Dimension - Feature Rejection
82
Adapting to Dimension - Data Manifold
83
Computational Issues
84
DDTs in Action
85
Comparison to the State of the Art
ODCT DDT cross-validation
Best results: (1) AdaBoost with RBF-Network, (2) Kernel Fisher Discriminant, (3) SVM with RBF-Kernel
86
Application to Level Set Estimation
Elevation map: St. Louis
(figure panels: noisy data; penalty proportional to the tree size |T|; spatially adaptive penalty)
87
Conclusions and Future Work
Open Problem
www.ece.wisc.edu/nowak
More info: www.ece.wisc.edu/nowak/ece901