# Additive Models, Trees, and Related Models - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
• Prof. Liqing Zhang
• Dept. of Computer Science & Engineering,
• Shanghai Jiaotong University

2
Introduction
• 9.1 Generalized Additive Models
• 9.2 Tree-Based Methods
• 9.3 PRIM: Bump Hunting
• 9.4 MARS: Multivariate Adaptive Regression Splines
• 9.5 HME: Hierarchical Mixture of Experts

3
• In the regression setting, a generalized additive model has the form
  Y = α + f1(X1) + f2(X2) + ... + fp(Xp) + ε
• Here the fj are unspecified smooth ("nonparametric") functions.
• Instead of using the linear basis expansions (LBE) of Chapter 5, we fit each function with a scatterplot smoother (e.g. a cubic smoothing spline).

4
GAM (cont.)
• For two-class classification, the additive logistic regression model is
  log[ μ(X) / (1 - μ(X)) ] = α + f1(X1) + ... + fp(Xp)
• Here μ(X) = Pr(Y = 1 | X).

5
GAM (cont.)
• In general, the conditional mean μ(X) of a response Y is related to an additive function of the predictors via a link function g:
  g[μ(X)] = α + f1(X1) + ... + fp(Xp)
• Examples of classical link functions:
• Identity: g(μ) = μ
• Logit: g(μ) = log[μ / (1 - μ)]
• Probit: g(μ) = Φ⁻¹(μ)
• Log: g(μ) = log(μ)
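
The four classical links above can be written directly in Python; a minimal sketch using only the standard library (`statistics.NormalDist` supplies the inverse Gaussian CDF needed for the probit):

```python
import math
from statistics import NormalDist

# Classical link functions g relating the conditional mean mu to the
# additive predictor: g(mu) = alpha + f1(X1) + ... + fp(Xp).
def identity(mu):
    return mu                          # identity link

def logit(mu):
    return math.log(mu / (1.0 - mu))   # log-odds, mu in (0, 1)

def probit(mu):
    return NormalDist().inv_cdf(mu)    # inverse Gaussian CDF, mu in (0, 1)

def log_link(mu):
    return math.log(mu)                # log link, mu > 0

# The logit and probit links both map (0, 1) onto the whole real line
# and both vanish at mu = 0.5:
print(logit(0.5), probit(0.5))  # 0.0 0.0
```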

6
• The additive model has the form
  Y = α + Σj fj(Xj) + ε, where the error term ε has mean zero.
• Given observations (xi, yi), a criterion like the penalized residual sum of squares can be specified for this problem:
  PRSS(α, f1, ..., fp) = Σi [ yi - α - Σj fj(xij) ]² + Σj λj ∫ fj''(tj)² dtj
• where the λj ≥ 0 are tuning parameters.

7
FAM (cont.)
• Conclusions:
• The minimizer of PRSS is a cubic spline in each fj; without further restrictions, however, the solution is not unique.
• If the restriction Σi fj(xij) = 0 for all j holds, it is easy to see that α̂ = ave(yi).
• If in addition to this restriction the matrix of input values has full column rank, then (9.7) is a strictly convex criterion and has a unique solution. If the matrix is singular, the linear part of the fj cannot be uniquely determined (Buja et al., 1989).

8
Learning GAM: Backfitting
• Backfitting algorithm:
• 1. Initialize: α̂ = (1/N) Σi yi, f̂j ≡ 0 for all j.
• 2. Cycle j = 1, 2, ..., p, ..., 1, 2, ..., p, ... (m cycles): f̂j ← Sj[ { yi - α̂ - Σk≠j f̂k(xik) } ], then recenter f̂j.
• 3. Stop when the functions f̂j change less than a prespecified threshold.
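
The backfitting cycle can be sketched in a few lines of NumPy. As a hedged illustration, a cubic polynomial fit stands in for the cubic smoothing spline Sj; any scatterplot smoother could be dropped into its place:

```python
import numpy as np

def backfit(X, y, n_cycles=20, tol=1e-6):
    """Backfitting for an additive model y ~ alpha + sum_j f_j(X_j).

    A cubic polynomial fit is a stand-in for the smoother S_j here;
    a cubic smoothing spline (or any scatterplot smoother) would be
    used in practice.
    """
    n, p = X.shape
    alpha = y.mean()                   # step 1: initialize alpha, f_j = 0
    f = np.zeros((n, p))               # fitted values f_j(x_ij)
    for _ in range(n_cycles):          # step 2: cycle over j = 1..p
        f_old = f.copy()
        for j in range(p):
            # partial residuals: remove alpha and all other functions
            r = y - alpha - f.sum(axis=1) + f[:, j]
            coef = np.polyfit(X[:, j], r, deg=3)   # smooth r on X_j
            f[:, j] = np.polyval(coef, X[:, j])
            f[:, j] -= f[:, j].mean()  # recenter so sum_i f_j(x_ij) = 0
        # step 3: stop when the functions change less than a threshold
        if np.abs(f - f_old).max() < tol:
            break
    return alpha, f

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = 1.0 + np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
alpha, f = backfit(X, y)
resid = y - alpha - f.sum(axis=1)
print(round(float(resid.var()), 3))  # small residual variance: most signal explained
```

Note how each inner step fits only the partial residuals, so the algorithm is modular: the smoother for one coordinate knows nothing about the others.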

9
Backfitting: Points to Ponder
• How should we choose the fitting (smoother) functions Sj?
10
FAM (cont.)
11
12
Logistic Regression
• Model the class posterior probabilities Pr(G = k | X = x) in terms of K - 1 log-odds:
  log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = βk0 + βkᵀx,  k = 1, ..., K - 1
• The decision boundary between classes k and l is the set of points where δk(x) = δl(x).
• Linear discriminant function for class k: δk(x) = βk0 + βkᵀx
• Classify to the class with the largest value of δk(x).
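
Recovering the K class posteriors from the K - 1 log-odds (with class K as the reference) is a one-line normalization; a minimal sketch:

```python
import math

def posteriors(log_odds):
    """Recover Pr(G=k|x) for all K classes from the K-1 log-odds
    log[Pr(G=k|x) / Pr(G=K|x)], where class K is the reference."""
    expo = [math.exp(v) for v in log_odds]
    denom = 1.0 + sum(expo)
    probs = [e / denom for e in expo]   # classes 1 .. K-1
    probs.append(1.0 / denom)           # reference class K
    return probs

p = posteriors([0.0, math.log(2.0)])   # K = 3 classes
print([round(v, 3) for v in p])        # [0.25, 0.5, 0.25]
```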

13
Logistic Regression (cont.)
• Parameter estimation:
• Objective function: the conditional log-likelihood ℓ(θ) = Σi log Pr(G = gi | X = xi; θ)
• Parameter estimation: IRLS (iteratively reweighted least squares)
• In particular, for the two-class case, the Newton-Raphson algorithm is used to solve the score equations; the objective function is ℓ(β) = Σi [ yi βᵀxi - log(1 + exp(βᵀxi)) ].

14
Logistic Regression (cont.)
15
Logistic Regression (cont.)
16
Logistic Regression (cont.)
17
Logistic Regression (cont.)
18
Logistic Regression (cont.)
• When is it used?
• Binary responses (two classes).
• As a data-analysis and inference tool, to understand the role of the input variables in explaining the outcome.
• Feature selection:
• Find a subset of the variables that is sufficient for explaining their joint effect on the response.
• One way is to repeatedly drop the least significant coefficient and refit the model until no further terms can be dropped.
• Another strategy is to refit each model with one variable removed, and then perform an analysis of deviance to decide which variable to exclude.
• Regularization:
• Maximum penalized likelihood.
• Shrinking the parameters via an L1 constraint, or imposing a margin constraint in the separable case.

19
20
21
Fitting Logistic Regression vs. Fitting Additive Logistic Regression
• Fitting logistic regression (IRLS):
• 1. Initialize β̂ = 0.
• 2. Iterate:
  a. Compute pi = 1/[1 + exp(-β̂ᵀxi)], the adjusted responses zi = β̂ᵀxi + (yi - pi)/[pi(1 - pi)], and the weights wi = pi(1 - pi).
  b. Use weighted least squares to fit a linear model to zi with weights wi, giving new estimates β̂.
• 3. Continue step 2 until convergence.
• Fitting additive logistic regression (local scoring):
• 1. Compute starting values α̂ = log[ȳ/(1 - ȳ)], where ȳ = ave(yi); set f̂j ≡ 0 for all j.
• 2. Iterate, with η̂i = α̂ + Σj f̂j(xij) and pi = 1/[1 + exp(-η̂i)]:
  a. Compute the adjusted responses zi = η̂i + (yi - pi)/[pi(1 - pi)] and the weights wi = pi(1 - pi).
  b. Use the weighted backfitting algorithm to fit an additive model to zi with weights wi, giving new estimates α̂, f̂j.
• 3. Continue step 2 until convergence.
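
Step 2 for the plain (linear) logistic model can be sketched with NumPy; swapping the weighted least-squares solve for a weighted backfitting pass would give the additive variant. A minimal sketch on simulated data (variable names are illustrative):

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Two-class logistic regression via IRLS (Newton-Raphson).

    Each iteration forms the adjusted response z_i and weights w_i,
    then solves one weighted least-squares problem.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        eta = Xb @ beta
        p = 1.0 / (1.0 + np.exp(-eta))          # current probabilities p_i
        w = p * (1.0 - p)                       # IRLS weights w_i
        z = eta + (y - p) / w                   # adjusted responses z_i
        # weighted least squares: beta <- argmin_b sum_i w_i (z_i - x_i^T b)^2
        WX = Xb * w[:, None]
        beta_new = np.linalg.solve(Xb.T @ WX, WX.T @ z)
        if np.abs(beta_new - beta).max() < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
true_beta = np.array([-0.5, 2.0, -1.0])         # intercept, two slopes
p = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.uniform(size=500) < p).astype(float)
beta_hat = fit_logistic_irls(X, y)
print(np.round(beta_hat, 2))  # close to the generating coefficients
```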
22
SPAM Detection via Additive Logistic Regression
• Input variables (predictors):
• 48 quantitative variables: the percentage of words in the email that match a given word.
• 6 quantitative variables: the percentage of characters in the email that match a given character, such as ch; , ch( , etc.
• The average length of uninterrupted sequences of capital letters.
• The length of the longest uninterrupted sequence of capital letters.
• The sum of the lengths of uninterrupted sequences of capital letters.
• Output variable: SPAM (1) or email (0).
• The fj are taken to be cubic smoothing splines.

23
(No Transcript)
24
(No Transcript)
25
SPAM Detection Results
True Class Predicted Class Predicted Class
True Class Email (0) SPAM (1)
Email (0) 58.5 2.5
SPAM (1) 2.7 36.2
Sensitivity Probability of predicting spam given
true state is spam Specificity Probability
of predicting email given true state is email
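
Both rates follow directly from the confusion table above; a quick check in Python:

```python
# Confusion-table entries (percentages of the test set):
email_email, email_spam = 58.5, 2.5   # true email row
spam_email, spam_spam = 2.7, 36.2     # true SPAM row

# Sensitivity: Pr(predict SPAM | true SPAM)
sensitivity = spam_spam / (spam_spam + spam_email)
# Specificity: Pr(predict email | true email)
specificity = email_email / (email_email + email_spam)

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}")
# sensitivity=93.1%, specificity=95.9%
```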
26
GAM Summary
• Useful, flexible extensions of linear models.
• The backfitting algorithm is simple and modular.
• The interpretability of the predictors (input variables) is not obscured.
• Not suitable for very large data-mining applications (why?).