Review of kernel density estimates, qq-plots and robust linear regression (S-PLUS focus)

Provided by: dou870

1
LECTURE 31

- Review of kernel density estimates, qq-plots and robust linear regression (S-PLUS focus)
- Use of qq-plots to guide the choice of accurate parametric density estimates
- Application: Aircraft Separation Standards
- Linear discriminant classifiers versus decision tree classifiers
- Brief introduction to decision trees and comparison comments

2
1.2-(a) Aircraft Separation Standards
Atlantic Crossing Corridors
[Figure: Aircraft 1, 2 and 3 in corridors 1, 2 and 3, separated by width w; each aircraft shows random drift from its nominal path (center of corridor)]
CURRENT SEPARATION STANDARD: W = 75 nautical miles
3
Is Reduced Separation Safe?
L = random lateral deviation
$P(L > 0.5\,w_r) = ?$ where $w_r$ is the reduced separation
How do you compute this? You need to:
- Collect appropriate data
- Build an accurate probability model
4
Lateral Deviations Data
> lat.dev[1:6]
[1]  48.2672978  46.4796245 -36.1923183  76.5685543 -30.6044119  29.2046373
At first glance, a normal distribution seems plausible.
5
Naive Fitting of Normal Distribution
> stdev(lat.dev)
[1] 35.95356
Corridor width W = 100 miles
P(|lat.dev| > 100) = 2 P(lat.dev < -100) = 2*pnorm(-100, 0, 35.95)   (mean known to be 0)
> 2*pnorm(-100, 0, 35.6)
[1] 0.004969738
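The normal tail calculation can be cross-checked outside S-PLUS. Below is a minimal Python sketch (standard library only; the function name `normal_two_sided_tail` is hypothetical) of `2*pnorm(-t, 0, sigma)`. It also shows that the printed 0.004969738 corresponds to sigma = 35.6, the value typed in the command. The sample standard deviation 35.95356 gives a slightly larger value of about 0.0054, still far below the empirical rate checked next.

```python
from statistics import NormalDist


def normal_two_sided_tail(t, sigma):
    """P(|X| > t) for X ~ N(0, sigma^2); the same quantity as 2*pnorm(-t, 0, sigma)."""
    return 2 * NormalDist(mu=0.0, sigma=sigma).cdf(-t)


if __name__ == "__main__":
    # sigma = 35.6 is the value used in the slide's command
    print(normal_two_sided_tail(100, 35.6))
    # sigma = 35.95356 is the estimate printed by stdev(lat.dev)
    print(normal_two_sided_tail(100, 35.95356))
```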
6
Quick Model Validation Checks
Check 1: Empirical probability that |lat.dev| > 100
> sum(abs(lat.dev) > 100)/length(lat.dev)
[1] 0.017
Much larger than the normal model probability of 0.005!! This is cause for concern, so make a second check!
Check 2: A normal qq-plot (next slide)
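Check 1 is just the fraction of observations beyond the threshold. Here is a small Python equivalent of the S-PLUS expression; the function name is hypothetical. The full `lat.dev` data set is not reproduced in the slides, so the example uses only the six values printed earlier plus an illustrative threshold of 40. It does not reproduce the 0.017 figure.

```python
def empirical_tail_prob(xs, threshold):
    """Fraction of observations with |x| > threshold.

    Mirrors sum(abs(lat.dev) > 100)/length(lat.dev) in S-PLUS.
    """
    if not xs:
        raise ValueError("need at least one observation")
    return sum(abs(x) > threshold for x in xs) / len(xs)


# Only the first six lat.dev values shown on the slide (not the full data set).
first_six = [48.2672978, 46.4796245, -36.1923183, 76.5685543, -30.6044119, 29.2046373]

print(empirical_tail_prob(first_six, 100))  # 0.0: none of these six exceed 100 in absolute value
print(empirical_tail_prob(first_six, 40))   # 0.5: three of the six exceed 40 (illustrative threshold)
```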
7
The normal distribution is clearly inadequate in
the tails!
8
Logistic Distribution QQ-Plot
So let's fit a logistic distribution!
9
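qqlogis <- function(x, mu = 0, scale = 1) {
  n <- length(x)
  probs <- ((1:n) - 0.5)/n             # plotting positions
  quans <- qlogis(probs, mu, scale)    # theoretical logistic quantiles
  plot(quans, sort(x))                 # theoretical vs. sample quantiles
  abline(ltsreg(quans, sort(x)))       # robust least trimmed squares reference line
}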
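The following Python sketch (standard library only) mirrors the steps of `qqlogis`. It computes the plotting positions, maps them through the logistic quantile function, and fits a robust line, but it does not plot. Python has no built-in `ltsreg`, so `lts_line` is a hypothetical, simplified stand-in. It approximates least trimmed squares by starting from the line through every pair of points, then applying concentration steps that refit ordinary least squares to the h best-fitting points. This is not the S-PLUS algorithm. The defaults h = (n + 3) // 2 and 20 steps are assumptions of this sketch, and it is meant for small n only.

```python
import math


def qlogis(p, location=0.0, scale=1.0):
    """Logistic quantile function: location + scale * log(p / (1 - p))."""
    return location + scale * math.log(p / (1.0 - p))


def logistic_qq_points(x, location=0.0, scale=1.0):
    """Return (theoretical quantiles, sorted data) using plotting positions (i - 0.5)/n."""
    n = len(x)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [qlogis(p, location, scale) for p in probs], sorted(x)


def _ols(xs, ys):
    """Ordinary least-squares (intercept, slope); assumes the xs are not all equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def lts_line(x, y, h=None, max_csteps=20):
    """Approximate least trimmed squares line: minimise the sum of the h smallest squared residuals."""
    n = len(x)
    if h is None:
        h = (n + 3) // 2  # roughly half the data (assumed default)

    def trimmed_objective(a, b):
        return sum(sorted((y[k] - a - b * x[k]) ** 2 for k in range(n))[:h])

    best = None
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue
            b = (y[j] - y[i]) / (x[j] - x[i])  # start: line through points i and j
            a = y[i] - b * x[i]
            for _ in range(max_csteps):  # concentration: refit OLS on the h closest points
                keep = sorted(range(n), key=lambda k: (y[k] - a - b * x[k]) ** 2)[:h]
                a_new, b_new = _ols([x[k] for k in keep], [y[k] for k in keep])
                if (a_new, b_new) == (a, b):
                    break
                a, b = a_new, b_new
            obj = trimmed_objective(a, b)
            if best is None or obj < best[0]:
                best = (obj, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical data: exactly y = 1 + 2q at 20 logistic quantiles, plus two gross outliers.
    q = [qlogis((i - 0.5) / 20) for i in range(1, 21)]
    y = [1 + 2 * v for v in q]
    y[0], y[-1] = -50.0, 80.0
    print(lts_line(q, y))  # close to (1.0, 2.0) despite the outliers
```

The trimmed fit matters for qq-plots of heavy-tailed data. An ordinary least-squares line is pulled toward the extreme points at both ends, whereas the trimmed line follows the bulk of the data, so departures in the tails stand out.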
10
Logistic Distribution Fit to lat.dev Data
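Recall the relationship between quantiles of a standard distribution and quantiles of a distribution with scale parameter s (location 0):
$Q_s(p) = s\,Q_1(p)$, so $s = Q_s(0.75)/Q_1(0.75)$
Estimate $Q_s(0.75)$ from the data:
> quantile(lat.dev, .75)
21.59046
We know $Q_1(0.75)$:
> qlogis(.75)
[1] 1.098612
Hence $\hat{s} = 21.59046/1.098612 \approx 19.65$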
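The scale estimate is a single ratio. The Python sketch below (standard library only; function names hypothetical) computes it both from raw data and from the slide's numbers. Like the slide, it assumes the location is 0. It uses `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics (R's default rule, type 7). Whether this matches the S-PLUS `quantile` default exactly is not checked here.

```python
import math
import statistics


def qlogis_standard(p):
    """Standard logistic quantile (location 0, scale 1): log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def logistic_scale_from_quartile(data):
    """Estimate the logistic scale s, assuming location 0, as s = Q(0.75) / Q1(0.75)."""
    upper_quartile = statistics.quantiles(data, n=4, method="inclusive")[2]
    return upper_quartile / qlogis_standard(0.75)


if __name__ == "__main__":
    # Slide values: upper quartile 21.59046, qlogis(.75) = log(3) = 1.098612
    print(21.59046 / qlogis_standard(0.75))  # about 19.65
```

Using the upper quartile rather than the sample standard deviation keeps the estimate from being driven by the few extreme deviations that the normal fit struggled with.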
11
P(|lat.dev| > 100) for Logistic Dist. Model
> 2*plogis(-100, 0, 19.65)
[1] 0.01225212
More than twice that of the normal model (0.005). This is a potentially serious miscalculation!
Note that things get worse further out in the tails:
> 2*pnorm(-120, 0, 35.6)
[1] 0.0007     Serious under-estimate
> 2*plogis(-120, 0, 19.65)
[1] 0.0044     Six times larger
FINAL WORD: Use the much more accurate logistic model (0.012, much closer to the empirical 0.017 than the normal model's 0.005).
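To see how the gap between the two models widens in the tails, the Python sketch below (standard library only; function names hypothetical) evaluates both two-sided tail probabilities at 100 and 120 miles. It uses the same parameters as the S-PLUS commands above. The logistic tail uses the closed-form CDF $F(x) = 1/(1 + e^{-x/s})$, so $2F(-t) = 2/(1 + e^{t/s})$.

```python
import math
from statistics import NormalDist


def logistic_two_sided_tail(t, scale):
    """P(|X| > t) for a logistic with location 0; equals 2*plogis(-t, 0, scale)."""
    return 2.0 / (1.0 + math.exp(t / scale))


def normal_two_sided_tail(t, sigma):
    """P(|X| > t) for N(0, sigma^2); equals 2*pnorm(-t, 0, sigma)."""
    return 2.0 * NormalDist(0.0, sigma).cdf(-t)


if __name__ == "__main__":
    for t in (100, 120):
        p_norm = normal_two_sided_tail(t, 35.6)
        p_logis = logistic_two_sided_tail(t, 19.65)
        print(f"t={t}: normal {p_norm:.5f}  logistic {p_logis:.5f}  ratio {p_logis / p_norm:.1f}")
```

The ratio grows from roughly 2.5 at 100 miles to roughly 6 at 120 miles, because the logistic tail decays exponentially while the normal tail decays like $e^{-t^2/2\sigma^2}$.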
12
CLASSIFIERS/PATTERN RECOGNIZERS

Linear Discriminant Classifiers (Lecture 27)
- Well-known classical statistical method
- Works if there is good linear pattern separability
Decision Trees
- A modern method invented by both statisticians and computer scientists (machine learning)
- Very powerful and flexible; does not require linear pattern separability
- A key tool in data mining

13
Linear Discriminant Classifier

Linear discriminant classifier works very well
for this data!
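A linear discriminant classifier reduces to one direction w and a cut-off c. Here is a minimal two-class, two-feature Fisher discriminant in Python (standard library only). It assumes equal priors and a pooled covariance matrix, and both the function names and the data are hypothetical. It is a sketch of the classical method, not the code behind the slide's figure.

```python
def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_lda_2d(X, y):
    """Two-class Fisher linear discriminant for 2-D points labelled 0/1.

    Returns (w, c); a point is classified as 1 when w . x > c (equal priors assumed).
    """
    groups = {k: [x for x, lab in zip(X, y) if lab == k] for k in (0, 1)}
    means = {k: _mean(rows) for k, rows in groups.items()}
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class covariance
    for k, rows in groups.items():
        for r in rows:
            d = (r[0] - means[k][0], r[1] - means[k][1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(X) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (means[1][0] - means[0][0], means[1][1] - means[0][1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1], inv[1][0] * diff[0] + inv[1][1] * diff[1])
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    c = w[0] * mid[0] + w[1] * mid[1]  # boundary passes through the midpoint of the means
    return w, c


def predict_lda(model, point):
    w, c = model
    return 1 if w[0] * point[0] + w[1] * point[1] > c else 0


if __name__ == "__main__":
    # Hypothetical, well-separated clusters around (0.5, 0.5) and (3.5, 3.5)
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2), (3, 3), (4, 3), (3, 4), (4, 4), (3.5, 3.8)]
    y = [0] * 5 + [1] * 5
    model = fit_lda_2d(X, y)
    print(sum(predict_lda(model, p) == lab for p, lab in zip(X, y)), "of", len(y), "correct")
```

The decision boundary is always a single straight line. So when one class sits in a band with the other class on both sides, at least one point must be misclassified however the line is placed.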
14
Decision Tree Classifiers

- One of the truly great inventions for nonparametric classification/pattern recognition
- Nonparametric: an unknown nonlinear model that may require many parameters
- Classification and Regression Trees (1984) by Breiman, Friedman, Olshen and Stone: the CART 'bible', providing a theoretical and algorithmic base

15
A Simple Decision Tree Example
Linear discriminant classifier does not work well
for this data!
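To make the contrast concrete, here is a minimal CART-style sketch in Python (standard library only). It grows a tree greedily, choosing each split by minimising the weighted Gini impurity over every feature and midpoint threshold, and stops at depth 2. It omits CART's cost-complexity pruning and surrogate splits. The function names and the data are hypothetical. In the data, class 1 occupies the band -1 <= x <= 1 with class 0 on both sides, a pattern no straight line separates but two axis-aligned splits do.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every feature and midpoint threshold; return (weighted_gini, feature, threshold)."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=2):
    """A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature values identical: no split possible
        return majority
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


if __name__ == "__main__":
    # Hypothetical data: class 1 in the band -1 <= x <= 1; second feature is an irrelevant 0/1 pattern.
    X = [(-3 + 0.5 * i, i % 2) for i in range(13)]
    y = [1 if abs(x) <= 1 else 0 for x, _ in X]
    tree = build_tree(X, y)
    print(tree)  # (0, -1.25, 0, (0, 1.25, 1, 0)): split x at -1.25, then at 1.25
    print(sum(predict(tree, r) == lab for r, lab in zip(X, y)), "of", len(y), "correct")
```

The greedy search ignores the irrelevant second feature because splitting on it barely reduces impurity. The two threshold splits on x carve out the band exactly.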