Title: Review of kernel density estimates, qq-plots and robust linear regression (S-PLUS focus)
Slide 1: LECTURE 31
- Review of kernel density estimates, qq-plots and robust linear regression (S-PLUS focus)
  - Use of qq-plots to guide choice of accurate parametric density estimates
  - Application: Aircraft Separation Standards
- Linear discriminant classifiers versus decision tree classifiers
  - Brief introduction to decision trees and comparison comments
Slide 2: 1.2-(a) Aircraft Separation Standards
Atlantic Crossing Corridors
[Figure: corridors 1, 2 and 3 carrying Aircraft 1, Aircraft 2 and Aircraft 3, with the corridor separation labeled w. Caption: random drift from nominal path (center of corridors).]
Current separation standard: W = 75 nautical miles
Slide 3: Is Reduced Separation Safe?
L = random lateral deviation
Probability(|L| > 0.5 w_r) = ?
How do you compute this? You need to:
- Collect appropriate data
- Build an accurate probability model
[Figure: corridor separation labeled w_r]
Slide 4: Lateral Deviations Data
> lat.dev[1:6]
[1]  48.2672978  46.4796245 -36.1923183  76.5685543 -30.6044119  29.2046373
At first glance, a normal distribution seems
plausible
Slide 5: Naive Fitting of Normal Distribution
> stdev(lat.dev)
[1] 35.95356
Corridor width W = 100 miles
P(|lat.dev| > 100) = 2 P(lat.dev < -100) = 2 pnorm(-100, 0, 35.95)   (mean known to be 0)
> 2*pnorm(-100,0,35.6)
[1] 0.004969738
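The two-sided normal tail probability above can be checked outside S-PLUS. The following is a minimal Python sketch, not the lecture's own code, using the standard-library `statistics.NormalDist`; the variable names are illustrative, and sigma = 35.6 is simply the value that appears in the slide's call.

```python
from statistics import NormalDist

# Zero-mean normal model; sigma = 35.6 is the value used in the slide's pnorm call.
sigma = 35.6
model = NormalDist(mu=0.0, sigma=sigma)

# By symmetry about 0, P(|X| > 100) = 2 * P(X < -100).
p_two_sided = 2 * model.cdf(-100.0)
print(p_two_sided)
```

The result should agree with the S-PLUS value of about 0.005 shown on the slide.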
Slide 6: Quick Model Validation Checks
Check 1: Empirical probability that |lat.dev| > 100
> sum(abs(lat.dev)>100)/length(lat.dev)
[1] 0.017
Much larger than the normal model probability of 0.005! This is cause for concern, so make a second check!
Check 2: A normal qq-plot (next slide)
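Check 1 is just the fraction of observations whose absolute value exceeds the threshold. A minimal Python sketch of that calculation follows; the function name is hypothetical, and because the full lat.dev data set is not reproduced in these slides, the demonstration uses only the six values printed on the data slide with an arbitrary threshold of 40.

```python
def empirical_tail_fraction(sample, threshold):
    """Fraction of observations whose absolute value exceeds threshold."""
    exceed = sum(1 for x in sample if abs(x) > threshold)
    return exceed / len(sample)

# Only the six values printed on the data slide; NOT the full lat.dev data set.
first_six = [48.2672978, 46.4796245, -36.1923183,
             76.5685543, -30.6044119, 29.2046373]

# A threshold of 40 is arbitrary, chosen so this tiny excerpt has exceedances.
frac = empirical_tail_fraction(first_six, 40.0)
print(frac)
```

With the full data and a threshold of 100, the same function would correspond to the S-PLUS expression in Check 1.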
Slide 7: The normal distribution is clearly inadequate in the tails!
Slide 8: Logistic Distribution QQ-Plot
So let's fit a logistic distribution!
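Slide 9:
> qqlogis <- function(x, mu = 0, scale = 1)
{
    n <- length(x)
    probs <- ((1:n) - 0.5)/n             # plotting positions
    quans <- qlogis(probs, mu, scale)    # theoretical logistic quantiles
    plot(quans, sort(x))                 # ordered data vs. theoretical quantiles
    abline(ltsreg(quans, sort(x)))       # robust (least trimmed squares) reference line
}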
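The qq-plot construction above pairs each ordered observation with the logistic quantile at plotting position (i - 0.5)/n. As a rough Python sketch of the same idea (the function names are hypothetical, plotting and the robust `ltsreg` reference line are left out, and the input is only the six values shown on the data slide):

```python
import math

def logistic_quantile(p, mu=0.0, scale=1.0):
    """Logistic quantile function: mu + scale * log(p / (1 - p))."""
    return mu + scale * math.log(p / (1.0 - p))

def qq_logistic_points(x, mu=0.0, scale=1.0):
    """Return (theoretical quantile, ordered observation) pairs for a qq-plot."""
    n = len(x)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]   # plotting positions
    quans = [logistic_quantile(p, mu, scale) for p in probs]
    return list(zip(quans, sorted(x)))

# Only the six values printed on the data slide; NOT the full lat.dev data set.
first_six = [48.2672978, 46.4796245, -36.1923183,
             76.5685543, -30.6044119, 29.2046373]

points = qq_logistic_points(first_six)
for q, obs in points:
    print(f"{q:8.4f} {obs:10.4f}")
```

If the logistic model fits, these points should fall close to a straight line whose slope estimates the scale parameter.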
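Slide 10: Logistic Distribution Fit to lat.dev Data
Recall the relationship between the quantiles of a standard distribution and the quantiles of a distribution with scale parameter s (location 0):
x_p = s * z_p, so s = x_p / z_p
Estimate x_.75 from the data:
> quantile(lat.dev,.75)
21.59046
We know z_.75 for the standard logistic distribution:
> qlogis(.75)
[1] 1.098612
Hence s is approximately 21.59046 / 1.098612 = 19.65.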
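The scale estimate is the ratio of the data's upper quartile to the standard logistic upper quartile. A short Python sketch of the arithmetic, using the two numbers from the slide (variable names are illustrative):

```python
import math

q75_data = 21.59046               # upper quartile of lat.dev, from the slide
q75_std = math.log(0.75 / 0.25)   # standard logistic upper quartile, log(3)

# For a zero-location scale family, x_p = s * z_p, so s = x_p / z_p.
scale_hat = q75_data / q75_std
print(round(scale_hat, 2))
```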
Slide 11: P(|lat.dev| > 100) for Logistic Dist. Model
> 2*plogis(-100,0,19.65)
[1] 0.01225212
More than twice that of the normal model (0.005). This is a potentially serious miscalculation!
Note that things get worse further out in the tails:
> 2*pnorm(-120,0,35.6)
[1] 0.0007     (serious under-estimate)
> 2*plogis(-120,0,19.65)
[1] 0.0044     (six times larger)
FINAL WORD: Use the much more accurate logistic model
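The comparison of the two models in the tails can be sketched in Python as follows. This is an illustrative check rather than the lecture's code: `logistic_cdf` is a hand-written helper (the standard library has no logistic distribution), and the parameters 35.6 and 19.65 are the ones used in the S-PLUS calls above.

```python
import math
from statistics import NormalDist

def logistic_cdf(x, mu=0.0, scale=1.0):
    """Logistic CDF: 1 / (1 + exp(-(x - mu) / scale))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / scale))

normal = NormalDist(mu=0.0, sigma=35.6)

results = {}
for w in (100.0, 120.0):
    p_norm = 2 * normal.cdf(-w)                  # two-sided normal tail
    p_logis = 2 * logistic_cdf(-w, 0.0, 19.65)   # two-sided logistic tail
    results[w] = (p_norm, p_logis, p_logis / p_norm)
    print(w, round(p_norm, 4), round(p_logis, 4), round(p_logis / p_norm, 1))
```

The ratio in the last column grows as the threshold moves outward, which is the sense in which the normal model's under-estimate gets worse further out in the tails.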
Slide 12: CLASSIFIERS/PATTERN RECOGNIZERS
- Linear Discriminant Classifiers (Lecture 27)
  - Well-known classical statistics method
  - Works if there is good linear pattern separability
- Decision Trees
  - A modern method invented by both statisticians and computer scientists (machine learning)
  - Very powerful and flexible, does not require linear pattern separability
  - A key tool in Data Mining
Slide 13: Linear Discriminant Classifier
Linear discriminant classifier works very well
for this data!
Slide 14: Decision Tree Classifiers
- One of the truly great inventions for non-parametric classification/pattern recognition
- Nonparametric = unknown nonlinear model which possibly requires many parameters
- Classification and Regression Trees (1984) by Breiman, Friedman, Olshen and Stone: the CART bible, providing a theoretical and algorithmic base
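To make the idea of a tree classifier concrete, here is a toy Python sketch. It is not taken from the lecture or from CART: the tree is written by hand rather than learned, and the XOR-style data are hypothetical, chosen only because no single straight line separates the two classes while two axis-aligned splits do.

```python
def tree_classify(x1, x2):
    """Hand-written depth-2 decision tree: each node tests one feature against a threshold."""
    if x1 < 0.5:
        return 1 if x2 >= 0.5 else 0
    return 1 if x2 < 0.5 else 0

# Hypothetical XOR-style toy data (not from the lecture): ((x1, x2), class label).
toy = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]

errors = sum(1 for (x1, x2), label in toy if tree_classify(x1, x2) != label)
print(errors)
```

A tree-growing algorithm such as CART would choose the split variables and thresholds from the data instead of having them fixed in advance.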
Slide 15: A Simple Decision Tree Example
Linear discriminant classifier does not work well
for this data!