Class 26 - PowerPoint PPT Presentation

About This Presentation
Title:

Class 26

Description:

Title: Oakland A's (A). Author: Darden Graduate Business School. Last modified by: Pfeifer, Phil. Created Date: 4/24/2006 5:57:02 PM. Document presentation format: PowerPoint PPT presentation.

Slides: 25
Provided by: Darde

Transcript and Presenter's Notes

Title: Class 26


1
Class 26
Pfeifer note Section 6
  • Model Building Philosophy

2
Assignment 26
  • 1. T-test 2-sample = regression with dummy
  • T = +/- 6.2/2.4483 (from data analysis,
    complicated formula, OR regression with dummy)
  • 2. ANOVA single factor = regression with p-1
    dummies (see next slide)
  • 3. Better predictor? The one with the lower
    regression standard error (or higher adj R2)
  • Not the one with the higher coefficient.
  • 4. Will they charge less than 4,500?
  • Use the regression's standard error and t.dist to
    calculate the probability.
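
Item 1 can be checked numerically. The sketch below is only an illustration: it borrows the Lawyer and Physical Therapist scores from the next slide rather than the assignment's own data, and the function names are made up here. It computes the pooled (equal-variance) two-sample t-statistic and the t-statistic of the dummy's coefficient in a simple regression; the two should agree.

```python
import math

# Illustrative data: two of the groups from the next slide (not the assignment data).
lawyer = [44, 42, 74, 42, 53, 50, 45, 48, 64, 38]
therapist = [55, 78, 80, 86, 60, 59, 62, 52, 55, 50]

def mean(xs):
    return sum(xs) / len(xs)

def pooled_t(a, b):
    """Two-sample t-statistic assuming equal variances (group b minus group a)."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    se_diff = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(b) - mean(a)) / se_diff

def dummy_regression_t(a, b):
    """t-statistic of the slope when y is regressed on a 0/1 dummy (1 = group b)."""
    x = [0] * len(a) + [1] * len(b)
    y = list(a) + list(b)
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(sse / (n - 2))  # the regression's standard error
    return slope / (std_error / math.sqrt(sxx))

t_test = pooled_t(lawyer, therapist)
t_reg = dummy_regression_t(lawyer, therapist)
print(round(t_test, 4), round(t_reg, 4))  # the two values should match
```

The slope on the dummy is just the difference between the two group means, which is why the two routes give the same t.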

3
Ready for ANOVA
SAT Dlawyer DPT Dcabinet
44 1 0 0
42 1 0 0
74 1 0 0
42 1 0 0
53 1 0 0
50 1 0 0
45 1 0 0
48 1 0 0
64 1 0 0
38 1 0 0
55 0 1 0
78 0 1 0
80 0 1 0
86 0 1 0
60 0 1 0
59 0 1 0
62 0 1 0
52 0 1 0
55 0 1 0
50 0 1 0
54 0 0 1
65 0 0 1
79 0 0 1
69 0 0 1
79 0 0 1
64 0 0 1
59 0 0 1
78 0 0 1
84 0 0 1
60 0 0 1
44 0 0 0
73 0 0 0
71 0 0 0
60 0 0 0
64 0 0 0
66 0 0 0
41 0 0 0
55 0 0 0
76 0 0 0
62 0 0 0
Occupation SAT
Lawyer 44
Lawyer 42
Lawyer 74
Lawyer 42
Lawyer 53
Lawyer 50
Lawyer 45
Lawyer 48
Lawyer 64
Lawyer 38
Physical Therapist 55
Physical Therapist 78
Physical Therapist 80
Physical Therapist 86
Physical Therapist 60
Physical Therapist 59
Physical Therapist 62
Physical Therapist 52
Physical Therapist 55
Physical Therapist 50
Cabinetmaker 54
Cabinetmaker 65
Cabinetmaker 79
Cabinetmaker 69
Cabinetmaker 79
Cabinetmaker 64
Cabinetmaker 59
Cabinetmaker 78
Cabinetmaker 84
Cabinetmaker 60
Systems Analyst 44
Systems Analyst 73
Systems Analyst 71
Systems Analyst 60
Systems Analyst 64
Systems Analyst 66
Systems Analyst 41
Systems Analyst 55
Systems Analyst 76
Systems Analyst 62
Ready for Regression
Lawyer Physical Therapist Cabinetmaker Systems Analyst
44 55 54 44
42 78 65 73
74 80 79 71
42 86 69 60
53 60 79 64
50 59 64 66
45 62 59 41
48 52 78 55
64 55 84 76
38 50 60 62
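
The claim that single-factor ANOVA is the same as a regression with p-1 dummies can be checked on the data above. This is a rough sketch in plain Python, not the Excel procedure used in class; the helper names (`anova_f`, `solve`, `regression_with_dummies`) are invented for illustration, and Systems Analyst is taken as the baseline because it is the row group with all three dummies equal to 0.

```python
# SAT scores from the slide, one list per occupation.
groups = {
    "Lawyer": [44, 42, 74, 42, 53, 50, 45, 48, 64, 38],
    "Physical Therapist": [55, 78, 80, 86, 60, 59, 62, 52, 55, 50],
    "Cabinetmaker": [54, 65, 79, 69, 79, 64, 59, 78, 84, 60],
    "Systems Analyst": [44, 73, 71, 60, 64, 66, 41, 55, 76, 62],
}
BASELINE = "Systems Analyst"  # all dummies are 0 for this group

def mean(xs):
    return sum(xs) / len(xs)

def anova_f(groups):
    """Single-factor ANOVA F = (between mean square) / (within mean square)."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(all_y) - len(groups)))

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * c for v, c in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regression_with_dummies(groups, baseline):
    """Least-squares fit of SAT on an intercept plus one dummy per non-baseline group."""
    names = [g for g in groups if g != baseline]
    X, y = [], []
    for g, ys in groups.items():
        for val in ys:
            X.append([1] + [1 if g == name else 0 for name in names])
            y.append(val)
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coefs, r)) for r in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean(y)) ** 2 for yi in y)
    f_stat = ((sst - sse) / (k - 1)) / (sse / (len(y) - k))
    return dict(zip(["Intercept"] + names, coefs)), f_stat

coefs, f_reg = regression_with_dummies(groups, BASELINE)
f_anova = anova_f(groups)
print(coefs)
print(f_anova, f_reg)  # the two F statistics should match
```

In this fit the intercept is the baseline group's mean and each dummy coefficient is that group's mean minus the baseline mean, so the regression's fitted values are simply the four group means.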
4
Agenda
  • IQ demonstration
  • What you can do with lots of data
  • What you should do with not much data
  • Practice using the Oakland A's case

5
Remember the Coal Pile!
  • Model Building involves more than just selecting
    which of the available Xs to include in the
    model.
  • See section 9 of the Pfeifer note to learn about
    transforming Xs.
  • We won't do much in this regard

6
With lots of data (big data?)
Stats like std error and adj R-square only
measure FIT
Performance on a hold-out sample measures how
well each model will FORECAST
1. Split the data into two sets
  X1 X2 . . Xn Y
1 0.96 0.24 0.34 0.57 0.20 0.43
2 0.58 0.16 0.93 0.96 0.75 0.35
3 0.39 0.75 0.07 0.63 0.87 0.49
. . . . . . .
. . . . . . .
N 0.47 0.34 0.69 0.86 0.30 0.22
2. Use the training set (rows 1 to N1) to build
several models.
  X1 X2 . . Xn Y
1 0.96 0.24 0.34 0.57 0.20 0.43
2 0.58 0.16 0.93 0.96 0.75 0.35
3 0.39 0.75 0.07 0.63 0.87 0.49
. . . . . . .
. . . . . . .
N1 0.21 0.76 0.44 0.07 0.65 0.92
3. Use the hold-out sample (rows N1+1 to N) to
test/compare the models. Use the best performing model.
  X1 X2 . . Xn Y
N1+1 0.47 0.86 0.53 0.02 0.70 0.73
N1+2 0.03 0.51 0.35 0.09 0.95 0.11
N1+3 0.16 0.31 0.37 0.38 0.31 0.96
. . . . . . .
. . . . . . .
N 0.47 0.34 0.69 0.86 0.30 0.22
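
The split / build / compare routine on this slide might look roughly like the following. Everything here is made up for illustration: the data are simulated so that Y depends on X1 only (X2 is pure noise), the 150/50 split and the seed are arbitrary, and the three candidate models are deliberately simple.

```python
import math
import random

random.seed(26)  # arbitrary seed so the sketch is repeatable

# Simulated rows (x1, x2, y): Y depends on X1; X2 is unrelated noise.
rows = []
for _ in range(200):
    x1, x2 = random.random(), random.random()
    rows.append((x1, x2, 10 + 10 * x1 + random.gauss(0, 1)))

# 1. Split the data into two sets.
train, hold = rows[:150], rows[150:]

def fit_line(xs, ys):
    """Least-squares intercept and slope for a one-X regression."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# 2. Use the training set to build several models.
train_y = [r[2] for r in train]
ybar_train = sum(train_y) / len(train_y)
a1, b1 = fit_line([r[0] for r in train], train_y)
a2, b2 = fit_line([r[1] for r in train], train_y)
models = {
    "mean only": lambda r: ybar_train,
    "X1": lambda r: a1 + b1 * r[0],
    "X2": lambda r: a2 + b2 * r[1],
}

# 3. Use the hold-out sample to test/compare the models.
hold_y = [r[2] for r in hold]
scores = {name: rmse([m(r) for r in hold], hold_y) for name, m in models.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The hold-out rows play no part in fitting, so the hold-out error is a measure of forecasting performance rather than of fit.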
7
With lots of data (big data?)
  • Computer Algorithms do a very good job of finding
    a model
  • They guard against over-fitting
  • Once you own the software, they are fast and
    cheap
  • They won't claim, however, to do better than a
    professional model builder
  • Remember the coal pile!

8
Without much Data
  • You will not be able to use a training set/hold-out
    sample
  • You get one shot to find a GOOD model
  • Regression and all its statistics can tell you
    which model FITS the data the best.
  • Regression and all its statistics CANNOT tell you
    which model will perform (forecast) the best.
  • Not to mention... regression has no clue about what
    causes what.

9
Remember...
  • The model that does a spectacular job of fitting
    the past... will do worse at predicting the future
    than a simpler model that more accurately
    captures the way the world works.
  • Better fit can lead to poorer forecasts!
  • Instead of forecasting 100 for the next IQ, the
    over-fit model will sometimes predict 110 and
    other times predict 90!
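
A small simulation gives a feel for the cost of those swings. The assumptions are mine, not the slide's: IQs are drawn independently from a normal distribution with mean 100 and standard deviation 15, and the over-fit model's choice of 90 or 110 is driven by noise unrelated to the actual IQ.

```python
import math
import random

random.seed(9)  # arbitrary seed for repeatability

MEAN_IQ, SD_IQ = 100, 15  # assumed IQ distribution
N = 20000

iqs = [random.gauss(MEAN_IQ, SD_IQ) for _ in range(N)]
simple_forecasts = [MEAN_IQ] * N                                  # always predict 100
overfit_forecasts = [random.choice([90, 110]) for _ in range(N)]  # noise-driven swings

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

rmse_simple = rmse(simple_forecasts, iqs)
rmse_overfit = rmse(overfit_forecasts, iqs)

# With forecasts independent of the outcome, the expected squared error is
# variance + (forecast - mean)^2, i.e. 15^2 + 10^2 for the over-fit model.
expected_overfit = math.sqrt(SD_IQ ** 2 + 10 ** 2)

print(round(rmse_simple, 2), round(rmse_overfit, 2), round(expected_overfit, 2))
```

Under these assumptions the over-fit model's typical error is about 18 points rather than 15, purely because it chases noise.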

10
Requiring low p-values for all coefficients does
not protect against over-fitting.
  • If there are 100 Xs that are of NO help in
    predicting Y,
  • We expect about 5 of them to be statistically
    significant (at the 0.05 level).
  • And we'll want to use all 5 to predict the
    future.
  • And the model will be over-fit
  • We won't know it, perhaps
  • Our predictions will be WORSE as a result.
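
The "5 out of 100" expectation can be seen in a quick simulation. This is a simplified sketch with made-up data: Y and all 100 Xs are independent random noise, each X is tested in its own one-variable regression (not one big multiple regression), and |t| > 2 is used as the rough cutoff for a p-value of 0.05.

```python
import math
import random

random.seed(10)  # arbitrary seed for repeatability

N_OBS, N_X = 50, 100
y = [random.gauss(0, 1) for _ in range(N_OBS)]

def slope_t(x, y):
    """t-statistic of the slope in a simple regression of y on x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope

significant = 0
for _ in range(N_X):
    x = [random.gauss(0, 1) for _ in range(N_OBS)]  # an X that is of NO help
    if abs(slope_t(x, y)) > 2:
        significant += 1

print(significant, "of", N_X, "useless Xs look significant")
```

The count varies from run to run, but it is typically in the neighborhood of 5 even though none of the Xs has any real relationship with Y.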

11
Modeling Balancing Act
  • Useable (do we know the Xs?)
  • Simple
  • Make Sense
  • Use your judgment, given you can't solely rely on
    the stats/data
  • Signs of coefficients should make sense
  • Significant (low p) coefficients
  • Except for sets of dummies
  • Low standard error
  • Consistent with high adjusted R-square
  • Meets all four assumptions
  • Linearity (most important)
  • Homoskedasticity (equal variance)
  • Independence
  • Normality (least important)

12
Oakland A's (A)
13
Case Facts
  • Despite making only 40K, pitcher Mark Nobel had
    a great year for Oakland in 1980.
  • Second in the league for ERA (2.53), complete
    games (24), innings (284-1/3), and strikeouts
    (180)
  • Gold Glove winner (best fielding pitcher)
  • Second in Cy Young Award voting.

14
Nobel Wants a Raise
  • "I'm not saying anything against Rick Langford or
    Matt Keough (fellow A's pitchers), but I filled the
    stadium last year against Tommy John (star
    pitcher for the Yankees)."
  • Nobel's agent argued:
  • Avg. home attendance for Nobel's 16 starts was
    12,663.6
  • Avg. home attendance for remaining home games was
    only 10,859.4
  • Nobel should get paid for the difference
  • 1,804.2 extra tickets per start.

15
Data from 1980 Home Games
No DATE TIX OPP POS GB DOW TEMP PREC TOG TV PROMO YANKS NOBEL
1 10-Apr 24415 2 5 1 4 57 0 2 1 0 0 0
2 11-Apr 5729 2 3 1 5 66 0 2 1 0 0 0
3 12-Apr 5783 2 7 1 6 64 0 1 0 0 0 0
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
73 26-Sep 5099 6 2 14 5 64 0 2 1 0 0 1
74 27-Sep 4581 6 2 13 6 62 0 1 0 0 0 0
75 28-Sep 10662 6 2 12 7 65 0 1 0 1 0 0
                     
  LEGEND
  OPP (Opposing Team): 1 Seattle, 2 Minnesota, 3 California, 4 Yankees, 5 Detroit, 6 Milwaukee, 7 Toronto, 8 White Sox, 9 Boston, 10 Baltimore, 11 Cleveland, 12 Texas, 13 Kansas City
  POS (Position): A's ranking in American League West
  GB (Games Behind): Minimum no. of games needed to move ahead of current first place team
  DOW (Day of Week): Monday=1, Tuesday=2, etc.
  PREC (Precipitation): 1 if precipitation, 0 if not
  TOG (Time of Game): 1 if day, 2 if night
16
TASK
  • Be ready to report about the model assigned to
    your table (1 to 7)
  • What is the model? (succinct)
  • Critique it (succinctly)
  • Ignore Durbin-Watson
  • "Standard deviation of residuals" is aka the
    regression's standard error.
  • Output gives just the t-stat. A t of about +/- 2
    corresponds to a p-value of about 0.05.
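
For output that reports only t-stats, the rule of thumb above can be checked by computing the two-sided p-value directly. The sketch below integrates the t density numerically with Simpson's rule so that it needs only the standard library; the function names are invented here, and the degrees of freedom used in the example calls (60 and 10) are illustrative values, not taken from the case models. In a spreadsheet, Excel's T.DIST.2T function serves the same purpose.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area outside (-|t|, +|t|), via Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

print(two_sided_p(2, 60))   # close to 0.05 with plenty of degrees of freedom
print(two_sided_p(2, 10))   # noticeably above 0.05 with few degrees of freedom
```

The approximation "t of 2 means p of 0.05" is therefore reasonable when the residual degrees of freedom are large, and somewhat optimistic when they are small.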

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
What does it mean that the coefficient of NOBEL
is negative in most of the models?
Why was the coefficient of NOBEL positive in
model 1?