Multiple Regression and Model Building

Transcript and Presenter's Notes
1
Chapter 12
  • Multiple Regression and Model Building

2
Learning Objectives
  • 1. Explain the Linear Multiple Regression Model
  • 2. Test Overall Significance
  • 3. Describe Various Types of Models
  • 4. Evaluate Portions of a Regression Model
  • 5. Interpret Linear Multiple Regression Computer
    Output
  • 6. Describe Stepwise Regression
  • 7. Explain Residual Analysis
  • 8. Describe Regression Pitfalls

3
Types of Regression Models
4
Regression Modeling Steps
  • 1. Hypothesize Deterministic Component
  • 2. Estimate Unknown Model Parameters
  • 3. Specify Probability Distribution of Random
    Error Term
  • Estimate Standard Deviation of Error
  • 4. Evaluate Model
  • 5. Use Model for Prediction & Estimation

5
Regression Modeling Steps
  • 1. Hypothesize Deterministic Component
  • 2. Estimate Unknown Model Parameters
  • 3. Specify Probability Distribution of Random
    Error Term
  • Estimate Standard Deviation of Error
  • 4. Evaluate Model
  • 5. Use Model for Prediction & Estimation

Expanded in Multiple Regression
6
Linear Multiple Regression Model
  • Hypothesizing the Deterministic Component

Expanded in Multiple Regression
7
Regression Modeling Steps
  • 1. Hypothesize Deterministic Component
  • 2. Estimate Unknown Model Parameters
  • 3. Specify Probability Distribution of Random
    Error Term
  • Estimate Standard Deviation of Error
  • 4. Evaluate Model
  • 5. Use Model for Prediction & Estimation

8
Linear Multiple Regression Model
  • 1. Relationship between 1 dependent & 2 or more
    independent variables is a linear function

Population slopes
Population Y-intercept
Random error
Dependent (response) variable
Independent (explanatory) variables
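Putting the labeled pieces together, the general first-order model with k independent variables (the form used throughout this chapter) can be written as:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
E(Y) = β0 + β1X1 + β2X2 + ... + βkXk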
9
Population Multiple Regression Model
Bivariate model
10
Sample Multiple Regression Model
Bivariate model
11
Parameter Estimation
Expanded in Multiple Regression
12
Regression Modeling Steps
  • 1. Hypothesize Deterministic Component
  • 2. Estimate Unknown Model Parameters
  • 3. Specify Probability Distribution of Random
    Error Term
  • Estimate Standard Deviation of Error
  • 4. Evaluate Model
  • 5. Use Model for Prediction & Estimation

13
Multiple Linear Regression Equations
Too complicated by hand!
Ouch!
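For reference (not printed on the slide): the least squares estimates the software computes minimize SSE, and in matrix form, with X the n × (k+1) design matrix (leading column of 1s) and y the response vector, they are

b = (X'X)^(-1) X'y

which is why the calculation is left to the computer.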
14
Interpretation of Estimated Coefficients
15
Interpretation of Estimated Coefficients
  • 1. Slope (βk)
  • Estimated Y Changes by βk for Each 1 Unit
    Increase in Xk Holding All Other Variables
    Constant
  • Example: If β1 = 2, then Sales (Y) is expected to
    increase by 2 for each 1 unit increase in
    Advertising (X1), holding fixed the number of
    Sales Reps at any particular level
  • The effect of more advertising is the same for
    any fixed number of sales reps



16
Interpretation of Estimated Coefficients
  • 1. Slope (βk)
  • Estimated Y Changes by βk for Each 1 Unit
    Increase in Xk Holding All Other Variables
    Constant
  • Example: If β1 = 2, then Sales (Y) Is Expected to
    Increase by 2 for Each 1 Unit Increase in
    Advertising (X1) Given the Number of Sales Reps
    (X2)
  • 2. Y-Intercept (β0)
  • Average Value of Y When Xk = 0




17
Parameter Estimation Example
  • You work in advertising for the New York Times.
    You want to find the effect of ad size (sq. in.)
    & newspaper circulation (000) on the number of ad
    responses (00).

You've collected the following data:
Resp  Size  Circ
  1     1     2
  4     8     8
  1     3     1
  3     5     7
  2     6     4
  4    10     6
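A minimal Stata sketch (do-file style) of how output like the next slide's can be reproduced from these six observations; the variable names resp, size, and circ match the output that follows:

* enter the ad-response data and fit the first-order model
clear
input resp size circ
1 1 2
4 8 8
1 3 1
3 5 7
2 6 4
4 10 6
end
regress resp size circ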
18
Parameter Estimation Computer Output
. regress resp size circ

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  2,     3) =   55.44
       Model |  9.24973638     2  4.62486819           Prob > F      =  0.0043
    Residual |   .25026362     3  .083421207           R-squared     =  0.9737
-------------+------------------------------           Adj R-squared =  0.9561
       Total |         9.5     5         1.9           Root MSE      =  .28883

------------------------------------------------------------------------------
        resp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        size |   .2049209   .0588218     3.48   0.040     .0177238     .392118
        circ |   .2804921   .0686017     4.09   0.026      .062171    .4988132
       _cons |   .0639719   .2598628     0.25   0.821    -.7630274    .8909712
------------------------------------------------------------------------------
19
Interpretation of Coefficients Solution
20
Interpretation of Coefficients Solution
  • 1. Slope (?1)
  • Responses to Ad Is Expected to Increase by
    .2049 (20.49) for Each 1 Sq. In. Increase in Ad
    Size Holding Circulation Constant

21
Interpretation of Coefficients Solution
  • 1. Slope (?1)
  • Responses to Ad Is Expected to Increase by
    .2049 (20.49) for Each 1 Sq. In. Increase in Ad
    Size Holding Circulation Constant
  • 2. Slope (?2)
  • Responses to Ad Is Expected to Increase by
    .2805 (28.05) for Each 1 Unit (1,000) Increase in
    Circulation Holding Ad Size Constant


22
Evaluating the Model
Expanded in Multiple Regression
23
Regression Modeling Steps
  • 1. Hypothesize Deterministic Component
  • 2. Estimate Unknown Model Parameters
  • 3. Specify Probability Distribution of Random
    Error Term
  • Estimate Standard Deviation of Error
  • 4. Evaluate Model
  • 5. Use Model for Prediction & Estimation

24
Evaluating Multiple Regression Model Steps
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

25
Evaluating Multiple Regression Model Steps
Expanded!
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

New!
New!
New!
26
Evaluating Multiple Regression Model Steps
Expanded!
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

New!
New!
New!
27
Variation Measures
28
Coefficient of Multiple Determination
  • Proportion of Variation in Y Explained by All X
    Variables Taken Together

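In symbols, using the sums of squares reported in the regression output:

R2 = SS(Model) / SSyy = 1 - SSE / SSyy

For the earlier ad-response model: 9.24973638 / 9.5 ≈ 0.9737, matching the R-squared in the printout.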
29
Check Your Understanding
  • If you add a variable to the model
  • How will that affect the R-squared value for the
    model?

30
Adjusted R2
  • R2 Never Decreases When a New X Variable Is Added
    to the Model
  • Only Y Values Determine SSyy
  • Disadvantage When Comparing Models
  • Solution: Adjusted R2
  • Each additional variable reduces adjusted R2,
    unless SSE falls enough to compensate

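The adjustment, for n observations and k independent variables, is:

Ra2 = 1 - [(n - 1) / (n - (k + 1))] (1 - R2)

For the ad-response model (n = 6, k = 2): 1 - (5/3)(1 - 0.9737) ≈ 0.9561, matching the Adj R-squared in the printout.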
31
Variance of Error
  • Assuming the model is correctly specified
  • Best (unbiased) estimator of σ2 is
    s2 = MSE = SSE / [n - (k + 1)]
  • Used in the formula for computing the standard
    error of a prediction
  • Exact formula is too complicated to show
  • But a higher value for s leads to a higher
    standard error (wider intervals)

32
Check Your Understanding
  • If you add a variable to the model
  • Exercise 12.5 How will that affect the estimate
    of standard deviation (of the error term)?

33
Individual Coefficients
34
T Statistics
. regress resp size circ

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  2,     3) =   55.44
       Model |  9.24973638     2  4.62486819           Prob > F      =  0.0043
    Residual |   .25026362     3  .083421207           R-squared     =  0.9737
-------------+------------------------------           Adj R-squared =  0.9561
       Total |         9.5     5         1.9           Root MSE      =  .28883

------------------------------------------------------------------------------
        resp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        size |   .2049209   .0588218     3.48   0.040     .0177238     .392118
        circ |   .2804921   .0686017     4.09   0.026      .062171    .4988132
       _cons |   .0639719   .2598628     0.25   0.821    -.7630274    .8909712
------------------------------------------------------------------------------
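Each t in this printout is the estimated coefficient divided by its standard error, referred to a t distribution with n - (k + 1) degrees of freedom:

t = (estimated βi) / (standard error of estimated βi)

For size: t = .2049209 / .0588218 ≈ 3.48, with 6 - (2 + 1) = 3 df.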
35
Exercise 12.7
  • n = 30
  • H0: β2 = 0 NOT rejected
  • H0: β3 = 0 rejected
  • Explain this result, even though the estimate of
    β2 is larger than the estimate of β3

36
Evaluating Multiple Regression Model Steps
Expanded!
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

New!
New!
New!
37
Testing Overall Significance
  • 1. Shows If There Is a Linear Relationship
    Between All X Variables Together & Y
  • 2. Uses F Test Statistic
  • 3. Hypotheses
  • H0: β1 = β2 = ... = βk = 0
  • No Linear Relationship
  • Ha: At Least One Coefficient Is Not 0
  • At Least One X Variable Affects Y

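The test statistic, with k numerator df and n - (k + 1) denominator df, is

F = MS(Model) / MSE = [R2 / k] / [(1 - R2) / (n - (k + 1))]

and H0 is rejected when F exceeds the critical F value (equivalently, when the P-value is below α).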
38
T Statistics
. regress resp size circ

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  2,     3) =   55.44
       Model |  9.24973638     2  4.62486819           Prob > F      =  0.0043
    Residual |   .25026362     3  .083421207           R-squared     =  0.9737
-------------+------------------------------           Adj R-squared =  0.9561
       Total |         9.5     5         1.9           Root MSE      =  .28883

------------------------------------------------------------------------------
        resp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        size |   .2049209   .0588218     3.48   0.040     .0177238     .392118
        circ |   .2804921   .0686017     4.09   0.026      .062171    .4988132
       _cons |   .0639719   .2598628     0.25   0.821    -.7630274    .8909712
------------------------------------------------------------------------------

Annotations: Model df = k = 2; Residual df = n - k - 1 = 3;
F = MS(Model) / MS(Error) = 55.44; P-value = Prob > F = 0.0043
39
Exercise 12.6
  • See minitab printout p. 678

40
Exercise 12.12
  • F-test for model is significant
  • Is the model the best available predictor for y?
  • Are all the terms in the model important for
    predicting y?
  • Or what does it mean?

41
Exercise 12.26
  • 18 variables
  • n = 20
  • R-squared = .95
  • Compute adjusted R-squared
  • Compute the F-statistic
  • Can you reject the null hypothesis that all
    coefficients = 0?

42
Exercise 12.28 Soln
  • 18 variables
  • n = 20
  • R-squared = .95

43
Exercise 12.28 Soln
  • k = 18, n = 20, R-squared = .95
  • Would need an F-value > 245.9 to reject the null
    hypothesis!

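A small Stata sketch of the arithmetic behind this solution (the scalar names are illustrative, not from the exercise):

* adjusted R-squared and global F for k = 18, n = 20, R-squared = .95
scalar k  = 18
scalar n  = 20
scalar R2 = .95
scalar R2adj = 1 - (1 - R2)*(n - 1)/(n - k - 1)
scalar Fstat = (R2/k) / ((1 - R2)/(n - k - 1))
display "Adj R2 = " R2adj "  F = " Fstat "  critical F(.05) = " invFtail(k, n - k - 1, .05)

Because the computed F is only about 1.06 while the critical value is in the hundreds, H0 cannot be rejected even though R-squared = .95.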
44
Exercise 12.143
  • Model salary based on gender
  • Other variables included
  • Race
  • Education level
  • Tenure in firm
  • Number of hours/week worked
  • e. Why would one want to adjust/control for these
    other factors when testing for gender
    discrimination?

45
GFCLOCKS Dataset
  • Dependent variable
  • Auction price
  • Independent variables
  • Age
  • Number of bidders

46
Simple Linear Model (compare to Minitab p. 686)
. regress price age numbids

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  2,    29) =  120.19
       Model |  4283062.96     2  2141531.48           Prob > F      =  0.0000
    Residual |   516726.54    29  17818.1565           R-squared     =  0.8923
-------------+------------------------------           Adj R-squared =  0.8849
       Total |   4799789.5    31  154831.919           Root MSE      =  133.48

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   12.74057   .9047403    14.08   0.000     10.89017    14.59098
     numbids |   85.95298   8.728523     9.85   0.000     68.10115    103.8048
       _cons |  -1338.951   173.8095    -7.70   0.000    -1694.432   -983.4711
------------------------------------------------------------------------------
47
Types of Regression Models
48
Models With a Single Quantitative Variable
49
Types of Regression Models
50
First-Order Model With 1 Independent Variable
51
First-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Is Linear

52
First-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Is Linear
  • 2. Used When Expected Rate of Change in Y Per
    Unit Change in X Is Stable

53
First-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Is Linear
  • 2. Used When Expected Rate of Change in Y Per
    Unit Change in X Is Stable
  • 3. Used With Curvilinear Relationships If
    Relevant Range Is Linear

54
First-Order Model Relationships
Two panels: Y vs. X straight-line relationships, one with slope β1 > 0 (increasing), one with β1 < 0 (decreasing)
55
First-Order Model Worksheet
Run regression with Y, X1
56
(No Transcript)
57
GFClocks Revisited
. regress price numbids

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =    5.55
       Model |  749662.281     1  749662.281           Prob > F      =  0.0252
    Residual |  4050127.22    30  135004.241           R-squared     =  0.1562
-------------+------------------------------           Adj R-squared =  0.1281
       Total |   4799789.5    31  154831.919           Root MSE      =  367.43

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     numbids |   54.76335   23.23972     2.36   0.025      7.30151    102.2252
       _cons |   804.9119   230.8305     3.49   0.002     333.4931    1276.331
------------------------------------------------------------------------------
58
Types of Regression Models
59
Second-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Is a Quadratic Function
  • 2. Useful 1st Model If Non-Linear Relationship
    Suspected

60
Second-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Is a Quadratic Function
  • 2. Useful 1st Model If Non-Linear Relationship
    Suspected
  • 3. Model: E(Y) = β0 + β1x + β2x²
    (β1x = linear effect; β2x² = curvilinear effect)
61
Second-Order Model Relationships
Four panels: parabolas opening upward (β2 > 0) and opening downward (β2 < 0)
62
Second-Order Model Worksheet
Create an X1² column. Run regression with Y, X1,
X1².
63
GFClocks revisited
. gen bidssq = numbids*numbids
. regress price numbids bidssq

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  2,    29) =    4.30
       Model |  1098297.13     2  549148.563           Prob > F      =  0.0231
    Residual |  3701492.37    29  127637.668           R-squared     =  0.2288
-------------+------------------------------           Adj R-squared =  0.1756
       Total |   4799789.5    31  154831.919           Root MSE      =  357.26

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     numbids |   326.1043   165.7274     1.97   0.059    -12.84632    665.0548
      bidssq |  -13.44477   8.134995    -1.65   0.109     -30.0827    3.193167
       _cons |  -454.8959   794.6254    -0.57   0.571    -2080.087    1170.295
------------------------------------------------------------------------------
64
Exercise 12.53, p. 705
  • Graph the equations
  • What effect does the 2x term have on the graphs?
  • What effect does the x² term have on the graphs?

65
Exercise 12.53, p. 705
66
Exercise 12.55, p. 706
  • Plot scattergram
  • If you only had data for x < 33, what kind of
    model would you suggest?
  • If only x > 33?
  • If all the data?

67
Types of Regression Models
68
Third-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Has a Wave
  • 2. Used If 1 Reversal in Curvature

69
Third-Order Model With 1 Independent Variable
  • 1. Relationship Between 1 Dependent & 1
    Independent Variable Has a Wave
  • 2. Used If 1 Reversal in Curvature
  • 3. Model: E(Y) = β0 + β1x + β2x² + β3x³
    (β1x = linear effect; β2x², β3x³ = curvilinear effects)
70
Third-Order Model Relationships
Two panels: cubic (wave-shaped) curves, one with β3 > 0, one with β3 < 0
71
Third-Order Model Worksheet
Multiply X1 by X1 to get X1². Multiply X1 by X1
by X1 to get X1³. Run regression with Y, X1,
X1², X1³.
72
Models With Two or More Quantitative Variables
73
Types of Regression Models
74
First-Order Model With 2 Independent Variables
  • 1. Relationship Between 1 Dependent & 2
    Independent Variables Is a Linear Function
  • 2. Assumes No Interaction Between X1 & X2
  • Effect of X1 on E(Y) Is the Same Regardless of X2
    Values

75
First-Order Model With 2 Independent Variables
  • 1. Relationship Between 1 Dependent & 2
    Independent Variables Is a Linear Function
  • 2. Assumes No Interaction Between X1 & X2
  • Effect of X1 on E(Y) Is the Same Regardless of X2
    Values
  • 3. Model: E(Y) = β0 + β1X1 + β2X2

76
No Interaction
77
No Interaction
E(Y) = 1 + 2X1 + 3X2
Graph of E(Y) (0 to 12) vs. X1 (0 to 1.5): parallel lines for fixed values of X2
E(Y) = 1 + 2X1 + 3(0) = 1 + 2X1
E(Y) = 1 + 2X1 + 3(1) = 4 + 2X1
E(Y) = 1 + 2X1 + 3(2) = 7 + 2X1
E(Y) = 1 + 2X1 + 3(3) = 10 + 2X1
Effect (slope) of X1 on E(Y) does not depend on
X2 value
83
First-Order Model Relationships
84
First-Order Model Worksheet
Run regression with Y, X1, X2
85
Types of Regression Models
86
Interaction Model With 2 Independent Variables
  • 1. Hypothesizes Interaction Between Pairs of X
    Variables
  • Response to One X Variable Varies at Different
    Levels of Another X Variable

87
Interaction Model With 2 Independent Variables
  • 1. Hypothesizes Interaction Between Pairs of X
    Variables
  • Response to One X Variable Varies at Different
    Levels of Another X Variable
  • 2. Contains Two-Way Cross Product Terms

88
Interaction Model With 2 Independent Variables
  • 1. Hypothesizes Interaction Between Pairs of X
    Variables
  • Response to One X Variable Varies at Different
    Levels of Another X Variable
  • 2. Contains Two-Way Cross Product Terms
  • 3. Can Be Combined With Other Models
  • Example Dummy-Variable Model

89
Effect of Interaction
90
Effect of Interaction
  • 1. Given: E(Y) = β0 + β1X1 + β2X2 + β3X1X2

91
Effect of Interaction
  • 1. Given: E(Y) = β0 + β1X1 + β2X2 + β3X1X2
  • 2. Without Interaction Term, Effect of X1 on Y Is
    Measured by β1

92
Effect of Interaction
  • 1. Given: E(Y) = β0 + β1X1 + β2X2 + β3X1X2
  • 2. Without Interaction Term, Effect of X1 on Y Is
    Measured by β1
  • 3. With Interaction Term, Effect of X1 on Y Is
    Measured by β1 + β3X2
  • Effect Increases As X2 Increases

93
Interaction Model Relationships
94
Interaction Model Relationships
E(Y) = 1 + 2X1 + 3X2 + 4X1X2
Graph of E(Y) (0 to 12) vs. X1 (0 to 1.5): lines for fixed values of X2
E(Y) = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
E(Y) = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
Effect (slope) of X1 on E(Y) does depend on X2
value
98
Interaction Model Worksheet
Multiply X1 by X2 to get X1X2. Run regression
with Y, X1, X2 , X1X2
99
GFClocks Revisited (compare Minitab p. 693)
. regress price age numbids agebid

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  3,    28) =  193.04
       Model |  4578427.37     3  1526142.46           Prob > F      =  0.0000
    Residual |  221362.133    28  7905.79047           R-squared     =  0.9539
-------------+------------------------------           Adj R-squared =  0.9489
       Total |   4799789.5    31  154831.919           Root MSE      =  88.915

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .8781425   2.032156     0.43   0.669     -3.28454    5.040825
     numbids |  -93.26482   29.89162    -3.12   0.004     -154.495   -32.03463
      agebid |   1.297846   .2123326     6.11   0.000     .8629022    1.732789
       _cons |    320.458   295.1413     1.09   0.287    -284.1115    925.0275
------------------------------------------------------------------------------
100
Exercise 12.41
  • Minitab printout p.695
  • What is the prediction equation?
  • Describe the geometric form of the response
    surface
  • Plot for x2 = 1, 3, 5
  • Explain what it means for x1 and x2 to interact
  • Specify the null hypothesis for a test of interaction
  • Conduct the test with alpha = .01

101
Exercise 12.43a
  • p. 695
  • Y frequency of alcohol consumption
  • X1 personal attitude toward drinking
  • X2 social support (?for drinking?)
  • Interpret X1X2 interaction

102
Types of Regression Models
103
Second-Order Model With 2 Independent Variables
  • 1. Relationship Between 1 Dependent & 2 or More
    Independent Variables Is a Quadratic Function
  • 2. Useful 1st Model If Non-Linear Relationship
    Suspected

104
Second-Order Model With 2 Independent Variables
  • 1. Relationship Between 1 Dependent & 2 or More
    Independent Variables Is a Quadratic Function
  • 2. Useful 1st Model If Non-Linear Relationship
    Suspected
  • 3. Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²

105
Second-Order Model Relationships
Three response surfaces, for β4 + β5 > 0, β4 + β5 < 0, and β3² > 4β4β5
106
Second-Order Model Worksheet
Multiply X1 by X2 to get X1X2; then create X1², X2².
Run regression with Y, X1, X2, X1X2, X1², X2².
107
Stata Code
gen x1x2 = x1*x2
gen x1sq = x1*x1
gen x2sq = x2*x2
regress y x1 x2 x1x2 x1sq x2sq
108
Models With One Qualitative Independent Variable
109
Types of Regression Models
110
Dummy-Variable Model
  • Involves Categorical X Variable With 2 (or More)
    Levels
  • e.g., Male-Female; College-No College
  • If 2 Levels, One Dummy Variable
  • Coded 0 & 1
  • May Be Combined With Quantitative Variable (1st
    Order or 2nd Order Model)

111
Dummy-Variable Model Worksheet
X2 levels: 0 = Group 1, 1 = Group 2. Run
regression with Y, X1, X2
112
Interpreting Dummy-Variable Model Equation
113
Interpreting Dummy-Variable Model Equation
  • Given: E(Y) = β0 + β1X1 + β2X2
  • Y = Starting salary of college grads
  • X1 = GPA
  • X2 = 0 if Male, 1 if Female
  • Males (X2 = 0): E(Y) = β0 + β1X1 + β2(0) = β0 + β1X1
  • Females (X2 = 1): E(Y) = β0 + β1X1 + β2(1) = (β0 + β2) + β1X1
  • Same slopes (β1)
116
Dummy-Variable Model Relationships
Graph of Y vs. X1: two parallel lines with common slope β1; Males line has intercept β0, Females line has intercept β0 + β2
117
Dummy-Variable Model Example
118
Dummy-Variable Model Example
  • Computer Output: Ŷ = 3 + 5X1 + 7X2
  • X2 = 0 if Male, 1 if Female
  • Males (X2 = 0): Ŷ = 3 + 5X1 + 7(0) = 3 + 5X1
  • Females (X2 = 1): Ŷ = 3 + 5X1 + 7(1) = 10 + 5X1
  • Same slopes
121
Dummies for More than Two Levels
  • Categorical variable X with k levels
  • Choose one level as base
  • The left-out value
  • Generate a dummy variable for each other level i
  • Xi = 1 if observation is at level i, otherwise Xi = 0
  • Interpret coefficient on Xi
  • Impact of a move from the base level to level i

122
GOLFCRD (p.711)
insheet using golfcrd.txt
rename v1 brand
rename v2 distance
gen b = 1 if brand == "B"
replace b = 0 if brand != "B"
gen c = 1 if brand == "C"
replace c = 0 if brand != "C"
gen d = 1 if brand == "D"
replace d = 0 if brand != "D"
123
GOLFCRD (SPSS p. 712)
. regress distance b c d

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =   43.99
       Model |  2794.38913     3  931.463043           Prob > F      =  0.0000
    Residual |  762.300429    36  21.1750119           R-squared     =  0.7857
-------------+------------------------------           Adj R-squared =  0.7678
       Total |  3556.68956    39  91.1971681           Root MSE      =  4.6016

------------------------------------------------------------------------------
    distance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           b |      10.28   2.057912     5.00   0.000     6.106356    14.45363
           c |      19.17   2.057912     9.32   0.000     14.99636    23.34364
           d |  -1.460002   2.057912    -0.71   0.483    -5.633641    2.713637
       _cons |     250.78   1.455164   172.34   0.000     247.8288    253.7312
------------------------------------------------------------------------------
124
Exercise 12.67
  • p. 713
  • What is least squares equation?
  • Interpret the betas
  • Interpret the null hypothesis β1 = β2 = 0 in
    terms of the μ values for the different levels
  • Conduct the hypothesis test from c.

125
Exercise 12.80
  • P. 723

126
Exercise 12.79
  • p. 723

127
Dummy Interactions
  • Quantitative Variable X1
  • Dummy Variable X2
  • Model with interaction term (worked out below)
  • What is the slope of X1 when X2 is 0?
  • What is the slope of X1 when X2 is 1?
  • So β3 gives X2's impact on the slope of X1

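Worked out (this is the standard quantitative-plus-dummy interaction model; the equation is not printed on the slide):

E(Y) = β0 + β1X1 + β2X2 + β3X1X2
When X2 = 0: E(Y) = β0 + β1X1, so the slope of X1 is β1
When X2 = 1: E(Y) = (β0 + β2) + (β1 + β3)X1, so the slope of X1 is β1 + β3

So β3 is the difference in the X1 slope between the two groups.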
128
Testing Model Portions
129
Testing Model Portions
  • 1. Tests the Contribution of a Set of X Variables
    to the Relationship With Y
  • 2. Null Hypothesis H0: βg+1 = βg+2 = ... = βk = 0
  • Variables in Set Do Not Improve Significantly the
    Model When All Other Variables Are Included
  • 3. Used in Selecting X Variables or Models
  • Part of Most Computer Programs

130
F-Test for Nested Models
  • Numerator
  • Reduction in SSE from additional parameters
  • df = k - g = number of additional parameters
  • Denominator
  • SSE of complete model
  • df = n - (k+1) = error df

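The corresponding test statistic, comparing a reduced model with g terms to a complete model with k terms, is

F = [(SSE_R - SSE_C) / (k - g)] / [SSE_C / (n - (k + 1))]

with k - g numerator df and n - (k + 1) denominator df. In Stata this partial F test can also be obtained after fitting the complete model with the test command, e.g. test x3 x4 for added terms x3 and x4 (variable names illustrative).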
131
Exercise 12.89
  • Which of these models is nested?
  • p. 732

132
Exercise 12.93
  • Page 732

133
Exercise 12.90
  • Why is the F-test a one-tailed, upper-tailed test?

134
Selecting Variables in Model Building
135
Selecting Variables in Model Building
A Butterfly Flaps its Wings in Japan, Which
Causes It to Rain in Nebraska. -- Anonymous
Use Theory Only!
Use Computer Search!
136
Model Building with Computer Searches
  • 1. Rule: Use as Few X Variables As Possible
  • 2. Stepwise Regression
  • Computer Selects X Variable Most Highly
    Correlated With Y
  • Continues to Add or Remove Variables Depending on
    SSE
  • 3. Best Subset Approach
  • Computer Examines All Possible Sets

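For illustration only, Stata's stepwise prefix runs such a search; this sketch assumes the clock-auction variables from the earlier slides (including the agebid interaction) are in memory, and the 0.2 removal threshold is arbitrary:

* backward elimination: remove terms whose p-value exceeds 0.2
stepwise, pr(.2): regress price age numbids agebid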
137
Should You Do It?
  • It's quite problematic
  • You've run a large number of tests, so the
    probability of at least one error is high
  • P-values too low, confidence intervals too narrow
  • Gives biased estimates of coefficients for
    variables not dropped
  • See http://www.stata.com/support/faqs/stat/stepwise.html
  • But it's commonly done

138
Residual Analysis
139
Evaluating Multiple Regression Model Steps
Expanded!
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

New!
New!
New!
140
Residual Analysis
  • 1. Graphical Analysis of Residuals
  • Plot Estimated Errors vs. Xi Values
  • Difference Between Actual Yi & Predicted Yi
  • Estimated Errors Are Called Residuals
  • Plot Histogram or Stem-&-Leaf of Residuals
  • 2. Purposes
  • Examine Functional Form (Linear vs. Non-Linear
    Model)
  • Evaluate Violations of Assumptions

141
Linear Regression Assumptions
  • 1. Mean of Probability Distribution of Error Is 0
  • 2. Probability Distribution of Error Has Constant
    Variance
  • 3. Probability Distribution of Error is Normal
  • 4. Errors Are Independent

142
Residual Plot for Functional Form
Two residual plots vs. X: a curved pattern suggests adding an X² term; a random scatter around 0 indicates correct specification
143
Residual Plot for Constant Variance
Two residual plots: a fan-shaped pattern indicates unequal variance; a uniform band indicates correct specification.
Standardized residuals (residual divided by the standard error of prediction) are typically used.
144
Residual Plot for Independence
Two residual plots: a systematic pattern indicates the errors are not independent; a random scatter indicates correct specification
145
GFCLOCKS again
. regress price age numbids

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  2,    29) =  120.19
       Model |  4283062.96     2  2141531.48           Prob > F      =  0.0000
    Residual |   516726.54    29  17818.1565           R-squared     =  0.8923
-------------+------------------------------           Adj R-squared =  0.8849
       Total |   4799789.5    31  154831.919           Root MSE      =  133.48

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   12.74057   .9047403    14.08   0.000     10.89017    14.59098
     numbids |   85.95298   8.728523     9.85   0.000     68.10115    103.8048
       _cons |  -1338.951   173.8095    -7.70   0.000    -1694.432   -983.4711
------------------------------------------------------------------------------
146
Calculating Residuals
. predict yhat, xb
. predict ehat, residuals
. predict stdehat, rstandard
. predict stderr, stdr
. list price yhat ehat stderr stdehat in 1/5

     +---------------------------------------------------+
     | price       yhat        ehat     stderr   stdehat |
     |---------------------------------------------------|
  1. |  1235    1396.49   -161.4904   127.7914 -1.263703 |
  2. |  1080   1157.651   -77.65049   127.9045 -.6070973 |
  3. |   845   880.7725   -35.77246   127.7805 -.2799525 |
  4. |  1522   1345.712    176.2884   131.2617   1.34303 |
  5. |  1047   1164.296   -117.2961   127.9363 -.9168323 |
     +---------------------------------------------------+
147
Plot Residuals Against Each X
  • . scatter ehat age
  • . scatter ehat numbids

148
Plot Standardized Residuals
  • . scatter stdehat age
  • . scatter stdehat numbids

149
Regression Pitfalls
150
Evaluating Multiple Regression Model Steps
Expanded!
  • 1. Examine Variation Measures
  • 2. Do Residual Analysis
  • 3. Test Parameter Significance
  • Overall Model
  • Individual Coefficients
  • 4. Test for Multicollinearity

New!
New!
New!
151
Multicollinearity
  • 1. High Correlation Between X Variables
  • 2. Coefficients Measure Combined Effect
  • 3. Leads to Unstable Coefficients Depending on X
    Variables in Model
  • 4. Always Exists -- Matter of Degree
  • 5. Example: Using Both Age & Height as
    Explanatory Variables in the Same Model

152
Detecting Multicollinearity
  • 1. Examine Correlation Matrix
  • Correlations Between Pairs of X Variables Are
    Greater than Their Correlations With the Y Variable
  • 2. Examine Variance Inflation Factor (VIF)
  • If VIFj > 5 (or 10, according to the text),
    Multicollinearity Is a Problem
  • 3. Few Remedies
  • Obtain New Sample Data
  • Eliminate One Correlated X Variable

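In Stata, for example, VIFs can be listed immediately after fitting the model (a minimal sketch using the clock data from earlier slides):

. regress price age numbids
. estat vif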
153
Correlation Matrix Computer Output
. corr price age numbids
(obs=32)

             |    price      age  numbids
-------------+---------------------------
       price |   1.0000
         age |   0.7296   1.0000
     numbids |   0.3952  -0.2537   1.0000
But only correlation among independent variables
matters
154
Extrapolation
Graph of Y vs. X: predictions within the relevant range of the observed X values are interpolation; predictions outside the relevant range are extrapolation
155
Cause & Effect
Scatterplot: Liquor Consumption vs. Teachers (the two are correlated, but correlation does not establish causation)
156
Exercise 12.116
  • p. 764: what's wrong in each of the residual plots?

157
Exercise 12.153
  • p. 776
  • Analyze FLAG dataset
  • Any multicollinearity?
  • Test regression model with interaction term
  • Conduct residual analysis
  • Good exercise, but we won't have time in class

158
Conclusion
  • 1. Explained the Linear Multiple Regression Model
  • 2. Tested Overall Significance
  • 3. Described Various Types of Models
  • 4. Evaluated Portions of a Regression Model
  • 5. Interpreted Linear Multiple Regression
    Computer Output
  • 6. Described Stepwise Regression
  • 7. Explained Residual Analysis
  • 8. Described Regression Pitfalls

159
End of Chapter
Any blank slides that follow are blank
intentionally.