Transcript and Presenter's Notes

Title: Regression


1
KNN Ch7
  • Multiple Regression II
  • SS Decomposition
  • Tests of significance
  • Multicollinearity

2
Extra Sum of Squares (ESS)
  • The marginal reduction in SSE when one or several
    predictor variables are added to the regression
    model, given that the other variables are already
    in the model (see the notation sketch after this
    list).
  • In what other, equivalent manner can you state
    the above?
  • The word "Extra" is used since we would like to
    know the marginal (or extra) contribution of a
    variable, or a set of variables, when it is added
    to the regression model as an explanatory
    variable.
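A minimal notation sketch, assuming the usual KNN definitions (the slide's own formulas were not reproduced in the transcript): for X1 added to a model that already contains X2,

    SSR(X_1 \mid X_2) = SSE(X_2) - SSE(X_1, X_2) = SSR(X_1, X_2) - SSR(X_2)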

3
Decomposition of SSR into ESS
  • A pictorial representation is also possible. See
    page 261, Fig. 7.1 of KNN

[Figure, cf. KNN Fig. 7.1, p. 261: bar-chart decomposition of SSTO, with
labels SSR(X2), SSR(X1, X2), SSR(X1 | X2), SSE(X2), and SSE(X1, X2).]
4
Decomposition of SSR into ESS
  • For two or three explanatory variables the
    formulae are quite easy.
  • The two-variable and three-variable
    decompositions are shown in the sketch after this
    list.
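A hedged reconstruction of the formulas the slide most likely displayed (the standard KNN decompositions):

    SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 \mid X_1), \qquad
    SSTO = SSR(X_1) + SSR(X_2 \mid X_1) + SSE(X_1, X_2)

    SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2), \qquad
    SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)

The three marginal notes that follow annotate the last identity.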

Considering X3 adjusted for X1 and X2 as the
predictor, this would be the SSR.
Considering Y adjusted for X1 and X2 as the
response variable, and X3 adjusted for X1 and X2
as the predictor, this would be the SSE.
Considering Y adjusted for X1 and X2 as the
response variable, this would be the SSTO.
5
Decomposition of SSR into ESS
  • Note that with three variables we may also
    decompose SSR in a different order, e.g.
    SSR(X1, X2, X3) = SSR(X3) + SSR(X2 | X3) +
    SSR(X1 | X2, X3).
  • To test the hypothesis H0: β3 = 0 vs. Ha: β3 ≠ 0,
    the test statistic is given in the sketch after
    this list.
  • To test (say) H0: β2 = β3 = 0 vs. Ha: not both β2
    and β3 are zero, the test statistic is also given
    there.
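A sketch of the two test statistics, assuming the standard extra-sum-of-squares F tests for a three-predictor model (the slide's own formulas were not reproduced):

    F^* = \frac{SSR(X_3 \mid X_1, X_2)/1}{SSE(X_1, X_2, X_3)/(n-4)}
    \qquad\text{and}\qquad
    F^* = \frac{SSR(X_2, X_3 \mid X_1)/2}{SSE(X_1, X_2, X_3)/(n-4)}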

6
Decomposition of SSR into ESS
  • In general, however, we can write the extra sum
    of squares entirely in terms of error sums of
    squares: for any sets of predictors A and B,
    SSR(B | A) = SSE(A) - SSE(A, B).
  • This form is very convenient to use since we do
    not have to keep track of the individual
    regression sums of squares.
  • Also, this form will minimize any errors due to
    subtraction when calculating the SSRs.
  • On the next slide we see the ANOVA table with the
    decomposition of SSR for three variables.

7
The ANOVA Table
Source of variation    Sum of squares        df     Mean Squares
Regression             SSR(X1, X2, X3)        3     MSR(X1, X2, X3)
  X1                   SSR(X1)                1     MSR(X1)
  X2 | X1              SSR(X2 | X1)           1     MSR(X2 | X1)
  X3 | X1, X2          SSR(X3 | X1, X2)       1     MSR(X3 | X1, X2)
Error                  SSE(X1, X2, X3)       n-4    MSE(X1, X2, X3)
Total                  SSTO                  n-1
8
Another ANOVA Table (what's the difference?)
Source of variation    Sum of squares        df     Mean Squares
Regression             SSR(X1, X2, X3)        3     MSR(X1, X2, X3)
  X3                   SSR(X3)                1     MSR(X3)
  X2 | X3              SSR(X2 | X3)           1     MSR(X2 | X3)
  X1 | X2, X3          SSR(X1 | X2, X3)       1     MSR(X1 | X2, X3)
Error                  SSE(X1, X2, X3)       n-4    MSE(X1, X2, X3)
Total                  SSTO                  n-1
9
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF    SS             MS            F          P
Regression       3    9833046236     3277682079    1009.04    0.000
Error          137    445017478      3248303
Total          140    10278063714

Source  DF  Seq SS
X1       1  80601012
X2       1  9745311037
X3       1  7134188

Source  DF  Seq SS
X3       1  9733071257
X2       1  61498868
X1       1  38476111
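A minimal computational sketch of how output like the above (sequential, or Type I, sums of squares in two entry orders) can be reproduced with statsmodels. The data below are hypothetical stand-ins, not the slide's data set.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data with the same shape as the slide's example (n = 141).
rng = np.random.default_rng(0)
n = 141
X1 = rng.normal(3000, 800, n)
X2 = rng.normal(1200, 300, n)
X3 = rng.normal(2500, 700, n)
Y = 236 - 0.2 * X1 + 9.1 * X2 - 0.33 * X3 + rng.normal(0, 1800, n)
df = pd.DataFrame({"Y": Y, "X1": X1, "X2": X2, "X3": X3})

full = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()
print(full.summary())          # coefficients, t-ratios, S, R-Sq
print(anova_lm(full, typ=1))   # sequential SS: SSR(X1), SSR(X2|X1), SSR(X3|X1,X2)

# Entering the predictors in the reverse order changes the sequential SS,
# exactly as in the second Seq SS table above.
print(anova_lm(smf.ols("Y ~ X3 + X2 + X1", data=df).fit(), typ=1))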
10
Test for a single βk = 0 in a general model
  •   Full model with all p-1 variables:
      Y = β0 + β1X1 + ... + βp-1Xp-1 + ε.
  • Compute SSE(F), with dfF = n - p.
  • Reduced model without Xk (drop the βkXk term).
  • Compute SSE(R), with dfR = n - p + 1; the test
    statistic is the general linear test F* shown in
    the sketch after this list.
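A sketch of the general linear test statistic assumed here (the slide's own formula was not reproduced):

    F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
        = \frac{SSR(X_k \mid \text{the other } X\text{'s})/1}{MSE(F)}

For a single coefficient, this F* equals the square of the usual t-ratio for bk in the full model.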

11
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF    SS             MS            F          P
Regression       3    9833046236     3277682079    1009.04    0.000
Error          137    445017478      3248303
Total          140    10278063714

The regression equation is
Y = 881 - 0.0918 X1 + 0.846 X3

Predictor      Coef       StDev      T       P
Constant       881.4      244.2      3.61    0.000
X1            -0.09185    0.06023   -1.52    0.130
X3             0.84614    0.01696   49.88    0.000

S = 1971    R-Sq = 94.8%    R-Sq(adj) = 94.7%

Analysis of Variance
Source          DF    SS             MS            F          P
Regression       2    9742103306     4871051653    1254.21    0.000
Error          138    535960409      3883771
Total          140    10278063714
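Worked from the two outputs above: dropping X2 gives SSE(R) = 535960409 with 138 df, while the full model has SSE(F) = 445017478 with 137 df, so

    F^* = \frac{(535960409 - 445017478)/1}{445017478/137}
        = \frac{90942931}{3248303} \approx 28.0

which matches the square of the t-ratio for X2 in the full model (5.29^2 ≈ 28.0) and far exceeds F(0.95; 1, 137) ≈ 3.9, so H0: β2 = 0 is rejected.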
12
Test for several βk = 0 in a general model
See (7.26), pg. 267 of KNN
  •   Full model with all variables.
  • Compute SSE(F), with dfF = n - p.
  • Reduced model without the vector of X's being
    tested.
  • Compute SSE(R), with dfR = n - p + q, where q is
    the number of coefficients set to zero.
  • OR, equivalently, express the test in terms of
    the coefficients of multiple determination, as in
    the sketch after this list.
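A sketch of the test statistic in both forms (the R² form is the standard equivalent of the general linear test; the slide's own formulas were not reproduced):

    F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
        = \frac{R^2_F - R^2_R}{df_R - df_F} \div \frac{1 - R^2_F}{df_F}

The second form follows from dividing the numerator and denominator of the first by SSTO.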

13
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF    SS             MS            F          P
Regression       3    9833046236     3277682079    1009.04    0.000
Error          137    445017478      3248303
Total          140    10278063714

The regression equation is
Y = 14 + 6.50 X2

Predictor      Coef       StDev      T       P
Constant       14.4       194.9      0.07    0.941
X2             6.4957     0.1225    53.05    0.000

S = 1866    R-Sq = 95.3%    R-Sq(adj) = 95.3%

Analysis of Variance
Source            DF    SS             MS            F          P
Regression         1    9794265737     9794265737    2813.99    0.000
Residual Error   139    483797978      3480561
Total            140    10278063714
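Worked from the two outputs above: to test H0: β1 = β3 = 0, the reduced model (X2 only) has SSE(R) = 483797978 with 139 df and the full model has SSE(F) = 445017478 with 137 df, so

    F^* = \frac{(483797978 - 445017478)/2}{445017478/137}
        = \frac{19390250}{3248303} \approx 5.97

which exceeds F(0.95; 2, 137), roughly 3.06, so H0 is rejected: X1 and X3 are not jointly dispensable.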
14
Test for βk = βq in a general model
  •   Full model with all variables.
  • Compute SSE(F).
  • Reduced model in which Xk and Xq share a common
    coefficient, i.e. they enter only through the sum
    (Xk + Xq).
  • Compute SSE(R); apply the same general linear
    test F* as before, with dfR - dfF = 1.

15
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF    SS             MS            F          P
Regression       3    9833046236     3277682079    1009.04    0.000
Error          137    445017478      3248303
Total          140    10278063714

The regression equation is
Y = 324 - 0.200 (X1+X3) + 8.09 X2

Predictor      Coef       StDev      T       P
Constant       324.2      208.7      1.55    0.123
(X1+X3)       -0.19971    0.05858   -3.41    0.001
X2             8.0891     0.4820    16.78    0.000

S = 1798    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF    SS             MS            F          P
Regression         2    9831847860     4915923930    1520.33    0.000
Residual Error   138    446215854      3233448
Total            140    10278063714
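Worked from the two outputs above: forcing a common coefficient on X1 and X3 gives SSE(R) = 446215854 with 138 df, versus SSE(F) = 445017478 with 137 df, so

    F^* = \frac{(446215854 - 445017478)/1}{445017478/137}
        = \frac{1198376}{3248303} \approx 0.37

well below F(0.95; 1, 137) ≈ 3.9, so we cannot reject H0: β1 = β3.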
16
Coefficients of Partial Determination
  •   Recall the definition of the coefficient of
    (multiple) determination,
    R² = SSR/SSTO = 1 - SSE/SSTO.
  • R-sq is the proportionate reduction in the
    variation of Y when the set of X variables is
    included in the model.
  • Now consider a coefficient of partial
    determination.
  • R-sq for a predictor, given the presence of a set
    of predictors in the model, measures the marginal
    contribution of that variable given that the
    others are already in the model.
  • A graphical representation of the strength of the
    relationship between Y and X1, adjusted for X2,
    is provided by partial regression plots (see HW6).

17
Coefficients of Partial Determination
  •   For a model with two independent variables,
    the coefficients of partial determination are
    given in the sketch after this list.
  • Interpret these.
  • Generalization is easy; for example,
    R²(Y3 | 12), R²(Y4 | 123), etc.
  • Is there an alternate interpretation of the above
    partial coefficients? What is, say, the
    corresponding coefficient of partial correlation?
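A sketch of the coefficients of partial determination, assuming the standard KNN definitions:

    R^2_{Y1\mid 2} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}, \qquad
    R^2_{Y2\mid 1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}, \qquad
    R^2_{Y3\mid 12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)}

Their signed square roots, e.g. r_{Y1|2} taken with the sign of the corresponding regression coefficient, are the coefficients of partial correlation.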

18
An Example
The regression equation is
Y = -4.9 + 1.12 X1

Predictor      Coef      StDev     T       P
Constant      -4.92      51.52    -0.10    0.924
X1             1.1209    0.9349    1.20    0.233

S = 87.46    R-Sq = 1.0%    R-Sq(adj) = 0.3%

Analysis of Variance
Source            DF    SS         MS        F       P
Regression         1    10995      10995     1.44    0.233
Residual Error   139    1063300    7650
Total            140    1074295

The regression equation is
Y = -6.17 + 0.144 X2

Predictor      Coef        StDev       T       P
Constant      -6.167       2.075      -2.97    0.003
X2             0.144481    0.002842   50.83    0.000

S = 19.86    R-Sq = 94.9%    R-Sq(adj) = 94.9%

Analysis of Variance
Source            DF    SS         MS         F          P
Regression         1    1019453    1019453    2583.84    0.000
Residual Error   139    54842      395
Total            140    1074295
19
Another Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF    SS             MS            F          P
Regression         3    9833046236     3277682079    1009.04    0.000
Residual Error   137    445017478      3248303
Total            140    10278063714

Source  DF  Seq SS
X1       1  80601012
X2       1  9745311037
X3       1  7134188

The regression equation is
Y = 408 - 0.173 X1 + 6.55 X2

Predictor      Coef       StDev      T       P
Constant       407.8      227.6      1.79    0.075
X1            -0.17253    0.05551   -3.11    0.002
X2             6.5506     0.1201    54.54    0.000

S = 1810    R-Sq = 95.6%    R-Sq(adj) = 95.5%

Analysis of Variance
Source            DF    SS             MS            F          P
Regression         2    9825912049     4912956024    1499.47    0.000
Residual Error   138    452151666      3276461
Total            140    10278063714
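Worked from the two outputs above, the coefficient of partial determination for X3 given X1 and X2 is

    R^2_{Y3\mid 12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)}
                    = \frac{452151666 - 445017478}{452151666}
                    = \frac{7134188}{452151666} \approx 0.016

so adding X3 removes only about 1.6% of the variation left unexplained by X1 and X2, even though the overall R-Sq of the full model is 95.7%.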
20
The Standardized Multiple Regression Model
21
The Standardized Multi. Regression Model
  • Why necessary?
  • - Round-off errors in normal equations
    calculations (especially when inverting a large
    X'X matrix. What is the size of this inverse for,
    say, Y = b0 + b1X1 + ... + b7X7?)
  • - Lack of comparability of coefficients in
    regression models
  • (differences in the units involved)
  • - Especially important in the presence of
    multicollinearity. The determinant of the X'X
    matrix is then close to zero.
  • OK. So we have a problem. How do we take care of
    it?
  • - The Correlation Transformation
  • - Centering: take the difference between
    each observation and the average, AND
  • - Scaling: divide the centered observation
    by the standard deviation of the variable.
  • You must have noticed that this is nothing but
    regular standardization. What's the twist? See
    the next slide.

22
The Standardized Multi. Regression Model
Standardization
Correlation Transformation
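The two transformations the slide contrasts, written out under the usual KNN conventions (the slide's own formulas were not reproduced):

    \text{Standardization:}\quad
    \frac{Y_i - \bar Y}{s_Y}, \qquad \frac{X_{ik} - \bar X_k}{s_k}

    \text{Correlation transformation:}\quad
    Y_i^{*} = \frac{1}{\sqrt{n-1}}\,\frac{Y_i - \bar Y}{s_Y}, \qquad
    X_{ik}^{*} = \frac{1}{\sqrt{n-1}}\,\frac{X_{ik} - \bar X_k}{s_k}

The extra 1/sqrt(n-1) factor is the twist: it makes the X'X matrix of the transformed predictors equal to their correlation matrix.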
23
The Standardized Multi. Regression Model
  • Once we have performed the Correlation
    Transformation, all that remains is to obtain the
    new regression parameters. The standardized
    regression model is given in the sketch after
    this list,
  • where the original parameters can be recovered
    from the back-transformation also shown there.
  • In matrix notation we have some interesting
    relationships.
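A sketch of those relationships, assuming the standard KNN treatment:

    \text{Standardized model:}\quad
    Y_i^{*} = \beta_1^{*} X_{i1}^{*} + \cdots + \beta_{p-1}^{*} X_{i,p-1}^{*} + \varepsilon_i^{*}

    \text{Back-transformation:}\quad
    b_k = \Big(\frac{s_Y}{s_k}\Big) b_k^{*}, \qquad
    b_0 = \bar Y - b_1 \bar X_1 - \cdots - b_{p-1} \bar X_{p-1}

    \text{Matrix relationships:}\quad
    (X^{*})'X^{*} = r_{XX}, \qquad (X^{*})'Y^{*} = r_{YX}

i.e. the normal equations of the transformed model involve only the correlation matrix of the X's and the vector of correlations between Y and each X.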

Is this surprising?
WHY?
24
An Example
Part of the original (unstandardized) data set
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor      Coef       StDev      T       P
Constant       236.1      254.5      0.93    0.355
X1            -0.20286    0.05894   -3.44    0.001
X2             9.090      1.718      5.29    0.000
X3            -0.3303     0.2229    -1.48    0.141

S = 1802    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF    SS             MS            F          P
Regression         3    9833046236     3277682079    1009.04    0.000
Residual Error   137    445017478      3248303
Total            140    10278063714
25
An Example (continued)
Standardized
and then Correlation Transformed
26
An Example (continued)
The regression equation is
Y = -0.00000 - 0.0660 X1 + 1.37 X2 - 0.381 X3

Predictor      Coef        StDev       T       P
Constant      -0.000000    0.001497   -0.00    1.000
X1            -0.06596     0.01916    -3.44    0.001
X2             1.3661      0.2582      5.29    0.000
X3            -0.3813      0.2573     -1.48    0.141

S = 0.01778    R-Sq = 95.7%    R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF    SS         MS         F          P
Regression         3    0.95670    0.31890    1009.04    0.000
Residual Error   137    0.04330    0.00032
Total            140    1.00000
  • Compared to the regression model obtained from
    the untransformed variables, what can we say
    about the two models? Is there a difference in
    predictive power, or is there a difference in
    ease of interpretation? (A small computational
    sketch follows.)
  • Why is b0 = 0? Just by chance?
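A minimal sketch, on hypothetical data (not the slide's data set), of the correlation transformation and of recovering the original slopes; it also shows why the fitted intercept of the transformed model is zero and why R-Sq is unchanged.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 141
X = rng.normal(size=(n, 3)) * [800, 300, 700] + [3000, 1200, 2500]
y = 236 - 0.2 * X[:, 0] + 9.1 * X[:, 1] - 0.33 * X[:, 2] + rng.normal(0, 1800, n)

def corr_transform(v):
    # centering and scaling, then dividing by sqrt(n - 1)
    return (v - v.mean(axis=0)) / v.std(axis=0, ddof=1) / np.sqrt(len(v) - 1)

Xs, ys = corr_transform(X), corr_transform(y)

fit_std = sm.OLS(ys, sm.add_constant(Xs)).fit()
print(fit_std.params[0])      # intercept: 0 up to round-off, because all variables are centered
print(fit_std.rsquared)       # same R-Sq as a fit on the untransformed variables

# Recover the original slopes: b_k = (s_Y / s_k) * b_k_star
b_star = fit_std.params[1:]
b = b_star * y.std(ddof=1) / X.std(axis=0, ddof=1)
print(b)                      # close to (-0.2, 9.1, -0.33)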

27
Multicollinearity
  • One of the assumptions of the OLS model is that
    the predictor variables are uncorrelated.
  • When this assumption is not satisfied,
    multicollinearity is said to exist. (Think about
    Venn diagrams for this.)
  • Note that multicollinearity is strictly a sample
    phenomenon.
  • We may try to avoid it by doing controlled
    experiments, but in most social sciences
    research this is very difficult to do.
  • Let us first consider the case of uncorrelated
    predictor variables, i.e., no multicollinearity.
  • - Usually occurs in controlled experiments.
  • - In this case the R2 between each pair of
    predictor variables is zero.
  • The ESS for each variable is then the same as the
    SSR obtained when Y is regressed on that variable
    alone.

28
An Example
The regression equation is
Y = -4.73 + 0.107 X1 + 3.75 X2

Predictor      Coef      StDev     T       P
Constant      -4.732     4.428    -1.07    0.334
X1             0.1071    0.3537    0.30    0.774
X2             3.750     1.621     2.31    0.069

S = 2.292    R-Sq = 52.1%    R-Sq(adj) = 33.0%

Analysis of Variance
Source            DF    SS        MS        F       P
Regression         2    28.607    14.304    2.72    0.159
Residual Error     5    26.268    5.254
Total              7    54.875

Source  DF  Seq SS
X1       1  0.482
X2       1  28.125

Source  DF  Seq SS
X2       1  28.125
X1       1  0.482
X1 X2 Y
1 2 1
2 2 5
3 3 7
4 3 8
5 3 4
6 3 9
7 2 5
8 2 2
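A small check of the slide's point, using the eight observations in the table above: X1 and X2 are uncorrelated in this sample, so each variable's sequential SS does not depend on the order in which the predictors enter the model.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "X1": [1, 2, 3, 4, 5, 6, 7, 8],
    "X2": [2, 2, 3, 3, 3, 3, 2, 2],
    "Y":  [1, 5, 7, 8, 4, 9, 5, 2],
})

print(np.corrcoef(df.X1, df.X2)[0, 1])                          # 0.0
print(anova_lm(smf.ols("Y ~ X1 + X2", data=df).fit(), typ=1))   # X1: 0.482, X2: 28.125
print(anova_lm(smf.ols("Y ~ X2 + X1", data=df).fit(), typ=1))   # X2: 28.125, X1: 0.482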
29
An Example (continued)
The regression equation is
Y = 4.64 + 0.107 X1

Predictor      Coef      StDev     T       P
Constant       4.643     2.346     1.98    0.095
X1             0.1071    0.4646    0.23    0.825

S = 3.011    R-Sq = 0.9%    R-Sq(adj) = 0.0%

Analysis of Variance
Source            DF    SS        MS       F       P
Regression         1    0.482     0.482    0.05    0.825
Residual Error     6    54.393    9.065
Total              7    54.875

The regression equation is
Y = -4.25 + 3.75 X2

Predictor      Coef      StDev     T       P
Constant      -4.250     3.807    -1.12    0.307
X2             3.750     1.493     2.51    0.046

S = 2.111    R-Sq = 51.3%    R-Sq(adj) = 43.1%

Analysis of Variance
Source            DF    SS        MS        F       P
Regression         1    28.125    28.125    6.31    0.046
Residual Error     6    26.750    4.458
Total              7    54.875

Source  DF  Seq SS
X1       1  0.482
X2       1  28.125

Source  DF  Seq SS
X2       1  28.125
X1       1  0.482
(From the previous slide)
30
Multicollinearity (Effects of)
  • The regression coefficient of any independent
    variable cannot be interpreted as usual. One has
    to take into account which other correlated
    variables are included in the model.
  • The predictive ability of the overall model is
    usually unaffected.
  • The ESS are usually reduced to a great extent.
  • The variability of the OLS regression parameter
    estimates is inflated.
  • (Let us see an intuitive reason for this, based
    on a model with p - 1 = 2; see the sketch after
    this list.)
  • Note that in that model the standardized
    regression coefficients have equal standard
    deviations. Will this be the case even when
    p - 1 = 3? Or is this just a special-case
    scenario?
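A sketch of the intuition for the two-predictor (p - 1 = 2) case, using the correlation-transformed model (a standard result, not reproduced on the slide): the variance-covariance matrix of the standardized coefficients is (σ*)²[(X*)'X*]⁻¹, and with two predictors

    \sigma^2\{b_1^{*}\} = \sigma^2\{b_2^{*}\} = \frac{(\sigma^{*})^2}{1 - r_{12}^2}

so both standardized coefficients have the same standard deviation, and that standard deviation blows up as |r12| approaches 1. The factor 1/(1 - r12²) is the variance inflation factor for this case.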

31
Multicollinearity (Effects of)
  • High R2, but few significant t-ratios.
  • (By now you should be able to guess the reason
    for this.)
  • Wider individual confidence intervals for the
    regression parameters (this is obvious based on
    what we discussed on the earlier slide).
  • E.g., what would you conclude based on the above
    picture?

32
Multicollinearity (How to detect it?)
  • High R2 (> 0.8), but few significant t-ratios.
  • Caveat: there is a particular situation in which
    the above is caused without any multicollinearity.
    Thankfully this situation never arises in
    practice.
  • High pair-wise correlation (> 0.8) between
    independent variables.
  • Caveat: this is a sufficient, but not a
    necessary, condition. For example, consider the
    case where r(X1,X2) = 0.5, r(X1,X3) = 0.5 and
    r(X2,X3) = -0.5. We may conclude there is no
    multicollinearity. However, we find that R2 = 1
    when we regress X1 on X2 and X3 together. This
    means that X1 is a perfect linear combination of
    the two other independent variables. The formula
    for this R2 is given in the sketch after this
    list, and one can readily verify that the numbers
    satisfy it.
  • Due to the above caveat, always examine the
    partial correlation coefficients.
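The formula referred to above is presumably the standard expression for the R² of X1 regressed on X2 and X3; the slide's numbers do satisfy it:

    R^2_{1.23} = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12} r_{13} r_{23}}{1 - r_{23}^2}
               = \frac{0.25 + 0.25 - 2(0.5)(0.5)(-0.5)}{1 - 0.25}
               = \frac{0.75}{0.75} = 1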

33
Multicollinearity (How to detect it?)
  • Run auxiliary regressions, i.e., regress each of
    the independent variables on the other
    independent variables taken together, and
    conclude whether it is correlated with the others
    based on the R2 (a computational sketch follows
    this list).
  • The Condition Index (CI): the square root of the
    ratio of the largest to the smallest eigenvalue
    of the (scaled) X'X matrix.
  • If CI is roughly between 10 and 30: moderate to
    strong multicollinearity.
  • CI > 30 means severe multicollinearity.
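A minimal sketch of these diagnostics (hypothetical design matrix X with deliberately collinear columns; the variable names and thresholds printed are illustrative, not from the slide):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=200), rng.normal(size=200)])

# Auxiliary regressions: R-sq of each predictor on the remaining predictors.
for k in range(X.shape[1]):
    others = np.delete(X, k, axis=1)
    r2 = sm.OLS(X[:, k], sm.add_constant(others)).fit().rsquared
    print(f"X{k+1} on the rest: R-sq = {r2:.3f}, VIF = {1/(1-r2):.1f}")

# VIFs directly (include a constant column for this helper).
Xc = sm.add_constant(X)
print([variance_inflation_factor(Xc, j) for j in range(1, Xc.shape[1])])

# Condition index: sqrt(largest/smallest eigenvalue) of the scaled X'X.
Xs = (X - X.mean(axis=0)) / np.linalg.norm(X - X.mean(axis=0), axis=0)
eig = np.linalg.eigvalsh(Xs.T @ Xs)
print("condition index:", np.sqrt(eig.max() / eig.min()))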

34
Multicollinearity (What is the remedy?)
  • Rely on joint confidence intervals rather than
    individual ones.
  • A priori information about the relationship
    between some independent variables? Then include
    it!
  • For example, suppose b1 = a*b2 is known. Then use
    this in the regression model, which becomes
    Y = b0 + b2*X, where X = X2 + a*X1.
  • Data pooling (usually done by combining
    cross-sectional and time-series data; time-series
    data is notorious for multicollinearity).

35
Multicollinearity (What is the remedy?)
  • Delete a variable which is causing problems.
  • Caveat: beware of specification bias. This arises
    when a model is incorrectly specified.
  • For example, in order to explain consumption
    expenditure, we may include only income and drop
    wealth since it is highly correlated with income.
    However, economic theory may postulate that you
    use both variables.
  • First-difference transformation of variables
    from time-series data.
  • The regression is run on differences between
    successive values of the variables rather than on
    the original variables, i.e. on
    (X(i+1,1) - X(i,1)), (X(i+1,2) - X(i,2)), etc.
    The logic is that even if X1 and X2 are
    correlated, there is no reason for their first
    differences to be correlated too.
  • Caveat: beware of autocorrelation, which usually
    arises from this procedure. Also, we lose one
    observation (and hence one degree of freedom) to
    the differencing.
  • Correlation transformation.
  • Getting a new sample (why?) and/or increasing the
    sample size (why?).
  • Factor analysis, principal components analysis,
    ridge regression.

36
An Example
37
An Example (continued)
The regression equation is
Y = -0.032 + 6.99 X1 - 0.064 X2

Predictor      Coef       StDev     T       P
Constant      -0.0322     0.2516   -0.13    0.898
X1             6.986      1.667     4.19    0.000
X2            -0.0640     0.2171   -0.29    0.769

S = 1.872    R-Sq = 95.3%    R-Sq(adj) = 95.2%

Analysis of Variance
Source            DF    SS        MS        F          P
Regression         2    9794.6    4897.3    1397.80    0.000
Residual Error   138    483.5     3.5
Total            140    10278.1

Source  DF  Seq SS
X1       1  9794.3
X2       1  0.3

Source  DF  Seq SS
X2       1  9733.1
X1       1  61.5

Predicted Values
Fit       StDev Fit    95.0% CI            95.0% PI
12.020    3.351        (5.394, 18.646)     (4.431, 19.609)
  • High R2.
  • Low t-value for b2.
  • Low ESS for X2 (i.e. SSR(X2 | X1)).
  • Clearly, X2 contributes little to the model.
  • Really? Look at SSR(X2): it's humongous!
  • A clear case of multicollinearity.
  • Of course, we knew that r(X1,X2) = 0.997. This
    should have made us suspect that something was
    amiss.

38
Multicollinearity (Specification Bias)
  • Types of specification errors:
  • Omitting a relevant variable
  • Including an unnecessary or irrelevant variable
  • Incorrect functional form
  • Errors-of-measurement bias
  • Incorrect specification of the stochastic error
    term (this is a model mis-specification error)
  • More on omitting a relevant variable
    (under-fitting):
  • True model: Yi = b0 + b1*Xi1 + b2*Xi2 + ui
  • Fitted model: Yi = a0 + a1*Xi1 + vi
  • Consequences of omission:
  • If r12 is non-zero, then the estimators of a0 and
    a1 are biased and inconsistent.
  • The variance of the estimator of a1 is a biased
    estimate of the variance of the estimator of b1.
  • s2 is incorrectly estimated, and CIs and
    hypothesis tests are misleading.
  • E(estimator of a1) = b1 + b2*b21, where b21 is
    defined in the sketch below.
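A sketch of the omitted-variable bias term, under the usual definition of b21 as the slope from the auxiliary regression of X2 on X1:

    E(\hat a_1) = b_1 + b_2\, b_{21}, \qquad
    b_{21} = \frac{\sum_i (X_{i1} - \bar X_1)(X_{i2} - \bar X_2)}{\sum_i (X_{i1} - \bar X_1)^2}

so the bias vanishes only when b2 = 0 or when X1 and X2 are uncorrelated in the sample (b21 = 0).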