Title: Heteroskedasticity
Chapter 5
(Figure: a regression line.)
What is in this Chapter?
- How do we detect this problem?
- What are the consequences of this problem?
- What are the solutions?
- First, we discuss tests based on OLS residuals: the likelihood ratio test, the Goldfeld-Quandt (GQ) test, and the Breusch-Pagan (BP) test. The last is an LM test.
- Regarding consequences, we show that the OLS estimators are unbiased but inefficient and that the standard errors are biased, thus invalidating tests of significance.
- Regarding solutions, we discuss solutions that depend on particular assumptions about the error variance, as well as general solutions. We also discuss transformation of the variables to logs and the problems associated with deflators, both of which are commonly used as solutions to the heteroskedasticity problem.
5.1 Introduction
- Homoskedasticity: the variance of the error terms is constant.
- Heteroskedasticity: the variance of the error terms is nonconstant.
- Illustrative Example: Table 5.1 presents consumption expenditures (y) and income (x) for 20 families. Suppose that we estimate the equation by ordinary least squares. We get (figures in parentheses are standard errors)

  y = 0.847 + 0.899x    R² = 0.986
     (0.703) (0.0253)   RSS = 31.074
- The residuals from this equation are presented in Table 5.3. In this situation there is no perceptible increase in the magnitudes of the residuals as the value of x increases. Thus there does not appear to be a heteroskedasticity problem.
5.2 Detection of Heteroskedasticity
- In the illustrative example in Section 5.1 we plotted the estimated residuals û_i against ŷ_i to see whether we notice any systematic pattern in the residuals that suggests heteroskedasticity in the errors.
- Note, however, that by virtue of the normal equations, û_i and ŷ_i are uncorrelated, though û_i² could be correlated with ŷ_i².
5.2 Detection of Heteroskedasticity
- Thus if we are using a regression procedure to test for heteroskedasticity, we should use a regression of û_i² on ŷ_i or ŷ_i², or a regression of |û_i| on ŷ_i or ŷ_i².
- In the case of multiple regression, we should use powers of ŷ_i, the predicted value of y_i, or powers of all the explanatory variables.
5.2 Detection of Heteroskedasticity
- The test suggested by Anscombe and a test called RESET suggested by Ramsey both involve regressing û_i on powers of ŷ_i and testing whether or not the coefficients are significant.
- The test suggested by White involves regressing û_i² on all the explanatory variables and their squares and cross products. For instance, with explanatory variables x1, x2, x3, it involves regressing û² on x1, x2, x3, x1², x2², x3², x1x2, x1x3, and x2x3.
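The mechanics of White's auxiliary regression can be sketched as follows. This is a minimal illustration on simulated data, using the standard n·R² form of the statistic (the data-generating process and variable names here are assumptions of this sketch, not from the chapter):

```python
import numpy as np

def white_test(y, X):
    """White's test sketch: regress squared OLS residuals on the
    regressors, their squares, and cross products; n*R^2 from that
    auxiliary regression is asymptotically chi-squared."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])           # add a constant
    b = np.linalg.lstsq(Xc, y, rcond=None)[0]       # OLS fit
    u2 = (y - Xc @ b) ** 2                          # squared residuals
    # auxiliary regressors: x_j, x_j^2, and cross products x_j * x_m
    cols = [np.ones(n)]
    k = X.shape[1]
    for j in range(k):
        cols.append(X[:, j])
        for m in range(j, k):
            cols.append(X[:, j] * X[:, m])
    Z = np.column_stack(cols)
    a = np.linalg.lstsq(Z, u2, rcond=None)[0]
    fitted = Z @ a
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2, Z.shape[1] - 1                   # statistic and d.f.

# simulated data with error sd proportional to x (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + rng.normal(0, x[:, 0])
stat, df = white_test(y, x)
```

With a single regressor the auxiliary regression uses x and x², so the statistic is compared with a χ² with 2 degrees of freedom.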
5.2 Detection of Heteroskedasticity
- Glejser suggested estimating regressions of the type
  |û_i| = γ0 + γ1 x_i + v_i,   |û_i| = γ0 + γ1 √x_i + v_i,   |û_i| = γ0 + γ1 (1/x_i) + v_i
  and so on, and testing the hypothesis γ1 = 0.
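Glejser's regressions can be sketched in a few lines; here the transform of x is passed in as a function, and the slope's t-ratio is returned (simulated data, names assumed for illustration):

```python
import numpy as np

def glejser(y, x, transform):
    """Glejser's test sketch: regress |OLS residuals| on a transform of x
    and return the t-ratio of the slope; a significant slope suggests
    heteroskedasticity."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    a = np.abs(y - X @ b)                       # |u-hat|
    z = transform(x)
    Z = np.column_stack([np.ones(n), z])
    g = np.linalg.lstsq(Z, a, rcond=None)[0]
    e = a - Z @ g
    s2 = e @ e / (n - 2)
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return g[1] / np.sqrt(cov[1, 1])            # t-ratio of the slope

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(0, x)                # heteroskedastic errors
t_x = glejser(y, x, lambda v: v)                # |u| = g0 + g1*x
t_sqrt = glejser(y, x, np.sqrt)                 # |u| = g0 + g1*sqrt(x)
```

Different transforms (x, √x, 1/x, ...) give the different Glejser specifications; each slope is judged by the usual t-test.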
5.2 Detection of Heteroskedasticity
- The implicit assumption behind all these tests is that σ_i² = σ² f(z_i), where z_i is an unknown variable and the different tests use different proxies or surrogates for the unknown function f(z).
5.2 Detection of Heteroskedasticity
- Thus there is evidence of heteroskedasticity even in the log-linear form, although, casually looking at the residuals in Table 5.3, we concluded earlier that the errors were homoskedastic.
- The Goldfeld-Quandt test, to be discussed later in this section, also did not reject the hypothesis of homoskedasticity.
- The Glejser tests, however, show significant heteroskedasticity in the log-linear form.
Assignment
- Redo this illustrative example:
  - Plot the absolute value of the residuals against the x variable (linear form and log-linear form).
  - Run the three types of tests on the linear form and the log-linear form.
  - Produce the EViews table.
  - Reject or accept the null hypothesis of homogeneous variance.
5.2 Detection of Heteroskedasticity
- Some Other Tests (General Tests):
  - Likelihood Ratio Test
  - Goldfeld-Quandt Test
  - Breusch-Pagan Test
5.2 Detection of Heteroskedasticity
- Likelihood Ratio Test
- If the number of observations is large, one can use a likelihood ratio test.
- Divide the residuals (estimated from the OLS regression) into k groups with n_i observations in the i-th group, Σ n_i = n.
- Estimate the error variances in each group by σ̂_i². Let the estimate of the error variance from the entire sample be σ̂². Then if we define λ as
  λ = Π (σ̂_i²)^(n_i/2) / (σ̂²)^(n/2)
  -2 log λ has (asymptotically) a χ² distribution with (k - 1) degrees of freedom.
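The likelihood ratio computation above can be sketched directly (the grouping and the simulated residuals below are assumptions of this illustration):

```python
import numpy as np

def lr_het_test(u, k):
    """LR test sketch for equal error variances: split residuals u
    (already ordered by x) into k groups, estimate sigma_i^2 in each
    group and sigma^2 overall; -2*log(lambda) ~ chi-squared, k-1 d.f."""
    n = len(u)
    groups = np.array_split(u, k)
    s2_all = u @ u / n                               # overall variance
    log_lam = sum((len(g) / 2) * np.log(g @ g / len(g)) for g in groups) \
              - (n / 2) * np.log(s2_all)
    return -2 * log_lam, k - 1

# residual stand-in whose sd rises with x (simulated, illustrative only)
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(1, 10, 200))
u = rng.normal(0, x)
stat, df = lr_het_test(u, 4)
```

The statistic is compared with the χ² critical value with k - 1 = 3 degrees of freedom (7.81 at the 5% level).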
5.2 Detection of Heteroskedasticity
- Goldfeld-Quandt Test
- If we do not have large samples, we can use the Goldfeld-Quandt test.
- In this test we split the observations into two groups: one corresponding to large values of x and the other corresponding to small values of x.
- Fit separate regressions for each group and then apply an F-test to test the equality of the error variances.
- Goldfeld and Quandt suggest omitting some observations in the middle to increase our ability to discriminate between the two error variances.
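The steps above (order by x, drop middle observations, fit two regressions, take the F-ratio of the residual variances) can be sketched as follows, on simulated data since Table 5.1 is not reproduced here:

```python
import numpy as np

def goldfeld_quandt(y, x, omit=4):
    """Goldfeld-Quandt sketch: order by x, omit `omit` middle
    observations, fit OLS separately to the low-x and high-x halves,
    and return the F-ratio of the two residual variances."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    half = (len(y) - omit) // 2
    def s2(yy, xx):
        X = np.column_stack([np.ones(len(yy)), xx])
        b = np.linalg.lstsq(X, yy, rcond=None)[0]
        e = yy - X @ b
        return e @ e / (len(yy) - 2)               # residual variance
    F = s2(y[-half:], x[-half:]) / s2(y[:half], x[:half])
    return F, half - 2, half - 2                   # F and its d.f.

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 40)
y = 0.85 + 0.9 * x + rng.normal(0, 0.2 * x ** 2)   # strongly heteroskedastic
F, df1, df2 = goldfeld_quandt(y, x)
```

The F-ratio is compared with the F-table value for (df1, df2) degrees of freedom; a large ratio rejects homoskedasticity.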
5.2 Detection of Heteroskedasticity
- Breusch-Pagan Test
- Suppose that σ_i² = f(α0 + α1 z_1i + ... + αr z_ri), where z_1, ..., z_r are some variables that influence the error variance. Then the Breusch-Pagan test is a test of the hypothesis
  H0: α1 = α2 = ... = αr = 0
- The function f can be any function. For instance, f(x) can be x, x², e^x, and so on. The Breusch-Pagan test does not depend on the functional form.
- Let S0 = the regression sum of squares from a regression of û_i² on z_1i, ..., z_ri. Then
  λ = S0 / (2σ̂⁴)
  has (asymptotically) a χ² distribution with r degrees of freedom.
- This test is an asymptotic test. An intuitive justification for the test will be given after an illustrative example.
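A sketch of the S0/(2σ̂⁴) computation, with σ̂² taken as the ML estimate RSS/n and simulated data (both assumptions of this illustration):

```python
import numpy as np

def breusch_pagan(y, X, Z):
    """Breusch-Pagan sketch: S0 = regression sum of squares from
    regressing squared OLS residuals on the z-variables;
    S0/(2*sigma_hat^4) is asymptotically chi-squared with r d.f."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    b = np.linalg.lstsq(Xc, y, rcond=None)[0]
    u2 = (y - Xc @ b) ** 2                      # squared residuals
    sigma2 = u2.mean()                          # ML estimate, RSS/n
    Zc = np.column_stack([np.ones(n), Z])
    a = np.linalg.lstsq(Zc, u2, rcond=None)[0]
    s0 = np.sum((Zc @ a - u2.mean()) ** 2)      # regression sum of squares
    return s0 / (2 * sigma2 ** 2), Z.shape[1]   # statistic and d.f. r

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(0, x)                # var(u) rises with x
Z = np.column_stack([x, x ** 2])                # z-variables: x and x^2
stat, df = breusch_pagan(y, x.reshape(-1, 1), Z)
```

With two z-variables the statistic is compared with a χ² with 2 degrees of freedom (5.99 at the 5% level).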
5.2 Detection of Heteroskedasticity
- Illustrative Example
- Consider the data in Table 5.1. To apply the Goldfeld-Quandt test we consider two groups of 10 observations each, ordered by the values of the variable x.
- The first group consists of observations 6, 11, 9, 4, 14, 15, 19, 20, 1, and 16. The second group consists of the remaining 10.
5.2 Detection of Heteroskedasticity
- The estimated equations were (figures in parentheses are standard errors):

  Group 1: y = 1.0533 + 0.876x    R² = 0.985, σ̂² = 0.475
              (0.616)  (0.038)
  Group 2: y = 3.279 + 0.835x     R² = 0.904, σ̂² = 3.154
              (3.443)  (0.096)

- The F-ratio for the test is F = 3.154 / 0.475 = 6.64.
- The 1% point for the F-distribution with d.f. 8 and 8 is 6.03. Thus the F-value is significant at the 1% level, and we reject the hypothesis of homoskedasticity.
5.2 Detection of Heteroskedasticity
- For the log-linear form the estimated equations were:

  Group 1: log y = 0.128 + 0.934 log x    R² = 0.992, σ̂² = 0.001596
                  (0.079) (0.030)
  Group 2: log y = 0.276 + 0.902 log x    R² = 0.912, σ̂² = 0.002789
                  (0.352) (0.099)

- The F-ratio for the test is F = 0.002789 / 0.001596 = 1.75.
- For d.f. 8 and 8, the 5% point from the F-tables is 3.44. Thus, if we use the 5% significance level, we reject the hypothesis of homoskedasticity in the linear form but do not reject it in the log-linear form.
- Note that the White test rejected the hypothesis in both forms.
5.2 Detection of Heteroskedasticity
- Turning now to the Breusch-Pagan test, regressions of û² on the z-variables gave the following regression sums of squares.
- For the linear form:
  S0 = 40.842 for the first regression
  S0 = 40.065 for the regression with two slope parameters
- Also σ̂² = RSS/n = 31.074/20 = 1.554, so σ̂⁴ = 2.414.
- The test statistic for the χ²-test is (using the second regression)
  λ = S0 / (2σ̂⁴) = 40.065 / 4.828 ≈ 8.3
- We treat this statistic as a χ² with d.f. 2, since two slope parameters are estimated. It is significant at the 5% level (the 5% point of χ² with 2 d.f. is 5.99), thus rejecting the hypothesis of homoskedasticity.
- For the log-linear form, using the corresponding two regressors, we get S0 = 0.000011, and the test statistic is again λ = S0 / (2σ̂⁴).
- Using the χ²-tables with d.f. 2, we see that this is not significant even at the 50% level. Thus the test does not reject the hypothesis of homoskedasticity in the log-linear form.
5.3 Consequences of Heteroskedasticity
- To see this, consider a very simple model with no constant term:
  y_i = βx_i + u_i,  V(u_i) = σ_i²    (5.1)
- The least squares estimator of β is
  β̂ = Σx_i y_i / Σx_i² = β + Σx_i u_i / Σx_i²
- If E(u_i) = 0 and the u_i are independent of the x_i, we have E(Σx_i u_i / Σx_i²) = 0 and hence E(β̂) = β. Thus β̂ is unbiased.
- If the u_i are mutually independent, denoting V(u_i) by σ_i², we have
  V(β̂) = Σx_i² σ_i² / (Σx_i²)²    (5.2)
5.3 Consequences of Heteroskedasticity
- Then, dividing (5.1) by σ_i, we have the model
  y_i/σ_i = β(x_i/σ_i) + v_i    (5.3)
  where v_i = u_i/σ_i has a constant variance, equal to 1.
- Since we are weighting the i-th observation by 1/σ_i, the OLS estimation of (5.3) is called weighted least squares (WLS).
5.3 Consequences of Heteroskedasticity
- If β* is the WLS estimator of β, we have
  β* = Σ(x_i/σ_i)(y_i/σ_i) / Σ(x_i/σ_i)² = β + Σ(x_i u_i/σ_i²) / Σ(x_i²/σ_i²)
  and since the latter term has expectation zero, we have E(β*) = β.
- Thus the WLS estimator is also unbiased. We will show that β* is more efficient than the OLS estimator β̂.
- We have
  V(β*) = 1 / Σ(x_i²/σ_i²)
  and we compare this with the expression for V(β̂) in equation (5.2).
5.3 Consequences of Heteroskedasticity
- Thus
  V(β*) / V(β̂) = (Σx_i²)² / [Σ(x_i²/σ_i²) · Σ(x_i² σ_i²)]
- This expression is of the form (Σa_i b_i)² / (Σa_i² · Σb_i²), where a_i = x_i/σ_i and b_i = x_i σ_i.
- By the Cauchy-Schwarz inequality, it is less than 1, and is equal to 1 only if a_i and b_i are proportional, that is, x_i/σ_i and x_i σ_i are proportional, or σ_i² is a constant, which is the case if the errors are homoskedastic.
- An example in GAUSS.
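A small Monte Carlo sketch illustrates this efficiency result; the parameter values and variance pattern below are illustrative assumptions. Both estimators should be unbiased, with the WLS estimator showing the smaller variance:

```python
import numpy as np

# Model y_i = beta*x_i + u_i with known V(u_i) = sigma_i^2.
# Compare the sampling behavior of OLS (beta_hat) and WLS (beta_star).
rng = np.random.default_rng(5)
beta = 0.9
x = np.linspace(1, 10, 50)
sigma = 0.5 * x                                   # sd proportional to x

ols, wls = [], []
for _ in range(2000):
    y = beta * x + rng.normal(0, sigma)
    ols.append(np.sum(x * y) / np.sum(x ** 2))    # beta_hat (OLS)
    xs, ys = x / sigma, y / sigma                 # divide through by sigma_i
    wls.append(np.sum(xs * ys) / np.sum(xs ** 2)) # beta* (WLS)
ols, wls = np.array(ols), np.array(wls)
```

Averaging over the replications, both estimators center on the true β = 0.9, but the spread of the WLS estimates is visibly smaller.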
5.3 Consequences of Heteroskedasticity
- Thus the OLS estimator is unbiased but inefficient (it has a higher variance) than the WLS estimator.
- Turning now to the estimation of the variance of β̂, it is estimated by σ̂²/Σx_i², where σ̂² = RSS/(n-1) and RSS is the residual sum of squares from the OLS model.
- But under heteroskedasticity E(RSS) = Σσ_i² - Σx_i²σ_i²/Σx_i². Thus the variance of β̂ is estimated by an expression whose expected value is
  [Σσ_i² - Σx_i²σ_i²/Σx_i²] / [(n-1) Σx_i²]
  whereas the true variance is Σx_i²σ_i²/(Σx_i²)².
5.3 Consequences of Heteroskedasticity
- Thus the estimated variances are also biased.
- If σ_i² and x_i² are positively correlated, as is often the case with economic data, so that (1/n)Σx_i²σ_i² > (Σx_i²/n)(Σσ_i²/n), then the expected value of the estimated variance is smaller than the true variance.
- Thus we would be underestimating the true variance of the OLS estimator and getting shorter confidence intervals than the true ones. This also affects tests of hypotheses about β.
5.3 Consequences of Heteroskedasticity
- The solution to the heteroskedasticity problem depends on the assumptions we make about the sources of heteroskedasticity.
- When we are not sure of these, we can at least try to make corrections to the standard errors, since we have seen that the least squares estimator is unbiased but inefficient and, moreover, that the standard errors are biased.
- White suggests that we use formula (5.2) with û_i² substituted for σ_i².
- Using this formula, we find that in the case of the illustrative example with the data in Table 5.1, the standard error of β̂, the slope coefficient, is 0.027. Earlier, we estimated it from the OLS regression as 0.0253. Thus the difference is really not very large in this example.
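White's correction for the simple no-constant model is a one-line substitution into formula (5.2); a sketch on simulated data (the data-generating values are assumptions of this illustration):

```python
import numpy as np

def white_se(y, x):
    """White's heteroskedasticity-consistent standard error sketch for
    the no-constant model y = beta*x + u: formula (5.2) with
    u_hat_i^2 substituted for sigma_i^2."""
    beta_hat = np.sum(x * y) / np.sum(x ** 2)     # OLS slope
    u2 = (y - beta_hat * x) ** 2                  # squared residuals
    var = np.sum(x ** 2 * u2) / np.sum(x ** 2) ** 2
    return beta_hat, np.sqrt(var)

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 200)
y = 0.9 * x + rng.normal(0, 0.5 * x)              # heteroskedastic errors
b, se = white_se(y, x)
```

The resulting standard error is consistent under heteroskedasticity, whereas the conventional OLS formula is not.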
5.4 Solutions to the Heteroskedasticity Problem
- There are two types of solutions that have been suggested in the literature for the problem of heteroskedasticity:
  - Solutions dependent on particular assumptions about σ_i.
  - General solutions.
- We first discuss category 1. Here we have two methods of estimation: weighted least squares (WLS) and maximum likelihood (ML).
- For instance, if σ_i² = σ²x_i², we can divide the equation y_i = α + βx_i + u_i through by x_i to get
  y_i/x_i = α(1/x_i) + β + v_i
- Thus the constant term in this equation is the slope coefficient in the original equation.
5.4 Solutions to the Heteroskedasticity Problem
- Prais and Houthakker found in their analysis of family budget data that the errors from the equation had variance increasing with household income.
- They considered a model with V(u_i) = σ²(α + βx_i)², that is, an error variance proportional to the square of the expected value of y_i.
- In this case we cannot divide the whole equation by a known constant as before. For this model we can consider a two-step procedure as follows.
5.4 Solutions to the Heteroskedasticity Problem
- First, estimate α and β by OLS. Let these estimators be α̂ and β̂.
- Now use the WLS procedure as outlined earlier; that is, regress y_i/(α̂ + β̂x_i) on 1/(α̂ + β̂x_i) and x_i/(α̂ + β̂x_i) with no constant term.
- The limitation of the two-step procedure is that the error involved in the first step will affect the second step.
- This procedure is called a two-step weighted least squares procedure. The standard errors we get for the estimates of α and β from this procedure are valid only asymptotically. They are asymptotic standard errors because the weights have been estimated.
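The two steps can be sketched compactly; the data below are simulated with error sd proportional to E(y), which is an assumption of this illustration:

```python
import numpy as np

def two_step_wls(y, x):
    """Two-step WLS sketch for V(u_i) = sigma^2*(alpha + beta*x_i)^2:
    step 1 fits OLS to get alpha_hat, beta_hat; step 2 divides the
    whole equation by w_i = alpha_hat + beta_hat*x_i and regresses
    y_i/w_i on 1/w_i and x_i/w_i with no constant term."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    a0, b0 = np.linalg.lstsq(X, y, rcond=None)[0]      # step 1: OLS
    w = a0 + b0 * x                                    # estimated weights
    Z = np.column_stack([1 / w, x / w])                # no constant term
    a1, b1 = np.linalg.lstsq(Z, y / w, rcond=None)[0]  # step 2: WLS
    return a1, b1

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 200)
mean = 0.85 + 0.9 * x
y = mean + 0.2 * mean * rng.normal(size=200)           # sd proportional to E(y)
alpha, beta = two_step_wls(y, x)
```

Iterating step 2 with the new estimates gives the iterated WLS procedure mentioned below; as noted, iteration brings no gain in asymptotic efficiency.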
5.4 Solutions to the Heteroskedasticity Problem
- One can iterate this WLS procedure further: use the new estimates of α and β to construct new weights, apply WLS again, and repeat the procedure until convergence.
- This procedure is called the iterated weighted least squares procedure. However, there is no gain in (asymptotic) efficiency from iteration.
5.4 Solutions to the Heteroskedasticity Problem
- If we make some specific assumption about the errors, say that they are normal, we can use the maximum likelihood method, which is more efficient than the WLS if the errors are indeed normal.
5.4 Solutions to the Heteroskedasticity Problem
- Illustrative Example
- As an illustration, again consider the data in Table 5.1. We saw earlier that regressing the absolute values of the residuals on x (in Glejser's tests) gave estimates of the relationship between σ_i and x_i.
- Now we regress y_i/σ̂_i on 1/σ̂_i and x_i/σ̂_i (with no constant term), where σ̂_i is the value fitted from the Glejser regression.
5.4 Solutions to the Heteroskedasticity Problem
- The resulting WLS equation is then estimated.
- If we assume that V(u_i) = σ²(α + βx_i)², the two-step WLS procedure would be as follows: compute ŷ_i = α̂ + β̂x_i from the OLS regression, and then regress y_i/ŷ_i on 1/ŷ_i and x_i/ŷ_i.
- The R²'s in these equations are not comparable. But our interest is in the estimates of the parameters in the consumption function.
Assignment
- Use the data of Table 5.1 to do the WLS.
- Consider the log-linear form:
  - Run the Glejser tests to check whether the log-linear regression model still has nonconstant variance.
  - Estimate the nonconstant variance and run the WLS.
- Write a one-step program in GAUSS.
5.4 Solutions to the Heteroskedasticity Problem
- Comparing the results with the OLS estimates presented in Section 5.2, we notice that the estimates of α are higher than the OLS estimates, the estimates of β are lower, and the standard errors are lower.
5.5 Heteroskedasticity and the Use of Deflators
- There are two remedies often suggested and used for solving the heteroskedasticity problem:
  - Transforming the data to logs.
  - Deflating the variables by some measure of "size."
5.5 Heteroskedasticity and the Use of Deflators
- One important thing to note is that the purpose of all these deflation procedures is to get more efficient estimates of the parameters.
- But once those estimates have been obtained, one should make all inferences (calculation of the residuals, prediction of future values, calculation of elasticities at the means, etc.) from the original equation, not from the equation in the deflated variables.
5.5 Heteroskedasticity and the Use of Deflators
- Another point to note is that, since the purpose of deflation is to get more efficient estimates, it is tempting to argue about the merits of the different procedures by looking at the standard errors of the coefficients.
- However, this is not correct, because in the presence of heteroskedasticity the standard errors themselves are biased, as we showed earlier.
5.5 Heteroskedasticity and the Use of Deflators
- For instance, in the five equations presented above, the second and third are comparable, and so are the fourth and fifth.
- In both cases, if we look at the standard errors of the coefficient of X, the coefficient in the undeflated equation has a smaller standard error than the corresponding coefficient in the deflated equation.
- However, since the standard errors are biased, we have to be careful about making too much of these differences.
5.5 Heteroskedasticity and the Use of Deflators
- In the preceding example we considered miles M both as a deflator and as an explanatory variable.
- In this context we should mention some discussion in the literature on "spurious correlation" between ratios.
5.5 Heteroskedasticity and the Use of Deflators
- The argument is simply that even if we have two variables X and Y that are uncorrelated, if we deflate both variables by another variable Z, there can be a strong correlation between X/Z and Y/Z because of the common denominator Z.
- It is wrong to infer from this correlation that there exists a close relationship between X and Y.
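This common-denominator effect is easy to demonstrate numerically; all the data below are simulated for illustration:

```python
import numpy as np

# X and Y are generated independently, yet X/Z and Y/Z can be highly
# correlated purely through the common denominator Z.
rng = np.random.default_rng(8)
n = 1000
x = rng.normal(10, 1, n)
y = rng.normal(10, 1, n)
z = rng.uniform(0.5, 5, n)            # a "size" variable with wide range

corr_raw = np.corrcoef(x, y)[0, 1]            # close to zero
corr_ratio = np.corrcoef(x / z, y / z)[0, 1]  # large and positive
```

Here the raw correlation between X and Y is near zero, while the correlation between the ratios is strong, driven entirely by the shared 1/Z factor.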
5.5 Heteroskedasticity and the Use of Deflators
- Of course, if our interest is in fact the relationship between X/Z and Y/Z, there is no reason why this correlation need be called "spurious."
- As Kuh and Meyer point out, "The question of spurious correlation quite obviously does not arise when the hypothesis to be tested has initially been formulated in terms of ratios, for instance, in problems involving relative prices. Similarly, when a series such as money value of output is divided by a price index to obtain a 'constant dollar' estimate of output, no question of spurious correlation need arise. Thus, spurious correlation can only exist when a hypothesis pertains to undeflated variables and the data have been divided through by another series for reasons extraneous to but not in conflict with the hypothesis framed as an exact, i.e., nonstochastic relation."
5.5 Heteroskedasticity and the Use of Deflators
- In summary, deflated or ratio variables are often used in econometric work to solve the heteroskedasticity problem.
- Deflation can sometimes be justified on purely economic grounds, as in the case of the use of "real" quantities and relative prices. In this case all the inferences from the estimated equation will be based on the equation in the deflated variables.
- However, if deflation is used to solve the heteroskedasticity problem, any inferences we make have to be based on the original equation, not the equation in the deflated variables.
- In any case, deflation may increase or decrease the resulting correlations, but this is beside the point. Since the correlations are not comparable anyway, one should not draw any inferences from them.
5.5 Heteroskedasticity and the Use of Deflators
- Illustrative Example
- In Table 5.5 we present data on
  y = population density
  x = distance from the central business district
  for 39 census tracts of the Baltimore area in 1970.
- It has been suggested (this is called the density gradient model) that population density follows the relationship
  y = A e^(-γx)
  where A is the density of the central business district.
- The basic hypothesis is that as you move away from the central business district, population density drops off.
- For estimation purposes we take logs and write
  log y = α + βx + u
  where α = log A and β = -γ.
- Estimation of this equation by OLS gave the following results (figures in parentheses are t-values, not standard errors).
5.5 Heteroskedasticity and the Use of Deflators
- The t-values are very high and the coefficients are significantly different from zero (with a significance level of less than 1%). The sign of β̂ is negative, as expected.
- With cross-sectional data like these we expect heteroskedasticity, and this could result in an underestimation of the standard errors (and thus an overestimation of the t-ratios).
- To check whether there is heteroskedasticity, we have to analyze the estimated residuals û_i. A plot of |û_i| against x_i showed a positive relationship, and hence Glejser's tests were applied.
- Defining z_i = |û_i|, the following equations were estimated.
5.5 Heteroskedasticity and the Use of Deflators
- We choose the specification that gives the highest R², or equivalently the highest t-value, since with only one regressor the R² is a monotonic function of the t-value.
- In the estimated regressions (with t-values in parentheses), all the t-statistics are significant, indicating the presence of heteroskedasticity. Based on the highest t-ratio, we chose the second specification (although the fourth specification is equally valid).
- Deflating throughout by the chosen estimate of σ_i gives the regression equation to be estimated. The estimates were obtained with t-ratios in parentheses.
5.6 Testing the Linear Versus Log-Linear Functional Form
- When comparing the linear with the log-linear forms, we cannot compare the R²'s, because R² is the ratio of explained variance to total variance, and the variances of y and log y are different.
- Comparing R²'s in this case is like comparing two individuals A and B, where A eats 65% of a carrot cake and B eats 70% of a strawberry cake. The comparison does not make sense because there are two different cakes.
5.6 Testing the Linear Versus Log-Linear Functional Form
- The Box-Cox Test
- One solution to this problem is to consider a more general model of which both the linear and log-linear forms are special cases. Box and Cox consider the transformation
  y(λ) = (y^λ - 1)/λ   for λ ≠ 0
  y(λ) = log y         for λ = 0    (5.6)
- Box and Cox consider the regression model
  y_i(λ) = βx_i + u_i,  u_i ~ IN(0, σ²)    (5.7)
- For the sake of simplicity of exposition, we are considering only one explanatory variable. Also, instead of considering x_i, we can consider x_i(λ).
- For λ = 0 this is a log-linear model, and for λ = 1 this is a linear model.
5.6 Testing the Linear Versus Log-Linear Functional Form
- There are two main problems with the specification in equation (5.7):
- The assumption that the errors in (5.7) are IN(0, σ²) for all values of λ is not a reasonable assumption.
- Since y > 0, unless λ = 0 the definition of y(λ) in (5.6) imposes some constraints on y(λ) that depend on the unknown λ: from equation (5.6), y(λ) > -1/λ for λ > 0 and y(λ) < -1/λ for λ < 0.
- However, we will ignore these problems and describe the Box-Cox method.
5.6 Testing the Linear Versus Log-Linear Functional Form
- Based on the specification given by equation (5.7), Box and Cox suggest estimating λ by the ML method. We can then test the hypotheses λ = 0 and λ = 1.
- If the hypothesis λ = 0 is accepted, we use log y as the explained variable.
- If the hypothesis λ = 1 is accepted, we use y as the explained variable.
- A problem arises only if both hypotheses are rejected or both are accepted. In that case we have to use the estimated λ̂ and work with y(λ̂).
5.6 Testing the Linear Versus Log-Linear Functional Form
- The ML method suggested by Box and Cox amounts to the following procedure:
  - Divide each y by the geometric mean of the y's.
  - Now compute y(λ) for different values of λ and regress it on x. Compute the residual sum of squares and denote it by RSS(λ).
  - Choose the value of λ for which RSS(λ) is a minimum. This value of λ is the ML estimator of λ.
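The procedure above reduces to a grid search over λ; a sketch on simulated data whose true form is linear (so the search should settle near λ = 1):

```python
import numpy as np

def box_cox_rss(y, x, lam):
    """RSS from regressing y(lambda) on x, after dividing y by its
    geometric mean (the Box-Cox ML device from the steps above)."""
    g = np.exp(np.mean(np.log(y)))                  # geometric mean
    ys = y / g
    yl = np.log(ys) if lam == 0 else (ys ** lam - 1) / lam
    X = np.column_stack([np.ones(len(y)), x])
    b = np.linalg.lstsq(X, yl, rcond=None)[0]
    e = yl - X @ b
    return e @ e

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, 300)
y = 5 + 2 * x + rng.normal(0, 0.5, 300)             # truly linear model
grid = np.round(np.arange(-1, 2.01, 0.1), 1)
rss = {lam: box_cox_rss(y, x, lam) for lam in grid}
lam_hat = min(rss, key=rss.get)                     # grid-search ML estimate
```

Because the y's are scaled by their geometric mean first, the RSS values are comparable across λ, and minimizing RSS(λ) is equivalent to maximizing the likelihood.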
5.6 Testing the Linear Versus Log-Linear Functional Form
- As a special case, consider the problem of choosing between the linear and log-linear models
  y = α + βx + u   and   log y = α + β log x + u
- What we do is first divide each y_i by the geometric mean of the y's. Then we estimate the two regressions and choose the one with the smaller residual sum of squares. This is the Box-Cox procedure.
5.6 Testing the Linear Versus Log-Linear Functional Form
- The BM Test
- This is the test suggested by Bera and McAleer. Suppose the log-linear and linear models to be tested are given by
  H0: log y_i = α0 + β0 log x_i + u_i
  H1: y_i = α1 + β1 x_i + v_i
- The BM test involves three steps.
- Step 1: Obtain the predicted values from the two equations, say L̂_i from the log-linear equation and ŷ_i from the linear equation. The predicted value of y_i from the log-linear equation is exp(L̂_i). The predicted value of log y_i from the linear equation is log ŷ_i.
5.6 Testing the Linear Versus Log-Linear Functional Form
- Step 2: Compute the artificial regressions: regress log ŷ_i on the log-linear model's regressors, and regress exp(L̂_i) on the linear model's regressors. Let the estimated residuals from these two regression equations be û_i and v̂_i, respectively.
- Step 3: The tests for H0 and H1 are based on θ0 and θ1 in the artificial regressions
  log y_i = α0 + β0 log x_i + θ0 û_i + w_i
  y_i = α1 + β1 x_i + θ1 v̂_i + w_i
- We use the usual t-tests to test the hypotheses θ0 = 0 and θ1 = 0.
- If θ0 = 0 is accepted, we choose the log-linear model. If θ1 = 0 is accepted, we choose the linear model.
- A problem arises if both these hypotheses are rejected or both are accepted.
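The three steps can be sketched as follows, assuming simple one-regressor models and simulated data generated from a log-linear truth (so the test of the linear model should reject while the test of the log-linear model should not). The exact form of the artificial regressions follows the outline above and is an assumption of this sketch:

```python
import numpy as np

def fit(X, y):
    """OLS: coefficients and fitted values."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, X @ b

def t_last(X, y):
    """t-ratio of the last coefficient in an OLS regression."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[-1] / np.sqrt(cov[-1, -1])

def bm_test(y, x):
    n = len(y)
    X0 = np.column_stack([np.ones(n), np.log(x)])   # log-linear regressors
    X1 = np.column_stack([np.ones(n), x])           # linear regressors
    # Step 1: predicted values from each model
    _, L_hat = fit(X0, np.log(y))                   # predicted log y
    _, y_hat = fit(X1, y)                           # predicted y
    # Step 2: residuals from the two artificial regressions
    _, f0 = fit(X0, np.log(y_hat)); u_hat = np.log(y_hat) - f0
    _, f1 = fit(X1, np.exp(L_hat)); v_hat = np.exp(L_hat) - f1
    # Step 3: t-tests on theta0 (log-linear model) and theta1 (linear model)
    t0 = t_last(np.column_stack([X0, u_hat]), np.log(y))
    t1 = t_last(np.column_stack([X1, v_hat]), y)
    return t0, t1

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, 300)
y = np.exp(0.5 + 0.5 * np.log(x) + rng.normal(0, 0.1, 300))  # log-linear truth
t0, t1 = bm_test(y, x)
```

In this simulation the t-ratio on θ1 should be large (the linear model is rejected) while the t-ratio on θ0 stays comparatively small.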
Summary
1. If the error variance is not constant across all the observations, we have the heteroskedasticity problem. The problem is informally illustrated with an example in Section 5.1.
Summary
2. First, we would like to know whether the problem exists. For this purpose some tests have been suggested. We have discussed the following tests:
  (a) Ramsey's test.
  (b) Glejser's tests.
  (c) Breusch and Pagan's test.
  (d) White's test.
  (e) Goldfeld and Quandt's test.
  (f) Likelihood ratio test.
Summary
3. The consequences of the heteroskedasticity problem are:
  (a) The least squares estimators are unbiased but inefficient.
  (b) The estimated variances are themselves biased.
If the heteroskedasticity problem is detected, we can try to solve it by the use of weighted least squares. Otherwise, we can at least try to correct the estimated error variances.
Summary
4. There are three solutions commonly suggested for the heteroskedasticity problem:
  (a) Use of weighted least squares.
  (b) Deflating the data by some measure of "size."
  (c) Transforming the data to the logarithmic form.
In weighted least squares, the particular weighting scheme used will depend on the nature of the heteroskedasticity.
Summary
5. The use of deflators is similar to the weighted least squares method, although it is done in a more ad hoc fashion.
6. The question of estimation in the linear versus logarithmic form has received considerable attention in recent years. Several statistical tests have been suggested for testing the linear versus the logarithmic form.