# Quantitative Methods - PowerPoint PPT Presentation


## Quantitative Methods


Slides: 27
Provided by: kathleen5
Transcript and Presenter's Notes


1
Quantitative Methods
• Heteroskedasticity

2
Heteroskedasticity
• OLS assumes homoskedastic error terms. In OLS,
the data are homoskedastic if the error term has
constant variance.
• If there is non-constant variance of the error
terms, the error terms are related to some
variable (or set of variables), or to the case
number. The data are then heteroskedastic.

3
Heteroskedasticity
• Example (from Wikipedia; I confess, it has
relevant graphs which are easily pasted!)
• Note that as X increases, the variance of the error
term increases (the goodness of fit gets worse).

4
Heteroskedasticity
• As you can see from the graph, the b (parameter
estimate, i.e., the estimated slope or effect of X
on Y) will not necessarily change.
• However, heteroskedasticity changes the standard
errors of the b's, making us more or less
confident in our slope estimates than we would be
otherwise.

5
Heteroskedasticity
• Note that whether one is more confident or less
confident depends in large part on the
distribution of the data. If there is relatively
poor goodness of fit near the mean of X, where
most of the data points tend to be, then it is
likely that you will be less confident in your
slope estimates than you would be otherwise. If
the data fit the line relatively well near the
mean of X, then it is likely that you will be
more confident in your slope estimates than you
would be otherwise.

6
Heteroskedasticity why?
• Learning: either your coders learn (in which case
you have measurement error), or your cases
actually learn. For example, if you are
predicting wages with experience, it is likely
that variance is reduced among those with more
experience.

7
Heteroskedasticity why?
• Scope of choice: some subsets of your data may
have more discretion. So, if you want to predict
saving behavior with wealth, wealthier individuals
might show greater variance in their behavior.

8
Heteroskedasticity
• Heteroskedasticity is very common in pooled data,
which makes sense; for example, some phenomenon
(e.g., voting) may be more predictable in some
states than in others.

9
Heteroskedasticity
• But note that what looks like heteroskedasticity
could actually be measurement error (improving or
deteriorating, thus causing differences in
goodness of fit), or specification issues (you
have failed to control for something which might
account for how predictable your dependent
variable is across different subsets of data).

10
Heteroskedasticity Tests
• The tests for heteroskedasticity tend to
incorporate the same basic idea: figuring out,
through an auxiliary regression analysis,
whether the independent variables (or the case
number, or some combination of independent
variables) have a significant relationship to the
goodness of fit of the model.

11
Heteroskedasticity Tests
• In other words, all of the tests seek to answer
the question: Does my model fit the data better
in some places than in others? Is the goodness
of fit significantly better at low values of some
independent variable X? Or at high values? Or
in the mid-range of X? Or in some subsets of the
data?

12
Heteroskedasticity Tests
• Also note that no single test is definitive, in
part because, as observed in class, there could
be problems with the auxiliary regressions
themselves.
• We'll examine just a few tests, to give you the
basic idea.

13
Heteroskedasticity Tests
• The first thing you could do is just examine your
data in a scatterplot.
• Of course, it is time consuming to examine all
the possible ways in which your data could be
heteroskedastic (that is, relative to each X, to
combinations of X, to the case number, to other
variables that aren't in the model such as the
pooling unit, etc.).

14
Heteroskedasticity Tests
• Another test is the Goldfeld-Quandt. The
Goldfeld-Quandt essentially asks you to compare
the goodness of fit of two areas of your data.
• Disadvantages: you need to have pre-selected an X
that you think is correlated with the variance of
the error term.
• G-Q assumes a monotonic relationship between X and
the variance of the error term.
• That is, it will only work to diagnose
heteroskedasticity where the goodness of fit at
low levels of X is different from the
goodness of fit at high levels of X (as in the
graph above). But it won't work to diagnose
heteroskedasticity where the goodness of fit in
the mid-range of X is different from the goodness
of fit at both the low end of X and the high end
of X.

15
Heteroskedasticity Tests
• Goldfeld-Quandt test: steps
• First, order the n cases by the X that you think
is correlated with e_i².
• Then, drop a section of c cases out of the
middle (one-fifth is a reasonable number).
• Then, run separate regressions on both the upper
and lower samples. You will then be able to
compare the goodness of fit between the two
subsets of data.

16
Heteroskedasticity Tests
• Obtain the residual sum of squares from each
regression (ESS-1 and ESS-2).
• Then, calculate GQ, which has an F distribution.

17
Heteroskedasticity Tests
• The numerator represents the residual mean
square from the first regression, that is, ESS-1
/ df. The df (degrees of freedom) are n-k-1:
n is the number of cases in that first subset
of data, and k is the number of independent
variables (and then, 1 is for the intercept
estimate).

18
Heteroskedasticity Tests
• The denominator represents the residual mean
square from the first regressionthat is, ESS-2
/ df. The df (degrees of freedom) are n-k-1.
n is the number of cases in that second subset
of data, and k is the of independent variables
(and then, 1 is for the intercept estimate).
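Putting these two slides together, the statistic has the following form (my own rendering in the slides' notation, with n1 and n2 the sizes of the lower and upper subsets):

```latex
GQ \;=\; \frac{ESS_1 / (n_1 - k - 1)}{ESS_2 / (n_2 - k - 1)}
\;\sim\; F_{\,n_1 - k - 1,\; n_2 - k - 1}
```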

19
Heteroskedasticity Tests
• Note that the F test is useful in comparing the
goodness of fit of two sets of data.
• How would we know if the goodness of fit was
significantly different across the two subsets of
data?
• By comparing them (as in the ratio above), we can
see if one goodness of fit is significantly
better than the other (accounting for degrees of
freedom: sample size, number of variables, etc.).
• In other words, if GQ is significantly greater or
less than 1, that means that ESS-1 / df is
significantly greater or less than ESS-2 / df;
in other words, we have evidence of
heteroskedasticity.
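The Goldfeld-Quandt steps above can be sketched in Python. This is my own minimal illustration, not code from the slides: the function name, the `drop_frac` default of one-fifth, and the two-sided p-value (since the slides say GQ can be significantly greater or less than 1) are all my choices.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, sort_col=0, drop_frac=0.2):
    """Order cases by one X, drop the middle fifth, fit OLS to the
    lower and upper subsets, and form the ratio of residual mean squares."""
    n = len(y)
    order = np.argsort(X[:, sort_col])        # step 1: order cases by X
    y, X = y[order], X[order]
    c = int(n * drop_frac)                    # step 2: drop middle c cases
    lo = (n - c) // 2
    hi = lo + c
    Xc = np.column_stack([np.ones(n), X])     # add intercept column

    def ess(Xs, ys):                          # residual sum of squares
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ beta
        return resid @ resid

    ess1 = ess(Xc[:lo], y[:lo])               # lower subset (ESS-1)
    ess2 = ess(Xc[hi:], y[hi:])               # upper subset (ESS-2)
    k = X.shape[1]
    df1 = lo - k - 1                          # n - k - 1 for each subset
    df2 = (n - hi) - k - 1
    gq = (ess1 / df1) / (ess2 / df2)          # the GQ ratio
    # two-sided p: GQ significantly greater OR less than 1
    p = 2 * min(stats.f.sf(gq, df1, df2), stats.f.cdf(gq, df1, df2))
    return gq, p
```

With error variance increasing in X (as in the Wikipedia graph), ESS-2 exceeds ESS-1, so GQ falls well below 1 and the F test rejects homoskedasticity.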

20
Heteroskedasticity Tests
• A second test is the Glejser test.
• Perform the regression analysis and save the
residuals.
• Regress the absolute value of the residuals on
possible sources of heteroskedasticity.
• A significant coefficient indicates
heteroskedasticity.

21
Heteroskedasticity Tests
• Glejser test
• This makes sense conceptually: you are testing to
see if one of your independent variables is
significantly related to the variance of your
residuals.
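The Glejser steps can be sketched as follows. Again this is my own illustration under the slides' description: the function name and the choice of a simple t-test on the auxiliary slope are assumptions, not something the slides specify.

```python
import numpy as np
from scipy import stats

def glejser(y, X, col=0):
    """Regress |OLS residuals| on one suspect X; a significant slope
    indicates heteroskedasticity."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    abs_resid = np.abs(y - Xc @ beta)          # save |residuals|
    Z = np.column_stack([np.ones(n), X[:, col]])
    g, *_ = np.linalg.lstsq(Z, abs_resid, rcond=None)
    u = abs_resid - Z @ g                      # auxiliary residuals
    s2 = (u @ u) / (n - 2)
    cov = s2 * np.linalg.inv(Z.T @ Z)
    t = g[1] / np.sqrt(cov[1, 1])              # t-stat on the suspect X
    p = 2 * stats.t.sf(abs(t), n - 2)
    return t, p
```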

22
Heteroskedasticity Tests
• White's Test
• Regress the squared residuals (as the dependent
variable) on...
• All the X variables, all the cross products
(i.e., possible interactions) of the X variables,
and all squared values of the X variables.
• Calculate an LM test statistic, which is n ×
R² (the R² from this auxiliary regression).
• The LM test statistic has a chi-squared
distribution, with degrees of freedom equal to
the number of independent variables in the
auxiliary regression.

23
Heteroskedasticity Tests
• White's Test
• The advantage of White's test is that it does not
assume that there is a monotonic relationship
between any one X and the variance of the error
terms; the inclusion of the interactions allows
some non-linearity in that relationship.
• And, it tests for heteroskedasticity in the
entire model; you do not have to choose a
particular X to examine.
• However, if you have many variables, the number
of possible interactions plus the squared
variables plus the original variables can be
quite high!
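The White procedure described above can be sketched like so. This is my own minimal implementation, not from the slides; the helper name is hypothetical, and `statsmodels` users would typically reach for its built-in `het_white` instead.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def white_test(y, X):
    """Regress squared OLS residuals on the X's, their squares, and their
    pairwise cross products; LM = n * R-squared is chi-squared."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ beta) ** 2                  # squared residuals
    # auxiliary regressors: X, squares of X, pairwise cross products
    cols = [X, X ** 2]
    cols += [(X[:, i] * X[:, j])[:, None] for i, j in combinations(range(k), 2)]
    Z = np.column_stack([np.ones(n)] + cols)
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ g
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    df = Z.shape[1] - 1                        # regressors, excluding intercept
    lm = n * r2                                # LM statistic = n * R^2
    p = stats.chi2.sf(lm, df)
    return lm, p
```

Note how quickly the auxiliary regressor count grows: with k variables it is k originals + k squares + k(k-1)/2 cross products, exactly the disadvantage the slide mentions.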

24
Heteroskedasticity Solutions
• GLS / Weighted Least Squares
• In a perfect world, we would actually know what
heteroskedasticity we could expect, and we would
then use weighted least squares.
• WLS essentially transforms the entire equation by
dividing every part of the equation by
the square root of whatever it is that one thinks
the variance is related to.
• In other words, if one thinks the variance of
the error terms is related to X1², then one
divides every element of the equation
(intercept, each bX term, residual) by X1.

25
Heteroskedasticity Solutions
• GLS / Weighted Least Squares
• In this way, one creates a transformed equation
where the variance of the error term is now
constant (because you've weighted it
appropriately).
• Note, however, that since the equation has been
transformed, the parameter estimates are
different than in the non-transformed version: in
the example above, for b2, you have the effect of
X2/X1 on Y, not the effect of X2 on Y. So, you
need to think about that when you are
interpreting the results.
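The divide-through transformation can be sketched as below. My own illustration, assuming (as in the slides' example) that the error variance is proportional to X1²; the function name is hypothetical.

```python
import numpy as np

def wls_by_transform(y, X, x1):
    """If one thinks Var(e) is proportional to x1^2, divide every term,
    including the intercept column, by x1 and run OLS on the result."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])      # intercept + regressors
    y_t = y / x1                               # transformed dependent variable
    X_t = Xc / x1[:, None]                     # every column divided by x1
    beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta                                # b's on the original scale

# usage: with y = 1 + 2*x1 + x1*noise, the transformed regression has a
# homoskedastic error, and beta recovers (1, 2)
```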

26
Heteroskedasticity Solutions
• However...
• We almost never know the precise form that we
expect heteroskedasticity to take.
• So, in general, we ask the software package to
give us White's Heteroskedasticity-Consistent
Variances and Standard Errors (White's robust
standard errors). (Alternatively, and less
commonly, Newey-West is similar.)
• (For those of you who have dealt with
clustering: the basic idea here is somewhat
similar, except that in clustering, you identify
an X that you believe your data are clustered
on. When I have repeated states in a
database, that is, multiple cases from California,
etc., I might want to cluster on state (or, if I
have repeated legislators, I could cluster on
legislator, etc.). In general, it's a
recognition that the error terms will be related
to those repeated observations: the goodness of
fit within the observations from California will
be better than the goodness of fit across the
observations from all states.)
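What the software computes for White's robust standard errors can be sketched as the familiar sandwich estimator (the HC0 variant; this is my own illustration, and real packages offer small-sample corrections such as HC1-HC3 as well):

```python
import numpy as np

def white_robust_se(y, X):
    """OLS estimates with White's heteroskedasticity-consistent (HC0)
    standard errors via the sandwich formula."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e = y - Xc @ beta                          # OLS residuals
    bread = np.linalg.inv(Xc.T @ Xc)
    meat = Xc.T @ (Xc * (e ** 2)[:, None])     # sum of e_i^2 * x_i x_i'
    V = bread @ meat @ bread                   # sandwich covariance
    return beta, np.sqrt(np.diag(V))
```

The point estimates are the ordinary OLS b's; only the standard errors change, which is exactly why this is the default remedy when the form of the heteroskedasticity is unknown.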