Advanced Quantitative Methods - PS 401 Notes

Transcript and Presenter's Notes

Title: Advanced Quantitative Methods - PS 401 Notes


1
Advanced Quantitative Methods - PS 401 Notes
Version as of 9/21/2000
  • Robert D. Duval
  • WVU Dept of Political Science
  • Class: 306E Woodburn, TTh 11:30-12:45
  • Office: 301A Woodburn, T 2:00-3:00, Th 1:00-3:00
  • Phone: 293-3811 x5299 / 293-4372 x13050
  • e-mail: bduval@wvu.edu

2
Introduction
  • This course is about Regression analysis.
  • The principal method in the social sciences.
  • Three basic parts to the course
  • An introduction to the general Model
  • The formal assumptions and what they mean.
  • Selected special topics

3
Syllabus
  • Required texts
  • Additional readings
  • Computer exercises
  • Course requirements
  • Midterm - in class, open book (30%)
  • Final - in class, open book (30%)
  • Research paper (30%)
  • Participation (10%)
  • http://www.polsci.wvu.edu/duval/ps401/401syl.html

4
Introduction The General Linear Model
  • The General Linear Model is a phrase used to
    indicate a class of statistical models which
    include simple linear regression analysis.
  • Regression is the predominant statistical tool
    used in the social sciences due to its simplicity
    and versatility.
  • Also called Linear Regression Analysis.

5
Simple Linear Regression The Basic Mathematical
Model
  • Regression is based on the concept of the simple
    proportional relationship - also known as the
    straight line.
  • We can express this idea mathematically!
  • Theoretical aside: All theoretical statements of
    relationship imply a mathematical theoretical
    structure.
  • Just because it isn't explicitly stated doesn't
    mean that the math isn't implicit in the language
    itself!

6
Alternate Mathematical Notation for the Line
  • Alternate Mathematical Notation for the straight
    line - don't ask why!
  • 10th Grade Geometry: y = mx + b
  • Statistics Literature: Y = a + bX + e
  • Econometrics Literature: Y = β0 + β1X + ε

7
Alternate Mathematical Notation for the Line
cont.
  • These are all equivalent. We simply have to live
    with this inconsistency.
  • We won't use the geometric tradition, and so you
    just need to remember that β0 and a are both the
    same thing.

8
Linear Regression the Linguistic Interpretation
  • In general terms, the linear model states that
    the dependent variable is directly proportional
    to the value of the independent variable.
  • Thus if we state that some variable Y increases
    in direct proportion to some increase in X, we
    are stating a specific mathematical model of
    behavior - the linear model.
  • Hence, if we say that the crime rate goes up as
    unemployment goes up, we are stating a simple
    linear model.

9
Linear RegressionA Graphic Interpretation

10
The linear model is represented by a simple
picture

11
The Mathematical Interpretation The Meaning of
the Regression Parameters
  • a: the intercept
  • the point where the line crosses the Y-axis
  • (the value of the dependent variable when all of
    the independent variables = 0)
  • b: the slope
  • the increase in the dependent variable per unit
    change in the independent variable (also known as
    the 'rise over the run')

12
The Error Term
  • Such models do not predict behavior perfectly.
  • So we must add a component to adjust or
    compensate for the errors in prediction.
  • Having fully described the linear model, the rest
    of the semester (as well as several more) will be
    spent on the error.

13
The Nature of Least Squares Estimation
  • There is 1 essential goal and there are 4
    important concerns with any OLS Model

14
The 'Goal' of Ordinary Least Squares
  • Ordinary Least Squares (OLS) is a method of
    finding the linear model which minimizes the sum
    of the squared errors.
  • Such a model provides the best
    explanation/prediction of the data.
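To make the "least squares" idea concrete, here is a minimal Python sketch (not from the slides; the data are invented) showing that the OLS line attains the smallest sum of squared errors, and that any other line does worse.

```python
import numpy as np

# Hypothetical data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)

def sse(a, b):
    """Sum of squared errors for the line y_hat = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

b_ols, a_ols = np.polyfit(x, y, 1)      # least-squares slope and intercept
print(sse(a_ols, b_ols))                # the minimized sum of squared errors
print(sse(a_ols, b_ols + 0.1))          # perturbing the slope increases the SSE
print(sse(a_ols + 0.5, b_ols))          # so does perturbing the intercept
```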

15
Why Least Squared error?
  • Why not simply minimum error?
  • The errors about the line sum to 0.0!
  • Minimum absolute deviation (error) models now
    exist, but they are mathematically cumbersome.
  • Try algebra with Absolute Value signs!

16
Other models are possible...
  • Best parabola...?
  • (i.e. nonlinear or curvilinear relationships)
  • Best maximum likelihood model ... ?
  • Best expert system...?
  • Complex Systems?
  • Chaos/Non-linear systems models
  • Catastrophe models
  • others

17
The Simple Linear Virtue
  • I think we overemphasize the linear model.
  • It does, however, embody this rather important
    notion that Y is proportional to X.
  • As noted, we can state such relationships in
    simple English.
  • As unemployment increases, so does the crime
    rate.
  • As domestic conflict increases, national leaders
    will seek to distract their populations by
    initiating foreign disputes.

18
The Notion of Linear Change
  • The linear aspect means that the same amount of
    increase in unemployment will have the same
    effect on crime at both low and high
    unemployment.
  • A nonlinear change would mean that as
    unemployment increases, its impact upon the crime
    rate might increase at higher unemployment levels.

19
Why squared error?
  • Because
  • (1) the sum of the errors expressed as deviations
    would be zero as it is with standard deviations,
    and
  • (2) some feel that big errors should be more
    influential than small errors.
  • Therefore, we wish to find the values of a and b
    that produce the smallest sum of squared errors.

20
Minimizing the Sum of Squared Errors
  • Who put the "Least" in OLS?
  • In mathematical jargon we seek to minimize the
    Unexplained Sum of Squares (USS), where
    USS = Σei² = Σ(Yi - Ŷi)²

21
The Parameter estimates
  • In order to do this, we must find parameter
    estimates which accomplish this minimization.
  • In calculus, if you wish to know when a function
    is at its minimum, you take the first
    derivative.
  • In this case we must take partial derivatives
    since we have two parameters (a and b) to worry
    about.
  • We will look closer at this, and it's not a pretty
    sight!
    sight!

22
Why squared error?
  • Because
  • (1) the sum of the errors expressed as
    deviations would be zero as it is with standard
    deviations, and
  • (2) some feel that big errors should be more
    influential than small errors.
  • Therefore, we wish to find the values of a and b
    that produce the smallest sum of squared errors.

23
Decomposition of the error in LS
24
Goodness of Fit
  • Since we are interested in how well the model
    performs at reducing error, we need to develop a
    means of assessing that error reduction. Since
    the mean of the dependent variable represents a
    good benchmark for comparing predictions, we
    calculate the improvement in the prediction of Yi
    relative to the mean of Y (the best guess of Y
    with no other information).

25
Sum of Squares Terminology
  • In mathematical jargon we seek to minimize the
    Unexplained Sum of Squares (USS), where
    USS = Σei² = Σ(Yi - Ŷi)²

26
Sums of Squares
  • This gives us the following 'sum-of-squares'
    measures
  • Total Variation = Explained Variation +
    Unexplained Variation
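A small numerical check of this decomposition (a sketch with invented data; x, y, and the fitted values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x                       # OLS predictions

tss = np.sum((y - y.mean()) ** 2)       # total variation
ess = np.sum((y_hat - y.mean()) ** 2)   # explained variation
uss = np.sum((y - y_hat) ** 2)          # unexplained variation (what OLS minimizes)
print(np.isclose(tss, ess + uss))       # True: total = explained + unexplained
print(ess / tss)                        # r-squared, previewed here
```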

27
Sums of Squares Confusion
  • Note: Occasionally you will run across ESS and
    RSS, which generate confusion since they can be
    used interchangeably. ESS can be error
    sums-of-squares or estimated or explained SSQ.
    Likewise RSS can be residual SSQ or regression
    SSQ. Hence the use of USS for Unexplained SSQ in
    this treatment.

28
The Parameter estimates
  • In order to do this, we must find parameter
    estimates which accomplish this minimization.
  • In calculus, if you wish to know when a function
    is at its minimum, you take the first derivative.
  • In this case we must take partial derivatives
    since we have two parameters to worry about.

29
Deriving the Parameter Estimates
  • Since USS = Σ(Yi - a - bXi)²,
  • We can take the partial derivative with respect
    to a and b

30
Deriving the Parameter Estimates (cont.)
  • Which simplifies to
    ∂USS/∂a = -2Σ(Yi - a - bXi)
    ∂USS/∂b = -2ΣXi(Yi - a - bXi)
  • We also set these derivatives to 0 to indicate
    that we are at a minimum.

31
Deriving the Parameter Estimates (cont.)
  • We now add a hat to the parameters (â and b̂) to
    indicate that the results are estimators.
  • We also set these derivatives equal to zero.

32
Deriving the Parameter Estimates (cont.)
  • Dividing through by -2 and rearranging terms, we
    get the normal equations:
    ΣYi = n·â + b̂ΣXi
    ΣXiYi = âΣXi + b̂ΣXi²

33
Deriving the Parameter Estimates (cont.)
  • We can solve these equations simultaneously to
    get our estimators.

34
Deriving the Parameter Estimates (cont.)
  • The estimator for a is â = Ȳ - b̂X̄, which shows
    that the regression line always goes through the
    point (X̄, Ȳ), the intersection of the two means.
  • The estimator for b is
    b̂ = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
  • This formula is quite manageable for bivariate
    regression. If there are two or more independent
    variables, the formula for b2, etc. becomes
    unmanageable!
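The two closed-form estimators translate directly into code. A minimal sketch (the function name and data handling are my own, not the slides'):

```python
import numpy as np

def bivariate_ols(x, y):
    """Closed-form OLS estimates for Y = a + b*X + e (bivariate case only)."""
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a_hat = y.mean() - b_hat * x.mean()   # the line passes through (x-bar, y-bar)
    return a_hat, b_hat
```

For two or more independent variables, the matrix formula β̂ = (X'X)⁻¹X'Y derived later replaces this.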

35
Tests of Inference
  • t-tests for coefficients
  • F-test for entire model

36
T-Tests
  • Since we wish to make probability statements
    about our model, we must do tests of inference.
  • Fortunately,

37
This gives us the F test
38
Measures of Goodness of fit
  • The Correlation coefficient
  • r-squared

39
The correlation coefficient
  • A measure of how close the residuals are to the
    regression line
  • It ranges between -1.0 and 1.0
  • It is closely related to the slope.

40
R2 (r-square)
  • The r2 (or R-square) is also called the
    coefficient of determination.

41
Tests of Inference
  • t-tests for coefficients
  • F-test for entire model
  • Since we are interested in how well the model
    performs at reducing error, we need to develop a
    means of assessing that error reduction. Since
    the mean of the dependent variable represents a
    good benchmark for comparing predictions, we
    calculate the improvement in the prediction of Yi
    relative to the mean of Y (the best guess of Y
    with no other information).

42
Goodness of fit
  • The correlation coefficient
  • A measure of how close the residuals are to the
    regression line. It ranges between -1.0 and 1.0.
  • r2 (r-square)
  • The r-square (or R-square) is also called the
    coefficient of determination

43
The assumptions of the model
  • We will spend the next 4 weeks on this!

44
The Multiple Regression Model The Scalar Version
  • The basic multiple regression model is a simple
    extension of the bivariate equation. By adding
    extra independent variables, we are creating a
    multiple-dimensioned space, and the model is fit
    in that space. For instance, if there are two
    independent variables, we are fitting the points
    to a plane in space.

45
The Scalar Equation
  • The basic linear model

46
The Matrix Model
  • The multiple regression model may be easily
    represented in matrix terms: Y = Xβ + e,
  • where Y, X, β and e are all matrices of data,
    coefficients, or residuals.

47
The Matrix Model (cont.)
  • The matrices in Y = Xβ + e are represented by
    their element-by-element arrays.
  • Note that we postmultiply X by B since this order
    makes them conformable.

48
Assumptions of the modelScalar Version
  • The OLS model has seven fundamental assumptions.
    These assumptions form the foundation for all
    regression analysis. Failure of a model to
    conform to these assumptions frequently presents
    severe problems for estimation and inference.

49
The Assumptions of the ModelScalar Version
(cont.)
  • 1. The ei's are normally distributed.
  • 2. E(ei) = 0
  • 3. E(ei²) = σ²
  • 4. E(eiej) = 0 (i ≠ j)
  • 5. X's are nonstochastic with values fixed in
    repeated samples and Σ(Xik - X̄k)²/n is a finite
    nonzero number.
  • 6. The number of observations is greater than the
    number of coefficients estimated.
  • 7. No exact linear relationship exists between
    any of the explanatory variables.

50
The Assumptions of the ModelThe English Version
  • The errors have a normal distribution.
  • The residuals are homoskedastic.
  • There is no serial correlation.
  • There is no multicollinearity.
  • The Xs are fixed. (non-stochastic)
  • There are more data points than unknowns.
  • The model is linear.
  • OK, so it's not really English.

51
The Assumptions of the Model The Matrix Version
  • These same assumptions expressed in matrix format
    are
  • 1. e ~ N(0, Σ)
  • 2. Σ = σ²I
  • 3. The elements of X are fixed in repeated
    samples and (1/n)X'X is nonsingular and its
    elements are finite

52
Extra Material on OLS The Adjusted R2
  • Since R2 always increases with the addition of a
    new variable, the adjusted R2 compensates for
    added explanatory variables.

53
Extra Material on OLS The F-test
  • In addition, the F test for the entire model must
    be adjusted to compensate for the changed degrees
    of freedom.
  • Note that F increases as n or R² increases and
    decreases as k increases. Adding a variable will
    always increase R², but not necessarily adjusted
    R² or F. In addition, values of adjusted R² below
    0.0 are possible.
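Since the slide's own formulas are not reproduced above, here is a sketch using the standard textbook definitions of adjusted R² and the overall F statistic (k = number of regressors excluding the intercept); the helper name and example numbers are hypothetical.

```python
from scipy import stats

def adjusted_r2_and_F(r2, n, k):
    """Adjusted R-squared and overall F for a model with k regressors plus an intercept."""
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p_value = stats.f.sf(F, k, n - k - 1)        # upper-tail p-value
    return adj_r2, F, p_value

print(adjusted_r2_and_F(r2=0.40, n=30, k=5))     # hypothetical values
```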

54
Derivation of B's in matrix notation
  • Skip this material in PS 401
  • Given the matrix algebra model Y = Xβ + e (1.33),
  • we can replicate the least squares normal
    equations in matrix format. We need to minimize
    e'e, which is the sum of squared errors:
    e'e = (Y - Xβ)'(Y - Xβ) (1.34)
  • Setting the derivative equal to 0 we get
    -2X'Y + 2X'Xβ̂ = 0, so X'Xβ̂ = X'Y and
    β̂ = (X'X)⁻¹X'Y (1.35-1.38)
  • Note that X'X is called the sums-of-squares and
    cross-products matrix.
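A minimal NumPy sketch of β̂ = (X'X)⁻¹X'Y (invented data; the first column of X is the intercept):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])           # hypothetical true coefficients
y = X @ beta_true + rng.normal(0, 1, n)

XtX = X.T @ X                                    # sums-of-squares and cross-products matrix
beta_hat = np.linalg.solve(XtX, X.T @ y)         # (X'X)^-1 X'y, without forming the inverse explicitly
print(beta_hat)
```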

55
Properties of Estimators (?)
  • Since we are concerned with error, we will be
    concerned with those properties of estimators
    which have to do with the errors produced by the
    estimates - the e's.

56
Types of estimator error
  • Estimators are seldom exactly correct due to any
    number of reasons, most notably sampling error
    and biased selection. There are several important
    concepts that we need to understand in examining
    how well estimators do their job.

57
Sampling error
  • Sampling error is simply the difference between
    the true value of a parameter and its estimate in
    any given sample.
  • This sampling error means that an estimator will
    vary from sample to sample and therefore
    estimators have variance.

58
Bias
  • The bias of an estimate is the difference between
    its expected value and its true value.
  • If the estimator is always low (or high) then the
    estimator is biased.
  • An estimator is unbiased if E(â) = a
  • And E(b̂) = b

59
Mean Squared Error
  • The mean square error (MSE) is different from the
    estimator's variance in that the variance
    measures dispersion about the estimated parameter
    while mean squared error measures the dispersion
    about the true parameter.
  • If the estimator is unbiased then the variance
    and MSE are the same.

60
Mean Squared Error (cont.)
  • The MSE is important for time series and
    forecasting since it allows for both bias and
    efficiency
  • For instance
  • These concepts lead us to look at the properties
    of estimators. Estimators may behave differently
    in large and small samples, so we look at both
    the small and large (asymptotic) sample
    properties.

61
Small Sample Properties
  • These are the ideal properties. We desire these
    to hold.
  • Bias
  • Efficiency
  • Best Linear Unbiased Estimator

62
Bias
  • An estimator is unbiased if E(β̂) = β
  • In other words, the average value of the
    estimator in repeated sampling equals the true
    parameter.
  • Note that whether an estimator is biased or not
    implies nothing about its dispersion.

63
Efficiency
  • An estimator is efficient if it is unbiased and
    its variance is less than that of any other
    unbiased estimator of the parameter.
  • β̂ is unbiased
  • Var(β̂) ≤ Var(β̃), where β̃ is any other unbiased
    estimator of β
  • There might be instances in which we might choose
    a biased estimator, if it has a smaller variance.

64
BLUE (Best Linear Unbiased Estimate)
  • An estimator β̂ is described as a BLUE estimator
    if it
  • is a linear function of the sample observations
  • is unbiased
  • has Var(β̂) ≤ Var(β̃), where β̃ is any other linear
    unbiased estimator of β

65
What is a linear estimator?
  • A linear estimator is a weighted sum of the
    observations. Note that the sample mean,
    Ȳ = Σ(1/n)Yi, is an example of a linear estimator.

66
Asymptotic (Large Sample) Properties
  • Asymptotically unbiased
  • Consistency
  • Asymptotic efficiency

67
Asymptotic bias
  • An estimator is asymptotically unbiased if
    lim(n→∞) E(β̂n) = β

68
Consistency
  • The point at which a distribution collapses is
    called the probability limit (plim). If the bias
    and variance both decrease as n gets larger, the
    estimator is consistent.

69
Asymptotic efficiency
  • An estimator is asymptotically efficient if
  • it has an asymptotic distribution with finite mean
    and variance,
  • it is consistent, and
  • no other estimator has smaller asymptotic
    variance.

70
Rifle and Target Analogy
  • Small sample properties
  • Bias: The shots cluster around some spot other
    than the bulls-eye.
  • Efficient: When one rifle's cluster is smaller
    than another's.
  • BLUE: Smallest scatter for rifles of a
    particular type of simple construction.

71
Rifle and Target Analogy (cont.)
  • Asymptotic properties
  • Think of increased sample size as getting closer
    to the target. When all of the assumptions of
    the OLS model hold, its estimators are
  • unbiased
  • Minimum variance, and
  • BLUE

72
Assumption Violations How we will approach the
question.
  • Definition
  • Implications
  • Causes
  • Tests
  • Remedies

73
Non-zero Mean for the residuals (Definition)
  • Definition
  • The residuals have a mean other than 0.0.
  • Note that this refers to the true residuals.
    Hence the estimated residuals have a mean of 0.0,
    while the true residuals are non-zero.

74
Non-zero Mean for the residuals (Implications)
  • The true regression line has an intercept of
    (a + the mean of the errors).
  • Therefore the intercept is biased.
  • The slope, b, is unbiased. There is also no way
    of separating out a and the error mean.

75
Non-zero Mean for the residuals (Causes, Tests,
Remedies)
  • Causes Non-zero means result from some form of
    specification error. Something has been omitted
    from the model which accounts for that mean in
    the estimation.
  • We will discuss Tests and Remedies when we look
    closely at Specification errors.

76
Non-normally distributed errors Definition
  • The residuals are not NID(0, σ²)

Normality Tests Section
  Assumption   Value     Probability   Decision (5%)
  Skewness     5.1766    0.000000      Rejected
  Kurtosis     4.6390    0.000004      Rejected
  Omnibus      48.3172   0.000000      Rejected
77
Non-normally distributed errors Implications
  • The existence of residuals which are not normally
    distributed has several implications.
  • First is that it implies that the model is to
    some degree misspecified.
  • A collection of truly stochastic disturbances
    should have a normal distribution. The central
    limit theorem states that as the number of random
    variables increases, the sum of their
    distributions tends to be a normal distribution.

78
Non-normally distributed errors Implications
(cont.)
  • If the residuals are not normally distributed,
    then the estimators of a and b are also not
    normally distributed.
  • Estimates are, however, still BLUE.
  • Estimates are unbiased and have minimum variance.
  • They are no longer efficient, even though they
    are asymptotically unbiased and consistent.
  • It is only our hypothesis tests which are suspect.

79
Non-normally distributed errors Causes
  • Generally caused by a misspecification error.
  • Usually an omitted variable.
  • Can also result from
  • Outliers in data.
  • Wrong functional form.

80
Non-normally distributed errors Tests for
non-normality
  • Chi-Square goodness of fit
  • Since the cumulative normal frequency
    distribution has a chi-square distribution, we
    can test for the normality of the error terms
    using a standard chi-square statistic.
  • We take our residuals, group them, and count how
    many occur in each group, along with how many we
    would expect in each group.

81
Non-normally distributed errors Tests for
non-normality (cont.)
  • We then calculate the simple χ² statistic:
    χ² = Σ (Observed - Expected)² / Expected
  • This statistic has (N-1) degrees of freedom,
    where N is the number of classes.

82
Non-normally distributed errors Tests for
non-normality (cont.)
  • Jarque-Bera test
  • This test examines both the skewness and kurtosis
    of a distribution to test for normality.
  • JB = n·(S²/6 + (K - 3)²/24),
    where S is the skewness and K is the kurtosis of
    the residuals.
  • JB has a χ² distribution with 2 df.
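A sketch of the Jarque-Bera statistic as defined above (the function name is mine; SciPy's skew and kurtosis supply S and K):

```python
import numpy as np
from scipy import stats

def jarque_bera(residuals):
    """JB = n*(S^2/6 + (K-3)^2/24), compared to a chi-square with 2 df."""
    e = np.asarray(residuals)
    n = e.size
    s = stats.skew(e)
    k = stats.kurtosis(e, fisher=False)          # raw kurtosis; a normal distribution has K = 3
    jb = n * (s ** 2 / 6.0 + (k - 3.0) ** 2 / 24.0)
    return jb, stats.chi2.sf(jb, df=2)           # statistic and p-value
```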

83
Non-normally distributed errors Remedies
  • Try to modify your theory. Omitted variable?
    Outlier needing specification?
  • Modify your functional form by taking some
    variance transforming step such as square root,
    exponentiation, logs, etc.
  • Be mindful that you are changing the nature of
    the model.
  • Bootstrap it!

84
Multicollinearity Definition
  • Multicollinearity is the condition where the
    independent variables are related to each other.
    Causation is not implied by multicollinearity.
  • As any two (or more) variables become more and
    more closely correlated, the condition worsens,
    and approaches singularity.
  • Since the X's are supposed to be fixed, this is a
    sample problem.
  • Since multicollinearity is almost always present,
    it is a problem of degree, not merely existence.

85
Multicollinearity Implications
  • Consider the following cases
  • A) No multicollinearity
  • The regression would appear to be identical to
    separate bivariate regressions. This produces
    variances which are biased upward (too large),
    making t-tests too small. For multiple regression
    this satisfies the assumption.
  • B) Perfect Multicollinearity
  • Some variable Xi is a perfect linear combination
    of one or more other variables Xj; therefore X'X
    is singular, and |X'X| = 0.
  • A model cannot be estimated under such
    circumstances. The computer dies.
  • C. A high degree of Multicollinearity
  • When the independent variables are highly
    correlated, the variances and covariances of the
    Bi's are inflated (t-ratios are lower) and R²
    tends to be high as well.
  • The B's are unbiased (but perhaps useless due to
    their imprecise measurement as a result of their
    variances being too large). In fact they are
    still BLUE.
  • OLS estimates tend to be sensitive to small
    changes in the data.
  • Relevant variables may be discarded

86
Multicollinearity Implications
  • Consider the following cases
  • A) No multicollinearity
  • The regression would appear to be identical to
    separate bivariate regressions
  • This produces variances which are biased upward
    (too large) making t-tests too small.
  • For multiple regression this satisfies the
    assumption.

87
Multicollinearity Implications (cont.)
  • B) Perfect Multicollinearity
  • Some variable Xi is a perfect linear combination
    of one or more other variables Xj; therefore X'X
    is singular, and |X'X| = 0.
  • This is matrix algebra notation. It means that
    one variable is a perfect linear function of
    another (e.g. X2 = 3.2·X1).
  • A model cannot be estimated under such
    circumstances. The computer dies.

88
Multicollinearity Implications (cont.)
  • C. A high degree of Multicollinearity
  • When the independent variables are highly
    correlated, the variances and covariances of the
    Bi's are inflated (t-ratios are lower) and R²
    tends to be high as well.
  • The B's are unbiased (but perhaps useless due to
    their imprecise measurement as a result of their
    variances being too large). In fact they are
    still BLUE.
  • OLS estimates tend to be sensitive to small
    changes in the data.
  • Relevant variables may be discarded.

89
Multicollinearity Causes
  • Sampling mechanism. Poorly constructed design,
    measurement scheme, or limited range.
  • Statistical model specification: adding
    polynomial terms or trend indicators.
  • Too many variables in the model - the model is
    overdetermined.
  • Theoretical specification is wrong. Inappropriate
    construction of theory or even measurement.

90
Multicollinearity Tests/Indicators
  • |X'X| approaches 0
  • Since the determinant is a function of variable
    scale, this measure doesn't help a whole lot. We
    could, however, use the determinant of the
    correlation matrix and therefore bound the range
    from 0.0 to 1.0.

91
Multicollinearity Tests/Indicators (cont.)
  • Tolerance: TOLj = 1 - Rj², where Rj² comes from
    regressing Xj on the other independent variables.
  • If the tolerance equals 1, the variables are
    unrelated. If TOLj = 0, then they are perfectly
    correlated.
  • Variance Inflation Factors (VIFs):
    VIFj = 1 / TOLj
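A sketch of computing tolerance and VIF by auxiliary regressions, one per column of X (the helper is my own, not the slides'; X here excludes the intercept column):

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of X via auxiliary regressions."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        xj = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # regress Xj on the other X's
        coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        resid = xj - Z @ coef
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        tol = 1.0 - r2_j
        results.append((tol, 1.0 / tol))
    return results
```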

92
Interpreting VIFs
  • No multicollinearity produces VIFs = 1.0
  • If the VIF is greater than 10.0, then
    multicollinearity is probably severe: 90% of the
    variance of Xj is explained by the other X's.
  • In small samples, a VIF of about 5.0 may indicate
    problems

93
Multicollinearity Tests/Indicators (cont.)
  • R² deletes - tries all possible models of X's and
    includes/excludes based on small changes in
    R² with the inclusion/omission of the variables
    (taken 1 at a time)
  • Multicollinearity is of concern when either
  • F is significant, but no t value is, or
  • Adjusted R² declines with a new variable.

94
Multicollinearity Tests/Indicators (cont.)
  • I would avoid the rule of thumb.
  • Betas are > 1.0 or < -1.0
  • Sign changes occur with the introduction of a new
    variable
  • The R² is high, but few t-ratios are.
  • Eigenvalues and Condition Index - If this topic
    is beyond Gujarati, it's beyond me.

95
Multicollinearity Remedies
  • Increase sample size
  • Omit Variables
  • Scale Construction/Transformation
  • Factor Analysis
  • Constrain the estimation. Such as the case where
    you can set the value of one coefficient relative
    to another.

96
Multicollinearity Remedies (cont.)
  • Change design (LISREL maybe or Pooled
    cross-sectional Time series)
  • Ridge Regression
  • This technique introduces a small amount of bias
    into the coefficients to reduce their variance.
  • Ignore it - report adjusted r2 and claim it
    warrants retention in the model.

97
Heteroskedasticity Definition
  • Heteroskedasticity is a problem where the error
    terms do not have a constant variance.
  • That is, they may have a larger variance when
    values of some Xi (or the Yis themselves) are
    large (or small).

98
Heteroskedasticity Definition
  • This often gives the plots of the residuals by
    the dependent variable or appropriate independent
    variables a characteristic fan or funnel shape.

99
Heteroskedasticity Implications
  • The regression B's are unbiased.
  • But they are no longer the best estimator. They
    are not BLUE (not minimum variance - hence not
    efficient).
  • They are, however, consistent.

100
Heteroskedasticity Implications (cont.)
  • The estimator variances are not asymptotically
    efficient, and they are biased.
  • So confidence intervals are invalid.
  • What do we know about the bias of the variance?
  • If Yi is positively correlated with ei, the bias
    is negative (hence t values will be too large).
  • With positive bias, many t's are too small.

101
Heteroskedasticity Implications (cont.)
  • Types of Heteroskedasticity
  • There are a number of types of heteroskedasticity.
  • Additive
  • Multiplicative
  • ARCH (Autoregressive conditional heteroskedastic)
    - a time series problem.

102
Heteroskedasticity Causes
  • It may be caused by
  • Model misspecification - omitted variable or
    improper functional form.
  • Learning behaviors across time
  • Changes in data collection or definitions.
  • Outliers or breakdown in model.
  • Frequently observed in cross sectional data sets
    where demographics are involved (population, GNP,
    etc).

103
Heteroskedasticity Tests
  • Informal Methods
  • Graph the data and look for patterns!

104
Heteroskedasticity Tests (cont.)
  • Park test
  • As an exploratory test, log the squared residuals
    and regress them on the logged values of the
    suspected independent variable.
  • If the B is significant, then heteroskedasticity
    may be a problem.

105
Heteroskedasticity Tests (cont.)
  • Glejser Test
  • This test is quite similar to the park test,
    except that it uses the absolute values of the
    residuals, and a variety of transformed Xs.
  • A significant B2 indicates heteroskedasticity.
  • Easy test, but has problems.

106
Heteroskedasticity Tests (cont.)
  • Goldfeld-Quandt test
  • Order the n cases by the X that you think is
    correlated with ei2.
  • Drop a section of c cases out of the middle
    (one-fifth is a reasonable number).
  • Run separate regressions on both upper and lower
    samples.

107
Heteroskedasticity Tests (cont.)
  • Goldfeld-Quandt test (cont.)
  • Do an F-test for the difference in error
    variances. F has (n - c - 2k)/2 degrees of
    freedom for each.
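A rough sketch of the Goldfeld-Quandt procedure for the bivariate case (the names and the drop fraction are my choices; the slide does not prescribe code):

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(x, y, drop_frac=0.2):
    """Sort by x, drop the middle c cases, and compare error variances from two sub-regressions."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    c = int(drop_frac * n)                       # cases dropped from the middle
    m = (n - c) // 2                             # size of each outer sample

    def rss(xs, ys):                             # residual sum of squares from a bivariate fit
        b, a = np.polyfit(xs, ys, 1)
        e = ys - (a + b * xs)
        return e @ e

    k = 2                                        # parameters in each sub-regression (a and b)
    df = (n - c - 2 * k) // 2
    F = rss(x[n - m:], y[n - m:]) / rss(x[:m], y[:m])   # large F: variance grows with x
    return F, stats.f.sf(F, df, df)
```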

108
Heteroskedasticity Tests (cont.)
  • Breusch-Pagan-Godfrey Test (Lagrangian Multiplier
    test)
  • Estimate model with OLS
  • Obtain the residuals and σ̃² = Σei²/n
  • Construct the variables pi = ei²/σ̃²

109
Heteroskedasticity Tests (cont.)
  • Breusch-Pagan-Godfrey Test (cont.)
  • Regress pi on the X (and other?!) variables
  • Calculate Θ = ESS/2 from that auxiliary regression
  • Note that Θ is asymptotically distributed as χ²
    with (m - 1) df, where m is the number of
    variables in the auxiliary regression.

110
Heteroskedasticity Tests (cont.)
  • White's Generalized Heteroskedasticity test
  • Estimate model with OLS and obtain residuals
  • Run the following auxiliary regression
  • Higher powers may also be used, along with more
    X's.

111
Heteroskedasticity Tests (cont.)
  • White's Generalized Heteroskedasticity test
    (cont.)
  • Note that n·R² from the auxiliary regression is
    asymptotically distributed as χ².
  • The degrees of freedom are the number of
    coefficients estimated above.
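A sketch of White's test as described: regress the squared OLS residuals on the X's, their squares, and cross-products, then compare n·R² to a χ² (the helper name is mine; X excludes the intercept column):

```python
import numpy as np
from scipy import stats

def white_test(X, resid):
    """White's general heteroskedasticity test via an auxiliary regression on e^2."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]  # squares and cross-products
    Z = np.column_stack(cols)
    e2 = resid ** 2
    coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ coef
    r2 = 1.0 - u @ u / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2                                   # n * R-squared from the auxiliary regression
    df = Z.shape[1] - 1                           # coefficients estimated, excluding the constant
    return lm, stats.chi2.sf(lm, df)
```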

112
Heteroskedasticity Remedies
  • GLS
  • We will cover this after autocorrelation
  • Weighted Least Squares
  • si² is a consistent estimator of σi²
  • use the same (BLUE) formula to get a and β

113
  • Iteratively weighted least squares (IWLS)
  • Uses BLUE
  • The variance of ei equals σi²
  • Obtain estimates of a and β using OLS
  • Use these to get "1st round" estimates of si²
  • Using the formula above, replace wi with 1/si² and
    obtain new estimates for a and β.
  • Use these to re-estimate si².
  • Repeat Step 2 until a and β converge.
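A minimal weighted-least-squares step, the building block of the IWLS loop above (a sketch; the weights wi = 1/si² must come from some first-stage estimate of the error variances, which is not modeled here):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: choose beta to minimize sum_i w_i * e_i^2, with w_i = 1 / sigma_i^2."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Iterating this step, re-estimating si² from the new residuals and rebuilding the weights until the coefficients converge, is the IWLS procedure described on the slide.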

114
Heteroskedasticity Remedies (cont.)
  • White's corrected standard errors
  • Discussion beyond this course
  • Some software will calculate these.
  • (SHAZAM, TSP)

115
Autocorrelation Definition
  • Autocorrelation is simply the presence of
    standard correlation between adjacent residuals.
  • If a residual is negative (positive) then its
    neighbors tend to also be negative (positive).
  • Most often autocorrelation is between adjacent
    observations, however, lagged or seasonal
    patterns can also occur.
  • Autocorrelation is also usually a function of
    order by time, but it can occur for other orders
    as well.

116
Autocorrelation Definition (cont.)
  • The assumption violated is
  • Meaning that the Pearson's r between the
    residuals from OLS and the same residuals lagged
    one period is non-zero.

117
Autocorrelation Definition (cont.)
  • Most autocorrelation is what we call 1st order
    autocorrelation, meaning that the residuals are
    related to their contiguous values.
  • For instance: et = ρet-1 + ut

118
Autocorrelation Definition (cont.)
  • Types of Autocorrelation
  • Autoregressive processes
  • Moving Averages

119
Autocorrelation Definition (cont.)
  • Autoregressive processes AR(p)
  • The residuals are related to their preceding
    values.
  • This is classic 1st order autocorrelation

120
Autocorrelation Definition (cont.)
  • Autoregressive processes (cont.)
  • In 2nd order autocorrelation the residuals are
    related to their t-2 values as well
  • Larger order processes may occur as well

121
Autocorrelation Definition (cont.)
  • Moving Average Processes MA(q)
  • The error term is a function of some random error
    plus a portion of the previous random error.

122
Autocorrelation Definition (cont.)
  • Moving Average Processes (cont.)
  • Higher order processes for MA(q) also exist.
  • The error term is a function of some random error
    plus a portion of the previous random error.

123
Autocorrelation Definition (cont.)
  • Mixed processes ARMA(p,q)
  • The error term is a complex function of both
    autoregressive and moving average processes.

124
Autocorrelation Definition (cont.)
  • There are substantive interpretations that can be
    placed on these processes.
  • AR processes represent shocks to systems that
    have long-term memory.
  • MA processes are quick shocks to systems that
    absorb the shock but have only short-term memory.

125
Autocorrelation Implications
  • Coefficient estimates are unbiased, but the
    estimates are not BLUE
  • The variances are often greatly underestimated
    (biased small)
  • Hence hypothesis tests are exceptionally suspect.

126
Autocorrelation Causes
  • Specification error
  • Omitted variable, e.g. inflation
  • Wrong functional form
  • Lagged effects
  • Data transformations
  • Interpolation of missing data
  • Differencing

127
Autocorrelation Tests
  • Observation of residuals
  • Graph/plot them!
  • Runs of signs
  • Geary test

128
Autocorrelation Tests (cont.)
  • Durbin-Watson d
  • Criteria for the hypothesis of autocorrelation:
  • Reject if d < dL
  • Do not reject if d > dU
  • Test is inconclusive if dL ≤ d ≤ dU.
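The d statistic itself is d = Σ(et - et-1)² / Σet², which is approximately 2(1 - ρ̂); hence values near 2 indicate no first-order autocorrelation. A one-line sketch:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no AR(1)."""
    e = np.asarray(resid)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```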

129
Autocorrelation Tests (cont.)
  • Durbin-Watson d (cont.)
  • Note that the d is symmetric about 2.0, so that
    negative autocorrelation will be indicated by a d
    > 2.0.
  • Use the same distances above 2.0 as upper and
    lower bounds.

130
Autocorrelation Tests (cont.)
  • Durbin's h
  • Cannot use the DW d if there is a lagged
    endogenous variable in the model
  • sc² is the estimated variance of the Yt-1 term
  • h has a standard normal distribution

131
Autocorrelation Tests (cont.)
  • Tests for higher order autocorrelation
  • Ljung-Box Q (χ² statistic)
  • Portmanteau test
  • Breusch-Godfrey

132
Autocorrelation Remedies
  • Generalized Least Squares
  • Later!
  • First difference method
  • Take 1st differences of your X's and Y
  • Regress ΔY on ΔX
  • Assumes that ρ = 1!
  • Generalized differences
  • Requires that ρ be known.

133
Autocorrelation Remedies
  • Cochrane-Orcutt method
  • (1) Estimate the model using OLS and obtain the
    residuals, ut.
  • (2) Using the residuals, run the following
    regression: ut = ρut-1 + vt

134
Autocorrelation Remedies (cont.)
  • Cochrane-Orcutt method (cont.)
  • (3) Using the ρ̂ obtained, perform the regression
    on the generalized differences.
  • (4) Substitute the values of B1 and B2 into the
    original regression to obtain new estimates of
    the residuals.
  • (5) Return to step 2 and repeat until ρ̂ no
    longer changes.
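A sketch of the iteration for the bivariate case (my own translation of steps 1-5; the variable names and convergence tolerance are hypothetical):

```python
import numpy as np

def cochrane_orcutt(x, y, tol=1e-6, max_iter=50):
    """Cochrane-Orcutt iteration for Y = a + b*X with AR(1) errors."""
    b, a = np.polyfit(x, y, 1)                    # step 1: OLS estimates
    rho = 0.0
    for _ in range(max_iter):
        u = y - (a + b * x)                       # residuals
        rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)   # step 2: regress u_t on u_{t-1}
        y_star = y[1:] - rho_new * y[:-1]         # step 3: generalized differences
        x_star = x[1:] - rho_new * x[:-1]
        b, a_star = np.polyfit(x_star, y_star, 1)
        a = a_star / (1.0 - rho_new)              # step 4: recover the original intercept
        converged = abs(rho_new - rho) < tol      # step 5: stop once rho settles
        rho = rho_new
        if converged:
            break
    return a, b, rho
```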

135
Model Specification Definition
  • The analyst should understand one fundamental
    truth about statistical models. They are all
    misspecified.
  • We exist in a world of incomplete information at
    best. Hence model misspecification is an
    ever-present danger. We do, however, need to come
    to terms with the problems associated with
    misspecification so we can develop a feeling for
    the quality of information, description, and
    prediction produced by our models.

136
Model Specification Definition (cont.)
  • There are basically 4 types of misspecification
    we need to examine
  • functional form
  • inclusion of an irrelevant variable
  • exclusion of a relevant variable
  • measurement error and misspecified error term

137
Model Specification Implications
  • If an omitted variable is correlated with the
    included variables, the estimates are biased as
    well as inconsistent.
  • In addition, the error variance is incorrect, and
    usually overestimated.
  • If the omitted variable is uncorrelated with the
    included variables, the errors are still biased,
    even though the B's are not.

138
Model Specification Implications
  • Incorrect functional form can result in
    autocorrelation or heteroskedasticity.
  • See these sections for the implications of each
    problem.

139
Model Specification Causes
  • This one is easy - theoretical design.
  • something is omitted, irrelevantly included,
    mismeasured or non-linear.
  • This problem is explicitly theoretical.

140
Model Specification Tests
  • Actual Specification Tests
  • No test can reveal poor theoretical construction
    per se.
  • The best indicator that your model is
    misspecified is the discovery that the model has
    some undesirable statistical property, e.g. a
    misspecified functional form will often be
    indicated by a significant test for
    autocorrelation.
  • Sometimes time-series models will have negative
    autocorrelation as a result of poor design.

141
Model Specification Tests
  • Specification Criteria for lagged designs
  • Most useful for comparing time series models with
    same set of variables, but differing number of
    parameters

142
Model Specification Tests (cont)
  • Schwarz Criterion
  • where σ̂² equals RSS/n, m is the number of lags
    (variables), and n is the number of observations
  • Note that this is designed for time series.

143
Model Specification Tests (cont)
  • AIC (Akaike Information Criterion)
  • Both of these criteria (AIC and Schwarz) are to
    be minimized for improved model specification.
    Note that they both have a lower bound which is a
    function of sample size and number of parameters.
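The slide's own formulas are not reproduced above, so this sketch uses one common log form of the two criteria (other equivalent conventions exist); smaller values indicate a better trade-off of fit against the number of parameters.

```python
import numpy as np

def aic_sic(rss, n, k):
    """AIC = n*ln(RSS/n) + 2k;  SIC (Schwarz/BIC) = n*ln(RSS/n) + k*ln(n)."""
    sigma2 = rss / n
    return n * np.log(sigma2) + 2 * k, n * np.log(sigma2) + k * np.log(n)

print(aic_sic(rss=120.0, n=50, k=3))   # hypothetical numbers
```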

144
Model Specification Remedies
  • Model Building
  • A. "Theory Trimming" (Pedhauzer 616)
  • B. Hendry and the LSE school of top-down
    modeling.
  • C. Nested Models
  • D. Stepwise Regression.
  • Stepwise regression is a process of including the
    variables in the model one step at a time.
    This is a highly controversial technique.

145
Model Specification Remedies (cont.) Stepwise
Regression
  • Twelve things someone else says are wrong with
    stepwise
  • Philosophical Problems
  • 1. Completely atheoretical
  • 2. Subject to spurious correlation
  • 3. Information tossed out - insignificant
    variables may be useful
  • 4. Computer replacing the scientist
  • 5. Utterly mechanistic

146
Model Specification Remedies (cont.) Stepwise
Regression
  • Statistical
  • 6. Population model from sample data
  • 7. Large N - statistical significance can be an
    artifact
  • 8. Inflates the alpha level
  • 9. The scientist becomes beholden to the
    significance tests
  • 10. Overestimates the effect of the variables
    added early, and underestimates the variables
    added later
  • 11. Prevents data exploration
  • 12. Not even least squares for stagewise

147
Model Specification Remedies (cont.) Stepwise
Regression
  • Twelve Responses
  • The selection of data for the procedure
    implies some minimal level of theorization
  • All analysis is subject to spurious correlation.
    If you think it might be spurious, omit it.
  • True - but this can happen anytime
  • All the better
  • If it "works", is this bad? We use statistical
    decision rules in a mechanistic manner

148
Model Specification Remedies (cont.) Stepwise
Regression
  • this is true of regular regression as well
  • This is true of regular regression as well
  • No
  • No more than OLS
  • Not true
  • Also not true - this is a data exploration
    technique
  • Huh? Antiquated view of stepwise...probably not
    accurate in last 20 years

149
Measurement Error
  • Not much to say yet. If the measurement error is
    random, estimates are unbiased, but results are
    weaker. If measurement is biased, results are
    biased.
