Multiple Regression
Transcript and Presenter's Notes

Title: Multiple Regression


1
Multiple Regression
2
From last time
  • There were questions about the bowed shape of the
    confidence limits around the regression line. Both
    the limits around the mean and the limits around
    individuals should be curved because they are
    estimates.
  • In theory, if you knew the population values you
    would not need to bow the CLs around the
    individuals. You would just shift the density
    around following the regression line.

3
CLs
  • You do not have the same precision across the
    entire range of x.
  • The CLs around the mean response at x_0 are based on
    \hat{y}_0 \pm t_{n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
  • The CLs for a new person have the distance from
    the mean of x in the formula as well, plus the
    overall estimated variance in y (s^2) for the
    person's own variability:
    \hat{y}_0 \pm t_{n-2}\, \sqrt{s^2 + s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} \right)}
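As a quick illustration (my sketch, not from the slides), R's predict() computes both kinds of intervals; the data here are made up:

    # Hypothetical simple regression data
    set.seed(1)
    x <- rnorm(50, mean = 10, sd = 2)
    y <- 3 + 2 * x + rnorm(50, sd = 4)
    fit <- lm(y ~ x)
    grid <- data.frame(x = seq(min(x), max(x), length.out = 20))

    # CLs around the mean response (narrower, bowed around the mean of x)
    predict(fit, grid, interval = "confidence")

    # CLs for a new individual (wider: adds the overall variance in y)
    predict(fit, grid, interval = "prediction")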
4
Multiple Regression
  • You have seen that you can include polynomials in
    a regression model. What about including entirely
    different predictors? Say you want to predict the
    blood pressure of mad scientists before they give
    talks at an international conference. You could
    predict with many single predictors or with all of
    them as a set (see the sketch after this list):
  • age
  • the size of the audience
  • number of hawt potential evil lackeys in the
    front row
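A minimal R sketch of the two approaches (the data frame bp_data and its columns are hypothetical):

    # bp_data: one row per mad scientist, with columns
    # bp (blood pressure), age, audience, lackeys
    fit_age <- lm(bp ~ age, data = bp_data)        # one predictor at a time
    fit_all <- lm(bp ~ age + audience + lackeys,   # all three as a set
                  data = bp_data)
    summary(fit_all)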

5
Explaining variance
[Venn diagram: the total variance in blood pressure, with overlapping regions explained by audience size, lackey quality, and age]
6
Explained Multivariate Variance
  • The total variance explained depends on how
    correlated the predictors are.
  • You want a global R2 to indicate the amount of
    variance explained by the model as a whole, and
    also measures of the contributions of the
    individual predictors.

7
Multicollinearity
  • Even though audience size and lackey quality both
    allow you to predict the blood pressure of the mad
    scientists, the amount of variance that is
    uniquely associated with the lackey variable is
    very small. When you have very correlated
    predictors, they can't uniquely explain variance.
  • You can end up with a model that is statistically
    significant with a big R2 but in which none of the
    individual predictors is statistically significant
    (see the sketch below).
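A small R simulation (my numbers, not the slides') of that trap: two nearly collinear predictors give a significant overall model, but neither coefficient is individually significant:

    set.seed(42)
    n <- 30
    audience <- rnorm(n)
    lackeys  <- audience + rnorm(n, sd = 0.05)   # almost perfectly correlated
    bp       <- 120 + 2 * audience + rnorm(n, sd = 2)

    fit <- lm(bp ~ audience + lackeys)
    summary(fit)  # overall F significant and R^2 large,
                  # yet neither t test reaches significance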

8
Stop it before it starts
  • Before you do a multiple regression, use subject
    matter knowledge to remove highly correlated
    predictors. Look at the bivariate correlation
    coefficients between the predictors (one way to do
    this is sketched below). If you have highly
    correlated variables, use subject matter knowledge
    and pragmatism to decide which variable to put in
    the model.
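One way to do that check in R (bp_data is hypothetical, as above):

    # Pairwise correlations among the candidate predictors
    cor(bp_data[, c("age", "audience", "lackeys")])

    # A scatterplot matrix tells the same story visually
    pairs(bp_data[, c("age", "audience", "lackeys")])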

9
Partitioning Variance
  • What is the unique contribution of each variable?
  • There are different formulas for adding up the
    sums of squares (SS) in the variance; see the
    sketch after this list.
  • Sequential (aka Type 1 SS): let the first variable
    explain as much variance as it can, then add in
    the second and see if it can explain any of the
    remaining variance, and so on.
  • Simultaneous (aka Type 2 SS): put all the
    variables in at the same time and let them divvy
    up the common variance.
  • Simultaneous with interactions (aka Type 3 SS):
    each variable tries to explain all it can, with
    the interactions it is used in taken into account.
  • Type 4 SS makes my head hurt.
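A sketch of the three types in R (hypothetical model; Anova() from the car package handles Type 2 and Type 3):

    library(car)  # provides Anova()

    fit <- lm(bp ~ age * audience, data = bp_data)

    anova(fit)            # sequential (Type 1) SS: order of entry matters
    Anova(fit, type = 2)  # Type 2 SS
    Anova(fit, type = 3)  # Type 3 SS (see the contrasts caveat on the next slide)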

10
SAS vs. R
  • S-Plus and SAS use the same formulas for SS, but R
    does not use the same formula for Type 3 SS (and I
    have never tested it on Type 4 SS).
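One likely source of the Type 3 discrepancy (my note, not the slides'): R defaults to treatment contrasts, while SAS-style Type 3 tests assume sum-to-zero contrasts for factors:

    # Switch factors to sum-to-zero contrasts before fitting,
    # so car::Anova(type = 3) matches SAS Type 3 results
    options(contrasts = c("contr.sum", "contr.poly"))
    fit <- lm(bp ~ lab * audience, data = bp_data)  # lab: a hypothetical factor
    car::Anova(fit, type = 3)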

11
Partial and Semipartial Correlations
  • If you want to look at the correlation between a
    predictor (a) and an outcome (z) while controlling
    for the impact of a second predictor (b), you can
    do a partial correlation to remove the impact of b
    on both.
  • If you want to remove the impact of b on a only,
    you can do a semipartial correlation (see the
    sketch below).
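Both can be computed from residuals in R (a, b, and z are hypothetical vectors):

    # Partial correlation: remove b from both a and z
    r_partial <- cor(resid(lm(z ~ b)), resid(lm(a ~ b)))

    # Semipartial correlation: remove b from a only
    r_semipartial <- cor(z, resid(lm(a ~ b)))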

12
Hierarchical Stepwise Regression
  • If you take 2nd- and 3rd-quarter statistics you
    will learn the details of how to compare two
    models. Hierarchical stepwise regression is the
    process of figuring out which variables matter (in
    advance) and adding them to a model to see if the
    quality of the model improves as you add them.
  • People frequently look at the R2 for the models
    and/or use AIC (see the sketch below).
  • This is a fine thing to do as long as you keep
    track of the comparisons you make and report them.
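A sketch of one such comparison in R (hypothetical bp_data, models chosen in advance):

    m1 <- lm(bp ~ age, data = bp_data)
    m2 <- lm(bp ~ age + audience, data = bp_data)

    anova(m1, m2)          # F test: does audience improve the model?
    summary(m2)$r.squared  # R^2 of the larger model
    AIC(m1, m2)            # lower AIC is better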

13
Automatic Stepwise Regression
  • These are BAD BAD BAD.
  • You feed the software a set of variables and tell
    it to put them into the model one at a time to
    find the predictor that explains the most outcome
    variance. Once that is in the model, it adds each
    of the remaining variables one at a time to see if
    the residual variance is reduced by a second
    variable. It repeats this over and over. Some of
    the methods subtract variables instead of adding
    them (others do both).
  • The Type 1 error rate is astronomical.
  • These methods have horrible properties. Adding in
    completely random variables affects the model (see
    the sketch below).
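A sketch (my illustration) of why: feed forward selection pure noise and it will still "find" predictors:

    set.seed(7)
    n <- 100
    noise <- as.data.frame(matrix(rnorm(n * 20), n, 20))  # 20 random predictors
    noise$y <- rnorm(n)  # outcome unrelated to all of them

    full <- lm(y ~ ., data = noise)
    sel  <- step(lm(y ~ 1, data = noise), direction = "forward",
                 scope = formula(full), trace = 0)
    summary(sel)  # typically retains several pure-noise variables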