1
Regression(2) Multiple Linear Regression and
Path Analysis
  • Hal Whitehead
  • BIOL4062/5062

2
Multiple Linear Regression and Path Analysis
  • Multiple linear regression
  • assumptions
  • parameter estimation
  • hypothesis tests
  • selecting independent variables
  • collinearity
  • polynomial regression
  • Path analysis

3
Regression
  • One dependent variable: Y
  • Independent variables: X1, X2, X3, ...

4
Purposes of Regression
  • 1. Relationship between Y and X's
  • 2. Quantitative prediction of Y
  • 3. Relationship between Y and X controlling for
    C
  • 4. Which of X's are most important?
  • 5. Best mathematical model
  • 6. Compare regression relationships: Y1 on X,
    Y2 on X
  • 7. Assess interactive effects of X's

5
  • Simple regression: one X
  • Multiple regression: two or more X's
  • Y = β0 + β1X(1) + β2X(2) + β3X(3) + ... + βkX(k) + E

6
Multiple linear regression: assumptions (1)
  • For any specific combination of X's, Y is a
    (univariate) random variable with a certain
    probability distribution having finite mean and
    variance (Existence)
  • Y values are statistically independent of one
    another (Independence)
  • Mean value of Y given the X's is a linear
    function of the X's (Linearity)

7
Multiple linear regression: assumptions (2)
  • The variance of Y is the same for any fixed
    combinations of X's (Homoscedasticity)
  • For any fixed combination of X's, Y has a normal
    distribution (Normality)
  • There are no measurement errors in the X's (X's
    measured without error)

8
Multiple linear regression: parameter estimation
  • Y = β0 + β1X(1) + β2X(2) + β3X(3) + ... + βkX(k) + E
  • Estimate the β's in multiple regression using
    least squares (see the sketch after this slide)
  • Sizes of the coefficients are not good indicators of
    the importance of the X variables
  • Number of data points in multiple regression:
  • at least one more than the number of X's
  • preferably 5 times the number of X's
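A minimal sketch of least-squares estimation with NumPy; the simulated data
and all names here are hypothetical illustrations, not from the course:

  import numpy as np

  # Simulated data (hypothetical): n = 50 points, k = 2 independent variables
  rng = np.random.default_rng(0)
  n, k = 50, 2
  X = rng.normal(size=(n, k))
  beta_true = np.array([1.0, 2.0, -0.5])           # β0, β1, β2
  y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.3, size=n)

  # Design matrix: a column of ones carries the intercept β0
  Xd = np.column_stack([np.ones(n), X])

  # Least-squares estimates of the β's
  beta_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]
  print(beta_hat)                                  # close to beta_true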

9
Why do Large Animals have Large
Brains? (Schoenemann, Brain Behav. Evol. 2004)
N = 39
  • Multiple regression of Y = Log(CNS) on:
  • X's           β      SE(β)
  • Log(Mass)    -0.49   (0.70)
  • Log(Fat)     -0.07   (0.10)
  • Log(Muscle)   1.03   (0.54)
  • Log(Heart)    0.42   (0.22)
  • Log(Bone)    -0.07   (0.30)

10
Multiple linear regression: hypothesis tests
  • Usually test:
  • H0: Y = β0 + β1X(1) + β2X(2) + ... + βjX(j) + E
  • H1: Y = β0 + β1X(1) + β2X(2) + ... + βjX(j)
    + ... + βkX(k) + E
  • F-test with k-j, n-k-1 degrees of freedom
    (partial F-test; see the sketch after this slide)
  • H0: variables X(j+1), ..., X(k) do not help explain
    variability in Y
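A minimal sketch of the partial F-test, assuming NumPy/SciPy, with the
restricted model's X's passed as Xr and the full model's as Xf (function
names hypothetical):

  import numpy as np
  from scipy import stats

  def sse(X, y):
      """Sum of squared residuals from regressing y on X plus an intercept."""
      Xd = np.column_stack([np.ones(len(y)), X])
      b = np.linalg.lstsq(Xd, y, rcond=None)[0]
      r = y - Xd @ b
      return r @ r

  def partial_f_test(Xr, Xf, y):
      """Does the full model (k X's, in Xf) explain significantly more
      variability in Y than the restricted model (j X's, in Xr)?"""
      n, j, k = len(y), Xr.shape[1], Xf.shape[1]
      df1, df2 = k - j, n - k - 1
      F = ((sse(Xr, y) - sse(Xf, y)) / df1) / (sse(Xf, y) / df2)
      return F, stats.f.sf(F, df1, df2)   # F statistic and P-value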

11
Multiple linear regression: hypothesis tests
  • e.g. Test significance of overall multiple
    regression:
  • H0: Y = β0 + E
  • H1: Y = β0 + β1X(1) + β2X(2) + ... + βkX(k) + E
  • Test significance of:
  • adding an independent variable
  • deleting an independent variable

12
Why do Large Animals have Large
Brains? (Schoenemann, Brain Behav. Evol. 2004)
  • Multiple regression of Y = Log(CNS) on:
  • X's           β      SE(β)   P
  • Log(Mass)    -0.49   (0.70)  0.49
  • Log(Fat)     -0.07   (0.10)  0.52
  • Log(Muscle)   1.03   (0.54)  0.07
  • Log(Heart)    0.42   (0.22)  0.06
  • Log(Bone)    -0.07   (0.30)  0.83

Each P-value tests whether removal of that variable reduces the fit
13
Multiple linear regression: selecting independent
variables
  • Reasons for selecting a subset of the independent
    variables (X's):
  • cost (financial and other)
  • simplicity
  • improved prediction
  • improved explanation

14
Multiple linear regression: selecting independent
variables
  • Partial F-test:
  • predetermined forward selection
  • forward selection based upon improvement in fit
    (see the sketch after this slide)
  • backward selection based upon improvement in fit
  • stepwise (backward/forward)
  • Mallows' C(p)
  • AIC
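A sketch of forward selection based on improvement in fit, reusing the
hypothetical partial_f_test above; the α-to-enter default mirrors the value
used in the example later in the slides:

  import numpy as np

  def forward_select(X, y, names, alpha_enter=0.15):
      """Greedy forward selection: repeatedly add the X whose partial
      F-test P-value is smallest, while that P-value is below alpha_enter."""
      chosen, remaining = [], list(range(X.shape[1]))
      while remaining:
          p, best = min((partial_f_test(X[:, chosen], X[:, chosen + [i]], y)[1], i)
                        for i in remaining)
          if p >= alpha_enter:
              break
          chosen.append(best)
          remaining.remove(best)
      return [names[i] for i in chosen]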

15
Multiple linear regression: selecting independent
variables
  • Partial F-test:
  • predetermined forward selection:
  • Mass, Bone, Heart, Muscle, Fat
  • forward selection based upon improvement in fit
  • backward selection based upon improvement in fit
  • stepwise (backward/forward)

17
Why do Large Animals have Large
Brains? (Schoenemann, Brain Behav. Evol. 2004)
  • Complete model (r² = 0.97)
  • Forward stepwise (α-to-enter = 0.15,
    α-to-remove = 0.15)
  • 1. Constant (r² = 0.00)
  • 2. Constant + Muscle (r² = 0.97)
  • 3. Constant + Muscle + Heart (r² = 0.97)
  • 4. Constant + Muscle + Heart + Mass (r² = 0.97)
  • Y = -0.18 - 0.82·Mass + 1.24·Muscle + 0.39·Heart

18
Why do Large Animals have Large
Brains? (Schoenemann, Brain Behav. Evol. 2004)
  • Complete model (r² = 0.97)
  • Backward stepwise (α-to-enter = 0.15,
    α-to-remove = 0.15)
  • 1. All (r² = 0.97)
  • 2. Remove Bone (r² = 0.97)
  • 3. Remove Fat (r² = 0.97)
  • Y = -0.18 - 0.82·Mass + 1.24·Muscle + 0.39·Heart

19
Comparing models
  • Mallows' C(p):
  • C(p) = (k-p)·F(p) + (2p - k + 1)
  • k = number of parameters in the full model;
    p = number of parameters in the restricted model
  • F(p) is the F value comparing the fit of the
    restricted model with that of the full model
  • Lowest C(p) indicates best model
  • Akaike Information Criterion (AIC):
  • AIC = n·Log(s²) + 2p
  • Lowest AIC indicates best model
  • Can compare models that are not nested within one
    another (see the sketch after this slide)
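A minimal sketch of both criteria as written on the slide; treating s² as
SSE/n in the AIC is an assumption about the slide's notation:

  import numpy as np

  def mallows_cp(F_p, k, p):
      # C(p) = (k - p)·F(p) + (2p - k + 1); lowest C(p) is best
      return (k - p) * F_p + (2 * p - k + 1)

  def aic(sse_p, n, p):
      # AIC = n·Log(s²) + 2p, taking s² = SSE/n (assumed convention)
      return n * np.log(sse_p / n) + 2 * p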

20
Comparing models
21
Collinearity
  • If two (or more) X's are linearly related:
  • they are collinear
  • the regression problem is indeterminate
  • e.g. X(3) = 5·X(2) + 16, or
  • X(2) = 4·X(1) + 16·X(4)
  • If they are nearly linearly related (near
    collinearity), coefficients and tests are very
    inaccurate (see the sketch after this slide)
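One common diagnostic for near collinearity (not named on the slide) is the
variance inflation factor; a sketch:

  import numpy as np

  def vif(X):
      """Variance inflation factor, 1/(1 - R²), for each X regressed on
      all the other X's. Large values (rule of thumb: > 10) signal near
      collinearity."""
      n, k = X.shape
      out = []
      for i in range(k):
          others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
          b = np.linalg.lstsq(others, X[:, i], rcond=None)[0]
          resid = X[:, i] - others @ b
          r2 = 1 - (resid @ resid) / np.sum((X[:, i] - X[:, i].mean()) ** 2)
          out.append(1.0 / (1.0 - r2))
      return np.array(out)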

22
What to do about collinearity?
  • Centering (mean = 0)
  • Scaling (SD = 1)
  • Regression on the first few Principal Components
    (see the sketch after this slide)
  • Ridge Regression
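A sketch of the first three remedies, reusing the hypothetical X and y from
the earlier sketch; the number of principal components kept (m = 2) is an
arbitrary choice for illustration:

  import numpy as np

  Xc = X - X.mean(axis=0)              # centering (mean = 0)
  Xs = Xc / Xc.std(axis=0, ddof=1)     # scaling (SD = 1)

  # Regression on the first few principal components of the standardized X's
  _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
  m = 2                                # components kept (arbitrary)
  scores = Xs @ Vt[:m].T               # orthogonal PC scores replace the X's
  Xd = np.column_stack([np.ones(len(y)), scores])
  beta_pc = np.linalg.lstsq(Xd, y, rcond=None)[0]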

23
Curvilinear (Polynomial) Regression
  • Y = β0 + β1X + β2X² + β3X³ + ... + βkX^k + E
  • Used to fit fairly complex curves to data
  • β's estimated using least squares
  • Use sequential partial F-tests, or AIC, to find
    how many terms to use (see the sketch after this slide)
  • k > 3 is rare in biology
  • Better to transform the data and use simple linear
    regression, when possible
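A sketch of choosing the number of terms by AIC with NumPy's polynomial
fitting (the AIC form follows slide 19; the function name is hypothetical):

  import numpy as np

  def polynomial_aics(x, y, max_k=3):
      """Fit polynomials of order 1..max_k and return the AIC of each,
      with p = k + 1 fitted coefficients (including β0)."""
      n, aics = len(y), {}
      for k in range(1, max_k + 1):
          coef = np.polyfit(x, y, deg=k)
          resid = y - np.polyval(coef, x)
          aics[k] = n * np.log(resid @ resid / n) + 2 * (k + 1)
      return aics          # lowest AIC indicates best order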

24
Curvilinear (Polynomial) Regression
Y = 0.066 + 0.00727·X
Y = 0.117 + 0.00085·X + 0.00009·X²
Y = 0.201 - 0.01371·X + 0.00061·X² - 0.000005·X³
(From Sokal and Rohlf)
25
Path Analysis
26
Path Analysis
  • Models with causal structure
  • Represented by a path diagram
  • All variables quantitative
  • All path relationships assumed linear
    (transformations may help)

27
Path Analysis
  • All paths are one-way:
  • A → C, or
  • C → A (not both directions)
  • No loops
  • Some variables may not be directly observed:
  • residual variables (U)
  • Some variables not observed but known to exist:
  • latent variables (D)

28
Path Analysis
  • Path coefficients and other statistics are calculated
    using multiple regressions (see the sketch after
    this slide)
  • Variables are:
  • centered (mean = 0), so no constants in the
    regressions
  • often standardized (SD = 1)
  • So path coefficients usually lie between -1 and 1
  • Paths with coefficients not significantly
    different from zero may be eliminated
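A minimal sketch for a diagram in which each X has a direct path to Y (the
diagram and names are hypothetical): with centered, standardized variables
the path coefficients are the standardized regression slopes, and the
residual path is √(1 - R²):

  import numpy as np

  def standardize(v):
      return (v - v.mean()) / v.std(ddof=1)

  def path_coefficients(X, y):
      """Standardize all variables, regress y on the X's without an
      intercept (centering removes the constant), and return the path
      coefficients plus the residual path U = sqrt(1 - R²)."""
      Z = np.column_stack([standardize(X[:, i]) for i in range(X.shape[1])])
      z = standardize(y)
      p = np.linalg.lstsq(Z, z, rcond=None)[0]   # usually between -1 and 1
      resid = z - Z @ p
      r2 = 1.0 - (resid @ resid) / (z @ z)
      return p, np.sqrt(1.0 - r2)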

29
Path Analysis: an example
  • Isaak and Hubert. 2001. Production of stream
    habitat gradients by montane watersheds:
    hypothesis tests based on spatially explicit path
    analyses. Can. J. Fish. Aquat. Sci.

30
[Path diagram: dashed lines (- - -) = predicted negative interaction;
solid lines = predicted positive interaction]
31
(No Transcript)
32
(No Transcript)