1
Multiple Regression
  • Goals
  • Implementation
  • Assumptions

2
Goals of Regression
  • Description
  • Inference
  • Prediction (Forecasting)

3
Examples
4
Why is there a need for more than one predictor
variable?
  • As the examples above show, more than one variable may influence a response variable
  • Predictors may themselves be correlated
  • What is the independent contribution of each variable to explaining the variation in the response variable?

5
Three fundamental aspects of linear regression
  • Model selection
  • What is the most parsimonious set of predictors that explains the most variation in the response variable?
  • Evaluation of Assumptions
  • Have we met the assumptions of the regression model?
  • Model validation

6
The multiple regression model
  • Express a p-variable regression model as a series of equations
  • These equations, condensed into matrix form, give the familiar general linear model
  • The β coefficients are known as partial regression coefficients

7
The p-variable Regression Model
This model gives the expected value of Y conditional on the fixed values of X2, X3, ..., Xp, plus error.
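Written out, using the slide's convention that X2, ..., Xp are the predictors and β1 is the intercept, the model for observation i is:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$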
8
Matrix Representation
Regression model is best described as a system
of equations
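One equation per observation, the system is:

$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \cdots + \beta_p X_{p1} + \varepsilon_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \cdots + \beta_p X_{p2} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_1 + \beta_2 X_{2n} + \cdots + \beta_p X_{pn} + \varepsilon_n
\end{aligned}$$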
9
We can re-write these equations
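Stacking the observations gives the matrix (general linear model) form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$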
10
Summary of Terms
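In this notation, y is the n × 1 vector of responses, X is the n × p data matrix (its first column a column of 1's for the intercept), β is the p × 1 vector of partial regression coefficients, and ε is the n × 1 vector of residuals.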
11
A Partial Regression Model
Burst Velocity = 1.21 + 2.1 (Femur Length) + 0.25 (Tail Length) + 1.0 (Toe)
Response variable: Burst Velocity. Intercept: 1.21. Partial regression coefficients: 2.1, 0.25, and 1.0. Predictor variables: Femur Length, Tail Length, and Toe.
12
Assumption 1.
  • Expected value of the residual vector is 0
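In matrix notation:

$$E(\boldsymbol{\varepsilon}) = \mathbf{0}$$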

13
Assumption 2.
  • There is no correlation between the ith and jth
    residual terms
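In symbols:

$$\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \text{for } i \neq j$$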

14
Assumption 3.
  • The residuals exhibit constant variance
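In symbols (homoscedasticity):

$$\operatorname{Var}(\varepsilon_i) = \sigma^2 \quad \text{for all } i$$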

15
Assumption 4.
  • Covariance between the X's and the residual terms is 0
  • Usually satisfied if the predictor variables are
    fixed and non-stochastic

16
Assumption 5.
  • The rank of the data matrix X is p, the number of columns
  • p < n, the number of observations
  • No exact linear relationships among the X variables
  • This is the assumption of no multicollinearity

17
If these assumptions hold
  • Then the OLS estimators are in the class of
    unbiased linear estimators
  • Also minimum variance estimators

18
What does it mean to be BLUE?
  • BLUE stands for Best Linear Unbiased Estimator
  • This property allows us to compute a number of statistics
  • OLS estimation yields such estimators under the assumptions above

19
An estimator b is the best linear unbiased estimator (BLUE) of β iff it is
  • Linear
  • Unbiased, i.e., E(b) = β
  • Minimum variance in the class of all linear unbiased estimators
  • The unbiased and minimum variance properties mean that OLS estimators are efficient estimators
  • If one or more of the conditions are not met, then the OLS estimators are no longer BLUE

20
Does it matter?
  • Yes, it means we require an alternative method
    for characterizing the association between our Y
    and X variables

21
OLS Estimation
The sample-based counterpart to the population regression model is y = Xb + e, where b estimates β and e is the vector of residuals.
OLS requires choosing the values of b such that the error sum-of-squares (SSE) is as small as possible.
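In matrix terms, the quantity being minimized over b is:

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$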
22
The Normal Equations
We need to differentiate the SSE with respect to the unknowns (b).
This yields p simultaneous equations in p unknowns, also known as the Normal Equations.
23
Matrix form of the Normal Equations
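Differentiating the SSE and setting the result to zero gives, in matrix form:

$$(\mathbf{X}'\mathbf{X})\,\mathbf{b} = \mathbf{X}'\mathbf{y}$$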
24
The solution for the b's
It should be apparent how to solve for the unknown parameters: pre-multiply both sides by the inverse of X′X.
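Pre-multiplying both sides of the normal equations by (X′X)⁻¹ gives:

$$(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\,\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$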
25
Solution Continued
From the properties of inverses, (X′X)⁻¹(X′X) = I, so b = (X′X)⁻¹X′y.
This is the fundamental outcome of OLS theory.
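A minimal numerical sketch of this result, assuming NumPy; the simulated data and variable names are illustrative only:

import numpy as np

# Simulate illustrative data: a column of ones for the intercept plus two predictors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations (X'X) b = X'y for the OLS estimates.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # should be close to beta_true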
26
Assessment of Goodness-of-Fit
  • Use the R2 statistic
  • It represents the proportion of variability in the response variable that is accounted for by the regression model
  • 0 ≤ R2 ≤ 1
  • A good fit of the model means that R2 will be close to 1
  • A poor fit means that R2 will be near 0

27
R2 Multiple Coefficient of Determination
Alternative Expressions
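In the usual notation, equivalent expressions are:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\mathbf{e}'\mathbf{e}}{\sum_i (Y_i - \bar{Y})^2}$$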
28
Critique of R2 in Multiple Regression
  • R2 is inflated by increasing the number of parameters in the model
  • One should also analyze the residual values from the model (MSE)
  • Alternatively, use the adjusted R2

29
Adjusted R2
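With n observations and p parameters, the adjusted R2 is:

$$\bar{R}^2 = 1 - \frac{SSE/(n-p)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p}$$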
30
How does adjusted R-square work?
  • The total sum-of-squares is fixed, because it is independent of the number of variables
  • The numerator, SSE, decreases as the number of variables increases
  • Hence R2 is artificially inflated by adding explanatory variables to the model
  • Use the adjusted R2 to compare different regression models
  • The adjusted R2 takes into account the number of predictors in the model

31
Statistical Inference and Hypothesis Testing
  • Our goal may be
  • 1) hypothesis testing
  • 2) interval estimation
  • Hence we will need to impose distributional assumptions on the residuals
  • It turns out the probability distribution of the OLS estimators depends on the probability distribution of the residuals, ε

32
Recount Assumptions
  • Normality: this means the elements of b are normally distributed
  • The b's are unbiased
  • If these hold, then we can perform several hypothesis tests

33
ANOVA Approach
  • Decomposition of the total sum-of-squares into components relating to
  • explained variance (regression)
  • unexplained variance (error)

34
ANOVA Table
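In outline, with p parameters and n observations, the table is:

Source       df     SS    MS                F
Regression   p - 1  SSR   MSR = SSR/(p-1)   MSR/MSE
Error        n - p  SSE   MSE = SSE/(n-p)
Total        n - 1  SST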
35
Test of Null Hypothesis
Tests the null hypothesis H0: β2 = β3 = ... = βp = 0.
The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all the βi simultaneously. This tests the overall significance of the regression model (the F ratio is given below).
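The test statistic is the ANOVA F ratio:

$$F = \frac{MSR}{MSE} = \frac{SSR/(p-1)}{SSE/(n-p)} \sim F_{p-1,\,n-p} \quad \text{under } H_0$$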
36
The F-test statistic and R2 vary directly
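Dividing the numerator and denominator sums-of-squares by SST makes the relationship explicit:

$$F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}$$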
37
Tests of Hypotheses about the true β
Assume the regression coefficients are normally distributed:
b ~ N(β, σ²(X′X)⁻¹)
cov(b) = E[(b - β)(b - β)′] = σ²(X′X)⁻¹
The estimate of σ² is s².
38
Test Statistic
Follows a t distribution with n - p df,
where cii is the element of the ith row and ith column of (X′X)⁻¹.
The 100(1 - α)% confidence interval is obtained as shown below.
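In standard form, the test statistic for H0: βi = βi0 and the corresponding interval are:

$$t = \frac{b_i - \beta_{i0}}{s\sqrt{c_{ii}}} \sim t_{n-p}, \qquad b_i \pm t_{\alpha/2,\,n-p}\; s\sqrt{c_{ii}}$$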
39
Model Comparisons
  • Our interest is in parsimonious modeling
  • We seek a minimum set of X variables to predict variation in the Y response variable
  • The goal is to reduce the number of predictor variables to arrive at a more parsimonious description of the data
  • Does leaving out one of the b's significantly diminish the variance explained by the model?
  • Compare a saturated to an unsaturated (reduced) model
  • Note there are many possible unsaturated models

40
General Philosophy
  • Let SSE(r) designate the error sum-of-squares for the reduced model and SSE(f) that for the full (saturated) model
  • SSE(r) ≥ SSE(f)
  • The saturated model will contain p parameters
  • The reduced model will contain k < p parameters
  • If we assume the errors are normally distributed with mean 0 and variance σ², then we can compare the two models

41
Model Comparison
Compare the saturated model with the reduced model, using the SSE terms as the basis for comparison (see the test statistic below).
The resulting ratio follows an F-distribution with (p - k), (n - p) df.
If Fobs > Fcritical, we reject the reduced model as a parsimonious model; the omitted bi must be included in the model.
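The comparison statistic is the extra sum-of-squares F test:

$$F = \frac{[\,SSE(r) - SSE(f)\,]/(p-k)}{SSE(f)/(n-p)} \sim F_{p-k,\,n-p}$$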
42
How Many Predictors to Retain? A Short Course in Model Selection
  • Several Options
  • Sequential Selection
  • Backward Selection
  • Forward Selection
  • Stepwise Selection
  • All possible subsets
  • MAXR
  • MINR
  • RSQUARE
  • ADJUSTED RSQUARE
  • CP

43
Sequential Methods
  • Forward, stepwise, and backward selection procedures
  • These entail partialling out the predictor variables
  • Based on the partial correlation coefficient

44
Forward Selection
  • Build-up procedure.
  • Add predictors until the best regression model
    is obtained

45
Outline of Forward Selection
  • Initially, no variables are included in the regression equation
  • Calculate the correlations of all predictors with the dependent variable
  • Enter the predictor variable with the highest correlation into the regression model if its corresponding partial F-value exceeds a predetermined threshold
  • Calculate the regression equation with that predictor
  • Select the predictor variable with the highest partial correlation to enter next

46
Forward Selection Continued
  • Compare the partial F-test value (called FH, also known as the F-to-enter) to a predetermined tabulated F-value (called FC)
  • If FH > FC, include the variable with the highest partial correlation and return to step 5
  • If FH < FC, stop and retain the regression equation as calculated (a code sketch of this loop follows below)
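A minimal sketch of the forward-selection loop, assuming NumPy; the ols_sse helper, the f_to_enter threshold value, and the greedy structure are illustrative choices rather than the slides' exact procedure:

import numpy as np

def ols_sse(X, y):
    # Error sum-of-squares for an OLS fit of y on the columns of X.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    # X: (n, m) matrix of candidate predictors; y: (n,) response vector.
    n, m = X.shape
    selected = []                # indices of predictors already entered
    current = np.ones((n, 1))    # start with the intercept only
    sse_current = ols_sse(current, y)
    while True:
        best = None
        for j in range(m):
            if j in selected:
                continue
            trial = np.column_stack([current, X[:, j]])
            sse_trial = ols_sse(trial, y)
            df_error = n - trial.shape[1]
            # Partial F-to-enter for predictor j (1 numerator df).
            f_partial = (sse_current - sse_trial) / (sse_trial / df_error)
            if best is None or f_partial > best[1]:
                best = (j, f_partial, trial, sse_trial)
        if best is None or best[1] < f_to_enter:
            break                # no remaining predictor clears the threshold
        selected.append(best[0])
        current, sse_current = best[2], best[3]
    return selected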

47
Backward Selection
  • A deconstruction approach
  • Begin with the saturated (full) regression model
  • Compute the drop in R2 from eliminating each predictor variable, and the corresponding partial F-test value, treating each variable as if it were the last to enter the regression equation
  • Compare the lowest partial F-test value (designated FL) to the critical value of F (designated FC)
  • a. If FL < FC, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2
  • b. If FL ≥ FC, adopt the regression equation as calculated

48
Stepwise Selection
  • Calculate the correlations of all predictors with the response variable
  • Select the predictor variable with the highest correlation. Regress Y on Xi. Retain the predictor if there is a significant F-test value.
  • Calculate the partial correlations of all variables not in the equation with the response variable. Select as the next predictor to enter the one with the highest partial correlation. Call this predictor Xj.
  • Compute the regression equation with both Xi and Xj entered. Retain Xj if its partial F-value exceeds the tabulated F with (1, n-2-1) df.
  • Now determine whether Xi warrants retention. Compare its partial F-value as if Xj had been entered into the equation first.

49
Stepwise Continued
  • Retain Xi if its F-value exceeds the tabulated F-value
  • Enter a new variable Xk. Compute the regression with three predictors. Compute partial F-values for Xi, Xj, and Xk.
  • Determine whether any should be retained by comparing the observed partial F with the critical F.
  • Retain the regression equation when no other predictor can be entered into or removed from the model.

50
All possible subsets
Requires use of an optimality criterion, e.g., Mallows' Cp (with p = k + 1 parameters in the reduced model)
  • s² is the residual variance for the reduced model and σ² is the residual variance for the full model
  • All-subsets regression computes all possible 1-, 2-, 3-, ... variable models and ranks them by the optimality criterion
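The standard definition of the criterion, using the slide's notation (s² from the reduced model with p parameters, σ² from the full model), is:

$$C_p = \frac{(n-p)\,s^2}{\sigma^2} - (n - 2p)$$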

51
Mallows Cp
  • Measures total squared error
  • Choose the model where Cp ≈ p