Chapter 3 Multiple Linear Regression - PowerPoint PPT Presentation
1
Chapter 3 Multiple Linear Regression
  • Ray-Bing Chen
  • Institute of Statistics
  • National University of Kaohsiung

2
3.1 Multiple Regression Models
  • A multiple regression model involves more than
    one regressor variable.
  • Example: the yield in pounds of conversion in a
    chemical process depends on the temperature and
    the catalyst concentration.

3
  • E(y) = 50 + 10 x1 + 7 x2

4
  • The response y may be related to k regressor or
    predictor variables (the multiple linear
    regression model):
    y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε
  • The parameter βj represents the expected change
    in the response y per unit change in xj when all
    of the remaining regressor variables xi (i ≠ j)
    are held constant.

5
  • Multiple linear regression models are often used
    as empirical models or approximating functions
    (the true model is unknown).
  • The cubic model:
    y = β0 + β1 x + β2 x² + β3 x³ + ε
  • The model with interaction effects:
    y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε
  • Any regression model that is linear in the
    parameters is a linear regression model,
    regardless of the shape of the surface that it
    generates (see the design-matrix sketch below).
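
A small illustration of this point (my own sketch, assuming numpy is available; the data values are made up): the cubic and interaction models above differ only in which columns are placed in the design matrix X, so both remain linear in the parameters.

import numpy as np

# My own sketch (illustrative values): models that are nonlinear in x but
# linear in the parameters are all of the form y = X*beta + error; only the
# columns of the design matrix X change.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.5, 1.5, 1.0, 2.0, 2.5])

# Cubic model in one regressor: columns 1, x, x^2, x^3
X_cubic = np.column_stack([np.ones_like(x1), x1, x1**2, x1**3])

# Model with an interaction effect: columns 1, x1, x2, x1*x2
X_inter = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

print(X_cubic.shape, X_inter.shape)   # (5, 4) (5, 4)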

6
(No Transcript)
7
  • The second-order model with interaction:
    y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε

8
(No Transcript)
9
3.2 Estimation of the Model Parameters
  • 3.2.1 Least-Squares Estimation of the Regression
    Coefficients
  • n observations (n > k)
  • Assume
  • The error term ε satisfies E(ε) = 0 and
    Var(ε) = σ²
  • The errors are uncorrelated.
  • The regressor variables x1, ..., xk are fixed.

10
  • The sample regression model:
    yi = β0 + β1 xi1 + ... + βk xik + εi,  i = 1, ..., n
  • The least-squares function:
    S(β0, β1, ..., βk) = Σ εi² = Σ (yi - β0 - Σj βj xij)²
  • The normal equations, obtained by setting
    ∂S/∂βj = 0 for j = 0, 1, ..., k

11
  • Matrix notation: y = Xβ + ε, where y is an n × 1
    vector of observations, X is an n × p model matrix
    (p = k + 1), β is a p × 1 vector of coefficients,
    and ε is an n × 1 vector of errors.

12
  • The least-squares function in matrix form:
    S(β) = (y - Xβ)′(y - Xβ)
  • Minimizing S(β) gives the normal equations
    X′X β̂ = X′y, so β̂ = (X′X)⁻¹ X′y.

13
  • The fitted values corresponding to the observed
    levels of the regressor variables are
    ŷ = X β̂ = X(X′X)⁻¹X′y = Hy
  • The hat matrix H = X(X′X)⁻¹X′ is idempotent and
    symmetric, i.e. H² = H and H′ = H.
  • H is an orthogonal projection matrix (it projects
    y onto the column space of X).
  • Residuals: e = y - ŷ = (I - H)y (see the numpy
    sketch below)
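
The following numpy sketch (my addition; it uses simulated data rather than the actual delivery-time observations) traces these formulas: it solves the normal equations for β̂, forms the hat matrix H, computes fitted values and residuals, and checks that H is symmetric and idempotent.

import numpy as np

# Minimal numpy sketch of the least-squares computations on simulated data.
rng = np.random.default_rng(0)
n, k = 25, 2
x1 = rng.uniform(2, 30, n)                      # e.g. number of cases
x2 = rng.uniform(50, 1500, n)                   # e.g. distance walked
X = np.column_stack([np.ones(n), x1, x2])       # n x p design matrix, p = k + 1
y = 2.3 + 1.6 * x1 + 0.014 * x2 + rng.normal(0, 3, n)

# Least-squares estimate: solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X'; fitted values and residuals
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y                  # same as X @ beta_hat
e = y - y_hat                  # residuals, e = (I - H) y

# H is symmetric and idempotent
print(np.allclose(H, H.T), np.allclose(H @ H, H))   # True True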

14
  • Example 3.1 The Delivery Time Data
  • y = the delivery time,
  • x1 = the number of cases of product stocked,
  • x2 = the distance walked by the route driver
  • Consider y = β0 + β1 x1 + β2 x2 + ε

15
(No Transcript)
16
(No Transcript)
17
  • 3.2.2 A Geometrical Interpretation of Least
    Squares
  • y = (y1, ..., yn)′ is the vector of observations.
  • X contains p (p = k + 1) column vectors, each
    n × 1, i.e.
  • X = (1, x1, ..., xk)
  • The column space of X is called the estimation
    space.
  • Any point in the estimation space is of the form Xβ.
  • Minimize the squared distance
  • S(β) = (y - Xβ)′(y - Xβ)

18
  • Normal equations: X′X β̂ = X′y

19
  • 3.2.3 Properties of the Least-Squares Estimators
  • Unbiased estimator: E(β̂) = β
  • Covariance matrix: Var(β̂) = σ²C, where
  • C = (X′X)⁻¹
  • The LSE is the best linear unbiased estimator
    (Gauss-Markov theorem)
  • LSE = MLE under the normality assumption

20
  • 3.2.4 Estimation of σ²
  • Residual sum of squares:
    SSRes = Σ ei² = e′e = y′y - β̂′X′y
  • The degrees of freedom are n - p
  • The unbiased estimator of σ² is the residual mean
    square, MSRes = SSRes / (n - p)

21
  • Example 3.2 The Delivery Time Data
  • Both estimates are in a sense correct, but they
    depend heavily on the choice of model.
  • The model with the smaller residual variance would
    be preferred.

22
  • 3.2.5 Inadequacy of Scatter Diagrams in Multiple
    Regression
  • For simple linear regression, the scatter diagram
    is an important tool for analyzing the
    relationship between y and x.
  • However, it may not be useful in multiple
    regression.
  • y = 8 - 5 x1 + 12 x2
  • The y vs. x1 plot does not exhibit any apparent
    relationship between y and x1.
  • The y vs. x2 plot indicates a linear relationship
    with a slope of about 8 (a small simulation of
    this effect is sketched below).
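
A simulation sketch of this phenomenon (my addition; the numbers are chosen only to make x1 and x2 correlated and are not the textbook's data): the one-at-a-time correlations hide the effect of x1, while the multiple regression recovers both coefficients.

import numpy as np

# Sketch: when x1 and x2 are correlated, marginal scatter plots can mislead.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = 0.45 * x1 + rng.normal(0, 0.3, n)     # x2 tracks x1 fairly closely
y = 8 - 5 * x1 + 12 * x2 + rng.normal(0, 0.5, n)

# Marginal (one-at-a-time) correlations suggest x1 is barely related to y ...
print(np.corrcoef(x1, y)[0, 1])            # near 0
print(np.corrcoef(x2, y)[0, 1])            # clearly positive

# ... but the multiple regression recovers both coefficients.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.solve(X.T @ X, X.T @ y))   # approximately [8, -5, 12]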

23
(No Transcript)
24
  • In this case, constructing scatter diagrams of y
    vs. xj (j = 1, 2, ..., k) can be misleading.
  • If there is only one (or a few) dominant
    regressor, or if the regressors operate nearly
    independently, the scatterplot matrix is most
    useful.

25
  • 3.2.6 Maximum-Likelihood Estimation
  • The model is y = Xβ + ε
  • ε ~ N(0, σ²I)
  • The likelihood and log-likelihood functions are
    maximized over β and σ²; the MLE of β equals the
    least-squares estimator β̂.
  • The MLE of σ² is SSRes / n (see the sketch below)
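
A brief sketch contrasting the two variance estimators (my addition; simulated data, assuming numpy is available): the MLE of σ² divides the residual sum of squares by n, while the unbiased residual mean square of Section 3.2.4 divides by n - p.

import numpy as np

# Sketch: MLE of sigma^2 (SS_Res / n) vs. the unbiased estimator (SS_Res / (n - p)).
rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 1.5, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # also the MLE of beta
e = y - X @ beta_hat
SS_res = e @ e

sigma2_mle = SS_res / n              # maximum-likelihood estimator (biased)
sigma2_unbiased = SS_res / (n - p)   # residual mean square MS_Res
print(sigma2_mle, sigma2_unbiased)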

26
3.3 Hypothesis Testing in Multiple Linear
Regression
  • Questions
  • What is the overall adequacy of the model?
  • Which specific regressors seem important?
  • Assume the errors are independent and follow a
    normal distribution with mean 0 and variance σ²

27
  • 3.3.1 Test for Significance of Regression
  • Determine whether there is a linear relationship
    between y and the xj, j = 1, 2, ..., k.
  • The hypotheses are
  • H0: β1 = β2 = ... = βk = 0
  • H1: βj ≠ 0 for at least one j
  • ANOVA decomposition:
  • SST = SSR + SSRes
  • SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n - k - 1), and SSR
    and SSRes are independent
  • The test statistic is
    F0 = MSR / MSRes = (SSR / k) / (SSRes / (n - k - 1))

28
  • Under H1, F0 follows a noncentral F distribution
    with k and n - k - 1 degrees of freedom and a
    noncentrality parameter that depends on the
    nonzero βj's.

29
  • ANOVA table for significance of regression (a
    numeric sketch follows the table)
    Source of     Sum of    Degrees of   Mean
    Variation     Squares   Freedom      Square   F0
    Regression    SSR       k            MSR      MSR/MSRes
    Residual      SSRes     n - k - 1    MSRes
    Total         SST       n - 1
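
A numeric sketch of the overall F test on simulated data (my addition; it assumes numpy and scipy are available and does not reproduce the delivery-time numbers).

import numpy as np
from scipy import stats

# Sketch of the significance-of-regression (ANOVA) test on simulated data.
rng = np.random.default_rng(3)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k))])
y = X @ np.array([4.0, 1.5, -2.0]) + rng.normal(0, 2, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SS_T = np.sum((y - y.mean()) ** 2)       # total (corrected) sum of squares
SS_Res = np.sum((y - y_hat) ** 2)        # residual sum of squares
SS_R = SS_T - SS_Res                     # regression sum of squares

F0 = (SS_R / k) / (SS_Res / (n - k - 1))         # MS_R / MS_Res
p_value = stats.f.sf(F0, k, n - k - 1)
print(F0, p_value)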

30
(No Transcript)
31
  • Example 3.3 The Delivery Time Data

32
  • R² and Adjusted R²
  • R² always increases when a regressor is added to
    the model, regardless of the value of the
    contribution of that variable.
  • The adjusted R² is
    R²adj = 1 - [SSRes / (n - p)] / [SST / (n - 1)]
  • The adjusted R² will only increase on adding a
    variable to the model if the addition of the
    variable reduces the residual mean square (see
    the sketch below).
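
A sketch of the contrast between R² and adjusted R² (my addition; the data are simulated so that the second regressor is pure noise).

import numpy as np

# Sketch: adding a pure-noise regressor can never lower R^2, but it usually
# lowers the adjusted R^2 because it fails to reduce the residual mean square.
def r2_stats(X, y):
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    SS_res = np.sum((y - X @ beta) ** 2)
    SS_T = np.sum((y - y.mean()) ** 2)
    r2 = 1 - SS_res / SS_T
    r2_adj = 1 - (SS_res / (n - p)) / (SS_T / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(4)
n = 30
x1 = rng.uniform(0, 10, n)
y = 3 + 2 * x1 + rng.normal(0, 1, n)

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([X1, rng.normal(size=n)])   # add an irrelevant regressor

print(r2_stats(X1, y))   # (R^2, adjusted R^2) for the one-regressor model
print(r2_stats(X2, y))   # R^2 never decreases; adjusted R^2 usually drops here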

33
  • 3.3.2 Tests on Individual Regression Coefficients
  • For an individual regression coefficient:
  • H0: βj = 0 vs. H1: βj ≠ 0
  • Let Cjj be the j-th diagonal element of (X′X)⁻¹.
    The test statistic is
    t0 = β̂j / se(β̂j), where se(β̂j) = sqrt(MSRes Cjj)
  • This is a partial or marginal test because the
    estimate of βj depends on all of the other
    regressor variables in the model.
  • This test is a test of the contribution of xj
    given the other regressors in the model (see the
    sketch below)
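
The following sketch (my addition; simulated data, assuming numpy and scipy) computes the marginal t statistics from C = (X′X)⁻¹ and the residual mean square.

import numpy as np
from scipy import stats

# Sketch of the marginal t test t0 = beta_hat_j / sqrt(MS_Res * C_jj).
rng = np.random.default_rng(5)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k))])
y = X @ np.array([2.0, 1.2, 0.0]) + rng.normal(0, 1, n)   # x2 truly irrelevant

p = k + 1
C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
MS_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)

se = np.sqrt(MS_res * np.diag(C))              # standard errors of beta_hat_j
t0 = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t0), n - p)
print(np.round(t0, 3), np.round(p_values, 3))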

34
  • Example 3.4 The Delivery Time Data

35
  • Consider a subset of regressors: partition
    β = (β1′, β2′)′, where β1 is (p - r) × 1 and β2 is
    r × 1, and test H0: β2 = 0 vs. H1: β2 ≠ 0.

36
  • For the full model, the regression sum of squares
    is SSR(β), with p degrees of freedom.
  • Under the null hypothesis, the regression sum of
    squares for the reduced model is SSR(β1).
  • The degrees of freedom are p - r for the reduced
    model.
  • The regression sum of squares due to β2 given β1:
    SSR(β2 | β1) = SSR(β) - SSR(β1)
  • This is called the extra sum of squares due to β2,
    and its degrees of freedom are p - (p - r) = r.
  • The test statistic is
    F0 = [SSR(β2 | β1) / r] / MSRes (a worked sketch
    follows below)
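
A worked sketch of the extra-sum-of-squares (partial F) test (my addition; the model and data are simulated, assuming numpy and scipy): the full model contains x1, x2, x3, and the reduced model drops x2 and x3.

import numpy as np
from scipy import stats

# Sketch of the partial F test of H0: beta_2 = 0 (here beta_2 holds the
# coefficients of x2 and x3, so r = 2).
def ss_regression(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta
    return np.sum((y_hat - y.mean()) ** 2), np.sum((y - y_hat) ** 2)

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(0, 5, (n, 3))
y = 1 + 2 * x[:, 0] + 0.8 * x[:, 1] - 1.5 * x[:, 2] + rng.normal(0, 1, n)

X_full = np.column_stack([np.ones(n), x])           # p = 4 parameters
X_red = np.column_stack([np.ones(n), x[:, 0]])      # drop x2 and x3

SSR_full, SS_res_full = ss_regression(X_full, y)
SSR_red, _ = ss_regression(X_red, y)

r, p = 2, 4
extra = SSR_full - SSR_red                          # SSR(beta_2 | beta_1)
F0 = (extra / r) / (SS_res_full / (n - p))
print(F0, stats.f.sf(F0, r, n - p))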

37
  • If β2 ≠ 0, F0 follows a noncentral F distribution
    with r and n - p degrees of freedom and a
    noncentrality parameter that grows with β2.
  • Under multicollinearity, this test actually has no
    power!
  • This test has maximal power when X1 and X2 are
    orthogonal to one another!
  • Partial F test: given the regressors in X1, it
    measures the contribution of the regressors in X2.

38
  • Consider y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
  • SSR(β1 | β0, β2, β3), SSR(β2 | β0, β1, β3), and
    SSR(β3 | β0, β1, β2) are single-degree-of-freedom
    sums of squares.
  • SSR(βj | β0, ..., βj-1, βj+1, ..., βk) measures the
    contribution of xj as if it were the last variable
    added to the model.
  • This F test is equivalent to the t test.
  • SST = SSR(β1, β2, β3 | β0) + SSRes
  • SSR(β1, β2, β3 | β0) = SSR(β1 | β0)
    + SSR(β2 | β1, β0) + SSR(β3 | β1, β2, β0)
    (a sketch of this decomposition follows below)
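
A short sketch of the sequential decomposition (my addition; simulated data, assuming numpy): the single-degree-of-freedom pieces obtained by adding x1, then x2, then x3 sum to the overall regression sum of squares given β0.

import numpy as np

# Sketch: sequential sums of squares add up to SSR(beta1, beta2, beta3 | beta0).
# The individual pieces depend on the order of entry unless X is orthogonal.
def ssr(X, y):
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return np.sum((X @ beta - y.mean()) ** 2)

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=(n, 3))
y = 2 + 1.0 * x[:, 0] - 0.5 * x[:, 1] + 0.3 * x[:, 2] + rng.normal(0, 1, n)

ones = np.ones((n, 1))
X1 = np.column_stack([ones, x[:, :1]])
X12 = np.column_stack([ones, x[:, :2]])
X123 = np.column_stack([ones, x])

seq = [ssr(X1, y), ssr(X12, y) - ssr(X1, y), ssr(X123, y) - ssr(X12, y)]
print(sum(seq), ssr(X123, y))    # the sequential pieces add up to the total SSR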

39
  • Example 3.5 Delivery Time Data

40
  • 3.3.3 Special Case of Orthogonal Columns in X
  • Model: y = Xβ + ε = X1β1 + X2β2 + ε
  • Orthogonality: X1′X2 = 0
  • Since the normal equations are (X′X)β̂ = X′y, they
    separate into X1′X1 β̂1 = X1′y and X2′X2 β̂2 = X2′y,
    so β̂1 does not depend on whether X2 is in the
    model (see the sketch below).
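
A sketch of the orthogonal case (my addition; the ±1 two-factor design is a standard example of orthogonal columns, and the response values are simulated).

import numpy as np

# Sketch: with X1'X2 = 0 the normal equations decouple, so the estimate of
# beta_1 is the same whether or not X2 is included in the model.
x1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
n = len(x1)
X1 = np.column_stack([np.ones(n), x1])
X2 = x2.reshape(-1, 1)
print(X1.T @ X2)                          # zero matrix: the blocks are orthogonal

rng = np.random.default_rng(8)
y = 5 + 2 * x1 - 3 * x2 + rng.normal(0, 0.5, n)

X = np.column_stack([X1, X2])
b_joint = np.linalg.solve(X.T @ X, X.T @ y)       # fit with both blocks
b_alone = np.linalg.solve(X1.T @ X1, X1.T @ y)    # fit with X1 only
print(b_joint[:2], b_alone)               # the estimates of (beta0, beta1) agree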

41
(No Transcript)
42
  • 3.3.4 Testing the General Linear Hypothesis
  • Let T be an m × p matrix with rank(T) = r.
  • Full model: y = Xβ + ε
  • Reduced model: y = Zγ + ε, where Z is an
    n × (p - r) matrix and γ is a (p - r) × 1 vector.
  • The difference SSH = SSRes(RM) - SSRes(FM) has r
    degrees of freedom. SSH is called the sum of
    squares due to the hypothesis H0: Tβ = 0.

43
  • The test statistic is
    F0 = [SSH / r] / [SSRes(FM) / (n - p)]

44
(No Transcript)
45
  • Another form of the general linear hypothesis:
  • H0: Tβ = c vs. H1: Tβ ≠ c. Then the test statistic
    is
    F0 = (Tβ̂ - c)′[T(X′X)⁻¹T′]⁻¹(Tβ̂ - c) / (r MSRes)
    (a numeric sketch follows below)
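
A numeric sketch of the general linear hypothesis test (my addition; the T matrix, c vector, and data are made up for illustration, assuming numpy and scipy): here H0 states that β1 = β2 and β3 = 1 in a model with three regressors.

import numpy as np
from scipy import stats

# Sketch of the F test for H0: T beta = c.
rng = np.random.default_rng(9)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, 1.0, 1.0]) + rng.normal(0, 1, n)   # H0 holds here

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
MS_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)

T = np.array([[0.0, 1.0, -1.0, 0.0],     # beta_1 - beta_2 = 0
              [0.0, 0.0, 0.0, 1.0]])     # beta_3 = 1
c = np.array([0.0, 1.0])
r = np.linalg.matrix_rank(T)

d = T @ beta_hat - c
F0 = d @ np.linalg.solve(T @ C @ T.T, d) / (r * MS_res)
print(F0, stats.f.sf(F0, r, n - p))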