REGRESSION - PowerPoint PPT Presentation

About This Presentation
Title:

REGRESSION

Description:

Graph of the regression equation is a straight line. ... b0 is the y intercept of the line. The graph is called the estimated regression line. ... – PowerPoint PPT presentation

Slides: 72
Provided by: tarekbu
Category:
Tags: regression | graph | line


Transcript and Presenter's Notes

Title: REGRESSION


1
REGRESSION
2
Simple Linear Regression
  • Simple Linear Regression Model
  • Least Squares Method
  • Coefficient of Determination
  • Model Assumptions
  • Testing for Significance
  • Using the Estimated Regression Equation for
    Estimation and Prediction
  • Computer Solution

3
Simple Linear Regression Model
  • Regression analysis is a statistical technique
    that attempts to explain movements in one
    variable, the dependent variable, as a function
    of movements in a set of other variables, called
    independent (or explanatory) variables through
    the quantification of a single equation.
  • However, a regression result, no matter how
    statistically significant, cannot prove
    causality. All regression analysis can do is test
    whether a significant quantitative relationship
    exists.
  • Model Assumption: X and Y are linearly related.

4
Simple Linear Regression Model
  • The equation that describes how y is related
    to x and an error term is called the regression
    model.
  • The simple linear regression model is

$y = \beta_0 + \beta_1 x + \varepsilon$
  • where
  • $\beta_0$ and $\beta_1$ are called parameters of the model,
  • $\varepsilon$ is a random variable called the error term.

5
Simple Linear Regression Equation
  • The simple linear regression equation is

$E(y) = \beta_0 + \beta_1 x$
  • The graph of the regression equation is a straight
    line.
  • $\beta_0$ is the y intercept of the regression line.
  • $\beta_1$ is the slope of the regression line.
  • E(y) is the expected value of y for a given x
    value.

6
Simple Linear Regression Equation
  • Positive Linear Relationship

[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is positive]
7
Simple Linear Regression Equation
  • Negative Linear Relationship

[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is negative]
8
Simple Linear Regression Equation
  • No Relationship

[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is 0]
9
Estimated Simple Linear Regression Equation
  • The estimated simple linear regression equation is

$\hat{y} = b_0 + b_1 x$
  • The graph is called the estimated regression
    line.
  • $b_0$ is the y intercept of the line.
  • $b_1$ is the slope of the line.
  • $\hat{y}$ is the estimated value of y for a given x
    value.

10
Least Squares Method
  • Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$
  • where
  • $y_i$ = observed value of the dependent variable
    for the ith observation
  • $\hat{y}_i$ = estimated value of the dependent variable
    for the ith observation
  • Least squares is the regression technique that
    calculates the coefficient estimates $b_0$ and $b_1$ so
    as to minimize the sum of the squared residuals.


11
The Least Squares Method
  • Slope for the Estimated Regression Equation

$b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$
  • y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$
  • where
  • $x_i$ = value of independent variable for ith
    observation
  • $y_i$ = value of dependent variable for ith
    observation
  • $\bar{x}$ = mean value for independent variable
  • $\bar{y}$ = mean value for dependent variable
  • n = total number of observations
12
Example XYZ Auto Sales
  • Simple Linear Regression
  • XYZ Auto periodically has a special week-long
    sale. As part of the advertising campaign XYZ
    runs one or more television commercials during
    the weekend preceding the sale. Data from a
    sample of 5 previous sales are shown below.
    Number of TV Ads    Number of Cars Sold
    2                   17
    2                   21
    2                   18
    1                   17
    3                   27
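The least squares calculation for this sample can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's method (the slides use Excel). It uses the deviation form of the slope, sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), and the function name `least_squares` is an illustrative choice.

```python
# Data from the XYZ Auto Sales slide: (TV ads, cars sold) for 5 sales.
ads = [2, 2, 2, 1, 3]
cars = [17, 21, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated regression line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(ads, cars)
print(f"y-hat = {b0:g} + {b1:g}x")  # prints: y-hat = 10 + 5x
```

The result agrees with the equation used later in the deck, where 3 TV ads give 10 + 5(3) = 25 cars.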

13
Estimated Regression Equation
14
Excel Output
15
Scatter Diagram and Trend Line
16
Relationship Among SST, SSR, SSE
[Figure: observed, estimated, and mean values of y for one observation, with the deviations labeled SST, SSR, and SSE]
  • where
  • SST = total sum of squares
  • SSR = sum of squares due to regression
  • SSE = sum of squares due to error

17
Relationship Among SST, SSR, SSE
  • SST = SSR + SSE
  • where
  • SST = total sum of squares
  • SSR = sum of squares due to regression
  • SSE = sum of squares due to error

18
Degrees of Freedom
  • Relationship Among SST, SSR, SSE
  • SST = SSR + SSE
  • SST DF = n - 1
  • SSR DF = number of independent variables (p)
  • SSE DF = n - p - 1
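As a quick check, the degrees of freedom add up the same way the sums of squares do. The worked line below assumes the XYZ Auto Sales sample (n = 5 observations, p = 1 independent variable):

```latex
\[
\underbrace{n - 1}_{\text{SST}}
  = \underbrace{p}_{\text{SSR}} + \underbrace{(n - p - 1)}_{\text{SSE}},
\qquad
5 - 1 = 1 + (5 - 1 - 1) \;\Rightarrow\; 4 = 1 + 3 .
\]
```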

19
Relationship Among SST, SSR, SSE
20
The Coefficient of Determination
  • The coefficient of determination is the
    proportion of the variability in the dependent
    variable Y that is explained by X.
  • $r^2 = \mathrm{SSR}/\mathrm{SST}$
  • where
  • SST = total sum of squares
  • SSR = sum of squares due to regression
  • SSE = sum of squares due to error

21
Example XYZ Auto
  • Coefficient of Determination
  • $r^2 = \mathrm{SSR}/\mathrm{SST} = 50/72 = .69$
  • The regression relationship is strong since 69%
    of the variation in number of cars sold can be
    explained by the linear relationship between the
    number of TV ads and the number of cars sold.
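The figures 50 and 72 can be reproduced from the five observations. The sketch below assumes the estimated equation y-hat = 10 + 5x used elsewhere in the deck. The variable names are illustrative.

```python
# XYZ Auto Sales sample and the estimated regression equation from the slides.
ads = [2, 2, 2, 1, 3]
cars = [17, 21, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)              # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)            # due to regression
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))  # due to error

r_squared = ssr / sst
print(f"SST={sst:g}  SSR={ssr:g}  SSE={sse:g}  r^2={r_squared:.4f}")
```

SST = SSR + SSE holds exactly here (72 = 50 + 22), and the ratio 50/72 rounds to .69.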

22
The Correlation Coefficient
  • Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
  • where
  • $b_1$ = the slope of the estimated regression
    equation $\hat{y} = b_0 + b_1 x$

23
Example XYZ Auto Sales
  • Sample Correlation Coefficient
  • The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$
    is +.
  • $r_{xy} = +\sqrt{.6944} = +.8333$

24
Testing for Significance
To test for a significant regression
relationship, we must conduct a hypothesis test
to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used: the t test and the
F test.
Both the t test and F test require an estimate
of $\sigma^2$, the variance of $\varepsilon$ in the regression
model.
25
Testing for Significance
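  • An Estimate of $\sigma^2$
  • The mean square error (MSE) provides the estimate
    of $\sigma^2$, and the notation $s^2$ is also used.
  • $s^2 = \mathrm{MSE} = \mathrm{SSE}/(n-p-1)$
  • where $\mathrm{SSE} = \sum (y_i - \hat{y}_i)^2$ and p is the number
    of independent variables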

26
Testing for Significance
  • An Estimate of $\sigma$
  • To estimate $\sigma$ we take the square root of $s^2$.
  • The resulting $s = \sqrt{\mathrm{MSE}}$ is called the standard
    error of the estimate.

27
Sampling Distribution of b1
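  • Expected Value: $E(b_1) = \beta_1$
  • Standard Deviation: $\sigma_{b_1} = \dfrac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}$
  • Estimated Standard Deviation of $b_1$ (also referred
    to as the standard error of $b_1$): $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$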

28
Testing for Significance t Test
  • Hypotheses
  • $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Test Statistic
  • $t = b_1 / s_{b_1}$
  • Rejection Rule
  • Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
  • where $t_{\alpha/2}$ is based on a t distribution with
    n - p - 1 degrees of freedom.

29
Example XYZ Auto Sales
  • t Test
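  • Hypotheses $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Rejection Rule
  • For $\alpha = .05$ and d.f. = 3, $t_{.025} = 3.182$
  • Reject $H_0$ if $t < -3.182$ or $t > 3.182$
  • Test Statistic
  • $t = b_1 / s_{b_1} = 5/1.915 = 2.61$
  • Conclusion
  • Do not reject $H_0$, since 2.61 is not greater than 3.182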
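The test statistic can be traced back to the raw data. The sketch below is a minimal illustration in plain Python. The critical value 3.182 is copied from the slide, not computed, and the constant name `T_CRIT` is an assumption of this example.

```python
import math

ads = [2, 2, 2, 1, 3]
cars = [17, 21, 18, 17, 27]
b0, b1 = 10.0, 5.0          # estimated regression equation from the slides
n, p = len(ads), 1          # 5 observations, 1 independent variable

x_bar = sum(ads) / n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - p - 1))          # standard error of the estimate
sxx = sum((x - x_bar) ** 2 for x in ads)  # sum of squared x-deviations
s_b1 = s / math.sqrt(sxx)                 # standard error of b1
t_stat = b1 / s_b1

T_CRIT = 3.182  # t.025 with 3 d.f., as quoted on the slide
reject_h0 = abs(t_stat) > T_CRIT          # two-tailed rejection rule

print(f"s={s:.3f}  s_b1={s_b1:.3f}  t={t_stat:.2f}  reject H0: {reject_h0}")
```

With only five observations the slope estimate of 5 is not distinguishable from zero at the .05 level, even though r² is .69.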

30
Confidence Interval for $\beta_1$
  • We can use a 95% confidence interval for $\beta_1$ to
    test the hypotheses just used in the t test.
  • $H_0$ is rejected if the hypothesized value of $\beta_1$
    is not included in the confidence interval for
    $\beta_1$.

31
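Confidence Interval for $\beta_1$
  • The form of a confidence interval for $\beta_1$ is

$b_1 \pm t_{\alpha/2}\, s_{b_1}$
  • where $b_1$ is the point estimator and $t_{\alpha/2}\, s_{b_1}$
    is the margin of error.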
32
Example XYZ Auto Sales
  • Rejection Rule
  • Reject $H_0$ if 0 is not included in the
    confidence interval for $\beta_1$.
  • 95% Confidence Interval for $\beta_1$
  • $5 \pm 3.182(1.915) = 5 \pm 6.09$
  • or -1.09 to 11.09
  • Conclusion
  • Cannot reject $H_0$
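The interval can be checked with a short calculation. This sketch assumes MSE = 22/3 and a sum of squared x-deviations of 2, both of which follow from the XYZ Auto Sales data. The critical value 3.182 is taken from the slides. Carrying the unrounded standard error gives a margin of about 6.09.

```python
import math

b1 = 5.0                       # slope estimate from the slides
mse = 22 / 3                   # SSE / (n - p - 1) = 22 / 3
sxx = 2.0                      # sum of (x_i - x_bar)^2 for the sample
s_b1 = math.sqrt(mse / sxx)    # standard error of b1

T_CRIT = 3.182                 # t.025 with 3 d.f., as quoted on the slide
margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected only if 0 falls outside the interval.
contains_zero = lower <= 0 <= upper
print(f"{lower:.2f} to {upper:.2f}; contains 0: {contains_zero}")
```

Because the interval contains 0, this reaches the same conclusion as the t test.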

33
Testing for Significance F Test
  • Hypotheses $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Test Statistic
  • $F = \mathrm{MSR}/\mathrm{MSE}$
  • where MSR = mean square regression
    = SSR / regression degrees of freedom
    = SSR / number of independent variables
34
F Test
  • With only one independent variable, the F test
    will provide the same conclusion as the t test.
  • Rejection Rule
  • Reject $H_0$ if $F > F_\alpha$
  • where $F_\alpha$ is based on an F distribution with 1
    d.f. in the numerator and n - 2 d.f. in the
    denominator.

35
Example XYZ Auto Sales
  • F Test
  • Hypotheses $H_0: \beta_1 = 0$
  • $H_a: \beta_1 \neq 0$
  • Rejection Rule
  • For $\alpha = .05$ and d.f. = 1, 3: $F_{.05} = 10.13$
  • Reject $H_0$ if $F > 10.13$.
  • Test Statistic
  • $F = \mathrm{MSR}/\mathrm{MSE} = 50/7.33 = 6.82$
  • Conclusion
  • We cannot reject $H_0$.
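A short sketch of the F calculation, using SSR = 50 and SSE = 22 from the example. It also checks numerically that, with one independent variable, F equals the square of the t statistic, which is why the two tests agree. The critical value 10.13 is copied from the slide, and the sum of squared x-deviations (2) comes from the sample data.

```python
import math

ssr, sse = 50.0, 22.0
n, p = 5, 1

msr = ssr / p              # mean square regression
mse = sse / (n - p - 1)    # mean square error
f_stat = msr / mse

F_CRIT = 10.13             # F.05 with 1 and 3 d.f., as quoted on the slide
reject_h0 = f_stat > F_CRIT

# t statistic for the same data: b1 / s_b1 with b1 = 5 and Sxx = 2.
t_stat = 5.0 / math.sqrt(mse / 2.0)

print(f"F={f_stat:.2f}  t^2={t_stat ** 2:.2f}  reject H0: {reject_h0}")
```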

36
Some Cautions about the Interpretation of
Significance Tests
  • Rejecting $H_0: \beta_1 = 0$ and concluding that the
    relationship between x and y is significant does
    not enable us to conclude that a cause-and-effect
    relationship is present between x and y.
  • Just because we are able to reject $H_0: \beta_1 = 0$ and
    demonstrate statistical significance does not
    enable us to conclude that there is a linear
    relationship between x and y.

37
Using the Estimated Regression Equation for
Estimation and Prediction
  • Confidence Interval Estimate of $E(y_p)$, the mean or
    expected value of the dependent variable y
    corresponding to the given value $x_p$
  • $\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
  • Prediction Interval Estimate of $y_p$
  • $\hat{y}_p \pm t_{\alpha/2}\, s_{\mathrm{ind}}$
  • where the confidence coefficient is $1 - \alpha$ and
    $t_{\alpha/2}$ is based on a t distribution with n - 2
    d.f.
38
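Using the Estimated Regression Equation for
Estimation and Prediction
  • Confidence Interval Estimate of $E(y_p)$: Standard
    Deviation
  • $s_{\hat{y}_p} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
  • where $s = \sqrt{\mathrm{MSE}} = 2.708$
  • $x_p$ = the particular or given value of the
    independent variable x
  • $\hat{y}_p$ = the point estimate of $E(y_p)$ when $x = x_p$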

39
CONFIDENCE INTERVAL
  • Point Estimation
  • If 3 TV ads are run prior to a sale, we expect
    the mean number of cars sold to be
  • $\hat{y} = 10 + 5(3) = 25$ cars
  • Confidence Interval for $E(y_p)$
  • The 95% confidence interval estimate of the mean
    number of cars sold when 3 TV ads are run is
  • $25 \pm (3.182)(2.265) = 25 \pm 7.21$, or 17.79 to 32.21 cars
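The standard deviation 2.265 and the interval endpoints can be reproduced as follows. This is a sketch under the slides' own numbers: SSE = 22 with 3 degrees of freedom, the equation y-hat = 10 + 5x, and the tabulated value t.025 = 3.182. Variable names are illustrative.

```python
import math

ads = [2, 2, 2, 1, 3]          # TV ads in the XYZ Auto Sales sample
n = len(ads)
x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)

s = math.sqrt(22 / 3)          # standard error of the estimate, sqrt(MSE)
T_CRIT = 3.182                 # t.025 with n - 2 = 3 d.f., from the slide

x_p = 3
y_hat_p = 10 + 5 * x_p         # point estimate of E(y_p)

# Estimated standard deviation of y-hat at x_p.
s_mean = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
margin = T_CRIT * s_mean
lower, upper = y_hat_p - margin, y_hat_p + margin

print(f"s_mean={s_mean:.3f}  interval: {lower:.2f} to {upper:.2f}")
```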

40
Prediction Interval
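  • Prediction Interval Estimate of $y_p$
  • $\hat{y}_p \pm t_{\alpha/2}\, s_{\mathrm{ind}}$
  • where $s_{\mathrm{ind}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
    is the estimated standard deviation of an
    individual value of y at $x_p$,
  • the confidence coefficient is $1 - \alpha$, and
    $t_{\alpha/2}$ is based on a t distribution with n - 2
    d.f.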

41
PREDICTION
  • Prediction Interval for $y_p$
  • The 95% prediction interval estimate of the number
    of cars sold in one particular week (a new
    situation in the future, same population) when 3
    TV ads are run is centered on $\hat{y} = 10 + 5(3) = 25$ cars
  • $25 \pm (3.182)(3.53) = 25 \pm 11.23$, or 13.8 to 36.2 cars
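The prediction interval is wider than the confidence interval for the mean because it adds the variance of an individual observation (the extra 1 under the square root). A minimal sketch, again assuming the slides' values (MSE = 22/3, t.025 = 3.182, y-hat = 10 + 5x):

```python
import math

ads = [2, 2, 2, 1, 3]
n = len(ads)
x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)

s = math.sqrt(22 / 3)          # sqrt(MSE)
T_CRIT = 3.182                 # t.025 with 3 d.f., from the slide

x_p = 3
y_hat_p = 10 + 5 * x_p

# Standard deviation for an individual value of y at x_p.
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)
margin = T_CRIT * s_ind
lower, upper = y_hat_p - margin, y_hat_p + margin

print(f"s_ind={s_ind:.2f}  interval: {lower:.1f} to {upper:.1f}")
```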

42
Some Cautions about the Interpretation of
Significance Tests
  • Rejecting $H_0: \beta_1 = 0$ and concluding that the
    relationship between x and y is significant does
    not enable us to conclude that a cause-and-effect
    relationship is present between x and y.
  • Just because we are able to reject $H_0: \beta_1 = 0$ and
    demonstrate statistical significance does not
    enable us to conclude that there is a linear
    relationship between x and y.

43
Assumptions About the Error Term $\varepsilon$
1. The error $\varepsilon$ is a random variable with mean
of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the
same for all values of the independent
variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed
random variable.
44
Residual
  • The assumption of constant variance can be
    checked by looking at a residual versus fit plot.
  • Residual for observation i: $y_i - \hat{y}_i$


45
Residual Plot Against x
  • If the assumption that the variance of $\varepsilon$ is the
    same for all values of x is valid, and the
    assumed regression model is an adequate
    representation of the relationship between the
    variables, then the residual plot should give an
    overall impression of a horizontal band of points.
46
Residual Plot Against x
[Figure: Good Pattern; residual (vertical axis, centered at 0) plotted against x]
47
CONSTANT VARIANCE
[Figure: residual plot; vertical axis is the residual, centered at 0]
48
Non Constant Variance
[Figure: two residual plots; vertical axis is the residual, centered at 0]
49
Residual Plot Against x
[Figure: Nonconstant Variance; residual (vertical axis, centered at 0) plotted against x]
50
Residual Plot Against x
[Figure: Model Form Not Adequate; residual (vertical axis, centered at 0) plotted against x]
51
Example XYZ Auto Sales
  • Residual Plot

52
Standardized Residuals
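  • Method to test the normal distribution assumption
    (error term)
  • Standardized Residual for Observation i:
    $\dfrac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$
  • where $s_{y_i - \hat{y}_i} = s \sqrt{1 - h_i}$
  • and $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$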

53
Standardized Residuals
  • If the normality assumption is satisfied we should
    expect to see about 95% of the standardized
    residuals between -2 and +2
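To make the standardized residual concrete, the sketch below applies the usual definitions (leverage h_i = 1/n + (x_i - x̄)² / Σ(x_i - x̄)², standardized residual = residual / (s·sqrt(1 - h_i))) to the XYZ Auto Sales sample. Using that sample here is an assumption of this illustration. The slides' own worked example for this section did not survive in the transcript.

```python
import math

ads = [2, 2, 2, 1, 3]
cars = [17, 21, 18, 17, 27]
b0, b1 = 10.0, 5.0             # estimated regression equation from the slides
n = len(ads)

x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, cars)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation, then residual scaled by its own std. deviation.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in ads]
std_residuals = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for x, e, h, z in zip(ads, residuals, leverage, std_residuals):
    print(f"x={x}  residual={e:+.0f}  leverage={h:.2f}  standardized={z:+.2f}")

print("all within +/-2:", all(abs(z) <= 2 for z in std_residuals))
print("any leverage above 6/n:", any(h > 6 / n for h in leverage))
```

For this small sample every standardized residual falls inside ±2 and no leverage exceeds 6/n = 1.2, so nothing is flagged.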

54
Influential Observation
  • Ex

55
Continued
  • Ex

56
Continued
  • An influential observation has leverage h that is
    greater than 6/n.
  • In this case we do not have an influential
    observation.

57
Standard Deviation of the ith Residual
  • The standard error of the estimate is s = .77

58
Standardized Residual for Observations i
  • Ex y1.02.45(x)
  • If the normality assumption is satisfied we should
    expect to see about 95% of the standardized
    residuals between -2 and +2

59
Example With Excel
  • Page 587 Problem 45
  • Go to Excel, Select Tools, Choose Data Analysis,
    Choose Regression from the list of Analysis
    tools. Click OK.
  • Enter the Y input Range, Enter the X range,
    select labels, select confidence levels. Select
    Residuals, Residuals Plot, Standardized Residuals.

60
(No Transcript)
61
Output
  • Excel Output

62
(No Transcript)
63
(No Transcript)
64
Checking for Outliers.
  • We are going to use the scatter plot of x versus
    y and the Standardized Residual versus the
    predicted plot. The outlier will not fit the
    trend shown by the remaining data.

65
Leverage Observation
  • We will detect influential observations using the
    leverage $h_i$ of each observation.
  • An influential observation has h that is greater
    than 6/n

66
Problem 51 Using Excel
  • Consider the following data
  • Go to Excel, Select Tools, Choose Data Analysis,
    Choose Regression from the list of Analysis
    tools. Click OK.
  • Enter the Y input Range, Enter the X range,
    select labels, select confidence levels. Select
    Residuals, Residuals Plot

67
Continued
68
Continued
69
Continued
  • We identify an observation as having high
    leverage if $h_i > 6/n$; for these data, 6/n = 6/8 =
    .75. Since the leverage for the observation x =
    22, y = 19 is .76, we would identify observation
    8 as a high leverage point. Thus, we conclude
    that observation 8 is an influential observation.

70
Continued (Excel)
The last two observations in the data set
appear to be outliers since the standardized
residuals for these observations are 2.00 and
-2.16, respectively.
71
Continued
The scatter diagram indicates that the
observation x = 22, y = 19 is an influential
observation.