Introduction to Probability and Statistics Thirteenth Edition - PowerPoint PPT Presentation

Description:

Introduction to Probability and Statistics Thirteenth Edition Chapter 12 Linear Regression and Correlation ... – PowerPoint PPT presentation

Slides: 46
Provided by: ValuedG139

Transcript and Presenter's Notes

1
Introduction to Probability and Statistics
Thirteenth Edition
  • Chapter 12
  • Linear Regression and Correlation

2
Correlation & Regression
  • Univariate & Bivariate Statistics
  • U: frequency distribution, mean, mode, range,
    standard deviation
  • B: correlation, two variables
  • Correlation
  • linear pattern of relationship between one
    variable (x) and another variable (y); an
    association between two variables
  • graphical representation of the relationship
    between two variables
  • Warning
  • No proof of causality
  • Cannot assume x causes y

3
1. Correlation Analysis
  • Correlation coefficient measures the strength of
    the relationship between x and y

Sample Pearson's correlation coefficient: r = Sxy / √(Sxx · Syy)
4
Pearson's Correlation Coefficient
  • r indicates
  • strength of relationship (strong, weak, or none)
  • direction of relationship
  • positive (direct) variables move in same
    direction
  • negative (inverse) variables move in opposite
    directions
  • r ranges in value from -1.0 to +1.0

-1.0: strong negative relationship
 0.0: no relationship
+1.0: strong positive relationship
5
Limitations of Correlation
  • linearity
  • cannot describe non-linear relationships
  • e.g., relation between anxiety and performance
  • no proof of causation
  • Cannot assume x causes y

6
Some Correlation Patterns
Linear relationships
Curvilinear relationships
[Figure: four scatter plots of Y against X illustrating linear and curvilinear relationships]
7
Some Correlation Patterns
Strong relationships
Weak relationships
[Figure: four scatter plots of Y against X illustrating strong and weak relationships]
8
Example
  • The table shows the heights and weights of n =
    10 randomly selected college football players.

Player      1    2    3    4    5    6    7    8    9    10
Height, x   73   71   75   72   72   75   67   69   71   69
Weight, y   185  175  200  210  190  195  150  170  180  175
9
Example scatter plot
r = .8261: strong positive correlation. As the
player's height increases, so does his weight.
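The value of r quoted above can be checked directly from the table. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the variable names are illustrative assumptions, not part of the slides. It uses the usual sums of squares about the means, r = Sxy / sqrt(Sxx * Syy).

```python
import math

# Heights (x) and weights (y) of the 10 players from the table above.
height = [73, 71, 75, 72, 72, 75, 67, 69, 71, 69]
weight = [185, 175, 200, 210, 190, 195, 150, 170, 180, 175]

def pearson_r(x, y):
    """Sample Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(height, weight)
print(f"r = {r:.4f}")  # r = 0.8261
```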
10
Inference using r
  • The population coefficient of correlation is
    called ρ (rho). We can test for a significant
    correlation between x and y using a t test.

11
Example
  • Is there a significant positive correlation
    between weight and height in the population of
    all college football players?

Use the t-table with n - 2 = 8 df to bound the
p-value as p-value < .005. There is a significant
positive correlation between weight and height in
the population of all college football players.
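The test can be reproduced numerically. This sketch assumes the standard t statistic for H0: ρ = 0, namely t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; the slide's own formula did not survive in this transcript, so that form is an assumption. The critical value 3.355 is the upper-tail t(.005) entry for 8 df from a standard t-table, and the variable names are illustrative.

```python
import math

n = 10       # sample size from the football example
r = 0.8261   # sample correlation reported for the scatter plot

# t statistic for H0: rho = 0 against Ha: rho > 0, with n - 2 df.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Upper-tail critical value t(.005) for 8 df, read from a t-table.
t_crit_005 = 3.355

print(f"t = {t_stat:.2f} with {n - 2} df")
if t_stat > t_crit_005:
    print("p-value < .005: significant positive correlation")
else:
    print("p-value >= .005")
```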
12
2. Linear Regression
  • Regression = Correlation + Prediction
  • Regression analysis is used to predict the value
    of one variable (the dependent variable) on the
    basis of other variables (the independent
    variables).
  • Dependent variable denoted y
  • Independent variables denoted x1, x2, ..., xk

13
Example
  • Let y be the monthly sales revenue for a company.
    This might be a function of several variables
  • x1 = advertising expenditure
  • x2 = time of year
  • x3 = state of economy
  • x4 = size of inventory
  • We want to predict y using knowledge of x1, x2,
    x3 and x4.

14
Some Questions
  • Which of the independent variables are useful and
    which are not?
  • How could we create a prediction equation to
    allow us to predict y using knowledge of x1, x2,
    x3 etc?
  • How good is this prediction?

We start with the simplest case, in which the
response y is a function of a single independent
variable, x.
15
Model Building
16
A Simple Linear Regression Model
  • Explanatory and Response Variables are Numeric
  • Relationship between the mean of the response
    variable and the level of the explanatory
    variable assumed to be approximately linear
    (straight line)
  • Model: y = β0 + β1x + ε
  • β1 > 0 ⇒ Positive Association
  • β1 < 0 ⇒ Negative Association
  • β1 = 0 ⇒ No Association

17
Picturing the Simple Linear Regression Model
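[Figure: regression plot of Y against X showing the line with intercept α and slope β (the change in y for a 1-unit increase in x), and the error ε as the vertical distance of a point from the line]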
18
Simple Linear Regression Analysis
y = actual value of a score; ŷ = predicted value
  • Variables
  • x = Independent Variable
  • y = Dependent Variable
  • Parameters
  • α = y-Intercept
  • β = Slope
  • ε ~ normal distribution with mean 0 and variance
    σ²

19
Simple Linear Regression Model
[Figure: the line y = α + βx plotted against x, with β = slope = Δy/Δx and α = intercept]
20
The Method of Least Squares
  • The equation of the best-fitting line, ŷ = a + bx,
    is calculated using a set of n pairs (xi, yi).
  • We choose our estimates a and b to estimate α and
    β so that the sum of the squared vertical
    distances of the points from the line,
    SSE = Σ(y - ŷ)² = Σ(y - a - bx)²,
    is minimized.

21
Least Squares Estimators
22
Example
  • The table shows the IQ scores for a random
    sample of n = 10 college freshmen, along with
    their final calculus grades.

Student            1    2    3    4    5    6    7    8    9    10
IQ Scores, x       39   43   21   64   57   47   28   75   34   52
Calculus grade, y  65   78   52   82   92   89   73   98   56   75
Use your calculator to find the sums and sums of
squares.
23
Example
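The worked calculation for this slide did not survive in the transcript. As a hedged reconstruction, the sketch below computes the sums, sums of squares, and the least-squares estimates using the standard formulas b = Sxy / Sxx and a = ȳ - b·x̄; the variable names are illustrative assumptions.

```python
# IQ scores (x) and calculus grades (y) from the table above.
iq = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
grade = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(iq)

# Sums and sums of squares, as one would obtain on a calculator.
sum_x, sum_y = sum(iq), sum(grade)
sum_x2 = sum(x * x for x in iq)
sum_xy = sum(x * y for x, y in zip(iq, grade))

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates of the slope and intercept.
b = s_xy / s_xx
a = sum_y / n - b * (sum_x / n)

print(f"Sxx = {s_xx:.0f}, Sxy = {s_xy:.0f}")
print(f"y-hat = {a:.2f} + {b:.4f} x")
```

With these data the fitted line is approximately ŷ = 40.78 + 0.7656x.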
24
The Analysis of Variance
  • The total variation in the experiment is measured
    by the total sum of squares, Total SS = Syy = Σ(y - ȳ)²
  • The Total SS is divided into two parts
  • SSR (sum of squares for regression) measures the
    variation explained by using x in the model.
  • SSE (sum of squares for error) measures the
    leftover variation not explained by x.

25
The Analysis of Variance
  • We calculate SSR = (Sxy)²/Sxx and SSE = Total SS - SSR

26
The ANOVA Table
  • Total df = n - 1
  • Regression df = 1
  • Error df = n - 1 - 1 = n - 2
  • Mean Squares: MSR = SSR/(1), MSE = SSE/(n - 2)

Source      df     SS        MS          F
Regression  1      SSR       SSR/(1)     MSR/MSE
Error       n - 2  SSE       SSE/(n-2)
Total       n - 1  Total SS
27
The Calculus Problem
Source      df  SS         MS         F
Regression  1   1449.9741  1449.9741  19.14
Error       8   606.0259   75.7532
Total       9   2056.0000
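The entries of this table can be reproduced from the raw data. The sketch below is one way to do it, assuming the usual identities SSR = Sxy² / Sxx and SSE = Total SS - SSR; the variable names are illustrative.

```python
iq = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
grade = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(iq)
x_bar, y_bar = sum(iq) / n, sum(grade) / n

s_xx = sum((x - x_bar) ** 2 for x in iq)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, grade))
total_ss = sum((y - y_bar) ** 2 for y in grade)  # Syy

ssr = s_xy ** 2 / s_xx   # variation explained by x
sse = total_ss - ssr     # leftover variation
msr = ssr / 1            # regression df = 1
mse = sse / (n - 2)      # error df = n - 2
f_stat = msr / mse

print(f"Regression  1  {ssr:.4f}  {msr:.4f}  {f_stat:.2f}")
print(f"Error       {n - 2}  {sse:.4f}  {mse:.4f}")
print(f"Total       {n - 1}  {total_ss:.4f}")
```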
28
Testing the Usefulness of the Model (The F Test)
  • You can test the overall usefulness of the model
    using an F test. If the model is useful, MSR will
    be large compared to the unexplained variation,
    MSE.

This test is exactly equivalent to the t-test,
with t² = F.
29
Minitab Output
30
Testing the Usefulness of the Model
  • The first question to ask is whether the
    independent variable x is of any use in
    predicting y.
  • If it is not, then the value of y does not
    change, regardless of the value of x. This
    implies that the slope of the line, β, is zero.

31
Testing the Usefulness of the Model
The test statistic is a function of b, our best
estimate of β. Using MSE as the best estimate of
the random variation σ², we obtain a t statistic.
32
The Calculus Problem
  • Is there a significant relationship between the
    calculus grades and the IQ scores at the 5% level
    of significance?

Reject H0 when |t| > 2.306. Since t = 4.38
falls into the rejection region, H0 is rejected.
There is a significant linear relationship
between the calculus grades and the IQ scores for
the population of college freshmen.
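The t statistic quoted above can be checked with a short sketch. It assumes the standard form t = (b - 0) / sqrt(MSE / Sxx) with n - 2 = 8 degrees of freedom, since the slide's formula is not preserved in this transcript; 2.306 is the critical value given on the slide, and the variable names are illustrative. The last line also shows that t² matches the F statistic from the ANOVA table.

```python
import math

iq = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
grade = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(iq)
x_bar, y_bar = sum(iq) / n, sum(grade) / n

s_xx = sum((x - x_bar) ** 2 for x in iq)
s_yy = sum((y - y_bar) ** 2 for y in grade)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, grade))

b = s_xy / s_xx                             # estimated slope
mse = (s_yy - s_xy ** 2 / s_xx) / (n - 2)   # SSE / (n - 2)

# t statistic for H0: beta = 0 against a two-sided alternative.
t_stat = b / math.sqrt(mse / s_xx)
t_crit = 2.306  # two-tailed 5% critical value for 8 df, from the slide

print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
print(f"t^2 = {t_stat ** 2:.2f}")  # equals F = MSR / MSE
```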
33
Measuring the Strength of the Relationship
  • If the independent variable x is useful in
    predicting y, you will want to know how well the
    model fits.
  • The strength of the relationship between x and y
    can be measured using the coefficient of
    determination, r² = SSR / Total SS

34
Measuring the Strength of the Relationship
  • Since Total SS = SSR + SSE, r² measures
  • the proportion of the total variation in the
    responses that can be explained by using the
    independent variable x in the model.
  • the percent reduction in the total variation by
    using the regression equation rather than just
    using the sample mean y-bar to estimate y.

For the calculus problem, r² = .705, or 70.5%,
meaning that 70.5% of the variability in calculus
grades can be explained by the model.
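As a quick check of this figure, the sketch below computes r² as SSR / Total SS from the values in the ANOVA table for the calculus problem; the variable names are illustrative.

```python
# Sums of squares taken from the ANOVA table for the calculus problem.
ssr = 1449.9741
sse = 606.0259
total_ss = ssr + sse          # Total SS = SSR + SSE

# Coefficient of determination: share of variation explained by x.
r_squared = ssr / total_ss

print(f"Total SS = {total_ss:.4f}")
print(f"r^2 = {r_squared:.3f} ({r_squared:.1%})")
```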
35
Estimation and Prediction
Confidence interval
Prediction interval
36
The Calculus Problem
  • Estimate the average calculus grade for students
    whose IQ score is 50 with a 95% confidence
    interval.

37
The Calculus Problem
  • Estimate the calculus grade for a particular
    student whose IQ score is 50 with a 95%
    prediction interval.

Notice how much wider this interval is!
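The interval calculations for these two slides are not preserved in the transcript. The sketch below reconstructs them under the standard formulas: ŷ ± t·sqrt(MSE·(1/n + (x0 - x̄)²/Sxx)) for the mean response, and the same expression with an extra 1 inside the parentheses for a single new observation. The critical value 2.306 (8 df) is the one quoted earlier in the slides; the variable names are illustrative.

```python
import math

iq = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
grade = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(iq)
x_bar, y_bar = sum(iq) / n, sum(grade) / n

s_xx = sum((x - x_bar) ** 2 for x in iq)
s_yy = sum((y - y_bar) ** 2 for y in grade)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, grade))

b = s_xy / s_xx
a = y_bar - b * x_bar
mse = (s_yy - s_xy ** 2 / s_xx) / (n - 2)

x0 = 50          # IQ score of interest
t_crit = 2.306   # t(.025) with 8 df
y_hat = a + b * x0

# Half-widths: mean response (confidence) vs. single observation (prediction).
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
ci_half = t_crit * math.sqrt(mse * leverage)
pi_half = t_crit * math.sqrt(mse * (1 + leverage))

print(f"estimate: {y_hat:.2f}")
print(f"95% confidence interval: {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"95% prediction interval: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The prediction half-width is roughly three times the confidence half-width here, because the error variance of a single observation dominates the uncertainty in the fitted line.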
38
Minitab Output
  • Green prediction bands are always wider than red
    confidence bands.
  • Both intervals are narrowest when x = x̄.

39
Estimation and Prediction
  • Once you have
  • determined that the regression line is useful
  • used the diagnostic plots to check for violation
    of the regression assumptions.
  • You are ready to use the regression line to
  • Estimate the average value of y for a given value
    of x
  • Predict a particular value of y for a given value
    of x.

40
Estimation and Prediction
  • The best estimate of either E(y) or y for
    a given value x = x0 is ŷ = a + bx0
  • Particular values of y are more difficult to
    predict, requiring a wider range of values in the
    prediction interval.

41
Regression Assumptions
  • Remember that the results of a regression
    analysis are only valid when the necessary
    assumptions have been satisfied.

Assumptions
  1. The relationship between x and y is linear, given
    by y = α + βx + ε.
  2. The random error terms ε are independent and, for
    any value of x, have a normal distribution with
    mean 0 and constant variance, σ².

42
Diagnostic Tools
  • Normal probability plot or histogram of residuals
  • Plot of residuals versus fit or residuals versus
    variables
  • Plot of residual versus order

43
Residuals
  • The residual error is the leftover variation
    in each data point after the variation explained
    by the regression model has been removed.
  • If all assumptions have been met, these residuals
    should be normal, with mean 0 and variance σ².
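To make the definition concrete, the sketch below computes the residuals y - ŷ for the calculus data from the earlier example, using the least-squares line fitted to those data. The variable names are illustrative assumptions. The residuals from a least-squares fit with an intercept sum to zero (up to rounding), and their squares sum to SSE.

```python
iq = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
grade = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(iq)
x_bar, y_bar = sum(iq) / n, sum(grade) / n

s_xx = sum((x - x_bar) ** 2 for x in iq)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, grade))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed value minus fitted value for each data point.
fitted = [a + b * x for x in iq]
residuals = [y - y_fit for y, y_fit in zip(grade, fitted)]

for x, y, e in zip(iq, grade, residuals):
    print(f"x = {x:2d}  y = {y:2d}  residual = {e:7.2f}")

sse = sum(e ** 2 for e in residuals)
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"SSE = {sse:.4f}")
```

These residuals are the quantities plotted in the normal probability plot and the residuals-versus-fits plot.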

44
Normal Probability Plot
  • If the normality assumption is valid, the plot
    should resemble a straight line, sloping upward
    to the right.
  • If not, you will often see the pattern fail in
    the tails of the graph.

45
Residuals versus Fits
  • If the equal variance assumption is valid, the
    plot should appear as a random scatter around the
    zero center line.
  • If not, you will see a pattern in the residuals.