Correlation and Regression
Provided by: BetsyF6
1
Correlation and Regression
Chapter 9
Elementary Statistics, Larson and Farber
2
Section 9.1
Correlation
3
Correlation
A relationship between two variables
Explanatory (independent) variable x and response (dependent) variable y:
x = Hours of Training, y = Number of Accidents
x = Shoe Size, y = Height
x = Cigarettes smoked per day, y = Lung Capacity
x = Score on SAT, y = Grade Point Average
x = Height, y = IQ
What type of relationship exists between the two
variables and is the correlation significant?
4
Scatter Plots and Types of Correlation
x = hours of training, y = number of accidents
[Scatter plot: Accidents (0 to 60) against Hours of Training (0 to 20)]
Negative correlation: as x increases, y decreases
5
Scatter Plots and Types of Correlation
x = SAT score, y = GPA
[Scatter plot: GPA (1.50 to 4.00) against Math SAT (300 to 800)]
Positive correlation: as x increases, y increases
6
Scatter Plots and Types of Correlation
x = height, y = IQ
[Scatter plot: IQ (80 to 160) against Height (60 to 80)]
No linear correlation
7
Correlation Coefficient
A measure of the strength and direction of a
linear relationship between two variables
The range of r is from -1 to 1.
If r is close to 1 there is a strong positive
correlation.
If r is close to -1 there is a strong negative
correlation.
If r is close to 0 there is no linear correlation.
8
Application
x = absences, y = final grade
 x    y
 8   78
 2   92
 5   90
12   58
15   43
 9   74
 6   81
[Scatter plot: Final Grade (40 to 95) against Absences (0 to 16)]
9
Computation of r
       x     y     xy    x^2     y^2
1      8    78    624     64    6084
2      2    92    184      4    8464
3      5    90    450     25    8100
4     12    58    696    144    3364
5     15    43    645    225    1849
6      9    74    666     81    5476
7      6    81    486     36    6561
Sum   57   516   3751    579   39898
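The computation of r from these column sums can be sketched in Python. This is a minimal sketch that assumes the standard computational formula $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$; the formula itself did not survive in this transcript, but it reproduces the magnitude 0.975 that the later slides quote. The function name `correlation` is illustrative only.

```python
import math

# Absences (x) and final grades (y) from the Application slide.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

def correlation(xs, ys):
    """Pearson correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(x, y)
print(round(r, 3))  # -0.975
```

The negative sign reflects that grades fall as absences rise.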
10
Hypothesis Test for Significance
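r is the correlation coefficient for the sample.
The correlation coefficient for the population is
$\rho$ (rho).
For a two-tailed test for significance:
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
For left-tailed and right-tailed tests of negative or
positive significance:
$H_0: \rho \geq 0$, $H_a: \rho < 0$ (left-tailed)
$H_0: \rho \leq 0$, $H_a: \rho > 0$ (right-tailed)
The sampling distribution for r is a
t-distribution with n - 2 d.f.
Standardized test statistic: $t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$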
11
Test of Significance
You found the correlation between the number of
times absent and a final grade to be r = -0.975. There
were seven pairs of data. Test the significance of
this correlation. Use $\alpha = 0.01$.
1. Write the null and alternative hypotheses.
$H_0: \rho = 0$ (The correlation is not significant)
$H_a: \rho \neq 0$ (The correlation is significant)
2. State the level of significance.
$\alpha = 0.01$
3. Identify the sampling distribution.
A t-distribution with 5 degrees of freedom
12
Rejection Regions
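Critical values: $\pm t_0$
[Figure: t-distribution curve centered at 0 with rejection regions in both tails]
4. Find the critical value: $t_0 = \pm 4.032$.
5. Find the rejection region: t < -4.032 or t > 4.032.
6. Find the test statistic: $t = \frac{-0.975}{\sqrt{(1 - (-0.975)^2)/(7 - 2)}} = -9.811$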
13
[Figure: t-distribution with critical values -4.032 and 4.032 marked]
7. Make your decision.
t = -9.811 falls in the rejection region. Reject
the null hypothesis.
8. Interpret your decision.
There is a significant correlation between the
number of times absent and final grades.
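The eight steps of this test can be traced in a few lines of Python. This is a sketch, not part of the original slides: it assumes the test statistic $t = r / \sqrt{(1 - r^2)/(n - 2)}$ and a negative sign on r (more absences go with lower grades), and it takes the critical value 4.032 from the slide instead of computing it.

```python
import math

r = -0.975          # sample correlation from the slides (sign assumed negative)
n = 7               # number of data pairs
t_critical = 4.032  # two-tailed critical value, alpha = 0.01, 5 d.f. (from the slide)

def t_statistic(r, n):
    """Standardized test statistic for the significance of r."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(r, n)
# Two-tailed test: reject H0 when t lies beyond either critical value.
reject_null = abs(t) > t_critical

print(round(t, 2), reject_null)
```

With the rounded r = -0.975 the statistic comes out near -9.81, far inside the left rejection region.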
14
Section 9.2
Linear Regression
15
The Line of Regression
Once you know there is a significant linear
correlation, you can write an equation describing
the relationship between the x and y variables.
This equation is called the line of regression or
least squares line.
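The equation of a line may be written as $y = mx + b$,
where m is the slope of the line and b is
the y-intercept.
The line of regression is $\hat{y} = mx + b$
The slope m is $m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
The y-intercept is $b = \bar{y} - m\bar{x}$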
16
$(x_i, y_i)$ = a data point
$(x_i, \hat{y}_i)$ = a point on the line with the same x-value
$d_i = y_i - \hat{y}_i$ = a residual
[Scatter plot with regression line: revenue (180 to 260) against Ad (1.5 to 3.0)]
17
Write the equation of the line of regression with
x = number of absences and y = final grade.
       x     y     xy    x^2     y^2
1      8    78    624     64    6084
2      2    92    184      4    8464
3      5    90    450     25    8100
4     12    58    696    144    3364
5     15    43    645    225    1849
6      9    74    666     81    5476
7      6    81    486     36    6561
Sum   57   516   3751    579   39898
Calculate m and b.
The line of regression is
$\hat{y} = -3.924x + 105.667$
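The calculation of m and b can be checked from the column sums. The sketch below assumes the usual least-squares formulas $m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $b = \bar{y} - m\bar{x}$, which did not survive in this transcript; the variable names are illustrative.

```python
# Column sums from the table of absences (x) and final grades (y).
n = 7
sum_x, sum_y = 57, 516
sum_xy, sum_x2 = 3751, 579

# Slope of the least-squares line.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: the line passes through the point of means (x-bar, y-bar).
x_bar = sum_x / n
y_bar = sum_y / n
b = y_bar - m * x_bar

print(round(m, 3), round(b, 3))
```

Carrying full precision gives an intercept that can differ from the slide's 105.667 in the last digit, because the slide rounds m before computing b.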
18
The Line of Regression
m = -3.924 and b = 105.667
The line of regression is $\hat{y} = -3.924x + 105.667$
[Scatter plot with regression line: Final Grade (40 to 95) against Absences]
Note that the point $(\bar{x}, \bar{y})$ = (8.143, 73.714) is
on the line.
19
Predicting y Values
The regression line can be used to predict values
of y for values of x falling within the range of
the data.
The regression equation for number of times
absent and final grade is
$\hat{y} = -3.924x + 105.667$
Use this equation to predict the expected grade
for a student with (a) 3 absences (b) 12
absences
(a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
(b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
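The two predictions can be reproduced with a small helper. This is an illustrative sketch: the function name `predict_grade` and the range check are assumptions, with the limits 2 and 15 taken as the smallest and largest number of absences in the data, since the line should only be used within the range of the data.

```python
def predict_grade(absences, m=-3.924, b=105.667, x_min=2, x_max=15):
    """Predict a final grade from the regression line y-hat = m*x + b.

    Raises ValueError for x outside the observed range of absences,
    because the line is only meant for values within the data.
    """
    if not x_min <= absences <= x_max:
        raise ValueError("x is outside the range of the data")
    return m * absences + b

for absences in (3, 12):
    print(absences, round(predict_grade(absences), 3))
```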
20
Section 9.3
Measures of Regression and Correlation
21
The Coefficient of Determination
The coefficient of determination, r2, is the
ratio of explained variation in y to the total
variation in y.
The correlation coefficient of number of times
absent and final grade is r = -0.975. The
coefficient of determination is $r^2 = (-0.975)^2 = 0.9506$.
Interpretation: About 95% of the variation in
final grades can be explained by the number of
times a student is absent. The other 5% is
unexplained and can be due to sampling error or
other variables such as intelligence, amount of
time studied, etc.
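The ratio in this definition can be computed directly from the data. The sketch below is an illustration under the assumption that explained variation means $\sum(\hat{y}_i - \bar{y})^2$ and total variation means $\sum(y_i - \bar{y})^2$; it fits the line without rounding, so the result (about 0.950) differs slightly from the value obtained by squaring the rounded r.

```python
# Absences (x) and final grades (y) from the Application slide.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept, kept at full precision.
m = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
     / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - m * x_bar
y_hat = [m * a + b0 for a in x]

explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # variation explained by the line
total = sum((b - y_bar) ** 2 for b in y)            # total variation in y
r_squared = explained / total

print(round(r_squared, 3))  # 0.95
```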
22
The Standard Error of Estimate
23
The Standard Error of Estimate
Calculate $\hat{y}$ for each x.
       x     y   $\hat{y}$   $(y - \hat{y})^2$
1      8    78    74.275    13.8756
2      2    92    97.819    33.8608
3      5    90    86.047    15.6262
4     12    58    58.579     0.3352
5     15    43    46.807    14.4932
6      9    74    70.351    13.3152
7      6    81    82.123     1.2611
Sum of $(y - \hat{y})^2$ = 92.767
$s_e = \sqrt{92.767 / (7 - 2)} = 4.307$
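The standard error of estimate can be recomputed from the residuals. This sketch assumes the formula $s_e = \sqrt{\sum(y_i - \hat{y}_i)^2 / (n - 2)}$, which is missing from the transcript but is consistent with the values 92.767 and 4.307 shown on the slide; it uses the rounded line $\hat{y} = -3.924x + 105.667$.

```python
import math

# Absences (x) and final grades (y) from the Application slide.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
m, b = -3.924, 105.667  # rounded regression coefficients from the slides

# Squared residual for each data point: (observed - predicted)^2.
squared_residuals = [(yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y)]
sum_sq = sum(squared_residuals)

n = len(x)
s_e = math.sqrt(sum_sq / (n - 2))  # n - 2 degrees of freedom

print(round(sum_sq, 3), round(s_e, 3))  # 92.767 4.307
```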
24
Prediction Intervals
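Given a specific linear regression equation and
x0, a specific value of x, a c-prediction
interval for y is
$\hat{y} - E < y < \hat{y} + E$
where
$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$
The point estimate is $\hat{y}$ and E is the maximum
error of estimate.
Use a t-distribution with n - 2 degrees of
freedom.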
25
Application
Construct a 90% prediction interval for a final
grade when a student has been absent 6 times.
1. Find the point estimate:
$\hat{y} = -3.924(6) + 105.667 = 82.123$
The point (6, 82.123) is the point on the
regression line with x-coordinate of 6.
26
Application
Construct a 90% prediction interval for a final
grade when a student has been absent 6 times.
2. Find E, using $t_c = 2.015$ (t-distribution with 5 d.f.), $s_e = 4.307$, n = 7 and $\bar{x} = 8.143$.
At the 90% level of confidence, the maximum error
of estimate is 9.438.
27
Application
Construct a 90% prediction interval for a final
grade when a student has been absent 6 times.
3. Find the endpoints.
$\hat{y} - E = 82.123 - 9.438 = 72.685$
$\hat{y} + E = 82.123 + 9.438 = 91.561$
$72.685 < y < 91.561$
When x = 6, the 90% prediction interval is from
72.685 to 91.561.
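The three steps of this application can be checked numerically. The sketch assumes the maximum-error formula $E = t_c s_e \sqrt{1 + 1/n + n(x_0 - \bar{x})^2 / (n\sum x^2 - (\sum x)^2)}$, which is not legible in the transcript, and the critical value 2.015 is taken from a standard t-table (90%, 5 d.f.) since the slides do not show it.

```python
import math

n = 7
sum_x, sum_x2 = 57, 579  # column sums from the computation table
s_e = 4.307              # standard error of estimate from the slides
t_c = 2.015              # assumed t-table value: 90% level, n - 2 = 5 d.f.
x0 = 6                   # number of absences of interest

# Step 1: point estimate on the regression line.
y_hat = -3.924 * x0 + 105.667

# Step 2: maximum error of estimate.
x_bar = sum_x / n
E = t_c * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
)

# Step 3: endpoints of the interval.
lower, upper = y_hat - E, y_hat + E
print(round(y_hat, 3), round(E, 2), round(lower, 2), round(upper, 2))
```

Small differences in the third decimal place against the slide values come from rounding at intermediate steps.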
28
Minitab Output
Regression Analysis
The regression equation is
y = 106 - 3.92x
Predictor       Coef    StDev       T       P
Constant     105.668    3.655   28.91   0.000
x            -3.9241   0.4019   -9.76   0.000
S = 4.307   R-Sq = 95.0%   R-Sq(adj) = 94.0%
29
Section 9.4
Multiple Regression
30
More Explanatory Variables
Absence    IQ   Grade
      8   115      78
      2   135      92
      5   126      90
     12   110      58
     15   105      43
      9   120      74
      6   125      81
31
Minitab Output
Regression Analysis
The regression equation is
Grade = 52.7 - 2.65 absence + 0.357 IQ
Predictor      Coef    StDev       T       P
Constant     52.720   86.110    0.61   0.573
Absence      -2.652    2.111   -1.26   0.277
IQ            0.357    0.580    0.62   0.571
S = 4.603   R-Sq = 95.4%   R-Sq(adj) = 93.2%
32
Interpretation
The regression equation is
Grade = 52.7 - 2.65(absence) + 0.357(IQ)
When both absences and IQ are 0, the predicted grade is 52.7.
If IQ is held constant, each time there is one
more absence the predicted grade will decrease by
2.65 points.
If number of absences is held constant, and IQ is
increased by one point the predicted grade will
increase by 0.357 points.
33
Predicting the Response Variable
The regression equation is
Grade = 52.7 - 2.65(absence) + 0.357(IQ)
Use the regression equation to predict a grade
when a student is absent 5 times and has an IQ of
125.
Grade = 52.7 - 2.65(5) + 0.357(125) = 84.075 (about 84)
Use the regression equation to predict a grade
when a student is absent 9 times and has an IQ of
120.
Grade = 52.7 - 2.65(9) + 0.357(120) = 71.69 (about 72)
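Predictions from the multiple regression equation can be evaluated with a short function. This is a sketch using the rounded coefficients from the Minitab output, with absences entering negatively and IQ positively as the Interpretation slide describes; the function name `predict_grade` is illustrative.

```python
def predict_grade(absences, iq):
    """Predicted grade from Grade = 52.7 - 2.65(absence) + 0.357(IQ)."""
    return 52.7 - 2.65 * absences + 0.357 * iq

# 52.7 - 13.25 + 44.625 = 84.075
print(round(predict_grade(5, 125), 3))
# 52.7 - 23.85 + 42.84 = 71.69
print(round(predict_grade(9, 120), 3))
```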