Title: Describing the relationship between two data sets and using that information for prediction
1 Describing the relationship between two data sets and using that information for prediction.
2 Today's plan: review and describing two data sets
- Review: z scores
  - Characteristics of z scores
  - Finding percentages with z scores
- Describing the relationship between two data sets: correlation
  - What is it?
  - What does it mean?
  - How is it calculated?
- Using the relationship between two variables for prediction: regression
  - What is it?
  - What does it mean?
  - How is it calculated?
- Reminder: Exam will be handed out next week after class!
3 Some practice data for review purposes
Your stem-and-leaf diagram here
4 Some practice data for review purposes

Own brand
Stem | Leaves
  -0 | 3,1
   0 | 8,2,5,7,5,0,4,7,4,8,4,4,8
   1 | 6,7,4,5,2,1,4,3,1,4,8,6,1,1,6
   2 | 0,2
Median?

Denic
Stem | Leaves
  -0 | 3,1,1,1,6,2,2,1,1,1,5,3,2
   0 | 3,4,6,6,1,3,5,3,4,2,2,2,1,3,3,2,0
   1 | 0,4
   2 |
Median?
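One way to check the "Median?" prompts is to expand each stem-and-leaf display back into raw scores. The sketch below is only an illustration: the function name `expand_stem_and_leaf` is hypothetical, and it assumes that a leaf under the "-0" stem is a negative single-digit score (so stem -0, leaf 3 is read as -3).

```python
from statistics import median


def expand_stem_and_leaf(rows):
    """Turn (stem, leaves) pairs into raw scores.

    Stems are kept as strings so that "-0" keeps its sign:
    stem "-0" with leaf 3 is read as -3 (an assumption about the display).
    """
    scores = []
    for stem, leaves in rows:
        sign = -1 if stem.startswith("-") else 1
        tens = abs(int(stem))
        for leaf in leaves:
            scores.append(sign * (10 * tens + leaf))
    return scores


own_rows = [
    ("-0", [3, 1]),
    ("0", [8, 2, 5, 7, 5, 0, 4, 7, 4, 8, 4, 4, 8]),
    ("1", [6, 7, 4, 5, 2, 1, 4, 3, 1, 4, 8, 6, 1, 1, 6]),
    ("2", [0, 2]),
]
denic_rows = [
    ("-0", [3, 1, 1, 1, 6, 2, 2, 1, 1, 1, 5, 3, 2]),
    ("0", [3, 4, 6, 6, 1, 3, 5, 3, 4, 2, 2, 2, 1, 3, 3, 2, 0]),
    ("1", [0, 4]),
    ("2", []),
]

own = expand_stem_and_leaf(own_rows)
denic = expand_stem_and_leaf(denic_rows)

# N, sum of scores, and median for each data set
print(len(own), sum(own), median(own))
print(len(denic), sum(denic), median(denic))
```

Note that a stem-and-leaf display does not keep track of which own-brand score was paired with which Denic score, so it can be used for the median and the sums but not for anything that needs the pairs.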
5 Your $\bar{X}$, SS, and s work here
$\bar{X}$ =
SS =
s =
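6 Your $\bar{X}$, SS, and s work here
$\bar{X} = \sum X / N$: own brand $313/32 = 9.8$; Denic $45/32 = 1.4$
$SS = \sum X^2 - (\sum X)^2 / N$: own brand $4293 - 313^2/32 = 1231.5$; Denic $585 - 45^2/32 = 521.7$
$s = \sqrt{SS / (N - 1)}$: own brand $\sqrt{1231.5/31} = 6.3$; Denic $\sqrt{521.7/31} = 4.1$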
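The worked values on this slide can be reproduced from the three sums alone. This is a minimal sketch; the helper name `describe` is made up for illustration, and the inputs are the N, sum of X, and sum of squared X values shown on the slide.

```python
from math import sqrt


def describe(n, sum_x, sum_x2):
    """Return (mean, SS, s) from N, the sum of X, and the sum of X squared."""
    mean = sum_x / n
    ss = sum_x2 - sum_x ** 2 / n   # computational formula for the sum of squares
    s = sqrt(ss / (n - 1))         # sample standard deviation (divides by N - 1)
    return mean, ss, s


own_mean, own_ss, own_s = describe(32, 313, 4293)
denic_mean, denic_ss, denic_s = describe(32, 45, 585)

print(round(own_mean, 1), round(own_ss, 1), round(own_s, 1))
print(round(denic_mean, 1), round(denic_ss, 1), round(denic_s, 1))
```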
7 What is a z score?
- A simple data transformation.
- A score within a distribution minus the distribution's mean, with that difference then divided by the distribution's standard deviation.
- Used to describe data in standard deviation units (rather than the original units of measurement).
- Can be used to compare across distributions or to find percentile rank within distributions easily (assuming a normal distribution).
$z = \frac{X - \bar{X}}{s}$ or $z = \frac{X - \mu}{\sigma}$
8 Four neat things about z scores
- Mean of a set of z scores = 0.0
- Standard deviation of a set of z scores = 1.0
- Each z score tells that score's distance from the mean in standard deviation units.
- The transformation does not change the shape of the original distribution (though the z distribution is a different distribution, as indicated by the different mean and standard deviation).
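The first two properties are easy to confirm numerically. The sketch below uses a small made-up set of scores (not the heart rate data) and the sample standard deviation with N - 1 in the denominator, which is an assumption about the formula used here.

```python
from statistics import mean, stdev

scores = [2, 4, 4, 5, 7, 8]  # hypothetical scores, for illustration only

m = mean(scores)
s = stdev(scores)  # sample standard deviation (N - 1 in the denominator)

# z score: distance from the mean in standard deviation units
z_scores = [(x - m) / s for x in scores]

print(mean(z_scores))   # approximately 0 (up to floating-point error)
print(stdev(z_scores))  # approximately 1

# The ordering of the scores is unchanged, so the shape is preserved
print(sorted(z_scores) == z_scores)
```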
9 How many standard deviations from the mean is a heart rate difference of 5?
$z = \frac{X - \bar{X}}{s}$
Own brand: $z = (5 - 9.8) / 6.3 = -0.76$
Denic: $z = (5 - 1.4) / 4.1 = 0.88$
Why one positive and the other negative?
11 Percentage below (or above) a given score?
Assuming a normal distribution of scores, what percentage of scores are below a score of 5?
Notes:
- Negative z scores are not on the table.
- Rely on your sketch to tell you if you need to add 0.5 to the tabled result.
Steps:
1. Sketch the problem by locating on your sketch where the necessary z scores will be in relation to the mean.
2. Shade in the area that you are looking for.
3. Use Table A, Column A to find the correct z score.
4. Use either column B or column C to find the shaded area.
$\mu_z = 0.0$
12 For own brand, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
$\mu_z = 0.0$, $z = -0.76$
Shaded area below $z = -0.76$: 22.36%
16 For Denic, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
$\mu_z = 0.0$, $z = 0.88$
Area between the mean and $z = 0.88$: 31.06%
Area below the mean: 50.00%
Total below a score of 5: 31.06% + 50.00% = 81.06%
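The table lookups can be cross-checked with the normal cumulative distribution function. This is a sketch rather than the table method from class: `percent_below` is a hypothetical helper built on `math.erf`, and the z scores are rounded to two decimals to match a printed table.

```python
from math import erf, sqrt


def percent_below(z):
    """Percentage of a normal distribution that falls below z."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


# z scores for a heart rate difference of 5, rounded as in a z table
own_z = round((5 - 9.8) / 6.3, 2)
denic_z = round((5 - 1.4) / 4.1, 2)

print(own_z, round(percent_below(own_z), 2))
print(denic_z, round(percent_below(denic_z), 2))
```

Because the function handles negative z scores directly, there is no need to decide whether to add 0.5 by hand; the sketch on paper is still the best check that the answer is on the correct side of 50%.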
20 Describing the relationship between two data sets
- In many cases, two variables (x and y) collected from the same source are related:
  - Weight (pounds) and height (inches) from the same person.
  - Sunny days/month (days) and monthly rainfall (inches).
  - Shoe size and annual income from the same person.
- How can we describe the degree to which these variables are related in a single number?
  - Why? It helps us to understand natural relationships.
  - Why? It may suggest whether one variable causes another.
  - Problem: different scales of measurement.
- Scatterplots help us to see relationships between variables.
- Pearson's r is a correlation coefficient that uses z scores to measure the degree of relatedness of two variables.
21 Describing two data sets graphically: the scatterplot
Labeled points: (74,235), (70,190), (64,125), (60,110)
22 Describing two data sets graphically: the scatterplot
Labeled points: (10,12), (25,3)
23 Describing two data sets graphically: the scatterplot
Labeled points: (5,32), (12,23)
24 Pearson's r describes relatedness numerically
- Pearson's r is a correlation coefficient.
- A correlation coefficient expresses quantitatively the magnitude and direction of the relationship between two variables.
- Pearson's r is a measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.
25 Characteristics of Pearson's r
- Pearson's r must be between -1 and 1, inclusive ($-1 \le r \le 1$).
- When $r = 1$ or $r = -1$, there is a perfect linear relationship between the two variables (usually designated x and y).
  - $r = 1$: perfect positive relationship (as x increases, y always increases by a constant amount for each unit of x).
  - $r = -1$: perfect negative relationship (as x increases, y always decreases by a constant amount for each unit of x).
- When $r = 0$, there is no linear relationship between x and y.
- When r is not -1, 0, or 1, there is a linear relationship of some direction (positive or negative) and some magnitude.
- $r = \frac{\sum z_x z_y}{N - 1}$ (the corrected average of the z score cross-products).
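The z score definition of r can be tried on a handful of pairs. The sketch below borrows the four labeled points from the first scatterplot slide and treats them as paired (x, y) scores purely for illustration; the helper name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

# Four labeled points from the scatterplot slide, used here as example pairs
pairs = [(74, 235), (70, 190), (64, 125), (60, 110)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]


def z_scores(values):
    """Convert raw scores to z scores using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # divides by N - 1
    return [(v - m) / s for v in values]


zx = z_scores(xs)
zy = z_scores(ys)
n = len(pairs)

# r is the corrected average of the z score cross-products
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 3))
```

Each cross-product is positive when a pair sits on the same side of both means and negative when it sits on opposite sides, which is why r captures whether paired scores occupy the same or opposite positions in their own distributions.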
26 What is the relationship between x and y?
35 Examples of various values of r
36 What is the correlation between own (X) and Denic (Y)?
$r = \frac{\sum z_x z_y}{N - 1}$
or
$r = \frac{\sum XY - (\sum X)(\sum Y)/N}{\sqrt{SS_x \, SS_y}}$
The only new thing is $\sum XY$: multiply each X by its paired Y and sum them all up.
Recognize the other terms? $SS_x$ and $SS_y$ are the sums of squares used to find s.
41 What is the correlation between own (X) and Denic (Y)?
$N = 32$, $\sum XY = 693$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - 313^2/32 = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - 45^2/32 = 521.7$
$r = \frac{693 - (313)(45)/32}{\sqrt{(1231.5)(521.7)}} = \frac{252.8}{801.5} = .315$
42 There is a moderate positive correlation.
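The value r = .315 can be checked from the summary sums alone, without the raw pairs. This is a sketch under the assumption that r is computed with the usual computational formula (sum of products over the square root of the product of the two sums of squares); the variable name `sp` for the sum of products is introduced here and does not appear on the slides.

```python
from math import sqrt

# Summary values from the slide
n = 32
sum_x, sum_x2 = 313, 4293   # own brand (X)
sum_y, sum_y2 = 45, 585     # Denic (Y)
sum_xy = 693                # each X times its paired Y, summed

ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp = sum_xy - sum_x * sum_y / n   # sum of products (name is an assumption)

r = sp / sqrt(ss_x * ss_y)
print(round(ss_x, 1), round(ss_y, 1), round(r, 3))
```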
43 Using correlation to predict scores: regression
- If there is a non-zero correlation between x and y ($r \neq 0.0$), it means that there is a linear relationship between the two variables, x and y.
- It also means that if you know an x score, you can use the correlation to help you predict what the y score paired with that x score would be.
- The closer the correlation is to a perfect one ($r = 1$ or $r = -1$), the better the prediction will be.
- The predicted score ($Y'$, or "Y prime") will be based on a straight line ($Y' = bX + a$) drawn through the scatterplot such that the sum of the squared vertical distances from the points to the line is minimized (linear regression).
  - When $r = 1$ or $r = -1$, all the points fall on the line.
  - When $-1 < r < 1$, at least some of the points are not on the line.
44 Fill in Eddie's Y score.
$Y = 10$
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope).
46 Fill in Zack's Y score.
When $X = 5$, $Y = 10$.
When $X = 0$, $Y = 0$.
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope). When the line hits the Y axis ($X = 0$), $Y = 0$ (Y intercept = 0).
48 Regression line: $Y' = b_Y X + a_Y$ ($b_Y$ = slope, $a_Y$ = Y intercept)
$Y' = 2X + 0$
Note: no errors (all points fall on the line).
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope = $b_Y$). When the line hits the Y axis ($X = 0$), $Y = 0$ (Y intercept = 0 = $a_Y$).
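To see the "no errors" case in action, a least-squares line can be fitted to points that already lie on Y = 2X. The X values below are made up for illustration (they include the X = 0 and X = 5 cases from the slides), and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0, 1, 2, 3, 4, 5]    # hypothetical X scores
ys = [2 * x for x in xs]   # every point lies exactly on Y = 2X

b, a = fit_line(xs, ys)

# Error in prediction: actual Y minus predicted Y for each point
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print(b, a, residuals)
```

With a perfect linear relationship, the fitted slope is the rise over run and every residual is zero; any scatter around the line would show up as non-zero residuals.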
49 For HR data, what Y is predicted for X = 19?
$N = 32$, $\sum XY = 693$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - 313^2/32 = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - 45^2/32 = 521.7$
$Y' = b_Y X + a_Y$
$b_Y = \frac{\sum XY - (\sum X)(\sum Y)/N}{SS_x} = \frac{693 - (313)(45)/32}{1231.5} = .2053$
$a_Y = \bar{Y} - b_Y \bar{X} = (45/32) - .2053(313/32) = 1.406 - .2053(9.781) = -0.602$
$Y' = .2053X - 0.602 = .2053(19) - 0.602 = 3.3$
Points on the regression line: (19, 3.3) and (0, -0.6)
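The prediction for X = 19 can be reproduced from the summary sums. This sketch assumes the slope is the sum of products divided by the sum of squares for X, which matches the .2053 shown; the names `sp` and `predict` are introduced here for illustration.

```python
# Summary values from the slides
n = 32
sum_x, sum_x2 = 313, 4293   # own brand (X)
sum_y = 45                  # Denic (Y)
sum_xy = 693

ss_x = sum_x2 - sum_x ** 2 / n
sp = sum_xy - sum_x * sum_y / n   # sum of products (name is an assumption)

b_y = sp / ss_x                           # slope
a_y = sum_y / n - b_y * (sum_x / n)       # intercept: mean of Y minus slope times mean of X


def predict(x):
    """Predicted Y (Y prime) for a given X."""
    return b_y * x + a_y


print(round(b_y, 4), round(a_y, 3))
print(round(predict(19), 1), round(predict(0), 1))
```

Predicting at the mean of X returns the mean of Y, so the regression line always passes through the point formed by the two means.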
55 Next week
- Explore error in prediction (Chapter 7)
- Review correlation and regression.
- Hand out exam 1 (Due Friday, 2/18 at 3 PM).
56 For HR data, what Y is predicted for X = 19?
$Y' = .2053X - 0.602$
Points on the regression line: (19, 3.3) and (0, -0.6)
$\bar{Y} = 1.4$