Title: Describing the relationship between two data sets and using that information for prediction
Slide 1: Describing the relationship between two data sets and using that information for prediction
Slide 2: Today's plan (review, and describing two data sets)
- Review: z scores
  - Characteristics of z scores
  - Finding percentages with z scores
- Describing the relationship between two data sets: correlation
  - What is it?
  - What does it mean?
  - How is it calculated?
- Using the relationship between two variables for prediction: regression
  - What is it?
  - What does it mean?
  - How is it calculated?
- Reminder: Exam will be handed out next week after class!
Slide 3: Some practice data for review purposes
Your stem-and-leaf diagram here
Slide 4: Your X̄, SS, and s work here
X̄ (mean) =
SS (sum of squares) =
s (standard deviation) =
Slide 5: What is a z score?
- Simple data transformation.
- A score within a distribution minus the distribution's mean, with that difference then divided by the distribution's standard deviation.
- Used to describe data in standard deviation units (rather than the original units of measurement).
- Can be used to compare across distributions or to find percentile rank within distributions easily (assuming a normal distribution).
z = (X - X̄) / s
or
z = (X - μ) / σ
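The definition above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (not the class data), the function name `z_scores` is made up for this example, and it assumes the sample standard deviation (SS divided by N - 1).

```python
from statistics import mean, stdev

def z_scores(scores):
    """Convert raw scores to z scores using the sample mean and sample SD."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation: sqrt(SS / (N - 1))
    return [(x - m) / s for x in scores]

# Hypothetical scores, for illustration only
scores = [2, 4, 6, 8, 10]
zs = z_scores(scores)
print([round(z, 2) for z in zs])
```

Scores below the mean come out negative, scores above it positive, and a score equal to the mean gets z = 0.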
Slide 6: Four neat things about z scores
- Mean of a set of z scores = 0.0
- Standard deviation of a set of z scores = 1.0
- Each z score tells that score's distance from the mean in standard deviation units.
- Converting to z scores does not change the shape of the original distribution (though the z distribution is a different distribution, as indicated by the different mean and standard deviation).
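The first two properties are easy to check numerically. The sketch below uses a small, deliberately lopsided set of hypothetical scores (not the class data) and assumes the sample standard deviation throughout.

```python
from statistics import mean, stdev

# Hypothetical, deliberately skewed scores (illustration only)
scores = [1, 2, 2, 3, 3, 3, 4, 12]

m, s = mean(scores), stdev(scores)
zs = [(x - m) / s for x in scores]

z_mean = mean(zs)   # should be 0 (up to floating-point rounding)
z_sd = stdev(zs)    # should be 1 (up to floating-point rounding)
print(round(z_mean, 6), round(z_sd, 6))

# The order of the scores is unchanged, so the outlier (12) is still the
# largest value: the distribution keeps its shape, only its units change.
same_order = sorted(range(len(zs)), key=lambda i: zs[i]) == \
             sorted(range(len(scores)), key=lambda i: scores[i])
```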
Slide 7: How many standard deviations from the mean is a heart rate difference of 5?
Slide 8: Percentage below (or above) a given score?
Assuming a normal distribution of scores, what percentage of scores are below a score of 5 for own brand?
1. Sketch the problem by locating on your sketch where the necessary z scores will be in relation to the mean.
2. Shade in the area that you are looking for.
3. Use Table A, Column A to find the correct z score.
4. Use either column B or column C to find the shaded area.
Notes: Negative z scores are not on the table. Rely on your sketch to tell you if you need to add 0.5 to the tabled result.
μz = 0.0
Slide 9: Percentage below (or above) a given score?
Assuming a normal distribution of scores, what percentage of scores are below a score of 5 for denic?
1. Sketch the problem by locating on your sketch where the necessary z scores will be in relation to the mean.
2. Shade in the area that you are looking for.
3. Use Table A, Column A to find the correct z score.
4. Use either column B or column C to find the shaded area.
Notes: Negative z scores are not on the table. Rely on your sketch to tell you if you need to add 0.5 to the tabled result.
μz = 0.0
Slide 10: How many standard deviations from the mean is a heart rate difference of 5?
z = (X - X̄) / s
Own brand: z = (5 - 9.8) / 6.3 = -0.76
Denic: z = (5 - 1.4) / 4.1 = 0.88
Why one positive and the other negative?
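The means and standard deviations used here (9.8 and 6.3, 1.4 and 4.1) can be recovered from the sums listed later in the deck (N = 32, ΣX = 313, ΣX² = 4293, ΣY = 45, ΣY² = 585). The sketch below assumes X is own brand and Y is denic, as the correlation slides label them; the helper name `mean_and_sd` is made up for this example.

```python
from math import sqrt

def mean_and_sd(n, sum_x, sum_x2):
    """Return (mean, sample SD) from N, the sum of scores, and the sum of squared scores."""
    ss = sum_x2 - sum_x ** 2 / n          # SS = sum(X^2) - (sum X)^2 / N
    return sum_x / n, sqrt(ss / (n - 1))  # s = sqrt(SS / (N - 1))

own_mean, own_sd = mean_and_sd(32, 313, 4293)
denic_mean, denic_sd = mean_and_sd(32, 45, 585)

z_own = (5 - own_mean) / own_sd        # 5 is below the own-brand mean
z_denic = (5 - denic_mean) / denic_sd  # 5 is above the denic mean
print(round(z_own, 2), round(z_denic, 2))
```

The sign falls out of the subtraction: the same raw score of 5 sits below one mean and above the other.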
Slide 11: For own brand, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
μz = 0.0
z = -0.76
Slide 13: For own brand, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
22.36% of scores fall below z = -0.76
μz = 0.0
z = -0.76
Slide 14: For denic, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
μz = 0.0
z = 0.88
Slide 16: For denic, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
31.06% of scores fall between the mean and z = 0.88
μz = 0.0
z = 0.88
Slide 17: For denic, assuming a normal distribution of scores, what percentage of scores are below a score of 5?
50.00% (below the mean) + 31.06% (between the mean and z) = 81.06% of scores are below 5
μz = 0.0
z = 0.88
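As a cross-check on the table lookups, the area below a z score can also be computed directly. This sketch uses `NormalDist` from Python's standard `statistics` module in place of Table A; it handles negative z scores itself, so no sketch-and-add-0.5 step is needed.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the z distribution

# cdf(z) is the proportion of the distribution at or below z
pct_below_own = 100 * std_normal.cdf(-0.76)
pct_below_denic = 100 * std_normal.cdf(0.88)
print(round(pct_below_own, 2), round(pct_below_denic, 2))
```

The two results should agree with the tabled answers of 22.36% and 81.06% to within rounding.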
Slide 18: Describing the relationship between two data sets.
- In many cases, two variables (x and y) collected from the same source may be related:
  - Weight (pounds) and height (inches) from the same person.
  - Sunny days/month (days) and monthly rainfall (inches).
  - A person's shoe size and their annual income.
- How can we describe the degree to which these variables are related in a single number?
  - Why? It helps us to understand natural relationships.
  - Why? It may suggest whether one variable causes another (though a correlation alone cannot establish causation).
  - Problem: different scales of measurement.
- Scatterplots help us to see relationships between variables.
- Pearson's r is a correlation coefficient that uses z scores to measure the degree of relatedness of two variables.
Slide 19: Describing two data sets graphically: the scatterplot
Labeled points: (74, 235), (70, 190), (64, 125), (60, 110)
Slide 20: Describing two data sets graphically: the scatterplot
Labeled points: (10, 12), (25, 3)
Slide 21: Describing two data sets graphically: the scatterplot
Labeled points: (5, 32), (12, 23)
Slide 22: Pearson's r describes relatedness numerically
- Pearson's r is a correlation coefficient.
- A correlation coefficient expresses quantitatively the magnitude and direction of the relationship (between two variables).
- Pearson's r is a measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.
Slide 23: Characteristics of Pearson's r.
- Pearson's r must be between -1 and 1, inclusive (-1 ≤ r ≤ 1).
- When r = 1 or -1, there is a perfect linear relationship between the two variables (usually designated x and y).
  - r = 1: perfect positive relationship (as x increases, y always increases at a constant rate).
  - r = -1: perfect negative relationship (as x increases, y always decreases at a constant rate).
- When r = 0, there is no linear relationship between x and y.
- When r is not -1, 0, or 1, there is a linear relationship of some direction (positive or negative) and some magnitude.
- r = Σ(zx zy) / (N - 1) (the corrected average of the z score cross-products)
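The z score formula for r can be sketched directly in Python. The paired scores below are hypothetical and chosen so the three cases (perfect positive, perfect negative, imperfect) are easy to see; `pearson_r` is a name made up for this example.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r as the sum of z score cross-products divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    cross_products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(cross_products) / (len(xs) - 1)

# Hypothetical paired scores (illustration only)
xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])  # y rises by 2 for every 1 in x
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls by 2 for every 1 in x
r_mid = pearson_r(xs, [2, 1, 4, 3, 5])   # positive, but not perfectly linear
print(round(r_pos, 3), round(r_neg, 3), round(r_mid, 3))
```

When paired scores occupy the same positions in their own distributions, the cross-products are all positive and r approaches 1; opposite positions give negative cross-products and r approaches -1.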
Slides 24-29: What is the relationship between x and y?
Slide 30: What is the correlation between own and denic?
Slide 31: What is the correlation between own (X) and denic (Y)?
r = Σ(zx zy) / (N - 1)
or
r = [ΣXY - (ΣX)(ΣY)/N] / √(SSx × SSy)
Slide 32: What is the correlation between own (X) and denic (Y)?
The only new thing is ΣXY: multiply each X by its paired Y and sum them all up.
Recognize these? (SSx and SSy)
Slide 33: What is the correlation between own (X) and denic (Y)?
Slide 34:
N = 32, ΣX = 313, ΣX² = 4293, SSx = 4293 - (313²/32) = 1231.5
ΣY = 45, ΣY² = 585, SSy = 585 - (45²/32) = 521.7, ΣXY = 693
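Slide 35: What is the correlation between own (X) and denic (Y)?
N = 32, ΣX = 313, ΣX² = 4293, SSx = 4293 - (313²/32) = 1231.5
ΣY = 45, ΣY² = 585, SSy = 585 - (45²/32) = 521.7, ΣXY = 693
r = [693 - (313)(45)/32] / √(1231.5 × 521.7) = 252.8 / 801.5 = .315
Slide 36: There is a moderate positive correlation.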
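The r = .315 result can be reproduced from the listed sums alone. This is a sketch of the raw-score (computational) formula, using only the values given on the slides; the variable names are chosen for this example.

```python
from math import sqrt

# Sums from the slides: X = own, Y = denic
n, sum_x, sum_x2 = 32, 313, 4293
sum_y, sum_y2, sum_xy = 45, 585, 693

ss_x = sum_x2 - sum_x ** 2 / n   # SSx, about 1231.5
ss_y = sum_y2 - sum_y ** 2 / n   # SSy, about 521.7
sp = sum_xy - sum_x * sum_y / n  # sum of cross-products of deviations

r = sp / sqrt(ss_x * ss_y)
print(round(r, 3))
```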
Slide 37: Fill in Eddie's Y score.
Y = 10
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope).
Slide 38: Fill in Zack's Y score.
Slide 39: Fill in Zack's Y score.
When X = 5, Y = 10
When X = 0, Y = 0
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope).
When the line hits the Y axis (X = 0), Y = 0 (Y intercept = 0).
Slide 40: Regression line: Y′ = bY X + aY (Y′ = predicted Y, bY = slope, aY = Y intercept)
Y′ = 2X + 0
Note: no errors (all points fall on the line).
As X increases by 1, Y increases by 2 (rise/run = 2/1 = 2 = slope = bY).
When the line hits the Y axis (X = 0), Y = 0 (Y intercept = 0 = aY).
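A least-squares fit recovers the same slope and intercept when every point falls on the line. The sketch below uses hypothetical points generated from Y = 2X (the individual class scores are not in the transcript), and `fit_line` is a name made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope (b) and intercept (a) for predicting Y from X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x            # slope: rise over run
    a = mean_y - b * mean_x  # intercept: predicted Y when X = 0
    return b, a

# Hypothetical points that all fall on Y = 2X
xs = [0, 1, 3, 5, 8]
ys = [2 * x for x in xs]

b, a = fit_line(xs, ys)
errors = [y - (b * x + a) for x, y in zip(xs, ys)]  # actual minus predicted
print(round(b, 6), round(a, 6))
```

With a perfect linear relationship, every prediction error is zero; real data will leave some error around the line.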
Slide 41: For HR data, what Y is predicted for X = 19?
Slide 42: For HR data, what Y is predicted for X = 19?
N = 32, ΣX = 313, ΣX² = 4293, SSx = 4293 - (313²/32) = 1231.5
ΣY = 45, ΣY² = 585, SSy = 585 - (45²/32) = 521.7, ΣXY = 693
Y′ = bY X + aY
aY = Ȳ - bY X̄
Slide 43: For HR data, what Y is predicted for X = 19?
N = 32, ΣX = 313, ΣX² = 4293, SSx = 4293 - (313²/32) = 1231.5
ΣY = 45, ΣY² = 585, SSy = 585 - (45²/32) = 521.7, ΣXY = 693
Y′ = bY X + aY
bY = [ΣXY - (ΣX)(ΣY)/N] / SSx = 252.8 / 1231.5 = .2053
Slide 44: For HR data, what Y is predicted for X = 19?
N = 32, ΣX = 313, ΣX² = 4293, SSx = 4293 - (313²/32) = 1231.5
ΣY = 45, ΣY² = 585, SSy = 585 - (45²/32) = 521.7, ΣXY = 693
Y′ = .2053X + aY
aY = Ȳ - bY X̄
aY = (45/32) - .2053(313/32)
aY = 1.406 - .2053(9.781)
aY = -0.602
Slide 45: For HR data, what Y is predicted for X = 19?
Y′ = .2053X - 0.602
Y′ = .2053(19) - 0.602
Y′ = 3.3
Slide 46: For HR data, what Y is predicted for X = 19?
Y′ = .2053X - 0.602
Points on the line: (19, 3.3) and (0, -0.6)
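The whole prediction can be reproduced from the listed sums. This sketch assumes the usual least-squares formulas (slope = sum of cross-products over SSx, intercept = mean of Y minus slope times mean of X); the variable and function names are chosen for this example.

```python
# Sums from the slides: X = own, Y = denic
n, sum_x, sum_x2 = 32, 313, 4293
sum_y, sum_xy = 45, 693

ss_x = sum_x2 - sum_x ** 2 / n   # SSx
sp = sum_xy - sum_x * sum_y / n  # sum of cross-products of deviations

b_y = sp / ss_x                          # slope
a_y = sum_y / n - b_y * (sum_x / n)      # intercept: mean of Y - slope * mean of X

def predict(x):
    """Predicted Y for a given X on the regression line."""
    return b_y * x + a_y

print(round(b_y, 4), round(a_y, 3), round(predict(19), 1))
```

Evaluating `predict` at X = 19 and X = 0 gives the two points used to draw the line.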
Slide 47: Next week
- Explore error in prediction (Chapter 7)
- Review correlation and regression.
- Hand out exam 1 (Due in class on Monday, 2/26 at 2:00 PM).