Describing the relationship between two data sets and using that information for prediction

Slides: 48
Provided by: has9
Transcript and Presenter's Notes

1
Describing the relationship between two data sets
and using that information for prediction
2
Today's plan: Review and describing two data sets
  • Review z scores
      • Characteristics of z scores
      • Finding percentages with z scores
  • Describing the relationship between two data
    sets: correlation
      • What is it?
      • What does it mean?
      • How is it calculated?
  • Using the relationship between two variables for
    prediction: regression
      • What is it?
      • What does it mean?
      • How is it calculated?
  • Reminder: Exam will be handed out next week
    after class!

3
Some practice data for review purposes
Your stem-and-leaf diagram here
4
Your $\bar{X}$, SS, and s work here
$\bar{X} =$
$SS =$
$s =$
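As a rough check on this kind of work, the sketch below computes the mean, sum of squares, and standard deviation from summary sums. It assumes the practice data are the own-brand heart-rate scores whose sums appear later in this transcript (N = 32, ΣX = 313, ΣX² = 4293), and it assumes the sample formula with N - 1 in the denominator, which is the choice that reproduces the s = 6.3 used on the later slides.

```python
import math

# Summary sums for the own-brand scores, as listed later in the transcript.
# Assumption: these are the same data as the practice set on this slide.
N = 32
sum_x = 313
sum_x_sq = 4293

mean_x = sum_x / N                   # mean = sum(X) / N
ss_x = sum_x_sq - sum_x ** 2 / N     # SS = sum(X^2) - (sum(X))^2 / N
s_x = math.sqrt(ss_x / (N - 1))      # sample standard deviation (N - 1, assumed)

print(round(mean_x, 1), round(ss_x, 1), round(s_x, 1))  # 9.8 1231.5 6.3
```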
5
What is a z score?
  • Simple data transformation.
  • A score within a distribution minus the
    distribution's mean, with that difference then
    divided by the distribution's standard deviation.
  • Used to describe data in standard deviation units
    (rather than the original units of measurement)
  • Can be used to compare across distributions or
    find percentile rank within distributions easily
    (assuming a normal distribution).

$z = \dfrac{X - \bar{X}}{s}$
or
$z = \dfrac{X - \mu}{\sigma}$
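The transformation described on this slide can be sketched in a few lines. The function name `z_score` is illustrative only; the means (9.8 and 1.4) and standard deviations (6.3 and 4.1) are the own-brand and denic values that appear on a later slide.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# The same raw score (5) in two different distributions.
z_own = z_score(5, 9.8, 6.3)     # own brand: 5 is below the mean
z_denic = z_score(5, 1.4, 4.1)   # denic: 5 is above the mean

print(round(z_own, 2), round(z_denic, 2))  # -0.76 0.88
```

Because both results are in standard deviation units, they can be compared directly even though the two distributions have different means and spreads.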
6
Four neat things about z scores
  • Mean of a set of z scores = 0.0
  • Standard deviation of a set of z scores = 1.0
  • Each z score tells that score's distance from the
    mean in standard deviation units.
  • Converting to z scores does not change the shape
    of the original distribution (though the z
    distribution is a different distribution, as
    indicated by the different mean and standard
    deviation).
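The first two properties can be checked numerically. The scores below are hypothetical (they do not come from the slides), and the sketch assumes the sample standard deviation with N - 1 in the denominator.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8]  # hypothetical scores, not from the slides

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (N - 1)

# Convert every raw score to a z score.
z = [(x - mean) / sd for x in scores]

z_mean = statistics.mean(z)   # approximately 0.0
z_sd = statistics.stdev(z)    # approximately 1.0
print(abs(z_mean) < 1e-9, abs(z_sd - 1.0) < 1e-9)
```

The z scores also keep the original ordering and relative spacing of the raw scores, which is why the shape of the distribution is unchanged.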

7
How many standard deviations from the mean is a
heart rate difference of 5?
8
Percentage below (or above) a given score?
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5 for
own brand?
1. Sketch the problem, marking where the necessary
   z scores fall in relation to the mean.
2. Shade in the area that you are looking for.
3. Use Table A, Column A to find the correct z score.
4. Use either column B or column C to find the
   shaded area.
Notes: Negative z scores are not on the
table. Rely on your sketch to tell you if you
need to add 0.5 to the tabled result.
$\mu_z = 0.0$
9
Percentage below (or above) a given score?
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5 for
denic?
1. Sketch the problem, marking where the necessary
   z scores fall in relation to the mean.
2. Shade in the area that you are looking for.
3. Use Table A, Column A to find the correct z score.
4. Use either column B or column C to find the
   shaded area.
Notes: Negative z scores are not on the
table. Rely on your sketch to tell you if you
need to add 0.5 to the tabled result.
$\mu_z = 0.0$
10
How many standard deviations from the mean is a
heart rate difference of 5?
$z = \dfrac{X - \bar{X}}{s}$
Own brand: $z = (5 - 9.8) / 6.3 = -0.76$
Denic: $z = (5 - 1.4) / 4.1 = 0.88$
Why one positive and the other negative?
11
For own brand assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
$\mu_z = 0.0$
$z = -0.76$
12
(No Transcript)
13
For own brand assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
22.36%
$\mu_z = 0.0$
$z = -0.76$
14
For denic assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
$\mu_z = 0.0$
$z = 0.88$
15
(No Transcript)
16
For denic assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
31.06% (area between the mean and z = 0.88)
$\mu_z = 0.0$
$z = 0.88$
17
For denic assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
50.00% (below the mean) + 31.06% (between the mean and z) = 81.06%
$\mu_z = 0.0$
$z = 0.88$
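The table lookups on these slides can be cross-checked with the standard normal cumulative distribution. The sketch below is one way to do that with only the standard library; the function names are illustrative, and the reading of the table (one column giving the area between the mean and z, another the area beyond z) is an assumption, since Table A itself is not in the transcript.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percent_below(z):
    """Percentage of scores below z, following the sketch-and-table logic."""
    # Area between the mean and |z| (assumed to be what the table lists).
    mean_to_z = normal_cdf(abs(z)) - 0.5
    if z >= 0:
        return 100 * (0.5 + mean_to_z)  # add the half of the curve below the mean
    return 100 * (0.5 - mean_to_z)      # tail beyond a negative z

print(round(percent_below(-0.76), 2))  # own brand: 22.36
print(round(percent_below(0.88), 2))   # denic: 81.06
```

The positive case shows why 0.5 is added to the tabled result: the table only covers the area from the mean outward, so the entire lower half of the curve has to be included separately.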
18
Describing the relationship between two data sets.
  • In many cases, two variables (x and y) collected
    from the same source may be related:
  • Weight (pounds) and height (inches) from the same
    person.
  • Sunny days/month (days) and monthly rainfall
    (inches).
  • A person's shoe size and their annual income.
  • How can we describe the degree to which these
    variables are related in a single number?
  • Why? It helps us to understand natural
    relationships.
  • Why? It may suggest whether one variable causes
    another.
  • Problem: different scales of measurement.
  • Scatterplots help us to see relationships between
    variables.
  • Pearson's r is a correlation coefficient that uses
    z scores to measure the degree of relatedness of
    two variables.

19
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (74, 235), (70, 190), (64, 125), (60, 110)]
20
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (10, 12), (25, 3)]
21
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (5, 32), (12, 23)]
22
Pearson's r describes relatedness numerically
  • Pearson's r is a correlation coefficient.
  • A correlation coefficient expresses
    quantitatively the magnitude and direction of the
    relationship (between two variables).
  • Pearson's r is a measure of the extent to which
    paired scores occupy the same or opposite
    positions within their own distributions.

23
Characteristics of Pearson's r.
  • Pearson's r must be between -1 and 1, inclusive
    ($-1 \le r \le 1$).
  • When r = 1 or r = -1, there is a perfect linear
    relationship between the two variables (usually
    designated x and y).
  • r = 1: perfect positive relationship (as x
    increases, y always increases at a constant
    rate).
  • r = -1: perfect negative relationship (as x
    increases, y always decreases at a constant
    rate).
  • When r = 0, there is no linear relationship
    between x and y.
  • When r is not -1, 0, or 1, there is a linear
    relationship of some direction (positive or
    negative) and some magnitude.
  • $r = \dfrac{\sum z_x z_y}{N - 1}$ (the corrected
    average of the z score cross-products)
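The z-score formula for r can be sketched directly. The function name `pearson_r` is illustrative, the standard deviations assume the N - 1 denominator, and the example uses the four labeled points that survive from the first scatterplot slide, so it only demonstrates the mechanics on a very small sample.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score cross-products divided by N - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # N - 1 denominators
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Labeled points from the first scatterplot slide.
xs = [74, 70, 64, 60]
ys = [235, 190, 125, 110]
r = pearson_r(xs, ys)
print(round(r, 3))
```

Converting both variables to z scores first is what removes the "different scales of measurement" problem: each pair contributes a positive cross-product when both scores sit on the same side of their means and a negative one when they sit on opposite sides.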

24
What is the relationship between x and y?
25
What is the relationship between x and y?
26
What is the relationship between x and y?
27
What is the relationship between x and y?
28
What is the relationship between x and y?
29
What is the relationship between x and y?
30
What is the correlation between own and denic?
31
What is the correlation between own (X) and denic
(Y)?
$r = \dfrac{\sum z_x z_y}{N - 1}$
or
$r = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{SS_x \, SS_y}}$
32
What is the correlation between own (X) and denic
(Y)?
The only new thing! Multiply each X by its paired
Y and sum them all up.
Recognize these?
33
What is the correlation between own (X) and denic
(Y)?
34
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
35
What is the correlation between own (X) and denic
(Y)?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$r = .315$
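The result on this slide can be reproduced from the summary sums alone. The sketch below assumes the usual computational formula, r = (ΣXY - ΣXΣY/N) / sqrt(SSx · SSy); the formula on the original slide did not survive in the transcript, so treat this as one plausible route to the same number rather than the presenter's exact steps.

```python
import math

# Summary sums from the slide.
N = 32
sum_x, sum_x_sq = 313, 4293
sum_y, sum_y_sq = 45, 585
sum_xy = 693

ss_x = sum_x_sq - sum_x ** 2 / N    # sum of squares for X
ss_y = sum_y_sq - sum_y ** 2 / N    # sum of squares for Y
sp_xy = sum_xy - sum_x * sum_y / N  # sum of cross-products (assumed formula)

r = sp_xy / math.sqrt(ss_x * ss_y)
print(round(r, 3))  # 0.315
```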
36
There is a moderate positive correlation.
37
Fill in Eddie's Y score.
Y = 10
As X increases by 1, Y increases by 2 (rise/run =
2/1 = 2 = slope).
38
Fill in Zack's Y score.
39
Fill in Zack's Y score.
When X = 5, Y = 10
When X = 0, Y = 0
As X increases by 1, Y increases by 2 (rise/run =
2/1 = 2 = slope).
When the line hits the Y axis (X = 0), Y = 0
(Y intercept = 0).
40
Regression line: $Y = b_Y X + a_Y$ ($b_Y$ = slope, $a_Y$ = Y
intercept)
$Y = 2X + 0$
Note: no errors (all points fall on the line)
As X increases by 1, Y increases by 2 (rise/run =
2/1 = 2 = slope = $b_Y$).
When the line hits the Y axis (X = 0), Y = 0
(Y intercept = 0 = $a_Y$).
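When every point falls on the line, the slope and intercept can be read off any two points. The sketch below does this for the two points given on the earlier slide, (0, 0) and (5, 10); the helper name `line_through` is illustrative only.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # rise / run
    intercept = y1 - slope * x1    # value of Y where X = 0
    return slope, intercept

b_y, a_y = line_through((0, 0), (5, 10))
print(b_y, a_y)  # 2.0 0.0
```

This shortcut only works for a perfect linear relationship. With scattered data, as in the heart-rate example, the slope and intercept have to be estimated from all of the points.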
41
For HR data, what Y is predicted for X = 19?
42
For HR data, what Y is predicted for X = 19?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y = b_Y X + a_Y$
$a_Y = \bar{Y} - b_Y \bar{X}$
43
For HR data, what Y is predicted for X = 19?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y = b_Y X + a_Y$
$b_Y = .2053$
44
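For HR data, what Y is predicted for X = 19?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y = .2053X + a_Y$
$a_Y = \bar{Y} - b_Y \bar{X}$
$a_Y = (45/32) - .2053(313/32)$
$a_Y = 1.406 - .2053(9.781)$
$a_Y = -0.602$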
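Both regression coefficients can be recomputed from the summary sums. The intercept formula is the one shown on the slide; the slope formula, b_Y = (ΣXY - ΣXΣY/N) / SSx, is an assumption, because the slide's own slope formula is missing from the transcript. It does reproduce the slide's value of .2053.

```python
# Summary sums from the slide.
N = 32
sum_x, sum_x_sq = 313, 4293
sum_y = 45
sum_xy = 693

ss_x = sum_x_sq - sum_x ** 2 / N    # sum of squares for X
sp_xy = sum_xy - sum_x * sum_y / N  # sum of cross-products

b_y = sp_xy / ss_x                   # slope (assumed formula)
a_y = sum_y / N - b_y * (sum_x / N)  # intercept: mean of Y - b_Y * mean of X

print(round(b_y, 4), round(a_y, 3))  # 0.2053 -0.602
```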
45
For HR data, what Y is predicted for X = 19?
$Y = .2053X - 0.602$
$Y = .2053(19) - 0.602$
$Y = 3.3$
46
For HR data, what Y is predicted for X = 19?
$Y = .2053X - 0.602$
[Regression line plotted through (19, 3.3) and (0, -0.6)]
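Once the slope and intercept are known, prediction is a single line of arithmetic. The sketch below wraps the slide's regression equation in a small function (the name `predict_y` is illustrative) and checks the two points marked on the plot.

```python
def predict_y(x, slope=0.2053, intercept=-0.602):
    """Predicted Y from the regression line Y = slope * X + intercept."""
    return slope * x + intercept

print(round(predict_y(19), 1))  # 3.3
print(round(predict_y(0), 1))   # -0.6
```

Setting X = 0 returns the intercept itself, which is why (0, -0.6) is the point where the line crosses the Y axis.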
47
Next week
  • Explore error in prediction (Chapter 7)
  • Review correlation and regression.
  • Hand out exam 1 (Due in class on Monday, 2/26 at
    2:00 PM).