Describing the relationship between two data sets and using that information for prediction

Slides: 57
Provided by: has9

1
Describing the relationship between two data sets
and using that information for prediction.
2
Today's plan: Review and describing two data sets
  • Review z scores
    • Characteristics of z scores
    • Finding percentages with z scores
  • Describing the relationship between two data
    sets: correlation
    • What is it?
    • What does it mean?
    • How is it calculated?
  • Using the relationship between two variables for
    prediction: regression
    • What is it?
    • What does it mean?
    • How is it calculated?
  • Reminder: Exam will be handed out next week
    after class!

3
Some practice data for review purposes
Your stem-and-leaf diagram here
4
Some practice data for review purposes
Own brand
Stem | Leaves
  -0 | 3,1
   0 | 8,2,5,7,5,0,4,7,4,8,4,4,8
   1 | 6,7,4,5,2,1,4,3,1,4,8,6,1,1,6
   2 | 0,2
Median?
Denic
Stem | Leaves
  -0 | 3,1,1,1,6,2,2,1,1,1,5,3,2
   0 | 3,4,6,6,1,3,5,3,4,2,2,2,1,3,3,2,0
   1 | 0,4
   2 |
Median?
5
Your $\bar{X}$, SS, and s work here
$\bar{X}$
SS
s
6
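Your $\bar{X}$, SS, and s work here
(own brand first, then denic)
$\bar{X} = \sum X / N$: $313/32 = 9.8$; $45/32 = 1.4$
$SS = \sum X^2 - (\sum X)^2 / N$: $4293 - (313^2/32) = 1231.5$; $585 - (45^2/32) = 521.7$
$s = \sqrt{SS / (N - 1)}$: $\sqrt{1231.5/31} = 6.3$; $\sqrt{521.7/31} = 4.1$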
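The arithmetic on this slide can be checked with a short Python sketch. It assumes the scores are read off the stem-and-leaf diagram with the "-0" stem holding negative single-digit values; the list names and the `describe` helper are illustrative, not part of the original slides.

```python
import math

# Scores read off the stem-and-leaf diagram (assumption: stem "-0" holds
# negative single digits, stem "1" holds 10-19, stem "2" holds 20-29).
own_brand = [-3, -1,
             8, 2, 5, 7, 5, 0, 4, 7, 4, 8, 4, 4, 8,
             16, 17, 14, 15, 12, 11, 14, 13, 11, 14, 18, 16, 11, 11, 16,
             20, 22]
denic = [-3, -1, -1, -1, -6, -2, -2, -1, -1, -1, -5, -3, -2,
         3, 4, 6, 6, 1, 3, 5, 3, 4, 2, 2, 2, 1, 3, 3, 2, 0,
         10, 14]


def describe(scores):
    """Return (mean, SS, s) using the computational formula for SS."""
    n = len(scores)
    total = sum(scores)                      # sum of X
    total_sq = sum(x * x for x in scores)    # sum of X squared
    mean = total / n
    ss = total_sq - total ** 2 / n           # sum of squares
    s = math.sqrt(ss / (n - 1))              # sample standard deviation
    return mean, ss, s


for name, scores in (("own brand", own_brand), ("denic", denic)):
    mean, ss, s = describe(scores)
    print(f"{name}: mean={mean:.1f} SS={ss:.1f} s={s:.1f}")
# own brand: mean=9.8 SS=1231.5 s=6.3
# denic: mean=1.4 SS=521.7 s=4.1
```

Both lists contain 32 scores and sum to 313 and 45, which matches the totals used on the slide.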
7
What is a z score?
  • Simple data transformation.
  • A score within a distribution minus the
    distribution's mean, with that difference then
    divided by the distribution's standard deviation.
  • Used to describe data in standard deviation units
    (rather than the original units of measurement).
  • Can be used to compare across distributions or
    find percentile rank within distributions easily
    (assuming a normal distribution).

$z = (X - \bar{X}) / s$
or
$z = (X - \mu) / \sigma$
8
Four neat things about z scores
  • Mean of a set of z scores = 0.0
  • Standard deviation of a set of z scores = 1.0
  • Each z score tells that score's distance from the
    mean in standard deviation units.
  • Converting to z scores does not change the shape
    of the original distribution (though the z
    distribution is a different distribution, as
    indicated by the different mean and standard
    deviation).
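
The first two properties can be demonstrated with a minimal Python sketch. The data set below is hypothetical and chosen only for illustration; the `z_scores` helper is likewise an assumed name, using the sample standard deviation (N - 1 in the denominator) as elsewhere in these slides.

```python
import math


def z_scores(scores):
    """Convert raw scores to z scores using the sample standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return [(x - mean) / s for x in scores]


scores = [2, 4, 4, 5, 7, 9, 11]   # hypothetical raw scores
z = z_scores(scores)

# Mean and standard deviation of the z scores themselves.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (len(z) - 1))

print(abs(z_mean) < 1e-9)      # mean is 0 (up to floating-point error)
print(abs(z_sd - 1.0) < 1e-9)  # standard deviation is 1
```

The ordering and relative spacing of the scores are unchanged, which is why the shape of the distribution is preserved.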

9
How many standard deviations from the mean is a
heart rate difference of 5?
10
How many standard deviations from the mean is a
heart rate difference of 5?
$z = (X - \bar{X}) / s$
Own brand: $z = (5 - 9.8) / 6.3 = -0.76$
Denic: $z = (5 - 1.4) / 4.1 = 0.88$
Why one positive and the other negative?
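A small Python sketch of this calculation, using the rounded means and standard deviations from the slides (the function name `z_score` is an illustrative choice):

```python
def z_score(x, mean, s):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / s


own_z = z_score(5, mean=9.8, s=6.3)    # own brand
denic_z = z_score(5, mean=1.4, s=4.1)  # denic

print(f"own brand: {own_z:.2f}")  # own brand: -0.76
print(f"denic: {denic_z:.2f}")    # denic: 0.88
```

The sign follows from where 5 sits relative to each mean: it is below the own-brand mean of 9.8 and above the denic mean of 1.4.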
11
Percentage below (or above) a given score?
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
Notes:
  • Negative z scores are not on the table.
  • Rely on your sketch to tell you if you need to
    add 0.5 to the tabled result.
Steps:
  1. Sketch the problem by locating on your sketch
     where the necessary z scores will be in relation
     to the mean.
  2. Shade in the area that you are looking for.
  3. Use Table A, Column A to find the correct z score.
  4. Use either column B or column C to find the
     shaded area.
$\mu_z = 0.0$
12
For own brand, assuming a normal distribution of
scores, what percentage of scores are below a
score of 5?
$\mu_z = 0.0$
$z = -0.76$
14
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
22.36%
$\mu_z = 0.0$
$z = -0.76$
15
Percentage below (or above) a given score?
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
Notes:
  • Negative z scores are not on the table.
  • Rely on your sketch to tell you if you need to
    add 0.5 to the tabled result.
Steps:
  1. Sketch the problem by locating on your sketch
     where the necessary z scores will be in relation
     to the mean.
  2. Shade in the area that you are looking for.
  3. Use Table A, Column A to find the correct z score.
  4. Use either column B or column C to find the
     shaded area.
$\mu_z = 0.0$
16
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
$\mu_z = 0.0$
$z = 0.88$
18
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
31.06% (between the mean and z)
$\mu_z = 0.0$
$z = 0.88$
19
Assuming a normal distribution of scores, what
percentage of scores are below a score of 5?
50.00% (below the mean) + 31.06% (between the mean and z) = 81.06%
$\mu_z = 0.0$
$z = 0.88$
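The table lookups for both data sets can be cross-checked with the normal cumulative distribution function. This is a sketch that swaps the printed table for Python's `math.erf`; the function name `percent_below` is an assumption made for illustration.

```python
import math


def percent_below(z):
    """Percentage of a normal distribution that falls below z."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(f"{percent_below(-0.76):.2f}")  # own brand, z = -0.76 -> 22.36
print(f"{percent_below(0.88):.2f}")   # denic, z = 0.88 -> 81.06

# Area between the mean and z = 0.88, the value read from the table
# before adding the 50% that lies below the mean.
print(f"{percent_below(0.88) - 50:.2f}")  # 31.06
```

Unlike the table, the function handles negative z scores directly, so no sketch-based adjustment is needed.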
20
Describing the relationship between two data sets.
  • In many cases, two variables (x and y) collected
    from the same source are related:
    • Weight (pounds) and height (inches) from the same
      person.
    • Sunny days/month (days) and monthly rainfall
      (inches).
    • Shoe size and annual income from the same person.
  • How can we describe the degree to which these
    variables are related in a single number?
    • Why? It helps us to understand natural
      relationships.
    • Why? It may suggest whether one variable causes
      another.
    • Problem: different scales of measurement.
  • Scatterplots help us to see relationships between
    variables.
  • Pearson's r is a correlation coefficient that uses
    z scores to measure the degree of relatedness of
    two variables.

21
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (74, 235), (70, 190), (64, 125), (60, 110)]
22
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (10, 12), (25, 3)]
23
Describing two data sets graphically: the
scatterplot
[Scatterplot; labeled points: (5, 32), (12, 23)]
24
Pearson's r describes relatedness numerically
  • Pearson's r is a correlation coefficient.
  • A correlation coefficient expresses
    quantitatively the magnitude and direction of the
    relationship (between two variables).
  • Pearson's r is a measure of the extent to which
    paired scores occupy the same or opposite
    positions within their own distributions.

25
Characteristics of Pearson's r.
  • Pearson's r must be between -1 and 1, inclusive
    ($-1 \le r \le 1$).
  • When $r = 1$ or $r = -1$, there is a perfect linear
    relationship between the two variables (usually
    designated x and y).
  • $r = 1$: perfect positive relationship (as x
    increases, y always increases at a constant
    rate).
  • $r = -1$: perfect negative relationship (as x
    increases, y always decreases at a constant
    rate).
  • When $r = 0$, there is no linear relationship
    between x and y.
  • When r is not -1, 0, or 1, there is a linear
    relationship of some direction (positive or
    negative) and some magnitude.
  • $r = \sum(z_x z_y) / (N - 1)$ (the corrected average
    of the z score cross-products)
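
The z score formula for r can be sketched in Python as follows. The three small data sets are hypothetical, picked so that the perfect positive, perfect negative, and zero cases are easy to verify by hand; `z_scores` and `pearson_r` are illustrative helper names.

```python
import math


def z_scores(scores):
    """Convert raw scores to z scores using the sample standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return [(x - mean) / s for x in scores]


def pearson_r(xs, ys):
    """r = sum of z score cross-products divided by N - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


xs = [1, 2, 3, 4, 5]                         # hypothetical x scores
r_pos = pearson_r(xs, [2 * x for x in xs])   # y rises with x on a line
r_neg = pearson_r(xs, [-2 * x for x in xs])  # y falls with x on a line
r_zero = pearson_r(xs, [2, 1, 4, 1, 2])      # no linear trend

print(round(r_pos, 6), round(r_neg, 6), abs(round(r_zero, 6)))
```

Because both variables are converted to z scores first, the different scales of measurement drop out of the calculation.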

26
What is the relationship between x and y?
27
What is the relationship between x and y?
28
What is the relationship between x and y?
29
What is the relationship between x and y?
30
What is the relationship between x and y?
31
What is the relationship between x and y?
32
What is the relationship between x and y?
33
What is the relationship between x and y?
34
What is the relationship between x and y?
35
Examples of various values of r.
36
What is the correlation between own and denic?
37
What is the correlation between own (X) and denic
(Y)?
$r = \sum(z_x z_y) / (N - 1)$
or
$r = \dfrac{\sum XY - (\sum X)(\sum Y)/N}{\sqrt{SS_x \, SS_y}}$
38
What is the correlation between own (X) and denic
(Y)?
The only new thing! Multiply each X by its paired
Y and sum them all up.
Recognize these?
39
What is the correlation between own (X) and denic
(Y)?
40
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
41
What is the correlation between own (X) and denic
(Y)?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$r = .315$
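The value of r can be reproduced from the sums on this slide. The sketch below assumes the usual computational formula, in which the sum of cross-products minus (sum of X)(sum of Y)/N is divided by the square root of SSx times SSy; the variable names are illustrative.

```python
import math

# Summary values taken from the slide.
n = 32
sum_x, sum_x2 = 313, 4293
sum_y, sum_y2 = 45, 585
sum_xy = 693

ss_x = sum_x2 - sum_x ** 2 / n     # sum of squares for X
ss_y = sum_y2 - sum_y ** 2 / n     # sum of squares for Y
sp = sum_xy - sum_x * sum_y / n    # sum of cross-products

r = sp / math.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}")  # r = 0.315
```

Only the sum of XY is new here; the two sums of squares are the same ones computed earlier for the standard deviations.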
42
There is a moderate positive correlation.
43
Using correlation to predict scores: regression
  • If there is a non-zero correlation between x and
    y ($r \ne 0.0$), it means that there is a linear
    relationship between the two variables, x and y.
  • It also means that if you know an x score, you
    can use the correlation to help you predict what
    the y score paired with that x score would be.
  • The closer the correlation is to a perfect one
    ($r = 1$ or $r = -1$), the better the prediction
    will be.
  • The predicted score ($Y'$, or "Y prime") will be
    based on a straight line ($Y' = bX + a$) drawn
    through the scatterplot such that the sum of the
    squared vertical distances of the points from the
    line is minimized (linear regression).
  • When $r = 1$ or $r = -1$, all the points fall on the
    line.
  • When $-1 < r < 1$, at least some of the
    points are not on the line.

44
Fill in Eddie's Y score.
45
Fill in Eddie's Y score.
$Y' = 10$
As X increases by 1, Y increases by 2 (rise/run
= 2/1 = 2 = slope).
46
Fill in Zack's Y score.
47
Fill in Zack's Y score.
When $X = 5$, $Y = 10$
When $X = 0$, $Y = 0$
As X increases by 1, Y increases by 2 (rise/run
= 2/1 = 2 = slope). When the line hits the Y axis
($X = 0$), $Y = 0$ (Y intercept = 0).
48
Regression line: $Y' = b_Y X + a_Y$ ($b_Y$ = slope, $a_Y$ = Y
intercept)
$Y' = 2X + 0$
Note: no errors (all points fall on the line).
As X increases by 1, Y increases by 2 (rise/run
= 2/1 = 2 = slope = $b_Y$). When the line hits the Y
axis ($X = 0$), $Y = 0$ (Y intercept = 0 = $a_Y$).
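A least-squares fit recovers this line when every point lies on it. The sketch below uses hypothetical points generated from Y = 2X (the actual table from the slide is not in the transcript), and `fit_line` is an illustrative helper built on the standard slope formula, sum of cross-products divided by the sum of squares for X.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0, 1, 2, 3, 4, 5]     # hypothetical X scores
ys = [2 * x for x in xs]    # every point lies exactly on Y = 2X

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 0.0

# With a perfect relationship, every prediction error is zero.
errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(all(e == 0 for e in errors))  # True
```

Predicting for X = 5 gives 2(5) + 0 = 10, the value filled in on the earlier slides.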
49
For HR data, what $Y'$ is predicted for $X = 19$?
50
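For HR data, what $Y'$ is predicted for $X = 19$?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y' = b_Y X + a_Y$
$a_Y = \bar{Y} - b_Y \bar{X}$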
51
For HR data, what $Y'$ is predicted for $X = 19$?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y' = b_Y X + a_Y$
$b_Y = .2053$
52
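For HR data, what $Y'$ is predicted for $X = 19$?
$N = 32$
$\sum X = 313$, $\sum X^2 = 4293$, $SS_x = 4293 - (313^2/32) = 1231.5$
$\sum Y = 45$, $\sum Y^2 = 585$, $SS_y = 585 - (45^2/32) = 521.7$
$\sum XY = 693$
$Y' = .2053X + a_Y$
$a_Y = \bar{Y} - b_Y \bar{X}$
$a_Y = (45/32) - .2053(313/32)$
$a_Y = 1.406 - .2053(9.781)$
$a_Y = -0.602$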
53
For HR data, what $Y'$ is predicted for $X = 19$?
$Y' = .2053X - 0.602$
$Y' = .2053(19) - 0.602$
$Y' = 3.3$
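The whole prediction can be reproduced from the summary sums. Note one assumption: the slide's own formula for the slope did not survive in this transcript, so the sketch uses the standard least-squares slope (sum of cross-products divided by SSx), which does reproduce the slide's value of .2053. Variable names are illustrative.

```python
# Summary values taken from the slides.
n = 32
sum_x, sum_x2 = 313, 4293
sum_y = 45
sum_xy = 693

ss_x = sum_x2 - sum_x ** 2 / n    # sum of squares for X
sp = sum_xy - sum_x * sum_y / n   # sum of cross-products

b_y = sp / ss_x                          # slope (assumed standard formula)
a_y = sum_y / n - b_y * (sum_x / n)      # intercept: mean of Y - slope * mean of X

y_pred = b_y * 19 + a_y                  # predicted Y for X = 19

print(f"b_Y = {b_y:.4f}")    # b_Y = 0.2053
print(f"a_Y = {a_y:.3f}")    # a_Y = -0.602
print(f"Y' = {y_pred:.1f}")  # Y' = 3.3
```

Plugging the mean of X (313/32) into the same line returns the mean of Y (45/32, about 1.4), since a least-squares line always passes through the point of means.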
54
For HR data, what $Y'$ is predicted for $X = 19$?
$Y' = .2053X - 0.602$
[Plot of the regression line; labeled points: (19, 3.3) and (0, -0.6)]
55
Next week
  • Explore error in prediction (Chapter 7)
  • Review correlation and regression.
  • Hand out exam 1 (Due Friday, 2/18 at 3 PM).

56
For HR data, what $Y'$ is predicted for $X = 19$?
$Y' = .2053X - 0.602$
[Plot of the regression line; labeled points: (19, 3.3) and (0, -0.6)]
$\bar{Y} = 1.4$