# Chapters 8 and 9: Correlations Between Data Sets

Slides: 39
Provided by: Evan167
Learn more at: http://www.math.unt.edu
Transcript and Presenter's Notes


1
Chapters 8 and 9 Correlations Between Data Sets
• Math 1680

2
Overview
• Scatter Plots
• Associations
• The Correlation Coefficient
• Sketching Scatter Plots
• Changes of Scale
• Summary

3
Scatter Plots
• Often, we are interested in comparing two related
data sets
• Heights and weights of students
• SAT scores and freshman GPA
• Age and fuel efficiency of vehicles
• We can draw a scatter plot of the data set
• Plot paired data points on a Cartesian plane

4
Scatter Plots
• Scatter plot for the heights of 1,078 fathers and
their adult sons
• From Karl Pearson's study of family resemblances

5
Scatter Plots
• What does the dashed diagonal line represent?
• Find the point representing a 5'3¼" father who
has a 5'6½" son

6
Scatter Plots
• What does the vertical dashed column represent?
• Consider the families where the father was 72"
tall, to the nearest inch
• How tall was the tallest son?
• Shortest?

7
Scatter Plots
• Was the average height of the fathers around 64",
68" or 72"?
• Was the SD of the fathers' heights around 3", 6"
or 9"?

8
Scatter Plots
• The points form a swarm that is more or less
football-shaped
• This indicates that there is a linear association
between the fathers' heights and the sons' heights

9
Scatter Plots
• Short fathers tend to have short sons, and tall
fathers tend to have tall sons
• We say there is a positive association between
the heights of fathers and sons
• What would it mean for there to be a negative
association between the heights?

10
Scatter Plots
• Does knowing the father's height give a precise
prediction of his son's height?
• Does knowing the father's height let you better
predict his son's height?

11
Scatter Plots
• We will generally assume the scatter plots are
football-shaped
• Association is linear in nature
• Each data set is approximately normal

12
Scatter Plots
• Key features of scatter plots
• Given two data sets X and Y,
• The point of averages is the point (µ_X, µ_Y)
• The average of a data set is denoted by µ (Greek
mu, for mean)
• The subscript indicates which set is being
referenced
• It will be in the center of the cloud
• Due to the normal approximation, the vast
majority (about 95%) of the cloud should fall
within 2 SDs of the average for both X and Y
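The two key features above can be checked numerically. The following is a minimal sketch, assuming simulated and purely hypothetical data (the averages, SDs, and seed below are illustrative and do not come from the slides); the helper names `point_of_averages` and `fraction_within_2sd` are also invented for this example.

```python
import random
from statistics import fmean, pstdev

def point_of_averages(xs, ys):
    """Return (average of X, average of Y), the center of the cloud."""
    return fmean(xs), fmean(ys)

def fraction_within_2sd(values):
    """Fraction of values lying within 2 SDs of their own average."""
    avg, sd = fmean(values), pstdev(values)
    inside = [v for v in values if abs(v - avg) <= 2 * sd]
    return len(inside) / len(values)

# Hypothetical, simulated data (not from the slides): two roughly normal lists.
rng = random.Random(0)
xs = [rng.gauss(68, 3) for _ in range(2000)]
ys = [0.5 * x + rng.gauss(35, 2.5) for x in xs]

center = point_of_averages(xs, ys)
share_x = fraction_within_2sd(xs)
share_y = fraction_within_2sd(ys)
print(center, share_x, share_y)
```

For data that really is close to normal, both shares should come out near 95%.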

13
Scatter Plots

14
Associations
• When given a value in one data set, we often want
to make a prediction for the other data set
• We call our given value the independent variable
• We call the value we are trying to predict the
dependent variable

15
Associations
• If there is indeed a relationship between the two
data sets, we can say various things about their
association
• Strong: knowing X helps you a lot in predicting
Y, and vice versa
• Weak: knowing X doesn't really help you predict
Y, and vice versa
• Positive: X and Y tend to rise and fall together
• The higher in one you look, the higher in the
other you should be
• Negative: X and Y tend to move in opposite directions
• The higher in one you look, the lower in the
other you should be

16
Associations
• Positive associations
• Study time/final grade
• Height/weight
• SAT score/GPA
• Clouds in sky/chance of rain
• Bowling practice/bowling score
• Age of husband/age of wife
• Negative associations
• Age of car/fuel efficiency
• Golfing practice/golf score
• Dental hygiene/cavities formed
• Pollution/air quality
• Speed/mile time

17
Associations
• What kind of association is this?

18
Associations
• What kind of association is this?

19
Associations
• Remember that even a very strong association does
not necessarily imply a causal relationship
• There may be a confounding influence at play

20
The Correlation Coefficient
• While strong/weak and positive/negative give a
sense of the association, we want a way to
quantify the strength and direction of the
association
• The correlation coefficient (r) is the statistic
which accomplishes this

21
The Correlation Coefficient
• The correlation coefficient is always between -1
and 1
• A positive r means that there is a positive
association between the sets
• A negative r means that there is a negative
association between the sets
• If r is close to 0, then there is only a weak
association between the sets
• If r is close to -1 or 1, then there is a strong
association between the sets

22
The Correlation Coefficient
• The following plots have the same averages and
SDs, with 50 points in each
• The only difference between them is the
correlation coefficient
• Note how the points fall into a line as r
approaches -1 or 1

23
(No Transcript)

24
The Correlation Coefficient
• To calculate r
• Find the average and SD of each data set
• Multiply the data sets pairwise and find the
average
• The correlation is the average of the product
minus the product of the averages, all divided by
the product of the SDs
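The recipe above can be sketched in a few lines of Python. This is one possible reading of the steps, assuming the SD is the population SD (divide by n, not n - 1), which the slides do not state explicitly; the function names are illustrative. The data are the five pairs tabulated on the next slide.

```python
from math import sqrt

def average(values):
    return sum(values) / len(values)

def sd(values):
    """Population SD: root-mean-square deviation from the average."""
    avg = average(values)
    return sqrt(average([(v - avg) ** 2 for v in values]))

def correlation(xs, ys):
    """r = (average of products - product of averages) / (SD_X * SD_Y)."""
    products = [x * y for x, y in zip(xs, ys)]
    return (average(products) - average(xs) * average(ys)) / (sd(xs) * sd(ys))

xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]
r = correlation(xs, ys)
print(round(r, 4))  # 0.4
```

By hand: the averages are 4 and 7, the SDs are 2 and 4, and the average product is 31.2, so r = (31.2 - 28) / 8 = 0.4.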

25
The Correlation Coefficient
X Y
1 5
3 9
4 7
5 1
7 13

26
The Correlation Coefficient
• Compute r for the following data

X Y
1 2
2 1
3 4
4 3
5 7
6 5
7 6
r = 0.8214

X Y
1 3
3 7
4 9
5 11
7 15
r = 1

27
The Correlation Coefficient
• Estimate the correlation

28
The Correlation Coefficient
• Estimate the correlation

29
Sketching Scatter Plots
• The SD line is the line consisting of all the
points where the standard score in X equals the
standard score in Y
• z_X = z_Y
• To sketch the SD line, draw a line along the
long axis of the football shape
• Note that the SD line always goes through the
point of averages

30
Sketching Scatter Plots
• Given the five-statistic summary (averages, SDs,
and correlation) for a pair of data sets, we can
sketch the scatter plot
• Plot the point of averages in the center
• Mark two SDs in both directions, on both axes
• Plot the point 1 SD above average for both data
sets
• Draw a line connecting this point and the point
of averages
• This is the SD line
• Draw an ellipse with the SD line as its long axis
• Ellipse should go just beyond the 2 SD marks in
all directions
• The value of r determines how oblong the ellipse
is
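The sketching steps above reduce to a handful of landmark coordinates. The following is a rough sketch with a hypothetical helper, `sketch_landmarks`, and made-up summary values; it assumes, beyond what the slide states, that the SD line slopes downward when r is negative. It does not draw anything and does not compute how oblong the ellipse should be.

```python
def sketch_landmarks(avg_x, sd_x, avg_y, sd_y, r):
    """Landmarks for a hand sketch built from a five-statistic summary."""
    # Assumption: the SD line rises for r >= 0 and falls for r < 0.
    direction = 1 if r >= 0 else -1
    slope = direction * sd_y / sd_x
    return {
        "point_of_averages": (avg_x, avg_y),
        # Two SDs in both directions, on both axes.
        "x_marks": (avg_x - 2 * sd_x, avg_x + 2 * sd_x),
        "y_marks": (avg_y - 2 * sd_y, avg_y + 2 * sd_y),
        # Second point on the SD line: 1 SD over in X, 1 SD up (or down) in Y.
        "second_point": (avg_x + sd_x, avg_y + direction * sd_y),
        "sd_line_slope": slope,
        "sd_line_intercept": avg_y - slope * avg_x,
    }

# Hypothetical summary values, chosen only for illustration.
marks = sketch_landmarks(avg_x=50, sd_x=10, avg_y=20, sd_y=5, r=0.5)
for name, value in marks.items():
    print(name, value)
```

In this sketch r only sets the direction of the SD line; its size still has to be judged by eye when drawing the ellipse.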

31
Sketching Scatter Plots
• A study of the IQs of husbands and wives obtained
the following results
• Husbands: average IQ = 100, SD = 15
• Wives: average IQ = 100, SD = 15
• r = 0.6
• Sketch the scatter plot
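One way to see what such a sketch should look like is to simulate a cloud with the same five-statistic summary. The code below is a simulation under an assumed bivariate normal model, not the study's data; the function `simulate_cloud`, the sample size, and the seed are all illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def simulate_cloud(avg_x, sd_x, avg_y, sd_y, r, n, seed=0):
    """Draw n (x, y) pairs whose summary statistics are near the targets."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        zx = rng.gauss(0, 1)
        # Mix zx with independent noise so corr(zx, zy) is r in the long run.
        zy = r * zx + sqrt(1 - r * r) * rng.gauss(0, 1)
        pairs.append((avg_x + sd_x * zx, avg_y + sd_y * zy))
    return pairs

def correlation(xs, ys):
    """Average product of deviations, divided by the product of the SDs."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

pairs = simulate_cloud(100, 15, 100, 15, 0.6, n=5000)
husbands = [p[0] for p in pairs]
wives = [p[1] for p in pairs]
print(round(fmean(husbands), 1), round(pstdev(husbands), 1))
print(round(fmean(wives), 1), round(pstdev(wives), 1))
print(round(correlation(husbands, wives), 2))
```

The printed statistics should land close to 100, 15, 100, 15 and 0.6, and nearly all simulated points should fall between 70 and 130 on both axes.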

32
Changes of Scale
• The correlation coefficient is not affected by
changes of scale
• Moving: adding the same number to all of the
values of one variable
• Stretching: multiplying all the values of one
variable by the same positive number
• Would r change if we multiplied by a negative
number?
• The correlation coefficient is also unaffected by
interchanging the two data sets
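These claims are easy to check numerically. The following is a small sketch using the second data set from the exercise later in this section; the population SD (divide by n) is an assumption, and the variable names are illustrative.

```python
from math import sqrt

def correlation(xs, ys):
    """r = (average of products - product of averages) / (SD_X * SD_Y)."""
    n = len(xs)
    avg_x, avg_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - avg_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - avg_y) ** 2 for y in ys) / n)
    avg_product = sum(x * y for x, y in zip(xs, ys)) / n
    return (avg_product - avg_x * avg_y) / (sd_x * sd_y)

xs = [0, 2, 3, 4, 6]
ys = [2, 3, 4, 6, 0]

base = correlation(xs, ys)
moved = correlation(xs, [y + 6 for y in ys])       # add 6 to every Y
stretched = correlation([2 * x for x in xs], ys)   # double every X
swapped = correlation(ys, xs)                      # interchange X and Y
flipped = correlation([-2 * x for x in xs], ys)    # multiply X by a negative

for name, value in [("base", base), ("moved", moved), ("stretched", stretched),
                    ("swapped", swapped), ("flipped", flipped)]:
    print(name, round(value, 2))
```

The first four values agree, while multiplying by a negative number keeps the size of r and reverses its sign. Doubling X and adding 6 to Y turns this data set into the first table of the same exercise.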

33
Changes of Scale

34
Changes of Scale

35
Changes of Scale
• Compute r for each of the following data sets

X Y
0 8
4 9
6 10
8 12
12 6

X Y
0 2
2 3
3 4
4 6
6 0

r = -0.15 for both data sets

36
Summary
• The relationship between two variables, X and Y,
can be graphed in a scatter plot
• When the scatter plot is tightly clustered around
a line, there is a strong linear association
between X and Y
• A scatter plot can be characterized by its
five-statistic summary
• Average and SD of the X values
• Average and SD of the Y values
• Correlation coefficient

37
Summary
• When the correlation coefficient gets closer to -1
or 1, the points cluster more tightly around a
line
• Positive association has a positive r-value
• Negative association has a negative r-value
• Calculating the correlation coefficient
• Take the average of the product
• Subtract the product of the averages
• Divide the difference by the product of the SDs

38
Summary
• The correlation coefficient is not affected by
changes of scale or transposing the variables
• Correlation does not measure causation!