# AP Statistics Chapter 3 - PowerPoint PPT Presentation

Description: original title "Chapter 7: Correlation"; author: Paul Kim; last modified by: Joy Krausse; created: 5/17/2002 8:23:45 PM; format: on-screen show (4:3) PowerPoint presentation.

Date added: 5 November 2019
Slides: 49
Provided by: PaulK168
Transcript and Presenter's Notes

1
AP Statistics Chapter 3
• Scatterplots, Association, and Correlation

2
Relationships
• "You can observe a lot by watching." (Yogi Berra)
• Although this statement is said in jest, it
carries much truth.
• Many statistical studies look at multiple
variables to try to show a relationship between
one variable and another.
• Most of the time, questions ask whether there is
an association between two variables.
• It is important to know the definition of a few
words before we continue to explore this concept.

3
• When one variable affects another, one variable
is referred to as the explanatory variable and
the other as the response variable.
• Explanatory Variable
• A variable that attempts to explain the observed
outcomes.
• Response Variable
• A variable that measures an outcome of a study.
• Association
• Any relationship between two measured quantities
that renders them statistically dependent
• Simply put, two variables are associated if there
is a (direct or indirect) link between them.

4
Example
• Suppose that I randomly select 10 students from a
Stats class, record their weights in pounds,
and get the following results:
• 103, 201, 125, 179, 150, 138, 181, 220, 113, 126
• Now, let's say that I was going to pick another
random student. Could I come up with a prediction
of how much that student was going to weigh?
(The mean is 153.6 lb and the standard deviation
is 39.58 lb.)
• How accurate will my prediction be?
• Is there a way to improve this prediction?
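As a quick check of the numbers quoted on this slide, here is a minimal Python sketch using the standard-library `statistics` module (a convenience assumption; the slides themselves use a calculator). Note that `statistics.stdev` is the *sample* standard deviation (dividing by n - 1), which is the version that reproduces 39.58.

```python
import statistics

# Weights (in pounds) of the 10 randomly selected students on this slide
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

mean_wt = statistics.mean(weights)  # arithmetic mean
sd_wt = statistics.stdev(weights)   # sample standard deviation (divides by n - 1)

print(f"mean = {mean_wt:.1f} lb, s = {sd_wt:.2f} lb")
```

With no other information, the mean (about 154 lb) is a natural prediction for a new student, and a typical miss is roughly one standard deviation (about 40 lb).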

5
Example
• Now let's say that I have more information:
• Weight (lb): 103, 201, 125, 179, 150, 138, 181,
220, 113, 126
• Height (in): 61, 68, 65, 69, 65, 61, 64,
72, 63, 62
• These are the weights (in pounds) and heights
(in inches) of the same 10 students.
• Now, let's say that I pick another random
student whose height is 65 inches. Could I come
up with a prediction of how much that student
weighs?
• How accurate will my prediction be?
• Is this a better way to make a prediction?

6
Roles for Variables
• When we have two variables (bivariate data), it is
always a good idea to make a picture!
• As we graph the two variables, it is important to
determine which of the two quantitative variables
goes on the x-axis and which on the y-axis.
• This determination is made based on the roles
played by the variables.
• When the roles are clear, the explanatory or
independent variable goes on the x-axis, and the
response variable or dependent variable goes on
the y-axis.

7
Roles for Variables (cont.)
• The roles that we choose for variables are more
about how we think about them rather than about
the variables themselves.
• Just placing a variable on the x-axis doesn't
necessarily mean that it explains or predicts
anything. And the variable on the y-axis may not
respond to it in any way.
• In a cause-and-effect relationship, the
explanatory variable is the cause, and the
response variable is the effect. Regression is a
method for predicting the value of a dependent
variable y based on the value of an independent
variable x.

8
Examples
• A study looks at smoking and lung cancer.
• Which (if any) is the explanatory variable?
• Which (if any) is the response variable?
• Is smoking a quantitative or categorical
variable?
• Is lung cancer a quantitative or categorical
variable?

9
Examples
• A study looks at cavities and milk drinking.
• Which (if any) is the explanatory variable?
• Which (if any) is the response variable?
• Is "cavities" a quantitative or categorical
variable?
• Is milk drinking a quantitative or categorical
variable?

10
Examples
• A study looks at rainfall and SAT scores.
• Which (if any) is the explanatory variable?
• Which (if any) is the response variable?
• Is rainfall a quantitative or categorical
variable?
• Is SAT score a quantitative or categorical
variable?

11
Looking at Scatterplots
• Scatterplots may be the most common and most
effective display for two-variable data.
• In a scatterplot, you can see patterns, trends,
relationships, and even the occasional
extraordinary value sitting apart from the
others.
• Scatterplots are the best way to start observing
the relationship and the ideal way to picture
associations between two quantitative variables.

12
Example
• Now let's revisit the problem we started with:
• Weight (lb): 103, 201, 125, 179, 150, 138, 181,
220, 113, 126
• Height (in): 61, 68, 65, 69, 65, 61, 64,
72, 63, 62
• These are the weights (in pounds) and heights
(in inches) of the same 10 students.
• Graph the data and describe the association.

13
Regression
• When we perform regression, we use the
explanatory variable to estimate (or predict) the
value of the response variable.
• As you can imagine, at times there's a clear
relationship between the two variables, and
sometimes there might not be any relationship.
• Also, even if there is a relationship, some
relationships are strong while others are weak.
• To best describe the relationship, we should
always describe the form, direction, and the
strength.

14
Interpreting Association
• Form: Is there a pattern? Is the data linear or
curved? Are there clusters of data?
• Strength: Is it weak or strong? Does the data
tightly conform or loosely conform?
• Direction: If linear, does the data go up
(positively associated) or go down (negatively
associated) or is it a horizontal line (no
association)?
• Deviations from pattern: Are there areas where
the data conform less to the pattern? Are there
any outliers?

15
[Scatterplot: percent of graduates taking the SAT vs. average SAT Math score]
• Attributes of a good scatterplot
• Consistent and uniform scale
• Labels on both axes
• Accurate placement of data
• Data spread across the full range of each axis
• Axis break lines if an axis does not start at zero
• To achieve these goals, you are required to do
your scatterplots on graph paper.

16
Examples
• Try to make a graph of each of the following
situations, then describe the association:
• Points allowed vs. winning percentage
• World population vs. year
• Amount of rain vs. crop yield
• Height vs. weight
• Height vs. GPA
• Shoe size vs. probability of winning the National
Spelling Bee

17
Bivariate Data - Review
• The very first step to analyzing bivariate data
is to graph it. When we graph this data, we use
a scatterplot.
• After we graph it, we examine three things to
describe the association
• Form: linear or curved (we will discuss curved
data later in this unit)
• Direction: positive, negative, or no association
• Strength: strong or weak

18
Looking at Scatterplots
• Form
• If there is a straight line (linear)
relationship, it will appear as a cloud or swarm
of points stretched out in a generally
consistent, straight form of a line.
• How should we describe the form for these two
graphs?

19
Looking at Scatterplots (cont.)
• Direction
• A positive association generally tells us that as
one variable increases, the other variable also
increases.
• A negative association generally tells us that as
one variable increases, the other variable
decreases.
• In this example, there is a negative association
between central pressure and maximum wind speed.
• As the central pressure increases, the maximum
wind speed decreases.

20
Looking at Scatterplots (cont.)
• Direction
• When the points are scattered randomly with
no discernible pattern, we say that there is no
association.
• No association generally tells us that as one
variable increases, we learn nothing about the
other variable.
• The scatterplot above shows no association between
the explanatory and response variables.

21
Looking at Scatterplots (cont.)
• Strength
• This describes how tightly the points follow
the form (or pattern)
• A strong association has points that follow the
pattern very tightly, whereas a weak association
has points that follow the pattern in a much
looser manner. Note: we will quantify the
amount of scatter soon.

This has a strong, positive linear association
This has a weak, negative linear association
22
Looking at two-variable data
• Let's look at a real-life example to make this
idea a little more concrete.
• Do taller people tend to be heavier?
• This question is an example of how two variables
play different roles in data.
• Height is the explanatory (or predictor) variable
and weight is the response variable.
• Let's take a look at the Detroit Pistons.

23
Pistons Roster

24
Going to work with the Pistons

| Player | Pos | Ht (ft-in) | Wt (lb) |
|---|---|---|---|
| Chucky Atkins | G | 5-11 | 160 |
| Chauncey Billups | G | 6-3 | 202 |
| Elden Campbell | C-F | 7-0 | 279 |
| Hubert Davis | G | 6-5 | 183 |
| Carlos Delfino | G | 6-6 | 230 |
| Andreas Glyniadakis | C | 7-1 | 280 |
| Darvin Ham | F-G | 6-7 | 240 |
| Richard Hamilton | G-F | 6-7 | 193 |
| Lindsey Hunter | G | 6-2 | 195 |
| Darko Milicic | F-C | 7-0 | 245 |
| Mehmet Okur | F | 6-11 | 249 |
| Tayshaun Prince | F | 6-9 | 215 |
| Zeljko Rebraca | C | 7-0 | 257 |
| Don Reid (FA) | F | 6-8 | 250 |
| Bob Sura | G | 6-5 | 200 |
| Ben Wallace | F-C | 6-9 | 240 |
| Corliss Williamson | F | 6-7 | 245 |

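The calculator steps for this data set are not reproduced here, so below is a rough Python stand-in for finding r on the Pistons roster. It assumes heights are written as feet-inches (so 6-7 means 79 inches); the helper names `to_inches` and `correlation` are made up for this sketch.

```python
import math

# (player, height as "feet-inches", weight in pounds), copied from the roster slide
roster = [
    ("Chucky Atkins", "5-11", 160),
    ("Chauncey Billups", "6-3", 202),
    ("Elden Campbell", "7-0", 279),
    ("Hubert Davis", "6-5", 183),
    ("Carlos Delfino", "6-6", 230),
    ("Andreas Glyniadakis", "7-1", 280),
    ("Darvin Ham", "6-7", 240),
    ("Richard Hamilton", "6-7", 193),
    ("Lindsey Hunter", "6-2", 195),
    ("Darko Milicic", "7-0", 245),
    ("Mehmet Okur", "6-11", 249),
    ("Tayshaun Prince", "6-9", 215),
    ("Zeljko Rebraca", "7-0", 257),
    ("Don Reid (FA)", "6-8", 250),
    ("Bob Sura", "6-5", 200),
    ("Ben Wallace", "6-9", 240),
    ("Corliss Williamson", "6-7", 245),
]


def to_inches(height):
    """Convert a height string like '6-7' to total inches (6 * 12 + 7 = 79)."""
    feet, inches = height.split("-")
    return 12 * int(feet) + int(inches)


def correlation(xs, ys):
    """Pearson's r: cross-products of deviations over the root of the sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights = [to_inches(ht) for _, ht, _ in roster]  # explanatory variable (x), inches
weights = [wt for _, _, wt in roster]             # response variable (y), pounds
r = correlation(heights, weights)
print(f"n = {len(heights)} players, r = {r:.2f}")
```

On this roster r comes out to roughly 0.87, which fits a strong, positive linear association between height and weight.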
25
Using the Calculator
26
[Scatterplot of the Pistons data: weight (lb, axis from 125 to 300) vs. height (in, axis from 68 to 86)]
27
Correlation Coefficient
• The correlation coefficient numerically measures
the strength of the linear association between
two quantitative variables.
• Correlation describes a linear association (relationship) only.
• There are three conditions that must be met
before we can look at the correlation
coefficient
• Quantitative Variables Condition: Correlation only
applies to quantitative variables. Don't apply
correlation to categorical data!
• Straight Enough Condition: Correlation measures
the strength of linear association, which is
useless if the data is not linear!

28
Correlation Coefficient
• There is one more condition
• Outlier Condition
• Outliers can distort the correlation
dramatically.
• An outlier can make an otherwise small
correlation look big or hide a large correlation.
• It can even give an otherwise positive
association a negative correlation coefficient
(and vice versa).
• When you see an outlier, it's often a good idea
to report the correlation with and without the
point.
• Note: when asked about correlation, you should
memorize this phrase:
• "With a correlation of (r), there is a
(strong/weak), (positive/negative) linear
association between the (explanatory variable)
and the (response variable)."

29
Correlation Coefficient
• Is there a correlation between a basketball
team's heights and weights?
• Is the association positively associated or
negatively associated?
• Is the association strong or weak?

30
What do we do with correlation?
Examine pg. 151
31
Calculating Correlation Coefficient
• The calculation of correlation is based on the mean
and standard deviation.
• Remember that neither the mean nor the standard
deviation is a resistant measure.
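Written out, the usual z-score form of the correlation coefficient, which matches the reasoning on the next two slides, is shown below for n pairs (x_i, y_i) with sample means x̄, ȳ and sample standard deviations s_x, s_y:

$$
z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \qquad z_{y_i} = \frac{y_i - \bar{y}}{s_y}
$$

$$
r = \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}
  = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)
$$

Because the mean and standard deviation both appear in every term, a single extreme point can shift r noticeably.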

32
Calculating Correlation Coefficient
• What do the contents of the parentheses look
like?
• What happens when both values are below their
means? When both are above their means?

Both z-values are negative. Their product is
positive.
The formula for calculating z-values.
Both z-values are positive. Their product is
positive.
33
Calculating Correlation Coefficient
• What happens when one value is below its mean
but the other value is above its mean?

One z-value is positive and the other is
negative. Their product is negative.
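To see this sign argument on real numbers, the sketch below (plain Python, standard library only) standardizes the 10 student heights and weights from the opening example and reports whether each z-score product is positive, negative, or zero. The variable names are illustrative, not from the slides.

```python
import statistics

heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # inches (explanatory, x)
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # pounds (response, y)

mean_x, sd_x = statistics.mean(heights), statistics.stdev(heights)
mean_y, sd_y = statistics.mean(weights), statistics.stdev(weights)

products = []
for x, y in zip(heights, weights):
    z_x = (x - mean_x) / sd_x  # negative below the mean height, positive above
    z_y = (y - mean_y) / sd_y  # negative below the mean weight, positive above
    p = z_x * z_y
    products.append(p)
    sign = "positive" if p > 0 else "negative" if p < 0 else "zero"
    print(f"({x}, {y}): z_x = {z_x:+.2f}, z_y = {z_y:+.2f}, product is {sign}")

# r is the sum of the z-score products divided by n - 1
r = sum(products) / (len(products) - 1)
print(f"r = {r:.3f}")
```

Only one student (64 in., 181 lb) is below the mean on one variable and above it on the other, so nearly all products are positive (or zero) and r comes out to about 0.853.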
34
Using the TI-84 to calculate r
• If you have a TI-84, you must turn diagnostics
on: select DiagnosticOn from the CATALOG and
press ENTER.
• TI-89 users don't need to worry about this
step, since diagnostics are automatically on in
that calculator.

35
Using the TI to calculate r
• Run LinReg(a+bx) with the explanatory variable as
the first list and the response variable as the
second list.

TI-84
TI-89
36
Using the TI to calculate r
• The results are the slope and y-intercept
of the regression equation (more on that later)
and the values of r and r² (more on r² later as well).
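For readers without a TI, here is a minimal Python sketch of the quantities `LinReg(a+bx)` reports, assuming ordinary least squares (which is what that command fits). `lin_reg` is a hypothetical helper name, and the results should agree with the calculator up to rounding.

```python
import math


def lin_reg(xs, ys):
    """Least-squares line y = a + b*x, plus r and r^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = my - b * mx                 # y-intercept: the line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r, r ** 2


heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # inches
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # pounds
a, b, r, r2 = lin_reg(heights, weights)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.3f}, r^2 = {r2:.3f}")
```

For the 10 students, b = 9.25 and a = -447.65, so a height of 65 inches predicts 153.6 lb. That equals the mean weight because 65 inches is exactly the mean height, and the least-squares line always passes through (x̄, ȳ).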

37
Facts about correlation
• Both variables need to be quantitative
• Because the data values are standardized, it does
not matter what units we use for each of the
variables
• Also, since r uses standardized values of the
observations, r does not change if we change the
units of x, y, or both (in other words, we can
add or subtract a constant, or multiply or divide
by a positive constant, for x, y, or both and r will
stay the same; multiplying by a negative constant
flips the sign of r)
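A small Python check of this claim, using the opening height/weight data: converting to metric units or shifting the weights leaves r unchanged (up to floating-point rounding), while multiplying by a negative number flips its sign. The `correlation` helper is written just for this sketch.

```python
import math


def correlation(xs, ys):
    """Pearson's r from sums of squared deviations (equivalent to the z-score form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights_in = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]
weights_lb = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

heights_cm = [2.54 * h for h in heights_in]        # inches -> centimeters
weights_kg = [0.45359237 * w for w in weights_lb]  # pounds -> kilograms
weights_shifted = [w - 100 for w in weights_lb]    # subtract a constant from every y

r_original = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
r_shifted = correlation(heights_in, weights_shifted)
r_negated = correlation(heights_in, [-w for w in weights_lb])  # multiply y by -1

print(f"{r_original:.6f} {r_metric:.6f} {r_shifted:.6f} {r_negated:.6f}")
```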
• The value of r is unit-less.

38
Facts about correlation
• The value of r will always be between -1 and 1.
• Values closer to -1 reflect strong negative
linear association.
• Values closer to 1 reflect strong positive
linear association.
• Values close to 0 reflect no linear association.
• Correlation does not measure the strength of
non-linear relationships

39
Facts about correlation
• Correlation is blind to the relationship between
explanatory and response variables.
• Even though you may get an r value close to -1 or
1, that does not mean you can say that the
explanatory variable causes the response
variable.
• Scatterplots and correlation coefficients never
prove causation.
• A hidden variable that stands behind a
relationship and determines it by simultaneously
affecting the other two variables is called a
lurking variable.

40
Facts about correlation
• The value of r is a measure of the strength of a
linear relationship. It measures how closely the
data fall to a straight line. An r value near 0,
however, does not imply that there is no
relationship, only no linear relationship. For
example, quadratic or sinusoidal data can have an r
close to 0 even though there is a strong
relationship present.
• r measures the correlation between 2 variables in
a sample of observations from the population of
interest. Thus, r is the sample correlation
coefficient, which is used to estimate ρ (rho),
the population correlation coefficient.
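A tiny hypothetical example (made-up points on the parabola y = x², not from the slides) shows a perfect relationship that still gives r = 0, because the upward and downward halves cancel.

```python
import math


def correlation(xs, ys):
    """Pearson's r from sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is completely determined by x, but along a curve
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # a perfect curved relationship, yet no linear association
```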

41
What Can Go Wrong?
• Don't say "correlation" when you mean
"association."
• More often than not, people say correlation when
they mean association.
• The word correlation should be reserved for
measuring the strength and direction of the
linear relationship between two quantitative
variables.

42
What Can Go Wrong?
• Don't correlate categorical variables.
• Be sure to check the Quantitative Variables
Condition.
• Don't confuse correlation with causation.
• Scatterplots and correlations never demonstrate
causation.
• These statistical tools can only demonstrate an
association between variables.

43
What Can Go Wrong? (cont.)
• Be sure the association is linear.
• There may be a strong association between two
variables that have a nonlinear association. The
correlation can be near 0. Why do you think
that is?

44
What Can Go Wrong? (cont.)
• Don't assume the relationship is linear just
because the correlation coefficient is high.
• Here the correlation is 0.979, but the
relationship is actually bent.
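The slide's own data are not in this transcript, so here is a hypothetical stand-in: points exactly on y = x² for x = 1 to 10. In this sketch r is about 0.975, yet the residuals from the least-squares line run +, then -, then +, the kind of bend a scatterplot would reveal.

```python
import math


def correlation(xs, ys):
    """Pearson's r from sums of squared deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data lying exactly on a curve: y = x^2 for x = 1, ..., 10
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
r = correlation(xs, ys)

# Least-squares line y = a + b*x, then residuals (observed minus predicted)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
pattern = "".join("+" if e > 0 else "-" for e in residuals)

print(f"r = {r:.3f}")              # high r ...
print("residual signs:", pattern)  # ... but positive at both ends, negative in the middle
```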

45
What Can Go Wrong? (cont.)
• Beware of outliers.
• Even a single outlier can dominate the
correlation value.
• Make sure to check the Outlier Condition.
• Without the outlier, the correlation would be 0,
but with the outlier, the correlation is,
deceptively, much closer to 1.
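A hypothetical illustration of this slide (made-up points, not the slide's data): five points with no linear trend give r = 0, and adding one far-away point pushes r to about 0.985. Reporting both values, as the Outlier Condition suggests, makes the distortion obvious.

```python
import math


def correlation(points):
    """Pearson's r for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


cloud = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2)]  # hypothetical points with no linear trend
with_outlier = cloud + [(20, 20)]                 # the same points plus one far-away point

r_without = correlation(cloud)
r_with = correlation(with_outlier)
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```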

46
What have we learned?
• We examine scatterplots for form, direction,
strength, and unusual features.
• Although not every relationship is linear, when
the scatterplot is straight enough, the
correlation coefficient is a useful numerical
summary.
• The sign of the correlation tells us the
direction of the association.
• The magnitude of the correlation tells us the
strength of a linear association.
• Correlation has no units, so shifting the data,
scaling it by a positive factor, standardizing, or
swapping the variables has no effect on the
numerical value.

47
What have we learned? (cont.)
• Doing Statistics right means that we have to
Think about whether our choice of methods is
appropriate.
• Before finding or talking about a correlation,
check the Straight Enough Condition.
• Watch out for outliers!
• Don't assume that a high correlation or strong
association is evidence of a cause-and-effect
relationship; beware of lurking variables!

48
What have we learned? (cont.)
• Unusual features
• Look for the unexpected.
• Often the most interesting thing to see in a
scatterplot is the thing you never thought to
look for.
• One example of such a surprise is an outlier
standing away from the overall pattern of the
scatterplot.
• Clusters or subgroups should also raise questions.