AP Statistics Chapter 3 - PowerPoint PPT Presentation








Title: Chapter 7: Correlation
Author: Paul Kim
Last modified by: Krausse, Joy
Created Date: 5/17/2002 8:23:45 PM
Document presentation format: On-screen Show (4:3)

Date added: 5 November 2019
Slides: 49
Provided by: PaulK168




AP Statistics Chapter 3
  • Scatterplots, Association, and Correlation

  • "You can observe a lot by watching." (Yogi Berra)
  • Although this statement was made in jest, it
    carries much truth.
  • Many statistical studies look at multiple
    variables to try to show a relationship between
    one variable and another.
  • Most of the time, questions ask whether there is
    an association between two variables.
  • It is important to know the definition of a few
    words before we continue to explore this concept.

  • When one variable affects another, one variable
    is referred to as the explanatory variable and
    the other as the response variable.
  • Explanatory Variable
  • A variable that attempts to explain the observed
    outcomes.
  • Response Variable
  • A variable that measures an outcome of a study.
  • Association
  • Any relationship between two measured quantities
    that renders them statistically dependent.
  • Put simply, two variables are associated if there
    is a (direct or indirect) link between them.

  • Suppose that I randomly select 10 students from a
    statistics class, record their weights in pounds,
    and get the following results:
  • 103, 201, 125, 179, 150, 138, 181, 220, 113, 126
  • Now, let's say that I was going to pick another
    random student. Could I come up with a prediction
    of how much that student would weigh?
    (The mean is 153.6 pounds, and the sample standard
    deviation is about 39.6 pounds.)
  • How accurate will my prediction be?
  • Is there a way to improve this prediction?
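You can check the summary numbers quoted above by computing them directly. The Python sketch below uses the standard-library `statistics` module, and the variable names are our own. `statistics.stdev` divides by n - 1, so it gives the sample standard deviation (the value a TI calculator labels Sx). We treat the mean as the "best guess" for a new student and the standard deviation as a rough gauge of a typical miss. That is an informal reading, not a formal prediction interval.

```python
import statistics

# Weights (in pounds) of the 10 randomly selected students from the slide.
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

mean_weight = statistics.mean(weights)   # 153.6
sd_weight = statistics.stdev(weights)    # sample SD (n - 1 divisor), about 39.58

# With no other information, the mean is a natural prediction for a new
# student; the standard deviation describes roughly how far off a typical
# student is from that mean.
print(f"Predicted weight: {mean_weight:.1f} lb")
print(f"Typical miss (SD): about {sd_weight:.1f} lb")
```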

  • Now let's say that I have more information: the
    weights (in pounds) and heights (in inches) of
    the same 10 students.
  • Weight: 103, 201, 125, 179, 150, 138, 181, 220,
    113, 126
  • Height: 61, 68, 65, 69, 65, 61, 64, 72, 63, 62
  • Now, let's say that I was going to pick another
    random student whose height is 65 inches. Could I
    come up with a prediction of how much that
    student would weigh?
  • How accurate will my prediction be?
  • Is this a better way to make a prediction?

Roles for Variables
  • When we have two variables (bivariate data), it
    is always a good idea to make a picture!
  • As we graph the two variables, it is important to
    determine which of the two quantitative variables
    goes on the x-axis and which on the y-axis.
  • This determination is made based on the roles
    played by the variables.
  • When the roles are clear, the explanatory or
    independent variable goes on the x-axis, and the
    response variable or dependent variable goes on
    the y-axis.

Roles for Variables (cont.)
  • The roles that we choose for variables are more
    about how we think about them rather than about
    the variables themselves.
  • Just placing a variable on the x-axis doesn't
    necessarily mean that it explains or predicts
    anything. And the variable on the y-axis may not
    respond to it in any way.
  • In a cause-and-effect relationship, the
    explanatory variable is the cause, and the
    response variable is the effect. Regression is a
    method for predicting the value of a dependent
    variable y based on the value of an independent
    variable x.

  • A study looks at smoking and lung cancer.
  • Which (if any) is the explanatory variable?
  • Which (if any) is the response variable?
  • Is smoking a quantitative or categorical
    variable?
  • Is lung cancer a quantitative or categorical
    variable?

  • A study looks at cavities and milk drinking.
  • Which (if any) is the explanatory variable?
  • Which (if any) is the response variable?
  • Is "cavities" a quantitative or categorical
    variable?
  • Is milk drinking a quantitative or categorical
    variable?

  • A study looks at rainfall and SAT scores.
  • Which (if any) is the explanatory variable?
  • Which (if any) is the response variable?
  • Is rainfall a quantitative or categorical
    variable?
  • Are SAT scores a quantitative or categorical
    variable?

Looking at Scatterplots
  • Scatterplots may be the most common and most
    effective display for two-variable data.
  • In a scatterplot, you can see patterns, trends,
    relationships, and even the occasional
    extraordinary value sitting apart from the
    others.
  • Scatterplots are the best way to start observing
    the relationship and the ideal way to picture
    associations between two quantitative variables.

  • Now let's revisit the problem we started with:
    the weights (in pounds) and heights (in inches)
    of the 10 students.
  • Weight: 103, 201, 125, 179, 150, 138, 181, 220,
    113, 126
  • Height: 61, 68, 65, 69, 65, 61, 64, 72, 63, 62
  • Graph the data and describe the association.
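The slides ask for this graph on graph paper. As a quick cross-check, the Python sketch below prints a rough text scatterplot of the same data. Height (explanatory) goes on the x-axis and weight (response) goes on the y-axis.

The helper `text_scatterplot` and the 10-pound row bands are hypothetical choices made for this example. The sketch assumes whole-number x values, giving one column per inch. If two points ever fell in the same cell, they would show as a single `*`.

```python
# Heights (inches, explanatory) and weights (pounds, response) of the 10 students.
heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]


def text_scatterplot(xs, ys, y_bin=10):
    """Hypothetical helper: build a rough text scatterplot with one column per
    whole-number x value and one row per y_bin-wide band of y values."""
    x_min, x_max = min(xs), max(xs)
    y_min = (min(ys) // y_bin) * y_bin          # round down to the start of a band
    y_max = (max(ys) // y_bin) * y_bin
    n_cols = x_max - x_min + 1
    n_rows = (y_max - y_min) // y_bin + 1
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for x, y in zip(xs, ys):
        grid[(y - y_min) // y_bin][x - x_min] = "*"
    lines = []
    for row in range(n_rows - 1, -1, -1):       # largest y values at the top
        lines.append(f"{y_min + row * y_bin:>4} |" + "".join(grid[row]))
    lines.append("     +" + "-" * n_cols)       # the x-axis
    lines.append(f"      {x_min}" + " " * (n_cols - 4) + f"{x_max}")
    return "\n".join(lines)


print(text_scatterplot(heights, weights))
```

Reading the printed plot, the points drift upward from left to right. That suggests a positive, roughly linear association, which the correlation coefficient will quantify.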

  • In regression, we use the explanatory variable
    to estimate (or predict) the value of the
    response variable.
  • As you can imagine, sometimes there is a clear
    relationship between the two variables, and
    sometimes there is no relationship at all.
  • Also, even when there is a relationship, some
    relationships are strong while others are weak.
  • To best describe the relationship, we should
    always describe its form, direction, and
    strength.

Interpreting Association
  • Form: Is there a pattern? Is the data linear or
    curved? Are there clusters of data?
  • Strength: Is it weak or strong? Does the data
    conform tightly or loosely to the pattern?
  • Direction: If linear, does the data go up
    (positively associated), go down (negatively
    associated), or run horizontally (no
    association)?
  • Deviations from pattern: Are there areas where
    the data conform less to the pattern? Are there
    any outliers?

(Scatterplot: percent of graduates taking the SAT
vs. average SAT Math score)
  • Attributes of a good scatterplot:
  • Consistent and uniform scale
  • Labels on both axes
  • Accurate placement of data
  • Data spread across the full range of each axis
  • Axis break lines if an axis does not start at
    zero
  • To achieve these goals, you are required to do
    your scatterplots on graph paper.

  • Try to make a graph of each of the following
    situations, then describe the association:
  • Points allowed vs. winning percentage
  • World population vs. year
  • Amount of rain vs. crop yield
  • Height vs. weight
  • Height vs. GPA
  • Shoe size vs. probability of winning the National
    Spelling Bee

Bivariate Data - Review
  • The very first step in analyzing bivariate data
    is to graph it. To graph this data, we use a
    scatterplot.
  • After we graph it, we examine three things to
    describe the association:
  • Form: linear or curved (we will discuss curved
    data later in this unit)
  • Direction: positive, negative, or no association
  • Strength: strong or weak

Looking at Scatterplots
  • Form
  • If there is a straight-line (linear)
    relationship, it will appear as a cloud or swarm
    of points stretched out in a generally
    consistent, straight form.
  • How should we describe the form of these two
    scatterplots?

Looking at Scatterplots (cont.)
  • Direction
  • A positive association generally tells us that as
    one variable increases, the other variable also
    increases.
  • A negative association generally tells us that as
    one variable increases, the other variable
    decreases.
  • In this example, there is a negative association
    between central pressure and maximum wind speed.
  • As the central pressure increases, the maximum
    wind speed decreases.

Looking at Scatterplots (cont.)
  • Direction
  • When the points are scattered randomly with no
    discernible pattern, we say that there is no
    association.
  • No association generally tells us that as one
    variable increases, the other variable shows no
    consistent tendency to increase or decrease.
  • The scatterplot above shows no association
    between the explanatory and response variables.

Looking at Scatterplots (cont.)
  • Strength
  • This describes how tightly the points follow
    the form (or pattern).
  • In a strong association the points follow the
    pattern very tightly, whereas in a weak
    association the points follow the pattern in a
    much looser manner. Note: we will quantify the
    amount of scatter soon.

(Scatterplot: strong, positive linear association)
(Scatterplot: weak, negative linear association)
Looking at two-variable data
  • Let's look at a real-life example to make this
    idea a little more concrete.
  • Do taller people tend to be heavier?
  • This question is an example of how two variables
    play different roles in data.
  • Height is the explanatory (or predictor) variable
    and weight is the response variable.
  • Let's take a look at the Detroit Pistons.

Pistons Roster

We are going to work with the Pistons roster below
(player, position, height in feet-inches, weight in
pounds):
Chucky Atkins G 5-11 160
Chauncey Billups G 6-3 202
Elden Campbell C-F 7-0 279
Hubert Davis G 6-5 183
Carlos Delfino G 6-6 230
Andreas Glyniadakis C 7-1 280
Darvin Ham F-G 6-7 240
Richard Hamilton G-F 6-7 193
Lindsey Hunter G 6-2 195
Darko Milicic F-C 7-0 245
Mehmet Okur F 6-11 249
Tayshaun Prince F 6-9 215
Zeljko Rebraca C 7-0 257
Don Reid (FA) F 6-8 250
Bob Sura G 6-5 200
Ben Wallace F-C 6-9 240
Corliss Williamson F 6-7 245
Using the Calculator
Correlation Coefficient
  • The correlation coefficient numerically measures
    the strength of the linear association between
    two quantitative variables.
  • Correlation Linear Association Relationship
  • There are three conditions that must be met
    before we can use the correlation coefficient:
  • Quantitative Variables Condition: Correlation
    applies only to quantitative variables. Don't
    apply correlation to categorical data!
  • Straight Enough Condition: Correlation measures
    the strength of linear association, so it is
    meaningless if the relationship is not linear!

Correlation Coefficient
  • There is one more condition:
  • Outlier Condition
  • Outliers can distort the correlation.
  • An outlier can make an otherwise small
    correlation look big or hide a large correlation.
  • It can even give an otherwise positive
    association a negative correlation coefficient
    (and vice versa).
  • When you see an outlier, it's often a good idea
    to report the correlation with and without that
    point.
  • Note: when asked about correlation, you should
    memorize this phrase:
  • With a correlation of (r), there is a
    (strong/weak), (positive/negative) linear
    association between the (explanatory variable)
    and the (response variable).

Correlation Coefficient
  • Is there a correlation between the heights and
    weights of a basketball team's players?
  • Is the association positive or negative?
  • Is the association strong or weak?
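One way to check your answers before reaching for the calculator is to compute r directly from the roster. The Python sketch below converts each feet-inches height to inches and applies the usual Pearson formula: the sum of cross-products divided by the square root of Sxx times Syy. The helper names `to_inches` and `correlation` are our own. For this roster, r comes out at about 0.87, a fairly strong positive linear association.

```python
import math

# (player, height as "feet-inches", weight in pounds), copied from the roster above
roster = [
    ("Chucky Atkins", "5-11", 160),
    ("Chauncey Billups", "6-3", 202),
    ("Elden Campbell", "7-0", 279),
    ("Hubert Davis", "6-5", 183),
    ("Carlos Delfino", "6-6", 230),
    ("Andreas Glyniadakis", "7-1", 280),
    ("Darvin Ham", "6-7", 240),
    ("Richard Hamilton", "6-7", 193),
    ("Lindsey Hunter", "6-2", 195),
    ("Darko Milicic", "7-0", 245),
    ("Mehmet Okur", "6-11", 249),
    ("Tayshaun Prince", "6-9", 215),
    ("Zeljko Rebraca", "7-0", 257),
    ("Don Reid", "6-8", 250),
    ("Bob Sura", "6-5", 200),
    ("Ben Wallace", "6-9", 240),
    ("Corliss Williamson", "6-7", 245),
]


def to_inches(feet_inches):
    """Convert a height such as "6-11" to inches (6 * 12 + 11 = 83)."""
    feet, inches = (int(part) for part in feet_inches.split("-"))
    return 12 * feet + inches


def correlation(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights = [to_inches(h) for _, h, _ in roster]   # explanatory variable
weights = [w for _, _, w in roster]              # response variable
r = correlation(heights, weights)
print(f"r = {r:.3f}")
```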

What do we do with correlation?
Examine pg. 151
Calculating Correlation Coefficient
  • The calculation of correlation is based on mean
    and standard deviation.
  • Remember that neither the mean nor the standard
    deviation is a resistant measure.
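For reference, here is the standard z-score form of the sample correlation that the next slides build toward. It is written under our own notation: n is the number of (x, y) pairs, x̄ and s_x are the mean and sample standard deviation of x, and the same goes for y.

```latex
z_x = \frac{x - \bar{x}}{s_x}, \qquad
z_y = \frac{y - \bar{y}}{s_y}, \qquad
s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}

r = \frac{\sum z_x z_y}{n - 1}
```

Every term depends on x̄, ȳ, s_x, and s_y, which are not resistant. A single extreme point can therefore shift r noticeably.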

Calculating Correlation Coefficient
  • What do the contents of the parentheses look
    like?
  • What happens when both values are from the lower
    half of the population? From the upper half?

Both z-values are negative, so their product is
positive.
The formula for calculating z-values:
z = (x - x̄) / s
Both z-values are positive, so their product is
positive.
Calculating Correlation Coefficient
  • What happens when one value is from the lower
    half of the population but other value is from
    the upper half?

One z-value is positive and the other is
negative. Their product is negative.
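The sign reasoning above can be checked on the 10-student height/weight data from the start of the chapter. The Python sketch below standardizes both variables using the sample SD (n - 1 divisor) and multiplies the z-scores pair by pair. It then counts the positive and negative products and divides their sum by n - 1. The helper name `z_scores` is our own.

```python
import math

heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # x, inches
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # y, pounds


def z_scores(values):
    """Standardize values with the mean and the sample SD (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


products = [zx * zy for zx, zy in zip(z_scores(heights), z_scores(weights))]

# Positive product: both values on the same side of their means.
# Negative product: one value above its mean, the other below.
# Students exactly at the mean height (65 in) give a product of 0.
positive = sum(p > 0 for p in products)
negative = sum(p < 0 for p in products)
r = sum(products) / (len(products) - 1)

print(positive, negative, round(r, 3))
```

Most products are positive and only one is negative, so their sum, and therefore r, is positive.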
Using the TI-84 to calculate r
  • If you have a TI-84, you must turn Diagnostics
    on: enter DiagnosticOn from the CATALOG.
  • TI-89 users don't need to worry about this step,
    since diagnostics are automatically on in that
    calculator.

Using the TI to calculate r
  • Run LinReg(a+bx) with the explanatory variable as
    the first list and the response variable as the
    second list.

Using the TI to calculate r
  • The results are the slope and vertical intercept
    of the regression equation (more on that later)
    and the values of r and r² (more on r² later as
    well).
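To see where those four numbers come from, the Python sketch below computes them by hand for the 10-student height/weight data. It uses the least-squares formulas b = Sxy / Sxx and a = ȳ - b·x̄. This is our own hand computation of the same quantities, not the calculator's internal code, and the helper name `lin_reg` is hypothetical. As a usage example, it also returns to the earlier question of predicting the weight of a 65-inch student.

```python
import math

heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # x: explanatory
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # y: response


def lin_reg(xs, ys):
    """Least-squares line y-hat = a + b*x, plus r and r^2 (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept: the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r, r ** 2


a, b, r, r2 = lin_reg(heights, weights)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.4f}, r^2 = {r2:.4f}")
print(f"Predicted weight at 65 in: {a + b * 65:.1f} lb")
```

Here 65 inches happens to be the mean height, so the prediction equals the mean weight of 153.6 lb.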

Facts about correlation
  • Both variables need to be quantitative.
  • Because the data values are standardized, it does
    not matter what units we use for each of the
    variables.
  • In other words, r does not change if we change
    the units of x, y, or both: we can multiply or
    divide x or y by a positive constant, or add or
    subtract a constant, and r stays the same.
    (Multiplying by a negative constant flips the
    sign of r.)
  • The value of r is unit-less.
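A short Python check of these facts uses the 10-student data. It converts inches to centimeters (× 2.54) and pounds to kilograms (× 0.45359237), shifts heights by a constant, and multiplies heights by -1. Only the last change affects r, and it only flips the sign. The `correlation` helper is our own implementation of the Pearson formula.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights_in = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]
weights_lb = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

heights_cm = [2.54 * h for h in heights_in]          # multiply by a positive constant
weights_kg = [0.45359237 * w for w in weights_lb]    # pounds -> kilograms
shifted = [h - 60 for h in heights_in]               # subtract a constant
flipped = [-h for h in heights_in]                   # multiply by a NEGATIVE constant

r = correlation(heights_in, weights_lb)
print(round(r, 4))
print(round(correlation(heights_cm, weights_kg), 4))   # same as r
print(round(correlation(shifted, weights_lb), 4))      # same as r
print(round(correlation(flipped, weights_lb), 4))      # -r: only the sign changes
```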

Facts about correlation
  • The value of r will always be between -1 and 1.
  • Values closer to -1 reflect strong negative
    linear association.
  • Values closer to 1 reflect strong positive
    linear association.
  • Values close to 0 reflect weak or no linear
    association.
  • Correlation does not measure the strength of
    non-linear relationships.

Facts about correlation
  • Correlation is blind to which variable is
    explanatory and which is response: swapping x
    and y does not change r.
  • Even if you get an r value close to -1 or 1, it
    does not mean that you can say that the
    explanatory variable causes the response
    variable.
  • Scatterplots and correlation coefficients never
    prove causation.
  • A hidden variable that stands behind a
    relationship and determines it by simultaneously
    affecting the other two variables is called a
    lurking variable.
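The symmetry of r is easy to confirm with the 10-student data. The Python sketch below computes r with height as x and then with the roles swapped. The two values agree. A high r therefore cannot tell us which variable (if either) drives the other. The `correlation` helper is our own.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

r_xy = correlation(heights, weights)   # height treated as explanatory
r_yx = correlation(weights, heights)   # roles swapped
print(round(r_xy, 4), round(r_yx, 4))
```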

Facts about correlation
  • The value of r is a measure of the strength of a
    linear relationship. It measures how closely the
    data fall to a straight line. An r value near 0,
    however, does not imply that there is no
    relationship, only no linear relationship. For
    example, quadratic or sinusoidal data can have an
    r close to 0 even though there is a strong
    relationship present.
  • r measures the correlation between 2 variables in
    a sample of observations from the population of
    interest. Thus, r is the sample correlation
    coefficient, which is used to estimate ρ (rho),
    the population correlation coefficient.
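To illustrate the quadratic case, take made-up data where y = x² exactly for x = -3, ..., 3. Y is completely determined by x, yet r comes out to exactly 0. The left half of the parabola slopes down and the right half slopes up, so the cross-products cancel. The data and the `correlation` helper are our own.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]      # a perfect, but curved, relationship

r = correlation(xs, ys)
print(r)
```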

What Can Go Wrong?
  • Don't say "correlation" when you mean
    "association."
  • More often than not, people say correlation when
    they mean association.
  • The word correlation should be reserved for
    measuring the strength and direction of the
    linear relationship between two quantitative
    variables.

What Can Go Wrong?
  • Don't correlate categorical variables.
  • Be sure to check the Quantitative Variables
    Condition.
  • Don't confuse correlation with causation.
  • Scatterplots and correlations never demonstrate
    causation.
  • These statistical tools can only demonstrate an
    association between variables.

What Can Go Wrong? (cont.)
  • Be sure the association is linear.
  • There may be a strong association between two
    variables that have a nonlinear relationship,
    yet the correlation can be near 0. Why do you
    think that is?

What Can Go Wrong? (cont.)
  • Dont assume the relationship is linear just
    because the correlation coefficient is high.
  • Here the correlation is 0.979, but the
    relationship is actually bent.
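The data behind the 0.979 example are not given here, so the Python sketch below uses made-up data (y = x² for x = 1 to 10) to show the same effect. It fits a least-squares line, prints r, and then looks at the residuals (observed minus predicted). The helper `fit_line` is our own.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a, slope b, and correlation r (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b, sxy / math.sqrt(sxx * syy)


xs = list(range(1, 11))
ys = [x ** 2 for x in xs]          # clearly curved data

a, b, r = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(r, 3))                        # high, even though the data bend
print([round(e, 1) for e in residuals])   # positive at both ends, negative in the middle
```

Despite r above 0.97, the residuals are positive at both ends and negative in the middle. That systematic pattern reveals the bend that the single number r hides.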

What Can Go Wrong? (cont.)
  • Beware of outliers.
  • Even a single outlier can dominate the
    correlation value.
  • Make sure to check the Outlier Condition.
  • Without the outlier, the correlation would be
    near 0, but with the outlier the correlation is,
    deceptively, much closer to 1.
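A tiny made-up example shows how much one point can matter. Four points arranged at the corners of a square have r = 0. Adding a single far-away point at (10, 10) pushes r to about 0.95. The data and the `correlation` helper are our own, chosen only to illustrate the effect.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


base_x = [1, 1, -1, -1]    # four corners of a square: no linear trend
base_y = [1, -1, 1, -1]

r_without = correlation(base_x, base_y)
r_with = correlation(base_x + [10], base_y + [10])   # add one outlier at (10, 10)
print(round(r_without, 3), round(r_with, 3))
```

Reporting both values, as the Outlier Condition suggests, makes the outlier's influence obvious.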

What have we learned?
  • We examine scatterplots for form, direction,
    strength, and unusual features.
  • Although not every relationship is linear, when
    the scatterplot is straight enough, the
    correlation coefficient is a useful numerical
    summary.
  • The sign of the correlation tells us the
    direction of the association.
  • The magnitude of the correlation tells us the
    strength of a linear association.
  • Correlation has no units, so shifting the data,
    scaling it by a positive constant,
    standardizing, or swapping the variables has no
    effect on the numerical value.

What have we learned? (cont.)
  • Doing Statistics right means that we have to
    Think about whether our choice of methods is
    appropriate.
  • Before finding or talking about a correlation,
    check the Straight Enough Condition.
  • Watch out for outliers!
  • Don't assume that a high correlation or strong
    association is evidence of a cause-and-effect
    relationship; beware of lurking variables!

What have we learned? (cont.)
  • Unusual features
  • Look for the unexpected.
  • Often the most interesting thing to see in a
    scatterplot is the thing you never thought to
    look for.
  • One example of such a surprise is an outlier
    standing away from the overall pattern of the
    scatterplot.
  • Clusters or subgroups should also raise questions.