AP Statistics Chapter 3

- Scatterplots, Association, and Correlation

Relationships

- "You can observe a lot by watching." (Yogi Berra)
- Although this statement was said in jest, it carries much truth.
- Many statistical studies look at multiple variables to try to show a relationship between one variable and another.
- Most of the time, questions ask whether there is an association between two variables.
- It is important to know the definitions of a few words before we continue to explore this concept.
- When one variable affects another, one is referred to as the explanatory variable and the other as the response variable.
- Explanatory Variable: A variable that attempts to explain the observed outcomes.
- Response Variable: A variable that measures an outcome of a study.
- Association: Any relationship between two measured quantities that renders them statistically dependent. Simply put, there is a (direct or indirect) link between the two variables.

Example

- Suppose that I randomly select 10 students from a Stats class and record their weights in pounds, with the following results:
- 103, 201, 125, 179, 150, 138, 181, 220, 113, 126
- Now, let's say that I was going to pick another random student. Could I come up with a prediction of how much that student weighs? (The mean is 153.6 and the standard deviation is 39.58.)
- How accurate will my prediction be?
- Is there a way to improve this prediction?
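
The summary statistics quoted on this slide can be checked with a short Python sketch. Python is an assumption here (the slides themselves use a TI calculator), and the sketch uses the sample standard deviation, which divides by n - 1.

```python
import statistics

# Weights (in pounds) of the 10 randomly selected students from the slide
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

mean_weight = statistics.mean(weights)   # arithmetic mean
sd_weight = statistics.stdev(weights)    # sample standard deviation (n - 1 divisor)

print(f"mean = {mean_weight:.1f} lb, sd = {sd_weight:.2f} lb")
```

With no other information, the mean is the natural single-number prediction, and the standard deviation gives a rough sense of how far off that prediction is likely to be.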

Example

- Now let's say that I have more information.
- The following are both the weights (in pounds) and heights (in inches) of the 10 students:
- Weight: 103, 201, 125, 179, 150, 138, 181, 220, 113, 126
- Height: 61, 68, 65, 69, 65, 61, 64, 72, 63, 62
- Now, let's say that I was going to pick another random student. Knowing that their height is 65 inches, could I come up with a prediction of how much that student weighs?
- How accurate will my prediction be?
- Is this a better way to make a prediction?

Roles for Variables

- When we have two variables (bivariate data), it is always a good idea to make a picture!
- As we graph the two variables, it is important to determine which of the two quantitative variables goes on the x-axis and which on the y-axis.
- This determination is made based on the roles played by the variables.
- When the roles are clear, the explanatory (independent) variable goes on the x-axis, and the response (dependent) variable goes on the y-axis.

Roles for Variables (cont.)

- The roles that we choose for variables are more about how we think about them than about the variables themselves.
- Just placing a variable on the x-axis doesn't necessarily mean that it explains or predicts anything. And the variable on the y-axis may not respond to it in any way.
- In a cause-and-effect relationship, the explanatory variable is the cause, and the response variable is the effect.
- Regression is a method for predicting the value of a dependent variable y, based on the value of an independent variable x.

Examples

- A study looks at smoking and lung cancer.
- Which (if any) is the explanatory variable?
- Which (if any) is the response variable?
- Is smoking a quantitative or categorical variable?
- Is lung cancer a quantitative or categorical variable?

Examples

- A study looks at cavities and milk drinking.
- Which (if any) is the explanatory variable?
- Which (if any) is the response variable?
- Is cavities a quantitative or categorical variable?
- Is milk drinking a quantitative or categorical variable?

Examples

- A study looks at rainfall and SAT scores.
- Which (if any) is the explanatory variable?
- Which (if any) is the response variable?
- Is rainfall a quantitative or categorical variable?
- Is SAT score a quantitative or categorical variable?

Looking at Scatterplots

- Scatterplots may be the most common and most effective display for two-variable data.
- In a scatterplot, you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others.
- Scatterplots are the best way to start observing the relationship and the ideal way to picture associations between two quantitative variables.

Example

- Now let's revisit the problem we started with.
- The following are both the weights (in pounds) and heights (in inches) of the 10 students:
- Weight: 103, 201, 125, 179, 150, 138, 181, 220, 113, 126
- Height: 61, 68, 65, 69, 65, 61, 64, 72, 63, 62
- Graph the data and describe the distribution.
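
For a quick look before drawing the graph by hand, the 10 points can be placed on a coarse character grid. This is only a rough sketch in Python; the function name `text_scatter` and the grid size are assumptions made for illustration, and height is treated as the explanatory variable (x-axis) with weight as the response (y-axis).

```python
heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # inches, x-axis
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # pounds, y-axis

def text_scatter(xs, ys, rows=10, cols=24):
    """Return a rough character-grid scatterplot as a list of strings."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index on the grid
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"   # grid[0] is the top of the plot
    return ["".join(line) for line in grid]

plot_lines = text_scatter(heights, weights)
for line in plot_lines:
    print("|" + line)
print("+" + "-" * len(plot_lines[0]))
print(f" height: {min(heights)} to {max(heights)} in (left to right)")
print(f" weight: {min(weights)} to {max(weights)} lb (bottom to top)")
```

The points drift from lower left to upper right, which is the pattern to describe in terms of form, direction, and strength.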

Regression

- When we perform regression, we take two variables and attempt to use the explanatory variable to estimate (or predict) the value of the response variable. This process is called regression.
- As you can imagine, at times there is a clear relationship between the two variables, and sometimes there might not be any relationship.
- Also, even if there is a relationship, some relationships are strong while others are weak.
- To best describe the relationship, we should always describe the form, direction, and strength.
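
One way to turn this idea into numbers is a least-squares line, sketched here in Python on the 10 students' heights and weights. The slides do not specify a fitting method at this point, so least squares and the helper name `predict_weight` are assumptions for illustration only.

```python
heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # explanatory (inches)
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # response (pounds)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Sums of cross-products and squared deviations about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)

slope = sxy / sxx                      # pounds per additional inch
intercept = mean_y - slope * mean_x    # forces the line through (mean_x, mean_y)

def predict_weight(height_in):
    """Predicted weight in pounds for a given height in inches."""
    return intercept + slope * height_in

print(f"slope = {slope:.2f} lb per inch, intercept = {intercept:.2f} lb")
print(f"predicted weight at 65 in: {predict_weight(65):.1f} lb")
```

Because 65 inches happens to be the mean height in this sample, the line predicts the mean weight there. The gain from knowing height shows up in how much less the observed weights scatter around the line than around the overall mean.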

Interpreting Association

- Form: Is there a pattern? Are the data linear or curved? Are there clusters of data?
- Strength: Is it weak or strong? Do the data conform tightly or loosely to the pattern?
- Direction: If linear, do the data go up (positively associated), go down (negatively associated), or follow a horizontal line (no association)?
- Deviations from pattern: Are there areas where the data conform less to the pattern? Are there any outliers?

(Percent of graduates taking SAT vs. the Average SAT Math Score)

- Attributes of a good scatterplot:
- Consistent and uniform scale
- Labels on both axes
- Accurate placement of data
- Data spread throughout the axes
- Axis break lines if not starting at zero
- To achieve these goals, you are required to do your scatterplots on graph paper.

Examples

- Try to make a graph of each of the following situations, then describe the association:
- Points allowed vs. winning percentage
- World population vs. year
- Amount of rain vs. crop yield
- Height vs. weight
- Height vs. GPA
- Shoe size vs. probability of winning the National Spelling Bee

Bivariate Data - Review

- The very first step in analyzing bivariate data is to graph it. When we graph these data, we use a scatterplot.
- After we graph it, we examine three things to describe the association:
- Form: linear or curved (we will discuss curved data later in this unit)
- Direction: positive, negative, or no association
- Strength: strong or weak

Looking at Scatterplots

- Form
- If there is a straight-line (linear) relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.
- How should we describe the form for these two graphs?

Looking at Scatterplots (cont.)

- Direction
- A positive association generally tells us that as one variable increases, the other variable also increases.
- A negative association generally tells us that as one variable increases, the other variable decreases.
- In this example, there is a negative association between central pressure and maximum wind speed.
- As the central pressure increases, the maximum wind speed decreases.

Looking at Scatterplots (cont.)

- Direction
- When the points are scattered about randomly with no discernible pattern, we say that there is no association.
- No association generally tells us that as one variable increases, we know nothing about the other variable.
- The scatterplot on this slide shows no association between the explanatory and response variables.

Looking at Scatterplots (cont.)

- Strength
- This describes how tightly the points follow the form (or pattern).
- A strong association has points that follow the pattern very tightly, whereas a weak association has points that follow the pattern in a much looser manner. Note: we will quantify the amount of scatter soon.

This has a strong, positive linear association

This has a weak, negative linear association

Looking at two-variable data

- Let's look at a real-life example to make this idea a little more concrete.
- Do taller people tend to have heavier weights?
- This question is an example of how two variables play different roles in data.
- Height is the explanatory or predictor variable and weight is the response variable.
- Let's take a look at the Detroit Pistons.

Pistons Roster

Going to work with the Pistons

PLAYER               POS  HT (ft-in)  WT (lb)
Chucky Atkins        G    5-11        160
Chauncey Billups     G    6-3         202
Elden Campbell       C-F  7-0         279
Hubert Davis         G    6-5         183
Carlos Delfino       G    6-6         230
Andreas Glyniadakis  C    7-1         280
Darvin Ham           F-G  6-7         240
Richard Hamilton     G-F  6-7         193
Lindsey Hunter       G    6-2         195
Darko Milicic        F-C  7-0         245
Mehmet Okur          F    6-11        249
Tayshaun Prince      F    6-9         215
Zeljko Rebraca       C    7-0         257
Don Reid (FA)        F    6-8         250
Bob Sura             G    6-5         200
Ben Wallace          F-C  6-9         240
Corliss Williamson   F    6-7         245
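
Before the roster can be graphed or entered into calculator lists, the feet-inches heights need to become single numbers. A minimal Python sketch of that step follows; the helper name `to_inches` and the list names are assumptions made for illustration.

```python
# Roster rows copied from the table above: (player, height as feet-inches, weight in lb)
roster = [
    ("Chucky Atkins", "5-11", 160), ("Chauncey Billups", "6-3", 202),
    ("Elden Campbell", "7-0", 279), ("Hubert Davis", "6-5", 183),
    ("Carlos Delfino", "6-6", 230), ("Andreas Glyniadakis", "7-1", 280),
    ("Darvin Ham", "6-7", 240), ("Richard Hamilton", "6-7", 193),
    ("Lindsey Hunter", "6-2", 195), ("Darko Milicic", "7-0", 245),
    ("Mehmet Okur", "6-11", 249), ("Tayshaun Prince", "6-9", 215),
    ("Zeljko Rebraca", "7-0", 257), ("Don Reid", "6-8", 250),
    ("Bob Sura", "6-5", 200), ("Ben Wallace", "6-9", 240),
    ("Corliss Williamson", "6-7", 245),
]

def to_inches(height):
    """Convert a 'feet-inches' string such as '6-3' to total inches."""
    feet, inches = height.split("-")
    return 12 * int(feet) + int(inches)

heights_in = [to_inches(ht) for _, ht, _ in roster]  # explanatory variable (first list)
weights_lb = [wt for _, _, wt in roster]             # response variable (second list)

for (player, _, _), h, w in zip(roster, heights_in, weights_lb):
    print(f"{player:<20} {h:>3} in  {w:>3} lb")
```

The two lists are in the same player order, so each height stays paired with the matching weight.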

Using the Calculator

(Calculator scatterplot: weight in pounds, 125 to 300, on the vertical axis; height in inches, 68 to 86, on the horizontal axis)

Correlation Coefficient

- The correlation coefficient numerically measures the strength of the linear association between two quantitative variables.
- Correlation means a linear association (relationship).
- There are three conditions that must be met before we can look at the correlation coefficient:
- Quantitative Condition: Correlation only applies to quantitative variables. Don't apply correlation to categorical data!
- Straight Enough Condition: Correlation measures the strength of linear association, which is useless if the data are not linear!

Correlation Coefficient

- There is one more condition:
- Outlier Condition: Outliers can distort the correlation dramatically.
- An outlier can make an otherwise small correlation look big or hide a large correlation.
- It can even give an otherwise positive association a negative correlation coefficient (and vice versa).
- When you see an outlier, it's often a good idea to report the correlations with and without the point.
- Note: when asked about correlation, you should memorize this phrase:
- "With a correlation of (r), there is a (strong/weak), (positive/negative) linear association between the (explanatory variable) and the (response variable)."

Correlation Coefficient

- Is there a correlation between a basketball team's heights and weights?
- Is the association positive or negative?
- Is the association strong or weak?

What do we do with correlation?

Examine pg. 151

Calculating Correlation Coefficient

- The calculation of correlation is based on the mean and standard deviation.
- Remember that neither the mean nor the standard deviation is a resistant measure.

Calculating Correlation Coefficient

- What do the contents of the parentheses look like? (They are the formula for calculating z-values.)
- What happens when the values are both from the lower half of the population? From the upper half?

Lower half: both z-values are negative. Their product is positive.

Upper half: both z-values are positive. Their product is positive.
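
The formula image from this slide did not survive extraction. The standard sample formula, which matches the slide's description of z-values inside parentheses, is presumably the one intended. Here $\bar{x}$, $\bar{y}$ are the sample means, $s_x$, $s_y$ the sample standard deviations, and $n$ the number of data pairs.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each parenthesis is a z-value, so r is essentially an average of the products of paired z-values.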

Calculating Correlation Coefficient

- What happens when one value is from the lower half of the population but the other value is from the upper half?

One z-value is positive and the other is negative. Their product is negative.
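
The sign argument can be seen directly by computing the z-value products for the 10 students' heights and weights. This Python sketch assumes the usual sample definition of r (sum of z-value products divided by n - 1); the function name `z_scores` is made up for illustration.

```python
import statistics

heights = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]            # inches
weights = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]  # pounds

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

zx = z_scores(heights)
zy = z_scores(weights)
products = [a * b for a, b in zip(zx, zy)]

# r is the sum of the z-value products divided by n - 1
r = sum(products) / (len(products) - 1)

for h, w, p in zip(heights, weights, products):
    print(f"height {h}, weight {w}: z-product = {p:+.3f}")
print(f"r = {r:.3f}")
```

Most products are positive because short students tend to be light and tall students heavy. Only the student who is below the mean height but above the mean weight contributes a negative product, so r comes out positive, about 0.85.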

Using the TI-84 to calculate r

- If you have a TI-84, you must turn the diagnostics on: enter DiagnosticOn from the Catalog.
- TI-89 users don't need to worry about this step, since the diagnostics are automatically on in that calculator.

Using the TI to calculate r

- Run LinReg(a+bx) with the explanatory variable as the first list and the response variable as the second list.

(Calculator screenshots: TI-84 and TI-89)

Using the TI to calculate r

- The results are the slope and vertical intercept of the regression equation (more on that later) and the values of r and r² (more on r² later as well).

Facts about correlation

- Both variables need to be quantitative.
- Because r is computed from standardized values of the observations, it does not matter what units we use for each of the variables: r does not change if we change the units of x, y, or both. In other words, we can add or subtract a constant, or multiply or divide by a positive constant, for x, y, or both, and r will stay the same. (Multiplying by a negative constant flips the sign of r.)
- The value of r is unitless.
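
A quick numerical check of this fact, sketched in Python on the 10 students' data. The helper `pearson_r` is written out here for illustration and is not from the slides; the conversion factors are the exact definitions of the inch and the pound.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights_in = [61, 68, 65, 69, 65, 61, 64, 72, 63, 62]
weights_lb = [103, 201, 125, 179, 150, 138, 181, 220, 113, 126]

heights_cm = [2.54 * h for h in heights_in]         # rescale x: inches to centimeters
weights_kg = [0.45359237 * w for w in weights_lb]   # rescale y: pounds to kilograms
heights_over_5ft = [h - 60 for h in heights_in]     # shift x: inches above 5 feet
heights_negated = [-h for h in heights_in]          # multiply x by a negative constant

r_original = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
r_shifted = pearson_r(heights_over_5ft, weights_lb)
r_negated = pearson_r(heights_negated, weights_lb)

print(f"original: {r_original:.4f}")
print(f"metric:   {r_metric:.4f}")
print(f"shifted:  {r_shifted:.4f}")
print(f"negated:  {r_negated:.4f}")
```

The first three values agree, while the last has the same magnitude with the opposite sign.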

Facts about correlation

- The value of r will always be between -1 and 1, inclusive.
- Values closer to -1 reflect a strong negative linear association.
- Values closer to 1 reflect a strong positive linear association.
- Values close to 0 reflect little or no linear association.
- Correlation does not measure the strength of non-linear relationships.

Facts about correlation

- Correlation is blind to which variable is explanatory and which is the response: swapping x and y leaves r unchanged.
- Even though you may get an r value close to -1 or 1, it does not mean that you can say that the explanatory variable causes the response variable.
- Scatterplots and correlation coefficients never prove causation.
- A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking variable.

Facts about correlation

- The value of r is a measure of the strength of a linear relationship. It measures how closely the data fall to a straight line. An r value near 0, however, does not imply that there is no relationship, only no linear relationship. For example, quadratic or sinusoidal data can have an r close to 0, even though there is a strong relationship present.
- r measures the correlation between two variables in a sample of observations from the population of interest. Thus, r is the sample correlation coefficient, which is used to estimate ρ (rho), the population correlation coefficient.
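
The quadratic case is easy to verify. In this Python sketch the data are hypothetical, chosen so that y is determined exactly by x through y = x², with x values symmetric about 0; the helper `pearson_r` is written out for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical x values, symmetric about 0
ys = [x ** 2 for x in xs]       # y is a perfect quadratic function of x

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

The positive z-value products on the right side of the parabola cancel the negative ones on the left, so r is 0 even though knowing x tells you y exactly.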

What Can Go Wrong?

- Don't say "correlation" when you mean "association."
- More often than not, people say correlation when they mean association.
- The word "correlation" should be reserved for measuring the strength and direction of the linear relationship between two quantitative variables.

What Can Go Wrong?

- Don't correlate categorical variables.
- Be sure to check the Quantitative Variables Condition.
- Don't confuse correlation with causation.
- Scatterplots and correlations never demonstrate causation.
- These statistical tools can only demonstrate an association between variables.

What Can Go Wrong? (cont.)

- Be sure the association is linear.
- Two variables may have a strong association that is nonlinear. The correlation can still be near 0. Why do you think that is?

What Can Go Wrong? (cont.)

- Don't assume the relationship is linear just because the correlation coefficient is high.
- Here the correlation is 0.979, but the relationship is actually bent.

What Can Go Wrong? (cont.)

- Beware of outliers.
- Even a single outlier can dominate the correlation value.
- Make sure to check the Outlier Condition.
- Without the outlier, the correlation would be 0, but with the outlier, the correlation is, deceivingly, much closer to 1.
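
The effect is easy to reproduce with made-up numbers. In this Python sketch the four "cloud" points and the single outlier are hypothetical, not the data from the slide's figure, and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical cloud: the four corners of a square, with no linear trend at all
cloud_x = [1, 1, 2, 2]
cloud_y = [1, 2, 1, 2]

# The same cloud plus one hypothetical point far from the rest
with_outlier_x = cloud_x + [10]
with_outlier_y = cloud_y + [10]

r_without = pearson_r(cloud_x, cloud_y)
r_with = pearson_r(with_outlier_x, with_outlier_y)

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

Reporting both values, as suggested for the Outlier Condition, makes it clear that the large correlation is produced by a single point.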

What have we learned?

- We examine scatterplots for form, direction, strength, and unusual features.
- Although not every relationship is linear, when the scatterplot is straight enough, the correlation coefficient is a useful numerical summary.
- The sign of the correlation tells us the direction of the association.
- The magnitude of the correlation tells us the strength of a linear association.
- Correlation has no units, so shifting or scaling the data, standardizing, or swapping the variables has no effect on the numerical value.

What have we learned? (cont.)

- Doing Statistics right means that we have to Think about whether our choice of methods is appropriate.
- Before finding or talking about a correlation, check the Straight Enough Condition.
- Watch out for outliers!
- Don't assume that a high correlation or strong association is evidence of a cause-and-effect relationship. Beware of lurking variables!

What have we learned? (cont.)

- Unusual features
- Look for the unexpected.
- Often the most interesting thing to see in a scatterplot is the thing you never thought to look for.
- One example of such a surprise is an outlier standing away from the overall pattern of the scatterplot.
- Clusters or subgroups should also raise questions.