


Chapter 3 Association: Contingency, Correlation, and Regression

Section 3.1: How Can We Explore the Association between Two Categorical Variables?

Learning Objectives

- Identify variable type: response or explanatory
- Define association
- Construct contingency tables
- Calculate proportions and conditional proportions

Learning Objective 1: Response and Explanatory Variables

- Response variable (dependent variable): the outcome variable on which comparisons are made
- Explanatory variable (independent variable): defines the groups to be compared with respect to values on the response variable
- Examples (response/explanatory):
  - Blood alcohol level / number of beers consumed
  - Grade on test / amount of study time
  - Yield of corn per bushel / amount of rainfall

Learning Objective 2: Association

- The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association
- An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Learning Objective 3: Contingency Table

- A contingency table
  - Displays two categorical variables
  - The rows list the categories of one variable
  - The columns list the categories of the other variable
  - Entries in the table are frequencies
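Building a contingency table amounts to counting how often each pair of categories occurs. Below is a minimal Python sketch. It assumes the raw data arrive as one (row category, column category) pair per observation. The observations and the helper name `contingency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one (food_type, pesticide_status) pair per sampled item.
observations = [
    ("organic", "present"), ("organic", "not present"), ("organic", "not present"),
    ("conventional", "present"), ("conventional", "present"), ("conventional", "not present"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # table[row][col] is the frequency of that combination (0 if it never occurs)
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, rows, cols

table, rows, cols = contingency_table(observations)

# Print the table: rows = food type, columns = pesticide status, entries = frequencies.
print("".ljust(14) + "".join(c.ljust(13) for c in cols))
for r in rows:
    print(r.ljust(14) + "".join(str(table[r][c]).ljust(13) for c in cols))
```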

Learning Objective 3: Contingency Table

What is the response variable? What is the explanatory variable?


Learning Objective 4: Calculate Proportions and Conditional Proportions

- What proportion of organic foods contain pesticides?
- What proportion of conventionally grown foods contain pesticides?
- What proportion of all sampled items contain pesticide residues?
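Each question is answered by dividing a cell count by the appropriate total. A conditional proportion uses a row total, and an overall proportion uses the grand total. Below is a minimal sketch with made-up counts (not the slide's data). The helper names are hypothetical.

```python
# Made-up counts (NOT the slide's data): rows = food type (explanatory),
# columns = pesticide status (response).
counts = {
    "organic": {"present": 30, "not present": 70},
    "conventional": {"present": 600, "not present": 400},
}

def conditional_proportions(table):
    """Proportion of each response category within each row (explanatory category)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result

def marginal_proportion(table, col):
    """Proportion of ALL sampled items that fall in column category `col`."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    col_total = sum(cells[col] for cells in table.values())
    return col_total / grand_total

cond = conditional_proportions(counts)
print(cond["organic"]["present"])                         # 0.3
print(cond["conventional"]["present"])                    # 0.6
print(round(marginal_proportion(counts, "present"), 3))   # 0.573
```

With these made-up counts, the conditional proportions of items containing pesticides differ (0.3 for organic versus 0.6 for conventional). That kind of difference signals an association. If the two conditional proportions were equal, there would be no association.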

Learning Objective 4: Calculate Proportions and Conditional Proportions

Use side-by-side bar charts to show conditional proportions. This allows easy comparison of the categories of the explanatory variable with respect to the response variable.

Learning Objective 4: Calculate Proportions and Conditional Proportions

- If there were no association between food type (organic or conventional) and pesticide status, then the proportions for the response variable categories would be the same for each food type


Section 3.2: How Can We Explore the Association between Two Quantitative Variables?

Learning Objectives

- Constructing scatterplots
- Interpreting a scatterplot
- Correlation
- Calculating correlation

Learning Objective 1: Scatterplot

- Graphical display of the relationship between two quantitative variables
- Horizontal axis: explanatory variable, x
- Vertical axis: response variable, y

Learning Objective 1: Internet Usage and Gross Domestic Product (GDP) Data Set

Learning Objective 1: Internet Usage and Gross Domestic Product (GDP)

- Enter values of the explanatory variable (x) in L1
- Enter values of the response variable (y) in L2
- STAT PLOT
  - Plot 1: On
  - Type: scatter plot
  - Xlist: L1
  - Ylist: L2
- ZOOM
  - 9: ZoomStat
- GRAPH


Learning Objective 1: Baseball Average and Team Scoring

- Enter values of the explanatory variable (x) in L1
- Enter values of the response variable (y) in L2
- STAT PLOT
  - Plot 1: On
  - Type: scatter plot
  - Xlist: L1
  - Ylist: L2
- ZOOM
  - 9: ZoomStat
- GRAPH

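Tip: The data from the prior example (Internet usage and GDP) will be used again later in the presentation. To keep those data, enter these baseball data in L3 (x) and L4 (y) instead, and set Xlist to L3 and Ylist to L4.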

Learning Objective 2: Interpreting Scatterplots

- You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables
  - Trend: linear, curved, clusters, no pattern
  - Direction: positive, negative, no direction
  - Strength: how closely the points fit the trend
- Also look for outliers from the overall trend

Learning Objective 2: Interpreting Scatterplots: Direction/Association

- Two quantitative variables x and y are
  - Positively associated when
    - High values of x tend to occur with high values of y
    - Low values of x tend to occur with low values of y
  - Negatively associated when high values of one variable tend to pair with low values of the other variable

Learning Objective 2: Example: 100 Cars on the Lot of a Used-Car Dealership

- Would you expect a positive association, a negative association, or no association between the age of the car and the mileage on the odometer?
  - Positive association
  - Negative association
  - No association

Learning Objective 2: Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

Learning Objective 3: Linear Correlation, r

- Measures the strength and direction of the linear association between x and y
- A positive r value indicates a positive association
- A negative r value indicates a negative association
- An r value close to 1 or -1 indicates a strong linear association
- An r value close to 0 indicates a weak linear association

Learning Objective 3: Correlation Coefficient: Measuring Strength and Direction of a Linear Relationship
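For reference, one standard way to write the correlation is as the sum of products of z-scores divided by n - 1. This assumes n paired observations with sample means $\bar{x}, \bar{y}$ and sample standard deviations $s_x, s_y$. The slide's own notation may differ.

```latex
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
```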

Learning Objective 3: Properties of Correlation

- Always falls between -1 and +1
- The sign of the correlation denotes direction
  - (-) indicates negative linear association
  - (+) indicates positive linear association
- Correlation is unitless: it does not depend on the variables' units
- Two variables have the same correlation no matter which is treated as the response variable
- Correlation is not resistant to outliers
- Correlation measures only the strength of a linear relationship

Learning Objective 4: Calculating the Correlation Coefficient

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
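The hand calculation can be sketched in Python. This minimal sketch assumes the z-score form of r: the sum of the products of z-scores, divided by n - 1. The helper name `correlation_by_zscores` is hypothetical. The data values are made up for illustration and are not the Western European figures.

```python
import math

def correlation_by_zscores(x, y):
    """Pearson r computed as the sum of z_x * z_y divided by (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    total = 0.0
    for xi, yi in zip(x, y):
        z_x = (xi - x_bar) / s_x   # how many SDs xi lies from the mean of x
        z_y = (yi - y_bar) / s_y   # how many SDs yi lies from the mean of y
        total += z_x * z_y         # same-sign z-scores push r up, opposite signs push it down
    return total / (n - 1)

# Made-up (per capita GDP, life expectancy) pairs, NOT the slide's data.
gdp = [20.0, 22.5, 25.0, 27.5, 30.0]
life = [77.0, 77.9, 78.1, 79.0, 79.6]
print(round(correlation_by_zscores(gdp, life), 3))
```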


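Learning Objective 4: Internet Usage and Gross Domestic Product (GDP)

- STAT CALC menu
- Choose 8: LinReg(a+bx)
- Enter the x variable's list first, then the y variable's list (for example, LinReg(a+bx) L1, L2)
- ENTER
- If r is not shown in the output, first turn on diagnostics (CATALOG, DiagnosticOn)

Correlation: r = 0.889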

Learning Objective 4: Baseball Average and Team Scoring

- Enter x data into L1
- Enter y data into L2
- STAT CALC menu
- Choose 8: LinReg(a+bx)
- Enter the x variable's list first, then the y variable's list
- ENTER

Correlation: r = 0.874

Learning Objective 4: Cereal Sodium and Sugar


Section 3.3: How Can We Predict the Outcome of a Variable?

Learning Objectives

- Definition of a regression line
- Use a regression equation for prediction
- Interpret the slope and y-intercept of a regression line
- Identify the least-squares regression line as the one that minimizes the sum of squared residuals
- Calculate the least-squares regression line
- Compare roles of explanatory and response variables in correlation and regression
- Calculate r² and interpret it

Learning Objective 1: Regression Analysis

- The first step of a regression analysis is to identify the response and explanatory variables
- We use y to denote the response variable
- We use x to denote the explanatory variable

Learning Objective 1: Regression Line

- A regression line is a straight line that describes how the response variable (y) changes as the explanatory variable (x) changes
- A regression line predicts the value of the response variable (y) for a given level of the explanatory variable (x)
- The y-intercept of the regression line is denoted by a
- The slope of the regression line is denoted by b
- The regression line is written as $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y

Learning Objective 2: Example: How Can Anthropologists Predict Height Using Human Remains?

- Regression Equation
  - $\hat{y}$ is the predicted height and $x$ is the length of a femur (thighbone), measured in centimeters
- Use the regression equation to predict the height of a person whose femur length was 50 centimeters

Learning Objective 3: Interpreting the y-Intercept

- y-intercept
  - The predicted value for y when x = 0
  - Helps in plotting the line
  - May not have any interpretative value if no observations had x values near 0

Learning Objective 3: Interpreting the Slope

- The slope measures the change in the predicted value of the response variable (y) for a 1-unit increase in the explanatory variable (x)
- Example: A 1 cm increase in femur length results in a 2.4 cm increase in predicted height
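The slope interpretation does not depend on the intercept: raising x by 1 always raises the prediction by b. Below is a minimal sketch using the slope of 2.4 from the femur example. The helper `predict` and the intercept values tried are hypothetical, because only the slope is stated in the text.

```python
def predict(a, b, x):
    """Predicted response from the regression line y-hat = a + b*x."""
    return a + b * x

b = 2.4  # slope from the femur example: cm of predicted height per cm of femur length

# Hypothetical intercepts: the change in prediction below is the same for every one of them.
for a in (0.0, 50.0, 100.0):
    change = predict(a, b, 51) - predict(a, b, 50)
    print(a, round(change, 10))  # 2.4 each time, after rounding away floating-point error
```

To answer the slide's question, plug the slide's own intercept, the slope 2.4, and x = 50 into `predict`.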

Learning Objective 3: Slope Values: Positive, Negative, Equal to 0

Learning Objective 3: Regression Line

- At a given value of x, the equation
  - Predicts a single value of the response variable
  - But we should not expect all subjects at that value of x to have the same value of y
- Variability occurs in the y values!

Learning Objective 3: The Regression Line

- The regression line connects the estimated means of y at the various x values
- In summary, it describes the relationship between x and the estimated means of y at the various values of x

Learning Objective 4: Residuals

- Residuals measure the size of the prediction errors: the vertical distance between each point and the regression line
- Each observation has a residual
- Calculation for each residual: $y - \hat{y}$ (observed value minus predicted value)
- A large residual indicates an unusual observation
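Each residual is the observed y minus the value predicted by the line. Below is a minimal sketch that computes all residuals for a given line and flags the largest one. The data, the line, and the helper name `residuals` are hypothetical.

```python
def residuals(x, y, a, b):
    """Residual for each observation: observed y minus predicted y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up data and a hypothetical fitted line y-hat = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3.2, 4.8, 7.1, 13.0, 11.1]
res = residuals(x, y, a=1.0, b=2.0)

# The observation with the largest absolute residual is the most unusual one.
i_max = max(range(len(res)), key=lambda i: abs(res[i]))
print([round(e, 2) for e in res])                     # [0.2, -0.2, 0.1, 4.0, 0.1]
print("most unusual observation:", x[i_max], y[i_max])  # 4 13.0
```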

Learning Objective 4: Least Squares Method Yields the Regression Line

- Residual sum of squares: $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$
- The least squares regression line is the line that makes the vertical distances between the points and their predictions as small as possible, in the sense that it minimizes the residual sum of squares
- Note: the sum of the residuals about the regression line will always be zero
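Both facts can be checked numerically. The minimal sketch below fits the least squares line with the standard closed-form solution (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x) on made-up data. The helper names are hypothetical.

```python
def least_squares(x, y):
    """Slope and intercept that minimize the residual sum of squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sse(x, y, a, b):
    """Residual sum of squares for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # made-up data
a, b = least_squares(x, y)
best = sse(x, y, a, b)

# Nudging the intercept or slope in any direction increases the residual sum of squares.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(x, y, a + da, b + db) > best

# The residuals about the least squares line sum to zero (up to floating-point error).
print(abs(sum(yi - (a + b * xi) for xi, yi in zip(x, y))) < 1e-9)  # True
```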

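Learning Objective 5: Regression Formulas for y-Intercept and Slope

- Slope: $b = r\left(\frac{s_y}{s_x}\right)$, where $s_x$ and $s_y$ are the standard deviations of x and y
- y-intercept: $a = \bar{y} - b\bar{x}$

The regression line always passes through $(\bar{x}, \bar{y})$.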

Learning Objective 5: Calculating the Slope and y-Intercept for the Regression Line

Slope: 26.4

y-intercept: -2.28

Learning Objective 5: Internet Usage and Gross Domestic Product (GDP)

- Enter x data into L1
- Enter y data into L2
- STAT CALC menu
- Choose 8: LinReg(a+bx)
- Enter the x variable's list first, then the y variable's list
- ENTER

Regression equation: $\hat{y} = -3.63 + 1.548x$

Learning Objective 5: Baseball Average and Team Scoring

- Enter x data into L1
- Enter y data into L2
- STAT CALC menu
- Choose 8: LinReg(a+bx)
- Enter the x variable's list first, then the y variable's list
- ENTER

Learning Objective 5: Cereal Sodium and Sugar

Learning Objective 6: The Slope and the Correlation

- Correlation
  - Describes the strength of the linear association between 2 variables
  - Does not change when the units of measurement change
  - Does not depend upon which variable is the response and which is the explanatory

Learning Objective 6: The Slope and the Correlation

- Slope
  - Its numerical value depends on the units used to measure the variables
  - Does not tell us whether the association is strong or weak
  - The two variables must be identified as response and explanatory variables
  - The regression equation can be used to predict values of the response variable for given values of the explanatory variable

Learning Objective 7: The Squared Correlation

- When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only $\bar{y}$, the mean of y
- We measure the proportional reduction in error and call it r²

Learning Objective 7: The Squared Correlation

- r² measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x
- A correlation of 0.9 means that r² = 0.81, so
  - 81% of the variation in the y-values can be explained by the explanatory variable, x


Section 3.4: What Are Some Cautions in Analyzing Association?

Learning Objectives

- Extrapolation
- Outliers and influential observations
- Correlation does not imply causation
- Lurking variables and confounding
- Simpson's Paradox

Learning Objective 1: Extrapolation

- Extrapolation: using a regression line to predict y-values for x-values outside the observed range of the data
- Extrapolation becomes riskier the farther we move from the range of the given x-values
- There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values
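One practical safeguard is to flag any prediction whose x-value falls outside the observed range. Below is a minimal sketch. The helper `predict_with_range_check`, the line, and the x-values are all hypothetical.

```python
def predict_with_range_check(a, b, x_new, x_observed):
    """Hypothetical helper: return (y-hat, extrapolating) for the line y-hat = a + b*x.

    `extrapolating` is True when x_new lies outside the range of the observed x-values,
    where the fitted relationship is not guaranteed to hold.
    """
    lo, hi = min(x_observed), max(x_observed)
    y_hat = a + b * x_new
    return y_hat, not (lo <= x_new <= hi)

# Hypothetical fitted line y-hat = 5 + 1.5x, fit to x-values between 10 and 40.
x_observed = [10, 18, 25, 33, 40]
for x_new in (25, 80):
    y_hat, extrapolating = predict_with_range_check(5.0, 1.5, x_new, x_observed)
    note = "EXTRAPOLATION: use with caution" if extrapolating else "within observed range"
    print(x_new, y_hat, note)
```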

Learning Objective 2: Outliers and Influential Points

- Construct a scatterplot
- Search for data points that are well outside of the trend that the remainder of the data points follow

Learning Objective 2: Outliers and Influential Points

- A regression outlier is an observation that lies far away from the trend that the rest of the data follows
- An observation is influential if
  - Its x value is relatively low or high compared to the remainder of the data, and
  - The observation is a regression outlier
- Influential observations tend to pull the regression line toward that data point and away from the rest of the data

Learning Objective 2: Outliers and Influential Points

- Impact of removing an influential data point

Learning Objective 3: Correlation Does Not Imply Causation

- A strong correlation between x and y means that there is a strong linear association between the two variables
- A strong correlation between x and y does not mean that x causes y

Data are available for all fires in Chicago last year on x = number of firefighters at the fires and y = cost of damages due to fire.

Learning Objective 3: Association Does Not Imply Causation

- Would you expect the correlation to be negative, zero, or positive?
- If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No
- Identify a third variable that could be considered a common cause of x and y
  - Distance from the fire station
  - Intensity of the fire
  - Size of the fire

Learning Objective 4: Lurking Variables and Confounding

- A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest
  - Ice cream sales and drowning: lurking variable = temperature
  - Reading level and shoe size: lurking variable = age
  - Childhood obesity rate and GDP: lurking variable = time
- When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding
- Lurking variables are not measured in the study but have the potential for confounding

Learning Objective 5: Simpson's Paradox

- Simpson's Paradox
  - When the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable

Learning Objective 5: Simpson's Paradox Example: Is Smoking Actually Beneficial to Your Health?

Probability of death of a smoker = 139/582 = 24%

Probability of death of a nonsmoker = 230/732 = 31%

Surely smoking can't improve your chances of living! What's going on?

Learning Objective 5: Simpson's Paradox Example: Break Out Data by Age

Learning Objective 5: Simpson's Paradox Example

- An association can look quite different after adjusting for the effect of a third variable by grouping the data according to the values of the third variable
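The reversal can be reproduced with a small calculation. The first line recomputes the overall proportions from the slide. The age-band counts that follow are made up for illustration and are not the age breakdown from the smoking study. They are chosen so that smokers fare worse within each band but better overall.

```python
# Overall death proportions from the slide.
print(round(139 / 582, 2), round(230 / 732, 2))  # 0.24 0.31

# Made-up (deaths, total) counts by age band, NOT the study's age breakdown.
data = {
    "younger": {"smoker": (10, 400), "nonsmoker": (5, 300)},
    "older": {"smoker": (60, 100), "nonsmoker": (200, 400)},
}

def rate(deaths, total):
    """Proportion of the group that died."""
    return deaths / total

# Within each age band, compare smokers with nonsmokers.
for band, groups in data.items():
    s, n = rate(*groups["smoker"]), rate(*groups["nonsmoker"])
    print(band, round(s, 3), round(n, 3), "smokers higher" if s > n else "smokers lower")

# Ignore age: pool the bands and compare again.
overall = {}
for group in ("smoker", "nonsmoker"):
    deaths = sum(data[band][group][0] for band in data)
    total = sum(data[band][group][1] for band in data)
    overall[group] = rate(deaths, total)
print("pooled", round(overall["smoker"], 3), round(overall["nonsmoker"], 3))
```

In these made-up counts, most smokers are in the younger band, where death rates are low for everyone. That imbalance is what makes the pooled comparison favor smokers.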