# Chapter 3 Association: Contingency, Correlation, and Regression - PowerPoint PPT Presentation

## Chapter 3 Association: Contingency, Correlation, and Regression

Slides: 67
Provided by: daniel544
Transcript and Presenter's Notes

1
Chapter 3 Association: Contingency, Correlation,
and Regression
• Section 3.1
• How Can We Explore the Association between Two
Categorical Variables?

2
Learning Objectives
• Identify variable type: Response or Explanatory
• Define Association
• Contingency tables
• Calculate proportions and conditional proportions

3
Learning Objective 1: Response and Explanatory
variables
• Response variable (Dependent variable)
• the outcome variable on which comparisons are
made
• Explanatory variable (Independent variable)
• defines the groups to be compared with respect to
values on the response variable
• Example: Response/Explanatory
• Blood alcohol level/Number of beers consumed
• Grade on test/Amount of study time
• Yield of corn per bushel/Amount of rainfall

4
Learning Objective 2: Association
• The main purpose of data analysis with two
variables is to investigate whether there is an
association and to describe that association
• An association exists between two variables if a
particular value for one variable is more likely
to occur with certain values of the other
variable

5
Learning Objective 3: Contingency Table
• A contingency table
• Displays two categorical variables
• The rows list the categories of one variable
• The columns list the categories of the other
variable
• Entries in the table are frequencies

6
Learning Objective 3: Contingency Table
What is the response variable? What is the
explanatory variable?
7
Learning Objective 4: Calculate proportions and
conditional proportions
8
Learning Objective 4: Calculate proportions and
conditional proportions
• What proportion of organic foods contain
pesticides?
• What proportion of conventionally grown foods
contain pesticides?
• What proportion of all sampled items contain
pesticide residues?
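
The three questions above can be sketched in a few lines of Python. The counts below are hypothetical (the contingency table itself did not survive in this transcript), and the names `table`, `conditional_proportions`, and `overall_proportion` are made up for illustration.

```python
# Hypothetical counts (NOT from the slides): rows are food types
# (explanatory variable), columns are pesticide status (response variable).
table = {
    "organic":      {"present": 30,  "not present": 70},
    "conventional": {"present": 140, "not present": 60},
}

def conditional_proportions(table):
    """Proportion of each response category within each explanatory group."""
    result = {}
    for group, counts in table.items():
        row_total = sum(counts.values())
        result[group] = {cat: n / row_total for cat, n in counts.items()}
    return result

def overall_proportion(table, category):
    """Proportion of all items (both groups pooled) in one response category."""
    in_category = sum(counts[category] for counts in table.values())
    total = sum(sum(counts.values()) for counts in table.values())
    return in_category / total

cond = conditional_proportions(table)
print(cond["organic"]["present"])            # share of organic items with residues
print(cond["conventional"]["present"])       # share of conventional items with residues
print(overall_proportion(table, "present"))  # share of all sampled items with residues
```

With these made-up counts the two conditional proportions differ, which is what an association between food type and pesticide status looks like in a contingency table.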

9
Learning Objective 4: Calculate proportions and
conditional proportions
Use side-by-side bar charts to show conditional
proportions. This allows for easy comparison of the
explanatory variable groups with respect to the
response variable.
10
Learning Objective 4: Calculate proportions and
conditional proportions
• If there were no association between food type
(organic or conventional) and the presence of
pesticide residues, then the proportions for the
response variable categories would be the same
for each food type

11
Chapter 3 Association: Contingency, Correlation,
and Regression
• Section 3.2
• How Can We Explore the Association between Two
Quantitative Variables?

12
Learning Objectives
• Constructing scatterplots
• Interpreting a scatterplot
• Correlation
• Calculating correlation

13
Learning Objective 1: Scatterplot
• Graphical display of relationship between two
quantitative variables
• Horizontal axis: explanatory variable, x
• Vertical axis: response variable, y

14
Learning Objective 1: Internet Usage and Gross
Domestic Product (GDP) Data Set
15
Learning Objective 1: Internet Usage and Gross
Domestic Product (GDP)
• Enter values of explanatory variable (x) in L1
• Enter values of response variable (y) in L2
• STAT PLOT
• Plot 1: on
• Type: scatter plot
• X list: L1
• Y list: L2
• ZOOM
• 9: ZoomStat
• Graph

16
Learning Objective 1: Baseball Average and Team
Scoring
17
Learning Objective 1: Baseball Average and Team
Scoring
• Enter values of explanatory variable (x) in L1
• Enter values of response variable (y) in L2
• STAT PLOT
• Plot 1: on
• Type: scatter plot
• X list: L1
• Y list: L2
• ZOOM
• 9: ZoomStat
• Graph

Note: to keep the data from the prior example in L1
and L2, use L3 for x and L4 for y here (and set
X list and Y list to match). You will use the data
from the prior example again later on in the
PowerPoint.
18
Learning Objective 2: Interpreting Scatterplots
• You can describe the overall pattern of a
scatterplot by the trend, direction, and strength
of the relationship between the two variables
• Trend: linear, curved, clusters, no pattern
• Direction: positive, negative, no direction
• Strength: how closely the points fit the trend
• Also look for outliers from the overall trend

19
Learning Objective 2: Interpreting Scatterplots,
Direction/Association
• Two quantitative variables x and y are
• Positively associated when
• High values of x tend to occur with high values
of y
• Low values of x tend to occur with low values of
y
• Negatively associated when high values of one
variable tend to pair with low values of the
other variable

20
Learning Objective 2: Example: 100 cars on the
lot of a used-car dealership
• Would you expect a positive association, a
negative association, or no association between
the age of the car and the mileage on the
odometer?
• Positive association
• Negative association
• No association

21
Learning Objective 2: Example: Did the Butterfly
Ballot Cost Al Gore the 2000 Presidential
Election?
22
Learning Objective 3: Linear Correlation, r
• Measures the strength and direction of the linear
association between x and y
• A positive r value indicates a positive
association
• A negative r value indicates a negative
association
• An r value close to 1 or -1 indicates a strong
linear association
• An r value close to 0 indicates a weak
linear association

23
Learning Objective 3: Correlation Coefficient:
Measuring Strength and Direction of a Linear
Relationship
24
Learning Objective 3: Properties of Correlation
• Always falls between -1 and 1
• Sign of correlation denotes direction
• (-) indicates negative linear association
• (+) indicates positive linear association
• Correlation is a unitless measure; it does not
depend on the variables' units
• Two variables have the same correlation no matter
which is treated as the response variable
• Correlation is not resistant to outliers
• Correlation only measures strength of linear
relationship
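
A minimal sketch of these properties, assuming the usual sample formula for r (the sum of products of z-scores divided by n - 1). The data are made up for illustration and the helper name `correlation` is not from the slides.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

# Made-up illustrative data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = correlation(x, y)
r_swapped = correlation(y, x)                        # roles of x and y exchanged
r_rescaled = correlation([100 * xi for xi in x], y)  # x measured in different units
r_outlier = correlation(x + [7.0], y + [-20.0])      # one extreme point added

print(r, r_swapped, r_rescaled, r_outlier)
```

Swapping the variables or changing units leaves r unchanged, while the single added outlier changes it substantially, which illustrates why correlation is described as not resistant to outliers.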

25
Learning Objective 4: Calculating the Correlation
Coefficient
Per Capita Gross Domestic Product and Average
Life Expectancy for Countries in Western Europe
26
Learning Objective 4: Calculating the Correlation
Coefficient
27
Learning Objective 4: Internet Usage and Gross
Domestic Product (GDP)
• Choose 8: LinReg(a+bx)
• 1st list: x variable
• 2nd list: y variable
• Enter

Correlation: r = 0.889
28
Learning Objective 4: Baseball Average and Team
Scoring
• Enter x data into L1
• Enter y data into L2
• STAT CALC menu
• Choose 8: LinReg(a+bx)
• 1st list: x variable
• 2nd list: y variable
• Enter

Correlation: r = 0.874
29
Learning Objective 4: Cereal Sodium and Sugar
30
Chapter 3 Association: Contingency, Correlation,
and Regression
• Section 3.3
• How Can We Predict the Outcome of a Variable?

31
Learning Objectives
• Definition of a regression line
• Use a regression equation for prediction
• Interpret the slope and y-intercept of a
regression line
• Identify the least-squares regression line as the
one that minimizes the sum of squared residuals
• Calculate the least-squares regression line

32
Learning Objectives
• Compare roles of explanatory and response
variables in correlation and regression
• Calculate $r^2$ and interpret it

33
Learning Objective 1: Regression Analysis
• The first step of a regression analysis is to
identify the response and explanatory variables
• We use y to denote the response variable
• We use x to denote the explanatory variable

34
Learning Objective 1: Regression Line
• A regression line is a straight line that
describes how the response variable (y) changes
as the explanatory variable (x) changes
• A regression line predicts the value of the
response variable (y) for a given level of the
explanatory variable (x)
• The y-intercept of the regression line is denoted
by a
• The slope of the regression line is denoted by b

35
Learning Objective 2: Example: How Can
Anthropologists Predict Height Using Human
Remains?
• Regression Equation
• $\hat{y}$ is the predicted height and $x$ is the
length of a femur (thighbone), measured in
centimeters
• Use the regression equation to predict the height
of a person whose femur length was 50 centimeters
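
A hedged sketch of the prediction step. The slope of 2.4 cm per cm of femur length is taken from the slope example later in this deck; the fitted equation itself is missing from this transcript, so the intercept of 61.4 below is an assumption used only to make the sketch runnable. The names `INTERCEPT_A`, `SLOPE_B`, and `predict_height` are illustrative.

```python
# Slope: 2.4 cm of predicted height per 1 cm of femur length (from the slides).
# Intercept: NOT present in this transcript; 61.4 is an assumed value.
INTERCEPT_A = 61.4  # assumption
SLOPE_B = 2.4       # from the slope example in the deck

def predict_height(femur_cm):
    """Predicted height (cm) from femur length (cm): y-hat = a + b * x."""
    return INTERCEPT_A + SLOPE_B * femur_cm

predicted = predict_height(50)
print(round(predicted, 1))
```

Under the assumed intercept, a 50 cm femur gives a predicted height of 61.4 + 2.4(50) = 181.4 cm; a different intercept would shift every prediction by the same amount.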

36
Learning Objective 3: Interpreting the y-Intercept
• y-Intercept
• The predicted value for y when $x = 0$
• Helps in plotting the line
• May not have any interpretative value if no
observations had x values near 0

37
Learning Objective 3: Interpreting the Slope
• Slope measures the change in the predicted
value of the response variable (y) for a 1 unit
increase in the explanatory variable (x)
• Example: A 1 cm increase in femur length results
in a 2.4 cm increase in predicted height

38
Learning Objective 3: Slope Values: Positive,
Negative, Equal to 0
39
Learning Objective 3: Regression Line
• At a given value of x, the equation
$\hat{y} = a + bx$
• Predicts a single value of the response variable
• But we should not expect all subjects at that
value of x to have the same value of y
• Variability occurs in the y values!

40
Learning Objective 3: The Regression Line
• The regression line connects the estimated means
of y at the various x values
• In summary, $\hat{y} = a + bx$
• Describes the relationship between x and the
estimated means of y at the various values of x

41
Learning Objective 4: Residuals
• Measure the size of the prediction errors, the
vertical distance between the point and the
regression line
• Each observation has a residual
• Calculation for each residual: $y - \hat{y}$
• A large residual indicates an unusual
observation
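
A short sketch of the residual calculation, observed y minus predicted y. The data, the line (a = 0, b = 2), and the helper name `residuals` are made up for illustration.

```python
def residuals(x, y, a, b):
    """Residual for each observation: observed y minus predicted y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up data and a made-up line, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.0, 15.0]
a, b = 0.0, 2.0

res = residuals(x, y, a, b)

# Index of the observation with the largest residual in absolute value.
largest = max(range(len(res)), key=lambda i: abs(res[i]))
print(res)
print("most unusual observation:", (x[largest], y[largest]))
```

The last point sits far above the line, so its residual is much larger than the others and it is flagged as the unusual observation.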

42
Learning Objective 4: Least Squares Method
Yields the Regression Line
• Residual sum of squares:
$\sum (\text{residual})^2 = \sum (y - \hat{y})^2$
• The least squares regression line is the line
that minimizes the vertical distance between the
points and their predictions, i.e., it minimizes
the residual sum of squares
• Note: the sum of the residuals about the
regression line will always be zero
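
The two claims above can be checked numerically. This sketch fits the line with the textbook closed form (slope = Sxy / Sxx, which is algebraically equivalent to the usual slope formula based on r), then compares its residual sum of squares against a few nearby lines. Data and function names are illustrative assumptions.

```python
from statistics import mean

def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def rss(x, y, a, b):
    """Residual sum of squares for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.9, 6.1, 8.3, 9.5]

a, b = least_squares(x, y)
best = rss(x, y, a, b)
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(x, y))

# Lines with a shifted intercept and/or slope should all have a larger RSS.
nearby = [rss(x, y, a + da, b + db) for da in (-0.5, 0.5) for db in (-0.2, 0.2)]
print(best, min(nearby), residual_sum)
```

The residuals sum to zero only up to floating-point rounding, so the check should use a small tolerance rather than exact equality.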

43
Learning Objective 5: Regression Formulas for
y-Intercept and Slope
• Slope: $b = r \left( \dfrac{s_y}{s_x} \right)$
• Y-Intercept: $a = \bar{y} - b\bar{x}$

Regression line always passes through $(\bar{x}, \bar{y})$
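
A minimal sketch of the standard formulas for this slide, $b = r(s_y/s_x)$ and $a = \bar{y} - b\bar{x}$, followed by a check that the fitted line passes through $(\bar{x}, \bar{y})$. The data and the helper names `correlation` and `regression_line` are made up for illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

def regression_line(x, y):
    """Return (a, b) using b = r * (s_y / s_x) and a = y-bar - b * x-bar."""
    b = correlation(x, y) * stdev(y) / stdev(x)
    a = mean(y) - b * mean(x)
    return a, b

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.9, 6.1, 8.3, 9.5]

a, b = regression_line(x, y)
at_mean = a + b * mean(x)  # prediction at x-bar
print(a, b, at_mean, mean(y))
```

Because a is defined as $\bar{y} - b\bar{x}$, the prediction at $\bar{x}$ equals $\bar{y}$ by construction, which is the point the slide makes.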
44
Learning Objective 5: Calculating the slope and
y-intercept for the regression line
Slope: 26.4
y-intercept: -2.28
45
Learning Objective 5: Internet Usage and Gross
Domestic Product (GDP)
46
Learning Objective 5: Internet Usage and Gross
Domestic Product (GDP)
• Enter x data into L1
• Enter y data into L2
• Choose 8: LinReg(a+bx)
• 1st list: x variable
• 2nd list: y variable
• Enter

$\hat{y} = 1.548x - 3.63$
47
Learning Objective 5: Baseball Average and Team
Scoring
48
Learning Objective 5: Baseball Average and Team
Scoring
• Enter x data into L1
• Enter y data into L2
• STAT CALC
• Choose 8: LinReg(a+bx)
• 1st list: x variable
• 2nd list: y variable
• Enter

49
Learning Objective 5: Cereal Sodium and Sugar
50
Learning Objective 6: The Slope and the
Correlation
• Correlation
• Describes the strength of the linear association
between 2 variables
• Does not change when the units of measurement
change
• Does not depend upon which variable is the
response and which is the explanatory

51
Learning Objective 6: The Slope and the
Correlation
• Slope
• Numerical value depends on the units used to
measure the variables
• Does not tell us whether the association is
strong or weak
• The two variables must be identified as response
and explanatory variables
• The regression equation can be used to predict
values of the response variable for given values
of the explanatory variable

52
Learning Objective 7: The Squared Correlation
• When a strong linear association exists, the
regression equation predictions tend to be much
better than the predictions using only $\bar{y}$
• We measure the proportional reduction in error
and call it $r^2$

53
Learning Objective 7: The Squared Correlation
• $r^2$ measures the proportion of the variation in
the y-values that is accounted for by the linear
relationship of y with x
• A correlation of 0.9 means that
• 81% of the variation in the y-values can be
explained by the explanatory variable, x
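
A sketch of $r^2$ as a proportional reduction in error: compare the squared prediction error when every y is predicted by the mean of y with the error when y is predicted by the least-squares line. The data are made up for illustration, and all variable names are assumptions.

```python
from math import sqrt
from statistics import mean

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.5, 5.1, 8.9, 9.2, 12.4]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx              # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept
r = sxy / sqrt(sxx * syy)  # correlation

# Error when every y is predicted by the mean of y.
error_using_mean = syy
# Error when y is predicted by the regression line.
error_using_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

reduction = (error_using_mean - error_using_line) / error_using_mean
print(reduction, r ** 2)
```

The two printed values agree up to rounding, which is why the proportional reduction in error is called $r^2$; for r = 0.9 the same arithmetic gives 0.9 × 0.9 = 0.81, or 81%.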

54
Chapter 3 Association: Contingency, Correlation,
and Regression
• Section 3.4
• What Are Some Cautions in Analyzing Association?

55
Learning Objectives
• Extrapolation
• Outliers and Influential Observations
• Correlation does not imply causation
• Lurking variables and confounding

56
Learning Objective 1: Extrapolation
• Extrapolation: Using a regression line to predict
y-values for x-values outside the observed range
of the data
• Riskier the farther we move from the range of the
given x-values
• There is no guarantee that the relationship given
by the regression equation holds outside the
range of sampled x-values

57
Learning Objective 2: Outliers and Influential
Points
• Construct a scatterplot
• Search for data points that are well outside of
the trend that the remainder of the data points
follow

58
Learning Objective 2: Outliers and Influential
Points
• A regression outlier is an observation that lies
far away from the trend that the rest of the data
follows
• An observation is influential if both of the
following hold
• Its x value is relatively low or high compared to
the remainder of the data
• The observation is a regression outlier
• Influential observations tend to pull the
regression line toward that data point and away
from the rest of the data

59
Learning Objective 2: Outliers and Influential
Points
• Impact of removing an influential data point
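
The slide's figure is not in this transcript, so here is a hedged numeric sketch of the same idea: fit the least-squares line with and without a single point whose x value is far from the rest and which falls far from the trend. All data and names are made up for illustration.

```python
from statistics import mean

def least_squares(points):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in points)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sxy / sxx
    return my - b * mx, b

# Made-up data following a roughly linear upward trend.
data = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8), (5, 6.1), (6, 6.9)]
# Hypothetical influential point: extreme x value AND far from the trend.
influential = (15, 3.0)

a_with, b_with = least_squares(data + [influential])
a_without, b_without = least_squares(data)

print("slope with the point:   ", round(b_with, 3))
print("slope without the point:", round(b_without, 3))
```

With these made-up numbers, one point is enough to flatten the fitted slope almost to zero; removing it restores a slope close to 1.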

60
Learning Objective 3: Correlation does not Imply
Causation
• A strong correlation between x and y means that
there is a strong linear association between the
two variables
• A strong correlation between x and y does not
mean that x causes y

61
Learning Objective 3: Association does not imply
causation
Data are available for all fires in Chicago last
year on x = number of firefighters at the fires
and y = cost of damages due to fire
• Would you expect the correlation to be negative,
zero, or positive?
• If the correlation is positive, does this mean
that having more firefighters at a fire causes
the damages to be worse? Yes or No
• Identify a third variable that could be
considered a common cause of x and y
• Distance from the fire station
• Intensity of the fire
• Size of the fire

62
Learning Objective 4: Lurking Variables and
Confounding
• A lurking variable is a variable, usually
unobserved, that influences the association
between the variables of primary interest
• Ice cream sales and drowning; lurking variable:
temperature
• Reading level and shoe size; lurking variable:
age
• Childhood obesity rate and GDP; lurking variable:
time
• When two explanatory variables are both
associated with a response variable but are also
associated with each other, there is said to be
confounding
• Lurking variables are not measured in the study
but have the potential for confounding

63
• Simpson's Paradox occurs when the direction of an
association between two variables changes after
we include a third variable and analyze the data
at separate levels of that variable

64
Is Smoking Actually Beneficial to Your Health?
Probability of Death of Smoker:
139/582 = 24%
Probability of Death of Nonsmoker:
230/732 = 31%
This can't be true that smoking improves your
chances of living! What's going on?
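
A hedged sketch of how a third variable can reverse this comparison. It takes the slide's figures as 139 deaths among 582 smokers and 230 deaths among 732 nonsmokers. The age-group breakdown below is an assumption (it is not in this transcript); the counts were chosen so that they add up to those totals.

```python
# (deaths, total) per age group. The age-group split is an ASSUMPTION for
# illustration; only the overall totals match the figures on the slide.
groups = {
    "18-34": {"smoker": (5, 179),  "nonsmoker": (6, 219)},
    "35-54": {"smoker": (41, 239), "nonsmoker": (19, 199)},
    "55-64": {"smoker": (51, 115), "nonsmoker": (40, 121)},
    "65+":   {"smoker": (42, 49),  "nonsmoker": (165, 193)},
}

def overall(status):
    """Pooled (deaths, total) across all age groups for one smoking status."""
    deaths = sum(g[status][0] for g in groups.values())
    total = sum(g[status][1] for g in groups.values())
    return deaths, total

def rate(deaths, total):
    """Proportion who died."""
    return deaths / total

smoker_total = overall("smoker")
nonsmoker_total = overall("nonsmoker")
print("overall:", round(rate(*smoker_total), 3), round(rate(*nonsmoker_total), 3))

for age, g in groups.items():
    print(age, round(rate(*g["smoker"]), 3), round(rate(*g["nonsmoker"]), 3))
```

In this breakdown smokers have the higher death rate within every age group, yet the lower rate overall, because the nonsmokers are concentrated in the oldest group. Age acts as the lurking variable, and the reversal is an instance of Simpson's paradox.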
65