Chapter 3 Association: Contingency, Correlation, and Regression

Slides: 67
Provided by: daniel544

1
Chapter 3 Association: Contingency, Correlation,
and Regression
  • Section 3.1
  • How Can We Explore the Association between Two
    Categorical Variables?

2
Learning Objectives
  • Identify variable type: Response or Explanatory
  • Define Association
  • Contingency tables
  • Calculate proportions and conditional proportions

3
Learning Objective 1: Response and Explanatory
Variables
  • Response variable (dependent variable): the
    outcome variable on which comparisons are made
  • Explanatory variable (independent variable):
    defines the groups to be compared with respect to
    values on the response variable
  • Examples (Response / Explanatory)
  • Blood alcohol level / Number of beers consumed
  • Grade on test / Amount of study time
  • Yield of corn per bushel / Amount of rainfall

4
Learning Objective 2: Association
  • The main purpose of data analysis with two
    variables is to investigate whether there is an
    association and to describe that association
  • An association exists between two variables if a
    particular value for one variable is more likely
    to occur with certain values of the other
    variable

5
Learning Objective 3: Contingency Table
  • A contingency table:
  • Displays two categorical variables
  • The rows list the categories of one variable
  • The columns list the categories of the other
    variable
  • Entries in the table are frequencies

6
Learning Objective 3: Contingency Table
What is the response variable? What is the
explanatory variable?
7
Learning Objective 4: Calculate proportions and
conditional proportions
8
Learning Objective 4: Calculate proportions and
conditional proportions
  • What proportion of organic foods contain
    pesticides?
  • What proportion of conventionally grown foods
    contain pesticides?
  • What proportion of all sampled items contain
    pesticide residues?

9
Learning Objective 4: Calculate proportions and
conditional proportions
Use side-by-side bar charts to show conditional
proportions. They allow easy comparison of the
categories of the explanatory variable with
respect to the response variable.
10
Learning Objective 4: Calculate proportions and
conditional proportions
  • If there were no association between food type
    (organic or conventional) and pesticide status,
    then the proportions for the response variable
    categories would be the same for each food type
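
The conditional and overall proportions asked about on the previous slides can be sketched in a few lines of Python. The counts below are hypothetical, chosen only for illustration (the slide's table is not preserved in this transcript), and the function names are likewise assumptions.

```python
# Hypothetical counts for illustration only; not the slide's data.
# Rows: food type (explanatory). Columns: pesticide status (response).
table = {
    "organic":      {"present": 30,  "absent": 70},
    "conventional": {"present": 750, "absent": 250},
}

def conditional_proportions(table):
    """Proportion of each response category within each explanatory group."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {cat: n / total for cat, n in counts.items()}
    return result

def marginal_proportion(table, category):
    """Proportion of all items, ignoring group, that fall in `category`."""
    in_category = sum(counts[category] for counts in table.values())
    grand_total = sum(sum(counts.values()) for counts in table.values())
    return in_category / grand_total

cond = conditional_proportions(table)
overall_present = marginal_proportion(table, "present")

for group, props in cond.items():
    print(f"{group}: {props['present']:.2f} present, {props['absent']:.2f} absent")
print(f"all items: {overall_present:.2f} present")
```

With these made-up counts the conditional proportions differ between the two food types (0.30 versus 0.75), which is what an association looks like; equal conditional proportions would indicate no association.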

11
Chapter 3 Association: Contingency, Correlation,
and Regression
  • Section 3.2
  • How Can We Explore the Association between Two
    Quantitative Variables?

12
Learning Objectives
  • Constructing scatterplots
  • Interpreting a scatterplot
  • Correlation
  • Calculating correlation

13
Learning Objective 1: Scatterplot
  • Graphical display of the relationship between two
    quantitative variables
  • Horizontal axis: explanatory variable, x
  • Vertical axis: response variable, y

14
Learning Objective 1: Internet Usage and Gross
Domestic Product (GDP) Data Set
15
Learning Objective 1: Internet Usage and Gross
Domestic Product (GDP)
  • Enter values of the explanatory variable (x) in
    L1
  • Enter values of the response variable (y) in L2
  • STAT PLOT
  • Plot 1: On
  • Type: scatter plot
  • Xlist: L1
  • Ylist: L2
  • ZOOM
  • 9: ZoomStat
  • Graph

16
Learning Objective 1: Baseball Average and Team
Scoring
17
Learning Objective 1: Baseball Average and Team
Scoring
  • Enter values of the explanatory variable (x) in
    L1
  • Enter values of the response variable (y) in L2
  • STAT PLOT
  • Plot 1: On
  • Type: scatter plot
  • Xlist: L1
  • Ylist: L2
  • ZOOM
  • 9: ZoomStat
  • Graph

For this example, use L3 for x and L4 for y in
place of L1 and L2. You will use the data from the
prior example again later in the PowerPoint.
18
Learning Objective 2: Interpreting Scatterplots
  • You can describe the overall pattern of a
    scatterplot by the trend, direction, and strength
    of the relationship between the two variables
  • Trend: linear, curved, clusters, no pattern
  • Direction: positive, negative, no direction
  • Strength: how closely the points fit the trend
  • Also look for outliers from the overall trend

19
Learning Objective 2: Interpreting Scatterplots:
Direction/Association
  • Two quantitative variables x and y are:
  • Positively associated when:
  • High values of x tend to occur with high values
    of y
  • Low values of x tend to occur with low values of
    y
  • Negatively associated when high values of one
    variable tend to pair with low values of the
    other variable

20
Learning Objective 2: Example: 100 cars on the
lot of a used-car dealership
  • Would you expect a positive association, a
    negative association, or no association between
    the age of the car and the mileage on the
    odometer?
  • Positive association
  • Negative association
  • No association

21
Learning Objective 2: Example: Did the Butterfly
Ballot Cost Al Gore the 2000 Presidential
Election?
22
Learning Objective 3: Linear Correlation, r
  • Measures the strength and direction of the linear
    association between x and y
  • A positive r value indicates a positive
    association
  • A negative r value indicates a negative
    association
  • An r value close to 1 or -1 indicates a strong
    linear association
  • An r value close to 0 indicates a weak linear
    association

23
Learning Objective 3: Correlation Coefficient:
Measuring Strength & Direction of a Linear
Relationship
24
Learning Objective 3: Properties of Correlation
  • Always falls between -1 and 1
  • Sign of correlation denotes direction
  • (-) indicates negative linear association
  • (+) indicates positive linear association
  • Correlation is unitless: it does not depend on
    the variables' units
  • Two variables have the same correlation no matter
    which is treated as the response variable
  • Correlation is not resistant to outliers
  • Correlation only measures strength of linear
    relationship
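
Several of these properties can be checked numerically. The sketch below assumes one standard formula for the sample correlation, r = (1/(n-1)) Σ z_x z_y (the slide's own formula is not preserved in this transcript), and uses hypothetical data; `correlation` is an illustrative helper, not a calculator or library function.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: average product of z-scores, with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_swapped = correlation(y, x)                        # swap response/explanatory
r_rescaled = correlation([xi * 100 for xi in x], y)  # change the units of x
r_outlier = correlation(x, y[:-1] + [-20.0])         # replace one point with an outlier

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, rescaled = {r_rescaled:.3f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```

Swapping the variables or rescaling x leaves r unchanged, while a single outlying point changes it sharply, which is what "not resistant to outliers" means.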

25
Learning Objective 4: Calculating the Correlation
Coefficient
Per Capita Gross Domestic Product and Average
Life Expectancy for Countries in Western Europe
26
Learning Objective 4: Calculating the Correlation
Coefficient
27
Learning Objective 4: Internet Usage and Gross
Domestic Product (GDP)
  • STAT CALC menu
  • Choose 8: LinReg(a+bx)
  • 1st number: x variable
  • 2nd number: y variable
  • Enter

Correlation: r = 0.889
28
Learning Objective 4: Baseball Average and Team
Scoring
  • Enter x data into L1
  • Enter y data into L2
  • STAT CALC menu
  • Choose 8: LinReg(a+bx)
  • 1st number: x variable
  • 2nd number: y variable
  • Enter

Correlation: r = 0.874
29
Learning Objective 4: Cereal Sodium and Sugar
30
Chapter 3 Association: Contingency, Correlation,
and Regression
  • Section 3.3
  • How Can We Predict the Outcome of a Variable?

31
Learning Objectives
  • Definition of a regression line
  • Use a regression equation for prediction
  • Interpret the slope and y-intercept of a
    regression line
  • Identify the least-squares regression line as the
    one that minimizes the sum of squared residuals
  • Calculate the least-squares regression line

32
Learning Objectives
  • Compare roles of explanatory and response
    variables in correlation and regression
  • Calculate and interpret r²

33
Learning Objective 1: Regression Analysis
  • The first step of a regression analysis is to
    identify the response and explanatory variables
  • We use y to denote the response variable
  • We use x to denote the explanatory variable

34
Learning Objective 1: Regression Line
  • A regression line is a straight line that
    describes how the response variable (y) changes
    as the explanatory variable (x) changes
  • A regression line predicts the value of the
    response variable (y) for a given level of the
    explanatory variable (x)
  • The y-intercept of the regression line is denoted
    by a
  • The slope of the regression line is denoted by b

35
Learning Objective 2: Example: How Can
Anthropologists Predict Height Using Human
Remains?
  • Regression equation (not preserved in this
    transcript)
  • ŷ is the predicted height and x is the
    length of a femur (thighbone), measured in
    centimeters
  • Use the regression equation to predict the height
    of a person whose femur length was 50 centimeters

36
Learning Objective 3: Interpreting the y-Intercept
  • y-Intercept
  • The predicted value for y when x = 0
  • Helps in plotting the line
  • May not have any interpretative value if no
    observations had x values near 0

37
Learning Objective 3: Interpreting the Slope
  • Slope measures the change in the predicted
    value of the response variable (y) for a 1-unit
    increase in the explanatory variable (x)
  • Example: A 1 cm increase in femur length
    corresponds to a 2.4 cm increase in predicted
    height
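
As a minimal sketch of how a regression equation is used for prediction and how the slope is read, the code below applies ŷ = a + bx to the femur example. The slope 2.4 comes from the slide; the intercept 61.4 is an assumption, because the transcript does not preserve the equation itself, and `predict_height` is an illustrative name.

```python
# Slope (cm of predicted height per cm of femur length) is from the slide.
SLOPE = 2.4
# ASSUMPTION: the intercept is not preserved in the transcript; 61.4 is used
# here only so that the example runs end to end.
INTERCEPT = 61.4

def predict_height(femur_cm):
    """Predicted height in cm from femur length in cm: y-hat = a + b*x."""
    return INTERCEPT + SLOPE * femur_cm

height_50 = predict_height(50)
change_per_cm = predict_height(51) - predict_height(50)

print(f"predicted height for a 50 cm femur: {height_50:.1f} cm")
print(f"change in prediction per extra cm of femur: {change_per_cm:.1f} cm")
```

Whatever the intercept is, the difference between predictions at x + 1 and x equals the slope, which is why the slope can be interpreted without it.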

38
Learning Objective 3: Slope Values: Positive,
Negative, Equal to 0
39
Learning Objective 3: Regression Line
  • At a given value of x, the equation ŷ = a + bx
  • Predicts a single value of the response variable
  • But we should not expect all subjects at that
    value of x to have the same value of y
  • Variability occurs in the y values!

40
Learning Objective 3: The Regression Line
  • The regression line connects the estimated means
    of y at the various x values
  • In summary, ŷ = a + bx
  • Describes the relationship between x and the
    estimated means of y at the various values of x

41
Learning Objective 4: Residuals
  • Measure the size of the prediction errors, the
    vertical distance between the point and the
    regression line
  • Each observation has a residual
  • Calculation for each residual: y - ŷ
  • A large residual indicates an unusual
    observation

42
Learning Objective 4: Least Squares Method
Yields the Regression Line
  • Residual sum of squares: Σ(residual)² = Σ(y - ŷ)²
  • The least squares regression line is the line
    that minimizes the squared vertical distances
    between the points and their predictions, i.e.,
    it minimizes the residual sum of squares
  • Note: the sum of the residuals about the
    regression line will always be zero
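
The zero-sum property follows in one line. The derivation below assumes the standard least-squares intercept, a = ȳ - b x̄ (the slide's formula image is not preserved in this transcript), with n observations (x_i, y_i).

```latex
\sum_{i=1}^{n} (y_i - \hat{y}_i)
  = \sum_{i=1}^{n} (y_i - a - b x_i)
  = n\bar{y} - n a - n b \bar{x}
  = n\,(\bar{y} - a - b\bar{x})
  = 0
  \quad \text{since } a = \bar{y} - b\bar{x}.
```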

43
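Learning Objective 5: Regression Formulas for
y-Intercept and Slope
  • Slope: b = r(s_y / s_x), where s_x and s_y are
    the sample standard deviations of x and y
  • y-Intercept: a = ȳ - b x̄

The regression line always passes through (x̄, ȳ)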
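
A short sketch of the least-squares calculation, assuming the standard formulas b = r(s_y/s_x) and a = ȳ - b x̄. The data are hypothetical and the helper names (`correlation`, `least_squares`, `rss`) are illustrative only.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation via z-scores, divisor n - 1."""
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (len(x) - 1)

def least_squares(x, y):
    """Return (a, b) for y-hat = a + b*x."""
    b = correlation(x, y) * stdev(y) / stdev(x)  # slope
    a = mean(y) - b * mean(x)                    # y-intercept
    return a, b

def rss(x, y, a, b):
    """Residual sum of squares for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"RSS at the fitted line = {rss(x, y, a, b):.2f}")
print(f"RSS with a nudged slope = {rss(x, y, a, b + 0.1):.2f}")
```

For these data the fitted line is ŷ = 2.20 + 0.60x; it passes through (x̄, ȳ) = (3, 4), its residuals sum to zero, and nudging either coefficient increases the residual sum of squares.
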
44
Learning Objective 5: Calculating the slope and
y-intercept for the regression line
Slope = 26.4
y-intercept = -2.28
45
Learning Objective 5: Internet Usage and Gross
Domestic Product (GDP)
46
Learning Objective 5: Internet Usage and Gross
Domestic Product
  • Enter x data into L1
  • Enter y data into L2
  • STAT CALC menu
  • Choose 8: LinReg(a+bx)
  • 1st number: x variable
  • 2nd number: y variable
  • Enter

ŷ = 1.548x - 3.63
47
Learning Objective 5: Baseball Average and Team
Scoring
48
Learning Objective 5: Baseball Average and Team
Scoring
  • Enter x data into L1
  • Enter y data into L2
  • STAT CALC menu
  • Choose 8: LinReg(a+bx)
  • 1st number: x variable
  • 2nd number: y variable
  • Enter

49
Learning Objective 5: Cereal Sodium and Sugar
50
Learning Objective 6: The Slope and the
Correlation
  • Correlation
  • Describes the strength of the linear association
    between 2 variables
  • Does not change when the units of measurement
    change
  • Does not depend upon which variable is the
    response and which is the explanatory

51
Learning Objective 6: The Slope and the
Correlation
  • Slope
  • Numerical value depends on the units used to
    measure the variables
  • Does not tell us whether the association is
    strong or weak
  • The two variables must be identified as response
    and explanatory variables
  • The regression equation can be used to predict
    values of the response variable for given values
    of the explanatory variable

52
Learning Objective 7: The Squared Correlation
  • When a strong linear association exists, the
    regression equation predictions tend to be much
    better than the predictions using only ȳ, the
    mean of y
  • We measure the proportional reduction in error
    and call it r²

53
Learning Objective 7: The Squared Correlation
  • r² measures the proportion of the variation in
    the y-values that is accounted for by the linear
    relationship of y with x
  • A correlation of 0.9 means that r² = 0.81, so
  • 81% of the variation in the y-values can be
    explained by the explanatory variable, x
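
The "proportional reduction in error" reading of r² can be checked directly: compare the squared prediction errors when using only the mean of y with the errors when using the regression line. The data are hypothetical, and `fit_line` is an illustrative helper that uses the equivalent least-squares form b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)².

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares (a, b) for y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(x, y)
y_bar = mean(y)

# Squared error when every prediction is simply the mean of y.
error_using_mean = sum((yi - y_bar) ** 2 for yi in y)
# Squared error when predictions come from the regression line.
error_using_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = (error_using_mean - error_using_line) / error_using_mean
r = sqrt(r_squared) if b >= 0 else -sqrt(r_squared)  # r takes the sign of the slope

print(f"error using mean = {error_using_mean:.2f}")
print(f"error using line = {error_using_line:.2f}")
print(f"r^2 = {r_squared:.2f}, r = {r:.3f}")
```

Here the line reduces the squared error from 6.0 to 2.4, a 60% reduction, so r² = 0.60.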

54
Chapter 3 Association: Contingency, Correlation,
and Regression
  • Section 3.4
  • What Are Some Cautions in Analyzing Association?

55
Learning Objectives
  • Extrapolation
  • Outliers and Influential Observations
  • Correlation does not imply causation
  • Lurking variables and confounding
  • Simpson's Paradox

56
Learning Objective 1: Extrapolation
  • Extrapolation: using a regression line to predict
    y-values for x-values outside the observed range
    of the data
  • Riskier the farther we move from the range of the
    given x-values
  • There is no guarantee that the relationship given
    by the regression equation holds outside the
    range of sampled x-values

57
Learning Objective 2: Outliers and Influential
Points
  • Construct a scatterplot
  • Search for data points that are well outside of
    the trend that the remainder of the data points
    follow

58
Learning Objective 2: Outliers and Influential
Points
  • A regression outlier is an observation that lies
    far away from the trend that the rest of the data
    follows
  • An observation is influential if both:
  • Its x value is relatively low or high compared to
    the remainder of the data, and
  • The observation is a regression outlier
  • Influential observations tend to pull the
    regression line toward that data point and away
    from the rest of the data

59
Learning Objective 2: Outliers and Influential
Points
  • Impact of removing an influential data point
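
The effect can be illustrated with a small sketch: fit the least-squares line with and without a single point that has an extreme x value and lies far from the trend. The data are hypothetical (the slide's plot is not preserved in this transcript), and `slope_and_intercept` is an illustrative helper.

```python
from statistics import mean

def slope_and_intercept(x, y):
    """Least-squares (a, b) for y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical data following an upward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# One extra point: x far above the rest AND far below the trend.
x_with = x + [15.0]
y_with = y + [2.0]

a_without, b_without = slope_and_intercept(x, y)
a_with, b_with = slope_and_intercept(x_with, y_with)

print(f"without the point: y-hat = {a_without:.2f} + {b_without:.2f}x")
print(f"with the point:    y-hat = {a_with:.2f} + {b_with:.2f}x")
```

In this made-up example the single point is enough to flip the slope from positive to negative, so removing it changes the fitted line substantially.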

60
Learning Objective 3: Correlation Does Not Imply
Causation
  • A strong correlation between x and y means that
    there is a strong linear association between
    the two variables
  • A strong correlation between x and y does not
    mean that x causes y

61
Learning Objective 3: Association Does Not Imply
Causation
Data are available for all fires in Chicago last
year on x = number of firefighters at the fire
and y = cost of damages due to the fire
  • Would you expect the correlation to be negative,
    zero, or positive?
  • If the correlation is positive, does this mean
    that having more firefighters at a fire causes
    the damages to be worse? Yes or No
  • Identify a third variable that could be
    considered a common cause of x and y
  • Distance from the fire station
  • Intensity of the fire
  • Size of the fire

62
Learning Objective 4: Lurking Variables &
Confounding
  • A lurking variable is a variable, usually
    unobserved, that influences the association
    between the variables of primary interest
  • Ice cream sales and drowning: lurking variable
    is temperature
  • Reading level and shoe size: lurking variable
    is age
  • Childhood obesity rate and GDP: lurking variable
    is time
  • When two explanatory variables are both
    associated with a response variable but are also
    associated with each other, there is said to be
    confounding
  • Lurking variables are not measured in the study
    but have the potential for confounding

63
Learning Objective 5: Simpson's Paradox
  • Simpson's Paradox
  • When the direction of an association between two
    variables changes after we include a third
    variable and analyze the data at separate levels
    of that variable

64
Learning Objective 5: Simpson's Paradox Example
Is Smoking Actually Beneficial to Your Health?
Probability of death of smoker: 139/582 = 24%
Probability of death of nonsmoker: 230/732 = 31%
It can't be true that smoking improves your
chances of living! What's going on?
65
Learning Objective 5: Simpson's Paradox Example
Break out Data by Age
66
Learning Objective 5: Simpson's Paradox Example
  • An association can look quite different after
    adjusting for the effect of a third variable by
    grouping the data according to the values of the
    third variable
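
The reversal can be reproduced with a short sketch. The overall counts (139 deaths among 582 smokers, 230 among 732 nonsmokers) follow the figures on the earlier slide; the split by age group is an assumption chosen to add up to those totals, since the slide's age table is not preserved in this transcript.

```python
# (deaths, group size) by age group.
# ASSUMPTION: the age-group split is illustrative; only the overall totals
# (139/582 smokers, 230/732 nonsmokers) come from the slides.
smokers = {"18-34": (5, 179), "35-54": (41, 239),
           "55-64": (51, 115), "65+": (42, 49)}
nonsmokers = {"18-34": (6, 219), "35-54": (19, 199),
              "55-64": (40, 121), "65+": (165, 193)}

def overall_rate(groups):
    """Death rate ignoring age: total deaths / total people."""
    deaths = sum(d for d, _ in groups.values())
    people = sum(n for _, n in groups.values())
    return deaths / people

def rates_by_group(groups):
    """Death rate within each age group."""
    return {age: d / n for age, (d, n) in groups.items()}

smoker_rates = rates_by_group(smokers)
nonsmoker_rates = rates_by_group(nonsmokers)

print(f"overall: smokers {overall_rate(smokers):.0%}, "
      f"nonsmokers {overall_rate(nonsmokers):.0%}")
for age in smokers:
    print(f"{age}: smokers {smoker_rates[age]:.1%}, "
          f"nonsmokers {nonsmoker_rates[age]:.1%}")
```

Overall, smokers appear to have the lower death rate, yet within every age group their rate is the higher one. In this sketch the nonsmokers are concentrated in the oldest group, where death rates are highest for everyone, so age acts as the third variable behind the reversal.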