Chapter 2: Looking at Data Relationships. Section 9.1: Data Analysis for Two-Way Tables

Provided by: hint9
1
Chapter 2: Looking at Data Relationships
Section 9.1: Data Analysis for Two-Way Tables
2
Relationships Between 2 Variables
  • More than one variable can be measured on each
    individual.
  • Examples
  • Gender and Height
  • Eye color and Major
  • Size and Cost
  • We want to look at the relationship between these
    variables.

3
Relationships Between 2 Variables
  • A response variable measures an outcome of a
    study. An explanatory variable explains or causes
    changes in the response variables.
  • The explanatory variable influences the response
    variable.
  • Examples
  • Class attendance and grade in STAT 303
  • Gender and Height
  • Smoking and lung cancer

4
Relationships Between 2 Variables
  • We may be interested in relationships of
    different types of variables.
  • Categorical and Numeric
  • E.g. Gender and Height
  • Categorical and Categorical
  • E.g. Eye color and Major
  • Numeric and Numeric
  • E.g. Size and Cost

5
1. Relationships Between Categorical and Numeric
Variables
  • We are interested in comparing the numerical
    variable across each of the levels of the
    categorical variable.
  • Examples
  • Compare high speeds for 4 different car brands
  • Compare prices for no-airbag, one-airbag and
    two-airbag cars
  • Compare GPR for 20 different majors

6
1. Relationships Between Categorical and Numeric
Variables
  • We could look at summary statistics for each
    group.
  • Example: prices for no-airbag, one-airbag, and
    two-airbag cars.
  • Explanatory: airbag
  • Response: price
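A per-group summary of this kind can be sketched in Python. The prices below are hypothetical placeholders (the slide's data are not in the transcript), and `summarize` is an illustrative helper name, not something from the course materials.

```python
from statistics import mean, median, stdev

# Hypothetical prices (in $1000s), grouped by number of airbags.
prices_by_airbags = {
    "none": [9.8, 11.2, 12.5, 10.1],
    "one": [14.9, 16.3, 18.0, 15.2],
    "two": [19.5, 22.4, 25.1, 21.0],
}


def summarize(values):
    """Return basic summary statistics for one group of numeric values."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# One summary per level of the categorical (explanatory) variable.
summaries = {group: summarize(vals) for group, vals in prices_by_airbags.items()}
for group, stats in summaries.items():
    print(group, {name: round(value, 2) for name, value in stats.items()})
```

Comparing the rows side by side plays the same role as the side-by-side boxplots: if the centers and spreads differ noticeably across groups, the two variables appear to be associated.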

7
1. Relationships Between Categorical and Numeric
Variables
  • Graphical Comparison

Side by Side Boxplots
8
1. Relationships Between Categorical and Numeric
Variables
  • Associations: A categorical variable and a numeric
    variable are associated if the distribution of the
    numeric variable is not the same for all populations.
    (The populations are defined by the values the
    categorical explanatory variable takes on.)
  • Example of no association

9
2. Relationships Between Two Categorical Variables
  • Depending on the situation, one of the variables
    is the explanatory variable and the other is the
    response variable.
  • Examples
  • Gender and Tomato Preference
  • Country of Origin and Marital Status
  • Gender and Highest Degree Obtained
  • Compare percentages in each level of one
    categorical variable across the levels of the
    other categorical variable

10
Two-Way Tables
  • A two-way (contingency) table can summarize the
    data for relationships between two categorical
    variables.
  • Example: Response: Tomatoes; Explanatory:
    Gender

11
2. Relationships Between Two Categorical Variables
  • Associations: Two categorical variables are
    associated if the relative frequencies in the
    response variable are not the same for all
    populations. (The populations are defined by the
    values the categorical explanatory variable takes
    on.)
  • Percentages for the joint, marginal, and
    conditional distributions
  • Joint Distribution: How likely are you to like
    tomatoes and be a male? Ans: 13/38
  • Marginal Distribution: What is the percentage of
    people who like tomatoes? Ans: 21/38
  • Conditional Distribution: If you are a female,
    how likely are you to like tomatoes? Ans: 8/19
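The three kinds of proportion can be computed from a two-way table as in the sketch below. The table itself is missing from the transcript, so the cell counts here are an assumption: they are the only counts consistent with the three answers quoted above (13/38, 21/38, 8/19).

```python
# Two-way table of counts: rows = gender (explanatory), columns = tomato preference (response).
# Counts are inferred from the quoted answers, not copied from the original slide.
table = {
    "male": {"like": 13, "dislike": 6},
    "female": {"like": 8, "dislike": 11},
}

total = sum(sum(row.values()) for row in table.values())

# Joint distribution: one cell divided by the grand total.
joint_male_like = table["male"]["like"] / total

# Marginal distribution: one column total divided by the grand total.
marginal_like = sum(row["like"] for row in table.values()) / total

# Conditional distribution: one cell divided by its row total.
conditional_like = {
    gender: row["like"] / sum(row.values()) for gender, row in table.items()
}

print(f"P(male and like) = {joint_male_like:.3f}")
print(f"P(like)          = {marginal_like:.3f}")
for gender, p in conditional_like.items():
    print(f"P(like | {gender}) = {p:.3f}")
```

Under these counts the conditional proportions differ between males and females, which is what the definition of association above asks you to check.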

12
2. Relationships Between Two Categorical Variables
  • Example 9.10
  • Example 9.9

Eg 9.9: Hospital A loses 3%, B loses 2%. Choose B.
Eg 9.10: Good condition: A loses 1%, B loses 1.3%.
Poor condition: A loses 3.8%, B loses 4%. Choose A.
13
2. Relationships Between Two Categorical Variables
  • Lurking Variable: A variable that is not among
    the explanatory or response variables in a study
    and yet may influence the interpretation of
    relationships among those variables.
  • E.g. Good/Poor Condition.
  • Simpson's Paradox: An association or comparison
    that holds for all of several groups can reverse
    direction when the data are combined to form a
    single group. This reversal is called Simpson's
    Paradox. This can happen when a lurking variable
    is present.
  • E.g. We chose Hospital B in 9.9, but chose
    Hospital A in 9.10.
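The reversal in the hospital example can be reproduced with a short Python sketch. The patient counts below are illustrative assumptions chosen to give the loss rates quoted on the earlier slide; the transcript itself does not include the underlying counts.

```python
# (deaths, patients) for each hospital, split by patient condition.
# Illustrative counts chosen to match the quoted rates; not taken from the transcript.
data = {
    "A": {"good": (6, 600), "poor": (57, 1500)},
    "B": {"good": (8, 600), "poor": (8, 200)},
}


def death_rate(deaths, patients):
    """Percentage of patients lost."""
    return 100 * deaths / patients


overall = {}
by_condition = {}
for hospital, groups in data.items():
    total_deaths = sum(deaths for deaths, _ in groups.values())
    total_patients = sum(patients for _, patients in groups.values())
    overall[hospital] = death_rate(total_deaths, total_patients)
    by_condition[hospital] = {
        condition: death_rate(deaths, patients)
        for condition, (deaths, patients) in groups.items()
    }
    print(hospital, "overall:", round(overall[hospital], 1),
          {c: round(r, 1) for c, r in by_condition[hospital].items()})
```

Hospital A looks worse overall but better within each condition group, because under these counts it treats many more patients in poor condition. Patient condition is the lurking variable.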

14
3. Relationships Between Two Numeric Variables
  • Depending on the situation, one of the variables
    is the explanatory variable and the other is the
    response variable.
  • Examples
  • Height and Weight
  • Income and Age
  • Time and Growth
  • Amount of time spent studying for STAT303 and
    exam scores

15
3. Relationships Between Two Numeric Variables
  • Example
  • Response: MPG
  • Explanatory: Weight

This is called a scatter plot. Each individual in
the data appears as one point in the plot.
Response Variable (y-axis)
Explanatory Variable (x-axis)
16
3. Relationships Between Two Numeric Variables
  • Example
  • Response: MPG
  • Explanatory: Weight

17
3. Relationships Between Two Numeric Variables
  • Example
  • Response: Horsepower
  • Explanatory: Weight

18
3. Relationships Between Two Numeric Variables
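  • Correlation, or r, measures the direction and
    strength of the linear relationship between two
    numeric variables.
  • If X represents the explanatory and Y represents
    the response, the correlation is calculated as
    $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$,
    where $\bar{x}$, $\bar{y}$ are the sample means and
    $s_x$, $s_y$ are the sample standard deviations.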
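A minimal Python sketch of one standard form of this calculation: standardize each x and each y, multiply the pairs, add them up, and divide by n - 1. The weight and MPG values are hypothetical, and `correlation` is an illustrative function name.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r as the scaled sum of products of standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


# Hypothetical data: car weight in pounds (explanatory) and MPG (response).
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [38, 33, 27, 24, 19]

r = correlation(weights, mpg)
print(f"r = {r:.4f}")  # negative: heavier cars tend to have lower MPG
```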

19
3. Relationships Between Two Numeric Variables
  • General Properties of Correlation
  • It must be between -1 and 1, inclusive (-1 ≤ r ≤ 1).
  • If r is negative, the relationship is negative.
  • If r is -1, there is a perfect negative linear
    relationship.
  • If r is positive, the relationship is positive.
  • If r is 1, there is a perfect positive linear
    relationship.
  • If r is 0, there is no linear relationship.
  • If explanatory and response are switched, r
    remains the same.
  • r has no units of measurement associated with it
  • Scale changes do not affect r
  • r measures ONLY linear relationships.

20
3. Relationships Between Two Numeric Variables
r = 1
r = 0
r = -1
21
3. Relationships Between Two Numeric Variables
r = 0.0489
r = 0.04
r = 0.4306
r = -0.8428
22
3. Relationships Between Two Numeric Variables
  • It is possible for there to be a strong
    relationship between two variables and still have
    r = 0.
  • EX.
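One way to see this numerically (a sketch with made-up data, since the slide's figure is not in the transcript): take y = x² on x values that are symmetric about zero. Here y is completely determined by x, yet the linear correlation is zero.

```python
from statistics import mean

# x values symmetric about zero; y is an exact (non-linear) function of x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Equivalent to the usual sample correlation; the (n - 1) factors cancel.
r = s_xy / (s_xx * s_yy) ** 0.5
print(f"r = {r:.4f}")
```

The positive and negative halves of the parabola cancel in the sum of products, so r picks up no linear trend even though the relationship is perfect.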

23
3. Relationships Between Two Numeric Variables
  • Important notes
  • Association/correlation does not imply causation.
  • Slope is not correlation.
  • A scale change does not affect correlation, but
    it does affect slope.
  • For correlation, it doesn't matter which is x
    and which is y.
  • But for slope, it does matter.
  • Correlation doesn't measure the strength of a
    non-linear relationship.
  • r = 0.46
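The difference between slope and correlation can be checked with a small Python sketch. The data are hypothetical, and `corr_and_slope` is an illustrative helper: it converts weight from pounds to kilograms (a scale change) and also swaps the roles of x and y.

```python
from statistics import mean


def corr_and_slope(xs, ys):
    """Return (r, b): the correlation and the least-squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5, s_xy / s_xx


# Hypothetical data: weight in pounds and horsepower.
weight_lb = [2000, 2500, 3000, 3500, 4000]
horsepower = [70, 95, 105, 135, 150]
weight_kg = [w * 0.4536 for w in weight_lb]  # same cars, different units

r_lb, b_lb = corr_and_slope(weight_lb, horsepower)
r_kg, b_kg = corr_and_slope(weight_kg, horsepower)
r_swapped, b_swapped = corr_and_slope(horsepower, weight_lb)

print(f"pounds:    r = {r_lb:.4f}, slope = {b_lb:.4f}")
print(f"kilograms: r = {r_kg:.4f}, slope = {b_kg:.4f}")
print(f"swapped:   r = {r_swapped:.4f}, slope = {b_swapped:.4f}")
```

In all three cases r is the same, while the slope changes with the units and with the choice of explanatory variable.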

24
3. Relationships Between Two Numeric Variables
  • Association does not imply causation
  • For the world's nations,
  • Variable X: number of TV sets per person
  • Variable Y: average life expectancy
  • There is a high positive correlation: nations with
    more TV sets have higher life expectancies.
  • Is there causation? Can we lengthen the lives of
    people in poor nations by shipping them TV sets?
    No!

25
3. Relationships Between Two Numeric Variables
  • Regression Line: a straight line that describes
    how a response variable Y changes as an
    explanatory variable X changes.
  • General form
  • y = a + bx, where a is the intercept and b is the
    slope.
  • Least-Squares Regression: best fit
  • Associations: Two numerical variables are
    associated if the distribution of the response is
    not the same for each value of the explanatory
    variable.
  • We often use a regression line to predict the
    value of y for a given value of x.
  • Regression, unlike correlation, requires that we
    have an explanatory variable and a response
    variable.

26
Regression Line
  • Fitting a line to data means drawing a line that
    comes as close as possible to the points.
  • Extrapolation: the use of a regression line for
    prediction far outside the range of values of the
    explanatory variable x that you used to obtain
    the line.
  • Such predictions are often NOT accurate.

27
3. Relationships Between Two Numeric Variables
Horsepower = -10.78 + 0.04 × weight (Equation of
the line.)
Intercept: y-value or response (horsepower) when
the line crosses the y-axis.
Slope: increase in response for a unit increase
in the explanatory variable.
So if weight increases by one pound, horsepower
increases by 0.04 units (on average).
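As a quick sketch, the fitted line can be used for prediction in Python. The coefficients are the rounded values from the slide; the function name and the example weights are assumptions made for illustration.

```python
INTERCEPT = -10.78  # predicted horsepower at weight 0 (from the slide, rounded)
SLOPE = 0.04        # change in predicted horsepower per extra pound (from the slide)


def predict_horsepower(weight_lb):
    """Predicted horsepower for a car of the given weight, using the slide's line."""
    return INTERCEPT + SLOPE * weight_lb


# A weight assumed to lie inside the range of the original data.
print(round(predict_horsepower(3000), 2))

# One extra pound raises the prediction by the slope.
print(round(predict_horsepower(3001) - predict_horsepower(3000), 2))

# Extrapolation far outside the data: a weightless car with negative horsepower.
print(predict_horsepower(0))
```

The last call shows why the intercept often has no practical meaning and why extrapolation is risky: the line is only trustworthy near the weights that were used to fit it.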
28
Least-Squares Regression Line
  • The least-squares regression line of y on x is
    the line that makes the sum of squares of the
    vertical distances of the data points from the
    line as small as possible.
  • These vertical distances are called the
    residuals, or the errors in prediction, because
    they measure how far each point is from the line:
  • residual $= y - \hat{y}$, where y is the observed
    value and $\hat{y}$ is the predicted value.

29
Least-Squares Regression Line
  • The equation of the least-squares regression line
    of y on x is $\hat{y} = a + bx$, with slope
    $b = r \, s_y / s_x$ and intercept
    $a = \bar{y} - b\bar{x}$.

30
Least-Squares Regression Line
  • The expression for slope, b, says that along the
    regression line, a change of one standard
    deviation in x corresponds to a change of r
    standard deviations in y.
  • The slope, b, is the amount by which the predicted
    y changes when x increases by one unit.
  • The intercept, a, is the predicted value of y when
    x = 0.
  • The least-squares regression line ALWAYS passes
    through the point $(\bar{x}, \bar{y})$.
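These properties can be checked numerically with the standard least-squares formulas, slope b = r·s_y/s_x and intercept a = ȳ − b·x̄. The sketch below uses made-up data, and `least_squares` is an illustrative name.

```python
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope: r standard deviations of y per standard deviation of x
    a = y_bar - b * x_bar  # intercept: forces the line through (x-bar, y-bar)
    return a, b, r


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed minus predicted
passes_through_mean = abs((a + b * mean(xs)) - mean(ys)) < 1e-9

print(f"y-hat = {a:.2f} + {b:.2f} x, r = {r:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
print("passes through (x-bar, y-bar):", passes_through_mean)
```

The residuals of a least-squares line sum to zero (up to rounding), which is another way of saying that the line passes through the point of means.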

31
r^2 in Regression
  • The square of the correlation, r^2, is the
    fraction of the variation in the values of y that
    is explained by the least-squares regression of y
    on x.
  • Use r^2 as a measure of how successfully the
    regression explains the response.
  • Interpret r^2 as the percent of variance
    explained.
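A short Python sketch (with made-up data) shows why the squared correlation can be read this way: it equals 1 minus the ratio of the leftover variation around the line to the total variation in y.

```python
from statistics import mean

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

# Least-squares line and correlation.
b = s_xy / s_xx
a = y_bar - b * x_bar
r = s_xy / (s_xx * s_yy) ** 0.5

# Variation left unexplained: sum of squared residuals.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - sse / s_yy

print(f"r^2 from correlation: {r ** 2:.4f}")
print(f"1 - SSE/SST:          {r_squared_from_fit:.4f}")
```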

32
Relationships between 2 numeric variables
  • Example
  • How much of the variation is explained by the
    least-squares line of y on x? ______
  • What is the correlation coefficient? ______

Horsepower = -10.78 + 0.04 × weight (Equation of
the line.)
__________: y-value or response (horsepower) when
the line crosses the y-axis.
_______: increase in response for a unit increase
in the explanatory variable.
So if weight increases by one pound, horsepower
increases by 0.04 units (on average).
33
3. Relationships Between Two Numeric Variables
  • What is the effect of an outlier on correlation?
  • Adding a point that is not near the line and is
    far from the other points (an outlier in the y
    direction)

r = 0.53. Note: This point does not greatly
affect the estimated regression equation.
34
3. Relationships Between Two Numeric Variables
  • What is the effect of an outlier on correlation?
  • Adding a point that is near the line and is far
    from the other points (an outlier in the x
    direction)

r = 0.94. Note: This point greatly influences the
estimated regression equation (an influential
point).
35
  • How does the correlation change when adding a
    point to a data set?
  • Adding a point at the mean doesn't change
    anything (not even the intercept or slope).
  • The further a point is from the mean, the more
    the correlation changes.
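Both claims can be tried out on a small made-up data set. In the sketch below, `fit` is an illustrative helper that returns the intercept, slope, and correlation; the added points are assumptions chosen to show one point at the mean and one far from it.

```python
from statistics import mean


def fit(xs, ys):
    """Return (a, b, r): intercept, slope, and correlation for ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b, s_xy / (s_xx * s_yy) ** 0.5


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

before = fit(xs, ys)
# Add one point exactly at (x-bar, y-bar).
after_mean_point = fit(xs + [mean(xs)], ys + [mean(ys)])
# Add one point far from the mean and far from the line.
after_far_point = fit(xs + [15], ys + [0])

for label, (a, b, r) in [
    ("original", before),
    ("point at mean", after_mean_point),
    ("far point", after_far_point),
]:
    print(f"{label:14s} a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
```

With these numbers the point at the mean leaves all three values unchanged, while the single far point is enough to flip the sign of r.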

36
SUMMARY Descriptive Statistics
  • ONE POPULATION (Chapter 1)
  • Describing the distribution of a single variable
  • Categorical variable
  • Frequency table
  • Pie chart
  • Bar chart
  • Relative frequencies, Mode
  • Numeric variable
  • Measures of location (center: mean, median; also
    Q1, Q3, min, max)
  • Measures of spread (standard deviation, range,
    IQR)
  • Frequency table
  • Stemplot
  • Histogram
  • Boxplot
  • Normal quantile plot

37
SUMMARY Descriptive Statistics
  • COMPARING POPULATIONS (Chapter 2 and Section 9.1)
  • Looking for associations between two variables
  • Explanatory (independent) variable is
    categorical, Response (dependent) variable is
    numeric
  • Measures of location and spread by category
  • Side by side boxplot
  • Explanatory variable is categorical, Response
    variable is also categorical
  • Two-way table
  • Simpson's Paradox
  • Lurking variable
  • Explanatory variable is numeric, Response
    variable is also numeric
  • Scatter plot
  • Correlation, r