Contingency Tables - PowerPoint PPT Presentation

Provided by: AppliedMa7

1
Contingency Tables
  • Chapters Seven, Sixteen, and Eighteen
  • Chapter Seven
      • Definition of Contingency Tables
      • Basic Statistics
      • SPSS program (Crosstabulation)
  • Chapter Sixteen
      • Basic Probability Theory Concepts
      • Test of Hypothesis of Independence

2
Contingency Tables (continued)
  • Chapter Eighteen
      • Measures of Association
          • For nominal variables
          • For ordinal variables

3
Basic Empirical Situation
  • Unit of data.
  • Two nominal scales measured for each unit.
  • Example: an interview study recording the sex of the
    respondent and a variable such as whether or not the
    subject has a cellular telephone.
  • Objective is to compare males and females with
    respect to what fraction have cellular telephones.

4
Crosstabulation of Data
  • Prepare a data file for study.
  • One record per subject.
  • Three variables per record: subject ID, sex of
    subject, and indicator variable of whether
    subject has cellular telephone.
  • SPSS analysis: Statistics, Summarize, Crosstabs.
  • Basic information is the contingency table.
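
Outside SPSS, the same crosstabulation can be sketched in a few lines of Python. The records below are made up for illustration, and `crosstab` is a hypothetical helper name, not an SPSS or library function.

```python
from collections import Counter

# Hypothetical data file: one record per subject
# (subject ID, sex of subject, has cellular telephone).
records = [
    (1, "male", "yes"),
    (2, "female", "no"),
    (3, "male", "no"),
    (4, "female", "yes"),
    (5, "male", "yes"),
]

def crosstab(records):
    """Count subjects for each (phone, sex) combination."""
    return Counter((phone, sex) for _id, sex, phone in records)

table = crosstab(records)

# Print one row per phone category, one column per sex.
for phone in ("yes", "no"):
    print(phone, [table[(phone, sex)] for sex in ("male", "female")])
```

Each count in the printed rows is one cell of the contingency table.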

5
Two Common Situations
  • Hypothesized causal relation between variables.
  • No hypothesized causal relation.

6
Hypothesized Causal Relation
  • Classification of variables
      • Independent variable is the one hypothesized to be
        the cause. Example: sex of respondent.
      • Dependent variable is the one hypothesized to be
        the effect. Example: whether or not subject has
        cellular telephone.
  • Format convention
      • Columns correspond to categories of the independent
        variable.
      • Rows correspond to categories of the dependent
        variable.

7
Association Study
  • No hypothesized causal mechanism.
  • Example: whether or not subject is above median on
    verbal SAT and whether or not above median on
    quantitative SAT.
  • No convention about assigning variables to rows
    and columns.

8
Contingency Table
  • One column for each value of the column variable;
    C is the number of columns.
  • One row for each value of the row variable; R is
    the number of rows.
  • The result is an R x C contingency table.

9
Contingency Table
  • Each entry is the OBSERVED COUNT O(i,j) of the
    number of units having the (i,j) contingency.
  • Column of marginal totals.
  • Row of marginal totals.

10
Example Contingency Table (Hypothetical)

11
Example Contingency Table (Hypothetical)
  • Entry 60 in the upper left-hand corner means that
    there were 60 male respondents who owned a
    cellular telephone.
  • ASSUME marginal totals are known
  • THEN, knowing entry of 60 means that you can
    deduce all other entries.
  • This 2 x 2 table has one degree of freedom.
  • R x C table has (R-1)(C-1) degrees of freedom.
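
The one-degree-of-freedom claim can be checked with a short Python sketch. Only the entry of 60 comes from the slide. The marginal totals used here (110 owners, 90 non-owners, 100 respondents of each sex) are assumed for illustration, and `fill_2x2` is a hypothetical helper name.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def fill_2x2(upper_left, row_totals, col_totals):
    """Deduce all four cells of a 2 x 2 table from one cell and the margins."""
    a = upper_left
    b = row_totals[0] - a   # rest of the first row
    c = col_totals[0] - a   # rest of the first column
    d = row_totals[1] - c   # what remains in the second row
    return [[a, b], [c, d]]

# Assumed margins: rows = owns phone (yes, no); columns = sex (male, female).
table = fill_2x2(60, row_totals=(110, 90), col_totals=(100, 100))
print(table)                     # [[60, 50], [40, 50]]
print(degrees_of_freedom(2, 2))  # 1
```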

12
Row and Column Percentages
  • Natural to use percentages rather than raw
    counts.
  • Remember that you want to use these numbers for
    comparison purposes.
  • The term rate is often used to refer to a
    percentage or probability.
  • Can ask for column percentages, row percentages,
    or both.
  • Compute percentages within each category of the
    independent variable (usually the column), so that
    each column's percentages sum to 100.

13
Relation of Percentages to Probabilities
  • ASSUME that the column variable is the
    independent variable.
  • THEN the column percentages are estimates of the
    conditional probabilities given the setting of
    the independent variable.
  • The basic questions revolve around whether or not
    the conditional distributions are the same for
    all settings of the independent variable.
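
A minimal Python sketch of column percentages as estimates of conditional probabilities follows. Apart from the entry of 60, the counts are assumed for illustration, and `column_percentages` is a hypothetical helper name.

```python
def column_percentages(table):
    """Convert counts to percentages within each column."""
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[100.0 * table[i][j] / col_totals[j] for j in range(n_cols)]
            for i in range(n_rows)]

# Assumed counts; columns are the independent variable (male, female).
counts = [[60, 50],   # owns a cellular telephone
          [40, 50]]   # does not own one
pct = column_percentages(counts)
print(pct[0])  # [60.0, 50.0]
```

With these assumed counts, the estimated probability of owning a telephone is 0.60 given male and 0.50 given female. Comparing the columns in this way is the basic question of the slide.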

14
Bar Charts
  • Graphical means of presenting data.
  • SPSS analysis: Graphs, Bar chart.
  • Can use either count scale or percentage scale
    (prefer percentage scale).
  • Can have bars side by side or stacked.

15
Generalization of the R x C contingency table
  • Can use three or more variables to classify each
    subject. The additional variables are called layers.
  • In the example, can add whether respondent is a
    student in college or a student in high school.

16
Chapter Sixteen: Comparing Observed and Expected Counts
  • Basic hypothesis
  • Definitions of expected counts.
  • Chi-squared test of independence.

17
Basic Hypothesis
  • ASSUME column variable is the independent
    variable.
  • Hypothesis is independence.
  • That is, the conditional distribution in any
    column is the same as the conditional
    distribution in any other column.

18
Expected Count
  • Basic idea is proportional allocation of
    observations in a column based on column total.
  • Expected count in (i,j) contingency:
    E(i,j) = (total number in row i) x (total number in
    column j) / (total number in table).
  • Expected count need not be an integer. There is one
    expected count for each contingency.
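
The expected-count rule can be sketched in Python as follows. The observed counts are assumed for illustration, and `expected_counts` is a hypothetical helper name.

```python
def expected_counts(observed):
    """E(i,j) = (row i total) * (column j total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Assumed 2 x 2 table of observed counts.
observed = [[60, 50],
            [40, 50]]
expected = expected_counts(observed)
print(expected)  # [[55.0, 55.0], [45.0, 45.0]]
```

Within each row the expected counts are equal here because the two column totals are equal, which is what proportional allocation implies.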

19
Residual
  • Residual in (i,j) contingency = observed count in
    (i,j) contingency - expected count in (i,j)
    contingency.
  • That is, R(i,j) = O(i,j) - E(i,j).
  • One residual for each contingency.

20
Pearson Chi-squared Component
  • Chi-squared component for (i,j) contingency:
    C(i,j) = (residual in (i,j) contingency)^2 / expected
    count in (i,j) contingency.
  • That is, C(i,j) = R(i,j)^2 / E(i,j).
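
A minimal Python sketch of residuals and Pearson components, one per contingency, is given below. The observed counts are assumed for illustration, and `residuals_and_components` is a hypothetical helper name.

```python
def residuals_and_components(observed):
    """Return (residuals, components) with one entry per cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals, components = [], []
    for i, row in enumerate(observed):
        res_row, comp_row = [], []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count E(i,j)
            r = o - e                               # residual R(i,j)
            res_row.append(r)
            comp_row.append(r ** 2 / e)             # component C(i,j)
        residuals.append(res_row)
        components.append(comp_row)
    return residuals, components

# Assumed 2 x 2 table of observed counts.
residuals, components = residuals_and_components([[60, 50], [40, 50]])
print(residuals)   # [[5.0, -5.0], [-5.0, 5.0]]
print(components)
```

The residuals in each row and each column sum to zero, which is another way of seeing why a 2 x 2 table has only one degree of freedom.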

21
Assessing Pearson Component
  • Rough guides on whether the (i,j) contingency
    has an excessively large chi-squared component
    C(i,j), based on chi-squared with 1 degree of freedom:
      • A value of 3.84 has an observed significance
        level of about 0.05.
      • A value of 6.63 has about 0.01.
      • A value of 10.83 has about 0.001.
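
These rough guides can be checked in Python under the assumption that they refer to the chi-squared distribution with 1 degree of freedom. For that distribution the upper-tail probability equals erfc(sqrt(x/2)). The function name `p_value_1df` is hypothetical.

```python
import math

def p_value_1df(x):
    """Upper-tail probability of chi-squared with 1 degree of freedom.

    If Z is standard normal, P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

for x in (3.84, 6.63, 10.83):
    print(x, round(p_value_1df(x), 4))
```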

22
Pearson Chi-Squared Test
  • Sum C(i,j) over all contingencies.
  • Pearson chi-squared test has (R-1)(C-1) degrees
    of freedom.
  • Under the null hypothesis:
      • Expected value of chi-squared equals its degrees
        of freedom.
      • Variance is twice its degrees of freedom.
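
The full statistic for a general R x C table can be sketched as below. The 2 x 3 table is assumed for illustration, and `pearson_chi_squared` is a hypothetical helper name.

```python
def pearson_chi_squared(observed):
    """Return (statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e      # add component C(i,j)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Assumed 2 x 3 table of observed counts.
stat, df = pearson_chi_squared([[30, 20, 10],
                                [20, 30, 40]])
print(round(stat, 2), df)  # 16.67 2
```

A statistic of about 16.67 is far above 2, the value expected under the null hypothesis with 2 degrees of freedom.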

23
Special Case of 2 x 2 Contingency Table

24
Chi-squared test for a 2x2 table
  • 1 degree of freedom: (R-1)(C-1) = (2-1)(2-1) = 1.
  • Value of chi-squared test is given by
    N(AD-BC)^2 / [(A+B)(C+D)(A+C)(B+D)].
  • There is a correction for continuity.
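
A minimal Python sketch of the 2 x 2 shortcut follows. It assumes the cells are laid out with A and B in the first row and C and D in the second (the table itself is not reproduced in this transcript), and it assumes the continuity correction meant is the usual Yates form, which subtracts N/2 from |AD-BC| before squaring. The counts and the name `chi2_2x2` are hypothetical.

```python
def chi2_2x2(a, b, c, d, continuity=False):
    """Pearson chi-squared for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if continuity:
        # Yates correction, never letting the difference go below zero.
        diff = max(diff - n / 2.0, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

plain = chi2_2x2(60, 50, 40, 50)
corrected = chi2_2x2(60, 50, 40, 50, continuity=True)
print(round(plain, 3), round(corrected, 3))  # 2.02 1.636
```

The corrected value is always no larger than the uncorrected one, so the correction makes the test more conservative.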

25
Computer Output for Chi-Squared Tests
  • Output gives value of test.
  • Asymptotic significance level (p-value).
  • Four types of test
      • Pearson chi-squared
      • Pearson chi-squared with continuity correction
      • Likelihood ratio test (theoretically strong test)
      • Fisher's exact test (most accepted, if given)

26
Example Problem Set
  • The independent variable is whether or not the
    subject reported using marijuana at time 3 in a
    study (time 3 is roughly in later high school).
    The dependent variable is whether or not the
    subject reported using marijuana at time 4 in a
    study (time 4 is roughly in middle college or
    beginning independent living). The contingency
    table is on the next slide.

27
Marijuana Use at Time 4 by Marijuana Use at Time 3

28
Example Question 1
  • Which of the following conclusions is correct
    about the test of the null hypothesis that the
    distribution of whether or not a subject uses
    marijuana at time 3 is independent of whether the
    subject uses marijuana at time 4?
  • Usual options.

29
Solution to question 1
  • Find the significance level in the chi-square
    test output. Pearson chi-square (without and with
    continuity correction), likelihood ratio, and
    Fisher's exact test all had significance levels
    reported as 0.000, that is, below 0.0005.
  • Option A (reject at the 0.001 level of
    significance) is the correct choice.

30
Example Question 2
  • How many degrees of freedom does the contingency
    table describing this output have?
  • Solution: (R-1)(C-1) = (2-1)(2-1) = 1.

31
Example Question 3
  • Specify how the expected count of 97.8 for
    subjects who did use marijuana at time 3 and
    time 4 was calculated.
  • Solution
      • Total number using at time 3 was 151.
      • Total number using at time 4 was 237.
      • Total N was 366.
      • Expected count = 151 x 237 / 366 = 97.8.

32
Example Question 4
  • Compute the contribution to Pearson's chi-square
    statistic from the cell "used marijuana at time 3
    and used marijuana at time 4".
  • Solution
      • Observed count was 142.
      • Expected count was 97.8.
      • Component = (142 - 97.8)^2 / 97.8 = 19.98.
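
The arithmetic in Questions 3 and 4 can be checked with a short Python sketch that uses only the totals and counts quoted in the solutions. The variable names are illustrative.

```python
users_time3 = 151   # column total: used marijuana at time 3
users_time4 = 237   # row total: used marijuana at time 4
n = 366             # total number of subjects
observed = 142      # used at both time 3 and time 4

# Expected count = row total * column total / table total.
expected = users_time4 * users_time3 / n
print(round(expected, 1))  # 97.8

# Component computed from the rounded expected count, as on the slide.
expected_rounded = round(expected, 1)
component_rounded = (observed - expected_rounded) ** 2 / expected_rounded

# Component computed without rounding the expected count first.
component_full = (observed - expected) ** 2 / expected
print(round(component_rounded, 2), round(component_full, 2))
```

With the expected count rounded to 97.8 the component is about 19.98. Carrying full precision gives about 20.0, so the hand calculation is accurate to within rounding.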

33
Example Question 5
  • Describe the pattern of association between these
    two variables.
  • Solution. There was a strong dependence between
    the two variables. About 44 percent of nonusers
    at time 3 used at time 4, compared to 94 percent
    of users at time 3. That is, nearly all users at
    time 3 were still users at time 4 and many nonusers
    had started, so marijuana usage increased over time.
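
The two percentages can be recovered from figures quoted in the earlier solutions. The table itself is not reproduced in this transcript, so the remaining counts below are derived from the totals (151, 237, 366) and the observed count of 142, not read from the original output.

```python
n = 366
users_t3 = 151   # used at time 3
users_t4 = 237   # used at time 4
both = 142       # used at time 3 and at time 4

nonusers_t3 = n - users_t3            # 215 did not use at time 3
new_users_t4 = users_t4 - both        # 95 of them used at time 4

# Column percentages: rate of use at time 4 within each time 3 group.
pct_users = 100.0 * both / users_t3
pct_nonusers = 100.0 * new_users_t4 / nonusers_t3
print(round(pct_users), round(pct_nonusers))  # 94 44
```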

34
Review
  • Basic introduction to contingency tables.
  • Study Chapter 18 for next lecture.