Title: Contingency Tables
1Contingency Tables
- Chapters Seven, Sixteen, and Eighteen
- Chapter Seven
- Definition of Contingency Tables
- Basic Statistics
- SPSS program (Crosstabulation)
- Chapter Sixteen
- Basic Probability Theory Concepts
- Test of Hypothesis of Independence
2Contingency Tables (continued)
- Chapter Eighteen
- Measures of Association
- For nominal variables
- For ordinal variables
3Basic Empirical Situation
- Unit of data.
- Two nominal scales measured for each unit.
- Example: an interview study recording the sex of the respondent and a
variable such as whether or not the subject has a
cellular telephone.
- Objective is to compare males and females with
respect to what fraction have cellular telephones.
4Crosstabulation of Data
- Prepare a data file for study.
- One record per subject.
- Three variables per record: subject ID, sex of
subject, and indicator variable of whether
subject has cellular telephone.
- SPSS analysis
- Statistics, summarize, crosstabs
- Basic information is the contingency table.
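The slide uses SPSS for the crosstabulation. As a rough illustration of what that step does, here is a minimal Python sketch. The six records are invented for illustration, and the variable names (`records`, `table`) are assumptions, not part of the lecture.

```python
from collections import Counter

# Hypothetical records: (subject ID, sex of subject, has cellular telephone).
records = [
    (1, "male", "yes"),
    (2, "male", "no"),
    (3, "female", "yes"),
    (4, "female", "yes"),
    (5, "male", "yes"),
    (6, "female", "no"),
]

# Count how many subjects fall in each (phone, sex) combination.
counts = Counter((phone, sex) for _subject_id, sex, phone in records)

row_values = ["yes", "no"]        # row variable: has cellular telephone
col_values = ["male", "female"]   # column variable: sex of subject

# Build the contingency table as a list of rows.
table = [[counts[(r, c)] for c in col_values] for r in row_values]

for r, row in zip(row_values, table):
    print(r, row)
```

Each record contributes to exactly one cell, so the cell counts add up to the number of subjects.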
5Two Common Situations
- Hypothesized causal relation between variables.
- No hypothesized causal relation.
6Hypothesized Causal Relation
- Classification of variables
- Independent variable is the one hypothesized to be the
cause. Example: sex of respondent.
- Dependent variable is the one hypothesized to be the
effect. Example: whether or not subject has a
cellular telephone.
- Format convention
- Columns correspond to categories of independent variable
- Rows correspond to categories of dependent variable
7Association Study
- No hypothesized causal mechanism.
- Example: whether or not subject is above the median on verbal SAT
and whether or not above the median on quantitative
SAT.
- No convention about assigning variables to rows
and columns.
8Contingency Table
- One column for each value of the column variable;
C is the number of columns.
- One row for each value of the row variable; R is
the number of rows.
- R x C contingency table.
9Contingency Table
- Each entry is the OBSERVED COUNT O(i,j), the
number of units having the (i,j) contingency.
- Column of marginal totals.
- Row of marginal totals.
10Example Contingency Table (Hypothetical)
11Example Contingency Table (Hypothetical)
- Entry 60 in the upper left hand corner means that
there were 60 male respondents who owned a
cellular telephone.
- ASSUME marginal totals are known
- THEN, knowing entry of 60 means that you can
deduce all other entries.
- This 2 x 2 table has one degree of freedom.
- R x C table has (R-1)(C-1) degrees of freedom.
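The deduction step can be sketched in a few lines of Python. Only the entry 60 comes from the slide. The marginal totals used here are hypothetical, chosen purely so the arithmetic can be followed, and the function names are illustrative.

```python
def fill_2x2(top_left, row_totals, col_totals):
    """Recover all four cells of a 2 x 2 table from one cell and the margins."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


def degrees_of_freedom(n_rows, n_cols):
    """Number of cells that can vary freely once the margins are fixed."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical margins: row totals (100, 100) and column totals (110, 90).
table = fill_2x2(60, row_totals=(100, 100), col_totals=(110, 90))
print(table)
print(degrees_of_freedom(2, 2))
```

Once the margins are fixed, one cell determines the other three, which is what "one degree of freedom" means for a 2 x 2 table.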
12Row and Column Percentages
- Natural to use percentages rather than raw
counts.
- Remember that you want to use these numbers for
comparison purposes.
- The term rate is often used to refer to a
percentage or probability.
- Can ask for column percentages, row percentages,
or both.
- Compute percentages in the direction of the independent
variable (usually the column).
13Relation of Percentages to Probabilities
- ASSUME that the column variable is the
independent variable.
- THEN the column percentages are estimates of the
conditional probabilities given the setting of
the independent variable.
- The basic questions revolve around whether or not
the conditional distributions are the same for
all settings of the independent variable.
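A minimal sketch of column percentages, assuming the column variable is the independent one. The counts are hypothetical, and `column_percentages` is an illustrative helper rather than an SPSS feature.

```python
def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]


# Hypothetical counts: rows = has phone (yes, no), columns = sex (male, female).
observed = [[60, 40], [50, 50]]
pct = column_percentages(observed)

for row in pct:
    print([round(value, 1) for value in row])
```

Each column of `pct` estimates the conditional distribution of the row variable given that column's setting, so every column sums to 100.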
14Bar Charts
- Graphical means of presenting data.
- SPSS analysis
- Graphs, bar chart.
- Can use either count scale or percentage scale
(prefer percentage scale).
- Can have bars side by side or stacked.
15Generalization of the R x C contingency table
- Can have three or more variables to classify each
subject. These are called layers.
- In the example, can add whether respondent is a student
in college or a student in high school.
16Chapter Sixteen: Comparing Observed and Expected Counts
- Basic hypothesis
- Definitions of expected counts.
- Chi-squared test of independence.
17Basic Hypothesis
- ASSUME column variable is the independent
variable.
- Hypothesis is independence.
- That is, the conditional distribution in any
column is the same as the conditional
distribution in any other column.
18Expected Count
- Basic idea is proportional allocation of
observations in a column based on column total.
- Expected count in (i,j) contingency:
E(i,j) = (total number in column j) x (total number in row i) / (total number in table).
- Expected count need not be an integer.
- One expected count for each contingency.
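The expected-count rule can be sketched as follows. The observed table is hypothetical, and `expected_counts` is an illustrative name.

```python
def expected_counts(table):
    """E(i,j) = (row i total) x (column j total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical observed counts for a 2 x 2 table.
observed = [[60, 40], [50, 50]]
expected = expected_counts(observed)
print(expected)
```

The expected counts have the same marginal totals as the observed counts; only the allocation within each column changes.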
19Residual
- Residual in (i,j) contingency = observed count in
(i,j) contingency - expected count in (i,j)
contingency.
- That is, R(i,j) = O(i,j) - E(i,j)
- One residual for each contingency.
20Pearson Chi-squared Component
- Chi-squared component for (i,j) contingency:
C(i,j) = (residual in (i,j) contingency)^2 / (expected
count in (i,j) contingency).
- C(i,j) = (R(i,j))^2 / E(i,j)
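Residuals and chi-squared components follow directly from the observed and expected counts. This sketch uses a hypothetical 2 x 2 table, and the function names are illustrative.

```python
def residuals(observed, expected):
    """R(i,j) = O(i,j) - E(i,j) for every contingency."""
    return [
        [o - e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]


def chi_squared_components(observed, expected):
    """C(i,j) = R(i,j)^2 / E(i,j) for every contingency."""
    return [
        [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]


# Hypothetical observed counts and their expected counts
# (each expected count is row total x column total / table total).
observed = [[60, 40], [50, 50]]
expected = [[55.0, 45.0], [55.0, 45.0]]

res = residuals(observed, expected)
comp = chi_squared_components(observed, expected)
print(res)
print(comp)
```

The residuals in each row and each column sum to zero, because the expected counts share the observed marginal totals.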
21Assessing Pearson Component
- Rough guides on whether the (i,j) contingency
has an excessively large chi-squared component
C(i,j), treating it as chi-squared with 1 degree of freedom:
- The observed significance level of 3.84 is about
0.05.
- Of 6.63 is about 0.01.
- Of 10.83 is about 0.001.
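These rough guides can be checked numerically. The sketch below relies on a standard identity: a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. The function name `p_value_1df` is illustrative.

```python
import math


def p_value_1df(chi_squared):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(chi_squared / 2.0))


for value in (3.84, 6.63, 10.83):
    print(value, round(p_value_1df(value), 4))
```

This shortcut applies only to 1 degree of freedom; other degrees of freedom need a general chi-squared distribution function.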
22Pearson Chi-Squared Test
- Sum C(i,j) over all contingencies.
- Pearson chi-squared test has (R-1)(C-1) degrees
of freedom.
- Under null hypothesis
- Expected value of chi-square equals its degrees
of freedom.
- Variance is twice its degrees of freedom.
23Special Case of 2 x 2 Contingency Table
24Chi-squared test for a 2x2 table
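- 1 degree of freedom: (R-1)(C-1) = (2-1)(2-1) = 1
- Value of chi-squared test is given by
- N(AD-BC)^2 / [(A+B)(C+D)(A+C)(B+D)],
where A and B are the first-row counts, C and D are the
second-row counts, and N = A+B+C+D.
- There is a correction for continuity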
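As a check on the 2 x 2 shortcut, this sketch compares it with the general sum of chi-squared components. The cell counts are hypothetical, the function names are illustrative, and the continuity correction is not applied.

```python
def chi_squared_2x2(a, b, c, d):
    """Shortcut for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_squared_general(table):
    """Sum of (O - E)^2 / E over all cells of an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


# Hypothetical counts.
shortcut = chi_squared_2x2(60, 40, 50, 50)
general = chi_squared_general([[60, 40], [50, 50]])
print(shortcut, general)
```

The two values agree up to floating-point rounding, since the shortcut is an algebraic simplification of the general sum for the 2 x 2 case.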
25Computer Output for Chi-Squared Tests
- Output gives value of test.
- Asymptotic significance level (p-value)
- Four types of test
- Pearson chi-squared
- Pearson chi-squared with continuity correction
- Likelihood ratio test (theoretically strong test)
- Fisher's exact test (most accepted, if given).
26Example Problem Set
- The independent variable is whether or not the
subject reported using marijuana at time 3 in a
study (time 3 is roughly in later high school).
The dependent variable is whether or not the
subject reported using marijuana at time 4 in a
study (time 4 is roughly in middle college or
beginning independent living). The contingency
table is on the next slide.
27Marijuana Use at Time 4 by Marijuana Use at Time 3
28Example Question 1
- Which of the following conclusions is correct
about the test of the null hypothesis that the
distribution of whether or not a subject uses
marijuana at time 3 is independent of whether the
subject uses marijuana at time 4?
- Usual options.
29Solution to question 1
- Find the significance level in the chi-square
test output. Pearson chi-square (without and with
continuity correction), likelihood ratio, and
Fisher's exact test all had reported significance levels
of 0.000 (that is, less than 0.0005).
- Option A (reject at the 0.001 level of
significance) is the correct choice.
30Example Question 2
- How many degrees of freedom does the contingency
table describing this output have?
- Solution: (R-1)(C-1) = (2-1)(2-1) = 1.
31Example Question 3
- Specify how the expected count of 97.8 for
subjects who did use marijuana at time 3 and
time 4 was calculated.
- Solution
- Total number using at time 3 was 151.
- Total number using at time 4 was 237.
- Total N was 366.
- Expected count = 151 x 237 / 366 = 97.8.
32Example Question 4
- Compute the contribution to Pearson's chi-square
statistic from the cell "used marijuana at time 3
and used marijuana at time 4".
- Solution
- Observed count was 142
- Expected count was 97.8
- Component = (142 - 97.8)^2 / 97.8 = 19.98
33Example Question 5
- Describe the pattern of association between these
two variables.
- Solution. There was a strong dependence between
the two variables. About 44 percent of nonusers
at time 3 used at time 4, compared to 94 percent
of users at time 3. That is, nearly all users at
time 3 were still users at time 4 and many nonusers
had started, so usage increased over time.
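The worked solutions can be re-checked with a short script. The sketch assumes the 2 x 2 table can be rebuilt from the figures quoted in the solutions (151 users at time 3, 237 users at time 4, 366 subjects, 142 users at both times). The remaining three cells are deduced from those totals rather than read from the original output, and the variable names are illustrative.

```python
# Figures quoted in the solutions.
n_total = 366
users_t3 = 151
users_t4 = 237
both = 142

# Deduce the remaining cells from the marginal totals.
nonusers_t3 = n_total - users_t3          # nonusers at time 3
started = users_t4 - both                 # nonusers at time 3 who used at time 4
stopped = users_t3 - both                 # users at time 3 who did not use at time 4
neither = nonusers_t3 - started           # nonusers at both times

# Question 3: expected count for "used at time 3 and time 4".
expected_both = users_t3 * users_t4 / n_total

# Question 4: chi-squared component, using the unrounded expected count.
component_both = (both - expected_both) ** 2 / expected_both

# Question 5: column percentages (columns = use at time 3).
pct_users = 100.0 * both / users_t3
pct_nonusers = 100.0 * started / nonusers_t3

# Overall Pearson chi-squared via the 2 x 2 shortcut (no continuity correction).
a, b, c, d = both, started, stopped, neither
chi_squared = n_total * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

print(round(expected_both, 1), round(component_both, 2))
print(round(pct_users), round(pct_nonusers))
print(round(chi_squared, 1))
```

Carrying the unrounded expected count through gives a component of about 20.0, slightly above the hand calculation that uses the rounded value 97.8.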
34Review
- Basic introduction to contingency tables.
- Study Chapter 18 for next lecture.