Contingency Tables - PowerPoint PPT Presentation


About This Presentation

Title:

Contingency Tables

Description:

Contingency Tables Chapters Seven, Sixteen, and Eighteen Chapter Seven Definition of Contingency Tables Basic Statistics SPSS program (Crosstabulation) – PowerPoint PPT presentation


Slides: 35

Provided by: AppliedMa7

Learn more at: http://www.ams.sunysb.edu


Transcript and Presenter's Notes


1
Contingency Tables

Chapters Seven, Sixteen, and Eighteen
Chapter Seven
Definition of Contingency Tables
Basic Statistics
SPSS program (Crosstabulation)
Chapter Sixteen
Basic Probability Theory Concepts
Test of Hypothesis of Independence

2
Contingency Tables (continued)

Chapter Eighteen
Measures of Association
For nominal variables
For ordinal variables

3
Basic Empirical Situation

Unit of data.
Two nominal scales measured for each unit.
Example: an interview study recording the sex of the
respondent and a variable such as whether or not the
subject has a cellular telephone.
Objective is to compare males and females with
respect to what fraction have cellular telephones.

4
Crosstabulation of Data

Prepare a data file for the study.
One record per subject.
Three variables per record: subject ID, sex of
subject, and an indicator variable of whether the
subject has a cellular telephone.
SPSS analysis:
Statistics > Summarize > Crosstabs
Basic information is the contingency table.
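The crosstabulation step can be sketched outside SPSS. The Python below is a minimal illustration under assumed inputs, not how SPSS computes it. The six records and the helper name `crosstab` are hypothetical, and each record follows the layout above (subject ID, sex, cellular-telephone indicator).

```python
from collections import Counter

# Hypothetical data file: one record per subject (subject ID, sex, has cellular telephone)
records = [
    (1, "male", "yes"), (2, "female", "no"), (3, "female", "yes"),
    (4, "male", "no"), (5, "male", "yes"), (6, "female", "yes"),
]

def crosstab(records, row_index, col_index):
    """Count the units falling in each (row value, column value) contingency."""
    counts = Counter((rec[row_index], rec[col_index]) for rec in records)
    rows = sorted({rec[row_index] for rec in records})
    cols = sorted({rec[col_index] for rec in records})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Dependent variable (telephone) in rows, independent variable (sex) in columns
rows, cols, table = crosstab(records, row_index=2, col_index=1)
row_totals = [sum(row) for row in table]        # column of marginal totals
col_totals = [sum(col) for col in zip(*table)]  # row of marginal totals
print(cols)   # ['female', 'male']
print(rows)   # ['no', 'yes']
print(table)  # [[1, 1], [2, 2]]
```

Sorting the category labels fixes the row and column order here. A real analysis would more likely set that order explicitly.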

5
Two Common Situations

Hypothesized causal relation between variables.
No hypothesized causal relation.

6
Hypothesized Causal Relation

Classification of variables
Independent variable is the one hypothesized to be
the cause. Example: sex of respondent.
Dependent variable is the one hypothesized to be the
effect. Example: whether or not the subject has a
cellular telephone.
Format convention:
Columns correspond to categories of the independent variable.
Rows correspond to categories of the dependent variable.

7
Association Study

No hypothesized causal mechanism.
Example: whether or not the subject is above the
median on the verbal SAT and whether or not the
subject is above the median on the quantitative SAT.
No convention about assigning variables to rows
and columns.

8
Contingency Table

One column for each value of the column variable;
C is the number of columns.
One row for each value of the row variable; R is
the number of rows.
R x C contingency table.

9
Contingency Table

Each entry is the OBSERVED COUNT O(i,j), the number
of units falling in the (i,j) contingency (row i, column j).
Column of marginal totals.
Row of marginal totals.

10
Example Contingency Table (Hypothetical)
11
Example Contingency Table (Hypothetical)

Entry 60 in the upper left hand corner means that
there were 60 male respondents who owned a
cellular telephone.
ASSUME the marginal totals are known.
THEN knowing the entry of 60 lets you deduce
all other entries.
This 2 x 2 table therefore has one degree of freedom.
An R x C table has (R-1)(C-1) degrees of freedom.
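The degree-of-freedom argument can be checked directly. If the margins and one cell are fixed, every other cell follows by subtraction. In this sketch only the entry of 60 comes from the slide. The marginal totals (100 owners, 100 non-owners, 90 males, 110 females) are hypothetical, and the function names are illustrative.

```python
def complete_2x2(a, row_totals, col_totals):
    """Given the upper-left count and both sets of marginal totals,
    deduce the other three cells of a 2 x 2 table."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must sum to the same N")
    b = r1 - a  # rest of row 1
    c = c1 - a  # rest of column 1
    d = r2 - c  # rest of row 2 (also equals c2 - b)
    if min(b, c, d) < 0:
        raise ValueError("entry is inconsistent with the marginal totals")
    return [[a, b], [c, d]]

def degrees_of_freedom(n_rows, n_cols):
    """Number of cells free to vary once the margins are fixed: (R-1)(C-1)."""
    return (n_rows - 1) * (n_cols - 1)

# Hypothetical margins: rows = owns phone (yes, no), columns = sex (male, female)
table = complete_2x2(60, row_totals=(100, 100), col_totals=(90, 110))
print(table)                     # [[60, 40], [30, 70]]
print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 4))  # 6
```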

12
Row and Column Percentages

Natural to use percentages rather than raw
counts.
Remember that you want to use these numbers for
comparison purposes.
The term rate is often used to refer to a
percentage or probability.
Can ask for column percentages, row percentages,
or both.
Percentage in the direction of the independent
variable: compute percentages within each of its
categories (usually within each column).

13
Relation of Percentages to Probabilities

ASSUME that the column variable is the
independent variable.
THEN the column percentages are estimates of the
conditional probabilities given the setting of
the independent variable.
The basic questions revolve around whether or not
the conditional distributions are the same for
all settings of the independent variable.
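Here is a minimal sketch of column and row percentages. It assumes the independent variable (sex) is in the columns. The counts are hypothetical apart from the upper-left 60. Each column of percentages estimates the conditional distribution of telephone ownership given sex.

```python
def column_percentages(table):
    """Percentage each column separately, so every column sums to 100."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * count / col_totals[j] for j, count in enumerate(row)]
        for row in table
    ]

def row_percentages(table):
    """Percentage each row separately, so every row sums to 100."""
    return [[100.0 * count / sum(row) for count in row] for row in table]

# Hypothetical counts: rows = owns phone (yes, no); columns = sex (male, female)
table = [[60, 40], [30, 70]]
for row in column_percentages(table):
    print([round(p, 1) for p in row])
# [66.7, 36.4]
# [33.3, 63.6]
```

With these hypothetical numbers, the estimated conditional probability of owning a telephone is about 0.667 for males and 0.364 for females. The question of interest is whether such conditional distributions differ across columns.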

14
Bar Charts

Graphical means of presenting data.
SPSS analysis:
Graphs > Bar Chart
Can use either count scale or percentage scale
(prefer percentage scale).
Can have bars side by side or stacked.

15
Generalization of the R x C contingency table

Can have three or more variables to classify each
subject; the additional variables are called layers.
In the example, we could add whether the respondent
is a college student or a high school student.
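With a layer variable, the counts can be keyed on (layer, column value, row value) and split into one R x C table per layer. The records and category labels below are hypothetical, and `layered_crosstab` is an illustrative name.

```python
from collections import Counter

# Hypothetical records: (school level, sex, has cellular telephone)
records = [
    ("college", "male", "yes"), ("college", "female", "yes"),
    ("college", "female", "no"), ("high school", "male", "no"),
    ("high school", "female", "yes"), ("high school", "male", "yes"),
]

def layered_crosstab(records, rows=("yes", "no"), cols=("male", "female")):
    """Return {layer: R x C table}, one contingency table per layer value."""
    counts = Counter(records)  # keys are (layer, sex, phone) tuples
    layers = sorted({layer for layer, _, _ in records})
    return {
        layer: [[counts[(layer, c, r)] for c in cols] for r in rows]
        for layer in layers
    }

for layer, table in layered_crosstab(records).items():
    print(layer, table)
# college [[1, 1], [0, 1]]
# high school [[1, 1], [1, 0]]
```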

16
Chapter Sixteen Comparing Observed and Expected
Counts

Basic hypothesis
Definitions of expected counts.
Chi-squared test of independence.

17
Basic Hypothesis

ASSUME column variable is the independent
variable.
Hypothesis is independence.
That is, the conditional distribution in any
column is the same as the conditional
distribution in any other column.

18
Expected Count

Basic idea is proportional allocation: the total of
column j is split among the rows in proportion to
the row totals.
Expected count in the (i,j) contingency:
$E(i,j) = \frac{(\text{total in column } j) \times (\text{total in row } i)}{\text{total number in table}}$
The expected count need not be an integer.
There is one expected count for each contingency.
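The expected-count formula can be written as a short Python sketch. The 2 x 2 table of observed counts is hypothetical. The expected counts are not integers, but they reproduce the observed row and column totals.

```python
def expected_counts(table):
    """E(i,j) = (total in column j) * (total in row i) / N, for every contingency."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts O(i,j)
observed = [[12, 7], [5, 9]]
for row in expected_counts(observed):
    print([round(e, 2) for e in row])
# [9.79, 9.21]
# [7.21, 6.79]
```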

19
Residual

Residual in the (i,j) contingency = observed count in
the (i,j) contingency - expected count in the (i,j)
contingency.
That is, $R(i,j) = O(i,j) - E(i,j)$.
One residual for each contingency.
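The residuals follow from the same definitions, again using hypothetical counts. Because the expected counts reproduce the observed marginal totals, the residuals sum to zero along every row and every column. In a 2 x 2 table this forces all four residuals to share one magnitude.

```python
def residuals(table):
    """R(i,j) = O(i,j) - E(i,j), with E(i,j) = (row i total)(column j total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[o - row_totals[i] * col_totals[j] / n for j, o in enumerate(row)]
            for i, row in enumerate(table)]

observed = [[12, 7], [5, 9]]  # hypothetical counts
res = residuals(observed)
print([[round(r, 2) for r in row] for row in res])  # [[2.21, -2.21], [-2.21, 2.21]]
```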

20
Pearson Chi-squared Component

Chi-squared component for the (i,j) contingency =
(residual in the (i,j) contingency)^2 / expected
count in the (i,j) contingency.
$C(i,j) = R(i,j)^2 / E(i,j)$
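Here is a sketch of the Pearson components for a hypothetical table. The function name `pearson_components` is illustrative.

```python
def pearson_components(table):
    """C(i,j) = R(i,j)**2 / E(i,j) for each contingency."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    components = []
    for i, row in enumerate(table):
        comp_row = []
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E(i,j)
            residual = observed_count - expected          # R(i,j)
            comp_row.append(residual ** 2 / expected)     # C(i,j)
        components.append(comp_row)
    return components

observed = [[12, 7], [5, 9]]  # hypothetical counts
for row in pearson_components(observed):
    print([round(c, 3) for c in row])
```

The cells with the smallest expected counts contribute the largest components here, even though all four residuals have the same size.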

21
Assessing Pearson Component

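Rough guides on whether the (i,j) contingency
has an excessively large chi-squared component
C(i,j), compared with a chi-squared distribution
with 1 degree of freedom:
A component of 3.84 has an observed significance
level of about 0.05.
A component of 6.63, about 0.01.
A component of 10.83, about 0.001.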
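The cutoffs above are upper-tail points of the chi-squared distribution with 1 degree of freedom. Such a variable is the square of a standard normal Z, so its upper tail is P(Z^2 > x) = erfc(sqrt(x/2)). Python's `math.erfc` can evaluate this. The sketch below checks the stated values and is not part of the original slides.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 degree of freedom.
    X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

for cutoff in (3.84, 6.63, 10.83):
    print(cutoff, round(chi2_sf_1df(cutoff), 4))
# 3.84 0.05
# 6.63 0.01
# 10.83 0.001
```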

22
Pearson Chi-Squared Test

Pearson chi-squared statistic: sum C(i,j) over all
contingencies.
Pearson chi-squared test has (R-1)(C-1) degrees
of freedom.
Under the null hypothesis, the statistic approximately
follows a chi-squared distribution:
Its expected value equals its degrees of freedom.
Its variance is twice its degrees of freedom.
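Below is a sketch of the full Pearson statistic, plus a small Monte Carlo check of the mean and variance claims. The simulation settings are arbitrary assumptions: 300 units, 3,000 tables, row probabilities 0.5/0.5, column probabilities 0.3/0.3/0.4, and seed 1. Because row and column variables are drawn independently, the sample mean and variance should land near 2 and 4 for this 2 x 3 layout. They are only near, not equal, because the chi-squared distribution is an asymptotic approximation.

```python
import random

def pearson_chi2(table):
    """Sum of Pearson components over all contingencies, with (R-1)(C-1) df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def simulate_null(n_units=300, reps=3000, row_p=(0.5, 0.5),
                  col_p=(0.3, 0.3, 0.4), seed=1):
    """Draw tables whose row and column variables are independent and return
    the sample mean and variance of the Pearson statistic."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        rows = rng.choices(range(len(row_p)), weights=row_p, k=n_units)
        cols = rng.choices(range(len(col_p)), weights=col_p, k=n_units)
        table = [[0] * len(col_p) for _ in row_p]
        for r, c in zip(rows, cols):
            table[r][c] += 1
        stats.append(pearson_chi2(table)[0])
    mean = sum(stats) / reps
    var = sum((s - mean) ** 2 for s in stats) / (reps - 1)
    return mean, var

mean, var = simulate_null()
print(round(mean, 2), round(var, 2))  # should be close to df = 2 and 2 * df = 4
```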

23
Special Case of 2 x 2 Contingency Table
24
Chi-squared test for a 2x2 table

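1 degree of freedom: $(R-1)(C-1) = 1$.
With cells A, B in the first row and C, D in the
second row, the value of the chi-squared statistic is
$\chi^2 = \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}$
There is also a correction for continuity.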
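The 2 x 2 shortcut can be written as a function, assuming the cell layout A, B over C, D. The continuity correction shown is Yates's, which replaces |AD - BC| with |AD - BC| - N/2. The `max(0, ...)` guard is an assumption here: it is a convention some software uses so the correction cannot overshoot. The counts in the usage lines are hypothetical.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Shortcut Pearson chi-squared for the 2 x 2 table
           | a  b |
           | c  d |
    N(AD - BC)^2 / [(A+B)(C+D)(A+C)(B+D)], optionally with Yates's
    continuity correction (|AD - BC| reduced by N/2, floored at 0)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2_2x2(12, 7, 5, 9), 4))              # 2.4306
print(round(chi2_2x2(12, 7, 5, 9, yates=True), 4))  # 1.456
```

The uncorrected value matches the sum of the four Pearson components computed cell by cell. The correction always pulls the statistic toward zero.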

25
Computer Output for Chi-Squared Tests

Output gives the value of each test statistic.
Asymptotic significance level (p-value).
Four types of test:
Pearson chi-squared
Pearson chi-squared with continuity correction
Likelihood ratio test (theoretically strong test)
Fisher's exact test (most accepted, if given; its
significance level is exact rather than asymptotic).
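The slides do not define the likelihood ratio test or Fisher's exact test. The minimal sketch below uses their common textbook definitions, and SPSS's own computations may differ in detail.

- The likelihood ratio statistic is G^2 = 2 * sum of O(i,j) ln(O(i,j)/E(i,j)), compared with chi-squared on (R-1)(C-1) degrees of freedom.
- The two-sided Fisher p-value sums the hypergeometric probabilities of all 2 x 2 tables with the observed margins that are no more probable than the observed table.

The function names are illustrative.

```python
import math

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum O(i,j) * ln(O(i,j) / E(i,j)); empty cells contribute 0."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]] with margins held fixed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the upper-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so ties with the observed probability are kept
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))

print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857 (hypothetical table)
```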

26
Example Problem Set

The independent variable is whether or not the
subject reported using marijuana at time 3 in a
study (time 3 is roughly in later high school).
The dependent variable is whether or not the
subject reported using marijuana at time 4 in a
study (time 4 is roughly in middle college or
beginning independent living). The contingency
table is on the next slide.

27
Marijuana Use at Time 4 by Marijuana Use at Time 3
28
Example Question 1

Which of the following conclusions is correct
about the test of the null hypothesis that the
distribution of whether or not a subject uses
marijuana at time 3 is independent of whether the
subject uses marijuana at time 4?
Usual options.

29
Solution to question 1

Find the significance level in the chi-square
test output. Pearson chi-square (without and with
continuity correction), likelihood ratio, and
Fisher's exact tests all had significance levels
displayed as 0.000 (that is, below 0.0005).
Option A (reject at the 0.001 level of
significance) is the correct choice.
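The slide's conclusion can be checked from the numbers given in Questions 3 and 4. A 2 x 2 table has one degree of freedom, so the marginal totals fix every cell once one cell is known. The totals are 151 users at time 3, 237 users at time 4 and N = 366, and 142 subjects used at both times. The sketch below deduces the cells, then computes the Pearson statistic and its 1-df p-value with the standard library. It assumes the layout with time 3 in the columns and time 4 in the rows.

```python
import math

# Totals taken from the example questions
n, used_t3, used_t4, used_both = 366, 151, 237, 142

# Deduce the remaining cells.
# Rows: used at time 4 (yes, no); columns: used at time 3 (yes, no).
table = [
    [used_both, used_t4 - used_both],
    [used_t3 - used_both, n - used_t4 - (used_t3 - used_both)],
]
print(table)  # [[142, 95], [9, 120]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared upper tail, 1 df
print(round(chi2, 2), p_value < 0.001)    # 96.59 True
```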

30
Example Question 2

How many degrees of freedom does the contingency
table describing this output have?
Solution: $(R-1)(C-1) = (2-1)(2-1) = 1$.

31
Example Question 3

Specify how the expected count of 97.8 for
subjects who used marijuana at both time 3 and
time 4 was calculated.
Solution
Total number using at time 3 was 151.
Total number using at time 4 was 237.
Total N was 366.
Expected count = $151 \times 237 / 366 \approx 97.8$.

32
Example Question 4

Compute the contribution to Pearson's chi-square
statistic from the cell for subjects who used
marijuana at both time 3 and time 4.
Solution
Observed count was 142.
Expected count was 97.8.
Component = $(142 - 97.8)^2 / 97.8 \approx 19.98$.

33
Example Question 5

Describe the pattern of association between these
two variables.
Solution: There was a strong dependence between
the two variables. About 44 percent of nonusers
at time 3 used at time 4, compared to 94 percent
of users at time 3. That is, use at time 3 was
nearly always followed by use at time 4, and
marijuana use increased over time.
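The percentages in this solution are column percentages, which can be recovered from the totals given in Questions 3 and 4. The cell counts in this sketch are deduced from those totals rather than read from the (missing) table slide.

```python
# Counts deduced from the totals given in Questions 3 and 4
n, used_t3, used_t4, used_both = 366, 151, 237, 142

nonusers_t3 = n - used_t3                   # 215 nonusers at time 3
nonusers_t3_using_t4 = used_t4 - used_both  # 95 of them used at time 4

pct_users = 100 * used_both / used_t3                    # users at time 3 who used at time 4
pct_nonusers = 100 * nonusers_t3_using_t4 / nonusers_t3  # nonusers at time 3 who used at time 4
print(round(pct_users), round(pct_nonusers))             # 94 44

# Overall percentage using at each time point
print(round(100 * used_t3 / n, 1), round(100 * used_t4 / n, 1))  # 41.3 64.8
```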

34
Review