Title: Methods for Proportions: Relations between Categorical Variables
1 Methods for Proportions: Relations between Categorical Variables
2 Goals for Chapter 10
- 1. Standard deviations for proportion differences
- 2. Confidence intervals and hypothesis tests for proportion differences
- 3. Contingency Tables for several proportions
- 4. Statistical Significance
- the Chi-Square statistic for a contingency table
- 5. Relative Risk, Increased Risk, Odds Ratio
3 Difference in Sample Proportions--standard deviation, confidence interval
- (notation: $p_i = x_i/N_i$ is the observed proportion for sample $i$, $i = 1$ or $2$ for two samples; note the text uses $\hat{\pi}$, $\pi$ with a caret over it)
- estimated s.d. for proportion difference
- $s_{\pi_1-\pi_2} = \sqrt{p_1(1-p_1)/N_1 + p_2(1-p_2)/N_2}$
- Confidence interval for proportion difference
- $(p_1-p_2) - z_{1-\alpha/2}\, s_{\pi_1-\pi_2} \le \pi_1-\pi_2 \le (p_1-p_2) + z_{1-\alpha/2}\, s_{\pi_1-\pi_2}$
- Hypothesis test for proportion difference
- use z-statistic $z = (p_1-p_2) / s_{\pi_1-\pi_2}$
- formula above assumes null hypothesis, $H_0$: population proportion difference $\pi_1-\pi_2$ is 0.
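The formulas on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the text: the function name `proportion_difference` is hypothetical, and the counts passed in (29 of 100 smokers, 198 of 486 nonsmokers) come from the smoking example on the following slides. The z-statistic uses the unpooled standard deviation exactly as written above; some texts use a pooled estimate under the null hypothesis instead.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, s.d., confidence interval, z) for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Estimated s.d. of the difference (unpooled, as in the slide formula).
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z_{1 - alpha/2}: about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    interval = (diff - z_crit * sd, diff + z_crit * sd)
    # z-statistic for H0: population proportion difference is 0.
    z = diff / sd
    return diff, sd, interval, z


diff, sd, (low, high), z = proportion_difference(29, 100, 198, 486)
print(f"difference = {diff:.3f}, s.d. = {sd:.4f}")
print(f"95% CI = ({low:.3f}, {high:.3f}), z = {z:.2f}")
```

With these counts the interval lies entirely below zero and the z-statistic exceeds 1.96 in absolute value, which agrees with the significant chi-square result obtained later for the same data.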
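4 Example for Proportion Differences
- A study classified pregnant women according to whether they smoked and whether they were able to get pregnant during the first cycle they tried.
- RESULTS
              Pregnant in 1st cycle   Not in 1st cycle   Total
  Smokers              29                    71           100
  Nonsmokers          198                   288           486
  Total               227                   359           586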
5 Calculating Conditional Percentages
- What is the proportion of women who smoke who also become pregnant during the first cycle? 29/100 = 29%
- What is the proportion of women who don't smoke who also become pregnant during the first cycle? 198/486 = 40.7%
6 Statistical Significance of 2x2 Tables--1
- Strength of relation: compare percents or rates of those who do with those who don't
- Example: smokers pregnant in 1st cycle, 29%
- nonsmokers pregnant in 1st cycle, 41%
- Therefore 41/29 = 1.4 times as likely to become pregnant during 1st cycle if nonsmoker.
- Size of study sample: how does the number of subjects affect the significance of the result?
- For the same difference in proportions, the result becomes more significant (less plausibly the result of chance) as sample size increases.
- If there had been only 59 women in the study, the same difference in proportions would be much less significant.
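The effect of sample size can be illustrated with the z-statistic for a difference in proportions introduced earlier. The sketch below is an assumption-laden illustration: the helper `z_statistic` is a hypothetical name, and the smaller study is imagined as one-tenth the size with exactly the same observed proportions (so its group sizes are not whole numbers).

```python
from math import sqrt


def z_statistic(p1, n1, p2, n2):
    """z-statistic for a difference in proportions (unpooled s.d.)."""
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / sd


p_smoker, p_nonsmoker = 29 / 100, 198 / 486

# Full study: 100 smokers and 486 nonsmokers (586 women).
z_full = z_statistic(p_smoker, 100, p_nonsmoker, 486)

# Hypothetical study one-tenth the size (about 59 women),
# assuming the same observed proportions.
z_small = z_statistic(p_smoker, 10, p_nonsmoker, 48.6)

print(f"z with 586 women:      {z_full:.2f}")
print(f"z with about 59 women: {z_small:.2f}")
```

Because the standard deviation shrinks with the square root of the sample size, the smaller study gives a z-statistic smaller by a factor of about $\sqrt{10}$, which no longer exceeds 1.96 in absolute value.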
7 Assessing Statistical Significance of Tables--2
- We use the chi-square statistic to determine whether differences between proportions are real or due to chance.
- The chi-square statistic shows how much the distribution of observed counts varies from that expected on the basis of pure chance; for example, if we tossed snake-eyes in craps on every throw we might think the dice were loaded.
- For the previous example, if there were no difference between smokers and nonsmokers, we would expect the proportions for both to be the same as in the total.
- First cycle: 227/586 = 0.387 or 38.7%; on the basis of this expected proportion, we calculate the expected numbers.
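8 Calculating the Chi-square Statistic--Expected Values
- Since the total number of smokers is 100, there would be 100 × 0.387 = 38.7 smokers pregnant in the first cycle if there were no difference between smokers and nonsmokers.
- Since the total number pregnant during the first cycle is 227, there would be 227 - 38.7 = 188.3 nonsmokers pregnant during the first cycle.
- The remaining expected values follow by subtraction from the group totals: 100 - 38.7 = 61.3 smokers and 486 - 188.3 = 297.7 nonsmokers not pregnant during the first cycle.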
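The expected values can also be computed directly from the marginal totals. The following is a minimal sketch, assuming the observed counts implied by the example (29 of 100 smokers and 198 of 486 nonsmokers pregnant in the first cycle); the "later" counts are derived by subtraction, and the dictionary labels are hypothetical names chosen for illustration.

```python
# Observed counts from the smoking example. The "later" counts
# (71 and 288) are derived by subtraction from the group totals 100 and 486.
observed = {
    "smokers": {"first cycle": 29, "later": 71},
    "nonsmokers": {"first cycle": 198, "later": 288},
}

row_totals = {group: sum(cells.values()) for group, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for outcome, count in cells.items():
        col_totals[outcome] = col_totals.get(outcome, 0) + count
grand_total = sum(row_totals.values())

# Expected count = row total x column total / grand total.
expected = {
    group: {
        outcome: row_totals[group] * col_totals[outcome] / grand_total
        for outcome in col_totals
    }
    for group in observed
}

for group, cells in expected.items():
    for outcome, value in cells.items():
        print(f"{group:10s} {outcome:11s} expected = {value:6.1f}")
```

Rounded to one decimal place, the four expected values are 38.7, 61.3, 188.3 and 297.7.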
9 Calculating the Chi-square Statistic--Differences
- Once the expected values for each cell are calculated, we take the difference between the observed and expected values for each cell $i$: $D_i = \text{observed}_i - \text{expected}_i$
- Note that for a 2x2 table we only have to calculate one difference: the differences in each row and each column have to sum to zero.
10 Calculating the Chi-square Statistic--2x2 Tables
- Once the differences, $D_i$, and expected values, $E_i$, for each cell are calculated, the chi-square statistic is evaluated from the formula
- $\chi^2 = \sum_i D_i^2 / E_i$
- $\chi^2 = (9.7)^2 \, (1/38.7 + 1/61.3 + 1/188.3 + 1/297.7) = 4.78$
- This value is greater than 3.84, the critical value for chi-square with 1 degree of freedom at the 95% level (5% significance level).
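The arithmetic above can be checked with a short script. This is a sketch under stated assumptions: the first calculation uses the rounded values quoted on the slides, and the second repeats it without intermediate rounding from the observed counts (29 and 198 are given; 71 and 288 are derived from the group totals of 100 and 486). The variable names are illustrative only.

```python
# Expected values and absolute difference as rounded on the slides.
expected_rounded = [38.7, 61.3, 188.3, 297.7]
d_rounded = 9.7
chi2_slide = sum(d_rounded ** 2 / e for e in expected_rounded)

# Same calculation without intermediate rounding.
observed = [[29, 71], [198, 288]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2_exact = 0.0
differences = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        differences.append(obs - exp)
        chi2_exact += (obs - exp) ** 2 / exp

CRITICAL_1DF = 3.84  # critical value quoted on the slide
print(f"chi-square (rounded inputs) = {chi2_slide:.2f}")
print(f"chi-square (unrounded)      = {chi2_exact:.2f}")
print("exceeds 3.84:", chi2_exact > CRITICAL_1DF)
```

The rounded inputs reproduce 4.78; carrying full precision gives a slightly larger value of about 4.82. Both exceed 3.84, so the conclusion is unchanged. The four differences have the same magnitude and sum to zero, which is why only one of them needs to be calculated by hand.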
11 Two X Two Tables and Chi-Square Statistics
- Example: Are males more likely to be underachievers? Students were classified as underachievers if their grades in high school were below the prediction given by a reading test at age 12.
12 2x2 Table and Chi-square Statistic Example, cont.
- Calculation of chi-square statistic for previous example
- 1. Compute expected values:
  boys under: $(39/69) \times 34 = 19.2$
  girls under: $34 - 19.2 = 14.8$
  boys over: $39 - 19.2 = 19.8$
  girls over: $30 - 14.8 = 15.2$
- 2. Take the difference between observed and expected, square it, and divide by expected for each cell:
  boys under: $(-6.8)^2 / 19.2 = 2.41$
  girls under: $(6.8)^2 / 14.8 = 3.12$
  boys over: $(6.8)^2 / 19.8 = 2.34$
  girls over: $(-6.8)^2 / 15.2 = 3.04$
- 3. Sum the terms calculated in 2 to get the chi-square statistic: $\chi^2 = 2.41 + 3.12 + 2.34 + 3.04 = 10.91$
- 4. Compare the calculated chi-square statistic with 3.84 to determine significance (at the 95% level). In this example, 10.91 is much greater than 3.84, so the result (difference in proportions) is statistically significant.
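Steps 2 to 4 can be reproduced with a few lines of Python. This sketch uses only the rounded expected values and the difference of 6.8 quoted on the slide (the observed counts themselves are not listed in the text); the variable names are illustrative.

```python
# Expected counts as rounded on the slide.
expected = {
    "boys under": 19.2,
    "girls under": 14.8,
    "boys over": 19.8,
    "girls over": 15.2,
}
# In a 2x2 table every cell has the same absolute difference.
abs_difference = 6.8

# Step 2: squared difference divided by expected, for each cell.
terms = {cell: abs_difference ** 2 / e for cell, e in expected.items()}
# Step 3: sum the terms.
chi_square = sum(terms.values())

for cell, term in terms.items():
    print(f"{cell:12s} {term:.2f}")
print(f"chi-square = {chi_square:.2f}")
# Step 4: compare with the critical value 3.84.
print("significant at the 95% level:", chi_square > 3.84)
```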
13 Risk and Odds
- Both the risk and the odds give information about the likelihood of a positive response to a categorical variable, but their numerical values differ.
- Example: a 2x2 table gives results for stopping smoking after eight weeks' use of either a nicotine patch or a placebo. The risk of continuing to smoke after using the nicotine patch is 64/120 = 0.53 or 53%, compared to the greater risk with the placebo, 96/120 = 0.80 or 80%.
- Thus the RISK is the conditional probability of an outcome, given the category of the explanatory variable.
- The ODDS is the ratio of the conditional probabilities of the two possible outcomes (here, continuing to smoke versus quitting) and can be less than or greater than one.
14 Relative Risk and 2x2 Tables
- Every 2x2 table has an explanatory variable with two categories (e.g., for the previous slide, whether a nicotine patch or a placebo was used).
- The ratio of the risks for these two categories is called the RELATIVE RISK.
- Example
- RR, relative risk of continuing to smoke if placebo rather than nicotine patch used: RR = 0.80 / 0.53 = 80 / 53 = 1.5
15 Odds Ratio and 2x2 Tables
- The ratio of the odds for the two explanatory categories is called, as might be expected, the Odds Ratio (OR). If the risks (and hence the odds) are very small, then the Odds Ratio and Relative Risk are approximately equal.
- Examples
- stopping smoking with nicotine patch
- Odds(placebo) = 96/24 = 4.0
- Odds(nicotine) = 64/56 = 1.1
- Thus OR = (96/24) / (64/56) = 3.5,
- (compared to RR = 1.5)
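Risk, odds, relative risk and odds ratio can all be computed from the same four counts. The sketch below takes its counts from the odds quoted on this slide (96 continuing versus 24 quitting on placebo; 64 versus 56 on the patch), so each group is assumed to have 120 subjects. The function name `risk_and_odds` is hypothetical, and the final small-risk example uses made-up counts purely to illustrate when OR and RR nearly coincide.

```python
def risk_and_odds(with_outcome, without_outcome):
    """Risk = outcome / total; odds = outcome / non-outcome."""
    total = with_outcome + without_outcome
    return with_outcome / total, with_outcome / without_outcome


# Counts taken from the odds on the slide: (continued smoking, quit).
placebo_risk, placebo_odds = risk_and_odds(96, 24)
patch_risk, patch_odds = risk_and_odds(64, 56)

relative_risk = placebo_risk / patch_risk
odds_ratio = placebo_odds / patch_odds
print(f"risk:  placebo {placebo_risk:.2f}, patch {patch_risk:.2f}")
print(f"odds:  placebo {placebo_odds:.2f}, patch {patch_odds:.2f}")
print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")

# Hypothetical rare outcome: 2 in 1000 versus 1 in 1000.
rare_a_risk, rare_a_odds = risk_and_odds(2, 998)
rare_b_risk, rare_b_odds = risk_and_odds(1, 999)
rare_rr = rare_a_risk / rare_b_risk
rare_or = rare_a_odds / rare_b_odds
print(f"rare outcome: RR = {rare_rr:.3f}, OR = {rare_or:.3f}")
```

When the outcome is common, as in the smoking example, the odds ratio is much larger than the relative risk; when the outcome is rare, the two are nearly identical.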
16 Simpson's Paradox and Hidden Variables
- Example: 1972 admissions rates for graduate programs at UC Berkeley. Overall, the percent of women applicants admitted was less than the percent of men applicants admitted, even though the women's percentages were higher for individual departments (see Exercise 13). The paradox can be explained by different overall selectivity in each program and different proportions of men and women applying to each program.
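The mechanism behind the paradox can be seen with a small numerical sketch. The counts below are entirely hypothetical (they are not the Berkeley data) and were chosen so that women have the higher admission rate in each program while most women apply to the more selective one.

```python
# Hypothetical counts (not the Berkeley data): (applicants, admitted).
applications = {
    "Program A (less selective)": {"men": (800, 480), "women": (100, 70)},
    "Program B (more selective)": {"men": (200, 20), "women": (900, 135)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for program, groups in applications.items():
    for sex, (applied, admitted) in groups.items():
        totals[sex][0] += applied
        totals[sex][1] += admitted
        print(f"{program}: {sex} admitted {admitted / applied:.0%}")

# Pool the programs and recompute the admission rates.
overall = {sex: admitted / applied for sex, (applied, admitted) in totals.items()}
for sex, rate in overall.items():
    print(f"Overall: {sex} admitted {rate:.1%}")
```

In this made-up example women are admitted at a higher rate than men within each program, yet at a lower rate overall, because the pooled rates are weighted by where each group applied.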
17 Goodness of Fit--Comparing Observed with Theoretical Proportions
- Procedure
- tabulate observed frequencies, $N_i$, for each category
- tabulate expected (theoretical) frequencies, $E_i$
- take the difference between corresponding observed and theoretical frequencies, $D_i = N_i - E_i$
- calculate the chi-square statistic by the formula
- $\chi^2 = \sum_i D_i^2 / E_i$
- with degrees of freedom $df = k-1$, where $k$ is the number of categories (one proportion)
- Example: Exercises 10.14, 10.15 (p. 402).
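The procedure above can be sketched as a short function. The name `goodness_of_fit` and the die-rolling data are hypothetical, chosen only for illustration; they are not taken from the exercises.

```python
def goodness_of_fit(observed, expected_proportions):
    """Return (chi-square, degrees of freedom) for one categorical variable."""
    n = sum(observed)
    chi_square = 0.0
    for n_i, proportion in zip(observed, expected_proportions):
        e_i = n * proportion      # expected (theoretical) frequency E_i
        d_i = n_i - e_i           # difference D_i = N_i - E_i
        chi_square += d_i ** 2 / e_i
    return chi_square, len(observed) - 1


# Hypothetical data: 60 rolls of a die, tested against a fair die.
rolls = [8, 12, 9, 11, 10, 10]
chi_square, df = goodness_of_fit(rolls, [1 / 6] * 6)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

For these made-up rolls the statistic is far below 11.07, the 5% critical value for 5 degrees of freedom, so there would be no evidence that the die is loaded.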
18 Comparing Several Proportions, Categories
- Procedure
- Set up a $k$ (no. of explanatory categories) by $r$ (number of response categories) contingency table
- tabulate the number for each cell in the table, $N_{kr}$, the marginal totals for each category, $N_k$, and each response, $N_r$, and the grand total, $N$.
- For the cell in the table corresponding to the $k$th category and $r$th response, the expected number (if the response proportions were the same for all categories) is $E_{kr} = (N_k N_r) / N$
- Calculate the difference $D_{kr} = N_{kr} - E_{kr}$ and calculate a chi-square statistic from the formula
- $\chi^2 = \sum_{k,r} D_{kr}^2 / E_{kr}$
- Degrees of freedom, $df = (k-1)(r-1)$
19 Comparing Several Proportions, Categories
- Example, Exercise 10.18, p. 408
- A study of potential age discrimination considers promotions among middle managers in a large company. The data are
- promoted: 9 under 30; 29 in 30 to 39 category; 32 in 40 to 49; and 10 in 50 and over category (total 80)
- not promoted: 41 under 30; 41 in 30 to 39 category; 48 in 40 to 49; and 49 in 50 and over (total 170)
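The procedure for a k by r table can be sketched as a general function and applied to the promotion counts. This is an illustration under stated assumptions: `chi_square_table` is a hypothetical name, and the cell counts are used exactly as listed. The listed not-promoted counts add up to 179 rather than the stated total, so the sketch recomputes all totals from the cells and the result should be read with that caveat.

```python
def chi_square_table(table):
    """Chi-square statistic and df for a k x r table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, n_kr in enumerate(row):
            e_kr = row_totals[i] * col_totals[j] / n    # E_kr = N_k N_r / N
            chi_square += (n_kr - e_kr) ** 2 / e_kr     # D_kr^2 / E_kr
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# Rows: age categories (under 30, 30-39, 40-49, 50 and over).
# Columns: (promoted, not promoted), cell counts as listed in the exercise.
promotions = [[9, 41], [29, 41], [32, 48], [10, 49]]
chi_square, df = chi_square_table(promotions)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

With four age categories and two responses there are (4-1)(2-1) = 3 degrees of freedom, for which the 5% critical value is 7.81; the statistic printed by the sketch can be compared against that value.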