Methods for Proportions: Relations between Categorical Variables - PowerPoint PPT Presentation

Slides: 20

Provided by: department97

Learn more at: http://www.departments.bucknell.edu

Transcript and Presenter's Notes

1
Methods for Proportions: Relations between Categorical Variables

Chapter 10

2
Goals for Chapter 10

1. Standard deviations for proportion differences
2. Confidence intervals and hypothesis tests for
proportion differences
3. Contingency Tables for several proportions
4. Statistical Significance: the Chi-Square statistic for a contingency table
5. Relative Risk, Increased Risk, Odds Ratio

3
Difference in Sample Proportions--standard
deviation, confidence interval

(Notation: $p_i = x_i/N_i$ is the observed proportion for sample $i$, with $i = 1$ or $2$ for two samples. Note: the text uses $\hat{\pi}$, i.e. $\pi$ with a caret over it.)
Estimated s.d. for the proportion difference:
$s_{\hat{\pi}_1-\hat{\pi}_2} = \sqrt{p_1(1-p_1)/N_1 + p_2(1-p_2)/N_2}$
Confidence interval for the proportion difference:
$(p_1-p_2) - z_{1-\alpha/2}\, s_{\hat{\pi}_1-\hat{\pi}_2} \le \pi_1-\pi_2 \le (p_1-p_2) + z_{1-\alpha/2}\, s_{\hat{\pi}_1-\hat{\pi}_2}$
Hypothesis test for the proportion difference:
use the z-statistic $z = (p_1-p_2)/s_{\hat{\pi}_1-\hat{\pi}_2}$
(the formula above assumes the null hypothesis H0: the population proportion difference is 0).
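
A minimal Python sketch of the three formulas on this slide. The function name `two_proportion_summary` and the default `z_crit=1.96` (the standard normal value for a 95% interval) are illustrative choices, not part of the slides. The usage line borrows the smoking-study counts quoted later in this presentation (198 of 486 nonsmokers, 29 of 100 smokers).

```python
import math

def two_proportion_summary(x1, n1, x2, n2, z_crit=1.96):
    """Return (difference, s.d., confidence interval, z) for two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Estimated s.d. of the difference: sqrt(p1(1-p1)/N1 + p2(1-p2)/N2)
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Confidence interval: difference plus or minus z_crit standard deviations
    interval = (diff - z_crit * sd, diff + z_crit * sd)
    # z-statistic for H0: population proportion difference is 0
    z = diff / sd
    return diff, sd, interval, z

# Nonsmokers as sample 1, smokers as sample 2
diff, sd, interval, z = two_proportion_summary(198, 486, 29, 100)
print(f"difference = {diff:.3f}, s.d. = {sd:.3f}, z = {z:.2f}")
print(f"95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

If the interval excludes 0, or equivalently if |z| exceeds `z_crit`, the difference would be judged significant at that level.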

4
Example for Proportion Differences

A study classified pregnant women according to
whether they smoked and whether they were able to
get pregnant during the first cycle they tried.
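RESULTS (counts as used on the following slides)

              Pregnant 1st cycle   Not 1st cycle   Total
Smokers               29                 71          100
Nonsmokers           198                288          486
Total                227                359          586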

5
Calculating Conditional Percentages

What is the proportion of women who smoke who also become pregnant during the first cycle? 29/100 = 29%
What is the proportion of women who don't smoke who also become pregnant during the first cycle? 198/486 = 40.7%

6
Statistical Significance of 2x2 Tables--1

Strength of Relation: Compare percents or rates of those who do with those who don't.
Example: smokers pregnant in the 1st cycle, 29%; nonsmokers pregnant in the 1st cycle, 41%.
Therefore a nonsmoker is 41/29 = 1.4 times as likely to become pregnant during the 1st cycle.
Size of Study Sample: How does the number of subjects affect the significance of the result?
For the same observed proportions, the result becomes more significant (less likely to be the result of chance) as sample size increases.
If there had been only 59 women in the study, the same difference in proportions would be much less significant.
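
A rough Python illustration of the sample-size point, using the z-statistic for a proportion difference. The helper `z_for_difference` is a hypothetical name, and the group sizes 10 and 48.6 are an assumption: they shrink the study to one tenth (about 59 women) while holding the observed proportions fixed, which is why one of them is not a whole number.

```python
import math

def z_for_difference(p1, n1, p2, n2):
    """z-statistic for the difference of two observed proportions."""
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / sd

p_smoker = 29 / 100
p_nonsmoker = 198 / 486

# How many times as likely nonsmokers are to conceive in the first cycle
relative_likelihood = p_nonsmoker / p_smoker

# Full study versus a hypothetical study one tenth the size
z_full = z_for_difference(p_nonsmoker, 486, p_smoker, 100)
z_small = z_for_difference(p_nonsmoker, 48.6, p_smoker, 10)

print(f"relative likelihood = {relative_likelihood:.1f}")
print(f"z (586 women) = {z_full:.2f}, z (about 59 women) = {z_small:.2f}")
```

With the proportions held fixed, z grows with the square root of the sample size, so the smaller study falls well short of the usual 1.96 cutoff while the full study exceeds it.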

7
Assessing Statistical Significance of Tables--2

We use the Chi-Square statistic to determine whether differences between proportions are real or due to chance.
The Chi-square statistic shows how the distribution of observed proportions varies from that expected on the basis of pure chance. For example, if we tossed snake-eyes in craps on every throw, we might think the dice were loaded.
For the previous example, if there were no difference between smokers and nonsmokers, we would expect the proportions for both to be the same as in the total:
First cycle: 227/586 = 0.387 or 38.7%. On the basis of this expected proportion, we calculate the expected numbers for each cell.

8
Calculating the Chi-Square Statistic--Expected Values

Since the total number of smokers is 100, there would be 100 x 0.387 = 38.7 smokers pregnant in the first cycle if there were no difference between smokers and nonsmokers.
Since the total number pregnant during the first cycle is 227, there would be 227 - 38.7 = 188.3 nonsmokers pregnant during the first cycle.
The remaining expected values follow from the group totals: 100 - 38.7 = 61.3 smokers and 486 - 188.3 = 297.7 nonsmokers not pregnant during the first cycle.
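
The expected values can be sketched in Python as row total times column total divided by the grand total. The function name `expected_counts` is illustrative, and the column total 359 (not pregnant in the first cycle) is derived here as 586 - 227 rather than quoted from the slides.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts if rows and columns were unrelated."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: smokers, nonsmokers. Columns: pregnant in 1st cycle, not in 1st cycle.
expected = expected_counts([100, 486], [227, 586 - 227])

for label, row in zip(["smokers", "nonsmokers"], expected):
    print(label, [round(value, 1) for value in row])
```

Because this sketch keeps the overall proportion unrounded (227/586 rather than 0.387), the values can differ from the slide's in the second decimal place.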

9
Calculating the Chi-Square Statistic--Differences

Once the expected values for each cell are calculated, we take the difference between the observed and expected values for each cell i:
$D_i = \text{observed}_i - \text{expected}_i$
Note that for a 2x2 table we only have to calculate one difference, because the differences in each row and each column have to sum to zero.

10
Calculating the Chi-Square Statistic--2x2 Tables

Once the differences, $D_i$, and expected values, $E_i$, for each cell are calculated, the chi-square statistic is evaluated from the formula
$\chi^2 = \sum_i D_i^2 / E_i$
$\chi^2 = (9.7)^2\,[1/38.7 + 1/61.3 + 1/188.3 + 1/297.7] = 4.78$
This value is greater than 3.84, the critical value for chi-square (one degree of freedom) at the 95% level.
11
Two X Two Tables and Chi-Square Statistics

Example: Are males more likely to be underachievers? Students were classified as underachievers if their grades in high school were below the prediction given by a reading test at age 12.

12
2x2 Table and Chi-Square Statistic Example, cont.

Calculation of the chi-square statistic for the previous example:
1. Compute the expected values:
   boys under: (39/69) x 34 = 19.2
   girls under: 34 - 19.2 = 14.8
   boys over: 39 - 19.2 = 19.8
   girls over: 30 - 14.8 = 15.2
2. Take the difference between observed and expected, square it, and divide by expected for each cell:
   boys under: $(-6.8)^2 / 19.2 = 2.41$
   girls under: $(6.8)^2 / 14.8 = 3.12$
   boys over: $(6.8)^2 / 19.8 = 2.34$
   girls over: $(-6.8)^2 / 15.2 = 3.04$
3. Sum the terms calculated in step 2 to get the chi-square statistic:
   chi-square = 2.41 + 3.12 + 2.34 + 3.04 = 10.91
4. Compare the calculated chi-square statistic with 3.84 to determine significance (at the 95% level). In this example, 10.91 is much greater than 3.84, so the result (the difference in proportions) is statistically significant.
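
The four steps can be retraced in a few lines of Python. This sketch works only from the totals quoted in step 1 (39 boys, 30 girls, 34 underachievers, 69 students) and the difference of 6.8 quoted in step 2; the observed table itself is not reproduced in this transcript, so it is not assumed here. The total of 35 overachievers is derived as 69 - 34, and the variable names are illustrative.

```python
row_totals = {"boys": 39, "girls": 30}
col_totals = {"under": 34, "over": 69 - 34}
grand_total = 69
abs_difference = 6.8  # size of (observed - expected), as quoted in step 2

# Step 1: expected value for each cell = row total x column total / grand total
expected = {
    (sex, group): row_totals[sex] * col_totals[group] / grand_total
    for sex in row_totals
    for group in col_totals
}

# Step 2: squared difference divided by expected, cell by cell
terms = {cell: abs_difference ** 2 / value for cell, value in expected.items()}

# Step 3: sum the terms
chi_square_value = sum(terms.values())

# Step 4: compare with the critical value 3.84
significant = chi_square_value > 3.84

for cell in expected:
    print(cell, round(expected[cell], 1), round(terms[cell], 2))
print(f"chi-square = {chi_square_value:.2f}, significant: {significant}")
```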

13
Risk and Odds

Both the Risk and the Odds give information about the likelihood of a positive response to a categorical variable, but their numerical values differ.
Example: the 2x2 table gives results for stopping smoking after eight weeks' use of either a nicotine patch or a placebo. Note that the risk of continuing to smoke after using the nicotine patch is 0.47 or 47%, compared to the greater risk for placebo use, 0.80 or 80%.
Thus the RISK is equivalent to the conditional probability of the outcome, given a category of the explanatory variable. The ODDS is the ratio of these conditional probabilities for the two outcomes and can be less than or greater than one.

14
Relative Risk and 2x2 Tables

Every 2x2 table will have two explanatory categories (e.g., for the previous slide, whether a nicotine patch or a placebo was used).
The ratio of the risks for these two categories is called the RELATIVE RISK.
Example:
RR, the relative risk of continuing to smoke if a placebo rather than a nicotine patch is used:

RR = 0.80 / 0.47 = 80 / 47 = 1.70

15
Odds Ratio and 2x2 Tables

The ratio of the odds for the two explanatory categories is called, as might be expected, the Odds Ratio (OR). If the odds are very small, then the Odds Ratio and Relative Risk are approximately equal.
Example: stopping smoking with a nicotine patch
Odds (placebo) = 96/24 = 4.0
Odds (nicotine) = 64/56 = 1.1
Thus OR = (96/24) / (64/56) = 3.5
(compared to RR = 1.7)
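
A small Python sketch of the odds and odds-ratio arithmetic, followed by a check of the claim that OR and RR nearly coincide when the odds are small. The helper names are illustrative, and the risks 0.02 and 0.01 in the second part are hypothetical values chosen only to show the small-odds case.

```python
def odds_from_counts(with_outcome, without_outcome):
    """Odds = count with the outcome divided by count without it."""
    return with_outcome / without_outcome

def odds_from_risk(risk):
    """Convert a risk (conditional probability) to odds."""
    return risk / (1 - risk)

# Counts from the slide
odds_placebo = odds_from_counts(96, 24)
odds_nicotine = odds_from_counts(64, 56)
odds_ratio = odds_placebo / odds_nicotine
print(f"odds: {odds_placebo:.1f} vs {odds_nicotine:.1f}, OR = {odds_ratio:.1f}")

# Hypothetical small risks: OR and RR come out nearly equal
risk_a, risk_b = 0.02, 0.01
rr_small = risk_a / risk_b
or_small = odds_from_risk(risk_a) / odds_from_risk(risk_b)
print(f"small risks: RR = {rr_small:.2f}, OR = {or_small:.2f}")
```

When the risks are large, as in the smoking example, the odds are far from the risks and the OR can be about twice the RR.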

16
Simpson's Paradox and Hidden Variables

Example: 1972 admission rates for graduate programs at UC Berkeley. Overall, the percentage of women applicants admitted was lower than the percentage of men applicants admitted, even though the women's percentages were higher within individual departments (see Exercise 13). The paradox can be explained by the different overall selectivity of each program and the different proportions of men and women applying to each program.
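
The mechanism can be reproduced with a toy Python example. All numbers below are hypothetical, invented only to show the effect; they are not the Berkeley data. Department A is easy to get into and attracts mostly men, while Department B is selective and attracts mostly women.

```python
# (admitted, applied) per department and group; hypothetical numbers
applications = {
    "Dept A": {"men": (80, 100), "women": (18, 20)},
    "Dept B": {"men": (2, 20), "women": (20, 100)},
}

def department_rate(dept, group):
    admitted, applied = applications[dept][group]
    return admitted / applied

def overall_rate(group):
    admitted = sum(dept[group][0] for dept in applications.values())
    applied = sum(dept[group][1] for dept in applications.values())
    return admitted / applied

for dept in applications:
    print(dept, "men:", department_rate(dept, "men"),
          "women:", department_rate(dept, "women"))
print("overall men:", round(overall_rate("men"), 3),
      "women:", round(overall_rate("women"), 3))
```

Women have the higher admission rate in each department, yet the lower rate overall, because most women applied to the more selective department.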

17
Goodness of Fit--Comparing Observed with
Theoretical Proportions

Procedure:
1. Tabulate the observed frequencies, $N_i$, for each category.
2. Tabulate the expected (theoretical) frequencies, $E_i$.
3. Take the difference between corresponding observed and theoretical frequencies, $D_i = N_i - E_i$.
4. Calculate the chi-square statistic from the formula
   $\chi^2 = \sum_i D_i^2 / E_i$,
   with degrees of freedom df = k - 1, where k is the number of categories (one proportion).
Example: Exercises 10.14, 10.15 (p. 402).
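
The procedure can be sketched in Python as follows. The function name `goodness_of_fit` is illustrative, and the die-roll counts are hypothetical data (60 rolls of a die assumed fair under the theoretical proportions), not taken from the cited exercises.

```python
def goodness_of_fit(observed, theoretical_proportions):
    """Return (chi-square statistic, degrees of freedom) for observed counts."""
    total = sum(observed)
    # Expected frequency for each category: total count x theoretical proportion
    expected = [total * p for p in theoretical_proportions]
    # Sum of squared differences divided by the expected frequency
    statistic = sum((n - e) ** 2 / e for n, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df

# Hypothetical counts for faces 1 to 6 in 60 rolls
observed_rolls = [8, 12, 9, 11, 10, 10]
chi2, df = goodness_of_fit(observed_rolls, [1 / 6] * 6)
print(f"chi-square = {chi2:.2f} with df = {df}")
```

The statistic is then compared with the chi-square critical value for the computed degrees of freedom, which is larger than the 3.84 used for 2x2 tables once df exceeds 1.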

18
Comparing Several Proportions, Categories

Procedure:
1. Set up a k (number of explanatory categories) by r (number of response categories) contingency table.
2. Tabulate the number for each cell in the table, the marginal totals for each explanatory category, $N_k$, and each response category, $N_r$, and the grand total, $N$.
3. For the cell corresponding to the kth category and rth response, the expected number (if the proportions were the same for all explanatory categories) is $E_{kr} = (N_k N_r) / N$.
4. Calculate the difference $D_{kr} = N_{kr} - E_{kr}$ and then the chi-square statistic from the formula $\chi^2 = \sum_{k,r} D_{kr}^2 / E_{kr}$.
5. Degrees of freedom: df = (k-1)(r-1).

19
Comparing Several Proportions, Categories

Example: Exercise 10.18, p. 408
A study of potential age discrimination considers promotions among middle managers in a large company. The data are:

               Under 30   30 to 39   40 to 49   50 and over   Total
Promoted           9         29         32          10          80
Not promoted      41         41         48          40         170
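
A Python sketch applying the k by r procedure to the promotion data. The function name `contingency_chi_square` is illustrative. One assumption is made about the data: the not-promoted count for the 50-and-over group is taken as 40, so that the row sums to the stated total of 170. The cutoff 7.815 is the standard chi-square table value for 3 degrees of freedom at the 5% level.

```python
def contingency_chi_square(table):
    """Return (chi-square, df) for a table with k rows (explanatory) and r columns (response)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for k, row in enumerate(table):
        for r, observed in enumerate(row):
            expected = row_totals[k] * col_totals[r] / grand_total  # E_kr = N_k N_r / N
            statistic += (observed - expected) ** 2 / expected      # D_kr^2 / E_kr
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: age groups (under 30, 30-39, 40-49, 50 and over)
# Columns: promoted, not promoted (last count assumed to be 40, see above)
promotions = [[9, 41], [29, 41], [32, 48], [10, 40]]
chi2, df = contingency_chi_square(promotions)
print(f"chi-square = {chi2:.2f}, df = {df}, exceeds 7.815: {chi2 > 7.815}")
```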