Methods for Proportions: Relations between Categorical Variables - PowerPoint PPT Presentation


1
Methods for Proportions: Relations between
Categorical Variables
  • Chapter 10

2
Goals for Chapter 10
  • 1. Standard deviations for proportion differences
  • 2. Confidence intervals and hypothesis tests for
    proportion differences
  • 3. Contingency Tables for several proportions
  • 4. Statistical significance: the Chi-Square statistic for a
    contingency table
  • 5. Relative Risk, Increased Risk, Odds Ratio

3
Difference in Sample Proportions--standard
deviation, confidence interval
  • Notation: $p_i = x_i/N_i$ is the observed proportion for
    sample $i$, with $i = 1$ or $2$ for two samples (note: the text
    writes this as $\hat{\pi}_i$, $\pi$ with a caret over it)
  • Estimated s.d. for the proportion difference:
  • $s_{\pi_1-\pi_2} = \sqrt{p_1(1-p_1)/N_1 + p_2(1-p_2)/N_2}$
  • Confidence interval for the proportion difference:
  • $p_1 - p_2 - z_{1-\alpha/2}\, s_{\pi_1-\pi_2} \le \pi_1 - \pi_2 \le p_1 - p_2 + z_{1-\alpha/2}\, s_{\pi_1-\pi_2}$
  • Hypothesis test for the proportion difference:
  • use the z-statistic $z = (p_1 - p_2)/s_{\pi_1-\pi_2}$
  • the formula above assumes the null hypothesis
    $H_0$: the population proportion difference
    $\pi_1 - \pi_2$ is 0.
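The formulas on this slide can be sketched in Python as below. This is a minimal illustration, not the text's own code: the helper name `two_proportion_summary` is hypothetical, it uses the unpooled s.d. exactly as written above, and it takes $z_{1-\alpha/2}$ from the standard library's `statistics.NormalDist`. The example counts are the smoking and pregnancy data from the following slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(x1, n1, x2, n2, alpha=0.05):
    """Hypothetical helper: CI and z-statistic for p1 - p2 (unpooled s.d.)."""
    p1, p2 = x1 / n1, x2 / n2
    # Estimated s.d. of the difference in sample proportions
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z_{1 - alpha/2}; about 1.96 for alpha = 0.05
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    ci = (diff - z_crit * sd, diff + z_crit * sd)
    # z-statistic for H0: population proportion difference is 0
    z = diff / sd
    return diff, sd, ci, z


# Sample 1: nonsmokers (198 of 486 pregnant in the first cycle)
# Sample 2: smokers (29 of 100 pregnant in the first cycle)
diff, sd, ci, z = two_proportion_summary(198, 486, 29, 100)
print(f"diff={diff:.3f} sd={sd:.4f} CI=({ci[0]:.3f}, {ci[1]:.3f}) z={z:.2f}")
```

With these counts the 95% interval lies entirely above 0 and z is about 2.3, larger than 1.96, so the difference is significant at the 5% level.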

4
Example for Proportion Differences
  • A study classified pregnant women according to
    whether they smoked and whether they were able to
    get pregnant during the first cycle they tried.
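  • RESULTS:

                     Pregnant in     Not pregnant in
                     first cycle     first cycle        Total
      Smokers             29               71             100
      Nonsmokers         198              288             486
      Total              227              359             586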

5
Calculating Conditional Percentages
  • What proportion of women who smoke became pregnant
    during the first cycle? 29/100 = 0.29, or 29%.
  • What proportion of women who don't smoke became
    pregnant during the first cycle?
    198/486 = 0.407, or 40.7%.

6
Statistical Significance of 2x2 Tables--1
  • Strength of relation: compare the percentages (rates)
    for those who do with those who don't.
  • Example: smokers pregnant in the 1st cycle, 29%;
  • nonsmokers pregnant in the 1st cycle,
    40.7% (about 41%).
  • Therefore 40.7/29 = 1.4: nonsmokers are about 1.4 times
    as likely to become pregnant during the 1st cycle.
  • Size of study sample: how does the number of
    subjects affect the significance of the result?
  • For the same difference in proportions, the result becomes
    more significant (less plausibly the result of chance) as the
    sample size increases.
  • If there had been only 59 women in the study (about one
    tenth as many) with the same proportions, the difference in
    proportions would be much less significant.
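Both points on this slide can be checked with a short Python sketch. It computes the ratio of the two conditional proportions, then a chi-square statistic for the full table and for a hypothetical table with the same proportions but one tenth of the counts (fractional counts, for illustration only). The name `chi_square_2x2` is ours; it uses the standard 2x2 shortcut $\chi^2 = N(ad-bc)^2/[(a+b)(c+d)(a+c)(b+d)]$, which is algebraically equal to summing $D_i^2/E_i$ over the four cells.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: smokers, nonsmokers; columns: pregnant in first cycle, not pregnant
smokers = (29, 71)
nonsmokers = (198, 288)

p_smoke = smokers[0] / sum(smokers)           # 0.29
p_nonsmoke = nonsmokers[0] / sum(nonsmokers)  # about 0.407
ratio = p_nonsmoke / p_smoke                  # about 1.4

full = chi_square_2x2(*smokers, *nonsmokers)
# Hypothetical: same proportions, one tenth as many women
tenth = chi_square_2x2(*(x / 10 for x in smokers + nonsmokers))
print(f"ratio={ratio:.2f} chi2(full)={full:.2f} chi2(tenth)={tenth:.2f}")
```

With the proportions held fixed, chi-square grows in proportion to the sample size, so the tenfold smaller study gives about 0.48, far below the 3.84 cutoff. The full-table value (about 4.82) differs slightly from the 4.78 computed on later slides only because those slides round the expected values.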

7
Assessing Statistical Significance of Tables--2
  • We use the chi-square statistic to determine
    whether differences between proportions are
    real or due to chance.
  • The chi-square statistic measures how far the
    observed counts depart from those expected on
    the basis of pure chance. For example, if we
    rolled snake-eyes in craps on every throw, we
    might think the dice were loaded.
  • For the previous example, if there were no difference
    between smokers and nonsmokers, we would expect
    the proportion for each group to be the same as in the
    total:
  • First cycle: 227/586 = 0.387, or 38.7%. On the
    basis of this expected proportion, we calculate
    the expected number for each cell.

8
Calculating the Chisquare Statistic-Expected
Values
  • Since the total number of smokers is 100, there
    would be 100 x 0.387 = 38.7 smokers pregnant in
    the first cycle if there were no difference between
    smokers and nonsmokers.
  • Since the total number pregnant during the
    first cycle is 227, there would be 227 - 38.7 =
    188.3 nonsmokers pregnant during the first cycle.

9
Calculating the Chisquare Statistic--Differences
  • Once the expected values for each cell are
    calculated, we take the difference between the
    observed and expected values for each cell $i$:
    $D_i = \text{observed}_i - \text{expected}_i$
  • Note that in a 2x2 table we only have to calculate one
    difference: the differences in each row and each column
    sum to zero, so all four differences have the same size.
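The expected-value and difference steps can be sketched in Python for the smoking table. Each expected count is (row total x column total) / grand total, which is the same as the 100 x 0.387 calculation on the previous slide; the function name `expected_and_differences` is illustrative only.

```python
def expected_and_differences(table):
    """Return expected counts and observed-minus-expected differences."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # D = observed - expected, cell by cell
    diffs = [[o - e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(table, expected)]
    return expected, diffs


# Rows: smokers, nonsmokers; columns: pregnant in first cycle, not pregnant
observed = [[29, 71], [198, 288]]
expected, diffs = expected_and_differences(observed)
for e_row, d_row in zip(expected, diffs):
    print([round(e, 1) for e in e_row], [round(d, 1) for d in d_row])
```

The printed expected counts are 38.7, 61.3, 188.3 and 297.7, and every difference is plus or minus 9.7, so each row and each column of differences sums to zero.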

10
Calculating the Chisquare Statistic-2x2 Tables
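  • Once the differences, $D_i$, and expected values,
    $E_i$, for each cell are calculated, the
    chi-square statistic is evaluated from the
    formula
  • $\chi^2 = \sum_i D_i^2 / E_i$.
  • Here every $|D_i| = 9.7$, so
    $\chi^2 = (9.7)^2 \left(\frac{1}{38.7} + \frac{1}{61.3} + \frac{1}{188.3} + \frac{1}{297.7}\right) = 4.78$
  • This value is greater than 3.84, the critical
    value for chi-square with 1 degree of freedom
  • at the 5% significance level (95% level), so the
    difference is statistically significant.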
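The slide's arithmetic can be checked with a few lines of Python using the rounded values shown (difference 9.7 and the four expected counts). The 3.84 cutoff is the 5% critical value for chi-square with 1 degree of freedom, hard-coded from a table rather than computed.

```python
# Rounded values from the slide
expected = [38.7, 61.3, 188.3, 297.7]
d = 9.7  # |observed - expected| is the same in every cell of a 2x2 table

# Sum of D_i^2 / E_i over the four cells
chi2 = sum(d ** 2 / e for e in expected)
CRITICAL_1DF_5PCT = 3.84  # chi-square critical value, 1 df, 5% level
print(f"chi-square = {chi2:.2f}, significant: {chi2 > CRITICAL_1DF_5PCT}")
```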

11
Two X Two Tables and Chi-Square Statistics
  • Example: Are males more likely to be
    underachievers? Students were classified as
    underachievers if their high-school grades were below
    the prediction given by a reading test at age 12.

12
2x2 Table and Chisquare Statistic Example,cont.
  • Calculation of the chi-square statistic for the previous
    example
  • 1. Compute expected values:
    boys under: (39/69) x 34 = 19.2
    girls under: 34 - 19.2 = 14.8
    boys over: 39 - 19.2 = 19.8
    girls over: 30 - 14.8 = 15.2
  • 2. Take the difference between observed and
    expected, square it, and divide by expected for
    each cell:
    boys under: (-6.8)^2 / 19.2 = 2.41
    girls under: (6.8)^2 / 14.8 = 3.12
    boys over: (6.8)^2 / 19.8 = 2.34
    girls over: (-6.8)^2 / 15.2 = 3.04
  • 3. Sum the terms calculated in step 2 to get the
    chi-square statistic:
    chi-square = 2.41 + 3.12 + 2.34 + 3.04 = 10.91
  • 4. Compare the calculated chi-square statistic
    with 3.84 (1 degree of freedom) to determine
    significance at the 5% level. In this example,
    10.91 is much greater than 3.84, so the difference
    in proportions is statistically significant.
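Steps 1 to 4 can be reproduced in Python from the margins on the slide (39 boys, 30 girls, 34 underachievers out of 69 students) and the slide's rounded difference of 6.8. The function name `margins_expected` is illustrative; expected counts are rounded to one decimal, as on the slide, before the chi-square terms are formed.

```python
def margins_expected(row_totals, col_totals):
    """Expected count for each cell: row total x column total / grand total."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: boys, girls; columns: underachievers (34), not underachievers (69 - 34)
expected = margins_expected([39, 30], [34, 35])
# Order: boys under, boys over, girls under, girls over
rounded = [round(e, 1) for row in expected for e in row]

d = 6.8  # |observed - expected|, rounded as on the slide
terms = [d ** 2 / e for e in rounded]
chi2 = sum(terms)
print([round(t, 2) for t in terms], round(chi2, 2))
CRITICAL_1DF_5PCT = 3.84  # chi-square critical value, 1 df, 5% level
print("significant:", chi2 > CRITICAL_1DF_5PCT)
```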

13
Risk and Odds
  • Both the risk and the odds give information about
    the likelihood of a particular response on a
    categorical variable, but their numerical values
    differ. Example: the 2x2 table below gives results for
    stopping smoking after eight weeks' use of either
    a nicotine patch or a placebo.

                       Still smoking   Stopped   Total
      Nicotine patch        64            56       120
      Placebo               96            24       120

  • The risk of continuing to smoke after using the nicotine
    patch is 64/120 = 0.533, or 53.3%, compared to the greater
    risk for placebo users, 96/120 = 0.80, or 80%. Thus
    the RISK is the conditional probability of an outcome,
    given the category of the explanatory variable. The ODDS is
    the ratio of the conditional probabilities of the two outcomes
    (here, continuing vs. stopping) within one category, and it can
    be less than or greater than one.

14
Relative Risk and 2x2 Tables
  • Every 2x2 table has an explanatory variable with two
    categories (e.g., on the previous slide, whether a
    nicotine patch or a placebo was used).
  • The ratio of the risks for these two categories is
    called the RELATIVE RISK.
  • Example:
  • RR, the relative risk of continuing to smoke if a
    placebo rather than a nicotine patch is used:

    RR = 0.80 / 0.533 = 96 / 64 = 1.5


15
Odds Ratio and 2x2 Tables
  • The ratio of the odds for the two categories of the
    explanatory variable is called, as might be expected, the
    odds ratio (OR). If the odds are very small, then
    the odds ratio and relative risk are
    approximately equal.
  • Example:
  • stopping smoking with a nicotine patch
  • Odds(placebo) = 96/24 = 4.0
  • Odds(nicotine) = 64/56 = 1.14
  • Thus OR = (96/24) / (64/56) = 3.5
  • (compared to RR = 1.5)
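Risk, relative risk, odds, and odds ratio for the nicotine-patch example can be computed as below. This is a minimal Python sketch; the counts (96 vs. 24 for placebo, 64 vs. 56 for the patch, continued vs. stopped smoking) are the ones used in the odds calculations on the Odds Ratio slide, and the helper names `risk` and `odds` are illustrative.

```python
def risk(event, no_event):
    """Conditional proportion with the outcome: event / (event + no_event)."""
    return event / (event + no_event)


def odds(event, no_event):
    """Odds of the outcome: event / no_event."""
    return event / no_event


# (continued smoking, stopped smoking) for each treatment
placebo = (96, 24)
patch = (64, 56)

rr = risk(*placebo) / risk(*patch)          # relative risk of continuing to smoke
odds_ratio = odds(*placebo) / odds(*patch)  # odds ratio for continuing to smoke
print(f"risk placebo={risk(*placebo):.2f}, risk patch={risk(*patch):.2f}")
print(f"RR={rr:.2f}, OR={odds_ratio:.2f}")
```

Here the OR (3.5) is well above the RR (1.5) because continuing to smoke is a common outcome; the two measures are close only when the outcome is rare, so that the odds are small.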

16
Simpson's Paradox and Hidden Variables
  • Example: 1972 admission rates for graduate
    programs at UC Berkeley. Overall, the
    percentage of women applicants admitted was less
    than the percentage of men applicants admitted, even
    though the women's percentages were higher in most
    individual departments (see Exercise 13). The paradox
    is explained by differences in overall selectivity
    among the programs combined with different proportions
    of men and women applying to each program.
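The reversal is easy to reproduce with small numbers. The Python sketch below uses hypothetical admission counts for two departments (not the Berkeley data): women have the higher admission rate in each department, but because most women apply to the more selective department, their overall rate is lower.

```python
# Hypothetical counts: (admitted, applied) by department and sex
applications = {
    "A (less selective)": {"men": (60, 80), "women": (16, 20)},
    "B (more selective)": {"men": (4, 20), "women": (20, 80)},
}


def rate(admitted, applied):
    """Admission rate as a proportion."""
    return admitted / applied


# Rates within each department
for dept, by_sex in applications.items():
    print(dept, {sex: f"{rate(*c):.0%}" for sex, c in by_sex.items()})

# Rates after pooling the departments
overall = {}
for sex in ("men", "women"):
    admitted = sum(d[sex][0] for d in applications.values())
    applied = sum(d[sex][1] for d in applications.values())
    overall[sex] = rate(admitted, applied)
print("overall", {sex: f"{r:.0%}" for sex, r in overall.items()})
```

Within departments women lead (80% vs. 75%, 25% vs. 20%), yet pooled they trail (36% vs. 64%); the department is the hidden variable.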

17
Goodness of Fit--Comparing Observed with
Theoretical Proportions
  • Procedure
  • tabulate the observed frequencies, $N_i$, for each
    category
  • tabulate the expected (theoretical) frequencies, $E_i$
    (the total count times the theoretical proportion for
    category $i$)
  • take the difference between corresponding observed
    and theoretical frequencies, $D_i = N_i - E_i$
  • calculate the chi-square statistic from the formula
  • $\chi^2 = \sum_i D_i^2 / E_i$,
  • with $df = k - 1$ degrees of freedom,
    where $k$ is the number of categories (one
    proportion per category)
  • Examples: Exercises 10.14 and 10.15 (p. 402).
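A minimal Python sketch of the goodness-of-fit procedure, assuming hypothetical counts from 60 rolls of a die tested against equal theoretical proportions (1/6 each). The function `goodness_of_fit` is illustrative; it returns the statistic and its degrees of freedom, and the 5% critical value for 5 df (11.07) is hard-coded from a chi-square table rather than computed.

```python
def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit statistic and df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in proportions]               # E_i
    diffs = [o - e for o, e in zip(observed, expected)]   # D_i = N_i - E_i
    chi2 = sum(d ** 2 / e for d, e in zip(diffs, expected))
    return chi2, len(observed) - 1


# Hypothetical counts of faces 1-6 in 60 rolls
counts = [8, 9, 12, 11, 6, 14]
chi2, df = goodness_of_fit(counts, [1 / 6] * 6)
CRITICAL_5DF_5PCT = 11.07  # chi-square critical value, 5 df, 5% level
print(f"chi-square = {chi2:.2f}, df = {df}, reject fair die: {chi2 > CRITICAL_5DF_5PCT}")
```

Each expected count is 10, the squared differences sum to 42, and the statistic of 4.2 is well below 11.07, so these hypothetical rolls give no evidence that the die is loaded.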

18
Comparing Several Proportions, Categories
  • Procedure
  • Set up a k by r contingency table, where k is the number
    of explanatory categories and r is the number of response
    categories
  • tabulate the count for each cell in the table and the
    marginal totals for each explanatory category, $N_k$, each
    response category, $N_r$, and the grand total, $N$.
  • For the cell in the table corresponding to the
    kth explanatory category and rth response category, the
    expected number (if the response proportions were the same
    for all explanatory categories) is $E_{kr} = N_k N_r / N$
  • Calculate the difference $D_{kr} = N_{kr} - E_{kr}$ and
    calculate the chi-square statistic from the formula
  • $\chi^2 = \sum_{k,r} D_{kr}^2 / E_{kr}$
  • Degrees of freedom: $df = (k-1)(r-1)$
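The k-by-r procedure can be sketched in Python as a single function. It is an illustrative implementation (the name `contingency_chi_square` is ours) that takes the table as a list of rows, one row per explanatory category. Applied to the 2x2 smoking table from earlier, it gives about 4.82 with 1 df; the 4.78 on the earlier slide comes from rounded expected values.

```python
def contingency_chi_square(table):
    """Chi-square statistic and df for a k x r table (rows = explanatory categories)."""
    row_totals = [sum(row) for row in table]          # N_k
    col_totals = [sum(col) for col in zip(*table)]    # N_r
    n = sum(row_totals)                               # N
    chi2 = 0.0
    for row, n_k in zip(table, row_totals):
        for n_kr, n_r in zip(row, col_totals):
            e_kr = n_k * n_r / n                      # expected count E_kr
            chi2 += (n_kr - e_kr) ** 2 / e_kr         # D_kr^2 / E_kr
    df = (len(table) - 1) * (len(col_totals) - 1)     # (k - 1)(r - 1)
    return chi2, df


# Rows: smokers, nonsmokers; columns: pregnant in first cycle, not pregnant
chi2, df = contingency_chi_square([[29, 71], [198, 288]])
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The same function accepts any number of rows and columns, so a table with more categories only changes the input list.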

19
Comparing Several Proportions, Categories
  • Example: Exercise 10.18, p. 408
  • A study of potential age discrimination considers
    promotions among middle managers in a large
    company. The data are:

                     Under 30   30 to 39   40 to 49   50 and over   Total
      Promoted           9         29         32          10           80
      Not promoted      41         41         48          40          170