Review of observational study design and basic statistics for contingency tables

Slides: 71
Provided by: Joh74
Learn more at: https://web.stanford.edu

1
Review of observational study design and basic
statistics for contingency tables
2
(No Transcript)
3
(No Transcript)
4
  • Coffee Chronicles
  • BY MELISSA AUGUST, ANN MARIE BONARDI, VAL
    CASTRONOVO, MATTHEW
  • JOE'S BLOWS: Last week researchers reported that
    coffee might help prevent Parkinson's disease. So
    is the caffeine bean good for you or not? Over
    the years, studies haven't exactly been clear.
  • According to scientists, too much coffee may
    cause...
  • 1986 --phobias, --panic attacks
  • 1990 --heart attacks, --stress, --osteoporosis
  • 1991 --underweight babies, --hypertension
  • 1992 --higher cholesterol
  • 1993, 08 --miscarriages
  • 1994 --intensified stress
  • 1995 --delayed conception
  • But scientists say coffee also may help
    prevent...
  • 1988 --asthma
  • 1990 --colon and rectal cancer,...
  • 2004 --Type II diabetes (6 cups per day!)
  • 2006 --alcohol-induced liver damage
  • 2007 --skin cancer

5
Medical Studies
The General Idea
Evaluate whether a risk factor (or preventative
factor) increases (or decreases) your risk for an
outcome (usually disease, death or intermediary
to disease).
6
Observational vs. Experimental Studies
Observational studies: the population is
observed without any interference by the
investigator.
Experimental studies: the investigator tries to
control the environment in which the hypothesis
is tested (the randomized, double-blind clinical
trial is the gold standard).
7
Limitation of observational research: confounding
  • Confounding: risk factors don't happen in
    isolation, except in a controlled experiment.
  • Example: In a case-control study of a salmonella
    outbreak, tomatoes were identified as the source
    of the infection. But the association was
    spurious. Tomatoes are often eaten with serrano
    and jalapeno peppers, which turned out to be the
    true source of infection.
  • Example: Breastfeeding has been linked to higher
    IQ in infants, but the association could be due
    to confounding by socioeconomic status. Women who
    breastfeed tend to be better educated and have
    better prenatal care, which may explain the
    higher IQ in their infants.

8
Confounding: A major problem for observational
studies
9
Why Observational Studies?
  • Cheaper
  • Faster
  • Can examine long-term effects
  • Hypothesis-generating
  • Sometimes, experimental studies are not ethical
    (e.g., randomizing subjects to smoke)

10
Possible Observational Study Designs
  • Cross-sectional studies
  • Cohort studies
  • Case-control studies

11
Cross-Sectional (Prevalence) Studies
  • Measure disease and exposure on a random sample
    of the population of interest. Are they
    associated?
  • Marginal probabilities of exposure AND disease
    are valid, but the design only measures
    association at a single time point.

12
The 2x2 Table
N
13
Example: cross-sectional study
  • Relationship between atherosclerosis and
    late-life depression (Tiemeier et al. Arch Gen
    Psychiatry, 2004).
  • Methods: Researchers measured the prevalence of
    coronary artery calcification (atherosclerosis)
    and the prevalence of depressive symptoms in a
    large cohort of elderly men and women in
    Rotterdam (n = 1920).

14
Example: cross-sectional study
P(D) = Prevalence of depression (subthreshold
or depressive disorder) = (20+13+12+9+11+16)/1920
= 4.2%
P(E) = Prevalence of atherosclerosis (coronary
calcification >500) = (511+12+16)/1920 = 28.1%
15
The 2x2 table
P(depression) = 81/1920 = 4.2%
P(atherosclerosis) = 539/1920 = 28.1%
P(depression | atherosclerosis) = 28/539 = 5.2%
16
Difference of proportions Z-test
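The formula for this slide did not survive in the transcript. As a minimal sketch, assuming the usual pooled two-proportion Z-test is what was shown, the counts from the example (28 of 539 with calcification >500, and the remaining 53 of 1381) give a Z statistic of about 1.33. The function and variable names are illustrative, not from the slides.

```python
import math

# Counts from the cross-sectional example: 28 of the 539 people with
# calcification >500 had depressive symptoms or disorder; the remaining
# 81 - 28 = 53 depressed people are among the other 1920 - 539 = 1381.
exposed_cases, exposed_n = 28, 539
unexposed_cases, unexposed_n = 81 - 28, 1920 - 539


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p) for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


z, p_value = two_proportion_z(exposed_cases, exposed_n,
                              unexposed_cases, unexposed_n)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```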
17
Or, use relative risk (risk ratio)
Interpretation: those with coronary calcification
are 37% more likely to have depression (not
significant).
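A short sketch of the risk-ratio calculation, using the same counts as the prevalences above. With unrounded counts the ratio comes out near 1.35; the 37% quoted on the slide appears consistent with dividing the rounded prevalences (5.2% / 3.8%), though the transcript does not show the working. The function name is illustrative.

```python
def risk_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)


# 28/539 depressed among calcification >500; 53/1381 among everyone else.
rr = risk_ratio(28, 539, 53, 1381)
print(f"RR = {rr:.2f}")
```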
18
Or, use chi-square test
Observed
Expected
19
Chi-square test
Note: 1.77 = 1.33^2
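The observed and expected tables for this slide were not transcribed. The sketch below assumes the 2x2 table is the >500 calcification group versus everyone else, with depression defined as subthreshold symptoms or clinical disorder, and computes the Pearson chi-square without continuity correction. Under those assumptions the statistic matches the square of the Z statistic from the difference-of-proportions test.

```python
import math

# Rows: calcification >500, calcification <=500.
# Columns: depressed (subthreshold or clinical), not depressed.
observed = [[28, 511],
            [53, 1328]]


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_square(observed)
# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```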
20
Chi-square test also works for bigger contingency
tables (RxC)
21
Chi-square test also works for bigger contingency
tables (RxC)
Coronary calcification   No depression   Subthreshold depressive symptoms   Clinical depressive disorder
0-100                    865             20                                 9
101-500                  463             13                                 11
>500                     511             12                                 16
22
Observed
Coronary calcification   No depression   Subthreshold depressive symptoms   Clinical depressive disorder   Total
0-100                    865             20                                 9                              894
101-500                  463             13                                 11                             487
>500                     511             12                                 16                             539
Total                    1839            45                                 36                             1920
Expected
Coronary calcification   No depression                 Subthreshold depressive symptoms   Clinical depressive disorder
0-100                    894*1839/1920 = 856.3         894*45/1920 = 21                   894-(21+856.3) = 16.7
101-500                  487*1839/1920 = 466.5         487*45/1920 = 11.4                 487-(466.5+11.4) = 9.1
>500                     1839-(856.3+466.5) = 516.2    45-(21+11.4) = 12.6                36-(16.7+9.1) = 10.2
23
Chi-square test
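The calculation for this slide is missing from the transcript. As a sketch, the expected counts and the Pearson chi-square for the 3x3 table can be recomputed from the observed counts. The tail probability below uses a closed form that holds only for 4 degrees of freedom, which is (3-1)*(3-1) for this table; all names are illustrative.

```python
import math

# Observed counts: rows are calcification 0-100, 101-500, >500;
# columns are no depression, subthreshold symptoms, clinical disorder.
observed = [[865, 20, 9],
            [463, 13, 11],
            [511, 12, 16]]


def expected_counts(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 4  # the closed form on the next line is specific to 4 df
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-square = {chi2:.2f} on {df} df, p = {p_value:.3f}")
```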
24
Cause and effect?
depression in elderly
atherosclerosis
25
Confounding?
depression in elderly
atherosclerosis
26
Cross-Sectional Studies
  • Advantages
  • cheap and easy
  • generalizable
  • good for characteristics that (generally) don't
    change like genes or gender
  • Disadvantages
  • difficult to determine cause and effect
  • problematic for rare diseases and exposures

27
2. Cohort studies
  • Sample on exposure status and track disease
    development (for rare exposures)
  • Marginal probabilities (and rates) of developing
    disease for exposure groups are valid.

28
Example: The Framingham Heart Study
  • The Framingham Heart Study was established in
    1948, when 5209 residents of Framingham, Mass,
    aged 28 to 62 years, were enrolled in a
    prospective epidemiologic cohort study.
  • Health and lifestyle factors were measured (blood
    pressure, weight, exercise, etc.).
  • Interim cardiovascular events were ascertained
    from medical histories, physical examinations,
    ECGs, and review of interim medical records.

29
Example 2: Johns Hopkins Precursors
Study (medical students 1948 through 1964)
http://www.jhu.edu/jhumag/0601web/study.html
From the Johns Hopkins Magazine website (URL
above).
30
Cohort Studies
Disease
Disease-free
Target population
Disease
Disease-free
TIME
31
The Risk Ratio, or Relative Risk (RR)
32
Hypothetical Data

33
Advantages/Limitations: Cohort Studies
  • Advantages
  • Allows you to measure true rates and risks of
    disease for the exposed and the unexposed groups.
  • Temporality is correct (easier to infer cause and
    effect).
  • Can be used to study multiple outcomes.
  • Prevents bias in the ascertainment of exposure
    that may occur after a person develops a disease.
  • Disadvantages
  • Can be lengthy and costly! 60 years for
    Framingham.
  • Loss to follow-up is a problem (especially if
    non-random).
  • Selection bias: Participation may be associated
    with exposure status for some exposures.

34
Case-Control Studies
  • Sample on disease status and ask retrospectively
    about exposures (for rare diseases)
  • Marginal probabilities of exposure for cases and
    controls are valid.
  • Doesn't require knowledge of the absolute risks
    of disease
  • For rare diseases, can approximate relative risk

35
Case-Control Studies
Exposed in past
  • Disease
  • (Cases)

Not exposed
Target population
Exposed
No Disease (Controls)
Not Exposed
36
Example: the AIDS epidemic in the early 1980s
  • Early case-control studies among AIDS cases and
    matched controls indicated that AIDS was
    transmitted by sexual contact or blood products.
  • In 1982, an early case-control study matched AIDS
    cases to controls and found a positive
    association between amyl nitrites (poppers) and
    AIDS: odds ratio of 8.6 (Marmor et al. 1982).
    This is an example of confounding.

37
Case-Control Studies in History
  • In 1843, Guy compared occupations of men with
    pulmonary consumption to those of men with other
    diseases (Lilienfeld and Lilienfeld 1979).
  • Case-control studies identified associations
    between lip cancer and pipe smoking (Broders
    1920), breast cancer and reproductive history
    (Lane-Claypon 1926) and between oral cancer and
    pipe smoking (Lombard and Doering 1928). All
    rare diseases.
  • Case-control studies identified an association
    between smoking and lung cancer in the 1950s.

38
Case-control example
  • A study of the relation between body mass index
    and the incidence of age-related macular
    degeneration (Moeini et al. Br. J. Ophthalmol,
    2005).
  • Methods: Researchers compared 50 Iranian patients
    with confirmed age-related macular degeneration
    and 80 control subjects with respect to BMI,
    smoking habits, hypertension, and diabetes. The
    researchers were specifically interested in the
    relationship of BMI to age-related macular
    degeneration.

39
Results
Table 2  Comparison of body mass index (BMI) in
case and control groups

40
Corresponding 2x2 Table
50
80
What is the risk ratio here? Tricky: there is no
risk ratio, because we cannot calculate the risk
of disease!
41
The odds ratio
  • We cannot calculate a risk ratio from a
    case-control study.
  • BUT, we can calculate a measure called the odds
    ratio

42
Odds vs. Risk
If the risk is...   Then the odds are...
1/2 (50%)           1:1
3/4 (75%)           3:1
1/10 (10%)          1:9
1/100 (1%)          1:99
Note: An odds is always higher than its
corresponding probability (for any probability
strictly between 0 and 100%); at 100% the odds
are undefined.
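The conversion behind this table is odds = risk / (1 - risk). A small sketch using exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def odds_from_risk(risk):
    """Convert a probability (risk) into odds = risk / (1 - risk)."""
    return risk / (1 - risk)


for risk in (Fraction(1, 2), Fraction(3, 4), Fraction(1, 10), Fraction(1, 100)):
    odds = odds_from_risk(risk)
    # Print odds in the "for:against" form used on the slide.
    print(f"risk {risk} -> odds {odds.numerator}:{odds.denominator}")
```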
43
The Odds Ratio (OR)
a+b = cases
c+d = controls
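The odds ratio formula itself is missing from the transcript. The sketch below assumes the conventional layout (a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls), for which OR = ad/bc. The cell counts are hypothetical: they are chosen only so that the margins (50 cases, 80 controls) and the roughly 43% decrease in odds quoted for the macular degeneration example are reproduced, and they are not taken from the transcript.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                  exposed   unexposed
        cases        a          b
        controls     c          d
    """
    return (a * d) / (b * c)


# HYPOTHETICAL counts, for illustration only (not from the transcript).
a, b = 27, 23   # overweight cases, normal-weight cases   (a + b = 50)
c, d = 54, 26   # overweight controls, normal-weight controls (c + d = 80)

or_estimate = odds_ratio(a, b, c, d)
print(f"OR = {or_estimate:.2f}")
```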
44
The Odds Ratio (OR)
45
Proof via Bayes Rule (optional)


46
The Odds Ratio (OR)
47
The Odds Ratio (OR)
48
The Odds Ratio (OR)
Can be interpreted as: Overweight people have a
43% decrease in their ODDS of age-related macular
degeneration (not statistically significant
here).
49
The odds ratio is a good approximation of the
risk ratio if the disease is rare.
If the disease is rare (affecting <10% of the
population), then OR ≈ RR.
WHY? If the disease is rare, the probability of
it NOT happening is close to 1, and the odds are
close to the risk.
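The approximation follows in one line from the definitions. The symbols p_1 and p_0 (risk of disease in the exposed and the unexposed) are introduced here for illustration and do not appear in the slides:

```latex
\[
\mathrm{OR}
  = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
  = \underbrace{\frac{p_1}{p_0}}_{\mathrm{RR}} \cdot \frac{1-p_0}{1-p_1}
  \approx \mathrm{RR}
  \quad \text{when } p_0,\, p_1 \ll 1 .
\]
```

For instance, with risks of 2% and 1%, RR = 2 and OR = (0.02/0.98)/(0.01/0.99), which is about 2.02.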
50
The rare disease assumption
51
The odds ratio vs. the risk ratio
Rare Outcome
1.0 (null)
Common Outcome
1.0 (null)
52
When is the OR a good approximation of the RR?
53
Advantages/Limitations: Case-control studies
  • Advantages
  • Cheap and fast
  • Efficient for rare diseases
  • Disadvantages
  • Getting comparable controls is often tricky
  • Temporality is a problem (did risk factor cause
    disease or disease cause risk factor?)
  • Recall bias

54
Inferences about the odds ratio
55
Properties of the OR (simulation)
(50 cases/50 controls/20 exposed)
If the odds ratio = 1.0, then with 50 cases and 50
controls, of whom 20 are exposed, this is the
expected variability of the sample OR (note the
right skew).
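The simulation plot is not in the transcript. One way to reproduce the idea is sketched below; it assumes all margins are fixed (50 cases, 50 controls, 20 exposed) and that exposure is unrelated to disease, which may differ from how the original simulation was set up. Taking logs makes the distribution roughly symmetric around 0, while the raw OR has a mean above its median.

```python
import math
import random
import statistics


def simulate_sample_ors(n_cases=50, n_controls=50, n_exposed=20,
                        reps=10_000, seed=0):
    """Sample ORs when the true OR is 1.0 (exposure independent of disease)."""
    rng = random.Random(seed)
    total = n_cases + n_controls
    ors = []
    for _ in range(reps):
        # Choose the cases at random; subjects 0..n_exposed-1 are the exposed.
        cases = rng.sample(range(total), n_cases)
        a = sum(1 for person in cases if person < n_exposed)  # exposed cases
        b = n_cases - a                                       # unexposed cases
        c = n_exposed - a                                     # exposed controls
        d = n_controls - c                                    # unexposed controls
        if 0 in (a, b, c, d):
            continue  # skip very rare tables where the OR is 0 or undefined
        ors.append((a * d) / (b * c))
    return ors


ors = simulate_sample_ors()
log_ors = [math.log(x) for x in ors]
print(f"OR:    mean {statistics.mean(ors):.2f}, "
      f"median {statistics.median(ors):.2f}")
print(f"ln OR: mean {statistics.mean(log_ors):.2f}, "
      f"median {statistics.median(log_ors):.2f}")
```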
56
Properties of the lnOR
57
Hypothetical Data
30
30
58
When can the OR mislead?
59
Example: Does dementia predict death?
  • Dementia: The leading predictor of death in a
    defined elderly population. Neurology 2004; 62:
    1156-1162
  • Among patients with dementia: 291/355 (82%) died
  • Among patients without dementia: 947/4328 (22%)
    died

60
Dementia study
  • Authors report OR = 16.23 (12.27, 21.48)
  • But the RR = 3.72
  • Fortunately, they do not dwell on the OR, but it
    could mislead if not interpreted correctly
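The gap between the two measures can be checked from the counts on the previous slide. In the sketch below, the crude odds ratio with a log-based (Woolf) 95% interval reproduces the reported 16.23 (12.27, 21.48), which suggests, though the transcript does not confirm, that the authors' figures are unadjusted. The risk ratio from unrounded counts is about 3.75; the 3.72 on the slide is consistent with dividing the rounded percentages (82% / 22%).

```python
import math

# Counts from the dementia example: deaths / group size.
died_dementia, n_dementia = 291, 355
died_no_dementia, n_no_dementia = 947, 4328

a, b = died_dementia, n_dementia - died_dementia            # died, survived
c, d = died_no_dementia, n_no_dementia - died_no_dementia   # died, survived

risk_ratio = (a / (a + b)) / (c / (c + d))
odds_ratio = (a * d) / (b * c)

# Woolf (log-based) 95% confidence interval for the odds ratio.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"RR = {risk_ratio:.2f}")
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```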

61
Better to give OR or RR?
From an RCT (prospective!) of a new diet drug,
the authors showed the following table
62
Better to give OR or RR?
63
Summary of statistical tests for contingency
tables
Table Size   Test or measures of association
2x2          risk ratio (cohort or cross-sectional studies); odds ratio (case-control studies); Chi-square; difference in proportions; Fisher's Exact test (cell size less than 5)
RxC          Chi-square; Fisher's Exact test (expected cell size <5)
64
Fisher's Exact Test
65
Fisher's Tea-tasting experiment
Claim: Fisher's colleague (call her "Cathy")
claimed that, when drinking tea, she could
distinguish whether milk or tea was added to the
cup first. To test her claim, Fisher designed
an experiment in which she tasted 8 cups of tea
(4 cups had milk poured first, 4 had tea poured
first).
Null hypothesis: Cathy's guessing
abilities are no better than chance.
Alternative hypotheses:
Right-tail: She guesses right more than expected by chance.
Left-tail: She guesses wrong more than expected by chance.
66
Fisher's Tea-tasting experiment
Experimental Results
67
Fisher's Exact Test
Step 1: Identify tables that are as extreme or
more extreme than what actually happened. Here
she identified 3 out of 4 of the
milk-poured-first teas correctly. Is that good
luck or real talent? The only way she could have
done better is if she identified 4 of 4 correct.
68
Fisher's Exact Test
Step 2: Calculate the probability of the tables
(assuming fixed marginals)
69
Step 3: To get the left-tail and right-tail
p-values, consider the probability mass
function.
Probability mass function of X, where
X = the number of correct identifications of the
cups with milk-poured-first
SAS also gives a two-sided p-value, which is
calculated by adding up all probabilities in the
distribution that are less than or equal to the
probability of the observed table (equal or more
extreme). Here: 0.229 + 0.014 + 0.229 + 0.014 = 0.4857
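The three steps can be sketched with the hypergeometric distribution, which gives the probability of each possible table when the margins are fixed (4 milk-first cups, 4 tea-first cups, 4 cups labelled milk-first). The function and variable names are illustrative.

```python
from math import comb


def pmf(k, n_milk=4, n_tea=4, n_labelled_milk=4):
    """P(X = k): k of the cups she labels milk-first really are milk-first."""
    return (comb(n_milk, k) * comb(n_tea, n_labelled_milk - k)
            / comb(n_milk + n_tea, n_labelled_milk))


probs = [pmf(k) for k in range(5)]   # X can be 0, 1, 2, 3 or 4
observed = 3                         # she got 3 of the 4 milk-first cups

right_tail = sum(probs[observed:])        # P(X >= 3)
left_tail = sum(probs[:observed + 1])     # P(X <= 3)
# Two-sided: add every table no more probable than the observed one
# (the small tolerance guards against floating-point ties).
two_sided = sum(p for p in probs if p <= probs[observed] + 1e-12)

print(f"P(X=3) = {probs[3]:.3f}, P(X=4) = {probs[4]:.3f}")
print(f"right-tail p = {right_tail:.4f}")
print(f"left-tail p  = {left_tail:.4f}")
print(f"two-sided p  = {two_sided:.4f}")
```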
70
Summary of statistical tests for contingency
tables
Table Size   Test or measures of association
2x2          risk ratio (cohort or cross-sectional study); odds ratio (case-control study); Chi-square; difference in proportions; Fisher's Exact test (cell size less than 5)
RxC          Chi-square; Fisher's Exact test (expected cell size <5)