Title: Review of observational study design and basic statistics for contingency tables
4. Coffee Chronicles
- BY MELISSA AUGUST, ANN MARIE BONARDI, VAL CASTRONOVO, MATTHEW
- JOE'S BLOWS: Last week researchers reported that coffee might help prevent Parkinson's disease. So is the caffeine bean good for you or not? Over the years, studies haven't exactly been clear.
- According to scientists, too much coffee may cause...
  - 1986: phobias, panic attacks
  - 1990: heart attacks, stress, osteoporosis
  - 1991: underweight babies, hypertension
  - 1992: higher cholesterol
  - 1993, 2008: miscarriages
  - 1994: intensified stress
  - 1995: delayed conception
- But scientists say coffee also may help prevent...
  - 1988: asthma
  - 1990: colon and rectal cancer, ...
  - 2004: Type II diabetes (6 cups per day!)
  - 2006: alcohol-induced liver damage
  - 2007: skin cancer
5. Medical Studies
The General Idea: Evaluate whether a risk factor (or preventative factor) increases (or decreases) your risk for an outcome (usually disease, death, or an intermediate step on the path to disease).
6. Observational vs. Experimental Studies
Observational studies: the population is observed without any interference by the investigator.
Experimental studies: the investigator tries to control the environment in which the hypothesis is tested (the randomized, double-blind clinical trial is the gold standard).
7. Limitation of observational research: confounding
- Confounding: risk factors don't happen in isolation, except in a controlled experiment.
- Example: In a case-control study of a salmonella outbreak, tomatoes were identified as the source of the infection. But the association was spurious. Tomatoes are often eaten with serrano and jalapeno peppers, which turned out to be the true source of infection.
- Example: Breastfeeding has been linked to higher IQ in infants, but the association could be due to confounding by socioeconomic status. Women who breastfeed tend to be better educated and have better prenatal care, which may explain the higher IQ in their infants.
8. Confounding: A major problem for observational studies
9. Why Observational Studies?
- Cheaper
- Faster
- Can examine long-term effects
- Hypothesis-generating
- Sometimes, experimental studies are not ethical
(e.g., randomizing subjects to smoke)
10. Possible Observational Study Designs
- Cross-sectional studies
- Cohort studies
- Case-control studies
11. Cross-Sectional (Prevalence) Studies
- Measure disease and exposure on a random sample of the population of interest. Are they associated?
- Marginal probabilities of exposure AND disease are valid, but the design only measures association at a single time point.
12. The 2x2 Table
[Figure: 2x2 table of exposure by disease status, with grand total N]
13. Example cross-sectional study
- Relationship between atherosclerosis and late-life depression (Tiemeier et al., Arch Gen Psychiatry, 2004).
- Methods: Researchers measured the prevalence of coronary artery calcification (atherosclerosis) and the prevalence of depressive symptoms in a large cohort of elderly men and women in Rotterdam (n = 1920).
14. Example cross-sectional study
P(D) = prevalence of depression (sub-threshold symptoms or depressive disorder) = (20 + 13 + 12 + 9 + 11 + 16)/1920 = 81/1920 = 4.2%
P(E) = prevalence of atherosclerosis (coronary calcification > 500) = (511 + 12 + 16)/1920 = 539/1920 = 28.1%
15. The 2x2 table
P(depression) = 81/1920 = 4.2%
P(atherosclerosis) = 539/1920 = 28.1%
P(depression | atherosclerosis) = 28/539 = 5.2%
P(depression | no atherosclerosis) = (81 - 28)/(1920 - 539) = 53/1381 = 3.8%
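A minimal Python sketch (not part of the original slides) that collapses the 3x3 Rotterdam table into this 2x2 table, treating calcification > 500 as the exposure and any depression (sub-threshold or clinical) as the outcome, then recomputes the probabilities above. The variable names are illustrative only.

```python
# Collapse the 3x3 Rotterdam table (calcification band x depression category)
# into a 2x2 table: exposure = calcification > 500, outcome = any depression.
rows = {
    "0-100": (865, 20, 9),      # (no depression, sub-threshold, clinical disorder)
    "101-500": (463, 13, 11),
    ">500": (511, 12, 16),
}

exposed_dep = exposed_nodep = unexposed_dep = unexposed_nodep = 0
for band, (none, sub, clinical) in rows.items():
    if band == ">500":
        exposed_dep += sub + clinical
        exposed_nodep += none
    else:
        unexposed_dep += sub + clinical
        unexposed_nodep += none

n = exposed_dep + exposed_nodep + unexposed_dep + unexposed_nodep
p_dep = (exposed_dep + unexposed_dep) / n                                  # P(D)
p_athero = (exposed_dep + exposed_nodep) / n                               # P(E)
p_dep_given_athero = exposed_dep / (exposed_dep + exposed_nodep)           # P(D | E)
p_dep_given_no_athero = unexposed_dep / (unexposed_dep + unexposed_nodep)  # P(D | not E)

print(exposed_dep, exposed_nodep, unexposed_dep, unexposed_nodep, n)  # 28 511 53 1328 1920
print(f"P(D)={p_dep:.3f}  P(E)={p_athero:.3f}  "
      f"P(D|E)={p_dep_given_athero:.3f}  P(D|not E)={p_dep_given_no_athero:.3f}")
```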
16. Difference of proportions Z-test
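The slide's formula did not survive extraction. As a sketch, assuming the usual pooled two-sample Z statistic, the comparison is 28/539 depressed among those with calcification > 500 versus (81 - 28)/(1920 - 539) = 53/1381 among everyone else. The function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 == p2, plus a two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z(28, 539, 53, 1381)
print(f"Z = {z:.2f}, two-sided p = {p:.2f}")
```

Z comes out at about 1.33 with p of roughly 0.18, so the difference in prevalence is not statistically significant.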
17. Or, use relative risk (risk ratio)
Interpretation: those with coronary calcification > 500 are 37% more likely to have depression (not statistically significant).
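A sketch of the risk ratio with a large-sample 95% confidence interval computed on the log scale (the interval method is an assumption; the slide's own calculation was not preserved). With unrounded proportions the ratio is about 1.35; the 37% figure corresponds to the rounded proportions 5.2%/3.8%. The interval includes 1, which is consistent with "not significant".

```python
import math

def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio with a large-sample confidence interval built on the log scale."""
    rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    # standard error of ln(RR)
    se_log = math.sqrt(1 / cases_exposed - 1 / n_exposed
                       + 1 / cases_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, (lower, upper)

rr, (lo, hi) = risk_ratio(28, 539, 53, 1381)
print(f"RR = {rr:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```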
18. Or, use chi-square test
[Tables: observed counts and expected counts]
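19. Chi-square test
Note: 1.77 = 1.33², the square of the Z statistic from the difference-of-proportions test (for a 2x2 table, chi-square = Z²).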
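A sketch of the Pearson chi-square statistic for the same 2x2 table (calcification > 500 vs. not in the rows, depressed vs. not in the columns). It reproduces the 1.77 on the slide. The helper name `chi_square` is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic and df for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# rows: calcification > 500, calcification <= 500; columns: depressed, not depressed
stat, df = chi_square([[28, 511], [53, 1328]])
print(f"chi-square = {stat:.2f} on {df} df")  # chi-square = 1.77 on 1 df
```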
20. Chi-square test also works for bigger contingency tables (RxC)
| Coronary calcification | No depression | Sub-threshold depressive symptoms | Clinical depressive disorder |
|---|---|---|---|
| 0-100 | 865 | 20 | 9 |
| 101-500 | 463 | 13 | 11 |
| >500 | 511 | 12 | 16 |
22. Observed and expected counts
Observed
| Coronary calcification | No depression | Sub-threshold depressive symptoms | Clinical depressive disorder | Total |
|---|---|---|---|---|
| 0-100 | 865 | 20 | 9 | 894 |
| 101-500 | 463 | 13 | 11 | 487 |
| >500 | 511 | 12 | 16 | 539 |
| Total | 1839 | 45 | 36 | 1920 |

Expected
| Coronary calcification | No depression | Sub-threshold depressive symptoms | Clinical depressive disorder |
|---|---|---|---|
| 0-100 | 894 × 1839/1920 = 856.3 | 894 × 45/1920 = 21 | 894 - (21 + 856.3) = 16.7 |
| 101-500 | 487 × 1839/1920 = 466.5 | 487 × 45/1920 = 11.4 | 487 - (466.5 + 11.4) = 9.1 |
| >500 | 1839 - (856.3 + 466.5) = 516.2 | 45 - (21 + 11.4) = 12.6 | 36 - (16.7 + 9.1) = 10.2 |
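The same computation extends to the 3x3 table. This sketch recomputes every expected count directly as row total × column total / N, without the intermediate rounding and subtraction used in the table above, so a few cells differ slightly (for example 16.76 rather than 16.7). It also adds a p-value using the closed-form chi-square upper tail, which holds only for an even number of degrees of freedom (here (3 - 1)(3 - 1) = 4). The statistic and p-value are outputs of this code, not figures from the slides.

```python
import math

def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_rxc(observed):
    """Pearson chi-square statistic, degrees of freedom, and expected counts."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected

def chi2_sf_even_df(x, df):
    """P(chi-square_df >= x); this closed form is valid only for EVEN df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

observed = [
    [865, 20, 9],   # calcification 0-100: no depression, sub-threshold, clinical
    [463, 13, 11],  # calcification 101-500
    [511, 12, 16],  # calcification > 500
]
stat, df, expected = chi_square_rxc(observed)
for row in expected:
    print([round(e, 1) for e in row])
print(f"chi-square = {stat:.2f} on {df} df, p = {chi2_sf_even_df(stat, df):.3f}")
```

The statistic is about 7.88 on 4 df (p of roughly 0.10). Almost all of it comes from the "clinical depressive disorder" column: fewer cases than expected in the 0-100 group and more than expected in the > 500 group.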
23. Chi-square test
24. Cause and effect?
[Figure: diagram relating atherosclerosis and depression in the elderly]
25. Confounding?
[Figure: diagram of a possible confounder of the association between atherosclerosis and depression in the elderly]
26. Cross-Sectional Studies
- Advantages
  - cheap and easy
  - generalizable
  - good for characteristics that (generally) don't change, like genes or gender
- Disadvantages
  - difficult to determine cause and effect
  - problematic for rare diseases and exposures
27. Design 2: Cohort studies
- Sample on exposure status and track disease development (for rare exposures).
- Marginal probabilities (and rates) of developing disease for exposure groups are valid.
28. Example: The Framingham Heart Study
- The Framingham Heart Study was established in 1948, when 5209 residents of Framingham, Mass., aged 28 to 62 years, were enrolled in a prospective epidemiologic cohort study.
- Health and lifestyle factors were measured (blood pressure, weight, exercise, etc.).
- Interim cardiovascular events were ascertained from medical histories, physical examinations, ECGs, and review of interim medical records.
29. Example 2: Johns Hopkins Precursors Study (medical students 1948 through 1964)
http://www.jhu.edu/jhumag/0601web/study.html
From the Johns Hopkins Magazine website (URL above).
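30. Cohort Studies
[Figure: cohort design diagram. Groups drawn from the target population are followed over TIME, and each group is split into those who develop disease and those who remain disease-free]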
31. The Risk Ratio, or Relative Risk (RR)
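The slide's formula was not preserved. For reference, this is the standard definition, written in the document's P(D), P(E) notation with a bar meaning "not": the risk ratio compares the probability of disease among the exposed with that among the unexposed.

$$
\mathrm{RR} = \frac{P(D \mid E)}{P(D \mid \bar{E})}
= \frac{(\text{exposed with disease}) \,/\, (\text{all exposed})}{(\text{unexposed with disease}) \,/\, (\text{all unexposed})}
$$

RR = 1 means no association; RR > 1 means a higher risk among the exposed, and RR < 1 a lower risk.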
32. Hypothetical Data
33. Advantages/Limitations: Cohort Studies
- Advantages
  - Allows you to measure true rates and risks of disease for the exposed and the unexposed groups.
  - Temporality is correct (easier to infer cause and effect).
  - Can be used to study multiple outcomes.
  - Prevents bias in the ascertainment of exposure that may occur after a person develops a disease.
- Disadvantages
  - Can be lengthy and costly! 60 years for Framingham.
  - Loss to follow-up is a problem (especially if non-random).
  - Selection bias: participation may be associated with exposure status for some exposures.
34. Case-Control Studies
- Sample on disease status and ask retrospectively about exposures (for rare diseases).
- Marginal probabilities of exposure for cases and controls are valid.
- Doesn't require knowledge of the absolute risks of disease.
- For rare diseases, can approximate relative risk.
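35. Case-Control Studies
[Figure: case-control design diagram. Cases (disease) and controls (no disease) are sampled from the target population, and each group is classified as exposed or not exposed in the past]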
36. Example: The AIDS epidemic in the early 1980s
- Early case-control studies among AIDS cases and matched controls indicated that AIDS was transmitted by sexual contact or blood products.
- In 1982, an early case-control study matched AIDS cases to controls and found a positive association between amyl nitrites ("poppers") and AIDS: odds ratio of 8.6 (Marmor et al. 1982). This is an example of confounding.
37. Case-Control Studies in History
- In 1843, Guy compared occupations of men with pulmonary consumption to those of men with other diseases (Lilienfeld and Lilienfeld 1979).
- Case-control studies identified associations between lip cancer and pipe smoking (Broders 1920), breast cancer and reproductive history (Lane-Claypon 1926), and oral cancer and pipe smoking (Lombard and Doering 1928). All are rare diseases.
- Case-control studies identified an association between smoking and lung cancer in the 1950s.
38. Case-control example
- A study of the relation between body mass index and the incidence of age-related macular degeneration (Moeini et al., Br J Ophthalmol, 2005).
- Methods: Researchers compared 50 Iranian patients with confirmed age-related macular degeneration and 80 control subjects with respect to BMI, smoking habits, hypertension, and diabetes. The researchers were specifically interested in the relationship of BMI to age-related macular degeneration.
39. Results
Table 2: Comparison of body mass index (BMI) in case and control groups
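40. Corresponding 2x2 Table
[Table: BMI category by case status; totals of 50 cases and 80 controls]
What is the risk ratio here? Tricky: there is no risk ratio, because we cannot calculate the risk of disease! The investigators chose how many cases and controls to sample, so the proportion with disease in the sample reflects the study design, not the population.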
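To see why, consider a hypothetical source population (the numbers are invented for illustration): 10,000 exposed people with 40 cases and 20,000 unexposed people with 20 cases, so the true RR is 4. A case-control study takes every case plus a sampling fraction f of the non-cases as controls. This sketch uses expected (fractional) counts. The naive "risk ratio" computed from the sample changes with f, while the odds ratio does not. All function names are hypothetical.

```python
def case_control_table(pop, control_fraction):
    """Expected counts when ALL cases are sampled and a fraction of non-cases become controls.
    pop = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)."""
    exp_cases, exp_non, unexp_cases, unexp_non = pop
    return exp_cases, exp_non * control_fraction, unexp_cases, unexp_non * control_fraction

def naive_rr(exp_cases, exp_controls, unexp_cases, unexp_controls):
    # Treats the sample as if it were a cohort: NOT a valid risk in a case-control design.
    return (exp_cases / (exp_cases + exp_controls)) / (unexp_cases / (unexp_cases + unexp_controls))

def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Hypothetical population: 10,000 exposed (40 cases), 20,000 unexposed (20 cases)
population = (40, 9960, 20, 19980)
true_rr = (40 / 10000) / (20 / 20000)
print(f"true RR = {true_rr:.1f}")

results = {}
for f in (0.01, 0.001):
    table = case_control_table(population, f)
    results[f] = (naive_rr(*table), odds_ratio(*table))
    print(f"controls sampled at f={f}: naive 'RR' = {results[f][0]:.2f}, OR = {results[f][1]:.2f}")
```

The naive ratio drops from about 3.15 to 1.60 as fewer controls are sampled, while the OR stays at about 4.01 (close to the true RR of 4, because the disease is rare in this made-up population).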
41. The odds ratio
- We cannot calculate a risk ratio from a case-control study.
- BUT, we can calculate a measure called the odds ratio.
42. Odds vs. Risk
| If the risk is | Then the odds are |
|---|---|
| ½ (50%) | 1:1 |
| ¾ (75%) | 3:1 |
| 1/10 (10%) | 1:9 |
| 1/100 (1%) | 1:99 |

Note: The odds are always higher than the corresponding probability, unless the probability is 0 (then both are 0).
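The conversion is odds = p / (1 - p), and back again p = odds / (1 + odds). A short sketch using exact fractions reproduces the table; the function names are illustrative.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability (risk) p to odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)

def risk(o):
    """Convert odds o back to a probability o / (1 + o)."""
    return o / (1 + o)

for p in (Fraction(1, 2), Fraction(3, 4), Fraction(1, 10), Fraction(1, 100)):
    print(f"risk {p} -> odds {odds(p)}")
```

Odds of 1/9 are read as 1:9, and odds of 3 as 3:1.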
43. The Odds Ratio (OR)
| | Exposed | Not exposed |
|---|---|---|
| Cases | a | b |
| Controls | c | d |
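Reading the slide's cells as a, b for cases and c, d for controls, with the first column as exposed and the second as not exposed (the column labels are an assumption, since they were lost in extraction), the odds ratio is the standard cross-product ratio:

$$
\mathrm{OR} = \frac{\text{odds of exposure among cases}}{\text{odds of exposure among controls}}
= \frac{a/b}{c/d} = \frac{ad}{bc}
$$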
44. The Odds Ratio (OR)
45. Proof via Bayes' Rule (optional)
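A sketch of the optional proof, written in the document's P(D), P(E) notation with bars meaning "not" (the slide's own derivation was not preserved). The left side is the exposure odds ratio, which a case-control study can estimate. Applying Bayes' rule, P(E | D) = P(D | E)P(E)/P(D), to each term makes P(D), P(\bar D), P(E) and P(\bar E) cancel, leaving the disease odds ratio:

$$
\begin{aligned}
\frac{P(E \mid D)/P(\bar E \mid D)}{P(E \mid \bar D)/P(\bar E \mid \bar D)}
&= \frac{\dfrac{P(D \mid E)\,P(E)/P(D)}{P(D \mid \bar E)\,P(\bar E)/P(D)}}{\dfrac{P(\bar D \mid E)\,P(E)/P(\bar D)}{P(\bar D \mid \bar E)\,P(\bar E)/P(\bar D)}} \\
&= \frac{P(D \mid E)/P(D \mid \bar E)}{P(\bar D \mid E)/P(\bar D \mid \bar E)} \\
&= \frac{P(D \mid E)/P(\bar D \mid E)}{P(D \mid \bar E)/P(\bar D \mid \bar E)}
\end{aligned}
$$

So the odds ratio computed from a case-control sample estimates the same quantity as the odds ratio of disease in the exposed vs. the unexposed.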
46. The Odds Ratio (OR)
47. The Odds Ratio (OR)
48. The Odds Ratio (OR)
Can be interpreted as: overweight people have a 43% decrease in their ODDS of age-related macular degeneration (not statistically significant here).
49. The odds ratio is a good approximation of the risk ratio if the disease is rare.
If the disease is rare (affecting < 10% of the population), then OR ≈ RR.
WHY? If the disease is rare, the probability of it NOT happening is close to 1, so the odds are close to the risk. E.g., a risk of 1/100 corresponds to odds of 1/99 ≈ 0.0101.
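A quick numerical check with hypothetical risks (chosen only for illustration): both scenarios have RR = 2, but only in the rare-outcome case is the OR close to 2. The helper name `rr_and_or` is illustrative.

```python
def rr_and_or(risk_exposed, risk_unexposed):
    """Risk ratio and odds ratio from the two group risks."""
    rr = risk_exposed / risk_unexposed
    or_ = (risk_exposed / (1 - risk_exposed)) / (risk_unexposed / (1 - risk_unexposed))
    return rr, or_

comparisons = {}
for label, p1, p0 in [("rare", 0.02, 0.01), ("common", 0.40, 0.20)]:
    comparisons[label] = rr_and_or(p1, p0)
    rr, or_ = comparisons[label]
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With risks of 2% vs. 1% the OR is about 2.02; with 40% vs. 20% it is about 2.67, noticeably farther from the null than the RR.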
50. The rare disease assumption
51. The odds ratio vs. the risk ratio
[Figure: OR and RR shown on a scale with 1.0 as the null, for a rare outcome and for a common outcome]
52. When is the OR a good approximation of the RR?
53. Advantages/Limitations: Case-control studies
- Advantages
  - Cheap and fast
  - Efficient for rare diseases
- Disadvantages
  - Getting comparable controls is often tricky
  - Temporality is a problem (did the risk factor cause the disease, or did the disease cause the risk factor?)
  - Recall bias
54. Inferences about the odds ratio
55. Properties of the OR (simulation)
(50 cases / 50 controls / 20 exposed)
If the odds ratio = 1.0, then with 50 cases and 50 controls, of whom 20 are exposed, this is the expected variability of the sample OR. Note the right skew.
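A minimal simulation sketch of this setup, assuming that under OR = 1 the 20 exposures are scattered at random among the 100 subjects (so all margins stay fixed). The median sample OR sits at 1, but the mean is pulled above 1 by the long right tail; on the log scale the distribution is symmetric about 0. The seed, the number of replicates, and the function name are arbitrary choices.

```python
import math
import random
import statistics

def simulate_null_or(n_cases=50, n_controls=50, n_exposed=20, n_sims=10_000, seed=2024):
    """Sample ORs under H0 (OR = 1): n_exposed subjects, chosen at random, are exposed."""
    rng = random.Random(seed)
    n = n_cases + n_controls
    ors = []
    for _ in range(n_sims):
        exposed = rng.sample(range(n), n_exposed)   # subjects 0..n_cases-1 are the cases
        a = sum(1 for i in exposed if i < n_cases)  # exposed cases
        b = n_cases - a                             # unexposed cases
        c = n_exposed - a                           # exposed controls
        d = n_controls - c                          # unexposed controls
        if a == 0 or c == 0:                        # OR would be 0 or infinite; skip (very rare here)
            continue
        ors.append(a * d / (b * c))
    return ors

ors = simulate_null_or()
log_ors = [math.log(x) for x in ors]
print(f"median OR = {statistics.median(ors):.2f}, mean OR = {statistics.mean(ors):.2f}")
print(f"mean lnOR = {statistics.mean(log_ors):.3f}, sd lnOR = {statistics.stdev(log_ors):.2f}")
```

The right skew comes from the ratio scale: an OR of 0.5 and an OR of 2 are equally far from 1 on the log scale, but not on the raw scale.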
56. Properties of the lnOR
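The slide's content was not preserved. The standard large-sample (Woolf) result, given here as a likely reconstruction rather than the slide itself, is that ln(OR) is approximately normal with variance equal to the sum of the reciprocals of the four cell counts a, b, c, d. A confidence interval is built on the log scale and then exponentiated:

$$
\ln\widehat{\mathrm{OR}} \;\overset{\text{approx.}}{\sim}\; N\!\left(\ln \mathrm{OR},\; \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right),
\qquad
95\%\ \text{CI}: \exp\!\left(\ln\widehat{\mathrm{OR}} \pm 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)
$$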
57. Hypothetical Data
58. When can the OR mislead?
59. Example: Does dementia predict death?
- "Dementia: The leading predictor of death in a defined elderly population." Neurology 2004;62:1156-1162.
- Among patients with dementia, 291/355 (82%) died.
- Among patients without dementia, 947/4328 (22%) died.
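60. Dementia study
- Authors report OR = 16.23 (95% CI: 12.27, 21.48).
- But the RR = (291/355)/(947/4328) ≈ 3.75.
- Fortunately, they do not dwell on the OR, but it could mislead if read as a risk ratio: death is a common outcome here, so the OR greatly overstates the RR.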
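A sketch that recomputes both measures from the counts on the previous slide, using the Woolf interval for the OR. It reproduces the authors' OR of 16.23 (12.27, 21.48) and gives RR ≈ 3.75; because death is common here (82% vs. 22%), the OR is more than four times the RR. The function name `or_with_ci` is illustrative.

```python
import math

def or_with_ci(exp_yes, exp_no, unexp_yes, unexp_no, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exp_yes * unexp_no) / (exp_no * unexp_yes)
    se_log = math.sqrt(1 / exp_yes + 1 / exp_no + 1 / unexp_yes + 1 / unexp_no)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)

died_dem, n_dem = 291, 355        # patients with dementia
died_nodem, n_nodem = 947, 4328   # patients without dementia

or_, (lo, hi) = or_with_ci(died_dem, n_dem - died_dem, died_nodem, n_nodem - died_nodem)
rr = (died_dem / n_dem) / (died_nodem / n_nodem)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}, {hi:.2f}); RR = {rr:.2f}")
```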
61. Better to give OR or RR?
From an RCT (prospective!) of a new diet drug, the authors showed the following table:
[Table not reproduced]
62. Better to give OR or RR?
63. Summary of statistical tests for contingency tables
| Table size | Tests or measures of association |
|---|---|
| 2x2 | Risk ratio (cohort or cross-sectional studies); odds ratio (case-control studies); chi-square; difference in proportions; Fisher's exact test (expected cell size < 5) |
| RxC | Chi-square; Fisher's exact test (expected cell size < 5) |
64. Fisher's Exact Test
65. Fisher's Tea-tasting Experiment
Claim: Fisher's colleague (call her Cathy) claimed that, when drinking tea, she could distinguish whether milk or tea was added to the cup first. To test her claim, Fisher designed an experiment in which she tasted 8 cups of tea (4 cups had milk poured first, 4 had tea poured first).
Null hypothesis: Cathy's guessing abilities are no better than chance.
Alternative hypotheses:
- Right-tail: She guesses right more than expected by chance.
- Left-tail: She guesses wrong more than expected by chance.
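66. Fisher's Tea-tasting Experiment
Experimental Results
| | Guessed milk first | Guessed tea first |
|---|---|---|
| Milk poured first | 3 | 1 |
| Tea poured first | 1 | 3 |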
67. Fisher's Exact Test
Step 1: Identify tables that are as extreme as or more extreme than what actually happened. Here she identified 3 out of 4 of the milk-poured-first teas correctly. Is that good luck or real talent? The only way she could have done better is if she had identified 4 of 4 correctly.
68. Fisher's Exact Test
Step 2: Calculate the probability of the tables (assuming fixed marginals).
69. Step 3: To get the left-tail and right-tail p-values, consider the probability mass function of X, where X = the number of correct identifications of the cups with milk poured first.
SAS also gives a two-sided p-value, which is calculated by adding up all probabilities in the distribution that are less than or equal to the probability of the observed table ("equal or more extreme"). Here: 0.229 + 0.014 + 0.229 + 0.014 ≈ 0.4857.
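A sketch of Steps 2 and 3 in Python. With fixed margins (4 milk-first cups and 4 "milk first" guesses), X is hypergeometric: P(X = k) = C(4, k) C(4, 4 - k) / C(8, 4). The code lists the pmf, both one-tailed p-values, and the two-sided p-value by the rule just described. The function name `tea_pmf` is hypothetical.

```python
from math import comb

def tea_pmf(cups_each=4):
    """Hypergeometric pmf of X = correct 'milk first' calls, with fixed margins:
    cups_each milk-first cups, cups_each tea-first cups, cups_each 'milk' guesses."""
    total = comb(2 * cups_each, cups_each)
    return {k: comb(cups_each, k) * comb(cups_each, cups_each - k) / total
            for k in range(cups_each + 1)}

pmf = tea_pmf()
observed = 3
right_tail = sum(p for k, p in pmf.items() if k >= observed)
left_tail = sum(p for k, p in pmf.items() if k <= observed)
# two-sided: add every table no more probable than the observed one
# (the small tolerance guards against floating-point ties)
two_sided = sum(p for p in pmf.values() if p <= pmf[observed] + 1e-12)

print({k: round(p, 3) for k, p in pmf.items()})
print(f"right-tail p = {right_tail:.4f}, left-tail p = {left_tail:.4f}, "
      f"two-sided p = {two_sided:.4f}")
```

The pmf is 1/70, 16/70, 36/70, 16/70, 1/70 for X = 0 to 4, so the right-tail p-value is 17/70 ≈ 0.243: getting 3 of 4 right is quite consistent with guessing.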