Title: A short introduction to epidemiology, Chapter 2b: Conducting a case-control study
1 A short introduction to epidemiology: Chapter 2b
Conducting a case-control study
- Neil Pearce
- Centre for Public Health Research
- Massey University
- Wellington, New Zealand
2 Chapter 2 (additional material): Case-control studies
- This presentation includes additional material on conducting a case-control study
- More information on data analysis is given in chapter 9
3 Chapter 2 (additional material): Case-control studies
- Reasons for doing a case-control study
- Basic study design
- Selection of cases
- Selection of controls
- control sampling strategies
- sources of controls
- issues in control selection
4 [Figure: follow-up timeline from birth to end of follow-up. Disease states: non-diseased, symptoms, severe disease. Outcomes: death, other death, lost to follow-up.]
5 A Hypothetical Incidence Study
6 A Hypothetical Case-Control Study
- Odds ratio $= \frac{1813/8187}{952/9048} = \frac{a/c}{b/d} = \frac{ad}{bc}$
- Equivalently, odds ratio $= \frac{1813/952}{8187/9048} = \frac{a/b}{c/d} = \frac{ad}{bc}$
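The equivalence of the two forms of the odds ratio can be checked numerically. The following is a minimal Python sketch using the counts on this slide. It assumes that a and c are the exposed cases and non-cases and that b and d are the non-exposed cases and non-cases (the slide's table did not survive, so this labelling is an assumption); the function name `odds_ratio` is illustrative only.

```python
def odds_ratio(a, b, c, d):
    """Return the odds ratio computed in three algebraically equivalent ways.

    Assumed labelling (not stated on the slide):
    a = exposed cases, b = non-exposed cases,
    c = exposed non-cases, d = non-exposed non-cases.
    """
    disease_odds_form = (a / c) / (b / d)    # odds of disease: exposed vs non-exposed
    exposure_odds_form = (a / b) / (c / d)   # odds of exposure: cases vs non-cases
    cross_product_form = (a * d) / (b * c)   # ad / bc
    return disease_odds_form, exposure_odds_form, cross_product_form


forms = odds_ratio(a=1813, b=952, c=8187, d=9048)
print([round(x, 2) for x in forms])
```

All three forms give the same value, about 2.10, which is why a case-control study can work with the exposure odds in cases and controls and still recover the cohort odds ratio.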
7 Reasons for Doing a Case-Control Study
- It may be inefficient to have to obtain exposure information on all people in the source population
- It is sufficient to obtain information on all of the 2,765 deaths and a control sample (e.g. 2,765 controls) of the 17,235 survivors
- We therefore only need to get exposure information on 5,530 people instead of 20,000
- This gain in efficiency is much greater when the disease is rare (e.g. if it were 1/10th as common then we would have 277 cases and 277 controls)
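The arithmetic behind this efficiency argument can be sketched in a few lines of Python. The counts are those of the hypothetical cohort (1,813 + 952 deaths, 8,187 + 9,048 survivors); the helper `subjects_needed` and the one-control-per-case default are illustrative assumptions.

```python
import math

cases = 1813 + 952            # all deaths in the hypothetical cohort
survivors = 8187 + 9048       # non-cases at the end of follow-up
source_population = cases + survivors


def subjects_needed(n_cases, controls_per_case=1):
    """People needing exposure data: every case plus the sampled controls."""
    return n_cases + controls_per_case * n_cases


full = subjects_needed(cases)
# Disease one tenth as common; round up to whole people, as on the slide.
rare_cases = math.ceil(cases / 10)
rare = subjects_needed(rare_cases)

print(full, source_population, full / source_population)
print(rare_cases, rare, rare / source_population)
```

With one control per case, exposure data are needed for 5,530 of 20,000 people; for the rarer disease only 554 people are needed, so the relative saving grows as the disease becomes rarer.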
8 Reasons for Doing a Case-Control Study
- Rare disease
- Long induction time
- Smaller study size permits collection and analysis of more detailed exposure information
- Cohort difficult to enumerate (registry-based studies)
10 Basic Case-Control Study Design
- Every study is based on a particular source population followed over a particular period of time (the risk period)
- Ideally the study base should be made explicit
- We study all cases of the outcome and a sample of controls drawn from the source population
- The case-control design thus involves all of the potential biases involved in a full cohort study, as well as additional biases involved in sampling controls
- Information bias is not an inherent feature of such studies
11 Cohort-Based (Nested) Case-Control Studies
- Enumerate the cohort (source population) and its experience over time (the risk period)
- Ascertain all cases generated by this study base
- Sample controls from the person-time (or persons) that generated the cases
12 Registry-Based Case-Control Studies
- Ascertain all cases appearing in the registry during a specified period of time
- Sample controls from the source population for the registry
14 Selection of Cases
- Cohort-based:
- All cases (or deceased cases) generated by the cohort study
- Living cases may be added from other sources (e.g. hospital records, cancer registrations)
- Registry-based:
- All eligible cases appearing in the registry during a specified period of time
16 Control Sampling Strategies
- Cumulative incidence sampling
- Case-base (case-cohort) sampling
- Density sampling
17 [Figure: follow-up timeline from birth to end of follow-up. Disease states: non-diseased, symptoms, severe disease. Outcomes: death, other death, lost to follow-up.]
18 A Hypothetical Incidence Study
19 Cumulative Incidence Sampling
- Traditional method of control selection in nested case-control studies
- Controls are sampled from the non-cases, those free of disease at the end of the follow-up period, i.e. the survivors
- That is, controls are sampled from the denominators for (cohort) odds ratio analyses
20 A Hypothetical Incidence Study
21 A Hypothetical Case-Control Study
22 Cumulative Incidence Sampling
- Estimates the (cohort) odds ratio (without any rare disease assumption)
- Estimates the risk ratio and rate ratio approximately (with a rare disease assumption)
- May involve matching on age, etc.
- Exposure is usually only considered up to the time (year or age) that the case occurred
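These properties of cumulative incidence sampling can be illustrated with expected counts. The sketch below assumes the hypothetical cohort has 10,000 exposed and 10,000 non-exposed people (1,813 + 8,187 and 952 + 9,048) and uses expected rather than randomly sampled control counts; the function names and the "one tenth as common" scenario are illustrative assumptions.

```python
def cohort_ratios(n_exp, n_unexp, cases_exp, cases_unexp):
    """Risk ratio and odds ratio in the full (closed) cohort."""
    risk_ratio = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    odds_ratio = (cases_exp / (n_exp - cases_exp)) / (
        cases_unexp / (n_unexp - cases_unexp)
    )
    return risk_ratio, odds_ratio


def cumulative_incidence_or(n_exp, n_unexp, cases_exp, cases_unexp, n_controls):
    """Expected case-control odds ratio when controls are sampled from survivors."""
    surv_exp = n_exp - cases_exp
    surv_unexp = n_unexp - cases_unexp
    fraction = n_controls / (surv_exp + surv_unexp)   # sampling fraction
    controls_exp = fraction * surv_exp                # expected exposed controls
    controls_unexp = fraction * surv_unexp            # expected non-exposed controls
    return (cases_exp / cases_unexp) / (controls_exp / controls_unexp)


rr, cohort_or = cohort_ratios(10_000, 10_000, 1813, 952)
cc_or = cumulative_incidence_or(10_000, 10_000, 1813, 952, n_controls=2765)
# Same exposure contrast with the disease one tenth as common (expected counts).
rare_rr, rare_or = cohort_ratios(10_000, 10_000, 181.3, 95.2)

print(round(rr, 2), round(cohort_or, 2), round(cc_or, 2), round(rare_or, 2))
```

The case-control odds ratio equals the cohort odds ratio (about 2.10) whatever the disease frequency, but it only approaches the risk ratio (about 1.90) when the disease is rare: in the rarer scenario the odds ratio falls to about 1.92.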
23 Case-cohort Sampling
- Controls can be selected from those at risk at the beginning of the follow-up period, i.e. from the entire source population
- That is, controls are selected from the denominators for (cohort) risk ratio analyses
24 A Hypothetical Incidence Study
25 A Hypothetical Case-Control Study
26 Case-cohort Sampling
- Estimates the risk ratio (without any rare disease assumption)
- Requires minor modifications to the standard formulas for confidence intervals and p-values
- May involve matching on age, etc.
- Once again, exposure is usually only considered up until the time that the case occurred
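A minimal sketch of why case-cohort sampling estimates the risk ratio directly, again using expected control counts. It assumes a hypothetical cohort of 10,000 exposed and 10,000 non-exposed people with 1,813 and 952 cases; the function name `case_cohort_or` is illustrative.

```python
def case_cohort_or(n_exp, n_unexp, cases_exp, cases_unexp, n_controls):
    """Expected odds ratio when controls are sampled from everyone at risk at baseline.

    Controls are drawn from the whole source population, so a sampled
    control may also go on to become a case.
    """
    fraction = n_controls / (n_exp + n_unexp)   # sampling fraction
    controls_exp = fraction * n_exp             # expected exposed controls
    controls_unexp = fraction * n_unexp         # expected non-exposed controls
    return (cases_exp / cases_unexp) / (controls_exp / controls_unexp)


risk_ratio = (1813 / 10_000) / (952 / 10_000)
estimate = case_cohort_or(10_000, 10_000, 1813, 952, n_controls=2765)
print(round(risk_ratio, 3), round(estimate, 3))
```

Because the controls reflect the exposure distribution of the baseline population, the exposure odds ratio reproduces the risk ratio (about 1.90) with no rare disease assumption.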
27 [Figure: follow-up timeline from birth to end of follow-up. Disease states: non-diseased, symptoms, severe disease. Outcomes: death, other death, lost to follow-up.]
28 Density Sampling
- Controls are selected longitudinally throughout the course of the study, i.e. from the person-time of the study base
- That is, controls are sampled from the denominators for the rate ratio analyses
- In general, controls are selected from the risk set of persons at risk at the time that each case occurred
29 A Hypothetical Incidence Study
30 A Hypothetical Case-Control Study
31 Density Sampling
- The time variable is usually taken to be age rather than calendar time (year)
- Estimates the rate ratio (without any rare disease assumption)
- Matching may also be done on other time-related factors, although this is usually not necessary
- Usual method of sampling in registry-based studies
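The link between density sampling and the rate ratio can be sketched with expected counts. Person-time is not given in this presentation, so the sketch derives it under two loudly stated assumptions: a constant hazard in each exposure group and a 10-year follow-up with no other losses (the follow-up length is arbitrary and cancels out of the ratio). The cohort of 10,000 exposed and 10,000 non-exposed people and all function names are illustrative.

```python
import math


def person_time(n, cases, years):
    """Person-years at risk, ASSUMING a constant hazard and no other losses."""
    risk = cases / n
    rate = -math.log(1 - risk) / years   # hazard implied by the cumulative risk
    return cases / rate                  # person-time = cases / rate


def density_sampling_or(n_exp, n_unexp, cases_exp, cases_unexp, years=10):
    """Expected odds ratio when controls are sampled in proportion to person-time."""
    pt_exp = person_time(n_exp, cases_exp, years)
    pt_unexp = person_time(n_unexp, cases_unexp, years)
    # Expected exposed:non-exposed controls follow the person-time ratio.
    return (cases_exp / cases_unexp) / (pt_exp / pt_unexp)


estimate = density_sampling_or(10_000, 10_000, 1813, 952)
rate_ratio = math.log(1 - 0.1813) / math.log(1 - 0.0952)
print(round(estimate, 2), round(rate_ratio, 2))
```

Under these assumptions the exposure odds ratio equals the rate ratio (about 2.00), which differs from both the risk ratio and the cohort odds ratio for this common outcome.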
32 Selecting Controls
- Cohort-based studies:
- Sample of the cohort (preferably by density sampling on age)
- Registry-based studies:
- Sample of the source population for the registry (usually by density sampling on year, perhaps with matching on age)
33 Selecting Controls in Registry-Based Studies
- Cases chosen from all lung cancer cases at hospitals in the City
- Controls chosen from the general population of the City?
34 Selecting Controls in Registry-Based Studies
- All lung cancer cases at all hospitals in the City
- Controls chosen from the general population of the City?
- Restrict cases to those living in the City (exclude those who have come to the City for treatment)
- Restrictions that apply to one group (e.g. having a telephone, being on the Electoral Roll, having health insurance) should also be applied to the other
35 Selecting Controls in Registry-Based Studies
- Cases chosen from all lung cancer cases at the main hospital in the City
- What is the source population for these cases?
36 Selecting Controls in Registry-Based Studies
- Cases chosen from all lung cancer cases at the main hospital in the City
- What is the source population for these cases?
- All those who would have come to the main hospital in the City for treatment if they had developed lung cancer
37 Issues in Control Selection
- Controls are usually sampled at random from the entire study base
- However, it is sometimes desirable to restrict the controls to a sample of a subset of the study base
- In particular, we may select controls from persons with other diseases generated by the same study base (e.g. other deaths, other cancers, other hospital admissions)
38 Other Disease Controls
- All other diseases
- All other diseases except those known to be related to exposure
- A disease known to be unrelated to exposure
39 Reasons for Using Other Disease Controls
- The cohort (source population) is not enumerated
- E.g. if the cases are identified from hospital admissions (e.g. for lung cancer) then the study base is all persons who would have been admitted to this hospital if they had developed lung cancer
- Controls might be selected from other admissions to the same hospital
40 Reasons for Using Other Disease Controls
- Comparability of information
- E.g. in a case-control study of non-Hodgkin's lymphoma and pesticide exposure, cases might be more likely to recall brief exposures
- We might therefore select controls from other cancer registrations rather than from the entire source population for the Cancer Registry
41 Selection Bias in Case-Control Studies
- In a case-control study, the controls are a sample of the source population
- Selection bias can occur if the sample is non-random, and the selection of controls is related to exposure status
- In other words, selection bias can occur if the controls are not representative of the exposure in the source population
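The effect of exposure-related control selection can be sketched with expected counts. The cell counts are those of the hypothetical cohort (1,813 and 952 cases; 8,187 and 9,048 non-cases, with the first of each pair assumed to be the exposed). The selection probabilities and the function name `observed_or` are hypothetical, chosen only for illustration.

```python
def observed_or(a, b, c, d, p_exposed, p_unexposed):
    """Expected odds ratio when non-cases are sampled with exposure-specific probabilities.

    a, b = exposed and non-exposed cases; c, d = exposed and non-exposed non-cases.
    p_exposed, p_unexposed = HYPOTHETICAL probabilities that an exposed or
    non-exposed non-case is selected as a control.
    """
    controls_exposed = p_exposed * c
    controls_unexposed = p_unexposed * d
    return (a / b) / (controls_exposed / controls_unexposed)


# Selection unrelated to exposure: same probability in both groups.
unbiased = observed_or(1813, 952, 8187, 9048, p_exposed=0.16, p_unexposed=0.16)
# Selection related to exposure: exposed non-cases twice as likely to be chosen.
biased = observed_or(1813, 952, 8187, 9048, p_exposed=0.20, p_unexposed=0.10)
print(round(unbiased, 2), round(biased, 2))
```

When selection is unrelated to exposure the sampling fraction cancels and the odds ratio is unchanged; otherwise the estimate is multiplied by the ratio of the selection probabilities (non-exposed over exposed), here halving it.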
42 Selection Bias in Case-Control Studies: Solutions
- Selection bias can occur if the selection of controls is related to exposure status
- In the analysis, we can control for the determinants of control selection (e.g. social class)
- An exception is when we have chosen other disease controls and the other diseases are directly caused by the main exposure of interest; this selection bias cannot be removed
43 General Population and Other Cancer Controls
- General population:
- Represents study base
- May be more prone to recall bias if cases are more likely to recall exposures
- Difficult to keep interviewer blind, and may get interviewer bias
- Other cancers:
- Other diseases may be caused by exposure (selection bias)
- Equal motivation and recall in cases and controls
- Easier to keep interviewer blind
44 Reasons for Matching
- Practical efficiency
- E.g. if we are using hospital controls then it is usually more efficient to select a control admitted on the same day as the case, rather than sampling at random from all admissions for the year
45 Reasons for Matching
- Statistical efficiency
- E.g. if we select general population controls at random in a lung cancer case-control study then the cases will be mostly old and the controls will be mostly young. It will therefore be difficult to stratify on, and control for, age
46 Reasons for Not Matching
- Practical efficiency
- Matching can be costly and time-consuming and is usually not necessary since we can adjust for the major matching factors (e.g. age, gender, smoking status) in the analysis
47 Reasons for Not Matching
- Statistical efficiency
- Matching on a weak risk factor (or a non-risk factor) that is strongly correlated with the main exposure can dramatically reduce efficiency
48 Matching
- Only match on risk factors that are:
- Not of intrinsic interest in themselves (e.g. age)
- Strong risk factors for disease
- Not too difficult to match on
49 Common misconceptions about case-control studies
- Fundamentally different type of study design that proceeds from disease to exposure (i.e. reverse causality)
- Inherently less valid (more biased) than cohort studies
- Require a rare-disease assumption
- Odds ratio only approximates the rate ratio or risk ratio (under the rare disease assumption)