Capture-recapture analysis

1
Capture-recapture analysis
  • Keith Sabin, PhD, MPH
  • DHHS/CDC/GAP

2
What is it for?
  • Capture-recapture analysis is used for counting
    the total number of people in a population using
    two or more incomplete lists of those people
  • Why should I be interested?
  • Evaluating surveillance systems
  • Magnitude of issues

3
Overview
  • Origin of method
  • Application to epidemiology - why is it useful
    for us?
  • Principles
  • Conditions for using capture-recapture
    methods
  • Methods
  • Two sources
  • Multiple sources
  • Limitations

4
Origins of capture-recapture analysis
  • Origins in demography
  • 1662 - used to estimate the population of London
  • 1783 - Laplace used it to estimate the population of
    France
  • 1949 - Sekar and Deming used it to estimate birth
    rate and mortality in India
  • Subsequently used most often for estimating wildlife
    populations
  • More recently applied to epidemiology (Wittes
    1968)

5
Application of capture-recapture analysis to
human epidemiology
  • Evaluating completeness of a surveillance source
  • Passive surveillance
  • Registers
  • Refining incidence and prevalence estimates from
    surveillance systems or population
    surveys
  • Used for cancers, stroke, homelessness, mental
    illness, drug use, congenital disorders,
    infections

6
Principles
  • Two or more sources (lists) of cases of a given
    disease
  • Sources are considered random capture samples of
    the population
  • Cases can be matched by unique identifiers
  • Estimate the total number of cases that are not
    captured by any source from the matched and
    unmatched cases

13
Critical assumptions/conditions
  • 1. Population is closed
  • methods exist for open populations
  • 2. Individuals captured on both occasions can be
    matched
  • 3. Capture in the second sample is independent
    of capture in the first
  • 4. Probability of capture is homogeneous across
    individuals
  • Homogeneity of individuals
  • Homogeneity of lists

14
Application to humans
  • Capture: appearing on a list
  • Recapture: linking, by identifying individuals
    appearing on both lists by criteria such as name,
    date of birth, etc.
  • Trap fascination
  • if you feed the animals they are more likely to be
    caught again
  • laboratory-confirmed cases are more likely to be
    reported in other systems
  • Trap avoidance
  • if you scare the animals they will avoid the trap
  • a person cannot appear on a community injecting drug
    user registry if they are in prison

16
Two sources
                      Source B
                      1      2
  Source A     1     x11    x12
               2     x21    x22 ?
1 = included in source; 2 = not included in
source
17
Capture (Source A) and recapture (Source B)
18
Estimation
  • If sources are independent: P(A | B) = P(A | not B)
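
The independence condition above leads to the usual two-source estimator. The derivation below is a sketch: it assumes the first subscript of each cell refers to Source A and the second to Source B (1 = included, 2 = not included), and that N1 and N2 denote the totals of Source A and Source B.

```latex
\begin{align*}
P(A \mid B) &= \frac{x_{11}}{x_{11} + x_{21}}, \qquad
P(A \mid \text{not } B) = \frac{x_{12}}{x_{12} + x_{22}} \\
\frac{x_{11}}{x_{11} + x_{21}} = \frac{x_{12}}{x_{12} + x_{22}}
  &\;\Longrightarrow\; \hat{x}_{22} = \frac{x_{12}\, x_{21}}{x_{11}} \\
\hat{N} = x_{11} + x_{12} + x_{21} + \hat{x}_{22}
  &= \frac{(x_{11} + x_{12})(x_{11} + x_{21})}{x_{11}}
   = \frac{N_1 N_2}{x_{11}}
\end{align*}
```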

19
Capture (Source A) recapture (Source B)
20
Estimation
  • Sensitivity of sources
  • If numbers in cells are small, the probability that
    x11 = 0 is not zero
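
A minimal Python sketch of these two points, using hypothetical counts. The first function is the plain two-source estimate N = N1 × N2 / x11. The second is Chapman's adjusted form, which is not named on the slide and is swapped in here only because it stays defined when x11 = 0. Function and variable names are illustrative.

```python
def two_source_estimate(n1, n2, x11):
    """Plain two-source estimate N = n1 * n2 / x11 (undefined if x11 == 0)."""
    if x11 == 0:
        raise ValueError("no matched cases: the estimate is undefined")
    return n1 * n2 / x11


def chapman_estimate(n1, n2, x11):
    """Chapman's adjusted estimate; still defined when x11 == 0."""
    return (n1 + 1) * (n2 + 1) / (x11 + 1) - 1


def sensitivity(n_source, n_total):
    """Share of the estimated total that a single source captured."""
    return n_source / n_total


# Hypothetical counts: 60 cases in Source A, 40 in Source B, 20 in both.
n1, n2, x11 = 60, 40, 20
n_hat = two_source_estimate(n1, n2, x11)

print(n_hat)                                   # 120.0
print(sensitivity(n1, n_hat))                  # 0.5
print(round(chapman_estimate(n1, n2, x11), 1)) # 118.1
```

With small cell counts the two estimates can differ noticeably, which is one reason to report which form was used.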

21
Conditions
  • Same study period and area
  • Closed population
  • All cases in any source are true cases
  • True matches are identified
  • Equal catchability
  • Sources are independent

22
Same study period and area
  • Cases occur during the study period and in the study
    area
  • Different period of capture
  • Probability of recapture decreases → x11 decreases →
    overestimates N

23
Closed population
  • Nobody enters or leaves the population during the
    study period
  • No immigration, emigration, death
  • Open population
  • Individuals captured in the first sample cannot be
    captured in the second
  • Probability of recapture decreases → x11 decreases →
    overestimates N

24
True cases
  • All cases in any source are true cases
  • False positive cases
  • Positive predictive value (PPV) < 1
  • Overestimation of N1 or N2 → overestimates N
  • Correction
  • Take random sample of positive samples and
    verify
  • Estimate PPV and apply to formula
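
One way the correction could look in Python. This is a sketch under a strong assumption that is not stated on the slide: false positives in a source are never matched to the other source, so only the source totals N1 and N2 are scaled by their PPV and x11 is left unchanged. All counts are hypothetical.

```python
def ppv_corrected_estimate(n1, n2, x11, ppv1=1.0, ppv2=1.0):
    """Two-source estimate after scaling each source total by its PPV.

    Assumption (illustrative): false positives never appear among the
    matched cases, so x11 needs no correction.
    """
    true_n1 = n1 * ppv1
    true_n2 = n2 * ppv2
    return true_n1 * true_n2 / x11


# Hypothetical data: verification of a random sample suggests PPV = 0.9
# for source 1; source 2 is assumed to contain only true cases.
uncorrected = ppv_corrected_estimate(100, 80, 40)
corrected = ppv_corrected_estimate(100, 80, 40, ppv1=0.9)

print(uncorrected)  # 200.0
print(corrected)    # 180.0
```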

25
True matches
  • Matches and only matches are identified
  • Ideally, unique identifier available (social
    security number, name, etc)
  • Combination of criteria: name initials, age, sex
  • True matches missed
  • x11 decreases → overestimates N
  • Wrong matches created
  • x11 increases → underestimates N

26
Equal catchability
  • For a given source, probability of capture is the
    same for all cases, although this probability may
    differ from one source to another
  • Often not true for epidemiological datasets
  • Low or no probability of capture by any source
    (eg, IVDU, homeless, disease severity)
  • Disregarded in estimate → underestimates N
  • Identify and exclude population outside of all
    sources

27
Accounting for variable catchability
  • Stratify by factor introducing variable
    catchability
  • Calculate estimates by strata

Stratum 1
Stratum 2
N = Σ Ni = N1 + N2
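
A short Python sketch of stratified estimation with hypothetical counts. Each stratum gets its own two-source estimate and the stratum estimates are summed. The pooled estimate is shown only for comparison. Names and numbers are illustrative.

```python
def two_source_estimate(n1, n2, x11):
    """Two-source estimate N = n1 * n2 / x11."""
    return n1 * n2 / x11


# Hypothetical strata: (cases in source 1, cases in source 2, matched cases).
strata = {
    "stratum 1": (50, 40, 25),  # easily captured cases
    "stratum 2": (30, 20, 5),   # poorly captured cases
}

stratified_total = sum(two_source_estimate(*cells) for cells in strata.values())

# Pooling the strata ignores the difference in catchability.
pooled_n1 = sum(cells[0] for cells in strata.values())
pooled_n2 = sum(cells[1] for cells in strata.values())
pooled_x11 = sum(cells[2] for cells in strata.values())
pooled_total = two_source_estimate(pooled_n1, pooled_n2, pooled_x11)

print(stratified_total)  # 200.0
print(pooled_total)      # 160.0
```

In this made-up example the pooled estimate is lower than the sum of the stratum estimates, which is the direction of bias described for unequal catchability.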
28
Sources are independent
  • Being in one source does not influence the
    probability of being in the other
    source

OR > 1 (positive dependence): estimated d < true d →
underestimates N
OR < 1 (negative dependence): estimated d > true d →
overestimates N
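
The direction of the bias can be checked with expected cell counts. The Python sketch below assumes a known population of 1,000 and hypothetical capture probabilities. It builds the expected 2 × 2 table, then compares the two-source estimate with the true N. Nothing here is taken from the slides except the direction of the effect.

```python
def expected_table(n, p_a, p_b_given_a, p_b_given_not_a):
    """Expected cell counts (x11, x12, x21, x22) for a population of size n."""
    x11 = n * p_a * p_b_given_a
    x12 = n * p_a * (1 - p_b_given_a)
    x21 = n * (1 - p_a) * p_b_given_not_a
    x22 = n * (1 - p_a) * (1 - p_b_given_not_a)
    return x11, x12, x21, x22


def odds_ratio_and_estimate(x11, x12, x21, x22):
    """Odds ratio between the sources and the two-source estimate of N."""
    odds_ratio = (x11 * x22) / (x12 * x21)
    estimate = (x11 + x12) * (x11 + x21) / x11
    return odds_ratio, estimate


TRUE_N = 1000

# Positive dependence: being in A raises the chance of being in B.
or_pos, n_pos = odds_ratio_and_estimate(*expected_table(TRUE_N, 0.3, 0.5, 0.2))
# Negative dependence: being in A lowers the chance of being in B.
or_neg, n_neg = odds_ratio_and_estimate(*expected_table(TRUE_N, 0.3, 0.2, 0.5))

print(round(or_pos, 2), round(n_pos))  # OR above 1, estimate below 1000
print(round(or_neg, 2), round(n_neg))  # OR below 1, estimate above 1000
```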
32
Example
  • Estimation of number of IVDU in Bangkok in 1991
    (Mastro 1994)
  • Two sources used
  • Methadone (April to May 1991)
  • Police arrests (June to September 1991)
  • Methadone → need for drugs decreases → probability
    of being arrested decreases → negative
    dependence, overestimation of N

33
Evaluation of source dependence
  • Two sources
  • Qualitative analysis of the notification process
    in each source, i.e. there is no statistical
    method to allow for dependence for two
    sources
  • Multiple (>2) sources
  • Wittes method
  • Log-linear modelling

34
Behavioral Surveillance Using Respondent Driven
Sampling
35
Presentation Outline
  • Sampling methods for hard to reach populations
  • Description of RDS
  • Lessons learned from Vietnam

36
Probability Sampling (Simple,
Systematic, Cluster)
  • Gold standard: best methods for sampling
  • But do not reach hidden populations
  • No sampling frame
  • Stigmatized
  • Would need huge sample sizes in order to capture
    a hidden population
  • Expensive

37
Sampling Methods to Reach Hidden Populations
  • Time-Location (TLS), Venue-Based
  • Major bias: only captures those who are
    visible
  • Snowball
  • Major bias: not representative of the
    population (tendency for in-group affiliation,
    volunteerism and masking)

38
Background on RDS
  • Developed by D. Heckathorn and R. Broadhead with
    IDUs in Connecticut and in Yaroslavl, Russia
  • Sampling vs. recruitment strategy
  • Different from other chain referral methods
    because it can give us point estimates with
    standard errors.

39
How RDS Works
  • Use of a dual system of recruitment through the
    use of incentives.
  • Use of recruitment quotas.
  • Use of peers to recruit peers.
  • Use of links between recruiters and recruits.

40
The Theory Behind RDS
  • Uses principles of first-order Markov theory
  • Long referral chains
  • Final sample will be independent of those
    selected as seeds
  • Final sample will be similar to the population of
    the network from which you are recruiting
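
The claim that the final sample becomes independent of the seeds can be illustrated with a toy first-order Markov chain in Python. The two groups and the recruitment probabilities below are hypothetical. Each wave multiplies the current group composition by the recruitment matrix.

```python
def next_wave(composition, transition):
    """Group composition of the next wave given the current one."""
    size = len(composition)
    return [
        sum(composition[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def composition_after(seed, transition, waves):
    """Composition reached after a number of recruitment waves."""
    composition = list(seed)
    for _ in range(waves):
        composition = next_wave(composition, transition)
    return composition


# Hypothetical recruitment probabilities between two groups X and Y:
# row = recruiter's group, column = recruit's group.
transition = [
    [0.7, 0.3],
    [0.4, 0.6],
]

from_x_seeds = composition_after([1.0, 0.0], transition, waves=10)
from_y_seeds = composition_after([0.0, 1.0], transition, waves=10)

print([round(share, 3) for share in from_x_seeds])
print([round(share, 3) for share in from_y_seeds])
```

Both runs end at nearly the same composition (about 4/7 X and 3/7 Y for this matrix), whichever group the seeds came from. This is why long referral chains matter.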

41
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5
47
A Long Referral Chain: Jazz Musicians in New
York City
48
Selection of Seeds
49
Example in Hai Phong, Vietnam
  • Final sample size: 420 IDUs in Hai Phong and
    Saigon; 418 CSWs in Saigon and 220 in Hai Phong
  • Recruitment process
  • 20 seeds selected by peer educators
  • Three coupons to each participant
  • Participants asked to recruit their peers
  • Time: March to June 2004
  • Three sites (Hai Phong), four sites (Saigon)

50
Eligibility Criteria
  • CSWs
  • Women, 18 years or more, living or working in Hai
    Phong or Saigon
  • Has sold sex for money in the last 30 days
  • Has a green coupon (except seeds)
  • Has provided consent.
  • IDUs
  • Women (Saigon only) or men, 18 years or more,
    living in Hai Phong or Saigon
  • Has injected drugs during the last 30 days
  • Has a yellow coupon (except seeds)
  • Has provided consent.

51
Coupon Front Side
LIFE-GAP project: For Your Health and Safety
Payment coupon
Address: ________________________
Telephone: ___________________________________
(You can call to make an appointment in advance)
You will receive 15,000 VND for each person who you
recruit and who enrolls into the study (you may
recruit up to 3 persons)
ID number:
Please call us in advance. You must present
this coupon for payment
52
Coupon Back Side
53
Networks of CSWs in Hai Phong
54
A network in Hai Phong
Seed
55
Initial Lessons from Vietnam
  • Seeds should have high degree; an initial focus group
    may be important
  • No slow-down mechanism to end RDS
  • Need for security: interviewers have no choice of
    whom they interview
  • Managing multiple sites can be difficult
  • Managing coupon numbers
  • No way to control for those who recruit
    faster.

56
Initial Lessons from Vietnam (Cont)
  • Difficult to discourage recruiters from selling
    coupons or giving them out in a non-random way
  • Non-response information difficult to obtain
    (incentives picked up by friends, recruiters
    do not return for secondary incentive)

57
Philosophical objection
  • Capture-recapture is fun, so it must be
    epidemiology!
  • But, as epidemiologists we are interested in
  • Time, place and person
  • Capture-recapture does not capture time - it is a
    static tool which relies on lists which
    correspond to prevalence of a chronic disease
    (e.g. diabetes) or long time periods for acute
    diseases (legionella)
  • Can be used for measuring broad trends by repeat
    analysis (Nardone et al Epidemiol Infect
    2003)

58
Practical limitations
  • Unique identifier has to match in all data
    sources
  • This may contravene confidentiality
    laws
  • Clever statistics cannot correct bad data
  • Rubbish in, rubbish out.
  • For chronic and expensive diseases (eg diabetes)
    it may be better to carry out an expensive
    detailed survey than to use quick and dirty
    methods
  • it may be even more expensive to get it wrong.

59
Extrapolation is based on assumptions
  • we are assuming that the model which describes
    the observed data also describes the count of the
    unobserved individuals. We have no way of
    checking this assumption. This is analogous to,
    and has the same dangers as, fitting an arbitrary
    curve to a series of points (x, y), where x > 0,
    with the intention of estimating y at x = 0.
    ... this is analogous to the position of those who
    automatically assume that the k samples in our
    problem are independent.
  • Fienberg, Biometrika 1972;59:591-603

60
Conclusion
  • If conditions are met
  • Potential to use multiple incomplete registers
    and to estimate population size by
    capture-recapture
  • Cheaper than exhaustive registers
  • Two sources
  • Impossible to quantify extent of dependence
  • Requires third source
  • Multiple sources
  • Log-linear modelling method of choice
  • Can adjust for dependence and variable
    catchability

61
Caveats
  • Use the technique but be careful!
  • Don't treat this as a black box method
  • All prior knowledge should be used to formulate
    the model
  • Know your data!
  • Not the solution to all problems
  • Conditions often not met when applied to
    epidemiology
  • There may still be heterogeneity you don't
    understand
  • Complementary technique

62
References
  • Wittes JT, Colton T, Sidel VW.
    Capture-recapture models for assessing the
    completeness of case ascertainment using multiple
    information sources. J Chronic Dis
    1974;27:25-36.
  • Hook EB, Regal RR. Capture-recapture methods in
    epidemiology: methods and limitations.
    Epidemiol Rev 1995;17(2):243-264.
  • International Working Group for Disease
    Monitoring and Forecasting.
    Capture-recapture and multiple-record systems
    estimation I: History and theoretical
    development. Am J Epidemiol 1995;142:1047-58.
  • International Working Group for Disease
    Monitoring and Forecasting.
    Capture-recapture and multiple-record systems
    estimation II: Applications in human diseases.
    Am J Epidemiol 1995;142:1059-68.
  • LaPorte RE, Dearwater SR, Yue-Fang C et al.
    Efficiency and accuracy of disease monitoring
    systems: application of capture-recapture methods
    to injury monitoring. Am J Epidemiol
    1995;142:1069-77.

63
Recent examples of application to field
epidemiology
  • Legionnaires' disease. Infuso et al
    Eurosurveillance 1998;3:48-50; Nardone et al
    Epidemiol Infect 2003;131:647-54
  • Malaria. Van Hest et al. Epidemiol Infect
    2002;129:371-7
  • Measles. Van den Hof et al Pediatr Infect Dis J
    2002;21:1146-50
  • Acute flaccid paralysis. Whitfield Bull WHO
    2002;80:846-851
  • Pertussis deaths. Crowcroft et al Arch Dis Child
    2002;86:336-8
  • Intussusception after rotavirus vaccination.
    Verstraeten et al Am J Epidemiol
    2001;154:1006-1012
  • Tuberculosis. Tocque et al Commun Dis Public
    Health 2001;4:141-3
  • Salmonella outbreaks. Gallay et al Am J Epidemiol
    2000;152:171-7
  • AIDS. Bernillon et al Int J Epidemiol
    2000;29:168-174
  • Meningitis. Faustini et al. Eur J Epidemiol
    2000;16:843-8

64
Special thanks to Nancy Crowcroft, Health
Protection Agency, London. Many of the
capture-recapture analysis slides come directly
from her class at Epi-Et.
65
THANK YOU!
66
RDS Advantages
  • Ease of field operations
  • Little need for formative research/mapping
  • Target members recruit for you
  • Reach less visible segment of population
  • Good external validity (found in other
    studies-still waiting to see in Vietnam)
  • Minimal number of additional questions needed
  • Computer software available
  • Lower Cost (Still waiting to see)

67
RDS Limitations
  • Population must be a network
  • Must be able to verify group membership
  • Must track links between recruiters and
    recruits-coupon management
  • Incentives
  • Very difficult to deal with selective non
    response bias.

68
Option 1: Use RDS with Institutional Data
  • Capture-recapture requires two samples of the
    population, only one of which need be
    representative.
  • If an institutional database is available, only a
    single number is required to recapture the
    population.
  • Example of Registered NEP members

69
Example of Capture-Recapture
  • Capture: During the study period, police recorded
    contacts with 86 injectors. The detective who
    provided this information said he was confident
    that this is almost all the shooters in town.
  • Recapture: During the study period, 388 were
    interviewed using RDS.
  • Overlap: 32 respondents were in both the police
    and the RDS samples.
  • Estimated population size
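
The figures in this example can be put through the two-source formula N = N1 × N2 / x11. The Python sketch below uses only the three counts given on the slide. The function name is illustrative, and the result is what the formula gives, not necessarily the figure shown on the original slide.

```python
def two_source_estimate(n1, n2, overlap):
    """Two-source estimate N = n1 * n2 / overlap."""
    return n1 * n2 / overlap


police_contacts = 86  # capture
rds_interviews = 388  # recapture
in_both = 32          # overlap

estimate = two_source_estimate(police_contacts, rds_interviews, in_both)
print(round(estimate))  # 1043
```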

70
Estimating the Number of Jazz Musicians in NYC
using the Logic of Capture/Recapture
  • Capture: Proportion of NYC musician union members
    who identified themselves as jazz musicians (in
    response to a union member survey): 70%
    (415/592).
  • Number of musician union members in the New York
    metropolitan area, according to union records, is
    10,499.
  • Therefore, the estimated number of union jazz
    musicians is 7,360 (10,499 x 0.70).
  • Recapture: Proportion of all NYC jazz musicians
    who are union members, according to an RDS study, is
    22%.
  • Using the estimate of the number of NYC union jazz
    musicians and the estimated proportion of all NYC jazz
    musicians who are union members, the size of the
    NYC jazz musician universe is 7,360 / 0.223 ≈ 33,003
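
The arithmetic of this example, written out as a small Python sketch. All inputs are the figures quoted above, and the variable names are illustrative. Because the slide rounds intermediate values, the computed total agrees with the quoted 33,003 only approximately.

```python
union_members = 10_499            # from union records
jazz_share_of_union = 415 / 592   # union survey: about 70%
union_share_of_jazz = 0.223       # RDS study: about 22%

# Step 1: how many union members are jazz musicians.
union_jazz_musicians = union_members * jazz_share_of_union

# Step 2: scale up by the share of jazz musicians who are union members.
all_jazz_musicians = union_jazz_musicians / union_share_of_jazz

print(round(union_jazz_musicians))  # 7360
print(round(all_jazz_musicians, -2))  # roughly 33,000
```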

71
Multiple sources
72
Wittes Method
  • Evaluate dependence among sources
  • Compare two-source estimates of N
  • If estimates differ →
  • Test of independence
  • Calculate odds ratios between cell counts of two
    sources within a third source
  • If OR ≠ 1 → dependence
  • Merge dependent sources
  • Repeat calculation of estimates with merged source

73
Test of independence
[Venn diagram of sources A, B and C, with regions labelled a to g]
74
Test of independence
[Venn diagram of sources A, B and C, with regions labelled a to g]
OR = cg/de
OR = 1 → independence
OR > 1 → positive dependence → underestimation of N
OR < 1 → negative dependence → overestimation of N
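
A Python sketch of this test. The slide gives OR = cg/de, but the diagram that labels the regions is lost, so the labelling below is an assumption: c = cases in all three sources, d and e = cases in C and exactly one of A or B, g = cases in C only. Counts are hypothetical.

```python
def odds_ratio_within_third_source(c, d, e, g):
    """OR = c*g / (d*e): association of A and B among cases listed in C.

    Assumed labelling: c = in A, B and C; d, e = in C and exactly one of
    A or B; g = in C only.
    """
    return (c * g) / (d * e)


def interpret(odds_ratio, tolerance=1e-9):
    """Direction of dependence and its effect on the estimate of N."""
    if abs(odds_ratio - 1.0) <= tolerance:
        return "independence"
    if odds_ratio > 1.0:
        return "positive dependence, N underestimated"
    return "negative dependence, N overestimated"


# Hypothetical counts within source C.
print(interpret(odds_ratio_within_third_source(c=10, d=20, e=5, g=10)))
print(interpret(odds_ratio_within_third_source(c=20, d=10, e=10, g=20)))
print(interpret(odds_ratio_within_third_source(c=5, d=10, e=10, g=5)))
```

In practice an odds ratio is rarely exactly 1, so a confidence interval or significance test would be used rather than the fixed tolerance in this sketch.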
75
Test of independence
  • To solve, have to assume highest order
    interaction = 0
  • i.e. the chance of being in all the lists (in c)
    is a simple function of the chance of being on
    any single list or lesser combination of lists
  • Or, there is nothing special about c

76
Log-linear modeling - General
  • Analyze relationship between categorical
    variables in a contingency table
  • Logarithm of expected frequency of a cell
    expressed as linear function of effects for each
    cell and interaction term
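  • For 3 variables A with i levels, B with j levels,
    C with k levels, the logarithm of the expected
    frequency F_ijk for cell ijk is

$$\ln F_{ijk} = \theta + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} + \lambda^{ABC}_{ijk}$$

$\theta$: main effect; $\lambda^{A}$: first order effect; $\lambda^{AB}$: second
order effect (interaction)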
77
Log-linear modeling - CRM
  • Estimates value of a missing cell in a 2^k
    contingency table
  • k = number of sources
  • Missing cell = number of cases not listed by any
    source (m222)
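
For three sources, the model with all pairwise interactions and no three-way interaction gives a closed-form estimate of the missing cell. The Python sketch below assumes cell labels such as "112" mean in source 1, in source 2, not in source 3, following the 1 = included, 2 = not included convention. The counts are hypothetical and were built from independent sources with a true N of 1,000.

```python
def missing_cell_no_three_way(counts):
    """Estimate m222 assuming the three-way interaction is zero.

    counts maps cell labels ("111", "112", ..., "221") to observed counts,
    where 1 = listed by that source and 2 = not listed.
    """
    numerator = counts["111"] * counts["122"] * counts["212"] * counts["221"]
    denominator = counts["112"] * counts["121"] * counts["211"]
    if denominator == 0:
        raise ValueError("a zero cell makes this estimate undefined")
    return numerator / denominator


# Hypothetical observed cells (seven of the eight cells of the 2^3 table).
observed = {
    "111": 40, "112": 160, "121": 60, "122": 240,
    "211": 40, "212": 160, "221": 60,
}

m222 = missing_cell_no_three_way(observed)
total = sum(observed.values()) + m222

print(m222)   # 240.0
print(total)  # 1000.0
```

Simpler models with fewer interaction terms generally need an iterative fit rather than a closed form, which is not shown here.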

78
Log-linear modeling
  • No interaction: sources are independent (1 model)
  • Interaction between 2 sources only (3 models)
  • Interactions between two pairs of sources (3 models)
  • Interactions between all sources 2 by 2 (1 model)
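
The eight candidate models for three sources can be listed mechanically. This Python sketch assumes the sources are called A, B and C and that every model contains all three main effects. It enumerates each subset of the pairwise interaction terms.

```python
from itertools import combinations

pairwise_terms = ["AB", "AC", "BC"]

# Every subset of the pairwise interactions defines one candidate model.
models = [
    subset
    for size in range(len(pairwise_terms) + 1)
    for subset in combinations(pairwise_terms, size)
]

for interactions in models:
    print("A + B + C" + "".join(" + " + term for term in interactions))

# Number of models with 0, 1, 2 and 3 interaction terms.
models_per_size = [
    sum(1 for interactions in models if len(interactions) == size)
    for size in range(len(pairwise_terms) + 1)
]
print(models_per_size)  # [1, 3, 3, 1]
```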

79
How to choose the best model
  • Aim
  • Best fit of observed data with least number of
    interaction terms
  • Principle of parsimony
  • Strategy
  • Start with saturated model (all interactions
    accounting for all potential dependency)
  • Remove interaction terms in stepwise fashion
    based on likelihood ratio statistic G²
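
The likelihood ratio statistic compares observed cell counts with the counts fitted by a model: G² = 2 Σ O ln(O/E). The Python sketch below is minimal. The fitted values are assumed to come from a log-linear fit done elsewhere, and the numbers are hypothetical.

```python
import math


def g_squared(observed, expected):
    """Likelihood ratio statistic G2 = 2 * sum(O * ln(O / E)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(
        o * math.log(o / e)
        for o, e in zip(observed, expected)
        if o > 0
    )


# Hypothetical observed counts and fitted counts from two candidate models.
observed = [10, 20]
fitted_simple = [15, 15]
fitted_saturated = [10, 20]

print(round(g_squared(observed, fitted_simple), 3))
print(g_squared(observed, fitted_saturated))  # 0.0: a perfect fit
```

A small G² relative to the model's degrees of freedom suggests that dropping the interaction term did not noticeably worsen the fit.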

80
Evaluation of Legionella notification system,
France 1995
  • Mandatory notification system
  • Implemented 1987
  • Clinician report
  • No validation, little feedback
  • 60 cases per year (average)
  • Re-organisation defined as priority

→ Evaluate sensitivity of system using
capture-recapture
81
Three sources
  1. Notification system (NS)
  2. National Reference Laboratory (NRL)
    • Confirmation of diagnosis, typing of strains
    • >200 diagnoses per year
  3. Hospital Laboratories (HL)
    • Survey among all hospital bacteriology
      laboratories (n = 432)
    • 357 cases identified in 1995

82
Distribution of case reports by source
NS = Notification system; NRL = National Reference
Laboratory; HL = Hospital Laboratories
83
Two-source estimates
  • Two-source estimates
  • Tests of independence (Wittes)
  • Merge NS/NRL into one source

NS/NRL: 389 cases; NS/HL: 615 cases; HL/NRL:
715 cases
NS+NRL / HL: 528 (495-561) cases
84
How many deaths from pertussis in England 1994-9?
Official statistics (ONS): 18
Deaths in hospital (HES): 9
Laboratory surveillance (ES): 22
Total: 33 deaths observed
Estimated true number of deaths: 46 (37-71)
Official statistics: 18/33 (54%) observed
or 18/46 (39%) estimated
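
The two sensitivity figures for official statistics come from dividing the same count by two different denominators. A small Python sketch using only the numbers on this slide, with illustrative variable names:

```python
ons_deaths = 18        # official statistics
observed_deaths = 33   # deaths found by at least one source
estimated_deaths = 46  # capture-recapture estimate of the true total

sensitivity_vs_observed = ons_deaths / observed_deaths
sensitivity_vs_estimated = ons_deaths / estimated_deaths

print(f"{sensitivity_vs_observed:.1%} of observed deaths")
print(f"{sensitivity_vs_estimated:.1%} of estimated deaths")
```

The gap between the two figures is the share of deaths that no source recorded.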
85
What is the sensitivity of hepatitis A
surveillance in England?
  • Known under-reporting of cases
  • Known failure to report risk factor data
  • Under-ascertainment of outbreaks in injecting
    drug users
  • Evaluation of surveillance system
