1
RECORD LINKAGE 201: VISION FOR DATA INTEGRATION
TO ACTION AND IMPLEMENTATION
  • Russell S. Kirby, Ph.D., M.S., F.A.C.E.
  • Department of Maternal and Child Health
  • School of Public Health
  • University of Alabama at Birmingham

2
Objectives
  • Place record linkage in a broad framework for
    planning, analysis, and public health action
  • Focus on key issues in planning, implementation,
    evaluation, and utilization of record linkage
    projects with administrative public health
    databases
  • Avoid falling asleep listening to a boring
    presentation right after lunch

3
(No Transcript)
4
What is Record Linkage?
  • "If we assume there is a single record as well as
    a file of records and all records relate to some
    entities (persons, businesses, addresses, etc.) . . .
    Record linkage is the operation that, using
    the identifying information contained in the
    single record, seeks another record in the file
    referring to the same entity."
  • -- Ivan Fellegi, Statistics Canada

5
A Long History
  • Based on this definition, record linkage has been
    around for a long time!
  • In public health, modern methods date only back
    to the 1960s, and its broad use is truly a
    phenomenon of the 1990s into the present decade.

6
Population Health Informatics
  • Record linkage should not be undertaken as an end
    unto itself.
  • Rather, projects should be done within a broad
    informatics context, with scientifically sound
    strategies. Data quality issues should be a
    paramount concern at all steps in the record
    linkage process.
  • Ideally, record linkage should be done within the
    context of a theoretical framework and a research
    study design.

7
CONCEPTUAL FRAMEWORK FOR POPULATION HEALTH
Genetic Endowment
Physical Environment
Social Environment
  • Individual
  • Response
  • Behavior
  • Biology

Health Care
Disease
Prosperity
Well-Being
Traditional Medical Model of Health Care
Source: modified from Evans RG, Barer ML, Marmor
TR, Eds. Why Are Some People Healthy and Others
Not? New York: Aldine de Gruyter, 1994.
8
Maternal and Child Health
  • Most of our databases represent administrative
    data
  • Most of these data focus on aspects of disease
    processes or systems of care (traditional medical
    model)
  • While some of our databases are population-based,
    some are program-based (and by no means are all
    public health programs population-based)

9
COMPONENTS OF AN IDEAL STATEWIDE PERINATAL DATABASE
1. Linkages relating to the index pregnancy
Maternity/ Newborn/ Postpartum Hospital Data
Death Certificates (linked to age 14)
Certificates of Live Birth and Fetal Death
Perinatal Risk Assessment (8, 20, 36 Wks)
NICU Discharge Data
Cancer Registry Cases (under age 15)
a.k.a. prenatal care data
MSAFP Data
Fetal/Infant Mortality Review
Child Fatality Review
Clinical Genetics Database
Newborn Screening Database
Birth Certificate Linkage
Blood Lead Screening Registry
Risk Assessment Linkage
Hospital/NICU Data Linkage
Birth to Three IDEA Part H
Infant Hearing Screening Registry
Death Certificate Linkage
Clinical Genetics Data Linkage
Screening Data Linkage
Immunization Database
MSAFP Data Linkage
R. S. Kirby, Version 5/30/02
BDS/High Risk Linkage
10
(No Transcript)
11
COMPONENTS OF AN IDEAL STATEWIDE PERINATAL DATABASE
2. Linkages across pregnancies
Birth Certificate of Mother
Birth Certificate of Index Child
  a. Sibship studies involving risk factors from a
    previous pregnancy, or prospective outcomes
    conditional on the index pregnancy. This
    can also apply to pedigrees, and to educational
    records across family members.
  b. Intergenerational effects of pregnancy
    outcomes.
  c. Linkages within maternal sibships across
    generations.
  d. These approaches apply equally to hospital
    discharge data.
R. S. Kirby, Version 4/2/07
12
COMPONENTS OF AN IDEAL STATEWIDE PERINATAL DATABASE
3. Linkages between mother and pregnancy
Certificates of Live Birth and Fetal Death
Death Certificates
Hospital Discharge Survey
Routine linkage to identify maternal and
reproductive deaths among women of child-bearing
age (10-49), conducted among deaths
occurring at 42 or 90 days, or within one year of
termination of the index pregnancy. If
spontaneous abortion and/or induced termination
data are collected with personal identifiers,
these events should also be routinely linked with
death certificates, as should hospital discharge
(in-patient or emergency room) records for deaths
occurring to women with ICD-9-CM or CPT-4 codes
relating to reproductive health.
R. S. Kirby, Version 8/19/96
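
Once a death certificate has been linked to the record that ends the index pregnancy, the 42-day, 90-day, and one-year windows mentioned above reduce to date arithmetic. A minimal Python sketch (the function name, the labels, and the choice to treat each boundary day as inside the window are assumptions, not part of the slides):

```python
from datetime import date

def death_window(pregnancy_end, death_date):
    """Label a death by days elapsed since the end of the index pregnancy."""
    days = (death_date - pregnancy_end).days
    if days < 0:
        return "before pregnancy end"
    if days <= 42:   # assumption: day 42 itself counts as inside the window
        return "within 42 days"
    if days <= 90:
        return "within 90 days"
    if days <= 365:
        return "within one year"
    return "outside window"

# Hypothetical dates, for illustration only.
print(death_window(date(2001, 3, 1), date(2001, 4, 5)))
```

A negative interval is returned as its own label because it usually signals a bad link or a date error that deserves a look.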
13
COMPONENTS OF AN IDEAL STATEWIDE PERINATAL DATABASE
4. Routine automated geocoding of addresses to
latitude-longitude coordinates
  • All vital statistics records should be geocoded
    by place of residence.
  • All health facilities should be geocoded by
    location.
  • For mortality and injury studies, data sufficient
    to identify the location where the death or
    injury occurred should be recorded in the
    documentation, and this location should also be
    geocoded.
  • Routine geocoding is an automated,
    computer-assisted process; the time required to
    do it diminishes with the implementation of a
    prospective system in which address files are
    continually corrected and updated.

R. S. Kirby, Version 8/19/96
14
COMPONENTS OF AN IDEAL STATEWIDE PERINATAL DATABASE
5. Linkages for child health, growth and development
Link with Birth Defects Surveillance
CSHCN Database
R. S. Kirby, Version 7/7/03
15
NICU Discharge Data
Hospital Discharge Data
Medicaid
Reports of Communicable Diseases
WIC
Newborn Metabolic Screening
DDS and CSHCN
Certificates Of Live Birth
Newborn Hearing Screening
Blood Lead Screening
Birth Defects Surveillance Data
Immunization Registry
Early Intervention (Birth to 3)
Child Abuse And Neglect/ Child Protective Services
16
Elvis Presley on Love
  • You don't know what you've got,
  • until you LOSE it . . .

17
Kirby on Data in Databases
  • You don't know what you've got,
  • until you USE it . . .

18
RECORD LINKAGE: Who, What, Why, When, Where, How?
  • Which question is primary?

19
RECORD LINKAGE: Why?
  • What is the purpose of the study?
  • Does a record linkage make sense?
  • would a simple numerator/denominator analysis
    suffice?
  • can the linkage be conducted in a manner that
    supports the use of the resultant database for
    other projects?
  • is a record linkage technically feasible?
  • is a record linkage necessary?

20
RECORD LINKAGE: How?
  • Manual versus automated linkage
  • The theoretical basis for record linkage
  • deterministic methods
  • probabilistic methods
  • The need for identifiers
  • Record linkage with names and dates
  • Software: buy specialized, use a statistical
    software package, or develop your own?
  • Statistical evaluation of linkage results is
    imperative, regardless of the method

21
RECORD LINKAGE: Who?
  • What personnel should do the linkage?
  • dedicated linkage specialists?
  • statisticians/programmers/analysts?
  • Should linkage staff be subjected to personality
    profiles?
  • What cases/events qualify for the linkage?

22
RECORD LINKAGE: What?
  • What databases should be linked?
  • What are the functional relationships between the
    records in each of the candidate datasets? Are
    they sufficient to answer the research question?
  • How does the linkage support the
    programmatic/research needs for which the linkage
    was proposed?
  • Is there a plan for data warehousing or
    systematic data integration?

23
RECORD LINKAGE: Where?
  • Where should the linkage be done?
  • statistical agency?
  • epidemiological agency?
  • university research center?
  • contract to vendor?
  • Don't forget the importance of spatial
    identifiers
  • consider geocoding as another aspect of record
    linkage

24
RECORD LINKAGE: When?
  • How often should linkages be done?
  • The periodicity of routine linkages is predicated
    on the programmatic need for timeliness and
    reporting, e.g.
  • infant deaths: link immediately
  • hospital discharges and birth certificates:
    quarterly or annually may be appropriate
  • linkages to support passive case-finding
    registries: periodicity defined by registry needs

25
With all this in mind . . .
  • Let's review some perspectives from the experts
    on how to do record linkage with public health
    databases.

26
TOP TEN LIST: TEN BEST WAYS TO DO BAD PUBLIC
HEALTH RECORD LINKAGE
With apologies to David Letterman, and thanks for
editorial assistance to Elizabeth Kirby and for
their insights to the following Internet
contributors:
  • Kim Hauser, University of South Florida
  • Phil Klein, Wisconsin Department of Workforce
    Development
  • Richard Miller, Wisconsin Bureau of Health
    Information
  • Mark Fulcomer, Pennsylvania
  • Kate Kvale, Wisconsin Division of Public Health
  • Patrick Remington, Department of Population
    Health, University of Wisconsin
  • Russel Rickard, Colorado Department of Health and
    Environment
  • Melissa Adams, University of Alabama at
    Birmingham
  • Phil Cross, NY State Congenital Malformations
    Registry
R.S. Kirby, December 2002
27
Just have someone else do the linkage for you,
then use the "don't ask, don't tell" method
perfected by the military. That way, what you don't
know doesn't hurt you!
  • -- Anonymous correspondent, summer of 2002

28
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 10
All for one and one for all
Always trust the Social Security Number in the
database as the correct Social Security Number
for that individual. If there are duplicate
Social Security Numbers for obviously different
individuals (based on age, gender or other
conflicting information between persons),
randomly select just one. Use the latest state
lottery results to obtain the random numbers.
29
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 9
The Shell Game
Change the linkage identifier every time you
recreate the data set. This keeps your data
users guessing, plus they can't refer to specific
records based on the linkage identifier. This
ensures confidentiality! If the vital statistics
agency refuses to allow birth certificate numbers
to be used, generate your own unique identifier
based on the record's physical location in the
input file. Overwrite this field each time the
dataset is accessed. Compiling the final
analysis file should be a snap!
30
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 8
A rose is a rose is a rose is . . .
It doesn't matter if you get twins matched
correctly across files, since they are identical
anyway. If subjects share a genotype, this
should entitle you to share a link.
31
Ma'am, you can have any color car you want, so
long as it's black
  • -- Henry Ford, 1920s

32
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 7
What you get is what you see
If a variable is listed in a data dictionary it
is safe to assume you can use it for linking.
After all, it has always been collected, and in
exactly the same manner, for the time period and
geographical area related to your study. This
rule of thumb holds especially for
race/ethnicity, educational attainment, and all
disease, procedure, and billing fields.
33
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 6
If it runs, don't fix it
Always strive to develop computer algorithms that
overmatch. High match percentages are impressive
and will also save staff time. Corollary: There
should never be a need to physically examine any
of the source documents used in the linkage
process.
34
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 5
What, me worry?
Checking for duplicate records just slows down
the process; this is a step that can be
eliminated. Instead, simply verify that the
output dataset contains the same number of
records as the largest input file. Then, proceed
to conduct the analyses.
35
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 4
The quality goes in, before the name
Don't bother to check for name changes. It
doesn't happen often enough to change your
statistics. This is especially true for women,
children who are adopted or in foster care, or
the rare family that speaks Spanish or other
languages, or comes from a culture where surnames
are listed first.
36
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 3
Only you, and you alone . . .
There is only one valid and reliable record
linkage strategy: your own. Never test or
evaluate it, and by all means never subject the
computer algorithm to scrutiny by others!
37
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 2
Black and white, or shades of gray
Deterministic linkages must be correct; after
all, they are based on EXACT matches. Why settle
for a complicated probabilistic matching
procedure, when you can be certain?
38
Top Ten List Ten Best Ways to Do Bad Public
Health Record Linkage
Number 1
Bread and Roses
Spend months of staff time discussing whether to
do record linkages. Be sure to include the
department attorneys, the HIPAA privacy
consultant, and the division directors for each
program dataset to be included. Assume that the
project will take weeks to a month at most, and
that once completed, next year it can be run as
an overnight computer job.
39
KEY ISSUES
  • Why link?
  • To link, or not to link? . . . or
  • I link, therefore I am?
  • Defining the nature of the problem
  • What is the purpose?
  • What do the records in each dataset represent?
  • What will we do with the results?

40
Why Link? (select the best answer)
  1. We cannot answer the research or policy question
    without linking the databases.
  2. We have to under the terms of our grant or
    cooperative agreement.
  3. Integrating record linkage into the routine data
    management process of our program enables us to
    assess the program's effectiveness and efficiency
    on a continual basis.

41
Why Not Link? (select the best answer)
  1. Lack of funding.
  2. Staff don't have training.
  3. Necessary hardware/software/data storage
    unavailable.
  4. Bureaucratic inertia.
  5. Turf battles between programs.
  6. Question doesn't warrant linkage.
  7. Some of the above?
  8. All of the above?

42
(No Transcript)
43
First Steps
  • Before conducting a record linkage, carefully
    examine the broad informatics, program and
    research context.
  • Above all else, consider the purpose of the
    linkage project in relation to the planned
    approach and other potential uses of the
    resulting linked dataset.
  • Hint: if you only ask the people on your team
    about other potential uses, the uses identified
    will mostly be within the same frame of reference
    for your own approach.
  • Let's carefully explore the question of linking
    birth certificates and Medicaid pregnancy claims
    data.

44
What do the records represent?
  • Medicaid claims database
  • Pregnant women/mothers
  • Women who are not pregnant or may be pregnant
    (including the elderly)
  • Infants and children
  • Men (what a concept!)
  • Birth certificates
  • Live births
  • Fetal deaths

45
What do the records represent (continued)?
  • Some questions to consider
  • What records are included in the claims database?
    Are there systematic exclusions (e.g. global
    bills for Medicaid managed care recipients)?
    Does the database include only paid claims?
  • Are there records in the Medicaid database that
    may not represent prenatal services?
  • Are there potentially multiple records per
    patient in the claims database?
  • What is the purpose of the linkage?

46
What do the records represent (continued)?
  • Some questions to consider
  • Is the focus of the study on mothers, infants, or
    mother-infant dyads?
  • How do the concepts of residence and
    occurrence affect the likelihood that an event
    will be included in either database?
  • What is the relationship between Medicaid
    eligibility and utilization?
  • What a priori expectations are there concerning
    which records will and will not match?

47
Some possible purposes of the linkage
  1. Link all Medicaid-eligible pregnant women with
    their birth outcomes?
  2. Link all Medicaid-paid deliveries with their
    birth certificates?
  3. Link all Medicaid-eligible pregnant women with
    their infants (or all Medicaid-eligible infants
    with their mothers)?
  4. Create a proxy measure for socio-economic status
    for vital statistics analyses?
  5. Create Medicaid pregnancy episodes of care
    records?
  6. Other purposes?

48
Issues with residence and occurrence in the
context of linking Medicaid and vital statistics
records
  1. Vital statistics datasets include all resident
    and occurrence events in the state thanks to
    the VSCP-NAPHSIS interstate exchange agreement.
    This includes live births, fetal deaths, and deaths,
    but does not extend to non-vital statistics
    records.
  2. Medicaid program data are generally
    state-specific, and state residence is part of
    the eligibility requirement. A woman who gives
    birth in your state, but is a resident of another
    state, may have been a Medicaid participant
    there, but you'll never know. Some states have
    special programs governing reimbursement for
    Medicaid services provided by physicians/health
    care facilities in other states.
  3. If records fail to match, might it be due to
    differences in reporting requirements and
    eligibility?

49
What is the relationship between eligibility
and utilization?
  • Remember that eligibility data are just that;
    unfortunately, some persons who meet eligibility
    requirements never apply or get signed up, while
    others do, but never access the services for
    which they are eligible.
  • Consider linking your eligibility data with
    service utilization data not only to find out
    which clients actually used the program, but also
    for the insights you might gain from the
    utilization data themselves.

50
What records will and won't match?
  • Some pregnancies involving Medicaid-eligible
    women or Medicaid pregnancy claims do not result
    in live births.
  • Fetal deaths
  • Spontaneous or induced abortions
  • Some Medicaid-eligible women may not have been
    residents of the study area at the time of the
    vital event.
  • Over-reliance on unique identifiers (SSNs,
    service IDs) can lead to both mis-matched and
    unmatched records.
  • What's in a name, anyway?

51
Record Linkage Methods
  • Generally there are two classes of linkage
    methodologies
  • Deterministic linkage methods
  • Probabilistic linkage methods

52
Linking data deterministically
53
Which variables are common to both datasets?
  • Run PROC CONTENTS on each dataset

Note: This section is based on the use of SAS.
54
A Word of Caution
  • On the previous slide, mention was made of using
    SAS.
  • If you plan to do record linkage using Microsoft
    Access without complex Visual Basic code, DON'T!
    The same applies to other relational database
    software.
  • Linkages based solely on straightforward JOINs
    will allow significant error to remain in your
    matched results.

55
An Even Stronger Word of Caution
  • If you plan to conduct a deterministic match
    using a single identifying variable, or requiring
    a match on that variable together with others,
    DON'T!
  • A good example of this is the Social Security
    Number.
  • On the other hand, once you have linked records,
    assigning a common identifier to both datasets
    will facilitate future data processing.
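
To see why a single identifier is risky, it can help to run the naive join and then look at what else the paired records say. A minimal Python sketch with made-up records and fake SSNs (the field names and the conflict rule are assumptions for illustration):

```python
# Hypothetical records; the SSNs are fake.
file_a = [{"ssn": "000-00-0001", "sex": "F", "dob": "1975-03-02"}]
file_b = [
    {"ssn": "000-00-0001", "sex": "F", "dob": "1975-03-02"},
    {"ssn": "000-00-0001", "sex": "M", "dob": "1999-12-25"},  # same SSN, different person
]

def ssn_links(left, right):
    """Pair every left/right record that shares an SSN, flagging conflicts."""
    out = []
    for a in left:
        for b in right:
            if a["ssn"] == b["ssn"]:
                # Assumed rule: disagreement on sex or date of birth is a conflict.
                conflict = a["sex"] != b["sex"] or a["dob"] != b["dob"]
                out.append((a["ssn"], b["dob"], conflict))
    return out

links = ssn_links(file_a, file_b)
print(links)
```

Both pairs "match" on SSN, yet the second one is flagged, which is exactly the kind of error an SSN-only join leaves in the linked file.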

56
And now, back to our regular program . . .
57
Mother's information
  • Birth certificate      Newborn screen
  • Birth_mom_legal        Screen_mom_legal_last
  • Birth_mom_mid          Screen_mom_mid
  • Birth_mom_first        Screen_mom_first
  • Birth_mother_dob       Screen_mother_dob

58
Infant's information
  • Birth certificate      Newborn screen
  • Birth_child_last       Screen_child_last
  • Birth_child_mid        Screen_child_mid
  • Birth_child_first      Screen_child_first
  • _Birth_gender          _Screen_gender
  • Birth_child_dob        Screen_date

59
Other information
  • There could also be related fields that don't
    specifically identify the individual, but are
    useful for record linkage
  • Birth certificate      Newborn screen
  • Birth_zip_code         Screen_zip_code
  • Birth_hosp             Screen_hosp

60
Missing data
  • Look for missing data in linkage variables.
  • What do you do when you find it?
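
A first step is simply to count how often each linkage variable is blank. A minimal Python sketch (the slides themselves work in SAS), using invented records with field names borrowed from the earlier slides; which placeholder values count as "missing" is an assumption that has to be checked against each file's documentation:

```python
# Hypothetical records; field names follow the slides, values are invented.
records = [
    {"Birth_mom_first": "ANA", "Birth_mother_dob": "1980-02-01", "Birth_zip_code": "35294"},
    {"Birth_mom_first": "", "Birth_mother_dob": "1979-11-30", "Birth_zip_code": None},
    {"Birth_mom_first": "LEE", "Birth_mother_dob": "UNKNOWN", "Birth_zip_code": "35205"},
]

# Assumption: these placeholder values all mean "missing" in this file.
MISSING = {None, "", "UNKNOWN", "99999"}

def missing_rates(rows, fields):
    """Return the fraction of rows in which each field is missing."""
    return {f: sum(r.get(f) in MISSING for r in rows) / len(rows) for f in fields}

rates = missing_rates(records, ["Birth_mom_first", "Birth_mother_dob", "Birth_zip_code"])
for field, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{field}: {rate:.0%} missing")
```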

61
Duplicate records
  • Look for records that share the same values for
    your vector of matching variables.
  • What do you do when you find records that share
    these values?
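
Finding such records is easy to automate; deciding what to do with them is not. A minimal Python sketch that groups records by the vector of matching variables and reports any group with more than one member (the records, the field names, and the `id` field are hypothetical):

```python
from collections import defaultdict

# Hypothetical records; "id" is a row identifier within the file.
records = [
    {"id": 1, "last": "SMITH", "first": "ANN", "dob": "2001-05-01"},
    {"id": 2, "last": "SMITH", "first": "ANN", "dob": "2001-05-01"},
    {"id": 3, "last": "JONES", "first": "BEA", "dob": "2001-05-01"},
]

def duplicate_groups(rows, key_fields):
    """Group row ids by their matching-variable values; keep groups of 2+."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[f] for f in key_fields)].append(r["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

dups = duplicate_groups(records, ["last", "first", "dob"])
print(dups)
```

A group like this may be a true duplicate, a pair of twins, or two different people who happen to share values, so the output is a review list rather than a delete list.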

62
Ranking of linkage variables
  • Which variables are the best variables?
  • How much missing data in each variable?
  • What do you know about the variables?
  • How do you decide?

63
The art of creating a linkage algorithm
  • Use the most discriminating combination of
    variables first
  • Loosen criteria as you go along

64
The art of creating a linkage algorithm
Most strict criteria
Linkage step 1
Linkage step 2
Linkage step 3
Least strict criteria
Linkage step
65
Create id in data set
  • Allows you to easily merge back with original
    data
  • Easy as:
  • data new;
  •   set old;
  •   id = _n_;
  • run;

66
Sort by chosen linkage variables
  • What happens when you don't use BY variables?
    For example:
  • DATA LINKED;
  •   MERGE BCERT MED;
  • RUN;
  • Be sure you unduplicate the output file (i.e., the
    NODUPKEY option in PROC SORT)
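
The answer to the question above: without BY variables, records are paired by position in the file, not by who they describe. A minimal Python sketch of the difference, using two tiny made-up files (Python's `zip` stands in, roughly, for a merge with no BY statement; the names and fields are invented):

```python
# Hypothetical rows; the two files list people in different orders.
bcert = [{"name": "ANN SMITH", "bwt": 3200}, {"name": "BEA JONES", "bwt": 2900}]
med = [{"name": "BEA JONES", "paid": 1}, {"name": "ANN SMITH", "paid": 0}]

# Positional pairing (roughly what a merge with no BY variables does):
# row 1 goes with row 1, row 2 with row 2, whatever the names say.
positional = [(b["name"], m["name"]) for b, m in zip(bcert, med)]

# Keyed pairing: rows are paired only when the linkage variable agrees.
# Assumes names are unique within med, which is why unduplication comes first.
med_by_name = {m["name"]: m for m in med}
keyed = [
    (b["name"], med_by_name[b["name"]]["name"])
    for b in bcert
    if b["name"] in med_by_name
]

print(positional)
print(keyed)
```

The positional result has the right number of rows and no error message, which is what makes this mistake easy to miss.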

67
Merge by chosen linkage variables
  • Create data set with only linked records
  • Keep track of the link level: the level of linkage
    where records matched
  • Don't discard records that fail to match at each
    step
  • Consider allowing full replacement prior to
    running each new iteration

68
Re-merge to get unlinked datasets
  • Unlinked data sets contain only variables from
    that data set
  • Unlinked records sent to next level of linkage
    algorithm
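
The stepwise approach on the last few slides (strictest criteria first, record the link level, pass unlinked records to the next step) can be sketched in Python. This is an illustration under stated assumptions, not the SAS programs behind the slides: the records, the field names, and the three steps are invented, and each file is assumed to have been unduplicated already.

```python
# Hypothetical records; "id" identifies each row within its own file.
births = [
    {"id": 1, "last": "SMITH", "first": "ANN", "dob": "2001-05-01", "zip": "35294"},
    {"id": 2, "last": "JONES", "first": "BEA", "dob": "2001-06-10", "zip": "35205"},
    {"id": 3, "last": "GARCIA", "first": "EVA", "dob": "2001-07-04", "zip": "35211"},
]
screens = [
    {"id": 10, "last": "SMITH", "first": "ANN", "dob": "2001-05-01", "zip": "35294"},
    {"id": 11, "last": "JONES", "first": "BEATRICE", "dob": "2001-06-10", "zip": "35205"},
    {"id": 12, "last": "LOPEZ", "first": "EVA", "dob": "2001-07-05", "zip": "35211"},
]

# Assumed ordering of steps, strictest first; a real project would choose its own.
STEPS = [
    ("last", "first", "dob", "zip"),
    ("last", "dob", "zip"),
    ("last", "dob"),
]

def link_in_steps(left, right, steps):
    links = []  # (left id, right id, step number at which the pair matched)
    for level, fields in enumerate(steps, start=1):
        index = {}
        for r in right:
            index.setdefault(tuple(r[f] for f in fields), []).append(r)
        matched_left, matched_right = set(), set()
        for l in left:
            candidates = index.get(tuple(l[f] for f in fields), [])
            # Accept a pair only when exactly one unclaimed candidate remains.
            candidates = [c for c in candidates if c["id"] not in matched_right]
            if len(candidates) == 1:
                links.append((l["id"], candidates[0]["id"], level))
                matched_left.add(l["id"])
                matched_right.add(candidates[0]["id"])
        # Unlinked records are kept and passed to the next, looser step.
        left = [l for l in left if l["id"] not in matched_left]
        right = [r for r in right if r["id"] not in matched_right]
    return links, left, right

links, unlinked_births, unlinked_screens = link_in_steps(births, screens, STEPS)
print(links)
print([b["id"] for b in unlinked_births], [s["id"] for s in unlinked_screens])
```

Keeping the step number with every link makes it possible to evaluate links from the looser steps separately, and the two leftover lists are the starting point for investigating why records failed to match.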

69
Last step
  • Combine all linked data sets
  • Investigate unlinked records
  • Look for systematic errors responsible for
    non-linking
  • Look for biases
  • Evaluate quality of links in linked records

70
Probabilistic Record Linkage
  • Uses probabilities to determine whether a pair of
    records refer to the same individual
  • Calculates weights to quantify the likelihood
    that a pair of records are a true match
  • Computationally intensive: each record in each
    dataset is compared with every record in
    the other dataset
  • Probabilistic weights may be either non-specific
    or value specific

71
General (Non-Specific) Weights
  • Agreement on a specific variable
  • Example
  • - Agreement on date of birth receives a higher
    weight than a match on sex
  • - Disagreement on sex receives a higher
    penalty than disagreement on date of birth
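
One common way to get weights with this behavior is the Fellegi-Sunter formulation, which the slides do not name explicitly. For each variable, m is the probability that true matches agree and u is the probability that non-matches agree by chance; the agreement weight is log2(m/u) and the disagreement weight is log2((1-m)/(1-u)). A minimal Python sketch with assumed, purely illustrative values of m and u:

```python
import math

def weights(m, u):
    """Return (agreement weight, disagreement weight) in log base 2."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))

# Assumed, purely illustrative probabilities (m, u) for each variable.
PROBS = {"sex": (0.99, 0.50), "dob": (0.95, 0.001)}

for field, (m, u) in PROBS.items():
    agree, disagree = weights(m, u)
    print(f"{field}: agree {agree:+.2f}, disagree {disagree:+.2f}")
```

With these inputs, agreement on date of birth scores well above agreement on sex, while disagreement on sex is penalized more heavily than disagreement on date of birth, as in the example above. In practice m and u are estimated from the data rather than assumed.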

72
Value Specific Weights
  • Agreement on a specific value of the variable
    being compared
  • Example: comparing initials using value specific
    weights
  • - Agreement on initial Z receives higher weight
    than match on initial S
  • - Disagreement on initial S receives higher
    penalty than disagreement on Z
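
A minimal Python sketch of the agreement half of this idea: the chance of agreeing by accident on a particular value is approximated by how common that value is in the file, so rare values earn more weight. The initials, their counts, and the value of m are invented, and using the simple relative frequency as the chance-agreement probability is a common simplification rather than the only option. The disagreement side is not covered here.

```python
import math
from collections import Counter

# Hypothetical surname initials observed in one file.
initials = ["S"] * 90 + ["M"] * 60 + ["Z"] * 2
freq = Counter(initials)
total = sum(freq.values())

M_PROB = 0.95  # assumed chance that a true match agrees on the initial

def agreement_weight(value):
    """Agreement weight for one specific value; rarer values score higher."""
    u_value = freq[value] / total  # approximate chance agreement on this value
    return math.log2(M_PROB / u_value)

for initial in ("Z", "S"):
    print(initial, round(agreement_weight(initial), 2))
```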

73
Benefit of Weights
  • Weights objectively reflect our confidence in a
    match
  • Individual choice in cutting off low weights

74
Probabilistic Linkage Methods
  • Some SAS programmers write their own
    probabilistic code
  • Software packages
  • - Very expensive
  • - Difficult to use
  • - Some applications are available as freeware or
    shareware

75
Choosing Probabilistic Software
  • Links: same as LinkPro, but freeware
  • Link the King
  • Link Plus (CDC - http://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm)
  • FEBRL: also freeware and open source; has a steep
    learning curve

76
Linkage Evaluation
  • A significant advantage of probabilistic methods
    is that evaluation of the linkage results is an
    explicit step in the methodology.
  • The analyst must determine what level of
    tolerance will be applied for acceptance of a
    matched pair of records.
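
In practice the tolerance is often expressed as cutoffs on the total weight of a candidate pair. A minimal Python sketch using two cutoffs, with a middle band set aside for manual review; the cutoff values, the pair labels, and the review band itself are assumptions for illustration:

```python
# Assumed cutoffs on the total weight; choosing them is the analyst's decision.
UPPER, LOWER = 12.0, 4.0

def classify(total_weight, upper=UPPER, lower=LOWER):
    """Sort a candidate pair into link, clerical review, or non-link."""
    if total_weight >= upper:
        return "link"
    if total_weight >= lower:
        return "review"
    return "non-link"

# Hypothetical candidate pairs with summed field weights.
pairs = {("b1", "s10"): 17.3, ("b2", "s11"): 8.9, ("b3", "s12"): -6.1}
decisions = {pair: classify(w) for pair, w in pairs.items()}
print(decisions)
```

Moving the cutoffs trades false links against missed links, so the values chosen and the reasons for them belong in the project documentation.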

77
Document, Document, Document
  • Even if you plan to remain in your current job
    for the next 30 years, the importance of careful
    documentation in programs, output, data
    dictionaries, and reports cannot be stressed
    strongly enough.
  • Retain statistical program logs, keep track of
    the provenance of input datasets, and document
    all decisions made concerning methods and their
    application.

78
Data Warehouses
  • Be wary of warehouses, lest you fall into the
    trap of believing they are all things to all
    people.
  • More specifically
  • When linkages within the warehouse are made
    solely on the basis of unique identifiers,
    caveat emptor.
  • Always ask the question of how the linkages for
    the warehouse were done, and more importantly,
    for what purpose.

79
Data Warehouses (continued)
  • The term "data warehouse" means different things
    to different people.
  • For some, it's a perfect one-to-many/many-to-one
    linkage repository
  • For others, it's a library of databases
    containing records of unknown or untested
    relationship to one another
  • For still others, it is a Swiss cheese data
    cube in which some regions are fully populated
    and linked across data sources, while others
    contain data measured at differing levels of
    aggregation, while others contain unlinked
    records, while still others are empty

80
And finally, one more time . . .
81
Evaluate before you analyze
  • Don't assume the linkage has been done correctly,
    whether you did it yourself or it was done by
    someone else.
  • Each time the linkage is done the results must be
    evaluated, whether you use deterministic or
    probabilistic linkage algorithms.
  • Compare values on non-linkage variables as well
    as those used to conduct the linkage, across all
    observations in the dataset.
  • Create pairwise linkage scores and throw out
    linkages between records that don't meet your
    minimum criteria.
  • If you publish reports or submit manuscripts, it
    is imperative that information on how the linkage
    was done and how the results were evaluated prior
    to analysis be included in your methods.
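
Two of the checks above, comparing values on non-linkage variables and scoring each linked pair, can be sketched in a few lines of Python. The linked pairs, the fields checked, and the minimum score are all hypothetical:

```python
# Hypothetical linked pairs: each holds the two source records that were joined.
linked = [
    ({"sex": "F", "hosp": "A01"}, {"sex": "F", "hosp": "A01"}),
    ({"sex": "M", "hosp": "B07"}, {"sex": "M", "hosp": "B09"}),
    ({"sex": "F", "hosp": "C03"}, {"sex": "M", "hosp": "C03"}),
]

def agreement_rates(pairs, fields):
    """Share of linked pairs whose two records agree on each field."""
    return {f: sum(a[f] == b[f] for a, b in pairs) / len(pairs) for f in fields}

def pair_score(a, b, fields):
    """Simple pairwise score: number of checked fields that agree."""
    return sum(a[f] == b[f] for f in fields)

CHECK = ["sex", "hosp"]  # assumed to be fields NOT used to make the links
MIN_SCORE = 2            # assumed minimum criterion

rates = agreement_rates(linked, CHECK)
kept = [(a, b) for a, b in linked if pair_score(a, b, CHECK) >= MIN_SCORE]
print(rates, len(kept))
```

A low agreement rate on a variable that should agree points either to bad links or to a data quality problem in one of the sources; either finding is worth reporting in the methods.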

82
(No Transcript)
83
But we've always done it this way . . . (or,
close enough for government work)
  • Why do the linkages once a year?
  • Consider building linkages into the routine
    processing of records as they are filed or
    reported.
  • Even if linkages are done annually, consider
    creating a database in which links across
    subjects can cross reporting years. This can
    result in a self-correcting feedback loop that
    enables additional unmatched records to be linked
    later on the basis of more current information.

84
THE TEN COMMANDMENTS OF RECORD LINKAGE
With apologies to Mel Brooks, and thanks for
editorial assistance to Elizabeth Kirby and for
their insights to the following Internet
contributors:
  • Jane Lazar, Boston University
  • Craig Mason, University of Maine
  • Russel Rickard, Colorado Department of Health and
    Environment
  • Greg Alexander, University of Alabama at
    Birmingham
R.S. Kirby, November 2003
85
The Ten Commandments of Record Linkage
Number 10
Thou shalt not taketh the name of thine software
in vain.
86
The Ten Commandments of Record Linkage
Number 9
Thou shalt not covet thy neighbor's database, yet
neither should thou hoardeth thine database.
87
The Ten Commandments of Record Linkage
Number 8
Know thy purpose (in doing record linkage).
88
The Ten Commandments of Record Linkage
Number 7
Thou shalt not merge without by variable(s).
89
The Ten Commandments of Record Linkage
Number 6
Thou shalt checketh thine statistical software
log before thou proceedeth to thy next step or
process.
90
The Ten Commandments of Record Linkage
Number 5
Thou shalt protect the privacy of those whose
information is recorded in thy databases, even as
thou useth their personal identifiers to conduct
thine linkage analyses.
91
The Ten Commandments of Record Linkage
Number 4
Thou shalt not bear false witness against the
inconsistent values of variables common to two
datasets, nor because thou faileth to evaluate
thine linkage results.
92
The Ten Commandments of Record Linkage
Number 3
Know thy data.
93
The Ten Commandments of Record Linkage
Number 2
Thou shalt not underestimate the complexity, time
commitment, and staffing required to conduct
thine record linkage projects, nor shalt thou
overestimate the time needed to conduct thine
analyses.
94
The Ten Commandments of Record Linkage
Number 1

Thou shalt show humility to others, even to those
who doubted that the tasks thou hast accomplished
could be done.
95
The life which is unexamined is not worth living
  • - Plato (428-348 B.C.)

96
The database which is unexamined is not worth
analyzing
  • - Kirby, (1954- A.D.)

97
Contact Information
  • Russell S. Kirby, PhD, MS, FACE
  • Department of Maternal and Child Health School of
    Public Health, University of Alabama at
    Birmingham
  • Email: rkirby_at_uab.edu
  • Telephone: 205-934-2985

98
(No Transcript)
99
What is reality?
100
CONTROLLING THE URGE TO MERGE: DIAGNOSIS AND
TREATMENT OF A NEW CLINICAL PSYCHOSIS AFFECTING
PUBLIC HEALTH WORKERS AND RESEARCHERS
  • Russell S. Kirby, Ph.D., M.S., F.A.C.E.
  • Originally described Dec. 1996,
  • revised at UAB Nov. 2002

101
(No Transcript)
102
Impulse-Control Disorders Not Elsewhere
Classified (269)
  • 312.34 Intermittent Explosive Disorder (269)
  • 312.32 Kleptomania (269)
  • 312.33 Pyromania (270)
  • 312.31 Pathological Gambling (271)
  • 312.39 Trichotillomania (272)
  • 312.35 Urge to Merge (272)
  • 312.30 Impulse-Control Disorder NOS (272)

103
Impulse-Control Disorders Not Elsewhere
Classified (269)
  • 312.35 Urge to Merge
  • A. Recurrent failure to resist impulses to link
    public health and/or clinical medical records
    that result in ill-conceived, often unscientific
    linkage strategies and linked files which may be
    inappropriate for the research purposes for which
    they were created.
  • B. The urge to merge manifested by researchers
    and analysts is often stimulated by external
    forces (administrators) but is grossly out of
    proportion to any precipitating bureaucratic
    stressors.
  • C. The urge to merge is not better accounted for
    by Conduct Disorder, Manic Episode, Substance
    Dependence, or Antisocial Personality Disorder.

104
Some clinical features of the urge to merge
psychosis
  • subject observed constantly mumbling about the
    need for a unique identifier
  • subject suffers from multiple tools disorders
    (see DSM-4R for diagnostic criteria), e.g.
  • if Access doesn't work, subject tries SAS
  • if direct importation doesn't work, subject
    converts files to spreadsheets first, then into
    statistical file formats

105
Some clinical features of the urge to merge
psychosis (continued)
  • subject given to making grandiose statements,
    e.g.
  • if you can't drill down, then roll up
  • linked files are data rich and information poor
  • electronic data rules; paper is for illiterates
  • subject often forgets why research projects are
    being done, as the linkage task becomes both
    primary and primal

106
If this is you . . .
  • There is hope.
  • Join the national community of LA (Linkers
    Anonymous) and practice its iterative
    twelve-step algorithm.
  • Talk to your colleagues and co-workers; in time,
    they may come to understand, or at least become
    more tolerant.
  • Remember, you don't have to go it alone!