Title: National Center for Health Statistics Record Linkage Program
1National Center for Health Statistics Record
Linkage Program
- Christine S. Cox,
- Chief, Special Projects Branch (SPB)
- Office of Analysis and Epidemiology (OAE)
- NCHS Data Users Conference
- August 12, 2008
- U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
- Centers for Disease Control and Prevention
- National Center for Health Statistics
2Overview
- NCHS Record Linkage Program
- Analytic Issues and Tools
- Comparative Analysis of Public vs Restricted
Linked Mortality Files
- Accessing the Restricted-use Linked Data
3NCHS Record Linkage Program
- Links survey data with data collected from
administrative records
- Designed to maximize the scientific value of the
NCHS population-based surveys
- Examine factors that influence chronic disease,
disability, health care utilization, morbidity,
and mortality
4Why Do Linkage?
- Augments available information for major
diseases, risk factors, and health service
utilization
- Links exposures to outcomes
- Provides longitudinal component to survey data
- Reduces cost burden
- Re-contacting survey respondents for follow-up
information can be expensive
- Increases accuracy and detail of data collected
5How Records are Linked
6Research Potential of NCHS Linked Data
- Aging
- Risk factors for poor health outcomes (hip
fractures, stroke, etc.)
- Disability
- Effects of chronic illness and obesity on
disability and mortality
- Disparities
- Mortality patterns by race/ethnicity or
socioeconomic status
- Health services
- Functional impairment and health care costs
- Methodologic Studies
- Validation of self-reports vs. administrative
records
- Genetics
- Genetic variants and health outcomes
7Record Linkage Activities
- Mortality
- National Death Index
- Social Security Retirement and Disability
- Data from the Retirement, Survivors, and Disability
Insurance (RSDI) and Supplemental Security Income
(SSI) programs
- Medicare enrollment and payments
- Enrollment and claims data
8NCHS Linked Mortality Data Files
9Number of Deaths by Survey
- NHIS and LSOA II have mortality follow-up through
12/31/2002.
- NHEFS, NHANES II and III have mortality follow-up
through 12/31/2000.
10Public-use Linked Mortality Files
- In 2007, released public-use files with a limited
amount of perturbed data and reduced number of
mortality variables
- NHIS 1986-2000
- NHANES III
- LSOA II
- Study comparing analyses from public-use and
restricted-use linked mortality files
demonstrated similar results
- Lochner et al. Am. J. Epidemiol. 2008;168:336-344
11Mortality Data Elements
- Vital status
- Date of death or follow-up time
- Underlying cause of death
- Multiple cause of death
- Age at death
- Age last presumed alive
- only available on restricted-use files
12Research Potential of Linked Mortality Data
- Excess Deaths Associated with Underweight,
Overweight, and Obesity. KM Flegal, BI Graubard,
DF Williamson, MH Gail. JAMA, 2005;293:1861-1867.
- Living and Dying in the USA: Behavioral, Health,
and Social Differentials of Adult Mortality. RG
Rogers, CB Nam, RA Hummer. 2000.
- Suicide among male veterans: a prospective
population-based study. MS Kaplan, N Huguet, BH
McFarland, JT Newsom. J Epidemiol Community
Health, 2007;61:619-624.
13NCHS Linked Medicare Data Files
14Medicare Linkage
- Medicare enrollment and claims data for the years
1991-2000
- Denominator file
- MEDPAR Inpatient hospitalization
- MEDPAR Skilled nursing facility (SNF)
- Hospital outpatient
- Home Health Agency (HHA)
- Hospice
- Carrier (physician/supplier Part B file)
- Durable Medical Equipment (DMERC)
- Next data release (1999-2007)
- All of the above files
- Chronic Conditions Warehouse
- Medicare Part D (Prescription Drugs)
15Summary Medicare Data File
- Summary Medicare Enrollment and Claims Files
(SMEC) for 1991-2000
- Enrollment information from the Denominator file
plus summary variables of claims and payments
- Variables modeled after MCBS cost and use files
- Total reimbursements per year
- Total number of claims by Medicare record type
- Summary of charges by Medicare record type
- Termination status and reason for termination
- Monthly HMO enrollment
- Medicare status code (i.e. Part A, B or both)
16Research Potential of Linked Medicare Data
- Examine risk factors for health conditions
- Examine reliability of survey data
- Compare survey reported Medicare enrollment to
Medicare claims records
- Examine survey report of disability with program
participation eligibility criteria
- Examine disparities in Medicare service
utilization
17NCHS Linked SSA Data Files
18Social Security Linkage
- Old-Age, Survivors, and Disability Insurance
- Master Beneficiary Record (MBR), 1962-2003
- Program eligibility, benefit amount, payment
status, dual entitlement
- Payment History Update System (PHUS), 1984-2003
- Benefit payment amounts, including withholding
information for Medicare Part B premiums
- Supplemental Security Income
- Supplemental Security Record (SSR), 1974-2003
- Program eligibility, benefit information, and
payment status
19Research Potential of Linked Social Security Data
- Examine reliability of survey information for SSA
program participation and benefits
- Compare the health characteristics of early
retirees (age 62) to those who postpone benefits
- Policy analysis using validated survey data
- Predicting the number of people who will become
disabled based upon survey reported health
conditions
- Determining whether current disability
entitlement funding levels will be adequate as
the population ages
20Future Linkage Activities
- Linkage of 1999-2004 Medicaid enrollment and
claims data to 1999-2004 NHIS and NHANES
- NCHS series report comparing the mortality
experience of the 1986-2000 National Health
Interview Survey Participants with the U.S.
population
21Overview
- NCHS Record Linkage Program
- Analytic Issues and Tools
- Comparative Analysis of Public vs Restricted
Linked Mortality Files
- Accessing the Restricted-use Linked Data
22National Center for Health Statistics Record
Linkage Program: Analytic Issues and Tools
- Kimberly A. Lochner, SPB, OAE
- NCHS Data Users Conference
- August 12, 2008
- U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
- Centers for Disease Control and Prevention
- National Center for Health Statistics
23Analytic Issues Overview
- Linkage eligibility
- Linkage match status
- Combining survey years for the linked mortality
files
- Changes in surveys or administrative data over
time
- Issues with administrative data
24Mortality Analytic Issues
- Eligibility status
- Sample weights
- Combining survey years for the linked mortality
files
- Variance estimation
- Changes over time
- ICD-9 and ICD-10 codes
- Most of these issues apply only to the NHIS
Linked Mortality Files
26Eligibility Status
- What determines eligibility for mortality
follow-up?
- Age
- Non-adult survey respondents are INELIGIBLE
- Future linkages will include children
- Sufficient data for matching
- Lack of identifying data makes a respondent INELIGIBLE
- Drop INELIGIBLE survey respondents
- Variable indicating eligibility status on files
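The "drop INELIGIBLE respondents" step can be sketched in a few lines. This is only an illustration: the variable name `ELIGSTAT` and the coding (1 = eligible) are assumptions made for the example, and the file documentation should be checked for the actual eligibility variable and its codes.

```python
# Hypothetical records; ELIGSTAT and its coding (1 = eligible) are assumed
# for illustration and should be checked against the file documentation.
respondents = [
    {"id": "A", "ELIGSTAT": 1},
    {"id": "B", "ELIGSTAT": 3},
    {"id": "C", "ELIGSTAT": 1},
    {"id": "D", "ELIGSTAT": 2},
]

ELIGIBLE = 1


def drop_ineligible(records, flag="ELIGSTAT", eligible_code=ELIGIBLE):
    """Keep only respondents eligible for mortality follow-up."""
    return [r for r in records if r.get(flag) == eligible_code]


analytic_sample = drop_ineligible(respondents)
print([r["id"] for r in analytic_sample])
```

Filtering before any analysis keeps respondents without usable matching data from being counted as survivors.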
27Mortality Ineligibility: Lack of Matching Data
(adults only)
28Eligibility Status
- Ineligibility is a problem for NHIS
- Created new sample weights to account for
ineligibility due to insufficient identifying
data
- Original NHIS sample weights (WTFA)
- New NHIS sample weights (WGT_NEW)
- Only for core/person files
- Recommend using WGT_NEW
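To show why the choice of weight matters, here is a minimal sketch of a weighted percent deceased using WGT_NEW. The records and weight values are made up for illustration, and the helper `weighted_percent` is hypothetical. It gives a point estimate only; variance estimation still needs survey-design software such as SUDAAN.

```python
def weighted_percent(records, flag, weight=None):
    """Percent of records with a true flag, optionally weighted."""
    total = sum(r[weight] if weight else 1.0 for r in records)
    hits = sum((r[weight] if weight else 1.0) for r in records if r[flag])
    return 100.0 * hits / total


# Made-up eligible respondents with original (WTFA) and adjusted (WGT_NEW) weights.
records = [
    {"dead": True, "WTFA": 1000.0, "WGT_NEW": 1100.0},
    {"dead": False, "WTFA": 3000.0, "WGT_NEW": 3300.0},
    {"dead": False, "WTFA": 2000.0, "WGT_NEW": 2100.0},
    {"dead": True, "WTFA": 500.0, "WGT_NEW": 500.0},
]

unweighted = weighted_percent(records, "dead")
weighted = weighted_percent(records, "dead", weight="WGT_NEW")
print(f"unweighted {unweighted:.1f}%, weighted (WGT_NEW) {weighted:.1f}%")
```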
29Combining Survey Years
- NHIS linked mortality files cover two design
periods (1986-1994 and 1995-2000)
- Follow guidelines on pooling NHIS years
- http://www.cdc.gov/nchs/nhis/methods.htm
- Created new stratum and psu variables for NHIS
Linked Mortality files to allow combining across
NHIS design years
30Changes in Data Over Time
- ICD-9 (deaths 1979-1998) and ICD-10 (deaths
1999 to present) cover linked mortality files
- Use both sets of codes to obtain full counts of
cause-specific deaths
- Individual codes (ICD_9REV, ICD_10REV)
- Recodes
- UCOD_282 (ICD-9)
- UCOD_72 (ICD-9)
- UCOD_34 (ICD-9)
- UCOD_358 (ICD-10)
- UCOD_113: recodes deaths before 1999 using
ICD-10 guidelines
- Refer to vital statistics report on ICD
comparability
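A small sketch of "use both sets of codes": pick the ICD revision from the year of death, then apply the matching code definition. Diabetes is used as the example cause (ICD-9 250, ICD-10 E10-E14). The record layout and the code string format (no decimal point) are assumptions for illustration, not the actual file layout.

```python
DIABETES_ICD10 = {"E10", "E11", "E12", "E13", "E14"}


def icd_revision(death_year):
    """Return the ICD revision in effect for a given year of death."""
    if death_year >= 1999:
        return 10
    if death_year >= 1979:
        return 9
    raise ValueError(f"year of death {death_year} precedes ICD-9")


def is_diabetes_death(rec):
    """Check the underlying cause against the revision-specific definition."""
    if icd_revision(rec["death_year"]) == 9:
        return rec["ICD_9REV"].startswith("250")
    return rec["ICD_10REV"][:3] in DIABETES_ICD10


# Made-up deaths; code strings without a decimal point are an assumed format.
deaths = [
    {"death_year": 1992, "ICD_9REV": "2500", "ICD_10REV": ""},
    {"death_year": 2001, "ICD_9REV": "", "ICD_10REV": "E119"},
    {"death_year": 1997, "ICD_9REV": "4109", "ICD_10REV": ""},
    {"death_year": 2000, "ICD_9REV": "", "ICD_10REV": "I219"},
]

n_diabetes = sum(is_diabetes_death(d) for d in deaths)
print(n_diabetes)
```

Counting with only one revision would miss every diabetes death coded under the other, which is the reason for using both.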
31Medicare Analytic Issues
- Eligibility status
- Eligible but not matched
- Death
- Linked but no Medicare data
- Managed care enrollment
- Non-covered services
- Gaps in coverage
- Issues with Medicare data files
- See the NCHS-CMS linkage web page under
Analytic/Programming Support
32Medicare Ineligible Population and Linkage Rates
(65 years)
33Ineligibles and Non-Matches
- Must be excluded from your sample
- Identify using the variable (CMS_MATCH) on the
Feasibility Study Data files
34Identifying Deaths
- Survey participants interviewed before the
availability of linked Medicare files could have
died before 1991
- E.g. NHEFS, NHANES II or NHANES III respondents
interviewed in Phase I (1988-91)
- Persons may die during study period and cease to
have Medicare records
- Enrolled in Medicare in 1991 but died before 2000
35Identifying Deaths
- Survey respondents who died before 1991 (e.g.
from NHANES) can be identified by merging
mortality information from the Linked Mortality
files
- Needed to create analytic sample
- Persons who died during 1991-2000 should no
longer have Medicare records after date of death
- Look for a CMS date of death (DOD) on each of the
Denominator or SMEC files (1991 to 2000)
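The merge described above can be sketched as follows: attach a date of death from the linked mortality data, then drop anyone who died before the Medicare files begin in 1991. The record layout, the `id` key, and the helper `build_analytic_sample` are hypothetical names used only for this illustration.

```python
from datetime import date

LINK_START = date(1991, 1, 1)  # first year of the linked Medicare files

# Made-up data: survey respondents and a mortality lookup (id -> date of death).
# A respondent absent from the lookup is treated as presumed alive.
survey = [{"id": 1}, {"id": 2}, {"id": 3}]
mortality = {1: date(1989, 6, 30), 3: date(1995, 2, 14)}


def build_analytic_sample(survey, mortality, start=LINK_START):
    """Merge date of death onto survey records and drop pre-1991 deaths."""
    sample = []
    for person in survey:
        dod = mortality.get(person["id"])
        if dod is not None and dod < start:
            continue  # died before any Medicare record could exist
        sample.append({**person, "dod": dod})
    return sample


sample = build_analytic_sample(survey, mortality)
print([(p["id"], p["dod"]) for p in sample])
```

Keeping `dod` on the retained records makes it easy to check later that no claims appear after the date of death.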
36Linked but no Medicare data
- No denominator file because
- Loss of entitlement during 1991-2000
- Deceased prior to 1991
- CMS record keeping inconsistencies
- No claims data
- Not utilizing Medicare in 1991-2000
- No reimbursable claims
- CMS record keeping inconsistencies
37No Denominator Record
- Lack of denominator record can affect your
analytic sample. Why?
- Can't determine managed care enrollment
- In general, managed care enrollees are excluded
from sample (more on this to come)
38Managed Care Enrollment
- Medicare does not receive claims for
beneficiaries enrolled in managed care plans
(HMO)
- Do not have complete information on payments or
services received
- Could miss health events that are being counted
based upon submitted claims
- Complex issue. Refer to ResDAC
- http://www.resdac.umn.edu/
39How managed care enrollees affect your research
depends upon your question
- Studies on reimbursements/charges
- Option may be to exclude those with any managed
care enrollment because you don't have complete
information on payments or services received
- Studies on health outcomes/events
- Option may be to exclude those with any managed
care enrollment because you could miss events
- Option may be to censor observations at time of
first HMO enrollment
- Other methods for addressing HMO enrollment
possible depending upon research question
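The censoring option can be illustrated with a short sketch. It assumes monthly HMO enrollment is available as a list of true/false flags starting at the first month of follow-up; that layout and the helper `censor_at_first_hmo` are hypothetical. An event in the same month as first HMO enrollment is treated as censored, which is a conservative choice made for this example.

```python
def censor_at_first_hmo(hmo_by_month, event_month=None):
    """Return (months_followed, event_observed).

    hmo_by_month: list of booleans, one per month of follow-up (index 0 = first month).
    event_month:  0-based month index of the claims-based event, or None if no event.
    """
    first_hmo = next((i for i, flag in enumerate(hmo_by_month) if flag), None)
    end = len(hmo_by_month) if first_hmo is None else first_hmo
    if event_month is not None and event_month < end:
        return event_month + 1, True  # event seen while claims were complete
    return end, False  # censored at first HMO month or end of follow-up


never_hmo = [False] * 24
joins_hmo = [False] * 6 + [True] * 18

print(censor_at_first_hmo(never_hmo, event_month=10))
print(censor_at_first_hmo(joins_hmo, event_month=10))
print(censor_at_first_hmo(joins_hmo, event_month=3))
```

Compared with excluding anyone ever enrolled in an HMO, censoring keeps the person-time observed before enrollment.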
40Services not covered in Medicare 1991-2000 files
- Outpatient prescription drugs
- Routine physical and dental exams
- Dentures
- Eyeglasses
- Out-of-pocket expenses for Medicare beneficiaries
(e.g. deductibles, coinsurance)
41SSA Analytic Issues
- Eligibility status
- Eligible but not matched
- Linked but no benefit history data
- Records are extracted from files designed for
program administration - not for research
42SSA Ineligible Population and Linkage Rates
43Ineligibles and Non-Matches
- Must be excluded from your sample
- Identify using the variable (SSA_MATCH) on the
Feasibility Study Data files
44Linked but no SSA Data
- Linkage is to SSA NUMIDENT file
- Linked to NUMIDENT file but may not be eligible
for Social Security benefits
- Not age eligible for retirement
- Defer retirement benefits because working
full-time
- Not eligible for Social Security
45Issues with Administrative Data
- Administrative data updates
- Payment history updates
- Previously denied claims may be overridden
- Changes to type of benefit status
- Individuals receiving disability (DI) switch to
retirement (R) benefits at age 65 in RSDI program
- Complicated data
- File layouts are complex, e.g. each MBR record
has 2 parts
- Calculation of benefits not straightforward, e.g.
SSI benefits come from both federal and state
programs
46Final Tips
- Read relevant documentation!
- Survey file layouts and detailed notes
- Linkage methodology reports
- Sample SAS and STATA input statements for
public-use linked mortality files
- Analytic guidelines
- Consult basic program information
- CMS: http://www.cms.gov
- ResDAC: http://www.resdac.umn.edu (Medicare)
- SSA: http://www.ssa.gov and
http://www.ssa.gov/regulations/index.htm
47Final Tips
- Determine NCHS public-use files needed
- Determine RDC linked files needed
- Determine feasibility of research question based
upon successfully linked respondents
- Public-use Feasibility Study Data files available
indicating whether respondent was linked to
Medicare or SSA data and whether there is a
record on the various Medicare and/or SSA files
- Match status (SSA_MATCH, CMS_MATCH)
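A feasibility check of this kind might look like the sketch below, which tabulates match status and reports the linkage rate among eligible respondents. CMS_MATCH is the variable named on the slides, but the numeric coding (1 = linked, 2 = eligible but not linked, 3 = ineligible) is an assumption for illustration; the Feasibility Study Data file documentation gives the actual codes.

```python
from collections import Counter

# Hypothetical coding, assumed for illustration only.
LINKED, ELIGIBLE_NOT_LINKED, INELIGIBLE = 1, 2, 3

# Made-up feasibility records.
feasibility = [
    {"id": 1, "CMS_MATCH": 1},
    {"id": 2, "CMS_MATCH": 1},
    {"id": 3, "CMS_MATCH": 2},
    {"id": 4, "CMS_MATCH": 3},
    {"id": 5, "CMS_MATCH": 1},
]


def linkage_summary(records, flag="CMS_MATCH"):
    """Return (linked, eligible, linkage rate among eligible respondents)."""
    counts = Counter(r[flag] for r in records)
    eligible = counts[LINKED] + counts[ELIGIBLE_NOT_LINKED]
    rate = counts[LINKED] / eligible if eligible else float("nan")
    return counts[LINKED], eligible, rate


linked, eligible, rate = linkage_summary(feasibility)
print(f"{linked} of {eligible} eligible respondents linked ({rate:.0%})")
```

Running the same tabulation with `flag="SSA_MATCH"` would give the corresponding picture for the SSA linkage, under the same coding assumption.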
48Overview
- NCHS Record Linkage Program
- Analytic Issues and Tools
- Comparative Analysis of Public vs Restricted
Linked Mortality Files
- Accessing the Restricted-use Linked Data
49National Center for Health Statistics Record
Linkage Program: Comparative Analysis of the
Public-use and Restricted-use Linked Mortality
Files
- Kimberly A. Lochner, SPB, OAE
- NCHS Data Users Conference
- August 12, 2008
- U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
- Centers for Disease Control and Prevention
- National Center for Health Statistics
50Objectives
- Present an overview of the newly available
public-use linked mortality files
- National Health Interview Survey (NHIS) 1986 to
2000
- Third National Health and Nutrition Examination
Survey (NHANES III)
- The Second Longitudinal Study of Aging (LSOA II)
- Demonstrate the analytic comparability between
the public-use and restricted-use versions of the
linked mortality files
51Background
- Mortality follow-up studies are a major focus of
NCHS record linkage activities
- NCHS linked mortality files created in 2004 made
available through NCHS Research Data Center (RDC)
- Protects confidentiality of survey participants
- May limit access to highly utilized data
sources
52Background
- NCHS plan for public-use linked mortality files
included:
- Releasing a reduced number of key mortality
variables
- Perturbing date or cause of death for select
records
- Determining that survey participants could not be
reidentified
- Comparing the analytic utility of the public-use
file to the restricted-use file
53Public-use Linked Mortality Files
- NHIS (1986-2000)
- Each NHIS year is a nationally representative
survey of the civilian non-institutionalized U.S.
population
- Questionnaire content
- Basic socio-demographic characteristics
- Health conditions and utilization
- Health status, health care services, and behavior
- Mortality follow-up through December 2002
54Public-use Linked Mortality Files
- NHANES III (1988-1994)
- Includes survey and examination information
designed to assess the health and nutritional
status of U.S. adults and children.
- Study content
- Basic socio-demographic characteristics
- Medical and dental examinations
- Laboratory tests
- Environmental exposures
- Mortality follow-up through December 2000
55Public-use Linked Mortality Files
- LSOA II
- Prospective survey of persons 70 years of age and
over at the time of their baseline interview
(1994 NHIS)
- Follow-up interviews in 1997-98 and 1999-00
- Questionnaire content
- Basic socio-demographic characteristics
- Health conditions, functional health status and
disability
- Health care utilization
- Mortality follow-up through December 2002
56Data Elements: NHIS Linked Mortality Files
- MCOD flags only for diabetes, hypertension, and
hip fracture
- Available on the public-use NHIS survey data
files
57Data Elements: NHANES III Linked Mortality Files
- MCOD flags only for diabetes, hypertension, and
hip fracture
- Available on the public-use NHANES III survey
data files
58Data Elements: LSOA II Linked Mortality Files
- MCOD flags only for diabetes, hypertension, and
hip fracture
- Available on the public-use LSOA II survey data
files
59Comparative Analyses
60Statistical Methods
- Compared mean follow-up times and distributions
for select causes of death
- Compared the mortality risk for a standard set of
socio-demographic covariates for all-cause as
well as cause-specific mortality
- Cox proportional hazard models
- SUDAAN to take into account complex survey design
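For readers less familiar with the "relative hazards" reported on the results slides, the Cox proportional hazards model can be written as below. The notation (baseline hazard h_0, coefficients beta, covariates x) is standard textbook notation and is not taken from the slides.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p),
\qquad
\mathrm{RH}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j)
```

The relative hazard for covariate j compares two respondents who differ by one unit in x_j and are otherwise identical. It does not depend on t, which is the proportional hazards assumption.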
61Analytic Samples
- Eligible for mortality follow-up
- At least 25 years of age at the time of the
survey interview
- Non-Hispanic white, non-Hispanic black, or
Hispanic
- Non-missing values for cause of death or other
covariates
62Covariates
- Socio-demographic characteristics reported
at time of interview and taken from public-use
survey data files
- Age
- Sex
- Race and ethnicity
- Educational attainment
- Marital status (except NHANES III)
- Region of the country (except NHANES III)
63Outcomes
- All-cause and cause-specific mortality
- Cause-specific deaths based on underlying cause
of death from the ICD-10 113 grouped recode
- Duration of follow-up calculated from time of
interview until death or censored at end of the
follow-up period
- Restricted-use files use complete information on
interview and death month, day, and year
- Public-use files use less detailed information on
timing of death, some of which is perturbed
- NHIS/LSOA II use interview year and death year
only
- NHANES III use person-time follow-up provided on
the file
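The difference between the two ways of measuring follow-up can be sketched as below. This is an illustration under stated assumptions, not the authors' actual computation: the exact-date version divides days by 365.25, the year-only version is a plain difference of calendar years, and both censor at the end of NHIS follow-up (12/31/2002). The function names are hypothetical.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2002, 12, 31)  # NHIS mortality follow-up cutoff


def follow_up_exact(interview, death=None, end=END_OF_FOLLOW_UP):
    """Follow-up in years from full interview and death dates."""
    stop = death if death is not None and death <= end else end
    return (stop - interview).days / 365.25


def follow_up_year_only(interview_year, death_year=None,
                        end_year=END_OF_FOLLOW_UP.year):
    """Follow-up in years when only calendar years are available."""
    stop = death_year if death_year is not None and death_year <= end_year else end_year
    return stop - interview_year


# Made-up decedent: interviewed 15 March 1990, died 2 November 1998.
exact = follow_up_exact(date(1990, 3, 15), date(1998, 11, 2))
coarse = follow_up_year_only(1990, 1998)
print(f"exact {exact:.2f} years, year-only {coarse} years")
```

Differences of this size for individual respondents fit with mean follow-up times that differ only slightly between the two file versions.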
64NHIS Results
- Sample (n = 897,232)
- Deaths (n = 114,264)
- 11.8% weighted
- Follow-up (mean)
- Restricted-use: 8.6 years
- Public-use: 8.7 years
65NHIS Linked Mortality Files: Cause-specific
Deaths
66NHIS Linked Mortality Files: Relative Hazards
for All-Cause Mortality
- Note: Models also adjusted for marital status and
region of the country.
67NHIS Linked Mortality Files: Relative Hazards
for Homicide Mortality
- Note: Models are restricted to Non-Hispanic
Whites and Blacks (n = 802,307).
- Models also adjusted for marital status
and region of the country
68NHANES III Results
- Sample (n = 16,048)
- Deaths (n = 3,209)
- 12.1% weighted
- Follow-up (mean)
- Restricted-use: 104.1 months
- Public-use: 103.8 months
69NHANES III Linked Mortality Files:
Cause-specific Deaths
70NHANES III Linked Mortality File: Relative
Hazards for All-Cause Mortality
71NHANES III Linked Mortality File: Relative
Hazards for Cerebrovascular Mortality
- Note: Models restricted to Non-Hispanic Whites
and Blacks (n = 11,985).
72LSOA II Results
- Sample (n = 8,867)
- Deaths (n = 3,671)
- 41.4% weighted
- Follow-up (mean)
- Restricted-use: 4.4 years
- Public-use: 4.4 years
73LSOA II Linked Mortality Files: Cause-specific
Deaths
74LSOA II Linked Mortality File: Relative Hazards
for All-Cause Mortality
- Note: Models also adjusted for marital status and
region of the country.
75LSOA II Linked Mortality File: Relative Hazards
for Cancer Mortality
- Note: Models restricted to Non-Hispanic Whites
(n = 7,586).
- Models also adjusted for region of the
country.
76Conclusions
- Public-use linked mortality files yield results
similar to the restricted-use data
- Public-use and restricted-use files yield similar
hazard ratios and confidence intervals,
particularly for common causes of death
- Results for less common causes of death remain
consistent, although there tends to be less
agreement in the estimates
77Conclusions
- Caution is urged for analyses of very rare causes
of death or small population subgroups
- Users of the public-use linked mortality files
may request to verify their results through the
NCHS Research Data Center
78Public-use Linked Mortality Files Can Be
Downloaded
- http://www.cdc.gov/nchs/data_access/data_linkage_activities.htm
79Acknowledgements
- American Journal of Epidemiology 2008;168(3):336-344
- SPB data linkage team
- Stephanie Bartee
- Jim Brittain
- Cordell Golden
- Donna Miller
- Gloria Wheatcroft
80Overview
- NCHS Record Linkage Program
- Analytic Issues and Tools
- Comparative Analysis of Public vs Restricted
Linked Mortality Files
- Accessing the Restricted-use Linked Data
81NCHS Record Linkage Activities: Accessing
Restricted Linked Data at the NCHS Research Data
Center
- Christine Cox
- NCHS Data Users Conference
- August 12, 2008
- U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
- Centers for Disease Control and Prevention
- National Center for Health Statistics
82Why can't you just give me the data?
- NCHS does not own the linked administrative
data
- NCHS data confidentiality rules prohibit the
release of potentially identifiable data;
special considerations apply to the protection
of linked data
- The RDC is the only option for access to
restricted-use data files
83Research Data Center
- The RDC is an organizational unit located at NCHS
headquarters in Hyattsville, MD
- Provides access to restricted-use data files
84Restricted Data Files Include
- Linked administrative data
- Medicare
- SSA
- Restricted-use linked mortality files
- Detailed geographic data or contextual data
- Census tract and State/county level data
- EPA air pollution data
85What to Expect?
- To gain access to NCHS restricted data, users must:
- Submit a research proposal
- Sign an affidavit of confidentiality
- Promise not to use any method to attempt to
identify respondents
86What to Expect?
- How long for a proposal to be reviewed?
- Usually within 2 weeks, if proposing to use
public-use survey data with the linked data
- Up to 1-2 months, if proposing to use non-public
survey data with the linked data
87Access Methods
- Once approved, three methods to access restricted
data
- On-site: use local computing resources in the
NCHS RDC, Hyattsville, MD
- Remote: submit programs electronically to be
executed in the RDC with output returned by email
- Census RDC: access NCHS data using any one of the
nine Census RDCs.
- For all methods of access, restricted data files
remain in RDC and output is inspected for
disclosure violations
88On-Site Access Method
- On-site Facilities
- Four user workstations, expandable as needed
- Pentium IV computers
- Windows XP
- SAS, STATA, SUDAAN, LIMDEP, SPSS, Watcom Fortran
77, HLM
- No removable media
- Secure printer
- Open only during normal working hours
- RDC staff constructs necessary data files,
including merged user data
89Remote Access Method
- RDC staff constructs necessary data files,
including merged user data
- SAS programs only, including SAS-callable SUDAAN
(certain procedures and functions not allowed)
- Both submitted programs and output undergo a
programmed disclosure limitation review
- Ability to submit analytical computer programs
via email from anywhere in the world with access
available 24 hours/day
90Census RDC Access Method
- 9 Census RDCs
- Los Angeles, Berkeley, Boston, Durham,
Ann Arbor, Ithaca, NYC, Chicago, DC
- Separate Census research proposal is not needed
- May have to follow additional security
requirements at Census Bureau facilities
91User Fees Linked Data Access
92Proposal Requirements
- Proposal is evaluated by review committee
- Review criteria
- Scientific and technical feasibility
- Availability of RDC resources
- Disclosure risk for restricted information
- The extent to which project is in accordance with
the mission of NCHS
- Special note: NCHS does not try to determine if
proposals are duplicative
93Proposal Requirements: Helpful Tips
- Be clear about research and data requirements
(helps to determine feasibility of project)
- Clearly identify the sample to be included in the
analytic file
- Provide data dictionaries for both
- Public-use data
- Restricted-use data
- Provide examples of expected output
94Visit the RDC at http://www.cdc.gov/nchs/rd/rdc.htm
or email rdca@cdc.gov
95Where to get Help?
- RDC website contains
- Proposal Checklist
- Sample Proposal
- List of available restricted data files
- Detail on Census RDC locations and contact
information
- FAQs regarding proposal review process, on-site
procedures, area information and contact
information
- Email rdca@cdc.gov
96Questions?