Title: DIMACS Working Group on Privacy / Confidentiality of Health Data
- Rutgers University Center
- Piscataway, New Jersey
- December 10-12, 2003
Health Care Databases under HIPAA: Statistical Approaches to De-identification of Protected Health Information
- Judith E. Beach, Ph.D., Esq.
- Associate General Counsel, Regulatory Affairs
- Chief Privacy Officer
- Chair, Council on Data Protection and Council on
Research Ethics
Outline
- 1. Evolution of De-identification Standards in HIPAA Privacy Regulation
- 2. De-identification Standards for Health Information in Research
  - a. Safe Harbor
  - b. Statistician Method
    - HIPAA Provisions
    - Quintiles Experience and Methodology
  - c. Limited Data Set
- 3. Preemption of State Laws on De-identification Standards for Health Information
- 4. Health Information Privacy: Cases and Controversies
Evolution of De-Identification Standards in HIPAA Privacy Regulation
Federal Policy: De-Identification of Health Information
- Government's intent: to strike a balance, with standards stringent enough to protect privacy yet flexible enough not to discourage the use or disclosure of de-identified health information wherever possible.
- De-identified health data is one of the best mechanisms for avoiding wrongful disclosure of Protected Health Information (PHI).
- See Draft (05/27/03) DHHS Policy and Procedure Manual, De-Identification Policy d11 (effective date 6/1/03), which applies to DHHS agencies' HIPAA-covered health care components and Internal Business Associates.
Federal Policy: Use of De-identified Health Data Rather than PHI for Research
- "We [HHS] expressed the hope that covered entities, their business associates and others would make greater use of de-identified health information . . . when it is sufficient for the research purpose and that such practice would reduce the burden and the confidentiality concerns that result from the use of individually identifiable health information for some of these purposes." HHS, in final privacy rule, 65 Fed. Reg. at 82543 (Dec. 28, 2000), citing proposed privacy rule of Nov. 3, 1999
HIPAA's Jurisdiction
- Individually Identifiable Health Information (IIHI)
  - A subset of health information, including demographic information, that identifies the individual or with respect to which there is a reasonable basis to believe the information can be used to identify the individual
- Protected health information (PHI)
  - Means individually identifiable health information (IIHI = Health Information + Identifier) that is transmitted or maintained electronically, or transmitted or maintained in any other form or medium
- An investigator who submits health claims would be a HIPAA covered entity (CE)
  - CE + Health Information + Identifier = PHI
  - CE + Identifier - Health Information = NOT PHI
  - Health Information + Identifier - CE = NOT PHI
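The three combinations above reduce to one rule of thumb: information is PHI only when all three elements are present together. A minimal Python sketch, assuming each element has already been answered yes or no; the function name `is_phi` is hypothetical, and the sketch ignores nuances such as business associates.

```python
def is_phi(held_by_covered_entity: bool, has_health_information: bool, has_identifier: bool) -> bool:
    """Slide's rule of thumb: PHI needs a covered entity, health information,
    and an identifier together. A simplification, not a legal test."""
    return held_by_covered_entity and has_health_information and has_identifier


# The three combinations listed on the slide.
print(is_phi(True, True, True))   # CE + Health Information + Identifier -> True
print(is_phi(True, False, True))  # CE + Identifier - Health Information -> False
print(is_phi(False, True, True))  # Health Information + Identifier - CE -> False
```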
De-identification Standards for Health Information in Research
De-identified Health Information
- Definition: "health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual." 45 CFR 164.514(a)
- The Privacy Rule permits de-identification of PHI so that such information may be used and disclosed freely, without being subject to the Privacy Rule's requirements.
- Once de-identified, the data falls outside the Privacy Rule.
HIPAA De-identification Standards
- Two methods for the de-identification of health information
  - Safe Harbor: remove 18 specified identifiers
    - intended to provide a simple, definitive method for de-identifying health information with protection from litigation
  - Statistician Method: retain some of the 18 identifiers specified by the safe harbor, and demonstrate that the standard is met: a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods (e.g., a biostatistician) determines, and documents, that the risk of re-identification is very small.
- 45 CFR 164.514(b)
Limited Data Set
- The final rule added another method, requiring removal of direct (facial) identifiers: the Limited Data Set
  - Disclosed under confidentiality agreements for research, public health, and health care operations
  - Regarded as PHI, NOT de-identified
  - therefore, still subject to Privacy Rule requirements such as the minimum necessary rule.
Safe Harbor Method
Safe Harbor
- Covered entities must remove all of a list of 18 enumerated identifiers and have no actual knowledge that the remaining information could be used, alone or in combination, to identify a subject of the information.
- The identifiers to be removed include
  - direct identifiers such as name, address, SSN
  - indirect identifiers such as birth date, admission and discharge dates, and five-digit zip code
- 45 CFR 164.514(b)(2)
Safe Harbor
- The safe harbor does allow for the disclosure of
  - all geographic subdivisions no smaller than a State, as well as the initial three digits of a zip code
    - IF the geographic unit formed by combining all zip codes with the same initial three digits contains more than 20,000 people
  - AGE, if less than 90, gender, ethnicity, and other demographic information not listed.
Safe Harbor's 18 Identifiers
- Names
- All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the currently available data from the Bureau of the Census,
  - the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people, and
  - the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000
- All elements of dates (except year) for dates directly relating to an individual, including birth date, admission date, discharge date, and date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
- Telephone numbers
- Fax numbers
- Electronic mail addresses
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web Universal Resource Locators (URLs)
- Internet Protocol (IP) address numbers
- Biometric identifiers, including finger and voice prints
- Full face photographic images and any comparable images, and
- Any other unique identifying number, characteristic, or code.
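Apart from deletion, the only real transformations in the safe harbor are the geographic, age, and date rules in the list above. A minimal Python sketch of those rules, assuming a claims record held as a dict with hypothetical field names, a `restricted_zip3` set that the data holder would have to build from current Census data (3-digit areas of 20,000 or fewer people), and age measured as of a chosen `as_of` date. It illustrates the rules only; it is not a compliance tool and does not address the "no actual knowledge" condition.

```python
from datetime import date

# Hypothetical column names; only a subset of the 18 identifiers is shown.
DIRECT_IDENTIFIER_FIELDS = frozenset({
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_id", "account_number",
})
# Fields that are replaced by generalized versions below.
GENERALIZED_FIELDS = frozenset({"zip", "birth_date", "admit_date"})


def zip3_or_000(zip5: str, restricted_zip3: set[str]) -> str:
    """Keep the first three zip digits unless that 3-digit area has 20,000 or fewer people."""
    prefix = zip5[:3]
    return "000" if prefix in restricted_zip3 else prefix


def age_on(birth: date, as_of: date) -> int:
    """Completed years of age on the `as_of` date."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)


def safe_harbor_record(record: dict, as_of: date, restricted_zip3: set[str]) -> dict:
    """Drop direct identifiers; generalize zip code, age, and dates."""
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIER_FIELDS and k not in GENERALIZED_FIELDS}
    out["zip3"] = zip3_or_000(record["zip"], restricted_zip3)
    age = age_on(record["birth_date"], as_of)
    if age > 89:
        out["age"] = "90+"  # birth year withheld as well, since it would reveal the age
    else:
        out["age"] = age
        out["birth_year"] = record["birth_date"].year
    out["admit_year"] = record["admit_date"].year  # only the year of the date is kept
    return out


record = {"name": "Jane Doe", "ssn": "000-00-0000", "zip": "08854",
          "birth_date": date(1950, 6, 15), "admit_date": date(2003, 3, 2),
          "diagnosis": "401.9"}
# The restricted prefix set here is a placeholder, not real Census data.
print(safe_harbor_record(record, as_of=date(2003, 12, 10), restricted_zip3={"999"}))
# {'diagnosis': '401.9', 'zip3': '088', 'age': 53, 'birth_year': 1950, 'admit_year': 2003}
```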
Sources of Authority
- In the Privacy Rule preamble, HHS recognizes two sources of authority as to what constitutes generally accepted statistical and scientific principles and methods for de-identification adequate for posting a de-identified database on the Internet. 65 Fed. Reg. at 82,709-82,710 (Dec. 28, 2000)
  - Paper 22: Statistical Policy Working Paper 22, Report on Statistical Disclosure Limitation Methodology
  - The Checklist: Checklist on Disclosure Potential of Proposed Data Releases, intended primarily for use in the development of public-use data products.
Safe Harbor
- BUT many researchers and other groups have complained that the Safe Harbor renders de-identified data virtually useless for research, so that the result will be MORE research using PHI.
  - No dates of service, no patient initials, no date of birth
  - Can have deltas, such as number of patient visits over time
- However, the safe harbor was NOT designed for research, but to provide an approved method of de-identification for any purpose by any covered entity, regardless of sophistication.
  - For instance, data de-identified this way would be deemed safe to post on the Internet.
Statistician Method
Statistician Method
- For this method, the covered entity
  - must remove all direct identifiers
  - should reduce the number of variables on which a match might be made
  - should limit the distribution of records through a data use agreement or restricted access agreement
- 65 Fed. Reg. at 82,709-710 (Dec. 28, 2000)
Opinion of Statistician
- The statistician must
  - determine that there is a very small risk of re-identification
    - after applying generally accepted statistical and scientific principles and methods for rendering information not individually identifiable
  - document the methods and results of the analysis that justify such determination.
- 45 CFR 164.514(b)(1)
Statistician Method
- This method has been generally ignored by covered entities,
  - who prefer a safe harbor approach, with "safe" being the operative word,
  - and consider the Statistician alternative too complicated.
Statistician Method: Quintiles Experience
- An expert statistician calculated the statistical likelihood of re-identification IF all 18 safe harbor identifiers were removed, that is, the de-identification probability.
- Then, the statistician calculated the likelihood of re-identification if certain dates of service of medical or pharmacy claims were retained,
  - and if the month and year of birth were included, rather than the age or year of birth allowed by the safe harbor.
Statistician's Opinion
- This calculated number, the de-identification probability, served as a benchmark of a "very small" risk of re-identification against which the statistician method would be compared.
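The slides do not say how the statistician computed the de-identification probability. As a stand-in, the sketch below uses a common simplified measure: the average, over all records, of 1 divided by the number of records sharing the same quasi-identifier values. It only shows the shape of the comparison: a benchmark computed on safe harbor fields, then the same measure with an extra field such as month of birth. The function name, fields, and data are hypothetical.

```python
from collections import Counter


def average_reidentification_risk(records: list[dict], quasi_identifiers: list[str]) -> float:
    """Average over records of 1 / (size of the record's equivalence class).

    An equivalence class is the set of records sharing identical values on every
    quasi-identifier; a record that is unique on those fields contributes 1.0.
    (The result equals the number of distinct classes divided by the number of records.)
    """
    if not records:
        raise ValueError("no records")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return sum(1 / class_sizes[k] for k in keys) / len(keys)


# Toy data (hypothetical): adding month of birth splits the classes and raises the risk.
records = [
    {"zip3": "088", "birth_year": 1950, "birth_month": 6, "sex": "F"},
    {"zip3": "088", "birth_year": 1950, "birth_month": 9, "sex": "F"},
    {"zip3": "088", "birth_year": 1950, "birth_month": 9, "sex": "F"},
    {"zip3": "088", "birth_year": 1962, "birth_month": 1, "sex": "M"},
]
benchmark = average_reidentification_risk(records, ["zip3", "birth_year", "sex"])
expanded = average_reidentification_risk(records, ["zip3", "birth_year", "birth_month", "sex"])
print(benchmark, expanded)  # 0.5 0.75
```

In the approach described on the slides, an expanded field set would then be coarsened elsewhere until its risk no longer exceeds the benchmark.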
Analysis: Comparison of Both Methods
- To ensure the statistical likelihood of re-identification was comparable to that of the calculated safe harbor benchmark, the following data fields were made stricter than the safe harbor permits:
  - For all patients older than 85 years of age (rather than 90), the year of birth was modified to make them all 85 years old.
  - All five-digit patient zip codes were truncated to the first 3 digits and further merged so that no resulting 3-digit code has a total population of less than 200,000.
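The two stricter rules above can be sketched as follows. The age rule is straightforward. The slides do not say how the 3-digit zip areas were merged, so `merge_zip3_areas` is a hypothetical greedy pass over numerically adjacent prefixes, and the population figures are made up.

```python
def cap_birth_year(birth_year: int, reference_year: int, cap_age: int = 85) -> int:
    """Shift the birth year of anyone older than `cap_age` so they appear exactly `cap_age`.

    Age is approximated here as reference_year - birth_year.
    """
    return max(birth_year, reference_year - cap_age)


def merge_zip3_areas(populations: dict[str, int], minimum: int = 200_000) -> dict[str, str]:
    """Map each 3-digit zip prefix to a merged area whose total population is >= `minimum`.

    Greedy sketch: walk the prefixes in numeric order, accumulating them into the
    current area until it reaches `minimum`; a leftover area still below the
    threshold is folded into the previous one. Numeric adjacency is only a rough
    stand-in for geographic adjacency.
    """
    if sum(populations.values()) < minimum:
        raise ValueError("total population is below the minimum")
    areas: list[list[str]] = []
    current: list[str] = []
    total = 0
    for prefix in sorted(populations):
        current.append(prefix)
        total += populations[prefix]
        if total >= minimum:
            areas.append(current)
            current, total = [], 0
    if current:  # leftover below the threshold
        areas[-1].extend(current)
    # Label every merged area by its lowest prefix.
    return {prefix: area[0] for area in areas for prefix in area}


# Hypothetical population counts, for illustration only.
print(merge_zip3_areas({"070": 150_000, "071": 90_000, "072": 300_000, "073": 40_000}))
# {'070': '070', '071': '070', '072': '072', '073': '072'}
print(cap_birth_year(1910, 2003), cap_birth_year(1950, 2003))  # 1918 1950
```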
Factors Considered by Statistician
- In the analysis, the statistician pointed out the obvious:
  - The de-identified data received is conveyed under a confidentiality agreement, which specifically prohibits re-identification or further disclosure of the data except in statistically aggregated form.
  - The database is maintained on a physically and technically secure, password-protected server.
Statistician's Opinion
- "Applying generally accepted statistical and scientific principles and methods for rendering information not individually identifiable, . . . I conclude that the risk is very small that the information . . . could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information. . . . In practice the actual reidentification probabilities are much, much lower . . . arguably de minimis."
Statistician Method
- It is clear that most persons who have reviewed the Privacy Rule have failed to appreciate the significance of the statistician opinion to de-identification and, instead, have focused almost exclusively on the "safe harbor."
- In particular, many have failed to understand the importance of "restricted access" as it relates to the statistician opinion approach to de-identification.
Ensuring HIPAA Compliance
- All data handled is de-identified using a unique patient identifier that is irreversibly encrypted.
- Diagram: patient-identifiable electronic health care claims (standard health claims data fields) pass through a data encryption process and enter the data warehouse as de-identified data (3-digit zip code, modified date of birth).
- Upon completion of the de-identification process, a unique patient identifier is created, which is irreversibly encrypted.
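The slide does not name the technique behind the "irreversibly encrypted" identifier. Encryption is reversible by design, so one common way to obtain a stable, one-way patient identifier is a keyed hash (HMAC). The sketch assumes HMAC-SHA256 from Python's standard library, a secret key held only by the party performing de-identification, and a hypothetical choice of input fields; the function name `patient_token` is also hypothetical.

```python
import hashlib
import hmac


def patient_token(secret_key: bytes, last_name: str, first_name: str,
                  birth_date: str, sex: str) -> str:
    """Derive a stable one-way patient token from identifying fields.

    The same patient always yields the same token, so claims can still be linked
    over time; without `secret_key`, the token cannot feasibly be reversed or
    recomputed from guessed names and birth dates.
    """
    # Normalize so trivial formatting differences do not split one patient in two.
    normalized = "|".join(field.strip().upper() for field in (last_name, first_name, birth_date, sex))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"replace-with-a-securely-stored-secret"  # hypothetical placeholder
print(patient_token(key, "Doe", "Jane", "1950-06-15", "F") ==
      patient_token(key, " doe ", "JANE", "1950-06-15", "f"))  # True
```

A keyed hash is used rather than a plain SHA-256 digest because names and birth dates have relatively few possible values: anyone could hash candidate combinations and match them against unkeyed digests.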
Core Data Elements
(Table not recoverable; date ranges shown: July 1998 to date and Jan 1998 to date.)
Note: Payor Type is not available on all records.
Physician Demographics
- Specialty
- Region
- Number of years in practice
- Prescribing volume
- Type of practice
- Number of HMO / PPO / IPA affiliations
- Patient volume by insurance type
- Physician race
- Physician age
Patient Characteristics
- Location of contact
- Height and weight
- Age
- Gender
- Race
- Blood pressure
- Cholesterol levels (total, HDL, LDL, triglycerides)
- Insurance type
- Physician reimbursement method (fee-for-service vs. capitation)
- Smoker or non-smoker
Disease Entities
- Visits (with and without drugs)
- Visits per physician per year
- Total patients seeking treatment
- Newly diagnosed patients
- Visit type (first vs. subsequent)
- Referrals and referring specialty
- Severity of condition
- Tests ordered or completed during visit
- Existing medical conditions not treated
- Number of times seen and days since last visit
- Number of patient drug requests for condition
Treatment Regimens
- Dosage form, strength and signa
- Formulary impact
- Quantity prescribed and number of refills (mean and frequency)
- Weighted diagnosis value
- Dispensing instructions
- Occurrences per physician per year
- Therapy type
- New
- First-line versus adjunct therapy
- Drug replacement and reason
- Continued
Treatment Regimens
- Desired action
- Concomitant drugs (to treat same diagnosis)
- Concurrent drugs (regardless of diagnosis)
- Drug issuance
- Sample days of therapy (mean and frequency)
- Prescribed days of therapy (mean and frequency)
- Daily average consumption (DACON)
- Non-drug therapy
Limited Data Set (LDS)
HHS Solution: Limited Data Set
- For research, public health, or health care operations purposes
- Authorization not required
- A data use agreement must be in place between the covered entity and the recipient of the limited data set (LDS). 45 CFR 164.514(e)
- Data Use Agreements would only be needed for those public health, research, or health care operations uses and disclosures that are not otherwise permitted by federal or state laws. See Draft (05/27/03) DHHS Policy and Procedure Manual, De-Identification Policy d11
LDS: Still PHI
- Regarded as PHI, that is, not de-identified data, and therefore subject to requirements for protection of PHI, such as
  - prohibition of re-identification or any attempt to contact individuals by the recipient
    - BUT a re-identification code is permitted for the covered entity
  - minimum necessary standards
- BUT no accounting of disclosures or IRB approval is required
Limited Data Set Specifications
- May be useful for records-based research, such as epidemiological and other population research
- But may NOT be useful for patient recruitment
  - because re-identification of, or any attempt to contact, individuals by a third party, even a researcher, is prohibited (absent IRB or internal privacy board approval) unless the contact is made by the Covered Entity or the Covered Entity's workforce.
LDS: Remove 16 Identifiers
- Name
- Postal address information (other than city, state, zip code)
- Telephone number
- Fax number
- E-mail address
- Social Security Number
- Medical record / prescription numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate / license numbers
- Vehicle identity / serial numbers
- Device numbers
- Web URL
- IP address
- Biometric identifiers (e.g., fingerprints, retinal scans)
- Full face photographic images and any comparable images
45 CFR 164.514(e)(2)
LDS: Retain Indirect Identifiers
- Five-digit zip code
- Dates of service (e.g., admission / discharge)
- Dates of birth and death
- Geographic subdivision (e.g., state, county,
city, precinct), but not street address
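Compared with the safe harbor, building a limited data set is mostly a matter of which fields are deleted: the 16 direct identifiers go, while the indirect identifiers listed above stay. A minimal sketch with hypothetical column names; the output is still PHI and may be released only under a data use agreement.

```python
# Hypothetical column names standing in for the 16 direct identifiers that a
# limited data set must exclude (45 CFR 164.514(e)(2)); real schemas will differ.
LDS_EXCLUDED_FIELDS = frozenset({
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "full_face_photo",
})


def to_limited_data_set(record: dict) -> dict:
    """Drop the direct identifiers; indirect identifiers such as five-digit zip,
    city, dates of service, and dates of birth and death pass through unchanged."""
    return {k: v for k, v in record.items() if k not in LDS_EXCLUDED_FIELDS}


claim = {"name": "Jane Doe", "street_address": "1 Main St", "city": "Piscataway",
         "zip": "08854", "birth_date": "1950-06-15", "admit_date": "2003-03-02"}
print(to_limited_data_set(claim))
# {'city': 'Piscataway', 'zip': '08854', 'birth_date': '1950-06-15', 'admit_date': '2003-03-02'}
```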
Statistical Method for Dummies
- Limited Data Set . . .
- the Statistician Method made easy.
Preemption of State Laws on De-identification Standards for Health Information
Preemption of De-identification Standards: A View
- HIPAA statute and privacy regulation
  - Preemption of state law only if
    - the provision of state law relates to the privacy of individually identifiable health information
- HIPAA Statute § 1178 AND 45 CFR 160.202-.204
Preemption of State Law: HIPAA Statute
- Health information is considered identifiable and, therefore, subject to all requirements of the rule ONLY if there is a reasonable basis to believe that the information can be used to identify the individual.
- Exception to preemption: when states can assert a contrary and more stringent definition of individually identifiable health information
  - But the exception analysis does not apply to de-identified data
Preemption: De-identification Standards
- Thus, states would be preempted from enforcing a standard for de-identification that exceeds the "reasonable basis" definition of individually identifiable health information established in the HIPAA statute.
- Note: in response to Quintiles' written request, HHS revised the preemption section of the Rule to refer to "individually identifiable health information" rather than merely "health information."
Privacy Cases and Controversies: De-identified Health Databases
U.S. Controversy
- Quintiles Transnational Corp. v. WebMD
  - No demonstrable violation of HIPAA or other privacy law by transmission and aggregation of de-identified health data
  - Inhibits additional state regulation of the national electronic data system
- Order of Judge Terrence Boyle
  - Re de-identified data: the Dormant Commerce Clause prevents the individual states from regulating the interstate transmission of data.
- No. 5:01-CV-180-BO(3), U.S. E.D.N.C., Western Division
UK Controversy
- Regina v. Department of Health, Ex Parte Source Informatics Ltd., Judge Latham, 4 All ER 185, May 29, 1999, Case No. CO/4490/97, Queen's Bench Division
- Judge Latham dismissed the applicants' application for a Declaration concerning a policy document issued in March 1996 by the Department of Health, "The Protection and Use of Health Information."
UK: Source Informatics Overturned on Appeal
- Court of Appeal: Simon Brown, Aldous and Schiemann LJJ, 21 December 1999
  - Where a patient's identity was protected, it would not be a breach of confidence for general practitioners and pharmacists to disclose to a third party, without the patient's consent, the information contained in the patient's prescription form for marketing research purposes.
UK Health and Social Care Bill: Clause 65
- The Department of Health included language in the Health and Social Care Bill that would have essentially reinstated the lower court's opinion (Judge Latham's).
- After heavy lobbying in the House of Lords against Clause 65, the language was defeated.
Conclusion
The key is . . . safeguarding protected health information by encouraging use of federal standards for de-identification of health data for clinical research.