Title: Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices
1Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices
- Alan M. Zaslavsky
- Department of Health Care Policy
- Harvard Medical School
2Privacy, Confidentiality and Data Security (PCDS)
- Importance and sensitivity of PCDS
- Basic concepts of disclosure risk
- Deidentification and reidentification
- Disclosure control
- Institutional and regulatory frameworks
- Common Rule, HIPAA, Data use agreements
- File organization, data flow and computer security
3- This presentation is offered in our department at least annually
- Required attendance by all programmers, students, fellows, and project managers with data responsibilities
- Presented to faculty at meetings
- Shortened version for lower-level staff
- Tracking of attendance by personnel manager
- Sanction is loss of computer account
- Seek to fully involve project management in PCDS
issues
4Definitions
- Privacy: the right of an individual to keep information about herself or himself from others.
- Confidentiality: safeguarding, by a recipient, of information about another individual.
- Disclosure: release (direct or indirect) of information about an identifiable individual.
5Definitions (continued)
- Data security: protections on data to prevent unauthorized access or destruction.
- Informed consent: a person's agreement to allow personal data to be provided for research and statistical purposes.
- Research: study producing generalizable knowledge
- excludes internal operations, quality assurance
6Importance of PCDS
- Nexus for balance between
- benefits of information to society
- possible harms of information use to individuals
- in conducting the research enterprise.
- One person's invasion of privacy is another's essential use of information.
7Inherent conflicts
- Law enforcement / legal process
- General access to research data
- Freedom of Information Act (FOIA)
- Commercial use / beneficial products and services?
- Prevention of harm
- Need to save data for verification, revision
8Costs of violations of PCDS
- Damage to subjects
- Material
- Psychological/social
- Damage to the research enterprise
- Exposure to legal/administrative sanctions for
researchers and data providers and their
institutions
9Direct and indirect identifiers
- Key: a variable or combination of variables whose value makes a record unique in the target and population data
- Direct identifier: information that is uniquely associated with a person.
- Indirect identifier: data which, in combination, are uniquely associated with a person; information which facilitates such associations.
10Direct Identifiers (keys)
- Name
- Telephone number
- Street /e-mail address
- Unique features (SSN, Medicare ID, health plan number, medical record number, certificate/license number, voice and finger prints, photos)
11Re-identification by Matching
- De-identification
- Original target file: Name + variables abcdefghijkl
- Anonymized target file: variables abcdefghijkl (Name removed)
- Re-identification
- Key: variables abcdef, present in both files
- Anonymized target file: abcdefghijkl
- Population file: abcdefmnop + Name
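The matching step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, names, and the choice of birthdate, gender, and ZIP as the shared key are all hypothetical stand-ins for variables a..f, and real linkage attacks often use fuzzier matching than the exact match shown here.

```python
# Hypothetical toy data: KEY plays the role of the shared variables (a..f on the slide).
KEY = ("birthdate", "gender", "zip")

# Anonymized target file: names removed, sensitive variable retained.
anonymized = [
    {"birthdate": "1950-03-14", "gender": "F", "zip": "04101", "diagnosis": "asthma"},
    {"birthdate": "1971-07-30", "gender": "M", "zip": "04330", "diagnosis": "diabetes"},
]

# Population file: names present, shares only the key variables with the target.
population = [
    {"name": "Pat Example", "birthdate": "1950-03-14", "gender": "F", "zip": "04101"},
    {"name": "Lee Sample", "birthdate": "1971-07-30", "gender": "M", "zip": "04330"},
    {"name": "Kim Sample", "birthdate": "1971-07-30", "gender": "M", "zip": "04330"},
]


def reidentify(target, pop, key=KEY):
    """Return (name, target_record) pairs whose key matches exactly one person."""
    index = {}
    for person in pop:
        index.setdefault(tuple(person[k] for k in key), []).append(person["name"])
    matches = []
    for record in target:
        names = index.get(tuple(record[k] for k in key), [])
        if len(names) == 1:  # a unique match attaches a name to the record
            matches.append((names[0], record))
    return matches


matches = reidentify(anonymized, population)
```

In this toy data the first record is re-identified because its key is unique in the population file, while the second is protected only because two people share the same key values.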
12Data in Combination
- Variables that are not identifying by themselves might be identifying in combination
- Month, day and year of birth
- Gender
- Zip code
13Example of reidentification using three variables
- Variables: % unique in Maine state voter registration list
- Birthdate alone: 12%
- Birthdate + gender: 29%
- Birthdate + ZIP (5-digit): 69%
- Birthdate + ZIP (9-digit): 97%
- Sweeney, 1997
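A uniqueness check of this kind can be approximated on one's own file before release. The sketch below is a minimal illustration, not the method used in the cited study; the field names and the four toy records are assumptions made for the example.

```python
from collections import Counter


def percent_unique(records, variables):
    """Percent of records whose combination of `variables` occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[v] for v in variables) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return 100.0 * unique / len(records)


# Hypothetical toy voter list.
voters = [
    {"birthdate": "1950-03-14", "gender": "F", "zip5": "04101"},
    {"birthdate": "1950-03-14", "gender": "M", "zip5": "04101"},
    {"birthdate": "1962-11-02", "gender": "M", "zip5": "04330"},
    {"birthdate": "1962-11-02", "gender": "M", "zip5": "04401"},
]

alone = percent_unique(voters, ["birthdate"])
with_gender = percent_unique(voters, ["birthdate", "gender"])
with_zip = percent_unique(voters, ["birthdate", "zip5"])
all_three = percent_unique(voters, ["birthdate", "gender", "zip5"])
```

Even in this tiny file, no record is unique on birthdate alone, yet every record is unique once all three variables are combined.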
14Population (External) Data Bases
- Voter Registration Lists
- Research files
- State and Federal files
- Survey files with added administrative data
- Information Vendor Files
- The unknown: what might an intruder know about some or all members of your population?
15Identifiable population groups (entire data set
highly identifiable)
- Rare diseases
- Sample drawn from a particular area
16Unique/unusual cases: rare values
- 110 year-old woman
- Man who weighs 350 pounds
- Income > $100 million
- Verbatim text containing identifying details
17Unique/unusual cases: rare combinations of values
- 16 year-old widow
- 20 year-old Ph.D.
- Asian race in rural Midwest
- Female/Asian executive
- 60 year-old male married to 30 year-old female
- Cause of death: prostate cancer for 30 year-old male
18Micro Data Protection 1
- Remove direct identifiers
- Restrict geographical detail
- Code to remove detail: larger categories, top/bottom coding
- Remove, code or edit verbatim comments
- Case suppression
- Variable suppression
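Several of the coding steps listed above are simple transformations. The following sketch shows top-coding, recoding into larger categories, and restricting geographic detail; the cap, the band width, the "90+" top category, and the 3-digit ZIP prefix are illustrative assumptions, not thresholds prescribed by this presentation.

```python
def top_code(value, cap):
    """Top-coding: report any value above `cap` as the cap itself."""
    return min(value, cap)


def age_band(age, width=5, top=90):
    """Recode exact age into a band, with an open-ended top category."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def zip3(zip_code):
    """Restrict geographic detail by keeping only the first three ZIP digits."""
    return zip_code[:3]


# Hypothetical record with rare values (very old age, very high income).
record = {"age": 110, "income": 250_000_000, "zip": "04101"}
protected = {
    "age_band": age_band(record["age"]),
    "income": top_code(record["income"], cap=100_000),
    "zip3": zip3(record["zip"]),
}
```

The rare values from the preceding slides (a 110 year-old, an extreme income) disappear into broad categories, at the cost of analytic detail.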
19Micro Data Protection 2
- Special handling (e.g. coding) of data from external sources (esp. area data)
- Statistical modification (noise)
- Sample/subsample
- Eliminate link between persons and establishments
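Statistical modification and subsampling can both be sketched with the standard library. This is a deliberately simple illustration: uniform noise with a fixed spread and simple random subsampling are assumptions chosen for brevity, and production disclosure control typically calibrates noise to the variable and to the analyses the data must support.

```python
import random


def add_noise(values, spread, seed=None):
    """Perturb each value by uniform noise in [-spread, +spread]."""
    rng = random.Random(seed)
    return [v + rng.uniform(-spread, spread) for v in values]


def subsample(records, fraction, seed=None):
    """Release only a random fraction of the records."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)


# Hypothetical values.
incomes = [42000.0, 58000.0, 61000.0, 75000.0]
noisy = add_noise(incomes, spread=1000.0, seed=1)
released = subsample(list(range(10)), fraction=0.5, seed=1)
```

Subsampling protects because an intruder can no longer be sure that a given member of the population appears in the released file at all.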
20Tabular data
- Information on individuals deduced from unique cases in tables
- Reidentification usually related to small groups, small cell counts
- Rounding, cell suppression, complementary suppression might be required
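Complementary suppression is needed because a single suppressed cell can be recovered by subtracting the published cells from the row total. The sketch below handles one row of counts only; the threshold of 5 and the rule of hiding the smallest remaining cell are assumptions for illustration, and real tables with multiple margins require dedicated software.

```python
def suppress_row(counts, threshold=5):
    """Return a copy of `counts` with small cells replaced by None.

    Primary suppression hides nonzero cells below `threshold`. If exactly one
    cell ends up hidden, it could be recovered from the row total, so the
    smallest remaining cell is hidden as well (complementary suppression).
    """
    out = [None if 0 < c < threshold else c for c in counts]
    hidden = [i for i, c in enumerate(out) if c is None]
    if len(hidden) == 1:
        others = [i for i in range(len(counts)) if i != hidden[0]]
        if others:
            out[min(others, key=lambda i: counts[i])] = None
    return out


# Hypothetical row of cell counts published alongside its total.
row = [40, 3, 12, 25]
published = suppress_row(row)
```

Here the cell with count 3 triggers primary suppression and the cell with count 12 is withheld as its complement, so neither can be derived from the total of 80.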
21Disclosure of individual information from a table
22Technical issues
- Highly technical issues in both microdata and tabular nondisclosure
- Intersection of stats, math, computer science
- Software for detecting disclosure risk
- RTI, μ-ARGUS, etc.
- Nontechnical variables
- Resources and intentions of intruder
23Disclosure control in released data
- Affects us as producers and consumers of data
- Masking
- Affects analyses if performed on data we receive
- Complex to implement on our releases
- Limited access data centers
24Restricted access data centers
- Alternative to fully-deidentified public-use microdata files
- Data are held at restricted center
- Limited set of researchers submit analyses through intermediaries
- Output reviewed for nondisclosure
- Only feasible for organizations with substantial, persistent resources
- e.g. NCHS, Census
25Institutional and regulatory frameworks for PCDS
- Common Rule / IRB
- HIPAA
- Data Use Agreements
- State regulations
26Common Rule
- Governs protection of research subjects in all Federally-funded research
- IRB evaluates adherence by researcher
- Institutional sanctions for violations
- Many institutions extend to all research
- Objective: protection of subject from harm
- In HSR, often there is no intervention
- Typically, commitment to minimal risk of
disclosure
27Common Rule (continued)
- Informed consent
- generally required in primary data-collection
- appropriate information about use of data
- might be waived where impractical to obtain (e.g. intrusive), if risks minimal and rights not injured
- Exemption from (full) review
- No intervention that could harm subject
- Secondary data with no identifiable data
- Requires determination by IRB (but less tedious)
28Implications for researchers
- Commitments are made
- To subjects: consent language
- To IRB: safeguards promised in IRB application
- To funding agencies: in grant application
- May involve
- Protection of data while used
- Limits on duration of use
29HIPAA
- Health Insurance Portability and Accountability Act
- Specific rules for electronic transmission of health data
- Primarily for efficiency but includes Privacy Rule
- Obligations imposed on health care providers
- Includes direct providers, health plans and insurers
- Research data distinguished from health plan / provider operational functions
- Researchers must respect these obligations
30Who is Covered by HIPAA?
- A health care provider who transmits health information in electronic transactions
- Example: a physician or hospital who electronically bills for services
- A health plan
- A health care clearinghouse
31HIPAA implications for research
- Practical implications of HIPAA
- What data providers will be looking for
- Need to work around restrictions on content
- More elaborate paths for data control
- HIPAA provisions for releasing data for research
- fully deidentified
- limited use dataset
- waiver
32Option 1: De-identified Health Information
- Completely de-identified information (18 elements removed) and no knowledge that remaining information can identify the individual.
- OR
- Statistically de-identified information, where a qualified statistician determines that there is a very small risk that the information could be used to identify the individual and documents the methods and analysis.
33Removal of These Identifiers Makes Information
De-identified
- Certificate/license numbers
- VIN and serial numbers, license plate numbers
- Device identifiers, serial numbers
- Web URLs
- IP address numbers
- Biometric identifiers (finger prints)
- Full face, comparable photo images
- Unique identifying numbers
- Names
- Geographic info (including city and ZIP)
- Elements of dates (except year)
- Telephone numbers
- Fax numbers
- E-mail address
- Social Security number
- Medical record, prescription numbers
- Health plan beneficiary numbers
- Account numbers
If the covered entity has actual knowledge that
remaining information can be used to identify the
individual, the information is considered
individually identifiable, and therefore,
generally is PHI.
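Removing the listed elements from a record can be sketched as a field filter. This is an illustration only, not a compliance tool: the field names are hypothetical, dates are assumed to be ISO strings (YYYY-MM-DD), and a real file also needs review of free text and of any remaining information that could identify an individual.

```python
# Hypothetical field names standing in for the identifier categories listed above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "zip", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo", "other_unique_id",
}
DATE_FIELDS = {"birth_date", "admission_date"}


def deidentify(record):
    """Drop identifier fields and reduce dates to the year."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # removed entirely
        if field in DATE_FIELDS:
            # Assumes an ISO date string; keep only the year.
            out[field.replace("_date", "_year")] = int(value[:4])
        else:
            out[field] = value
    return out


# Hypothetical record.
patient = {
    "name": "Pat Example",
    "zip": "04101",
    "birth_date": "1950-03-14",
    "diagnosis": "asthma",
}
clean = deidentify(patient)
```

An explicit list of fields to drop fails open if a new identifier field is added to the source data, so an allow-list of fields to keep may be the safer design in practice.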
34Option 2: Limited Data Set with Data Use Agreement
- The Privacy Rule permits limited types of identifiers to be released for research with health information (referred to as a Limited Data Set).
- Limited Data Sets can only be used and released in accordance with a Data Use Agreement between the covered entity and the recipient.
35Limited Data Set w/ Data Use Agreement
- The Limited Data Set CAN contain
- Elements of Dates
- City and ZIP
- Other unique identifiers, characteristics and codes not previously listed as direct identifiers (previous slide)
- CANNOT contain other direct identifiers (among the 18)
36Option 3: Waiver of Authorization
- May use or disclose personal information for research if IRB or Privacy Board determines that
- research involves no more than minimal risk
- research does not adversely affect the rights and welfare of subjects
- the research could not be done without a waiver
37Data Use Agreements (DUA)
- Between data provider and data user
- Restrictions
- access by specific personnel
- use for a specific reason
- defined duration of retention
- Implements commitments made by data provider
38State regulations
- Variable from state to state
- Some are relatively restrictive
- requires negotiation with data provider
39Iron-clad protection?
- Certificate of Confidentiality
- Issued by DHHS
- Protects data against legal process
- Typically for sensitive topics, e.g. illicit drugs
- O, Canada!
40Data security in complex projects
- Multisite projects: special needs
- Careful mapping of data flow and access
- Minimal identifying information at each stage
- Particular care in technical aspects of security
41Example of a data flow plan (with security
provisions)
42File management for PCDS
- General practices of good management
- Practices necessary to maintain project continuity
- Well-structured directory organization and naming
- Include documentation with files
- Separate project data from personal directories
- Separate datasets from programs
- Separate raw data from analytic datasets
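One way to apply these separations is to create the same directory skeleton for every project. The layout below is a hypothetical example, not the department's actual convention; the directory names and the README placeholder are assumptions.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: raw data, analytic datasets, programs, and
# documentation each get their own directory under the project root.
PROJECT_DIRS = ["data/raw", "data/analytic", "programs", "docs"]


def create_project_tree(root):
    """Create the skeleton under `root` and return the relative directory names."""
    root = Path(root)
    for sub in PROJECT_DIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "docs" / "README.txt"
    if not readme.exists():
        readme.write_text("Describe data sources, DUA terms, and file naming here.\n")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_tree(Path(tmp) / "study_x")
```

Keeping raw data in its own directory also makes it easier to apply stricter permissions and project-specific backup rules to that directory alone.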
43- We typically follow this presentation with a
15-minute tutorial on good practices for data and
file management
44Backups
- Conflict of privacy/confidentiality (restrict) and data security (maintain)
- Basic backup schedule (undeletable)
- All Unix files: 4-month retention
- PC files: 2-month retention
- Project-specific backup by request
- Only possible if material is properly organized
- Permanent media, physical security
45- The backup policy described here was adopted after several months of faculty discussion
- Computer system managers wanted longer retention
- Faculty concerned about unexpected discovery of material intended to be deleted
- Conflicts of DUA requirements with rules regarding retention of data for verification, revision of manuscripts, etc.
46General computer security
- Proper use of computer accounts, only by authorized individuals
- Secure connections for outside access
- Remote users
- Home or on road access via Internet
- Applications can be tunneled securely
- Good practices with passwords
- Maintain file permissions to restrict access to
authorized users
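On the Unix side, restricting a data file to its owner and project group can be checked and enforced programmatically. The sketch below assumes a POSIX system (on Windows, `os.chmod` only toggles the read-only flag) and uses a temporary file in place of a real dataset; the 640 mode is an illustrative choice.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path):
    """Set mode 640: owner read/write, group read, no access for others."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)


def world_accessible(path):
    """True if users outside the owner and group have any permission bits."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


# Demonstration on a temporary file standing in for a project dataset.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)  # start out world-readable
before = world_accessible(demo_path)
restrict_to_owner_and_group(demo_path)
after = world_accessible(demo_path)
final_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
os.remove(demo_path)
```

A check like `world_accessible` could be run periodically over project data directories to flag files whose permissions have drifted.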
47- We follow this up with training on the mechanics of computer security
- Permissions, file organization, etc.
- More or less fine-grained tools for protection of various files
- IT staff included in training
- Responsible for implementing security and data retention policies for various project datasets
- Teach methods for both Unix and Windows sides of our system
48Conclusions
- Know your data
- Be prepared to accommodate restrictions required by data providers
- Maintain general security
- Seek guidance for tough situations!