Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices

1
Privacy, Confidentiality and Data Security
(PCDS) in HSR: Best Practices
  • Alan M. Zaslavsky
  • Department of Health Care Policy
  • Harvard Medical School

2
Privacy, Confidentiality and Data Security (PCDS)
  • Importance and sensitivity of PCDS
  • Basic concepts of disclosure risk
  • Deidentification and reidentification
  • Disclosure control
  • Institutional and regulatory frameworks
  • Common Rule, HIPAA, Data use agreements
  • File organization, data flow and computer security

3
  • This presentation offered in our department at
    least annually
  • Required attendance by all programmers, students,
    fellows, and project managers with data
    responsibilities
  • Presented to faculty at meetings
  • Shortened version for lower-level staff
  • Tracking of attendance by personnel manager
  • Sanction is loss of computer account
  • Seek to fully involve project management in PCDS
    issues

4
Definitions
  • Privacy: the right of an individual to keep
    information about herself or himself from others
  • Confidentiality: safeguarding, by a recipient, of
    information about another individual
  • Disclosure: release (direct or indirect) of
    information about an identifiable individual

5
Definitions (continued)
  • Data security: protections on data to prevent
    unauthorized access or destruction
  • Informed consent: a person's agreement to allow
    personal data to be provided for research and
    statistical purposes
  • Research: study producing generalizable knowledge
  • Excludes internal operations, quality assurance

6
Importance of PCDS
  • Nexus for balance between
  • benefits of information to society
  • possible harms of information use to individuals
  • in conducting the research enterprise.
  • One person's invasion of privacy is another's
    essential use of information.

7
Inherent conflicts
  • Law enforcement / legal process
  • General access to research data
  • Freedom of Information Act (FOIA)
  • Commercial use / beneficial products and services?
  • Prevention of harm
  • Need to save data for verification, revision

8
Costs of violations of PCDS
  • Damage to subjects
  • Material
  • Psychological/social
  • Damage to the research enterprise
  • Exposure to legal/administrative sanctions for
    researchers and data providers and their
    institutions

9
Direct and indirect identifiers
  • Key: variable or combination of variables, the
    value for which results in a record being unique
    in the target and population data
  • Direct identifier: information that is uniquely
    associated with a person
  • Indirect identifier: data which, in combination,
    are uniquely associated with a person;
    information which facilitates such associations

10
Direct Identifiers (keys)
  • Name
  • Telephone number
  • Street / e-mail address
  • Unique features (SSN, Medicare ID, health plan
    number, medical record number, certificate/license
    number, voice and finger prints, photos)

11
Re-identification by Matching
  • De-identification
  • Original target file: Name, a b c d e f g h i j k l
  • Anonymized target file: a b c d e f g h i j k l
  • Re-identification
  • Anonymized target file: a b c d e f g h i j k l
  • Population file: Name, a b c d e f m n o p
  • Key: the variables the two files share (a to f)

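The matching step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names `a`, `b`, `c`, and the `reidentify` helper are hypothetical and do not come from the presentation.

```python
# Hypothetical toy data: field names and values are made up for illustration.
KEY = ("a", "b", "c")  # variables present in both files

anonymized_target = [
    {"a": 1950, "b": "F", "c": "02138", "diagnosis": "asthma"},
    {"a": 1962, "b": "M", "c": "04101", "diagnosis": "diabetes"},
]

population = [
    {"name": "Pat Example", "a": 1950, "b": "F", "c": "02138"},
    {"name": "Lee Sample", "a": 1971, "b": "M", "c": "04101"},
    {"name": "Sam Sample", "a": 1971, "b": "M", "c": "04101"},
]


def reidentify(target, population, key):
    """Attach a name to each target record whose key matches exactly one population record."""
    index = {}
    for person in population:
        index.setdefault(tuple(person[k] for k in key), []).append(person["name"])
    matches = []
    for record in target:
        names = index.get(tuple(record[k] for k in key), [])
        if len(names) == 1:  # a unique match is a re-identification
            matches.append((names[0], record))
    return matches


matched = reidentify(anonymized_target, population, KEY)
```

Only the first target record is re-identified here, because its key values occur exactly once in the population file. The sketch uses exact matching only; a real intruder might also use approximate or probabilistic linkage.
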
12
Data in Combination
  • Variables might be identifying in combination
    that are not identifying by themselves
  • Month, day and year of birth
  • Gender
  • Zip code

13
Example of reidentification using three
variables
  • Variables: percent unique in Maine state voter
    registration list
  • Birthdate alone: 12%
  • Birthdate + gender: 29%
  • Birthdate + Zip (5): 69%
  • Birthdate + Zip (9): 97%
  • Sweeney, 1997
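
A figure of this kind can be computed by counting how many records carry a combination of key values that nobody else shares. The sketch below assumes hypothetical records and field names (`birthdate`, `gender`, `zip5`); it does not reproduce the cited study's data or method.

```python
from collections import Counter


def fraction_unique(records, key):
    """Share of records whose combination of key values occurs exactly once."""
    counts = Counter(tuple(r[k] for k in key) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[k] for k in key)] == 1)
    return unique / len(records)


# Hypothetical records, not data from the cited study.
voters = [
    {"birthdate": "1950-03-01", "gender": "F", "zip5": "04101"},
    {"birthdate": "1950-03-01", "gender": "M", "zip5": "04101"},
    {"birthdate": "1962-07-15", "gender": "F", "zip5": "04102"},
    {"birthdate": "1962-07-15", "gender": "F", "zip5": "04103"},
]

by_birthdate = fraction_unique(voters, ["birthdate"])
by_birthdate_gender = fraction_unique(voters, ["birthdate", "gender"])
by_all_three = fraction_unique(voters, ["birthdate", "gender", "zip5"])
```

In this toy file no record is unique on birthdate alone, half are unique on birthdate plus gender, and all are unique once ZIP is added, which mirrors the pattern on the slide: each extra variable raises the share of unique records.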

14
Population (External) Data Bases
  • Voter Registration Lists
  • Research files
  • State and Federal Files
  • Survey files with added administrative data
  • Information Vendor Files
  • The unknown: what might an intruder know about
    some or all members of your population?

15
Identifiable population groups (entire data set
highly identifiable)
  • Rare diseases
  • Sample drawn from a particular area

16
Unique/unusual cases: rare values
  • 110 year-old woman
  • Man who weighs 350 pounds
  • Income > $100 million
  • Verbatim text containing identifying details

17
Unique/unusual cases: rare combinations of values
  • 16 year-old widow
  • 20 year-old Ph.D.
  • Asian race in rural mid-west
  • Female/Asian Executive
  • 60-year old male married to 30 year-old female
  • Cause of death: prostate cancer for 30 year-old
    male

18
Micro Data Protection 1
  • Remove direct identifiers
  • Restrict geographical detail
  • Code to remove detail: larger categories,
    top/bottom coding
  • Remove, code or edit verbatim comments
  • Case suppression
  • Variable suppression
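
Several of these steps (removing direct identifiers, top coding, coarser categories, restricted geography) are simple record transformations. The following is a minimal sketch, assuming hypothetical field names, an age top code of 90 and 3-digit ZIP prefixes; none of these choices come from the presentation, and real thresholds depend on the data and the provider's rules.

```python
DIRECT_IDENTIFIERS = {"name", "phone"}  # hypothetical field names
AGE_TOP_CODE = 90  # assumed threshold, chosen for illustration


def protect(record):
    """Return a copy with direct identifiers removed and detail coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Top coding: collapse all ages at or above the threshold into one category.
    age = out.pop("age")
    if age >= AGE_TOP_CODE:
        out["age_group"] = f"{AGE_TOP_CODE}+"
    else:
        low = (age // 10) * 10  # larger categories: ten-year bands
        out["age_group"] = f"{low}-{low + 9}"
    # Restrict geographic detail: keep only the first three ZIP digits.
    out["zip3"] = out.pop("zip")[:3]
    return out


raw = {"name": "Pat Example", "phone": "555-0100", "age": 110, "zip": "04101", "income": 52000}
safe = protect(raw)
```

The 110 year-old from the earlier slide ends up in the same `90+` category as every other very old person, so that value no longer singles her out.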

19
Micro Data Protection 2
  • Special handling (e.g. coding) of data from
    external sources (esp. area data)
  • Statistical modification (noise)
  • Sample/subsample
  • Eliminate link between persons and establishments
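
Noise addition and subsampling can be illustrated as follows. This is a rough sketch, assuming Gaussian noise with a hand-picked standard deviation and simple random subsampling; the presentation does not specify which methods are used, and the helper names are hypothetical.

```python
import random


def add_noise(values, scale, rng):
    """Perturb each value with zero-mean Gaussian noise of the given standard deviation."""
    return [v + rng.gauss(0, scale) for v in values]


def subsample(records, fraction, rng):
    """Release only a random subset, so an intruder cannot be sure a person is in the file."""
    k = round(len(records) * fraction)
    return rng.sample(records, k)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
incomes = [41000, 52000, 67000, 88000]
noisy = add_noise(incomes, scale=1000, rng=rng)
released = subsample(list(range(100)), fraction=0.25, rng=rng)
```

Both techniques trade analytic precision for protection, so the noise scale and sampling fraction would need to be chosen with the planned analyses in mind.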

20
Tabular data
  • Information on individuals deduced from unique
    cases in tables
  • Reidentification usually related to small groups,
    small cell counts
  • Rounding, cell suppression, complementary
    suppression might be required
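
A small-cell rule of this kind might be sketched as below. The minimum cell size of 5 is an assumption for illustration (actual thresholds vary by data provider), and the check for complementary suppression looks only at rows, not columns.

```python
MIN_CELL = 5  # assumed threshold; actual rules vary by data provider


def suppress_small_cells(table, min_cell=MIN_CELL):
    """Replace counts below min_cell with None (primary suppression)."""
    return [[None if n < min_cell else n for n in row] for row in table]


def rows_needing_complementary(suppressed):
    """Rows with exactly one suppressed cell: it can be recovered from a published row total."""
    return [i for i, row in enumerate(suppressed) if row.count(None) == 1]


counts = [
    [12, 3, 40],   # one small cell
    [2, 4, 31],    # two small cells
    [25, 18, 60],  # no small cells
]
masked = suppress_small_cells(counts)
at_risk = rows_needing_complementary(masked)
```

Row 0 is flagged because its single suppressed cell equals the row total minus the two published cells; suppressing a second cell in that row would close the gap.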

21
Disclosure of individual information from a table

22
Technical issues
  • Highly technical issues in both microdata and
    tabular nondisclosure
  • Intersection of stats, math, computer science
  • Software for detecting disclosure risk
  • RTI, μ-ARGUS, etc.
  • Nontechnical variables
  • Resources and intentions of intruder

23
Disclosure control in released data
  • Affects us as producers and consumers of data
  • Masking
  • Affects analyses if performed on data we receive
  • Complex to implement on our releases
  • Limited access data centers

24
Restricted access data centers
  • Alternative to fully-deidentified public-use
    microdata files
  • Data are held at restricted center
  • Limited set of researchers submit analyses
    through intermediaries
  • Output reviewed for nondisclosure
  • Only feasible for organizations with substantial,
    persistent resources
  • e.g. NCHS, Census

25
Institutional and regulatory frameworks for PCDS
  • Common Rule / IRB
  • HIPAA
  • Data Use Agreements
  • State regulations

26
Common Rule
  • Governs protection of research subjects in all
    Federally-funded research
  • IRB evaluates adherence by researcher
  • Institutional sanctions for violations
  • Many institutions extend to all research
  • Objective: protection of subjects from harm
  • In HSR, often there is no intervention
  • Typically, commitment to minimal risk of
    disclosure

27
Common Rule (continued)
  • Informed consent
  • generally required in primary data-collection
  • appropriate information about use of data
  • might be waived where impractical to obtain (e.g.
    intrusive), if risks are minimal and rights are not injured
  • Exemption from (full) review
  • No intervention that could harm subject
  • Secondary data with no identifiable data
  • Requires determination by IRB (but less tedious)

28
Implications for researchers
  • Commitments are made
  • To subjects: consent language
  • To IRB: safeguards promised in IRB application
  • To funding agencies: in grant application
  • May involve
  • Protection of data while used
  • Limits on duration of use

29
HIPAA
  • Health Insurance Portability and Accountability
    Act
  • Specific rules for electronic transmission of
    health data
  • Primarily for efficiency but includes Privacy
    Rule
  • Obligations imposed on health care providers
  • Includes direct providers, health plans and
    insurers
  • Research data distinguished from health plan /
    provider operational functions
  • Researchers must respect these obligations

30
Who is Covered by HIPAA?
  • A health care provider who transmits health
    information in electronic transactions
  • Example: a physician or hospital that
    electronically bills for services
  • A health plan
  • A health care clearinghouse

31
HIPAA implications for research
  • Practical implications of HIPAA
  • What data providers will be looking for
  • Need to work around restrictions on content
  • More elaborate paths for data control
  • HIPAA provisions for releasing data for research
  • fully deidentified
  • limited use dataset
  • waiver

32
Option 1: De-identified Health Information
  • Completely de-identified information (18 elements
    removed) and no knowledge that remaining
    information can identify the individual. OR
  • Statistically de-identified information where a
    qualified statistician determines that there is a
    very small risk that the information could be
    used to identify the individual and documents the
    methods and analysis.

33
Removal of These Identifiers Makes Information
De-identified
  • Names
  • Geographic info (including city and ZIP)
  • Elements of dates (except year)
  • Telephone numbers
  • Fax numbers
  • E-mail addresses
  • Social Security numbers
  • Medical record, prescription numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • VIN and serial numbers, license plate numbers
  • Device identifiers, serial numbers
  • Web URLs
  • IP addresses
  • Biometric identifiers (fingerprints)
  • Full face, comparable photo images
  • Unique identifying numbers

If the covered entity has actual knowledge that
remaining information can be used to identify the
individual, the information is considered
individually identifiable, and therefore,
generally is PHI.

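Removing listed identifiers from a record can be sketched as a field filter. The example assumes hypothetical field names that stand in for only some of the identifier types above, and reduces dates to the year. Running code like this does not by itself make a file de-identified: free text, derived variables and the "actual knowledge" condition still need review.

```python
from datetime import date

# Hypothetical field names standing in for some of the listed identifier types.
IDENTIFIER_FIELDS = {"name", "phone", "email", "ssn", "medical_record_number", "city", "zip"}
DATE_FIELDS = {"birth_date", "admission_date"}


def strip_identifiers(record):
    """Drop listed identifier fields and reduce dates to the year."""
    out = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue
        if field in DATE_FIELDS:
            out[field.replace("_date", "_year")] = value.year
        else:
            out[field] = value
    return out


patient = {
    "name": "Pat Example",
    "ssn": "000-00-0000",
    "zip": "04101",
    "birth_date": date(1950, 3, 1),
    "admission_date": date(2004, 6, 12),
    "diagnosis": "asthma",
}
deidentified = strip_identifiers(patient)
```

An allow-list of fields known to be safe is often a more cautious design than a block-list like this one, since a block-list silently passes any identifier field it does not know about.
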
34
Option 2: Limited Data Set with Data Use Agreement
  • The Privacy Rule permits limited types of
    identifiers to be released for research with
    health information (referred to as a Limited Data
    Set).
  • Limited Data Sets can only be used and released
    in accordance with a Data Use Agreement between
    the covered entity and the recipient.

35
Limited Data Set w/ Data Use Agreement
  • The Limited Data Set CAN contain
  • Elements of Dates
  • City and ZIP
  • Other unique identifiers, characteristics and
    codes not previously listed as direct identifiers
    (previous slide)
  • CANNOT contain other direct identifiers (among
    the 18)

36
Option 3: Waiver of Authorization
  • May use or disclose personal information for
    research if IRB or Privacy Board determines that
  • research involves no more than minimal risk
  • research does not adversely affect the rights
    and welfare of subjects
  • the research could not be done without a waiver

37
Data Use Agreements (DUA)
  • Between data provider and data user
  • Restrictions
  • access by specific personnel
  • use for a specific reason
  • defined duration of retention
  • Implements commitments made by data provider

38
State regulations
  • Variable from state to state
  • Some are relatively restrictive
  • May require negotiation with data provider

39
Iron-clad protection?
  • Certificate of Confidentiality
  • Issued by DHHS
  • Protects data against legal process
  • Typically for sensitive topics, e.g. illicit
    drugs
  • O, Canada!

40
Data security in complex projects
  • Multisite projects: special needs
  • Careful mapping of data flow and access
  • Minimal identifying information at each stage
  • Particular care in technical aspects of security

41
Example of a data flow plan (with security
provisions)

42
File management for PCDS
  • General practices of good management
  • Practices necessary to maintain project
    continuity
  • Well-structured directory organization and naming
  • Include documentation with files
  • Separate project data from personal directories
  • Separate datasets from programs
  • Separate raw data from analytic datasets
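
One way to apply these separations is to create every project from the same directory template. The layout below is a hypothetical example (the directory names and the `create_project_tree` helper are not the department's actual convention), built in a temporary directory so the sketch is safe to run.

```python
import tempfile
from pathlib import Path

# Hypothetical layout; directory names are illustrative only.
SUBDIRS = ["data/raw", "data/analytic", "programs", "docs"]


def create_project_tree(root, project):
    """Create a project directory with raw data, analytic data, programs and docs kept apart."""
    base = Path(root) / project
    for sub in SUBDIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    # Documentation travels with the files it describes.
    readme = base / "docs" / "README.txt"
    readme.write_text("Describe data sources, DUA terms and file contents here.\n")
    return base


workdir = tempfile.mkdtemp()
project_dir = create_project_tree(workdir, "example_study")
created = sorted(
    p.relative_to(project_dir).as_posix() for p in project_dir.rglob("*") if p.is_dir()
)
```

Keeping raw data in its own directory makes it straightforward to give it tighter permissions and a different retention schedule than the analytic files derived from it.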

43
  • We typically follow this presentation with a
    15-minute tutorial on good practices for data and
    file management

44
Backups
  • Conflict of privacy/confidentiality (restrict)
    and data security (maintain)
  • Basic backup schedule (undeletable)
  • All Unix files: 4-month retention
  • PC files: 2-month retention
  • Project-specific backup by request
  • Only possible if material is properly organized
  • Permanent media, physical security

45
  • The backup policy described here was adopted
    after several months of faculty discussion
  • Computer system managers wanted longer retention
  • Faculty concerned about unexpected discovery of
    material intended to be deleted
  • Conflicts of DUA requirements with rules
    regarding retention of data for verification,
    revision of manuscripts, etc.

46
General computer security
  • Proper use of computer accounts, only by
    authorized individuals
  • Secure connections for outside access
  • Remote users
  • Home or on road access via Internet
  • Applications can be tunneled securely
  • Good practices with passwords
  • Maintain file permissions to restrict access to
    authorized users
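
On the Unix side, restricting a data file to its owner and project group might look like the sketch below. It assumes POSIX permission bits (Windows uses access control lists instead) and a throwaway temporary file; the helper names are hypothetical.

```python
import os
import stat
import tempfile


def restrict_to_owner_and_group(path):
    """Owner may read and write, group may read, everyone else gets no access (mode 0o640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)


def world_accessible(path):
    """True if any permission bit is set for users outside the owner and group."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


fd, data_file = tempfile.mkstemp(suffix=".csv")
os.close(fd)
os.chmod(data_file, 0o644)  # start world-readable to show the change
before = world_accessible(data_file)
restrict_to_owner_and_group(data_file)
after = world_accessible(data_file)
mode = stat.S_IMODE(os.stat(data_file).st_mode)
os.remove(data_file)
```

A check like `world_accessible` could be run periodically over a project tree to find files whose permissions have drifted.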

47
  • We follow this up with a training on mechanics of
    computer security
  • Permissions, file organization, etc.
  • More or less fine-grained tools for protection of
    various files
  • IT staff included in training
  • Responsible for implementing security and data
    retention policies for various project datasets
  • Teach methods for both Unix and Windows sides of
    our system

48
Conclusions
  • Know your data
  • Be prepared to accommodate restrictions required
    by data providers
  • Maintain general security
  • Seek guidance for tough situations!