Privacy Preserving Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Privacy Preserving Data Mining

Description:

Title: Privacy Preserving Data Mining
Author: Yehuda Lindell
Last modified by: Li Xiong
Document presentation format: On-screen Show

Slides: 17
Provided by: Yehu80

Transcript and Presenter's Notes

Title: Privacy Preserving Data Mining


1
Healthcare privacy and security: Genomic data privacy
Li Xiong, CS573 Data Privacy and Security
2
Genomic data privacy
  • Genomic data are increasingly collected, stored,
    and shared in research and clinical environments
  • Genomic data are person-specific, yet there is no
    public registry that maps genomes to the names of
    individuals
  • Genomic data are not specified as an identifying
    patient attribute under the HIPAA Privacy Rule and
    may be released for public research purposes
  • How can person-specific DNA be shared so that it
    cannot be associated with its explicit identity?

3
Data sharing scenario
  • John Smith is admitted to a local hospital, which
    stores clinical and DNA information
  • John visits other hospitals
  • The hospital forwards certain DNA data to a
    research group, along with the institution and
    pseudonyms of the patients
  • The hospital sends identified discharge records
    to a state-controlled database

4
Data at a specific location
  • Identified table of patient demographics
  • De-identified DNA sequences
  • Can we uniquely link identified data to DNA data?

5
Data at multiple locations
  • Each site has an identified table and
    de-identified DNA sequences
  • Can we uniquely link identified data to DNA data?

6
Trails
  • The set of locations each patient visited is
    called a trail
  • The trails can be tracked and matched to link DNA
    data to identified data
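
Building trails can be sketched in Python. This is a minimal illustration, not the method from the slides: it assumes each site's table is given as a set of record keys (names for identified tables, DNA sequences for DNA tables), that the same person's DNA sequence is recognizable across sites, and that names are consistent across sites. The helper build_trails and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A trail is a 0/1 vector over an ordered list of locations (1 = record present there).
Trail = Tuple[int, ...]


def build_trails(site_records: Dict[str, Set[str]], sites: List[str]) -> Dict[str, Trail]:
    """Map every record key seen at any site to its trail over `sites`.

    Hypothetical helper: assumes a record key (a name, or a DNA sequence)
    denotes the same person at every site.
    """
    everyone = set().union(*site_records.values())
    return {
        key: tuple(1 if key in site_records.get(site, set()) else 0 for site in sites)
        for key in everyone
    }


if __name__ == "__main__":
    sites = ["H1", "H2", "H3"]
    identified = {"H1": {"John", "Mary"}, "H2": {"John"}, "H3": {"Mary", "Ann"}}
    dna = {"H1": {"ACGT", "TTGA"}, "H2": {"ACGT"}, "H3": {"TTGA", "GGCA"}}
    # [('Ann', (0, 0, 1)), ('John', (1, 1, 0)), ('Mary', (1, 0, 1))]
    print(sorted(build_trails(identified, sites).items()))
    # [('ACGT', (1, 1, 0)), ('GGCA', (0, 0, 1)), ('TTGA', (1, 0, 1))]
    print(sorted(build_trails(dna, sites).items()))
```

In this toy data, John and the sequence ACGT share the trail (1, 1, 0), which is exactly the kind of coincidence a trail-matching attack exploits.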

7
REIDIT-Complete
  • Re-identification of data in trails (REIDIT) for
    complete publishing
  • If there is a unique trail match, a
    re-identification has occurred
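
Under complete publishing, a person's DNA trail equals their identified trail, so the matching step can be sketched as grouping both sides by trail and keeping the trails that occur exactly once on each side. This is a minimal sketch assuming the trails are already computed as 0/1 tuples; the function reidit_complete and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trail = Tuple[int, ...]


def reidit_complete(id_trails: Dict[str, Trail], dna_trails: Dict[str, Trail]) -> Dict[str, str]:
    """Return {dna_key: name} for every trail held by exactly one name and one DNA record."""
    names_by_trail: Dict[Trail, List[str]] = defaultdict(list)
    for name, trail in id_trails.items():
        names_by_trail[trail].append(name)

    dna_by_trail: Dict[Trail, List[str]] = defaultdict(list)
    for key, trail in dna_trails.items():
        dna_by_trail[trail].append(key)

    links: Dict[str, str] = {}
    for trail, keys in dna_by_trail.items():
        names = names_by_trail.get(trail, [])
        # A unique trail match on both sides is a re-identification.
        if len(keys) == 1 and len(names) == 1:
            links[keys[0]] = names[0]
    return links


if __name__ == "__main__":
    id_trails = {"John": (1, 1, 0), "Mary": (1, 0, 1), "Ann": (0, 0, 1), "Bob": (0, 0, 1)}
    dna_trails = {"ACGT": (1, 1, 0), "TTGA": (1, 0, 1), "GGCA": (0, 0, 1), "CCAT": (0, 0, 1)}
    print(reidit_complete(id_trails, dna_trails))  # {'ACGT': 'John', 'TTGA': 'Mary'}
```

Ann and Bob share the trail (0, 0, 1), so neither of their DNA records is re-identified; more people per trail means fewer unique matches.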

8
Results
9
REIDIT-C re-identification
  • Re-identifiability is related to the average
    number of people per location
    per location

10
Reserved publishing
  • Data releasers can reserve (withhold) certain
    information
  • N is reserved to P vs. P is reserved to N

11
REIDIT - Incomplete
  • REIDIT for reserved publishing
  • For each trail in the track with incomplete
    trails, if it has exactly one supertrail, a
    re-identification has occurred
  • Remove the re-identified supertrail
  • This matters because a trail can be a supertrail
    of many trails
  • Repeat the process
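
The iterative REIDIT-Incomplete loop can be sketched the same way. Assumptions, all hypothetical: the incomplete trails and the other track are passed as two dictionaries (which side is incomplete depends on what was reserved, and this sketch does not model that); a trail t' counts as a supertrail of t when t' has a 1 at every location where t has a 1; and within a pass, incomplete trails are processed in sorted key order. The names is_supertrail, reidit_incomplete, and the data are illustrative only.

```python
from typing import Dict, List, Tuple

Trail = Tuple[int, ...]


def is_supertrail(sup: Trail, sub: Trail) -> bool:
    """True if `sup` has a 1 at every location where `sub` has a 1 (equal lengths assumed)."""
    return all(s >= b for s, b in zip(sup, sub))


def reidit_incomplete(incomplete: Dict[str, Trail], complete: Dict[str, Trail]) -> Dict[str, str]:
    """Repeatedly link incomplete trails that have exactly one remaining supertrail."""
    remaining_inc = dict(incomplete)
    remaining_com = dict(complete)
    links: Dict[str, str] = {}
    changed = True
    while changed:  # repeat until a full pass makes no new link
        changed = False
        for key, trail in sorted(remaining_inc.items()):  # snapshot, safe to delete below
            candidates: List[str] = [
                name for name, sup in remaining_com.items() if is_supertrail(sup, trail)
            ]
            if len(candidates) == 1:
                links[key] = candidates[0]
                del remaining_inc[key]
                del remaining_com[candidates[0]]  # remove the re-identified supertrail
                changed = True
    return links


if __name__ == "__main__":
    complete = {"John": (1, 1, 0), "Mary": (1, 1, 1), "Ann": (0, 0, 1)}
    incomplete = {"s1": (1, 1, 0), "s2": (1, 1, 1), "s3": (0, 0, 1)}
    print(reidit_incomplete(incomplete, complete))  # {'s2': 'Mary', 's3': 'Ann', 's1': 'John'}
```

In the first pass s1 has two supertrails (John and Mary) and stays ambiguous; once s2 is linked and Mary is removed, the second pass resolves s1, which is why the process repeats.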

12
REIDIT-Incomplete
[Figure: REIDIT-Incomplete re-identification results for 0.0, 0.1, 0.5, and 0.9 probability of reserving information; hospitals ranked by number of patients]
13
Can masking location help?
Not necessarily!
14
Comments and open issues
  • Can k-anonymity solve the problem?
  • Pseudonyms are subject to dictionary attacks; how
    can the data be linked without pseudonyms?
  • Genomic protection methods that incorporate the
    utility of the genomic data
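
The dictionary-attack concern can be illustrated with a minimal sketch. The slides do not say how pseudonyms are generated; this assumes, hypothetically, that a pseudonym is an unsalted SHA-256 hash of a guessable identifier such as a name plus birth date, so anyone who can enumerate candidate identifiers can recover the match. The keyed variant (HMAC-SHA256) is shown only as a common mitigation, not as an answer to the open question: every site that needs to link records must still share the same key. All function names and data are illustrative.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def hash_pseudonym(identifier: str) -> str:
    """Unkeyed pseudonym: SHA-256 of the identifier (open to dictionary attack)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def dictionary_attack(pseudonym: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash every candidate identifier and return the one matching `pseudonym`, if any."""
    for candidate in candidates:
        if hash_pseudonym(candidate) == pseudonym:
            return candidate
    return None


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Keyed pseudonym: without `key`, an attacker cannot reproduce these values."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    candidates = ["Mary Jones|1955-03-02", "John Smith|1960-01-01"]
    p = hash_pseudonym("John Smith|1960-01-01")
    print(dictionary_attack(p, candidates))  # John Smith|1960-01-01
    k = keyed_pseudonym("John Smith|1960-01-01", b"site-secret")
    print(dictionary_attack(k, candidates))  # None
```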

15
(No Transcript)
16
De-identification
e.g., the Utah Resource for Genetic and Epidemiologic
Research (RGE)