Inference Problem Privacy Preserving Data Mining - PowerPoint PPT Presentation

Slides: 30
Provided by: far1
Transcript and Presenter's Notes

1
Inference Problem, Privacy Preserving Data Mining
  • Lecture 19

2
Readings and Assignments
  • I. Moskowitz, M. H. Kang: Covert Channels - Here to Stay?
    http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
  • Jajodia, Meadows: Inference Problems in
    Multilevel Secure Database Management Systems,
    http://www.acsac.org/secshelf/book001/book001.html,
    essay 24

3
Indirect Information Flow Channels
  • Covert channels
  • Inference channels

4
Communication Channels
  • Overt channel: designed into a system and
    documented in the user's manual
  • Covert channel: not documented. Covert channels
    may be deliberately inserted into a system, but
    most such channels are accidents of the system
    design.

5
Covert Channel
  • Timing channel: based on system times
  • Storage channel: communication that is not
    time-related
  • Each kind can be turned into the other

6
Inference Channels
  • [Diagram labels: non-sensitive information,
    meta-data, sensitive information]

7
Inference Channels
  • Statistical Database Inferences
  • General Purpose Database Inferences

8
Statistical Databases
  • Goal: provide aggregate information about groups
    of individuals
  • E.g., average grade point of students
  • Security risk: specific information about a
    particular individual
  • E.g., grade point of student John Smith
  • Meta-data:
  • Working knowledge about the attributes
  • Supplementary knowledge (not stored in database)

9
Types of Statistics
  • Macro-statistics: collections of related
    statistics presented in 2-dimensional tables
  • Micro-statistics: individual data records used
    for statistics after identifying information is
    removed

10
Statistical Compromise
  • Exact compromise: find the exact value of an
    attribute of an individual (e.g., John Smith's
    GPA is 3.8)
  • Partial compromise: find an estimate of an
    attribute value corresponding to an individual
    (e.g., John Smith's GPA is between 3.5 and 4.0)

11
Methods of Attacks and Protection
  • Small/Large Query Set Attack
  • C: characteristic formula that identifies groups
    of individuals
  • If C identifies a single individual I, i.e.,
    count(C) = 1
  • Find out existence of property
  • count(C and D) = 1 means I has property D
  • count(C and D) = 0 means I does not have D
  • OR
  • Find value of property
  • sum(C, D) gives I's value of D
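
The small query set attack above can be sketched in a few lines. This is only an illustration under assumed data: the records, the attribute names, and the helper `total` (standing in for the slide's sum) are hypothetical and not taken from the lecture.

```python
# Toy statistical database; every record and attribute name is hypothetical.
RECORDS = [
    {"name": "John", "major": "CS", "year": 4, "honors": True, "gpa": 3.8},
    {"name": "Kathy", "major": "CS", "year": 3, "honors": False, "gpa": 3.1},
    {"name": "Paul", "major": "EE", "year": 4, "honors": True, "gpa": 3.5},
    {"name": "Eve", "major": "EE", "year": 3, "honors": False, "gpa": 2.9},
]

def count(formula):
    """count(C): number of records satisfying the characteristic formula."""
    return sum(1 for r in RECORDS if formula(r))

def total(formula, attribute):
    """sum(C, D): sum of an attribute over the records satisfying C."""
    return sum(r[attribute] for r in RECORDS if formula(r))

# C uses only attributes the attacker already knows about the target.
C = lambda r: r["major"] == "CS" and r["year"] == 4
D = lambda r: r["honors"]

assert count(C) == 1                              # C isolates one individual I
has_honors = count(lambda r: C(r) and D(r)) == 1  # existence of property D
gpa_of_target = total(C, "gpa")                   # value of a property
print(has_honors, gpa_of_target)
```

The attacker never asks for a name; two aggregate queries are enough once C singles out one record.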

12
  • Protection from small/large query set attack:
    query-set-size control
  • A query q(C) is permitted only if
  • N - n ≥ |C| ≥ n, where n ≥ 0 is a parameter of
    the database and N is the number of records in
    the database
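
A minimal sketch of query-set-size control, assuming a count query over an in-memory list. The names `guarded_count` and `QueryRefused` and the toy records are hypothetical.

```python
class QueryRefused(Exception):
    """Raised when a query set is too small or too large."""

def guarded_count(records, formula, n):
    """Answer count(C) only if n <= |C| <= N - n."""
    N = len(records)
    size = sum(1 for r in records if formula(r))
    if not (n <= size <= N - n):
        raise QueryRefused(f"query set size {size} outside [{n}, {N - n}]")
    return size

# Hypothetical toy data: ten records split across two departments.
records = [{"id": i, "dept": "A" if i < 5 else "B"} for i in range(10)]

print(guarded_count(records, lambda r: r["dept"] == "A", n=2))  # size 5 is allowed

for formula in (lambda r: r["id"] == 0,    # size 1: too small
                lambda r: r["id"] != 0):   # size 9: too large
    try:
        guarded_count(records, formula, n=2)
    except QueryRefused as err:
        print("refused:", err)
```

The upper bound matters as much as the lower one: a query set of size N - 1 is the complement of a single record, so answering it together with the full-table statistic would isolate that record.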

13
Tracker attack
  • q(C) is disallowed
  • C = C1 and C2; tracker T = C1 and not C2
  • q(C) = q(C1) - q(T)
  • [Venn diagram: C1, C2, C, and the tracker T]

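
The tracker identity on this slide can be checked on a toy table. The sketch below assumes the tracker is T = C1 and not C2 (the standard individual tracker); the student records, including the names Ann, Bob, and Sue, and the value n = 2 are hypothetical.

```python
# Hypothetical student table: (name, major, year). N = 10 records.
STUDENTS = [
    ("John", "CS", 4), ("Kathy", "CS", 3), ("Paul", "CS", 2), ("Eve", "CS", 1),
    ("Max", "EE", 4), ("Fred", "EE", 3), ("Mitch", "EE", 2),
    ("Ann", "ME", 4), ("Bob", "ME", 1), ("Sue", "ME", 2),
]
n = 2  # query-set-size parameter (assumed value)

def q(formula):
    """Count query under query-set-size control; None means refused."""
    size = sum(1 for s in STUDENTS if formula(s))
    return size if n <= size <= len(STUDENTS) - n else None

C1 = lambda s: s[1] == "CS"
C2 = lambda s: s[2] == 4
C = lambda s: C1(s) and C2(s)       # isolates one student
T = lambda s: C1(s) and not C2(s)   # the tracker

assert q(C) is None                 # the direct query is refused
recovered = q(C1) - q(T)            # both of these queries are allowed
print(recovered)                    # equals the refused count(C)
```
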
14
Tracker attack
  • q(C and D) is disallowed
  • C = C1 and C2; tracker T = C1 and not C2
  • q(C and D) = q(T or (C and D)) - q(T)
  • [Venn diagram: C1, C2, C, D, and the tracker T]

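
A sketch of the same tracker used to learn whether the isolated individual has property D. It assumes T = C1 and not C2, so that T and (C and D) are disjoint; the table, the honors flag standing in for D, and n = 2 are hypothetical.

```python
# Hypothetical table: (name, major, year, honors). N = 10 records.
STUDENTS = [
    ("John", "CS", 4, True), ("Kathy", "CS", 3, False), ("Paul", "CS", 2, True),
    ("Eve", "CS", 1, False), ("Max", "EE", 4, True), ("Fred", "EE", 3, False),
    ("Mitch", "EE", 2, False), ("Ann", "ME", 4, True), ("Bob", "ME", 1, False),
    ("Sue", "ME", 2, False),
]
n = 2  # query-set-size parameter (assumed value)

def q(formula):
    """Count query under query-set-size control; None means refused."""
    size = sum(1 for s in STUDENTS if formula(s))
    return size if n <= size <= len(STUDENTS) - n else None

C = lambda s: s[1] == "CS" and s[2] == 4       # C1 and C2
T = lambda s: s[1] == "CS" and s[2] != 4       # C1 and not C2
D = lambda s: s[3]                             # property of interest
C_and_D = lambda s: C(s) and D(s)

assert q(C_and_D) is None                      # refused: query set too small
leaked = q(lambda s: T(s) or C_and_D(s)) - q(T)
print(leaked)                                  # 1 means the individual has D
```

Because T and (C and D) never overlap, the difference of the two permitted counts is exactly the refused count.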
15
Query overlap attack
  • q(John) = q(C1) - q(C2)
  • [Diagram: overlapping query sets C1 and C2 over
    Kathy, Paul, John, Eve, Max, Fred, and Mitch]
  • Protection: query-overlap control

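
A sketch of the overlap attack and of one possible overlap control. The salary figures, the membership of C1 and C2 (the diagram layout was not recovered), and the `OverlapControl` class with its `max_overlap` threshold are all assumptions made for illustration.

```python
# Hypothetical salaries in thousands; the names come from the slide.
SALARY = {"Kathy": 50, "Paul": 60, "John": 70, "Eve": 40,
          "Max": 55, "Fred": 65, "Mitch": 45}

def q(query_set):
    """Sum of salaries over a query set."""
    return sum(SALARY[p] for p in query_set)

C1 = {"Kathy", "Paul", "John", "Eve"}   # assumed membership
C2 = {"Kathy", "Paul", "Eve"}           # C1 without John
johns_salary = q(C1) - q(C2)

class OverlapControl:
    """Refuse a query whose set overlaps an answered one in too many records."""
    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []

    def query(self, query_set):
        if any(len(query_set & old) > self.max_overlap for old in self.answered):
            return None                  # refused
        self.answered.append(set(query_set))
        return q(query_set)

guard = OverlapControl(max_overlap=2)
first = guard.query(C1)    # answered
second = guard.query(C2)   # refused: overlaps C1 in three records
print(johns_salary, first, second)
```

The control has to remember every answered query set, which is the main practical cost of this kind of protection.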
16
Insertion/Deletion Attack
  • Observing changes over time
  • q1 = q(C)
  • insert(i)
  • q2 = q(C)
  • q(i) = q2 - q1
  • Protection: insertions/deletions performed in
    pairs
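
The sequence q1, insert(i), q2 can be sketched as follows. The table, the department attribute used for C, and the salary values are hypothetical.

```python
# Hypothetical records; C selects one department.
db = [
    {"dept": "R&D", "salary": 40},
    {"dept": "R&D", "salary": 55},
    {"dept": "Sales", "salary": 61},
]
C = lambda r: r["dept"] == "R&D"

def q(formula):
    """Sum of salaries over the records satisfying the formula."""
    return sum(r["salary"] for r in db if formula(r))

q1 = q(C)
db.append({"dept": "R&D", "salary": 72})   # insert(i)
q2 = q(C)
leaked = q2 - q1                           # salary of the inserted record
print(leaked)

# With paired insertions the difference only reveals the pair's total.
q3_before = q(C)
db.extend([{"dept": "R&D", "salary": 48}, {"dept": "R&D", "salary": 52}])
pair_total = q(C) - q3_before
print(pair_total)
```
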

17
Statistical Inference Theory
  • Given an unlimited number of statistics and
    correct statistical answers, all statistical
    databases can be compromised (Ullman)

18
Inferences in General-Purpose Databases
  • Queries based on sensitive data
  • Inference via database constraints
  • Inferences via updates

19
Queries based on sensitive data
  • Sensitive information is used in the selection
    condition but not returned to the user.
  • Example: Salary is secret, Name is public
  • π_Name(σ_{Salary = 25,000})
  • Protection: apply the query to database views at
    different security levels
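
A sketch of why such a query leaks and how evaluating it against a level-specific view avoids the leak. It assumes an equality selection on Salary; the employee names, the `public_view` helper, and the choice to mask secret values with `None` are hypothetical.

```python
# Hypothetical relation: Name is public, Salary is secret.
EMPLOYEES = [
    {"name": "Smith", "salary": 38000},
    {"name": "Jones", "salary": 25000},
]

def public_view(rows):
    """View for public users: secret attributes are masked out."""
    return [{"name": r["name"], "salary": None} for r in rows]

def names_with_salary(rows, amount):
    """Project Name from the rows selected by Salary = amount."""
    return [r["name"] for r in rows if r["salary"] == amount]

# Evaluated on the base relation, the answer reveals who earns 25,000.
leaky = names_with_salary(EMPLOYEES, 25000)

# Evaluated on the public view, the selection cannot see Salary at all.
protected = names_with_salary(public_view(EMPLOYEES), 25000)
print(leaky, protected)
```
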

20
Database Constraints
  • Integrity constraints
  • Database dependencies
  • Key integrity

21
Integrity Constraints
  • C = A + B
  • A = public, C = public, and B = secret
  • B can be calculated from A and C, i.e., secret
    information can be calculated from public data

22
Database Dependencies
  • Metadata
  • Functional dependencies
  • Multi-valued dependencies
  • Join dependencies
  • etc.

23
Functional Dependency
  • FD: A → B, that is, for any two tuples in the
    relation, if they have the same value for A, they
    must have the same value for B.
  • Example FD: Rank → Salary
  • Secret information: Name and Salary together
  • Query1: Name and Rank
  • Query2: Rank and Salary
  • Combine answers for Query1 and Query2 to reveal
    Name and Salary together
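
Combining the two answers amounts to a join on Rank, as in this sketch. The names, ranks, and salary figures are hypothetical.

```python
# Answers to two individually permitted queries (hypothetical data).
query1 = [("Smith", "Captain"), ("Jones", "Major")]   # Name, Rank
query2 = [("Captain", 30000), ("Major", 40000)]       # Rank, Salary

# Rank -> Salary means each rank maps to exactly one salary,
# so the second answer can be used as a lookup table.
salary_by_rank = dict(query2)

# Joining on Rank reveals the secret Name-Salary association.
name_salary = {name: salary_by_rank[rank] for name, rank in query1}
print(name_salary)
```

Neither query returns Name and Salary together; the functional dependency is the piece of meta-data that makes the join exact.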

24
Key integrity
  • Every tuple in the relation has a unique key
  • Users at different levels see different versions
    of the database
  • Users might attempt to update data that is not
    visible to them

25
Example
  • [Tables not recovered: Secret View, Public View]

26
Updates
Public User
  • Update Black's address to Orlando
  • Add new tuple (Red, 22,000, Manassas)
  • If
  • Refuse update: covert channel
  • Allow update:
  • Overwrite high data: may be incorrect
  • Create new tuple: which data is correct?
  • (polyinstantiation) violates key constraints

27
Updates
Secret user
  • Update Black's salary to 45,000
  • If
  • Refuse update: denial of service
  • Allow update:
  • Overwrite low data: covert channel
  • Create new tuple: which data is correct?
  • (polyinstantiation) violates key constraints
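
A sketch of polyinstantiation for the public user's update of Black's address. It assumes the stored key is extended to (name, level); the `visible` and `update` helpers and the secret tuple's values are hypothetical.

```python
from collections import Counter

LEVELS = {"public": 0, "secret": 1}

# Keyed by (name, level); the secret tuple's values are hypothetical.
relation = {("Black", "secret"): {"salary": 38000, "address": "Columbia"}}

def visible(level):
    """Tuples a user cleared at the given level is allowed to see."""
    return {key: row for key, row in relation.items()
            if LEVELS[key[1]] <= LEVELS[level]}

def update(user_level, name, **changes):
    """Write at the user's own level; never touch other levels' tuples."""
    key = (name, user_level)
    row = dict(relation.get(key, {}))
    row.update(changes)
    relation[key] = row

# The public user cannot see the secret Black tuple, so the update
# creates a second Black tuple instead of being refused.
update("public", "Black", address="Orlando")

names_seen_by_secret = Counter(name for name, _ in visible("secret"))
print(visible("public"))
print(names_seen_by_secret["Black"])   # two tuples share the name
```

Accepting the update avoids signalling that a secret tuple exists, at the cost that Name alone no longer identifies a tuple for the secret user.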

28
Inference Problem
  • No general technique is available to solve the
    problem
  • Need assurance of protection
  • Hard to incorporate outside knowledge

29
Next Class
  • Firewalls