Revealing Information while Preserving Privacy


1
Revealing Information while Preserving Privacy
  • Kobbi Nissim
  • NEC Labs, DIMACS

Based on work with Irit Dinur, Cynthia Dwork
and Joe Kilian
2
The Hospital Story
3
Easy Tempting Solution
A Bad Solution
Idea: (a) Remove identifying information (name,
SSN, ...); (b) Publish data
  • Observation: harmless attributes uniquely
    identify many patients (gender, approx. age,
    approx. weight, ethnicity, marital status)
  • Worse: rare attribute (CF ≈ 1/3000)

4
Our Model: Statistical Database (SDB)
5
The Privacy Game: Information-Privacy Tradeoff
  • Private functions
    • want to hide π_i(d_1, ..., d_n) = d_i
  • Information functions
    • want to reveal f_q(d_1, ..., d_n) = Σ_{i∈q} d_i
  • Explicit definition of private functions
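
A minimal sketch of why the two kinds of function pull against each other: if subset sums are answered exactly, two queries that differ in a single index reveal the private bit at that index. The names `subset_sum` and `recover_bit` and the 5-bit database are illustrative assumptions, not part of the talk.

```python
def subset_sum(d, q):
    """Information function f_q(d): the sum of d_i over i in q."""
    return sum(d[i] for i in q)


def recover_bit(d, q, i):
    """Differencing: f_{q ∪ {i}}(d) - f_q(d) equals d_i when i is not in q."""
    assert i not in q
    return subset_sum(d, q | {i}) - subset_sum(d, q)


d = [1, 0, 1, 1, 0]          # hypothetical 5-bit database
q = {0, 2}                   # a query is a set of indices
print(recover_bit(d, q, 3))  # prints 1, which is d[3]
```

This is the reason exact answers cannot be combined with hiding every d_i, and it motivates restricting or perturbing the answers.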

6
Approaches to SDB Privacy [AW 89]
  • Query Restriction
    • Require queries to obey some structure
  • Perturbation (this talk)
    • Give noisy or approximate answers
7
Perturbation
  • Database: d = d_1, ..., d_n
  • Query: q ⊆ [n]
  • Exact answer: a_q = Σ_{i∈q} d_i
  • Perturbed answer: â_q
  • Perturbation E: for all q, |â_q − a_q| ≤ E
  • General perturbation: Pr_q[|â_q − a_q| ≤ E] =
    1 − neg(n)
    (or 99%, 51%)
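
The definitions above can be sketched in a few lines. The noise model here (an integer drawn uniformly from [-E, E]) is an assumption chosen only so that the "for all q" bound holds by construction; the slide does not fix any particular noise distribution, and the function names are hypothetical.

```python
import random


def exact_answer(d, q):
    """a_q: the sum of d_i over i in q."""
    return sum(d[i] for i in q)


def perturbed_answer(d, q, E, rng):
    """â_q = a_q + noise, with integer noise uniform in [-E, E] (assumed model)."""
    return exact_answer(d, q) + rng.randint(-E, E)


def within_perturbation(d, queries, answers, E):
    """Check |â_q - a_q| <= E for every (query, answer) pair."""
    return all(abs(a_hat - exact_answer(d, q)) <= E
               for q, a_hat in zip(queries, answers))


rng = random.Random(0)
n, E = 16, 2
d = [rng.randint(0, 1) for _ in range(n)]
queries = [{i for i in range(n) if rng.random() < 0.5} for _ in range(10)]
answers = [perturbed_answer(d, q, E, rng) for q in queries]
print(within_perturbation(d, queries, answers, E))  # True
```

The "general perturbation" variant would only require the check to pass for most queries rather than all of them.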

8
Perturbation Techniques [AW 89]
  • Data perturbation
    • Swapping [Reiss 84] [Liew, Choi, Liew 85]
    • Fixed perturbations [Traub, Yemini, Wozniakowski
      84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]
      • Additive perturbation d'_i = d_i + E_i
  • Output perturbation
    • Random sample queries [Denning 80]
      • Sample drawn from query set
    • Varying perturbations [Beck 80]
      • Perturbation variance grows with number of
        queries
    • Rounding [Achugbue, Chin 79]; Randomized [Fellegi,
      Phillips 74]
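
A rough sketch contrasting the two families: data perturbation adds noise once to the stored values and answers every query from the noisy copy, while output perturbation computes the exact sum and distorts only the answer. The uniform noise and the "round to the nearest multiple of `step`" rule are simplifying assumptions for illustration, not the specific schemes of the cited papers.

```python
import random


def perturb_data(d, scale, rng):
    """Data perturbation: d'_i = d_i + E_i, with E_i drawn once,
    uniformly from [-scale, scale] (assumed noise model)."""
    return [di + rng.uniform(-scale, scale) for di in d]


def answer_from_perturbed_data(d_prime, q):
    """Queries are answered from the perturbed copy only."""
    return sum(d_prime[i] for i in q)


def rounded_answer(d, q, step):
    """Output perturbation: round the exact sum to a multiple of step
    (ties are resolved by Python's built-in round)."""
    exact = sum(d[i] for i in q)
    return step * round(exact / step)


rng = random.Random(1)
d = [1, 0, 1, 1, 0, 1, 1, 1]          # hypothetical database, exact total 6
q = set(range(len(d)))
d_prime = perturb_data(d, 0.25, rng)  # noise is fixed from here on

print(answer_from_perturbed_data(d_prime, q))  # some value within 2 of 6
print(rounded_answer(d, q, 5))                 # 5
```

With data perturbation, repeating a query returns the same answer, so averaging repeated answers does not remove the noise.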

9
Main Question: How much perturbation is needed to
achieve privacy?
10
Privacy from ≈√n Perturbation
(an example of a useless database)
  • Database: d ∈_R {0,1}^n
  • Can we do better?
    • Smaller E?
    • Usability???
  • Privacy is preserved
    • If E ≈ √n (lg n)^2, whp always use rule 3
    • No information about d is given!
    • No usability!

11
Defining Privacy
(not) Defining Privacy
  • Elusive definition
  • Application dependent
  • Partial vs. exact compromise
  • Prior knowledge, how to model it?
  • Other issues

12
The Useless Database Achieves Best Possible
Perturbation: Perturbation << √n Implies no
Privacy!
  • Main Theorem: Given a DB response algorithm with
    perturbation E << √n, there is a poly-time
    reconstruction algorithm that outputs a database
    d', s.t. dist(d,d') < o(n).

13
The Adversary as a Decoding Algorithm
n bits
(Recall: â_q = Σ_{i∈q} d_i + pert_q)
Decoding Problem: Given access to â_{q_1}, ...,
â_{q_{2^n}}, reconstruct d in time poly(n).

14
Goldreich-Levin Hardcore Bit
n bits
Where â_q = Σ_{i∈q} d_i mod 2 on 51% of the subsets.
The GL Algorithm finds, in time poly(n), a small
list of candidates containing d.
15
Comparing the Tasks
16
Recall Our Goal: Perturbation << √n Implies no
Privacy!
  • Main Theorem: Given a DB response algorithm with
    perturbation E << √n, there is a poly-time
    reconstruction algorithm that outputs a database
    d', s.t. dist(d,d') < o(n).

17
Proof of Main Theorem: The Adversary
Reconstruction Algorithm
  • Query phase: Get â_{q_j} for t random subsets
    q_1, ..., q_t of [n]
  • Weeding phase: Solve the Linear Program
    • 0 ≤ x_i ≤ 1
    • |Σ_{i∈q_j} x_i − â_{q_j}| ≤ E
  • Rounding: Let c_i = round(x_i), output c

Observation: An LP solution always exists, e.g.
x = d.
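
Written out, the weeding phase is a feasibility problem: each absolute-value constraint splits into two linear inequalities, which is what makes it a linear program. The formulation below uses the slide's symbols; stating it as a pure "find" problem with no objective is an assumption, since the slide gives none.

```latex
\begin{aligned}
\text{find } & x \in \mathbb{R}^n \\
\text{s.t. } & 0 \le x_i \le 1, && i = 1, \dots, n, \\
& \sum_{i \in q_j} x_i \le \hat{a}_{q_j} + E, && j = 1, \dots, t, \\
& \sum_{i \in q_j} x_i \ge \hat{a}_{q_j} - E, && j = 1, \dots, t.
\end{aligned}
```

Substituting x = d turns each sum into the exact answer a_{q_j}, and |a_{q_j} − â_{q_j}| ≤ E holds by the definition of perturbation E, so the program is always feasible.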
18
Proof of Main Theorem: Correctness of the Algorithm
  • Consider x = (0.5, ..., 0.5) as a solution for the LP
  • Such a q disqualifies x as a solution for the LP
  • We prove that if dist(x,d) > ε·n, then whp
    there will be a q among q_1, ..., q_t that
    disqualifies x
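
A small sketch of what "disqualifies" means, assuming it is the violation of the LP constraint for that query. The query here is hand-picked (all indices where d is 1) to keep the example deterministic, whereas the proof argues about random queries; the database and the helper name are hypothetical.

```python
def disqualifies(x, q, a_hat, E):
    """True if candidate x violates |sum_{i in q} x_i - a_hat| <= E."""
    return abs(sum(x[i] for i in q) - a_hat) > E


d = [1] * 12 + [0] * 12        # hypothetical 24-bit database
x = [0.5] * len(d)             # the all-halves candidate from the slide
E = 2

q = {i for i, bit in enumerate(d) if bit == 1}   # hits only the 1-entries
a_hat = sum(d[i] for i in q) - E   # the in-bounds answer most favourable to x

print(disqualifies(x, q, a_hat, E))  # True:  |6 - 10| = 4 > 2
print(disqualifies(d, q, a_hat, E))  # False: |12 - 10| = 2 <= 2
```

Even when the response algorithm shifts its answer as far toward the candidate as the bound E allows, x is ruled out, while d itself stays feasible.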

19
Extensions of the Main Theorem
  • Imperfect perturbation
    • Can approximate the original bit string even if
      database answer is within perturbation only for
      99% of the queries
  • Other information functions
    • Given access to noisy majority of subsets, we
      can approximate the original bit string.

20
Notes on Impossibility Results
  • Exponential Adversary
    • Strong breaking of privacy if E << n
  • Polynomial Adversary
    • Non-adaptive queries
    • Oblivious of perturbation method and database
      distribution
    • Tight threshold E ≈ √n
  • What if adversary is more restricted?
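
The exponential adversary can be sketched by brute force for a tiny n: ask every subset query, then output any candidate consistent with all answers. This is an illustrative sketch, not the talk's stated procedure; the response function `noisy` and the 8-bit database are assumptions. A short counting argument shows that any consistent candidate differs from d in at most 4E positions, since its sum on each of the two disagreement sets can be off by at most 2E.

```python
from itertools import product


def all_subsets(n):
    """Every subset of {0, ..., n-1}, as a tuple of indices."""
    for mask in range(1 << n):
        yield tuple(i for i in range(n) if (mask >> i) & 1)


def exponential_adversary(n, answer, E):
    """Ask all 2^n queries, then return the first c in {0,1}^n whose
    subset sums are all within E of the reported answers."""
    reported = {q: answer(q) for q in all_subsets(n)}
    for c in product((0, 1), repeat=n):
        if all(abs(sum(c[i] for i in q) - a_hat) <= E
               for q, a_hat in reported.items()):
            return list(c)
    return None  # unreachable when every answer is within E of the truth


d = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical secret database
E = 1


def noisy(q):
    """Assumed response algorithm: deterministic noise in {-1, 0, +1}."""
    return sum(d[i] for i in q) + (len(q) % 3 - 1)


c = exponential_adversary(len(d), noisy, E)
dist = sum(ci != di for ci, di in zip(c, d))
print(dist <= 4 * E)  # True
```

The running time is exponential in n, which is why the polynomial adversary on the same slide needs a different tool.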

21
Bounded Adversary Model
  • Database: d ∈_R {0,1}^n
  • Theorem: If the number of queries is bounded by
    T, then there is a DB response algorithm with
    perturbation of √T that maintains privacy.
    • With a reasonable definition of privacy

22
Summary and Open Questions
  • Very high perturbation is needed for privacy
    • Threshold phenomenon: above √n total privacy,
      below √n none (poly-time adversary)
    • Rules out many currently proposed solutions for
      SDB privacy
    • Q: what's on the threshold? Usability?
  • Main tool: a reconstruction algorithm
    • Reconstructing an n-bit string from perturbed
      partial sums/thresholds
  • Privacy for a T-bounded adversary with a random
    database
    • √T perturbation
    • Q: other database distributions?
  • Q: Crypto and SDB privacy?

23
Our Privacy Definition (bounded adversary model)
[Figure labels: i; (transcript, i); d_i; fails w.p. > ½ − ε]
24
The Adversary as a Decoding Algorithm
partial sums
perturbed sums