Title: Private Inference Control
1Private Inference Control
- David Woodruff
- MIT
- dpwood_at_mit.edu
- Joint work with Jessica Staddon (PARC)
2Contents
- Background
- Access Control and Inference Control
- Our contribution: Private Inference Control (PIC)
- Related Work
- PIC model definitions
- Our Results
- Conclusions
3Access Control
- User queries a database. Some info in the DB is sensitive.
User: "What's Bob's salary?"
Server (DB of n records): "Sensitive. Access denied."
- Access control prevents the user from learning individual sensitive relations/attributes.
- Does access control prevent the user from learning sensitive info?
4Inference Control
Name              Job                Salary
Alyssa P. Hacker  Software Engineer  90,000
Paul E. Nomial    Mathematician      31,415
Query 1: How much does Alyssa make? -> Sensitive.
Query 2: What is Alyssa's job? -> Software Engineer
Query 3: How much do software engineers make? -> 90,000
- Combining non-sensitive info may yield something sensitive
- Inference channel: (name, job), (job, salary)
- Inference control: block all inference channels
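The channel on this slide can be traced in a few lines. This is a minimal sketch, assuming the table is held as a list of dicts; the helper names `job_of`, `salary_of_job`, and `inferred_salary` are hypothetical and only illustrate how two permitted answers combine into a blocked one.

```python
# Toy table from the slide; the dict layout is an assumption for illustration.
records = [
    {"name": "Alyssa P. Hacker", "job": "Software Engineer", "salary": 90000},
    {"name": "Paul E. Nomial", "job": "Mathematician", "salary": 31415},
]

def job_of(name):
    """Non-sensitive query on the pair (name, job)."""
    return next(r["job"] for r in records if r["name"] == name)

def salary_of_job(job):
    """Non-sensitive query on the pair (job, salary)."""
    return next(r["salary"] for r in records if r["job"] == job)

def inferred_salary(name):
    """Chain the two permitted queries to recover the sensitive (name, salary)."""
    return salary_of_job(job_of(name))

leaked = inferred_salary("Alyssa P. Hacker")
```

Neither helper touches the (name, salary) pair directly, yet `leaked` holds Alyssa's salary. Inference control has to block the second query of such a chain.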
5Inference Control
- Database x ∈ ({0,1}^m)^n
  - DB of n records, m attributes 1, ..., m per record
  - n tending to infinity, m = O(1)
- Inference engine generates collection C of subsets of [m] denoting all the inference channels
  - We assume we have an engine [QSKLG93] (exhaustive search)
  - F ∈ C means for all i, user shouldn't learn x_{i,j} for all j ∈ F
  - Assume C is monotone.
- Assume C is input to both user and server
  - User learns C anyway when his queries are blocked
  - C is data-independent, reveals info only about attributes
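A small sketch of the collection C. It reads "monotone" as upward-closed (every superset of a channel is also a channel), which is an assumption about the slide's meaning. The function names are hypothetical. Because m = O(1), enumerating all subsets is affordable.

```python
from itertools import combinations

def monotone_closure(channels, m):
    """All subsets of {1..m} that contain some channel (exhaustive over 2^m sets)."""
    closed = set()
    for size in range(m + 1):
        for subset in combinations(range(1, m + 1), size):
            s = frozenset(subset)
            if any(f <= s for f in channels):
                closed.add(s)
    return closed

def completes_channel(learned, channels):
    """True if the attributes learned for one record cover some channel F in C."""
    return any(f <= learned for f in channels)

# Example with m = 3 attributes and two minimal channels.
C = {frozenset({1, 2}), frozenset({2, 3})}
closure = monotone_closure(C, m=3)
```

With a monotone C it is enough to test the minimal channels, which is what `completes_channel` does.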
6Our contribution Private Inference Control
- Existing inference control schemes require the server to learn user queries to check if they form an inference
- This talk: arbitrary malicious users U, semi-honest server S
- Our goal: user privacy + inference control = Private Inference Control (PIC)
  - Privacy: polytime S learns nothing about honest user's queries, except the number made so far
  - The number of queries made so far enables S to do inference control
- Private and symmetrically-private information retrieval
  - Not sufficient since they are stateless
  - User's permissions change over time
- Generic secure function evaluation
  - Not efficient: our communication is exponentially smaller
7Application
- Government analysts inspect repositories for terrorist patterns
- Inference control: prevent analysts from learning sensitive info about non-terrorists.
- User privacy: prevent server from learning what analysts are tracking; if discovered, this info could go to terrorists!
8Related Work
- Data perturbation [AS00, B80, TYW84]
  - So much noise is required that the data is not as useful [DN03]
- Adaptive Oblivious Transfer [NP99]
  - One record can be queried adaptively at most k times
- Priced Oblivious Transfer [AIR01]
  - One record, supports more inference channels than threshold version considered in [NP99]
- We generalize [NP99] and [AIR01]
  - Arbitrary inference channels and multiple records
  - More efficient/private than parallelizing [NP99] and [AIR01] on each record
9The Model
- Offline stage: S given x, C, 1^k, and can preprocess x
- Online stage: at time t, honest U generates query (i_t, j_t)
  - (i_t, j_t) can depend on all prior info/transactions with S
- Let T denote all queries U makes: (i_1, j_1), ..., (i_{|T|}, j_{|T|})
  - T is a r.v.: depends on U's code, x, and randomness
- T is permissible if there is no i s.t. (i, j) ∈ T for all j ∈ F, for some F ∈ C. We require honest U to generate permissible T.
- U and S interact in a multiround protocol, then U outputs out_t
- View_U consists of C, n, m, 1^k, all messages from S, randomness
- View_S consists of C, n, m, 1^k, x, all messages from U, randomness
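The permissibility condition on T can be stated as a short check. This is a sketch under the assumption that queries are (i, j) tuples and channels are sets of attribute indices; `is_permissible` is a hypothetical name.

```python
from collections import defaultdict

def is_permissible(T, C):
    """T: iterable of (i, j) queries; C: collection of channels (sets of attributes).

    T is permissible if no record i has all attributes of some channel F queried.
    """
    per_record = defaultdict(set)
    for i, j in T:
        per_record[i].add(j)
    return not any(F <= attrs for attrs in per_record.values() for F in C)

# One channel {1, 2}: attributes 1 and 2 of the same record must not both be learned.
C = [frozenset({1, 2})]
ok = is_permissible([(1, 1), (2, 2)], C)    # different records
bad = is_permissible([(1, 1), (1, 2)], C)   # same record, completes the channel
```

Only the set of attributes queried per record matters, not the order of the queries.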
10Security Definitions
- Correctness: For all x, C, for all honest users U, for all queries (i_t, j_t) ∈ T(U, x), out_t = x_{i_t, j_t}
- User Privacy: For all x, C, for all honest U, for any two sequences T1, T2 with |T1| = |T2|, for all semi-honest servers S and random coin tosses of S:
  (View_S | T(U, x) = T1) ≈ (View_S | T(U, x) = T2)
- Inference Control: Comparison with ideal model: for every U*, every x, any random coins of U*, for every C there exists a simulator U' interacting with trusted party Ch for which View_{U*} ≈ View_{<U', Ch>}, where U' just asks Ch for tuples (i_t, j_t) that are permissible
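The ideal-model trusted party Ch can be pictured as an object that holds x and C and releases only entries that keep the query history permissible. The class below is a hypothetical sketch, not part of the talk. Returning `None` for a blocked query is an assumption added for illustration, since in the definition the simulator only ever asks for permissible tuples.

```python
class TrustedParty:
    """Hypothetical ideal-model party Ch: holds x and C, answers permissible queries."""

    def __init__(self, x, C):
        self.x = x                            # x[i][j]: attribute j of record i
        self.C = [frozenset(F) for F in C]    # inference channels
        self.seen = {}                        # record index -> attributes released so far

    def query(self, i, j):
        attrs = self.seen.get(i, set()) | {j}
        if any(F <= attrs for F in self.C):
            return None                       # blocked: would complete a channel
        self.seen[i] = attrs
        return self.x[i][j]

# Two records with two attributes each; the single channel is {1, 2}.
x = {1: {1: 0, 2: 1}, 2: {1: 1, 2: 0}}
Ch = TrustedParty(x, [{1, 2}])
answers = [Ch.query(1, 1), Ch.query(1, 2), Ch.query(2, 2)]
```

The second query is refused because it would give the user both attributes of record 1. The third is answered because it concerns a different record.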
11Efficiency
- Efficiency measures are per query
- Minimize communication & round complexity
  - Ideally O(polylog(n)) bits and 1 round
- Minimize server's time complexity
  - Ideally O(n) without preprocessing
  - W/ preprocessing, potentially better, but O(n) optimal w.r.t. known single-server PIR schemes
12Our Results
- For any PIR scheme, let C(n), W(n) denote communication and server work for DB size n
- PIC scheme 1
  - Communication: O(k log n C(n^2)), 1-round
  - Work: O(k log n W(n^2))
- PIC scheme 2
  - Communication: O(k(n C(n))), O(1)-round
  - Work: O(k(n W(n)))
- Plugging in best PIR parameters,
  - Scheme 1: comm. O(polylog(n)), work O(n^2)
  - Scheme 2: comm. work O(n polylog(n))
13A Generic Reduction
- A protocol is a threshold PIC (TPIC) if it satisfies the definitions of a PIC scheme assuming C = {[m]}.
- Theorem (roughly speaking): If there exists a TPIC with communication C(n), work W(n), and round complexity R(n), then there exists a PIC with communication O(C(n)), work O(W(n)), and round complexity O(R(n)).
14PIC ideas
- User/server do SPIR on table of encryptions
- Idea: Encryptions of both data and keys that will help user decrypt encryptions on future queries
- User can only decrypt if he has the appropriate keys: only possible if not in danger of making an inference
15Stateless PIC
- Minimizing communication is a data structures problem
- What type of keys require least communication for user to
  - Update as user makes new queries?
  - Prove user not in danger of making an inference on current/future queries?
- Keys must prevent replay attacks: can't use old keys to pretend to have made fewer queries to records than actually made
16PIC Scheme 1 Stage 1
- Let E be a homomorphic semantically secure encryption scheme (e.g., Paillier)
- Suppose we allow accessing each record at most once
User (holds PK, SK; query (i3, j3)) -> Server (holds PK): E(i3), E(j3), ZKPOK
Server maps E(i1) -> E(r1(i1 - i3)) and E(i2) -> E(r2(i2 - i3))
User recovers r1, r2 iff hasn't previously accessed i3
- From r1 and r2 user can reconstruct a secret S3
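The blinding step of Stage 1 can be tried with a toy Paillier implementation. This is a sketch under strong assumptions: tiny fixed primes (insecure), no zero-knowledge proof, and hypothetical function names. It shows only that the server can turn E(i_prev) and E(i_cur) into E(r(i_prev - i_cur)) without decrypting, and that the user gets r back exactly when the two record indices differ.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys from tiny fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (n + 1)^m = 1 + m*n (mod n^2)
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def server_blind(n, enc_prev, enc_cur, r):
    """Homomorphically map E(i_prev), E(i_cur) to E(r * (i_prev - i_cur))."""
    n2 = n * n
    diff = (enc_prev * pow(enc_cur, -1, n2)) % n2   # E(i_prev - i_cur)
    return pow(diff, r, n2)                         # E(r * (i_prev - i_cur))

def user_recover(n, sk, c, i_prev, i_cur):
    """Return r if the indices differ, else None (the plaintext is 0 for every r)."""
    d = decrypt(n, sk, c)
    delta = (i_prev - i_cur) % n
    if delta == 0:
        return None
    return (d * pow(delta, -1, n)) % n

n, sk = keygen()
r1 = 123456
fresh = user_recover(n, sk, server_blind(n, encrypt(n, 5), encrypt(n, 9), r1), 5, 9)
repeat = user_recover(n, sk, server_blind(n, encrypt(n, 9), encrypt(n, 9), r1), 9, 9)
```

When the current index repeats an earlier one, the decrypted value is 0 whatever r was, so nothing about r leaks. `pow(x, -1, n)` needs Python 3.8 or later.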
17PIC Scheme 1 Stage 2
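User (holds PK, SK; query (i3, j3)) -> Server (holds PK): E(i3), E(j3), ZKPOK
Server's table of encryptions:
E(r_{1,1}(j - j3) + r'_{1,1}(i - i3) + S3 + x_{1,1})
E(r_{1,2}(j - j3) + r'_{1,2}(i - i3) + S3 + x_{1,2})
E(r_{2,1}(j - j3) + r'_{2,1}(i - i3) + S3 + x_{2,1})
User recovers S3
User does SPIR on the table of encryptions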
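The masking of each table cell can be illustrated on plaintexts alone. The sketch assumes that the terms inside each encryption are added, that the two blinding coefficients are independent random values, and that arithmetic is modulo a prime P standing in for the plaintext space. All names are hypothetical and encryption and SPIR are left out.

```python
import random

P = 2**61 - 1   # prime modulus standing in for the plaintext space (assumption)

def mask_entry(x_ij, i, j, i_q, j_q, secret):
    """Plaintext of cell (i, j): r*(j - j_q) + r'*(i - i_q) + secret + x_ij (mod P)."""
    r = random.randrange(1, P)
    r_prime = random.randrange(1, P)
    return (r * (j - j_q) + r_prime * (i - i_q) + secret + x_ij) % P

def unmask(cell, secret):
    """What a user holding the secret obtains from a decrypted cell."""
    return (cell - secret) % P

S3 = 777
queried = unmask(mask_entry(1, i=3, j=2, i_q=3, j_q=2, secret=S3), S3)
other = unmask(mask_entry(1, i=4, j=2, i_q=3, j_q=2, secret=S3), S3)
```

For the queried cell both blinding terms vanish, so a user who knows S3 gets the data bit. In any other cell at least one random term survives. Without S3 even the queried cell stays masked.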
18PIC Scheme 1 - Wrapup
- To extend to querying a record < m times, on the t-th query, let r_1, ..., r_{t-1} be a (t-m+1)-out-of-(t-1) secret sharing of S_t
- This scheme can be proven to be a TPIC; use generic reduction to get a PIC
- User Privacy: semantic security of E, ZK of proof, privacy of SPIR
- Inference Control: user can recover at most t-m of the r_i if already queried record m-1 times; can build a simulator using SPIR w/ knowledge extractor [NP99]
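The threshold structure can be made concrete with Shamir secret sharing. The slide does not say which sharing scheme is used, so Shamir over a prime field is an assumption here, as are the parameter values t = 6 and m = 3.

```python
import random

P = 2**61 - 1   # prime field size (assumption)

def share(secret, threshold, count):
    """Shamir sharing: `count` shares, any `threshold` of which determine `secret`."""
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, count + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the sharing polynomial at 0."""
    total = 0
    for xa, ya in shares:
        num, den = 1, 1
        for xb, _ in shares:
            if xb != xa:
                num = (num * -xb) % P
                den = (den * (xa - xb)) % P
        total = (total + ya * num * pow(den, -1, P)) % P
    return total

# t-th query with t = 6 and m = 3: a (t-m+1)-out-of-(t-1) = 4-out-of-5 sharing of S_t.
t, m = 6, 3
S_t = 424242
shares = share(S_t, threshold=t - m + 1, count=t - 1)
recovered = reconstruct(shares[:t - m + 1])
```

A user who has already queried the record m-1 = 2 times can obtain at most t-m = 3 of the five shares. That is one short of the threshold, so S_t stays hidden.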
19PIC Scheme 2 - Glimpse
- polylog(n)-communication PIC
- Balanced binary tree B
  - Leaves are attributes
  - Parents of leaves are records
  - Internal node n accessed when record r queried and n on path from r to root
- Keys encode the number of times nodes in B have been accessed.
[Figure: tree B with node keys K_{u,a}, K_{v,b}, K_{w,c}, K_{x,d}, K_{y,e}, K_{z,f}; a + b = t]
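The access counts that the keys encode can be sketched with a heap-indexed tree. This models only the record level and the nodes above it, leaving out the attribute leaves and the keys themselves. The class name and layout are assumptions.

```python
class AccessTree:
    """Heap-indexed balanced binary tree; counts[v] = times node v was accessed."""

    def __init__(self, num_records):
        size = 1
        while size < num_records:
            size *= 2
        self.size = size                  # record level padded to a power of two
        self.counts = [0] * (2 * size)    # node 1 is the root; records are size..2*size-1

    def query_record(self, r):
        """Increment every node on the path from record r up to the root."""
        v = self.size + r
        while v >= 1:
            self.counts[v] += 1
            v //= 2

tree = AccessTree(4)
for r in [0, 1, 1, 3]:
    tree.query_record(r)
root, left, right = tree.counts[1], tree.counts[2], tree.counts[3]
```

In this sketch each internal node's count equals the sum of its children's counts, and the root's count equals the total number of queries.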
20Conclusions
- Extensions not in this talk
  - Multiple users (pseudonyms)
  - Collusion resistance: c-resistance => m-channel becomes collection of (m-1)/c channels.
- Summary
  - New primitive: PIC
  - (Almost) communication-optimal implementations