Title: P4P: A Practical Framework for Privacy-Preserving Distributed Computation
1P4P: A Practical Framework for Privacy-Preserving
Distributed Computation
- Yitao Duan (Advisor Prof. John Canny)
- http://www.cs.berkeley.edu/duan
- Berkeley Institute of Design
- Computer Science Division
- University of California, Berkeley
- 11/27/2007
2Research Goal
- To provide practical solutions with provable
privacy and adequate efficiency in a realistic
adversary model at reasonably large scale
4Model
[Figure: model diagram with function f]
5A Practical Solution
- Provable privacy: cryptography
- Efficiency: minimize the number of expensive
primitives and rely on probabilistic guarantees
- Realistic adversary model: must handle malicious
users who may try to bias the computation by
inputting invalid data
6Basic Approach
[Figure: users hold data d_i in D; for each function
g_j, j = 1, 2, ..., m, the values g_j(d_i) from all
n users are summed (Σ) to compute f]
7The Power of Addition
- A large number of popular algorithms can be run
with addition-only steps
- Linear algorithms: voting and summation;
nonlinear algorithms: regression, classification,
SVD, PCA, k-means, ID3, EM, etc.
- All algorithms in the statistical query model
[Kearns 93]
- Many other gradient-based numerical algorithms
- The addition-only framework has a very efficient
private implementation in cryptography and admits
efficient ZKPs
8Peers for Privacy: The Nomenclature
- Privacy is a right that one must fight for. Some
agents must act on behalf of users' privacy in
the computation. We call them privacy peers.
- Our method aggregates across many users' data. We
can prove that the aggregation provides privacy:
the data from the peers protect each other.
9Private Addition: P4P Style
- The computation: secret sharing over a small field
- Malicious users: efficient zero-knowledge proof
to bound the L2-norm of the user vector
10Big Integers vs. Small Ones
- Most applications work with regular-sized
integers (e.g. 32- or 64-bit). Arithmetic
operations are very fast when each operand fits
into a single memory cell (about 10^-9 sec)
- Public-key operations (e.g. used in encryption
and verification) must use keys with sufficient
length (e.g. 1024-bit) for security. Existing
private computation solutions must work with
large integers extensively (about 10^-3 sec)
- A difference of 6 orders of magnitude!
11Private Arithmetic: Two Paradigms
- Homomorphism: User data is encrypted with a
public-key cryptosystem. Arithmetic on this data
mirrors arithmetic on the original data, but the
server cannot decrypt partial results.
- Secret sharing: Users send shares of their data
to several servers, so that no small group of
servers gains any information about it.
12Arithmetic: Homomorphism vs. VSS
- Homomorphism
- (+) Can tolerate t < n corrupted players as far as
privacy is concerned
- (-) Uses public-key crypto, works with large fields
(e.g. 1024-bit), 10,000x more expensive than
normal arithmetic (even for addition)
- Secret sharing
- (+) Addition is essentially free. Can use any size
field
- (-) Can't do two-party multiplication
- (-) Most schemes also use public-key crypto for
verification
- (-) Doesn't fit well into existing service
architecture
13P4P: Peers for Privacy
- Some parties, called Privacy Peers, actively
participate in the computation, working for
users' privacy
- Privacy peers provide privacy when they are
available, but can't access data themselves
14P4P
- The server provides data archival and
synchronizes the protocol
- The server only communicates with privacy peers
occasionally (e.g. at 2 AM)
15Privacy Peers
- Roles of privacy peers
- Anonymizing communication
- Sharing information
- Participating in computation
- Others: infrastructure support
- They work on behalf of users' privacy
- But we need a higher level of trust in privacy
peers
16Candidates for Privacy Peers
- Some players are more trustworthy than others
- In a workplace, a union representative
- In a community, a few members with good
reputation
- Or a third-party commercial provider
- A very important source of security and
efficiency
- The key is that privacy peers should have
different incentives from the server, with mutual
distrust between them
17Security from Heterogeneity
- Server is secure against outside attacks and
won't actively cheat
- Companies spend to protect their servers
- The server often holds much more valuable info
than what the protocol reveals
- Server benefits from accurate computation
- Privacy peers won't collude with the server
- Conflicts of interest, mutual distrust, laws
- Server can't trust that clients can keep a
conspiracy secret
- Users can actively cheat
- Rely on the server for protection against outside
attacks, and on privacy peers for defending against
a curious server
18Private Addition
d_i: user i's private vector. u_i, v_i and d_i are
all in a small integer field
u_i + v_i = d_i
19Private Addition
μ = Σ u_i
ν = Σ v_i
u_i + v_i = d_i
20Private Addition
μ = Σ u_i
ν = Σ v_i
u_i + v_i = d_i
21Private Addition
μ + ν = Σ d_i
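The private-addition slides can be illustrated with a minimal sketch of two-way additive secret sharing. It assumes each user splits a private vector into two random shares that sum to it modulo a small prime, with one share going to the server and the other to a privacy peer. The modulus `PHI` and the function names `split`, `vector_sum` and `private_sum` are hypothetical choices for illustration, not taken from the P4P implementation.

```python
import secrets

# Assumed small prime modulus (hypothetical choice; any small field would do).
PHI = 2**31 - 1


def split(d):
    """Split integer vector d into two additive shares u, v with u + v = d (mod PHI)."""
    u = [secrets.randbelow(PHI) for _ in d]
    v = [(x - r) % PHI for x, r in zip(d, u)]
    return u, v


def vector_sum(vectors):
    """Component-wise sum of equal-length vectors, modulo PHI."""
    return [sum(column) % PHI for column in zip(*vectors)]


def private_sum(user_vectors):
    """Simulate both parties: each one only ever sees its own shares."""
    shares = [split(d) for d in user_vectors]
    mu = vector_sum([u for u, _ in shares])  # sum held by one party (e.g. the server)
    nu = vector_sum([v for _, v in shares])  # sum held by the other (e.g. a privacy peer)
    return [(a + b) % PHI for a, b in zip(mu, nu)]


if __name__ == "__main__":
    print(private_sum([[1, 0, 1], [0, 1, 1], [1, 1, 1]]))
```

Each share on its own is uniformly random, so neither party learns anything about an individual vector; only the combined total is revealed. All arithmetic stays in machine-sized integers, which is the efficiency point the slides make.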
22P4P's Private Addition
- Provable privacy
- Computation on both the server and the privacy
peer is over a small field: same cost as a
non-private implementation
- Fits existing server-based schemes
- Server is always online. Users and privacy peers
can be on and off.
- Only two parties perform the computation;
users just submit their data (and provide a ZK
proof, see later)
- Extra communication for the server is only with
the privacy peer, independent of n
23The Need for Verification
- This scheme has a glaring weakness. Users can use
any number in the small field as their data.
- Think of a voting scheme: "Please place your vote,
0 or 1, in the envelope"
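A toy illustration of the weakness, under the assumption that votes are simply added modulo a small prime `PHI` (a hypothetical value) with no validity check: a single out-of-range input shifts the tally arbitrarily, and because inputs arrive as random-looking shares, nobody can spot it by inspection.

```python
# Assumed small prime modulus (hypothetical choice for illustration).
PHI = 2**31 - 1


def tally(votes):
    """Add up submitted values modulo PHI without validating them."""
    return sum(votes) % PHI


honest_votes = [1, 0, 1, 1, 0]

# One cheating user submits 1000 instead of 0 or 1.
stuffed = honest_votes + [1000]

# Another cheater submits PHI - 2, which behaves like a vote of -2.
suppressed = honest_votes + [PHI - 2]

if __name__ == "__main__":
    print(tally(honest_votes), tally(stuffed), tally(suppressed))
```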
24Zero Knowledge Proofs
- I can prove that I know X without disclosing what
X is.
- I can prove that a given encrypted number is a 0.
Or I can prove that an encrypted number is a 1.
- I can prove that an encrypted number is a ZERO OR
ONE, i.e. a bit. (6 extra numbers needed)
- I can prove that an encrypted number is a k-bit
integer. I need 6k extra numbers to do this (!!!)
25An Efficient ZKP of Boundedness
- Luckily, we don't need to prove that every number
in a user's vector is small, only that the vector
as a whole is small.
- The server asks for some random projections of
the user's vector, and expects the user to prove
that the sum of their squares is small.
- O(log m) public-key crypto operations (instead
of O(m)) to prove that the L2-norm of an m-dim
vector is smaller than L.
- Running time reduced from hours to seconds.
26Bounding the L2-Norm
- A natural and effective way to restrict a
cheating user's malicious influence
- You must have a big vector to produce a large
influence on the sum
- Perturbation theory bounds system change with
norms
- |σ_i(A) - σ_i(B)| ≤ ||A - B||_2 [Weyl]
- Can be the basis for other checks
- Setting L = 1 forces each user to have only 1
vote
27Random Projection-based L2-Norm ZKP
- Server generates N random m-vectors in
{-1, 0, 1}^m
- User projects his data onto the N directions and
provides a ZKP that the square sum of the
projections < N L^2 / 2
- Expensive public-key operations are only on the
projections and the square sum
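The arithmetic behind the projection test can be sketched in the clear, leaving out the zero-knowledge layer entirely. This sketch assumes that each entry of a random direction is -1, 0 or +1 with probabilities 1/4, 1/2 and 1/4; the slides do not state the distribution, so treat that as an assumption. The function names are hypothetical.

```python
import random


def random_directions(n_dirs, m, rng):
    """Draw n_dirs vectors in {-1, 0, 1}^m (assumed probabilities 1/4, 1/2, 1/4)."""
    return [[rng.choice((-1, 0, 0, 1)) for _ in range(m)] for _ in range(n_dirs)]


def passes_norm_check(d, directions, bound):
    """Accept d if the square sum of its projections is below N * L^2 / 2."""
    n_dirs = len(directions)
    square_sum = sum(
        sum(c * x for c, x in zip(direction, d)) ** 2 for direction in directions
    )
    return square_sum < n_dirs * bound**2 / 2


if __name__ == "__main__":
    rng = random.Random(0)
    dirs = random_directions(50, 8, rng)
    print(passes_norm_check([1] * 8, dirs, bound=100))      # small vector
    print(passes_norm_check([10_000] * 8, dirs, bound=100))  # very large vector
```

Under the assumed distribution, each squared projection has expectation |d|^2 / 2, so the square sum over N directions concentrates around N |d|^2 / 2. That is why the threshold N L^2 / 2 separates vectors well below the bound L from vectors well above it. In the actual protocol only the N projections and their square sum need expensive public-key operations.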
28Effectiveness
29Acceptance/rejection probabilities
(a) Linear and (b) log plots of probability of
user input acceptance as a function of |d|/L for
N = 50. (b) also includes probability of
rejection. In each case, the steepest (jagged)
curve is the single-value vector (case 3), the
middle curve is the Zipf vector (case 2) and the
shallow curve is the uniform vector (case 1).
30Performance Evaluation
(a) Verifier and (b) prover times in seconds for
the validation protocol where (from top to
bottom) L (the required bound) has 40, 20, or 10
bits. The x-axis is the vector length.
31SVD
- Singular value decomposition is an extremely
useful tool for a lot of IR and data mining tasks
(CF, clustering, ...)
- SVD for a matrix A is a factorization A = U D V^T.
- If A encodes users x items, then V^T gives us the
best least-squares approximations to the rows of
A in a user-independent way.
- A^T A V = V D^2, so SVD is an eigenproblem
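To see why the eigenproblem form suits an addition-only framework, here is a minimal sketch using plain power iteration on A^T A, a stand-in technique that is not necessarily the solver used in P4P. With one row a_i of A per user, the product A^T A v is the sum over users of a_i (a_i · v), so each iteration needs only a vector sum across users. The sketch works in floating point and in the clear; the function name and iteration count are hypothetical.

```python
import math


def top_singular_pair(rows, iterations=100):
    """Power iteration on A^T A, where each row of A belongs to one user."""
    m = len(rows[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        # Each user contributes a_i * (a_i . v); the contributions are summed.
        w = [0.0] * m
        for a in rows:
            proj = sum(x * y for x, y in zip(a, v))
            for k in range(m):
                w[k] += proj * a[k]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The top singular value is the length of A v.
    sigma = math.sqrt(sum(sum(x * y for x, y in zip(a, v)) ** 2 for a in rows))
    return sigma, v


if __name__ == "__main__":
    sigma, v = top_singular_pair([[3.0, 0.0], [0.0, 1.0]])
    print(round(sigma, 6), [round(x, 6) for x in v])
```

In a private setting, the inner sum over users is the step that would be replaced by a private aggregation, with values encoded as integers in the small field.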
32SVD P4P Style
33Experiments: SVD Datasets
34Results
N = number of iterations. k = number of singular
values. ε = relative residual error
35Distributed Association Rule Mining
- n users, m items. User i has dataset D_i
- Horizontally partitioned: each D_i contains the
same attributes
[Figure: D_1 = (1 0 0 0 1 0 0), ...,
D_n = (0 0 1 0 0 1 0)]
36The Market-Basket Model
- A large set of items, e.g., things sold in a
supermarket.
- A large set of baskets, each of which is a small
set of the items, e.g., the things one customer
buys on one day.
37Support
- Simplest question: find sets of items that appear
frequently in the baskets.
- Support for itemset I: the number of baskets
containing all items in I.
- Given a support threshold s, sets of items that
appear in at least s baskets are called frequent
itemsets.
38Example
- Items = {milk, coke, pepsi, beer, juice}.
- Support threshold = 3 baskets.
- B1 = {m, c, b}   B2 = {m, p, j}
- B3 = {m, b}   B4 = {c, j}
- B5 = {m, p, b}   B6 = {m, c, b, j}
- B7 = {c, b, j}   B8 = {b, c}
- Frequent itemsets: {m}, {c}, {b}, {j}, {m, b},
{c, b}, {j, c}.
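The example can be checked with a short brute-force enumeration (not the Apriori algorithm itself). It treats an itemset as frequent when its support is at least the threshold, which is what reproduces the list on the slide. The function names are hypothetical.

```python
from itertools import combinations

# The eight baskets from the example (m=milk, c=coke, p=pepsi, b=beer, j=juice).
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def frequent_itemsets(baskets, threshold):
    """Enumerate itemsets by size, stopping once a size yields no frequent set."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = support(set(combo), baskets)
            if count >= threshold:
                result[frozenset(combo)] = count
                found = True
        if not found:
            break
    return result


if __name__ == "__main__":
    for itemset, count in frequent_itemsets(BASKETS, 3).items():
        print(sorted(itemset), count)
```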
39Association Rules
- If-then rules about the contents of baskets.
- {i1, i2, ..., ik} → j means: if a basket contains
all of i1, ..., ik then it is likely to contain j.
- Confidence of this association rule is the
probability of j given i1, ..., ik.
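Confidence can be estimated from supports as support(antecedent ∪ {j}) / support(antecedent). The sketch below applies this to the baskets from the Example slide; the function names are hypothetical.

```python
# Baskets from the Example slide (m=milk, c=coke, p=pepsi, b=beer, j=juice).
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(antecedent, consequent, baskets):
    """Empirical probability of consequent among baskets containing antecedent."""
    base = support(set(antecedent), baskets)
    if base == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return support(set(antecedent) | {consequent}, baskets) / base


if __name__ == "__main__":
    print(confidence({"m"}, "b", BASKETS))       # rule {m} -> b
    print(confidence({"c", "b"}, "j", BASKETS))  # rule {c, b} -> j
```

Both quantities are plain counts over baskets, so a rule's confidence follows directly once the supports are known.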
40Step k of apriori-gen in P4P
- User i constructs an m_k-dimensional vector in
a small field (m_k = number of candidate itemsets
at step k)
- Use P4P to compute the aggregate (with
verification)
- The result encodes the supports of all candidate
itemsets
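A minimal sketch of this step, assuming each user holds a single basket and that entry j of the user's vector is 1 exactly when the basket contains candidate itemset c_j. A plain sum stands in here for P4P's private, verified aggregation. The candidate list and function names are hypothetical.

```python
def candidate_vector(basket, candidates):
    """Entry j is 1 if the user's basket contains candidate itemset j, else 0."""
    return [1 if candidate <= basket else 0 for candidate in candidates]


def aggregate(vectors):
    """Component-wise sum; in P4P this sum would be computed privately."""
    return [sum(column) for column in zip(*vectors)]


# One basket per user (assumption), reusing the market-basket example.
USER_BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

# Hypothetical candidate 2-itemsets at step k = 2.
CANDIDATES = [
    frozenset({"m", "b"}), frozenset({"c", "b"}),
    frozenset({"c", "j"}), frozenset({"m", "c"}),
]

supports = aggregate([candidate_vector(b, CANDIDATES) for b in USER_BASKETS])

if __name__ == "__main__":
    print(supports)
```

With 0/1 entries, a user's vector has L2-norm at most the square root of the number of candidates, which is the kind of bound the norm verification can enforce.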
41Step k of apriori-gen in P4P
[Figure: each user i maps dataset D_i (e.g.
D_1 = 1 0 0 0 1 0 0, D_n = 0 0 1 0 0 1 0) to an
entry d_ij; P4P aggregates the d_ij into the
support for c_j, the jth candidate itemset]
42Analysis
- Privacy: guaranteed by P4P
- Near-optimal efficiency: cost comparable to that
of a direct implementation of the algorithms
- Main aggregation in small field
- Only a small number of large-field operations
- Deal with cheating users with P4P's built-in ZK
user data verification
43Privacy
- SVD: The intermediate sums are implied by the
final results
- A^T A = V D^2 V^T
- ARM: Sums treated as public by the applications
- Guaranteed privacy regardless of data distribution
or size
44Infrastructure Support
- Multicast encryption [RSA 06]
- Scalable secure bidirectional communication
[Infocom 07]
- Data protection scheme [PET 04]
45P4P: Current Status
- P4P has been implemented
- In Java using native code for big integer
- Runs on Linux platform
- Will be made an open-source toolkit for building
privacy-preserving real-world applications.
46Conclusion
- We can provide strong privacy protection with
little or no cost to a service provider for a
broad class of problems in e-commerce and
knowledge work.
- Responsibility for privacy protection shifts to
privacy peers
- Within the P4P framework, private computation and
many zero-knowledge verifications can be done
with great efficiency
47More info
- duan_at_cs.berkeley.edu
- http://www.cs.berkeley.edu/duan
- Thank You!