P4P: A Practical Framework for Privacy-Preserving Distributed Computation - PowerPoint PPT Presentation

Provided by: duan

Learn more at: http://bid.berkeley.edu

Transcript and Presenter's Notes

1
P4P: A Practical Framework for Privacy-Preserving
Distributed Computation

Yitao Duan (Advisor: Prof. John Canny)
http://www.cs.berkeley.edu/duan
Berkeley Institute of Design
Computer Science Division
University of California, Berkeley
11/27/2007

2
Research Goal

To provide practical solutions with provable
privacy and adequate efficiency in a realistic
adversary model at reasonably large scale

4
Model
[Figure: model diagram; the function to be computed is f]

5
A Practical Solution

Provable privacy: cryptography
Efficiency: minimize the number of expensive
primitives and rely on probabilistic guarantees
Realistic adversary model: must handle malicious
users who may try to bias the computation by
inputting invalid data

6
Basic Approach
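[Figure: each user i holds private data $d_i \in D$ and computes $g_j(d_i)$, $j = 1, 2, \ldots, m$; the values $g_j(d_1), g_j(d_2), \ldots, g_j(d_n)$ are summed ($\Sigma$) and f is computed from the sums]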

7
The Power of Addition

A large number of popular algorithms can be run
with addition-only steps
Linear algorithms: voting and summation;
nonlinear algorithms: regression, classification,
SVD, PCA, k-means, ID3, EM, etc.
All algorithms in the statistical query model
[Kearns 93]
Many other gradient-based numerical algorithms
The addition-only framework has a very efficient
private implementation in cryptography and admits
efficient ZKPs
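To illustrate the addition-only pattern, here is a minimal sketch of one k-means iteration written so that the only cross-user step is a vector sum. Each user computes, from their own points, per-cluster coordinate sums and counts (playing the role of the $g_j(d_i)$ values); the aggregate of these vectors is all that is needed to update the centroids. The function names and the flattened data layout are illustrative assumptions, and the sum is computed in the clear here rather than privately.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def local_stats(points: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> Vector:
    """One user's contribution: per-cluster coordinate sums followed by per-cluster
    counts, flattened into a single vector so contributions can simply be added."""
    k, dim = len(centroids), len(centroids[0])
    stats = [0.0] * (k * dim + k)
    for p in points:
        c = nearest(p, centroids)
        for t in range(dim):
            stats[c * dim + t] += p[t]
        stats[k * dim + c] += 1
    return stats


def update_centroids(all_stats: Sequence[Vector], centroids: Sequence[Sequence[float]]) -> List[Vector]:
    """The only cross-user step is an element-wise sum; new centroids follow from it."""
    k, dim = len(centroids), len(centroids[0])
    total = [sum(col) for col in zip(*all_stats)]  # the step P4P would compute privately
    new = []
    for c in range(k):
        count = total[k * dim + c]
        if count == 0:
            new.append(list(centroids[c]))  # keep an empty cluster's centroid unchanged
        else:
            new.append([total[c * dim + t] / count for t in range(dim)])
    return new
```

In a P4P setting the vectors would have to be encoded as integers in the small field (for example with a fixed-point scale); that encoding is an assumption left out of this sketch.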

8
Peers for Privacy: The Nomenclature

Privacy is a right that one must fight for. Some
agents must act on behalf of users' privacy in
the computation. We call them privacy peers.
Our method aggregates across many users' data. We
can prove that the aggregation provides privacy:
the data from the peers protect one another

9
Private Addition, P4P Style

The computation: secret sharing over a small field
Malicious users: an efficient zero-knowledge proof
to bound the L2-norm of the user vector

10
Big Integers vs. Small Ones

Most applications work with regular-sized
integers (e.g. 32- or 64-bit). Arithmetic
operations are very fast when each operand fits
into a single memory cell (~10^-9 sec)
Public-key operations (e.g. used in encryption
and verification) must use keys with sufficient
length (e.g. 1024-bit) for security. Existing
private computation solutions must work with
large integers extensively (~10^-3 sec)
A 6 orders of magnitude difference!

11
Private Arithmetic: Two Paradigms

Homomorphism: User data is encrypted with a
public key cryptosystem. Arithmetic on this data
mirrors arithmetic on the original data, but the
server cannot decrypt partial results.
Secret sharing: Users send shares of their data
to several servers, so that no small group of
servers gains any information about it.
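The homomorphic paradigm can be illustrated with the Paillier cryptosystem, a standard additively homomorphic scheme; the slides do not name a specific cryptosystem, so this choice is ours. The sketch uses tiny textbook primes so it runs instantly, which makes it completely insecure; it only shows that multiplying ciphertexts adds plaintexts, and that every operation works on numbers modulo $n^2$ rather than on machine words.

```python
import math
import random


def keygen(p: int = 293, q: int = 433):
    """Toy Paillier keys from two small primes (INSECURE, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # lambda^-1 mod n; valid because we use the generator g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, key, c: int) -> int:
    lam, mu = key
    n2 = n * n
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then multiply by mu


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)
```

For example, with `rng = random.Random(0)` and `n, key = keygen()`, decrypting `add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))` gives 42. With real key sizes (e.g. a 1024-bit n), each of these `pow` calls is the kind of big-integer operation the previous slide prices at milliseconds.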

12
Arithmetic Homomorphism vs. VSS

Homomorphism
+ Can tolerate t < n corrupted players as far as
privacy is concerned
- Uses public key crypto and works with large fields
(e.g. 1024-bit): 10,000x more expensive than
normal arithmetic (even for addition)
Secret sharing
+ Addition is essentially free. Can use any size
field
- Can't do two-party multiplication
- Most schemes also use public key crypto for
verification
- Doesn't fit well into the existing service
architecture

13
P4P: Peers for Privacy

Some parties, called privacy peers, actively
participate in the computation, working for
users' privacy
Privacy peers provide privacy when they are
available, but can't access the data themselves

14
P4P

The server provides data archival and
synchronizes the protocol
The server only communicates with privacy peers
occasionally (e.g. at 2 AM)

15
Privacy Peers

Roles of privacy peers:
Anonymizing communication
Sharing information
Participating in computation
Others: infrastructure support
They work on behalf of users' privacy
But we need a higher level of trust in privacy
peers

16
Candidates for Privacy Peers

Some players are more trustworthy than others
In a workplace, a union representative
In a community, a few members with good
reputations
Or a third-party commercial provider
A very important source of security and
efficiency
The key is that privacy peers should have
different incentives from the server: a mutual
distrust between them

17
Security from Heterogeneity

The server is secure against outside attacks and
won't actively cheat
Companies invest in protecting their servers
The server often holds much more valuable info
than what the protocol reveals
The server benefits from accurate computation
Privacy peers won't collude with the server
Conflicting interests, mutual distrust, laws
The server can't trust clients to keep a conspiracy
secret
Users can actively cheat
Rely on the server for protection against outside
attacks, and on privacy peers for defending against a
curious server

18
Private Addition
[Figure: user i splits $d_i$ into two shares, $u_i$ and $v_i$]
$d_i$ is user i's private vector. $u_i$, $v_i$ and $d_i$ are
all in a small integer field, with $u_i + v_i = d_i$
19
Private Addition
$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$
20
Private Addition
[Figure: the partial sums $\mu$ and $\nu$]
$\mu = \sum_i u_i$
$\nu = \sum_i v_i$
$u_i + v_i = d_i$
21
Private Addition
$\mu + \nu = \sum_i d_i$
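Slides 18 to 21 can be condensed into a short sketch. It assumes the small field is the integers modulo the prime $2^{31} - 1$ (the slides only say "small field"), and that $u_i$ goes to the server and $v_i$ to the privacy peer; the helper names are hypothetical. Each party sees only uniformly random-looking shares, yet $\mu + \nu$ recovers $\sum_i d_i$.

```python
import random
from typing import List, Sequence, Tuple

P = 2**31 - 1  # assumed small prime field; every value below lives in Z_P


def share(d: Sequence[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Split user vector d into u (uniformly random) and v = d - u, so u + v = d mod P."""
    u = [rng.randrange(P) for _ in d]
    v = [(x - y) % P for x, y in zip(d, u)]
    return u, v


def vector_sum(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Element-wise sum mod P: ordinary small-field arithmetic, no public-key operations."""
    return [sum(col) % P for col in zip(*vectors)]


def private_sum(user_vectors: Sequence[Sequence[int]], rng: random.Random) -> List[int]:
    us, vs = [], []
    for d in user_vectors:
        u, v = share(d, rng)
        us.append(u)  # sent to the server
        vs.append(v)  # sent to the privacy peer
    mu = vector_sum(us)  # computed by the server
    nu = vector_sum(vs)  # computed by the privacy peer, then sent to the server
    return [(a + b) % P for a, b in zip(mu, nu)]  # mu + nu = sum of all d_i
```

For instance, `private_sum([[1, 0, 2], [0, 1, 1], [1, 1, 0]], random.Random(1))` returns `[2, 2, 3]`. Note that the privacy peer sends the server a single vector `nu`, whatever the number of users.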
22
P4P's Private Addition

Provable privacy
Computation on both the server and the privacy
peer is over a small field: the same cost as a
non-private implementation
Fits existing server-based schemes
The server is always online. Users and privacy peers
can come and go.
Only two parties perform the computation;
users just submit their data (and provide a ZK
proof, see later)
Extra communication for the server is only with
the privacy peer, independent of n

23
The Need for Verification

This scheme has a glaring weakness: users can use
any number in the small field as their data.
Think of a voting scheme: "Please place your vote,
0 or 1, in the envelope."
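The weakness is easy to reproduce with the same kind of two-party additive sharing. This sketch again assumes the field $\mathbb{Z}_P$ with $P = 2^{31} - 1$, and the `tally` helper is hypothetical: a vote of 1000, or of $P - 1 \equiv -1$, is split and summed exactly like an honest 0 or 1, and nothing in the protocol flags it.

```python
import random

P = 2**31 - 1  # assumed small prime field


def tally(votes, rng):
    """Tally votes through two-party additive sharing; returns the total mod P."""
    mu = nu = 0
    for vote in votes:
        u = rng.randrange(P)
        mu = (mu + u) % P         # server's running share
        nu = (nu + vote - u) % P  # privacy peer's running share
    return (mu + nu) % P


rng = random.Random(7)
honest = tally([1, 0, 1, 1], rng)           # -> 3
stuffed = tally([1, 0, 1, 1, 1000], rng)    # -> 1003: one user casts 1000 votes
negative = tally([1, 0, 1, 1, P - 1], rng)  # -> 2: one user casts -1
```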

24
Zero Knowledge Proofs

I can prove that I know X without disclosing what
X is.
I can prove that a given encrypted number is a 0.
Or I can prove that an encrypted number is a 1.
I can prove that an encrypted number is a ZERO OR
ONE, i.e. a bit. (6 extra numbers needed)
I can prove that an encrypted number is a k-bit
integer. I need 6k extra numbers to do this (!!!)
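The first claim on this slide, proving knowledge of X without revealing it, is classically realized by Schnorr's protocol. The sketch below is that standard protocol made non-interactive with the Fiat-Shamir heuristic (SHA-256 as the hash); it is not P4P's own proof system. The group parameters ($p = 2039 = 2q + 1$, $q = 1019$, $g = 4$) are toy values chosen so the code runs instantly and are far too small for real use.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy safe-prime group: G generates the subgroup of prime order Q


def challenge(y: int, t: int) -> int:
    """Fiat-Shamir challenge c = H(G, y, t) mod Q."""
    data = f"{G}:{y}:{t}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x: int):
    """Prove knowledge of x with y = G^x mod P, without revealing x."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)  # fresh secret nonce
    t = pow(G, r, P)          # commitment
    c = challenge(y, t)
    s = (r + c * x) % Q       # response; uniformly distributed because r is
    return y, t, s


def verify(y: int, t: int, s: int) -> bool:
    """Accept iff G^s == t * y^c (mod P)."""
    c = challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier learns y, t and s, and checks a single equation; proving facts about encrypted numbers (such as "this is a bit") builds on the same commit-challenge-respond pattern, at the cost of the extra numbers the slide counts.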

25
An Efficient ZKP of Boundedness

Luckily, we don't need to prove that every number
in a user's vector is small, only that the vector
is small.
The server asks for some random projections of
the user's vector, and expects the user to prove
that the square sum of them is small.

O(log m) public key crypto operations (instead
of O(m)) to prove that the L2-norm of an m-dim
vector is smaller than L.
Running time reduced from hours to seconds.

26
Bounding the L2-Norm

A natural and effective way to restrict a
cheating user's malicious influence
You must have a big vector to produce a large
influence on the sum
Perturbation theory bounds system change with
norms:
$|s_i(A) - s_i(B)| \le \|A - B\|_2$ [Weyl]
Can be the basis for other checks
Setting L = 1 forces each user to have only 1
vote
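To see why L = 1 limits a user to one vote, represent a vote for option j as the unit vector $e_j$ (our illustration):

$$\|e_j\|_2 = 1 \le L, \qquad \|e_j + e_k\|_2 = \sqrt{2} > 1 \ (j \ne k), \qquad \|2e_j\|_2 = 2 > 1.$$

For an integer vector d, $\|d\|_2 \le 1$ holds only when $d = 0$ or $d = \pm e_j$, so multiple or weighted votes are ruled out, although the norm alone does not constrain the sign of the single entry.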

27
Random Projection-based L2-Norm ZKP

The server generates N random m-vectors in
$\{-1, 0, 1\}^m$
The user projects his data onto the N directions and
provides a ZKP that the square sum of the
projections is less than $NL^2/2$
Expensive public key operations are only on the
projections and the square sum
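The arithmetic being proven can be sketched in the clear. Assume each entry of a projection vector is $-1$, $0$ or $1$ with probabilities $1/4$, $1/2$, $1/4$; this distribution is our assumption, chosen because it gives $E[(c \cdot d)^2] = \|d\|_2^2/2$ and so explains the $NL^2/2$ threshold. The square sum of N projections then concentrates around $N\|d\|_2^2/2$. The sketch performs only this plaintext check; in P4P the projections and their square sum would be handled with commitments and a ZKP, which is not shown. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def random_directions(n: int, m: int, rng: random.Random) -> List[List[int]]:
    """N random m-vectors in {-1, 0, 1}^m, entries drawn with weights 1/4, 1/2, 1/4."""
    return [rng.choices([-1, 0, 1], weights=[1, 2, 1], k=m) for _ in range(n)]


def norm_check(d: Sequence[int], bound: float, n: int, rng: random.Random) -> bool:
    """Accept d iff the square sum of its N projections is below N * L^2 / 2."""
    dirs = random_directions(n, len(d), rng)
    projections = [sum(c * x for c, x in zip(row, d)) for row in dirs]
    square_sum = sum(p * p for p in projections)
    return square_sum < n * bound * bound / 2
```

With N = 50 and L = 2, a vector with a single entry equal to 1 is always accepted (its square sum is at most 50), while a single entry of 10 is rejected unless all 50 projections happen to miss it, which has probability $2^{-50}$.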

28
Effectiveness
29
Acceptance/rejection probabilities
(a) Linear and (b) log plots of the probability of
user input acceptance as a function of d/L for
N = 50. (b) also includes the probability of
rejection. In each case, the steepest (jagged)
curve is the single-value vector (case 3), the
middle curve is the Zipf vector (case 2) and the
shallowest curve is the uniform vector (case 1)
30
Performance Evaluation
(a) Verifier and (b) prover times in seconds for
the validation protocol where (from top to
bottom) L (the required bound) has 40, 20, or 10
bits. The x-axis is the vector length.
31
SVD

Singular value decomposition is an extremely
useful tool for many IR and data mining tasks
(CF, clustering)
The SVD of a matrix A is a factorization $A = UDV^T$.
If A encodes users × items, then $V^T$ gives us a basis for the
best least-squares approximations to the rows of
A in a user-independent way.
$A^T A V = V D^2$: SVD is an eigenproblem
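Because $A^T A v = \sum_i a_i (a_i \cdot v)$, where $a_i$ is user i's row of A, each multiplication by $A^T A$ is a sum of per-user vectors, which is exactly the kind of step P4P aggregates. The sketch below uses plain power iteration to find the top right singular vector and singular value. It is one illustrative way to put the eigenproblem in addition-only form, not necessarily the algorithm behind the SVD slides that follow, and the sums are computed in the clear.

```python
import math
from typing import List, Sequence, Tuple


def user_contribution(row: Sequence[float], v: Sequence[float]) -> List[float]:
    """User i's share of A^T A v: a_i scaled by (a_i . v), computed locally."""
    dot = sum(a * b for a, b in zip(row, v))
    return [a * dot for a in row]


def top_singular(rows: Sequence[Sequence[float]], iters: int = 100) -> Tuple[float, List[float]]:
    """Power iteration on A^T A; each step needs only a vector sum over users."""
    m = len(rows[0])
    v = [1.0 / math.sqrt(m)] * m  # start vector; must not be orthogonal to the answer
    for _ in range(iters):
        # aggregated step: the element-wise sum P4P would compute privately
        w = [sum(col) for col in zip(*(user_contribution(r, v) for r in rows))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # ||A v||^2 = v^T A^T A v = sigma^2 for the converged unit vector v
    sigma_sq = sum(sum(a * b for a, b in zip(r, v)) ** 2 for r in rows)
    return math.sqrt(sigma_sq), v
```

For `rows = [[3.0, 0.0], [0.0, 1.0]]` this converges to a singular value of 3 with `v` close to `[1, 0]`; further singular vectors could be found by deflation or block iteration, each still built from per-user sums.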

32
SVD, P4P Style
33
Experiments: SVD Datasets
34
Results
N: number of iterations. k: number of singular
values. e: relative residual error
35
Distributed Association Rule Mining

n users, m items. User i has dataset Di
Horizontally partitioned Di contains the same
attributes

1 0 0 0 1 0 0
D1

Dn
0 0 1 0 0 1 0
36
The Market-Basket Model

A large set of items, e.g., things sold in a
supermarket.
A large set of baskets, each of which is a small
set of the items, e.g., the things one customer
buys on one day.

37
Support

Simplest question: find sets of items that appear
frequently in the baskets.
Support for itemset I: the number of baskets
containing all items in I.
Given a support threshold s, sets of items that
appear in at least s baskets are called frequent
itemsets.

38
Example

Items = {milk, coke, pepsi, beer, juice}.
Support threshold s = 3 baskets.
B1 = {m, c, b}      B2 = {m, p, j}
B3 = {m, b}         B4 = {c, j}
B5 = {m, p, b}      B6 = {m, c, b, j}
B7 = {c, b, j}      B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b},
{c, b}, {j, c}.
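The example can be checked mechanically. This sketch counts the support of every itemset of size 1 and 2 over the eight baskets and keeps those with support of at least 3; it reproduces the listed frequent itemsets, and no itemset of size 3 reaches support 3 in this data. The helper names are illustrative.

```python
from itertools import combinations

baskets = [
    {"m", "c", "b"}, {"m", "p", "j"},
    {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"},
    {"c", "b", "j"}, {"b", "c"},
]


def support(itemset, baskets):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)


def frequent_itemsets(baskets, threshold, max_size=2):
    """All itemsets of up to `max_size` items whose support reaches `threshold`."""
    items = sorted(set().union(*baskets))
    result = set()
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            if support(candidate, baskets) >= threshold:
                result.add(frozenset(candidate))
    return result


frequent = frequent_itemsets(baskets, threshold=3)
```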

39
Association Rules

If-then rules about the contents of baskets.
i1, i2,,ik ? j means if a basket contains
all of i1,,ik then it is likely to contain j.
Confidence of this association rule is the
probability of j given i1,,ik.
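As a worked example on the eight baskets of the previous slide (the rule is our choice, for illustration), confidence is a ratio of two supports:

$$\operatorname{conf}(\{m, b\} \rightarrow c) = \frac{\operatorname{support}(\{m, b, c\})}{\operatorname{support}(\{m, b\})} = \frac{2}{4} = 0.5$$

since {m, b, c} appears in B1 and B6, while {m, b} appears in B1, B3, B5 and B6. Both quantities are supports, so they can come from the same aggregated counts.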

40
Step k of apriori-gen in P4P

User i constructs an $m_k$-dimensional vector in a
small field ($m_k$: the number of candidate itemsets at
step k)
Use P4P to compute the aggregate (with
verification)
The result encodes the supports of all candidate
itemsets

41
Step k of apriori-gen in P4P

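[Figure: user 1 holds $D_1$ (rows such as 1 0 0 0 1 0 0) and computes $d_{1j}$; ...; user n holds $D_n$ (rows such as 0 0 1 0 0 1 0) and computes $d_{nj}$; P4P aggregates these values into the support for $c_j$]
$c_j$: the jth candidate itemset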
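A minimal sketch of one step, assuming each user i holds a list of baskets $D_i$ and that entry $d_{ij}$ of the user's vector is the local support of candidate $c_j$ in $D_i$ (a natural reading of the figure, not stated on the slide). Candidates come from the standard apriori-gen join and prune; the element-wise sum of the user vectors, computed in the clear here where P4P would compute it privately with verification, gives the global support of every candidate. All names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set

Itemset = FrozenSet[str]


def apriori_gen(frequent_prev: Set[Itemset], k: int) -> List[Itemset]:
    """Join frequent (k-1)-itemsets, then prune candidates with an infrequent (k-1)-subset."""
    joined = {a | b for a in frequent_prev for b in frequent_prev if len(a | b) == k}
    pruned = [
        c for c in joined
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    ]
    return sorted(pruned, key=sorted)  # fixed order, so every user uses the same index j


def local_vector(baskets: Sequence[Set[str]], candidates: Sequence[Itemset]) -> List[int]:
    """User i's m_k-dimensional vector: d_ij = support of candidate c_j in D_i."""
    return [sum(1 for b in baskets if c <= b) for c in candidates]


def step_k(user_data: Sequence[Sequence[Set[str]]], frequent_prev: Set[Itemset],
           k: int, s: int) -> Set[Itemset]:
    candidates = apriori_gen(frequent_prev, k)
    vectors = [local_vector(baskets, candidates) for baskets in user_data]
    totals = [sum(col) for col in zip(*vectors)]  # the aggregate P4P would compute
    return {c for c, t in zip(candidates, totals) if t >= s}
```

Splitting the market-basket example between two users (B1 to B4 and B5 to B8) and starting from the frequent single items {m}, {c}, {b}, {j}, step 2 with s = 3 yields {m, b}, {c, b} and {c, j}; step 3 then prunes every candidate, so the search stops.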
42
Analysis

Privacy guaranteed by P4P
Near-optimal efficiency: cost comparable to that
of a direct implementation of the algorithms
Main aggregation in a small field
Only a small number of large-field operations
Deals with cheating users via P4P's built-in ZK
user data verification

43
Privacy

SVD: the intermediate sums are implied by the
final results
$A^T A = V D^2 V^T$
ARM: sums are treated as public by the applications
Guaranteed privacy regardless of data distribution
or size

44
Infrastructure Support

Multicast encryption [RSA 06]
Scalable secure bidirectional communication
[Infocom 07]
Data protection scheme [PET 04]

45
P4P: Current Status

P4P has been implemented
In Java, using native code for big integers
Runs on the Linux platform
Will be made an open-source toolkit for building
privacy-preserving real-world applications.

46
Conclusion

We can provide strong privacy protection with
little or no cost to a service provider for a
broad class of problems in e-commerce and
knowledge work.
Responsibility for privacy protection shifts to
privacy peers
Within the P4P framework, private computation and
many zero-knowledge verifications can be done
with great efficiency

47
More info