P4P: A Practical Framework for Privacy-Preserving Distributed Computation

Transcript and Presenter's Notes



1
P4P A Practical Framework for Privacy-Preserving
Distributed Computation
  • Yitao Duan (Advisor: Prof. John Canny)
  • http://www.cs.berkeley.edu/duan
  • Berkeley Institute of Design
  • Computer Science Division
  • University of California, Berkeley
  • 11/27/2007

2
Research Goal
  • To provide practical solutions with provable
    privacy and adequate efficiency in a realistic
    adversary model at reasonably large scale

4
Model
(Diagram: n users each hold private data di; a server computes f(d1, ..., dn) without learning individual inputs)

5
A Practical Solution
  • Provable privacy: cryptography
  • Efficiency: minimize the number of expensive
    primitives and rely on probabilistic guarantees
  • Realistic adversary model: must handle malicious
    users who may try to bias the computation by
    inputting invalid data

6
Basic Approach
(Diagram: each user i holds di ∈ D; for functions gj, j = 1, 2, ..., m, the server S only needs the aggregates gj(d1) + gj(d2) + ... + gj(dn) and computes f from these sums)

7
The Power of Addition
  • A large number of popular algorithms can be run
    with addition-only steps
  • Linear algorithms: voting and summation;
    nonlinear algorithms: regression, classification,
    SVD, PCA, k-means, ID3, EM, etc.
  • All algorithms in the statistical query model
    [Kearns 93]
  • Many other gradient-based numerical algorithms
  • The addition-only framework has a very efficient
    private implementation in cryptography and admits
    efficient ZKPs (see the sketch below)
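
As a concrete, hypothetical illustration of such an addition-only step (not part of the P4P codebase; all names here are invented for the example), a vote tally only needs the element-wise sum of per-user vectors:

    // Hypothetical sketch of an addition-only step: a vote tally that only
    // needs the element-wise sum of the users' contribution vectors.
    public class AdditionOnlyTally {
        static long[] aggregate(long[][] userVectors, int m) {
            long[] sum = new long[m];
            for (long[] v : userVectors)
                for (int j = 0; j < m; j++)
                    sum[j] += v[j];               // the only operation the algorithm needs
            return sum;
        }

        public static void main(String[] args) {
            long[][] votes = { {1, 0, 0}, {0, 1, 0}, {1, 0, 0} };  // three users, three candidates
            System.out.println(java.util.Arrays.toString(aggregate(votes, 3)));  // [2, 1, 0]
        }
    }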

8
Peers for Privacy: The Nomenclature
  • Privacy is a right that one must fight for. Some
    agents must act on behalf of users' privacy in
    the computation. We call them privacy peers
  • Our method aggregates across many users' data. We
    can prove that the aggregation provides privacy:
    the peers' data protect each other

9
Private Addition P4P Style
  • The computation: secret sharing over a small field
  • Malicious users: an efficient zero-knowledge proof
    to bound the L2-norm of the user vector

10
Big Integers vs. Small Ones
  • Most applications work with regular-sized
    integers (e.g. 32- or 64-bit). Arithmetic
    operations are very fast when each operand fits
    into a single memory cell (~10^-9 sec)
  • Public-key operations (e.g. those used in encryption
    and verification) must use keys with sufficient
    length (e.g. 1024-bit) for security. Existing
    private computation solutions must work with
    these large integers extensively (~10^-3 sec)
  • A 6-orders-of-magnitude difference!

11
Private Arithmetic Two Paradigms
  • Homomorphism: user data is encrypted with a
    public-key cryptosystem. Arithmetic on this data
    mirrors arithmetic on the original data, but the
    server cannot decrypt partial results (see the
    sketch below)
  • Secret sharing: the user sends shares of their data
    to several servers, so that no small group of
    servers gains any information about it
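
To make the homomorphic paradigm concrete, here is a toy Paillier-style sketch (an assumption for illustration: P4P does not prescribe this cryptosystem, and the 512-bit key is far too small for real use). Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can aggregate without decrypting:

    import java.math.BigInteger;
    import java.security.SecureRandom;

    // Toy Paillier-style additively homomorphic encryption (illustration only).
    public class AdditiveHomomorphismDemo {
        static final SecureRandom RNG = new SecureRandom();
        static BigInteger n, nsq, g, lambda, mu;

        static BigInteger L(BigInteger u) { return u.subtract(BigInteger.ONE).divide(n); }

        static void keyGen(int bits) {
            BigInteger p = BigInteger.probablePrime(bits / 2, RNG);
            BigInteger q = BigInteger.probablePrime(bits / 2, RNG);
            n = p.multiply(q);  nsq = n.multiply(n);  g = n.add(BigInteger.ONE);
            BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
            lambda = p1.multiply(q1).divide(p1.gcd(q1));           // lcm(p-1, q-1)
            mu = L(g.modPow(lambda, nsq)).modInverse(n);
        }

        static BigInteger encrypt(BigInteger m) {
            BigInteger r;                                          // random element of Z_n^*
            do { r = new BigInteger(n.bitLength(), RNG); }
            while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
            return g.modPow(m, nsq).multiply(r.modPow(n, nsq)).mod(nsq);
        }

        static BigInteger decrypt(BigInteger c) {
            return L(c.modPow(lambda, nsq)).multiply(mu).mod(n);
        }

        public static void main(String[] args) {
            keyGen(512);                                           // toy key size
            BigInteger c1 = encrypt(BigInteger.valueOf(17));
            BigInteger c2 = encrypt(BigInteger.valueOf(25));
            BigInteger cSum = c1.multiply(c2).mod(nsq);            // multiply ciphertexts...
            System.out.println(decrypt(cSum));                     // ...decrypts to 42 = 17 + 25
        }
    }

Every such operation runs over 1024-bit (here 512-bit) numbers, which is exactly the cost P4P avoids for the bulk of the computation, as the next slides argue.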

12
Arithmetic: Homomorphism vs. VSS
  • Homomorphism
  • + Can tolerate t < n corrupted players as far as
    privacy is concerned
  • - Uses public-key crypto and works with large fields
    (e.g. 1024-bit), 10,000x more expensive than
    normal arithmetic (even for addition)
  • Secret sharing
  • + Addition is essentially free. Can use any size
    field
  • - Can't do two-party multiplication
  • - Most schemes also use public-key crypto for
    verification
  • - Doesn't fit well into existing service
    architectures

13
P4P: Peers for Privacy
  • Some parties, called privacy peers, actively
    participate in the computation, working for
    users' privacy
  • Privacy peers provide privacy when they are
    available, but can't access the data themselves

14
P4P
  • The server provides data archival and
    synchronizes the protocol
  • The server only communicates with privacy peers
    occasionally (e.g. at 2 AM)

15
Privacy Peers
  • Roles of privacy peers:
  • Anonymizing communication
  • Sharing information
  • Participating in computation
  • Other infrastructure support
  • They work on behalf of users' privacy
  • But we need a higher level of trust in privacy
    peers

16
Candidates for Privacy Peers
  • Some players are more trustworthy than others
  • In a workplace, a union representative
  • In a community, a few members with a good
    reputation
  • Or a third-party commercial provider
  • A very important source of security and
    efficiency
  • The key is that privacy peers should have
    different incentives from the server, and a mutual
    distrust between them

17
Security from Heterogeneity
  • The server is secure against outside attacks and
    won't actively cheat
  • Companies spend to protect their servers
  • The server often holds much more valuable info
    than what the protocol reveals
  • The server benefits from accurate computation
  • Privacy peers won't collude with the server
  • Conflicts of interest, mutual distrust, laws
  • The server can't trust that clients will keep a
    conspiracy secret
  • Users can actively cheat
  • Rely on the server for protection against outside
    attacks, and on privacy peers for defending against a
    curious server

18
Private Addition
(Diagram: di is user i's private vector, split into random shares ui and vi with ui + vi = di; ui, vi, and di are all in a small integer field)
19
Private Addition
(Diagram: the server receives the ui and computes µ = Σ ui; the privacy peer receives the vi and computes ν = Σ vi; ui + vi = di)
20
Private Addition
(Diagram: the partial sums µ = Σ ui and ν = Σ vi are exchanged)
21
Private Addition
(Diagram: the final result is µ + ν = Σ di, the sum of the users' vectors; see the sketch below)
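
A minimal sketch of the addition protocol above (illustrative only: the 32-bit field, variable names, and the use of scalars instead of vectors are assumptions, and the real protocol adds the verification discussed later). Each user splits its value into two random shares, the server accumulates one share per user, the privacy peer the other, and the two partial sums reconstruct the total:

    import java.security.SecureRandom;

    // Hypothetical sketch of P4P-style private addition over a small field.
    public class PrivateAdditionSketch {
        static final long FIELD = 1L << 32;          // small field: 32-bit arithmetic
        static final SecureRandom RNG = new SecureRandom();

        // Split d into shares (u, v) with u + v = d (mod FIELD).
        static long[] share(long d) {
            long u = Math.floorMod(RNG.nextLong(), FIELD);
            long v = Math.floorMod(d - u, FIELD);
            return new long[] { u, v };
        }

        public static void main(String[] args) {
            long[] data = { 7, 3, 12 };                    // users' private values
            long mu = 0, nu = 0;                           // server / privacy-peer accumulators
            for (long d : data) {
                long[] s = share(d);
                mu = Math.floorMod(mu + s[0], FIELD);      // server sums the u_i
                nu = Math.floorMod(nu + s[1], FIELD);      // privacy peer sums the v_i
            }
            long sum = Math.floorMod(mu + nu, FIELD);      // mu + nu = sum of d_i (mod FIELD)
            System.out.println(sum);                       // 22
        }
    }

Each coordinate of a user vector is shared the same way, so the per-element cost stays at ordinary machine arithmetic.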
22
P4P's Private Addition
  • Provable privacy
  • Computation on both the server and the privacy
    peer is over a small field: same cost as a
    non-private implementation
  • Fits existing server-based schemes
  • The server is always online; users and privacy peers
    can be on and off
  • Only two parties perform the computation;
    users just submit their data (and provide a ZK
    proof, see later)
  • Extra communication for the server is only with
    the privacy peer, independent of n

23
The Need for Verification
  • This scheme has a glaring weakness: users can use
    any number in the small field as their data
  • Think of a voting scheme: "Please place your vote,
    0 or 1, in the envelope"

24
Zero Knowledge Proofs
  • I can prove that I know X without disclosing what
    X is.
  • I can prove that a given encrypted number is a 0.
    Or I can prove that an encrypted number is a 1.
  • I can prove that an encrypted number is a ZERO OR
    ONE, i.e. a bit. (6 extra numbers needed)
  • I can prove that an encrypted number is a k-bit
    integer. I need 6k extra numbers to do this (!!!)

25
An Efficient ZKP of Boundedness
  • Luckily, we don't need to prove that every number
    in a user's vector is small, only that the vector
    is small
  • The server asks for some random projections of
    the user's vector, and expects the user to prove
    that the square sum of them is small
  • O(log m) public-key crypto operations (instead
    of O(m)) to prove that the L2-norm of an m-dim
    vector is smaller than L
  • Running time reduced from hours to seconds

26
Bounding the L2-Norm
  • A natural and effective way to restrict a
    cheating user's malicious influence
  • You must have a big vector to produce a large
    influence on the sum
  • Perturbation theory bounds system change with
    norms
  • |si(A) - si(B)| ≤ ||A - B||2  (Weyl)
  • Can be the basis for other checks
  • Setting L = 1 forces each user to have only 1
    vote
27
Random Projection-Based L2-Norm ZKP
  • The server generates N random m-vectors in
    {-1, 0, 1}^m
  • The user projects his data onto the N directions and
    provides a ZKP that the square sum of the
    projections is < N·L²/2 (see the sketch below)
  • Expensive public-key operations are only on the
    projections and the square sum
28
Effectiveness
29
Acceptance/rejection probabilities
(a) Linear and (b) log plots of the probability of
user input acceptance as a function of d/L for
N = 50. (b) also includes the probability of
rejection. In each case, the steepest (jagged)
curve is the single-value vector (case 3), the
middle curve is the Zipf vector (case 2), and the
shallow curve is the uniform vector (case 1)
30
Performance Evaluation
(a) Verifier and (b) prover times in seconds for
the validation protocol where (from top to
bottom) L (the required bound) has 40, 20, or 10
bits. The x-axis is the vector length.
31
SVD
  • Singular value decomposition is an extremely
    useful tool for a lot of IR and data mining tasks
    (CF, clustering, ...)
  • The SVD of a matrix A is a factorization A = U D V^T
  • If A encodes users × items, then V^T gives us the
    best least-squares approximations to the rows of
    A in a user-independent way
  • A^T A V = V D^2, so SVD is an eigenproblem (see the
    sketch below)
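
As a hedged illustration of why this eigenproblem fits the addition-only framework (a sketch under that assumption, not the toolkit's code): A^T A = Σi di di^T is a sum of per-user outer products, so P4P-style aggregation can deliver the matrix that an ordinary eigen-solver then factors:

    // Hypothetical sketch: A^T A as an addition-only aggregate of user rows d_i.
    public class CovarianceAggregate {
        // Accumulate the outer product d d^T into acc (an m x m matrix).
        static void addOuterProduct(double[][] acc, double[] d) {
            for (int a = 0; a < d.length; a++)
                for (int b = 0; b < d.length; b++)
                    acc[a][b] += d[a] * d[b];
        }

        public static void main(String[] args) {
            double[][] rows = { {1, 0, 2}, {0, 1, 1}, {2, 1, 0} };  // users' rows of A
            int m = 3;
            double[][] ata = new double[m][m];
            for (double[] d : rows) addOuterProduct(ata, d);        // A^T A = sum_i d_i d_i^T
            // ata can now be handed to any eigen-solver to recover V and D^2.
            for (double[] row : ata) System.out.println(java.util.Arrays.toString(row));
        }
    }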

32
SVD P4P Style
33
Experiments: SVD Datasets
34
Results
N = number of iterations, k = number of singular
values, e = relative residual error
35
Distributed Association Rule Mining
  • n users, m items. User i has dataset Di
  • Horizontally partitioned: the Di contain the same
    attributes

(Diagram: each Di is a binary vector over the m items,
e.g. D1 = 1 0 0 0 1 0 0, ..., Dn = 0 0 1 0 0 1 0)
36
The Market-Basket Model
  • A large set of items, e.g., things sold in a
    supermarket.
  • A large set of baskets, each of which is a small
    set of the items, e.g., the things one customer
    buys on one day.

37
Support
  • Simplest question: find sets of items that appear
    frequently in the baskets
  • Support for itemset I: the number of baskets
    containing all items in I
  • Given a support threshold s, sets of items that
    appear in at least s baskets are called frequent
    itemsets

38
Example
  • Items = {milk, coke, pepsi, beer, juice}
  • Support threshold = 3 baskets
  • B1 = {m, c, b}   B2 = {m, p, j}
  • B3 = {m, b}      B4 = {c, j}
  • B5 = {m, p, b}   B6 = {m, c, b, j}
  • B7 = {c, b, j}   B8 = {b, c}
  • Frequent itemsets: {m}, {c}, {b}, {j}, {m, b},
    {c, b}, {j, c} (counted in the sketch below)
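
A short illustrative check of the support definition against these baskets (the class and method names are invented for the example):

    import java.util.*;

    // Illustrative check of the support definition against the example baskets.
    public class SupportCount {
        static int support(List<Set<String>> baskets, Set<String> itemset) {
            int count = 0;
            for (Set<String> b : baskets)
                if (b.containsAll(itemset)) count++;   // basket contains every item of the itemset
            return count;
        }

        public static void main(String[] args) {
            List<Set<String>> baskets = List.of(
                Set.of("m","c","b"), Set.of("m","p","j"), Set.of("m","b"), Set.of("c","j"),
                Set.of("m","p","b"), Set.of("m","c","b","j"), Set.of("c","b","j"), Set.of("b","c"));
            System.out.println(support(baskets, Set.of("m","b")));   // 4 -> frequent (threshold 3)
            System.out.println(support(baskets, Set.of("c","j")));   // 3 -> frequent
            System.out.println(support(baskets, Set.of("p")));       // 2 -> not frequent
        }
    }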

39
Association Rules
  • If-then rules about the contents of baskets
  • {i1, i2, ..., ik} → j means: if a basket contains
    all of i1, ..., ik, then it is likely to contain j
  • The confidence of this association rule is the
    probability of j given i1, ..., ik

40
Step k of apriori-gen in P4P
  • User i constructs an mk-dimensional vector in the
    small field (mk = the number of candidate itemsets at
    step k), as sketched below
  • Use P4P to compute the aggregate (with
    verification)
  • The result encodes the supports of all candidate
    itemsets
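
A hypothetical sketch of the user-side vector for step k (the helper names are invented): entry j counts how many of user i's baskets contain every item of candidate itemset cj, which is exactly what the P4P aggregation then sums across users:

    import java.util.*;

    // Hypothetical sketch of step k on the user side: build the m_k-dimensional
    // vector whose j-th entry is this user's support contribution for candidate c_j.
    public class AprioriUserVector {
        static long[] candidateVector(List<Set<Integer>> baskets, List<Set<Integer>> candidates) {
            long[] d = new long[candidates.size()];
            for (int j = 0; j < candidates.size(); j++)
                for (Set<Integer> basket : baskets)
                    if (basket.containsAll(candidates.get(j)))
                        d[j]++;                     // local support for candidate c_j
            return d;
        }

        public static void main(String[] args) {
            List<Set<Integer>> baskets = List.of(Set.of(0, 2), Set.of(0, 1, 2));
            List<Set<Integer>> candidates = List.of(Set.of(0, 2), Set.of(1, 2));  // step k = 2
            System.out.println(Arrays.toString(candidateVector(baskets, candidates)));  // [2, 1]
            // Each user submits this vector through P4P; the aggregate gives global supports.
        }
    }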

41
Step k of apriori-gen in P4P

(Diagram: from Di, user i derives dij, its local support
for the jth candidate itemset cj; P4P aggregates the dij
across all users to produce the support for cj)
42
Analysis
  • Privacy guaranteed by P4P
  • Near-optimal efficiency: cost comparable to that
    of a direct implementation of the algorithms
  • Main aggregation in the small field
  • Only a small number of large-field operations
  • Deal with cheating users via P4P's built-in ZK
    user data verification

43
Privacy
  • SVD: the intermediate sums are implied by the
    final results
  • A^T A = V D^2 V^T
  • ARM: sums are treated as public by the applications
  • Guaranteed privacy regardless of data distribution
    or size

44
Infrastructure Support
  • Multicast encryption [RSA 06]
  • Scalable secure bidirectional communication
    [Infocom 07]
  • Data protection scheme [PET 04]

45
P4P Current Status
  • P4P has been implemented
  • In Java, using native code for big-integer
    arithmetic
  • Runs on the Linux platform
  • Will be released as an open-source toolkit for building
    privacy-preserving real-world applications

46
Conclusion
  • We can provide strong privacy protection with
    little or no cost to a service provider for a
    broad class of problems in e-commerce and
    knowledge work.
  • Responsibility for privacy protection shifts to
    privacy peers
  • Within the P4P framework, private computation and
    many zero-knowledge verifications can be done
    with great efficiency

47
More info
  • duan_at_cs.berkeley.edu
  • http://www.cs.berkeley.edu/duan
  • Thank You!