1
Cryptographic methods for privacy-aware computing: applications
2
Outline
  • Review of the three basic methods
  • Two applications:
    • Distributed decision tree with horizontally partitioned data
    • Distributed k-means with vertically partitioned data

3
Three basic methods
  • 1-out-of-K Oblivious Transfer
  • Random shares (see the sketch below)
  • Homomorphic encryption
  • Cost is the major concern
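As a quick illustration of the random-share idea (a minimal sketch, not from the original deck; the modulus Q and the function names are illustrative), a value can be split into shares that individually look random but sum back to the original value:

    # Minimal sketch of additive random shares.
    # Assumption: sharing is done modulo a large public prime Q.
    import random

    Q = 2**61 - 1  # a large public prime modulus (illustrative choice)

    def make_shares(x):
        # Split x into two shares s1, s2 with (s1 + s2) % Q == x % Q.
        s1 = random.randrange(Q)
        s2 = (x - s1) % Q
        return s1, s2

    def reconstruct(s1, s2):
        return (s1 + s2) % Q

    s1, s2 = make_shares(42)
    assert reconstruct(s1, s2) == 42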

4
Two example protocols
  • The basic idea:
  • Do not release the original data
  • Exchange only intermediate results
  • Apply the three basic methods to combine them securely

5
1. Building decision trees over horizontally partitioned data
  • Horizontally partitioned data
  • Entropy-based information gain
  • Major ideas in the protocol

6
Horizontally Partitioned Data
  • Table with a key and attributes X1 ... Xd; the records are split across r sites
  • Site 1 holds keys k1 ... ki, Site 2 holds ki+1 ... kj, ..., Site r holds km+1 ... kn; every site stores all attributes X1 ... Xd

7
Review of the decision tree algorithm (ID3)
  • Find the cut that maximizes the information gain
  • For a given attribute Ai with sorted values v1 ... vn, consider each candidate value vi
  • For categorical data the test is Ai = vi
  • For numerical data the test is Ai < vi

[Figure: a decision tree whose internal nodes test conditions such as "Ai < vi?" (yes/no branches) and lead to further tests "Aj < vj?" or to leaf labels; a table of attribute values v1 ... vn with labels l1 ... ln shows a candidate cut, and E() denotes the entropy of the label distribution.]
Choose the attribute/value that gives the highest
gain!
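For reference, a plain single-site sketch of the gain computation described above (no cryptography; the function names and toy data are illustrative). E() is the entropy of the label distribution, and a numerical cut Ai < vi is scored by the entropy reduction it achieves:

    # Non-secure, single-site sketch of entropy-based information gain.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # E(): entropy of the label distribution, -sum p * log2(p).
        n = len(labels)
        if n == 0:
            return 0.0
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain_for_cut(values, labels, v):
        # Information gain of the test  Ai < v.
        left  = [l for x, l in zip(values, labels) if x < v]
        right = [l for x, l in zip(values, labels) if x >= v]
        n = len(labels)
        split = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(labels) - split

    # Choose the value that gives the highest gain for this attribute.
    values = [1.0, 2.0, 3.5, 4.2, 5.1]
    labels = ['a', 'a', 'b', 'b', 'b']
    best_cut = max(values[1:], key=lambda v: gain_for_cut(values, labels, v))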
8
Key points
  • Calculating entropy

[Figure: attribute Ai with sorted values v1 ... vn, their labels l1 ... ln, and a candidate cut.]
  • The key is computing x log x, where x is the sum of values held by the two parties P1 and P2, i.e., x = x1 + x2
  • The computation is decomposed into several steps
  • After each step, each party knows only a random share of the result

9
Steps
  • Step 1: compute random shares w1 and w2 such that w1 + w2 = (x1 + x2) ln(x1 + x2) (sketched below)
  • a dedicated sub-protocol is used to compute shares of ln(x1 + x2)
  • Step 2: for a condition (Ai, vi), find the random shares of E(S), E(S1) and E(S2), respectively
  • Step 3: repeat Steps 1-2 for all possible (Ai, vi) pairs
  • Step 4: a secure comparison circuit determines which (Ai, vi) pair results in the maximum gain

[Figure: the parties' counts x1 and x2 and their random shares w11, w21, w12, w22 feed a secure circuit that outputs the (Ai, vi) pair with the maximum gain.]
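The following is only a functional sketch of what Step 1 delivers, i.e., the target relation w1 + w2 = (x1 + x2) ln(x1 + x2). The real sub-protocol computes the shares obliviously, so neither party learns x1 + x2, whereas this simulation derives them in the clear:

    # Simulation of the OUTPUT of Step 1: random shares w1, w2 such that
    # w1 + w2 = (x1 + x2) * ln(x1 + x2).  The actual protocol reaches this
    # result without revealing x1 + x2 to either party.
    import random
    from math import log, isclose

    def simulate_xlnx_shares(x1, x2):
        x = x1 + x2
        w1 = random.uniform(-1e6, 1e6)   # P1 ends up with a random-looking share
        w2 = x * log(x) - w1             # P2's share completes the sum
        return w1, w2

    x1, x2 = 12, 30                      # class counts held by P1 and P2
    w1, w2 = simulate_xlnx_shares(x1, x2)
    assert isclose(w1 + w2, (x1 + x2) * log(x1 + x2))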
10
2. K-means over vertically partitioned data
  • Vertically partitioned data
  • Normal K-means algorithm
  • Applying secure sum and secure comparison among multiple sites in the secure distributed algorithm

11
Vertically Partitioned Data
  • Table with a key and attributes X1 ... Xd, split column-wise into r groups
  • Site 1 holds (key, X1 ... Xi), Site 2 holds (key, Xi+1 ... Xj), ..., Site r holds (key, Xm+1 ... Xd); every site keeps the key column

12
Motivation
  • Naïve approach: send all data to a trusted site and do k-means clustering there
  • Costly
  • Requires a trusted third party
  • Preferable: distributed privacy-preserving k-means

13
Basic K-means algorithm
  • 4 main steps
  • Step 1: randomly select k initial cluster centers (the k means)
  • Repeat:
  • Step 2: assign each point to its closest cluster center
  • Step 3: recalculate the k means with the new point assignments
  • Until (Step 4): the k means do not change (a plain sketch of these steps follows)
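A plain, centralized sketch of these four steps (illustrative only; the distributed protocol on the following slides performs the same steps over vertically partitioned data):

    # Centralized k-means following the 4 steps above (illustrative sketch).
    import random

    def kmeans(points, k, max_iter=100):
        # Step 1: randomly select k initial cluster centers (the k means).
        centers = random.sample(points, k)
        for _ in range(max_iter):
            # Step 2: assign every point to its closest cluster center.
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                clusters[j].append(p)
            # Step 3: recalculate the k means with the new assignment.
            new_centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[j]
                           for j, cl in enumerate(clusters)]
            # Step 4: stop when the k means do not change.
            if new_centers == centers:
                break
            centers = new_centers
        return centers

    pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
    print(kmeans(pts, k=2))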

14
Distributed k-means
  • Why k-means can be done over vertically partitioned data
  • All 4 steps are decomposable!
  • The most costly parts (steps 2 and 3) can be done locally
  • We will focus on step 2 (assign each point to its closest cluster center)

15
Step 1
  • All sites share the indices of the k initial random records used as the centroids

[Figure: each centroid µ1 ... µk is split across the sites in the same way as the data; Site 1 holds components µ1,1 ... µ1,i, Site 2 holds µ1,i+1 ... µ1,j, ..., Site r holds µ1,m+1 ... µ1,d, and similarly for µk.]

16
Step 2
  • Assign each point X to its closest cluster center
  • 1. Calculate the distance of point X = (X1, X2, ..., Xd) to each cluster center µk
  • -- each distance calculation is decomposable!
  • d² = [(X1 - µk1)² + ... + (Xi - µki)²] + [(Xi+1 - µk,i+1)² + ... + (Xj - µkj)²] + ... (one bracketed group per site)
  • 2. Compare the k full distances to find the minimum one

  • Partial distances: d1 at Site 1, d2 at Site 2, ...
  • For each point X, each site i holds a k-element vector of its partial distances to the k centroids, denoted Xi (see the sketch below)
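A minimal sketch of the decomposability claim (the column split between sites is an illustrative assumption): each site computes the squared-distance contribution of its own attributes, and the per-site partial distances sum to the full squared distance:

    # Decomposability of the squared distance over vertically partitioned data.
    def partial_sq_dist(x_cols, mu_cols):
        # Contribution of one site's attribute columns to d^2(X, mu).
        return sum((a - b) ** 2 for a, b in zip(x_cols, mu_cols))

    X  = [1.0, 2.0, 3.0, 4.0]    # the full record (never held at one site)
    mu = [0.0, 2.0, 2.0, 6.0]    # one cluster center, split the same way
    # Illustration: Site 1 holds attributes 0-1, Site 2 holds attributes 2-3.
    d1 = partial_sq_dist(X[:2], mu[:2])
    d2 = partial_sq_dist(X[2:], mu[2:])
    assert d1 + d2 == partial_sq_dist(X, mu)   # d^2 is the sum of partial distances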
17
Privacy concerns for step 2
  • Some concerns
  • The partial distances d1, d2, ... may breach privacy (they leak the Xi and µki); they need to be hidden
  • The distance of a point to each cluster may breach privacy; it needs to be hidden
  • Basic ideas to ensure security
  • Disguise the partial distances
  • Compare distances so that only the comparison result is learned
  • Permute the order of the clusters so the real meaning of the comparison results is unknown
  • Requires 3 non-colluding sites (P1, P2, Pr)

18
Secure Computing of Step 2
  • Stage 1: prepare for the secure sum of partial distances
  • P1 generates V1 + V2 + ... + Vr = 0; each Vi is a random k-element vector used to hide the partial distances of site i
  • Homomorphic encryption is used for the randomization: Ei(Xi) · Ei(Vi) = Ei(Xi + Vi)
  • Stage 2: compute the secure sum over r-1 parties
  • P1, P3, P4, ..., Pr-1 send their perturbed and permuted partial distances to Pr
  • Pr sums up the r-1 partial distances (including its own part); see the sketch below
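A minimal sketch of the masking behind Stages 1-2 (the homomorphic-encryption step that delivers each Vi to site i without disclosing it is omitted; the party count, vector sizes, and values are illustrative). Because the Vi sum to zero, the masked vectors still add up to the true totals of the partial distances:

    # Stages 1-2 sketch: zero-sum random vectors hide the partial distances,
    # yet the masked vectors still sum to the true totals at Pr.
    import random

    k, r = 3, 4                                    # clusters, parties

    # P1 generates V1 + V2 + ... + Vr = 0 (component-wise).
    V = [[random.uniform(-100, 100) for _ in range(k)] for _ in range(r - 1)]
    V.append([-sum(col) for col in zip(*V)])

    # Each site's k-element vector of partial distances to the k centroids.
    X = [[random.uniform(0, 10) for _ in range(k)] for _ in range(r)]

    masked = [[x + v for x, v in zip(X[i], V[i])] for i in range(r)]   # Xi + Vi
    total  = [sum(col) for col in zip(*masked)]    # Pr sums the masked vectors

    true_total = [sum(col) for col in zip(*X)]
    assert all(abs(a - b) < 1e-6 for a, b in zip(total, true_total))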

19
Secure Computing of Step 2
[Figure: data flow of Stages 1 and 2. Notation:]
  • Xi contains the partial distances to the k partial centroids at site i
  • Ei(Xi) · Ei(Vi) = Ei(Xi + Vi): homomorphic encryption; Ei is a public key
  • π(Xi): a permutation function that perturbs the order of the elements in Xi
  • V1 + V2 + ... + Vr = 0: the Vi are used to hide the partial distances
20
  • Stage 3: secure_add_and_compare to find the minimum distance
  • Involves only Pr and P2
  • Uses a standard secure multiparty computation protocol to find the result
  • Stage 4:
  • The index of the minimum distance (the permuted cluster id) is sent back to P1
  • P1 knows the permutation function and thus the original cluster id
  • P1 broadcasts the cluster id to all parties (see the sketch below)

  • Finding the minimum requires k-1 secure comparisons
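A sketch of the bookkeeping in Stages 3-4 (the k-1 secure comparisons are replaced here by a plain argmin; the permutation handling is the point being illustrated): Pr and P2 learn only the index of the minimum among permuted clusters, and P1, who knows the permutation, recovers and broadcasts the true cluster id:

    # Stages 3-4 sketch: minimum over permuted cluster ids, then P1 undoes the
    # permutation.  The real protocol replaces min() with k-1 secure comparisons
    # so that no party sees the distances themselves.
    import random

    distances = [4.2, 1.7, 3.9]                    # true distances to clusters 0..k-1
    k = len(distances)

    perm = list(range(k))
    random.shuffle(perm)                           # P1's secret permutation
    permuted = [distances[perm[j]] for j in range(k)]

    j_min = min(range(k), key=lambda j: permuted[j])   # learned by Pr and P2
    cluster_id = perm[j_min]                       # P1 maps it back and broadcasts it
    assert distances[cluster_id] == min(distances)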
21
Step 3 can also be done locally
  • Update partial means µi locally according to the
    new cluster assignments.

[Figure: the records X1 ... Xn are vertically partitioned across Site 1 ... Site r, and every site sees each record's cluster label, so each site can update its own components of every mean µi locally.]

22
Extra communication cost
  • O(nrk)
  • n: number of records
  • r: number of parties
  • k: number of means
  • Also depends on the number of iterations

23
Conclusion
  • It is appealing to have cryptographic privacy-preserving protocols
  • The cost is the major concern
  • The cost can be reduced using novel algorithms