Privacy Preserving Data Mining - PowerPoint PPT Presentation


Slides: 16
Provided by: csU5
Learn more at: http://web.cs.ucla.edu

Transcript and Presenter's Notes

Title: Privacy Preserving Data Mining


1
Privacy Preserving Data Mining
  • Yehuda Lindell, Benny Pinkas

2
Summary
  • Objective
  • Various components / tools needed
  • Algorithm

3
Objective
  • Perform Data-mining on union of two private
    databases
  • Data stays private i.e. no party learns anything
    but output

4
Assumptions
  • Large databases: generic solutions are not feasible
  • Semi-Honest Parties

5
Classification by Decision Tree Learning
  • <attribute, value>
  • Want to predict class, using only non-class
    attributes

[Figure labels: Attributes, Class Attribute, Transaction]

6
Decision Tree
  • Rooted tree with nodes/edges
  • Internal nodes => attributes
  • Edges leaving nodes => possible values
  • Leaves => expected class for transaction
  • Traverse tree using known attributes
  • Predict class given leaf nodes value
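The traversal described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the tree contents, the attribute names ("outlook", "humidity") and the `predict` helper are all hypothetical.

```python
# Hypothetical toy tree. Internal nodes are dicts that name an attribute and
# map each possible value (an edge) to a subtree; leaves are class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": "no",
    },
}


def predict(node, transaction):
    """Follow the edges matching the transaction's known attribute values."""
    while isinstance(node, dict):          # still at an internal node
        value = transaction[node["attribute"]]
        node = node["branches"][value]     # take the edge for that value
    return node                            # a leaf: the expected class


prediction = predict(tree, {"outlook": "sunny", "humidity": "normal"})
```

Only the attributes on the path from the root to the leaf are read, so a transaction does not need a value for every attribute.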

7
Constructing Tree
  • Top-down
  • At each level find attribute that best
    classifies transactions => gives least overhead
  • Best => attribute that minimizes entropy
    (maximizes information gain)
  • Entropy = -x ln x
  • Entropy of class 0

8
Entropy calculations
  • Entropy H(T) = sum(-x ln x)
  • Hc(T) => info needed to ID class of transaction T
  • x = fraction of transactions in each class
  • Sum over all possible classes
  • Hc(T|A) => info needed to ID class of
    transaction T, given value v of attribute A
  • x = fraction computed among transactions with
    value v for attribute A
  • Gain = Hc(T) - Hc(T|A)
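A small worked example may help make the entropy and gain quantities concrete. It is a sketch on a single, non-private database; the function names and the toy data are assumptions made for illustration and do not come from the presentation.

```python
from collections import Counter
from math import log


def entropy(transactions, class_attr):
    """Hc(T): sum over classes of -x ln x, x = fraction of T in that class."""
    total = len(transactions)
    counts = Counter(t[class_attr] for t in transactions)
    return -sum((n / total) * log(n / total) for n in counts.values())


def conditional_entropy(transactions, attr, class_attr):
    """Hc(T|A): entropy of each subset T(v), weighted by |T(v)| / |T|."""
    total = len(transactions)
    subsets = {}
    for t in transactions:
        subsets.setdefault(t[attr], []).append(t)
    return sum(len(s) / total * entropy(s, class_attr) for s in subsets.values())


def gain(transactions, attr, class_attr):
    """Gain = Hc(T) - Hc(T|A)."""
    return entropy(transactions, class_attr) - conditional_entropy(
        transactions, attr, class_attr
    )


# Hypothetical data: "outlook" determines the class, "windy" says nothing.
data = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
best = max(["outlook", "windy"], key=lambda a: gain(data, a, "play"))
```

On this data the gain for "outlook" equals ln 2 (it removes all uncertainty about the class) and the gain for "windy" is 0, so "outlook" would be chosen as the split attribute.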

9
Private Computation
  • Given only x1 and f1(x1,y), function S1 exists
    s.t. S1(x1, f1(x1,y)) is indistinguishable from
    Party 1's view of the protocol
  • P2 provides input x1 to P1
  • P2 can compute corresponding view of P1's DB
    (desired <att, value> pairs)

[Figure labels: Party 1, Party 2, x1, f1, f1(x1,y), S1, View]

10
Oblivious Evaluation
  • What if in previous example Party 2 does not
    want Party 1 to know what input (x1) it is
    providing?
  • Oblivious evaluation: receiver obtains P(x)
    without learning anything else about polynomial
    P. Sender learns nothing about x.

11
Oblivious Evaluation (2): Simplified Version
  • r_i = receiver's random number
  • R_i = sender's random number
  • x = input from receiver
  • s (secret key)
  • Receiver -> Sender: (a^(r_i), a^(s r_i) a^x)
  • Sender -> Receiver: (a^(R_i) a^(r_i), a^(s R_i) a^(P(x)) a^(s r_i))

Divide 2nd element by 1st element raised to power
s to get a^(P(x)):
a^(P(x)) = (a^(s R_i) a^(P(x)) a^(s r_i)) / (a^(R_i) a^(r_i))^s

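
The "divide the 2nd element by the 1st raised to s" step can be checked numerically. The sketch below is one possible reading of the simplified slide, not the protocol from the paper: it assumes ElGamal-style encryption in the exponent, a degree-1 polynomial P(x) = b1*x + b0, and that the sender knows the public value a^s. The modulus, base, function names and sample values are all hypothetical, and the parameters are far too small to be secure.

```python
import secrets

P_MOD = 2**31 - 1   # toy prime modulus (assumption; not secure)
A = 7               # the base "a"


def receiver_encrypt(x, s):
    """Receiver -> Sender: (a^r, a^(s r) * a^x)."""
    r = secrets.randbelow(P_MOD - 2) + 1            # receiver's random number
    return pow(A, r, P_MOD), pow(A, s * r + x, P_MOD)


def sender_evaluate(msg, h, b1, b0):
    """Build an encryption of a^(b1*x + b0) from msg and h = a^s only."""
    c1, c2 = msg
    R = secrets.randbelow(P_MOD - 2) + 1            # sender's random number
    first = pow(c1, b1, P_MOD) * pow(A, R, P_MOD) % P_MOD
    second = pow(c2, b1, P_MOD) * pow(h, R, P_MOD) * pow(A, b0, P_MOD) % P_MOD
    return first, second


def receiver_decrypt(reply, s):
    """Divide the 2nd element by the 1st element raised to the power s."""
    first, second = reply
    return second * pow(pow(first, s, P_MOD), -1, P_MOD) % P_MOD


s = secrets.randbelow(P_MOD - 2) + 1   # receiver's secret key
h = pow(A, s, P_MOD)                   # a^s, assumed known to the sender
x, b1, b0 = 5, 3, 4                    # hypothetical input and P(x) = 3x + 4
result = receiver_decrypt(sender_evaluate(receiver_encrypt(x, s), h, b1, b0), s)
assert result == pow(A, b1 * x + b0, P_MOD)   # receiver holds a^P(x)
```

With b1 = 1 the sender's reply has the same shape as on the slide: a^R * a^r in the first element and a^(sR) * a^(P(x)) * a^(sr) in the second. The sender never uses x or s directly, and the receiver ends up with a^(P(x)) rather than P(x) itself. Python 3.8 or later is assumed for the modular inverse `pow(v, -1, m)`.
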
12
Algorithm
  • Step 1 - Each party computes ID3 decision tree
    learning (O(# attributes))
  • Step 2 - Combine results using cryptographic
    protocols like oblivious evaluation -
    (O(log(# transactions)))
  • Result - Each party gains results of data-mining
    without learning more than necessary

13
Algorithm (2): Finding best attribute is hardest
part
  • Each party computes their share of entropy
  • For each attribute, combine values from each
    party
  • Results in private computation of entropy (-x ln x)
  • Choose attribute that minimizes entropy
  • Provides maximum information gain
  • Ensures most efficient tree with least overhead
  • Use oblivious evaluation
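
To show which quantity the two parties are jointly computing, the sketch below combines per-party counts and picks the attribute with the lowest conditional entropy over the union of the two databases. It is deliberately NOT private: the counts are added in the clear, whereas the slides have this combination done through a private protocol so that neither party sees the other's values. All function names and data are hypothetical.

```python
from collections import Counter
from math import log


def local_counts(transactions, attr, class_attr):
    """One party's local step: count (attribute value, class) pairs."""
    return Counter((t[attr], t[class_attr]) for t in transactions)


def conditional_entropy_from_counts(counts):
    """Hc(T|A) computed from joint (value, class) counts."""
    total = sum(counts.values())
    per_value = Counter()
    for (value, _cls), n in counts.items():
        per_value[value] += n
    h = 0.0
    for (value, _cls), n in counts.items():
        frac = n / per_value[value]            # x within the subset T(v)
        h -= (per_value[value] / total) * frac * log(frac)
    return h


def best_attribute(db1, db2, attrs, class_attr):
    """Choose the attribute that minimizes entropy over the union."""
    scores = {}
    for attr in attrs:
        # Insecure stand-in: counts are simply added in the clear here.
        combined = local_counts(db1, attr, class_attr) + local_counts(
            db2, attr, class_attr
        )
        scores[attr] = conditional_entropy_from_counts(combined)
    return min(scores, key=scores.get)


db1 = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
db2 = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
chosen = best_attribute(db1, db2, ["outlook", "windy"], "play")
```

In this toy data each party's own database makes "windy" look like a perfect predictor, but over the union only "outlook" is, which is why the choice has to be made on combined values.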

14
Discussion of Algorithm
  • Efficient
  • Large databases accommodated: algorithm relies on
    number of possible values for attributes, NOT
    number of transactions in database
  • Private
  • Each step depends on local computation and
    private protocol
  • Uses techniques like oblivious transfer /
    evaluation to exchange information
  • Paper proves individual steps are private, AND
    control flow between steps can be predicted based
    ONLY on input/output, so the whole protocol is
    also private

15
Discussion of Algorithm (2)
  • Approximate ID3 used instead of actual ID3;
    shown to be as secure and to provide the same
    information