The Zigzag Graph Product and Constant-Degree Lossless Expanders - PowerPoint PPT Presentation

1
Privacy Preserving Learning of Decision Trees
Benny Pinkas, HP Labs
Joint work with Yehuda Lindell (done while at the Weizmann Institute)
2
Cryptographic methods vs. perturbation methods
[Figure: diagram with axes labeled overhead, inaccuracy, and lack of privacy; "This work" is marked on it]
3
A story
We're experiencing a lot of fraud lately.
Here too..
I can't find a pattern to recognize fraud in
advance..
Neither can I..
  • But, what about
  • Patients' privacy
  • Business secrets

Maybe we should share information..
Have you heard of secure function evaluation?
This is all theory. It can't be efficient.
4
Privacy preserving data mining
P1: confidential database D1
P2: confidential database D2
Wish to mine D1 ∪ D2 without revealing more info
  • Examples
  • Medical databases protected by law
  • Competing businesses
  • Government agencies (privacy, need to know)

5
Secure Function Evaluation [Yao 86]
  • F(x,y): a public function.
  • Represented as a Boolean circuit C(x,y).
  • Implementation
  • Two passes
  • O(|X|) oblivious transfers, O(|C|)
    communication.
  • Pretty efficient for small circuits!

6
Our Contribution
  • An efficient sub-linear protocol for secure
    computation of a complex, well-known data-mining
    algorithm (ID3), for semi-honest parties.
  • A different approach offered by the data-mining
    community [AS00]
  • Perturb each entry (add random noise).
  • Analyze accuracy of using perturbed data as input
    to data mining algorithms.
  • How much privacy?

7
The classification problem
   | Age > 30 | Sex | How long do we know him/her? | Claim > 500 | Did fraud occur?
C1 | Yes      | M   | t ∈ [0,4] years              | No          | No
C2 | No       | F   | t ∈ [5,9] years              | Yes         | Yes
...
Cn | Yes      | F   | t ∈ [10,15] years            | No          | No
8
Classification using Decision Trees
9
Privacy Preserving ID3
  • Core of the problem: comparing entropies while
    preserving privacy (entropy is built from terms
    of the form x log x).
  • Privacy: for each party, all intermediate values
    are random.
  • Efficiency: most computation is done independently
    by the parties.
  • Basic task: compute x log x.
  • x = e.g. the number of patients with (age > 30)
    and (fraud = yes)
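To make the "basic task" concrete, here is a minimal plaintext sketch (no privacy at all) showing that the conditional entropy ID3 compares can be written purely as a sum of x ln x terms over integer counts. The toy records and all function names are hypothetical and are not taken from the presentation.

```python
import math
from collections import Counter

def xlnx(x):
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def conditional_entropy_direct(rows, attr, label):
    """Textbook form: sum over attribute values of P(value) * H(label | value)."""
    total = len(rows)
    n_value = Counter(r[attr] for r in rows)
    n_pair = Counter((r[attr], r[label]) for r in rows)
    h = 0.0
    for (value, _), n in n_pair.items():
        p = n / n_value[value]
        h -= (n_value[value] / total) * p * math.log(p)
    return h

def conditional_entropy_xlnx(rows, attr, label):
    """Same quantity, written only with x*ln(x) terms over counts."""
    total = len(rows)
    n_value = Counter(r[attr] for r in rows)
    n_pair = Counter((r[attr], r[label]) for r in rows)
    return (sum(xlnx(n) for n in n_value.values())
            - sum(xlnx(n) for n in n_pair.values())) / total

# Hypothetical toy records, for illustration only.
ROWS = [
    {"age_gt_30": "yes", "sex": "M", "fraud": "no"},
    {"age_gt_30": "no", "sex": "F", "fraud": "yes"},
    {"age_gt_30": "yes", "sex": "F", "fraud": "no"},
    {"age_gt_30": "no", "sex": "M", "fraud": "no"},
    {"age_gt_30": "no", "sex": "F", "fraud": "yes"},
]
```

Because each count x splits as x1 + x2 across the two databases, the only non-local operation left in this form is evaluating x ln x on a shared x.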

10
Privacy Preserving ID3
  • Computing x log x
  • x = x1 + x2, with x1 and x2 known to P1 and P2
    respectively (independently computed from their
    databases).
  • Might as well compute x ln x (ln x and log x
    differ only by a constant factor).
  • First run a protocol to compute random shares
    y1, y2 s.t. y1 + y2 = ln x
  • ln x is a real number, but crypto works over
    finite fields. Must do numerical analysis.

11
Cryptographic Tools
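  • Secure Function Evaluation (SFE) [Yao]
  • Oblivious Polynomial Evaluation [NP]

  • Input: one party holds a polynomial Q(·), the
    other holds a value x
  • Output: the party holding x learns Q(x) and
    nothing else; the party holding Q learns nothing
  • Implementation: two passes, O(degree) (or
    O(log |F|)) exponentiations.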
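The input/output behavior of oblivious polynomial evaluation can be pinned down with a small "trusted party" stand-in. This is only a sketch of the functionality, not of the cryptographic protocol: it gives no privacy, and the names `evaluate_poly` and `ideal_ope` are hypothetical.

```python
def evaluate_poly(coeffs, x, p):
    """Horner evaluation of sum(coeffs[i] * x**i) modulo a prime p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def ideal_ope(sender_coeffs, receiver_x, p):
    """Ideal functionality: returns (sender_output, receiver_output).

    The sender (who holds Q) gets nothing; the receiver (who holds x)
    gets Q(x). A real protocol would reach the same outputs without
    any party seeing the other's input.
    """
    return None, evaluate_poly(sender_coeffs, receiver_x, p)
```

For example, with Q(z) = 3 + 2z² over the integers modulo 101, `ideal_ope([3, 0, 2], 5, 101)` hands the receiver 3 + 2·25 = 53 and hands the sender `None`.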
12
Computing random shares of ln x = ln(x1 + x2)
  • Use Taylor approximation for ln x
  • x = x1 + x2 = 2^n (1 + ε), with -½ < ε < ½
  • ln x = ln(2^n (1 + ε)) = ln 2^n + ln(1 + ε)
  • ≈ ln 2^n + Σ
    i=1..k (-1)^(i-1) ε^i / i
  • = ln 2^n + T(ε)
  • T(ε) is a polynomial of degree k. Error is
    exponentially small in k.
  • We only know how to work over finite fields
  • Work in F, where |F| is sufficiently large.
  • Compute c·ln x, where c compensates for fractions.
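The approximation above can be checked numerically. The following is a floating-point sketch only (the protocol itself works over a finite field); the function names are hypothetical, and ties between two equally close powers of two are broken upward, which is an assumption made here so that ε stays strictly below ½.

```python
import math

def decompose(x):
    """Return (n, eps) with x = 2**n * (1 + eps), 2**n a closest power of two."""
    n = x.bit_length() - 1              # 2**n <= x < 2**(n + 1)
    if x - 2**n >= 2**(n + 1) - x:      # the upper power is at least as close
        n += 1
    eps = (x - 2**n) / 2**n
    return n, eps

def taylor_ln1p(eps, k):
    """T(eps): degree-k Taylor polynomial of ln(1 + eps)."""
    return sum((-1) ** (i - 1) * eps**i / i for i in range(1, k + 1))

def approx_ln(x, k):
    """ln x approximated as ln(2**n) + T(eps)."""
    n, eps = decompose(x)
    return n * math.log(2) + taylor_ln1p(eps, k)
```

For x = 1000 the closest power of two is 2^10 = 1024, so ε = -24/1024, and a handful of terms already matches `math.log(1000)` to many digits.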

13
ln(x1 + x2) Protocol (Cont.)
  • Step 1 of the protocol: find n and ε
  • Apply Yao's protocol to the following small
    circuit
  • Input: x1 and x2
  • Output (random shares):
  • random a1 and a2 s.t. a1 + a2 = x - 2^n = ε·2^n
  • random b1 and b2 s.t. b1 + b2 = ln 2^n
  • Operation: the circuit finds the 2^n closest to
    x1 + x2 and computes ε·2^n = x1 + x2 - 2^n.
  • x = x1 + x2 = 2^n + ε·2^n
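A plaintext stand-in for the Step 1 circuit might look as follows. It computes in the clear what Yao's protocol would compute securely, so it offers no privacy. The field modulus, the fixed-point factor `scale` used to represent ln 2^n as a field element, and the choice to return n for inspection are all assumptions of this sketch, not details given on the slide.

```python
import math
import secrets

P = 2**61 - 1  # a Mersenne prime, assumed large enough for this example

def signed(v, p=P):
    """Map a field element back to a signed integer in (-p/2, p/2]."""
    return v - p if v > p // 2 else v

def step1(x1, x2, scale, p=P):
    """Return (P1's shares, P2's shares, n) for x = x1 + x2."""
    x = x1 + x2
    n = x.bit_length() - 1
    if x - 2**n >= 2**(n + 1) - x:      # pick a closest power of two
        n += 1
    delta = x - 2**n                    # equals eps * 2**n, may be negative
    ln2n = round(scale * n * math.log(2))  # fixed-point encoding of ln 2**n
    a1 = secrets.randbelow(p)
    a2 = (delta - a1) % p               # a1 + a2 = eps * 2**n  (mod p)
    b1 = secrets.randbelow(p)
    b2 = (ln2n - b1) % p                # b1 + b2 = scale * ln 2**n  (mod p)
    return (a1, b1), (a2, b2), n
```

Each share on its own is a uniformly random field element; only the sums carry information, and negative values such as ε·2^n are recovered with `signed`.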

14
ln(x1 + x2) Protocol (Cont.)
  • Step 2 of the protocol
  • Compute random shares of T(ε) (Taylor approx.)
  • P1 chooses a random w1 ∈ F and defines a
    polynomial Q(x), s.t. w1 + Q(a2) = T(ε)
    (recall a1 + a2 = ε·2^n)
  • Namely, Q(x) = T((a1 + x)/2^n) - w1.
  • Run an oblivious poly evaluation in which P2
    computes
  • w2 = Q(a2) = T(ε) - w1.
  • Now the parties have random w1 and w2 s.t.
  • w1 + w2 = T(ε) ≈ ln(1 + ε)
  • (b1 + w1) + (b2 + w2) ≈ ln 2^n + ln(1 + ε) = ln x
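Step 2 can be sketched over a prime field by clearing denominators first. The scaling constant c = lcm(1..k)·2^(nk), the modulus, and the function names are assumptions of this sketch; it also treats n as known when Q is built, as the formula for Q(x) suggests, and it replaces the oblivious polynomial evaluation with a direct call, so nothing here is private.

```python
import math
import secrets
from fractions import Fraction

P = 2**127 - 1  # a Mersenne prime, assumed "sufficiently large"

def lcm_upto(k):
    """Least common multiple of 1..k; clears the 1/i denominators in T."""
    out = 1
    for i in range(2, k + 1):
        out = out * i // math.gcd(out, i)
    return out

def scale_factor(n, k):
    """c such that c * T(d / 2**n) is an integer for every integer d."""
    return lcm_upto(k) * 2 ** (n * k)

def make_q(a1, w1, n, k, p=P):
    """P1's polynomial Q(z) = c * T((a1 + z) / 2**n) - w1 over the field."""
    lcm = lcm_upto(k)
    def q(z):
        d = (a1 + z) % p
        total = 0
        for i in range(1, k + 1):
            term = (lcm // i) * 2 ** (n * (k - i)) * pow(d, i, p)
            total += term if i % 2 == 1 else -term
        return (total - w1) % p
    return q

def step2(a1, a2, n, k, p=P):
    """Return shares (w1, w2) with w1 + w2 = c * T(eps) mod p."""
    w1 = secrets.randbelow(p)
    q = make_q(a1, w1, n, k, p)
    w2 = q(a2)  # stand-in for the OPE: P2 would learn only this value
    return w1, w2

def signed(v, p=P):
    """Map a field element back to a signed integer in (-p/2, p/2]."""
    return v - p if v > p // 2 else v
```

Dividing the reconstructed value `signed((w1 + w2) % P)` by `scale_factor(n, k)` gives T(ε) exactly as a rational number, provided |c·T(ε)| stays below P/2.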

15
Computing x ln x
  • Tool: Multiply(c1, c2)
  • Input: c1, c2
  • Output: d1, d2 s.t. d1 + d2 = c1·c2
  • How? OPE of Q(z) = c1·z - d1
  • d2 = Q(c2) = c1·c2 - d1
  • Actual task: x ln x
  • Input: x1 + x2 = x, c1 + c2 = ln x
  • Output: x ln x = (x1 + x2)(c1 + c2)
  • Run Multiply(x1, c2) and Multiply(c1, x2)
    (x1·c1 and x2·c2 are computed locally)
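The share arithmetic of this slide can be sketched as follows. The OPE inside Multiply is replaced by a direct evaluation of Q, so this illustrates only the algebra, not the privacy; the modulus and the names `multiply` and `share_x_lnx` are hypothetical, and ln x is assumed to be already encoded as a field element.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, assumed large enough for this example

def multiply(c1, c2, p=P):
    """Return (d1, d2) with d1 + d2 = c1 * c2 mod p.

    P1 holds c1 and picks a random d1; P2 holds c2 and would obtain
    d2 = Q(c2) for Q(z) = c1*z - d1 through an OPE.
    """
    d1 = secrets.randbelow(p)
    d2 = (c1 * c2 - d1) % p
    return d1, d2

def share_x_lnx(x1, c1, x2, c2, p=P):
    """Shares of (x1 + x2) * (c1 + c2); P1 holds (x1, c1), P2 holds (x2, c2)."""
    u1, u2 = multiply(x1, c2, p)    # cross term x1 * c2
    v1, v2 = multiply(c1, x2, p)    # cross term c1 * x2
    s1 = (x1 * c1 + u1 + v1) % p    # x1 * c1 is local to P1
    s2 = (x2 * c2 + u2 + v2) % p    # x2 * c2 is local to P2
    return s1, s2
```

Adding the two output shares modulo P gives x1·c1 + x1·c2 + c1·x2 + x2·c2, which is the full product.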

16
The rest of the work..
  • Each party computes a share of the entropy by
    summing shares of x lnx
  • A small circuit finds the attribute giving the
    minimal conditional entropy
  • The attribute is assigned to the node
  • The databases are divided according to the value
    of this attribute
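The first two steps above can be sketched with additive shares: each party sums its own shares locally, and a stand-in for the small circuit reveals only which attribute wins. The attribute names, the integer "scaled entropy terms", and all function names are hypothetical; the stand-in reconstructs values in the clear, which the real circuit would not do.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, assumed large enough for this example

def share(value, p=P):
    """Split value into two additive shares modulo p."""
    s1 = secrets.randbelow(p)
    return s1, (value - s1) % p

def signed(v, p=P):
    """Map a field element back to a signed integer in (-p/2, p/2]."""
    return v - p if v > p // 2 else v

def local_sum(shares, p=P):
    """A party adds up its own shares; no interaction is needed."""
    return sum(shares) % p

def min_attribute(sums1, sums2, p=P):
    """Stand-in for the small secure circuit: outputs only the best attribute."""
    return min(sums1, key=lambda a: signed((sums1[a] + sums2[a]) % p, p))

def choose_attribute(terms_by_attr, p=P):
    """terms_by_attr maps an attribute to its (signed, scaled) x*ln(x) terms."""
    sums1, sums2 = {}, {}
    for attr, terms in terms_by_attr.items():
        pairs = [share(t % p, p) for t in terms]
        sums1[attr] = local_sum([s1 for s1, _ in pairs], p)
        sums2[attr] = local_sum([s2 for _, s2 in pairs], p)
    return min_attribute(sums1, sums2, p)
```

With the hypothetical input `{"age": [120, 300, -50], "sex": [200, 10, 5]}` the totals are 370 and 215, so `"sex"` is selected.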

17
Efficiency
  • ln x protocol
  • secure computation of a small circuit
  • one oblivious polynomial evaluation
  • ID3 for a database with
  • 1,000,000 transactions
  • 15 attributes
  • 10 values per attribute
  • 4 class values
  • Communication per node takes seconds (T1)
  • Computation per node takes minutes (P3)

18
Issues
  • Only two participants
  • Curious but honest participants
  • Approximating ln x gives an approximation of ID3
  • The participants learn the decision tree, which
    reveals some information

19
Contributions
  • A cryptographic protocol where the bulk of the
    operations is done independently.
  • Data mining
  • Rigorous model for secure data-mining.
  • Efficient, secure protocol for ID3.
  • Cryptography
  • Sub-linear complexity - secure computation for
    large data sets.
  • An efficient protocol for a complex known
    algorithm.
  • Secure computation of logarithms (real function -
    numerical analysis).