Title: The Zigzag Graph Product and Constant-Degree Lossless Expanders
1. Privacy Preserving Learning of Decision Trees
Benny Pinkas, HP Labs. Joint work with Yehuda Lindell (done while at the Weizmann Institute).
2. Cryptographic methods vs. perturbation methods
[Figure: trade-off among overhead, inaccuracy, and lack of privacy, with the position of this work marked.]
3. A story
We're experiencing a lot of fraud lately.
Here too...
I can't find a pattern to recognize fraud in advance...
Neither can I...
- But what about
  - Patients' privacy
  - Business secrets
Maybe we should share information...
Have you heard of secure function evaluation?
This is all theory. It can't be efficient.
4. Privacy preserving data mining
P1: confidential database D1
P2: confidential database D2
Wish to mine D1 ∪ D2 without revealing any additional information
- Examples
  - Medical databases protected by law
  - Competing businesses
  - Government agencies (privacy, need to know)
5. Secure Function Evaluation [Yao 86]
- $F(x,y)$: a public function.
- Represented as a Boolean circuit $C(x,y)$.
- Implementation
  - Two passes
  - $O(|X|)$ oblivious transfers, $O(|C|)$ communication.
  - Pretty efficient for small circuits!
6. Our Contribution
- An efficient sub-linear protocol for secure computation of a complex, well-known data-mining algorithm (ID3), for semi-honest parties.
- A different approach offered by the data-mining community [AS00]
  - Perturb each entry (add random noise).
  - Analyze accuracy of using perturbed data as input to data mining algorithms.
  - How much privacy?
7. The classification problem
      Age > 30   Sex   How long do we know him/her?   Claim > 500   Did fraud occur?
C1    Yes        M     t ∈ [0,4] years                No            No
C2    No         F     t ∈ [5,9] years                Yes           Yes
Cn    Yes        F     t ∈ [10,15] years              No            No
8. Classification using Decision Trees
9. Privacy Preserving ID3
- Core of the problem: comparing entropies while preserving privacy (the entropy is a sum of terms of the form $x \log x$).
- Privacy: for each party, all intermediate values are random.
- Efficiency: most computation is done independently by the parties.
- Basic task: compute $x \log x$.
  - $x$ is, e.g., the number of patients with (age > 30) and (fraud = yes).
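To see why $x \log x$ is the basic task, the following minimal Python sketch computes a conditional entropy in two ways: directly from its definition, and as a combination of $x \ln x$ terms over raw counts. The function names and the example count table are illustrative assumptions, not part of the slides, and nothing here is privacy preserving.

```python
import math

def xlnx(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def conditional_entropy_direct(counts):
    """counts[j][i] = number of records with attribute value j and class i.

    Weighted average of the class entropy within each attribute value.
    """
    total = sum(sum(row) for row in counts)
    h = 0.0
    for row in counts:
        n_j = sum(row)
        if n_j == 0:
            continue
        h_j = -sum((c / n_j) * math.log(c / n_j) for c in row if c > 0)
        h += (n_j / total) * h_j
    return h

def conditional_entropy_xlnx(counts):
    """Same quantity, written only with x*ln(x) terms over counts."""
    total = sum(sum(row) for row in counts)
    per_value = sum(xlnx(sum(row)) for row in counts)
    per_cell = sum(xlnx(c) for row in counts for c in row)
    return (per_value - per_cell) / total

# Hypothetical counts: rows = (age > 30: yes / no), columns = (fraud: yes / no)
example = [[30, 10], [5, 55]]
h_direct = conditional_entropy_direct(example)
h_xlnx = conditional_entropy_xlnx(example)
```

Since the total count is the same for every attribute, comparing attributes only requires comparing sums of $x \ln x$ terms, where each $x$ is a count.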
10. Privacy Preserving ID3
- Computing $x \log x$
  - $x = x_1 + x_2$, where $x_1$ and $x_2$ are known to P1 and P2 respectively (independently computed from their databases).
  - Might as well compute $x \ln x$, since $\log x$ and $\ln x$ differ only by a constant factor.
  - First run a protocol to compute random shares $y_1 + y_2 = \ln x$.
  - $\ln x$ is a real number, but cryptography works over finite fields. Must do numerical analysis.
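11. Cryptographic Tools
- Secure Function Evaluation (SFE) [Yao]
- Oblivious Polynomial Evaluation [NP]
  - Input: the sender holds a polynomial $Q(\cdot)$; the receiver holds a value $x$.
  - Output: the receiver learns $Q(x)$ and nothing else; the sender learns nothing.
  - Implementation: two passes, $O(\text{degree})$ (or $O(\log |F|)$) exponentiations.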
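12. Computing random shares of $\ln x = \ln(x_1 + x_2)$
- Use a Taylor approximation for $\ln x$
  - $x = x_1 + x_2 = 2^n (1 + \varepsilon)$, with $-\tfrac{1}{2} < \varepsilon < \tfrac{1}{2}$
  - $\ln x = \ln(2^n (1 + \varepsilon)) = \ln 2^n + \ln(1 + \varepsilon)$
  - $\approx \ln 2^n + \sum_{i=1}^{k} (-1)^{i-1} \varepsilon^i / i = \ln 2^n + T(\varepsilon)$
  - $T(\varepsilon)$ is a polynomial of degree $k$. The error is exponentially small in $k$.
- We only know how to work over finite fields
  - Work in $F$, where $|F|$ is sufficiently large.
  - Compute $c \ln x$, where $c$ compensates for fractions.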
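The approximation above can be checked numerically in the clear. This is a minimal Python sketch with ordinary floating point, no secret sharing, and no finite field; `decompose`, `taylor_ln1p`, and `approx_ln` are names chosen for this example.

```python
import math

def decompose(x):
    """Return (n, eps) with x = 2**n * (1 + eps), 2**n the power of two closest to x.

    With this choice |eps| <= 1/2 for every positive integer x.
    """
    n = x.bit_length() - 1              # 2**n <= x < 2**(n + 1)
    if x - 2**n > 2**(n + 1) - x:       # the next power of two is closer
        n += 1
    eps = (x - 2**n) / 2**n
    return n, eps

def taylor_ln1p(eps, k):
    """Degree-k Taylor polynomial T(eps) of ln(1 + eps)."""
    return sum((-1) ** (i - 1) * eps**i / i for i in range(1, k + 1))

def approx_ln(x, k):
    """Approximate ln(x) as ln(2**n) + T(eps)."""
    n, eps = decompose(x)
    return n * math.log(2) + taylor_ln1p(eps, k)

# The error shrinks quickly as the degree k grows (x = 3 gives eps = 1/2).
err_k5 = abs(approx_ln(3, 5) - math.log(3))
err_k20 = abs(approx_ln(3, 20) - math.log(3))
```

Taking $x = 3$ exercises the largest value of $|\varepsilon|$ this decomposition produces, so it is the slowest-converging case for a given $k$.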
13. $\ln(x_1 + x_2)$ Protocol (Cont.)
- Step 1 of the protocol: find $n$, $\varepsilon$
- Apply Yao's protocol to the following small circuit
  - Input: $x_1$ and $x_2$
  - Output (random shares)
    - random $a_1$ and $a_2$ s.t. $a_1 + a_2 = x - 2^n = \varepsilon 2^n$
    - random $b_1$ and $b_2$ s.t. $b_1 + b_2 = \ln 2^n$
  - Operation: the protocol finds the $2^n$ closest to $x_1 + x_2$ and computes $\varepsilon 2^n = x_1 + x_2 - 2^n$.
  - $x = x_1 + x_2 = 2^n + \varepsilon 2^n$
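The input/output behavior of Step 1 can be sketched as a plain Python function. This is only a simulation of what the circuit outputs: the secure evaluation via Yao's protocol is replaced by ordinary code that sees both inputs. The modulus `P`, the fixed-point factor `SCALE`, and the helper names are assumptions made for this example.

```python
import math
import secrets

P = 2**127 - 1     # a Mersenne prime, standing in for the field F (assumption)
SCALE = 2**40      # fixed-point scale for the real value ln 2^n (assumption)

def share(value):
    """Split an integer into two additive shares modulo P."""
    s1 = secrets.randbelow(P)
    return s1, (value - s1) % P

def signed(v):
    """Map a field element back to a signed integer."""
    v %= P
    return v - P if v > P // 2 else v

def step1(x1, x2):
    """Simulate the circuit: shares of eps * 2^n and of (scaled) ln 2^n.

    n is returned only so that this example can check itself.
    """
    x = x1 + x2
    n = x.bit_length() - 1
    if x - 2**n > 2**(n + 1) - x:        # pick the closest power of two
        n += 1
    a = x - 2**n                         # eps * 2^n, an integer (may be negative)
    b = round(n * math.log(2) * SCALE)   # ln 2^n in fixed point
    return n, share(a), share(b)

n, (a1, a2), (b1, b2) = step1(600, 400)
```

Each share on its own is uniformly distributed modulo `P`; only the sum of the two shares carries information.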
14. $\ln(x_1 + x_2)$ Protocol (Cont.)
- Step 2 of the protocol
  - Compute random shares of $T(\varepsilon)$ (Taylor approximation)
  - P1 chooses a random $w_1 \in F$ and defines a polynomial $Q(x)$ s.t. $w_1 + Q(a_2) = T(\varepsilon)$ (recall $a_1 + a_2 = \varepsilon 2^n$)
  - Namely, $Q(x) = T((a_1 + x)/2^n) - w_1$.
  - Run an oblivious polynomial evaluation in which P2 computes $w_2 = Q(a_2) = T(\varepsilon) - w_1$.
- Now the parties have random $w_1$ and $w_2$ s.t.
  - $w_1 + w_2 = T(\varepsilon) \approx \ln(1 + \varepsilon)$
  - $(b_1 + w_1) + (b_2 + w_2) \approx \ln 2^n + \ln(1 + \varepsilon) = \ln x$
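The algebra of Step 2 can be checked with a small Python sketch. Two things are assumed here and do not come from the slides: the constant $c$ is taken to be $\mathrm{lcm}(2,\dots,k) \cdot 2^{nk}$ so that $c \, T(\varepsilon)$ is an integer, and the oblivious polynomial evaluation is replaced by a direct call, so this shows only the arithmetic, not the security. The modulus `P` and the degree `K` are also illustrative choices.

```python
import math
import secrets
from fractions import Fraction
from functools import reduce

P = 2**521 - 1   # a Mersenne prime, large enough for this demo (assumption)
K = 10           # degree of the Taylor polynomial (assumption)
LCM = reduce(lambda a, b: a * b // math.gcd(a, b), range(2, K + 1))

def scaled_taylor(a, n):
    """c * T(a / 2^n) mod P, with c = LCM * 2^(n*K) clearing all fractions."""
    total = 0
    for i in range(1, K + 1):
        coeff = (-1) ** (i - 1) * (LCM // i) * 2 ** (n * (K - i))
        total += coeff * pow(a, i, P)
    return total % P

def make_q(a1, n, w1):
    """P1's polynomial Q(z) = c * T((a1 + z) / 2^n) - w1 over the field."""
    return lambda z: (scaled_taylor((a1 + z) % P, n) - w1) % P

def step2(a1, a2, n):
    """Return shares (w1, w2) with w1 + w2 = c * T(eps) mod P."""
    w1 = secrets.randbelow(P)    # chosen at random by P1
    q = make_q(a1, n, w1)
    w2 = q(a2)                   # in the real protocol P2 learns this via OPE
    return w1, w2

# Example: x = 1000 = 2^10 - 24, so eps * 2^n = -24 with n = 10.
n_bits = 10
a1 = secrets.randbelow(P)
a2 = (-24 - a1) % P
w1, w2 = step2(a1, a2, n_bits)
```

Dividing the reconstructed value by $c$ recovers $T(\varepsilon)$ exactly as a rational number, which is what lets the parties work in a finite field while approximating a real-valued function.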
15. Computing $x \ln x$
- Tool: Multiply($c_1$, $c_2$)
  - Input: $c_1$, $c_2$
  - Output: $d_1$, $d_2$ s.t. $d_1 + d_2 = c_1 c_2$
  - How? OPE of $Q(z) = c_1 z - d_1$
  - $d_2 = Q(c_2) = c_1 c_2 - d_1$
- Actual task: $x \ln x$
  - Input: $x_1 + x_2 = x$, $c_1 + c_2 = \ln x$
  - Output: $x \ln x = (x_1 + x_2)(c_1 + c_2)$
  - Run Multiply($x_1$, $c_2$), Multiply($c_1$, $x_2$)
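The share arithmetic behind Multiply and the final $x \ln x$ step can be sketched in Python as follows. The oblivious polynomial evaluation is again replaced by a direct function call, the prime `P` is an assumed field size, and the scaled integer used for $\ln x$ in the example is a hypothetical fixed-point value.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime, standing in for the field F (assumption)

def multiply(c1, c2):
    """Return (d1, d2) with d1 + d2 = c1 * c2 (mod P).

    P1 holds c1 and picks d1 at random; P2 holds c2 and obtains
    d2 = Q(c2) for Q(z) = c1*z - d1 (via OPE in the real protocol).
    """
    d1 = secrets.randbelow(P)
    q = lambda z: (c1 * z - d1) % P
    d2 = q(c2)
    return d1, d2

def x_ln_x_shares(x1, x2, c1, c2):
    """Shares of (x1 + x2) * (c1 + c2): P1 holds x1, c1 and P2 holds x2, c2."""
    d1, d2 = multiply(x1, c2)    # shares of the cross term x1 * c2
    e1, e2 = multiply(c1, x2)    # shares of the cross term c1 * x2
    s1 = (x1 * c1 + d1 + e1) % P   # computed locally by P1
    s2 = (x2 * c2 + d2 + e2) % P   # computed locally by P2
    return s1, s2

# Example with a hypothetical fixed-point encoding of ln(1000).
ln_x_scaled = 6907755
c1 = secrets.randbelow(P)
c2 = (ln_x_scaled - c1) % P
s1, s2 = x_ln_x_shares(600, 400, c1, c2)
```

The products $x_1 c_1$ and $x_2 c_2$ need no interaction, since each involves values held by a single party; only the two cross terms require Multiply.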
16. The rest of the work...
- Each party computes a share of the entropy by summing shares of $x \ln x$
- A small circuit finds the attribute giving the minimal conditional entropy
- The attribute is assigned to the node
- The databases are divided according to the value of this attribute
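The first two bullets can be sketched in Python under simplifying assumptions: the entropy terms are already available as additive shares of integers modulo an assumed prime `P`, and the small comparison circuit is replaced by reconstructing the totals in the clear. The attribute names and term values are hypothetical.

```python
import secrets

P = 2**61 - 1   # a Mersenne prime, standing in for the field F (assumption)

def share(value):
    """Split an integer into two additive shares modulo P."""
    s1 = secrets.randbelow(P)
    return s1, (value - s1) % P

def reconstruct(s1, s2):
    """Combine two shares into a signed integer."""
    v = (s1 + s2) % P
    return v - P if v > P // 2 else v

def pick_attribute(term_shares):
    """term_shares maps attribute -> list of (P1 share, P2 share) of entropy terms.

    Each party sums its own shares locally; the final comparison stands in
    for the small secure circuit and is done in the clear here.
    """
    totals = {}
    for attr, shares in term_shares.items():
        t1 = sum(s1 for s1, _ in shares) % P    # local work of P1
        t2 = sum(s2 for _, s2 in shares) % P    # local work of P2
        totals[attr] = reconstruct(t1, t2)
    return min(totals, key=totals.get)

# Hypothetical scaled entropy terms per attribute.
terms = {"age": [120, -30, 45], "sex": [80, -10, 20]}
term_shares = {a: [share(v) for v in vs] for a, vs in terms.items()}
best = pick_attribute(term_shares)
```

Because the shares are additive, summing them requires no communication, which is consistent with most of the computation being done independently by the parties.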
17. Efficiency
- $\ln x$ protocol
- secure computation of a small circuit
- one oblivious polynomial evaluation
- ID3 for a database with
- 1,000,000 transactions
- 15 attributes
- 10 values per attribute
- 4 class values
- Communication per node takes seconds (over a T1 line)
- Computation per node takes minutes (on a Pentium III)
18. Issues
- Only two participants
- Curious but honest participants
- Approximating $\ln x$ gives an approximation of ID3
- The participants learn the decision tree, which reveals some information
19. Contributions
- A cryptographic protocol where the bulk of the operations is done independently.
- Data mining
  - Rigorous model for secure data mining.
  - Efficient, secure protocol for ID3.
- Cryptography
  - Sub-linear complexity: secure computation for large data sets.
  - An efficient protocol for a complex known algorithm.
  - Secure computation of logarithms (a real function, requiring numerical analysis).