Private Matching - PowerPoint PPT Presentation

Private Matching

Provided by: Ben5153
Transcript and Presenter's Notes

1
Privacy Preserving Data Mining, Lecture 2: Cryptographic Solutions
Benny Pinkas, HP Labs, Israel
2
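Secure two-party computation - definition
[Figure: one party holds input x and the other holds input y. The output is F(x,y) and nothing else, as if x and y had been handed to a trusted party that returns F(x,y).]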
3
Secure Function Evaluation
  • A major topic of cryptographic research
  • How to let n parties, P1,...,Pn, compute a function
    F(x1,...,xn)
  • where input xi is known to party Pi
  • Parties learn the final output and nothing else
  • Caveat: cryptographic definitions of secure
    computation are both too strong and too weak
  • Too strong: they do not allow leakage of harmless
    information; the price of this extra security is
    paid in efficiency.
  • Too weak: they do not address leakage or misuse caused
    by the function itself (e.g., information implied
    by the outputs, or misbehavior in choosing an
    input).

4
Leak no other information
  • A protocol is secure if it emulates the ideal
    solution
  • Alice learns F(x,y), and therefore can compute
    everything that is implied by x, her prior
    knowledge of y, and F(x,y).
  • Alice must not be able to compute anything else
  • Simulation:
  • A protocol is considered secure if
  • for every adversary in the real world
  • there exists a simulator in the ideal world
    that outputs an indistinguishable transcript,
    given access to the information that the
    adversary is allowed to learn in the ideal model.

5
Secure Function Evaluation
  • Major result [Yao]: any function that can be
    evaluated using polynomial resources can be
    securely evaluated using polynomial
    resources (under some cryptographic assumption)

6
SFE building block: 1-out-of-2 Oblivious Transfer
[Figure: Bob holds Y0, Y1; Alice holds j ∈ {0,1}.]
  • 1-out-of-2 OT can be based on most public key
    systems
  • There are implementations with two communication
    rounds
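
The claim that 1-out-of-2 OT can be built from a public key system can be illustrated with a toy version of one classical RSA-based construction (in the style of Even, Goldreich and Lempel). This is a sketch under assumptions, not necessarily the construction the lecture has in mind: the primes are tiny, there is no padding, it exchanges three messages rather than two, and the names `keygen` and `oblivious_transfer` are made up for this illustration. Both roles run inside one function only so that the message flow is easy to follow.

```python
import secrets

def keygen():
    """Bob's toy RSA key pair (primes far too small for real use)."""
    p, q = 61, 53
    n, phi = p * q, (p - 1) * (q - 1)
    e = 17
    d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+
    return n, e, d

def oblivious_transfer(y0, y1, j):
    """Alice (choice bit j) learns y_j; y0, y1 must be integers in [0, n)."""
    n, e, d = keygen()
    # Bob -> Alice: public key (n, e) and two random values x0, x1.
    x0, x1 = secrets.randbelow(n), secrets.randbelow(n)
    # Alice -> Bob: the value matching her choice, blinded by k^e.
    k = secrets.randbelow(n)
    v = ((x1 if j else x0) + pow(k, e, n)) % n
    # Bob: exactly one of k0, k1 equals k, but v does not reveal which.
    k0 = pow((v - x0) % n, d, n)
    k1 = pow((v - x1) % n, d, n)
    # Bob -> Alice: both values, each masked with one candidate key.
    c0, c1 = (y0 + k0) % n, (y1 + k1) % n
    # Alice can unmask only the value she chose.
    return ((c1 if j else c0) - k) % n

if __name__ == "__main__":
    print(oblivious_transfer(1234, 2345, 0), oblivious_transfer(1234, 2345, 1))
```

Alice's other value stays masked by a key she cannot compute without Bob's private exponent, which is the property the garbled circuit protocol relies on.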

7
General two-party computation
  • Two-party protocol
  • Input
  • Sender: function F (in some representation)
  • The sender's input y is already embedded in F
  • Receiver: x ∈ {0,1}^n
  • Output
  • Receiver: F(x) and nothing else about F
  • Sender: nothing about x

8
Representations of F
  • Boolean circuits [Yao, GMW, ...]
  • Algebraic circuits [BGW, ...]
  • Low-degree polynomials [BFKR]
  • Matrix product over a large field [FKN, IK]
  • Randomizing polynomials [IK]
  • Communication complexity protocol [NN]

9
Secure two-party computation of general functions
[Yao]
  • First, represent the function F as a Boolean
    circuit C
  • It's always possible
  • Sometimes it's easy (additions, comparisons)
  • Sometimes the result is inefficient (e.g. for
    indirect addressing, e.g. A[x])
  • Then, garble the circuit
  • Finally, evaluate the garbled circuit

10
Garbling the circuit
  • Bob constructs the circuit, and then garbles it.

The W values will serve as cryptographic keys:
$W_k^0$ represents 0 on wire k, and $W_k^1$ represents 1 on wire k.
(Alice will learn one string per wire, but not which bit it
corresponds to.)
11
Gate tables
  • For every gate, every combination of input values
    is used as a key for encrypting the corresponding
    output
  • Assume G = AND. Bob constructs a table:
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^0$
    (AND(0,0) = 0)
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^1$
    (AND(0,1) = 0)
  • Encryption of $w_k^0$ using keys $w_i^1, w_j^0$
    (AND(1,0) = 0)
  • Encryption of $w_k^1$ using keys $w_i^1, w_j^1$
    (AND(1,1) = 1)
  • Result: given $w_i^x, w_j^y$, one can compute $w_k^{G(x,y)}$
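
The gate table can be sketched in a few lines. This is only an illustration under assumptions: the "encryption" is an XOR with a SHA-256 hash of the two input keys, a block of zero bytes marks a correct decryption, and the names `new_wire`, `garble_gate` and `evaluate_gate` are invented here. The slides do not say which encryption scheme is used.

```python
import hashlib
import secrets

KEY_LEN = 16
TAG = bytes(KEY_LEN)  # zero bytes appended so a correct decryption is recognizable

def new_wire():
    """Two random keys for one wire: index 0 stands for bit 0, index 1 for bit 1."""
    return (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN))

def _pad(key_i, key_j):
    return hashlib.sha256(key_i + key_j).digest()  # 32-byte one-time pad

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_gate(gate, w_i, w_j, w_k):
    """Bob: encrypt the output key for every combination of input keys."""
    table = [_xor(w_k[gate(x, y)] + TAG, _pad(w_i[x], w_j[y]))
             for x in (0, 1) for y in (0, 1)]
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return table

def evaluate_gate(table, key_i, key_j):
    """Alice: holding one key per input wire, recover exactly one output key."""
    pad = _pad(key_i, key_j)
    for entry in table:
        plain = _xor(entry, pad)
        if plain[KEY_LEN:] == TAG:
            return plain[:KEY_LEN]
    raise ValueError("no table entry decrypted correctly")

if __name__ == "__main__":
    w_i, w_j, w_k = new_wire(), new_wire(), new_wire()
    table = garble_gate(lambda x, y: x & y, w_i, w_j, w_k)
    print(evaluate_gate(table, w_i[1], w_j[1]) == w_k[1])
```

Alice sees only random-looking keys, so the key she recovers tells her nothing about the bit it stands for unless Bob also gives her the translation for that wire.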

12
Secure computation
  • Bob sends the table of gate G to Alice
  • Given, e.g., $w_i^0, w_j^1$, Alice computes $w_k^0$ by
    decrypting the corresponding entry in the table,
    but she does not know the actual values of the
    wires.

Table entries, in permuted order:
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^0$
  • Encryption of $w_k^0$ using keys $w_i^0, w_j^1$
  • Encryption of $w_k^1$ using keys $w_i^1, w_j^1$
  • Encryption of $w_k^0$ using keys $w_i^1, w_j^0$
13
Secure computation
  • Bob sends to Alice
  • Tables encoding each circuit gate.
  • Garbled values (w's) of his input values.
  • Translation from garbled values of output wires
    to actual 0/1 values.
  • If Alice gets garbled values (w's) of her input
    values, she can compute the output of the
    circuit, and nothing else.

14
Alice's input
  • For every wire i of Alice's input:
  • The parties run an OT protocol
  • Alice's input is her input bit (s).
  • Bob's input is $w_i^0, w_i^1$
  • Alice learns $w_i^s$
  • The OTs for all input wires can be run in
    parallel.
  • Afterwards Alice can compute the circuit by
    herself.

15
Secure computation: the big picture
  • Represent the function as a circuit C
  • Bob sends to Alice 4|C| encryptions (e.g. 64|C|
    bytes), 4 encryptions for every gate.
  • Alice performs an OT for every input bit. (Can
    do, e.g., 100-1000 OTs per sec.)
  • One round of communication.
  • Efficient for medium size circuits!
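
The figures on this slide translate into a quick back-of-the-envelope estimate. The sketch below assumes 16 bytes per encryption (which is what 64 bytes for 4 encryptions implies) and takes the OT rate as a parameter; the function name `garbled_circuit_cost` is made up for this illustration.

```python
def garbled_circuit_cost(num_gates, alice_input_bits,
                         bytes_per_encryption=16, ots_per_second=1000):
    """Rough cost of one run of the protocol for a circuit with num_gates gates."""
    encryptions = 4 * num_gates  # one table of 4 entries per gate
    return {
        "encryptions": encryptions,
        "table_bytes": encryptions * bytes_per_encryption,
        "ot_count": alice_input_bits,  # one OT per bit of Alice's input
        "ot_seconds": alice_input_bits / ots_per_second,
    }

if __name__ == "__main__":
    print(garbled_circuit_cost(num_gates=1000, alice_input_bits=20))
```

At the lower rate of 100 OTs per second quoted above, the same 20 input bits would take ten times longer in OTs, while the table size is unchanged.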

16
Example
  • The Millionaires' problem: comparing two N-bit
    numbers
  • What's the overhead?
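
One way to approach the overhead question is to write the comparison as a circuit and count gates. The sketch below uses a simple ripple comparator with four two-input gates per bit; this particular gate layout is an assumption for illustration, not a circuit taken from the lecture. Under it the circuit size grows linearly in N, and Alice needs one OT per bit of her input.

```python
def to_bits(value, n_bits):
    """Little-endian bit list (least significant bit first)."""
    return [(value >> i) & 1 for i in range(n_bits)]

def greater_than(a_bits, b_bits):
    """Evaluate a > b gate by gate; returns (result bit, number of gates)."""
    gt, gates = 0, 0
    for a, b in zip(a_bits, b_bits):  # from least to most significant bit
        differ = a ^ b                # gate 1: XOR
        a_wins = a & (1 - b)          # gate 2: a AND (NOT b)
        keep = (1 - differ) & gt      # gate 3: (NOT differ) AND previous result
        gt = a_wins | keep            # gate 4: OR
        gates += 4
    return gt, gates

if __name__ == "__main__":
    n = 20
    print(greater_than(to_bits(700_000, n), to_bits(650_000, n)))
```

A more significant bit overrides the result carried up from the lower bits whenever the two inputs differ at that position, which is why a single pass over the bits suffices.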

17
Applications
  • Two parties. Two large data sets.
  • Max?
  • Mean?
  • Median?
  • Intersection?
  • Decision Tree learning? ID3?

18
Fairplay: a secure two-party computation
system [Malkhi, Nisan, P., Sella]
  • A full-fledged secure two-party computation
    system, implementing Yao's garbled circuit
    protocol.
  • Goals
  • Investigate whether two-party SFE is practical
  • Actual measurements of overall computation
  • Breakdown of computation into parts
  • Computation versus communication?
  • Test-bed for various optimizations

19
Fairplay
  • The compilation paradigm
  • Programs written in SFDL, a high-level
    programming language
  • Allows clear, formal, easily understandable
    definition and requirements by humans
  • SHDL: low-level language describing Boolean
    circuits
  • SFDL → SHDL compiler and optimizer
  • SHDL → Java programs implementing Yao's protocol

20
Fairplay: SFDL example
    program Millionaires {
      type int = Int<20>; // 20-bit integer
      type AliceInput = int;
      type BobInput = int;
      type AliceOutput = Boolean;
      type BobOutput = Boolean;
      type Output = struct {AliceOutput alice,
                            BobOutput bob};
      type Input = struct {AliceInput alice,
                           BobInput bob};
      function Output output(Input input) {
        output.alice = (input.alice > input.bob);
        output.bob = (input.bob > input.alice);
      }
    }

21
SFDL properties
  • Conventional syntax (C/Pascal-like)
  • Type system: Boolean, integer, enumerated
  • Program structure
  • Declarations: global constants, types
  • Sequence of functions (no nesting [C], no
    recursion)
  • Function name is its return value [Pascal]
  • Conditional execution and loops
  • if-then, if-then-else statements, for-loop (loop
    boundaries should be known at compile time)
  • Assignments and expressions
  • constants, variables, array entries, structure
    items, function calls, operators (+, -, logical,
    comparison), parentheses

22
SHDL example
    0 input //output$input.bob$0
    1 input //output$input.bob$1
    2 input //output$input.bob$2
    3 input //output$input.bob$3
    4 input //output$input.alice$0
    5 input //output$input.alice$1
    6 input //output$input.alice$2
    7 input //output$input.alice$3
    8 gate arity 2 table [1 0 0 0] inputs [4 5]
    9 gate arity 2 table [0 1 1 0] inputs [4 5]

23
kth-ranked element (e.g. median)
  • Inputs
  • Alice: $S_A$; Bob: $S_B$
  • Large sets of unique items ($\in D$).
  • Output
  • $x \in S_A \cup S_B$ s.t. x has k-1 elements smaller than
    it.
  • The rank k
  • Could depend on the size of input datasets.
  • Median: $k = (|S_A| + |S_B|) / 2$
  • Motivation
  • Basic statistical analysis of distributed data.
  • E.g. histogram of salaries in CS departments
  • The problem: generic constructions using circuits
    [Yao] yield an overhead which is at least
    linear in k.

24
An (insecure) two-party median protocol
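[Figure: $S_A$ is split around its median $m_A$ into a lower part $L_A$ and an upper part $R_A$; $S_B$ is split around its median $m_B$ into $L_B$ and $R_B$. Case shown: $m_A < m_B$.]
$L_A$ lies below the median, $R_B$ lies above the
median. After removing both, the new median is the same as the original median.
Recursion → need log n rounds (assume each set
contains $n = 2^i$ items)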
25
A Secure two-party median protocol
  • A finds its median $m_A$; B finds its median $m_B$.
  • Secure comparison (e.g. a small circuit): is $m_A < m_B$?
  • YES: A deletes elements $\le m_A$. B deletes elements $> m_B$.
  • NO: A deletes elements $> m_A$. B deletes elements $\le m_B$.
  • Repeat on the remaining elements.
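
The control flow of the protocol can be sketched as follows. The sketch assumes that both sets have the same power-of-two size and distinct elements, that "median" means the lower median, and that the last remaining pair is also compared securely. `secure_less_than` is a plain comparison standing in for the secure comparison circuit, and both function names are made up for this illustration.

```python
def secure_less_than(a, b):
    """Stand-in for the secure comparison; each party learns only this bit."""
    return a < b

def two_party_median(set_a, set_b):
    """Lower median of the union of two equal-size sets; returns (median, rounds)."""
    a, b = sorted(set_a), sorted(set_b)
    size = len(a)
    assert size == len(b) and size > 0 and (size & (size - 1)) == 0
    rounds = 0
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]  # each party's own (lower) median
        if secure_less_than(m_a, m_b):
            a, b = a[half:], b[:half]  # A drops elements <= m_A, B drops elements > m_B
        else:
            a, b = a[:half], b[half:]  # A drops elements > m_A, B drops elements <= m_B
        rounds += 1
    # One element each remains; the smaller is the lower median of the union.
    return (a[0] if secure_less_than(a[0], b[0]) else b[0]), rounds

if __name__ == "__main__":
    print(two_party_median([1, 4, 6, 9], [2, 3, 7, 8]))
```

Every round removes the same number of elements from below and from above the median, so the median of what remains never changes, and the number of rounds is logarithmic in the set size.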
26
An example
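[Figure: worked example of the protocol on sets held by A and B. Each round is labeled with the comparison outcome ($m_A > m_B$ or $m_A < m_B$) until the median is found.]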
27
Proof of security
[Figure: the possible sequences of comparison outcomes ($m_A > m_B$ or $m_A < m_B$) between A and B, together with the median.]
28
Arbitrary input size, arbitrary k
[Figure: $S_A$ and $S_B$, each marked with k.]
Now, compute the median of two sets of size k.
The size should be a power of 2.
Median of new inputs = kth element of original
inputs.
29
Hiding size of inputs
  • Can search for kth element without revealing size
    of input sets.
  • However, k = n/2 (median) reveals input size.
  • Solution: let $S = 2^i$ be a bound on input size.

[Figure: the new datasets derived from $S_A$ and $S_B$.]
Median of new datasets is same as median of
original datasets.
30
Privacy preserving data mining
[Figure: P1 holds confidential database D1; P2 holds confidential database D2.]
Wish to mine D1 ∪ D2 without revealing more information
  • Examples
  • Medical databases protected by law
  • Competing businesses
  • Government agencies (privacy, need to know)

31
The classification problem
Goal: based on available data, design an algorithm
to classify new data
32
Classification using Decision Trees
33
Privacy Preserving ID3
  • Scenario: the inputs are private information of
    P1 and P2
  • Main technical problem: comparing entropies while
    preserving privacy (the entropy is built from terms of the form x log x).
  • Efficiency:
  • Most computation is done independently by the parties.
  • The overhead of cryptographic operations depends
    only on the size of the decision tree (not on the
    input size).
  • Basic task: compute x log x.
  • $x = x_1 + x_2$, e.g., the total number of customers
    with (age > 30) and (fraud = yes)

34
Privacy Preserving ID3
  • Computing x log x
  • $x = x_1 + x_2$, where $x_1$ and $x_2$ are known to P1 and P2 respectively
    (independently computed from databases).
  • Might as well compute x ln x, or ln x.
  • First run a protocol to compute random shares
    $y_1 + y_2 = \ln x$
  • ln x is a real number. Cryptography works over finite fields.
    Must do numerical analysis.

35
Cryptographic Tools
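  • Secure Function Evaluation (SFE) [Yao]
  • Oblivious Polynomial Evaluation [NP]

[Figure: oblivious polynomial evaluation. Input: one party holds a polynomial Q(), the other holds x. Output: the party holding x learns Q(x) and nothing else; the party holding Q() learns nothing.]
Implementation: two passes, O(degree) (or
O(log |F|)) exponentiations.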
36
Computing random shares of $\ln x = \ln(x_1 + x_2)$
  • Use Taylor approximation for $\ln x$
  • $x = x_1 + x_2 = 2^n (1 + \varepsilon)$, $-1/2 < \varepsilon < 1/2$
  • $\ln x = \ln(2^n (1 + \varepsilon)) = \ln 2^n + \ln(1 + \varepsilon)$
  • $\approx \ln 2^n + \sum_{i=1}^{k} (-1)^{i-1} \varepsilon^i / i$
  • $= \ln 2^n + T(\varepsilon)$
  • $T(\varepsilon)$ is a polynomial of degree k. Error is
    exponentially small in k.
  • We only know how to work over finite fields
  • Compute $c \ln x$, where c compensates for fractions.
  • Work in F, where |F| is sufficiently large.
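
The approximation can be checked numerically in the clear. The sketch below uses ordinary floating point and no cryptography; the function names are invented for this illustration. It picks the power of two closest to x, computes ε, and compares ln 2^n + T(ε) against the library logarithm.

```python
import math

def closest_power_of_two(x):
    """Exponent n such that 2^n is the power of two closest to the integer x >= 1."""
    n = x.bit_length() - 1  # 2^n <= x < 2^(n+1)
    if x - 2**n > 2**(n + 1) - x:
        n += 1
    return n

def taylor_ln1p(eps, k):
    """T(eps): degree-k Taylor polynomial of ln(1 + eps)."""
    return sum((-1) ** (i - 1) * eps**i / i for i in range(1, k + 1))

def approx_ln(x, k):
    n = closest_power_of_two(x)
    eps = x / 2**n - 1  # x = 2^n * (1 + eps)
    return n * math.log(2) + taylor_ln1p(eps, k)

if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, abs(approx_ln(1000, k) - math.log(1000)))
```

Because |ε| is at most 1/2, each additional term shrinks the error by at least a factor of two, which is the sense in which the error is exponentially small in k.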

37
$\ln(x_1 + x_2)$ protocol
  • Step 1 of the protocol: find n, $\varepsilon$
  • Apply Yao's protocol to the following small
    circuit
  • Input: $x_1$ and $x_2$
  • Output (random shares):
  • random $a_1$ and $a_2$ s.t. $a_1 + a_2 = x - 2^n = \varepsilon 2^n$
  • random $b_1$ and $b_2$ s.t. $b_1 + b_2 = \ln 2^n$
  • Operation: the protocol finds $2^n$ closest to
    $x_1 + x_2$, computes $\varepsilon 2^n = x_1 + x_2 - 2^n$.
  • $x = x_1 + x_2 = 2^n + \varepsilon 2^n$
  • $\ln x = \ln(2^n (1 + \varepsilon)) = \ln 2^n + \ln(1 + \varepsilon)$
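
What the small circuit computes can be written out in the clear. In the protocol this logic runs inside Yao's garbled circuit, so neither party sees x; the sketch below only shows the input/output behavior. The field size `P`, the fixed-point factor `SCALE` (standing in for the constant c that compensates for fractions) and the function names are assumptions made for this illustration, and n is returned only so that the result can be checked.

```python
import math
import secrets

P = 2**61 - 1    # prime field size (assumed for this sketch)
SCALE = 10**6    # hypothetical fixed-point factor standing in for c

def share(value):
    """Split value into two random additive shares modulo P."""
    s1 = secrets.randbelow(P)
    return s1, (value - s1) % P

def step1(x1, x2):
    """Shares of eps * 2^n and of SCALE * ln 2^n, where x = x1 + x2."""
    x = x1 + x2
    n = x.bit_length() - 1
    if x - 2**n > 2**(n + 1) - x:  # pick the closest power of two
        n += 1
    a1, a2 = share(x - 2**n)                        # a1 + a2 = eps * 2^n (mod P)
    b1, b2 = share(round(SCALE * n * math.log(2)))  # b1 + b2 ~ SCALE * ln 2^n (mod P)
    return (a1, b1), (a2, b2), n

if __name__ == "__main__":
    (a1, b1), (a2, b2), n = step1(600, 400)
    print(n, (a1 + a2) % P == (1000 - 2**n) % P)
```

Each share on its own is a uniformly random field element, so neither party learns anything about x from its own outputs.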

38
$\ln(x_1 + x_2)$ protocol (cont.)
  • Step 2 of the protocol
  • Compute random shares of $T(\varepsilon)$ (Taylor approx.)
  • P1 chooses a random $w_1 \in F$ and defines a
    polynomial Q(x), s.t. $w_1 + Q(a_2) = T(\varepsilon)$
    (recall $a_1 + a_2 = \varepsilon 2^n$)
  • Namely, $Q(x) = T((a_1 + x)/2^n) - w_1$.
  • Run an oblivious polynomial evaluation in which P2
    computes
  • $w_2 = Q(a_2) = T(\varepsilon) - w_1$.
  • Now the parties have random $w_1$ and $w_2$ s.t.
  • $w_1 + w_2 = T(\varepsilon) \approx \ln(1 + \varepsilon)$
  • $(b_1 + w_1) + (b_2 + w_2) \approx \ln 2^n + \ln(1 + \varepsilon) = \ln x$
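
The algebra of this step can be checked with exact rational arithmetic. The sketch below deliberately works over the rationals rather than a finite field, so it hides nothing and only verifies that the shares add up; `ope` is a stand-in that evaluates P1's polynomial directly, and n is treated as known to P1 because the formula for Q uses $2^n$. All names are made up for this illustration.

```python
from fractions import Fraction
import math
import secrets

def taylor(eps, k):
    """T(eps) = sum_{i=1..k} (-1)^(i-1) * eps^i / i, computed exactly."""
    return sum(Fraction((-1) ** (i - 1), i) * eps**i for i in range(1, k + 1))

def ope(polynomial, point):
    """Stand-in for oblivious polynomial evaluation: P2 learns only Q(point)."""
    return polynomial(point)

def step2(a1, a2, n, k):
    """Return shares w1 (for P1) and w2 (for P2) with w1 + w2 = T(eps)."""
    w1 = Fraction(secrets.randbelow(10**6))  # P1's random share

    def q(z):  # P1's polynomial Q(z) = T((a1 + z) / 2^n) - w1
        return taylor(Fraction(a1 + z, 2**n), k) - w1

    w2 = ope(q, a2)  # P2's share
    return w1, w2

if __name__ == "__main__":
    # x = 1000, so n = 10 and eps * 2^n = -24, shared here as 77 + (-101).
    w1, w2 = step2(77, -101, 10, 8)
    print(float(w1 + w2), math.log(1000 / 1024))
```

Adding $n \ln 2$ to the sum of the two shares recovers an approximation of ln 1000, matching the last bullet of the slide.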

39
Computing $x \ln x$
  • Tool: Multiply($c_1$, $c_2$)
  • Input: $c_1$, $c_2$
  • Output: $d_1$, $d_2$ s.t. $d_1 + d_2 = c_1 \cdot c_2$
  • How? OPE of $Q(z) = c_1 z - d_1$
  • Actual task: $x \ln x$
  • Input: $x_1 + x_2 = x$, $c_1 + c_2 = \ln x$
  • Output: $x \ln x = (x_1 + x_2)(c_1 + c_2)$
  • Run Multiply($x_1$, $c_2$), Multiply($c_1$, $x_2$)
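
The Multiply tool and its use for x ln x can be sketched over a prime field. In this sketch `ope` is a stand-in that evaluates P1's polynomial directly, the field size `P` is an assumption, and ln x is represented by a scaled integer; all function names are made up for this illustration.

```python
import secrets

P = 2**61 - 1  # prime field size (assumed for this sketch)

def ope(polynomial, point):
    """Stand-in for oblivious polynomial evaluation: P2 learns only Q(point)."""
    return polynomial(point) % P

def multiply(c1, c2):
    """P1 holds c1, P2 holds c2; returns shares d1, d2 with d1 + d2 = c1 * c2 (mod P)."""
    d1 = secrets.randbelow(P)                    # P1's random share
    d2 = ope(lambda z: (c1 * z - d1) % P, c2)    # P2 evaluates Q(z) = c1*z - d1 at c2
    return d1, d2

def share(value):
    s1 = secrets.randbelow(P)
    return s1, (value - s1) % P

def shares_of_product(x1, x2, c1, c2):
    """Shares of (x1 + x2) * (c1 + c2); P1 holds x1, c1 and P2 holds x2, c2."""
    d1, d2 = multiply(x1, c2)  # cross term x1 * c2
    e1, e2 = multiply(c1, x2)  # cross term c1 * x2
    z1 = (x1 * c1 + d1 + e1) % P  # P1 computes x1 * c1 locally
    z2 = (x2 * c2 + d2 + e2) % P  # P2 computes x2 * c2 locally
    return z1, z2

if __name__ == "__main__":
    x1, x2 = share(37)
    c1, c2 = share(3610918)  # e.g. ln 37 scaled by 10^6
    z1, z2 = shares_of_product(x1, x2, c1, c2)
    print((z1 + z2) % P == (37 * 3610918) % P)
```

Only the two cross terms need interaction; the products $x_1 c_1$ and $x_2 c_2$ are computed locally by the party that holds both factors.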

40
The rest of the work..
  • The parties compute shares of lnx
  • Then they compute shares of xlnx
  • Each party computes a share of the entropy by
    summing shares of x ln x (H(X) is built from terms of the form x ln x)
  • A small circuit finds the attribute giving the
    minimal conditional entropy
  • The attribute is assigned to the node
  • The databases are divided according to the value
    of this attribute

41
Efficiency
  • lnx protocol
  • secure computation of a small circuit
  • one oblivious polynomial evaluation
  • ID3 for a database with
  • 1,000,000 transactions
  • 15 attributes
  • 10 values per attribute
  • 4 class values
  • Communication per node takes seconds (T1)
  • Computation per node takes minutes (P3)

42
Contributions
  • Cryptographic protocols where the bulk of the
    operations is done independently.
  • Data mining
  • Rigorous model for secure data-mining.
  • Efficient, secure protocol for specific problems
    (median, ID3).
  • Cryptography
  • Sub-linear complexity - secure computation for
    large data sets.
  • Efficient protocols for complex known algorithms.
  • Secure computation of logarithms (real function -
    numerical analysis).
  • Drawbacks
  • Privacy preserving solutions are less efficient
  • It's hard to find efficient private solutions for
    all interesting functions
  • Security against malicious parties

43
References
  • Lecture notes and overview papers
  • B. Pinkas, Cryptographic Techniques for
    Privacy-Preserving Data Mining, SIGKDD
    Explorations, January 2003.
    http://www.pinkas.net/PAPERS/sigkdd.pdf
  • R. Cramer, Introduction to Secure Computation,
    2000.
    http://homepages.cwi.nl/cramer/papers/CRAMER_revised.ps
  • Ivan Damgård, Theory and practice of multiparty
    computation, 8th EWSCS.
    http://www.cs.ioc.ee/yik/schools/win2003/damgard.php
  • Research papers
  • G. Aggarwal, N. Mishra and B. Pinkas, Secure
    Computation of the K'th-ranked Element, Eurocrypt
    2004. http://www.pinkas.net/PAPERS/ANP04.pdf
  • Y. Lindell and B. Pinkas, Privacy Preserving Data
    Mining, Journal of Cryptology, Vol. 15 No. 3,
    2002. http://www.pinkas.net/PAPERS/id3-final.pdf