Privacy Preserving Data Mining - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Privacy Preserving Data Mining


1
Secure Multiparty Computation: Basic Cryptographic Methods
Li Xiong, CS573 Data Privacy and Security
2
The Love Game (AKA the AND game)
He loves me, he loves me not...
She loves me, she loves me not...
The two want to know whether both parties are interested in
each other, but neither wants to reveal unrequited love.
Input 1: "I love you."  Input 0: "I love you ... as a friend."
Must compute F(X,Y) = X AND Y, giving F(X,Y) to both players.
Can we reveal the answer without revealing the inputs?
3
The Spoiled Children Problem (AKA the
Millionaires' Problem, Yao 1982)
Who has more toys?
Who cares?
Pearl wants to know whether she has more toys
than Gersh. She doesn't want to tell Gersh
anything, and Gersh doesn't want Pearl to know
how many toys he has.
Alternatively: Pearl wants to know whether she has
more toys than Gersh, and Gersh is willing for
Pearl to find out who has more toys.
Can we give Pearl the information she wants, and
nothing else, without giving Gersh any
information at all?
4
Secure Multiparty Computation
  • A set of parties with private inputs
  • Parties wish to jointly compute a function of
    their inputs so that certain security properties
    (like privacy and correctness) are preserved
  • Properties must be ensured even if some of the
    parties maliciously attack the protocol
  • Examples
  • Secure elections
  • Auctions
  • Privacy preserving data mining

5
Application to Private Data Mining
  • The setting
  • Data is distributed at different sites
  • These sites may be third parties (e.g.,
    hospitals, government bodies) or may be the
    individual him or herself
  • The aim
  • Compute the data mining algorithm on the data so
    that nothing but the output is learned
  • Privacy ≠ Security (why?)

6
Privacy and Secure Computation
  • Privacy ≠ Security
  • Secure computation only deals with the process of
    computing the function
  • It does not ask whether or not the function
    should be computed
  • A two-stage process
  • Decide that the function/algorithm should be
    computed (an issue of privacy)
  • Apply secure computation techniques to compute it
    securely (an issue of security)

7
Outline
  • Secure multiparty computation
  • Problem and security definitions
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

8
Heuristic Approach to Security
  1. Build a protocol
  2. Try to break the protocol
  3. Fix the break
  4. Return to step 2

9
Another Heuristic Tactic
  • Design a protocol
  • Provide a list of attacks that (provably) cannot
    be carried out on the protocol
  • Reason that the list is complete
  • Problem: often, the list is not complete

10
A Rigorous Approach
  • Provide an exact problem definition
  • Adversarial power
  • Network model
  • Meaning of security
  • Prove that the protocol is secure

11
Secure Multiparty Computation
  • A set of parties with private inputs wish to
    compute some joint function of their inputs.
  • Parties wish to preserve some security
    properties. e.g., privacy and correctness.
  • Example: a secure election protocol
  • Security must be preserved in the face of
    adversarial behavior by some of the participants,
    or by an external party.

12
Defining Security
  • Components of ANY security definition
  • Adversarial power
  • Network model
  • Type of network
  • Existence of trusted help
  • Stand-alone versus composition
  • Security guarantees
  • It is crucial that all the above are explicitly
    and clearly defined.

13
Security Requirements
  • Consider a secure auction (with secret bids)
  • An adversary may wish to learn the bids of all
    parties; to prevent this, require privacy
  • An adversary may wish to win with a lower bid
    than the highest; to prevent this, require
    correctness

14
Defining Security
  • Option 1: analyze security concerns for each
    specific problem
  • Auctions: privacy and correctness
  • Contract signing: fairness
  • Problems
  • How do we know that all concerns are covered?
  • Definitions are application dependent and need to
    be redefined from scratch for each task

15
Defining Security Option 2
  • The real/ideal model paradigm for defining
    security [GMW, GL, Be, MR, Ca]
  • Ideal model: parties send inputs to a trusted
    party, who computes the function for them (see
    the toy sketch below)
  • Real model: parties run a real protocol with no
    trusted help
  • A protocol is secure if any attack on a real
    protocol can be carried out in the ideal model
  • Since no attacks can be carried out in the ideal
    model, security is implied
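To make the ideal model concrete, here is a minimal Python sketch (ours, not from the slides); the function shown is the toy comparison from the spoiled-children example, and all names are illustrative rather than any standard API:

    # Minimal sketch of the ideal model: a trusted party receives both
    # private inputs, evaluates the agreed function, and hands each party
    # only its own output.
    def spoiled_children(x, y):
        # f1: Pearl learns whether she has more toys; f2: Gersh learns nothing
        return x > y, None

    def ideal_model(f, x, y):
        out1, out2 = f(x, y)        # computed by the trusted party
        return out1, out2           # party i sees only out_i

    pearl_view, gersh_view = ideal_model(spoiled_children, x=7, y=5)
    print(pearl_view, gersh_view)   # True None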

16
The Real Model
[Diagram: the two parties hold private inputs x and y, interact directly through the protocol, and each obtains the protocol output.]
17
The Ideal Model
[Diagram: each party sends its input (x or y) to a trusted party, which returns f1(x,y) to party 1 and f2(x,y) to party 2.]
18
The Security Definition
[Diagram: direct protocol interaction in the REAL model is compared with computation via a trusted party in the IDEAL model.]
19
Properties of the Definition
  • Privacy
  • The ideal-model adversary cannot learn more about
    the honest party's input than what is revealed by
    the function output
  • Thus, the same is true of the real-model
    adversary
  • Correctness
  • In the ideal model, the function is always
    computed correctly
  • Thus, the same is true in the real model
  • Others
  • For example, fairness, independence of inputs

20
Why This Approach?
  • General: it captures all applications
  • The specifics of an application are defined by
    its functionality, security is defined as above
  • The security guarantees achieved are easily
    understood (because the ideal model is easily
    understood)
  • We can be confident that we did not miss any
    security requirements

21
Adversary Model
  • Computational power
  • Probabilistic polynomial-time versus all-powerful
  • Adversarial behaviour
  • Semi-honest: follows protocol instructions
  • Malicious: arbitrary actions
  • Corruption behaviour
  • Static: set of corrupted parties fixed at onset
  • Adaptive: can choose to corrupt parties at any
    time during computation
  • Number of corruptions
  • Honest majority versus unlimited corruptions

22
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

23
Feasibility A Fundamental Theorem
  • Any multiparty functionality can be securely
    computed
  • For any number of corrupted parties: security
    with abort is achieved, assuming enhanced
    trapdoor permutations [Yao, GMW]
  • With an honest majority: full security is
    achieved, assuming private channels only [BGW, CCD]

24
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

25
Public-key encryption
  • Let (G,E,D) be a public-key encryption scheme
  • G is a key-generation algorithm: (pk,sk) ← G
  • pk: public key
  • sk: secret key
  • Terms
  • Plaintext: the original text, denoted m
  • Ciphertext: the encrypted text, denoted c
  • Encryption: c = Epk(m)
  • Decryption: m = Dsk(c)
  • Concept of a one-way function: knowing c, pk, and
    the function Epk, it is still computationally
    intractable to find m.
  • Different implementations are available, e.g. RSA
    (see the usage sketch below)
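As one concrete illustration of the (G, E, D) interface (ours, not part of the slides), here is a minimal sketch using RSA-OAEP from the third-party pyca/cryptography package; the variable names mirror the notation above:

    # Minimal (G, E, D) sketch with RSA-OAEP via pyca/cryptography.
    # Illustration only; real systems also need key management, encodings, etc.
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # G
    pk = sk.public_key()

    m = b"attack at dawn"
    c = pk.encrypt(m, oaep)            # c = E_pk(m)
    assert sk.decrypt(c, oaep) == m    # m = D_sk(c)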

26
Construction paradigms
  • Passively-secure computation for two parties
  • Use oblivious transfer to securely select a value
  • Passively-secure computation with shares
  • Use secret sharing scheme such that data can be
    reconstructed from some shares
  • From passively-secure protocols to
    actively-secure protocols
  • Use zero-knowledge proofs to force parties to
    behave in a way consistent with the
    passively-secure protocol

27
1-out-of-2 Oblivious Transfer (OT)
  • Inputs
  • Sender has two messages m0 and m1
  • Receiver has a single bit σ ∈ {0,1}
  • Outputs
  • Sender receives nothing
  • Receiver obtains mσ and learns nothing of m1-σ

28
Semi-Honest OT
  • Let (G,E,D) be a public-key encryption scheme
  • G is a key-generation algorithm: (pk,sk) ← G
  • Encryption: c = Epk(m)
  • Decryption: m = Dsk(c)
  • Assume that a public key can be sampled without
    knowledge of its secret key
  • Oblivious key generation: pk ← OG
  • El-Gamal encryption has this property (see the toy
    sketch below)
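To make this concrete, a toy El-Gamal sketch in Python follows. It is ours, not from the slides: the tiny prime p = 23 with generator g = 5 and all function names are for illustration only and give no real security.

    # Toy El-Gamal over Z_23* -- illustration only, NOT secure.
    # Shows both normal key generation G and oblivious key generation OG.
    import random

    P, G_GEN = 23, 5            # tiny group: 5 generates Z_23*

    def keygen():
        """(pk, sk) <- G: pk = (P, g, h) with h = g^x, sk = x."""
        x = random.randrange(1, P - 1)
        return (P, G_GEN, pow(G_GEN, x, P)), x

    def oblivious_keygen():
        """pk <- OG: h is a random group element whose discrete log
        (the would-be secret key) is unknown to the sampler."""
        return (P, G_GEN, random.randrange(2, P - 1))

    def encrypt(pk, m):
        """c = E_pk(m) for a message m in {1, ..., P-1}."""
        p, g, h = pk
        y = random.randrange(1, p - 1)
        return pow(g, y, p), (m * pow(h, y, p)) % p

    def decrypt(sk, c):
        """m = D_sk(c)."""
        c1, c2 = c
        return (c2 * pow(c1, P - 1 - sk, P)) % P    # c2 * c1^(-sk) mod P

    pk, sk = keygen()
    assert decrypt(sk, encrypt(pk, 7)) == 7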

29
Semi-Honest OT
  • Protocol for Oblivious Transfer
  • Receiver (with input σ):
  • Receiver chooses one key-pair (pk,sk) and one
    public key pk' obliviously (without its secret key)
  • Receiver sets pkσ = pk, pk1-σ = pk'
  • Note: the receiver can decrypt for pkσ but not for
    pk1-σ
  • Receiver sends pk0, pk1 to the sender
  • Sender (with input m0,m1):
  • Sends the receiver c0 = Epk0(m0), c1 = Epk1(m1)
  • Receiver:
  • Decrypts cσ using sk and obtains mσ (a code sketch
    follows below)
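A minimal sketch of this semi-honest 1-out-of-2 OT, reusing the toy El-Gamal helpers from the sketch above (so it is not self-contained); the function names and the small integer messages are ours:

    # Semi-honest 1-out-of-2 OT on top of the toy El-Gamal sketch above.
    def ot_receiver_keys(sigma):
        """Receiver: real key pair in slot sigma, oblivious key in the other."""
        pk, sk = keygen()
        keys = [None, None]
        keys[sigma] = pk
        keys[1 - sigma] = oblivious_keygen()
        return keys, sk                     # keys go to the sender, sk stays

    def ot_sender(keys, m0, m1):
        """Sender: encrypt each message under the corresponding public key."""
        return encrypt(keys[0], m0), encrypt(keys[1], m1)

    def ot_receiver_output(sigma, sk, c0, c1):
        """Receiver: can decrypt only the chosen ciphertext."""
        return decrypt(sk, (c0, c1)[sigma])

    sigma = 1
    keys, sk = ot_receiver_keys(sigma)      # receiver -> sender: pk0, pk1
    c0, c1 = ot_sender(keys, m0=3, m1=9)    # sender -> receiver: c0, c1
    assert ot_receiver_output(sigma, sk, c0, c1) == 9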

30
Security Proof
  • Intuition
  • The sender's view consists only of two public keys
    pk0 and pk1. Therefore, it doesn't learn anything
    about the value of σ.
  • The receiver only knows one secret key and so can
    only learn one message
  • Note: this assumes semi-honest behavior. A
    malicious receiver can choose two keys together
    with their secret keys.

31
Generalization
  • Can define 1-out-of-k oblivious transfer (sketched below)
  • Protocol remains the same
  • Choose k-1 public keys for which the secret key
    is unknown
  • Choose 1 public-key and secret-key pair
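The same construction generalized to 1-out-of-k, again reusing the toy El-Gamal helpers above (names ours):

    # 1-out-of-k OT: one real key pair at index `choice`, k-1 oblivious keys.
    def ot_k_receiver_keys(choice, k):
        pk, sk = keygen()
        keys = [oblivious_keygen() for _ in range(k)]
        keys[choice] = pk
        return keys, sk

    def ot_k_sender(keys, messages):
        return [encrypt(pk_i, m_i) for pk_i, m_i in zip(keys, messages)]

    def ot_k_receiver_output(choice, sk, ciphertexts):
        return decrypt(sk, ciphertexts[choice])

    keys, sk = ot_k_receiver_keys(choice=2, k=4)
    cts = ot_k_sender(keys, [3, 5, 7, 11])
    assert ot_k_receiver_output(2, sk, cts) == 7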

32
General GMW Construction
  • For simplicity consider two-party case
  • Let f be the function that the parties wish to
    compute
  • Represent f as an arithmetic circuit with
    addition and multiplication gates (for bit
    values, addition is XOR and multiplication is AND)
  • Aim: compute gate-by-gate, revealing only random
    shares each time

33
Random Shares Paradigm
  • Let a be some value
  • Party 1 holds a random value a1
  • Party 2 holds a ⊕ a1
  • Note that without knowing a1, a ⊕ a1 is just a
    random value revealing nothing of a.
  • We say that the parties hold random shares of a.
  • The computation will be such that all
    intermediate values are random shares (and so
    they reveal nothing).
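A one-bit version of this sharing, as a self-contained sketch (names ours):

    # XOR (mod-2) secret sharing of a single bit.
    import random

    def share(a):
        """Split bit a into shares a1, a2 with a1 ^ a2 == a."""
        a1 = random.randrange(2)    # uniformly random bit
        return a1, a ^ a1           # either share alone reveals nothing about a

    def reconstruct(a1, a2):
        return a1 ^ a2

    a1, a2 = share(1)
    assert reconstruct(a1, a2) == 1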

34
Circuit Computation
  • Stage 1: each party randomly shares its input
    with the other party
  • Stage 2: compute the gates of the circuit as follows:
  • Given random shares of the input wires, compute
    random shares of the output wires
  • Stage 3: combine the shares of the output wires in
    order to obtain the actual output

[Circuit diagram: a gate-by-gate evaluation of a Boolean circuit (including a NOT gate) over Alice's and Bob's inputs.]
35
Addition Gates
  • Input wires to the gate have values a and b
  • Party 1 has shares a1 and b1
  • Party 2 has shares a2 and b2
  • Note: a1 ⊕ a2 = a and b1 ⊕ b2 = b
  • To compute random shares of the output c = a ⊕ b
  • Party 1 locally computes c1 = a1 ⊕ b1
  • Party 2 locally computes c2 = a2 ⊕ b2
  • Note: c1 ⊕ c2 = a1 ⊕ a2 ⊕ b1 ⊕ b2 = a ⊕ b = c
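The corresponding local computation, as a tiny self-contained sketch (the function name and example sharing are ours):

    # Addition (XOR) gate on shares: purely local, no interaction needed.
    def add_gate_local(x_share, y_share):
        return x_share ^ y_share

    # Example: a = 1 shared as (a1, a2) = (1, 0); b = 0 shared as (1, 1).
    c1 = add_gate_local(1, 1)      # Party 1 computes a1 ^ b1
    c2 = add_gate_local(0, 1)      # Party 2 computes a2 ^ b2
    assert c1 ^ c2 == (1 ^ 0)      # together they share c = a XOR b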

36
Multiplication Gates
  • Input wires to the gate have values a and b
  • Party 1 has shares a1 and b1
  • Party 2 has shares a2 and b2
  • Wish to compute c = a·b = (a1 ⊕ a2)·(b1 ⊕ b2)
  • Party 1 knows its concrete share values.
  • Party 2's values are unknown to Party 1, but
    there are only 4 possibilities (depending on
    whether (a2,b2) is 00, 01, 10, or 11)

37
Multiplication (cont)
  • Party 1 prepares a table as follows
  • Row 1 corresponds to Party 2's input 00
  • Row 2 corresponds to Party 2's input 01
  • Row 3 corresponds to Party 2's input 10
  • Row 4 corresponds to Party 2's input 11
  • Let r be a random bit chosen by Party 1
  • Row 1 contains the value a·b ⊕ r when a2=0, b2=0
  • Row 2 contains the value a·b ⊕ r when a2=0, b2=1
  • Row 3 contains the value a·b ⊕ r when a2=1, b2=0
  • Row 4 contains the value a·b ⊕ r when a2=1, b2=1
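A sketch of how Party 1 could build this table (the function name is ours; all values are bits):

    # Party 1 builds the 4-row table for a multiplication (AND) gate.
    def and_gate_table(a1, b1, r):
        """One row per possible (a2, b2): the true product a*b, masked by r."""
        return [((a1 ^ a2) & (b1 ^ b2)) ^ r
                for a2, b2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]

    # Reproduces the concrete example on the next slide: a1=0, b1=1, r=1.
    assert and_gate_table(0, 1, 1) == [1, 1, 0, 1]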

38
Concrete Example
  • Assume a1 = 0, b1 = 1
  • Assume r = 1

Row   Party 2's shares   Output value
1     a2=0, b2=0         (0⊕0)·(1⊕0) ⊕ 1 = 1
2     a2=0, b2=1         (0⊕0)·(1⊕1) ⊕ 1 = 1
3     a2=1, b2=0         (0⊕1)·(1⊕0) ⊕ 1 = 0
4     a2=1, b2=1         (0⊕1)·(1⊕1) ⊕ 1 = 1
39
The Gate Protocol
  • The parties run a 1-out-of-4 oblivious transfer
    protocol
  • Party 1 plays the sender: message i is row i of
    the table.
  • Party 2 plays the receiver: it inputs 1 if a2=0
    and b2=0, 2 if a2=0 and b2=1, and so on
  • Output
  • Party 2 receives c2 = c ⊕ r; this is its output
  • Party 1 outputs c1 = r
  • Note: c1 and c2 are random shares of c, as
    required (see the end-to-end sketch below)
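Putting slides 36-39 together, here is a self-contained sketch of one multiplication (AND) gate; the 1-out-of-4 OT is modeled as an ideal black box (Party 2 simply reads its row), and the OT sketch after slide 31 could be dropped in for a real run. All names are ours:

    # One AND (multiplication) gate of the protocol on XOR shares, with the
    # 1-out-of-4 OT modeled as an ideal black box.
    import random

    def and_gate(a1, b1, a2, b2):
        r = random.randrange(2)                    # Party 1's random mask
        table = [((a1 ^ x) & (b1 ^ y)) ^ r         # Party 1's 4-row table
                 for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]]
        index = 2 * a2 + b2                        # Party 2's OT choice
        c2 = table[index]                          # delivered via 1-out-of-4 OT
        c1 = r                                     # Party 1 keeps its mask
        return c1, c2

    # Check: shares of a = 1 and b = 1 yield shares of c = a AND b = 1.
    a1, a2 = 1, 0                                  # a = a1 ^ a2
    b1, b2 = 0, 1                                  # b = b1 ^ b2
    c1, c2 = and_gate(a1, b1, a2, b2)
    assert c1 ^ c2 == (a1 ^ a2) & (b1 ^ b2)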

40
Summary
  • By computing each gate this way, at the end the
    parties hold shares of the output wires.
  • The function output is generated by simply sending
    these shares to each other.

41
Security
  • Reduction to the oblivious transfer protocol
  • Assuming security of the OT protocol, parties
    only see random values until the end. Therefore,
    simulation is straightforward.
  • Note: correctness relies heavily on semi-honest
    behavior (otherwise a party can modify its shares).

42
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions
  • Coming up
  • Applications in privacy preserving distributed
    data mining
  • Random response protocols

43
A real-world problem and some simple solutions
  • Bob comes to Ron (a manager) with a complaint
    about a sensitive matter, asking Ron to keep his
    identity confidential
  • A few months later, Moshe (another manager) tells
    Ron that someone has complained to him, also with
    a confidentiality request, about the same matter
  • Ron and Moshe would like to determine whether the
    same person has complained to each of them
    without giving information to each other about
    their identities

Comparing information without leaking it, Fagin
et al., 1996
44
References
  • Secure Multiparty Computation for
    Privacy-Preserving Data Mining, Lindell and
    Pinkas, 2008
  • Chapter 7: General Cryptographic Protocols
    (Section 7.1 Overview), Foundations of
    Cryptography, Volume 2, Oded Goldreich
  • http://www.wisdom.weizmann.ac.il/~oded/foc-vol2.html
  • Comparing information without leaking it, Fagin
    et al., 1996

45
Slides credits
  • Tutorial on secure multi-party computation,
    Lindell
  • www.cs.biu.ac.il/~lindell/research-statements/tutorial-secure-computation.ppt
  • Introduction to secure multi-party computation,
    Vitaly Shmatikov, UT Austin
  • www.cs.utexas.edu/~shmat/courses/cs380s_fall08/16smc.ppt

46
Remark
  • The semi-honest model is often used as a tool for
    obtaining security against malicious parties.
  • In many (most?) settings, security against
    semi-honest adversaries does not suffice.
  • In some settings, it may suffice.
  • One example: hospitals that wish to share data.

47
Malicious Adversaries
  • The above protocol is not secure against
    malicious adversaries
  • A malicious adversary may learn more than it
    should.
  • A malicious adversary can cause the honest party
    to receive incorrect output.
  • We need to be able to extract a malicious
    adversary's input and send it to the trusted
    party.

48
Tool Zero Knowledge
  • Problem setting: a prover wishes to prove a
    statement to a verifier so that:
  • Zero knowledge: the verifier will learn nothing
    beyond the fact that the statement is correct
  • Soundness: the prover will not be able to
    convince the verifier of a wrong statement
  • Zero knowledge is proven using simulation.

49
Illustrative Example
  • Prover has two colored cards that he claims are
    of different color
  • The verifier is color blind and wants a proof
    that the colors are different.
  • Idea 1: use a machine to measure the light waves
    and determine the colors. But then the verifier
    will learn what the colors are.

50
Example (continued)
  • Protocol
  • Verifier writes color1 and color2 on the back of
    the cards and shows the prover
  • Verifier holds out one card so that the prover
    only sees the front
  • The prover then says whether it is color1
    or color2
  • Soundness: if the two cards are the same color, the
    prover will fail with probability ½. By repeating
    many times, we obtain a good soundness bound.
  • Zero knowledge: the verifier can simulate by itself
    by holding out a card and just saying the color
    that it knows
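Concretely (our arithmetic, not on the original slide): after k independent repetitions, a prover whose cards are actually identical escapes detection with probability only (1/2)^k, so k = 20 repetitions already pushes the cheating probability below one in a million.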

51
Zero Knowledge
  • Fundamental Theorem [GMR]: zero-knowledge proofs
    exist for all languages in NP
  • Observation: given a commitment to the input and
    random tape, and given the series of incoming
    messages, correctness of the next message in a
    protocol is an NP statement.
  • Therefore, it can be proved in zero-knowledge.

52
Protocol Compilation
  • Given any protocol, construct a new protocol as
    follows
  • Both parties commit to inputs
  • Both parties generate uniform random tape
  • Parties send messages to each other, each message
    is proved correct with respect to the original
    protocol, with zero-knowledge proofs.

53
Resulting Protocol
  • Theorem: if the initial protocol was secure
    against semi-honest adversaries, then the
    compiled protocol is secure against malicious
    adversaries.
  • Proof
  • Show that even malicious adversaries are limited
    to semi-honest behavior.
  • Show that the additional messages from the
    compilation all reveal nothing.

54
Summary
  • GMW paradigm
  • First, construct a protocol for semi-honest
    adversaries.
  • Then, compile it so that it is secure also
    against malicious adversaries.
  • There are many other ways to construct secure
    protocols, some of them significantly more
    efficient.
  • Efficient protocols against semi-honest
    adversaries are far easier to obtain than for
    malicious adversaries.

55
Useful References
  • Oded Goldreich. Foundations of Cryptography,
    Volume 1: Basic Tools. Cambridge University
    Press.
  • Computational hardness, pseudorandomness, zero
    knowledge
  • Oded Goldreich. Foundations of Cryptography,
    Volume 2: Basic Applications. Cambridge
    University Press.
  • Chapter on secure computation
  • Papers: an endless list (I would rather not go on
    record here, but am very happy to personally
    refer people).