Private Matching - PowerPoint PPT Presentation

About This Presentation
Title:

Private Matching

Description:

Technological changes erode privacy: ubiquitous computing, cheap storage. ... (credit card purchases, magazine subscriptions, bank deposits, flights) ... – PowerPoint PPT presentation

Slides: 38
Provided by: Ben5153
Tags: matching | private


Transcript and Presenter's Notes

Title: Private Matching


1
Privacy Preserving Data Mining, Lecture 1:
Motivating privacy research, introducing crypto
Benny Pinkas, HP Labs, Israel
2
Course structure
  • Lecture 1
  • Introduction to privacy
  • Introduction to cryptography, in particular, to
    rigorous cryptographic analysis.
  • Definitions
  • Proofs of security
  • Lecture 2
  • Cryptographic tools for privacy preserving data
    mining.
  • Lecture 3
  • Non-cryptographic tools for privacy preserving
    data mining
  • In particular, answer perturbation.

3
Privacy-Preserving Data Mining
  • Allow multiple data holders to collaborate in
    order to compute important information while
    protecting the privacy of other information.
  • Security-related information
  • Public health information
  • Marketing information
  • Advantages of privacy protection
  • protection of personal information
  • protection of proprietary or sensitive
    information
  • enables collaboration between different data
    owners (since they may be more willing or able to
    collaborate if they need not reveal their
    information)
  • compliance with the law

4
Privacy Preserving Data Mining
  • Two papers appeared in 2000
  • Privacy preserving data mining, Agrawal and
    Srikant, SIGMOD 2000. (statistical approach)
  • Privacy preserving data mining, Lindell and
    Pinkas, Crypto 2000. (cryptographic approach)
  • Why privacy now?
  • Technological changes erode privacy: ubiquitous
    computing, cheap storage.
  • Public awareness: health coverage, employment,
    personal relationships.
  • Historical changes: small towns vs. cities vs.
    connected society.
  • Privacy is a real problem that needs to be solved

5
Some data privacy cases: hospital data
  • Hospital data contains
  • Identifying information: name, id, address
  • General information: age, marital status
  • Medical information
  • Billing information
  • Database access issues
  • Your doctor should get all the information that
    is required to take care of you
  • Emergency rooms should get all medical
    information that is required to take care of
    whoever comes there
  • Billing department should only get information
    relevant to billing
  • Problem: how to stop employees from getting
    information about family, neighbors, celebrities?

6
Some data privacy cases: medical research
  • Medical research
  • Trying to learn patterns in the data, in
    aggregate form.
  • Problem: how to enable learning aggregate data
    without revealing personal medical information?
  • Hiding names is not enough, since there are many
    ways to uniquely identify a person
  • A single hospital or medical researcher might not
    have enough data
  • How can different organizations share research
    data without revealing personal data?

7
Public Data
  • Many public records are available in electronic
    form: birth records, property records, voter
    registration
  • Your information serves as an error correcting
    code of your identity
  • Latanya Sweeney
  • Date of birth uniquely identifies 12% of the
    population of Cambridge, MA.
  • Date of birth + gender: 29%
  • Date of birth + gender + (9 digit) zip code: 95%
  • Sweeney was therefore able to get her medical
    information from an anonymized database
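
The percentages above measure how many people are the only ones with a given combination of attribute values. A minimal sketch of that measurement follows; the records are made-up toy data and the field names (`dob`, `gender`, `zip`) are assumptions for illustration, not Sweeney's data or method.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose values on `keys` occur exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)

# Hypothetical toy records (not real data).
records = [
    {"dob": "1970-01-01", "gender": "F", "zip": "021390001"},
    {"dob": "1970-01-01", "gender": "M", "zip": "021390002"},
    {"dob": "1970-01-01", "gender": "M", "zip": "021390003"},
    {"dob": "1982-05-17", "gender": "F", "zip": "021390001"},
]

for keys in (["dob"], ["dob", "gender"], ["dob", "gender", "zip"]):
    print(keys, unique_fraction(records, keys))
```

Each added attribute can only split groups further, so the uniquely identified fraction never decreases as attributes are combined.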

8
Census data
  • A trusted party (the census bureau) collects
    information about individuals
  • Collected data
  • Explicitly identifying data (names, address..)
  • Implicitly identifying data (combination of
    several attributes)
  • Private data
  • The data is collected to help decision
    making
  • Partial or aggregate data should therefore be
    made public

9
Total Information Awareness (TIA)
  • Collects information about transactions (credit
    card purchases, magazine subscriptions, bank
    deposits, flights)
  • Early detection of terrorist activity
  • Check a chemistry book in the library, buy
    something at a hardware store and something in a
    pharmacy
  • Early detection of epidemic outbreaks
  • Early symptoms of Anthrax are similar to the flu
  • Check non-traditional data sources: grocery and
    pharmacy data, school attendance records, etc.
  • Such systems are developed and used
  • Could the collection of data be done in a privacy
    preserving manner? (without learning about
    individuals?)

10
Basic Scenarios
  • Single (centralized) database, e.g., census data
  • This is often a simple abstraction of a more
    complicated scenario, so we had better solve
    this one
  • Need to collect data and present it in a privacy
    preserving way
  • Published data (e.g., on a CD)
  • A trusted party collects data and then
    publishes a sanitized version
  • Users can do any computation they wish with the
    sanitized data
  • For example, statistical tabulations.

11
Basic Scenarios
  • Multi database scenarios
  • Two or more parties with private data want to
    cooperate.
  • Horizontally split: each party has a large
    database. Databases have the same attributes but
    are about different subjects. For example, the
    parties are banks which each have information
    about their customers.
  • Vertically split: each party has some information
    about the same set of subjects. For example, the
    participating parties are government agencies
    each with some data about every citizen.

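[Figure: horizontal split (bank 1 and bank 2 each hold records for their own users) vs. vertical split (houses, bank and taxes records for the same users u1 ... un)]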
12
Issues and Tools
  • Best privacy can be achieved by not giving any
    data, but...
  • Privacy tools: cryptography [LP00]
  • Encryption: data is hidden unless you have the
    decryption key. However, we also want to use the
    data.
  • Secure function evaluation: two or more parties
    with private inputs can compute any function
    they wish without revealing anything else.
  • Strong theory. Starts to be relevant to real
    applications.
  • Non-cryptographic tools [AS00]
  • Query restriction: prevent certain queries from
    being answered.
  • Data/input/output perturbation: add errors to
    inputs to hide personal data while keeping
    aggregates accurate (randomization, rounding,
    data swapping).
  • Can these be understood as well as we understand
    crypto? Do they provide the same level of
    security as crypto?
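
As an illustration of input perturbation, the sketch below adds independent zero-mean Gaussian noise to each value: individual entries are distorted, while the mean stays close to the true one. The noise distribution, its standard deviation and the synthetic `ages` list are assumptions chosen for the example, not parameters from [AS00].

```python
import random
from statistics import mean

def perturb(values, sigma, seed=0):
    """Add independent zero-mean Gaussian noise to every value."""
    rng = random.Random(seed)  # seeded only to make the demo reproducible
    return [v + rng.gauss(0.0, sigma) for v in values]

# Synthetic ages between 20 and 69 (hypothetical data).
ages = [20 + (i % 50) for i in range(10_000)]
noisy_ages = perturb(ages, sigma=20.0)

print("true mean :", mean(ages))
print("noisy mean:", mean(noisy_ages))
print("first true/noisy pair:", ages[0], noisy_ages[0])
```

With 10,000 values and a noise standard deviation of 20, the standard error of the noisy mean is about 0.2, so the aggregate is nearly unchanged even though each individual value may be off by tens of years.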

13
Introduction to Cryptography
14
Why learn/use crypto to solve privacy issues?
  • Why are we referring to crypto?
  • Cryptography is one of the tools we can use for
    preserving privacy
  • A mature research area
  • many useful results/tools
  • Can reflect on our thinking: how is security
    defined in cryptography? How should we define
    privacy?

15
What is Cryptography?
Traditionally: how to maintain secrecy in
communication
Alice and Bob talk while Eve tries to listen
[Figure: Alice and Bob communicating, Eve eavesdropping]
16
History of Cryptography
  • Very ancient occupation
  • Up to the mid 70s: mostly classified military
    work
  • Exceptions: Shannon, Turing
  • Since then: explosive growth
  • Commercial applications
  • Scientific work: tight relationship with
    Computational Complexity Theory
  • Major works: Diffie-Hellman; Rivest, Shamir and
    Adleman (RSA)
  • Recently: more involved models for more diverse
    tasks.
  • Scope: how to maintain the secrecy, integrity and
    functionality of computer and communication
    systems.

17
Relation to computational hardness
  • Cryptography uses problems that are infeasible to
    solve.
  • Uses the intractability of some problems in order
    to construct secure systems.
  • Feasible: computable in probabilistic polynomial
    time (PPT)
  • Infeasible: no probabilistic polynomial time
    algorithm
  • Usually average case hardness is needed
  • For example, the discrete log problem

18
The Discrete Log Problem
  • Let $G$ be a group and $g$ an element in $G$.
  • Given $y \in G$, let $x$ be the minimal
    non-negative integer satisfying the equation
    $y = g^x$.
  • $x$ is called the discrete log of $y$ to base $g$.
  • Example: $y = g^x \bmod p$ in the multiplicative
    group of $Z_p^*$ ($p$ is prime). (For example,
    $p = 7$, $g = 3$, $y = 4$ $\Rightarrow$ $x = 4$.)
  • In general, it is easy to exponentiate
  • (using repeated squaring and the binary
    representation of $x$)
  • Computing the discrete log is believed to be hard
    in $Z_p^*$ if $p$ is large. (E.g., $p$ is a prime,
    $|p| > 768$ bits, $p = 2q + 1$ and $q$ is also a
    prime.)
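
The asymmetry between the two directions can be sketched as follows: exponentiation by repeated squaring takes a number of multiplications proportional to the bit length of x, while the naive discrete-log search below tries exponents one by one. The function names are illustrative, and the brute-force search is only meant for tiny groups such as the p = 7 example on the slide.

```python
def mod_exp(g, x, p):
    """Compute g^x mod p by repeated squaring over the bits of x."""
    result = 1
    base = g % p
    while x > 0:
        if x & 1:                    # current binary digit of x is 1
            result = (result * base) % p
        base = (base * base) % p     # square for the next binary digit
        x >>= 1
    return result

def discrete_log_bruteforce(y, g, p):
    """Return the minimal x >= 0 with g^x = y (mod p), or None if none exists.

    Takes up to p - 1 steps, which is infeasible for large p.
    """
    value = 1
    for x in range(p - 1):
        if value == y % p:
            return x
        value = (value * g) % p
    return None

print(mod_exp(3, 4, 7))                  # easy direction
print(discrete_log_bruteforce(4, 3, 7))  # hard direction, feasible only for small p
```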

19
Encryption
  • Alice wants to send a message $m \in \{0,1\}^n$ to
    Bob
  • Set-up phase is secret
  • Symmetric encryption: Alice and Bob share a
    secret key $k$
  • They want to prevent Eve from learning anything
    about the message

[Figure: Alice sends $E_k(m)$ to Bob; both hold the key $k$; Eve observes the channel]
20
Public key encryption
  • Alice generates a private/public key pair (SK,PK)
  • Only Alice knows the secret key SK
  • Everyone (even Eve) knows the public key PK, and
    can encrypt messages to Alice
  • Only Alice can decrypt (using SK)

[Figure: Bob and Charlie each send $E_{PK}(m)$ to Alice, who holds SK; PK is known to everyone, including Eve]
21
Rigorous Specification of Security
  • To define the security of a system we must
    specify
  • What constitutes a failure of the system
  • The power of the adversary
  • computational
  • access to the system
  • what it means to break the system.

22
What does "learn" mean?
  • Even if Eve has some prior knowledge of m, she
    should not have any advantage in
  • Probability of guessing $m$, or probability of
    guessing whether $m$ is $m_0$ or $m_1$, or
    probability of computing any other function $f$
    of $m$, or even computing $m$
  • Ideally: the message sent is independent of the
    message $m$
  • Implies all the above
  • Achievable: one-time pad (symmetric encryption)
  • Let $r \in_R \{0,1\}^n$ be the shared key.
  • Let $m \in \{0,1\}^n$
  • To encrypt $m$, send $z = r \oplus m$
  • To decrypt $z$, compute $m = z \oplus r$
  • Shannon: achievable only if the entropy of the
    shared secret is at least as large as that of $m$.
    Therefore a long key must be used.
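
A minimal sketch of the one-time pad over bytes, assuming the key is drawn uniformly at random, is exactly as long as the message, and is used once. The function names are illustrative.

```python
import secrets

def otp_keygen(n_bytes):
    """Sample a uniformly random key r with the same length as the message."""
    return secrets.token_bytes(n_bytes)

def otp_xor(key, data):
    """XOR key and data byte by byte; encryption and decryption are the same."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(k ^ d for k, d in zip(key, data))

m = b"attack at dawn"
r = otp_keygen(len(m))
z = otp_xor(r, m)           # encrypt: z = r XOR m
assert otp_xor(r, z) == m   # decrypt: m = z XOR r
```

The length check reflects Shannon's bound: the key carries at least as much entropy as the message, and reusing r for a second message would reveal the XOR of the two messages.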

23
Defining security
  • The power of the adversary
  • Computational: probabilistic polynomial time
    machine (PPTM)
  • Access to the system: e.g., can it change
    messages?
  • Passive adversary, (adaptive) chosen plaintext
    attack, chosen ciphertext attack
  • What constitutes a failure of the system?
  • Recovering plaintext from ciphertext: not enough
  • Allows for the leakage of partial information
  • In general, it is hard to say which partial
    information may or may not be leaked. This is
    application dependent.
  • How would partial information the adversary
    already holds be combined with what he learns to
    affect privacy?
  • Better: prevent learning anything about an
    encrypted message
  • There are two common, equivalent, definitions

24
Security of Encryption, Definition 1:
Indistinguishability of Encryptions
  • Adversary $A$ chooses any $X_0, X_1 \in \{0,1\}^n$
  • Receives an encryption of $X_b$ for $b \in_R \{0,1\}$
  • Has to decide whether $b = 0$ or $b = 1$.
  • For every PPTM $A$, choosing a pair
    $X_0, X_1 \in \{0,1\}^n$:
  • $|\Pr[A(E(X_0)) = 1] - \Pr[A(E(X_1)) = 1]| = \mathrm{neg}(n)$
    (i.e., the difference is negligible in $n$)
  • (Probability is over the choice of keys,
    randomization in the encryption and $A$'s coins)
  • Note that a proof of security must be rigorous
25
Computational Indistinguishability
  • Definition: two sequences of distributions
    $\{D_n\}$ and $\{D'_n\}$ on $\{0,1\}^n$ are
    computationally indistinguishable if
  • for every polynomial $p(n)$ and sufficiently large
    $n$, for every probabilistic polynomial time
    adversary $A$ that receives input $y \in \{0,1\}^n$
    and tries to decide whether $y$ was sampled from
    $D_n$ or $D'_n$:
  • $|\mathrm{Prob}[A = 0 \mid D_n] - \mathrm{Prob}[A = 0 \mid D'_n]| < 1/p(n)$

26
Security of Encryption, Definition 2: Semantic
Security
  • Simulation: whatever adversary $A$ can compute
    given an encryption of $X \in \{0,1\}^n$, so can a
    simulator $S$ that does not get to see the
    encryption of $X$.
  • $A$ selects a distribution $D_n$ on $\{0,1\}^n$ and
    a relation $R(X,Y)$ computable in PPT (e.g.
    $R(X,Y) = 1$ iff $Y$ is the last bit of $X$).
  • $X \in_R D_n$ is sampled
  • Given $E(X)$, $A$ outputs $Y$ trying to satisfy
    $R(X,Y)$
  • The simulator $S$ does the same without access to
    $E(X)$
  • Simulation is successful if $A$ and $S$ have the
    same success probability
  • Successful simulation $\Rightarrow$ semantic security

27
Security of Encryption (2): Semantic Security
  • More formally
  • For every PPTM $A$ there is a PPTM $S$ so that
  • for all PPTM relations $R$
  • for $X \in_R D_n$
  • $|\Pr[R(X, A(E(X)))] - \Pr[R(X, S(\cdot))]|$
  • is negligible.
  • In other words: the outputs of $A$ and $S$ are
    indistinguishable even for a test that is aware
    of $X$.

28
Which is the Right Definition?
  • Semantic security seems to convey that the
    message is protected
  • But it is usually easier to prove
    indistinguishability of encryptions
  • Would like to argue that the two definitions are
    equivalent
  • Must define the attack: chosen plaintext attack
  • Adversary can obtain the encryption for any
    message it chooses, in an adaptive manner
  • More severe attacks: chosen ciphertext
  • The Equivalence Theorem:
  • A cryptosystem is semantically secure if and
    only if it has the indistinguishability of
    encryptions property

29
Equivalence Proof (informal)
  • Semantic security $\Rightarrow$ indistinguishability
    of encryptions
  • Suppose no indistinguishability
  • $A$ chooses a pair $X_0, X_1 \in \{0,1\}^n$ for which
    it can distinguish encryptions with
    non-negligible advantage $\varepsilon$
  • Choose
  • Distribution $D_n = \{X_0, X_1\}$
  • Relation $R$ which is equality with $X$
  • $\forall S$ that doesn't get $E(X)$ and outputs $Y$,
    we have
  • $\mathrm{Prob}[R(X, Y)] = 1/2$
  • Given $E(X_b)$, run $A(E(X_b))$, get output
    $b' \in \{0,1\}$, set $Y = X_{b'}$
  • Now, $\Pr[A(E(X_b)) = 1 \mid b = 1] - \Pr[A(E(X_b)) = 1 \mid b = 0] > \varepsilon$
  • Therefore, $|\Pr[R(X, Y)] - \Pr[R(X, S(\cdot))]| > \varepsilon / 2$

30
Equivalence Proof (informal)
  • Indistinguishability of encryptions $\Rightarrow$
    semantic security
  • Suppose no semantic security: $A$ chooses some
    distribution $D_n$ and some relation $R$
  • Choose $X_0, X_1 \in_R D_n$, choose $b \in_R \{0,1\}$,
    compute $E(X_b)$.
  • Give $E(X_b)$ to $A$, ask $A$ to compute
    $Y_b = A(E(X_b))$
  • For $X_0, X_1 \in_R D_n$ let
  • $\alpha_0 = \mathrm{Prob}[R(X_0, Y_b)]$, $\alpha_1 = \mathrm{Prob}[R(X_1, Y_b)]$
  • With noticeable probability $|\alpha_0 - \alpha_1|$
    is non-negligible, since otherwise $Y_b$ can be
    computed without the encryption.
  • If $|\alpha_0 - \alpha_1|$ is non-negligible, then
    we can distinguish between an encryption of $X_0$
    and an encryption of $X_1$

31
Lessons learned?
  • Rigorous approach to cryptography
  • Defining security
  • Proving security

32
References
  • Books
  • O. Goldreich, Foundations of Cryptography, Vol. 1:
    Basic Tools, Cambridge, 2001
  • (Pseudo-randomness, zero-knowledge)
  • Vol. 2: Basic Applications (to be available May
    2004)
  • (Encryption, Secure Function Evaluation)
  • Other volumes at
    www.wisdom.weizmann.ac.il/~oded/books.html
  • Web material/courses
  • S. Goldwasser and M. Bellare, Lecture Notes on
    Cryptography,
    http://www-cse.ucsd.edu/~mihir/papers/gb.html
  • M. Naor, 9th EWSCS,
    http://www.cs.ioc.ee/yik/schools/win2004/naor.php

33
Secure Function Evaluation
  • A major topic of cryptographic research
  • How to let n parties, P1,..,Pn compute a function
    f(x1,..,xn)
  • Where input xi is known to party Pi
  • Parties learn the final output and nothing else

34
The Millionaires' Problem [Yao]
[Figure: Alice holds x, Bob holds y]
Whose value is greater?
Leak no other information!
35
Comparing Information without Leaking it
[Figure: Alice holds x, Bob holds y]
  • Output: is x = y?
  • The following solution is insecure:
  • Use a one-way hash function H()
  • Alice publishes H(x), Bob publishes H(y)
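
One reason the hash-based comparison is insecure: if the inputs come from a small set, anyone who sees H(x) can hash every candidate and recover x. The sketch below assumes SHA-256 as the hash and a 4-digit input domain, both chosen only for illustration.

```python
import hashlib

def H(value):
    """One-way hash of an integer (SHA-256, an assumption for this sketch)."""
    return hashlib.sha256(str(value).encode()).hexdigest()

def recover_input(published_hash, candidates):
    """Dictionary attack: hash each candidate until one matches."""
    for c in candidates:
        if H(c) == published_hash:
            return c
    return None

alice_x = 1234                       # hypothetical private input
published = H(alice_x)               # what Alice publishes
print(recover_input(published, range(10_000)))
```

The one-wayness of H does not help here, because the attacker never inverts H; it only evaluates H forward on every possible input.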

36
Secure two-party computation - definition
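[Figure: real protocol vs. ideal world]
Input: Alice holds x, Bob holds y
Output: F(x,y) and nothing else
As if: x and y are handed to a trusted third party, which returns F(x,y) to both parties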
37
Leak no other information
  • A protocol is secure if it emulates the ideal
    solution
  • Alice learns F(x,y), and therefore can compute
    everything that is implied by x, her prior
    knowledge of y, and F(x,y).
  • Alice should not be able to compute anything else
  • Simulation
  • A protocol is considered secure if
  • For every adversary in the real world
  • There exists a simulator in the ideal world,
    which outputs an indistinguishable transcript,
    given access to the information that the
    adversary is allowed to learn
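
The ideal world can be modeled directly: a trusted third party receives both inputs and returns only F(x,y). In the sketch below, Alice's ideal-world view is just her own input and the output, so any two values of y that give the same output are indistinguishable to her. The function names and the choice of F (the millionaires' comparison) are illustrative assumptions; this models the ideal solution only, not a real protocol.

```python
def trusted_third_party(f, x, y):
    """Ideal solution: compute F(x, y) and hand the result to both parties."""
    result = f(x, y)
    return result, result            # (Alice's output, Bob's output)

def alice_is_richer(x, y):
    """F for the millionaires' problem: is Alice's value greater than Bob's?"""
    return x > y

def alice_ideal_view(f, x, y):
    """Everything Alice sees in the ideal world: her input and the output."""
    alice_output, _ = trusted_third_party(f, x, y)
    return (x, alice_output)

# Two different inputs of Bob that lead to the same output look identical to Alice.
print(alice_ideal_view(alice_is_richer, 5, 3))
print(alice_ideal_view(alice_is_richer, 5, 1))
```

A real protocol is then judged against this baseline: whatever an adversary sees while running it should be reproducible by a simulator that is given only such an ideal view.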

38
  • More tomorrow