Private Keyword Search on Streaming Data (transcript of a PowerPoint presentation)
1
Private Keyword Search on Streaming Data

Rafail Ostrovsky and William Skeith (UCLA)
(patent pending)
2
Motivating Example
  • The intelligence community collects data from
    multiple sources that might potentially be
    useful for future analysis:
  • Network traffic
  • Chat rooms
  • Web sites, etc.
  • However, what counts as useful is often itself
    classified.

3
Current Practice
  • Continuously transfer all data to a secure
    environment.
  • After the data is transferred, filter it in the
    classified environment, keeping only a small
    fraction of the documents.

4
  • Classified Environment

[Diagram: three streams of documents D(1,1), D(1,2), ..., D(3,3) flow into the classified environment, where a Filter passes matching documents to Storage. Filter rules are written by an analyst and are classified!]
5
Current Practice
  • Drawbacks:
  • Communication: all data must be shipped to the
    secure environment.
  • Processing: all filtering happens inside it.

6
How to improve performance?
  • Distribute the work to many locations on a
    network.
  • A seemingly ideal solution, but there is a
    major problem:
  • It is not clear how to maintain privacy, which
    is the focus of this talk.

7
  • Classified Environment

[Diagram: filtering is now distributed. At each remote location a Filter processes its own document stream and stores encrypted results such as E(D(1,2)), E(D(1,3)), E(D(2,2)); only these are sent to the classified environment, where they are decrypted into Storage as D(1,2), D(1,3), D(2,2).]
8
  • Example Filter
  • Look for all documents that contain special
    classified keywords selected by an analyst
  • Perhaps an alias of a dangerous criminal
  • Privacy:
  • Must hide which words are used to create the
    filter
  • Output must be encrypted

9
More generally
  • We define the notion of Public Key Program
    Obfuscation:
  • An encrypted version of a program
  • Performs the same functionality as the
    un-obfuscated program, but
  • Produces encrypted output
  • Is impossible to reverse engineer
  • A little more formally

10
Public Key Program Obfuscation
11
Privacy
12
Related Notions
  • PIR (Private Information Retrieval)
    [CGKS, KO, CMS]
  • Keyword PIR [KO, CGN, FIPR]
  • Program Obfuscation [BGIRSVY]
  • There the output is identical to that of the
    un-obfuscated program; in our case it is
    encrypted.
  • Public Key Program Obfuscation is a more
    general notion than PIR, with lots of
    applications.

13
What we want
[Diagram: a Filter runs directly on the document stream D(1,1), D(1,2), D(1,3), ... and keeps its matches in Storage.]
14
[Diagram: the stream interleaves matching documents 1, 2, and 3 with non-matching documents; only the three matching documents should end up in the output.]
15
How to accomplish this?
16
Several Solutions Based on Homomorphic Encryption
  • For this talk: Paillier encryption
  • Properties:
  • Plaintext set: Z_n
  • Ciphertext set: Z_{n^2}
  • Additively homomorphic, i.e., E(x)·E(y) = E(x+y)
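As a concrete illustration of the additive homomorphism, here is a toy Paillier implementation in Python. The primes and helper names are illustrative only, and the parameters are far too small for real use:

```python
import math
import secrets

# Toy parameters: real Paillier uses primes of ~1024 bits or more.
P, Q = 1789, 1907
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)                      # Carmichael's lambda(N)
# With generator g = N + 1, L(g^LAM mod N^2) = LAM, so mu = LAM^-1 mod N.
MU = pow((pow(N + 1, LAM, N2) - 1) // N, -1, N)

def encrypt(m):
    """E(m) = (1+N)^m * r^N mod N^2 for a random r."""
    r = secrets.randbelow(N - 1) + 1              # r in [1, N-1]; gcd check omitted
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """m = L(c^LAM mod N^2) * mu mod N, where L(u) = (u-1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

# Multiplying ciphertexts adds the underlying plaintexts:
cx, cy = encrypt(20), encrypt(22)
print(decrypt((cx * cy) % N2))                    # 42
```

Note that the product of ciphertexts is taken mod N^2, matching the "ciphertext set Z_{n^2}" bullet above.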

17
Simplifying Assumptions for this Talk
  • All keywords come from some poly-size dictionary
  • Truncate documents beyond a certain length

18
[Diagram of the construction: each word of the Dictionary is paired with a ciphertext (g, g^D); as documents stream past, their contributions are homomorphically accumulated into the Output Buffer.]
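In the spirit of the construction, here is a rough Python sketch of such a filter. The names (`make_filter`, `process`, `recover`), the dictionary, the documents, and the buffer parameters are all invented for illustration: each dictionary word maps to a ciphertext, E(1) for classified keywords and E(0) for everything else, and each document's contribution is homomorphically added into random buffer positions.

```python
import math
import secrets

# Toy Paillier (additively homomorphic); still far too small for real use.
P, Q = 2147483647, 2147483629
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow((pow(N + 1, LAM, N2) - 1) // N, -1, N)

def enc(m):
    r = secrets.randbelow(N - 1) + 1
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

DICTIONARY = ["alias", "cat", "dog", "meeting", "tree"]

def make_filter(keywords):
    # Every dictionary word gets a ciphertext: E(1) for the classified
    # keywords, E(0) otherwise.  All ciphertexts look alike, so the
    # filter hides which words it searches for.
    return {w: enc(1 if w in keywords else 0) for w in DICTIONARY}

def process(filt, doc, buf, copies=2):
    m = int.from_bytes(doc.encode(), "big")       # document as an integer < N (toy!)
    v = 1
    for w in set(doc.split()):
        if w in filt:
            v = v * filt[w] % N2                  # v = E(#matching keywords) = E(cnt)
    # Store the pair (E(cnt*m), E(cnt)) into random buffer slots.
    # A non-matching document has cnt = 0 and so contributes nothing.
    for _ in range(copies):
        i = secrets.randbelow(len(buf))
        buf[i] = (buf[i][0] * pow(v, m, N2) % N2, buf[i][1] * v % N2)

def recover(buf):
    out = set()
    for cm, c in buf:
        cnt = dec(c)
        if cnt:                                   # slot holds cnt copies of one doc
            m = dec(cm) * pow(cnt, -1, N) % N
            out.add(m.to_bytes((m.bit_length() + 7) // 8, "big").decode())
    return out

buf = [(enc(0), enc(0)) for _ in range(20)]
filt = make_filter({"alias"})
process(filt, "alias", buf)                       # matching document
process(filt, "dog", buf)                         # non-matching document
print(recover(buf))                               # {'alias'}
```

If two *different* matching documents land in the same slot, `recover` would produce garbage; that is exactly the collision problem the next slides address.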
19
"Here's another matching document..."
  • Collisions cause two problems:
  • 1. Good documents are destroyed
  • 2. Non-existent documents could be fabricated

[Diagram: matching documents 1, 2, and 3 already occupy buffer positions; a newly arriving matching document can collide with them.]
20
  • We'll make use of two combinatorial lemmas

21
(No Transcript)
22
How to detect collisions?
  • Append a highly structured (yet random) k-bit
    string to the message.
  • The sum of two or more such strings is another
    such string only with probability negligible
    in k.
  • Specifically, partition the k bits into triples,
    and set exactly one bit of each triple to 1.

23
      100001100010010100001010010
    ⊕ 010001010001100001100001010
    ⊕ 010100100100010001010001010
    -----------------------------
      100100010111100100111010010

  The sum contains triples (e.g. 111) with more
  than one bit set, so a collision is detected.
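The triple structure is easy to check in code. A minimal sketch (function names are mine), modeling the sum of colliding strings as bitwise XOR as in the example above:

```python
import secrets

def tag(k=27):
    # A structured k-bit string: exactly one '1' in each triple of bits.
    triples = []
    for _ in range(k // 3):
        t = ["0", "0", "0"]
        t[secrets.randbelow(3)] = "1"
        triples.append("".join(t))
    return "".join(triples)

def valid(s):
    # Every triple must contain exactly one '1'.
    return all(s[i:i + 3].count("1") == 1 for i in range(0, len(s), 3))

def xor(a, b):
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# The three strings from the slide, and their sum:
a = "100001100010010100001010010"
b = "010001010001100001100001010"
c = "010100100100010001010001010"
s = xor(xor(a, b), c)
print(valid(a), valid(b), valid(c))   # True True True
print(s, valid(s))                    # 100100010111100100111010010 False
```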
24
Detecting Overflow (more than m matching documents)
  • Double the buffer size from m to 2m.
  • If m < #documents ≤ 2m, output "overflow".
  • If #documents > 2m, the expected number of
    collisions is large, so "overflow" is output
    in this case as well.
  • Not yet in the eprint version; it will appear
    soon, along with some other extensions.
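To get a feel for why far more than 2m documents forces many collisions, here is a quick back-of-envelope estimate (my own model, not from the paper): treat each of the γ copies of each document as a ball thrown into a uniformly random buffer slot, and count slots hit by at least two distinct documents.

```python
def expected_colliding_slots(docs, copies, buf_size):
    # Probability that a fixed slot is touched by one given document
    # (each of its `copies` writes picks a uniform slot).
    p = 1 - (1 - 1 / buf_size) ** copies
    # A slot "collides" when >= 2 distinct documents touch it; by
    # linearity of expectation over the buf_size slots:
    at_most_one = (1 - p) ** docs + docs * p * (1 - p) ** (docs - 1)
    return buf_size * (1 - at_most_one)

# Buffer of size 2m = 200, gamma = 2 copies per matching document:
print(expected_colliding_slots(10, 2, 200))    # few documents: few collisions
print(expected_colliding_slots(400, 2, 200))   # >> 2m documents: most slots collide
```

Under this model, once the number of matching documents well exceeds the buffer size, nearly every slot is expected to collide, so collisions alone reliably signal overflow.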

25
More from the paper that we don't have time to
discuss
  • Reducing program size below dictionary size
    (using Φ-hiding from [CMS])
  • Queries containing AND (using BGN machinery)
  • Eliminating the negligible error (using perfect
    hashing)
  • A scheme based on arbitrary homomorphic
    encryption

26
Conclusions
  • Private searching on streaming data
  • Public key program obfuscation, more general than
    PIR
  • Practical, efficient protocols
  • Many open problems

27
Thanks For Listening!