How to compile searching software so that it is impossible to reverse-engineer. - PowerPoint PPT Presentation

About This Presentation
Description:

How to compile searching software so that it is impossible to reverse-engineer. (Private Keyword Search on Streaming Data) Rafail Ostrovsky William ... – PowerPoint PPT presentation

Provided by: ucl109
Learn more at: http://web.cs.ucla.edu

Transcript and Presenter's Notes

1
How to compile searching software so that it is
impossible to reverse-engineer.
(Private Keyword Search on Streaming Data)
Rafail Ostrovsky, William Skeith (UCLA)
(patent pending)
2
MOTIVATION: Problem 1
  • Each hour, we wish to find out whether any of hundreds of
    passenger lists contains a name from the Possible
    Terrorists list and, if so, his or her itinerary.
  • The Possible Terrorists list is classified and
    should not be revealed to airports.
  • Tantalizing question: can the airports help (and
    do all the search work) if they are not allowed
    to get the Possible Terrorists list?

PROBLEM 1: Is it possible to design mobile
software that can be transmitted to all airports
(potentially even revealing this software to
the adversary due to leaks) so that this software
collects ONLY the information needed, without
revealing what it is collecting at each node?
Non-triviality requirement: the software must send back
only the needed information, not everything!
3
MOTIVATION: Problem 2
  • Looking for malicious insiders and/or terrorist
    communication
  • (I) First, we must identify some signature
    criteria (rules) for suspicious behavior;
    typically, this is done by analysts.
  • (II) Second, we must detect which nodes/stations
    transmit these signatures.
  • Here, we want to tackle part (II).

[Figure: public networks]
PROBLEM 2: Is it possible to design software that
can capture all messages (and network locations)
that match a secret/classified set of rules?
Key challenge: the software must not reveal the
secret rules. Non-triviality requirement: the
software must send back only the locations and
messages that match the given rules, not
everything it sees.
4
Current Practice
  • Continuously transfer all data to a secure
    environment.
  • After the data is transferred, filter it in the
    classified environment and keep only a small fraction
    of the documents.

5
Current practice
  • Classified Environment

[Figure: three document streams, D(1,1) D(1,2) D(1,3), D(2,1) D(2,2) D(2,3), and D(3,1) D(3,2) D(3,3), all flow into the Filter and Storage of the classified environment.]
Filter rules are written by an analyst and are
classified!
The amount of data that must be transferred to a
classified environment is enormous!
6
Drawbacks
  • Communication
  • Processing
  • Cost and timeliness

7
How to improve performance?
  • Distribute work to many locations on a network,
    where you decide on the fly which data is
    useful
  • Seemingly ideal solution, but there is a major problem:
  • It is not clear how to maintain security, which is the
    focus of this technology.

8
Our Architecture
Punch line: we can send executable code
publicly. (It won't reveal its secrets!)
9
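  • HIGH NETWORK (classified)

[Figure: each of three streams, D(1,1) D(1,2) D(1,3), D(2,1) D(2,2) D(2,3), and D(3,1) D(3,2) D(3,3), passes through its own Filter with its own Storage. The first Storage holds E(D(1,2)) and E(D(1,3)), the second holds E(D(2,2)), and the third is empty. After the Decrypt step, Storage holds D(1,2), D(1,3), D(2,2).]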
10
  • Example Filters
  • Look for all documents that contain special
    classified keywords (or strings or data items),
    and/or do not contain some other data, selected
    by an analyst.
  • Privacy
  • Must hide what rules are used to create the
    filter
  • Output must be encrypted

11
What do we want?
[Figure: the stream D(1,1) D(1,2) D(1,3) passes through the Filter; Storage holds E(D(1,2)) and E(D(1,3)).]
2 requirements:
  • Correctness: only matching documents are saved, nothing else.
  • Efficiency: the decoding cost is proportional to the length of the
    buffer, not the size of the entire stream.
Conundrum: compiled filter code is not allowed to
have ANY branches (i.e., no if-then-else
statements). Only straight-line code is allowed!
12
Simplifying Assumptions for this Talk
  • All keywords come from some poly-size dictionary
  • Truncate documents beyond a certain length

13
Sneak peek: the compiled code
  • Suppose we are looking for all documents that
    contain some secret word from Webster's dictionary.
  • Here is how it looks to the adversary: for each
    document, execute the same code, as follows.

14
Look up the encryptions of all words appearing in the
document and multiply them together. Take this
value and apply a fixed formula to it to get
the value g.
[Figure: a Dictionary table maps each word w1, w2, ..., wn to a ciphertext E(·); for document D, the result g goes into a Small Output Buffer of (·,·,·) triples.]
15
How should a solution look?
16
[Figure: documents labeled "This is matching document 1", "This is matching document 2", and "This is matching document 3", interleaved with several labeled "This is a Non-matching document".]
17
How do we accomplish this?
18
Reminder: PKE (public-key encryption)
  • Key-generation(1^k) → (PK, SK)
  • E(PK, m, r) → c
  • D(c, SK) → m
  • We will use PKE with additional properties.

19
Several Solutions Based on Homomorphic Public-Key
Encryption
  • For this talk: Paillier encryption
  • Properties:
  • E(x) is probabilistic; in particular, a single bit
    can be encrypted in many different ways, such that no
    instance of E(0) can be distinguished from any
    instance of E(1).
  • Homomorphic, i.e., E(x)·E(y) = E(x+y)
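
The two properties above can be checked on a toy instance. What follows is a minimal sketch of textbook Paillier encryption with generator n+1, not code from the talk. The prime sizes, the names `encrypt` and `decrypt`, and the use of φ(n) as the decryption exponent are illustrative assumptions, and the parameters are far too small to be secure.

```python
import math
import random

# Toy Paillier setup (assumption: demo-sized Mersenne primes, not secure).
P, Q = 2**107 - 1, 2**127 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)
N2 = N * N

def encrypt(m):
    """E(m) = (1 + N)^m * r^N mod N^2 for a fresh random r coprime to N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Recover m from c using phi(N) and its inverse modulo N (Python 3.8+)."""
    return (pow(c, PHI, N2) - 1) // N * pow(PHI, -1, N) % N

a, b = encrypt(3), encrypt(4)
total = decrypt(a * b % N2)  # multiplying ciphertexts adds the plaintexts
print(total)
```

Because a fresh r is drawn on every call, two encryptions of the same bit are different ciphertexts, which is the probabilistic property stated on the slide.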

20
Using Paillier Encryption
  • E(x)·E(y) = E(x+y)
  • Important to note:
  • E(0)^c = E(0)·E(0)·...·E(0)
  •        = E(0+0+...+0) = E(0)
  • E(1)^c = E(1)·E(1)·...·E(1)
  •        = E(1+1+...+1) = E(c)
  • Assume we can somehow compute an encrypted value
    v, where we don't know what v stands for, but
    v = E(0) for uninteresting documents and v = E(1)
    for interesting documents.
  • What is v^c? It is either E(0) or E(c), and we
    don't know which one it is.
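
The effect of raising v to the power c can be tried out directly. This is a small sketch under the same assumptions as any textbook Paillier demo: toy primes, generator n+1, and helper names (`encrypt`, `decrypt`) invented for the illustration. The value c = 5 is arbitrary.

```python
import math
import random

# Toy Paillier setup (assumption: demo-sized Mersenne primes, not secure).
P, Q = 2**107 - 1, 2**127 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)
N2 = N * N

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * pow(PHI, -1, N) % N

c = 5                          # arbitrary public exponent for the demo
v_uninteresting = encrypt(0)   # the party computing v^c cannot tell these apart
v_interesting = encrypt(1)

# Exponentiating a ciphertext multiplies the hidden plaintext by c.
low = decrypt(pow(v_uninteresting, c, N2))   # plaintext 0 * c
high = decrypt(pow(v_interesting, c, N2))    # plaintext 1 * c
print(low, high)
```

Only the holder of the secret key learns whether the result is an encryption of 0 or of c.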

21
[Figure: a Dictionary table maps each word to an encrypted bit: w1 → E(0), w2 → E(1), w3 → E(0), w4 → E(0), w5 → E(1), ..., wn-2 → E(1), wn-1 → E(0), wn → E(0). For document D, the pair (g, g^D) goes to an Output Buffer whose slots all start as E(0).]
g = E(0) if there are no matching words; g = E(c)
if there are c matching words.
g^D = E(0) if there are no matching words; g^D =
E(c·D) if there are c matching words. Thus if we
keep g = E(c) and g^D = E(c·D), we can calculate D
exactly.
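
The recovery of D from the pair can be sketched as follows: decrypt both values to get c and c·D, then divide by c modulo n. This is an illustration only. The dictionary, the secret keywords, the byte encoding of documents as integers, the choice to count each distinct word once, and all function names are assumptions made for the demo, and the toy Paillier parameters are not secure.

```python
import math
import random

# Toy Paillier setup (assumption: demo-sized Mersenne primes, not secure).
P, Q = 2**107 - 1, 2**127 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)
N2 = N * N

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * pow(PHI, -1, N) % N

# Hypothetical dictionary and secret keywords, chosen only for this demo.
DICTIONARY = ["apple", "falcon", "maple", "river", "stone"]
SECRET = {"falcon", "maple"}
# Public table: one encrypted bit per dictionary word.
TABLE = {w: encrypt(1 if w in SECRET else 0) for w in DICTIONARY}

def process(text):
    """Compute (g, g^D); assumes every word of the document is in the dictionary."""
    g = encrypt(0)
    for word in set(text.split()):
        g = g * TABLE[word] % N2          # adds the hidden bits
    d = int.from_bytes(text.encode(), "big")  # document as an integer below N
    return g, pow(g, d, N2)

def recover(g, g_d):
    """With the secret key: return (c, document), or (0, None) if nothing matched."""
    c, cd = decrypt(g), decrypt(g_d)
    if c == 0:
        return 0, None
    d = cd * pow(c, -1, N) % N            # D = (c*D) / c mod N
    return c, d.to_bytes((d.bit_length() + 7) // 8, "big").decode()

print(recover(*process("maple falcon stone")))
print(recover(*process("apple river")))
```

Documents must be short enough to fit below n as integers, which matches the talk's simplifying assumption that documents are truncated beyond a certain length.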
22
Here's another matching document
  • Collisions cause two problems:
  • 1. Good documents are destroyed
  • 2. Non-existent documents could be fabricated

[Figure: documents labeled "This is matching document 1", "This is matching document 2", and "This is matching document 3".]
23
  • We'll make use of two combinatorial lemmas

24
(No Transcript)
25
Combinatorial Lemma 1
  • Claim: the color-survival game succeeds with
    probability > 1 - neg(γ)

26
How to detect collisions?
  • Idea: append a highly structured (yet random)
    short combinatorial object to the message, with
    the property that if 2 or more of them collide,
    the combinatorial property is destroyed.
  • ⇒ can always detect collisions!

27
  • 100001100010010100001010010

010001010001100001100001010
010100100100010001010001010

100100010111100100111010010
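
The four bit strings above can be examined mechanically. The sketch below rests on two observations about these particular strings that the transcript itself does not state, so treat them as assumptions: each of the first three strings splits into 3-bit blocks containing exactly one 1, and the fourth string equals the bitwise XOR of the first three. The function names are hypothetical.

```python
def is_valid(tag, block=3):
    """True if every 3-bit block of the string contains exactly one 1."""
    return all(tag[i:i + block].count("1") == 1
               for i in range(0, len(tag), block))

def xor_tags(*tags):
    """Bitwise XOR of equal-length bit strings, returned as a bit string."""
    acc = 0
    for t in tags:
        acc ^= int(t, 2)
    return format(acc, "0{}b".format(len(tags[0])))

# The four strings from the slide.
A = "100001100010010100001010010"
B = "010001010001100001100001010"
C = "010100100100010001010001010"
R = "100100010111100100111010010"

combined = xor_tags(A, B, C)
print(combined == R)                       # the fourth string is A xor B xor C
print([is_valid(t) for t in (A, B, C)])    # each single tag is well formed
print(is_valid(combined))                  # the collided tag is not
```

The combined string contains the block 111, so the one-bit-per-block structure is destroyed and the collision is visible.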
28
Combinatorial Lemma 2
Claim: collisions are detected with
probability > 1 - exp(-k/3)
29
We do the same for all documents!
30
For every document in the stream, do the same:
Look up the encryptions of all words appearing in the
document and multiply them together (= g).
Compute g^D and f(g).
Multiply (g, g^D, f(g)) into γ randomly chosen
locations.
[Figure: a Dictionary table maps each word w1, w2, ..., wn to a ciphertext E(·); for document D, the triple (g, g^D, f(g)) is multiplied into a Small Output Buffer of (·,·,·) triples.]
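
The per-document loop and the buffer decoding can be sketched end to end. This is a simplified illustration, not the authors' implementation: it omits the collision-detection component f(g) entirely, so the demo stream contains only one matching document and no two different matching documents can collide. The dictionary, keyword, buffer size, number of copies (`GAMMA`), byte encoding, and function names are all assumptions, and the toy Paillier parameters are not secure.

```python
import math
import random

# Toy Paillier setup (assumption: demo-sized Mersenne primes, not secure).
P, Q = 2**107 - 1, 2**127 - 1
N, PHI = P * Q, (P - 1) * (Q - 1)
N2 = N * N

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * pow(PHI, -1, N) % N

# Hypothetical dictionary and secret keyword, chosen only for this demo.
DICTIONARY = ["apple", "falcon", "maple", "river", "stone"]
SECRET = {"falcon"}
TABLE = {w: encrypt(1 if w in SECRET else 0) for w in DICTIONARY}

GAMMA = 3        # copies written per document (hypothetical value)
BUFFER_SIZE = 8  # number of buffer slots (hypothetical value)

def run_filter(stream):
    """Public part: the same steps for every document, with no branch on matching."""
    buffer = [(encrypt(0), encrypt(0)) for _ in range(BUFFER_SIZE)]
    for text in stream:
        g = encrypt(0)
        for word in set(text.split()):
            g = g * TABLE[word] % N2
        g_d = pow(g, int.from_bytes(text.encode(), "big"), N2)
        for _ in range(GAMMA):
            j = random.randrange(BUFFER_SIZE)
            buffer[j] = (buffer[j][0] * g % N2, buffer[j][1] * g_d % N2)
    return buffer

def decode(buffer):
    """Private part: decrypt each slot and recover D = (c*D) / c mod N."""
    found = set()
    for slot_c, slot_cd in buffer:
        c, cd = decrypt(slot_c), decrypt(slot_cd)
        if c != 0:
            d = cd * pow(c, -1, N) % N
            found.add(d.to_bytes((d.bit_length() + 7) // 8, "big").decode())
    return found

stream = ["apple river", "falcon river stone", "stone maple apple"]
print(decode(run_filter(stream)))
```

Non-matching documents multiply encryptions of 0 into the buffer and leave its contents unchanged, and several copies of the same matching document landing in one slot still decode to that document. Decoding touches only the buffer, not the stream.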
31
Extensions (1)
  • Can execute more sophisticated rules
  • OR of keywords
  • Catch documents where some words must not be
    present
  • Catch documents where certain words must be
    close in text
  • Many others, depending on the application.

32
Extensions (2)
  • Can do even more
  • Detect overflow.
  • In case of an overflow of matching documents,
    collect a sample
  • Dynamically change rules on a public web-page
  • Can act as an ultimate corporate security tool!

33
Conclusions
  • We introduced private searching on streaming data
  • More generally: smart encryption
  • Practical, deployable solutions
  • Eat your cake and have it too: ensure that only
    useful documents are collected.
  • A new gadget in your quiver of technologies!
  • THANK YOU!