Title: How to compile searching software so that it is impossible to reverse-engineer.
(Private Keyword Search on Streaming Data)
Rafail Ostrovsky, William Skeith
UCLA
(patent pending)
2. MOTIVATION: Problem 1
- Each hour, we wish to find whether any of hundreds of passenger lists contains a name from the Possible Terrorists list and, if so, his/her itinerary.
- The Possible Terrorists list is classified and should not be revealed to airports.
- Tantalizing question: can the airports help (and do all the search work) if they are not allowed to get the Possible Terrorists list?
PROBLEM 1: Is it possible to design mobile software that can be transmitted to all airports (including potentially revealing this software to the adversary due to leaks) so that this software collects ONLY the information needed, without revealing what it is collecting at each node?
Non-triviality requirement: it must send back only the needed information, not everything!
3. MOTIVATION: Problem 2
- Looking for malicious insiders and/or terrorist communication.
- (I) First, we must identify some signature criteria (rules) for suspicious behavior; typically, this is done by analysts.
- (II) Second, we must detect which nodes/stations transmit these signatures.
- Here, we want to tackle part (II).
[Figure: public networks]
PROBLEM 2: Is it possible to design software that can capture all messages (and network locations) that match a secret/classified set of rules?
Key challenge: the software must not reveal the secret rules.
Non-triviality requirement: the software must send back only the locations and messages that match the given rules, not everything it sees.
4. Current Practice
- Continuously transfer all data to a secure environment.
- After the data is transferred, filter it in the classified environment, keeping only a small fraction of the documents.
5. Current Practice
[Figure: documents D(1,1) through D(3,3) from three streams are all transferred to a single Filter in the classified environment, which keeps the matches in Storage.]
Filter rules are written by an analyst and are classified!
The amount of data that must be transferred to a classified environment is enormous!
6. Drawbacks
- Communication
- Processing
- Cost and timeliness
7. How to improve performance?
- Distribute the work to many locations on a network, where you decide on the fly which data is useful.
- Seemingly ideal solution, but...
- Major problem:
- It is not clear how to maintain security, which is the focus of this technology.
8. Our Architecture
Punch line: we can send executable code publicly (it won't reveal its secrets!).
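9. HIGH NETWORK (classified)
[Figure: each of three streams D(i,1), D(i,2), D(i,3) passes through its own public Filter into local Storage that holds only encrypted matches: E(D(1,2)) and E(D(1,3)) for stream 1, E(D(2,2)) for stream 2, nothing for stream 3. The classified HIGH NETWORK decrypts these buffers and stores D(1,2), D(1,3), D(2,2).]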
10. Example Filters
- Look for all documents that contain special classified keywords (or strings or data items) and/or do not contain some other data, selected by an analyst.
- Privacy:
- Must hide what rules are used to create the filter.
- Output must be encrypted.
11. What do we want?
[Figure: stream D(1,1), D(1,2), D(1,3) passes through the Filter; Storage holds only E(D(1,2)) and E(D(1,3)).]
Two requirements:
- Correctness: only matching documents are saved, nothing else.
- Efficiency: the decoding cost is proportional to the length of the buffer, not the size of the entire stream.
Conundrum: compiled filter code is not allowed to have ANY branches (i.e., no if-then-else executables). Only straight-line code is allowed!
12. Simplifying Assumptions for this Talk
- All keywords come from some poly-size dictionary
- Truncate documents beyond a certain length
13. Sneak peek: the compiled code
- Suppose we are looking for all documents that contain some secret word from the Webster dictionary.
- Here is how it looks to the adversary: for each document, execute the same code, as follows.
14.
Look up the encryptions of all words appearing in the document and multiply them together. Take this value and apply a fixed formula to it to get the value g.
[Figure: a dictionary of words w1, w2, ..., wn, each paired with an encryption E(·); the words of document D are looked up to produce g, which is written into a small output buffer of (·, ·, ·) slots.]
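A minimal Python sketch of the lookup-and-multiply step above, under loudly stated assumptions: it uses a toy Paillier instance with generator n + 1 and two small Mersenne primes (insecure, chosen only to keep the example fast), a hypothetical five-word DICTIONARY and hypothetical SECRET keywords, and it multiplies the encryption of every word occurrence. The "fixed formula" applied afterwards is not shown. The filter runs the same steps for every document; only the holder of the secret key learns how many secret words matched.

```python
import math
import random

# Toy Paillier with generator n + 1. These primes are far too small for
# real security; they only keep the example fast.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m):
    """Probabilistic encryption of m with fresh randomness r in Z_n^*."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def D(c):
    """Decryption, done only in the classified environment."""
    return (pow(c, lam, n2) - 1) // n * mu % n

# Hypothetical public dictionary and hypothetical classified keywords.
DICTIONARY = ["alpha", "bravo", "charlie", "delta", "echo"]
SECRET = {"alpha", "bravo"}

# The "compiled" filter: one ciphertext per dictionary word,
# E(1) for a secret keyword and E(0) for every other word.
compiled = {w: E(1 if w in SECRET else 0) for w in DICTIONARY}

def filter_document(doc):
    """Multiply the looked-up encryptions; assumes every word is in DICTIONARY."""
    g = 1  # trivial encryption of 0 (r = 1), the multiplicative identity
    for w in doc.split():
        g = g * compiled[w] % n2
    return g

print(D(filter_document("alpha charlie alpha")))  # 2 matching word occurrences
```

Because the dictionary ciphertexts are all fresh random values, the published `compiled` table looks the same whether a word is secret or not (given realistic key sizes).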
15. How should a solution look?
16.
[Figure: a stream in which matching documents 1, 2 and 3 are interleaved with non-matching documents.]
17. How do we accomplish this?
18. Reminder: PKE
- $\text{Key-generation}(1^k) \to (PK, SK)$
- $E(PK, m, r) \to c$
- $D(c, SK) \to m$
- We will use PKE with additional properties.
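One way to instantiate the three PKE algorithms above, as a hedged Python sketch: it uses Paillier (the scheme this talk adopts) with generator n + 1, and it replaces Key-generation(1^k) by two fixed Mersenne primes, which is an insecure toy choice. The names keygen, fresh_r, encrypt and decrypt are illustrative, not taken from the talk.

```python
import math
import random

def keygen():
    """Key-generation: returns (PK, SK). Stands in for Key-generation(1^k);
    the fixed primes below are far too small for real security."""
    p, q = 2**61 - 1, 2**89 - 1
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    pk = n                           # public key: n (generator fixed to n + 1)
    sk = (n, lam, pow(lam, -1, n))   # secret key: n, lambda, mu = lambda^-1 mod n
    return pk, sk

def fresh_r(pk):
    """Randomness r for E(PK, m, r): uniform in Z_n^* (units mod n)."""
    r = random.randrange(1, pk)
    while math.gcd(r, pk) != 1:
        r = random.randrange(1, pk)
    return r

def encrypt(pk, m, r):
    """E(PK, m, r) -> c = (n + 1)^m * r^n mod n^2."""
    n2 = pk * pk
    return pow(pk + 1, m, n2) * pow(r, pk, n2) % n2

def decrypt(c, sk):
    """D(c, SK) -> m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

pk, sk = keygen()
c = encrypt(pk, 42, fresh_r(pk))
print(decrypt(c, sk))  # 42
```

Passing r explicitly mirrors the E(PK, m, r) signature on the slide; in practice r is drawn fresh for every encryption, which is what makes the scheme probabilistic.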
19. Several Solutions Based on Homomorphic Public-Key Encryption
- For this talk: Paillier encryption.
- Properties:
- E(x) is probabilistic; in particular, it can encrypt a single bit in many different ways, such that any instance of E(0) and any instance of E(1) cannot be distinguished.
- Homomorphic, i.e., $E(x) \cdot E(y) = E(x + y)$.
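A small Python check of both properties on a toy Paillier instance (generator n + 1, insecure Mersenne primes, assumed only for speed). Code cannot demonstrate indistinguishability itself; it only shows that two encryptions of the same bit are different ciphertexts, and that multiplying ciphertexts adds the plaintexts modulo n.

```python
import math
import random

# Toy Paillier, generator n + 1; insecure primes, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def D(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Probabilistic: two encryptions of 0 differ (except with negligible
# probability), yet both decrypt to 0.
a, b = E(0), E(0)
print(a != b, D(a), D(b))

# Homomorphic: multiplying ciphertexts adds plaintexts (mod n).
x, y = 17, 25
print(D(E(x) * E(y) % n2))  # 42
```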
20. Using Paillier Encryption
- $E(x) \cdot E(y) = E(x + y)$
- Important to note:
- $E(0)^c = E(0) \cdot E(0) \cdots E(0) = E(0 + 0 + \cdots + 0) = E(0)$
- $E(1)^c = E(1) \cdot E(1) \cdots E(1) = E(1 + 1 + \cdots + 1) = E(c)$
- Assume we can somehow compute an encrypted value v, where we don't know what v stands for, but v = E(0) for uninteresting documents and v = E(1) for interesting documents.
- What's $v^c$? It is either E(0) or E(c), where we don't know which one it is.
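The exponentiation trick can be checked directly with a toy Paillier instance (generator n + 1, insecure Mersenne primes, assumed for speed). The party computing $v^c$ performs the same operation in both cases; only decryption reveals whether the result is E(0) or E(c).

```python
import math
import random

# Toy Paillier, generator n + 1; insecure primes, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def D(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

c = 7
for v, label in ((E(0), "uninteresting"), (E(1), "interesting")):
    w = pow(v, c, n2)   # same computation whichever case v encodes
    print(label, D(w))  # 0 for E(0), 7 for E(1)
```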
21.
[Figure: a dictionary in which each word wi is paired with E(1) if it is a secret keyword and E(0) otherwise (e.g. w2, w5 and wn-2 map to E(1)); document D is turned into the pair (g, g^D), which is multiplied into an output buffer whose slots all start as E(0).]
- $g = E(0)$ if there are no matching words; $g = E(c)$ if there are c matching words.
- $g^D = E(0)$ if there are no matching words; $g^D = E(cD)$ if there are c matching words.
- Thus, if we keep $g = E(c)$ and $g^D = E(cD)$, then after decryption we can calculate $D = cD / c$ exactly.
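A minimal Python sketch of the pair $(g, g^D)$ and of how the classified side recovers D, assuming a toy Paillier instance (insecure Mersenne primes), a hypothetical DICTIONARY and SECRET keyword set, and a document encoded as the integer value of its UTF-8 bytes. The recovery is only exact while cD < n, so the toy documents are kept short; the helper names as_int and recover are illustrative.

```python
import math
import random

# Toy Paillier, generator n + 1; insecure primes, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def D(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

DICTIONARY = ["alpha", "bravo", "charlie", "delta", "echo"]   # hypothetical
SECRET = {"alpha", "bravo"}                                    # hypothetical
compiled = {w: E(1 if w in SECRET else 0) for w in DICTIONARY}

def as_int(doc):
    """Encode a short document as an integer D (requires c * D < n)."""
    return int.from_bytes(doc.encode(), "big")

def filter_document(doc):
    """Public side: returns (g, g^D) = (E(c), E(cD)) with the same steps for every document."""
    g = 1
    for w in doc.split():           # assumes every word is in DICTIONARY
        g = g * compiled[w] % n2
    return g, pow(g, as_int(doc), n2)

def recover(pair):
    """Classified side: decrypt c and cD, then divide to get D back."""
    c, cd = D(pair[0]), D(pair[1])
    if c == 0:
        return None                 # no matching words: nothing to recover
    x = cd // c
    return x.to_bytes((x.bit_length() + 7) // 8, "big").decode()

print(recover(filter_document("bravo bravo echo")))  # bravo bravo echo
print(recover(filter_document("charlie delta")))     # None
```

The branch on c == 0 lives only in the decoder, which runs in the classified environment; the public filter stays straight-line.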
22. Here's another matching document
- Collisions cause two problems:
- 1. Good documents are destroyed.
- 2. Non-existent documents could be fabricated.
[Figure: matching documents 1, 2 and 3 being written into the output buffer, where copies can land in the same slot.]
23.
- We'll make use of two combinatorial lemmas.
25. Combinatorial Lemma 1
- Claim: the color survival game succeeds with probability $> 1 - \mathrm{neg}(\gamma)$.
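The slides state Lemma 1 without defining the game. A common formulation, assumed here rather than taken from the deck: m colors (matching documents), gamma balls of each color (copies), thrown uniformly at random into a fixed number of bins (buffer slots); a color survives if at least one of its balls lands in a bin by itself, and the game succeeds if every color survives. This Monte Carlo sketch uses a hypothetical buffer of 2 * gamma * m bins.

```python
import random

def color_survival_game(m, gamma, bins, rng):
    """One round under the assumed rules: m colors, gamma balls per color,
    thrown uniformly into `bins` bins. Succeeds if every color has at least
    one ball alone in its bin."""
    where = [rng.randrange(bins) for _ in range(m * gamma)]   # ball -> bin
    load = [0] * bins
    for b in where:
        load[b] += 1
    return all(
        any(load[b] == 1 for b in where[i * gamma:(i + 1) * gamma])
        for i in range(m)
    )

def success_rate(m, gamma, bins, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that the game succeeds."""
    rng = random.Random(seed)
    wins = sum(color_survival_game(m, gamma, bins, rng) for _ in range(trials))
    return wins / trials

# Hypothetical setting: m = 10 colors, 2 * gamma * m bins.
for gamma in (1, 2, 5, 10):
    print(gamma, success_rate(10, gamma, 2 * gamma * 10))
```

Under these assumed rules, the estimated success rate climbs quickly toward 1 as gamma grows, which is the qualitative behavior the claim describes.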
26. How to detect collisions?
- Idea: append a highly structured (yet random) short combinatorial object to the message, with the property that if two or more of them collide, the combinatorial property is destroyed.
- ⇒ We can always detect collisions!
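27.
[Figure: three structured strings, each a sequence of 3-bit blocks with exactly one 1 per block, followed by their bitwise XOR, whose 111 blocks break the structure.]
100001100010010100001010010
010001010001100001100001010
010100100100010001010001010
100100010111100100111010010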
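The slide does not say how colliding strings combine, but the fourth string is exactly the bitwise XOR of the first three, so this Python sketch assumes XOR. With Paillier the buffer actually adds plaintexts as integers, so treat this as an illustration of the idea rather than the deck's exact construction; random_tag, well_formed and xor are illustrative names.

```python
import random

BLOCKS = ("100", "010", "001")   # 3-bit blocks with exactly one 1

def random_tag(k, rng=random):
    """Random structured string: k blocks, each chosen uniformly from BLOCKS."""
    return "".join(rng.choice(BLOCKS) for _ in range(k))

def well_formed(tag):
    """True if every 3-bit block contains exactly one 1."""
    return all(tag[i:i + 3].count("1") == 1 for i in range(0, len(tag), 3))

def xor(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# The three strings and the collision shown on the slide.
s1 = "100001100010010100001010010"
s2 = "010001010001100001100001010"
s3 = "010100100100010001010001010"
s4 = "100100010111100100111010010"

print([well_formed(s) for s in (s1, s2, s3)])       # [True, True, True]
print(xor(xor(s1, s2), s3) == s4, well_formed(s4))  # True False
```

Under XOR, two colliding strings can never pass the check, since each block becomes 000 or a block with two 1s. A three-way collision can leave an individual block looking valid (for example 100 XOR 100 XOR 010 = 010), which is why the string uses many blocks.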
28. Combinatorial Lemma 2
- Claim: collisions are detected with probability $> 1 - e^{-k/3}$.
29. We do the same for all documents!
30. For every document in the stream, do the same:
- Look up the encryptions of all words appearing in the document and multiply them together (= g).
- Compute $g^D$ and $f(g)$.
- Multiply $(g, g^D, f(g))$ into $\gamma$ randomly chosen locations.
[Figure: a dictionary w1, ..., wn with an encryption E(·) for each word; the looked-up encryptions for document D yield (g, g^D, f(g)), which is multiplied into a small output buffer of (·, ·, ·) slots.]
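An end-to-end Python sketch of the streaming filter and its decoder, under several assumptions that are not from the slides: a toy Paillier instance (insecure Mersenne primes), a hypothetical DICTIONARY and SECRET set, and hypothetical parameters GAMMA (copies per document), SLOTS (buffer size) and K (tag blocks). The slides do not define f(g), so this sketch omits it and instead detects collisions by appending a random tag of one-hot 3-bit blocks to each document (the idea on the "How to detect collisions?" slide) and checking divisibility and the tag after decryption.

```python
import math
import random

# Toy Paillier, generator n + 1; insecure primes, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m, rng=random):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def D(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

GAMMA, SLOTS, K = 10, 60, 12          # hypothetical: copies, buffer size, tag blocks
TAG_BITS = 3 * K
DICTIONARY = ["alpha", "bravo", "charlie", "delta", "echo"]   # hypothetical
SECRET = {"alpha", "bravo"}                                    # hypothetical
compiled = {w: E(1 if w in SECRET else 0) for w in DICTIONARY}

def encode(doc, rng):
    """Document bytes followed by a random tag of K one-hot 3-bit blocks."""
    tag = 0
    for _ in range(K):
        tag = (tag << 3) | rng.choice((0b100, 0b010, 0b001))
    return (int.from_bytes(doc.encode(), "big") << TAG_BITS) | tag

def tag_ok(x):
    """True if every 3-bit block in the low TAG_BITS bits is one-hot."""
    return all(((x >> (3 * i)) & 0b111) in (1, 2, 4) for i in range(K))

def filter_stream(stream, rng):
    """Public side: identical straight-line steps for every document."""
    buf = [(E(0, rng), E(0, rng)) for _ in range(SLOTS)]
    for doc in stream:
        g = 1                                      # trivial encryption of 0
        for w in doc.split():                      # assumes words are in DICTIONARY
            g = g * compiled[w] % n2
        gd = pow(g, encode(doc, rng), n2)
        for j in rng.sample(range(SLOTS), GAMMA):  # GAMMA distinct random slots
            a, b = buf[j]
            buf[j] = (a * g % n2, b * gd % n2)
    return buf

def decode_buffer(buf):
    """Classified side: decrypt every slot and keep those that pass the checks."""
    found = set()
    for a, b in buf:
        c, cd = D(a), D(b)
        if c == 0 or cd % c:                       # empty slot, or a collision
            continue
        x = cd // c
        if not tag_ok(x):                          # structure destroyed: collision
            continue
        body = x >> TAG_BITS
        try:
            found.add(body.to_bytes((body.bit_length() + 7) // 8, "big").decode())
        except UnicodeDecodeError:
            continue
    return found

stream = ["alpha charlie", "delta echo", "bravo bravo", "charlie delta", "echo alpha"]
print(decode_buffer(filter_stream(stream, random.Random(1))))
```

With these toy parameters the decoder should, with high probability, return exactly the three documents containing a secret keyword. Non-matching documents still touch GAMMA slots, but they contribute only encryptions of 0, so the buffer's plaintext is unchanged while the public-side work is identical for every document.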
31. Extensions (1)
- Can execute more sophisticated rules:
- OR of keywords.
- Catch documents where some words must not be present.
- Catch documents where certain words must be close together in the text.
- Many others, depending on the application.
32. Extensions (2)
- Can do even more:
- Detect overflow.
- In case of an overflow of matching documents, collect a sample.
- Dynamically change rules on a public web page.
- Can act as an ultimate corporate security tool!
33. Conclusions
- We introduced private searching on streaming data.
- More generally: smart encryption.
- Practical, deployable solutions.
- Eat your cake and have it too: ensure that only useful documents are collected.
- A new gadget in your quiver of technologies!
- THANK YOU!