Title: Median
1Secure Computation of the kth Ranked Element
Gagan Aggarwal, Stanford University
Joint work with Nina Mishra and Benny Pinkas, HP Labs
2A story
"I bet the dumbest student in Gryffindor has a higher IQ than the median IQ of all students in the school."
"But you don't even know what the median IQ is."
"Let us compute it..."
"But what about the privacy of the students?"
"We can do secure function evaluation."
"This is all theory. It can't be efficient."
3Rising Need for Privacy
- Many opportunities for interaction between institutions and agencies holding sensitive data.
- Privacy cannot be sacrificed.
  - I.e., different agencies might hold data which they are not allowed to share.
- A need for protocols to evaluate functions while preserving privacy of data.
4Privacy-preserving Computation: the ideal case
[Figure: two parties with inputs x and y. Input: x, y. Output: each party receives F(x,y) and nothing else.]
5Trusted third parties are rare
[Figure: two parties with inputs x and y; each receives F(x,y).]
- Run a protocol to evaluate F(x,y) without a trusted party.
- Two kinds of adversaries
  - Semi-honest: follows the protocol, but is curious to learn more than F(x,y).
  - Malicious: might do anything.
6Is there anything better?
[Figure: two parties with inputs x and y; each receives F(x,y).]
- Does the trusted party scenario make sense?
  - Are the parties motivated to submit their true inputs?
  - Can they tolerate the disclosure of F(x,y)?
- Our goal: implement the scenario without a trusted party.
7Definition of security: semi-honest model
[Figure: two parties with inputs x and y compute F(x,y).]
Protocol is secure if Bob can generate the
sequence of messages exchanged from his own
input y and the value of F(x,y).
8Definition of security: malicious model
- Protocol is secure if
  - for every adversary Bob, there exists an input y
  - s.t. Bob's actions correspond to him presenting y to a trusted third party.
9Secure Function Evaluation [Yao, GMW, BGW, CCD]
- F(x,y): a public function.
- Represented as a Boolean circuit C(x,y).
[Figure: two parties with inputs x and y. Output: each party receives C(x,y) and nothing else.]
- Implementation
  - O(|X|) oblivious transfers.
  - O(|C|) communication.
  - Pretty efficient for small circuits!
    - E.g., is x > y? (Millionaires' problem)
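To make "small circuit" concrete, here is a minimal sketch of a ripple comparator for x > y built only from AND/OR/XOR/NOT gates. It is evaluated in the clear; in a secure function evaluation the same circuit would be evaluated obliviously. The function names `to_bits` and `greater_than_circuit` and the gate-counting convention are illustrative assumptions, not taken from the talk.

```python
def to_bits(value, width):
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def greater_than_circuit(x_bits, y_bits):
    """Evaluate x > y with a ripple comparator over equal-length bit lists.

    Returns (result_bit, gate_count). Hypothetical helper for illustration.
    """
    gt, eq, gates = 0, 1, 0
    for xb, yb in zip(x_bits, y_bits):
        not_yb = 1 - yb                # NOT
        wins_here = eq & xb & not_yb   # two ANDs: higher bits equal, x has 1, y has 0
        gt = gt | wins_here            # OR
        same = 1 - (xb ^ yb)           # XOR followed by NOT
        eq = eq & same                 # AND
        gates += 7
    return gt, gates


if __name__ == "__main__":
    result, gates = greater_than_circuit(to_bits(11, 4), to_bits(6, 4))
    print(result, gates)
```

The gate count grows linearly with the bit length, which is why a comparison of log D-bit numbers is cheap to evaluate securely.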
10Some useful primitives
- Useful to have efficient solutions for simple primitives.
- Let X and Y be sets of elements.
  - X ∩ Y (first talk)
  - Statistics over X ∪ Y
    - Max, Min, Average, Median, kth-ranked element.
11kth-ranked element
- Inputs
  - Alice: S_A. Bob: S_B.
    - Large sets of unique items (∈ S).
  - The rank k
    - Could depend on the size of the input datasets.
    - Median: k = (|S_A| + |S_B|) / 2
- Output
  - x ∈ S_A ∪ S_B s.t. x has k-1 elements smaller than it.
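As a reference point, the function that a trusted party would compute can be sketched in a few lines. This is the plain, non-private functionality only; the name `kth_ranked` is an illustrative assumption, and the items are assumed to be unique across both sets.

```python
def kth_ranked(set_a, set_b, k):
    """Return the element of the union that has exactly k-1 smaller elements.

    Non-private reference functionality; assumes all items are distinct.
    """
    union = sorted(list(set_a) + list(set_b))
    if not 1 <= k <= len(union):
        raise ValueError("k must be between 1 and the size of the union")
    return union[k - 1]


def median_rank(set_a, set_b):
    """Rank used for the median: k = (|S_A| + |S_B|) / 2."""
    return (len(set_a) + len(set_b)) // 2


if __name__ == "__main__":
    s_a, s_b = [1, 5, 9], [2, 7, 8]
    print(kth_ranked(s_a, s_b, median_rank(s_a, s_b)))
```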
12Motivation
- Basic statistical analysis of distributed data.
- E.g. histogram of salaries in all CS departments
(Taulbee survey).
13Faculty salary for top 12 CS departments (2001-2002)
Faculty rank           Number   Minimum   Mean    Median   Maximum
Non-tenure teaching        75      37 K    72 K     72 K     110 K
Assistant professor       118      50 K    81 K     81 K      96 K
Associate professor        86      63 K    91 K     91 K     120 K
Full professor            218      52 K   123 K    117 K     199 K
14Results
- Finding the kth-ranked item (D = domain)
  - Two-party: reduction to log k secure comparisons of log D-bit numbers.
    - log k rounds × O(log D)
  - Multi-party: reduction to log D simple computations with log D-bit numbers.
    - log D rounds × O(log D)
  - Also, security against malicious parties.
  - Can hide the size of the datasets.
15Related work
- Lower bound: Ω(log D)
  - From communication complexity.
- Generic constructions
  - Using circuits [Yao]
    - Overhead at least linear in k.
  - Naor-Nissim
    - Overhead of O(D).
16An (insecure) two-party median protocol
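[Figure: S_A is split at its median m_A into a lower part L_A and an upper part R_A; S_B is split at its median m_B into L_B and R_B. Case shown: m_A < m_B.]
L_A lies below the median, R_B lies above the median. New median is the same as the original median.
Recursion ⇒ need log n rounds (assume each set contains n = 2^i items).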
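The recursion can be sketched as follows, with the medians exchanged in the clear (which is exactly what makes this version insecure). The sketch assumes distinct items, equal set sizes n = 2^i, the lower median as each party's median, and the smaller of the two remaining items as the final answer; these details and the function name are assumptions made for illustration.

```python
def insecure_two_party_median(set_a, set_b):
    """Return (median of the union, list of medians revealed along the way).

    Assumes |set_a| == |set_b| == 2^i and that all items are distinct.
    """
    a, b = sorted(set_a), sorted(set_b)
    n = len(a)
    if n == 0 or n != len(b) or n & (n - 1):
        raise ValueError("both sets must have the same power-of-two size")
    revealed = []
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]   # lower medians
        revealed.append((m_a, m_b))           # leaked to the other party
        if m_a < m_b:
            a, b = a[half:], b[:half]         # drop L_A and R_B
        else:
            a, b = a[:half], b[half:]         # drop R_A and L_B
    return min(a[0], b[0]), revealed


if __name__ == "__main__":
    print(insecure_two_party_median([1, 3, 5, 7], [2, 4, 6, 8]))
```

Each round halves both sets, so the loop runs log n times.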
17Secure two-party median protocol
A finds its median m_A. B finds its median m_B.
Secure comparison (e.g. a small circuit): is m_A < m_B?
YES: A deletes elements ≤ m_A. B deletes elements > m_B.
NO: A deletes elements > m_A. B deletes elements ≤ m_B.
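A minimal sketch of the message flow, assuming distinct items and equal power-of-two set sizes. The functions `secure_less_than` and `secure_min` are hypothetical stand-ins for a real secure comparison (for example, a small garbled circuit); here they simply compute the result locally, so the sketch shows what each party learns, not how the comparison is protected.

```python
class Party:
    """Holds one party's private sorted items (illustrative helper)."""

    def __init__(self, items):
        self.items = sorted(items)

    def median(self):
        # Lower median of the items this party still holds.
        return self.items[(len(self.items) - 1) // 2]

    def keep_upper(self):
        # Delete the elements <= own median.
        self.items = self.items[len(self.items) // 2:]

    def keep_lower(self):
        # Delete the elements > own median.
        self.items = self.items[:len(self.items) // 2]


def secure_less_than(x, y):
    """Stand-in for a secure comparison: only this one bit is revealed."""
    return x < y


def secure_min(x, y):
    """Stand-in for the final secure computation of the smaller item."""
    return min(x, y)


def run_protocol(set_a, set_b):
    """Return (median, list of comparison bits seen by both parties)."""
    a, b = Party(set_a), Party(set_b)
    n = len(a.items)
    if n == 0 or n != len(b.items) or n & (n - 1):
        raise ValueError("both sets must have the same power-of-two size")
    bits = []
    while len(a.items) > 1:
        bit = secure_less_than(a.median(), b.median())
        bits.append(bit)
        if bit:                      # YES branch
            a.keep_upper()
            b.keep_lower()
        else:                        # NO branch
            a.keep_lower()
            b.keep_upper()
    return secure_min(a.median(), b.median()), bits


if __name__ == "__main__":
    print(run_protocol([1, 3, 5, 7], [2, 4, 6, 8]))
```

Compared with exchanging medians in the clear, the only values that cross between the parties are the comparison bits and the final output.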
18An example
[Figure: worked example on sets A and B. Each round's comparison outcome (m_A > m_B or m_A < m_B) halves both sets until "Median found!!"]
19Proof of security
[Figure: sets A and B with the median marked, annotated with the comparison outcomes m_A > m_B and m_A < m_B.]
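One way to see why the comparison bits leak nothing beyond the output, in the semi-honest sense, is that a party can regenerate them from its own input and the median alone. The sketch below assumes distinct items and equal power-of-two set sizes, in which case the outcome of "m_A < m_B" coincides with "m_A < median" in every round. The function names are illustrative assumptions, and this is a sanity check of the idea rather than a formal proof.

```python
def real_bits(set_a, set_b):
    """Run the halving protocol in the clear; return (bits, median)."""
    a, b = sorted(set_a), sorted(set_b)
    bits = []
    while len(a) > 1:
        half = len(a) // 2
        bit = a[half - 1] < b[half - 1]
        bits.append(bit)
        if bit:
            a, b = a[half:], b[:half]
        else:
            a, b = a[:half], b[half:]
    return bits, min(a[0], b[0])


def simulate_bits_for_a(set_a, median):
    """Regenerate the comparison bits from A's input and the output only."""
    a = sorted(set_a)
    bits = []
    while len(a) > 1:
        half = len(a) // 2
        bit = a[half - 1] < median   # same outcome as m_A < m_B
        bits.append(bit)
        a = a[half:] if bit else a[:half]
    return bits


if __name__ == "__main__":
    bits, median = real_bits([1, 3, 5, 7], [2, 4, 6, 8])
    print(bits == simulate_bits_for_a([1, 3, 5, 7], median))
```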
20Still to come
- Security against malicious parties.
- Adapt the median protocol for arbitrary k and arbitrary input set size.
- Hide the size of the datasets.
- kth element for multi-party scenario.
21Security against malicious parties
- Comparisons secure against malicious parties.
- Verify that parties' inputs to comparisons are consistent. I.e., prevent:
  - Round 1: m_A = 1000. Is told to delete all x > 1000.
  - Round 2: m_A = 1100.
- Solution: each round sends secure state to the next round (i.e., boundaries for parties' inputs). Implement reactive computation [C, CLOS].
- Can implement in a single circuit. Efficient security against malicious parties.
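The boundary idea can be sketched as a comparison that carries state between rounds. In this hypothetical sketch the state is a pair of open bounds per party, and an input outside its bounds is rejected; in an actual protocol this state would live inside the secure computation and not be visible to either party. The class name and the use of strict bounds for the comparison rounds are assumptions made for illustration.

```python
import math


class ReactiveComparison:
    """Comparison rounds that remember bounds on each party's inputs."""

    def __init__(self):
        self.lo_a = self.lo_b = -math.inf
        self.hi_a = self.hi_b = math.inf

    def compare(self, m_a, m_b):
        """Return m_a < m_b after checking both inputs against earlier rounds."""
        if not self.lo_a < m_a < self.hi_a:
            raise ValueError("A's input is inconsistent with earlier rounds")
        if not self.lo_b < m_b < self.hi_b:
            raise ValueError("B's input is inconsistent with earlier rounds")
        if m_a < m_b:
            self.lo_a = m_a   # A deleted everything <= m_a
            self.hi_b = m_b   # B deleted everything > m_b
            return True
        self.hi_a = m_a       # A deleted everything > m_a
        self.lo_b = m_b       # B deleted everything <= m_b
        return False


if __name__ == "__main__":
    rounds = ReactiveComparison()
    print(rounds.compare(1000, 900))      # A is told to delete all x > 1000
    try:
        rounds.compare(1100, 950)         # the inconsistent input from the slide
    except ValueError as err:
        print("rejected:", err)
```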
22Security against malicious parties
[Figure: binary tree of comparisons; each node branches on YES/NO.
Root: a4 < b4.
Level 2: a2 < b6, a6 < b2.
Level 3: a1 < b7, a3 < b5, a5 < b3, a7 < b1.
Leaves: a1 < b8, a2 < b7, a3 < b6, a4 < b5, a5 < b4, a6 < b3, a7 < b2, a8 < b1.]
25Security against malicious parties
- An adversary is fully defined by the inputs a_i it gives for each of the nodes of this tree.
- These (consistent) a_i's form an input x which can be used with F(x,y) to generate a transcript.
26Arbitrary input size, arbitrary k
[Figure: sets S_A and S_B with the rank k marked.]
Now, compute the median of two sets of size k.
Size should be a power of 2.
Median of new inputs = kth element of original inputs.
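The slide does not spell out how the new inputs are built, so the following is only one plausible construction, stated as an assumption: each party keeps its k smallest items, pads with +∞ up to size k, and the two sets are then brought to the next power of two by giving A extra -∞ items and B the same number of extra +∞ items. The function name `reduce_to_median` is hypothetical.

```python
import math


def reduce_to_median(set_a, set_b, k):
    """Build two equal power-of-two-sized lists whose median is the kth element.

    Assumed construction for illustration; not taken from the slides.
    """
    size = 1
    while size < k:
        size *= 2
    extra = size - k
    a = sorted(set_a)[:k]
    b = sorted(set_b)[:k]
    a = [-math.inf] * extra + a + [math.inf] * (k - len(a))
    b = b + [math.inf] * (k - len(b)) + [math.inf] * extra
    return a, b


if __name__ == "__main__":
    new_a, new_b = reduce_to_median([1, 5, 9], [2, 7, 8], k=3)
    merged = sorted(new_a + new_b)
    print(merged[len(new_a) - 1])   # median of the new inputs
```

The padding values are not distinct, so a protocol that assumes unique items would need a tie-breaking convention for them; that detail is left out of this sketch.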
27Hiding size of inputs
- Can search for the kth element without revealing the size of the input sets.
- However, k = n/2 (median) reveals input size.
- Solution: let U = 2^i be a bound on input size. Median of new datasets is the same as the median of original datasets.
[Figure: new datasets built from S_A and S_B.]
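A sketch of one padding that keeps the median unchanged, under the assumption (not stated on the slide) that each party adds equal numbers of -∞ and +∞ items until its set has size U, and that U minus the set size is even. The function name `pad_to_bound` is hypothetical.

```python
import math


def pad_to_bound(items, bound):
    """Pad a set to exactly `bound` items with equal -inf and +inf padding.

    Assumed construction for illustration; requires an even gap.
    """
    missing = bound - len(items)
    if missing < 0 or missing % 2:
        raise ValueError("bound must exceed the set size by an even amount")
    half = missing // 2
    return [-math.inf] * half + sorted(items) + [math.inf] * half


if __name__ == "__main__":
    padded_a = pad_to_bound([1, 5, 9, 11], 8)
    padded_b = pad_to_bound([2, 7], 8)
    merged = sorted(padded_a + padded_b)
    print(merged[8 - 1])   # median of the padded datasets
```

Both padded sets have size U regardless of the true sizes, so running the median protocol on them reveals only the bound U.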
28The multi-party case
- Input: party P_i has set S_i, i = 1..n.
  - (All values ∈ [a,b], where a and b are known.)
- Output: kth element of S_1 ∪ … ∪ S_n.
- Basic idea: binary search on [a,b].
29An example
[Figure: binary search over the range [a,b], stepping left or right until "Median found!!"]
30The multi-party case
- Protocol: set m = (a+b)/2. Repeat:
  - P_i inputs to a secure computation
    - L_i = number of elements in S_i smaller than m.
    - B_i = number of times m appears in S_i.
  - The following is computed securely:
    - If Σ L_i ≥ k: continue in the lower half.
    - Else, if Σ (L_i + B_i) ≥ k: found the median.
    - Otherwise: continue in the upper half.
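The binary search can be sketched as below for integer values. The sums are computed in the clear here; in the protocol only the outcome of the two threshold tests would be revealed. To guarantee termination with integer midpoints, the sketch narrows the range to m-1 or m+1 rather than to m itself, which is a deviation chosen for this illustration. The function name is hypothetical.

```python
def multiparty_kth(sets, k, a, b):
    """Binary search on [a, b] for the kth smallest value across all sets.

    Assumes integer values in [a, b]; duplicates are allowed.
    """
    while a <= b:
        m = (a + b) // 2
        # Each party P_i contributes its private counts L_i and B_i.
        less = [sum(1 for x in s if x < m) for s in sets]
        equal = [sum(1 for x in s if x == m) for s in sets]
        if sum(less) >= k:
            b = m - 1                       # lower half
        elif sum(less) + sum(equal) >= k:
            return m                        # found the kth element
        else:
            a = m + 1                       # upper half
    raise ValueError("k is out of range or values fall outside [a, b]")


if __name__ == "__main__":
    parties = [[1, 5, 9], [2, 5], [7]]
    print(multiparty_kth(parties, k=3, a=0, b=10))
```

The loop halves the range each round, so it takes about log D rounds for a domain of size D.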
31The multi-party case
- Can be made secure for malicious case.
- Using consistency checks.
- Works for two-party case.
- Can be used for non-distinct elements.
32Summary
- Efficient secure computation of the median.
  - Two-party: log k rounds × O(log D)
  - Multi-party: log D rounds × O(log D)
  - Communication overhead is very close to the communication complexity lower bound of log D bits.
- Malicious case is efficient too.
  - Do not use generic tools.
  - Instead, we implement simple consistency checks to get security against malicious parties.
33Thanks for your attention!
34Open Problems
- Approximation protocols for NP-hard problems.
  - Clustering does not admit exact poly-time solutions.
  - At best, hope for a protocol that computes an approximation.
  - Then, comparison to a trusted party which computes the exact solution doesn't seem fair.
  - Need an appropriate notion of privacy.
- Efficient solutions for more primitives.
35Definition of security: malicious model
[Figure: the adversary in the real model learns no more than in the ideal (trusted party) model, where the parties submit x and y and receive F(x,y).]
36The multi-party case
- Input: party P_i has set S_i, i = 1..n.
  - (All values ∈ [a,b], where a and b are known.)
- Output: kth element of S_1 ∪ … ∪ S_n.
- Protocol: set m = (a+b)/2. Repeat:
  - P_i inputs to a secure computation
    - L_i = number of elements in S_i smaller than m.
    - B_i = number of times m appears in S_i.
  - The following is computed securely:
    - If Σ L_i ≥ k, set b = m, m = (a+m)/2. (Left)
    - Else, if Σ (L_i + B_i) ≥ k, stop. kth element is m. (Done)
    - Otherwise, set a = m, m = (m+b)/2. (Right)
37Definition of security: semi-honest model
[Figure: two parties with inputs x and y compute F(x,y).]
Protocol is secure if Bob can generate a transcript T from his own input y and the value of F(x,y), s.t. T is computationally indistinguishable from the actual transcript of the protocol.
39Definition of security: malicious model
Protocol is secure if for every adversary
Bob, there exists an input y s.t. Bob can
generate a computationally indistinguishable
transcript from this input y and the value of
F(x,y).
40Security against malicious parties
- Consistency checks ensure that
  - Along any execution path, a_i < a_j and b_i < b_j for all i < j.
  - Any a_i or b_i appears at most twice on each execution path, and is checked to be consistent at those occurrences.
- Any adversary is fully defined by the inputs b_i it gives for each of the nodes of this tree.
- These (consistent) b_i's form an input y which can be used with F(x,y) to generate a transcript.
41Previous work
- Generic constructions using circuits [Yao]
  - Overhead at least linear in k.
- Naor-Nissim
  - Any function which can be computed with communication complexity of c bits can be privately computed with overhead 2^c.
  - Communication complexity of median is Θ(log D) bits.
  - Implies overhead of D using this approach.