Foundations of Privacy Lecture 10

Transcript and Presenter's Notes
1
Foundations of Privacy, Lecture 10
  • Lecturer: Moni Naor

2
Recap of lecture two weeks ago
  • Continually changing data
  • Counters
  • How to combine expert advice
  • Multi-counter and the list update problem
  • Pan Privacy

3
What if the data is dynamic?
  • Want to handle situations where the data keeps
    changing
  • Not all data is available at the time of
    sanitization

(Figure: Curator / Sanitizer)
4
Google Flu Trends
We've found that certain search terms are good
indicators of flu activity. Google Flu Trends
uses aggregated Google search data to estimate
current flu activity around the world in near
real-time.
5
Example of Utility: Google Flu Trends
6
What if the data is dynamic?
  • Want to handle situations where the data keeps
    changing
  • Not all data is available at the time of
    sanitization
  • Issues
  • When does the algorithm make an output?
  • What does the adversary get to examine?
  • How do we define an individual whom we should protect? (D vs. D + Me)
  • Efficiency measures of the sanitizer

7
Data Streams
Data is a stream of items. The sanitizer sees each item and updates its internal state, producing output either on-the-fly or at the end.
(Figure: Data Stream → Sanitizer → output)
8
Three new issues/concepts
  • Continual Observation
  • The adversary gets to examine the output of the
    sanitizer all the time
  • Pan Privacy
  • The adversary gets to examine the internal state
    of the sanitizer. Once? Several times? All the
    time?
  • User vs. Event Level Protection
  • Are the items singletons or are they related?

9
Randomized Response
  • Randomized Response Technique [Warner 1965]
  • Method for polling on stigmatizing questions
  • Idea: lie with known probability.
  • Specific answers are deniable
  • Aggregate results are still valid
  • The data is never stored in the clear

Trust no one. Popular in the DB literature [Mishra and Sandler].
(Figure: user bits 1, 0, 1, each reported through noise.)
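A minimal Python sketch of the randomized-response idea above. The lying probability p_lie, the function names, and the unbiasing step are illustrative assumptions, not taken from the slides.

```python
import random

def randomize(bit, p_lie=0.25):
    """Report the true bit with probability 1 - p_lie, the flipped bit otherwise."""
    return bit if random.random() > p_lie else 1 - bit

def estimate_fraction(reports, p_lie=0.25):
    """Unbias the aggregate: E[report] = (1 - p_lie)*f + p_lie*(1 - f)."""
    avg = sum(reports) / len(reports)
    return (avg - p_lie) / (1 - 2 * p_lie)

# Example: 10,000 users, 30% of whom hold the stigmatizing attribute.
truth = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
reports = [randomize(b) for b in truth]       # only the noisy reports are ever stored
print(round(estimate_fraction(reports), 3))   # close to 0.3
```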

10
The Dynamic Privacy Zoo
Petting
(Diagram of notions: Randomized Response, Differentially Private, Continual Observation, Pan Private, User-level Private, User-Level Continual-Observation Pan Private)
11
Continual Output Observation
Data is a stream of items. The sanitizer sees each item and updates its internal state, producing an output observable to the adversary.
(Figure: Sanitizer → Output, visible to the adversary.)
12
Continual Observation
  • Alg - algorithm working on a stream of data
  • Mapping prefixes of data streams to outputs
  • Step i: output σᵢ
  • Alg is ε-differentially private against continual observation if for all adjacent data streams S and S', and for all prefixes t, the outputs σ₁ σ₂ … σₜ satisfy the inequality below.

Adjacent data streams: one can be obtained from the other by changing one element.
Example: S = acgtbxcde, S' = acgtbycde.

e^{−ε} ≤ Pr[Alg(S) = σ₁ σ₂ … σₜ] / Pr[Alg(S') = σ₁ σ₂ … σₜ] ≤ e^ε   (recall e^ε ≈ 1 + ε)
13
The Counter Problem
0/1 input stream: 011001000100000011000000100101
Goal: a publicly observable counter approximating the total number of 1s so far.
Continual output: each time period, output the total number of 1s.
Want to hide individual increments while providing reasonable accuracy.
14
Counters w. Continual Output Observation
Data is a stream of 0/1 values. The sanitizer sees each xᵢ and updates its internal state, producing a value observable to the adversary.
(Figure: input stream 1 0 0 1 0 0 1 1 0 0 0 1 → Sanitizer → running outputs 1, 1, 1, 2, …)
15
Counters w. Continual Output Observation
Continual output: each time period, output the total number of 1s.
Initial idea: at each time period, on input xᵢ ∈ {0, 1}:
  update the counter by the input xᵢ;
  add independent Laplace noise with magnitude 1/ε.
Privacy: each increment is protected by Laplace noise, so the output is differentially private whether xᵢ is 0 or 1.
Accuracy: the noise partially cancels out, giving error Õ(√T), where T is the total number of time periods. For sparse streams this error is too high.
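A minimal sketch of this initial idea, assuming numpy; the parameter values are illustrative.

```python
import numpy as np

def naive_private_counter(stream, eps=0.1, rng=np.random.default_rng(0)):
    """At each step add x_i plus fresh Laplace(1/eps) noise to the running counter
    and publish it; the accumulated noise grows only like sqrt(t)/eps."""
    noisy_count, outputs = 0.0, []
    for x in stream:                              # x in {0, 1}
        noisy_count += x + rng.laplace(scale=1.0 / eps)
        outputs.append(noisy_count)
    return outputs

T = 10_000
rng = np.random.default_rng(1)
stream = (rng.random(T) < 0.01).astype(int)            # sparse 0/1 stream
outs = np.array(naive_private_counter(stream, eps=0.1))
print(int(np.max(np.abs(outs - np.cumsum(stream)))))   # error on the order of sqrt(T)/eps
```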
16
Why So Inaccurate?
  • Operate essentially as in randomized response
  • No utilization of the state
  • Problem: we do the same operations when the stream is sparse as when it is dense
  • Want to act differently when the stream is dense
  • The times at which the counter is updated are a potential leakage

17
Delayed Updates
Main idea: update the output value only when there is a large gap between the actual count and the output.
Have a good way of outputting the value of the counter when needed: the actual count plus noise.
Maintain:
  actual count Aₜ (+ noise)
  current output outₜ (+ noise)
  D = update threshold
18
Delayed Output Counter
  • Outₜ - current output
  • Aₜ - count since last update
  • Dₜ - noisy threshold
  • If Aₜ − Dₜ > fresh noise then:
  • Outₜ₊₁ ← Outₜ + Aₜ + fresh noise
  • Aₜ₊₁ ← 0
  • Dₜ₊₁ ← D + fresh noise
  • Noise: independent Laplace noise with magnitude 1/ε
  • Accuracy:
  • For threshold D, w.h.p. we update about N/D times
  • Total error ≈ (N/D)^{1/2}·noise + D + noise + noise
  • Set D = N^{1/3} ⇒ accuracy ≈ N^{1/3}

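A minimal sketch of the delayed-output counter just described, assuming numpy; the default threshold D and the exact noise placement are illustrative, not a definitive rendering of the lecture's algorithm.

```python
import numpy as np

def delayed_output_counter(stream, eps=0.1, D=10, rng=np.random.default_rng(0)):
    """Keep an exact count A of 1s since the last published update and refresh the
    public output only when A crosses a noisy threshold (the slide suggests D ~ N^(1/3))."""
    lap = lambda: rng.laplace(scale=1.0 / eps)
    out, A, thresh = 0.0, 0, D + lap()
    outputs = []
    for x in stream:                        # x in {0, 1}
        A += x
        if A - thresh > lap():              # noisy test hides the update times
            out += A + lap()                # noisy increment hides the update values
            A, thresh = 0, D + lap()        # reset the count, redraw the threshold
        outputs.append(out)
    return outputs
```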
19
Privacy of Delayed Output
Outₜ₊₁ ← Outₜ + Aₜ + fresh noise, when Aₜ − Dₜ > fresh noise; Dₜ₊₁ ← D + fresh noise
  • Protect both the update times and the update values
  • For any two adjacent sequences
  • 101101110001
  • 101101010001
  • Can pair up the noise vectors
  • γ₁ γ₂ … γ_{k−1} γₖ γ_{k+1} …
  • γ₁ γ₂ … γ_{k−1} γ'ₖ γ_{k+1} …
  • Identical in all locations except one: γ'ₖ = γₖ + 1, where k is the first update after the difference occurred, so the noisy thresholds Dₜ and D'ₜ produce the same update times
Probability ratio ≤ e^ε
20
Dynamic from Static
Accumulator: measures the stream while it lies inside the accumulator's time frame.
  • Run many accumulators in parallel
  • Each accumulator counts the number of 1's in a fixed segment of time, plus noise
  • Value of the output counter at any point in time: the sum of the accumulators of a few segments
  • Accuracy depends on the number of segments in the summation and on the accuracy of the accumulators
  • Privacy depends on the number of accumulators that a single point influences

Idea: apply the conversion of static algorithms into dynamic ones [Bentley-Saxe 1980]. Only finished segments are used.
21
The Segment Construction
Based on the bit representation: each point t is in ⌈log t⌉ segments, and Σᵢ₌₁ᵗ xᵢ is the sum of at most log t accumulators, so much of the noise cancels.
By setting ε' = ε / log T we get the desired privacy.
Accuracy: with all but negligible (in T) probability, the error at every step t is at most O((log^{1.5} T)/ε).
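A minimal sketch of the segment construction, written as the standard dyadic ("binary mechanism") counter; the per-level budget split and variable names are illustrative assumptions.

```python
import numpy as np

def segment_counter(stream, eps=1.0, rng=np.random.default_rng(0)):
    """Maintain one noisy accumulator per finished dyadic segment; the count at time t
    is the sum of the at most log t noisy segments whose union is [1, t].  Each item
    falls into about log T segments, so the noise is scaled by log T / eps."""
    T = len(stream)
    levels = max(1, int(np.ceil(np.log2(T + 1))))
    scale = levels / eps                          # budget eps split over the levels
    alpha = [0.0] * (levels + 1)                  # exact sums of the open segments
    noisy = [0.0] * (levels + 1)                  # their noisy, publishable versions
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1             # lowest set bit: segment that closes now
        alpha[i] = sum(alpha[:i]) + x             # merge the finished lower segments
        for j in range(i):
            alpha[j] = noisy[j] = 0.0
        noisy[i] = alpha[i] + rng.laplace(scale=scale)
        outputs.append(sum(noisy[j] for j in range(levels + 1) if (t >> j) & 1))
    return outputs
```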
22
Synthetic Counter
  • Can make the counter synthetic:
  • monotone
  • each round the counter goes up by at most 1
  • Applies to any monotone function

23
Lower Bound on Accuracy
  • Theorem: additive inaccuracy of log T is essential for ε-differential privacy, even for ε = 1
  • Consider the stream 0^T compared to the collection of T/b streams of the form S_j = 0^{jb} 1^b 0^{T−(j+1)b}
  • S_j = 00000000 1111 00000000000000   (a block of b ones)

Call an output sequence correct if it is a b/3 approximation at all points in time.
24
Lower Bound on Accuracy
S_j = 0^{jb} 1^b 0^{T−(j+1)b}
  • Important properties:
  • For any output, the ratio of its probabilities under stream S_j and under 0^T is at least e^{−εb} (hybrid argument from differential privacy)
  • Any output sequence is correct (a b/3 approximation at all points in time) for at most one of S_j or 0^T
  • Say the probability of a good output sequence is at least γ under the stream it is good for
  • Good for S_j ⇒ its probability under 0^T is at least γ·e^{−εb}
  • Since the good events are disjoint, (T/b)·γ·e^{−εb} ≤ 1 − γ must hold; with b = ½·log T and γ = ½ this fails: contradiction
25
Hybrid Proof
  • Want to show that for any event B:
    Pr[A(0^T) ∈ B] ≥ e^{−εb} · Pr[A(S_j) ∈ B]
  • Let S_j^i = 0^{jb} 1^i 0^{T−jb−i}, so that S_j^0 = 0^T and S_j^b = S_j
  • Differential privacy for each single change gives
    Pr[A(S_j^i) ∈ B] ≥ e^{−ε} · Pr[A(S_j^{i+1}) ∈ B]
  • Chaining the b steps:
    Pr[A(S_j^0) ∈ B] ≥ e^{−ε}·Pr[A(S_j^1) ∈ B] ≥ … ≥ e^{−εb}·Pr[A(S_j^b) ∈ B]
26
What shall we do with the counter?
  • Privacy-preserving counting is a basic building
    block in more complex environments
  • General characterizations and transformations: event-level pan-private continual-output algorithm for any low-sensitivity function
  • Following expert advice privately: track experts over time and choose whom to follow; need to track how many times each expert was correct

27
Following Expert Advice
[Hannan 1957], [Littlestone-Warmuth 1989]
  • n experts; in every time period each gives 0/1 advice
  • pick which expert to follow
  • then learn the correct answer, also in {0, 1}
  • Goal: over time, be competitive with the best expert in hindsight

Expert 1: 1 1 1 0 1
Expert 2: 0 1 1 0 0
Expert 3: 0 0 1 1 1
Correct:  0 1 1 0 0
28
Following Expert Advice
n experts; in every time period each gives 0/1 advice, we pick which expert to follow, and then learn the correct answer, also in {0, 1}.
Goal: over time, be competitive with the best expert in hindsight:
# mistakes of the chosen experts ≈ # mistakes made by the best expert in hindsight.
Want a 1 + o(1) approximation.
(Table of expert advice repeated from the previous slide.)
29
Following Expert Advice, Privately
  • n experts; in every time period each gives 0/1 advice
  • pick which expert to follow
  • then learn the correct answer, also in {0, 1}
  • Goal: over time, be competitive with the best expert in hindsight
  • New concern: protect the privacy of the experts' opinions and of the outcomes
  • User-level privacy (was the expert consulted at all?): lower bound, no non-trivial algorithm
  • Event-level privacy: counting gives a (1 + o(1))-competitive algorithm
30
Algorithm for Following Expert Advice
  • Follow the perturbed leader [Kalai-Vempala]: for each expert keep a perturbed # of mistakes and follow the expert with the lowest perturbed count
  • Idea: use a counter to count mistakes in a privacy-preserving way
  • Problem: not every perturbation works; we need a counter with a well-behaved noise distribution
  • Theorem (Follow the Privacy-Perturbed Leader): for n experts, over T time periods, the # of mistakes is within poly(log n, log T, 1/ε) of the best expert (a sketch follows below)
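A minimal sketch of following a privacy-perturbed leader. Here each expert's mistake counter simply receives fresh Laplace noise every round, which is only illustrative; the lecture's construction instead plugs in a continual-observation counter with a well-behaved noise distribution.

```python
import numpy as np

def follow_private_leader(expert_advice, outcomes, eps=1.0, rng=np.random.default_rng(0)):
    """expert_advice[i][t] is expert i's 0/1 advice at time t; outcomes[t] is the truth.
    Follow, at every step, the expert whose noisy mistake count is lowest."""
    n, T = len(expert_advice), len(outcomes)
    mistakes = np.zeros(n)                     # true mistake counts (kept internally)
    my_mistakes = 0
    for t in range(T):
        noisy = mistakes + rng.laplace(scale=1.0 / eps, size=n)   # perturbed counts
        leader = int(np.argmin(noisy))
        my_mistakes += int(expert_advice[leader][t] != outcomes[t])
        mistakes += np.array([expert_advice[i][t] != outcomes[t] for i in range(n)])
    return my_mistakes
```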

31
List Update Problem
  • There are n distinct elements A = {a₁, a₂, …, aₙ}
  • Have to maintain them in a list (some permutation)
  • Given a request sequence r₁, r₂, …
  • Each rᵢ ∈ A
  • For request rᵢ the cost is how far rᵢ is in the current permutation
  • Can rearrange the list between requests
  • Want to minimize the total cost of the request sequence
  • The sequence is not known in advance

Our goal: do it while providing privacy for the request sequence (event level: for each request rᵢ one cannot tell whether rᵢ is in the sequence or not), assuming the list order is public.
32
List Update Problem
  • In general the cost can be very high
  • First problem to be analyzed in the competitive framework, by Sleator and Tarjan (1985)
  • Compared to the best algorithm that knows the sequence in advance
  • Best algorithms:
  • 2-competitive deterministic
  • Better randomized: 1.5
  • Assume free rearrangements between requests
  • Bad news: cannot be better than (1/ε)-competitive if we want to keep privacy

Cannot act until about 1/ε requests to an element appear.
33
Lower bound for Deterministic Algorithms
  • Bad schedule: always ask for the last element in the list
  • Cost of the online algorithm: n·t
  • Cost of the best fixed list: sort the list according to popularity
  • Average cost: ½·n
  • Total cost: ½·n·t

34
List Update Problem: Static Optimality
  • A more modest performance goal: compete with the best algorithm that fixes the permutation in advance
  • Blum-Chawla-Kalai: can be 1 + o(1) competitive w.r.t. the best static algorithm (probabilistic)
  • The BCK algorithm is based on the number of times each element has been requested
  • Algorithm:
  • Start with random weights rᵢ in the range [1, c]
  • At all times wᵢ = rᵢ + cᵢ, where cᵢ is the # of times element aᵢ was requested
  • At any point in time, arrange the elements according to their weights

35
Privacy with Static Optimality
  • Algorithm (run with a private counter):
  • Start with random weights rᵢ in the range [1, c]
  • At any point in time wᵢ = rᵢ + cᵢ, where cᵢ is the # of times element aᵢ was requested
  • Arrange the elements according to their weights
  • Privacy: follows from the privacy of the counters
  • The list depends only on the counters plus the initial randomness
  • Accuracy: the BCK proof can be modified to handle approximate counts as well
  • What about efficiency?

A rough sketch of the private BCK rule follows below.
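A minimal sketch of the BCK-style rule driven by noisy counts; the per-request noise is a crude stand-in for the continual-observation counter the lecture actually uses, and the parameters are illustrative.

```python
import numpy as np

def private_list_update(requests, elements, c=8.0, eps=1.0, rng=np.random.default_rng(0)):
    """Each element's weight is a random offset in [1, c] plus a noisily maintained
    request count; the list is kept sorted by decreasing weight."""
    offset = {a: rng.uniform(1.0, c) for a in elements}
    noisy_count = {a: rng.laplace(scale=1.0 / eps) for a in elements}
    order = list(elements)
    total_cost = 0
    for r in requests:
        total_cost += order.index(r) + 1                  # cost = position of the request
        noisy_count[r] += 1 + rng.laplace(scale=1.0 / eps)
        order.sort(key=lambda a: offset[a] + noisy_count[a], reverse=True)
    return total_cost, order
```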
36
The multi-counter problem
  • How to run n counters for T time steps
  • In each round a few counters are incremented
  • The identity of the incremented counter is kept private
  • Work per increment: logarithmic in n and T
  • Idea: arrange the n counters in a binary tree with n leaves
  • Output counters are associated with the leaves
  • For each internal node maintain a counter corresponding to the sum of the leaves in its subtree

37
The multi-counter problem
  • Idea: arrange the n counters in a binary tree with n leaves
  • Output counters are associated with the leaves
  • For each internal node maintain a pair (internal, output):
  • a counter corresponding to the sum of the leaves in its subtree
  • a register with the number of increments since the last output update (this determines when to update the subtree)
  • When a leaf counter is updated:
  • All log n nodes on the path to the root are incremented
  • The internal state of the root is updated
  • If the output of a parent node is updated, the internal state of its children is updated
38
Tree of Counters
(Figure: a tree of counters; each internal node holds a (counter, register) pair, each leaf an output counter.)
39
The multi-counter problem
  • Work per increment:
  • log n register increments + the number of counters that need an output update
  • Amortized complexity is O(n log n / k), where k is the number of times we expect to increment a counter until its output is updated
  • Privacy: each increment of a leaf counter affects log n counters
  • Accuracy: we have introduced some delay
  • After t = k·log n increments, all nodes on the path have been updated

A rough sketch of the tree of counters follows below.
40
Pan-Privacy
think of the children
  • In the privacy literature the data curator is trusted
  • In reality, even a well-intentioned curator is subject to mission creep, subpoena, security breach, …
  • Pro baseball: anonymous drug tests
  • Facebook: policies to protect users from application developers
  • Google: accounts hacked
  • Goal: the curator accumulates statistical information, but never stores sensitive data about individuals
  • Pan-privacy: the algorithm is private inside and out; the internal state is privacy-preserving.
41
Randomized Response [Warner 1965]
  • Method for polling on stigmatizing questions
  • Idea: participants lie with known probability.
  • Specific answers are deniable
  • Aggregate results are still valid
  • Data never stored in the clear; popular in the DB literature [MiSa06]

Strong guarantee: no trust in the curator. Makes sense when each user's data appears only once; otherwise the utility is limited.
New idea: the curator aggregates statistical information, but never stores sensitive data about individuals.
(Figure: user data 1, 0, 1, each passed through noise to form the user response.)
42
Aggregation Without Storing Sensitive Data?
  • Streaming algorithms: small storage
  • The information stored can still be sensitive
  • My data: many appearances, arbitrarily interleaved with those of others (user level)
  • Pan-Private Algorithm
  • Private inside and out
  • Even the internal state completely hides the appearance pattern of any individual: presence, absence, frequency, etc.
43
Pan-Privacy Model
Data is a stream of items; each item belongs to a user. Data of different users is interleaved arbitrarily. The curator sees the items, updates its internal state, and produces an output at the end of the stream.
(Can also consider multiple intrusions.)
Pan-Privacy: for every possible behavior of a user in the stream, the joint distribution of the internal state at any single point in time and the final output is differentially private.
44
Adjacency: User Level
  • Universe U of users whose data is in the stream; x ∈ U
  • Streams are x-adjacent if they have the same projections of users onto U \ {x}
  • Example: axbxcxdxxxex and abcdxe are x-adjacent
  • Both project to abcde
  • Notion of corresponding locations in x-adjacent streams
  • U-adjacent: ∃ x ∈ U for which the streams are x-adjacent
  • Simply "adjacent", if U is understood
  • Note: streams of different lengths can be adjacent

45
Example: Stream Density, or Distinct Elements
  • Universe U of users; estimate how many distinct users in U appear in the data stream
  • Application: # of distinct users who searched for "flu"
  • Ideas that don't work:
  • Naïve: keep a list of the users that appeared (bad privacy and space)
  • Streaming:
  • Track a random sub-sample of users (bad privacy)
  • Hash each user, track the minimal hash (bad privacy)

46
Pan-Private Density Estimator
Inspired by randomized response. Store for each user x ∈ U a single bit bₓ.
Initially all bₓ are drawn from distribution D0: 0 w.p. ½, 1 w.p. ½.
When encountering x, redraw bₓ from distribution D1: 0 w.p. ½ − ε, 1 w.p. ½ + ε.
Final output: (fraction of 1s in the table − ½)/ε + noise.
Pan-Privacy: if a user never appeared, the entry is drawn from D0; if the user appeared any number of times, the entry is drawn from D1. D0 and D1 are 4ε-differentially private.
(A sketch follows below.)
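A minimal sketch of the density estimator above; the output-noise scale and the function name are illustrative assumptions.

```python
import numpy as np

def pan_private_density(stream, universe, eps=0.2, rng=np.random.default_rng(0)):
    """One bit per user: initialised from D0 (unbiased coin), redrawn from the slightly
    biased D1 whenever that user appears; the table alone is differentially private."""
    bit = {x: int(rng.integers(0, 2)) for x in universe}       # D0: 1 w.p. 1/2
    for x in stream:                                           # user x appears
        bit[x] = int(rng.random() < 0.5 + eps)                 # D1: 1 w.p. 1/2 + eps
    frac_ones = sum(bit.values()) / len(universe)
    estimate = (frac_ones - 0.5) / eps                         # unbias the table
    estimate += rng.laplace(scale=1.0 / (eps * len(universe))) # output noise (scale illustrative)
    return estimate * len(universe)                            # approx. # of distinct users
```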
47
Pan-Private Density Estimator
Same estimator as on the previous slide.
Improved accuracy and storage: multiplicative accuracy using hashing; small storage using sub-sampling.
48
Pan-Private Density Estimator
Theorem: there is a density-estimation streaming algorithm with ε pan-privacy and multiplicative error α whose space is poly(1/α, 1/ε).
49
Density Estimation with Multiple Intrusions
  • If intrusions are announced, we can handle multiple intrusions; the accuracy degrades exponentially in the # of intrusions
  • Can we do better?
  • Theorem (multiple-intrusion lower bounds): if there are either
  • two unannounced intrusions (for finite-state algorithms), or
  • non-stop intrusions (for any algorithm),
  • then the additive accuracy cannot be better than Ω(n)

50
What other statistics have pan-private algorithms?
Density: # of users that appeared at least once
Incidence counts: # of users appearing exactly k times
Cropped means: mean, over users, of min(t, # appearances)
Heavy-hitters: # of users appearing at least k times
51
Counters and Pan Privacy
  • Is the counter algorithm pan-private?
  • No: the internal counts accurately reflect what happened since the last update
  • Easy to correct: store them together with noise
  • Add (1/ε)-Laplace noise to all accumulators, both in storage and when the output is updated
  • At most doubles the noise

(Figure: stored count = accumulator + noise.)
52
Continual Intrusion
  • Consider multiple intrusions
  • Most desirable: resistance to continual intrusion
  • The adversary can continually examine the internal state of the algorithm
  • Implies also continual observation
  • Something can be done: randomized response
  • But:
  • Theorem: any counter that is ε-pan-private under continual observation and with m intrusions must have additive error Ω(√m) with constant probability.

53
Proof of lower bound
  • Two distributions on streams:
  • I0: the all-0 stream
  • I1: xᵢ = 0 with probability 1 − 1/(k√n) and xᵢ = 1 with probability 1/(k√n)
  • Let D_b be the distribution on internal states when running on I_b
  • Claim: the statistical distance between D0 and D1 is small
  • Key point: the transition probabilities can be represented as
    Q0^s(x) = ½·C(x) + ½·C'(x)
    Q1^s(x) = (½ − 1/(k√n))·C(x) + (½ + 1/(k√n))·C'(x)

Randomized response is essentially the best we can do.
54
Pan Privacy under Continual Observation
Definition: for U-adjacent streams S and S', the joint distribution of the internal state at any single location and the sequence of all outputs is differentially private.
(Figure: internal state and output sequence, both covered by the definition.)
55
A General Transformation
  • Transform any static algorithm A to continual output, maintaining:
  • Pan-privacy
  • Storage size
  • The hit in accuracy is low for large classes of algorithms
  • Main idea: delayed updates. Update the output value only rarely, when there is a large gap between A's current estimate and the output.

56
Theorem: General Transformation
(sensitivity sens_A = the maximum output difference on adjacent streams)
Transform any algorithm A for a monotone function f with error α, sensitivity sens_A, and maximum value N. The new algorithm has ε-privacy under continual observation, maintains A's pan-privacy and storage, and its error is Õ(α + √(N·sens_A/ε)).
57
General Transformation: Main Idea
(Figure: input stream a 0 b c b b d e fed to A, which produces out.)
  • Assume A is a pan-private estimator for a monotone function f with maximum value N
  • If Aₜ − outₜ₋₁ > D then outₜ ← Aₜ
  • For threshold D, w.h.p. we update about N/D times

58
General Transformation: Main Idea
(Same figure as above.)
  • Assume A is a pan-private estimator for a monotone function f with maximum value N
  • A's output may not be monotonic
  • If Aₜ − outₜ₋₁ > D then outₜ ← Aₜ
  • What about privacy? The update times and the update values leak
  • For threshold D, w.h.p. we update about N/D times
  • Quit if the number of updates exceeds the bound N/D

59
General Transformation: Privacy
If Aₜ − outₜ₋₁ > D then outₜ ← Aₜ. What about privacy? The update times and the update values leak.
Add noise: a noisy threshold test ⇒ privacy-preserving update times; a noisy update ⇒ privacy-preserving update values.
60
Error: Õ(D + (sens_A·N)/(D·ε))
General Transformation: Privacy
  • If Aₜ − outₜ₋₁ + noise > D then outₜ ← Aₜ + noise
  • Scale the noise(s) to Bound·sens_A/ε
  • Yields (ε, δ)-differential privacy: Pr[z | S] ≤ e^ε·Pr[z | S'] + δ
  • Proof pairs noise vectors that are far from causing quitting on S with noise vectors for which S' has exactly the same update times
  • Few noise vectors are bad; the paired vectors are ε-private
61
Theorem: General Transformation
  • Transform any algorithm A for a monotone function f with error α, sensitivity sens_A, maximum value N
  • The new algorithm:
  • satisfies ε-privacy with continual observation,
  • maintains A's pan-privacy and storage
  • Error is Õ(α + √(N·sens_A/ε))
  • Extends from monotone to stable functions
  • Gives a loose characterization of the functions that can be computed privately under continual observation without pan-privacy

62
What other statistics have pan-private algorithms?
  • Pan-private streaming algorithms for:
  • Stream density / number of distinct elements
  • t-cropped mean: mean, over users, of min(t, # appearances)
  • Fraction of users appearing exactly k times
  • Fraction of heavy-hitters: users appearing at least k times

63
Incidence Counting
  • Universe X of users. Given k, estimate what fraction of users in X appear exactly k times in the data stream (user-level privacy!)
  • Difficulty: can't track an individual's # of appearances
  • Idea: keep track of a noisy # of appearances
  • However, we can't accurately track whether an individual appeared 0, k, or 100k times
  • Different approach: follows the count-min [CM05] idea from the streaming literature
64
Incidence Counting a la Count-Min
  • Use a pan-private algorithm that gets as input:
  • a hash function h: Z → [M] (for a small range M)
  • a target value val
  • and outputs the fraction of users with h(# appearances) = val
  • Given this, estimate the k-incidence as the fraction of users with h(# appearances) = h(k)
  • Concern: might we over-estimate? (hash collisions)
  • Accuracy: if h has low collision probability, then with some probability collisions are few and the estimate is accurate
  • Repeat to amplify (output the minimal estimate)

65
Putting it together
  • Hash by choosing a small random prime p: h(z) = z (mod p)
  • Pan-private modular incidence counter: gets p and val, estimates the fraction of users with # appearances ≡ val (mod p); space is poly(p), but a small p suffices
  • Theorem: k-incidence counting streaming algorithm with
  • ε pan-privacy, multiplicative error α, and an upper bound N on the number of appearances
  • Space is poly(1/α, 1/ε, log N)

66
t-Incidence Estimator
  • Let R = {1, 2, …, r} be the smallest range of integers containing at least 4·log N/α distinct prime numbers
  • Choose at random L distinct primes p₁, p₂, …, p_L
  • Run the modular incidence counter for each of these L primes
  • When a value x appears, update each of the L modular counters
  • For any desired t: for each i ∈ [L],
  • let fᵢ be the i-th modular incidence counter evaluated at t (mod pᵢ)
  • Output the (noisy) minimum of these fractions
67
Pan-Private Modular Incidence Counter
  • For every user x, keep a counter cₓ ∈ {0, …, p−1}; increase the counter (mod p) every time the user appears
  • If initially 0: no privacy, but perfect accuracy
  • If initially random: perfect privacy, but no accuracy
  • Initialize using a distribution slightly biased towards 0:
    Pr[cₓ = i] ∝ e^{−εi/(p−1)}
  • Privacy: a user's appearances have only a small effect on the distribution of cₓ

(Figure: the biased initial distribution over counter values 0, …, p−1.)
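A minimal sketch of the modular incidence counter's internal state; the recovery of the true incidences (by linear programming, next slide) is omitted, and the parameters are illustrative.

```python
import numpy as np

def modular_incidence_state(stream, universe, p=5, eps=0.5, rng=np.random.default_rng(0)):
    """Each user's counter lives in Z_p, starts from a distribution mildly biased toward 0
    (Pr[c_x = i] proportional to exp(-eps*i/(p-1))), and is incremented mod p on every
    appearance.  Returns the observed histogram o_j of counter values."""
    weights = np.exp(-eps * np.arange(p) / (p - 1))
    init_dist = weights / weights.sum()
    counter = {x: int(rng.choice(p, p=init_dist)) for x in universe}   # biased initial state
    for x in stream:
        counter[x] = (counter[x] + 1) % p
    observed = np.zeros(p)
    for c in counter.values():
        observed[c] += 1
    return observed / len(universe)            # o_j: fraction of users with noisy count j
```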
68
Modular Incidence Counter Accuracy
  • For j ∈ {0, …, p−1}:
  • oⱼ is the # of users with observed (noisy) count j
  • tⱼ is the # of users that truly appear j times (mod p)
  • oⱼ ≈ Σ_{k=0}^{p−1} t_{j−k (mod p)} · e^{−εk/(p−1)} (up to normalization)
  • Using the observed oⱼ's we get p (approximate) equations in the p variables tₖ; solve using linear programming
  • The solution is close to the true counts
69
Pan-private Algorithms under Continual Observation
Density: # of users that appeared at least once
Incidence counts: # of users appearing exactly k times
Cropped means: mean, over users, of min(t, # appearances)
Heavy-hitters: # of users appearing at least k times
70
The Dynamic Privacy Zoo
Petting
(Diagram of notions: Differentially Private Outputs, Privacy under Continual Observation, Pan Privacy, Continual Pan Privacy, User-level Privacy, Sketch vs. Stream)