Batch Codes and Their Applications

Slides: 39
Provided by: rafailos
Learn more at: http://web.cs.ucla.edu

Transcript and Presenter's Notes

1
Batch Codes and Their Applications
  • Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai
  • Preliminary version in STOC 2004

2
Talk Outline
  • Batch codes
  • Amortized PIR
  • via hashing
  • via batch codes
  • Constructing batch codes
  • Concluding remarks

3
A Load-Balancing Scenario
4
What's wrong with a random partition?
  • Good on average for oblivious queries.
  • However:
  • Can't balance adversarial queries
  • Can't balance few random queries
  • Can't relieve hot spots in multi-user setting

5
Example
  • 3 devices, 50% storage overhead.
  • By how much can the maximal load be reduced?
  • Replicating bits is no good: ∃ device s.t. 1/6 of
    the bits can only be found at this device.
  • Factor 2 load reduction is possible

6
Batch Codes
  • (n,N,m,k) batch code
  • Notes:
  • Rate = n/N
  • By default, insist on minimal load per bucket ⇒
    m ≥ k.
  • Load measured by # of probes.
  • Generalizations:
  • Allow t probes per bucket
  • Larger alphabet Σ

7
Multiset Batch Codes
  • (n,N,m,k) multiset batch code
  • Motivation
  • Models multiple users (with off-line
    coordination)
  • Useful as a building block for standard batch
    codes
  • Nontrivial even for multisets of the form
    ⟨i, i, …, i⟩

8
Examples
  • Trivial codes:
  • Replication: N = kn, m = k
  • Optimal m, bad rate.
  • One bit per bucket: N = m = n
  • Optimal rate, bad m.
  • (L, R, L⊕R) code: rate = 2/3, m = 3, k = 2.
  • Goal: simultaneously obtain
  • High rate (close to 1)
  • Small m (close to k)
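
The (L, R, L⊕R) example can be made concrete with a short sketch. The function names `encode` and `retrieve_pair` are hypothetical, and the sketch assumes an even-length bit list; it only illustrates that any two distinct bits can be read with at most one probe per bucket.

```python
def encode(x):
    """Split x into halves L and R and add the parity bucket L XOR R."""
    assert len(x) % 2 == 0, "sketch assumes an even-length database"
    half = len(x) // 2
    left, right = x[:half], x[half:]
    parity = [a ^ b for a, b in zip(left, right)]
    return [left, right, parity]  # m = 3 buckets, N = 3n/2 bits in total


def retrieve_pair(buckets, i, j):
    """Return (x[i], x[j]) for i != j, reading at most one bit per bucket."""
    half = len(buckets[0])
    probes = [0, 0, 0]  # number of bits read from each bucket

    def read(bucket, pos):
        probes[bucket] += 1
        return buckets[bucket][pos]

    def locate(idx):
        # Bucket 0 holds the left half, bucket 1 the right half.
        return (0, idx) if idx < half else (1, idx - half)

    (bi, pi), (bj, pj) = locate(i), locate(j)
    xi = read(bi, pi)
    if bi != bj:
        xj = read(bj, pj)
    else:
        # Both bits live in the same half: rebuild x[j] from the
        # other half and the parity bucket instead of probing twice.
        xj = read(1 - bj, pj) ^ read(2, pj)
    assert max(probes) <= 1
    return xi, xj
```

For example, `retrieve_pair(encode([1, 0, 1, 1, 0, 0]), 0, 2)` reads one bit from each of the three buckets, since both requested positions fall in the left half.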

9
Private Information Retrieval (PIR)
  • Goal: allow user to query database while hiding
    the identity of the data-items she is after.
  • Motivation: patent databases, web searches, ...
  • Paradox(?): imagine buying in a store without the
    seller knowing what you buy.
  • Note: Encrypting requests is useful against third
    parties, not against the server holding the
    data.

10
Modeling
  • Database: n-bit string x
  • User wishes to
  • retrieve x_i and
  • keep i private

11
(No Transcript)
12
Some Solutions
  • 1. User downloads entire database.
  • Drawback: n communication bits (vs. log n + 1
    w/o privacy).
  • Main research goal: minimize communication
    complexity.
  • 2. User masks i with additional random indices.
  • Drawback: gives a lot of information about i.
  • 3. Enable anonymous access to database.
  • Note: addresses the different security
    concern of hiding the user's identity, not
    the fact that x_i is retrieved.
  • Fact: PIR as described so far requires Ω(n)
    communication bits.

13
Two Approaches
  • Computational PIR [KO97, CMS99, ...]
  • Computational privacy
  • Based on cryptographic assumptions
  • Information-Theoretic PIR [CGKS95, Amb97, ...]
  • Replicate database among s servers
  • Unconditional privacy against t servers
  • Default: t = 1

14
Communication Upper Bounds
  • Computational PIR
  • O(n^ε), polylog(n), O(??log n), O(?log n)
    [KO97, CMS99, ...]
  • Information-theoretic PIR
  • 2 servers, O(n^{1/3}) [CGKS95]
  • s servers, O(n^{1/c(s)}) where c(s) = Ω(s log s /
    log log s) [CGKS95, Amb97, BIKR02]
  • O(log n / log log n) servers, polylog(n)

15
Time Complexity of PIR
  • Given low-communication protocols, efficiency
    bottleneck shifts to servers' time complexity.
  • Protocols require (at least) linear time per
    query.
  • This is an inherent limitation!
  • Possible workarounds:
  • Preprocessing
  • Amortize cost over multiple queries

16
Previous Results [BIM00]
  • PIR with preprocessing
  • s-server protocols with O(n^ε) communication and
    O(n^{1/s+ε}) work per query, requiring poly(n)
    storage.
  • Disadvantages:
  • Only work for multi-server PIR
  • Storage typically huge
  • Amortized PIR
  • Slight savings possible using fast matrix
    multiplication
  • Require a large batch of queries and high
    communication
  • Apply also to queries originating from different
    users.
  • This work
  • Assume a batch of k queries originate from a
    single user.
  • Allow preprocessing (not always needed).
  • Nearly optimal amortization

17
Model
(Figure: Server/s, User)
18
Amortized PIR via Hashing
  • Let P be a PIR protocol.
  • Hashing-based amortized PIR:
  • User picks h ∈_R H, defining a random partition of
    x into k buckets of size ≈ n/k, and sends h to
    Server/s.
  • Except for 2^{-σ} failure probability, at most
    t = O(σ·log k) queries fall in each bucket.
  • P is applied t times for each bucket.
  • Complexity:
  • Time ≤ k·t·T(n/k) ≤ t·T(n)
  • Communication ≤ k·t·C(n/k)
  • Asymptotically optimal up to polylog factors
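
The bucket loads that determine t can be explored with a short simulation. This is a minimal sketch: the name `bucket_loads` and the use of a shuffled index list in place of a hash function h are assumptions made for illustration, not the family H used in the talk.

```python
import random
from collections import Counter


def bucket_loads(n, k, queries, seed=None):
    """Partition positions 0..n-1 at random into k buckets of size about n/k
    and count how many of the queried positions land in each bucket."""
    rng = random.Random(seed)
    positions = list(range(n))
    rng.shuffle(positions)
    # The position at rank r of the shuffled order goes to bucket r*k // n.
    bucket_of = {pos: rank * k // n for rank, pos in enumerate(positions)}
    loads = Counter(bucket_of[i] for i in queries)
    return [loads.get(b, 0) for b in range(k)]
```

The protocol has to run P as many times per bucket as the largest entry of the returned list, so `max(bucket_loads(n, k, queries))` plays the role of t for that particular partition and batch.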

19
So what's wrong?
  • Not much
  • Still:
  • Not perfect
  • introduces either error or privacy loss
  • Useless for small k
  • t = O(σ·log k) overhead dominates
  • Cannot hash once and for all
  • ∀h ∃ bad k-tuple of queries
  • Sounds familiar?

20
Amortized PIR via Batch Codes
  • Idea: use batch-encoding instead of hashing.
  • Protocol:
  • Preprocessing: Server/s encode x as
    y = (y_1, y_2, …, y_m).
  • Based on i_1, …, i_k, User computes the index of the
    bit it needs from each bucket.
  • P is applied once for each bucket.
  • Complexity:
  • Time ≤ Σ_{1≤j≤m} T(N_j) ≤ T(N)
  • Communication ≤ Σ_{1≤j≤m} C(N_j) ≤ m·C(n)
  • Trivial batch codes imply trivial protocols.
  • (L, R, L⊕R) code: 2 queries, 1.5× time, 3×
    communication
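
The figures quoted for the (L, R, L⊕R) code can be checked as a worked instance of the two bounds. The derivation assumes (this is not stated on the slide) that the server time of P is linear, T(n) = c·n, and that C is non-decreasing; the three buckets each hold N_1 = N_2 = N_3 = n/2 bits.

```latex
\begin{align*}
\text{Time} &\le \sum_{j=1}^{3} T(N_j) = 3\,T(n/2) = 3 \cdot \frac{cn}{2} = 1.5\,T(n),\\
\text{Communication} &\le \sum_{j=1}^{3} C(N_j) = 3\,C(n/2) \le 3\,C(n).
\end{align*}
```

Both costs are measured against a single run of P on x, and they pay for two queries.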

21
Constructing Batch Codes
22
Overview
  • Recall notion
  • Main qualitative questions:
  • 1. Can we get arbitrarily high constant rate
    (n/N = 1-ε) while keeping m feasible in terms of k
    (say m = poly(k))?
  • 2. Can we insist on nearly optimal m (say m = O(k))
    and still get close to a constant rate?
  • Several incomparable constructions
  • Answer both questions affirmatively.


23
Batch Codes from Unbalanced Expanders
  • By Hall's theorem, the graph represents an
    (n, N=|E|, m, k) batch code iff every set S
    containing at most k vertices on the left has at
    least |S| neighbors on the right.
  • Fully captures replication-based batch codes.
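
For small graphs the Hall-type expansion condition can be checked by brute force. The sketch below is illustrative only: `is_batch_code_graph` is a hypothetical name, the graph is given as a list where entry i is the set of buckets holding a copy of bit i, and the running time is exponential in k.

```python
from itertools import combinations


def is_batch_code_graph(neighbors, k):
    """Check that every set of at most k left vertices (data bits)
    has at least as many neighbors on the right (buckets)."""
    n = len(neighbors)
    for size in range(1, min(k, n) + 1):
        for subset in combinations(range(n), size):
            reachable = set().union(*(neighbors[i] for i in subset))
            if len(reachable) < size:
                return False  # this subset cannot be matched to distinct buckets
    return True
```

For instance, `is_batch_code_graph([{0, 1}, {1, 2}, {0, 2}], 3)` holds, while two bits stored only in bucket 0 fail the condition for k = 2.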

24
Parameters
  • Non-explicit: N = dn, m = O(k·(nk)^{1/(d-1)})
  • d = 3: rate = 1/3, m = O(k^{3/2} n^{1/2}).
  • d = log n: rate = 1/log n, m = O(k) ⇒ Settles Q2
  • Explicit (using [TUZ01, CRVW02])
  • Nontrivial, but quite far from optimal
  • Limitations:
  • Rate < ½ (unless m = Ω(n))
  • For const. rate, m must also depend on n.
  • Cannot handle multisets.

25
The Subcube Code
  • Generalize (L, R, L⊕R) example in two ways:
  • Trade better rate for larger m
  • (Y_1, Y_2, …, Y_s, Y_1 ⊕ … ⊕ Y_s)
  • still k = 2
  • Handle larger k via composition

26
Geometric Interpretation
(Figure: grid of buckets)
A      B      A⊕B
C      D      C⊕D
A⊕C    B⊕D    A⊕B⊕C⊕D
27
Parameters
  • N ≈ k^{log(1+1/s)}·n, m ≈ k^{log(s+1)}
  • s = O(log k) gives an arbitrary constant rate with
    m = k^{O(log log k)} ⇒ almost resolves Q1
  • Advantages:
  • Arbitrary constant rate
  • Handles multisets
  • Very easy decoding
  • Asymptotically dominated by subsequent
    construction.
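
The subcube parameters can be tabulated with a few lines of code. This is a sketch under stated assumptions: the logarithm is taken base 2, and each level of composition is assumed to double k while multiplying the number of buckets by s+1 and the total length by (s+1)/s, which is what the closed forms suggest. The name `subcube_parameters` is hypothetical.

```python
import math


def subcube_parameters(n, s, levels):
    """Return (k, m, N) after composing the (Y_1..Y_s, parity) code `levels` times."""
    k = 2 ** levels                    # each level doubles the batch size
    m = (s + 1) ** levels              # equals k ** log2(s + 1)
    N = n * ((s + 1) / s) ** levels    # equals n * k ** log2(1 + 1/s)
    return k, m, N
```

With s = 2 and two levels, a 4-block database gives k = 4, m = 9 and N = 9, which matches a 3×3 arrangement of buckets built from four data blocks.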

28
The Gadget Lemma
Primitive multiset batch code
  • From now on, we can choose a convenient n and
    get same rate and m(k) for arbitrarily larger n.

29
Batch Codes vs. Smooth Codes
  • Def. A code C: Σ^n → Σ^m is q-smooth if there exists
    a (randomized) decoder D such that
  • D(i) decodes x_i by probing q symbols of C(x).
  • Each symbol of C(x) is probed w/prob ≤ q/m.
  • Smooth codes are closely related to locally
    decodable codes [KT00].
  • Two-way relation with batch codes:
  • q-smooth code ⇒ primitive multiset batch code
    with k = m/q^2 (ideally would like k = m/q).
  • Primitive multiset batch code ⇒ (expected)
    q-smooth for q = m/k
  • Batch codes and smooth codes are very different
    objects:
  • Relation breaks when relaxing multiset or
    primitive
  • Gap between m/q and m/q^2 is very significant for
    high rate case
  • Best known smooth codes with rate > 1/2 require
    q > n^{1/2}
  • These codes are provably useless as batch codes.

30
Batch Codes from RM Codes
  • (s,d) Reed-Muller code over F:
  • Message viewed as s-variate polynomial p over F
    of total degree (at most) d.
  • Encoded by the sequence of its evaluations on all
    points in F^s
  • Case |F| > d is useful due to a smooth decoding
    feature: p(z) can be extrapolated from the
    values of p on any d+1 points on a line passing
    through z.
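
The line-extrapolation property can be illustrated with Lagrange interpolation over a prime field. This is a minimal sketch, not the decoder from the talk: the names `lagrange_at_zero` and `extrapolate` are hypothetical, the field is assumed to be the integers mod a prime q with q ≥ d+2 (so that d+1 points other than z exist on the line), and `pow(den, -1, q)` needs Python 3.8 or later.

```python
def lagrange_at_zero(ts, vals, q):
    """Value at t = 0 of the unique polynomial of degree < len(ts)
    passing through the points (ts[j], vals[j]) over the integers mod q."""
    total = 0
    for j, (tj, vj) in enumerate(zip(ts, vals)):
        num, den = 1, 1
        for l, tl in enumerate(ts):
            if l != j:
                num = num * (-tl) % q
                den = den * (tj - tl) % q
        total = (total + vj * num * pow(den, -1, q)) % q
    return total


def extrapolate(p, z, direction, d, q):
    """Recover p(z) from p at the d+1 points z + t*direction, t = 1..d+1,
    without ever evaluating p at z itself."""
    ts = list(range(1, d + 2))
    points = [tuple((zc + t * dc) % q for zc, dc in zip(z, direction)) for t in ts]
    vals = [p(pt) % q for pt in points]
    # Restricted to the line, p is a univariate polynomial of degree <= d in t.
    return lagrange_at_zero(ts, vals, q)
```

Different directions give different lines through z, so a decoder can choose which d+1 codeword symbols to probe; that freedom is what the smooth decoding feature refers to.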

31
(Figure labels: x_1, x_2, …, x_n)
s = 2, d ≈ (2n)^{1/2}
  • Two approaches for handling conflicts:
  • Replicate each point t times
  • Use redundancy to delete intersections
  • Slightly increases field size, but still allows
    constant rate.

32
Parameters
  • Rate = 1/s! - ε, m = k^{1+1/(s-1)+o(1)}
  • Multiset codes with constant rate (< ½)
  • Rate = Ω(1/k^ε), m = O(k) ⇒ resolves Q2 for
    multiset codes as well
  • Main remaining challenge: resolve Q1


33
The Subset Code
  • Choose s, d such that n ≤ (s choose d)
  • Each data bit i ∈ [n] is associated with a set T ⊆ [s], |T| = d
  • Each bucket j ∈ [m] is associated with a set S ⊆ [s], |S| ≤ d
  • Primitive code: y_S = ⊕_{T⊇S} x_T

34
Batch Decoding the Subset Code
  • Lemma: For each T' ⊆ T, x_T can be decoded from all
    y_S such that S ∩ T = T'.
  • Let L_{T,T'} denote the set of such S.
  • Note: {L_{T,T'} : T' ⊆ T} defines a partition of
    the set of buckets.
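
The decoding lemma can be checked exhaustively on a small instance. The sketch rests on one reading of the construction, which is an assumption here: data bits are indexed by the d-element subsets T of an s-element set, buckets by the subsets S of size at most d, and each bucket stores the XOR of x_T over all T containing S. All function names are hypothetical.

```python
from itertools import combinations


def subsets_up_to(universe, d):
    """All subsets of `universe` with at most d elements, as frozensets."""
    return [frozenset(c) for r in range(d + 1) for c in combinations(universe, r)]


def encode_subset_code(x, s, d):
    """x maps each d-subset T of range(s) to a bit.
    Bucket S stores the XOR of x[T] over all T that contain S."""
    return {S: sum(bit for T, bit in x.items() if S <= T) % 2
            for S in subsets_up_to(range(s), d)}


def decode(y, T, T_prime):
    """XOR of the buckets y[S] over all S whose intersection with T is T_prime."""
    return sum(bit for S, bit in y.items() if S & T == T_prime) % 2
```

In this reading, choosing T' = T probes only the bucket S = T, while choosing T' = ∅ probes every bucket disjoint from T; the freedom to choose T' is what batch decoding exploits.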

35
Batch Decoding the Subset Code (contd.)
  • Goal: Given T_1, …, T_k, find subsets T'_1, …, T'_k such
    that the sets L_{T_i,T'_i} are pairwise disjoint.
  • Easy if all T_i are distinct or if all T_i are
    the same.
  • Attempt 1: T'_i is a random subset of T_i
  • Problem: if T_i, T_j are disjoint, L_{T_i,T'_i} and
    L_{T_j,T'_j} intersect w.h.p.
  • Attempt 2: greedily assign to T_i the largest T'_i
    such that L_{T_i,T'_i} does not intersect any
    previous L_{T_j,T'_j}
  • Problem: adjacent sets may block each other.
  • Solution: pick random T'_i with bias towards large
    sets.

36
Parameters
  • Allows arbitrary constant rate with m = poly(k) ⇒
    Settles Q1
  • Both the subcube code and the subset code can be
    viewed as sub-codes of the binary RM code.
  • The full binary RM code cannot be batch decoded
    when the rate > 1/2.

37
Concluding Remarks: Batch Codes
  • A common relaxation of very different
    combinatorial objects
  • Expanders
  • Locally-decodable codes
  • Problem makes sense even for small values of m,k.
  • For multiset codes with m = 3, k = 2, rate 2/3 is
    optimal.
  • Open for m?k2.
  • Useful building block for distributed data
    structures.

38
Concluding Remarks: PIR
  • Single-user amortization is useful in practice
    only if PIR is significantly more efficient than
    download.
  • Certainly true for multi-server PIR
  • Most likely true also for single-server PIR
  • Killer app for lattice-based cryptosystems?