Title: Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subseque
1Lower Bounds on Streaming Algorithms for
Approximating the Length of the Longest
Increasing Subsequence.
Anna Gal (UT Austin), Parikshit Gopalan (U. Washington, UT Austin)
2Data Stream Model of Computation
Input stream: X_1, X_2, X_3, ..., X_n
- Single pass.
- Small storage space, update time.
- Surprisingly powerful [Alon-Matias-Szegedy, ...].
3Estimated Sortedness on Data-Streams
Cannot sort efficiently. Can we tell if the data
needs to be sorted?
[Ajtai-Jayram-Kumar-Sivakumar, Gupta-Zane, Cormode-Muthukrishnan-Sahinalp, Liben-Nowell-Vee-Zhu, Woodruff-Sun, G.-Jayram-Kumar-Sivakumar]
- Measuring Sortedness
- Length of Longest Increasing Subsequence.
- Ulam/Edit distance
- Inversion/Kendall Tau distance
5Longest Increasing Subsequence
LIS(σ): length of the longest increasing subsequence of σ.
5 7 8 1 4 2 10 3 6 9
Studied in statistics, biology, computer science [Gusfield, Pevzner, Aldous-Diaconis].
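As a worked check of the example sequence on this slide, the following minimal Python sketch computes the LIS length directly from the definition with a quadratic dynamic program. The function name `lis_length` is illustrative and does not come from the slides, and "increasing" is assumed to mean strictly increasing.

```python
def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # best[k] = length of the longest increasing subsequence ending at seq[k]
    best = []
    for k, x in enumerate(seq):
        longest_before = max((best[j] for j in range(k) if seq[j] < x), default=0)
        best.append(longest_before + 1)
    return max(best, default=0)


sigma = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(lis_length(sigma))  # 5, for example the subsequence 1 2 3 6 9
```

This version stores the whole sequence, so it only serves to fix the definition; it is not a streaming algorithm.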
6Prior Work
- Exact Computation of LIS(σ)
  - Patience Sorting [Ross, Mallows]
  - O(n) space, 1-pass streaming algorithm.
  - Ω(n) space lower bound [G.-Jayram-Krauthgamer-Kumar07, Woodruff-Sun07].
- Approximating LIS(σ)
  - Deterministic, O(√(n/ε)) space, (1 + ε)-approx. [G.-Jayram-Krauthgamer-Kumar07]
Conjecture [GJKK]: Every 1-pass deterministic algorithm that gives a 1.1-approximation to LIS(σ) requires Ω(√n) space.
7Our Results
Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)).
- Tight bounds in n, ε.
- Proof via direct sum approach.
- Direct sum for maximum communication in the private messages model.
- Separation between communication models.
8A Communication Problem
- Consider the following problem
1 2 3.2 4.2
1.6 2.8 3.5 4.6
1.8 2.9 3.7 4.9
- t players, t numbers each.
- Goal: Approximate the length of the LIS.
- Enough to show a lower bound of Ω(t) on the maximum message size.
14A Communication Problem
[GJKK] Consider the following decision problem:
- Yes: some column increasing.
- No: all columns non-increasing.
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
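The decision problem on this slide can be sketched as a small promise-problem checker. The layout is an assumption: each row holds the t numbers of one player, so column j collects the j-th number of every player, and "increasing" is read literally as strictly increasing from one player to the next. The function names are hypothetical.

```python
def column_is_increasing(col):
    """True when every entry is larger than the one before it."""
    return all(a < b for a, b in zip(col, col[1:]))


def column_is_non_increasing(col):
    """True when no entry is larger than the one before it."""
    return all(a >= b for a, b in zip(col, col[1:]))


def classify(rows):
    """Return 'Yes', 'No', or None when the input violates the promise."""
    columns = list(zip(*rows))  # column j = j-th number of every player
    if any(column_is_increasing(c) for c in columns):
        return "Yes"
    if all(column_is_non_increasing(c) for c in columns):
        return "No"
    return None


yes_instance = [[5, 1, 7], [4, 2, 7], [3, 3, 6]]  # middle column is 1, 2, 3
no_instance = [[5, 3, 7], [4, 3, 7], [3, 2, 6]]   # every column non-increasing
print(classify(yes_instance), classify(no_instance))  # Yes No
```

A protocol only has to be correct on inputs where `classify` returns "Yes" or "No"; inputs mapped to `None` fall outside the promise.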
15Direct Sum Paradigm
Primitive problem: p(x_1, y_1), with inputs x_1 and y_1.
16Direct Sum Paradigm
Direct-sum problem: ⋁_i p(x_i, y_i), with inputs x_1, ..., x_n and y_1, ..., y_n.
Can run n copies of the protocol for p.
Direct-sum question: Is this the best possible?
Examples: Set-Disjointness, Inner Product.
Techniques for proving direct-sum theorems: [KN, CSWY, BJKS, SS].
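The "run n copies" baseline can be illustrated with a toy two-player example. Everything here is an assumption made for illustration: the primitive problem is taken to be p(x, y) = x AND y on single bits, the copies are combined with OR (which gives set intersection), and the function names are hypothetical.

```python
def and_protocol(x_bit, y_bit):
    """Trivial protocol for p(x, y) = x AND y: the first player sends its bit."""
    message = [x_bit]              # 1 bit of communication
    answer = message[0] & y_bit    # the second player combines it with its own bit
    return answer, len(message)


def run_n_copies(xs, ys):
    """Solve OR_i p(x_i, y_i) by running one independent copy per coordinate."""
    total_bits = 0
    result = 0
    for x_bit, y_bit in zip(xs, ys):
        answer, bits = and_protocol(x_bit, y_bit)
        result |= answer
        total_bits += bits
    return result, total_bits


xs = [1, 0, 1, 0]
ys = [0, 0, 1, 1]
print(run_n_copies(xs, ys))  # (1, 4): the sets intersect, and 4 bits were sent
```

The cost of this baseline grows linearly with the number of copies; a direct-sum theorem states that no protocol can do substantially better.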
17Primitive Problem
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
19Direct Sum of Primitive Problems
- No: all No instances.
- Yes: one Yes instance.
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
21Direct Sum of Primitive Problems
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
Techniques for proving direct-sum theorems: [KN, CSWY, BJKS, SS, ...].
22[GG] An Easier Problem
(Figure: Yes and No instances.)
Hope: Some player distinguishes between many No instances.
23BlackBoard Model of One-Way Communication
- Players speak in order.
- Every message seen by all.
- Last player outputs answer.
25 Problem is Easy in the BlackBoard model
(Figure: No and Yes instances.)
BlackBoard protocol with max. communication 2 log(m).
26Private Messages Model
- Messages seen by next player only.
- Suffices for streaming lower bound.
- Requires non-standard techniques.
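Why the private messages model suffices for streaming lower bounds can be sketched with a small simulation: each player runs the streaming algorithm on its own part of the stream and forwards the memory contents to the next player only. The helper names and the "store everything" algorithm are hypothetical stand-ins; any one-pass algorithm with an `update` step could be plugged in.

```python
def simulate_protocol(player_inputs, initial_state, update, size_of):
    """Run a one-pass streaming algorithm as a one-way private-messages protocol."""
    state = initial_state
    message_sizes = []
    for index, numbers in enumerate(player_inputs):
        for x in numbers:             # the player runs the streaming algorithm
            state = update(state, x)  # on its own part of the stream
        if index < len(player_inputs) - 1:
            # The memory contents are the private message to the next player.
            message_sizes.append(size_of(state))
    return state, max(message_sizes, default=0)


def store_everything(state, x):
    """Hypothetical streaming algorithm: remember every number seen so far."""
    return state + [x]


players = [[1, 2, 3.2], [1.6, 2.8, 3.5], [1.8, 2.9, 3.7]]  # t = 3 players
final_state, max_message = simulate_protocol(players, [], store_everything, len)
print(max_message)  # 6: the state after the first t - 1 = 2 players
```

A streaming algorithm with space s therefore yields a protocol whose messages have size at most s. With t players holding t numbers each, the stream has n = t^2 numbers, so a lower bound of Ω(t) on the maximum message size is a lower bound of Ω(√n) on space.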
27Private Messages Model
(Figure: Yes and No instances.)
Strong lower bound for maximum communication in the private messages model.
Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)).
Separation between blackboard and private messages.
28Proof Outline
- Step 1: Primitive Problem (one round).
- Step 2: Direct-sum Problem (one round).
- Multi-round Protocols.
29Primitive Problem
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
Alphabet of size m > t.
Yes case: LIS(σ) > t/2.
Easy bound of (log m)/t on max communication.
Thm: Max communication is at least log(m/t).
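To see how far apart the two bounds on this slide are, here is a small numeric comparison. The parameter values are hypothetical and logarithms are taken base 2.

```python
from math import log2

m, t = 1024, 8                 # hypothetical parameters with m > t
easy_bound = log2(m) / t       # (log m)/t
theorem_bound = log2(m / t)    # log(m/t)
print(easy_bound, theorem_bound)  # 1.25 7.0
```

For these values the theorem improves the bound on the maximum message size from about one bit to seven bits.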
30Lower Bound for Primitive Problem
(Figure: the prefix a ... a and a general prefix x_1 ... x_i.)
P_i's message is specified by the prefix x_1 ... x_i.
M_i(a): prefixes where P_i sends the same message as on a ... a.
q_i(a): length of the longest IS in M_i(a) ending below a.
31Lower Bound for Primitive Problem
M_i(a): inputs where P_i sends the same message as on a ... a.
q_i(a): length of the longest IS in M_i(a) ending below a.
- Monotone
  - x_1 ... x_i ∈ M_i(a) ⇒ x_1 ... x_i a ∈ M_{i+1}(a)
- Bounded by t/2
  - Correctness.
(Figure: plot of q_i(a) against i.)
32Lower Bound for Primitive Problem
M_i(a): inputs where P_i sends the same message as on a ... a.
q_i(a): length of the longest IS in M_i(a) ending below a.
Map a to the first i such that q_{i-1}(a) = q_i(a). Some i occurs m/t times.
(Figure: plot of q_i(a) against i.)
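The pigeonhole step on this slide can be checked on synthetic data. The sequences below are random stand-ins for i -> q_i(a), not values produced by any real protocol; they only share the two properties used in the argument, namely being non-decreasing in i and bounded by t/2. All names are hypothetical.

```python
import random
from collections import Counter


def first_plateau(q):
    """Smallest position i >= 1 with q[i - 1] == q[i], or None if there is none."""
    for i in range(1, len(q)):
        if q[i - 1] == q[i]:
            return i
    return None


def random_monotone_sequence(t, rng):
    """Stand-in for i -> q_i(a): t non-decreasing values in 0..t//2."""
    return sorted(rng.randint(0, t // 2) for _ in range(t))


rng = random.Random(0)
m, t = 64, 8
plateaus = Counter(first_plateau(random_monotone_sequence(t, rng)) for _ in range(m))
index, count = plateaus.most_common(1)[0]
print(index, count)  # the most popular plateau position and how many a map to it
```

A sequence of t values bounded by t/2 cannot increase at every step, so every a has a plateau; since there are fewer than t possible positions, some position receives at least m/t of the m values of a.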
33Lower Bound for Primitive Problem
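(Figure: players P_{i-1} and P_i; m/t prefixes a ... a, b ... b, c ... c, each paired with an increasing prefix:
x_1 < ... < x_{i-1}, ending below a, for a ... a;
y_1 < ... < y_{i-1}, ending below b, for b ... b;
z_1 < ... < z_{i-1}, ending below c, for c ... c.)
Claim: P_{i-1} must distinguish a ... a from b ... b from c ... c.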
34Lower Bound for Primitive Problem
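(Figure: at P_{i-1}, the prefixes a ... a, x_1 ... x_{i-1}, y_1 ... y_{i-1}, and b ... b; at P_i, each of them extended by b.)
x_1 < ... < x_{i-1} < a < b, but q_i(b) ≤ i - 1. Contradiction.
Hence P_{i-1} must distinguish a ... a from b ... b from c ... c. Gives a log(m/t) lower bound.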
37Lower Bound for General Problem
(Figure: prefix whose rows are all a_1 ... a_t.)
M_i(a_1 ... a_t): i × t prefixes where P_i sends the same message as on (a_1 ... a_t)^i.
q_{i,j}(a_1 ... a_t): length of the longest IS in column j ending at/before a_j.
(Figure: plots of q_{i,1}(a), ..., q_{i,t}(a).)
38Lower Bound for General Problem
(Figure: prefix whose rows are all a_1 ... a_t.)
- Part I: By pigeonhole, find
  - a good player P_i,
  - a good set S ⊆ [t] of columns,
  - a good set I ⊆ [m]^t of (m/t)^t inputs where ...
(Figure: plots of q_{i,1}(a), ..., q_{i,t}(a).)
39Lower Bound for General Problem
(Figure: prefix whose rows are all a_1 ... a_t.)
Part II: Show that P_{i-1} distinguishes between the (m/t)^t inputs in I. Gives a lower bound of log|I| = t log(m/t).
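The counting behind Part II is that a deterministic message separating N inputs must take at least N distinct values, hence at least log N bits. A quick numeric check with hypothetical parameters (base-2 logarithms, helper name illustrative):

```python
from math import log2


def bits_to_distinguish(num_inputs):
    """A message that separates num_inputs inputs needs at least this many bits."""
    return log2(num_inputs)


m, t = 64, 4                   # hypothetical parameters with m > t
size_of_I = (m // t) ** t      # |I| = (m/t)^t
print(bits_to_distinguish(size_of_I), t * log2(m / t))  # 16.0 16.0
```

Both expressions agree, matching log|I| = t log(m/t).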
41Lower Bound for Many Rounds
(Figure: prefix whose rows are all a_1 ... a_t.)
Part I: Messages sent by P_i in round 2 and beyond depend on the entire input. Need to change the definition of M_i(a_1 ... a_t).
Part II: Reduce to a 2-player protocol involving P_{i-1} and P_t.
Thm: Any deterministic O(1)-pass algorithm that gives a (1 + ε)-approximation to the LIS requires space Ω(√(n/ε)).
42Conclusions
- Exact Computation of LIS(σ)
  - Patience Sorting [Ross, Mallows]
  - O(n) space, 1-pass streaming algorithm.
  - Ω(n) space lower bound [G.-Jayram-Krauthgamer-Kumar, Woodruff-Sun].
- Approximating LIS(σ)
  - O(√(n/ε)) space, deterministic 1-pass algorithm [G.-Jayram-Krauthgamer-Kumar].
  - This paper: The bound is tight for deterministic, O(1)-pass algorithms.
  - [Ergun-Jowhari08]: Different proof.
43Randomized Complexity of LIS
- Problem: Is there a randomized streaming algorithm to approximate the LIS using space o(√n)?
- [Woodruff-Sun]: O(log m) lower bound.
- [Chakrabarti]: Randomized private-messages protocol for the direct-sum problem.
Thank You!
45Patience Sorting [Ross, Mallows]
- Track best inc. seq. of length i, for all i.
- A_i: smallest number ending an IS of length i.
- Patience Sorting: dynamic program to compute A_i.
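The bullets above can be sketched as a one-pass routine. This is a minimal sketch of the standard binary-search formulation, assuming strictly increasing subsequences; the list `tails` plays the role of the values A_i (with `tails[i - 1]` standing for A_i), and the function name is illustrative.

```python
from bisect import bisect_left


def patience_lis(stream):
    """One-pass LIS length; tails[i - 1] is the smallest number ending an IS of length i."""
    tails = []
    for x in stream:
        k = bisect_left(tails, x)  # number of stored A_i that are smaller than x
        if k == len(tails):
            tails.append(x)        # x extends the longest IS found so far
        else:
            tails[k] = x           # x is a smaller ending for length k + 1
    return len(tails), tails


length, tails = patience_lis([5, 7, 8, 1, 4, 2, 10, 3, 6, 9])
print(length, tails)  # 5 [1, 2, 3, 6, 9]
```

The memory used is one number per achievable length, which is where the O(n) space bound for exact computation comes from.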
46Approximate Patience Sorting [GJKK]
- Track best inc. seq. of length i, for all i.
- A_i: smallest number ending an IS of length i.
- Patience Sorting: dynamic program to compute A_i.
- Approx. Patience Sorting: store A_i for at most √n values of i.
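The idea of keeping A_i for only a limited number of lengths i can be illustrated as follows. This is not the GJKK algorithm: the slide does not give its pruning rule or analysis, so the rule used here (keep every other stored entry, always keeping the longest) and the `budget` parameter are assumptions chosen for simplicity, and no approximation guarantee is claimed. The sketch only maintains the invariant that every stored length is witnessed by a real increasing subsequence, so the estimate never exceeds the true LIS.

```python
from bisect import bisect_left


def sparse_patience(stream, budget):
    """Lower-bound estimate of the LIS length storing at most `budget` pairs.

    lengths[k], tails[k]: some increasing subsequence of length lengths[k]
    seen so far ends in the value tails[k]. Both lists are strictly increasing.
    """
    lengths, tails = [], []
    for x in stream:
        k = bisect_left(tails, x)                   # entries 0..k-1 end below x
        new_len = (lengths[k - 1] if k else 0) + 1
        if k < len(tails) and lengths[k] == new_len:
            tails[k] = x                            # smaller ending, same length
        elif k < len(tails) and tails[k] == x:
            continue                                # a longer IS already ends at x
        else:
            lengths.insert(k, new_len)
            tails.insert(k, x)
        if len(tails) > budget:
            # Assumed pruning rule: keep every other entry, always the longest one.
            keep = range(len(tails) - 1, -1, -2)
            lengths = [lengths[j] for j in reversed(keep)]
            tails = [tails[j] for j in reversed(keep)]
    return lengths[-1] if lengths else 0


sigma = [5, 7, 8, 1, 4, 2, 10, 3, 6, 9]
print(sparse_patience(sigma, budget=10))  # 5: nothing is pruned, so this is exact
```

With a budget large enough that nothing is pruned, the routine coincides with exact patience sorting; with a small budget it trades accuracy for space.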
47Lower Bounds for approximating the LIS
Conjecture [GJKK]: For some ε_0 > 0, every 1-pass deterministic algorithm that gives a (1 + ε_0)-approximation to LIS(σ) requires Ω(√n) space.
Candidate hard instances:
(Figure: inputs of players P_1, P_2, ..., P_t.)
48Protocol for BlackBoard model
51Primitive Problem
(Figure: Yes and No instances for players P_1, P_2, ..., P_t.)
Does the direct sum property hold for this
problem?