Title: Offline, Stream and Approximation Algorithms for Synopsis Construction
1 Offline, Stream and Approximation Algorithms for Synopsis Construction
- Sudipto Guha University of Pennsylvania
- Kyuseok Shim Seoul National University
2About this Tutorial
- Information is incomplete and could be inaccurate
- Our presentation reflects our understanding which
may be erroneous
3 Synopses Construction
- Where is the life we have lost in living?
- Where is the wisdom we have lost in knowledge?
- Where is the knowledge we have lost in information?
  - T. S. Eliot, from The Rock
- Routers
- Sensors
- Web
- Astronomy and sciences
- Too much data too little time.
4The idea
- To see the world in a grain of sand
- Broad characteristics of the data
- Compression
- Dimensionality Reduction
- Approximate query answering
- Denoising, Outlier Detection and a broad array of
signal processing
5What is a synopsis ?
- Hmm.
- Any shorthand representation
- Clustering!
- SVD!
- In this tutorial we will focus on signal/time
series processing
6 The basic problem
- Formally, given a signal X and a dictionary {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- Note, the above extends to any dimension.
7Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?
8Many issues
- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?
Top K
9Many issues
- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?
Haar Wavelets
Also Fourier, Polynomials,
10Many issues
- What is the dictionary ?
- Set of vectors
- May not be a basis
- Histograms
- There are n choose 2 vectors
- But since we impose a non-overlapping restriction we get a unique representation.
- Which B terms ?
- What is the error ?
- What are the constraints ?
11Many issues
- What is the dictionary ?
- Which B terms ?
- First B ?
- Best B ?
- What is the error ?
- What are the constraints ?
Why should we choose first B ?
12Approximation theory
- Discipline of Math associated with approximation
of functions. - Same as our problem
- Linear theory (Parseval, 1800 over two
centuries) - Non-Linear theory (Schmidt 1909, Haar 1910)
- Is it relevant ? Yes. However Math treatment has
been extremal, i.e., how does the error change
as a function of B. Is that bound tight? - Note a yes answer does not say anything about
given this signal, is that the best we can do ?
13Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- This controls which B.
- ‖X − F‖₂ is most common, used all over mathematics
- ‖X − F‖₁ and ‖X − F‖∞ are also useful
- Weights. Relative error of approximation:
  - 1000 approximated by 1010 is not so bad.
  - 1 approximated by 11 is not too good an idea.
- What are the constraints ?
14Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?
- Input? Stream, stream of updates
- Space, time, precision and range of values (for the z_i in the expression F = Σ_i z_i φ_i)
15In this tutorial
- Histograms Wavelets
- Will focus on Optimal, Approximation and
Streaming algorithms - How to get one from the other!
- Connections to top K and Fourier.
16I. Histograms.
17 VOpt Histograms
- Let's start simple
- Given a signal X, find a piecewise constant representation H with at most B pieces minimizing ‖X − H‖₂
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik, Suel, 1998
- Consider one bucket: the mean is the best value.
- A natural dynamic programming formulation
18An Example Histogram
Data Distribution
V-Optimal Histogram
19 Idea: VOpt Algorithm
- Within a step/bucket, the mean is the best value.
- Assume that the last bucket is [j+1, n].
- What can we say about the rest of the k−1 buckets?
  OPT[j, k−1] + SQERR[j+1, n]
  (the second term is the last bucket)
  The rest must also be optimal for the range [1, j] with (k−1) buckets! Dynamic programming!!
22 Idea: VOpt Algorithm
- A dynamic programming algorithm was given to construct the V-optimal histogram:
- OPT[n, k] = min_{1 ≤ j < n} ( OPT[j, k−1] + SQERR[(j+1)..n] )
- OPT[j, k]: the minimum cost of representing the set of values indexed by 1..j by a histogram with k buckets.
- SQERR[(j+1)..n]: the sum of the squared absolute errors from (j+1) to n.
23 The DP-based VOpt Algorithm
- for i = 1 to n do
  - for k = 1 to B do
    - for j = 1 to i−1 do (split point between the (k−1)-bucket histogram and the last bucket)
      - OPT[i, k] = min( OPT[i, k], OPT[j, k−1] + SQERR[j+1, i] )
- We need O(Bn) entries for the table OPT
- For each entry OPT[i, k], it takes O(n) time if SQERR[j+1, i] can be computed in O(1) time
- O(Bn) space and O(Bn²) time
(Figure: the OPT table, of size B × n)
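The loop structure above can be sketched in Python (a minimal illustration of the O(Bn²) DP; `vopt_histogram_error` is an illustrative name, not code from the tutorial):

```python
import math

def vopt_histogram_error(x, B):
    """O(B * n^2) dynamic program for the V-optimal histogram error:
    minimum sum of squared errors of representing x with at most B
    constant buckets.  Bucket boundaries could be recovered by also
    keeping a parent table."""
    n = len(x)
    # Prefix sums let SQERR(a, b) be computed in O(1) time.
    SUM = [0.0] * (n + 1)
    SQSUM = [0.0] * (n + 1)
    for i, v in enumerate(x, 1):
        SUM[i] = SUM[i - 1] + v
        SQSUM[i] = SQSUM[i - 1] + v * v

    def sqerr(a, b):  # squared error of one bucket over x[a..b], 1-indexed
        s = SUM[b] - SUM[a - 1]
        m = b - a + 1
        return (SQSUM[b] - SQSUM[a - 1]) - s * s / m

    # OPT[i][k]: best error for the prefix 1..i using k buckets.
    OPT = [[math.inf] * (B + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        OPT[i][1] = sqerr(1, i)
        for k in range(2, B + 1):
            for j in range(k - 1, i):  # last bucket is [j+1, i]
                OPT[i][k] = min(OPT[i][k], OPT[j][k - 1] + sqerr(j + 1, i))
    return min(OPT[n][k] for k in range(1, B + 1))
```

For example, two buckets represent [1, 1, 5, 5] exactly, while one bucket (mean 3) incurs error 16.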
24 Computation of the Sum of Squared Absolute Error in O(1) Time
- Keep prefix sums, e.g., sum(2,3) = x₂ + x₃ = SUM[3] − SUM[1] = 12 − 2 = 10
25 Computation of the Sum of Squared Absolute Error in O(1) Time
- Let SUM[i] = Σ_{j ≤ i} x_j and SQSUM[i] = Σ_{j ≤ i} x_j². Then, for a bucket [a, b] with m = b − a + 1 values, the best (mean) value is (SUM[b] − SUM[a−1])/m. Thus, SQERR[a, b] = (SQSUM[b] − SQSUM[a−1]) − (SUM[b] − SUM[a−1])²/m.
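A small self-contained check of this identity (function and variable names are illustrative):

```python
def sqerr_fast(x, a, b, SUM, SQSUM):
    """Sum of squared deviations from the mean over x[a..b] (1-indexed),
    in O(1) time via the identity
        sum_i (x_i - mean)^2 = sum_i x_i^2 - (sum_i x_i)^2 / m."""
    s = SUM[b] - SUM[a - 1]
    q = SQSUM[b] - SQSUM[a - 1]
    return q - s * s / (b - a + 1)

x = [2, 4, 6, 5]
SUM, SQSUM = [0], [0]
for v in x:
    SUM.append(SUM[-1] + v)
    SQSUM.append(SQSUM[-1] + v * v)

# brute-force check for every interval
for a in range(1, len(x) + 1):
    for b in range(a, len(x) + 1):
        seg = x[a - 1:b]
        mean = sum(seg) / len(seg)
        brute = sum((v - mean) ** 2 for v in seg)
        assert abs(sqerr_fast(x, a, b, SUM, SQSUM) - brute) < 1e-9
```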
26 Analysis of VOpt Algorithm
- O(n²B) time, O(nB) space
- The space can be reduced (Wednesday)
- Main question: the end use of a histogram is to approximate something.
- Why not find an approximately optimal (e.g., (1+ε)) histogram?
27 If you had to improve something?
- Via wavelets (ssq): O(n) time, O(B²/ε²) space
- O(n²B) time, O(nB) space
- (1+ε) streaming: O(nB²/ε) time, O(B²/ε) space
- (1+ε) streaming (ssq): O(n) time, O(B/ε²) space
- O(n²B) time, O(n) space
- (1+ε) streaming: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(nB/ε) space
28 Take 1
- for i = 1 to n do
  - for k = 1 to B do
    - for j = 1 to i−1 do (split point for the last bucket)
      - OPT[1..i, k] = min( OPT[1..i, k], OPT[1..j, k−1] + SQERR(j+1, i) )
- As j increases:
  - OPT[1..j, k−1] is increasing
  - SQERR(j+1, i) is decreasing
- Question: can we use the monotonicity for searching for the minimum?
29 No
- Consider a sequence of positive numbers y₁, y₂, …, y_n
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- F(i) is monotonically increasing, like OPT[1..j, k−1]
- G(i) is monotonically decreasing, like SQERR(j+1, i)
- Ω(n) time is necessary to find min_i F(i) + G(i)
- Open question: does it extend to Ω(n²) over the entire algorithm?
30 What gives?
- Consider a sequence of positive numbers y₁, y₂, …, y_n
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- Thus, F(i) + G(i) = F(n) + y_i
- Any i gives a 2-approximation to min_i F(i) + G(i):
  - F(i) + G(i) = F(n) + y_i ≤ 2 F(n)
  - min_i F(i) + G(i) is at least F(n)
31 Round 1
- Use a histogram to approximate the function
- Bootstrap!
- Approximate the increasing function in powers of (1+δ)
- The right endpoint is a (1+δ) approximation of the left endpoint
32 What does that do?
- Consider evaluating the function at the two endpoints of an interval
- Proof by picture: by construction the two endpoint values differ by at most a (1+δ) factor, and by monotonicity every point in between is sandwiched by them.
33 Therefore
- The right-hand point is a (1+δ) approximation!
- This holds for any point x in between: for an interval [a, b],
  OPT[x] + SQERR[x+1] ≥ OPT[a] + SQERR[b]
  ≥ OPT[b]/(1+δ) + SQERR[b]
  ≥ (OPT[b] + SQERR[b])/(1+δ)
- Are we done?
- Not quite yet.
- What happens for B > 2? We do not compute OPT[i, b] exactly!!
34 Zen and the art of histograms
- Approximate the increasing function in powers of (1+δ)
- The right endpoint is a (1+δ) approximation
- Prove by induction that the error is (1+δ)^B
- This tells us what δ should be (small); in fact if we set δ = ε/(2B) then (1+δ)^B ≤ 1+ε
35 Complexity analysis
- Number of intervals: p ≤ (B/ε) log n
- Why?
  - (1+δ)^(p−1) ≤ nR² and δ = ε/(2B)
  - R is the largest number in the data
  - Assume R is polynomially bounded in n
- Running time: O(nB (B/ε) log n)
- Why are we approximating the increasing function? Why not the decreasing one?
36 The first streaming model
- The signal X is specified by the x_i arriving in increasing order of i
- Not the most general model
- But extremely useful for modeling time series data
37 Streaming
- For each stored interval endpoint a we keep Σ_{1..a} x_i and Σ_{1..a} x_i² (and similarly Σ_{1..b} x_i, Σ_{1..b} x_i² for endpoint b)
- Required space is O((B²/ε) log n)
38 VOpt Construction: O(Bn²)
- Jagadish et al., VLDB 1998
- OPT(i, k) = min_{1 ≤ j < i} ( OPT(j, k−1) + SQERR(j+1, i) )
(Figure: the full columns of the OPT[j, k] and OPT[j, k−1] tables, each of length n)
39 AHIST-S: (1+ε) Approximation
- AOPT[j, k] = min_p ( AOPT[b_{j,p}, k−1] + SQERR[b_{j,p}+1, n] )
- O(B²ε⁻¹ n log n) time and O(B²ε⁻¹ log n) space
(Figure: the approximate tables AOPT[j, k] and AOPT[j, k−1], with interval endpoints kept at powers of (1+δ))
40The overall idea
The natural DP table
The approximate table
41 Do εs talk to us?
(Graph: execution time vs. B)
42 Take 2: GK02
- Sliding window streams
- Potentially infinite data; interested in the last n only
- Q: Suppose we constructed the histogram for 1..n and now want it for 2..(n+1)
- The previous idea is dead on arrival.
- Consider 100, 1, 2, 3, 4, 5, 7, 8, …
43 Formal problem
- Maintain a data structure such that, given an interval [a, b], we can construct a B-bucket histogram for [a, b]
- Compute on the fly
- Generalizes the window!
- Generalizes VOpt when a = 1, b = n
44 Reconsider Take 1
- We are evaluating left to right, i.e., approximating the increasing function
- But we are still evaluating this guy (the decreasing SQERR term)!
45 A brave new world
- Assume an O(n)-size buffer holds the x_i values
- The previous algorithm evaluated left to right
- Several issues:
  - Which values are necessary and sufficient?
  - We are not evaluating all values; what induction?
46 A trickier proof
(Figure: intervals with endpoints a, b, c, d, f, g illustrating the induction)
47 GK02 Enhanced: (1+ε) Approximation
- Lazy evaluation using binary search
- O(B³ε⁻² log³ n) time and O(n) space
- Pre-processing takes O(n) time (SUM and SQSUM)
(Figure: interval lists for AOPT[j, k] and AOPT[j, k−1] at powers of (1+δ))
48 GK02 Enhanced: (1+ε) Approximation
- Creates all of the B interval lists at once
- The values of the necessary AOPT[j, k] are computed recursively to find the intervals [a_{j,p}, b_{j,p}], where b_{j,p} is the largest z s.t.
  - AOPT[z, k] ≤ (1+ε) AOPT[a_{j,p}, k]
  - AOPT[z+1, k] > (1+ε) AOPT[a_{j,p}, k]
- Note that AOPT increases as z increases
- Thus, we can use binary search to find z
- The O(n)-space SUM and SQSUM arrays need to be maintained to allow the computation of SQERR(j+1, i) in O(1) time
- O(nB³ε⁻² log³ n) time and O(n) space
49 Take 2 summary
- O(n) space and O(nB³ε⁻² log² n) time
- Is that the best? Obviously not.
50 Take 3: AHIST-L-Δ
- Suppose we knew ρ with ρ ≤ OPT ≤ 2ρ; then
- Instead of powers of (1+δ), use additive steps of Δ = ερ/(2B)
- Time is O(B³ε⁻² log n)
- To get ρ:
  - a 2-approximation: O(1)
  - a binary search: O(log n) probes
  - Thus, O(B³ log n · log n)
- Overall O(nB³(ε⁻² + log n) log n) time and O(nB²/ε) space
51 Take 4: AHIST-B
- Consider the Take 3 algorithm.
- How to stream it?
(Figure: run the algorithm on the new part of the stream, of size M, and merge into the overall summary)
52 Not done yet
(Figure: the k and k−1 lists, refined over the range 1..r)
- First find an O(1) approximation of the error, then proceed back and refine
53 The running space-time
- Time: B · (# insertions) · (log M) · (log Δ), where Δ = O(Bε⁻¹ log n) is the length of a list
- Space?
- Who cares, and why?
54 Asymptotics
- For fixed B and ε, we can compute a (1+ε) piecewise constant representation in
  - O(n log log n) time and O(log n) space, or
  - O(n) time and O(log n log log n) space.
- Extends to degree-d polynomials; space increases by a factor of O(d) and time by a factor of O(d³)
55 Our friendly ε: running time
(Graph: execution time vs. B)
56 Our friendly ε: error
(Graph: (Error − VOPT)/VOPT vs. B)
57 What you analyze is what you get
(Graph: execution time vs. n)
58Questions ?
59 The status for VOPT
- Saves space across all algorithms, except algorithms which extend to general error measures over streams
60 For a general error measure, IF
- The error of a bucket only depends on the values in the bucket.
- The overall error function is the sum of the errors in the buckets.
- The data can be processed in O(T) time per item such that in O(Q) time we can find the error of a bucket, storing O(P) info.
- The error (of a bucket) is a monotonic function of the interval.
- The values of the maximum and the minimum nonzero error are polynomially bounded in n.
61 Then
- Optimum histogram in O(nT + n²(B+Q)) time and O(n(P+B)) space
- (1+ε)-approximation in
  - O(nT + nQB²ε⁻¹ log n) time and O(PB²ε⁻¹ log n) space, or
  - O(nT + QB³(log n + ε⁻²) log n) time and O(nP) space, or
  - O(nT) time and space
    O(PB²ε⁻¹ log n + (QB/T) Bε⁻¹ log²(Bε⁻¹ log n) log n log log n)
62 Splines and piecewise polynomials
- Instead of piecewise constants, we may want piecewise linear pieces, or maybe higher-degree polynomials
63 The overall idea
- If we want to represent x_{a+1}, …, x_b by p₀ + p₁(x − x_a) + p₂(x − x_a)² + …
- The solution is the least-squares fit, as above
- We need O(d) times more space (than before) and need to solve the linear system. This means an increase by a factor of O(d³) in time.
64 Another useful example: relative error
- Issue with global measures: estimating 10 by 20 and 1000 by 1010 has the same effect
- The above is OK if we are querying for 1000 a thousand times and for 10 only ten times (point queries and the VOPT measure)
- But consider approximating a time series. We may be interested in per-point guarantees.
65 Sum of Squared Relative Error for a Bucket
- Relative error for a bucket (s_r, e_r, x_r): Σ_i ((x_i − x_r)/x_i)² = C − 2Bx_r + Ax_r², with A = Σ 1/x_i², B = Σ 1/x_i, C = the number of values
- Since A > 0, it is minimized when x_r = B/A
- The minimum value is C − B²/A
- If the aggregated sums A, B and C are stored, ERRSQ(i, j) can be computed in O(1) time
- The optimal histogram can be constructed in O(Bn²) time. Approximation algorithms follow.
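A sketch of this computation, assuming the bucket's relative error is Σ_i ((x_i − x_r)/x_i)² and ignoring the sanity constant that guards against tiny values (names are illustrative):

```python
def best_relative_value(xs):
    """Optimal representative x_r for the sum of squared relative error
        err(x_r) = sum_i ((x_i - x_r) / x_i)^2 = C - 2*B*x_r + A*x_r^2
    with A = sum 1/x_i^2, B = sum 1/x_i, C = len(xs).
    The quadratic is minimized at x_r = B/A with value C - B^2/A."""
    A = sum(1.0 / v ** 2 for v in xs)
    B = sum(1.0 / v for v in xs)
    C = len(xs)
    return B / A, C - B * B / A
```

For a constant bucket such as [10, 10] this returns the value itself with zero error, as expected.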
66Maximum Error and the l1 metric
67 Maximum Error Histograms
- A bucket (s_r, e_r, x_r) over numbers x₁, x₂, …, x_n s.t.
  - s_r: starting position
  - e_r: ending position
  - x_r: representative value
- The maximum error is max_i |x_i − x_r|
- The maximum relative error is defined analogously, with each term divided by |x_i|
68 Maximum Error of a Bucket
- Given numbers x₁, x₂, …, x_n
- The maximum error is Err_M = min_{x_r} max_i |x_i − x_r|
- What is the best x_r?
  - (x_min + x_max)/2
69 Maximum Relative Error of a Set
- Given a set of numbers x₁, x₂, …, x_n
  - max: the maximum of x₁, x₂, …, x_n
  - min: the minimum of x₁, x₂, …, x_n
  - c: a sanity constant (lower bound guarding against tiny values)
- The optimal error is some function of c, max, and min
  - E.g., when c ≤ min ≤ max, the error is (max − min)/(max + min)
- The optimal maximum relative error for a bucket can be computed in O(1) time
70 The Naïve Optimal Algorithm
- for i = 1 to n do
  - OPTM[i, 1] = ERRM(1, i)
  - for k = 1 to B do
    - max = −∞; min = ∞; OPTM[i, k] = ∞
    - for j = i−1 downto 1 do
      - if (max < x_{j+1}) max = x_{j+1}
      - if (min > x_{j+1}) min = x_{j+1}
      - OPTM[i, k] = min( OPTM[i, k], max( OPTM[j, k−1], ERRM(j+1, i) ) )
- ERRM(j+1, i) can be obtained in O(1) time (from the running max and min)
- O(Bn) space and O(Bn²) time optimal algorithm
71 An Improved Optimal Algorithm
- OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
- Observations:
  - OPTM[j, k−1] is an increasing function of j
  - ERRM(j+1, i) is a decreasing function of j
- To compute min_x max( F(x), G(x) ) where F(x) and G(x) are non-decreasing and non-increasing functions:
  - Binary-search for the value of x such that F(x) ≥ G(x) and F(x−1) < G(x−1)
  - The minimum is min( G(x−1), F(x) )
72 An Improved Optimal Algorithm
- OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
- We can improve the innermost loop of the naïve algorithm to O(log n) time.
- However, ERRM(j+1, i) cannot be computed in O(1) time any more
- Using an interval tree, we can compute the min and max values for [j+1, i], i.e. ERRM(j+1, i), in O(log n) time
- Thus, our improved algorithm takes O(Bn log² n) time with O(Bn) space
73 An Interval Tree Example
- The tree over [1,8] splits into [1,4] and [5,8], then [1,2], [3,4], [5,6], [7,8], down to the leaves [1,1], …, [8,8].
- To answer a min query for [2,4], decompose it into canonical nodes: decomposeLeft yields [2,2] and decomposeRight yields [3,4].
(Figure: the steps of decomposing [2,4] with an interval tree)
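A minimal stand-in for the interval tree: a segment tree answering range min and max in O(log n), which is all that ErrM(a, b) = (max − min)/2 needs (an illustrative sketch, not the tutorial's code):

```python
class IntervalTree:
    """Segment tree over x supporting range-min/max queries in
    O(log n) time after O(n) construction."""
    def __init__(self, x):
        size = 1
        while size < len(x):
            size *= 2
        self.size = size
        self.mn = [float('inf')] * (2 * size)
        self.mx = [float('-inf')] * (2 * size)
        for i, v in enumerate(x):
            self.mn[size + i] = self.mx[size + i] = v
        for i in range(size - 1, 0, -1):
            self.mn[i] = min(self.mn[2 * i], self.mn[2 * i + 1])
            self.mx[i] = max(self.mx[2 * i], self.mx[2 * i + 1])

    def minmax(self, a, b):
        """Min and max over x[a..b] (0-indexed, inclusive)."""
        lo, hi = float('inf'), float('-inf')
        a += self.size
        b += self.size + 1
        while a < b:
            if a & 1:
                lo = min(lo, self.mn[a]); hi = max(hi, self.mx[a]); a += 1
            if b & 1:
                b -= 1; lo = min(lo, self.mn[b]); hi = max(hi, self.mx[b])
            a //= 2; b //= 2
        return lo, hi

def err_max(tree, a, b):
    """Max-error of one bucket: best x_r is (min + max)/2."""
    lo, hi = tree.minmax(a, b)
    return (hi - lo) / 2
```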
74 Consider another solution
- Make the first bucket as large as possible
- i.e., push the boundary right
- E.g., in the figure we can, as long as the max and min stay the same
- Why will we have to stop?
75 Consider another solution (2)
- In this example we cannot
- But maybe the error comes from a different bucket!
- Here's one idea:
  - Given an i, find Err[1, i]
  - If i is small, Err[1, i] ≤ OPT
  - If i is large, Err[1, i] ≥ OPT
- How?
76 How?
- Assume that, given an interval [a, b], we can find the min and max, and therefore Err[a, b]
- With O(n) time and space preprocessing, we can find Err in O(log n) time (interval tree)
- Check(p, q, b, τ):
  - If Err[p, q] ≤ τ (even when b = 0), we are done.
  - Otherwise:
    - Find mid s.t. Err[p, mid] ≤ τ and Err[p, mid+1] > τ
    - Check(mid+1, q, b−1, τ)
- O(B log² n):
  - Binary search: log n × log n (to find min and max for Err)
  - Check recurses B times
77 Now for the original problem
- By binary search, find the largest s such that, with τ₁ = Err[1, s] and τ₂ = Err[1, s+1],
  Check(1, n, B−1, τ₁) = false and Check(1, n, B−1, τ₂) = true
- Now OPT = τ₂, or OPT is the best (B−1)-bucket error of [s+1, n]
- A recursive algorithm!
- T(B) = log n · B log² n + T(B−1) = O(B² log³ n) !!
78 Summary
- In O(n + B² log³ n) time and O(n) space we can find the optimum error.
- What do we do for
  - a stream, or
  - less than O(n) space?
- Approximate, using some of the old ideas
79Short break !
- When we return
- Range Query Histograms
- Wavelets
- Optimum synopsis
- Connection to Histograms
- Overall ideas and themes
80 Range Query Histograms
81 One more synopsis structure
- Instead of estimating the value at a point, we are interested in the sum of the values in intervals/ranges.
- Clearly, very useful.
- Clearly, we need a new optimization.
- E.g., …
82 A more difficult problem
- Only special cases solved (satisfactorily)
- Hierarchies:
  - Prefix ranges: all ranges of the form [1, j] as j varies
  - Complete binary ranges
  - General hierarchies
- Uniform ranges: all ranges
83Status Range Query
84 The uniform case
- Consider a sequence X = x₁, x₂, …, x_n
- Define the operator ψ(g)_i = Σ_{j ≤ i} g_j, the prefix sum
85 Unbiased
- Suppose H is a histogram such that F = ψ(X − H) satisfies Σ_i F_i = 0
- Or think of Σ_i Σ_{r ≤ i} (X_r − H_r) = 0
- Claim: the error of using H to answer range queries for X is twice the error of using ψ(H) to answer point queries about ψ(X)!
86 The main idea
- Define G_i = Σ_{r ≤ i} (X_r − H_r) = ψ(X)_i − ψ(H)_i
- Now Σ_i G_i = 0 if H is unbiased
- Pick a random element u: expected G_u = 0
- Pick two random elements u, v: expected (G_u − G_v)² = expected error of using H to answer range queries for X
- But that is equal to 2 × expected G_u²
87 A simple approximation
- What we want is the best H for range queries: hard
- But we know how to get a good piecewise constant approximation of ψ(X), and ψ(H) for a histogram H is piecewise linear
- Piecewise linear histograms!
88 An easy trick
- We can also find a histogram with a buffer of size 1 after each bucket
- Use it as a patch-up
- 2B buckets, same error as OPT
- Approximation algorithms try to find the continuous variant
89 The Synopsis Construction Problem
- Formally, given a signal X and a dictionary {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- In the case of histograms the dictionary was the set of all possible intervals, but we could only choose a non-overlapping set.
90 The eternal what if
- If the φ_i are designed for the data, do we get a better synopsis?
- Absolutely!
- Consider a sine wave
- Or any smooth function
- Why, though?
91 Representations that are not piecewise constant
- Electromagnetic signals are sine/cosine waves.
- If we are considering any process which involves electromagnetic signals, this is a great idea.
- These are particularly great for representing periodic functions.
- Often these algorithms are found in DSP (digital signal processing) chips
- A fascinating 300 years of history in mathematics!
92 A slight problem
- ni nill cfme back tf Ffurier
- Fourier is suitable for smooth natural processes
- If we are talking about signals from man-made processes, clearly they cannot be natural (and are hardly likely to be smooth)
- More seriously: discreteness and burstiness
93 The Wavelet (frames)
- Inherits properties from both worlds
- The Fourier transform has all frequencies.
- Wavelets consider frequencies that are powers of 2, but the effect of each wave is limited (shifted)
94Wavelets
- What to do in a discrete world ?
The Haar Wavelets (1910) !
95 The Haar Wavelets
- Best energy synopsis amongst all wavelets (we will see more later)
- Great for data with discontinuities.
- A natural extension to discrete spaces:
  - (1, −1, 0, 0, …, 0), (0, 0, 1, −1, 0, …, 0), …, (0, …, 0, 1, −1)
  - (1, 1, −1, −1, 0, …, 0), …, (0, …, 0, 1, 1, −1, −1), …
96 The Haar Synopsis Problem
- Formally, given a signal X and the Haar basis {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- Let's begin with the VOPT error (‖X − F‖₂²)
97 The Magic of Parseval (no spears)
- The ℓ₂ distance is unchanged by a rotation.
- A set of basis vectors φ_i defines a rotation iff ⟨φ_i, φ_j⟩ = δ_ij, i.e., the basis is orthonormal
- Redefine the basis (scale) s.t. ‖φ_i‖₂ = 1
- Let the transform be W
- Then ‖X − F‖₂ = ‖W(X − F)‖₂ = ‖W(X) − W(F)‖₂
- Now W(F) = (z₁, z₂, …, z_n), and so
- ‖W(X) − W(F)‖₂² = Σ_i (W(X)_i − z_i)²
98 What did we achieve?
- Storing the largest coefficients is the best solution.
- Note that the fact z_i = W(X)_i is a consequence of the optimization and IS NOT a specification of the problem.
- More on that later.
99 What is the best algorithm?
- How to find the largest B coefficients of the transform of x₁, x₂, …?
- The cascade algorithm.
- Recall the hierarchical nature.
100 The cascade algorithm
- Given a pair a, b, represent them as (a − b) and (a + b)
- Divide by √2 so that the sum of squares is preserved, etc.
- Running time O(n)
- Example signal: 1, 4, 5, 6
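The cascade can be sketched directly on the example signal (normalized by √2 at each level so that energy is preserved, as Parseval requires; `haar_cascade` is an illustrative name):

```python
import math

def haar_cascade(x):
    """Full Haar transform via the cascade: pairwise sums and
    differences, each divided by sqrt(2) so the sum of squares is
    unchanged.  len(x) must be a power of two."""
    coeffs = []
    level = list(x)
    while len(level) > 1:
        sums, diffs = [], []
        for a, b in zip(level[::2], level[1::2]):
            sums.append((a + b) / math.sqrt(2))
            diffs.append((a - b) / math.sqrt(2))
        coeffs = diffs + coeffs   # detail coefficients, coarse levels first
        level = sums              # recurse on the pairwise sums
    return level + coeffs         # overall (scaled) average first
```

On [1, 4, 5, 6] the top coefficient is 8 (the scaled average), and the energy 1² + 4² + 5² + 6² = 78 is preserved by the rotation.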
101 Surfing Streams
- Notice that once the left half is done, we only need to remember the partial sums along the boundary
- A stream algorithm is natural
102 Surfing Streams
- Have an auxiliary structure that maintains the top B of a set of numbers
- Where else have you seen this?
- The Reduce-Merge paradigm: also used in clustering data streams
103 In summary
- Given a series x₁, x₂, …, x_i, …, x_n in increasing order of i, we can find (maintain) the largest B coefficients in O(n) time and O(B + log n) space
- OK, but only for ‖X − F‖₂
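The auxiliary top-B structure can be sketched with a min-heap keyed on coefficient magnitude (illustrative; in the stream algorithm the cascade feeds it coefficients as they are emitted):

```python
import heapq

def top_b_stream(coeff_stream, B):
    """Maintain the B largest-magnitude coefficients seen on a
    stream: O(log B) per item, O(B) space (plus the O(log n)
    cascade state).  Returns (index, value) pairs, largest first."""
    heap = []  # entries are (abs(value), index, value); heap[0] is smallest
    for idx, c in enumerate(coeff_stream):
        item = (abs(c), idx, c)
        if len(heap) < B:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current smallest
    return sorted(((i, c) for _, i, c in heap), key=lambda t: -abs(t[1]))
```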
104 Extended Histograms
- What do we do in the presence of multiple dimensions/measures?
  - Use multi-dimensional transforms
  - Use many 1-D transforms
- Strategy: use a flexible scheme that allows us to store the index and a bitmap to indicate which measures are stored.
105 How to solve it?
- For the basic 1-D problem we need to choose the largest B coefficients
- Use Parseval to transform the error on the data into choosing/not choosing coefficients
- Here we have bags:
- We can choose coefficient j with bitmap
  - 0100, using H + S space
  - 0101, using H + 2S space
  - 1111, using H + 4S space
106 Is 0101 better than 1100?
- Subproblem: given that we have settled on choosing 2 coefficients for j, which 2?
- It is the largest 2 again!
- Basically we can choose a set of indices j and decide how many coefficients we choose for each j
- What does this remind you of?
107 Knapsack
- Each item j is available in M different versions.
- The cost of the rth version is H + rS. The profit is an increasing function of r.
- We can choose only one version.
108 Strange roadbumps
- Optimal profit + optimal error = total energy
- The relationship does not hold under approximation:
  - 99 + 1 = 100. Approximating the profit 99 by 95 increases the error by 400%.
- We will return to this.
109Many questions
- What do we do for other error measures ?
- What is the connection with Histograms ?
- Positives: some direction
- Cascade algorithm
- Hierarchy of coefficients
110Non l2 errors
111 Storing coefficients is suboptimal
- Recall the complicated example 1, 4, 5, 6
- We want a 1-term summary and the error is ℓ₁
- What do we store?
- What is the final result? 3.5, 3.5, 3.5, 3.5. What is the transform of that? 7, 0, 0, 0. But the coefficient actually available in W(X) is 8, …
112 What to do?
- Search where there is light:
  - Restricted problem. Useful if the synopsis has more than one use.
- Think outside the coefficients:
  - Probabilistic rounding
  - Search (cleverly) over the whole space
113 The Best Restricted Synopsis
- Maximum error.
- A value (at a leaf) is affected only by its ancestors.
- Number of ancestors ≤ log n
- Guess/try all of the set!
- O(n) choices (2^{log n})
- Start bottom up and use a DP to choose the best B coefficients overall.
- Works for a large number of error measures.
114 Analysis
- At each internal node j we need to maintain the table
  Error[j, ancestor set, b]: the contribution to the minimum error by only the subtree rooted at j, when using b or fewer coefficients (for the subtree)
- Size of table: O(n²B)
- Time: O(n²B log B); depends on the measure
- But we can do better.
115 Faster Restricted Synopsis
- A better cut:
- The number of coefficients in a subtree is at most its size + 1
- The size of the table storing Err[j, ancestor set, b]
  - remains constant as we go up the levels!
  - the ancestor set decreases by 1, while b takes twice as many values
- An O(n²) algorithm
- We can also reduce the space to O(n)
116 Thinking beyond the coefficient
- Probabilistic rounding:
  - Start from the coefficients.
  - Randomly round most of them to 0
  - A few are rounded to non-zero values
  - E.g., set z_i = λ with probability W(X)_i/λ, and 0 otherwise
- Has promise (correct expectation, variance)
- Two issues:
  - The quality is unclear (w.r.t. the original optimization)
  - The expected number of non-zero coefficients is B, but the variance is large, so with reasonable probability we use 2B
117 More exploration required
- Interestingly, the method (as proposed) eliminates a region of the search space
- We can construct examples where the optimum lies in that region.
- But it is an interesting method and likely (we are guessing) preserves more than one error measure simultaneously (multi-criterion optimization)
118 What is the optimum strategy?
- Consider the best set of coefficients Z = z₁, z₂, …, z_n
- Nudge them a bit by making them multiples of some δ
- The extra error is small (and a function of δ); in fact each point sees at most δ log n
- By reducing δ we can get a (1+ε) approximation
119 A straightforward idea
- But we still need to find the solution
- The ancestor set is unimportant; what is important is their combined effect. Try all possible values (multiples of δ, but we still need to fix the range)
120 The graphs: the data
121 The graphs: ℓ₁
122 Relative error (small B), relative ℓ₁
123 The times
124What have we seen so far
- Wavelet representation of l_2 error
- Streaming
- Wavelet representation for non l_2 error
- Restricted
- Unrestricted
- Stream
125A return to histograms
126 Easy relationships
- A B-bucket (piecewise constant) histogram can be represented by 2B log n Haar wavelet coefficients.
  - Why? Only the 2B boundary points matter
- A B-term Haar wavelet synopsis can be represented by a 3B-bucket histogram.
  - Why? Each wavelet basis vector creates 3 extra pieces from 1 line
127 Anything else?
- Totally! We can use wavelets to get (1+ε)-approximate V-optimal histograms. In fact the method has advantages.
128 Histograms, Take 5
- A B-term histogram can be represented by cB log n wavelet terms.
- What if we choose the largest cB log n wavelet terms?
129 Need not be good
- The best histogram has its cB log n wavelet coefficients aligned such that the result is B buckets.
- The best cB log n coefficients are all over the place and give us 3cB log n buckets.
- Is all hope lost?
130 If at first you don't succeed…
- Do we repeat the process and also keep the next cB log n coefficients? No.
- But notice that the energy drops.
- Energy: ‖X‖² = ‖W(X)‖²
- Basic intuition: if there were a lot of coefficients which were large, then the best V-Opt histogram MUST have a large error.
- Why?
131 The robust property
- Look at ‖W(X) − W(H)‖₂ = ‖X − H‖₂
- W(H) has cB log n entries
- If W(X) has cBε⁻² log n large entries…
132 A strange idea in 1000 words
- Consider the projection onto the largest cBε⁻² log n wavelet terms
- Is the projection approximately X?
133 No. But flatten the function
(Figure: the flattened projection vs. X)
134 In fact
- If we choose a large, (Bε⁻¹ log n)^{O(1)}, number of coefficients, then the boundary points of the coefficients are (approximately) good boundary points for a VOPT histogram.
135 The take-away
- I'm OK, you're OK
- If I'm not OK, then you're not OK either.
- An oft-repeated approximation paradigm:
  - If there are too many coefficients, then my algorithm is doomed, but so is anyone else's, and therefore I am good
  - If there are not too many coefficients, then we're good.
136 The Extended Wavelets in ℓ₂
- We can store the largest coefficients
- If there are too many coefficients which are large, then the optimum error is large.
- Otherwise we repeatedly take out coefficients until taking out coefficients no longer reduces the error.
- DP on the set of coefficients taken out.
137 The Full Monty: update streams
- So far we have been looking at X arriving as x₁, x₂, …
- What happens when X is specified by a stream of updates?
- i.e., (i, δ_i): change x_i to x_i + δ_i
138Sketches Stream Embeddings
- Basically Dimensionality reduction
- To compute the histogram H of signal X
- Compute embedding g(X) to fit the space
- Compute H s.t. g(H) is close to g(X)
139 Linear Embeddings
- The JL Lemma: a random matrix A drawn from a Gaussian distribution preserves ℓ₂ distances up to (1 ± ε)
- Too many elements in the matrix!
  - Use pseudorandom generators
- p-stable distributions for ℓ_p
140 What it achieves
- Increasing a coordinate of x is adding the corresponding column of A to the sketch A·x.
141 Suppose we knew the intervals
- The best histogram minimizes ‖X − H‖₂ ≈ ‖AX − AH‖₂
- AX is a vector; AH is a linear function of B values
- We have a min-squared-error program, solvable in polynomial time; more involved in the 1-norm.
142 Cannot do that
- ‖X − H‖₂ = ‖W(X) − W(H)‖₂ ≈ ‖AW(X) − AW(H)‖₂
- Idea: use the linear map to find the large wavelet coefficients (a top-k problem using sketches), then use ideas similar to Take 5 to get the final solution.
143 The return of the pink Fourier
- Assuming x₁, x₂, …, x_i, … arrive in increasing order of i, find/maintain the top k Fourier coefficients.
- Use the strategy:
  - Assume that there are O(k log n) frequencies and try to find them.
  - If not, we are doomed, and so is everyone.
  - So we are OK.
- For the 3rd time
144 What about top k?
- Assuming x₁, x₂, …, x_i, … are specified by a stream of updates, find/maintain the top k values (all elements with frequency 1/k or more).
- Use the strategy:
  - Assume that there are O(k log n) elements and try to find them.
  - If not, we are doomed, and so is everyone.
  - So we are OK. Again!
- Use group testing:
  - 20 questions, bit chasing — is there a heavy item in the first half? You can use norms or you can use collisions (hashes).
145 From optimization to learning
- We are trying to learn a pure signal that has few coefficients
- A general paradigm.
146 The Meaning of Life
- In summary (high level):
  - Approximation is very useful for synopsis construction (the execution-time speedups, plus the end use of a synopsis is approximation anyway)
  - Synopses are usually applied to large data. Asymptotic behaviour matters.
  - The exact definition of the optimization is important. How natural is natural?
  - Few degrees of separation between the synopsis structures. They are related. They should be. And then we can use algorithmic techniques back and forth between them.
147 The Summary (contd.)
- In algorithm-design terms:
  - Most synopsis construction problems involve DP. Investigating how to change the DP to get approximate, space-efficient algorithms is often useful.
  - Search techniques (computational geometry), such as searching over exponents first, are useful.
  - What you analyze (carefully) is often what you would get asymptotically. The usual techniques we use for pruning etc. can be analyzed and shown to be better.
  - Reduce-Merge ⇒ Streaming?
  - The top k in various disguises. Group testing matters.
148 What lies ahead
- OK. So 1-D histograms have good algorithms.
- 2-D?
  - NP-hard.
  - Some approximation algorithms known.
- Q: In linear time and sublinear space, what can we do?
  - Sketch-based results. A long way to go.
149 What lies ahead
- So 1-D Haar wavelets have good algorithms (non-ℓ₂).
- 2-D?
  - Unlikely to be NP-hard
  - Quasi-polynomial time (n^{O(log n)}) approximation algorithms known.
- Q: In linear time and sublinear space, what can we do?
150 What lies ahead
- So 1-D Haar wavelets have good algorithms (non-ℓ₂).
- Non-Haar? Daubechies. Multifractals.
  - Unlikely to be NP-hard
  - Quasi-polynomial time (n^{O(log n)}) approximation algorithms known.
- What can we do?
151 What lies ahead
- All the update-stream results are based on the ℓ₂ error because of Johnson-Lindenstrauss (and some on ℓ_p for 0 < p ≤ 2)
- What about other errors?
- Will require new techniques for streaming.
152Notes (not from the underground)
- The VOPT definition
- Poosala, Haas, Ioannidis, Shekita, SIGMOD 96.
- The VOPT histogram algorithm
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
Suel, VLDB 98. - Take 1
- Guha, Koudas, Shim, STOC, 01.
- Take 2
- Guha, Koudas, ICDE, 02.
- Take 3 4
- Guha, Koudas, Shim, TODS, 05.
- Take 5
- Guha, Indyk, Muthukrishnan, Strauss, ICALP, 02.
- Relative Error Histograms
- Guha, Shim, Woo, VLDB, 04.
- Maximum Error histograms
- Nicole, J. of Parallel Distributed Computing,
1994. - (Muthukrishnan, Khanna, Skiena, ICALP, 97),
- Guha, Shim, (here) 05.
153More Notes
- Range Query Histograms
- Muthukrishnan, Strauss, SODA, 03.
- The Full Monty
- Gilbert, Guha, Indyk, Kotidis, Muthukrishnan,
Strauss, STOC, 02. - Parseval stuff
- Parseval, (margin of notebook ?), 1799.
- Folklore sum of squares and l2
- The mandala
- Surfing Wavelets
- Gilbert, Kotidis, Muthukrishnan,Strauss, VLDB,
01 - Probabilistic Synopsis
- Garofalakis, Gibbons, SIGMOD 02 (also TODS 04)
- Maximum error (restricted version)
- Garofalakis, Kumar, PODS, 04.
154Notes again
- Faster Restricted Synopsis
- Guha, VLDB, 05.
- Unrestricted non l2 error
- Guha, Harb, KDD, 05 new results
- Extended Wavelets
- Deligiannakis, Roussopoulos, SIGMOD 03.
- Guha, Kim, Shim, VLDB 04.
- Streaming Fourier approximation
- Gilbert, Guha, Indyk, Muthukrishnan, Strauss,
STOC, 02 - Learning Fourier Coefficients
- Linial, Kushilevitz, Mansour, JACM, 93
- JL Lemma
- Johnson, Lindenstrauss, 1984.
- Sketches
- Alon, Matias, Szegedy, JCSS, 99.
- Feigenbaum, Kannan, Strauss, Viswanathan, FOCS, 99
- Indyk, FOCS, 00
155 Roads not taken
- (but relevant to synopses)
- Property testing
- Weighted sampling and SVD
- Median finding
- Sampling-based estimators