Title: Offline, Stream and Approximation Algorithms for Synopsis Construction
1 Offline, Stream and Approximation Algorithms for Synopsis Construction
- Sudipto Guha University of Pennsylvania
- Kyuseok Shim Seoul National University
2About this Tutorial
- Information is incomplete and could be inaccurate
- Our presentation reflects our understanding which
may be erroneous
3 Synopses Construction
- Where is the life we have lost in living?
- Where is the wisdom we have lost in knowledge?
- Where is the knowledge we have lost in information?
  - T. S. Eliot, from The Rock
- Routers
- Sensors
- Web
- Astronomy and sciences
- Too much data too little time.
4The idea
- To see the world in a grain of sand
- Broad characteristics of the data
- Compression
- Dimensionality Reduction
- Approximate query answering
- Denoising, Outlier Detection and a broad array of
signal processing
5What is a synopsis ?
- Hmm.
- Any shorthand representation
- Clustering!
- SVD!
- In this tutorial we will focus on signal/time
series processing
6 The basic problem
- Formally, given a signal X and a dictionary {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- Note, the above extends to any dimension.
7Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?
8Many issues
- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?
Top K
9Many issues
- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?
Haar Wavelets
Also Fourier, Polynomials,
10Many issues
- What is the dictionary ?
- Set of vectors
- May not be a basis
- Histograms
- There are n choose 2 vectors
- But since we impose a non-overlapping restriction we get a unique representation.
- Which B terms ?
- What is the error ?
- What are the constraints ?
11Many issues
- What is the dictionary ?
- Which B terms ?
- First B ?
- Best B ?
- What is the error ?
- What are the constraints ?
Why should we choose first B ?
12Approximation theory
- Discipline of Math associated with approximation
of functions. - Same as our problem
- Linear theory (Parseval, 1800 over two
centuries) - Non-Linear theory (Schmidt 1909, Haar 1910)
- Is it relevant ? Yes. However Math treatment has
been extremal, i.e., how does the error change
as a function of B. Is that bound tight? - Note a yes answer does not say anything about
given this signal, is that the best we can do ?
13Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- This controls which B.
- ‖X − F‖₂ is most common, used all over mathematics
- ‖X − F‖₁ and ‖X − F‖∞ are also useful
- Weights. Relative error of approximation:
  - 1000 approximated by 1010 is not so bad.
  - 1 approximated by 11 is not too good an idea.
- What are the constraints ?
14Many issues
- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?
- Input? Stream, stream of updates
- Space, time, precision and range of values (for the z_i in the expression F = Σ_i z_i φ_i)
15In this tutorial
- Histograms Wavelets
- Will focus on Optimal, Approximation and
Streaming algorithms - How to get one from the other!
- Connections to top K and Fourier.
16I. Histograms.
17 VOpt Histograms
- Let's start simple
- Given a signal X, find a piecewise constant representation H with at most B pieces minimizing ‖X − H‖₂
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik, Suel, 1998
- Consider one bucket: the mean is the best value.
- A natural dynamic programming formulation
18An Example Histogram
Data Distribution
V-Optimal Histogram
19 Idea: VOpt Algorithm
- Within a step/bucket, the mean is the best value.
- Assume that the last bucket is [j+1, n].
- What can we say about the rest of the k−1 buckets?
  OPT[j, k−1] + SQERR[j+1, n]
  (the second term is the last bucket)
  The rest must also be optimal for the range [1, j] with (k−1) buckets! Dynamic programming!!
22 Idea: VOpt Algorithm
- A dynamic programming algorithm was given to construct the V-optimal histogram:
- OPT[n, k] = min_{1 ≤ j < n} ( OPT[j, k−1] + SQERR[(j+1)..n] )
- OPT[j, k]: the minimum cost of representing the set of values indexed by 1..j by a histogram with k buckets.
- SQERR[(j+1)..n]: the sum of the squared absolute errors from (j+1) to n.
23 The DP-based VOpt Algorithm
- for i = 1 to n do
  - for k = 1 to B do
    - for j = 1 to i−1 do (split point between the (k−1)-bucket histogram and the last bucket)
      - OPT[i, k] = min( OPT[i, k], OPT[j, k−1] + SQERR[j+1, i] )
- We need O(Bn) entries for the table OPT
- For each entry OPT[i, k], it takes O(n) time if SQERR[j+1, i] can be computed in O(1) time
- O(Bn) space and O(Bn²) time
(Figure: the OPT table, of size B × n)
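The loop structure above can be sketched in Python (a minimal illustration of the O(Bn²) DP; `vopt_histogram_error` is an illustrative name, not code from the tutorial):

```python
import math

def vopt_histogram_error(x, B):
    """O(B * n^2) dynamic program for the V-optimal histogram error:
    minimum sum of squared errors of representing x with at most B
    constant buckets.  Bucket boundaries could be recovered by also
    keeping a parent table."""
    n = len(x)
    # Prefix sums let SQERR(a, b) be computed in O(1) time.
    SUM = [0.0] * (n + 1)
    SQSUM = [0.0] * (n + 1)
    for i, v in enumerate(x, 1):
        SUM[i] = SUM[i - 1] + v
        SQSUM[i] = SQSUM[i - 1] + v * v

    def sqerr(a, b):  # squared error of one bucket over x[a..b], 1-indexed
        s = SUM[b] - SUM[a - 1]
        m = b - a + 1
        return (SQSUM[b] - SQSUM[a - 1]) - s * s / m

    # OPT[i][k]: best error for the prefix 1..i using k buckets.
    OPT = [[math.inf] * (B + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        OPT[i][1] = sqerr(1, i)
        for k in range(2, B + 1):
            for j in range(k - 1, i):  # last bucket is [j+1, i]
                OPT[i][k] = min(OPT[i][k], OPT[j][k - 1] + sqerr(j + 1, i))
    return min(OPT[n][k] for k in range(1, B + 1))
```

For example, two buckets represent [1, 1, 5, 5] exactly, while one bucket (mean 3) incurs error 16.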
24 Computation of the Sum of Squared Absolute Error in O(1) Time
- Keep prefix sums, e.g., sum(2,3) = x₂ + x₃ = SUM[3] − SUM[1] = 12 − 2 = 10
25 Computation of the Sum of Squared Absolute Error in O(1) Time
- Let SUM[i] = Σ_{j ≤ i} x_j and SQSUM[i] = Σ_{j ≤ i} x_j². Then, for a bucket [a, b] with m = b − a + 1 values, the best (mean) value is (SUM[b] − SUM[a−1])/m. Thus, SQERR[a, b] = (SQSUM[b] − SQSUM[a−1]) − (SUM[b] − SUM[a−1])²/m.
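A small self-contained check of this identity (function and variable names are illustrative):

```python
def sqerr_fast(x, a, b, SUM, SQSUM):
    """Sum of squared deviations from the mean over x[a..b] (1-indexed),
    in O(1) time via the identity
        sum_i (x_i - mean)^2 = sum_i x_i^2 - (sum_i x_i)^2 / m."""
    s = SUM[b] - SUM[a - 1]
    q = SQSUM[b] - SQSUM[a - 1]
    return q - s * s / (b - a + 1)

x = [2, 4, 6, 5]
SUM, SQSUM = [0], [0]
for v in x:
    SUM.append(SUM[-1] + v)
    SQSUM.append(SQSUM[-1] + v * v)

# brute-force check for every interval
for a in range(1, len(x) + 1):
    for b in range(a, len(x) + 1):
        seg = x[a - 1:b]
        mean = sum(seg) / len(seg)
        brute = sum((v - mean) ** 2 for v in seg)
        assert abs(sqerr_fast(x, a, b, SUM, SQSUM) - brute) < 1e-9
```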
26 Analysis of VOpt Algorithm
- O(n²B) time, O(nB) space
- The space can be reduced (Wednesday)
- Main question: the end use of a histogram is to approximate something.
- Why not find an approximately optimal (e.g., (1+ε)) histogram?
27 If you had to improve something?
- Via wavelets (ssq): O(n) time, O(B²/ε²) space
- O(n²B) time, O(nB) space
- (1+ε) streaming: O(nB²/ε) time, O(B²/ε) space
- (1+ε) streaming (ssq): O(n) time, O(B/ε²) space
- O(n²B) time, O(n) space
- (1+ε) streaming: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(nB/ε) space
28 Take 1
- for i = 1 to n do
  - for k = 1 to B do
    - for j = 1 to i−1 do (split point for the last bucket)
      - OPT[1..i, k] = min( OPT[1..i, k], OPT[1..j, k−1] + SQERR(j+1, i) )
- As j increases:
  - OPT[1..j, k−1] is increasing
  - SQERR(j+1, i) is decreasing
- Question: can we use the monotonicity for searching for the minimum?
29 No
- Consider a sequence of positive numbers y₁, y₂, …, y_n
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- F(i) is monotonically increasing, like OPT[1..j, k−1]
- G(i) is monotonically decreasing, like SQERR(j+1, i)
- Ω(n) time is necessary to find min_i F(i) + G(i)
- Open question: does it extend to Ω(n²) over the entire algorithm?
30 What gives?
- Consider a sequence of positive numbers y₁, y₂, …, y_n
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- Thus, F(i) + G(i) = F(n) + y_i
- Any i gives a 2-approximation to min_i F(i) + G(i):
  - F(i) + G(i) = F(n) + y_i ≤ 2 F(n)
  - min_i F(i) + G(i) is at least F(n)
31 Round 1
- Use a histogram to approximate the function
- Bootstrap!
- Approximate the increasing function in powers of (1+δ)
- The right endpoint is a (1+δ) approximation of the left endpoint
32 What does that do?
- Consider evaluating the function at the two endpoints of an interval
- Proof by picture: by construction the two endpoint values differ by at most a (1+δ) factor, and by monotonicity every point in between is sandwiched by them.
33 Therefore
- The right-hand point is a (1+δ) approximation!
- This holds for any point x in between: for an interval [a, b],
  OPT[x] + SQERR[x+1] ≥ OPT[a] + SQERR[b]
  ≥ OPT[b]/(1+δ) + SQERR[b]
  ≥ (OPT[b] + SQERR[b])/(1+δ)
- Are we done?
- Not quite yet.
- What happens for B > 2? We do not compute OPT[i, b] exactly!!
34 Zen and the art of histograms
- Approximate the increasing function in powers of (1+δ)
- The right endpoint is a (1+δ) approximation
- Prove by induction that the error is (1+δ)^B
- This tells us what δ should be (small); in fact if we set δ = ε/(2B) then (1+δ)^B ≤ 1+ε
35 Complexity analysis
- Number of intervals: p ≤ (B/ε) log n
- Why?
  - (1+δ)^(p−1) ≤ nR² and δ = ε/(2B)
  - R is the largest number in the data
  - Assume R is polynomially bounded in n
- Running time: O(nB (B/ε) log n)
- Why are we approximating the increasing function? Why not the decreasing one?
36 The first streaming model
- The signal X is specified by the x_i arriving in increasing order of i
- Not the most general model
- But extremely useful for modeling time series data
37 Streaming
- For each stored interval endpoint a we keep Σ_{1..a} x_i and Σ_{1..a} x_i² (and similarly Σ_{1..b} x_i, Σ_{1..b} x_i² for endpoint b)
- Required space is O((B²/ε) log n)
38 VOpt Construction: O(Bn²)
- Jagadish et al., VLDB 1998
- OPT(i, k) = min_{1 ≤ j < i} ( OPT(j, k−1) + SQERR(j+1, i) )
(Figure: the full columns of the OPT[j, k] and OPT[j, k−1] tables, each of length n)
39 AHIST-S: (1+ε) Approximation
- AOPT[j, k] = min_p ( AOPT[b_{j,p}, k−1] + SQERR[b_{j,p}+1, n] )
- O(B²ε⁻¹ n log n) time and O(B²ε⁻¹ log n) space
(Figure: the approximate tables AOPT[j, k] and AOPT[j, k−1], with interval endpoints kept at powers of (1+δ))
40The overall idea
The natural DP table
The approximate table
41 Do εs talk to us?
(Graph: execution time vs. B)
42 Take 2: GK02
- Sliding window streams
- Potentially infinite data; interested in the last n only
- Q: Suppose we constructed the histogram for 1..n and now want it for 2..(n+1)
- The previous idea is dead on arrival.
- Consider 100, 1, 2, 3, 4, 5, 7, 8, …
43 Formal problem
- Maintain a data structure such that, given an interval [a, b], we can construct a B-bucket histogram for [a, b]
- Compute on the fly
- Generalizes the window!
- Generalizes VOpt when a = 1, b = n
44 Reconsider Take 1
- We are evaluating left to right, i.e., approximating the increasing function
- But we are still evaluating this guy (the decreasing SQERR term)!
45 A brave new world
- Assume an O(n)-size buffer holds the x_i values
- The previous algorithm evaluated left to right
- Several issues:
  - Which values are necessary and sufficient?
  - We are not evaluating all values; what induction?
46 A trickier proof
(Figure: intervals with endpoints a, b, c, d, f, g illustrating the induction)
47 GK02 Enhanced: (1+ε) Approximation
- Lazy evaluation using binary search
- O(B³ε⁻² log³ n) time and O(n) space
- Pre-processing takes O(n) time (SUM and SQSUM)
(Figure: interval lists for AOPT[j, k] and AOPT[j, k−1] at powers of (1+δ))
48 GK02 Enhanced: (1+ε) Approximation
- Creates all of the B interval lists at once
- The values of the necessary AOPT[j, k] are computed recursively to find the intervals [a_{j,p}, b_{j,p}], where b_{j,p} is the largest z s.t.
  - AOPT[z, k] ≤ (1+ε) AOPT[a_{j,p}, k]
  - AOPT[z+1, k] > (1+ε) AOPT[a_{j,p}, k]
- Note that AOPT increases as z increases
- Thus, we can use binary search to find z
- The O(n)-space SUM and SQSUM arrays need to be maintained to allow the computation of SQERR(j+1, i) in O(1) time
- O(nB³ε⁻² log³ n) time and O(n) space
49 Take 2 summary
- O(n) space and O(nB³ε⁻² log² n) time
- Is that the best? Obviously not.
50 Take 3: AHIST-L-Δ
- Suppose we knew ρ with ρ ≤ OPT ≤ 2ρ; then
- Instead of powers of (1+δ), use additive steps of Δ = ερ/(2B)
- Time is O(B³ε⁻² log n)
- To get ρ:
  - a 2-approximation: O(1)
  - a binary search: O(log n) probes
  - Thus, O(B³ log n · log n)
- Overall O(nB³(ε⁻² + log n) log n) time and O(nB²/ε) space
51 Take 4: AHIST-B
- Consider the Take 3 algorithm.
- How to stream it?
(Figure: run the algorithm on the new part of the stream, of size M, and merge into the overall summary)
52 Not done yet
(Figure: the k and k−1 lists, refined over the range 1..r)
- First find an O(1) approximation of the error, then proceed back and refine
53 The running space-time
- Time: B · (# insertions) · (log M) · (log Δ), where Δ = O(Bε⁻¹ log n) is the length of a list
- Space?
- Who cares, and why?
54 Asymptotics
- For fixed B and ε, we can compute a (1+ε) piecewise constant representation in
  - O(n log log n) time and O(log n) space, or
  - O(n) time and O(log n log log n) space.
- Extends to degree-d polynomials; space increases by a factor of O(d) and time by a factor of O(d³)
55 Our friendly ε: running time
(Graph: execution time vs. B)
56 Our friendly ε: error
(Graph: (Error − VOPT)/VOPT vs. B)
57 What you analyze is what you get
(Graph: execution time vs. n)
58Questions ?
59 The status for VOPT
- Saves space across all algorithms, except algorithms which extend to general error measures over streams
60 For a general error measure, IF
- The error of a bucket only depends on the values in the bucket.
- The overall error function is the sum of the errors in the buckets.
- The data can be processed in O(T) time per item such that in O(Q) time we can find the error of a bucket, storing O(P) info.
- The error (of a bucket) is a monotonic function of the interval.
- The values of the maximum and the minimum nonzero error are polynomially bounded in n.
61 Then
- Optimum histogram in O(nT + n²(B+Q)) time and O(n(P+B)) space
- (1+ε)-approximation in
  - O(nT + nQB²ε⁻¹ log n) time and O(PB²ε⁻¹ log n) space, or
  - O(nT + QB³(log n + ε⁻²) log n) time and O(nP) space, or
  - O(nT) time and space
    O(PB²ε⁻¹ log n + (QB/T) Bε⁻¹ log²(Bε⁻¹ log n) log n log log n)
62 Splines and piecewise polynomials
- Instead of piecewise constants, we may want piecewise linear pieces, or maybe higher-degree polynomials
63 The overall idea
- If we want to represent x_{a+1}, …, x_b by p₀ + p₁(x − x_a) + p₂(x − x_a)² + …
- The solution is the least-squares fit, as above
- We need O(d) times more space (than before) and need to solve the linear system. This means an increase by a factor of O(d³) in time.
64 Another useful example: relative error
- Issue with global measures: estimating 10 by 20 and 1000 by 1010 has the same effect
- The above is OK if we are querying for 1000 a thousand times and for 10 only ten times (point queries and the VOPT measure)
- But consider approximating a time series. We may be interested in per-point guarantees.
65 Sum of Squared Relative Error for a Bucket
- Relative error for a bucket (s_r, e_r, x_r): Σ_i ((x_i − x_r)/x_i)² = C − 2Bx_r + Ax_r², with A = Σ 1/x_i², B = Σ 1/x_i, C = the number of values
- Since A > 0, it is minimized when x_r = B/A
- The minimum value is C − B²/A
- If the aggregated sums A, B and C are stored, ERRSQ(i, j) can be computed in O(1) time
- The optimal histogram can be constructed in O(Bn²) time. Approximation algorithms follow.
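A sketch of this computation, assuming the bucket's relative error is Σ_i ((x_i − x_r)/x_i)² and ignoring the sanity constant that guards against tiny values (names are illustrative):

```python
def best_relative_value(xs):
    """Optimal representative x_r for the sum of squared relative error
        err(x_r) = sum_i ((x_i - x_r) / x_i)^2 = C - 2*B*x_r + A*x_r^2
    with A = sum 1/x_i^2, B = sum 1/x_i, C = len(xs).
    The quadratic is minimized at x_r = B/A with value C - B^2/A."""
    A = sum(1.0 / v ** 2 for v in xs)
    B = sum(1.0 / v for v in xs)
    C = len(xs)
    return B / A, C - B * B / A
```

For a constant bucket such as [10, 10] this returns the value itself with zero error, as expected.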
66Maximum Error and the l1 metric
67 Maximum Error Histograms
- A bucket (s_r, e_r, x_r) over numbers x₁, x₂, …, x_n s.t.
  - s_r: starting position
  - e_r: ending position
  - x_r: representative value
- The maximum error is max_i |x_i − x_r|
- The maximum relative error is defined analogously, with each term divided by |x_i|
68 Maximum Error of a Bucket
- Given numbers x₁, x₂, …, x_n
- The maximum error is Err_M = min_{x_r} max_i |x_i − x_r|
- What is the best x_r?
  - (x_min + x_max)/2
69 Maximum Relative Error of a Set
- Given a set of numbers x₁, x₂, …, x_n
  - max: the maximum of x₁, x₂, …, x_n
  - min: the minimum of x₁, x₂, …, x_n
  - c: a sanity constant (lower bound guarding against tiny values)
- The optimal error is some function of c, max, and min
  - E.g., when c ≤ min ≤ max, the error is (max − min)/(max + min)
- The optimal maximum relative error for a bucket can be computed in O(1) time
70 The Naïve Optimal Algorithm
- for i = 1 to n do
  - OPTM[i, 1] = ERRM(1, i)
  - for k = 1 to B do
    - max = −∞; min = ∞; OPTM[i, k] = ∞
    - for j = i−1 downto 1 do
      - if (max < x_{j+1}) max = x_{j+1}
      - if (min > x_{j+1}) min = x_{j+1}
      - OPTM[i, k] = min( OPTM[i, k], max( OPTM[j, k−1], ERRM(j+1, i) ) )
- ERRM(j+1, i) can be obtained in O(1) time (from the running max and min)
- O(Bn) space and O(Bn²) time optimal algorithm
71 An Improved Optimal Algorithm
- OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
- Observations:
  - OPTM[j, k−1] is an increasing function of j
  - ERRM(j+1, i) is a decreasing function of j
- To compute min_x max( F(x), G(x) ) where F(x) and G(x) are non-decreasing and non-increasing functions:
  - Binary-search for the value of x such that F(x) ≥ G(x) and F(x−1) < G(x−1)
  - The minimum is min( G(x−1), F(x) )
72 An Improved Optimal Algorithm
- OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
- We can improve the innermost loop of the naïve algorithm to O(log n) time.
- However, ERRM(j+1, i) cannot be computed in O(1) time any more
- Using an interval tree, we can compute the min and max values for [j+1, i], i.e. ERRM(j+1, i), in O(log n) time
- Thus, our improved algorithm takes O(Bn log² n) time with O(Bn) space
73 An Interval Tree Example
- The tree over [1,8] splits into [1,4] and [5,8], then [1,2], [3,4], [5,6], [7,8], down to the leaves [1,1], …, [8,8].
- To answer a min query for [2,4], decompose it into canonical nodes: decomposeLeft yields [2,2] and decomposeRight yields [3,4].
(Figure: the steps of decomposing [2,4] with an interval tree)
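A minimal stand-in for the interval tree: a segment tree answering range min and max in O(log n), which is all that ErrM(a, b) = (max − min)/2 needs (an illustrative sketch, not the tutorial's code):

```python
class IntervalTree:
    """Segment tree over x supporting range-min/max queries in
    O(log n) time after O(n) construction."""
    def __init__(self, x):
        size = 1
        while size < len(x):
            size *= 2
        self.size = size
        self.mn = [float('inf')] * (2 * size)
        self.mx = [float('-inf')] * (2 * size)
        for i, v in enumerate(x):
            self.mn[size + i] = self.mx[size + i] = v
        for i in range(size - 1, 0, -1):
            self.mn[i] = min(self.mn[2 * i], self.mn[2 * i + 1])
            self.mx[i] = max(self.mx[2 * i], self.mx[2 * i + 1])

    def minmax(self, a, b):
        """Min and max over x[a..b] (0-indexed, inclusive)."""
        lo, hi = float('inf'), float('-inf')
        a += self.size
        b += self.size + 1
        while a < b:
            if a & 1:
                lo = min(lo, self.mn[a]); hi = max(hi, self.mx[a]); a += 1
            if b & 1:
                b -= 1; lo = min(lo, self.mn[b]); hi = max(hi, self.mx[b])
            a //= 2; b //= 2
        return lo, hi

def err_max(tree, a, b):
    """Max-error of one bucket: best x_r is (min + max)/2."""
    lo, hi = tree.minmax(a, b)
    return (hi - lo) / 2
```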
74 Consider another solution
- Make the first bucket as large as possible
- i.e., push the boundary right
- E.g., in the figure we can, as long as the max and min stay the same
- Why will we have to stop?
75 Consider another solution (2)
- In this example we cannot
- But maybe the error comes from a different bucket!
- Here's one idea:
  - Given an i, find Err[1, i]
  - If i is small, Err[1, i] ≤ OPT
  - If i is large, Err[1, i] ≥ OPT
- How?
76 How?
- Assume that, given an interval [a, b], we can find the min and max, and therefore Err[a, b]
- With O(n) time and space preprocessing, we can find Err in O(log n) time (interval tree)
- Check(p, q, b, τ):
  - If Err[p, q] ≤ τ (even when b = 0), we are done.
  - Otherwise:
    - Find mid s.t. Err[p, mid] ≤ τ and Err[p, mid+1] > τ
    - Check(mid+1, q, b−1, τ)
- O(B log² n):
  - Binary search: log n × log n (to find min and max for Err)
  - Check recurses B times
77 Now for the original problem
- By binary search, find the largest s such that, with τ₁ = Err[1, s] and τ₂ = Err[1, s+1],
  Check(1, n, B−1, τ₁) = false and Check(1, n, B−1, τ₂) = true
- Now OPT = τ₂, or OPT is the best (B−1)-bucket error of [s+1, n]
- A recursive algorithm!
- T(B) = log n · B log² n + T(B−1) = O(B² log³ n) !!
78 Summary
- In O(n + B² log³ n) time and O(n) space we can find the optimum error.
- What do we do for
  - a stream, or
  - less than O(n) space?
- Approximate, using some of the old ideas
79Short break !
- When we return
- Range Query Histograms
- Wavelets
- Optimum synopsis
- Connection to Histograms
- Overall ideas and themes
80 Range Query Histograms
81 One more synopsis structure
- Instead of estimating the value at a point, we are interested in the sum of the values in intervals/ranges.
- Clearly, very useful.
- Clearly, we need a new optimization.
- E.g., …
82 A more difficult problem
- Only special cases solved (satisfactorily)
- Hierarchies:
  - Prefix ranges: all ranges of the form [1, j] as j varies
  - Complete binary ranges
  - General hierarchies
- Uniform ranges: all ranges
83Status Range Query
84 The uniform case
- Consider a sequence X = x₁, x₂, …, x_n
- Define the operator ψ(g)_i = Σ_{j ≤ i} g_j, the prefix sum
85 Unbiased
- Suppose H is a histogram such that F = ψ(X − H) satisfies Σ_i F_i = 0
- Or think of Σ_i Σ_{r ≤ i} (X_r − H_r) = 0
- Claim: the error of using H to answer range queries for X is twice the error of using ψ(H) to answer point queries about ψ(X)!
86 The main idea
- Define G_i = Σ_{r ≤ i} (X_r − H_r) = ψ(X)_i − ψ(H)_i
- Now Σ_i G_i = 0 if H is unbiased
- Pick a random element u: expected G_u = 0
- Pick two random elements u, v: expected (G_u − G_v)² = expected error of using H to answer range queries for X
- But that is equal to 2 × expected G_u²
87 A simple approximation
- What we want is the best H for range queries: hard
- But we know how to get a good piecewise constant approximation of ψ(X), and ψ(H) for a histogram H is piecewise linear
- Piecewise linear histograms!
88 An easy trick
- We can also find a histogram with a buffer of size 1 after each bucket
- Use it as a patch-up
- 2B buckets, same error as OPT
- Approximation algorithms try to find the continuous variant
89 The Synopsis Construction Problem
- Formally, given a signal X and a dictionary {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- In the case of histograms the dictionary was the set of all possible intervals, but we could only choose a non-overlapping set.
90 The eternal what if
- If the φ_i are designed for the data, do we get a better synopsis?
- Absolutely!
- Consider a sine wave
- Or any smooth function
- Why, though?
91 Representations that are not piecewise constant
- Electromagnetic signals are sine/cosine waves.
- If we are considering any process which involves electromagnetic signals, this is a great idea.
- These are particularly great for representing periodic functions.
- Often these algorithms are found in DSP (digital signal processing) chips
- A fascinating 300 years of history in mathematics!
92 A slight problem
- ni nill cfme back tf Ffurier
- Fourier is suitable for smooth natural processes
- If we are talking about signals from man-made processes, clearly they cannot be natural (and are hardly likely to be smooth)
- More seriously: discreteness and burstiness
93 The Wavelet (frames)
- Inherits properties from both worlds
- The Fourier transform has all frequencies.
- Wavelets consider frequencies that are powers of 2, but the effect of each wave is limited (shifted)
94Wavelets
- What to do in a discrete world ?
The Haar Wavelets (1910) !
95 The Haar Wavelets
- Best energy synopsis amongst all wavelets (we will see more later)
- Great for data with discontinuities.
- A natural extension to discrete spaces:
  - (1, −1, 0, 0, …, 0), (0, 0, 1, −1, 0, …, 0), …, (0, …, 0, 1, −1)
  - (1, 1, −1, −1, 0, …, 0), …, (0, …, 0, 1, 1, −1, −1), …
96 The Haar Synopsis Problem
- Formally, given a signal X and the Haar basis {φ_i}, find a representation F = Σ_i z_i φ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
- Let's begin with the VOPT error (‖X − F‖₂²)
97 The Magic of Parseval (no spears)
- The ℓ₂ distance is unchanged by a rotation.
- A set of basis vectors φ_i defines a rotation iff ⟨φ_i, φ_j⟩ = δ_ij, i.e., the basis is orthonormal
- Redefine the basis (scale) s.t. ‖φ_i‖₂ = 1
- Let the transform be W
- Then ‖X − F‖₂ = ‖W(X − F)‖₂ = ‖W(X) − W(F)‖₂
- Now W(F) = (z₁, z₂, …, z_n), and so
- ‖W(X) − W(F)‖₂² = Σ_i (W(X)_i − z_i)²
98 What did we achieve?
- Storing the largest coefficients is the best solution.
- Note that the fact z_i = W(X)_i is a consequence of the optimization and IS NOT a specification of the problem.
- More on that later.
99 What is the best algorithm?
- How to find the largest B coefficients of the transform of x₁, x₂, …?
- The cascade algorithm.
- Recall the hierarchical nature.
100 The cascade algorithm
- Given a pair a, b, represent them as (a − b) and (a + b)
- Divide by √2 so that the sum of squares is preserved, etc.
- Running time O(n)
- Example signal: 1, 4, 5, 6
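The cascade can be sketched directly on the example signal (normalized by √2 at each level so that energy is preserved, as Parseval requires; `haar_cascade` is an illustrative name):

```python
import math

def haar_cascade(x):
    """Full Haar transform via the cascade: pairwise sums and
    differences, each divided by sqrt(2) so the sum of squares is
    unchanged.  len(x) must be a power of two."""
    coeffs = []
    level = list(x)
    while len(level) > 1:
        sums, diffs = [], []
        for a, b in zip(level[::2], level[1::2]):
            sums.append((a + b) / math.sqrt(2))
            diffs.append((a - b) / math.sqrt(2))
        coeffs = diffs + coeffs   # detail coefficients, coarse levels first
        level = sums              # recurse on the pairwise sums
    return level + coeffs         # overall (scaled) average first
```

On [1, 4, 5, 6] the top coefficient is 8 (the scaled average), and the energy 1² + 4² + 5² + 6² = 78 is preserved by the rotation.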
101 Surfing Streams
- Notice that once the left half is done, we only need to remember the partial sums along the boundary
- A stream algorithm is natural
102 Surfing Streams
- Have an auxiliary structure that maintains the top B of a set of numbers
- Where else have you seen this?
- The Reduce-Merge paradigm: also used in clustering data streams
103 In summary
- Given a series x₁, x₂, …, x_i, …, x_n in increasing order of i, we can find (maintain) the largest B coefficients in O(n) time and O(B + log n) space
- OK, but only for ‖X − F‖₂
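The auxiliary top-B structure can be sketched with a min-heap keyed on coefficient magnitude (illustrative; in the stream algorithm the cascade feeds it coefficients as they are emitted):

```python
import heapq

def top_b_stream(coeff_stream, B):
    """Maintain the B largest-magnitude coefficients seen on a
    stream: O(log B) per item, O(B) space (plus the O(log n)
    cascade state).  Returns (index, value) pairs, largest first."""
    heap = []  # entries are (abs(value), index, value); heap[0] is smallest
    for idx, c in enumerate(coeff_stream):
        item = (abs(c), idx, c)
        if len(heap) < B:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current smallest
    return sorted(((i, c) for _, i, c in heap), key=lambda t: -abs(t[1]))
```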
104 Extended Histograms
- What do we do in the presence of multiple dimensions/measures?
  - Use multi-dimensional transforms
  - Use many 1-D transforms
- Strategy: use a flexible scheme that allows us to store the index and a bitmap to indicate which measures are stored.
105 How to solve it?
- For the basic 1-D problem we need to choose the largest B coefficients
- Use Parseval to transform the error on the data into choosing/not choosing coefficients
- Here we have bags:
- We can choose coefficient j with bitmap
  - 0100, using H + S space
  - 0101, using H + 2S space
  - 1111, using H + 4S space
106 Is 0101 better than 1100?
- Subproblem: given that we have settled on choosing 2 coefficients for j, which 2?
- It is the largest 2 again!
- Basically we can choose a set of indices j and decide how many coefficients we choose for each j
- What does this remind you of?
107 Knapsack
- Each item j is available in M different versions.
- The cost of the rth version is H + rS. The profit is an increasing function of r.
- We can choose only one version.
108 Strange roadbumps
- Optimal profit + optimal error = total energy
- The relationship does not hold under approximation:
  - 99 + 1 = 100. Approximating the profit 99 by 95 increases the error by 400%.
- We will return to this.
109Many questions
- What do we do for other error measures ?
- What is the connection with Histograms ?
- Positives: some direction
- Cascade algorithm
- Hierarchy of coefficients
110Non l2 errors
111 Storing coefficients is suboptimal
- Recall the complicated example 1, 4, 5, 6
- We want a 1-term summary and the error is ℓ₁
- What do we store?
- What is the final result? 3.5, 3.5, 3.5, 3.5. What is the transform of that? 7, 0, 0, 0. But the coefficient actually available in W(X) is 8, …
112 What to do?
- Search where there is light:
  - Restricted problem. Useful if the synopsis has more than one use.
- Think outside the coefficients:
  - Probabilistic rounding
  - Search (cleverly) over the whole space
113 The Best Restricted Synopsis
- Maximum error.
- A value (at a leaf) is affected only by its ancestors.
- Number of ancestors ≤ log n
- Guess/try all of the set!
- O(n) choices (2^{log n})
- Start bottom up and use a DP to choose the best B coefficients overall.
- Works for a large number of error measures.
114 Analysis
- At each internal node j we need to maintain the table
  Error[j, ancestor set, b]: the contribution to the minimum error by only the subtree rooted at j, when using b or fewer coefficients (for the subtree)
- Size of table: O(n²B)
- Time: O(n²B log B); depends on the measure
- But we can do better.
115 Faster Restricted Synopsis
- A better cut:
- The number of coefficients in a subtree is at most its size + 1
- The size of the table storing Err[j, ancestor set, b]
  - remains constant as we go up the levels!
  - the ancestor set decreases by 1, while b takes twice as many values
- An O(n²) algorithm
- We can also reduce the space to O(n)
116 Thinking beyond the coefficient
- Probabilistic rounding:
  - Start from the coefficients.
  - Randomly round most of them to 0
  - A few are rounded to non-zero values
  - E.g., set z_i = λ with probability W(X)_i/λ, and 0 otherwise
- Has promise (correct expectation, variance)
- Two issues:
  - The quality is unclear (w.r.t. the original optimization)
  - The expected number of non-zero coefficients is B, but the variance is large, so with reasonable probability we use 2B
117 More exploration required
- Interestingly, the method (as proposed) eliminates a region of the search space
- We can construct examples where the optimum lies in that region.
- But it is an interesting method and likely (we are guessing) preserves more than one error measure simultaneously (multi-criterion optimization)
118 What is the optimum strategy?
- Consider the best set of coefficients Z = z₁, z₂, …, z_n
- Nudge them a bit by making them multiples of some δ
- The extra error is small (and a function of δ); in fact each point sees at most δ log n
- By reducing δ we can get a (1+ε) approximation
119 A straightforward idea
- But we still need to find the solution
- The ancestor set is unimportant; what is important is their combined effect. Try all possible values (multiples of δ, but we still need to fix the range)
120 The graphs: the data
121 The graphs: ℓ₁
122 Relative error (small B), relative ℓ₁
123 The times
124What have we seen so far
- Wavelet representation of l_2 error
- Streaming
- Wavelet representation for non l_2 error
- Restricted
- Unrestricted
- Stream
125A return to histograms
126 Easy relationships
- A B-bucket (piecewise constant) histogram can be represented by 2B log n Haar wavelet coefficients.
  - Why? Only the 2B boundary points matter
- A B-term Haar wavelet synopsis can be represented by a 3B-bucket histogram.
  - Why? Each wavelet basis vector creates 3 extra pieces from 1 line
127 Anything else?
- Totally! We can use wavelets to get (1+ε)-approximate V-optimal histograms. In fact the method has advantages.
128 Histograms, Take 5
- A B-term histogram can be represented by cB log n wavelet terms.
- What if we choose the largest cB log n wavelet terms?
129 Need not be good
- The best histogram has its cB log n wavelet coefficients aligned such that the result is B buckets.
- The best cB log n coefficients are all over the place and give us 3cB log n buckets.
- Is all hope lost?
130 If at first you don't succeed…
- Do we repeat the process and also keep the next cB log n coefficients? No.
- But notice that the energy drops.
- Energy: ‖X‖² = ‖W(X)‖²
- Basic intuition: if there were a lot of coefficients which were large, then the best V-Opt histogram MUST have a large error.
- Why?
131 The robust property
- Look at ‖W(X) − W(H)‖₂ = ‖X − H‖₂
- W(H) has cB log n entries
- If W(X) has cBε⁻² log n large entries…
132 A strange idea in 1000 words
- Consider the projection onto the largest cBε⁻² log n wavelet terms
- Is the projection approximately X?
133 No. But flatten the function
(Figure: the flattened projection vs. X)
134 In fact
- If we choose a large, (Bε⁻¹ log n)^{O(1)}, number of coefficients, then the boundary points of the coefficients are (approximately) good boundary points for a VOPT histogram.
135 The take-away
- I'm OK, you're OK
- If I'm not OK, then you're not OK either.
- An oft-repeated approximation paradigm:
  - If there are too many coefficients, then my algorithm is doomed, but so is anyone else's, and therefore I am good
  - If there are not too many coefficients, then we're good.
136 The Extended Wavelets in ℓ₂
- We can store the largest coefficients
- If there are too many coefficients which are large, then the optimum error is large.
- Otherwise we repeatedly take out coefficients until taking out coefficients no longer reduces the error.
- DP on the set of coefficients taken out.
137 The Full Monty: update streams
- So far we have been looking at X arriving as x₁, x₂, …
- What happens when X is specified by a stream of updates?
- i.e., (i, δ_i): change x_i to x_i + δ_i
138Sketches Stream Embeddings
- Basically Dimensionality reduction
- To compute the histogram H of signal X
- Compute embedding g(X) to fit the space
- Compute H s.t. g(H) is close to g(X)
139 Linear Embeddings
- The JL Lemma: a random matrix A drawn from a Gaussian distribution preserves ℓ₂ distances up to (1 ± ε)
- Too many elements in the matrix!
  - Use pseudorandom generators
- p-stable distributions for ℓ_p
140 What it achieves
- Increasing a coordinate of x is adding the corresponding column of A to the sketch A·x.
141 Suppose we knew the intervals
- The best histogram minimizes ‖X − H‖₂ ≈ ‖AX − AH‖₂
- AX is a vector; AH is a linear function of B values
- We have a min-squared-error program, solvable in polynomial time; more involved in the 1-norm.
142 Cannot do that
- ‖X − H‖₂ = ‖W(X) − W(H)‖₂ ≈ ‖AW(X) − AW(H)‖₂
- Idea: use the linear map to find the large wavelet coefficients (a top-k problem using sketches), then use ideas similar to Take 5 to get the final solution.
143 The return of the pink Fourier
- Assuming x₁, x₂, …, x_i, … arrive in increasing order of i, find/maintain the top k Fourier coefficients.
- Use the strategy:
  - Assume that there are O(k log n) frequencies and try to find them.
  - If not, we are doomed, and so is everyone.
  - So we are OK.
- For the 3rd time
144 What about top k?
- Assuming x₁, x₂, …, x_i, … are specified by a stream of updates, find/maintain the top k values (all elements with frequency 1/k or more).
- Use the strategy:
  - Assume that there are O(k log n) elements and try to find them.
  - If not, we are doomed, and so is everyone.
  - So we are OK. Again!
- Use group testing:
  - 20 questions, bit chasing — is there a heavy item in the first half? You can use norms or you can use collisions (hashes).
145 From optimization to learning
- We are trying to learn a pure signal that has few coefficients
- A general paradigm.
146 The Meaning of Life
- In summary (high level):
  - Approximation is very useful for synopsis construction (the execution-time speedups, plus the end use of a synopsis is approximation anyway)
  - Synopses are usually applied to large data. Asymptotic behaviour matters.
  - The exact definition of the optimization is important. How natural is natural?
  - Few degrees of separation between the synopsis structures. They are related. They should be. And then we can use algorithmic techniques back and forth between them.
147 The Summary (contd.)
- In algorithm-design terms:
  - Most synopsis construction problems involve DP. Investigating how to change the DP to get approximate, space-efficient algorithms is often useful.
  - Search techniques (computational geometry), such as searching over exponents first, are useful.
  - What you analyze (carefully) is often what you would get asymptotically. The usual techniques we use for pruning etc. can be analyzed and shown to be better.
  - Reduce-Merge ⇒ Streaming?
  - The top k in various disguises. Group testing matters.
148 What lies ahead
- OK. So 1-D histograms have good algorithms.
- 2-D?
  - NP-hard.
  - Some approximation algorithms known.
- Q: In linear time and sublinear space, what can we do?
  - Sketch-based results. A long way to go.
149 What lies ahead
- So 1-D Haar wavelets have good algorithms (non-ℓ₂).
- 2-D?
  - Unlikely to be NP-hard
  - Quasi-polynomial time (n^{O(log n)}) approximation algorithms known.
- Q: In linear time and sublinear space, what can we do?
150 What lies ahead
- So 1-D Haar wavelets have good algorithms (non-ℓ₂).
- Non-Haar? Daubechies. Multifractals.
  - Unlikely to be NP-hard
  - Quasi-polynomial time (n^{O(log n)}) approximation algorithms known.
- What can we do?
151 What lies ahead
- All the update-stream results are based on the ℓ₂ error because of Johnson-Lindenstrauss (and some on ℓ_p for 0 < p ≤ 2)
- What about other errors?
- Will require new techniques for streaming.
152Notes (not from the underground)
- The VOPT definition
- Poosala, Haas, Ioannidis, Shekita, SIGMOD 96.
- The VOPT histogram algorithm
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
Suel, VLDB 98. - Take 1
- Guha, Koudas, Shim, STOC, 01.
- Take 2
- Guha, Koudas, ICDE, 02.
- Take 3 4
- Guha, Koudas, Shim, TODS, 05.
- Take 5
- Guha, Indyk, Muthukrishnan, Strauss, ICALP, 02.
- Relative Error Histograms
- Guha, Shim, Woo, VLDB, 04.
- Maximum Error histograms
- Nicole, J. of Parallel Distributed Computing,
1994. - (Muthukrishnan, Khanna, Skiena, ICALP, 97),
- Guha, Shim, (here) 05.
153More Notes
- Range Query Histograms
- Muthukrishnan, Strauss, SODA, 03.
- The Full Monty
- Gilbert, Guha, Indyk, Kotidis, Muthukrishnan,
Strauss, STOC, 02. - Parseval stuff
- Parseval, (margin of notebook ?), 1799.
- Folklore sum of squares and l2
- The mandala
- Surfing Wavelets
- Gilbert, Kotidis, Muthukrishnan,Strauss, VLDB,
01 - Probabilistic Synopsis
- Garofalakis, Gibbons, SIGMOD 02 (also TODS 04)
- Maximum error (restricted version)
- Garofalakis, Kumar, PODS, 04.
154Notes again
- Faster Restricted Synopsis
- Guha, VLDB, 05.
- Unrestricted non l2 error
- Guha, Harb, KDD, 05 new results
- Extended Wavelets
- Deligiannakis, Roussopoulos, SIGMOD 03.
- Guha, Kim, Shim, VLDB 04.
- Streaming Fourier approximation
- Gilbert, Guha, Indyk, Muthukrishnan, Strauss,
STOC, 02 - Learning Fourier Coefficients
- Linial, Kushilevitz, Mansour, JACM, 93
- JL Lemma
- Johnson, Lindenstrauss, 1984.
- Sketches
- Alon, Matias, Szegedy, JCSS, 99.
- Feigenbaum, Kannan, Strauss, Viswanathan, FOCS, 99
- Indyk, FOCS, 00
155 Roads not taken
- (but relevant to synopses)
- Property testing
- Weighted sampling and SVD
- Median finding
- Sampling-based estimators