# Offline, Stream and Approximation Algorithms for Synopsis Construction

1
Offline, Stream and Approximation Algorithms for Synopsis Construction
• Sudipto Guha University of Pennsylvania
• Kyuseok Shim Seoul National University

2
• Information is incomplete and could be inaccurate
• Our presentation reflects our understanding which
may be erroneous

3
Synopses Construction
• Where is the life we have lost in living?
• Where is the wisdom we have lost in knowledge?
• Where is the knowledge we have lost in
information?
• T. S.
Eliot, from The Rock.
• Routers
• Sensors
• Web
• Astronomy and sciences
• Too much data too little time.

4
The idea
• To see the world in a grain of sand
• Broad characteristics of the data
• Compression
• Dimensionality Reduction
• Denoising, Outlier Detection and a broad array of
signal processing

5
What is a synopsis ?
• Hmm.
• Any shorthand representation
• Clustering!
• SVD!
• In this tutorial we will focus on signal/time
series processing

6
The basic problem
• Formally, given a signal X and a dictionary {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
• Note, the above extends to any dimension.

7
Many issues
• What is the dictionary ?
• Which B terms ?
• What is the error ?
• What are the constraints ?

8
Many issues
• What is the dictionary ?
• Set of vectors
• Maybe a basis
• Which B terms ?
• What is the error ?
• What are the constraints ?

Top K
9
Many issues
• What is the dictionary ?
• Set of vectors
• Maybe a basis
• Which B terms ?
• What is the error ?
• What are the constraints ?

Haar Wavelets
Also Fourier, polynomials, ...
10
Many issues
• What is the dictionary ?
• Set of vectors
• May not be a basis
• Histograms
• There are n choose 2 vectors
• But since we impose a non-overlapping restriction
we get a unique representation.
• Which B terms ?
• What is the error ?
• What are the constraints ?

11
Many issues
• What is the dictionary ?
• Which B terms ?
• First B ?
• Best B ?
• What is the error ?
• What are the constraints ?

Why should we choose first B ?
• B vs 2B numbers
• Also

12
Approximation theory
• The discipline of mathematics associated with the approximation of functions.
• Same as our problem
• Linear theory (Parseval, ~1800; over two centuries old)
• Non-linear theory (Schmidt 1909, Haar 1910)
• Is it relevant ? Yes. However the mathematical treatment has been extremal, i.e., how does the error change as a function of B, and is that bound tight? Given this signal, is that the best we can do ?

13
Many issues
• What is the dictionary ?
• Which B terms ?
• What is the error ?
• This controls which B.
• ‖X−F‖₂ is most common, used all over in mathematics
• ‖X−F‖₁ and ‖X−F‖∞ are useful also
• Weights. Relative error of approximation
• Approximating 1000 by 1010 is not so bad.
• Approximating 1 by 11 is not too good an idea.
• What are the constraints ?

14
Many issues
• What is the dictionary ?
• Which B terms ?
• What is the error ?
• What are the constraints ?
• Input ? Stream, stream of updates
• Space, time, precision and range of values (for z_i in the expression F = Σ_i z_i ψ_i)

15
In this tutorial
• Histograms & Wavelets
• Will focus on Optimal, Approximation and
Streaming algorithms
• How to get one from the other!
• Connections to top K and Fourier.

16
I. Histograms.
17
VOpt Histograms
• Let's start simple
• Given a signal X, find a piecewise constant representation H with at most B pieces minimizing ‖X−H‖₂
• Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
Suel, 1998
• Consider one bucket.
• The mean is the best value.
• A natural Dynamic programming formulation

18
An Example Histogram
Data Distribution
V-Optimal Histogram
19
Idea: VOpt Algorithm
• Within a step/bucket the mean is the best value.
• Assume that the last bucket is [j+1, n].
• What can we say about the rest of the k−1 buckets ?

OPT[j, k−1]
SQERR[j+1, n]
Last bucket
Must also be optimal for the range [1, j] with (k−1) buckets! Dynamic Programming !!
22
Idea VOpt Algorithm
• A dynamic programming algorithm was given to construct the V-optimal histogram.
• OPT[n, k] = min_{1 ≤ j < n} { OPT[j, k−1] + SQERR[(j+1)..n] }
• OPT[j, k] : the minimum cost of representing the set of values indexed by 1..j by a histogram with k buckets.
• SQERR[(j+1)..n] : the sum of the squared absolute errors from (j+1) to n.

23
The DP-based VOpt Algorithm
• for i = 1 to n do
•   for k = 1 to B do
•     for j = 1 to i−1 do (split point of the (k−1)-bucket histogram and the last bucket)
•       OPT[i, k] = min { OPT[i, k], OPT[j, k−1] + SQERR[j+1, i] }
• We need O(Bn) entries for the table OPT
• For each entry OPT[i, k], it takes O(n) time if SQERR[j+1, i] can be computed in O(1) time
• O(Bn) space and O(Bn²) time

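The DP above can be sketched in Python. This is an illustrative implementation, not the authors' code; `v_optimal` and its variable names are made up here. Prefix sums of x and x² give the O(1) bucket-error evaluation:

```python
def v_optimal(x, B):
    """O(B n^2)-time, O(B n)-space DP for the V-optimal histogram error."""
    n = len(x)
    S = [0.0] * (n + 1)    # S[i]  = x_1 + ... + x_i
    S2 = [0.0] * (n + 1)   # S2[i] = x_1^2 + ... + x_i^2
    for i, v in enumerate(x, 1):
        S[i] = S[i - 1] + v
        S2[i] = S2[i - 1] + v * v

    def sqerr(a, b):
        """Squared error of bucket [a, b] (1-indexed) around its mean, in O(1)."""
        s, s2, m = S[b] - S[a - 1], S2[b] - S2[a - 1], b - a + 1
        return s2 - s * s / m

    INF = float("inf")
    OPT = [[INF] * (B + 1) for _ in range(n + 1)]  # OPT[i][k]: prefix 1..i, k buckets
    OPT[0][0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, B + 1):
            for j in range(k - 1, i):  # the last bucket is [j+1, i]
                c = OPT[j][k - 1] + sqerr(j + 1, i)
                if c < OPT[i][k]:
                    OPT[i][k] = c
    return OPT[n][B]
```

For example, `v_optimal([1, 1, 5, 5], 2)` is 0, since two buckets cover the two constant runs exactly.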
24
Computation of the Sum of Squared Absolute Error in O(1) time
sum(2,3) = x₂ + x₃ = SUM[3] − SUM[1] = 12 − 2 = 10
25
Computation of the Sum of Squared Absolute Error in O(1) time
Let SUM[i] = Σ_{j ≤ i} x_j and SQSUM[i] = Σ_{j ≤ i} x_j². Then the mean of a bucket [a, b] is (SUM[b] − SUM[a−1])/(b − a + 1). Thus,
SQERR(a, b) = SQSUM[b] − SQSUM[a−1] − (SUM[b] − SUM[a−1])² / (b − a + 1)
26
Analysis of VOpt Algorithm
• O(n²B) time, O(nB) space
• The space can be reduced (Wednesday)
• Main question: the end use of a histogram is to approximate something.
• Why not find an approximately optimal (e.g., (1+ε)) histogram?

27
If you had to improve something ?
Via wavelets (ssq): O(n) time, O(B²/ε²) space
O(n²B) time, O(nB) space
(1+ε) streaming: O(nB²/ε) time, O(B²/ε) space
(1+ε) streaming (ssq): O(n) time, O(B/ε²) space
O(n²B) time, O(n) space
(1+ε) streaming: O(n) time, O(B²/ε) space
Offline: O(n) time, O(B²/ε) space
Offline: O(n) time, O(nB/ε) space
28
Take 1
• For i = 1 to n do
•   For k = 1 to B do
•     For j = 1 to i−1 do (split point for the last bucket)
•       OPT[1..i, k] = min { OPT[1..i, k], OPT[1..j, k−1] + SQERR(j+1, i) }
• OPT[1..j, k−1] is increasing
• SQERR(j+1, i) is decreasing
• Question: can we use the monotonicity for searching for the minimum ?

As j increases
29
No
• Consider a sequence of positive y₁, y₂, ..., y_n
• F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
• F(i) monotonically increasing, like OPT[1..j, k−1]
• G(i) monotonically decreasing, like SQERR(j+1, i)
• Ω(n) time is necessary to find min_i F(i) + G(i)
• Open question: does it extend to Ω(n²) over the entire algorithm ?

30
What gives ?
• Consider a sequence of positive y₁, y₂, ..., y_n
• F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
• Thus, F(i) + G(i) = F(n) + y_i
• Any i gives a 2-approximation to min_i F(i) + G(i):
• F(i) + G(i) = F(n) + y_i ≤ 2 F(n)
• min_i F(i) + G(i) is at least F(n)

31
Round 1
• Use a histogram to approximate the function
• Bootstrap!
• Approximate the increasing function in powers of (1+δ)
• The right end point is a (1+δ)-approximation of the left end point
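The "powers of (1+δ)" idea can be illustrated with a small, hypothetical helper: an increasing positive sequence is summarized by breakpoints placed at (1+δ)-factor growth, so every value between two breakpoints is a (1+δ)-approximation of the value at the previous breakpoint.

```python
def breakpoints(values, delta):
    """Indices kept when compressing an increasing positive sequence
    in powers of (1 + delta); only O(log_{1+delta}(max/min)) survive."""
    kept = [0]
    for i in range(1, len(values)):
        # start a new interval once the value has grown by a (1+delta) factor
        if values[i] > values[kept[-1]] * (1 + delta):
            kept.append(i)
    return kept
```

With delta = 0.5, the sequence [1, 1.1, 1.2, 2, 4] keeps only three breakpoints.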

32
What does that do ?
• Consider evaluating the function at the two endpoints
• Proof by picture.

(figure: the two evaluations differ by at most a height h; one bound holds by construction, the other by monotonicity)
33
Therefore
• The right hand point is a (1+δ)-approximation!
• Holds for any point x in between:
• OPT[x] + SQERR[x+1] ≥ OPT[a] + SQERR[b]
• ≥ OPT[b]/(1+δ) + SQERR[b]
• ≥ (OPT[b] + SQERR[b])/(1+δ)
• Are we done ?
• Not quite yet.
• What happens for B > 2 ? We do not compute OPT[i, b] exactly !!

34
Zen and the art of histograms
• Approximate the increasing function in powers of (1+δ)
• The right end point is a (1+δ)-approximation
• Prove by induction that the error is (1+δ)^B
• This tells us what δ should be (small); in fact if we set δ = ε/(2B) then (1+δ)^B ≤ 1+ε

35
Complexity analysis
• # of intervals: p = O((B/ε) log n)
• Why ?
• (1+δ)^(p−1) ≈ nR², and δ = ε/(2B)
• R is the largest number in the data
• Assume R is polynomially bounded in n
• Running time: n · B · (B/ε) log n
• Why are we approximating the increasing function ? Why not the decreasing one ?

36
The first streaming model
• The signal X is specified by xi arriving in
increasing order of i
• Not the most general model
• But extremely useful for modeling time series data

37
Streaming
Σ₁^b x_i, Σ₁^b x_i²
Need to store Σ₁^a x_i, Σ₁^a x_i² at each interval boundary a ≤ b
Required space is O((B²/ε) log n)
38
VOpt Construction: O(Bn²)
• Jagadish et al., VLDB 1998
• OPT(i, k) = min_{1 ≤ j < i} { OPT(j, k−1) + SQERR(j+1, i) }

(figure: the DP table column OPT[j, k], j = 1..n, is computed from the column OPT[j, k−1])
39
AHIST-S: (1+ε) Approximation
• AOPT[i, k] = min_p { AOPT[b_p, k−1] + SQERR(b_p+1, i) } over the stored interval boundaries b_p
• O(B²ε⁻¹ n log n) time and O(B²ε⁻¹ log n) space

(figure: AOPT[j, k] is kept only at interval boundaries, with interval endpoints growing in powers of (1+δ))
40
The overall idea
The natural DP table
The approximate table
41
Do εs talk to us ?
• DJIA data from 1901-1993

(figure: execution time vs B)
42
Take 2 GK02
• Sliding window streams
• Potentially infinite data; interested in the last n only
• Q: Suppose we constructed a histogram for 1..n and now want it for 2..(n+1)
• The previous idea is dead on arrival.
• Consider 100, 1, 2, 3, 4, 5, 7, 8, ...

43
Formal problem
• Maintain a data structure
• Given an interval [a, b], construct a B-bucket histogram for [a, b]
• Compute on the fly
• Generalizes the window!
• Generalizes VOpt when a = 1, b = n

44
Reconsider the take 1
• We are evaluating
• Left to right, i.e.,

But we are still evaluating this guy !
45
A brave new world
• Assume an O(n)-size buffer holds the x_i values
• The previous algorithm was
• Several issues:
• Which values are necessary and sufficient ?
• We are not evaluating all values; what induction ?

46
A trickier proof
47
GK02 Enhanced: (1+ε) Approximation
• Lazy evaluation using binary search
• O(B³ε⁻² log³ n) time and O(n) space
• Pre-processing takes O(n) time: SUM and SQSUM

(figure: binary search for the largest z with (1+δ)·AOPT[a] ≥ AOPT[z] and (1+δ)·AOPT[a] < AOPT[z+1])
48
GK02 Enhanced: (1+ε) Approximation
• Creates all of the B interval lists at once
• The values of the necessary AOPT[j, k] are computed recursively to find the intervals [a_jp, b_jp], where b_jp is the largest z s.t.
• (1+ε) AOPT[a_jp, k] ≥ AOPT[z, k]
• (1+ε) AOPT[a_jp, k] < AOPT[z+1, k]
• Note that AOPT increases as z increases
• Thus, we can use binary search to find z
• The O(n)-space SUM and SQSUM arrays need to be maintained to allow the computation of SQERR(j+1, i) in O(1) time
• O(nB³ε⁻² log³ n) time and O(n) space

49
Take 2 summary
• O(n) space and O(nB³ε⁻² log² n) time
• Is that the best ? Obviously no.

50
Take 3: AHIST-L-Δ
• Suppose we knew ρ with ρ ≤ OPT ≤ 2ρ; then set δ = ε/(2B)
• Time is O(B³ε⁻² log n)
• To get ρ:
• a 2-approximation: O(1)
• a binary search: O(log n)
• Thus, O(B³ log n · log n)
• Overall O(n + B³(ε⁻² + log n) log n) time and O(n + B²/ε) space

O(B/ε)
51
Take 4 AHIST-B
• Consider the previous algorithm.
• How to stream it ?

(figure: process the new part of the stream, then merge into the overall summary; buffer of size M)
52
Not done yet
(figure: refine from the k-th bucket back through k−1, ..., 1)
• First find an O(1)-approximation, then proceed back and refine

53
The running space-time
• B · (# insertions) · (log M) · (log λ), where λ = O(Bε⁻¹ log n) is the length of a list
• Space
• Who cares and why ?

54
Asymptotics
• For fixed B and ε, we can compute a (1+ε) piecewise constant representation in
• O(n log log n) time and O(log n) space, or
• O(n) time and O(log n log log n) space.
• Extends to degree-d polynomials; space increases by a factor O(d) and time by a factor O(d³)

55
Our friendly ε: Running time
(figure: execution time vs B)
56
Our friendly ε: Error
(figure: (Error − VOPT)/VOPT vs B)
57
What you analyze is what you get
(figure: execution time vs n)
58
Questions ?
59
The status for VOPT
• Saves space across all algorithms except
algorithms which extend to general error measure
over streams

60
For general error measure, IF
• The error of a bucket only depends on the values
in the bucket.
• The overall error function is the sum of the errors in the buckets.
• The data can be processed in O(T) time per item
such that in O(Q) time we can find the error of a
bucket, storing O(P) info.
• The error (of a bucket) is a monotonic function
of the interval.
• The value of the maximum and the minimum nonzero
error is polynomially bounded in n.

61
Then
• Optimum histogram in O(nT + n²(B + Q)) time and O(n(P + B)) space
• (1+ε)-approximation in
• O(nT + nQB²ε⁻¹ log n) time and O(PB²ε⁻¹ log n) space,
• O(nT + QB³(log n + ε⁻²) log n) time and O(nP) space
• O(nT) time and space
• O(PB²ε⁻¹ log n + (QB/T)·Bε⁻¹ log²(Bε⁻¹ log n) log n log log n)

62
Splines and piecewise polynomials
• If we wanted
• Or maybe

63
The overall idea
• If we want to represent x_{a+1}, ..., x_b by p₀ + p₁(x − x_a) + p₂(x − x_a)² + ...
• The solution is as above
• We need O(d) times the space (compared to before) and need to solve the system. This means an increase by a factor O(d³) in time.

64
Another useful example Relative error
• Issue with global measures Estimating 10 by 20
and 1000 by 1010 has the same effect
• The above is ok if we are querying for 1000 a
1000 times and 10 times for 10 (point queries
and VOPT measure)
• But consider approximating a time series. We may
be interested in per point guarantees.

65
Sum of Squared Relative Error for a Bucket
• Relative error for a bucket (s_r, e_r, x_r): Σ_i ((x_i − x_r)/max(c, |x_i|))², a quadratic A·x_r² − 2B·x_r + C in x_r
• Since A > 0, it is minimized when x_r = B/A
• The minimum value is C − B²/A
• If the aggregated sums A, B and C are stored, ERRSQ(i, j) can be computed in O(1) time
• The optimal histogram can be constructed in O(Bn²) time. Approximation algorithms follow.
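A small sketch of the per-bucket computation, assuming the quadratic form described above (the helper name and the use of a sanity constant c, introduced formally a few slides later, are illustrative):

```python
def rel_sqerr_bucket(xs, c):
    """Optimal representative x_r and minimum sum-of-squared-relative-error
    for one bucket: the error sum_i ((x_i - x_r)/max(c, |x_i|))^2 is the
    quadratic A*x_r^2 - 2*B*x_r + C, minimized at x_r = B/A with value C - B^2/A."""
    w = [1.0 / max(c, abs(x)) ** 2 for x in xs]        # per-point weights
    A = sum(w)
    B = sum(wi * x for wi, x in zip(w, xs))
    C = sum(wi * x * x for wi, x in zip(w, xs))
    return B / A, C - B * B / A                        # (best x_r, minimum error)
```

Storing running sums of A, B and C over the sequence gives the O(1) bucket-error evaluation mentioned above.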

66
Maximum Error and the l1 metric
67
Maximum Error Histograms
• A bucket (s_r, e_r, x_r) with numbers x_1, x_2, ..., x_n s.t.
• s_r : starting position
• e_r : ending position
• x_r : representative value
• Maximum error is given by max_i |x_i − x_r|
• Maximum relative error is defined as max_i |x_i − x_r| / max(c, |x_i|)

68
Maximum Error of a bucket
• Given numbers x_1, x_2, ..., x_n
• Maximum error is given by ErrM = min_{x_r} max_i |x_i − x_r|
• What is the best x_r ?
• (x_min + x_max)/2

69
Maximum Relative Error of a set
• Given a set of numbers x_1, x_2, ..., x_n
• max : the maximum of x_1, x_2, ..., x_n
• min : the minimum of x_1, x_2, ..., x_n
• c : a sanity constant
• The optimum is some function of c, max, min
• E.g., when c ≤ min ≤ max the error is (max − min)/(max + min)
• The optimal maximum relative error for a bucket can be computed in O(1) time

70
The Naïve Optimal Algorithm
• for i = 1 to n do
•   OPTM[i, 1] = ERRM(1, i)
•   for k = 1 to B do
•     max = −∞; min = +∞; OPTM[i, k] = ∞
•     for j = i−1 down to 1 do
•       if (max < x_{j+1}) max = x_{j+1}
•       if (min > x_{j+1}) min = x_{j+1}
•       OPTM[i, k] = min { OPTM[i, k], max( OPTM[j, k−1], ERRM(j+1, i) ) }
• ERRM(j+1, i) can be obtained in O(1) time from the running max and min
• O(Bn) space and O(Bn²) time optimal algorithm

71
An Improved Optimal Algorithm
• OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
• Observations:
• OPTM[j, k−1] is an increasing function
• ERRM(j+1, i) is a decreasing function
• To compute min_x max( F(x), G(x) ) where F(x) and G(x) are non-decreasing and non-increasing functions:
• We can perform binary search for the value of x such that F(x) ≥ G(x) and F(x−1) < G(x−1)
• The minimum is min { G(x−1), F(x) }
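The binary search on monotone F and G can be sketched as follows (a hypothetical helper over precomputed arrays, not the deck's exact routine):

```python
def min_of_max(F, G):
    """min_x max(F[x], G[x]) in O(log n), given F non-decreasing and
    G non-increasing. Find the first x with F[x] >= G[x]; the optimum
    is there or one step to its left."""
    lo, hi = 0, len(F) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if F[mid] >= G[mid]:
            hi = mid
        else:
            lo = mid + 1
    best = max(F[lo], G[lo])
    if lo > 0:
        best = min(best, max(F[lo - 1], G[lo - 1]))
    return best
```

For F = [1, 2, 5, 9] and G = [8, 6, 3, 1] the crossing is at index 2, giving 5.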

72
An Improved Optimal Algorithm
• OPTM[i, k] = min_j max( OPTM[j, k−1], ERRM(j+1, i) )
• We can improve the innermost loop of the naïve algorithm to O(log n) time.
• However, ERRM(j+1, i) cannot be computed in O(1) time any more
• Using an interval tree, we can compute the min and max values for [j+1, i], i.e. ERRM(j+1, i), in O(log n) time
• Thus, the improved algorithm takes O(Bn log² n) time with O(Bn) space
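An interval (segment) tree supporting range min/max in O(log n) after O(n) construction can be sketched as follows; this is a standard array-based construction, not necessarily the exact tree in the figure:

```python
class MinMaxTree:
    """Static segment tree: build in O(n), answer (min, max) over any
    range in O(log n); stands in for the deck's interval tree."""

    def __init__(self, xs):
        size = 1
        while size < len(xs):
            size *= 2
        self.size = size
        self.mn = [float("inf")] * (2 * size)
        self.mx = [float("-inf")] * (2 * size)
        for i, v in enumerate(xs):                 # leaves
            self.mn[size + i] = self.mx[size + i] = v
        for i in range(size - 1, 0, -1):           # internal nodes
            self.mn[i] = min(self.mn[2 * i], self.mn[2 * i + 1])
            self.mx[i] = max(self.mx[2 * i], self.mx[2 * i + 1])

    def query(self, lo, hi):
        """(min, max) over xs[lo..hi] inclusive, 0-indexed."""
        lo += self.size
        hi += self.size + 1
        mn, mx = float("inf"), float("-inf")
        while lo < hi:
            if lo & 1:
                mn, mx = min(mn, self.mn[lo]), max(mx, self.mx[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                mn, mx = min(mn, self.mn[hi]), max(mx, self.mx[hi])
            lo >>= 1
            hi >>= 1
        return mn, mx
```

ERRM(a, b) is then (max − min)/2 for the queried range.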

73
An Interval Tree Example
(figure: an interval tree over [1,8] with internal nodes [1,4], [5,8], [1,2], [3,4], [5,6], [7,8] and leaves [1,1], ..., [8,8]; the steps of decomposing the query interval [2,4] via decomposeLeft / decomposeRight)
74
Consider another solution
• Make the first bucket as large as possible
• i.e. push the boundary right
• E.g. in the figure we can.
• As long as the max and min stay the same
• Why will we have to stop ?

75
Consider another solution (2)
• In this example we cannot
• But maybe the error comes from a different bucket!
• Here's one idea:
• Given an i, find Err[1, i]
• If i is small, Err[1, i] ≤ OPT
• If i is large, Err[1, i] ≥ OPT
• How ?

76
How ?
• Assume that given an interval [a, b] we can find the min and max, and therefore Err[a, b]
• With O(n) time and space preprocessing, we can find Err in O(log n) time (interval tree)
• Check(p, q, b, τ):
• If b = 0, we are done (succeed iff no elements remain, i.e., q < p).
• Otherwise,
• Find mid s.t. Err[p, mid] ≤ τ and Err[p, mid+1] > τ
• Check(mid+1, q, b−1, τ)
• O(B log² n):
• binary search: log n × log n (to find the min and max for Err)
• invocation of Check: B times
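The logic of Check can be sketched greedily. This linear-scan version (my own, O(n) per call) only shows the feasibility test; the deck's version reaches O(B log² n) by binary searching with an interval tree instead of scanning:

```python
def check(xs, b, tau):
    """Can xs be covered by at most b buckets, each with maximum error <= tau?
    Greedy: extend the current bucket while (max - min)/2 stays <= tau,
    i.e. while max - min <= 2*tau, then start a new bucket."""
    i, n = 0, len(xs)
    while i < n:
        if b == 0:
            return False          # elements remain but no buckets left
        lo = hi = xs[i]
        j = i
        while j < n and max(hi, xs[j]) - min(lo, xs[j]) <= 2 * tau:
            lo, hi = min(lo, xs[j]), max(hi, xs[j])
            j += 1
        b -= 1
        i = j
    return True
```

Extending each bucket maximally is safe here because a longer first bucket never hurts the remaining suffix.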

77
Now for the original problem
• By binary search, find the largest s such that
• with τ = Err[1, s] and τ' = Err[1, s+1],
• Check(1, n, B−1, τ) = false and Check(1, n, B−1, τ') = true
• Now OPT = τ', or the best (B−1)-bucket error of [s+1, n]
• A recursive algorithm!
• T(B) = log n · B log² n + T(B−1) ≈ O(B² log³ n) !!

78
Summary
• In O(n + B² log³ n) time and O(n) space we can find the optimum error.
• What do we do if
• streaming, or
• less than O(n) space ?
• Approximate, using some of the old ideas

79
Short break !
• When we return
• Range Query Histograms
• Wavelets
• Optimum synopsis
• Connection to Histograms
• Overall ideas and themes

80
Range Query Histograms
81
A more general synopsis structure
• Instead of estimating the value at a point we are
interested in sum of the values in
intervals/ranges.
• Clearly, very useful.
• Clearly we need new optimization.
• E.g.,

82
A more difficult problem
• Only special cases solved (satisfactorily)
• Hierarchies
• Prefix ranges: all ranges of the form [1, j] as j varies
• Complete Binary Ranges
• General hierarchies
• Uniform Ranges all ranges

83
Status Range Query
• Caveat

84
The uniform case
• Consider a sequence X = x₀, x₁, x₂, ..., x_n
• Define the operator
• π(g)[i] = Σ_{j ≤ i} g[j], the prefix sum

85
Unbiased
• Suppose H is a histogram such that F = π(X−H) satisfies Σ_i F[i] = 0
• Or think of it as Σ_i Σ_{r ≤ i} (X[r] − H[r]) = 0
• Claim: the error of using H to answer range queries for X is twice the error of using π(H) to answer queries on π(X)
86
The main idea
• Define G[i] = Σ_{r ≤ i} (X[r] − H[r]) = π(X)[i] − π(H)[i]
• Now Σ_i G[i] = 0 if H is unbiased
• Pick a RANDOM element u
• Expected G[u] = 0
• Pick two random elements u, v
• Expected (G[u] − G[v])² = the expected error of using H to answer range queries for X
• But that is equal to 2 · Expected G[u]²
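The last step can be written out; since u and v are independent and unbiasedness gives E[G_u] = E[G_v] = 0:

```latex
\mathbb{E}\big[(G_u - G_v)^2\big]
  = \mathbb{E}[G_u^2] - 2\,\mathbb{E}[G_u]\,\mathbb{E}[G_v] + \mathbb{E}[G_v^2]
  = 2\,\mathbb{E}[G_u^2].
```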

87
A simple approximation
• What we want is ...
• Hard
• But we know how to get ...

(figure: π(H) approximating π(X))
Piecewise linear histograms!
88
An easy trick
• We can also find
• A buffer of Size 1 after each bucket
• Use it as a patch-up
• 2B buckets
• Same error as OPT
• Approximation algorithms try to find the
continuous variant

89
The Synopsis Construction Problem
• Formally, given a signal X and a dictionary {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
• In the case of histograms the dictionary was the set of all possible intervals, but we could only choose a non-overlapping set.

90
The eternal what if
• If the ψ_i are designed for the data, do we get a better synopsis ?
• Absolutely!
• Consider a sine wave
• Or any smooth function.
• Why though ?

91
Representations not piecewise const.
• Electromagnetic signals are sine/cosine waves.
• If we are considering any process which involve
electromagnetic signals this is a great idea.
• These are particularly great for representing
periodic functions.
• Often these algorithms are found in DSP (digital
signal processing chips)
• A fascinating 300 years of history in Math !

92
A slight problem
• ni nill cfme back tf Ffurier ("we will come back to Fourier", garbled)
• Fourier is suitable for smooth natural processes; processes producing text like the above clearly cannot be natural (and are hardly likely to be smooth)
• More seriously, discreteness and burstiness

93
The Wavelet (frames)
• Inherits properties from both worlds
• The Fourier transform has all frequencies.
• Wavelets consider frequencies that are powers of 2, but the effect of each wave is limited (shifted)

94
Wavelets
• What to do in a discrete world ?

The Haar Wavelets (1910) !
95
The Haar Wavelets
• Best energy synopsis amongst all wavelets (we will see more later)
• Great for data with discontinuities.
• A natural extension to discrete spaces
• [1,−1,0,0,...,0], [0,0,1,−1,0,...,0], ..., [0,...,0,1,−1]
• [1,1,−1,−1,0,...,0], ..., [0,...,0,1,1,−1,−1]

96
The Haar Synopsis Problem
• Formally, given a signal X and the Haar basis {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error which is a function of X − F
• Let's begin with the VOPT error (‖X−F‖₂²)

97
The Magic of Parseval (no spears)
• The ℓ₂ distance is unchanged by a rotation.
• A set of basis vectors {ψ_i} defines a rotation iff
• ⟨ψ_i, ψ_j⟩ = δ_ij , i.e., the basis is orthonormal
• Redefine the basis (scale) s.t. ‖ψ_i‖₂ = 1
• Let the transform be W
• Then ‖X−F‖₂ = ‖W(X−F)‖₂ = ‖W(X) − W(F)‖₂
• Now W(F) = (z₁, z₂, ..., z_n) and so
• ‖W(X) − W(F)‖₂² = Σ_i (W(X)_i − z_i)²

98
What did we achieve ?
• Storing the largest coefficients is the best solution.
• Note that the fact z_i = W(X)_i is a consequence of the optimization and IS NOT a specification of the problem.
• More on that later.

99
What is the best algorithm ?
• How do we find the largest B coefficients of the transform of x₁, x₂, ... ?
• Recall the hierarchical nature.

100
• Given a, b, represent them as (a−b) and (a+b)
• Divide by √2 so that the sum of squares is preserved
• Running time O(n)

(example signal: 1, 4, 5, 6)
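The pairwise step can be sketched end to end (illustrative function names, assuming the input length is a power of two). On the running example 1, 4, 5, 6 the first normalized coefficient comes out as 8, and the energy is preserved:

```python
import heapq

def haar(x):
    """Normalized Haar transform of x (length a power of two).
    Each level maps a pair (a, b) to ((a + b)/sqrt(2), (a - b)/sqrt(2)),
    so the sum of squares (energy) is preserved (Parseval)."""
    cur = list(x)
    coeffs = []
    while len(cur) > 1:
        nxt, details = [], []
        for a, b in zip(cur[0::2], cur[1::2]):
            nxt.append((a + b) / 2 ** 0.5)
            details.append((a - b) / 2 ** 0.5)
        coeffs = details + coeffs   # finer details sit to the right
        cur = nxt
    return cur + coeffs             # [overall coefficient, coarse .. fine]

def top_b(coeffs, B):
    """Indices of the B largest-magnitude coefficients; by Parseval,
    keeping exactly these minimizes the l2 error of the synopsis."""
    return heapq.nlargest(B, range(len(coeffs)), key=lambda i: abs(coeffs[i]))
```

For example, `haar([1, 4, 5, 6])` starts with `[8.0, -3.0, ...]`, and `top_b` picks those two for B = 2.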
101
Surfing Streams
• Notice that once the left half is done, we only need to remember the partial coefficients, one per level
• A stream algorithm is natural

102
Surfing Streams
• Have an auxiliary structure that maintains the top B of a set of numbers

Where else have you seen this ?
Reduce-Merge paradigm. Also used in clustering data streams
103
In summary
• Given a series x₁, x₂, ..., x_i, ..., x_n in increasing order of i, we can find (maintain) the largest B coefficients in O(n) time and O(B + log n) space
• Ok, but only for ‖X−F‖₂
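The auxiliary top-B structure can be sketched with a min-heap (an illustrative helper; the deck's space bound additionally counts the per-level partial Haar coefficients):

```python
import heapq

def largest_b(coeff_stream, B):
    """One pass over a stream of coefficients, keeping the B of largest
    magnitude: a min-heap keyed on |value| uses O(B) entries and
    O(log B) work per item."""
    heap = []  # entries: (|value|, index, value)
    for i, c in enumerate(coeff_stream):
        if len(heap) < B:
            heapq.heappush(heap, (abs(c), i, c))
        elif abs(c) > heap[0][0]:
            heapq.heapreplace(heap, (abs(c), i, c))  # evict current smallest
    return {i: c for _, i, c in heap}
```

The same reduce-merge pattern reappears in clustering data streams, as the slide notes.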

104
Extended Histograms
• What do we do in presence of multiple
dimensions/measures ?
• Use multi-dim transforms
• Use many 1 D transforms
• Strategy Use a Flexible scheme that allows us to
store the index and a bitmap to indicate which
measures are stored.

105
How to solve it ?
• For the basic 1-D problem we need to choose the
largest B coefficients
• Use Parseval to transform error of data to
choosing/not choosing coefficients
• Here we have bags
• We can choose coefficient j with bitmap
• 0100, using H+S space
• 0101, using H+2S space
• 1111, using H+4S space

106
Is 0101 better than 1100 ?
• Subproblem
• Given the fact that we have settled on choosing 2
coefficients for j, which 2 ?
• It is the largest 2 again!
• Basically we can choose a set of indices j and
decide how many coefficients we choose for each j

What does this remind you of ?
107
Knapsack
• Each item j is available in M different versions.
• The cost of the r-th version is H + rS. The profit is an increasing function of r.
• Can choose only one version.

108
• Optimal profit + optimal error = total energy
• The relationship does not hold in approximation.
• 99 + 1 = 100: approximating profit 99 by 95 increases the error from 1 to 5, i.e., by 400
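The multiple-choice knapsack view can be sketched with a standard pseudo-polynomial DP (illustrative names; the deck's algorithm need not be exactly this DP):

```python
def best_profit(items, budget):
    """Multiple-choice knapsack: items[j] lists (cost, profit) versions of
    item j; pick at most one version per item, total cost <= budget,
    maximizing total profit. O(budget * total #versions) time."""
    dp = [0] * (budget + 1)   # dp[c] = best profit with space at most c
    for versions in items:
        new = dp[:]           # copying means skipping item j is allowed
        for cost, profit in versions:
            for c in range(cost, budget + 1):
                cand = dp[c - cost] + profit
                if cand > new[c]:
                    new[c] = cand
        dp = new
    return dp[budget]
```

Because each stage builds `new` only from the previous `dp`, at most one version of each item is ever taken.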

109
Many questions
• What do we do for other error measures ?
• What is the connection with Histograms ?
• Positives Some direction
• Hierarchy of coefficients

110
Non l2 errors
111
Storing coefficients is suboptimal
• Recall the complicated example 1, 4, 5, 6
• We want a 1-term summary and the error is ℓ∞ (max error)
• What do we store ?

What is the final result ? 3.5, 3.5, 3.5, 3.5
What is its transform ? 7, 0, 0, 0
But the set of coefficients available is 8, ?, ?, ?
(example signal: 1, 4, 5, 6)
112
What to do ?
• Search where there is light.
• Restricted problem. Useful if the synopsis has
more than one use.
• Think outside the coefficients
• Probabilistic Rounding
• Search (cleverly) over the whole space

113
The Best Restricted Synopsis
• Maximum error.
• A value (at a leaf) is affected only by its ancestors.
• # of ancestors: log n
• Guess/try all subsets of the ancestor set!
• O(n) choices
• Start bottom up and use a DP to choose the best B coefficients overall.
• Works for a large number of error measures.

114
Analysis
• At each internal node j we need to maintain the table
• Error[j, ancestor set, b] : the contribution to the minimum error by only the subtree rooted at j when using b or fewer coefficients (for the subtree)
• Size of the table: O(n²B)
• Time: O(n²B log B); depends on the measure
• But we can do better.

115
Faster Restricted Synopsis
• A better cut
• The number of coefficients in a subtree is at most its size + 1
• The size of the table storing Err[j, ancestor set, b]
• remains constant as we go up the levels!
• The ancestor set decreases by 1
• b takes twice as many values
• An O(n²) algorithm
• We can also reduce the space to O(n)

116
Thinking beyond the coefficient
• Probabilistic rounding
• Start from the coefficients.
• Randomly round most of them to 0
• A few are rounded to non-zero values
• E.g. set z_i = λ with prob. W(X)_i/λ, and 0 otherwise
• Has promise (correct expectation, variance)
• Two issues:
• The quality is unclear (w.r.t. the original optimization)
• The expected number of non-zero coefficients is B
• The variance is large, so with reasonable probability we use 2B

117
More exploration required
• Interestingly the method (as proposed) eliminates a region of the search space
• We can construct examples where the optimum lies in that region.
• But it is an interesting method and likely (I/we are guessing) preserves more than one error measure simultaneously (multi-criterion optimization)

118
What is the optimum strategy
• Consider the best set of coefficients
• Z = z₁, z₂, ..., z_n
• Nudge them a bit by making them multiples of some Δ
• The extra error is small (and a function of Δ)
• In fact each point sees at most Δ log n
• By reducing Δ we can get a (1+ε) approximation

119
A straightforward idea
• But we still need to find the solution

The ancestor set is unimportant; what is important is their combined effect. Try all possible values (multiples of Δ, but we still need to fix the range)
120
The graphs: the data
121
The graphs: ℓ₁
122
Relative error (small B), relative ℓ₁
123
The times
124
What have we seen so far
• Wavelet representation for ℓ₂ error
• Streaming
• Wavelet representation for non-ℓ₂ error
• Restricted
• Unrestricted
• Stream

125
126
Easy relationships
• A B-bucket (piecewise constant) histogram can be represented by 2B log n Haar wavelet coefficients.
• Why ?
• Only the 2B boundary points matter
• A B-term Haar wavelet synopsis can be represented by a 3B-bucket histogram.
• Why ?
• Each wavelet basis vector creates 3 extra pieces from 1 line

127
Anything else ?
Totally! We can use wavelets to get (1+ε)-approximate V-optimal histograms. In fact the method has advantages
128
Histograms, Take 5
• A B-term histogram can be represented by cB log n wavelet terms.
• What if we choose the largest cB log n wavelet terms ?

129
Need not be good.
• The best histogram has its cB log n wavelets aligned such that the result is B buckets.
• The best cB log n coefficients are all over the place and give us 3cB log n buckets.
• Is all hope lost ?

130
If at first you don't succeed
• Do we repeat the process and also keep the next cB log n coefficients ?
• No.
• But notice that the energy drops.
• Energy: ‖X‖₂² = ‖W(X)‖₂²
• Basic intuition: if there were a lot of coefficients which were large, then the best V-Opt histogram MUST have a large error.
• Why?

131
The robust property
• Look at ‖W(X)−W(H)‖₂ = ‖X−H‖₂
• W(H) has cB log n entries
• If W(X) has cBε⁻² log n large entries ..

132
A strange idea in 1000 words
• Consider the projection to the largest cBε⁻² log n wavelet terms
• Is it ≈ X ?
133
No. But flatten the function
(figure: the flattened function ≈ X)
134
In fact
• If we choose (Bε⁻¹ log n)^O(1), i.e., a large number of coefficients, then the boundary points of the coefficients are (approximately) good boundary points for a VOPT histogram.

135
The take away
• I'm ok, you're ok
• If I'm not ok then you're not ok too.
• An oft-repeated approximation paradigm:
• if there are too many coefficients then my algorithm is doomed, but so is anyone else's, and therefore I am good
• if there are not too many coefficients then we're good.

136
The Extended Wavelets in l2
• We can store the largest coefficients
• If there are too many coefficients which are large then the optimum error is large.
• Otherwise we repeatedly take out coefficients till taking out coefficients does not reduce the error any more.
• DP on the set of coefficients taken out.

137
The Full Monty: update streams
• So far we have been looking at X arriving as x₁, x₂, ...
• What happens when X is specified by a stream of updates,
• i.e., (i, δ_i) = change x_i to x_i + δ_i

138
Sketches Stream Embeddings
• Basically Dimensionality reduction
• To compute the histogram H of signal X
• Compute embedding g(X) to fit the space
• Compute H s.t. g(H) is close to g(X)

139
Linear Embeddings
• The JL Lemma
• A is a random matrix drawn from a Gaussian distribution.
• Too many elements in the matrix!
• Use pseudorandom generators
• p-stable distributions for ℓ_p norms

140
What it achieves
• Computes norms

Increasing a coordinate adds the corresponding column of A to the sketch.
(figure: A · x)
141
Suppose we knew the intervals
• The best histogram minimizes
• ‖X−H‖₂ ≈ ‖AX − AH‖₂

AX is a vector; AH is a linear function of B values.
We have a min squared error program, solvable in ptime; more involved in the 1-norm.
142
Cannot do that
• ‖X−H‖₂ = ‖W(X) − W(H)‖₂ ≈ ‖AW(X) − AW(H)‖₂

Idea: use the linear map to find the large wavelet coefficients (a top-k problem using sketches). Use ideas similar to Take 5 to get the final solution.
143
The return of the pink Fourier
• Assuming x1,x2,,xi, arrive in increasing order
of i, find/maintain the top k Fourier
coefficients.
• Use the strategy
• Assume that there are O(k log n) frequencies and
try to find them.
• If not, we are doomed and so is everyone.
• So we are ok.
• For the 3rd time

144
• Assuming x1,x2,,xi, are specified by a stream
of updates find/maintain the top k values (all
elements with frequency 1/k or more).
• Use the strategy
• Assume that there are O(k log n) elements and try
to find them.
• If not, we are doomed and so is everyone.
• So we are ok.
• Again!
• Use Group testing
• 20 questions, bit chasing: is there a heavy item in the first half ? You can use norms or you can use collisions (hashes).

145
From optimization to learning
• We are trying to learn a pure signal that has
few coefficients

146
The Meaning of Life
• In Summary (high level)
• Approximation is very useful for synopsis
construction (the execution time speedups plus
the end use of synopsis is approximation only)
• Synopses are usually applied on large data.
Asymptotic behaviour matters
• The exact definition of the optimization is
important. How natural is natural
• Few degrees of separation between the synopsis
structures. They are related. They should be. But
then we can use algorithmic techniques back and
forth between them.

147
The Summary (contd.)
• In algorithm design terms
• Most synopsis construction problems involve DP.
Investigating how to change the DP to get
approximation, space efficient algs., is often
useful.
• Search techniques (computation geometry) search
exponents first are useful.
• What you analyze (carefully) is often what you would get asymptotically. The usual techniques we use for pruning etc. can be analyzed and shown to be better.
• Reduce-Merge ⇒ Streaming ?
• The top k in various disguises. Group testing
matters.

148
• Ok. So 1 D histograms have good algos.
• 2 D ?
• NP-Hard.
• Some approximation algorithms known.
• Q In linear time and sublinear space what can we
do ?
• Sketch based results. Long way to go.

149
• So 1 D Haar Wavelets have good algos (non l2).
• 2 D ?
• Unlikely to be NP-Hard
• Quasi-polynomial time (n^{log n}) approximation algorithms known.
• Q In linear time and sublinear space what can we
do ?

150
• So 1 D Haar Wavelets have good algos (non l2).
• Non Haar ? Daubechies. Multifractals.
• Unlikely to be NP-Hard
• Quasi-polynomial time (n^{log n}) approximation algorithms known.
• What can we do ?

151
• All the update stream results are based on ℓ₂ error because of Johnson-Lindenstrauss (and some on ℓ_p for 0 < p ≤ 2)
• What about other errors ?
• Will require new techniques for streaming.

152
Notes (not from the underground)
• The VOPT definition
• Poosala, Haas, Ioannidis, Shekita, SIGMOD 96.
• The VOPT histogram algorithm
• Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
Suel, VLDB 98.
• Take 1
• Guha, Koudas, Shim, STOC, 01.
• Take 2
• Guha, Koudas, ICDE, 02.
• Take 3 4
• Guha, Koudas, Shim, TODS, 05.
• Take 5
• Guha, Indyk, Muthukrishnan, Strauss, ICALP, 02.
• Relative Error Histograms
• Guha, Shim, Woo, VLDB, 04.
• Maximum Error histograms
• Nicole, J. of Parallel Distributed Computing,
1994.
• (Muthukrishnan, Khanna, Skiena, ICALP, 97),
• Guha, Shim, (here) 05.

153
More Notes
• Range Query Histograms
• Muthukrishnan, Strauss, SODA, 03.
• The Full Monty
• Gilbert, Guha, Indyk, Kotidis, Muthukrishnan,
Strauss, STOC, 02.
• Parseval stuff
• Parseval, (margin of notebook ?), 1799.
• Folklore sum of squares and l2
• The mandala
• Surfing Wavelets
• Gilbert, Kotidis, Muthukrishnan,Strauss, VLDB,
01
• Probabilistic Synopsis
• Gibbons, Garofalakais, SIGMOD, 02 (also TODS,
04)
• Maximum error (restricted version)
• Garofalakis, Kumar, PODS, 04.

154
Notes again
• Faster Restricted Synopsis
• Guha, VLDB, 05.
• Unrestricted non l2 error
• Guha, Harb, KDD, 05 new results
• Extended Wavelets
• Deligiannakis, Roussopoulos, SIGMOD 03.
• Guha, Kim, Shim, VLDB 04.
• Streaming Fourier approximation
• Gilbert, Guha, Indyk, Muthukrishnan, Strauss,
STOC, 02
• Learning Fourier Coefficients
• Linial, Kushilevitz, Mansour, JACM, 93
• JL Lemma
• Johnson, Lindenstrauss, 84.
• Sketches
• Alon, Matias, Szegedy, JCSS, 99.
• Feigenbaum Kannan, Vishwanathan, Strauss, FOCS,
99
• Indyk, FOCS, 00

155