
Offline, Stream and Approximation Algorithms for

Synopsis Construction

- Sudipto Guha, University of Pennsylvania
- Kyuseok Shim, Seoul National University

About this Tutorial

- Information is incomplete and could be inaccurate
- Our presentation reflects our understanding, which may be erroneous

Synopses Construction

- "Where is the life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" (T. S. Eliot, from The Rock)
- Routers
- Sensors
- Web
- Astronomy and sciences
- Too much data too little time.

The idea

- To see the world in a grain of sand
- Broad characteristics of the data
- Compression
- Dimensionality Reduction
- Approximate query answering
- Denoising, outlier detection, and a broad array of signal processing tasks

What is a synopsis ?

- Hmm.
- Any shorthand representation
- Clustering!
- SVD!
- In this tutorial we will focus on signal/time series processing

The basic problem

- Formally, given a signal X and a dictionary {ψᵢ}, find a representation F = Σᵢ zᵢ ψᵢ with at most B non-zero zᵢ, minimizing some error which is a fn of X − F
- Note: the above extends to any dimension.

Many issues

- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?

Many issues

- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?

Top K

Many issues

- What is the dictionary ?
- Set of vectors
- Maybe a basis
- Which B terms ?
- What is the error ?
- What are the constraints ?

Haar Wavelets

Also Fourier, Polynomials,

Many issues

- What is the dictionary ?
- Set of vectors
- May not be a basis
- Histograms
- There are n choose 2 vectors
- But since we impose a non-overlapping restriction, we get a unique representation
- Which B terms ?
- What is the error ?
- What are the constraints ?

Many issues

- What is the dictionary ?
- Which B terms ?
- First B ?
- Best B ?
- What is the error ?
- What are the constraints ?

Why should we choose first B ?

- B vs 2B numbers
- Also

Approximation theory

- The discipline of Math associated with approximation of functions
- Same as our problem
- Linear theory (Parseval, 1800; over two centuries old)
- Non-linear theory (Schmidt 1909, Haar 1910)
- Is it relevant ? Yes. However the Math treatment has been extremal, i.e., how does the error change as a function of B, and is that bound tight?
- Note a yes answer does not say anything about: given this signal, is that the best we can do ?

Many issues

- What is the dictionary ?
- Which B terms ?
- What is the error ?
- This controls which B.
- ‖X−F‖₂ is most common, used all over in mathematics
- ‖X−F‖₁ and ‖X−F‖_∞ are useful also
- Weights. Relative error of approximation:
- approximating 1000 by 1010 is not so bad,
- approximating 1 by 11 is not too good an idea.
- What are the constraints ?

Many issues

- What is the dictionary ?
- Which B terms ?
- What is the error ?
- What are the constraints ?
- Input ? A stream, or a stream of updates
- Space, time, precision and range of values (for the zᵢ in the expression F = Σᵢ zᵢ ψᵢ)

In this tutorial

- Histograms and Wavelets
- Will focus on Optimal, Approximation and Streaming algorithms
- How to get one from the other!
- Connections to top-k and Fourier.

I. Histograms.

VOpt Histograms

- Let's start simple
- Given a signal X, find a piecewise constant representation H with at most B pieces minimizing ‖X−H‖₂
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik, Suel, 1998
- Consider one bucket.
- The mean is the best value.
- A natural dynamic programming formulation

An Example Histogram

[Figure: a data distribution and its V-optimal histogram]

Idea: the VOpt Algorithm

- Within a step/bucket the mean is the best.
- Assume that the last bucket is [j+1, n].
- What can we say about the rest, the k−1 other buckets ?
- Total cost: OPT[j, k−1] + SQERR[j+1, n]
- The rest must also be optimal for the range [1, j] with (k−1) buckets! Dynamic Programming !!


Idea: the VOpt Algorithm

- A dynamic programming algorithm was given to construct the V-optimal histogram.
- OPT[n, k] = min_{1 ≤ j < n} { OPT[j, k−1] + SQERR[(j+1)..n] }
- OPT[j, k]: the minimum cost of representing the set of values indexed by 1..j by a histogram with k buckets.
- SQERR[(j+1)..n]: the sum of the squared absolute errors from (j+1) to n.

The DP-based VOpt Algorithm

- for i = 1 to n do
- for k = 1 to B do
- for j = 1 to i−1 do (split point of the (k−1)-bucket histogram and the last bucket)
- OPT[i, k] = min( OPT[i, k], OPT[j, k−1] + SQERR[j+1, i] )
- We need O(Bn) entries for the table OPT
- For each entry OPT[i, k], it takes O(n) time if SQERR[j+1, i] can be computed in O(1) time
- O(Bn) space and O(Bn²) time

[Figure: the B × n table OPT]
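
The pseudocode above can be made concrete. Below is a minimal Python sketch (our own naming, not from the tutorial) of the O(Bn²) dynamic program, with SUM/SQSUM prefix arrays so that each SQERR evaluation is O(1).

```python
# A minimal sketch (our naming) of the O(B n^2) V-optimal
# histogram DP, using prefix sums so SQERR(a, b) is O(1).
def sqerr(P, PP, a, b):
    """Squared error of representing x[a..b] (1-indexed, inclusive)
    by its mean: sum(x_i^2) - (sum(x_i))^2 / length."""
    s = P[b] - P[a - 1]
    ss = PP[b] - PP[a - 1]
    return ss - s * s / (b - a + 1)

def vopt_histogram(x, B):
    n = len(x)
    P = [0.0] * (n + 1)    # P[i]  = x_1 + ... + x_i
    PP = [0.0] * (n + 1)   # PP[i] = x_1^2 + ... + x_i^2
    for i, v in enumerate(x, 1):
        P[i] = P[i - 1] + v
        PP[i] = PP[i - 1] + v * v
    INF = float("inf")
    # OPT[i][k] = min error of covering x[1..i] with k buckets
    OPT = [[INF] * (B + 1) for _ in range(n + 1)]
    OPT[0][0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, B + 1):
            for j in range(k - 1, i):   # last bucket is x[j+1..i]
                cand = OPT[j][k - 1] + sqerr(P, PP, j + 1, i)
                if cand < OPT[i][k]:
                    OPT[i][k] = cand
    return OPT[n][B]
```

The sketch returns only the optimal error; keeping the argmin split points would recover the buckets themselves.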

Computation of Sum of Squared Absolute Error in

O(1) time

- Example: sum(2, 3) = x₂ + x₃ = SUM[3] − SUM[1] = 12 − 2 = 10

Computation of Sum of Squared Absolute Error in

O(1) time

- Let SUM[i] = Σ_{j ≤ i} x_j and SQSUM[i] = Σ_{j ≤ i} x_j². Then the best (mean) value for a bucket [a, b] is (SUM[b] − SUM[a−1])/(b−a+1). Thus, SQERR(a, b) = SQSUM[b] − SQSUM[a−1] − (SUM[b] − SUM[a−1])²/(b−a+1)

Analysis of VOpt Algorithm

- O(n²B) time, O(nB) space
- The space can be reduced (Wednesday)
- Main Question: the end use of a histogram is to approximate something.
- Why not find an approximately optimal (e.g., (1+ε)) histogram?

If you had to improve something ?

- Via Wavelets, ssq: O(n) time, O(B²/ε²) space
- O(n²B) time, O(nB) space
- (1+ε) streaming: O(nB²/ε) time, O(B²/ε) space
- (1+ε) streaming, ssq: O(n) time, O(B/ε²) space
- O(n²B) time, O(n) space
- (1+ε) streaming: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(B²/ε) space
- Offline: O(n) time, O(nB/ε) space

Take 1

- for i = 1 to n do
- for k = 1 to B do
- for j = 1 to i−1 do (split point for the last bucket)
- OPT1[i, k] = min( OPT1[i, k], OPT1[j, k−1] + SQERR(j+1, i) )
- OPT1[j, k−1] is increasing in j
- SQERR(j+1, i) is decreasing in j
- Question: can we use the monotonicity for searching for the minimum ?


No

- Consider a sequence of positive y₁, y₂, …, yₙ
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- F(i) is monotonically increasing, like OPT[1..j, k−1]
- G(i) is monotonically decreasing, like SQERR(j+1, i)
- Ω(n) time is necessary to find minᵢ F(i) + G(i)
- Open Question: does it extend to Ω(n²) over the entire algorithm ?

What gives ?

- Consider a sequence of positive y₁, y₂, …, yₙ
- F(i) = Σ_{j ≤ i} y_j and G(i) = F(n) − F(i−1)
- Thus, F(i) + G(i) = F(n) + y_i
- Any i gives a 2-approximation to minᵢ F(i) + G(i):
- F(i) + G(i) = F(n) + y_i ≤ 2 F(n)
- minᵢ F(i) + G(i) is at least F(n)

Round 1

- Use a histogram to approximate the fn
- Bootstrap!
- Approximate the increasing fn in powers of (1+δ)
- The right end point is a (1+δ) approximation of the left end point

What does that do ?

- Consider evaluating the fn at the two endpoints
- Proof by picture: the gap h is the same at both endpoints by construction; the claim follows by monotonicity.

Therefore

- The right-hand point is a (1+δ) approximation!
- Holds for any point in between:
- OPT(x) + SQERR(x+1) ≥ OPT(a) + SQERR(b)
- ≥ OPT(b)/(1+δ) + SQERR(b)
- ≥ (OPT(b) + SQERR(b))/(1+δ)
- Are we done ?
- Not quite yet.
- What happens for B > 2 ? We do not compute OPT[i, b] exactly !!

[Figure: the SQERR and OPT curves over the interval [a, b]]

Zen and the art of histograms

- Approximate the increasing fn in powers of (1+δ)
- The right end point is a (1+δ) approximation
- Prove by induction that the error is (1+δ)^B
- This tells us what δ should be (small); in fact if we set δ = ε/(2B) then (1+δ)^B ≤ 1+ε

Complexity analysis

- # of intervals p ≤ (B/ε) log n
- Why ?
- c(1+δ)^(p−1) ≤ nR² and δ = ε/(2B)
- R is the largest number in the data
- Assume R is polynomially bounded in n
- Running time: nB · (B/ε) log n
- Why are we approximating the increasing function ? Why not the decreasing one ?
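
The bookkeeping behind "powers of (1+δ)" can be sketched directly (function name is ours): keep an index only when a non-decreasing positive function has grown by more than a (1+δ) factor, so at most log_{1+δ}(max/min) indices are kept.

```python
# A sketch of the (1+delta)-grid bookkeeping: record an index only
# when the (non-decreasing, positive) function has grown by more
# than a (1+delta) factor since the last recorded index.
def breakpoints(values, delta):
    kept = [0]
    for i in range(1, len(values)):
        if values[i] > (1 + delta) * values[kept[-1]]:
            kept.append(i)
    return kept

# breakpoints([1, 1.1, 1.2, 2, 2.1, 4, 8, 8.2], 0.5) -> [0, 3, 5, 6]
```

Every unrecorded index sits between two recorded ones whose values differ by at most a (1+δ) factor, which is exactly the approximation guarantee used above.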

The first streaming model

- The signal X is specified by xᵢ arriving in increasing order of i
- Not the most general model
- But extremely useful for modeling time series data

Streaming

- At boundary b we use Σ₁ᵇ xᵢ and Σ₁ᵇ xᵢ²; at boundary a we need to store Σ₁ᵃ xᵢ and Σ₁ᵃ xᵢ²
- Required space is (B²/ε) log n

VOpt Construction: O(Bn²)

- Jagadish et al., VLDB 1998
- OPT(i, k) = min_{1 ≤ j < i} { OPT(j, k−1) + SQERR(j+1, i) }

[Figure: the length-n arrays OPT[j, k] and OPT[j, k−1]]

AHIST-S: (1+ε) Approximation

- AOPT[i, k] = min over interval boundaries b_{j,p} of AOPT[b_{j,p}, k−1] + SQERR(b_{j,p}+1, i)
- O(B²ε⁻¹ n log n) time and O(B²ε⁻¹ log n) space

[Figure: the approximate tables AOPT[j, k] and AOPT[j, k−1], with interval boundaries at powers of (1+δ)]

The overall idea

The natural DP table

The approximate table

Do εs talk to us ?

- DJIA data from 1901-1993

[Figure: execution time vs. B]

Take 2 GK02

- Sliding window streams
- Potentially infinite data; interested in the last n only
- Q: suppose we constructed a histogram for 1..n and now want it for 2..(n+1)
- The previous idea is dead on arrival.
- Consider 100, 1, 2, 3, 4, 5, 7, 8, …

Formal problem

- Maintain a data structure
- Given an interval [a, b], construct a B-bucket histogram for [a, b]
- Compute on the fly
- Generalizes the window!
- Generalizes VOpt when a = 1, b = n

Reconsider the take 1

- We are evaluating
- Left to right, i.e.,

But we are still evaluating this guy !

A brave new world

- Assume an O(n)-size buffer holds the xᵢ values
- The previous algorithm was
- Several issues:
- which values are necessary and sufficient ?
- we are not evaluating all values; what induction ?

A trickier proof

[Figure: interval endpoints a–g used in the proof]

GK02 Enhanced (1e) Approximation

- Lazy evaluation using binary search
- O(B³ε⁻² log³ n) time and O(n) space
- Pre-processing takes O(n) time (the SUM and SQSUM arrays)

[Figure: the boundary (1+δ)ᵃ ≤ z, (1+δ)ᵃ < z+1 in the tables AOPT[j, k] and AOPT[j, k−1]]

GK02 Enhanced (1e) Approximation

- Creates all B interval lists at once
- The values of the necessary AOPT[j, k] are computed recursively to find the intervals [a_{j,p}, b_{j,p}], where b_{j,p} is the largest z s.t.
- AOPT[z, k] ≤ (1+ε) AOPT[a_{j,p}, k]
- AOPT[z+1, k] > (1+ε) AOPT[a_{j,p}, k]
- Note that AOPT increases as z increases
- Thus, we can use binary search to find z
- The O(n)-space SUM and SQSUM arrays need to be maintained to allow the computation of SQERR(j+1, i) in O(1) time
- O(nB³ε⁻² log³ n) time and O(n) space

Take 2 summary

- O(n) space and O(nB³ε⁻² log² n) time
- Is that the best ? Obviously no.

Take 3: AHIST-L-Δ

- Suppose we knew ρ ≤ OPT ≤ 2ρ; then
- instead of powers of (1+ε/B), use additive steps of Δ = ερ/(2B); then
- the time is O(B³ε⁻² log n)
- To get ρ:
- a 2-approximation: O(1)
- a binary search: O(log n)
- Thus, O(B³ log n · log n)
- Overall O(nB³(ε⁻² + log n) log n) time and O(nB²/ε) space

Take 4: AHIST-B

- Consider the take 3 algorithm.
- How to stream it ?

[Figure: on the new part and overall, merging summaries over blocks of size M; tables for k and k−1 buckets]

Not done yet

- First find an O(1) approximation, then proceed back and refine

The running space-time

- Time: B × (# insertions) × (log M)(log λ), where λ = O(Bε⁻¹ log n) is the length of a list
- Space
- Who cares and why ?

Asymptotics

- For fixed B and ε, we can compute a (1+ε) piecewise constant representation in
- O(n log log n) time and O(log n) space, or
- O(n) time and O(log n log log n) space.
- Extends to degree-d polynomials; space increases by O(d) and time by a factor of O(d³)

Our friendly ε: Running time

[Figure: execution time vs. B]

Our friendly ε: Error

[Figure: (Error − VOPT)/VOPT vs. B]

What you analyze is what you get

[Figure: execution time vs. n]

Questions ?

The status for VOPT

- Saves space across all algorithms, except algorithms which extend to general error measures over streams

For general error measure, IF

- The error of a bucket depends only on the values in the bucket.
- The overall error function is the sum of the errors in the buckets.
- The data can be processed in O(T) time per item such that in O(Q) time we can find the error of a bucket, storing O(P) info.
- The error (of a bucket) is a monotonic function of the interval.
- The value of the maximum and the minimum nonzero error is polynomially bounded in n.

Then

- Optimum histogram in O(nT + n²(B+Q)) time and O(n(P+B)) space
- (1+ε)-approximation in:
- O(nT + nQB²ε⁻¹ log n) time and O(PB²ε⁻¹ log n) space,
- O(nT + QB³(log n + ε⁻²) log n) time and O(nP) space,
- O(nT) time, and space O(PB²ε⁻¹ log n + (QB/T)·Bε⁻¹ log²(Bε⁻¹ log n) · log n log log n)

Splines and piecewise polynomials

- Instead of
- If we wanted
- Or maybe

The overall idea

- If we want to represent x_{a+1}, …, x_b by p₀ + p₁(x − x_a) + p₂(x − x_a)² + …
- The solution is as above
- We need O(d) times more space (than before) and need to solve the linear system. This means an increase by a factor of O(d³) in time.

Another useful example Relative error

- Issue with global measures: estimating 10 by 20 and 1000 by 1010 has the same effect
- The above is ok if we are querying for 1000 a thousand times and for 10 ten times (point queries and the VOPT measure)
- But consider approximating a time series. We may be interested in per-point guarantees.

Sum of Squared Relative Error for a Bucket

- The relative error of a bucket (s_r, e_r, x_r) is a quadratic in x_r: A·x_r² − 2B·x_r + C
- Since A > 0, it is minimized when x_r = B/A
- The minimum value is C − B²/A
- If the aggregated sums A, B and C are stored, ERRSQ(i, j) can be computed in O(1) time
- The optimal histogram can be constructed in O(Bn²) time; approximation algorithms follow

Maximum Error and the l1 metric

Maximum Error Histograms

- A bucket (s_r, e_r, x_r) holds the numbers x₁, x₂, …, xₙ s.t.
- s_r: starting position
- e_r: ending position
- x_r: representative value
- The maximum error is maxᵢ |xᵢ − x_r|
- The maximum relative error is maxᵢ |xᵢ − x_r| / max(|xᵢ|, c)

Maximum Error of a bucket

- Given numbers x₁, x₂, …, xₙ
- the maximum error is ErrM = min_{x_r} maxᵢ |xᵢ − x_r|
- What is the best x_r ?
- (x_min + x_max)/2
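
As a tiny sketch of the last bullet (helper name is ours): under the max-error measure the best single representative is the midrange, and the resulting error is half the range.

```python
# For the max-error (l_infinity) measure, the best representative
# of a bucket is the midrange; the error is half the range.
def best_max_error_rep(xs):
    lo, hi = min(xs), max(xs)
    return (lo + hi) / 2, (hi - lo) / 2

# best_max_error_rep([1, 4, 5, 6]) -> (3.5, 2.5)
```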

Maximum Relative Error of a set

- Given a set of numbers x₁, x₂, …, xₙ
- max: the maximum of x₁, x₂, …, xₙ
- min: the minimum of x₁, x₂, …, xₙ
- c: a sanitary constant
- The optimum is some function of c, max, min
- E.g., when c ≤ min ≤ max the error is (max − min)/(max + min)
- The optimal maximum relative error for a bucket can be computed in O(1) time

The Naïve Optimal Algorithm

- for i = 1 to n do
- OPTM[i, 1] = ERRM(1, i)
- for k = 2 to B do
- max = −∞; min = +∞; OPTM[i, k] = ∞
- for j = i−1 down to 1 do
- if (max < x_{j+1}) max = x_{j+1}
- if (min > x_{j+1}) min = x_{j+1}
- OPTM[i, k] = min( OPTM[i, k], max( OPTM[j, k−1], ERRM(j+1, i) ) )
- ERRM(j+1, i) can be obtained in O(1) time from the running max and min
- O(Bn) space and O(Bn²) time optimal algorithm
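
A runnable sketch of the naive DP (our naming): scanning j downwards maintains the running min/max of the candidate last bucket, so ERRM(j+1, i) = (max − min)/2 costs O(1).

```python
# Naive O(B n^2) DP for max-error histograms.  The inner loop
# walks j downwards, growing the last bucket x[j+1..i] leftwards
# while keeping its running min/max.
INF = float("inf")

def maxerr_histogram(x, B):
    n = len(x)
    # OPTM[i][k]: min over k-bucket histograms of x[1..i] of the
    # maximum absolute error; OPTM[0][0] = 0 seeds the recursion
    OPTM = [[INF] * (B + 1) for _ in range(n + 1)]
    OPTM[0][0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, B + 1):
            hi, lo = -INF, INF
            for j in range(i - 1, -1, -1):   # last bucket: x[j+1..i]
                hi = max(hi, x[j])           # x[j] is x_{j+1}, 0-indexed
                lo = min(lo, x[j])
                cand = max(OPTM[j][k - 1], (hi - lo) / 2)
                OPTM[i][k] = min(OPTM[i][k], cand)
    return OPTM[n][B]
```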

An Improved Optimal Algorithm

- OPTM[i, k] = minⱼ max( OPTM[j, k−1], ERRM(j+1, i) )
- Observations:
- OPTM[j, k−1] is an increasing function of j
- ERRM(j+1, i) is a decreasing function of j
- To compute minₓ max( F(x), G(x) ) where F(x) and G(x) are non-decreasing and non-increasing functions:
- perform binary search for the value of x such that F(x) ≥ G(x) and F(x−1) < G(x−1)
- the minimum is min( G(x−1), F(x) )

An Improved Optimal Algorithm

- OPTM[i, k] = minⱼ max( OPTM[j, k−1], ERRM(j+1, i) )
- We can improve the innermost loop of the naïve algorithm to O(log n) time.
- However, ERRM(j+1, i) cannot be computed in O(1) time any more
- Using an interval tree, we can compute the min and max values for [j+1, i], i.e. ERRM(j+1, i), in O(log n) time
- Thus, our improved algorithm takes O(Bn log² n) time with O(Bn) space
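
The slides use an interval tree; any balanced range structure gives the same O(log n) min/max query bound. A minimal segment-tree sketch (our own naming):

```python
# Range min/max in O(log n) per query via a segment tree; the
# bucket error is then ERRM(a, b) = (max - min) / 2.
class MinMaxTree:
    def __init__(self, xs):
        self.n = len(xs)
        self.lo = [0] * (4 * self.n)
        self.hi = [0] * (4 * self.n)
        self._build(1, 0, self.n - 1, xs)

    def _build(self, v, l, r, xs):
        if l == r:
            self.lo[v] = self.hi[v] = xs[l]
            return
        m = (l + r) // 2
        self._build(2 * v, l, m, xs)
        self._build(2 * v + 1, m + 1, r, xs)
        self.lo[v] = min(self.lo[2 * v], self.lo[2 * v + 1])
        self.hi[v] = max(self.hi[2 * v], self.hi[2 * v + 1])

    def query(self, a, b):
        """(min, max) of xs[a..b], inclusive, 0-indexed."""
        return self._q(1, 0, self.n - 1, a, b)

    def _q(self, v, l, r, a, b):
        if a <= l and r <= b:
            return self.lo[v], self.hi[v]
        m = (l + r) // 2
        if b <= m:
            return self._q(2 * v, l, m, a, b)
        if a > m:
            return self._q(2 * v + 1, m + 1, r, a, b)
        l1, h1 = self._q(2 * v, l, m, a, b)
        l2, h2 = self._q(2 * v + 1, m + 1, r, a, b)
        return min(l1, l2), max(h1, h2)

t = MinMaxTree([3, 1, 4, 1, 5, 9, 2, 6])
# t.query(2, 5) -> (1, 9), so ERRM over that range is (9 - 1)/2 = 4
```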

An Interval Tree Example

[Figure: an interval tree over [1,8] with children [1,4] and [5,8], down to leaves [1,1]…[8,8]; the steps of decomposing [2,4] via decomposeLeft/decomposeRight]

Consider another solution

- Make the first bucket as large as possible
- i.e., push the boundary right
- E.g. in the figure we can.
- As long as the max and min stay the same
- Why will we have to stop ?

Consider another solution (2)

- In this example we cannot
- But maybe the error comes from a different bucket!
- Here's one idea:
- Given an i, find Err[1, i]
- If i is small, Err[1, i] ≤ OPT
- If i is large, Err[1, i] ≥ OPT
- How ?

How ?

- Assume that given an interval [a, b] we can find the min and max, and therefore Err[a, b]
- With O(n) time and space preprocessing, we can find Err in O(log n) time (interval tree)
- Check(p, q, b, τ):
- if p > q, we are done (success); if b = 0 and p ≤ q, fail
- otherwise,
- find mid s.t. Err[p, mid] ≤ τ and Err[p, mid+1] > τ
- Check(mid+1, q, b−1, τ)
- O(B log² n):
- binary search: log n × log n (to find the min and max for Err)
- Check recurses B times

Now for the original problem

- By binary search, find the largest s such that,
- when τ = Err[1, s] and τ′ = Err[1, s+1],
- Check(1, n, B−1, τ) = false and Check(1, n, B−1, τ′) = true
- Now OPT is τ′, or the best (B−1)-bucket error of [s+1, n]
- A recursive algorithm!
- T(B) = log n · B log² n + T(B−1) ≈ O(B² log³ n) !!

Summary

- In O(n + B² log³ n) time and O(n) space we can find the optimum error.
- What do we do if
- it is a stream, or
- we have less than O(n) space ?
- Approximate, using some of the old ideas

Short break !

- When we return
- Range Query Histograms
- Wavelets
- Optimum synopsis
- Connection to Histograms
- Overall ideas and themes

Range Query Histograms

A more general synopsis structure

- Instead of estimating the value at a point, we are interested in the sum of the values in intervals/ranges.
- Clearly, very useful.
- Clearly we need a new optimization.
- E.g.,

A more difficult problem

- Only special cases solved (satisfactorily)
- Hierarchies:
- prefix ranges: all ranges of the form [1, j] as j varies
- complete binary ranges
- general hierarchies
- Uniform ranges: all ranges

Status Range Query

- Caveat

The uniform case

- Consider a sequence X = x₁, x₂, …, xₙ
- Define the operator:
- π(g)ᵢ = Σ_{j ≤ i} gⱼ, the prefix sum

Unbiased

- Suppose H is a histogram such that F = π(X−H) satisfies Σᵢ Fᵢ = 0
- Or think of Σᵢ Σ_{r ≤ i} (X_r − H_r) = 0
- Claim: the error of using H to answer range queries for X is twice the error of using π(H) to answer point queries about π(X)!

The main idea

- Define Gᵢ = Σ_{r ≤ i} (X_r − H_r) = π(X)ᵢ − π(H)ᵢ
- Now Σᵢ Gᵢ = 0 if H is unbiased
- Pick a RANDOM element u
- Expected Gᵤ = 0
- Pick two random elements u, v
- Expected (Gᵤ − Gᵥ)² = expected error of using H to answer range queries for X
- But that is equal to 2 × Expected Gᵤ²
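
The last two bullets rest on a small identity: if the Gᵢ sum to zero then, over uniformly random u and v, E[(Gᵤ − Gᵥ)²] = 2·E[Gᵤ²], because the cross term vanishes. A numeric check on made-up values:

```python
# If sum(G) = 0, then E[(G_u - G_v)^2] = 2 E[G_u^2] over uniformly
# random u, v: expanding the square, the cross term -2 E[G_u] E[G_v]
# is zero.  Check on arbitrary zero-sum values.
G = [2.0, -1.0, 3.0, -4.0]      # sums to zero
n = len(G)
lhs = sum((G[u] - G[v]) ** 2 for u in range(n) for v in range(n)) / n ** 2
rhs = 2 * sum(g * g for g in G) / n
# lhs == rhs == 15.0
```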

A simple approximation

- What we want is hard
- But we know how to get π(H) close to π(X)

Piecewise linear histograms!

An easy trick

- We can also find
- a buffer of size 1 after each bucket
- Use it as a patch-up
- 2B buckets
- Same error as OPT
- Approximation algorithms try to find the continuous variant

The Synopsis Construction Problem

- Formally, given a signal X and a dictionary {ψᵢ}, find a representation F = Σᵢ zᵢ ψᵢ with at most B non-zero zᵢ, minimizing some error which is a fn of X − F
- In the case of histograms the dictionary was the set of all possible intervals, but we could only choose a non-overlapping set.

The eternal what if

- If the ψᵢ are designed for the data, do we get a better synopsis ?
- Absolutely!
- Consider a sine wave
- Or any smooth fn.
- Why though ?

Representations not piecewise const.

- Electromagnetic signals are sine/cosine waves.
- If we are considering any process which involves electromagnetic signals, this is a great idea.
- These are particularly great for representing periodic functions.
- Often these algorithms are found in DSP (digital signal processing) chips
- A fascinating 300 years of history in Math !

A slight problem

- We will come back to Fourier
- Fourier is suited to smooth natural processes
- If we are talking about signals from man-made processes, clearly they cannot be natural (and are hardly likely to be smooth)
- More seriously: discreteness and burstiness

The Wavelet (frames)

- Inherits properties from both worlds
- The Fourier transform has all frequencies.
- Wavelets consider frequencies that are powers of 2, but the effect of each wave is limited (shifted)

Wavelets

- What to do in a discrete world ?

The Haar Wavelets (1910) !

The Haar Wavelets

- Best energy synopsis amongst all wavelets (we will see more later)
- Great for data with discontinuities.
- A natural extension to discrete spaces:
- 1,−1,0,0,0,0,…, 0,0,1,−1,0,0,…, 0,0,0,0,1,−1,…
- 1,1,−1,−1,0,0,0,0,…, 0,0,0,0,1,1,−1,−1,…

The Haar Synopsis Problem

- Formally, given a signal X and the Haar basis {ψᵢ}, find a representation F = Σᵢ zᵢ ψᵢ with at most B non-zero zᵢ, minimizing some error which is a fn of X − F
- Let's begin with the VOPT error (‖X−F‖₂²)

The Magic of Parseval (no spears)

- The l₂ distance is unchanged by a rotation.
- A set of basis vectors {ψᵢ} defines a rotation iff ⟨ψᵢ, ψⱼ⟩ = δᵢⱼ, i.e., they are orthonormal
- Redefine the basis (scale) s.t. ‖ψᵢ‖₂ = 1
- Let the transform be W
- Then ‖X−F‖₂ = ‖W(X−F)‖₂ = ‖W(X) − W(F)‖₂
- Now W(F) = (z₁, z₂, …, zₙ) and so
- ‖W(X) − W(F)‖₂² = Σᵢ (W(X)ᵢ − zᵢ)²

What did we achieve ?

- Storing the largest coefficients is the best solution.
- Note that the fact zᵢ = W(X)ᵢ is a consequence of the optimization and IS NOT a specification of the problem.
- More on that later.

What is the best algorithm ?

- How to find the largest B coefficients of the set x₁, x₂, … ?
- The cascade algorithm.
- Recall the hierarchical nature.

Cascade algorithm ?

- Given a, b represent them as (a−b) and (a+b)
- Divide by √2 so that the sum of squares is preserved
- Running time O(n)

[Example data: 1, 4, 5, 6]
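
A minimal sketch of the cascade on the slide's own example 1, 4, 5, 6, using the orthonormal (a±b)/√2 convention so that Parseval holds (function name is ours; n is assumed a power of 2):

```python
# Fast Haar (cascade) transform: each pass pairs up values into
# (a+b)/sqrt(2) averages and (a-b)/sqrt(2) differences; O(n) total.
from math import sqrt

def haar(x):
    x = list(map(float, x))
    out = []
    while len(x) > 1:
        avg = [(a + b) / sqrt(2) for a, b in zip(x[0::2], x[1::2])]
        dif = [(a - b) / sqrt(2) for a, b in zip(x[0::2], x[1::2])]
        out = dif + out            # prepend, so coarser details lead
        x = avg
    return x + out                 # overall (scaled) average first

w = haar([1, 4, 5, 6])
# w[0] = 8, and the sum of squares is preserved: 1 + 16 + 25 + 36 = 78
```

Note that the flat sequence 3.5, 3.5, 3.5, 3.5 transforms to 7, 0, 0, 0, matching the later l1 example.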

Surfing Streams

- Notice that once the left half is done, we only need to remember the partial sums that straddle the boundary
- A stream algorithm is natural

Surfing Streams

- Keep an auxiliary structure that maintains the top B of a set of numbers
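
The auxiliary top-B structure can be a size-B min-heap keyed on absolute coefficient value (a sketch with our own naming):

```python
# Maintain the B largest-magnitude coefficients of a stream with a
# min-heap of (|c|, index, c): the smallest survivor sits on top and
# is replaced whenever a bigger coefficient arrives.
import heapq

def top_b_stream(coeffs, B):
    heap = []
    for idx, c in enumerate(coeffs):
        if len(heap) < B:
            heapq.heappush(heap, (abs(c), idx, c))
        elif abs(c) > heap[0][0]:
            heapq.heapreplace(heap, (abs(c), idx, c))
    return sorted((idx, c) for _, idx, c in heap)

# top_b_stream([8, -3, -2.1, -0.7], 2) -> [(0, 8), (1, -3)]
```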

Where else have you seen this ?

The Reduce-Merge paradigm. Also used in clustering data streams.

In summary

- Given a series x₁, x₂, …, xᵢ, …, xₙ in increasing order of i, we can find (maintain) the largest B coefficients in O(n) time and O(B + log n) space
- Ok, but only for ‖X−F‖₂

Extended Histograms

- What do we do in the presence of multiple dimensions/measures ?
- Use multi-dimensional transforms
- Use many 1-D transforms
- Strategy: use a flexible scheme that allows us to store the index and a bitmap indicating which measures are stored.

How to solve it ?

- For the basic 1-D problem we need to choose the largest B coefficients
- Use Parseval to transform the error on the data into choosing/not choosing coefficients
- Here we have bags
- We can choose coefficients for index j with a bitmap (H the index overhead, S the space per stored value):
- 0100 using H + S space
- 0101 using H + 2S space
- 1111 using H + 4S space
Is 0101 better than 1100 ?

- Subproblem:
- given the fact that we have settled on choosing 2 coefficients for j, which 2 ?
- It is the largest 2 again!
- Basically we can choose a set of indices j and decide how many coefficients we choose for each j
What does this remind you of ?

Knapsack

- Each item j is available in M different versions.
- The cost of the rth version is H + rS. The profit is an increasing function of r.
- We can choose only one version.

Strange roadbumps

- Optimal profit + optimal error = total energy
- The relationship does not hold under approximation.
- 99 + 1 = 100. Approximating the profit 99 by 95 increases the error from 1 to 5, i.e., by 400%.
- We will return to this.

Many questions

- What do we do for other error measures ?
- What is the connection with Histograms ?
- Positives: some directions:
- the cascade algorithm
- the hierarchy of coefficients

Non l2 errors

Storing coefficients is suboptimal

- Recall the example 1, 4, 5, 6
- We want a 1-term summary and the error is l1
- What do we store ?

What is the final Result ?

- On the data 1, 4, 5, 6 the best l1 one-term answer is 3.5, 3.5, 3.5, 3.5
- What is its transform ? 7, 0, 0, 0
- But the set of coefficients available from W(X) starts with 8, not 7

What to do ?

- Search where there is light.
- The restricted problem. Useful if the synopsis has more than one use.
- Think outside the coefficients:
- probabilistic rounding
- search (cleverly) over the whole space

The Best Restricted Synopsis

- Maximum Error.
- A value (at a leaf) is affected by only its ancestors.
- # of ancestors = log n
- Guess/try all of the set!
- O(n) choices
- Start bottom-up and use a DP to choose the best B coefficients overall.
- Works for a large number of error measures.

Analysis

- At each internal node j we need to maintain the table
- Error[j, Ancestor set, b]: the contribution to the minimum error by only the subtree rooted at j when using b or fewer coefficients (for the subtree)
- Size of the table: O(n²B)
- Time: O(n²B log B); depends on the measure
- But we can do better.

Faster Restricted Synopsis

- A better cut
- The number of coefficients in a subtree is at most its size + 1
- The size of the table storing Err[j, Ancestor Set, b]
- remains constant as we go up the levels!
- The ancestor set decreases by 1
- b takes twice as many values
- An O(n²) algorithm
- We can also reduce the space to O(n)

Thinking beyond the coefficient

- Probabilistic rounding
- Start from the coefficients.
- Randomly round most of them to 0
- A few are rounded to non-zero values
- E.g. set zᵢ = λ with probability W(X)ᵢ/λ, and 0 otherwise (so E[zᵢ] = W(X)ᵢ)
- Has promise (correct expectation, variance)
- Two issues:
- the quality is unclear (wrt the original optimization)
- the Expected number of non-zero coefficients is B;
- the variance is large, so with reasonable probability we get 2B

More exploration required

- Interestingly, the method (as proposed) eliminates a region of the search space
- We can construct examples where the optimum lies in that region.
- But it is an interesting method and likely (we are guessing) preserves more than one error measure simultaneously (multi-criterion optimization)

What is the optimum strategy

- Consider the best set of coefficients Z = z₁, z₂, …, zₙ
- Nudge them a bit by making them multiples of some δ
- The extra error is small (and a fn of δ)
- In fact each point sees at most δ log n
- By reducing δ we can get a (1+ε) approx

A straightforward idea

- But we still need to find the solution
- The ancestor set is unimportant; what matters is its combined effect. Try all possible values (multiples of δ, but we still need to fix the range)

The graphs

[Figures: the data; l1 error; relative error (small B); relative l1; running times]

What have we seen so far

- Wavelet representation for l₂ error
- Streaming
- Wavelet representation for non l_2 error
- Restricted
- Unrestricted
- Stream

A return to histograms

Easy relationships

- A B-bucket (piecewise constant) histogram can be represented by 2B log n Haar wavelet coefficients.
- Why? Only the 2B boundary points matter
- A B-term Haar wavelet synopsis can be represented by a 3B-bucket histogram.
- Why? Each wavelet basis vector creates 3 extra pieces from 1 line

Anything else ?

Totally! We can use wavelets to get (1+ε)-approximate V-optimal histograms. In fact the method has advantages

Histograms, Take 5

- A B-term histogram can be represented by cB log n wavelet terms.
- What if we choose the largest cB log n wavelet terms ?

Need not be good.

- The best histogram has its cB log n wavelets aligned such that the result is B buckets.
- The best cB log n coefficients are all over the place and give us 3cB log n buckets.
- Is all hope lost ?

If at first you don't succeed

- Do we repeat the process and also keep the next cB log n coefficients ? No.
- But notice that the energy drops.
- Energy: ‖X‖₂ = ‖W(X)‖₂
- Basic intuition: if there were a lot of large coefficients, then the best V-Opt histogram MUST have a large error.
- Why?

The robust property

- Look at ‖W(X) − W(H)‖₂ = ‖X − H‖₂
- W(H) has at most cB log n entries
- If W(X) has cBε⁻² log n large entries …

A strange idea in 1000 words

- Consider the projection onto the largest cBε⁻² log n wavelet terms
- Is it ≈ X ? No. But the flattened fn is ≈ X

In fact

- If we choose a (Bε⁻¹ log n)^O(1), i.e., large, number of coefficients, then the boundary points of the coefficients are (approximately) good boundary points for a VOPT histogram.

The take away

- I'm ok, you're ok
- If I'm not ok then you're not ok too.
- An oft-repeated approximation paradigm:
- if there are too many coefficients then my algorithm is doomed, but so is everyone else's, and therefore I am good;
- if there are not too many coefficients then we're good.

The Extended Wavelets in l2

- We can store the largest coefficients
- If there are too many large coefficients then the optimum error is large.
- Otherwise we repeatedly take out coefficients until taking out more will not reduce the error any further.
- DP on the set of coefficients taken out.

The Full Monty update streams

- So far we have been looking at X arriving as x₁, x₂, …
- What happens when X is specified by a stream of updates ?
- i.e., (i, dᵢ): change xᵢ to xᵢ + dᵢ

Sketches Stream Embeddings

- Basically: dimensionality reduction
- To compute the histogram H of a signal X:
- compute an embedding g(X) that fits in the available space
- compute H s.t. g(H) is close to g(X)

Linear Embeddings

- The JL Lemma
- A is a random matrix drawn from a Gaussian distribution.
- Too many elements in the matrix!
- Use pseudorandom generators
- p-stable distributions for lp norms
What it achieves

- Computes the norm
- Increasing coordinate i is adding column i of A to the sketch A·x
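
A toy sketch of the picture above (the dimensions and seed are illustrative assumptions): a k×n Gaussian matrix A approximately preserves the 2-norm, and the update (i, d) simply adds d times column i of A to the sketch A·x.

```python
# A k x n Gaussian sketch: |A.x|_2 concentrates around |x|_2, and
# the sketch is maintained under stream updates (i, d).
import math
import random

def make_sketch_matrix(k, n, seed=0):
    rng = random.Random(seed)
    c = 1.0 / math.sqrt(k)                 # scale so E[|Ax|^2] = |x|^2
    return [[rng.gauss(0, 1) * c for _ in range(n)] for _ in range(k)]

def apply_update(sketch, A, i, d):
    # stream update (i, d): x_i <- x_i + d
    for r in range(len(sketch)):
        sketch[r] += d * A[r][i]

n, k = 64, 400
A = make_sketch_matrix(k, n)
sketch = [0.0] * k
x = [0.0] * n                              # kept only to verify
for i, d in [(3, 5.0), (10, -2.0), (3, 1.0), (40, 4.0)]:
    x[i] += d
    apply_update(sketch, A, i, d)

true_norm = math.sqrt(sum(v * v for v in x))      # sqrt(56)
est_norm = math.sqrt(sum(v * v for v in sketch))  # close to true_norm
```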

Suppose we knew the intervals

- The best histogram minimizes ‖X−H‖₂ ≈ ‖AX − AH‖₂
- AX is a vector; AH is a linear function of the B values
- We get a min squared error program, solvable in ptime; more involved in the 1-norm.

Cannot do that

- ‖X−H‖₂ = ‖W(X) − W(H)‖₂ ≈ ‖AW(X) − AW(H)‖₂
- Idea: use the linear map to find the small number of large wavelet coefficients (a top-k problem using sketches), then use ideas similar to Take 5 to get the final solution.

The return of the pink Fourier

- Assuming x₁, x₂, …, xᵢ, … arrive in increasing order of i, find/maintain the top k Fourier coefficients.
- Use the strategy:
- assume that there are O(k log n) frequencies and try to find them;
- if not, we are doomed and so is everyone.
- So we are ok.
- For the 3rd time

What about top k

- Assuming x₁, x₂, …, xᵢ, … are specified by a stream of updates, find/maintain the top k values (all elements with frequency 1/k or more).
- Use the strategy:
- assume that there are O(k log n) elements and try to find them;
- if not, we are doomed and so is everyone.
- So we are ok.
- Again!
- Use group testing
- 20 questions, bit chasing: "is a heavy item in the first half ?" You can use norms or you can use collisions (hashes).
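
A sketch of the "20 questions" bit-chasing core for a single dominant item under updates (names are ours; real schemes first hash items into groups so each group has one heavy item): one counter per bit position plus a total, and each bit of the heavy item wins a majority vote.

```python
# Bit chasing for one dominant item: counter b holds the total
# count of items whose bit b is 1; if one item carries a majority
# of the mass, every one of its bits is recovered by majority.
def make_counters(bits):
    return {"total": 0, "bit": [0] * bits}

def apply(c, item, delta):
    c["total"] += delta
    for b in range(len(c["bit"])):
        if (item >> b) & 1:
            c["bit"][b] += delta

def recover(c):
    # bit b of the dominant item is 1 iff most of the mass has it set
    return sum(1 << b for b in range(len(c["bit"]))
               if 2 * c["bit"][b] > c["total"])

c = make_counters(8)
for item, d in [(37, 10), (5, 1), (99, 2), (37, 5), (5, -1)]:
    apply(c, item, d)
# item 37 carries 15 of the 17 total; recover(c) -> 37
```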

From optimization to learning

- We are trying to learn a pure signal that has few coefficients
- A general paradigm.

The Meaning of Life

- In summary (high level):
- Approximation is very useful for synopsis construction (the execution-time speedups, plus the end use of a synopsis is approximation anyway)
- Synopses are usually applied to large data. Asymptotic behaviour matters.
- The exact definition of the optimization is important. How natural is natural?
- There are few degrees of separation between the synopsis structures. They are related. They should be. But then we can use algorithmic techniques back and forth between them.

The Summary (contd.)

- In algorithm-design terms:
- Most synopsis construction problems involve DP. Investigating how to change the DP to get approximate, space-efficient algorithms is often useful.
- Search techniques (computational geometry, searching exponents first) are useful.
- What you analyze (carefully) is often what you get asymptotically. The usual techniques we use for pruning etc. can be analyzed and shown to be better.
- Reduce-Merge ⇒ Streaming ?
- The top k in various disguises. Group testing matters.

What lies ahead

- Ok. So 1-D histograms have good algos.
- 2-D ?
- NP-Hard.
- Some approximation algorithms known.
- Q: in linear time and sublinear space, what can we do ?
- Sketch-based results. A long way to go.

What lies ahead

- So 1-D Haar wavelets have good algos (non-l₂).
- 2-D ?
- Unlikely to be NP-Hard
- Quasi-polynomial time n^O(log n) approximation algorithms known.
- Q: in linear time and sublinear space, what can we do ?

What lies ahead

- So 1-D Haar wavelets have good algos (non-l₂).
- Non-Haar ? Daubechies. Multifractals.
- Unlikely to be NP-Hard
- Quasi-polynomial time n^O(log n) approximation algorithms known.
- What can we do ?

What lies ahead

- All the update stream results are based on l₂ error, because of Johnson-Lindenstrauss (and some on lp for 0 < p ≤ 2)
- What about other errors ?
- They will require new techniques for streaming.

Notes (not from the underground)

- The VOPT definition
- Poosala, Haas, Ioannidis, Shekita, SIGMOD 96.
- The VOPT histogram algorithm
- Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik, Suel, VLDB 98.
- Guha, Koudas, Shim, STOC, 01.
- Take 2
- Guha, Koudas, ICDE, 02.
- Takes 3 and 4
- Guha, Koudas, Shim, TODS, 05.
- Take 5
- Guha, Indyk, Muthukrishnan, Strauss, ICALP, 02.
- Relative Error Histograms
- Guha, Shim, Woo, VLDB, 04.
- Maximum Error histograms
- Nicole, J. of Parallel and Distributed Computing, 1994.
- Guha, Shim, (here) 05.

More Notes

- Range Query Histograms
- Muthukrishnan, Strauss, SODA, 03.
- The Full Monty
- Gilbert, Guha, Indyk, Kotidis, Muthukrishnan, Strauss, STOC, 02.
- Parseval, (margin of notebook ?), 1799.
- Folklore sum of squares and l2
- The mandala
- Surfing Wavelets
- Gilbert, Kotidis, Muthukrishnan, Strauss, VLDB, 01.
- Probabilistic Synopsis
- Gibbons, Garofalakis, SIGMOD, 02 (also TODS, 04).
- Garofalakis, Kumar, PODS, 04.

Notes again

- Faster Restricted Synopsis
- Guha, VLDB, 05.
- Unrestricted non l2 error
- Guha, Harb, KDD, 05 new results
- Extended Wavelets
- Deligiannakis, Roussopoulos, SIGMOD 03.
- Guha, Kim, Shim, VLDB 04.
- Streaming Fourier approximation
- Gilbert, Guha, Indyk, Muthukrishnan, Strauss, STOC, 02.
- Linial, Kushilevitz, Mansour, JACM, 93
- JL Lemma
- Johnson, Lindenstrauss, 84.
- Sketches
- Alon, Matias, Szegedy, JCSS, 99.
- Feigenbaum, Kannan, Vishwanathan, Strauss, FOCS, 99.
- Indyk, FOCS, 00.

Roads not taken

- (but are relevant to synopsis)
- Property Testing
- Weighted sampling and SVD
- Median Finding
- Sampling based estimators