1
Offline, Stream and Approximation Algorithms for
Synopsis Construction
  • Sudipto Guha University of Pennsylvania
  • Kyuseok Shim Seoul National University

2
About this Tutorial
  • Information is incomplete and could be inaccurate
  • Our presentation reflects our understanding which
    may be erroneous

3
Synopses Construction
  • Where is the life we have lost in living?
  • Where is the wisdom we have lost in knowledge?
  • Where is the knowledge we have lost in
    information?
  • T. S. Eliot, from The Rock.
  • Routers
  • Sensors
  • Web
  • Astronomy and sciences
  • Too much data too little time.

4
The idea
  • To see the world in a grain of sand…
  • Broad characteristics of the data
  • Compression
  • Dimensionality Reduction
  • Approximate query answering
  • Denoising, Outlier Detection and a broad array of
    signal processing

5
What is a synopsis ?
  • Hmm.
  • Any shorthand representation
  • Clustering!
  • SVD!
  • In this tutorial we will focus on signal/time
    series processing

6
The basic problem
  • Formally, given a signal $X$ and a dictionary $\{\psi_i\}$,
    find a representation $F = \sum_i z_i \psi_i$ with at most $B$
    non-zero $z_i$ minimizing some error which is a function
    of $X - F$
  • Note, the above extends to any dim.

7
Many issues
  • What is the dictionary ?
  • Which B terms ?
  • What is the error ?
  • What are the constraints ?

8
Many issues
  • What is the dictionary ?
  • Set of vectors
  • Maybe a basis
  • Which B terms ?
  • What is the error ?
  • What are the constraints ?

Top K
9
Many issues
  • What is the dictionary ?
  • Set of vectors
  • Maybe a basis
  • Which B terms ?
  • What is the error ?
  • What are the constraints ?

Haar Wavelets
Also Fourier, Polynomials,…
10
Many issues
  • What is the dictionary ?
  • Set of vectors
  • May not be a basis
  • Histograms
  • There are n choose 2 vectors
  • But since we impose a non-overlapping restriction
    we get a unique representation.
  • Which B terms ?
  • What is the error ?
  • What are the constraints ?

11
Many issues
  • What is the dictionary ?
  • Which B terms ?
  • First B ?
  • Best B ?
  • What is the error ?
  • What are the constraints ?

Why should we choose first B ?
  • B vs 2B numbers
  • Also …

12
Approximation theory
  • Discipline of Math associated with approximation
    of functions.
  • Same as our problem
  • Linear theory (Parseval, 1800; over two
    centuries old)
  • Non-Linear theory (Schmidt 1909, Haar 1910)
  • Is it relevant? Yes. However, the mathematical treatment has
    been extremal, i.e., how does the error change
    as a function of B, and is that bound tight?
  • Note that a yes answer says nothing about the question:
    given this signal, is that the best we can do?

13
Many issues
  • What is the dictionary ?
  • Which B terms ?
  • What is the error ?
  • This controls which B.
  • $\|X-F\|_2$ is the most common, used all over
    mathematics
  • $\|X-F\|_1$ and $\|X-F\|_\infty$ are also useful
  • Weights. Relative error of approximation
  • Approximating 1000 by 1010 is not so bad.
  • Approximating 1 by 11 is not a good idea.
  • What are the constraints ?

14
Many issues
  • What is the dictionary ?
  • Which B terms ?
  • What is the error ?
  • What are the constraints ?
  • Input ? Stream, stream of updates …
  • Space, time, precision and range of values (for
    $z_i$ in the expression $F = \sum_i z_i \psi_i$)

15
In this tutorial
  • Histograms Wavelets
  • Will focus on Optimal, Approximation and
    Streaming algorithms
  • How to get one from the other!
  • Connections to top K and Fourier.

16
I. Histograms.
17
VOpt Histograms
  • Let's start simple
  • Given a signal $X$, find a piecewise constant
    representation $H$ with at most $B$ pieces minimizing
    $\|X-H\|_2$
  • Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
    Suel, 1998
  • Consider one bucket.
  • The mean is the best value.
  • A natural Dynamic programming formulation

18
An Example Histogram
Data Distribution
V-Optimal Histogram
19
Idea: VOpt Algorithm
  • Within a step/bucket, the mean is the best value.
  • Assume that the last bucket is [j+1, n].
  • What can we say about the remaining k-1 buckets?

OPT[j, k-1]
SQERR[j+1, n]
Last bucket
They must also be optimal for the range [1, j] with
(k-1) buckets! Dynamic Programming!!
22
Idea: VOpt Algorithm
  • A dynamic programming algorithm was given to
    construct the V-optimal histogram.
  • $\mathrm{OPT}[n,k] = \min_{1 \le j < n} \{ \mathrm{OPT}[j,k-1] + \mathrm{SQERR}[(j+1)..n] \}$
  • OPT[j, k]: the minimum cost of representing the
    set of values indexed by [1..j] by a histogram
    with k buckets.
  • SQERR[(j+1)..n]: the sum of the squared absolute
    errors from (j+1) to n.

23
The DP-based VOpt Algorithm
  • for i = 1 to n do
  • for k = 1 to B do
  • for j = 1 to i-1 do (split point between the (k-1)-bucket
    histogram and the last bucket)
  • OPT[i, k] = min( OPT[i, k], OPT[j, k-1] +
    SQERR[j+1, i] )
  • We need O(Bn) entries for the table OPT
  • For each entry OPT[i, k], it takes O(n) time if
    SQERR[j+1, i] can be computed in O(1) time
  • O(Bn) space and $O(Bn^2)$ time

OPT
B
n
24
Computation of Sum of Squared Absolute Error in
O(1) time
sum(2,3) = x2 + x3 = sum[3] - sum[1] = 12 - 2 = 10
25
Computation of Sum of Squared Absolute Error in
O(1) time
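Let $\mathrm{SUM}[i] = \sum_{l \le i} x_l$ and $\mathrm{SQSUM}[i] = \sum_{l \le i} x_l^2$. Then, the sum of $x_i, \ldots, x_j$ is $\mathrm{SUM}[j] - \mathrm{SUM}[i-1]$. Thus, $\mathrm{SQERR}(i,j) = \mathrm{SQSUM}[j] - \mathrm{SQSUM}[i-1] - (\mathrm{SUM}[j] - \mathrm{SUM}[i-1])^2 / (j-i+1)$.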
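The DP and the O(1) bucket error can be sketched together in Python. This is a minimal illustration, not the authors' code: the names `vopt_error`, `s`, `q` and `sqerr` are made up here, and the bucket error is computed from prefix sums of the values and of their squares.

```python
def vopt_error(x, B):
    """Exact V-optimal error with at most B buckets: O(n^2 B) time, O(nB) space."""
    n = len(x)
    # Prefix sums: s[i] = x_1 + ... + x_i, q[i] = x_1^2 + ... + x_i^2.
    s = [0.0] * (n + 1)
    q = [0.0] * (n + 1)
    for i, v in enumerate(x, start=1):
        s[i] = s[i - 1] + v
        q[i] = q[i - 1] + v * v

    def sqerr(i, j):
        """Squared error of one bucket over positions i..j (1-based, inclusive)."""
        total = s[j] - s[i - 1]
        # max(0, .) guards against tiny negative values from rounding.
        return max(0.0, (q[j] - q[i - 1]) - total * total / (j - i + 1))

    INF = float("inf")
    # opt[k][i]: least error for x_1..x_i using exactly k non-empty buckets.
    opt = [[INF] * (n + 1) for _ in range(B + 1)]
    for i in range(1, n + 1):
        opt[1][i] = sqerr(1, i)
    for k in range(2, B + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # the last bucket is j+1..i
                cand = opt[k - 1][j] + sqerr(j + 1, i)
                if cand < opt[k][i]:
                    opt[k][i] = cand
    return min(opt[k][n] for k in range(1, B + 1))
```

For the signal 1,1,1,5,5,5,9,9,9 the function returns 96 with one bucket, 24 with two and 0 with three.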
26
Analysis of VOpt Algorithm
  • $O(n^2 B)$ time, O(nB) space
  • The space can be reduced (Wednesday)
  • Main Question: The end use of a histogram is to
    approximate something.
  • Why not find an approximately optimal (e.g.,
    $(1+\epsilon)$) histogram?

27
If you had to improve something ?
Via Wavelets, ssq: O(n) time, $O(B^2/\epsilon^2)$ space
$O(n^2 B)$ time, O(nB) space
$(1+\epsilon)$ streaming: $O(nB^2/\epsilon)$ time, $O(B^2/\epsilon)$ space
$(1+\epsilon)$ streaming, ssq: O(n) time, $O(B/\epsilon^2)$ space
$O(n^2 B)$ time, O(n) space
$(1+\epsilon)$ streaming: O(n) time, $O(B^2/\epsilon)$ space
Offline: O(n) time, $O(B^2/\epsilon)$ space
Offline: O(n) time, $O(nB/\epsilon)$ space
28
Take 1
  • For i = 1 to n do
  • For k = 1 to B do
  • For j = 1 to i-1 do (split point for the last
    bucket)
  • OPT[1…i, k] =
  • min( OPT[1…i, k],
    OPT[1…j, k-1] + SQERR(j+1, i) )
  • OPT[1..j, k] is increasing
  • SQERR(j+1, i) is decreasing
  • Question: Can we use the monotonicity for
    searching the minimum?

As j increases
29
No
  • Consider a sequence of positive $y_1, y_2, \ldots, y_n$
  • $F(i) = \sum_{j \le i} y_j$ and $G(i) = F(n) - F(i-1)$
  • F(i) monotonically increasing … OPT[1..j, k-1]
  • G(i) monotonically decreasing … SQERR(j+1, i)
  • $\Omega(n)$ time is necessary to find $\min_i F(i) + G(i)$
  • Open Question: Does it extend to $\Omega(n^2)$ over the
    entire algorithm?

30
What gives ?
  • Consider a sequence of positive $y_1, y_2, \ldots, y_n$
  • $F(i) = \sum_{j \le i} y_j$ and $G(i) = F(n) - F(i-1)$
  • Thus, $F(i) + G(i) = F(n) + y_i$
  • Any i gives a 2-approximation to
    $\min_i F(i) + G(i)$
  • $F(i) + G(i) = F(n) + y_i \le 2F(n)$
  • $\min_i F(i) + G(i)$ is at least $F(n)$

31
Round 1
  • Use a histogram to approximate the fn
  • Bootstrap!
  • Approximate the increasing fn in powers of $(1+\delta)$
  • Right end pt is a $(1+\delta)$ approximation of the left end
    pt

32
What does that do ?
  • Consider evaluating the fn at the two endpoints
  • Proof by picture.

h
h
By construction.
Why ?
By monotonicity!
33
Therefore…
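  • The right hand point is a $(1+\delta)$ approximation!
  • Holds for any point x in between.
  • $\mathrm{OPT}[x] + \mathrm{SQERR}[x+1] \ge \mathrm{OPT}[a] + \mathrm{SQERR}[b+1]$
  • $\ge \mathrm{OPT}[b]/(1+\delta) + \mathrm{SQERR}[b+1]$
  • $\ge (\mathrm{OPT}[b] + \mathrm{SQERR}[b+1])/(1+\delta)$
  • Are we done ?
  • Not quite yet.
  • What happens for B > 2? We do not compute
    OPT[i, b] exactly!!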

SQERR
OPT
h
a
b
34
Zen and the art of histograms
  • Approximate the increasing fn in powers of $(1+\delta)$
  • Right end pt is a $(1+\delta)$ approximation
  • Prove by induction that the error is $(1+\delta)^B$
  • This tells us what $\delta$ should be (small); in fact
    if we set $\delta = \epsilon/(2B)$ then $(1+\delta)^B \le 1+\epsilon$

35
Complexity analysis
  • # of intervals $p \le (B/\epsilon) \log n$
  • Why ?
  • $c(1+\delta)^{p-1} \le nR^2$ and $\delta = \epsilon/(2B)$
  • R is the largest number in data
  • Assume R is polynomially bounded by n
  • Running time: $nB \cdot (B/\epsilon) \log n$
  • Why are we approximating the increasing function?
    Why not the decreasing one?

36
The first streaming model
  • The signal X is specified by xi arriving in
    increasing order of i
  • Not the most general model
  • But extremely useful for modeling time series data

37
Streaming
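$\sum_{i=1}^{b} x_i$, $\sum_{i=1}^{b} x_i^2$
Need to store $\sum_{i=1}^{a} x_i$, $\sum_{i=1}^{a} x_i^2$
[Figure: an interval with endpoints a and b]
Required space is $(B^2/\epsilon) \log n$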
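A one-pass version of the interval idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the AHIST-S implementation: the name `ahist_stream` and the list layout are invented here, histograms may use at most k buckets at level k, and every stored endpoint is scanned for each arriving item. Each stored endpoint b carries the approximate error at b together with the prefix sum and prefix sum of squares at b, so the error of the bucket (b, i] comes from differences of running sums.

```python
def ahist_stream(stream, B, eps):
    """(1+eps)-approximate V-optimal error of a stream, for 0 < eps <= 1."""
    delta = eps / (2.0 * B)
    S = Q = 0.0  # running sum and running sum of squares
    i = 0
    # lists[k] holds the intervals of level k, one entry per interval:
    # [right endpoint b, APX[b,k], SUM[b], SQSUM[b]]. Levels 1..B-1 are used.
    lists = [[] for _ in range(B)]
    start_val = [0.0] * B  # APX at the left end of the current interval
    result = None
    for v in stream:
        i += 1
        S += v
        Q += v * v
        apx = [0.0] * (B + 1)
        apx[1] = max(0.0, Q - S * S / i)
        for k in range(2, B + 1):
            best = apx[k - 1]  # using fewer than k buckets is allowed
            for b, val, sb, qb in lists[k - 1]:
                t = S - sb
                err = max(0.0, (Q - qb) - t * t / (i - b))  # SQERR(b+1, i)
                best = min(best, val + err)
            apx[k] = best
        for k in range(1, B):
            entry = [i, apx[k], S, Q]
            if lists[k] and apx[k] <= (1 + delta) * start_val[k]:
                lists[k][-1] = entry  # extend the current interval to i
            else:
                lists[k].append(entry)  # start a new interval at i
                start_val[k] = apx[k]
        result = apx[B]
    return result
```

Every value the sketch reports is the error of an actual histogram, so it never falls below the optimum, and the induction on the slides bounds it by $(1+\delta)^{B}$ times the optimum.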
38
VOpt Construction: $O(Bn^2)$
  • Jagadish et al. VLDB 1998
  • $\mathrm{OPT}(i,k) = \min_{1 \le j < i} \{ \mathrm{OPT}(j,k-1) + \mathrm{SQERR}(j+1,i) \}$

[Figure: the rows OPT[j,k] and OPT[j,k-1] of the DP table, each with cells 1 to 10, indexed up to n]
39
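AHIST-S: $(1+\epsilon)$ Approximation
  • $\mathrm{AOPT}[i,k] = \min_p \{ \mathrm{AOPT}[b_p,k-1] + \mathrm{SQERR}[b_p+1,i] \}$, where the $b_p$ are the stored interval endpoints
  • $O(B^2 \epsilon^{-1} n \log n)$ time and $O(B^2 \epsilon^{-1} \log n)$ space

[Figure: the rows AOPT[j,k] and AOPT[j,k-1] up to n; an interval P with $(1+\delta)a \ge b$ and $(1+\delta)a < c$]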
40
The overall idea
The natural DP table
The approximate table
41
Do ?s talk to us ?
  • DJIA data from 1901-1993

execution time
B
42
Take 2: GK02
  • Sliding window streams
  • Potentially infinite data; we are interested in the
    last n values only
  • Q: Suppose we constructed a histogram for [1..n]
    and now want it for [2..(n+1)]
  • The previous idea is dead on arrival.
  • Consider 100,1,2,3,4,5,7,8,…

43
Formal problem
  • Maintain a data structure
  • Given an interval [a,b], construct a B-bucket
    histogram for [a,b]
  • Compute on the fly
  • Generalizes the window!
  • Generalizes VOpt when a = 1, b = n

44
Reconsider the take 1
  • We are evaluating
  • Left to right, i.e.,

But we are still evaluating this guy !
45
A brave new world
  • Assume a O(n) size buffer holds xi values
  • The previous algorithm was
  • Several issues
  • Which values are necessary and sufficient
  • We are not evaluating all values what induction
    ?

46
A trickier proof
[Figure: proof by picture, with points labeled a, b, c, d, f, g]
47
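GK02: Enhanced $(1+\epsilon)$ Approximation
  • Lazy evaluation using binary search
  • $O(B^3 \epsilon^{-2} \log^3 n)$ time and O(n) space
  • Pre-processing takes O(n) time: SUM and SQSUM

[Figure: the rows AOPT[j,k] and AOPT[j,k-1] up to n; an interval P with $(1+\delta)a \ge z$ and $(1+\delta)a < z+1$]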
48
GK02: Enhanced $(1+\epsilon)$ Approximation
  • Creates all of the B interval lists at once
  • The necessary values of AOPT[j,k] are computed
    recursively to find the intervals $[a_j^p, b_j^p]$,
    where $b_j^p$ is the largest z s.t.
  • $(1+\delta)\,\mathrm{AOPT}[a_j^p,k] \ge \mathrm{AOPT}[z,k]$
  • $(1+\delta)\,\mathrm{AOPT}[a_j^p,k] < \mathrm{AOPT}[z+1,k]$
  • Note that AOPT increases as z increases
  • Thus, we can use binary search to find z
  • The SUM and SQSUM arrays, of O(n) space, need to be
    maintained to allow the computation of SQERR(j+1,i)
    in O(1) time
  • $O(n + B^3 \epsilon^{-2} \log^3 n)$ time and O(n) space

49
Take 2 summary
  • O(n) space and $O(n + B^3 \epsilon^{-2} \log^2 n)$ time
  • Is that the best ? Obviously no.

50
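Take 3: AHIST-L-$\Delta$
  • Suppose we knew $\Delta \le \mathrm{OPT} \le 2\Delta$, then…
  • Instead of powers of $(1+\epsilon/B)$, use additive terms of
    $\epsilon\Delta/(2B)$, then …
  • Time is $O(B^3 \epsilon^{-2} \log n)$
  • To get $\Delta$ ?
  • 2-approximation: $\epsilon = O(1)$
  • a binary search: O(log n)
  • Thus, $O(B^3 \log n \log n)$
  • Overall $O(n + B^3(\epsilon^{-2} + \log n)\log n)$ time and $O(n + B^2/\epsilon)$
    space

$O(B/\epsilon)$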
51
Take 4 AHIST-B
  • Consider the take 4 algorithm.
  • How to stream it ?

On the new part
Overall
M
52
Not done yet
k
K-1
1r
  • First find an $\epsilon = O(1)$ approximation, then proceed
    back and refine

53
The running space-time
  • B (# insertions)(log M)(log $\ell$), where
    $\ell = O(B \epsilon^{-1} \log n)$ is the length of a list
  • Space
  • Who cares and why ?

54
Asymptotics
  • For fixed B and $\epsilon$, we can compute a $(1+\epsilon)$
    piecewise constant representation in
  • O(n log log n) time and O(log n) space or
  • O(n) time and O(log n log log n) space.
  • Extends to degree d polynomials, space increases
    by O(d) and time is O(nd d3…)

55
Our friendly $\epsilon$: Running time
[Figure: execution time vs. B]
56
Our friendly $\epsilon$: Error
[Figure: (Error - VOPT)/VOPT vs. B]
57
What you analyze is what you get
Execution time
n
58
Questions ?
59
The status for VOPT
  • Saves space across all algorithms except
    algorithms which extend to general error measure
    over streams

60
For general error measure, IF…
  • The error of a bucket only depends on the values
    in the bucket.
  • The overall error function, is the sum of the
    errors in the buckets.
  • The data can be processed in O(T) time per item
    such that in O(Q) time we can find the error of a
    bucket, storing O(P) info.
  • The error (of a bucket) is a monotonic function
    of the interval.
  • The value of the maximum and the minimum nonzero
    error is polynomially bounded in n.

61
Then…
  • Optimum histogram in $O(nT + n^2(B+Q))$ time and
    $O(n(P+B))$ space
  • $(1+\epsilon)$-approximation in
  • $O(nT + nQB^2\epsilon^{-1} \log n)$ time and $O(PB^2\epsilon^{-1} \log n)$
    space,
  • $O(nT + QB^3(\log n + \epsilon^{-2}) \log n)$ time and O(nP)
    space
  • O(nT) time and space
  • $O(PB^2 \epsilon^{-1} \log n + (QB/T)\, B\epsilon^{-1} \log^2(B\epsilon^{-1} \log n) \log n \log\log n)$

62
Splines and piecewise polynomials
  • Instead of
  • If we wanted
  • Or maybe…

63
The overall idea
  • If we want to represent $x_{a+1}, \ldots, x_b$ by
    $p_0 + p_1(x - x_a) + p_2(x - x_a)^2 + \ldots$
  • The solution is as above…
  • We need O(d) times (than before) space and need
    to solve the system. This means an increase by a
    factor O(d3) in time.

64
Another useful example: Relative error
  • Issue with global measures: estimating 10 by 20
    and 1000 by 1010 has the same effect
  • The above is OK if we are querying for 1000 a
    thousand times and for 10 ten times (point queries
    and VOPT measure)
  • But consider approximating a time series. We may
    be interested in per point guarantees.

65
Sum of Squared Relative Error for a Bucket
  • Relative error for a bucket (sr,er,xr)
  • Since A > 0, it is minimized when $x_r = B/A$
  • The minimum value is $C - B^2/A$
  • If the aggregated sums of A, B and C are stored,
    ERRSQ(i,j) can be computed in O(1) time
  • The optimal histogram can be constructed in $O(Bn^2)$
    time… Approximation algorithms follow…
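One way to arrive at the quantities A, B and C is sketched below. The weight $w_i = \max(c, |x_i|)$, with a small constant $c > 0$, is an assumption made for this derivation; the transcript does not preserve the exact definition. Note that B here is the bucket aggregate used on this slide, not the number of buckets.

```latex
\begin{aligned}
\mathrm{ERRSQ}(s_r, e_r)
  &= \min_{x_r} \sum_{i=s_r}^{e_r} \frac{(x_i - x_r)^2}{w_i^2}
   = \min_{x_r} \left( A x_r^2 - 2 B x_r + C \right), \\
A &= \sum_{i=s_r}^{e_r} \frac{1}{w_i^2}, \qquad
B  = \sum_{i=s_r}^{e_r} \frac{x_i}{w_i^2}, \qquad
C  = \sum_{i=s_r}^{e_r} \frac{x_i^2}{w_i^2}, \\
\frac{d}{dx_r}\left( A x_r^2 - 2 B x_r + C \right) &= 2 A x_r - 2 B = 0
  \;\Longrightarrow\; x_r = \frac{B}{A}, \qquad
\mathrm{ERRSQ}(s_r, e_r) = C - \frac{B^2}{A}.
\end{aligned}
```

Since A, B and C are sums over the bucket, prefix sums of the three per-point terms give any bucket's error in O(1) time, exactly as for the plain squared error.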

66
Maximum Error and the $\ell_\infty$ metric
67
Maximum Error Histograms
  • A bucket (sr,er,xr) with a numbers x1, x2, …,
    xn s.t.
  • sr starting position
  • er ending position
  • xr representative value
  • Maximum Error is given by
  • Maximum relative error is defined as

68
Maximum Error of a bucket
  • Given numbers x1, x2, …, xn s.t.
  • Maximum Error is given by $\mathrm{ErrM} = \min_{x_r} \max_i |x_i - x_r|$
  • What is the best $x_r$ ?
  • $(x_{\min} + x_{\max})/2$

69
Maximum Relative Error of a set
  • Given a set of numbers x1, x2, …, xn
  • max: the maximum of $x_1, x_2, \ldots, x_n$
  • min: the minimum of $x_1, x_2, \ldots, x_n$
  • c: a sanity constant
  • Some function of c, max, min
  • E.g., when $c \le \min \le \max$ the error is
  • Optimal maximum relative error for a bucket can
    be computed in O(1) time

70
The Naïve Optimal Algorithm
  • for i = 1 to n do
  • OPTM[i,1] = ERRM(1,i)
  • for k = 2 to B do
  • max = -∞, min = ∞, OPTM[i,k] = ∞
  • for j = i-1 down to 1 do
  • if (max < x[j+1]) max = x[j+1]
  • if (min > x[j+1]) min = x[j+1]
  • OPTM[i,k] = min( OPTM[i,k],
  • max( OPTM[j,k-1],
    ERRM(j+1,i) ) )
  • ERRM(j+1,i) can be obtained in O(1) time
  • O(Bn) space and $O(Bn^2)$ time optimal algorithm

71
An Improved Optimal Algorithm
  • $\mathrm{OPTM}[i,k] = \min_j \max(\mathrm{OPTM}[j,k-1], \mathrm{ERRM}(j+1,i))$
  • Observations
  • OPTM[j,k-1] is an increasing function of j
  • ERRM(j+1,i) is a decreasing function of j
  • To compute $\min_x \max(F(x), G(x))$ where F(x)
    and G(x) are non-decreasing and non-increasing
    functions, respectively
  • We can perform binary search for the value of x
    such that F(x) > G(x) and F(x-1) < G(x-1)
  • The minimum is min{ G(x-1), F(x) }
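The crossover search can be sketched in a few lines of Python. The function name `min_of_max` and the list-based interface are illustrative assumptions; F and G are given as equal-length lists, F non-decreasing and G non-increasing.

```python
def min_of_max(F, G):
    """Return min over x of max(F[x], G[x]) using O(log n) comparisons."""
    lo, hi = 0, len(F) - 1
    # Find the first index where F has caught up with G (or the last index
    # if F stays below G everywhere).
    while lo < hi:
        mid = (lo + hi) // 2
        if F[mid] >= G[mid]:
            hi = mid
        else:
            lo = mid + 1
    best = max(F[lo], G[lo])
    if lo > 0:
        # Just left of the crossover the maximum is G, which may be smaller.
        best = min(best, max(F[lo - 1], G[lo - 1]))
    return best
```

With F = [1, 2, 4, 7, 9] and G = [8, 6, 5, 3, 0] the pointwise maxima are 8, 6, 5, 7, 9, and the function returns 5 after inspecting only the crossover and its left neighbour.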

72
An Improved Optimal Algorithm
  • $\mathrm{OPTM}[i,k] = \min_j \max(\mathrm{OPTM}[j,k-1], \mathrm{ERRM}(j+1,i))$
  • We can reduce the innermost loop of the naïve
    algorithm to O(log n) time.
  • However, ERRM(j+1,i) cannot be computed in O(1)
    time any more
  • Using an interval tree, we can compute the min and
    max values for [j+1, i], i.e. ERRM(j+1,i), in
    O(log n) time
  • Thus, our improved algorithm takes $O(Bn \log^2 n)$
    time with O(Bn) space

73
An Interval Tree Example
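[Figure: an interval tree over [1,8], labeled Min Interval: the root [1,8] has children [1,4] and [5,8]; the next level is [1,2], [3,4], [5,6], [7,8]; the leaves are [1,1] through [8,8]. The labels decomposeLeft and decomposeRight mark the two descent steps.]
The steps of decomposing [2,4] with an interval
tree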
74
Consider another solution
  • Make the first bucket as large as possible
  • i.e. push the boundary right
  • E.g. in the figure we can….
  • As long as the max and min is same…
  • Why will we have to stop ?

75
Consider another solution (2)
  • In this example we cannot…
  • But may be the error comes from a different
    bucket!
  • Here's one idea
  • Given an i, find Err[1,i]
  • If i is small, $\mathrm{Err}[1,i] \le \mathrm{OPT}$
  • If i is large, $\mathrm{Err}[1,i] \ge \mathrm{OPT}$
  • How ?

76
How ?
  • Assume that, given an interval [a,b], we can find the
    min and max, and therefore Err[a,b]
  • With O(n) time and space preprocessing, we can
    find Err in O(log n) time (interval tree)
  • Check[p, q, b, $\tau$]
  • If p > q (for b ≥ 0), we are done.
  • Otherwise,
  • Find mid, s.t. $\mathrm{Err}[p,mid] \le \tau$ and
    $\mathrm{Err}[p,mid+1] > \tau$
  • Check[mid+1, q, b-1, $\tau$]
  • $O(B \log^2 n)$
  • Binary search: log n × log n (to find min and max
    for Err)
  • Invocation of Check: B times
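The feasibility test can be sketched in Python as below. This is a simplification and an assumption of this note, not the procedure on the slide: it extends each bucket by a linear scan instead of a binary search over an interval tree, and it finds the optimum by searching over the O(n²) candidate bucket errors. The names `check` and `optimal_max_error` are illustrative.

```python
def check(x, B, tau):
    """True if x splits into at most B buckets, each with (max - min)/2 <= tau."""
    if not x:
        return True
    buckets = 1
    lo = hi = x[0]
    for v in x[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if (new_hi - new_lo) / 2.0 <= tau:
            lo, hi = new_lo, new_hi  # push the bucket boundary right
        else:
            buckets += 1             # v starts a new bucket
            lo = hi = v
            if buckets > B:
                return False
    return buckets <= B


def optimal_max_error(x, B):
    """Smallest achievable maximum error; needs non-empty x and B >= 1."""
    cands = set()
    for i in range(len(x)):
        lo = hi = x[i]
        for j in range(i, len(x)):
            lo, hi = min(lo, x[j]), max(hi, x[j])
            cands.add((hi - lo) / 2.0)
    cands = sorted(cands)
    left, right = 0, len(cands) - 1  # check() is monotone in tau
    while left < right:
        mid = (left + right) // 2
        if check(x, B, cands[mid]):
            right = mid
        else:
            left = mid + 1
    return cands[left]
```

For 1, 2, 3, 10, 11, 12 and two buckets the threshold 1.0 is feasible and 0.9 is not, so the optimum maximum error is 1.0.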

77
Now for the original problem
  • By binary search, find the largest s such that,
  • with $\tau = \mathrm{Err}[1,s]$ and $\tau' = \mathrm{Err}[1,s+1]$,
  • Check[1, n, B-1, $\tau$] = false and Check[1, n, B-1,
    $\tau'$] = true
  • Now OPT = $\tau'$ or the best (B-1)-bucket error of
    [s+1, n]
  • A recursive algorithm!
  • $T(B) = \log n \cdot B \log^2 n + T(B-1) \approx O(B^2 \log^3 n)$ !!

78
Summary
  • In $O(n + B^2 \log^3 n)$ time and O(n) space we can
    find the optimum error.
  • What do we do if
  • Stream or
  • Less than O(n) space ?
  • Approximate, using some of the old ideas…

79
Short break !
  • When we return
  • Range Query Histograms
  • Wavelets
  • Optimum synopsis
  • Connection to Histograms
  • Overall ideas and themes

80
Range Query Histograms
81
A more synopsis structure
  • Instead of estimating the value at a point we are
    interested in sum of the values in
    intervals/ranges.
  • Clearly, very useful.
  • Clearly we need new optimization.
  • E.g.,

82
A more difficult problem
  • Only special cases solved (satisfactorily)
  • Hierarchies
  • Prefix ranges: all ranges of the form [1,j] as j
    varies
  • Complete Binary Ranges
  • General hierarchies
  • Uniform Ranges all ranges

83
Status Range Query
  • Caveat

84
The uniform case
  • Consider a sequence X0,x1,x2,…,xn
  • Define the operators
  • $\Sigma(g)_i = \sum_{j \le i} g_j$ is the prefix sum

85
Unbiased
  • Suppose H is a histogram such that $F = \Sigma(X-H)$ is
    s.t. $\sum_i F_i = 0$
  • Or think of $\sum_i \sum_{r<i} (X_r - H_r) = 0$
  • Claim: The error of using H to answer range queries
    for X is twice the error of using $\Sigma(H)$ to answer
    point queries about $\Sigma(X)$ !

86
The main idea
  • Define $G_i = \sum_{r<i} (X_r - H_r) = \Sigma(X)_i - \Sigma(H)_i$
  • Now $\sum_i G_i = 0$ if H is unbiased
  • Pick a random element u
  • Expected $G_u = 0$
  • Pick two random elements u, v
  • Expected $(G_u - G_v)^2$ = expected error of using H
    to answer range queries for X
  • But that is equal to 2 · Expected $G_u^2$
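The last step follows by expanding the square. The derivation below assumes that u and v are drawn independently and uniformly, and that the prefix errors sum to zero (the unbiased condition), so each has expectation zero.

```latex
\begin{aligned}
\mathbb{E}\left[(G_u - G_v)^2\right]
  &= \mathbb{E}[G_u^2] - 2\,\mathbb{E}[G_u G_v] + \mathbb{E}[G_v^2] \\
  &= 2\,\mathbb{E}[G_u^2] - 2\,\mathbb{E}[G_u]\,\mathbb{E}[G_v]
     && \text{(independence, same distribution)} \\
  &= 2\,\mathbb{E}[G_u^2]
     && \text{(since } \mathbb{E}[G_u] = \tfrac{1}{n}\textstyle\sum_i G_i = 0\text{)}.
\end{aligned}
```

The cross term is the only place where the unbiased condition is used; without it the range-query error would carry an extra $-2\,\mathbb{E}[G_u]^2$.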

87
A simple approximation
  • What we want is
  • Hard
  • But we know how to get

$\Sigma(H)$
$\Sigma(X)$
Piecewise linear histograms!
88
An easy trick
  • We can also find
  • A buffer of Size 1 after each bucket
  • Use it as a patch-up
  • 2B buckets
  • Same error as OPT
  • Approximation algorithms try to find the
    continuous variant

89
The Synopsis Construction Problem
  • Formally, given a signal $X$ and a dictionary $\{\psi_i\}$,
    find a representation $F = \sum_i z_i \psi_i$ with at most $B$
    non-zero $z_i$ minimizing some error which is a function of
    $X - F$
  • In case of histograms the dictionary was the
    set of all possible intervals but we could only
    choose a non-overlapping set.

90
The eternal what if
  • If the $\psi_i$ are designed for the data, do we get
    a better synopsis ?
  • Absolutely!
  • Consider a Sine wave …
  • Or any smooth fn.
  • Why though ?

91
Representations not piecewise const.
  • Electromagnetic signals are sine/cosine waves.
  • If we are considering any process which involve
    electromagnetic signals this is a great idea.
  • These are particularly great for representing
    periodic functions.
  • Often these algorithms are found in DSP (digital
    signal processing chips)
  • A fascinating 300 years of history in Math !

92
A slight problem …
  • We will come back to Fourier
  • Fourier is suitable to smooth natural processes
  • If we are talking about signals from man-made
    processes, clearly they cannot be natural (and
    hardly likely to be smooth) …
  • More seriously, discreteness and burstiness…

93
The Wavelet (frames)
  • Inherits properties from both worlds
  • Fourier transform has all frequencies.
  • Considers frequencies that are powers of 2 but
    the effect of each wave is limited (shifted)

94
Wavelets
  • What to do in a discrete world ?

The Haar Wavelets (1910) !
95
The Haar Wavelets
  • Best energy synopsis amongst all wavelets (we
    will see more later)
  • Great for data with discontinuities.
  • A natural extension to discrete spaces
  • [1,-1,0,0,0,0,…], [0,0,1,-1,0,0,…], [0,0,0,0,1,-1,…],
    …
  • [1,1,-1,-1,0,0,0,0,…], [0,0,0,0,1,1,-1,-1,…], …

96
The Haar Synopsis Problem
  • Formally, given a signal $X$ and the Haar basis
    $\{\psi_i\}$, find a representation $F = \sum_i z_i \psi_i$ with at
    most $B$ non-zero $z_i$ minimizing some error which is a
    function of $X - F$
  • Let's begin with the VOPT error ($\|X-F\|_2^2$)

97
The Magic of Parseval (no spears)
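  • The $\ell_2$ distance is unchanged by a rotation.
  • A set of basis vectors $\{\psi_i\}$ defines a rotation iff
  • $\langle \psi_i, \psi_j \rangle = \delta_{ij}$, i.e., the vectors are orthonormal
  • Redefine the basis (scale) s.t. $\|\psi_i\|_2 = 1$
  • Let the transform be W
  • Then $\|X-F\|_2 = \|W(X-F)\|_2 = \|W(X) - W(F)\|_2$
  • Now $W(F) = \{z_1, z_2, \ldots, z_n\}$ and so
  • $\|W(X) - W(F)\|_2^2 = \sum_i (W(X)_i - z_i)^2$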

98
What did we achieve ?
  • Storing the largest coefficients is the best
    solution.
  • Note that the fact $z_i = W(X)_i$ is a consequence of
    the optimization and IS NOT a specification of
    the problem.
  • More on that later.

99
What is the best algorithm ?
  • How to find the largest B coefficients of the set
    x1,x2,… ?
  • Cascade Algorithm.
  • Recall the hierarchical nature.

100
Cascade algorithm ?
  • Given a, b, represent them as (a-b) and (a+b)
  • Divide by sqrt(2) so that the sum of squares etc…
  • Running time O(n)

1
4
5
6
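A minimal Python sketch of the cascade, assuming the input length is a power of two and using the orthonormal scaling (each sum and difference divided by sqrt(2)). The function name `haar_cascade` and the output ordering are choices made for this illustration.

```python
import math


def haar_cascade(x):
    """Orthonormal Haar coefficients of x in O(n) time; len(x) is a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    r = 1.0 / math.sqrt(2.0)
    sums = [float(v) for v in x]
    levels = []  # detail coefficients, finest level first
    while len(sums) > 1:
        pairs = list(zip(sums[0::2], sums[1::2]))
        levels.append([(a - b) * r for a, b in pairs])  # differences
        sums = [(a + b) * r for a, b in pairs]          # scaled sums move up
    out = sums  # the single remaining value is the overall (scaled) average
    for detail in reversed(levels):  # coarsest details first
        out.extend(detail)
    return out
```

For the signal 1, 4, 5, 6 the coefficients are 8, -3, -3/sqrt(2), -1/sqrt(2), and their squares sum to 78, the same as 1 + 16 + 25 + 36, as Parseval requires.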
101
Surfing Streams
  • Notice that once the left half is done we only
    need to remember the
  • A stream algorithm is natural

102
Surfing Streams
  • Have an auxiliary structure that maintains the top B
    of a set of numbers

Where else have you seen this ?
Reduce Merge Paradigm Also used in clustering
data streams
103
In summary
  • Given a series $x_1, x_2, \ldots, x_i, \ldots, x_n$ in increasing
    order of i, we can find (maintain) the largest B
    coefficients in O(n) time and O(B + log n) space
  • Ok, but only for $\|X-F\|_2$
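The streaming variant can be sketched with a stack of pending partial sums (at most one per level) and a min-heap holding the B largest coefficients seen so far. This is an illustration under assumptions: the name `top_b_haar_stream` and the coefficient labels are invented here, and the stream length is taken to be a power of two so that a single overall coefficient remains at the end.

```python
import heapq
import math


def top_b_haar_stream(stream, B):
    """Largest-magnitude B orthonormal Haar coefficients of a stream."""
    r = 1.0 / math.sqrt(2.0)
    stack = []  # pending (level, scaled sum) pairs, at most one per level
    heap = []   # min-heap of (|coef|, label, coef), size at most B
    seen = {}   # level -> number of detail coefficients emitted so far

    def offer(label, coef):
        item = (abs(coef), label, coef)
        if len(heap) < B:
            heapq.heappush(heap, item)
        elif heap and item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the smallest kept coefficient

    for v in stream:
        level, val = 0, float(v)
        # Merge with the waiting left sibling while one exists at this level.
        while stack and stack[-1][0] == level:
            _, left = stack.pop()
            pos = seen.get(level, 0)
            seen[level] = pos + 1
            offer(("detail", level, pos), (left - val) * r)
            val = (left + val) * r
            level += 1
        stack.append((level, val))
    for level, val in stack:
        offer(("average", level, 0), val)
    return sorted(heap, reverse=True)
```

On the stream 1, 4, 5, 6 with B = 2 the two retained coefficients are 8 (the overall term) and -3 (the coarsest detail); the working state is the heap plus one stack entry and one counter per level.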

104
Extended Histograms
  • What do we do in presence of multiple
    dimensions/measures ?
  • Use multi-dim transforms
  • Use many 1 D transforms
  • Strategy Use a Flexible scheme that allows us to
    store the index and a bitmap to indicate which
    measures are stored.

105
How to solve it ?
  • For the basic 1-D problem we need to choose the
    largest B coefficients
  • Use Parseval to transform error of data to
    choosing/not choosing coefficients
  • Here we have bags
  • We can choose coefficient j with bitmap
  • 0100 using H+S space
  • 0101 using H+2S space
  • 1111 using H+4S space

106
Is 0101 better than 1100 ?
  • Subproblem
  • Given the fact that we have settled on choosing 2
    coefficients for j, which 2 ?
  • It is the largest 2 again!
  • Basically we can choose a set of indices j and
    decide how many coefficients we choose for each j

What does this remind you of ?
107
Knapsack
  • Each item j is available with M different
    versions.
  • Cost of the r-th version is H+rS. The profit is an
    increasing function of r.
  • Can choose only one version.

108
Strange roadbumps
  • Optimal profit + optimal error = total energy
  • The relationship does not hold in approximation.
  • 99 + 1 = 100. Approximating 99 by 95 increases the error
    by 400%
  • We will return to this.

109
Many questions
  • What do we do for other error measures ?
  • What is the connection with Histograms ?
  • Positives Some direction
  • Cascade algorithm
  • Hierarchy of coefficients

110
Non l2 errors
111
Storing coefficients is suboptimal
  • Recall the example 1, 4, 5, 6
  • We want a 1-term summary and the error is $\ell_\infty$
  • What do we store ?

What is the final result? 3.5, 3.5, 3.5, 3.5
What is the transform? 7, 0, 0, 0
But the set of coefficients available is 8, ?, ?, ?
[Figure: the signal 1, 4, 5, 6]
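The numbers on this slide can be checked with a short Python sketch. It assumes the orthonormal Haar scaling, under which the overall coefficient is the sum divided by sqrt(n), and that this overall coefficient is the one retained; the function name `one_term_max_errors` is illustrative.

```python
import math


def one_term_max_errors(x):
    """Compare keeping the overall Haar coefficient with the best constant."""
    n = len(x)
    scale = math.sqrt(n)
    top_coef = sum(x) / scale            # overall coefficient of the data
    restricted_value = top_coef / scale  # storing it reconstructs the mean
    best_value = (min(x) + max(x)) / 2   # midpoint minimizes the maximum error
    best_coef = best_value * scale       # value that would have to be stored

    def max_err(v):
        return max(abs(xi - v) for xi in x)

    return top_coef, max_err(restricted_value), best_coef, max_err(best_value)
```

For 1, 4, 5, 6 this gives the coefficient 8 with maximum error 3, against the stored value 7 with maximum error 2.5, so the best one-term synopsis does not store any coefficient of the data.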
112
What to do ?
  • Search where there is light.
  • Restricted problem. Useful if the synopsis has
    more than one use.
  • Think outside the coefficients
  • Probabilistic Rounding
  • Search (cleverly) over the whole space

113
The Best Restricted Synopsis
  • Maximum Error.
  • A value (at the leaf) is affected by only the
    ancestors.
  • # of ancestors = log n
  • Guess/try all subsets of the ancestor set!
  • O(n) choices
  • Start bottom up and use a DP to choose the best B
    coefficients overall.
  • Works for a large number of error measures.

114
Analysis
  • At each internal node j we need to maintain the
    table
  • Error[j, Ancestor set, b]: the contribution to
    the minimum error by only the subtree rooted at j
    when using b or fewer coefficients (for the
    subtree)
  • Size of table O(n2B)
  • Time O(n2B log B) depends on measure
  • But we can do better.

115
Faster Restricted Synopsis
  • A better cut
  • Number of coefficients in a subtree is at most
    size+1
  • Size of the table storing Err[j, Ancestor set, b]
  • Remains constant as we go up the levels!
  • Ancestor set decreases by 1
  • b takes twice as many values
  • O(n2) algorithm
  • We can also reduce the space to O(n)

116
Thinking beyond the coefficient
  • Probabilistic Rounding
  • Start from the coefficients.
  • Randomly round most of them to 0
  • A few are rounded to non-zero values
  • E.g. set $z_i = \lambda$ with prob. $W(X)_i/\lambda$ and 0
    otherwise
  • Has promise (correct expectation, variance)
  • Two issues,
  • The quality is unclear (wrt the original
    optimization)
  • The Expected number of non-zero coefficients is B
  • The variance is large, so with reasonable prob
    2B

117
More exploration reqd
  • Interestingly the method (as proposed) eliminates
    a region of search space
  • We can construct examples where the optimum lies
    in that region.
  • But is an interesting method and likely (I/we are
    guessing) preserves more errors than one
    simultaneously (multi-criterion optimization)

118
What is the optimum strategy
  • Consider the best set of coefficients
  • $Z = \{z_1, z_2, \ldots, z_n\}$
  • nudge them a bit by making them multiples of
    some $\delta$
  • The extra error is small (and a fn of $\delta$)
  • In fact each point sees $\delta \log n$
  • By reducing $\delta$ we can get a $(1+\epsilon)$ approx

119
A straightforward idea
  • But we still need to find the solution

The ancestor set is unimportant; what is
important is their combined effect. Try all
possible values (multiples of $\delta$, but we still
need to fix the range)
120
The graphs the data
121
The graphs … l1
122
Relative Error (small B), Relative l1
123
The times
124
What have we seen so far
  • Wavelet representation of l_2 error
  • Streaming
  • Wavelet representation for non l_2 error
  • Restricted
  • Unrestricted
  • Stream

125
A return to histograms
126
Easy relationships
  • A B-bucket (piecewise constant) histogram can be
    represented by 2B log n Haar wavelet
    coefficients.
  • Why
  • Only the 2B boundary points matter
  • A B-term Haar wavelet synopsis can be represented
    by 3B-bucket histogram.
  • Why
  • Each wavelet basis creates 3 extra pieces from 1
    line

127
Anything else ?
Totally! We can use Wavelets to get
$(1+\epsilon)$-approximate V-optimal
histograms. In fact the method has advantages…
128
Histograms, Take 5
  • A B-term Histogram can be represented by cB log
    n wavelet terms.
  • What if we choose the largest cB log n wavelet
    terms ?

129
Need not be good.
  • The best histogram has the cB log n wavelets
    aligned such that the result is B buckets.
  • The best cB log n coefficients are all over the
    place and give us 3cB log n buckets.
  • All hope is lost ?

130
If at first you don't succeed…
  • We repeat the process and also keep the next cB
    log n coefficients …
  • No.
  • But notice that the energy drops.
  • Energy: $\|X\|_2 = \|W(X)\|_2$
  • Basic intuition If there were a lot of
    coefficients which were large then the best V-Opt
    histogram MUST have a large error.
  • Why?

131
The robust property
  • Look at $\|W(X) - W(H)\|_2 = \|X - H\|_2$
  • W(H) has cB log n entries
  • If W(X) has $cB\epsilon^{-2} \log n$ large entries ..

132
A strange idea in 1000 words
  • Consider the projection to the largest
    $cB\epsilon^{-2} \log n$ wavelet terms
  • Is …

≈ ?
133
No. But flatten the fn
? X
¼
134
In fact
  • If we choose $(B\epsilon^{-1} \log n)^{O(1)}$, i.e., a large number of
    coefficients, then the boundary points of the
    coefficients are (approximately) good boundary
    points for a VOPT histogram.

135
The take away
  • I'm ok, you're ok
  • If I'm not ok then you're not ok either.
  • An oft-repeated approximation paradigm:
  • if there are too many coefficients then my
    algorithm is doomed but so is anyone else's, and
    therefore I am good
  • if there are not too many coefficients then
    we're good.

136
The Extended Wavelets in l2
  • We can store the largest coefficients
  • If there are too many coefficients which are
    large then optimum error is large.
  • Otherwise we repeatedly take out coefficients
    till taking out coefficients will not reduce the
    error any more.
  • DP on the set of coefficients taken out.

137
The Full Monty: update streams
  • So far we have been looking at X arriving as
    $x_1, x_2, \ldots$
  • What happens when X is specified by a stream of
    updates ?
  • i.e., $(i, \delta_i)$: change $x_i$ to $x_i + \delta_i$

138
Sketches Stream Embeddings
  • Basically Dimensionality reduction
  • To compute the histogram H of signal X
  • Compute embedding g(X) to fit the space
  • Compute H s.t. g(H) is close to g(X)

139
Linear Embeddings
  • JL Lemma
  • A is a Random Matrix drawn
    from Gaussian distribution.
  • Too many elements in matrix!
  • Use Pseudorandom Generators
  • P-Stable distribution for

140
What it achieves
  • Computes Norm

Increasing the coordinate is adding the column to
sketch.
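The remark that an update adds a column to the sketch can be sketched in Python. This is an illustration under assumptions, not the construction from the tutorial: the class name `LinearSketch` is invented, and Python's seeded `random.Random` stands in for the pseudorandom generator, so column i of the Gaussian matrix A is regenerated on demand rather than stored.

```python
import math
import random


class LinearSketch:
    """Maintains s = A x under updates (i, d), where A is k x n Gaussian."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.s = [0.0] * k

    def column(self, i):
        # Regenerate column i from a seed derived from (seed, i).
        rng = random.Random(self.seed * 1_000_003 + i)
        scale = 1.0 / math.sqrt(self.k)
        return [rng.gauss(0.0, 1.0) * scale for _ in range(self.k)]

    def update(self, i, d):
        """Apply x_i <- x_i + d by adding d times column i to the sketch."""
        for row, a in enumerate(self.column(i)):
            self.s[row] += d * a

    def norm_estimate(self):
        """Estimate of the l2 norm of x from the sketch alone."""
        return math.sqrt(sum(v * v for v in self.s))
```

Because the map is linear, the sketch depends only on the final vector x and not on the order or grouping of the updates, which is what makes it usable on update streams.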
A
x
141
Suppose we knew the intervals
  • The best histogram minimizes
  • $\|X-H\|_2 \approx_\epsilon \|AX - AH\|_2$

AX is a vector, AH is a linear function of B
values
We have a min. sq. error program, solvable in
polynomial time; more involved in the 1-norm.
142
Cannot do that
  • $\|X-H\|_2 = \|W(X) - W(H)\|_2 \approx_\epsilon \|AW(X) - AW(H)\|_2$

Idea: Use the linear map to find the large
number of Wavelet coefficients (top k problem
using sketches). Use similar ideas to Take 5 to
get the final solution.
143
The return of the pink Fourier
  • Assuming x1,x2,…,xi,… arrive in increasing order
    of i, find/maintain the top k Fourier
    coefficients.
  • Use the strategy
  • Assume that there are O(k log n) frequencies and
    try to find them.
  • If not, we are doomed and so is everyone.
  • So we are ok.
  • For the 3rd time …

144
What about top k
  • Assuming x1,x2,…,xi,… are specified by a stream
    of updates find/maintain the top k values (all
    elements with frequency 1/k or more).
  • Use the strategy
  • Assume that there are O(k log n) elements and try
    to find them.
  • If not, we are doomed and so is everyone.
  • So we are ok.
  • Again!
  • Use Group testing
  • 20 questions, bit chasing: is a heavy item in
    the first half? You can use norms or you can
    use collisions (hashes).
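
The group-testing idea can be sketched as follows, under stated assumptions: items are small non-negative integers, weights stay non-negative overall, and `item % num_groups` is a stand-in for a randomly drawn hash function. The class name `GroupTestSketch` is hypothetical. Within each group, one counter per bit answers the "20 questions": does more than half of the group's weight have this bit set?

```python
class GroupTestSketch:
    """Group testing with bit chasing over a stream of (item, delta) updates."""

    def __init__(self, num_groups, bits):
        self.num_groups = num_groups
        self.bits = bits
        self.total = [0] * num_groups  # total weight per group
        self.bit_weight = [[0] * bits for _ in range(num_groups)]

    def group(self, item):
        # Stand-in hash (assumption); a real scheme draws a random hash.
        return item % self.num_groups

    def update(self, item, delta):
        g = self.group(item)
        self.total[g] += delta
        for b in range(self.bits):
            if (item >> b) & 1:
                self.bit_weight[g][b] += delta

    def heavy(self, phi):
        """Decode one candidate from each group holding >= phi of all weight."""
        grand = sum(self.total)
        found = []
        for g in range(self.num_groups):
            if self.total[g] <= 0 or self.total[g] < phi * grand:
                continue
            # Bit b of the candidate is 1 if a majority of the weight has it set.
            item = sum(1 << b for b in range(self.bits)
                       if 2 * self.bit_weight[g][b] > self.total[g])
            found.append(item)
        return sorted(found)


sk = GroupTestSketch(num_groups=4, bits=4)
for item, delta in [(13, 50), (3, 10), (6, 40), (1, 3), (2, 2), (8, 1), (3, -10)]:
    sk.update(item, delta)
result = sk.heavy(0.25)
```

A decoded candidate is only trustworthy when a single item holds the majority of its group's weight. Practical schemes repeat this with several independent hash functions and verify candidates before reporting them.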

145
From optimization to learning
  • We are trying to learn a pure signal that has
    few coefficients…
  • A general paradigm.

146
The Meaning of Life
  • In Summary (high level)
  • Approximation is very useful for synopsis
    construction (execution time speeds up, and the
    end use of the synopsis is approximate anyway)
  • Synopses are usually applied on large data.
    Asymptotic behaviour matters
  • The exact definition of the optimization is
    important. How natural is natural…
  • Few degrees of separation between the synopsis
    structures. They are related. They should be. But
    then we can use algorithmic techniques back and
    forth between them.

147
The Summary (contd.)
  • In algorithm design terms
  • Most synopsis construction problems involve DP.
    Investigating how to change the DP to get
    approximation, space efficient algs., is often
    useful.
  • Search techniques (computational geometry), such
    as searching over exponents first, are useful.
  • What you analyze (carefully) is often what you
    would get asymptotically. The usual techniques we
    use for pruning etc. can be analyzed and
    shown to be better.
  • Reduce-Merge ⇒ Streaming?
  • The top k in various disguises. Group testing
    matters.

148
What lies ahead
  • Ok. So 1-D histograms have good algos.
  • 2-D?
  • NP-Hard.
  • Some approximation algorithms known.
  • Q: In linear time and sublinear space, what can we
    do?
  • Sketch based results. Long way to go.

149
What lies ahead
  • So 1-D Haar wavelets have good algos (non-$\ell_2$).
  • 2-D?
  • Unlikely to be NP-Hard
  • Quasi-polynomial time ($n^{\log n}$) approximation
    algorithms known.
  • Q: In linear time and sublinear space, what can we
    do?

150
What lies ahead
  • So 1-D Haar wavelets have good algos (non-$\ell_2$).
  • Non-Haar? Daubechies. Multifractals.
  • Unlikely to be NP-Hard
  • Quasi-polynomial time ($n^{\log n}$) approximation
    algorithms known.
  • What can we do?

151
What lies ahead
  • All the update stream results are based on $\ell_2$
    error because of Johnson-Lindenstrauss (and some
    on $\ell_p$ for $0 < p \le 2$)
  • What about other errors ?
  • Will require new techniques for streaming.

152
Notes (not from the underground)
  • The VOPT definition
  • Poosala, Haas, Ioannidis, Shekita, SIGMOD 96.
  • The VOPT histogram algorithm
  • Jagadish, Koudas, Muthukrishnan, Poosala, Sevcik,
    Suel, VLDB 98.
  • Take 1
  • Guha, Koudas, Shim, STOC, 01.
  • Take 2
  • Guha, Koudas, ICDE, 02.
  • Take 3, 4
  • Guha, Koudas, Shim, TODS, 05.
  • Take 5
  • Guha, Indyk, Muthukrishnan, Strauss, ICALP, 02.
  • Relative Error Histograms
  • Guha, Shim, Woo, VLDB, 04.
  • Maximum Error histograms
  • Nicol, J. of Parallel and Distributed Computing,
    1994.
  • (Muthukrishnan, Khanna, Skiena, ICALP, 97),
  • Guha, Shim, (here) 05.

153
More Notes
  • Range Query Histograms
  • Muthukrishnan, Strauss, SODA, 03.
  • The Full Monty
  • Gilbert, Guha, Indyk, Kotidis, Muthukrishnan,
    Strauss, STOC, 02.
  • Parseval stuff
  • Parseval, (margin of notebook ?), 1799.
  • Folklore sum of squares and l2
  • The mandala
  • Surfing Wavelets
  • Gilbert, Kotidis, Muthukrishnan, Strauss, VLDB,
    01.
  • Probabilistic Synopsis
  • Gibbons, Garofalakis, SIGMOD, 02 (also TODS,
    04)
  • Maximum error (restricted version)
  • Garofalakis, Kumar, PODS, 04.

154
Notes again
  • Faster Restricted Synopsis
  • Guha, VLDB, 05.
  • Unrestricted non l2 error
  • Guha, Harb, KDD, 05 new results
  • Extended Wavelets
  • Deligiannakis, Roussopoulos, SIGMOD 03.
  • Guha, Kim, Shim, VLDB 04.
  • Streaming Fourier approximation
  • Gilbert, Guha, Indyk, Muthukrishnan, Strauss,
    STOC, 02
  • Learning Fourier Coefficients
  • Linial, Kushilevitz, Mansour, JACM, 93
  • JL Lemma
  • Johnson, Lindenstrauss, , 84.
  • Sketches
  • Alon, Matias, Szegedy, JCSS, 99.
  • Feigenbaum, Kannan, Viswanathan, Strauss, FOCS,
    99
  • Indyk, FOCS, 00

155
Roads not taken
  • (but are relevant to synopsis)
  • Property Testing
  • Weighted sampling and SVD
  • Median Finding
  • Sampling based estimators