Maintaining Stream Statistics Over Sliding Windows

Paper by Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani. Presentation by Adam Morrison.


1
Maintaining Stream Statistics Over Sliding Windows
  • Paper by Mayur Datar, Aristides Gionis, Piotr
    Indyk, Rajeev Motwani

Presentation by Adam Morrison.
2
Sliding Window Intro
  • Infinite stream.
  • Only last N elements relevant.
  • Packet streams.
  • N is huge.
  • Stronger model

3
Model
  • Count memory bits.
  • Online algorithm.

[Figure: window elements labeled with arrival times 5, 6, 7 and timestamps 3, 2, 1]
4
Plan
  • Basic Counting
  • Given a bit stream, maintain at every time
    instant the count of 1s in the last N elements.
  • Sum
  • Given an integer stream, maintain the sum of the
    last N elements.
  • Everything else

5
Basic Counting
  • Exact Solution? (Counter?)

Exact solution requires Θ(N) bits.
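A minimal sketch of why a plain counter is not enough: to know whether the bit that just left the window was a 1, the exact solution has to remember every bit of the window. The class name `ExactWindowCounter` is illustrative and does not come from the slides.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1s among the last `window` bits (stores all of them)."""

    def __init__(self, window):
        self.bits = deque(maxlen=window)  # the whole window is kept in memory
        self.ones = 0

    def add(self, bit):
        if len(self.bits) == self.bits.maxlen:
            self.ones -= self.bits[0]     # the bit about to fall out of the window
        self.bits.append(bit)             # deque drops the oldest bit when full
        self.ones += bit
        return self.ones
```

The counter `ones` alone takes about log N bits; it is the stored window that costs N bits.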
6
Approximate Basic Counting
  • Solution: approximate the answer and bound the
    relative error.

7
The idea
Bucket sizes? Policy for creating new
buckets? What is it good for?
  • Dynamic histogram of active 1s.
  • New 1s go into the rightmost bucket.
  • For each bucket, keep the timestamp of the most
    recent 1 and the bucket's size.
  • When the timestamp expires, free the bucket.

8
Example (N = 4)
9
(Timestamps are easy)
Cyclic counter mod N.
N = 15
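A small sketch of the cyclic counter, assuming each element is stamped with the clock value at its arrival. The function names `tick` and `age` are illustrative, not taken from the slides.

```python
def tick(clock, n):
    """Advance the cyclic clock by one arrival."""
    return (clock + 1) % n


def age(clock, stamp, n):
    """Arrivals since the element stamped `stamp` came in (valid for ages 0..n-1)."""
    return (clock - stamp) % n
```

With N = 15, an element stamped 13 has age 5 when the clock reads 3. An element expires exactly when the clock returns to its stamp, so a bucket has to be freed at that moment; after that the stamp would be ambiguous.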
10
What does the histogram buy us?
  • Active bucket ⇔ contains an active 1.
  • Only the last bucket might contain expired 1s.

11
Estimating number of 1s
  • Conclusion
  • T = sum of all bucket sizes but the last.
  • So there are at least T 1s.
  • C = size of the last bucket.
  • Actual # of active 1s in the last bucket can be anything from 1 to C.
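A worked form of the bound, assuming the midpoint estimate T + C/2 (the estimate formula itself did not survive in this transcript, so treat it as an assumption taken from the paper):

```latex
\[
T + 1 \;\le\; \text{actual} \;\le\; T + C,
\qquad
\text{estimate} = T + \frac{C}{2},
\qquad
\lvert \text{estimate} - \text{actual} \rvert \;\le\; \frac{C}{2}.
\]
\[
\text{Example: } T = 6,\ C = 4
\;\Rightarrow\;
7 \le \text{actual} \le 10,\quad
\text{estimate} = 8,\quad
\text{absolute error} \le 2,\quad
\text{relative error} \le \frac{C/2}{T+1} = \frac{2}{7}.
\]
```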

12
Absolute → Relative
13
Bounding the error
  • Goal: relative error at most ε = 1/k.

14
How can we do that? (With as few buckets as
possible?)
  • Non-decreasing bucket sizes.
  • Bucket sizes constrained to powers of 2: 1, 2, 4, ..., 2^m.
  • At most k/2 + 1 buckets of each size.
  • For all sizes but that of the last bucket, at least
    k/2 buckets of each size.

15
New 1: create a bucket of size 1.
Check if the invariant is violated.
Too many buckets of one size: merge the two oldest into one bucket of twice the size.
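The create, check, merge loop can be sketched as follows. This is a hedged illustration, not the presenter's code: it assumes a cap of k/2 + 1 buckets per size, a midpoint estimate T + C/2, and a plain arrival counter in place of the mod-N timestamp. The class and attribute names are hypothetical.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits."""

    def __init__(self, window, k):
        self.window = window
        self.max_per_size = k // 2 + 1  # assumed cap on buckets of one size
        self.buckets = []               # (arrival of newest 1, size), oldest first
        self.total = 0                  # sum of all bucket sizes
        self.now = 0                    # plain arrival counter, not a mod-N stamp

    def add(self, bit):
        self.now += 1
        # Free the oldest bucket once its most recent 1 has left the window.
        if self.buckets and self.buckets[0][0] <= self.now - self.window:
            self.total -= self.buckets.pop(0)[1]
        if bit:
            self.buckets.append((self.now, 1))
            self.total += 1
            self._merge()

    def _merge(self):
        size, end = 1, len(self.buckets)  # buckets[start:end] is the run of `size`
        while True:
            start = end
            while start > 0 and self.buckets[start - 1][1] == size:
                start -= 1
            if end - start <= self.max_per_size:
                return
            # Merge the two oldest buckets of this size; keep the newer timestamp.
            newer_stamp = self.buckets[start + 1][0]
            self.buckets[start:start + 2] = [(newer_stamp, 2 * size)]
            size, end = 2 * size, start + 1  # the merge may cascade to the next size

    def estimate(self):
        if not self.buckets:
            return 0
        return self.total - self.buckets[0][1] / 2  # T + C/2
```

For example, with k = 2 and seven consecutive 1s in a large window, the bucket sizes (oldest first) end up as 4, 2, 1. The list operations here are linear in the number of buckets, which keeps the sketch short but is not how a tuned implementation would store them.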
16
Why it works (correctness)
  • If the last bucket has size 2^j, there are at least
    k/2 buckets of each of the sizes 1, 2, 4, ..., 2^(j-1),
    so T ≥ (k/2)(2^j − 1).
17
Why it works (space)
  • Can account for all 1s with just O(k log N) buckets.


18
Space usage
# of buckets
Bucket size
T: counter for estimation
19
Operations
  • Estimation: O(1)
  • Insertion: cascading merges make it O(log N)
    worst case.
  • But only O(1) amortized!

20
Plan
  • Basic Counting
  • Given a bit stream, maintain at every time
    instant the count of 1s in the last N elements.
  • Sum
  • Given an integer stream, maintain the sum of the
    last N elements.
  • Everything else

21
Extending to Sum
  • Integers in range [0, R].
  • On value V, insert V 1s.
  • Timestamps
  • Bucket counter
  • # of buckets
  • Total space

Insertion takes Ω(R)!
22
Reducing insertion time
  • If we had a way to rebuild the entire histogram
  • We could buffer new values
  • And rebuild histogram when buffer reaches size B.
  • If rebuilding takes time t, the
    amortized cost per element is O(t/B).

23
k/2 canonical representation
Would it really? Is this representation unique?
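  • The k/2 canonical representation of S: counts k_0, ..., k_j with
    S = k_0·1 + k_1·2 + ... + k_j·2^j, where k/2 ≤ k_i ≤ k/2 + 1
    for i < j and 1 ≤ k_j ≤ k/2 + 1.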

If S is the total size of the buckets, computing
its k/2 canonical representation would help us
rebuild the histogram.
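One way to compute such a representation, as a sketch under stated assumptions: k is even, every size below the largest gets k/2 or k/2 + 1 buckets, and the largest size gets between 1 and k/2 + 1. The function name and the exact procedure are illustrative and are not claimed to be the paper's algorithm.

```python
def canonical_representation(s, k):
    """Return [k_0, ..., k_j] with s == sum(k_i * 2**i).

    Assumed constraints: k/2 <= k_i <= k/2 + 1 for i < j, and
    1 <= k_j <= k/2 + 1.
    """
    if s < 1 or k < 2 or k % 2:
        raise ValueError("need s >= 1 and an even k >= 2")
    half = k // 2
    # Largest j such that giving every lower size the minimum k/2 still fits.
    j = 0
    while half * (2 ** (j + 1) - 1) < s:
        j += 1
    rest = s - half * (2 ** j - 1)
    if j > 0 and rest < 2 ** j:
        # Not enough left for even one bucket of size 2**j: drop one level.
        j -= 1
        rest = s - half * (2 ** j - 1)
    top, low = divmod(rest, 2 ** j)
    # Each set bit of `low` adds one extra bucket at that size.
    counts = [half + ((low >> i) & 1) for i in range(j)]
    counts.append(top)
    return counts
```

For instance, `canonical_representation(11, 2)` gives `[1, 1, 2]`, that is, one bucket of size 1, one of size 2 and two of size 4. The loop runs once per level, so the number of steps grows with log S.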
24
Total time required is O(log S).
25
If a value gets unindexed, it will never be
indexed in the future.
[Figure: worked example. Rows: 8 6 4 3 2 1; 9 7 5 4 3 2; 10 8 6 5 4 3. Calculate S1S2 representation: 10 6 2 1 1 1 1]
26
Plan
  • Basic Counting
  • Given a bit stream, maintain at every time
    instant the count of 1s in the last N elements.
  • Sum
  • Given an integer stream, maintain the sum of the
    last N elements.
  • Everything else
  • Lower Bounds
  • More about timestamps.
  • Applications.
  • More problems

27
Lower bounds
  • Basic Counting and Sum algorithms are optimal.
  • Similar techniques will show that lots of other
    problems are intractable. (Later.)

28
Basic Counting bound
29
Big block d
Same idea works for Sum.
30
Randomized bound
Lower bound applies to randomized algorithms.
  • Yao minimax principle
  • The expected space complexity of the optimal deterministic
    algorithm for an input distribution is a lower bound on the
    expected space complexity of a randomized algorithm.

31
Timestamps

If far fewer than N items arrive during the
window, memory usage is reduced.
  • Define the window based on real time: equate
    timestamp with clock.
  • No work needs to be done when items don't arrive,
    so deletions can be deferred.

32
Applications
  • Adapting algorithms to the sliding window model
    by using exponential histograms (EH) to replace counters.
  • Counters require O(log N) bits, EH takes
    O(k log² N) bits.
  • Also a factor (1 + 1/k) loss in accuracy.

33
More Problems
  • Min/Max
  • Storing subsequence of (say) mins is optimal.
  • Distinct values
  • Basic Counting reduces to it.
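For the Min/Max bullet, a sketch of what "storing a subsequence of mins" can look like: keep only the elements that are still the minimum of some suffix of the window. The class name `SlidingMin` and its layout are assumptions for illustration.

```python
from collections import deque


class SlidingMin:
    """Minimum of the last `window` values, keeping only suffix minima."""

    def __init__(self, window):
        self.window = window
        self.now = 0
        self.candidates = deque()  # (arrival, value), values increasing left to right

    def add(self, value):
        self.now += 1
        # Older values that are not smaller can never be the minimum again.
        while self.candidates and self.candidates[-1][1] >= value:
            self.candidates.pop()
        self.candidates.append((self.now, value))
        # Drop the front candidate once it has left the window.
        while self.candidates[0][0] <= self.now - self.window:
            self.candidates.popleft()
        return self.candidates[0][1]
```

On an increasing stream every element stays a candidate, so the stored subsequence can be as long as the window itself.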

34
Other Problems
  • Distinct values with deletions.
  • Factor 2 estimation requires Ω(N) space.
  • Map 1s in a bit string to distinct values. Pad
    with zeros to infer value of last bit, then use
    deletion to cancel that bit.
  • Repeat.

35
Other Problems
  • Sum with negative integers.
  • Factor 2 estimation requires Ω(N) space.
  • Map 1s in the bit string to (-1,1) and 0s to (1,-1).
  • Pad with 0s and query at odd time instants.

36
END