CS 361A Advanced Data Structures and Algorithms - PowerPoint PPT Presentation

About This Presentation
Title:

CS 361A Advanced Data Structures and Algorithms

Description:

Synopsis Data Structures. Sampling Techniques. Frequency Moments Problem ... Synopsis Data Structures 'Lossy' ... Synopsis Size 420 KB (0.1 ... – PowerPoint PPT presentation

Slides: 73
Provided by: RajeevM2
Transcript and Presenter's Notes

Title: CS 361A Advanced Data Structures and Algorithms


1
CS 361A (Advanced Data Structures and Algorithms)
  • Lectures 16 and 17 (Nov
    16 and 28, 2005)
  • Synopses, Samples, and Sketches
  • Rajeev Motwani

2
Game Plan for Week
  • Last Class
  • Models for Streaming/Massive Data Sets
  • Negative results for Exact Distinct Values
  • Hashing for Approximate Distinct Values
  • Today
  • Synopsis Data Structures
  • Sampling Techniques
  • Frequency Moments Problem
  • Sketching Techniques
  • Finding High-Frequency Items

3
Synopsis Data Structures
  • Synopses
  • Webster: a condensed statement or outline (as of
    a narrative or treatise)
  • CS 361A: a succinct data structure that lets us
    answer queries efficiently
  • Synopsis Data Structures
  • 'Lossy' Summary (of a data stream)
  • Advantages: fits in memory, easy to communicate
  • Disadvantage: lossiness implies approximation
    error
  • Negative Results ⇒ best we can do
  • Key Techniques: randomization and hashing

4
Numerical Examples
  • Approximate Query Processing [AQUA/Bell Labs]
  • Database Size: 420 MB
  • Synopsis Size: 420 KB (0.1%)
  • Approximation Error: within 10%
  • Running Time: 0.3% of time for exact query
  • Histograms/Quantiles [Chaudhuri-Motwani-Narasayya,
    Manku-Rajagopalan-Lindsay, Khanna-Greenwald]
  • Data Size: $10^9$ items
  • Synopsis Size: 1249 items
  • Approximation Error: within 1%

5
Synopses
  • Desiderata
  • Small Memory Footprint
  • Quick Update and Query
  • Provable, low-error guarantees
  • Composable for distributed scenario
  • Applicability?
  • General-purpose: e.g. random samples
  • Specific-purpose: e.g. distinct values estimator
  • Granularity?
  • Per database: e.g. sample of entire table
  • Per distinct value: e.g. customer profiles
  • Structural: e.g. GROUP-BY or JOIN result samples

6
Examples of Synopses
  • Synopses need not be fancy!
  • Simple Aggregates e.g. mean/median/max/min
  • Variance?
  • Random Samples
  • Aggregates on small samples represent entire data
  • Leverage extensive work on confidence intervals
  • Random Sketches
  • structured samples
  • Tracking High-Frequency Items

7
Random Samples
8
Types of Samples
  • Oblivious: sampling at item level
  • Limitations [Bar-Yossef, Kumar, Sivakumar; STOC '01]
  • Value-based sampling: e.g. distinct-value
    samples
  • Structured samples: e.g. join sampling
  • Naïve approach: keep samples of each relation
  • Problem: sample-of-join ≠ join-of-samples
  • Foreign-Key Join [Chaudhuri-Motwani-Narasayya;
    SIGMOD '99]

[Figure: relations L and R joined on values A and B. What if A is sampled from L and B from R?]
9
Basic Scenario
  • Goal: maintain uniform sample of item-stream
  • Sampling Semantics?
  • Coin flip
  • select each item with probability $p$
  • easy to maintain
  • undesirable: sample size is unbounded
  • Fixed-size sample without replacement
  • Our focus today
  • Fixed-size sample with replacement
  • Show: can generate from previous sample
  • Non-Uniform Samples [Chaudhuri-Motwani-Narasayya]

10
Reservoir Sampling [Vitter]
  • Input: stream of items $X_1, X_2, X_3, \dots$
  • Goal: maintain uniform random sample $S$ of size $n$
    (without replacement) of stream so far
  • Reservoir Sampling
  • Initialize: include first $n$ elements in $S$
  • Upon seeing item $X_t$
  • Add $X_t$ to $S$ with probability $n/t$
  • If added, evict random previous item
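
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Vitter's optimized algorithm: the function name `reservoir_sample` and the optional `rng` argument are assumptions made here, and one random number is drawn per stream item (the naive implementation).

```python
import random


def reservoir_sample(stream, n, rng=None):
    """Return a uniform random sample of n items (without replacement)."""
    rng = rng or random.Random()
    sample = []
    for t, item in enumerate(stream, start=1):
        if t <= n:
            # Initialize: include the first n elements in S.
            sample.append(item)
        elif rng.random() < n / t:
            # Add X_t with probability n/t, evicting a random previous item.
            sample[rng.randrange(n)] = item
    return sample
```

If the stream has fewer than `n` items, the sketch simply returns all of them.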

11
Analysis
  • Correctness?
  • Fact: at each instant, $|S| = n$
  • Theorem: at time $t$, any $X_i \in S$ with probability $n/t$
  • Exercise: prove via induction on $t$
  • Efficiency?
  • Let $N$ be stream size
  • Remark: verify this is optimal.
  • Naïve implementation ⇒ $N$ coin flips ⇒ time $O(N)$

12
Improving Efficiency
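[Figure: stream positions at which items are inserted into sample S (where n = 3), with skips $J_3 = 2$ and $J_9 = 4$]
  • Random variable $J_t$: number jumped over after
    time $t$
  • Idea: generate $J_t$ and skip that many items
  • Cumulative Distribution Function: $F(s) = P[J_t \le s]$,
    for $t > n$ and $s \ge 0$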

13
Analysis
  • Number of calls to RANDOM()?
  • one per insertion into sample
  • this is optimal!
  • Generating $J_t$?
  • Pick random number $U \in [0, 1]$
  • Find smallest $j$ such that $U \le F(j)$
  • How?
  • Linear scan ⇒ $O(N)$ time
  • Binary search with Newton's interpolation ⇒
    $O(n^2(1 + \mathrm{polylog}(N/n)))$ time
  • Remark: see paper for optimal algorithm
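
A minimal sketch of the linear-scan option. The closed form used for $F(s)$ is not on the slide: it is derived here from the reservoir rule (item $i$ is added with probability $n/i$), so $P[J_t > s]$ is the probability that items $t+1, \dots, t+s+1$ are all skipped. The names `skip_cdf` and `generate_skip` are illustrative.

```python
import random


def skip_cdf(n, t, s):
    """F(s) = P[J_t <= s] for reservoir size n after t items (t >= n)."""
    p_all_skipped = 1.0
    for i in range(t + 1, t + s + 2):
        p_all_skipped *= (i - n) / i   # item i is skipped w.p. 1 - n/i
    return 1.0 - p_all_skipped


def generate_skip(n, t, u):
    """Smallest j with u <= F(j), found by a linear scan (u in [0, 1))."""
    j = 0
    p_all_skipped = (t + 1 - n) / (t + 1)          # P[J_t > 0]
    while u > 1.0 - p_all_skipped:
        j += 1
        p_all_skipped *= (t + j + 1 - n) / (t + j + 1)
    return j


if __name__ == "__main__":
    # One call to the random number generator per insertion into the sample.
    print(generate_skip(3, 3, random.random()))
```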

14
Sampling over Sliding Windows [Babcock-Datar-Motwani]
  • Sliding Window $W$: last $w$ items in stream
  • Model: item $X_t$ expires at time $t + w$
  • Why?
  • Applications may require ignoring stale data
  • Type of approximation
  • Only way to define JOIN over streams
  • Goal: maintain uniform sample of size $n$ of
    sliding window

15
Reservoir Sampling?
  • Observe
  • any item in sample $S$ will expire eventually
  • must replace with random item of current window
  • Problem
  • no access to items in $W - S$
  • storing entire window requires $O(w)$ memory
  • Oversampling
  • Backing sample $B$: select each item with
    probability
  • sample $S$: select $n$ items from $B$ at random
  • upon expiry in $S$ ⇒ replenish from $B$
  • Claim: $n < |B| < n \log w$ with high probability

16
Index-Set Approach
  • Pick random index set $I = \{i_1, \dots, i_n\} \subseteq \{0, 1, \dots, w-1\}$
  • Sample $S$: items $X_i$ with $i \in \{i_1, \dots, i_n\} \pmod w$
    in current window
  • Example
  • Suppose $w = 2$, $n = 1$, and $I = \{1\}$
  • Then sample is always $X_i$ with odd $i$
  • Memory: only $O(n)$
  • Observe
  • $S$ is uniform random sample of each window
  • But sample is periodic (union of arithmetic
    progressions)
  • Correlation across successive windows
  • Problems
  • Correlation may hurt in some applications
  • Some data (e.g. time-series) may be periodic

17
Chain-Sample Algorithm
  • Idea
  • Fix expiry problem in Reservoir Sampling
  • Advance planning for expiry of sampled items
  • Focus on sample size 1; keep $n$ independent such
    samples
  • Chain-Sampling
  • Add $X_t$ to $S$ with probability $1/\min(t, w)$; evict
    earlier sample
  • Initially: standard Reservoir Sampling up to
    time $w$
  • Pre-select $X_t$'s replacement $X_r \in W_{t+w} = \{X_{t+1}, \dots,
    X_{t+w}\}$
  • $X_t$ expires ⇒ must replace from $W_{t+w}$
  • At time $r$, save $X_r$ and pre-select its own
    replacement ⇒ building chain of potential
    replacements
  • Note: if evicting earlier sample, discard its
    chain as well
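
The chain-sampling rules above can be sketched for a sample of size 1 as follows. This is an illustrative reading of the slide, not the authors' code: the class name `ChainSample`, its methods, and the choice to store `(index, item)` pairs in a deque are all assumptions made here.

```python
import random
from collections import deque


class ChainSample:
    """Maintain one uniform sample over the last w items of a stream."""

    def __init__(self, w, rng=None):
        self.w = w
        self.rng = rng or random.Random()
        self.t = 0                # number of items seen so far
        self.chain = deque()      # (index, item); the head is the current sample
        self.next_idx = None      # pre-selected index of the tail's replacement

    def add(self, item):
        self.t += 1
        t, w = self.t, self.w
        if self.rng.random() < 1.0 / min(t, w):
            # X_t becomes the sample: evict the earlier sample and its chain.
            self.chain.clear()
            self.chain.append((t, item))
            self.next_idx = self.rng.randint(t + 1, t + w)
        elif t == self.next_idx:
            # X_t was pre-selected: save it and pre-select its own replacement.
            self.chain.append((t, item))
            self.next_idx = self.rng.randint(t + 1, t + w)
        if self.chain[0][0] <= t - w:
            # The current sample has expired: advance along the chain.
            self.chain.popleft()

    def sample(self):
        return self.chain[0][1]
```

For a sample of size n, one would keep n independent `ChainSample` objects, as the slide suggests.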

18
Example
3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3
19
Expectation for Chain-Sample
  • $T(x) = E[\text{chain length for } X_t \text{ at time } t + x]$
  • $E[\text{chain length}] = T(w) \le e \approx 2.718$
  • $E[\text{memory required for sample size } n] = O(n)$

20
Tail Bound for Chain-Sample
  • Chain: hops of total length at most $w$
  • Chain of $h$ hops ⇒ ordered $(h+1)$-partition of $w$
  • $h$ hops of total length less than $w$
  • plus, remainder
  • Each partition has probability $w^{-h}$
  • Number of partitions
  • $h = O(\log w)$ ⇒ probability of a partition is
    $O(w^{-c})$
  • Thus memory $O(n \log w)$ with high probability

21
Comparison of Algorithms
  • Chain-Sample beats Oversample
  • Expected memory: $O(n)$ vs $O(n \log w)$
  • High-probability memory bound: both $O(n \log w)$
  • Oversample may have sample size shrink below $n$!

22
Sketches and Frequency Moments
23
Generalized Stream Model
  • Input: element $(i, a)$
  • $a$ copies of domain-value $i$
  • increment to $i$-th dimension of $m$ by $a$
  • $a$ need not be an integer
  • Negative value captures deletions

24
Example
[Figure: bar chart of the frequency vector $(m_0, m_1, m_2, m_3, m_4)$ on seeing element $(i, a) = (1, -1)$]
25
Frequency Moments
  • Input Stream
  • values from $U = \{0, 1, \dots, N-1\}$
  • frequency vector $m = (m_0, m_1, \dots, m_{N-1})$
  • $k$-th Frequency Moment: $F_k(m) = \sum_i m_i^k$
  • $F_0$: number of distinct values (Lecture 15)
  • $F_1$: stream size
  • $F_2$: Gini index, self-join size, Euclidean norm
  • $F_k$ for $k > 2$: measures skew, sometimes useful
  • $F_\infty$: maximum frequency
  • Problem: estimation in small space
  • Sketches: randomized estimators
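
For reference, the exact moments can be computed with one counter per distinct value (the space-N baseline that the sketches try to avoid). A small sketch; the function name `frequency_moment` and the toy stream are assumptions for illustration.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k of a non-empty stream; k = float('inf') gives the max frequency."""
    freq = Counter(stream)
    if k == float("inf"):
        return max(freq.values())
    return sum(m ** k for m in freq.values())


if __name__ == "__main__":
    toy = [1, 2, 1, 3, 1, 2]          # frequencies: 1 -> 3, 2 -> 2, 3 -> 1
    print(frequency_moment(toy, 0))   # number of distinct values
    print(frequency_moment(toy, 1))   # stream size
    print(frequency_moment(toy, 2))   # self-join size
```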

26
Naive Approaches
  • Space $N$: counter $m_i$ for each distinct value $i$
  • Space $O(1)$
  • if input sorted by $i$
  • single counter recycled when new $i$ value appears
  • Goal
  • Allow arbitrary input
  • Use small (logarithmic) space
  • Settle for randomization/approximation

27
Sketching F2
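  • Random Hash $h: \{0, 1, \dots, N-1\} \to \{-1, +1\}$
  • Define $Z_i = h(i)$
  • Maintain $X = \sum_i m_i Z_i$
  • Easy for update streams $(i, a)$: just add $aZ_i$ to $X$
  • Claim: $X^2$ is unbiased estimator for $F_2$
  • Proof: $E[X^2] = E[(\sum_i m_i Z_i)^2]$
    $= E[\sum_i m_i^2 Z_i^2] + E[\sum_{i \ne j} m_i m_j Z_i Z_j]$
    $= \sum_i m_i^2 E[Z_i^2] + \sum_{i \ne j} m_i m_j E[Z_i] E[Z_j]$   (from independence)
    $= \sum_i m_i^2 + 0 = F_2$
  • Last Line? $Z_i^2 = 1$ and $E[Z_i] = 0$, as $Z_i$ is
    uniform on $\{-1, +1\}$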
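
A minimal sketch of the single-counter estimator, with the sign vector stored explicitly (so it uses O(N) memory; the slides address that later). The class name `F2Sketch` and the helper `expected_estimate`, which averages $X^2$ over all $2^N$ sign vectors to check unbiasedness on a tiny example, are illustrative assumptions.

```python
from itertools import product


class F2Sketch:
    """Maintain X = sum_i m_i * Z_i for a fixed sign vector Z."""

    def __init__(self, signs):
        self.signs = list(signs)   # Z_i in {-1, +1}
        self.x = 0

    def update(self, i, a=1):
        # Element (i, a): just add a * Z_i to X.
        self.x += a * self.signs[i]

    def estimate(self):
        return self.x ** 2


def expected_estimate(m):
    """Average of X^2 over all 2^N sign vectors (feasible only for tiny N)."""
    total = 0
    for signs in product((-1, 1), repeat=len(m)):
        sketch = F2Sketch(signs)
        for i, m_i in enumerate(m):
            sketch.update(i, m_i)
        total += sketch.estimate()
    return total / 2 ** len(m)
```

On the frequency vector (3, 2, 1, 0) the average equals F2 = 9 + 4 + 1 exactly, because the cross terms cancel.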
28
Estimation Error?
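  • Chebyshev bound
  • Define $Y = X^2$ ⇒ $E[Y] = E[X^2] = \sum_i m_i^2 = F_2$
  • Observe $E[X^4] = E[(\sum_i m_i Z_i)^4]$
    $= E[\sum m_i^4 Z_i^4] + 4E[\sum m_i m_j^3 Z_i Z_j^3] + 6E[\sum m_i^2 m_j^2 Z_i^2 Z_j^2]$
    $+ 12E[\sum m_i m_j m_k^2 Z_i Z_j Z_k^2] + 24E[\sum m_i m_j m_k m_l Z_i Z_j Z_k Z_l]$
    $= \sum m_i^4 + 6\sum m_i^2 m_j^2$   (Why?)
    (mixed sums run over distinct indices; in $\sum m_i^2 m_j^2$ each pair $i < j$ is counted once)
  • By definition $\mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = E[X^4] - E[X^2]^2$
    $= [\sum m_i^4 + 6\sum m_i^2 m_j^2] - [\sum m_i^4 + 2\sum m_i^2 m_j^2]$
    $= 4\sum m_i^2 m_j^2 \le 2E[X^2]^2 = 2F_2^2$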
29
Estimation Error?
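  • Chebyshev bound
  • $P[\text{relative estimation error} > \lambda] \le \mathrm{Var}[Y] / (\lambda^2 F_2^2) \le 2/\lambda^2$
  • Problem: what if we want $\lambda$ really small?
  • Solution
  • Compute $s = 8/\lambda^2$ independent copies of $X$
  • Estimator $Y = \mathrm{mean}(X_i^2)$
  • Variance reduces by factor $s$
  • $P[\text{relative estimation error} > \lambda] \le 2/(s\lambda^2) = 1/4$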

30
Boosting Technique
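  • Algorithm A: randomized $\lambda$-approximate estimator $\hat{f}$ of $f$
  • $P[(1 - \lambda)f \le \hat{f} \le (1 + \lambda)f] \ge 3/4$
  • Heavy Tail Problem: $P[\hat{f} = f - z,\ f,\ f + z] = [1/16,\ 3/4,\ 3/16]$
  • Boosting Idea
  • $O(\log 1/\varepsilon)$ independent estimates from A(X)
  • Return median of estimates
  • Claim: $P[\text{median is } \lambda\text{-approximate}] > 1 - \varepsilon$
  • Proof
  • $P[\text{specific estimate is } \lambda\text{-approximate}] \ge 3/4$
  • Bad event only if more than 50% of estimates are not
    $\lambda$-approximate
  • Binomial tail probability less than $\varepsilon$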
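
The two steps (average s copies to shrink the variance, then take a median of independent averages to shrink the failure probability) can be sketched as below. This is an illustration only: it draws fresh, fully independent random signs for every copy and takes the frequency vector `m` as a list, which a real streaming implementation would not do. The function name and parameters `s`, `groups`, `rng` are assumptions.

```python
import random
from statistics import mean, median


def f2_median_of_means(m, s, groups, rng=None):
    """Median over `groups` independent means of s copies of X^2."""
    rng = rng or random.Random()
    means = []
    for _ in range(groups):
        copies = []
        for _ in range(s):
            # One basic estimator: X = sum_i m_i * Z_i with random signs Z_i.
            x = sum(m_i * rng.choice((-1, 1)) for m_i in m)
            copies.append(x * x)
        means.append(mean(copies))
    return median(means)


if __name__ == "__main__":
    m = [5, 3, 2, 2, 1]
    print(sum(v * v for v in m), f2_median_of_means(m, s=32, groups=9))
```

With relative error 0.5, the slide's rule s = 8 / 0.5**2 gives the 32 copies used in the usage line; the number of groups plays the role of the O(log 1/ε) repetitions.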

31
Overall Space Requirement
  • Observe
  • Let $m = \sum_i m_i$
  • Each hash needs $O(\log m)$-bit counter
  • $s = 8/\lambda^2$ hash functions for each estimator
  • $O(\log 1/\varepsilon)$ such estimators
  • Total: $O(\lambda^{-2} \log(1/\varepsilon) \log m)$ bits
  • Question: space for storing hash function?

32
Sketching Paradigm
  • Random Sketch: inner product
  • frequency vector $m = (m_0, m_1, \dots, m_{N-1})$
  • random vector $Z$ (currently, uniform $\{-1, +1\}$)
  • Observe
  • Linearity ⇒ Sketch($m_1$) + Sketch($m_2$) = Sketch
    ($m_1 + m_2$)
  • Ideal for distributed computing
  • Observe
  • Suppose: given $i$, can efficiently generate $Z_i$
  • Then can maintain sketch for update streams
  • Problem
  • Must generate $Z_i = h(i)$ on first appearance of $i$
  • Need $O(N)$ memory to store $h$ explicitly
  • Need $O(N)$ random bits

33
Two birds, One stone
  • Pairwise Independent $Z_1, Z_2, \dots, Z_n$
  • for all $Z_i$ and $Z_k$, $P[Z_i = x, Z_k = y] =
    P[Z_i = x] \cdot P[Z_k = y]$
  • property: $E[Z_i Z_k] = E[Z_i] \cdot E[Z_k]$
  • Example: linear hash function
  • Seed $S = \langle a, b \rangle$ from $[0..p-1]$, where $p$ is prime
  • $Z_i = h(i) = ai + b \pmod p$
  • Claim: $Z_1, Z_2, \dots, Z_n$ are pairwise independent
  • $Z_i = x$ and $Z_k = y$ ⇔ $x = ai + b \pmod p$ and $y = ak + b \pmod
    p$
  • fixing $i, k, x, y$ ⇒ unique solution for $a, b$
  • $P[Z_i = x, Z_k = y] = 1/p^2 = P[Z_i = x] \cdot P[Z_k = y]$
  • Memory/Randomness: $n \log p$ → $2 \log p$
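
The claim can be checked by brute force for a small prime: enumerate every seed (a, b) and count how often each pair of hash values occurs. A sketch, with `linear_hash` and `joint_distribution` as illustrative names.

```python
from collections import Counter
from itertools import product


def linear_hash(a, b, p):
    """h(i) = a*i + b (mod p) for seed <a, b>."""
    return lambda i: (a * i + b) % p


def joint_distribution(i, k, p):
    """Count, over all p^2 seeds, how often (h(i), h(k)) takes each value."""
    counts = Counter()
    for a, b in product(range(p), repeat=2):
        h = linear_hash(a, b, p)
        counts[(h(i), h(k))] += 1
    return counts


if __name__ == "__main__":
    dist = joint_distribution(2, 5, 7)
    # Each of the 49 value pairs should be hit by exactly one seed.
    print(len(dist), set(dist.values()))
```

Each pair (x, y) being produced by exactly one of the p² seeds is the "unique solution for a, b" step, and it requires i and k to be distinct modulo p.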

34
Wait a minute!
  • Doesn't pairwise independence screw up proofs?
  • No: the $E[X^2]$ calculation only has degree-2 terms
  • But what about $\mathrm{Var}[X^2]$?
  • Need 4-wise independence

35
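Application: Join-Size Estimation
  • Given
  • Join attribute frequencies $f_1$ and $f_2$
  • Join size $= f_1 \cdot f_2$
  • Define $X_1 = f_1 \cdot Z$ and $X_2 = f_2 \cdot Z$
  • Choose $Z$ as 4-wise independent, uniform $\{-1, +1\}$
  • Exercise: show, as before,
  • $E[X_1 X_2] = f_1 \cdot f_2$
  • $\mathrm{Var}[X_1 X_2] \le 2(|f_1| \cdot |f_2|)^2$
  • Hint: $|a \cdot b| \le |a| \cdot |b|$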
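
The expectation part of the exercise can be sanity-checked numerically on a tiny domain by averaging the product of the two sketches over every sign vector. This uses fully independent signs and exhaustive enumeration purely for illustration; the names `sketch` and `expected_join_estimate` are assumptions.

```python
from itertools import product


def sketch(f, z):
    """Inner product f . Z of a frequency vector with a sign vector."""
    return sum(f_i * z_i for f_i, z_i in zip(f, z))


def expected_join_estimate(f1, f2):
    """Average of X1 * X2 over all 2^N sign vectors (tiny N only)."""
    n = len(f1)
    total = 0
    for z in product((-1, 1), repeat=n):
        total += sketch(f1, z) * sketch(f2, z)
    return total / 2 ** n


if __name__ == "__main__":
    f1, f2 = [1, 2, 0], [3, 0, 4]
    print(expected_join_estimate(f1, f2), sketch(f1, f2))   # both are the join size
```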

36
Bounding Error Probability
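  • Using $s$ copies of $X$'s and taking their mean $Y$
  • $\Pr[\,|Y - f_1 \cdot f_2| \ge \lambda\, f_1 \cdot f_2\,] \le \mathrm{Var}(Y) /
    \lambda^2 (f_1 \cdot f_2)^2$
    $\le 2|f_1|^2 |f_2|^2 / s\lambda^2 (f_1 \cdot f_2)^2$
    $= 2 / s\lambda^2 \cos^2\theta$
  • Bounding error probability?
  • Need $s > 2/\lambda^2 \cos^2\theta$
  • Memory? $O(\log(1/\varepsilon)\, \cos^{-2}\theta\, \lambda^{-2} (\log N + \log
    m))$
  • Problem
  • To choose $s$, need a-priori lower bound on $\cos\theta$
    (i.e., on $f_1 \cdot f_2$)
  • What if $\cos\theta$ is really small?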

37
Sketch Partitioning
Idea for dealing with the $|f_1|^2 |f_2|^2 / (f_1 \cdot f_2)^2$ issue:
partition domain into regions where self-join
size is smaller, to compensate for small join-size
($\cos\theta$)
  • Without partitioning: self-join(R1.A) × self-join(R2.B) = 205 × 205 ≈ 42K
  • With two partitions: self-join(R1.A) × self-join(R2.B)
    + self-join(R1.A) × self-join(R2.B) = 200 × 5 + 200 × 5
    = 2K
38
Sketch Partitioning
  • Idea
  • intelligently partition join-attribute space
  • need coarse statistics on stream
  • build independent sketches for each partition
  • Estimate = Σ partition sketches
  • Variance = Σ partition variances

39
Sketch Partitioning
  • Partition Space Allocation?
  • Can solve optimally, given domain partition
  • Optimal Partition: find K-partition to minimize
  • Results
  • Dynamic Programming: optimal solution for single
    join
  • NP-hard for queries with multiple joins

40
$F_k$ for $k > 2$
  • Assume stream length $m$ is known (Exercise:
    show can fix with $\log m$ space overhead by
    repeated-doubling estimate of $m$.)
  • Choose random stream item $a_p$ ⇒ $p$
    uniform from $\{1, 2, \dots, m\}$
  • Suppose $a_p = v \in \{0, 1, \dots, N-1\}$
  • Count subsequent frequency of $v$
  • $r = |\{q : q \ge p,\ a_q = v\}|$
  • Define $X = m(r^k - (r-1)^k)$

41
Example
  • Stream
  • 7,8,5,1,7,5,2,1,5,4,5,10,6,5,4,1,4,7,3,8
  • $m = 20$
  • $p = 9$
  • $a_p = 5$
  • $r = 3$

42
$F_k$ for $k > 2$
  • $\mathrm{Var}(X) \le kN^{1-1/k} F_k^2$
  • Bounded Error Probability ⇒ $s = O(kN^{1-1/k} / \lambda^2)$
  • Boosting ⇒ memory bound
  • $O(kN^{1-1/k} \lambda^{-2} (\log 1/\varepsilon)(\log N +
    \log m))$

Summing over m choices of stream elements
43
Frequency Moments
  • $F_0$: distinct values problem (Lecture 15)
  • $F_1$: sequence length
  • for case with deletions, use Cauchy distribution
  • $F_2$: self-join size/Gini index (Today)
  • $F_k$ for $k > 2$
  • omitting grungy details
  • can achieve space bound
  • $O(kN^{1-1/k} \lambda^{-2} (\log 1/\varepsilon)(\log N + \log m))$
  • $F_\infty$: maximum frequency

44
Communication Complexity
  • Cooperatively compute function f(A,B)
  • Minimize bits communicated
  • Unbounded computational power
  • Communication Complexity $C(f)$: bits exchanged by
    optimal protocol $\Pi$
  • Protocols?
  • 1-way versus 2-way
  • deterministic versus randomized
  • $C_\delta(f)$: randomized complexity for error
    probability $\delta$

[Figure: ALICE holds input A; BOB holds input B]
45
Streaming Communication Complexity
  • Stream Algorithm ⇒ 1-way communication protocol
  • Simulation Argument
  • Given algorithm S computing f over streams
  • Alice initiates S, providing A as input stream
    prefix
  • Communicates to Bob S's state after seeing A
  • Bob resumes S, providing B as input stream
    suffix
  • Theorem: a stream algorithm's space requirement is
    at least the communication complexity $C(f)$

46
Example: Set Disjointness
  • Set Disjointness (DIS)
  • $A, B$ subsets of $\{1, 2, \dots, N\}$
  • Output
  • Theorem: $C_\delta(\mathrm{DIS}) = \Omega(N)$, for any $\delta < 1/2$

47
Lower Bound for $F_\infty$
  • Theorem: fix $\varepsilon < 1/3$, $\delta < 1/2$. Any stream algorithm S
    with
  • $P[(1 - \varepsilon)F_\infty < S < (1 + \varepsilon)F_\infty] > 1 - \delta$
  • needs $\Omega(N)$ space
  • Proof
  • Claim: S ⇒ 1-way protocol for DIS (on any sets A
    and B)
  • Alice streams set A to S
  • Communicates S's state to Bob
  • Bob streams set B to S
  • Observe
  • Relative Error $\varepsilon < 1/3$ ⇒ DIS solved exactly!
  • $P[\text{error}] < \delta < 1/2$ ⇒ $\Omega(N)$ space
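
The reduction can be illustrated with an exact maximum-frequency routine standing in for the streaming algorithm S. Because A and B are sets, the concatenated stream has maximum frequency 2 if they share an element and 1 otherwise, and any estimate with relative error below 1/3 still separates those two cases. The function names are assumptions, and the sets are assumed non-empty.

```python
from collections import Counter


def max_frequency(stream):
    """Stand-in for stream algorithm S: exact F_infinity of a non-empty stream."""
    return max(Counter(stream).values())


def sets_intersect(a, b):
    """Decide set (non-)disjointness from F_infinity of 'A then B'."""
    stream = list(a) + list(b)   # Alice streams A, then Bob streams B
    return max_frequency(stream) == 2


if __name__ == "__main__":
    print(sets_intersect({1, 2, 3}, {3, 4}))   # the sets share the element 3
    print(sets_intersect({1, 2}, {3, 4}))      # disjoint sets
```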

48
Extensions
  • Observe
  • Used only 1-way communication in proof
  • Cd(DIS) bound was for arbitrary communication
  • Exercise extend lower bound to multi-pass
    algorithms
  • Lower Bound for $F_k$, $k > 2$
  • Need to increase gap beyond 2
  • Multiparty Set Disjointness: $t$ players
  • Theorem: fix $\varepsilon, \delta < 1/2$ and $k > 5$. Any stream
    algorithm S with
  • $P[(1 - \varepsilon)F_k < S < (1 + \varepsilon)F_k] > 1 - \delta$
  • needs $\Omega(N^{1-(2+\delta)/k})$ space
  • Implies $\Omega(N^{1/2})$ even for multi-pass algorithms

49
Tracking High-Frequency Items
50
Problem 1: Top-K List [Charikar-Chen-Farach-Colton]
  • The Google Problem
  • Return list of $k$ most frequent items in stream
  • Motivation
  • search engine queries, network traffic, …
  • Remember
  • Saw lower bound recently!
  • Solution
  • Data structure Count-Sketch ⇒ maintaining
    count-estimates of high-frequency elements

51
Definitions
  • Notation
  • Assume 1, 2, …, N in order of frequency
  • mi is frequency of ith most frequent element
  • $m = \sum_i m_i$ is number of elements in stream
  • FindCandidateTop
  • Input: stream S, int $k$, int $p$
  • Output: list of $p$ elements containing top $k$
  • Naive sampling gives solution with $p = \Theta(m \log k
    / m_k)$
  • FindApproxTop
  • Input: stream S, int $k$, real $\varepsilon$
  • Output: list of $k$ elements, each of frequency $m_i
    > (1 - \varepsilon) m_k$
  • Naive sampling gives no solution

52
Main Idea
  • Consider
  • single counter X
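  • hash function $h: \{1, 2, \dots, N\} \to \{-1, +1\}$
  • Input element $i$ ⇒ update counter $X \mathrel{+}= Z_i = h(i)$
  • For each $r$, use $XZ_r$ as estimator of $m_r$
  • Theorem: $E[XZ_r] = m_r$
  • Proof
  • $X = \sum_i m_i Z_i$
  • $E[XZ_r] = E[\sum_i m_i Z_i Z_r] = \sum_{i \ne r} m_i E[Z_i Z_r] + m_r E[Z_r^2]
    = m_r$
  • Cross-terms cancel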

53
Finding Max Frequency Element
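  • Problem: $\mathrm{Var}[X] \approx F_2 = \sum_i m_i^2$
  • Idea: $t$ counters, independent 4-wise hashes
    $h_1, \dots, h_t$
  • Use $t = O(\log m \cdot \sum m_i^2 / (\varepsilon m_1)^2)$
  • Claim: new variance $< \sum m_i^2 / t = (\varepsilon m_1)^2 / \log m$
  • Overall Estimator
  • repeat + median of averages
  • with high probability, approximate $m_1$

[Figure: hash functions $h_1: i \to \{+1, -1\}$, …, $h_t: i \to \{+1, -1\}$, one per counter]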
54
Problem with Array of Counters
  • Variance dominated by highest frequency
  • Estimates for less-frequent elements like $k$
  • corrupted by higher frequencies
  • variance ≫ $m_k$
  • Avoiding Collisions?
  • spread out high frequency elements
  • replace each counter with hashtable of $b$ counters

55
Count Sketch
  • Hash Functions
  • 4-wise independent hashes $h_1, \dots, h_t$ and $s_1, \dots, s_t$
  • hashes independent of each other
  • Data structure: hashtables of counters $X(r, c)$

[Figure: hashtables of counters, each with columns 1, 2, …, b]
56
Overall Algorithm
  • $s_r(i)$: one of $b$ counters in $r$-th hashtable
  • Input $i$ ⇒ for each $r$, update $X(r, s_r(i)) \mathrel{+}= h_r(i)$
  • Estimator($m_i$) $= \mathrm{median}_r \{ X(r, s_r(i)) \cdot h_r(i) \}$
  • Maintain heap of $k$ top elements seen so far
  • Observe
  • Not completely eliminated collision with high
    frequency items
  • Few of estimates $X(r, s_r(i)) \cdot h_r(i)$ could have
    high variance
  • Median not sensitive to these poor estimates
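
A compact sketch of the data structure described above, for integer item identifiers. Several details are assumptions made here rather than taken from the slides: hashes are random degree-3 polynomials modulo the Mersenne prime 2^61 - 1 (a standard way to get 4-wise independent values), the sign is taken from the low bit and the bucket by reduction mod b (both only approximately uniform), and the heap of top-k elements is omitted.

```python
import random
from statistics import median

P = (1 << 61) - 1   # a Mersenne prime used as the hash modulus


class CountSketch:
    def __init__(self, t, b, rng=None):
        rng = rng or random.Random()
        self.t, self.b = t, b
        # One degree-3 polynomial per row for signs h_r and for buckets s_r.
        self.sign_coef = [[rng.randrange(P) for _ in range(4)] for _ in range(t)]
        self.bucket_coef = [[rng.randrange(P) for _ in range(4)] for _ in range(t)]
        self.table = [[0] * b for _ in range(t)]   # counters X(r, c)

    @staticmethod
    def _poly(coef, x):
        acc = 0
        for c in coef:               # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc

    def _sign(self, r, i):           # h_r(i) in {-1, +1}
        return 1 if self._poly(self.sign_coef[r], i) & 1 else -1

    def _bucket(self, r, i):         # s_r(i) in {0, ..., b-1}
        return self._poly(self.bucket_coef[r], i) % self.b

    def update(self, i, a=1):
        for r in range(self.t):
            self.table[r][self._bucket(r, i)] += a * self._sign(r, i)

    def estimate(self, i):
        return median(
            self.table[r][self._bucket(r, i)] * self._sign(r, i)
            for r in range(self.t)
        )
```

When only one distinct item has been inserted there are no collisions, so every row returns the exact count and so does the median.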

57
Avoiding Large Items
  • $b > O(k)$ ⇒ with probability $\Omega(1)$, no collision
    with top-k elements
  • $t$ hashtables represent independent trials
  • Need $\log(m/\delta)$ trials to estimate with probability
    $1 - \delta$
  • Also need small variance for colliding small
    elements
  • Claim
  • $P[\text{variance due to small items in each estimate} <
    (\sum_{i>k} m_i^2)/b] = \Omega(1)$
  • Final bound: $b = O(k + \sum_{i>k} m_i^2 / (\varepsilon m_k)^2)$

58
Final Results
  • Zipfian Distribution: $m_i \propto 1/i^{\alpha}$ [Power Law]
  • FindApproxTop
  • $[k + (\sum_{i>k} m_i^2) / (\varepsilon m_k)^2] \log(m/\delta)$
  • Roughly: sampling bound with frequencies squared
  • Zipfian gives improved results
  • FindCandidateTop
  • Zipf parameter 0.5
  • $O(k \log N \log m)$
  • Compare: sampling bound $O((kN)^{0.5} \log k)$

59
Problem 2: Elephants-and-Ants [Manku-Motwani]
[Figure: stream]
  • Identify items whose current frequency exceeds
    support threshold $s$ = 0.1%.
  • [Jacobson 2000, Estan-Verghese 2001]

60
Algorithm 1 Lossy Counting
Step 1: Divide the stream into windows
Window-size w is a function of support s; specify
later…
61
Lossy Counting in Action ...
Empty
62
Lossy Counting (continued)
63
Error Analysis
How much do we undercount?
If current size of stream is $N$ and
window-size $w = 1/\varepsilon$, then
#windows $= \varepsilon N$ ⇒
frequency error ≤ $\varepsilon N$
Rule of thumb: set $\varepsilon$ = 10% of support $s$
Example: given support frequency $s$ = 1%,
set error frequency $\varepsilon$ = 0.1%
64
Putting it all together…
Output: elements with counter values exceeding
$(s - \varepsilon)N$
Approximation guarantees:
Frequencies underestimated by at most $\varepsilon N$
No false negatives
False positives have true frequency at least $(s - \varepsilon)N$
  • How many counters do we need?
  • Worst case bound: $(1/\varepsilon) \log(\varepsilon N)$ counters
  • Implementation details…
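
A minimal sketch of the simple variant described on these slides: count every item, and at each window boundary decrement all counters by one, dropping those that reach zero. The function name and the use of a plain dict are assumptions made here; the enhancements from the later slide are not included.

```python
import math


def lossy_counting(stream, s, eps):
    """Return items whose counter exceeds (s - eps) * N after the stream."""
    w = math.ceil(1 / eps)   # window size w = 1/eps
    counts = {}
    n = 0
    for item in stream:
        n += 1
        counts[item] = counts.get(item, 0) + 1
        if n % w == 0:
            # Window boundary: decrement every counter, drop empty ones.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return {x for x, c in counts.items() if c > (s - eps) * n}


if __name__ == "__main__":
    # 'a' fills every other position; all other items are unique.
    stream = ["a" if i % 2 == 0 else i for i in range(100)]
    print(lossy_counting(stream, s=0.3, eps=0.1))
```

Each counter loses at most one count per window, which is where the εN undercount bound comes from.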

65
Number of Counters?
  • Window size $w = 1/\varepsilon$
  • Number of windows $m = \varepsilon N$
  • $n_i$: number of counters alive over last $i$ windows
  • Fact
  • Claim
  • Counter must average 1 increment/window to
    survive
  • active counters

66
Enhancements
Frequency Errors: for counter (X, c),
true frequency in $[c, c + \varepsilon N]$
Trick: track number of windows t the counter
has been active
For counter (X, c, t), true frequency in $[c, c + t - 1]$
If (t = 1), no error!
Batch Processing: decrements after k
windows
67
Algorithm 2 Sticky Sampling
  • Create counters by sampling
  • Maintain exact counts thereafter
What is sampling rate?
68
Sticky Sampling (continued)
For finite stream of length $N$: sampling rate
$= \frac{2}{\varepsilon N} \log \frac{1}{\delta s}$
$\delta$ = probability of failure
Output: elements with counter values exceeding
$(s - \varepsilon)N$
Same rule of thumb: set $\varepsilon$ = 10% of support $s$
Example: given support threshold $s$ = 1%,
set error threshold $\varepsilon$ = 0.1% and
failure probability $\delta$ = 0.01%
69
Number of counters?
Finite stream of length $N$: sampling rate $\frac{2}{\varepsilon N}
\log \frac{1}{\delta s}$
Infinite stream with unknown $N$: gradually adjust
sampling rate
In either case, expected number of counters
$= \frac{2}{\varepsilon} \log \frac{1}{\delta s}$
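
A minimal sketch of the finite-stream case only, where N is known in advance and a single sampling rate is used throughout. It assumes the logarithm is natural and caps the rate at 1; the function name and the `rng` parameter are illustrative, and the gradual rate adjustment for unknown N is not shown.

```python
import math
import random


def sticky_sampling(stream, n, s, eps, delta, rng=None):
    """Finite-stream Sticky Sampling sketch for a stream of known length n."""
    rng = rng or random.Random()
    # Sampling rate (2 / (eps * N)) * log(1 / (s * delta)), capped at 1.
    rate = min(1.0, (2.0 / (eps * n)) * math.log(1.0 / (s * delta)))
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1            # maintain exact counts thereafter
        elif rng.random() < rate:
            counts[item] = 1             # create counter by sampling
    return {x for x, c in counts.items() if c > (s - eps) * n}


if __name__ == "__main__":
    n = 10_000
    # 'h' fills every other position; all other items are unique.
    stream = ["h" if i % 2 == 0 else i for i in range(n)]
    print(sticky_sampling(stream, n, s=0.1, eps=0.01, delta=0.01))
```

Counters never overcount, so items rarer than (s - ε)N can never be reported; the only failure mode is a frequent item being sampled too late.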
70
References Synopses
  • Synopsis data structures for massive data sets.
    Gibbons and Matias, DIMACS 1999.
  • Tracking Join and Self-Join Sizes in Limited
    Storage, Alon, Gibbons, Matias, and Szegedy. PODS
    1999.
  • Join Synopses for Approximate Query Answering,
    Acharya, Gibbons, Poosala, and Ramaswamy.  SIGMOD
    1999.
  • Random Sampling for Histogram Construction How
    much is enough? Chaudhuri, Motwani, and
    Narasayya. SIGMOD 1998.
  • Random Sampling Techniques for Space Efficient
    Online Computation of Order Statistics of Large
    Datasets, Manku, Rajagopalan, and Lindsay. SIGMOD
    1999.
  • Space-efficient online computation of quantile
    summaries, Greenwald and Khanna. SIGMOD 2001.

71
References Sampling
  • Random Sampling with a Reservoir. Vitter.
    Transactions on Mathematical Software 11(1):37-57
    (1985).
  • On Sampling and Relational Operators. Chaudhuri
    and Motwani. Bulletin of the Technical Committee
    on Data Engineering (1999).
  • On Random Sampling over Joins. Chaudhuri,
    Motwani, and Narasayya. SIGMOD 1999.
  • Congressional Samples for Approximate Answering
    of Group-By Queries, Acharya, Gibbons, and
    Poosala. SIGMOD 2000.
  • Overcoming Limitations of Sampling for
    Aggregation Queries, Chaudhuri, Das, Datar,
    Motwani and Narasayya. ICDE 2001.
  • A Robust Optimization-Based Approach for
    Approximate Answering of Aggregate Queries,
    Chaudhuri, Das and Narasayya. SIGMOD 01.
  • Sampling From a Moving Window Over Streaming
    Data. Babcock, Datar, and Motwani. SODA 2002.
  • Sampling algorithms: lower bounds and
    applications. Bar-Yossef, Kumar, and Sivakumar. STOC
    2001.

72
References Sketches
  • Probabilistic counting algorithms for data base
    applications. Flajolet and Martin. JCSS (1985).
  • The space complexity of approximating the
    frequency moments. Alon, Matias, and Szegedy.
    STOC 1996.
  • Approximate Frequency Counts over Streaming Data.
    Manku and Motwani. VLDB 2002.
  • Finding Frequent Items in Data Streams. Charikar,
    Chen, and Farach-Colton. ICALP 2002.
  • An Approximate L1-Difference Algorithm for
    Massive Data Streams. Feigenbaum, Kannan,
    Strauss, and Viswanathan. FOCS 1999.
  • Stable Distributions, Pseudorandom Generators,
    Embeddings and Data Stream Computation. Indyk.
    FOCS  2000.