Variance Estimation over Sliding Windows

1
Variance Estimation over Sliding Windows
Linfeng Zhang and Yong Guan
Department of Electrical and Computer Engineering
Information Assurance Center
Iowa State University, Ames, Iowa, USA
June 13, 2007, Beijing, China
2
Outline

Motivation
Problem Definition
Related Work
Our Algorithm
Contribution
Optimal space requirement
Optimal worst-case running time
Summary and Future Work

3
Motivation

Advanced Attack Traceback Project
Goal
Trace the origin of attacks through the Internet.
Capture the statistics of large volumes of network data.
Challenging Issues
Huge, continuous data vs. limited memory
Only one pass over the data
Monitor and detect anomalies in network data.

4
Motivation (cont.)

Variance Estimation over Sliding Windows
Variance is often associated with anomalies and status changes.
Our approach achieves
Optimal space requirement
Optimal worst-case running time
Applications of Our Approach
Network monitoring
Intrusion detection
Financial analysis
Weather forecasting
Disaster forecasting

5
Problem Definition

Sliding Window Model
First proposed by Datar, Gionis, Indyk and
Motwani
Example: window size N = 8

Time:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Element:   0  1  0  0  1  1  1  0  1  1  0  0  0  1  0  1

At time 16, the window holds the N = 8 most recent elements (times 9 through 16).
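A minimal sketch of the sliding window model itself, assuming the 16-element bit stream from the example and an exact window that simply stores the N most recent elements. Python's `collections.deque` with `maxlen` is used as a stand-in for exact storage; keeping all N elements like this is precisely what the space-bounded algorithms discussed next try to avoid.

```python
from collections import deque

N = 8  # window size from the example
stream = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1]  # elements at times 1..16

# A deque with maxlen N drops its oldest element once N elements are stored,
# so after each append it holds exactly the current window.
window = deque(maxlen=N)
for x in stream:
    window.append(x)

print(list(window))  # [1, 1, 0, 0, 0, 1, 0, 1]  (times 9..16)
print(sum(window))   # 4  (number of 1s in the window)
```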
6
Problem Definition (cont.)

Problem Definition
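Maintain an ε-approximate variance of an integer stream over a sliding window of size N in one pass.
Variance ($x_1, \dots, x_N$ a series of N integers, $\mu$ their mean):
$$V = \sum_{i=1}^{N} (x_i - \mu)^2$$
ε-Approximation ($\hat{V}$ the variance estimate):
$$|\hat{V} - V| \le \epsilon V$$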
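A small sketch of the two definitions, assuming variance is taken as the sum of squared deviations from the mean (no division by N), which is the convention BDMO-style bucket summaries rely on. The helper names `variance` and `is_eps_approx` are hypothetical and only illustrate the exact quantity and the ε-approximation test.

```python
def variance(xs):
    """Sum of squared deviations from the mean: V = sum((x_i - mu)^2)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs)


def is_eps_approx(v_hat, v, eps):
    """True if the estimate v_hat is within relative error eps of the true v."""
    return abs(v_hat - v) <= eps * v


v = variance([1, 2, 3, 4])          # mu = 2.5 -> 2.25 + 0.25 + 0.25 + 2.25
print(v)                            # 5.0
print(is_eps_approx(5.2, v, 0.05))  # True:  |5.2 - 5.0| <= 0.25
print(is_eps_approx(5.3, v, 0.05))  # False: |5.3 - 5.0| >  0.25
```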

7
EH Algorithm

Exponential Histograms (EH)
Datar, Gionis, Indyk and Motwani (SODA 2002)
Bit Counting
Space Requirement
How EH works
Example

[Figure: EH example, bucket sizes 1, 2, 4 maintained over an incoming bit stream]
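A minimal sketch of the EH idea for bit counting, assuming simplified DGIM-style rules: each bucket stores the timestamp of its newest 1 and a power-of-two size, at most `k` buckets of each size are kept (k = 2 here, an illustrative choice), and the estimate counts only half of the oldest bucket, since that bucket may straddle the window edge. The class and method names are hypothetical. It runs on the bit stream from the sliding window example with N = 8.

```python
class ExpHistogram:
    """DGIM-style exponential histogram counting 1s among the last n bits."""

    def __init__(self, n, k=2):
        self.n = n        # window size N
        self.k = k        # max buckets kept per size (illustrative parameter)
        self.t = 0        # current timestamp
        # Each bucket is [timestamp of its newest 1, size]; oldest bucket first.
        self.buckets = []

    def add(self, bit):
        self.t += 1
        # Drop buckets whose newest 1 has slid out of the window.
        while self.buckets and self.buckets[0][0] <= self.t - self.n:
            self.buckets.pop(0)
        if bit == 1:
            self.buckets.append([self.t, 1])
            self._merge()

    def _merge(self):
        # Whenever more than k buckets share a size, merge the two oldest
        # of that size into one bucket of twice the size, then check the next size.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b[1] == size]
            if len(same) <= self.k:
                return
            oldest, second = same[0], same[1]
            self.buckets[second][1] = 2 * size  # merged bucket keeps the newer timestamp
            del self.buckets[oldest]
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # The oldest bucket may be only partly inside the window; count half of it.
        return total - self.buckets[0][1] // 2


bits = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
eh = ExpHistogram(n=8)
for b in bits:
    eh.add(b)
print(eh.buckets)     # [[10, 2], [14, 1], [16, 1]]
print(eh.estimate())  # 3 (the exact count in the window is 4)
```

Only O(log N) buckets per size class are stored instead of all N bits, at the cost of the uncertainty about how much of the oldest bucket is still inside the window.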
8
EH Algorithm (cont.)

EH applies to any function f satisfying the following properties:
1. $f(X) \ge 0$.
2. $f(X) \le \mathrm{poly}(|X|)$.
3. $f(X \cup Y) \ge f(X) + f(Y)$.
4. $f(X \cup Y) \le C\,(f(X) + f(Y))$ for some constant $C \ge 1$.
However, variance does not satisfy the last property.
Example

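[Figure: sets X and Y with means $\mu_X$, $\mu_Y$ and variances $V_X$, $V_Y$, and their union with mean $\mu_{X \cup Y}$ and variance $V_{X \cup Y}$]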
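A short worked example of why variance fails the fourth property (the constant-C bound on a union). It assumes $V$ is the sum of squared deviations from the mean and uses the standard identity for combining two disjoint sets of sizes $n_X$ and $n_Y$; the sizes and the concrete sets below are introduced only for illustration. Two sets with zero variance each can have a union with positive variance, so no constant $C$ can bound $V_{X \cup Y}$ by $C(V_X + V_Y)$. The third property still holds, because the extra term is never negative.

```latex
V_{X \cup Y} = V_X + V_Y + \frac{n_X n_Y}{n_X + n_Y} (\mu_X - \mu_Y)^2
% Example: X = \{0, 0\}, \; Y = \{1, 1\}
V_X = V_Y = 0, \qquad V_{X \cup Y} = 0 + 0 + \frac{2 \cdot 2}{2 + 2} (0 - 1)^2 = 1
% Hence, for every constant C:
V_{X \cup Y} = 1 > 0 = C \, (V_X + V_Y)
```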
9
BDMO Algorithm

Babcock, Datar, Motwani and O'Callaghan (PODS 2003)
Place each new element in its own bucket.
Each bucket $B_i$ maintains three variables $(n_i, \mu_i, V_i)$:
$n_i$: number of elements in the bucket
$\mu_i$: mean of the elements in the bucket
$V_i$: variance of the elements in the bucket
Merge adjacent buckets if
Summary information of the combination of two
buckets
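The slide's combination formula did not survive extraction. A common way to compute the summary of two merged buckets from their summaries alone is the standard pairwise variance-combination identity, sketched below under the assumption that $V_i$ is the sum of squared deviations from $\mu_i$. The `Bucket`, `combine`, and `summarize` names are hypothetical, and this is not presented as the exact form used on the slide.

```python
from typing import NamedTuple


class Bucket(NamedTuple):
    n: int      # number of elements
    mu: float   # mean of the elements
    v: float    # sum of squared deviations from mu


def combine(a: Bucket, b: Bucket) -> Bucket:
    """Summary of the union of two disjoint buckets, computed from their summaries."""
    n = a.n + b.n
    mu = (a.n * a.mu + b.n * b.mu) / n
    d = a.mu - b.mu
    # Within-bucket variation plus the spread between the two bucket means.
    v = a.v + b.v + a.n * b.n * d * d / n
    return Bucket(n, mu, v)


def summarize(xs):
    """Exact (n, mu, V) summary of a list, for checking the merge."""
    mu = sum(xs) / len(xs)
    return Bucket(len(xs), mu, sum((x - mu) ** 2 for x in xs))


merged = combine(summarize([1, 2]), summarize([3, 4, 5]))
print(merged)                      # Bucket(n=5, mu=3.0, v=10.0)
print(summarize([1, 2, 3, 4, 5]))  # Bucket(n=5, mu=3.0, v=10.0)
```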

10
BDMO Algorithm (cont.)

Space
However, the optimal bound is
Running Time
Amortized
Worst Case

Open Problem
11
Our Algorithm

Step 1: Insert the new element $x_t$
Create a new bucket $B_1$ for $x_t$ with $(n_1, \mu_1, V_1) = (1, x_t, 0)$
Step 2: Delete the expired bucket
Step 3: Merge adjacent buckets
Rule 1
Rule 2
Rule 3

[Figure: bucket list ordered from oldest to newest]
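The three steps can be sketched as operations on a bucket list. Rules 1 to 3 are not preserved in this transcript, so merging here is a pluggable, hypothetical `should_merge(older, newer)` callback; this is not the authors' rule set. Other assumptions: a bucket expires once its newest element leaves the window (the EH convention), and `estimate` simply combines all buckets, ignoring the partially expired oldest bucket that the real algorithm must account for. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    t: int      # timestamp of the newest element in the bucket
    n: int      # number of elements
    mu: float   # mean
    v: float    # sum of squared deviations from mu


def combine(a, b):
    """Merge two adjacent buckets; the result keeps the newer timestamp."""
    n = a.n + b.n
    d = a.mu - b.mu
    return Bucket(max(a.t, b.t), n, (a.n * a.mu + b.n * b.mu) / n,
                  a.v + b.v + a.n * b.n * d * d / n)


class SlidingVariance:
    def __init__(self, window, should_merge):
        self.window = window              # window size N
        self.should_merge = should_merge  # hypothetical rule: (older, newer) -> bool
        self.t = 0
        self.buckets = []                 # index 0 = oldest, index -1 = newest

    def insert(self, x):
        self.t += 1
        # Step 1: new bucket with (n, mu, V) = (1, x_t, 0).
        self.buckets.append(Bucket(self.t, 1, float(x), 0.0))
        # Step 2: delete buckets whose newest element has left the window.
        while self.buckets and self.buckets[0].t <= self.t - self.window:
            self.buckets.pop(0)
        # Step 3: one pass from newest to oldest, merging pairs the rule accepts.
        i = len(self.buckets) - 2
        while i >= 0:
            if self.should_merge(self.buckets[i], self.buckets[i + 1]):
                self.buckets[i:i + 2] = [combine(self.buckets[i], self.buckets[i + 1])]
            i -= 1

    def estimate(self):
        """Variance of all buckets combined (includes any expired part of the oldest)."""
        if not self.buckets:
            return 0.0
        total = self.buckets[0]
        for b in self.buckets[1:]:
            total = combine(total, b)
        return total.v


exact = SlidingVariance(window=4, should_merge=lambda older, newer: False)
greedy = SlidingVariance(window=4, should_merge=lambda older, newer: True)
for x in [1, 2, 3, 4, 5, 6]:
    exact.insert(x)
    greedy.insert(x)
print(exact.estimate())                        # 5.0: the window is [3, 4, 5, 6]
print(len(greedy.buckets), greedy.estimate())  # one bucket still holding all six elements
```

The two extreme rules show what the real merging rules have to balance: never merging is exact but stores N buckets, while always merging keeps one bucket whose newest timestamp never expires, so its estimate drifts to the variance of the whole stream.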
12
Correctness

Why can this set of merging rules bound the error to O(ε)?

Rule 1 guarantees
Case 1: $\mu_C$ is close to $\mu_B$
Rules 1 and 2 guarantee
Case 2: $\mu_C$ is far from $\mu_B$

O(1)
13
Space Requirement

Invariant: any adjacent bucket pair except $B_{2,1}$ within the right-half window $W_1$ has either Property 1 or Property 2.
Property 1
Variance doubles for every 5/ε bucket pairs.
Property 2
Size doubles for every 10/ε bucket pairs.
Space Requirement

Optimal!
14
Worst-Case Running Time