Title: Monitoring Streams -- A New Class of Data Management Applications
Don Carney, Brown University
Ugur Çetintemel, Brown University
Mitch Cherniack, Brandeis University
Christian Convey, Brown University
Sangdon Lee, Brown University
Greg Seidman, Brown University
Michael Stonebraker, MIT
Nesime Tatbul, Brown University
Stan Zdonik, Brown University
Example Stream Applications
- Critical Care
  - Streams of Vital Sign Measurements
- Physical Plant Monitoring
  - Streams of Environmental Readings
- Market Analysis
  - Streams of Stock Exchange Data
- Biological Population Tracking
  - Streams of Positions from Individuals of a Species
Not Your Average DBMS
- External, Autonomous Data Sources
- Querying Time-Series
- Triggers-in-the-large
- Real-time response requirements
- Approximate Query Results
Aurora At-A-Glance
- Stream Query Processing System
  - 3 Schools, 5 Faculty, 11 Grad Students, Several Undergrads
- Features
  - Designed for Scalability (10^6 stream inputs, queries)
  - QoS-Driven Resource Management
  - Continuous and Historical Queries
  - Stream Storage Management
  - Implemented Prototype (Demo Submission, Fall '02)
- This paper
  - System Overview: Architecture and High-Level Strategies
Talk Outline
- 1. Introduction
- 2. Aurora Overview
- 3. Runtime Operation
- 4. Adaptivity
- 5. Related Work and Conclusions
Aurora from 100,000 Feet
[Diagram: streams feeding several queries]
Aurora from 100 Feet
[Diagram: a small Aurora network of boxes labelled Slide, Tumble, σ, μ, and ∪, connected by arcs]
- Queries = Workflow (Boxes and Arcs)
  - Workflow Diagram = Aurora Network
  - Boxes = Query Operators
  - Arcs = Streams
- Query Operators (Boxes)
  - Simple: FILTER, MAP, RESTREAM
  - Binary: UNION, JOIN, RESAMPLE
  - Windowed: TUMBLE, SLIDE, XSECTION, WSORT
- Streams (Arcs)
  - Stream = tuple sequence from a common source (e.g., a sensor)
  - Tuples timestamped on arrival (internal use: QoS)
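The boxes-and-arcs model can be sketched with Python generators, where each arc is an iterator of tuples and each box is a function over it. This is a minimal illustration under stated assumptions, not Aurora's implementation: tuples are dicts carrying an arrival timestamp `ts`, the helper names (`filter_box`, `map_box`, `union_box`, `tumble_box`) are hypothetical, and TUMBLE is simplified to fixed-size, non-overlapping count windows.

```python
import heapq
from typing import Any, Callable, Dict, Iterable, Iterator, List

Tup = Dict[str, Any]  # a stream tuple; "ts" holds its arrival timestamp


def filter_box(pred: Callable[[Tup], bool], stream: Iterable[Tup]) -> Iterator[Tup]:
    """FILTER: pass through only the tuples that satisfy the predicate."""
    return (t for t in stream if pred(t))


def map_box(fn: Callable[[Tup], Tup], stream: Iterable[Tup]) -> Iterator[Tup]:
    """MAP: transform each tuple independently."""
    return (fn(t) for t in stream)


def union_box(*streams: Iterable[Tup]) -> Iterator[Tup]:
    """UNION: merge streams in timestamp order (each input assumed sorted by ts)."""
    return heapq.merge(*streams, key=lambda t: t["ts"])


def tumble_box(size: int, agg: Callable[[List[Tup]], Tup],
               stream: Iterable[Tup]) -> Iterator[Tup]:
    """Simplified TUMBLE (assumption): aggregate non-overlapping windows of `size` tuples."""
    window: List[Tup] = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []
    # A trailing partial window is not emitted: on an unbounded stream,
    # more tuples could still arrive to complete it.


# Two hypothetical vital-sign streams, wired as UNION -> FILTER -> MAP -> TUMBLE.
ward_a = [{"ts": 1, "bed": "a1", "hr": 80}, {"ts": 3, "bed": "a2", "hr": 120},
          {"ts": 5, "bed": "a1", "hr": 130}]
ward_b = [{"ts": 2, "bed": "b1", "hr": 125}, {"ts": 4, "bed": "b2", "hr": 70}]

merged = union_box(ward_a, ward_b)
high = filter_box(lambda t: t["hr"] > 100, merged)
projected = map_box(lambda t: {"ts": t["ts"], "hr": t["hr"]}, high)
averaged = tumble_box(2, lambda w: {"ts": w[-1]["ts"],
                                   "avg_hr": sum(t["hr"] for t in w) / len(w)},
                      projected)
result = list(averaged)
print(result)  # [{'ts': 3, 'avg_hr': 122.5}]
```

Because each box here is a lazy generator, tuples are pulled through the pipeline one at a time; a stream engine would instead place queues on the arcs and let a scheduler decide when each box runs.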
Aurora in Action
[Diagram: a larger Aurora network of Slide, Tumble, σ, μ, and ∪ boxes feeding two applications (App)]
- Arcs: Tuple Queues
- Box-at-a-time Scheduling
- Outputs Monitored for QoS
Continuous and Historical Queries
[Diagram: a connection point in the network, annotated "1 Hour"]
Quality-of-Service (QoS)
[Figure: three QoS graphs, with x-axes Delay, Tuples Delivered, and Output Value]
- Specifies Utility of Imperfect Query Results
  - Delay-Based (specify utility of late results)
  - Delivery-Based, Value-Based (specify utility of partial results)
- QoS Influences
  - Scheduling, Storage Management, Load Shedding
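A QoS graph maps a measure of result quality, such as delay, to utility. The sketch below evaluates such a graph by linear interpolation between breakpoints; treating the graph as piecewise linear, the function name `qos_utility`, and the breakpoints in the example are assumptions for illustration. The same function could take the percentage of tuples delivered or an output value on the x-axis for delivery-based and value-based graphs.

```python
from bisect import bisect_right
from typing import List, Tuple


def qos_utility(graph: List[Tuple[float, float]], x: float) -> float:
    """Evaluate a piecewise-linear QoS graph at x.

    graph: (x, utility) breakpoints sorted by x; x is clamped to the graph's range.
    """
    xs = [px for px, _ in graph]
    if x <= xs[0]:
        return graph[0][1]
    if x >= xs[-1]:
        return graph[-1][1]
    i = bisect_right(xs, x)  # graph[i - 1] has x-coordinate <= x < graph[i]'s
    (x0, u0), (x1, u1) = graph[i - 1], graph[i]
    return u0 + (u1 - u0) * (x - x0) / (x1 - x0)


# Hypothetical delay-based graph: full utility up to 2 sec, falling to 0 at 10 sec.
delay_graph = [(0.0, 1.0), (2.0, 1.0), (10.0, 0.0)]
for delay in (1.0, 6.0, 12.0):
    print(delay, qos_utility(delay_graph, delay))  # utilities 1.0, 0.5, 0.0 respectively
```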
Runtime Operation -- Basic Architecture
[Diagram: runtime components Router, Scheduler, Box Processors, and QoS Monitor]
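One way to picture how the runtime components cooperate is the single-threaded loop below. It is a hypothetical sketch: the class and method names, the longest-queue-first scheduling policy, and the fixed clock value are assumptions, not Aurora's actual design. Arcs are tuple queues, the scheduler picks a box, the box processor runs that box over its whole queue (box-at-a-time), the router forwards results downstream or to an application, and the QoS monitor records the delay of each output tuple.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Optional, Tuple

Tup = Tuple[float, Any]  # (arrival timestamp, payload)


class Runtime:
    def __init__(self) -> None:
        self.boxes: Dict[str, Callable[[Any], Optional[Any]]] = {}
        self.succ: Dict[str, List[str]] = {}     # arcs: box -> downstream boxes or "out:<app>"
        self.queues: Dict[str, Deque[Tup]] = {}  # one input queue per box
        self.outputs: Dict[str, List[Any]] = defaultdict(list)
        self.delays: Dict[str, List[float]] = defaultdict(list)  # QoS monitor's record

    def add_box(self, name: str, fn: Callable[[Any], Optional[Any]],
                downstream: List[str]) -> None:
        self.boxes[name] = fn
        self.succ[name] = downstream
        self.queues[name] = deque()

    def route(self, dest: str, tup: Tup, now: float) -> None:
        """Router: forward a tuple to a box's input queue or to an application output."""
        if dest.startswith("out:"):
            self.outputs[dest].append(tup[1])
            self.delays[dest].append(now - tup[0])  # QoS monitor: output delay
        else:
            self.queues[dest].append(tup)

    def pick_box(self) -> Optional[str]:
        """Scheduler (assumed policy): the box with the longest input queue."""
        name = max(self.queues, key=lambda b: len(self.queues[b]))
        return name if self.queues[name] else None

    def step(self, now: float) -> bool:
        """Box processor: run the chosen box over its entire queue (box-at-a-time)."""
        name = self.pick_box()
        if name is None:
            return False
        queue = self.queues[name]
        while queue:
            ts, payload = queue.popleft()
            result = self.boxes[name](payload)
            if result is not None:  # None signals that a filter dropped the tuple
                for dest in self.succ[name]:
                    self.route(dest, (ts, result), now)
        return True


rt = Runtime()
rt.add_box("filter", lambda hr: hr if hr > 100 else None, ["map"])
rt.add_box("map", lambda hr: hr / 60, ["out:app1"])  # beats per second
for ts, hr in [(0.0, 80), (0.5, 120), (1.0, 150)]:
    rt.route("filter", (ts, hr), now=ts)
while rt.step(now=2.0):  # a fixed clock keeps the example deterministic
    pass
print(rt.outputs["out:app1"], rt.delays["out:app1"])  # [2.0, 2.5] [1.5, 1.0]
```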
Runtime Operation -- Scheduling: Maximize Overall QoS
[Diagram: two boxes, A and B, competing for the processor]
- Box A: cost 1 sec; waiting tuple age 1 sec; QoS: delay 2 sec, utility 0.5
- Box B: cost 2 sec; waiting tuple age 3 sec; QoS: delay 5 sec, utility 0.8
- Choice: schedule Box A now rather than later
- Ideal: maximize overall utility
- Presently exploring scalable heuristics (e.g., feedback-based)
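The choice on this slide can be checked with a small calculation. Assume (the slide does not state it) that each QoS graph is a step: utility 1.0 while a tuple's delay is within the listed threshold, and the listed utility afterwards. Running A first, A's tuple finishes at age 2 sec, within its 2-sec threshold, while B's finishes at age 6 sec and drops to 0.8, for a total of 1.8; running B first gives B 1.0 and A 0.5, a total of 1.5. The sketch below enumerates both orders; the names `BOXES`, `step_utility`, `total_utility`, and `best_order` are hypothetical, and exhaustive search like this does not scale, which is why heuristics matter.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

# Slide example: name -> (cost sec, age of waiting tuple sec,
#                         delay threshold sec, utility after threshold)
BOXES: Dict[str, Tuple[float, float, float, float]] = {
    "A": (1.0, 1.0, 2.0, 0.5),
    "B": (2.0, 3.0, 5.0, 0.8),
}


def step_utility(delay: float, threshold: float, late_utility: float) -> float:
    """Assumed step-shaped QoS graph: 1.0 up to the threshold, late_utility after it."""
    return 1.0 if delay <= threshold else late_utility


def total_utility(order: Sequence[str]) -> float:
    """Sum of utilities when boxes run back to back in the given order, starting now."""
    clock, total = 0.0, 0.0
    for name in order:
        cost, age, threshold, late = BOXES[name]
        clock += cost  # this box finishes after all boxes scheduled before it
        total += step_utility(age + clock, threshold, late)
    return total


def best_order() -> Tuple[str, ...]:
    """Exhaustive search over orders; feasible only for a handful of boxes."""
    return max(permutations(BOXES), key=total_utility)


for order in permutations(BOXES):
    print(order, total_utility(order))
print("best:", best_order())  # best: ('A', 'B')
```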
Runtime Operation -- Scheduling: Minimizing Per-Tuple Processing Overhead
[Diagram: box A feeding box B; tuples x, y, z processed as A(x), A(y), A(z), then B(A(x)), B(A(y)), B(A(z)); labels "Default Operation" and "Context Switch"]
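The overhead argument can be made concrete by counting how often the executing box changes, used here as a stand-in for a context switch (an assumption; real overhead also includes queueing and call costs). Pushing each tuple through A and then B alternates between boxes, whereas running A over all queued tuples before moving to B changes boxes once; both orders produce the same outputs. The boxes `a` and `b` and the helper names are hypothetical.

```python
from typing import Callable, List, Tuple

Box = Callable[[int], int]


def tuple_at_a_time(a: Box, b: Box, tuples: List[int]) -> Tuple[List[int], List[str]]:
    """Push each tuple through A, then immediately through B."""
    out: List[int] = []
    trace: List[str] = []
    for t in tuples:
        trace.append("A")
        mid = a(t)
        trace.append("B")
        out.append(b(mid))
    return out, trace


def box_at_a_time(a: Box, b: Box, tuples: List[int]) -> Tuple[List[int], List[str]]:
    """Run A over every queued tuple, then run B over A's whole output queue."""
    trace: List[str] = []
    mids: List[int] = []
    for t in tuples:
        trace.append("A")
        mids.append(a(t))
    out: List[int] = []
    for m in mids:
        trace.append("B")
        out.append(b(m))
    return out, trace


def context_switches(trace: List[str]) -> int:
    """Count transitions between different boxes in an execution trace."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def a(v: int) -> int:  # hypothetical box A
    return v + 1


def b(v: int) -> int:  # hypothetical box B
    return v * 10


xyz = [1, 2, 3]  # tuples x, y, z
out1, trace1 = tuple_at_a_time(a, b, xyz)
out2, trace2 = box_at_a_time(a, b, xyz)
print(out1 == out2, context_switches(trace1), context_switches(trace2))  # True 5 1
```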
Runtime Operation -- Storage Management
- 1. Run-time Queue Management
  - Prefetch Queues Prior to Being Scheduled
  - Drop Tuples from Queues to Improve QoS
- 2. Connection Point Management
  - Support Efficient (Pull-Based) Access to Historical Data
  - E.g., indexing, sorting, clustering
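A connection point that keeps a bounded history (for example one hour, as annotated on the Continuous and Historical Queries slide) and answers pull-based range queries could look like the sketch below. Because tuples are timestamped on arrival, the retained history is already sorted by time, so binary search over timestamps acts as a simple index. The class `ConnectionPoint` and its methods are hypothetical; a fuller design would add the indexing, sorting, and clustering options listed above.

```python
import bisect
from typing import Any, List, Tuple


class ConnectionPoint:
    """Hypothetical connection point that retains a time-bounded history
    for pull-based historical queries, e.g. the last hour of a stream."""

    def __init__(self, retention_sec: float = 3600.0) -> None:
        self.retention = retention_sec
        self._ts: List[float] = []  # arrival timestamps, non-decreasing
        self._vals: List[Any] = []

    def push(self, ts: float, value: Any) -> None:
        if self._ts and ts < self._ts[-1]:
            raise ValueError("tuples are timestamped on arrival, so ts must not decrease")
        self._ts.append(ts)
        self._vals.append(value)
        # Expire everything older than the retention window.
        cutoff = bisect.bisect_left(self._ts, ts - self.retention)
        del self._ts[:cutoff]
        del self._vals[:cutoff]

    def query(self, start: float, end: float) -> List[Tuple[float, Any]]:
        """Return retained (ts, value) pairs with start <= ts <= end."""
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_right(self._ts, end)
        return list(zip(self._ts[lo:hi], self._vals[lo:hi]))


cp = ConnectionPoint(retention_sec=3600)
for ts, v in [(0, "a"), (1800, "b"), (3600, "c"), (5400, "d")]:
    cp.push(ts, v)
print(cp.query(0, 4000))  # [(1800, 'b'), (3600, 'c')]
```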
Adaptivity -- Query Optimization
- Compile-time, Global Optimization Infeasible
  - Too Many Boxes
  - Too Much Volatility in Network, Data
- Dynamic, Local Optimization:
1. Identify Subnetwork
2. Buffer Inputs
3. Drain Subnetwork
4. Optimize Subnetwork
5. Turn on Taps
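The five steps can be sketched for one concrete local optimization: reordering a chain of commuting FILTER boxes. The choice of optimization, the cost and selectivity estimates, and all names (`FilterBox`, `Subnetwork`, `reoptimize`) are assumptions. The ordering rule used, sorting by (selectivity - 1) / cost in ascending order, is the classic rank ordering that minimizes expected per-tuple cost for independent filters. Tuples that arrive while the taps are closed are held at the input and released after the swap.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Tup = Dict[str, Any]


@dataclass
class FilterBox:
    name: str
    pred: Callable[[Tup], bool]
    cost: float         # estimated per-tuple cost (assumed, e.g. from runtime statistics)
    selectivity: float  # estimated fraction of tuples that pass


def expected_cost(boxes: List[FilterBox]) -> float:
    """Expected per-tuple cost of a filter chain: later boxes see fewer tuples."""
    total, passing = 0.0, 1.0
    for box in boxes:
        total += passing * box.cost
        passing *= box.selectivity
    return total


class Subnetwork:
    """Step 1 (identify subnetwork): a chain of commuting filters chosen for re-optimization."""

    def __init__(self, boxes: List[FilterBox]) -> None:
        self.boxes = list(boxes)
        self.taps_open = True
        self.inflight: List[Tup] = []  # tuples already inside the subnetwork
        self.held: List[Tup] = []      # tuples buffered at the inputs
        self.output: List[Tup] = []

    def push(self, t: Tup) -> None:
        (self.inflight if self.taps_open else self.held).append(t)

    def drain(self) -> None:
        # all() stops at the first failing predicate, in chain order.
        self.output.extend(t for t in self.inflight
                           if all(b.pred(t) for b in self.boxes))
        self.inflight.clear()

    def reoptimize(self) -> None:
        self.taps_open = False  # step 2: buffer inputs
        self.drain()            # step 3: drain subnetwork
        self.boxes.sort(key=lambda b: (b.selectivity - 1) / b.cost)  # step 4: optimize
        self.taps_open = True   # step 5: turn on taps
        self.inflight.extend(self.held)
        self.held.clear()


loose = FilterBox("valid", lambda t: t["valid"], cost=1.0, selectivity=0.9)
tight = FilterBox("hr_high", lambda t: t["hr"] > 100, cost=2.0, selectivity=0.5)
sub = Subnetwork([loose, tight])
before = expected_cost(sub.boxes)
for t in [{"hr": 120, "valid": True}, {"hr": 80, "valid": True}]:
    sub.push(t)
sub.reoptimize()
sub.push({"hr": 130, "valid": True})
sub.drain()
print([b.name for b in sub.boxes])  # ['hr_high', 'valid']
print(before, expected_cost(sub.boxes))
```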
Adaptivity -- Load Shedding
- 1. Two Load Shedding Techniques
  - Random Tuple Drops
    - Add a DROP box to the network (DROP is a special case of FILTER)
    - Position it to affect queries with tolerant delivery-based QoS requirements
  - Semantic Load Shedding
    - FILTER values with low utility (according to value-based QoS)
- 2. Triggered by QoS Monitor
  - E.g., after latency analysis reveals that certain applications are continuously receiving poor QoS
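Both techniques amount to inserting a FILTER into the network. In the sketch below, random shedding is a DROP box whose predicate ignores tuple contents and discards each tuple with probability `p_drop`, while semantic shedding keeps only tuples whose value-based utility meets a threshold. The function names, the seeded random generator, and the example utility function `hr_utility` are hypothetical.

```python
import random
from typing import Any, Callable, Dict, Iterable, Iterator

Tup = Dict[str, Any]


def drop_box(p_drop: float, stream: Iterable[Tup], rng: random.Random) -> Iterator[Tup]:
    """Random load shedding: a FILTER that drops each tuple with probability p_drop."""
    # rng.random() is uniform on [0, 1): p_drop=0 keeps everything, p_drop=1 drops everything.
    return (t for t in stream if rng.random() >= p_drop)


def semantic_shed(value_utility: Callable[[Any], float], min_utility: float,
                  stream: Iterable[Tup]) -> Iterator[Tup]:
    """Semantic load shedding: FILTER out tuples whose value has low utility."""
    return (t for t in stream if value_utility(t["value"]) >= min_utility)


def hr_utility(hr: float) -> float:
    """Hypothetical value-based QoS for heart-rate readings: abnormal values matter most."""
    return 1.0 if hr < 50 or hr > 100 else 0.2


readings = [{"value": v} for v in (40, 70, 120, 85)]
kept = list(semantic_shed(hr_utility, 0.5, readings))
print(kept)  # [{'value': 40}, {'value': 120}]

rng = random.Random(42)
stream = ({"value": i} for i in range(10_000))
survivors = sum(1 for _ in drop_box(0.3, stream, rng))
print(survivors / 10_000)  # roughly 0.7
```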
Adaptivity -- Detecting Overload
[Diagram: a box with cost c and selectivity s, fed at input rate r]
- 1/c < r ⇒ Problem (the box can process at most 1/c tuples per second)
- Latency Analysis
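The overload condition extends to a chain of boxes: a box with cost c (seconds per tuple) can process at most 1/c tuples per second, and a box with selectivity s passes on a fraction s of its input, so its successor sees rate r·s. The sketch computes each box's utilization r·c (above 1 means that box alone cannot keep up) and the total load, which must stay below 1 if all boxes share one processor. The single-processor assumption, the linear chain, the name `analyze_chain`, and the example numbers are hypothetical.

```python
from typing import List, Tuple


def analyze_chain(r: float, boxes: List[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Per-box utilization and total load for a linear chain of boxes.

    r:     input rate into the first box (tuples/sec)
    boxes: (cost c in sec/tuple, selectivity s) for each box, in chain order
    """
    utilizations = []
    rate = r
    for c, s in boxes:
        utilizations.append(rate * c)  # > 1 exactly when rate > 1/c
        rate *= s                      # this box's output rate feeds the next box
    return utilizations, sum(utilizations)


chain = [(0.004, 0.5), (0.01, 0.2)]  # hypothetical (cost, selectivity) pairs
for r in (100.0, 300.0):
    utils, load = analyze_chain(r, chain)
    overloaded = [i for i, u in enumerate(utils) if u > 1.0]
    print(f"r={r}: utilizations={utils}, total load={load:.2f}, overloaded boxes={overloaded}")
```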
Related Work
- Stream Processing Systems
  - Niagara CDTY00, STREAM BW01, Tribeca SH98
  - Telegraph MF02, MSHR02
- Adaptive Query Processing
  - Eddies AH00, Tukwila IFFLW99, Query Scrambling AFTU96
- Multiple Query Optimization
  - SG90, RC88
- Approximate Query Answering
  - Online Aggregation HHW97, AQUA AGP99
- Active Databases
  - PD99, SPAM91, HC99
- Continuous Queries
  - Tapestry TGNO92, OpenCQ LPT99, Chronicle JMS95
Conclusions
- Aurora Stream Query Processing System
  - Designed for Scalability
  - QoS-Driven Resource Management
  - Continuous and Historical Queries
  - Stream Storage Management
  - Implemented Prototype
- Web site: www.cs.brown.edu/research/aurora/
Implementation -- GUI
Implementation -- Runtime