Title: Dynamic Plan Migration for Continuous Queries over Data Streams
- Yali Zhu, Elke Rundensteiner and George Heineman
- Database System Research Group, WPI.
- Massachusetts, USA
- SIGMOD 2004
Research partly supported by the RDC grant 2003-04 on On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond.
2. Stream Query Optimization
- How does it differ from traditional query optimization?
3. Stream Query Optimization
- More dynamic fluctuations in statistics
- → compile-time optimization not possible
- Global optimization not practical for huge query networks
- → adaptive optimization
- Need to take CPU processing and main memory into account
- → other cost models
4. Motivation for Query Migration
- Continuous queries over streams
- Statistics unknown before start
- Statistics changing during execution
- Stream rates, arrival pattern, distribution, etc.
- Need for dynamic adaptation
- Plan re-optimization
- Change the shape of query plan tree
5. Run-time Plan Re-Optimization
- Step 1: Decide when to re-optimize
- Statistics monitoring
- Step 2: Generate new query plan
- Query optimization
- Step 3: Replace current plan by new plan
- Plan migration
6. Naïve Plan Migration Strategy
[Figure: old and new join plans over streams A, B and C, built from join operators AB and BC]
- Migration steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
Problem: works for stateless operators only
7. Stateful Operators in CQ
- Why stateful?
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples
Example: symmetric NL join with window constraints
[Figure: symmetric NL join AB with State A and State B; tuple ax from stream A and tuples b1 to b5 from stream B]
Observation: the purge of tuples in states typically relies on the processing of new tuples.
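The observation can be made concrete with a minimal Python sketch of a symmetric nested-loop window join. It assumes non-decreasing timestamps, an equi-join on a single hypothetical `key` attribute, and a purge-on-arrival policy; the class and field names are illustrative, not taken from CAPE.

```python
from dataclasses import dataclass


@dataclass
class Tup:
    stream: str  # "A" or "B"
    ts: int      # arrival timestamp in ms (assumed non-decreasing)
    key: int     # hypothetical join attribute


class SymmetricWindowJoin:
    """Sketch of a symmetric nested-loop join of A and B under a time window W.

    Tuples of the opposite state are purged only when a new tuple arrives,
    so the states never shrink while input is stalled.
    """

    def __init__(self, window):
        self.window = window
        self.state = {"A": [], "B": []}

    def process(self, t):
        other = "B" if t.stream == "A" else "A"
        # 1. Purge: drop opposite-side tuples that fell out of t's window.
        self.state[other] = [s for s in self.state[other]
                             if t.ts - s.ts <= self.window]
        # 2. Probe: nested-loop scan of the opposite state; results are (a, b).
        results = []
        for s in self.state[other]:
            if s.key == t.key:
                results.append((t, s) if t.stream == "A" else (s, t))
        # 3. Insert: keep t so later tuples from the other stream can join it.
        self.state[t.stream].append(t)
        return results
```

For example, with W = 1000 ms, processing a@0 and then b@500 yields one result; processing b@1500 next purges a@0 from State A, while b@500 stays in State B until an A tuple arrives more than W after it.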
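8. Naïve Migration Strategy Revisited
[Figure: plan with joins AB and BC over streams A, B and C, annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]
- Steps
- (1) Pause execution of old plan
- (2) Drain out all tuples inside old plan
- (3) Replace old plan by new plan
- (4) Resume execution of new plan
Problem: deadlock waiting. With input paused, no new tuples arrive to trigger the purge of operator states, so step (2) may never finish.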
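A hypothetical Python sketch of the drain step, reduced to the state bookkeeping of one windowed join, shows why the wait never ends: once the already-queued tuples are consumed, nothing triggers another purge. Join output is omitted, and the function name and tuple encoding are assumptions for illustration.

```python
from collections import deque


def drain_old_plan(window, state_a, state_b, queue):
    """Sketch of step (2) of the naive strategy for one windowed join.

    Input streams are paused, so only tuples already in `queue` can still be
    processed. Each processed tuple purges expired tuples from the opposite
    state and is then inserted into its own state (join output omitted).
    Returns True if the operator ends up empty, and False if tuples remain
    in its states with nothing left to purge them, i.e. the drain would
    wait forever.
    """
    while queue:
        stream, ts = queue.popleft()
        own, other = (state_a, state_b) if stream == "A" else (state_b, state_a)
        other[:] = [s for s in other if ts - s <= window]  # purge on arrival
        own.append(ts)
    return not state_a and not state_b
```

Draining a queue holding a@0 and b@500 with W = 1000 ms leaves 0 in State A and 500 in State B, so the function reports that the plan cannot be emptied; a stateless plan (empty states) drains trivially.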
9. Concept of Migration Boxes
- Two exchangeable migration boxes
- One contains old plan or sub-plan
- One contains new plan or sub-plan
- Two plans are semantically equivalent
- Same input queues and output queues
- Migration is abstracted as replacing the old box by the new box
10. Problem Definition
- Dynamic Plan Migration
- Input (two migration boxes)
- One contains old plan
- One contains new plan
- Have same input and output queues
- Result
- Old box is replaced by new box
- Valid Migration
- No missing tuples
- No duplicates
- Key points
- Involved plans contain stateful operators
- Need to migrate while retaining useful states and discarding useless states
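The validity criterion can be phrased as a multiset comparison. The Python sketch below, with hypothetical names, compares the output of a migrated run against the reference output the query should produce, treating output tuples as hashable values.

```python
from collections import Counter


def check_migration(reference, migrated):
    """Compare migrated output with reference output as multisets.

    Returns (missing, duplicates): tuples the migrated run failed to emit,
    and tuples it emitted more often than the reference.
    A migration is valid when both are empty.
    """
    ref, got = Counter(reference), Counter(migrated)
    missing = ref - got      # Counter subtraction keeps only positive counts
    duplicates = got - ref
    return missing, duplicates


def is_valid_migration(reference, migrated):
    missing, duplicates = check_migration(reference, migrated)
    return not missing and not duplicates
```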
11. State of the Art
- Efficient mid-query re-optimization of sub-optimal query execution plans (Kabra, DeWitt 1998)
- Only migrates the unprocessed portion
- Query plan competing model (Ioannidis, Ng et al. 1992; Graefe, Cole 1994)
- Generate several candidate query plans before start
- Execute all, choose one after a while
12. Outline
- Problem Motivation and Definition
- Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
- Experimental Results
13. Moving State Strategy
- Basic idea
- Share common states between two boxes
- Key Steps
- Identify common states
- State matching
- Share common states
- State moving
- Recompute unmatched states
- State recomputing
14. Moving State Strategy
- State matching
- Each state in the old box has a unique ID
- During rewriting, a new ID is given to each newly generated state in the new box
- When rewriting is done, match states based on IDs
- State moving
- Between matched states
- On the same machine, create new pointers for matched states in the new box
- What's left?
- Unmatched states in the new box
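[Figure: old box ((A ⋈ B) ⋈ C) ⋈ D with states SA, SB, SAB, SC, SABC, SD; new box A ⋈ ((B ⋈ C) ⋈ D) with states SA, SB, SC, SBC, SD, SBCD; both boxes read input queues QA, QB, QC, QD and write output queue QABCD]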
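State matching and moving can be sketched in Python with states kept in a dictionary keyed by ID. As a simplifying assumption, the ID of a state here is the name of the streams it covers (SA, SAB, ...), so two states match exactly when their IDs are equal; moving a matched state just stores a reference to the same list, so no tuples are copied. Function and variable names are hypothetical.

```python
def match_and_move(old_states, new_state_ids):
    """Build the new box's states from the old box's states.

    old_states:    dict mapping state ID -> list of tuples (old box)
    new_state_ids: state IDs required by the new box
    Returns (new_states, unmatched): matched states are shared by reference;
    unmatched IDs are left out of new_states and must be recomputed.
    """
    new_states, unmatched = {}, []
    for sid in new_state_ids:
        if sid in old_states:                  # state matching by ID
            new_states[sid] = old_states[sid]  # state moving: new pointer, no copy
        else:
            unmatched.append(sid)
    return new_states, unmatched
```

For the old and new boxes in the figure, matching the new box's IDs SA, SB, SC, SD, SBC, SBCD against the old box's SA, SB, SAB, SC, SABC, SD leaves `["SBC", "SBCD"]` unmatched.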
15. Moving State Strategy
- Basic idea
- Share common states between two migration boxes
- Key steps
- State matching
- Match states based on IDs
- State moving
- Create new pointers for matched states in the new box
- What's left?
- Unmatched states in the new box
16. Unmatched States
[Figure: new box A ⋈ ((B ⋈ C) ⋈ D) with states SA, SB, SC, SD, SBC, SBCD over input queues QA to QD and output queue QABCD]
- State recomputing
- Recursively recompute the unmatched states SBC and SBCD from the bottom up
- Why always possible?
- Old and new boxes have the same input queues
- The states associated with input queues always match
- Why necessary?
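Bottom-up recomputation can be sketched as a recursive Python function over the new box's plan: leaf states always match, so the recursion always bottoms out in states that already exist. The join predicate is an assumption for illustration (all sub-tuples share one key and their timestamps lie within W); real operators would apply their own predicates, and the names are hypothetical.

```python
from itertools import product


def join_states(left, right, window):
    """Nested-loop join of two states of composite tuples.

    A composite tuple is a dict: stream name -> (timestamp, key).
    Assumed predicate: equal keys and all timestamps within `window`.
    """
    out = []
    for l, r in product(left, right):
        merged = {**l, **r}
        stamps = [ts for ts, _ in merged.values()]
        keys = {k for _, k in merged.values()}
        if len(keys) == 1 and max(stamps) - min(stamps) <= window:
            out.append(merged)
    return out


def recompute(sid, states, plan, window):
    """Return state `sid`, recomputing it (and any unmatched children) bottom up.

    states: dict of already available states (matched ones after state moving)
    plan:   dict mapping each intermediate state ID -> (left child ID, right child ID)
    """
    if sid not in states:
        left, right = plan[sid]
        states[sid] = join_states(recompute(left, states, plan, window),
                                  recompute(right, states, plan, window),
                                  window)
    return states[sid]
```

Calling `recompute("SBCD", ...)` with `plan = {"SBC": ("SB", "SC"), "SBCD": ("SBC", "SD")}` first rebuilds SBC from SB and SC, then SBCD from SBC and SD.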
17. Terms on Tuples
[Figure: old box with states SA, SB, SAB, SC, SABC, SD over input queues QA to QD and output queue QABCD]
- New/old tuples
- Old tuples: already in the old box when migration starts
- New tuples: do not exist in the old box when migration starts
- Sub-tuples
- Tuple ABCD is the join result of tuples A, B, C and D
- Tuples A, B, C and D are sub-tuples of tuple ABCD
- Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
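The count follows from each of the four sub-tuples being independently old or new; enumerating the cases in Python makes them explicit (names are illustrative):

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")

# Each sub-tuple of ABCD is either old (in the old box when migration starts) or new.
COMBINATIONS = list(product(("old", "new"), repeat=len(STREAMS)))


def describe(combo):
    """Render a combination such as ('old', 'new', 'new', 'old') as 'A:old B:new ...'."""
    return " ".join(f"{s}:{state}" for s, state in zip(STREAMS, combo))
```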
18. Why Recompute Unmatched States
- To get the complete results of ABCD, we need all 16 old/new combinations
[Figure: new box with states SA, SB, SC, SD, SBC, SBCD; legend marks each of A, B, C, D as an old or a new tuple]
If SBC is not recomputed, we will miss results in which both B and C are old.
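The affected cases can be listed with a small Python sketch. The underlying reasoning, as we read the slide: in the new box, an old B tuple and an old C tuple can meet only through SBC, because neither arrives after migration starts and so neither triggers the B ⋈ C join; if SBC starts empty, every ABCD result with old B and old C is lost, whatever A and D are. The function name is hypothetical.

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")


def missed_if_not_recomputed(state_streams=("B", "C")):
    """Old/new combinations of ABCD lost if the state over `state_streams` stays empty.

    A combination is lost when all sub-tuples of those streams are old, since
    old tuples can only meet through the unrecomputed state.
    """
    idx = [STREAMS.index(s) for s in state_streams]
    return [combo for combo in product(("old", "new"), repeat=len(STREAMS))
            if all(combo[i] == "old" for i in idx)]
```

For SBC this yields 4 of the 16 combinations (A and D free); for SBCD, with B, C and D all old, it yields 2.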
19. Cost Estimation of MS Migration
- Cost of MS consists of
- Cost of state matching
- ID comparison (negligible)
- Cost of state moving
- Create pointers (negligible)
- Cost of state recomputing
- Majority of cost
- Affecting parameters
- Operator selectivities
- # of tuples in states
- Estimated as (input rate × window size)
- See paper for detailed cost models
Cost model conclusion: cost of MS has a polynomial relationship to window size
20. Cost Estimation of MS Migration
$$T_{MS} = T_{match} + T_{move} + T_{recompute}$$
$$T_{MS} \approx T_{recompute}(S_{BC}) + T_{recompute}(S_{BCD}) = \lambda_B \lambda_C W^2 (T_j + T_s \sigma_{BC}) + 2 \lambda_B \lambda_C \lambda_D W^3 (T_j \sigma_{BC} + T_s \sigma_{BC} \sigma_{BCD})$$
- T_m: time spent for each string comparison
- T_c: time spent to create a new cursor
- T_j: time spent to join a pair of tuples
- T_s: time spent to insert one tuple into a state
- λ_A: average tuple input rate from QA
- λ_B: average tuple input rate from QB
- σ_AB: reduction factor of join operator AB
- W: global time window constraint
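The formula can be evaluated for concrete parameters with a direct Python transcription (matching and moving costs dropped as negligible; the factor 2 is kept as it appears in the slide's formula). Units are an assumption: rates λ in tuples/ms, W in ms, and T_j, T_s in ms per operation. The function name is hypothetical.

```python
def ms_migration_cost(lam_b, lam_c, lam_d, window, t_j, t_s, sigma_bc, sigma_bcd):
    """Estimated T_MS = T_recompute(S_BC) + T_recompute(S_BCD).

    lam_*     average input rates of QB, QC, QD
    window    global time window W
    t_j, t_s  time to join a pair of tuples / insert one tuple into a state
    sigma_*   reduction factors of joins BC and BCD
    """
    # S_B and S_C hold about lam * W tuples each; every pair is tried once.
    t_bc = lam_b * lam_c * window**2 * (t_j + t_s * sigma_bc)
    # S_BCD is rebuilt from the recomputed S_BC and S_D.
    t_bcd = 2 * lam_b * lam_c * lam_d * window**3 * (
        t_j * sigma_bc + t_s * sigma_bc * sigma_bcd)
    return t_bc + t_bcd
```

With all rates at 0.01 tuples/ms, W = 1000 ms, T_j = T_s = 1 and both reduction factors at 0.5, the estimate is about 1650; doubling W to 2000 ms multiplies the first term by 4 and the second by 8, giving about 12600, which illustrates the polynomial dependence on window size.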
21. MS Migration Pros and Cons
- Pros
- Fast when # of tuples in states is small
- Low input rates, low selectivity or small window
- Cons
- Output silence during entire migration stage
- Can we keep producing query output during migration?
- Motivation for Parallel Track strategy
22. Parallel Track Strategy
- Basic idea
- Execute both old and new plans in parallel
- Gradually push old tuples out of the old box by purging
- Key steps
- Connect new box
- Execute both boxes in parallel
- Remove old box once expired
- Expired: contains only new tuples, no old tuples or sub-tuples
23. Parallel Track Strategy
- Key steps
- Connect boxes
- Execute in parallel
- Until old box expired (no old tuple or sub-tuple)
- Disconnect old box
- Execute new box only
[Figure: old box and new box executing in parallel, each with its own states, connected to input queues QA to QD and output queue QABCD]
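Whether the old box has expired can be checked from timestamps alone if, as an assumption, a sub-tuple counts as old exactly when its timestamp is earlier than the migration start time. A minimal Python sketch with hypothetical names, where each state holds composite tuples as dicts from stream name to timestamp:

```python
def old_box_expired(states, migration_start):
    """True when no state of the old box still holds an old tuple or sub-tuple.

    states: dict mapping state ID -> list of composite tuples,
            each a dict of stream name -> timestamp (ms)
    A sub-tuple is treated as old if its timestamp < migration_start.
    """
    return all(min(t.values()) >= migration_start
               for tuples in states.values()
               for t in tuples)
```

In a parallel track loop, the old box would be disconnected as soon as this check returns True.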
24. Potential Duplicates
- Tuple ABCD
- 2^4 = 16 possible old/new sub-tuple combinations
- The same case must not be generated by both boxes
- Otherwise we have duplicates
- In new box
- All states start empty
- Only generates ABCD as (new,new,new,new)
- In old box
- May generate all 16 cases
- Duplicates the case (new,new,new,new)
25. Duplicate Elimination
At the root operator in the old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
Other operators in the old box: proceed as normal.
[Figure: old box with states SA, SB, SAB, SC, SABC, SD over input queues QA to QD and output queue QABCD]
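The rule needs only one extra bit per intermediate tuple: whether all of its sub-tuples are new. A hypothetical Python sketch (join predicate omitted, names illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Composite:
    streams: str   # streams covered, e.g. "ABC"
    all_new: bool  # True if every sub-tuple arrived after migration started


def combine(left: Composite, right: Composite, at_old_root: bool) -> Optional[Composite]:
    """Join two matching tuples inside the old box (predicate check omitted).

    At the old box's root, an all-new pair would yield ABCD as
    (new, new, new, new), which the new box already produces, so it is skipped.
    Every other combination contains an old sub-tuple and cannot come from
    the new box, whose states start empty.
    """
    if at_old_root and left.all_new and right.all_new:
        return None
    return Composite(left.streams + right.streams, left.all_new and right.all_new)
```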
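26. Estimation of PT Migration Duration
[Figure: timeline from T_M-start to T_M-end spanning a 1st W and a 2nd W; old tuples in the states of a single join AB (h = 0) and of a deeper old box with states SAB, SC, SABC, SD (h = 2) are gradually replaced by new tuples]
Estimation formula:
$$T_{PT} = \begin{cases} W & \text{if } h = 0 \\ 2W & \text{if } h > 0 \end{cases}$$
h: height of the query tree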
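The estimate is a case split on tree height. A Python transcription, assuming enough resources that new tuples are processed right away (the function name is hypothetical):

```python
def pt_migration_duration(height, window):
    """Estimated PT migration duration T_PT for a query tree of the given height.

    Assumes new tuples are processed without queueing delay; otherwise the
    actual duration exceeds this estimate.
    """
    if height < 0:
        raise ValueError("tree height must be non-negative")
    return window if height == 0 else 2 * window
```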
27. PT Migration Duration
- Given enough system computing resources
- New tuples processed right away
- PT migration duration = 2W
- If not enough system resources
- New tuples accumulate in queues
- PT migration duration > 2W
28. Cost Estimation of PT Migration
- Cost of PT = cost of processing 2W of new tuples in the old box + cost of processing 2W of new tuples in the new box
- Parameters
- Input rates, window size, selectivity
- Similar to MS strategy
29. Cost Estimation of PT Migration
- Costs of processing 2W's worth of new tuples in both boxes
- For old box
- T_AB = cost of purge + cost of insert + cost of join
- For new box
- Differentiate first and second W
- T_BC = cost for the first W + cost for the second W
30. PT Migration Pros and Cons
- Pros
- Keeps producing results even during migration
- In contrast, no results during MS migration
- Cons
- Migration duration is at least 2W
- MS may be faster, depending on # of tuples in states
31. Outline
- Problem Definition and Motivation
- Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
- Experimental Results
32. Experimental Setup
- Embedded in the CAPE system
- CAPE: Continuous Adaptive Processing Engine
- A streaming query engine developed at DSRG, WPI
- VLDB 2004 demo
- Layers of adaptation
- Punctuation exploring
- Adaptive scheduling
- Query migration
- Dynamic distribution
33. Experimental Setup (II)
- Experiments on migration duration
- Vary window size
- Vary input rates
- Experiments on migration effects
- Changes of output rates
- Arrival streams
- Generated by stream generator in CAPE
- Poisson arrival pattern (exponential inter-arrival times)
- Machine
- Windows 2000, Pentium III processor
- 500 MHz CPU, 384 MB RAM
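A Poisson arrival process is equivalent to exponentially distributed inter-arrival times, so a stream of arrival timestamps can be generated with Python's standard `random.expovariate`. This is a sketch of the idea, not CAPE's stream generator; `mean_ms` plays the role of the per-stream inter-arrival times listed in the experimental settings.

```python
import random


def poisson_arrivals(mean_ms, count, seed=None):
    """Arrival timestamps (ms) of a Poisson stream with the given mean inter-arrival time."""
    rng = random.Random(seed)
    t, stamps = 0.0, []
    for _ in range(count):
        t += rng.expovariate(1.0 / mean_ms)  # rate = 1 / mean inter-arrival time
        stamps.append(t)
    return stamps
```

For instance, `poisson_arrivals(50, n)` produces n arrivals averaging one every 50 ms; a fixed `seed` makes runs reproducible.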
34. Experimental Setup (II)
- Experiments on migration duration
- Vary window size
- Vary input rates
- Experiments on migration effects
- Changes of intermediate results
- Changes of output rates
- Data sets
- Enough system resources (low config)
- Not enough system resources (high config)
- Machine
- Windows 2000, Pentium III processor
- 500 MHz CPU, 384 MB RAM
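Parameter settings (I_X: inter-arrival time of stream X; σ_XY: selectivity of join XY; L/H: low/high config):

| | set1 | set2 | set3 | set4 (L) | set5 (H) |
|---|---|---|---|---|---|
| Experiment | Migration duration | Migration duration | Migration duration | Migration effects | Migration effects |
| W (ms) | vary | 1000 | vary | 1000 | 2000 |
| I_A (ms) | 100 | 50 | 100 | 100 | 50 |
| I_B (ms) | 100 | vary | 12 | 100 | 50 |
| I_C (ms) | 100 | 50 | 12 | 100 | 50 |
| I_D (ms) | 100 | 50 | 12 | 100 | 50 |
| σ_AB | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| σ_BC | 0.05 | 0.05 | 0.1 | 0.02 | 0.05 |
| σ_CD | 0.02 | 0.02 | 0.1 | 0.02 | 0.05 |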
35. Migration Duration vs. Window Size
36. Migration Duration vs. Input Rates
- T_MS almost constant
- T_PT increases with λ_B
37. Migration Effects
- Migration starts at 10000 ms
- Four lines
- New: run the new (better) query plan alone
- Old: run the old (worse) query plan alone
- MS: start with old plan, migrate to new plan by MS strategy
- PT: start with old plan, migrate to new plan by PT strategy
38. Experimental Results: High Config
39. Conclusions
- Identify problem of migration for stateful operators
- First solutions for continuous query migration
- Moving state strategy
- Parallel track strategy
- Embed both strategies into stream system
- Cost model and experimental evaluation
- Cost model confirmed by experiments
- Identify performance trade-off of the two strategies
40. Conclusions
- Migration duration
- Consistent with prior analysis
- Moving state strategy
- Affected by arrival rates and window size
- Parallel track strategy
- 2W given enough system resources
- Otherwise affected by arrival rates and window size
- Output during migration
- No output during MS migration
- Still output during PT migration
41. Future Work
- General migration framework
- All stateful operator types
- Cost analysis
- Effects on optimization choices
42. CAPE Website
- http://davis.wpi.edu/dsrg/CAPE/