Title: Dynamic Plan Migration for Continuous Query over Data Streams
1. Dynamic Plan Migration for Continuous Query over Data Streams
- Yali Zhu, Elke Rundensteiner and George Heineman
- Database System Research Group
- Worcester Polytechnic Institute
- Massachusetts, USA
Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond".
2. Motivation
- Continuous query over streams
- Statistics unknown before start
- Statistics changing during execution
- Stream rates, arrival pattern, distribution, etc
- Need for dynamic adaptation
- Plan re-optimization
- Change the shape of the query plan tree
3. Run-time Plan Re-Optimization
- Step 1: Decide when to optimize
- Statistics Monitoring
- Step 2: Generate new query plan
- Query Optimization
- Step 3: Replace current plan by new plan
- Plan Migration
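4. Naïve Plan Migration Strategy
[Figure: old plan and new plan over input streams A, B and C; the two plans apply the joins AB and BC in opposite orders]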
- Migration Steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
Problem: Works for stateless operators only
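The four migration steps can be sketched as follows. This is a minimal illustration only: the `Plan` class, its `queue` and `paused` fields, and `naive_migrate` are hypothetical names, not part of CAPE, and the operators are assumed to be stateless functions.

```python
from collections import deque

class Plan:
    """Hypothetical plan: a chain of stateless operators over one queue."""
    def __init__(self, operators):
        self.operators = operators      # each maps a tuple to a tuple or None
        self.queue = deque()            # tuples waiting inside the plan
        self.paused = False

    def process_one(self, out):
        item = self.queue.popleft()
        for op in self.operators:
            item = op(item)
            if item is None:            # tuple was filtered out
                return
        out.append(item)

def naive_migrate(old_plan, new_plan, out):
    old_plan.paused = True              # (1) pause: stop admitting new input
    while old_plan.queue:               # (2) drain tuples still inside old plan
        old_plan.process_one(out)
    new_plan.paused = False             # (3)+(4) new plan takes over and resumes
    return new_plan

out = []
old = Plan([lambda t: t if t % 2 == 0 else None, lambda t: t * 10])
new = Plan([lambda t: t * 10, lambda t: t if t % 20 == 0 else None])
old.queue.extend([1, 2, 3, 4])
current = naive_migrate(old, new, out)
current.queue.extend([5, 6])
while current.queue:
    current.process_one(out)
```

Draining terminates here only because the operators keep no state: once the queue is empty, nothing is left inside the old plan.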
5. Stateful Operator in CQ
- Why stateful?
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples
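Example: Symmetric NL join with window constraints
[Figure: join AB over inputs A and B, with State A and State B; an incoming tuple ax is matched against b1 to b5 in State B, yielding results (ax, b2) and (ax, b3)]
Key Observation: The purge of tuples in states relies on processing of new tuples.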
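A symmetric window join of this kind might look like the sketch below. It assumes a time-based window and integer timestamps; the class name `SymmetricWindowJoin` and its `insert` method are hypothetical. The point it illustrates is that a state is purged only when a tuple arrives on the opposite input.

```python
class SymmetricWindowJoin:
    """Hypothetical symmetric nested-loop join with a time-based window."""
    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate
        self.state = {"A": [], "B": []}     # State A and State B

    def insert(self, side, ts, value):
        other = "B" if side == "A" else "A"
        # Purge: an arriving tuple expires opposite-state tuples outside the window.
        self.state[other] = [(t, v) for (t, v) in self.state[other]
                             if ts - t <= self.window]
        # Probe: join the new tuple with what remains in the opposite state.
        results = []
        for (_, v) in self.state[other]:
            pair = (value, v) if side == "A" else (v, value)
            if self.predicate(*pair):
                results.append(pair)
        # Insert: keep the new tuple so later arrivals can join with it.
        self.state[side].append((ts, value))
        return results

join = SymmetricWindowJoin(window=3, predicate=lambda a, b: a == b)
for ts, b in enumerate([1, 2, 3, 2, 5], start=1):   # b1..b5 arrive at ts 1..5
    join.insert("B", ts, b)
size_before = len(join.state["B"])      # nothing purged yet: no A tuple arrived
matches = join.insert("A", 6, 2)        # ax arrives at ts 6
size_after = len(join.state["B"])
```

State B holds all five tuples until the A tuple arrives; only then are the expired ones dropped. If input A is paused, State B never shrinks.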
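6. Naïve Migration Strategy Revisited
- Steps
- (1) Pause execution of old plan
- (2) Drain out all tuples inside old plan
- (3) Replace old plan by new plan
- (4) Resume execution of new plan
Problem: Deadlock waiting. Step (2) waits for the states to empty, but purging the states relies on processing new tuples, which step (1) has paused.
[Figure: plans built from joins AB and BC over inputs A, B and C, annotated with "(2) All tuples drained", "(3) Old replaced by new" and "(4) Processing resumed"]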
7. Problem Definition
- Dynamic Plan Migration
- Input (two migration boxes)
- One contains old plan
- One contains new plan
- Have same input and output queues
- Result
- Old box is replaced by new box
- Valid Migration
- No missing tuples
- No duplicates
- Key points
- Involved plans contain stateful operators
- Need to migrate yet still retain useful states and discard useless states
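The two validity conditions (no missing tuples, no duplicates) can be checked by comparing output multisets. The sketch below assumes a reference output is available, for example from running the query without migration; `check_migration` is a hypothetical helper name.

```python
from collections import Counter

def check_migration(expected, actual):
    """Compare outputs as multisets; return (missing, duplicates)."""
    exp, act = Counter(expected), Counter(actual)
    missing = exp - act        # expected results that never appeared
    duplicates = act - exp     # results produced more often than expected
    return missing, duplicates

missing, duplicates = check_migration(
    expected=["t1", "t2", "t3"],
    actual=["t1", "t1", "t3"],
)
is_valid = not missing and not duplicates
```

A migration is valid in this sense only when both returned counters are empty.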
8. State of the Art
- Efficient mid-query re-optimization of sub-optimal query execution plans
- Kabra, DeWitt 1998
- Only migrates unprocessed portion
- Query plan competing model
- Ioannidis, Ng, et al. 1992; Graefe, Cole 1994
- Generate several candidate query plans before start
- Execute all, choose one after a while
9. Outline
- Problem Motivation and Definition
- Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
- Experimental Results
10. Moving State Strategy
- Basic idea
- Share common states between two migration boxes
- Key steps
- State Matching
- Match states based on IDs.
- State Moving
- Create new pointers for matched states in new box
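- What's left?
- Unmatched states in new box
[Figure: Old Box and New Box, both reading input queues QA, QB, QC, QD and writing output queue QABCD. Old Box: join AB (states SA, SB), then BC (states SAB, SC), then CD (states SABC, SD). New Box: join BC (states SB, SC), then CD (states SBC, SD), then AB (states SA, SBCD).]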
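State matching and state moving can be sketched with dictionaries keyed by state ID. The string IDs follow the state labels on the slide; the function `match_and_move` and the sample contents are illustrative assumptions, not CAPE code.

```python
def match_and_move(old_states, new_state_ids):
    """Share matched states by reference; report the unmatched ones.

    old_states: dict mapping state ID -> state contents (a list of tuples).
    new_state_ids: state IDs required by the new box.
    """
    new_states, unmatched = {}, []
    for sid in new_state_ids:
        if sid in old_states:                   # state matching: compare IDs
            new_states[sid] = old_states[sid]   # state moving: new pointer, no copy
        else:
            new_states[sid] = []                # must be filled some other way
            unmatched.append(sid)
    return new_states, unmatched

old_box = {"SA": ["a1"], "SB": ["b1"], "SAB": ["a1b1"],
           "SC": ["c1"], "SABC": [], "SD": ["d1"]}
new_box, unmatched = match_and_move(
    old_box, ["SB", "SC", "SBC", "SD", "SBCD", "SA"])
```

With the two boxes from the slide, the input states SA, SB, SC and SD are shared, while SBC and SBCD have no counterpart in the old box.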
11. Unmatched States
- State Recomputing
- Recursively recompute unmatched SBC and SBCD from bottom up
- Why always possible?
- Old and new boxes have same input queues
- The states associated with input queues always match
- Why necessary?
[Figure: New Box with output queue QABCD over input queues QA, QB, QC, QD: join BC (states SB, SC), then CD (states SBC, SD), then AB (states SA, SBCD)]
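Bottom-up recomputation can be sketched as repeated joins over already available states. This simplified illustration ignores window constraints and uses a placeholder predicate that accepts every pair; `join_states` and the sample tuples are hypothetical.

```python
def join_states(left, right, predicate):
    """Recompute an intermediate state by joining two child states."""
    return [l + r for l in left for r in right if predicate(l, r)]

# Matched states, shared from the old box; tuples are modeled as Python tuples.
SB = [("b1",), ("b2",)]
SC = [("c1",)]
SD = [("d1",), ("d2",)]

always = lambda l, r: True              # placeholder join predicate (assumption)

# Bottom-up: SBC depends only on matched states; SBCD depends on SBC.
SBC = join_states(SB, SC, always)
SBCD = join_states(SBC, SD, always)
```

The order matters: SBCD can only be rebuilt after SBC, which in turn needs only the input states that always match.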
12. Terms on Tuples
- New/Old tuples
- Old tuples: already in old box when migration starts
- New tuples: do not exist in old box when migration starts
- Sub-tuples
- Tuple ABCD is the result of joining tuples A, B, C and D
- Tuples A, B, C and D are sub-tuples of tuple ABCD
- Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
[Figure: Old Box with output queue QABCD over input queues QA, QB, QC, QD: join AB (states SA, SB), then BC (states SAB, SC), then CD (states SABC, SD)]
13. Why Recompute Unmatched States
- To get the complete results of ABCD, we need all 16 old/new combinations
If SBC is not recomputed, results with both B and C as OLD will be missed.
[Figure: New Box over input queues QA, QB, QC, QD: join BC (states SB, SC), then CD (states SBC, SD), then AB (states SA, SBCD); a legend marks the sub-tuples A, B, C and D as old or new tuples]
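The 16 old/new combinations, and the ones lost when SBC starts empty, can be enumerated directly. This is a small counting sketch; the variable names are illustrative.

```python
from itertools import product

streams = ["A", "B", "C", "D"]
# Every way to label the four sub-tuples of an ABCD result as old or new.
combos = list(product(["old", "new"], repeat=len(streams)))

# If SBC starts empty in the new box, BC pairs with both B and C old never form.
missed = [c for c in combos if c[1] == "old" and c[2] == "old"]
```

Four of the 16 cases have both B and C old (A and D are free to be old or new), so skipping the recomputation of SBC would drop a quarter of the cases.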
14. Cost Estimation of MS Migration
- Cost of MS consists of
- Cost of state matching
- ID comparison (negligible)
- Cost of state moving
- Create pointers (negligible)
- Cost of state recomputing
- Majority of cost
- Affecting parameters
- Operator selectivities
- Number of tuples in states
- Estimated as (input rate × window size)
- See paper for detailed cost models
One cost model conclusion: Cost of MS has a polynomial relation to window size
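To see why the relation to window size is polynomial, the sketch below combines the slide's state-size estimate (input rate × window size) with a textbook join-size estimate (selectivity × left size × right size). The second formula is an assumption for illustration and is not the paper's cost model; both function names are hypothetical.

```python
def state_size(input_rate, window):
    """Estimated number of tuples in an input state: rate x window."""
    return input_rate * window

def recomputed_sbc_size(rate_b, rate_c, window, selectivity):
    """Assumed textbook join-size estimate for a recomputed state SBC."""
    return selectivity * state_size(rate_b, window) * state_size(rate_c, window)

small = recomputed_sbc_size(rate_b=10, rate_c=10, window=5, selectivity=0.5)
large = recomputed_sbc_size(rate_b=10, rate_c=10, window=10, selectivity=0.5)
```

Under these assumptions, doubling the window quadruples the size of the recomputed state, since each input state grows linearly with the window.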
15. MS Migration Pros and Cons
- Pros
- Fast when number of tuples in states is small
- Low input rates, low selectivity or small window
- Cons
- Output silence during entire migration stage
- Can query output even during migration?
- Motivation for Parallel Track Strategy
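16. Parallel Track Strategy
- Basic idea
- Execute both plans in parallel and gradually push old tuples out of old box by purging
- Key steps
- Connect boxes
- Execute in parallel
- Until old box expired (no old tuple or sub-tuple)
- Disconnect old box
- Start executing new box only
[Figure: Old Box and New Box connected to the same input queues QA, QB, QC, QD and the same output queue QABCD. Old Box: join AB (states SA, SB), then BC (states SAB, SC), then CD (states SABC, SD). New Box: join BC (states SB, SC), then CD (states SBC, SD), then AB (states SA, SBCD).]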
17. Potential Duplicates
- Tuple ABCD
- 2^4 = 16 possible old/new sub-tuple combinations
- Same case must not be generated by both boxes
- Otherwise we may have duplicates
- In new box
- All states start empty
- Only generates ABCD as (new, new, new, new)
- In old box
- May generate all 16 cases
- Would duplicate the case of (new, new, new, new)
Duplicate Prevention
- At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don't join
- Other ops in old box: proceed as normal
[Figure: Old Box with output queue QABCD over input queues QA, QB, QC, QD: join AB (states SA, SB), then BC (states SAB, SC), then CD (states SABC, SD)]
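The rule at the root operator of the old box can be sketched as a check on the old/new labels of the sub-tuples. It assumes each tuple carries one label per sub-tuple; `root_should_join` is a hypothetical helper name.

```python
def root_should_join(left_flags, right_flags):
    """left_flags/right_flags: 'old'/'new' label of each sub-tuple."""
    all_new = all(f == "new" for f in left_flags + right_flags)
    return not all_new                  # all-new results come from the new box

# Root of the old box joins an ABC tuple with a D tuple.
all_new_case = root_should_join(("new", "new", "new"), ("new",))
mixed_case = root_should_join(("new", "old", "new"), ("new",))
```

Only the root applies this check. Lower operators still join all-new tuples, because their intermediate results may later combine with old tuples higher in the plan.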
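18. Estimation of PT Migration
[Figure: timeline T from TM-start to TM-end covering two windows of length W (1st W, 2nd W), shown next to the Old Box (input queues QA, QB, QC, QD; states SA, SB, SAB, SC, SABC, SD); state contents are labeled Old at TM-start and New after the 1st and 2nd W]
Estimation Formula: T_PT = 2W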
19. PT Migration Duration
- Given enough system computing resources
- New tuples processed right away
- PT migration duration = 2W
- If not enough system resources
- New tuples accumulate in queues
- PT migration duration > 2W
20. Cost Estimation of PT Migration
- Cost of PT
- Cost of processing 2W worth of tuples in old box
- +
- Cost of processing 2W worth of tuples in new box
- Parameters
- Input rates, window size, selectivity
- Similar to MS strategy
21. PT Migration Pros and Cons
- Pros
- Keeps producing results even during migration
- No results during MS migration
- Cons
- Migration duration is at least 2W
- MS may be faster, depending on number of tuples in states
22. Outline
- Problem Definition and Motivation
- Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
- Experimental Results
23. Experimental Setup
- Embed in the CAPE system
- CAPE: Continuous Adaptive Processing Engine
- A streaming query engine developed at DSRG, WPI
- VLDB 2004 demo
- Layers of Adaptations
- Punctuation exploring
- Adaptive scheduling
- Query migration
- Dynamic distribution
- Input Streams
- By stream generator of CAPE
- Poisson arrival pattern
- Experiments on migration duration
- Vary window size
24. Migration Duration vs. Window Size
25. Conclusions
- Identify problem of migration for stateful operators
- First solutions for continuous query migration
- Moving state strategy
- Parallel track strategy
- Embed both strategies into stream system
- Cost model and experimental evaluation
- Cost model confirmed by experiments
- Identify performance trade-off of the two strategies
26. Thank You
- For more information, check the CAPE website at http://davis.wpi.edu/dsrg/CAPE/