Dynamic Plan Migration for Continuous Queries over Data Streams


1
Dynamic Plan Migration for Continuous Queries
over Data Streams
  • Yali Zhu, Elke Rundensteiner and George Heineman
  • Database System Research Group, WPI.
  • Massachusetts, USA
  • SIGMOD 2004

Research partly supported by the RDC grant 2003-04 on On-line Stream Monitoring Systems:
Untethered Healthcare, Intrusion Detection, and Beyond.
2
Stream Query Optimization
  • How does it differ from traditional query optimization?

3
Stream Query Optimization
  • More dynamic fluctuations in statistics
  • → compile-time optimization not possible
  • Global optimization not practical for huge query
    networks
  • → adaptive optimization
  • Need to take CPU processing and main memory into
    account
  • → other cost models

4
Motivation of Query Migration
  • Continuous queries over streams
  • Statistics unknown before start
  • Statistics changing during execution
  • Stream rates, arrival patterns, distributions, etc.
  • Need for dynamic adaptation
  • Plan re-optimization
  • Change the shape of query plan tree

5
Run-time Plan Re-Optimization
  • Step 1 - Decide when to optimize
  • Statistics Monitoring
  • Step 2 - Generate new query plan
  • Query Optimization
  • Step 3 - Replace current plan by new plan
  • Plan Migration

6
Naïve Plan Migration Strategy
[Figure: old and new query plans over input streams A, B and C, built from join operators AB and BC]
  • Migration Steps
  • Pause execution of old plan
  • Drain out all tuples inside old plan
  • Replace old plan by new plan
  • Resume execution of new plan

Problem: works for stateless operators only
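For stateless operators the four steps are trivially safe, because draining leaves nothing behind. A minimal Python sketch, assuming (hypothetically) that a plan is a list of per-tuple functions fed from one input queue; `run_plan` and `execute_with_naive_migration` are illustrative names, not from the paper:

```python
from collections import deque
from typing import Callable, List, Optional

# Hypothetical model: a plan is a pipeline of stateless operators; each maps
# one tuple to a new tuple, or to None when the tuple is filtered out.
Operator = Callable[[int], Optional[int]]


def run_plan(plan: List[Operator], tup: int) -> Optional[int]:
    """Push one tuple through every operator of a stateless plan."""
    for op in plan:
        tup = op(tup)
        if tup is None:
            return None
    return tup


def execute_with_naive_migration(old_plan, new_plan, inputs, migrate_after):
    queue, out, plan = deque(inputs), [], old_plan
    processed = 0
    while queue:
        if processed == migrate_after:
            # (1) pause: no tuple is pulled from the queue at this point
            # (2) drain: nothing to do, stateless operators hold no tuples
            # (3) replace the old plan by the new plan
            plan = new_plan
            # (4) resume: the loop continues with the new plan
        result = run_plan(plan, queue.popleft())
        processed += 1
        if result is not None:
            out.append(result)
    return out


# Two semantically equivalent plans: keep even tuples and add 1, in either order.
keep_even = lambda t: t if t % 2 == 0 else None
add_one = lambda t: t + 1
keep_odd_after_add = lambda t: t if t % 2 == 1 else None
old = [keep_even, add_one]
new = [add_one, keep_odd_after_add]
```

Because neither plan keeps state, the output is the same as running either plan alone; the following slides show why this breaks once operators keep state.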
7
Stateful Operator in CQ
  • Why stateful?
  • Need non-blocking operators in CQ
  • Operator needs to output partial results
  • State data structure keeps received tuples

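Example: symmetric nested-loop (NL) join with window constraints
[Figure: join AB over streams A and B with State A and State B; a newly arrived tuple ax from A probes the tuples b1 to b5 held in State B]
Observation: the purge of tuples in states typically relies on the processing of new tuples.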
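A minimal sketch of a symmetric NL join with a window constraint, assuming (hypothetically) that tuples are `(timestamp, key)` pairs arriving in timestamp order, that the join predicate is key equality, and that a stored tuple is purged once it is more than W older than a new tuple from the other stream. States shrink only inside `insert`, which mirrors the observation above:

```python
from typing import List, Tuple

Tup = Tuple[int, str]  # hypothetical (timestamp, join key)


class SymmetricNLJoin:
    """Symmetric NL join with window W over streams A and B (a sketch)."""

    def __init__(self, window: int):
        self.window = window
        self.state = {"A": [], "B": []}  # State A and State B

    def insert(self, side: str, tup: Tup) -> List[Tuple[Tup, Tup]]:
        other = "B" if side == "A" else "A"
        ts, key = tup
        # Purge: drop tuples of the opposite state that fell out of the window
        # relative to the new tuple. This is the only place states shrink,
        # so purging depends on new tuples arriving.
        self.state[other] = [t for t in self.state[other] if ts - t[0] <= self.window]
        # Probe the opposite state (nested loop) and emit partial results now,
        # always ordered as (A tuple, B tuple).
        results = [(tup, t) if side == "A" else (t, tup)
                   for t in self.state[other] if t[1] == key]
        # Keep the new tuple in its own state for future probes.
        self.state[side].append(tup)
        return results
```

For example, after `insert("A", (8, "x"))` that tuple stays in State A until a later B tuple arrives and purges it; if input stops, it is never removed.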
8
Naïve Migration Strategy Revisited
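[Figure: old plan with join operators AB and BC over input streams A, B and C; annotation (2): all tuples drained]
Deadlock Waiting Problem: step (2) waits until the states are empty, but tuples in states are purged only when new tuples are processed, and step (1) has stopped all new input.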
  • Steps
  • (1) Pause execution of old plan
  • (2) Drain out all tuples inside old plan
  • (3) Replace old plan by new plan
  • (4) Resume execution of new plan

9
Concept of Migration Boxes
  • Two exchangeable migration boxes
  • One contains old plan or sub-plan
  • One contains new plan or sub-plan
  • Two plans are semantically equivalent
  • Same input queues and output queues
  • Migration abstracted as replacing old box by new
    box.

10
Problem Definition
  • Dynamic Plan Migration
  • Input (two migration boxes)
  • One contains old plan
  • One contains new plan
  • Have same input and output queues
  • Result
  • Old box is replaced by new box
  • Valid Migration
  • No missing tuples
  • No duplicates
  • Key points
  • Involved plans contain stateful operators
  • Need to migrate yet still retain useful states
  • and discard useless states.
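The two validity conditions can be checked by comparing outputs as multisets. A sketch, assuming (hypothetically) access to the output of a reference run without migration and of a run that migrates midway; `check_migration` and `is_valid_migration` are illustrative names:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def check_migration(reference: Iterable[Hashable],
                    migrated: Iterable[Hashable]) -> Dict[str, Counter]:
    """Compare two outputs as multisets and report the differences."""
    ref, got = Counter(reference), Counter(migrated)
    missing = ref - got      # tuples the migrated run failed to produce
    duplicates = got - ref   # extra copies (or spurious tuples) it produced
    return {"missing": missing, "duplicates": duplicates}


def is_valid_migration(reference, migrated) -> bool:
    """Valid migration: no missing tuples and no duplicates."""
    report = check_migration(reference, migrated)
    return not report["missing"] and not report["duplicates"]
```

Order is ignored on purpose: only the multiset of results matters for validity.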

11
State of the Art
  • Efficient mid-query re-optimization of
    sub-optimal query execution plans
  • Kabra, DeWitt 1998
  • Only migrates unprocessed portion
  • Query plan competing model
  • Ioannidis, Ng, et al. 1992; Graefe, Cole 1994
  • Generate several candidate query plans before
    start
  • Execute all, choose one after a while

12
Outline
  • Problem Motivation and Definition
  • Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
  • Experimental Results

13
Moving State Strategy
  • Basic idea
  • Share common states between two boxes
  • Key Steps
  • Identify common states
  • State matching
  • Share common states
  • State moving
  • Recompute unmatched states
  • State recomputing

14
Moving State Strategy
  • State Matching
  • Each state in the old box has a unique ID
  • During rewriting, a new ID is given to each newly
    generated state in the new box
  • When rewriting is done, match states based on IDs
  • State Moving
  • Between matched states
  • On the same machine, create new pointers for matched
    states in the new box
  • What's left?
  • Unmatched states in new box

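[Figure: Old Box computes ((A⋈B)⋈C)⋈D with states SA, SB, SAB, SC, SABC, SD; New Box computes A⋈((B⋈C)⋈D) with states SB, SC, SBC, SD, SA, SBCD; both boxes read input queues QA, QB, QC, QD and write output queue QABCD]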
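State matching and moving reduce to a lookup plus pointer sharing. A minimal sketch, assuming (hypothetically) that a state's ID is the set of input streams it covers, so SBC gets `frozenset("BC")`; the paper assigns IDs during rewriting and its exact scheme may differ:

```python
from typing import Dict, FrozenSet, List, Set

StateId = FrozenSet[str]


def sid(name: str) -> StateId:
    """Hypothetical ID: the set of streams a state covers, e.g. 'BC' -> {B, C}."""
    return frozenset(name)


def match_and_move(old_states: Dict[StateId, List], new_ids: Set[StateId]):
    """Share matched states with the new box; return them plus unmatched IDs."""
    new_states: Dict[StateId, List] = {}
    unmatched: Set[StateId] = set()
    for state_id in new_ids:
        if state_id in old_states:
            # State moving: the new box points at the same list object,
            # so no tuples are copied.
            new_states[state_id] = old_states[state_id]
        else:
            unmatched.add(state_id)  # must be recomputed
    return new_states, unmatched


old_box = {sid(n): [] for n in ["A", "B", "C", "D", "AB", "ABC"]}
new_box_ids = {sid(n) for n in ["A", "B", "C", "D", "BC", "BCD"]}
```

With the two boxes from the figure, SA, SB, SC and SD are shared, and SBC and SBCD are left for recomputation.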
15
Moving State Strategy
  • Basic idea
  • Share common states between two migration boxes
  • Key steps
  • State Matching
  • Match states based on IDs.
  • State Moving
  • Create new pointers for matched states in new box
  • What's left?
  • Unmatched states in new box

16
Unmatched States
  • State Recomputing
  • Recursively recompute unmatched SBC and SBCD
    bottom-up
  • Why always possible?
  • Old and new boxes have same input queues
  • The states associated with input queues always
    match
  • Why necessary?

[Figure: New Box A⋈((B⋈C)⋈D) with states SA, SB, SC, SD, SBC, SBCD and input queues QA, QB, QC, QD]
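Recomputation is ordinary join work over the matched states, done bottom-up so that each unmatched state's inputs already exist. A sketch under hypothetical assumptions: a state entry is a tuple of `(stream, timestamp, key)` sub-tuples, the join predicate is key equality, and all sub-tuples of a result must lie within W of each other:

```python
from itertools import product
from typing import Dict, List, Tuple

Sub = Tuple[str, int, str]   # hypothetical sub-tuple: (stream, timestamp, key)
Composite = Tuple[Sub, ...]  # a state entry: one or more sub-tuples


def window_join(left: List[Composite], right: List[Composite],
                window: int) -> List[Composite]:
    """Nested-loop join of two states, as used for recomputation (a sketch)."""
    out = []
    for l, r in product(left, right):
        subs = l + r
        stamps = [ts for _, ts, _ in subs]
        same_key = len({k for _, _, k in subs}) == 1      # assumed predicate
        in_window = max(stamps) - min(stamps) <= window   # assumed window rule
        if same_key and in_window:
            out.append(subs)
    return out


def recompute_unmatched(states: Dict[str, List[Composite]], window: int):
    """Bottom-up: SBC needs SB and SC; SBCD needs the recomputed SBC and SD."""
    states["BC"] = window_join(states["B"], states["C"], window)
    states["BCD"] = window_join(states["BC"], states["D"], window)
    return states
```

This only works because the leaf states SB, SC and SD always match: they sit on the shared input queues.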
17
Terms on Tuples
  • New/Old tuples
  • Old tuples: already in the old box
    when migration starts
  • New tuples: do not exist in the old box
    when migration starts
  • Sub-tuples
  • Tuple ABCD is the result of joining tuples A, B, C and D
  • Tuples A, B, C and D are sub-tuples of tuple ABCD
  • Tuple ABCD has 2^4 = 16 possible combinations of
    old/new sub-tuples

[Figure: Old Box ((A⋈B)⋈C)⋈D with states SA, SB, SAB, SC, SABC, SD and input queues QA, QB, QC, QD]
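The 16 cases are every old/new labelling of the four sub-tuples, which Python's `itertools.product` enumerates directly:

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")

# Every result tuple ABCD has one old/new label per sub-tuple: 2**4 cases.
combinations = list(product(("old", "new"), repeat=len(STREAMS)))

print(len(combinations))  # 16
for combo in combinations[:2]:
    print(dict(zip(STREAMS, combo)))
```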
18
Why Recompute Unmatched States
  • To get the complete results of ABCD, we need all
    16 old/new combinations

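[Figure: New Box A⋈((B⋈C)⋈D) with states SA, SB, SC, SD, SBC, SBCD, next to a chart of old/new combinations of sub-tuples A, B, C and D (legend: old tuple, new tuple)]
If SBC is not recomputed, results in which both B and C are old tuples will be missed.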
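A small model of the new box's BC join makes the missed cases explicit. It assumes (hypothetically) that the moved states SB and SC hold only old tuples, and that during migration a BC result is formed only when a new B or C tuple arrives and probes the opposite state; the function names are illustrative:

```python
from itertools import product


def bc_pairs_formed(sbc_recomputed: bool):
    """Old/new label pairs (B, C) that the new box's BC join can produce."""
    formed = set()
    if sbc_recomputed:
        formed.add(("old", "old"))  # produced up front by recomputation
    # A new B probes SC (old and new C); a new C probes SB (old and new B).
    for c_label in ("old", "new"):
        formed.add(("new", c_label))
    for b_label in ("old", "new"):
        formed.add((b_label, "new"))
    return formed


def missed_abcd(sbc_recomputed: bool):
    """ABCD old/new cases whose (B, C) part the new box can never form."""
    formed = bc_pairs_formed(sbc_recomputed)
    return [combo for combo in product(("old", "new"), repeat=4)
            if (combo[1], combo[2]) not in formed]
```

Without recomputing SBC, the 4 cases with B and C both old (A and D free) are lost; with it, none are.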
19
Cost Estimation of MS Migration
  • Cost of MS consists of
  • Cost of state matching
  • ID comparison (negligible)
  • Cost of state moving
  • Create pointers (negligible)
  • Cost of state recomputing
  • Majority of cost
  • Affecting parameters
  • Operator selectivities
  • Number of tuples in states
  • Estimated as input rate × window size
  • See paper for detailed cost models

Cost model conclusion: the cost of MS grows polynomially with the window size.
20
Cost Estimation of MS Migration
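$$T_{MS} = T_{match} + T_{move} + T_{recompute} \approx T_{recompute}(S_{BC}) + T_{recompute}(S_{BCD})$$
$$T_{MS} \approx \lambda_B \lambda_C W^2 (T_j + T_s \sigma_{BC}) + 2 \lambda_B \lambda_C \lambda_D W^3 (T_j \sigma_{BC} + T_s \sigma_{BC} \sigma_{BCD})$$
where
  • $T_m$: time spent for each string comparison
  • $T_c$: time spent to create a new cursor
  • $T_j$: time spent to join a pair of tuples
  • $T_s$: time spent to insert one tuple into a state
  • $\lambda_A$: average tuple input rate from QA
  • $\lambda_B$: average tuple input rate from QB
  • $\sigma_{AB}$: reduction factor of join operator AB
  • $W$: global time window constraint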
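The recomputation estimate translates directly into a function. This is a sketch of the formula as reconstructed from the slide, with the matching and moving terms treated as negligible; parameter names are hypothetical spellings of the legend's symbols, and the factor 2 is reproduced from the slide rather than derived here:

```python
def t_ms(lam_b: float, lam_c: float, lam_d: float, w: float,
         t_j: float, t_s: float, sigma_bc: float, sigma_bcd: float) -> float:
    """Estimated MS migration cost: T_recompute(SBC) + T_recompute(SBCD)."""
    # |SB| * |SC| = (lam_b * W) * (lam_c * W) join attempts,
    # of which a fraction sigma_bc is inserted into SBC.
    t_sbc = lam_b * lam_c * w ** 2 * (t_j + t_s * sigma_bc)
    # Second term as written on the slide, including its factor 2.
    t_sbcd = 2 * lam_b * lam_c * lam_d * w ** 3 * (
        t_j * sigma_bc + t_s * sigma_bc * sigma_bcd)
    return t_sbc + t_sbcd
```

With all rates and unit costs at 1 and both selectivities at 0.5, doubling W from 2 to 4 raises the estimate from 18 to 120, reflecting the W² and W³ terms.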
21
MS Migration Pros and Cons
  • Pros
  • Fast when the number of tuples in states is small
  • Low input rates, low selectivity or small window
  • Cons
  • Output silence during entire migration stage
  • Can the query produce output even during migration?
  • Motivation for Parallel Track Strategy

22
Parallel Track Strategy
  • Basic idea
  • Execute both old and new plans in parallel
  • Gradually push old tuples out of old box by
    purging
  • Key Steps
  • Connect new box
  • Execute both boxes in parallel
  • Remove old box once expired
  • Contains only new tuples
  • No old tuples or sub-tuples

23
Parallel Track Strategy
  • Key steps
  • Connect boxes
  • Execute in parallel
  • Until old box expired (no old tuple or
    sub-tuple)
  • Disconnect old box
  • Start executing new box only

[Figure: Old Box ((A⋈B)⋈C)⋈D and New Box A⋈((B⋈C)⋈D) connected in parallel to the same input queues QA, QB, QC, QD and output queue QABCD]
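A minimal sketch of PT's expiry test, assuming (hypothetically) that every stored tuple carries a timestamp and the set of streams whose sub-tuples are old, and simplifying the purge to one window check per tuple. The old box may be disconnected only once `expired()` holds:

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class StoredTuple:
    ts: int                                 # timestamp used for purging
    old_subs: FrozenSet[str] = frozenset()  # hypothetical: streams with old sub-tuples


@dataclass
class Box:
    states: Dict[str, List[StoredTuple]] = field(default_factory=dict)

    def purge(self, now: int, window: int) -> None:
        """Drop tuples that fell out of the window (driven by new arrivals)."""
        for name in list(self.states):
            self.states[name] = [t for t in self.states[name] if now - t.ts <= window]

    def expired(self) -> bool:
        """True when the box holds no old tuples or old sub-tuples."""
        return all(not t.old_subs
                   for tuples in self.states.values() for t in tuples)


def pt_step(old_box: Box, now: int, window: int) -> bool:
    """One PT tick: purge the old box and report whether it can be disconnected."""
    old_box.purge(now, window)
    return old_box.expired()
```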
24
Potential Duplicates
  • Tuple ABCD
  • 2^4 = 16 possible old/new sub-tuple combinations
  • The same case must not be generated by both boxes
  • Otherwise we get duplicates
  • In new box
  • All states start empty
  • Only generates ABCD as (new, new, new, new)
  • In old box
  • May generate all 16 cases
  • Would duplicate the case of (new, new, new, new)

25
Duplicate Elimination
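At the root operator in the old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
Other operators in the old box: proceed as normal.
[Figure: Old Box ((A⋈B)⋈C)⋈D with states SA, SB, SAB, SC, SABC, SD and input queues QA, QB, QC, QD; the check applies at the root join of SABC and SD]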
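The duplicate-elimination rule can be checked exhaustively over the 16 cases. This sketch assumes, as on the previous slide, that the old box can emit any old/new combination while the new box emits only (new, new, new, new); the helper names are illustrative:

```python
from itertools import product
from typing import List, Optional, Tuple

Flags = Tuple[str, ...]  # old/new label per sub-tuple, e.g. ("old", "new", "new")


def all_new(flags: Flags) -> bool:
    return all(f == "new" for f in flags)


def old_box_root_join(left: Flags, right: Flags) -> Optional[Flags]:
    """Root join of the old box (SABC with SD): skip all-new with all-new."""
    if all_new(left) and all_new(right):
        return None          # this case is left to the new box
    return left + right      # otherwise join as normal


def producers(combo: Flags) -> List[str]:
    """Which boxes emit the ABCD result with this old/new combination."""
    out = []
    if old_box_root_join(combo[:3], combo[3:]) is not None:
        out.append("old box")
    if all_new(combo):       # new box states start empty
        out.append("new box")
    return out


cases = {combo: producers(combo) for combo in product(("old", "new"), repeat=4)}
```

Every combination ends up with exactly one producer, and (new, new, new, new) is left to the new box.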
26
Estimation of PT Migration Duration
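[Figure: timeline from TM-start to TM-end. Left: a single join AB (h = 0) whose states SA and SB hold old tuples at TM-start and new tuples after the 1st W. Right: the Old Box ((A⋈B)⋈C)⋈D (h = 2) over the 1st W and 2nd W.]
Estimation formula:
$$T_{PT} = \begin{cases} W & \text{if } h = 0 \\ 2W & \text{if } h > 0 \end{cases}$$
where h is the height of the query tree.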
27
PT Migration Duration
  • Given enough system computing resources
  • new tuples processed right away
  • PT migration duration = 2W
  • If not enough system resources
  • New tuples accumulated in queues
  • PT migration duration > 2W

28
Cost Estimation of PT Migration
  • Cost of PT
  • Cost of processing 2W worth of new tuples in old box
  • Cost of processing 2W worth of new tuples in new box
  • Parameters
  • Input rates, window size, selectivity
  • Similar to MS strategy

29
Cost Estimation of PT Migration
  • Costs of processing 2W worth of new tuples in both boxes
  • For old box
  • T_AB = Cost of purge + Cost of insert + Cost of join
  • For new box
  • Differentiate first and second W
  • T_BC = Cost for the first W + Cost for the second W

30
PT Migrations Pros and Cons
  • Pros
  • Keep on producing results even during migration
  • (no results during MS migration)
  • Cons
  • Migration duration is at least 2W
  • MS may be faster, depending on the number of tuples in states

31
Outline
  • Problem Definition and Motivation
  • Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
  • Experimental Results

32
Experimental Setup
  • Embed in the CAPE system
  • CAPE: Continuous Adaptive Processing Engine
  • A streaming query engine developed at DSRG, WPI
  • VLDB04 demo
  • Layers of Adaptations
  • Punctuation exploring
  • Adaptive scheduling
  • Query migration
  • Dynamic distribution

33
Experimental Setup (II)
  • Experiments on migration duration
  • Vary window size
  • Vary input rates
  • Experiments on migration effects
  • Changes of output rates
  • Arrival Streams
  • Generated by stream generator in CAPE
  • Poisson arrival pattern (exponentially distributed
    inter-arrival times)
  • Machine
  • Windows 2000, Pentium III processor
  • 500 MHz CPU, 384 MB RAM
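Poisson arrivals can be generated from exponentially distributed gaps. A sketch using Python's standard `random.expovariate`, which takes the rate (1 / mean gap); `poisson_arrivals` is a hypothetical helper, not CAPE's stream generator:

```python
import random
from typing import List


def poisson_arrivals(mean_interarrival_ms: float, duration_ms: float,
                     seed: int = 0) -> List[float]:
    """Arrival timestamps (ms) of a Poisson stream over [0, duration_ms)."""
    rng = random.Random(seed)          # seeded for reproducible experiments
    rate = 1.0 / mean_interarrival_ms  # expovariate expects the rate (lambda)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)     # exponential inter-arrival gap
        if t >= duration_ms:
            return arrivals
        arrivals.append(t)
```

A mean gap of 100 ms over 100 seconds yields roughly 1000 arrivals, varying with the seed.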

34
Experimental Setup (II)
  • Experiments on migration duration
  • Vary window size
  • Vary input rates
  • Experiments on migration effects
  • Changes of intermediate results
  • Changes of output rates
  • Data Set
  • Enough system resources (low config)
  • Not enough system resources (high config)
  • Machine
  • Windows 2000, Pentium III processor
  • 500 MHz CPU, 384 MB RAM

Data sets: set1 to set3 for migration duration experiments; set4 (L, low config) and set5 (H, high config) for migration effects experiments

Parameter   set1   set2   set3   set4 (L)   set5 (H)
W (ms)      vary   1000   vary   1000       2000
IA (ms)     100    50     100    100        50
IB (ms)     100    vary   12     100        50
IC (ms)     100    50     12     100        50
ID (ms)     100    50     12     100        50
σAB         0.1    0.1    0.1    0.1        0.2
σBC         0.05   0.05   0.1    0.02       0.05
σCD         0.02   0.02   0.1    0.02       0.05
35
Migration Duration vs. Window Size
36
Migration Duration vs. Input Rates
  • T_MS almost constant
  • T_PT increases with λB

37
Migration Effects
  • Migration starts at 10000ms
  • Four lines
  • New: run the new (better) query plan alone
  • Old: run the old (worse) query plan alone
  • MS: start with old plan, migrate to new plan by
    MS strategy
  • PT: start with old plan, migrate to new plan by
    PT strategy

38
Experimental Results: High Config
39
Conclusions
  • Identify problem of migration for stateful
    operators
  • First solutions for continuous query migration
  • Moving state strategy
  • Parallel track strategy
  • Embed both strategies into stream system
  • Cost model and experimental evaluation
  • Cost model confirmed by experiments
  • Identify performance trade-off of two strategies

40
Conclusions
  • Migration duration
  • Conforms to prior analysis
  • Moving State Strategy
  • Affected by arrival rates and window size
  • Parallel Track Strategy
  • 2W given enough system resources
  • Otherwise affected by arrival rates and window
    size
  • Output during migration
  • No output during MS migration
  • Still output during PT migration

41
Future Work
  • General migration framework
  • All stateful operator types
  • Cost analysis
  • Effects on optimization choices

42
  • CAPE website at http://davis.wpi.edu/dsrg/CAPE/