Efficient Scheduling of Heterogeneous Continuous Queries

1
Efficient Scheduling of Heterogeneous Continuous Queries
Mohamed A. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis, Kirk Pruhs
Advanced Data Management Technologies Lab
Department of Computer Science
University of Pittsburgh
VLDB 2006
2
Motivating Example
  • Tell me when there are airplane tickets such
    that:
  • Itinerary: Pittsburgh -> Korea -> Pittsburgh
  • Dates: September 8 -> September 16
  • Price < 1200
  • This is a form of a Continuous Query (CQ)
  • CQs are registered ahead of time
  • Arrival of new data triggers execution
  • CQs support monitoring applications
  • <insert your favorite monitoring application
    here>

3
Data Stream Management System (DSMS)
  • DSMS = Database system + Online system
  • Our goal: improve the online performance of a DSMS

[Figure: DSMS architecture. Input Data Streams feed Continuous Queries Q1 ... Qn, each with operators 1, 2, 3, producing Output Data Streams D1 ... Dn. Components shown: Query Optimizer, Memory Manager, Load Shedder, Query Scheduler.]
4
Need for Query Scheduling
  • The execution order of continuous queries
    determines the overall behavior of the system
  • e.g., memory usage [Babcock et al., SIGMOD'03]
  • Traditionally:
  • One operator per thread
  • Resource management done by the OS
  • Problems:
  • No objective for optimization
  • Does not exploit query semantics

5
Scheduling Multiple Continuous Queries (MCQ)
  • Given:
  • A set of n queries ready to execute (queries with
    pending updates)
  • A certain metric to optimize
  • Then:
  • The MCQ scheduler decides the execution order of
    the n queries so as to optimize the given metric


[Figure: continuous queries CQ1, CQ2, ..., CQn]
6
Outline
  • Introduction
  • Scheduling for Quality of Service (QoS)
  • Average response time
  • Average slowdown
  • Balancing the trade-off between average and worst
    case
  • Implementation issues
  • Conclusions

7
Response Time
  • The response time of a tuple is the interval of
    time between its arrival at the DSMS and its
    departure
  • Tuples that are filtered out (discarded) during
    query processing do not contribute to the metric
  • Shortest Remaining Processing Time (SRPT) is the
    policy that optimizes response time in Web servers
  • Would SRPT optimize response time for multiple
    CQs?
  • No, because it does not exploit the
    characteristics of CQs

8
Impact of Selectivity
  • Selectivity of a query (S) is the probability of
    producing an output tuple after processing an
    input tuple (i.e., detecting a related event)
  • S = 0.1: 10 input tuples → 1 output event
  • S = 1.0: 10 input tuples → 10 output events
  • If two queries have the same cost, then
    the one with higher selectivity produces more
    tuples per time unit (higher Output Rate)

9
Impact of Output Rate
  • Q1: S1 = 1.0 and C1 = 1 ms, then OR1 = 1.0
  • Q2: S2 = 0.2 and C2 = 1 ms, then OR2 = 0.2
  • 5 pending tuples arrived at time 0
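
To make the comparison concrete, the following minimal sketch computes the expected average response time of this example under both execution orders. It assumes that each query has 5 pending tuples waiting since time 0, that a query's tuples are processed back to back, and that each input tuple yields an output with probability equal to the selectivity. The function name and this batch model are illustrative assumptions, not taken from the slides, and the result is a ratio of expected totals.

```python
def expected_avg_response_time(order, pending=5):
    """Expected average response time (ms) of emitted tuples.

    order: list of (selectivity, cost_ms) pairs, in execution order.
    Assumption: every query has `pending` tuples waiting since time 0.
    """
    clock = 0.0           # current time in ms
    total_response = 0.0  # expected sum of response times of emitted tuples
    total_outputs = 0.0   # expected number of emitted tuples
    for selectivity, cost in order:
        for _ in range(pending):
            clock += cost
            # Filtered-out tuples do not count, so weight by selectivity.
            total_response += selectivity * clock
            total_outputs += selectivity
    return total_response / total_outputs


q1 = (1.0, 1.0)  # S1 = 1.0, C1 = 1 ms
q2 = (0.2, 1.0)  # S2 = 0.2, C2 = 1 ms

hr_order = expected_avg_response_time([q1, q2])       # higher output rate first
reverse_order = expected_avg_response_time([q2, q1])  # lower output rate first
print(round(hr_order, 2), round(reverse_order, 2))
```

Under these assumptions, running the query with the higher output rate first gives about 3.83 ms, against about 7.17 ms for the reverse order.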

10
Highest Rate Policy
  • Assign each query a priority equal to its output
    rate
  • The output rate of a query = selectivity / cost
  • How to compute the output rate of a query with
    more than one operator?
  • At each scheduling point, schedule the query with
    the highest global output rate: the Highest Rate
    Policy (HR)
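
A scheduling decision under HR can be sketched as picking, among the queries with pending tuples, the one with the largest selectivity / cost ratio. The dictionary layout, the function name, and query Q3 below are hypothetical; only the priority formula comes from the slide.

```python
def highest_rate_pick(queries):
    """Return the name of the ready query with the highest output rate.

    queries: dict mapping name -> (selectivity, cost, has_pending).
    Returns None when no query has pending tuples.
    """
    rates = {
        name: selectivity / cost
        for name, (selectivity, cost, has_pending) in queries.items()
        if has_pending  # only queries with pending tuples are candidates
    }
    if not rates:
        return None
    return max(rates, key=rates.get)


queries = {
    "Q1": (1.0, 1.0, True),
    "Q2": (0.2, 1.0, True),
    "Q3": (0.9, 0.5, False),  # hypothetical: highest rate, but nothing to process
}
chosen = highest_rate_pick(queries)
print(chosen)
```

Recomputing the maximum at every scheduling point is linear in the number of queries; a real scheduler would likely keep the priorities in a heap or sorted list.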

11
Simulation Testbed
  • Developed a DSMS simulator in C
  • Policies for multi-query scheduling:
  • Round Robin (RR) [Aurora]
  • Highest Rate (HR)
  • First Come First Serve (FCFS)
  • Shortest Remaining Processing Time (SRPT)
  • Input traces from Internet traffic
  • Generated 500 continuous select-join-project
    queries
  • Uniform distribution of costs and selectivities
  • Assigned costs and selectivities determine the
    system's utilization (or load)

12
Results: Average Response Time
[Figure: plot of Avg. Response Time; annotated values: 73 and 65]
13
Outline
  • Introduction
  • Scheduling for Quality of Service (QoS)
  • Average response time
  • Average slowdown
  • Balancing the trade-off between average and worst
    case
  • Implementation issues
  • Conclusions

14
Slowdown
  • Slowdown (or stretch) [Mehta & DeWitt, VLDB'93]:
  • Ratio of a tuple's response time to its ideal
    processing time, i.e., its processing time if it
    were the only tuple in the system
  • Slowdown is fairer than response time
  • It relates response time to demand: tuples for
    an expensive query are expected to stay longer as
    they contribute more to the load
  • Ideally, slowdown = 1
  • Slowdown increases with increasing load

15
SRPT vs. HR
  • In Web servers, SRPT is:
  • Optimal for response time, and
  • Near optimal for slowdown
  • Short jobs spend less time in the system
  • In DSMSs, HR minimizes average response time, but
    what about average slowdown?
  • Is it possible under HR for short queries to
    experience high slowdown, leading to an overall
    high slowdown?
  • Queries with low selectivity are penalized!

16
Example
  • Q1: S1 = 1.0 and C1 = 5 ms, then OR1 = 0.2
  • Q2: S2 = 0.33 and C2 = 2 ms, then OR2 = 0.17
  • 3 pending tuples arrived at time 0
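
The effect can be checked numerically with a small sketch. It assumes each query has 3 pending tuples waiting since time 0, that both queries are single-operator (so the ideal processing time equals the cost), and that averages are ratios of expected totals. The function name and this batch model are illustrative assumptions.

```python
def expected_averages(order, pending=3):
    """Return (expected avg response time, expected avg slowdown).

    order: list of (selectivity, cost_ms) pairs, in execution order.
    Assumption: every query has `pending` tuples waiting since time 0.
    """
    clock = 0.0
    sum_response = 0.0
    sum_slowdown = 0.0
    outputs = 0.0
    for selectivity, cost in order:
        for _ in range(pending):
            clock += cost
            sum_response += selectivity * clock
            # Slowdown = response time / ideal processing time (here: cost).
            sum_slowdown += selectivity * (clock / cost)
            outputs += selectivity
    return sum_response / outputs, sum_slowdown / outputs


q1 = (1.0, 5.0)   # S1 = 1.0, C1 = 5 ms, OR1 = 0.2
q2 = (0.33, 2.0)  # S2 = 0.33, C2 = 2 ms, OR2 = 0.17

rt_q1_first, sd_q1_first = expected_averages([q1, q2])  # the order HR picks
rt_q2_first, sd_q2_first = expected_averages([q2, q1])
print(round(rt_q1_first, 2), round(sd_q1_first, 2))
print(round(rt_q2_first, 2), round(sd_q2_first, 2))
```

Under these assumptions, running Q1 first (the HR choice) gives the lower average response time, while running Q2 first gives the lower average slowdown.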

17
Parameters for Scheduling
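  • $S_x = s_1 \times s_2 \times s_3$
  • $C_x^{avg} = c_1 + (c_2 \times s_1) + (c_3 \times s_1 \times s_2)$
  • $C_x$: cost of detecting an event
  • $C_x = c_1 + c_2 + c_3$
  • $C_x$ is the ideal processing time
  • $W_x$: the current wait time of the oldest tuple
    in the input queue of $Q_x$

[Figure: query Qx with operators 1, 2, 3]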
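
These parameters can be computed for an operator pipeline as in the sketch below. It generalizes the three-operator formulas to any number of operators, which is an assumption; the function name and the sample costs and selectivities are hypothetical.

```python
def query_parameters(operators):
    """Compute (S, C_avg, C) for a pipeline of operators.

    operators: list of (cost, selectivity) pairs, in pipeline order.
      S     = product of the operator selectivities
      C_avg = expected cost of processing one input tuple
      C     = cost of a tuple that passes every operator (ideal time)
    """
    survival = 1.0   # probability that a tuple reaches the current operator
    avg_cost = 0.0
    full_cost = 0.0
    for cost, selectivity in operators:
        avg_cost += cost * survival  # an operator only runs on survivors
        full_cost += cost
        survival *= selectivity
    return survival, avg_cost, full_cost


ops = [(1.0, 0.5), (2.0, 0.4), (4.0, 1.0)]  # hypothetical (c_i, s_i) values
s, c_avg, c = query_parameters(ops)
output_rate = s / c_avg  # global output rate of the whole query
print(s, c_avg, c, round(output_rate, 4))
```

With these sample values, the average cost (2.8) is well below the ideal processing time (7.0) because most tuples are dropped before reaching the expensive third operator.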
18
Scheduling for Slowdown (1)
  • Compute slowdown (H) under two policies:
  • Policy X: first Q1, then Q2
  • Policy Y: first Q2, then Q1

[Figure: Q1 and Q2 with pending tuples t1 and t2, labeled with W1, S1, C1, C1avg and W2, S2, C2, C2avg. Equation annotations: wait time; processing time; probability that t1 is produced; extra wait time for Q1 to finish execution.]
19
Scheduling for Slowdown (2)
[Figure: Q1 and Q2 with tuples t1 and t2 under both execution orders, labeled with W1, S1, C1, C1avg and W2, S2, C2, C2avg]
  • Under policy X: first Q1, then Q2
  • Under policy Y: first Q2, then Q1
  • For HX < HY
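
The equations on these slides did not survive in the transcript. The derivation below is a reconstruction from the annotation labels (wait time, processing time, probability that a tuple is produced, extra wait time for the other query to finish), not the authors' exact formulation. It assumes that t1 and t2 are the oldest pending tuples of Q1 and Q2, and that the query scheduled second waits an extra $C^{avg}$ of the query scheduled first.

```latex
% Assumed model: each tuple's slowdown is weighted by the probability
% that it is produced (the query selectivity).
\begin{align*}
H_X &= S_1\,\frac{W_1 + C_1}{C_1}
     + S_2\,\frac{W_2 + C_1^{avg} + C_2}{C_2} \\
H_Y &= S_2\,\frac{W_2 + C_2}{C_2}
     + S_1\,\frac{W_1 + C_2^{avg} + C_1}{C_1} \\
H_X < H_Y
  &\iff \frac{S_2\,C_1^{avg}}{C_2} < \frac{S_1\,C_2^{avg}}{C_1} \\
  &\iff \frac{S_1}{C_1^{avg}\,C_1} > \frac{S_2}{C_2^{avg}\,C_2}
\end{align*}
```

Under this model the wait times cancel, so Q1 should run first exactly when its output rate divided by its ideal processing time is the larger of the two.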

20
Scheduling for Slowdown (3)
Priority of Qx: $S_x / (C_x^{avg} \times C_x)$
  • $S_x / C_x^{avg}$ is the output rate ($OR_x$) of $Q_x$
  • $C_x$ is the ideal processing time of a tuple
    produced by $Q_x$
  • Our Highest Normalized Rate (HNR) policy
    emphasizes the tuple's ideal processing time
  • Inexpensive queries with low productivity are not
    penalized
  • For equal costs ($C_i = 1$): HNR = HR
  • For selectivity 1 ($S_i = 1$): HNR = SRPT
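
As a quick check of how HNR differs from HR, the sketch below applies both priorities to the two queries of the earlier example (S1 = 1.0, C1 = 5 ms; S2 = 0.33, C2 = 2 ms). It assumes the HNR priority is the output rate divided by the ideal processing time, as the bullets above describe, and that both queries are single-operator so that the average cost equals the ideal processing time. The function and variable names are hypothetical.

```python
def hr_priority(selectivity, avg_cost):
    """Highest Rate: output rate of the query."""
    return selectivity / avg_cost


def hnr_priority(selectivity, avg_cost, ideal_time):
    """Highest Normalized Rate: output rate over ideal processing time."""
    return selectivity / (avg_cost * ideal_time)


# (selectivity, average cost, ideal processing time); single operator,
# so average cost == ideal processing time (assumption).
queries = {
    "Q1": (1.0, 5.0, 5.0),
    "Q2": (0.33, 2.0, 2.0),
}

hr = {name: hr_priority(s, c_avg) for name, (s, c_avg, _) in queries.items()}
hnr = {name: hnr_priority(s, c_avg, c) for name, (s, c_avg, c) in queries.items()}

hr_choice = max(hr, key=hr.get)
hnr_choice = max(hnr, key=hnr.get)
print(hr_choice, hnr_choice)
```

HR prefers Q1 (0.2 against 0.165), while HNR prefers the inexpensive Q2 (0.0825 against 0.04).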

21
Results: Average Slowdown
[Figure: plot of Avg. Slowdown; annotated value: 20]
22
Outline
  • Introduction
  • Scheduling for Quality of Service (QoS)
  • Average response time
  • Average slowdown
  • Balancing the trade-off between average and worst
    case
  • Implementation issues
  • Conclusions

23
Worst-Case Performance
  • Queries/events may experience starvation:
  • Queries with low selectivity and/or high cost
  • Typically measured using:
  • Maximum response time, or
  • Maximum slowdown
  • Maximum slowdown (or response time) is:
  • A very sensitive metric
  • It does not consider the average-case performance

24
Trade-off between Average Case and Worst Case
  • Maximum slowdown: worst-case performance
  • Average slowdown: average-case performance
  • We need to look at both metrics at the same time
  • The Lp norm of slowdowns captures both metrics
  • L2 norm of the slowdowns of N tuples: $\sqrt{\sum_{i=1}^{N} H_i^2}$
  • It takes into account all values
  • It penalizes outliers
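
The three metrics can be compared on a small example. The sketch below uses the standard definition of the L2 norm (square root of the sum of squares); the two slowdown lists are made-up values chosen so that the averages match, and the function name is hypothetical.

```python
import math


def slowdown_metrics(slowdowns):
    """Return average, maximum, and L2 norm of a list of slowdowns."""
    return {
        "avg": sum(slowdowns) / len(slowdowns),
        "max": max(slowdowns),
        "l2": math.sqrt(sum(h * h for h in slowdowns)),
    }


balanced = slowdown_metrics([2.0, 2.0, 2.0, 2.0])  # everyone waits equally
starved = slowdown_metrics([1.0, 1.0, 1.0, 5.0])   # one tuple is starved
print(balanced)
print(starved)
```

Both lists have the same average slowdown, so the average alone cannot tell them apart. The L2 norm is larger for the list with the starved tuple, and unlike the maximum it still reflects the other three values.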

25
Scheduling for the L2 Norm of Slowdowns
  • Balance Slowdown Policy (BSD)
  • Priority of Qx = (Normalized Rate) × (Current Slowdown)
  • A query is scheduled either because:
  • It has a high normalized rate, or
  • Its pending tuples have accumulated high slowdown
  • All users are satisfied (fairness)
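
A possible reading of the BSD priority is sketched below. It assumes the priority is the product of the normalized rate, $S_x / (C_x^{avg} \times C_x)$, and the current slowdown, taken here as $W_x / C_x$. The exact form of the current-slowdown term is an assumption, since the slide's equation did not survive in the transcript, and the function name and sample wait times are hypothetical.

```python
def bsd_priority(selectivity, avg_cost, ideal_time, wait_time):
    """Sketch of a Balance Slowdown priority (assumed form)."""
    normalized_rate = selectivity / (avg_cost * ideal_time)
    current_slowdown = wait_time / ideal_time  # assumption: W_x / C_x
    return normalized_rate * current_slowdown


# Cheap, low-selectivity query versus an expensive one, same wait time (ms).
cheap = bsd_priority(0.33, 2.0, 2.0, wait_time=4.0)
costly = bsd_priority(1.0, 5.0, 5.0, wait_time=4.0)

# The same expensive query after its oldest tuple has waited much longer.
costly_starved = bsd_priority(1.0, 5.0, 5.0, wait_time=60.0)
print(cheap, costly, costly_starved)
```

With equal wait times the cheap query wins on normalized rate, but the priority of the expensive query grows with its wait time and eventually overtakes, which is how starvation is avoided in this sketch.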
26
Results: Balancing the Trade-off
[Figure: plots of Max. Slowdown and Avg. Slowdown; annotated values: 77 and 31]
27
Results: L2 Norm of Slowdowns
[Figure: plot of L2 Norm of Slowdowns; annotated value: 24]
28
Slowdown per Class (same cost queries)
29
Outline
  • Introduction
  • Scheduling for Quality of Service (QoS)
  • Implementation issues
  • Scheduling overhead
  • Shared operators (details in paper)
  • Conclusions

30
Optimization Methods
[Figure: plot of (L2 SD of BSD-Logarithmic) / (L2 SD of BSD-Hypothetical)]
  • BSD-Hypothetical: BSD without overhead

31
Conclusions
  • In this talk, we presented:
  • QoS metrics for evaluating the performance of a
    DSMS
  • Scheduling policies that exploit the properties
    of CQs
  • Policies to improve QoS:
  • Highest Rate (HR) for average response time
  • Highest Normalized Rate (HNR) for average
    slowdown
  • Balance Slowdown (BSD) for balancing the
    trade-off between average- and worst-case
    performance
  • We addressed implementation issues to ensure the
    applicability of our proposed policies
  • We empirically evaluated the gains provided by
    the proposed policies compared to existing
    policies

32
Thank You
Questions?
http://db.cs.pitt.edu/streams
Thanks: NSF IIS-0534531 (AQSIOS Project)