Efficient Scheduling of Heterogeneous Continuous Queries


1
Efficient Scheduling of Heterogeneous Continuous Queries
Mohamed A. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis, Kirk Pruhs
Advanced Data Management Technologies Lab
Department of Computer Science, University of Pittsburgh
VLDB 2006
2
Motivating Example

Tell me when there are airplane tickets such that:
Itinerary: Pittsburgh -> Korea -> Pittsburgh
Dates: September 8 -> September 16
Price < 1200
This is a form of Continuous Query (CQ)
CQs are registered ahead of time
Arrival of new data triggers execution
CQs support monitoring applications
<insert your favorite monitoring application here>

3
Data Stream Management System (DSMS)

DSMS = Database system + Online system
Our goal: improve the online performance of a DSMS

[Figure: DSMS architecture. Input data streams feed continuous queries Q1 ... Qn, which produce output data streams D1 ... Dn. Components shown: Memory Manager, Query Optimizer, Query Scheduler, Load Shedder.]
4
Need for Query Scheduling

The execution order of continuous queries determines the overall behavior of the system
e.g., memory usage [Babcock et al., SIGMOD'03]
Traditionally:
One operator per thread
Resource management done by the OS
Problems:
No objective for optimization
Does not exploit query semantics

5
Scheduling Multiple Continuous Queries (MCQ)

Given:
A set of n queries ready to execute (queries with pending updates)
A certain metric to optimize
Then:
The MCQ Scheduler decides the execution order of the n queries so as to optimize the given metric

[Figure: queries CQ1, CQ2, ..., CQn]
6
Outline

Introduction
Scheduling for Quality of Service (QoS)
Average response time
Average slowdown
Balancing the trade-off between average and worst
case
Implementation issues
Conclusions

7
Response Time

The response time of a tuple is the interval of time between its arrival at the DSMS and its departure
Tuples that are filtered out (discarded) during query processing do not contribute to the metric
Shortest Remaining Processing Time (SRPT) is the policy to optimize response time in Web servers
Would SRPT optimize response time for multiple CQs?
No, because it does not exploit the characteristics of CQs!

8
Impact of Selectivity

Selectivity of a query (S) is the probability of producing an output tuple after processing an input tuple (i.e., detecting a related event)
S = 0.1: 10 input tuples -> 1 output event
S = 1.0: 10 input tuples -> 10 output events
If two queries have the same cost, then the one with higher selectivity produces more tuples per time unit (higher Output Rate).

9
Impact of Output Rate

Q1: S1 = 1.0 and C1 = 1 ms, then OR1 = 1.0
Q2: S2 = 0.2 and C2 = 1 ms, then OR2 = 0.2
5 pending tuples arrived at time 0
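
The effect of execution order on this example can be checked with a small expected-value calculation. This is only a sketch under assumptions that the slide does not state: each query has its own 5 pending tuples, a query drains all of its tuples before the other one runs, and a tuple counts toward the metric with probability equal to the selectivity. The function name `expected_avg_response` is made up for illustration.

```python
def expected_avg_response(order, pending=5):
    """Expected average response time (ms) when queries run back to back.

    order: list of (selectivity, cost_ms) pairs in execution order.
    Assumption: every query has `pending` tuples that arrived at time 0.
    """
    clock = 0.0
    weighted_time = 0.0     # sum of P(output) * completion time
    expected_outputs = 0.0  # sum of P(output)
    for selectivity, cost in order:
        for _ in range(pending):
            clock += cost
            weighted_time += selectivity * clock
            expected_outputs += selectivity
    return weighted_time / expected_outputs


q1 = (1.0, 1.0)  # S1 = 1.0, C1 = 1 ms
q2 = (0.2, 1.0)  # S2 = 0.2, C2 = 1 ms

q1_first = expected_avg_response([q1, q2])
q2_first = expected_avg_response([q2, q1])
print(f"Q1 first: {q1_first:.2f} ms, Q2 first: {q2_first:.2f} ms")
# Q1 first: 3.83 ms, Q2 first: 7.17 ms
```

Under these assumptions, running the query with the higher output rate first roughly halves the expected average response time.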

10
Highest Rate Policy

Assign each query a priority equal to its output rate
The output rate of a query = selectivity / cost
How to compute the output rate of a query with more than one operator?

At each scheduling point, schedule the query with the highest global output rate: the Highest Rate (HR) policy
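
A minimal sketch of the selection step, assuming each query is summarized by a single selectivity and a single per-tuple cost, and that one tuple is processed per scheduling point. The `Query` class and the function names are hypothetical and are not taken from the authors' simulator.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    selectivity: float  # probability that an input tuple yields an output
    cost_ms: float      # processing cost per input tuple
    pending: int        # tuples waiting in the input queue

    @property
    def output_rate(self) -> float:
        return self.selectivity / self.cost_ms


def pick_highest_rate(queries):
    """Return the ready query with the highest output rate, or None."""
    ready = [q for q in queries if q.pending > 0]
    return max(ready, key=lambda q: q.output_rate, default=None)


def run_hr(queries):
    """Process one tuple per scheduling point until no query is ready."""
    order = []
    while (q := pick_highest_rate(queries)) is not None:
        q.pending -= 1
        order.append(q.name)
    return order


queries = [Query("Q1", 1.0, 1.0, 2), Query("Q2", 0.2, 1.0, 2)]
schedule = run_hr(queries)
print(schedule)  # ['Q1', 'Q1', 'Q2', 'Q2']
```

Because the priority here depends only on static query properties, the ranking could be computed once rather than at every scheduling point; the loop recomputes it only to keep the sketch short.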

11
Simulation Testbed

Developed a DSMS simulator in C
Policies for multi-query scheduling:
Round Robin (RR, Aurora)
Highest Rate (HR)
First Come First Serve (FCFS)
Shortest Remaining Processing Time (SRPT)
Input traces from Internet traffic
Generate 500 continuous queries:
select-join-project
Uniform distribution of costs and selectivities
Assigned costs and selectivities determine the system's utilization (or load)

12
Results: Average Response Time
[Figure: Avg. Response Time (µsec); callouts: 73, 65]
13
Outline

Introduction
Scheduling for Quality of Service (QoS)
Average response time
Average slowdown
Balancing the trade-off between average and worst
case
Implementation issues
Conclusions

14
Slowdown

Slowdown (or stretch) [Mehta & DeWitt, VLDB'93]
Ratio of a tuple's response time to its ideal processing time, i.e., its processing time if it were the only tuple in the system
Slowdown is fairer than response time
It relates response time to demand: tuples of an expensive query are expected to stay longer, as they contribute more to the load
Ideally, slowdown = 1
Slowdown increases with increasing load

15
SRPT vs. HR

In Web servers, SRPT is:
Optimal for response time, and
Near optimal for slowdown
Short jobs spend less time in the system
In DSMSs, HR minimizes average response time, but what about average slowdown?
Is it possible under HR for short queries to experience high slowdown, leading to an overall high slowdown?
Queries with low selectivity are penalized!

16
Example

Q1: S1 = 1.0 and C1 = 5 ms, then OR1 = 0.2
Q2: S2 = 0.33 and C2 = 2 ms, then OR2 = 0.17
3 pending tuples arrived at time 0
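
The numbers in this example can be worked through in expectation. The sketch below assumes, beyond what the slide states, that each query has its own 3 pending tuples, that a query drains its queue before the other runs, and that a tuple's ideal processing time equals its query's cost. The helper `expected_metrics` is a made-up name.

```python
def expected_metrics(order, pending=3):
    """Return (avg response time in ms, avg slowdown) in expectation.

    order: list of (selectivity, cost_ms) pairs in execution order.
    Assumption: every query has `pending` tuples that arrived at time 0.
    """
    clock = 0.0
    resp = slow = outputs = 0.0
    for selectivity, cost in order:
        for _ in range(pending):
            clock += cost
            resp += selectivity * clock           # weighted response time
            slow += selectivity * (clock / cost)  # weighted slowdown
            outputs += selectivity
    return resp / outputs, slow / outputs


q1 = (1.0, 5.0)   # S1 = 1.0,  C1 = 5 ms
q2 = (0.33, 2.0)  # S2 = 0.33, C2 = 2 ms

hr_resp, hr_slow = expected_metrics([q1, q2])    # HR order, since OR1 > OR2
alt_resp, alt_slow = expected_metrics([q2, q1])  # the reverse order
print(f"HR order:      response {hr_resp:.2f} ms, slowdown {hr_slow:.2f}")
print(f"reverse order: response {alt_resp:.2f} ms, slowdown {alt_slow:.2f}")
```

Under these assumptions the HR order gives the lower average response time (about 12.2 ms against 13.0 ms) but the higher average slowdown (about 3.9 against 2.9), because the cheap, low-selectivity query Q2 waits behind the expensive one.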

17
Parameters for Scheduling

Sx = s1 × s2 × s3
Cxavg = c1 + (c2 × s1) + (c3 × s1 × s2)
Cx = c1 + c2 + c3: cost of detecting an event (ideal processing time)
Wx: the current wait time of the oldest tuple in Qx's input queue

[Figure: query Qx as a chain of three operators, numbered 1 to 3]
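
A short sketch of how these three parameters could be computed for a chain of operators. The operator values are invented for illustration, and `chain_parameters` is a hypothetical helper.

```python
def chain_parameters(operators):
    """Compute (Sx, Cxavg, Cx) for a pipeline of operators.

    operators: list of (selectivity, cost) pairs in pipeline order.
    """
    sx = 1.0      # product of the selectivities seen so far
    cx_avg = 0.0  # expected cost per input tuple
    cx = 0.0      # cost of a tuple that passes every operator
    for s, c in operators:
        cx_avg += c * sx  # operator runs only if the tuple survived so far
        cx += c
        sx *= s
    return sx, cx_avg, cx


ops = [(0.5, 1.0), (0.5, 2.0), (1.0, 4.0)]  # made-up (s_i, c_i) values
sx, cx_avg, cx = chain_parameters(ops)
print(sx, cx_avg, cx)  # 0.25 3.0 7.0
print(sx / cx_avg)     # global output rate of the query
```

The average cost is lower than the ideal processing time whenever early operators filter tuples out, since later operators then run on only a fraction of the input.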
18
Scheduling for Slowdown (1)

Compute the slowdown (H) under two policies:
Policy X: first Q1, then Q2
Policy Y: first Q2, then Q1

[Figure: Q1 and Q2 with pending tuples t1 and t2, labeled with W1, S1, C1, C1avg and W2, S2, C2, C2avg. Equation callouts: wait time; processing time; probability that t1 is produced; extra wait time for Q1 to finish execution.]
19
Scheduling for Slowdown (2)

[Figure: the two execution orders of Q1 and Q2, with tuples t1 and t2 labeled with W1, S1, C1, C1avg and W2, S2, C2, C2avg]

Under policy X (first Q1, then Q2):

Under policy Y (first Q2, then Q1):

For HX < HY:
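
One way to write the comparison out, offered as a reconstruction rather than a transcription. It assumes that the tuple of the query scheduled first has response time equal to its wait time plus its own processing time, that the tuple of the query scheduled second additionally waits for the average cost of the first query, and that each slowdown is weighted by the probability that the tuple is produced.

```latex
\begin{align*}
H_X &= S_1\,\frac{W_1 + C_1}{C_1}
     + S_2\,\frac{W_2 + C_1^{avg} + C_2}{C_2} \\
H_Y &= S_2\,\frac{W_2 + C_2}{C_2}
     + S_1\,\frac{W_1 + C_2^{avg} + C_1}{C_1} \\
H_X < H_Y
    &\iff \frac{S_2\,C_1^{avg}}{C_2} < \frac{S_1\,C_2^{avg}}{C_1}
     \iff \frac{S_1}{C_1^{avg}\,C_1} > \frac{S_2}{C_2^{avg}\,C_2}
\end{align*}
```

In this form the wait times W1 and W2 cancel, so the preferred order depends only on the selectivity, average cost, and ideal processing time of each query.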

20
Scheduling for Slowdown (3)
Priority of Qx

Sx/Cxavg is the output rate (ORx) of Qx
Cx is the ideal processing time of a tuple produced by Qx
Our Highest Normalized Rate (HNR) policy emphasizes the tuple's ideal processing time
Inexpensive queries with low productivity are not penalized
For equal costs (Ci = 1): HNR reduces to HR
For selectivity 1 (Si = 1): HNR reduces to SRPT
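
A small sketch contrasting the two priorities. The HNR expression used here, Sx / (Cxavg × Cx), is an assumption: it is the output rate divided by the ideal processing time, which is consistent with the two special cases listed on this slide, but the original formula is not present in the transcript. The query values are those of the earlier two-query example, treated as single-operator queries so that Cxavg equals Cx.

```python
def hr_priority(selectivity, avg_cost):
    """Output rate: Sx / Cxavg."""
    return selectivity / avg_cost


def hnr_priority(selectivity, avg_cost, ideal_cost):
    """Assumed normalized rate: output rate divided by Cx."""
    return selectivity / (avg_cost * ideal_cost)


# name -> (Sx, Cxavg, Cx); single-operator queries, so Cxavg == Cx
queries = {"Q1": (1.0, 5.0, 5.0), "Q2": (0.33, 2.0, 2.0)}

hr_pick = max(queries, key=lambda q: hr_priority(*queries[q][:2]))
hnr_pick = max(queries, key=lambda q: hnr_priority(*queries[q]))
print(hr_pick, hnr_pick)  # Q1 Q2
```

With these values HR favors the expensive, productive query Q1, while the normalized rate favors the inexpensive query Q2 despite its low selectivity.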

21
Results: Average Slowdown
[Figure: Avg. Slowdown; callout: 20]
22
Outline

Introduction
Scheduling for Quality of Service (QoS)
Average response time
Average slowdown
Balancing the trade-off between average and worst
case
Implementation issues
Conclusions

23
Worst-Case Performance

Queries/Events may experience starvation
Queries with low selectivity and/or high cost
Typically measured using
maximum response time, or
maximum slowdown
Maximum slowdown (or response time) is
A very sensitive metric
It does not consider the average-case performance

24
Trade-off between Average Case and Worst Case

Maximum slowdown: worst-case performance
Average slowdown: average-case performance
We need to look at both metrics at the same time
The Lp norm of slowdowns captures both metrics
L2 norm of N tuples:
it takes into account all values
it penalizes outliers
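
To illustrate why the L2 norm reflects both the average and the worst case, the sketch below compares two invented sets of slowdowns with the same average. It assumes the usual definition of the L2 norm, the square root of the sum of squared slowdowns; the sample values and the helper `summarize` are made up.

```python
import math


def summarize(slowdowns):
    """Return (average, maximum, L2 norm) of a list of slowdowns."""
    avg = sum(slowdowns) / len(slowdowns)
    worst = max(slowdowns)
    l2 = math.sqrt(sum(h * h for h in slowdowns))
    return avg, worst, l2


balanced = [3.0, 3.0, 3.0, 3.0]
skewed = [1.0, 1.0, 1.0, 9.0]  # same average, one starved tuple

print(summarize(balanced))  # (3.0, 3.0, 6.0)
print(summarize(skewed))    # average 3.0, maximum 9.0, L2 norm about 9.17
```

The average cannot tell the two cases apart, and the maximum ignores the three well-served tuples, whereas the L2 norm uses every value and grows sharply with the outlier.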

25
Scheduling for the L2 Norm of Slowdowns

Balance Slowdown (BSD) policy
Priority of Qx combines two factors: normalized rate and current slowdown
A query is scheduled either because:
It has a high normalized rate, or
Its pending tuples have accumulated high slowdown
All users are satisfied: fairness
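
A hedged sketch of how such a priority might be computed. Two assumptions are made here that the transcript does not confirm: the two factors are multiplied, and the current slowdown of a query is the wait time of its oldest pending tuple divided by the ideal processing time (Wx / Cx). The normalized rate is taken as Sx / (Cxavg × Cx), and all numeric values are invented.

```python
def bsd_priority(selectivity, avg_cost, ideal_cost, wait):
    """Assumed BSD priority: normalized rate times current slowdown."""
    normalized_rate = selectivity / (avg_cost * ideal_cost)
    current_slowdown = wait / ideal_cost
    return normalized_rate * current_slowdown


# (Sx, Cxavg, Cx) for two single-operator queries
q1 = (1.0, 5.0, 5.0)   # expensive query, low normalized rate
q2 = (0.33, 2.0, 2.0)  # inexpensive query, high normalized rate

# Equal wait times: the higher normalized rate wins.
equal_wait = (bsd_priority(*q1, wait=10.0), bsd_priority(*q2, wait=10.0))

# Q1 has been waiting far longer: its accumulated slowdown wins.
starved_q1 = (bsd_priority(*q1, wait=100.0), bsd_priority(*q2, wait=4.0))

print(equal_wait[0] < equal_wait[1])  # True: Q2 is scheduled
print(starved_q1[0] > starved_q1[1])  # True: Q1 is scheduled
```

Unlike a static priority, this value grows while a query waits, so it has to be re-evaluated at scheduling points, which is where scheduling overhead comes from.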
26
Results: Balancing the Trade-off
[Figure: Max. Slowdown (callout: 77) and Avg. Slowdown (callout: 31)]
27
Results: L2 Norm of Slowdowns
[Figure: L2 Norm of Slowdowns; callout: 24]
28
Slowdown per Class (same cost queries)
29
Outline

Introduction
Scheduling for Quality of Service (QoS)
Implementation issues
Scheduling overhead
Shared operators (details in paper)
Conclusions

30
Optimization Methods
[Figure: ratio of the L2 norm of slowdowns (L2 SD) of BSD-Logarithmic to the L2 SD of BSD-Hypothetical]

BSD-Hypothetical = BSD without overhead

31
Conclusions

In this talk, we presented
QoS metrics for evaluating the performance of a
DSMS
Scheduling policies that exploit the properties
of CQs
Policies to improve QoS
Highest Rate (HR) for average response time
Highest Normalized Rate (HNR) for average
slowdown
Balance Slowdown (BSD) for balancing the
trade-off between average- and worst-case
performance
Addressed implementation issues to ensure the
applicability of our proposed policies
We empirically evaluated the gains provided by
the proposed policies compared to existing
policies