Eddies: Continuously Adaptive Query Processing presentation

About This Presentation

Transcript and Presenter's Notes

Title: Eddies: Continuously Adaptive Query Processing

1
Eddies Continuously Adaptive Query Processing

Based on a SIGMOD2002 paper and talk
by Avnur and Hellerstein.

2
State-of-Art in Query Optimization

Given
Database state and statistics known a-priori
One (short) user query to process
Query may be run only once
Query Processing
A-priori decide on a (static) query plan
Run query using this one plan
Also
Possibly update statistics sometimes (in steady
state)

3
Adaptive Systems General Flavor

Repeat
Observe (model) environment
Use observation to choose behavior
Take action

4
Adaptivity in Current DBs

Limited coarse grain
Repeat
Observe (model) environment
runstats (once per week!!) model changes in data
Use observation to choose behavior
query optimization fixes a single static query
plan
Take action
query execution blindly follow plan

5
Query Optimization

Adaptivity at a per-week frequency!
Not suited for volatile environments

6
A Networking Problem!?

Networks do dataflow!
Significant history of adaptive techniques
E.g. TCP congestion control
E.g. routing
But traditionally much lower function
Ship bitstreams
Minimal, fixed code
Lately, moving up the foodchain?
app-level routing
active networks

7
Query Plans are Dataflow

Programming model iterators
old idea, widely used in DB query processing
object with three methods
Init(), GetNext(), Close()
input/output types
query plan graph of iterators
pipelining iterators that return results before
children Close()

8
Querying in Volatile Environments

Federated query processors
No control over stats, performance, admin
(DataJoiner)
Shared-Nothing Systems
No control over system balance
User control of running queries
No control over user interaction (online
aggregation)
Sensor Nets the next killer app
No control over anything!

9
Varying

Computing resources
Data flows unpredictably from sources
Code performs unpredictably along flows
Continuous volatility due to many decentralized
systems
Data Characteristics
Distributions
Burstiness
User preferences
What get fast
How much data

10
Toward Continuous Adaptivity

Need much more frequent adaptivity
Goal adapt per tuple of each relation??
The traditional runstats-optimize-execute loop is
far too coarse-grained
So, continuously perform all 3 functions, at
runtime
Aim for adaptivity over best-case performance (as
the later never exists for long)

11
Road Map

Adaptive Query Processing
Intra-join adaptivity
Synchronization Barriers
Moments of Symmetry
Eddies
Encapsulated, adaptive dataflow

12
Adaptable Operators and Plans

Moments of symmetry query
processing stage during which pipelined query
operators or inputs can be easily reordered (with
no or minimal state management)
Synchronization barriers require
inputs from different sources to be coordinated
and possibly restricted to the rate of the slower
input
We need good operators.

13
Adaptable Joins, Issue 1

Synchronization Barrier merge join
Right input frozen,waiting for left
Cant adapt while waitingfor barrier!
So, favor joins that have
no barriers or seldom barriers
at worst, adaptable barriers

?
2000 2001 2002 2003 2004
2 3 4 5 6
14
Adaptable Joins, Issue 2

Would like to reorder in-flight (pipelined) joins
Base case swap inputs to a join ??
Moment of symmetry
inputs can be swapped with no/little state
management
Aim for frequent moments of symmetry ? more
frequent adaptivity

15
Adaptable Joins, Issue 2

Moments of Symmetry
Suppose you can adapt an in-flight query plan
How would you do it?
Base case reorder inputs of a single join
Nested loops join

16
Adaptable Joins, Issue 2

Moments of Symmetry
Suppose you can adapt an in-flight query plan
How would you do it?
Base case reorder inputs of a single join
Nested loops join
Cleaner if you waittil end of inner loop

17
Adaptable Joins, Issue 2

Moments of Symmetry
Suppose you can adapt an in-flight query plan
How would you do it?
Base case reorder inputs of a single join
Nested loops join
Cleaner if you waittil end of inner loop
Hybrid Hash
Reorder while building?

18
Moments of Symmetry, cont.

Moment of Symmetry
Can swap join inputs w/o state modification
Nested Loops join end of each inner loop
Hybrid Hash join never
Sort-Merge join essentially always
More frequent moments of symmetry ? more
frequent adaptivity

19
Joins for Adaptivity

Pipelined hash join (hash ripple or Xjoin)
No synchronization barriers
Continuous symmetry
Good for equi-join
Simple (or block) ripple join
Synchronization barriers at corners
Moments of symmetry at corners
Good for non-equi-join
When symmetry At corners, i.e., for each new
tuple, once it has been processed using the given
operator s state

R
S
?
20
Beyond Binary Joins

Think of swapping inners
Can be done at a global moment of symmetry
Intuition like an n-ary join
Except that each pair can bejoined by a
different algorithm!
So
Need to introduce n-ary joins to a query engine

21
Need well-behaved join algorithms

Pipelining
Avoid synch barriers
Frequent moments of symmetry

22
Continuous Adaptivity Goal Eddies
Eddy

Avoid need for traditional cost estimation
Avoid generation of a good query plan

23
Continuous Adaptivity Eddies
Eddy

A pipelining n-ary tuple-routing iterator
(just like join or sort)
works well with ops that havefrequent moments of
symmetry

24
Continuous Adaptivity Eddies
Eddy

Adjusts flow adaptively
Tuples flow in different orders
Visit each op once before output

25
Routing Eddies
Eddy

Naïve routing policy
All ops fetch from eddy as fast as possible
Previously-seen tuples precede new tuples

26
Schedule Grab when Ready?

Two expensive selections s1 and s2
Selectivity(s1)Selectivity(s2)50
Cost(s2) 5.
Vary Cost(s1).
What expect? ?
Does it make a difference at all?

27
Cost Factor?

Two expensive selections, 50 selectivity
Cost(s2) 5. Vary cost of s1.
Favors faster operation

28
But is it Enough?

Given two expensive selections
Cost same, say cost(s1)cost(s2)5
Selectivity(s2) 50.
Vary selectivity of s1.
Does that make a difference?

29
Selectivity-based?

Two expensive selections, cost 5
Selectivity(s2) 50. Vary selectivity of s1.

30
Schedule Selectivity-based?

Conclude Heavy tuple shedder early on is good.

31
How to choose?

If we knew all selectivities and all costs (and
they were static), maybe we could pick the best
overall schedule here.
Otherwise, we need a cheap means to observe their
changes
And, we need a means to react in a simply manner
based on those perceived changes

32
An Aside How to choose?

A machine learning problem?
Each agent pays off differently
Explore Or Exploit?
Heuristics ?
Sometimes want to randomly choose one
Usually want to go with the best
If probabilities are stationary, dampen
exploration over time

33
Eddies with Lottery Scheduling

Operator gets 1 ticket when it takes a tuple
Favor operators that run fast (low cost)
Operator loses a ticket when it returns a tuple
Favor operators that drop tuples (low
selectivity)
Winner?
Large number of tickets measure of goodness
Lottery Scheduling
When two operators vie for the same tuple,
hold a lottery
Never let any operator go to zero tickets
Support occasional random exploration

34
Lottery-Based Eddy

Two expensive selections, cost 5
Selectivity(s2) 50. Vary selectivity of s1.

35
In a Volatile Environment

Two index joins
Slow 5 second delay Fast no delay
Toggle after 30 seconds

36
Related Work
Competition Sampling
Query Scrambling
Ingres DECOMP
Inter-Operator
Late Binding
Future Work
Per Query
System R
Eddies
Frequency of Adaptivity

Late Binding Dynamic, Parametric
HP88,GW89,IN92,GC94,AC96,LP97
Per Query Mariposa SA96, ASE CR94
Competition RDB AZ96
Inter-Op KD98, Tukwila IF99
Query Scrambling AF96,UFA98
Survey Hellerstein, Franklin, et al., DE
Bulletin 2000

37
Summary

Eddies Continuously Adaptive Dataflow
Suited for volatile performance environments
Changes in operator/machine peformance
Changes in selectivities (e.g. with sorted
inputs)
Changes in data delivery
Currently adapts join order
Competitive methods to adapt access join
methods?
Requires well-behaved join algorithms
Pipelining
Avoid synch barriers
Frequent moments of symmetry
The end of the runstats/optimizer/executor
boundary!
At best, System R is good for hints on initial
ticket distribution

Write a Comment

User Comments (0)

About PowerShow.com

Eddies: Continuously Adaptive Query Processing PowerPoint PPT Presentation