Adaptive Query Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Adaptive Query Processing

Description:

Actuate. Measure: monitor cardinalities, resource utilization, etc ... Actuate: switch to a new query plan, hopefully without wasting too much work ... – PowerPoint PPT presentation

Number of Views:93
Avg rating:3.0/5.0
Slides: 58
Provided by: Shan218
Learn more at: http://www.cs.umd.edu
Category:

less

Transcript and Presenter's Notes

Title: Adaptive Query Processing


1
Adaptive Query Processing
  • Amol Deshpande, University of Maryland
  • Vijayshankar Raman, IBM Almaden Research Center

Title slide
2
Goal
  • Research in query processing and optimization has
    taken a new track into adaptive query processing
    (QP)
  • Most new research projects that need QP use some
    adaptive approach
  • Whether regular QP, XML QP, text, continuous QP,
    web QP,
  • Selinger-style query processing fallen out of
    flavor
  • Also true in industrial projects, especially new
    ones.
  • Educate audience on what this is all about
  • Where adaptive QP works, where it has known
    problems, what is unknown
  • What is Out of Scope of this tutorial
  • Parallelism, distributed eddies, continuous query
    processing

3
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Late binding and parametric optimization
  • Competition
  • Mid-query reoptimization
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

4
Background on Non-Adaptive Query Processing
  • Central value proposition of relational model
    declarative queries
  • You decide what data access (query) you want to
    do the DBMS will figure out the right
    algorithm (plan)
  • right usually translated to good enough
  • Standard method of realizing this cost-based
    query optimization

5
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Disk(s)
6
Non-Adaptive Query Processing
  • Several Improvements over the years
  • Wider plan space
  • Started out with left deep tree of binary join
    operators
  • Now
  • lots more operators
  • Many more query transformations
  • multi-block SQL
  • bushy plans
  • Plan space has grown to be huge and messy
  • Use of statistics to get better costs
  • Better dynamic programming techniques
    (transformational, memoization)
  • Good surveys exist Graefe 1993, Chaudhuri
    1998

7
Success of Non-Adaptive QP
  • Used pretty much everywhere
  • every commercial DBMS uses some form of
    cost-based optimization
  • Why?
  • Plan space is too wide and combinatorial
  • Other ideas like randomized optimization simply
    dont search widely enough(e.g. all possible
    ways of using indexes)
  • Benefits from 25 years of tuning
  • Widely viewed as success story for relational
    databases, and as vindication of relational
    model
  • Some qualifications
  • DBAs love and demand control (hints,
    syntax-direction, plan-fixing across releases, )
  • Applications often help the query processor by
    splitting a query into multiple phases (precursor
    of adaptivity)

8
Where does traditional optimization break (1)
  • In recent years, optimizers have started to break
    at the seams as they are extended to new
    environments

SELECT p.profile_id, p.account_id,
p.profile_title, p.profile_default,
r.profile_detail_id AS registrant_detail_id,
r.profile_first_name AS registrant_first_name,
r.profile_last_name AS registrant_last_name,
r.profile_job_title AS registrant_job_title,
r.profile_organization_name AS registrant_organiza
tion_name, r.profile_address1 AS
registrant_address1, r.profile_address2 AS
registrant_address2, r.profile_city AS
registrant_city, r.state_id FROM
TABLE_ACCOUNT_PROFILES p LEFT JOIN
TABLE_ACCOUNT_PROFILE_DETAILS r ON p.profile_id
r.profile_id AND r.profile_type_id 1 LEFT JOIN
TABLE_STATES rs ON r.state_id rs.state_id LEFT
JOIN TABLE_COUNTRIES rc ON
Results
very complex user queries
Query Optimizer
Query Executor
Declarative Query
Disk(s)
9
Where does traditional optimization break (1)
  • In recent years, optimizers have started to break
    at the seams as they are extended to new
    environments

wide area federations
SELECT p.profile_id, p.account_id,
p.profile_title, p.profile_default,
r.profile_detail_id AS registrant_detail_id,
r.profile_first_name AS registrant_first_name,
r.profile_last_name AS registrant_last_name,
r.profile_job_title AS registrant_job_title,
r.profile_organization_name AS registrant_organiza
tion_name, r.profile_address1 AS
registrant_address1, r.profile_address2 AS
registrant_address2, r.profile_city AS
registrant_city, r.state_id FROM
TABLE_ACCOUNT_PROFILES p LEFT JOIN
TABLE_ACCOUNT_PROFILE_DETAILS r ON p.profile_id
r.profile_id AND r.profile_type_id 1 LEFT JOIN
TABLE_STATES rs ON r.state_id rs.state_id LEFT
JOIN TABLE_COUNTRIES rc ON
Results
very complex user queries
Query Optimizer
Query Executor
Declarative Query
data streams
X M L
10
Where does traditional optimization break (2)
  • Bad Statistics
  • DBA-challenged environment is the norm
  • New data types like XML and text
  • Deep Web and federated systems
  • Correlations (many optimizers implement only
    marginal histograms)
  • Very dynamic environments
  • Continuous queries query runs for ever, so
    things change
  • Wide area federations

11
Where does traditional optimization break (3)
  • Complex queries
  • Large numbers of tables (usually bail to greedy
    beyond 10-20 tables in a join)
  • Queries with Parameter markers
  • Select average(salary) from Employee where age gt
    1 and age lt 2
  • Need for sharing work between queries esp. in
    continuous queries
  • Asynchronous (push) data sources (e.g. data
    streams)
  • New Metrics for QP
  • E.g., Interactive metric
  • User preferences change too quickly for
    optimization to pay off
  • Pipelined plans focus on early results

12
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Late binding and parametric optimization
  • Competition
  • Mid-query reoptimization
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

13
Idea of Adaptive Query Processing
  • Interleave the optimization and execution stages
  • We learn better statistics as the query executes
  • Two main styles

Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
14
Idea of Adaptive Query Processing
  • Interleave the optimization and execution stages
  • We learn better statistics as the query executes
  • Two main styles

Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
15
Dimensions of Adaptation
Measure
Analyze
Actuate
Plan
  • Measure monitor cardinalities, resource
    utilization, etc
  • Analyze verify compliance with performance goals
    (very tricky problem during intermediate stages
    of a query)
  • Plan can involve optimizer, or be a routing
    policy
  • Actuate switch to a new query plan, hopefully
    without wasting too much work

16
Dimensions of Adaptation
Measure
Analyze
Actuate
Plan
  • Traditional database system use a simple form of
    inter-query adaptation
  • Measurements done on the tables directly, and
    analyzed during query optimization time
  • Also, by measuring the cardinalities during
    execution
  • Adaptive selectivity estimation
    Chen/Roussopoulos 1994
  • LEO Volker et al 2001
  • SITS Bruno/Chaudhuri 2002
  • But always a single plan is used to execute the
    query fully

17
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Late binding and parametric optimization
  • Competition
  • Mid-query reoptimization
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

18
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
19
Late Binding and Parametric Optimization
  • Idea let optimizer pick multiple plans for each
    query
  • Choose the best one just before execution

Pick one and Actuate
Plan
Q
Plan
Plans
20
Late Binding and Parametric Optimization
  • Parametric optimization Graefe/Cole 1994,
    Ioannidis 1992, Ganguly 1998, etc.
  • Pick optimal plans for various regions of the
    parameter spaces
  • Do this during dynamic programming itself
  • Using plausible assumptions about cost functions
  • Challenges
  • makes optimization too expensive
  • Many plausible assumptions dont hold in practice
    (see, e.g., Haritsa 2005)
  • Cannot address problems that 1st crop up during
    execution

selectivity 2
selectivity 1
21
Late Binding and Parametric Optimization
  • Variation Validity Ranges (Markl/Raman/Lohman/Pi
    raheh/Simmens 2005)
  • Attach to each edge of a query plan, a range of
    cardinalities within which plan is optimal
  • Easier to compute during dynamic programming
    optimizer
  • Use numerical root finding techniques works well
    with real cost functions
  • Extends nicely to more general notions of robust
    query optimization
  • e.g., range within which plan is within 50 of
    optimal (Raman/Haas/Reder/Markl 2005).
  • But, Limited effectiveness
  • only tells you if a plan is broken, not how to
    fix it
  • Meshes well with mid-query reoptimization

22
Competition
Measure
Run for a while
Actuate a few
Analyze
Pick faster one
Q
Plan
Plan
Plans
  • E.g. DEC RDB Antonshenkov 1993 run access
    methods competitively
  • Measure monitor cardinalities, costs of access
    methods
  • Analyze decide which access method has higher
    rate
  • Advantage simple method
  • Challenge avoid redundant work, avoid duplicates
  • Use a shared hash table global timestamp
    Raman/Deshpande/Hellerstein 2003
  • Needs more evaluation of overheads

23
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
24
Mid-Query Reoptimization
  • Kabra et al 1998, Markl et al 2005, Babu et al
    2005
  • Switch query execution plans at well-defined
    points
  • Typically, materialization points
  • Advantage
  • seems to work well in practice
  • Independent implementations
  • Challenge
  • coarse grained adaptation
  • Cannot always reuse work done in prior executions

If plan is sufficiently sub-optimal, switch to
new plan
25
Measure monitoring cardinalities
  • During execution, monitor cardinalities on each
    edge of query plan
  • E.g., each call to getNext() increments count
  • must refine for operators that are executed
    multiple times(e.g., inner of nested loops
    joins) e.g. Stillger/Markl/Lohman 2001
  • As we monitor, we narrow down range of possible
    values for true cardinality (primarily the lower
    bound)
  • Can extrapolate to expected value of true
    cardinality
  • E.g.
  • Expected value of c2 expected value of c1
    (current c2) / (current c1)

c1
c2
op
26
Analyze Switching plans Mid-Query
  • Re-optimize the original query to form a new plan
  • Getting a better plan
  • Plug in all cardinality information learned
    during this query
  • Reusing work
  • Plug in fully materialized relations from current
    executionas materialized views
  • Materialization points
  • Sorts
  • Inner of hash joins
  • Explicit materializations (uncorrelated inners,
    common subexpressions, )

27
Analyze when to switch plans
  • When no tuples have been output, and
  • When current plan is sufficiently sub-optimal,
    and
  • Range of true cardinalities on an edge has no
    overlap with validity range for that edge
  • Or, expected true cardinality falls far outside
    of validity range
  • Cost of reoptimization gt cost of new plan
    remaining cost of current plan
  • Hard to gauge this accurately many variables
  • How many intermediate results we will succeed in
    reusing
  • how much better is the new plan
  • Relatively easy to gauge this if we switch at a
    materialization point
  • Because cost of reoptimization is negligible

28
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
29
Query Scrambling
  • Amsaleg/Franklin/Tomasic/Urhan 1996,
    Urhan/Franklin/Amsaleg 1998
  • Designed for coping with delays in a wide area
    setting

Query Processor
Measure delays
Network
Response time optimization
Wide Area Data Sources
30
Adaptive Query Processing
Evolutionary
Revolutionary
Static Late Inter
Intra
Per Plans Binding
Operator Operator
Tuple
Traditional Dynamic QEP Query Scrambling
Xjoin, DPHJ, Eddies DBMS
Parametric Mid-query Reopt.
Convergent QP
Competitive Progressive Opt.
31
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

32
Eddies Avnur/Hellerstein 2000
Eddies combine all the dimensions of adaptivity
into a single operator
Measure
Eddy
Analyze
Actuate
Plan
33
Eddies Avnur/Hellerstein 2000
A traditional query plan queries executed using
iterator (getNext) model
Query execution using an eddy
Output
Eddy
S
Output
E
C
C
  • An eddy operator
  • Intercepts tuples from sources and output
    tuples from operators
  • Uses feedback from the operators to route

S
E
34
Example Database
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Name Level Course Instructor
Jen Senior CS2 Smith
Name Level Course
Joe Junior CS1
Jen Senior CS2
Course Instructor
CS2 Smith
Courses
Name Course
Joe CS1
Jen CS2
Name Level
Joe Junior
Jen Senior
Students
Enrolled
35
Query Execution using Eddies
Probe to find matches
Insert with key hash(joe)

HashTable S.Name
HashTable E.Name
Joe Junior
Joe Junior
Joe Junior
Eddy
S
E
Output
C

HashTable E.Course
HashTable C.Course
No matches Eddy processes the next tuple
E C
36
Query Execution using Eddies
Probe
Insert

HashTable E.Name
HashTable S.Name
Joe CS1
Joe Jr
Joe CS1
Jen Sr
Joe Jr CS1
Eddy
S
Joe CS1
E
Output
C

HashTable E.Course
HashTable C.Course
Joe Jr CS1
Joe Jr CS1
CS2 Smith
37
Query Execution using Eddies
Probe

HashTable S.Name
HashTable E.Name
Joe CS1
Joe Jr
Jen CS2 Smith
Jen Sr
Jen CS2 Smith
Jen Sr. CS2 Smith
Eddy
S
Jen Sr. CS2 Smith
Jen CS2
E
Output
C

HashTable E.Course
HashTable C.Course
Probe
Jen CS2
CS2 Smith
Joe Jr CS1
Jen CS2 Smith
Jen CS2
Note this is a symmetric hash join
38
Eddies Postmortem
Output
Output
E C
E S
S E
Courses
C E
Students
Course Instructor
CS2 Smith
Name Level
Joe Junior
Jen Senior
Students
Enrolled
Courses
Enrolled
Name Level
Joe Junior
Jen Senior
Name Course
Joe CS1
Name Course
Jen CS2
Course Instructor
CS2 Smith
  • Eddy executes different query execution plans for
    different parts of data
  • Eddy adapts the join order during query execution
  • Access methods and join algorithms still chosen
    up front

39
Eddies and Pipelined Joins
  • Simplest way to describe eddies is in terms of
    symmetric hash join
  • This is the approach we used in our example
  • Approach extends directly to index nested loops
    joins
  • Can also extend this to non-pipelined joins (e.g.
    sort-merge, or hybrid hash)
  • BUT, opportunities for adaptation are much less
  • Only at end of building whole hash table (just
    like with mid-query re-optimizations)
  • Generalized using SteMs and STAIRs

40
Eddies Routing Policy Non-adaptive
  • Choosing which operator to route a given tuple to
  • The brain of the eddy

Send S and E tuples here Returns S JOIN E (SE)
tuples
Non-Adaptive Optimization 1. Run a regular
optimizer to figure out the join order 2. Map
join order into order of tuple routing
Eddy
S
E
Output
C
Send SE, and C tuples here Returns SEC tuples
41
Eddies Routing Policy Lottery Scheduling
  • Choosing which operator to route a given tuple to
  • The brain of the eddy

Send here 99 of the time Send to the other
operator 1 of the time
sent 100 received 2
Lottery Scheduling Avnur 00 Simplified
Description 1. Maintain for each operator
tuples sent tuples returned
cost per tuple 2. Choose (roughly) based on the
above 3. Explore by randomly sending tuples in
the wrong orders
Eddy
S
E
Output
C
sent 10 received 20
42
Eddies Routing Policy Statistics Based
  • Choosing which operator to route a given tuple to
  • The brain of the eddy

Deshpande, Hellerstein 04 1. Learn
statistics on each input table as data streams
in 2. Choose (roughly) based on the statistics
(analogous to traditional QO)
Eddy
S
Output
E
C
43
Eddies Routing Policy Interactivity Metric
  • Choosing which operator to route a given tuple to
  • The brain of the eddy

Raman, Hellerstein 02 Use user
preferences for different kinds of tuples and
different kinds of result columns to determine
the routing order
Eddy
S
Output
E
C
44
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

45
State Modules (SteMs) More aggressive adaptation
  • Eddies allow dynamic adaptation of join order
  • Within a particular spanning tree of the join
    graph
  • Other adaptations
  • Access methods especially index vs scan access
    methods
  • Join algorithms e.g., index join (IJ) vs hash
    join (HJ)
  • To guard against wrong cardinality estimates
  • To simultaneously optimize for both pipelining
    completion time (IJ?HJ)
  • To deal with memory overflows (HJ?IJ)

46
Query Processing with SteMs
  • Eddy execs queries by routing tuples to SteMs and
    AMs
  • SteM
  • Dictionary of homogeneous tuples
  • (insert) build and (search) probe operations
  • Access modules (AMs)
  • probe with tuples and get matches
  • One Example Symmetric Hash Join
  • Build SteMR
  • Probe SteMS with R tuple
  • Build SteMS
  • Probe SteMR with R tuple
  • Generalizes to n-ary join
  • Can simulate other join algorithms, by using
    other routing policy
  • Thus, changing the routing allows adaptation of
    join algorithm
  • Subtle routing constraints needed to ensure
    correct execution Raman et al 2003

S bld
R bld
S probe
R probe
Eddy
repeat
S
R
47
Another application of SteMs
  • Sharing State across queries
  • E.g. 1000 queries of the formselect from R
    where R.a parameter
  • Equivalent toselect from R, Q where R.a
    Q.parameter
  • Eddy routes tuples from R SteM and queries from Q
    SteM
  • Useful for continuous-query proc. and multi-query
    proc
  • Need to extend SteMs with eviction (deletion)
    operation for window joins
  • See Madden et. al paper in SIGMOD 2002

48
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

49
Query execution using STAIRS Deshpande/Hellerstei
n 2004(Storage, Transformation and Access for
Intermediate Results)
  • Extend SteMs to handle reuse of intermediate
    results

Probe into E.Name STAIR
s1
s1
s1
s1
50
Agenda
  • Background on non-adaptive Query Processing
  • History, where it works, and where it breaks
  • Adaptive Query Processing
  • Dimensions of Adaptivity
  • Evolutionary Adaptive query processing
  • Tuple-wise adaptive Query Processing
  • Eddies, SteMs, STAIRs
  • Commentary
  • What works and what fails
  • Open problems

51
So where are we
  • Evolutionary techniques
  • Relatively well understood
  • Commercial database support in recent years
  • But
  • Dont offer fine-grained adaptivity required for
    the new environments
  • Going back to the list of reasons why adaptive
    query processing is required

52
Evolutionary Techniques
  • Can handle to some extent
  • Queries with parameter markers
  • Complex queries with large tables
  • Bad Statistics/correlations (if used in
    conjunction with sophisticated synopses)
  • Can not easily handle
  • Deep web and federated systems (maybe)
  • Interactive query metrics
  • Can not handle
  • Very dynamic environments
  • Continuous queries query runs for ever, so
    things change
  • Extensive sharing of work between queries esp.
    in continuous queries
  • Asynchronous (push) data sources, triggers, etc.
  • Open
  • XML (depends on the complexity of the query
    language)
  • Text and other types of complex data

53
So where are we
  • Revolutionary techniques
  • Show great promise
  • Have the potential to handle most of the issues
  • But
  • Much harder to implement
  • Seem to have non-negligible overheads in the
    common case
  • Very immature research area
  • More work is needed

54
Some Open Problems
  • Parametric query optimization
  • Can potentially avoid the need for adaptive query
    processing in many scenarios
  • Especially parameter markers and correlations
  • Few practical techniques so far
  • Optimization with expanded plan space
  • Eddies can explore a plan space much larger than
    traditional plan space
  • They allow relations to be broken into pieces,
    with each piece executed separately
  • Can we explore this plan space in a non-adaptive
    setting ?
  • Recent work on
  • Conditional Planning Deshpande et al, ICDE 2005
  • Content-based Routing Babu et al, VLDB 2005

55
Some Open Problems
  • Eddies
  • Routing policies
  • Whether eddies will succeed depends on the
    routing policies
  • Little work so far...
  • SteMs, STAIRs
  • Theoretical analysis of optimization space
  • Practical viability analysis
  • Especially in the context of continuous query
    processing
  • Parallel, distributed environments, P2P, Grid
  • Disk
  • Flexibility demanded by adaptive techniques at
    odds against the careful scheduling typically
    done by DBMSs
  • Very little work on understanding this

56
Conclusions
  • Declarative queries are the fundamental
    justification for the relational model
  • Otherwise we could all go back to CODASYL
  • Traditional query optimization does not work all
    that well outside of a very narrow domain
  • Evolutionary techniques
  • Well understood Offer limited adaptivity
  • Revolutionary techniques
  • Hold much more promise Immature area
  • Very exciting research lies ahead

57
Questions ?
Write a Comment
User Comments (0)
About PowerShow.com