Robust Query Processing through Progressive Optimization - PowerPoint PPT Presentation

About This Presentation
Title:

Robust Query Processing through Progressive Optimization

Description:

Robust Query Processing through Progressive Optimization ... Current optimizers depend heavily upon the cardinality ... algorithms, and flipping inner/outer ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 36
Provided by: rajaag
Category:

less

Transcript and Presenter's Notes

Title: Robust Query Processing through Progressive Optimization


1
Robust Query Processing through Progressive
Optimization
Volker Markl, Vijayshankar Raman, David
Simmen, Guy Lohman, Hamid Pirahesh, Miso
Cilimdzic
  • SIGMOD 2004

Modified by S. Sudarshan from talk by Raja
Agrawal
2
Motivation
  • Current optimizers depend heavily upon the
    cardinality estimations
  • What if there errors in those estimations?
  • Errors can occur due to
  • Inaccurate statistics
  • Invalid assumptions (e.g. attribute independence)

3
Progressive Query Optimization
  • Idea lazily trigger reoptimization during
    execution if cardinality counts indicate current
    plan is suboptimal
  • introduces checkpoint (CHECK) operator to compare
    actual vs estimated cardinality
  • key idea precompute cardinality ranges for which
    plan is optimal

4
Evaluating a re-optimization scheme
  • Risk Vs Opportunity
  • Risk
  • Extent to which re-optimization is not worthwhile
  • reoptimization chooses another bad plan
  • work redone
  • cardinality errors may even cancel, and fixing
    one may give an even worse plan!
  • Opportunity
  • Refers to the aggressiveness
  • more CHECK operators..

5
Background
Risk
Opportunity
  • Redbrick
  • Star schema with fact table and multiple
    dimension tables
  • First apply selections on dimension tables
  • Then decide what plan to use
  • Kabra DeWitt 98 (KD98)
  • Introduced idea of mid-query reoptimization
  • Allow partial results to be use like materialized
    views
  • But ad-hoc cardinality threshold, and only
    reoptimize fully materialized plans

6
Background
Risk
Opportunity
  • Tukwila data integration system
  • optimizer may have no idea of statistics
  • interleave optimization and query execution
  • partial query plans
  • Fragment fully pipelined tree with doubly
    pipelined hash join
  • Query Scrambling
  • reorder query to deal with delayed sources

7
Background
Risk
Opportunity
  • Eddies (Telegraph)
  • Ingres/DEC Rdb run multiple access methods
    competitively then choose
  • Parametric Query Optimization (PQO)
  • e.g. Cole and Graefe 94, Hulgeri and Sudarshan 02
  • Choose from a set of plans, each optimal for
    selectivity range
  • POP converse find optimal cardinality range for
    a give plan

8
Example of Progressive Optimization in Action
9
Progressive Query Optimization(POP)
10
Architecture of POP
11
Architecture of POP
  • CHECK operator to find if a plan is suboptimal
  • At optimization time, find out cardinality range
    (at CHECK location) for which plan is optimal
  • At run time, ensure cardinality within l,u
  • If violated, stop plan execution and reoptimize
  • Location of CHECKs
  • Re-optimize
  • taking observed cardinality into account, and
  • exploiting intermediate results where beneficial
  • Heuristic limit number of reoptimizations
    (default 3)

12
Validity Ranges
  • Consider a plan edge e that flows rows into
    operator o,
  • let P be the subplan rooted at o.
  • The validity range for e is an upper and lower
    bound on the number of rows flowing through e,
    such that if the range is violated at runtime, we
    can guarantee P is suboptimal
  • Ad-hoc thresholds (proposed earlier) are a bad
    idea
  • E.g. even a 100x error on very small relation may
    not make a difference in optimal plan

13
Finding Optimality Ranges
  • Plan Popt with root operator oopt is being
    compared with another plan Palt different only in
    the root operator oalt.

14
Finding Optimality Ranges
  • Need to solve
  • cost(Palt , c) cost(Popt , c) 0
  • where c is the cardinality on edge e
  • Cost functions can be complex/non-linear/non-conti
    nuous

15
Newton-Raphson Iteration
16
What does this achieve?
  • Detects suboptimality of the root operator where
    Popt and Palt share the same input edges.
  • Validity range might miss a cross-over point with
    a plan that uses a different join order (and
    hence has different input edges).
  • Two plans are structurally equivalent if they
    share the same set of edges
  • where an edge is defined by the set of rows
    flowing through it during query execution.
  • Allows different algorithms, and flipping
    inner/outer

17
Optimality wrt structurally equivalent plans
  • Theorem . Suppose edges edges ei1 , ei2 , ,
    eik are seen to be erroneous wrt cardinality.
    Then the following statements are equivalent
  • P is suboptimal with respect to another plan P'
    that has the same set of edges e1 , e2 , , em
  • At least one of Pi1 , Pi2 , , Pik is suboptimal
    given the cardinality errors in those edges in
    e1 , e2 , , em that lie under them.
  • At least one of oi1 , oi2 , , oik is a
    suboptimal operator given the cardinality errors
    in e1 , e2 , , em that are in its input
    edges.

18
Conservative detection of suboptimality
  • Suppose we detect suboptimality of (R Join S)
    Join T wrt estimated costs of (R Join T) Join S
  • During run time, we can never observe the
    cardinality of R Join T
  • We would be making an arbitrary guess as to the
    correlation of the predicates on the R and T
    tables
  • Best not to infer suboptimality wrt such
    estimates
  • However, reoptimization may result in a different
    join order

19
Exploiting Intermediate Results
  • All the intermediate results are stored as
    temporary MVs
  • with cardinalities available to the optimizer
  • can be reused if it leads to a better plan
  • but not necessarily used, e.g. if join result is
    very large, and a different join order is
    preferred
  • must be reused if it has performed side-effects
  • Reoptimization done as part of same transaction

20
Optional use of MV
21
Variants of CHECK
  • Variants applicable in different cases, trade off
    risk for opportunity
  • Variants
  • Lazy checking
  • Lazy checking with eager materialization
  • Eager checking without compensation
  • Eager checking with buffering
  • Eager checking with deferred compensation

22
Lazy Checking
  • Adding CHECKs above a materialization point
    (SORT, TEMP etc)
  • No results have been output yet
  • And materialized results can be re-used
  • very low overhead

23
Lazy checking with eager materialization
  • Can insert materialization point if it does not
    exists already
  • Risk overhead of materialization
  • Typically done only for outer input of indexed
    nested-loop join
  • low cost if outer is small (as estimated by
    optimizer)
  • and INL is in trouble anyway if outer is large

24
Eager Checking
  • Lazy checking may be too late
  • e.g. if very bad join order chosen, with huge
    intermediate results
  • Idea check even before entire result is
    materialized, and stop early
  • Problem what if some results have already been
    output?
  • Compensation

25
Eager Checking
  • EC without Compensation
  • CHECK is pushed down the materialization point,
    into pipeline

26
Eager Checking
  • EC with buffering
  • CHECK and buffer
  • output from buffer once sure about bound
  • e.g. 0,b), or b,infinity
  • else reoptimize
  • delayed pipelining

27
EC with Deferred Compensation
  • Only SPJ queries
  • Identifier of all rows returned to the user are
    stored in a table S, which is used later in the
    new plan for anti-join with the new-result stream

28
CHECK Placement
29
CHECK Placement
  • LCEM and ECB outer side of nested-loop join
  • LC above materialization points
  • ECWC and ECDC anywhere
  • Do not place CHECKs if
  • no alternative plan above CHECK
  • simple queries with low estimated cost

30
Performance Analysis Robustness
  • TPC-H Q10 Replace constant in selection on
    lineitem by parameter marker, so optimizer
    doesnt know actual selectivity
  • 5 different optimal plans

31
Risk Analysis
  • Analyze LC, LCEM, ECB
  • Can be reoptimized more than once
  • Conclusion low overhead/risk

32
Opportunity Analysis
  • Goal how often does opportunity to reoptimize
    arise?
  • Introduce LC/LCEM/ECB checkpoints
  • But turn off reoptimization, and run same plan
  • Opportunity region for ECB dotted line

33
POP in (in)action
  • Real world workload (DMV data and queries)
  • Complex predicates leading to cardinality
    estimation errors
  • substring comparison, like, IN, ..

34
POP in (in)action (contd.)
  • Re-optimization may result in the choice of worse
    plan due to
  • Two estimation errors canceling out each other
  • Re-using intermediate results

35
Conclusions
  • POP gives us a robust mechanism for
    re-optimization through inserting of CHECK (in
    its various flavors)
  • Higher opportunity at low risk
Write a Comment
User Comments (0)
About PowerShow.com