# Query Optimization Strategies - PowerPoint PPT Presentation

View by Category
Title:

## Query Optimization Strategies

Description:

### Some initial suggestions for the project proposal ... since they didn't pay off immediately had to hack them into sort-merge join ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 30
Provided by: zack4
Category:
Tags:
Transcript and Presenter's Notes

Title: Query Optimization Strategies

1
Query Optimization Strategies
• Zachary G. Ives
• University of Pennsylvania
• CIS 650 Implementing Data Management Systems
• January 31, 2005

2
• Wednesday
• Some initial suggestions for the project proposal
report
• The next assignment
• Read Gray et al. (granularity of locks) and Kung
and Robinson (optimistic concurrency control)
papers
• Review Kung and Robinson
• Consider how optimistic CC is or isnt useful in
Web-scale data management

3
Todays Trivia Question
4
Query Optimization
• Challenge pick the query execution plan that
has minimum cost
• Sources of cost
• Interactions with other work
• Size of intermediate results
• Choices of algorithms, access methods
• Mismatch between I/O, CPU rates
• Data properties skew, order, placement

5
The General Model of Optimization
• Given an AST of a query
• Build a logical query plan
• (Tree of query algebraic operations)
• Transform into better logical plan
• Convert into a physical query plan
• (Includes strategies for executing operations)

6
Strategies
• Basically, search, over the space of possible
plans
• At least of exponential complexity in the number
of operators
• Hence, exhaustive search is generally not
feasible
• What can you do?
• Heuristics only INGRES, Oracle until the
mid-90s
• Randomized, simulated annealing, many efforts
in the mid-90s
• Heuristics plus cost-based join enumeration
System-R
• Stratified search heuristics plus cost-based
enumeration of joins and a few other operators
Starburst optimizers
• Unified search full cost-based search EXODUS,

7
What Are the Cost Estimating Factors?
• Some notion of CPU, disk speeds, page sizes,
buffer sizes,
• Cost model for every operator
• Some information about tables and data
• Sizes
• Cardinalities
• Number of unique values (from index)
• Histograms, sketches,

8
The System-R Approach Heuristics with
Cost-Based Join Enumeration
• Make the following assumptions
• All predicates are evaluated as early as possible
• All data is projected away as early as possible
• Separately consider operations that produce
intermediate state or are blocking
• Joins
• Aggregation
• Sorting
• Correlation with a subquery (join, exists, )
• By choosing a join ordering, were automatically
choosing where selections and projections are
pushed why is this so?

9
System-R Architecture
• Breaks a query into its blocks, separately
optimizes them
• Nested loops join between them, if necessary
• Within a block focuses on joins (only a few
kinds) in dynamic programming enumeration
• Principle of optimality best k-way join includes
best (k-1)-way join
• Use simple table statistics when available, based
on indices magic numbers where unavailable
• Heuristics
• Push sargable selects, projects as low as
possible
• Cartesian products only after joins
• Left-linear trees only n2n-1 cost-estimation
operations
• Grouping last
• Extra interesting orders dimension
• Grouping, ordering, join attributes

10
Example
• Schema R(a,b), S(b,c), T(a,c), U(c,d)
• SELECT d, AVG(c)FROM R,S,T,UWHERE R.aS.b AND
S.cT.c AND R.aT.a AND T.cU.cGROUP BY dORDER
BY d
• In relational algebra ?d/AVG(c)(?d(sa lt 2 (R ?
S ? T ? U)))

11
Why We Need More than System-R
• Cross-query-block optimizations
• e.g., push a selection predicate from one block
to another
• Better statistics
• More general kinds of optimizations
• Optimization of aggregation operations with joins
• Different cost and data models, e.g., OO, XML
• Additional joins, e.g., containment joins
• Can we build an extensible architecture for this?
• Logical, physical, and logical-to-physical
transformations
• Enforcers
• Alternative search strategies
• Left-deep plans arent always optimal
• Perhaps we can prune more efficiently

12
Optimizer Generators
• Idea behind EXODUS and StarburstBuild a
programming language for writing custom
optimizers!
• Rule-based or transformational language
• Describes, using declarative template and
conditions, an equivalent expression
• EXODUS compile an optimizer based on a set of
rules
• Starburst run a set of rules in the
interpreter a client could customize the rules

13
Starburst Query Optimizer
• Corona query processor vs. Core engine follows
• Part of a very ambitious project
• Hydrogen language highly orthogonal (unlike SQL),
and supported many fancy OO concepts (e.g.,
inheritance, methods), recursion special
constraints, etc. much more powerful than SQL
at the time
• Some portions of Hydrogen and Corona made their
way into DB2, later SQL standards
• Two stages (stratified search)
• Query rewrite/query graph model a
SQL-block-level, relational calculus-like
representation of queries
• Plan optimization a System-R-style dynamic
programming phase once query rewrite has completed

14
Starburst QGM
• Tries to encode relational calculus-like
concepts
• Predicates between variables within each SQL
block body
• Variables can be
• Distinguished (set-builder / for-each / F)
• Existential (?)
• Universal (?)
• Returned values from each block (head)
• Predicates across blocks

15
Starburst QGM Example
• SELECT partno, price, order_qty
• FROM quotations Q1
• WHERE Q1.partno IN
• (SELECT partnoFROM inventory Q3WHERE
Q3.onhand_qty lt Q1.order_qtyAND Q3.type cpu

16
Starburst Query Rewrite
• Focus inter-block optimizations
• Pushing predicates across views, pushing
projections down
• Magic sets rewritings
• Simplification, transitivity
• Implemented through production rules
• Condition action rules selected by
• Sequence
• Priority
• Probability distribution
• Search may stop after a budget
• End product logical plan(s), chosen via above
constraints
• Normally set is singleton
• Can also CHOOSE among set by invoking the
cost-based plan optimizer

17
Query Rewrite Example
• Convert subquery to join
• IF OP1.type Select Æ Q2.type 9 Æ (at each
evaluation of the existential predicate at most
one tuple of T2 satisfies the predicate)
• THEN Q2.type F
• Merge operations
• IF OP1.type Select Æ OP2.type Select Æ
Q2.type F Æ (NOT (T1.distinct false Æ
OP2.eliminate_duplicate true))
• THEN
• merge OP2 into OP1
• IF OP2.eliminate_duplicateTHEN
• OP1.eliminate_duplicate true

18
Query Rewrite Example
• Convert subquery to join
• IF OP1.type Select Æ Q2.type 9 Æ (at each
evaluation of the existential predicate at most
one tuple of T2 satisfies the predicate)
• THEN Q2.type F
• Merge operations
• IF OP1.type Select Æ OP2.type Select Æ
Q2.type F Æ (NOT (T1.distinct false Æ
OP2.eliminate_duplicate true))
• THEN
• merge OP2 into OP1
• IF OP2.eliminate_duplicateTHEN
• OP1.eliminate_duplicate true

19
Starburst Plan Optimization
• Separately optimizes each QGM operation (box)
• Grammar of STrategy AlteRnatives (STARs)
• Take high-level operations and turn them into
LOw-LEvel Plan OPerations (LOLEPOPs)
• JOIN, UNION, SCAN, SORT, SHIP,
• Tables and plans have properties
• Relational (tables joined, columns accessed,
predicates applied)
• Operational (ordering, site)
• Estimated (cost, cardinality)
• GLUE operators SORT, SHIP
• Join enumerator tries alternative join sequences
a la System-R
• Can produce bushy trees
• Can have rank/priority with each STAR

20
Starburst Pros and Cons
• Pro
• Stratified search generally works well in
practice DB2 UDB has perhaps the best query
optimizer out there
• Interesting model of separating calculus-level
and algebra-level optimizations
• Generally provides fast performance
• Con
• Interpreted rules were too slow and no database
user ever customized the engine!
• Difficult to assign priorities to transformations
• Some QGM transformations that were tried were
difficult to assess without running many
cost-based optimizations
• Rules got out of control

21
The EXODUS and Volcano Optimizer Generators
• Part of a database toolkit approach to building
systems
• A set of libraries and languages for building
databases with custom data models and
functionalities

(rules in E)
(EXODUS)
(gcc)
(MyDB plan)
(MyQL)
22
EXODUS/Volcano Model
• Try to unify the notion of logical-logical
transformations and logical-physical
transformations
• No stratification as in Starburst everything is
transformations
• Challenge efficient search need a lot of
pruning
• EXODUS used many heuristics, something called a
MESH
• Volcano branch-and-bound pruning, recursion
memoization

23
Example Rules
• Physical operators
• operator 2 join
• method 2 hash_join loops_join cartesian_product
• Logical-logical transformations
• join (1,2) -gt join(2,1)
• Logical-physical transformations
• join (1,2) by hash_join (1,2)
• Can get quite hairy
• join 7 (join 8 (1,2), 3) lt-gt join 8(1, join 7
(2,3))ifdef FORWARDif (NOT cover_predicate
(OPERATOR_7 oper_argument, INPUT_2 oper_property,
INPUT_3 oper_property)) REJECT

24
So How Does the Optimizer Work?(EXODUS version)
• Needs to enumerate all possible transformations
without repeating
• Every expression is stored in a MESH
• Basically, an associative lookup for each
expression, which can link to other entries in
the same MESH

25
Search in EXODUS
• Apply a transformation, see if it produces a new
node
• If so
• Find cheapest implementation rule
• Also apply all relevant transformation rules, add
results to OPEN set
• Propagate revised cost to parents (reanalyze)
• Check parents for new transformation
possibilities (rematch)
• Heuristics to guide the search in the OPEN set
• Promise an expected cost factor for each
transformation rule, based on analysis of
averages of the optimizers cost model results
• Favor items with high expected payoff over the
current cost
• Problem often need to apply 2 rules to get a
benefit use heuristics
• Once a full plan is found, optimizer does hill
climbing, only applying a limited set of rules

26
Pros and Cons of EXODUS
• Pros
• Unified model of optimization is powerful,
elegant
• Very extensible architecture
• Cons
• Combined logical and physical expressions in the
same MESH
• equivalent logical plans with different physical
operators (e.g., merge vs. hash joins) were kept
twice
• Physical properties werent handled well
• sort enforcers were seldom applied since they
didnt pay off immediately had to hack them
into sort-merge join
• Hard-coded transformation, then algorithm
selection, cost analysis
• always applied even if not part of the most
promising expression
• applied based on perceived benefit biased
towards larger expressions, which meant repeated
re-optimization
• Cost wasnt very generic a concept

27
Volcano, Successor to EXODUS(Shouldnt it be
LEVITICUS?)
• Re-architected into a top-down, memoized engine
• Depth-first search allows branch-and-bound
pruning
• FindBestPlan takes logical expression, physical
properties, cost bound
• Else compute set of possible moves
• Logical-logical rule
• Compliant logical-physical rule
• Enforcer
• Insert logical expression into lookup table
• Insert physical op, plan into separate lookup
table
• Return best plan and cost
• More generic notions of properties and enforcers
(e.g., location, SHIP), cost (an ADT)

28
• Basically, a cleaner, more object-oriented
version of the Volcano engine
• Rumor has it that MS SQL Server is currently
based on a (simplified and streamlined) version

29
Optimization Evaluation
• So, which is best?
• Heuristics plus join-based enumeration (System-R)
• Stratified, calculus-then-algebraic (Starburst)
• Con QGM transformations are almost always
heuristics-based
• Pro very succinct transformations at QGM level