Exploiting Similarity of Subqueries for Complex Query Optimization1 - PowerPoint PPT Presentation

About This Presentation
Title:

Exploiting Similarity of Subqueries for Complex Query Optimization1

Description:

If there is at least one predicate in P involving vertices ... maximizes the value of formula (1). How to Choose Starting Nodes ... – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 34
Provided by: cseY
Category:

less

Transcript and Presenter's Notes

Title: Exploiting Similarity of Subqueries for Complex Query Optimization1


1
COSC6421 Advanced Database Systems
Instructor Jarek Gryz
Exploiting Similarity of Subqueries for Complex
Query Optimization1
Presented by Qiong Wang
York University
2
Outline
  • Introduction-Why exploiting similarity
  • Description of Technique
  • Experimental Result
  • Advantages and Limitation Discussion
  • Related work
  • Summary

3
Background Introduction
  • User queries become more and more complex, such
    as increasing complexity of the operations and
    the query structure.
  • Query optimizers are seeking an efficient query
    execution plan( QEP) in DBMS. But, there are some
    limitations
  • Limitations of two types of current algorithms
    determining a optimal join orders
  • Dynamic programming -gt Optimal Solution but
    long optimization time
  • Heuristics Based Algorithm-gt Good time complexity
    but suboptimal solution
  • Limitations of some other studies which can find
    a good plan
  • May not make use of some special characteristics
    of the underlying queries
  • Only exploiting the common sub-queries in a query

4
Technique Introduction
  • A new similarity-based optimization technique.
  • Takes the structural characteristics of a complex
    query into consideration
  • The key idea
  • Identify groups of similar sub-queries that often
    appear in a complex query
  • Share the optimization result within each group
    in the query

5
Technique Description
  • Preliminary
  • Exploiting Similarity of Sub-Query
  • Optimizing Query

6
A query Q
Preliminary
  • TR1,R2,R3,,Rm The set of Base Tables
    referenced in Q
  • P p1,p2,p3,pn The set of predicates
    referenced in Q
  • Table Instance - each table reference in Q

7
Preliminary
  • (1) A Query Graph G
  • V the set of all vertices in G lt Table
    Instances
  • For x ? V, R ? T , mappingd(x) R
  • E the set of all edges in G lt Predicates
  • For e ? E, c ? 2P , e is labeled with ? (e)
    c
  • If there is at least one predicate in P involving
    vertices
  • x and y, then query graph G has an edge e
    between x and y
  • The set of all predicates involving x and y
    labeled on the edge e

8
Constructing Query graph
  • An example query graph
  • G (V,E,T,P,df)
  • Sizeof(x) Size of the table represented by
    x
  • Sel (e) Selectivity of ?(e), i.e., the
    selectivity of the conjunction of
  • all the simple predicates
    in ?(e)

9
Preliminary
  • (2) Ring NetWork
  • A data structure which represents a query graph
  • Every vertex x in query graph G has a node x
  • Node x has a set of adjacent nodes y1, y2, ...,
    yn
  • A ring x ? y1 ? y2 ? ... ? yn ? x
  • to represent such an adjacency
    relationship.
  • Node x is called the owner (node) of the ring,
  • Nodes y1, y2, ..., yn are called the members of
    the ring.

10
Preliminary
  • (3) Similar Sub-query Graphes

Suppose we have two sub-query graphs G(V
,EV , TV , PV , dV , ?V ) G(V,
EV, TV, PV, dV, ?V) Rt
Error bound for table size Rs Error
bound for condition selectivities
11
Constructing Query graph
  • Similar Sub-query Graphes (contd)

If G and G satisfy the following conditions,
they are regarded as a pair of similar sub-query
graphs with respect to error bounds rt and rs,
G (rt,rs)G
12
Example of a query graph with similar sub-queries
13
Exploiting Similarity of Sub-Queries
  • Algorithm Design Idea
  • Identify pairs of similar sub-queries with
    respect to the given error bounds rt and rs
  • Optimize one sub-query in a similar pair, and map
    and apply the resulting execution plan to the
    other sub-query
  • Replace similar sub-queries with their
    (estimated) result tables in the query graph and
    optimize the resulting modified query.

14
Exploiting Similarity of Sub-Queries
  • Algorithm Explanation
  • Assume all self-loops are
    removed

1. Choosing Starting Nodes (1) Construct
ring network and similarity starting lists
15
Exploiting Similarity of Sub-Queries
Similarity Starting List
OL2
Each list is for one base table and has a header
containing -Base Table Name Ri, -Two
sublists OLi all its instances (nodes) SLi
--other table instances whose sizes are
within error bound rt with respect to
the size of Ri
SL2
Base Table R2
16
Choosing Starting Nodes (contd)
  • (2) In order to find a pair of as large as
    possible similar subquery graphs, we should
    choose a pair of nodes with the maximum number of
    adjacent node pairs.
  • Because the larger the similar subquery graphs in
    a pair, the more the optimization work can be
    shared.


17
How to Choose Starting Nodes
For any pair of potential starting nodes x and y
selected from a similarity starting list, we have
a formula
m be the size of set T for query Q
choosing a pair of starting nodes is to choose
the pair that maximizes the value of formula (1).
18
How to Choose Starting Nodes
  • Two indicator arrays O (occurrence) and S
    (similarity)
  • For each table instance
  • The lengths of arrays O and S are the size
    of T
  • Oxi indicates
  • current adjacent nodes representing base
    table Ri
  • Sxi indicates
  • current adjacent nodes whose table sizes are
  • within the given error tolerance with respect
    to Ri
  • excluding Oxi.

19
How to Choose Starting Nodes (contd)
20
Searching for Similar Sub-query Graphs
  • 2. For all the unselected nodes (x1 and y1) in
    the rings of x, y
  • Check if sizeof(x1) and sizeof(y1) are within
    error bound rt
  • Check if adding x1 and y1 into the current
    sub-query pair graph will violate the similarity
    of sub-queries or not
  • Add x1, y1 into sub-query pair graphs

21
Searching for Similar Sub-query Graphs (Contd)
  • 3. Use two threshold values c1 and c2 to
    determine whether to accept, reject, or hold a
    pair of similar sub-query graphs.
  • Let n be the number of nodes in a similar
    subquery graph
  • (1) if n c2, then the new pair of similar
    subquery graphs is accepted
  • (2) if c1 n lt c2, then we put this pair on
    hold
  • (3) if n lt c1, then the new pair is rejected.

22
  • 4. Remove all the nodes in the accepted graphs
    and choose another starting nodes and repeated
    the procedure until no pair of nodes can be
    expanded.

23
5. Optimizing Query
  • Apply an optimization algorithm to optimize one
    of the sub-query graphs in each pair.
  • Map the execution plan for one sub-query to the
    one for its partner
  • Example, G1 and G2
  • Nodes x1 ? y1, x2 ? y2, x3 ? y3,
  • By optimizing G1, we get such a plan
    ((x1x2) x3).
  • The mapped plan for G2 is ((y1y2) y3).

24
Experimental Result
Comparison of I/O costs for execution plans
generated by two techniques For a set
of randomly-generated test queries.
25
Effect of changing error bounds on I/O costs
26
Effect of changing error bounds on I/O costs
  • Very small error bounds cannot yield good
    performance, since the smaller the error bounds,
    the smaller the similar sub-queries
  • Moderate error bounds (0.15 - 0.40) yield the
    best performance, since similar sub-queries with
    reasonable sizes can be found
  • Large error bounds lead to poor performance,
    since sub-queries are less similar in such cases,
    which makes that sharing execution plans between
    them may not be appropriate
  • Very large error bound (close to 1), the
    performance of the similarity-based technique
    stays at the same level.
  • The sizes of similar sub queries may reach
    their maximums
  • when the error bounds are beyond a certain
    limit.

27
Advantages and Limitation Discussion
  • Advantages
  • Take the features of original structures of
    complex query into consideration and design a
    query graph
  • Exploiting the similarity of Sub-queries beyond
    common sub-query, reduce the optimization time by
    sharing the same mapped plan
  • Reduce the query complexity before apply any
    optimization algorithm.
  • Limitation
  • According to the experimental result, the
    selectivity of similarity sub-queries depends on
    the error bound.
  • 2. The paper didnt mention the application of
    this technique in multiple queries.

28
Related Work
  • (1) The author also introduced a technique to
    optimize a complex query by exploiting common
    sub-queries recently.
  • This technique is shown to be more effective
    than a pure randomization based method since it
    takes the structural characteristics of a complex
    query into consideration.
  • (2) Some other Algorithms used to optimize large
    query
  • Iterative improvement (II)
  • Simulated Annealing (SA)
  • Tabu Search (TS)
  • AB algorithm (AB)
  • Generic Algorithm (GA)

29
Related Work
  • (2) Related topic-Multiple-Query
  • A single query given to a system may result in
    multiple queries
  • Queries are given to the system from various
    users. Then batching all users requests is
    possibly required.
  • Some parts of the query are automatically
    generated by programs

30
Multiple Query Optimization
  • Major issue in multiple-query processing is
    the redundancy due to accessing the same data
    multiple times in different queries.2
  • Recognize the possibilities of sharing
  • Exploiting common sub-expressions and reduce
    evaluation cost
  • Reusing Materialized or intermediate results from
    other computation
  • Example Volcano-SH and RU 3
  • Modify the optimizer search strategy and find a
    globally optimal plan.
  • Example IE (Interleave Execution) and HA
    (Heuristic Algorithm) 2

31
Summary
  • Described a new query optimization technique for
    complex query, exploiting similarity of
    Sub-queries
  • Algorithm Design Idea
  • Query Graph, Ring Network, Similar Sub-query
    Graph
  • Discussed experimental result, advantages and
    limitation
  • Related work Discussion

32
Reference
  • 1 Yingying Tao, Qiang Zhu and Calisto Zuzarte.
    Exploiting Similarity of Subqueries for Complex
    Query Optimization, LNCS 2736, Jan 2003, Pages
    747 - 759
  • 2 TIMOS K. SELLIS and C. Lin. Multiple Query
    Optimization, ACM TODS13(1) p23-52 (1988)
  • 3 Prasan Roy, S. Seshadri, S. Sudarshan,
    Siddhesh Bhobe. Efficient and Extensible
    Algorithms for Multi Query Optimization, ACM 2000
    1-58113-218-2/00/05
  • 4 Jamal R.Alsabbagh, Vijay V. Raghavan.
    Analysis of Common Subexpression Exploitation
    Models in Multiple-Query Processing

33
Thank You!
Write a Comment
User Comments (0)
About PowerShow.com