Multi-Query Optimization and Applications - PowerPoint PPT Presentation

About This Presentation
Title:

Multi-Query Optimization and Applications

Description:

Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay Motivation Queries often involve repeated computation Queries on ... – PowerPoint PPT presentation

Number of Views:149
Avg rating:3.0/5.0
Slides: 43
Provided by: Prasa1
Category:

less

Transcript and Presenter's Notes

Title: Multi-Query Optimization and Applications


1
Multi-Query Optimization andApplications
  • Prasan Roy
  • Indian Institute of Technology - Bombay

2
Motivation
  • Queries often involve repeated computation
  • Queries on overlapping views, stored procedures,
    nested queries, etc.
  • Update expressions for a set of overlapping
    materialized views
  • Automatically generated queries
  • XML-QL complex path expressions ? SQL query
    batches
  • Our focus Faster query processing by avoiding
    repeated computation

3
Outline
  • Multi-query optimization
  • Application to related problems
  • Query result caching
  • Materialized view selection and maintenance
  • Conclusions and future work

4
Multi-Query Optimization
Prasan Roy, S. Seshadri, S. Sudarshan and
Siddhesh Bhobe, Efficient and Extensible
Algorithms for Multi-Query Optimization, ACM
SIGMOD 2000
5
Motivating Example
Best Plan for A JOIN B JOIN C
Best Plan for B JOIN C JOIN D
100
100
10
B
A
100
100
10
C
B
C
D
10
10
10
10
Total Cost 460
Foreign Key Dependency A?B?C?D
6
Motivating Example
100
100
10
10
BC
10
D
A
10
10
100
B
C
10
10
Total Cost 370 Benefit 90
Foreign Key Dependency A?B?C?D
7
Problem Statement
  • Find the cheapest plan exploiting transiently
    materialized common subexpressions (CSEs)
  • Assumption No shared pipelines

D
A
B
C
Common Subexpression
8
Problems
  • Locally optimal subplans may not be globally
    optimal
  • Mutually exclusive alternatives
  • (A JOIN B JOIN C)
  • (B JOIN C JOIN D)
  • (C JOIN D JOIN E)
  • What to share (B JOIN C) or (C JOIN D) ?
  • Materializing and sharing a CSE not necessarily
    cheaper

9
Example
Best Plan for A JOIN B JOIN C
Best Plan for B JOIN C JOIN D
10
100
10
B
A
1
10
10
C
B
C
D
10
1
1
1
Total Cost 154
Foreign Key Dependency A?B?C?D
10
Example
100
10
10
10
BC
D
10
A
1
10
10
B
C
10
1
Total Cost 172 Benefit -18
Foreign Key Dependency A?B?C?D
11
Approach
  1. Set up the search space of execution plans
  2. Explore the search space to find the best
    execution plan

12
Representation of Plan Space
  • AND/OR Query DAG

BCD
ABC
Equivalence Class (OR node)
Operation (AND node)
BC
CD
AB
Example Plan (Solution Graph)
A
C
D
B
13
DAG Generation ModificationsUnification
  • Volcano Duplicate subexpressions ? No CSEs!

ABC
BCD
BC
BC
AB
CD
C
A
B
C
D
B
  • Modification Duplicate subexpressions unified

14
DAG Generation ModificationsSubsumption
  • Volcano No expression subsumption ? Missed CSEs

?(Agt50)
?(Alt10)
?(Agt50)
Subsumption derivation
?(Alt10)
?(Agt50)
?(Agt50)
?(Alt10 or Agt50)
?(Agt10)
  • Modification Subsumption derivations introduced

15
Exploring the Search SpaceAn Exhaustive Algorithm
  • Input DAG for query Q
  • Output Set of nodes to materialize, corresp.
    best plan
  • Y set of equivalence nodes in DAG
  • Pick X ? Y which minimizes BestCost(Q, X)
  • Return X
  • BestCost(Q, X) cost of the best plan for Q
    given that the nodes
  • in X are transiently materialized

Too expensive! Need heuristics.
16
Exploring the Search SpaceA Greedy Heuristic
  • Input DAG for query Q
  • Output Set of nodes to materialize, corresp.
    best plan
  • X Y set of equivalence nodes in DAG
  • While( Y ? )
  • Pick z ?Y which maximizes Benefit(z Q, X)
  • If( Benefit(z Q, X) gt 0 )
  • Y Y z X X U z
  • Else Y
  • Return X
  • Benefit(z Q, X) BestCost(Q, X) - BestCost(Q,
    X U z)
  • Appeared in Gupta, ICDT97. Our Contribution
    improve efficiency

17
Improving EfficiencySummary
  • Input DAG for query Q
  • Output Set of nodes to materialize, corresp.
    best plan
  • X Y set of equivalence nodes in DAG
  • While( Y ? )
  • Pick z ?Y which maximizes Benefit(z Q, X)
  • If( Benefit(z Q, X) gt 0 )
  • Y Y z X X U z
  • Else Y
  • Return X
  • Restrict the set of materialization candidates
  • Compute Benefit efficiently
  • Heuristically avoid computing Benefit for some
    nodes

?
??
18
Improving EfficiencyOnly CSEs Materialized
  • CSEs identified in a bottom-up traversal

BCD
ABC
Common Subexpression
BC
CD
AB
A
C
D
B
19
Improving EfficiencySummary
  • Input DAG for query Q
  • Output Set of nodes to materialize, corresp.
    best plan
  • X Y set of equivalence nodes in DAG
  • While( Y ? )
  • Pick z ?Y which maximizes Benefit(z Q, X)
  • If( Benefit(z Q, X) gt 0 )
  • Y Y z X X U z
  • Else Y
  • Return X
  • Restrict the set of materialization candidates ?
  • Compute Benefit efficiently
  • Heuristically avoid computing Benefit for some
    nodes

?
??
20
Efficient Benefit Computation
Incremental Re-optimization
  • X Set of CSEs already materialized
  • z unmaterialized CSE
  • Best plan given X materialized ?
  • Best plan given X U z materialized
  • Observation
  • Best plans change only for the ancestors of z

21
Incremental Re-optimization Example
  • X

BCD
ABC
230
230
?
?
230
230
120
120
z (B JOIN C)
100
100
100
100
10
10
BC
BC
10
130
CD
AB
100
100
100
Best Plan
C
B
A
D
10
10
10
10
22
Incremental Re-optimization Efficient Propagation
  • Ancestor nodes visited bottom-up in a topological
    order
  • Guarantees no revisits
  • Propagation path pruned if the current nodes
    best cost remains unchanged

23
Improving EfficiencySummary
  • Input DAG for query Q
  • Output Set of nodes to materialize, corresp.
    best plan
  • X Y set of equivalence nodes in DAG
  • While( Y ? )
  • Pick z ?Y which maximizes Benefit(z Q, X)
  • If( Benefit(z Q, X) gt 0 )
  • Y Y z X X U z
  • Else Y
  • Return X
  • Restrict the set of materialization candidates ?
  • Compute Benefit efficiently ?
  • Heuristically avoid computing Benefit for some
    nodes

?
??
24
Avoiding Benefit Computation
  • Monotonicity Assumption
  • Benefit of a node does not increase due to
    materialization of other nodes
  • Often true
  • An earlier benefit of a node is an upper bound on
    its current benefit
  • Do not recompute a nodes benefit if another
    nodes current benefit is greater

Optimization costs decrease by 90
25
Experimental Results
  • TPCD-0.1 on Microsoft SQL Server 6.5
  • using SQL rewriting for MQO

26
Alternatives to Greedy Volcano-SH
  • A lightweight post-pass heuristic
  • Compute the best plan for each query
    independently, using Volcano
  • Find the set of nodes in the best plans to
    materialize (cost-based)
  • Similar previous work
  • Subramanium and Venkataraman, SIGMOD 1998

27
Alternatives to Greedy Volcano-RU
  • A lightweight extension of Volcano
  • Batched queries optimized in sequence Q1, Q2, ,
    Qn
  • Find the best plan for query Qi given the best
    plans for queries Qj, j lt i
  • Cost based materialization of nodes in best plans
    of Qj, j lt i
  • Plan quality sensitive to the query sequence

28
Experimental Results
  • TPCD-0.1 query batches

29
Experimental Results
  • TPCD-0.1 query batches

30
Features
  • Easily implemented
  • First MQO implementation integrated with a
    state-of-the-art optimizer (as far as we know)
  • Also partially prototyped on Microsoft
    SQL-Server
  • Support for index selection
  • Index modeled as physical property
    (like interesting order)
  • Extensible and flexible
  • New operators, data models
  • Readily adapts to other problems
  • Query result caching
  • Materialized view selection/maintenance

31
Query Result Caching
P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy
and S. Sudarshan, Dont Trash Your Intermediate
Results, Cache em, Submitted for publication
32
Problem Statement
  • Minimize the total execution time of an online
    workload by
  • Caching intermediate/final results of individual
    queries, and
  • Using these cached results to answer later
    queries

33
System Model
34
Contributions
  • Intermediate as well as final results cached
  • Optimizer-driven cache management
  • Adapts to workload changes
  • Cache-aware cost-based optimization
  • Novel framework for cached result matching

35
Experimental Results
  • Overheads negligible
  • Performance on 900 query TPCD-1 based uniform
    cube-point workload

36
Materialized View Selection and Maintenance
Hoshi Mistry, Prasan Roy, K. Ramamritham and S.
Sudarshan, Materialized View Selection and
Maintenance Using Multi-Query Optimization, Submit
ted for publication
37
Problem Statement
  • Speed up maintenance of a set of materialized
    views by
  • Exploiting CSEs between different view
    maintenance expressions
  • Selecting additional views to be materialized

38
Contributions
  • Optimization of maintenance expressions
  • Support for transiently materialized delta
    views
  • Nicely integrates transient vs permanent view
    materialization choices

39
Experimental Results
  • Overheads negligible
  • Performance benefit for maintenance of two
    TPCD-0.1 based SPJA views

40
Conclusion
  • MQO is practical
  • Low overheads, high benefits
  • Easily implemented and integrated
  • Leads to novel solutions to related problems
  • Query result caching
  • Materialized view selection and maintenance

41
Future Work
  • Further extensions of MQO
  • Shared execution pipelines
  • Query result caching in presence of updates
  • Other problems
  • Continuous queries, XML view caching, etc.

42
Other Contributions
  • Garbage Collection in Object Oriented Databases
  • Developed a transaction-aware cyclic reference
    counting algorithm
  • Provided a formal proof of correctness

S. Ashwin, Prasan Roy, S. Seshadri, Avi
Silberschatz and S. Sudarshan, Garbage Collection
in Object-Oriented Databases Using Transactional
Cyclic Reference Counting, VLDB 1997 Prasan
Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan
and S. Ashwin, Garbage Collection in
Object-Oriented Databases Using Transactional
Cyclic Reference Counting, Invited Paper, VLDB
Journal, August 1998
Write a Comment
User Comments (0)
About PowerShow.com