Title: Chapter 15 Algorithms for Query Processing and Optimization
1Chapter 15Algorithms for Query Processing and
Optimization
- ICS 424 Advanced Database Systems
- Dr. Muhammad Shafique
2Outline
- Introduction
- Processing a query
- SQL queries and relational algebra
- Implementing basic query operations
- Heuristics-based query optimization
- Overview of query optimization in Oracle
3Material Covered from Chapter 15
- Pages 537, 538, 539
- Section 15.1
- Section 15.2
- Section 15.6
- Section 15.7
- Section 15.9
4Introduction to Query Processing
- Query optimization
- The process of choosing a suitable execution
strategy for processing a query. - Two internal representations of a query
- Query Tree
- Query Graph
5Background Review
- DDL compiler
- DML compiler
- Runtime database processor
- System catalog
6Processing a Query
- Tasks in processing a high-level query
- Scanner scans the query and identifies the
language tokens - Parser checks syntax of the query
- The query is validated by checking that all
attribute names and relation names are valid - An intermediate internal representation for the
query is created (query tree or query graph) - Query execution strategy is developed
- Query optimizer produces an execution plan
- Code generator generates the object code
- Runtime database processor executes the code
- Query processing and query optimization
7Processing a Query
- Typical steps in processing a high-level query
- Query in a high-level query language like SQL
- Scanning, parsing, and validation
- Intermediate-form of query like query tree
- Query optimizer
- Execution plan
- Query code generator
- Object-code for the query
- Run-time database processor
- Results of query
8(No Transcript)
9SQL Queries and Relational Algebra
- SQL query is translated into an equivalent
extended relational algebra expression ---
represented as a query tree - In order to transform a given query into a query
tree, the query is decomposed into query blocks - Query block
- The basic unit that can be translated into the
algebraic operators and optimized. - A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause
if these are part of the block. - The query optimizer chooses an execution plan for
each block
10COMPANY Relational Database Schema (1)
11COMPANY Relational Database Schema (2)
12SQL Queries and Relational Algebra (1)
- Example
- SELECT Lname, Fname
- FROM EMPLOYEE
- WHERE Salary gt ( SELECT MAX(Salary)
- FROM EMPLOYEE
- WHERE Dno 5 )
- Inner block and outer block
13Translating SQL Queries into Relational Algebra
- SELECT LNAME, FNAME
- FROM EMPLOYEE
- WHERE SALARY gt ( SELECT MAX (SALARY)
- FROM EMPLOYEE
- WHERE DNO 5)
SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO 5
SELECT LNAME, FNAME FROM EMPLOYEE WHERE
SALARY gt C
pLNAME, FNAME (sSALARYgtC(EMPLOYEE))
FMAX SALARY (sDNO5 (EMPLOYEE))
14SQL Queries and Relational Algebra (2)
- Uncorrelated nested queries Vs Correlated nested
queries - Example
- Retrieve the name of each employee who works on
all the projects controlled by department number
5. - SELECT FNAME, LNAME FROM EMPLOYEE WHERE
( (SELECT PNO FROM WORKS_ON
WHERE SSNESSN) CONTAINS
(SELECT PNUMBER FROM PROJECT
WHERE DNUM5) )
15SQL Queries and Relational Algebra (3)
- Example
- For every project located in Stafford,
retrieve the project number, the controlling
department number and the department managers
last name, address and birthdate. - SQL query
- SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
E.BDATE - FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E
- WHERE P.DNUMD.DNUMBER AND D.MGRSSNE.SSN
AND P.PLOCATIONSTAFFORD - Relation algebra
- ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((?PLOCATIONSTAFFORD(PROJECT)) DNUMDNUMBER
(DEPARTMENT)) MGRSSNSSN (EMPLOYEE)) -
16SQL Queries and Relational Algebra (4)
17Implementing Basic Query Operations
- An RDBMS must provide implementation(s) for all
the required operations including relational
operators and more - External sorting
- Sort-merge strategy
- Sorting phase
- Number of file blocks (b)
- Number of available buffers (nB)
- Runs --- (b / nB)
- Merging phase --- passes
- Degree of merging --- the number of runs that are
merged together in each pass
18Algorithms for External Sorting (1)
- External sorting
- Refers to sorting algorithms that are suitable
for large files of records stored on disk that do
not fit entirely in main memory, such as most
database files. - Sort-Merge strategy
- Starts by sorting small subfiles (runs) of the
main file and then merges the sorted runs,
creating larger sorted subfiles that are merged
in turn.
19Algorithms for External Sorting (2)
20Algorithms for External Sorting (3)
- Analysis
- Number of file blocks b
- Number of initial runs nR
- Available buffer space nB
- Sorting phase nR ?(b/nB)?
- Degree of merging dM Min (nB-1, nR)
- Number of passes nP ?(logdM(nR))?
- Number of block accesses (2 b) (2 b
(logdM(nR))) - Example done in the class
21Implementing Basic Query Operations (cont.)
- Estimates of selectivity
- Selectivity is the ratio of the number of tuples
that satisfy the condition to the total number of
tuples in the relation. - SELECT ( ? ) operator implementation
- Linear search
- Binary search
- Using a primary index (or hash key)
- Using primary index to retrieve multiple records
- Using clustering index to retrieve multiple
records - Using a secondary index on an equality comparison
- Conjunctive selection using an individual index
- Conjunctive selection using a composite index
- Conjunctive selection by intersection of record
pointers
22Implementing Basic Query Operations (cont.)
- JOIN operator implementation
- Nested-loop join
- Sort-merge join
- Hash join
- Partition Hash join
- Hybrid hash join
- PROJECT operator implementation
- Set operator implementation
- Implementing Aggregate operators/functions
- Implementing OUTER JOIN
23(No Transcript)
24Buffer Space and Join performance
- In the nested-loop join, it makes a difference
which file is chosen for the outer loop and which
for the inner loop. If EMPLOYEE is used for the
outer loop, each block of EMPLOYEE is read once,
and the entire DEPARTMENT file (each of its
blocks) is read once for each time we read in (
nB - 2) blocks of the EMPLOYEE file. We get the
following - Total number of blocks accessed for outer file
bE - Number of times ( nB - 2) blocks of outer file
are loaded ? bE/ nB 2 ? - Total number of blocks accessed for inner file
bD ?bE/ nB 2 ? - Hence, we get the following total number of
block accesses - bE (? bE/ nB 2 ? bD) 2000 (? (2000/5) ?
10) 6000 blocks -
- On the other hand, if we use the DEPARTMENT
records in the outer loop, by symmetry we get the
following total number of block accesses - bD (? bD/ nB 2 ? bE) 10 (?(10/5) ?
2000) 4010 blocks -
25Implementing Basic Query Operations (cont.)
- Combining operations using pipelining
- Temporary files based processing
- Pipelining or stream-based processing
- Example consider the execution of the following
query - ?list of attributes( (? c1(R) ? (? c2 (S))
26General Transformation Rules for Relational
Algebra Operations
- Cascade of ? A conjunctive selection condition
can be broken up into a cascade (that is, a
sequence) of individual ? operations ? C1 AND
C2 AND .AND Cn (R) ? C1 (?C2( (?Cn(R))) - Commutativity of ? The ? operation is
commutative ?C1(?C2(R)) ?C2(?C1(R)) - Cascade of ? In a cascade (sequence) of ?
operations, all but the last one can be ignored - Commuting ? with ? If the selection condition
c involves only those attributes A1, ..., An in
the projection list, the two operations can be
commuted - And more
27Heuristic-Based Query Optimization
- Outline of heuristic algebraic optimization
algorithm - Break up SELECT operations with conjunctive
conditions into a cascade of SELECT operations - Using the commutativity of SELECT with other
operations, move each SELECT operation as far
down the query tree as is permitted by the
attributes involved in the select condition - Using commutativity and associativity of binary
operations, rearrange the leaf nodes of the tree - Combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a
JOIN operation, if the condition represents a
join condition - Using the cascading of PROJECT and the commuting
of PROJECT with other operations, break down and
move lists of projection attributes down the tree
as far as possible by creating new PROJECT
operations as needed - Identify sub-trees that represent groups of
operations that can be executed by a single
algorithm
28Heuristic-Based Query Optimization Example
- Query
- "Find the last names of employees born after
1957 who work on a project named Aquarius." - SQL
- SELECT LNAME
- FROM EMPLOYEE, WORKS_ON, PROJECT
- WHERE PNAMEAquarius AND PNUMBERPNO AND
ESSNSSN AND BDATE.1957-12-31
29(No Transcript)
30(No Transcript)
31(No Transcript)
32(No Transcript)
33(No Transcript)
34Overview of Query Optimization in Oracle
- Rule-based query optimization the optimizer
chooses execution plans based on heuristically
ranked operations. - May be phased out
- Cost-based query optimization the optimizer
examines alternative access paths and operator
algorithms and chooses the execution plan with
lowest estimate cost. - The query cost is calculated based on the
estimated usage of resources such as I/O, CPU and
memory needed. - Application developers could specify hints to the
ORACLE query optimizer. - application developer might know more information
about the data. - SELECT / ...hint... / rest of query
- SELECT / index(t1 t1_abc) index(t2 t2_abc) /
COUNT()FROM t1, t2WHERE t1.col1 t2.col1
35Summary
- Background review
- Processing a query
- SQL queries and relational algebra
- Implementing basic query operations
- Heuristics-based query optimization
- Overview of query optimization in Oracle