Chapter 15 Algorithms for Query Processing and Optimization - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 15 Algorithms for Query Processing and Optimization

Description:

Typical steps in processing a high-level query. Query in a high-level query language like SQL ... WHERE PNAME= Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE. ... – PowerPoint PPT presentation

Number of Views:610
Avg rating:3.0/5.0
Slides: 36
Provided by: facultyK
Category:

less

Transcript and Presenter's Notes

Title: Chapter 15 Algorithms for Query Processing and Optimization


1
Chapter 15Algorithms for Query Processing and
Optimization
  • ICS 424 Advanced Database Systems
  • Dr. Muhammad Shafique

2
Outline
  • Introduction
  • Processing a query
  • SQL queries and relational algebra
  • Implementing basic query operations
  • Heuristics-based query optimization
  • Overview of query optimization in Oracle

3
Material Covered from Chapter 15
  • Pages 537, 538, 539
  • Section 15.1
  • Section 15.2
  • Section 15.6
  • Section 15.7
  • Section 15.9

4
Introduction to Query Processing
  • Query optimization
  • The process of choosing a suitable execution
    strategy for processing a query.
  • Two internal representations of a query
  • Query Tree
  • Query Graph

5
Background Review
  • DDL compiler
  • DML compiler
  • Runtime database processor
  • System catalog

6
Processing a Query
  • Tasks in processing a high-level query
  • Scanner scans the query and identifies the
    language tokens
  • Parser checks syntax of the query
  • The query is validated by checking that all
    attribute names and relation names are valid
  • An intermediate internal representation for the
    query is created (query tree or query graph)
  • Query execution strategy is developed
  • Query optimizer produces an execution plan
  • Code generator generates the object code
  • Runtime database processor executes the code
  • Query processing and query optimization

7
Processing a Query
  • Typical steps in processing a high-level query
  • Query in a high-level query language like SQL
  • Scanning, parsing, and validation
  • Intermediate-form of query like query tree
  • Query optimizer
  • Execution plan
  • Query code generator
  • Object-code for the query
  • Run-time database processor
  • Results of query

8
(No Transcript)
9
SQL Queries and Relational Algebra
  • SQL query is translated into an equivalent
    extended relational algebra expression ---
    represented as a query tree
  • In order to transform a given query into a query
    tree, the query is decomposed into query blocks
  • Query block
  • The basic unit that can be translated into the
    algebraic operators and optimized.
  • A query block contains a single SELECT-FROM-WHERE
    expression, as well as GROUP BY and HAVING clause
    if these are part of the block.
  • The query optimizer chooses an execution plan for
    each block

10
COMPANY Relational Database Schema (1)
11
COMPANY Relational Database Schema (2)
12
SQL Queries and Relational Algebra (1)
  • Example
  • SELECT Lname, Fname
  • FROM EMPLOYEE
  • WHERE Salary gt ( SELECT MAX(Salary)
  • FROM EMPLOYEE
  • WHERE Dno 5 )
  • Inner block and outer block

13
Translating SQL Queries into Relational Algebra
  • SELECT LNAME, FNAME
  • FROM EMPLOYEE
  • WHERE SALARY gt ( SELECT MAX (SALARY)
  • FROM EMPLOYEE
  • WHERE DNO 5)

SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO 5
SELECT LNAME, FNAME FROM EMPLOYEE WHERE
SALARY gt C
pLNAME, FNAME (sSALARYgtC(EMPLOYEE))
FMAX SALARY (sDNO5 (EMPLOYEE))
14
SQL Queries and Relational Algebra (2)
  • Uncorrelated nested queries Vs Correlated nested
    queries
  • Example
  • Retrieve the name of each employee who works on
    all the projects controlled by department number
    5.
  • SELECT FNAME, LNAME FROM EMPLOYEE WHERE
    ( (SELECT PNO FROM WORKS_ON
    WHERE SSNESSN) CONTAINS
    (SELECT PNUMBER FROM PROJECT
    WHERE DNUM5) )

15
SQL Queries and Relational Algebra (3)
  • Example
  • For every project located in Stafford,
    retrieve the project number, the controlling
    department number and the department managers
    last name, address and birthdate.
  • SQL query
  • SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
    E.BDATE
  • FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E
  • WHERE P.DNUMD.DNUMBER AND D.MGRSSNE.SSN
    AND P.PLOCATIONSTAFFORD
  • Relation algebra
  • ?PNUMBER, DNUM, LNAME, ADDRESS, BDATE
    (((?PLOCATIONSTAFFORD(PROJECT)) DNUMDNUMBER
    (DEPARTMENT)) MGRSSNSSN (EMPLOYEE))

16
SQL Queries and Relational Algebra (4)
17
Implementing Basic Query Operations
  • An RDBMS must provide implementation(s) for all
    the required operations including relational
    operators and more
  • External sorting
  • Sort-merge strategy
  • Sorting phase
  • Number of file blocks (b)
  • Number of available buffers (nB)
  • Runs --- (b / nB)
  • Merging phase --- passes
  • Degree of merging --- the number of runs that are
    merged together in each pass

18
Algorithms for External Sorting (1)
  • External sorting
  • Refers to sorting algorithms that are suitable
    for large files of records stored on disk that do
    not fit entirely in main memory, such as most
    database files.
  • Sort-Merge strategy
  • Starts by sorting small subfiles (runs) of the
    main file and then merges the sorted runs,
    creating larger sorted subfiles that are merged
    in turn.

19
Algorithms for External Sorting (2)
20
Algorithms for External Sorting (3)
  • Analysis
  • Number of file blocks b
  • Number of initial runs nR
  • Available buffer space nB
  • Sorting phase nR ?(b/nB)?
  • Degree of merging dM Min (nB-1, nR)
  • Number of passes nP ?(logdM(nR))?
  • Number of block accesses (2 b) (2 b
    (logdM(nR)))
  • Example done in the class

21
Implementing Basic Query Operations (cont.)
  • Estimates of selectivity
  • Selectivity is the ratio of the number of tuples
    that satisfy the condition to the total number of
    tuples in the relation.
  • SELECT ( ? ) operator implementation
  • Linear search
  • Binary search
  • Using a primary index (or hash key)
  • Using primary index to retrieve multiple records
  • Using clustering index to retrieve multiple
    records
  • Using a secondary index on an equality comparison
  • Conjunctive selection using an individual index
  • Conjunctive selection using a composite index
  • Conjunctive selection by intersection of record
    pointers

22
Implementing Basic Query Operations (cont.)
  • JOIN operator implementation
  • Nested-loop join
  • Sort-merge join
  • Hash join
  • Partition Hash join
  • Hybrid hash join
  • PROJECT operator implementation
  • Set operator implementation
  • Implementing Aggregate operators/functions
  • Implementing OUTER JOIN

23
(No Transcript)
24
Buffer Space and Join performance
  • In the nested-loop join, it makes a difference
    which file is chosen for the outer loop and which
    for the inner loop. If EMPLOYEE is used for the
    outer loop, each block of EMPLOYEE is read once,
    and the entire DEPARTMENT file (each of its
    blocks) is read once for each time we read in (
    nB - 2) blocks of the EMPLOYEE file. We get the
    following
  • Total number of blocks accessed for outer file
    bE
  • Number of times ( nB - 2) blocks of outer file
    are loaded ? bE/ nB 2 ?
  • Total number of blocks accessed for inner file
    bD ?bE/ nB 2 ?
  • Hence, we get the following total number of
    block accesses
  • bE (? bE/ nB 2 ? bD) 2000 (? (2000/5) ?
    10) 6000 blocks
  • On the other hand, if we use the DEPARTMENT
    records in the outer loop, by symmetry we get the
    following total number of block accesses
  • bD (? bD/ nB 2 ? bE) 10 (?(10/5) ?
    2000) 4010 blocks

25
Implementing Basic Query Operations (cont.)
  • Combining operations using pipelining
  • Temporary files based processing
  • Pipelining or stream-based processing
  • Example consider the execution of the following
    query
  • ?list of attributes( (? c1(R) ? (? c2 (S))

26
General Transformation Rules for Relational
Algebra Operations
  • Cascade of ? A conjunctive selection condition
    can be broken up into a cascade (that is, a
    sequence) of individual ? operations ? C1 AND
    C2 AND .AND Cn (R) ? C1 (?C2( (?Cn(R)))
  • Commutativity of ? The ? operation is
    commutative ?C1(?C2(R)) ?C2(?C1(R))
  • Cascade of ? In a cascade (sequence) of ?
    operations, all but the last one can be ignored
  • Commuting ? with ? If the selection condition
    c involves only those attributes A1, ..., An in
    the projection list, the two operations can be
    commuted
  • And more

27
Heuristic-Based Query Optimization
  • Outline of heuristic algebraic optimization
    algorithm
  • Break up SELECT operations with conjunctive
    conditions into a cascade of SELECT operations
  • Using the commutativity of SELECT with other
    operations, move each SELECT operation as far
    down the query tree as is permitted by the
    attributes involved in the select condition
  • Using commutativity and associativity of binary
    operations, rearrange the leaf nodes of the tree
  • Combine a CARTESIAN PRODUCT operation with a
    subsequent SELECT operation in the tree into a
    JOIN operation, if the condition represents a
    join condition
  • Using the cascading of PROJECT and the commuting
    of PROJECT with other operations, break down and
    move lists of projection attributes down the tree
    as far as possible by creating new PROJECT
    operations as needed
  • Identify sub-trees that represent groups of
    operations that can be executed by a single
    algorithm

28
Heuristic-Based Query Optimization Example
  • Query
  • "Find the last names of employees born after
    1957 who work on a project named Aquarius."
  • SQL
  • SELECT LNAME   
  • FROM EMPLOYEE, WORKS_ON, PROJECT   
  • WHERE PNAMEAquarius AND PNUMBERPNO AND
    ESSNSSN AND BDATE.1957-12-31

29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
Overview of Query Optimization in Oracle
  • Rule-based query optimization the optimizer
    chooses execution plans based on heuristically
    ranked operations.
  • May be phased out
  • Cost-based query optimization the optimizer
    examines alternative access paths and operator
    algorithms and chooses the execution plan with
    lowest estimate cost.
  • The query cost is calculated based on the
    estimated usage of resources such as I/O, CPU and
    memory needed.
  • Application developers could specify hints to the
    ORACLE query optimizer.
  • application developer might know more information
    about the data.
  • SELECT / ...hint... / rest of query
  • SELECT / index(t1 t1_abc) index(t2 t2_abc) /
    COUNT()FROM t1, t2WHERE t1.col1 t2.col1

35
Summary
  • Background review
  • Processing a query
  • SQL queries and relational algebra
  • Implementing basic query operations
  • Heuristics-based query optimization
  • Overview of query optimization in Oracle
Write a Comment
User Comments (0)
About PowerShow.com