Chapters 17 and 18: System Catalog and Query Optimization - PowerPoint PPT Presentation

1 / 116
About This Presentation
Title:

Chapters 17 and 18: System Catalog and Query Optimization

Description:

Platter. Track. Sector. Cylinder. Read/Write Heads. Chaps17&18-8. CSE. 255. Track. Sector ... Platters. R/W. Heads. Note: Parallel Read/Write Drives. Activate ... – PowerPoint PPT presentation

Number of Views:277
Avg rating:3.0/5.0
Slides: 117
Provided by: stevenad
Category:

less

Transcript and Presenter's Notes

Title: Chapters 17 and 18: System Catalog and Query Optimization


1
Chapters 17 and 18 SystemCatalog and Query
Optimization
Prof. Steven A. Demurjian, Sr. Computer Science
Engineering Department The University of
Connecticut 191 Auditorium Road, Box
U-155 Storrs, CT 06269-3155
steve_at_engr.uconn.edu http//www.engr.uconn.edu/st
eve (860) 486 - 4818
  • A portion of these slides are being used with the
    permission of Dr. Ling Lui, Associate Professor,
    College of Computing, Georgia Tech.
  • Other slides have been adapted from the AWL web
    site for the textbook.
  • Remaining slides represent new material.

2
Overview of Material
  • Key Background Topics
  • What are Typical Database Processing Actions?
  • Disk Drives and Disk Storage
  • Database Processing/Architectures
  • Motivating Query Optimization
  • Query Processing
  • Chapter 17 - System Catalog
  • What is it?
  • How is it Used?
  • Chapter 18 - Query Optimization in RDBMS
  • High-level Query Optimization (Algebraic)
  • Low-level Query Optimization (Cost-based)

3
Typical Database Processing
Parsed and Optimized User Trans.
Concurrency Control
Pre-Processing - Parser/Lexical - Optimizer/Views
Lock Request Response
Lock Request
User Transaction
Low-Level Processing - Enqueue Trans. - Request
Locks - Issue I/Os - Process Returned Data -
Integrity Checks - Security Checks - Logging for
Recovery - Release Locks - Dequeue Trans.
High-Level Processing - Enqueue Trans. - Request
Locks - Release Locks -Dequeue Trans.
Errors
Response to User
I/O Request
Post-Processing - Collection of Results -
Aggregation Operations - Security Checks
Errors Results
Results
Disk I/O
Recovery
4
What are the Processing Issues for DBs?
  • Database Applications of Today and Tomorrow
    Require High Volumes of Information!
  • Increase of Information Still Requires High
    Performance!
  • Throughput and Response Time
  • Where's the Bottleneck in DBS?
  • CPU ??
  • Main Memory Size/Speed ??
  • Virtual Memory Limitations ??
  • Communications Bus ??
  • I/O Channel ??

5
90-10 Rule for Database Processing
  • Load (Transaction per second) vs. Performance
    (Response Time of Transactions)
  • Processing of Large Amounts of Raw Data
  • Addressed in Secondary Storage
  • Staged to Main Memory
  • Identifying Relevant Data
  • Large Amounts of Raw Data Discarded
  • Focus on Data Most Likely to Contain Answers
  • Possible Loss of CPU and Main Memory Cycles
  • This is Double Jeopardy!
  • Load of DBS Must be Reduced
  • Performance of DBS Degrades

6
90-10 Rule for Conventional DBS
  • Only 10 of Relevant Data has Answers
  • Note Naive Approach to Database Searching Often
    Occurs (Little or No Indexing in Practice!)
  • Only 10 of Raw Data is Relevant

Application Programs
Operating System
Database Functions
On-Line I/O
Disk I/O
7
Randomly Accessed Storage Devices
  • Popular Media (Hard Drives, CDs, DVDs, etc.)
  • Access to Information in Any Order
  • Sequential Access Not Typically Supported or
    Needed, Since Files Not Stored Sequentially
  • Recall, Disk Defragmentation on PC Platform
  • Block-Oriented Utilization of Device
  • Block Access to Optimize Transfer
  • Block Size is Device/Controller Dependent
  • Linear/Non-Linear Byte Orders with Blocks
  • Key Concepts
  • Platter
  • Track
  • Sector
  • Cylinder
  • Read/Write Heads

8
Rotating Storage
Track
R/W Heads
Platters
Sector
Cylinder
Top View of a Surface
Note Parallel Read/Write Drives Activate All
Heads Simultaneously
9
Disk Drive Components
10
Disk Characteristics and Access
  • Transfer Time Time to Copy Bits From Disk
    Surface to Primary Memory
  • Disk Latency Time
  • Rotational Delay Waiting for Proper Sector to
    Rotate Under R/W Head
  • Rotate to Next Sector to Process Next Request
  • Disk Seek Time
  • Delay While R/W Head Moves to the Destination
    Track/Cylinder
  • Move Head In/Out to Seek Next Track/Cylinder
  • Access Seek (In/Out) Latency (Around)
    Transfer (Bytes)
  • For DBMS - Key is Moving Data To/From Disk ASAP
    w.r.t. Performance and Response Time
  • Improve on 90-10 via Processing/Optimization

11
Historical DB Architecture - Mainframe
12
Client/Server DBS Architecture
13
Mixed Architecture
14
Three and Four Tier Architectures
From http//java.sun.com/javaone/javaone98/sessio
ns/T400/index.html
15
What is MBDS?
  • MBDS is Multi-Process, Multi-Computer, Parallel
    Database System
  • MBDS Composed of
  • Host for Issuing User Requests
  • Controller to Interact with Host (and User)
  • One or More Backend Database Processors
  • Goals of MBDS
  • Suppose Request Takes 4 Minutes with One Backend
  • Improve Response Time by Increasing Backends
  • Two Backends - Request 2 Minutes
  • Four Backends - Request 1 Minutes

16
What is MBDS Architecture?
Database Blocks are Distributed Across All
Backends
Backend (BE) DB Processors are Replicated
Host User
Database ControllerSends Same Query in Parallel
to all BEs
Database Controller
BEs work in Parallel on Each Query and
Communicate for Join
Results are Sent to and Collected by the DB
Controller - then to the User
17
What are MBDS Processes?
Request Preparation
Post Processing
Database Controller
Get Msg.
Put Msg.
Get Msg.
Put Msg.
Directory Management
Record Processing
Backend Database Processor
Concurrency Control
Disk I/O
18
What are MBDS Messages?
No. Type SRC
DST 1 New Request
Host ReqP 2 Results of Request
PoPr Host 3 Number of Reqs in Transaction
ReqP PoPr 4 Aggregate Operators (Sum,
etc.) ReqP PoPr 6 Parsed Request to
Backends ReqP DM 12 Backend
Aggregate Operator Results RecP PoPr 15 Ids
for Accessing Database Indexes DM DMs 16
Request and Disk Addresses DM
RecP 21 Ids for Accessing Database Records
DM CC 22 Locks Obtained Okay to Execute
CC RecP 23 Request ID of Finished Request
RecP CC
19
Sample Processing of Retrieve Request
B3
A1
K12
Request Preparation
Post Processing
C4
K12
D6
Get Msg.
Put Msg.
F15 From Other Backend
E15 To Backend(s)
Get Msg.
Put Msg.
K12
Record Processing
D6,F15
E15
I16
Directory Management
J23
H22
Concurrency Control
G21
Disk I/O
20
What are Synchronization Issues in MBDS?
  • Coordination of Synchronous Behavior
  • Within Controller and Backend to Allow Multiple
    Active Requests within
  • Each Process
  • Requests at Different Stages in Different
    Processes
  • Between Controller and Backends to Allow
  • A Request to be Processed by All Backends
  • A Request to be Processed by One Backend
  • Among Multiple Backends to Allow a Backend
  • to Synchronize its Work on one Request with Other
    Backends
  • to Forward Results to Another Backend

21
Why is Query Optimization Needed?
  • Data Volume in any Type of Join or Cartesian
    Product has the Potential to be Very Large!
  • Consider R(A, B) r1, r2 , ..., rn
  • Consider S(C, D) s1, s2 , ..., sm
  • R x S r1 s1, r1 s2, r1 s3, r1 s4, r2 s1,
    r2 s2, r2 s3, r2 s4, which contains n x m
    tuples!
  • What is the Issue?
  • If n is 10,000 and m is 20,000 then
  • Cartesian Product has 200,000,000 Tuples
  • Join must Perform 200,000,000 Comparisons

22
Why is Query Optimization Needed?
  • n/m - Number of Tuples of R/S Respectively
  • bR / bS - Number of Tuples/Block of Memory
  • Assume that K Blocks Fit into Primary Memory

n / bR (m / bS ) Number of Blocks for R/S
1 Block of R
(m / bS )/(K-1)
Number of Times that K-1 Memory Chunk Filled by S
1
2
(n / bR )(m / bS )/(K-1)
K-1 Blocks of S
Which if Filled for Each Block of R
3

(n / bR ) (n / bR )(m / bS )/(K-1)
Total Block Reads Must also Read Blocks of R
K-1
23
Why is Query Optimization Needed?
If n m 10,000 and bR bS 5, and K
100(10,000/5)(10,000/5)(10,000/5)/99 42,400
Blocks to Read At 20 Blocks/Second - 35 Minutes!
24
Observation
  • Cartesian Product Yields Unwanted Data
  • SELECT R.AFROM R, SWHERE R.B S.C and S.C 99
  • In Relational Algebra?A (? BC and D99 (R x
    S)) ?A (? BC (R x ? D99 (S) )) ?A (R x BC
    (? D99 (S)))
  • Has Performance Improved? How?

25
Evaluation
  • Cartesian Product for SELECT - 40,000 Blocks
  • SELECT R.AFROM R, SWHERE R.B S.C and S.C 99
  • Relational Algebra with Equijoin?A (R x BC (?
    D99 (S)))The ? D99 (S) Limits the Size of S
    DramaticallyAs a Result, the Equijoin of R and ?
    D99 (S) Would Likely Reduce the Total Blocks
    Required to 4,000
  • Thus, a Smart Query Execution Strategy Can
    Dramatically Reduce the Amount of I/Os

26
Query Optimization Goal
  • Limit Costly Join Operation by Reducing Data to
    be Scanned or that Participates in the Join
  • Query Optimization is Strategy to Achieve Goal
  • While Improving Selection and Projection can
    Help, the Main Objective is Join
  • In Worst Case - Cartesian Product
  • Can Improve by Introducing Indices on the Join
    Attributes (R.B and S.C) to Limit Product
  • Can Further Improve by Sorting on the Join
    Attributes (R.B and S.C)
  • This Reduces Block Accesses by Limiting the
    Number of Blocks that Must be Examined in a Join
  • If Bs Values Range from 0 to 100 and C from 50
    to 150, only need to Compare from 50 to 100

27
Query Processing
  • Internal Data Structure
  • Memory Hierarchy
  • Main Memory Secondary Memory
  • Information Must be Staged from Secondary to
    Primary Memory for Database Operation
  • Sequential Search
  • Brute force Approach
  • Direct Access (Indexed Search)
  • Hash, Inverted Index file, Binary Search Tree,
    B-tree, B-tree
  • Improves Selection by Focusing on Subset of
    Tuples that are Involved in the Answer and
    Equijoin by Not Having to Compare All Blocks in
    Two Relations

28
Algorithms for Database Query Operators
  • Largely Fall into Three Classes
  • Sorting-Based Methods
  • Hash-Based Methods
  • Index-Based Methods
  • Such Algorithms are Divided into Three Degrees of
    Difficulty and Cost (Limiting Factor is Size of
    Data)
  • One Pass Algorithms
  • Where Data is Only Read Once From Disk
  • Two-pass Algorithms
  • Data is Read from Disk, Processed in Some Way,
    Written Back to Disk, Read Again for Processing,
    etc.
  • Multi-pass Algorithms
  • Where 3 or More Passes Are Required, i.e.,
    Recursive Generalization of the Two-pass
    Algorithms

29
Database Join and Sort are External
  • Suppose that your DBS has 1,000 1K Blocks of
    Memory Available for Performing Operations (e.g.,
    Select, Project, Join, Union, Aggregation, etc.)
  • Suppose Sort R by R.B
  • R Contains 5000 Blocks
  • In order to Perform a Sort/Merge - You Must Use
    External Algorithm since all 5000 Blocks Can Fit
    Into Memory at the Same Time
  • Suppose Join R (500 Blocks) and S (800 Blocks)
  • Again - their Total Exceeds Memory - Hence you
    Must Take an Approach that Compares One Block of
    R with All Blocks of S, etc. (Slides 22,23)

30
Database Join and Sort are External
  • Whats True about Todays DBMS Like Oracle?
  • Oracle Recommends 2 Gigabytes of Primary Memory
  • That 2 Gigabytes Must be Shared by
  • Operating System
  • Other Applications Running on Same Server (Web
    Server, etc.)
  • Database Management Software
  • Even if there was 1.5 Gigabytes Available, Modern
    DBs can Exceed that size Very Easily
  • Moreover,
  • Cartesian Product Could Exceed Available Mem.
  • Join Could Require External Approach Since All
    Tables Involved in Join Cant fit in 1.5
    Gigabytes
  • External Sorting/Block Oriented Processing is Norm

31
Algorithms for DB Query Operators
  • Relational Algebra Operators can be Classified
    into Three Groups
  • Tuple-at-a-time Unary Operators
  • Selection and Projection
  • No Need to Bring Entire Relation into Memory at
    One Time
  • Full-Relation Unary Operators
  • Duplicate Elimination and Grouping
  • Requires Seeing All or Most of the Tuples in
    Memory at Once
  • Full-Relation Binary Operators
  • Set and Bag Versions of Union, Intersection, and
    Difference, Joins, and Cartesian Products
  • Requires Seeing the Tuples of Both Relations in
    Memory

32
Query Access
SELECT EMP.ENAME FROM EMP, WORKS, PROJ WHERE
(EMP.ENO WORKS.ENO) AND (WORKS.PNO PROJ.PNO)
AND (PROJ.PNAME CAD/CAM)
33
Database Access
SELECT EMP.ENAME FROM EMP, WORKS, PROJ WHERE
(EMP.ENO WORKS.ENO) AND (WORKS.PNO PROJ.PNO)
AND (PROJ.PNAME CAD/CAM)
9
10
2
7
1
8
Schema
3
6
4
5
Physical/Internal Data Schema
Operating System
34
Database Access
  • User program A sends to DBMS an invoke command to
    retrieve a (set of) record
  • DBMS analyzes the external schema of the user
    program A and finds the database description of
    the record.
  • DBMS checks with the schema to get the data types
    and location information of record
  • DBMS checks with the physical schema to find out
    which device the record is in and what access
    methods can be used.
  • According to 4, DBMS sends OS a read command to
    execute the search.
  • OS issues the page invoke command to the
    correspond device, and then puts the page
    fetched into the system buffer.
  • DBMS uses the schema and the external schema to
    infer the logical structure of the retrieving
    record.
  • DBMS places the relevant data to the UWA, and
  • provides the status information at the program
    invocation exit

35
The System Catalog
  • Store the Meta Information that Describes Each
    Database, Including a Description of
  • Conceptual Database Schema (Logical Data Model)
  • Relations, Attributes, Keys, Indexes, Views
  • Internal Schema
  • External Schema
  • Store Information Needed by Specific DBMS Modules
  • Query Optimization Module
  • Security and Authorization

36
Metadata - What is it?
  • System metadata
  • Where data came from
  • How data were changed
  • How data are stored
  • How data are mapped
  • Who owns data
  • Who can access data
  • Data usage history
  • Data usage statistics
  • System metadata are critical in a DBMS
  • Application metadata
  • What data are available
  • Where data are located
  • What the data mean
  • How to access the data
  • Predefined reports
  • Predefined queries
  • How current the data are
  • Application metadata are critical in a database
    system

37
Metadata v.s. Data
relations
  • Meta schema
  • describes all schemata that can be defined in the
    data model
  • Data Dictionary Schema
  • contains copy of metaschema schema for format
    definitions schema for data about application
    data
  • Data Dictionary Data
  • schema for application data metadata about
    application data
  • Data
  • raw formatted application data

rel-name
att-name
dom-name
access-rights
user
relation
operation
relations
rel-name
att-name
dom-name
(u1, supplier, insert)
(u2, supplier, delete)
supplier
s
sname
location
(s1, smith, london)
(s2, jones, boston)
38
Example of Catalog Information
39
Relational DBMS Catalog
  • All Metadata Stored as Relations
  • Example of Metadata Tables are

40
EER Diagram for Relational Catalog
41
Metadata in Oracle
  • Complex Data Dictionary
  • All Schema Objects (Tables,Views, Indices, )
  • User, All, and DBA Views

SELECT FROM ALL_CATALOG WHERE
OWNERSMITH
SELECT COLUMN_NAME, DATA_TYPE, DATA_LENGTH,
NUM_DISTINCT, LOW_VALUE, HIGH_VALUE FROM
USER_TAB_COLUMS WHERE TABLE_NAMEORDERS
42
Metadata in Oracle
SELECT PCT_FREE, INITIAL_EXTENT, NUM_ROWS,
BLOCKS, EMPTY_BLOCKS, AVG_ROW_LENGTH FROM
USER_TABLES WHERE TABLE_NAME ORDERS
SELECT INDEX_NAME, UNIQUENESS, BLEVEL,
LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_K
EY, AVG_DATA_BLOCKS_PER_KEY FROM
USER_INDEXES WHERE TABLE_NAME ORDERS
43
Uses of System Catalog
  • DDL Compilers
  • Correct Definition ofRelations and Attributes
  • DML (Query) Compiler
  • DML Parser
  • Guided by the Description of DML Syntax and the
    Schema Information in the Catalog, Generates a
    Query Tree after Parser
  • Optimizer
  • Generates Access Paths that is Relatively Optimal
    for Executing a Query/ DML Command, by Accessing
    the Database Structure Information (Schemas), and
    Mapping High-level SQL Queries Into Low-level
    File Access Commands

SELECT EMP.ENAME FROM EMP, WORKS, PROJ WHERE
(EMP.ENO WORKS.ENO) AND (WORKS.PNO PROJ.PNO)
AND (PROJ.PNAME CAD/CAM)
44
Revisit Typical Database Processing
Parsed and Optimized User Trans.
Concurrency Control
Pre-Processing - Parser/Lexical - Optimizer/Views
Lock Request Response
Lock Request
User Transaction
Low-Level Processing - Enqueue Trans. - Request
Locks - Issue I/Os - Process Returned Data -
Integrity Checks - Security Checks - Logging for
Recovery - Release Locks - Dequeue Trans.
High-Level Processing - Enqueue Trans. - Request
Locks - Release Locks -Dequeue Trans.
Errors
Response to User
I/O Request
Post-Processing - Collection of Results -
Aggregation Operations - Security Checks
Errors Results
Results
Disk I/O
Recovery
45
Typical Database Processing
  • Pre-Processing
  • Actions Taken Upon Receipt of a Query from User
  • SQL Query via Query Tool or JDBC Call
  • Compilation of DB Query
  • Check Syntax, Optimize, Develop Run-Time Strategy
    (Similar to PL Compilation)
  • Query is Translated to DB Transaction
  • A Transaction Contains Multiple DB Operations
  • Transaction has Explicit Order of Operations
  • Database Transaction Must Succeed or Fail
  • There is no Intermediate State
  • Completely Executed and Committed or Aborts at
    any Point and Undone
  • New State or Previous State of DB

46
Typical Database Processing
  • High-Level Processing
  • Enqueue Transaction from Pre-Processing
  • Transaction Must Wait for Earlier Transactions
  • Remember - Shared DB State!
  • Request Locks from Concurrency Control
  • All Locks Before Proceeding vs. Locks as Needed
  • Avoid Deadlock and Livelock
  • Release Locks
  • As Use of Data Completes to Increase Availability
  • What Happens if Failure of Later Step in
    Transaction
  • Dequeue Transaction
  • Completes Transaction Processing
  • Return Result to Post-Processing

47
Typical Database Processing
  • Low-Level Processing
  • Enqueue Transaction - Do Actual DB Operations
  • Request Locks - Lower Granularity Level
  • Issue I/Os - Based on Operations to Access
    Correct and Relevant DB Records
  • Process Returned Data - Aggregation, Sorting
  • Integrity Checks Do I/D/U Satisfy Constraints?
  • Security Checks Is DB R/I/D/U Allowed?
  • Logging for Recovery - Commit the Transaction
  • Release Locks - Available to Others
  • Dequeue Transaction - Return Results to
    High-Level Processing
  • Note The Multiple Operations of Each DB
    Transaction All Must be Successful

48
Typical Database Processing
  • Post Processing
  • Collection of Results
  • May be Passed Portions of Results as they
    Complete
  • For Example, Sorted Blocks of Data that are then
    Merged in a Final Step
  • Aggregation Operations
  • May be Passed Aggregate Intermediate Results
  • Sum for Different Departments to be Totaled
  • Security Checks
  • Last Step Filtering to Insure Only Allowed Data
    is Returned
  • May Execute Query but Only see Aggregate Result
  • Send Results to User

49
Typical Database Processing
  • Concurrency Control
  • Control Access to Information
  • Data and Metadata
  • Prevent Simultaneous Updates
  • Ensure Database Always Correct and Consistent
  • Serial Schedule vs. Serializable Transaction
  • Two Types
  • Pessimistic - Locking-Based - Assume Collisions
    Will Occur - e.g., Peoplesoft Course Registration
  • Optimistic - Time-Based - Fix Problems After the
    Fact - e.g., ATM Machines Example
  • CC Manages Locks at Different Granularity Levels
    (Table, Attribute, View, Tuple, Metadata, etc.)

50
Typical Database Processing
  • Disk I/O
  • Performs the Actual Disk I/O for Read/Writes
  • Block Oriented Activity
  • Maintain Queue of All I/O Requests
  • Ordering is Critical
  • Related to Concurrency Control and Consistency
  • Single DB Transactions can have Multiple DB
    Operations
  • Disk I/Os for Different Operations at Different
    Times
  • High and Low Level Processing will Determine What
    Operations Needed When
  • Disk I/O - Relatively Dumb

51
Typical Database Processing
  • Recovery
  • Tightly Tied to DB Transaction Concept
  • Transactions Must be
  • Atomic - Happens or Doesnt
  • Durable - Once Committed, Results Survive Failure
  • Consistent - Follows Protocol/Correct DB State
  • When Failure Occurs, Can we
  • Recover to a Correct Earlier State
  • Reconcile all Active Transactions that were
    Executing at Failure Time
  • Involves Logging of Database Actions
  • Objective High Availability and Reliability

52
Query Optimization
  • Not Really Optimizing, but Planning to Avoid Bad
    Execution Strategies
  • Models
  • Heuristics-Based
  • Apply Transformation Rules According to a General
    Strategy
  • Focus on Relational Algebra that Underlies Each
    Query
  • Improve the Order of Relational Operations
  • Cost-Based
  • Minimize a Cost FunctionI/O Cost CPU Cost
  • Subject to a Set of Constraints

53
Query Processing Methodology
High-level Calculus-based Query
EXTERNAL SCHEMA
Query Preprocessing
LOGICAL SCHEMA
Algebraic Query (a tree structure)
Query Optimization
INTERNAL SCHEMA
Execution Schedule (file access plan)
54
Query Preprocessing
  • Input Calculus Query on Base Relations
  • Normalization
  • Manipulate Query Quantifiers and Qualification
  • Analysis
  • Detect and Reject Incorrect Queries
  • Possible for Only a Subset of Relational Calculus
  • Simplification
  • Eliminate Redundant Predicates
  • Restructuring
  • Calculus Query implies Algebraic Query
  • More Than One Translation is Possible
  • Use Transformation Rules

55
Normalization
  • Lexical and Syntactic Analysis (Similar to
    Compilers)
  • Check Validity
  • Check for Attributes and Relations
  • Type Checking on the Qualification
  • Put into Normal Form
  • Conjunctive Normal Form
  • (p11?p12??p1n) ??????pm1?pm2??pmn)
  • Disjunctive Normal Form
  • (p11??p12??p1n) ??????pm1?pm2??pmn)
  • OR's Mapped into Union
  • AND's Mapped into Join or Selection

56
Refute Incorrect Queries
  • Example E(ENAME, ENO), P(JNO,JNAME),
    W(ENO,PNO,DUR) SELECT ENAME, JNAME
  • FROM E, P, W
  • WHERE DUR gt 27 AND DUR lt 25
  • Incorrect
  • Disjoint Components are Useless
  • Multiple Relations, Missing Joins, may not be
    incorrect, but may indicate Cartesian product
  • Contradictory
  • Qualification can not be Satisfied by any Tuple
  • DUR gt 27 AND DUR lt 25

57
Simplification
  • Why Simplify?
  • The Simpler the Query, the Less Work there is and
    the Better the Performance
  • How? Use transformation rules
  • Elimination of Redundancy
  • Idempotency Rules
  • p1 ? (p1) false
  • (p1 ??p2) (p1) ? ?(p2)
  • p1 ? false p1
  • Application of Transitivity
  • Use of Integrity Rules
  • Example
  • x gt a and x gt b
  • DUR gt 27 AND DUR gt 25

58
Restructuring
  • Convert Relational Calculus to Relational Algebra
  • Make use of Query Trees
  • Example
  • Find the names of employees other than J.
    Doe who worked on the CAD/CAM project for either
    1 or 2 years.
  • SELECT ENAME
  • FROM E, W, P
  • WHERE E.ENOW.ENO
  • AND W.JNOP.JNO
  • AND E.ENAMEltgt"J. Doe"
  • AND P.JNAME"CAD/CAM"
  • AND (W.DUR12 OR W.DUR24)

59
Query Optimization Objectives
  • Improving Performance
  • Arriving at a Query Plan of Execution
  • Analyzing the Relational Algebra Query
  • Replace Costly Operations
  • Do Selections and Projections Early
  • Optimization Heuristics for the Relational
    Algebra
  • Performing Selection and Projection Before Join
  • Combining Several Selections Over a Single
    Relation Into One Selection
  • Find Common Subexpressions
  • Algebraic Rewriting/transformation Rules
  • General Transformation Rules for Relational
    Algebra (Equivalence-preserving Algebraic
    Rewriting Rules)

60
Query Optimization An Example
  • Why is it important?
  • SELECT ENAME
  • FROM E,W
  • WHERE E.ENO W.ENO
  • AND W.RESP "Manager"
  • Strategy 1
  • ?ENAME(?RESP"Manager"?E.ENOW.ENO(E ? W))
  • Strategy 2
  • ?ENAME( E ENO(?RESP"Manager"(W)))

61
Cost of Alternatives
  • Assume
  • card(E) 4,000 card(W)10,000
  • 10 of tuples in W satisfy RESP"Manager"
    (selection generates 1,000 tuples)
  • Execution time Proportional to the Sum of the
    Cardinalities of the Temporary Relations
  • Searching is Done by Sequential Scanning
  • Strategy 1 Strategy 2
  • Cartesian prod. 40,000,000 Selection over W
    10,000
  • Search over all 40,000,000 Join(40001000)
    4,000,000
  • 80,000,000 4,010,000

62
General Query Optimization Strategy
  • Perform Selections Early
  • Yields Smaller Intermediate Results
  • Direct Impact on Subsequent Join/Cartesian Prod.
  • Combine Selections with a Prior Cartesian Product
    into a Theta or Equi Join
  • Join is a Cheaper Operation
  • Combine (Cascade) Selections and Projections
    ?AB(?B (R)) ? ?AB(R) ?p1 (? p2 (R)) ? ?p1 p2
    (R) This Results in One Pass Instead of Two
    over Table

63
General Query Optimization Strategy
  • Identify Common Subexpressions
  • Compute Once and Store
  • use Stored Version for Subsequent Times
  • Often Useful When Views are Employed
  • Preprocess Data via Sorts and Indexes
  • Speeds up Searches and Joins by Limiting Scope
  • Evaluate and Assess Different Options
  • For Cartesian Product, Use Smaller Relation for
    Comparison
  • Use System Catalog (Meta-data) to Effect Order in
    Query Execution Plan

64
Relational Algebra Transformations
  • Cascade of Selection
  • ?p1 p2 pn(R)????p1(?p2(...(?pn(R))...))
  • Commutativity of Selection
  • ?p1(??p2(R))????p2(?p1(R))
  • ?p1 or?p2(R )????p1(R??? ?p2(R)
  • Cascade of Projection
  • ?A1,A2, An(R)????A1(?A2(...(?An(R))...))
    ???A1(R) if A1? A2? ... ? An
  • Commuting Selection with Projection
  • ?A1,A2,...,An(?p(R))????p(?A1,A2,...,An(R)

65
Relational Algebra Transformations
  • Commutativity of Theta Join and Cartesian Product
  • R A S???S A R
  • R ? S???S ? R
  • Commuting Selection with Theta Join (Cartesian)
  • ?p(A)(R ??S) ????p(A)(R)) ??S A defined on R
    only
  • ?p(A)p(B)(R ??S) ??? ?p(A)(R)) ?? (?p(B)(S)) (A
    defined on R, B defined on S)
  • Also Holds for Theta Join as Well
  • Commuting Projection with Theta Join (Cartesian)
  • ?C(R ??S) ???A(R) ???B(S) where A?BC
  • A are Attributes in C for R and B are Attributes
    in C for S

66
Relational Algebra Transformations
  • Commutativity of Set Operations
  • R ??S ??S ??R
  • R ??S ??S ??R
  • Associativity of Set Operations
  • (R ??S) ??T ??R ???S ??T)
  • (R S) T ??R (S T)
  • (R ??S) ??S ??R ? (S ? T)
  • (R ??S) ??S ??R ??(S ??T)
  • Commuting Select with Set Operations
  • ?p(Ai)(R ??T) ???p(Ai)(R) ???p(Ai)(T) where Ai
    is defined on both R and T
  • ?p(Ai)(R ??T) ???p(Ai)(R) ???p(Ai)(T) where Ai
    is defined on both R and T

67
Relational Algebra Transformations
  • 11. Commuting Projection with Union
  • ?C(R q(Aj,Bk) S) ???A(R) q(Aj,Bk)
    ?B(S)
  • ?C(R ??S) ???A (R) ???B (S)where RA and SB
    C A' ??B' where A' ? A, B ? B
  • 12. Converting Selection/Cartesian Into Theta
    Join
  • ?C (R ??S) ? R S

C
68
Heuristics Algebraic Optimization Concepts
  • Using Cascade of Selections Rule, Break up Any
    Selections With Conjunctive Conditions Into a
    Cascade of Selections
  • Allows More Freedom in Moving Selections Down
    Different Branches of the Tree
  • Using Commutativity of Selections with Other
    Operations Rules, Move Each Selection Down the
    Query Tree as far as Possible
  • If Possible, Combine a Cartesian Product With a
    Selection Into a Join

69
Heuristics Algebraic Optimization Concepts
  • Using Associativity of Binary Operations,
    Rearrange the Leaf Nodes So That the Most
    Restrictive Selections Are Executed First
  • The Fewer Tuples the Resulting Relation Contains,
    the More Restrictive the Selection
  • Reducing the Size of Intermediate Results
    Improves Performance
  • Using Cascade of Projections and Commutativity of
    Projections with Other Operations, Move
    Projections Down the Query Tree as Far as
    Possible
  • Identify Subtrees that Represent Groups of
    Operations that can be Executed by a Single
    Algorithm

70
Heuristic Algebraic Optimization Algorithm
  • Use Rule 1 to Break up Selects with Conjunctions
    into a Cascade to Move them Down the Query Tree
  • Use Rules 2, 4, 6, and 10 to Commute Select with
    Project, Join, Cart. Prod., Union, and
    Intersection
  • Use Rule 5 (Commute) and 9 (Associative) to
    Rearrange the Leaf Nodes of Query Tree to
  • Most Restrictive Select Executed First
  • Avoid Cartesian Product in Leaf Nodes
  • Use Rule 12 to Convert a Select/Cart Prod to Join
  • Use Rules 3, 4, 7, and 11 to Cascade and Commute
    Project - Pushing Down Tree as Far as Possible
  • Identify Subtrees that Can Execute as Independent
    Algorithms (Set of Operations)

71
Heuristic Optimization Example
Canonical query tree at the end of query
preprocessing phase
?ENAME
?(DUR12 OR DUR24) AND JNAMECAD/CAM AND
ENAMEltgtJ. DOE
E(ENAME, ENO) P(JNO,JNAME) W(ENO,PNO,DUR)
JNO
ENO
P
W
E
72
Heuristic Optimization Example
?ENAME
?DUR12 OR DUR24
?JNAMECAD/CAM
Use cascading of selections rule to decompose
selections
?ENAME ltgt J. DOE
JNO
ENO
P
W
E
73
Heuristic Optimization Example
?ENAME
?DUR12 OR DUR24
?JNAMECAD/CAM
Push selection down using commutativity of
selection over join
JNO
ENO
?ENAME ltgt "J. Doe"
W
P
E
74
Heuristic OptimizationExample
?ENAME
?DUR12 OR DUR24
Push selection down using commutativity of
selection over join
JNO
ENO
?JNAME "CAD/CAM"
?ENAME ltgt "J. Doe"
P
W
E
75
Heuristic OptimizationExample
?ENAME
JNO
Push selection down
ENO
?JNAME "CAD/CAM"
?ENAME ltgt "J. Doe"
?DUR 12 ??DUR24
P
E
W
76
Heuristic OptimizationExample
?ENAME
JNO
Do early projection
?JNO,ENAME
ENO
?JNO
?JNO,ENAME
?JNO,ENO
?JNAME "CAD/CAM"
?ENAME ltgt "J. Doe"
?DUR 12 ??DUR24
P
E
W
77
Heuristic OptimizationExample
?ENAME
Identify subtrees that can be implemented in one
algorithm
JNO
?JNO,ENAME
ENO
?JNO
?JNO,ENAME
?JNO,ENO
?JNAME "CAD/CAM"
?ENAME ltgt "J. Doe"
?DUR 12 ??DUR24
P
E
W
78
Heuristic Optimization A Second Example
BOOKS(Title, Author, Pname, LC_No) PUBLISHERS(Pnam
e, Paddr, Pcity) BORROWERS(Name, Addr, City,
Card_No) LOANS(Card_No, LC_No, Date)
Let XLOANS ?S(?F(Loans x Borrowers x
Books)) where S Title, Author, Pname, LC_No,
Name,
Addr, City, Card_No, Date and F
Borrower.Card_No Loans.Card_No
Books.LC_No Loans.LC_No
79
Heuristic Optimization A Second Example
XLOANS
80
Heuristic Optimization A Second Example
?
Title
?
Date ? 1/1/88
?
Title, Author, Pname, LC_No, Name, Addr, City,
Card_No, Date
?
Borrower.Card_No Loans.Card_No Books.LC_No
Loans.LC_No
X
Query ?TITLE(?Date ? 1/1/88 (XLOANS))
Books
X
Loans
Borrower
81
Heuristic Optimization A Second Example
?
?
Title
Date ? 1/1/88
Try to Cascade
X
Books
X
Loans
Borrower
82
Heuristic Optimization A Second Example
?
Title
Commute Selectand Project
X
Books
X
Loans
Borrower
83
Heuristic Optimization A Second Example
?
Title
Commute Selectand Select
X
Books
X
Loans
Borrower
84
Heuristic Optimization A Second Example
?
Title
X
Commute Select andCartesian Product Two Levels
Down
Books
X
Borrower
Loans
85
Heuristic Optimization A Second Example
?
Title
Try to Cascade
Borrower.Card_No Loans.Card_No
X
Books
X
Borrower
Loans
86
Heuristic Optimization A Second Example
?
Title
X
Commute Select andCartesian Product One Level
Down
Books
?
Borrower.Card_No Loans.Card_No
X
Whats Next?
Borrower
Loans
87
Heuristic Optimization A Second Example
?
Title
X
CombineProjections
Books
?
Borrower.Card_No Loans.Card_No
X
Borrower
What is Still a Problem? We are Not Projecting so
All Attributes are Still Collected Until the
Final Project!
Loans
88
Heuristic Optimization A Second Example
X
Books
?
Borrower.Card_No Loans.Card_No
X
Borrower
Add Strategic Projections to Send Only the
Minimum Up the Tree as Needed for Join/Result Set
Loans
89
Heuristic Optimization A Second Example
What is the Final Step? Combine Select and
Cartesian Product Result Equijoins!
X
Books
?
Borrower.Card_No Loans.Card_No
X
Borrower
Loans
90
Heuristics Query Optimization Summary
  • First Apply Operations that Reduce the Size of
    Intermediate Results
  • Move Selections and Projections Down the Tree as
    far as Possible
  • Early Selections Reduce the Number of Tuples
  • Early Projections Reduce the Number of Attributes
  • Selection and Join Should be Executed Before
    Other Similar Operations.
  • This is Accomplished by Reordering the Leaf Nodes
    of the Tree Among Themselves and Adjusting the
    Rest of the Tree Appropriately

91
Cost-Based Optimization
  • Reduce Defined Cost of Executing Queries
  • What is Involved in the Cost of Executing a
    Query?
  • Access Cost to Secondary Storage
  • Search for Data Block (Index)
  • Read/Write Index and Data Blocks
  • Storage Cost
  • Index and Data Blocks
  • Intermediate Files
  • Computation Cost
  • Query Planning - Optimization Effort
  • Record Search, Sort, Merge
  • Actual Transaction/Query Operations
  • Communications Cost
  • Transfer of Results to the User

92
Complexity of Relational Operations
  • Assuming
  • Relations of Cardinality n
  • Sequential Scan of Data in each Relation
  • Complexity of Each Operation is Indicated
  • Avoid Cartesian Product at All Costs!

Operation
Complexity
Select
Project
O(n)
(w/o duplicate elimination)
Project
O(nlog n)
(with duplicate elimination)
Group
Join
O(nlog n)
Division
Set Operators
Cartesian Product
O(n2)
93
Cost-Based Optimization
  • To Understand Cost-Based Operations, we Must
    Focus on Implementation Strategy of
  • Select
  • Project
  • Join
  • For Select and Project - There is a Fixed Cost
    that we Must Live With
  • For Join
  • Implementation Strategy
  • Different Join Strategies
  • Objective
  • Minimize the Number of Blocks Involved
  • Note that Cost-Based and Relational Algebra
    Heuristic Optimization Can Complement One Another

94
Implementation of SELECT
  • Principles
  • Equality Eliminates Many Tuples
  • Index Focuses and Limits Search Scope
  • Sequential Scan
  • Brute Force
  • Search All Records to Find Matching Ones
  • Binary Search
  • Equality Comparison on a Key Attribute
  • Primary Index or Hash Key for Single Record
  • Equality Comparison on a Key Attribute With
    Primary Index or Hash Key
  • Go Directly to Record No Need to Scan Entire
    Table
  • Cost to Maintain Index/Hash

95
Implementation of SELECT
  • Primary Index for Multiple Records
  • Use Primary Key to Find the Equality Attribute
  • Go Forward (gt or ?) or Backward (lt or ?)
    According to the Comparison Operator
  • Clustering Index for Multiple Records
  • Equality Comparison on a Non-key Attribute With a
    Clustering Index (e.g., Sort-Merge Algorithm)
  • Secondary Index
  • Equality or Range Queries
  • Primary Indexes Play a Role Similar to Searching
    Sorted Array
  • Well Discuss Indexing Techniques at a Later Time

96
Implementation of SELECT
  • Conjunctive Selection (C1? C2 ? ? CN)
  • If One of the Conjuncts has a Good Access Path,
    Use it and Check the Other Conjuncts for Each of
    These Records
  • Composite Index
  • If an Index has Been Established Jointly for a
    Number of Attributes in the Conjunct
  • Equality Condition
  • Intersection of Pointers
  • If Secondary Indexes Exist on All or Most of the
    Attributes in the Conjunct and the Indexes
    Include Record Pointers
  • Retrieve Each Attribute Using These Indexes and
    Then Take Their Intersection

97
Implementing PROJECT
  • If ltAttribute Listgt Includes Key
  • Simple Since the Cardinality of the Result is the
    Same as the Cardinality of the Original Relation
  • No Need to Remove Duplicates - Key Attribute
  • If ltAttribute Listgt Does Not Include Key
  • Duplicates Allowed
  • Duplicate Elimination
  • Sort After Projection and then Eliminate
    Consecutively Appearing Duplicates
  • See Textbook for Algorithms
  • Use Hashing Hash Each Record Into a Bucket and
    Check Against Records Already in That Bucket
  • Size Estimation card(?A(R))card(R)

98
Implementing JOIN
  • Nested Loop
  • Simple Iteration and Block-Oriented Iteration
  • For Each Block in R do
  • Retrieve Every Record from S and Test Join
    Condition
  • An Index for S may Speed up the Inner Loop
  • Smaller Relation should be Outer Loop
  • Calculation of I/O
  • Let bo (bi) be the Number of Blocks taken up by
    Outer (Inner) Relation
  • Let nB (gt1) the Buffer Size (in blocks) Devoted
    to Arguments
  • Let bR be the size of the Resulting Relation (in
    blocks)
  • Total no. of Block Access bo ?bo/(nB-1)??bi
    bR

99
Implementing JOIN
  • Sort-Merge Join
  • Physically Sort Relations R and S
  • Scan R and S in the Sorted Order and Merge
  • See Algorithm in Textbook
  • If Files are Not Physically Sorted, but Sorted on
    the Join Attributes, a Variation May be used
  • Quite Inefficient Since Records are Scattered
    Over the Disk
  • Total number of block access b bi
    bolog2bo bilog2bi bR

100
Implementing JOIN
  • Hash Join
  • Hash R and S Using the Same Hash Function
  • If Hash File Can Be Memory-Resident, it is
    Efficient and Easy to Implement
  • If Buffer Space is Insufficient, then Part of the
    Hash File has to be on Disk
  • Various Optimizations for this Case
  • Hybrid Hash Join is Described in the Book
  • Again - Biggest Problem is Overhead Associated
    with Maintaining Hash Index Over Time

101
Access Using Indices Estimation of Costs
Example Given a bank database consisting of
the following three relation schemas Branch(bank
-name, assets, bank-city) Deposit(bank-name,
account-number, customer-name, balance) Customer(c
ustomer-name, street, zipcode, customer-city) Co
nsider the SQL query for the bank
database Select account-number From Deposit
Where bank-name BofA and customer-name
Bill and balance gt 1000
102
Heuristic Optimization
  • Use Cascading of Selections Rule to Decompose,
    Three Logical Query Plan Alternatives Are
    Obtained
  • Objective - Choose the Best Alternative in
    Terms of Execution Time (Block Reads)
  • What should be the Focus in Select Order?

?Account-Number
?Account-Number
?Account-Number
? bank-name BofA
?balance gt 1000
? bank-name BofA
?customer-name Bill
?balance gt 1000
?customer-name Bill
?balance gt 1000
?customer-name Bill
? bank-name BofA
Deposit
Deposit
Deposit
103
Access Using Indices Estimation of Costs
Assumptions 100 Different Banks
(bank-name) 1000 Customers (on average) per
bank Balance could range from 0 to 10,000
dollars Branch(bank-name, assets,
bank-city) Deposit(bank-name, account-number,
customer-name, balance) Customer(customer-name,
street, zipcode, customer-city) Select
account-number From Deposit Where bank-name
BofA and customer-name Bill and
balance gt 1000
104
Estimation of Cost of Access - Version 1
Branch(bank-name, assets, bank-city) Deposit(bank-
name, account-number, customer-name,
balance) Customer(customer-name, street, zipcode,
customer-city)
  • Recall Assumptions
  • 100 Banks
  • 1000 Customers/Bank
  • 0 to 10,000 dollars/account that are Distributed
    Evenly Across Accts.
  • Tuples in Deposit? 100,000
  • What Does balance gt 1000 do?
  • Retrieve 90 of Accounts
  • All Banks, All Customers
  • What Does customer-name bill do?
  • All Customers Named Bill Regardless of the Bank
  • Is this a Good Strategy?

105
Estimation of Cost of Access - Version 2
Branch(bank-name, assets, bank-city) Deposit(bank-
name, account-number, customer-name,
balance) Customer(customer-name, street, zipcode,
customer-city)
  • Recall Assumptions
  • 100 Banks
  • 1000 Customers/Bank
  • 0 to 10,000 dollars/account that are Distributed
    Evenly Across Accts.
  • Tuples in Deposit 100,000
  • What Does bank-name BofA do?
  • Retrieves 1000 Tuples for BofA on Average
  • What Does customer-name bill do?
  • The Customer Bill
  • What Does balance gt 1000 do?
  • Is this a Good Strategy?

106
Estimation of Cost of Access - Version 3
Branch(bank-name, assets, bank-city) Deposit(bank-
name, account-number, customer-name,
balance) Customer(customer-name, street, zipcode,
customer-city)
  • Recall Assumptions
  • 100 Banks
  • 1000 Customers/Bank
  • 0 to 10,000 dollars/account that are Distributed
    Evenly Across Accts.
  • Tuples in Deposit 100,000
  • What Does customer-name bill do?
  • Retrieves 100 Tuples
  • One per Bank
  • What Does balance gt 1000 do?
  • Do they Have Enough Money?
  • What Does bank-name BofA do?
  • Is this a Good Strategy?

107
Join Strategies
  • Several Factors Influence the Selection of an
    Optimal
  • The Physical Order of Tuples in a Relation
  • The Presence of Indices and the Type of Index
    (Clustering or Nonclustering)
  • The Cost of Computing a Temporary Index for the
    Sole Purpose of Processing One Query
  • Example Consider the natural join
  • Deposit ? Customer
  • nDeposit 10,000.
  • nCustomer 200.
  • 20 tuples fit in one block for both relations
  • buffersize 2 blocks


108
Join Strategies Block-Oriented Iteration
  • Block-oriented Iteration
  • Process the relations on a per-block basis
    rather on a per-tuple basis
  • Using this approach, a major saving in block
    accesses results
  • Example Consider the natural join Deposit ?
    Customer
  • nDeposit 10,000.
  • nCustomer 200.
  • 20 tuples fit in one block for both relations
  • Case 1 outerloop Deposit , inner loop Customer
  • reading Customer once for every block of Deposit
    tuples requires (200/20) (10,000/20) 10
    500
  • reading Deposit relation requires 10000/20 500
    block reads
  • the total cost in terms of block accesses is
    5500
  • -gt 5000 blocks accesses to Customer and
  • -gt 500 blocks accesses to Deposit

109
Join Strategies Block-oriented Iteration
  • Case 2 outerloop Customer, inner loop Deposit
  • Reading Deposit once for every block of Customer
    tuples requires (10,000/20) (200/20) 5000
  • Reading Customer relation requires 200/20
    10 block reads
  • The total cost in terms of block accesses is
    5010
  • gt5000 accesses to Deposit blocks and
  • gt10 accesses to Customer blocks
  • Case 3 If Customer relation is smaller enough
    to fit in main memory, our strategy requires only
    500 blocks to read Deposit relation and 10 blocks
    to read Customer relation.
  • The total comes to 510 blocks

110
Query Execution Cost Summary
  • Access Cost to Secondary Storage
  • Search for Data Block (Index)
  • Read/write Index and Data Blocks
  • Storage Cost
  • Index and Data Blocks
  • Intermediate Files
  • Computation Cost
  • Query Planning
  • Record Search, Sort, Merge
  • Actual Transaction/query Operations
  • Communications Cost
  • Data Transfer Across a Network

111
Access Plan
  • Access Plan is a Concrete Query Processing Plan
    which Presents a Detailed Strategy for Processing
    a Query
  • The Main Cost Factors to Be Considered Include
  • The Relational Operations to be Performed
  • Indices to be Used
  • The Order in Which Tuples are to be Accessed
  • The Order in Which Operations are to be Performed
  • Typical Focus is on Join and Optimizing its
    Execution, Particularly when Multiple Tables are
    Involved

112
Statistics
  • The Following are Kept in the System Catalog for
    Optimization Purposes
  • File Parameters Block Size
  • Number of Tuples in Each Relation
  • Size of Tuples
  • Key Fields, Indices
  • Number of Levels in an Index
  • Highest Key, Lowest Key
  • Number of Distinct Values (Maybe)
  • Others Frequency of Operations, Join Keys, Etc.
  • All DBMSs Keep the First Four, Many Keep All

113
Join Ordering
  • Given R S T W Determine the Best
    Ordering Alternative
  • ((R S) T) W
  • (R (S T)) W
  • R (S (T W))
  • ((R T) S) W
  • ((R W) S) T
  • (R S) (T W)
  • Ordering is Critical to Arrive at Best Strategy
    for Execution, Particularly as
  • Number of Relations Increase
  • Size of Relation (Tuples/Blocks) Increase

114
Query Optimization Search Strategies
  • Exhaustive Search
  • Optimal
  • Combinatorial Complexity in the Number of
    Relations
  • Heuristics
  • Not Optimal
  • Group Common Sub-expressions
  • Perform Selection, Projection First
  • Replace a Join by a Series of Semi-joins
  • Reorder Operations to Reduce Intermediate
    Relation Size
  • Optimize Individual Operations

115
Query Optimization Timing Issues
  • Static
  • Compilation gt Optimize Prior to the Execution
  • Difficult to Estimate the Size of the
    Intermediate Results gt Error Propagation
  • Can Amortize Over Many Executions
  • Dynamic
  • Run Time Optimization
  • Exact Information on the Intermediate Relation
    Sizes
  • Have to Reoptimize for Multiple Executions
  • Hybrid
  • Compile Using a Static Algorithm
  • If the Error in Estimate Sizes gt Threshold,
    Reoptimize at Run Time

116
Concluding Remarks
  • Most Systems Implement Only a Few Strategies
  • The Number of Strategies that are Considered by
    Any Query Optimizer is Limited
  • Some Systems Reduce the Number of Strategies by
    Making a Heuristic Guess of Strategy for Each
    Query
  • The Optimizer Considers Every Possible Strategy,
    but Terminates as Soon as it Determines the Cost
    is Greater than the Pre-chosen Strategy
  • Thus Only a Few Competing Strategies Require Full
    Analysis of the Cost
  • The Overhead of Query Optimization is Reduced
  • Remember - Trade off in Optimization Time
  • For PL - Optimization is Pre-Execution (Compile)
  • For DB - Optimization is Part of Execution (Run)
Write a Comment
User Comments (0)
About PowerShow.com