Database Systems II: Query Execution

Introduction

- We have optimized the logical query plan, applying relational algebra equivalences.
- In order to refine this plan into a physical query plan, we in particular need to choose one of the available algorithms to implement the basic operations (selection, join, ...) of the query plan.
- For each alternative physical query plan, we estimate its cost.
- The cost estimates are based on the size estimates that we discussed in the previous chapter.

Introduction

- Disk I/O (read / write of a disk block) is orders of magnitude more expensive than CPU operations.
- Therefore, we use the number of disk I/Os to measure the cost of a physical query plan.
- We ignore CPU costs, timing effects, and double-buffering requirements.
- We assume that the arguments of an operator are found on disk, but that the result of the operator is left in main memory.

Introduction

- We use the following parameters (statistics) to express the cost of an operator:
  - B(R): number of blocks containing tuples of R,
  - f(R): maximum number of tuples of R per block,
  - M: number of memory blocks available in the buffer,
  - HT(i): number of levels in index i,
  - LB(i): number of leaf blocks in index i.
- M may comprise the entire main memory, but typically the main memory needs to be shared with other processes, and M is much (!) smaller.

Introduction

- The performance of relational operators depends on many parameters, such as the following ones.
- Are the tuples of a relation stored physically contiguous (clustered)? If yes, the number of blocks to be read is greatly reduced compared to non-clustered storage.
- Is a relation sorted by the relevant (selection, join) attribute? Otherwise, it may need to be sorted on the fly.
- Which indexes exist? Some algorithms require the existence of a corresponding index.

Introduction

- Each operator (selection, join, ...) in a logical query plan can be implemented by one of a fairly large number of alternative algorithms.
- We distinguish three types of algorithms:
  - sorting-based algorithms,
  - hash-based algorithms,
  - index-based algorithms.
- Sorting, building of a hash table, or building of an index can either have happened in advance or may happen on the fly.

Introduction

- We can also categorize algorithms according to the number of passes over the data:
  - one-pass algorithms read data only once from disk,
  - two-pass algorithms read data once from disk, write an intermediate relation back to disk, and then read the intermediate relation once,
  - multiple-pass algorithms perform more than two passes over the data; not considered in class.

One-Pass Algorithms for Unary Operations

- Consider the unary, tuple-at-a-time operations, selection and projection, on relation R.
- Read all the blocks of R into the input buffer, one at a time.
- Perform the operation on each tuple and move the selected / projected tuple to the output buffer.

[Figure: blocks of R read one at a time into the input buffer, the unary operation applied, results placed in the output buffer]
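This one-pass scheme can be sketched in a few lines of Python. The list-of-blocks input, dict-based tuples, and predicate are illustrative assumptions standing in for the buffer manager:

```python
def one_pass_select_project(blocks, predicate, attrs):
    """One-pass selection + projection: stream each disk block of R
    through a single input buffer; qualifying tuples go to the output
    buffer. `blocks` is an iterable of lists of dict-tuples (a stand-in
    for the buffer manager, not a real DBMS API)."""
    output = []
    for block in blocks:          # one disk I/O per block -> cost B(R)
        for t in block:           # tuple-at-a-time processing in memory
            if predicate(t):
                output.append({a: t[a] for a in attrs})
    return output

# Usage: select tuples with a > 1, project onto attribute b.
r_blocks = [[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}], [{"a": 3, "b": "z"}]]
result = one_pass_select_project(r_blocks, lambda t: t["a"] > 1, ["b"])
# result == [{"b": "y"}, {"b": "z"}]
```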

One-Pass Algorithms for Unary Operations

- The output buffer may be the input buffer of another operation and is not counted.
- Thus, the algorithm requires only M = 1 buffer block.
- I/O cost is B(R).
- If some index is applicable for a selection, we have to read only the blocks that contain qualifying tuples.

One-Pass Algorithms for Binary Operations

- Binary operations: union, intersection, difference, Cartesian product, and join.
- Use subscripts B and S to distinguish between the set and bag versions, e.g. ∪S for set union and ∪B for bag union.
- The bag union can be computed using a very simple one-pass algorithm: copy each tuple of R to the output, and copy each tuple of S to the output.
- I/O cost is B(R) + B(S), M = 1.

One-Pass Algorithms for Binary Operations

- Other binary operations require reading the smaller of the two input relations into main memory.
- One buffer is used to read the blocks of the larger relation; M-1 buffers hold the entire smaller relation.
- I/O cost is B(R) + B(S).
- In main memory, a data structure is built that efficiently supports insertions and searches.
- Data structure: e.g., a hash table or a balanced binary tree. The space overhead can be neglected.

One-Pass Algorithms for Binary Operations

- For set union, read the smaller relation (S) into M-1 buffers, representing it in a data structure whose search key consists of all attributes.
- All these tuples are also copied to the output.
- Read all blocks of R into the M-th buffer, one at a time.
- For each tuple t of R, check whether t is in S. If not, copy t to the output.
- For set intersection, copy t to the output only if it also is in S.
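The two set operations above can be sketched as follows; the dict-tuples and the Python `set` keyed on all attributes are illustrative assumptions for the in-memory search structure:

```python
def one_pass_set_union(s_blocks, r_blocks):
    """One-pass set union: the smaller relation S is held in M-1 buffers,
    represented as a set keyed on all attributes; R is then streamed
    through a single buffer, one block at a time."""
    seen = set()
    output = []
    for block in s_blocks:                 # B(S) I/Os; S fits in memory
        for t in block:
            seen.add(frozenset(t.items()))
            output.append(t)               # every S tuple reaches the output
    for block in r_blocks:                 # B(R) I/Os; one buffer reused
        for t in block:
            if frozenset(t.items()) not in seen:
                output.append(t)           # copy R tuples not found in S
    return output

def one_pass_set_intersect(s_blocks, r_blocks):
    """Same layout, but an R tuple is output only if it also is in S."""
    seen = {frozenset(t.items()) for block in s_blocks for t in block}
    return [t for block in r_blocks for t in block
            if frozenset(t.items()) in seen]
```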

Nested-Loop Joins

- We now consider algorithms for the join operator.
- The simplest one is the nested-loop join, a one-and-a-half pass algorithm.
- One table is read once, the other one multiple times.
- It is not necessary that one relation fits in main memory.
- Perform the join through two nested loops over the two input relations.

Nested-Loop Joins

- Tuple-based nested-loop join (natural join R ⋈ S, join attribute C):
    for each r ∈ R do
      for each s ∈ S do
        if r.C = s.C then output (r,s)
- Outer relation R, inner relation S.
- One buffer for the outer relation, one buffer for the inner relation.
- M = 2.
- I/O cost is T(R) × T(S).

Nested-Loop Joins

- Example:
  - Relations not clustered.
  - T(R1) = 10,000, T(R2) = 5,000.
  - R1 as the outer relation.
  - Cost for each R1 tuple t1: read tuple t1 (1 I/O), read relation R2 (5,000 I/Os).
  - Total I/O cost is 10,000 × (1 + 5,000) = 50,010,000 I/Os.

Nested-Loop Joins

- Can do much better by organizing access to both relations by blocks.
- Use as much buffer space as possible (M-1 blocks) to store tuples of the outer relation.
- Block-based nested-loop join:
    for each chunk of M-1 blocks of R do
      read these blocks into the buffer
      for each block b of S do
        read b into the buffer
        for each tuple t of b do
          find the tuples of R that join with t and output the join results
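A minimal Python sketch of this block-based loop, assuming blocks are modeled as lists of dict-tuples and the in-memory chunk is organized as a hash map on the join attribute:

```python
from collections import defaultdict

def block_nested_loop_join(r_blocks, s_blocks, c, m):
    """Block-based nested-loop join sketch: M-1 buffers hold one chunk of
    the outer relation R, indexed in memory on join attribute c; the
    inner relation S is scanned once per chunk, block by block."""
    output = []
    for start in range(0, len(r_blocks), m - 1):
        chunk = r_blocks[start:start + m - 1]   # read M-1 blocks of R
        lookup = defaultdict(list)              # in-memory structure on chunk
        for block in chunk:
            for r in block:
                lookup[r[c]].append(r)
        for block in s_blocks:                  # one full pass over S per chunk
            for s in block:
                for r in lookup[s[c]]:
                    output.append((r, s))
    return output

# Usage: three 1-tuple R blocks, chunks of M-1 = 2 blocks.
r_blocks = [[{"c": 1}], [{"c": 2}], [{"c": 3}]]
s_blocks = [[{"c": 2, "d": 9}]]
pairs = block_nested_loop_join(r_blocks, s_blocks, "c", 3)
# pairs == [({"c": 2}, {"c": 2, "d": 9})]
```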

Nested-Loop Joins

- Example:
  - R1 as the outer relation.
  - T(R1) = 10,000, T(R2) = 5,000.
  - S(R1) = S(R2) = 1/10 block.
  - M = 101: 100 buffers for R1, 1 buffer for R2.
  - 10 R1 chunks.
  - Cost for each R1 chunk: read chunk (1,000 I/Os), read R2 (5,000 I/Os).
  - Total I/O cost is 10 × 6,000 = 60,000 I/Os.

Nested-Loop Joins

- Can do even better by reversing the join order: R2 ⋈ R1.
  - T(R1) = 10,000, T(R2) = 5,000.
  - S(R1) = S(R2) = 1/10 block.
  - M = 101: 100 buffers for R2, 1 buffer for R1.
  - 5 R2 chunks.
  - Cost for each R2 chunk: read chunk (1,000 I/Os), read R1 (10,000 I/Os).
  - Total I/O cost is 5 × 11,000 = 55,000 I/Os.

Nested-Loop Joins

- Finally, performance is dramatically improved when the input relations are clustered.
- With clustered relations, for each R2 chunk: read chunk (100 I/Os), read R1 (1,000 I/Os).
- Total I/O cost is 5 × 1,100 = 5,500 I/Os.
- Note that the I/O cost for a one-pass join (which has the minimum I/O of any join algorithm) in this example is 1,000 + 500 = 1,500 I/Os.
- For comparison, achieving this optimum with a one-pass join requires M = 501 buffer blocks.
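The clustered cost calculation generalizes to a simple formula, sketched here with parameter names of our choosing (B(outer), B(inner), M as in the slides):

```python
import math

def block_nlj_cost(b_outer, b_inner, m):
    """I/O cost of the clustered block-based nested-loop join:
    ceil(B(outer) / (M-1)) chunks; each chunk costs M-1 I/Os to load
    plus B(inner) I/Os for a full scan of the inner relation."""
    chunks = math.ceil(b_outer / (m - 1))
    return chunks * ((m - 1) + b_inner)

# The clustered example above: R2 (500 blocks) as outer, R1 (1,000 blocks)
# as inner, M = 101 -> 5 chunks of 1,100 I/Os each.
cost = block_nlj_cost(500, 1000, 101)
# cost == 5500
```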

Two-Pass Algorithms Based on Sorting

- If the input relations are sorted, the efficiency of duplicate elimination, set-theoretic operations, and join can be greatly improved.
- Reserve one buffer for each of the input relations R and S, and another buffer for the output.
- Scan both relations simultaneously in sort order, merging matching tuples.
- For example, for set intersection repeatedly consider the tuple t that is least in the sort order (w.r.t. primary key) among all tuples in the input buffers. If it appears in both R and S, output t.

Two-Pass Algorithms Based on Sorting

- In the following, we present a simple sort-merge join algorithm.
- It is called merge-join if step (1) can be skipped, since the input relations R1 and R2 are already sorted.
- Sort-merge join:
    (1) if R1 and R2 are not sorted, sort them
    (2) i ← 1; j ← 1
        while (i ≤ T(R1)) ∧ (j ≤ T(R2)) do
          if R1[i].C = R2[j].C then outputTuples
          else if R1[i].C > R2[j].C then j ← j+1
          else if R1[i].C < R2[j].C then i ← i+1

Two-Pass Algorithms Based on Sorting

- Procedure outputTuples produces all pairs of tuples from R1 and R2 with C-value c = R1[i].C = R2[j].C.
- In the worst case, we need to match each pair of tuples from R1 and R2 (nested-loop join).
- Procedure outputTuples:
    while (R1[i].C = R2[j].C) ∧ (i ≤ T(R1)) do
      jj ← j
      while (R1[i].C = R2[jj].C) ∧ (jj ≤ T(R2)) do
        output pair (R1[i], R2[jj])
        jj ← jj + 1
      i ← i + 1
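The merge step and outputTuples can be written compactly in Python; lists of dicts already sorted on attribute c stand in for the sorted relations:

```python
def sort_merge_join(r, s, c):
    """Merge step of the sort-merge join: advance two cursors over r and s
    (both assumed sorted on attribute c); on equal C-values, scan the run
    of equal s tuples for the current r tuple, mirroring outputTuples."""
    out = []
    i, j = 0, 0
    while i < len(r) and j < len(s):
        if r[i][c] < s[j][c]:
            i += 1
        elif r[i][c] > s[j][c]:
            j += 1
        else:                           # equal C-values: emit all pairs
            jj = j
            while jj < len(s) and s[jj][c] == r[i][c]:
                out.append((r[i], s[jj]))
                jj += 1
            i += 1                      # j stays: next r may share this value
    return out
```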

Two-Pass Algorithms Based on Sorting

Example:

    i   R1[i].C     R2[j].C   j
    1   10          5         1
    2   20          20        2
    3   20          20        3
    4   30          30        4
    5   40          30        5
                    50        6
                    52        7

Two-Pass Algorithms Based on Sorting

- Example
- Both R1, R2 ordered by C relations clustered.

Memory

..

R1

R1 R2

..

R2

Total cost read R1 cost read R2 cost

1,000 500 1,500 IOs

Two-Pass Algorithms Based on Sorting

- What if the input relations are not yet in the required sort order?
- Do a Two-Phase, Multiway Merge-Sort (2PMMS).
- Phase 1: sort each chunk of M blocks of relation R separately in main memory, and write the sorted sublists back to disk.
- Phase 2: merge all the B(R)/M sorted sublists.

[Figure: one input buffer per sorted sublist, each with a pointer to its first unchosen tuple; the smallest unchosen tuple is repeatedly selected and moved to the output buffer]

Two-Pass Algorithms Based on Sorting

- Each sorted sublist has a length of M blocks.
- The number of sublists is B(R)/M.
- Therefore, B(R)/M ≤ M-1 must hold, since phase 2 needs one input buffer per sublist plus one output buffer.
- This means we require B(R) ≤ M(M-1), i.e., roughly M ≥ √B(R).
- In phase 1, each tuple is read and written once. In phase 2, each tuple is read again. We ignore the cost of writing the results to disk.
- Thus, the I/O cost is 3B(R).

Two-Pass Algorithms Based on Sorting

- The I/O cost is 4B(R) if sorting is used as a first step of the sort-join and the results must be written to disk.
- If relation R is too big, apply the idea recursively.
- Divide R into chunks of size M(M-1), use 2PMMS to sort each one, and take the resulting sorted lists as input for a third (merge) phase.
- This leads to Multi-Phase, Multiway Merge Sort.

Two-Pass Algorithms Based on Sorting

- Example: M = 101.
- (i) For each 100-block chunk of R: read the chunk, sort it in memory, and write the sorted chunk to disk.

[Figure: chunks of R1 and R2 read into memory, sorted, and written back as sorted chunks]

Two-Pass Algorithms Based on Sorting

(ii) Read all chunks merge write out Sorted

file Memory Sorted Chunks

...

...

Two-Pass Algorithms Based on Sorting

Sort cost cach tuple is read, written,

read, written Join cost each tuple is read

Sort cost R1 4 x 1,000 4,000 Sort cost R2

4 x 500 2,000 Total cost sort cost join

cost 6,000 1,500 7,500 IOs

Two-Pass Algorithms Based on Sorting

- The nested-loop join (best version discussed above) needs only 5,500 I/Os, i.e., it outperforms the sort-join.
- However, the situation changes for the following scenario:
  - R1: 10,000 blocks, clustered,
  - R2: 5,000 blocks, not ordered.
- (For the earlier example: R1 is 1,000 blocks, so sorting needs M ≥ √1,000 ≈ 31.62; R2 is 500 blocks, so sorting needs M ≥ √500 ≈ 22.36. I.e., we need at least M = 32 buffers.)

Two-Pass Algorithms Based on Sorting

- Nested-loop join: (5,000 / 100) × (100 + 10,000) = 50 × 10,100 = 505,000 I/Os.
- Sort-join: 5 × (10,000 + 5,000) = 75,000 I/Os.
- Sort-join clearly outperforms nested-loop join!

Two-Pass Algorithms Based on Sorting

- The simple sort-join costs 5(B(R) + B(S)) I/Os.
- It requires B(R) ≤ M(M-1) and B(S) ≤ M(M-1), so that each relation can be sorted with 2PMMS.
- It assumes that the tuples with the same join attribute value fit in M blocks.
- If we do not have to worry about large numbers of tuples with the same join attribute value, then we can combine the second phase of the sort with the actual join (merge).
- We can save the writing to disk in the sort step and the reading in the merge step.

Two-Pass Algorithms Based on Sorting

- This algorithm is an advanced sort-merge join.
- Repeatedly find the least C-value c among the tuples in all input buffers.
- Instead of writing a sorted output buffer to disk and reading it again later, identify all the tuples of both relations that have C = c.
- The cost is only 3(B(R) + B(S)) I/Os.
- Since we have to simultaneously sort both input tables and keep their sublists in memory, the memory requirements get larger: roughly B(R) + B(S) ≤ M(M-1).

Two-Pass Algorithms Based on Hashing

- If both input relations are too large to be stored in the buffer, hash all the tuples of both relations, applying the same hash function to the join attribute(s).
- Hash function h has a domain of k hash values, i.e., k buckets.
- Only tuples from R and S that fall into the same bucket i can join.
- Hash first relation R, then relation S, writing the buckets to disk.

Two-Pass Algorithms Based on Hashing

- To hash relation R, read it block by block.
- Allocate one buffer block to each of the k buckets.
- For each tuple t, move it to the buffer of bucket h(t).
- If a buffer is full, write it to disk and re-initialize it.
- Finally, write to disk all partially full buffer blocks.
- I/O cost is B(R) for reading (writing the buckets costs roughly another B(R)).
- Memory requirement: M = k + 1.

Two-Pass Algorithms Based on Hashing

- For each i, read the i-th bucket of R completely into memory, and read the i-th bucket of S into memory, one block at a time.
- For each S tuple s in the buffer block, determine the matching tuples r in R and output (r,s).
- We assume that each bucket of R fits into main memory.

Two-Pass Algorithms Based on Hashing

Hash join:
- Hash function h with range 0 ... k.
- Buckets for R1: G0, G1, ..., Gk. Buckets for R2: H0, H1, ..., Hk.
- Algorithm:
    (1) Hash the R1 tuples into the G buckets.
    (2) Hash the R2 tuples into the H buckets.
    (3) For i = 0 to k do: match the tuples in buckets Gi, Hi and output the results.
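Steps (1) to (3) can be sketched directly; using `hash(v) % k` as h and dict-tuples with join attribute c are assumptions of this sketch:

```python
from collections import defaultdict

def hash_join(r1, r2, c, k):
    """Hash join sketch following steps (1)-(3): bucketize both relations
    with h(v) = hash(v) % k, then join each bucket pair Gi/Hi using an
    in-memory dict (playing the role of a second hash function)."""
    g = defaultdict(list)                    # (1) G buckets for R1
    h = defaultdict(list)                    # (2) H buckets for R2
    for t in r1:
        g[hash(t[c]) % k].append(t)
    for t in r2:
        h[hash(t[c]) % k].append(t)
    output = []
    for i in range(k):                       # (3) match Gi with Hi
        table = defaultdict(list)            # in-memory table for R1 bucket
        for r in g[i]:
            table[r[c]].append(r)
        for s in h[i]:
            for r in table[s[c]]:
                output.append((r, s))
    return output
```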

Two-Pass Algorithms Based on Hashing

Example: hash function even/odd, i.e., two buckets.

    R1 = {2, 4, 3, 5, 8, 9}        R2 = {5, 4, 12, 3, 13, 8, 11, 14}

    Even buckets:   R1: {2, 4, 8}      R2: {4, 12, 8, 14}
    Odd buckets:    R1: {3, 5, 9}      R2: {5, 3, 13, 11}

Two-Pass Algorithms Based on Hashing

- R, S clustered (but unordered).
- Use 100 hash buckets of 10 blocks each.
- To hash R: read R, hash each tuple, and write the buckets to disk.
- Hash S in the same way.

[Figure: the blocks of R hashed into 100 buckets of 10 blocks each]

Two-Pass Algorithms Based on Hashing

- Suppose R is the smaller of the input relations.
- Read one R bucket and build an in-memory hash table for it (with another hash function).
- Read the corresponding S bucket, one block at a time, and probe the hash table.
- Repeat the same procedure for all buckets.

[Figure: one R bucket held in memory as a hash table, probed block by block by the corresponding S bucket]

Two-Pass Algorithms Based on Hashing

- Cost:
  - Bucketize: read R + write buckets, read S + write buckets.
  - Join: read the R and S buckets.
  - Total cost: 3 × (1,000 + 500) = 4,500 I/Os.
- This is an approximation, since the buckets will vary in size, and we have to round up to full blocks.

Two-Pass Algorithms Based on Hashing

- Memory requirements:
  - The size of an R bucket is B(R)/(M-1),
  - with k = M-1 hash buckets.
- This assumes that all hash buckets of R have the same size.
- The same calculation applies to S.
- The buckets of the smaller input relation must fit into main memory: B(R)/(M-1) ≤ M-1, i.e., roughly B(R) ≤ M².

Index-Based Algorithms

- Index-based algorithms are especially useful for the selection operator, but also for the join operator.
- We distinguish clustering and non-clustering indexes.
- A clustering index is an index where all tuples with a given search key value appear on (roughly) as few blocks as possible.
- One relation can have only one clustering index, but multiple non-clustering indexes.

Index-Based Algorithms

Index join:
    for each r ∈ R do
      X ← index(S, C, r.C)
      for each s ∈ X do
        output (r,s)

index(rel, attr, value) returns the set of rel tuples with attr = value.
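A Python sketch of this pseudocode, where a plain dict of value-to-tuples lists is an assumption standing in for a real index structure such as a B-tree:

```python
def build_index(rel, attr):
    """Stand-in for an index on rel.attr: maps each attribute value to
    the list of matching tuples."""
    idx = {}
    for t in rel:
        idx.setdefault(t[attr], []).append(t)
    return idx

def index_join(r, s_index, c):
    """Index join as in the pseudocode: scan R once and, for each tuple,
    probe the index on S.C instead of scanning S."""
    output = []
    for r_tuple in r:
        for s_tuple in s_index.get(r_tuple[c], []):  # X <- index(S, C, r.C)
            output.append((r_tuple, s_tuple))
    return output

# Usage: index S on c, then probe it once per R tuple.
s = [{"c": 1, "x": "a"}, {"c": 2, "x": "b"}]
idx = build_index(s, "c")
pairs = index_join([{"c": 2}], idx, "c")
# pairs == [({"c": 2}, {"c": 2, "x": "b"})]
```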

Index-Based Algorithms

- Example: assume an R.C index with 2 levels exists, S is clustered but unordered, and the R.C index fits in memory.
- Cost: read S: 500 I/Os.
- For each S tuple:
  - probe the index: no I/O,
  - if there is a match, read the R tuple: 1 I/O.

Index-Based Algorithms

What is the expected number of matching tuples?

(a) Say R.C is a key and S.C is a foreign key: then we expect 1 match.

(b) Say V(R,C) = 5,000 and T(R) = 10,000: with the uniform distribution assumption, we expect 10,000 / 5,000 = 2 matching tuples.

Index-Based Algorithms

What is the expected number of matching tuples?

(c) Say DOM(R,C) = 1,000,000 and T(R) = 10,000: with the alternate assumption, we expect 10,000 / 1,000,000 = 1/100 matches.

Index-Based Algorithms

Total cost of the index join:

(a) Total cost = 500 + 5,000 × 1 × 1 = 5,500 I/Os.
(b) Total cost = 500 + 5,000 × 2 × 1 = 10,500 I/Os.
(c) Total cost = 500 + 5,000 × (1/100) × 1 = 550 I/Os.

Index-Based Algorithms

What if the index does not fit in memory?

- Example: say the R1.C index occupies 201 blocks.
- Keep the root and 99 leaf nodes in memory.
- The expected cost of each probe is E = (0) × (99/200) + (1) × (101/200) ≈ 0.5.

Index-Based Algorithms

- Total cost (including probes):
- For case (b):
    500 + 5,000 × (probe + get records)
    = 500 + 5,000 × (0.5 + 2)    (uniform assumption)
    = 500 + 12,500 = 13,000 I/Os.
- For case (c):
    500 + 5,000 × (0.5 + (1/100) × 1)
    = 500 + 2,500 + 50 = 3,050 I/Os.

Summary Join Algorithms

- Nested-loop join: OK for small relations (relative to memory size).
- Hash join: usually best for equi-joins, where the relations are not sorted and no indexes exist.
- Sort-merge join: good for non-equi-joins, e.g., R.C > S.C.
- If the relations are already sorted, use the merge join.
- If an index exists, the index join can be efficient (depends on the expected result size).