Query Optimization, Concluded and Transactions and Concurrency - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

Query Optimization, Concluded and Transactions and Concurrency

Description:

For queries over a single relation, queries consist of a combination of selects, ... be able to understand what's going on by considering each separate ... – PowerPoint PPT presentation

Number of Views:54
Avg rating:3.0/5.0
Slides: 40
Provided by: zack4
Category:

less

Transcript and Presenter's Notes

Title: Query Optimization, Concluded and Transactions and Concurrency


1
Query Optimization, Concludedand Transactions
and Concurrency
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550 Database Information Systems
  • December 19, 2009

Some slide content derived from Ramakrishnan
Gehrke
2
Reminders
  • Project demos will be on the 10th - 14th
  • Please sign your group up!
  • Due on the 22nd by 6PM a 5-10 page report
    describing
  • What your project goals were
  • What you implemented
  • Basic architecture and design
  • Division of labor
  • 2nd Midterm 12/9, 430PM Final exam 12/22,
    6-8PM, DRL A1

3
Enumeration of Alternative Plansfor a Single
Query Block
  • There are two main cases
  • Single-relation plans
  • Multiple-relation plans
  • For queries over a single relation, queries
    consist of a combination of selects, projects,
    and aggregate ops
  • Each available access path (file scan / index) is
    considered, and the one with the least estimated
    cost is chosen.
  • The different operations are essentially carried
    out together (e.g., if an index is used for a
    selection, projection is done for each retrieved
    tuple, and the resulting tuples are pipelined
    into the aggregate computation).

4
Cost Estimates for Single-Relation Plans
  • Index I on primary key matches selection
  • Cost is Height(I)1 for a B tree, about 1.2 for
    hash index.
  • Clustered index I matching one or more selects
  • (NPages(I)NPages(R)) product of RFs of
    matching selects.
  • Non-clustered index I matching one or more
    selects
  • (NPages(I)NTuples(R)) product of RFs of
    matching selects.
  • Sequential scan of file
  • NPages(R).

5
Enumeration of Left-Deep Plans
  • Left-deep plans differ only in the order of
    relations, the access method for each relation,
    the join method
  • Enumerated using N passes (if N relations
    joined)
  • Pass 1 Find best 1-relation plan for each
    relation
  • Pass 2 Find best way to join result of each
    1-relation plan (as outer) to another relation
    (All 2-relation plans)
  • Pass N Find best way to join result of a
    (N-1)-relation plan (as outer) to the Nth
    relation (All N-relation plans)
  • For each subset of relations, retain only
  • Cheapest plan overall, plus
  • Cheapest plan for each interesting order of the
    tuples

6
Enumeration of Plans (Contd.)
  • ORDER BY, GROUP BY, aggregates etc. handled as a
    final step, using either an interestingly
    ordered plan or an addional sorting operator
  • An (n-1)-way plan is only combined with an
    additional relation if there is a join condition
    between them, or all predicates in WHERE have
    been used up
  • i.e., avoid Cartesian products
  • This approach is still exponential in the of
    tables
  • Approximately 2n cost enumerations

7
Cost Estimation for Multirelation Plans
SELECT attribute list FROM relation list WHERE
term1 AND ... AND termk
  • Consider a query block
  • Maximum tuples in result is the product of the
    cardinalities of relations in the FROM clause.
  • Reduction factor (RF) associated with each term
    reflects the impact of the term in reducing
    result size
  • Result cardinality Max tuples product of
    all RFs.
  • Join one new relation at a time
  • Cost of join method, plus estimation of join
    cardinality gives us both cost estimate and
    result size estimate

8
Example
Sailors B tree on rating Hash on
sid Reserves B tree on bid
  • Pass1
  • Sailors B tree matches ratinggt5, and is
    probably cheapest
  • However, if this selection retrieves many tuples
    and index is unclustered, sequential scan may be
    better
  • Still, B tree plan kept (tuples are in rating
    order)
  • Reserves B tree on bid matches bid500
    cheapest
  • Pass 2
  • Retrieve each plan retained from Pass 1
  • Consider how to join it as the outer relation
    with the (only) other relation
  • e.g., Reserves as outer Hash index can be used
    to get Sailors tuples that satisfy sid outer
    tuples sid value

9
The General Form Choice of Joinsat Each Level
RST
(RT)S
(ST)R
(RS)T
Level 3
for each ordering
RS
ST
RT
R ? T
T ? R
R?S
S ? T
S ? R
T ? S
Level 2
for each ordering
T
R
S
Level 1
for each ordering
10
Query Optimization Recapped
  • Must understand optimization in order to
    understand the performance impact of a given
    database design (relations, indexes) on a
    workload (set of queries)
  • Two parts to optimizing a query
  • Consider a set of alternative plans
  • Must prune search space typically, left-deep
    plans only
  • Must estimate cost of each plan that is
    considered
  • Must estimate size of result and cost for each
    plan node
  • Key issues Statistics, indexes, operator
    implementations

11
Plan Enumeration
  • All single-relation plans are first enumerated.
  • All access paths considered, cheapest is chosen
  • Selections/projections considered as early as
    possible.
  • Next, for each 1-relation plan, all ways of
    joining another relation (as inner) are
    considered.
  • Next, for each 2-relation plan that is
    retained, all ways of joining another relation
    (as inner) are considered, etc.
  • At each level, for each subset of relations, only
    best plan for each interesting order of tuples is
    retained

12
The Bigger Picture Tuning
  • We saw that indexes and optimization decisions
    were critical to performance
  • Many DBAs and consultants have made a living off
    understanding query workloads, data, and
    estimated intermediate result sizes
  • Recent development self-tuning DBMSs
  • SQL Server and DB2 Index Wizards take a query
    workload and try to find an optimal set of
    indices for it
  • Adaptive query processing tries to figure out
    where the optimizers estimates went wrong and
    compensate for it

13
From Queries to Updates
  • Weve spent a lot of time talking about querying
    data
  • Yet updates are a really major part of many DBMS
    applications
  • Particularly important ensuring ACID properties
  • Atomicity each operation looks atomic to the
    user
  • Consistency each operation in isolation keeps
    the database in a consistent state (this is the
    responsibility of the user)
  • Isolation should be able to understand whats
    going on by considering each separate transaction
    independently
  • Durability updates stay in the DBMS!!!

14
What is a Transaction?
  • A transaction is a sequence of read and write
    operations on data items that logically functions
    as one unit of work
  • should either be done entirely or not at all
  • if it succeeds, the effects of write operations
    persist (commit) if it fails, no effects of
    write operations persist (abort)
  • these guarantees are made despite concurrent
    activity in the system, and despite failures that
    may occur

15
How Things Can Go Awry
  • Suppose we have a table of bank accounts which
    contains the balance of the account
  • An ATM deposit of 50 to account 1234 would be
    written as
  • This reads and writes the accounts balance
  • What if two accountholders make deposits
    simultaneously from two ATMs?

update Accounts set balance balance 50 where
account 1234
16
Concurrent Deposits
  • This SQL update code is represented as a
    sequence of read and write operations on data
    items (which for now should be thought of as
    individual accounts)
  • where X is the data item representing the
    account with account 1234.

Deposit 1 Deposit
2 read(X.bal)
read(X.bal) X.bal X.bal 50 X.bal
X.bal 10 write(X.bal)
write(X.bal)
17
A Bad Concurrent Execution
  • Only one action (e.g. a read or a write) can
    actually happen at a time, and we can interleave
    deposit operations in many ways

Deposit 1 Deposit
2 read(X.bal)
read(X.bal) X.bal
X.bal 50
X.bal X.bal 10 write(X.bal)

write(X.bal)
18
A Good Execution
  • Previous execution would have been fine if the
    accounts were different (i.e. one were X and one
    were Y), i.e., transactions were independent
  • The following execution is a serial execution,
    and executes one transaction after the other

Deposit 1 Deposit
2 read(X.bal) X.bal
X.bal 50 write(X.bal)
read(X.bal)
X.bal X.bal 10

write(X.bal)
19
Good Executions
  • An execution is good if it is serial
    (transactions are executed atomically and
    consecutively) or serializable (i.e. equivalent
    to some serial execution)
  • Equivalent to executing Deposit 1 then 3, or vice
    versa
  • Why would we want to do this instead?

Deposit 1 Deposit
3 read(X.bal)
read(Y.bal) X.bal
X.bal 50
Y.bal Y.bal 10 write(X.bal)

write(Y.bal)
20
Atomicity
  • Problems can also occur if a crash occurs in the
    middle of executing a transaction
  • Need to guarantee that the write to X does not
    persist (ABORT)
  • Default assumption if a transaction doesnt commit

21
Transactions in SQL
  • A transaction begins when any SQL statement that
    queries the db begins.
  • To end a transaction, the user issues a COMMIT or
    ROLLBACK statement.

Transfer UPDATE Accounts SET balance
balance - 100 WHERE account 1234 UPDATE
Accounts SET balance balance 100 WHERE
account 5678 COMMIT
22
Read-Only Transactions
  • When a transaction only reads information, we
    have more freedom to let the transaction execute
    in parallel with other transactions.
  • We signal this to the system by stating

SET TRANSACTION READ ONLY SELECT FROM
Accounts WHERE account1234 ...
23
Read-Write Transactions
  • If we state read-only, then the transaction
    cannot perform any updates.
  • Instead, we must specify that the transaction may
    update (the default)

SET TRANSACTION READ ONLY UPDATE Accounts SET
balance balance - 100 WHERE account 1234
...
ILLEGAL!
SET TRANSACTION READ WRITE update Accounts set
balance balance - 100 where account 1234
...
24
Dirty Reads
  • Dirty data is data written by an uncommitted
    transaction a dirty read is a read of dirty data
    (WR conflict)
  • Sometimes we can tolerate dirty reads other
    times we cannot
  • e.g., if we wished to ensure balances never went
    negative in the transfer example, we should test
    that there is enough money first!

25
Bad Dirty Read
EXEC SQL select balance into bal
from Accounts where
account1234 if (bal gt 100) EXEC SQL
update Accounts set balance
balance - 100 where account
1234 EXEC SQL update Accounts
set balance balance 100
where account 5678 EXEC SQL COMMIT
If the initial read (italics) were dirty, the
balance could become negative!
26
Acceptable Dirty Read
  • If we are just checking availability of an
    airline seat, a dirty read might be fine! (Why is
    that?)
  • Reservation transaction

EXEC SQL select occupied into occ
from Flights where Num
123 and date11-03-99
and seat23f if (!occ) EXEC SQL
update Flights set
occupiedtrue where Num 123 and
date11-03-99 and
seat23f else notify user that seat is
unavailable
27
Other Undesirable Phenomena
  • Unrepeatable read a transaction reads the same
    data item twice and gets different values (RW
    conflict)
  • Phantom problem a transaction retrieves a
    collection of tuples twice and sees different
    results

28
Phantom Problem Example
  • T1 find the students with best grades who Take
    either cis550-f09 or cis570-f08
  • T2 insert new entries for student 1234 in the
    Takes relation, with grade A for cis570-f08 and
    cis550-f09
  • Suppose that T1 consults all students in the
    Takes relation and finds the best grades for
    cis550-f09
  • Then T2 executes, inserting the new student at
    the end of the relation, perhaps on a page not
    seen by T1
  • T1 then completes, finding the students with best
    grades for cis570-f08 and now seeing student 1234

29
Isolation
  • The problems weve seen are all related to
    isolation
  • General rules of thumb w.r.t. isolation
  • Fully serializable isolation is more expensive
    than no isolation
  • We cant do as many things concurrently (or we
    have to undo them frequently)
  • For performance, we generally want to specify the
    most relaxed isolation level thats acceptable
  • Note that were slightly violating a
    correctness constraint to get performance!

30
Specifying Acceptable Isolation Levels
  • The default isolation level is SERIALIZABLE (as
    for the transfer example).
  • To signal to the system that a dirty read is
    acceptable,
  • In addition, there are

SET TRANSACTION READ WRITE ISOLATION LEVEL READ
UNCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ
COMMITTED SET TRANSACTION ISOLATION LEVEL
REPEATABLE READ
31
READ COMMITTED
  • Forbids the reading of dirty (uncommitted) data,
    but allows a transaction T to issue the same
    query several times and get different answers
  • No value written by T can be modified until T
    completes
  • For example, the Reservation example could also
    be READ COMMITTED the transaction could
    repeatably poll to see if the seat was available,
    hoping for a cancellation

32
REPEATABLE READ
  • What it is NOT a guarantee that the same query
    will get the same answer!
  • However, if a tuple is retrieved once it will be
    retrieved again if the query is repeated
  • For example, suppose Reservation were modified to
    retrieve all available seats
  • If a tuple were retrieved once, it would be
    retrieved again (but additional seats may also
    become available)

33
Implementing Isolation Levels
  • One approach use locking at some level (tuple,
    page, table, etc.)
  • each data item is either locked (in some mode,
    e.g. shared or exclusive) or is available (no
    lock)
  • an action on a data item can be executed if the
    transaction holds an appropriate lock
  • consider granularity of locks how big of an
    item to lock
  • Larger granularity fewer locking operations but
    more contention!
  • Appropriate locks
  • Before a read, a shared lock must be acquired
  • Before a write, an exclusive lock must be acquired

34
Lock Compatibility Matrix
  • Locks on a data item are granted based on a lock
    compatibility matrix
  • When a transaction requests a lock, it must wait
    (block) until the lock is granted

35
Locks Prevent Bad Execution
  • If the system used locking, the first bad
    execution could have been avoided

Deposit 1 Deposit
2 xlock(X) read(X.bal)
xlock(X)
is not granted X.bal X.bal 50
write(X.bal) release(X)
xlock(X)
read(X.bal)
X.bal X.bal 10

write(X.bal)
release(X)
36
Lock Types and Read/Write Modes
  • When we specify read-only, the system only uses
    shared-mode locks
  • Any transaction that attempts to update will be
    illegal
  • When we specify read-write, the system may also
    acquire locks in exclusive mode
  • Obviously, we can still query in this mode

37
Isolation Levels and Locking
  • READ UNCOMMITTED allows queries in the
    transaction to read data without acquiring any
    lock
  • For updates, exclusive locks must be obtained
    and held to end of transaction
  • READ COMMITTED requires a read-lock to be
    obtained for all tuples touched by queries, but
    it releases the locks immediately after the read
  • Exclusive locks must be obtained for updates and
    held to end of transaction

38
Isolation levels and locking, cont.
  • REPEATABLE READ places shared locks on tuples
    retrieved by queries, holds them until the end of
    the transaction
  • Exclusive locks must be obtained for updates and
    held to end of transaction
  • SERIALIZABLE places shared locks on tuples
    retrieved by queries as well as the index, holds
    them until the end of the transaction
  • Exclusive locks must be obtained for updates and
    held to end of transaction
  • Holding locks to the end of a transaction is
    called strict locking

39
Summary of Isolation Levels
Level Dirty Read
Unrepeatable Read Phantoms READ
UN- Maybe Maybe Maybe COMMITTED READ
No Maybe Maybe COMMITTED REPEATABLE No
No Maybe READ SERIALIZABLE No No No
Write a Comment
User Comments (0)
About PowerShow.com