1
Database Performance
2
Contents
  • Database Performance
  • Denormalisation
  • Indexes
  • Clustering
  • Query Optimisation
  • Benchmarking
  • Wisconsin, TPC-C, OO7, BUCKY
  • Summary

3
Database Performance
  • Good query performance is necessary to achieve
    acceptable overall performance of an RDBMS
  • There are various ways in which this can be
    achieved:
  • De-normalisation of data to reduce joins
  • Creating indexes on frequently retrieved
    attributes
  • Clustering tables to reduce the number of disk
    reads
  • Automatic optimisation of queries

4
Normalisation
  • Normalisation improves the logical database
    design and prevents anomalies, BUT
  • more tables mean more joins
  • joining > 3 tables is likely to be slow
  • De-normalisation reverses normalisation for
    efficiency purposes

5
Database Performance
  • Example
  • Branch(BranchNo, street, city, postcode,
    mgrstaffno)
  • Could also be held as
  • Branch(BranchNo, street, postcode, mgrstaffno)
  • Postcode(Postcode, city)
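
The practical difference shows up in the queries: with the separate Postcode table a join is needed to list full branch addresses, whereas the single denormalised Branch relation can be read directly. A minimal SQL sketch, using the relations above:

    -- Normalised design: city must be fetched from the Postcode table
    SELECT b.branchno, b.street, p.city, b.postcode
    FROM   branch b, postcode p
    WHERE  b.postcode = p.postcode;

    -- Denormalised design: city is held in Branch, so no join is needed
    SELECT branchno, street, city, postcode
    FROM   branch;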

6
De-normalisation
  • Advantages
  • Minimises need for joins
  • Reduces number of foreign keys in relations
  • Reduces number of indexes
  • Saves storage space
  • Reduces number of relations

7
De-normalisation
  • Disadvantages
  • Speeds up retrievals, but may slow down updates
  • Increases application complexity
  • Relation size can increase
  • Sacrifices flexibility

8
Indexes
  • INDEXES
  • An index is a table or some other data structure
    that is used to determine the location of a row
    within a table that satisfies some condition.
  • Indexes may be defined on both primary key and
    non-key attributes.

9
Indexes
  • Oracle allows faster access on any named table by
    using an index.
  • each row within a table is given a unique value
    or rowid.
  • each rowid can be held in an index.
  • an index can be created at any time.
  • any column within a table can be indexed.
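
Each row's rowid can be inspected directly through Oracle's ROWID pseudocolumn; a small illustrative query (the TENANT table is the example used later in these slides):

    SELECT ROWID, surname FROM tenant;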

10
When to create an Index?
  • Create a unique index before any data is input
  • Create non-unique indexes after the data has been
    input

11
Creating Indexes
  • HOW DO YOU CREATE AN INDEX?
  • EXAMPLES -
  • (a) CREATE INDEX TENIDX ON TENANT(SURNAME)
  • (b) CREATE UNIQUE INDEX TENIDX ON TENANT(SURNAME)
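
Once an index exists, the optimiser uses it automatically whenever it helps; no query needs to change. An illustrative sketch (the query and the DROP statement below are examples, not from the original slides):

    -- A query like this can now be resolved via TENIDX rather than a full scan
    SELECT * FROM tenant WHERE surname = 'SMITH';

    -- An index can also be removed at any time
    DROP INDEX tenidx;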

12
Index Guidelines
  • GUIDELINES FOR USE OF INDEXES
  • > 200 rows in a table
  • a column is frequently used in a where clause
  • specific columns are frequently used as join
    columns

13
Indexes
  • POINTS TO WATCH
  • avoid, if possible, > 3 indexes on any one table
  • avoid indexing a column with too few distinct
    values
  • for example, male/female
  • avoid indexing a column with too many distinct
    values
  • avoid an index if > 15% of rows will be retrieved

14
Clusters
  • A disk is arranged in blocks
  • Blocks are retrieved as a whole and buffered
  • Disk Access time is slow compared with Memory
    access
  • Gains can be made if the number of block
    transfers can be reduced

15
Database Performance
  • CLUSTERING
  • clusters physically arrange the data on disk so
    that frequently retrieved info is stored together
  • allows 2 or more tables to be stored in the same
    physical block
  • can greatly reduce access time for join
    operations
  • can also reduce storage space requirements

16
Database Performance
  • CLUSTER DEFINITION
  • clustering is transparent to the user
  • no queries have to be modified
  • no applications need to be changed
  • tables are queried in the same way whether
    clustered or not

17
Database Performance
  • DECIDING WHERE TO USE CLUSTERS
  • Each table can only reside in 1 cluster
  • At least one attribute in the cluster must be NOT
    NULL
  • Consider the query transactions in the system
  • How often is the query submitted?
  • How time critical is the query?
  • What's the amount of data retrieved?

18
Clustering Tables
[Diagram: rows of the Staff table and the Branch table stored together - tables clustered on a common column]
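
A minimal sketch of how such a cluster could be declared in Oracle, assuming branchno is the common column (all names and column types here are illustrative, not taken from the slides):

    CREATE CLUSTER branch_staff (branchno VARCHAR2(4));

    CREATE INDEX branch_staff_idx ON CLUSTER branch_staff;

    CREATE TABLE branch (
      branchno   VARCHAR2(4) PRIMARY KEY,
      street     VARCHAR2(30),
      city       VARCHAR2(20),
      postcode   VARCHAR2(8),
      mgrstaffno VARCHAR2(5)
    ) CLUSTER branch_staff (branchno);

    CREATE TABLE staff (
      staffno  VARCHAR2(5) PRIMARY KEY,
      name     VARCHAR2(30),
      branchno VARCHAR2(4)
    ) CLUSTER branch_staff (branchno);

Rows of branch and staff sharing a branchno value are then stored in the same physical blocks, so joins on branchno need fewer block reads.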
19
Database Performance
  • CLUSTERING EXERCISE

[Diagram: WAREHOUSE (3 rows) and PRODUCT (1000 rows) linked by a STOCK table]
20
Database Performance
  • To speed up access time to data in these three
    tables (WAREHOUSE, PRODUCT, STOCK) it is
    necessary to cluster either STOCK around
    WAREHOUSE, or STOCK around PRODUCT.
  • How do we decide which will be the most
    efficient?
  • For the purpose of this exercise we will assume
    that each block can hold 100 records.

21
Database Performance
  • If STOCK is clustered around PRODUCT
  • No. of products = 1000. There will be 1 record
    for each PRODUCT in each WAREHOUSE. Therefore
    each product would have 3 records
  • Each block would contain 100/3 products, i.e. 33
    products. There would therefore be a 1 in 3
    chance of accessing a particular stock item by
    reading one block of data.

22
Database Performance
  • If STOCK is clustered around WAREHOUSE
  • No. of warehouses = _____. There will be ____
    record for each item of STOCK in each warehouse.
    Therefore each warehouse would have ______
    records. The records for each warehouse would
    have to be stored across ______ blocks.
  • Access would therefore be faster if STOCK is
    clustered around the product.

23
Database Performance

[Diagram: SQL OPTIMISATION - a SELECT statement is passed to the DBMS, which reads the data files]
24
Query Optimisation
  • Automatic query optimisation can dramatically
    improve query execution time
  • e.g. Consider the simple SQL query
  • select s.student_no, s.student_name,
    c.course_name
  • from student s, course c
  • where s.course_id = c.course_id
  • and s.age > 25
  • This query runs more efficiently if the
    selections and projections are performed before
    the join
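
Whether the DBMS actually restricts before joining is decided by its optimiser rather than by how the SQL is written. In Oracle the chosen plan can be inspected with EXPLAIN PLAN; a minimal sketch, assuming the student and course tables above exist (DBMS_XPLAN is available in Oracle 9i and later):

    EXPLAIN PLAN FOR
      select s.student_no, s.student_name, c.course_name
      from student s, course c
      where s.course_id = c.course_id
      and s.age > 25;

    -- Display the plan the optimiser selected
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);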

25
Example
  • 1000 students of which only 100 are over the age
    of 25, and there are 50 courses
  • Alternative 1 - Join first
  • read the 1000 students, read all courses 1000
    times (once for each student), and construct an
    intermediate table of 1000 records (which may be
    too large to fit in memory)
  • restrict the result to those over the age of 25
    (100 rows at most)
  • project the result over the required attributes

26
Example
  • Alternative 2 - Restrict first
  • read the 1000 student tuples but restrict to
    those over the age of 25, returning an
    intermediate table of only 100 rows - which is
    much more likely to fit in main memory
  • join the result with the course table, again
    returning an intermediate table of only 100 rows
  • project the result over the required attributes
  • Obviously this version is BETTER!
  • Could be improved further by doing the projection
    before the join.

27
Query Processing Stages
  • Four stages in query processing
  • Cast the query into internal form
  • normally tree based (relational algebra)
  • Convert to canonical form
  • Choose candidate low-level procedures
  • using indexes, clustering, etc.
  • Generate query plans, then choose and run the
    cheapest one
  • based on cost formulas and database statistics
  • rule-based or cost-based in Oracle

28
Query Cast into Internal Form
[Query tree:
    RESULT
      PROJECT over student_no, student_name, course_name
        RESTRICT where age > 25
          JOIN over course_id
            S          C]
29
Canonical Form
  • Canonical form
  • given a set Q of queries, and a notion of
    equivalence between two queries q1 and q2 in set
    Q, there exists a subset C of Q, the set of
    canonical forms for Q, such that every query
    q in Q is equivalent to exactly one query c in C
  • The query c is the canonical form of the query q
  • Uses expression transformation rules

30
Expression Transformation Rules
  • Examples (not complete)
  • (A WHERE p1) WHERE p2 ≡ A WHERE p1 AND p2
  • (A PROJECT x, y) PROJECT y ≡ A PROJECT y
  • (A UNION B) PROJECT x ≡ (A PROJECT x) UNION
    (B PROJECT x)
  • (A JOIN B) PROJECT x ≡ (A PROJECT x1) JOIN
    (B PROJECT x2)
  • A JOIN B ≡ B JOIN A
  • (A JOIN B) JOIN C ≡ A JOIN (B JOIN C)
  • (A JOIN B) PROJECT x ≡ A PROJECT x,
    where x is the FK from B to A
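
As an illustration (using the earlier student/course query and the standard restriction-pushdown rule, which is in the same family as the rules above but not listed there):

    ((S JOIN C) WHERE age > 25) PROJECT student_no, student_name, course_name
      ≡ ((S WHERE age > 25) JOIN C) PROJECT student_no, student_name, course_name

Since age is an attribute of S alone, the two expressions are equivalent, and the right-hand form restricts S to 100 rows before the join is evaluated.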

31
Choose Candidate Low-Level Procedures
  • How to execute the query represented by that
    converted form
  • Take into consideration
  • Indexes
  • Other physical access paths
  • Distribution of data values
  • Clustering
  • Specify as a series of low-level operations
  • Each low level operation has a set of predefined
    implementation procedures

32
Generate Query Plans/Choose Cheapest
  • Each query plan from stage 3 has a cost formula,
    built from the cost formulas of its low-level
    procedures
  • Oracle supports
  • Rule Based
  • Rank queries according to algebra operations
  • 15 rules in Oracle
  • Cost Based
  • The plan ranked best by the rules may not in
    fact be optimal once the actual cost of executing
    the query (e.g. the join order) is considered
  • Need to gather statistics
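
A minimal sketch of gathering those statistics in Oracle (the table name is illustrative; ANALYZE is the older syntax, DBMS_STATS the package recommended in later releases):

    -- Older syntax
    ANALYZE TABLE student COMPUTE STATISTICS;

    -- DBMS_STATS package
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'STUDENT');
    END;
    /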

33
Database Performance
  • OPTIMIZING PERFORMANCE
  • Performance can be regarded as a balancing act
    between:
  • access performance
  • update performance
  • ease of use/modification

34
Benchmarking
  • Software and systems development projects include
    performance evaluation work, but this is sometimes
    not sufficient to prevent major performance
    problems
  • Benchmarking is a useful tool which can be used
    at the prototyping stage to improve performance
    of the DBMS application
  • There are many benchmarks available

35
Database Benchmarking
  • A tool for comparing the performance of DBMSs
  • summarises relative performance in a single figure
  • Usually measured in transactions per second (tps)
  • with a cost measure in terms of system cost over
    5 years
  • Two principal uses of benchmarks
  • providing comparative performance measures
  • a source of data and queries that represent
    experimental approximations to other problems

36
Data Generation Approaches
  • Artificial
  • generation of data with entirely artificial
    properties
  • designed to investigate particular aspects of a
    system, e.g. join
  • e.g. Wisconsin
  • Synthetic Workload
  • produce a simplified version of an application
  • use synthetically generated data with similar
    properties to real system, e.g. a banking
    application
  • e.g. Transaction Processing Performance Council
    (TPC)

37
Wisconsin Benchmark
  • First systematic benchmark definition
  • compares particular features of DBMS rather than
    a simple overall performance metric
  • A single-user series of tests, comprising
  • selections and projections with varying
    selectivities on clustered, indexed and
    non-indexed attributes
  • joins of varying selectivities
  • aggregate functions (e.g. min, max, sum)
  • updates/deletions involving key/non-key attributes

38
Wisconsin Benchmark
  • Straightforward to implement
  • Scalable, e.g. to parallel architectures
  • Useful, readily understandable results
  • Criticisms - lack of highly skewed attribute
    distributions, and only simple join queries

39
TPC-C
  • Measures performance of a typical order entry
    application
  • from initiation at a terminal until response
    arrives back from server
  • benchmark encompasses time taken by server,
    network and other system components
  • terminals emulated using a negative-exponential
    transaction arrival distribution

40
TPC-C Schema
Taken from TPC-C Standard Specification
available at www.tpc.org
41
TPC-C
  • 5 transactions covering
  • New order
  • A payment
  • Order status enquiry
  • A delivery
  • A stock level inquiry
  • 10 terminals at each warehouse
  • All 5 transactions available at each terminal
  • The mix produces an equal number of New-Order and
    Payment transactions, and one Delivery, one
    Order-Status and one Stock-Level transaction for
    every ten New-Order transactions
  • Metric
  • number of New-Order transactions executed per
    minute (tpmC)

42
Other TPC Benchmarks
  • TPC-H
  • Ad-hoc decision support environments
  • TPC-App
  • Application server and web services benchmark
  • Further info on
  • www.tpc.org

43
OO7
  • A benchmark for Object Database Systems
  • Examines the performance characteristics of
    different types of retrieval/traversal, object
    creation/deletion and updates, and of the query
    processor
  • A number of sample implementations are provided
  • Based on a complex parts hierarchy
  • Further info
  • ftp.cs.wisc.edu

44
OO7 Tests
  • Test 1
  • Raw traversal speed, traversal with updates,
    operations on manuals
  • Tests with/without full cache
  • Test 2
  • Exact matches, range searches, path lookup, scan,
    make, join
  • Test 3
  • Insert/update a group of composite parts

45
BUCKY
  • An Object-Relational Benchmark
  • Objective
  • To test the key features that add the object to
    object-relational database systems, as defined by
    Stonebraker
  • Inheritance, Complex Objects, ADTs
  • Not triggers
  • Based on a university database schema
  • Exists as an object-relational and a relational
    schema
  • so the performance of the object features of the
    DBMS can be compared with a purely relational
    implementation

46
BUCKY Schema
Taken from The BUCKY Object-Relational
Benchmark, Carey et al.
47
BUCKY Queries
  • Aim is to test various object queries,
    involving
  • 1. row types with inheritance
  • 2. inter-object references
  • 3. set-valued attributes
  • 4. methods of row objects
  • 5. ADT attributes and their methods
  • Two BUCKY performance metrics
  • O-R Efficiency Index, for comparing O-R and
    relational implementations
  • O-R Power Rating, for comparing O-R systems

48
Summary
  • Database application performance can be improved
    by
  • Indexes
  • Clustering
  • De-normalisation
  • Query Optimisation
  • But it is important at the design stage to ensure
    that the application is designed with optimal
    performance in mind
  • Benchmarking is a tool which can help with this
  • It's a black art!

49
Further Reading
  • Connolly and Begg, chapters 18 and 21
  • Also information on benchmarking in the OODB
    chapter
  • Date, chapter on Query Optimisation
  • Bitton, DeWitt and Turbyfill, Benchmarking
    Database Systems: A Systematic Approach, Proc.
    9th VLDB, 1983
  • Carey et al., The BUCKY Object-Relational
    Benchmark,
    http://www.cs.wisc.edu/naughton/bucky.html