1
Database Performance
2
Contents
  • Database Performance
  • Denormalisation
  • Indexes
  • Clustering
  • Query Optimisation
  • Benchmarking
  • Wisconsin, TPC-C, OO7, BUCKY
  • Summary

3
Database Performance
  • Good query performance is necessary to achieve
    acceptable overall performance of an RDBMS
  • There are various ways in which this can be
    achieved:
  • De-normalisation of data to reduce joins
  • Creating indexes on frequently retrieved
    attributes
  • Clustering tables to reduce the number of disk
    reads
  • Automatic optimisation of queries

4
Normalisation
  • Normalisation improves the logical database
    design and prevents anomalies, BUT
  • more tables mean more joins
  • joining > 3 tables is likely to be slow
  • De-normalisation reverses normalisation for
    efficiency purposes

5
Database Performance
  • Example
  • Branch(BranchNo, street, city, postcode,
    mgrstaffno)
  • Could also be held as
  • Branch(BranchNo, street, postcode, mgrstaffno)
  • Postcode(Postcode, city)
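
The practical difference shows up in the queries: with the separate Postcode table a join is needed to list full branch addresses, whereas the single denormalised Branch relation can be read directly. A minimal SQL sketch, using the relations above:

    -- Normalised design: city must be fetched from the Postcode table
    SELECT b.branchno, b.street, p.city, b.postcode
    FROM   branch b, postcode p
    WHERE  b.postcode = p.postcode;

    -- Denormalised design: city is held in Branch, so no join is needed
    SELECT branchno, street, city, postcode
    FROM   branch;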

6
De-normalisation
  • Advantages
  • Minimises need for joins
  • Reduces number of foreign keys in relations
  • Reduces number of indexes
  • Saves storage space
  • Reduces number of relations

7
De-normalisation
  • Disadvantages
  • Speeds up retrievals, but may slow down updates
  • Increases application complexity
  • Relation size can increase
  • Sacrifices flexibility

8
Indexes
  • INDEXES
  • An index is a table or some other data structure
    that is used to determine the location of a row
    within a table that satisfies some condition.
  • Indexes may be defined on both primary key and
    non-key attributes.

9
Indexes
  • Oracle allows faster access on any named table by
    using an index.
  • each row within a table is given a unique value
    or rowid.
  • each rowid can be held in an index.
  • an index can be created at any time.
  • any column within a table can be indexed.
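
Each row's rowid can be inspected directly through Oracle's ROWID pseudocolumn; a small illustrative query (the TENANT table is the example used later in these slides):

    SELECT ROWID, surname FROM tenant;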

10
When to create an Index?
  • Create a unique index before any data is input
  • Create non-unique indexes after the data has been
    input

11
Creating Indexes
  • HOW DO YOU CREATE AN INDEX?
  • EXAMPLES -
  • (a) CREATE INDEX TENIDX ON TENANT(SURNAME)
  • (b) CREATE UNIQUE INDEX TENIDX ON TENANT(SURNAME)
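
Once an index exists, the optimiser uses it automatically whenever it helps; no query needs to change. An illustrative sketch (the query and the DROP statement below are examples, not from the original slides):

    -- A query like this can now be resolved via TENIDX rather than a full scan
    SELECT * FROM tenant WHERE surname = 'SMITH';

    -- An index can also be removed at any time
    DROP INDEX tenidx;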

12
Index Guidelines
  • GUIDELINES FOR USE OF INDEXES
  • > 200 rows in a table
  • a column is frequently used in a where clause
  • specific columns are frequently used as join
    columns

13
Indexes
  • POINTS TO WATCH
  • avoid, if possible, > 3 indexes on any one table
  • avoid indexing a column with too few distinct
    values
  • for example, male/female
  • avoid indexing a column with too many distinct
    values
  • avoid an index if > 15% of rows will be retrieved

14
Clusters
  • A disk is arranged in blocks
  • Blocks are retrieved as a whole and buffered
  • Disk Access time is slow compared with Memory
    access
  • Gains can be made if the number of block
    transfers can be reduced

15
Database Performance
  • CLUSTERING
  • clusters physically arrange the data on disk so
    that frequently retrieved info is stored together
  • allows 2 or more tables to be stored in the same
    physical block
  • can greatly reduce access time for join
    operations
  • can also reduce storage space requirements

16
Database Performance
  • CLUSTER DEFINITION
  • clustering is transparent to the user
  • no queries have to be modified
  • no applications need to be changed
  • tables are queried in the same way whether
    clustered or not

17
Database Performance
  • DECIDING WHERE TO USE CLUSTERS
  • Each table can only reside in 1 cluster
  • At least one attribute in the cluster must be NOT
    NULL
  • Consider the query transactions in the system
  • How often is the query submitted?
  • How time critical is the query?
  • What's the amount of data retrieved?

18
Clustering Tables
[Diagram: rows of the Staff table and the Branch table stored together - tables clustered on a common column]
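
A minimal sketch of how such a cluster could be declared in Oracle, assuming branchno is the common column (all names and column types here are illustrative, not taken from the slides):

    CREATE CLUSTER branch_staff (branchno VARCHAR2(4));

    CREATE INDEX branch_staff_idx ON CLUSTER branch_staff;

    CREATE TABLE branch (
      branchno   VARCHAR2(4) PRIMARY KEY,
      street     VARCHAR2(30),
      city       VARCHAR2(20),
      postcode   VARCHAR2(8),
      mgrstaffno VARCHAR2(5)
    ) CLUSTER branch_staff (branchno);

    CREATE TABLE staff (
      staffno  VARCHAR2(5) PRIMARY KEY,
      name     VARCHAR2(30),
      branchno VARCHAR2(4)
    ) CLUSTER branch_staff (branchno);

Rows of branch and staff sharing a branchno value are then stored in the same physical blocks, so joins on branchno need fewer block reads.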
19
Database Performance
  • CLUSTERING EXERCISE

[Diagram: WAREHOUSE (3 rows) and PRODUCT (1000 rows) linked by a STOCK table]
20
Database Performance
  • To speed up access time to data in these three
    tables (WAREHOUSE, PRODUCT, STOCK) it is
    necessary to cluster either STOCK around
    WAREHOUSE, or STOCK around PRODUCT.
  • How do we decide which will be the most
    efficient?
  • For the purpose of this exercise we will assume
    that each block can hold 100 records.

21
Database Performance
  • If STOCK is clustered around PRODUCT
  • No. of products = 1000. There will be 1 record
    for each PRODUCT in each WAREHOUSE. Therefore
    each product would have 3 records
  • Each block would contain 100/3 products, i.e. 33
    products. There would therefore be a 1 in 3
    chance of accessing a particular stock item by
    reading one block of data.

22
Database Performance
  • If STOCK is clustered around WAREHOUSE
  • No. of warehouses = _____. There will be ____
    record for each item of STOCK in each warehouse.
    Therefore each warehouse would have ______
    records. The records for each warehouse would
    have to be stored across ______ blocks.
  • Access would therefore be faster if STOCK is
    clustered around the product.

23
Database Performance

[Diagram: SQL OPTIMISATION - a SELECT statement is passed to the DBMS, which reads the data files]
24
Query Optimisation
  • Automatic query optimisation can dramatically
    improve query execution time
  • e.g. Consider the simple SQL query
  • select s.student_no, s.student_name,
    c.course_name
  • from student s, course c
  • where s.course_id = c.course_id
  • and s.age > 25
  • This query runs more efficiently if the
    selections and projections are performed before
    the join
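
Whether the DBMS actually restricts before joining is decided by its optimiser rather than by how the SQL is written. In Oracle the chosen plan can be inspected with EXPLAIN PLAN; a minimal sketch, assuming the student and course tables above exist (DBMS_XPLAN is available in Oracle 9i and later):

    EXPLAIN PLAN FOR
      select s.student_no, s.student_name, c.course_name
      from student s, course c
      where s.course_id = c.course_id
      and s.age > 25;

    -- Display the plan the optimiser selected
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);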

25
Example
  • 1000 students of which only 100 are over the age
    of 25, and there are 50 courses
  • Alternative 1 - Join first
  • read the 1000 students, read all courses 1000
    times (once for each student), and construct an
    intermediate table of 1000 records (which may be
    too large to fit in memory)
  • restrict the result to those over the age of 25
    (100 rows at most)
  • project the result over the required attributes

26
Example
  • Alternative 2 - Restrict first
  • read the 1000 student tuples but restrict to
    those over the age of 25, returning an
    intermediate table of only 100 rows - which is
    much more likely to fit in main memory
  • join the result with the course table, again
    returning an intermediate table of only 100 rows
  • project the result over the required attributes
  • Obviously this version is BETTER!
  • Could be improved further by doing the projection
    before the join.

27
Query Processing Stages
  • Four stages in query processing
  • Cast the query into internal form
  • normally tree based (relational algebra)
  • Convert to canonical form
  • Choose candidate low-level procedures
  • using indexes, clustering, etc.
  • Generate query plans, then choose and run the
    cheapest one
  • based on cost formulas and database statistics
  • rule-based or cost-based in Oracle

28
Query Cast into Internal Form
[Query tree:
    RESULT
      PROJECT over student_no, student_name, course_name
        RESTRICT where age > 25
          JOIN over course_id
            S          C]
29
Canonical Form
  • Canonical form
  • given a set Q of queries, and a notion of
    equivalence between two queries q1 and q2 in set
    Q, there exists a subset C of Q, the set of
    canonical forms for Q, such that every query
    q in Q is equivalent to exactly one query c in C
  • The query c is the canonical form of the query q
  • Uses expression transformation rules

30
Expression Transformation Rules
  • Examples (not complete)
  • (A WHERE p1) WHERE p2 ≡ A WHERE p1 AND p2
  • (A PROJECT x, y) PROJECT y ≡ A PROJECT y
  • (A UNION B) PROJECT x ≡ (A PROJECT x) UNION
    (B PROJECT x)
  • (A JOIN B) PROJECT x ≡ (A PROJECT x1) JOIN
    (B PROJECT x2)
  • A JOIN B ≡ B JOIN A
  • (A JOIN B) JOIN C ≡ A JOIN (B JOIN C)
  • (A JOIN B) PROJECT x ≡ A PROJECT x,
    where x is the FK from B to A
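
As an illustration (using the earlier student/course query and the standard restriction-pushdown rule, which is in the same family as the rules above but not listed there):

    ((S JOIN C) WHERE age > 25) PROJECT student_no, student_name, course_name
      ≡ ((S WHERE age > 25) JOIN C) PROJECT student_no, student_name, course_name

Since age is an attribute of S alone, the two expressions are equivalent, and the right-hand form restricts S to 100 rows before the join is evaluated.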

31
Choose Candidate Low-Level Procedures
  • How to execute the query represented by that
    converted form
  • Take into consideration
  • Indexes
  • Other physical access paths
  • Distribution of data values
  • Clustering
  • Specify as a series of low-level operations
  • Each low level operation has a set of predefined
    implementation procedures

32
Generate Query Plans/Choose Cheapest
  • Each query plan from stage 3 has a cost formula,
    built from the cost formulas of its low-level
    procedures
  • Oracle supports
  • Rule Based
  • Rank queries according to algebra operations
  • 15 rules in Oracle
  • Cost Based
  • The plan ranked best by the rules may not in
    fact be optimal once the actual cost of executing
    the query (e.g. the join order) is considered
  • Need to gather statistics
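
A minimal sketch of gathering those statistics in Oracle (the table name is illustrative; ANALYZE is the older syntax, DBMS_STATS the package recommended in later releases):

    -- Older syntax
    ANALYZE TABLE student COMPUTE STATISTICS;

    -- DBMS_STATS package
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'STUDENT');
    END;
    /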

33
Database Performance
  • OPTIMIZING PERFORMANCE
  • Performance can be regarded as a balancing act
    between:
  • access performance
  • update performance
  • ease of use/modification

34
Benchmarking
  • Software and systems development projects include
    performance evaluation work, but this is sometimes
    not sufficient to prevent major performance
    problems
  • Benchmarking is a useful tool which can be used
    at the prototyping stage to improve performance
    of the DBMS application
  • There are many benchmarks available

35
Database Benchmarking
  • A tool for comparing the performance of DBMSs
  • summarises relative performance in a single figure
  • Usually measured in transactions per second (tps)
  • with a cost measure in terms of system cost over
    5 years
  • Two principal uses of benchmarks
  • providing comparative performance measures
  • a source of data and queries that represent
    experimental approximations to other problems

36
Data Generation Approaches
  • Artificial
  • generation of data with entirely artificial
    properties
  • designed to investigate particular aspects of a
    system, e.g. join
  • e.g. Wisconsin
  • Synthetic Workload
  • produce a simplified version of an application
  • use synthetically generated data with similar
    properties to real system, e.g. a banking
    application
  • e.g. Transaction Processing Performance Council
    (TPC)

37
Wisconsin Benchmark
  • First systematic benchmark definition
  • compares particular features of DBMS rather than
    a simple overall performance metric
  • A single-user series of tests, comprising
  • selections and projections with varying
    selectivities on clustered, indexed and
    non-indexed attributes
  • joins of varying selectivities
  • aggregate functions (e.g. min, max, sum)
  • updates/deletions involving key/non-key attributes

38
Wisconsin Benchmark
  • Straightforward to implement
  • Scalable, e.g. to parallel architectures
  • Useful, readily understandable results
  • Criticisms - lack of highly skewed attribute
    distributions, and only simple join queries

39
TPC-C
  • Measures performance of a typical order entry
    application
  • from initiation at a terminal until response
    arrives back from server
  • benchmark encompasses time taken by server,
    network and other system components
  • terminals emulated using a negative-exponential
    transaction arrival distribution

40
TPC-C Schema
Taken from TPC-C Standard Specification
available at www.tpc.org
41
TPC-C
  • 5 transactions covering
  • New order
  • A payment
  • Order status enquiry
  • A delivery
  • A stock level inquiry
  • 10 terminals at each warehouse
  • All 5 transactions available at each terminal
  • The mix produces an equal number of New-Order and
    Payment transactions, and one Delivery, one
    Order-Status and one Stock-Level transaction for
    every ten New-Order transactions
  • Metric
  • number of New-Order transactions executed per
    minute (tpmC)

42
Other TPC Benchmarks
  • TPC-H
  • Ad-hoc decision support environments
  • TPC-App
  • Application server and web services benchmark
  • Further info on
  • www.tpc.org

43
OO7
  • A benchmark for Object Database Systems
  • Examines the performance characteristics of
    different types of retrieval/traversal, object
    creation/deletion and updates, and of the query
    processor
  • A number of sample implementations are provided
  • Based on a complex parts hierarchy
  • Further info
  • ftp.cs.wisc.edu

44
OO7 Tests
  • Test 1
  • Raw traversal speed, traversal with updates,
    operations on manuals
  • Tests with/without full cache
  • Test 2
  • Exact matches, range searches, path lookup, scan,
    make, join
  • Test 3
  • Insert/update a group of composite parts

45
BUCKY
  • An Object-Relational Benchmark
  • Objective
  • To test the key features that add the object to
    object-relational database systems, as defined by
    Stonebraker
  • Inheritance, Complex Objects, ADTs
  • Not triggers
  • Based on a university database schema
  • Exists as an object-relational and a relational
    schema
  • so the performance of the object features of the
    DBMS can be compared with a purely relational
    implementation

46
BUCKY Schema
Taken from The BUCKY Object-Relational
Benchmark, Carey et al.
47
BUCKY Queries
  • Aim is to test various object queries,
    involving
  • 1. row types with inheritance
  • 2. inter-object references
  • 3. set-valued attributes
  • 4. methods of row objects
  • 5. ADT attributes and their methods
  • Two BUCKY performance metrics
  • O-R Efficiency Index, for comparing O-R and
    relational implementations
  • O-R Power Rating, for comparing O-R systems

48
Summary
  • Database application performance can be improved
    by
  • Indexes
  • Clustering
  • De-normalisation
  • Query Optimisation
  • But it is important at the design stage to ensure
    that the application is designed with optimal
    performance in mind
  • Benchmarking is a tool which can help with this
  • It's a black art!

49
Further Reading
  • Connolly and Begg, chapters 18 and 21
  • Also information on benchmarking in the OODB
    chapter
  • Date, chapter on Query Optimisation
  • Bitton, DeWitt and Turbyfill, Benchmarking
    Database Systems: A Systematic Approach, Proc.
    9th VLDB, 1983
  • Carey et al., The BUCKY Object-Relational
    Benchmark,
    http://www.cs.wisc.edu/naughton/bucky.html