Physical Database Design presentation

About This Presentation

Transcript and Presenter's Notes

Title: Physical Database Design

1
Physical Database Design

University of California, Berkeley
School of Information
IS 257 Database Management

2
Lecture Outline

Review
Normalization
Physical Database Design
Access Methods

3
Lecture Outline

Review
Normalization
Physical Database Design
Access Methods

4
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
5
Normalization

Normalization theory is based on the observation
that relations with certain properties are more
effective in inserting, updating and deleting
data than other sets of relations containing the
same data
Normalization is a multi-step process beginning
with an unnormalized relation
Hospital example from Atre, S. Data Base
Structured Techniques for Design, Performance,
and Management.

6
Normal Forms

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

7
Normalization
8
Unnormalized Relations

First step in normalization is to convert the
data into a two-dimensional table
In unnormalized relations data can repeat within
a column

9
Unnormalized Relation
10
First Normal Form

To move to First Normal Form a relation must
contain only atomic values at each row and
column.
No repeating groups
A column or set of columns is called a Candidate
Key when its values can uniquely identify the row
in the relation.

11
First Normal Form
12
Second Normal Form

A relation is said to be in Second Normal Form
when every nonkey attribute is fully functionally
dependent on the primary key.
That is, every nonkey attribute needs the full
primary key for unique identification

13
Second Normal Form
14
Second Normal Form
15
Second Normal Form
16
Third Normal Form

A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes
When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be
a transitive functional dependency.
The side effect column in the Surgery table is
determined by the drug administered
Side effect is transitively functionally
dependent on drug so Surgery is not 3NF

17
Third Normal Form
18
Third Normal Form
19
Boyce-Codd Normal Form

Most 3NF relations are also BCNF relations.
A 3NF relation is NOT in BCNF if
Candidate keys in the relation are composite keys
(they are not single attributes)
There is more than one candidate key in the
relation, and
The keys are not disjoint, that is, some
attributes in the keys are common

20
BCNF Relations
21
Fourth Normal Form

Any relation is in Fourth Normal Form if it is
BCNF and any multivalued dependencies are trivial
Eliminate non-trivial multivalued dependencies by
projecting into simpler tables

22
Fifth Normal Form

A relation is in 5NF if every join dependency in
the relation is implied by the keys of the
relation
Implies that relations that have been decomposed
in previous NF can be recombined via natural
joins to recreate the original relation.

23
Normalization

Normalization is performed to reduce or eliminate
Insertion, Deletion or Update anomalies.
However, a completely normalized database may not
be the most efficient or effective
implementation.
Denormalization is sometimes used to improve
efficiency.

24
Denormalization

Usually driven by the need to improve query speed
Query speed is improved at the expense of more
complex or problematic DML (Data manipulation
language) for updates, deletions and insertions.

25
Downward Denormalization
26
Upward Denormalization
27
Using RDBMS to help normalize

Example database Cookie
Database of books, libraries, publisher and
holding information for a shared (union) catalog

28
Cookie relationships
29
Cookie BIBFILE relation
30
How to Normalize?

Currently no way to have multiple authors for a
given book, and there is duplicate data spread
over the BIBFILE table
Can we use the DBMS to help us normalize?
Access example

31
Lecture Outline

Review
Normalization
Using Relational DBs in normalization
Physical Database Design
Access Methods

32
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
PhysicalDesign
33
Physical Database Design

Many physical database design decisions are
implicit in the technology adopted
Also, organizations may have standards or an
information architecture that specifies
operating systems, DBMS, and data access
languages -- thus constraining the range of
possible physical implementations.
We will be concerned with some of the possible
physical implementation issues

34
Physical Database Design

The primary goal of physical database design is
data processing efficiency
We will concentrate on choices often available to
optimize performance of database services
Physical Database Design requires information
gathered during earlier stages of the design
process

35
Physical Design Information

Information needed for physical file and database
design includes
Normalized relations plus size estimates for them
Definitions of each attribute
Descriptions of where and when data are used
entered, retrieved, deleted, updated, and how
often
Expectations and requirements for response time,
and data security, backup, recovery, retention
and integrity
Descriptions of the technologies used to
implement the database

36
Physical Design Decisions

There are several critical decisions that will
affect the integrity and performance of the
system
Storage Format
Physical record composition
Data arrangement
Indexes
Query optimization and performance tuning

37
Storage Format

Choosing the storage format of each field
(attribute). The DBMS provides some set of data
types that can be used for the physical storage
of fields in the database
Data Type (format) is chosen to minimize storage
space and maximize data integrity

38
Objectives of data type selection

Minimize storage space
Represent all possible values
Improve data integrity
Support all data manipulations
The correct data type should, in minimal space,
represent every possible value (but eliminate
illegal values) for the associated attribute and
can support the required data manipulations (e.g.
numerical or string operations)

39
Access Data Types

Numeric (1, 2, 4, 8 bytes, fixed or float)
Text (255 max)
Memo (64000 max)
Date/Time (8 bytes)
Currency (8 bytes, 15 digits 4 digits decimal)
Autonumber (4 bytes)
Yes/No (1 bit)
OLE (limited only by disk space)
Hyperlinks (up to 64000 chars)

40
Access Numeric types

Byte
Stores numbers from 0 to 255 (no fractions). 1
byte
Integer
Stores numbers from 32,768 to 32,767 (no
fractions) 2 bytes
Long Integer (Default)
Stores numbers from 2,147,483,648 to
2,147,483,647 (no fractions). 4 bytes
Single
Stores numbers from -3.402823E38 to 1.401298E45
for negative values and from 1.401298E45 to
3.402823E38 for positive values. 4 bytes
Double
Stores numbers from 1.79769313486231E308 to
4.94065645841247E324 for negative values and
from 1.79769313486231E308 to 4.94065645841247E324
for positive values. 15 8 bytes
Replication ID
Globally unique identifier (GUID) N/A 16 bytes

41
Controlling Data Integrity

Default values
Range control
Null value control
Referential integrity (next time)
Handling missing data

42
Designing Physical Records

A physical record is a group of fields stored in
adjacent memory locations and retrieved together
as a unit
Fixed Length and variable fields

43
Designing Physical/Internal Model

Overview
terminology
Access methods

44
Physical Design

Internal Model/Physical Model

User request
Interface 1
Interface 2
Operating System Access Methods
Interface 3
Data Base
45
Physical Design

Interface 1 User request to the DBMS. The user
presents a query, the DBMS determines which
physical DBs are needed to resolve the query
Interface 2 The DBMS uses an internal model
access method to access the data stored in a
logical database.
Interface 3 The internal model access methods
and OS access methods access the physical
records of the database.

46
Physical File Design

A Physical file is a portion of secondary storage
(disk space) allocated for the purpose of storing
physical records
Pointers - a field of data that can be used to
locate a related field or record of data
Access Methods - An operating system algorithm
for storing and locating data in secondary
storage
Pages - The amount of data read or written in one
disk input or output operation

47
Internal Model Access Methods

Many types of access methods
Physical Sequential
Indexed Sequential
Indexed Random
Inverted
Direct
Hashed
Differences in
Access Efficiency
Storage Efficiency

48
Physical Sequential

Key values of the physical records are in logical
sequence
Main use is for dump and restore
Access method may be used for storage as well as
retrieval
Storage Efficiency is near 100
Access Efficiency is poor (unless fixed size
physical records)

49
Indexed Sequential

Key values of the physical records are in logical
sequence
Access method may be used for storage and
retrieval
Index of key values is maintained with entries
for the highest key values per block(s)
Access Efficiency depends on the levels of index,
storage allocated for index, number of database
records, and amount of overflow
Storage Efficiency depends on size of index and
volatility of database

50
Index Sequential
51
Indexed Sequential Two Levels
52
Indexed Random

Key values of the physical records are not
necessarily in logical sequence
Index may be stored and accessed with Indexed
Sequential Access Method
Index has an entry for every data base record.
These are in ascending order. The index keys are
in logical sequence. Database records are not
necessarily in ascending sequence.
Access method may be used for storage and
retrieval

53
Indexed Random
54
Btree
55
Inverted

Key values of the physical records are not
necessarily in logical sequence
Access Method is better used for retrieval
An index for every field to be inverted may be
built
Access efficiency depends on number of database
records, levels of index, and storage allocated
for index

56
Inverted
57
Direct

Key values of the physical records are not
necessarily in logical sequence
There is a one-to-one correspondence between a
record key and the physical address of the record
May be used for storage and retrieval
Access efficiency always 1
Storage efficiency depends on density of keys
No duplicate keys permitted

58
Hashing

Key values of the physical records are not
necessarily in logical sequence
Many key values may share the same physical
address (block)
May be used for storage and retrieval
Access efficiency depends on distribution of
keys, algorithm for key transformation and space
allocated
Storage efficiency depends on distibution of keys
and algorithm used for key transformation

59
Comparative Access Methods
Indexed No wasted space for data but extra space
for index Moderately Fast Moderately Fast Very
fast with multiple indexes OK if dynamic OK if
dynamic Easy but requires Maintenance of indexes
60
Database Creation in Access

Simplest to use a design view
wizards are available, but less flexible
Need to watch the default values
Helps to know what the primary key is, or if one
is to be created automatically
Automatic creation is more complex in other RDBMS
and ORDBMS
Need to make decision about the physical storage
of the data

61
Database Creation in Access

Some Simple Examples

62
Next Time

Indexes and when to index
Integrity Constraints
Referential Integrity

Write a Comment

User Comments (0)

About PowerShow.com

Physical Database Design PowerPoint PPT Presentation