Physical Database Design - PowerPoint PPT Presentation

1 / 62
About This Presentation
Title:

Physical Database Design

Description:

Normalized relations plus size estimates for them. Definitions of each attribute ... Storage Efficiency depends on size of index and volatility of database. IS ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 63
Provided by: ValuedGate1
Category:

less

Transcript and Presenter's Notes

Title: Physical Database Design


1
Physical Database Design
  • University of California, Berkeley
  • School of Information
  • IS 257 Database Management

2
Lecture Outline
  • Review
  • Normalization
  • Physical Database Design
  • Access Methods

3
Lecture Outline
  • Review
  • Normalization
  • Physical Database Design
  • Access Methods

4
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
5
Normalization
  • Normalization theory is based on the observation
    that relations with certain properties are more
    effective in inserting, updating and deleting
    data than other sets of relations containing the
    same data
  • Normalization is a multi-step process beginning
    with an unnormalized relation
  • Hospital example from Atre, S. Data Base
    Structured Techniques for Design, Performance,
    and Management.

6
Normal Forms
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)

7
Normalization
8
Unnormalized Relations
  • First step in normalization is to convert the
    data into a two-dimensional table
  • In unnormalized relations data can repeat within
    a column

9
Unnormalized Relation
10
First Normal Form
  • To move to First Normal Form a relation must
    contain only atomic values at each row and
    column.
  • No repeating groups
  • A column or set of columns is called a Candidate
    Key when its values can uniquely identify the row
    in the relation.

11
First Normal Form
12
Second Normal Form
  • A relation is said to be in Second Normal Form
    when every nonkey attribute is fully functionally
    dependent on the primary key.
  • That is, every nonkey attribute needs the full
    primary key for unique identification

13
Second Normal Form
14
Second Normal Form
15
Second Normal Form
16
Third Normal Form
  • A relation is said to be in Third Normal Form if
    there is no transitive functional dependency
    between nonkey attributes
  • When one nonkey attribute can be determined with
    one or more nonkey attributes there is said to be
    a transitive functional dependency.
  • The side effect column in the Surgery table is
    determined by the drug administered
  • Side effect is transitively functionally
    dependent on drug so Surgery is not 3NF

17
Third Normal Form
18
Third Normal Form
19
Boyce-Codd Normal Form
  • Most 3NF relations are also BCNF relations.
  • A 3NF relation is NOT in BCNF if
  • Candidate keys in the relation are composite keys
    (they are not single attributes)
  • There is more than one candidate key in the
    relation, and
  • The keys are not disjoint, that is, some
    attributes in the keys are common

20
BCNF Relations
21
Fourth Normal Form
  • Any relation is in Fourth Normal Form if it is
    BCNF and any multivalued dependencies are trivial
  • Eliminate non-trivial multivalued dependencies by
    projecting into simpler tables

22
Fifth Normal Form
  • A relation is in 5NF if every join dependency in
    the relation is implied by the keys of the
    relation
  • Implies that relations that have been decomposed
    in previous NF can be recombined via natural
    joins to recreate the original relation.

23
Normalization
  • Normalization is performed to reduce or eliminate
    Insertion, Deletion or Update anomalies.
  • However, a completely normalized database may not
    be the most efficient or effective
    implementation.
  • Denormalization is sometimes used to improve
    efficiency.

24
Denormalization
  • Usually driven by the need to improve query speed
  • Query speed is improved at the expense of more
    complex or problematic DML (Data manipulation
    language) for updates, deletions and insertions.

25
Downward Denormalization
26
Upward Denormalization
27
Using RDBMS to help normalize
  • Example database Cookie
  • Database of books, libraries, publisher and
    holding information for a shared (union) catalog

28
Cookie relationships
29
Cookie BIBFILE relation
30
How to Normalize?
  • Currently no way to have multiple authors for a
    given book, and there is duplicate data spread
    over the BIBFILE table
  • Can we use the DBMS to help us normalize?
  • Access example

31
Lecture Outline
  • Review
  • Normalization
  • Using Relational DBs in normalization
  • Physical Database Design
  • Access Methods

32
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
PhysicalDesign
33
Physical Database Design
  • Many physical database design decisions are
    implicit in the technology adopted
  • Also, organizations may have standards or an
    information architecture that specifies
    operating systems, DBMS, and data access
    languages -- thus constraining the range of
    possible physical implementations.
  • We will be concerned with some of the possible
    physical implementation issues

34
Physical Database Design
  • The primary goal of physical database design is
    data processing efficiency
  • We will concentrate on choices often available to
    optimize performance of database services
  • Physical Database Design requires information
    gathered during earlier stages of the design
    process

35
Physical Design Information
  • Information needed for physical file and database
    design includes
  • Normalized relations plus size estimates for them
  • Definitions of each attribute
  • Descriptions of where and when data are used
  • entered, retrieved, deleted, updated, and how
    often
  • Expectations and requirements for response time,
    and data security, backup, recovery, retention
    and integrity
  • Descriptions of the technologies used to
    implement the database

36
Physical Design Decisions
  • There are several critical decisions that will
    affect the integrity and performance of the
    system
  • Storage Format
  • Physical record composition
  • Data arrangement
  • Indexes
  • Query optimization and performance tuning

37
Storage Format
  • Choosing the storage format of each field
    (attribute). The DBMS provides some set of data
    types that can be used for the physical storage
    of fields in the database
  • Data Type (format) is chosen to minimize storage
    space and maximize data integrity

38
Objectives of data type selection
  • Minimize storage space
  • Represent all possible values
  • Improve data integrity
  • Support all data manipulations
  • The correct data type should, in minimal space,
    represent every possible value (but eliminate
    illegal values) for the associated attribute and
    can support the required data manipulations (e.g.
    numerical or string operations)

39
Access Data Types
  • Numeric (1, 2, 4, 8 bytes, fixed or float)
  • Text (255 max)
  • Memo (64000 max)
  • Date/Time (8 bytes)
  • Currency (8 bytes, 15 digits 4 digits decimal)
  • Autonumber (4 bytes)
  • Yes/No (1 bit)
  • OLE (limited only by disk space)
  • Hyperlinks (up to 64000 chars)

40
Access Numeric types
  • Byte
  • Stores numbers from 0 to 255 (no fractions). 1
    byte
  • Integer
  • Stores numbers from 32,768 to 32,767 (no
    fractions) 2 bytes
  • Long Integer (Default)
  • Stores numbers from 2,147,483,648 to
    2,147,483,647 (no fractions). 4 bytes
  • Single
  • Stores numbers from -3.402823E38 to 1.401298E45
    for negative values and from 1.401298E45 to
    3.402823E38 for positive values. 4 bytes
  • Double
  • Stores numbers from 1.79769313486231E308 to
    4.94065645841247E324 for negative values and
    from 1.79769313486231E308 to 4.94065645841247E324
    for positive values. 15 8 bytes
  • Replication ID
  • Globally unique identifier (GUID) N/A 16 bytes

41
Controlling Data Integrity
  • Default values
  • Range control
  • Null value control
  • Referential integrity (next time)
  • Handling missing data

42
Designing Physical Records
  • A physical record is a group of fields stored in
    adjacent memory locations and retrieved together
    as a unit
  • Fixed Length and variable fields

43
Designing Physical/Internal Model
  • Overview
  • terminology
  • Access methods

44
Physical Design
  • Internal Model/Physical Model

User request
Interface 1
Interface 2
Operating System Access Methods
Interface 3
Data Base
45
Physical Design
  • Interface 1 User request to the DBMS. The user
    presents a query, the DBMS determines which
    physical DBs are needed to resolve the query
  • Interface 2 The DBMS uses an internal model
    access method to access the data stored in a
    logical database.
  • Interface 3 The internal model access methods
    and OS access methods access the physical
    records of the database.

46
Physical File Design
  • A Physical file is a portion of secondary storage
    (disk space) allocated for the purpose of storing
    physical records
  • Pointers - a field of data that can be used to
    locate a related field or record of data
  • Access Methods - An operating system algorithm
    for storing and locating data in secondary
    storage
  • Pages - The amount of data read or written in one
    disk input or output operation

47
Internal Model Access Methods
  • Many types of access methods
  • Physical Sequential
  • Indexed Sequential
  • Indexed Random
  • Inverted
  • Direct
  • Hashed
  • Differences in
  • Access Efficiency
  • Storage Efficiency

48
Physical Sequential
  • Key values of the physical records are in logical
    sequence
  • Main use is for dump and restore
  • Access method may be used for storage as well as
    retrieval
  • Storage Efficiency is near 100
  • Access Efficiency is poor (unless fixed size
    physical records)

49
Indexed Sequential
  • Key values of the physical records are in logical
    sequence
  • Access method may be used for storage and
    retrieval
  • Index of key values is maintained with entries
    for the highest key values per block(s)
  • Access Efficiency depends on the levels of index,
    storage allocated for index, number of database
    records, and amount of overflow
  • Storage Efficiency depends on size of index and
    volatility of database

50
Index Sequential
51
Indexed Sequential Two Levels
52
Indexed Random
  • Key values of the physical records are not
    necessarily in logical sequence
  • Index may be stored and accessed with Indexed
    Sequential Access Method
  • Index has an entry for every data base record.
    These are in ascending order. The index keys are
    in logical sequence. Database records are not
    necessarily in ascending sequence.
  • Access method may be used for storage and
    retrieval

53
Indexed Random
54
Btree
55
Inverted
  • Key values of the physical records are not
    necessarily in logical sequence
  • Access Method is better used for retrieval
  • An index for every field to be inverted may be
    built
  • Access efficiency depends on number of database
    records, levels of index, and storage allocated
    for index

56
Inverted
57
Direct
  • Key values of the physical records are not
    necessarily in logical sequence
  • There is a one-to-one correspondence between a
    record key and the physical address of the record
  • May be used for storage and retrieval
  • Access efficiency always 1
  • Storage efficiency depends on density of keys
  • No duplicate keys permitted

58
Hashing
  • Key values of the physical records are not
    necessarily in logical sequence
  • Many key values may share the same physical
    address (block)
  • May be used for storage and retrieval
  • Access efficiency depends on distribution of
    keys, algorithm for key transformation and space
    allocated
  • Storage efficiency depends on distibution of keys
    and algorithm used for key transformation

59
Comparative Access Methods
Indexed No wasted space for data but extra space
for index Moderately Fast Moderately Fast Very
fast with multiple indexes OK if dynamic OK if
dynamic Easy but requires Maintenance of indexes
60
Database Creation in Access
  • Simplest to use a design view
  • wizards are available, but less flexible
  • Need to watch the default values
  • Helps to know what the primary key is, or if one
    is to be created automatically
  • Automatic creation is more complex in other RDBMS
    and ORDBMS
  • Need to make decision about the physical storage
    of the data

61
Database Creation in Access
  • Some Simple Examples

62
Next Time
  • Indexes and when to index
  • Integrity Constraints
  • Referential Integrity
Write a Comment
User Comments (0)
About PowerShow.com