Physical Database Design - PowerPoint PPT Presentation

1 / 45
About This Presentation
Title:

Physical Database Design

Description:

The purpose of physical database design is to translate the logical description ... Consider contriving a shorter field or selecting another candidate key to ... – PowerPoint PPT presentation

Number of Views:169
Avg rating:3.0/5.0
Slides: 46
Provided by: wweymanwh
Category:

less

Transcript and Presenter's Notes

Title: Physical Database Design


1
Physical Database Design
  • Chapter 7
  • MIS 2403
  • Dr. Segall
  • Fall 2001

2
Introduction
  • The purpose of physical database design is to
    translate the logical description of data into
    the technical specifications for storing and
    retrieving data
  • The goal is to create a design for storing data
    that will provide adequate performance, and
    insure database integrity, security, and
    recoverability

3
Inputs to Physical Design
  • Normalized relations
  • Attribute definitions
  • Estimations of data processing volume
  • Descriptions of where and when data are entered,
    retrieved, deleted, and updated
  • Response time expectations/requirements
  • Requirements for data security, backup, recovery,
    retention, and integrity
  • Characteristics of the DBMS to be used

4
Data Volume andUsage Analysis
  • Often the first step in the physical design
    process
  • Relative numbers are what are most important(to
    identify areas requiring attention to
    performance)
  • Figures may be recorded and displayed using an
    annotated entity-relationship diagram (Fig. 7-1)
  • Data volumes reflect number of records in tables
  • Access frequencies reflect number of table record
    accesses per unit of time
  • Note what attributes are used in table
    accesses(to aid design of table indexes)
  • Note of creations/retrievals/updates/deletions
    (to aid selection of file organization technique)

5
Physical Design Decisions
  • Specify the data type for each attribute from the
    logical data model
  • Specify physical records by grouping attributes
    from the logical data model
  • Specify the file organization technique to use
    for physical storage of data records
  • Specify indexes to optimize data retrieval
  • Specify query optimization strategies

6
Designing Fields
  • Choosing the field data type
  • Select from available types such as text, memo,
    number, date/time, currency, etc.
  • Seek to
  • Minimize storage space
  • e.g., Integer vs. Floating Point
  • Represent all possible values
  • e.g., Floating Point vs. Integer
  • Improve data integrity (more on next slide)
  • e.g., Yes/No
  • Support all data manipulations
  • e.g., Date/Time

7
Designing Fields
  • Controlling data integrity
  • Default value
  • e.g., value FL for State field
  • Range control
  • e.g., value lt100 for Test_Score field
  • Null value control
  • e.g., prohibit leaving Date_of_Birth field blank
  • Referential integrity
  • e.g., restrict valid values for Part_No field in
    Order table to the contents of this field in the
    Part table

8
Designing Fields
  • Coding
  • e.g., J(onesboro), B(eebe), M(outain Home) for
    the ASU branch campuses
  • implement by creating a look-up table
  • there is a trade-off in that you must create and
    store a second table and you must access this
    table to look up the code value
  • consider using when a field has a limited number
    of possible values, each of which occupies a
    relatively large amount of space, and the number
    of records is large and/or the number of record
    accesses is small.

(see next slide also)
9
Example Code Look-up Table (Figure 7-2)
10
Designing Fields
  • Fixed-Length Fields
  • Make it easy to locate a specific record in a
    file and/or a specific field in that record
  • Each field has its maximum length specified and
    unused space in any given field is padded with
    spaces (text) or leading zeros (numeric)
  • Variable-Length Fields
  • When the need arises for a variable-length field
    (e.g., a memo field), this field can be stored
    separate from the rest of the record with a
    pointer used to locate it when needed

11
Physical Records
  • Physical Record A group of fields stored in
    adjacent memory locations and retrieved together
    as a unit.
  • Page The amount of data read or written in one
    secondary memory (disk) input or output
    operation.
  • Blocking Factor The number of physical records
    per page.

12
Database Access Model
The goal in structuring physical records is to
minimize performance bottlenecks resulting from
disk accesses (accessing data from disk is slow
compared to main memory)
13
Optimization Decisions
  • Denormalization
  • Partitioning
  • Selection of File Organization
  • Clustering Files
  • Creation of Indexes
  • RAID
  • Other

14
Denormalization
  • The process of transforming normalized relations
    into unnormalized physical record specifications
    for the purpose of improv-ing overall database
    performance.

15
Denormalization
  • Involves a trade-off
  • Reduced disk accesses and greater performance
    (due, for example, to fewer table joins)
  • - But -
  • Introduction of anomalies (and thus redundancies)
    that will necessitate extra data maintenance

16
Partitioning
  • Horizontal Partitioning Distributing the rows
    of a table into two or more separate files
  • e.g., Customer table is partitioned into four
    separate files, one for each geographical region
  • Vertical Partitioning Distributing the columns
    of a table into two or more separate files
  • e.g., Employee table is partitioned into public
    file (name, office, extension, etc.) and private
    file (salary, health history, etc.)
  • Note the primary key is repeated in each file

17
Partitioning
  • Advantages of Partitioning
  • Records used together are grouped together
  • Each partition can be optimized for performance
  • Security and recovery
  • Partitions stored on different disks less
    contention
  • Parallel processing capability
  • Disadvantages of Partitioning
  • Slower retrievals when across partitions
  • Complexity for application programmers
  • Anomalies and extra storage space requirements
    due to duplication of data across partitions

18
Physical Files
  • Physical File A file as stored on disk
  • Constructs to link two pieces of data
  • Sequential storage
  • Pointers
  • File Organization How the files are arranged on
    the disk (more on this later)
  • Access Method How the data can be retrieved
    based on the file organization
  • Relative - data accessed as an offset from the
    most recently referenced point in secondary
    memory
  • Direct - data accessed as a result of a
    calculation to generate the beginning address of
    a record

19
File Organizations
  • A technique for physically arranging the records
    of a file on secondary storage devices.
  • Goals in selecting (trade-offs exist, of
    course)
  • Fast data retrieval
  • High throughput for input and maintenance
  • Efficient use of storage space
  • Protection from failures or data loss
  • Minimal need for reorganization
  • Accommodation for growth
  • Security from unauthorized use

20
File Organizations
  • Sequential
  • Indexed
  • Indexed Sequential
  • Indexed Nonsequential
  • Hashed (also called Direct)
  • See Table 7-3 for comparison

21
Sequential File Organization
  • Records of the file are stored in sequence by the
    primary key field values

22
Comparisons of file organizations (a) Sequential
23
Sequential Retrieval
  • Consider a file of 10,000 records each occupying
    1 page
  • Queries that require processing all records will
    require 10,000 accesses
  • e.g., Find all items of type 'E'
  • Many disk accesses are wasted if few records meet
    the condition
  • However, very effective if most or all records
    will be accessed (e.g., payroll)

24
Indexed File Organization
  • Index concept is like index in a book
  • Indexed-sequential file organization The
    records are stored sequentially by primary key
    values and there is an index built on the primary
    key field (and possibly indexes built on other
    fields, also)

25
(b) Indexed
26
Hashed File Organization
  • Hashing Algorithm Converts a primary key value
    into a record address
  • Division-remainder method is common hashing
    algorithm(more to come on this)

27
(c ) Hashed
28
Hashing
  • A technique for reducing disk accesses for direct
    access
  • Avoids an index
  • Number of accesses per record can be close to one
  • The hash field is converted to a hash address by
    a hash function

29
Hashing
30
Shortcomings of Hashing
  • Different hash fields may convert to the same
    hash address
  • these are called Synonyms
  • store the colliding record in an overflow area
  • Long synonym chains degrade performance
  • There can be only one hash field per record
  • The file can no longer be processed sequentially

31
Clustering
  • In some relational DBMS, related records from
    different tables that are often retrieved
    together can be stored close together on disk
  • Since the related records are stored close to one
    another on the physical disk, less time is needed
    to retrieve the data
  • E.g., Customer data and Order data may frequently
    be retrieved together
  • Can require substantial maintenance if the
    clustered data changes frequently

32
Indexing
  • An index is a table file that is used to
    determine the location of rows in another file
    that satisfy some condition

33
Querying with an Index
  • Read the index into memory
  • Search the index to find records meeting the
    condition
  • Access only those records containing required
    data
  • Disk accesses are substantially reduced when the
    query involves few records

34
Maintaining an Index
  • Adding a record requires at least two disk
    accesses
  • Update the file
  • Update the index
  • Trade-off
  • Faster queries
  • Slower maintenance (additions, deletions, and
    updates of records)
  • Thus, more static databases benefit more overall

35
Rules of Thumbfor Using Indexes
  • 1. Indexes are most useful on larger tables
  • 2. Index the primary key of each table(may be
    automatic, as in Access)
  • 3. Indexes are useful on search fields (WHERE)
  • 4. Indexes are also useful on fields used for
    sorting (ORDER BY) and categorizing (GROUP BY)
  • 5. Most useful to index on a field when there are
    many different values for that field

36
Rules of Thumbfor Using Indexes
  • 6. Find out the limits placed on indexing by your
    DBMS (Access allows 32 indexes per table, and no
    index may contain more than 10 fields)
  • 7. Depending on the DBMS, null values may not be
    referenced from an index (thus, rows with a null
    value in the field that is indexed may not be
    found by a search using the index)

37
RAID
  • Redundant Arrays of Inexpensive Disks
  • Exploits economies of scale of disk manufacturing
    for the computer market
  • Can give greater security
  • Increases fault tolerance of systems
  • Not a replacement for regular backup

38
RAID
  • The operating system sees a set of physical
    drives as one logical drive
  • Data are distributed across physical drives
  • All levels, except 0, have data redundancy or
    error-correction features
  • Parity codes or redundant data are used for data
    recovery

39
Mirroring
  • Write
  • Identical copies of file are written to each
    drive in array
  • Read
  • Alternate pages are read simultaneously from each
    drive
  • Pages put together in memory
  • Access time is reduced by approximately the
    number of disks in the array
  • Read error
  • Read required page from another drive
  • Tradeoffs
  • Provides data security
  • Reduces access time
  • Uses more disk space

40
Mirroring
Complete Data Set
Complete Data Set
No parity
41
Striping
  • Three drive model
  • Write
  • Half of file to first drive
  • Half of file to second drive
  • Parity bit to third drive
  • Read
  • Portions from each drive are put together in
    memory
  • Read error
  • Lost bits are reconstructed from third drives
    parity data
  • Tradeoffs
  • Provides data security
  • Uses less storage space than mirroring
  • Not as fast as mirroring

42
Striping
One-Half Data Set
One-Half Data Set
Parity Codes
43
One Other Rule of Thumbfor Increasing Performance
  • Consider contriving a shorter field or selecting
    another candidate key to substitute for a long,
    multi-field primary key (and all associated
    foreign keys)

44
Database Architectures
  • See pages 282-285
  • Hierarchical
  • Network
  • Relational
  • Object-oriented
  • Multidimensional

45
Physical Database Design
  • Chapter 7
  • THE END!!!
Write a Comment
User Comments (0)
About PowerShow.com