Class 4: Data Relatives

About This Presentation

Title:

Class 4: Data Relatives

Description:

Before a detailed examination of Relational Databases and relationship ... Can help eliminate homonyms and synonyms. A system catalog can be generated by some DMBS. ... – PowerPoint PPT presentation

Number of Views:44

Avg rating:3.0/5.0

Slides: 124

Provided by: glenndo

Category:

more less

Transcript and Presenter's Notes

Title: Class 4: Data Relatives

1
Class 4Data Relatives
2
Overview of Class

Review of Current Knowledge
Review of Concepts for Project
Introduction to Relational Databases
Introduction to Mapping and Performance Tuning

3
Motivation for Studying Data Relatives

To provide an appreciation for how Relational
Databases have influenced database design
To understand how relationship assignment or
mapping is used in database design

4
Understanding Needed

Before a detailed examination of Relational
Databases and relationship assignment is done, it
is useful to review our acquired knowledge of
databases thus far.
Moreover, a summary review will help in the
evolution of the class project.
Accordingly, we begin with a summary of major
concepts covered, along with a general
introduction to relational databases.

5
Review of Key Database Concepts

The following concepts will be reviewed, to help
us get a solid footing on our progress so far,
facilitate development of the class project, and
ease our introduction to relational databases
Database Models
Relationships, Types
Models Hierarchical, Network, Relational
(introduction)

6
Further Database Concepts

Additional concepts also be reviewed
Tables
Keys
Operators
Data Dictionaries
E-R Modeling
Cardinality
Dependence

7
Database Models

A database model is a set of logical constructs
Database models represent a data stricture and
relationships within the database
Two phases of database models are
Conceptual
Implementation

8
Conceptual Model

Conceptual models focus on the logical nature of
data
Conceptual models are concerned with what is
represented in the database
There are three representations of a conceptual
model, each of which describes a relationship
between entities

9
One to Many Relationship

The first representation is the one to many
(labeled 1M)

10
Many to Many Relationship

The second representation is the one to many
(labeled MN)

11
One to One Relationship

The third representation is the one to many
(labeled 11)

12
Implementation Model

The other major type of database model is the
Implementation Model
Implementation models emphasize how the data will
be represented.
The implementation model illustrates what is
modeled.

13
Database Architectures

Database architectures that can be easily
represented with an implementation model, include
Hierarchical Model
Network Model
Relational Model

14
Hierarchical Model

The hierarchical model was developed in 1969.
The fundamental characteristic of a hierarchical
model is the one way, top down, parent child
relationship
The hierarchical model can have 1M relationship,
where one parent can have many children, but one
child can have only one parent.

15
Hierarchical Example

Model Illustration

16
Network Model

The network model was developed in 1971 to
address inconsistencies in the hierarchical
model.
The network model can have a MN relationship,
with many children per parent, and more than one
parent possible for each child.

17
Network Model Components

Schema
conceptual view of the database from the DBAs
perspective
Subschema
view of the database from the application
software
DML (Data Manipulation Language)
Defines the data characteristics
Defines the data structure (record layout)

18
DML Components

Schema DDL (Data Definition Language)
Used to define schema
Sets up record types
Sets up attributes
Subschema DDL
Used by the development language to define
database components
DML
Used to manipulate database contents

19
Network Structure

Two record types
owner (parent)
member (child)

20
Network Example

Model Illustration

21
Relational Model

Developed in 1970 by E.F. Codd
The relational model supported the development of
a Relational Database Management System (RDBMS)
The multidimensional model developed by E.F. Codd
preceded by many years the technology required to
support the design.

22
Relational Definitions

Table
contain data
is a logical construct
Relation
is established by a common characteristic
characteristic must exist in each table
characteristic must be identical
usually called a primary key

23
Relational Database Model

Three major components of the relational database
model include
Entities
Attributes
Entity Sets
A basic data set in Codds theory is the table.
A table contains related entities. It can also
be called a relation.

24
Table Characteristics

It is a logical construct
Tables consist of tuples and rows
Each tuple and attribute intersection contains a
single value
Type of attributes support include
numeric
character
date
logical

25
More Table Characteristics

Each tuple must have a primary key
Each attribute must have a unique name
An attributes range of values is known as its
domain

26
Table Details

Tables consist of columns and rows
Tuples are
the rows in a table
represent an entity
Entity attribute
is represented by the column
is consistent across the entities

27
Naming Conventions for Tables

Tables should
be 8 characters long
have meaningful names
Columns should
be 10 characters long
start with an alpha
be descriptive

28
Types of Keys

Superkey
is an attribute that uniquely identifies an
entity
Candidate key
is a potential key

29
Other Types of Keys

Primary key
is an attribute that uniquely identifies an
entity
is used to access data
Secondary Key
is also called an Alternate Key
is used to provide alternate access path
could be used to create a primary key

30
Other Types of Keys

Foreign Key
is an attribute used to provide a link to another
table
Composite Key
contains more than one attribute
also known as a Concatenated Key

31
Relational Operators

Select
selects tuples (records) based on a specified
value for an attribute
provides a horizontal subset of a table
Project
provides a list of all values for an attribute
(field), and vertical subset of a table
Join
combines data from two or more tables

32
Relational Operators

Intersect
produces a listing that contains all tuples that
are common to both tables
attribute characteristics must be identical
Union
combines all tuples from two tables
attribute characteristics must be identical
Difference
produces all tuples in one table that are not
contained in another table

33
Relational Operators

Product
produces a list of all possible pairs from two
tables
I.E. Table 1 100 Tuples
I.E. Table 2 5 Tuples
Divide
requires the use of one single-column table and
one two-column table

34
Relational Database Classification

Fully Relational
supports all eight relational algebra-functions
enforces entity and referential integrity rules
Relational Complete
supports all eight relational algebra-functions
does not support the integrity rules

35
Relational Database Classification

Minimally Relational
Supports Select, Project, Join
Tabular
Supports ONLY Select, Project, Join
Requires all access paths be defined by the DBA

36
Data Dictionary

Used to provide a detailed description of all
tables and attributes found in the database
Can help eliminate homonyms and synonyms
A system catalog can be generated by some DMBS.

37
E-R Model

Two types of ER Models
Global E-R Model
Application (local) E-R Model
Is used to develop a common vision of the
requirements
Can define processing and constraint requirements
Can be used by the DBA to implement the database

38
E-R Components

Entities
represented by a rectangle

39
E-R Components

Attributes
represented by an oval with a line to the entity
Single value attribute represented by a line
Multi valued attribute represented by a double
line
Derived attribute represented by a -- line

40
E-R Components

Relationship
An association between entities
Types of relationships include
Optional
Mandatory
Degree
The number of associated entities

41
E-R Model

The E-R model is a simplified picture of the
relationship
between entities
between entities and the attributes

42
Relationships Weve Seen

11 Relationship
1M Relationship
MN Relationship

43
Unary Relationship
44
Binary Relationship
45
Ternary Relationship
46
Index

An index is composed of an index key and a series
of pointers
An index key is a reference point

47
Data Model

A data model is a diagram of real world data
structures
The modeling process takes us through a number of
phases
Conceptual Data Models
Internal Data Models
External Data Models
Physical Data Models

48
Conceptual Data Model

Represents the global or enterprise view of data
Identifies and describes data objects
Can be thought of as a database blueprint
Is hardware independent
Is software independent

49
Internal Data Model

Adapts the conceptual model to the DBMS selected
by the analyst
Is hardware independent
Is software dependent
changes in software will change database
requirements and characteristics
This is a critical model for developers of a
hierarchical or network DBMS

50
External Data Model

Is software development specialists view of the
internal model
Is hardware independent
is DBMS dependent
Once the DBMS is selected, the external model can
be developed

51
Physical Model

Describes the way the data will be stored
describes the tables
describes access paths
Is hardware dependent
The model can be dependent on the type of storage
media used
Is software dependent
Is DBMS dependent

52
Cardinality

Cardinality is referred to in relationships as
recording the number of entities occurrences.
Cardinality records the potential upper and lower
limits of the entity occurrences

53
Cardinality Example
54
Existence Dependency

This occurs when the existence of one entity
depends solely on the existence of another entity
For example, the existence of a child depends on
whether or not a parent exists.

55
Subtypes and Supertypes

Within an entity, it is possible to break down
and further categorize that entity into other
sub-entities.
We refer to the all encompassing entity and the
further categorized sub entities and Supertypes
and Subtypes, respectively.
An example of a supertype is a parent, subtypes
could be a mother, father, or stepparent.

56
Summary Review

Thus far, we have reviewed fundamentals of
database design, including
Database Models, Relationships, Types
Database Types
Hierarchical, Network, Relational Databases
Other Database Concepts
Tables, Keys, Operators, Data Dictionaries
E-R Modeling, Cardinality, Dependence

57
Introduction to Data Relatives

At this point, we are familiar with fundamental
design concepts required to understand the
development of a database.
This prepares us for further exploration of the
technology that has become increasingly dominant
in database architecture, the relational database.

58
Introduction to Relational Databases
59
Relational Databases

Relational Databases, as weve seen, owe their
development to the paper by E.F. Codd
A fundamental requirement of relational databases
is that they use the Structure Query Language for
Data Manipulation, Data Definition and Data
Control Languages.

60
The Need for Relational Databases

As older, larger systems evolved, they became
very cumbersome and resistant to database
updates, mostly because of the many programs,
written for a particular database, that would
have to be changed along with them.
The workaround solution was to simply add more
databases

61
Problems with Adding Databases

As new databases were continually added, all
kinds of data redundancy crept into information
systems, increasing the likelihood of errors.
Furthermore, large systems became increasingly
difficult, and correspondingly expensive, to
maintain.

62
The Relational Database Solution

By implementing relational databases,
administrators were able to add to existing
databases, while maintaining data integrity, and
not needing to redesign programs which access the
data.

63
Relational Database Benefits

Relational Databases serve a number of purposes
including
provides greater structure and analytical
capabilities for measuring data processing
performance
aids in the design of databases by providing a
design model
increases data storage and access efficiency for
widespread applications

64
Relational Database Theory

Database theory concerns itself with three areas
of data management
Data Structure
Data Integrity
Data Manipulation

65
Data Management - Structure

Data is viewed as a data element, known as atomic
(not possible to break down further)
Data elements can be contained in a domain (range
of valid values)
Data in a domain can be related with operators
(mathematical comparisons)
Data element types are attributes
Related data attributes comprise a relation. One
occurrence is a tuple.

66
Data Management - Integrity

A base table represents the stored relations of
data, and use these integrity rules
Each relation has a series of candidate keys
(attributes that ID tuples)
Each relation has one candidate key which serves
as primary (unique) key
Any attribute of a relation can be a foreign key
(primary key of another table)
Each relation can have many alternate keys (non
primary candidate keys)

67
Data Management - Manipulation

Once tables representing relations are created,
data must be accessed using operators.
A data access language used to run the operations
using operators is Structured Query Language
(SQL).

68
SQL

SQL is divided into
Data Definition Language
concerned with the creation of tables
Data Control Language
concerned with security and data access control
Data Manipulation Language
concerned with storing, modifying, and retrieving
data
SQL is executed by simple commands

69
Key SQL Commands

Queries begin with SELECT, and must come FROM
somewhere
WHERE adds conditions to refine search
To relate two or more tables, use the JOIN command

70
Specific Advantages of Relational Databases

Facilitates adhoc, dynamic information queries
Generally user friendly, consistent interface for
data access
No navigation is needed to search data, simply
select relationships for analysis
Suited for distributed systems due to data access
independence.

71
Examples in Text

DB2
Highly popular system, runs primarily on
mainframe platforms.
Very robust design, handles high rates of
transaction processing.
Informix
Geared to UNIX operating systems
Allows for manipulation of graphical objects in a
database environment.

72
Relational Database Review

Data Description
Tables, tuples, attributes
Organization
Collections of autonomous files, logically
related via common field values
Data Constructs
Relational Data Definition Language

73
Relational Database Navigational Capabilities

Search Capability
Simple and complex search
Cross Reference
via the SQL JOIN
Mapping
Dynamic only
Processing
Set at a time

74
Relational Databases Summary

The dramatic improvements in system technology to
support distributed, client server architecture,
vastly greater storage, memory and processing
capabilities, have made the arrival of relational
databases on complex networks a reality.
Combined with the inherent advantages of
relational design, it is clear why this
technology has become so prominent.

75
Introduction to Mapping and Performance Tuning
76
Introduction to Mapping and Performance Tuning

Remember that mapping refers to relationship
assignments and performance tuning relates to the
physical stage of database implementation.
These are critical elements in the design process
which we will break into 3 phases
Conceptual Design
Construct Design
Physical Design

77
Design Phases

Conceptual Design
Preliminary Design Phase, an abstract
representation of a real system
Construct Design
Defining base constructs and mapping their
relationships
Physical Design
Take results of construct design to do a real
database implementation

78
CONCEPTUAL DESIGN

During this phase, we are concerned with
understanding reality of an information system
from the perspective of the users.
We will focus on
Input - user interviews
Output - data models
Other Uses for Conceptual Design

79
Conceptual Design - Inputs

Information gathered to prepare a conceptual
design is derived through preliminary
investigation and fact finding. This includes
accumulation on inputs.

80
Conceptual Design - Input Gathering

Input needed for conceptual design is primarily
accumulated through user interviews. This
includes
Informal Interviews discussions with users and
participants
JAD Sessions group represented from different
functional areas affected by a system
Surveys questionnaires designed to solicit
feedback on existing systems

81
Conceptual Design - Output

The results of these inputs are building blocks
for completing the conceptual design. This is
needed for the data models which represent that
design. Methods we have seen are
Informal conceptual design
Entity / Relationship Modeling
Semantic Data Modeling
Object Oriented Modeling

82
Conceptual Design - Other Applications

Recall that conceptual design can be utilized for
a number of other purposes in addition to basic
design. This included
Data Administration
The use of CASE tools in automating design
Data Repositories

83
CONSTRUCT DESIGN

During this phase, we define the base constructs,
called base construct assignment, and map the
relationships between them, called relationship
assignment.
Construct design is largely concerned with the
architecture profile.

84
The Construct Design Process

To evaluate this process, we will look at
Inputs Needed for Construct Design
The Steps in the Construct Design Process
Outputs from Construct Design

85
Construct Design - Inputs

Input required to progress from the conceptual
stage to the construct stage is primarily one of
documentation. This includes
Conceptual design documents those documents
generated from the conceptual design stage
Database architecture information parameters
about the architecture used

86
Construct Design Process

The steps included in construct design are
Transforming models into constructs
Definition of base constructs (assignments)
Definition of mapping relationships (logical or
physical)
Normalization (from atomization to 0th, 1st, 2nd,
and 3rd normal form)

87
Transforming models into constructs

The first step is to turn the logically defind
entities and objects into base construct
definitions.
The second step is to determine how the
relationships that the model shows are going to
be capture within the structure of the database

88
Definition of base constructs (assignments)

Definition of constructs include the following
Names
Elements
Primary Keys

89
Definition of mapping relationships

Defining mappings can be done as either logical
or physical mappings.
Logical mappings are done for the xbase, inverted
list, and relational database architectures.
Physical mappings are done for the hierarchical
and network architectures.

90
Logical Mappings

Examples of logical mappings
1m Foreign key assignment
Mn Bridge constructs
We use this approach for our project

91
Physical Mappings

Examples of physical mappings
Hierarchy diagrams
Network diagrams

92
Normalization

Normalization is a formal approach to applying a
set of rules used in associating attributes with
entities. The set of rules is designed to help
the designer convert the logical entities into a
conceptual model for the database design.

93
Normalization Process

During normalization, the structure of the
logical model that may translate into undesirable
properties in the physical model is examined and
resolved.

94
Why Normalization?

The normalization process is used to ensure that
the conceptual model of the database will work.
An un-normalized model can be implemented, but it
will present problems in application development
and data manipulation operations.

95
Benefits of Normalization

A normalized model is more flexible and better
able to meet a wide range of end-user application
requirements with minimal structural change. New
applications are less likely to force a database
design change.

96
Benefits of Normalization

Normalization reduces redundant data, minimizing
the amount of disk storage required to store the
data and making it easier to maintain accurate
and consistent data.

97
Benefits of Normalization

A simple and logical design can result in
increased programmer productivity. Normalization
reduces the maintenance costs for an application
because changes to the application are easier.

98
Summary Benefits

Greater flexibility
Ensures attributes are placed in proper tables
Reduces data redundancy
Increases programmer productivity
Decreases application maintenance costs
Maximizes stability of the data model

99
De-Normalizing

Although there are benefits, do not
over-normalize. A normalized data model might not
meet all design objectives.
You can use selective de-normalization in some
areas of the data model to increase performance.

100
Rules

Normalization includes several rules for
converting the logical design to a more effective
physical design. These rules are referred to as
the normal forms.

101
Normal Forms

There are several normal forms used to organize
data. The first three forms
first normal form (1NF),
second normal form (2NF), and
third normal form (3NF)
these are the most commonly used.

102
Normalization in Order

Each normal form constrains the data to be more
organized than the previous form.
For this reason, each normal form must be
achieved in turn before the next normal form can
be applied.

103
First Normal Form

An entity relationship is in first normal form if
there are no repeating groups (domains).
Each entity must have a fixed number of
single-valued attributes.
A table that is not in first normal form is less
flexible and can also waste disk storage space.
It can also make data searches more difficult.

104
1NF Process

To put this order entity into first normal form,
separate the entity into two.
The first entity removes the repeating groups.
The second entity has a single copy of the
attribute group, together with a new primary key
and a foreign key reference to the first entity.
Note Example

105
1NF Process

Notice that the restriction on the number of
order items was removed. Only items that are
ordered now use disk storage space. Also,
searching for a particular item on order requires
a search on only one attribute.

106
Second Normal Form

An entity relationship is in second normal form
if it is in first normal form and all its
attributes depend on the whole primary key.
Remember that the primary key is a minimal set of
attributes that uniquely identify an entity
instance.

107
Second Normal Form

Second normal form requires that every attribute
must be fully functionally dependent on the
primary key.
Functional dependence means there is a link
between the values in the different attributes.

108
Third Normal Form

An entity relationship is in third normal form
and
if it is in second normal form and all its
attributes depend on the whole primary key and
nothing but the primary key.

109
Transitive Dependence

This eliminates attributes that not only depend
on the whole primary key but also on other
non-key attributes, which in turn depend on the
whole primary key.
This is known as transitive (indirect) dependence.

110
Outputs

Output in the construct design phase involves the
production of a construct layout document.
Diagrams which can be used to achieve this are
Studer Diagrams
Hierarchy Diagrams
Network Diagrams

111
PHYSICAL DESIGN

During this phase, we carry out the actual
implementation of the design.
This also is refereed to as construct tuning
We are primarily concerned with the storage
profile.
We do not concern ourselves with this stage in
the project, but we need to be aware of it.

112
Physical Design - Inputs

Documentation needed to follow from the construct
design phase to the physical design phase
include
Construct Layouts
Database Product Information
Performance Statistics

113
Inputs

Construct Layouts are a deliverable from the
construct phase and are needed to determine how
the structure will drive storage requirements
Database Product Information will also include
information on recommended storage requirements,
minimum specifications, recommended processing
capacity, and so forth.

114
Performance Statistics

Existing performance statistics are critical to
optimize the database design to the current
hardware and software environment. These
statistics include
Complexity
Volume
Transactions

115
Analysis of Statistics

Complexity largely referring to the number of
constructs and cross construct capability
Volume derived by multiplying number of records
per construct, their widths, block size, and
number of blocks, then adding the product for all
the constructs
Transactions calculations based on transactions
per given unit of time to determine access rates.
(see p 489)

116
Physical Design - Processes

After gather the necessary information or input,
a series of processes are required to implement
the physical design. These are
General configuration
Direct access optimization
Scan access optimization
Nonkey search access optimization

117
Continuing Processes in Physical Design

Further processes include
Cross-construct optimization
Grouping constructs
Overall system optimization

118
Physical Design Validation

After completing the implementation of the
physical database design, it is necessary to
verify and validate the success of this design.
A series of tests and measures can be taken to
achieve this, including
benchmarking
volume testing
pressure testing
prototyping

119
Benchmarking

This involves general testing under normal
conditions to see what performance statistics can
be reasonably expected in the future
This is useful to see if a system is performing
in a less than optimal way.

120
Volume Testing

Exceeding the maximum designed specification for
volume is a kind of stress test which tells the
designer if the system can handle the job.

121
Pressure Testing

Similar to volume testing, though pressure
testing is concerned with high transaction rates,
rather than large amounts of data over time.
Both pressure and volume testing can reveal
shortcomings in design before they become a
problem.

122
Prototyping

This is merely the development of a small scale
version of an entire system. This is often done
before a full system roll out, and can even be
initiated to test the concept of a system before
deciding to implement it.

123
Mapping and Performance Tuning Summary

Clearly, a rigorous process is required to
implement a database design successfully.
The steps required in the conceptual, construct,
and physical stages reflect this complexity, and
underline the need to take great care before
declaring the implementation a success.

Write a Comment

User Comments (0)