Database Design: Normalization - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

Database Design: Normalization

Description:

How tables that contain redundant data can suffer from anomalies, which can ... A, B and C are attributes of a relation such that if A B and B C, ... – PowerPoint PPT presentation

Number of Views:133
Avg rating:3.0/5.0
Slides: 41
Provided by: srip1
Category:

less

Transcript and Presenter's Notes

Title: Database Design: Normalization


1
Database Design Normalization
  • Reading CB, Chaps 13

2
In this lecture you will learn
  • Mathematical notions behind relational model
  • Anomalies
  • How tables that contain redundant data can suffer
    from anomalies, which can introduce
    inconsistencies into a database.
  • Normalization
  • The rules associated with the most commonly used
    normal forms, namely first (1NF), second (2NF),
    and third (3NF).

3
Introduction
  • Relations derived from ER model may be faulty
  • May cause data redundancy
  • May cause insert/delete/update anomalies
  • We use some mathematical properties of relations
    to
  • locate these faults
  • fix them

4
Mathematical notions behind relational model
  • Set a collection of objects characterized by
    some defining property
  • E.g. a column in a database table such as last
    names of all staff
  • Cross Product of sets one of the operations (X)
    on sets
  • E.g. consider two sets, set of all first names
    and set of all last names in the staff table
  • fName Mary, David
  • lName Howe, Ford
  • fNameXlName (Mary,Howe), (Mary,Ford), (David,
    Howe), (David, Ford)
  • Relation defined between two sets and is a
    subset of cross product between those two sets
  • E.g. FirstNameOf (Mary, Howe), (David, Ford)

5
Relational model
  • The name relational model comes from this
    mathematical notion of relation
  • Where a relation is a set (collection) of tuples
    that have related objects such as first name and
    last name of the same person
  • E.g. (fName, lName) is a relation
  • We can have relations over any number of sets
  • E.g. (staffNo, fName, lName, position)
  • In general we can denote a relation as
    (A,B,C,D,.,Z) where A, B, C and Z are all its
    attribute sets

6
Function
  • A function is a special kind of relation
  • In a relation (X,Y), if every value of X is
    associated with exactly one value of Y, then we
    say Y is a function of X.
  • E.g. the relation (1,2),(2,4),(3,6),(4,8) is a
    function, Y 2X for 0ltXlt5

Y
X
Only one arrow can start from any single value in
X
1 2 3 4
2 4 6 8
7
Functional Dependency
  • If Y is a function of X
  • Y is dependent on X
  • there is a relationship of functional dependency
    between Y and X
  • If we know the value of X, we find only one value
    of Y in all the records that has this value of X,
    at any moment in time.
  • In databases, we work with relations in general
    form (A,B,C,D,,Z)

8
Functional Dependency
  • Functional Dependency
  • Describes relationship between
  • attributes in a relation
  • or columns in a table.
  • If A and B are columns of table R
  • B is functionally dependent on A
  • if each value of A in R is associated with
    exactly one value of B in R.
  • We are interested in finding such functional
    dependencies among database relations

9
Functional Dependency
  • Functional dependency is a property of the
    meaning of the attributes in a relation (or the
    columns in a table).
  • Diagrammatic representation

10
Functional Dependency
  • Determinant of a functional dependency refers to
    attribute or group of attributes on left-hand
    side of the arrow.
  • If the determinant can maintain the functional
    dependency with a minimum number of attributes,
    then we call it full functional dependency

11
Data Redundancy
  • Major aims of relational database design
  • to group columns into tables to minimize data
    redundancy
  • to reduce file storage space required by base
    relations.

12
Data Redundancy
  • Tables that contain redundant information may
    potentially suffer from update anomalies
  • Insert anomalies
  • Delete anomalies
  • Update anomalies
  • We illustrate these anomalies with an example.

13
Data redundancy and anomalies

Staff_Branch
14
Data redundancy and update anomalies
  • StaffBranch table has redundant data
  • details of a branch are repeated for every member
    of staff.
  • In contrast, consider Branch and Staff tables
  • branch information appears only once for each
    branch in the Branch table
  • only the branch number (branchNo) is repeated in
    the Staff table, to represent where each member
    of staff is located.

15
Data redundancy and anomalies
Staff
Branch
16
Insert Anomalies
  • Try to insert details for a new member of staff
    into StaffBranch.
  • You also need to insert branch details that are
    consistent with existing details for the same
    branch.
  • Hard to maintain data consistency with StaffBranch

17
Delete Anomalies
  • Try to delete details for a member of staff from
    StaffBranch.
  • You also lose branch details in that row (tuple).

18
Update Anomalies
  • Try to update the value of one of the attributes
    of a branch.
  • You also need to update that information in all
    the rows about the same branch.

19
Decomposition of Relations
  • Staff and Branch relations which are obtained by
    decomposing StaffBranch do not suffer from these
    anomalies.
  • Two important properties of decomposition
  • Lossless-join property enables us to find any
    instance of original relation from corresponding
    instances in the smaller relations.
  • Dependency preservation property enables us to
    enforce a constraint on original relation by
    enforcing some constraint on each of the smaller
    relations.

20
Normalization
  • A technique for producing a set of tables with
    desirable properties that support the
    requirements of a user or company.

21
The Process of Normalization
  • Formal technique for analyzing a relation based
    on
  • its primary key
  • functional dependencies between attributes.
  • Often executed as a series of steps.
  • Each step corresponds to a specific normal form,
    which has known properties.
  • As normalization proceeds, relations become
  • progressively more restricted in format
  • less vulnerable to update anomalies.

22
The Process of Normalization
  • Given a relation, use the following cycle
  • Find out what normal form it is in
  • Transform the relation to the next higher form by
    decomposing it to form simpler relations
  • You may need to refine the relation further if
    decomposition resulted in undesirable properties

23
Normalization Flow
UNF
Remove repeating groups
1NF
Remove partial dependencies
2NF
Remove transitive dependencies
3NF
More normalized forms
24
Unnormalized Form (UNF)
  • A table that contains one or more repeating
    groups.
  • To create an unnormalized table
  • transform data from information source (e.g.
    form) into table format with columns and rows.

Example 1 address and name fields are composite
25
Another example of UNF
Example 2 repeating columns for each client
composite name field
26
UNF to 1NF
  • Remove repeating group by
  • entering appropriate data into the empty columns
    of rows containing repeating data (flattening
    the table).
  • Or by
  • placing repeating data along with copy of the
    original key attribute(s) into a separate table.

27
First Normal Form (1NF)
  • A relation in which intersection of each row and
    column contains one and only one value.
  • Steps from UNF to 1NF
  • Nominate an attribute or group of attributes to
    act as the key for the unnormalized table.
  • Identify repeating group(s) in unnormalized table
    which repeats for the key attribute(s).

28
Branch table in UNF
29
Converting Branch table to 1NF
30
Second Normal Form (2NF)
  • Based on concept of full functional dependency
  • A and B are attributes of a relation R,
  • B is fully dependent on A (denoted A-gtB) if B is
    functionally dependent on A but not on any proper
    subset of A.
  • 2NF only applies to tables with composite primary
    keys.
  • 2NF is
  • A table that is in 1NF and
  • In which the values of each non-primary-key
    column can be worked out from the values in all
    the columns that make up the primary key.

31
1NF to 2NF
  • Steps
  • Identify primary key for the 1NF relation.
  • Identify functional dependencies in the relation.
  • If partial dependencies exist on the primary key
    remove them by placing them in a new relation
    along with copy of their determinant.

32
TempStaffAllocation table is not in 2NF
33
Converting TempStaffAllocation table to 2NF
34
Third normal form (3NF)
  • A table that is
  • in 1NF and 2NF and
  • in which all non-primary-key column can be worked
    out from only the primary key column(s) and no
    other columns.

35
Third Normal Form (3NF)
  • Based on concept of transitive dependency
  • A, B and C are attributes of a relation such that
    if A ? B and B ? C,
  • then C is transitively dependent on A via B.
    (Provided that A is not functionally dependent on
    B or C).
  • 3NF
  • A relation that is in 1NF and 2NF and
  • no non-primary-key attribute is transitively
    dependent on the primary key.

36
2NF to 3NF
  • Steps
  • Identify the primary key in the 2NF relation.
  • Identify functional dependencies in the relation.
  • If transitive dependencies exist on the primary
    key, remove them by placing them in a new
    relation along with copy of their determinant.

37
StaffBranch table is not in 3NF
38
Converting the StaffBranch table to 3NF
39
Review Normalization Flow
UNF
Remove repeating groups
1NF
Remove partial dependencies
2NF
Remove transitive dependencies
3NF
More normalized forms
40
Conclusion
  • Quality of the relations derived from ER models
    is unknown.
  • Normalization is a systematic process of either
    assessing or converting these relations into
    progressively stricter normal forms.
  • Advanced normal forms such as Boyce-Codd normal
    form (BNCF), 4NF and 5NF exist.
Write a Comment
User Comments (0)
About PowerShow.com