Normalization 1NF, 2NF, 3NF, BCNF - PowerPoint PPT Presentation

1 / 37

About This Presentation

Title:

Normalization 1NF, 2NF, 3NF, BCNF

Description:

Nonloss Decomposition and Functional Dependencies. First, Second, and Third Normal Forms ... key dependencies promote a miasma of update anomalies difficulties with ... – PowerPoint PPT presentation

Number of Views:5310

Avg rating:3.0/5.0

Slides: 38

Provided by: Richar502

Category:

more less

Transcript and Presenter's Notes

Title: Normalization 1NF, 2NF, 3NF, BCNF

1
Normalization 1NF, 2NF, 3NF, BCNF
2
Outline

Introduction
Nonloss Decomposition and Functional Dependencies
First, Second, and Third Normal Forms
Dependency Preservation
Boyce/Codd Normal Form
A Note on Relation-Valued Attributes

3
Normalization

Normalized and 1 NF are the same thing typically
normalized refers incorrectly to 3NF
Normalization helps control redundancy
Normalization is reversible i.e. nonloss, or
information preserving
Six normal forms are discussed 1 through 5, and
Boyce-Codd Normal Form (BCNF), which is an
improvement on 3NF

4
Levels of Normalization
5
Nonloss Decomposition and Functional Dependencies
6
Overview

Nonloss decomposition the normalization
procedure involves decomposing a given relvar
into other relvars, and moreover that the
decomposition is required to be reversible, so
that no information is lost in the process
Example
Case a is an nonloss decomposition
If you join SST and SC back together again, you
get back to S
Case b is a lossy decomposition
Some additional spurious tuples will appear

7
Sample value for relvar S and two corresponding
decompositions
8
Nonloss Decomposition and Functional Dependencies

Normalization uses a process of projection to
decompose relvars
Re-composition is a process of joins
Heaths theorem Let RA, B, C be a relvar,
where A, B, and C are sets of attributes. If R
satisfies the FD A?B, then R is equal to the join
of its projection on A, B and A, C
The decomposition of relvar R into projections
R1Rn is nonloss if R the join of R1Rn

9
First, Second, and Third Normal Forms

Throughout this section, we assume for simplicity
that each relvar has exactly one candidate key,
which we further assume is the primary key

10
Third Normal Form

A relvar is in 3NF if and only if the nonkey
attributes are both mutually independent and
irreducibly dependent on the primary key
Two or more attributes are mutually independent
if none of them is FD on any combination of the
others. Such independence implies that each
attribute can be updated independently of the
rest
Example the parts relvar P is in 3NF
A relvar is in 3NF if and only if, for all time,
each tuple consists of a primary key value that
identifies some entity, together with a set of
zero or more mutually independent attribute
values that describe that entity in some way

11
First Normal Form

A relvar is in 1NF if and only if in every legal
value of that relvar, every tuple contains
exactly one value for each attribute
In this way, relvars are always in 1NF
A relvar in 1NF may display functional
dependencies other than those emanating from the
primary key
Such non-primary-key dependencies promote a
miasma of update anomalies difficulties with
the update operations INSERT, DELETE, and UPDATE

12
Example

FIRST S, STATUS, CITY, P, QTY PRIMARY KEY
S, P
One more constraint CITY ? STATUS
A suppliers status is determined by the location
of that supplier (e.g. all London suppliers must
have a status of 20)
Non3NF diagram has arrows out of candidate keys
together with certain additional arrows

13
Sample Value for relvar FIRST
Note relvar FIRST violates both conditions a and
b in the preliminary 3NF definition the nonkey
attributes are not all mutually independent, and
they are not all irreducible dependent on the
primary key
14
The Suppliers-and-Parts Database (Sample Values)
S (suppliers) and P (parts) are entities, and SP
(shipment) is a relationship between S and P (a
relationship is a special case of an entity) In
RDB, entities and relationships are also
represented in the same uniform way
15
Update Anomalies For FIRST

Focus on the FD S ? CITY
INSERT we cannot insert the fact that a
particular supplier is located in a particular
city until that supplier supplies at least one
part (e.g. supplier S5 is located in Athens)
DELETE if we delete the sole FIRST tuple for a
particular supplier, we delete not only the
shipment concerning that supplier to a particular
part but also the information that the supplier
is located in a particular city (e.g. the S3
tuple)
UPDATE the city value for a given supplier
appears in FIRST many times, in general

16
Solution for FIRST

The real problem is that relvar FIRST contains
too much information (shipment, and supplier) all
bundled together
If we delete a tuple, we delete too much
The solution to this problem is to un-bundle
Replace relvar FIRST by the two relvars
SECOND S, STATUS, CITY
SP S, P, QTY
The decomposition of FIRST into SECOND and SP
eliminates the dependencies that were not
irreducible
The attribute CITY in FIRST did not describe the
entity described by the primary key instead, it
described the supplier involved in that shipment

17
Relvars SECOND and SP
18
Step Summarization

The first step in the normalization procedure is
to take projections to eliminate
non-irreducible FD
Before normalization
R A, B, C, D PRIMARY KEY A, B /
assume A ? D holds /
After normalization
R1 A, D PRIMARY KEY A
R2 A, B, C PRIMARY KEY A, B
FOREIGN KEY A REFERENCES R1

19
Second Normal Form

A relvar is in 2NF if and only if it is in 1NF
and every nonkey attribute is irreducibly
dependent on the primary key
Assumes only one candidate key
A relvar in 2NF is less susceptible to update
anomalies, but may still exhibit transitive
dependencies
Both attributes in a transitive dependency are
irreducibly implied by the primary key, and each
implies the other

20
Update Anomalies For SECOND

Focus on the FD CITY ? STATUS
S ? CITY CITY ? STATUS, then S ? STATUS
INSERT we cannot insert the fact that a
particular city has a particular status until we
have some suppliers actually located in that city
(e.g. city ROME has a status of 50)
DELETE if we delete the sole SECOND tuple for a
particular city, we delete not only the supplier
concerned but also the information that that
city has that particular status (e.g. the S5
tuple)
UPDATE the status value for a given city appears
in SECOND many times, in general

21
Solution for SECOND

The real problem is that relvar SECOND contains
too much information (supplier, and city) all
bundled together
If we delete a tuple, we delete too much
The solution to this problem is to un-bundle
Replace relvar SECOND by the two relvars
SC S, CITY
CS CITY, STATUS
The decomposition of SECOND into SC and CS
eliminates the transitive dependencies
The attribute STATUS in SECOND did not describe
the entity described by the primary key instead,
it described the city involved in which that
supplier happened to be located

22
Relvars SC and CS
23
Step Summarization

The second step in the normalization procedure is
to take projections to eliminate transitive FD
Before normalization
R A, B, C PRIMARY KEY A / assume
B ? C holds /
After normalization
R1 B, C PRIMARY KEY B
R2 A, B PRIMARY KEY A FOREIGN KEY
B REFERENCES R1

24
Third Normal Form

A relvar is in 3NF if and only if it is in 2NF
and every nonkey attribute is nontransitively
dependent on the primary key
Assumes only one candidate key

25
Dependency Preservation

It is frequently the case that a given relvar can
be nonloss-decomposed in a variety of different
ways
Example
SC S, CITY CS CITY, STATUS
SC S, CITY SS S, STATUS
Dependency preservation refers to a specific case
of nonloss decomposition, such that the
normalized relvars are independent of each other
Updates can be made to either one without regard
for others

Which decomposition is better ?
26
Dependency Preservation (Cont.)

Projections R1 and R2 of a relvar R are
independent in the foregoing sense if and only if
both of the following are true
Every FD in R is a logical consequence of those
in R1 and R2
The common attributes of R1 and R2 form a
candidate key for at least one of the pair
More on dependency preservation pp. 366-367

27
Boyce/Codd Normal Form
28
Overview

Codds original definition of 3NF did not
adequately deal with the case of a relvar that
Had two or more candidate keys, such that
The candidate keys were composite, and
The are overlapped (i.e. had at least one
attribute in common)
Definition of Boyce/Codd normal form
A relvar is in BCNF if and only if every
nontrivial, left-irreducible FD has a candidate
key as its determinant
A relvar is in BCNF if and only if every
determinant is a candidate key (informal
definition)
The only arrows in the FD diagram are arrows out
of candidate keys

29
Overview (Cont.)

Examples
FIRST and SECOND, which were not in 3NF, are not
in BCNF either
SP, SC, and CS, which were in 3NF, are also in
BCNF
Another example involving two disjoint
candidate keys
Suppose S and SNAME are both candidate keys, and
STATUS and CITY are mutually independent ? BCNF

30
BCNF Example I

SSP S, SNAME, P, QTY
Candidate key S, P , SNAME, P
Non-BCNF determinants S and SNAME are not
candidate keys
Suffer the same kind of problems as FIRST and
SECOND did
SSP is in 3NF by the old definition (not the
simplified version)
That definition did not require an attribute to
be irreducibly dependent on each candidate key if
it was itself a component of some candidate key
of the relvar
The fact that SNAME is notirreducibly dependent
on S, P was ignored

31
Solution for SSP

Break SSP into two projections
SS S, SNAME
SP S, P, QTY
OR
SS S, SNAME
SP SNAME, P, QTY

S
P
QTY
SNAME
P
32
BCNF Example II

SJT (student, subject, teacher)
The meaning of an SJT tuple (s, j, t) is that a
student s is taught subject j by teacher t
Constraints
For each subject, each student of that subject is
taught by only one teacher
Each teacher teaches only one subject (but one
subject is taught by several teachers)

33
BCNF Example II (Cont.)

SJT (Cont.)
Two overlapping candidate keys, S, J and S,
T
SJT is in 3NF and not BCNF
Attribute T is a determinant but not a candidate
key
Solution for SJT break SJT into
ST S, T and TJ T, J

34
BCNF Example II (Cont.)

However, the two projections ST and TJ are not
independent
FD S, J ? T cannot be deduced from the FD T?J
Example an attempt to insert a tuple for Smith
and Prof. Brown into ST must be rejected Why ???
The twin objectives of (a) decomposing a relvar
into BCNF components, and (b) decomposing it into
independent components, can occasionally be in
conflict

35
BCNF Example III

EXAM
The meaning of an EXAM tuple (s, j, p) is that a
student s was examined in subject j and achieved
position p
Constraint no two students obtained the same
position in the same subject
Two overlapping candidate keys, S, J , and
J, P
They are the only determinants
EXAM is in BCNF

36
Relation-Valued Attributes

A relation may include attributes whose values
are relations
Traditionally this would be seen to violate 1NF,
which was held to prohibit repeating groups
Now they are theoretically sound, but in practice
you should avoid them because they have
complicated predicates

GET S for suppliers who supply part P1 ( SPQ
WHERE TUPLE P P (P1) ? PQ P ) S
GET P for parts supplied by supplier S1 ( SPQ
WHERE S S (S1) ) UNGROUP PQ ) P
Matters are worse for update operations
37
Conclusions

The whole point of normalization theory is to try
to identify commonsense principles and formalize
them
In this way, we can mechanize those principles by
an algorithm
BCNF and indeed 5NF is always achievable

Write a Comment

User Comments (0)