Relational database - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Relational database

Description:

Lecture 3 on Data Normalization Study the methods of first, second, third, Boyce-Codd, fourth and fifth normal form for relational database design, in order to ... – PowerPoint PPT presentation

Number of Views:182
Avg rating:3.0/5.0
Slides: 42
Provided by: csjf
Category:

less

Transcript and Presenter's Notes

Title: Relational database


1
Lecture 3 on Data Normalization
Study the methods of first, second, third,
Boyce-Codd, fourth and fifth normal form for
relational database design, in order to eliminate
data redundancy and update abnormality.
2
Normalization TheoryRefine database design
to eliminate abnormalities (irregularities) of
manipulating database
3
1NF, 2NF and 3NF
  • Built around the concept of normal forms
  • Normal form Contains atomic values only
  • All normalized relations are in 1NF
  • 2NF is the subset of 1NF, 3NF is the subset of
    2NF and so on
  • 3NF is more desirable than 2NF, 2NF is more
    desirable than 1NF

4
BCNF, 4NF and 5NF(PJNF)
  • Boyce-Codd Normal Form
  • A stronger form of 3NF
  • Every BCNF is also 3NF, but some 3NF are not BCNF
  • 4NF and 5NF
  • Defined recently
  • Deal with multi-valued dependency (MVD) and join
    dependency (JD)

5
Relationship between Normal Forms
Universe of relations
1NF relations
2NF relations
3NF relations
BCNF relations
4NF relations
5NF/PJNF relations
6
First Normal Form
  • A relation is in 1NF if each attribute contains
    only one value (not a set of values)
  • The primary key (PK) can not be null

7
First Normal Form
S S-name Enrollments
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
Is this relation in 1NF?
Relation STUDENT-A
8
First Normal Form
S S-name Enrollments
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
  • NO!!!
  • Elements in the domain Enrollments are not atomic
  • Could be split into two domains C and C-Name

Relation STUDENT-B
9
First Normal Form
  • Enrollments is split into C and C-Name
  • Use S and C as a compound PK
  • A student may attend several courses and a course
    may have several students
  • So S and C has a mn mapping

S S-Name C C-Name
S1 Brown C1 Math
S1 Brown C2 Chem
S1 Brown C3 Phys
S2 Smith C2 Chem
S2 Smith C3 Phys
S2 Smith C4 Math
S3 Brown C2 Chem
S3 Brown C3 Phys
Relation STUDENT-B
10
Functional Dependency (FD)
  • Attribute Y of relation R is functionally
    dependent on attribute X of R ? each value of X
    is associated with exactly one value of Y
  • Denoted by X ? Y
  • In the relation STUDENT-B
  • S ? S-Name
  • C ? C-Name
  • S, C ? 0

11
Anomalies using 1NF
  • 1NF relations require less complicated
    application to operate as opposed to unnormalized
    relations
  • Anomalies in insert
  • Since PK is composed of C and S, both details
    of student and course must be known before
    inserting a entry
  • Eg to add a course, at least one student is
    enrolled

12
Anomalies using 1NF
  • Anomalies in delete
  • If all students attending a particular course are
    deleted, the course will not be found in the
    database
  • Anomalies in update
  • Redundancy of S-Name and C-Name
  • Increase storage space and effort to modify data
    item
  • If a course is modified, all tuples containing
    that course must be updated

13
Second Normal Form
  • A relation is in 2NF if it is in 1NF and every
    non-PK attribute is fully functionally dependant
    on the PK
  • In the relation STUDENT-B
  • PK C, S
  • Non-PK attribute C-Name, S-Name
  • C, S ? S-Name
  • S ? S-Name
  • Since S-Name is only partially dependent on the
    PK, relation Student-B is not in 2NF

14
Second Normal Form
S S-Name
S1 Brown
S2 Smith
S3 Brown
C C-Name
C1 Math
C2 Chem
C3 Phys
C4 Math
S C
S1 C1
S1 C2
S1 C3
S2 C2
S2 C3
S2 C4
S3 C2
S3 C3
Relation SC
Relation STUDENT
Relation COURSE
  • All of them are in 2NF as none of them has
    partial dependency
  • Original information can be reconstructed by
    natural join operation

15
Anomalies in 2NF
  • Suppose we have the relations PRODUCT, MACHINE
    and EMPLOYEE
  • P ? M
  • P ? E
  • M ? E
  • The tuple (P1, M1, E1) means product P1 is
    manufactured on machine M1 which is operated by
    employee E1

16
Anomalies in 2NF
  • Anomalies in insert
  • It is not possible to store the fact that which
    machine is operated by which employee without
    knowing at least one product produced by this
    machine
  • Anomalies in delete
  • If an employee is fired the fact that which
    machine he operated and what product that machine
    produced are also lost

17
Anomalies in 2NF
  • Anomalies in update
  • If one employee is assigned to operate another
    machine then several tuples have to be updated as
    well

18
Third Normal Form
  • A relation is in 3NF if it is in 2NF and no
    non-PK attributes is transitively dependent on
    the PK
  • In the manufacture relations
  • P ? M and M ? E implies P ? E
  • So P ? E is a transitive dependency

19
Third Normal Form
P M E
P1 M1 E1
P2 M2 E3
P3 M1 E1
P4 M1 E1
P5 M3 E2
P6 M4 E1
P M
P1 M1
P2 M2
P3 M1
P4 M1
P5 M3
P6 M4
M E
M1 E1
M2 E3
M3 E2
M4 E1
  • No loss of information
  • Insert, delete and update anomalies are eliminated

R2
R1
MANUFACTURE
20
Boyce/Codd Normal Form
  • A relation is BCNF ? every determinant is a
    candidate key
  • A determinant is an attribute, possibly
    composite, on which some other attribute is fully
    functionally dependent

21
Boyce/Codd Normal Form
S J T
Smith Math Prof. White
Smith Physics Prof. Green
Jones Math Prof. White
Jones Physics Prof. Brown
  1. For each subject (J), each student (S) of that
    subject taught by only one teacher (T) FD S, J
    ? T
  2. Each teacher (T) teaches only one subject (J)
    FD T ? J
  3. Each subject (J) is taught by several teacher
    MVD J? ? T

Relation SJT
  • There exists a relation SJT with attributes S
    (student), J (subject) and T (teacher). The
    meaning of SJT tuple is that the specified
    student is taught the specified subject by the
    specified teacher.

22
Boyce/Codd Normal Form
  • There are two determinants (S, J) and T in
    functional dependency
  • Anomalies in update
  • If the fact that Jones studies physics is
    deleted, the fact that Professor Brown teaches
    physics is also lost. It is because T is a
    determinant but not a candidate key

23
Boyce/Codd Normal Form
S J
Smith Math
Smith Physics
Jones Math
Jones Physics
T J
Prof. White Math
Prof. Green Physics
Prof. Brown Physics
Relation TJ
Relation ST
Relations (S, J) and (T, J) are in BCNF because
all determinants are candidate keys.
24
Multi-valued Dependency
  • Given a relation R with attributes A, B and C.
    The multi-valued dependence R.A ?? R.B holds ?
    the set of B-values matching a given (A-value,
    C-value) pair in R depends only on the A-value
    and is independent of the C-value

25
Fourth Normal Form
  • A relation is in 4NF ? whenever there exists an
    multi-valued dependence (MVD), say A ?? B, then
    all attributes are also functionally dependent on
    A, i.e. A ? X for all attribute X of the relation

26
Fourth Normal Form
Course Teacher Text
Physics Prof. Green Basic Mechanics
Physics Prof. Green Principles of Optics
Physics Prof. Brown Basic Mechanics
Physics Prof. Brown Principles of Optics
Physics Prof. Black Basic Mechanics
Physics Prof. Black Principles of Optics
Math Prof. White Modern Algebra
Math Prof. White Projective Geometry
Relation CTX (not in 4NF)
27
Fourth Normal Form
  • A tuple (C, T, X) appears in CTX ? course C can
    be taught by teacher T and uses X as a reference.
    For a given course, all possible combinations of
    teacher and text appear that is, CTX satisfies
    the constraint if tuples (C, T1, X1), (C, T2,
    X2) both appears, then tuples (C, T1, X2), (C,
    T2, X1) both appears also

28
Fourth Normal Form
  • CTX contains redundancy
  • CTX is in BCNF as there are no other functional
    determinants
  • But CTX is not in 4NF as it involves an MVD that
    is not an FD at all, let alone an FD in which the
    determinant is a candidate key

29
Anomalies in insert
  • For example, to add the information that the
    physics course uses a new text called Advanced
    Mechanism, it is necessary to create three new
    tuples, one for each of the three teachers.

30
Fourth Normal Form
Course Teacher
Physics Prof. Green
Physics Prof. Brown
Physics Prof. Black
Math Prof. White
Course Text
Physics Basic Mechanics
Physics Principles of Optics
Math Modern Algebra
Math Projective Geometry
Relation CT
Relation CX
  • 4NF is an improvement over BCNF, in that it
    eliminates another form of undesirable structure

31
Fifth Normal Form
  • Join dependency relation R satisfies the JD (X,
    Y,Z) ? it is the join of its projections on X,
    Y,Z where X, Y,Z are subsets of the set of
    attributes of R
  • A relation is in 5NF/PJNF (Projection-join normal
    form) ? every join dependency in R is implied by
    the candidate keys of R
  • 5NF is the ultimate normal form with respect to
    projection and join

32
Fifth Normal Form
S P J
S1 P1 J2
S1 P2 J1
S2 P1 J1
S1 P1 J1
S P
S1 P1
S1 P2
S2 P1
J S
J2 S1
J1 S1
J1 S2
P J
P1 J2
P2 J1
P1 J1
SP
Join over P
JS
PJ
Relation SPJ
S P J
S1 P1 J2
S1 P1 J1
S1 P2 J1
S2 P1 J2
S2 P1 J1
Join over (J, S)
  • SPJ is the join of all of its three projections,
  • not of any two!

Spurious
33
Join Dependence constraint
  • Condition JD(join dependence) in relation R(S,
    P, J)
  • Constraint if R1(S, P), R2(P, J) and R3(J,
    S) exists
  • then R(S, P, J) exists

34
Connection Trap
  • Condition Without JD(join dependence) in
    relation (S, P, J)
  • Connect trap if R1(S, P), R2(P, J) and
    R3(J, S) exists
  • then R(S, P, J) may not exist and R1, R2 and
    R3 may not be able to be connected

35
Abnomalies in insert with JD
  • If insert (S1, P1, J2), (S1, P2, J1), and
  • (S2, P1, J1)
  • Then (S1, P1, J1) must also be inserted
  • On the other hand, if one of (S1, P1, J2), (S1,
    P2, J1) and (S2, P1, J1) is deleted, then (S1,
    P1, J1) must also be deleted.

36
Fifth Normal Form (5NF)
JS
PJ
SP
S P
S1 P1
S1 P2
S2 P1
J S
J2 S1
J1 S1
J1 S2
P J
P1 J2
P2 J1
P1 J1
37
Steps in normalization
Unnormalized form
  1. Decompose all data structures that are not 2D
    into 2D relations of segments
  2. Eliminate any partial dependency
  3. Eliminate any transitive dependency
  4. Eliminate any remaining FD in which determinant
    is not a candidate key
  5. Eliminate any MVD
  6. Eliminate any JD that are implied by candidate
    keys

1NF
2NF
3NF
BCNF
4NF
5NF/PJNF
38
Lecture Summary
  • The 1NF, 2NF, 3NF, BCNF, 4NF and 5NF are to split
    the unnormalized table into normalized table(s),
    and which can eliminate data redundancy and
    update abnormality. The higher norm form implies
    the lower norm form.

39
Review Question
  • Explain the differences between Third Normal Form
    and Boyce Codd Normal Form with respect to
    functional dependencies.
  • Why Boyce Codd is called Strong third normal
    form?
  • How can one normalize relations of Third Normal
    Form into Boyce Codd Normal Form?

40
Tutorial Question
  • Describe and derive the unnormal, first, second
    and third normal form for the following unnormal
    form including 12 data fields with 4 of them are
    in repeating groups in a table. Identify the
    functional dependencies of each normal form.

41
Reading Assignment
  • Chapter 10 Functional Dependencies and
    Normalization for Relational Databases and
    Chapter 11 Relational Database Design Algorithms
    and Further Dependencies of Fundamentals of
    Database Systems fifth edition, by Elmasri
    Navathe, Pearson, 2007.
Write a Comment
User Comments (0)
About PowerShow.com