CHAPTER 19 SCHEMA, REFINEMENT AND NORMAL FORMS - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

CHAPTER 19 SCHEMA, REFINEMENT AND NORMAL FORMS

Description:

... a change in the database we have to re-joinR1 and R2 to be able to verify ... The text suggests adding the relation ABC to avoid repetitive construction of ... – PowerPoint PPT presentation

Number of Views:87
Avg rating:3.0/5.0
Slides: 37
Provided by: CSU117
Learn more at: http://web.cs.ucla.edu
Category:

less

Transcript and Presenter's Notes

Title: CHAPTER 19 SCHEMA, REFINEMENT AND NORMAL FORMS


1
CHAPTER 19SCHEMA, REFINEMENT AND NORMAL FORMS
2
INTRODUCTION
  • Having defined all the relational schemas that
    we want in our database, the next step is to
    refine them for so as to (near)optimize them with
    respect to (memory) space and time (of response
    to queries).
  • Much of this optimization can be performed by
    the optimizer software within the DBMS, but the
    actual schema cannot be modified by the
    optimizer, and that is where important
    improvement can be achieved as we shall find out
    in this new chapter.

3
REDUNDANCY AND ITS PROBLEMS (1)
  • Consider the following instance of the
    Hourly_Emps relation which we have seen earlier

It has been assumed in this relation that the
rating implies the hourly wages. This is a
special constraint called a functional dependency
(FD) which is denoted as follows rating ?
hourly_wages. As a result of this constraint, the
above relation contains a certain amount of
redundancy and creates what are called anomalies.
4
REDUNDANCY AND ITS PROBLEMS (2)
  • - Redundancy It will be seen, in the above
    instance, that an hourly_wages of 10 for a
    rating of 8 is specified three times similarly
    an hourly_wages of 7 for a rating of 5 is
    specified twice.
  • Such redundancy represent wasted memory space
    i.e. poor design.
  • - Anomalies There are three kinds of anomalies
  • (1) Update Anomalies If we want to change the
    hourly_wages in the first tuple, we must perform
    similar changes in the second and fifth tuples.
    We could specify an integrity constraint for the
    DBMS to perform such additional changes
    automatically, but it still requires extra
    (wasted) time to perform these additional
    updates.
  • (2) Insertion Anomalies We cannot insert a (new)
    employee tuple unless we know the hourly_wages
    for his rating.
  • (3) Deletion Anomalies If we delete all tuples
    with a given rating, we lose the information that
    this rating corresponds to a certain
    hourly_wages.

5
REDUNDANCY AND ITS PROBLEMS (3)
  • The Possibility of Using Null Values
  • - The use of null values cannot resolve the
    problems of redundant storage and update
    anomalies.
  • - null value can sometimes help in insertion and
    delete anomalies, at least when the null value is
    not forced on a prime key.
  • - Since null values also introduce their own
    problems it is preferable to avoid them if at all
    possible.

6
DECOMPOSITION
  • - All the problems we have described will vanish
    if we decompose our original relation into two
    relations as shown below

N. B. (1) Note that some extra space is now
needed to accommodate the second relation and
that the ratings show up in two places, but the
net saving in a relation containing some hundreds
or thousand of tuples is quite significant. (2)
It should be immediately apparent that it is
generally more efficient to store entity sets
separately from the relations in which they
participate, unless certain other constraints are
present.
7
PREVIEW OF THIS CHAPTER
  • Our goal in this chapter is to develop the tools
    that will
  • (1) permit us to determine when a relational
    schema is subject to the redundancies and
    anomalies we have just witnessed.
  • (2) allow us to refine such deficient schema by
    suitable decomposition, thereby avoiding
    redundancies and anomalies without loss of
    information.

8
THE PROGRAM FOR THIS CHAPTER
  • In order to achieve our stated goals we shall
    proceed as follows
  • (1) Define a new type of ICs called Functional
    Dependencies (FD) which can be used to detect the
    presence of redundancies and anomalies (R/A) in
    relational schemas (henceforth we shall simply
    say schemas).
  • (2) Define a number of normal forms which can be
    used to classify relations with respect to their
    liability to incorporate R/A.
  • (3) Introduce a decomposition technique that
    permits us to replace a relations containing RA
    by equivalent ones devoid of R/A.
  • (4) Discuss other potential problems (loss of FD
    characterization) that may plague the resulting
    relations, and how to deal with such problems

9
FUNCTIONAL DEPENDENCIES
  • - Functional Dependencies (FDs) are ICs that
    characterize relations by generalizing the
    concept of keys.
  • - A functional dependency, X?Y, characterizing a
    relation R (where X and Y are sets of attributes
    of R) states that the attribute set Y is
    completely determined (we say functionally
    determined) by the attribute set X.
  • - Thus if X is a candidate key (or a superkey) Y
    is the entire set of attributes of R.
  • - There can be any number and any mixture of FDs
    that characterize a relation.
  • - The FDs characterizing relations must be
    specified as part of the semantics of these
    relations.
  • - The presence of certain FDs in a relation is
    responsible for the R/A plaguing that relation.
  • - Our job is to locate such FDs and render them
    harmless.

10
EXAMPLE OF FUNCTIONAL DEPENDENCIES
  • Consider a relation Contract described by the
    schema
  • Contracts (contractid, supplierid, projectid,
    depid, partid, qty, value) and denoted CSJDPQV
  • where the meaning of a tuple is that the contract
    identified by
  • contractid C is an agreement that supplier S
    (supplierid) will supply
  • Q items of part P (partid) to project J (projid)
    associated with
  • department D (deptid), the value V ot this
    contract being (value).
  • 1. The fact that the key of this relation is C
    is denoted by the FD
  • C?CSJDPQV which is really an abbreviation for
    the seven FDs
  • (i) C?C, (ii) C?S, (iii) C?J, (iv) C?D, (v) C?P,
    (vi) C?Q, (vii) C?V
  • N. B. The FD (i) C?C is called a trivial FD since
    every attribute is trivially implied by itself.
  • Now suppose we wish to add the following company
    rules
  • 2. A project purchases a given part using a
    single contract JP?C.
  • 3. A department purchases at most one part from a
    supplier SD?P.
  • Thus relation C is characterized by the set of
    FDs C? CSJDPQV, JP? C, SD? P

11
CLOSURE OF A SET OF FDs ARMSTRONG AXIOMS
  • - The set of all FDs implied by a given set F of
    FDs is called the
  • closure of F, denoted F.
  • - In order to check for the presence of R/A we
    need to ascertain the possible
  • presence of other FDs implied by those stated
    explicitly.
  • This means that we have to calculate the closure
    F.
  • - This may be done utilizing the Armstrong Axioms
    which may be stated as follows letting X, Y, Z
    denote sets of attributes of a relation R,
  • Reflexivity If X?Y (i.e. X contains Y) then X?Y.
    (This rule really generates only trivial FDs).
  • Augmentation If X?Y, then XZ?YZ for any Z.
  • Transitivity If X?Y and Y?Z, then, X?Z.
  • It may be proven that
  • (1) Armstrongs axioms are sound, i.e. they
    generate only FDs in F.
  • (2)Armstrongs axioms are complete, i. e. they
    generate all the FDs in F.
  • It is convenient to add the following additional
    rules which may even be considered as denotation
    rules
  • Union If X?Y and X?Z, then X?YZ.
  • Decomposition If X?YZ, then X?Y and X?Z.
  • N. B. Note that these axioms do not imply that
    you may cancel attributes appearing on both
    sides. Thus if AB? BC, then you may not
    conclude that A?B.

12
EXAMPLES OF APPLICATION OF ARMSTRONG AXIOMS (1)
  • Consider the relation ABC with FDs (i) A?B and
    (ii) B?C.
  • 1. From Reflexivity we get all the trivial FDs
    which are of the form
  • X?Y, where Y ? X, X ? ABC and Y ? ABC.
  • 2. Applying transitivity to (i) and (ii) we get A
    ? C.
  • 3. From augmentation we get
  • AC ? BC, AB ? AC, AB ?CB.
  • Thus the closure of the set F of given FDs is
    (apart from trivial FDs)
  • F A?B, B?C, A ? C, AC ? BC, AB ? AC, AB
    ?CB

13
EXAMPLES OF APPLICATION OF ARMSTRONG AXIOMS (2)
  • Consider the previous relation Contracts that is
    characterized by the set of FDs (i) C ?
    CSJDPQV, (ii) JP ?C, (iii) SD ? P.
  • 1. From (ii), (i), and transitivity we get (iv)
    JP ? CSJDPQV.
  • 2. From (iii) and augmentation we get (v) SDJ ?
    JP.
  • 3. From (v), (iv) and transitivity we get (vi)
    SDJ ? CSDJPQV.
  • 4. From (i) and decomposition we can get
  • C ?S, C ? J, C ? D, C ? P,C ? Q, C ? V.
  • N. B. We have not included trivial dependencies
    in the above
  • derivations.

14
ATTRIBUTE CLOSURE
  • Constructing the closure of a set of FDs may be
    fairly laborious. It
  • may be avoided when one wishes to check what are
    the possible
  • right-hand sides of an FD X ? Y, for a given X,
    by means of the
  • following algorithm which calculates the
    so-called attribute closure,
  • denoted X, of a set X A1, A2, ,An of
    attributes, with respect
  • to the set F of FDs.
  • 1. Let X be a set of attributes that eventually
    will become the closure. First we initialize X to
    be A1, A2, ,An.
  • 2. We repeatedly search for some FD B1 B2 Bm ?
    C such that all of B1 B2 Bm are in the set of
    attributes X, but C is not. We then add C to the
    set X.
  • 3. Repeat step 2 as many times as necessary until
    no more new
  • attributes can be added to X.
  • 4. The final set X is the correct value of A1,
    A2, ,An.

15
EXAMPLE OF ATTRIBUTE CLOSURE COMPUTATION
  • Given the previous Contracts relation
    characterized by the FDs
  • (i) C ? CSJDPQV, (ii) JP?C, (iii)
    SD?P
  • Suppose we wish to get the attribute closure of
    JP, i.e. (JP)
  • 1. Initialize the closure (X) as JP.
  • 2. (i) does not satisfy the requirement that the
    left side be in JP.
  • (ii) does, therefore we set (X) (X) ? C
    JPC.
  • (iii) does not.
  • We now repeat step 2
  • 2. (i) now does satisfy the requirement that the
    left side be in JP, therefore we set (X) (X)
    ? CSJDPQV JPCSDQV.
  • (ii) and (iii) add nothing new. Repeating step 2
    does not change (X).
  • Therefore we stop having obtained (JP)
    JPCSDQV.

16
NORMAL FORMS (1)
  • The following Normal Forms are used in the RMD to
    characterize relations
  • First Normal Form (1NF).
  • Second Normal Form (2NF).
  • Third Normal Form (3NF).
  • Boyce-Codd Normal Form (BCNF).
  • Later we shall also consider
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)

17
NORMAL FORMS (2)
  • First Normal Form (1NF)
  • - A relation is said to be in 1NF if it has only
    single-valued attributes.
  • (This is such a pillar of the RMD that we shall
    assume it is always true for all the relations we
    encounter).
  • Second Normal form (2NF)
  • - A relation is said to be in 2NF if it is in 1NF
    and contains partial dependencies
  • (Actually, 2NF is really of historic interest
    only).
  • Third Normal Form (3NF)
  • - A relation is said to be in 3NFif it is in 2NF
    and, for every FD X?A in F, one of the following
    statements is true
  • A?X i. e. it is a trivial FD, or
  • X is a superkey (a superkey is a set of
    attributes which contains a key), or
  • A is part of some key for R.
  • Boyce-Codd Normal Form (BCNF)
  • - A relation is said to be in BCNF if it is in
    2NF and one of the following two statements is
    true
  • A?X i. e. it is a trivial FD, or
  • X is a superkey.

18
NORMAL FORMS (3)
  • The successive containment of Normal Forms may be
    illustrated by the following diagram

BCNF
3NF
2NF
1NF
19
DIAGRAMATIC VIEW OF FDs
PARTIAL DEPENDENCIES ? IN 2NF, BUT NOT 3NF
TRANSITIVE DEPENDENCIES ? IN 3NF BUT NOT IN BCNF
X IS A SUPERKEY ? IN BCNF
20
DECOMPOSITION THE REQUIREMENTS
  • - In order to remove the R/A from a relation it
    is necessary to decompose it into two or more
    relations utilizing one of the FDs responsible
    for the R/A as a pivot point of this
    decomposition.
  • - The decomposition must be lossless, i.e. the
    final relations must contain exactly the
    information contained in the original relation,
    without losing any nor adding any other ones.
  • - The decomposition must be dependency-preserving,
    i.e. the final relations must be characterized
    by the same FDs as the original relation so as to
    permit their verification when modifications are
    requested.
  • - The above demands lead to the requirement that
    the final relations be in BCNF if at all
    possible, or in 3NF otherwise.
  • - We shall now examine some examples drawn from
    odd-numbered problems in the text.

21
THE DECOMPOSITION ALGORITHM
  • 1. Check which normal form the relation is in.
  • 2. Locate the FDs responsible for it not being in
    BCNF.
  • 3. Select one of the responsible FDs.
  • 4. Decompose the relation using the selected
    responsible FD, say A?B, as the pivot of the
    decomposition the decomposition yields two
    separate relations, say R1 and R2R1 has for
    attributes all the attributes occurring in the
    pivot FD i. e. R1(AB) and no other R2 has for
    attributes the set A.
  • 5. Check whether the resulting relations are in
    BCNF. If they are, go to step 6 otherwise repeat
    from step 2 on.
  • 6. Check whether the final relations are
    dependency preserving. If they are, stop if they
    are not, you may have several choices
  • (i) The problem may disappear if you select some
    alternate responsible FD try all permissible
    combinations.
  • (ii) If the problem persist you may choose to
    leave it in 3NF, or
  • (iii) you may decide to add some other relation
    to those you have obtained in order to preserve
    dependencies,

22
EXAMPLE 1 (19.7-1)
  • Consider a relation R (A B C D) with FDs
  • (i) C ? D, (ii) C ? A, (iii) B ? C
  • - The only candidate key is B.
  • - (i) and (ii) are partial dependencies, so R is
    not even in 2NF.
  • - pick (ii) as the pivot the decomposition is
    now as follows
  • R (A B C D )

C ?A
R1(CA) R2(BCD) with C?A
with C?D and B?C
R2 is not in BCNF ?decompose with (ii) as pivot.
C?D
R21(CD) R22(BC) with C?D
with B?C
R1, R21, and R22 are now in BCNF ?lossless
decomposition and dependencies are preserved.
23
EXAMPLE 1 (19.7-1) (contd)
  • Suppose we now use (i) as the first pivot. We
    would get
  • R(ABCD)

C?D
R1(CD) R2(ABC) with
C?D with C?A and B?C
B?C
R21(CA) R22(BC) with C?A
with B?C
Thus, in this case, the final results are similar.
24
EXAMPLE 2 (19.7-3)
  • Consider R(ABCD) with (i) ABC?D and (ii) D?A.
  • -The candidate key is (ABC)
  • - R is in 3NF because of (ii)
  • - Decomposing R using (ii) as pivot
  • R(ABCD)

D?A
R1(DA) R2(BCD) with
D?A
The final relations R1 and R2 are both in BCNF.
However, the decomposition is not
dependency-preserving for FD (i) is not
preserved whenever we make a change in the
database we have to re-joinR1 and R2 to be able
to verify that (i) is preserved. Therefore we
have the following choice (a) leave R in 3NF as
it was originally with its R/A or (b) decompose
it into R1 and R2 and join them whenever, there
is a change.
25
EXAMPLE 3 (19.7-5)
  • Consider R(ABCD) with (i) AB?C, (ii) AB?D, (iii)
    C?A, (iv) D ?B
  • - The candidate keys are now AB, BC, CD, AD.
  • - Because of (iii) and (iv), R is in 3NF but not
    in BCNF.
  • - Decompose using (iii) as pivot
  • R(ABCD)

C?A
R1(CA) R2(BCD) with
C?A with D?B
D?B
R21(DB) R22(CD) with
D?B
The three final relations are in BCNF, but the
decomposition is not dependencypreserving since
(i) and (ii) cannot be associated with any of the
three final relations. The text suggests adding
ABC and ABD to R1, R21 and R22. That is an
expensive solution!
26
EXAMPLE 3 (19.10-1)
  • Consider R(ABCD) with (i) B?C, (ii) D?A.
  • Questions (a) Candidate keys? (b) Is
    decomposition into BC and AD good or bad? Why?
  • Answers (a) BD
  • (b) Both (i) and (ii) are partial dependencies.
    Thus R is in 2NF.
  • Therefore a decomposition is in order.
  • - Following our decomposition algorithm, we
    should decompose using either (i) or (ii) as
    pivots.
  • - Using (i) as pivot we get R(ABCD)

B?C
R1(BC) R2(ABD) with B?C
with D?A
D?A
This decomposition is good It is lossless and
dependency- preserving.
R21(DA) R22(BD) with D?A

27
EXAMPLE 3 (19.10-1) (contd)
  • However, decomposition into BC and AD is
    unsatisfactory because it
  • is not lossless it is lossy! Indeed when we
    join BC and AD we get
  • their cartesian product because they have no
    attribute in common,
  • and cartesian products are generally much larger
    than join.
  • Example Let R

A B C D
a1 b1 c1 d1
a2 b2 c2 d2
R(BC)
B C
b1 c1
b2 c2
R(AD)
A D
a1 d1
a2 d2
Not equal
R(AD) ? R(BC)
A B C D
a1 b1 c1 d1
a1 b1 c2 d2
a2 b2 c1 d1
a2 b2 c2 d2
28
LOSSLESS DECOMPOSITIONS
  • A useful test for lossless decomposition is given
    in the text by
  • Theorem 3 (p.620)Let R be a relation and F be a
    set of FDs that hold over R. The decomposition
    into relations with attribute sets R1 and R2 is
    lossless if and only if F contains
  • either the FD R1 n R2 ? R1 or the FD R1 n R2 ?
    R2.
  • This theorem yields the following lemma which we
    have been using in our decomposition
  • Lemma (p. 620) If an FD X?Y holds over a
    relation R and X n Y is empty, the decomposition
    of R into R Y and XY is lossless.
  • N.B. Do not assume that a lossless decomposition
    and a decomposition yielding BCNF relations are
    synonymous.

29
EXAMPLE 3 (19.10-2)
  • Consider R(ABCD) with (i) AB?C, (ii) C?A,
    (iii) C?D.
  • Questions (a) Candidate keys (b) Is
    decomposition into
  • R1 ACD and R2 BC
  • good or not? Why?
  • Answers
  • (a) Candidate keys AB, BC.
  • (b) Consider the losslessness decomposition test
  • Here R1 ACD and R2 BC ?R1 n R2 C thus R1 n
    R2 ? R1
  • corresponds to C ? ACD which is in F since it is
    derivable from
  • C?C (trivial FD), C?A (ii), C?D (iii).
  • Thus theorem 3 is satisfied and the proposed
    decomposition is lossless.
  • - Note that this is a BCNF decomposition because
  • - R1 is characterized by the FDs (ii) C?A and
    (iii) C?D so that C
  • is the key in R1 and the two FDs follow BCNF
    requirements.
  • - R2 has the key BC and thus also satisfies BCNF
    conditions,

30
EXAMPLE 3 (19.10-2) )contd)
  • The decomposition is thus as follows

  • R(ABCD)
  • with AB?C, C?A, C?D

N.B. No pivot FD.
R1(ACD) R2(BC)
with C?A and C?D
However, the decomposition is not
dependency-preserving since the FD AB?C cannot
be included in either relation. The text suggests
adding the relation ABC to avoid repetitive
construction of joins to check for violations of
the missing FD.
31
DEPENDENCY-PRESERVING DECOMPOSITIONS
  • - Definition of the projection of a set F of
    dependencies
  • Let R be a relation decomposed into two schemas
    with attribute sets
  • X and Y, and let F be a set of FDs over R.
  • The projection of F on X is the set of FDs in F
    that involve only
  • attributes in X. It is denoted as FX .
  • - Definition of dependency-preserving
    decomposition
  • The decomposition of relation R with the set F of
    FDs into two relations with attribute sets X and
    Y is dependency-preserving if
  • (FX ? FY) F .

32
ANOTHER LOOK AT DECOMPOSITION
  • So far we have seen lossless-join decomposition
    that were not dependency-preserving, The opposite
    can also easily occur dependency-preserving
    decomposition which are not lossless-join.
  • Example Consider the relation R (ABC) with FD
    A?B.
  • If we decompose it into R1(AB) and R2(BC), we
    have a dependency-preserving decomposition,
    however it is not lossless-join.
  • Note that has the key AC
  • To find a lossless-join decomposition we can use
    our pivot point algorithm

  • R(ABC)

  • with A?B

A?B
R1(AB) R2(AC)
with A?B
33
DECOMPOSITION INTO 3NF (1)
  • The decomposition techniques we have developed
    for BCNF also work for 3NF. However, here we can
    develop an algorithm to decompose a relation into
    a collection of 3NF relations which are both
    lossless-join and dependency-preserving.
  • The desired algorithm depends on the use of a
    minimal cover for a set of FDs, which is defined
    as follows
  • Definition
  • A minimal cover for a set F of FDs is a set G of
    FDs such that
  • 1. Every FD in G is of the form X?A, where A is a
    single attribute.
  • 2. The closure of F and G are equal F G .
  • 3. If we obtain a se H of FDs from G by deleting
    one or more FDs, or by deleting attributes from
    an FD in G, then, F ? H.

34
DECOMPOSITION INTO 3NF (2)
  • Example
  • Let F (i) A?B, (ii) ABCD?E, (iii) EF?G, (iv)
    EF?H, (v) ACDF?EG.
  • 1. rewrite (v) as (v) ACDF?E, (vi) ACDF?G.
  • 2. Consider (vi). It is implied by (i), (ii), and
    (iii) ?it can be deleted.
  • Similarly we ca delete (v).
  • 3. Consider (ii) since (i) holds, we can replace
    (ii) with (vii) ACD?E
  • Finally, a minimal cover for F is the set
  • A?B, ACD?E, EF?G, EF?H.
  • These operations are easily transformed into an
    algorithm which appears on p. 626.
  • We can now give an algorithm for putting
    relations into 3NF with lossless-join and
    dependency preservation.

35
DECOMPOSITION INTO 3NF (3)
  • Algorithm for obtaining a lossless-join,
    dependency preserving
  • decomposition into 3NF relations from a relation
    R with a set F of
  • FDs that is a minimal cover
  • Let R1, R2, , Rn be the desired decomposition
    where Ri is in 3NF and let Fi denote the
    projection of F onto the attributes of Ri.
  • 1.Identify the set N of FDs in F that is not
    preserved, i. e. not included in the closure of
    the union of Fis.
  • 2.For each FD X?A in N, create a relation XA and
    add it to the decomposition of R.

36
EXAMPLE OF DEPENDENCY PRESERVATION
  • Example
  • - Consider a relation R (A, B, C) with F A ?
    B , B ? C, C ? A
  • Now, we decompose R into two relations R1 and
    R2 using A ? B
  • R1 (A, B) with FD A ? B and
    R2 (B, C) with FD B ? C
  • Question Is this decomposition
    dependency-preserving?
  • - At first sight, it may appear that the answer
    is 'no' for the FD, C ? A, is associated with
    neither R1 nor R2.
  • - However, if we apply the test described above,
    we find
  • FAB A ? B , B ? A and FBC B ? C
    , C ? B ,
  • as the second element in each set is implied by
    application of transitivity to the original FDs.
  • Thus, (FAB ? FBC) F. Therefore, the
    decomposition is dependency-preserving.
  • - But then, where is the FD, C ? A, in the above
    decomposition? The answer is that we should have
    really written the decomposed schemas as
  • R1 (A, B) with FAB A ? B, B ? A and R2
    (B, C) with FBC B ? C, C ? B
  • and C ? A is automatically preserved when the
    other FDs are preserved.
Write a Comment
User Comments (0)
About PowerShow.com