CS 453: Database Systems - PowerPoint PPT Presentation

1 / 48
About This Presentation
Title:

CS 453: Database Systems

Description:

Provides a set of guidelines, does not result in a unique database schema. Does not provide a way of ... Person1 (SSN, Name, Address) Hobbies (SSN, Hobby) ... – PowerPoint PPT presentation

Number of Views:26
Avg rating:3.0/5.0
Slides: 49
Provided by: Ghem
Category:

less

Transcript and Presenter's Notes

Title: CS 453: Database Systems


1
CS 453 Database Systems
Chapter 4 Relational Normalization
2
Chapter outline
  • Introduction
  • Limitations of E-R Designs (Redundancy,
    Anomalies)
  • Schema Refinement (Decomposition)
  • Relational Normalization Theory
  • Functional Dependencies (FD)
  • Armstrongs Axioms for FD
  • Derived inference rules
  • Entailment, Closure, Equivalence
  • Normal Forms (1NF, 2NF, 3NF, BCNF)
  • Decomposition

3
Limitations of E-R Designs
  • Provides a set of guidelines, does not result in
    a unique database schema
  • Does not provide a way of evaluating alternative
    schemas
  • Normalization theory provides a mechanism for
    analyzing and refining the schema produced by an
    E-R design

4
Problem Redundancy
  • Dependencies between attributes cause redundancy
  • Eg. All addresses in the same town have the same
    zip code

5
Problem Example
ER Model
Relational Model
6
Problem Anomalies
  • Redundancy leads to anomalies
  • Update anomaly A change in Address must be made
    in several places
  • Deletion anomaly Suppose a person gives up all
    hobbies. Do we
  • Set Hobby attribute to null? No, since Hobby is
    part of key
  • Delete the entire row? No, since we lose other
    information in the row
  • Insertion anomaly Hobby value must be supplied
    for any inserted row since Hobby is part of key

7
Solution Decomposition
  • Solution use two relations to store Person
    information
  • Person1 (SSN, Name, Address)
  • Hobbies (SSN, Hobby)
  • The decomposition is more general people with
    hobbies can now be described
  • No update anomalies
  • Name and address stored once
  • A hobby can be separately supplied or deleted

8
Normalization Theory
  • Result of E-R analysis need further refinement
  • Appropriate decomposition can solve problems
  • The underlying theory is referred to as
    normalization theory and is based on functional
    dependencies (and other kinds, like multi-valued
    dependencies)

9
Functional Dependence
  • Existence dependence The existence of B depends
    on A
  • Functional dependence Bs value depends on As
    value
  • EmpName is functionally dependent on EmpNo
  • Given the EmpNo, I can one and only one value of
    EmpName
  • Constraints on the set of legal relation
    instances
  • Require that the value for a certain set of
    attributes determines uniquely the value for
    another set of attributes.
  • Functional dependence is a generalization of the
    notion of a key

10
Functional Dependencies
  • Definition A functional dependency (FD) on a
    relation schema R is a constraint X ? Y, where X
    and Y are subsets of attributes of R
  • An FD X ? Y is satisfied in an instance r of R if
    for every pair of tuples, t1 and t2 if t1 and
    t2 agree on all attributes in X then they must
    agree on all attributes in Y
  • Key constraint is a special kind of functional
    dependency all attributes of relation occur on
    the right-hand side of the FD
  • SSN ? SSN, Name, Address

11
Functional Dependencies
  • Let R be a relation schema ? ? R, ? ? R
  • The functional dependency ? ? ?
  • Definition
  • If two tuples agree on the attributes
  • then they must also agree on the attributes
  • Formally

R ( A, B, C, D, E ) ? A, B, C ? C, D
A1 , A2 , An
B1 , B2 , Bm
12
Functional Dependencies
  • holds on R if and only if for any legal relation
    r(R), whenever any two tuples t1 and t2 of r
    agree on the attributes ?, they also agree on the
    attributes ?. That is,
  • t1? t2? ? t1? t2?
  • True for all tuple pairs
  • True for all instances

13
Examples
  • EmpID Name, Phone, Position
  • Position Phone
  • but Phone Position

14
In General
  • To check A ? B, erase all other columns
  • check if the remaining relation is many-one
    (called functional in mathematics)

15
Typical Examples of FDs
Product name ? price, manufacturerPerson
ssn ? name, ageCompany name ?
stockprice, president
  • loan-number ? amountloan-number ?
    branch-nameloan-number ? customer-name

?
Another example reverse of the fds above
16
Alternative Definitions of Keys
  • K is a superkey for relation schema R if and only
    if K ? R
  • This is the uniqueness property of key
  • K is a candidate key for R if and only if
  • K ? R, and
  • there is no ? ? K, ? ? R ?make sure key is
    shortest possible (minimality)

17
Use of Functional Dependencies
  • We use functional dependencies to
  • test relations to see if they are legal under a
    given set of functional dependencies. If a
    relation r is legal under a set F of functional
    dependencies, we say that r satisfies F.
  • Specify constraints on the set of legal
    relations we say that F holds on R if all legal
    relations on R satisfy the set of functional
    dependencies F.

A specific instance of a relation schema may
satisfy a functional dependency even if the
functional dependency does not hold on all legal
instances. For example, a specific instance of
Loan-schema may, by chance, satisfy loan-number ?
customer-name.
18
Armstrongs Axioms for FDs
  • This is the syntactic way of computing/testing
    the various properties of FDs
  • Reflexivity If Y ? X, then X ? Y (trivial FD)
  • Name, Address ? Name
  • Augmentation If X ? Y, then X Z ? YZ
  • If Town ? Zip then Town, Name ? Zip, Name
  • Transitivity If X ? Y and Y ? Z, then X ? Z

19
Derived inference rules
  • These additional rules are not essential their
    soundness can be proved using Armstrongs Axioms
  • Union if X Y and X Z, then X YZ
  • Decomposition if X YZ, then X Y and X Z
  • Pseudo-transitivity if X Y and WY Z, then WX
    Z
  • Exercise Prove rules Decomposition and
    Pseudo-transitivity using A.A

20
Entailment, Closure, Equivalence
  • Definition If F is a set of FDs on schema R and
    f is another FD on R, then F entails f if every
    instance r of R that satisfies every FD in F also
    satisfies f
  • Ex F A ? B, B? C and f is A ? C
  • If Streetaddr ? Town and Town ? Zip then
    Streetaddr ? Zip
  • Definition The closure of F, denoted F, is the
    set of all FDs entailed by F
  • Definition F and G are equivalent if F entails G
    and G entails F

21
Closure of a Set of Functional Dependencies
  • Given a set of functional dependencies F, there
    are certain other functional dependencies that
    are logically implied by F.
  • The set of all functional dependencies logically
    implied by F is the closure of F.
  • We denote the closure of F by F.
  • We can find all of F by applying Armstrongs
    Axioms
  • if ? ? ?, then ? ? ? (reflexivity)
  • if ? ? ?, then ?? ? ?? (augmentation)
  • if ? ? ? and ?? ?, then ? ? ? (transitivity)
  • these rules are sound and complete

22
Derived inference rules for F
  • We can further simplify computation of F by
    using the following additional rules.
  • If ? ? ? holds and ? ? ? holds, then ? ? ?? holds
    (union)
  • If ? ? ?? holds, then ? ? ? holds and ? ? ? holds
    (decomposition)
  • If ? ? ? holds and ?? ? ? holds, then ?? ? ?
    holds (pseudo-transitivity)
  • The above rules can be inferred from Armstrongs
    axioms.
  • E.g., ? ? ?, ?? ? ? (given)
  • ?? ? ?? (by augmentation)
  • ?? ? ? (by transitivity)

23
Exercise
  • Given loan-no? amount
  • Does loan-no, branch-name ? amount
  • Why???
  • It is not covered by any of the above axioms, so
    we must derive it
  • loan-no, branch-name ? loan-no (reflexivity)
  • loan-no? amount (given)
  • loan-no, branch-name ? amount (transitivity)

24
Examples of Armstrongs Axioms
  • We can find all of F by applying Armstrongs
    Axioms
  • if ? ? ?, then ? ? ? (reflexivity)loan-no ?
    loan-no loan-no, amount ? loan-noloan-no,
    amount ? amount
  • if ? ? ?, then ?? ? ?? (augmentation)loan-no ?
    amount (given)loan-no, branch-name ? amount,
    branch-name
  • if ? ? ? and ?? ?, then ? ? ? (transitivity)loan-
    no ? branch-name (given) branch-name ?
    branch-city (given)loan-no ? branch-city

25
Example
  • R (A, B, C, G, H, I)F A ? B A ? C CG
    ? H CG ? I B ? H
  • some members of F
  • A ? H
  • by transitivity from A ? B and B ? H
  • AG ? I
  • by augmenting A ? C with G, to get AG ? CG
    and then transitivity with CG ? I
  • CG ? HI
  • from CG ? H and CG ? I union rule can be
    inferred from
  • definition of functional dependencies, or
  • Augmentation of CG ? I to infer CG ? CGI,
    augmentation ofCG ? H to infer CGI ? HI, and
    then transitivity

26
Generating F
F AB? C
AB? BCD A? D AB? BD
AB? BCDE AB? CDE D? E
BCD ? BCDE
union
decomp
aug
trans
Aug,ref
Thus, AB? BD, AB ? BCD, AB ? BCDE, and AB ? CDE
are all elements of F
27
Attribute Closure
  • Calculating attribute closure leads to a more
    efficient way of checking entailment
  • The attribute closure of a set of attributes, X,
    with respect to a set of functional dependencies,
    F, (denoted XF) is the set of all attributes,
    A, such that X ? A
  • X F1 is not necessarily the same as X F2 if F1
    ? F2
  • Attribute closure and entailment
  • Algorithm Given a set of FDs, F, then X ? Y if
    and only if XF ? Y

28
Closure of Attribute Sets
  • Define the closure of ? under F (denoted by ?)
    as the set of attributes that are functionally
    determined by ? under F ? ? ? is in F ? ? ?
    ?Given loan-noIf loan-no ? amountthen amount
    is part of loan-no I.e., loan-no
    loan-no,amount, If loan-no ? branch-namethen
    branch-name is part of loan-no I.e., loan-no
    loan-no,amount, branch-name If loan-no ?
    customer-name then continue .Else stop

? is a set of attributes
29
Example Computing Attribute Closure XF
X XF A A, D, E AB
A, B, C, D, E (Hence AB is
a key) B B D D, E
F AB ? C A ? D D ? E AC
? B
Is AB ? E entailed by F? Yes Is D? C
entailed by F? No Result XF allows us
to determine FDs of the form X ? Y entailed by F
30
Computation of Attribute Closure XF
closure X // since X ?
XF repeat old closure if there is an
FD Z ? V in F such that Z ?
closure and V ? closure then closure
closure ? V until old closure If T ?
closure then X ? T is entailed by F
31
Example Computation of Attribute Closure
Problem Compute the attribute closure of AB with
respect to the set of FDs
AB ? C (a) A ? D (b)
D ? E (c) AC ? B (d)
Solution
Initially closure AB Using (a) closure
ABC Using (b) closure ABCD Using (c)
closure ABCDE
32
Normal Forms
  • Each normal form is a set of conditions on a
    schema that guarantees certain properties
    (relating to redundancy and update anomalies)
  • First normal form (1NF) is the same as the
    definition of relational model (relations sets
    of tuples each tuple sequence of atomic
    values)
  • Second normal form (2NF) a research lab
    accident has no practical or theoretical value
    wont discuss
  • The two commonly used normal forms are third
    normal form (3NF) and Boyce-Codd normal form
    (BCNF)

33
First Normal Form 1NF
  • Atomicity is actually a property of how the
    elements of the domain are used.
  • E.g. Strings would normally be considered
    indivisible
  • Suppose that students are given roll numbers
    which are strings of the form CS0012 or EE1127
  • If the first two characters are extracted to find
    the department, the domain of roll numbers is not
    atomic
  • Doing so is a bad idea leads to encoding of
    information in application program rather than in
    the database

34
First Normal Form 1NF
  • A relation is said normalized (or in first normal
    form 1NF), if and only if it possesses a key and
    that every value of column is atomic
  • Example The relation TYPE_OF_BOOK following is
    not 1NF
  • Possible normalization

35
Second Normal Form 2NF
  • A relation is said in second normal form (2NF),
    if and only if it is 1NF and that every column
    non key depends elementarily on the primary key
  • Example Let's consider the relation R(A, B, C,
    D) with the following functional dependencies
  • A ? B A, D ? C B ? C
  • The graph of the functional dependencies is the
    next one
  • Transitive closing ?
  • Key of the relation ?
  • Relation in second normal form?

36
Second Normal Form 2NF
The relation is not 2NF because B that doesn't
belong at the key depends on a part of the key
that is A, D
R1(A, B) A ? B R2(A, D, C) A, D ? C R3(B, C)
B ? C
Decomposition in 3 relations 2NF
37
Third Normal Form 3NF
  • A relation is said in third normal form (3NF), if
    and only if it is 2NF and that none of its
    attributes non keys depends on another attribute
    non key
  • Example Let's consider the relation R(A, B, C,
    D) with the following functional dependencies
  • A ? B B ? C C ? D
  • The graph of the functional dependencies is the
    next one
  • Transitive closing ?
  • Key of the relation ?
  • Relation in third normal form?

38
Third Normal Form 3NF
The relation is not 3NF because C depends on an
attribute non key (D also)
R1(A, B) A ? B R2(B, C) B ? C R3(C, D) C ? D
Decomposition in 3 relations 3NF
39
Boyce-Codd Normal Form BCNF
  • Let's consider relation Wines (Raw, Country,
    Region) with the supposes functional dependences
  • Region ? Country
  • (Raw, Country) ? Region
  • This relation is well in third normal form
    because no attribute non key doesn't depends on a
    part of the key or on an attribute non key.
    However, one finds numerous redundancies

40
Boyce-Codd Normal Form BCNF
  • In order to eliminate these redundancies, Boyces
    and Codds introduced a normal form that carries
    their names (Boyce Codd Normal Form / BCNF)
  • A relation is in BCNF if and only if the only
    elementary functional dependencies are those in
    which a key determines an attribute
  • Relation Wines will be able to be decomposed in
    two relations 
  • Raw (Raw, Region)
  • Regions (Region, Country)
  • The functional dependence (Raw, Country) ? Region
    is lost but it can be recomposed by joint.

41
Boyce-Codd Normal Form BCNF
  • Definition A relation schema R is in BCNF if for
    every FD X ? Y associated with R either
  • Y ? X (i.e., the FD is trivial) or
  • X is a superkey of R
  • Example Person1(SSN, Name, Address)
  • The only FD is SSN ? Name, Address
  • Since SSN is a key, Person1 is in BCNF
  • (non) BCNF Examples Person (SSN, Name, Address,
    Hobby)
  • The FD SSN ? Name, Address does not satisfy
    requirements of BCNF
  • since the key is (SSN, Hobby)

42
Comparison of BCNF and 3NF
  • A relational schema R is in 3NF if for every FD
    X ? Y associated with R either
  • Y ? X (i.e., the FD is trivial) or
  • X is a superkey of R or
  • Every A ? Y is part of some key of R
  • 3NF is weaker than BCNF (every schema that is in
    BCNF is also in 3NF)

BCNF conditions
43
Redundancy
  • Suppose R has a FD A ? B. If an instance has 2
    rows with same value in A, they must also have
    same value in B (gt redundancy, if the A-value
    repeats twice)
  • If A is a superkey, there cannot be two rows with
    same value of A
  • Hence, BCNF eliminates redundancy

SSN ? Name, Address SSN Name
Address Hobby 1111 Joe 123 Main
stamps 1111 Joe 123 Main coins
redundancy
44
Decompositions
  • Goal Eliminate redundancy by decomposing a
    relation into several relations in a higher
    normal form
  • Decomposition must be lossless it must be
    possible to reconstruct the original relation
    from the relations in the decomposition
  • We will see why

45
Decomposition
  • Schema R (R, F)
  • R is a set of attributes
  • F is a set of functional dependencies over R
  • The decomposition of schema R is a collection of
    schemas Ri (Ri, Fi) where
  • R ?i Ri for all i (no new attributes)
  • Fi is a set of functional dependences involving
    only attributes of Ri
  • F entails Fi for all i (no new FDs)
  • The decomposition of an instance, r, of R is a
    set of relations ri ?Ri(r) for all i

46
Example Decomposition
Schema (R, F) where R SSN, Name,
Address, Hobby F SSN? Name, Address
can be decomposed into R1 SSN, Name,
Address F1 SSN ? Name, Address and
R2 SSN, Hobby F2
47
Relational Database Design Review
  • Two approaches to DB design
  • 1) Design ER model, then translate to relation
    schemes
  • 2) Put every attribute together in one relation,
    identify all the functional dependencies, and
    then decompose into 3NF at least
  • The first approach is more popular, but
    relational theory helps formalizing some concepts
    such as key (what does it mean by A key uniquely
    identifies the tuples?)
  • Identifying the FDs is part of the DB design
    process it helps you understand the requirements
    better

48
Relational Normalization Summary
  • Normalization theory provides a mechanism for
    analyzing and refining the schema produced by an
    E-R design.
  • Result of E-R analysis need further refinement
  • Eliminate redundancy by decomposing a relation
    into several relations in a higher normal form
  • The underlying theory is referred to as
    normalization theory and is based on functional
    dependencies
  • ? ? ?, if and only if, ? t1 and t2 in r(R) t1?
    t2? ? t1? t2?
  • Armstrongs Axioms for FD
  • Derived inference rules
  • Entailment, Closure, Equivalence
  • Each normal form is a set of conditions on a
    schema that guarantees certain properties
    (relating to redundancy and update anomalies)
  • The two commonly used normal forms are 3NF and
    BCNF
  • Every schema that is in BCNF is also in 3NF
Write a Comment
User Comments (0)
About PowerShow.com