Title: Relational Database Design Chapter 8 [1 of 2]
1Relational Database DesignChapter 8 1 of 2
- Features of Good Relational DesignsAtomic
Domains and First Normal FormDecomposition Using
Functional DependenciesFunctional-Dependency
Theory
2 Approaches toDesigning a Set of Relations
- The 1st approach is (Chapter 7) a Top-Down Design
(Relational Design by Analysis) - 1. Designing a conceptual schema in a high-level
data model, such as the EER model - 2. Mapping the conceptual schema into a set of
relations using mapping procedures.
3 Approaches toDesigning a Set of Relations
- The 2nd approach is (Chapter 8) a Bottom-Up
Design (Relational Design by Synthesis) - 1. First construct a minimal set of FDs
- Assumes that all possible functional dependencies
are known. - 2. a normalization algorithm is applied to
construct a target set of 3NF or BCNF relations. - start by one large relation schema, called the
universal relation, which includes all the
database attributes. - then repeatedly perform decomposition until it is
no longer feasible or no longer desirable, based
on the functional and other dependencies
specified by the database designer.
4Well-Structured Relations
- A relation that contains minimal data redundancy
- Allows users to insert, delete, and update rows
without causing data inconsistencies - Goal is to avoid "anomalies"
General rule of thumb A table should not pertain
to more than one entity type
5Anomalies
- An anomaly is an inconsistent, incomplete, or
contradictory state of the database - Insertion anomaly user is unable to insert a
new record when it should be possible to do so - Deletion anomaly when a record is deleted,
other information that is tied to it is also
deleted - Update anomaly a record is updated, but other
appearances of the same items are not updated
6Example
QuestionIs this a relation?
AnswerYes Unique rows
QuestionWhats the primary key?
AnswerComposite Emp_ID, Course_Title
7Anomalies in this Table
- Insertionwe need to specify a class when adding
a new person, need to add person when adding a
class - Deletionif we remove employee 140, we lose
information about the existence of a Tax Acc
class - Modificationgiving a salary increase to employee
100 forces us to update multiple records
Why do these anomalies exist? Because there are
two themes (entity types) in this one relation.
This results in data duplication and an
unnecessary dependency between the entities
8Example of Bad Design
DRINKER(name, addr, beersLiked, manf,
favBeer) name addr beersLiked manf favBeer Jane
way Voyager Bud A.B. WickedAle Janeway ??? Wicke
dAle Petes ??? Spock Enterprise Bud ??? Bud
Data is redundant, because each of the ???s can
be figured out since name implies addr favBeer
and beersLiked implies manf.
9This Bad Design AlsoExhibits Anomalies
name addr beersLiked manf favBeer Janeway Voya
ger Bud A.B. WickedAle Janeway Voyager WickedAle
Petes WickedAle Spock Enterprise Bud A.B. Bud
- Update anomaly if Janeway is transferred to
Intrepid, - will we remember to change each of her tuples?
- Deletion anomaly If nobody likes Bud, we lose
track - of the fact that Anheuser-Busch manufactures
Bud.
10Objectives of Normalization
- Develop a "good" description of the data, its
relationships and constraints - Produce a stable set of relations that
- Is a faithful model of the enterprise
- Is highly flexible
- Reduces redundancy
- saves space and reduces inconsistency in data
- Is free of update, insertion and deletion
anomalies
11Normal Forms
- First Normal Form - 1NF
- Second Normal Form - 2NF
- Third Normal Form - 3NF
- Boyce-Codd Normal Form - BCNF
- Fourth Normal Form - 4NF
- Fifth Normal Form - 5NF
- Domain/Key Normal Form DKNF
- Others
12Steps in normalization
12
13Data Normalization Background
- Info Needed
- all the attributes that must be incorporated into
the database - a list of all the defining associations between
the attributes (i.e., the functional
dependencies). - a means of expressing that the value of one
particular attribute is associated with a single,
specific value of another attribute. - If we know that one of these attributes has a
particular value, then the other attribute must
have some other value.
14First Normal Form
- A relation is in 1NF if every attribute is
single-valued for each tuple - Each cell of the table has only one value in it
- Domains of attributes are atomic no sets, lists,
repeating fields or groups allowed in domains
15Table with multivalued attributes, not in 1st
normal form
15
16Table with no multivalued attributes and unique
rows, in 1st normal form
Note this is a relation, but not a
well-structured one
16
17Anomalies in this Table
- Insertionif new product is ordered for order
1007 of existing customer, customer data must be
re-entered, causing duplication - Deletionif we delete the Dining Table from Order
1006, we lose information concerning this item's
finish and price - Updatechanging the price of product ID 4
requires update in several records
Why do these anomalies exist? Because there are
multiple themes (entity types) in one relation.
This results in duplication and an unnecessary
dependency between the entities
18Ensuring 1NF 1 of 3
- Best solution For each multi-valued attribute,
create a new table, in which you place the key of
the original table and the multi-valued
attribute. Keep the original table, with its key - Example
- STUDENT(stuId, lastName, major, credits,
status, socSecNo) - becomes
- NEWSTUDENT(stuId, lastName, credits,status,
socSecNo) - MAJORS(stuId, major)
19Another method for 1NF 2 of 3
- Flatten the original table by making the
multi-valued attribute part of the key - Student(stuId, lastName, major, credits, status,
socSecNo)
20A Third Method for 1NF 3 of 3
- If the number of repeats is limited, make
additional columns for multiple values - Student(stuId, lastName, major1, major2, credits,
status, socSecNo) - Hannay Reels
- 3 Product Types, 2 Address Lines
21Just Getting Started
- First normal form is merely a starting point in
the normalization process. - First normal form contains a great deal of data
redundancy.
22Functional Dependency
- A functional dependency (FD) is a type of
relationship between attributes - If A and B are sets of attributes of relation R,
we say B is functionally dependent on A if each A
value in R has associated with it exactly one
value of B in R. - Recall the concept of "single-valued" function
- Alternatively, if two tuples have the same A
values, they must also have the same B values - Write A?B, read A functionally determines B, or B
functionally dependent on A
23Example of FDs
- Let R be
- NewStudent(stuId, lastName, major, credits,
status, socSecNo) - FDs in R include
- stuId ? lastName, but not the reverse
- stuId ? lastName, major, credits, status,
socSecNo, stuId - socSecNo ? stuId, lastName, major, credits,
status, socSecNo - credits ? status, but not status?credits
- A functional dependency is a generalization of
the notion of a key.
24"General Hardware" SALESPERSON and PRODUCT
Functional Dependencies
25Beer Example
- Drinkers(name, addr, beersLiked, manf, favBeer).
- Reasonable FDs to assert
- name ? addr
- name ? favBeer
- beersLiked ? manf
26Related Attributes
name addr beersLiked manf favBeer Janeway
Voyager Bud A.B. WickedAle Janeway Voyager
WickedAle Petes WickedAle Spock Enterprise Bud
A.B. Bud
27FDs With Multiple Attributes
- No need for FDs with gt 1 attribute on right.
- But sometimes convenient to combine FDs as a
shorthand. - Example name ? addr and name ? favBeer become
name ? addr, favBeer - gt 1 attribute on left may be essential.
- Example bar, beer ? price
28FDs From Laws of Physics
- While most FDs come from E-R keyness and
many-one relationships, some are really physical
laws. - Example "no two courses can meet in the same
room at the same time" tells us - hour, room ? course.
29Trivial Functional Dependency
- The FD X?Y is trivial if set Y is a subset of
set X - Examples If A and B are attributes of R,
- A?A
- A,B ?A
- A,B ?B
- A,B ?A,B
- are all trivial FDs
30Full Functional Dependency
- In relation R, set of attributes B is fully
functionally dependent on set of attributes A of
R, if B is functionally dependent on A but not
functionally dependent on any proper subset of A - This means every attribute in A is needed to
functionally determine B
31Partial Functional Dependency
- NewClass(courseNo, stuId, stuLastName, facId,
schedule, room, grade) - FDs
- courseNo,stuId ? grade
- courseNo,stuId ? lastName partial FD
- stuId ? lastName
- courseNo,stuId ? facId partial FD
- courseNo ? facId
- courseNo,stuId ? schedule partial FD
- courseNo ? schedule
- courseNo,stuId ? room partial FD
- courseNo ?room
- plus trivial FDs that are partial
32Functional Dependencies and Keys
- Functional Dependency The value of one attribute
(the determinant) determines the value of another
attribute - Candidate Key
- A unique identifier. One of the candidate keys
will become the primary key - E.g. perhaps there is both credit card number and
SS in a tablein this case both are candidate
keys - Each non-key field is functionally dependent on
every candidate key
33Candidate Key
- Which candidate key to pick depends on the
application using the database. - Alternate key is a candidate key that was not
chosen to be the primary key of the relation.
34Second Normal Form
- A relation is in second normal form (2NF) if it
is in first normal form and all the non-key
attributes are fully functionally dependent on
the key. - No non-key attribute is FD on just part of the
key - If key has only one attribute and R is 1NF, then
R is automatically 2NF
35Converting to 2NF
- Identify each partial FD
- Remove the attributes that depend on each of the
determinants so identified - Place these determinants in separate relations
along with their dependent attributes - In original relation keep the composite key and
any attributes that are fully functionally
dependent on all of it - Even if the composite key has no dependent
attributes, keep that relation to connect the
others
362NF Example
- NEWCLASS(courseNo, stuId, stuLastName, facId,
schedule, room, grade ) - FDs grouped by determinant
- courseNo ? courseNo, facId, schedule, room
- stuId ? stuId, lastName
- courseNo, stuId ? courseNo, stuId, facId,
schedule, room, lastName, grade - Create tables grouped by determinants
- COURSE(courseNo, facId, schedule, room)
- STUD(stuId, lastName)
- Keep relation with original composite key, with
attributes FD on it, if any - NEWSTUD2( courseNo, stuId, grade)
37General Hardware Company Second Normal Form
- In SALESPERSON, Salesperson Number is the sole
primary key attribute. Every nonkey attribute of
the table is fully defined just by Salesperson
Number. - Similar logic for PRODUCT and QUANTITY tables.
38General Hardware Company Second Normal Form
39Transitive Dependency
- If A, B, and C are attributes of relation R, such
that A ? B, and B ? C, then C is transitively
dependent on A -
- Example
- NewStudent (stuId, lastName, major, credits,
status) - FD
- credits?status
- By transitivity
- stuId?credits AND credits?status implies
stuId?status - Transitive dependencies cause update, insertion,
deletion anomalies.
40Third Normal Form
- To be 3NF, relation must be 2NF and have no
transitive dependencies - No non-key attribute determines another non-key
attribute. Here key includes candidate key
41Making a relation 3NF
- For example,
- NEWSTUDENT(stuId, lastName, major, credits,
status) - with FD credits?status
- Remove the dependent attribute, status, from the
relation - Create a new table with the dependent attribute
and its determinant, credits - Keep the determinant, but not the dependent
attribute, in the original table - NEWSTUD2 (stuId, lastName, major, credits)
- STATS (credits, status)
42Functional dependency diagram for INVOICE
Order_ID ? Order_Date, Customer_ID,
Customer_Name, Customer_Address
Customer_ID ? Customer_Name, Customer_Address
Product_ID ? Product_Description, Product_Finish,
Unit_Price
Order_ID, Product_ID ? Order_Quantity
Therefore, NOT in 2nd Normal Form
43Removing partial dependencies
Getting it into Second Normal Form
Partial dependencies are removed, but there are
still transitive dependencies
44Removing Transitive dependencies
Getting it into Third Normal Form
Transitive dependencies are removed
44
45General Hardware Company Second Normal Form
46General Hardware Company Third Normal Form
47Third Normal Form
- Important points about the third normal form
structure are - It is completely free of data redundancy.
- All foreign keys appear where needed to logically
tie together related tables.
48Violating 3NF?
- Zip Codes determine City State
- AND Company (therefore CustID) determines Zip
- Hannay Reels CustomerMaster does not split out
zip/city/state into another table - Why might this be?
- I screwed up???
- ease/speed of access
- get full address with one data fetch (one table)
- lack of repetition
- (not many customers at same zip code in same
city) - foreign customers (no zip)
49Closure of a Set of Functional Dependencies
- Given a set F of functional dependencies, there
are certain other functional dependencies that
are logically implied by F. - For example If A ? B and B ? C, then we can
infer that A ? C - The set of all functional dependencies logically
implied by F is the closure of F. - We denote the closure of F by F.
- F is a superset of F.
50Closure of a Set of Functional Dependencies
- We can find F, the closure of F, by repeatedly
applying Armstrongs Axioms - if ? ? ?, then ? ? ?
(reflexivity) - if ? ? ?, then ? ? ? ? ?
(augmentation) - if ? ? ?, and ? ? ?, then ? ? ? (transitivity)
- These rules are
- sound (generate only functional dependencies that
actually hold), and - complete (generate all functional dependencies
that hold).
51Example
- R (A, B, C, G, H, I)F A ? B A ? C CG
? H CG ? I B ? H - some members of F
- A ? H
- by transitivity from A ? B and B ? H
- AG ? I
- by augmenting A ? C with G, to get AG ? CG
and then transitivity with CG ? I - CG ? HI
- by augmenting CG ? I to infer CG ? CGI,
- and augmenting of CG ? H to infer CGI ? HI,
- and then transitivity
52Procedure for Computing F
- To compute the closure of a set of functional
dependencies F - F Frepeat for each functional
dependency f in F apply reflexivity and
augmentation rules on f add the resulting
functional dependencies to F for each pair of
functional dependencies f1and f2 in F
if f1 and f2 can be combined using
transitivity then add the resulting functional
dependency to F until F does not change any
further
53Closure of Functional Dependencies (Cont.)
- Additional rules
- If ? ? ? holds and ? ? ? holds, then ? ? ? ?
holds (union) - If ? ? ? ? holds, then ? ? ? holds and ? ? ?
holds (decomposition) - If ? ? ? holds and ? ? ? ? holds, then ? ? ? ?
holds (pseudotransitivity) - The above rules can be inferred from Armstrongs
axioms.
54Canonical Cover
- Sets of functional dependencies may have
redundant dependencies that can be inferred from
the others - For example A ? C is redundant in A ? B,
B ? C, A? C - Parts of a functional dependency may be redundant
- E.g. on RHS A ? B, B ? C, A ? CD can
be simplified to A ?
B, B ? C, A ? D - E.g. on LHS A ? B, B ? C, AC ? D can
be simplified to A ?
B, B ? C, A ? D - Intuitively, a canonical cover of F is a
minimal set of functional dependencies
equivalent to F, having no redundant dependencies
or redundant parts of dependencies - Every set of FDs has an equivalent minimal set
- There can be several equivalent minimal sets
55Equivalent FDs
- Two sets of FDs F and G are equivalent if
- Every FD in F can be inferred from G, and
- Every FD in G can be inferred from F
- Hence, F and G are equivalent if F G
- Definition (Covers)
- F covers G if every FD in G can be inferred from
F - (i.e., if G is subset-of F)
- F and G are equivalent if F covers G and G covers
F
56Finding a Minimal Cover
- Finding a minimal cover G for F
- 1. Set G F.
- 2. Replace each FD X?A1, A2, ..., Ak in G by
the n functional dependencies X?A1, X?A2 , ,
X?Ak. - 3. For each FD X?A in G
- For each attribute B that is an element of X
- if ((G -X?A) ? (X-B)?A) is equivalent to G,
- then replace X?A with (X-B)?A in G.
- 4. For each remaining FD X?A in G,
- if (G-X?A) is equivalent to G, then remove X?A
from G.
57Finding a Minimal CoverExample 1
- Example
- A?B, ABCD?E, EF?GH, ACDF?EG
- Make RHS a single attribute
- A?B, ABCD?E, EF?G, EF?H, ACDF?E, ACDF?G
- Minimize LHS ACD?E instead of ABCD?E
- Eliminate redundant FDs
- Can ACDF?G be removed?
- Can ACDF?E be removed?
- Final answer A?B, ACD?E, EF?G, EF?H
58Canonical CoverExample 2
- Compute the minimal cover of the following set of
functional dependencies - ABC ? DE, BD ? DE, E ? CF, EG ? F
- ABC ? D
- ABC ? E
- BD ? D // reflexive
- BD ? E
- E ? C
- E ? F
- EG ? F // augmentation
- The minimal cover is
- ABC ? D, BD ? E, E ? C, E ? F
59Homework/Project
- Homework due Next Class
- 7.1, 7.3, 7.5, 7.6, 7.7, 7.8, 7.9, 7.10, 7.11,
7.12 - No Homework due in One Week (Test)
- Project
- Verify (or convert to) 1NF, 2NF, 3NF
60Test in One Week
- Chapters 6-8
- Give/Interpret Domain Relational Calculus
Expressions - Design an EER Diagram Convert to Relational
Schema - (tuples, primary keys, foreign keys)
- Convert a relation to 1NF using each of 3 methods
- Test for and/or Convert to 2NF, 3NF, BCNF, 4NF
- Find a 3NF lossless join decomposition that
preserves dependencies - List all non-trivial Functional Dependencies for
a relation - Give Closure Set for Functional Dependencies
- List Candidate Keys Give Canonical Cover
- Open Book
- 4 Sheets of Notes
61In-Class Exercise(You may work together. You are
free to go once you show me your work.)
- List the Functional Dependences in
- (EmpID, CourseTitle, Name, Phone, Salary,
DateCompleted) - 8.6 Given the relation R and Functional
Dependencies F, compute F - R (A, B, C, D, E)
- F A ? BC, B ? D, CD ? E, E ? A
- Find Candidate Keys (more than 1)
- Find Canonical Cover