Lecture 22 - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Lecture 22

Description:

Use a data model language to come up with an accurate, high-level ... E.g. SSN DOB and DOB horoscope sign then. SSN horoscope sign. Inference Rules for FDs ... – PowerPoint PPT presentation

Number of Views:73
Avg rating:3.0/5.0
Slides: 30
Provided by: imadr
Category:

less

Transcript and Presenter's Notes

Title: Lecture 22


1
Lecture 22
  • Functional Dependencies (FDs) and
  • Normalization

2
Schedule Change
  • http//www.users.csbsju.edu/irahal/

3
The Database Design Process
  • Conceptual design
  • Use a data model language to come up with an
    accurate, high-level description of the system
    requirements
  • Words (unstructured) ? Diagrams
  • Logical design
  • Map the resulting EERD into a set of relations
  • Diagram ? Relations
  • Physical design
  • Use DDL on some DBMS to create tables
    corresponding to your relations

4
The Database Design Process
  • Limitations of E-R Designs
  • The EER model provides a set of guidelines
  • Does not result in a unique database schema
  • Does not provide a formal way of evaluating
    alternatives
  • Relies largely on the common sense of the
    designer
  • Here we try to answer
  • What are the criteria for "good" base relations? 
  • Meaningful grouping of attributes
  • When designing a relation schema, how to decide
    which attributes to include?
  • So far, attributes are grouped to form the
    relation schema by using the common sense of the
    database designer

5
The Database Design Process
  • First discuss informal guidelines for good
    relational design
  • Then we discuss formal concepts of functional
    dependencies and normal forms
  • - 1NF (First Normal Form)
  • - 2NF (Second Normal Form)
  • - 3NF (Third Normal Form)
  • - BCNF (Boyce-Codd Normal Form)
  • Additional types of dependencies, further normal
    forms, relational design algorithms by synthesis
    are discussed in Chapter 11

6
Relation Schema Informal Measures
  • We have some informal measures
  • Semantics of the attributes
  • Reducing the redundancy values in tuples
  • Disallowing the possibility of generating
    spurious tuples
  • Reducing null values
  • Not always independent of one another

7
Semantics of the Relation Attributes (1)
  • Any grouping of attributes to form a relation
    schema must portray a certain real-world meaning
  • Each tuple in a relation should represent one
    entity or relationship instance
  • Guideline 1 Design a relation schema so that it
    is easy to explain its meaning
  • Semantics of attributes should be easy to
    interpret
  • Attributes of different entities should not be
    mixed
  • Only foreign keys should be used to refer to
    other entities

8
Relations that violate Guideline 1 by intermixing
attributes from different relations
9
Relation schemas that abide by Guideline 1
10
Redundant Information (2)
  • One goal of schema design is to reduce redundancy
  • Information is stored redundantly wasting storage
  • Problems with update anomalies
  • Modification anomalies
  • Insertion anomalies
  • Deletion anomalies
  • Mixing attributes of multiple entities may cause
    the above problems

11
  • Update anomalies
  • Modification Anomalies
  • Update PNAME from ProductY to
    Customer-Accounting
  • Insert Anomalies
  • Insert a new employee not assigned to known
    project
  • Insert a new project with no working employees
  • Delete Project
  • Delete PNUMBER2
  • Delete the sole employee of a project

12
Modification Anomalies
  • Consider the relation
  • EMP_PROJ ( Emp, Proj, No_hours, Ename, Pname,
    Plocation)
  • Modification Anomaly
  • Changing the name of project number P2 from
    Project Y to Customer-Accounting
  • May cause this update to be made for all
    employees working on project P2 otherwise the DB
    will become inconsistent

13
Modification Anomalies
  • Consider the relation
  • EMP_PROJ( Emp, Proj, No_hours, Ename, Pname,
    Plocation)
  • Insert Anomaly Cannot insert a project unless
    an employee is assigned to
  • Inversely - Cannot insert an employee unless
    he/she is assigned to a project.
  • Delete Anomaly When a project is deleted ?
    delete all the employees who work on the project
  • Alternately, if an employee is the sole employee
    on a project, deleting that employee would result
    in deleting the corresponding project

14
Guidelines to Redundant Information in Tuples and
Update Anomalies
  • Guideline 2 Design a schema that does not suffer
    from the insertion, deletion and update anomalies
  • If there are any present, then note them so that
    applications can be made to take them into
    account
  • Might need to break the guidelines to improve
    performance for certain queries
  • Assume that we always access employee information
    only with department information
  • The design EMP_PROJ (Emp, Proj, No_hours,
    Ename, Pname, Plocation) might be could for such
    cases

15
Example
ER Model
SSN Name Address
Hobby 1111 Joe 123 Main biking,
hiking
Relational Model (SSN, Hobby, Name, Address)
16
Example
  • Redundancy leads to anomalies
  • A change in Address must be made in several
    places
  • Suppose a person gives up all hobbies. Do we
  • Set Hobby attribute to null? No, since Hobby is
    part of key
  • Delete the entire row? No, since we lose other
    information in the row
  • No hobby information?
  • Hobby value must be supplied for any inserted row
    since Hobby is part of key

17
Decomposition
  • Solution use two relations to store Person
    information
  • Person1 (SSN, Name, Address)
  • Hobbies (SSN, Hobby)
  • People with/without hobbies can now be described
  • No update anomalies
  • Name and address stored once
  • A hobby can be separately supplied or deleted

18
Spurious Tuples (3)
  • Bad designs for a relational database (or bad
    decompositions) may result in erroneous results
    for certain JOIN operations
  • Any decomposition MUST have the "lossless join"
    property
  • No spurious tuples should be generated by doing a
    natural-join of any decomposed relations
  • Person1 (SSN, Name, Address)
  • Hobbies (SSN, Name)
  • Here, loss relates to loss of information

19
Spurious Tuples (3)
  • Suppose we replace
  • EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION,
    HOURS) by
  • EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS)
    ANDEMP_LOCS (ENAME, PLOCATION)
  • Guideline 3 The relations should be designed to
    satisfy the lossless join condition
  • Avoid relations that contain matching attributes
    that are not (foreign key, primary key)
    combinations

20
Null Values in Tuples (4)
  • Guideline 4 Relations should be designed such
    that their tuples will have as few NULL values as
    possible
  • Make sure only NULLs are exceptional cases
  • If many attributes do not apply to all tuples in
    the relation, we end up with many nulls
  • Waste space
  • Ambiguity in meaning
  • Attribute not applicable or invalid
  • Value known to exist, but unavailable
  • Attribute value unknown (may or may not exist)
  • Difficulty specifying JOIN operations (inner or
    outer joins)
  • Attributes that are NULL frequently could be
    placed in separate relations (with the primary
    key)

21
Functional Dependencies
  • Functional dependencies (FDs) are used to specify
    formal measures of the "goodness" of a
    relational database design
  • FDs are constraints that are derived from the
    meaning and interrelationships of the data
    attributes
  • An FD is a constraint between two sets of
    attributes X and Y
  • X ? Y holds if whenever two tuples have the same
    value for X, they must have the same value for Y
  • For any two tuples t1 and t2 in any relation
    instance r(R) If t1Xt2X, then t1Yt2Y

22
Functional Dependencies
  • X ? Y A set of attributes X functionally
    determines a set of attributes Y (or Y is
    functionally determined by X) if the value of X
    determines a unique value for Y
  • X ? Y in R specifies a constraint on all
    relation instances r(R)
  • FDs are derived from the real-world constraints
    on the attributes
  • Property of the intention of the database
  • An FD is a property of the attributes in the
    schema R
  • The constraint must hold on every relation
    instance r(R)
  • Can NEVER be deduced from an extension
  • E.g. if in some case, all people having the same
    first name are registered for the same course,
    can we deduce that name ? course?

23
Examples of FD Constraints
  • EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION,
    HOURS)
  • social security number determines employee name
  • SSN ? ENAME
  • project number determines project name and
    location
  • PNUMBER ? PNAME, PLOCATION
  • employee SSN and project number determines the
    hours per week that the employee works on the
    project
  • SSN, PNUMBER ? HOURS

24
More on FD Constraints
  • Definition of a relation KEY (If K is a key of R)
  • K functionally determines all attributes in R
  • If X?Y is true, does that make Y?X true?
  • Some FDs are always true regardless of the
    relation in which they occur
  • State, Driver_License_Number ? SSN
  • Zip ? City, State

25
Inference Rules for FDs
  • Given a set of FDs F, we can infer additional FDs
    that hold whenever the FDs in F hold
  • Armstrong's inference rules
  • IR1. (Reflexive) If Y ? X, then X ? Y
  • (Generates trivial FDs)
  • E.g. SSN, ENAME ? ENAME
  • IR2. (Augmentation) If X ? Y, then XZ ? YZ (Note
    that XZ stands for X U Z)
  • E.g. SSN ? ENAME, then SSN, PNUMBER ? ENAME,
    PNUMBER
  • IR3. (Transitive) If X ? Y and Y ? Z, then X ? Z
  • E.g. SSN ? DOB and DOB ? horoscope sign then SSN
    ? horoscope sign

26
Inference Rules for FDs
  • IR1, IR2, IR3 form a sound and complete set of
    inference rules
  • Sound ? Any rule inferred using IR1, IR2 or IR3 a
    valid FD
  • Complete ? All possible FDs can be generated
    using them
  • Some additional inference rules that are useful
  • IR4. (Decomposition) If X ? YZ, then X ? Y and X
    ? Z
  • SSN ? ENAME, DOB then SSN ? DOB SSN ? ENAME
  • IR5. (Union) If X ? Y and X ? Z, then X ? YZ
  • SSN ? DOB SSN ? ENAME then SSN ? ENAME, DOB
  • IR6. (Pseudo-transitivity) If X ? Y and WY ? Z,
    then WX ? Z
  • Can be deduced from IR1, IR2, and IR3
    (completeness property)
  • OfficeLocation? Department Department, Ename ?
    Salary-level then
  • OfficeLocation,Ename ? Salary-level

27
Proofs
  • IR1. (Reflexive) If Y ? X, then X ? Y
  • For any two tuples t1 and t2 with t1X t2X
    then t1Y t2Y because Y ? X
  • IR2. (Augmentation) If X ? Y, then XZ ? YZ
  • Proof by contradiction
  • If for two tuples t1 and t2 we have
  • (1) t1X t2X
  • (2) t1Y t2Y
  • (3) t1XZ t2XZ
  • (4) t1YZ ? t2YZ
  • Cant be true since from (1) and (3) we have (5)
    t1Z t2Z and from (2) and (5) we have t1YZ
    t2YZ which contradicts (4)

28
Proofs
  • IR3. (Transitive) If X ? Y and Y ? Z, then X ? Z
  • For any two tuples t1 and t2 with t1X t2X
    then t1Y t2Y which implies that t1Z
    t2Z hence X ? Z holds
  • IR4. (Decomposition) If X ? YZ, then X ? Y and X
    ? Z
  • X ? YZ
  • YZ ? Y (Using IR1)
  • X ? Y (Using IR3)
  • Similarly for X ? Z

29
Proofs
  • IR5. (Union) If X ? Y and X ? Z, then X ? YZ
  • X ? Y
  • X ? Z
  • X ? XY (Using IR1)
  • XY ? YZ (Using IR2)
  • X ? YZ (Using IR3)
  • IR6. (Psuedotransitivity) If X ? Y and WY ? Z,
    then WX ? Z
  • X ? Y
  • WY ? Z
  • WX ? WY (Using IR2)
  • WX ? Z (Using IR3)
Write a Comment
User Comments (0)
About PowerShow.com