Normalisation - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Normalisation

Description:

Term 2, 2004, Normalisation Marian Ursu, Department of Computing, Goldsmiths College ... [ Hint: you may use PHP code samples from pervious lab] ... – PowerPoint PPT presentation

Number of Views:60
Avg rating:3.0/5.0
Slides: 38
Provided by: marian5
Category:

less

Transcript and Presenter's Notes

Title: Normalisation


1
Normalisation
5
2
Outline
  • Boyce-Codd Normal Form (BCNF)
  • normalisation
  • non-loss decomposition
  • Heaths theorem
  • normalisation process
  • semantic assumptions and FDs
  • CKs
  • decomposition
  • normalisation vs dependency preservation
  • a decomposition may yield to a better solution
    than another one
  • either-or situations normalise or preserve FDs

3
1
4
2NF and 3NF equivalent definitions
  • 2NF (in Connollys book)
  • a relation is in 2NF if and only if it is in 1NF
    and all non-primary key attributes are
    irreducibly dependent (or fully functional
    dependent that is, the functional dependency is
    left irreducible) on the primary key.
  • 3NF (in Connollys book)
  • a relation is in 3NF if and only if it is in 2NF
    and no non-primary key attribute A is
    transitively dependent on the primary key X (that
    is, if one has a FD X-gtY one cannot find a FD
    Y-gtA that makes A to depend on X via Y).
  • These definitions can be generalised if one has
    more than one CK in the table (see Connollys)

5
BCNF
  • a relation is in Boyce/Codd normal form (BCNF) if
    and only if every non-trivial irreducible FD has
    a candidate key as its determinant
  • informally
  • the determinant of each relevant FD is a CK

6
Example
  • (M_id, M_name, Type, Value)
  • M_id ? M_name
  • M_id ? Type
  • M_id ? Value
  • Type ? Value problematic FD since Type is
    not key
  • not BCNF
  • (Type, Value)
  • (M_id, M_name, Type)
  • both in BCNF

7
BCNF
  • any relation can be non-loss decomposed into an
    equivalent set of BCNF relations
  • BCNF ? 3NF ? 2NF ? 1NF
  • BCNF is still not guaranteed to be free of any
    update anomalies

8
2
9
Normalisation
  • the process of transforming a relation with
    redundancies into an equivalent set of
    relations that have less redundancies
  • transformation ? projection
  • input one relation, say R
  • output many relations, say R1, , Rn
  • equivalent ? non-loss decomposition
  • R1 join R2 join Rn R
  • R1, , Rn should have normal forms higher than or
    equal to that of R

10
Non-loss decomposition
  • semantic assumptions
  • exercise

11
Lossy decomposition
  • semantic assumptions
  • exercise

12
Heaths theorem
  • can be used as the basis for normalisation
  • theorem
  • suppose
  • R (A, B, C), where A, B and C are disjoint sets
    of attributes
  • A?B
  • then
  • R (A, B) join (A, C)
  • state in English

13
Normalisation rules of thumb
  • take as basis for normalisation/Heaths theorem a
    problematic FD A?B
  • maximise B when applying Heaths theorem, on the
    basis of A?B
  • Example (stud_id, stud_name, module_id,
    module_name, level, result)
  • FDsmodule_id ?module_name FDs are
    replaced by what?
  • module_id ?level
  • module_id, stud_id ? result
  • stud_id ? stud_name
  • try to maintain a one-to-one correspondence with
    real life entities

14
Normalisation
  • steps
  • semantic assumptions
  • FDs
  • CKs
  • decomposition

15
Simple example
  • (M_id, M_name, Type, Value)
  • M_id ? M_name
  • M_id ? Type
  • M_id ? Value
  • Type ? Value
  • not BCNF
  • Apply Heaths theorem for Type ? Value
  • (Type, Value)
  • (M_id, M_name, Type)
  • both relations are now in BCNF

16
3
17
Example (R)
  • (project, task, max-budget, duration,
    payment-rate, contractor, contr-time)

FDs (project, task) ? max_budget, duration
(task, max_budget, duration) ? payment_rate
(project, task, contractor) ? contr_time
(project, task, max-budget, duration,
payment-rate, contractor, contr-time)
18
Example decomposition for R
  • Heaths theorem for R (the initial relation)
    based on
  • task, max_budget, duration ? payment_rate
  • leads to
  • R1 (task, max_budget, duration, payment_rate)
  • R2 (project, task, max_budget, duration,
    contractor, contr_time)
  • R1 is in BCNF
  • R2 is not in BCNF, due to
  • project, task ? max_budget, duration

19
Example decomposition for R2
  • Heaths theorem for R2, based on
  • project, task ? max_budget, duration
  • leads to
  • R21 (project, task, max_budget, duration)
  • R22 (project, task, contractor, contracted_time)
  • R21 is in BCNF
  • R22 is in BCNF

20
Example solution
  • (task, max_budget, duration, payment_rate)
  • (project, task, max_budget, duration)
  • (project, task, contractor, contracted_time)

21
Exercise
  • Consider the following table R with FDs.
    Decompose it in BCNF.
  • (patient, drug_id, drug_name, admin, start, end,
    dosage, special_diet)
  • 1) drug_id ? drug_name
  • 2) drug_id ? admin
  • 3) patient, drug_id ? start, end, dosage
  • 4) drug_id, dosage ? special_diet

22
4
23
Decomposition 2 or more solutions
  • in the normalisation process, it may be possible
    that a certain (non-loss) decomposition yields to
    a better solution than another one

24
Decomposition 2 solutions example
  • Modules(M_id, M_name, Type, Value)
  • FD Type ? Value M_id ?M_name, Type
  • solution 1
  • Modules_Descr(M_id, M_name, Type)
  • Type_Val(Type, Val)
  • solution 2
  • Modules_Descr(M_id, M_name, Type)
  • Module_Val(M_id, Val)
  • are they both non-loss? (find appropriate FDs)
  • is there one better than the other?

25
Solution 1 vs Solution 2
  • updates
  • u1 insert the fact that a 3 semester module is
    worth 1.5cu
  • u2 modify 1 semester modules they are not worth
    0.5cu any longer, they are 0.75cu
  • u3 change the type of a module but forget to
    change its value
  • solution 2
  • u1 and u2 are impossible or difficult to perform
  • u3 is allowed
  • solution 1
  • u1 and u2 are straightforward
  • u3 is not allowed

26
Solution 1 vs Solution 2
  • solution 1 more expressive and better
  • certain facts cannot be expressed in solution 2
    e.g. the value of a new type
  • in solution 2 Type ? Value is lost, so this
    constraint must be enforced by the user by
    procedural code

27
Independent projections (smaller tables)
M_name
M-id
Type
Value
Solution 1
Solution 2
one FD is lost Type ?Value Additional code using
two tables needs to be written to impose
FD, thus tables are dependent
all FD are preserved via Keys
28
5
29
Normalisation vs dependency preservation
  • there are cases when there is an either-or
    situation regarding the normalisation and the
    preserving of functional dependencies
  • either the relation is normalised and some FDs
    are lost
  • or, some FDs are not lost (they are expressed in
    the original relation), but the relation is not
    in its higher normal form possible (BCNF)
  • in this case, no solution is better than the
    other
  • other criteria will have to be considered to
    judge better

30
Normalisation vs dependency preservation Example
  • a patient is treated by a single doctor for a
    certain kind of disease
  • each doctor only treats one kind of disease
  • a doctor can treat more than one patient
  • What happens in all possible splits?
  • consider also (Patient, Disease, Doctor,
    Treatment)
  • with Patient, Disease ? Treatment

31
All possible decompositions
Not acceptable lossy decomposition. Why?
Not acceptable lossy decomposition. Why?
Heaths theorem Doctor ? Disease You get BCNF
Non-loss decomposition
32
BCNF vs dependency preservation
and
do not enforce a FD existing in the original
specification, namely
e.g. a patient can be given two doctors that
treat the same disease (the system will not
disallow this) the constraint would have to be
maintained by additional procedural code
33
BCNF vs dependency preservation
  • not every FD is preserved through normalisation
  • In initial table (Patient, Doctor, Disease) not
    BCNF
  • Doctor ? Disease could generate update anomalies
  • When table is decomposed in BCNF
  • (Patient, Doctor) (Doctor, Disease)
  • Doctor ? Disease is preserved
  • (Patient, Disease) ? Doctor is lost because you
    can insert a patient with two doctors, both
    treating same patients disease
  • Thus update (insert) anomalies
  • this latter FD would not have been preserved via
    any decomposition

34

35
Conclusions
  • normal forms formalization of common sense
  • art ? engineering
  • possibility for automation yes, but DB designer
    needs to assess result
  • BCNF
  • always achievable
  • not always completely free of update anomalies
  • because some FD might have been lost

36
Conclusions
  • Other normal forms based on other concepts
  • We study normalization up to BCNF
  • Sufficient for our purposes in database design
  • With normalization we finish the Database Logical
    Design
  • Next Physical Design

37
Practical work
  • Assume the table tutors_modules
  • (lectid, lectname, job, address, modid, modname,
    level)
  • Assumptions
  • A module can be taught by one ore more tutors
  • A tutor can teach one or more modules
  • lectid ? lectname, job, address
  • modid ? modname, level
  • Do the following
  • Normalize the table tutors_modules to BCNF
  • Connect to the DB server and create the tables
    you obtain, and populate them with rows
  • Write a PHP interface for the normalized tables,
    that would display the tutors names in a form.
    When you select one tutor name from the form, all
    the modules taught by him/her should be displayed
    (module names and levels). Hint you may use PHP
    code samples from pervious lab
  • If you finish, tackle Task 3 from the lab handout
    for current week
Write a Comment
User Comments (0)
About PowerShow.com