Title: Relational Database Design Theory
1Relational Database Design Theory
- Database Design Concepts
- Update, deletion and insertion anomalies
- Functional dependency
2Database Design Concepts 0
- The attributes in a database are determined by
the set - of observables that determine the state of the
enterprise - These attributes can in general be clustered into
- subsets according to the contexts for observation
in - which they are encountered
- Database design is concerned with how attributes
can - be organised into plausible clusters of
attributes as - relations within a relational scheme more
formally ...
3Database Design Concepts 1
- A relational database scheme can be regarded as
- defined by an abstract relation R(A1, A2, ...,
An) where - A1, A2, ..., An are all the attributes of
interest. This is - simplest possible relation scheme for the
database. - In general, when setting up the database,
decompose - the relation R to form a new relational scheme.
- Issues
- what is the nature of this decomposition?
- does it matter what is associated in tables?
- if so, what principles apply to the design of
tables?
4Database Design Concepts 2
- Answer
- the association of attributes should be guided by
data dependency - failure to take account of dependency creates
anomalies on modifying the database - anomalies can be largely eliminated through an
appropriate choice of tables and normalisation - cf. an object-oriented approach, where attributes
are grouped according to the objects that contain
them
5Database Design Concepts 3
- The relational database design problem
- Any relational database could in principle be
organised as a single relation R(A1, A2, ..., An)
where A1, A2, ..., An are all the attributes of
interest. - Why not do it this way?
- convenience easy to handle small relations
- ... but also need to consider semantics of
attributes - In practice, typically choose to decompose R into
a set of smaller relations R1, R2, ..., Rn, where
the set of attributes in R the set of
attributes in R1, R2, ..., Rn.
6Database Design Concepts 4
- Illustrative example of design
- E.g. could take HVFC as comprising a single
relation - HVFC (NAME, ADDRESS, BALANCE, ORDER_NO,
- ITEM, QUANTITY, SNAME, SADDRESS, PRICE)
- defined by the natural join of the relations
- MEMBERS(NAME, ADDRESS, BALANCE)
- ORDERS(ORDER_NO, NAME, ITEM, QUANTITY)
- SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
- No information loss involved in this can recover
all the original relations by projection ...
7Database Design Concepts 5
- Data dependencies in the HVFC
- In HVFC DB, can infer some attributes from
others - e.g. know SNAME Þ know SADDRESS
- know SNAME and ITEM Þ know PRICE
- NB knowledge of such dependencies is derived from
real-world meaning of the attributes - This informs good decomposition of a relational
scheme
8Database Design Concepts 6
- E.g. if we have a single relation for HVFC, then
- every tuple with SNAME "S" has SADDRESS "A"
- Þ very high level of redundancy in the table
- Actually have this redundancy in the relation
- SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
- SMITH LONDON NUT 20p
- SMITH LONDON BOLT 25p
- JONES HULL PIN 15p
- .......
- ... this redundancy has consequences for update
9Database Design Concepts 7
- Anomalies and the SUPPLIER relation
- Form of the SUPPLIER relation leads to anomalies
on update, deletion and insertion - update anomalies when supplier address is
updated, must be updated in all tuples, else two
addresses recorded for same supplier - insertion anomalies can't record info on
supplier unless he supplies part, whence .... - deletion anomalies delete all items supplied by
one supplier and you lose supplier's address.
10Database Design Concepts 8
- A better design of the HVFC relation scheme will
- resolve these problems. Thus decomposition
- SA (SNAME, SADDRESS)
- SIP (SNAME, ITEM, PRICE)
- resolves the anomalies.
- Relational database design theory guides the
choice of decomposition for a given relation
scheme. - Use the data dependencies for this if no data
dependencies, then no decomposition
11Functional Dependency 1
- Formalising Data Dependency
- Let R º R(A1, A2, ..., An) be a relation scheme,
and let X - and Y be subsets of A1, A2, ..., An . Think
of R as an - intensional relation with many possible
extensions. - Definition of functional dependency
- "X functionally determines Y" if, in every
extension - r of R, two tuples in r which agree on the
attribute set X - also agree on Y.
12Functional Dependency 2
- Vocabulary and notation for functional dependency
- "X functionally determines Y" if, in every
extension r of - R, two tuples in r which agree on the attribute
set X also - agree on Y
- Also say "Y functionally depends on X and denote
- dependency by X Y to be read as "X determines
Y"
13Functional Dependency 3
- Notes on functional dependency
- Functional dependency is a semantic notion
defined in terms of the intensional rather than
extensional data-base. Can only infer (possibly)
that certain functional dependencies do not hold
by inspecting the extensional part of a db. - The set of functional dependencies can be viewed
as a set of integrity constraints on the relation
scheme, and where possible it should be preserved
under decomposition.
14Functional Dependency 4
- An illustrative example
- Abbreviate names of attributes in
- SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
- to S, A, I, P respectively. Use notation
- S A, SI P
- to represent functional dependencies (abbrev.
FDs). - The set of ALL functional dependencies is usually
large - SI AP, SP SAP, S SA, SI SAIP
- are all examples of valid functional dependencies
15Functional Dependency 5
- If X is a set (modulo discounting the empty set)
- a functional dependency on X ? a pair of subsets
of X - ? each functional dependency on X ? ?(X) ? ?(X)
- If X is the basis of a relational scheme, then
the set of all functional dependencies on X is a
special kind of subset of ?(X) ? ?(X). - Set X has N attributes ? 22N instances of FDs on
set X - ? around 2 to the 2 to the 2N patterns of FD on
X - even though this set of patterns is constrained
16Functional Dependency 6
- Suppose that R(X) is a relational scheme based on
X - Number of FDs in a R(X) can be of size 2N, where
XN - ? is in general very inefficient, if not
infeasible, to list all the FDs in a relational
scheme on X - Number of patterns of FDs on a set X grows in a
doubly exponential manner with the size of X - ? relational tables are semantically
inconceivably rich
17Functional Dependency 7
- Two agendas for the study of functional
dependency - attempting to automate database design by finding
computationally feasible ways to manipulate FDs - investigating the principles that bind the
particular pattern of FDs in a relational scheme
to the real-world observables that it is intended
to model - Relational theory addresses both of these agendas
18Functional Dependency 8
- Number of FDs in a R(X) can be of size 2N, where
XN - Number of patterns of FDs on a set X grows in a
doubly exponential manner with the size of X - Problematic issues
- representing set of ALL FDs in a relational
scheme X - processing the set of ALL functional dependencies
- determining how patterns of FDs are constrained
- Solution
- find axioms to characterise functional dependency
- represent the set of all FDs as the set of FDs
that can be inferred from a basic set of FDs
19Functional Dependency 9
- Inference of Functional Dependencies
- Suppose R is a relation scheme, and F is a set of
functional dependencies for R. - If X, Y are subsets of attributes of R and all
relations r which satisfy the FDs in F also
satisfy X Y, then say F logically implies X
Y, written F X Y - e.g. if F A B, B C then F A C
- if F S A, SI P then
- F S I AP, F SP SAP etc.
20Functional Dependency 10
- Inference of Functional Dependencies (cont.)
- Definition
- if F is a set of FDs, then F ? X Y ½ F
X Y - Informally, F is set of ALL functional
dependencies that - can be inferred from the functional dependencies
in F. - ... usually F is much too large even to
enumerate!
21Functional Dependency 11
- More about the set F
- ... usually F is much too large even to
enumerate! - Example with 3 attributes, 2 FDs and 35
dependencies - If R ABC and F A B, B C then F is
- A S for all non-empty subsets S of ABC 7 FDs
- B BC, B B, B C 3 FDs
- C C 1 FD
- AB S for all n-e subsets S of ABC 7 FDs
- AC S for all n-e subsets S of ABC 7 FDs
- BC BC, BC B, BC C 3 FDs
- ABC S for all n-e subsets S of ABC 7 FDs
22Functional Dependency 12
- Can interpret keys in terms of functional
dependency ... - Definition of a superkey
- If R(A1, A2, ..., An) is a relation scheme, and
F is the set of FDs in R then a subset X of A1,
A2, ..., An is a superkey for R if F X Ai
for 1 i n - Definition of a key
- X is a key if no proper subset Y of X is a
superkey
23Armstrongs axioms for reasoning about FDs 1
- Armstrong 1974 proved following theorem
- Theorem F is the set of dependencies which can
be deduced from F by applying three inference
rules - (i) reflexivity if Y ? X ? U then X Y
- (ii) augmentation if X Y then XZ YZ (" Z ?
U) - (iii) transitivity if X Y and Y Z then X Z
- (i) (ii) and (iii) are Armstrong's axioms.
- The theorem asserts that Armstrongs axioms are a
- sound and complete set of inference rules.
24Armstrongs axioms for reasoning about FDs 2
- First prove the easy part of Armstrongs Theorem
- Axioms are sound if Y ? X ? U then X Y
- if X Y, then XZ YZ (" Z ? U)
- if X Y and Y Z then X Z
- (i) if r is a relation with two tuples s and t
that agree on X, and Y ? X, then s and t agree on
Y - (ii) if r is a relation with two tuples s and t
that agree on X and Z and X Y, then s and t
must agree on Y and Z - (iii) if r is a relation with two tuples s and t
that agree on X and both X Y and Y Z then s
and t must agree on Y, and hence they must agree
on Z
25Armstrongs axioms for reasoning about FDs 3
- Completeness of Armstrongs axioms
- if Y ? X ? U then X Y
- if X Y, then XZ YZ (" Z ? U)
- if X Y and Y Z then X Z
- First define X, the closure of X ? U with
respect to F - X is the set of attributes A such that X A
can be deduced from Armstrongs axioms. - Note that we can deduce that X Y for some set Y
by consulting Armstrongs axioms if and only if Y
? X - if Y contains A then X Y and Y A, so X A
- if X A and X B, then XXX XA AX AB
26Armstrongs axioms for reasoning about FDs 4
- Completeness of Armstrongs axioms (continued)
- if Y ? X ? U then X Y
- if X Y, then XZ YZ (" Z ? U)
- if X Y and Y Z then X Z
- To establish completeness, it is sufficient to
show - if X Y cannot be deduced from Armstrongs
axioms (the syntactic characterisation of ),
then there is a relational extension for R in
which all the dependencies in F are true, but X
Y does not hold (the semantic characterisation of
)
27Armstrongs axioms for reasoning about FDs 5
- Completeness of Armstrongs axioms (continued)
- Suppose cant deduce X Y from Armstrongs
axioms - Consider the relational extension R0 for R with 2
tuples - attributes of X other attributes
- 1 1 1 1 1 1
- 1 1 1 0 0 0
- (assuming boolean attributes, or more generally
that the two tuples agree on X but disagree
elsewhere) - Need to establish that R0 is a valid extension of
R i.e. that the dependencies in F are respected
in R0.
28Armstrongs axioms for reasoning about FDs 6
- Completeness of Armstrongs axioms (continued)
- Consider the relational extension R0 for R with
2 tuples - attributes of X other attributes
- 1 1 1 1 1 1
- 1 1 1 0 0 0
- Check that all the dependencies in F are true
- Suppose that V W is a dependency in F
- If V is not a subset of X, the dependency holds
in R0 - If V is a subset of X, then both X V, and X
W can be deduced by Armstrongs axioms. This
means that W is a subset of X, and thus V W
holds in R0
29Armstrongs axioms for reasoning about FDs 7
- Completeness of Armstrongs axioms (continued)
- Consider the relational extension R0 for R with
2 tuples - attributes of X other attributes
- 1 1 1 1 1 1
- 1 1 1 0 0 0
- Confirm that X Y does not hold in R0
- Recall that we can deduce that X Y for some set
Y by consulting Armstrongs axioms if and only if
Y ? X. - By assumption, we cant deduce that X Y holds
in R0. - Hence Y contains an attribute not in the subset
X, confirming that X Y does not hold in R0.
30Computational aspects of dependencies 1
- Computing the closure of an attribute set
- Given relation scheme R with set of attributes U
and a set of dependencies F involving attributes
in U - The closure of the set of attributes X from U is
- X ? A ½ F X A
- Note that X is a (small) set of attributes,
unlike F, - which is a (large) set of functional
dependencies. - can't compute F efficiently, but can compute
X.
31Computational aspects of dependencies 2
- Computing the closure of an attribute set (cont.)
- Can't compute F efficiently, but can compute X
- Construct sequence of attribute sets
- X(0) X
- X(i1) X(i) È A ½ V W Î F, V ? X(i), A Î W
- Since number of attributes in U is finite, get
X(j1) X(j), and have X X(j) at this point. - Complexity of this construction is O(F.R)
need to check whether X ? X(i) in an efficient
way, but otherwise amount of computation
determined by number of dependencies processed at
each iteration and number of iterations (
number of attributes in U)
32Computational aspects of dependencies 3
- Illustrative example
- Take F to be the set of dependencies
- AB C ACD B CG BD
- C A D EG CE AG
- BC D BE C
- Let X BD. Then can compute X thus
- X(0) BD, X(1) BDEGBEDG, X(2) BECDGBCEDG,
- X(3) BCEAGDGABCDEG X(4).
- Computation of this type is used for key checking
etc
33Computational aspects of dependencies 4
- Representing the set of dependencies
- Obviously better to work with F rather than F.
- Issue
- How small and simple can we choose G for G F?
- ... if G F, describe this as G generates F
- ... if G Ê F, say that G covers F
- motivates the definition of a minimal cover ...
34Computational aspects of dependencies 5
- Representing the set of dependencies (cont.)
- Definition G is a minimal cover for F if
- a) G F
- b) every RHS in G is a single attribute
- c) every LHS minimal subject to determining RHS
- i.e. for no X Y Î G is there a proper subset
Z of X such that G \ X Y È Z Y
generates F - d) no proper subset of G also generates F.
- Minimal cover isn't necessarily unique.
35Computational aspects of dependencies 6
- Computing a minimal cover for F
- Have a reasonably efficient algorithm for
computing a minimal cover. Reasonably means
efficient relative to the size of the given set F
that generates F. - Can't expect efficiency irrespective of size of
F. - Minimal cover isn't necessarily unique the
algorithm to compute a minimal cover is
non-deterministic (that is to say, it involves
arbitrary choices)
36Computational aspects of dependencies 7
- Computing a minimal cover for F (cont.)
- Input to the algorithm is a set F that generates
F. - Transform F in 3 stages
- 1. replace X Y by the X A such that A Î Y
- 2. check that no X A is redundant
- 3. eliminate any redundant attribute of X in X
A - Need to carry out each step efficiently ...
37Computational aspects of dependencies 8
- Computing a minimal cover for F (cont.)
- 1. replace X Y by the X A such that A Î
Y - 2. check that no X A is redundant
- 3. eliminate any redundant attribute of X in X
A - In Stage 1, have already transformed F so that
all dependencies are of the form X A, where
A1. - To check for redundancy, take each dependency X
A of F in turn, and compute the closure X of X
relative to the set of dependencies F \ X A.
If A Î X, then discard X A from F, and
continue the stage 2 redundancy checking with the
new F.
38Computational aspects of dependencies 9
- Computing a minimal cover for F (cont.)
- 1. replace X Y by the X A such that A Î
Y - 2. check that no X A is redundant
- 3. eliminate any redundant attribute of X in X
A - At Stage 3, can assume that all dependencies in F
are of the form X A, where A1, and no X A
dependency is redundant. - For each dependency X A in turn, and for each
attribute B in X, compute the closure (X \ B)
relative to the set of dependencies F. If AÎ(X \
B), then replace X A by X \ B A in F,
and continue the Stage 3 redundancy checking with
the new F.
39Computational aspects of dependencies 10
- Computing a minimal cover for F (cont.)
- Note eliminating redundancy at Stage 3 relies on
the fact that if X \ B A, then X A, but in
general the converse implication is false. - From this fact it follows that if X \ B A can
be inferred from the dependencies in F then X A
can be replaced by X \ B A - Algorithm is tolerably efficient (ex.), since
none of this checking involves explicit
computation with F.
40Computational aspects of dependencies 11
- Illustrative example
- Find a minimal cover for F, where F is
- AB C ACD B CG BD C A
- D EG CE AG BC D BE C
- Stage 1 transforms F to form
- AB C ACD B CG B CG D
- C A D E D G CE A
- CE G BC D BE C
41Computational aspects of dependencies 12
- Illustrative example of computing a minimal cover
... - AB C ACD B CG B CG D
- C A D E D G CE A
- CE G BC D BE C
- Stage 2 finds that CE A is redundant, since C
A. - After CE A eliminated, there is still
redundancy - CG B can be derived from the 3 dependencies
- CG D, C A and ACD B
- There is no further redundancy
- Thus CE A and CG B are eliminated at Stage 2.
42Computational aspects of dependencies 13
- Illustrative example of computing a minimal cover
... - AB C ACD B CG D ... after Stage 2
- C A D E D G
- CE G BC D BE C
- Stage 3 replaces ACD B by CD B, since C A
- and as a result (CD) relative to F contains B
- AB C CD B CG D
- C A D E D G minimal cover
- CE G BC D BE C
- To get another minimal cover (ex.) at Stage 2
eliminate - ACD B, CG D, CE A