Relational Database Design Theory presentation

About This Presentation

Transcript and Presenter's Notes

Title: Relational Database Design Theory

1
Relational Database Design Theory

Database Design Concepts
Update, deletion and insertion anomalies
Functional dependency

2
Database Design Concepts 0

The attributes in a database are determined by
the set
of observables that determine the state of the
enterprise
These attributes can in general be clustered into
subsets according to the contexts for observation
in
which they are encountered
Database design is concerned with how attributes
can
be organised into plausible clusters of
attributes as
relations within a relational scheme more
formally ...

3
Database Design Concepts 1

A relational database scheme can be regarded as
defined by an abstract relation R(A1, A2, ...,
An) where
A1, A2, ..., An are all the attributes of
interest. This is
simplest possible relation scheme for the
database.
In general, when setting up the database,
decompose
the relation R to form a new relational scheme.
Issues
what is the nature of this decomposition?
does it matter what is associated in tables?
if so, what principles apply to the design of
tables?

4
Database Design Concepts 2

Answer
the association of attributes should be guided by
data dependency
failure to take account of dependency creates
anomalies on modifying the database
anomalies can be largely eliminated through an
appropriate choice of tables and normalisation
cf. an object-oriented approach, where attributes
are grouped according to the objects that contain
them

5
Database Design Concepts 3

The relational database design problem
Any relational database could in principle be
organised as a single relation R(A1, A2, ..., An)
where A1, A2, ..., An are all the attributes of
interest.
Why not do it this way?
convenience easy to handle small relations
... but also need to consider semantics of
attributes
In practice, typically choose to decompose R into
a set of smaller relations R1, R2, ..., Rn, where
the set of attributes in R the set of
attributes in R1, R2, ..., Rn.

6
Database Design Concepts 4

Illustrative example of design
E.g. could take HVFC as comprising a single
relation
HVFC (NAME, ADDRESS, BALANCE, ORDER_NO,
ITEM, QUANTITY, SNAME, SADDRESS, PRICE)
defined by the natural join of the relations
MEMBERS(NAME, ADDRESS, BALANCE)
ORDERS(ORDER_NO, NAME, ITEM, QUANTITY)
SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
No information loss involved in this can recover
all the original relations by projection ...

7
Database Design Concepts 5

Data dependencies in the HVFC
In HVFC DB, can infer some attributes from
others
e.g. know SNAME Þ know SADDRESS
know SNAME and ITEM Þ know PRICE
NB knowledge of such dependencies is derived from
real-world meaning of the attributes
This informs good decomposition of a relational
scheme

8
Database Design Concepts 6

E.g. if we have a single relation for HVFC, then
every tuple with SNAME "S" has SADDRESS "A"
Þ very high level of redundancy in the table
Actually have this redundancy in the relation
SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
SMITH LONDON NUT 20p
SMITH LONDON BOLT 25p
JONES HULL PIN 15p
.......
... this redundancy has consequences for update

9
Database Design Concepts 7

Anomalies and the SUPPLIER relation
Form of the SUPPLIER relation leads to anomalies
on update, deletion and insertion
update anomalies when supplier address is
updated, must be updated in all tuples, else two
addresses recorded for same supplier
insertion anomalies can't record info on
supplier unless he supplies part, whence ....
deletion anomalies delete all items supplied by
one supplier and you lose supplier's address.

10
Database Design Concepts 8

A better design of the HVFC relation scheme will
resolve these problems. Thus decomposition
SA (SNAME, SADDRESS)
SIP (SNAME, ITEM, PRICE)
resolves the anomalies.
Relational database design theory guides the
choice of decomposition for a given relation
scheme.
Use the data dependencies for this if no data
dependencies, then no decomposition

11
Functional Dependency 1

Formalising Data Dependency
Let R º R(A1, A2, ..., An) be a relation scheme,
and let X
and Y be subsets of A1, A2, ..., An . Think
of R as an
intensional relation with many possible
extensions.
Definition of functional dependency
"X functionally determines Y" if, in every
extension
r of R, two tuples in r which agree on the
attribute set X
also agree on Y.

12
Functional Dependency 2

Vocabulary and notation for functional dependency
"X functionally determines Y" if, in every
extension r of
R, two tuples in r which agree on the attribute
set X also
agree on Y
Also say "Y functionally depends on X and denote
dependency by X Y to be read as "X determines
Y"

13
Functional Dependency 3

Notes on functional dependency
Functional dependency is a semantic notion
defined in terms of the intensional rather than
extensional data-base. Can only infer (possibly)
that certain functional dependencies do not hold
by inspecting the extensional part of a db.
The set of functional dependencies can be viewed
as a set of integrity constraints on the relation
scheme, and where possible it should be preserved
under decomposition.

14
Functional Dependency 4

An illustrative example
Abbreviate names of attributes in
SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
to S, A, I, P respectively. Use notation
S A, SI P
to represent functional dependencies (abbrev.
FDs).
The set of ALL functional dependencies is usually
large
SI AP, SP SAP, S SA, SI SAIP
are all examples of valid functional dependencies

15
Functional Dependency 5

If X is a set (modulo discounting the empty set)
a functional dependency on X ? a pair of subsets
of X
? each functional dependency on X ? ?(X) ? ?(X)
If X is the basis of a relational scheme, then
the set of all functional dependencies on X is a
special kind of subset of ?(X) ? ?(X).
Set X has N attributes ? 22N instances of FDs on
set X
? around 2 to the 2 to the 2N patterns of FD on
X
even though this set of patterns is constrained

16
Functional Dependency 6

Suppose that R(X) is a relational scheme based on
X
Number of FDs in a R(X) can be of size 2N, where
XN
? is in general very inefficient, if not
infeasible, to list all the FDs in a relational
scheme on X
Number of patterns of FDs on a set X grows in a
doubly exponential manner with the size of X
? relational tables are semantically
inconceivably rich

17
Functional Dependency 7

Two agendas for the study of functional
dependency
attempting to automate database design by finding
computationally feasible ways to manipulate FDs
investigating the principles that bind the
particular pattern of FDs in a relational scheme
to the real-world observables that it is intended
to model
Relational theory addresses both of these agendas

18
Functional Dependency 8

Number of FDs in a R(X) can be of size 2N, where
XN
Number of patterns of FDs on a set X grows in a
doubly exponential manner with the size of X
Problematic issues
representing set of ALL FDs in a relational
scheme X
processing the set of ALL functional dependencies
determining how patterns of FDs are constrained
Solution
find axioms to characterise functional dependency
represent the set of all FDs as the set of FDs
that can be inferred from a basic set of FDs

19
Functional Dependency 9

Inference of Functional Dependencies
Suppose R is a relation scheme, and F is a set of
functional dependencies for R.
If X, Y are subsets of attributes of R and all
relations r which satisfy the FDs in F also
satisfy X Y, then say F logically implies X
Y, written F X Y
e.g. if F A B, B C then F A C
if F S A, SI P then
F S I AP, F SP SAP etc.

20
Functional Dependency 10

Inference of Functional Dependencies (cont.)
Definition
if F is a set of FDs, then F ? X Y ½ F
X Y
Informally, F is set of ALL functional
dependencies that
can be inferred from the functional dependencies
in F.
... usually F is much too large even to
enumerate!

21
Functional Dependency 11

More about the set F
... usually F is much too large even to
enumerate!
Example with 3 attributes, 2 FDs and 35
dependencies
If R ABC and F A B, B C then F is
A S for all non-empty subsets S of ABC 7 FDs
B BC, B B, B C 3 FDs
C C 1 FD
AB S for all n-e subsets S of ABC 7 FDs
AC S for all n-e subsets S of ABC 7 FDs
BC BC, BC B, BC C 3 FDs
ABC S for all n-e subsets S of ABC 7 FDs

22
Functional Dependency 12

Can interpret keys in terms of functional
dependency ...
Definition of a superkey
If R(A1, A2, ..., An) is a relation scheme, and
F is the set of FDs in R then a subset X of A1,
A2, ..., An is a superkey for R if F X Ai
for 1 i n
Definition of a key
X is a key if no proper subset Y of X is a
superkey

23
Armstrongs axioms for reasoning about FDs 1

Armstrong 1974 proved following theorem
Theorem F is the set of dependencies which can
be deduced from F by applying three inference
rules
(i) reflexivity if Y ? X ? U then X Y
(ii) augmentation if X Y then XZ YZ (" Z ?
U)
(iii) transitivity if X Y and Y Z then X Z
(i) (ii) and (iii) are Armstrong's axioms.
The theorem asserts that Armstrongs axioms are a
sound and complete set of inference rules.

24
Armstrongs axioms for reasoning about FDs 2

First prove the easy part of Armstrongs Theorem
Axioms are sound if Y ? X ? U then X Y
if X Y, then XZ YZ (" Z ? U)
if X Y and Y Z then X Z
(i) if r is a relation with two tuples s and t
that agree on X, and Y ? X, then s and t agree on
Y
(ii) if r is a relation with two tuples s and t
that agree on X and Z and X Y, then s and t
must agree on Y and Z
(iii) if r is a relation with two tuples s and t
that agree on X and both X Y and Y Z then s
and t must agree on Y, and hence they must agree
on Z

25
Armstrongs axioms for reasoning about FDs 3

Completeness of Armstrongs axioms
if Y ? X ? U then X Y
if X Y, then XZ YZ (" Z ? U)
if X Y and Y Z then X Z
First define X, the closure of X ? U with
respect to F
X is the set of attributes A such that X A
can be deduced from Armstrongs axioms.
Note that we can deduce that X Y for some set Y
by consulting Armstrongs axioms if and only if Y
? X
if Y contains A then X Y and Y A, so X A
if X A and X B, then XXX XA AX AB

26
Armstrongs axioms for reasoning about FDs 4

Completeness of Armstrongs axioms (continued)
if Y ? X ? U then X Y
if X Y, then XZ YZ (" Z ? U)
if X Y and Y Z then X Z
To establish completeness, it is sufficient to
show
if X Y cannot be deduced from Armstrongs
axioms (the syntactic characterisation of ),
then there is a relational extension for R in
which all the dependencies in F are true, but X
Y does not hold (the semantic characterisation of
)

27
Armstrongs axioms for reasoning about FDs 5

Completeness of Armstrongs axioms (continued)
Suppose cant deduce X Y from Armstrongs
axioms
Consider the relational extension R0 for R with 2
tuples
attributes of X other attributes
1 1 1 1 1 1
1 1 1 0 0 0
(assuming boolean attributes, or more generally
that the two tuples agree on X but disagree
elsewhere)
Need to establish that R0 is a valid extension of
R i.e. that the dependencies in F are respected
in R0.

28
Armstrongs axioms for reasoning about FDs 6

Completeness of Armstrongs axioms (continued)
Consider the relational extension R0 for R with
2 tuples
attributes of X other attributes
1 1 1 1 1 1
1 1 1 0 0 0
Check that all the dependencies in F are true
Suppose that V W is a dependency in F
If V is not a subset of X, the dependency holds
in R0
If V is a subset of X, then both X V, and X
W can be deduced by Armstrongs axioms. This
means that W is a subset of X, and thus V W
holds in R0

29
Armstrongs axioms for reasoning about FDs 7

Completeness of Armstrongs axioms (continued)
Consider the relational extension R0 for R with
2 tuples
attributes of X other attributes
1 1 1 1 1 1
1 1 1 0 0 0
Confirm that X Y does not hold in R0
Recall that we can deduce that X Y for some set
Y by consulting Armstrongs axioms if and only if
Y ? X.
By assumption, we cant deduce that X Y holds
in R0.
Hence Y contains an attribute not in the subset
X, confirming that X Y does not hold in R0.

30
Computational aspects of dependencies 1

Computing the closure of an attribute set
Given relation scheme R with set of attributes U
and a set of dependencies F involving attributes
in U
The closure of the set of attributes X from U is
X ? A ½ F X A
Note that X is a (small) set of attributes,
unlike F,
which is a (large) set of functional
dependencies.
can't compute F efficiently, but can compute
X.

31
Computational aspects of dependencies 2

Computing the closure of an attribute set (cont.)
Can't compute F efficiently, but can compute X
Construct sequence of attribute sets
X(0) X
X(i1) X(i) È A ½ V W Î F, V ? X(i), A Î W
Since number of attributes in U is finite, get
X(j1) X(j), and have X X(j) at this point.
Complexity of this construction is O(F.R)
need to check whether X ? X(i) in an efficient
way, but otherwise amount of computation
determined by number of dependencies processed at
each iteration and number of iterations (
number of attributes in U)

32
Computational aspects of dependencies 3

Illustrative example
Take F to be the set of dependencies
AB C ACD B CG BD
C A D EG CE AG
BC D BE C
Let X BD. Then can compute X thus
X(0) BD, X(1) BDEGBEDG, X(2) BECDGBCEDG,
X(3) BCEAGDGABCDEG X(4).
Computation of this type is used for key checking
etc

33
Computational aspects of dependencies 4

Representing the set of dependencies
Obviously better to work with F rather than F.
Issue
How small and simple can we choose G for G F?
... if G F, describe this as G generates F
... if G Ê F, say that G covers F
motivates the definition of a minimal cover ...

34
Computational aspects of dependencies 5

Representing the set of dependencies (cont.)
Definition G is a minimal cover for F if
a) G F
b) every RHS in G is a single attribute
c) every LHS minimal subject to determining RHS
i.e. for no X Y Î G is there a proper subset
Z of X such that G \ X Y È Z Y
generates F
d) no proper subset of G also generates F.
Minimal cover isn't necessarily unique.

35
Computational aspects of dependencies 6

Computing a minimal cover for F
Have a reasonably efficient algorithm for
computing a minimal cover. Reasonably means
efficient relative to the size of the given set F
that generates F.
Can't expect efficiency irrespective of size of
F.
Minimal cover isn't necessarily unique the
algorithm to compute a minimal cover is
non-deterministic (that is to say, it involves
arbitrary choices)

36
Computational aspects of dependencies 7

Computing a minimal cover for F (cont.)
Input to the algorithm is a set F that generates
F.
Transform F in 3 stages
1. replace X Y by the X A such that A Î Y
2. check that no X A is redundant
3. eliminate any redundant attribute of X in X
A
Need to carry out each step efficiently ...

37
Computational aspects of dependencies 8

Computing a minimal cover for F (cont.)
1. replace X Y by the X A such that A Î
Y
2. check that no X A is redundant
3. eliminate any redundant attribute of X in X
A
In Stage 1, have already transformed F so that
all dependencies are of the form X A, where
A1.
To check for redundancy, take each dependency X
A of F in turn, and compute the closure X of X
relative to the set of dependencies F \ X A.
If A Î X, then discard X A from F, and
continue the stage 2 redundancy checking with the
new F.

38
Computational aspects of dependencies 9

Computing a minimal cover for F (cont.)
1. replace X Y by the X A such that A Î
Y
2. check that no X A is redundant
3. eliminate any redundant attribute of X in X
A
At Stage 3, can assume that all dependencies in F
are of the form X A, where A1, and no X A
dependency is redundant.
For each dependency X A in turn, and for each
attribute B in X, compute the closure (X \ B)
relative to the set of dependencies F. If AÎ(X \
B), then replace X A by X \ B A in F,
and continue the Stage 3 redundancy checking with
the new F.

39
Computational aspects of dependencies 10

Computing a minimal cover for F (cont.)
Note eliminating redundancy at Stage 3 relies on
the fact that if X \ B A, then X A, but in
general the converse implication is false.
From this fact it follows that if X \ B A can
be inferred from the dependencies in F then X A
can be replaced by X \ B A
Algorithm is tolerably efficient (ex.), since
none of this checking involves explicit
computation with F.

40
Computational aspects of dependencies 11

Illustrative example
Find a minimal cover for F, where F is
AB C ACD B CG BD C A
D EG CE AG BC D BE C
Stage 1 transforms F to form
AB C ACD B CG B CG D
C A D E D G CE A
CE G BC D BE C

41
Computational aspects of dependencies 12

Illustrative example of computing a minimal cover
...
AB C ACD B CG B CG D
C A D E D G CE A
CE G BC D BE C
Stage 2 finds that CE A is redundant, since C
A.
After CE A eliminated, there is still
redundancy
CG B can be derived from the 3 dependencies
CG D, C A and ACD B
There is no further redundancy
Thus CE A and CG B are eliminated at Stage 2.

42
Computational aspects of dependencies 13

Illustrative example of computing a minimal cover
...
AB C ACD B CG D ... after Stage 2
C A D E D G
CE G BC D BE C
Stage 3 replaces ACD B by CD B, since C A
and as a result (CD) relative to F contains B
AB C CD B CG D
C A D E D G minimal cover
CE G BC D BE C
To get another minimal cover (ex.) at Stage 2
eliminate
ACD B, CG D, CE A

Write a Comment

User Comments (0)

Relational Database Design Theory PowerPoint PPT Presentation