Relational Database Design Theory PowerPoint PPT Presentation

presentation player overlay
1 / 42
About This Presentation
Transcript and Presenter's Notes

Title: Relational Database Design Theory


1
Relational Database Design Theory
  • Database Design Concepts
  • Update, deletion and insertion anomalies
  • Functional dependency

2
Database Design Concepts 0
  • The attributes in a database are determined by
    the set
  • of observables that determine the state of the
    enterprise
  • These attributes can in general be clustered into
  • subsets according to the contexts for observation
    in
  • which they are encountered
  • Database design is concerned with how attributes
    can
  • be organised into plausible clusters of
    attributes as
  • relations within a relational scheme more
    formally ...

3
Database Design Concepts 1
  • A relational database scheme can be regarded as
  • defined by an abstract relation R(A1, A2, ...,
    An) where
  • A1, A2, ..., An are all the attributes of
    interest. This is
  • simplest possible relation scheme for the
    database.
  • In general, when setting up the database,
    decompose
  • the relation R to form a new relational scheme.
  • Issues
  • what is the nature of this decomposition?
  • does it matter what is associated in tables?
  • if so, what principles apply to the design of
    tables?

4
Database Design Concepts 2
  • Answer
  • the association of attributes should be guided by
    data dependency
  • failure to take account of dependency creates
    anomalies on modifying the database
  • anomalies can be largely eliminated through an
    appropriate choice of tables and normalisation
  • cf. an object-oriented approach, where attributes
    are grouped according to the objects that contain
    them

5
Database Design Concepts 3
  • The relational database design problem
  • Any relational database could in principle be
    organised as a single relation R(A1, A2, ..., An)
    where A1, A2, ..., An are all the attributes of
    interest.
  • Why not do it this way?
  • convenience easy to handle small relations
  • ... but also need to consider semantics of
    attributes
  • In practice, typically choose to decompose R into
    a set of smaller relations R1, R2, ..., Rn, where
    the set of attributes in R the set of
    attributes in R1, R2, ..., Rn.

6
Database Design Concepts 4
  • Illustrative example of design
  • E.g. could take HVFC as comprising a single
    relation
  • HVFC (NAME, ADDRESS, BALANCE, ORDER_NO,
  • ITEM, QUANTITY, SNAME, SADDRESS, PRICE)
  • defined by the natural join of the relations
  • MEMBERS(NAME, ADDRESS, BALANCE)
  • ORDERS(ORDER_NO, NAME, ITEM, QUANTITY)
  • SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
  • No information loss involved in this can recover
    all the original relations by projection ...

7
Database Design Concepts 5
  • Data dependencies in the HVFC
  • In HVFC DB, can infer some attributes from
    others
  • e.g. know SNAME Þ know SADDRESS
  • know SNAME and ITEM Þ know PRICE
  • NB knowledge of such dependencies is derived from
    real-world meaning of the attributes
  • This informs good decomposition of a relational
    scheme

8
Database Design Concepts 6
  • E.g. if we have a single relation for HVFC, then
  • every tuple with SNAME "S" has SADDRESS "A"
  • Þ very high level of redundancy in the table
  • Actually have this redundancy in the relation
  • SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
  • SMITH LONDON NUT 20p
  • SMITH LONDON BOLT 25p
  • JONES HULL PIN 15p
  • .......
  • ... this redundancy has consequences for update

9
Database Design Concepts 7
  • Anomalies and the SUPPLIER relation
  • Form of the SUPPLIER relation leads to anomalies
    on update, deletion and insertion
  • update anomalies when supplier address is
    updated, must be updated in all tuples, else two
    addresses recorded for same supplier
  • insertion anomalies can't record info on
    supplier unless he supplies part, whence ....
  • deletion anomalies delete all items supplied by
    one supplier and you lose supplier's address.

10
Database Design Concepts 8
  • A better design of the HVFC relation scheme will
  • resolve these problems. Thus decomposition
  • SA (SNAME, SADDRESS)
  • SIP (SNAME, ITEM, PRICE)
  • resolves the anomalies.
  • Relational database design theory guides the
    choice of decomposition for a given relation
    scheme.
  • Use the data dependencies for this if no data
    dependencies, then no decomposition

11
Functional Dependency 1
  • Formalising Data Dependency
  • Let R º R(A1, A2, ..., An) be a relation scheme,
    and let X
  • and Y be subsets of A1, A2, ..., An . Think
    of R as an
  • intensional relation with many possible
    extensions.
  • Definition of functional dependency
  • "X functionally determines Y" if, in every
    extension
  • r of R, two tuples in r which agree on the
    attribute set X
  • also agree on Y.

12
Functional Dependency 2
  • Vocabulary and notation for functional dependency
  • "X functionally determines Y" if, in every
    extension r of
  • R, two tuples in r which agree on the attribute
    set X also
  • agree on Y
  • Also say "Y functionally depends on X and denote
  • dependency by X Y to be read as "X determines
    Y"

13
Functional Dependency 3
  • Notes on functional dependency
  • Functional dependency is a semantic notion
    defined in terms of the intensional rather than
    extensional data-base. Can only infer (possibly)
    that certain functional dependencies do not hold
    by inspecting the extensional part of a db.
  • The set of functional dependencies can be viewed
    as a set of integrity constraints on the relation
    scheme, and where possible it should be preserved
    under decomposition.

14
Functional Dependency 4
  • An illustrative example
  • Abbreviate names of attributes in
  • SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)
  • to S, A, I, P respectively. Use notation
  • S A, SI P
  • to represent functional dependencies (abbrev.
    FDs).
  • The set of ALL functional dependencies is usually
    large
  • SI AP, SP SAP, S SA, SI SAIP
  • are all examples of valid functional dependencies

15
Functional Dependency 5
  • If X is a set (modulo discounting the empty set)
  • a functional dependency on X ? a pair of subsets
    of X
  • ? each functional dependency on X ? ?(X) ? ?(X)
  • If X is the basis of a relational scheme, then
    the set of all functional dependencies on X is a
    special kind of subset of ?(X) ? ?(X).
  • Set X has N attributes ? 22N instances of FDs on
    set X
  • ? around 2 to the 2 to the 2N patterns of FD on
    X
  • even though this set of patterns is constrained

16
Functional Dependency 6
  • Suppose that R(X) is a relational scheme based on
    X
  • Number of FDs in a R(X) can be of size 2N, where
    XN
  • ? is in general very inefficient, if not
    infeasible, to list all the FDs in a relational
    scheme on X
  • Number of patterns of FDs on a set X grows in a
    doubly exponential manner with the size of X
  • ? relational tables are semantically
    inconceivably rich

17
Functional Dependency 7
  • Two agendas for the study of functional
    dependency
  • attempting to automate database design by finding
    computationally feasible ways to manipulate FDs
  • investigating the principles that bind the
    particular pattern of FDs in a relational scheme
    to the real-world observables that it is intended
    to model
  • Relational theory addresses both of these agendas

18
Functional Dependency 8
  • Number of FDs in a R(X) can be of size 2N, where
    XN
  • Number of patterns of FDs on a set X grows in a
    doubly exponential manner with the size of X
  • Problematic issues
  • representing set of ALL FDs in a relational
    scheme X
  • processing the set of ALL functional dependencies
  • determining how patterns of FDs are constrained
  • Solution
  • find axioms to characterise functional dependency
  • represent the set of all FDs as the set of FDs
    that can be inferred from a basic set of FDs

19
Functional Dependency 9
  • Inference of Functional Dependencies
  • Suppose R is a relation scheme, and F is a set of
    functional dependencies for R.
  • If X, Y are subsets of attributes of R and all
    relations r which satisfy the FDs in F also
    satisfy X Y, then say F logically implies X
    Y, written F X Y
  • e.g. if F A B, B C then F A C
  • if F S A, SI P then
  • F S I AP, F SP SAP etc.

20
Functional Dependency 10
  • Inference of Functional Dependencies (cont.)
  • Definition
  • if F is a set of FDs, then F ? X Y ½ F
    X Y
  • Informally, F is set of ALL functional
    dependencies that
  • can be inferred from the functional dependencies
    in F.
  • ... usually F is much too large even to
    enumerate!

21
Functional Dependency 11
  • More about the set F
  • ... usually F is much too large even to
    enumerate!
  • Example with 3 attributes, 2 FDs and 35
    dependencies
  • If R ABC and F A B, B C then F is
  • A S for all non-empty subsets S of ABC 7 FDs
  • B BC, B B, B C 3 FDs
  • C C 1 FD
  • AB S for all n-e subsets S of ABC 7 FDs
  • AC S for all n-e subsets S of ABC 7 FDs
  • BC BC, BC B, BC C 3 FDs
  • ABC S for all n-e subsets S of ABC 7 FDs

22
Functional Dependency 12
  • Can interpret keys in terms of functional
    dependency ...
  • Definition of a superkey
  • If R(A1, A2, ..., An) is a relation scheme, and
    F is the set of FDs in R then a subset X of A1,
    A2, ..., An is a superkey for R if F X Ai
    for 1 i n
  • Definition of a key
  • X is a key if no proper subset Y of X is a
    superkey

23
Armstrongs axioms for reasoning about FDs 1
  • Armstrong 1974 proved following theorem
  • Theorem F is the set of dependencies which can
    be deduced from F by applying three inference
    rules
  • (i) reflexivity if Y ? X ? U then X Y
  • (ii) augmentation if X Y then XZ YZ (" Z ?
    U)
  • (iii) transitivity if X Y and Y Z then X Z
  • (i) (ii) and (iii) are Armstrong's axioms.
  • The theorem asserts that Armstrongs axioms are a
  • sound and complete set of inference rules.

24
Armstrongs axioms for reasoning about FDs 2
  • First prove the easy part of Armstrongs Theorem
  • Axioms are sound if Y ? X ? U then X Y
  • if X Y, then XZ YZ (" Z ? U)
  • if X Y and Y Z then X Z
  • (i) if r is a relation with two tuples s and t
    that agree on X, and Y ? X, then s and t agree on
    Y
  • (ii) if r is a relation with two tuples s and t
    that agree on X and Z and X Y, then s and t
    must agree on Y and Z
  • (iii) if r is a relation with two tuples s and t
    that agree on X and both X Y and Y Z then s
    and t must agree on Y, and hence they must agree
    on Z

25
Armstrongs axioms for reasoning about FDs 3
  • Completeness of Armstrongs axioms
  • if Y ? X ? U then X Y
  • if X Y, then XZ YZ (" Z ? U)
  • if X Y and Y Z then X Z
  • First define X, the closure of X ? U with
    respect to F
  • X is the set of attributes A such that X A
    can be deduced from Armstrongs axioms.
  • Note that we can deduce that X Y for some set Y
    by consulting Armstrongs axioms if and only if Y
    ? X
  • if Y contains A then X Y and Y A, so X A
  • if X A and X B, then XXX XA AX AB

26
Armstrongs axioms for reasoning about FDs 4
  • Completeness of Armstrongs axioms (continued)
  • if Y ? X ? U then X Y
  • if X Y, then XZ YZ (" Z ? U)
  • if X Y and Y Z then X Z
  • To establish completeness, it is sufficient to
    show
  • if X Y cannot be deduced from Armstrongs
    axioms (the syntactic characterisation of ),
    then there is a relational extension for R in
    which all the dependencies in F are true, but X
    Y does not hold (the semantic characterisation of
    )

27
Armstrongs axioms for reasoning about FDs 5
  • Completeness of Armstrongs axioms (continued)
  • Suppose cant deduce X Y from Armstrongs
    axioms
  • Consider the relational extension R0 for R with 2
    tuples
  • attributes of X other attributes
  • 1 1 1 1 1 1
  • 1 1 1 0 0 0
  • (assuming boolean attributes, or more generally
    that the two tuples agree on X but disagree
    elsewhere)
  • Need to establish that R0 is a valid extension of
    R i.e. that the dependencies in F are respected
    in R0.

28
Armstrongs axioms for reasoning about FDs 6
  • Completeness of Armstrongs axioms (continued)
  • Consider the relational extension R0 for R with
    2 tuples
  • attributes of X other attributes
  • 1 1 1 1 1 1
  • 1 1 1 0 0 0
  • Check that all the dependencies in F are true
  • Suppose that V W is a dependency in F
  • If V is not a subset of X, the dependency holds
    in R0
  • If V is a subset of X, then both X V, and X
    W can be deduced by Armstrongs axioms. This
    means that W is a subset of X, and thus V W
    holds in R0

29
Armstrongs axioms for reasoning about FDs 7
  • Completeness of Armstrongs axioms (continued)
  • Consider the relational extension R0 for R with
    2 tuples
  • attributes of X other attributes
  • 1 1 1 1 1 1
  • 1 1 1 0 0 0
  • Confirm that X Y does not hold in R0
  • Recall that we can deduce that X Y for some set
    Y by consulting Armstrongs axioms if and only if
    Y ? X.
  • By assumption, we cant deduce that X Y holds
    in R0.
  • Hence Y contains an attribute not in the subset
    X, confirming that X Y does not hold in R0.

30
Computational aspects of dependencies 1
  • Computing the closure of an attribute set
  • Given relation scheme R with set of attributes U
    and a set of dependencies F involving attributes
    in U
  • The closure of the set of attributes X from U is
  • X ? A ½ F X A
  • Note that X is a (small) set of attributes,
    unlike F,
  • which is a (large) set of functional
    dependencies.
  • can't compute F efficiently, but can compute
    X.

31
Computational aspects of dependencies 2
  • Computing the closure of an attribute set (cont.)
  • Can't compute F efficiently, but can compute X
  • Construct sequence of attribute sets
  • X(0) X
  • X(i1) X(i) È A ½ V W Î F, V ? X(i), A Î W
  • Since number of attributes in U is finite, get
    X(j1) X(j), and have X X(j) at this point.
  • Complexity of this construction is O(F.R)
    need to check whether X ? X(i) in an efficient
    way, but otherwise amount of computation
    determined by number of dependencies processed at
    each iteration and number of iterations (
    number of attributes in U)

32
Computational aspects of dependencies 3
  • Illustrative example
  • Take F to be the set of dependencies
  • AB C ACD B CG BD
  • C A D EG CE AG
  • BC D BE C
  • Let X BD. Then can compute X thus
  • X(0) BD, X(1) BDEGBEDG, X(2) BECDGBCEDG,
  • X(3) BCEAGDGABCDEG X(4).
  • Computation of this type is used for key checking
    etc

33
Computational aspects of dependencies 4
  • Representing the set of dependencies
  • Obviously better to work with F rather than F.
  • Issue
  • How small and simple can we choose G for G F?
  • ... if G F, describe this as G generates F
  • ... if G Ê F, say that G covers F
  • motivates the definition of a minimal cover ...

34
Computational aspects of dependencies 5
  • Representing the set of dependencies (cont.)
  • Definition G is a minimal cover for F if
  • a) G F
  • b) every RHS in G is a single attribute
  • c) every LHS minimal subject to determining RHS
  • i.e. for no X Y Î G is there a proper subset
    Z of X such that G \ X Y È Z Y
    generates F
  • d) no proper subset of G also generates F.
  • Minimal cover isn't necessarily unique.

35
Computational aspects of dependencies 6
  • Computing a minimal cover for F
  • Have a reasonably efficient algorithm for
    computing a minimal cover. Reasonably means
    efficient relative to the size of the given set F
    that generates F.
  • Can't expect efficiency irrespective of size of
    F.
  • Minimal cover isn't necessarily unique the
    algorithm to compute a minimal cover is
    non-deterministic (that is to say, it involves
    arbitrary choices)

36
Computational aspects of dependencies 7
  • Computing a minimal cover for F (cont.)
  • Input to the algorithm is a set F that generates
    F.
  • Transform F in 3 stages
  • 1. replace X Y by the X A such that A Î Y
  • 2. check that no X A is redundant
  • 3. eliminate any redundant attribute of X in X
    A
  • Need to carry out each step efficiently ...

37
Computational aspects of dependencies 8
  • Computing a minimal cover for F (cont.)
  • 1. replace X Y by the X A such that A Î
    Y
  • 2. check that no X A is redundant
  • 3. eliminate any redundant attribute of X in X
    A
  • In Stage 1, have already transformed F so that
    all dependencies are of the form X A, where
    A1.
  • To check for redundancy, take each dependency X
    A of F in turn, and compute the closure X of X
    relative to the set of dependencies F \ X A.
    If A Î X, then discard X A from F, and
    continue the stage 2 redundancy checking with the
    new F.

38
Computational aspects of dependencies 9
  • Computing a minimal cover for F (cont.)
  • 1. replace X Y by the X A such that A Î
    Y
  • 2. check that no X A is redundant
  • 3. eliminate any redundant attribute of X in X
    A
  • At Stage 3, can assume that all dependencies in F
    are of the form X A, where A1, and no X A
    dependency is redundant.
  • For each dependency X A in turn, and for each
    attribute B in X, compute the closure (X \ B)
    relative to the set of dependencies F. If AÎ(X \
    B), then replace X A by X \ B A in F,
    and continue the Stage 3 redundancy checking with
    the new F.

39
Computational aspects of dependencies 10
  • Computing a minimal cover for F (cont.)
  • Note eliminating redundancy at Stage 3 relies on
    the fact that if X \ B A, then X A, but in
    general the converse implication is false.
  • From this fact it follows that if X \ B A can
    be inferred from the dependencies in F then X A
    can be replaced by X \ B A
  • Algorithm is tolerably efficient (ex.), since
    none of this checking involves explicit
    computation with F.

40
Computational aspects of dependencies 11
  • Illustrative example
  • Find a minimal cover for F, where F is
  • AB C ACD B CG BD C A
  • D EG CE AG BC D BE C
  • Stage 1 transforms F to form
  • AB C ACD B CG B CG D
  • C A D E D G CE A
  • CE G BC D BE C

41
Computational aspects of dependencies 12
  • Illustrative example of computing a minimal cover
    ...
  • AB C ACD B CG B CG D
  • C A D E D G CE A
  • CE G BC D BE C
  • Stage 2 finds that CE A is redundant, since C
    A.
  • After CE A eliminated, there is still
    redundancy
  • CG B can be derived from the 3 dependencies
  • CG D, C A and ACD B
  • There is no further redundancy
  • Thus CE A and CG B are eliminated at Stage 2.

42
Computational aspects of dependencies 13
  • Illustrative example of computing a minimal cover
    ...
  • AB C ACD B CG D ... after Stage 2
  • C A D E D G
  • CE G BC D BE C
  • Stage 3 replaces ACD B by CD B, since C A
  • and as a result (CD) relative to F contains B
  • AB C CD B CG D
  • C A D E D G minimal cover
  • CE G BC D BE C
  • To get another minimal cover (ex.) at Stage 2
    eliminate
  • ACD B, CG D, CE A
Write a Comment
User Comments (0)