Fundamentals of CS 2: Databases REVISION WEEK 2 - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Fundamentals of CS 2: Databases REVISION WEEK 2

Description:

... database language, PostgreSQL (a version of SQL: very widely used ... Full Monty chip shop. 16 Sep? ex-dir. The Old Black House, 15768 Aplanalp St. ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 23
Provided by: scie205
Category:

less

Transcript and Presenter's Notes

Title: Fundamentals of CS 2: Databases REVISION WEEK 2


1
Fundamentals of CS 2DatabasesREVISION WEEK (2)
  • John Barnden
  • Professor of Artificial Intelligence
  • School of Computer Science
  • University of Birmingham, UK

2
What We Mainly Studied
  • The nature of relational databases, the central
    modern type of database. Entity types represented
    as tables, holding relations.
  • Some basic mathematical concepts underpinning
    relational databases, and useful also in many
    other branches of CS.
  • Key aspects of how to develop the
    conceptual/logical design of relational
    databases.
  • In particular, how to achieve certain types of
    good structuring, to help achieve certain types
    of correctness and efficiency.
  • How to create and query databases using a
    particular database language, PostgreSQL (a
    version of SQL very widely used in various
    forms).

3
Overall Nature of Exam
  • Half of a three-hour exam. Other half SE.
    Equally weighted.
  • Five questions. Equally weighted.
  • Questions range from precise technical things to
    more general considerations.
  • One question will mainly ask for SQL expressions.
    SQL may crop up in other questions.
  • The mathematical and relational algebra material
    may be needed for the exam, but the finest
    technical detail will not be expected.
  • Anything in the required textbook reading may be
    needed in the exam, except of course that a
    detailed memory of specific, data-full examples
    is not expected, and except for some SQL detail
    (see next slide).

4
Textbook Chapters (7th Ed)
  • Chapters 1-9, except that
  • On SQL (Chs 7,8) the exam doesnt rest on fine
    detail beyond whats in the handouts (and
    occasional lecture)
  • In Chapter 8 only up to section 8.4 inclusive
  • Chapter 9 note the important concepts of the
    Systems Development Life Cycle and the Database
    Life Cycle.

5
Initial Considerations
  • What a database is, and how it relates to other
    types of data structure/repository in CS.
  • Data integrity, data redundancy, data anomalies.
  • Associative links between parts of a database, as
    opposed to pointing.
  • Ways data is stored/linked in physical human
    media such as diaries, address books and
    timetables.
  • Various complications in tables in human
    documents.
  • Restricted type of table used in relational DBs.

6
Entities and Relationships
  • Any relational DB as consisting of entity types
    and relationships between them Entity
    Relationship Model (ERM) in general.
  • Specific ERMs for specific applications, and
    distinction from Entity Relationship Diagrams
    (ERDs).
  • Entity Types as represented by tables.
  • The question of what types of thing should
    correspond to entity types, and hence tables,
    depends on the application and your design
    judgment.

7
Attribute Determination and Keys
  • One or more attributes determining another
    attribute. Can also be described as that
    attribute being functionally dependent on the
    former attributes.
  • Various notions of key, especially superkeys,
    candidate keys and primary keys
  • And foreign keys as (primary) implementation of
    relationships between entity types.

8
Strength and Weakness
  • Strong and weak relationships. (Also called
    identifying and non-identifying relationships
    respectively.)
  • Weak entity types, as defined according to the
    strength of their relationships to other entity
    types and existence-dependence with respect to
    those types.
  • Depiction of strength or weakness in different
    styles of ERD.

9
Connectivity, Cardinality Participation
  • Connectivity uniqueness or multiplicity of
    entities at either end of a relationship.
  • Cardinality precise numerical info about how
    many entities allowed or required at either end
    of a relationship.
  • Participation optionality or mandatoriness of a
    relationship, in either direction.
  • Overlap between these notions.
  • Notation in ERDs.

10
Table Representation of Relationships of
Different Connectivities
  • Basic case is 1M non-recursive. (Recursive is
    when two or more entity types in a relationship
    are the same.)
  • MN, MNP, etc. standardly handled by breaking
    down into two, three, etc. 1M relationships
    going to a new entity type a bridging or
    linking type.
  • 11 recursivevarious different methods according
    to circumstances, one involving a bridging type.
  • 1M recursivecan often be handled within a
    single table, but may be reasonable to introduce
    a bridging type.

11
Other Representation Issues
  • Multivalued attributes. OK in themselves in early
    stages of design, but should eventually be broken
    down into single-valued attributes in some way.
  • A main divergence in ways of doing this is based
    on whether the different values are for stably
    identifiable subattributes.
  • Generalization hierarchies. Exhaustiveness,
    disjointness.

12
Normalization
  • What normalization is and what role it plays in
    the database design process
  • The normal forms 1NF, 2NF, 3NF, BCNF, and 4NF.
  • How normal forms can be transformed from lower
    normal forms to higher normal forms.
  • That normalization and ER modeling are used
    concurrently to produce a good database design,
    helping to eliminate data redundancies
    anomalies.
  • That some situations require denormalization to
    generate information efficiently.

13
Creating ER Models/Diagrams
  • Designing an ER model for a database is an
    iterative process, because, e.g.
  • As you proceed, you think of new ways of
    conceiving whats going on (much as in ordinary
    programming)
  • Multivalued attributes need to be re-represented
  • MN relationships can be included as such at an
    early stage, but generally need to be replaced by
    bridging entity types at some point
  • 11 relationships raise a red flag may indicate
    poor design (though standard in supertype/subtype
    representation)
  • Special supertype/subtype notation may need to be
    converted into more standard diagram notation
  • Conversion to a Normal Form.

14
SQL
  • Mainly, the module only covers how to query
    tables and how to create tables.
  • See manual and textbook for much more if you
    want!

15
MATHEMATICAL VIEW
16
Tuples, Relations and Tables
  • A relation is simply a set of tuples drawn from
    some sets.
  • A relation is therefore a subset of the Cartesian
    product of those sets.
  • A row is a tuple. Hence a table at any given
    moment induces a relation over the value domains
    of the table.
  • The table consists of not just the induced
    relation but also the attributes themselves,
    their domains, specification of primary and
    foreign keys, etc.

17
Totality, Partiality and Functionality of
Relations
  • Totality and partiality in general.
  • Relations induced by tables as almost certainly
    being merely partial (i.e., not total).
  • Functionality of relations. Functions and partial
    functions.
  • Determination relationships (i.e., functional
    dependence relationships) within a table as
    inducing partial functions from one or more of
    the tables value domains to others.

18
Some Categories of Relation
  • One-to-one
  • One-to-many and many-to-one
  • Many-to-many

19
Relations from Entity Relationships
  • The connection between relationships in ERMs and
    mathematical relations.
  • E.g., the EMPLOYED-BY relationship from the
    People entity type to the Organizations entity
    type says that
  • the database (at any moment) stores a relation
    from the People entity set to the Organizations
    entity set.
  • The connection between connectivity of a
    relationship between entity types and the issue
    of whether the corresponding relation is
    one-to-one, one-to-many, etc.
  • The connection between mandatoriness of a
    relationship from entity set E to entity set F
    and a restricted notion of relation totality.
    (The relation is total on the set of current E
    things.)

20
Some Operations on Sets in General
  • Union, intersection, difference and Cartesian
    product of two sets A and B (of any sort).
  • When A and B are relations the set of all
    possible concatenations of a tuple within A and a
    tuple within B.
  • I have called this the flattened Cartesian
    product of A and B, notated as A ? B as opposed
    to A ? B.
  • This name is intended to emphasize that the
    operation is the mathematical basis of the
    product of two tables.

21
Relational DB Operators Relational Algebra
  • Defines theoretical way of manipulating tables
    using relational DB operators that mainly
    manipulate the relations in the tables.
  • SELECT
  • PROJECT
  • JOIN (various sorts)
  • INTERSECT
  • Use of relational DB operators on existing tables
    produces new tables. Strong connection to SQL
    commands/operators.
  • Relational algebra puts relational DB operators
    into a mathematical notation that is more
    convenient than, e.g., SQL operators.
  • UNION
  • DIFFERENCE
  • PRODUCT
  • DIVIDE

22
QUESTIONS?
23
What Is a Database? contd.
  • A database is generally a regularly (but often
    complexly) structured body of information about
    entities of various specific, precisely defined
    types.
  • The entities are generally in various specific
    types of relationship to each other
  • Each entity has a specific set of (intrinsic)
    attributes of interest.
  • The values of intrinsic attributes are generally
    of fairly basic, simple sorts (e.g., numbers,
    dates, names).
  • Generally there are many entities of some of the
    types or rather the expectation is that the
    numbers could get large.
  • The entities of a given type are typically not in
    any special order.

24
What Is a Database(?) contd.
  • The individual data elements held, though of
    simple sorts, are generally close to end-user
    concepts and concerns -- they are directly
    meaningful interesting to users. E.g., price of
    a car.
  • Contrast the detailed meteorological
    measurements across the country and beyond, used
    in a weather forecasting system for the UK.
  • The data held and retrieved is generally of exact
    form (no vagueness expressed) and of definite
    form (no uncertainty expressed or expected).
  • The operations provided to users for extracting,
    inserting and updating data are of conceptually
    straightforward sorts, not requiring elaborate
    reasoning, problem-solving or analysis.
  • However, aggregate/overview/statistical
    information (counts, averages, maxima, graphs,
    histograms, etc.) often needs to be computed from
    the data.

25
Data Redundancy, Data Anomalies, and Data
Integrity
  • See Ch.1 and many other parts of book.
  • Data Redundancy replicating data in different
    places in a data repository.
  • E.g., in a recipe book, saying how to fry onions
    every time fried onions are needed in a recipe.
  • Encourages data anomalies and lack of
    integrity basically, inconsistency between
    the different places.
  • Such problems arise with insertions, deletions
    and modifications in general.
  • Also causes a type of inefficiency replicated
    updates.

26
Redundancy, etc., contd. 1
  • Redundancy implies that if you want to
    modify/delete a piece of information, you need to
  • know that there is (or is not) replication, or
    check for possible replications
  • go to the effort of repeating changes when the
    item is replicated
  • avoid errors in such repeated changes.

27
Restrictions on our Tables
  • We will apply the following restrictions amongst
    others
  • Regular overall shape rows all same length,
    similarly columns. No division into different
    regions (with a certain exception).
  • No labels for rows, as opposed to columns. No
    significance to the order or number of rows (the
    number can change).
  • All cells in any one column given same intuitive
    interpretation.
  • One data item per cell (but it can be a
    variable-length character string, containing
    anything).
  • Each cells item restricted to a pre-specified,
    fairly simple format, and all cells in any given
    column restricted to same format. No exceptional
    entries.
  • Uncertainty markers not recognized though
    tolerated in principle.
  • Some columns not supposed to have empty or vague
    entries.
  • No additional comments, footnotes, etc.

28
A Bad Table
29
Extra, Crucial Restriction
  • No row can be repeated in a table. (I.e., no two
    rows can contain be exactly the same in terms of
    the values they contain.) Exception
    dynamically-created, temporary tables.
  • Equivalent to saying
  • Rows are uniquely determined (picked out) by the
    values in some set of columns. That is, given
    some values for those columns, there is at most
    one row that has those values in those columns.

30
A Conceptual Model
  • Represents global view of the database
  • Enterprise-wide representation of data as viewed
    by high-level managers
  • Basis for identification and description of main
    data objects and relationships, avoiding
    details

31
The Entity Relationship Model
  • Introduced by Chen in 1976
  • Most widely used conceptual model of DBs
  • The ER model strictly speaking is just the
    approach of thinking of things as composed of
    entities, attributes and relationships it has
    nothing intrinsically to do with diagrams
    J.A.B.
  • We also say that applying this approach to a
    particular body of data gives rise to an ER model
    of the specific intended database
  • But diagrams based on the model are a widely
    accepted and adapted graphical approach to data
    modeling

32
Entity Relationship Diagrams (ERDs)
  • The ER model of a database forms the basis of an
    ER diagram (or several diagrams).
  • The ERDs represent the database as viewed by end
    users.
  • There are several markedly different styles of
    ERD, and for each main style there are several
    variants.
  • And the style in the module handouts will differ
    somewhat from that in the textbook and these
    lectures
  • An ERD depicts (some of) the ER models entities,
    attributes and relationships, and (depending on
    the diagram style) varying amounts of other info
    such as connectivities, cardinalities, keys,
    weakness,

33
What Is Represented as Tables?
  • The question of what types of thing should
    correspond to entity types and hence tables
    depends on the application and your design
    judgment.
  • It all depends on things like
  • what variety of data is needed about something
  • how separate the pieces of data about a given
    thing are
  • what operations are needed
  • how often theyre needed.

34
Attribute Determination
  • A collection of one or more attributes determines
    another attribute A if only one value for A is
    possible given the values for the former
    attributes.
  • E.g., the collection DAY-NUMBER, MONTH and YEAR
    could determine DAY-NAME.
  • Also write this as DAY-NUMBER, MONTH, YEAR ?
    DAY-NAME
  • We alternatively say that DAY-NAME is
    functionally dependent on DAY-NUMBER, MONTH and
    YEAR.

35
Keys(RC pp.82/3)
  • A key for a table is a collection of one or more
    attributes that determines some other
    attribute(s) in that same table.
  • A superkey for a table is a collection of one or
    more attributes that determines all the other
    attributes in the table, i.e. determines a whole
    row.
  • Trivially, the collection of all the attributes
    is a superkey.
  • A candidate key is a minimal superkey (i.e., you
    cant remove attributes from it and still have a
    superkey.)

36
Primary Keys
  • A primary key for a table (entity type) is a
    candidate key that the DB designer has chosen as
    being the main way of uniquely identifying a row
    (entity). Extra restriction Its attributes are
    not allowed to have null values.
  • It could be that theres only one candidate key
    in practice anyway, such as a persons ID number.
  • Primary keys are the main way of identifying
    target entities in entity relationships, e.g.,
    the way to identify someones employing
    organization.
  • For efficiency reasons, the simpler primary keys
    are the better.
  • Identity numbers (of people, companies, products,
    courses, etc.), or combinations of them with one
    or two other attributes, are the typical primary
    keys in examples in RC.

37
Strong and Weak Entity Types
  • A weak entity type is one to which there is at
    least one strong relationship going to it from
    another entity type.
  • So on previous slide, Dependents is weak, because
    there is a strong relationship to it from People.
  • A strong entity type is one that is not weak!
    i.e. there are no strong relationships going to
    it from another type.
  • A weak entity is existence-dependent on every
    entity type that has a strong relationship going
    to it.
  • Marys existence in the database as a member of
    Dependents relies on the existence of person
    1698674 in the database. (Doesnt mean Mary would
    vanish from the planet if person 1698674 left the
    database! And indeed Mary could herself be an
    entity in type People even if 1698674 leaves.)

38
Strong and Weak Relationships
  • Also called identifying and non-identifying
    relationships respectively.
  • A relationship from entity type A to entity type
    B, mediated by having As primary key (PK) as an
    attribute of B, is strong when Bs PK contains
    As.
  • E.g., A People, B Dependents, where each
    entity in People is identified by a PERS_ID (As
    PK) and each entity in Dependents is identified
    by a PERS_ID plus a first name and a
    kinship/friendship attribute.
  • So a PK value in B could be (1698674, Mary,
    child), meaning that this entity is the child
    called Mary of person 1698674.
  • A relationship is weak when it isnt strong.

39
Connectivity Cardinality
  • Relationships are importantly categorized as to
    uniqueness or multiplicity of entities at either
    end connectivity. Has big effect on DB design.
  • Relationships can be further specified as to how
    many entities allowed or required at either end
    cardinality.
  • Also has significant effect on DB design.

40
Relationship Participation
  • Optional in a particular direction, X to Y
  • an X entity occurrence does not require a
    corresponding Y entity occurrence
  • i.e. the minimum number of Ys per X is 0
  • Mandatory in a particular direction, X to Y
  • an X entity occurrence requires a corresponding Y
    entity occurrence
  • i.e. the minimum number of Ys per X is 1 or more

41
Relationships and Foreign Keys
  • A foreign key in a table T is a chosen collection
    of attributes intended to match the attributes in
    the primary key in another table. In essence, the
    foreign key is that primary keys ambassador in
    T.
  • Standardly, a relationship is represented by
    means of foreign keys.

42
1M Connectivity between Tables
More than one employee allowed per organization,
but no more than one employer per person. NOTE
direction of use of a foreign key. Why so??
People
Organizations
43
MN Connectivity between Tablesusing a Bridging
Entity Type
People
Employments
Organizations
44
Examples in Two Styles of Diagram
45
Multivalued Attributes in ERMs and ERDs
46
Splitting the Multivalued Attribute into New
Namable Component Attributes
47
OR, Replace the Attribute by a New Entity Set
with an Attribute identifying the Original
Attributes Separate Components
48
Generalization Hierarchies in ERMs and ERDs
49
A Generalization Hierarchy with Overlapping
Subtypes (Gs)and Disjoint Subtypes (G)
50
Supertypes/Subtypes in ERDs
A supertype maintains a 11 relationship with
each subtype optional in the super-to-sub
direction and mandatory in the other
51
Non-Binary and Recursive Relationships in ERMs
and ERDs
52
Tables for a Ternary Relationship
CFR is just like a bridging entity type for a
binary MN relationship, but has 3 foreign keys
instead of 2
53
Recursive Relationships
  • Relationship can exist between occurrences of the
    same entity set
  • E.g. marriage, management, parthood,

54
Table for the 1M EMPLOYEE Manages EMPLOYEE
Recursive Relationship
55
Tables for the MN Recursive PART Contains PART
Relationship
The COMPONENT entity type is just a bridging type
56
Alternative Implementations of a 11 Recursive
Relationship
  • As previously.
  • MARRIED_VI is just a bridging entity type.
  • MARRIAGE together with MARPART act similarly as
    bridge.

57
NORMALIZATION(Ch. 5 of RC)
58
First Normal Form
  • Tabular format in which
  • All attributes are dependent on a chosen
    primary key
  • Primary-key attributes do not have NULL values
  • There are no repeating groups in the table
  • All relational tables satisfy 1NF requirements

59
A Table that is Not in 1NF because of repeating
groups
60
Second Normal Form
  • Table is in second normal form (2NF) if
  • It is in 1NF and
  • It includes no partial dependencies
  • Convert to 2NF by creating a new table for each
    part of the primary key that provides partial
    dependencies, and moving out the corresponding
    dependent attributes into the new tables.

61
A Dependency Diagram for a Table that is merely
in 1NF (by having a partial dependency)
62
Second Normal Form (2NF) Conversion Results
63
Third Normal Form
  • A table is in third normal form (3NF) if
  • It is in 2NF and
  • It contains no transitive dependencies

64
Conversion to Third Normal Form
  • For every transitive dependency, take its
    determinant (the attribute or collection of
    attributes on which the dependency rests) as a
    PK for a new table.
  • Move the attributes dependent on the determinant
    to that new table.

65
Third Normal Form (3NF) Conversion Results
66
The Boyce-Codd Normal Form (BCNF)
  • A table is in BCNF if every one-attribute
    determinant in the table is a candidate key
  • i.e., every attribute that determines any other
    attribute determines all the attributes, and
    therefore determines a unique row, so theres no
    redundancy problem

67
A Table in 3NF but not in BCNF
68
Decomposition to BCNF
The middle diagram shows that changing the PK so
as to include C doesnt work
69
Plusses/Minusses of Normalization
  • PLUS Unnormalized tables in a production
    database tend to have these defects
  • Data updates are less efficient because programs
    that read and update tables must deal with larger
    tables
  • Indexing is much more cumbersome
  • MINUS Joining larger number of tables takes
    additional disk input/output (I/O) operations and
    processing logic
  • Reduces system speed
  • Conflicts among design efficiency, information
    requirements, and processing speed are often
    resolved through compromises that may include
    denormalization

70
Tuples in a Table
People
  • The tuples are
  • ?9568876A, Chopples, 37gt
  • ?2544799Z, Blurp, 21gt
  • ?1698674F, Rumpel, 88gt

71
Cartesian Product of Domains
  • The set of all possible tuples formed from the
    domains is called the Cartesian product of the
    domains.
  • (This applies to any sets, not just domains in
    tables.)
  • Notation, e.g. D ? E ? F ? G ? H
  • if D, E, F, G, H are the sets (domains)not
    necessarily different.
  • In discrete mathematics, any subset at all of
    that Cartesian product is called a relation on
    the sets in question (D, E, )
  • even the whole of the product (even if infinite)
  • and even the empty set.
  • (The notion of relation is very useful in CS as a
    whole.)

72
Relation from a Table
People
  • The relation at the moment is
  • ? ?9568876A, Chopples, 37gt
  • ?2544799Z, Blurp, 21gt
  • ?1698674F, Rumpel, 88gt ?

73
Relations from Somewhere to Somewhere
  • Some convenient terminology of my own
  • A relation R from A1, A2, A3, to B1, B2, B3,
    is a subset of A1 ? A2 ? A3 ? B1 ? B2 ? B3 .
  • i.e., R ? A1 ? A2 ? A3 ? B1 ? B2 ? B3 .
  • Same thing as a relation on A1, A2, A3, B1, B2,
    B3, just different terminology.

74
Partiality of Tables
  • The relation in a table, considered as a relation
    from its first attribute domain to the remaining
    attribute domains, will almost always be merely
    partial.
  • Similarly, any attribute-reordered version of the
    tables relation will be merely partial.
  • We can sum this up by saying that the relation is
    partial on each of its attribute domains.

75
Functional Relations
  • A relation from A to B is functional if, for any
    a in A, there is at most one b in B such that ?a,
    bgt is in R.
  • A total functional relation from A to B is called
    a function from A to B.
  • A (merely) partial functional relation from A to
    B is called a (merely) partial function from A to
    B.
  • Can generalize a relation from A1, A2, A3 to
    B1, B2, B3, is functional if, for any a1, a2,
    a3, in A1, A2, A3, respectively, there is at
    most one b1, b2, b3, in B1, B2, B3,
    respectively such that ?a1, a2, a3, , b1, b2,
    b3, gt is in R.

76
Functional Relations arising from Dependency
Relationships
  • Suppose attribute X is functionally dependent on
    ( determined by) attributes A, B, in a table.
  • Restrict to the subtable that just has attributes
    X and A, B, , with A, B, first.
  • Then the relation in this subtable is a partial
    function from the A, B, domains to the X
    domain.
  • In particular, each attribute X outside the
    primary key of a table is functionally dependent
    on the primary key, so we have a partial function
    from the PK domains to the X domain.

77
Other Categories of Relation
  • A relation R from A to B is one-to-one (1-1) if,
    for any a in A, there is at most one b in B such
    that ?a, bgt is in R, AND for any b in B, there is
    at most one a in A such that ?a, bgt is in R.
  • That is, both the relation and its inverse from B
    to A are functional.
  • To put it another way it is functional and
    different members of A map to different members
    of B.
  • Or again Different members of A map to different
    members of B and different members of B map to
    different members of A.

78
  • A relation R from A to B is many-to-one if it is
    functional but not necessarily one-to-one i.e.,
    there may be at least one case of different
    members of A mapping to the same member of B.
  • A relation R from A to B is one-to-many if it is
    NOT necessarily functional but its inverse from B
    to A is functional i.e., there may be at least
    one case of a member of A mapping to more than
    one member of B, but each member of B can map to
    at most one member of A.
  • A relation R from A to B is many-to-many if
    neither it nor its inverse is necessarily
    functional i.e., there may be at least one case
    of a member of A mapping to more than one member
    of B, and there may be at least one case of a
    member of B mapping to more than one member of A.

79
Example Continued
  • So at any given moment the relation might be
  • ?Person1, Org1gt, ?Person2, Org1gt, ?Person3,
    Org1gt,
  • ?Person4, Org2gt, ?Person3, Org2gt
  • Each Person.. and Org.. is an entity ...
    therefore represented as a row of the
    corresponding table ... therefore itself
    mathematically represented as a tuple of
    attribute values
  • So ?Person1, Org1gt could be, in more detail,
  • ? ?E156, Sam, Finks, I678gt, ?I678, IBM,
    USAgt gt
  • Note the nested tuples.

80
Bridging Entity Types
  • Recall that bridging entity types are brought in
    to represent MN relationships (and similarly
    MNP relationships, etc.)
  • People/Organizations again the relation in the
    bridging table would look like ? E156, I678gt, ?
    E257, I996gt, ?E714, I678gt, .
  • This relation can also be said to correspond to
    the original People-Organization relationship,
    but is abstracted from the above relation by
    replacing tuples representing entities, such as
    ?E156, Sam, Finks, I678gt, by the PK values in
    them, such as E156.

81
Connectivities
  • If a relationship from an entity type to another
    is 11, then at any moment the actual relation
    must be one-to-one (1-1).
  • If the relationship is 1M then the relation at
    any moment is allowed to be one-to-many (but may
    by chance be one-to-one).
  • If the relationship is MN then the relation at
    any moment is allowed to be many-to-many (but may
    by chance be one-to-one, one-to-many, or
    many-to-one).

82
Optionality/Mandatoriness
  • If a relationship from an entity type E to
    another type F is mandatory then the relation at
    any moment is total when restricted to the set
    of entities currently in E (and NOT on the whole
    entity set of E, unless all possible entities of
    type E are in the E table!!).
  • Observation So if it is also one-to-one or
    many-to-one it is in fact a function on the
    current set of E entities to the set of
    organizations.
  • If a relationship from an entity type E to
    another type F is optional then the relation is
    not required to be total in the above sense (but
    may happen to be).

83
Some Operations on Sets in General
  • Union of sets A and B
  • A ? B the set of things that are in A or B (or
    both).
  • NB no repetitions allowed!
  • Intersection of sets A and B
  • A ? B the set of things that are in both A and
    B.
  • Difference of sets A and B
  • A ? B the set of things that are in A but not
    B.
  • Note also notated by a backslash instead of a
    minus sign.

84
General Set Operations contd.
  • I need to define also the non-standard notion of
    flattened Cartesian product of two sets A and
    B, applicable when the members of A and B are
    already tuples. Notated by the symbol ?
    (underlined multiplication symbol).
  • A ? B the set of tuples that are the
    concatenations of members of A and members of B.
    E.g., if lta,b,cgt is in A and ltd,e,fgt is in B then
    lta,b,c,d,e,fgt is in A ? B.

85
General Set Operations contd.
  • So if A is the People relation and B is the
    Organizations relation,
  • A ? B has members of form
  • ? ?E156, Sam, Finks, I678gt, ?I459,
    Dell, UKgt gt
  • BUT
  • A ? B has members of form
  • ?E156, Sam, Finks, I678, I459, Dell,
    UKgt

86
Select
  • SQL
  • SELECT FROM WHERE
  • Note its the WHERE part that is actually doing
    the selection according to a criterion.
  • Maths of relations
  • No particular notation for result relation. Could
    express as
  • t ?R t satisfies C
  • where R is the relation in the given table and C
    is the criterion.
  • Relational algebra notation in handout
  • Result table is ?C(T) where T is the given table.
  • More compact than SQL notation.
  • Problem relations and RA dont account for
    duplicates of rows, if there are any in the given
    table.

87
Project
  • SQL
  • SELECT FROM
  • Note its the after the SELECT that is
    actually doing the projection (selection of
    specified attributes)!!
  • Maths of relations
  • No particular concise notation for result
    relation. Could express as
  • t ?P there is a tuple u in R such that t is u
    with such and such components removed
  • where R is the relation in the given table and P
    is the Cartesian product of the domains of the
    selected attributes.
  • Relational algebra notation in handout
  • Result table is ?X(T) where T is the given table
    and X is the list of selected attributes.
  • Problem relations and RA dont account for
    duplicates of rows, if there are any in the given
    or result tables.

88
Union and Intersection
  • SQL
  • UNION, INTERSECT (UNION ALL, INTERSECT ALL)
  • Maths of relations
  • Result relations are R1 ? R2 and R1 ? R2
  • where R1and R2 are the relations in the given
    tables (under the assumption that they have their
    attributes listed in the same order).
  • Relational algebra notation in handout
  • Result tables are T1 ? T2 and T1 ? T2 where T1
    and T2 are the given tables.
  • Problem relations and RA dont account for
    duplicates of rows, so dont handle UNION ALL or
    INTERSECT ALL.

89
Difference
  • SQL
  • MINUS or EXCEPT (MINUS ALL or EXCEPT ALL)
  • Maths of relations
  • Result relation is R1 ? R2
  • where R1and R2 are the relations in the given
    tables (under the assumption that they have their
    attributes listed in the same order).
  • Relational algebra notation in handout
  • Result table is T1 ? T2 where T1 and T2 are the
    given tables.
  • Problem relations and RA dont handle EXCEPT ALL.

90
Product or Cross Join
  • SQL
  • SELECT FROM two or more tables
  • NB its the mere listing of the tables that does
    the Product, but its possible also to write
  • SELECT FROM T1 CROSS JOIN T2 CROSS JOIN ...
  • Maths of relations
  • Result relation is R1 ? R2 where R1and R2 are the
    relations in the given tables.
  • Relational algebra notation
  • Result table is T1 ? T2 where T1 and T2 are the
    given tables.
  • Problem relations and RA dont account for
    duplicates of rows, if there are any in the given
    tables.

91
Natural Join
  • Links two tables by finding rows in each that
    match on the attributes the two tables have in
    common (if any), and then joining the matching
    rows together in a natural way.
  • The common attributes or columns are called the
    join attributes or columns) just the AGENT_CODE
    attribute in above example
  • Can be thought of as the result of a three-stage
    process
  • the PRODUCT of the tables is created
  • a SELECT is performed on the resulting table to
    yield only the rows for which the join-attribute
    values (e.g. AGENT_CODE values) are equal
  • a PROJECT is now performed to yield a single copy
    of each join attribute, thereby eliminating
    duplicate columns

92
Natural Join (continued)
  • SQL
  • SELECT FROM T1, T2 WHERE explicit
  • statements of equality of ALL the shared
    attributes ...
  • NB its possible also to write
  • SELECT FROM T1 NATURAL JOIN T2
  • and see book section 7.2 for more.
  • Maths of relations
  • No standard concise notation. Result relation
    could be expressed as the result of Project and
    Select operations on R1 ? R2 where R1 and R2 are
    the relations in the given tables.
  • Relational algebra notation
  • Result table is T1 ... T2 where T1 and T2 are the
    given tables and the is a symbol I cant find
    in PowerPoint!

93
Other Forms of Join
  • Equijoin (often what you have been doing in SQL
    work)
  • Links tables on the basis of an equality
    condition that compares SPECIFIED attributes of
    each table, rather than automatically taking the
    common attributes.
  • These specified join attributes do not have to
    have the same name.
  • Result does not eliminate duplicate columns
  • Outer Join (various forms) includes (some)
    non-matching rows
Write a Comment
User Comments (0)
About PowerShow.com