Operations in the Relational Model - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

Operations in the Relational Model

Description:

Not expressible directly by users, but used in database ... literal: atom or negated atom. clause: a disjunction ... negated atom is allowed to occur only ... – PowerPoint PPT presentation

Number of Views:44
Avg rating:3.0/5.0
Slides: 57
Provided by: dblab1
Category:

less

Transcript and Presenter's Notes

Title: Operations in the Relational Model


1
Operations in the Relational Model
  • Focused on database queries in an abstract point
    of view
  • relational algebra
  • Datalog

2
Relational algebra an overview
  • a relational query language
  • Not expressible directly by users, but used in
    database systems internally
  • Includes a number of operations to manipulate
    relations.
  • Used to specify retrieval requests (queries)
  • essentially DML
  • Query result is also in the form of a relation

3
Relational algebra operations
  • SELECT and PROJECT operations.
  • Set operations
  • UNION
  • INTERSECTION
  • SET DIFFERENCE
  • CARTESIAN PRODUCT
  • JOIN operations
  • Other relational operations
  • DIVISION
  • OUTER JOIN
  • AGGREGATE FUNCTIONS
  • SUM, MIN, MAX, COUNT, AVERAGE

4
Set operation
  • Binary operations from mathematical set theory
  • union R ? S
  • intersection R ? S
  • set difference R S
  • For set operations, the operand relations
    R(A1,A2,,An) and S(B1,B2,,Bn) must have the
    same number of attributes, and the domains of
    corresponding attributes must be compatible that
    is, dom(Ai) dom(Bi) for i 1,2,,n. This
    condition is called union compatibility
  • Duplicate tuples are eliminated!
  • If R and S have different attributes names,
    rename them

5
Examples of set operations
  • Relation R and S
  • R ? S
  • R ? S
  • R S

6
Project operation (?)
  • Keep only certain attributes (columns) from a
    relation R specified in an attribute list L
  • ?L(R)
  • Resulting relation has only those attributes of R
    specified in L
  • Example ? title,year,length(Movie)

  • ?
  • In principle, PROJECT eliminates duplicate tuples
    in the resulting relation so that it remains a
    mathematical set (no duplicate elements)

7
Select operation (?)
  • Selects (retrieves) the tuples from a relation R
    that satisfy a certain selection condition c
  • ?c(R)
  • The condition c is an arbitrary Boolean
    expression on the attributes of R
  • Resulting relation includes each tuple in r(R)
    whose attribute values satisfy the condition c
  • Example
  • ?length ? 100(Movie)
  • ?length ? 100 AND studioName Fox(Movie)

8
Cartesian product (?)
  • T(A1,A2,,An,B1,B2,,Bm) ? R(A1,A2,,An) ?
    S(B1,B2,,Bm)
  • A tuple t exists in T for each combination of
    tuples t1 from R and t2 from S such that
    tA1,A2,,An t1 and tB1,B2,,Bm t2
  • If R has n1 tuples and S has n2 tuples, then T
    will have n1 n2 tuples
  • CARTESIAN PRODUCT is a meaningless operation on
    its own !
  • Example

9
Natural join (?)
  • In a natural join, the redundant join attributes
    appear only once in the result
  • The equality condition is implied and need not be
    specified
  • Example

10
Theta-join
  • Cartesian product followed by a SELECT
  • T(A1,A2,,An, B1,B2,,Bn) ? R(A1,A2,,An) ?c
    S(B1,B2,,Bn)
  • c the join condition
  • Equijoin
  • the join condition c includes one or more
    equality comparisons involving attributes

11
Queries expression
  • Several operations can be combined to form a
    relational algebra expression (query)
  • What are the titles and years of movies made by
    Fox that are at least 100 minutes long
  • ?title,year(?length?100 (MOVIE) ?
    ?studioNameFox (MOVIE))
  • ?title,year(?length?100 AND studioNameFox
    (MOVIE))

12
Queries expression example
  • Movie1(title, year, length, filmType,
    studioName)Movie2(title, year, starName)
  • Find the stars of movies that are at least 100
    minutes long?starName(?length?100(Movie1 ?
    Movie2))

13
Renaming
  • Change attribute name or relation name (denoted
    by ?)
  • ?s(A1,A2,,An)(R)
  • The resulting relation has exactly same tuples as
    R, but the name of relation is S
  • ?s(R), to change the relation name only
  • Example

14
Complete set of relational algebra
  • Some of the operations can be expressed in terms
    of other relational-algebra operations
  • R ? S R ? (R ? S)
  • R ?C S ?C(R ? S)
  • R ? S ?L(?C(R ? S))
  • ? , ?, ?, ?, ?, ? is the complete set of
    relational algebra operation
  • none of which can be written in terms of the
    others
  • Any query language equivalent to these operations
    is called relational complete
  • additional operations that were not part of the
    original relational algebra
  • Aggregate functions and grouping
  • division
  • Outer join, outer union

15
Division operation (?)
  • is useful for request that occurs frequently in
    database applications
  • to formulate for all (?) semantic
  • join is used to formulate there exists (?)

T ? R ? S
16
Division example
  • Retrieve the SSNs of employees who work on all
    the projects that 'John Smith' works on
  • SSN_PNOS ? SMITH_PNOS

17
Outer join
  • In a regular join, tuples in R1 or R2 that do not
    have matching tuples in the other relation do not
    appear in the result
  • Some queries require all tuples in R1 (or R2 or
    both) to appear in the result
  • When no matching tuples are found, nulls are
    placed for the missing attributes
  • Left outer join R1 R2 lets every tuple in R1
    appear in the result
  • Right outer join R1 R2 lets every tuple in R2
    appear in the result
  • Full outer join R1 R2 lets every tuple in R1
    or R2 appear in the result

18
Outer union (union join)
  • used when not union compatible
  • student(name, ssn, department, advisor)faculty(na
    me, ssn, department, rank)student outer-union
    faculty
  • R(name, ssn, department, advisor, rank)
  • all tuples from both relations are included
  • for student tuple, null for rank attribute
  • for faculty tuple, null for advisor attribute

19
full outer join vs. outer union
  • T1 outer-join c2c3 T2
  • T1 outer-union T2

20
Deductive databases
  • Deductive databases
  • Database Theorem proving Logic programming
  • Apply AI techniques (logic and resolution) to
    databases
  • Very active research area since mid 80s
  • Major contributions to database technology
  • Solid formalization of databases in a logical way
  • Recursive query processing
  • Theoretical provision of query processing
  • Inclusion of incomplete knowledge into databases
  • and so on
  • Major prototypes
  • Datalog (J.D. Ullman)
  • LDL (S. Naqvi and S. Tsur)
  • CORAL (U. of Wisconsin at Madison)

21
Basis
  • predicate
  • P, Q, R,
  • variable x,y,z,
  • atom a predicate followed by its arguments
  • P(x,y,z), Q(x,y,z,w), R(x),
  • positive literal
  • arithmetic atom x lt y, x1 gt y4?z,
  • literal atom or negated atom
  • clause a disjunction of literals
  • Horn clause a clause with at most one positive
    literal

22
How to model DB
  • R(1,2) is true while R(5,6) is false
  • Presents databases as conjunction of logical
    formulas
  • DB EDB (extensional databases) IDB
    (intensional databases) UNA (unique name
    axiom) DCA (domain closure axiom) EQA
    (equality axiom) etc.
  • EDB corresponds to relations that are stored in
    databases
  • IDB corresponds to virtual relations that are
    computed as needed

23
Datalog (Database logic)
  • Datalog rule
  • relational atom (head) ? body consisting of one
    or more literals (called subgoal)
  • Note
  • there is only one predicate as head literal
  • head literal must be positive
  • body literals must be connected by AND
  • negated atom is allowed to occur only in the body
  • arithmetic atom is allowed to occur only in the
    body

24
Datalog example
  • P(x,y) ? Q(x,z) AND R(z,y) AND NOT Q(x,y)Q(x,z)
    lt1,2gt, lt1,3gtR(z,y) lt2,3gt, lt3,1gt
  • Tuple lt1,1gt is the only one in P

25
Datalog example
  • Movie(title, year, length, inColor, studioName,
    producerC)
  • Query find movie title and year whose run time
    is at least 100 minutes long
  • LongMovie(t,y) ? Movie(t,y,l,c,s,p) AND l ?
    100LongMovie(t,y) ? Movie(t,y,l,_,_,_) AND l ?
    100 // anonymous variable
  • LongMovie ?title, year(?length?100(Movie))

26
Not all formulae are valid !!!
  • Not safe formulas
  • biggerthan(x,y) ? x gt y
  • loves(x,y) ? lover(x)
  • Definition limited variable
  • any variable that appears as an argument in a
    relational atom of the body is limited
  • any variable X that appears in a subgoal Xa or
    aX, where a is a constant, is limited
  • variable X is limited if it appears in a subgoal
    XY or YX, where Y is a variable already known
    to be limited
  • Definition A rule is safe if all its variables
    are limited
  • P(x,y) ? Q(x,z) AND R(w,x,z) AND xlty // not
    safe
  • P(x,y) ? Q(x,z) AND wa AND yw // safe

27
From relational algebra to Datalog (1)
  • R ? S
  • I(n,a,g,b) ? R(n,a,g,b) AND S(n,a,g,b)
  • R ? S
  • U(n,a,g,b) ? R(n,a,g,b)U(n,a,g,b) ? S(n,a,g,b)
  • R ? S
  • D(n,a,g,b) ? R(n,a,g,b) AND S(n,a,g,b)
  • ?
  • P(t,y,l) ? Movie(t,y,l,_,_,_)
  • ?
  • P(a,b,c,d,w,x,y,z) ? R(a,b,c,d) AND S(w,x,y,z)

28
From relational algebra to Datalog (2)
  • ?
  • ?length?100 AND studioName Fox(Movie)S(t,y,l,
    c,s,p) ? Movie(t,y,l,c,s,p) AND l ? 100 AND s
    Fox
  • ?length?100 OR studioName Fox(Movie)
    S(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND l ? 100
    S(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND s Fox
  • Any logical expression has an equivalent
    disjunctive normal form, in which the expression
    is the OR of conjuncts
  • one rule for each conjuncts
  • ?NOT (length?100 OR studioName
    Fox)(Movie)?lengthlt100 AND studioName ?
    Fox)(Movie) S(t,y,l,c,s,p) ?
    Movie(t,y,l,c,s,p) AND l lt 100 AND s ? Fox

29
From relational algebra to Datalog (3)
  • ? (natural join)
  • J(a,b,c,d) ? R(a,b) AND S(b,c,d)
  • theta-join
  • U(A,B,C) V(B,C,D)
  • U ? AltD AND U.B ? V.B VJ(a,ub,uc,vb,vc,d) ?
    U(a,ub,uc) AND V(vb,vc,d) AND altd AND ub ? vb
  • U ? AltD OR U.B ? V.B VJ(a,ub,uc,vb,vc,d) ?
    U(a,ub,uc) AND V(vb,vc,d) AND altd
    J(a,ub,uc,vb,vc,d) ? U(a,ub,uc) AND V(vb,vc,d)
    AND ub ? vb

30
More example
  • W(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND l ?
    100X(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND s
    FoxY(t,y,l,c,s,p) ? W(t,y,l,c,s,p) AND
    X(t,y,l,c,s,p)Z(t,y) ? Y(t,y,l,c,s,p)
  • equivalentlyZ(t,y) ? Movie(t,y,l,c,s,p) AND l ?
    100 AND s Fox

31
Recursive query in Datalog
  • ancestor(x,y) ? parent(x,y)ancestor(x,y) ?
    parent(x,z) and ancestor(z,y)
  • Query find all ancestors of Park ?
    ancestor(x, Park) // recursive query
  • Such query is beyond the expressiveness of
    relational algebra expression
  • Note
  • there is only one recursive predicate in the body
    (linear recursion)
  • there is a linear variable pattern
  • the recursive predicate is always positive
  • there is always an exit rule

32
Least fixed point
  • Consider an equation Rf(R) where f is a
    relational algebra expression
  • A least fixed point of the equation is a relation
    R such that
  • R f(R) and
  • If R is any relation such that R f(R), then R?
    R
  • Thm A unique least fixed point exists if f is
    monotonic, in which in the context of the partial
    order ? on relations means thatif R1 ? R2 ,
    then f(R1) ? f(R2)
  • Thm Any relational algebra expression that does
    not use the set difference operator is monotonic

33
Computing the least fixed point
  • FollowOn(x,y) ? sequel(x,y)FollowOn(x,y) ?
    sequel(x,z) and FollowOn(z,y)
  • first round third round
  • second round

34
Non-linear recursion example 1
  • Flight(airline, from, to, departs,
    arrives)Reaches(x,y) ? Flight(a,x,y,d,r)Reaches(
    x,y) ? Reaches(x,z) and Reaches(z,y)
  • The first round generates 7 tuples
  • The second round generates 3 tuples (SF,CHI),
    (DEN,NY), (SF,NY)
  • The third round generates no new tuple, so stop

35
Another example
  • Flight(airline, from, to, departs,
    arrives)connects(x,y,d,r) ? Flight(a,x,y,d,r)con
    nects(x,y,d,r) ? connects(x,z,d,t1) ?
    connects(z,y,t2,r) ? t1 ? t2-100
  • fourth round gives no new tuples, so stop

36
Negated recursion example 1
  • UAreaches defines pairs of cities such that UA
    flies UAreaches(x,y) ? Flight(UA,x,y,d,r)UAreach
    es(x,y) ? UAreaches(x,z) ? UAreaches(z,y)
  • AAreaches defines pairs of cities such that AA
    flies AAreaches(x,y) ? Flight(AA,x,y,d,r)AAreach
    es(x,y) ? AAreaches(x,z) ? AAreaches(z,y)
  • UAonly defines pairs of cities such that UA flies
    but AA does notUAonly(x,y) ? UAreaches(x,y) ?
    ?AAreaches(x,y)
  • First find instances of UAreaches and AAreaches,
    then perform set difference

37
Negated recursion example 2
  • R(0) P(x) ? R(x) ? ?Q(x) // P R ? Q
    Q(x) ? R(x) ? ?P(x) // Q R ? P
  • Informally, an element x in R is either in P or
    in Q, but not both
  • Since R contains only one tuple (0), we know only
    (0) can be in either P or Q, but not both
  • Two solutions are possible
  • P (0), Q ?
  • P ?, Q (0)
  • Since there are two least fixedpoints, we cannot
    answer a simple question such as Is P(0) true !

38
Stratification
  • Confine us to recursions in which negation is
    stratified
  • SQL3 supports stratified recursion only
  • Stratification
  • Draw a graph whose node is IDB predicate
  • Draw an arc from node A to node B if a rule has A
    ? B ...
  • Attach - to the edge if B is negative
  • If the graph has a cycle with one or more
    negative arcs, the recursion is not stratified
  • Stratum of a predicate A is the largest number of
    negative arcs on a path beginning from A
  • Evaluate predicate with lower strata first

39
Stratification example 1
  • UAreaches(x,y) ? Flight(UA,x,y,d,r)UAreaches(x,y)
    ? UAreaches(x,z) ? UAreaches(z,y)
    AAreaches(x,y) ? Flight(AA,x,y,d,r)AAreaches(x,y
    ) ? AAreaches(x,z) ? AAreaches(z,y)UAonly(x,y) ?
    UAreaches(x,y) ? ?AAreaches(x,y)
  • AAreaches and UAreaches are in stratum 0UAonly
    has stratum 1
  • Hence, evaluate AAreaches and UAreaches first

40
Stratification example 2
  • P(x) ? R(x) ? ?Q(x) Q(x) ? R(x) ? ?P(x)
  • There is a negative cycle, hence the rules are
    not stratified

41
Constraints in relation model
  • Constraints are conditions that must hold on all
    valid relation instances
  • Three major constraints
  • key constraints
  • entity integrity constraint
  • referential integrity constraint

42
Entity Integrity Constraints
  • The primary key attributes PK of each relation
    schema R cannot have null values in any tuple of
    r(R)
  • This is because primary key values are used to
    identify the individual tuples
  • tPK ? null for any tuple t in r(R)
  • Other attributes of R may be similarly
    constrained to disallow null values, even though
    they are not members of the primary key

43
Why referential integrity in relational model
  • Values in Dep_id of Student should appear in NO
    of Department (Why?)
  • Dep_id is said to reference to NO of Department
  • Dep_id is called foreign key
  • Student referential relation, Department
    referenced relation
  • A directed arc from Dep_id of Student to NO of
    Department

44
Referential Integrity Constraints
  • A constraint involving two relations (the
    previous constraints involve a single relation).
  • Tuples in the referencing relation R1 have
    attributes FK (foreign key attributes) that
    reference the primary key attributes PK of the
    referenced relation R2. A tuple t1 in R1 is said
    to reference a tuple t2 in R2 if t1FK t2PK
    or t1 is null.
  • displayed as a directed arc from R1.FK to R2.PK
  • May exist at one relation !

45
Referential constraint example
  • Movie(title, year, length, inColor, studioName,
    producerC)MovieExec(name, address, cert,
    netWorth)
  • Producer of every movie would have to appear in
    the MovieExec relation
  • ?producerC(Movie) ? ?cert(MovieExec)
    or?producerC(Movie) ?cert(MovieExec)
    ?(Presuming that NULL is not allowed in
    producerC of Movie)

46
Additional constraint examples
  • MovieStar(name, address, gender, birthdate)
  • Functional dependency constraint
  • name ? address
  • That fd can be expressed with relational algebra
  • ?MS1.name MS2.name and MS1.address ?
    MS2.address(MS1 ? MS2) ?where MS1 denotes
    ?MS1(name,address,gender,birthdate)(MovieStar)
  • Domain constraint
  • The only legal value for the gender attribute are
    F and M
  • ?gender ? F and gender ? M(MovieStar) ?

47
Constraints should be enforced wrt updates
  • DB update operations INSERT, DELETE, MODIFY
    tuple
  • Integrity constraints should not be violated by
    the update operations
  • Several update operations may have to be grouped
    together (i.e. transaction)
  • Updates may propagate to cause other updates
    (trigger) to be performed automatically to
    maintain integrity constraints
  • In case of integrity violation, several actions
    can be taken
  • cancel the operation that causes the violation.
  • perform the operation but inform the user of the
    violation.
  • trigger additional updates so the violation is
    corrected.
  • execute a user-specified error-correction routine.

48
Relational Operations on Bags
  • Commercial database systems rarely support notion
    of set, instead do support multiset (bag) as
    relation
  • Relations to be bags rather than sets can speed
    up operations on relations
  • projection operation
  • bag allows us to work with each tuple
    independently
  • With set, we need to compare with the result of
    all other projected tuples
  • Aggregate (say average) operation

49
Union, intersection, and difference of bags
  • if R and S are bag in which the tuple t appears n
    and m times respectively,
  • tuple t appears mn times in R?S
  • tuple t appears min(n,m) times in R?S
  • tuple t appears max(0, n-m) times in R?S

50
Projection of bags
  • Each tuple is processed independently during the
    projection
  • duplicate tuples are not eliminated from the
    result of a bag-projection
  • ?A,B(R)

51
Selection on bag
  • Apply the selection condition to each tuple
    independently
  • do not eliminate duplicate tuples in the result
  • ?C?6(R)

52
Product of bags
  • Each tuple of one relation is paired with each
    tuple of the other, regardless of whether it is a
    duplicate or not
  • if a tuple r appears in relation R m times, and
    tuple s appears n times in relation S, then the
    tuple rs will appear mn times in the product R ?
    S

53
Joins of bags
  • Compare each tuple of one relation with each
    tuple of the other.
  • When constructing the answer, do not eliminate
    duplicate tuples
  • Example

54
Datalog Rules Applied to Bags
  • The techniques for computing selections,
    projections and joins of bags can be applied to
    Datalog rules
  • do not eliminate duplicates from the head
  • H(x,z) ? R(x,y) AND S(y,z)
  • H(x,y) ? S(x,y) AND x gt 1H(x,y) ? S(x,y) AND y lt
    5

55
Algebraic laws for bags
  • An algebraic law is an equivalence between two
    expressions of relational algebra whose arguments
    are variables standing for relations
  • the commutative law for union R ? S ? S ? R
  • There are a number of laws that hold with set,
    but do not hold with bag
  • Distributive law of set difference over union(R
    ? S) T ? (R T) ? (S T) (Think about WHY
    !!!)

56
Extensions to the relational model
  • Other concepts and operations that are not a part
    of the formal relational model but appear in real
    query language
  • Modifications
  • Aggregations
  • Views
  • Null value
Write a Comment
User Comments (0)
About PowerShow.com