Relational Algebra (continued) - PowerPoint PPT Presentation

About This Presentation
Title:

Relational Algebra (continued)

Description:

Patty. 1234. Apu. 3421. Selma. 2342. Ned. 4531. customer-name ... borrower.l-number. loan.l-number. branch. Patty. 1234. 1234. Dublin. Apu. 3421. 3421. Irvine ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 60
Provided by: jics
Learn more at: https://cs.gmu.edu
Category:

less

Transcript and Presenter's Notes

Title: Relational Algebra (continued)


1
Relational Algebra(continued)
  • Lecture 5

2
Relational Algebra (Summary)
  • Basic operations
  • Selection ( ? ) Selects a subset of rows from
    relation.
  • Projection ( ? ) Deletes unwanted columns from
    relation.
  • Cross-product ( ? ) Allows us to combine two
    relations.
  • Set-difference ( - ) Tuples in reln. 1, but not
    in reln. 2.
  • Union ( ? ) Tuples either in reln. 1 or in
    reln. 2 or in both.
  • Rename ( ? ) Changes names of the attributes
  • Since each operation returns a relation,
    operations can be composed ! (Algebra is
    closed.)
  • Use of temporary relations recommended.

3
Additional operators
  • We define additional operations that do not add
    any power to the relational algebra, but that
    simplify common queries.
  • Natural join
  • Conditional Join
  • Equi-Join
  • Division
  • Also, weve already seen Set intersection
  • r ? s r - (r - s)

All joins are really special cases of conditional
join
4
Quick note on notation
good_customers
bad_customers
customer-name loan-number
Patty 1234
Apu 3421
Selma 2342
Ned 4531
customer-name loan-number
Seymour 3432
Marge 3467
Selma 7625
Abraham 3597
If we have two or more relations which feature
the same attribute names, we could confuse them.
To prevent this we can use dot notation. For
example good_customers.loan-number
5
Natural-Join Operation Motivation
borrower
loan
Very often, we have a query and the answer is not
contained in a single relation. For example, I
might wish to know where Apu banks. The classic
relational algebra way to do such queries is a
cross product, followed by a selection which
tests for equality on some pair of fields.
l-number branch
1234 Dublin
3421 Irvine
cust-name l-number
Patty 1234
Apu 3421
cust-name borrower.l-number loan.l-number branch
Patty 1234 1234 Dublin
Patty 1234 3421 Irvine
Apu 3421 1234 Dublin
Apu 3421 3421 Irvine
?borrower.l-number loan.l-number(borrower x
loan)))
  • While this works
  • it is unintuitive
  • it requires a lot of memory
  • the notation is cumbersome

cust-name borrower.l-number loan.l-number branch
Patty 1234 1234 Dublin
Apu 3421 3421 Irvine
Note that in this example the two relations are
the same size (2 by 2), this does not have to be
the case.
So, we have a more intuitive way of achieving the
same effect, the natural join, denoted by the
symbol
6
Natural-Join Operation Intuition
Natural join combines a cross product and a
selection into one operation. It performs a
selection forcing equality on those attributes
that appear in both relation schemes. Duplicates
are removed as in all relation operations. For
the previous example (one attribute in common
l-number), we have
?borrower.l-number loan.l-number(borrower x
loan)))
  • If the two relations have no attributes in
    common, then their natural join is simply their
    cross product.
  • If the two relations have more than one
    attribute in common, then the natural join
    selects only the rows where all pairs of matching
    attributes match. (lets see an example on the
    next slide).

7
l-name f-name age
Bouvier Selma 40
Bouvier Patty 40
Smith Maggie 2
l-name f-name ID
Bouvier Selma 1232
Smith Selma 4423
B
A
l-name f-name age l-name f-name ID
Bouvier Selma 40 Bouvier Selma 1232
Bouvier Selma 40 Smith Selma 4423
Bouvier Patty 2 Bouvier Selma 1232
Bouvier Patty 40 Smith Selma 4423
Smith Maggie 2 Bouvier Selma 1232
Smith Maggie 2 Smith Selma 4423
Both the l-name and the f-name match, so select.
Only the f-names match, so dont select.
Only the l-names match, so dont select.
l-name f-name age l-name f-name ID
Bouvier Selma 40 Bouvier Selma 1232
We remove duplicate attributes
The natural join of A and B
l-name f-name age ID
Bouvier Selma 40 1232
A B
Note that this is just a way to visualize the
natural join, we dont really have to do the
cross product as in this example
8
Natural-Join Operation
  • Notation r s
  • Let r and s be relation instances on schemas R
    and S respectively.The result is a relation on
    schema R ? S which is obtained by considering
    each pair of tuples tr from r and ts from s.
  • If tr and ts have the same value on each of the
    attributes in R ? S, a tuple t is added to the
    result, where
  • t has the same value as tr on r
  • t has the same value as ts on s
  • Example
  • R (A, B, C, D)
  • S (E, B, D)
  • Result schema (A, B, C, D, E)
  • r s is defined as
  • ?r.A, r.B, r.C, r.D, s.E (?r.B s.B r.D s.D
    (r x s))

9
Natural Join Operation Example
  • Relation instances r, s

B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
How did we get here? Lets do a trace over the
next few slides
? ? ? ? ?
1 1 1 1 2
? ? ? ? ?
a a a a b
? ? ? ? ?
10
A
B
C
D
B
D
E
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
1 3 1 2 3
a a a b b
? ? ? ? ?
r
s
First we note which attributes the two relations
have in common
11
A
B
C
D
B
D
E
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
1 3 1 2 3
a a a b b
? ? ? ? ?
r
s
A
B
C
D
E
? ?
1 1
? ?
a a
? ?
There are two rows in s that match our first row
in r, (in the relevant attributes) so both are
joined to our first row
12
B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ?
1 1
? ?
a a
? ?
there are no rows in s that match our second row
in r, so do nothing
13
B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ?
1 1
? ?
a a
? ?
there are no rows in s that match our third row
in r, so do nothing
14
B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ? ? ?
1 1 1 1
? ? ? ?
a a a a
? ? ? ?
There are two rows in s that match our fourth row
in r, so both are joined to our fourth row
15
B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ? ? ? ?
1 1 1 1 2
? ? ? ? ?
a a a a b
? ? ? ? ?
There is one row that matches our fifth row in
r,.. so it is joined to our fifth row and we are
done!
16
Natural Join on Sailors Example
S1
R1
17
Last week we saw
Query Find the name of the sailor who reserved
boat 101.
Note my use of temporary relation Temp.
18
Query revisited using natural join
Query Find the name of the sailor who reserved
boat 101.
19
Conditional-Join Operation
The conditional join is actually the most general
type of join. We introduced the natural join
first only because it is more intuitive and...
natural!
Just like natural join, conditional join combines
a cross product and a selection into one
operation. However instead of only selecting rows
that have equality on those attributes that
appear in both relation schemes, we allow
selection based on any predicate.
Where c is any predicate on the attributes of r
and/or s
Duplicate rows are removed as always, but
duplicate columns are not removed!
20
Conditional-Join Example
We want to find all women that are younger than
their husbands
l-name f-name marr-Lic age
Simpson Homer 777 36
Lovejoy Timothy 234 36
Simpson Bart null 9
l-name f-name marr-Lic age
Simpson Marge 777 35
Lovejoy Helen 234 38
Flanders Maude 555 24
Krabappel Edna 978 40
s
r
r.l-name r.f-name r.Marr-Lic r.age s.l-name s.f-name s.marr-Lic s.age
Simpson Marge 777 35 Simpson Homer 777 36
Note we have removed ambiguity of attribute names
by using dot notation Also note the redundant
information in the marr-lic attributes
21
Equi-Join
  • Equi-Join Special case of conditional join where
    the conditions consist only of equalities.
  • Natural Join Special case of equi-join in which
    equalities are specified on ALL fields having the
    same names in both relations.

22
Equi-Join
l-name f-name marr-Lic age
Simpson Marge 777 35
Lovejoy Helen 234 38
Flanders Maude 555 24
Krabappel Edna 978 40
l-name f-name marr-Lic age
Simpson Homer 777 36
Lovejoy Timothy 234 36
Simpson Bart null 9
s
r
r.l-name r.f-name Marr-Lic r.age s.l-name s.f-name s.age
Simpson Marge 777 35 Simpson Homer 36
Lovejoy Helen 234 38 Lovejoy Timothy 36
23
Review on Joins
  • All joins combine a cross product and a selection
    into one operation.
  • Conditional Join
  • the selection condition can be of any predicate
    (e.g. rating1 gt rating2)
  • Equi-Join
  • Special case of conditional join where the
    conditions consist only of equalities.
  • Natural Join
  • Special case of equi-join in which equalities are
    specified on ALL fields having the same names in
    both relations.

24
Banking Examples
  • branch (branch-name, branch-city, assets)
  • customer (customer-name, customer-street,
    customer-only)
  • account (account-number, branch-name, balance)
  • loan (loan-number, branch-name, amount)
  • depositor (customer-name, account-number)
  • borrower (customer-name, loan-number)

Note that I have not indicated primary keys here
for simplicity.
25
Example Queries
  • Find all loans of over 1200
  • ?amount gt 1200 (loan)

select from the relation loan, only the rows
which have an amount greater than 1200
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
loan
1234 Riverside 1,923.03
?amount gt 1200 (loan)
26
Example Queries
  • Find the loan number for each loan of an amount
    greater than 1200
  • ?loan-number (?amount gt
    1200 (loan))

select from the relation loan, only the rows
which have an amount greater than 1200, then
project out just the loan_number
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
loan
1234 Riverside 1,923.03
?amount gt 1200 (loan)
?loan-number (?amount gt 1200 (loan))
1234
27
Example Queries
  • Find all loans greater than 1200 or less than
    75
  • ?amount gt 1200 ? amount lt
    75(loan)

select from the relation loan, only the rows
which have an amount greater than 1200 or an
amount less than 75
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
loan
1234 Riverside 1,923.03
2342 Dublin 56.25
?amount gt 1200 ? amount lt 75(loan)
28
Example Queries
  • Find the names of all customers who have a loan,
    an account, or both, from the bank
  • ?customer-name (borrower) ? ?customer-name
    (depositor)

borrower
depositor
customer-name loan-number
Patty 1234
Apu 3421
Selma 2342
Ned 4531
customer-name account-number
Moe 3467
Apu 2312
Patty 9999
Krusty 3423
Moe
Apu
Patty
Krusty
Selma
Ned
?customer-name (depositor)
?customer-name (borrower)
Patty
Apu
Selma
Ned
Moe
Apu
Patty
Krusty
29
Example Queries
Note this example is split over two slides!
  • Find the names of all customers who have a loan
    at the Riverside branch.
  • ?customer-name (?branch-nameRiverside
    (?borrower.loan-number loan.loan-number(borrower
    x loan)))

borrower
loan
customer-name loan-number
Patty 1234
Apu 3421
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
We retrieve borrower and loan we calculate
their cross product
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty 1234 3421 Irvine 123.00
Apu 3421 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
30
?customer-name (?branch-nameRiverside
(?borrower.loan-number loan.loan-number(borrower
x loan)))
we calculate their cross product we select
the rows where borrower.loan-number is equal to
loan.loan-number we select the rows where
branch-name is equal to Riverside we project
out the customer-name.
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty 1234 3421 Irvine 123.00
Apu 3421 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty
31
Now Using Natural Join
  • Find the names of all customers who have a loan
    at the Riverside branch.
  • ?customer-name (?branch-nameRiverside
    (?borrower.loan-number loan.loan-number(borrower
    x loan)))
  • gt ?customer-name (?branch-nameRiverside
    borrower loan))

borrower
loan
We retrieve borrower and loan 1234 in borrower
is matched with 1234 in loan 3421 in borrower
is matched with 3421 in loan The rest is the
same.
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
customer-name loan-number
Patty 1234
Apu 3421
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
customer-name borrower.loan-number loan.loan-number branch-name amount
Patty 1234 1234 Riverside 1,923.03
32
Example Queries
Note this example is split over three slides!
  • Find the largest account balance
  • ...we will need to rename account relation as
    d...
  • ?balance(account) - ?account.balance(?account.bala
    nce lt d.balance (account x r(d,account)))

d
account
account-number balance
Apu 100.30
Patty 12.34
Lenny 45.34
account-number balance
Apu 100.30
Patty 12.34
Lenny 45.34
We do a rename to get a copy of account which
we call d next we will do a cross product
33
?balance(account) - ?account.balance(?account.bala
nce lt d.balance (account x r(d,account)))
account.account-number account.balance d.account-number d.balance
Apu 100.30 Apu 100.30
Apu 100.30 Patty 12.34
Apu 100.30 Lenny 45.34
Patty 12.34 Apu 100.30
Patty 12.34 Patty 12.34
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
Lenny 45.34 Patty 12.34
Lenny 45.34 Lenny 45.34
do a cross product select out all rows
where account.balance is less than
d.balance .. next we project
account.account-number account.balance d.account-number d.balance
Patty 12.34 Apu 100.30
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
34
?balance(account) - ?account.balance(?account.bala
nce lt d.balance (account x r(d,account)))
account.account-number account.balance d.account-number d.balance
Patty 12.34 Apu 100.30
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
.. next we project out account.balance then we
do a set difference between it and the original
account.balance from the account relation
the set difference leaves us with one number, the
largest value!
account.balance
12.34
12.34
45.34
account
account-number balance
Apu 100.30
Patty 12.34
Lenny 45.34
100.30
35
Now Using Conditional Join
  • Find the largest account balance
  • ...we will need to rename account relation as
    d...
  • ?balance(account) - ?account.balance(?account.bala
    nce lt d.balance (account x r(d,account)))
  • r(d,account)
  • ?balance(account) - ?account.balance(account
    account.balance lt d.balance d)

36
More Examples on Sailors Relations
  • Sailors(sid, sname, rating, age)
  • Boats(bid, bname, color)
  • Reserves(sid, bid, day)

37
Find names of sailors whove reserved boat 103
  • Solution 1
  • Solution 2
  • Solution 3

38
Find names of sailors whove reserved a red boat
  • Information about boat color only available in
    Boats so need an extra join
  • A more efficient solution
  • A query optimizer can find this given the first
    solution!

39
Find sailors whove reserved a red or a green boat
  • Can identify all red or green boats, then find
    sailors whove reserved one of these boats
  • Can also define Tempboats using union! (How?)

40
Find sailors whove reserved a red and a green
boat
  • Previous approach wont work! Must identify
    sailors whove reserved red boats, sailors whove
    reserved green boats, then find the intersection
    (note that sid is a key for Sailors)



41
Consider yet another query
  • Find the sailor(s) (sid) who reserved all the red
    boats.

R1
B
42
Start an attempt
  • who reserved what boat
  • All the red boats

43
Division Operation
r / s
  • Suited to queries that include the phrase for
    all, e.g. Find sailors who have reserved all red
    boats.
  • Let S1 have 2 fields, x and y S2 have only field
    y
  • S1/S2
  • i.e., S1/S2 contains all x tuples (sailors) such
    that for every y tuple (redboat) in S2, there is
    an xy tuple in S1 (i.e, x reserved y).
  • In general, x and y can be any lists of fields y
    is the list of fields in S2, and x?y is the list
    of fields of S1.
  • Let r and s be relations on schemas R and S
    respectively where
  • R (A1, , Am, B1, , Bn)
  • S (B1, , Bn)
  • The result of r / s is a relation on schema
  • R S (A1, , Am)

44
Division Operation Example
B
A
B
Relations r, s
1 2
? ? ? ? ? ? ? ? ? ? ?
1 2 3 1 1 1 3 4 6 1 2
s
A
? ?
r / s
r
  • ? occurs in the presence of both 1 and 2, so it
    is returned.
  • ? occurs in the presence of both 1 and 2, so it
    is returned.
  • does not occur in the presence of both 1 and 2,
    so is ignored.
  • ...

45
Another Division Example
Relations r, s
A
B
C
D
E
D
E
? ? ? ? ? ? ? ?
a a a a a a a a
? ? ? ? ? ? ? ?
a a b a b a b b
1 1 1 1 3 1 1 1
a b
1 1
s
A
B
C
r /s
? ?
a a
? ?
r
  • lt?, a ,? gt occurs in the presence of both lta,1gt
    and ltb,1gt, so it is returned.
  • lt ?, a ,? gt occurs in the presence of both lta,1gt
    and ltb,1gt, so it is returned.
  • lt?, a ,? gt does not occur in the presence of
    both lta,1gt and ltb,1gt, so it is ignored.

46
Examples of Division A/B
B1
B2
B3
A/B1
A/B2
A/B3
A
47
Find the sailor(s) who reserved ALL red boats
  • who reserved what boat
  • All the red boats

gt S1/S2
48
Find the names of sailors whove reserved all
boats
  • Uses division schemas of the input relations
    must be carefully chosen
  • To find sailors whove reserved all Interlake
    boats

.....
49
Some Properties
  • Selection
  • Projection
  • where li is a subset of attributes in R, li ?
    li1
  • A projection commutes with a selection that only
    uses attributes retained by the projection
  • where c(a1,,an) is a condition on attributes
    a1,,an

?a1,..,an(?c(a1,..,an) (R)) ?c(a1,..,an)
(?a1,..,an( R ))
50
Properties of Joins
  • Joins
  • where c(a1,,an) is a condition on attributes
    only in R

(Associative)

(Commute)

51
Additional Operator Outer Join
  • Joins match tuples satisfying a join condition.
  • Tuples without a matching are discarded.
  • Tuples with Null value in at least one join
    attribute are discarded.
  • Outer Joins useful when we want to keep tuples
    in one or both of the involved relations in a
    join operation that do not satisfy the join
    condition.
  • Left outer join
  • Right outer join
  • Full outer join
  • Schema of any Outer Join Schema of Natural Join

52
Additional Operator Outer Join
  • An extension of the join operation that avoids
    loss of information.
  • Computes the join and then adds tuples from one
    relation that does not match tuples in the other
    relation to the result of the join.
  • Uses null values
  • null signifies that the value is unknown or does
    not exist
  • All comparisons involving null are (roughly
    speaking) false by definition.
  • Will study precise meaning of comparisons with
    nulls later

53
Outer Join Example
  • Relation loan
  • Relation borrower

54
Outer Join Example
  • Inner Joinloan Borrower
  • Left Outer Join

loan borrower
loan-number
amount
customer-name
branch-name
L-170 L-230 L-260
3000 4000 1700
Simpson Wiggum null
Springfield Shelbyville Dublin
55
Outer Join Example
loan-number
amount
customer-name
branch-name
L-170 L-230 L-155
3000 4000 null
Simpson Wiggum Flanders
Springfield Shelbyville null
  • Right Outer Join
  • loan borrower

Full Outer Join
loan-number
amount
customer-name
branch-name
loan borrower
L-170 L-230 L-260 L-155
3000 4000 1700 null
Simpson Wiggum null Flanders
Springfield Shelbyville Dublin null
56
Full Outer Join
  • Full Outer Join S R keeps every tuple in
    both R and S, if not matching tuples are found
    in R (S), then the attributes in R (S) are set
    to Null

57
Cost of Relational Algebra Operations
  • Given two relations R, S, such as
  • Cardinality (R) n, Cardinality(S) m
  • Cost O(n)
  • ?c ( R )
  • ?a1,,an ( R )
  • Cost O(n x m)
  • R x S
  • When working with Cross-product (Joins),
    keep input relations as small as possible!

58
Cost of Relational Algebra Operations
  • Let q max(m,n),
  • Set Operations R ? S, R ? S, R SCost O(q
    log q)
  • Join Operations R S, R cS
  • Outer Joins Operations
  • R S, R S, R S
  • Best Case attributes in the join condition
    include the primary key in S Cost O(q log q)
  • Otherwise worst case Cost O(m x n)
  • When working with these operations, keep input
    relations as small as possible

59
Summary
  • The relational model has rigorously defined query
    languages that are simple and powerful.
  • Relational algebra is more operational useful as
    internal representation for query evaluation
    plans.
  • Several ways of expressing a given query a query
    optimizer should choose the most efficient
    version.
  • Operations covered 5 basic operations
    (selection, projection, union, set difference,
    cross product), rename, joins (natural join,
    equi-join, conditional join, outer joins),
    division
Write a Comment
User Comments (0)
About PowerShow.com