Title: The Relational Model and Relational Algebra
1The Relational Model and Relational Algebra
- Srinath Srinivasa,
- Indian Institute of Information Technology,26/C
Electronic City,Hosur Road, Bangalore 560100 - sri_at_iiitb.ac.in
2Background
- Introduced by Ted Codd of IBM Research in 1970.
- Concept of mathematical relation as the
underlying basis. - The standard database model for most
transactional database today.
3Relational Model Concepts
- A relational database is a collection of
relation. - A relation resembles a Table or a Flatfile of
records. - Each row in such a table represents a collection
of such related data values ( an instance of
the relation depicted by the table) -
Table 1 A example relation called STUDENT
- Each row or a relation instance is called a
tuple and - each column is called attribute of the relation
4Some Definitions
- A data value is said to be atomic if it cannot be
subdivided into smaller data values. For example,
the age of a person in years is an atomic value. - A Domain is a set of atomic values. Examples are
the set of all Integers, the set of all valid
student roll number, the set of all Indian cities
with population above 6 million etc. - A Relation Schema denoted by R(A1,A2,A3,An) is
made up of relation name R and a list of
attributes A1,A2,A3,An . Each Ai is the name of
role played some domain D in the schema R. The
domain Ai is denoted by dom(Ai).
5Some Definitions
- The Degree of the relation is the number of
attributes of its relation schema. - A relation r of a relation schema
R(A1,A2,A3,An), is a set of n-tuples of the
form t1,t2,.,tn where each ti ? dom(Ai). - The Relation can be defined as
- r(R) ? (dom(A1) x dom(A2) x dom(An)),
where x is the cartesian product over sets.
6Characteristics of Relation
- Ordering of Tuples Mathematically tuples in
relation dont have ordering (although in
reality they do have order when stored in files
or disk) - Ordering of attributes Attributes within a tuple
have to maintain order if they are stored as
tuples. However a relation need not necessarily
be maintained as a set of ordered tuples as long
as a mapping between attribute and value is
maintained in each relation instance. - Values of Tuples Each value that makes up a
tuple is atomic and cannot be further subdivided.
This is also called the first normal form
assumption.
7Relational Constraints
- Domain constraints domain constraints specify
that value of each attribute A must be atomic and
a member of dom(A). - Key constraints Given a relational schema R
(Ai), i 1n , a subset of attributes K is
called a superkey if the values of attributes
in K is distinct for every relation instance.
This is, for any two tuples ti , tj , (ti ? tj)
? (ti K ? tjK). - A key k of a relation is a subset of the
superkey, k ? K,such that removal of any
attribute in k removes the superkey property for
k. They are also called Minimal superkeys.
8Relational Constraints
- In STUDENT table (Roll no, Name) is a superkey,
while (Roll no) is a minimal key. -
- A relation schema may have more than one key.
Each key is called a candidate key. -
- Usually one of the candidate key is chosen to
uniquely identify tuples in a relation. such a
key is called primary key for a relation. -
- In STUDENT relation, Roll no is a good primary
key.
9Relational Constraints
- Entity Integrity Constraint The primary key of a
tuple may never be null. - Referential Integrity Constraint Informally, a
tuple that refers to another tuple from another
relation should refer to an existing tuple. - In a given relation R1 a set of attributes
FK is said to be a foreign key of R1 referencing
another relation R2, if the following rules hold - The attribute in FK have the same domain as the
primary key of R2. - For every tuple in R1, the attributes in its
Foreign Key FK either reference a tuple in R2 or
is null.
10Relational Constraints
Employee
Department
Foreign Keys are diagrammatically depicted as
arrows. Note that foreign key can be from a
relation to itself.
11Relational Constraints
- Semantic Integrity Constraint Constraints on the
values of attributes. - Example The salary of the employee cannot
be more than of his/her supervisor. - The minimum age of an employee is 18 and
maximum is 65. - Semantic Integrity constraints are application
specific and not part of the relational model,
strictly speaking. Such constraints are specified
and enforced using general purpose constraint
specification languages.
12Basic Relational Algebra Operations
- Operations of relational model can be categorized
into retrievals and updates. - Updates can be either insert of new tuple ,
update( or modify) an existing tuple or delete an
existing tuple. - Retrieval is handled mostly by select (denoted by
?) and project (denoted by ?) operations.
13The Select Operation
- The SELECT operation is used to select (retrieve)
a subset of the tuples from a relation that
matches the selection condition . - ?salary gt 3000 (Employee)
-
- selects all tuples of relation Employee which
matches the criteria - Employee.Salary gt 3000
-
14The Select Operation
- The general syntax of SELECT is as follows
- ? ltConditiongt (Relation)
- where condition is of the form
- Condition? (Expression) logicaloperator condition
Expression - Expression ? attributename operator constant
attributename operator attributename - Logicaloperator ? AND OR
- Operator ? lt gt ? ? ?
15Properties of SELECT operation
- The SELECT operation is unary. It operates on
only one relation. - Selection conditions are applied to each tuple
individually in the relation. Hence the condition
cannot be span more than one tuple. - The degree of the relation resulting from ?c(R)
is the same as that of R. - ?c(R) ? R
- The SELECT operation is commutative
?c1 (?c2(R)) ? ?c2 (?c1(R)) - Hence, SELECT operation can be cascaded in any
fashion.
16The Project Operation
- The project operation selects specific
columns from the specified relation. - ?Name, Salary ( Employee)
- returns a relation comprising of only the
Employee.Name and Employee.Salary attributes. The
operation has said to have projected Employee
relation onto the required relation. -
17The Project Operation
- The general form of PROJECT is as follows
- ? ltattributeslistgt (Relation)
- where attributelist is a comma-separated
list of attributes belonging to the specified
relation. -
- The result of the PROJECT is in the same order as
the specified list of attributes.
18Properties of Project operation
- The PROJECT operation removes duplicate in its
results. - The number of tuples returned by PROJECT is less
than or equal to number of tuples in the
specified relation. - When attrubute list of PROJECT includes the
superkey then the number of tuples returned is
equal to the number of tuples in the specified
relation. - PROJECT is not commutative.
- ?l1?l2(R) ?l1 (R) if l1 is a substring of l2.
Else the operation is an incorrect expression.
19Composition and Assignment
- Since the input and output of the relational
operator is a single relation, they can be
composed in any fashion with the output of one
operation being the input of the other. - ? Name, Salary (?Salary gt 3000 (Employee ) )
- returns Name and Salary field of all records of
Employee, where salary is greater than 3000. - Results of a query can also be assigned to a name
to form a relation by that name. - SalaryStatement ? ? Name, Salary (? Salary gt
3000( Employee )) - create a new relation SalaryStatement out of the
specified query.
20Cartesian Join ( x)
- PROJECT and SELECT operators expect a single
relation. - When PROJECT and SELECT need to be run on a
multiple relation, a single relation can be
generated using the cartesian join ( x )
operator.
TABLE Student
TABLE Lab
Consider the Relational Query ? Student.Lab
Lab.Name (Student x Lab)
21Cartesian Join (x)
Columns matching the query are shown in PINK.
22Query Result
23Properties of Cartesian Join ( x )
- The Cartesian Join represents a Canonical Join of
two or more relations. - If relation R having R tuples and relation S
having S tuples are joined using x, then R
x S R ? S - Cartesian join is too inefficient for most
queries involving multiple relations.
24Relational Algebra (contd)
- Join operators and additional relational operators
25Relational Algebra Summary
- Based on the concept of mathematical relation
- Building block a relation comprising of
attributes within domains - Tuples Schema Relation
- Properties Ordering, Duplicates
- Constraints Key constraints, entity constraints,
referential integrity constraints - Basic retrieval operators Select, Project
- Composability of relational operators and
Cartesian join
26Theta Join Operator (??? )
- Theta join is used to combine related tuples
from two or more relations ( specified by the
condition theta ), to form a single tuple. - Consider the earlier relation Student and Lab.
- The relation
-
- Student ?? Student.Lab Lab.Name Lab
- returns the same result as
- ? Student.Lab Lab.Name (Student x Lab)
- However, the number of tuples generated by Join
is 4 rather than 12 as in the Cartesian join. - Note that Student.lab is a foreign key into Lab
(and Lab.Name is a primary key of Lab) and
referential Integrity has to be maintained to
perform this join.
27Theta Join Operator
- The general form of theta join is as follows
- R ?? join condition S
- where R and S are relation and join condition is
a logical expression over attributes of R and S . - Tuples whose join attributes are null do not
appear in the final result.
28Theta Join Operator
- Theta join where the only comparison operator
used is the equals () sign, are called equijoins
. - A natural join denoted by is an equijoin,where
the attribute names at both ends of the equality
sign are the same. In this case, the attribute is
not repeated in the final result. - Consider the Student and the Lab example. Let
the Student relation schema be modified as
Student(RollNo,Name,Labname) and Lab relation
schema be modified as Lab(LabName,Faculty,Dept). - The natural join Student Lab has a scheme
- (Roll No, Name, LabName, Faculty, Dept)
29Renaming
- TA(id,LabName) ? ?Roll No,Lab(Student)
- The above relation projects RollNo and Lab
attributes from Student and renames them as id
and LabName respectively in the output relation.
It also names the output relation as TA. - The above can also be achieved by rename (?)
operator as - ? TA(id,LabName) (? Roll No,Lab( Student ))
-
30Renaming
- The general form of the rename operator is as
follows - ? S(B1, B2, .Bn) (R)
- Or
- ? S (R)
-
- Or
- ? (B1, B2, .Bn) ( R)
- The first form renames both relation name and
attribute names, the second form renames only the
relation name and the third form renames only
attribute names. -
31Set Theoretic Operations
- Set theoretic operation like UNION ( U ),
INTERSECTION ( ) and SET DIFFERENCE can be
applied to compatible relations. - Compatibility Two relations R ( A1,A2An) and
S ( B1,B2,.Bn) are compatible if they have the
same number of attributes and for all
- The Union operator R U S returns the set of all
tuples that are present in either R or S or both.
Duplicate tuples will be eliminated. - The Intersection operator R S returns the
set of all tuples are present in both R and S. - The Set Difference operator R S returns the
set of all tuples in R but not in S. - Note that
, but
32The Division Operator
- The Division operator ( ) is used to denote
the conditions where a given relation R is to be
split based on whether its association with every
tuple in an other relation S. - Let the set of attributes of R be Z and set of
attributes of S be X. Let - Y Z X, that is the set of all
attributes of R that are not in S. - The result of the operation is as
follows. For every tuple
,
R
S
33Other Relational Operators
- Outer Join A normal join operation ?? requires
referential integrity. - Every attribute in R that refers S should refer
to an existing tuple in S. If either the
referencing attribute of S is null, the tuple is
not returned. - Outer join is used when all tuples are required
even when referential integrity is not met. Left
outer join includes tuples even when the
referencing attribute is null and Right outer
join includes tuples even when the referenced
attribute is null or referenced tuple does not
exist. - An outer join which is both left and right is
called a Full outer join. - Outer Union Outer Union computes the union of
partially compatible relations. Two relations
R(X) and S(Z) are said to be partially compatible
if there exists W ? X and Y ? Z such that
for every Ai ? W and Bi ? Y, dom(Ai)
dom(Bi). - Outer Union of R and S creates a new relation
that includes all attributes from
both relation. Null values fill up all undefined
attributes.
34The Complete Set of the Operators
- The following set of relational operators
, , U , -- , x is called the complete
set of the relational operators because all
other operators can be mathematically expressed
as a sequence of the above operators. - For Example
- Similarly
-
35Relational Operators on Bags
- A bag or a multi-set is a set that may have
multiple occurrences of a given element - Bags are necessary when aggregation (sum,
average,) has to be performed over query results - Tolerating bags while answering queries reduces
unnecessary overheads
36Set-theoretic operations on bags
Consider two bags R where tuple t occurs n times
and S, where
tuple t occurs m times. The union of bags or
disjoint union of sets denoted by R S is a
bag that contains R S tuples or mn
occurrences of tuple t. In the intersection of
bags R ? S, tuple t occurs min(n,m) times. In
the set difference of bags R S, tuple t
occurs max(0, n-m) times.
37Other operators on Bags
- The select operator on bags is applied to each
tuple in the bag independently and duplicates are
not removed from the result - The project operator on bags is applied to each
tuple independently and duplicates are not
removed from the result. Faster than project
operator on sets. - Cartesian product on bags If tuple t occurs m
times in relation R and tuple s occurs n times in
relation S, then tuple ts occurs mn times in RxS.
38Algebraic expressions on bags
The following algebraic expressions are valid
when R and S are sets, but invalid when they are
bags (R ? S) T (R T) ? (S T)
1-1-1 principle R ? (S ? T) (R ? S) ?
(R ? T) 2-1-2 principle ?C OR D (R) ?C
(R) ? ?D (R)
39Relational Operators Example
Consider the following schema Employee
(FNAME,MINIT,LNAME,PAN,DOB,ADDR
GENDER,SALARY,SUPERVISOR,DNO) Department
(DNAME, DNUMBER, MGRPAN,
MGRSTARTDATE) Dept_Locations
(DNUMBER, DLOCATION) Project (PNAME, PNUMBER,
PLOCATION, DNUM) Works_on (EPAN, PNO, HOURS)
40Relational Operators Example
Query 1 Retrieve the name and address of all
employees who work for Research department
Research_dept ? ?DNAMEResearch
(Department) Research_emps ?
(Research_dept ??DNUMBERDNO EMPLOYEE) Result ?
?FNAME,LNAME,ADDRESS (Research_emps)
41Relational Operators Example
Query 2 Find the names of employees who work on
all projects controlled by department no. 5
Dpt5_proj ? ?PNUMBER (?DNUM5
(Project)) Emp_Prj(PAN,PNUMBER) ? ?EPAN,PNO
(Works_on) Result_Pans ? Emp_Prj ? Dpt5_proj
Result ? ?FNAME,LNAME (Result_Pans Employee)
42Relational Operators Example
Query 3 List the names of all employees who are
not working on any projects All_Pans ? ?PAN
(Employee) Prj_Pans ? ?EPAN (Works_on) NoPrj_Pan
s ? All_Pans Prj_Pans Result ? ?FNAME,LNAME
(NoPrj_Pans Employee)
43Summary
- Definition of relation schema, relation, domain,
attributes. - Difference between the mathematical concept of
relation and their implementation as tables or
files. - Constraints in relational model Domain
Constraints, Key constraints (Super key,
Candidate key, Primary key definition), Entity
Integrity constraint, Referential integrity
constraint . - Basic relational algebra operation ,
- Relational Algebra operators
outer join, outer union. - The complete set of relational operators.