Database Systems Set Theory - PowerPoint PPT Presentation

1 / 71
About This Presentation
Title:

Database Systems Set Theory

Description:

Chase. Derek. first-name. Howard. Howard. Johnson. Utley. Jeter. last ... Chase. Fiction. 2006. Walking Promotions. www.zojjed.com. Howard. Ryan. Fiction. 2006 ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 72
Provided by: Salv9
Category:
Tags: chase | com | database | set | systems | theory | true | www

less

Transcript and Presenter's Notes

Title: Database Systems Set Theory


1
Database Systems Set Theory
  • RELATIONS
  • A relational database consists of tables, each
    of which is assigned a unique name.
  • A row in a table represents a relationship among
    a set of values.
  • A table is a collection of such relationships.
  • Column Headers are commonly referred to as
    attributes
  • Websites-Schema(website, organization,
    first-year, category)
  • websites relation

2
Database Systems Set Theory
  • RELATIONS
  • A relational database consists of tables, each
    of which is assigned a unique name.
  • A row in a table represents a relationship among
    a set of values.
  • A table is a collection of such relationships.
  • Column Headers are commonly referred to as
    attributes
  • Websites-Schema(website, organization,
    first-year, category)
  • websites relation

3
Database Systems Set Theory
  • DOMAINS
  • A Domain is the set of permitted values for a
    column/attribute.
  • The domain can be any positive number as in the
    case with first year
  • The domain can be a series of letters up to a
    maximum number of letters as in the case with
    organization.
  • The domain can be valid web addresses, whose
    rules might be slightly more complicated.
  • If
  • D1 denotes the set of all websites
  • D2 denotes the set of all organizations
  • D3 denotes the set of all first years
  • D4 denotes the set of all categories
  • Any row of websites must contain a
    4-tuple(v1,v2,v3, v4) where
  • v1 is a website in the domain D1
  • v2 is a organization in the domain D2
  • v3 is year in the domain D3
  • V4 is a category in the domain D4
  • Therefore, account is a subset of D1xD2xD3xD4.

4
Database Systems Set Theory
  • DOMAINS
  • In general a table must be a subset of
    D1xD2xxDn-1xDn
  • Tables vs. Relations
  • There exists a close relationship between this
    language and the terminology used in databases.
  • Instead of numbers DBs use names.
  • Relation -gt table
  • tuple -gt row
  • Website table has 6 tuples.

5
Database Systems Set Theory
  • TUPLE NOTATION
  • If t is a variable denoting the first tuple
    relationship, then twebsite denotes the website
    of the tuple t.
  • twebsite www.zojjed.com
  • torganizationWalking Promotions
  • tfirst-year 2006
  • tcategory Fiction
  • Alternatively
  • t1 www.zojjed.com
  • t2 Walking Promotions
  • t3 2006
  • t4 Fiction
  • t ? r, indicate the tuple t is in the relation r

6
Database Systems Set Theory
  • DOMAINS
  • It is possible for several attributes to have the
    same domain.
  • Later we will introduce a customer relation. It
    has a customer name, if I also had a employee
    table with the field employee name, technically
    they both have the same domain.
  • It depends upon how you look at it. If the domain
    is the set of all possible names, this is true.
  • What about the domains website and first-year.
    They are incompatible.
  • What about website and category? While they both
    may allow the same values, I would consider
    them as distinct domains.
  • In a set, a attribute may contain the value Null.
  • For now we will assume they do not.

7
Database Systems Set Theory
  • DOMAINS
  • It is possible for several attributes to have the
    same domain.
  • Later we will introduce a customer relation. It
    has a customer name, if I also had a employee
    table with the field employee name, technically
    they both have the same domain.
  • It depends upon how you look at it. If the domain
    is the set of all possible names, this is true.
  • What about the domains website and first-year.
    They are incompatible.
  • What about website and category? While they both
    may allow the same values, I would consider
    them as distinct domains.
  • In a set, a attribute may contain the value Null.
  • For now we will assume they do not.
  • Null values can be a large source of error and
    should be tested if they are allowed.

8
Database Systems Set Theory
  • DATABASE SCHEMAS
  • Logical design of the database
  • defines the type definition of a variable
  • DATABASE INSTANCE
  • Snapshot of the database at a given time
  • an instance of a variable
  • A database schema in relations is defined by
    using a capitalized name for the
    relationship-schema and a lowercase name of each
    attribute. An instance of a relation is
    represented by a lowercase name.
  • Websites-schema(website, organization,
    first-year, category)
  • A relation on the Website-schema is as follows
  • websites(Website-schema)
  • Side notes, very important
  • A relation has no order
  • A relation can not contain duplicate tuples

9
Database Systems Set Theory
  • Customers-Schema(website, first-name, last-name)
  • customers Relation

Notice the website attribute appears in both the
customers relation and Websites relation. This
is not a coincidence, often fields are
repeated. This allows distinct relations to be
related. If we wanted to gather website
information for all websites from customers need
information from both relations
10
Database Systems Set Theory
  • Combined information from website and customers
    relations

In real databases, unique id fields would be used
to identify the customer and the website so the
website name would not be repeated
11
Database Systems Set Theory
  • Instead of having two schemas, its possible to
    have one schema as follows
  • WebsiteCustomers(website, organization,
    first-year, category, first-name, last-name)
  • What is wrong with this?

12
Database Systems Set Theory
  • Redundant data as well as null fields.

13
Database Systems Set Theory
  • hit-counts-Schema (website, date, hit-count)
  • hit-counts relation

Is there anything wrong with the above relation?
14
Database Systems Set Theory
  • hit-counts-Schema (website, date, hit-count)
  • hit-counts relation

Is there anything wrong with the above
relation? No there is no reason why we can not
list a website more than once.
15
Database Systems Set Theory
  • If we did not care about the date and only cared
    about the hit count, could we define the
    hit-counts Schema as follows
  • hit-counts-Schema (website, hit-count)
  • hit-counts relation

16
Database Systems Set Theory
  • If we did not care about the date and only cared
    about the hit count, could we define the
    hit-counts Schema as follows
  • hit-counts-Schema (website, hit-count)
  • hit-counts relation

In real databases there would be no problem, but
we said that you can not repeat tuples in a
relation. So the answer is no.
17
Database Systems Set Theory
  • QUERY LANGUAGES
  • A query language is a language in which the user
    request information from the database.
  • Can be procedural or non-procedural.
  • We will study Relational Algebra
  • It Is a procedural language consisting of sets of
    operations that take one or two relations as
    input and output a relation. Operations include
  • select
  • project
  • union
  • set difference
  • Cartesian product
  • Rename
  • Intersection
  • Aggregate functions
  • We will also study various forms of joining
    relations.

18
Database Systems Set Theory
  • Unary- operates on one relation
  • Binary operates on a pair of relations
  • The Select Operation
  • Unary operation
  • Selects tuples that satisfy a given predicate
  • ? - represents a select operation - sigma
  • ?ltselect conditiongt(R)
  • ltselection conditiongt ltattribute namegt
    ltcomparison opgt ltconstant valuegt or
  • ltselection conditiongt ltattribute namegt
    ltcomparison opgt ltattribute namegt
  • comparison operators are , ltgt, lt, lt, gt, gt
  • equal, not equal, less than, less than or equal
    to, greater than, greater than or equal to

19
Database Systems Set Theory
  • To select those tuples of the hit-counts relation
    where the website is www.zojjed.com we write.
  • ?website www.zojjed.com(hit-counts)
  • This returns the relation

hit-counts relation
20
Database Systems Set Theory
  • To select those tuples of the hit-counts relation
    where the hit-count is greater than 1000 we
    write.
  • ?hit-count gt 1000 (hit-counts)
  • This returns the relation

hit-counts relation
21
Database Systems Set Theory
  • Can combine predicates with and, or, and not
  • To select those tuples of the hit-counts relation
    where the hit-count is greater than 5 and the
    website is www.zojjed.com, we write.
  • ?hit-count gt 5 and website www.zojjed.com(hi
    t-counts)
  • This returns the relation

hit-count gt 5
website www.zojjed.com
22
Database Systems Set Theory
  • The Project Operation
  • unary
  • returns arguments in relation without all
    attributes
  • duplicates are removed
  • ? - represent project operation - pi
  • ? ltattribute listgt (R)
  • ?website, category(Websites)

23
Database Systems Set Theory
  • Composition of Relational Operations
  • Often we need to combine operations. Often we
    wish to select a set of tuples and limit the
    relation returned to a few attributes.
  • What if we want to find out only the websites
    that have had greater than 1000 hits in a given
    day?
  • First we must find out what tuples have hit
    counts greater than 1000.
  • We can accomplish this with the following
    relational query
  • ?hit-countgt1000 (hit-counts)
  • By using the Project operation we can remove the
    extra attributes such as hit-count and date and
    only return the values in the website column.
  • ?website(?hit-countgt1000 (hit-counts))
  • What is the relation that is returned?

24
Database Systems Set Theory
  • What if we want to find out only the websites
    that have had greater than 1000 hits in a given
    day?
  • ?website(?hit-countgt1000 (hit-counts))
  • What is the relation that is returned?

hit-counts relation
25
Database Systems Set Theory
  • Union Operator
  • binary
  • ? - union operator
  • It is often useful to combine the results of
    queries.
  • Again, remember that set theory removes
    duplicates.
  • Relation 1 ? Relation 2 Result Set

26
Database Systems Set Theory
  • What is a query that returns all websites that
    have customers OR a hit count greater than 1000?
  • We need information from both the customers
    relation as well as the hit count relation.
  • First we need the names of all websites that have
    customers
  • ?website(customers)

customers Relation
27
Database Systems Set Theory
  • Then we need the names of the websites that have
    a hit count greater than 1000
  • ?website(?hit-countgt1000 (hit-counts))

hit-counts relation
28
Database Systems Set Theory
  • Combine the results using a union operation
  • ?website(customers) ? ?website(?hit-countgt1000
    (hit-counts))

?website(customers)
?website(?hit-countgt1000 (hit-counts))
  • Remember, order not important!
  • Unions MUST be of similar types
  • They MUST have the same number of attributes
  • The domains of the attributes MUST be the same

29
Database Systems Set Theory
  • Intersection Operator
  • binary
  • n - intersection operator
  • Returns all tuples contained within both
    relations
  • Relation 1 n Relation 2 Result Set

30
Database Systems Set Theory
  • What is a query that returns all websites that
    have customers AND a hit count greater than 1000?
  • First we need the names of all websites that have
    customers
  • ?website(customers)
  • Then we need the names of the websites that have
    a hit count greater than 1000
  • ?website(?hit-countgt1000 (hit-counts))

?customer-name(borrower) n ?customer-name(deposito
r)
31
Database Systems Set Theory
  • What is a query that returns all websites that
    have customers AND a hit count greater than 1000?
  • ?website(customers) n ?website(?hit-countgt1000
    (hit-counts))

?website(customers)
?website(?hit-countgt1000 (hit-counts))
32
Database Systems Set Theory
  • The Set Difference Operation (MINUS)
  • binary
  • -, denotes set difference
  • Relation 1 - Relation 2 Result Set
  • Finds tuples in one set but not in another
  • r s, produces a set containing those tuples in
    r but not in s.

33
Database Systems Set Theory
  • Produce a list of websites who have a hit count gt
    1000 and do not have a customer.
  • We need the names of all websites that have
    customers
  • ?website(customers)
  • Then we need the names of the websites that have
    a hit count greater than 1000
  • ?website(?hit-countgt1000 (hit-counts))

?website(?hit-countgt1000 (hit-counts)) -
?website(customers)
34
Database Systems Set Theory
  • Produce a list of websites who have a hit count gt
    1000 and do not have a customer.
  • ?website(?hit-countgt1000 (hit-counts)) -
    ?website(customers)

?website(customers)
?website(?hit-countgt1000 (hit-counts))
Notice the attributes in R1 that are not in R2
are included in the result set, but the
attributes in R2 that are not in R1 are not
included in the result set.
35
Database Systems Set Theory
Given the relation websites below
  • What is the result of the following?
  • ?website(empty set) - ?website(websites)

36
Database Systems Set Theory
Given the relation websites below
  • What is the result of the following?
  • ?website(empty set) - ?website(websites)

The result is the empty set, because no records
are contained in R1 and only the records in R1
that are not in R2 are returned from the MINUS
operator.
37
Database Systems Set Theory
Given the relation websites below
  • What is the result of the following?
  • ?website(websites) - ?website(empty set)

38
Database Systems Set Theory
The result of ?website(websites) - ?website(empty
set) is the complete relation websites, since no
records are contained in the empty set all
records from the websites relation are included
in the result set.
39
Database Systems Set Theory
The Cartesian-Product Operation binary x
combines information in two relations Relation 1
x Relation 2 Result Set because attributes can
be repeated in different relations, we need a
notation relation.attribute will be used.
Therefore, the resulting schema of r websites
x customers (websites.website,
websites.organization, websites.first-year,
websites.category, customers.website,
customers.first-name, customers.last-name) Note
that issues exist if you wish to use the same
relation twice, we will address this with the
rename operation shortly. What tuples exist in r
if r websites x customers? The combination of
all tuples in websites with every tuple in
customers. Given r1 with n1 tuples and r2 with
n2 tuples then r1xr2 has n1n2 tuples
40
Database Systems Set Theory
Lets look at a simplified example first. If
relation R1 contains the following
Then R1 x R2 contains the following
and if relation R2 contains the following
41
Database Systems Set Theory
Similarly, websites x customers appears as
follows
42
Database Systems Set Theory
What if we want to find all the customers who
bought from a website created before the year
2000? We could try the following ?first-year
lt2000 (websites x customers) Note that we are
not using a projection to reduce the number of
names to show what is really happening. In the
end, you would use a projection to show only the
fields requested by the question.
Oops, too many tuples!
43
Database Systems Set Theory
Because the Cartesian-product pairs all possible
tuples from websites are combined with all tuples
from customers. While only those with the
first-year lt 2000 are selected, it still returns
5 tuples. . Of those sets, we only want the ones
where the websites relations website attribute
equals the customers relations website attribute.
The only tuple we truly want is the highlighted
tuple. we can write this as follows ?websites.we
bsite customers.website(?first-year lt2000
(websites x customers))
44
Database Systems Set Theory
Since, ?websites.website customers.website(?fir
st-year lt2000 (websites x customers)) Returns
the following tuple with too many attributes, we
must also use a projection to remove the
excessive attributes.
Applying the projection of first-name, last name
to the previous query gives us the following
query ?first-name, last-name(?websites.website
customers.website(?first-year lt2000 (websites x
customers)))
45
Database Systems Set Theory
  • The Assignment Operator
  • unary
  • allows an expression to be assigned to a variable
  • NewRelation ? OldRelation
  • For example
  • 1200loans ? ?amount gt 1200(loan)
  • or
  • result ? ?loan-number(1200loans)
  • Or
  • The Rename Operation
  • Unary
  • ?x(E) renames the expression E to x.
  • Relational-algebra expressions do not have a name
    that we can refer to them by using the rename
    operator,

46
Database Systems Set Theory
Example, without using an aggregation function
(not yet shown), find the largest hit count of
any website for a single day. If the same max hit
count exists more than once, you are only allowed
to return a single tuple containing the
answer. We accomplish this in two steps First,
compute a temporary relationship consisting of
hit counts not greater than the largest hit
count. Second, take the set difference between
the relation ?hit-count(hit-counts) and the
temporary relation Compute all the websites hit
counts compared to all the websites hit counts,
in other words compute the Cartesian product of
the relation hit-counts with itself. hit-counts
x hit-counts However, we must rename one of the
hit-counts relations so that we can identify the
balance distinctly
47
Database Systems Set Theory
Given the projection of only the hit-count field
from the relation hit-counts via ?hit-count(hit-c
ounts) We have
If we rename the result of this projection ?d
(?hit-count(hit-counts)) And thus create a cross
product of the two relations as
?hit-count(hit-counts) x ?d (?hit-count(hit-coun
ts))
48
Database Systems Set Theory
?hit-count(hit-counts) x ?d (?hit-count(hit-count
s))
49
Database Systems Set Theory
?hit-count(hit-counts) x ?d (?hit-count(hit-count
s))
50
Database Systems Set Theory
Now we select only those tuples that have the
first attibute containing a value less than the
second attribute, we do so with the following
query ?hitcounts.hit-count lt d.hit-count
(?hit-count(hit-counts) x ?d (?hit-count(hit-coun
ts)))
51
Database Systems Set Theory
?hit-count(hit-counts) x ?d (?hit-count(hit-count
s))
52
Database Systems Set Theory
This certainly gives us a lot of tuples, but if
we then project just the hit-count from the first
column and remove the duplicates, we are left
with the following
This is the set containing all hit counts, but
the largest hit count.
53
Database Systems Set Theory
To get just the largest hit count we now simply
subtract our result set from the projection of
the original hit count relation as
follows ?hit-count(hit-counts) -
?hit-count(?hitcounts.hit-count lt d.hit-count
(?hit-count(hit-counts) x ?d (?hit-count(hit-coun
ts))))
54
Database Systems Set Theory
We need a better way to represent certain queries
as the notation for joining two relations and
only selecting records where the attributes match
is too cumbersome. Therefore we have The
Natural Join Operation Binary Result Set R1
x R2 The natural join operation finds the
Cartesian product of two relations, but only
returns tuples where the attributes whose names
are the same in both relations contain the same
values.
55
Database Systems Set Theory
Lets look at a simplified example first. If
relation R1 contains the following
and if relation R2 contains the following
56
Database Systems Set Theory
Then R1 x R2 contains the following
Then R1 x R2 contains the following
57
Database Systems Set Theory
  • Example
  • Find the names of all customers who have made a
    purchase from a health or travel website. Return
    the name of the customer, the website, and the
    category of the website.
  • The old way
  • Form a Cartesian product of the websites and
    customers relations.
  • Select the tuples of the same website as well as
    a category equal to health or travel.
  • Project the first-name, last-name, website, and
    category
  • first-name, last-name, website, category
  • (?websites.website customers.website and
  • category Health or category
    Travel(websites x customers))

58
Database Systems Set Theory
Another example Find all the names of websites
and the dates they have a hit count for web sites
that are in the health category.
59
Database Systems Set Theory
Another example Find all the names of websites
and the dates they have a hit count for web sites
that are in the health category. ? website, date
(?category Health (websites x hit-counts))
60
Database Systems Set Theory
Generalized Projections Allows basic arithmetic
operations within fields of a tuple Observe the
Sales relation
What was the price of the cost of the product
sold minus the tax paid? ? product, first-name,
last-name, (total-cost tax) as net-pay (Sales)
61
Database Systems Set Theory
Aggregate Functions takes a collection of values
and returns a single value as a result i.e sum
1, 1, 3, 4, 4, 11 returns the value 24. avg 1,
1, 3, 4, 4, 11 returns the value 4. count 1, 1,
3, 4, 4, 11 returns the value 6. min 1, 1, 3,
4, 4, 11 returns the value 1. max 1, 1, 3, 4,
4, 11 returns the value 11. count-distinct 1,
1, 3, 4, 4, 11 returns the value 5. Ignore the
fact that we said sets cant contain duplicate
values
62
Database Systems Set Theory
Operations to Modify the Contents of
Relations Deletion r ? r E Delete all of the
sale of Zojjed! from the Sales relation sales ?
sales - ?product Zojjed! (sales) Delete all
sales with no tax collected sales ? sales - ?tax
0 (sales) Insertion r ? r ? E Add a record to
the sales relation sales ? sales ? (I Walk to
Eat, Chase, Utley, 0, 15.95) Add a record
to the websites relation websites ? websites ?
(www.mediatitan.net, Walking Promotions,
2006, Fiction) You can also insert touples
based on the result of another query.
63
Database Systems Set Theory
Updating Remove all the tax from the sales
relation sales ? ? product, first-name,
last-name, 0, total-cost(sales)
64
Database Systems Set Theory
Outer Join There are other forms of joins. Lets
look at the following two simple relations
cities relation
teams relation
Natural Join -gt cities x teams The natural join
omits records that do not match, so we do not
have records for Jeter, Utley, Glavin, or Bonds.
65
Database Systems Set Theory
cities relation
teams relation
Left Outer Join -gt cities LOJ teams Includes all
records from the left and only those records on
the right that match
66
Database Systems Set Theory
cities relation
teams relation
Right Outer Join -gt cities ROJ teams Includes all
records from the right and only those records on
the left that match
67
Database Systems Set Theory
cities relation
teams relation
Full Outer Join -gt cities FOJ teams Includes all
records from the right and left, tuples that do
not match have nulls in their place.
68
Database Systems Set Theory
NULLS And (true and unknown unknown, false and
unknown false, unknown and unknown
unknown Or (true or unknown true, false or
unknown unknown, unknown or unknown
unknown) Not (not unknown unknown)
69
Database Systems Set Theory
REFERENTIAL INTEGRITY Superkey of R A unique
identified for a tuple. A set of attributes SK
of R such that no two tuples in any valid
relation instance r(R) will have the same value
for SK. That is, for any distinct tuples t1 and
t2 in r(R), t1SK ? t2SK. Key of R A
"minimal" superkey that is, a superkey K such
that removal of any attribute from K results in a
set of attributes that is not a
superkey. Example The CAR relation
schema CAR(State, Reg, SerialNo, Make, Model,
Year) has two keys Key1 State, Reg Key2
SerialNo Both are superkeys. SerialNo, Make
is a superkey but not a key. If a relation has
several candidate keys, one is chosen
arbitrarily to be the primary key. The primary
key attributes are underlined.
70
Database Systems Set Theory
REFERNTIAL INTEGRITY Relational Database
Schema A set S of relation schemas that belong
to the same database. S is the name of the
database. S R1, R2, ..., Rn Entity
Integrity The primary key attributes PK of
each relation schema R in S cannot have null
values in any tuple of r(R). This is because
primary key values are used to identify the
individual tuples. tPK ? null for any tuple t
in r(R) Note, other attributes of R may be
similarly constrained to disallow null values,
even though they are not members of the primary
key.
71
Database Systems Set Theory
Referential Integrity A constraint involving two
relations (the previous constraints involve a
single relation). Used to specify a
relationship among tuples in two relations the
referencing relation and the referenced
relation. Tuples in the referencing relation R1
have attributes FK (called foreign key
attributes) that reference the primary key
attributes PK of the referenced relation R2. A
tuple t1 in R1 is said to reference a tuple t2 in
R2 if t1FK t2PK. A referential integrity
constraint can be displayed in a relational
database schema as a directed arc from R1.FK to
R2.
Write a Comment
User Comments (0)
About PowerShow.com