MIS3150 Data and Information Management Query Languages - PowerPoint PPT Presentation

About This Presentation
Title:

MIS3150 Data and Information Management Query Languages

Description:

Title: SQL Subject: Database Management Systems Last modified by: Arijit Sengupta Created Date: 1/16/1997 2:19:00 PM Document presentation format – PowerPoint PPT presentation

Number of Views:183
Avg rating:3.0/5.0
Slides: 55
Provided by: wrightEd5
Category:

less

Transcript and Presenter's Notes

Title: MIS3150 Data and Information Management Query Languages


1
MIS3150 Data and Information ManagementQuery
Languages SQL
  • Arijit Sengupta

2
Structure of this semester
MIS3150
1. Design
2. Querying
0. Intro
Database Fundamentals
Conceptual Modeling
Query Languages
Relational Model
Advanced SQL
Normalization
Newbie
Users
Professionals
Designers
Developers
3
Todays Buzzwords
  • Query Languages
  • Formal Query Languages
  • Procedural and Declarative Languages
  • Relational Algebra
  • Relational Calculus
  • SQL
  • Aggregate Functions
  • Nested Queries

4
Objectives
  • At the end of the lecture, you should
  • Get a formal as well as practical perspective on
    query languages
  • Have a background on query language basics (how
    they came about)
  • Be able to write simple SQL queries from the
    specification
  • Be able to look at SQL queries and understand
    what it is supposed to do
  • Be able to write complex SQL queries involving
    nesting
  • Execute queries on a database system

5
Set Theory Basics
  • A set a collection of distinct items with no
    particular order
  • Set description
  • b b is a Database Book
  • c c is a city with a population of over a
    million
  • x 1 lt x lt 10 and x is a natural number
  • Most basic set operation
  • Membership x ? S (read as x belongs to S if x
    is in the set S)

6
Other Set Operations
  • Addition, deletion (note that adding an existing
    item in the set does not change it)
  • Set mathematics
  • Union R ? S x x ? R or x ? S
  • Intersection R ? S x x ?R and x ? S
  • Set Difference R S x x ? R and x ? S
  • Cross-product R x S ltx,ygt x ? R and y ? S
  • You can combine set operations much like
    arithmetic operations R (S ? T)
  • Usually no well-defined precedence

7
Relational Query Languages
  • Query languages Allow manipulation and
    retrieval of data from a database.
  • Relational model supports simple, powerful QLs
  • Strong formal foundation based on logic.
  • Allows for much optimization.
  • Query Languages ! programming languages!
  • QLs not expected to be Turing complete.
  • QLs not intended to be used for complex
    calculations.
  • QLs support easy, efficient access to large data
    sets.

8
Formal Relational Query Languages
  • Two mathematical Query Languages form the basis
    for real languages (e.g. SQL), and for
    implementation
  • Relational Algebra More operational, very
    useful for representing execution plans.
  • Relational Calculus Lets users describe what
    they want, rather than how to compute it.
    (Non-operational, declarative.)
  • Understanding Algebra Calculus is key to
    understanding SQL, query processing!

9
Structured Query Language
  • Need for SQL
  • Operations on Data Types
  • Definition Manipulation
  • Operations on Sets
  • Declarative (calculus) vs. Procedural (algebra)
  • Evolution of SQL
  • SEQUEL ..SQL_92 .. SQL_93
  • SQL Dialects
  • Does SQL treat Relations as Sets?

10
Preliminaries
  • A query is applied to relation instances, and the
    result of a query is also a relation instance.
  • Schemas of input relations for a query are fixed
    (but query will run regardless of instance!)
  • The schema for the result of a given query is
    also fixed! Determined by definition of query
    language constructs.
  • Positional vs. named-field notation
  • Positional notation easier for formal
    definitions, named-field notation more readable.
  • Both used in SQL

11
Example Instances
  • Students, Registers, Courses relations for our
    examples.

R1
C1
S2
S1
12
Relational Algebra
  • Basic operations
  • Selection ( ) Selects a subset of rows
    from relation.
  • Projection ( ) Deletes unwanted columns from
    relation.
  • Cross-product ( ) Allows us to combine two
    relations.
  • Set-difference ( ) Tuples in reln. 1, but
    not in reln. 2.
  • Union ( ) Tuples in reln. 1 and in reln. 2.
  • Additional operations
  • Intersection, join, division, renaming Not
    essential, but (very!) useful.
  • Since each operation returns a relation,
    operations can be composed! (Algebra is closed.)

13
Projection
  • Deletes attributes that are not in projection
    list.
  • Schema of result contains exactly the fields in
    the projection list, with the same names that
    they had in the (only) input relation.
  • Projection operator has to eliminate duplicates!
    (Why??)
  • Note real systems typically dont do duplicate
    elimination unless the user explicitly asks for
    it. (Why not?)

14
Vertical Slices
Algebra projection ?ltA1,A2,...Amgt (R)
  • Projection
  • Specifying Elements
  • No Specification
  • List all information about Students
  • select
  • from STUDENT
  • (Student)
  • Conditional
  • List IDs, names, and addresses of all students
  • select StudentID, name, address
  • from STUDENT
  • ? StudentID, name, address (Student)

15
Does SQL treat Relations as Sets?
  • What are the different salaries we pay to our
    employees?
  • select salary
  • from EMPLOYEE
  • OR is the following better?
  • select DISTINCT salary
  • from EMPLOYEE

16
Selection
  • Selects rows that satisfy selection condition.
  • No duplicates in result! (Why?)
  • Schema of result identical to schema of (only)
    input relation.
  • Result relation can be the input for another
    relational algebra operation! (Operator
    composition.)

17
Horizontal Slices
Algebra selection or restriction (R)
  • Restriction
  • Specifying Conditions
  • Unconditional
  • List all students
  • select
  • from STUDENT
  • (Student)
  • Conditional
  • List all students with GPA gt 3.0
  • select
  • from STUDENT
  • where GPA gt 3.0
  • ? GPA gt 3.0 (Student)

18
Specifying Conditions
List all students in ... select
from STUDENT where city in
(Boston,Atlanta)
List all students in ... select
from STUDENT where zip not between
60115 and 60123
19
Pattern Matching
any string with n characters, ngt0 _ any
single character. x exact sequence of string x.
List all CIS courses. select
from COURSE where course like CIS

List all CIS 3200 level courses. select
from COURSE where course
like ?
20
Missing or Incomplete Information
  • List all students whose address or telephone
    number is missing
  • select
  • from STUDENT
  • where Address is null or GPA is null

21
Horizontal and Vertical
  • Query
  • List all student ID, names and addresses who
    have
  • GPA gt 3.0 and date of birth before Jan 1, 1980.
  • select StudentID, Name, Address
  • from STUDENT
  • where GPA gt 3.0 and DOB lt 1-Jan-80
  • order by Name DESC
  • Algebra ? StudentID,name, address (? GPA gt 3.0
    and DOB lt 1-Jan-80 (STUDENT))
  • Calculus t.StudentID, t.name, t.address t ?
    Student ?t.GPA gt 3.0 ?
  • t.DOB lt 1-Jan-80
  • Order by sorts result in descending
    (DESC) order.
  • Note The default order is ascending (ASC)
    as in
  • order by Name

22
Union, Intersection, Set-Difference
  • All of these operations take two input relations,
    which must be union-compatible
  • Same number of fields.
  • Corresponding fields have the same type.
  • What is the schema of result?

23
Union
  • List students who live in Atlanta or GPA gt 3.0
  • select StudentID, Name, DOB, Address
  • from STUDENT
  • where Address Atlanta
  • union
  • select StudentID, Name, DOB, Address
  • from STUDENT
  • where GPA gt 3.0
  • Can we perform a Union on any two Relations ?

24
Union Compatibility
  • Two relations, A and B, are union-compatible
  • if
  • A and B contain a same number of attributes, and
  • The corresponding attributes of the two have the
    same domains
  • Examples
  • CISStudent (ID Did Name Dname Address
    Daddr Grade Dgrade)
  • Senior-Student (SName Dname S Did Home
    Daddr Grade Dgrade)
  • Course (C Dnumber Title Dstr Credits
    Dnumber)
  • Are CIS-Student and Senior-Student union
    compatible?
  • Are CIS-Student and Course union compatible?
  • What happens if we have duplicate tuples?
  • What will be the column names in the resulting
    Relation?

25
Union, Intersect, Minus
select CUSTNAME, ZIP from CUSTOMER where
STATE MA UNION select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
select CUSTNAME, ZIP from CUSTOMER where
STATE MA INTERSECT select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
select CUSTNAME, ZIP from CUSTOMER where
STATE MA MINUS select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
B
A
A
B
A
26
Cross-Product
  • Each row of S1 is paired with each row of R1.
  • Result schema has one field per field of S1 and
    R1, with field names inherited if possible.
  • Conflict Both S1 and R1 have a field called sid.
  • Renaming operator

27
Joins
  • Condition Join
  • Result schema same as that of cross-product.
  • Fewer tuples than cross-product, might be able to
    compute more efficiently
  • Sometimes called a theta-join.

28
Joins
  • Equi-Join A special case of condition join
    where the condition c contains only equalities.
  • Result schema similar to cross-product, but only
    one copy of fields for which equality is
    specified.
  • Natural Join Equijoin on all common fields.

29
Find names of students who have taken course 103
30
Connecting/Linking Relations
  • List information about all students and the
    classes they are taking

Student
Class
What can we use to connect/link Relations? Join
Connecting relations so that relevant tuples can
be retrieved.
31
Join
Cartesian Product
Student 30 tuples
Class 4 tuples
Total Number of Tuples in the Cartesian Product.
? (match each tuple of student to every tuple
of class) Select tuples having identical Student
Ids. Expected number of such Tuples
Join Selectivity
32
Join Forms
R1
R2
  • General Join Forms
  • Equijoin
  • Operator Dependent
  • Natural Join
  • Outer Join
  • Left
  • Right
  • Full

select s., c. from STUDENT s, CLASS
c where s.StudentID c. SID
R1
R2
x gt y ltgt ...
select s., c. from STUDENT s, CLASS
c where s.StudentID c.SID ()
33
Find names of students who have taken a CIS course
  • Information about departments only available in
    Courses so need an extra join
  • A query optimizer can find this given the first
    solution!

34
Find students who have taken an MIS or a CS course
  • Can identify all MIS or CS courses, then find
    students who have taken one of these courses
  • Can also define Temp1 using union! (How?)
  • What happens if is replaced by in this
    query?

35
Find students who have taken a CIS and an ECI
Course
  • Previous approach wont work! Must identify
    students who have taken CIS courses, students who
    have taken ECI courses, then find the
    intersection (note that sid is a key for
    Students)

36
Relational Calculus
  • Comes in two flavours Tuple relational calculus
    (TRC) and Domain relational calculus (DRC).
  • Calculus has variables, constants, comparison
    ops, logical connectives and quantifiers.
  • TRC Variables range over (i.e., get bound to)
    tuples.
  • DRC Variables range over domain elements (
    field values).
  • Both TRC and DRC are simple subsets of
    first-order logic.
  • Expressions in the calculus are called formulas.
    An answer tuple is essentially an assignment of
    constants to variables that make the formula
    evaluate to true.

37
Find students with GPA gt 3.7 who have taken a CIS
Course
DRC
38
Find students who have taken all CIS courses
How will you do this with Relational Algebra?
39
Monotonic and Non-Monotonic Queries
  • Monotonic queries queries for which the size of
    the results either increase or stay the same as
    the size of the inputs increase. The result size
    never decreases
  • Non-monotonic queries queries for which it is
    possible that the size of the result will
    DECREASE when the size of the input increases
  • Examples of each?
  • Which of the algebra operations is non-monotonic?
  • What does this signify?

40
Summaries and Aggregates
Calculate the average GPA select avg.
(GPA) from STUDENT, Find the lowest
GPA select min (GPA) as minGPA from
STUDENT, How many CIS majors? select count
(StudentId) from STUDENT where majorCIS
Discarding duplicates select avg (distinct
GPA) STUDENT where majorCIS (is
this above query correct?)
41
Aggregate Functions
  • COUNT (attr) - a simple count of values in
    attr
  • SUM (attr) - sum of values in attr
  • AVG (attr) - average of values in attr
  • MAX (attr) - maximum value in attr
  • MIN (attr) - minimum value in attr
  • Take effect after all the data is retrieved from
    the database
  • Applied to either the entire resulting relation
    or groups
  • Cant be involved in any query qualifications
    (where clause)
  • Would the following query be permitted?
  • select StudentId
  • from STUDENT
  • where GPA max (GPA)

42
Grouping Results Obtained
  • Show all students enrolled in each course.
  • select cno, StudentID
  • from REGISTRATION
  • group by cno Is this grouping OK?
  • Calculate the average GPA of students by county.
  • select county, avg (GPA) as CountyGPA
  • from STUDENT
  • group by county
  • Calculate the enrollment of each class.
  • select cno, year , term, count (StudentID) as
    enroll
  • from REGISTRATION
  • group by cno, year, term

43
Selections on Groups
  • Show all CIS courses that are full.
  • select cno, count (StudentID)
  • from REGISTRATION
  • group by cno
  • having count (StudentID) gt 29

44
Grouping Results after Join
  • Calculate the average GPA of each class

select course, avg (GPA) from STUDENT S, CLASS
C where S.StudentID C.SID group by course,
45
Nesting Queries
  • SELECT attribute(s)
  • FROM relation(S)
  • WHERE attr not in comparison operator
    exists
  • ( query statement(s) )
  • List names of students who are taking BA201
  • select Name
  • from Student
  • where StudentID in
  • ( select StudentID
  • from REGISTRATION
  • where courseBA201)

46
Sub Queries
List all students enrolled in CIS
courses select name from STUDENT where
StudentId in (select StudentId from REGISTR
ATION where cno like CIS)
List all courses taken by Student (Id 1011)
select cname from COURSE where cnum
any (select cno from REGISTRATION where S
tudentId 1011)
47
Sub Queries
Who received the highest grade in CIS
8140 select StudentId from REGISTRATION where c
num CIS 8140 and grade gtall (select grade
from REGISTRATION where cno CIS 8140)
List all students enrolled in CIS
courses. select name from STUDENT
S where exists (select from REGISTRATION
where StudentId S.StudentId and cno like
CIS)
48
Relational Views
  • Relations derived from other relations.
  • Views have no stored tuples.
  • Are useful to provide multiple user views.
  • What level in the three layer model do views
    belong?
  • Which kind of independence do they support?

49
View Creation
  • Create View view-name ( attr , attr ...)
  • AS subquery
  • with check option
  • DROP VIEW view-name
  • Create a view containing the student ID, Name,
    Age and GPA for those who are qualified to take
    300 level courses, i.e., GPA gt2.0.

50
View Options
  • With Check Option enforces the query condition
    for insertion or update
  • To enforce the GPA gt2.0 condition on all
    new student tuples inserted into the view
  • A view may be derived from multiple base
    relations
  • Create a view that includes student IDs,
    student names and their instructors names for
    all CIS 300 students.

51
View Retrieval
  • Queries on views are the same as that on base
    relations.
  • Queries on views are expanded into queries on
    their base relations.
  • select Name, Instructor-Name
  • from CIS300-Student
  • where Name Instructor-Name

52
View Update
  • Update on a view actually changes its base
    relation(s)!
  • update Qualified-Student
  • set GPA GPA-0.1
  • where StudentID s3
  • insert into Qualified-Student
  • values ( s9, Lisa, 4.0 )
  • insert into Qualified-Student
  • values ( s10, Peter, 1.7 )
  • Why are some views not updateable?
  • What type of views are updateable?

53
Non-monotonic queries again!
  • Need to use either MINUS or NOT EXISTS!
  • Find courses where no student has gpa over 3.5
  • Find students who have taken all courses that Joe
    has taken
  • How would you solve these?

54
Summary
  • SQL is a low-complexity, declarative query
    language
  • The good thing about being declarative is that
    internally the query can be changed automatically
    for optimization
  • Good thing about being low-complexity?
  • No SQL query ever goes into an infinite loop
  • No SQL query will ever take indefinite amount of
    space to get the solution
  • Can be used for highly complex problems!
Write a Comment
User Comments (0)
About PowerShow.com