CMSC 424 - PowerPoint PPT Presentation

1 / 211
About This Presentation
Title:

CMSC 424

Description:

... simple database, just giving names and incoming telephone number, but it helps ... Superkey: set of attributes whose values uniquely identify the entity ... – PowerPoint PPT presentation

Number of Views:92
Avg rating:3.0/5.0
Slides: 212
Provided by: david227
Learn more at: http://www.cs.umd.edu
Category:
Tags: cmsc | is | number | phone | this | whose

less

Transcript and Presenter's Notes

Title: CMSC 424


1
  • CMSC 424
  • Database Design
  • Section 401
  • Dr. David Kuijt

2
Contact Info
  • Professor David Kuijt
  • Office AVW 3205
  • Phone 5-0534
  • Email kuijt_at_cs.umd.edu
  • Office Hours T/Th 315-400 (or by appointment)
  • TA Debbie Heisler
  • Office AVW 3270
  • Phone 405-7027
  • Email heisler_at_cs.umd.edu
  • Office Hours TBA

3
Basic Information
  • Required text
  • Korth Silberschatz Database System Concepts,
    Fourth Edition, McGraw Hill 2001.
  • Warnings
  • Late homework or projects are not acceptable.
    Hand in what you have finished. Exceptions will
    be made only for emergencies or medical reasons
    with a doctor's note.
  • No makeup exams. Exceptions as above.
  • Cheating will result in an immediate grade of XF
    ("failure through academic dishonesty" -- this
    goes on your permanent transcript), and may
    result in suspension or expulsion from
    University. This is your only warning. Don't do
    it.

4
Motivation
  • We live in a database world. The simplest acts
    are tied to databases. The last time I called
    out for delivery pizza, it involved at least four
    enormous databases. What were they?
  • Pizza Hut knew what I had ordered before -- they
    asked if I wanted the same pizza as last time.
    They probably store lots more information than
    that -- perhaps all my old order information.
    They could use this information to make corporate
    decisions (quantities of materials to order
    forecasting pizza trends) as well as a local aid.

5
Motivation (2)
  • Every delivery food place around here uses caller
    ID. That's a relatively simple database, just
    giving names and incoming telephone number, but
    it helps them avoid some types of fraudulent
    orders and errors when writing down names,
    addresses, and stuff.
  • I used a telephone. When you use a telephone,
    all the details about the call are stored in a
    database. Call length, what number you called,
    time of call, billing information, and so on.
    Cell phone databases are even more complex.

6
Motivation (3)
  • I paid by credit card. Huge databases are
    involved.
  • Every time somebody runs your credit card through
    a swipe reader or types in the number, they're
    checking information in a database.
  • Is this a valid credit account?
  • Does it have enough money to cover the bill?
  • Is the credit card stolen?
  • Debit the merchant account, credit the consumer
    account
  • Whether the transaction is accepted or rejected,
    all the details are recorded in a database
    somewhere.

7
More Motivation
  • Every time you go to an ATM, use a credit card,
    buy something with a UPC bar code at a
    supermarket or department store, go to a movie,
    concert, or Caps game, register for classes, or
    get a parking ticket in Lot 4, you are working
    with databases.
  • Everything in your wallet that isn't a photograph
    is an entry in a database somewhere
  • medical plan cards
  • credit cards
  • student ID
  • driver's license
  • membership cards in clubs or interest groups
  • everything. Even the currency!

8
Final Motivation
  • Databases are all around us.
  • Knowledge is power.
  • Databases give us power (the ability to do things
    we couldnt otherwise do)
  • they give other people power over us, and
    knowledge about what we are doing.
  • This class is about Databases.
  • So what is a Database?

9
What is a Database?
  • At the simplest level, a database architecture
    has two components.
  • (1) Data.
  • Usually a whole lot of it
  • Representing multiple types of different objects
  • Each type may be related to itself and to other
    types in multiple ways
  • (2) A set of methods to access and manipulate the
    data.

10
Data
  • For any reasonable-size database the data may be
    quite complex.
  • It is an attempt to record or model all the
    aspects of the real world that are important to
    one specific purpose -- telephone calls, for
    example, or credit card accounts.
  • Lots of different objects need to be stored as
    data, and they need to be stored in such a way as
    to reflect the ways that the objects can interact
    with each other.

11
Database Example
  • For example, consider a local telephone system
    database. Types of data stored include
  • Account information
  • customers individuals, groups, companies that
    have leased numbers
  • billing addresses, payment history, calling plans
    and billing contracts
  • Hardware information
  • network structure (call routing),
  • hardware age, reliability, and maintenance
    information,
  • system load tracking,
  • network billing pattern (what numbers are long
    distance from what other numbers, and what ones
    are local)

12
Database Example (2)
  • Local telephone system database continued.
    Additional types of data would include
  • Call information
  • start and end time,
  • telephone number that initiated the call
  • telephone number(s) that received the call
  • All that information could be stored in files
    with much less fuss and bother. Why use a
    database? Why not just store the information in
    flat files?

13
Why Not Flat Files?
  • Why use a database? Why not just use a flat
    file?
  • Databases have a number of advantages over flat
    files.
  • Data Access. The set of programs that provide
    access to a database allow much more complex and
    flexible queries to the database with greater
    efficiency and convenience.
  • Reduced duplication and better control over data
    consistency. Data redundancy is bad. Which item
    to change in an update? How do you know that
    you've found all the copies? Data inconsistency
    (disagreement between various copies of the same
    data) is a serious problem.
  • Integrity Constraints can be enforced inside a
    database -- telephone numbers all 10 digits
    phone numbers in Maryland all have the first
    three numbers being 410 or 301.

14
Why Not Flat Files? (2)
  • Uniform access and control of data using a
    standard language
  • Data Independence. We want the data to be
    independent of the representation chosen for it
    within the system. Tying the data to a given
    representation is what caused the Y2K fuss --
    only two digits were used for a "year" field.
  • Concurrency control. Multiple users on a single
    database is a big advantage.
  • Recovery.
  • Security. Different users of the database may
    need different levels of access to information.
  • Centralized Control
  • Platform independence (portability). Since the
    internal file structure and access program
    details are hidden from the user, it is much
    easier to use the database on multiple platforms.

15
Data Abstraction
  • Most users don't need to understand all the
    details of the implementation and data design of
    a complex database. To make a database
    convenient to use, the system provides users with
    an abstract view of the data, limiting the
    information available to them. There are usually
    three levels of data abstraction.
  • Physical Level
  • Conceptual Level
  • View Level

16
Data Abstraction (2)
  • Physical level. The actual implementation
    details of low-level data structures are
    described at this level.
  • Conceptual level. This level describes all the
    different data types that exist by defining a
    relatively small number of simple structures,
    including all the relationships that these data
    types have with each other. Implementation of
    these objects might be complex, but it is hidden
    from the user at this level. Database
    administrators are usually the only ones who have
    access at this level.

17
Data Abstraction (3)
  • View level. There may be multiple different
    views, each of which represents a simpler subset
    of the functions and data available at the
    conceptual level. Different user types may
    require different parts of the database (for
    example, a bank account database might be
    accessed by cashiers, account holders, credit
    card companies, and the bank's payroll manager.
    Each of them can only access a small part of the
    full database of bank account information).
    Creating a number of restricted views makes the
    database more useful for the individual user
    types, giving each type access according to the
    needs of that type.

18
Data Abstraction (4)
  • Definition a Schema is a specification of a
    particular database using a particular data
    model.
  • The three levels of data abstraction are often
    referred to as
  • External Schema(s) (for the view level(s)).
  • Conceptual Schema (for the conceptual level)
  • Internal Schema (for the physical level)

19
Database as Model
  • A model represents a perception of a real system
  • Models help us manage or understand the real
    world system they represent.
  • When modeling a system we select aspects and
    characteristics we want to represent we abstract
    them to form a simple(r) system
  • examples a map, an airplane flight simulator,
    computer weather analysis program
  • A database is a model of reality

20
Data Models underlying the Database
  • The data model is a collection of conceptual
    tools for describing data and its attributes
  • data objects
  • interrelationships of the data
  • data semantics and consistency constraints
  • There are two well-established data models used
    in database design
  • Entity-Relationship (E-R) model
  • Relational model
  • older methods included the Network and
    Hierarchical data models
  • Each was tied closely to the underlying
    implementation, which made it more difficult to
    model data and to modify or update the database.
    As a result they arent much used any more

21
Entity-Relationship Model
  • Diagram based model
  • Two primitives
  • Entities -- each represents a unique real-world
    object
  • Relationships -- each represents an association
    among several entities
  • Each are associated in sets of the same type (for
    example, one entity set might be customer,
    representing the set of all entities that
    represent customers at a given bank)
  • Third important notion Attributes
  • Entities are associated with a set of attributes

22
Diagram-based Model
23
Entities
  • Entity a distinguishable object we want to model
  • e.g., room CSI 3120, Celine Dion, Elizabeth I of
    England
  • Entities have attributes (single-valued
    properties)
  • e.g., a person has a name, SSN, gender,
  • if an attribute has more than a single value, we
    should model it as an Entity
  • Entity Set a set of entities of the same type
  • e.g., CLASSROOMs, SINGERs, HISTORICAL MONARCHs
  • Entity Sets may overlap
  • CSI 3120 is a member of CLASSROOMs and also a
    member of CSI BUILDING ROOMs.

24
Relationships
  • Relationship is an association among entities
  • David Kuijt teaches-in CSI 3120
  • Relationship Set is a collection of relationships
    of the same type
  • FACULTY teach-in CLASSROOMs
  • Relationships may also have attributes
  • e.g., the relationship teach-in has an attribute
    weekday and another attribute time to store
    the day and time in which a given Entity of the
    set FACULTY teaches in a given Entity of the type
    CLASSROOM

25
Example Database Design (1)
  • Application library database. Authors have
    written books about various subjects different
    libraries in the system may carry these books.
  • Entities (with attributes in parentheses)
  • Authors (SS, name, tel, birthdate)
  • Books (ISDN, title)
  • Subjects (sname)
  • Libraries (lname)
  • Relations associating entities in square
    brackets
  • Wrote-on Authors, Subjects
  • Carry Libraries, Subjects
  • Index Subjects, Books

26
Diagram of Initial Database Design
27
Poor Initial Design
  • Our first design is a poor model of the
    real-world system we are examining. Problems in
    our first design
  • no relationship associating authors and books
  • no relationship associating libraries and books
  • common queries will be complex and difficult
  • Q what libraries carry books by a given author?
  • Q what books has a given author written?
  • Q who is the author of a given book?
  • Q how many copies of a given book exist at each
    library?
  • Q what edition of a book does the library have?

28
Example Database Design (2)
  • Application library database as before
  • Entities (with attributes in parentheses)
  • Authors (SS, name, tel, birthdate)
  • Books (ISDN, title)
  • Subjects (sname)
  • Libraries (lname)
  • Relations associating entities in square
    brackets (attributes in parentheses)
  • Wrote Authors, Books
  • In-stock Libraries, Books (quantity, edition)
  • Index Subjects, Books

29
Diagram of Improved Database Design
30
Keys
  • Fundamental concept for databases
  • Must be able to uniquely identify things within a
    database (in the E-R model, Entities and
    Relationships)
  • Avoid duplication of results in a search
    identify data redundancy in other operations
  • Halt search on positive results
  • Quick lookup in underlying data structures used
    at the Physical Level of abstraction
  • Examples of possible keys
  • Student ID number (SS) is used as a key for most
    UMD databases having to do with students

31
Entity Keys
  • Superkey set of attributes whose values uniquely
    identify the entity
  • candidate key a minimal superkey (a minimal
    subset of a superkey whose values still uniquely
    identify the entity)
  • primary key if there is more than one possible
    candidate key, one is chosen as the primary one
    used for most entity-identification purposes
  • weak entity has no primary key instead it
    depends upon another strong entitys primary key
    to exist
  • e.g., CHILDren of EMPLOYEEs are weak the primary
    key of EMPLOYEE in addition to the attributes of
    the CHILD are used for identification
  • weak entities are existent dependent on a
    strong entity -- when the strong entity gets
    deleted, so does the weak one

32
Relationship Keys
  • Depend upon the entity mapping of the
    relationship
  • one-one the primary key of any of the entities
    can be used to uniquely distinguish a given
    relationship between two unique entities.
  • one-many the primary key of the many entity,
    plus possibly a subset of the attributes of the
    relationship, will uniquely identify a given
    relationship
  • e.g., MOTHER gave-birth-to CHILD to identify a
    specific gave-birth-to relationship requires the
    primary key of MOTHER and possibly the (date) and
    (time) attributes of gave-birth-to
  • many-many the union of the primary keys of the
    entities associated, plus possibly a subset of
    the attributes of the relationship, will uniquely
    identify a given relationship
  • e.g., PERSON married PERSON SS of both and
    possibly date

33
Special Cases
  • Relationships may associate different entities of
    the same type
  • Ternary versions of the above
  • M-N relationships many-one mappings are often
    more useful in practice than many-many mappings.
  • DUMMY Entities can be used to convert an M-N
    mapping relationship to a pair of relationships,
    one M-1 and one N-1.

34
Specialization-Generalization(ISA Hierarchy)
  • This is a way to represent entity complexity
  • specialization top-down refinement of entities
    with distinct attributes
  • Entity type BANK ACCOUNT might be subdivided into
    related but different types CHECKING ACCT and
    SAVINGS ACCT
  • generalization bottom-up abstraction of common
    attributes
  • Course types DATABASE, SYSTEM, and NETWORK all
    have common attribute (project). From them we
    can abstract a new course type PRACTICAL COURSE
  • other common course attributes are included
    (e.g., course number)

35
ISA Hierarchy Example Top-down Refinement
  • Account entity with attributes balance and number
  • additional complexity we want to represent two
    subtypes of account
  • Savings Account with attribute Interest Rate
  • Checking Account with attribute Overdraft Limit

36
ISA Hierarchy Example Bottom-up Abstraction
  • Three related entities with similar attribute
    project
  • we abstract a new type of super entity Practical
    Course and link the three entities as subtypes
  • other shared attributes (e.g., course number) are
    also promoted to the upper level entity

37
Aggregation(Part-of Hierarchy)
  • This is a way to represent relationship
    complexity
  • relationships among relationships are not
    supported by the E-R model
  • often we want to model lower-level relationships
    differently
  • Groups of entities and relationships can be
    abstracted into higher level entities

38
Part-of Hierarchy Example
  • Entities driver, car, tires, doors, engine,
    seats, piston, valves
  • Relationship drives is insufficient to model the
    complexity of this system
  • Part-of relationships allow abstraction into
    higher level entities (piston and valves as parts
    of engine engine, tires, doors, seats aggregated
    into car)

39
Mapping an E-R Schema to Tables
  • Motivation - translating E-R database designs
    into Relational designs
  • Both models are abstract, logical representations
    of a real-world enterprise
  • Both models employ similar design principles
  • Converting an E-R diagram to tables is the way we
    translate an E-R schema to a Relational schema.
  • Later on well examine how to convert a
    Relational schema to an E-R schema

40
Mapping an E-R Schema to Tables (2)
  • Strong Entity E with primary key PK and
    attributes A, B, E(PK, A, B, )
  • Weak Entity F with (non-primary) key WK and
    attributes C, D, depending upon E above for
    primary key
    F(PK, WK, C, D, )
  • Relationship R with attributes L, M, and
    associating Entities E (with primary key PK), E2
    (PK2), E3 (PK3),
    R(PK, PK2, PK3, , L, M, )
  • Relationships between weak entities and the
    strong one on which they are dependent usually do
    not require representation because it is usually
    a many-one relationship with no attributes on the
    relationship (they are on the weak entity) and so
    the resulting table R(PK, WK) is a subset of the
    weak entity itself.

41
Table Details
  • The whole table represents a single Entity Set or
    Relationship Set.
  • Each entry (row) in the table corresponds to a
    single instance (member in that set)
  • For an Entity Set each column in the table
    represents an attribute in the E-R diagram
  • For a Relationship Set each column in the table
    represents either an attribute of the
    Relationship or one of the parts of the primary
    key of the Entity Sets it associates

42
Mapping an E-R Schema to Tables (3)
  • ISA relationships choose either to
  • Represent the super class entity, then represent
    each subclass with the primary key of the super
    class and its own attribute set. This is very
    similar to the way weak entities are treated.
  • Or, map the subclasses to separate relations and
    ignore the whole super class. This is good when
    the subclasses partition the whole superclasses
    between them (the subclasses are disjoint and the
    union of the subclasses covers the whole super
    class).
  • Aggregate (part-of) relationship
  • Translation is straightforward -- just treat the
    aggregate as an entity and use the methods
    defined above.
  • With last weeks lecture, this covers the
    material of chapter 2.

43
Relational Database Model
  • Most popular logical data model
  • Relations (also called tables) represent both
    Entity Sets and Relationship Sets.
  • Attributes form the columns of the table (column
    and attribute are synonymous)
  • Each row represents a single entity or
    relationship (called a row or tuple)
  • Each instance of an attribute takes values from a
    specific set called the domain of the column (the
    domain defines the type)

44
Relational Database Model (cont)
  • A relation schema is made up of the name and
    attributes of a relation with their underlying
    domains
  • A database schema is a set of all relation
    schemas.
  • The notions of keys, primary keys, superkeys are
    all as previously described

45
Query Languages
  • a language in which a user requests information
    from the database
  • a higher level language than standard programming
    languages
  • query languages may be procedural or
    non-procedural
  • procedural languages specify a series of
    operations on the database to generate the
    desired result
  • non-procedural languages do not specify how the
    information is generated
  • most commercial relational database systems offer
    a query language that includes procedural and
    non-procedural elements

46
Relational Algebra
  • procedural query language
  • set of operators that map one or more relations
    into another relation
  • closed algebraic system
  • best feature - operations on operations
  • form relational algebraic expressions
  • two types of operations set-theoretic and
    database specific

47
Relational Algebra Operations
  • database specific
  • (horizontal) selection (?)
  • (vertical) projection (?)
  • join
  • outer join
  • semijoin
  • division
  • set operators
  • union
  • difference
  • intersection
  • cartesian (cross) product

48
Example Relations
  • EMP ename salary dept
  • Gary 30K toy
  • Shirley 35K candy
  • Christos 37K shoe
  • Robin 22K toy
  • Uma 30K shoe
  • Tim 12K (null)
  • DEPT dept floor mgr
  • candy 1 Irene
  • toy 2 Jim
  • men 2 John
  • shoe 1 George

49
Database Specific Operators
  • (horizontal) selection (?)
  • picks a subset of the rows
  • (vertical) projection (?)
  • picks a subset of the columns
  • join
  • creates a new relation (table) out of two
  • equijoin (based upon equality of attributes)
  • natural join (equijoin plus projection to
    eliminate duplicated columns)

50
Set Operators
  • union
  • both relations must be union-compatible -- same
    degree and same domains
  • set difference
  • both relations must be union-compatible as above
  • intersection
  • same deal
  • cartesian (cross) product
  • note similarity to join operation join can be
    defined as a cross product followed by a
    selection criteria

51
More Operators
  • rename (?)
  • results of operations in the relational algebra
    do not have names
  • it is often useful to be able to name such
    results for use in further expressions later on
  • conceptually similar to an assignment operator in
    most programming languages
  • semijoin
  • very useful in practical implementation of large
    queries
  • semijoin of R and S is equivalent to the join of
    R and S projected onto the attributes of R.

52
Still More Operators Outer Join
  • outer join is an extension of the join operation
    to deal with missing information
  • three types left outer join, right outer join,
    and full outer join
  • left outer join computes the natural join, then
    takes all tuples (rows) in the left relation that
    did not match on the join attribute and includes
    them in the result, with all attributes of the
    right relation padded with null values
  • right outer join is the same, except non-matching
    tuples in the right relation are included in the
    result padded with null values
  • full outer join includes all non-matching tuples
    of both relations appropriately padded
  • see examples in text, p108-109

53
Still More Operators
  • division
  • R/S given R(A,B) and S(B), then a given tuple t
    is in R/S if for all s in S there exists an r in
    R such that r.Bs.B and t.Ar.A.
  • So tuple t with attribute t.A is in the result if
    and only if R contained tuples (t.A, B1), (t.A,
    B2), (t.A, B3), for every possible value Bi
    contained in S.
  • Note that S must be defined on a subset of the
    attributes of R for the operation to be
    meaningful.

54
A Short Interlude Integrity
  • the preceding slides covered chapter three up to
    section 3.3
  • before attacking chapter 4 (SQL), were going to
    make a brief excursion up to chapter 6, touching
    sections 6.1 - 6.4
  • Integrity constraints attempt to enforce data
    consistency and prevent accidental damage to the
    database during updates
  • Weve already seen two forms of integrity
    constraints
  • key declarations (stipulating that certain
    attributes form a candidate key for a given
    entity set)
  • mapping form of a relationship (one-one,
    one-many, many-many)

55
Integrity Constraints
  • Domain Constraints
  • simplest form of integrity constraint
  • type declarations are one such domain constraint
    (e.g., integer, floating point, double-precision,
    fixed length character string).
  • domains can be further restricted (e.g., check
    clause in SQL can ensure that hourly wages are ?
    4.00 dollars)
  • easily tested whenever a new data item is entered
    into the database
  • extensions like date or currency can be easily
    supported on a strongly typed programming
    language
  • Null values can be useful for values to be filled
    in later, but some attributes may need to be
    specified as not Null (e.g., primary keys
    cannot have a null value)

56
Integrity Constraints (2)
  • Key Constraints
  • keys must have unique values
  • primary key -- a candidate key declared primary
  • unique key -- a candidate key
  • foreign key -- a set of attributes that are a
    primary key for some other relations
  • foreign keys are an important concept because we
    need to treat foreign keys differently from other
    attributes (for example, protecting their
    uniqueness and insuring referential integrity)
    even though they arent a primary key in the
    current relation

57
Referential Integrity
  • We often want to be able to ensure that an
    attribute value in a tuple of a relation appears
    in at least one tuple of another relation. For
    example
  • EMP(eno, ename, salary)
  • DEPT(dno, dname, floor)
  • WORKS-IN(eno, dno, hours)
  • note that eno is a foreign key in WORKS-IN
  • We want the following to be true
  • ?eno(WORKS-IN) ? ?eno(EMP) (every eno is a real
    employee)
  • ?dno(WORKS-IN) ? ?dno(DEPT) (every dno is a real
    department)
  • SQL allows the declaration of domain/key/referenti
    al integrity constraints with the clause check in
    its DDL

58
Referential Integrity SQL DDL Example
  • Create table customer
  • (cust-name char(20) not null,
  • cust-street char(30),
  • cust-city char(30),
  • primary key (cust-name))
  • Create table branch
  • (branch-name char(15) not null,
  • branch-city char(30),
  • assets number,
  • primary key (branch-name),
  • check (assets ?0))
  • Create table account
  • (account-no char(10) not null,
  • branch-name char(15),
  • balance number,
  • primary key (account-no),
  • foreign key (branch-name) references branch,
  • check (balance ?0))
  • Create table depositor
  • (cust-name char(20) not null,
  • account-no char(10) not null,
  • primary key (cust-name),
  • foreign key (cust-name) references customer,
  • foreign key (account-no) references account)

59
Referential Integrity and Database Modifications
  • Database modifications may violate referential
    integrity
  • Insertion inserting a value into the referencing
    relation that is not in the referenced relation
  • Deletion deleting the last example of a given
    value in the referenced relation and leaving that
    value in the referencing one
  • proper handling may lead to cascading deletions
  • Update to the referencing relation (constraints
    as Insertion)
  • Update to the referenced relation (constraints as
    Deletion)

60
Assertions
  • An assertion is an arbitrary expression that the
    database must always satisfy
  • e.g., student GPA 2.8, or sum(all-charges) credit-line
  • Domain constraints and referential integrity
    constraints are special forms of assertion that
    are easy to test
  • SQL supports assertions as follows
  • create assertion check
  • When an assertion is made the system checks it
    for validity. If it is validated, every future
    modification of the database is checked against
    the assertion and allowed only if it is not
    violated.
  • This can be very expensive if assertions are
    complex or numerous

61
Triggers
  • A trigger is a statement that the system executes
    automatically as a side effect of an update to
    the database.
  • A trigger has two parts
  • condition under which it is executed
  • actions to be taken if it is executed
  • Example instead of having an assertion balance
    ?0 for a checking account, use a trigger on
    negative balances that sets the balance to zero
    and creates a new loan for the amount of the
    overdraft
  • Triggers make the system reactive
  • Triggers are also called active rules
  • Like Assertions, Triggers can be very expensive.

62
Trigger Example
  • define trigger overdraft on update of account T
  • (if new T.balance values
  • (T.branch.name, T.account-number,
  • T-customer-name, - new T.balance)
  • update deposit S
  • set S.balance 0
  • where S.account-number T.account-number))
  • (note SQL syntax given here is slightly
    different from that in the text, p235)

63
SQL (Structured Query Language)(Astrahan, Gray,
Lindsay, Selinger, )
  • Most common and influential commercial query
    language well established as the industry
    standard query language for relational databases
  • Developed (as Sequel) at the IBM Research Lab
    in San Jose in the early 70s
  • Four basic commands
  • select
  • insert
  • delete
  • update
  • Result of each query is a relation

64
SQL Example
  • select e.name
  • from emp e
  • where e.age 30
  • e is a tuple variable ranging over the emp
    relation
  • a tuple variable followed by a . and an
    attribute is an indexed tuple variable and
    specifies the corresponding attribute of the
    tuple, very similarly to in many programming
    languages
  • what follows the keyword select is the target
    list
  • what follows from is the tuple variable list and
    consists of a list of relations and variable
    names
  • what follows where is the qualification clause
    an arbitrary boolean expression

65
SQL
  • Basic format of the select command
  • select distinct target_list
  • from tuple_variable_list
  • where qualification
  • order by target_list_subset
  • Semantics
  • evaluate qualification select the subset of the
    cartesian product of the ranges of the tuple
    variables that satisfy the qualification
  • evaluate target list eliminate columns that are
    not in the target list
  • prepare the result as a relation with columns
    according to the target list
  • if distinct is used, eliminate duplicate tuples
  • if order by is used, sort the result accordingly

66
SQL some example queries
  • We will give a number of simple query examples
    using the following relational schema
  • sailors(sid, sname, rating)
  • boats(bid, bname, colour)
  • reserve(sid, bid, date)
  • (1) Find the names of sailors who have reserved
    boat 2
  • select s.sname
  • from sailors s, reserve r
  • where s.sidr.sid and r.bid2

67
SQL example queries (2)
  • (2) Find the names of sailors who have reserved a
    red boat
  • select s.sname
  • from sailors s, reserve r, boats b
  • where s.sidr.sid and r.bidb.bid and
    b.colourred
  • (3) Find the colours of all boats reserved by Pat
  • select b.colour
  • from sailors s, reserve r, boats b
  • where s.snamePat and s.sidr.sid and
    r.bidb.bid

68
SQL example queries (3)
  • (4) Find the names of sailors who have reserved
    at least one boat
  • select s.sname
  • from sailors s, reserve r
  • where s.sidr.sid
  • (5) Find the names of sailors who have reserved a
    red or a green boat
  • select s.sname
  • from sailors s, reserve r, boats b
  • where s.sidr.sid and r.bidb.bid and
  • (b.colourred or b.colourgreen)

69
SQL example queries (4)
  • (6) Find the names of sailors who have reserved a
    red and a green boat
  • select s.sname
  • from sailors s, reserve r, boats b, reserve r2,
    boats b2
  • where s.sidr.sid and r.bidb.bid and
    b.colourred
  • and s.sidr2.sid and r2.bidb2.bid and
  • b2.colourgreen)
  • Note in the above query if sailor Pat has
    reserved one green boat and two red ones, the
    name Pat will appear twice in the results. To
    avoid that, use the keyword distinct in the
    select line, as in
  • select distinct s.sname

70
SQL
  • Basic format of the select command
  • select distinct target_list
  • from tuple_variable_list
  • where qualification
  • order by target_list_subset
  • Simple query examples use this relational schema
  • sailors(sid, sname, rating)
  • boats(bid, bname, colour)
  • reserve(sid, bid, date)

71
SQL target list
  • is an abbreviation for all attributes in the
    from list
  • select
  • from sailors s
  • where order by s.rating
  • Each item in the target list can be as general as
    attribute_name expression, where the expression
    is any arithmetic or string expression over
    indexed tuple variables and constants. It can
    also contain some built-in functions like sqrt,
    sin, mod, etc. as well as aggregates (coming up
    later)

72
SQL target list expression example
  • With rating an integer from 1 to 10, this query
    gives a rating bonus to sailors who sailed two
    different boats on the same day.
  • select s.sid, s.sname, ratings.rating2
  • from sailors s, reserve r, reserve r2
  • where s.sidr.sid and s.sidr2.sid and
    r.dater2.date and
  • r.bid ! r2.bid
  • Whats wrong with the above?
  • What happens if s.rating 9 before this query?
  • Domain constraints might take care of this, but
    we need to be careful

73
SQL
  • Qualifications each item in a qualification
    (where clause) can be as general as
    expressionexpression
  • Example
  • select name1 s1.sname, name2 s2.sname
  • from sailors s1, sailors s2
  • where 2s1.rating s2.rating-1

74
SQL
  • Further elaboration
  • tuple variables can be implicit if the system can
    figure out which relation each attribute belongs
    to
  • table names can be used as tuple variables
  • Example find names, ages, and departments of
    employees who are over 40 and work on the first
    floor.
  • select ename, age, emp.dname
  • from emp, dept
  • where age40 and floor1 and emp.dnamedept.dname

75
SQL
  • SQL provides set operators union, intersect, and
    minus
  • Example find the names of employees who work in
    the toy department and make at most 60K
  • (select ename
  • from emp
  • where dnametoy)
  • minus
  • (select ename
  • from emp
  • where sal60K)

76
SQL
  • Note that it is usually possible to phrase a
    single query in multiple ways. The previous
    query could have also been written
  • (select ename
  • from emp
  • where dnametoy)
  • intersect
  • (select ename
  • from emp
  • where sal?60K)

77
SQL
  • Or also (even simpler)
  • select ename
  • from emp
  • where dnametoy and sal?60K
  • Writing a query in different ways will usually
    change how efficient the query is -- the above
    query is very likely to be faster than the
    example using intersects, and that one is likely
    to be faster than the one using minus.

78
SQL
  • SQL also provides set operators contains (a set
    being a superset of another) and exists (a set
    not being empty). Both return Boolean results,
    so may be negated (using not).

79
SQL
  • Example find the names of employees who manage
    all the departments on the first floor.
  • select mgr
  • from dept d1
  • where (select d2.dname
  • from dept d2
  • where d1.mgrd2.mgr)
  • contains
  • (select dname
  • from dept
  • where floor1)

80
SQL
  • SQL allows nested queries using the keyword in
  • Example find the names of employees who work on
    the first floor.
  • select ename
  • from emp
  • where dname in
  • (select dname
  • from dept
  • where floor 1)
  • The same query in flat form is
  • select dname
  • from emp, dept
  • where emp.dnamedept.dname and floor1

81
SQL
  • The connective in tests for set membership.
    Similar connectives are
  • not in (set non membership)
  • op any (op relationship with some tuple in the
    set)
  • op all (op relationship with all tuples in the
    set)
  • where op is one of (, !, , )
  • Example find the names of employees who make
    more than everybody on the first floor.
  • select ename
  • from emp
  • where sal all
  • (select sal
  • from emp, dept
  • where emp.dnamedept.dname and floor 1)

82
SQL
  • Scoping of variables works exactly as in Pascal
    or C
  • Example find the names of students who take a
    course from their advisor.
  • select sname
  • from student
  • where s in
  • (select s
  • from enroll
  • where c in
  • (select c
  • from class
  • where profstudent.advisor))

83
Recap SQL
  • Four basic commands
  • select
  • insert
  • delete
  • update

84
SQL Insert
  • Insert command format
  • insert into relation_name values (value_list)
  • or
  • insert into relation_name select_statement
  • Semantics of insert
  • format one add the tuple corresponding to
    value_list into relation_name
  • format two execute the select statement, then
    add all the resulting tuples into relation_name
  • Example
  • insert into student values (1, Carey, CS,
    Stonebraker)

85
SQL Insert
  • Example relation register(S, name, paid)
  • in which registered students are recorded. After
    the end of registration week, we execute
  • insert into student
  • select r.s, r.name
  • from register r
  • where r.paidyes

86
SQL Delete
  • Delete command format
  • delete relation_name where qualification
  • Semantics of delete execute the corresponding
    select command
  • select full_target_list (or )
  • from relation_name
  • where qualification
  • and then remove the resulting tuples from
    relation_name

87
SQL Delete
  • Example with the following schema
  • student(s, name, major, advisor)
  • enroll(s, c, grade)
  • course(c, dept)
  • The following command expels CS majors who
    received a grade of less than 2.5 in a CS course
  • delete student
  • where majorCS and s in
  • (select s
  • from enroll, course
  • where enroll.sstudent.s and grade
  • and enroll.ccourse.c and deptCS)

88
SQL Update
  • Update format
  • update relation_name
  • set target_list
  • where qualification
  • Semantics of update it is equivalent to
    executing
  • insert into del_temp
  • select
  • from relation_name
  • where qualification

89
SQL Update
  • Semantics of update (cont) then executing
  • insert into app_temp
  • select ext_target_list
  • from relation_name
  • where qualification
  • delete the tuples in del_temp from relation_name
  • add the tuples in app_temp to relation_name

Ext_target_list is identical to target_list in
the original update command, but augmented with
tuple_variable.attribute_name for all attributes
of the range of tuple_variable that dont appear
in target_list.
90
SQL Update
  • Example give a 10 grade raise to every CS major
    in CS564
  • update enroll
  • set grade1.1grade
  • where cCS564 and s in
  • (select s
  • from student
  • where majorCS)

91
SQL Update
  • Which is equivalent to
  • insert into del_temp
  • select s, c, grade
  • from enroll
  • where cCS564 and s in
  • (select s
  • from student
  • where majorCS)
  • insert into app_temp
  • select s, c, grade1.1grade
  • from enroll
  • where cCS564 and s in
  • (select s
  • from student
  • where majorCS)

92
SQL Aggregates
  • Aggregate functions are functions that take a
    collection of values as input and return a single
    value. SQL supports five built-in aggregate
    functions
  • average avg
  • minimum min
  • maximum max
  • total sum
  • cardinality count
  • using distinct to aggregate only unique values is
    often important with avg, sum, and count

93
SQL Aggregates
  • Example find the number of students
  • select num_of_students count(s)
  • from student
  • why do we not need to use distinct in this
    example?
  • Example find the number of employee records
  • select count ()
  • from emp
  • if an employee appears more than once in the emp
    relation, for example if he had switched jobs or
    had two jobs, then this command would count that
    employee once for each record

94
SQL Aggregates
  • Qualified Aggregates
  • Example find the average age of employees in the
    toy department
  • select avg(age)
  • from emp
  • where dnametoy

95
SQL Group By clause
  • Group aggregates groups of tuples are computed
    using the group by clause
  • the attributes given in the clause are used to
    form groups
  • typles with the same value on all attributes in
    the clause are placed in one group
  • Example in each department, find the minimum age
    of employees who make more than 50K
  • select dname, min(age)
  • from emp
  • where sal50K
  • group by dname

96
SQL Having clause
  • Sometimes it is useful to state a condition that
    applies to groups in group by rather than to
    tuples. We do that in SQL with the having
    clause. SQL applies predicates of having after
    groups have been formed.
  • Example find the average salary for employees
    under 30 for each department having more than 10
    such employees
  • select dname, avg(sal)
  • from emp
  • where age
  • group by dname
  • having count()10

97
SQL Multiple Group Bys
  • Example using relation emp(ss, ename, dept,
    cat, sal)
  • Count the employees and average monthly salary
    for each employee category in each department
  • select dept, cat, count(), avg(sal)/12
  • from emp
  • group by dept, cat

98
SQL Multiple Group Bys
  • Select from emp
  • group by cat
  • Select from emp
  • group by dept

99
SQL Multiple Group By
  • Select from emp
  • group by dname, cat
  • note that some dname/cat groups are empty.

100
SQL Examples on Having
  • Find the average salary of employees under 30 for
    each department with more than 10 such employees
  • select dname, avg(sal)
  • from emp
  • where age
  • group by dname (group by department)
  • having 10 10)

101
SQL Examples on Having
  • Find the average salary of employees under 30 for
    each department with more than 10 employees
  • select e.dname, avg(e.sal)
  • from emp e
  • where e.age
  • group by e.dname (group by department)
  • having 10
  • (select count(ee.ename) (number of employees
    in group)
  • from emp ee
  • where ee.dnamee.dname) ( from the same dept
    as e)
  • (why is this query different from the previous
    one?)

102
SQL Examples on Having
  • Find categories of employees whose average salary
    exceeds that of programmers
  • select cat, avg(sal)
  • from emp
  • group by cat
  • having avg(sal) (select avg(sal)
  • from emp
  • where catprogrammer)

103
SQL Examples on Having
  • Find all departments with at least two clerks
  • select dname
  • from emp
  • where jobclerk
  • group by dname
  • having count() 2

104
SQL Examples
  • Find the names of sailors with the highest rating
  • select sname
  • from sailors
  • where rating (select max(rating)
  • from sailors)

105
SQL Examples
  • For each boat, find the number of sailors of
    rating 7 who have reserved this boat
  • select bid, bname, count(s.sid)
  • from sailors s, boats b, reserve r
  • where s.sidr.sid and r.bidb.bid and rating7
  • group by b.bid

106
SQL Examples
  • For each red boat, find the number of sailors who
    have reserved this boat
  • select bid, bname, count(s.sid)
  • from sailors s, boats b, reserve r
  • where s.sidr.sid and r.bidb.bid
  • group by b.bid
  • having colourred

107
SQL Examples
  • Difference between the last two queries?
  • First one gave a qualification on the tuples
  • (take all tuples of the multijoin
  • discard tuples that do not fulfill ratings7
  • then group them by boat id
  • then find the cardinality of each group)
  • Second one gave a qualification for the groups
  • (take all tuples of the multijoin
  • group them by boat id
  • discard groups representing boats that are
    non-red
  • find the cardinality of remaining groups)

108
And Now, For SomethingCompletely Different...
  • The recent SQL material largely covers chapter 4,
    at least sections 4.1 through 4.6 and some of
    4.9.
  • Earlier we examined Relational Algebra, covering
    sections 3.1 through 3.3
  • Now we leave chapter 4 and head back to examine
    sections 3.6 and 3.7, covering Relational Calculi
  • based upon predicate calculus
  • non-procedural query languages (descriptive
    rather than prescriptive)
  • we will examine two relational calculi tuple
    calculus and domain calculus

109
Tuple Calculus
  • Query t P(t) P is a predicate associated
    with some relation R
  • t is a tuple variable ranging over the
    relation R
  • tA is the value of attribute A in tuple t
  • students in CMSC 424
  • t t ? enroll ? tcourse CMSC424
  • students in CMSC 424 conforming with the
    CMSC-420 prerequisite
  • t t ? enroll ? ? s ? enroll ? tcourse
    CMSC424 ?
  • scourse CMSC420 ? tss sss

110
Tuple Calculus
  • Quantifiers and free variables
  • ?, ? quantify the variables following them,
    binding them to some value. (in the previous
    slide, s was bound by ?)
  • A tuple variable that is not quantified by ? or ?
    is called a free variable. (in the previous
    slide, t was a free variable)
  • Atoms
  • R(t) where t is a tuple variable
  • tx ? sy where t,s are tuple variables and
  • ? ? ?, ?, ?, ?, ?, ?

111
Tuple Calculus
  • Formulas
  • an Atom is a Formula
  • If P and Q are Formulas, then so are (P), ?P,
    P?Q, P?Q, and P?Q
  • If P(t) is a Formula, then so are ?t P(t) and ?t
    P(t)
  • Equivalences
  • ?(P ? Q) ? ?P ? ? Q
  • ?(P ? Q) ? ?P ? ? Q
  • ?t P(t) ? ? (?t (? P(t)))
  • ? t P(t) ? ? (? t (? P(t)))

112
Tuple Calculus
  • Safety
  • Math is too powerful we can easily phrase
    expressions that describe infinite sets
  • t t ? enroll
  • These expressions are called unsafe
  • When we are dealing with finite sets, unsafe
    expressions happen in expressions that involve
    negation (?)
  • We can avoid this problem by using an entirely
    positive (non-negated) scope as the first operand
    of any conjunction where we use negation. The
    first operand establishes the scope and the
    second one filters the established scope.
  • t t ? enroll ? tcourse ? CMSC-420

113
Domain Calculus
  • Another form of relational calculus
  • Uses domain variables that take values from an
    attributes domain, rather than values
    representing an entire tuple
  • Closely related to tuple calculus
  • Domain Calculus serves as the theoretical basis
    for the query language QBE, just as the
    relational algebra we examined earlier forms the
    basis for SQL
  • Expressions are of the form
  • P( x1, x2, x3, ..., xn)

114
Domain Calculus
  • Atoms
  • ? R
  • x ? y where x,y are domain variables and
  • ? ? ?, ?, ?, ?, ?, ?
  • x ? c where c is a constant
  • Formulas
  • an atom is a formula
  • If P and Q are formulas, then so are (P), ?P,
    P?Q, P?Q, and P?Q
  • If P(x) is a formula and x is a domain variable,
    then ?x P(x) and ?x P(x) are also formulas

115
Domain Calculus
  • Queries are of the form
  • P( x1, x2, x3, ..., xn)
  • Examples
  • Enroll(ss, course,
    semester)
  • Enroll(x, y, z) ? y CMSC-424

116
Reductions of Relational Algebra and Calculi
  • Relational Algebra (sections 3.2-3.5), Tuple
    Calculus (section 3.6), and Domain Calculus
    (section 3.7) can be reduced to each other they
    have equivalent expressive power. For every
    expression in one, we can compute an equivalent
    expression in the others.

117
Functional Dependencies
  • Important concept in differentiating good
    database designs from bad ones
  • FD is a generalization of the notion of keys
  • An FD is a set of attributes whose values
    uniquely determine the values of the remaining
    attributes.
  • Emp(eno, ename, sal) key FDs eno ename
  • Dept(dno, dname, floor) eno sal
  • Works-in(eno, dno, hours) (eno,dno) hours
  • dno dname
  • dno floor

118
Functional Dependencies
  • If ? ? R and ? ? R, then ? ? holds in the
    extension r(R) of R iff for any pair t1 and t2
    tuples of r(R) such that t1(?)t2(?) , then it is
    also true that t1(?) t2(?)
  • We can use FDs as
  • constraints we wish to enforce (e.g., keys)
  • for checking to see if the FDs are satisfied
    within the database
  • R( A B C D)
  • 1 1 1 1 A B satisfied? no
  • 1 2 1 2 A C satisfied? yes
  • 2 2 2 2 C A satisfied? no
  • 2 3 2 3 AB D satisfied? yes
  • 3 3 2 4

119
Functional Dependencies
  • Trivial dependencies ? ?
  • ? ? if ? ? ?
  • Closure
  • we need to consider all FDs
  • some are implied by others e.g., FDs are
    transitive if AB and BC, then AC
  • Given F set of FDs, we want to find F (the
    closure of all FDs logically implied by F)

120
Armstrongs Axioms
  • Reflexivity if ? ? ? then ? ?
  • Augmentation if ? ? then ?? ? ??
  • Transitivity if ? ? and ? ? then ? ?
  • Armstrongs Axiom
Write a Comment
User Comments (0)
About PowerShow.com