Database Management Systems INFS 614-001 Fall 08 - PowerPoint PPT Presentation

About This Presentation
Title:

Database Management Systems INFS 614-001 Fall 08

Description:

Hard to extract information (print Christmas cards) ... 1 Christmas card per residence! ... Can't send Christmas card to somebody without an address ... – PowerPoint PPT presentation

Number of Views:121
Avg rating:3.0/5.0
Slides: 51
Provided by: csG6
Learn more at: http://cs.gmu.edu
Category:

less

Transcript and Presenter's Notes

Title: Database Management Systems INFS 614-001 Fall 08


1
Database Management SystemsINFS 614-001Fall 08
Lecture One - Introduction
  • Instructor Carlotta Domeniconicarlotta_at_cs.gmu.ed
    u
  • http//www.cs.gmu.edu/carlotta/teaching/INFS-614-
    s08/info.html

2
Outline
  • Course syllabus
  • Course Schedule
  • Homeworks exams
  • Satisfaction of prerequisites
  • Strictly enforced GMU HONOR CODE applies!
  • Introduction to DB DBMS
  • Outline of the entire course material

3
Front matters
  • To communicate with me
  • Email carlotta_at_cs.gmu.edu, I will try to reply
    promptly.
  • Office hours by appointment.
  • Sign up for your Mason account. You may forward
    all your Mason emails to your favorite email
    address.

4
Front matters (cont.)
  • GTA
  • Huaming Liu and Chun-Kit Ngan
  • Email hliu5_at_gmu.edu
  • Office hours Tuesday, 4-6pm (Huaming)
  • Room 330 -ST2.

5
Front matters (cont.)
  • Required textbooks
  • Database Management Systems, 3rd ed. by Raghu
    Ramakrishnan Johannes Gehrke, McGraw-Hill.
  • Oracle 9i Programming a Primer, by Rajshkhar
    Sunderraman, Addison Wesley, ISBN 0-321-19498-5
  • On-Line Course Resources
  • http//www.cs.gmu.edu/carlotta/teaching/INFS-614-
    f08/info.html
  • You are required to read all the material there.
    The content will be updated frequently. So check
    the web site periodically, at least once every
    week, and every time before class!

6
Date Topic (chapter/section) HW assignment HW due
Aug 28 Introduction (chapter 1)    
Sep 4 ER model (chapter 2) 1  
Sep 11 Relational Model (chapter 3)     
Sep 18 Relational Algebra (sections 4.1-4.2) 2 1
Sep 25 Relational Algebra (continued)    
Oct 2 SQL (sections 3.4, 5.1-5.5) 3 2
Oct 9 Review    3  
Oct 16 Midterm Exam
Oct 23 SQL (sections 5.6-5.15) 4
Oct 30 Functional Dependencies  
Nov 6 Functional Dependencies (continued)  5 4 
Nov 13 Decomposition and Normal Forms   
Nov 20 Advanced topics (or catch up)   5 
Dec 4 Review
Dec 11 Final Exam    
7
Submission and Grading
  • Late submissions are not accepted
  • no exceptions!
  • No make-up exams!
  • On-time within 5 minutes after the class begins.
  • Important your homeworks must run properly under
    the Oracle system in the labs.
  • Final grades
  • homework assignments (20)
  • Project (15)
  • midterm exam (25)
  • final exam (40)

8
Honor Code System
  • GMU honor Codehttp//honorcode.gmu.edu/
  • For this class
  • Homeworks exams require individual work. Study
    groups are encouraged, but homeworks solutions
    and write up must be individual.
  • Exams individual effort, closed books
  • Satisfaction of prerequisites
  • Honor code invoked.

9
Satisfaction of prerequisites
  • Prerequisites (strictly enforced)
  • INFS-501 (Discrete mathematics)
  • INFS-515 (Computer architecture/organization)
  • INFS-590 (Program Design and data structures)
  • Specifically
  • Good background in discrete mathematics (e.g.,
    set theory, mathematical logic, relations and
    functions)
  • Programming (good knowledge of either C, C or
    Java)
  • Data structures and algorithms, Computer
    architecture, and Operating systems.

10
Satisfaction of prerequisites
  • For INFS/SWE/ISA students
  • Consult your letter of acceptance. It specifies
    your status with respect to these foundation
    courses. For each course, it must be that either
  • You were waved from the course (the evidence
    should be either in the acceptance letter or in a
    subsequent official document).
  • You took the course and received a grade of B or
    better.

11
Satisfaction of prerequisites
  • For non-IS/SWE/ISA students, MUST DO THE
    FOLLOWING (by next week)
  • Consult the description of each of the
    prerequisite courses in the university catalog.
  • For each course, provide a list of one or more
    courses taken, that cover the subjects of that
    course, as follows course-number, course-title,
    institution, year, final-grade
  • Syllabus of each course taken
  • Copy of transcripts that shows equivalent classes
    taken (with grade B or better)
  • Current status

12
Useful links for your computing needs
  • http//labs.ite.gmu.edu/ (click on FAQs) for
    ITE computing labs, ITE cluster account, and
    Oracle DBMS information.
  • http//cs.gmu.edu/ami/teaching/infs614/current/or
    acle.html
  • for information on our particular computing
    environment.

13
Introduction to DB and DBMS
14
What is a Database?
  • Database A very large, integrated collection of
    data.
  • Data Known facts about the real-world that can
    be recorded and have implicit meaning
  • A database models real-world scenarios
  • Entities
  • Relationships between entities

15
University Database
  • Information about university environment
  • Relationships
  • Students enrollment course
  • Faculty teaching courses
  • Use of classroom for course
  • Prerequisite courses
  • Entities
  • Students
  • Faculty
  • Courses
  • Classrooms

16
What is a DBMS?
  • A Database Management System (DBMS) is a software
    package designed to store, provide access and
    manage databases
  • One DBMS, many databases
  • Database System A database and a DBMS.
  • Why use a DBMS?

17
A simple problem address list
  • Solution 1 a blank notebook, entries recorded
    with a pen, in time order.
  • Advantages simple, private, reliable, space
    efficient.
  • Disadvantages
  • Hard to search
  • Hard to add information (e.g., e-mail)
  • Hard to update information
  • Hard to extract information (print Christmas
    cards)
  • Integrity and consistency (Mary Jones see P.
    Jones address, P. Jones-Smith entry)
  • Loosing it is a catastrophe!

18
Solution 2 a loose-leaf notebook with n entries
per page
  • Better
  • Can keep it sorted by key
  • Insertions deletions can be done
  • Same as Sol. 1 in other aspects
  • No search by other keys (e.g., phone number).

19
Solution 3 Text format,managed by text editor
  • Advantages
  • Free format
  • Unlimited size
  • Easily copied (for backup)
  • Easily shared
  • Sub-string searchable
  • Easy Update.
  • Disadvantages
  • Change requirements?

20
Complications with Solution 3
  • File gets very large
  • Search gets slow and possibly imprecise.
  • (E.g., search for Elm Street may yield
  • Wilhelm Street)
  • Solution structure entries into records with
    fields and add indexes over fields.

21
Complication 2 need to separate families from
addresses
  • Why?
  • People move
  • Might forget to update addresses
  • Want space economy single point of update
  • Important to separate for applications 1
    Christmas card per residence!
  • Solution two files (one people, one
    addresses). How do we link them? How many
    residences a person can have?

22
Complication 3 multiple association
  • People own, rent, manage residences
  • May want to impose constraints in the number of
    residences per person or vice versa.
  • Examples Many to many (rich people) Many to one
    (single family) One to many (Builder) One to
    one (legal residence)

23
Complication 4 dynamic nature of the data
  • Add new information
  • Cards sent and received
  • Zip4
  • Requirements
  • Adding fields
  • Summarizing

24
Complication 5 Ad hoc analysis and retrieval
  • Example
  • Find who sent me cards over the past 5 years, but
    received less than 3 cards from me.
  • Requirements
  • A language
  • An implementation of retrieval functions (correct
    and efficient).

25
Complication 6 Sharing
  • Different users, different organizations
  • Other family members want to see names and
    addresses together
  • You dont want to give update access over your
    business contacts to anybody.
  • Solutions
  • Use stored queries as windows or views over
    the database.
  • Ability to reunite data from different files.
  • Data not selected by the query is not there
  • Permissions

26
Complication 7 Required existence of associated
data
  • Examples
  • Cant send Christmas card to somebody without an
    address
  • Names are not unique only when associated with
    residence.
  • Solution
  • Dont insert a name if there is no address and
    vice versa
  • Or tolerate multiple non-unique names

27
Complication 8 Multiple updates in an all or
none basis
  • Examples
  • Two households merge (marriage)
  • Need to change residences (or other data) for a
    group of people
  • Computer crashes in the middle of updates
  • Solution
  • Illusion of updates being done simultaneously
  • Commit or rollback an entire chunk of work

28
Complication 9 computer crashes
  • Will I have my data after the crash?
  • Uncorrupted?
  • Consistent?
  • Solution
  • Make sure data is available uncorrupted at a
    point in the past (checkpoint)

29
Complication 10 multimedia
  • Pictures, Audio, Text,
  • Requirements
  • Ability to store new data types
  • Content search
  • Integration with text and numeric data

30
Complication 11 You become President!
  • Of something (US, Corporation, Local chapter of
    charity, your household)
  • Your address list grows exponentially
  • You realize some of the information is useful!
  • Examples
  • zip codes in states where there are less than 5
    difference in Rep./Dem. Votes in 2004?
  • Which combinations of products sold best last
    year?

31
Files vs. DBMS
  • Application must store large datasets between
    main memory and secondary storage (e.g.,
    buffering, page-oriented access, etc.)
  • Special code to answer different queries
  • Must protect data from inconsistency due to
    multiple concurrent users
  • Crash recovery
  • Security and access control.

32
Why Use a DBMS?
  • Easier and More Efficient
  • Data independence and efficient access
  • Reduced application development time
  • Data integrity and security
  • Uniform data administration
  • Concurrent access, recovery from crashes.

33
Data Models
  • A data model is a collection of concepts for
    describing data.
  • A schema is a description of a particular
    collection of data, using a given data model.
  • The relational model of data is the most widely
    used model today.
  • Main concept relation, basically a table with
    rows and columns.
  • Every relation has a schema, which describes the
    columns, or fields.

34
Relational Model
  • The main concept is a relation
  • A table with rows and columns
  • Each row in the table is called a tuple

35
Relational Model (cont.)
  • The relation schema specifies
  • name of the relation,
  • name of each attribute (column,field) and its
    type. Every attribute has an atomic type.
  • Relation Name
  • Student(sidstring, loginstring, ageinteger,
    gpareal)
  • Attribute Name
  • A Relation (Relation instance) a set of
    tuples.

36
Levels of Abstraction
  • Many views, single conceptual (logical) schema
    and physical schema.
  • Views describe how users see the data.
  • Conceptual schema defines logical structure
  • Physical schema describes the files and indexes
    used.

View 1
View 2
View 3
Conceptual Schema
Physical Schema
  • Schemas are defined using DDL (Data Definition
    Language)
  • Data is modified/queried using DML (Data
    Manipulation Language).

37
Levels of Abstraction
  • Conceptual Schema the data is described
    through the data model. It describes structure
    and constraints for the whole database.
  • External Schema how the users see and use the
    data. Many views of the data.
  • Physical schema describes the physical
    structure of the DB
  • Mappings among schema levels are also needed.
    Programs and applications refer to an external
    schema, and are mapped by the DBMS to the
    conceptual schema for execution.

Conceptual, External Schemas are defined using
Data Definition Language (DDL) specification
for defining the database schema
38
Example University Database
  • Conceptual schema
  • Student (sid string, name string, login
    string, age integer, gpa real)
  • Courses (cid string, cname string, credits
    integer)
  • Enrolled (sid string, cid string, grade
    string)
  • Physical schema
  • Relations stored as unordered files.
  • Index on first column of Students
  • External schema (View)
  • Course_info (cid string, enrollment integer)

39
Data Independence
  • Applications insulated from how data is
    structured and stored.
  • Logical data independence Protection from
    changes in logical structure of data.
  • Physical data independence Protection from
    changes in physical structure of data.
  • One of the most important benefits of using a
    DBMS!

40
Easy Manipulation Access of the Information
  • A DBMS enables users to create , update and query
    the data through Data Languages
  • Data Definition Language (DDL)
  • Specification notation to create the Database
    schema
  • Data Manipulation Language (DML)
  • A language for manipulating the data updating
    the data and accessing the data
  • The portion of a DML that allows to access the
    information through formulating queries is called
    the Query Language
  • Query Request for retrieving data stored in a
    DBMS.

41
Concurrency Control
  • Concurrent execution of user programs is
    essential for good DBMS performance
  • Because disk accesses are frequent, and
    relatively slow, it is important to keep the cpu
    not idle by working on several user programs
    concurrently.
  • Interleaving actions of different user programs
    can lead to inconsistency e.g., check is cleared
    while account balance is being computed.
  • DBMS ensures such problems dont arise users can
    pretend they are using a single-user system.

42
Concurrency Control (cont.)
  • Example
  • One course still has space for one more student.
  • Two students are trying to enroll in that course
    at the same time.
  • The DBMS executes the two requests in a serial
    order.
  • Thus, only one student will be enrolled.

43
Transaction
  • An execution of a DB program
  • Key concept is transaction, which is an atomic
    sequence of database actions (reads/writes).
  • ACID properties
  • A Atomicity
  • C Consistency
  • I Isolation
  • D Durability
  • How log and concurrency control sub-system

44
Scheduling Concurrent Transactions
  • DBMS ensures that execution of T1,, Tn is
    equivalent to some serial execution T1Tn.
  • Before reading/writing an object, a transaction
    requests a lock on the object, and waits till the
    DBMS gives it the lock. All locks are released at
    the end of the transaction. (Strict 2PL locking
    protocol.)
  • Idea If an action of Ti (say, writing X) affects
    Tj (e.g., reads X), one of them, say Ti, will
    obtain the lock on X first and Tj is forced to
    wait until Ti completes this effectively orders
    the transactions.

45
Ensuring Atomicity
  • DBMS ensures atomicity (all-or-nothing property)
    even if system crashes in the middle of a
    transaction.
  • Idea Keep a log (history) of all actions carried
    out by the DBMS while executing a set of
    transactions
  • Before a change is made to the database, the
    corresponding log entry is forced to a safe
    location. (WAL Write-Ahead Log protocol)
  • After a crash, the effects of partially executed
    transactions are undone using the log.

46
The Log
  • The following actions are recorded in the log
  • Ti writes an object the old and new value.
  • Log record must go to disk before the changed
    page!
  • Ti commits/aborts a log record indicating this
    action.
  • Log records chained together by the transaction
    id, so its easy to undo a specific transaction
    (e.g., resolve a deadlock)
  • All log related activities are handled
    transparently by the DBMS.

47
Structure of a DBMS
  • A typical DBMS has a layered architecture
  • Each layer is composed of several modules
  • The architecture varies from vendor to vendor

48
Main cost of using a DBMS
  • High initial investment and possible need for
    additional hardware.
  • Overhead for providing generality, security,
    recovery, integrity and concurrency control.

When a DBMS may be unnecessary
  • If the Database and application are simple,
    well-defined and not expected to change.
  • If there are stringent real-time requirements,
    that may not be met due to DBMS overhead.
  • If access to data by multiple users is not
    required.

49
Database Users
  • End users (or DB application users)
  • DB application programmers (more precisely, they
    are DBMS users)
  • E.g. smart webmasters
  • This course is mostly to learn how to (start to)
    be a DB application programmer.
  • Database administrator (DBA)
  • Designs logical /physical schemas
  • Handles security and authorization
  • Data availability, crash recovery
  • Database tuning as needs evolve

Must understand how a DBMS works!
50
Summary
  • DBMS used to maintain, query large datasets.
  • Benefits include recovery from system crashes,
    concurrent access, quick application development,
    data integrity and security.
  • Levels of abstraction give data independence.
  • We will learn how to
  • Set up a database
  • Design (ERD and Relational Model), and refine
    (Relational Normalization Theory)
  • Use to query the database
  • Relational Algebra and SQL
Write a Comment
User Comments (0)
About PowerShow.com