CS 542 - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

CS 542

Description:

Title: 91.573 Database 1 Author: kajal Last modified by: Kajal Claypool Created Date: 1/28/2002 12:56:46 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Number of Views:40
Avg rating:3.0/5.0
Slides: 35
Provided by: kajal
Learn more at: http://web.cs.wpi.edu
Category:
Tags: schema

less

Transcript and Presenter's Notes

Title: CS 542


1
CS 542 Database Management Systems Summer 2004
  • Kajal Claypool
  • kajal_at_cs.wpi.edu

2
Administrative Stuff
  • Class web page
  • http//www.cs.wpi.edu/kajal/courses/cs542-S04/ind
    ex.html
  • Collaboration Policy
  • Homework assignments are individual efforts
  • Copying from Web, friends, old assignments, any
    source gt cheating!
  • Copying -gt taking anything verbatim, finding
    the main idea and using it.
  • 1st cheating offense -gt F for assignment
  • 2nd cheating offense -gt F for the course

3
Introduction to Database Systems
  • An Overview of DBMS
  • Chapter 1

4
Outline
  • Purpose of Database Systems
  • View of Data
  • Data Models
  • Data Definition Language
  • Data Manipulation Language
  • Transaction Management
  • Storage Management
  • Database Administrator
  • Database users
  • Overall System Architecture

5
Why study Databases?
  • Databases are everywhere
  • Banking all transactions
  • Airlines reservations, schedules
  • Universities registration, grades
  • Sales customers, products, purchases
  • Manufacturing production, inventory, orders,
    supply chain
  • Human resources employee records, salaries, tax
    deductions
  • DBMS brings together many different CS areas
  • OS, Algorithms, AI, logic, languages, multimedia
  • DBAs make a lot of money!

6
What is a DBMS?
  • DBMS Database Management System
  • Collection of interrelated data
  • Set of sophisticated programs to access that data
  • Database Collection of data
  • Usually information relevant to an enterprise
  • Models the real world
  • Entities (students and courses)
  • Relationship between entities (a student takes a
    course)

7
Why a DBMS?
  • In the early days, database applications were
    built on top of file systems
  • Drawbacks of using file systems to store data
  • Data redundancy and inconsistency
  • Format of one file may be different from format
    of another file!
  • Difficulty in accessing data
  • Need special programs to now generate a list of
    all customers in a particular postal-code area
  • Data isolation
  • Because data may be scattered over various files,
    it becomes hard for the programmer to grab the
    information from all locations

8
Problems (contd.)
  • Integrity problems
  • Integrity constraints (e.g. account balance gt 0)
    become part of program code
  • Hard to add new constraints or change existing
    ones
  • Example Bank decides that savings balance must
    always be greater than 50.
  • Atomicity
  • Certain actions must be treated as one. This is
    hard to do if data is scattered over many files.
  • Example if you withdraw money, the bank must
    withdraw and then update the balance as one
    operation
  • Concurrent Access
  • Often for efficiency you may allow many users to
    access the files at the same time. But you must
    guarantee that my money does not go into your
    account! This is hard with files.
  • Security
  • Not every user should be able to see everything!
    Hard to control with files.

9
A DBMS
  • Database systems offer solutions to all the above
    problems
  • provide uniform structures for storage of
    information
  • provide mechanisms for manipulating the
    information
  • ensure the safety of the information stored
    despite system crashes or attempts at
    unauthorized access
  • avoid anomalous results if data is to be shared
    among many users
  • Their Goal do all of this in a way that is both
    convenient and efficient.

10
Data Storage
11
Information Storage
  • Remember convenience and efficient are two
    requirements for a DBMS system
  • To efficiently retrieve data , data is often
    stored as complex data structures. However, these
    complex data structures adversely effect the
    requirement convenience of database system
    users
  • Solution Data Abstraction
  • Simplify the users interaction with the DBMS

12
Data Abstraction
  • Physical Level
  • Lowest level
  • Describes how data (a record) is actually
    stored
  • Logical Level
  • Next higher level
  • Describes what data are stored in the database,
    and what relationships exist between those data
  • Logical level user does not need to be aware of
    the complexity of the physical level structures
  • View Level
  • Highest level of abstraction
  • Only part of the entire database is visible,
    I.e., information that a user may need to see
  • Also provides a security mechanism
  • Example a teller in a bank does not need to see
    the salaries of all bank employees!

13
View of Data
An architecture for a database system
14
Data Abstraction (contd.)
  • Example a record in Pascal
  • type customer record name string street
    string city integerend
  • Physical level
  • Stored as contiguous blocks
  • Logical level
  • As shown above
  • View level
  • A subset
  • type customer record name string
  • end

15
Instances and Schemas
  • Concept is similar to types and variables in
    programming languages
  • Schema the logical structure of the database
  • e.g., the database consists of information about
    a set of customers and accounts and the
    relationship between them
  • Analogous to type information of a variable in a
    program
  • Physical schema database design at the physical
    level
  • Logical schema database design at the logical
    level
  • Instance the actual content of the database at
    a particular point in time
  • Analogous to the value of a variable
  • Physical Data Independence the ability to
    modify the physical schema without changing the
    logical schema
  • Applications depend on the logical schema
  • In general, the interfaces between the various
    levels and components should be well defined so
    that changes in some parts do not seriously
    influence others.

16
Data Models
  • A collection of concepts for describing
  • data
  • data relationships
  • data semantics
  • data constraints
  • Examples of Data Models
  • Entity-Relationship model
  • Relational model
  • Object-oriented model
  • Semi-structured data models (XML)
  • Older models network model and hierarchical model

17
Entity-Relationship Model
  • E-R model of real world
  • Entities (objects)
  • E.g. customers, accounts, bank branch
  • Relationships between entities
  • E.g. Account A-101 is held by customer Johnson
  • Relationship set depositor associates customers
    with accounts
  • Widely used for database design
  • Database design in E-R model usually converted to
    design in the relational model (coming up next)
    which is used for storage and processing

18
Relational Model
Attributes
  • Example of tabular data in the relational model

customer- street
customer- city
account- number
customer- name
Customer-id
Johnson Smith Johnson Jones Smith
192-83-7465 019-28-3746 192-83-7465 321-12-3123
019-28-3746
Alma North Alma Main North
Palo Alto Rye Palo Alto Harrison Rye
A-101 A-215 A-201 A-217 A-201
19
A Sample Relational Database
20
Data Manipulation
21
Data Definition Language (DDL)
  • Specification notation for defining the database
    schema
  • E.g. create table account (
    account-number char(10), balance
    integer)
  • DDL compiler generates a set of tables stored in
    a data dictionary
  • Data dictionary contains metadata (i.e., data
    about data)
  • e.g. schema of a table is metadata
  • Data storage and definition language
  • language in which the storage structure and
    access methods used by the database system are
    specified
  • Usually an extension of the data definition
    language

22
Data Manipulation Language (DML)
  • Language for accessing and manipulating the data
    organized by the appropriate data model
  • DML also known as query language
  • It can
  • Retrieve information stored in the database
  • Insert new information
  • Delete existing information
  • Modify existing information
  • Two classes of languages
  • Procedural
  • user specifies what data is required and how to
    get those data
  • Nonprocedural (or Declarative)
  • user specifies what data is required without
    specifying how to get those data
  • The portion of DML that is involved with
    information retrieval is called a query language
  • SQL is the most widely used query language
  • The DML component of SQL is nonprocedural

23
SQL
  • SQL widely used non-procedural language
  • E.g. find the name of the customer with
    customer-id 192-83-7465 select
    customer.customer-name from
    customer where customer.customer-id
    192-83-7465
  • E.g. find the balances of all accounts held by
    the customer with customer-id 192-83-7465 select
    account.balance from depositor,
    account where depositor.customer-id
    192-83-7465 and
    depositor.account-number account.account-number
  • Application programs generally access databases
    through one of
  • Language extensions to allow embedded SQL
  • Application program interface (e.g. ODBC/JDBC)
    which allow SQL queries to be sent to a database

24
Safety
25
Database Users
  • Users are differentiated by the way they expect
    to interact with the system
  • Application programmers
  • interact with system through DML calls
  • Sophisticated users
  • form requests in a database query language
  • Specialized users
  • write specialized database applications that do
    not fit into the traditional data processing
    framework
  • Naïve users
  • invoke one of the permanent application programs
    that have been written previously
  • E.g. people accessing database over the web, bank
    tellers, clerical staff

26
Database Administrator
  • Coordinates all the activities of the database
    system the database administrator has a good
    understanding of the enterprises information
    resources and needs.
  • Database administrator's duties include
  • Schema definition
  • Storage structure and access method definition
  • Schema and physical organization modification
  • Granting user authority to access the database
  • Specifying integrity constraints
  • Acting as liaison with users
  • Monitoring performance and responding to changes
    in requirements

27
Safe Sharing
28
Concurrency Control
  • Concurrent execution of user programs is
    essential for good DBMS performance.
  • Because disk accesses are frequent, and
    relatively slow, it is important to keep the cpu
    humming by working on several user programs
    concurrently
  • Interleaving actions of different user programs
    can lead to inconsistency e.g., check is cleared
    while account balance is being computed
  • DBMS ensures that such problems do not arise.
    Users can pretend that they are using a
    single-user system.

29
Transaction An Execution of a DB Program
  • Key concept is transaction
  • An atomic sequence of database actions
    (read/write)
  • Each transaction, executed completely, must leave
    the database in a consistent state if the DB is
    consistent when the transaction begins.
  • Users can specify some simple integrity
    constraints on the data, and the DBMS will
    enforce these constraints
  • Beyond this, the DBMS does not really understand
    the semantics of your data (eg. It does not
    understand how the interest is calculated on the
    account balance)
  • Thus, ensuring that the transaction (run alone)
    preserves consistency is ultimately the users
    responsibility!

30
Scheduling Concurrent Transactions
  • DBMS ensures that the execution of T1, T2.Tn
    is equivalent to some serial execution of T1,
    T2Tn
  • Before reading/writing an object, a transaction
    requests a lock on the object, and waits till the
    DBMS gives it the lock. All locks are released at
    the end of the transaction.
  • This is called Strict 2PL locking protocol
  • Main Idea
  • If an action of Ti (say writing X) affects Tj
    (which perhaps reads X), one of them say Ti will
    obtain the lock first and Tj is forced to wait
    until Ti completes this effectively orders the
    transactions
  • What if Tj already has a lock on Y and Ti
    requests Y?

31
Ensuring Atomicity
  • DBMS ensures atomicity, all or nothing, even if
    the system crashes in the middle of a Xact.
  • Idea
  • Keep a log (history) of all actions carried out
    by the DBMS while executing a set of Xacts
  • Before a change is made to the database, the
    corresponding log entry is made and forced into a
    safe location
  • After a crash, the effects of partially executed
    Xacts are undone using the log.

32
Overall System Structure
33
Summary
  • DBMS used to maintain and query large datasets
  • Benefits include recovery from system crashes,
    concurrent access, quick application development,
    data integrity and security
  • Levels of abstraction give data independence
  • A DBMS typically has a layered architecture
  • DBAs are one of the highest paid computer jobs
    and still in high demand!
  • DBMS RD is one of biggest research areas in
    industry (Microsoft, IBM, Oracle) and in academia!

34
Where to next?
  • Storage and Manipulation of Data
  • Storage at the Logical level
  • Chapter 2
  • Manipulation of Data
  • Chapter 3
Write a Comment
User Comments (0)
About PowerShow.com