CIS 550 Databases Fall 2008 Val Tannen - PowerPoint PPT Presentation

Loading...

PPT – CIS 550 Databases Fall 2008 Val Tannen PowerPoint presentation | free to view - id: c92f8-MjJiN



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

CIS 550 Databases Fall 2008 Val Tannen

Description:

Size of personal address book is probably less than 100 entries, but there are ... Blue = physical. CIS 550 Fall 2008. Lecture Notes 01 09/03. 20 ... – PowerPoint PPT presentation

Number of Views:103
Avg rating:3.0/5.0
Slides: 35
Provided by: zack4
Category:
Tags: cis | databases | fall | tannen | val

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: CIS 550 Databases Fall 2008 Val Tannen


1
CIS 550 Databases Fall 2008 Val Tannen
  • Lecture Notes 01
  • September 3, 2008

Slides courtesy of Susan Davidson, Zachary Ives
Raghu Ramakrishnan
2
READ
  • Chapter 1 of Ramakrishnan Gehrke
  • Lecture Notes 01 (these slides!)

3
Administration
  • Instructor Val Tannen, val_at_cis or val_at_seas
  • 570 Levine Hall North, x8-2665
  • Office hours M 3-4 or by appointment
  • TA Marie Jacob, majacob_at_seas,
  • Office hours TuTh 430-530 in 575 Levine Hall
    North
  • Web site (under construction)
    www.seas.upenn.edu/cis550/
  • Texts and readings
  • Ramakrishnan Gerke, Database Systems, 3rd ed.
  • Supplementary material (to be handed out in class)

4
Course Format and Grading
  • Roughly one major topic area per week or every
    two weeks
  • Readings from the text
  • Homework assignments for each topic area (20)
  • Homework 1 is given out in class today
  • Midterm exam (20), Wednesday October 8, in class
  • Final exam (30), TBA
  • Project (30) groups of 2-3
  • An initial handout describing the project is
    given out in class today

5
Why This Course?
  • Most CS courses concentrate on code our
    interest is managing and representing data
  • Warning this course doesnt focus on teaching
    advanced SQL or how to be an Oracle DBA (though
    it will get you started)
  • … So what in the world are we studying for 14
    weeks???

6
From the Wikipedia entry for database on
8/19/2005 (check it out today!)
  • A database is a collection of data elements
    (facts) stored in a computer in a systematic way,
    such that a computer program can consult it to
    answer questions. The answers to those questions
    become information that can be used to make
    decisions that may not be made with the data
    elements alone. The computer program used to
    manage and query a database is known as a
    database management system (DBMS).
  • Facts may be structured in a number of ways,
    known as database models. For instance, one
    database model is to associate each fact with a
    record representing an entity (such as a person),
    and to arrange these entities into trees or
    hierarchies -- the hierarchical database model.
    Another model is to arrange facts into sets of
    values which satisfy logical predicates -- the
    relational database model.

7
Wikipedia entry for database, cont.
  • Database management systems range from the
    extremely simple to the highly complex.
    Differences among DBMSes include whether they are
    capable of ensuring the integrity of the data
    whether they may be used by many users at once
    and what sorts of conclusions they can be
    programmed to compute from a set of data.
  • The first database management systems were
    developed in the 1960s. A pioneer in the field
    was Charles Bachman. Two key data models arose at
    this time the network model (developed by
    CODASYL) followed by the hierarchical model (as
    implemented in IMS). These were later usurped by
    the relational model, which was contemporary with
    the so-called flat model designed for very small
    tasks. Another contemporary of the relational
    model is the object-oriented database (OODB).

8
Example To-Do List
  • Buy school supplies due 9/7
  • Go to orientation on 9/7
  • Exercise every M/W/F
  • Buy Philly postcards
  • How does this differ from plain text? What might
    you do with it that you couldnt (easily) as
    plain text?

9
Example Your PDA/Cell Phone
Event Day When Who Where Lunch 10/24 1pm Zack
Cavanaughs Advice 10/25 9am Dr. Smith 599
Levine Biking 10/26 9am Jane Pottruck Dinner 10/2
6 6PM Jane Food Court
Calendar
Who Phone Email Office Zack 6-2789 zives 576
Levine N Dr. Smith 6-1234 drsmith 599
Levine Jane 543-2198 jane 2220 Walnut St.
Contacts
10
What If We Want to Include Contact Info on Our
Calendar?
  • Do we also want to keep e-mail addresses,
    telephone numbers etc.?
  • Should we expand the number of columns in the
    Calendar table? For example
  • Event When Who-name Who-email Who-tel …. Where
  • Lunch 1pm Zack zives 6-2789 …. Cav…
  • …
  • What is the trade-off in terms of entering data?

11
Link Calendar with Contacts?
  • Why cant we link calendar entries with contact
    info, and show the results of the two?
  • The link could be based on something as simple as
    the person's name
  • (Whats the danger here? What else might work
    better?)
  • This brings up an issue how to follow links
  • If we were to do this in Java, how might it be
    done?

12
Another Kind of Link Classes
and Subclasses
  • Person has attributes
  • ssn
  • PennID
  • set of user IDs
  • given name
  • family name
  • …
  • Student IS A person who
  • takes courses
  • is given grades
  • is taught
  • listens to lectures in class, OR over the Web, OR
    on videotape
  • This is yet another kind of information
  • How have you previously seen such relationships
    encoded?

13
Data Representation and Modeling
  • All of the data weve seen have an implicit data
    model
  • The data model includes some basic assumptions
    about whats an item of data, how to interpret
    it, and so on
  • The relational data model was the first model for
    data that is independent of its data structures
    and implementation
  • A theory of normalization guides you in designing
    relations
  • Concepts from the relational data model have been
    adapted to form object-oriented data models (with
    classes and subclasses), XML models, etc.
  • There are sibling fields to databases that
    consider
  • natural language models (how to understand words)
  • document models (how to match words and
    documents)
  • ontologies (how to define relationships between
    classes)

14
The DBMS Provides an Interface for the Database
  • A database (DB) is a large, integrated collection
    of data
  • Generally is cohesive in some way
  • A DB models a real-world organization or unit
  • A database management system (DBMS) is a software
    package designed to store and manage databases
  • Reliable storage recovery of 100s of GB
  • Querying/updating interface and API (for
    applications and Web pages)
  • Support for many concurrent users
  • Why do we need a DBMS, instead of coding in Java?

15
DBMS Benefit 1 Generality and Declarativity
  • Dont require the programmer or user to know
    details like indices, sort orders, machine
    speeds, disk speeds, concurrent users, etc.
  • Instead, the programmer/user programs with a
    logical model in mind
  • The DBMS makes it happen based on an
    understanding of relative costs of different
    methods
  • Declarative programming what does it mean?

16
Benefit 2 Efficiency
and Scale
  • Size of personal address book is probably less
    than 100 entries, but there are things we'd like
    to do quickly and efficiently
  • Give me all appointments on 10/28
  • When am I next meeting Jim?
  • Program these as quickly as possible (and make
    them resilient to data format changes)
  • Scale to a corporate calendar with hundreds of
    thousands of entries

17
Benefit 3
Management of Concurrency and Reliability
  • Suppose other people are allowed access to your
    calendar and are allowed to modify it? How do we
    stop two people changing the file at the same
    time and leaving it in a physical (or logical)
    mess?
  • Suppose the system crashes while we are changing
    the calendar. How do we recover our work?
  • This requires a basic concept…

18
Transactions
  • Key concept for concurrency is that of a
    transaction an atomic sequence of database
    actions (read/write) on data items (e.g. calendar
    entry).
  • Key concept for recoverability is that of a log
    keeping track of all actions carried out by the
    db.

19
The Layers of the DBMS
API/GUI
(Simplification!)
Query
Optimizer
Stats
Physical plan
Exec. Engine
Logging, recovery
Schemas
Catalog
Requests
Data/etc
Index/file/rec Mgr
Data/etc
Requests
Buffer Mgr
Red logical Blue physical
Pages
Pages
Storage Mgr
Requests
Data
Storage
20
The Database Abstraction Provided by the DBMS
  • We think of databases at two levels
  • Logical structure
  • What users/programmers see program or query
    interface
  • Physical structure
  • Organization on disk, indexes (indices), etc.
  • The logical level is further split into
  • Overall database design (conceptual seen by the
    DB designer)
  • Views that various users get to see

21
The Three-level Architecture for Databases
View 1 View 2 … View N
Schema
Logical, Conceptual Level
Physical Level (file organization, indexing)
22
Data Independence
  • A user of a relational database system should be
    able to use the database without knowing
    precisely how the data is stored, e.g.
  • After all, you don't worry about IEEE
    floating-point when you do division in a Java
    program or with a calculator!

SELECT When, Where FROM Calendar WHERE Who
Jane"
23
More on Data Independence
  • Logical data independence
  • Protects the user from changes in the logical
    structure of the data
  • could reorganize the calendar schema without
    changing how we query it if the query conforms to
    a view that stays the same
  • Physical data independence
  • Protects the user from changes in the physical
    structure of data
  • could add an index on who (or sort by when)
    without changing how the user would write the
    query, but the query would execute faster (query
    optimization)

24
Presentation Layer (4th Tier) Data-Driven Web
Sites
HTML
view
Processing
  • Data driven web sites also add an HTML
    presentation layer on top of what weve seen
  • Or they use XML plus style sheets to get the
    same effect

25
An Issue 80 of the Worlds Data is Not in a
DB!
  • Examples
  • scientific data (large images, complex programs
    that analyze the data)
  • personal data
  • WWW and email (some of it is stored in something
    resembling a DBMS)
  • Data management is expanding to tackle these
    problems
  • Flexibility data management imposes many
    constraints to make problems solvable
  • Must deal with entities outside our control
  • In this course, well start by focusing on
    databases, but eventually look outside the box
    at the Web and at gluing together data from many
    places

26
Combining Databases with Mediators (a kind of
middleware)
Mediated Schema
XML
  • A layer above the three-tiered architecture, to
    combine multiple databases/sources on the Web
  • Some of these are databases over which we have no
    control
  • Some must be accessed in special ways
  • We generally need to think about how to translate
    between different database formats

27
How Does One Build a Database?
  • Start with a conceptual model
  • Design implement schema
  • Write applications using DBMS and other tools
  • Many ways of doing this where the hard problems
    are taken care of by other people (DBMS, API
    writers, library authors, web server, etc.)
  • Common applications include PHP/JSP/servlet-driven
    web sites
  • The DBMS takes care of query optimization and
    execution

28
Conceptual Design
PROFESSOR
fid
name
Teaches
Takes
STUDENT
COURSE
cid
name
semester
sid
name
29
Designing a Schema (Set of Relations)
STUDENT
COURSE
Takes
  • Convert to tables constraints
  • Then need to do physical design the layout on
    disk, indices, etc.

PROFESSOR
Teaches
30
Applications Use Queries in SQL
  • Structured Query Language
  • Based on restricted first-order logic expressions
    over relations
  • Not procedural defines constraints on the
    output
  • Converted into a query plan that exploits
    properties run over the data by the query
    optimizer and query execution engine
  • lthtmlgt
  • ltbodygt
  • lt!-- hypotheticalEmbeddedSQL
  • SELECT FROM STUDENT, Takes, COURSE
  • WHERE STUDENT.sid Takes.sID
  • AND Takes.cID cid
  • --gt
  • lt/bodygt
  • lt/htmlgt

31
Processing the Query
Web Server / UI / etc
Execution Engine
Optimizer
Storage Subsystem
SELECT FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid Takes.sID AND
Takes.cID cid
32
DBMSs in the Real World
  • A huge industry for 20 of the worlds data!
  • Big, mature relational databases
  • IBM, Oracle, Microsoft
  • Middleware above these
  • SAP, PeopleSoft, dozens of special-purpose apps
  • Application servers
  • Integration and warehousing systems
  • Current trends
  • Web services XML everywhere
  • Smarter, self-tuning systems

33
So What about Database Research?
  • Not focusing on the problems of Oracle…
  • Understanding whats possible to do with XML
  • Better query processing
  • Better languages for meta-info (e.g.,
    constraints)
  • Data streams
  • Peer-to-peer architectures
  • Integrating and sharing data data provenance
  • Lots of theory and systems-building
  • Youll see familiar concepts in this course from
    operating systems and from complexity
    theory/logic
  • … And from programming languages, AI planning, …

34
In this Course...
  • Study relational databases, their design, how to
    query, what forms of indices to use.
  • Beyond relational algebra a logical model of
    data (Datalog), recursion
  • XML and semi-structured data models
  • Understanding DB internals
  • How DBs are built
  • Performance implications
  • Integrating and mediating between databases (a
    huge problem today)
About PowerShow.com