CS157B Lecture 1 - PowerPoint PPT Presentation


PPT – CS157B Lecture 1 PowerPoint presentation | free to download - id: 24dec6-ZDc1Z


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

CS157B Lecture 1


Database System Concepts. You are required to write up your report using LaTeX. ... Database System Concepts. Outline of Course. Study of principals and ... – PowerPoint PPT presentation

Number of Views:28
Avg rating:3.0/5.0
Slides: 76
Provided by: marily197
Learn more at: http://www.cs.sjsu.edu


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: CS157B Lecture 1

CS157B Lecture 1
  • Prof. Sin-Min Lee
  • Department of Computer Science
  • San Jose State University

Tuesday Thursday 1015 1130 Also by
Your evaluation in this course is determined by
Class Presentation 10 Presentation report
You are required to write up your report using
LaTeX. LaTeX is the standard tool for typesetting
scientific articles. Read http//www.latex-projec
Inventor of TeX
(No Transcript)
(No Transcript)
(No Transcript)
(No Transcript)
Outline of Course
  • Study of principals and techniques of databases
  • Grades assigned as in information sheet
  • Examples of use of databases
  • Programming projects in database design and
  • Programming in Microsoft Access
  • Programming in Java with Oracle
  • Development of a web site with database support

Textbook and class meetings
  • Principles of Database Systems With Internet and
    Java Applications
  • by Greg Riccardi
  • 2001, Addison-Wesley
  • Lectures and recitations

Students Role in Class
  • Attend class, please.
  • The class notes are available, but they are not
    the full classroom experience
  • Recitation sections are provided to personalize
    and enhance your learning environment
  • You are paying for this, take advantage of it
  • Read the book.
  • There are many topics covered in the text, but
    not the lectures
  • There are many details and examples in the text
  • Seek help during office hours

Current Events
  • Each lecture will cover current events that
    affect the database industry
  • Please bring info to lectures and recitations

Why Study Files and Databases?
  • Next few slides address the following
  • Importance of Databases to Economy
  • Can you get a job in the database field?
  • Representation of Information
  • Add meaning to data
  • Management of Complexity
  • Divide system into layers, focus on data
  • Management of Access and Security
  • Efficiency of Access and Storage
  • Separate data, allow optimizations

Importance of Databases to Economy
  • Expanding use of databases in retail sales
  • Walmart, retail sales information tracking
  • LL Bean, catalog sales information tracking
  • Examples of analyses
  • Sales of items
  • Comparisons between daily totals of items sold
    and items in inventory
  • Seasonal variations in sales of specific and
    similar items
  • Relative sales of similar items with different
  • Market-basket collections (all items in a single
  • Average and variation in total purchase amount
  • Average and variation in number and price of
  • Correlation between sales of items in a single
  • Customer analysis
  • Behavior of average customer
  • Preferences of individual customers

Importance of Database Companies
  • Oracle is the 2nd largest software company
  • Its stock has outperformed SP 500 and Microsoft
  • This picture is the stock performance, as shown
    on the BigCharts Web site from July, 1999 to
    July, 2000

E commerce
  • Companies are fighting for the market
  • See Oracle Web site
  • See IBM Web site
  • XML and XSL at Microsoft
  • http//www.microsoft.com/xml
  • http//msdn.microsoft.com/xml/demos/
  • http//msdn.microsoft.com/isapi/msdnlib.idc?theURL

Representation of Information
  • Data is collections of bits
  • physical database
  • Information is data with meaning
  • logical database
  • Representation of meta-data
  • database system is self-describing
  • Database Management System (DBMS)
  • define information content
  • construct database
  • manipulate by queries, reports and updates
  • data plus software

Management of Complexity
  • Insulation between programs and data
  • Program-data independence
  • Program-operation independence
  • Data abstraction
  • conceptual model for users
  • physical model for administrators
  • Sharing data and multi-user transactions
  • People
  • Database administrators
  • Database designers
  • Applications programmers
  • End users

Management of Access and Security
  • Controlling redundancy
  • inconsistency and duplication
  • Restricting unauthorized access
  • Enforcing integrity constraints
  • Providing backup and recovery
  • Persistent storage for program objects

Efficiency of Access and Storage Cost of Access
for Seagate Cheetah Disk
  • Seek time
  • Move access arm to the cylinder
  • Avg 6 msec, min 0.6 msec
  • Rotational delay
  • 1000 rpm, one revolution per 6 millisecond
  • Average 3 msec
  • Total latency max 12 msec, avg 9 msec
  • Transfer rate
  • 24 Mbytes/sec
  • Speed of memory access, Athlon 750 mhz
  • Latency lt100 nanosecond, 10,000 times faster than
  • Transfer rate 1.6 GBytes/sec, 7 times faster than

Hierarchical Cost of Storage
  • Registers and Cache are fixed size
  • Primary storage, memory (RAM) limited by hardware
  • 1000 Mbytes per CPU
  • Secondary storage, disk, also limited by hardware
  • 100 Gbytes per CPU
  • Tertiary storage, tape, etc. limited by storage

  • Glossary of terms
  • Define the terms as used in this subject
  • Database literature is filled with terms
  • Example of terms
  • Data, bits
  • Information, bits with meaning (type)
  • Entity
  • Schema

Client-server computing
  • Examples from web sites
  • New York Times
  • Pricewatch
  • Industry movement

What is a Database?
  • Database is a collection of data
  • data is known facts with implicit meaning
  • database is logically coherent, organized.
  • database is designed, built and populated for a
    specific purpose.
  • Database management system
  • collection of programs which support creation and
    maintenance of databases.

Time Line for Database Systems
  • before 1960 transition from punched card and tape
  • 1960s, from file management to databases
  • Bachman (GE) IDS and data structure diagrams
  • IMS from IBM, Hierarchical Data Model
  • IMS DB/DC, Network Model and communication
  • SABRE, multi-user access with network
  • 1970s, CODASYL and Relational Model
  • Codd (IBM) Relational Model
  • Chen introduced Entity Relationship Model
  • Query languages developed (SQL)
  • 1980s, Client/Server DBs, Oracle, DB2
  • PC databases, DBase, Paradox, etc.
  • SQL standard for definition and manipulation
  • 1990s, web-based information delivery
  • Trends expert DBs, object DBs, distributed DBs

Concepts and Architecture
  • Data Model is a set of concepts that can be used
    to describe the structure of a database
  • data types, relationships, and constraints
  • basic operations, for retrieval and updates
  • user-defined operations, behavior
  • 3 types of data models
  • High level or conceptual model
  • entities,attributes, and relationships
  • low-level or physical model
  • record formats, indexes and access paths
  • representational or implementation model
  • record structures or object models

Data Modeling
  • A data model is a specification of the
    information content of a system
  • conceptual data model describes information in
    terms the users will understand
  • logical data model describes information in a way
    that can be used to build a database
  • physical data model describes information in
    terms of its representation in physical storage

Schemas and Instances
  • Schema is the structure of a database
  • intention or meaning of the data
  • data models are schemas
  • table definitions are schemas
  • class definitions are schemas
  • Instances are the contents of a database
  • extension or values of the data
  • objects are instances
  • objects in a database are typically rows in a

Levels of database schemas
  • Different schemas are presented to different users

Data Independence
  • Logical data independence
  • Change in conceptual schema does not require
    change in external schemas
  • Expand or contract database with no change to
    external applications
  • View mappings must be changed
  • Physical data independence
  • Change in internal schema does require change in
    conceptual schema
  • Reorganize the file and index structure,
    especially for improved performance
  • Conceptual mapping must be changed

Database Languages
  • DDL, data definition language, conceptual schema
  • describe conceptual schemas
  • SDL, storage definition language, internal schema
  • describe file structures, indexes
  • VDL, view definition language, external schema
  • DML, data manipulation language
  • High-level or non-procedural (e.g. SQL)
  • Select Last Name from Roster where Section 2
  • Low-level or procedural
  • For r in Roster loop if r.section 2
    then result.Add ( r.lastname )

Information Engineering
Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
Stages in Database Design
  • Requirements formulation and analysis
  • Conceptual Design -- Conceptual Model
  • Implementation Design -- Logical Model
  • Physical Design --Physical Model

Database Design Process
  • Requirements formulation and analysis
  • Purpose Identify and describe the data that are
    used by the organization
  • Results Metadata identified, Data Dictionary,
    Conceptual Model-- ER diagram

Database Design Process
  • Requirements Formulation and analysis
  • Systems Analysis Process
  • Examine all of the information sources used in
    existing applications
  • Identify the characteristics of each data element
  • numeric
  • text
  • date/time
  • etc.
  • Examine the tasks carried out using the
  • Examine results or reports created using the

Database Design Process
  • Conceptual Model
  • Merge the collective needs of all applications
  • Determine what Entities are being used
  • Some object about which information is to
  • What are the Attributes of those entities?
  • Properties or characteristics of the entity
  • What attributes uniquely identify the entity
  • What are the Relationships between entities
  • How the entities interact with each other?

Database Design Process
  • Logical Model
  • How is each entity and relationship represented
    in the Data Model of the DBMS
  • Hierarchic?
  • Network?
  • Relational?
  • Object-Oriented?

Database Design Process
  • Physical (AKA Internal) Model
  • Choices of index file structure
  • Choices of data storage formats
  • Choices of disk layout

Chapter 2, Representing Information with Data
  • Entity Relationship (ER) Model
  • high-level, conceptual data model
  • Specify conceptual schema
  • conceptual database design
  • Identify the data requirements of users and
    detailed descriptions of data types,
    relationships and constraints.
  • Concentrate on specifying the properties of the
    data, not storage.

An Example of ER Modeling
  • Company database
  • Department
  • name, number, manager (employee), start date of
  • Projects controlled by department
  • name, number, single location
  • Employees
  • name, ssn, address, salary, sex, birthdate
  • assigned to department, several projects
  • Dependents of employees

Principals of ER Modeling
  • Entities and classes
  • Entity, a thing in the real world
  • Entity Class, the structure of a collection of
    similar entities
  • Attributes
  • Attribute, a property of an entity
  • Each entity has a value for each of its
  • Types of attributes
  • simple vs. composite, single-valued vs.
    multi-valued, stored vs. derived
  • domains of attributes

Relationships Between Entities
  • Relationship type defines a set of associations
    among given types.
  • Relationsip Instances are particular
    relationships among objects.
  • Examples of relationship types in company
  • Manages 11 between employee and department
  • Works-for 1N between department and employee
  • Controls 1N between department and project

Relationships, Roles, and Structural Constraints
  • Roles are attributes that signify the function of
    a particular entity (type) in a relationship
  • Employee manages department
  • Department is managed by employee
  • Employee works-for department
  • Department has employees who work for it
  • Constraints can be
  • cardinality
  • Each department can have no more than one manager
  • participation
  • Each department must have a manager

ER schema diagram for Company
Entity Classes for BigHit Video
Sample Attribute Specifications
Entity Classes, Attributes and Constraints
Entities, instances of classes
Relationships Between Entities
  • Relationship type defines a set of associations
    among given types.
  • Relationship Instances are particular
    relationships among objects.
  • Examples of relationship types in company
  • Manages 11 between employee and department
  • Works-for 1N between department and employee
  • Controls 1N between department and project

Relationships, Roles, and Structural Constraints
  • Roles are attributes that signify the function of
    a particular entity (type) in a relationship
  • Employee manages department
  • Employee works-for department
  • Constraints can be
  • cardinality
  • Each department can have no more than one manager
  • participation
  • Each department must have a manager

Relationship Types and Instances
  • Marriage relationship type
  • Person related to Person
  • One person has the role of wife one has the
    role of husband
  • Relationship type may have one or more attributes
  • e.g. weddingDate
  • Marriage relationship (instance)
  • Jane Block is married to Joe Block (relationship)
  • Jane Block is the wife of Joe Block (role)
  • Joe Block is the husband of Jane Block (role)
  • Parent-child relationship type
  • A person may have zero or more children

Relationships are always one-to-one
  • A relationship is an instance
  • These pictures are sets of instances

Find the Entities, Attributes and Relationships
ER schema diagram for BigHit Video
Chapter 4 Relational Model
  • Structure of Relational Databases

Posted on Sun, Apr. 20, 2003 IBM database
developer dead at 79 RELATIONAL' MODEL IS
Krieger Mercury News
Edgar F. Codd, an IBM computer pioneer who
created the relational database model'' that
underlies a 7 billion industry of storing the
world's online business data, died of heart
failure at home Friday in Williams Island, Fla.
He was 79. Bank accounts, credit cards, stock
trading, travel reservations, online auctions and
innumerable other now-routine data transactions
all rely on Codd's model, based on highly
abstract and complex mathematical theory. Before
Codd's landmark research paper in 1970, it was
possible to store lots of information -- but
analyzing it was difficult, requiring lines and
lines of code for even simple tasks.
His model made it possible to access large
amounts of data from small computers, giving
businesses and government agencies something they
desperately needed quick and easy access to
information. He had a vision about data that
was considered radical at the time,'' said
computer scientist Don Chamberlin, also of IBM.
Larry Ellison of Oracle used Codd's model to
build the first commercially available relational
database management system. As complex and
abstract as the math he loved, over the decades
Codd retained his British accent, his dry wit and
his love of a strong cup of tea, say family
members. Codd was the youngest of seven children
born to a leather processor and his schoolteacher
wife in the remote town of Portland, England. He
attended Oxford University on a full scholarship,
earning degrees in math and chemistry. Although
eligible for a military deferment because of his
studies, he chose to fly in the Royal Air Force.
Codd first came to the United States in 1948, at
the age of 25. He found work with IBM as a
programming mathematician for an early
proto-computer that filled two floors of a
Manhattan office building. In 1953, Codd moved to
Canada, frustrated that no one insisted that Sen.
Joseph McCarthy produce proof of his charges that
Communists were embedded in the U.S.
government. He later returned and became a U.S.
citizen. In 1965, he earned a doctorate from the
University of Michigan in Ann Arbor. A
disappointing job rating from his supervisor in
Poughkeepsie, N.Y., spurred Codd to transfer to
IBM's Santa Teresa development laboratories in
San Jose. There he found existing data management
systems seat-of-the-pants, with no theory at
all,'' he recalled in one interview. I began
reading documentation,'' Codd said, and I was
He proposed a solution that leaned heavily on
mathematical logic the relational model. He
believed that all the information in a database
should be represented as values in the rows and
columns of tables, and that no information should
be represented by pointers or connections among
records. But support for the traditional database
system within IBM was large, powerful and
antagonistic. It was at a meeting of a high-level
IBM technical committee that the relational model
caught the attention of IBM chairman Frank Cary.
IBM subsequently announced SQL/DS, its first
relational product, in 1981. DB2, for larger MVS
machines, was announced in 1983. When he put
two and two together, he didn't think about what
they added up to, but what they meant,'' said son
Ronald Codd, 47, of Alamo. He had this natural
ability to see a situation and reach a conclusion
that was a step beyond what people would
ordinarily think
Codd's life changed in 1983, when he suffered a
serious injury from a fall. After his recovery,
he retired from IBM and quit his beloved hobby of
recreational flying. But he continued to work
until 1999, commuting to his San Jose office at
Codd and Date Consulting Group, joined by
longtime IBM collaborator Chris Date and
mathematician Sharon Weinberg, another IBM
colleague, who after 12 years of courtship became
Codd's second wife. Edgar F. Codd Born Aug. 23,
1923, in Portland, England Died April 18, 2003,
in Williams Island, Fla.
An Appreciation by C. J. Date
Ted Codd was a genuine computing pioneer. He was
also an inspiration to all of us who had the
fortune to know him and work with him. He began
his career in 1949 as a programming mathematician
for IBM on the Selective Sequence Electronic
Calculator. He subsequently participated in the
development of several important IBM products,
including its first commercial electronic
computer (IBM 701) and the STRETCH machine, which
led to IBM's 7090 mainframe technology. Then, in
the 1960's, he turned his attention to the
problem of managing large commercial databases
and over the next few years he created, single
handed, the invention with which his name will
forever be associated the relational model of
The relational model is widely recognized as one
of the great technical innovations of the 20th
century. Codd described it and explored its
implications in a series of research papers
staggering in their originality--which he
published throughout the period 1969-1979. The
effect of those papers was twofold They changed
for good the way the IT world (including the
academic component f that world in particular)
perceived the database management problem and
they laid the foundation for an entire new
industry, the relational database industry, now
worth many billions of dollars a year. In fact,
not only did Codd's relational model set the
entire discipline of database management on a
solid scientific footing, it also formed the
basis for a technology that has had, and
continues to have, a major impact on the very
fabric of our society. It is no exaggeration to
say that Ted Codd is the intellectual father of
the modern database field.
Codd's supreme achievement with the relational
model should not be allowed to eclipse the fact
that he made major original contributions in
several other important areas as well, including
multiprogramming, natural language processing,
and more recently Enterprise Delta (a relational
approach to business rules management), for which
he and his wife were granted a US patent. The
depth and breadth of his contributions were
recognized by the long list of honors and elected
positions that were conferred on him during his
lifetime, including IBM Fellow elected ACM
Fellow elected Fellow of the Britain Computer
Society elected member of the National Academy
of Engineering and elected member of the
American Academy of Arts and Sciences. In 1981 he
received the ACM Turing Award, the most
prestigious award in the field of computer
science. He also received an outstanding
recognition award from IEEE the very first
annual Achievement Award from the international
DB2 Users Group and another annual achievement
award from DAMA in 2001. Computerworld, in
celebration of the 25th anniversary of its
publication, selected him as one of 25
individuals in or related to the field of
computing who have had the most effect on our
society. And Forbes magazine, which in December
2002 published a list of the most important
innovations and contributions for each of the 85
years of its existence, selected for the year
1970 the relational model of data, by E. F. Codd.

Ted Codd was a native of England and a Royal Air
Force veteran of World War II. He moved to the
United States in 1946 and became a naturalized US
citizen. He held MA degrees in mathematics and
chemistry from Oxford University and MS and PhD
degrees in communication sciences from the
University of Michigan. He is survived by his
wife Sharon and her parents, Sol and Nora Boroff,
of Williams Island, Florida a brother David Codd
and his wife, Barbara and a sister, Katherine
Codd, all of England and a second sister, Lucy
Pickard, of Hamilton, Ontario. He also leaves
four children and their families Katherine Codd
Clark, her husband Lawrence, and their daughters,
Shannon and Allison, of Palo Alto, California
Ronald E. F. Codd, his wife Susie, and their son,
Ryan and daughter, Alexis, of Alamo, California
Frank Codd and his wife, Aydes of Castro Valley,
CA and David Codd, his wife Ileana, and their
daughter Melissa and son, Andrew, of Boca Raton,
Florida. He also leaves nieces and nephews in
England, Canada, and Australia, as well as many,
many friends and colleagues worldwide. He is
mourned and greatly missed by all.
Example of a Relation
Basic Structure
  • Formally, given sets D1, D2, …. Dn a relation r
    is a subset of D1 x D2 x … x Dn Thus a
    relation is a set of n-tuples (a1, a2, …, an)
    where each ai ? Di
  • Example if
  • customer-name Jones, Smith, Curry,
    Lindsay customer-street Main, North,
    Park customer-city Harrison, Rye,
    Pittsfield Then r (Jones, Main, Harrison),
    (Smith, North, Rye),
    (Curry, North, Rye),
    (Lindsay, Park, Pittsfield) is a relation over
    customer-name x customer-street x customer-city

Attribute Types
  • Each attribute of a relation has a name
  • The set of allowed values for each attribute is
    called the domain of the attribute
  • Attribute values are (normally) required to be
    atomic, that is, indivisible
  • E.g. multivalued attribute values are not atomic
  • E.g. composite attribute values are not atomic
  • The special value null is a member of every
  • The null value causes complications in the
    definition of many operations
  • we shall ignore the effect of null values in our
    main presentation and consider their effect later

Relation Schema
  • A1, A2, …, An are attributes
  • R (A1, A2, …, An ) is a relation schema
  • E.g. Customer-schema
    (customer-name, customer-street, customer-city)
  • r(R) is a relation on the relation schema R
  • E.g. customer (Customer-schema)

Relation Instance
  • The current values (relation instance) of a
    relation are specified by a table
  • An element t of r is a tuple, represented by a
    row in a table

attributes (or columns)
Jones Smith Curry Lindsay
Main North North Park
Harrison Rye Rye Pittsfield
tuples (or rows)
Relations are Unordered
  • Order of tuples is irrelevant (tuples may be
    stored in an arbitrary order)
  • E.g. account relation with unordered tuples

  • A database consists of multiple relations
  • Information about an enterprise is broken up into
    parts, with each relation storing one part of the
    information E.g. account stores
    information about accounts
    depositor stores information about which
    owns which account customer
    stores information about customers
  • Storing all information as a single relation such
    as bank(account-number, balance,
    customer-name, ..) results in
  • repetition of information (e.g. two customers own
    an account)
  • the need for null values (e.g. represent a
    customer without an account)
  • Normalization theory (Chapter 7) deals with how
    to design relational schemas

The customer Relation
The depositor Relation
About PowerShow.com