Title: Everything You Need to Know About Databases database management systems, for this class
1Everything You Need to Know About
Databasesdatabase management systems, for
this class
- Objectives
- A Critical Property of RDBMSs Databases are
content addressable - The role of keys
- Terminology
- Relational database management systems are
- originated in set theory
- queries constitute declarative programming
-
2About Terminology
- Distinguish Three Usages
- Common use among the public
- Common use among Computer Science types.
- Formal terminology of Computer Science.
3Formal DefinitionsA word on notation.
Underlined terms are technical terms whose
definition I expect you to know well.
- Database A collection of data
- Database Management System (DBMS) A software
system that provides a set of services on a
database. - Relational Database Management System (RDBMS)
- A database management system organized on the
relational model - E.g. Oracle, Postgres, mySQL, DB2
- Public common usage, a database,
- CS common usage, DBMS, formal, an RDBMS
- - - means not a formal term, but you should
know it anyway, or a formal term beyond the scope
of exams.
4What are Other Examples of DBMSs?
- Operating Systems
- What about operating systems?
- __________
- __________
- __________
- facebook.com?
5Excel The most important not a database DBMS
- What are some of it services?
- Calculation
- Plotting data
- Storage and Retrieval
- Organization of data
6On the Commoditization of Computer Technology
- Three Tier Architecture
- Pervasive?
- every hardware vendor sells a preloaded rack.
- What does it mean if you know about databases?
- youre a king
- What if you are a professor of Computer Science?
- - - not a formal term, but you should know it
anyway or a formal term beyond the scope of the
class.
7Software Engineering Implications
- A DBMS is a shared resource and an obvious place
to persist all data. - Thus, unambiguously
- Content and programming of a DBMS is
- at the center of
- must interoperate with
- all the other software development
- Does the application deal with
- Physical definition
- Or Logical definition
- of the DBMS contents?
8More Excel
- Data Tabulated
- Comma Separated Values, vs. Columns with headings.
9The Steady March of Computer Performance
- Computers were slow
- necessarily priority was ad-hoc means to
success - Speed/price enables inefficiency
- Inefficiency for the computer
- Efficiency for people
10Motivating Set Theory as a Basis
- Max(27, 18.3, 256)
- Concept of an element of a set as a tuple
- S
- (James, Watson, DNA-Structure)
- (Lee, Hood, Sequencing Machine)
11- Examples (on board)
- tuples as elements of sets.
- Dot operator
12Inventors/Inventions
13In the Beginning,cant store the same thing
repeatedly
- (why not?) theory of normal forms
- tuple notation
- retrieval as set theory
14Formal and Informal Terminology
Relations tables
Tuples rows records
Attributes columns
15SQL - Select/From/Where query
SELECT last // output columns FROM
Inventors // input tables WHERE Inventors.first
James // returns Watson
16Inventor - Invention as DB Tables
Who invented the sequencing machine? Select
last From Invention, Inventor Where
Invention.namesequencing_machine and
Invention.iid Inventor.iid
17More Formalities
- Relation schema the name of a table, its
attributes (column names) - Examples
- Invention(iid, name) // table_name(comma
separated attributes) - Inventor(iid, first, last)
- T1(a1, , an) // convenient to use just letters
- Database schema a set of relation schema //
defines a database - Examples
- Invention(iid,name), Inventor(iid,first,last)
- Name(student-id, First, Last), Class(class-name,
student-id) - R(a1, a2, a3), S(b1,b2)
- First use of a formal term will be underlined.
Terminology is important
18Compare to SQL Programming
- Invention(iid, name)
- CREATE TABLE Invention (
- iid INTEGER,
- name VARCHAR ( 255 ),
-
- CONSTRAINT Invention0 PRIMARY KEY (iid)
- )
- Inventor(iid, first, last)
- CREATE TABLE T_Inventor (
- iid INTEGER,
- first VARCHAR ( 255 ),
- last VARCHAR ( 255 )
- )
19Inventor - Invention in Memory
- Inventors of DNA
- structure, see above
- sequencing_machine, Lee Hood
- expression_chips, David Botstein
-
20Data structures to represent, Inventions and who
Invented them
- list Inventors // define method signature
Inventors returns a list of Inventor - new Inventions list Invention // in main, define
a new object as a list of Invention
21Some key definitions (pun intended)
- Candidate Key Any attribute or set of attributes
that uniquely determines a row of a table - Foreign Key An attribute or set of attributes in
one table, that form a candidate key in a
different table. - aside content based retrieval does not have to
be on key fields
22(not needed now) - these are physical
- Primary key A candidate key used as the index
key for index that determines the placement of a
record on disk. (primary index) - Index Key An attribute or set of attributes that
serve as the basis of organization of an index
structure. Need not be a candidate key. - practical aside commercial DBMSs often hide
the candidate key requirement from the user. If
primary key is not a candidate key a unique field
is added.
23DBMSs are Content Addressable
Select last // output columns From
Inventors // input tables Where Inventors.first
James returns Watson
Select first From Inventors Where Inventors.last
Watson returns James
24Inventor/Invention - content addressable
Who invented the sequencing machine? Select
last From Invention, Inventor Where
Invention.namesequencing_machine and
Invention.iid Inventor.iid
25Inventor - Invention as DB Tables
foreign key implements 1 to many relation
26content addressable contd
What did Lee Hood invent? Select last From
Invention, Inventor Where Inventor.lastHood
and Invention.iid Inventor.iid
27Declarative vs. Procedural Programming
- Java/C Procedural
- explicit sequence of steps
- SQL querys Declarative
28How does procedural programming compare to
declarative?
29Of Foreign Keys, Pointers and Surrogates
- Surrogate An instance of a data type that can
be used to locate a disk block or a row in a
database. Can be a key or row id (RID) - - informally, but never formally, a pointer.
Q What is the direction of content based
implementation of an association? A It has
none.
30A Biological Example
Exon seq.
1 1..n
Gene seq.
Intron seq.
1 0..n
- Q What biology does this data model represent?
- A
- Genes are made up of Exons and Introns
- A gene has at least 1 exon, but does not have to
have an intron (prokaryotes dont) - Exons and Introns are hard to identify when you
dont know what gene they belong to.
31What if you just wrote the code
- create tables
- Gene(gid,organism, sequence) // schema
representation - Exon(eid, gid, sequence) // not SQL
- Intron(iid, gid, sequence)
- What biology did you learn from that?
- Models are
- easier, faster, much less to write
- obvious, a picture says a thousand words
- produce better quality results
- immediately
- forever influences the robustness and
maintainability of the code.
32History of Data Modeling First there were E-R
Models
- Entity-Relationship model is a set of concepts
and graphical symbols that can be used to create
conceptual schemas
33E-R Models Proof nothing ever dies in Computer
Technology, they just fade away very slowly
- Four versions
- Original E-R model by Peter Chen (1976)
- Extended E-R model the most widely used model
- Information Engineering (IE) by James Martin
(1990) - IDEF1X national standard by the National
Institute of Standards and Technology - Unified Modeling Language (UML) supporting
object-oriented methodology
34Inventor-Invention, Object Model
- A list of inventions, each with their list of
inventors - Class diagram, 1 of 7 (primary) diagrams in UML
class methods
1
association
35Inventor-Invention, Physical Data Model (UML
syntax)
1
36Logical Data Model
1
UML representations of data models rarely have
direction Q Even so why might you label a
direction? A1 Logical model encodes the
semantics (logic) of the application A2 The
logical model can be used for the shared
beginning of both the database and procedural
programming
37UML The new data modeling language
correction
38Vocabulary of Logical Data Models
- Class diagram
- table (object) names
- attributes (methods)
- Associations // lines connecting classes
- Cardinality, (0,1, n or , 0..1, 0..n, 1..n)
- Aggregation
- expresses something has parts
- identifying, or non-identifying, // parts
cant exist by themselves - Inheritance
- synonyms
- 0 0..0
- 1..1
- 1..n 1..
- 0..n 0..
- 0..
39Cardinality of an Association (1)
40 Cardinality of an Association (2)
41Cardinality of an Association (3)
42Diamond Represents Aggregation- Empty diamond
non-identifying- Filled diamond identifying
43Inheritance (subtypes)
44A Comparison of ER Modeling Symbols
Historical Aside Object models came in,I) Booch,
2) Shlaer Mellor 3) They agreed to converge on
Unified Modeling Language, UML