Introduction to Databases: Relational and XML Models and Languages - PowerPoint PPT Presentation

1 / 74
About This Presentation
Title:

Introduction to Databases: Relational and XML Models and Languages

Description:

instead: guarantee consistency by 'roll-back' of partially executed ... transaction management guarantees consistency despite concurrent/interleaved execution ... – PowerPoint PPT presentation

Number of Views:103
Avg rating:3.0/5.0
Slides: 75
Provided by: bent83
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Databases: Relational and XML Models and Languages


1
Introduction to DatabasesRelational and XML
Models and Languages
  • Instructors
  • Bertram Ludaescher
  • Kai Lin

2
Overview
  • 0915-1020 Relational Databases (1h05)
  • 1020-1030 BREAK (10)
  • 1030-1150 Relational Databases (1h20)
  • 1150-1315 LUNCH (1h25)
  • 1315-1345 Demo Hands-on (30)
  • 1345-1510 XML Basics (1h25)
  • 1510-1530 BREAK (20)
  • 1530-1630 XML Querying (1h)
  • 1630-1700 Demo Hands-on (30)

3
Scope
  • Today Introduction to Databases, in particular
  • Relational database model
  • Relational Operations and Queries
  • Constraints
  • XML data model
  • Querying and transforming XML
  • Some demos simple hands-on exercise
  • Tomorrow
  • Introduction to Knowledge Representation and
    Ontologies
  • But first déjà vu

4
(No Transcript)
5
What is a Database System?
  • Database (system)
  • Database Instance (set of tables of rows)
  • Database Management System (DBMS)
  • Origins in the commercial world
  • to organize, query, and manipulate data more
    effectively, efficiently, and independently
  • Scientific databases
  • often special features
  • spatial, temporal, spatiotemporal, GIS, units,
    uncertainty, raw derived data,

6
Why not just use files as databases?
  • For some applications yeah why not?
  • But in general
  • scanning greping large files can be very
    inefficient
  • no language support for selecting desired data,
    joining them, etc.
  • cannot express the kinds of questions/queries
    youd like to ask
  • grep is no substitute for a query language
  • redundant and/or inconsistent storage of data
  • no transaction management and concurrency control
    among multiple users
  • no security
  • no recovery
  • no data independence (application? data)
  • no data modeling support

7
Features of a Database System
  • A data model (relational, object-oriented, XML)
    prescribes how data can be organized
  • as relations (tables) of tuples (rows)
  • as classes of (linked) objects
  • as XML trees
  • A (database) schema (stored in the data
    dictionary) defines the structure of a specific
    database instance
  • Relational schema
  • OO schema
  • XML Schema (or XML DTD)

8
Features of a Database System
  • Data is treated uniformly and separately from the
    application
  • Efficient data access
  • Queries and views are expressed over the schema
  • Integrity constraints (checking and enforcement)
  • Transactions combine sets of operations into
    logical units (all-or-nothing)
  • Synchronization of concurrent user transactions
  • Recovery (after system crash)
  • not to be confused w/ backup
  • instead guarantee consistency by roll-back of
    partially executed transactions (how? Hint
    logging)

9
DB features, e.g., Concurrency Control
  • Concurrent execution of simultaneous requests
  • long before web servers where around...
  • transaction management guarantees consistency
    despite concurrent/interleaved execution
  • Transaction ( sequence of read/write operations)
  • Atomicity a transaction is executed completely
    or not at all
  • Consistency a transaction creates a new
    consistent DB state, i.e., in which all integrity
    constraints are maintained
  • Isolation to the user, a transaction seems to
    run in isolation
  • Durability the effect of a successful
    (committed) transaction remains even after
    system failure

10
Levels of Abstraction Architecture Overview
User
Conceptual Level
View 1
View 2
View n
Export schemas
logical data independence
ER-Model (Entity-Relationship) OO Models
(Classes) part of DB design ? conceptual
design often lost in the process
Logical (conceptual) level
Tables
physical data independence
Physical level
Index structures
DB instances
11
Database Design Entity-Relationship (ER) Model
Employee
Department
works-for
  • Entities
  • Relationships
  • Attributes
  • ER Model
  • initial, high-level DB design (conceptual model)
  • easy to map to a relational schema (database
    tables)
  • comes with more constraints (cardinalities,
    aggregation) and extensions EER (is-a gt class
    hierarchies)
  • related UML (Unified Modeling Language) class
    diagrams


12
The Relational Model
Employee
  • Relation/Table Name
  • employee, dept
  • Attributes Column Names
  • Emp, Salary, DeptNo, Name, Mgr
  • Relational Schema
  • employee(Empstring, Salaryinteger,
    DeptNointeger), ...
  • Tuple Row of the table
  • (tom, 60000, 1)
  • Relation Set of tuples
  • (...), (...), ...

Emp Salary DNo tom 60k 1 tim 57k 1 sally 45k 3 car
ol 30k 1 carol 35k 2 .
FK foreign key, pointing to another key
Department
DNo Name Mgr 1 Toys carol 2 Comp. carol 3 Shoes
sam
13
Ex Creating a Relational Database in SQL
CREATE TABLE employee ( ssn CHAR(11), name
VARCHAR(30), deptNo INTEGER, PRIMARY KEY
(ssn), FOREIGN KEY (deptNo) REFERENCES
department )
CREATE TABLE department ( deptNo INTEGER, na
me VARCHAR(20), manager CHAR(11), PRIMARY
KEY (deptNo), FOREIGN KEY (manager)
REFERENCES employee(ssn) )
14
What is a Query?
  • Intuitively
  • An executable question in terms of a database
    schema
  • Evaluating a query Q against a database instance
    D yields a set of answer objects
  • Relational tuples or XML elements
  • Example
  • Who are the employees in the Toys dept.?
  • Who is (are) the manager(s) of Tom?
  • Show all pairs (Employee, Mgr)
  • Technically
  • A mapping from an input schema (the given table
    schemas) to a result schema (the new columns you
    are interested in) defined in some query language

15
Why (Declarative) Query Languages?
,,Die Grenzen meiner Sprache bedeuten die Grenzen
meiner Welt. The limits of my language mean the
limits of my world. Ludwig Wittgenstein,
Tractatus Logico-Philosophicus
If you have a hammer, everything looks like a
nail.
  • Things we talk and think about in PLs and QLs
  • Assembly languages
  • registers, memory locations, jumps, ...
  • C and the likes
  • if-then-else, for, while, memory (de-)allocation,
    pointers, ...
  • Object-oriented languages
  • C C plus objects, methods, classes, ...
  • Java objects, methods, classes, references,
    ...
  • Smalltalk objects, objects, objects, ...
  • OQL object-query language

16
Why (Declarative) Query Languages?
  • Things we talk and think about in PLs and QLs
  • Functional languages (Haskell, ML)
  • (higher-order) functions, fold(lr), recursion,
    patterns, ...
  • gt Relational languages (SQL, Datalog)
  • relations (tables), tuples (rows) conceptual
    level ER
  • relational operations ?, ?, ?, ?, ...,
    ?,?,?,?,?,..., ?, ?, X
  • gt Semistructured/XML (Tree) Graph Query
    Languages
  • trees, graphs, nodes, edges, children nodes,
    siblings,
  • XPath, XQuery,
  • Also
  • Focus on what, and not how!

17
Example Querying a Relational Database
input tables
join
SELECT e.Emp, d.Mgr FROM Employee e, Department
d WHERE e.DeptNo d.DeptNo
SQL query (or view def.)
we dont say how to evaluate this expression
answer (or view)
18
Example Query SQL vs DATALOG
  • List all employees and their managers
  • In SQL
  • SELECT e.name, d.manager
  • FROM Employee e, Department d
  • WHERE e.deptNo d.deptNo
  • In DATALOG
  • q(E, M) - employee(E, S, D), department(D, N,
    M).

a join operation
19
Important Relational Operations
  • select(R, Condition)
  • filter rows of a table wrt. a condition
  • project(R, Attr)
  • remove unwanted columns keep rest
  • join(R1, A2, R2, A2, Condition)
  • find matches in a related table
  • e.g. match R1.foreign key R2.primary key
  • cartesian product(R1, R2)
  • union (OR), intersection (AND)
  • set-difference (NOT IN)

20
Relational Operations (in DATALOG)
(query) output (query) input
21
Demo
  • Relational Queries in DATALOG

22
Queries, Views, Integrity Constraints
  • can all be seen as special queries
  • Query q() - ad-hoc queries
  • View v() - exported views
  • Integrity Constraints
  • ic () - . MgrSal lt EmpSal
  • say what shouldnt happen
  • if it does alert the user (or refuse an update,
    )

23
Query Evaluation vs Reasoning
  • Query evaluation
  • Given a database instance D and a query Q, run
    Q(D)
  • What databases do all the time
  • Reasoning (aka Semantic Query Optimization)
  • Given a query Q and a constraint C, optimize
    QC (e.g., given C, Q might be unsatisfiable)
  • Given Q1 and Q2 decide whether Q1? Q2
  • Given Q1,Q2, C decide whether Q1? Q2 C
  • Note we are NOT given a database instance D
    here just the schema and the query/IC expressions

24
Summary QLs for Relational Databases
25
Relational Algebra
26
Relational Algebra
27
Relational Algebra
28
Relational Algebra
29
Relational Algebra
30
Hands-on Part
  • DBDesigner 4

31
DBDesigner 4
  • An open source (GPL) database design tool
  • Goto http//www.fabforce.net/dbdesigner4/
  • Download and install (5 min)
  • Open the example schema Order (File -gt Open -gt
    Order), and examine the relations between the
    tables forumtopic and forumpost. Find the foreign
    key used in the relation postHasTopic (5 min)
  • Connect to a sample MySQL database
  • host geon07.sdsc.edu
  • port 3306
  • database summer_institute
  • username root
  • password blank
  • (5 min)
  • Select all records in the table forumtopic (2
    min)
  • Select all records in the table forumpost, and
    sort the result according to their idforumpost
    (3min)
  • Find all forum posts with the topic Cars (5 min)
  • Insert a record into the table forumpost (5 min)

32
Additional Material
  • (not presented)

33
Non-Relational Data Models
  • Relational model is flat atomic data values
  • nesting is modeled via pointers (foreign keys)
    and Skolem-ids
  • extension nested relational model (tables
    within tables, cf. nested HTML tables)
  • values can be nested lists ..., tuples (...),
    sets ...
  • ISO standard(s) SQL
  • identity is value based
  • Object-oriented data model
  • complex (structured) objects with object-identity
    (oid)
  • class and type hierarchies (sub-/superclass,
    sub-/supertype)
  • OODB schema may be very close to world model
    (no translation into tables)
  • () queries fit your OO schema
  • (-) (new) queries that dont fit nicely
  • ODMG standard, OQL (Object Query Language)

34
Example Object Query Language (OQL)
SELECT DISTINCT STRUCT( E e.name, C
e.manager.name, M ( SELECT c.name
FROM c IN e.children WHERE FOR ALL d IN
e.manager.children c.age gt d.age ) ) FROM e IN
Employees
  • Q what does this OQL query compute?
  • Note the use of path expressions like
    e.manager.children
  • gt Semistructured/Graph Databases

35
A Graph Database
36
Querying Graphs with OO-Path Expressions
?- dblp."Inf. Systems".L."Michael E.
Senko". Answer L"Volume 1, 1975 L"Volume
5, 1980".
?- dblp."Inf. Systems".L.P, substr("Volume",L),
P person.spouselives_in P.lives_in.
37
Constructs for Querying Graphs
Example ?- dblp . any . (if(?vldb) if(?sigmod))
38
Keys, Keys, and more Keys
  • A key is a minimal set of attributes that
  • uniquely identify a tuple
  • determine every other attribute value in a tuple
  • There may be many keys for a relation we
    designate one as the primary key
  • The phrase candidate key is used in place of
    key where the key denotes the primary key
  • A superkey is a superset of a key (i.e., not
    necessarily minimal)

39
Normalization of Relations
Example of good design
Employee(EName, SSN, BDate, Address, DNumber)
Department(DName, DNumber, DMgrSSN)
Example of bad design (why is it bad?)
Emp(EName, SSN, BDate, Address, DNum, DName,
DMgrSSN)
40
Whats Wrong?
Emp(EName, SSN, BDate, Address, DNum, DName,
DMgrSSN)
  • The description of the department (DName,
    DMgrSSN) is repeated for every employee that
    works in that department.
  • Redundancy!
  • The department is described redundantly.
  • This leads to update anomalies! ( and wastes
    space)
  • Digression (for experts only)
  • redundancy can be used to increase performance
    e.g. materialized views)

41
Update Anomalies (caused by redundancy)
  • Insertion Anomalies
  • If you insert an employee
  • Need to know which department he/she works
  • Need to know the description information for that
    department
  • If you want to insert a department, you cant
    until there is at least one employee
  • Deletion Anomalies if you delete an employee, is
    that dept. gone? was this the last employee in
    that dept?
  • Modification Anomalies if you change DName,
    for example, it needs to be changed everywhere!

42
Null values also cause problems
  • Null values might help in special cases, but are
    not a general solution to update anomalies
  • For example, they may
  • Waste space
  • Make it hard to specify and understand joins
  • Make it hard to aggregate (count, sum, etc.)
  • Have different meanings
  • Attribute does not apply to this tuple
  • Attribute value is unkown
  • Value is known but absent (not yet recorded)
  • Causes problems with queries (cant interpret
    query answers)

43
Why worry about normalization?
  • To
  • reduce redundancy update anomalies
  • reduce the need for null values
  • Last not least its a solved problem
  • all algorithms proofs have been worked out

Here Normalization based on FDs (theres more)
44
Functional Dependencies
  • Statement that if two tuples agree on attributes
    A they must agree on attributes B
  • A ? B
  • If the value of the first attribute(s), A, is
    known,
  • then the value of the second attribute(s), B,
    is
  • known (read A determines B)
  • We want to know if it is always true in the
    application

45
Functional Dependencies
  • Examples of functional dependencies
  • social-security-number ? employee-name
  • course-number ? course-title
  • Examples that are NOT functional dependencies
  • course-number -?? book
  • course-number -?? car-color

46
What is a functional dependency?
Remember what it means to be a function x
f(x) x g(x) x h(x) 1
2 1 2 1 10 1 3
2 2 2 20 2 5
3 5 3 30 3 5
f is not a function for x1, f(x) is not unique
we are looking for functional relationships (that
must occur in a relation) among attribute values
47
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME,
DMGRSSN) EMP_PROJ(SSN, PNUM, HOURS, ENAME,
PNAME, PLOCATION)
48
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME,
DMGRSSN) EMP_PROJ(SSN, PNUM, HOURS,
ENAME, PNAME, PLOCATION)
5
2
1
3
6
4
9
10
7
8
49
Why do we care?
EMPLOYEE (SSN, NAME, SALARY, JOB_DESC)
  • If all FDs are implied by the key
  • it means that the DBMS only enforces keys (not
    FDs) and the DBMS is going to enforce the keys
    anyway
  • otherwise, we have to perform expensive
    operations to maintain consistency (code or check
    statements)

50
Decomposition
  • Main refinement technique decomposition
    (replacing ABCD with, say, AB and BCD, or ACD and
    ABD) based on the projection operator.
  • Decomposition should be used judiciously
  • Is there reason to decompose a relation? (via
    Normal Forms)
  • What problems (if any) does the decomposition
    cause? (lost information or dependencies?)

51
How can we decompose using the Project Operator?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME,
DMGRSSN) EMP_PROJ(SSN, PNUM, HOURS, ENAME,
PNAME, PLOCATION)
2
1
EMP1(SSN, ENAME, BDATE, ADDRESS, DNUM) X(DNUM,
DNAME, DMGRSSN)
3
4
5
6
EMP2(SSN, ENAME) X(PNUM, PNAME,
PLOCATION) Y(SSN, PNUM, HOURS)
8
9
10
7
52
Correct Decompositions
  • How do we know if a decomposition is correct?
  • That we havent lost anything?
  • We have three goals
  • lossless-join decomposition
  • (dont throw any information away)
  • (be able to reconstruct the original relation)
  • dependency preservation
  • all of the FDs end up in just one relation
  • (not split across two or more relations)
  • Boyce-Codd Normal Form (BCNF) - no redundancy

53
Lossless Decompositions
  • What is a lossless decomposition?
  • What is a lossy decomposition?
  • When R is decomposed into R1 and R2
  • Check to see if (R1 R2) R
  • if it is a lossy decomposition, then R1 R2
  • gives you TOO MANY tuples.
  • Note we are doing a natural join

54
Example a lossy decomposition
  • Now join them using natural join we get extra
    tuples!!!
  • 1 smith p2 billing

55
Testing for a Lossless Decomposition
  • For decomposition into two relations
  • Let R1 and R2 form a decomposition of R.
  • R1 and R2 are both sets of attributes from R.
  • For the decomposition to be lossless ...
  • The attributes in common must be a key for 1 of
    the relations!
  • Note You are joining a key and a foreign key

56
Example test for a lossless decomposition
  • Employee(SSN, name, project, p-title)
  • decomposition Employee (SSN, name) Project
    (project, p-title, name)
  • Which attribute is in common?
  • Employee.Name and Project.Name
  • Is name a key for either of these two tables?
  • NO! We have a problem.

57
Example test for a lossless decomposition
  • Employee(SSN, name, project, p-title)
  • decomposition Employee (SSN, name) Project
    (project, p-title)
  • Which attribute is in common?
  • None
  • Is this decomposition lossless?
  • NO! We have a problem.

58
Example test for a lossless decomposition
  • Employee(SSN, name, project, p-title)
  • decomposition Employee (SSN, name, project)
    Project (project, p-title)
  • Which attribute is in common?
  • Employee.project and Project.project
  • Is project a key for either of these two tables?
  • YES! We have a lossless decomposition.

59
Some Preliminary Normal Form Definitions
  • Prime Attribute - an attribute A is prime if it
    is a member of a key, otherwise it is nonprime
  • Partial Dependency - given an FD X?Y, Y is
    partially dependent on X if there is a proper
    subset X? of X such that X??Y.
  • Transitive Dependency - A is transitively
    dependent upon X if X?Y, Y-/?X, and Y?A and A?XY.

60
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME,
DMGRSSN) EMP_PROJ(SSN, PNUM, HOURS,
ENAME, PNAME, PLOCATION)
5
2
1
3
6
4
9
10
7
8
61
Normal Forms Based on FDs
  • 1NF - all attribute values (domain values) are
    atomic(part of the definition of the relational
    model)
  • 2NF - 1NF no key partial dependencies (every
    nonprime attribute fully depends on every key of
    R)
  • R(A B C D) B ? C (not allowed)
  • 3NF - 2NF no nonprime attribute is transitively
    dependent upon any key
  • R(A B C D) AB ?C, C ? D (not allowed)
  • BCNF - 3NF no attribute is transitively
    dependent upon any key (all FDs determined by a
    key)
  • R(A B C D) AB ?C, C ? B (not allowed)

62
Whats the Goal?
  • BCNF, Lossless, and Dependency-Preserving
  • (first choice)
  • 3NF, Lossless and Dependency-Preserving
  • (second choice)
  • because sometimes you cant preserve all
    dependencies

63
Finding all of the FDs
  • Armstrongs Axioms
  • Reflexivity If X ? Y, then X ? Y
  • Trivially, all attributes A ? A,
  • e.g., name ? name and gender ? gender
  • Augmentation If X ? Y, then XZ ? YZ for any Z
  • Transitivity If X ? Y and Y ? Z, then X ? Z
  • X, Y, and Z are sets of attributes in R
  • Armstrongs Axioms are a sound complete set of
    inference rules
  • Union If X ? Y and X ? Z, then X ? YZ
  • Decomposition If X ? YZ, then X ? Y and X ? Z

64
Finding all FDs the closure of a set of FDs
  • The closure of a set of FDs
  • Let F be a set of FDs and F the closure of F.
  • F is the set of all FDs implied (or derivable)
    from F using Armstrongs Axioms
  • F is computed by applying the inference rules
    until no new FDs can be found

65
What is Dependency Preserving
  • Suppose F is the original set of FDs and G is the
    set of FDs after decomposition
  • If we compute F and G, and F G then the
    decomposition is dependency preserving

66
Other Results (textbook)
  • Algorithms To
  • Compute F
  • Compute the Minimal Cover for F
  • Find a dependency-preserving decomposition into
    3NF
  • Find a lossless-join decomposition into BCNF
  • Find a lossless-join dependency preserving
    decomposition into 3NF

67
Example Not dependency preserving
  • addr(number, street, city, state, zip)
  • number, street, city, state ? zip
  • zip ? state
  • If we decompose, this FD wont occur within one
    relation. Thus, since the DBMS only enforces
    keys (and not FDs directly), this cant be
    enforced.(If we leave it alone, it is in 3NF)

68
Lossless join decomposition algorithm
  • Set D R (the current set of relations)
  • While there is a relation in D thats not in BCNF
  • Choose a relation scheme Q that is not in BCNF
  • Find a FD X ? Y in Q that violates BCNF
  • Replace Q in D by (Q Y) and (X?? Y)
  • End While
  • Identify dependences that are not preserved (X??
    A).
  • Add XA as a table to the set D

69
Tuning the Conceptual Schema
  • The choice of conceptual schema should be guided
    by the workload, in addition to redundancy
    issues
  • We may settle for a 3NF schema rather than BCNF.
  • E.g., the workload may influence our choice to
    decompose a relation into 3NF or BCNF.
  • We may further decompose a BCNF schema!
  • We might denormalize (i.e., undo a decomposition
    step), or add fields to a relation. We must take
    care to avoid the problems caused by the
    redundancy!

70
Decomposition of one table into several
  • A relation is replaced by a collection of
    relations that are projections. Most important
    case. (Vertical partitioning)
  • Sometimes, we might want to replace a relation by
    a collection of relations that are selections.
    (Horizontal partitioning)
  • Each new relation has the same schema as the
    original, but a subset of the rows.
  • Collectively, new relations contain all rows of
    the original. Typically, the new relations are
    disjoint.
  • Why might we do this?

71
Tuning the Conceptual Schema
  • Vertical decomposition using the project operator

Course c cname instructor room days
Course1 c room days
Course3 c instructor
Course2 c cname
72
Tuning the Conceptual Schema
  • Horizontal partition/decomposition using the
    select operator

Course c cname instructor room days
Undergraduate-Course c cname instructor room day
s
Graduate-Course c cname instructor room days
73
Tuning the Conceptual Schema
  • Denormalizingalways introduces redundancy!

Course-Offering (offering, quarter, c,
instructor, time, days)
Instructor-Room (instructor, room)
Course (c, cname)
Course-Offering (offering, quarter, c, cname,
instructor, room, time, days)
This introduces redundancy - which must be
managed.
74
Denormalization
  • Denormalization interferes with dependency
    preservation
  • Course-Offering
  • (offering, quarter, c, cname, instructor, room,
    time, days)
  • Note that having the FD instructor ? room (and
    nothing else) in a single table allows the FD to
    be enforced (by enforcing the candidate key of
    instructor).
  • Instructor-Room (instructor, room)
Write a Comment
User Comments (0)
About PowerShow.com