Database Structures - PowerPoint PPT Presentation

1 / 66
About This Presentation
Title:

Database Structures

Description:

Extract columns and place the result in a new relation. JOIN ... Store the relation as an indexed file or utilize hashing techniques to provide ... – PowerPoint PPT presentation

Number of Views:25
Avg rating:3.0/5.0
Slides: 67
Provided by: shyhka
Category:

less

Transcript and Presenter's Notes

Title: Database Structures


1
Database Structures
  • Shyh-Kang Jeng
  • Department of Electrical Engineering/
  • Graduate Institute of Communication Engineering
  • National Taiwan University

2
Database and Flat File
  • Database
  • Multidimensional storage system
  • Contains internal links between its entries
  • Information can be accessible from a variety of
    perspectives
  • Flat file
  • One-dimensional storage system
  • Presents information from a single point of view

3
Example
  • Flat file can provide
  • only a list of compositions arranged by composers
  • Database can present
  • all works by a single composer
  • all composers who wrote a particular type of
    music
  • all the composers who wrote variations of another
    composers work

4
File-Oriented Information System
5
Database-Oriented Information System
6
Schema and Subschema
  • Schema
  • Description of the entire database structure for
    maintenance
  • Subschema
  • Description of portion of the database pertinent
    to a users needs
  • Example University Database
  • Schema Student data and faculty data linked
    together
  • Subschema faculty records not including
    employment history
  • Subschema faculty records not including the
    linkage between students and advisors

7
Concept Layers
8
Database Implementation
  • Application layer
  • Handles the communication with the user
  • Determines the overall systems external
    characteristics
  • Database management system (DBMS)
  • Accomplished the actual manipulation of the
    database

9
Separation of Application Software and DBMS
  • Simplifies the design process
  • Isolate maintenance details in DBMS
  • Flexible for distributed database
  • Provides a means for controlling access
  • Achieves data independence
  • If the organization of the database changes, the
    application software based on unaltered subschema
    does not need to be modified

10
Database Models
  • Concept view of the database
  • The requests are translated into the actions of
    the actual data storage system by DBMS routines
  • The application software can be written as though
    the information in a database were stored
    according to the conceptual database model rather
    than the actual storage system
  • Relational database model and object-oriented
    database model are most common

11
Host Language
  • General-purpose programming languages used to
    write application software
  • Often extended by routines in DBMS
  • Within this programming environment the database
    is manipulated as though it were organized
    according to the conceptual database model

12
Relational Database Model
  • Most popular today
  • Simple structure
  • Stores data in rectangular tables, called
    relations
  • Tuples
  • Rows in a relation
  • Attributes
  • Columns in a relation

13
A Relation Containing Employee Information
14
A Relation Containing Redundancy
15
Problems with Redundancy
  • Lack of efficiency
  • e.g., multi-assignment of jobs to an employee
  • e.g., job shared by more than one employee
  • Data integrity
  • e.g., deletes a tuple with the only description
    of a job
  • Combines more than one concept into a single
    relation

16
EMPLOYEE Relation
17
JOB Relation
18
ASSIGNMENT Relation
JobId
TermDate
EmplId
StartDate
19
Finding the Departments in Which Employee 23Y34
Has Worked
20
Mini Review
  • Based on the EMPLOYEE, JOB, and ASSIGNMENT
    relations, answer the following questions
  • Who is the secretary in the accounting department
    with experience in the personnel department?
  • Who is the floor manager in the sales department?
  • What job does G. Jerry Smith currently hold?

21
Answers
  • G. Jerry Smith
  • Cheryl H. Clark
  • S26Z

22
A Decomposition with Information Loss
23
Nonloss Decomposition
  • A relation decomposed into smaller relations
    without losing information

24
Relational Operations
  • SELECT
  • Select certain tuples possessing certain
    characteristics and to place these selected
    tuples in a new relation
  • PROJECT
  • Extract columns and place the result in a new
    relation
  • JOIN
  • Combine different relations into one

25
SELECT Example
26
PROJECT Example
27
JOIN Example
28
JOIN Example
29
JOIN Application
30
Obtaining List of Employee ID and Working
Department
  • NEW1 ? JOIN ASSIGNMENT and JOB where
    Assignment.JobId JOB.JobId
  • NEW2 ? SELECT from NEW1 where ASSIGNMENT.TermDate
  • LIST ? PROJECT ASSIGNMENT.EmplId, JOB.Dept from
    NEW2

31
Mini Review
  • Based on the EMPLOYEE, JOB, and ASSIGNMENT
    relations, write a sequence of relational
    operations to obtain a list of all job titles
    within the personnel department

32
Answer
  • TEMP ? SELECT from JOB
  • where Dept PERSONNEL
  • LIST ? PROJECT JobTitle from TEMP

33
Mini Review
  • Based on the EMPLOYEE, JOB, and ASSIGNMENT
    relations, write a sequence of relational
    operations to obtain a list of employee names
    along with the employees department

34
Answer
  • TEMP1 ? JOIN JOB and ASSIGNMENT
  • where JOB.JobId ASSIGNMENT.JobId
  • TEMP2 ? SELECT from TEMP1
  • where TermDate
  • TEMP3 ? JOIN EMPLOYEE and TEMP2
  • where EMPLOYEE.EmplId TEMP2.EmplId
  • RESULT ? PROJECT Name, Dept from TEMP3

35
Relational DBMS
  • Include routines to perform SELECT, PROJECT, and
    JOIN operations
  • These routines can be called from the application
    software
  • The application software can be written as though
    the data were actually stored in the simple
    tabular form of the relational model
  • Store the relation as an indexed file or utilize
    hashing techniques to provide rapid access to
    entries

36
SQL
  • Structured Query Language
  • Developed by IBM, Standardized by ANSI
  • Example
  • select EmplId, Dept
  • from ASSIGNMENT, JOB
  • where ASSIGNMENT.JobId JOB.JobId
  • and ASSIGNMENT.TermDate

37
SQL Examples
  • Example
  • select Name, Address
  • from EMPLOYEE
  • Example
  • select EmplId, Name, Address, SSNum
  • from EMPLOYEE
  • where Name Cheryl H. Clark

38
SQL Examples
  • Example
  • select EMPLOYEE.Name, ASSIGNMENT.StartDate
  • from EMPLOYEE, ASSIGNMENT
  • where EMPLOYEE.EmplId ASSIGNMENT.EmplId

39
SQL Examples
  • Example
  • insert into EMPLOYEE
  • values(42Z12, Sue A Bert, 33 Fair St.,
    444661111)
  • Example
  • delete from EMPLOYEE
  • where Name G. Jerry Smith
  • Example
  • update EMPLOYEE
  • set Address 1812 Napoleon Ave.
  • where Name Joe E. Baker

40
Mini Review
  • Based on the EMPLOYEE, JOB, and ASSIGNMENT
    relations, write a sequence of SQL statements to
    obtain a list of employee names along with the
    employees department

41
Answer
  • select EMPLOYEE.Name, JOB.Dept
  • from JOB, ASSIGNMENT, and EMPLOYEE
  • where (JOB.JobId ASSIGNMENT.JobId )
  • and (ASSIGNMENT.EmplId
  • EMPLOYEE.EmplId)
  • and (ASSIGNMENT.TermDate )

42
Object-Oriented Databases
  • Consisting of objects that are linked to each
    other to reflect their relationships
  • Example
  • Classes EMPLOYEE, JOB, ASSIGNMENT
  • Each of these objects contains methods describing
    how the object should respond to messages
    regarding its contents and relationships.
  • Not need to write an exterior procedure

43
Object-Oriented Database
44
Object-Oriented DBMS
  • Maintains links between objects
  • Creates pointer system that may be required to
    record these associations
  • Provides permanent storage for the objects
    entrusted to it (persistency)
  • Allows the entire software system to be designed
    in the same paradigm
  • Encapsulates the technicalities of different data
    formats
  • Example intelligent objects, multimedia databases

45
Database Integrity
  • DBMS for personal use are relatively simple
  • The errors can be easily corrected
  • Financial loss is limited
  • Large, multiuser, commercial databases are with
    higher stakes
  • Cost of incorrect or lost data can be enormous
  • DBMS has to maintain the databases integrity

46
Commit/Rollback Protocol
  • Commit Point
  • The point at which all steps in a transaction
    have been recorded in the log
  • Has the information to reconstruct the
    transaction since the last backup
  • Rollback
  • Undo an incomplete transaction
  • Done when equipment malfunctions or illegal
    access or deadlock and so on occur
  • Cascading rollback problem

47
Incorrect Summary Problem
  • One transaction is in the middle of transferring
    funds from one account to another
  • Another transaction tries to compute the total
    deposits in the bank
  • Could result in a total that is either too large
    or too small depending on the order of
    transactions

48
Incorrect Summary Problem
Start transferring
Calculating the balance
Finish transferring
T2
T3
T1
49
Lost Update Problem
  • Exemplified by two transactions, each of which
    makes a deduction from the same account
  • One transaction reads the accounts current
    balance at the point when the other has just read
    the balance but has not yet calculated the new
    balance
  • Both transactions will base their deductions on
    the same initial balance
  • The effect of one of the deductions will not be
    reflected in the database

50
Lost Update Problem
Read balance
Read balance
Put back new balance
Put back new balance
T1
T2
51
Locking Protocol
  • Items within a database that are currently being
    used are marked with locks
  • Shared locks
  • Corresponding to transactions not going to alter
    the data item
  • Other transactions are also allowed to view the
    data item
  • Exclusive locks
  • Corresponding to transactions going to alter the
    data item
  • Other transactions are not allowed to access the
    data item

52
Deadlocks and Solution
  • Deadlocks occur when two transactions
  • Require exclusive access to the same two data
    items
  • Each obtains exclusive access to one of the items
  • Each insists on waiting for the other
  • Wound-wait protocol
  • Younger transaction is forced to release all its
    data items and roll back its activities

53
Sequential files
  • Sequential file file whose contents can only be
    read in order
  • Reader must be able to detect end-of-file (EOF)
  • Data can be stored in logical records, sorted by
    a key field
  • Greatly increases the speed of batch updates

54
A procedure for merging two sequential files
55
Applying the merge algorithm
56
The structure of a simple employee file
57
Indexed files
  • Index list of (key, location) pairs
  • Sorted by key values
  • location where the record is stored

58
Opening an indexed file
59
Hashing
  • Each record has a key
  • The master file is divided into buckets
  • A hash function computes a bucket number for each
    key value
  • Each record is stored in the bucket corresponding
    to the hash of its key

60
Hashing the key field
61
The rudiments of a hashing system
62
Collisions in Hashing
  • Collision when two keys hash to the same bucket
  • Major problem when table is over 75 full
  • Solution increase number of buckets and rehash
    all data

63
Data mining
  • Data mining a set of techniques for discovering
    patterns in collections of data
  • Relies heavily on statistical analyses
  • Data warehouse static data collection to be
    mined
  • Data cube data presented from many perspectives
    to enable mining
  • Raises significant ethical issues when it
    involves personal information

64
Data mining strategies
  • Class description
  • Class discrimination
  • Cluster analysis
  • Association analysis
  • Outlier analysis
  • Sequential pattern analysis

65
Social impact of database technology
  • Problems
  • Massive amounts of personal data are being
    collected
  • Often without knowledge or meaningful consent of
    affected people
  • Data merging produces new, more invasive
    information
  • Errors are widely disseminated and hard to
    correct
  • Remedies
  • Existing legal remedies largely ineffective
  • Negative publicity may be more effective

66
Exercise
  • Review Problems
  • 7, 8, 9, 10, 11, 20, 37, 40, 47, 54
Write a Comment
User Comments (0)
About PowerShow.com