Title: Lec. 1: INTRODUCTION TO DATABASES Reference: Connolly et al Ch 1
1Lec. 1 INTRODUCTION TO DATABASESReference
Connolly et al Ch 1
- INTRODUCTION
- SOME (VERY) BASIC FILE PROCESSING CONCEPTS
- PROBLEMS ENCOUNTERED WITH TRADITIONAL FILE
PROCESSING - THE DATABASE APPROACH
- What is a Database?, What does it contain? , What
is a DBMS?, What is SQL? - USER ROLES IN THE DATABASE ENVIRONMENT
- DAs, DBAs, Database Designers, Application
Programmers, End-users - INTRO TO DATABASE DESIGN
- What needs to be designed?, What are ERDs and
Schema Descriptions, Physical Database Design and
the Data Dictionary - DATA FIRST PROGRAMS SECOND
- Normalisation, Bottom-up Database Design (
your early lab work) - ADVANTAGES DISADVANTAGES OF DATABASE SYSTEMS
- THE HISTORY OF DATABASE SYSTEMS
-
- All of these topics will be examined in much
more detail in later lectures
2INTRODUCTION
- We have learnt that from the days of file-based
systems (1960s 70s) there are better ways to
handle enterprise data. - All of todays high-volume on-line transaction
processing (OLTP) systems are completely reliant
upon databases and DBMS e.g. - ATM networks
- Utility billing systems
- Airline and travel systems
- The history of database system research is one of
exceptional productivity and startling economic
impact. Major developments are still taking
place. E.g. - Data warehousing OLAP, massively parallel video
servers, etc., etc., etc.
3SOME (VERY) BASIC FILE PROCESSING CONCEPTSManual
File Processing
What do we call a collection of related
records? What does a manual record usually
contain? (See next slide)
4SOME (VERY) BASIC FILE PROCESSING CONCEPTS Data
and Meta Data
How does the Payroll Clerk know what this group
of data characters is supposed to
represent? What is meta data?
5SOME (VERY) BASIC FILE PROCESSING CONCEPTSWhere
is the Meta Data Held in a Traditional Computer
File Processing System?
When the program reads a record from the Payroll
file what data and meta data does it get? DATA
123456FALL, Steve 100002100010000SMITH, J
09100010700SMITH, J 16100009700SMITH, J
23100011000BROWN, B 30100010650SMITH, J META
DATA? Where is the meta data stored?
6SOME (VERY) BASIC FILE PROCESSING CONCEPTS The
program-data dependence in traditional file
processing
This is a traditional COBOL-like program. The
Data Division part of the program contains meta
data that describes the format of every character
the program will read/write AND gives it a name
by which it will be referred to in this
program The Procedure Division part of the
program contains statements that manipulate
(read/write and change) the data. If the
structure of the data on the payroll file was
changed then every program that used that file
would need to be changed.
7OTHER PROBLEMS ENCOUNTERED WITH TRADITIONAL FILE
PROCESSINGThe Old Approach
A separate master file is created for each
application. Each application is responsible for
updating its own master file. What happens when
the auditors check, on Wednesday, how many
employees we have on the Personnel file and how
many we are paying on the Payroll
file? Time-based data inconsistency problem
8PROBLEMS ENCOUNTERED WITH TRADITIONAL FILE
PROCESSINGMore Problems
Data redundancy same (similar) data is stored on
several different files Lack of data integrity
different copies of the same data have different
values because different programs use different
validation rules. Data inconsistency lack of
data integrity leads to content
inconsistency. Isolation of data data created
by one application is often not available to
another application that requires it. (e.g.
Personnel system may set a suspend without pay
flag on the Personnel master but it is the
Payroll system that actually makes
payments!). Program-data dependence What does
this mean???
9THE DATABASE APPROACHWhat is a Database?
- A Database is
- a shared collection of logically related data
AND a description of this data, - designed to satisfy the information requirements
of an organisation. - OR
- a self-describing collection of integrated
records to meet an organisations information
needs.
10THE DATABASE APPROACHWhat does a Database
contain?
- So a database contains
- the organisations operational data,
- the relationships between the data collections,
- the description of this data - the data
dictionary (catalogue) or meta data. - The data dictionary is said to provide
program-data independence.
11THE DATABASE APPROACHWhat is a Database
Management System (DBMS) and What is SQL?
- A DBMS is a software package that enables users
to - define and create data definitions (using Data
Definition Language -DDL), - manipulate and update data (using Data
Manipulation - DML), - control access to their database. (using Data
Control Language - DCL). - The DBMS also acts as an interface between the
users application programs and the shared
database. - The most widely used database language is a
non-procedural language called SQL (Structured
Query Language) which contains all three of the
above types of language statements (DDL, DML,
DCL)
12THE DATABASE APPROACH Database, DBMS and Classes
of Users
13USER ROLES IN THE DATABASE ENVIRONMENT
- Data Administrators
- Database Administrators
- Database Designers
- Application Programmers
- End-users
14USER ROLES IN THE DATABASE ENVIRONMENT Data
Administrator (DA) and Database Administrator
(DBA)
- The DA operates at the management level of the
organisation and is responsible for - the data resource and database planning
- development and maintenance of policies and
procedures logical database design. - The DBA operates at the operational level of the
organisation and is responsible for - the daily maintenance and performance of the
whole system - physical database design and implementation.
- (In smaller organisations these two roles are
usually combined)
15USER ROLES IN THE DATABASE ENVIRONMENT
Database Designers
- In large systems the Designers have two areas of
design - Logical database design data modelling
(entities, attributes, relationships),
constraints including business rules for the
organisation - Physical database design takes the logical data
model and decides how it is to be physically
realised into actual tables and integrity
constraints, access method, and security methods.
16USER ROLES IN THE DATABASE ENVIRONMENT
Application Programmers and End-users
- Application programmers
- write end-user programs as specified by the
systems analyst - they need to embed SQL (DML) statements in the
programs they write in order to access and update
the database. - There are two basic types of end-user
- Naïve users lack basic database knowledge and
work only with application programs eg for data
entry and reporting (or you when you use an ATM
machine!) - Sophisticated users have some knowledge of SQL
and can work independently of programmers to
produce their own reports. (this will be you when
you have completed this unit!)
17INTRO TO DATABASE DESIGNWhat needs to be
designed?
- Logical database design is concerned with
answering the following questions - What things (entity types) do we want to store
data about on the database? - What attributes do we need to store for each
entity type? - What are relationships (that we need to store)
between the entity types? - (Try answering these questions for a student
grade recording and reporting database) - The designer uses the answers to these questions
to create a description of the database structure
called a schema description.
18INTRO TO DATABASE DESIGNWhat does a Schema
description consist of?
- The schema description is basically a description
of the tables that will need to be stored on the
database. -
- There will need to be a separate table for each
entity type Tables are two dimensional consisting
of - Columns. Each column is used to store the various
attribute values of each entity - Rows. Each row is used to store all the different
attribute values of a particular entity. - There will also need to be a method of describing
the relationships that exist between the
different entity types. For this the designer
will create an Entity-Relationship Diagram (ERD)
19INTRO TO DATABASE DESIGN ERD and Schema
Descriptions
Schema Description for Student Grade database
Entity-Relationship Diagram (ERD) for Student
Grade database
Student (StIDNo, STName) Unit (UnCode,
UnName) Grade (StIDNo, UnCode, Grade)
Possible contents of Grade Table
Is there anything else we might need to include
in our logical design?
20INTRO TO DATABASE DESIGN Physical Database
Design and the Data Dictionary
- Physical database design involves
- Converting the logical schema into DDL statements
that the DBMS can understand. - Deciding upon which users will have access to
which tables and specifying it in DCL statements
that the DBMS can understand - All the physical database designs will
automatically be stored by the DBMS in the Data
Dictionary (catalogue). ALL the logical database
designs should also be stored in the data
dictionary.
21DATA FIRST PROGRAMS SECOND
- The old approach
- a separate master file for each application
- The database approach
- one shared database for all applications
-
- For the database to succeed the organisation
has to think data first, program second. This
paradigm shift of thinking is necessary if the
system is to be acceptable to end-users. - Connolly et al
22DATA FIRST PROGRAMS SECONDBottom-up Database
Design ( your early lab work)
- How could you go about designing a new database
to replace the old Payroll, Personnel, and Super
applications master files? - Collect samples of all the output screens and
reports produced by the three applications - Normalise the data contained on each output (as
you are doing in the labs) to create simple
tabular data structures - Merge the any of the tables that contain columns
that describe the same type of entity (i.e. that
have the same primary key). - Work out the relationships between your
normalised tables - Create an ERD and Schema and create your new
relational database. - (This is called bottom-up database design)
- How do we design a database if we dont know
exactly what outputs are to be produced? (covered
later!)
23THE HISTORY OF DATABASE SYSTEMS
- 1ST Generation Hierarchical DBMS (1960s)
- IBMs (BOM BOMP) and IMS
- 2nd Generation Network and CODASYL (1967 -DBTG)
- IDMS. IDSII, etc.
- 3rd Generation Relational early RDDBMS
(mid-1970s) - IBMs DB2 and ORACLE SQL/DS 1980s
- INGRES, Paradox and dBaseIV from Borland, ACCESS
from Microsoft - Client-Server RDBMS(mid-1980s) Oracle. Sybase,
Informix, SQL Server - 4th Generation DBMSs Post-Relational (1990s)
- Object-Oriented DBMS (e.g. Gemstone)
- Open DBMS standards (OMG and CORBA) and the WWW
- Heterogenous DBMSs , Data Warehousing, and OLAP
24ADVANTAGES DISADVANTAGES OF DATABASE SYSTEMS
- Advantages
- Control of data redundancy
- Data consistency
- Data sharing (between people AND applications -
this allows the proper business integration of
the applications) - Improved data of data integrity, backup and
recovery, and access security (necessary because
all the data is in one basket) - Improved developer productivity
- Improved maintenance productivity
- Flexible ad hoc user inquiries
25ADVANTAGES DISADVANTAGES OF DATABASE SYSTEMS
- Disadvantages (textbook)
- Complexity (therefore different skills, training,
etc. required) - Cost of conversion
- Higher impact of failure
- Size of DBMS (requires extra memory, processing
power, disk space) - Performance (a general purpose DBMS may be less
efficient than a specialised file handler) - (The last two of these are becoming increasingly
insignificant as hardware costs continue to
decline)