Theory of Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Theory of Databases

Description:

1970s style applications. Business context. simple business model, limited automation, access etc ... Classical DBs support data abstraction at many levels ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 47
Provided by: meurig
Category:
Tags: databases | theory

less

Transcript and Presenter's Notes

Title: Theory of Databases


1
Theory of Databases
  • Module lecturer Dr Meurig Beynon, Room 3.15
  • Module Content
  • Generalities on databases
  • Database models Theory and Practice
  • Entity-relationship models
  • Relational models algebraic and logical
  • Commercial relational languages

2
CS319 Theory of Databases
  • Module Content (cont.)
  • Technical Issues and Algorithms
  • Optimisation
  • Integrity constraints
  • Database design
  • Future Trends Theory and Practice
  • Object-oriented models
  • Extended relational systems
  • The Relational Model and SQL

3
Books on Classical Relational Databases
  • Date, C.J. A Database Primer
  • Date, C.J. An Introduction to Database Systems
    (Vol. 1) Addison-Wesley 6th edition, 1995
  • Korth, H.F. and Silberschatz Database System
    Concepts McGraw-Hill 2nd edition 1991
  • Ullman, J.D. Principles of Database Systems 2nd
    edition Computer Science Press 1982

4
Books on more general database issues
  • Ullman, J.D. Principles of Database
    Knowledge-base Systems Vols 1 2 Computer
    Science Press 1988
  • Brown, A.W. Object-oriented Databases
    Applications in Software Engineering McGraw-Hill
    International Series in Engineering, 1991
  • Gray, P.D. Logic, Algebra Databases,
    Ellis-Horwood Series, Computers and their
    Applications 29
  • Kent, William Data and Reality basic assumptions
    in data processing reconsidered, North-Holland
    1987

5
Books on Relational Theory in its application
  • Date, C.J. Relational Database Writings 1985-1989
    Addison-Wesley and other collections in this
    series
  • Hugh Darwen and C.J.Date Foundations for
    Object/Relational Databases The Third Manifesto
    1998 Harlow, Addison-Wesley Longman
  • Dr Hugh Darwen was an IBM consultant on
    databases
  • he is our guest lecturer in weeks 18 and 19
  • NOTE he will give 4 lectures in week 18 thus
  • Monday February 21th at 2pm and 5pm in CS1.04
  • Thursday February 24th at 2pm to 5pm in CS1.04
  • He will also give 4 lectures in week 19 at
    similar times

6
CS319 Schedule for 2005
  • Dr Hugh Darwen was an IBM consultant on
    databases
  • he is our guest lecturer in weeks 18 and 19
  • NOTE the lectures in week 18 will be on The
    Relational Model and SQL thus
  • Monday February 21st at 2pm and 5pm in CS1.04
  • Thursday February 24th at 2pm and 5pm in CS1.04
  • the lectures in week 19 will be on
  • Temporal Data and the Relational Model thus
  • Monday February 28th at 2pm and 5pm in CS1.04
  • Thursday March 3rd at 2pm and 5pm in CS1.04

7
DB generalities What is a database?
  • Chris Date
  • Database computer-based record keeping system
  • R.W. Engles "A Tutorial on DB Organisation"
    (1974)
  • collection of stored operational data used by
    the applications system of a particular
    enterprise
  • enterprise hospital, university, bank, company
    etc
  • operational data
  • data on products, accounts, patients etc
  • typically persistent cf conventional program IO
    data

8
DB generalities Why use a database?
  • Case-study Banking (after Korth Silberschatz
    Chap. 1
  • ? How to meet needs using a traditional
    file-processing system supported by a
    conventional OS
  • Files permanent records of customers, accounts
  • Applications programs (APs) enable user to
    modify files
  • to credit or debit an account
  • to add a new account
  • to find the balance in an account
  • to generate monthly statements
  • APs written by systems programmers as required
  • new requirements ? new files new programs

9
Original context for data modelling 1
  • 1970s style applications
  • unsophisticated computer users
  • batch mode interaction
  • modest response times
  • no visualisation or GUI
  • modest expectations for
    ease-of-use
  • programming perceived as technical
  • simple computing infrastructure and
    environment
  • no PC, web etc
  • no live feeds of data
  • textual interaction the norm

10
Original context for data modelling 2
  • 1970s style applications
  • Business context
  • simple business model, limited automation,
    access etc
  • low volume of data
  • not initially distributed
  • Computing context
  • - existing/emerging DB proposals unconvincing
  • - computers not very powerful
  • - human and computing resources very expensive

11
Summary of issues for data management
  • Problems of data management for file systems
  • that DBs were originally intended to address
  • Data redundancy and inconsistency
  • Difficulty in accessing data
  • Data isolation
  • Concurrent access anomalies
  • Security problems
  • Integrity problems

12
DB generalities s and s of DB use
  • Conventional file systems have certain
    characteristics
  • will review the key issues for data management
  • indicates a positive impact of using a database
  • indicates a potentially negative impact of
    using a DB

13
DB generalities Issues for data management
  • Problems that DBs were originally intended to
    address
  • Data redundancy and inconsistency
  • Difficulty in accessing data
  • Data isolation
  • Concurrent access anomalies
  • Security problems
  • Integrity problems

14
DB generalities Issues for data management
  • Data redundancy and inconsistency
  • each programmer potentially uses different format
    file, develops at different stage in history of
    enterprise
  • data duplicated
  • in a DB rationalise and standardise data
  • rationalise conceptually create an authentic
    shared source for data
  • rationalise doesn't necessarily mean
    centralise

15
DB generalities Issues for data management
  • Data redundancy and inconsistency
  • each programmer potentially uses different format
    file, develops at different stage in history of
    enterprise
  • data duplicated
  • compromises are needed,
  • where users suit themselves, can get efficient
    results
  • vs no perfect data organisation to suit all
    users
  • duplication can give insurance against info loss

16
DB generalities Issues for data management
  • Difficulty in accessing data
  • have to respond to unforeseen requests, hence new
    functionality
  • in file-processing environment, have to write new
    programs, and possibly devise new data structures
  • in a DB, simplify access and manipulation by
    intelligent organisation of data cf. modelling
    approach to requirements, as e.g. in use of UML
    in OOSE

17
DB generalities Issues for data management
  • Data isolation
  • data has to be retrieved from many sources when
    APs written
  • in DB, aim to hide the source and form of
    physical data by viewing the data at a higher
    level of abstraction
  • automation decreases the amount of human
    interaction with data
  • risk of corrupted data passing between
    integrated files is greater

18
DB generalities Issues for data management
  • Concurrent access anomalies
  • would like multiple access for efficiency and
    faster response time e.g. simultaneous withdrawal
  • concurrency can't be managed without a form of
    overall control

19
DB generalities Issues for data management
  • Security problems
  • would like to restrict access to authorised users
    for confidential info
  • security can't be managed without a form of
    overall control
  • issue as to whether this control is most easily
    exercised inside or outside computer system e.g.
    non-trivial problem to determine what can be
    inferred from responses to queries that aren't
    explicit

20
DB generalities Issues for data management
  • Integrity problems
  • data in file system must satisfy integrity
    constraints constraints may arise dynamically
    difficult to modify programs to cope with this
    also hard to guarantee integrity if data is
    stored in different files
  • automated management demands some form of
    overall control
  • automation reduces scope for human intervention
    / interpretation

21
DB generalities Issues for data management
  • Conclusion from above discussion ...
  • For many commercial applications (as in
    enterprises above) good solution is offered by a
    database management system (DBMS). A DBMS is an
    unconventional OS operating over a structured
    file system. The s above indicate some of the
    positive benefits of the use of a DBMS.

22
DB generalities Issues for data management
  • For many commercial applications (as in
    1970s-style enterprises above) good solution is
    offered by a database management system (DBMS).
  • A DBMS is an unconventional OS operating over a
    structured file system.

23
Generalities of DBs the DBMS concept
  • Motivating idea
  • devise an abstract model of the entire corpus of
    operational data that simplifies the data
    processing activity, so that
  • simple queries can be handled without writing new
    application programs
  • where applications programs must be written, the
    task of accessing and manipulating operational
    data consistently and efficiently is greatly
    simplified

24
Generalities of DBs themes of the module
  • In this module, primarily concerned with a
    particular data model ("the relational database")
    that has proved very successful in meeting the
    needs of many commercial enterprises involved in
    data processing.
  • For other applications, such as interactive
    design, different principles seem to be required.
    For instance, human interaction and intervention
    in data processing for design is of the essence.
    In the latter part of the course, will consider
    alternative approaches (e.g. object-oriented
    databases, deductive databases).

25
Generalities of DBs themes of the module
  • There is a management aspect to the use of DBs
    the issues above can to some extent be addressed
    either by manual processes or through automation.
    In some cases (e.g. security), it isn't clear
    which is most effective. Hardest to automate in
    areas where essentially new data and new modes of
    organising data are being generated dynamically.
  • Relational databases are arguably associated with
    applications of a particular kind, where the
    business processes are especially stable and
    ritualised.

26
Generalities of DBs themes of the module
  • Two views of impact of databases
  • can view the DBMS
  • as a program generator for the end-user
  • cf. current research on end-user programming
  • as a means to record persistent real-world state
  • cf. current research on virtual reality
  • Key issue Is it possible to align paradigms for
    programming and general-purpose data modelling?

27
DB generalities the ingredients of a database
  • Data
  • integrated
  • shared
  • possibly distributed
  • Hardware
  • primary storage secondary storage
  • Software
  • database management system DBMS
  • protects users from hardware level detail
  • serves the needs of all users

28
DB generalities the ingredients of a database
  • Users
  • end-user
  • non-specialist accessing data via a query
    language
  • naïve user accessing data via a special-purpose
    interface
  • performs data retrieval and update (extend /
    modify)
  • applications programmer
  • writes programs that use the DB by embedding
    queries to the DB in a HLL
  • develops interfaces for the naïve user

29
DB generalities the ingredients of a database
  • Users
  • Database Administrator (DBA)
  • responsible for overall control
  • decides what data is to be stored
  • designs the conceptual scheme
  • used to represent the operational data
  • implements authorisation checks
  • decides strategy for backup and recovery
  • monitors performance
  • oversees modification to suit user requirements

30
DB generalities data abstraction in a database
  • Data abstraction addresses issues of use,
    design, management and implementation in a
    database
  • The data model serves to describe in a formal
    manner the way in which data is viewed at three
    different levels of abstraction
  • physical level
  • conceptual level
  • view level

31
DB generalities data abstraction in a database
  • physical level
  • how is the data actually represented in the
    hardware?
  • bits, bytes
  • conceptual level
  • what meaningful relationships are expressed by
    the physical data?
  • entities, and relationships between entities
  • view level
  • what particular relationships are required by
    users?
  • more abstract because partial typically very
    high-level knowledge constitutes the view

32
DB generalities data abstraction in a database
  • Illustrating data abstraction
  • Data base stores the date of birth of a client
    as a bit string. When we identify the senior
    citizens, we find all clients aged over 65.
  • Representations at different levels of
    abstraction
  • conceptual date of birth of a client
  • physical the bit string that records this
    information
  • view refers to age, which isn't stored in the
    DB.

33
DB generalities data abstraction in a database
DESIGN MANAGEMENT
USE
IMPLEMENTATION
34
DB generalities data abstraction in a database
  • Internal and external translation schemas serve
  • to protect the conceptual model from change when
    physical organisation changes or new views are
    required
  • to protect the user from a need to change views
    and the database programmer from a need to alter
    the physical organisation if and when the
    conceptual model is modified

35
DB generalities data abstraction in a database
  • The DBA conceives the database in terms of the
    conceptual model.
  • Users and application programs access the
    physical data via the conceptual model.
  • physical data independence protecting the
    conceptual model from change when the physical
    organisation changes
  • logical data independence protecting the user
    from the need to change views when the conceptual
    model changes

36
DB generalities data abstraction in a database
  • Analogous to the way in which the programmer is
    protected from low-level issues by using ADTs
  • program ? Abstract Data Type ? ADT implementation
  • physical data independence
  • cf. change in an ADT implementation
  • logical data independence
  • cf. use of old program over enhanced ADT

37
Recall - Generalities of DBs the DBMS concept
  • Motivating idea
  • devise an abstract model of the entire corpus of
    operational data that simplifies the data
    processing activity, so that
  • simple queries can be handled without writing new
    application programs
  • where applications programs must be written, the
    task of accessing and manipulating operational
    data consistently and efficiently is greatly
    simplified

38
Characteristics of electronic data 1970 (1)
  • Abstract model of the entire corpus of
    operational data
  • Separation between persistent transient data
    sharper
  • file vs executing program
  • Isolation of persistent data more complete
  • changes to persistent data initiated by human
    action
  • persistent data accessed through text interfaces
  • Electronic data storage management rare,
    expensive
  • intelligent interpretation of electronic state
    by human
  • no direct connection between environment and data

39
Modern context for general data modelling
Programs
40
Characteristics of electronic data 1970 (2)
  • Abstract model of the entire corpus of
    operational data
  • Demands of the abstract model in 1970 quite low
  • small volumes of data, modest performance
  • limited levels of volatility and automation
    tolerated
  • Today is very different, BUT subject to viewing
    human agency as a metaphor for any agency, the
    key issues to be addressed by a classical
    database are still vital
  • Any DB modelling paradigm must handle 70s problems

41
DB generalities data models for a database
  • Many different paradigms have been proposed for
    developing abstract data models for databases
  • There are two principal kinds of abstract data
    model
  • object-based models
  • record-based models
  • The earliest database systems were record-based
  • - this reflects the file system culture that they
    displaced

42
DB generalities data models for a database
  • Object-based models
  • The main models in this category are
  • entity-relationship models
  • object-oriented data models
  • Others include semantic and functional data
    models.
  • E-R model widely used to model data abstractly
  • OO model gaining acceptance in practice
    effectively represents data operations on data.

43
DB generalities data models for a database
  • Record-based Logical Models
  • Used at the conceptual and view levels. Specify
    both
  • overall logical structure of the database
  • higher-level description of the implementation.
  • Record-based because uses records in fixed-format
    of several types. This simplifies implementation
    cf. trend towards richness and variety in
    structures used to implement OODBs

44
DB generalities data models for a database
  • Varieties of record-based logical model
  • hierarchical model
  • records links organised as a family of trees
  • network model
  • records links organised as a family of graphs
  • relational model
  • uses tables to record relationships between data

45
DB generalities data models for a database
  • Physical Data Models
  • There are also models of data at the lowest level
    of abstraction, concerned with physical
    organisation. These are not our primary concern
    in this module.
  • Relevant issues for relational databases include
  • are data tables stored using hashing?
  • how are data tables indexed?
  • how are entries in data tables encoded and
    ordered?
  • what algorithms are used to retrieve and update?

46
DB generalities classical database features
  • Instances and Schemes
  • State of a DB changes over time distinguish
    structure of DB from current state as defined by
    the data in it.
  • overall design of DB database scheme
  • current content of DB instance of the DB
  • Useful analogy with procedural variables
  • database scheme type definition for variable
  • instance of database value of the variable

47
DB generalities classical database features
  • Classical DBs support data abstraction at many
    levels
  • The three levels of abstraction at which the
    database can be viewed are modelled by
    appropriate schemes.
  • physical scheme at the lowest level
  • conceptual scheme at intermediate level
  • several subschemes (possibly user-defined)
  • at the highest level (also called views of the
    DB)

48
DB generalities classical database features
  • Data Definition Language (DDL)
  • database scheme is defined using a DDL
  • compiling the DDL description creates a Data
    Dictionary
  • the storage and access methods used by the DB
  • are specified in a storage and definition
    language
  • Implementation details for storage are usually
    hidden from users

49
DB generalities classical database features
  • Data Manipulation Language (DML)
  • data manipulation means accessing DB to
  • retrieve, insert, delete, or modify data
  • most common use of DML is for data retrieval
    informally described as "querying the DB"
  • retrieval component of DML query language
  • (and by abuse, sometimes use term query
    language as synonym for DML)

50
DB generalities classical database features
  • Varieties of Data Manipulation Language
  • There is a tension between
  • efficiency at physical level
  • intelligibility / ease of use at higher level
  • Have both procedural and non-procedural DMLs
  • procedural requires knowledge of data
    implementation
  • non-procedural need only specify what data is
    needed

51
DB generalities classical database features
  • Data Manipulation Languages for typical data
    models
  • object-based, hierarchical, network models have
    procedural DMLs
  • user can take explicit responsibility for
    optimising queries, but needs knowledge of data
    organisation
  • relational models use non-procedural DMLs
  • can formulate queries without knowledge of data
    organisation, but implementation has to be
    optimised

52
DB generalities classical database features
  • Database Manager (as program module!)
  • interfaces between low-level data in DB and
    application programs user queries.
  • Large volumes of data (relative to available
    technology)
  • gigabytes thousand megabytes 1 billion bytes
    !
  • terabytes million megabytes 1 trillion bytes
  • Requires auxiliary storage media, such as disk,
    CD etc.
  • Optimisation is primarily concerned with
    eliminating
  • data transfers between main and auxiliary memory.

53
DB generalities classical database features
  • Functions of the DB manager program module
  • query processing
  • interacting with the file manager modules doing
    actual operations on physical data
  • integrity enforcement
  • checking that data in the DB conforms to
    specified constraints
  • security enforcement
  • ensuring that authorisation is given for access
    to data
  • backup and recovery
  • coping with failure, and recovery to consistent
    DB state
  • concurrency control
  • ensuring that simultaneous transactions do not
    interfere.

54
DB generalities classical database features
  • Role of the Database Administrator
  • ... functions performed by the DBA .
  • scheme definition using DDL
  • specification of storage structure and access
    method
  • modification of schemes and physical storage as
    required
  • granting authorisation for data access
  • specifying integrity constraints

55
DB generalities classical database features
  • Overall system structure
  • Processing components
  • file manager
  • database manager
  • query processor
  • DML precompiler (to process DML embedded in APs)
  • DDL compiler
  • Data structures
  • data files actual content of db
  • data dictionary meta-data
  • indices auxiliary files to assist fast access

56
to follow Will review features of a DB system
with reference to Ingres and Quel
Write a Comment
User Comments (0)
About PowerShow.com