What is a Database - PowerPoint PPT Presentation

1 / 52
About This Presentation
Title:

What is a Database

Description:

Some are STILL in use! Problems with this approach? Changes ... Represented by the presence or absence of a 'crows foot' One-to-One. One-to-Many. Many-to-Many ... – PowerPoint PPT presentation

Number of Views:41
Avg rating:3.0/5.0
Slides: 53
Provided by: uobcommun
Category:
Tags: crows | database

less

Transcript and Presenter's Notes

Title: What is a Database


1
What is a Database?
  • Definition A database is a structure that can
    store information about multiple types of
    entities, the attributes of those entities, and
    the relationships among the entities. (Pratt
    Adamski, 2003)

2
Traditional Data Processing
  • Languages like COBOL and FORTRAN
  • Data in files
  • Not interfaced
  • Some are STILL in use!

3
Problems with this approach?
  • Changes difficult
  • Errors and inconsistencies data integrity
  • Same data in multiple places
  • Duplication and redundancy

4
Advantages of a DBMS
  • Reduction of redundancies and errors within the
    data
  • Easier to extract information
  • The system is more flexible
  • Security is improved
  • Integrity can be improved
  • Data model needs to be produced

5
Disadvantages of a DBMS
  • Takes longer to access the data
  • Need special knowledge
  • Applications may be dependant to specific DBMS
    versions
  • Initial cost may be large.

6
Characteristics
  • A good DBMS has the following
  • Data integrity
  • Security
  • Concurrency control
  • Transaction processing
  • Recovery functions

7
Database Lifecycle
8
ER Terminology - Relationship
  • A set of associations between one or more entity
    type. (Connolly Begg, 2002)
  • Examples
  • A Staff member works at a Branch
  • A Branch has Staff members

works at
has
9
ER Terminology - Cardinality
  • Represented by the presence or absence of a
    crows foot
  • One-to-One
  • One-to-Many
  • Many-to-Many

10
ER Terminology - Optionality
  • Whether the information is mandatory or optional
  • A customer may not have an invoice
  • An invoice will have a customer
  • Represented by the circles and bars

11
Example Relationship
12
Developing Models
  • First stage initial study
  • gather information
  • Second stage database design
  • inc. development of the ER model based on the
    information gathered
  • Iterative process (sometime you have to go back
    to the first stage)

13
Attribute
  • State of an entity
  • Correspond to columns in a table
  • Book
  • ISBN
  • Author
  • Name
  • Publisher

14
Primary Key
  • Attribute(s) that are distinct for each entry
  • Primary key is UNIQUE
  • Examples Student Number, Book ISBN, PIN,
    Drivers Licence Number, Course Code...

15
Intelligent keys
  • When business-related columns are used as primary
    keys they are often called intelligent or natural
    keys.
  • For example, if the user gives each customer a
    unique customer number that value might naturally
    serve as the primary key for the customer table
    (e.g. StudentID, DriverLicenceID).
  • An alternative is to use system-generated
    artificial primary key values. These are often
    called surrogate keys because they are
    replacements for the intelligent keys, or blind
    keys because the user doesn't see them.

16
Surrogate keys
  • According to the Webster's Unabridged Dictionary,
    a surrogate is an "artificial or synthetic
    product that is used as a substitute for a
    natural product."
  • That's a great definition for the surrogate keys
    we use in data warehouses. A surrogate key is an
    artificial or synthetic key that is used as a
    substitute for a natural key.

17
Foreign Keys
  • A foreign key attribute set is a primary key
    attribute in one table that is repeated as a
    foreign key in another table
  • A non-key attribute in one relation that also
    appears as a primary key in another relation

18
What is Normalization?
  • Reorganisation and evaluation of table structures
  • Produces a set of stable, well structured tables.
  • Remove repeating/redundant data

19
Why Normalize?
  • Remove redundant data
  • Reduce update anomalies
  • Reduce delete anomalies
  • Reduce insert anomalies

20
Normal Forms
  • UNF a table that contains one or more repeating
    groups
  • 1NF A relation is in 1NF is all value are
    atomic and no rows are repeated.
  • 2NF A relation is said to be in 2NF if it is in
    1NF and every non-key attribute is fully
    functionally dependent on the primary key
  • 3NF A relation is said to be in 3NF if it is in
    2NF and every non-key is non-transitively
    dependent on the primary key
  • All attributes in every table must be determined
    by the key, the whole key, and nothing but the
    key

21
Un-normalized Data
  • Grade-Report
  • sNumber sName sAddress uNumber uTitle iNam
    e iRoom sMark
  • 38214 Bright 123 Smith St IS350 Database CODD B104
    A
  • 38214 Bright 123 Smith St IS465 Sys
    Anal KEMP B213 C
  • 69173 Smith 35 Main Rd IS465 Sys Anal KEMP B213
    A
  • 69173 Smith 35 Main Rd PM300 Proj Mgt LEWIS D317
    B
  • 69173 Smith 35 Main Rd QM440 OpSys KEMP B213 A
  • What entities are present?
  • What fields do they have?

22
Normalization Conclusion
  • All attributes in every table must be determined
    by the key, the whole key, and nothing but the
    key

23
SQL (Structured Query Language)
  • Developed by IBM in late 70s for DB2
  • SEQUEL
  • Used to
  • Build
  • Alter
  • Query

24
Data Manipulation Language
  • Used to retrieve, add, modify and delete
  • Commands
  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • ANSI vs Various DBMS implementations

25
SELECT STATEMENT
  • SELECT column(s)
  • FROM table(s)
  • WHERE IS, NOT, NULL, , ltgt, gt , lt, IN, NOT IN,
    BEETWEEN, LIKE
  • GROUP BY column(s)
  • HAVING COUNT
  • ORDER BY column(s) ASC, DESC

26
SQL Functions
  • Five functions
  • AVG
  • MAX
  • MIN
  • SUM
  • COUNT
  • MS-Access functions
  • DATE()
  • LCASE/UCASE
  • MID
  • LTRIM/RTRIM
  • Etc

27
Database Development Activities
  • Six main phases
  • Enterprise modelling
  • Conceptual data modelling
  • Logical database design
  • Physical database design and creation
  • Database implementation
  • Database maintenance

28
Data and Database Administration
  • Two factors are driving the changes in the data
    administration and DB administration
  • the availability of more technologies and
    platforms that must be managed concurrently
  • The increased pace of business changes

29
Growth Change
  • What happens when the database grows?
  • Analyse space utilization and performance
  • Additional space allocated
  • Change in structure
  • Check if already available (where?)
  • Y use current data
  • N add new data-types or relationships
  • Change in usage
  • Move to faster devices
  • Change placement (clustering)
  • Change contents

30
DA vs DBA
  • Strategic planning
  • Sets long term goals
  • Sets Policies and standards
  • Broad scope
  • Long term
  • Managerial orientation
  • DBMS-independent
  • Control and supervision
  • Executes plans
  • Enforces policies / procedures prog stand
  • Narrow scope
  • Short term (daily)
  • Technical
  • DBMS-specific

31
6 Types of Information Systems
  • Transaction Processing Systems (TPS)
  • Management Information Systems (MIS)
  • Decision Support Systems (DSS)
  • Executive Information Systems (EIS)
  • Expert Systems (ES)
  • Geographical Information Systems (GIS)

32
Organisations
  • Organisation classification (Mintzberg, 1979)
  • Machine Bureaucracy
  • Divisionalised Bureaucracy
  • Entrepreneurial structure
  • Professional
  • Ad hoc racy

33
Distributed DBMS
  • Formal Definition
  • A distributed database management system (DDBMS)
    governs the storage and processing of logically
    related data over interconnected computer systems
    in which both data and processing functions are
    distributed among several sites.

34
Distributed DBMS Characteristics
  • A collection of logically related shared data
  • Data is split into a number of fragments
  • Fragments may be replicated
  • Fragments/replicates are allocated to sites
  • The sites are linked by a communication network
  • The data at each site is under the control of a
    DBMS
  • The DBMS at each site can handle local
    applications
  • Each DBMS participates in at least one global
    application

35
DDBMS Commandments
  • Local site independence
  • Central Site Independence
  • Failure Independence
  • Location Transparency
  • Partitioning Transparency
  • Replication Transparency
  • Distributed Query Processing
  • Distributed Transaction Processing
  • Hardware Independence
  • Operating System independence
  • Network Independence
  • Database Independence

36
Distributed Database Design
  • Data partitioning
  • Horizontal partitioning
  • Vertical partitioning
  • Mixed

37
Transaction processing terms
  • A transaction is an atomic unit of work. i.e.
    changes to the database that must be done
    together.
  • For example, in discharging a patient from a
    hospital the system needs to update tables to
    indicate his bed is free and needs to update
    tables to store his bill. Doing one without the
    other would be incomplete.

38
Transaction processing terms
  • A program that changes data in tables is called
    transaction.
  • Single user vs Multi user. DBMS may allow more
    than one user to update data stored in the DB.
    However users may update the same data at the
    same time i.e. Concurrently
  • Concurrency control is a term that describes
    procedures used to ensure correct interleaving of
    transactions

39
Transaction processing
  • Begin transaction
  • Read, Write
  • End transaction
  • Commit transaction
  • Rollback
  • Redo
  • Undo

40
Concurrency Control
  • A program that changes data in tables (as an
    atomic unit of work) is called transaction
  • Concurrency control (CC) controls flow in a
    multi-user database
  • CC is managing and maintaining data integrity
  • CC is related to transactions

41
Two Types of Locks
  • Read / Shared others allowed to read locked
    data but cannot update. Placing a shared lock
    prevents another user for placing exclusive lock
    on that record set
  • Write / Exclusive prohibits other transaction
    from accessing data (reading). A transaction
    should put a write lock on record set when is
    about to update that record set. Placing write
    lock prevents users from placing any type of lock
    on that record set
  • LOCK TABLE Statement
  • LOCK TABLE tableName IN SHARE/EXCLUSIVE MODE

42
Deadlocks
  • Also referred to as Deadly Embrace
  • Locking introduces this problem
  • Example
  • Application A puts read lock on table X
  • Application B puts read lock on table X
  • A requests write lock to X (wait)
  • B requests write lock to X (wait)

43
Deadlock
  • Results when two or more transactions have locked
    a common resource, and each waits for the other
    to unlock that resource.
  • Unless the DBMS intervenes (by maintaining a
    matrix of usage), the transactions will wait
    indefinitely.

44
Handling Deadlocks
  • Mainly DBMS, partly programmer
  • So far we covered locking called Pessimistic
    concurrency control
  • Exclusive locks on all records until complete
  • A newer approach to concurrency control called
    versioning takes Optimistic approach that most
    of the time other users do not want the same
    record, or if they do, they only want to read
    (but not update) the record.

45
Versioning
  • Each transaction timestamped
  • Read requested latest timestamped version is
    used
  • Read requested on item being updated
  • update transaction rolled back and given later
    timestamp

46
Decision Support Systems (DSS)
  • Methodology designed to extract information from
    data accumulated over time
  • Arrangement of computerized tools to assist
    decision making
  • Used at all levels tailored to focus on
    specific areas or problems
  • Provides ability to conduct ad hoc queries

47
Data Warehouse
  • Data Warehouse (DW) is a broad based, shared
    database for management decision making that
    contains data that has been accumulated over time
  • Data must be high quality, aggregated, often
    denormalised and is not necessarily absolutely
    current

48
Contrast
  • Operational
  • Current transactions
  • Specific transactions
  • Focus on atomic transactions
  • Volume megabytes
  • Scope narrow
  • DSS Data
  • Longer time frame
  • Different levels
  • DSS analysed from multi-dimensions
  • Volume gigabytes
  • Broad scope

49
Data Warehouse Concepts
  • The Data Warehouse is an integrated,
    subject-oriented, time-variant, non-volatile
    database that provides support for
    decision-making

50
Data Mart
  • Data Marts are based on a limited number of
    subjects (possibly one)
  • Support to small group of people
  • Designed for local or departmental problems
  • A large company will often have several Data
    Marts.

51
On-Line Analytical Processing (OLAP)
  • Decision support methodology based on viewing
    data in multiple dimensions
  • OLAP is well suited for querying and multi time
    period trend analysis
  • OLAP concepts
  • Drill down (yearly-gtmonthly-gtweekly)
  • Slice (subset of the data e.g. products-gtshoes)
  • Rotate (interchange the dimensions)

52
Data Mining
  • DSS tools reactive
  • Data-mining proactive
  • Automatically search for anomalies and possible
    relationships
Write a Comment
User Comments (0)
About PowerShow.com