Title: CIS 5163 APPLIED DATABASE MANAGEMENT Section 1 Lecture 1 Database Principles within The Enterprise F
1CIS 5163 APPLIED DATABASE MANAGEMENT Section 1
Lecture 1Database Principles withinThe
Enterprise Framework
- Dr. Raja K. Iyer
- iyer_at_tarleton.edu
- (254) 968-9341
2Course Presentation Sections Outlines
- DATABASE PRINCIPLES WITHIN THE ENTERPRISE
FRAMEWORK - Business Process Engineering for Database
- Transactions Processing versus Analytical
Processing - Workflow Models Versus Data Models
- Data Dictionaries and Applications CRUD
Linkages - DATABASE ANALYSIS AND DESIGN PRINCIPLES
- Logical versus Physical Schema
- DBMS and Relational DBMS
- Object-oriented DBMS
- Choosing the Appropriate DBMS Environment for
enterprises - 4th Generation Programming / Web-based DB Designs
- DATABASE DESIGN AND IMPLEMENTATION CONSIDERATIONS
- Database Implementation in Legacy / Mainframe
Environments - Database Implementation in Client-Server
Environments - Database Implementation in Networked / Web
Environments - Database Implementation Open Source Future of
DBMS? - Database Integrity, Security, and Recovery
Administration - DATABASE PROJECT
- Semester project based on a Case Study (to be
selected)
3Section 1Database Principles withinThe
Enterprise Framework
- Business Process Engineering for Database
- Transactions Processing versus Analytical
Processing - Workflow Models Versus Data Models
- Data Dictionaries and Applications CRUD Linkages
4Business Process Re-Engineering for Database
Management Systems
- Introduction to Workflow and BPR
- Drivers for BPM 11 Money-Relevant Reasons to
Start - Always View Database in terms of Workflow!
- (For additional information, visit the Workflow
Management Coalition Website at www.wfmc.com)
5Multiple Layers of Workflow Automation
6Workflow Automation Model from WfMC
Online Auction Process Analysis, Modeling
Definition Tools
Business Process Analysis, Modeling Definition
Tools
Build Time
Auction Process Definitions
Process Definition
Run Time
Auction Workflow Enhancement Service
Workflow Enhancement Service
Auction Applications IT Tools
Applications IT Tools
Source Online Auction Workflow Model Mapped on
to Workflow Reference Model
Source Workflow Reference Model from Workflow
Management (WfM) Coalition)
7An Example of Workflow AutomationSales Workflow
Process Automation
Customer Quotes Management (CQM)
Customer Opportunities Management (COM)
Customer Contacts Management (CCM)
Sales Order Management (SOM)
Front Office
Back Office
Sales Commission Management (SCM)
Sales Shipping Management (SSM)
Accounts Receivables Management (ARM)
Sales Invoice Management (SIM)
8An Example of Workflow AutomationHospital
Workflow Process Automation
- Role of Database in Hospital Workflow
- Process Automation
- Transactions
- Records Accessibility
- Security
Reference Yahoo! News Hospitals Move Toward
Paperless Age
9Workflow Automation and BPR The Role of the
Database and DBMS
- Database should support workflow through process
automation with transactions against the database - DBMS technologies, such as normalization
ensures integrity and consistency of data used in
business process-oriented transactions - DBMS guarantees data independence from business
functions and business processes and workflow
transactions - Security and recoverability
10What is a Database?
- Collection of related data items that are being
stored for record-keeping analysis - Could be stored on cards in Rolodex, file
cabinet, computer, - Computerized databases are managed by a Database
Management System (DBMS) - Efficient, safe storage of large amounts of data
- High-level language for specifying operations
user wishes to perform on data - Shields users from knowing details about how data
is stored - Users View of Data versus how data is stored
- Conceptual Schema versus Physical Schema
11Importance of DBMS
- Amount of electronically available data is
exploding - Cost of storage is continuously dropping
- Moores law every 18 months, speed of processor
and/or capacity of disk doubles or price goes
down by half - Value of data as an organizational asset is
widely accepted - High demand in industry for powerful, flexible
data management systems to store data efficiently
and get the most out of their large, complex data
sets - e.g., data warehousing, data mining
- Largest databases
- Federal Express, Wal-Mart, Knight-Ridder
(Dialog), - Tables with 1 billion or more rows
- Approaching 10s of TB of data
- Think of the consequences of storing this much
data ?
12Brief History of Data Management
- Early DBMSs (late 1960s) evolved from file-based
processing systems - Need for supporting concurrent access to the data
by many users, recovery, back-up, - Roots in airline reservation systems (SABRE),
banking systems, corporate record-storage systems - Visualize the data much as it was stored
- Tree-based (hierarchical model)
- Graph-based (network model)
- Cumbersome to use, require programming to access
data
DEPTS
Course
EMPS
MGR
ITEMS
Student
Faculty
Pre-Req
NAME
SS
NAME
ID
13Tracing the Roots of Modern DBMS
- Early 1970s Ted Codd invented new data model
(relational data model) and the concept of data
abstraction - Soon thereafter, team of IBMers invented SQL
(Structured Query Language) - Became de-facto standard for query languages
based on the relational data model - Commercial DBMS based on relational model are now
widely accepted in industry - e.g., Microsoft Access, Oracle 9i, Sybase
Adaptive Server, - gt10 billion dollar industry!
14Examples of Database Applications
- Purchases from the supermarket
- Purchases using your credit card
- Booking a holiday at the travel agents
- Using the local library
- Taking out insurance
- Using the Internet
- Studying at university
15File-Based Systems
- Collection of application programs that perform
services for the end users (e.g. reports). - Each program defines and manages its own data.
16File-Based Processing
17Limitations of File-Based Approach
- Separation and isolation of data
- Each program maintains its own set of data.
- Users of one program may be unaware of
potentially useful data held by other programs. - Duplication of data
- Same data is held by different programs.
- Wasted space and potentially different values
and/or different formats for the same item.
18Limitations of File-Based Approach
- Data dependence
- File structure is defined in the program code.
- Incompatible file formats
- Programs are written in different languages, and
so cannot easily access each others files. - Fixed Queries/Proliferation of application
programs - Programs are written to satisfy particular
functions. - Any new requirement needs a new program.
19Database Approach
- Database Approach essential because
- Definition of data was embedded in application
programs, rather than being stored separately and
independently. - No control over access and manipulation of data
beyond that imposed by application programs. - Result
- Database and Database Management System (DBMS).
20Database
- Shared collection of logically related data (and
a description of this data), designed to meet the
information needs of an organization. - System catalog (metadata) provides description of
data to enable programdata independence. - Logically related data comprises entities,
attributes, and relationships of an
organizations information.
21Database Management System (DBMS)
- A software system that enables users to define,
create, and maintain the database and that
provides controlled access to this database.
22Database Approach
- Data definition language (DDL).
- Permits specification of data types, structures
and any data constraints. - All specifications are stored in the database.
- Data manipulation language (DML).
- General enquiry facility (query language) of the
data. - Controlled access to database may include
- A security system.
- An integrity system.
- A concurrency control system.
- A recovery control system.
- A user-accessible catalogue.
- A view mechanism.
- Provides users with only the data they want or
need to use.
23Concept of Views in Database
- Allows each user to have his or her own view of
the database. - A view is essentially some subset of the
database. - Benefits of Views include
- Reduce complexity
- Provide a level of security
- Provide a mechanism to customize the appearance
of the database - Present a consistent, unchanging picture of the
structure of the database, even if the underlying
database is changed.
24Components of DBMS Environment
- Hardware
- Can range from a PC to a network of computers.
- Software
- DBMS, operating system, network software (if
necessary) and also the application programs. - Data
- Used by the organization and a description of
this data called the schema. - Procedures
- Instructions and rules that should be applied to
the design and use of the database and DBMS. - People
25Roles in the Database Environment
- Data Administrator (DA)
- Database Administrator (DBA)
- Database Designers (Logical and Physical)
- Application Programmers
- End Users (naive and sophisticated)
26History of Database Systems
- First-generation
- Hierarchical and Network
- Second generation
- Relational
- Third generation
- Object Relational
- Object-Oriented
27Advantages of DBMSs
- Control of data redundancy
- Data consistency
- More information from the same amount of data
- Sharing of data
- Improved data integrity
- Improved security
- Enforcement of standards
- Economy of scale
28Advantages of DBMSs
- Balanced conflicting requirements
- Improved data accessibility and responsiveness
- Increased productivity
- Improved maintenance through data independence
- Increased concurrency
- Improved backup and recovery services
29Disadvantages of DBMSs
- Complexity
- Size
- Cost of DBMS
- Additional hardware costs
- Cost of conversion
- Performance
- Higher impact of a failure
30Transactions Processing Vs. Analytical Processing
31Transactions Processing Systems (TPS)
- What are Transactions and How do they differ from
data/information? - Informally, a transaction is a change to a
database in real-time - Program that maintains correspondence between the
state of the database and the real-world
application it is modeling - e.g., deposit transaction at an ATM, hotel
check-in, - Transactions are executed by Transaction
Processing System (TPS) consisting of - Set of application programs containing
transactions - Transaction middleware (TP monitor) which
monitors transactions and controls access to
DBMS(s) - One or more DBMSs for maintaining persistence
- Understanding and building (simple) TPS is focus
of this course
32Transactions vs. Programs
- Recall, transaction is a program that interacts
with database - Goal Maintain correspondence between database
and real-world application that is modeled by
database - Example deposit transaction at a bank
- Customer gives teller cash and deposit slip
- Transaction updates customers account
information in database to reflect deposit - Transaction must satisfy certain requirements
which go beyond what is expected of a
non-transactional program
33Transactional Requirements
- Maintain database consistency
- Database restrictions stored as integrity
constraints - Burden of the user/programmer to assure that
transaction preserves all such constraints - Guarantee that transaction is executed as a whole
or not at all (atomicity guarantee) - e.g., either deposit whole amount or no money at
all - Guarantee that no information is lost
(durability) - For multiple transactions running concurrently,
guarantee that transactions do not interfere with
each other (isolation guarantee) - The effect of multiple, concurrent transactions
on database should be the same as that of a
serial execution of the transactions why? - Atomicity, durability, and isolation are
guaranteed by transaction subsystem - More on transactions later in semester
34Structure of a Transaction Processing System
35Challenges of Transaction Processing Systems
- Applications that involve long transactions with
multiple database accesses Use information in
database to guide decision making - Applications that involve complex transactions
that involve multiple distributed databases
FLAT TRANSACTION MODEL Begin transaction(
) S Commit( )
- Structure for TPS
- Savepoints Rollbacks
- Begin_transaction()
- S1
- Sp1 create_savepoint()
- .
- If (condition)
- rollback sp1
-
- Commit( )
- Multiple or Federated DBs
Note S indicates a set of SQL statements.
36Decision Support Systems
- Second application domain (besides transaction
processing) in which databases play major role is
decision-support - Use information in database to guide decision
making - e.g., analyze the purchasing behavior of
supermarket customers to make decisions about
when and how to re-stock inventory - a.k.a. online analytical processing (OLAP) ?
online transaction processing (OLTP) - Decision-support queries are more complex than
transaction-oriented queries - OLTP In a particular store Reduce the number
of blue shirts in the inventory database by 10. - OLAP Compute the percent change in sales of
blue shirts for all of our stores in the
northeastern US between 1990 and 2000.
37Decision-Support Contd
- Databases for OLAP are different from OLTP
systems - Usually larger (TBs vs. GBs)
- Separated from OLTP systems to prevent
interference (since OLTP systems have stringent
real-time constraints) - Different organization of data to support complex
queries and facilitate maintenance - Different users, hence different interfaces
- A.k.a. data warehouses
- Data mining as an additional application
- We will not focus on data warehousing and OLAP
38Characteristics of Modern Database Systems
- Support for concurrent access to data
- Safeguard data against accidentally loss
- Maintain integrity of database in light of
changes - Support for distributed data
- Control access to data
- More recently
- Support for non-standard data
- Support for heterogeneous data
- Support for decision-support and analysis
39Additional Requirements
- Increase usage and new applications for databases
have resulted in additional requirements (since
early days) - High availability
- High reliability
- High throughput
- Low response time
- Extensible
40Some Recent Trends
- DBMS are getting smaller and smaller
- DBMS that can store GB of data can run on PC
- Databases are getting bigger and bigger
- Multiple TBs (terabyte 1012 bytes) not uncommon
- Databases also able to store images, video, audio
- Database stored on secondary storage devices
- Use of Tertiary Storage in TPS
- Larger capacity disks but much slower response
time (10-20 msec vs. several sec.) - Tape, CD, etc. usually involves robotic
conveyance - DBMS Supporting Parallel Computing
- Speed-up query processing through parallelism
(e.g., read data from many disks) - However, need special algorithms to partition
data correctly
41Types of DBMS
- General-purpose DBMS
- Multimedia DBMS
- Geographic information systems (GIS)
- Data warehouse DBMS
- Real-time DBMS
- Active DBMS
42Stakeholders/Participants in the Transaction
Processing System Arena
- System Analyst
- Database Designer
- Application Programmer
- Project Manager
- Database Administrator
- System Administrator
- End Users
- Naïve end users
- Sophisticated end users
43When NOT to Use a DBMS
- Initial investment too high
- Too much overhead
- Application is simple, well-defined, not expected
to change - Stringent real-time requirements (use specialized
real-time DBMS) - Multi-user access to data is not required
- Alternative collection of files managed by
access programs
44Basic Database Terminology
- Database (DB)
- Collection of related data that exists over a
long period of time - Database Management System (DBMS)
- Collection of programs that
- allows users to create a new database and specify
its structure - gives users the ability to query and modify the
data efficiently - keeps the data secure from accidents or
unauthorized use - controls the access to the data for many users at
once - Database System (DBS)
- The database and DBMS software together make up
what is known as the Database System
45Data Abstraction
- Set of seemingly incompatible goals for DBMS
- (1) Retrieve large amounts of data efficiently
lead to the design of complex data structures - (2) Provide simple interactions with data for
non-trained users - For various reasons, early data-intensive
applications worked directly with physical data
representation ? What are the drawbacks? - Modern DBMS hide the physical complexity from
users through several levels of abstraction while
providing simple and efficient access methods - 3-tier architecture
46Schema
- Description of how data is stored or represented
- Physical schema describes the structure of
database (i.e., data files and indexes) as it is
stored on disk (e.g., sequence of byte addresses
and offsets) - Conceptual schema describes data in terms of
higher-level concepts (e.g., a Student is a
record containing ID, Name, and Address fields,
which are of type INTEGER, STRING and STRING
respectively)
47Three-Tier Architecture
48Data Independence
- Ability to modify a schema definition at one
level without affecting a schema definition in
the next higher level - Physical data independence
- Ability to modify physical schema without causing
the conceptual (or external) schema or
application programs to be re-written - Logical data independence
- Ability to modify the conceptual schema without
having to change the external schemas or
application programs - Why is data independence so important?
49Data Model
- A Data Model (DM) is a collection of conceptual
tools and languages for describing data,
relationships, semantics, and consistency
constraints (schema) - Distinguish between high-level (conceptual) data
models and low-level (physical) data models - Conceptual Data Models (external and conceptual
level) - Object-Based Logical Models Entity-Relationship
Model, Object-Oriented Model - Record-Based Logical Models Relational Model,
Network Model (CODASYL), Hierarchical Model (IMS) - Physical Data Models (internal)
50Schema vs. Data
- Recall, database schema describes the structure
of the database - Changed infrequently
- a.k.a. database intension
- a.k.a. metadata ( data about data)
- Database State refers to the data in the database
at any given moment (snapshot) - Changes frequently
- a.k.a. database extension
- BTW, DBMS assures that all database states are
valid states - Database Instance refers to a specific data item
in the database
51DBMS Languages
- Data Definition Language (DDL)
- Used to define the conceptual and internal
schemas - Includes constraint definition language (CDL) for
describing conditions that database instances
must satisfy - Includes storage definition language (SDL) to
influence layout of physical schema (some DBMSs) - Data Manipulation Language (DML)
- Used to describe operations on the instances of a
database - Procedural DML (how) vs. declarative DML (what)
- e.g., Relational Algebra e.g., SQL
- Note, SQL includes a DML and a DDL in one!
- Host Language
- General-purpose programming language which lets
users embed DML commands (data sublanguage) into
their code
52Architecture of a DBMS
Schema Modifications
Modifications
Queries
Database System
Query Processor
Transaction Subsystem
DBMS Software
Storage Manager
Data
Data Definition (Metadata)
53Component Overview
- Data storage (incl. metadata)
- e.g., names of relations, attributes, data types,
etc. - Often, DBMS maintains an index
- Helps us find data items quickly given part of
their value how? - Storage manager
- Handles requests from levels above, retrieves
data from store and returns it in format
requested by queries - Query processor
- Processes not only queries but also requests for
modifications, etc. - Figures out best way to retrieve data
- Transaction subsystem
- Handles concurrent transactions against database
(see slides 9 through 12 for more details on
transactions) - Three types of input at top
54How Are Indexes Implemented?
- Hash tables used extensively by early systems
- Today B-tree, B for balanced tree
- Generalization of a balanced binary search tree
- Optimized for organizing data on disk, each node
of B-Tree occupies 1 full disk block - Each node in B-Tree has large number of children
- Hundreds of pointers to children in one block
- Search of B-Tree rarely requires traversal of
more than three levels!! - Cost of disk op is proportional to number of disk
blocks accessed - 3 block accesses much more efficient than search
of binary tree with many different levels - Lesson most appropriate data structure for data
stored on disk is different from the data
structures used for algorithms that run in memory
55Next Lectures
- Start with database design process, followed by
study of the relational data model and its query
languages - Goal of course project is to develop an
application that supports transactions and uses
database for persistence (transaction processing
system) - Brief review of methodology used for project
development next
56CIS 5163 Course Project
- Goal of course project is to develop an
application that supports transactions and uses
database for persistence (transaction processing
system) - Implementation of transaction processing system
is significant undertaking - Numerous software engineering practices to guide
developer - Steps for designing and implementing course
project - For the selected application domain, state
objectives clearly - Expand objectives into formal requirements
document - Design database schema
- Implement and test database
- Develop application and integrate with database
- Test integrated system, fine-tune database if
necessary - Produce documentation and deliver project
- Demonstrate finished system to course staff
57CIS 5163 Course ProjectRequirements Document (RD)
- Describes in some detail what the system is going
to do, but NOT how it will do it - Specifically, RD describes
- Functional requirements for the application
- Set of interactions with users
- Information/data that must be contained in the
system - Constraints the at must be satisfied by the
underlying database - Final set of deliverables
58ADDITIONAL READINGS OF INTEREST
- Business Event Methodology
- Introduction to Workflow
- Workflow - An Introduction
- Modern Business Strategies and Process Support
- Process Knowledge
- Tech for Virtual Enterprise
- Workflow and Process Management
- Yahoo! News - Hospitals Move Toward Paperless Age