Chapter 10 Distributed Database Management System - PowerPoint PPT Presentation

1
Chapter 10: Distributed Database Management System
Database Systems: Design, Implementation, and
Management, 4th Edition
2
The Evolution of Distributed DBMS
  • Centralized DBMS in the 1970s
  • Support for structured information needs.
  • Regularly issued formal reports in standard
    formats.
  • Prepared by specialists using 3GLs in response to
    precisely channeled requests.
  • Centrally stored corporate data.
  • Data access through dumb terminals.
  • Incapable of providing quick, unstructured, and
    ad hoc information for decision makers in a
    dynamic business environment.

3
The Evolution of Distributed DBMS
  • Social and Technical Changes in the 1980s
  • Business operations became more decentralized
    geographically.
  • Competition increased at the global level.
  • Customer demands and market needs favored a
    decentralized management style.
  • Rapid technological change created low-cost
    microcomputers, and LANs became the basis for
    computerized solutions.
  • The large number of applications based on DBMSs
    and the need to protect investments in
    centralized DBMS software made the notion of data
    sharing attractive.

4
The Evolution of Distributed DBMS
  • Two Database Requirements in a Dynamic Business
    Environment
  • Quick ad hoc data access became crucial in the
    quick-response decision making environment.
  • The decentralization of management structure
    based on the decentralization of business units
    made decentralized multiple-access and
    multiple-location databases a necessity.
  • Developments in the 1990s affecting DBMS
  • The growing acceptance of the Internet and the
    World Wide Web as the platform for data access
    and distribution.
  • The increased focus on data analysis that led to
    data mining and data warehousing.

5
The Evolution of Distributed DBMS
  • DDBMS Advantages
  • Data are located near the greatest demand site.
  • Faster data access
  • Faster data processing
  • Growth facilitation
  • Improved communications
  • Reduced operating costs
  • User-friendly interface
  • Less danger of a single-point failure
  • Processor independence
  • DDBMS Disadvantages
  • Complexity of management and control
  • Security
  • Lack of standards
  • Increased storage requirements

6
Distributed Processing and Distributed Database
  • Distributed processing shares the database's
    logical processing among two or more physically
    independent sites that are connected through a
    network. (See Figure 10.1)
  • Distributed database stores a logically related
    database over two or more physically independent
    sites connected via a computer network. (See
    Figure 10.2)

7
Distributed Processing Environment
Figure 10.1
8
Distributed Database Environment
Figure 10.2
9
Distributed Processing and Distributed Database
  • Distributed processing does not require a
    distributed database, but a distributed database
    requires distributed processing.
  • Distributed processing may be based on a single
    database located on a single computer. In order
    to manage distributed data, copies or parts of
    the database processing functions must be
    distributed to all data storage sites.
  • Both distributed processing and distributed
    databases require a network to connect all
    components.

10
What Is A Distributed DBMS?
  • A distributed database management system (DDBMS)
    governs the storage and processing of logically
    related data over interconnected computer systems
    in which both data and processing functions are
    distributed among several sites.

11
What Is A Distributed DBMS?
  • Functions of a DDBMS
  • Application interface
  • Validation to analyze data requests
  • Transformation to determine a request's components
  • Query optimization to find the best access
    strategy
  • Mapping to determine the data location
  • I/O interface to read or write data
  • Formatting to prepare the data for presentation
  • Security to provide data privacy
  • Backup and recovery
  • Database administration
  • Concurrency control
  • Transaction management

12
Centralized Database Management System
Figure 10.3
13
Fully Distributed Database Management System
Figure 10.4
14
DDBMS Components
  • Computer workstations that form the network
    system.
  • Network hardware and software components that
    reside in each workstation.
  • Communications media that carry the data from one
    workstation to another.
  • Transaction processor (TP) receives and processes
    the application's data requests.
  • Data processor (DP) stores and retrieves data
    located at the site. Also known as data manager
    (DM).

15
Distributed Database System Components
Figure 10.5
16
DDBMS Components
  • DDBMS protocol determines how the DDBMS will:
  • Interface with the network to transport data and
    commands between DPs and TPs.
  • Synchronize all data received from DPs (TP side)
    and route retrieved data to the appropriate TPs
    (DP side).
  • Ensure common database functions in a distributed
    system -- security, concurrency control, backup,
    and recovery.

17
Levels of Data and Process Distribution
  • Single-Site Processing, Single-Site Data (SPSD)
  • All processing is done on a single CPU or host
    computer.
  • All data are stored on the host computer's local
    disk.
  • The DBMS is located on the host computer.
  • The DBMS is accessed by dumb terminals.
  • Typical of most mainframe and minicomputer DBMSs.
  • Typical of the 1st generation of single-user
    microcomputer databases.

Table 10.1
18
Nondistributed (Centralized) DBMS
Figure 10.6
19
Levels of Data and Process Distribution
  • Multiple-Site Processing, Single-Site Data (MPSD)
  • Typically, MPSD requires a network file server on
    which conventional applications are accessed
    through a LAN.
  • A variation of the MPSD approach is known as a
    client/server architecture. (Chapter 12)

Figure 10.7
20
Levels of Data and Process Distribution
  • Multiple-Site Processing, Multiple-Site Data
    (MPMD)
  • Fully distributed DBMS with support for multiple
    DPs and TPs at multiple sites.
  • Homogeneous DDBMSs integrate only one type of
    centralized DBMS over the network.
  • Heterogeneous DDBMSs integrate different types of
    centralized DBMSs over a network. (See Figure
    10.8)

21
Figure 10.8 Heterogeneous Distributed Database
Scenario
22
Distributed DB Transparency
  • DDBMS transparency features have the common
    property of allowing the end user to think that
    he or she is the database's only user.
  • Distribution transparency
  • Transaction transparency
  • Failure transparency
  • Performance transparency
  • Heterogeneity transparency

23
Distribution Transparency
  • Distribution transparency allows us to manage a
    physically dispersed database as though it were a
    centralized database.
  • Three Levels of Distribution Transparency
  • Fragmentation transparency
  • Location transparency
  • Local mapping transparency

Table 10.2
24
Distribution Transparency
  • Example (Figure 10.9): Employee data (EMPLOYEE)
    are distributed over three locations: New York,
    Atlanta, and Miami. Depending on the level of
    distribution transparency support, three
    different cases of queries are possible:

Figure 10.9 Fragment Locations
25
Distribution Transparency
  • Case 1: The DB Supports Fragmentation Transparency
  • SELECT * FROM EMPLOYEE
    WHERE EMP_DOB < '01-JAN-1940';

26
Distribution Transparency
  • Case 2: The DB Supports Location Transparency
  • SELECT * FROM E1
    WHERE EMP_DOB < '01-JAN-1940'
  • UNION
  • SELECT * FROM E2
    WHERE EMP_DOB < '01-JAN-1940'
  • UNION
  • SELECT * FROM E3
    WHERE EMP_DOB < '01-JAN-1940';

27
Distribution Transparency
  • Case 3: The DB Supports Local Mapping Transparency
  • SELECT * FROM E1 NODE NY
    WHERE EMP_DOB < '01-JAN-1940'
  • UNION
  • SELECT * FROM E2 NODE ATL
    WHERE EMP_DOB < '01-JAN-1940'
  • UNION
  • SELECT * FROM E3 NODE MIA
    WHERE EMP_DOB < '01-JAN-1940';
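The three query cases above can be sketched on a single machine with SQLite standing in for the three DP sites. Fragment names E1, E2, and E3 follow the slides; the sample rows, the EMPLOYEE view (which restores fragmentation transparency), and the ISO date format (so plain string comparison orders correctly) are our additions.

```python
# Sketch of location vs. fragmentation transparency using SQLite as a
# single-node stand-in for three sites. Sample data and the EMPLOYEE
# view are illustrative assumptions, not from the text.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for frag, rows in [("E1", [("Able", "1937-05-01"), ("Baker", "1945-03-02")]),
                   ("E2", [("Chu", "1938-11-09")]),
                   ("E3", [("Diaz", "1952-07-15")])]:
    cur.execute(f"CREATE TABLE {frag} (EMP_NAME TEXT, EMP_DOB TEXT)")
    cur.executemany(f"INSERT INTO {frag} VALUES (?, ?)", rows)

# Location transparency: the user must name each fragment, not its site.
union_sql = ("SELECT * FROM E1 WHERE EMP_DOB < '1940-01-01' UNION "
             "SELECT * FROM E2 WHERE EMP_DOB < '1940-01-01' UNION "
             "SELECT * FROM E3 WHERE EMP_DOB < '1940-01-01'")
by_location = cur.execute(union_sql).fetchall()

# Fragmentation transparency: a view hides the fragments entirely.
cur.execute("CREATE VIEW EMPLOYEE AS "
            "SELECT * FROM E1 UNION SELECT * FROM E2 UNION SELECT * FROM E3")
by_fragmentation = cur.execute(
    "SELECT * FROM EMPLOYEE WHERE EMP_DOB < '1940-01-01'").fetchall()

assert sorted(by_location) == sorted(by_fragmentation)
```

Both forms return the same employees born before 1940; only the amount of distribution detail the user must supply differs.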

28
Distribution Transparency
  • Distribution transparency is supported by a
    distributed data dictionary (DDD) or a
    distributed data catalog (DDC).
  • The DDC contains the description of the entire
    database as seen by the database administrator.
  • The database description, known as the
    distributed global schema, is the common database
    schema used by local TPs to translate user
    requests into subqueries.

29
Transaction Transparency
  • Transaction transparency ensures that database
    transactions will maintain the database's
    integrity and consistency. The transaction will
    be completed only if all database sites involved
    in the transaction complete their part of the
    transaction.
  • Related Concepts
  • Remote Requests
  • Remote Transactions
  • Distributed Transactions
  • Distributed Requests

30
Transaction Transparency
  • Distributed Requests and Distributed Transactions
  • A remote request allows us to access data to be
    processed by a single remote database processor.
    (Figure 10.10)
  • A remote transaction, composed of several
    requests, may access data at only a single site.
    (Figure 10.11)
  • A distributed transaction allows a transaction to
    reference several different (local or remote) DP
    sites. (Figure 10.12)
  • A distributed request lets us reference data from
    several remote DP sites. (Figure 10.13) It also
    allows a single request to reference a physically
    partitioned table. (Figure 10.14)

31
A Remote Request
Figure 10.10
32
A Remote Transaction
Figure 10.11
33
A Distributed Transaction
Figure 10.12
34
A Distributed Request
Figure 10.13
35
Another Distributed Request
Figure 10.14
36
Figure 10.15
37
Transaction Transparency
  • Two-Phase Commit Protocol
  • The two-phase commit protocol guarantees that, if
    a portion of a transaction operation cannot be
    committed, all changes made at the other sites
    participating in the transaction will be undone
    to maintain a consistent database state.
  • Each DP maintains its own transaction log. The
    two-phase protocol requires that each individual
    DP's transaction log entry be written before the
    database fragment is actually updated.
  • The two-phase commit protocol requires a
    DO-UNDO-REDO protocol and a write-ahead protocol.

38
Transaction Transparency
  • Two-Phase Commit Protocol
  • The DO-UNDO-REDO protocol is used by the DP to
    roll back and/or roll forward transactions with
    the help of the system's transaction log entries.
  • DO performs the operation and records the
    "before" and "after" values in the transaction
    log.
  • UNDO reverses an operation, using the log entries
    written by the DO portion of the sequence.
  • REDO redoes an operation, using the log entries
    written by the DO portion of the sequence.
  • The write-ahead protocol forces the log entry to
    be written to permanent storage before the actual
    operation takes place.
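The DO-UNDO-REDO and write-ahead rules can be sketched as follows; the `DataProcessor` class, its field names, and the in-memory log (a stand-in for the stable storage a real DP would use) are illustrative assumptions, not part of the text.

```python
# Minimal sketch of DO-UNDO-REDO with write-ahead logging; the list
# "log" stands in for a transaction log on stable storage.
class DataProcessor:
    def __init__(self):
        self.data = {}   # the database fragment at this site
        self.log = []    # transaction log (written ahead of updates)

    def do(self, key, new_value):
        # Write-ahead: the log entry (before/after images) is recorded
        # *before* the fragment itself is updated.
        entry = {"key": key,
                 "before": self.data.get(key),
                 "after": new_value}
        self.log.append(entry)
        self.data[key] = new_value
        return entry

    def undo(self):
        # Roll back the last operation using its before-image.
        entry = self.log.pop()
        if entry["before"] is None:
            del self.data[entry["key"]]
        else:
            self.data[entry["key"]] = entry["before"]
        return entry

    def redo(self, entry):
        # Roll forward using the after-image from the log.
        self.log.append(entry)
        self.data[entry["key"]] = entry["after"]

dp = DataProcessor()
dp.do("balance", 100)
dp.do("balance", 80)
undone = dp.undo()   # balance rolled back to 100
dp.redo(undone)      # balance rolled forward to 80 again
assert dp.data["balance"] == 80
```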

39
Transaction Transparency
  • Two-Phase Commit Protocol
  • The two-phase commit protocol defines the
    operations between two types of nodes: the
    coordinator and one or more subordinates or
    cohorts. The protocol is implemented in two
    phases:
  • Phase 1: Preparation
  • The coordinator sends a PREPARE TO COMMIT message
    to all subordinates.
  • The subordinates receive the message, write the
    transaction log using the write-ahead protocol,
    and send an acknowledgement message to the
    coordinator.
  • The coordinator makes sure that all nodes are
    ready to commit, or it aborts the transaction.

40
Transaction Transparency
  • Phase 2: The Final Commit
  • The coordinator broadcasts a COMMIT message to
    all subordinates and waits for the replies.
  • Each subordinate receives the COMMIT message then
    updates the database, using the DO protocol.
  • The subordinates reply with a COMMITTED or NOT
    COMMITTED message to the coordinator.
  • If one or more subordinates did not commit, the
    coordinator sends an ABORT message, thereby
    forcing them to UNDO all changes.
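The two-phase message flow above can be sketched with plain classes; all names are invented for the illustration, and a real DDBMS would add timeouts and recovery from lost messages.

```python
# Sketch of two-phase commit: one coordinator, several subordinates.
class Subordinate:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: write the transaction log (write-ahead protocol)
        # and acknowledge readiness to the coordinator.
        if self.will_commit:
            self.state = "prepared"
            return "READY"
        self.state = "aborted"
        return "NOT READY"

    def commit(self):
        # Phase 2: apply the update (the DO portion) and report back.
        self.state = "committed"
        return "COMMITTED"

    def abort(self):
        # UNDO any changes recorded in the local log.
        self.state = "aborted"

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1: preparation.
    acks = [s.prepare() for s in subordinates]
    if any(a != "READY" for a in acks):
        for s in subordinates:
            s.abort()
        coordinator_log.append("ABORT")
        return "aborted"
    # Phase 2: the final commit.
    for s in subordinates:
        s.commit()
    coordinator_log.append("COMMIT")
    return "committed"

log = []
ok = two_phase_commit(log, [Subordinate("DP1"), Subordinate("DP2")])
bad = two_phase_commit(log, [Subordinate("DP1"),
                             Subordinate("DP2", will_commit=False)])
assert (ok, bad) == ("committed", "aborted")
```

One unready subordinate is enough to abort the whole transaction, which is exactly the all-or-nothing guarantee the slides describe.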

41
Performance Transparency and Query Optimization
  • The objective of a query optimization routine is
    to minimize the total cost associated with the
    execution of a request. The costs associated with
    a request are a function of:
  • Access time (I/O) cost involved in accessing the
    physical data stored on disk.
  • Communication cost associated with the
    transmission of data among nodes in distributed
    database systems.
  • CPU time cost associated with the processing
    overhead of managing distributed transactions.
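The three cost components can be combined into a toy cost model; the numbers and the two candidate plans below are invented to show how an optimizer might compare shipping raw rows to the requesting site against filtering at the remote site first.

```python
# Illustrative cost model: total cost is the sum of the three
# components listed above. All figures are made up for the example.
def total_cost(io_time, comm_time, cpu_time):
    return io_time + comm_time + cpu_time

# Two hypothetical execution plans for the same request.
plans = {
    "ship_all_rows":   total_cost(io_time=2.0, comm_time=8.0, cpu_time=0.5),
    "filter_remotely": total_cost(io_time=2.0, comm_time=1.0, cpu_time=0.9),
}
best = min(plans, key=plans.get)
assert best == "filter_remotely"
```

Filtering remotely wins here because it slashes the communication cost, which typically dominates in a distributed query.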

42
Performance Transparency and Query Optimization
  • Query optimization must provide distribution
    transparency as well as replica transparency.
  • Replica transparency refers to the DDBMS's
    ability to hide the existence of multiple copies
    of data from the user.
  • Most query optimization algorithms are based on
    two principles:
  • Selection of the optimum execution order
  • Selection of sites to be accessed to minimize
    communication costs

43
Performance Transparency and Query Optimization
  • Operation Modes of Query Optimization
  • Automatic query optimization means that the DDBMS
    finds the most cost-effective access path without
    user intervention.
  • Manual query optimization requires that the
    optimization be selected and scheduled by the end
    user or programmer.
  • Timing of Query Optimization
  • Static query optimization takes place at
    compilation time.
  • Dynamic query optimization takes place at
    execution time.

44
Performance Transparency and Query Optimization
  • Optimization Techniques by Information Used
  • Statistically based query optimization uses
    statistical information about the database.
  • In the dynamic statistical generation mode, the
    DDBMS automatically evaluates and updates the
    statistics after each access.
  • In the manual statistical generation mode, the
    statistics must be updated periodically through a
    user-selected utility.
  • A rule-based query optimization algorithm is
    based on a set of user-defined rules to determine
    the best query access strategy.

45
Distributed Database Design
  • The design of a distributed database introduces
    three new issues
  • How to partition the database into fragments.
  • Which fragments to replicate.
  • Where to locate those fragments and replicas.

46
Data Fragmentation
  • Data fragmentation allows us to break a single
    object into two or more segments or fragments.
  • Each fragment can be stored at any site over a
    computer network.
  • Data fragmentation information is stored in the
    distributed data catalog (DDC), from which it is
    accessed by the transaction processor (TP) to
    process user requests.
  • Three Types of Fragmentation Strategies
  • Horizontal fragmentation
  • Vertical fragmentation
  • Mixed fragmentation

47
A Sample CUSTOMER Table
Figure 10.16
48
Data Fragmentation
  • Horizontal Fragmentation: Division of a relation
    into subsets (fragments) of tuples (rows). Each
    fragment is stored at a different node, and each
    fragment has unique rows. Each fragment
    represents the equivalent of a SELECT statement,
    with the WHERE clause on a single attribute.

Table 10.3 Horizontal Fragmentation of the
CUSTOMER Table By State
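Horizontal fragmentation by state can be sketched as a partition of rows; the sample customers and attribute names below are invented stand-ins for the CUSTOMER table of Figure 10.16.

```python
# Horizontal fragmentation: each row lands in exactly one fragment,
# keyed by the value of a single attribute (here, the state).
customers = [
    {"cus_num": 10, "cus_name": "Sinclair", "cus_state": "TN"},
    {"cus_num": 11, "cus_name": "Ortega",   "cus_state": "FL"},
    {"cus_num": 12, "cus_name": "Dunne",    "cus_state": "GA"},
    {"cus_num": 13, "cus_name": "Chen",     "cus_state": "TN"},
]

def horizontal_fragment(rows, attribute):
    # Equivalent to one SELECT ... WHERE attribute = value per site.
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments

frags = horizontal_fragment(customers, "cus_state")
assert sorted(frags) == ["FL", "GA", "TN"]
# No row is lost or duplicated across fragments.
assert sum(len(f) for f in frags.values()) == len(customers)
```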
49
Table Fragments In Three Locations
Figure 10.17
50
Data Fragmentation
  • Vertical Fragmentation: Division of a relation
    into attribute (column) subsets. Each subset
    (fragment) is stored at a different node, and
    each fragment has unique columns -- with the
    exception of the key column. This is the
    equivalent of the PROJECT statement.

Table 10.4 Vertical Fragmentation of the
CUSTOMER Table
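Vertical fragmentation can be sketched as column-subset projections that each repeat the key; the rows and attribute names below are invented for the illustration.

```python
# Vertical fragmentation: one fragment per column group, each keeping
# the key (cus_num) so the original rows can be rebuilt by a join.
customers = [
    {"cus_num": 10, "cus_name": "Sinclair", "cus_state": "TN"},
    {"cus_num": 11, "cus_name": "Ortega",   "cus_state": "FL"},
]

def vertical_fragment(rows, key, column_groups):
    # Each inner list is the PROJECT of the relation onto the key
    # plus one group of non-key attributes.
    return [[{key: row[key], **{c: row[c] for c in cols}} for row in rows]
            for cols in column_groups]

f1, f2 = vertical_fragment(customers, "cus_num",
                           [["cus_name"], ["cus_state"]])
assert f1[0] == {"cus_num": 10, "cus_name": "Sinclair"}
assert f2[1] == {"cus_num": 11, "cus_state": "FL"}
```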
51
Vertically Fragmented Table Contents
Figure 10.18
52
Data Fragmentation
  • Mixed Fragmentation: Combination of horizontal and
    vertical strategies. A table may be divided into
    several horizontal subsets (rows), each one
    having a subset of the attributes (columns).

53
Table 10.5 Mixed Fragmentation of the CUSTOMER
Table
54
Figure 10.19
55
Data Replication
  • Data replication refers to the storage of data
    copies at multiple sites served by a computer
    network.
  • Fragment copies can be stored at several sites to
    serve specific information requirements.
  • The existence of fragment copies can enhance data
    availability and response time, reducing
    communication and total query costs.

Figure 10.20
56
Data Replication
  • Mutual Consistency Rule
  • Replicated data are subject to the mutual
    consistency rule, which requires that all copies
    of data fragments be identical.
  • DDBMS must ensure that a database update is
    performed at all sites where replicas exist.
  • Data replication imposes additional DDBMS
    processing overhead.
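The mutual consistency rule can be sketched as an update routine that must visit every replica; the two site names and the sample record are invented for the illustration, and the per-copy work is the extra processing overhead the slide mentions.

```python
# Sketch of the mutual consistency rule: an update is performed at
# every site holding a replica, so all copies stay identical.
replicas = {
    "new_york": {"cus_num": 10, "cus_balance": 100},
    "atlanta":  {"cus_num": 10, "cus_balance": 100},
}

def replicated_update(sites, field, value):
    # The DDBMS pushes the change to all replicas; the loop body runs
    # once per copy, which is the added overhead of replication.
    for copy in sites.values():
        copy[field] = value

replicated_update(replicas, "cus_balance", 250)
assert all(c["cus_balance"] == 250 for c in replicas.values())
```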

57
Data Replication
  • Replication Conditions
  • A fully replicated database stores multiple
    copies of all database fragments at multiple
    sites.
  • A partially replicated database stores multiple
    copies of some database fragments at multiple
    sites.
  • Factors for Data Replication Decision
  • Database Size
  • Usage Frequency

58
Data Allocation
  • Data allocation describes the process of deciding
    where to locate data.
  • Data Allocation Strategies
  • Centralized: The entire database is stored at one
    site.
  • Partitioned: The database is divided into several
    disjoint parts (fragments) and stored at several
    sites.
  • Replicated: Copies of one or more database
    fragments are stored at several sites.

59
Data Allocation
  • Data allocation algorithms take into
    consideration a variety of factors
  • Performance and data availability goals
  • Size, number of rows, the number of relations
    that an entity maintains with other entities.
  • Types of transactions to be applied to the
    database, the attributes accessed by each of
    those transactions.

60
Client/Server vs. DDBMS
  • Client/server architecture refers to the way in
    which computers interact to form a system.
  • It features a user of resources (the client) and
    a provider of resources (the server).
  • The architecture can be used to implement a DBMS
    in which the client is the transaction processor
    (TP) and the server is the data processor (DP).

61
Client/Server Architecture
  • Client/Server Advantages
  • Client/server solutions tend to be less
    expensive.
  • Client/server solutions allow the end user to use
    the microcomputer's graphical user interface
    (GUI), thereby improving functionality and
    simplicity.
  • There are more people with PC skills than with
    mainframe skills.
  • The PC is well established in the workplace.
  • Numerous data analysis and query tools exist to
    facilitate interaction with many of the DBMSs.
  • There are considerable cost advantages to
    off-loading application development from the
    mainframe to PCs.

62
Client/Server Architecture
  • Client/Server Disadvantages
  • The client/server architecture creates a more
    complex environment with different platforms.
  • An increase in the number of users and processing
    sites often paves the way for security problems.
  • The burden of training a wider circle of users
    and computer personnel increases the cost of
    maintaining the environment.

63
C. J. Date's 12 Commandments for Distributed
Databases
  • 1. Local Site Independence
  • 2. Central Site Independence
  • 3. Failure Independence
  • 4. Location Transparency
  • 5. Fragmentation Transparency
  • 6. Replication Transparency
  • 7. Distributed Query Processing
  • 8. Distributed Transaction Processing
  • 9. Hardware Independence
  • 10. Operating System Independence
  • 11. Network Independence
  • 12. Database Independence