Basis for Distributed Database Technology - PowerPoint PPT Presentation




Slides: 36
Provided by: qin62


Transcript and Presenter's Notes

Title: Basis for Distributed Database Technology

Basis for Distributed Database Technology
  • Database System Technology (DST)
  • controlled access to structured data
  • aims towards centralized (single site) computing
  • Computer Networking Technology (CNT)
  • facilitates distributed computing
  • goes against centralized computing
  • Distributed Database Technology = DST + CNT
  • aims to achieve integration without centralization

What is distributed?
  • Processing Logic
  • Function
  • Data
  • Control
  • All the above modes of distribution are necessary
    and important for distributed database technology

Distributed database system
  • A distributed database is a collection of
    multiple, logically interrelated databases
    distributed over a computer network.
  • A distributed database management system (DDBMS)
    is a software system that permits the management
    of the distributed databases and makes the
    distribution transparent to the users.

What is not a DDBMS?
  • A DDBMS is not a collection of files that can
    be stored at each node of a computer network.
  • A multiprocessor system based DBMS (parallel
    database system) is not a DDBMS.
  • A DDBMS is not a system wherein data resides only
    at one node.

Aims of Distributed DBMS - Transparent Management
of Distributed Replicated Data
  • Transparency refers to separation of the
    higher-level semantics of a system from
    lower-level implementation details.
  • From data independence in centralized DBMS to
    fragmentation transparency in DDBMS.
  • Who should provide transparency? - DDBMS!

Aims of Distributed DBMS - Reliability through
Distributed Transactions
  • Distributed DBMS can use replicated components to
    eliminate single points of failure.
  • With proper care, users can still access part of
    the distributed database even though some of the
    data is unreachable.
  • Distributed transactions facilitate maintenance
    of a consistent database state even when failures
    occur.

Aims of Distributed DBMS - Improved Performance
  • Since each site handles only a portion of a
    database, the contention for CPU and I/O
    resources is not that severe. Data localization
    reduces communication overheads.
  • Inherent parallelism of distributed systems may
    be exploited for inter-query and intra-query
    parallelism.
  • Performance models are not sufficiently developed.

Aims of Distributed DBMS - Easier System Expansion
  • Ability to add new sites, data, and users over
    time without major restructuring.
  • Huge centralized database systems (mainframes)
    are history (almost!).
  • PC revolution (Compaq buying Digital, 1998) will
    make natural distributed processing environments.
  • New applications (such as supply chain) are
    naturally distributed - centralized systems will
    just not work.

Complicating Factors
  • Data may be replicated in a distributed
    environment. Therefore, DDBMS is responsible for
    (i) choosing one of the stored copies of the
    requested data, and (ii) making sure that the
    effect of an update is reflected on each and
    every copy of that data item.
  • Maintaining consistency of distributed/replicated
    data.
  • Since each site cannot have instantaneous
    information on the actions currently carried out
    at other sites, the synchronization of
    transactions at multiple sites is harder than in
    a centralized system.
  • Other factors: complexity, cost, distribution of
    control, ...
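
The two replication duties named above (pick one copy to read, propagate an update to every copy) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Site` and `ReplicatedItem` classes are hypothetical, sites are in-memory objects, and the write rule shown (refuse the update if any copy is unreachable) is one simple policy, not one the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One node holding copies of data items (hypothetical model)."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)


class ReplicatedItem:
    """Read-one / write-all handling of a single replicated data item."""

    def __init__(self, key, sites):
        self.key = key
        self.sites = list(sites)

    def read(self):
        # (i) Choose one stored copy: here, the first reachable site.
        for site in self.sites:
            if site.up:
                return site.store[self.key]
        raise RuntimeError("no reachable copy of " + self.key)

    def write(self, value):
        # (ii) Reflect the update on every copy; refuse if any is unreachable.
        down = [s.name for s in self.sites if not s.up]
        if down:
            raise RuntimeError("unreachable copies: " + ", ".join(down))
        for site in self.sites:
            site.store[self.key] = value


sites = [Site("S1"), Site("S2"), Site("S3")]
item = ReplicatedItem("balance", sites)
item.write(100)        # all three copies are updated
sites[0].up = False    # S1 fails after the update
value = item.read()    # served from S2 instead
```

The sketch also shows the tension between the aims and the complicating factors: reads survive the failure of S1, but a strict write-all rule blocks updates until S1 is back.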

Problem Areas
  • Distributed Database Design
  • Distributed Query Processing
  • Distributed Directory Management
  • Distributed Concurrency Control
  • Distributed Deadlock Management
  • Reliability of Distributed Databases
  • Operating Systems Support
  • Heterogeneous Databases

Relationship among Problems
[Figure: relationships among Directory Management, Distributed DB Design, Query Processing, Concurrency Control, and Deadlock Management]

Transparency and Architecture issues in DDBMSs

Top-Down DDBMS Architecture - Classical
[Figure: Site Independent Schemas above the local databases at Site 1, Site 2, and other sites]

Top-Down DDBMS Architecture - Classical
  • Global Schema: a set of global relations, as if
    the database were not distributed at all.
  • Fragmentation Schema: each global relation is split
    into non-overlapping (logical) fragments. 1:n
    mapping from relation R to fragments Ri.
  • Allocation Schema: 1:1 or 1:n (redundant) mapping
    from fragments to sites. All fragments
    corresponding to the same relation R at a site j
    constitute the physical image Rj. A copy of a
    fragment is denoted by Rji.
  • Local Mapping Schema: a mapping from physical
    images to physical objects, which are manipulated
    by local DBMSs.
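
As a rough illustration of how the fragmentation and allocation schemas fit together, the sketch below stores each as a plain dictionary and derives the physical image of a relation at a site. The relation, fragment, and site names (`EMP`, `EMP1`, `site1`, and so on) are made up for the example.

```python
# Fragmentation schema: global relation -> its fragments (1:n).
fragmentation = {"EMP": ["EMP1", "EMP2"]}

# Allocation schema: fragment -> sites holding a copy
# (1:1, or 1:n when the fragment is stored redundantly).
allocation = {"EMP1": ["site1"], "EMP2": ["site1", "site2"]}


def physical_image(relation, site):
    """Fragments of `relation` that have a copy stored at `site`."""
    return [f for f in fragmentation[relation] if site in allocation[f]]


def is_redundant(fragment):
    """True when a fragment is allocated to more than one site."""
    return len(allocation[fragment]) > 1


image_site1 = physical_image("EMP", "site1")
image_site2 = physical_image("EMP", "site2")
```

Keeping the two mappings separate is what lets redundancy be controlled explicitly: replicating `EMP2` changes only the allocation schema, not the fragmentation schema.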

Global Relations, Fragments and Physical Images
  • Separating concepts of fragmentation and
    allocation
  • Explicit control of redundancy
  • Independence from local databases
  • Allows for fragmentation transparency, location
    transparency, and local mapping transparency

Rules for Data Fragmentation
  • Completeness: All the data of the global relation
    must be mapped into fragments.
  • Reconstruction: It must always be possible to
    reconstruct each global relation from its
    fragments.
  • Disjointness: It is convenient if the fragments
    are disjoint so that the replication of data can
    be controlled explicitly.
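
The three rules can be checked mechanically on a small example. The sketch below assumes a toy `EMP` relation held as a list of dictionaries with a unique `id` attribute and two horizontal fragments defined by predicates on `dept`; all names are illustrative.

```python
EMP = [
    {"id": 1, "name": "Ann", "dept": "sales"},
    {"id": 2, "name": "Bob", "dept": "hr"},
    {"id": 3, "name": "Cho", "dept": "sales"},
]

# Horizontal fragments defined by selection predicates on dept.
fragments = {
    "EMP_sales": [t for t in EMP if t["dept"] == "sales"],
    "EMP_other": [t for t in EMP if t["dept"] != "sales"],
}


def is_complete(relation, frags):
    """Completeness: every tuple of the relation is in some fragment."""
    covered = {t["id"] for f in frags.values() for t in f}
    return all(t["id"] in covered for t in relation)


def is_disjoint(frags):
    """Disjointness: no tuple appears in more than one fragment."""
    ids = [t["id"] for f in frags.values() for t in f]
    return len(ids) == len(set(ids))


def reconstruct(frags):
    """Reconstruction: union of the fragments, ordered by key."""
    merged = {t["id"]: t for f in frags.values() for t in f}
    return [merged[k] for k in sorted(merged)]
```

Using complementary predicates (`== "sales"` and `!= "sales"`) makes completeness and disjointness hold by construction.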

Types of Data Fragmentation
  • Vertical Fragmentation: projection on the relation
    (subset of attributes); reconstruction by join;
    updates require no tuple migration.
  • Horizontal Fragmentation: selection on the relation
    (subset of tuples); reconstruction by union;
    updates may require tuple migration.
  • Mixed Fragmentation: a fragment is a Select-Project
    query on the relation.
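
A minimal sketch of vertical fragmentation, assuming each fragment keeps the key attribute `id` so that a join can rebuild the relation. The last few lines illustrate why an update can force tuple migration under horizontal fragmentation. The relation, the helper functions, and the salary threshold of 55 are all invented for the example.

```python
EMP = [
    {"id": 1, "name": "Ann", "salary": 50},
    {"id": 2, "name": "Bob", "salary": 60},
]


def project(relation, attrs):
    """Vertical fragment: keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on(left, right, key):
    """Reconstruct by joining two vertical fragments on the shared key."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


EMP_public = project(EMP, ["id", "name"])
EMP_pay = project(EMP, ["id", "salary"])
rebuilt = join_on(EMP_public, EMP_pay, "id")


def home_fragment(t):
    """Horizontal fragmentation by a predicate on salary."""
    return "EMP_high" if t["salary"] >= 55 else "EMP_low"


ann = dict(EMP[0])
before = home_fragment(ann)   # fragment before the update
ann["salary"] = 70            # the update changes the predicate's outcome
after = home_fragment(ann)    # the tuple must move to another fragment
```

A salary update on a vertical fragment stays inside `EMP_pay`, whereas the same update under horizontal fragmentation moves the tuple between fragments.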

Vertical Fragmentation

Horizontal Fragmentation

Levels of Distribution Transparency
  • Fragmentation Transparency: Just like using
    global relations.
  • Location Transparency: Need to know the fragmentation
    schema, but need not know where fragments are
    located. Applications access fragments (no need
    to specify sites where fragments are located).
  • Local Mapping Transparency: Need to know both the
    fragmentation and allocation schemas; no need to
    know what the underlying local DBMSs are.
    Applications access fragments, explicitly
    specifying where the fragments are located.
  • No Transparency: Need to know local DBMS query
    languages, and write applications using
    functionality provided by the local DBMS.
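
To make the levels concrete, the sketch below writes the same lookup ("name of employee 2") against three of them. The data layout and the functions `read_relation`, `read_fragment`, and `read_fragment_at` are hypothetical stand-ins for what a DDBMS would offer at each level. The "no transparency" level is omitted because it depends on each local DBMS's own query language.

```python
# Hypothetical stored data: site -> fragment -> tuples.
SITES = {
    "site1": {"EMP1": [{"id": 1, "name": "Ann"}]},
    "site2": {"EMP2": [{"id": 2, "name": "Bob"}]},
}
FRAGMENTS = {"EMP": ["EMP1", "EMP2"]}             # fragmentation schema
ALLOCATION = {"EMP1": "site1", "EMP2": "site2"}   # allocation schema


def read_fragment_at(site, fragment):
    """Local mapping transparency: caller names both fragment and site."""
    return SITES[site][fragment]


def read_fragment(fragment):
    """Location transparency: caller names the fragment, not the site."""
    return read_fragment_at(ALLOCATION[fragment], fragment)


def read_relation(relation):
    """Fragmentation transparency: caller names only the global relation."""
    return [t for f in FRAGMENTS[relation] for t in read_fragment(f)]


def find_name(rows, emp_id):
    """Return the name for emp_id in rows, or None if absent."""
    return next((t["name"] for t in rows if t["id"] == emp_id), None)


# The same lookup, written against each level.
name_frag = find_name(read_relation("EMP"), 2)
name_loc = (find_name(read_fragment("EMP1"), 2)
            or find_name(read_fragment("EMP2"), 2))
name_map = (find_name(read_fragment_at("site1", "EMP1"), 2)
            or find_name(read_fragment_at("site2", "EMP2"), 2))
```

Each step down the list pushes one more schema into application code: first the fragment names, then the site names.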

Why is support for transparency difficult?
  • There are tough problems in query optimization
    and transaction management that need to be
    tackled (in terms of system support and
    implementation) before fragmentation transparency
    can be supported.
  • The less distribution transparency there is, the
    more the end-application developer needs to know
    about fragmentation and allocation schemes, and
    how to maintain database consistency.
  • Higher levels of distribution transparency
    require appropriate DDBMS support, but make the
    end-application developer's work easier.

Some Aspects of top-down architecture
  • Distributed database technology is an add-on
    technology: most users already have populated
    centralized DBMSs, whereas top-down design
    assumes implementation of a new DDBMS from scratch.
  • In the case of OODBMSs, top-down architecture makes
    sense because most OODBMSs are going to be built
    from scratch.
  • In many application environments, such as
    semi-structured databases and continuous multimedia
    data, the notion of a fragment is difficult to
    define.
  • Current relational DBMS products provide for some
    form of location transparency (such as, by using

Bottom-up Architecture - Present and Future
  • Possible ways in which multiple databases may be
    put together for sharing by multiple DBMSs.
  • The DBMSs are characterized according to:
  • Autonomy - degree to which individual DBMSs can
    operate independently: tightly coupled -
    integrated (A0), semiautonomous - federated (A1),
    total isolation - multidatabase systems (A2).
  • Distribution - no distribution - single site
    (D0), client-server - distribution of DBMS
    functionality (D1), full distribution - peer-to-
    peer distributed architecture (D2).
  • Heterogeneity - homogeneous (H0) or heterogeneous
    (H1).

Distributed DBMS Implementation Alternatives

Architectural Alternatives
  • (A0,D0,H0): multiple DBMSs that are logically
    integrated at a single site - composite systems.
  • (A0,D0,H1): multiple database managers that are
    heterogeneous but provide an integrated view to the
    user.
  • (A0,D1,H0): client-server based DBMS.
  • (A0,D2,H0): classical distributed database system.
  • (A1,D0,H0): single-site, homogeneous, federated
    database systems - not realistic.
  • (A1,D0,H1): heterogeneous federated DBMS, having a
    common interface over disparate cooperating
    specialized database systems.

Architectural Alternatives
  • (A1,D1,H1): heterogeneous federated database
    systems with components of the systems placed at
    different sites.
  • (A2,D0,H0): homogeneous multidatabase systems at
    a single site.
  • (A2,D0,H1): heterogeneous multidatabase systems
    at a single site.
  • (A2,D1,H1) and (A2,D2,H1): distributed
    heterogeneous multidatabase systems. In the case of
    client-server environments this creates a three-
    layer architecture. Interoperability is the major
    issue.
  • Autonomy, distribution, and heterogeneity are
    orthogonal issues.
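
Because the three dimensions are orthogonal, the design space is simply their Cartesian product. The sketch below enumerates it, using the A/D/H codes from the slides. The short descriptions are paraphrases and the `describe` helper is illustrative only.

```python
from itertools import product

AUTONOMY = {
    "A0": "tightly coupled (integrated)",
    "A1": "semiautonomous (federated)",
    "A2": "total isolation (multidatabase)",
}
DISTRIBUTION = {
    "D0": "single site",
    "D1": "client-server",
    "D2": "peer-to-peer",
}
HETEROGENEITY = {"H0": "homogeneous", "H1": "heterogeneous"}

# Every (autonomy, distribution, heterogeneity) combination.
design_space = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))


def describe(point):
    """Spell out one point of the design space in words."""
    a, d, h = point
    return f"{AUTONOMY[a]}, {DISTRIBUTION[d]}, {HETEROGENEITY[h]}"


classical = describe(("A0", "D2", "H0"))
```

The product has 3 x 3 x 2 = 18 points. The slides discuss only a subset of them, and mark some, such as (A1,D0,H0), as not realistic.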

Client/Server Database Systems
  • Distinguish and divide the functionality to be
    provided into two classes: server functions and
    client functions. That is, a two-level
    architecture. Made popular by relational DBMS
  • DBMS client: user interface, application,
    consistency checking of queries, and caching and
    managing locks on cached data.
  • DBMS server: handles query optimization, data
    access, and transaction management.
  • Typical scenarios: multiple clients/single
    server; multiple clients/multiple servers
    (dedicated home server or any server).

Client/Server Reference Architecture
[Figure. Client: User Interface, Application Program, Client DBMS, Communication software, Operating System. Exchanged: SQL Queries, Result Relation. Server: Communication software, Semantic Data Controller, Query Optimizer, Transaction Manager, Recovery Manager, Runtime Support Processor.]

Distributed Database Reference Architecture

Components of Distributed DBMS
[Figure. Labels: User Requests, System Responses, User Processor, Data Processor, System Log.]

MDBS Architecture With Global Schema

MDBS Architecture without Global Schema
[Figure. Labels: Multidatabase Layer, Local Database System Layer.]

Components of MDBS
[Figure. Labels: User Requests, System Responses, Multi-DBMS Layer.]

Global Directory Issues
  • The directory is itself a database that contains
    metadata about the actual data stored in the
    database. It includes the support for
    fragmentation transparency for the classical
    DDBMS architecture.
  • The directory can be local or distributed.
  • The directory can be replicated and/or
    partitioned.
  • Directory issues are very important for large
    multi-database applications, such as digital
    libraries.

Impact of new technologies
  • Internet and WWW
  • Semi-structured data, multimedia data
  • Keyword based search - browsing versus querying
  • What does integration mean?
  • Applied technologies
  • Workflow systems
  • Data warehousing, data mining
  • What is the role of distributed database
    technology?

Research Issues - DDBMS Technology
  • Evaluation of state-of-the-art data replication
    strategies.
  • On-line distributed relational database redesign.
  • Distributed object-oriented database systems -
    design (fragmentation, allocation), query
    processing (methods execution, transformation),
    transaction processing.
  • WWW and Internet - transparency issues,
    implementation strategies (architecture,
    scalability), on-line transaction processing,
    on-line analytical processing (data warehousing,
    data mining), query processing (STRUDEL, WebSQL),
    commit

Research Issues - Applications
  • Workflow systems - high throughput (supply chain,
    Amazon, ...), short, sweet, and robust versus ad-hoc
    (office automation) problem solving.
  • Electronic commerce - reliable high throughput,
    distributed transactions.
  • Distributed multimedia - QoS, real-time delivery,
    design and data allocation, MPEG-4 aspects.