Distributed Database research group presentation

About This Presentation

Transcript and Presenter's Notes

Title: Distributed Database research group

1
Distributed Database research group
In the name of God

Instructor Dr. M. Rahgouzar
Samira Tasharofi
Reza Basseda

2
Outline

Introduction
Distributed Data Storage
Distributed Transaction
Commit Protocols
Concurrency Control in Distributed Database
Availability
Distributed Query Processing
Heterogeneous Distributed Databases
Directory Systems
Conclusion
Acknowledgement

3
Introduction

Distributed computing
Consists of a number of processing elements (not
necessarily homogeneous ) that interconnected by
a computer network and that co-operate in
performing their assigned tasks
Distributed Database
Database whose relations reside on different
sites
Database some of whose relations are replicated
at different sites
Database whose relations are split between
different sites

4
Introduction

Distributed database management system
Software system that permits the management of
distributed database system
Advantages
Local autonomy
Improved performance (by proper fragmentation)
Improved reliability/availability (by
replication)
Greater expandability
Greater shareability

5
Introduction (Cont.)

Disadvantages
Higher complexity
Higher software and hardware cost
Synchronization and co-ordination among the sites
Higher maintenance overhead in case of
replication
Greater security problem

6
Distributed Data Storage

On Dynamic Fragmentation of Distributed Databases
Using Partial Replication D. Pinto, G. Torres
,2002
Fragmentation
Three kinds of fragmentation Vertical
Fragmentation, Horizontal Fragmentation and Mixed
Fragmentation
Solution
horizontal fragmentation with partial replication
RBy2(bound) Algorithm
1. For each query requested, a slave computer
increments a counter (ctr) for the user that have
made the request.
2. If ctr reaches bound number (parameter of this
algorithm), then this computer is a candidate to
have a set of records replicated and need to
follow steps 3 and 4, else step 5.
3. Request the set of records that the user is
asking for and save this information into the
slave database.
4. Reset the user local counter to cero.
5. end
Local access database before sending the query to
the master computer

7
Distributed Data Storage (Cont.)

The algorithm is based on two techniques
Slave-Master Search, to provide fast access to
database on user queries
Replication slave-master (two computers) in order
to get availability
Allows database availability even when the
connection between slave and master database is
broken
Proved on local network
Future Work
Check how the algorithm works under dial-up
connection

8
Distributed Data Storage (Cont.)

Transparent Data Relocation in Highly Available
Distributed Systems,S. Voulgaris, M.V. Steen, A.
Baggio, and G. Ballintjn, 2002
Management issue of distributed services
redistribution of non-replicated data among the
servers comprising a distributed service
Redistribute the data without disrupting the
services availability
Solution Base
Shipping the data records that need to be
relocated to their new hosting server
Updating the servers mapping information to
reflect the new configuration of the distributed
service

9
Distributed Data Storage (Cont.)

Solution For a Single Redistribution
Initialization
Distribute new mapping M
Record Relocation
Termination
Replace M with M
Solution for Overlapping Redistributions
Per-server Sequential Redistribution
Using redistribution R2 after R1 completed
Per-server Mixed but Ordered Redistributions
The server ships each record as soon as possible,
based on the virtual mapping with first
preference
Direct Shipping to Final Destination

10
Distributed Data Storage (Cont.)

Advantages
low delays in the servicing of client requests
during a configuration change
Adding no significant processing requirement to
the servers involved, and terminates in a timely
fashion
conceptual simplicity
sequential concurrent versions

11
Transaction Management

Ensuring Relaxed Atomicity for Flexible
Transaction in Multidatabase Systems, A.Zhang,
M.Nodine, B.Bhargava, O.Bukhres, ACM
Global transaction A set of sub transactions,
where each sub transaction is a transaction
accessing the data items at a single local site
Flexible global transaction Specifying
Definitions of
Execution ordering dependencies between two sub
transaction
Alternating dependencies between two subsets of
sub transactions
Eliminate prepare-to-commit stage
Sub transaction
Retriable
Compensable
Pivot
In global transactions at most one sub
transaction can be pivot

12
Transaction Management (Cont.)

Semi-atomicity in Flexible Transaction Weaker
than atomicity in global transaction
All its sub transaction in one lt-rpo commit and
all attempted sub transactions not in the
committed lt-rpo are either aborted or have their
effects undone
No partial effects of its sub transactions remain
permanent in local database
Ensuring semi-atomicity
Using retry and compensetable techniques
Flexible transaction Advantages
Enhances the scope of global transaction
management beyond that offered by the traditional
global transaction model
Blocking that caused by two-phase commit can be
prevented

13
Transaction Management (Cont.)

Global scheduling for flexible Transactions in
Heterogeneous distributed Database Systems, A.
Zhang, M. Nodine, B. Bhargava
Global Serializability
If the projection of committed local, flexible
and surplus transactions is conflict equivalent
to some serial execution of these transactions
Compensation-interference free
For any sub transaction tj which is serialized
between a subtransaction ti and its compensating
transaction cti in S, WC(ti) /\ AC(tj) 0

14
Transaction Management (Cont.)

F-serializability
Prevents the flexible transactions which are
serialized between a flexible transaction and its
compensating sub transactions to affect any data
items that have been updated by flexible
transaction (global serializable and compensation
free)
Avoids unnecessary abort or compensation
Scheduling Protocols
Stored Sub transaction Execution Graph (SSEG)
Avoid cascading abort

15
Commit Protocols

A Two-Phase Commit Protocol for Mobile Wireless
Environment, N. Nouali, A. Doucet, H. Drias, 2005
IF the traditional 2PC is executed in mobile
environment, disconnections will increase the
number of, may be wrong, abortion decisions of
transaction because if a FH tries to communicate
with it a disconnected MH this will cause a
failure
Disconnections are not exceptions but rather are
part of the normal mode of operation, so they
should not be treated as failures

16
Commit Protocols (Cont.)

The case of mobile client and fixed servers,
Fixed Coordinator
To mitigate the unforeseeable breakdowns, the
client must force-write the identity and location
information of the coordinator (commit-BS) just
before sending the commit-request
Only one force-write is needed to record the
coordinator information during the entire
execution of Atomic commitment Protocol (ACP)

17
Commit Protocols (Cont.)

The case of mobile client and mobile servers
The participant-agent is responsible of
transmitting the result to the participant at
reconnection time and also of keeping logs and
eventually recovering in the case of failure
When participant registers to a new BS, the
participant MH (or mobile participant) informs
its participant-agent about its new location
Workload is shifted to the fixed part of the
network thus preserving processing power and
communication resources and minimizing traffic
cost over the wireless links

18
Commit Protocols (Cont.)

Reducing the Latency of Non-Blocking Commitment
using Optimism and Replication, R.J.Peris,
M.P.Martinez,G.Alonso, S.Arevalo, 2001
2Phase Commit Blocking
3Phase Commit
Sending too much messages
Low performance
Flushing log records adds to the overall latency
as messages cannot be sent or responded to before
writing to the log
Delay is reduced by allowing sites to send
messages instead of flushing log records

19
Commit Protocols (Cont.)

Solution
To use the main memory of a replicated group as
stable memory instead of a mirrored log with
careful writes
The group of the participating transaction
managers (the TM group)
A replicated group providing the commit service
that acts as coordinator (the CS group)
A participant does not wait to flush its log,
instead it uniformly multicasts its vote together
with its log entry

20
Commit Protocols (Cont.)

If the message corresponds to the last vote, and
all were yes votes, the transaction is
optimistically committed, and the fact is
communicated to the TM group
The optimistic commit changes the locks held by
the transaction to opt-mode
Do not allow to the holding transaction to
commit until the transaction that released them,
definitively commits

21
Commit Protocols (Cont.)

The message is optimistically delivered right
away without waiting for the stabilization of the
message (e.g., waiting for the message to be
received by all the members of the group)

22
Commit Protocols (Cont.)

The PROMPT Real-Time Commit Protocol,
J.H.Harista, K.Ramamritham, R.Gupta,IEEE,1999
Distributed commit processing can have
considerably more effect than distributed data
processing on real-time performance
PROMPT
New commit protocol in real-time distributed
transactions
Preventing borrowers from continuing to execute
if their associated lenders had not been received
their decisions was addressed by incorporating an
additional bit and message that informed the
master about the borrowing state and the
completion of borrowing by a cohort

23
Commit Protocols (Cont.)

Features
Controlled optimistic access to uncommitted data
Reduce data inaccessibility and priority
inversion
Active abort
Cohorts inform the master as soon as they decide
to abort locally
Silent kill
Aborts due to deadline misses that occur before
the master has initiated the commit protocol are
implemented silently
Healthy-Lending
Health factor deadline of transaction
Using HF to decide whether transaction can lend
its data (best choice)

24
Commit Protocols (Cont.)

One-Phase Real-Time Commit Protocols, P.Saha,
1999
Comparing one-phase protocols (e.g. EP) and
PROMPT, the best-performing two-phase real-time
commit protocol
For parallel distributed transaction, EP
outperforms PROMPT
For sequential distributed transaction, EP
perform rather poorly
For high workload cases EP performs better
Future Works
Addressing the security considerations in
Multi-Level Secure (MLS) distributed RTDBS
Combination of EP and PROMPT

25
Concurrency Control in Distributed Databases

Distributed Concurrency Control Performance A
Study of Algorithms, Distribution, and
Replication
Michael J. Carey Miron Livny
It express Distributed Concurrency Control
Algorithms and evaluate their performance in some
of conditions. At the start, It describe
Concurrency Control classic algorithms such as
2PL, Wound-Wait, Basic Timestamp ordering and
discussed on the structure of distributed
concurrency control algorithm. Then it suggest a
basic model for DDB and has some experiments with
those algorithm and evaluate the model.

26
Concurrency Control in Distributed Databases
(Cont.)

Concurrency Control in Distributed Database
Systems
PHILIP A. BERNSTEIN NATHAN GOODMAN
It explain mythological proofs for Distributed
Concurrency Control Algorithms and theoretically
evaluate them. It define serializablity in DDB
and define a formal language to formulate
transactions in DDB. Then it express each of
classic algorithms in his formal language and
prove their correctness and completeness.

27
Concurrency Control in Distributed Databases
(Cont.)

Concurrency Control in Distributed
Object-Oriented Database Systems
Kjetil Nørvag Olav Sandsta and Kjell
Bratbergsengen
The simulation results in this paper is a
comparison of performance and response times for
two concurrency control algorithms, timestamp
ordering and two-phase locking. The simulations
have been run with different number of nodes,
network types, data declustering and workloads.
The results show that for a mix of small and long
transactions, the throughput is significantly
higher for a system with a timestamp ordering
scheduler than for a system with a two-phase
locking scheduler.
Implementing Atomic Actions on Decentralized Data
DAVID P. REED
Its a general survey on Concurrency Control in
DDB and describe classic method.

28
Concurrency Control in Distributed Databases
(Cont.)

Dynamic Voting Algorithms for Maintaining the
Consistency of a Replicated Database
SUSHIL JAJODIA DAVID MUTCHLER
The best known pessimistic algorithm, voting, is
a static algorithm, meaning that all potential
distinguished partitions can be listed in
advance. It presents a dynamic extension of
voting called dynamic voting. This algorithm
permits updates in a partition provided it
contains more than half of the up-to-date copies
of the replicated file. It also presents an
extension of dynamic voting called dynamic voting
with linearly ordered copies (abbreviated as
dynamic-linear). These algorithms are dynamic
because the order in which past distinguished
partitions were created plays a role in the
selection of the next distinguished partition.

29
Concurrency Control in Distributed Databases
(Cont.)

Deadlock Detection in Distributed Databases
EDGAR KNAPP
This paper is concerned only with the aspect of
deadlock detection. Recent developments in the
area of distributed deadlock detection algorithms
are surveyed, with a special emphasis on their
relation to distributed DBSs. The paper
introduces a uniform framework for the discussion
of these algorithms. The abstraction achieved
this way allows us to talk about the algorithms
in terms of the underlying theoretical concepts,
instead of just giving a phenomenon-logical
description of the workings of the algorithms.

30
Concurrency Control in Distributed Databases
(Cont.)

MODELS OF A VERY LARGE DISTRIBUTED DATABASE
Mark Blakey
The best known pessimistic algorithm, voting, is
a static algorithm, meaning that all potential
distinguished partitions can be listed in
advance. It presents a dynamic extension of
voting called dynamic voting. This algorithm
permits updates in a partition provided it
contains more than half of the up-to-date copies
of the replicated file. It also presents an
extension of dynamic voting called dynamic voting
with linearly ordered copies (abbreviated as
dynamic-linear). These algorithms are dynamic
because the order in which past distinguished
partitions were created plays a role in the
selection of the next distinguished partition.

31
Concurrency Control in Distributed Databases
(Cont.)

Performance Study of a Centralized Concurrency
Control Algorithm for Distributed Database
Systems using SIMULA
K. Viswanathan Iyer L. M. Patnaik
One objective of this paper is to elaborate the
simulation methodology using SIMULA. Detailed
studies have been carried out on a centralized CC
algorithm and its modified version. The results
compare well with a previously reported study on
these algorithms. Here, additional results
concerning the update intensiveness of
transactions and the degree of conflict are
obtained. The degree of conflict is
quantitatively measured and it is seen to be a
useful performance index. It seems that, It is
going to formulate the effectiveness of
Concurrency Control Algorithm and it focused on
the behavior of a class of performance index.

32
Availability

Maintaining Availability in Partitioned
Replicated Databases
AMR EL ABBADI SAM TOUEG
It describes a new replica control protocol that
allows the accessing of data in spite of site
failures and network partitioning. It claims that
this protocol provides the database designer with
a large degree of flexibility in deciding the
degree of data availability, as well as the cost
of accessing data.
Providing High Availability Using Lazy
Replication
RIVKA LADIN BARBARA LISKOV SANJAY GHEMAWAT
This paper describes a new technique that
supports causal order. An operation call is
executed at just one replica updating of other
replicas happens by lazy exchange of gossip
messageshence the name lazy replication. The
replicated service continues to provide service
in spite of node failures and network partitions.

33
Distributed Query Processing

Query Brokers for Distributed And Flexible Query
Evaluation
Tuyet-Trinh yu Christine Collet
This paper provides an approach for designing
query processor of a DDB by using hierarchical
mediators and using query brokers which translate
a global query in DDB context to local queries
Query Decomposition, Optimization and Processing
in Multidatabase Systems
Cem Evrendilek Asuman Dogac
This paper suggest an approach to decomposing
queries in a optimized manner. In this way, we
need to dynamically calculate cost of query
processing in every sites for all of the sub
queries and using this factors in calculating
minimum cost.

34
Distributed Query Processing (Cont.)

Dynamically Distributed Query Evaluation
Trevor Jim Dan Suciu
This paper provides an approach for evaluation of
queries over the web and a directory system
dynamically. It provides a language for
explaining information requirements over a multi
database system. It uses this language for
defining a DDB and its queries in a formal way.
So it suggest an algorithm for dynamic query
evaluation in DDB and by this logic it proves its
algorithms correctness.

35
Distributed Query Processing (Cont.)

Database Connectivity Using an Agent-Based
Mediator System
Larry M. Stephenes Michael N Huhns
This paper provides an Agent-Based approach for
managing a DDB. It uses Agents as proactive
components which include KB about system and have
reaction to topology changes to manage query
processing and concurrency control , . It uses
KQML and a specific coordination strategy for
this system.

36
Distributed Query Processing (Cont.)

Optimizing Equijoin Queries In Distributed
Databases Where Relations Are t-lash Partitioned
DENNIS SHASHA TSONG-LI WANG
It studies the optimization problem that arises
when the query processor must repartition the
relations and intermediate results participating
in a multi join query. Using estimates of the
sizes of intermediate relations, it shows (1)
optimum solutions for closed chain queries (2)
the NP-completeness of the optimization problem
for star, tree, and general graph queries and
(3) effective heuristics for these hard cases.

37
Conclusion

DDB is a mature topic and many model provided for
expressing its approaches such as concurrency
control, availability ,
Various models using to prove correctness of its
approaches in concurrency controls ,
It is faced with many of DB problems in a new
viewpoint because of its distribution
Legacy system and wrapper design to

38
Conclusion (Cont.)

High availability by increasing replication vs.
performance of transaction management and
concurrency control A trade off
DDB system Concurrency control open problems
Improving recent approaches to increase
availability with performance
Improving distributed query evaluation over large
distributions

39
Conclusion (Cont.)

Tuple routing over a distribution and data
distribution with high performance and
performance evaluation factors in DDB
Using autonomous components to manage a DDB and
using AI in peer-to-peer DDB

Distributed Database research group PowerPoint PPT Presentation