Distributed Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Distributed Databases

Provided by: Prof599
Learn more at: http://www.cs.utsa.edu
Transcript and Presenter's Notes

Title: Distributed Databases


1
Distributed Databases
  • John Ortiz

2
Distributed Databases
  • Distributed Database (DDB) is a collection of
    interrelated databases interconnected by a
    computer network
  • Distributed Database Management System (DDBMS) is
    software which manages a distributed database
  • World Wide Web technology does not yet constitute
    a DDB by our definition

3
Advantages of a DDB
  • Supports various levels of transparency
  • Distribution (network) transparency: the degree to
    which the user is unaware of the networked nature
    of the DB
  • Replication transparency: the degree to which the
    user is unaware of copies of the DB
  • Fragmentation transparency: the degree to which
    the user is unaware that the DB is broken into
    pieces

4
Advantages of a DDB
  • Increased Reliability and Availability
  • Reliability: the probability that a system is
    running at a particular point in time
  • Availability: the probability that a system is
    continuously available during a time interval

5
Advantages of a DDB
  • Improved Performance
  • Supports data localization: data is kept near
    where it is most often used, to reduce the effects
    of network delay
  • Easier Expansion
  • Adding more data, increasing DB size, and adding
    resources are all easier
  • Reduced Operation Costs (when compared with a
    mainframe system)
  • Cheaper to add workstations than a new mainframe
    computer

6
Advantages of a DDB
  • No Single Point of Failure
  • When one computer fails, others can take its place

7
Disadvantages of a DDB
  • Significant increase in complexity
  • Normalization, query optimization, security,
    transaction processing, concurrency control,
    crash recovery, etc. ALL become much more
    difficult to handle
  • Increased storage requirements
  • Since multiple copies of various portions of the
    DB exist, more storage space is required

8
Data Fragmentation
  • Fragmentation is the division of the database
    into pieces stored at different sites
  • Horizontal Fragmentation: a subset of the tuples
    in a particular relation
  • The result of a query which SELECTS some tuples,
    but not others, is a horizontal fragment
  • In a DDB, the output from the previous query may
    be stored as a separate DB at a separate site
  • Requires a UNION to recombine information
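
The horizontal fragmentation described above can be sketched in a few lines of Python. This is only an illustration: the EMPLOYEE rows, the column names (e_name, dept_num), and the helper functions are made up for the example and do not come from the slides.

```python
# Hypothetical EMPLOYEE relation; rows and column names are illustrative only.
EMPLOYEE = [
    {"e_name": "Ann", "dept_num": 5},
    {"e_name": "Bob", "dept_num": 4},
    {"e_name": "Cho", "dept_num": 5},
]


def horizontal_fragment(relation, predicate):
    """SELECT the tuples that satisfy the predicate; every attribute is kept."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Recombine fragments with a set-style UNION (duplicate tuples dropped)."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Two fragments that could be stored at two different sites.
frag_dept5 = horizontal_fragment(EMPLOYEE, lambda r: r["dept_num"] == 5)
frag_other = horizontal_fragment(EMPLOYEE, lambda r: r["dept_num"] != 5)

# The UNION of the fragments gives back every tuple of the original relation.
reconstructed = union(frag_dept5, frag_other)
```

Because the two predicates together cover every tuple, the UNION loses nothing; a pair of predicates that skipped some tuples would not allow the relation to be rebuilt.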

9
Data Fragmentation
  • Vertical Fragmentation: a subset of the attributes
    of a particular relation
  • The result of a query which PROJECTS certain,
    specific attributes
  • Requires an outer join (or an outer union) to
    recombine information
  • Hybrid Fragmentation: can you guess?
  • Includes both horizontal and vertical
    fragmentation
  • Complete fragmentation simply means all
    tuples/attributes are in the result
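
A minimal sketch of vertical fragmentation and its recombination follows. It assumes that every fragment carries a shared key attribute (called ssn here) so that tuples can be matched up again; the key, the rows, and the function names are assumptions made for the example, not details from the slides.

```python
# Hypothetical EMPLOYEE relation; "ssn" is an assumed key attribute.
EMPLOYEE = [
    {"ssn": 1, "e_name": "Ann", "salary": 50000},
    {"ssn": 2, "e_name": "Bob", "salary": 42000},
]


def vertical_fragment(relation, attributes):
    """PROJECT the named attributes from every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def full_outer_join(left, right, key):
    """Recombine two fragments on the key attribute.

    A key present in only one fragment is kept, with None for the
    attributes that only the other fragment could have supplied.
    """
    all_attrs = {a for row in left + right for a in row}
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], dict.fromkeys(all_attrs)).update(row)
    return [merged[k] for k in sorted(merged)]


# Each fragment keeps the key plus a subset of the other attributes.
frag_names = vertical_fragment(EMPLOYEE, ["ssn", "e_name"])
frag_pay = vertical_fragment(EMPLOYEE, ["ssn", "salary"])

rebuilt = full_outer_join(frag_names, frag_pay, "ssn")
```

Without a common key in each fragment there would be no reliable way to tell which name belongs with which salary.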

10
Data Fragmentation
  • A fragmentation schema is a definition of the set
    of fragments that includes all attributes and
    tuples sufficient to reconstruct the DB
  • An allocation schema describes which fragments
    are at what sites

11
Data Replication
  • Replication is the creation of copies of the DB
  • A DDB may be fully replicated (a copy of the
    entire DB is made at each site)
  • Why would you want to make a full copy of a DDB?
  • A DDB may have no replication (each fragment is
    stored at one and only one site)
  • Naturally, a DDB may be partially replicated
  • A replication schema is a description of what
    pieces are copied at which sites
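
One way to picture a replication schema is as a mapping from each fragment to the set of sites that hold a copy. The sketch below uses that representation to tell the three cases apart; the site names, fragment names, and the function are hypothetical.

```python
# Hypothetical sites and fragment names, for illustration only.
SITES = {"site1", "site2", "site3"}


def replication_level(allocation, sites):
    """Classify an allocation (fragment -> set of sites) as full, none, or partial."""
    if all(holders == sites for holders in allocation.values()):
        return "full"      # every fragment is stored at every site
    if all(len(holders) == 1 for holders in allocation.values()):
        return "none"      # each fragment is stored at one and only one site
    return "partial"       # some fragments are copied, but not everywhere


fully_replicated = {"EMP_A": set(SITES), "EMP_B": set(SITES)}
not_replicated = {"EMP_A": {"site1"}, "EMP_B": {"site2"}}
partially_replicated = {"EMP_A": {"site1", "site2"}, "EMP_B": {"site3"}}
```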

12
Data Replication
  • Replication creates new consistency and
    redundancy problems
  • Every piece of data that is replicated is
    redundant, and can therefore become inconsistent
  • These copies may be updated separately, which
    causes inconsistency
  • How much inconsistency is acceptable?

13
Synchronization
  • Synchronization is the process of updating the
    individual replicas
  • Since pieces are stored in different places, the
    DDB must periodically be made consistent
  • Synchronization can be expensive in terms of
    network resources and time
  • It is not simply copying one replica to another:
    the most recent updates on both copies being
    synchronized must be accounted for
  • Pages 775-778 in the text have an example of a DDB
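
To see why synchronization is more than a copy, consider two replicas that have each received updates the other has not. The sketch below merges them by keeping, for each key, the version with the newer timestamp. This "last writer wins" rule is an assumption chosen for the example; the slides do not say which reconciliation policy a DDBMS uses, and the data is invented.

```python
def synchronize(replica_a, replica_b):
    """Merge two replicas, each mapping key -> (timestamp, value).

    For every key the version with the newer timestamp is kept.
    On a tie the version from replica_a is kept.
    """
    merged = dict(replica_a)
    for key, (timestamp, value) in replica_b.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Each site holds one update that the other site has not seen yet.
site_a = {"ann": (10, "ann@base1"), "bob": (12, "bob@base1")}
site_b = {"ann": (15, "ann@base2"), "bob": (11, "bob@base3")}

merged = synchronize(site_a, site_b)
```

Neither site's copy equals the merged result, so overwriting one replica with the other would have lost an update.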

14
US Air Force Email
  • We have noted in the past that there are many
    types of databases such as spreadsheets, address
    books, and even documents (such as MS Word)
  • Consider the AF with approximately 500,000 people
    who all have email addresses and need to
    communicate
  • They have constructed a global email address book
    and make use of replication
  • The AF is divided into levels: global, command,
    base

15
US Air Force Email
  • Initially the bases were each set up with email
    and interconnected via the network
  • However, you had to know the email address of
    anyone at a different base
  • Eventually, each command (a group of related
    bases) set up an address book consisting of all
    the bases
  • Each base maintains a complete replica of the
    entire command's address book
  • Why not just a piece?

16
US Air Force Email
  • The DB is synchronized each night
  • So, when someone moves, their email address is
    removed from the local copy
  • All the other bases will still have that old
    email address until the next day, at which point
    the DDB is consistent again
  • I believe that now the entire AF address book is
    available at each base
  • Not sure how often it is synchronized, perhaps
    weekly

17
US Air Force Email
  • Search for an email address is quick since a
    local copy is kept
  • This reduces network traffic considerably
    compared with everyone having to search a
    centralized DB for email addresses

18
Query Processing in DDB
  • When we looked at query processing before, the
    largest delay was with the disk
  • Now, that same concept is extended to include
    network delay, which can be much longer
  • Suppose the EMPLOYEE DB (10,000 records, 100
    bytes each) is at site 1, and the DEPARTMENT DB
    (100 records, 35 bytes each) is at site 2
  • YOU are at site 3
  • Assume the query result is 400,000 bytes

19
Query Processing in DDB
  • SELECT E_Name
    FROM EMPLOYEE
    WHERE DeptNum = 5
  • There are 3 strategies
  • 1) Transfer both DBs to site 3 and perform the
    query there (1,003,500 bytes transferred)
  • 2) Transfer EMPLOYEE to site 2, perform the query,
    and transfer the result to site 3 (1,400,000 bytes
    transferred)
  • 3) Transfer DEPARTMENT to site 1, perform the
    query, and transfer the result to site 3 (403,500
    bytes transferred)
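
The byte counts for the three strategies follow directly from the relation sizes given on the previous slide. The short calculation below reproduces them, counting only bytes shipped over the network (local processing cost is ignored); the variable names are chosen for the example.

```python
# Sizes taken from the slides: record count times bytes per record.
EMPLOYEE_BYTES = 10_000 * 100      # stored at site 1
DEPARTMENT_BYTES = 100 * 35        # stored at site 2
RESULT_BYTES = 400_000             # assumed size of the query result

strategies = {
    "1: ship both DBs to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "2: ship EMPLOYEE to site 2, result to site 3": EMPLOYEE_BYTES + RESULT_BYTES,
    "3: ship DEPARTMENT to site 1, result to site 3": DEPARTMENT_BYTES + RESULT_BYTES,
}

# The cheapest strategy is the one that moves the fewest bytes.
best = min(strategies, key=strategies.get)

for name, cost in strategies.items():
    print(f"{name}: {cost:,} bytes")
```

Strategy 3 wins because it ships the small relation to the large one instead of the other way around.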

20
Query Processing using Semijoin
  • Rather than sending the entire set of records to
    be joined, we could send just the joining
    attribute(s)
  • The join is then performed, and the join
    attributes, along with the projected attributes,
    can be transferred to the requesting site
  • The semijoin is symbolized as ⋉
  • NOTE: R ⋉ S ≠ S ⋉ R
  • Substantially reduces the amount of data
    transferred
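
A semijoin can be sketched as a filter: keep the tuples of one relation whose join attribute value also appears in the other relation. The relations, column names, and function below are invented for illustration. Swapping the operands gives a different result, so the order of the operands matters.

```python
def semijoin(r, s, attr):
    """Return the tuples of r whose value for attr also appears in s."""
    # Only this set of join values needs to be sent to the site that holds r.
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]


# Hypothetical relations; rows and column names are illustrative only.
EMPLOYEE = [
    {"e_name": "Ann", "dept_num": 5},
    {"e_name": "Bob", "dept_num": 4},
    {"e_name": "Cho", "dept_num": 9},
]
DEPARTMENT = [
    {"dept_num": 5, "d_name": "Research"},
    {"dept_num": 4, "d_name": "Admin"},
    {"dept_num": 1, "d_name": "HQ"},
]

# Employees that have a matching department.
emp_semijoin_dept = semijoin(EMPLOYEE, DEPARTMENT, "dept_num")
# Departments that have at least one employee.
dept_semijoin_emp = semijoin(DEPARTMENT, EMPLOYEE, "dept_num")
```

Only the tuples that survive the filter need to take part in the final join, which is where the saving in transferred data comes from.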

21
Concurrency Control and Recovery
  • Dealing with multiple copies
  • Failure of individual sites
  • Failure of network
  • Distributed commit is more complicated
  • Deadlock is more difficult to detect and prevent
  • A number of techniques have been proposed to deal
    with these problems

22
Distinguished Copy
  • The locks for a data item are associated with the
    distinguished copy
  • There are several distinguished copy variations
  • Primary site (with backup)
  • One site is the chosen one and coordinates
    locking activities (centralized locking)
  • Primary copy
  • Copies of various fragments at different sites are
    chosen as the distinguished copies; this
    distributes the locking problem
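
The difference between the two variations can be sketched with a routing table that maps each data item to the site holding its distinguished copy. This is a simplified illustration only: it models exclusive locks, ignores backups and failures, and all class, item, and site names are hypothetical.

```python
class LockTable:
    """Exclusive locks held at one site: item -> transaction id."""

    def __init__(self):
        self.held = {}

    def acquire(self, item, txn):
        # Refuse only if a different transaction already holds the lock.
        if self.held.get(item, txn) != txn:
            return False
        self.held[item] = txn
        return True

    def release(self, item, txn):
        if self.held.get(item) == txn:
            del self.held[item]


class DistinguishedCopyLocking:
    """Route every lock request to the site of the item's distinguished copy."""

    def __init__(self, distinguished_site):
        self.distinguished_site = distinguished_site  # item -> site name
        self.tables = {site: LockTable() for site in set(distinguished_site.values())}

    def lock(self, item, txn):
        site = self.distinguished_site[item]
        return site, self.tables[site].acquire(item, txn)

    def unlock(self, item, txn):
        self.tables[self.distinguished_site[item]].release(item, txn)


# Primary site: one site coordinates the locks for every item.
primary_site = DistinguishedCopyLocking({"EMP_A": "site1", "EMP_B": "site1"})
# Primary copy: the distinguished copies are spread over several sites.
primary_copy = DistinguishedCopyLocking({"EMP_A": "site1", "EMP_B": "site2"})
```

With the primary copy mapping, a lock request for EMP_B never touches site1, which is how the locking load gets spread out.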

23
Distributed Recovery
  • Very complex
  • Suppose that X sends a request to Y; there may
    be a number of reasons the request was not
    granted
  • Message was never delivered
  • Site Y is down
  • Site Y sent a response but the response was not
    delivered
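
What makes these cases hard is that site X cannot tell them apart: in each one it simply never receives a reply. The toy simulation below makes that point; the function and the fault names are invented for the example and do not model a real network.

```python
def send_request(fault=None):
    """Simulate X sending a request to Y.

    Returns Y's reply, or None if X gives up waiting (a timeout).
    fault is one of None, "request_lost", "y_down", "response_lost".
    """
    if fault == "request_lost":
        return None            # the message never reached Y
    if fault == "y_down":
        return None            # Y was not running, so nothing was processed
    reply = "granted"          # Y processed the request
    if fault == "response_lost":
        return None            # Y acted, but X never hears about it
    return reply


faults = ("request_lost", "y_down", "response_lost")
observed = {fault: send_request(fault) for fault in faults}
```

In the first two cases Y never acted, while in the third it did, yet X sees the same timeout every time; this is one reason distributed recovery is complex.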

24
Summary
  • Re-read the first 23 slides!
  • Advantages/Disadvantages of a DDB
  • The 3 Transparencies: network, replication,
    fragmentation
  • Fragmentation
  • Replication and Synchronization
  • Query Processing in a DDB
  • Semijoin
  • Concurrency Control and Recovery

25
Primary Site Technique