Distributed Databases - PowerPoint PPT Presentation



Slides: 26
Provided by: Prof599
Learn more at: http://www.cs.utsa.edu


Transcript and Presenter's Notes

Title: Distributed Databases

Distributed Databases
  • John Ortiz

Distributed Databases
  • Distributed Database (DDB) is a collection of
    interrelated databases interconnected by a
    computer network
  • Distributed Database Management System (DDBMS) is
    software which manages a distributed database
  • World Wide Web technology does not yet constitute
    a DDB by our definition

Advantages of a DDB
  • Supports various levels of transparency
  • Distribution (network) transparency: the degree to which the
    user is unaware of the networked nature of the DB
  • Replication transparency: the degree to which the user is
    unaware of copies of the data
  • Fragmentation transparency: the degree to which the user is
    unaware that the DB is broken into pieces

Advantages of a DDB
  • Increased Reliability and Availability
  • Reliability: the probability that a system is running at
    a particular point in time
  • Availability: the probability that a system is
    continuously available during a time interval

Advantages of a DDB
  • Improved Performance
  • Supports data localization: data is kept near
    where it is most often used, to reduce the effects of
    network delay
  • Easier Expansion
  • Adding more data, increasing DB size, adding
    resources is easier
  • Reduced Operation Costs (when considering a
    mainframe system)
  • Cheaper to add workstations than a new mainframe

Advantages of a DDB
  • No Single Point of Failure
  • When one computer fails, others can take its place

Disadvantages of a DDB
  • Significant increase in complexity
  • Normalization, query optimization, security,
    transaction processing, concurrency control,
    crash recovery, etc. ALL become much more
    difficult to handle
  • Increased storage requirements
  • Since multiple copies of various portions of the
    DB exist, more storage space is required

Data Fragmentation
  • Fragmentation is the division of the database
    into pieces stored at different sites
  • Horizontal Fragmentation: a subset of the tuples in
    a particular relation
  • The result of a query that SELECTS some tuples
    but not others is a horizontal fragment
  • In a DDB, the output from the previous query may
    be stored as a separate DB at a separate site
  • Requires a UNION to recombine information
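
To make the SELECT-then-UNION idea concrete, here is a minimal sketch in Python. The EMPLOYEE rows, the attribute names, and the helper functions are all hypothetical and chosen only for illustration; a real DDBMS would do this inside the query engine.

```python
# Hypothetical EMPLOYEE relation; names and values are illustrative only.
EMPLOYEE = [
    {"e_id": 1, "e_name": "Ann", "dept_num": 5},
    {"e_id": 2, "e_name": "Bob", "dept_num": 4},
    {"e_id": 3, "e_name": "Cid", "dept_num": 5},
]

def horizontal_fragment(relation, predicate):
    """SELECT the tuples that satisfy the predicate (all attributes kept)."""
    return [row for row in relation if predicate(row)]

def union(*fragments):
    """Recombine horizontal fragments, dropping duplicate tuples."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result

# Two fragments that could be stored at two different sites.
site1 = horizontal_fragment(EMPLOYEE, lambda r: r["dept_num"] == 5)
site2 = horizontal_fragment(EMPLOYEE, lambda r: r["dept_num"] != 5)

# The UNION of the fragments gives back the original relation.
rebuilt = union(site1, site2)
```

Because the two predicates are complementary, every tuple lands in exactly one fragment, so the union loses nothing and adds nothing.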

Data Fragmentation
  • Vertical Fragmentation: a subset of the attributes
    of a particular relation
  • The result of a query that PROJECTS certain,
    specific attributes
  • Requires an outer join (or an outer union) to
    recombine information
  • Hybrid Fragmentation: can you guess?
  • Includes both horizontal and vertical
  • Complete fragmentation simply means all
    tuples/attributes are in the result
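
A rough sketch of vertical fragmentation and its reconstruction, in Python. It assumes every fragment keeps the key attribute (here a hypothetical `e_id`) so the pieces can be matched up again; the relation, attribute names, and helper functions are illustrative and not taken from the slides.

```python
# Hypothetical EMPLOYEE relation; names and values are illustrative only.
EMPLOYEE = [
    {"e_id": 1, "e_name": "Ann", "dept_num": 5, "salary": 50000},
    {"e_id": 2, "e_name": "Bob", "dept_num": 4, "salary": 42000},
]

def vertical_fragment(relation, attributes, key="e_id"):
    """PROJECT the chosen attributes, always keeping the key (assumption)."""
    cols = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in cols} for row in relation]

def full_outer_join(left, right, key="e_id"):
    """Match rows on the key; attributes missing on one side become None."""
    all_attrs = {a for row in left + right for a in row}
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], dict.fromkeys(all_attrs)).update(row)
    return [merged[k] for k in sorted(merged)]

names = vertical_fragment(EMPLOYEE, ["e_name"])             # e.g. stored at site 1
pay = vertical_fragment(EMPLOYEE, ["dept_num", "salary"])   # e.g. stored at site 2

rebuilt = full_outer_join(names, pay)
```

When both fragments contain every key, the outer join behaves like an ordinary join; the `None` padding only matters once horizontal fragmentation is mixed in.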

Data Fragmentation
  • A fragmentation schema is a definition of the set
    of fragments that includes all attributes and
    tuples sufficient to reconstruct the DB
  • An allocation schema describes which fragments
    are at what sites

Data Replication
  • Replication is the creation of copies of the DB
  • A DDB may be fully replicated (a copy of the
    entire DB is made at each site)
  • Why would you want to make a full copy of a DDB?
  • A DDB may have no replication (each fragment is
    stored at one and only one site)
  • Naturally, a DDB may be partially replicated
  • A replication schema is a description of what
    pieces are copied at which sites

Data Replication
  • Replication creates new consistency and
    redundancy problems
  • Every piece of data that is replicated is
    redundant, and therefore subject to be
  • These copies may be updated separately which
    causes inconsistency
  • How much inconsistency is acceptable?

  • Synchronization is the process of updating the
    individual replicas
  • Since pieces are stored in different places, the
    DDB must periodically be made consistent
  • Synchronization can be expensive in terms of
    network resources and time
  • It is not simply copying one replica to another:
    the most recent updates on both copies being
    synchronized must be accounted for
  • P.775 - 778 in the text has an example of a DDB
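
One simple way to account for the most recent updates on both copies is a "last writer wins" merge. The sketch below is only an illustration under strong assumptions that the slides do not state: every entry carries a timestamp, the sites' clocks are comparable, and a deletion is recorded as a `None` value rather than by dropping the entry. The replica contents are made up.

```python
def synchronize(replica_a, replica_b):
    """Merge two replicas so both hold the newest version of every key.

    Each replica maps key -> (timestamp, value); value None marks a deletion.
    """
    merged = {}
    for key in replica_a.keys() | replica_b.keys():
        versions = [r[key] for r in (replica_a, replica_b) if key in r]
        # Keep the version with the largest timestamp (last writer wins).
        merged[key] = max(versions, key=lambda version: version[0])
    for replica in (replica_a, replica_b):
        replica.clear()
        replica.update(merged)
    return merged

# Hypothetical address-book replicas at two bases.
base1 = {"ann": (10, "ann@base1"), "bob": (12, None)}   # bob removed at t=12
base2 = {"ann": (8, "ann@old"), "bob": (5, "bob@base1"), "cid": (11, "cid@base2")}

synchronize(base1, base2)
```

Copying `base1` over `base2` would have lost the new `cid` entry, and copying the other way would have resurrected `bob`; merging per key avoids both.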

US Air Force Email
  • We have noted in the past that there are many
    types of databases such as spreadsheets, address
    books, and even documents (such as MS Word)
  • Consider the AF with approximately 500,000 people
    who all have email addresses and need to
  • They have constructed a global email address book
    and make use of replication
  • The AF is divided into levels: global, command,

US Air Force Email
  • Initially the bases were each set up with email
    and interconnected via the network
  • However, you had to know the email address of
    anyone at a different base
  • Eventually, each command (a group of related
    bases) set up an address book consisting of all
    the bases
  • Each base maintains a complete replica of the
    entire command's address book
  • Why not just a piece?

US Air Force Email
  • The DB is synchronized each night
  • So, when someone moves, their email address is
    removed from the local copy
  • All the other bases will still have that old
    email address until the next day, at which point
    the DDB is consistent again
  • I believe that now the entire AF address book is
    available at each base
  • Not sure how often it is synchronized, perhaps

US Air Force Email
  • Search for an email address is quick since a
    local copy is kept
  • This reduces network traffic considerably
    compared with everyone having to search a
    centralized DB for email addresses

Query Processing in DDB
  • When we looked at query processing before, the
    largest delay was with the disk
  • Now, that same concept is extended to include
    network delay which can be much longer
  • Suppose the EMPLOYEE DB (10,000 records, 100
    bytes each) is at site 1, and the DEPARTMENT DB
    (100 records, 35 bytes each) is at site 2
  • YOU are at site 3
  • Assume result is 400,000 bytes

Query Processing in DDB
  • SELECT E_Name
  • WHERE DeptNum 5
  • There are 3 strategies:
  • 1) Txfr both DBs to site 3 to perform the query
    (1,003,500 bytes txfrd)
  • 2) Txfr EMPLOYEE to site 2, perform the query,
    txfr result to site 3 (1,400,000 bytes txfrd)
  • 3) Txfr DEPARTMENT to site 1, perform the query,
    txfr result to site 3 (403,500 bytes txfrd)
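
The byte counts for the three strategies follow directly from the relation sizes given earlier. A short Python check of the arithmetic (the variable names and strategy labels are just illustrative):

```python
# Sizes taken from the example: records x bytes per record.
EMPLOYEE_BYTES = 10_000 * 100     # EMPLOYEE at site 1
DEPARTMENT_BYTES = 100 * 35       # DEPARTMENT at site 2
RESULT_BYTES = 400_000            # assumed size of the query result

strategies = {
    "1: ship both relations to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "2: ship EMPLOYEE to site 2, result to site 3": EMPLOYEE_BYTES + RESULT_BYTES,
    "3: ship DEPARTMENT to site 1, result to site 3": DEPARTMENT_BYTES + RESULT_BYTES,
}

# The cheapest strategy is the one that moves the fewest bytes.
best = min(strategies, key=strategies.get)

for name, cost in strategies.items():
    print(f"{name}: {cost:,} bytes")
print("cheapest:", best)
```

Strategy 3 wins because it ships the small relation to the large one, so the only big transfer is the result itself.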

Query Processing using Semijoin
  • Rather than sending the entire set of records to
    be joined, we could just send the joining
    attributes
  • Then the join is performed and the join
    attributes, as well as the projected attributes,
    can be transferred to the requesting site
  • The semijoin is symbolized as $\ltimes$
  • NOTE: $R \ltimes S \neq S \ltimes R$
  • Substantially reduces amount of data txfrd
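
A small Python sketch of the semijoin idea, using made-up EMPLOYEE and DEPARTMENT rows and hypothetical attribute names. Only the set of join values has to cross the network, and swapping the operands gives a different result, which is why the operation is not commutative.

```python
# Hypothetical relations at two different sites; values are illustrative only.
EMPLOYEE = [
    {"e_name": "Ann", "dept_num": 5},
    {"e_name": "Bob", "dept_num": 4},
    {"e_name": "Cid", "dept_num": 7},
]
DEPARTMENT = [
    {"dept_num": 5, "d_name": "Research"},
    {"dept_num": 4, "d_name": "Admin"},
    {"dept_num": 1, "d_name": "HQ"},
]

def semijoin(r, s, attr):
    """R semijoin S: the tuples of r whose attr value also appears in s."""
    # Only this set of join values needs to be shipped from s's site.
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]

emp_with_dept = semijoin(EMPLOYEE, DEPARTMENT, "dept_num")   # EMPLOYEE tuples
dept_with_emp = semijoin(DEPARTMENT, EMPLOYEE, "dept_num")   # DEPARTMENT tuples
```

Here `emp_with_dept` keeps Ann and Bob, while `dept_with_emp` keeps the Research and Admin departments, so the two results do not even share a schema.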

Concurrency Control and Recovery
  • Dealing with multiple copies
  • Failure of individual sites
  • Failure of network
  • Distributed commit is more complicated
  • Deadlock is more difficult to detect and prevent
  • A number of techniques have been proposed to deal
    with these problems

Distinguished Copy
  • The locks for a data item are associated with the
    distinguished copy
  • There are several distinguished copy variations
  • Primary site (with backup)
  • One site is the chosen one and coordinates
    locking activities (centralized locking)
  • Primary copy
  • Copies of various fragments at different sites are chosen
    as the distinguished copies; this distributes the
    locking problem
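
The difference between the two variations can be sketched as a routing table from each data item to the site that holds its distinguished copy. The Python below is a deliberately simplified illustration: the class names, the exclusive-only locks, and the in-memory lock tables are all assumptions, and real systems also handle shared locks, queues, and site failures.

```python
class Site:
    """One site, holding the lock table for its distinguished copies."""

    def __init__(self, name):
        self.name = name
        self.locks = {}              # item -> transaction holding the lock

    def lock(self, item, txn):
        # Grant the lock if the item is free or already held by txn.
        holder = self.locks.setdefault(item, txn)
        return holder == txn

    def unlock(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


class DistinguishedCopyLocking:
    """Routes every lock request to the site of the item's distinguished copy."""

    def __init__(self, distinguished_site_of):
        self.distinguished_site_of = distinguished_site_of   # item -> Site

    def lock(self, item, txn):
        return self.distinguished_site_of[item].lock(item, txn)

    def unlock(self, item, txn):
        self.distinguished_site_of[item].unlock(item, txn)


site_a, site_b = Site("A"), Site("B")

# Primary copy: distinguished copies are spread over several sites.
primary_copy = DistinguishedCopyLocking({"emp_frag": site_a, "dept_frag": site_b})

# Primary site: every item maps to the same coordinating site.
primary_site = DistinguishedCopyLocking({"emp_frag": site_a, "dept_frag": site_a})
```

With the primary copy mapping, lock requests for `emp_frag` and `dept_frag` go to different sites, so no single site handles all the locking traffic.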

Distributed Recovery
  • Very complex
  • Suppose that X sends a request to Y; there may
    be a number of reasons the request was not
  • Message was never delivered
  • Site Y is down
  • Site Y sent a response but the response was not

  • Re-read the first 23 slides!
  • Advantages/Disadvantages of a DDB
  • The 3 Transparencies: network, replication,
    fragmentation
  • Fragmentation
  • Replication and Synchronization
  • Query Processing in a DDB
  • Semijoin
  • Concurrency Control and Recovery

Primary Site Technique