Distributed Database Design - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Distributed Database Design

Description:

Distributed Database Design. What is to be Designed? ... The join can be processed in a genuinely distributed fashion. Join Graphs ... – PowerPoint PPT presentation

Number of Views:854
Avg rating:3.0/5.0
Slides: 39
Provided by: scie192
Category:

less

Transcript and Presenter's Notes

Title: Distributed Database Design


1
Distributed Database Design
2
What is to be Designed?
  • In designing a distributed database, the same
    issues are faced as for a centralized database
    plus, in addition
  • Fragmentation (how relations are to be logically
    divided up, if at all)
  • Allocation (how relations/fragments are to be
    assigned to sites)
  • Replication (if, and where, relations/fragments
    should be replicated)

3
Fragmentation
  • There are two basic ways in which a relation can
    be fragmented
  • Horizontally (each fragment contains a subset of
    the tuples/rows of the relation)
  • Vertically (each fragment contains a subset of
    the attributes/columns of the relation)

4
Hybrid (Mixed) Fragmentation
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
5
Horizontal Fragmentation
  • Horizontal fragments are defined according to a
    condition or predicate that the tuples in it
    satisfy
  • For example, the team relation of the sports club
    database might be horizontally fragmented
    according to the values of the attribute club.

6
Vertical Fragmentation
  • Vertical fragments contain a subset of the
    attributes of the original relation
  • Note, however, an important point Every vertical
    fragment should contain all of the prime
    attribute(s) - the primary key - of the original
    relation
  • Why?

7
Correctness of Fragmentation
  • A fragmentation, if it is to be correct, should
    obey certain conditions or rules
  • Completeness If it is in the original relation
    then it is also in one of the fragments
  • Reconstruction The original relation can be
    correctly rebuilt from the fragments
  • Disjointness Data in one fragment should not
    also be present in another, except for the
    primary key attributes in vertical fragmentation

8
Derived Horizontal Fragmentation
  • When a horizontal fragmentation is defined by a
    predicate that is directly applicable to the
    tuples of the relation, it is a primary
    horizontal fragmentation
  • When the fragmentation is defined by a predicate
    that is applied to the tuples in a different
    relation, it is a derived horizontal fragmentation

9
Semi-Joins Derived Horizontal Fragmentation
  • A derived horizontal fragmentation can be
    formally specified using a semi-join
  • A semi-join is a natural join that retains the
    attributes of only one of the participating
    relations
  • It is denoted by the symbol
  • For example

10
Degree of Fragmentation
  • What determines how far we should go in
    fragmenting relations?
  • The answer lies not in the data, but in the
    applications (i.e. queries) that access it
  • We want to fragment to get the best possible
    result for all applications/queries but what
    does best possible mean?

11
Factors Affecting Fragmentation
  • The two primary factors affecting how we fragment
    are
  • The database structure (global conceptual schema)
  • The application characteristics (predicates used,
    locations and frequencies of use)
  • Other important aspects are the network
    characteristics and computer processing power

12
Applications Simple Predicates
  • Applications (think of them as queries) that
    access a database often select out a subset of
    the data according to a predicate a Boolean
    expression
  • A simple predicate pj is one of the form
  • pj Ai ? Value
  • where Ai is an attribute, ? is a comparison
    operator, and Value is a domain value

13
Horizontal Fragmentation Minterm Predicates
  • In designing horizontal fragmentation we need to
    identify the chunks of tuples in the database
    that are accessed as a unit by applications/querie
    s
  • Each chunk is defined by a minterm predicate,
    which is a combination of simple predicates
  • These chunks are then candidates for horizontal
    fragments

14
Minterm Predicates
  • It is sometimes (often?) possible to eliminate
    some minterm predicates, based on our
    understanding of the database semantics
  • This occurs when one minterm predicate implies
    another

15
Minterm Predicates (cont)
  • As well as knowing what the minterm predicates
    are, it is important to know (typically) how many
    tuples would be selected by each, and
  • How frequently the tuples selected by each are
    accessed (this is closely related to query access
    frequency)

16
Derived Horizontal Fragmentation Again
  • It often happens that a relation can have
    alternate, and conflicting, possible derived
    fragmentations
  • This may occur when it is related to two (or
    more) parent relations through foreign keys

17
Choosing a Derived Fragmentation
  • Given two (or more) possible derived
    fragmentations, how to choose between them?
  • The two factors are
  • Application usage (the more frequently accessed
    minterm is generally preferable)
  • Join characteristics (the design that gives
    better overall join performance is preferable)

18
Join Characteristics
  • The performance of a join in a distributed
    database benefits when
  • The relations, or fragments, to be joined are
    small (i.e. have relatively few tuples)
  • The join can be processed in a genuinely
    distributed fashion

19
Join Graphs
  • A join graph is a pictorial way of showing which
    fragments of two relations participating in a
    join might produce non-empty results
  • Each fragment is represented by a node and those
    fragments that produce a non-empty join result
    are connected by an edge (line)

20
A Simple Join Graph
T1
PF1
T2
PF2
Fragments of team
Fragments of plays_for
T3
PF3
T4
PF4
21
A Total Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
22
A Partitioned Join Graph
P1
PF1
P2
PF2
Fragments of player
Fragments of plays_for
P3
PF3
P4
PF4
P5
23
Vertical Fragmentation
  • Analogously to horizontal fragmentation, the
    purpose of vertical fragmentation is to group
    together attributes into chunks that are
    usually accessed together by user
    applications/queries
  • Vertical fragmentation is more complex to design
    than horizontal fragmentation

24
Approaches to Vertical Fragmentation
  • There are two basic approaches to vertical
    fragmentation
  • Splitting originally designed relations
  • Grouping individual attributes from scratch
  • The first is preferred since
  • The original relations (should) be well designed
  • Appropriate fragments are likely to be closer to
    relation size than individual attribute size

25
Application Information for Vertical Fragmentation
  • The application information needed for vertical
    fragmentation design is
  • Which applications/queries use which attributes
  • The frequencies with which the various
    applications/queries are executed
  • The first of these can be captured in an
    attribute usage matrix, then use of the second
    results in a attribute affinity matrix

26
The Player Relation
  • Suppose we wish to determine how the
    relationplayer(Name, Height, Gender, Address,
    Weight, DOB, Telephone)should be vertically
    fragmented
  • The queries that access player are shown on the
    next slide

27
Queries on Player
  • q1 SELECT Name, DOB, Address, Telephone FROM
    player WHERE Gender
  • q2 SELECT avg(Height), avg(Weight) FROM player
    WHERE Gender
  • q3 SELECT Name, Height, Weight, DOB FROM player
    WHERE Name LIKE
  • q4 SELECT Name, Address, Telephone FROM player
    WHERE Name

28
Attribute Usage Matrix
  • This is a matrix that has attributes along one
    axis and queries along the other
  • It contains a 1 in positions for which the
    relevant query accesses the attribute of
    interest, and 0 otherwise
  • The matrix for player and queries q1 to q4 is
    shown on the next slide

29
Attribute Usage Matrix
30
Attribute Affinity Matrix
  • This is a matrix that has the attributes of the
    relation we are dealing with on both axes
  • To construct the matrix we need to know the
    frequency of accesses (across all sites) to each
    query
  • Then we can work out how often (relatively) each
    pair of attributes are accessed together

31
Query Characteristics
  • Suppose there are three sites accessing the
    player relation, with these relative
    frequencies

32
Attribute Affinity Matrix
33
Attribute Clustering
  • The next step is to group together the
    attribute(s) that have an affinity for each other
    this results in a clustered affinity matrix
  • There is a formal, so-called bond energy,
    algorithm for this (given in Özsu Valduriez)
    but we present an informal version that is easier
    to understand although it is not, of course,
    suitable for realistically complex situations

34
Clustered Attribute Affinity Matrix
35
Hybrid Fragmentation
  • The nature of the applications/queries may
    require a more complex fragmentation then just a
    simple horizontal or vertical one
  • The technique of fragmentation is, however,
    basically identical (e.g. apply the horizontal
    fragmentation technique followed by the vertical
    fragmentation method to one or more of the
    horizontal fragments)

36
Allocation
  • The general problem of determining the optimal
    site(s) to allocate the various fragments to is
    impossibly complex
  • Clearly each fragment has to be stored at one
    site at least however, there is also the choice
    of replicating it at one or more other sites as
    well

37
Aim of Allocation
  • The fundamental aim of allocation is to minimize
    some cost function that may include
  • Inter-site communication costs
  • Processing costs
  • Data retrieval and updating costs
  • Data storage costs

38
Factors Affecting Allocation
  • Factors that will influence the final allocation
    include
  • Expected fragment sizes
  • Query selectivity
  • Ratio of retrievals to updates for each fragment
  • The originating site(s) of queries/updates
  • Site storage and processing capabilities
  • Inter-site network communication capacities
Write a Comment
User Comments (0)
About PowerShow.com