Distributed Database Systems - PowerPoint PPT Presentation

Provided by: milton48
Learn more at: https://www.cs.ucla.edu
Transcript and Presenter's Notes

Title: Distributed Database Systems


1
Distributed Database Systems
2
A Distributed Database on a Geographically
Dispersed Network
3
A Distributed Database on a Local Network
4
A Multi-Processor System
5
Types of Accesses to a Distributed Database
6
Distributed Access Plan
  • 1) At site 1
  • Send sites 2 and 3 the supplier number SN.
  • 2) At sites 2 and 3
  • Execute in parallel, upon receipt of the
    supplier number, the following program:
  • Find all PARTS records having
    SUP = SN
  • Send result to site 1.
  • 3) At site 1
  • Merge results from sites 2 and 3.
  • Output the result.
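
The three-step access plan above can be sketched in Python. This is only an illustration under stated assumptions: the sample PARTS records, the function names, and the use of threads in place of real network messages are all hypothetical and do not come from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments stored at sites 2 and 3 (sample data only).
SITE_PARTS = {
    2: [{"part": "bolt", "SUP": 7}, {"part": "nut", "SUP": 9}],
    3: [{"part": "gear", "SUP": 7}, {"part": "cam", "SUP": 4}],
}


def find_parts_at_site(site, sn):
    """Step 2, run at a remote site: select PARTS records with SUP = SN."""
    return [rec for rec in SITE_PARTS[site] if rec["SUP"] == sn]


def run_access_plan(sn, remote_sites=(2, 3)):
    """Steps 1 and 3, run at site 1."""
    # Step 1: "send" SN to each remote site. Threads stand in for messages,
    # so the remote programs run in parallel.
    with ThreadPoolExecutor(max_workers=len(remote_sites)) as pool:
        partial_results = list(
            pool.map(lambda site: find_parts_at_site(site, sn), remote_sites)
        )
    # Step 3: merge the partial results (kept in site order) for output.
    return [rec for partial in partial_results for rec in partial]


if __name__ == "__main__":
    for record in run_access_plan(7):
        print(record)
```

Site 1 does no selection work itself in this sketch; it only distributes SN and merges, which mirrors the division of labor in the plan.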

7
(No Transcript)
8
Components of a Commercial DDBMS
9
Data Distribution
  • Problem
  • Choose a unit of the logical database to use for
    assignment to data modules.
  • Possibilities
  • Relations: Distribution issues will influence
    logical database design.
  • Columns: Distribution issues will influence
    logical database design.
  • Rows: Too many. Directories become too
    large.
  • Data Items: Too many. Directories become too
    large.

10
Data Distribution
11
Data Distribution
12
Data Distribution
Figure: Assignment of Fragments to Datamodules
  • Datamodules: DM1, DM2, DM3
  • Fragments: F1, F2, F3 and F1, F2
  • Relations: Personnel, Inventory
13
Data Distribution
  • Advantages of fragments as units of
    distribution.
  • Very flexible in size and definition.
  • Distribution choices are largely independent of
    logical design.

14
System Considerations
  • Reliable Network
  • Pipelining
  • Logical Data Items
  • Database Operations: Read, Write
  • Transactions: Read Set, Write Set
  • Atomic: All-or-Nothing Effect

15
System Considerations (cont'd)
  • Each site in the DDBMS has one or both of the
    following software modules:
  • Transaction Manager (TM)
  • Data Manager (DM)
  • TMs
  • Read, Parse, and Optimize user queries
  • Handle all interface with the user
  • DMs
  • Maintain physical database
  • Perform actual reads and writes

16
System Considerations (cont'd)
17
Transaction Execution
  • Transaction step: TM's action
  • Begin: Set up temporary workspace.
  • Read (X): Select a DM which stores X,
    send a message to this DM requesting X, and
    place X in workspace.
  • Read (X): No action necessary;
    X is already in workspace.
  • Write (X): Change the value of X.
  • Read (X): No action necessary.
  • End: Send a pre-commit to each DM that stores a
    copy of X, await acknowledgements, then
    send commit message.
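
The TM behavior listed above can be sketched as a small Python model. It is a minimal sketch, not the slides' implementation: the class names, the in-memory `store` dictionaries, and the always-successful acknowledgement are assumptions made for illustration, and failure handling (which the slide does not describe) is left out.

```python
class DataManager:
    """Hypothetical DM holding copies of some logical data items."""

    def __init__(self, name, items):
        self.name = name
        self.store = dict(items)
        self.pending = {}

    def read(self, x):
        return self.store[x]

    def pre_commit(self, updates):
        # Stage the new values and acknowledge; nothing is visible yet.
        self.pending = dict(updates)
        return True

    def commit(self):
        # Install the staged values.
        self.store.update(self.pending)
        self.pending = {}


class TransactionManager:
    """Hypothetical TM following the Begin / Read / Write / End steps."""

    def __init__(self, dms):
        self.dms = dms
        self.workspace = {}
        self.written = set()

    def begin(self):
        # Begin: set up a temporary workspace.
        self.workspace = {}
        self.written = set()

    def read(self, x):
        # First read fetches X from a DM that stores it; later reads
        # (including reads after a write) are served from the workspace.
        if x not in self.workspace:
            dm = next(d for d in self.dms if x in d.store)
            self.workspace[x] = dm.read(x)
        return self.workspace[x]

    def write(self, x, value):
        # Write changes only the workspace copy.
        self.workspace[x] = value
        self.written.add(x)

    def end(self):
        # End: pre-commit to every DM storing a written item, await
        # acknowledgements, then send the commit message.
        holders = [d for d in self.dms if any(x in d.store for x in self.written)]
        acks = [
            d.pre_commit({x: self.workspace[x] for x in self.written if x in d.store})
            for d in holders
        ]
        if not all(acks):
            return False
        for d in holders:
            d.commit()
        return True
```

Until `end()` runs, the DMs still hold the old value of X, which is what gives the transaction its all-or-nothing effect in this sketch.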

18
Optimal File Allocation In A Distributed Database
System
  • Given a number of computers that process common
    information files, how can we
  • allocate the files optimally so that the
    allocation yields minimum overall operating costs
    (storage and communication)?
  • meet access time requirements for each file?
  • not exceed the storage capacity of each computer?
  • Note: A file may be viewed as a segment.

19
System Parameters
  • n Computers
  • m Files
  • Size of each file
  • Usage distribution for each file at each computer
  • Frequency of modification of each file at each
    computer during usage
  • Access time requirement for each file at each
    computer
  • Storage capacity of each computer.
  • Cost of storage per unit file length per
    computer.
  • Cost of transmission per unit file length per
    second per pair of computers.

20
Model
  • COSTS
  • Total Cost = Storage Costs + Transmission Costs
  • $TC = C_S + C_T$
  • Transmission Costs = Costs for Retrievals + Costs
    for Updates
  • $C_T = C_{TR} + C_{TU}$
  • CONSTRAINTS
  • Each file must be stored in at least one
    computer.
  • The storage capacity of each computer must not be
    exceeded.
  • The probability of exceeding the required access
    time for each file must be less than a specified
    bound.
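
To make the cost model concrete, the following Python sketch enumerates every allocation for a small system and keeps the cheapest one that satisfies the first two constraints. It is an illustration under assumptions, not the model from the slides: the exact cost terms (retrieve from the cheapest copy, send each update to every copy), the parameter names, and the brute-force search are hypothetical, and the access-time constraint is omitted.

```python
from itertools import combinations, product


def nonempty_subsets(n):
    """All nonempty sets of computers 0..n-1 (each file needs >= 1 copy)."""
    for r in range(1, n + 1):
        yield from combinations(range(n), r)


def file_cost(sites, size, store_cost, trans_cost, query_rate, update_rate):
    """Storage + retrieval + update cost of keeping one file at `sites`."""
    storage = sum(store_cost[k] * size for k in sites)
    # Assumption: each computer retrieves from its cheapest copy.
    retrieval = sum(
        q * size * min(trans_cost[i][k] for k in sites)
        for i, q in enumerate(query_rate)
    )
    # Assumption: each update is sent to every copy.
    update = sum(
        u * size * sum(trans_cost[i][k] for k in sites)
        for i, u in enumerate(update_rate)
    )
    return storage + retrieval + update


def best_allocation(sizes, capacity, store_cost, trans_cost, query_rate, update_rate):
    """Exhaustive search; only practical for a handful of files and computers."""
    n, m = len(capacity), len(sizes)
    best_cost, best_alloc = float("inf"), None
    for alloc in product(list(nonempty_subsets(n)), repeat=m):
        used = [0] * n
        for j, sites in enumerate(alloc):
            for k in sites:
                used[k] += sizes[j]
        # Constraint: storage capacity of each computer must not be exceeded.
        if any(used[k] > capacity[k] for k in range(n)):
            continue
        total = sum(
            file_cost(alloc[j], sizes[j], store_cost, trans_cost,
                      query_rate[j], update_rate[j])
            for j in range(m)
        )
        if total < best_cost:
            best_cost, best_alloc = total, alloc
    return best_cost, best_alloc
```

In this sketch `trans_cost[i][i]` is expected to be 0, so a local copy makes retrieval free, while every extra copy adds storage and update traffic. That tension is the trade-off the model captures.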

21
Mathematical Representation Model
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Transmission Paths Between Each Pair of Computers
28
(No Transcript)
29
Reliability Constraint
  • Assuming processors and channels each have
    identical reliability,
  • $a_p$ = availability of the processor
  • $a_c$ = availability of the channel
  • $r_j$ = number of redundant copies of the jth file
  • $A_j$ = availability of the jth file
  • $A_j = a_p \left[ 1 - (1 - a_c a_p)^{r_j} \right]$
  • For example, if $a_p = 0.98$ and $a_c = 0.99$, then
  • $A_j = 0.951$ for $r_j = 1$
  • $A_j = 0.979$ for $r_j = 2$
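
The availability formula can be checked numerically with a few lines of Python. The function name and argument names are chosen here for illustration; the formula is the one stated above, which can be read as "the local processor is up, and at least one of the $r_j$ copies is reachable through a working channel and processor".

```python
def file_availability(a_p, a_c, r_j):
    """A_j = a_p * (1 - (1 - a_c * a_p) ** r_j)."""
    return a_p * (1 - (1 - a_c * a_p) ** r_j)


if __name__ == "__main__":
    for copies in (1, 2):
        print(copies, round(file_availability(0.98, 0.99, copies), 3))
```

With $a_p = 0.98$ and $a_c = 0.99$ this reproduces the slide's values of 0.951 and 0.979. Because $A_j$ can never exceed $a_p$, further copies yield rapidly diminishing gains.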

30
(No Transcript)
31
File Directory for Distributed Databases
32
User Transaction
DDBMS
Transaction Manager
Directory Manager
To Other Nodes
Database Manager
Directory Fragment
Database
Overview of the Directory Manager
33
Content of Directory
  • Global description
  • Fragmentation description
  • Allocation description
  • Mappings to local names
  • Access method description
  • Statistics on the database
  • Consistency information

34
Content of a Directory System
  • Security: (File, User, C), where C = Read/Write, Read
    Only, or Write Only
  • Operation: Compression ratio (Logical Operation,
    Query, Data Value); Query Access
    Optimizer; Statistical Data Gathering; Protocols
  • Logical (Dynamic): File Status (R, W); Number of
    Backlog Jobs; Site Availability; Resource
    Requirement; Processing Cost; Communication
    Cost; Translation Cost
  • Physical (Static): Location (Site, Copy, Disk,
    Page); Creator; Creation Date; Version of the
    File; Size; Code Format; Date of Last Update
35
The Functional Objectives of Integrated
Dictionary/Directory
  • To support the control of data resources
  • Maintaining data independence, security, and
    integrity
  • To support applications development
  • Offering standardized data definitions and usage
    characteristics
  • Established program entities, DDL
  • To provide independence of directory data
    elements
  • Different hardware and software environments
  • Changes in these environments

36
Possible Data Types In IDD
  • Data names, definitions, formats and sizes.
  • Integrity constraints, authorization tables, and
    usage statistics for transaction management.
  • Schemas and sub-schemas.
  • Description of standardized transactions and
    reports.
  • Characteristics of hardware, such as processors,
    lines, and terminals.
  • Description of users.
  • The IDD must support the maintenance of
    relationships between various entities such as
  • Associations between
  • Authorization tables and data,
  • Users and transactions
  • Reports
  • The IDD supplies version control

37
(No Transcript)
38
Figure 2
  • Payroll Record
  • Maximum Length: 400 Characters
  • Contains
  • Relationship Created: 820708
  • Length: 9 Characters
39
Table 1
Schema Model Level (Typical Meta-Entity-Types) | Schema Level (Typical Entity-Types, Relationship-Types, and Attribute-Types) | Dictionary Level (Typical Entities, Relationships, and Attributes)
Entity-Type | Element | Social-Security-Number, Agency-Name
Entity-Type | Record | Employee Record, Payroll Record
Entity-Type | Document | Form 1040, FIPS Guideline
Relationship-Type | Record-Contains-Element | Payroll-Record-Contains-Employee-Name
Attribute-Type | Length | 9 Characters
Attribute-Type | Creator | ADP Division
40
Classes of Directory
  • Centralized Directory
  • Single Master Directory
  • Extended Centralized Directory
  • Multiple Master Directory
  • Local Directory
  • Distributed Directory

41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Causes For Directory Update
  • Changing the description or structure of the user
    database.
  • Moving user database entities from one node to
    another.
  • Changing the description of a user or node.
  • Changing a user view.
  • Changing a network node's status.

47
Specific Drawbacks with Globally Replicated
Directories
  1. Additional remote activity to maintain directory
    coherence.
  2. Difficulty of posting directory changes to a down
    site.
  3. Difficulty of integrating a new site.
  4. Storage of directory entries where they are not
    referenced.
  5. Blurred responsibility for maintaining the
    directory.

48
Performance Measure
  • Operating Cost/Unit Time = Communication Cost
    (Query + Update) + Storage Cost + Code Translation
    Cost (Query + Update)
  • Response Time

49
Operating Cost for the Centralized Directory
System
50
(No Transcript)
51
Cost Trade-offs of Directory Systems
  • Assume:
  • Communication cost is much greater than storage cost
  • No translation cost
  • All computers have the same directory update rate
  • Then the cost trade-off points occur at the following
    directory update rates:
  • P(C,EC) = 2/(N 1)b
  • P(C,D) = 2/(N 1)
  • P(L,D) = 1

52
(No Transcript)
53
Directory Design Alternatives
  • Centralized
  • Description: Single master directory.
  • Advantages: Simplicity; ease of update.
  • Disadvantages: Transmission costs and delays.
  • Extended Centralized
  • Description: Variation of the centralized case in which the
    directory information is permanently appended in
    the local node once it is obtained from the
    master directory.
  • Advantages: Reduces transmission costs and delays.
  • Disadvantages: Coordinating updates of local directories;
    knowledge of appended directories.
  • Multiple Master
  • Description: Variation of the centralized case in which
    redundant copies of the master directory exist.
  • Advantages: Reduces transmission costs and delays;
    fail-soft characteristics.
  • Disadvantages: Storage requirements; coordinating update of
    redundant copies.
  • Distributed Master
  • Description: Master at every node.
  • Advantages: Fast response.
  • Disadvantages: Storage costs; transmission costs for updates to
    the directory.
  • Localized
  • Description: Local directory at each node without replication.
  • Advantages: Simple update procedure.
  • Disadvantages: Transmission costs for non-local queries.
54
Distributed Ingres Dictionary/Directory Contains
Four Types of Data
  • Relation name and location
  • Information for parsing queries
    (domain names, formats, etc.)
  • Performance information
    (number of tuples, storage structures, etc.)
  • Consistency information
    (protection, integrity constraints, etc.; does
    not include control data for concurrency control
    and synchronization)

55
SDD-1 Dictionary/Directory
  • The directory itself is defined and maintained
    like any other user data. It can be logically
    fragmented, distributed, and replicated across
    the distributed DBMSs.
  • A directory locator (a small highly static file
    of directory fragment locations) is kept at every
    site and is used by the TMs and DMs to plan and
    control transactions and to help ensure DB
    integrity and consistency across concurrent
    accesses of data elements.
  • The transaction modules are capable of caching
    remotely accessed directory data for subsequent
    usage. This facility is provided on the
    presumption that DB operations will exhibit the
    locality-of-reference characteristic.
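
The combination of a static directory locator and a per-site cache of remote directory data can be sketched as follows. This is not SDD-1 code: the class, its fields, and the dictionary standing in for remote sites are hypothetical, and cache invalidation after a directory update is deliberately left out.

```python
class DirectoryClient:
    """Hypothetical per-site helper that resolves directory fragments."""

    def __init__(self, locator, sites):
        # locator: directory fragment name -> site holding it. Assumed small
        # and highly static, so every site can keep a full copy.
        self.locator = locator
        # sites: site id -> {fragment name -> directory data}. Stands in for
        # the remote sites; a real system would send a message instead.
        self.sites = sites
        self.cache = {}
        self.remote_fetches = 0

    def lookup(self, fragment):
        # Serve repeated requests from the cache (locality of reference).
        if fragment in self.cache:
            return self.cache[fragment]
        site = self.locator[fragment]
        self.remote_fetches += 1
        entry = self.sites[site][fragment]
        self.cache[fragment] = entry
        return entry


if __name__ == "__main__":
    client = DirectoryClient(
        locator={"PARTS": "site2"},
        sites={"site2": {"PARTS": {"stored_at": ["site2", "site3"]}}},
    )
    client.lookup("PARTS")
    client.lookup("PARTS")
    print(client.remote_fetches)
```

Only the first lookup of a fragment costs a remote access; the payoff depends on how strongly the workload actually exhibits locality of reference.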

56
Figure 17: Pictorial diagram showing usefulness of keys.
  • PatientDB1 (name, SSN, age)
  • PatientDB2 (name, SSN, patID)
  • PatReportDB2 (patID, report)
Note that a shaded box represents a real
collection and an unshaded box represents a
virtual entity.
57
Figure 15: Pictorial diagram showing
correspondence between virtual and real
attributes.
  • personDB1 (name, sex, age, ssn)
  • personDB2 (name, gender, ssn, job)
  • Vperson PersonClass (name, sex, age, ssn, job)
  • Conversions: Character_to_String (twice),
    LargePositiveInteger_to_String
  • Virtual Collection: People, Vperson
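
The correspondence between virtual and real attributes in Figure 15 could be sketched along these lines. The attribute lists and conversion names come from the figure labels, but which conversion applies to which attribute is an assumption here, as are the function names, the use of `None` for attributes a source lacks, and the choice to form the virtual collection as a simple concatenation.

```python
def character_to_string(value):
    """Stand-in for the Character_to_String conversion named in the figure."""
    return str(value)


def large_positive_integer_to_string(value):
    """Stand-in for the LargePositiveInteger_to_String conversion."""
    return str(value)


def vperson_from_db1(row):
    # personDB1 has no job attribute, so the virtual attribute stays empty.
    return {
        "name": row["name"],
        "sex": character_to_string(row["sex"]),  # assumed mapping
        "age": row["age"],
        "ssn": large_positive_integer_to_string(row["ssn"]),  # assumed mapping
        "job": None,
    }


def vperson_from_db2(row):
    # personDB2 names the attribute "gender" and has no age attribute.
    return {
        "name": row["name"],
        "sex": character_to_string(row["gender"]),  # assumed mapping
        "age": None,
        "ssn": large_positive_integer_to_string(row["ssn"]),  # assumed mapping
        "job": row["job"],
    }


def vperson(person_db1, person_db2):
    """Virtual collection: rows of both real collections in one shape."""
    return ([vperson_from_db1(r) for r in person_db1]
            + [vperson_from_db2(r) for r in person_db2])
```

The virtual entity stores nothing itself; every row is derived on demand from the real collections, so a query against Vperson sees one uniform schema.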
58
Figure 18: Pictorial diagram for aggregation.
59
Figure 19: Pictorial diagram of computed attribute.
  • Vname nameClass (first, middle, last)
  • personDB1 (name)
  • Computed by: getfirst, getmiddle, getlast
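
The computed attributes of Figure 19 can be illustrated with a short Python sketch. The accessor names follow the figure labels; the rule that a name is whitespace-separated in "first middle last" order is an assumption made for this example only.

```python
def get_first(name):
    """First whitespace-separated token of the stored name."""
    return name.split()[0]


def get_middle(name):
    """Everything between the first and last tokens (may be empty)."""
    return " ".join(name.split()[1:-1])


def get_last(name):
    """Last whitespace-separated token of the stored name."""
    return name.split()[-1]


def vname(person_row):
    # The virtual entity's attributes are computed from personDB1.name
    # whenever they are requested; nothing is stored for them.
    name = person_row["name"]
    return {
        "first": get_first(name),
        "middle": get_middle(name),
        "last": get_last(name),
    }


if __name__ == "__main__":
    print(vname({"name": "Ada Byron Lovelace"}))
```

Unlike a renamed or converted attribute, a computed attribute has no single real counterpart: three virtual attributes are derived from one real one.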
60
Figure 20: Pictorial diagram of computed attribute.
  • financeDB1 (name, stockAmount), labeled 1
  • financeDB2 (name, pension), labeled 2
  • Vretiree retireClass (name, income)
61
Figure 21: Pictorial diagram showing grouping.
  • carInsuranceDB1 (carOwner, amount)
  • houseInsuranceDB2 (houseOwner, amount)
  • Vinsurance insuranceClass (name, insuranceAmounts)
62
Figure 22: Pictorial diagram showing relationship.
  • Diagram labels: patientDB1 (name, docID), (key);
    patientDB2, (pointer), (name, physician);
    relationship; patientDB1; Vdoctors doctorClass
    (name, docID); (name, docID, salary);
    patientDB1 (name, salary)
63
Figure 23: Pictorial diagram showing a named relationship.
  • Diagram labels: VtreatedBy treatedByClass;
    patientDB1, (key), (name, docID, amountOwed);
    (patient, doctor, amountOwed), (key);
    Vpatient PatientClass; Vdoctor DoctorClass
64
Figure 24: Pictorial diagram showing relationship.
  • VpersonPatient personClass (name)
  • patientDB1 (name, SSN, payment)
  • Vpatient patientClass (patID, amount)
  • VpersonDoctor personClass (name)
  • doctorDB2 (name, docID, salary)
  • Vdoctor DoctorClass (docID, salary)
  • Other labels: patient, doctor, person
  • Virtual collections: Vpatient, Vdoctor,
    VpersonPatient, VpersonDoctor
65
Figure 30: Derivation of Virtual Entity Vconcept.
  • ConceptSemType (conceptID, semTypeID)
  • Vconcept, (key), (conceptID, semType, termSet)
  • Vterm (termID, stringSet)
  • Concept (conceptID, termID, stringType, stringID, stringVal)
  • Vstring (stringName, stringID, stringType)
66
(No Transcript)