Designing Distributed Databases - PowerPoint PPT Presentation

1 / 69
About This Presentation
Title:

Designing Distributed Databases

Description:

All commercial systems have chosen the relational data model. Designing ... Exercise: Write a flowchart that performs this SQL statement assuming Department ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 70
Provided by: Mik7253
Category:

less

Transcript and Presenter's Notes

Title: Designing Distributed Databases


1
Chapter 3
  • Designing Distributed Databases

2
Designing Distributed Databases
  • Data Models
  • Choosing a Standard Data Model
  • Data Model Translators
  • Schema Integration
  • Summary

3
What Is a Data Model?
  • Data model Style of describing and manipulating
    data
  • Data model components include
  • Data description data manipulation
  • Semantic integrity constraints

4
Why Are Data Models Important?
  • Different DBMSs use different data models
  • File
  • Relational
  • Hierarchical
  • Network
  • Translators are necessary to support
    heterogeneous DBMSs
  • Translate data manipulation commands
  • Translate data structures

5
File Data Model
  • Data description exists in applications
  • Sample data description
  • Sample file

01 Department 03 DepartmentFlag Char(1) 03
DepartmentName Char(10) 03 DepartmentBudget Inte
ger.01 Employee 03 EmployeeFlag Char(1) 03
EmployeeName Char(10) 03 EmployeeSalary Integer.
More
6
File Data Model, cont.
  • Data manipulation is a single record per command

Sequential Direct open open read read write
insert close delete modify close
More
7
File Data Model, cont.
  • Semantic integrity constraints include
  • Type-checking
  • Integer
  • Character
  • Decimal
  • File data model is weak in semantic integrity
    constraints up to 50 of the application is
    devoted to enforcing semantic integrity
    constraints.

8
Relational Data Model
  • Data description exists in the data dictionary,
    not in the application
  • No visible pointers

Table Department,Column DepartmentName Char(10)
not null,Column DepartmentBudget Integer Table
Employee,Column EmployeeName Char(10) not
null,Column EmployeeSalary Integer,Column
DepartmentName Char(10)
EmployeeEmployee Name Employee Salary
Department NameAckman 5000 Toy
Baker 4500 ToyCarson 4800
CarDavis 5100 Car
More
9
Relational Data Model, cont.
  • Data manipulation is multiple records per command
  • Update commands
  • Insert
  • Delete
  • Update

More
10
Relational Data Model, cont.
  • Semantic integrity constraints include
  • Type-checking
  • Unique table identifiers
  • Referential integrity Each foreign key value
    must exist as a value of the foreign tables
    primary key or be null.

Primary keys
Foreign key
11
Hierarchical Data Model
  • Data description exists in the data dictionary
    and is copied to the application.

More
12
Hierarchical Data Model, cont.
  • Data manipulation is a single record per command

More
13
Hierarchical Data Model, cont.
  • Semantic integrity constraints include
  • Type-checking
  • Integer
  • Character
  • Decimal
  • Range-checking
  • Parent-child constraints

14
Data Models Summary
15
Designing Distributed Databases
  • Data Models
  • Choosing a Standard Data Model
  • Data Model Translators
  • Schema Integration
  • Summary

16
Choosing a Standard Data Model
User Interface Global Transaction UI
Translator Global Transaction Global Request
Optimizer Global Execution Plan Distributed
Execution Manager Subtransaction Communication
Subsystem Subtransaction Local Execution
Manager Subtransaction Translator Subtransaction L
ocal Optimizer Physical Commands Runtime Support
Processor Database
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
17
Why Choose a Single Data Model?
Twelve Translations
Six Translations
18
Relational Is Choice for Distributed DBMS
  • Minimizes communication costs
  • Avoids trouble with pointers
  • Weak in semantic integrity constraints

19
RelationalMinimizes Communication Costs
  • Relational data model supports multiple-records-pe
    r-command queries.
  • Fewer transmissions requesting data
  • Fewer (but larger) data communications

20
RelationalAvoids Troubles With Pointers
  • Relational is oriented to flat files with no
    pointers.
  • No dangling pointers when files are transmitted
  • No pointers to be misunderstood by the target
    DBMS
  • Can transfer parts of complex structures

21
RelationalWeak in Semantic Integrity Constraints
  • Limited to few semantic integrity constraints
  • Type constraints
  • Unique identifier
  • Not null
  • Referential integrity constraints
  • Needed, more constraints
  • Triggers and asserts
  • Semantic data models
  • All commercial systems have chosen the relational
    data model

22
Designing Distributed Databases
  • Data Models
  • Choosing a Standard Data Model
  • Data Model Translators
  • Schema Integration
  • Summary

23
Data Model Translators
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
24
Data Manipulation Commands Translating
Nonprocedural to Procedural Commands
  • Transform local DBMS data structures into tables
  • Convert relational data manipulation commands
    into commands supported by the local DBMS

25
Why Data Model Translations?
  • Users data model is different from the standard
    data model.
  • Data model used by the local DBMS is different
    from the standard data model.

26
Translating File Data Structures to Relational
Data Structures
  • Concept correspondence

File Data Definition Relational Data
Descriptionrecord type tablefield columnuni
que identifier primary key
27
Concept Correspondence Example
EmployeeEmployee Name Employee Salary
Department NameAckman 5000 Toy
Baker 4500 ToyCarson 4800
CarDavis 5100 Car
28
Problem Areas
  • No unique identifier
  • Implicit information in ordering
  • Repeating groups
  • Multiple record types

29
Problem AreaNo Unique Identifier
  • Problem Some files do not have records with
    unique identifiers.
  • Solution Generate a new field that contains a
    unique identifier.

30
No Unique Identifier Example
31
Problem AreaImplicit Information in Ordering
  • Problem Files in which record order implies
    information that is not represented in field
    values e.g.,
  • Order implies waiting list, ranking, etc.
  • Solution Generate a new field whose value shows
    the order e.g.,
  • New field containing a value representing the
    position in the file

32
Implicit Information in OrderingExample
33
Problem Area Repeating Groups
  • Example file
  • Solution Build new table for repeating group

34
Problem Area Multiple Record Types
  • Example file
  • Solution Split into two tables

D Car 45000000E Ackman 00005000E Baker 000045
00D Toy 48000000E Carson 00004800E Davis 000
05100
DepartmentDName BudgetCar 45000000
Toy 48000000
EmployeeDept EName
SalaryToy Ackman 5000 Toy Baker
4500Car Carson
4800Car Davis 5100
35
Translating Hierarchical Data Structures to
Relational Data Structures
  • Concept correspondence

File Data Definition Relational Data
Descriptionrecord type tabledata
item columnrecord key primary
keyparent-child foreign key
36
Concept CorrespondenceExample
37
Problem Areas
  • Implicit information in ordering
  • Repeating groups
  • Solution Same as in file systems

38
Problem Area Duplicate Segments
  • Solution Duplicate information in multiple rows
    and make foreign key part of primary key

39
Concept Correspondence Example
40
Problem Areas
  • Files in which the record order implies
    information that is not represented in field
    values.
  • Repeating groups
  • Multiple record types
  • Ordering of the members of a set implies
    information
  • Solution Same solutions as in file-to-relational
    data structure translation

41
Exercise Translating File Structures to
Relational Data Structures
  • Convert the following files to relational
    database tablesThere are two record types in
    the file, Employee and Dependent. The Employee
    record contains Employee flag (E), EmpName,
    EmpNumber, NumberOfSkills, Skill1, Skill2, and
    Skill3. Following each Employee record are the
    records of the employees dependents
    DependentFlag (D), DependentName, and
    Birthdate
  • A sample of the file follows E Ackman 555-55-555
    5 3 Fortran COBOL Pascal D Sally 1979 D Susan 19
    83 D Fred 1984 E Baker 666-66-6666 2 Pascal Lisp
    E Carson 777-77-7777 3 Prolog C Forth D Sam 198
    8 D George 1989

42
Translating Nonprocedural Data Manipulation
Commands to Procedural Data Manipulation Commands
  • Used for all procedural data models
  • Hierarchical
  • File
  • Consider data structures and access techniques
  • Indexes
  • Physical ordering
  • Hash files
  • Etc.

43
Translating Nonprocedural to Procedural Data
Manipulation Commands Example
  • Exercise Write a flowchart that performs this
    SQL statement assuming Department is a sequential
    file with no index

44
Translating Nonprocedural to Procedural Data
Manipulation Commands Another Example
  • Exercise Write a flowchart that performs this
    SQL statement assuming that both Department and
    Employee are unordered sequential files

45
Designing Distributed Databases
  • Data Models
  • Choosing a Standard Data Model
  • Data Model Translators
  • Schema Integration
  • Summary

46
Schema Integration
UI Schema
Allocation Schema
Local Standard Schema
Allocation Schema
47
Why Integrate Schemas?
  • To build a global conceptual schema that
    describes data in multiple databases
  • The relationship between original schemas and the
    global logical schema is represented in the
    allocation schema.

48
Schema Integration Problems
  • Differences in data structures
  • Differences in database values

49
Differences in Data Structures Example
  • Engineer Name,EmpId,Salary,NumberOfPatents
  • Scientist Name,EmpId,Salary,HighestDegree
  • Secretary Name,EmpId,Salary,WordsPerMin
  • Equipment Name,Inventory,Location
  • Employee Name,EmpId,Salary,HireDate

50
Integrating Two Tables
  • Determine the relationship of the tables rows
  • Containment
  • Overlap
  • Disjoint
  • Determine how many tables to create
  • Single table
  • Joins the columns
  • Set noncommon attributes to null
  • Multiple tables
  • One table contains common columns
  • Other tables contain key and noncommon columns

A
B
51
Strict Containment Single Table Approach
  • Each scientist is also an employee Scientist
    Name,EmpId,Salary,HighestDegree in
    DBMS1 Employee Name,EmpId,Salary,HireDate in
    DBMS2
  • Union of attributes Employee Name,EmpId,Salary,
    HireDate, HighestDegree
  • HighestDegree has value only for Scientist,
    otherwise null
  • HighestDegree column serves two purposes
  • Contains the value of highest degree for
    scientists
  • Indicates whether row represents Scientist or
    Employee

52
Strict Containment Multiple Table Approach
  • Each scientist is also an employee. Scientist
    Name,EmpId,Salary,HighestDegree in
    DBMS1 Employee Name,EmpId,Salary,HireDate in
    DBMS2
  • Create two tables Employee Name,EmpId,Salary,Hir
    eDate Scientist EmpId,HighestDegree
  • Scientist.EmpId is the foreign key

53
Strict Containment Single Table Disadvantages
  • Multiple attributes must either all be null or
    all have values.
  • For example, if both HighestDegree and School are
    attributes of Scientist but not
    Employee Scientist Name,EmpId,Salary,HighestDegr
    ee,School in DBMS1 Employee Name,EmpId,Salary,
    HireDate in DBMS2 Employee
    Name,EmpId,Salary,HireDate,HighestDegree,School
  • Either HighestDegree and School must both have
    values or neither can have values

54
Strict Containment Multiple Table Disadvantages
  • Data about one entity in two places
  • Expensive for retrieval
  • Updates are complex
  • For example, Employee Name,EmpId,Salary,HireDate
    Scientist EmpId,HighestDegree
  • Employee and Scientist must be joined to access
    all of Scientists attributes.
  • Whenever a scientist resigns, a row must be
    deleted from both the Scientist and Employee
    tables.

55
Overlap Single Table Approach
  • Engineer Name,EmpId,Salary,NumberOfPatents
  • Scientist Name,EmpId,Salary,HighestDegree
  • Some engineers are also scientists, but some are
    not.
  • Some scientists are also engineers, but some are
    not.
  • Union of attributes
  • RDStaff Name,EmpId,Salary, HighestDegree,Numbe
    rOfPatents
  • Only engineers have values for NumberOfPatents.
  • Only scientist have values for HighestDegree.

56
Overlap Multiple Table Approach
  • Engineer Name,EmpId,Salary,NumberOfPatents
  • Scientist Name,EmpId,Salary,HighestDegree
  • Some engineers are also scientists, but some are
    not.
  • Some scientists are also engineers, but some are
    not.
  • Union of attributes
  • RDStaff Name,EmpId,Salary Scientist
    EmpId, HighestDegree Engineer EmpId,
    NumberOfPatents
  • Only engineers have values for NumberOfPatents.
  • Only scientist have values for HighestDegree.

57
Disjoint
  • Equipment Name,Inventory,Location
  • Employee Name,EmpId,Salary,HireDate
  • No Employee is an Equipment.
  • No Equipment is an Employee.
  • Leave Equipment and Employee as separate tables.

58
Disjoint But Merge
  • Scientist Name,EmpId,Salary,HighestDegree
  • Secretary Name,EmpId,Salary,WordsPerMin
  • Merge is similar to overlap.

59
General Approach for Schema Integration
  • For each pair of Tables A and B, determine
    whether
  • A is contained in B
  • B is contained in A
  • A is equivalent to B
  • A overlaps B
  • A is disjoint from B, but integrates anyway
  • A is disjoint from B, but does not integrate
  • Modify tables if table integration is warranted.
  • If the single table approach is used, remember to
    specify which groups of attributes must all be
    null or nonnull.
  • If the multiple table approach is used, remember
    to specify new foreign key constraints.

60
Schema Integration Exercise
  • Create a schema for the following tables
  • Engineer Name,EmpId,Salary,NumberOfPatents
  • Scientist Name,EmpId,Salary,HighestDegree
  • Secretary Name,EmpId,Salary,WordsPerMin
  • Equipment Name,Inventory,Location
  • Employee Name,EmpId,Salary,HireDate
  • Assume that the Employee file is the master file
    maintained by the personnel office and that it is
    used for generating paychecks that the Equipment
    file is maintained by the building maintenance
    office and that the remaining files are
    maintained by department heads

61
Data Integration Problems
  • Corresponding columns of two table rows may have
    different values.
  • Domains of corresponding columns may have
  • Different precision
  • Different semantic integrity constraints
  • Different coding structures

Domain Set of values contained in a column
62
Corresponding Columns of Two Table Rows That Have
Different Values
63
Resolving Data Value Differences
  • If one database contains value x and the other
    contains value y, what value does the user see?
  • Automatically resolve the difference
  • Min (x,y)
  • Choose from the table judged to be most reliable
  • Null, not applicable, dont know
  • User resolves the difference
  • Retain both values
  • Go back to original source

64
Handling Data Value Differences Example
Let user resolvedifferences
Resolve Differences
65
Resolving Data Value Differences Exercise
  • Integrate the following tables and resolve data
    differences

66
Schema Integration Summary
  • Types of integration
  • Schema
  • Create a schema describing data in multiple
    databases
  • Data
  • Merge corresponding records from different
    databases
  • Schema and data must be integrated if the
    distributed database is to support a unified
    database.
  • Otherwise, users must resolve schema and data
    inconsistencies.

67
Designing Distributed Databases
  • Data Models
  • Choosing a Standard Data Model
  • Data Model Translators
  • Schema Integration
  • Summary

68
Summary
  • Several data models are used for centralized
    DBMSs.
  • Relational
  • Hierarchical
  • Network
  • Files
  • The relational data model is frequently chosen
    for distributed DBMSs.
  • Translators are needed for user interfaces.
  • Translators are needed for heterogeneous local
    DBMSs.
  • Data model translators are complicated by
    differences in the descriptive capabilities of
    the various data models.
  • Nonprocedural-to-procedural translations use
    local data structures
  • Procedural-to-nonprocedural are very difficult.

More
69
Summary, cont.
  • To integrate data, two tasks must be performed
  • Schema integration, to build a global schema
  • Data value inconsistency resolution, to remove
    data discrepancies
  • Files should be fragmented based on use by
    important and frequently executed applications.
  • The original file can be constructed from
    fragments.
  • Fragments should be allocated based on efficiency
    considerations.
Write a Comment
User Comments (0)
About PowerShow.com