Chapter 17: Additional Slides - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Chapter 17: Additional Slides

Description:

Variable Transmission Time (TT) CT = MD TT. MD = Number of ... TT = Number of bits/Data rate. Physical Data Management. Distributed Query Processing cont'd ... – PowerPoint PPT presentation

Number of Views:12
Avg rating:3.0/5.0
Slides: 23
Provided by: jimk56
Category:

less

Transcript and Presenter's Notes

Title: Chapter 17: Additional Slides


1
Chapter 17 Additional Slides
  • October 11, 2015

2
Outline
  • Physical Data Management
  • Fragments
  • Distributed Query Processing
  • Transactions
  • Logical Data Management
  • Transparency
  • Conceptual Data Management

3
Physical Data ManagementFragments
  • What is a Fragment?
  • Vertical subset (project operation)
  • Horizontal subset (restrict operation)
  • Mixed Fragment (combination of project and
    restrict)
  • A fragment may be allocated to a single or
    multiple sites
  • Fragments may be replicated where
  • there is a primary fragment as a single site
  • Copies of the fragment are placed at multiple
    sites (secondary)

4
Physical Data ManagementDistributed Query
Processing
  • Involves both local (intra site) and global
    (inter site) optimization.
  • Multiple optimization objectives
  • The weighting of communication costs versus local
    processing costs depends on network
    characteristics.
  • There are many more possible access plans for a
    distributed query.

5
Physical Data ManagementDistributed Query
Processing contd
  • Local vs. Global query processing
  • In Local,
  • Queries are performed at a central server
    (single site)
  • In Global,
  • Must decide which sites to access for the
    fragments
  • May need to move fragments from site to site
  • Multiple optimization is needed for Global
  • Due to the multiple sites and access plans
  • Many possible access plans for Global
  • Choosing the best one may be difficult

6
Physical Data ManagementDistributed Query
Processing contd
  • Communication Costs
  • Communication Time (CT)
  • Fixed Message Delay (MD)
  • Variable Transmission Time (TT)
  • CT MD TT
  • MD Number of Messages Delay per message
  • TT Number of bits/Data rate

7
Physical Data ManagementDistributed Query
Processing contd
  • Global Query Example (p. 632)
  • List the order number, order date, product
    number, product name, product price, and order
    quantity for eastern orders with a specified
    customer number, date range, and product color.
  • Four possible access plans

8
Physical Data ManagementDistributed Query
Processing contd
  • Access Plan 1
  • Move the Product table to the Tulsa site where
    the query is processed

9
Physical Data ManagementDistributed Query
Processing contd
  • Access Plan 2
  • Restrict the Product table at the Denver Site
  • Then move result to the Tulsa site to execute the
    remainder of the query

10
Physical Data ManagementDistributed Query
Processing contd
  • Access Plan 3
  • Perform join and restrictions of Eastern-Orders
    and Eastern Order-lines fragments at the Tulsa
    site
  • Then move result to Denver site to join with
    Product Table

11
Physical Data ManagementDistributed Query
Processing contd
  • Access Plan 4
  • Restrict the Product table at the Denver site
  • Move product numbers to Tulsa and do
    restrict/join
  • Then move result back to Denver to combine with
    Product table to get product names

12
Physical Data ManagementDistributed Query
Processing contd
  • Obviously many different access plans can be used
    to answer the same query
  • Need to investigate
  • actual network costs
  • the local processing costs at each site to
    determine which access plan is the best

13
Physical Data ManagementTransactions 2 Phase
Commit Protocol
  • 2 Phase Commit (2PC)
  • Ensures that all transactions are Atomic
  • One site is selected as a Coordinator while other
    sites are Participants
  • Each Participant site execute a different part of
    the transaction
  • Two phases Voting Phase and Decision Phase
  • Figure 17.18, page 634

14
Physical Data ManagementTransactions 2 Phase
Commit Protocol
  • Several Complications IF
  • Failures during recovery and Timeouts
  • Log records are lost
  • Coordinator fails
  • Etc.
  • Several methods to resolve these, but out of
    scope for this class

15
Logical Data ManagementTransparency
  • Transparency is related to data independence.
  • With transparency, users can write queries with
    no knowledge of the distribution, and
    distribution changes will not cause changes to
    existing queries and transactions.
  • Without transparency, users must reference some
    distribution details in queries and distribution
    changes can lead to changes in existing queries.

16
Logical Data ManagementFragmentation Transparency
  • Fragmentation transparency provides the highest
    level of data independence.
  • Users formulate queries and transactions without
    knowledge of fragments (locations, or local
    formats).
  • If fragments change, queries and transactions are
    not affected.
  • Table 17.6, p 626

17
Logical Data ManagementLocation Transparency
  • Location transparency provides a lesser level of
    data independence than fragmentation
    transparency.
  • Users need to reference fragments in formulating
    queries and transactions.
  • However, knowledge of locations and local formats
    is not necessary.
  • Table 17.7, p 627

18
Conceptual Data ManagementSchema Integration
  • Multiple types of schemas may exist to describe
    the same dataset
  • Integrate multiple schemas into a single schema
  • Best explained using an Exercise

19
Conceptual Data ManagementSchema Integration -
Exercise
Engineering Database Relational Schema E(eno,
ename, title), p.k. eno J(jno, jname, budget,
loc, cname), p.k. jno G(eno, jno, resp, dur),
p.k. eno, jno S(title, sal), p.k. title
First, find the common entities and relationships
between schemas
Employee Database CODASYL Schema Department(dept
-name, budget, manager) Employee(e, name,
address, title, salary) Department Employs
Employee (1N relationship)
  • Database III E/R Model
  • Entities Engineer(Engineer No, name, title,
    salary) Project(PNo, project name, budget,
    location) Client(Client Name, Address)
  • Relationships Engineer Works_In Project
    (Responsibility, Duration) Project Contract_By
    Client (Contract Date)

20
Conceptual Data ManagementSchema Integration -
Exercise
Engineering Database Relational Schema E(eno,
ename, title), p.k. eno J(jno, jname, budget,
loc, cname), p.k. jno G(eno, jno, resp, dur),
p.k. eno, jno S(title, sal), p.k. title
Second, draw the conceptual diagram for the
common entities and relationships
Employee Database CODASYL Schema Department(dept
-name, budget, manager) Employee(e, name,
address, title, salary) Department Employs
Employee (1N relationship)
  • Database III E/R Model
  • Entities Engineer(Engineer No, name, title,
    salary) Project(PNo, project name, budget,
    location) Client(Client Name, Address)
  • Relationships Engineer Works_In Project
    (Responsibility, Duration) Project Contract_By
    Client (Contract Date)

21
Conceptual Data ManagementSchema Integration
  • No single correct solution to this exercise
  • It may be seen that there are multiple solutions
    to this problem
  • Considered a very HARD problem
  • Often hard to find the best synonyms especially
    from a large set of schemas

22
Weekly Exercise
  • Questions 2, 4, and 5
Write a Comment
User Comments (0)
About PowerShow.com