Title: Chapter 17: Additional Slides
1Chapter 17 Additional Slides
2Outline
- Physical Data Management
- Fragments
- Distributed Query Processing
- Transactions
- Logical Data Management
- Transparency
- Conceptual Data Management
3Physical Data ManagementFragments
- What is a Fragment?
- Vertical subset (project operation)
- Horizontal subset (restrict operation)
- Mixed Fragment (combination of project and
restrict) - A fragment may be allocated to a single or
multiple sites - Fragments may be replicated where
- there is a primary fragment as a single site
- Copies of the fragment are placed at multiple
sites (secondary)
4Physical Data ManagementDistributed Query
Processing
- Involves both local (intra site) and global
(inter site) optimization. - Multiple optimization objectives
- The weighting of communication costs versus local
processing costs depends on network
characteristics. - There are many more possible access plans for a
distributed query.
5Physical Data ManagementDistributed Query
Processing contd
- Local vs. Global query processing
- In Local,
- Queries are performed at a central server
(single site) - In Global,
- Must decide which sites to access for the
fragments - May need to move fragments from site to site
- Multiple optimization is needed for Global
- Due to the multiple sites and access plans
- Many possible access plans for Global
- Choosing the best one may be difficult
6Physical Data ManagementDistributed Query
Processing contd
- Communication Costs
- Communication Time (CT)
- Fixed Message Delay (MD)
- Variable Transmission Time (TT)
- CT MD TT
- MD Number of Messages Delay per message
- TT Number of bits/Data rate
7Physical Data ManagementDistributed Query
Processing contd
- Global Query Example (p. 632)
- List the order number, order date, product
number, product name, product price, and order
quantity for eastern orders with a specified
customer number, date range, and product color. - Four possible access plans
8Physical Data ManagementDistributed Query
Processing contd
- Access Plan 1
- Move the Product table to the Tulsa site where
the query is processed
9Physical Data ManagementDistributed Query
Processing contd
- Access Plan 2
- Restrict the Product table at the Denver Site
- Then move result to the Tulsa site to execute the
remainder of the query
10Physical Data ManagementDistributed Query
Processing contd
- Access Plan 3
- Perform join and restrictions of Eastern-Orders
and Eastern Order-lines fragments at the Tulsa
site - Then move result to Denver site to join with
Product Table
11Physical Data ManagementDistributed Query
Processing contd
- Access Plan 4
- Restrict the Product table at the Denver site
- Move product numbers to Tulsa and do
restrict/join - Then move result back to Denver to combine with
Product table to get product names
12Physical Data ManagementDistributed Query
Processing contd
- Obviously many different access plans can be used
to answer the same query - Need to investigate
- actual network costs
- the local processing costs at each site to
determine which access plan is the best
13Physical Data ManagementTransactions 2 Phase
Commit Protocol
- 2 Phase Commit (2PC)
- Ensures that all transactions are Atomic
- One site is selected as a Coordinator while other
sites are Participants - Each Participant site execute a different part of
the transaction - Two phases Voting Phase and Decision Phase
- Figure 17.18, page 634
14Physical Data ManagementTransactions 2 Phase
Commit Protocol
- Several Complications IF
- Failures during recovery and Timeouts
- Log records are lost
- Coordinator fails
- Etc.
- Several methods to resolve these, but out of
scope for this class
15Logical Data ManagementTransparency
- Transparency is related to data independence.
- With transparency, users can write queries with
no knowledge of the distribution, and
distribution changes will not cause changes to
existing queries and transactions. - Without transparency, users must reference some
distribution details in queries and distribution
changes can lead to changes in existing queries.
16Logical Data ManagementFragmentation Transparency
- Fragmentation transparency provides the highest
level of data independence. - Users formulate queries and transactions without
knowledge of fragments (locations, or local
formats). - If fragments change, queries and transactions are
not affected. - Table 17.6, p 626
17Logical Data ManagementLocation Transparency
- Location transparency provides a lesser level of
data independence than fragmentation
transparency. - Users need to reference fragments in formulating
queries and transactions. - However, knowledge of locations and local formats
is not necessary. - Table 17.7, p 627
18Conceptual Data ManagementSchema Integration
- Multiple types of schemas may exist to describe
the same dataset - Integrate multiple schemas into a single schema
- Best explained using an Exercise
19Conceptual Data ManagementSchema Integration -
Exercise
Engineering Database Relational Schema E(eno,
ename, title), p.k. eno J(jno, jname, budget,
loc, cname), p.k. jno G(eno, jno, resp, dur),
p.k. eno, jno S(title, sal), p.k. title
First, find the common entities and relationships
between schemas
Employee Database CODASYL Schema Department(dept
-name, budget, manager) Employee(e, name,
address, title, salary) Department Employs
Employee (1N relationship)
- Database III E/R Model
- Entities Engineer(Engineer No, name, title,
salary) Project(PNo, project name, budget,
location) Client(Client Name, Address) - Relationships Engineer Works_In Project
(Responsibility, Duration) Project Contract_By
Client (Contract Date)
20Conceptual Data ManagementSchema Integration -
Exercise
Engineering Database Relational Schema E(eno,
ename, title), p.k. eno J(jno, jname, budget,
loc, cname), p.k. jno G(eno, jno, resp, dur),
p.k. eno, jno S(title, sal), p.k. title
Second, draw the conceptual diagram for the
common entities and relationships
Employee Database CODASYL Schema Department(dept
-name, budget, manager) Employee(e, name,
address, title, salary) Department Employs
Employee (1N relationship)
- Database III E/R Model
- Entities Engineer(Engineer No, name, title,
salary) Project(PNo, project name, budget,
location) Client(Client Name, Address) - Relationships Engineer Works_In Project
(Responsibility, Duration) Project Contract_By
Client (Contract Date)
21Conceptual Data ManagementSchema Integration
- No single correct solution to this exercise
- It may be seen that there are multiple solutions
to this problem - Considered a very HARD problem
- Often hard to find the best synonyms especially
from a large set of schemas
22Weekly Exercise