1
Temple University CIS Dept.
CIS661 Principles of Data Management
  • V. Megalooikonomou
  • Distributed Databases
  • (based on slides by Silberschatz, Korth, and
    Sudarshan and slides by C. Faloutsos at CMU)

2
General Overview - rel. model
  • Relational model - SQL
  • Functional Dependencies / Normalization
  • Physical Design / Indexing
  • Query optimization
  • Transaction processing
  • Advanced topics
  • Distributed Databases

3
Problem definition
  • centralized DB

[Figure: terminals at CHICAGO, LA, and NY all accessing a single centralized DB]
4
Problem definition
  • Distr. DB
  • DB stored in many places
  • ... connected

[Figure: databases at NY and LA, connected over a network]
5
Problem definition
now:
    connect to LA
    exec sql select * from EMP ...
    connect to NY
    exec sql select * from EMPLOYEE ...

[Figure: the EMP table under DBMS1 at LA; the EMPLOYEE table under DBMS2 at NY]
6
Problem definition
ideally:
    connect to distr-LA
    exec sql select * from EMPL

[Figure: a D-DBMS layer at both LA and NY hides EMP/DBMS1 and EMPLOYEE/DBMS2 behind the single logical table EMPL]
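The "ideally" scenario above (one logical table name, EMPL, hiding EMP at LA and EMPLOYEE at NY) can be sketched as a toy catalog lookup. The dictionary layout and the sample rows below are illustrative, not from any real D-DBMS; only the table and site names come from the slides:

```python
# Toy sketch of location transparency: a global catalog maps one
# logical table name to the physical table at each site, so the user
# writes one query instead of per-site "connect" blocks.

CATALOG = {
    "EMPL": [("LA", "EMP"), ("NY", "EMPLOYEE")],  # global name -> fragments
}

SITES = {  # stand-in for the per-site DBMSs: site -> table -> rows
    "LA": {"EMP": [("smith", "LA")]},
    "NY": {"EMPLOYEE": [("johnson", "NY")]},
}

def select_all(logical_name):
    """Resolve a logical table name and union the rows from every site."""
    rows = []
    for site, table in CATALOG[logical_name]:
        rows.extend(SITES[site][table])   # in reality: a remote query
    return rows

print(select_all("EMPL"))  # rows from both sites, in catalog order
```

The point is only that the catalog, not the user, knows where the data lives.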
7
Pros & Cons
  • Pros:
  • Data sharing
  • reliability & availability
  • speed-up of query processing
  • Cons:
  • software development cost
  • more bugs
  • may increase processing overhead (msg)

8
Overview
  • Problem motivation
  • Design issues
  • Query optimization & semijoins
  • transactions (recovery, conc. control)

9
Design of Distr. DBMS
  • what are our choices for storing a table?

10
Design of Distr. DBMS
  • replication
  • fragmentation (horizontal / vertical / hybrid)
  • both

11
Design of Distr. DBMS

[Figure: one table fragmented both ways: vertical fragm. splits columns, horiz. fragm. splits rows]

  ssn | name    | address
  123 | smith   | wall str.
  ... | ...     | ...
  234 | johnson | sunset blvd
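The two fragmentations above can be sketched in a few lines; the column and row splits chosen here (name vs. address, ssn threshold) are just for illustration:

```python
# Sketch: vertical and horizontal fragmentation of one table, plus
# reconstruction. Vertical fragments split columns (each keeps the key,
# ssn); horizontal fragments split rows.

rows = [
    {"ssn": 123, "name": "smith",   "address": "wall str."},
    {"ssn": 234, "name": "johnson", "address": "sunset blvd"},
]

# vertical fragmentation: project column subsets, repeating the key
v1 = [{"ssn": r["ssn"], "name": r["name"]} for r in rows]
v2 = [{"ssn": r["ssn"], "address": r["address"]} for r in rows]

# horizontal fragmentation: select disjoint row subsets
h1 = [r for r in rows if r["ssn"] < 200]
h2 = [r for r in rows if r["ssn"] >= 200]

# reconstruction: join the vertical fragments on the key ...
by_ssn = {d["ssn"]: dict(d) for d in v1}
for d in v2:
    by_ssn[d["ssn"]].update(d)
assert list(by_ssn.values()) == rows

# ... while reconstructing horizontal fragments is just a union
assert h1 + h2 == rows
```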
12
Transparency & autonomy
  • Issues/goals:
  • naming and local autonomy
  • replication and fragmentation transp.
  • location transparency
  • i.e.:

13
Problem definition

ideally:
    connect to distr-LA
    exec sql select * from EMPL

[Figure: a D-DBMS layer at both LA and NY hides EMP/DBMS1 and EMPLOYEE/DBMS2 behind the single logical table EMPL]
14
Overview
  • Problem motivation
  • Design issues
  • Query optimization & semijoins
  • transactions (recovery, conc. control)

15
Distributed Query processing
  • issues (additional to centralized q-opt):
  • cost of transmission
  • parallelism / overlap of delays

(cpu, disk, bytes-transmitted, messages-transmitted)
minimize elapsed time? or minimize resource consumption?
16
Distr. Q-opt & semijoins

[Figure: SUPPLIER (key column s: s1, s2, s5, s11) stored at site S1;
 SHIPMENT (columns s, p: s1-p1, s2-p1, s3-p5, s2-p9) stored at site S2;
 the answer to "SUPPLIER Join SHIPMENT = ?" is needed at site S3]
17
semijoins
  • choice of plans?
  • plan 1: ship SHIPMENT -> S1; join; ship the result -> S3
  • plan 2: ship SHIPMENT -> S3; ship SUPPLIER -> S3; join at S3
  • ...
  • others?

18
Distr. Q-opt & semijoins

[Figure: SUPPLIER (key column s: s1, s2, s5, s11) stored at site S1;
 SHIPMENT (columns s, p: s1-p1, s2-p1, s3-p5, s2-p9) stored at site S2;
 the answer to "SUPPLIER Join SHIPMENT = ?" is needed at site S3]
19
Semijoins

  • Idea: reduce the tables before shipping

[Figure: SUPPLIER (key column s: s1, s2, s5, s11) at site S1;
 SHIPMENT (columns s, p: s1-p1, s2-p1, s3-p5, s2-p9) at site S2;
 the answer to "SUPPLIER Join SHIPMENT = ?" is needed at site S3]
20
Semijoins
  • How to do the reduction, cheaply?
  • E.g., reduce SHIPMENT

21
Semijoins

  • Idea: reduce the tables before shipping

[Figure: the key list (s1,s2,s5,s11) is shipped from SUPPLIER at S1 to SHIPMENT at S2;
 the answer to "SUPPLIER Join SHIPMENT = ?" is needed at site S3]
22
Semijoins
  • Formally:
  • SHIPMENT' = SHIPMENT semijoin SUPPLIER
  • express semijoin w/ rel. algebra

23
Semijoins
  • Formally:
  • SHIPMENT' = SHIPMENT semijoin SUPPLIER
  • express semijoin w/ rel. algebra
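In relational algebra the reduction is SHIPMENT' = SHIPMENT semijoin SUPPLIER = SHIPMENT join π_s(SUPPLIER): only the join-column values of SUPPLIER travel. A minimal sketch, using the tuple values visible in the running figure:

```python
# Semijoin sketch: reduce SHIPMENT (at site S2) using only the join
# keys of SUPPLIER (at site S1), before shipping anything big.
# SHIPMENT' = SHIPMENT semijoin SUPPLIER = SHIPMENT join pi_s(SUPPLIER)

supplier = ["s1", "s2", "s5", "s11"]            # at S1: supplier keys
shipment = [("s1", "p1"), ("s2", "p1"),
            ("s3", "p5"), ("s2", "p9")]         # at S2: (s, p) tuples

# step 1: ship pi_s(SUPPLIER) from S1 to S2 (just the key column)
keys = set(supplier)

# step 2: at S2, keep only the SHIPMENT tuples that will join
shipment_reduced = [(s, p) for (s, p) in shipment if s in keys]

print(shipment_reduced)   # the s3 tuple is dropped before shipping
```

Only the reduced relation then has to travel to the join site.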

24
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q: transmission cost (bytes) for the semijoin
  • SHIPMENT' = SHIPMENT semijoin SUPPLIER

25
Semijoins

  • Idea: reduce the tables before shipping

[Figure: the key list (s1,s2,s5,s11), 4 bytes per value, is shipped from SUPPLIER at S1 to SHIPMENT at S2;
 the answer to "SUPPLIER Join SHIPMENT = ?" is needed at site S3]

26
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q: transmission cost (bytes) for the semijoin
  • SHIPMENT' = SHIPMENT semijoin SUPPLIER
  • A: 4 x 4 = 16 bytes (four keys, 4 bytes each)

27
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q1: give a plan, with semijoin(s)
  • Q2: estimate its cost (bytes shipped)

28
Semijoins eg
  • A1:
  • reduce SHIPMENT to SHIPMENT'
  • SHIPMENT' -> S3
  • SUPPLIER -> S3
  • do join @ S3
  • Q2: cost?

29
Semijoins

[Figure: the key list (s1,s2,s5,s11) shipped from SUPPLIER at S1 to SHIPMENT at S2; the join result assembled at S3]
30
Semijoins eg
  • A2:
  • 4 x 4 bytes - reduce SHIPMENT to SHIPMENT'
  • 3 x 8 bytes - SHIPMENT' -> S3
  • 4 x 8 bytes - SUPPLIER -> S3
  • 0 bytes - do join @ S3

72 bytes TOTAL
31
Other plans?
32
Other plans?
  • P2:
  • reduce SHIPMENT to SHIPMENT'
  • reduce SUPPLIER to SUPPLIER'
  • SHIPMENT' -> S3
  • SUPPLIER' -> S3

33
Other plans?
  • P3:
  • reduce SUPPLIER to SUPPLIER'
  • SUPPLIER' -> S2
  • do join @ S2
  • ship results -> S3

34
A brilliant idea: two-way semijoins
  • (not in book, not in final exam)
  • reduce both relations with one more exchange [Kang, '86]
  • ship back the list of keys that didn't match
  • CAN NOT LOSE! (why?)
  • further improvement:
  • or ship back the list of the ones that matched, whichever is shorter!
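The backward step of [Kang, '86] can be sketched the same way: after the forward reduction, the receiving site sends back either the matched or the unmatched keys, whichever list is shorter. Data values are from the running figure; the "can not lose" comment explains the why:

```python
# Two-way semijoin sketch: one extra (small) message reduces SUPPLIER too.
supplier = ["s1", "s2", "s5", "s11"]                 # at S1
shipment = [("s1", "p1"), ("s2", "p1"),
            ("s3", "p5"), ("s2", "p9")]              # at S2

keys = set(supplier)                                 # shipped S1 -> S2
shipment_reduced = [(s, p) for (s, p) in shipment if s in keys]

# backward step: at S2, see which of the shipped keys actually matched
matched = {s for (s, _) in shipment_reduced}
unmatched = keys - matched

# ship back whichever list is shorter -- we CAN NOT LOSE, because
# len(matched) + len(unmatched) == len(keys), so the shorter of the
# two is at most half the size of the forward key list
if len(matched) <= len(unmatched):
    supplier_reduced = [s for s in supplier if s in matched]
else:
    supplier_reduced = [s for s in supplier if s not in unmatched]

print(sorted(unmatched), supplier_reduced)
```

Here the unmatched keys are s5 and s11, so SUPPLIER shrinks to the two suppliers that can contribute to the join.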

35
Two-way Semijoins

[Figure: the key list (s1,s2,s5,s11) shipped S1 -> S2 to reduce SHIPMENT;
 the unmatched keys (s5,s11) shipped back S2 -> S1 to reduce SUPPLIER;
 the answer is needed at S3]
36
Overview
  • Problem motivation
  • Design issues
  • Query optimization & semijoins
  • transactions (recovery, conc. control)

37
Transactions & recovery
  • Problem - e.g., a transaction moves
  • 100 from NY -> 50 to LA, 50 to Chicago
  • 3 sub-transactions, on 3 systems
  • how to guarantee atomicity (all-or-none)?
  • Observation: additional types of failures (links,
    servers, delays, time-outs ...)

38
Transactions & recovery
  • Problem - e.g., a transaction moves
  • 100 from NY -> 50 to LA, 50 to Chicago

39
Distributed recovery

How?

[Figure: sub-transactions T1,1: -100 at NY, T1,2: +50 at LA, T1,3: +50 at CHICAGO]
40
Distributed recovery

Step 1: choose coordinator

[Figure: the same three sub-transactions; T1,1 at NY will act as coordinator]
41
Distributed recovery
  • Step 2: execute a protocol, e.g., 2 phase commit:
  • when a transaction T completes execution (i.e., when all sites at which T has executed inform the transaction coordinator Ci that T has completed), Ci starts the 2PC protocol ->

42
2 phase commit

[Message diagram, time flowing downward: T1,1 (coord.) sends "prepare to commit" to T1,2 and T1,3]
43
2 phase commit

[Message diagram: after "prepare to commit", T1,2 and T1,3 each reply Y]
44
2 phase commit

[Message diagram: both replies are Y, so T1,1 (coord.) sends "commit" to T1,2 and T1,3]
45
2 phase commit (e.g., failure)

[Message diagram: T1,1 (coord.) sends "prepare to commit" to T1,2 and T1,3]
46
2 phase commit

[Message diagram: one participant replies Y, the other replies N]
47
2 phase commit

[Message diagram: because of the N vote, T1,1 (coord.) sends "abort" to T1,2 and T1,3]
48
Distributed recovery
  • Many, many additional details (what if the
    coordinator fails? what if a link fails? etc.)
  • and many other solutions (e.g., 3-phase commit)
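The message sequence of slides 42-47 can be sketched as a coordinator loop. The participants here are in-memory stubs that simply vote Y or N; real 2PC also needs logging, timeouts, and the failure handling mentioned above:

```python
# 2PC sketch: phase 1 collects votes, phase 2 broadcasts the decision.
def two_phase_commit(participants):
    # phase 1: "prepare to commit" -> each participant votes "Y" or "N"
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "Y" for v in votes) else "abort"
    # phase 2: broadcast the unanimous decision to everyone
    for p in participants:
        p.finish(decision)
    return decision

class Participant:
    """Stub sub-transaction (e.g., T1,2 at LA) with a fixed vote."""
    def __init__(self, vote):
        self.vote, self.outcome = vote, None
    def prepare(self):
        return self.vote      # in reality: force-write a log record first
    def finish(self, decision):
        self.outcome = decision

print(two_phase_commit([Participant("Y"), Participant("Y")]))  # commit
print(two_phase_commit([Participant("Y"), Participant("N")]))  # abort
```

A single N (or a timeout, treated as N) is enough to abort everywhere, which is exactly the all-or-none guarantee.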

49
Overview
  • Problem motivation
  • Design issues
  • Query optimization & semijoins
  • transactions (recovery, conc. control)

50
Distributed conc. control
  • also more complicated
  • distributed deadlocks!

51
Distributed deadlocks

[Figure: transactions T1 and T2 each have sub-transactions at both sites: T1,la and T2,la at LA, T1,ny and T2,ny at NY]
52
Distributed deadlocks

[Figure: local wait-for graphs at LA and NY; neither site alone sees a cycle]
53
Distributed deadlocks

[Figure: combining the local graphs with the cross-site edges reveals a global cycle among T1,la, T2,la, T2,ny, T1,ny: a distributed deadlock]
54
Distributed deadlocks

[Figure: the global wait-for graph spanning LA and NY]

  • sites need to exchange wait-for graphs
  • clever algorithms, to reduce messages
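Exchanging wait-for graphs amounts to taking the union of each site's local edges and looking for a cycle in the global graph. A minimal sketch on the four sub-transactions above; the edge directions are illustrative (edge (a, b) means "a waits for b"), chosen so that the cycle only appears globally:

```python
# Sketch: union the local wait-for graphs, then detect a cycle
# (i.e., a deadlock) in the global graph with a depth-first search.

local_la = [("T1,la", "T2,la")]                    # edges known at LA
local_ny = [("T2,ny", "T1,ny")]                    # edges known at NY
cross = [("T2,la", "T2,ny"), ("T1,ny", "T1,la")]   # waits across sites

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    done, on_path = set(), set()
    def dfs(v):
        on_path.add(v)                 # v is on the current DFS path
        for w in graph.get(v, []):
            if w in on_path or (w not in done and dfs(w)):
                return True            # back edge -> cycle
        on_path.discard(v)
        done.add(v)
        return False
    return any(dfs(v) for v in list(graph) if v not in done)

# no single site sees the cycle, but the union reveals the deadlock
print(has_cycle(local_la), has_cycle(local_ny))  # False False
print(has_cycle(local_la + local_ny + cross))    # True
```

The "clever algorithms" in the bullet above are about detecting this cycle without naively shipping every local graph to every other site.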

55
Conclusions
  • Distr. DBMSs: not deployed
  • BUT: they produced clever ideas:
  • semijoins
  • distributed recovery / conc. control
  • which can be useful for:
  • parallel db / clusters
  • active disks
  • replicated db (e-commerce servers)