Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications - PowerPoint PPT Presentation

About This Presentation
Title:

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Description:

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Distributed DB General Overview - rel. model Relational model - SQL ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 60
Provided by: csCmuEdu1
Learn more at: http://www.cs.cmu.edu
Category:

less

Transcript and Presenter's Notes

Title: Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications


1
Carnegie Mellon Univ.Dept. of Computer
Science15-415 - Database Applications
  • C. Faloutsos
  • Distributed DB

2
General Overview - rel. model
  • Relational model - SQL
  • Functional Dependencies Normalization
  • Physical Design Indexing
  • Query optimization
  • Transaction processing
  • Advanced topics
  • Distributed Databases

3
Problem definition
  • centralized DB

CHICAGO
LA
NY
4
Problem definition
  • Distr. DB
  • DB stored in many places
  • ... connected

NY
LA
5
Problem definition
connect to LA exec sql select from EMP ...
connect to NY exec sql select from EMPLOYEE
...
now
NY
LA
EMPLOYEE
DBMS2
EMP
DBMS1
6
Problem definition
connect to distr-LA exec sql select from EMPL
ideally
LA
NY
EMPLOYEE
D-DBMS
D-DBMS
EMP
DBMS2
DBMS1
7
Pros Cons
  • Pros
  • .
  • .
  • .
  • Cons
  • .
  • .
  • .

8
Pros Cons
  • Pros
  • Data sharing
  • reliability availability
  • speed up of query processing
  • Cons
  • software development cost
  • more bugs
  • may increase processing overhead (msg)

9
Overview
  • Problem motivation
  • Design issues
  • Query optimization semijoins
  • transactions (recovery, conc. control)

10
Design of Distr. DBMS
  • what are our choices of storing a table?

11
Design of Distr. DBMS
  • replication
  • fragmentation (horizontal vertical hybrid)
  • both

12
Design of Distr. DBMS
vertical fragm.
ssn name address
123 smith wall str.
... ... ...
234 johnson sunset blvd
horiz. fragm.
13
Transparency autonomy
  • Issues/goals
  • naming and local autonomy
  • replication and fragmentation transp.
  • location transparency
  • i.e.

14
Problem definition
connect to distr-LA exec sql select from EMPL
ideally
LA
NY
EMPLOYEE
D-DBMS
D-DBMS
EMP
DBMS2
DBMS1
15
Overview
  • Problem motivation
  • Design issues
  • Query optimization semijoins
  • transactions (recovery, conc. control)

16
Distributed Query processing
  • issues (additional to centralized q-opt)
  • cost of transmission
  • parallelism / overlap of delays

(cpu, disk, bytes-transmitted,
messages-trasmitted)
minimize elapsed time? or minimize resource
consumption?
17
Distr. Q-opt semijoins
S2
SUPPLIER
SHIPMENT
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
SUPPLIER Join SHIPMENT ?
18
semijoins
  • choice of plans?

19
semijoins
  • choice of plans?
  • plan 1 ship SHIP -gt S2 join ship -gt S3
  • plan 2 ship SHIP-gtS3 ship SUP-gtS3 join
  • ...
  • others?

20
Distr. Q-opt semijoins
S2
SUPPLIER
SHIPMENT
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
SUPPLIER Join SHIPMENT ?
21
Semijoins
  • Idea reduce the tables before shipping

SUPPLIER
SHIPMENT
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
SUPPLIER Join SHIPMENT ?
22
Semijoins
  • How to do the reduction, cheaply?
  • Eg., reduce SHIPMENT

23
Semijoins
  • Idea reduce the tables before shipping

SUPPLIER
SHIPMENT
(s1,s2,s5,s11)
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
SUPPLIER Join SHIPMENT ?
24
Semijoins
  • Formally
  • SHIPMENT SHIPMENT SUPPLIER
  • express semijoin w/ rel. algebra

25
Semijoins
  • Formally
  • SHIPMENT SHIPMENT SUPPLIER
  • express semijoin w/ rel. algebra

26
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q transmission cost (bytes) for semijoin
  • SHIPMENT SHIPMENT semijoin SUPPLIER

27
Semijoins
  • Idea reduce the tables before shipping

SUPPLIER
SHIPMENT
(s1,s2,s5,s11)
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
4 bytes
SUPPLIER Join SHIPMENT ?
28
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q transmission cost (bytes) for semijoin
  • SHIPMENT SHIPMENT semijoin SUPPLIER
  • A 44 bytes

29
Semijoins eg
  • suppose each attr. is 4 bytes
  • Q1 give a plan, with semijoin(s)
  • Q2 estimate its cost (bytes shipped)

30
Semijoins eg
  • A1
  • reduce SHIPMENT to SHIPMENT
  • SHIPMENT -gt S3
  • SUPPLIER -gt S3
  • do join _at_ S3
  • Q2 cost?

31
Semijoins
SUPPLIER
SHIPMENT
(s1,s2,s5,s11)
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
32
Semijoins eg
  • A2
  • 44 bytes - reduce SHIPMENT to SHIPMENT
  • 38 bytes - SHIPMENT -gt S3
  • 48 bytes - SUPPLIER -gt S3
  • 0 bytes - do join _at_ S3

72 bytes TOTAL
33
Other plans?
34
Other plans?
  • P2
  • reduce SHIPMENT to SHIPMENT
  • reduce SUPPLIER to SUPPLIER
  • SHIPMENT -gt S3
  • SUPPLIER -gt S3

35
Other plans?
  • P3
  • reduce SUPPLIER to SUPPLIER
  • SUPPLIER -gt S2
  • do join _at_ S2
  • ship results -gt S3

36
A brilliant idea Bloom-joins
  • (not in the book not in the final exam)
  • how to ship the projection, say, of SUPPLIER.s,
    even cheaper?
  • A Bloom-filter Lohman
  • quickdirty membership testing

37
Semijoins
SUPPLIER
SHIPMENT
(s1,s2,s5,s11)
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
S3
38
Another brilliant idea two-way semijoins
  • (not in book, not in final exam)
  • reduce both relations with one more exchange
    Kang, 86
  • ship back the list of keys that didnt match
  • CAN NOT LOSE! (why?)
  • further improvement
  • or the list of ones that matched whatever is
    shorter!

39
Two-way Semijoins
S2
SUPPLIER
SHIPMENT
(s1,s2,s5,s11)
S1
s p
s1 p1
s2 p1
s3 p5
s2 p9
s ...
s1
s2
s5
s11
(s5,s11)
S3
40
Overview
  • Problem motivation
  • Design issues
  • Query optimization semijoins
  • transactions (recovery, conc. control)

41
Transactions recovery
  • Problem eg., a transaction moves
  • 100 from NY -gt 50 to LA, 50 to Chicago
  • 3 sub-transactions, on 3 systems, with 3 W.A.L.s
  • how to guarantee atomicity (all-or-none)?
  • Observation additional types of failures (links,
    servers, delays, time-outs ....)

42
Transactions recovery
  • Problem eg., a transaction moves
  • 100 from NY -gt 50 to LA, 50 to Chicago

43
Distributed recovery
CHICAGO
How?
T1,2 50
NY
LA
NY
T1,3 50
T1,1-100
44
Distributed recovery
CHICAGO
Step1 choose coordinator
T1,2 50
NY
LA
NY
T1,3 50
T1,1-100
45
Distributed recovery
  • Step 2 execute a protocol, eg., 2 phase commit

46
2 phase commit
T1,3
T1,1 (coord.)
T1,2
prepare to commit
time
47
2 phase commit
T1,3
T1,1 (coord.)
T1,2
prepare to commit
Y
Y
time
48
2 phase commit
T1,3
T1,1 (coord.)
T1,2
prepare to commit
Y
Y
commit
time
49
2 phase commit (eg., failure)
T1,3
T1,1 (coord.)
T1,2
prepare to commit
time
50
2 phase commit
T1,3
T1,1 (coord.)
T1,2
prepare to commit
Y
N
time
51
2 phase commit
T1,3
T1,1 (coord.)
T1,2
prepare to commit
Y
N
abort
time
52
Distributed recovery
  • Many, many additional details (what if the
    coordinator fails? what if a link fails? etc)
  • and many other solutions (eg., 3-phase commit)

53
Overview
  • Problem motivation
  • Design issues
  • Query optimization semijoins
  • transactions (recovery, conc. control)

54
Distributed conc. control
  • also more complicated
  • distributed deadlocks!

55
Distributed deadlocks
T2,ny
T2,la
CHICAGO
T1,la
T1,ny
NY
LA
NY
56
Distributed deadlocks
T2,la
T2,ny
T1,la
T1,ny
LA
NY
57
Distributed deadlocks
T2,la
T2,ny
T1,la
T1,ny
LA
NY
58
Distributed deadlocks
LA
NY
T2,la
T2,ny
T1,la
T1,ny
  • cites need to exchange wait-for graphs
  • clever algorithms, to reduce messages

59
Conclusions
  • Distr. DBMSs not deployed
  • BUT produced clever ideas
  • semijoins
  • distributed recovery / conc. control
  • which can be useful for
  • parallel db / clusters
  • active disks
  • replicated db (e-commerce servers)
Write a Comment
User Comments (0)
About PowerShow.com