Distribution Transparency and Fragmentation - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Distribution Transparency and Fragmentation

Description:

ith fragment of global relation R = Ri. Reference architecture for distribution transparency ... is correct if each attribute maps to at least one attribute of ... – PowerPoint PPT presentation

Number of Views:165
Avg rating:3.0/5.0
Slides: 44
Provided by: ter776
Category:

less

Transcript and Presenter's Notes

Title: Distribution Transparency and Fragmentation


1
Distribution Transparency and Fragmentation
  • Introduction to distribution transparency and
    reference architecture
  • Levels of transparency
  • Fragmentation
  • Relational algebra
  • Fragmentation of relations
  • Horizontal fragmentation
  • Derived horizontal fragmentation
  • Vertical fragmentation

2
Distribution Transparency and Fragmentation
  • Distribution and Fragmentation (of data or
    records)
  • Data distribution data items are distributed at
    several sites (for reliability, performance,
    security, etc..)
  • Fragmentation Divide a global table into several
    fragments (smaller tables)
  • How? Normally, based on how the applications at
    each site access the data items (close to the
    applications)
  • Performance objectives are to minimize the access
    delay and data management cost.
  • I.e., if a transaction wants to access (read or
    write) several fields of a record and the fields
    are distributed at several sites, the access cost
    will be heavy
  • Cost for maintaining database integrity

3
Introduction to Distribution Transparency
  • A distributed database system should look like a
    centralized database system to the applications
  • Distribution transparency
  • The applications do not
  • need to know how the database is partitioned
    (fragmented) and
  • how they are distributed when they access the
    database
  • How to achieve this? Similar to the default path
    in a file system (I.e., define a link)
  • Advantages of distribution transparency
  • Programs can be written as if the database is not
    distributed (more flexible as the mapping can be
    changed) (similar to the situation of using a
    variable for referring to a file)
  • Reallocation the correctness of a program is
    unaffected by the reallocation of data from one
    site to another site (a system administration job)

4
Introduction to Distribution Transparency
  • Separation of the concept of data fragmentation
    from the concept of data location (fragmentation
    transparency and location transparency)
  • A fragment can have multiple physical storages
  • Explicit control of redundancy at fragment level
    (replication transparency)

5
Data Fragmentation
  • Disadvantages of data fragmentation
  • If the data are not fragmented appropriately, the
    benefits may become problems (higher access cost
    and management cost) (Good methods are needed)
  • If the applications have conflicting requirements
    which prevent decomposition of the relation into
    mutually exclusive fragments, those applications
    whose views are defined on more than one
    fragments may suffer performance degradation
  • The problems in semantic data control, integrity
    checking (data management) etc..
  • More fragments, more overheads have to be spent
    on that issues.
  • Consider the update problem and referential
    integrity problem

6
Reference architecture for distribution
transparency
Global Schema
Site Independent Schemas - do not depend on the
data model of the local DBSs
Fragmentation Schema
Allocation Schema
Site 1
Site n
...
Local Mapping Schema n
Local Mapping Schema 1
DBS
DBS
Local Database
Local Database
7
Reference architecture for distribution
transparency
  • Global Schema
  • defines the data which are maintained by the
    system as they are not distributed at all
  • consists of a set of global relations (from
    applications)
  • Fragmentation Schema
  • split each global relation into several
    non-overlapping fragment(s) (1 to 1 / 1 to many
    relation)
  • fragments are logical portions of global
    relations which are physically located at one or
    several sites
  • ith fragment of global relation R Ri

8
Reference architecture for Distribution
Transparency
  • Allocation Schema
  • defines the location of a fragment, at which
    site(s)
  • redundant a fragment maps to several sites
  • non-redundant a fragment maps to 1 site
  • physical image of global relation R at site I
  • all the fragments which correspond to the same
    global relation R at site j
  • Rji (j - site number, i - ith fragment of R)

9
Reference architecture for Distributed
Database(DDB)
  • Local Mapping Schema
  • map the physical images to the data which are
    manipulated by the local DBSs
  • may have different types of local mappings at
    different sites (depends on the database type and
    implementation, heterogeneity)

10
Levels of Transparency
  • Level 1 Fragmentation Transparency
  • All the three parameters (fragmentation,
    allocation, mapping) are transparent to
    applications
  • Applications consider all relations as
    centralized and are unaware of their
    fragmentations, allocation and local mapping
  • Level 2 Location Transparency
  • Allocation and local mapping of relations are
    transparent, but fragmentation is not
  • Applications must specify the referred fragments

11
Levels of Transparency
  • Level 3 Local Transparency
  • Only local mapping of relations are transparent.
  • Applications have to define specific fragments
    and their precise locations.
  • Level 4 No Transparency
  • Application programmers must take responsibility
    in defining all the three parameters

12
Levels of Transparency
Level 1 Fragmentation Transparency
Site 1
No Special Specification
SUPPLIER1
DDBS
read (terminal, SNUM) select NAME into
NAME from SUPPLIER where SNUM
SNUM write(terminal, NAME)
Site 2
SUPPLIER2
Site 3
SUPPLIER2
13
Levels of Transparency
Level 2 Location Transparency
Site 1
Fragment Specification
DDBS
SUPPLIER1
read (terminal, SNUM) select NAME into
NAME from SUPPLIER1 where
SNUM SNUM if not (FOUND) then
select NAME into NAME from
SUPPLIER2 where SNUM SNUM
endif write(terminal, NAME)
Site 2
SUPPLIER2
Site 3
SUPPLIER2
14
Levels of Transparency
Level 3 Local Mapping Transparency
Site 1
Fragment and Location Specification
SUPPLIER1
read (terminal, SNUM) select NAME into
NAME from SUPPLIER1 at Site 1
where SNUM SNUM if not (FOUND) then
select NAME into NAME from
SUPPLIER2 at Site 3 where SNUM
SNUM endif write(terminal, NAME)
DDBS
Site 3
SUPPLIER2
15
Review of Relational Model
  • Data are stored in tables, called relations
  • Each relation has a fixed number of columns
    (attributes) and a dynamic number of rows
    (tuples)
  • The relation name and the names of attributes
    appearing in it is called the relation schema of
    the relation
  • e.g. EMP (EMPNUM, NAME, AGE, DEPTNUM)
  • No two identical tuples in the same relation
  • No defined order on the tuples in a relation

EMPNUM NAME AGE DEPTNUM 3
Jones 27
1 7 Smith 34
2 11 Mary 31
1
16
Relational Algebra
  • Relational algebra A collection of operations on
    relation, each of which takes one or two
    relations as operands and produce one relation as
    result
  • Unary Select, Projection
  • Binary Union, Difference, Cartesian Product,
    Join, Semi-Join, Natural Join, Natural Semi-Join

17
Relational Algebra
  • Unary operations

Selection SL ltconditiongt R e.g. SL Aa R
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 1 d
R
18
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
Projection PJ ltattribute1,attribute2...gt
R e.g. PJA,B R
A B a 1 b 1 b 2
R
  • Duplicated tuples (a, 1, d) are eliminated

19
Relational Algebra
  • Binary operation

R UN S
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
A B C a 1 a b 1 b a 1 d b 2 f a 3 f
S
R
Union R1 UN R2 R1 and R2 must be same
relation schema e.g. R UN S duplicate tuples
are eliminated
20
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
A B C b 1 b a 1 d b 2 f
S
R DF S
R
Difference R1 DF R2 R1 and R2 must be same
relation schema e.g. R DF S same
relation schema, all tuples appearing in R but
not in S
21
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 3 f
R CP S
S
R.A R.B R.C S.A S.B S.C a
1 a a 1 a
b 1 b a 1
a a 1 d a 1
a b 2 f a
1 a a 1 a a
3 f b 1 b
a 3 f a 1 d
a 3 f b 2
f a 3 f
R
Cartesian Product R1 CP R2 if two
attributes with same name in R1 and R2, they are
nevertheless considered different attributes in
order to avoid ambiguity. The name of its each
attribute is prefixed with the name of its
original relation
22
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A R.B R.C T.B T.C D a
1 a 1 a 1 a
1 a 2 a
3 b 1 b 3 b
1 a 1 d 1
d 4
R
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
R JN R.C T.C T
Join R1 JN ltjoin predicategt R2 SL ltjoin
predicategt (R1 CP R2)
T
23
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a b 1 b a 1 d
R SJ R.C T.C T
R
Semi-Join R1 SJ ltjoin predicategt R2 PJ
attribute(R1) (R1 JN ltjoin predicategt
R2) where attribute(R1) the set of all
attributes of R1
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
T
24
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C D a 1 a 1 a 1 d 4
R
R NJN T
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
Natural Join R1 NJN R2 values with same
attribute name in R1 and R2 must be equal one set
of attributes is omitted from the result
T
25
Relational Algebra
A B C a 1 a b 1 b a 1 d b 2 f
A B C a 1 a a 1 d
R NSJ T
R
Natural Semi-Join R1 NSJ R2 PJ
attribute(R1) (R1 NJN R2) where
attribute(R1) the set of all attributes of R1
B C D 1 a 1 3 b 1 3 c 2 1 d 4 2 a 3
T
26
Fragmentation of Relations
  • Rules for defining fragments
  • Completeness
  • each data item in the global relation must appear
    in at least one of the fragments
  • Reconstruction
  • it must always be possible to reconstruct a
    global relation from its fragments
  • Disjointness
  • fragments have to be disjoint

27
Fragmentation of Relations
  • Types of fragmentations
  • horizontal fragmentation
  • Select operation
  • derived horizontal fragmentation
  • Semi-Join operation with another relation
  • vertical fragmentation
  • Project operation

28
Horizontal Fragmentation
  • Partition the tuples of a global relation into
    (disjoint) subsets
  • Usually, each subset contains the data which have
    common (geographical) properties, I.e., grouping
    the suppliers in the same city into a fragment
  • It can be defined by using Select operation on
    the global relation

SUPPLIER(SNUM, NAME, CITY) SUPPLIER1 SL
CITYSF (SUPPLIER) SUPPLIER2 SL CITYLA
(SUPPLIER)
29
Horizontal Fragmentation
  • Assumption SF and LA are the only possible
    values of CITY attribute
  • Completeness
  • all tuples belong to either SUPPLIER1 or
    SUPPLIER2
  • Reconstruction
  • SUPPLIER SUPPLIER1 UN SUPPLIER2
  • Disjointness
  • SUPPLIER1 ? SUPPLIER2 ?

30
Horizontal Fragmentation
In general completeness of k - fragments of R
Reconstruction of R R R1 UN R2 UN .. UN
Rk
Update problem If the update is performed on an
attribute determining a horizontal
fragmentation, then it may happen that the
update tuple has to be moved to another fragment
31
Derived Horizontal Fragmentation
  • In some cases, horizontal fragmentation may be
    derived from the horizontal fragmentation of
    another relation

SUPPLIER(SNUM, NAME, CITY) SUPPLY(SNUM, PNUM,
DEPTNUM, QUAN)
  • To partition SUPPLY so that a fragment contains
    the tuples for suppliers which are in a given
    city. However, city is not an attribute of the
    SUPPLY relation. hint using SNUM and join

SUPPLY1 SUPPLY SJSNUM SUPPLIER1 SUPPLY2
SUPPLY SJSNUM SUPPLIER2
32
Derived Horizontal Fragmentation
  • Completeness is implied by referential integrity
  • every supplier number (SNUM) appearing in SUPPLY
    appears in SUPPLIER as well
  • So, when you update a record in supplier, you may
    also need to check whether you need to move
    supply also
  • Reconstruction is maintained by UN
  • Disjointness can be provided in all cases (in
    general situations). In this example, SNUM is a
    unique key of SUPPLIER

33
Vertical Fragmentation
  • Subdivision of a global relations attributes
    into groups
  • Fragments are obtained by projecting the global
    relation over each group
  • Fragmentation is correct if each attribute maps
    to at least one attribute of the fragments
    (completeness)

EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM) EMP1
PJ EMPNUM, NAME, MGRNUM, DEPTNUM EMP EMP2 PJ
EMPNUM, SAL, TAX EMP
34
Vertical Fragmentation
  • Completeness of k fragments of R
  • Reconstruction
  • Must be lossless (discuss later)
  • EMP EMP1 NJN EMP2
  • Disjointness
  • practically irrelevant (not necessary and usually
    not required)

35
Correctness of vertical decomposition
  • Lossy decomposition

A B C 1 2 3 4 2 5
R
R1 PJ A,B (R)
A B 1 2 4 2
R ? R3
A B C 1 2 3 1 2 5 4 2 3 4 2 5
R3 R1 NJN R2
R2 PJ B,C (R)
B C 2 3 2 5
36
Lossless Decomposition
  • Theorem
  • Given a relation R with a schema S and a
    decomposition ? S1, S2, ? is lossless iff S1
    ? S2 is a key of R1 or R2
  • Example
  • EMPLOYEE(NAME, POSITION, SALARY)
  • NAME ? POSITION? SALARY
  • Lossless decomposition
  • ?1 R1(NAME, POSITION), R2(POSITION,
    SALARY)
  • EMPLOYEE R1 NJN R2

37
Lossless Decomposition
  • Lossy decomposition
  • ?2 R3(NAME, SALARY), R4(POSITION,
    SALARY)
  • EMPLOYEE ? R3 NJN R4
  • S3 ? S4 is not a key of R3 or R4

38
Updating fragments
EMPLOYEE(NAME, POSITION, SALARY)
(Smith, Programmer, 12000)
PJ
R1 (NAME, POSITION) (Smith, Programmer)
(Smith, Senior Programmer)
R2 (NAME, SALARY) (Smith, 12000)
updating
JN
EMPLOYEE (Smith, Senior Programmer, 12000)
F NAME ? POSITION, POSITION ? SALARY F1
NAME ? POSITION F2 NAME ? SALARY
39
Updating fragments
  • Update condition

EMPLOYEE(NAME, POSITION, SALARY)
(Smith, Programmer, 12000)
PJ
R1 (NAME, POSITION) (Smith, Programmer)
(Smith, Senior Programmer)
R2 (POSITION, SALARY) (Programmer, 12000)
(Senior Programmer, 20000)
updating
NJN
EMPLOYEE (Smith, Senior Programmer, 20000)
40
Mixed Fragmentation
  • EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)

vertical fragment - EMPNUM, NAME, SAL, TAX
EMP
- EMPNUM, NAME, MGRNUM, DEPTNUM
EMP4
DEPTNUM
horizontal fragment
EMP1
EMP2
EMP3
DEPTNUM gt 20
10 lt DEPTNUM lt 20
41
Mixed Fragmentation
  • EMP1 SL DNUM lt 10 PJ ENUM, NAME, MGRNUM, DNUM
    (EMP)
  • EMP2 SL 10 lt DNUM lt 20 PJ ENUM, NAME, MGRNUM,
    DNUM (EMP)
  • EMP3 SL DNUM gt 20 PJ ENUM, NAME, MGRNUM, DNUM
    (EMP)
  • EMP4 PJ ENUM, NAME, SAL, TAX (EMP)
  • Reconstruction
  • EMP UN (EMP1, EMP2, EMP3) NJNENUM PJENUM,
    SAL, TAX EMP4

42
Fragmentation Example
  • EXAMPLE_DDB
  • Global Schema
  • EMP(EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNM)
  • DEPT(DEPTNUM, NAME, AREA, MGRNUM)
  • SUPPLIER(SNUM, NAME, CITY)
  • SUPPLY(SNUM, PNUM, DEPTNUM, QUAN)
  • DEPT relation, horizontal fragmented into 3
    fragments on the value of the DEPTNUM attribute

43
Fragmentation Example
  • Fragmentation Schema
  • EMP1 SL DNUM lt 10 PJ ENUM, NAME, MGRNUM, DNUM
    (EMP)
  • EMP2 SL 10 lt DNUM lt 20 PJ ENUM, NAME, MGRNUM,
    DNUM (EMP)
  • EMP3 SL DNUM gt 20 PJ ENUM, NAME, MGRNUM, DNUM
    (EMP)
  • EMP4 PJ ENUM, NAME, SAL, TAX (EMP)
  • DEPT1 SL DNUM lt 10 (DEPT)
  • DEPT2 SL 10 lt DNUM lt 20 (DEPT)
  • DEPT3 SL DNUM gt 20 (DEPT)
  • SUPPLIER1 SL CITYSF (SUPPLIER)
  • SUPPLIER2 SL CITYLA (SUPPLIER)
  • SUPPLY1 SUPPLY SJSNUM SUPPLIER1
  • SUPPLY2 SUPPLY SJSNUM SUPPLIER2
Write a Comment
User Comments (0)
About PowerShow.com