NF-SS: A Normal Form for Semistructured Schemata

DASWIS 2001

1

NF-SS: A Normal Form for Semistructured Schemata
  • Xiaoying Wu, Tok Wang Ling, Sin Yeung Lee, Mong Li Lee
  • National University of Singapore
  • Gillian Dobbie
  • University of Auckland, New Zealand

2
Outline
  • Motivations
  • Semistructured schema and its data tree
  • Integrity constraints for semistructured data
  • NF-SS: Normal Form for Semistructured Schemata
  • Designing semistructured schemas into NF-SS
  • Discussion of the design approach
  • Comparison with related proposals
  • Summary

3
1. Motivation: Example 1
  • <!ELEMENT department (course*)>
  • <!ATTLIST department
      name ID #REQUIRED>
  • <!ELEMENT course (student*)>
  • <!ATTLIST course
      cid ID #REQUIRED
      title CDATA #IMPLIED>
  • <!ELEMENT student (grade?)>
  • <!ATTLIST student
      sid ID #REQUIRED
      name CDATA #REQUIRED
      age CDATA #IMPLIED>
  • <!ELEMENT grade (#PCDATA)>

4
1. Motivation (cont.)
  • Redundancy: the name and age of a student are repeated under every course the student takes
  • Update anomalies
  • Insertion
  • Rewriting
  • Deletion
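
To make the redundancy concrete, here is a minimal Python sketch. The XML instance is a hypothetical document shaped like the Example 1 DTD (its values are illustrative only), and the helper name `redundant_students` is an assumption introduced for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical instance shaped like the Example 1 DTD (illustrative values).
DOC = """
<department name="CS">
  <course cid="cs5220" title="data mining">
    <student sid="s01" name="Jack" age="21"><grade>A</grade></student>
  </course>
  <course cid="cs4221" title="database design">
    <student sid="s01" name="Jack" age="21"/>
    <student sid="s02" name="Tom"/>
  </course>
</department>
"""

def redundant_students(xml_text):
    """Return {sid: number of copies} for students stored more than once."""
    root = ET.fromstring(xml_text)
    copies = defaultdict(int)
    for student in root.iter("student"):
        copies[student.get("sid")] += 1
    return {sid: n for sid, n in copies.items() if n > 1}

print(redundant_students(DOC))  # {'s01': 2}
```

Student s01 is stored twice, so changing the name or age means rewriting every copy, which is the rewriting anomaly listed above.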

5
1. Motivation: Example 2
  • <!ELEMENT teacher (ClassRoom*)>
  • <!ATTLIST teacher
      tid ID #REQUIRED
      name CDATA #REQUIRED>
  • <!ELEMENT ClassRoom (subject*)>
  • <!ATTLIST ClassRoom
      room ID #REQUIRED>
  • <!ELEMENT subject (time*)>
  • <!ATTLIST subject
      cid ID #REQUIRED>
  • <!ELEMENT time EMPTY>
  • <!ATTLIST time
      day CDATA #REQUIRED
      hour CDATA #REQUIRED>
  • Path anomaly
  • The schema doesn't reflect the integrity constraint tid, day, hour → cid, room

6
2. Semistructured Schema and Data Tree
  • A semistructured schema is defined to be D = (E, A, B, P, R, r)
  • E is a finite set of object types in D.
  • A is a finite set of attributes, disjoint from E.
  • B is a set of basic domain types such as string, integer, Boolean, etc.
  • P is a function from E to object type definitions, each with a symbol in {*, +, ?, 1} called the multiplicity
  • e.g. P(course) = student*
  • R is a function from E to the power set of A
  • e.g. R(student) = {sid, name, age}
  • r ∈ E is called the object type of the root.
  • e.g. r = department

[Figure: schema diagram with callouts for E (object types), r (root object type), A (attributes) and multiplicity]
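
The six-tuple can be written down directly as a data structure. The sketch below is one possible encoding, not the authors' implementation: the class name `Schema`, the method `is_well_formed`, and the use of `*` as the multiplicity in the Example 1 schema are all assumptions.

```python
from dataclasses import dataclass

MULTIPLICITIES = {"*", "+", "?", "1"}

@dataclass(frozen=True)
class Schema:
    E: frozenset  # object types
    A: frozenset  # attributes, disjoint from E
    B: frozenset  # basic domain types
    P: dict       # object type -> list of (child object type, multiplicity)
    R: dict       # object type -> set of attribute names
    r: str        # root object type

    def is_well_formed(self) -> bool:
        """Check the side conditions stated in the definition."""
        return (
            self.E.isdisjoint(self.A)
            and self.r in self.E
            and all(o in self.E and c in self.E and m in MULTIPLICITIES
                    for o, kids in self.P.items() for c, m in kids)
            and all(o in self.E and set(attrs) <= self.A
                    for o, attrs in self.R.items())
        )

# Example 1 encoded under the assumptions above.
example1 = Schema(
    E=frozenset({"department", "course", "student", "grade"}),
    A=frozenset({"name", "cid", "title", "sid", "age"}),
    B=frozenset({"string", "integer", "boolean"}),
    P={"department": [("course", "*")],
       "course": [("student", "*")],
       "student": [("grade", "?")]},
    R={"department": {"name"},
       "course": {"cid", "title"},
       "student": {"sid", "name", "age"}},
    r="department",
)
print(example1.is_well_formed())  # True
```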

7
2. Semistructured Schema and Data tree (Cont.)
A data tree T with respect to a semistructured schema D = (E, A, B, P, R, r) is defined to be a tree T = (V, lab, obj, att, val, root), representing a database instance.

[Figure: example data tree. A department (name CS) has two course children (cid values cs5220 and cs4221, titles "data Mining" and "database design"). The student nodes below the courses carry sid, name, age and grade; the values s01, Jack, 21 appear twice, alongside s02, Tom and grade A.]

8
2. Semistructured Schema and Data tree (Cont.)
  • The path of a node n in semistructured schema D is denoted as PathD(n), e.g. PathD(student) is /department/course/student
  • The path of a node v in data tree T is denoted as PathT(v), e.g. PathT for student s02 is /department/course/student
  • The target set of node n in T, T[n], is {v | v ∈ V, n ∈ E ∪ A, PathT(v) = PathD(n)}, e.g. the target set T[student] includes the student node with sid s02, among others.
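
A small sketch of paths and target sets, assuming a simple in-memory tree. The `Node` class and the functions `path_t` and `target_set` are hypothetical names, and the instance below only mimics the shape of the slides' example tree.

```python
class Node:
    """A data-tree node; attribute nodes carry a value, object nodes carry children."""
    def __init__(self, tag, value=None):
        self.tag, self.value = tag, value
        self.parent, self.children = None, []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def path_t(v):
    """PathT(v): tags from the root down to v, joined by '/'."""
    tags = []
    while v is not None:
        tags.append(v.tag)
        v = v.parent
    return "/" + "/".join(reversed(tags))

def target_set(root, schema_path):
    """All nodes v of the tree whose PathT(v) equals the given schema path."""
    found, stack = [], [root]
    while stack:
        v = stack.pop()
        if path_t(v) == schema_path:
            found.append(v)
        stack.extend(v.children)
    return found

# Hypothetical instance: two courses, the first with s01, the second with s01 and s02.
dept = Node("department")
for sids in (["s01"], ["s01", "s02"]):
    course = dept.add(Node("course"))
    for sid in sids:
        course.add(Node("student")).add(Node("sid", sid))

students = target_set(dept, "/department/course/student")
print(sorted(s.children[0].value for s in students))  # ['s01', 's01', 's02']
```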

9
2. Semistructured Schema and Data tree (Cont.)
  • Two nodes from two data trees w.r.t. schema D satisfy value equality iff
  • they are attribute nodes with the same tag and the same value,
  • or they are object nodes with the same tag and their children are pairwise value equal
  • Let T1 and T2 be two data trees w.r.t. schema D = (E, A, B, P, R, r), and let X ∈ E ∪ A. T1 and T2 agree on X, denoted T1 =X T2, iff the following condition holds: ∀t1 ∈ T1[X], ∃t2 ∈ T2[X] such that t1 =v t2
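
Value equality is recursive, so a short sketch may help. It makes two assumptions that are not in the slides: nodes are encoded as `(tag, value)` pairs for attributes and `(tag, [children])` pairs for objects, and "pairwise value equal" is read as "the children can be matched up regardless of order". The agreement test reads the condition as: every node in T1's target set has a value-equal node in T2's target set.

```python
def canon(node):
    """Canonical form of a node: children are sorted so that order is ignored."""
    tag, content = node
    if isinstance(content, list):  # object node
        return (tag, tuple(sorted((canon(c) for c in content), key=repr)))
    return (tag, content)          # attribute node

def value_equal(a, b):
    """Same tag and same value, or same tag and matching children."""
    return canon(a) == canon(b)

def agree(targets1, targets2):
    """Every node in the first target set has a value-equal partner in the second."""
    return all(any(value_equal(t1, t2) for t2 in targets2) for t1 in targets1)

jack = ("student", [("sid", "s01"), ("name", "Jack"), ("age", "21")])
jack_reordered = ("student", [("age", "21"), ("sid", "s01"), ("name", "Jack")])
tom = ("student", [("sid", "s02"), ("name", "Tom")])

print(value_equal(jack, jack_reordered))       # True
print(agree([jack], [jack_reordered, tom]))    # True
print(agree([jack, tom], [jack_reordered]))    # False
```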

[Figure: the example data tree again. A department (name CS) with courses cs5220 and cs4221, whose student nodes carry sid, name, age and grade.]

10
3. Integrity Constraints for Semistructured Data
  • Extended Functional Dependency (EFD)
  • Let D = (E, A, B, P, R, r) be a semistructured schema, and let X ⊆ E ∪ A and Y ⊆ E ∪ A. Y is extended functionally dependent on X, denoted X → Y. Let S denote a set of data trees that are images of D. S satisfies X → Y iff for any data trees T1, T2 in S, if they agree on every component in X, then they also agree on Y; that is, ∀T1, T2 ∈ S ((∀x ∈ X, T1 =x T2) ⇒ T1 =Y T2).
  • Inference rules for EFDs
  • E1 (reflexivity): if Y ⊆ X, then X → Y, for any X, Y ⊆ E ∪ A
  • E2 (augmentation): if X → Y, then XZ → YZ, for any X, Y, Z ⊆ E ∪ A
  • E3 (transitivity): if X → Y and Y → Z, then X → Z, for any X, Y, Z ⊆ E ∪ A
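
The three inference rules can be applied mechanically. In this sketch an EFD is encoded as a pair of frozensets `(lhs, rhs)`; that encoding, the function names, and the `object@attribute` strings are assumptions made for illustration.

```python
def reflexivity(x, y):
    """E1: if Y is a subset of X, then X -> Y."""
    if not y <= x:
        raise ValueError("E1 requires Y to be a subset of X")
    return (x, y)

def augmentation(efd, z):
    """E2: from X -> Y derive XZ -> YZ."""
    x, y = efd
    return (x | z, y | z)

def transitivity(first, second):
    """E3: from X -> Y and Y -> Z derive X -> Z."""
    (x, y), (y2, z) = first, second
    if y != y2:
        raise ValueError("E3 requires the middle sets to match")
    return (x, z)

sid = frozenset({"student@sid"})
name = frozenset({"student@name"})
cid_sid = frozenset({"course@cid", "student@sid"})

# Derive {course@cid, student@sid} -> {student@name} from student@sid -> student@name.
step1 = reflexivity(cid_sid, sid)            # {cid, sid} -> {sid}
step2 = transitivity(step1, (sid, name))     # {cid, sid} -> {name}
print(step2 == (cid_sid, name))              # True
```

The derived dependency has a left-hand side that is larger than necessary, which is the kind of EFD the next slide calls partial.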

11
3. Integrity Constraints for Semistructured Data (Cont.)
  • Notation: O1@X1, …, Oi@Xi, …, On-1@Xn-1 → On@Xn
  • An EFD X → Y is a partial EFD if there exists an X′ ⊂ X such that X′ → Y. Otherwise, it is a full EFD.
  • e.g. (1) course@cid, student@sid → student@name is a partial EFD
  • (2) student@sid → student@name is a full EFD
  • X → Y is said to be coherent iff /X/Y is a path in D; otherwise it is called an incoherent EFD.
  • e.g. teacher@tid, time@day, @hour → subject@cid is an incoherent EFD, since /teacher/time/subject is not a path in the schema.
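
Both notions can be checked with a few lines of code. This is a sketch under stated assumptions: `is_partial` only looks for a listed EFD whose left-hand side is a proper subset of X (a syntactic check, weaker than full inference), and `is_coherent` reads "is a path in D" literally as a chain of parent-to-child steps. The function names and the `parent` map encoding are hypothetical.

```python
def is_partial(lhs, rhs, efds):
    """True if some listed EFD with a proper subset of lhs already determines rhs."""
    lhs, rhs = set(lhs), set(rhs)
    return any(set(l) < lhs and rhs <= set(r) for l, r in efds)

def is_coherent(object_types, parent):
    """True if the object types, in written order, form a parent-to-child chain."""
    return all(parent.get(child) == par
               for par, child in zip(object_types, object_types[1:]))

efds = [({"student@sid"}, {"student@name"})]
print(is_partial({"course@cid", "student@sid"}, {"student@name"}, efds))  # True
print(is_partial({"student@sid"}, {"student@name"}, efds))                # False

# Parent maps for the two motivating schemas (child -> parent).
example1 = {"course": "department", "student": "course", "grade": "student"}
example2 = {"ClassRoom": "teacher", "subject": "ClassRoom", "time": "subject"}
print(is_coherent(["course", "student", "grade"], example1))   # True
print(is_coherent(["teacher", "time", "subject"], example2))   # False
```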

12
3. Integrity Constraints for Semistructured Data (Cont.)
  • If there exists Y ⊆ E ∪ A such that X → Y, Y → Z and Y ↛ X, then Z is transitively extended functionally dependent on X via Y.
  • e.g. age is transitively dependent on course via student, since
  • (1) course@cid → student@sid
  • (2) student@sid → student@age, and
  • (3) student@sid ↛ course@cid
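
The three conditions can be tested against a set of known EFDs. The sketch below borrows the standard attribute-closure procedure from relational theory, which is sound for any dependency system obeying reflexivity, augmentation and transitivity; using it here is an assumption, as are the function names. "Does not determine" is read as "cannot be derived from the listed EFDs".

```python
def closure(attrs, efds):
    """Everything determined by attrs under the listed EFDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in efds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_transitive(x, y, z, efds):
    """Z depends on X via Y: X -> Y, Y -> Z, and Y does not determine X."""
    x, y, z = set(x), set(y), set(z)
    return (y <= closure(x, efds)
            and z <= closure(y, efds)
            and not x <= closure(y, efds))

efds = [({"course@cid"}, {"student@sid"}),
        ({"student@sid"}, {"student@age"})]
print(is_transitive({"course@cid"}, {"student@sid"}, {"student@age"}, efds))  # True

# If student@sid also determined course@cid, the dependency would not be transitive.
efds_two_way = efds + [({"student@sid"}, {"course@cid"})]
print(is_transitive({"course@cid"}, {"student@sid"}, {"student@age"}, efds_two_way))  # False
```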

13
3. Integrity Constraints for Semistructured Data (Cont.)
  • Theorem: Let D = (E, A, B, P, R, r) be a semistructured schema and X, Y, Z ⊆ E ∪ A. If Z is transitively dependent on X via Y, then there exists a data tree of D where a rewriting anomaly occurs upon updating the values of Z.

14
3. Integrity Constraints for Semistructured Data (Cont.)
  • Key constraints based on EFD semantics
  • Notation: KO = O1@X1/…/Oi@Xi/…/On@Xn/O@X for the key of an object type O in semistructured schema D.
  • /O1/…/O is a path in D
  • If n equals one, then KO is called an absolute key; otherwise it is called a relative key.
  • Example
  • Kbook = book@isbn. Kbook is an absolute key
  • Kchapter = book@isbn/chapter@number. Kchapter is a relative key
  • Ksection = book@isbn/chapter@number/section@number. Ksection is a relative key
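
The key notation is easy to parse. The sketch below follows the book examples, where a key with a single step is absolute and anything longer is relative. The function names and the use of a comma to separate several attributes in one step are assumptions.

```python
def parse_key(key):
    """Split 'book@isbn/chapter@number' into [(object type, [attributes]), ...]."""
    steps = []
    for part in key.split("/"):
        obj, _, attrs = part.partition("@")
        steps.append((obj, attrs.split(",") if attrs else []))
    return steps

def key_kind(key):
    """'absolute' for a single-step key, 'relative' otherwise."""
    return "absolute" if len(parse_key(key)) == 1 else "relative"

for key in ("book@isbn",
            "book@isbn/chapter@number",
            "book@isbn/chapter@number/section@number"):
    print(key, "->", key_kind(key))
```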

15
3. Integrity Constraints for Semistructured Data (Cont.)
  • Let D be a semistructured schema and O be its root object type. The set of basic dependencies of D, denoted BD(D), is defined as follows:
  • Let X, Y be children of O. Non-trivial extended functional dependencies of the form X → Y, where X is a key of O or Y is part of a key of O, are in BD(D).
  • Let O1 be a sub-object type of O and D1 be the schema tree rooted at O1 with KO added as attribute(s) of O1; then BD(D1) ⊆ BD(D).
  • No other non-trivial dependency that is not generated by the above is in BD(D).

16
4. NF-SS
  • Let D be a semistructured schema and O be its root object type. D is in Normal Form for Semistructured Schemata (NF-SS) iff
  • 1. O has at least one key.
  • 2. For any non-trivial EFD of the form X → Y satisfied by O, where X and Y are attributes of O, either X is a key or Y is part of a key of O.
  • 3. For any sub-object type O1 of O:
  • (a) if KO is added to O1 as its components, with the others remaining, the schema tree rooted at O1 is in NF-SS;
  • (b) KO ∩ KO1 = ∅ or KO ⊆ KO1, where KO and KO1 are the keys of O and O1 respectively;
  • (c) O1 is not transitively dependent on KO.
  • 4. Any non-trivial EFD in D can be derived from BD(D) by using the inference rules for EFDs.

17
5. Designing Semistructured Schema into NF-SS
  • We adopt a restructuring approach to the design.
  • We propose four heuristic restructuring rules
  • Decomposing object types.
  • Creating new object types.
  • Regrouping components of an object type.
  • Objective
  • Remove transitive or partial EFDs and incoherent EFDs from the given dependency and key constraints.

18
5. Designing Semistructured Schema into NF-SS (cont.)
  • Rule 1. (Remove Transitive Dependency by Decomposition)
  • Given an object type O in a semistructured schema D, if there is some non-prime component(s) Y of O that is transitively dependent on some key of O, i.e., KO → X, X → Y and X ↛ KO, and X ∩ KO = ∅, then restructure the schema as follows:
  • 1. Duplicate X to form new node(s) Z.
  • 2. Move Y and all the descendants of Y and their corresponding edges under Z.
  • 3. Make X a foreign key of O, and add a reference edge from the original node X to Z.

19
5. Designing Semistructured Schema into NF-SS (cont.)
  • Example 5.1: schema D satisfies the following EFDs:
  • (1) department@name → course@cid
  • (2) course@cid → department
  • (3) course@cid → course@title
  • (4) course@cid → student@sid
  • (5) course@cid, student@sid → grade
  • (6) student@sid → student@name, @age
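
One possible reading of Rule 1 applied to EFD (6) is sketched below. The dictionary encoding of the schema, the function `apply_rule1`, and the new type name `student_info` are all hypothetical; the slides' figure for this example is not available in the transcript, so where the new node Z attaches in the schema is deliberately left open.

```python
import copy

def apply_rule1(schema, owner, x, ys, new_type):
    """Move attributes ys, which depend on attribute x, out of owner into new_type."""
    out = copy.deepcopy(schema)
    node = out[owner]
    # Step 1: duplicate X to form the new node Z (here: a new object type).
    # Step 2: move Y under Z.
    out[new_type] = {"attrs": [x] + list(ys), "children": [], "refs": {}}
    node["attrs"] = [a for a in node["attrs"] if a not in ys]
    # Step 3: X stays in the owner as a foreign key referencing Z.
    node["refs"] = {x: new_type}
    return out

schema = {
    "department": {"attrs": ["name"], "children": ["course"], "refs": {}},
    "course": {"attrs": ["cid", "title"], "children": ["student"], "refs": {}},
    "student": {"attrs": ["sid", "name", "age"], "children": ["grade"], "refs": {}},
}

result = apply_rule1(schema, "student", "sid", ["name", "age"], "student_info")
print(result["student"])       # keeps sid (now a reference) and the grade child
print(result["student_info"])  # holds sid, name, age exactly once per student
```

After the split, name and age are stored once per student instead of once per course enrolment.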

20
5. Designing Semistructured Schema into NF-SS (cont.)
  • Rule 2. (Remove Path Anomaly by Path Splitting)
  • Given a semistructured schema D, suppose there exists an incoherent EFD O1@X1, …, On@Xn → Y, where Y is either an object type or an attribute, and there exists a path P that contains O1, …, On, Y. Path P can be split into two sub-paths P1 and P2, where P1 contains only O1, …, On and Y, while P2 contains O1, …, On and (P - Y).

21
5. Designing Semistructured Schema into NF-SS (cont.)
  • Example 5.2: schema D satisfies the following EFDs:
  • (1) teacher@tid, time → ClassRoom
  • (2) teacher@tid, time → subject

22
5. Designing Semistructured Schema into NF-SS (cont.)
  • Rule 3. (Remove Partial Dependency by Creating a New Object Type)
  • Given an object type O in a semistructured schema, let X be a set of prime attributes of O, and Y be the set of O's attributes. Let O1 be a sub-object type of O. If (KO - X) → O1 and no proper superset of X satisfies this property, then restructure the schema as follows:
  • 1. (KO ∩ Y - X) becomes the only attribute(s) of O, while O1 remains its sub-object type.
  • 2. Create a new object type O2 that is a direct component of O.
  • 3. Move the rest of the components of O and all their descendants and corresponding edges under O2.

23
5. Designing Semistructured Schema into NF-SS (cont.)
  • Example 5.3: schema D, shown in Figure (a), satisfies the following EFDs: O@A, @B → D; O@A, @B → O2; O@A → O1; O@A → E; and the key of O is {A, B}.

24
5. Designing Semistructured Schema into NF-SS (cont.)
  • Rule 4. (Restructuring to Satisfy Condition 3(b) of the NF-SS Definition)
  • Given an object type O in a semistructured schema D, let X be a set of O's attributes and single-valued atomic sub-object types, and let O1 be a complex sub-object type of O. O1 has relative key KO1, but KO ⊄ KO1 and KO1 ⊄ KO. Let Y be KO ∩ KO1 ∩ X, with Y ≠ ∅. D is restructured as follows:
  • 1. O1 remains a sub-object type of O.
  • 2. Make Y components of O.
  • 3. Create a new object type O2 as a child of O; the remaining components of O (excluding Y) become children of O2.

25
5. Designing Semistructured Schema into NF-SS (cont.)
  • Example 5.4: schema D in Figure (a) satisfies the EFDs (1) O@K, @A → O1 and (2) O@K, @B → O2, and the key of O is {K, A, B}.

26
5. Designing Semistructured Schema into NF-SS (cont.)
  • Algorithm 1: Restructuring Algorithm
  • Input: a set S of semistructured schemas, and a set of EFDs for S.
  • Output: a set of semistructured schemas that are in NF-SS.
  • Begin
  • 1. for each semistructured schema D in S do
  • if D is not in NF-SS then repeat until no further change:
  • (1) if there exists a transitive EFD KO → X, X → Y and X ↛ KO for an object type O in D,
  • Case X ∩ KO = ∅: apply Rule 1 to remove the transitive EFD.
  • Case X ⊂ KO: apply Rule 3 to remove the transitive EFD.
  • Case X ∩ KO ≠ ∅: apply Rule 4 to remove the transitive EFD.
  • (2) if there exists an incoherent EFD, then apply Rule 2 to remove it.
  • 2. output S.
  • End
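
The control flow of Algorithm 1 can be sketched as a fixpoint loop. This is only a skeleton under stated assumptions: the three cases are read as X ∩ KO = ∅, X ⊂ KO, and X ∩ KO ≠ ∅ otherwise; the callbacks `find_transitive`, `find_incoherent` and `apply_rule` are hypothetical placeholders for the detection steps and the four rules, which the slides describe only in prose; and `max_rounds` is an added safety bound.

```python
def choose_rule(x, key):
    """Pick the rule for a transitive EFD KO -> X, X -> Y from how X overlaps the key."""
    x, key = set(x), set(key)
    if not x & key:
        return 1  # X and KO are disjoint: decomposition
    if x < key:
        return 3  # X is a proper subset of KO: partial dependency
    return 4      # X overlaps KO in some other way

def restructure(schema, find_transitive, find_incoherent, apply_rule, max_rounds=1000):
    """Apply rules until no violation is reported (repeat until no further change)."""
    for _ in range(max_rounds):
        violation = find_transitive(schema)   # expected to return (X, KO) or None
        if violation is not None:
            x, key = violation
            schema = apply_rule(choose_rule(x, key), schema, violation)
            continue
        violation = find_incoherent(schema)   # expected to return an EFD or None
        if violation is not None:
            schema = apply_rule(2, schema, violation)
            continue
        return schema
    raise RuntimeError("no fixpoint reached within max_rounds")

print(choose_rule({"sid"}, {"cid"}))        # 1
print(choose_rule({"A"}, {"A", "B"}))       # 3
print(choose_rule({"A", "C"}, {"A", "B"}))  # 4
```

This version handles transitive EFDs before incoherent ones in each round. Since the outcome depends on the order in which dependencies are examined, a different order may produce a different schema.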

27
6. Discussion of the Restructuring Approach to Design
  • Are the restructuring rules complete? No.
  • covering is not guaranteed
  • dependency preservation is not guaranteed
  • Does it give a unique solution? No.
  • the result depends on the order in which the dependencies are examined
  • The design task can be made easier if more semantics are available.
  • In [5], we have proposed another approach for designing semistructured databases using ORA-SS, a semantically rich model.
  • Nevertheless, it does give practical heuristics and provides insights into the normalization task for semistructured databases.

28
7. Comparison with Related Proposals
  • The first attempt to define a normal form for semistructured data (ER'99: S. Y. Lee, M. L. Lee, T. W. Ling, and L. A. Kalinichenko) [3]
  • Defines a schema called S3-Graph, which makes no distinction between element nodes and attribute nodes and has no cardinality specification.
  • Proposes S3-NF, but is missing key constraints, an essential part of database design.
  • The decomposition method may not be able to remove some other kinds of anomalies, like partial dependency and path anomaly, that may exist in a schema.
  • The most recent proposal: XNF (XML Normal Form) (ER 2001: D. W. Embley and W. Y. Mok) [2]
  • It mainly provides algorithms to translate a schema, represented in a conceptual model called CM hypergraphs, to a scheme-tree forest in XNF.
  • Like S3-Graph, a scheme tree doesn't lend itself to XML definition.
  • XNF isn't formulated with the concept of key.
  • The algorithms given suffer from inefficiency.
  • A large set of results is expected.

29
8. Summary
  • A normal form for semistructured schemata
  • It incorporates integrity constraints.
  • It guarantees no redundancy and hence no undesirable update anomalies for the conforming semistructured databases.
  • It gives more reasonable representations of real-world semantics.
  • Restructuring approach for designing semistructured databases
  • A set of heuristic restructuring rules is proposed.
  • An algorithm for iteratively restructuring a schema into NF-SS is developed.
  • It provides insights into the normalization task for semistructured databases.

30
References
  • 1. J. Clark and S. DeRose. XML Path Language (XPath). W3C Working Draft, November 1999. http://www.w3.org/TR/xpath
  • 2. D. W. Embley and W. Y. Mok. Developing XML Documents with Guaranteed Good Properties. Proceedings of the 20th International Conference on Conceptual Modeling (ER), 2001.
  • 3. S. Y. Lee, M. L. Lee, T. W. Ling and L. A. Kalinichenko. Designing Good Semi-structured Databases. Proceedings of the 18th International Conference on Conceptual Modeling (ER), 1999.
  • 4. T. W. Ling and L. L. Yan. NF-NR: A Practical Normal Form for Nested Relations. Journal of Systems Integration, Vol. 4, 1994, pp. 309-340.
  • 5. Xiaoying Wu, Tok Wang Ling, Mong Li Lee, Gillian Dobbie. Designing Semistructured Databases Using the ORA-SS Model. Accepted for publication in Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE), IEEE Computer Society, Kyoto, Japan, December 2001.

31
Q&A