XML Publishing: Bridging Theory and Practice

1
XML Publishing: Bridging Theory and Practice
  • Wenfei Fan
  • University of Edinburgh
  • and
  • Bell Laboratories

2
XML documents
  • Rooted, node-labeled, ordered, unranked tree
  • element, e.g., course, prereq: tagged, roots a subtree
  • subelement, e.g., the prereq child of course
  • text node, e.g., CS650: carries text, is not
    tagged, and is a leaf

3
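XML publishing: data exchange on the Web
[Figure: a relational source (RDB) is mapped to an XML view by a view mapping]
  • Most legacy data is stored in relational
    databases
  • XML has become the prime standard for data
    exchange

[Figure: XML views published from DB1 and DB2 are exchanged over the Web; a query Q is posed against an XML view]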
4
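XML publishing: an XML interface of databases
[Figure: a query against the XML view (typed by a DTD) passes through middleware that handles publishing and query translation on top of the DBMS and its RDB; the answer comes back as XML]
Querying and updating traditional databases via
XML views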
5
Example: XML publishing
[Figure: the registrar database R is published as an XML view]
  • Relational schema R0
  • course (cno, title, type)
  • prereq (cno1, cno2) -- prerequisite
    hierarchy
  • XML DTD D0
  • db → course*
  • course → cno, title, type, prereq
  • prereq → cno*, prereq
  • type → regular | project

6
XML publishing languages in practice
  • XML view definition languages: XML views
    published from RDB
  • Commercial products
  • Microsoft SQL Server 2005 (FOR-XML, XSD)
  • IBM DB2 XML Extender (SQL/XML, DAD SQL, RDB)
  • Oracle 10g XML DB (SQL/XML, DBMS_XMLGEN)
  • Research prototypes
  • XPERANTO
  • TreeQL (SilkRoute)
  • ATG (PRATA)

7
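XML publishing in practice
Top-down from the root, via embedded relational
queries
[Figure: the XML view is built from the RDB by a view mapping. Relational queries Q, Q1 and Q2 are attached to the db, course and prereq levels of the tree: db has course children; a course has cno, type, title and prereq children (e.g., CS650, Web DB, regular); a prereq has cno children and a prereq child]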
8
XML publishing: question of the users
  • What language should a user choose to express the
    view?
  • unbounded depth, nondeterministic shape: neither can
    be decided statically at compile time
  • prereq → cno*, prereq
  • type → regular | project

[Figure: db with course children; a course has cno (CS650), title (Web DB) and type children, where type is either regular or project; the prerequisite hierarchy below a course is unbounded]
Few publishing languages can define this view
9
XML publishing: question of database vendors
  • XML view: under each course, list all its
    prerequisites, direct or not
  • collapsing prerequisite hierarchy
  • a tree of depth three
  • Question: is it necessary to upgrade DBMS and
    support SQL99?

[Figure: db with course children generated by query Q; each course has cno, title and type children (e.g., CS650, Web DB, project) and, via query Q1, one cno child per prerequisite]
The expressive power and complexity of XML
publishing languages
10
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues
  • Joint work with
  • Theory: Floris Geerts, Frank Neven (PODS'07)
  • System: Michael Benedikt, Phil Bohannon, Cheeyong
    Chan, Rajeev Rastogi (SIGMOD'03, '04;
    VLDB'02, '04, '05; ICDE'07)

11
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

12
XML publishing transducers
  • τ = (Q, Σ, q0, δ) for a relational schema R
  • Q: a finite set of states
  • Σ: a finite alphabet of XML tags, with a root r
    and text
  • q0: the start state
  • δ: for each pair (q, a) in Q × Σ, a rule
  • (q, a) → (q1, a1, φ1(x1; y1)), . . ., (qk,
    ak, φk(xk; yk)),
  • to generate the children of an a-node, tagged a1, . . .,
    ak
  • register Rega: set-valued, of fixed arity, attached to each
    a-node
  • φi: a query from R ∪ Rega to Regai, in a relational query
    language L
  • xi: a list of free variables in φi, the grouping
    attributes
  • deterministic
  • (q, text) → . -- Empty RHS: text nodes have no
    children

13
Top-down transduction
  • Start rule δ(q0, r) -- q0 and r do not appear on
    the RHS of any rule
  • (q0, db) → (q, course, φ1(c, t; ∅))
  • φ1(c, t; ∅) = ∃t′ course(c, t′, t)
  • recall course (cno, title, type)
  • tuple register Regc: group the result by all
    attributes, x = (c, t), y = ∅
  • for each distinct tuple tp in the result of
    φ1(x; ∅)
  • create a course element
  • carry the tuple tp in Regc
  • expand at leaf nodes

[Figure: the root (q0, db) with (q, course) children, each carrying its own tuple register Regc; in general a node is labeled (q, a) and carries Rega]
14
Registers: tuple vs. relation
  • (q, course) → (q, cno, φ2(c; ∅)), (q, type, φ3(t;
    ∅)), (q, prereq, φ4(∅; c))
  • φ2(c; ∅) = ∃t Regc(c, t)
  • φ4(∅; c) = ∃t, c′ (Regc(c′, t) ∧ prereq(c′,
    c))
  • recall prereq(cno1, cno2)
  • tuple registers: Regcno, Regt
  • relation register Regp: x = ∅, so the result of
    φ4(∅; c) is kept as one set
  • top-down information passing: the parent register
    Regc is used in φ4(∅; c)

[Figure: under (q0, db), each (q, course) node carries Regc and has children (q, cno) with Regcno, (q, type) with Regt, and (q, prereq) with Regp; for the prereq child, x = ∅ and y = (c)]
15
Recursive transducer and stop condition
  • (q, prereq) → (q, cno, φ5(c; ∅)), (q, prereq,
    φ5(∅; c))
  • φ5(∅; c) = ∃t, c′ (Regp(c′, t) ∧ prereq(c′,
    c))
  • Stop conditions
  • φ5(∅; c) returns an empty set
  • the RHS of δ(q, a) is empty (e.g., for text
    nodes)
  • there is an ancestor node with the same label,
    tag and register
  • No new information can be added to the tree

[Figure: under (q0, db), a (q, course) node has a (q, prereq) child with relation register Regp; each (q, prereq) node has (q, cno) children with tuple registers Regcno and a further (q, prereq) child; expansion stops (STOP) at a node (q, a) whose register Rega repeats that of an ancestor (q, a)]
16
Transformation of a publishing transducer τ
  • terminates on a DB of R if all leaf nodes satisfy
    a stop condition
  • τ(DB): the XML tree obtained by striking out states and
    registers
  • τ(R): the set of XML trees generated by τ over all
    DB of R
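
The top-down transduction and its stop conditions can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `expand`, `next_level`, `same` and `RULES` are hypothetical, queries are plain Python functions rather than CQ/FO/FP formulas, the course title "Databases" is made-up sample data, and the rule for (q, prereq) lists the courses already held in Regp as its cno children, which is one possible reading of the slides.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Register = FrozenSet[tuple]
Database = Dict[str, Set[tuple]]
# One right-hand-side item: (child state, child tag, query, is_tuple_register).
Item = Tuple[str, str, Callable[[Database, Register], Set[tuple]], bool]

def expand(rules, db, state, tag, reg, ancestors=()):
    """Expand one node top-down; states and registers are struck out of the result."""
    if tag == "text":                      # a text node prints its register and has no children
        return next(iter(reg))[0]
    key = (state, tag, reg)
    if key in ancestors:                   # stop: an ancestor has the same state, tag and register
        return (tag, [])
    children = []
    for child_state, child_tag, query, is_tuple in rules.get((state, tag), []):
        result = query(db, reg)
        if not result:                     # stop: the query returns an empty set
            continue
        # Tuple register: one child per distinct tuple. Relation register: one child with the set.
        regs = [frozenset([t]) for t in sorted(result)] if is_tuple else [frozenset(result)]
        for child_reg in regs:
            children.append(
                expand(rules, db, child_state, child_tag, child_reg, ancestors + (key,)))
    return (tag, children)

def next_level(db, reg):
    """Prerequisites of every course held in a unary register."""
    return {(c2,) for (c,) in reg for (c1, c2) in db["prereq"] if c1 == c}

def same(db, reg):
    """Pass the parent register on unchanged."""
    return set(reg)

RULES = {
    ("q0", "db"): [
        ("q", "course", lambda db, reg: {(c, ty) for (c, _, ty) in db["course"]}, True)],
    ("q", "course"): [
        ("q", "cno", lambda db, reg: {(c,) for (c, _) in reg}, True),
        ("q", "type", lambda db, reg: {(ty,) for (_, ty) in reg}, True),
        ("q", "prereq", lambda db, reg: next_level(db, {(c,) for (c, _) in reg}), False),
    ],
    ("q", "cno"): [("q", "text", same, True)],
    ("q", "type"): [("q", "text", same, True)],
    ("q", "prereq"): [("q", "cno", same, True), ("q", "prereq", next_level, False)],
}

DB = {
    "course": {("CS650", "Web DB", "regular"), ("CS450", "Databases", "regular")},
    "prereq": {("CS650", "CS450")},
}
tree = expand(RULES, DB, "q0", "db", frozenset())
```

The third stop condition is what makes the sketch terminate on a cyclic prereq relation: a prereq node whose register repeats that of an ancestor is emitted but not expanded further.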

17
publishing transducers with virtual nodes
  • τ = (Q, Σ, Σa, q0, δ)
  • Σa: a subset of Σ, the virtual tags
  • Recall the view: under each course, list all its
    prerequisites
  • τ = (Q, Σ, Σa = {prereq}, q0, δ)

Virtual nodes are removed from the output
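
Removing virtual nodes amounts to splicing the children of each virtual node into its parent's child list. The following is a small sketch of that step, assuming trees are nested `(tag, children)` pairs with strings as text nodes; the function name `strip_virtual` and the course numbers are hypothetical.

```python
def strip_virtual(node, virtual):
    """Return the list of nodes that replaces `node` once virtual tags are removed."""
    if isinstance(node, str):              # text node: kept as is
        return [node]
    tag, children = node
    kept = [n for child in children for n in strip_virtual(child, virtual)]
    # A virtual node disappears and hands its (already cleaned) children to its parent.
    return kept if tag in virtual else [(tag, kept)]

nested = ("course", [
    ("cno", ["CS650"]),
    ("prereq", [("cno", ["CS450"]),
                ("prereq", [("cno", ["CS320"])])]),
])
(flat,) = strip_virtual(nested, {"prereq"})
```

With prereq declared virtual, the recursive prerequisite chain collapses into cno children directly under course, which gives the depth-three view discussed earlier.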
18
Various classes of publishing transducers
  • PT(L, S, O)
  • L: the relational query language (CQ, FO, FP,
    with = and ≠)
  • S: register, relation vs. tuple (a special case
    of relation Reg)
  • O: output nodes, normal vs. virtual
  • PTnr(L, S, O): non-recursive subset of PT(L, S,
    O)
  • Example
  • View 1: PT(CQ, relation, normal)
  • View 2: PT(CQ, relation, virtual) and PTnr(FP,
    tuple, normal)
  • As opposed to query automata
  • take a relational database as input, rather than
    an existing tree
  • output a new tree, rather than accepting a tree
    or selecting nodes
  • In contrast to recent work on schema mapping
  • relations to XML, not relation-to-relation or
    XML-to-XML
  • via embedded relational queries, not
    source-to-target constraints

19
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

20
Existing XML publishing languages
  • Extensions of SQL by incorporating XML publishing
    functions
  • Microsoft SQL Server 2005 (FOR-XML)
  • IBM DB2 XML Extender (SQL/XML)
  • Oracle 10g XML DB (SQL/XML, DBMS_XMLGEN)
  • XPERANTO
  • Annotating schema or fixed tree template with
    relational queries
  • Microsoft SQL Server 2005 (XSD)
  • IBM DB2 XML Extender (DAD SQL, RDB)
  • TreeQL (SilkRoute)
  • ATG (PRATA)
  • . . .

21
Extensions of SQL for XML publishing
  • SQL/XML: XMLElement, XMLForest, XMLAgg,
    XMLConcat, ...
  • SELECT XMLELEMENT(NAME "course",
  •   XMLFOREST(c.cno AS "cno", c.title AS
      "title"))
  • FROM course c

[Figure: db with course children, each with cno and title children]
  • PTnr(FO, tuple, normal): no recursion, no virtual
    nodes
  • XPERANTO: PTnr(FO, tuple, normal)
  • Microsoft SQL Server 2005 (FOR-XML): PTnr(FO,
    tuple, normal)
  • Oracle 10g XML DB
  • DBMS_XMLGEN: PT(FP, tuple, normal) (connect-by of
    SQL99)
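
To make the shape of such a non-recursive, tuple-register view concrete, here is a rough Python analogue of the SQL/XML query, using only the standard `sqlite3` and `xml.etree.ElementTree` modules. It is a sketch for illustration: SQLite has no SQL/XML functions, so the element construction is done in Python; the function name `publish_courses`, the wrapping db root and the sample rows are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def publish_courses(conn):
    """Build <db><course><cno/><title/></course>...</db> from the course table."""
    db = ET.Element("db")
    rows = conn.execute("SELECT DISTINCT cno, title FROM course ORDER BY cno")
    for cno, title in rows:                # one course element per distinct tuple
        course = ET.SubElement(db, "course")
        ET.SubElement(course, "cno").text = cno
        ET.SubElement(course, "title").text = title
    return db

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (cno TEXT, title TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("CS650", "Web DB", "regular"), ("CS450", "Databases", "project")],
)
xml_text = ET.tostring(publish_courses(conn), encoding="unicode")
```

The tree has a fixed depth that is known from the query alone, which is the defining feature of the PTnr classes.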

22
Annotating schema or tree template
  • ATG of PRATA: DTD-directed view definition,
    inherited attributes
  • prereq → cno*, prereq
  • cno ← Q(prereq_p), prereq_c ←
    Q(prereq_p) /* semantic rules */
  • Q: SELECT p.cno2 FROM prereq p,
    prereq_p pp
  • WHERE p.cno1 = pp.cno
  • prereq_p: parent attribute (relation register)

[Figure: a prereq node with cno children and a prereq child]
  • PT(FO, relation, virtual): recursive views,
    virtual nodes, DTD-conformance
  • Microsoft SQL Server 2005 (XSD): PTnr(CQ, tuple,
    normal)
  • IBM DB2 XML Extender DAD-SQL: PTnr(FO, tuple,
    normal),
  • DAD-RDB: PTnr(CQ, tuple, normal)
  • TreeQL (SilkRoute): PTnr(CQ, tuple, virtual)

23
Putting these together
Product / prototype          Language       Class
Microsoft SQL Server 2005    FOR XML        PTnr(FO, tuple, normal)
                             annotated XSD  PTnr(CQ, tuple, normal)
IBM DB2 XML Extender         SQL/XML        PTnr(FO, tuple, normal)
                             DAD-SQL        PTnr(FO, tuple, normal)
                             DAD-RDB        PTnr(CQ, tuple, normal)
Oracle 10g XML DB            SQL/XML        PTnr(FO, tuple, normal)
                             DBMS_XMLGEN    PT(FP, tuple, normal)
XPERANTO                     -              PTnr(FO, tuple, normal)
SilkRoute                    TreeQL         PTnr(CQ, tuple, virtual)
PRATA                        ATG            PT(FO, relation, virtual)
24
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

25
Termination and evaluation cost
  • Given a publishing transducer τ defined for a
    relational schema R,
  • does the transformation of τ terminate on
    all DB of R?
  • how expensive is it to compute τ(DB)?
  • τ(DB) is always defined on any instance DB of R.
  • Worst-case data complexity
  • EXPTIME if τ is in PT(L, tuple, O)
  • 2EXPTIME if τ is in PT(L, relation, O)
  • PTIME if τ is in PTnr(L, S, O)
  • Tight bounds: DAG → tree, n-digit binary counter
  • L and O have no impact on the worst-case data
    complexity
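
The "DAG → tree" remark can be illustrated with a small Python experiment: a prerequisite DAG with a linear number of nodes unfolds into a tree of exponential size, because shared sub-DAGs are copied once per path. This is only an illustration of the blow-up, not the lower-bound proof; `unfolded_size` and `diamond_chain` are hypothetical helper names.

```python
from functools import lru_cache

def unfolded_size(dag, root):
    """Number of nodes in the tree obtained by unfolding `dag` from `root`."""
    @lru_cache(maxsize=None)
    def size(v):
        return 1 + sum(size(child) for child in dag.get(v, ()))
    return size(root)

def diamond_chain(n):
    """n stacked diamonds: v_i has children a_i and b_i, which both point to v_(i+1)."""
    dag = {}
    for i in range(n):
        dag[f"v{i}"] = [f"a{i}", f"b{i}"]
        dag[f"a{i}"] = [f"v{i + 1}"]
        dag[f"b{i}"] = [f"v{i + 1}"]
    return dag

dag = diamond_chain(10)
dag_nodes = set(dag) | {c for children in dag.values() for c in children}
tree_nodes = unfolded_size(dag, "v0")
```

Each diamond doubles the number of copies of everything below it, so the tree size satisfies s(i) = 3 + 2·s(i+1) with s(n) = 1, that is 4·2^n − 3 nodes for a DAG with 3n + 1 nodes.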

26
Static analyses
  • For a class PT(L, tuple, O) of publishing
    transducers,
  • The emptiness problem: given τ in PT(L, tuple,
    O), can τ generate a nontrivial XML tree?
  • Does the publishing transducer make sense?
  • The membership problem: given an XML tree T and a
    transducer τ in PT(L, tuple, O), can τ generate T
    from some DB?
  • Can τ generate XML views that the user wants?
  • The equivalence problem: given τ1, τ2 in PT(L,
    tuple, O) on the same relational schema R, do τ1
    and τ2 generate the same XML views over all
    instances of R?
  • Optimization: can τ1 be replaced by a more
    efficient τ2?

27
Matching complexity bounds for static analyses
  • PT(L, S, O) when L is either FO or FP: beyond
    reach
  • emptiness, membership and equivalence are
    undecidable
  • PT(CQ, S, O): slightly better
  • Emptiness
  • PTIME if O is normal
  • NP-complete if O is virtual
  • Membership
  • Σ2p-complete for PT(CQ, tuple, normal)
  • undecidable if S is relation or O is virtual
  • Reduction from (a) the satisfiability problem for
    FO queries, and (b) the emptiness problem for
    2-head DFA
  • Equivalence: undecidable
  • Reduction from the halting problem for 2RMs

28
Complexity bounds for non-recursive transducers
  • PTnr(FO, S, O): all three problems remain
    undecidable
  • PTnr(CQ, S, O): makes our lives easier
  • Emptiness: the same as PT(CQ, S, O)
  • Membership (S is tuple)
  • PTnr(CQ, tuple, normal): Σ2p-complete, no better
  • PTnr(CQ, tuple, virtual): undecidable →
    Σ2p-complete
  • Establish the small model property
  • Equivalence
  • PTnr(CQ, tuple, O): undecidable → Π3p-complete
  • Lower bound: reduction from a quantified 3SAT
    problem with three quantifier blocks
  • Upper bound: a constructive proof

29
Summary: complexity bounds
Fragment                   Equivalence   Emptiness    Membership
PT(FP, S, O)               undecidable   undecidable  undecidable
PT(FO, S, O)               undecidable   undecidable  undecidable
PT(CQ, tuple, normal)      undecidable   PTIME        Σ2p-complete
PT(CQ, relation, normal)   undecidable   PTIME        undecidable
PT(CQ, S, virtual)         undecidable   NP-complete  undecidable
PTnr(FO, S, O)             undecidable   undecidable  undecidable
PTnr(CQ, tuple, normal)    Π3p-complete  PTIME        Σ2p-complete
PTnr(CQ, tuple, virtual)   Π3p-complete  NP-complete  Σ2p-complete
30
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

31
Containment relation
PT(FP, relation, virtual) = PT(FO, relation,
virtual)
[Figure: containment diagram over the classes PT(FP, tup, virt), PT(CQ, rel, virt), PT(FP, rel, nm), PT(FP, tup, nm), PT(FO, rel, nm), PT(FO, tup, virt), PT(FO, tup, nm), PT(CQ, rel, nm), PTnr(FO, tup, nm), PT(CQ, tup, virt), PT(CQ, tup, nm), PTnr(CQ, tup, virt), PTnr(CQ, tup, nm)]
XML view: under each course, list all its
prerequisites, direct or not. No need to upgrade
DBMS and support SQL99
32
Compared to logical transduction
  • (φdom(x), φroot(x), φedge(x, y), φlt(x, y),
    φfc(x, y), φns(x, y), φa(x))
  • domain, root, edge, order, first-child,
    next-sibling, label
  • define DAGs, unfold into a tree
  • FO-transductions, SO-transductions (fixed
    k-arity), PTIME FO-transductions,
    PSPACE SO-transductions
  • Publishing transducers vs. logical transductions
  • L-transductions ⊆ PT(L, tuple, virtual)
  • strict for FO
  • PSPACE SO-transductions ⊆ PT(FP, relation,
    virtual) (ordered)
  • PTIME FO-transductions ⊆ PT(FO, relation,
    virtual) (ordered)
  • fixed-depth L-transductions vs. PTnr(L, tuple, O)
    (unordered tree)
  • PTnr(L, tuple, O) ⊆ fixed-depth L-transductions
    (L = FP, FO)
  • No need to code stop conditions

33
DTD and specialized DTD
  • DTD D = (Σ, r, ρ), with a production ρ: a → α for each a ∈ Σ
  • normalized: α ::= a1, …, ak | a1 + … + ak | a*
  • Specialized DTD D′ = (Σ′, D, g): D a DTD, g: Σ′
    → Σ
  • T conforms to D′: there is T′ s.t. T = g(T′) and
    T′ conforms to D
  • Captures MSO definable trees and regular trees
  • Capturing (specialized) DTD
  • specialized DTDs are definable in PT(FO, tuple,
    virtual)
  • normalized DTDs are definable in PT(FO, tuple,
    normal)
  • there are normalized DTDs not definable in PT(CQ,
    S, O)
  • Check each a → α in FO, return a default in the
    presence of violation
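
Conformance to a normalized DTD can be checked node by node, since each production is a plain sequence, a choice, or a star. The sketch below assumes trees are nested `(tag, children)` pairs with strings as text nodes and encodes productions as `("seq", [...])`, `("alt", [...])` or `("star", tag)`; this encoding, the function name `conforms` and the sample DTD are assumptions made for illustration.

```python
def conforms(node, dtd):
    """Check that every element's child sequence matches its normalized production."""
    if isinstance(node, str):              # text nodes have no production to check
        return True
    tag, children = node
    labels = ["text" if isinstance(c, str) else c[0] for c in children]
    kind, arg = dtd[tag]
    if kind == "seq":                      # a -> a1, ..., ak
        ok = labels == list(arg)
    elif kind == "alt":                    # a -> a1 + ... + ak
        ok = len(labels) == 1 and labels[0] in arg
    else:                                  # "star": a -> b*
        ok = all(label == arg for label in labels)
    return ok and all(conforms(child, dtd) for child in children)

DTD = {
    "db": ("star", "course"),
    "course": ("seq", ["cno", "type"]),
    "cno": ("seq", ["text"]),
    "type": ("alt", ["regular", "project"]),
    "regular": ("seq", []),
    "project": ("seq", []),
}
good = ("db", [("course", [("cno", ["CS650"]), ("type", [("regular", [])])])])
bad = ("db", [("course", [("type", [("regular", [])]), ("cno", ["CS650"])])])
```

Here `conforms(good, DTD)` holds, while `bad` violates the order required by course → cno, type.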

DTD-directed publishing All members of a
community (or industry) agree on a DTD and then
exchange data w.r.t. the predefined DTD
34
publishing transducer as a relational query
  • Input: τ = (Q, Σ, q0, δ) for R, an output tag
    o ∈ Σ, a DB of R
  • Output: the union of Rego(v) for all o-nodes v in the
    tree generated

[Figure: the tree generated from the RDB by relational queries Q1 and Q2 (db, course, cno, type, title, prereq); the registers Reg of the nodes carrying the output tag are collected as the output relation]
35
Containment hierarchy as relational queries
Flattened: PT(L, S, virtual) = PT(L, S, normal)
PT(FP, relation, O) = PT(FO, relation, O)
[Figure: containment diagram over PT(FO, rel, O), PT(FP, tup, O), PT(CQ, rel, O), PT(FO, tup, O), PT(CQ, tup, O), PTnr(FO, tuple, O) and PTnr(CQ, tuple, O); one containment is annotated "not strict if NLOGSPACE = PTIME"]
36
complexity classes and relational query languages
  • PT(FO, relation, O) captures PSPACE (ordered or
    unordered)
  • Recognition problem can be determined using
    PSPACE TM
  • Simulate partial fixpoint query and define a
    total order
  • PT(FP, tuple, O) captures FP and thus PTIME
    (ordered)
  • PT(FO, tuple, O)
  • captures TC0FO and thus NLOGSPACE (ordered)
  • ? TC0FO (unordered)
  • Simulate transitive closure logic and vice versa
  • PT(CQ, relation, O) contains deterministic
    datalog
  • PT(CQ, tuple, O) captures linear datalog
  • datalog: p(x) ← p1(x1), …, pk(xk)
  • deterministic: each p(x) has only one rule
  • linear: at most one pj is an IDB
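
As a concrete instance of linear datalog, the "all prerequisites, direct or not" view corresponds to the program closure(c, p) ← prereq(c, p) and closure(c, p) ← closure(c, m), prereq(m, p), where each rule body has at most one IDB atom. A minimal semi-naive evaluation in Python might look as follows; the function name `all_prereqs` and the course numbers are assumptions for illustration.

```python
def all_prereqs(prereq):
    """Semi-naive evaluation of the linear datalog program for transitive closure."""
    closure = set(prereq)                  # closure(c, p) <- prereq(c, p)
    delta = set(prereq)
    while delta:
        # closure(c, p) <- closure(c, m), prereq(m, p), joined only against new facts
        new = {(c, p) for (c, m) in delta for (m2, p) in prereq if m == m2} - closure
        closure |= new
        delta = new
    return closure

prereq = {("CS650", "CS450"), ("CS450", "CS320")}
closure = all_prereqs(prereq)
```

The loop stops once an iteration derives no new facts, so it also terminates on cyclic prerequisite data.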

37
non-recursive classes as relational query
languages
  • PTnr(FO, tuple, O) captures FO (ordered or
    unordered)
  • PTnr(CQ, tuple, O) captures UCQ (ordered or
    unordered)
  • Simulate union of conjunctive queries and vice
    versa
  • Those corresponding to existing XML publishing
    languages
  • PTnr(FO, tuple, O): SQL/XML, FOR-XML (Microsoft),
    IBM DAD (SQL),
  • PTnr(CQ, tuple, O): XSD (Microsoft), TreeQL

38
Expressiveness as relational queries
Fragment              Complexity / language
PT(FP, relation, O)   PSPACE
PT(FO, relation, O)   PSPACE
PT(FP, tuple, O)      FP, PTIME (ordered databases)
PT(FO, tuple, O)      TC0FO, NLOGSPACE (ordered databases)
PT(CQ, relation, O)   ⊇ deterministic datalog
PT(CQ, tuple, O)      TC0CQ, linear datalog
PTnr(FO, tuple, O)    FO
PTnr(CQ, tuple, O)    UCQ
PT(L, S, virtual) = PT(L, S, normal)
39
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

40
Incremental publishing
  • Input
  • a publishing transducer τ for relational schema R
  • an instance DB of R
  • XML view T = τ(DB)
  • relational updates ΔDB
  • Output: XML updates ΔT such that T ⊕ ΔT = τ(DB
    ⊕ ΔDB)
  • Commercial products: limited support

[Figure: middleware on top of the DBMS propagates incremental updates ΔDB on the RDB to updates ΔT on the published XML view]
41
Why incremental update?
[Figure: XML publishing from the source database DB to a cached view T]
  • Batch computation: recompute the entire XML tree
    from scratch
  • large XML views may take several hours to
    produce!
  • Incremental computation: compute the XML change ΔT
  • Idea: the new view T′ = the old view T ⊕ ΔT
  • Typically more efficient to compute ΔT (small)
    and update the old view T with ΔT
  • Why? The new view often differs only slightly from
    the old view T, so partial results computed
    earlier can be reused

42
Reduction Approach
  • Most XML middleware takes a reduction approach
  • treat relational database systems (DBMS) as a
    black box,
  • re-use as much functionality of DBMS as possible
  • Why not the reduction approach for incremental
    updates?
  • XML views are recursive
  • Few systems support WITH RECURSIVE (linear
    recursion)
  • Fewer support its use in views
  • None supports incremental update of recursive
    views (many algorithms are known for incremental
    updates of recursive views, but unfortunately not
    in practice)
  • The lowest common denominator of functionality of
    DBMS -- no need for (recursive) view-update
    support

43
Sub-Tree Property
[Figure: report with patient children; a patient has SSN, name, treatment and policy children (e.g., SSN 234, name Cheney)]
44
Storing and updating XML: a DAG representation
  • Storing each XML sub-tree only once, at any level
    of granularity
  • Associate an ID with each node in the tree
    (Skolem function)
  • Small, unique value derived from the node's
    register
  • A hash table H to map from (q, type, ID) to a
    node in the graph
  • Sub-tree pool: each node has a reference count
    and a children list (q1, type1, ID1), (q2,
    type2, ID2), ...
  • XML update ΔT = (E+, E-), two sets of edges
    ((q1, type1, ID1), (q2, type2, ID2))
  • E-: remove (q2, type2, ID2) from the child list
    of (q1, type1, ID1) and decrement the reference count
    on (q2, type2, ID2)
  • E+: insert (q2, type2, ID2) in the child list of
    (q1, type1, ID1) and increment the reference count on
    (q2, type2, ID2)
  • Nodes with 0 reference counts move to the sub-tree
    pool to be reused

[Figure: hash table H whose entries include (tname, chemo), (inTreatment, iT23), (treatment, t123), (inTreatment, iT234), (treatment, t345), (treatment, t567), ...]
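
The bookkeeping described on this slide can be sketched as a small Python class: a dictionary plays the role of the hash table H, every node key is a (q, type, ID) triple, and applying ΔT = (E+, E-) adjusts child lists and reference counts. The class and method names (`SubtreeStore`, `insert_edge`, `delete_edge`, `apply`) and the sample keys are hypothetical, and this is not the system's actual data structure.

```python
class SubtreeStore:
    """DAG store: each sub-tree is kept once and shared through reference counts."""

    def __init__(self):
        self.children = {}   # H: node key (q, type, ID) -> list of child keys
        self.refcount = {}   # node key -> number of incoming edges
        self.pool = set()    # nodes with reference count 0, kept for reuse

    def _ensure(self, key):
        self.children.setdefault(key, [])
        self.refcount.setdefault(key, 0)

    def insert_edge(self, parent, child):
        self._ensure(parent)
        self._ensure(child)
        self.children[parent].append(child)
        self.refcount[child] += 1
        self.pool.discard(child)           # a reused sub-tree leaves the pool

    def delete_edge(self, parent, child):
        self.children[parent].remove(child)
        self.refcount[child] -= 1
        if self.refcount[child] == 0:
            self.pool.add(child)           # unreferenced sub-trees are pooled, not destroyed

    def apply(self, e_plus, e_minus):
        """Apply an XML update (E+, E-): deletions first, then insertions."""
        for parent, child in e_minus:
            self.delete_edge(parent, child)
        for parent, child in e_plus:
            self.insert_edge(parent, child)

store = SubtreeStore()
p1, p2 = ("q", "patient", "p1"), ("q", "patient", "p2")
t = ("q", "treatment", "t123")
store.apply(e_plus=[(p1, t), (p2, t)], e_minus=[])
```

Because the treatment sub-tree is shared by both patients, removing one edge only decrements its count; it moves to the pool when the last edge goes away and can be picked up again by a later insertion.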
45
Computing XML changes
  • Computing XML changes ΔT from database changes
    ΔDB by incrementalizing SQL queries in a
    transducer
  • select IP, P.tname2
  • from ΔProcedure P, inTreatment IP
  • where P.tname1 = IP
  • Cuts (deletions): given ΔDB, deletions of the
    existing edges of T are determined by executing a
    fixed number of non-recursive ΔSQL queries;
    no recursion is involved (sub-tree property)
  • Buds (new sub-tree generation): top-down
    iteration, evaluating non-recursive ΔSQL queries
    at each step
  • Each new sub-tree is computed at most once, by
    reusing sub-trees (sub-tree pool), minimizing
    recomputation
  • Partial results are complete up to a certain
    level at each step, allowing lazy evaluation and
    parallel processing
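
Incrementalizing a join query means evaluating it against the delta relation instead of the full one. The sketch below shows this for insertions only, under assumed schemas: `in_treatment` is a set of treatment names held in the parent register and `procedure` is a set of (tname1, tname2) pairs; the function names and sample values are hypothetical.

```python
def join(in_treatment, procedure):
    """Q: pair each treatment in the register with its sub-procedures."""
    return {(t, t2) for t in in_treatment for (t1, t2) in procedure if t1 == t}

def delta_join(in_treatment, delta_procedure):
    """ΔQ for insertions into Procedure: the same join, run on the delta only."""
    return join(in_treatment, delta_procedure)

in_treatment = {"chemo", "surgery"}
procedure = {("chemo", "infusion")}
delta_procedure = {("chemo", "blood test"), ("xray", "scan")}

old_result = join(in_treatment, procedure)
new_edges = delta_join(in_treatment, delta_procedure)
```

For insertions, the join distributes over union, so the old result plus `new_edges` equals the query evaluated from scratch on the updated relation, and only the small delta has to be scanned.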

46
Steps to Bud-Cut
1. For a set of database changes ΔDB, execute a
fixed number of non-recursive queries which
determine the direct edge changes E-, E+
[Figure: report with patient children; each patient has SSN, name, treatment and policy children (e.g., SSN 234, name Cheney; SSN 123, name Bush); a treatment has tname and inTreatment children, and inTreatment has treatment children]
47
The XML view update problem
  • Input
  • a publishing transducer τ for relational schema R
  • an instance DB of R
  • XML view T = τ(DB)
  • XML updates ΔT
  • Output: relational updates ΔDB such that T ⊕ ΔT
    = τ(DB ⊕ ΔDB)
  • Commercial systems: limited support, already hard
    for relational views

[Figure: middleware on top of the DBMS translates view updates ΔT on the XML view into updates ΔDB on the RDB]
48
New challenges introduced by XML view updates
  • Revising the semantics of side effects
  • ΔT: delete course[cno=CS650]//course[cno=CS450]/prereq/*
  • Subtree property: remove the prerequisites of all
    CS450 occurrences?
  • DTD validation (if any)
  • recursively defined
  • XML views
  • XML updates

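[Figure: db with course children; under the course with cno CS650, a nested course inside prereq is marked X for deletion, and another occurrence is marked "?"]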
49
Processing XML view updates
Deriving relational views V from XML views
(edge relations of DAG external storage)
1. DTD validation: reject ΔT if it causes a violation
2. Computing view updates ΔV from ΔT
3. Computing updates ΔDB from ΔV. May not exist:
reject ΔT if not
4. Update the underlying DB and view V with ΔDB
and ΔV
[Figure: ΔT on the XML view is translated to ΔV on the relational views V, and then to ΔDB on DB]
  • Main challenges: relational view updates
  • Hard: deciding view updatability is
    intractable/undecidable
  • Open: complexity, algorithm, commercial system
    support

50
Outline
  • XML publishing transducers
  • Characterization of XML publishing languages in
    practice
  • Complexity: evaluation cost, static analyses
  • Expressive power: tree generation, relational
    characterization
  • Dynamic aspect: incremental XML publishing, view
    updates
  • Open research issues

51
XML integration: complexity and expressiveness
[Figure: multiple, distributed source DBs are integrated into a single XML document that conforms to a DTD and constraints]
  • XML integration transducers
  • Two-way vs. top-down: context-dependent
    generation
  • Integrity constraints: conformance to XML schema
  • Information preservation: data migration
  • XML integration language: Attribute Integration
    Grammar (AIG)

52
XML shredding
[Figure: XML documents are shredded through middleware into the RDB of a DBMS; queries are translated by the middleware and answers are returned]
  • Storing XML data in relations: storage, query
    processing, RDBMS transaction control, ...
  • Primary goal
  • store part of or entire XML documents,
    content-based
  • increment existing relations, rather than build a
    new one
  • directed by recursive XML schema

53
XML shredding automata
[Figure: a query Q at a prereq node computes its register Reg]
  • Shredding automata vs. publishing transducers
  • take an existing tree as input, rather than
    relations
  • embedded XML queries, not relational, to compute
    Reg
  • output: union of relation registers, i.e., tuples to
    insert
  • combining XML SAX parsing and shredding, e.g.,
    XML2DB
  • Primary goal: expressive power and complexity

54
Summary
  • XML publishing: a synergy between theory and
    practice
  • characterization of XML publishing languages in
    practice
  • expressive power and matching complexity bounds.
  • helpful guidance for both the users and database
    vendors
  • Dynamic aspects: incremental publishing and view
    updates.
  • important yet overlooked by and large
  • Open research issues
  • XML integration transducers
  • XML shredding automata
  • . . .
  • An attempt to bridge theory and practice