XML Publishing - PowerPoint PPT Presentation

Slides: 67
Provided by: SchoolofI9

Transcript and Presenter's Notes


1
XML Publishing
  • XML publishing overview
  • SilkRoute (AT&T)
  • XPERANTO (IBM Research)
  • Clio (IBM Research and U. Toronto)
  • Schema-directed XML Publishing
  • ATG (Bell Labs and U. Edinburgh)
  • PRATA implementation of ATG
  • Extensions of ATG

2
XML -- it's an exchange format
Data base
Data base
HTML
Data base!
  • Most data is stored in pre-existing databases
  • Need to provide XML wrappers to export data

3
XML publishing
  • An XML view definition language specifying
    desired mapping
  • Efficient implementation of the view language

Q XML view
4
From relations to XML Views
<Actor>
  <Lname>Viterelli</Lname>
  <Fname>Joe</Fname>
  <Movie year="1999">Analyze This</Movie>
  <Movie year="2001">See Spot Run</Movie>
</Actor>
<Actor>
  <Lname>Winter</Lname>
  <Fname>Alex</Fname>
  <Movie year="1988">Bill and Ted's Excellent Adventure</Movie>
Actor(aid, lname, fname): <001, Viterelli, Joe>,
Movie(mid, title, year): <011, Analyze This, 1999>, <032, See Spot Run, 2001>
Appearance(mid, aid): <011, 001>, <032, 001>
5
Commercial systems -- canonical publishing
  • Canonical publishing: the universal-relation approach
  • Embedding a single SQL query in an XSL stylesheet
  • Result: a canonical XML representation of relations
  • Systems:
  • Oracle 10g XML SQL facilities: SQL/XML, XMLGen
  • IBM DB2 XML Extender: SQL/XML, DAD
  • Microsoft SQL Server 2005: FOR-XML, XSD
  • Incapable of expressing practical XML publishing: by default, a fixed XML document template

6
Canonical publishing
A.lname    A.fname  M.year  M.title
Viterelli  Joe      1999    Analyze This
Viterelli  Joe      2001    See Spot Run
IBM DB2: an SQL statement giving the big (universal) relation
<SQL_stmt>
  SELECT A.lname, A.fname, M.year, M.title
  FROM Movie M, Actor A, Appearance Ap
  WHERE M.mid = Ap.mid AND Ap.aid = A.aid
  ORDER BY A.aid
</SQL_stmt>
<Actor>
  <Lname>Viterelli</Lname>
  <Fname>Joe</Fname>
  <Movie year="1999">Analyze This</Movie>
  <Movie year="2001">See Spot Run</Movie>
</Actor>
<Actor>
  <Lname>Winter</Lname>
  <Fname>Alex</Fname>
Formatting template, annotated with columns of the universal relation:
<element_node Actor>
  <element_node Lname>
    <text_node>
      <Column name="A.lname"/>
    </text_node>
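To make the universal-relation step concrete, the following is a minimal sketch that loads the sample tuples into an in-memory SQLite database and runs the single join query. The use of SQLite and the helper name `universal_relation` are assumptions made for illustration; the slides refer to IBM DB2.

```python
import sqlite3


def universal_relation():
    """Join Actor, Appearance and Movie into one flat relation (illustrative helper)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Actor(aid TEXT, lname TEXT, fname TEXT);
        CREATE TABLE Movie(mid TEXT, title TEXT, year INTEGER);
        CREATE TABLE Appearance(mid TEXT, aid TEXT);
        INSERT INTO Actor VALUES ('001', 'Viterelli', 'Joe');
        INSERT INTO Movie VALUES ('011', 'Analyze This', 1999);
        INSERT INTO Movie VALUES ('032', 'See Spot Run', 2001);
        INSERT INTO Appearance VALUES ('011', '001');
        INSERT INTO Appearance VALUES ('032', '001');
    """)
    # One query produces the whole flat relation; a formatting template
    # would then map its columns onto XML elements.
    rows = db.execute("""
        SELECT A.lname, A.fname, M.year, M.title
        FROM Movie M, Actor A, Appearance Ap
        WHERE M.mid = Ap.mid AND Ap.aid = A.aid
        ORDER BY A.aid, M.year
    """).fetchall()
    db.close()
    return rows


rows = universal_relation()
```

Each actor's name is repeated once per movie in `rows`, which is the redundancy the template has to undo when it groups the rows back into nested elements.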
7
Middleware Approach
View Definition
Query-cost Estimates Source Capabilities
Query Generator
Request
SQL Queries
Tagger
8
XPERANTO
  • Commercial systems: IBM DB2 XML Extender, SQL/XML
  • Middleware (vendor-independent): XPERANTO
  • Extending SQL with XML constructors:
  •   select XML-aggregation
  •   from R1, . . ., Rn
  •   where conditions
  • XML constructors (XML-aggregation): functions
  • Input: tables and XML trees (forest)
  • Output: XML tree

9
XML publishing with XPERANTO (SQL/XML)
  • Relational schema
  • Extended SQL:
  •   select XMLAGG( ACT(lname, fname,
  •            ( select XMLAGG( MOV(title, year) )
  •              from Appearance Ap, Movie M
  •              where Ap.aid = A.aid and Ap.mid = M.mid ) )
  •          group order by A.lname, A.fname )
  •   from Actor A

<Actor>
  <Lname>Viterelli</Lname>
  <Fname>Joe</Fname>
  <Movie year="1999"> . . . </Movie>
  <Movie year="2001"> . . . </Movie>
</Actor>
<Actor>
  <Lname>Winter</Lname>
  <Fname>Alex</Fname>
10
XML constructors
  • Actor constructor:
  •   create function ACT(lname str, fname str, mlist XML)
  •   <Actor>
  •     <Lname> lname </Lname>
  •     <Fname> fname </Fname>
  •     mlist
  •   </Actor>
  • Movie constructor (mlist):
  •   create function MOV(title str, year int)
  •   <Movie year=year> title </Movie>
  • Verbose and cumbersome:
  •   small documents: tedious
  •   large documents: unthinkable
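The two constructors can be mimicked outside the database to see what they produce. The sketch below is only an analogy in Python with `xml.etree.ElementTree`; the function names `act` and `mov` mirror ACT and MOV from the slide, but this is not XPERANTO's implementation.

```python
import xml.etree.ElementTree as ET


def mov(title, year):
    """Movie constructor: <Movie year=year> title </Movie>."""
    movie = ET.Element("Movie", {"year": str(year)})
    movie.text = title
    return movie


def act(lname, fname, mlist):
    """Actor constructor: wraps the names and a list of Movie elements."""
    actor = ET.Element("Actor")
    ET.SubElement(actor, "Lname").text = lname
    ET.SubElement(actor, "Fname").text = fname
    actor.extend(mlist)  # mlist plays the role of the XML-valued parameter
    return actor


actor = act("Viterelli", "Joe",
            [mov("Analyze This", 1999), mov("See Spot Run", 2001)])
xml_text = ET.tostring(actor, encoding="unicode")
```

Even for this two-level document one constructor per element type is needed, which illustrates why the slide calls the approach verbose for larger documents.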

11
SilkRoute
  • Annotated template: embedding SQL in a fixed XML tree
  • Middleware: SilkRoute
  • Commercial: SQL Server 2005 XSD, IBM DB2 DAD
  • Advantages:
  • More modular compared with the universal-relation approach
  • Limited schema-driven publishing: conforming to a fixed document template

12
Clio
  • Schema mapping based on inter-schema constraints
  • Given two schemas S1 and S2, we want to translate instances of S1 to instances of S2: a constraint-driven approach
  • Start with a set of referential integrity constraints (foreign keys) from S1 to S2 (schema matching)
  • Derive a schema mapping from S1 to S2 by reasoning about the given integrity constraints (chasing technique)
  • Pros and cons:
  • Generic system, and can be used for XML publishing
  • Semi-automated
  • Chasing may not terminate for XML DTD/Schema
  • Recursive target schema (S2)? Unclear

13
Getting real data exchange on the Web
  • All members of a community (industry) agree on a DTD and then exchange data w.r.t. it: e-commerce, health-care, ...
  • XML publishing:
  • mapping relational data to XML
  • conforming to the predefined DTD

[Figure: databases DB1 and DB2 each export an XML view (Q: XML view); the XML documents conform to the agreed DTD and are exchanged over the Web]
14
Data exchange: insurance company and hospital
  • Daily report
  • Relational database R at the hospital:
  • Patient (SSN, name, tname, policy, date)
  • inTreatment (tname, cost)
  • outTreatment (tname, referral)
  • Procedure (tname1, tname2)
  • treatment:
  • in hospital: composition hierarchy in Procedure
  • outside of the hospital: referral

[Figure: the hospital publishes an XML view of R as an XML report for the insurance company]
15
Example: insurance company and hospital
  • DTD D predefined by the insurance company:
  • report → patient*
  • patient → SSN, pname, treatment, policy
  • treatment → tname, (inTreatment + outTreatment)
  • inTreatment → treatment*
  • outTreatment → referral
  • How to define a mapping σ such that for any instance DB of R,
  • σ(DB) is an XML document containing all the patients and their treatments (hierarchy, referral) from DB, and
  • σ(DB) conforms to D?

16
Challenge: recursive types
  • XML data: unbounded depth -- cannot be decided statically
  • treatment → tname, (inTreatment + outTreatment)
  • inTreatment → treatment*   --- recursive

17
Challenge: non-determinism
  • The choice of a production (element type definition)
  • treatment → tname, (inTreatment + outTreatment)
  • -- depends on the underlying relational data

18
Existing systems
  • fixed XML tree template, or ignoring DTD-conformance
  • middleware: SilkRoute (AT&T), XPERANTO (IBM), ...
  • systems: SQL Server 2005, IBM DB2 XML Extender, ...
  • incapable of coping with a predefined DTD (e.g., recursion)
  • type checking: define a view and then check its conformance
  • undecidable in general, co-NEXPTIME for extremely restricted view definitions
  • no guidance on how to define XML views that typecheck
  • one gets an XML view that typechecks only after repeated failures and with luck

19
XML Publishing
  • XML publishing overview
  • SilkRoute (AT&T)
  • XPERANTO (IBM Research)
  • Clio (IBM Research and U. Toronto)
  • Schema-directed XML Publishing
  • ATG (Bell Labs and U. Edinburgh)
  • PRATA implementation of ATG
  • Extensions of ATG

20
Attribute Translation Grammar (ATG)
  • DTD: normalized element type definitions e → α
  • α ::= PCDATA | ε | e1, …, en | e1 + … + en | e*
  • Attributes: $e associated with each element type e
  • $e: tuple-valued, to pass data values as well as control
  • Rules: associated with each e → α; for each e' in α, $e' = Q($e)
  • SQL query Q extracts data from DB
  • parent attribute $e as a constant parameter in Q

21
Semantics: conceptual evaluation
  • Top-down
  • report → patient*
  • $patient ← select SSN, name, tname, policy
  •            from Patient        --- SQL query
  • recall: Patient (SSN, name, tname, policy)
  • Data-driven: a patient element for each tuple in the Patient relation

[Figure: report node with one patient child per tuple]
22
Inherited attributes
  • Inherited: the child attribute $child is computed from the parent attribute $parent
  • patient → SSN, name, treatment, policy
  • $SSN = $patient.SSN, $name = $patient.name
  • $treatment = $patient.tname, $policy = $patient.policy
  • recall: $patient = (SSN, name, tname, policy)
  • SSN → PCDATA
  • $PCDATA = $SSN

[Figure: report tree with patient nodes; one patient expanded into SSN, name, treatment, and policy, with sample values 123, Joe, LU23]
23
Coping with non-determinism
  • treatment → tname, (inTreatment + outTreatment)
  • $tname = $treatment
  • ($inTreatment, $outTreatment) =
  •   case Qc($treatment).tag      --- conditional query
  •     1:    ($treatment, null)
  •     else: (null, $treatment)
  • Qc: select 1 as tag from inTreatment where tname = $treatment
  • conditional query: determines the choice of production
  • parent attribute as a constant parameter in the SQL query
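A rough illustration of the conditional query: the sketch below runs Qc against an in-memory SQLite table and returns the attribute pair for the two alternatives. The sample rows, the use of SQLite, and the function name `choose_production` are assumptions for the example only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inTreatment(tname TEXT, cost INTEGER);
    CREATE TABLE outTreatment(tname TEXT, referral TEXT);
    INSERT INTO inTreatment VALUES ('surgery', 900);          -- made-up sample row
    INSERT INTO outTreatment VALUES ('physio', 'clinic B');   -- made-up sample row
""")


def choose_production(treatment):
    """Return ($inTreatment, $outTreatment); the non-null entry selects the branch."""
    # Qc: the parent attribute is passed in as a constant parameter.
    row = db.execute(
        "SELECT 1 AS tag FROM inTreatment WHERE tname = ?", (treatment,)
    ).fetchone()
    if row is not None and row[0] == 1:
        return (treatment, None)   # expand the inTreatment alternative
    return (None, treatment)       # otherwise expand outTreatment
```

For example, `choose_production("surgery")` selects the inTreatment branch, while any treatment name missing from the inTreatment table falls through to outTreatment.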

[Figure: treatment nodes, each with a tname child and either an inTreatment or an outTreatment child]
24
Coping with recursion
  • inTreatment → treatment*
  • $treatment ← select tname2
  •              from Procedure
  •              where tname1 = $inTreatment
  • recall: Procedure (tname1, tname2)
  • parent attribute as a constant parameter in the SQL query Q
  • inTreatment is further expanded as long as Q(DB) is nonempty
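The data-driven recursion can be sketched in a few lines. Here the Procedure relation is a plain Python list with made-up tuples, and the helper names are hypothetical; the sketch assumes the composition hierarchy is acyclic (otherwise the expansion would not terminate) and ignores the inTreatment/outTreatment choice.

```python
# Procedure(tname1, tname2): tname2 is a sub-treatment of tname1 (sample data).
PROCEDURE = [
    ("surgery", "anesthesia"),
    ("surgery", "suturing"),
    ("anesthesia", "monitoring"),
]


def sub_treatments(in_treatment):
    """Q: select tname2 from Procedure where tname1 = $inTreatment."""
    return [t2 for (t1, t2) in PROCEDURE if t1 == in_treatment]


def expand(tname):
    """Expand a treatment node top-down into nested (tname, children) pairs."""
    # Expansion stops by itself when the query returns no tuples.
    return (tname, [expand(child) for child in sub_treatments(tname)])


tree = expand("surgery")
```

The depth of `tree` is determined by the data, not by the view definition, which is the point of the slide.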

25
DTD-directed publishing with ATGs
  • DTD-directed: the XML tree is constructed strictly following the productions of a DTD --- DTD conformance
  • Data-driven: the choice of productions and the expansion of the XML tree (recursion) depend on relational data

[Figure: report tree with patient nodes; one patient expanded into SSN (123), name (Joe), and treatment; treatment expands into tname and inTreatment, which recursively expands into further treatment nodes]
26
ATGs vs. existing systems
  • DTD-conformance:
  • ATGs provide guidance on how to define DTD-directed publishing
  • Other systems: based on a fixed tree template
  • Expressive power: strictly more expressive than others
  • ATGs are capable of expressing XML views supported by other systems
  • Other systems cannot handle recursion/nondeterminism

27
ATGs vs. Attribute Grammars (AGs)
  • AGs:
  • Definition: w.r.t. a CFG
  • Evaluation: parse a string with the CFG, then evaluate attributes given the parse tree
  • ATGs: combining DTD and database operations
  • Definition: w.r.t. an ECFG (DTD) and SQL queries
  • Evaluation: given DB, extract relevant data from DB with SQL queries to build an XML tree of the DTD
  • It does not make sense to parse a database w.r.t. a DTD
  • ATGs are not a mild variation of AGs

28
XML Publishing
  • XML publishing overview
  • SilkRoute (AT&T)
  • XPERANTO (IBM Research)
  • Clio (IBM Research and U. Toronto)
  • Schema-directed XML Publishing
  • ATG (Bell Labs and U. Edinburgh)
  • PRATA implementation of ATG
  • Possible extensions of ATG

29
PRATA: middleware based on ATGs
[Figure: PRATA architecture with stages parsing (ATG to ATG graph), query plan generation and evaluation (cost estimates, statistics, queries, query results), relations, and tagging (to XML), on top of database R]
  • ATG graph: representing the ATG
  • relations: representing root-to-leaf paths
  • tagging: one pass

30
Evaluation of ATGs
  • Conceptual evaluation: defines the semantics, but is not efficient
  • Techniques that proved useful:
  • partitioning: reduce DB visits, make use of the DBMS optimizer (handling annotated templates: complete tree)
  • relations representing XML trees: root-to-leaf paths
  • separate tagging
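As a sketch of the last two techniques, the snippet below takes a relation whose tuples are root-to-leaf paths and turns it into an XML tree in a single pass over the sorted tuples. The path encoding as (tag, key) pairs, the `key` attribute, and the function name `tag` are assumptions made for this illustration, not PRATA's actual representation.

```python
import xml.etree.ElementTree as ET


def tag(paths, root_tag="report"):
    """Build an XML tree from root-to-leaf paths in one pass over the sorted paths."""
    root = ET.Element(root_tag)
    open_nodes = []  # (step, element) pairs along the most recently emitted path
    for path in sorted(set(paths)):
        # Keep the prefix this path shares with the previous one.
        depth = 0
        while (depth < len(open_nodes) and depth < len(path)
               and open_nodes[depth][0] == path[depth]):
            depth += 1
        del open_nodes[depth:]
        # Emit only the new suffix of the path.
        for step in path[depth:]:
            parent = open_nodes[-1][1] if open_nodes else root
            name, key = step
            node = ET.SubElement(parent, name, {"key": key})
            open_nodes.append((step, node))
    return root


paths = [
    (("patient", "123"), ("treatment", "surgery"), ("treatment", "suturing")),
    (("patient", "456"), ("treatment", "physio")),
    (("patient", "123"), ("treatment", "surgery"), ("treatment", "anesthesia")),
]
report = tag(paths)
```

Because sorting places paths with a common prefix next to each other, the tagger only needs a stack of currently open elements and never revisits earlier parts of the tree.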

31
ATG graph
  • edge labels: SQL queries
  • cyclic: introduced by recursion -- not in other systems
  • iterative unfolding: partial ATG tree to a depth d

[Figure: ATG graph with nodes report, patient, SSN, name, policy, treatment, tname, inTreatment, outTreatment, referral, and procedure; edges labeled with queries Q1-Q9 and Qc]

32
Partitioning partial ATG trees
  • query composition (outer join)
  • challenge: how to partition into clusters?
  • finer → fewer nulls, smaller queries sent to the DBMS
  • coarser → leverage the DBMS optimizer, fewer DB visits
  • Finding an optimal strategy: NP-hard

[Figure: the unfolded ATG graph (report, patient, SSN, name, policy, treatment, tname, inTreatment, outTreatment, procedure, referral, with queries Q1-Q9 and Qc) partitioned into clusters, each evaluated as one composed query]
33
Materialization of intermediate results
  • Partitioning: root-leaf relation R → intermediate relations R1, R2, R3, R4
  • repeated computation: each Ri carries a key from the root
  • materialization: storing keys in temporary tables
  • cons: overhead of temporary tables
  • pros: reduces recomputation in descendant queries
  • What keys to materialize? Not supported by other systems

[Figure: intermediate relations R1, R2, R3, R4 combined by sort-merge on the key into the output relation R]
34
Query plan generation
  • New challenge: find the best partitioning and materialization
  • mutually dependent: partitioning and materialization must be considered together
  • expensive: exponentially many choices, NP-hard
  • Efficient cost-based heuristic:
  • DBMS statistics: arity/cardinality of a table, cost to evaluate a query/create a temporary table
  • cost estimate: simple formulae for computing the benefit of materialization/partitioning -- effective in practice
  • optimal strategy -- dynamic programming: decides partitioning w.r.t. the benefit of materialization

35
Putting these together
  • ATGs:
  • ensure DTD-conformance automatically
  • support recursion/non-determinism naturally
  • simple: no need for a new language; knowing DTD and SQL is all that one needs for writing an ATG
  • PRATA: middleware based on ATGs
  • novel technique of combining partitioning and materialization
  • a practical solution for DTD-directed publishing
  • ATG: the first systematic way for DTD-directed publishing

36
XML Publishing
  • XML publishing overview
  • SilkRoute (AT&T)
  • XPERANTO (IBM Research)
  • Clio (IBM Research and U. Toronto)
  • Schema-directed XML Publishing
  • ATG (Bell Labs and U. Edinburgh)
  • PRATA implementation of ATG
  • Extensions of ATG

37
Extension: capturing integrity constraints
  • XML schema:
  • Type (DTD)
  • Integrity constraints: keys, foreign keys
  • Schema-directed XML publishing: automatically guarantee that the target document both conforms to the type and satisfies the constraints -- in a single framework
  • Challenge: consistency analyses are undecidable

[Figure: an ATG maps DB to XML that conforms to the DTD]
38
Data integration in XML
  • multiple, heterogeneous data sources: multi-source queries, query decomposition, object fusion, ...
  • distributed sources: scheduling of query execution
  • schema-conformance, ...

[Figure: integration middleware between the user and sources DB1-DB4, with labels query, answer, query translation, integration, and updates; the integrated XML conforms to the schema (D, Σ)]
39
Example: hospital and insurance company
  • DB1: patient (SSN, name, policy), visitInfo (SSN, trId, date)
  • DB2: billing (trId, price)
  • DB3: cover (policy, trId)
  • DB4: treatment (trId, name), procedure (trId1, trId2)
  • Daily XML report: given a date, for each patient of that day, report
  • SSN, name (DB1)
  • treatments (hierarchy) covered by insurance (DB1, DB3, DB4)
  • cost of all and only those treatments received (DB2)

40
Predefined schema (D, Σ): DTD D
  • report → patient*
  • patient → SSN, name, treatments, bill
  • treatments → treatment*
  • treatment → trId, tname, treatments
  • bill → item*
  • item → trId, price

[Figure: report tree (with date) of patient nodes; each patient has SSN, name, treatments, and bill; treatments holds nested treatment nodes (trId, tname, treatments); bill holds item nodes (trId, price)]
41
Predefined schema (D, Σ): XML constraints Σ
  • constraints: relative to each patient,
  • key: each treatment is charged only once
  • patient ( item.trId → item )
  • foreign key: every treatment has a billing record
  • patient ( treatment.trId ⊆ item.trId )

[Figure: the same report tree, highlighting trId under treatment and trId under item within each patient subtree]
42
More on XML integrity constraints
  • absolute: on the entire document
  • key: country.name → country
  • relative: on a subdocument rooted at a country
  • key: country ( province.name → province )
  • foreign key: country ( capital.inProvince ⊆ province.name )
  • (value inclusion)

43
Challenge: context dependency
  • bill subtree: all and only the trIds in the treatments subtree
  • controlled derivation: the bill subtree cannot be started before the treatments subtree is completed
  • information passing: downward, upward, sideways

[Figure: report (date) with a patient (date, SSN) expanded into SSN, name, treatments, and bill; the set trIdS of an unbounded number of trIds is passed from treatments to bill]
44
Challenges
  • DTD-conformance: recursive, nondeterministic
  • integrity constraints: validation during document generation
  • multi-source queries: a single query involves several databases
  • context-dependency: not strictly top-down or bottom-up
  • Previous work:
  • XML publishing: single data source, no constraints, top-down
  • XML integration: little support for XML schema-conformance
  • XML query languages: type checking provides no guidance on how to ensure schema-conformance; optimization is hard

Schema-directed XML integration is nontrivial!
45
Middleware for schema-directed integration

  • View definition language: a lightweight language, Attribute Integration Grammar (AIG)
  • Optimization techniques: cost-based optimization in light of context-dependency

[Figure: an AIG combines the DTD and constraints with semantic attributes and semantic rules]
46
Attribute Integration Grammar (AIG)
  • DTD: element type definitions e → α
  • α ::= PCDATA | ε | e1, …, en | e1 + … + en | e*
  • Attributes: associated with each element type e
  • Inh(e): inherited from parent/siblings (top-down/sideways)
  • Syn(e): synthesized from children (bottom-up)
  • Syn(e), Inh(e): tuple or set/bag-valued
  • Rules: associated with each production e → α, for e' in α
  • Inh(child): Inh(e') = Q(Inh(parent), Syn(sibling))
  • Syn(parent): Syn(e) = U (Syn(children))   -- union
  • Q: multi-source SQL query with parameters
  • Dependency: e2 must be evaluated before e1 if
  • Inh(e1) = Q(Syn(e2))   -- the dependency graph must be acyclic (DAG)

47
AIG semantics: conceptual evaluation
  • following the dependency ordering, starting from the root
  • report → patient*
  • Inh(patient) ← Q1(Inh(report)):
  •   select Inh(report) as date, p.SSN, p.name, p.policy
  •   from DB1:patient p, DB1:visitInfo v
  •   where p.SSN = v.SSN and v.date = Inh(report)
  • Recall DB1: patient (SSN, name, policy), visitInfo (SSN, trId, date)
  • Parameter in a query: Inh(report) as a constant
  • Data-driven: the number of patients depends on Q1

[Figure: report node with attribute Inh and one patient child per tuple returned by Q1]
48
Multi-source query
  • patient → SSN, name, treatments, bill
  • Inh(SSN) = Inh(patient).SSN, . . . ,
  • Inh(treatments) ← Q2(v)   -- v = Inh(patient)
  •   select t.trId, t.tname
  •   from DB1:visitInfo i, DB3:cover c, DB4:treatment t
  •   where i.SSN = v.SSN and i.date = v.date and t.trId = i.trId
  •     and c.trId = i.trId and c.policy = v.policy
  • a single query uses DB1, DB3 and DB4
  • tuple- and set-valued attributes (Inh(SSN), Inh(treatments))
  • Recall DB1: patient (SSN, name, policy), visitInfo (SSN, trId, date)
  • DB3: cover (policy, trId)
  • DB4: treatment (trId, name), procedure (trId1, trId2)

49
Initial top-down pass: context-dependent
  • patient → SSN, name, treatments, bill
  • Inh(SSN) = Inh(patient).SSN, Inh(name) = Inh(patient).name,
  • Inh(treatments) ← Q2(Inh(patient))
  • Inh(bill) = Syn(treatments)   <-- halt
  • DTD-directed: generate children following the production
  • Inh(bill) is defined with a sibling attribute, Syn(treatments); dependency ordering: evaluate bill after treatments
50
Initial top-down pass: recursion
  • treatments → treatment*
  • Inh(treatment) ← Inh(treatments)   -- set of (trId, tname)
  • Data-driven: the expansion of treatments depends on Inh(treatments)
  • empty: the expansion terminates, and Syn(treatments) is empty
  • nonempty: the expansion continues

51
Leaf step
  • treatment → trId, tname, treatments
  • Inh(trId) = Inh(treatment).trId, . . .,
  • trId → PCDATA
  • Syn(trId) = Inh(trId)

[Figure: treatments node with Inh = set of (trId, tname); each treatment child expands into trId, tname, and treatments]
52
Bottom-up step: synthesize attributes
  • treatment → trId, tname, treatments
  • Syn(treatment) = Syn(trId) ∪ Syn(treatments)
  • treatments → treatment*
  • Syn(treatments) = U Syn(treatment)
  • Processing of an element e: Inh(e) → subtree(e) → Syn(e)
53
Sideways step: controlled derivation
  • patient → SSN, name, treatments, bill
  • Inh(bill) = Syn(treatments)
  • bill → item*
  • Inh(item) ← Q(Inh(bill)):
  •   select trId, price
  •   from DB2:billing
  •   where trId in Inh(bill)   -- set membership test
  • DTD-directed: each step of construction follows a production

Recall DB2: billing (trId, price)
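The bottom-up and sideways steps can be combined in a small sketch: Syn(treatments) is collected from the finished treatments subtree and then handed to bill as Inh(bill). The billing tuples, the nested-pair encoding of the subtree, and the function names below are made up for illustration.

```python
# DB2: billing(trId, price), sample tuples.
BILLING = [("t1", 100), ("t2", 250), ("t3", 80)]


def syn_treatments(treatments):
    """Bottom-up: Syn(treatments) is the union of the trIds in the subtree."""
    ids = set()
    for tr_id, children in treatments:
        ids.add(tr_id)                    # Syn(trId)
        ids |= syn_treatments(children)   # Syn of the nested treatments
    return ids


def items_for_bill(inh_bill):
    """Q: select trId, price from billing where trId in Inh(bill)."""
    return [(tr_id, price) for tr_id, price in BILLING if tr_id in inh_bill]


# A completed treatments subtree: t1 with sub-treatment t2.
treatments = [("t1", [("t2", [])])]
inh_bill = syn_treatments(treatments)   # sideways: Inh(bill) = Syn(treatments)
items = items_for_bill(inh_bill)
```

The bill can only be computed after `syn_treatments` has returned, which is the dependency ordering that controlled derivation enforces.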
54
Constraint compilation
  • Captured with rules on synthesized attributes of patient:
  • trIdB: bag-valued, collecting trIds under item
  • trIdS1: set-valued, collecting trIds under treatment
  • trIdS2: set-valued, collecting trIds under item
  • key: patient ( item.trId → item )
  • unique (Syn(patient).trIdB)   -- no duplicates in the bag
  • foreign key: patient ( treatment.trId ⊆ item.trId )
  • subset (Syn(patient).trIdS1, Syn(patient).trIdS2)
  • compilation: semantic rules and attributes for constraints are automatically generated and evaluated
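The two compiled checks are simple to state in code. The sketch below assumes the trIds under item and under treatment have already been collected for one patient; the function name `check_patient` is hypothetical.

```python
from collections import Counter


def unique(bag):
    """Key check: no value occurs more than once in the bag."""
    return all(count == 1 for count in Counter(bag).values())


def subset(s1, s2):
    """Foreign-key check: every value of s1 also occurs in s2."""
    return set(s1) <= set(s2)


def check_patient(treatment_ids, item_ids):
    """Evaluate both constraints on the synthesized attributes of one patient."""
    tr_id_b = list(item_ids)        # trIdB: bag of trIds under item
    tr_id_s1 = set(treatment_ids)   # trIdS1: set of trIds under treatment
    tr_id_s2 = set(item_ids)        # trIdS2: set of trIds under item
    return unique(tr_id_b) and subset(tr_id_s1, tr_id_s2)
```

For instance, `check_patient(["t1"], ["t1", "t1"])` fails the key (a treatment billed twice), and `check_patient(["t1", "t3"], ["t1"])` fails the foreign key (a treatment with no billing record).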

55
Advantages of AIG
  • DTD-directed: the view definition automatically ensures conformance to the DTD -- recursive, nondeterministic
  • Constraint compilation: automatically captures integrity constraints in a uniform framework
  • performance: avoids post-materialization checking
  • optimization: jointly with query evaluation
  • exception handling: actions when constraints are violated
  • Controlled derivation: supports context-dependent generation
  • Information passing: top-down, bottom-up, sideways
  • Multi-source queries: optimizer-based decomposition
  • One sweep: each node is visited at most twice; one evaluates its inherited attribute, then its subtree, then its synthesized attribute

56
Middleware: evaluation of AIGs
[Figure: middleware architecture with components pre-processing, optimizer (merging, scheduling), query plan execution, and tagging; inputs are the AIG and cost statistics; queries go to DB1, DB2, DB3 and data comes back; output is XML]
  • pre-processing:
  • constraint compilation
  • multi-source query decomposition → single-source queries
  • optimizer: query plan generation using cost statistics
  • execution: SQL queries → data sources, results → mediator
  • tagging: relational tables (paths from root) → XML view via merge-sorting

57
Optimization
  • Goal: reduce response time
  • costs: query execution, data transfer, storage (caching)
  • constraints: query dependency graph (DAG)
  • nodes: queries computing inherited/synthesized attributes
  • edges: dependency relation (producer-consumer relation)
  • recursion: iterative unfolding to a certain depth
  • Optimization techniques:
  • query merging
  • query scheduling

58
Query scheduling
  • Goal: reduce the total response time by increasing parallelism
  • Ordering the execution of queries on the same site
  • e.g., <Q4, Q5, Q3> vs. <Q3, Q5, Q4> on DB2
  • Constraints:
  • costs: execution, communication, caching overheads
  • dependency relation
  • Finding an optimal schedule: NP-hard

59
Query merging
  • Goal: reduce DB visits, leverage the DB optimizer
  • Composition of queries on the same site: outer-join/union
  • Tradeoffs:
  • large result tables with nulls: communication/caching cost
  • impact on scheduling -- changes the query dependency graph

Finding an optimal strategy for merging and scheduling: NP-hard
60
Cost-based heuristic scheduling
  • cost estimate: given a fixed schedule, estimate the completion time of a query Q, comp_time(Q)
  • statistics: eval_cost(Q), size(Q), trans_cost(S, S', size)
  • dependency: Q' can't start before comp_time(Q) if Q -> Q'
  • scheduling: given a fixed query dependency graph, a heuristic based on dynamic programming to
  • find the most costly trailing path of each query
  • sort queries to favor critical paths
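One way to read the heuristic is sketched below: the cost of the most expensive path that starts at a query is computed by memoized recursion over the dependency DAG, and queries are then ordered by that cost. This is a simplification under stated assumptions: it uses only eval_cost, ignores transfer costs and sites, and the names `trailing_costs` and `schedule` and the sample numbers are made up.

```python
def trailing_costs(eval_cost, consumers):
    """Cost of the most costly dependency path starting at each query."""
    memo = {}

    def cost(q):
        if q not in memo:
            downstream = (cost(c) for c in consumers.get(q, []))
            memo[q] = eval_cost[q] + max(downstream, default=0)
        return memo[q]

    for q in eval_cost:
        cost(q)
    return memo


def schedule(eval_cost, consumers):
    """Order queries so that those on critical (most costly) paths come first."""
    costs = trailing_costs(eval_cost, consumers)
    return sorted(eval_cost, key=lambda q: -costs[q])


# consumers[Q] lists the queries Q' with Q -> Q' (Q' consumes the result of Q).
eval_cost = {"Q1": 2, "Q2": 5, "Q3": 1, "Q4": 6}
consumers = {"Q1": ["Q2", "Q3"], "Q3": ["Q4"]}
order = schedule(eval_cost, consumers)
```

With positive costs a producer always has a larger trailing cost than any of its consumers, so the resulting order also respects the dependency relation.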

61
Cost-based heuristic merging
  • greedy algorithm: repeat until no further improvement
  • for each pair of queries on the same source, tentatively merge the pair
  • modify the query dependency graph accordingly
  • invoke scheduling w.r.t. the modified dependency graph
  • estimate the cost
  • pick the pair with the biggest improvement to merge

Interaction between query scheduling and merging
62
AIG Summary
  • AIG: a novel specification language
  • ensures DTD-conformance (recursion/nondeterminism)
  • captures integrity constraints in a uniform framework
  • supports complex transformations: controlled derivation (context-dependency), multi-source queries, . . .
  • Optimization techniques: nontrivial optimization problems
  • constraint compilation, multi-source query decomposition
  • query scheduling w.r.t. the query dependency graph
  • query merging and its interaction with scheduling

63
More on commercial systems: MS SQL Server 2005
  • Annotated schema (XSD): fixed tree templates
  • nonrecursive schema
  • associate elements and attributes with table and column names
  • Given a relational database, XSD populates XML elements/attributes with the corresponding tuples/columns
  • FOR-XML:
  • An extension of SQL with a FOR-XML construct
  • Nested FOR-XML to construct XML documents
  • Summary:
  • incapable of supporting schema-directed publishing
  • cannot define recursive XML views (bounded recursion depth)

64
Commercial system: IBM DB2 XML Extender
  • User-defined mapping through DAD (Document Access Definition): a fixed XML tree template (nonrecursive)
  • SQL mapping: a single SQL query, constructing XML trees of depth bounded by the arity of the tuples returned and group-by
  • RDB node mapping: a fixed tree template with nodes annotated with conjunctive queries
  • SQL/XML: an extension of SQL with XML constructors (XMLAGG, XMLELEMENT, etc.), as discussed earlier
  • Summary:
  • incapable of supporting schema-directed publishing
  • cannot define recursive XML views

65
Commercial system: Oracle 10g XML DB
  • SQL/XML
  • DBMS_XMLGEN, a PL/SQL package
  • Supports recursive XML view definition (via linear recursion of SQL'99)
  • Does not support schema-directed XML publishing

66
Summary and review
  • Why publish XML data? What is the major
    difficulty?
  • One can publish relational data via an XQuery
    view. How to do it? What are the pros and cons of
    this approach?
  • What is schema-directed publishing? Why do we
    need it?
  • Why does ATG automatically ensure
    DTD-conformance?
  • How to ensure both DTD conformance and constraint
    satisfaction?
  • Homework
  • Consider projects related to ATG
  • How to answer queries over XML views? Read papers
    on query composition (SilkRoute, XPERANTO).