A Semantic Approach to Rewriting Queries for Integrated XML Data - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

A Semantic Approach to Rewriting Queries for Integrated XML Data

Description:

A Semantic Approach to Rewriting Queries for Integrated XML Data – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 42
Provided by: nus64
Category:

less

Transcript and Presenter's Notes

Title: A Semantic Approach to Rewriting Queries for Integrated XML Data


1
A Semantic Approach to Rewriting Queries for
Integrated XML Data
  • Xia Yang1, Mong Li Lee1, Tok Wang Ling1, Gillian
    Dobbie2
  • 1School of Computing, National University of
    Singapore
  • 2Department of Computer Science, The University
    of Auckland, New Zealand

2
The Problem
Return the title of the books in the integrated
database where the authors name is Tom and the
publication year is 2000
S1
S2
S12 Local schema S1, S2,
Integrated schema S12 .
3
Outline
  • Background
  • Preliminary Definitions
  • Query Rewriting
  • Comparison with Related Work
  • Conclusion

4
Background
  • In data integration, many systems construct a
    global or mediated schema from numerous
    heterogeneous data sources.
  • The local sources are XML repositories.
  • When XML repositories are involved in data
    integration, query rewriting algorithms will need
    to take into consideration the hierarchical
    structures of XML schemas.
  • XML schema languages such as DTD and XML Schema
    lack the necessary semantic information.
  • Our approach utilizes the ORA-SS model which
    provides the semantic information needed for the
    query rewriting process.

5
Preliminary Definitions
  • XML Schema Model
  • ORA-SS schema diagrams
  • XML Query Language
  • XQuery
  • Integrating ORA-SS Schemas
  • Mapping table
  • Query allocation table

6
ORA-SS model
  • The ORA-SS model (Object-Relationship-Attribute
    model for SemiStructured data)1 is a
    semantically rich data model that has been
    designed for semistructured data.
  • The ORA-SS model distinguishes between object
    classes, relationship types and attributes.
  • The main contributions are that
  • relationship type is expressed explicitly
  • the degree of the relationship type expresses the
    number of object classes involved in the
    relationship type
  • the attributes are classified as attributes of
    object class and attributes of relationship type.
  • The semantically rich ORA-SS model makes it
    possible to get the correct result in query
    rewriting for integrated XML data.
  • 1. Tok Wang Ling, Mong Li Lee, and Gillian
    Dobbie. Semistructured Database Design. Springer,
    2005

7
ORA-SS model example
  • An example of ORA-SS schema diagram

8
Integrating ORA-SS Schemas
  • Integrating ORA-SS Schemas
  • The algorithm to correctly integrate ORA-SS
    schemas was presented in a previous ER
    conference 2.

S1
S2
S12 Local schema S1, S2,
Integrated schema S12 2. X. Yang, M.L. Lee, T.W.
Ling. Resolving Structural Conflicts in the
Integration of XML Schemas A Semantic Approach.
ER 2003.
9
Mapping table
  • A mapping table is created during the process of
    integrating the various local schemas S1, S2, ,
    Sn into an integrated schema S.
  • The mapping table contains the mappings between
    the object classes and attributes in the local
    source schemas and the object classes and
    attributes in the integrated schema.

10
Mapping table example
S1 S2
S3 S4
  • S5 Integrated Schema S12345

11
Mapping table example
12
Query allocation table
  • The QAT is derived from the mapping table.
  • A QAT is created for each query.
  • The query allocation table (QAT) stores the
    selection condition paths, the return result
    paths, and the local schemas where the data for
    these paths can be found.
  • The QAT has two parts. The first part contains
    the selection conditions, and the second part
    contains the return result paths.
  • The QAT is automatically generated by our
    algorithm.

13
Query allocation table example
S1 S2
S3 S4
S5 S12345 S12345 is the
integrated schema of local schemas S1, S2, S3,
S4, S5. Query Q3 for b in /book where
b/authorTom and b/year2000
return ltresultgt b/year/text()
b/title/text() lt/resultgt
14
Query allocation table example
Query Allocation Table for Query Q3 Selection
Condition Table
Return Result Table
15
Query Rewriting
  • Step1 Build the query allocation table.
  • Step 2 Group local schemas to form join groups
    that answer the users query.
  • Step 3 Decompose the user query to subqueries on
    the local sources.
  • Step 4 Compose the subqueries from local schemas
    in a join group.

16
Step1 Build the query allocation table.
  • A query allocation table (QAT) consists of a
    selection condition table and a return result
    table.
  • The path of each selection condition and the
    return result is inserted into the selection
    condition table and the return result table
    respectively. The associated schemas identified
    from the mapping table are inserted into the
    corresponding rows.

17
Step 2 Group local schemas to form join groups
that answer the users query.
  • The algorithm finds which local schemas must be
    combined to get the expected results.
  • This is done by scanning the QAT to find the join
    groups.
  • The local schemas in each join group must contain
    all the paths required for the selection
    condition but may be missing some of the paths
    for the return result. This is possible because
    the result of a query can be incomplete.

18
Step 2 Example demonstrating finding the sources
from QAT that answer users query
  • S1 S2
    S3 S4
    S5 S12345
  • S12345 is the integrated schema of local schemas
    S1, S2, S3, S4, S5.
  • Query Q3 for b in /book where b/authorTom
    and b/year2000
  • return ltresultgt b/year/text()
    b/title/text() lt/resultgt

19
Step 2 Example demonstrating finding the sources
from QAT that answer users query
Query Allocation Table for Query Q3 Selection
Condition Table
Return Result Table
  • Output join group S1, S3 S1, S4 S2, S4
    S3
  • Note S2, S4 S3 are join groups, even though
    they cannot offer full results of the query. We
    allow the return results with missing information.

20
Step 3 Decompose the user query to subqueries on
the local sources
  • This step decomposes the user query into queries
    on the local schema based on the join groups.
  • Since subqueries are later composed to compute
    the answers in the same join group, the subquery
    must return the data asked for in the user query
    and also the data necessary to join the parts of
    the answers from different local schemas
    together. We call the classes necessary for
    joining the parts of answers, join object classes.

21
Step 3 Decompose the user query to subqueries on
the local sources
  • The key of the join object class is used for
    testing the equivalence when joining the
    subqueries.
  • Finding join object classes
  • Case 1 For a join group, if there are n paths in
    the QAT from different local schemas with a
    common ancestor in the user query, then the least
    common ancestor in the user query is a join
    object class.
  • There are two other cases that we will not
    consider in this talk.

22
Step 3 Example demonstrating decomposing the
user query to subqueries on the local sources
(case1)
S1 S2
S3 S4
S5 S12345 S12345 is the
integrated schema of local schemas S1, S2, S3,
S4, S5. Query Q3 for b in /book where
b/authorTom and b/year2000
return ltresultgt b/year/text()
b/title/text() lt/resultgt
23
Step 3 Example demonstrating decomposing the
user query to subqueries on the local sources
(case1)
  • Recall the join group S1, S3, S1 provides
    /book/title, /book/author and S3 provides
    /book/year, /book/author.
  • To answer the query Q3, the subqueries from S1
    and S3 need to be composed using the key of their
    least common ancestor i.e. the key isbn of the
    join object class book.

24
Step 4 Compose the subqueries from local schemas
in a join group
  • When joining subqueries on local schemas in the
    same join group, the identifier of the join
    object classes must be tested for equivalence.
  • We start by considering the basic case where the
    same object attributes are from different local
    schemas. To compose subqueries from these local
    schemas in join groups, the for, where, and
    return clause are combined together with the join
    condition equivalence test inserted in the where
    clause.
  • We allow the return results to have missing
    information. The parent object will not be
    removed from the return result, if it has a
    missing child. For each return object or
    attribute, the join equivalence condition test
    related to this return object or attribute is
    nested in the appropriate part of the query.

25
Step 4 Example of composing the subqueries from
local schemas in a join group
S1 S2
S3 S4
S5 S12345 S12345 is the
integrated schema of local schemas S1, S2, S3,
S4, S5. Query Q9 for b in /book where
b/authorTom and b/year2000
returnltresultgtb/year/text() b/title/text()
for p in b/publisher
where contains (b/publisher/location/text(),S
ingapore) returnltpublishergt
b/publisher/name lt/publishergt
lt/resultgt
26
Step 4 Example of composing the subqueries from
local schemas in a join group
  • The join groups are S1, S3, S5, S1, S4, S5,
    S2, S4, S5 and S3, S5.
  • We show the query example for the join group S1,
    S3, S5. The user query is decomposed into
    subqueries on the local schemas S1, S3, and S5.
    The join object class is book for these local
    schemas.

27
Step 4 Example of composing the subqueries from
local schemas in a join group
  • The subqueries on S1, S3 and S5 are shown below
  • Query Q9_S1
  • for b in /book
  • where b/authorTom
  • return ltresultgt b/isbn/text()
    b/title/text() lt/resultgt
  • Query Q9_S3
  • for b in /book
  • where b/authorTom and
    b/year2000
  • return ltresultgt b/isbn/text()
    b/year/text() lt/resultgt
  • Query Q9_S5
  • for b in /book
  • where contains (b/publisher/address/te
    xt(),Singapore)
  • returnltresultgtb/isbn/text()
  • ltpublishergtb/publisher/name
    lt/publishergtlt/resultgt

28
Step 4 Example of composing the subqueries from
local schemas in a join group
  • The composition of the subqueries for local
    schemas S1, S3 and S5 are as follows
  • for b1 in doc(S1.xml)/book, b3 in
    doc(S3.xml)/book
  • where b1/authorTom and b3/authorTom and
    b3/year2000
  • and b1/isbnb3/isbn
  • return ltresultgtb3/year/text()
    b1/title/text()
  • for b5 in doc(S5.xml)/book
  • where contains (b5/publisher/address/text(),S
    ingapore) and b5/isbnb1/isbn
  • returnltpublishergt b5/publisher/nameltpublishe
    rgtlt/resultgt
  • Note that although the join object class for S1,
    S3 and S5 is book, the equivalence tests are on
    separate lines in the rewritten query. This is
    because the parent information should be returned
    even when a child object class is missing.

29
Comparison with Related Work
  • Amman et al. 3 propose a mediator architecture
    for querying and integrating XML data sources.
  • Lakshmanan and Sadri 4 propose an
    infrastructure for interoperability among XML
    data sources.
  • Yu and Popa 5 introduce an algorithm for
    answering queries via a target schema.
  • 3. B. Amann, C. Beeri, I. Fundulaki, M. Scholl.
    Querying XML sources using an Ontology-based
    Mediator. CoopIS, 2002.
  • 4. L.V.S. Lakshmanan, F. Sadri. Interoperability
    on XML Data. ICSW, 2003.
  • 5. C. Yu, L. Popa. Constraint-Based XML Query
    Rewriting for Data Integration. SIGMOD 2004.

30
Comparison with Related Work
  • The models that these authors use cannot
    represent whether a relationship type is binary
    or n-ary and do not distinguish between
    attributes of object classes and attributes of
    relationship types from the local XML sources.
    The lack of such semantic information can lead to
    the retrieval of wrong results.

31
Comparison with Related Work

  • S1 S2
    S12
  • S12 is the integrated schema of S1, S2
  • Query Q1 for j in /project
  • return ltprojectgt j/jno
  • for p in j/part
  • return ltpartgtp/pno
  • for s in
    p/supplier return s lt/partgt
  • lt/projectgt

32
Comparison with Related Work (example)
33
Comparison with Related Work (example)
34
Comparison with Related Work
  • The results returned by the query rewriting
    method in related work contain the project with
    jno j01 has part p01, which is supplied by
    suppliers with sno s01 and s02. This violates
    the local data sources X1 and X2, where the
    project with jno j01 has part p01 is only
    supplied by suppliers with sno s01.

35
Comparison with Related Work
  • This is because the methods in the related work
    treat the relationship type between part and
    supplier as a binary relationship type, instead
    of the intended ternary relationship type
    involving project, part, and supplier. They treat
    the quantity as the attribute of part in S2, so
    when they find the part with pno p01 has
    quantity 100 in X1, and has quantity 200 in
    X2, they will combine them to make the final
    result. This leads to the wrong answer returned.
  • In contrast, our algorithm takes the XML
    hierarchy structure into consideration and
    retrieves the expected answer.

36
Conclusion
  • We develop an algorithm for rewriting queries,
    using the ORA-SS schema diagram that takes the
    semantic relationship between the source schemas
    and the integrated schema into account.
  • This allows us to distinguish between binary and
    n-ary relationship types, attributes of object
    classes and attributes of relationship types, and
    in turn treat these cases differently in our
    rewriting algorithm.

37
  • Questions?

38
Step 2 Algorithm that finds the sources from QAT
that answer users query
39
Step 3 Decompose the user query to subqueries on
the local sources
  • Finding join object classes
  • Case 2 For a join group, if the paths in the QAT
    are from different local schemas, and there is an
    object class that is the end of one path and the
    start of the other path, then this common object
    class is a join object class.
  • Case 3 For a join group, if two attributes of
    the same relationship type in a user query are
    from different local schemas, then all the object
    classes involved in this relationship type are
    join object classes.

40
Step 3 Decompose the user query to subqueries on
the local sources
  • We describe how to rewrite a user query for a
    local schema where the subquery on the local
    schema returns only one object class or
    attribute.
  • Then we describe how to rewrite a user query for
    a local schema where the subquery on the local
    schema returns many object classes or attributes.
  • (For details, please refer to the
    paper.)

41
References
  • 1. B. Amann, C. Beeri, I. Fundulaki, M. Scholl.
    Querying XML sources using an Ontology-based
    Mediator. CoopIS, 2002.
  • 2. P. Buneman, S. Davidson, W. Fan, C. Hara, W.C.
    Tan. Keys for XML. WWW Conference, 2001.
  • 3. Y. Chen, T.W. Ling, M.L. Lee. Automatic
    Generation of XQuery View Definitions from ORA-SS
    Views. ER, 2003.
  • 4. S. Cluet, P. Veltri, D. Vodislav. Views in a
    Large Scale XML Repository. VLDB, 2001.
  • 5. O.M. Duschka, M.R. Genesereth. Answering
    Recursive Queries Using Views. ACM PODS, 1997.
  • 6. L. Haas, D. Kossmann, E. Wimmers, J. Yang.
    Optimizing queries across diverse data sources.
    VLDB, 1997.
  • 7. A. Halevy. Theory of Answering Queries Using
    Views. ACM SIGMOD Record 29(4), 2000.
  • 8. L.V.S. Lakshmanan, F. Sadri. Interoperability
    on XML Data. ICSW, 2003.
  • 9. M.L. Lee, T.W. Ling, W.L. Low. Designing
    Functional Dependencies for XML. EDBT, 2002.
  • 10. A. Levy. Logic-Based Techniques in Data
    Integration. Logic based artificial intelligence,
    1999.
  • 11. T.W. Ling, M.L. Lee, G. Dobbie.
    Semistructured Database Design, ISBN
    0-387-23567-1, Springer, 2005.
  • 12. I. Manolescu, D. Florescu, D. Kossman.
    Answering XML queries over heterogeneous data
    sources. VLDB, 2001.
  • 13. H. Garcia-Molina, Y. Papakonstantinou, D.
    Quass, et al. The TSIMMIS project Integration of
    heterogeneous information sources. Journal of
    Intelligent Information Systems, 1997.
  • 14. K. Passi, E. Chaudhry. A Global-to-Local
    Rewriting Querying Mechanism using Semantic
    Mapping for XML Schema Integration. ODBASE 2003.
  • 15. K. Passi, L. Lane, S.Madria, Bipin C.
    Sakamuri, M. Mohania, S. Bhowmick. A Model for
    XML Schema Integration. EC-Web, 2002.
  • 16. R. Pottinger, A. Levy. A Scalable Algorithm
    for Answering Queries Using Views. VLDB, 2000.
  • 17. M. Stonebraker. Implementation of integrity
    constraints and views by query modification. ACM
    SIGMOD, 1975.
  • 18. Xyleme. A dynamic warehouse for XML Data of
    the Web. IEEE Data Engineering Bulletin, 2001.
  • 19. H.Z. Yang, P.A. Larson. Query Transformation
    for PSJ-queries. VLDB, 1987.
Write a Comment
User Comments (0)
About PowerShow.com