AutoMed: Automatic generation of Mediator tools for heterogeneous data integration - PowerPoint PPT Presentation

1
AutoMed: Automatic generation of Mediator tools
for heterogeneous data integration
  • Alex Poulovassilis
  • School of Computer Science and Information
    Systems, Birkbeck
  • AutoMed is a joint project with Peter McBrien
    (Imperial College),
  • funded under the 2nd DIM call by EPSRC grants
    GR/N38107 and GR/N35915

2
(Figure: an Integrated Schema connected to three source Schemas)
3
Background
  • In earlier work (ER97, IS98, DKE98) we
    developed a new framework to support
    transformation and integration of heterogeneous
    database schemas.
  • Our framework consisted of
  • a new notion of schema equivalence
  • a set of primitive schema transformations which
    can be composed to define unconditional or
    conditional equivalences between schemas

4
Background
  • In our data integration approach, we represent
    the modelling constructs of higher-level data
    models (e.g. relational, object-oriented,
    semi-structured, XML, RDF) in terms of a
    low-level hypergraph data model HDM whose
    constructs are nodes, edges and constraints
  • The HDM common data model provides a unifying
    semantics for such higher-level modelling
    constructs
  • It avoids the semantic mismatches that may occur
    between constructs of higher-level modelling
    languages
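
A minimal sketch of the idea, assuming an HDM schema can be held as a set of nodes, a dictionary of named edges and a list of constraints. The class `HDMSchema`, the function `encode_table` and the particular encoding of a relational table are illustrative assumptions, not the AutoMed definitions.

```python
from dataclasses import dataclass, field


@dataclass
class HDMSchema:
    """Hypothetical container for the three HDM construct kinds."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)        # edge name -> tuple of node names
    constraints: list = field(default_factory=list)  # simple tagged tuples


def encode_table(schema, table, attributes):
    """Encode a relational table as one node per table and per attribute,
    one edge per attribute, and one constraint per edge (assumed encoding)."""
    schema.nodes.add(table)
    for attr in attributes:
        attr_node = f"{table}:{attr}"
        edge_name = f"{table}.{attr}"
        schema.nodes.add(attr_node)
        schema.edges[edge_name] = (table, attr_node)
        # Every row of the table is assumed to have a value for the attribute.
        schema.constraints.append(("mandatory", table, edge_name))
    return schema


schema = encode_table(HDMSchema(), "Programme", ["title", "category"])
print(sorted(schema.nodes))
```

Because every higher-level construct reduces to the same three kinds of object, constructs from different modelling languages can sit side by side in one such structure.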

5
Background
  • Our approach allows constructs from different
    modelling languages to be mixed within the same
    intermediate schema during the schema
    transformation/integration process (CAiSE99)
  • Our schema transformations are automatically
    reversible, setting up a two-way transformation
    pathway between pairs of schemas
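
The reversibility can be sketched as follows, assuming a pathway is a list of steps of the form (action, construct kind, name, query). The tuple layout and the function names are hypothetical, and the queries are carried as plain strings for illustration only.

```python
# Each add step is undone by a del step with the same query, and vice versa.
INVERSE = {"add": "del", "del": "add"}


def reverse_step(step):
    action, kind, first, second = step
    if action == "rename":
        # For a rename step the last two fields are (old name, new name).
        return ("rename", kind, second, first)
    return (INVERSE[action], kind, first, second)


def reverse_pathway(pathway):
    """Invert every step and apply the steps in the opposite order."""
    return [reverse_step(step) for step in reversed(pathway)]


forward = [
    ("add", "Class", "Film", "{p | (p,F) ∈ category}"),
    ("add", "Class", "Prog", "{p | (p,c) ∈ category}"),
    ("rename", "Class", "Prog", "Programme"),
    ("del", "Rel", "category", "{(p,F) | p ∈ Film}"),
]

backward = reverse_pathway(forward)
for step in backward:
    print(step)
```

Reversing twice gives back the original pathway, which is what makes the pathway usable in both directions.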

8
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}

9
  • addSubClass Film Prog
  • addSubClass Doc Prog
  • addSubClass Series Prog
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}

10
  • addSubClass Film Prog
  • addSubClass Doc Prog
  • addSubClass Series Prog
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}
  • delRel category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}

11
  • delSubClass Film Prog
  • delSubClass Doc Prog
  • delSubClass Series Prog
  • delClass Series {p | (p,S) ∈ category}
  • delClass Doc {p | (p,D) ∈ category}
  • delClass Film {p | (p,F) ∈ category}
  • delClass Prog {p | (p,c) ∈ category}
  • addRel category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}

12
  • addConstraint subset Film Prog
  • addConstraint subset Doc Prog
  • addConstraint subset Series Prog
  • addNode Series {p | (p,S) ∈ category}
  • addNode Doc {p | (p,D) ∈ category}
  • addNode Film {p | (p,F) ∈ category}
  • addNode Prog {p | (p,c) ∈ category}
  • delEdge category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • delNode Programme Prog
  • delNode Category {F,D,S}

13
  • delConstraint subset Film Prog
  • delConstraint subset Doc Prog
  • delConstraint subset Series Prog
  • delNode Series {p | (p,S) ∈ category}
  • delNode Doc {p | (p,D) ∈ category}
  • delNode Film {p | (p,F) ∈ category}
  • delNode Prog {p | (p,c) ∈ category}
  • addEdge category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • addNode Programme Prog
  • addNode Category {F,D,S}

14
Background
  • These pathways can be used to automatically
    translate data and queries between pairs of
    schemas (ER99)
  • From a pathway T: S → S' we
  • compose the queries in the add steps to derive a
    definition of each construct in S' as a view over
    S, and
  • compose the queries in the del steps to derive a
    definition of each construct in S as a view over
    S'

15
Background
  • Thus
  • Prog = {p | (p,c) ∈ category}
  • Film = {p | (p,F) ∈ category}
  • Doc = {p | (p,D) ∈ category}
  • Series = {p | (p,S) ∈ category}
  • and
  • category = {(p,F) | p ∈ Film} ∪ {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • These view definitions can then be used to
    automatically translate data and queries between
    S and S'
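
As an illustration, the view definitions above can be run over a small data set. This is a sketch only: the programme titles are invented sample data, and the function names `to_classes` and `to_category` are hypothetical.

```python
def to_classes(category):
    """Views derived from the add steps: each class as a query over category."""
    return {
        "Prog": {p for (p, c) in category},
        "Film": {p for (p, c) in category if c == "F"},
        "Doc": {p for (p, c) in category if c == "D"},
        "Series": {p for (p, c) in category if c == "S"},
    }


def to_category(classes):
    """View derived from the del step: category as a query over the classes."""
    return (
        {(p, "F") for p in classes["Film"]}
        | {(p, "D") for p in classes["Doc"]}
        | {(p, "S") for p in classes["Series"]}
    )


# Invented sample extent for the category relation.
category = {("Casablanca", "F"), ("Planet Earth", "D"), ("Friends", "S")}

classes = to_classes(category)
print(sorted(classes["Prog"]))
print(to_category(classes) == category)
```

The round trip returns the original relation here because every category value is one of F, D or S.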

16
Overview of the AutoMed Project
  • The AutoMed project aims to investigate
  • how our theoretical framework can be practically
    applied to real data integration problems
  • how much of a mediator's global query processing
    functionality can be automatically generated from
    our transformation pathways
  • evolutionary and heuristic techniques for schema
    improvement and global query optimisation

17
The AutoMed Architecture
(Figure: architecture diagram with the following components)
  • Schema and Transformation Repository
  • Schema Transformation and Integration Tool
  • Global Query Processor
  • Global Query Optimiser
  • Model Definitions Repository
  • Model Definition Tool
  • Schema Evolution Tool
18
Schema Transformation/Integration Networks in
AutoMed
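(Figure: a network connecting the global schema GS, the
union-compatible schemas US1, US2, ..., USi, ..., USn, which are
linked by id transformations, and the local schemas LS1, LS2, ...,
LSi, ..., LSn)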
19
Schema Transformation/Integration Networks in
AutoMed
  • On the previous slide
  • GS is a global schema
  • LS1, ..., LSn are local schemas
  • US1, ..., USn are union-compatible schemas
  • the transformation pathways between each pair LSi
    and USi may consist of add, delete, rename,
    expand and contract primitive transformations,
    operating on any modelling construct defined in
    the AutoMed Model Definitions Repository
  • the transformation pathway between USi and GS is
    similar
  • the transformation pathway between each pair of
    union-compatible schemas consists of id
    transformation steps

20
Both-As-View integration
  • Our schema transformation pathways capture at
    least the information available from
    global-as-view (GAV) or local-as-view (LAV)
  • We discuss this in a forthcoming paper (ICDE03)
    and term our integration approach both-as-view
    (BAV)
  • In particular, we discuss how
  • GAV and LAV view definitions can be derived from
    a BAV specification
  • a BAV specification can be partially derived from
    a set of GAV or LAV view definitions

21
Schema Evolution
  • Unlike GAV and LAV, our framework readily
    supports the evolution of both local and global
    schemas
  • The first step is to define the evolution of the
    global or local schema as a schema transformation
    pathway from the old to the new schema
  • There is then a systematic way of evolving, as
    opposed to re-generating, the transformation
    pathways
  • In the case of a local schema evolution, the
    global schema may also be evolved

22
Schema Evolution
  • In particular (see our CAiSE02 and ICDE03
    papers for details)
  • if the evolved schema is semantically equivalent
    to the original schema, then the transformation
    network can be repaired automatically
  • if the evolved schema is a contraction of the
    original schema, the transformation network can
    again be repaired automatically
  • if the evolved schema is an extension of the
    original schema, then domain knowledge may be
    required (but again the network can be evolved
    rather than regenerated)

23
Global Query Processing
  • We are handling query language heterogeneity by
    translation into/from a functional intermediate
    query language, IQL [Edgar Jasper]
    (BNCOD02 poster, BNCOD02 summer school paper)
  • A query Q expressed in a high-level query
    language on a global schema GS is first
    translated into IQL
  • GAV view definitions are derived from the
    transformation pathways between GS and the local
    schemas
  • These view definitions are substituted into Q,
    reformulating it into an IQL query over local
    schema constructs
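
The substitution step can be sketched as follows, assuming a query is a nested tuple whose first element is an operator name and whose other elements are sub-expressions or construct names. This is not IQL syntax; the operator names and the local construct names `LS1.film` and `LS2.movie` are hypothetical.

```python
def substitute(expr, views):
    """Replace each global construct name by its view definition."""
    if isinstance(expr, tuple):
        operator, *arguments = expr
        # The operator name is kept; only the arguments are rewritten.
        return (operator, *(substitute(arg, views) for arg in arguments))
    return views.get(expr, expr)


# Hypothetical GAV view: the global construct Film over two local schemas.
views = {"Film": ("union", "LS1.film", "LS2.movie")}

# Query over the global schema: count the films.
global_query = ("count", "Film")

local_query = substitute(global_query, views)
print(local_query)
```

After substitution the query mentions only local schema constructs, so its sub-queries can be handed to the data sources.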

24
Global Query Processing
  • Query optimisation and query evaluation then
    occur
  • Specific issues for query optimisation in AutoMed
    are
  • optimising the view definitions derived from the
    transformation pathways, and
  • handling heterogeneous modelling constructs
    appearing within these view definitions
  • For query evaluation, wrappers translate IQL
    sub-queries into the local query language, and
    translate results back into the IQL type system.
  • Further query post-processing is possible.

25
Why a Functional Language as the AutoMed
Intermediate Query Language?
  • Compositionality: operators can be composed to an
    arbitrary level of nesting within a query,
    provided the types of the operators are respected
    by the expressions passed to them
  • Referential transparency: any query evaluates to
    a single answer, irrespective of the order of
    evaluation of its sub-expressions
  • These properties make view generation, query
    reformulation and query rewriting simpler than they
    would be with imperative or logic notations

26
Why a Functional Language as the AutoMed
Intermediate Query Language?
  • Natural support for collection types and
    aggregation operators
  • Makes this a natural formalism for translating
    into/out of other query languages e.g.
  • OQL is a functional query language
  • SQL can be considered to be a restriction of OQL
  • XQuery has a functional core language
  • other languages for semi-structured and RDF data
    are also functional (UnQL, YATL, RQL)

27
Why a Functional Language as the AutoMed
Intermediate Query Language?
  • Aggregation operators over collection types such
    as sets, bags and lists are generalised by a
    single fold function (Buneman, Tannen, Naqvi,
    1990s)
  • Optimisation techniques have been developed for
    fold which are applicable to all functional query
    languages with this formalism at their core (e.g.
    work by Wadler, Wong, Fegaras, Grust,
    Poulovassilis & Small)
  • We plan to leverage these techniques, and perhaps
    even existing software, for global query
    optimisation in AutoMed
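
A small illustration of the first point, using Python's `functools.reduce` as a stand-in for fold over lists. The helper names are hypothetical, and the sketch does not reproduce the IQL definitions.

```python
from functools import reduce


def fold(op, unit, collection):
    """Combine the elements of a collection with op, starting from unit."""
    return reduce(op, collection, unit)


def count(xs):
    return fold(lambda acc, _: acc + 1, 0, xs)


def total(xs):
    return fold(lambda acc, x: acc + x, 0, xs)


def maximum(xs):
    # None stands for "no element seen yet".
    return fold(lambda acc, x: x if acc is None or x > acc else acc, None, xs)


values = [3, 1, 2]
print(count(values), total(values), maximum(values))
```

Since all three aggregates are instances of the same function, a rewrite rule stated once for `fold` applies to each of them.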

28
XML Data Sources
  • As well as integration of structured data
    sources, we have done some work on translating
    and integrating XML data (see our CAiSE01 paper)
  • We have defined a representation of XML in terms
    of the nodes, edges and constraints of the HDM
  • We capture the ordering of XML elements by an
    order node and a hyperedge to it from the edge
    representing the parent-child relationship
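
An instance-level sketch of this encoding, using Python's standard `xml.etree.ElementTree` parser. The tuple shapes, the function name `xml_to_hdm` and the enclosing `<root>` element in the sample are assumptions made for illustration; they are not the AutoMed representation itself.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_hdm(xml_text):
    ids = count(1)
    nodes = {}        # instance id -> element tag
    child_edges = []  # (parent id, child id)
    order_edges = []  # ((parent id, child id), position among siblings)
    attr_edges = []   # (element id, attribute name, attribute value)

    def visit(elem):
        me = next(ids)
        nodes[me] = elem.tag
        for name, value in elem.attrib.items():
            attr_edges.append((me, name, value))
        for position, child in enumerate(elem, start=1):
            edge = (me, visit(child))
            child_edges.append(edge)
            # The ordering is attached to the parent-child edge itself.
            order_edges.append((edge, position))
        return me

    visit(ET.fromstring(xml_text))
    return nodes, child_edges, order_edges, attr_edges


SAMPLE = """
<root>
  <customer name="Jones">
    <account number="A14"/>
    <account number="B37"/>
  </customer>
  <customer name="Smith">
    <account number="C514"/>
    <account number="D438"/>
  </customer>
</root>
"""

nodes, child_edges, order_edges, attr_edges = xml_to_hdm(SAMPLE)
print(nodes)
print(order_edges)
```

Attaching the position to the edge rather than to the child element keeps the ordering a property of the parent-child relationship.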

29
Translating XML into HDM
  • <customer name="Jones">
  • <account number="A14"/>
  • <account number="B37"/>
  • </customer>
  • <customer name="Smith">
  • <account number="C514"/>
  • <account number="D438"/>
  • </customer>

(Figure: HDM graph of the XML above, with nodes labelled root,
customer, name, account, number and order)
30
XML Data Sources
  • We have defined a set of primitive
    transformations on XML, in terms of the
    underlying transformations on the equivalent HDM
    representation (which is the general AutoMed
    methodology)
  • XML documents are then translated into a simple
    ER representation, which allows them to be
    integrated with each other and with other
    structured data sources
  • One possible direction of further work is
    automatic or semi-automatic transformation and
    integration of the ER models arising from XML
    documents

31
Unstructured Text Sources
  • We have also been working on extracting structure
    from unstructured text sources [Dean Williams]
  • The aim here is to integrate information
    extracted from unstructured text with structured
    or semi-structured information available from
    other sources
  • We are using existing technology (the GATE tool)
    for the text annotation and information
    extraction (IE) part of this work

32
Unstructured Text Sources
  • Natural language and domain ontologies will be
    used to extend these annotations
  • These will be imported into RDF repositories, and
    we have extended AutoMed to encompass RDF and
    RDFS data sources
  • The information extracted from the text will be
    matched with existing structured information to
    derive new facts and perhaps new schema
    information as well

33
Materialised integration
  • Finally, as well as virtual integration of data
    sources, we are also investigating using the
    AutoMed framework for materialised integration,
    i.e. a data warehousing approach
  • In particular, we are looking at incremental view
    maintenance and data lineage tracing using the
    AutoMed schema transformation pathways [Hao Fan]

34
Ongoing AutoMed Work at Imperial
  • Automatic generation of equivalences between
    different data models
  • A graphical schema transformations editor
  • Data mining techniques for extracting relational
    schema equivalences
  • Using AutoMed for integrating semi-structured and
    structured data, in particular genomic data
  • Optimising schema transformation pathways