AutoMed: Automatic generation of Mediator tools for heterogeneous data integration


1
AutoMed: Automatic generation of Mediator tools
for heterogeneous data integration

Alex Poulovassilis
School of Computer Science and Information
Systems, Birkbeck
AutoMed is a joint project with Peter McBrien
(Imperial College),
funded under the 2nd DIM call by EPSRC grants
GR/N38107 and GR/N35915

2
[Figure: an integrated schema and three source schemas]
3
Background

In earlier work (ER97, IS98, DKE98) we
developed a new framework to support
transformation and integration of heterogeneous
database schemas.
Our framework consisted of a new notion of schema
equivalence, and a set of primitive schema
transformations which can be composed to define
unconditional or conditional equivalences between
schemas

4
Background

In our data integration approach, we represent
the modelling constructs of higher-level data
models (e.g. relational, object-oriented,
semi-structured, XML, RDF) in terms of a
low-level hypergraph data model HDM whose
constructs are nodes, edges and constraints
The HDM common data model provides a unifying
semantics for such higher-level modelling
constructs
It avoids the semantic mismatches that may occur
between constructs of higher-level modelling
languages

5
Background

Our approach allows constructs from different
modelling languages to be mixed within the same
intermediate schema during the schema
transformation/integration process (CAiSE99)
Our schema transformations are automatically
reversible, setting up a two-way transformation
pathway between each pair of schemas
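
The reversibility described on this slide can be sketched as follows. This is an illustrative Python sketch, not AutoMed's API: the Step record, the INVERSE table and the reverse function are hypothetical names, and the rule "swap add and del, keep the query, reverse the order" is assumed from the example pathways shown in this presentation. The category query is truncated to the Film case for brevity.

```python
from dataclasses import dataclass

# Hypothetical pairing of each primitive with its inverse (assumed, not AutoMed's API).
INVERSE = {"add": "del", "del": "add"}


@dataclass(frozen=True)
class Step:
    op: str          # "add" or "del"
    kind: str        # construct kind, e.g. "Class", "SubClass", "Rel"
    construct: str   # construct name(s)
    query: str = ""  # query defining the construct's extent, kept as plain text


def reverse(pathway):
    """Invert every step and apply the steps in the opposite order."""
    return [Step(INVERSE[s.op], s.kind, s.construct, s.query)
            for s in reversed(pathway)]


forward = [
    Step("add", "Class", "Film", "{p | (p,F) in category}"),
    Step("add", "SubClass", "Film Prog"),
    Step("del", "Rel", "category", "{(p,F) | p in Film}"),
]
backward = reverse(forward)
```

Because each step keeps its query when it is inverted, reversing twice gives back the original pathway, which is what makes the pathway usable in both directions.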

8

addClass Series {p | (p,S) ∈ category}
addClass Doc {p | (p,D) ∈ category}
addClass Film {p | (p,F) ∈ category}
addClass Prog {p | (p,c) ∈ category}

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog

delRel category {(p,F) | p ∈ Film} ∪
                {(p,D) | p ∈ Doc} ∪
                {(p,S) | p ∈ Series}

delSubClass Film Prog
delSubClass Doc Prog
delSubClass Series Prog
delClass Series {p | (p,S) ∈ category}
delClass Doc {p | (p,D) ∈ category}
delClass Film {p | (p,F) ∈ category}
delClass Prog {p | (p,c) ∈ category}
addRel category {(p,F) | p ∈ Film} ∪
                {(p,D) | p ∈ Doc} ∪
                {(p,S) | p ∈ Series}

addConstraint subset Film Prog
addConstraint subset Doc Prog
addConstraint subset Series Prog
addNode Series {p | (p,S) ∈ category}
addNode Doc {p | (p,D) ∈ category}
addNode Film {p | (p,F) ∈ category}
addNode Prog {p | (p,c) ∈ category}
delEdge category {(p,F) | p ∈ Film} ∪
                 {(p,D) | p ∈ Doc} ∪
                 {(p,S) | p ∈ Series}
delNode Programme Prog
delNode Category {F,D,S}

delConstraint subset Film Prog
delConstraint subset Doc Prog
delConstraint subset Series Prog
delNode Series {p | (p,S) ∈ category}
delNode Doc {p | (p,D) ∈ category}
delNode Film {p | (p,F) ∈ category}
delNode Prog {p | (p,c) ∈ category}
addEdge category {(p,F) | p ∈ Film} ∪
                 {(p,D) | p ∈ Doc} ∪
                 {(p,S) | p ∈ Series}
addNode Programme Prog
addNode Category {F,D,S}

14
Background

These pathways can be used to automatically
translate data and queries between pairs of
schemas (ER99)
From a pathway T: S -> S' we
compose the queries in the add steps to derive a
definition of each construct in S' as a view over
S, and
compose the queries in the del steps to derive a
definition of each construct in S as a view over
S'

15
Background

Thus
Prog = {p | (p,c) ∈ category}
Film = {p | (p,F) ∈ category}
Doc = {p | (p,D) ∈ category}
Series = {p | (p,S) ∈ category}
and
category = {(p,F) | p ∈ Film} ∪ {(p,D) | p ∈ Doc} ∪
{(p,S) | p ∈ Series}
These view definitions can then be used to
automatically translate data and queries between
S and S'
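
The view definitions on this slide can be tried out directly with set comprehensions. The sketch below is only an illustration in Python: the programme identifiers in the sample extent are made up, and the variable names simply mirror the construct names on the slide.

```python
# Extent of the relationship `category` in the source schema; sample data is made up.
category = {("p1", "F"), ("p2", "D"), ("p3", "S"), ("p4", "F")}

# Views of the new classes over `category`, taken from the add steps.
Prog = {p for (p, c) in category}
Film = {p for (p, c) in category if c == "F"}
Doc = {p for (p, c) in category if c == "D"}
Series = {p for (p, c) in category if c == "S"}

# View of `category` over the new classes, taken from the del step.
category_from_classes = (
    {(p, "F") for p in Film}
    | {(p, "D") for p in Doc}
    | {(p, "S") for p in Series}
)
```

With this sample data the round trip reproduces `category` exactly. That holds only because every category value is F, D or S; a tuple with any other value would be lost when translating back.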

16
Overview of the AutoMed Project

The AutoMed project aims to investigate:
how our theoretical framework can be practically
applied to real data integration problems;
how much of a mediator's global query processing
functionality can be automatically generated from
our transformation pathways;
evolutionary and heuristic techniques for schema
improvement and global query optimisation.

17
The AutoMed Architecture
Schema and Transformation Repository
Schema Transformation and Integration Tool
Global Query Processor
Global Query Optimiser
Model Definitions Repository
Model Definition Tool
Schema Evolution Tool
18
Schema Transformation/Integration Networks in
AutoMed
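[Figure: the global schema GS is connected to the
union-compatible schemas US1, US2, ..., USi, ..., USn,
which are linked to one another by id transformations;
each USi is connected to its local schema LS1, LS2,
..., LSi, ..., LSn]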
19
Schema Transformation/Integration Networks in
AutoMed

On the previous slide:
GS is a global schema;
LS1, ..., LSn are local schemas;
US1, ..., USn are union-compatible schemas;
the transformation pathway between each pair LSi
and USi may consist of add, delete, rename,
expand and contract primitive transformations,
operating on any modelling construct defined in
the AutoMed Model Definitions Repository;
the transformation pathway between USi and GS is
similar;
the transformation pathway between each pair of
union-compatible schemas consists of id
transformation steps.

20
Both-As-View integration

Our schema transformation pathways capture at
least the information available from
global-as-view (GAV) or local-as-view (LAV)
We discuss this in a forthcoming paper (ICDE03)
and term our integration approach both-as-view
(BAV)
In particular, we discuss how
GAV and LAV view definitions can be derived from
a BAV specification
a BAV specification can be partially derived from
a set of GAV or LAV view definitions

21
Schema Evolution

Unlike GAV and LAV, our framework readily
supports the evolution of both local and global
schemas
The first step is to define the evolution of the
global or local schema as a schema transformation
pathway from the old to the new schema
There is then a systematic way of evolving, as
opposed to re-generating, the transformation
pathways
In the case of a local schema evolution, the
global schema may also be evolved

22
Schema Evolution

In particular (see our CAiSE02 and ICDE03
papers for details)
if the evolved schema is semantically equivalent
to the original schema, then the transformation
network can be repaired automatically
if the evolved schema is a contraction of the
original schema, the transformation network can
again be repaired automatically
if the evolved schema is an extension of the
original schema, then domain knowledge may be
required (but again the network can be evolved
rather than regenerated)

23
Global Query Processing

We are handling query language heterogeneity by
translation into/from a functional intermediate
query language, IQL (Edgar Jasper; BNCOD02 poster,
BNCOD02 summer school paper)
A query Q expressed in a high-level query
language on a global schema GS is first
translated into IQL
GAV view definitions are derived from the
transformation pathways between GS and the local
schemas
These view definitions are substituted into Q,
reformulating it into an IQL query over local
schema constructs
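
The substitution step described above can be sketched as view unfolding over a small expression tree. This is a Python illustration under stated assumptions, not IQL and not AutoMed code: the tuple encoding of queries, the reformulate function, and every construct name in GAV_VIEWS (LS1.film, LS2.movie, LS2.programme) are hypothetical.

```python
# Queries are nested tuples: (operator, arg1, arg2, ...); strings name schema
# constructs or literals. All names and view definitions here are hypothetical.
GAV_VIEWS = {
    "Film": ("union", "LS1.film", "LS2.movie"),
    "Doc": ("select", "LS2.programme", "genre = 'D'"),
}


def reformulate(query, views):
    """Replace each global construct in `query` by its view definition."""
    if isinstance(query, str):
        # Unfold recursively in case a view mentions other global constructs.
        return reformulate(views[query], views) if query in views else query
    op, *args = query
    return (op, *(reformulate(a, views) for a in args))


q = ("union", "Film", "Doc")          # query over the global schema
local_q = reformulate(q, GAV_VIEWS)   # same query over local schema constructs
```

After unfolding, `local_q` mentions only local constructs, so it can be handed on to optimisation and evaluation. The sketch assumes the view definitions are not cyclic.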

24
Global Query Processing

Query optimisation and query evaluation then
occur
Specific issues for query optimisation in AutoMed
are
optimising the view definitions derived from the
transformation pathways, and
handling heterogeneous modelling constructs
appearing within these view definitions
For query evaluation, wrappers translate IQL
sub-queries into the local query language, and
translate results back into the IQL type system.
Further query post-processing is possible.

25
Why a Functional Language as the AutoMed
Intermediate Query Language ?

Compositionality: operators can be composed to an
arbitrary level of nesting within a query
provided the types of the operators are respected
by the expressions passed to them
Referential transparency: any query evaluates to
a single answer, irrespective of the order of
evaluation of its sub-expressions
These properties make view generation, query
reformulation and query rewriting simpler than
they would be with imperative or logic notations

26
Why a Functional Language as the AutoMed
Intermediate Query Language ?

Natural support for collection types and
aggregation operators
Makes this a natural formalism for translating
into/out of other query languages e.g.
OQL is a functional query language
SQL can be considered to be a restriction of OQL
XQuery has a functional core language
other languages for semi-structured and RDF data
are also functional (UnQL, YATL, RQL)

27
Why a Functional Language as the AutoMed
Intermediate Query Language ?

Aggregation operators over collection types such
as sets, bags and lists are generalised by a
single fold function (Buneman, Tannen, Naqvi,
1990s)
Optimisation techniques have been developed for
fold which are applicable to all functional query
languages with this formalism at their core (e.g.
work by Wadler, Wong, Fegaras, Grust,
Poulovassilis and Small)
We plan to leverage these techniques, and perhaps
even existing software, for global query
optimisation in AutoMed
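
To illustrate how one fold function can stand in for several aggregation operators, here is a minimal Python sketch. The fold signature used here (a per-element function f, a combining operator op and a unit value) is an assumption made for the example; it is not claimed to be the exact form used in IQL or in the cited work.

```python
from functools import reduce


def fold(f, op, unit, collection):
    """Apply f to each element and combine the results with op, starting from unit."""
    return reduce(lambda acc, x: op(acc, f(x)), collection, unit)


bag = [3, 1, 3, 2]  # a list standing in for a bag (duplicates are kept)

total = fold(lambda x: x, lambda a, b: a + b, 0, bag)            # sum
count = fold(lambda x: 1, lambda a, b: a + b, 0, bag)            # count
largest = fold(lambda x: x, max, float("-inf"), bag)             # max
distinct = fold(lambda x: {x}, lambda a, b: a | b, set(), bag)   # bag to set
```

Only the three arguments f, op and unit change between the aggregates, which is why a rewrite rule stated once for fold applies to all of them.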

28
XML Data Sources

As well as integration of structured data
sources, we have done some work on translating
and integrating XML data (see our CAiSE01 paper)
We have defined a representation of XML in terms
of the nodes, edges and constraints of the HDM
We capture the ordering of XML elements by an
order node and a hyperedge to it from the edge
representing the parent-child relationship

29
Translating XML into HDM

<customer name="Jones">
  <account number="A14"/>
  <account number="B37"/>
</customer>
<customer name="Smith">
  <account number="C514"/>
  <account number="D438"/>
</customer>

[Figure: the HDM graph for this document, with nodes
root, customer, name, account and number, plus order
nodes]
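
The idea of turning an XML document into nodes and edges, with element order recorded on the parent-child edges, can be sketched with Python's standard xml.etree.ElementTree module. This is a simplified illustration and not AutoMed's actual representation: the to_graph function is hypothetical, the root wrapper element is added here so the fragment parses as a single document, and the position is stored as a third component of each edge rather than through a separate order node and hyperedge.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption added so the fragment parses as one document.
DOC = """<root>
  <customer name="Jones"><account number="A14"/><account number="B37"/></customer>
  <customer name="Smith"><account number="C514"/><account number="D438"/></customer>
</root>"""


def to_graph(xml_text):
    """Return (nodes, edges); each parent-child edge carries the child's position."""
    nodes, edges = set(), []

    def visit(elem):
        nodes.add(elem.tag)
        for attr in elem.attrib:
            nodes.add(attr)
            edges.append((elem.tag, attr, None))  # attributes are unordered
        for position, child in enumerate(elem, start=1):
            edges.append((elem.tag, child.tag, position))
            visit(child)

    visit(ET.fromstring(xml_text))
    return nodes, edges


nodes, edges = to_graph(DOC)
```

The resulting node set matches the labels in the figure apart from order, which this sketch folds into the edges instead of modelling as its own node.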
30
XML Data Sources

We have defined a set of primitive
transformations on XML, in terms of the
underlying transformations on the equivalent HDM
representation (which is the general AutoMed
methodology)
XML documents are then translated into a simple
ER representation, which allows them to be
integrated with each other and with other
structured data sources
One possible direction of further work is
automatic or semi-automatic transformation and
integration of the ER models arising from XML
documents

31
Unstructured Text Sources

We have also been working on extracting structure
from unstructured text sources (Dean Williams)
The aim here is to integrate information
extracted from unstructured text with structured
or semi-structured information available from
other sources
We are using existing technology (the GATE tool)
for the text annotation and IE part of this work

32
Unstructured Text Sources

Natural language and domain ontologies will be
used to extend these annotations
These will be imported into RDF repositories, and
we have extended AutoMed to encompass RDF and
RDFS data sources
The information extracted from the text will be
matched with existing structured information to
derive new facts and perhaps new schema
information as well

33
Materialised integration

Finally, as well as virtual integration of data
sources, we are also investigating using the
AutoMed framework for materialised integration,
i.e. a data warehousing approach
In particular, we are looking at incremental view
maintenance and data lineage tracing using the
AutoMed schema transformation pathways (Hao Fan)

34
Ongoing AutoMed Work at Imperial

Automatic generation of equivalences between
different data models
A graphical schema transformations editor
Data mining techniques for extracting relational
schema equivalences
Using AutoMed for integrating semi-structured and
structured data, in particular genomic data
Optimising schema transformation pathways
