Model-Independent Schema and Data Translation - PowerPoint PPT Presentation

About This Presentation
Title:

Model-Independent Schema and Data Translation

Description:

S2 = Diff (S1, map) and more. map3 = Compose (map1, map2) S2 = Select ... DB2'=Diff(DB2,m3) DW2', m4 user defined. m5 = Match(DW1,DW2') DW2 = Merge(DW1,DW2',m5) ... – PowerPoint PPT presentation

Number of Views:40
Avg rating:3.0/5.0
Slides: 100
Provided by: profpaol6
Category:

less

Transcript and Presenter's Notes

Title: Model-Independent Schema and Data Translation


1
Model-Independent Schema and Data Translation
  • Paolo Atzeni
  • Università Roma Tre
  • Based on
  • Tutorial at ICDE 2006
  • Paper in EDBT 2006 (with P. Cappellari and P.
    Bernstein)
  • JISBD 2006 --- Sitges, 5 October 2006

2
Outline
  • Introduction
  • Model management
  • Schema and data translation the problem
  • A metamodel based approach

3
A ten-year goal for database research
  • The Asilomar report(Bernstein et al. Sigmod
    Record 1999 www.acm.org/sigmod)
  • The information utilitymake it easy for
    everyone to store, organize, access, and analyze
    the majority of human information online

4
A general framework cooperation
  • "The capacity of a system to interact
    (effectively) with other systems, possibly
    managed by different organizations"
  • Forms of cooperation
  • Process-centered cooperation
  • the systems offer services one another, by
    exchanging messages, or by triggering activities,
    without making remote data explicitly visible
  • Data-centered cooperation
  • the systems offer data one another data is
    distributed, heterogeneous and autonomous

5
Databases in the Internet era
  • Databases before the Internet
  • An internal infrastructure, a precious resource,
    but usually hidden, with some controlled
    cooperation
  • Internet changes the requirements
  • More users (not only humans), more diverse
    cooperating systems (distributed, heterogeneous,
    autonomous), more types of data
  • "Future" Internet changes more
  • New devices (embedded everywhere), even more
    users (many per person), real mobility, need
    for personalization and adaptation

6
The most studied form of data-centered
cooperation integration
  • We are interested in data-centered cooperation,
    often referred to as integration
  • The unification of related, heterogeneous data
    from disparate sources, for example, to enable
    collaboration (Hammer Stonebraker 2005)
  • Some "paradigms"

7
Multidatabase
8
Data Warehousing System
DW manager
Data Warehouse
Integrator
DB
DB
DB
9
Intermediate solutions in practice
Local Manager
DB
Integrator
Mediator
Mediator
DB
DB
DB
10
Peer-based architecture
Peer mgr
Peer mgr
Mediator
Local mgr
Local mgr
Mediator
Local mgr
Local mgr
DB
DB
DB
DB
Mediator
Mediator
Peer mgr
Mediator
Local mgr
Local mgr
DB
DB
Mediator
11
Data is not just in databases
  • Mail messages
  • Web pages
  • Spreadsheets
  • Textual documents
  • Palmtop devices, mobile phones
  • Multimedia annotations (e.g., in digital photos)
  • XML documents

12
The same data in the same form?
  • Adaptivity
  • Personalization content adapted to the user
  • upon system's decision
  • upon user's request
  • Customization structure adapted to the user
  • according to the user's role
  • upon user's request
  • Context dependence
  • User, Device, Network, Place, Time, Rate

13
Summarizing a general need
  • We have data at various places, and data has to
    be
  • exchanged
  • replicated
  • migrated
  • integrated
  • adapted

14
A major difficulty
  • Heterogeneity
  • System level
  • Model level
  • Structural (different structure for similar data)
  • Semantic (different meaning for the same
    structure)
  • Many efforts, but current techniques are mostly
    manual and ad hoc

15
A direction for the solutions
  • Be general!
  • ad hoc solution could work in-the-small, but they
  • are repetitive and time consuming
  • do not scale
  • are not maintainable
  • Historical notes
  • W. C. McGee Generalization Key to Successful
    Electronic Data Processing. J. ACM 1959
  • Indeed, databases are the result of
    generalization applied to secondary storage
    management!

16
Generality requires
  • high-level descriptions of problems within the
    family of interest
  • Metadata
  • data about data
  • (formal or informal) description of structures
    and meaning
  • General solutions leverage on metadata (and then
    operate on data as a consequence)

17
Outline
  • Introduction and motivation
  • Model management
  • Schema and data translation the problem
  • A metamodel based approach

18
A wider perspective
  • (Generic) Model Management
  • A proposal by Bernstein et al (2000 )
  • Includes a set of operators on
  • schemas and
  • mappings between schemas

19
Terminology a warning
20
Schemas and mappings
  • A simplified approach
  • Schema
  • a set of elements, related in some way to one
    another
  • Mapping
  • a set of correspondences (pair of elements) or
  • its reification, a third schema related to the
    other two via two sets of correspondences

21
Model mgmt operators, a first set
  • map Match (S1, S2)
  • S3 Merge (S1, S2, map)
  • S2 Diff (S1, map)
  • and more
  • map3 Compose (map1, map2)
  • S2 Select (S1, pred)
  • Apply (S, f)
  • list Enumerate (S)
  • S2 Copy (S1)

22
Match
  • map Match (S1, S2)
  • given
  • two schemas S1, S2
  • returns
  • a mapping between them
  • the classical initial step in data integration
  • find the common elements of two schemas and the
    correspondences between them

23
Merge
  • S3 Merge (S1, S2, map)
  • given
  • two schemas and a mapping between them
  • returns
  • a third schema (and two mappings)
  • the classical second step in data integration
  • given the correspondences, find a way to obtain
    one schema out of two

24
Diff
  • S2 Diff (S1, map)
  • given
  • a schema and a mapping from it (to some other
    schema, not relevant)
  • returns
  • a (sub-)schema, with the elements that do not
    participate in the mapping

25
Example
  • (Bernstein and Rahm, ER 2000)
  • A database (a source), a data warehouse and a
    mapping between the two
  • we get a second source, with some similarity to
    the first one
  • and we want to update the DW

DW2
DB1
DW1
DB2
26
Example, the "solution"
m2 Match(DB1,DB2) m3 Compose(m2,m1) DB2Diff(D
B2,m3) DW2, m4 user defined m5
Match(DW1,DW2) DW2 Merge(DW1,DW2,m5)
DW2
DB1
DW1
m1
m3
m2
m5
DB2
DW2
m4
DB2
27
Magic does not exist
  • Operators might require human intervention
  • Match is the main case
  • Scripts involving operators might require human
    intervention as well (or at least benefit from
    it)
  • a full implementation of each operator might not
    always available
  • a mapping might require manual specification
  • incomparable alternatives might exist

28
The data level
  • The major operators have also an extended version
    that operates on data, and not only on schemas
  • Especially apparent for
  • Merge

29
We also have heterogeneity
  • Round trip engineering (Bernstein, CIDR 2003)
  • A specification, an implementation
  • then a change to the implementation want to
    revise the specification
  • We need a translation from the implementation
    model to the specification one

S1
S2
I1
I2
30
Model management with heterogeneity
  • The previous operators have to be model generic
    (capable of working on different models)
  • We need a translation operator
  • ltS2, map12gt ModelGen (S1)

31
ModelGen, an additional operator
  • ltS2, map12gt ModelGen (S1)
  • given
  • a schema (in a model)
  • returns
  • a schema (in a different data model) and a
    mapping between the two
  • A translation from a model to another
  • I should call it SchemaGen
  • We should better write
  • ltS2, map12gt ModelGen (S1,mod2)

32
Round trip engineering
S2
S1
S2
m3
m1
m4
m2
I1
I2
I2
m2 Match (I1,I2) m3 Compose (m1,m2) I2
Diff(I2,m3) ltS2,m4 gt Modelgen(I2) Match,
Merge
33
Outline
  • Introduction
  • Model management
  • Schema and data translation the problem
  • A metamodel based approach

34
Schema and data translation, a long standing
issue
  • Schema translation
  • given schema S1 in model M1 and model M2
  • find a schema S2 in M2 that corresponds to S1
  • Schema and data translation
  • given also a database D1 for S1
  • find also a database D2 for S2 that contains the
    same data to D1

35
Schema and data translation, a long standing
issue
  • Translations from a model to another have been
    studied since the 1970s
  • Whenever a new model is defined, techniques and
    tools to generate translations are studied
  • However, proposals and solutions are usually
    model specific

36
Model specific solutions
  • Given an ER schema, find the suitable relational
    schema that implements it
  • the original paper (Chen 1976) contains the
    basics
  • further discussions by many (e.g. Markowitz and
    Shoshani 1989)
  • illustrated in every textbook
  • Similarly with
  • any other conceptual model and any other logical
    one
  • XML and relational (or object)

37
Another problem in the picture data exchange
  • Given a source S1 and a target schema S2 (in
    different models or even in the same one), find a
    translation, that is, a function that given a
    database D1 for S1 produces a database D2 for S2
    that correspond to D1
  • Often emphasized with reference to materialized
    solutions

38
Integration
  • Given two or more sources, build an integrated
    schema or database

39
Schema translation
  • Given a schema find another one with respect to
    some specific goal (better quality, another
    model, )

40
Data exchange
  • Given a source and a target schema, find a
    transformation from the former to the latter

41
Schema translation and data exchange
  • Can be seen a complementary
  • Data translation schema translation data
    exchange
  • Given a source schema and database
  • Schema translation produces the target schema
  • Data exchange generates the target database
  • In model management terms we could write
  • Schema translation
  • ltS2, map12gt ModelGen (S1,mod2)
  • Data exchange
  • i2 DataGen (S1,i1,S2,map12)

42
Outline
  • Introduction
  • Model management
  • Schema and data translation the problem
  • A metamodel based approach

43
 The problem
  • ModelGen
  • given two data models M1 and M2, and a schema S1
    of M1 (the source schema and model),
  • generate a schema S2 of M2 (the target schema
    and model), corresponding to S1
  • and, for each database D1 over S1, generate an
    equivalent database D2 over S2

44
We have been doing this for a while
  • Initial work more than ten years ago (Atzeni
    Torlone, 1996)
  • Major novelty recently
  • translation of both schemas and data
  • data-level translations generated, from
    schema-level ones
  • Moreover
  • a visible, multilevel and (in part)
    self-generating dictionary
  • high-level, visible and customizable translation
    rules
  • in Datalog with OID-invention
  • mappings between elements generated as a
    by-product

45
Heterogeneity
  • We need to handle artifacts and data in various
    models
  • Data are defined wrt to schemas
  • Schemas are defined wrt to models
  • How models can be defined?

Models
Schemas
Data
46
A metamodel approach
  • The constructs in the various models are rather
    similar
  • can be classified into a few categories (Hull
    King 1986)
  • Lexical set of printable values (domain)
  • Abstract (entity, class, )
  • Aggregation a construction based on (subsets
    of) cartesian products (relationship, table)
  • Function (attribute, property)
  • Hierarchies
  • We can fix a set of metaconstructs (each with
    variants)
  • lexical, abstract, aggregation, function, ...
  • the set can be extended if needed, but this will
    not be frequent
  • A model is defined in terms of the metaconstructs
    it uses

47
 The metamodel approach, example
  • The ER model
  • Abstract (called Entity)
  • Function from Abstract to Lexical (Attribute)
  • Aggregation of abstracts (Relationship) 
  • The OR model
  • Abstract (Table with ID)
  • Function from Abstract to Lexical (value-based
    Attribute)
  • Function from Abstract to Abstract (reference
    Attribute)
  • Aggregation of lexicals (value-based Table)
  • Component of Aggregation of Lexicals (Column)

48
 The supermodel
  • A model that includes all the meta-constructs (in
    their most general forms)
  • Each model is subsumed by the supermodel (modulo
    construct renaming)
  • Each schema for any model is also a schema for
    the supermodel (modulo construct renaming)

49
A Multi-Level Dictionary
  • Handles models, schemas and data
  • Has both a model specific and a model independent
    component

50
Multi-Level Dictionary
51
The supermodel description
MSM-Property MSM-Property MSM-Property MSM-Property
OID Name Constr. Type
11 Name 1 String
12 Name 2 String
13 IsKey 2 Boolean
14 IsNullable 2 Boolean
15 Type 2 String
16 Name 3 String
17 Name 4 String
18 IsIdentifier 4 Boolean
19 IsNullable 4 Boolean
20 Type 4 String
21 IsFunct1 5 Boolean
22 IsOptional1 5 Boolean
23 Role1 5 String
24 IsFunct2 5 Boolean
25 IsOptional2 5 Boolean
26 Role2 5 String
MSM-Construct MSM-Construct MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MSM-Reference MSM-Reference MSM-Reference MSM-Reference
OID Name Construct Target
30 Aggregation 2 1
31 Abstract 4 3
32 Abstract1 5 3
33 Abstract2 5 3
52
Schemas in the supermodel
EmpNo
Employees
MSM-Construct MSM-Construct MSM-Construct
OID Name IsLex

3 Abstract F
4 AttributeOfAbstract T

Name
Name
Departments
Address
SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201

SM-Abstract SM-Abstract SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
Supermodel schemas
53
Instances in the supermodel
i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010

4021 402 1 Bob White 3011

SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201

i-SM-Abstract i-SM-Abstract i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1

3010 301 2
SM-Abstract SM-Abstract SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
Supermodel instances
Supermodel schemas
54
Multi-Level Repository
description
Supermodel description (mSM)
Model descriptions (mM)
model
Supermodel schemas (SM)
Model specific schemas (M)
schema
Supermodel instances (i-SM)
Model specific instances (i-M)
data
model independence
model generic
model specific
55
Model descriptions
MM-Model MM-Model
OID Name
1 Relational
2 Entity-Relationship
3 Object
MSM-Construct MSM-Construct MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MM-Construct MM-Construct MM-Construct MM-Construct
OID Model MSM-Constr Name
1 2 3 ER_Entity
2 2 4 ER_Attribute
3 2 5 ER_Relationship
4 1 1 Rel_Table
5 1 2 Rel_Column
6 3 3 OO_Class
MSM-Property MSM-Property MSM-Property MSM-Property
OID Name Construct Type

MM-Property MM-Property MM-Property MM-Property

MSM-Reference MSM-Reference MSM-Reference MSM-Reference
OID Name Construct

MM-Reference MM-Reference MM-Reference MM-Reference

56
Multi-Level Repository, generation and use
description
Supermodel description (mSM)
Model descriptions (mM)
model
Supermodel schemas (SM)
Model specific schemas (M)
schema
Structure fixed, content provided by tool
designers
Structure fixed, content provided by model
designers out of mSM
Supermodel instances (i-SM)
Model specific instances (i-M)
data
Structure generated by the tool from the content
of mM
Structure generated by the tool from the content
of mSM
model independence
model generic
model specific
Structure generated by the tool from the content
of mSM
57
The metamodel approach, translations
  • The constructs in the various models are rather
    similar
  • can be classified into a few categories
    (metaconstructs'')
  • translations can be defined on metaconstructs,
  • and there are standard, accepted ways to deal
    with translations of metaconstructs
  • they can be performed within the supermodel
  • each translation from the supermodel SM to a
    target model M is also a translation from any
    other model to M
  • given n models, we need n translations, not n2 

58
Generic translation environment
Supermodel
2. Translation
1. Copy
3. Copy
Source model
Target model
Translation composition 1,2 3
59
Translations within the supermodel
  • We still have too many models
  • Just within simple ER model versions, we have 4
    or 5 constructs, and each has several independent
    features which give rise to variants
  • for example, relationships can be
  • binary or N-ary
  • with all possible cardinalities or without
    many-to-many
  • with or without the possibility of specifying
    optionality
  • with or without attributes
  • Combining all these, we get hundreds of models!
  • The management of a specific translation for each
    model would be hopeless

60
Translations, the approach
  • Elementary translation steps to be combined
  • Each translation step handles a supermodel
    construct (or a feature thereof) "to be
    eliminated" or "transformed"
  • A translation is the concatenation of elementary
    translation steps

61
A complex translation, example
(0,N)
(0,N)
  • Eliminate N-ary relationships
  • Eliminate attributes from relationships
  • Eliminate many-to-many relationships
  • Replace relationships with references
  • Eliminate generalizations

62
Complex translations
N-ary ER w/ gen
Elim. N-ary relationships Elim. Relationship
attr.s Elim. MN relationships Replace
relationships with references Elim OO
generalizations Elim ER generalizations
Binary ERw/ gen
N-ary ER w/o gen
Bin ER w/ gen w/o attr on rel
Binary ER w/o gen
Bin ER w/o gen w/o attr on rel
Bin ER w/ gen w/o MN rel
OO w/ gen
Bin ER w/o gen w/o MN rel

Relational
OO w/o gen
63
Translations
  • Basic translations are written in a variant of
    Datalog, with OID invention
  • We specify them at the schema level
  • The tool "translates them down" to the data level
  • Some completion or tuning may be needed

64
A basic translation
  • From (a simple) binary ER model to the relational
    model
  • a table for each entity
  • a column (in the table for E) for each attribute
    of an entity E
  • for each MN relationship
  • a table for the relationship
  • columns
  • for each 1N and 11 relationship
  • a column for each attribute of the identifier

65
A basic translation application
EmpNo
Employees
Employees Employees Employees
EmpNo Name Affiliation
Name
1,1
Affiliation
Departments Departments
Name Address
0,N
Name
Departments
Address
66
A basic translation (in supermodel terms)
  • From (a simple) binary ER model to the relational
    model
  • an aggregation of lexicals for each abstract
  • a component of the aggregation for each attribute
    of abstract
  • for each MN aggregation of abstracts
  • From (a simple) binary ER model to the relational
    model
  • a table for each entity
  • a column (in the table for E) for each attribute
    of an entity E
  • for each MN relationship
  • a table for the relationship
  • columns
  • for each 1N and 11 relationship
  • a column for each attribute of the identifier

67
"An aggregation of lexicals for each abstract"
  • SM_AggregationOfLexicals(
  • OID aggregationOID_1(OID),
  • Name name)
  • ?
  • SM_Abstract (
  • OID OID,
  • Name name )

68
Datalog with OID invention
  • Datalog (informally)
  • a logic programming language with no function
    symbols and predicates that correspond to
    relations in a database
  • we use a non-positional notation
  • Datalog with OID invention
  • an extension of Datalog that uses Skolem
    functions to generate new identifiers when needed
  • Skolem functions
  • injective functions that generate "new" values
    (value that do not appear anywhere else) so
    different Skolem functions have disjoint ranges

69
"An aggregation of lexicals for each abstract"
  • SM_AggregationOfLexicals(
  • OID aggregationOID_1(OID),
  • Name n)
  • ?
  • SM_Abstract (
  • OID OID,
  • Name n )
  • the value for the attribute Name is copied (by
    using variable n)
  • the value for OID is "invented" a new value for
    the function aggregationOID_1(OID) for each
    different value of OID, so a different value for
    each value of SM_Abstract.OID

70
"An aggregation of lexicals for each abstract"
EmpNo
Employees Employees Employees

Employees
SM_AggregationOfLexicals( OID
aggregationOID_1(OID), Name n) ? SM_Abstract
( OID OID, Name n )
Name
SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301

SM-Abstract SM-Abstract SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments

11
Employees
1001
11
Departments
1002
1001
302
1002



71
"A component of the aggregation for each
attribute of abstract"
  • SM_ComponentOfAggregation (
  • OID componentOID_1(attOID),
  • Name name,
  • AggrOID aggregationOID_1(absOID),
  • IsNullable isNullable,
  • IsKey isIdent,
  • Type type )
  • ?
  • SM_AttributeOfAbstract(
  • OID attOID,
  • Name name,
  • AbstractOID absOID,
  • IsIdent isIdent,
  • IsNullable isNullable ,
  • Type type )
  • Skolem functions
  • are functions
  • are injective
  • have disjoint ranges
  • the first function "generates" a new value
  • the second "reuses" the value generated by the
    first rule

72
A component of the aggregation for each attribute
of abstract"
SM_ComponentOfAggregation ( OID
componentOID_1(attOID), Name name, AggrOID
aggregationOID_1(absOID), IsNullable
isNullable, IsKey isIdent, Type type
) ? SM_AttributeOfAbstract( OID attOID, Name
name, AbstractOID absOID, IsIdent isIdent,
IsNullable isNullable , Type type )
Employees Employees Employees
EmpNo Name
EmpNo
Employees
Name
SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301

SM-Abstract SM-Abstract SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments

1001
11
Employees
1001
11
Departments
1002
1003
1001
301
1004
402
302
1002



73
Generating data-level translations
  • Same environment
  • Same language
  • High level translation specification

Supermodel description (mSM)
Schema translation
Supermodel schemas (SM)
Supermodel instances (i-SM)
Data translation
74
Translation rules, data level
  • i-SM_ ComponentOfAggregation (
  • OID i-componentOID_1 (i-attOID),
  • i-AggrOID i-aggregationOID_1(i-absOID),
  • ComponentOfAggregationOfLexicalsOID
    componentOID_1(attOID),
  • Value Value )
  • ?
  • i-SM_AttributeOfAbstract(
  • OID i-attOID,
  • i-AbstractOID i-absOID,
  • AttributeOfAbstractOID attOID,
  • Value Value ),
  • SM_AttributeOfAbstract(
  • OID attOID,
  • AbstractOID absOID,
  • Name attName,
  • IsNullable isNull,
  • IsID isIdent,
  • Type type )

SM_ComponentOfAggregation ( OID
componentOID_1(attOID), Name name, AggrOID
aggregationOID_1(absOID), IsNullable
isNullable, IsKey isIdent, Type type
) ? SM_AttributeOfAbstract( OID attOID, Name
name, AbstractOID absOID, IsIdent isIdent,
IsNullable isNullable , Type type )
75
Correctness
  • Usually modelled in terms of information capacity
    equivalence/dominance (Hull 1986, Miller 1993,
    1994)
  • Mainly negative results in practical settings
    that are non-trivial
  • Probably hopeless to have correctness in general
  • We follow an "axiomatic" approach
  • We have to verify the correctness of the basic
    translations, and then infer that of complex ones

76
Experiments
  • A significant set of models
  • ER (in many variants and extensions)
  • Relational
  • OR
  • XSD
  • UML

77
Summary
  • ModelGen was studied a few years ago
  • New interest on it within the "Model management"
    framework
  • New approach
  • Translation of schema and data
  • Visible (and in part self generated) dictionary
  • Visible and modifiable rules
  • Skolem functions describe mappings

78
Conclusion
  • The ten-year goal of the Asilomar report
  • The information utilitymake it easy for
    everyone to store, organize, access, and analyze
    the majority of human information online
  • A lot of interesting work has been done but
  • integration, translation, exchange are still
    difficult
  • 2009 is approaching we are late!

79
?
Grazzie
Gracies
Gracias
Grazie
Thank you
80
(No Transcript)
81
Leftovers
82
Instances in our dictionary
SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302

SM-Abstract SM-Abstract SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4021 402 1 Bob White 3011
4022 404 1 CS 3012

i-SM-Abstract i-SM-Abstract i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1

75432
CS
John Doe

Bob White
83
Translation of instances
EmpNo
Employees
Employees Employees Employees
EmpNo Name Affiliation
Name
1,1
Affiliation
Departments Departments
Name Address
0,N
Name
Departments
Address
Employees Employees Employees
EmpNo Name Affiliation
75432 John Doe CS
Bob White
Departments Departments
Name
CS
84
Instances in our dictionary
CS
i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4022 404 1 CS 3012

i-SM-Abstract i-SM-Abstract i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1

Employees Employees Employees
EmpNo Name Affiliation
75432 John Doe CS
Departments Departments
Name
CS
5010
CS
11
1005
4030
85
Instances in our dictionary
i-SM-Abstract i-SM-Abstract i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1

i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4022 404 1 CS 3012

i-SM-ComponentOfAggregationOfLexicals
i-SM-AggregationOfLexicals
i- AggrOID
Value
InstOID
CompOfAggOID
OID
AggrOID
InstOID
OID
5010
75432
11
1003
6010
1001
11
5010
5010
John Doe
11
1004
6020
1002
11
5011
5010
CS
11
1005
6030
i-SM_ComponentOfAggregation (OID
i-componentOID_1 (i-attOID), i-AggrOID
i-aggregationOID_1(i-absOID), ComponentOfAggregat
ionOfLexicalsOID componentOID_1(attOID),Value
Val ) ? i-SM_AttributeOfAbstract( OID i-attOID,
i-AbstractOID i-absOID, AttributeOfAbstractOID
attOID, Value Val ), SM_AttributeOfAbstract(
OID attOID, AbstractOID absOID)
86
 The problem
  • ModelGen (a model management operator, Bernstein
    2003)
  • given two data models M1 and M2, and a schema S1
    of M1 (the source schema and model),
  • generate a schema S2 of M2 (the target schema
    and model), corresponding to S1
  • and, for each database D1 over S1, generate an
    equivalent database D2 over S2

Correctness what do corresponding and equivalent
mean?
87
Many artifacts and models for them
  • Many notations, each with variants and
    conventions
  • Object diagrams
  • Interface definitions
  • Database schemas
  • Web site layouts
  • Control flow diagrams
  • XML schemas
  • Form definitions

88
Many models just in the database world
  • With different features and goals
  • semantic models and logical models
  • E-R, functional, (conceptual) object
  • relational, object-relational, object, network
  • general purpose models (for all seasons) and
    problem oriented models (for specific contexts
    DW, statistical, spatial, temporal)
  • Variations of models
  • models with different levels of abstraction
  • versions within a family (e.g. many versions of
    the ER model)
  • More models recently with the Web and XML

89
What do we need to exchange and translate
  • In design
  • Artifacts (schemas and other descriptions)
  • In operations
  • Data (in databases, files, documents, )

90
Elementary steps
  • There can be a set of predefined basic
    translations, for example
  • eliminate n-ary aggregations replace them with
    binary ones (and abstracts)
  • eliminate binary aggregations replace them with
    functions
  • eliminate functions to abstracts replace them
    with aggregations
  • eliminate complex attributes replace them with
    simple attributes and abstracts
  • Assumed to be correct and so complex translations
    built over them are correct by definition (an
    axiomatic approach)

91
A complex translation
  • From an N-ary ER model with generalizations to a
    simple Object model with only single valued
    references and no generalizations
  • Eliminate N-ary relationships (replaced by binary
    ones and new entities)
  • Eliminate attributes from relationships
  • Eliminate many-to-many relationships
  • Transform relationships to references
  • Eliminate generalizations

92
"An aggregation of lexicals for each MN
aggregation of abstracts"
  • SM_AggregationOfLexicals(
  • OID aggregationOID_2(aggOID),
  • Name n )
  • ?
  • SM_BinaryAggregationOfAbstracts(
  • OID aggrOID,
  • Name n,
  • isFun1 "False",
  • isFun2 "False")
  • another Skolem function aggregationOID_2()
  • generates a "table" for a "relationship"
  • the rule applies only to MN "relationships", due
    to the conditions on isFun1 and isFun2

93
Translation rules, data level
SM_ComponentOfAggregation ( OID
componentOID_1(attOID), AggrOID
aggregationOID_1(absOID), Name name,
IsNullable isNullable, IsKey isIdent,
Type type ) ?
  • i-SM_ ComponentOfAggregation (
  • OID i-componentOID_1 (i-attOID),
  • i-AggrOID i-aggregationOID_1(i-absOID),
  • ComponentOfAggregationOfLexicalsOID
    componentOID_1(attOID),
  • Value Value )
  • ?

94
Translation rules, data level
? SM_AttributeOfAbstract( OID
attOID, AbstractOID absOID, Name
name, IsIdent isIdent, IsNullable isNullable
, Type type )
  • ?
  • i-SM_AttributeOfAbstract(
  • OID i-attOID,
  • i-AbstractOID i-absOID,
  • AttributeOfAbstractOID attOID,
  • Value Value ),
  • SM_AttributeOfAbstract(
  • OID attOID,
  • AbstractOID absOID,
  • Name name,
  • IsNullable isNullable,
  • IsID isIdent,
  • Type type )

95
Management of translations
  • Basic properties
  • Correctness, minimality,
  • Construction of complex translations by picking
    basic translations in the library

96
ModelGen the architecture
97
 A simple example
  • An object relational database, to be translated
    in a relational one
  • Source the OR-model
  • Target the relational model

98
 Example, 2
  • Does the OR model allow for keys?
  • Assume EmpNo and Name are keys

99
 Example, 3
  • Does the OR model allow for keys?
  • Assume no keys are specified
Write a Comment
User Comments (0)
About PowerShow.com