The BioMap Data Warehouse - PowerPoint PPT Presentation

Provided by: Luc75

Transcript and Presenter's Notes

Title: The BioMap Data Warehouse


1
The BioMap Data Warehouse
  • Integration of Relational & XML Data Using AutoMed

2
Outline
  • The AutoMed toolkit
  • The BioMap integration
  • Automatic XML data transformation/integration

3
Data Integration Approaches
  • Both-As-View (BAV) approach
  • GAV & LAV approaches
  • BAV approach
  • Comparison of integration approaches
  • BAV advantages

4
GAV & LAV Approaches
  • Global-As-View (GAV) approach: describe GS
    constructs with view definitions over LSi
    constructs
  • Local-As-View (LAV) approach: describe LSi
    constructs with view definitions over GS
    constructs

5
GAV Example
  • student(id,name,left,degree) = {x,y,z,w |
    ⟨x,y,z,w,_⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd ∨
    ⟨x,y,z,w,_⟩ ∈ phd ∧ w = 'phd'}
  • monitors(sno,id) = {x,y |
    ⟨x,_,_,_,y⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd ∨
    ⟨x,y⟩ ∈ supervises}
  • staff(sno,sname,dept) = {x,y,z |
    ⟨x,y,z,w,_⟩ ∈ tutor ∧ ⟨x,_,_⟩ ∉ supervisor ∨
    ⟨x,y,z⟩ ∈ supervisor}
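
A minimal Python sketch of how a GAV view definition could be evaluated: the global relation student is computed by a query over the local relations ug and phd. The sample rows, the column layout of phd, and the reading "ug rows whose id is not in phd, plus phd rows with degree set to 'phd'" are assumptions made for illustration, not taken from the slides.

```python
# Hypothetical local relations (sample rows invented for illustration).
# ug(id, name, left, degree, sno); phd(id, name, left, title, sno) is an assumed layout.
ug = [(1, "Ann", 2004, "BSc", 10), (2, "Bob", 2005, "BEng", 11)]
phd = [(2, "Bob", 2008, "XML integration", 12), (3, "Cy", 2007, "HDM", 12)]


def student_view(ug_rows, phd_rows):
    """GAV style: the global relation is defined by a query over the sources."""
    phd_ids = {row[0] for row in phd_rows}
    # Undergraduates who do not also appear as PhD students.
    from_ug = [(i, n, l, d) for (i, n, l, d, _) in ug_rows if i not in phd_ids]
    # PhD students, with the degree column fixed to the constant 'phd'.
    from_phd = [(i, n, l, "phd") for (i, n, l, _, _) in phd_rows]
    return from_ug + from_phd


student = student_view(ug, phd)
```

A query against the global schema is then answered by unfolding such view definitions into queries over the sources.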

6
LAV Example
  • tutor(sno,sname)
  • tutor(sno,sname) = {x,y | ⟨x,y,_⟩ ∈ staff ∧ ⟨x,z⟩ ∈ monitors ∧
    ⟨z,_,_,w⟩ ∈ student ∧ w ≠ 'phd'}
  • ug(id,name,left,degree,sno) = {x,y,z,w,v |
    ⟨x,y,z,w⟩ ∈ student ∧ ⟨v,x⟩ ∈ monitors ∧ w ≠ 'phd'}
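
A minimal Python sketch of the LAV direction, where a local relation is described as a view over the global schema. Here tutor is derived from the global relations staff, monitors and student; the sample rows are invented, and reading the definition as "staff who monitor at least one non-PhD student" is an assumption.

```python
# Hypothetical global relations (sample rows invented for illustration).
staff = [(10, "Dr X", "CS"), (12, "Dr Y", "Bio")]        # staff(sno, sname, dept)
monitors = [(10, 1), (12, 3)]                            # monitors(sno, id)
student = [(1, "Ann", 2004, "BSc"), (3, "Cy", 2007, "phd")]  # student(id, name, left, degree)


def tutor_view(staff_rows, monitors_rows, student_rows):
    """LAV style: the local relation is defined by a query over the global schema."""
    non_phd_ids = {s[0] for s in student_rows if s[3] != "phd"}
    return sorted(
        {
            (sno, sname)
            for (sno, sname, _dept) in staff_rows
            for (m_sno, m_id) in monitors_rows
            if m_sno == sno and m_id in non_phd_ids
        }
    )


tutor = tutor_view(staff, monitors, student)
```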

7
Both-As-View (BAV) (1/3)
  • Schema transformation approach
  • For each pair (LSi,GS): incrementally modify
    LSi/GS to match GS/LSi

8
Both-As-View (BAV) (2/3)
  • Common Data Model: Hypergraph Data Model (HDM)
  • Constructs are nodes, edges & constraints
  • It avoids the semantic mismatches that may occur
    between constructs of higher-level modelling
    languages

9
Both-As-View (BAV) (3/3)
  • Modify using primitive schema transformations
  • add/delete
  • rename
  • extend/contract
  • Supply transformations with queries
  • add(⟨⟨table,attrib3⟩⟩, q), where
    q = [{t,(a1+a2)} | {t,a1} ← ⟨⟨table,attrib1⟩⟩;
    {t,a2} ← ⟨⟨table,attrib2⟩⟩]
  • extend(⟨⟨table,attrib3⟩⟩, q1,q2)
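
A minimal Python sketch of an add transformation supplied with a query. The schema is modelled as a dictionary from construct to extent, which is an illustrative simplification and not AutoMed's API; the function name add, the sample keys, and the use of + to combine the two attribute values are assumptions.

```python
# Hypothetical schema: construct -> extent, each extent a list of (key, value) pairs.
schema = {
    ("table", "attrib1"): [("k1", 1), ("k2", 2)],
    ("table", "attrib2"): [("k1", 10), ("k2", 20)],
}


def add(current, construct, query):
    """Return a new schema with `construct` added; `query` derives its extent
    from the constructs that already exist."""
    if construct in current:
        raise ValueError(f"construct {construct!r} already exists")
    extended_schema = dict(current)
    extended_schema[construct] = query(current)
    return extended_schema


def q(current):
    """Join attrib1 and attrib2 on the key t and combine the two values."""
    attrib2 = dict(current[("table", "attrib2")])
    return [(t, a1 + attrib2[t]) for (t, a1) in current[("table", "attrib1")] if t in attrib2]


extended = add(schema, ("table", "attrib3"), q)
```

Because the query fully derives the new extent from existing constructs, the step can later be undone without losing information.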

10
Example (1/2)
  • S1 → Sg
  • add(⟨⟨monitors⟩⟩,q1)
  • add(⟨⟨monitors,sno⟩⟩,q2)
  • add(⟨⟨monitors,id⟩⟩,q3)
  • add(⟨⟨tutor,dept⟩⟩,q4)
  • rename(⟨⟨ug⟩⟩,⟨⟨student⟩⟩)
  • rename(⟨⟨tutor⟩⟩,⟨⟨staff⟩⟩)
  • delete(⟨⟨student,sno⟩⟩,q5)
  • S2 → Sg can be derived similarly

11
Example (2/2)
  • Automatically derivable reverse transformations
  • add(C,q)/extend(C,q1,q2) → delete(C,q)/contract(C,q1,q2)
  • delete/contract → add/extend
  • rename(C1,C2) → rename(C2,C1)
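
A minimal Python sketch of deriving the reverse pathway automatically: the steps are visited in reverse order, add/extend are swapped with delete/contract, and the arguments of rename are swapped. The Step type and the string encoding of constructs and queries are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    op: str                 # "add", "delete", "extend", "contract" or "rename"
    args: Tuple[str, ...]   # construct(s) and query names, kept as plain strings


INVERSE_OP = {"add": "delete", "delete": "add", "extend": "contract", "contract": "extend"}


def reverse_step(step: Step) -> Step:
    """Invert one primitive transformation."""
    if step.op == "rename":
        old, new = step.args
        return Step("rename", (new, old))
    return Step(INVERSE_OP[step.op], step.args)


def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Invert every step and apply them in the opposite order."""
    return [reverse_step(step) for step in reversed(pathway)]


pathway = [
    Step("add", ("<<monitors>>", "q1")),
    Step("rename", ("<<ug>>", "<<student>>")),
    Step("delete", ("<<student,sno>>", "q5")),
]
reverse = reverse_pathway(pathway)
```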

12
BAV vs. LAV, GAV & GLAV
  • BAV approach subsumes other integration
    approaches
  • Can be used to derive GAV & LAV view definitions
    (ICDE03)
  • Comparison with GAV, LAV & GLAV in DBIS'04

13
Schema Evolution Example
  • Define the evolution of the global or local
    schema as a schema transformation pathway from
    the old to the new schema

14
Types Of Integration
  • Virtual integration
  • Materialised integration
  • Hybrid integration

15
AutoMed Tools
  • Data Lineage Tracing (DLT)
  • Incremental View Maintenance (IVM)
  • Schema matching tool
  • Transformation pathway optimisation
  • XML transformation/integration tool

16
Outline
  • The AutoMed toolkit
  • The BioMap integration
  • Automatic XML data transformation/integration

17
Integration Outline
  • Wrapping of sources
  • Translation of source and global schemas into the
    XML schema type used within AutoMed
  • Domain expert provides mappings between sources
    & global schema
  • Automatic schema transformation/integration
    algorithm

[Figure: integration architecture diagram. Labels: Integrated Database; Wrapper; AutoMed Integrated Schema; Transformation pathways; AutoMed Relational Schema / AutoMed XML Schema; RDB Wrapper / XML Wrapper; RDB / XML File]

18
Relational to XMLDSS

19
Integration Outline
  • Wrapping of sources
  • Translation of source and global schemas into the
    XML schema type used within AutoMed
  • Domain expert provides mappings between sources
    & global schema
  • Automatic schema transformation/integration
    algorithm

20
Outline
  • The AutoMed toolkit
  • The BioMap integration
  • Automatic XML data transformation/integration

21
Outline
  • Semantic Heterogeneity
  • Schema Matching
  • Ontologies
  • Structural Heterogeneity
  • XML schema type in AutoMed
  • Schema transformation
  • Schema integration

22
Semantic Heterogeneity
  • Problem definition
  • Schema Matching
  • Data mining
  • Neural networks
  • Machine learning (LSD)
  • Ontologies (RDFS/OWL)

23
Schema Matching (1/2)
  • Types
  • 1-1, 1-n, n-1, n-m
  • Subset, superset, equivalence
  • Use schema matching output to create the
    intermediate schemas used by the schema
    restructuring / schema integration algorithms

24
Schema Matching (2/2)
  • Necessary transformations
  • add attributes day, month, year in S
  • delete attribute dob from S
  • The reverse transformation pathway describes a
    n-1 match
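
A minimal Python sketch of the 1-n match described above, where attribute dob is replaced by day, month and year, together with the reverse n-1 direction. The row layout (id, dob) and the function names are assumptions made for illustration.

```python
from datetime import date


def split_dob(rows):
    """1-n direction: derive day, month and year from dob, then drop dob."""
    return [(rid, dob.day, dob.month, dob.year) for (rid, dob) in rows]


def merge_dob(rows):
    """n-1 direction (reverse pathway): rebuild dob from day, month and year."""
    return [(rid, date(year, month, day)) for (rid, day, month, year) in rows]


people = [(1, date(1980, 5, 17)), (2, date(1975, 12, 1))]
split_rows = split_dob(people)
restored = merge_dob(split_rows)
```

Since each direction is computed entirely from the other, no data is lost in either transformation.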

25
Structural Heterogeneity
  • Problem: the same information can be represented
    in many different ways
  • Ancestor-descendant ↔ different branches
  • Elements & attributes not clearly distinguished
    in the XML model
  • Ordering policy

26
Aims
  • XML-specific solution
  • Insert-remove-rename operations on elements,
    attributes, edges
  • Efficient move (node/subtree) operation
  • Element-to-attribute, attribute-to-element
    transformations
  • Avoid loss of data due to structural
    incompatibilities
  • Automation

27
XML DataSource Schema (1/2)
  • Basic characteristics
  • Structure-only representation
  • XML format → ease of traversal & manipulation
  • Automatically derived from an XML file
  • XMLDSS from other schema types (DTD, XML Schema)

28
XML DataSource Schema (2/2)

29
Schema Transformation
  • Target schema T given
  • Source schema S is transformed to match the
    structure of T

30
Algorithm
  • Growing phase: traverse the target schema and
    issue an add/extend transformation for every
    construct that does not exist in the source
    schema.
  • Shrinking phase: traverse the source schema and
    issue a delete/contract transformation for every
    construct that does not exist in the target
    schema.
  • Completeness of algorithm
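
A minimal Python sketch of the two phases, assuming schemas are flat lists of construct names. A real XMLDSS is a tree and each step would also carry a query; both are omitted here, and the function name restructure is hypothetical.

```python
def restructure(source, target):
    """Return the transformation steps and the resulting schema."""
    steps = []
    current = list(source)

    # Growing phase: add every target construct that the source lacks.
    for construct in target:
        if construct not in current:
            steps.append(("add/extend", construct))
            current.append(construct)

    # Shrinking phase: remove every source construct that the target lacks.
    for construct in source:
        if construct not in target:
            steps.append(("delete/contract", construct))
            current.remove(construct)

    return steps, current


source_schema = ["A", "B", "A,B"]            # elements A, B and edge A-B
target_schema = ["A", "C", "B", "A,C", "C,B"]  # element C inserted between A and B
steps, result = restructure(source_schema, target_schema)
```

After both phases the transformed source contains exactly the constructs of the target, which is the property the completeness bullet refers to.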

31
Transformation Types
  • AutoMed primitive transformations
  • add/extend
  • delete/contract
  • rename
  • Schema level
  • Insert, remove or rename schema constructs
  • Move element/subtree
  • Element ↔ attribute

32
Example 1
  • Insert element C
  • ext(<C>,Void,Any)
  • ext(<A,C>, Void,Any)
  • ext(<C,B>, Void,Any)
  • del(<A,B>,q)
  • Remove element C
  • add(<A,B>,q)
  • con(<C>, Void,Any)
  • con(<C,B>, Void,Any)
  • con(<A,C>, Void,Any)

33
Example 2
  • Insert/remove edge move operation

34
Example 3
  • Move
  • add(<root,B>,q3)
  • add(<B,A>, [{b,a} | {a,b} ← <A,B>])
  • delete(<A,B>, [{a,b} | {b,a} ← <B,A>])
  • Complete
  • add(<B>, <B>q1)
  • add(<A,B>, <A,B>q2)
  • delete(<A,B>, <A,B>)
  • delete(<B>, <B>)
  • rename(<B>, <B>)

[Figure: panels labelled Schemas and Data]

35
Example 1 - revisited
  • Actually, this can also be treated with an
    add/delete transformation

36
Example 4
  • Element-to-attribute transformation
  • insert(<A,AB>,q)
  • remove(<A,B>,q)
  • remove(<B,PCDATA>,q)
  • remove(<B>,q)
  • Attribute-to-element transformation
  • insert(<B>,q)
  • insert(<A,B>,q)
  • insert(<B,PCDATA>,q)
  • remove(<A,AB>,q)
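
A minimal Python sketch of the effect of these two transformations on an XML instance, using the standard library's xml.etree.ElementTree. It works on data rather than on the XMLDSS schema, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_attribute(parent, child_tag):
    """Replace child element <child_tag>text</child_tag> with an attribute on parent."""
    child = parent.find(child_tag)
    if child is None:
        return
    parent.set(child_tag, child.text or "")
    parent.remove(child)


def attribute_to_element(parent, attr_name):
    """Replace attribute attr_name on parent with a child element holding its value."""
    if attr_name not in parent.attrib:
        return
    child = ET.SubElement(parent, attr_name)
    child.text = parent.attrib.pop(attr_name)


a = ET.fromstring("<A><B>42</B></A>")
element_to_attribute(a, "B")
as_attribute = ET.tostring(a, encoding="unicode")
attribute_to_element(a, "B")
as_element = ET.tostring(a, encoding="unicode")
```

Applying one helper after the other restores the original document, mirroring how the two slide pathways reverse each other.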

37
Schema Integration
  • Augment with missing constructs
  • Remove redundant constructs

[Figure: schema integration diagram. Labels: S1, S2, .., Sn; insert root; GS1, GS2, .., GSn; restructure; add/extend/delete]

38
Materialisation
  • Strategy
  • Materialise root and its attributes
  • Consider all edges (ep,ec) in a depth-first way
  • Materialise ec and its attributes
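
A minimal Python sketch of the depth-first strategy, assuming the schema is given as a root name, a list of (ep, ec) edges and a dictionary of attribute values per element. It creates one instance per schema element, which is a simplification, and the function name materialise is hypothetical.

```python
import xml.etree.ElementTree as ET


def materialise(root_name, edges, attributes):
    """Build an XML tree by visiting the parent-child edges depth-first."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def build(name):
        # Materialise the element and its attributes, then recurse into its children.
        node = ET.Element(name, attributes.get(name, {}))
        for child in children.get(name, []):
            node.append(build(child))
        return node

    return build(root_name)


tree = materialise(
    "root",
    [("root", "A"), ("A", "B"), ("root", "C")],
    {"A": {"id": "1"}},
)
visit_order = [element.tag for element in tree.iter()]
```

The recursion assumes the edges form a tree; a cyclic schema would need an explicit visited set.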

39
Conclusions
  • XML-specific transformation & integration
    algorithms
  • element ↔ attribute transformations
  • move operation
  • No loss of data by synthetically creating missing
    structure
  • Automation if sources have been previously
    semantically reconciled

40
Future Work
  • Ontologies instead of schema matching
  • XMLDSS
  • Constraints
  • Support for XML databases
  • XQuery capability for XML wrapper