Composing Mappings between Schemas using a Reference Ontology

About This Presentation
Title:

Composing Mappings between Schemas using a Reference Ontology

Description:

Page 1. Composing Mappings between Schemas using a Reference Ontology - ODBASE'04 ... order schemas: CIDR, Excel, Noris, Paragon, and Apertum used to evaluate COMA. ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Composing Mappings between Schemas using a Reference Ontology


1
Composing Mappings between Schemas using a
Reference Ontology
Eduard Dragut, Ramon Lawrence
Iowa Database and Emerging Applications (IDEA) Laboratory
University of Iowa
eduard-dragut, ramon-lawrence_at_uiowa.edu

2
Outline
  • Motivation
  • Integration Approach
  • Background
  • Architecture Overview
  • Ontological Matching
  • Composing Mappings
  • Global View Construction
  • Experimental Results
  • Future Work and Conclusions

3
Motivation
  • Many organizations have pre-existing ontologies
    that are not suitable as global views but are
    suitable as reference ontologies to aid
    integration.
  • Example: the National Cancer Institute (NCI) and
    National Institutes of Health (NIH) have the caBIG
    grid prototype, which standardizes terminology
    (EVS, caDSR) and data elements in the cancer domain.
  • Schema-to-ontology matching requires integrators
    to understand only their own schema instead of all
    schemas that they may want to integrate.

4
Integration Approach
[Figure: Reference Ontology]
Composing Mappings between Schemas using a
Reference Ontology - ODBASE'04 - Eduard Dragut,
Ramon Lawrence

5
Background: Ontologies and Integration
  • Ontologies as the integrated, global view
  • Carnot project (Collet91) with Cyc ontology
    (Lenat90)
  • ONTOBROKER (Decker98), OBSERVER (Mena00)
  • Tools for semi-automatically merging ontologies
  • PROMPT (Noy00), Ontobuilder (Gal04)
  • Use ontologies as matching/integration aids
  • MOMIS (Beneventano03) using WordNet
  • Indirect (Xu03), CUPID (Madhavan01), COMA (Do02)
  • Matching ontologies (Doan02)
  • Discovering ontologies (Madhavan03)
  • Corpus-based matching

6
Background: Model Management
  • Model management as proposed by (Bernstein03) is
    intended to allow high-level schema operations.
  • Operators include Invert, Compose, Match, Merge.
  • Warning: The semantics of the operators are not
    all fully defined yet, and some operators are not
    completely automatic.
  • Definitions:
  • A match is a semantic correspondence between
    schema elements.
  • A mapping between schema elements is an
    expression that relates the elements.
  • Note that most schema matching systems such as
    COMA produce matches not mappings.

7
Architecture Overview
  • We assume the existence of a pre-existing
    reference ontology that has been accepted in a
    domain.
  • The ontology is NOT a global view and may not
    cover the information in all schemas. It cannot
    be edited.
  • Global view construction is a 3-step process:
  • 1) Independently match each schema to the
    ontology.
  • 2) Compose schema-to-ontology matches to produce
    schema-to-schema mappings.
  • 3) Merge the schema mappings to produce the
    global view.
  • The challenge is to automate this as much as
    possible.

8
Benefits of Approach
  • Even with manual integration there are several
    benefits to using a reference ontology:
  • 1) An integrator must only understand their
    schema and the ontology and not other schemas to
    be integrated.
  • 2) Most validation is performed once during
    schema-to-ontology matching and not for every
    schema integrated.
  • 3) Schema-to-ontology matchings can be re-used
    every time a new schema is integrated into the
    federation.
  • Automation can:
  • 1) Help construct schema-to-ontology matchings.
  • 2) Perform composition of mappings.
  • 3) Build a global view from the composed mappings.

9
Automation Challenges
  • There are several challenges in automating this
    process:
  • 1) Schema matching systems such as COMA are
    designed for simpler relational schemas.
    Ontologies must be mapped into a suitable format
    for use with COMA.
  • 2) Schema-to-ontology matching is less accurate
    due to more complicated ontological structure and
    because the ontology may not model the entire
    domain or may model it differently.
  • 3) Composing matchings often results in many
    false matches which must be handled.
  • 4) A method for merging schemas using model
    management primitive operators is required.
  • Even with these operators, Merge is not fully
    automatic.

10
Background: COMA
  • COMA (Do02) is a schema matching system that can
    flexibly combine different match algorithms and
    re-use match results.
  • Match algorithms use names, paths, and schema
    properties in various ways.
  • The mapping format between two schemas R and S is
    a triple (r, s, v) where r is in R, s is in S, and
    v is the similarity value in [0, 1] between
    elements r and s.
  • A schema in COMA is represented as a rooted
    directed acyclic graph. Schema elements are
    nodes which may be connected by links of
    different types.

11
Ontological Matching
  • The first step is to convert ontologies in
    OWL/DAML format into COMA's graph representation
    format.
  • Wrote a program that used the Jena parser.
  • During the conversion:
  • 1) Explicitly converted a named relationship in
    the ontology into a node and several edges in
    graph.
  • 2) Explicitly encoded attributes inherited over
    IS-A links since COMA does not support IS-A.
  • After conversion, COMA would automatically
    produce a schema-to-ontology match as it would
    appear to be matching two relational schemas.

12
Converting Ontology to a Graph
13
Ontological Matching: Max versus NoMax
  • One challenge: what should this match look
    like?
  • Two choices:
  • 1) Max - For each schema element, keep the best
    match with the ontology (if any).
  • 2) NoMax - For each schema element, keep all the
    matches that are above the cutoff threshold.
  • Since Max only generates one match, it is
    probably the best in semi-automated settings.
    NoMax will generate many matches which must be
    filtered out by the user or during composition.
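
The two selection strategies can be sketched as follows. This is a minimal illustration, not COMA's implementation: the match triples, the function names `no_max` and `max_only`, and the choice to treat "above the cutoff" as "at or above" are all assumptions made for the example.

```python
# Hypothetical match triples: (schema_element, ontology_concept, similarity).
matches = [
    ("postalCode", "Zip", 0.8),
    ("postalCode", "Address", 0.55),
    ("city", "City", 0.9),
    ("street1", "Street", 0.4),
]


def no_max(matches, threshold):
    """NoMax: keep every match whose similarity reaches the cutoff."""
    return [m for m in matches if m[2] >= threshold]


def max_only(matches, threshold):
    """Max: keep only the best match (if any) for each schema element."""
    best = {}
    for elem, concept, sim in no_max(matches, threshold):
        if elem not in best or sim > best[elem][2]:
            best[elem] = (elem, concept, sim)
    return list(best.values())


print(no_max(matches, 0.5))    # three matches survive the cutoff
print(max_only(matches, 0.5))  # one match per schema element
```

With these sample values, NoMax keeps both candidates for postalCode, while Max keeps only the stronger one, which is the trade-off the slide describes.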

14
Composing Mappings
  • Schema-to-ontology mappings must be composed to
    produce direct schema-to-schema mappings.
  • Since mappings carry no semantics, two objects
    are assumed to be identical if they map to the
    same ontological concept. Composition is
    performed transitively and is implemented using a
    natural join.
  • That is, if element r is similar to o and o is
    similar to s, then we assume that r is similar to
    s.
  • For example:
  • <postalCode, Zip, 0.8> and <Zip, postCode, 0.7> can
    be composed to yield <postalCode, postCode, 0.75>.
  • The similarity values may be combined using
    various functions, although average is the most
    common.
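
The composition step can be sketched as a join on the shared ontology concept. This is an illustrative sketch under assumptions: the function name `compose` and its `combine` parameter are invented here, and the second match list is assumed to be already oriented from ontology to schema (that is, already inverted).

```python
def compose(r_to_o, o_to_s, combine=lambda a, b: (a + b) / 2):
    """Join two match lists on the shared middle element.

    r_to_o: triples (r, o, v1); o_to_s: triples (o, s, v2).
    Returns triples (r, s, combine(v1, v2)); the default combiner is the average.
    """
    by_concept = {}
    for o, s, v in o_to_s:
        by_concept.setdefault(o, []).append((s, v))
    return [
        (r, s, combine(v1, v2))
        for r, o, v1 in r_to_o
        for s, v2 in by_concept.get(o, [])
    ]


# The example from the slide.
result = compose([("postalCode", "Zip", 0.8)], [("Zip", "postCode", 0.7)])
print(result)
```

Other combiners, such as `min` or a product, can be passed as `combine` to experiment with how similarity values degrade under composition.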

15
Composition Example
16
Global View Construction
  • One of the possible applications of constructing
    schema-to-schema mappings in this way is using
    them to build a global view.
  • We have given a script in the paper that uses
    model management operators to compose any number
    of schema-to-ontology mappings into a single
    global view for all sources.
  • Note that this algorithm is not perfect nor fully
    automatic as the mappings are not perfect and the
    Merge operator may require human intervention.

17
Global View Construction Example
18
Experimental Setup
  • Matched the 5 sample order schemas (CIDR, Excel,
    Noris, Paragon, and Apertum) used to evaluate
    COMA. Numbered these schemas 1, 2, 3, 4, and 5.
  • Created a reference ontology that models some of
    the domain (but not all of it) and is quite
    different than the schemas (uses IS-A for
    example).
  • Used the matchings specified with COMA as
    ground-truth.
  • Evaluation metrics:
  • Precision = # of correct matches / # of suggested
    matches
  • Recall = # of correct matches returned / # of total
    matches
  • Overall = Recall * (2 - 1 / Precision)
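
The three metrics can be computed as in the sketch below. It reads the Overall line as Recall * (2 - 1/Precision), the combined measure used in the COMA evaluation. The sample match sets and function names are hypothetical, and the functions assume non-empty inputs and non-zero precision.

```python
def precision(suggested, truth):
    """Number of correct matches / number of suggested matches."""
    return len(suggested & truth) / len(suggested)


def recall(suggested, truth):
    """Number of correct matches returned / number of ground-truth matches."""
    return len(suggested & truth) / len(truth)


def overall(p, r):
    """Recall * (2 - 1/Precision); negative when precision is below 0.5."""
    return r * (2 - 1 / p)


# Hypothetical ground truth (5 matches) and suggestions (4 matches, 3 correct).
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")}
suggested = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}

p = precision(suggested, truth)
r = recall(suggested, truth)
print(p, r, overall(p, r))
```

Overall penalizes wrong suggestions: once more than half of the suggested matches are incorrect, the value drops below zero, reflecting that fixing the result costs more than matching by hand.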

19
Reference Order Ontology
20
Experiment 1: Schema-to-Ontology Matching
  • Goal: Evaluate the accuracy of schema-to-ontology
    matching.
  • Method:
  • Automatically convert ontology into COMA format
    and match each schema with ontology.
  • Evaluation:
  • Measured the percent overlap of the schema and
    ontology. For many schemas, only 60% of their
    concepts were in the ontology.
  • Evaluated the precision, recall, and overall
    measures relative to the number of matches that
    could be found.
  • E.g. If overlap was 60% and recall was 50%, then
    only 30% of all schema elements were matched BUT
    of all the possible matches, 50% were found.
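
The arithmetic in the example above can be checked with a small sketch. It assumes the figures 60, 50, and 30 on the slide are percentages; the element counts and the function name `matched_share` are made up for illustration.

```python
def matched_share(total_elements, elements_in_ontology, matches_found):
    """Return (overlap, recall relative to findable matches, share of all elements)."""
    overlap = elements_in_ontology / total_elements
    relative_recall = matches_found / elements_in_ontology
    share_of_all = matches_found / total_elements
    return overlap, relative_recall, share_of_all


# Hypothetical schema: 50 elements, 30 covered by the ontology, 15 matched.
overlap, relative_recall, share_of_all = matched_share(50, 30, 15)
print(overlap, relative_recall, share_of_all)
```

Reporting recall relative to the findable matches separates the quality of the matcher from the coverage of the ontology.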

21
Experiment 1 Results
noMax is poor for schema 5 as Buyer is incorrectly
matched to the ontology.
22
Experiment 2: Schema-to-Schema Mappings
  • Goal: Determine the accuracy of producing
    schema-to-schema mappings by composing
    schema-to-ontology matchings.
  • Method:
  • Used automatically generated schema-to-ontology
    matchings and composed them. Evaluated
    composition result against COMA answers for
    direct matching.
  • Evaluated noMax and Max techniques and manual
    mappings.

23
Experiment 2 Results (Overall)
1 <-> 2 is poor because of Street mapping.
4 <-> 5 is poor because of Buyer mapping.
24
Experiment 3: Improving Direct Matches
  • Goal: Determine if the accuracy of producing
    direct schema-to-schema mappings can be improved
    by re-using schema-to-ontology matches.
  • Method:
  • Generate schema-to-schema mappings by composing
    schema-to-ontology matchings and then use this as
    past matching information for COMA.
  • Allow COMA to perform direct match given this
    information.
  • Evaluated noMax and Max techniques and manual
    mappings.

25
Experiment 3 Results (Overall)
1 <-> 2 is poor because of Street mapping.
26
Discussion and Conclusions
  • Major findings:
  • 1) Schema-to-ontology mappings can be constructed
    with good accuracy (70-80% precision, 60%
    recall).
  • 2) The composition of schema-to-ontology
    matchings produces similar results to direct
    matching with COMA.
  • 3) Max has higher precision than noMax but
    lower recall. Max is probably best when the user
    must filter out incorrect matches, as it always
    saves work.
  • 4) It is valuable to re-use schema-to-ontology
    matchings (either automatic or manually
    constructed) to improve the accuracy of direct
    matchings.
  • Major conclusion: There is a benefit to building
    semi-automatic schema-to-ontology matchings for
    use in integration and global view construction.

27
Future Work and Challenges
  • The major challenge is that the mappings carry no
    semantics which often results in incorrect
    matches suggested after composition.
  • We are currently working on extending the
    mappings to capture semantics to avoid many of
    these cases.
  • The approach is not fully automatic (nor will it
    ever be). However, most manual work is in the
    schema-to-ontology matching stage. We need
    better algorithms and tools to support this
    matching.
  • Want to perform experimental evaluation on larger
    ontologies such as those from NCI.
  • Issue: Many ontologies are not in a suitable form
    for intermediate mapping with schemas (they are
    just taxonomies).

28
Composing Mappings between Schemas using a
Reference Ontology
Eduard Dragut, Ramon Lawrence
Iowa Database and Emerging Applications (IDEA) Laboratory
University of Iowa
eduard-dragut, ramon-lawrence_at_uiowa.edu
29
Extra Slides
30
Ontology Conversion Algorithm
  • 1) Each ontology concept (class) becomes a node
    in the graph.
  • 2) For each property (attribute) of a class, add
    a node to the graph and connect it to its class.
  • 3) Non-basetype properties (those with domain and
    range in ontology) are converted by:
  • 3a) Creating a node in the graph for the
    relationship.
  • 3b) Adding an edge from the class domain to this
    node.
  • 3c) Adding an edge from the new node to the range
    class.
  • Note: Properties whose domain or range is a
    union/intersection of concepts are not currently
    supported.
  • 4) IS-A expanded by graph traversal.
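
The conversion steps above can be sketched with plain Python data structures. This is not the authors' Jena-based program: the input format, the function name `ontology_to_graph`, and the `Class.attribute` node naming are assumptions, and step 4 is interpreted here as copying inherited attributes down to each subclass.

```python
def ontology_to_graph(classes, datatype_props, object_props, is_a):
    """Convert a simplified ontology description into (nodes, edges).

    classes: list of class names
    datatype_props: {class: [attribute names]}
    object_props: [(relationship name, domain class, range class)]
    is_a: {subclass: superclass}
    """
    nodes = set(classes)            # step 1: one node per class
    edges = set()

    def attributes_of(cls):
        # Step 4: walk up the IS-A chain to collect inherited attributes.
        attrs, seen = [], set()
        while cls is not None and cls not in seen:
            seen.add(cls)
            attrs.extend(datatype_props.get(cls, []))
            cls = is_a.get(cls)
        return attrs

    for cls in classes:             # step 2: one node per attribute
        for attr in attributes_of(cls):
            node = f"{cls}.{attr}"
            nodes.add(node)
            edges.add((cls, node))

    for name, domain, range_cls in object_props:
        nodes.add(name)             # step 3a: node for the relationship
        edges.add((domain, name))   # step 3b: domain class -> relationship
        edges.add((name, range_cls))  # step 3c: relationship -> range class
    return nodes, edges


# Hypothetical mini-ontology.
nodes, edges = ontology_to_graph(
    classes=["Party", "Buyer", "Address"],
    datatype_props={"Party": ["name"], "Address": ["zip"]},
    object_props=[("hasAddress", "Party", "Address")],
    is_a={"Buyer": "Party"},
)
print(sorted(nodes))
print(sorted(edges))
```

In this sketch only datatype attributes are propagated over IS-A. Whether relationships should be inherited as well is left open, as is the handling of relationship cycles, which would break the acyclic-graph requirement.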

31
Mapping Composition Challenges
Composing an N:1 match with a 1:N match results in a
cross-product.
We cannot handle these cases, as mappings have no
semantics.
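
The cross-product effect described above can be reproduced with a small sketch. The element names and similarity values are hypothetical and chosen only to show the blow-up.

```python
# N:1 match: two elements of schema A map to the same ontology concept.
a_to_o = [("street1", "Street", 0.7), ("street2", "Street", 0.7)]
# 1:N match: that concept maps to two elements of schema B.
o_to_b = [("Street", "addrLine1", 0.6), ("Street", "addrLine2", 0.6)]

# Composition joins on the shared concept, pairing every A element with every B element.
composed = [
    (a, b, (v1 + v2) / 2)
    for a, o1, v1 in a_to_o
    for o2, b, v2 in o_to_b
    if o1 == o2
]

for triple in composed:
    print(triple)
print(len(composed))
```

The join yields 2 x 2 = 4 candidate pairs, and nothing in the triples says which pairs are intended, so at least some of them are false matches that must be filtered later.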
32
Global View Construction Script
Computes Global View of N Source Schemas (with
ontology mappings)
Operator GlobalView(ArraySchemas, ArrayMappings, O, n)
  // ArraySchemas stores the n schemas
  // ArrayMappings stores the n schema-to-ontology mappings
  If n <= 0 Then Return empty schema
  If n = 1 Then Return ArraySchemas[0]
  S1 = ArraySchemas[0]
  S2 = ArraySchemas[1]
  map1 = ArrayMappings[0]
  map2 = ArrayMappings[1]
  <S, map> = GlobalView2(S1, S2, map1, map2, O)
  For (i = 2; i <= n-1; i++)
    S1 = S
    map1 = map
    S2 = ArraySchemas[i]
    map2 = ArrayMappings[i]
    <S, map> = GlobalView2(S1, S2, map1, map2, O)
  end for
  Return <S, map>
33
Global View Construction Script (2)
Computes Global View of Two Source Schemas (with
ontology mappings)
Operator GlobalView2(S1, S2, S1_O, S2_O, O)
  S1_S2 = Compose(S1_O, Invert(S2_O))
  <M, S1_M, S2_M> = Merge(S1, S2, S1_S2)
  M_O = Compose(Invert(S1_M), S1_O) ∪ Compose(Invert(S2_M), S2_O)
  Return <M, M_O>
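
The two scripts can be sketched in Python as a fold over the source schemas. This is a rough illustration under strong assumptions, not the paper's operators: schemas are sets of element names that are unique across sources, mappings are sets of pairs without similarity values, and `merge` is a naive stand-in that folds each matched element of the second schema into one partner from the first. The ontology argument is omitted because the script body never uses it.

```python
def invert(mapping):
    """Invert: swap the direction of every pair."""
    return {(b, a) for a, b in mapping}


def compose(m1, m2):
    """Compose: join m1 (A to B) with m2 (B to C) on the shared element."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def merge(s1, s2, s1_s2):
    """Naive Merge stand-in: returns (merged schema, s1-to-M map, s2-to-M map)."""
    s1_m = {(e, e) for e in s1}
    partner = {}
    for a, b in sorted(s1_s2):      # sorted for a deterministic choice of partner
        partner.setdefault(b, a)
    s2_m = {(e, partner.get(e, e)) for e in s2}
    merged = {m for _, m in s1_m | s2_m}
    return merged, s1_m, s2_m


def global_view2(s1, s2, s1_o, s2_o):
    s1_s2 = compose(s1_o, invert(s2_o))
    m, s1_m, s2_m = merge(s1, s2, s1_s2)
    m_o = compose(invert(s1_m), s1_o) | compose(invert(s2_m), s2_o)
    return m, m_o


def global_view(schemas, mappings):
    """Fold global_view2 over all sources; returns (view, view-to-ontology map)."""
    if not schemas:
        return set(), set()
    view, view_o = schemas[0], mappings[0]
    for s, s_o in zip(schemas[1:], mappings[1:]):
        view, view_o = global_view2(view, s, view_o, s_o)
    return view, view_o


# Hypothetical sources with qualified element names.
schemas = [
    {"s1.postalCode", "s1.city"},
    {"s2.postCode", "s2.phone"},
    {"s3.zip", "s3.town"},
]
mappings = [
    {("s1.postalCode", "Zip"), ("s1.city", "City")},
    {("s2.postCode", "Zip")},
    {("s3.zip", "Zip"), ("s3.town", "City")},
]
view, view_o = global_view(schemas, mappings)
print(sorted(view))
print(sorted(view_o))
```

Unlike the script, this sketch returns the schema together with its ontology mapping even when there is a single source. A real Merge would also have to resolve structural conflicts and may need human input, which the stand-in ignores.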
34
Sample Order Schema: Excel XML Schema
<?xml version="1.0"?>
<Schema name="PurchaseOrder.biz"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="PurchaseOrder" content="eltOnly">
    <element type="Header"/>
    <element type="Items"/>
    <element type="Footer"/>
    <element type="InvoiceTo"/>
    <element type="DeliverTo"/>
  </ElementType>
  <ElementType name="Items" content="eltOnly">
    <AttributeType name="itemCount" dt:type="int"></AttributeType>
    <attribute type="itemCount"/>
    <element type="Item" maxOccurs="*" minOccurs="1"/>
  </ElementType>
  <ElementType name="Item" content="empty">
    <AttributeType name="yourPartNumber" dt:type="string"></AttributeType>
    <AttributeType name="unitPrice" dt:type="number"></AttributeType>
    <AttributeType name="unitOfMeasure" dt:type="string"></AttributeType>
    <AttributeType name="salesValue" dt:type="number"></AttributeType>
    <AttributeType name="quantity" dt:type="number"></AttributeType>
    <AttributeType name="partNumber" dt:type="string"></AttributeType>
    <AttributeType name="partDescription" dt:type="string"></AttributeType>
    <AttributeType name="itemNumber" dt:type="int"></AttributeType>
35
Sample Order Schema: Excel XML Schema (2)
    <attribute type="itemNumber"/>
    <attribute type="yourPartNumber"/>
    <attribute type="partNumber"/>
    <attribute type="partDescription"/>
    <attribute type="quantity"/>
    <attribute type="unitOfMeasure"/>
    <attribute type="unitPrice"/>
    <attribute type="salesValue"/>
  </ElementType>
  <ElementType name="InvoiceTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Header" content="eltOnly">
    <AttributeType name="yourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="ourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="orderNum" dt:type="string"></AttributeType>
    <AttributeType name="orderDate" dt:type="date"></AttributeType>
    <attribute type="orderNum"/>
    <attribute type="orderDate"/>
    <attribute type="ourAccountCode"/>
    <attribute type="yourAccountCode"/>
    <element type="Contact"/>
  </ElementType>
36
Sample Order Schema: Excel XML Schema (3)
  <ElementType name="Footer" content="empty">
    <AttributeType name="totalValue" dt:type="number"></AttributeType>
    <attribute type="totalValue"/>
  </ElementType>
  <ElementType name="DeliverTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Contact" content="empty">
    <AttributeType name="telephone" dt:type="string"></AttributeType>
    <AttributeType name="e-mail" dt:type="string"></AttributeType>
    <AttributeType name="contactName" dt:type="string"></AttributeType>
    <AttributeType name="companyName" dt:type="string"></AttributeType>
    <attribute type="contactName"/>
    <attribute type="companyName"/>
    <attribute type="e-mail"/>
    <attribute type="telephone"/>
  </ElementType>
37
Sample Order Schema: Excel XML Schema (4)
  <ElementType name="Address" content="empty">
    <AttributeType name="street4" dt:type="string"></AttributeType>
    <AttributeType name="street3" dt:type="string"></AttributeType>
    <AttributeType name="street2" dt:type="string"></AttributeType>
    <AttributeType name="street1" dt:type="string"></AttributeType>
    <AttributeType name="stateProvince" dt:type="string"></AttributeType>
    <AttributeType name="postalCode" dt:type="string"></AttributeType>
    <AttributeType name="country" dt:type="string"></AttributeType>
    <AttributeType name="city" dt:type="string"></AttributeType>
    <attribute type="street1"/>
    <attribute type="street2"/>
    <attribute type="street3"/>
    <attribute type="street4"/>
    <attribute type="city"/>
    <attribute type="stateProvince"/>
    <attribute type="postalCode"/>
    <attribute type="country"/>
  </ElementType>
</Schema>
38
Experiment 2 Precision
39
Experiment 2 Recall
40
Experiment 3 Results (Precision)
41
Experiment 3 Results (Recall)