Title: A Survey of Approaches to Automatic Schema Matching
1 A Survey of Approaches to Automatic Schema Matching
Erhard Rahm Philip A. Bernstein
The VLDB Journal 10: 334-350 (2001)
2 The Problem
- Schema matching
- Input schemas
- Output mappings
- Motivations
- Manual schema matching
- Generic and customizable schema matching
3 Application Domains
- Schema integration: structures and terminological relationships
- Data warehouses: source-to-warehouse transformation
- E-commerce: message translation
- Semantic query processing: a run-time scenario
4 The Match Operator
- Representations of input schemas and output mapping
- Schema representation
- Schema elements
- Structure
- Mapping representation
- Mapping elements
- Mapping expressions
- Matching Function
- Mathematically unsatisfying
- Heuristics
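The bullets above can be sketched as a small data model. This is only an illustration under stated assumptions, not the representation used in the survey: the class names `SchemaElement` and `MappingElement`, the dotted path notation, and the toy heuristic `same_leaf_name` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    """A schema element identified by a dotted path (hypothetical notation)."""
    path: str                 # e.g. "Cust.CName"
    data_type: str = "string"


@dataclass(frozen=True)
class MappingElement:
    """Relates elements of S1 to elements of S2, with a mapping expression."""
    source: tuple             # element paths from S1
    target: tuple             # element paths from S2
    expression: str = ""      # e.g. "Customer.cname = Cust.CName"


def match(s1, s2, similar):
    """Return one mapping element per pair accepted by the heuristic."""
    mapping = []
    for a in s1:
        for b in s2:
            if similar(a, b):
                mapping.append(
                    MappingElement((a.path,), (b.path,), f"{b.path} = {a.path}")
                )
    return mapping


def same_leaf_name(a, b):
    """Toy heuristic: compare the last path component, ignoring case."""
    return a.path.split(".")[-1].lower() == b.path.split(".")[-1].lower()


s1 = [SchemaElement("Cust.CName"), SchemaElement("Cust.CAddress")]
s2 = [SchemaElement("Customer.cname"), SchemaElement("Customer.phone")]
result = match(s1, s2, same_leaf_name)
```

Passing the heuristic in as a function reflects the point that Match cannot be defined exactly and has to rely on interchangeable heuristics.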
5 Architecture for Generic Match
- Tool 1: Portal schemas
- Tool 2: E-business schemas
- Tool 3: Data warehousing schemas
- Schema import/export
- Internal schema representation
- Generic Match implementation
- Global libraries (dictionaries, schemas)
6 Classification of Approaches
- Individual matchers
- Instance vs Schema
- Element vs Structure Matching
- Language vs Constraint
- Matching cardinality: 1:1, 1:n, n:1, and n:m
- Auxiliary Information
- Combinations of multiple matchers
7 Schema-level Approaches
- Granularity of match: element-level vs. structure-level
- Match cardinality
- Linguistic approaches
- Constraint-based approaches
- Reusing schema and mapping information
8 Granularity of Match
9 Match Cardinality
10 Linguistic Approaches
- Name Matching
- Equality of names
- Equality of canonical name representations
- Equality of synonyms
- Equality of hypernyms
- Similarity of names based on common substrings, edit distance, pronunciation, and soundex
- User-provided name matches
- Description Matching
- Ex. S1: empn // employee name
- Ex. S2: name // name of employee
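The name and description criteria above can be sketched as follows. This is a minimal illustration, not a method from the survey: the synonym table, the stop-word list, and the score of 0.9 for synonyms are invented for the example, and `difflib.SequenceMatcher` stands in for the common-substring similarity (it is not an edit-distance or soundex measure).

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}
# Hypothetical stop words ignored when comparing descriptions.
STOPWORDS = {"of", "the", "a", "an"}


def canonical(name):
    """Canonical form: lower case with separators and punctuation removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(a, b):
    """Score two element names in [0, 1]."""
    ca, cb = canonical(a), canonical(b)
    if ca == cb:                           # equality of canonical names
        return 1.0
    if frozenset({ca, cb}) in SYNONYMS:    # equality of synonyms
        return 0.9                         # arbitrary score for the sketch
    # Similarity based on common substrings.
    return SequenceMatcher(None, ca, cb).ratio()


def description_similarity(d1, d2):
    """Jaccard overlap of the content words in two element comments."""
    t1 = set(d1.lower().split()) - STOPWORDS
    t2 = set(d2.lower().split()) - STOPWORDS
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


by_name = name_similarity("empn", "name")
by_description = description_similarity("employee name", "name of employee")
```

For the S1/S2 example, the names `empn` and `name` share little, while the two comments contain the same content words, which is the case description matching is meant to catch.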
11 Constraint-based Approaches
12 Reusing Schema and Mapping Information
13 Instance-level Approaches
- Linguistic characterization
- Information retrieval techniques
- Ex. Extracting keywords and themes
- Constraintbased characterization
- Numeric value ranges
- Numeric value averages
- Character patterns (phone numbers, ISBNs, SSNs)
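A constraint-based characterization of instance data could look roughly like the sketch below. The regular expressions are simplified US-style formats chosen for illustration, and the function name `characterize` and the 0.8 threshold are assumptions rather than anything prescribed by the survey.

```python
import re
from statistics import mean

# Simplified, hypothetical patterns; real data needs more careful formats.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "isbn13": re.compile(r"^97[89]-?\d{10}$"),
}


def characterize(values, threshold=0.8):
    """Summarize a column by numeric range, average, and character pattern."""
    values = [str(v).strip() for v in values]
    if not values:
        return {}
    profile = {}
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        numbers = None        # at least one value is not numeric
    if numbers:
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["mean"] = mean(numbers)
    for label, pattern in PATTERNS.items():
        share = sum(1 for v in values if pattern.match(v)) / len(values)
        if share >= threshold:
            profile["pattern"] = label
            break
    return profile


salaries = characterize([10, 20, 30])
phones = characterize(["425-555-0100", "206-555-0199"])
```

Two columns whose profiles agree (same pattern label, overlapping ranges, similar averages) would then be match candidates even if their names differ.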
14 Combining Different Matchers
- Hybrid matchers
- Hardwired combination of multiple matching criteria
- Better performance
- Composite matchers
- Independent basic matchers
- Flexible execution order
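A composite matcher can be sketched as a weighted combination of independently written basic matchers. The two basic matchers, the weights, and the 0.6 threshold below are illustrative assumptions, not values from the survey.

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """Basic matcher 1: string similarity of lower-cased names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a, b):
    """Basic matcher 2: 1.0 if the declared data types agree, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite(matchers, weights):
    """Combine independent matchers into one weighted-average score."""
    total = sum(weights)

    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

    return combined


def best_matches(s1, s2, score, threshold=0.6):
    """For each S1 element, keep the best-scoring S2 element above threshold."""
    result = {}
    for a in s1:
        best = max(s2, key=lambda b: score(a, b))
        if score(a, best) >= threshold:
            result[a["name"]] = best["name"]
    return result


s1 = [{"name": "CustName", "type": "string"}, {"name": "CustID", "type": "int"}]
s2 = [{"name": "customer_name", "type": "string"}, {"name": "cust_id", "type": "int"}]
score = composite([name_matcher, type_matcher], [0.7, 0.3])
matches = best_matches(s1, s2, score)
```

Because the matchers are passed in as a list, they can be added, removed, or reweighted without touching the combination logic; a hybrid matcher would instead fold both criteria into a single hardwired function.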
15 Sample Approaches
- SEMINT
- LSD
- SKAT
- TranScm
- DIKE
- ARTEMIS
- CUPID
18 Conclusion
- Propose a taxonomy that covers many of the existing approaches
- Suggest quantitative work on the relative performance and accuracy of different approaches