Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration

Provided by: LiXu8
Learn more at: http://www.deg.byu.edu

1
Multifaceted Exploitation of Metadata for
Attribute Match Discovery in Information
Integration
  • David W. Embley
  • David Jackman
  • Li Xu

2
Background
  • Problem: Attribute Matching
  • Matching Possibilities (Facets)
    • Attribute Names
    • Data-Value Characteristics
    • Expected Data Values
    • Data-Dictionary Information
    • Structural Properties

3
Approach
  • Target Schema T
  • Source Schema S
  • Framework
    • Individual Facet Matching
    • Combining Facets
    • Best-First Match Iteration
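
The framework steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not say how facet confidences are combined or how the iteration resolves conflicts, so the plain average in `combine` and the greedy one-to-one selection in `best_first_matches` are hypothetical choices, as are both function names. The default threshold of 0.5 is the value the slides give for the combined measures.

```python
def combine(facet_scores):
    """Average the per-facet confidences for one (source, target) pair.

    Assumption: a plain mean; the slides do not specify the combination rule.
    """
    return sum(facet_scores) / len(facet_scores)


def best_first_matches(confidences, threshold=0.5):
    """Greedily accept the highest-confidence pairs above the threshold.

    confidences maps (source_attr, target_attr) to a list of facet
    confidences. Each attribute takes part in at most one accepted match.
    """
    ranked = sorted(
        ((combine(scores), pair) for pair, scores in confidences.items()),
        reverse=True,
    )
    matches, used_sources, used_targets = [], set(), set()
    for score, (src, tgt) in ranked:
        if score <= threshold:
            break  # every remaining pair scores at or below the threshold
        if src in used_sources or tgt in used_targets:
            continue  # one of the attributes is already matched
        matches.append((src, tgt, score))
        used_sources.add(src)
        used_targets.add(tgt)
    return matches


# Made-up confidences for three facets per candidate pair.
example = {
    ("Miles", "Mileage"): [0.9, 0.8, 1.0],
    ("Miles", "Cost"): [0.1, 0.6, 0.2],
    ("Style", "Cost"): [0.2, 0.1, 0.0],
}
print(best_first_matches(example))
```

With these made-up numbers only the (Miles, Mileage) pair clears the threshold, so it is the only match returned.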

4
Example
[Figure: Source Schema S and Target Schema T. Each is rooted at Car, with "has" relationships to attributes including Style, Cost, Mileage, and Miles.]

5
Individual Facet Matching
  • Attribute Names
  • Data-Value Characteristics
  • Expected Data Values

6
Attribute Names
  • Target and Source Attributes
    • T: attribute A
    • S: attribute B
  • WordNet
  • C4.5 Decision Tree feature selection
    • f0: same word
    • f1: synonym
    • f2: sum of distances to a common hypernym root
    • f3: number of different common hypernym roots
    • f4: sum of the number of senses of A and B
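
To make the five features concrete, here is a minimal Python sketch that computes them over a tiny hand-made lexicon. Everything in it is an assumption for illustration: `SENSES` and `HYPERNYMS` are toy stand-ins for WordNet (not real WordNet data), `name_features` is a hypothetical helper, f2 is taken to be the smallest sum of distances from A and B to any shared hypernym, and f3 counts shared hypernyms that have no parent. The slides do not spell out these definitions, and the C4.5 step is not shown.

```python
from collections import deque

# Toy lexicon standing in for WordNet (hypothetical data, not real synsets).
SENSES = {
    "mileage": ["distance.n"],
    "miles": ["distance.n", "mile_unit.n"],
    "cost": ["cost.n"],
    "price": ["cost.n", "value.n"],
}
HYPERNYMS = {  # synset -> parent synsets
    "distance.n": ["measure.n"],
    "mile_unit.n": ["unit.n"],
    "unit.n": ["measure.n"],
    "cost.n": ["value.n"],
    "value.n": ["measure.n"],
    "measure.n": [],
}


def ancestor_distances(word):
    """Map every synset reachable from any sense of word to its shortest distance."""
    dist = {}
    queue = deque((sense, 0) for sense in SENSES.get(word, []))
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue  # already reached by a path that is at least as short
        dist[node] = d
        for parent in HYPERNYMS.get(node, []):
            queue.append((parent, d + 1))
    return dist


def name_features(a, b):
    """Compute f0..f4 for attribute names a and b under the assumptions above."""
    a, b = a.lower(), b.lower()
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    roots = {s for s in common if not HYPERNYMS.get(s)}
    return {
        "f0": a == b,
        "f1": bool(set(SENSES.get(a, [])) & set(SENSES.get(b, []))),
        "f2": min((da[s] + db[s] for s in common), default=None),
        "f3": len(roots),
        "f4": len(SENSES.get(a, [])) + len(SENSES.get(b, [])),
    }


print(name_features("Mileage", "Miles"))
print(name_features("Cost", "Mileage"))
```

A feature vector like this, one per (A, B) pair, is the kind of input a decision-tree learner such as C4.5 could be trained on.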

7
WordNet Rule
8
Confidence Measures
9
Data-Value Characteristics
  • C4.5 Decision Tree
  • Features
    • Numeric data (mean, variation, standard deviation, ...)
    • Alphanumeric data (string length, numeric ratio, space ratio)
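
The features listed above could be computed along these lines. This is a sketch, not the authors' implementation: the function names are hypothetical, "variation" is read here as population variance (an assumption), and the two ratios are taken over all characters in the column.

```python
import statistics


def numeric_features(values):
    """Summary statistics for a non-empty column of numeric values."""
    return {
        "mean": statistics.mean(values),
        # Assumption: "variation" is interpreted as population variance.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


def alphanumeric_features(values):
    """Average length, digit ratio, and space ratio for a non-empty column of strings."""
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    return {
        "avg_length": total_chars / len(values),
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }


print(numeric_features([2, 4, 4, 4, 5, 5, 7, 9]))
print(alphanumeric_features(["Ford F150", "AB 12"]))
```

Feature vectors computed this way for a source column and a target column can then be compared or passed to a classifier.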

10
Confidence Measures
11
Expected Data Values
  • Target Schema T and Source Schema S
    • Regular expression recognizer for attribute A in T
    • Data instances for attribute B in S
  • Hit Ratio = N'/N for (A, B) match
    • N': number of B data instances recognized by the regular expressions of A
    • N: number of B data instances
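
The hit ratio described above can be sketched in a few lines of Python. The patterns and sample values are invented for illustration, `hit_ratio` is a hypothetical helper, and treating "recognized" as a full match of the trimmed value is an assumption; the slides do not say whether partial matches count.

```python
import re


def hit_ratio(patterns, instances):
    """Fraction of source instances fully matched by at least one target pattern."""
    if not instances:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    hits = sum(
        1
        for value in instances
        if any(rx.fullmatch(value.strip()) for rx in compiled)
    )
    return hits / len(instances)


# Hypothetical recognizer for a target attribute such as Mileage.
MILEAGE_PATTERNS = [r"\d{1,3}(,\d{3})*( miles)?", r"\d+[kK]"]
source_values = ["45,000", "120k", "red", "98,500 miles"]
print(hit_ratio(MILEAGE_PATTERNS, source_values))
```

Three of the four sample values are recognized here, giving a hit ratio of 0.75.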

12
Confidence Measures
13
Combined Measures
Threshold: 0.5
14
Final Confidence Measures
15
Experimental Results
  • Matched Attributes
    • 100% (32 of 32)
  • Unmatched Attributes
    • 99.5% (374 of 376)
    • Feature --- Color
    • Feature --- Body Type

16
Conclusions
  • Direct Attribute Matching: feasible
  • Individual-Facet Matching: good
  • Multifaceted Matching: better

17
Future Work
  • Additional Facets
  • More Sophisticated Combinations
  • Additional Application Domains
  • Automating Feature Selection
  • Indirect Attribute Matching

www.deg.byu.edu