Title: Direct and Indirect Matching of Schema Elements for Data Integration on the Web
1. Direct and Indirect Matching of Schema Elements for Data Integration on the Web
- Li Xu
- Data Extraction Group
- Brigham Young University
- Sponsored by NSF
2. Schema Matching
[Figure: car schema diagrams to be matched. Recoverable element labels: Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone. Recoverable schema label: Source.]
3. Mapping
- Direct Matches
- Indirect Matches
  - Union
  - Selection
  - Composition
  - Decomposition
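The four indirect-match kinds listed above can be read as small operations that derive a virtual element on one side so that it lines up with an element on the other. The following is a minimal sketch under that reading; the helper names and all data values are hypothetical, and only the attribute names (Make, Model, Make Model, Color, Body Type, Feature) come from the slides.

```python
# Toy illustration of the four indirect-match operations.
# Helper names and sample values are hypothetical.

def union(*columns):
    """Union: several source value sets together match one target element."""
    merged = []
    for column in columns:
        for value in column:
            if value not in merged:
                merged.append(value)
    return merged

def selection(column, predicate):
    """Selection: only a subset of the source values matches the target element."""
    return [value for value in column if predicate(value)]

def composition(left, right, sep=" "):
    """Composition: concatenate two source attributes into one target attribute."""
    return [f"{a}{sep}{b}" for a, b in zip(left, right)]

def decomposition(column, sep=" "):
    """Decomposition: split one source attribute into two target attributes."""
    firsts, rests = [], []
    for value in column:
        first, _, rest = value.partition(sep)
        firsts.append(first)
        rests.append(rest)
    return firsts, rests

# Make + Model composed into Make Model, and split back again.
make_model = composition(["Ford", "Honda"], ["Mustang", "Accord"])
makes, models = decomposition(make_model)

# Color and Body Type values pooled into Feature, then Color selected back out.
feature = union(["red", "blue"], ["coupe", "sedan"])
colors = selection(feature, lambda v: v in {"red", "blue"})
```

Composition and decomposition are inverses here only because the toy separator never occurs inside a make; real values would need a recognizer rather than a plain split.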
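4. Union and Selection
[Figure: the car schema diagrams, annotated for union and selection matches. Recoverable element labels: Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone. Recoverable schema label: Source.]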
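5. Composition and Decomposition
[Figure: the car schema diagrams, annotated for composition and decomposition matches. Recoverable element labels: Car, Year, Make, Model, Make Model, Color, Feature, Body Type, Style, Cost, Phone. Recoverable schema label: Source.]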
6. Matching Techniques
- Terminological Relationships
- Value Characteristics
- Expected Data Values
- Structure
7. Terminological Relationships
- WordNet
- Machine-Learned Rules
- Example (Make, Brand)
- The number of different common hypernym roots of A and B
- The sum of the number of senses of A and B
- The sum of the distances of A and B to a common hypernym
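The three quantities listed above can be computed from any hypernym graph. The sketch below uses a hypothetical miniature graph in place of WordNet so that it runs without external data; the sense names, the graph, and the function names are all assumptions. It also assumes that "a common hypernym" means the nearest one, which the slide does not state.

```python
from collections import deque

# Hypothetical miniature hypernym graph (sense -> parent senses); not real WordNet data.
HYPERNYMS = {
    "make.n.01": ["kind.n.01"],
    "brand.n.01": ["kind.n.01"],
    "brand.n.02": ["symbol.n.01"],
    "kind.n.01": ["category.n.01"],
    "category.n.01": ["entity.n.01"],
    "symbol.n.01": ["entity.n.01"],
    "entity.n.01": [],
}
# Hypothetical sense inventory for the slide's example pair (Make, Brand).
SENSES = {"make": ["make.n.01"], "brand": ["brand.n.01", "brand.n.02"]}

def ancestors(sense):
    """Map every hypernym reachable from `sense` (itself included) to its distance."""
    dist = {sense: 0}
    queue = deque([sense])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist

def word_ancestors(word):
    """Minimum distance from any sense of `word` to each reachable hypernym."""
    best = {}
    for sense in SENSES[word]:
        for node, d in ancestors(sense).items():
            best[node] = min(d, best.get(node, d))
    return best

def terminological_features(a, b):
    """Compute the three features for the word pair (a, b)."""
    da, db = word_ancestors(a), word_ancestors(b)
    common = set(da) & set(db)
    roots = {n for n in common if not HYPERNYMS.get(n)}  # nodes with no parent
    return {
        "common_hypernym_roots": len(roots),
        "sense_count_sum": len(SENSES[a]) + len(SENSES[b]),
        # Assumption: use the nearest common hypernym; None if there is none.
        "distance_sum": min((da[n] + db[n] for n in common), default=None),
    }

features = terminological_features("make", "brand")
```

A learned rule would then take such feature values as input and decide whether the two element names are related; the rule itself is not reproduced here.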
8. Value Characteristics
- Machine Learning
- Features [LC94]
  - String length, numeric ratio, space ratio
  - Mean, variance, coefficient of variation, standard deviation
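As a rough illustration of the features above, the following sketch summarises a column of string values. It assumes that the mean, variance, coefficient of variation, and standard deviation are taken over string lengths, and that the ratios are character counts divided by total characters; the slide does not specify either, and the function name is hypothetical.

```python
import statistics

def value_features(values):
    """Summarise a non-empty column of string values with simple character statistics."""
    lengths = [len(v) for v in values]
    total_chars = sum(lengths)
    digits = sum(ch.isdigit() for v in values for ch in v)
    spaces = sum(ch.isspace() for v in values for ch in v)
    mean = statistics.mean(lengths)
    std_dev = statistics.pstdev(lengths)  # population standard deviation
    return {
        "mean_length": mean,
        "variance": statistics.pvariance(lengths),
        "std_dev": std_dev,
        "coeff_variation": std_dev / mean if mean else 0.0,
        "numeric_ratio": digits / total_chars if total_chars else 0.0,
        "space_ratio": spaces / total_chars if total_chars else 0.0,
    }

year_features = value_features(["1999", "2001"])
make_model_features = value_features(["Ford Mustang", "Ford Taurus"])
```

Two columns whose feature vectors are close (for example, two all-digit, fixed-length year columns) are candidates for a match even when their names differ.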
9. Expected Values
- Application Concepts
- Data Frames
  - CarMake
    - ford
    - honda
    - ...
  - CarModel
    - accord
    - mustang
    - taurus
    - ...
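[Figure: expected-values example. One schema has a single Make Model attribute (values: Ford Mustang, Ford Taurus, Ford F150); the other has separate Brand (values: Acura, Audi, BMW) and Model (values: Legend, Mustang, A4) attributes. The data frames recognize Make Model values as CarMake . CarModel, Brand values as CarMake, and Model values as CarModel. Schema labels: Target, Source.]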
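The expected-values idea above can be sketched with regular-expression recognizers standing in for data frames. The lexicons below contain only the makes and models that appear on this slide; the recognizer table, function names, and the hit-ratio scoring are assumptions made for illustration, not the presenter's implementation.

```python
import re

# Tiny lexicons built from the values shown on the slide.
MAKES = ["ford", "honda", "acura", "audi", "bmw"]
MODELS = ["accord", "mustang", "taurus", "f150", "legend", "a4"]
MAKE_RE = "(?:" + "|".join(MAKES) + ")"
MODEL_RE = "(?:" + "|".join(MODELS) + ")"

# Hypothetical recognizers, one per application concept.
RECOGNIZERS = {
    "CarMake": re.compile(MAKE_RE, re.IGNORECASE),
    "CarModel": re.compile(MODEL_RE, re.IGNORECASE),
    # Concatenation of the two concepts: a make followed by a model.
    "CarMake . CarModel": re.compile(MAKE_RE + r"\s+" + MODEL_RE, re.IGNORECASE),
}

def hit_ratio(concept, values):
    """Fraction of values that the concept's recognizer matches in full."""
    pattern = RECOGNIZERS[concept]
    return sum(bool(pattern.fullmatch(v.strip())) for v in values) / len(values)

def best_concept(values):
    """Concept whose recognizer covers the largest share of the values."""
    return max(RECOGNIZERS, key=lambda concept: hit_ratio(concept, values))

make_model_concept = best_concept(["Ford Mustang", "Ford Taurus", "Ford F150"])
brand_concept = best_concept(["Acura", "Audi", "BMW"])
model_concept = best_concept(["Legend", "Mustang", "A4"])
```

Because the Make Model values are recognized as the concatenation CarMake . CarModel while Brand and Model are recognized as CarMake and CarModel separately, the recognizers point to a composition or decomposition match between the two schemas.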
10. Structure
[Figure: two purchase-order schema trees, labeled Target and Source. PO: POShipTo, POBillTo, POLines, City, Street, Count, Item, Line, Qty, UoM. PurchaseOrder: DeliverTo, InvoiceTo, Items, Address, City, Street, ItemCount, Item, ItemNumber, Quantity, UnitOfMeasure.]
11. Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued. Recoverable labels: PO, PurchaseOrder, Items, POShipTo, POBillTo, POLines, DeliverTo, InvoiceTo, Address, Count, Item, City, Street, ItemNumber, Line, Qty, UoM, Quantity, UnitOfMeasure, Target, Source.]
12. Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued. Recoverable labels: PO, PurchaseOrder, Items, POShipTo, POBillTo, POLines, DeliverTo, InvoiceTo, Count, Item, City, Street, ItemNumber, Line, Qty, UoM, Quantity, UnitOfMeasure, Target, Source.]
13. Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued. Recoverable labels: PO, PurchaseOrder, Items, POShipTo, POBillTo, POLines, DeliverTo, InvoiceTo, Count, Item, City, Street, ItemNumber, Line, Qty, UoM, Quantity, UnitOfMeasure, Target, Source.]
14. Structure (Cont.)
[Figure: the PO and PurchaseOrder schema trees, continued. Recoverable labels: PO, PurchaseOrder, Items, POShipTo, POBillTo, POLines, DeliverTo, InvoiceTo, Count, Item, City, Street, ItemNumber, Line, Qty, UoM, Quantity, UnitOfMeasure, Target, Source.]
15. Experiments
- Methodology
- Measures
- Precision
- Recall
- F Measure
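The three measures can be computed directly from counts of correct, false-positive, and false-negative matches. The sketch below assumes that F is the balanced harmonic mean of precision and recall; the function name is hypothetical. With the counts reported on the Results slide for Course Schedule and Real Estate, it reproduces the percentages given there.

```python
def precision_recall_f(correct, false_positive, false_negative):
    """Return (precision, recall, F-measure) as fractions in [0, 1]."""
    proposed = correct + false_positive   # matches the system reported
    expected = correct + false_negative   # matches that actually exist
    precision = correct / proposed if proposed else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    # Assumption: balanced F-measure (harmonic mean of precision and recall).
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Counts taken from the Results slide.
course_schedule = precision_recall_f(correct=119, false_positive=2, false_negative=9)
real_estate = precision_recall_f(correct=235, false_positive=20, false_negative=10)

course_schedule_pct = [round(100 * x) for x in course_schedule]
real_estate_pct = [round(100 * x) for x in real_estate]
```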
16. Results

| Application (Number of Schemas) | Precision (%) | Recall (%) | F (%) | Correct | False Positive | False Negative |
|---|---|---|---|---|---|---|
| Course Schedule (5) | 98 | 93 | 96 | 119 | 2 | 9 |
| Faculty Member (5) | 100 | 100 | 100 | 140 | 0 | 0 |
| Real Estate (5) | 92 | 96 | 94 | 235 | 20 | 10 |

- Indirect Matches: 94% (precision, recall, F-measure)
- Data borrowed from Univ. of Washington
17. Contributions
- Direct Matches
- Indirect Matches
- Expected values
- Structure
- High Precision and High Recall