Title: Transitioning Relational Databases to Ontologies
1. Transitioning Relational Databases to Ontologies
Farid Cerbah
Dassault Aviation
farid.cerbah_at_dassault-aviation.fr
2. Outline
- Problem statement
- Previous work
- The RDBToOnto tool and the RTAXON method
- Improving the process through database optimisation
- A case study in aircraft maintenance
- Extending RDBToOnto
- Conclusion
3. Problem statement
- Relational databases are valuable heterogeneous sources for ontology learning
  - Better accuracy can be expected than from text corpora
  - Ontology learning from relational databases is not a new research issue
- Limitations of existing support
  - The problem is often restricted to finding automated ways to import tables into ontologies
  - The derived ontologies have a flat structure and look like the source databases
4. Our contribution
- RDBToOnto Platform
  - Comprehensive software support for learning fine-tuned ontologies
  - A framework that eases the development of, and experimentation with, transitioning methods
- RTAXON Method
  - Finds taxonomies hidden in the data
5. A motivating example
Typical mappings covered by several methods
6. Previous work (1)
- RDB -> Ontology Transformation
- Database Reverse Engineering
  - Many transformation rules from this domain are reused for ontology learning
  - Behm et al. 1997, Ramanathan & Hodges 1997, ...
- Approaches mostly based on an analysis of the RDB schema
  - Data correlations are considered, but only among key values
  - Key inclusion may express inheritance
- Exploiting the semantics of null values (Lammari et al. 2007)
  - Partitioning a table on the basis of null values may reveal concept hierarchies
  - Involves data from non-key attributes
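The null-value idea above can be illustrated with a minimal sketch. It is not the algorithm of Lammari et al.; it only groups the rows of one table by the set of attributes that are null, so that each group hints at a possible sub-concept. The table, its column names and the function name `partition_by_null_pattern` are hypothetical.

```python
from collections import defaultdict


def partition_by_null_pattern(rows, attributes):
    """Group rows by the set of attributes whose value is missing (None)."""
    groups = defaultdict(list)
    for row in rows:
        null_attrs = frozenset(a for a in attributes if row.get(a) is None)
        groups[null_attrs].append(row)
    return dict(groups)


# Hypothetical table mixing two kinds of staff members.
staff = [
    {"id": 1, "name": "Ann", "licence_no": "L-17", "office": None},
    {"id": 2, "name": "Bob", "licence_no": None, "office": "B12"},
    {"id": 3, "name": "Cy", "licence_no": "L-42", "office": None},
]

partitions = partition_by_null_pattern(staff, ["licence_no", "office"])
for null_attrs, members in partitions.items():
    # Each distinct null pattern is a candidate subclass of the table's class.
    print(sorted(null_attrs), [r["id"] for r in members])
```

Rows 1 and 3 never fill `office`, and row 2 never fills `licence_no`, so the two groups suggest two candidate subclasses. A real method would also have to decide which partitions are meaningful.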
7. Previous work (2)
- Mapping languages and tools
  - D2RQ
    - RDB to OWL/RDF mapping
    - Ontology-based access to relational databases
    - Rewriting SPARQL queries into SQL
  - Relational.OWL
    - A minimal ontology of tables and columns, and a processor to populate this ontology with data from relational databases
    - Can be used to exchange data between databases
  - Triplify
    - Plugin for web applications
    - Converts the result of SQL queries into RDF
  - KAON Reverse
    - Software support to interactively map an RDB schema to a predefined ontology
  - DataMaster
    - Protégé plugin to import table data into ontologies
8. RDBToOnto
- A user-oriented tool with a full-fledged user interface
- Supports the whole process, from data access to ontology generation
- Includes the RTAXON converter
- Although the process is largely automated, local constraints can be added interactively to refine the ontologies progressively
- Types of local constraints
  - Table and column exclusion
  - Naming patterns for classes and instances
  - Categorisation patterns
9. The RTAXON method
- Major improvement over existing methods
  - Further refine the classes derived from the schema with subclasses found in the content of the relations
- Focus on reliable categorisation patterns
[Figure: example of a categorising attribute; labels: Access Zone, Door, Panel, Fairing, Floor]
- Two sources involved in the identification of categorising attributes
  - Attribute names
    - Revealed by lexical clues
  - Redundancy in attribute extensions
    - Entropy-based approach to find good profiles
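The two signals named on this slide (lexical clues in attribute names and redundancy in attribute extensions) can be sketched as follows. This is an illustration under stated assumptions, not the formal definition of RTAXON: the clue list, the normalisation of the entropy and the column names are all hypothetical, and the sample values are borrowed from the labels of the slide's figure.

```python
import math
from collections import Counter

# Hypothetical lexical clues; the list actually used by RTAXON is not given here.
LEXICAL_CLUES = ("type", "category", "kind", "class")


def has_lexical_clue(attribute_name):
    """True if the attribute name contains one of the assumed clue words."""
    name = attribute_name.lower()
    return any(clue in name for clue in LEXICAL_CLUES)


def normalised_entropy(values):
    """Shannon entropy of the value distribution, divided by log2(n) rows.

    0.0 means a single repeated value; 1.0 means all values are distinct
    (key-like). Strongly redundant columns fall in between.
    """
    n = len(values)
    counts = Counter(values)
    if n <= 1 or len(counts) == 1:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


# Hypothetical columns of one table.
type_values = ["Door", "Panel", "Door", "Fairing", "Floor", "Panel", "Door", "Panel"]
serial_values = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]

for name, values in [("panel_type", type_values), ("serial_no", serial_values)]:
    print(name, has_lexical_clue(name), round(normalised_entropy(values), 3))
```

The redundant column gets an intermediate entropy, while the key-like column reaches the maximum, so a few repeated values per column is the profile of interest. How the two signals are combined, and which entropy range counts as a good profile, is left open in this sketch.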
10. Optimising the source databases
- Another key improvement is the inclusion of a database optimisation step
- Many input databases suffer from data duplication problems
- Optimisation -> eliminate data duplication through the processing of inclusion dependencies
11. Effect of inclusion dependency processing
- Inclusion dependencies -> more inter-class relations (i.e. object properties).
[Figure panels: without ID identification; with ID identification]
12. Identification of inclusion dependencies
- RDBToOnto includes an editor to interactively define inclusion dependencies
- Automated identification of inclusion dependencies
  - A data mining approach based on LATINO
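To make the notion concrete: a unary inclusion dependency holds when every value of one column also appears in another column. The sketch below is a naive exhaustive check over in-memory columns, not the LATINO-based mining approach; the function name and the sample columns are hypothetical.

```python
from itertools import permutations


def find_inclusion_dependencies(columns):
    """Return (dependent, referenced) pairs whose value sets satisfy
    values(dependent) <= values(referenced).

    `columns` maps a 'table.column' label to its list of values.
    Nulls are ignored and empty columns are skipped.
    """
    value_sets = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    found = []
    for dep, ref in permutations(value_sets, 2):
        if value_sets[dep] and value_sets[dep] <= value_sets[ref]:
            found.append((dep, ref))
    return found


# Hypothetical maintenance tables with an undeclared reference to parts.
columns = {
    "task.part_ref": ["P1", "P2", "P1"],
    "part.id": ["P1", "P2", "P3"],
    "task.id": [10, 11, 12],
}

dependencies = find_inclusion_dependencies(columns)
print(dependencies)
```

Here `task.part_ref` is included in `part.id`, which suggests a link from tasks to parts that could become an object property between the two classes. The pairwise check is quadratic in the number of columns, which is one reason dedicated mining methods exist.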
13. Mining inclusion dependencies with LATINO
14. A case study in aircraft maintenance
[Figure: tools involved in the case study; labels: KCIT (GATE-based annotator), RDBToOnto, LATINO, Radiant, OWLIM]
15. The ontology acquisition process
- The legacy data
  - LSA database: a heterogeneous relational database that gathers all information related to maintenance activity
    - Required logistic resources
    - Aircraft parts (Product tree)
    - Scheduling data
  - Standards documents, including widely shared conceptual models
- The ontology acquisition process
  - A multi-step transitioning process that favours modular design
16. Model Bootstrapping & Ontology Normalisation
[Figure labels: MSG-3, SNS/ATA, FOAF, Reusable Ontologies, Model Bootstrapping, Legacy Data, Ontology Learning Tools]
17. The defined RDBToOnto conversion project
- 75 constraints
  - Mostly naming patterns and inclusion dependencies
- Resulting ontology
  - Ontology model
    - 115 classes, 334 datatypes, 54 object properties
  - Population
    - 49617 class instances, 51449 object property instances
- No constraints for categorisation
  - The ten hierarchies discovered by RTAXON are relevant
  - Good behaviour when faced with categorisation conflicts
18. The generated class hierarchy
19. Identified object properties
20. RDBToOnto extension capabilities
- RDBToOnto is a user-oriented tool but it is also a framework
  - Written in Java
  - OWL as target language (exploiting Jena 2.5 API)
- Two types of components can be added
  - Database readers to cover more database formats
  - Converters to implement new learning methods
- New converters can have their own global options, local constraints and GUI
21. Structure of RDBToOnto
[Class diagram]
- Database
- DBReader: Database getDatabase(), Table ReadData(String name)
  - Readers: MSAccessReader, DB2Reader
- RDBToOntoConverter: OntModel Convert(Database db), OntClass CreateClass(TableDef)
  - Converters: BasicConverter, RTAXON
- Annotation: can be extended by the users
22. The neutral database model
[Class diagram; elements: Database (input to any converter), DBSchema, Table, TableDef, Column, Attribute, Key, PrimaryKey, ForeignKey; other labels: friendlyNames, Values, String]
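The extension points shown in the two preceding class diagrams (a reader that produces a neutral database model, and a converter that consumes it) can be paraphrased in a short sketch. The tool itself is written in Java and targets OWL through Jena; the Python below only mirrors the shape of that design. All signatures, field names, `InMemoryReader` and `ToyConverter` are assumptions, the output is a plain list of triples rather than a Jena `OntModel`, and the table-to-class mapping is a common textbook convention, not necessarily the one the tool applies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    name: str
    columns: list
    primary_key: Optional[str] = None
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table


@dataclass
class Database:
    tables: dict = field(default_factory=dict)  # table name -> Table


class DBReader(ABC):
    """Extension point 1: one reader per database format."""

    @abstractmethod
    def get_database(self) -> Database: ...


class Converter(ABC):
    """Extension point 2: one converter per learning method."""

    @abstractmethod
    def convert(self, db: Database) -> list: ...


class InMemoryReader(DBReader):
    def __init__(self, tables):
        self._tables = tables

    def get_database(self):
        return Database({t.name: t for t in self._tables})


class ToyConverter(Converter):
    """Table -> class, foreign key -> object property, other non-key
    column -> datatype property (assumed mapping)."""

    def convert(self, db):
        triples = []
        for table in db.tables.values():
            triples.append((table.name, "rdf:type", "owl:Class"))
            for col in table.columns:
                prop = f"{table.name}.{col}"
                if col in table.foreign_keys:
                    triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                elif col != table.primary_key:
                    triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
        return triples


part = Table("Part", ["id", "label"], primary_key="id")
task = Table("Task", ["id", "label", "part_ref"], primary_key="id",
             foreign_keys={"part_ref": "Part"})
db = InMemoryReader([part, task]).get_database()
triples = ToyConverter().convert(db)
for triple in triples:
    print(triple)
```

Because the converter depends only on the neutral model, a new reader or a new converter can be added without touching the other side, which is the design choice the diagrams convey.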
23. Conclusion
- We presented substantial support for transitioning relational databases to ontologies
  - RDBToOnto and the RTAXON method have been evaluated on significant databases
- RTAXON is just a first step, as many extensions can be studied
  - Learning two-level hierarchies
  - Automatically generating local constraints (e.g. naming patterns)
- More resources are available on the TAO project web site, including
  - User Guide and demos
  - Development Guide
  - A fully implemented sample showing how to extend the tool