Title: Ontology-Driven Information Retrieval
1. Ontology-Driven Information Retrieval
ICDL 2004, New Delhi, India
- Nicola Guarino
- Laboratory for Applied Ontology
- Institute for Cognitive Sciences and Technology (ISTC-CNR), Trento-Roma, Italy
2. Summary
- The role of ontologies in (generalized) IR
- Semantic Matching
- Increasing precision
- Increasing recall
- Problems with WordNet
- The role of Foundational Ontologies
3. The key problem
4. Simple queries need more knowledge about what the user wants
- Search for Washington (the person)
- Google: 26,000,000 hits
- 45th entry is the first relevant
- Noise: places
- Search for George Washington
- Google: 2,200,00 hits
- 3rd entry is relevant
- Noise: institutions, other people, places
5. Solution: ontology and semantic markup
- Ontology
- Person
- George Washington
- George Washington Carver
- Place
- Washington, D.C.
- Artifact
- George Washington Bridge
- Organization
- George Washington University
- Semantic disambiguation/markup of questions and corpora
- What Washington are you talking about?
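The disambiguation idea on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the `Entity` class, the `ENTITIES` list and the `candidates` function are hypothetical names, and the entries are limited to the examples listed on the slide. Each name is tagged with its top-level ontology category, so a search for "Washington" can be narrowed to the kind of thing the user means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    category: str  # top-level ontology category (Person, Place, ...)


# Hypothetical semantic markup: the slide's examples, tagged by category.
ENTITIES = [
    Entity("George Washington", "Person"),
    Entity("George Washington Carver", "Person"),
    Entity("Washington, D.C.", "Place"),
    Entity("George Washington Bridge", "Artifact"),
    Entity("George Washington University", "Organization"),
]


def candidates(term, category=None):
    """Return entities whose name contains `term`, optionally restricted
    to one ontology category."""
    term = term.lower()
    return [
        e for e in ENTITIES
        if term in e.name.lower()
        and (category is None or e.category == category)
    ]


# A plain keyword match is ambiguous across all categories ...
print([e.name for e in candidates("Washington")])
# ... while naming the category removes places, artifacts and organizations.
print([e.name for e in candidates("Washington", "Person")])
```

Restricting to `Person` still leaves two people, which is where the follow-up question "What Washington are you talking about?" comes in.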
6. The role of taxonomy and lexical knowledge
- Search for Artificial Intelligence Research
- Misses subfields of the general field
- Misses references to AI and Machine Intelligence (synonyms)
- Noise: non-research pages, other fields
7. Solution
- Extra knowledge
- Ontology: sub-fields (of AI)
- Knowledge Representation
- Machine Vision etc.
- Neural networks
- Lexicon: synonyms (for AI)
- Artificial Intelligence
- Machine Intelligence
- Techniques
- Query Expansion
- Add sub-fields to the search as disjuncts
- Add synonyms to the search as disjuncts
- Semantic Markup of question and corpora
- Add general terms (categories)
- Add synonyms
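The query-expansion technique listed above can be illustrated with a minimal Python sketch. The `SUBFIELDS` and `SYNONYMS` tables and the `expand_query` function are hypothetical; they hold only the terms named on this slide, and the `OR`/`AND` output syntax is an assumption about the target search engine rather than any specific engine's query language.

```python
# Hypothetical extra knowledge, limited to the slide's examples.
SUBFIELDS = {  # from the ontology: sub-fields of a field
    "artificial intelligence": [
        "knowledge representation",
        "machine vision",
        "neural networks",
    ],
}
SYNONYMS = {  # from the lexicon: synonyms of a term
    "artificial intelligence": ["AI", "machine intelligence"],
}


def expand_query(topic, required=()):
    """Expand `topic` with its synonyms and sub-fields as disjuncts,
    then require every term in `required` as a conjunct."""
    key = topic.lower()
    alternatives = [topic] + SYNONYMS.get(key, []) + SUBFIELDS.get(key, [])
    disjunction = " OR ".join(f'"{term}"' for term in alternatives)
    clauses = [f"({disjunction})"] + [f'"{term}"' for term in required]
    return " AND ".join(clauses)


print(expand_query("Artificial Intelligence", required=["research"]))
```

The disjuncts address the recall problems from the previous slide (missed sub-fields and synonyms); keeping "research" as a required conjunct is one simple way to limit the noise from non-research pages.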
8. Ontology-driven search engines
- Idealized view
- Ontology-driven search engines act as virtual librarians
- Determine what you really mean
- Discover relevant sources
- Find what you really want
- Requires common knowledge on all ends
- Semantic linkage between questioning agent, answering agent and knowledge sources
- Hence the Semantic Web?
9. Two main roles of ontologies in IT
- Semantic Interoperability
- Generalized database integration
- Virtual Enterprises, Concurrent Engineering
- e-commerce
- Web services
- Information Retrieval
- Documents
- Facts (query answering)
- Products
- Services
- (Not to mention IS analysis and design)
10. The role of linguistic ontologies coupled with structured representation formalisms
- Why not just simple thesauri based on fixed keyword hierarchies?
- Data's intrinsic dynamics require keeping track of new terms
- Users need to understand a rigid set of terms
- Heterogeneous descriptions need a broad-coverage vocabulary
- Linguistic ontologies with structured representation formalisms
- Decouple user vocabulary from data vocabulary
- Increase recall (synonyms, generalizations)
- Increase precision (disambiguation, ontology navigation)
- Further increase precision (by capturing the structure of queries and data)
11. Using WordNet as an ontology
- Unclear semantic interpretation of hyperonymy
- Instantiation vs. subsumption
- Object-level vs. meta-level
- Hyperonymy used to account for polysemy
- (law: both a document and a rule)
- Unclear taxonomic structure
- Glosses not consistent with taxonomic structure
- Heterogeneous levels of generality
- Formal constraints violations (especially concerning roles)
- Polysemous use of antonymy (child/parent vs. daughter/son)
- Poor ontology of adjectives and qualities
- Shallow taxonomy of verbs
12. When subtle distinctions are important
- "Trying to engage with too many partners too fast is one of the main reasons that so many online market makers have foundered. The transactions they had viewed as simple and routine actually involved many subtle distinctions in terminology and meaning." (Harvard Business Review, October 2001)
13. Ontologies and intended meaning
[Figure: diagram; surviving label: "Ontology"]
14. Levels of Ontological Precision
[Figure: levels of ontological precision. Axis: ontological precision. Levels shown: catalog, glossary, thesaurus, taxonomy, DB/OO scheme, axiomatized theory.]
- Catalog: tennis, football, game, field game, court game, athletic game, outdoor game
- Thesaurus: game NT athletic game NT court game RT court NT tennis RT double fault
- Taxonomy: game, athletic game, court game, tennis, outdoor game, field game, football
- Axiomatized theory:
  game(x) → activity(x)
  athletic game(x) → game(x)
  court game(x) → athletic game(x) ∧ ∃y. played_in(x,y) ∧ court(y)
  tennis(x) → court game(x)
  double fault(x) → fault(x) ∧ ∃y. part_of(x,y) ∧ tennis(y)
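To make concrete what the axiomatized-theory level adds over a plain taxonomy, here is a minimal Python sketch that checks a toy interpretation against the five tennis axioms on this slide. The axioms are read as one-directional implications, which is an assumption. The `violated_axioms` function, the dictionary encoding of a model and the individuals `match1`, `centre_court` and `df1` are all hypothetical.

```python
def violated_axioms(model):
    """Return the slide's axioms that the toy interpretation `model` breaks.

    `model` maps each predicate name to its extension: a set of
    individuals for classes, a set of (x, y) pairs for relations.
    """
    def ext(pred):
        return model.get(pred, set())

    def related_to(x, relation, target):
        # True if x stands in `relation` to some member of class `target`.
        return any(a == x and b in ext(target) for a, b in ext(relation))

    violations = []
    if not ext("game") <= ext("activity"):
        violations.append("game -> activity")
    if not ext("athletic game") <= ext("game"):
        violations.append("athletic game -> game")
    if not all(x in ext("athletic game") and related_to(x, "played_in", "court")
               for x in ext("court game")):
        violations.append("court game -> athletic game, played in a court")
    if not ext("tennis") <= ext("court game"):
        violations.append("tennis -> court game")
    if not all(x in ext("fault") and related_to(x, "part_of", "tennis")
               for x in ext("double fault")):
        violations.append("double fault -> fault, part of a tennis game")
    return violations


# An interpretation that matches the intended meaning.
intended = {
    "activity": {"match1"}, "game": {"match1"}, "athletic game": {"match1"},
    "court game": {"match1"}, "tennis": {"match1"},
    "court": {"centre_court"}, "played_in": {("match1", "centre_court")},
    "fault": {"df1"}, "double fault": {"df1"},
    "part_of": {("df1", "match1")},
}
# Same class memberships, but the tennis match is played nowhere.
unintended = dict(intended, played_in=set())

print(violated_axioms(intended))    # no violations
print(violated_axioms(unintended))  # the court-game axiom is violated
```

Both interpretations respect the subclass chain game / athletic game / court game / tennis, so a taxonomy alone cannot tell them apart; only the axiom about courts rules out the second one.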
15. Ontology Quality: Precision and Coverage
[Figure: panels comparing ontology precision and coverage. Surviving labels: "Low precision, max coverage"; "BAD"; "Max precision, low coverage"]
16. Why precision is important
[Figure: diagram. Surviving labels: MD(L); "False agreement!"]
17. DOLCE: a Descriptive Ontology for Linguistic and Cognitive Engineering
- Strong cognitive bias: descriptive (as opposed to prescriptive) attitude
- Emphasis on cognitive invariants
- Categories as conceptual containers: no deep metaphysical implications with respect to "true" reality
- Clear branching points to allow easy comparison with different ontological options
- Rich axiomatization
18. DOLCE's basic taxonomy
- Quality
  - Physical
    - Spatial location
  - Temporal
    - Temporal location
  - Abstract
- Abstract
  - Quality region
    - Time region
    - Space region
    - Color region
- Endurant
  - Physical
    - Amount of matter
    - Physical object
    - Feature
  - Non-Physical
    - Mental object
    - Social object
- Perdurant
  - Static
    - State
    - Process
  - Dynamic
    - Achievement
    - Accomplishment
19. Research priorities at LOA-CNR
- Foundational ontologies and ontological analysis
- Domain ontologies
- Physical objects
- Information and information processing
- Social interaction
- Ontology of legal and financial entities
- Ontology, language, cognition
- Ontology-driven information systems
- Ontology-driven conceptual modeling
- Ontology-driven information access
- Ontology-driven information integration
www.loa-cnr.it