Title: Using TREC for cross-comparison between classic IR and ontology-based search models at a Web scale
1. Using TREC for cross-comparison between classic IR and ontology-based search models at a Web scale
Miriam Fernández¹, Vanessa López², Marta Sabou², Victoria Uren², David Vallet¹, Enrico Motta², Pablo Castells¹
Semantic Search 2009 Workshop (SemSearch 2009), 18th International World Wide Web Conference (WWW 2009)
21st April 2009, Madrid
2. Table of contents
- Motivation
- Part I. The proposal: a novel evaluation benchmark
  - Reusing the TREC Web track document collection
  - Introducing the semantic layer
    - Reusing ontologies from the Web
    - Populating ontologies from Wikipedia
    - Annotating documents
- Part II. Analyzing the evaluation benchmark
  - Using the benchmark to compare an ontology-based search approach against traditional IR baselines
    - Experimental conditions
    - Results
- Applications of the evaluation benchmark
- Conclusions
3. Motivation (I)
- Problem: how can semantic search systems be evaluated and compared with standard IR systems to study whether and how semantic search engines offer competitive advantages?
- Traditional IR evaluation
  - Evaluation methodologies generally based on the Cranfield paradigm (Cleverdon, 1967): documents, queries and judgments
  - Well-known retrieval performance metrics: Precision, Recall, P@10, Average Precision (AP), Mean Average Precision (MAP) (a minimal sketch of these metrics follows this slide)
  - Broad initiatives such as TREC to create and use standard evaluation collections, methodologies and metrics
  - The evaluation methods are systematic, easily reproducible, and scalable
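The slide above names the standard Cranfield-style metrics without defining them. Below is a minimal Python sketch of how P@10, AP and MAP are conventionally computed from a ranked result list and the set of relevant document ids per topic; the function names and data layout are illustrative and not taken from the slides.

```python
# Minimal sketch (illustrative, not from the slides) of the standard
# retrieval metrics: P@10, Average Precision, and Mean Average Precision.

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved documents that are relevant (P@10 for k=10)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP: mean of the per-topic average precision over all judged topics."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)
```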
4. Motivation (II)
- Ontology-based search evaluation
  - Ontology-based search approaches
    - Introduction of a new semantic search space (ontologies and KBs)
    - Change in the IR vision (input, output, scope)
  - The evaluation methods rely on user-centered studies, and therefore tend to be high-cost, non-scalable and difficult to reproduce
  - There is still a long way to go towards standard evaluation benchmarks for assessing the quality of ontology-based search approaches
- Goal: develop a new reusable evaluation benchmark for cross-comparison between classic IR and ontology-based models on a significant scale
5. Part I. The proposal
- Motivation
- Part I. The proposal: a novel evaluation benchmark
  - Reusing the TREC Web track document collection
  - Introducing the semantic layer
    - Reusing ontologies from the Web
    - Populating ontologies from Wikipedia
    - Annotating documents
- Part II. Analyzing the evaluation benchmark
  - Using the benchmark to compare an ontology-based search approach against traditional IR baselines
    - Experimental conditions
    - Results
- Part III. Applications of the evaluation benchmark
- Conclusions
6. The evaluation benchmark (I)
- A benchmark collection for cross-comparison between classic IR and ontology-based search models at a large scale should comprise five main components:
  - a set of documents,
  - a set of topics or queries,
  - a set of relevance judgments (or lists of relevant documents for each topic),
  - a set of semantic resources, ontologies and KBs, which provide the needed semantic information for ontology-based approaches,
  - a set of annotations that associate the semantic resources with the document collection (not needed for all ontology-based search approaches).
7. The evaluation benchmark (II)
- Start from a well-known standard IR evaluation benchmark
  - Reuse of the TREC Web track collection used in the TREC 9 and TREC 2001 editions of the TREC conference
  - Document collection: WT10g (Bailey, Craswell, Hawking, 2003), about 10 GB in size, 1.69 million Web pages
  - The TREC topics and judgments for this text collection are provided with the TREC 9 and TREC 2001 datasets (see the qrels-loading sketch after this slide)
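As a companion to the slide above, here is a small illustrative sketch (not part of the original material) of loading TREC-style relevance judgments, whose standard line format is `<topic> <iteration> <docno> <relevance>`; the file name in the usage comment is hypothetical.

```python
# Illustrative sketch: load a TREC qrels file into {topic_id: set(relevant docnos)}.
from collections import defaultdict

def load_qrels(path):
    """Keep only the documents judged relevant (relevance > 0) for each topic."""
    relevant = defaultdict(set)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            topic, _iteration, docno, rel = line.split()
            if int(rel) > 0:
                relevant[topic].add(docno)
    return relevant

# Usage (hypothetical file name):
# qrels = load_qrels("qrels.trec9.main_web")
```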
8. The evaluation benchmark (III)
- Construct the semantic search space
  - In order to fulfill Web-like conditions, all the semantic search information should be available online
  - The selected semantic information should cover, or partially cover, the domains involved in the TREC query set
  - The selected semantic resources should be complemented with a larger set of random ontologies and KBs to approximate a fair scenario
  - If the semantic information available online has to be extended in order to cover the TREC queries, this must be done with information sources that are completely independent from the document collection and available online
9. The evaluation benchmark (IV)
- Document collection
  - TREC WT10g
- Queries and judgments
  - TREC 9 and TREC 2001 test corpora
  - 100 queries with their corresponding judgments
  - 20 queries selected and adapted to be used by an NLP QA query processing module
- Ontologies
  - 40 public ontologies covering a subset of the TREC domains and queries (370 files comprising 400 MB of RDF, OWL and DAML)
  - 100 additional repositories (2 GB of RDF and OWL)
- Knowledge bases
  - Some of the 40 selected ontologies have been semi-automatically populated from Wikipedia
- Annotations
  - 1.2 × 10⁸ non-embedded annotations generated and stored in a MySQL database
10. The evaluation benchmark (V)
- Selecting TREC queries
  - Queries have to be formulated in a way suitable for ontology-based search systems (informational queries)
    - E.g., queries such as "discuss the financial aspects of retirement planning" (topic 514) are not selected
  - Ontologies must be available for the domain of the query
  - We selected 20 queries
- Adapting TREC queries
11. The evaluation benchmark (VI)
- Populating ontologies from Wikipedia
  - The semantic resources available online are still scarce and incomplete (Sabou, Gracia, Angeletou, d'Aquin, Motta, 2007)
  - Generation of a simple semi-automatic ontology-population mechanism (a sketch follows this slide), which:
    - populates ontology classes with new individuals
    - extracts ontology relations for a specific ontology individual
    - uses Wikipedia lists and tables to extract this information
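The slide only outlines the population mechanism; the following is a rough, hypothetical sketch of the idea using requests, BeautifulSoup and rdflib. The ontology namespace, the target class and the example Wikipedia page are illustrative assumptions, and real list pages would need page-specific parsing.

```python
# Hypothetical sketch: populate an ontology class with individuals scraped
# from a Wikipedia list page (not the authors' actual implementation).
import requests
from bs4 import BeautifulSoup
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/onto#")  # hypothetical ontology namespace

def populate_class_from_wikipedia_list(list_url, target_class, graph):
    html = requests.get(list_url).text
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find(id="mw-content-text")
    for link in content.select("ul li a[title]"):   # entries of the Wikipedia list
        label = link["title"]
        individual = EX[label.replace(" ", "_")]    # naive URI minting, sketch only
        graph.add((individual, RDF.type, target_class))
        graph.add((individual, RDFS.label, Literal(label)))
    return graph

# Usage (hypothetical page and class):
# g = populate_class_from_wikipedia_list(
#         "https://en.wikipedia.org/wiki/List_of_cloud_types", EX.Cloud, Graph())
```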
12. The evaluation benchmark (VII)
- Annotating documents with ontology entities
  - Identify ontology entities (classes, properties, instances or literals) within the documents to generate new annotations
  - Do not populate ontologies, but identify already available semantic knowledge within the documents
  - Support annotation in open-domain environments (any document can be associated or linked to any ontology without any predefined restriction)
  - This brings scalability limitations. To address them we propose to:
    - generate ontology indices
    - generate document indices
    - construct an annotation database which stores non-embedded annotations
  (a sketch of this annotation step follows below)
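A minimal sketch of the annotation step described above, under assumed data structures: an "ontology index" mapping entity labels to URIs is matched against document text, and each hit is stored as a non-embedded annotation. SQLite stands in here for the MySQL database mentioned in the benchmark description.

```python
# Sketch (assumed data structures): label-based annotation with non-embedded
# annotations stored in a relational table (SQLite as a stand-in for MySQL).
import re
import sqlite3

def build_label_index(entities):
    """'Ontology index': map lowercase entity labels to entity URIs."""
    return {label.lower(): uri for uri, label in entities}

def annotate(doc_id, text, label_index, conn):
    """Store one (document, entity) annotation per entity label found in the text."""
    lowered = text.lower()
    for label, uri in label_index.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            conn.execute(
                "INSERT INTO annotations(doc_id, entity_uri, label) VALUES (?, ?, ?)",
                (doc_id, uri, label),
            )
    conn.commit()

conn = sqlite3.connect("annotations.db")
conn.execute("CREATE TABLE IF NOT EXISTS annotations(doc_id TEXT, entity_uri TEXT, label TEXT)")
```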
13. The evaluation benchmark (VIII)
- Annotation based on contextual semantic information
  - Ambiguities: exploit ontologies as background knowledge for disambiguation (increasing precision but reducing the number of annotations); a sketch of this heuristic follows below
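A small illustrative sketch of the disambiguation heuristic mentioned above, with hypothetical helper names: an ambiguous label is resolved by preferring the candidate entity whose ontological neighbours also occur in the document, and discarded otherwise, which raises precision at the cost of fewer annotations.

```python
# Sketch (hypothetical helpers): contextual disambiguation of an ambiguous label.
def disambiguate(label, candidates, doc_text, neighbour_labels):
    """candidates: entity URIs sharing the label;
    neighbour_labels(uri) -> labels of entities related to uri in its ontology."""
    lowered = doc_text.lower()
    best, best_support = None, 0
    for uri in candidates:
        # Count how many ontological neighbours of this candidate appear in the document.
        support = sum(1 for n in neighbour_labels(uri) if n.lower() in lowered)
        if support > best_support:
            best, best_support = uri, support
    return best  # None when no candidate is supported by the document context
```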
14. Part II. Analyzing the evaluation benchmark
- Motivation
- Part I. The proposal: a novel evaluation benchmark
  - Reusing the TREC Web track document collection
  - Introducing the semantic layer
    - Reusing ontologies from the Web
    - Populating ontologies from Wikipedia
    - Annotating documents
- Part II. Analyzing the evaluation benchmark
  - Using the benchmark to compare an ontology-based search approach against traditional IR baselines
    - Experimental conditions
    - Results
- Applications of the evaluation benchmark
- Conclusions
15. Experimental conditions
- Keyword-based search (Lucene)
- Best TREC automatic search
- Best TREC manual search
- Semantic-based search (Fernández et al., 2008)
16. Results (I)
- MAP: mean average precision
- P@10: precision at 10
- Figures in bold correspond to the best result for each topic, excluding the best TREC manual approach (because of the way it constructs the query)
17. Results (II)
- By P@10, semantic retrieval outperforms the other two approaches
  - It provides maximal quality for 55% of the queries and is only outperformed by both Lucene and TREC in one query (511)
  - Semantic retrieval provides better results than Lucene for 60% of the queries and equal results for another 20%
  - Compared to the best TREC automatic engine, our approach improves on 65% of the queries and produces comparable results in 5%
- By MAP, there is no clear winner
  - The average performance of TREC automatic is greater than that of semantic retrieval
  - Semantic retrieval outperforms TREC automatic in 50% of the queries and Lucene in 75%
- Bias in the MAP measure
  - More than half of the documents retrieved by the semantic retrieval approach have not been rated in the TREC judgments
  - The annotation technique used for the semantic retrieval approach is very conservative (missing potentially correct annotations)
18. Results (III)
- For some queries for which the keyword search (Lucene) approach finds no relevant documents, the semantic search does
  - Queries 457 (Chevrolet trucks), 523 (facts about the five main clouds) and 524 (how to erase scar?)
- In the queries in which semantic retrieval did not outperform the keyword baseline, the semantic information obtained by the query processing module was scarce
- Still, overall, the keyword baseline only rarely provides significantly better results than semantic search
- TREC Web search evaluation topics are conceived for keyword-based search engines
  - With complex structured queries (involving relationships), the performance of semantic retrieval would improve significantly compared to keyword-based search
  - The full capabilities of the semantic retrieval model for formal semantic queries were not exploited in this set of experiments
19. Results (IV)
- Studying the impact of retrieved non-evaluated documents
  - 66% of the results returned by semantic retrieval were not judged
  - P@10 is not affected: results in the first positions have a higher probability of having been evaluated
  - MAP: evaluating the impact
    - Informal evaluation of the first 10 unevaluated results returned for every query
    - 89% of these results occur in the first 100 positions for their respective query
    - A significant portion, 31.5%, of the documents we judged turned out to be relevant
  - Even though this cannot be generalized to all the unevaluated results returned by the semantic retrieval approach (the probability of being relevant drops around the first 100 results and then varies very little), we believe that the lack of evaluations for all the results returned by semantic retrieval impairs its MAP value (a sketch of this effect follows below)
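To make the bias concrete, here is a small sketch (reusing the same average_precision definition as in the earlier metrics sketch) contrasting standard MAP, where unjudged documents count as non-relevant, with MAP over "condensed" rankings from which unjudged documents are removed before scoring. The condensed variant can only raise a system's score, so a run with many unjudged results is penalized by standard MAP.

```python
# Sketch: effect of unjudged documents on MAP (condensed-list comparison).

def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def condensed_ranking(ranking, judged):
    """Remove documents that were never judged, keeping the original order."""
    return [doc for doc in ranking if doc in judged]

def map_variants(runs, relevant, judged):
    topics = list(relevant)
    standard = sum(average_precision(runs[t], relevant[t]) for t in topics) / len(topics)
    condensed = sum(
        average_precision(condensed_ranking(runs[t], judged[t]), relevant[t]) for t in topics
    ) / len(topics)
    return standard, condensed  # condensed >= standard when many results are unjudged
```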
20. Applications of the benchmark
- Goal: how can this benchmark be applied to evaluate other ontology-based search approaches?
21. Conclusions (I)
- In the semantic search community, there is a need for standard evaluation benchmarks to evaluate and compare ontology-based approaches against each other, and against traditional IR models
- In this work, we have addressed two issues:
  - Construction of a potentially widely applicable ontology-based evaluation benchmark from traditional IR datasets, such as the TREC Web track reference collection
  - Use of the benchmark to evaluate a specific ontology-based search approach (Fernández et al., 2008) against different traditional IR models at a large scale
22. Conclusions (II)
- Potential limitations of the above benchmark are:
  - The need for ontology-based search systems to participate in the pooling methodology to obtain a better set of document judgments
  - The use of queries with a low level of expressivity in terms of relations, more oriented towards traditional IR models
  - The scarcity of the publicly available semantic information to cover the meanings involved in the document search space
- A common understanding of ontology-based search in terms of inputs, outputs and scope should be reached before achieving real standardization in the evaluation of ontology-based search models
23. Thank you!
http://nets.ii.uam.es/miriam/thesis.pdf (chapter 6)
http://nets.ii.uam.es/publications/icsc08.pdf