Using TREC for cross-comparison between classic IR and ontology-based search models at a Web scale - PowerPoint PPT Presentation

1
Using TREC for cross-comparison between classic
IR and ontology-based search models at a Web scale
Miriam Fernández1, Vanessa López2, Marta Sabou2,
Victoria Uren2, David Vallet1, Enrico Motta2,
Pablo Castells1
Semantic Search 2009 Workshop (SemSearch 2009)
18th International World Wide Web Conference (WWW 2009)
21st April 2009, Madrid
2
Table of contents
  • Motivation
  • Part I. The proposal: a novel evaluation
    benchmark
  • Reusing the TREC Web track document collection
  • Introducing the semantic layer
  • Reusing ontologies from the Web
  • Populating ontologies from Wikipedia
  • Annotating documents
  • Part II. Analyzing the evaluation benchmark
  • Using the benchmark to compare an ontology-based
    search approach against traditional IR baselines
  • Experimental conditions
  • Results
  • Applications of the evaluation benchmark
  • Conclusions

3
Motivation (I)
  • Problem: How can semantic search systems be
    evaluated and compared with standard IR systems
    to study whether and how semantic search engines
    offer competitive advantages?
  • Traditional IR evaluation
  • Evaluation methodologies generally based on the
    Cranfield paradigm (Cleverdon, 1967)
  • Documents, queries and judgments
  • Well-known retrieval performance metrics
  • Precision, Recall, P@10, Average Precision (AP),
    Mean Average Precision (MAP)
  • Wide initiatives like TREC to create and use
    standard evaluation collections, methodologies
    and metrics
  • The evaluation methods are systematic, easily
    reproducible, and scalable

4
Motivation (II)
  • Ontology-based search evaluation
  • Ontology-based search approaches
  • Introduction of a new semantic search space
    (ontologies and KBs)
  • Change in the IR vision (input, output, scope)
  • The evaluation methods rely on user-centered
    studies, and therefore they tend to be high-cost,
    non-scalable and difficult to reproduce
  • There is still a long way to go before standard
    evaluation benchmarks exist for assessing the
    quality of ontology-based search approaches
  • Goal: develop a new reusable evaluation benchmark
    for cross-comparison between classic IR and
    ontology-based models on a significant scale

5
Part I. The proposal
  • Motivation
  • Part I. The proposal: a novel evaluation
    benchmark
  • Reusing the TREC Web track document collection
  • Introducing the semantic layer
  • Reusing ontologies from the Web
  • Populating ontologies from Wikipedia
  • Annotating documents
  • Part II. Analyzing the evaluation benchmark
  • Using the benchmark to compare an ontology-based
    search approach against traditional IR baselines
  • Experimental conditions
  • Results
  • Part III. Applications of the evaluation
    benchmark
  • Conclusions

6
The evaluation benchmark (I)
  • A benchmark collection for cross-comparison
    between classic IR and ontology-based search
    models at a large scale should comprise five main
    components (a minimal sketch follows this list)
  • a set of documents,
  • a set of topics or queries,
  • a set of relevance judgments (or lists of
    relevant documents for each topic),
  • a set of semantic resources, ontologies and KBs,
    which provide the needed semantic information for
    ontology-based approaches,
  • a set of annotations that associate the semantic
    resources with the document collection (not
    needed for all ontology-based search approaches)

7
The evaluation benchmark (II)
  • Start from a well-known standard IR evaluation
    benchmark
  • Reuse of the TREC Web track collection used in
    the TREC 9 and TREC 2001 editions of the TREC
    conference
  • Document collection: WT10g (Bailey, Craswell,
    Hawking, 2003), about 10 GB in size, 1.69 million
    Web pages
  • The TREC topics and judgments for this text
    collection are provided with the TREC 9 and TREC
    2001 datasets

8
The evaluation benchmark (III)
  • Construct the semantic search space
  • In order to fulfill Web-like conditions, all the
    semantic search information should be available
    online
  • The selected semantic information should cover,
    or partially cover, the domains involved in the
    TREC query set
  • The selected semantic resources should be
    complemented with a larger set of random ontologies
    and KBs to approximate a fair scenario
  • If the semantic information available online has
    to be extended in order to cover the TREC
    queries, this must be done with information
    sources which are completely independent from the
    document collection, and available online

9
The evaluation benchmark (IV)
  • Document collection
  • TREC WT10G
  • Queries and judgments
  • TREC 9 and TREC 2001 test corpora
  • 100 queries with their corresponding judgments
  • 20 queries selected and adapted to be used by an
    NLP QA query processing module
  • Ontologies
  • 40 public ontologies covering a subset of the
    TREC domains and queries (370 files comprising
    400MB of RDF, OWL and DAML)
  • 100 additional repositories (2GB of RDF and OWL)
  • Knowledge Bases
  • Some of the 40 selected ontologies have been
    semi-automatically populated from Wikipedia
  • Annotations
  • 1.2 × 10^8 non-embedded annotations generated and
    stored in a MySQL database (see the schema sketch
    below)
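
A rough sketch of what the non-embedded annotation store could look like. The slides only state that the annotations are kept in a MySQL database, so the table and column names below are assumptions, and SQLite is used as a stand-in:

import sqlite3

# Hypothetical layout for the non-embedded annotations: each row links a WT10g
# document to an ontology entity without modifying the document itself.
conn = sqlite3.connect("annotations.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS annotations (
        doc_id      TEXT NOT NULL,  -- WT10g document identifier
        entity_uri  TEXT NOT NULL,  -- annotated class/property/instance/literal
        ontology_id TEXT NOT NULL,  -- ontology or KB the entity belongs to
        frequency   INTEGER DEFAULT 1,
        PRIMARY KEY (doc_id, entity_uri)
    )
""")
conn.commit()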

10
The evaluation benchmark (V)
  • Selecting TREC queries
  • Queries have to be formulated in a way suitable
    for ontology-based search systems (informational
    queries)
  • E.g., queries such as "discuss the financial
    aspects of retirement planning" (topic 514) are
    not selected
  • Ontologies must be available for the domain of
    the query
  • We selected 20 queries
  • Adapting TREC queries

11
The evaluation benchmark (VI)
  • Populating ontologies from Wikipedia
  • The semantic resources available online are still
    scarce and incomplete (Sabou, Gracia, Angeletou,
    d'Aquin, Motta, 2007)
  • Generation of a simple semi-automatic
    ontology-population mechanism
  • Populates ontology classes with new individuals
  • Extracts ontology relations for a specific
    ontology individual
  • Uses Wikipedia lists and tables to extract this
    information (see the sketch below)
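
A highly simplified sketch of the list-based population step, assuming a Wikipedia list page as input; the page title, parsing and function name are illustrative, and the actual mechanism was semi-automatic with human review:

import re
from urllib.parse import quote
from urllib.request import Request, urlopen

def individuals_from_wikipedia_list(list_page_title: str) -> list[str]:
    """Return candidate individual names taken from the links of a Wikipedia list page."""
    url = "https://en.wikipedia.org/wiki/" + quote(list_page_title)
    req = Request(url, headers={"User-Agent": "benchmark-population-sketch/0.1"})
    html = urlopen(req).read().decode("utf-8", errors="ignore")
    # Very naive: grab the titles of internal links that appear inside list items.
    candidates = re.findall(r'<li>[^<]*<a href="/wiki/[^"]+" title="([^"]+)"', html)
    return sorted(set(candidates))

# Each candidate would then be reviewed and added as a new individual of the
# target ontology class before being used for annotation, e.g.:
# print(individuals_from_wikipedia_list("List_of_cloud_types"))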

12
The evaluation benchmark (VII)
  • Annotating documents with ontology entities
  • Identify ontology entities (classes, properties,
    instances or literals) within the documents to
    generate new annotations
  • Do not populate ontologies, but identify already
    available semantic knowledge within the documents
  • Support annotation in open domain environments
    (any document can be associated or linked to any
    ontology without any predefined restriction).
  • This brings scalability limitations. To address
    them, we propose to
  • Generate ontology indices
  • Generate document indices
  • Construct an annotation database that stores
    non-embedded annotations (a toy sketch of the
    label-matching step is given below)
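
A toy sketch of the label-matching idea behind this annotation step, assuming ontology entities are indexed by their lexical labels; the structures and function names are assumptions, not the system's actual indices:

from collections import defaultdict

def build_label_index(entities: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased lexical label to the URIs of entities carrying it."""
    index: dict[str, set[str]] = defaultdict(set)
    for uri, label in entities.items():
        index[label.lower()].add(uri)
    return index

def annotate(doc_id: str, text: str, label_index: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return (doc_id, entity_uri) pairs for every indexed label found in the text."""
    lowered = text.lower()
    found = set()
    for label, uris in label_index.items():
        if label in lowered:        # naive substring match; the real system relies
            for uri in uris:        # on ontology and document indices for scale
                found.add((doc_id, uri))
    return found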

13
The evaluation benchmark (VIII)
Annotation based on contextual semantic
information
  • Ambiguities: exploit ontologies as background
    knowledge, increasing precision but reducing the
    number of annotations (see the sketch below)
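
One way to read this disambiguation rule, sketched very roughly: a candidate annotation is kept only if the document also mentions other entities from the same ontology, so the ontology acts as background context. The threshold and data structures below are assumptions:

def filter_by_ontology_context(candidates, entities_in_doc_by_ontology, min_support=2):
    """Keep a candidate (doc_id, entity_uri, ontology_id) only if the document
    also contains at least `min_support` other entities from the same ontology,
    using the ontology as contextual background knowledge."""
    kept = []
    for doc_id, entity_uri, ontology_id in candidates:
        context = entities_in_doc_by_ontology.get((doc_id, ontology_id), set()) - {entity_uri}
        if len(context) >= min_support:
            kept.append((doc_id, entity_uri, ontology_id))
    return kept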

14
Part II. Analyzing the evaluation benchmark
  • Motivation
  • Part I. The proposal: a novel evaluation
    benchmark
  • Reusing the TREC Web track document collection
  • Introducing the semantic layer
  • Reusing ontologies from the Web
  • Populating ontologies from Wikipedia
  • Annotating documents
  • Part II. Analyzing the evaluation benchmark
  • Using the benchmark to compare an ontology-based
    search approach against traditional IR baselines
  • Experimental conditions
  • Results
  • Applications of the evaluation benchmark
  • Conclusions

15
Experimental conditions
  • Keyword-based search (Lucene); a rough stand-in
    is sketched after this list
  • Best TREC automatic search
  • Best TREC manual search
  • Semantic-based search (Fernández et al., 2008)
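
For illustration only, a tiny TF-IDF ranking function in the spirit of the keyword-based baseline above; this is a scikit-learn stand-in, not the actual Lucene configuration used in the experiments:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def keyword_rank(query: str, docs: dict[str, str], k: int = 10) -> list[str]:
    """Rank documents for a query by TF-IDF cosine similarity (Lucene stand-in)."""
    doc_ids = list(docs)
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(docs[d] for d in doc_ids)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranking = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranking[:k]]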

16
Results (I)
MAP: mean average precision
P@10: precision at 10 (both sketched in code below)
  • Figures in bold correspond to the best result for
    each topic, excluding the best TREC manual
    approach (because of the way it constructs the
    query)
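
A minimal sketch of how the two reported metrics are computed, assuming binary relevance judgments given as a set of relevant document ids per topic:

def precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """AP: mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs: dict[str, list[str]], judgments: dict[str, set[str]]) -> float:
    """MAP: average of the per-topic AP values over all topics in the run."""
    aps = [average_precision(runs[t], judgments.get(t, set())) for t in runs]
    return sum(aps) / len(aps) if aps else 0.0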

17
Results (II)
  • By P@10, semantic retrieval outperforms the
    other two approaches
  • It provides maximal quality for 55% of the
    queries and is only outperformed by both
    Lucene and TREC in one query (511)
  • Semantic retrieval provides better results than
    Lucene for 60% of the queries and equal results
    for another 20%
  • Compared to the best TREC automatic engine, our
    approach improves on 65% of the queries and
    produces comparable results in 5%
  • By MAP, there is no clear winner
  • The average performance of TREC automatic is
    greater than that of semantic retrieval
  • Semantic retrieval outperforms TREC automatic in
    50% of the queries and Lucene in 75%
  • Bias in the MAP measure
  • More than half of the documents retrieved by the
    semantic retrieval approach have not been rated
    in the TREC judgments
  • The annotation technique used for the semantic
    retrieval approach is very conservative (missing
    potential correct annotations)

18
Results (III)
  • For some queries for which the keyword search
    (Lucene) approach finds no relevant documents,
    the semantic search does
  • queries 457 (Chevrolet trucks), 523 (facts about
    the five main clouds) and 524 (how to erase
    scar?)
  • In the queries in which the semantic retrieval
    did not outperform the keyword baseline, the
    semantic information obtained by the query
    processing module was scarce
  • Still, overall, the keyword baseline only rarely
    provides significantly better results than
    semantic search
  • TREC Web search evaluation topics are conceived
    for keyword-based search engines
  • With complex structured queries (involving
    relationships), the performance of semantic
    retrieval would improve significantly compared to
    the keyword-based approach
  • The full capabilities of the semantic retrieval
    model for formal semantic queries were not
    exploited in this set of experiments

19
Results (IV)
  • Studying the impact of retrieved non-evaluated
    documents
  • 66% of the results returned by semantic retrieval
    were not judged
  • P@10 is not affected: results in the first positions
    have a higher probability of having been evaluated
  • MAP: evaluating the impact (a small sketch of this
    check is given below)
  • Informal evaluation of the first 10 unevaluated
    results returned for every query
  • 89% of these results occur in the first 100
    positions for their respective query
  • A significant portion, 31.5%, of the documents we
    judged turned out to be relevant
  • Even though this cannot be generalized to all
    the unevaluated results returned by the semantic
    retrieval approach (the probability of being
    relevant drops around the first 100 results and
    then varies very little), we believe that the lack
    of judgments for all the results returned by
    semantic retrieval impairs its MAP value
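
A small sketch of the kind of check described above: measuring, per query, what fraction of the returned documents have no TREC judgment at all. The data structures mirror the metric sketch shown earlier and are assumptions:

def unjudged_fraction(ranked: list[str], judged: set[str]) -> float:
    """Fraction of retrieved documents that appear in no relevance judgment."""
    if not ranked:
        return 0.0
    return sum(1 for d in ranked if d not in judged) / len(ranked)

# Here `judged` is the set of all documents pooled and assessed for a topic
# (relevant or not); documents outside it count as non-relevant in MAP, which
# is the bias discussed on the previous slide.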

20
Applications of the Benchmark
  • Goal: How can this benchmark be applied to
    evaluate other ontology-based search approaches?

21
Conclusions (I)
  • In the semantic search community, there is a
    need for standard evaluation benchmarks to
    evaluate and compare ontology-based approaches
    against each other and against traditional IR
    models
  • In this work, we have addressed two issues
  • Construction of a potentially widely applicable
    ontology-based evaluation benchmark from
    traditional IR datasets, such as the TREC Web
    track reference collection
  • Use of the benchmark to evaluate a specific
    ontology-based search approach (Fernández et
    al., 2008) against different traditional IR
    models at a large scale

22
Conclusions (II)
  • Potential limitations of the above benchmark are
  • The need for ontology-based search systems to
    participate in the pooling methodology to obtain
    a better set of document judgments
  • The use of queries with a low level of
    expressivity in terms of relations, oriented more
    towards traditional IR models
  • The scarcity of publicly available semantic
    information covering the meanings involved in the
    document search space
  • A common understanding of ontology-based search
    in terms of inputs, outputs and scope should be
    reached before achieving a real standardization
    in the evaluation of ontology-based search models

23
Thank you!
http://nets.ii.uam.es/miriam/thesis.pdf (chapter 6)
http://nets.ii.uam.es/publications/icsc08.pdf