Title: Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents
1. Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents
- Harr Chen, David R. Karger
- MIT CSAIL
- ACM SIGIR 2006
- August 9, 2006
2. Outline
- Motivations
- Expected Metric Principle
- Metrics
- Bayesian Retrieval
- Objectives
- Heuristics
- Experimental Results
- Related Work
- Future Work and Conclusions
3. Motivation
- In IR, we have formal models and formal metrics
- Models provide a framework for retrieval
  - E.g., probabilistic models
- Metrics provide a rigorous evaluation mechanism
  - E.g., precision and recall
- The probability ranking principle (PRP) is provably optimal for precision/recall
  - Rank by probability of relevance
- But other metrics capture other notions of result set quality, and the PRP isn't necessarily optimal
4. Example: Diversity
- User may be satisfied with one relevant result
  - Navigational queries, question answering
- In this case, we want to hedge our bets by retrieving for diversity in the result set
  - Better to satisfy different users with different interpretations than one user many times over
- Reciprocal rank/search length metrics capture this notion
- PRP is suboptimal
5. IR System Design
- Metrics define a preference ordering on result sets
  - Metric(Result set 1) > Metric(Result set 2) ⇒ Result set 1 preferred to Result set 2
- Traditional approach: try out heuristics that we believe will improve relevance performance
  - Heuristics not directly motivated by the metric
  - E.g., synonym expansion, pseudo-relevance feedback
- Observation: given a model, we can try to directly optimize for some metric
6. Expected Metric Principle (EMP)
- Knowing which metric to use tells us what to maximize: the expected value of the metric for each result set, given a model
[Diagram: enumerate candidate result sets from the corpus (e.g., all ordered pairs drawn from Documents 1, 2, 3), calculate E[metric] for each using the model, and return the set with the maximum score.]
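The enumeration in the diagram is short to write down. Below is a minimal brute-force Python sketch, assuming a hypothetical per-document relevance probability table `p_rel` and, for simplicity, independence of document relevances given the query (the paper's actual model updates probabilities conditioned on earlier selections). It is exponential in n and meant only to illustrate the principle.

```python
from itertools import permutations

def expected_one_call(result_set, p_rel):
    # E[1-call] = Pr(at least one relevant) = 1 - prod_i (1 - p_i),
    # under the (simplifying) independence assumption.
    prob_all_irrelevant = 1.0
    for doc in result_set:
        prob_all_irrelevant *= 1.0 - p_rel[doc]
    return 1.0 - prob_all_irrelevant

def emp_brute_force(corpus, p_rel, n):
    # Enumerate every ordered size-n result set, score each, keep the best.
    return max(permutations(corpus, n),
               key=lambda s: expected_one_call(s, p_rel))

# Toy example mirroring the three-document diagram above.
p_rel = {"doc1": 0.9, "doc2": 0.85, "doc3": 0.4}
print(emp_brute_force(["doc1", "doc2", "doc3"], p_rel, 2))
```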
7. Our Contributions
- Primary: EMP, the metric as retrieval goal
  - Metric designed to measure retrieval quality
  - Metrics we consider: precision/recall @ n, search length, reciprocal rank, instance recall, k-call
  - Build a probabilistic model
  - Retrieve to maximize an objective: the expected value of the metric
  - Expectations calculated according to our probabilistic model
  - Use computational heuristics to make the optimization problem tractable
- Secondary: retrieving for diversity (special case)
  - A natural side effect of optimizing for certain metrics
8. Detour: What is a Heuristic?
- Ad hoc approach
  - Use heuristics that are believed to be correlated with good performance
  - Heuristics used to improve relevance
  - Heuristics (probably) make the system slower
  - Infinite number of possibilities, no formalism
  - Model and heuristics intertwined
- Our approach
  - Build a model that directly optimizes for good performance
  - Heuristics used to improve efficiency
  - Heuristics (probably) make the optimization worse
  - Well-known space of optimization techniques
  - Clean separation between model and heuristics
10. Search Length / Reciprocal Rank
- (Mean) search length (MSL): number of irrelevant results until the first relevant one
- (Mean) reciprocal rank (MRR): one over the rank of the first relevant result
- Example: first relevant result at rank 3 ⇒ search length = 2, reciprocal rank = 1/3
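For concreteness, here is a short Python sketch of both metrics; the list-of-booleans encoding of judged results is my own convention, not the paper's.

```python
def search_length(rels):
    """Number of irrelevant results before the first relevant one."""
    for i, rel in enumerate(rels):
        if rel:
            return i
    return len(rels)  # no relevant result found

def reciprocal_rank(rels):
    """One over the rank of the first relevant result (0 if none)."""
    for i, rel in enumerate(rels):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

# The slide's example: first relevant result at rank 3.
rels = [False, False, True, False]
assert search_length(rels) == 2 and reciprocal_rank(rels) == 1/3
```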
11. Instance Recall
- Each topic has multiple instances (subtopics, aspects)
- Instance recall is how many instances are covered (in union) over the first n results
- Example: instance recall @ 5 = 0.75
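A corresponding sketch for instance recall, assuming each result is annotated with the set of instances it covers:

```python
def instance_recall_at_n(instances, n, total_instances):
    """Fraction of a topic's instances covered by the union of the top n."""
    covered = set()
    for s in instances[:n]:
        covered |= s
    return len(covered) / total_instances

# Hypothetical example: 3 of 4 instances covered in the top 5 results.
instances = [{"a"}, {"a", "b"}, set(), {"c"}, {"b"}]
assert instance_recall_at_n(instances, 5, 4) == 0.75
```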
12. k-call @ n
- Binary metric: 1 if the top n results contain at least k relevant documents, 0 otherwise
- 1-call is (1 − %no); see the TREC robust track
- Example: 1-call @ 5 = 1, 2-call @ 5 = 1, 3-call @ 5 = 0
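And a sketch of k-call, using the same boolean-relevance convention as the earlier metric sketches:

```python
def k_call_at_n(rels, k, n):
    """1 if the top n results contain at least k relevant, else 0."""
    return 1 if sum(rels[:n]) >= k else 0

# The slide's example: 2 relevant documents in the top 5.
rels = [False, True, False, True, False]
assert k_call_at_n(rels, 1, 5) == 1
assert k_call_at_n(rels, 2, 5) == 1
assert k_call_at_n(rels, 3, 5) == 0
```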
13. Motivation for k-call
- 1-call: want one relevant document
  - Many queries are satisfied with one relevant result
  - Only need one relevant document, so there is more room to explore ⇒ promotes result set diversity
- n-call: want all n results relevant
  - Perfect precision
  - Hone in on one interpretation and stick to it!
- Intermediate k
  - Risk/reward tradeoff
- Plus, easily modeled in our framework
  - Binary variable
15. Bayesian Retrieval Model
- There exist distributions that generate relevant documents and irrelevant documents
- PRP: rank by Pr(document relevant | query)
- Remaining modeling questions: the form of the relevant/irrelevant distributions, and the parameters for those distributions
- In this paper, we assume multinomial models, and choose parameters by maximum a posteriori (MAP) estimation
  - Prior is the background corpus word distribution
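As a concrete illustration of this modeling choice, here is a minimal Python sketch. It assumes (as is standard, though the slide does not spell it out) that the MAP estimate of a unigram multinomial under a Dirichlet prior proportional to the background corpus distribution yields Dirichlet-smoothed word probabilities; `mu`, `doc_tf`, and `corpus_prob` are illustrative names, not the paper's.

```python
import math

def map_word_prob(word, doc_tf, doc_len, corpus_prob, mu=2000.0):
    """MAP estimate of Pr(word | document multinomial) under a Dirichlet
    prior scaled by mu from the background corpus word distribution.
    Assumes corpus_prob covers the full vocabulary."""
    return (doc_tf.get(word, 0) + mu * corpus_prob[word]) / (doc_len + mu)

def query_log_likelihood(query_words, doc_tf, doc_len, corpus_prob):
    """log Pr(query | document model): a PRP-style ranking score."""
    return sum(math.log(map_word_prob(w, doc_tf, doc_len, corpus_prob))
               for w in query_words)
```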
17. Objective
- Probability Ranking Principle (PRP): maximize Pr(d_i relevant | query) at each step in the ranking
- Expected Metric Principle (EMP): maximize E[metric(d_1, ..., d_n) | query] for the complete result set
- In particular, for k-call, maximize Pr(at least k of d_1, ..., d_n relevant | query)
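Written out (the notation r_i for "result i is relevant" is mine, matching the paper's definitions), the k-call objective and its 1-call special case are:

```latex
\[
  \text{EMP for } k\text{-call:}\quad
  \max_{d_1,\dots,d_n}\ \Pr\!\Big(\sum_{i=1}^{n} r_i \ge k \,\Big|\, q\Big)
\]
\[
  k = 1:\quad
  \Pr\!\big(\exists\, i : r_i \,\big|\, q\big)
  \;=\; 1 - \Pr\big(\bar r_1, \bar r_2, \dots, \bar r_n \,\big|\, q\big)
\]
```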
19. Optimization of the Objective
- Exact optimization of the objective is usually NP-hard
  - E.g., exact optimization for k-call is NP-hard (by reduction from the maximum graph clique problem)
- Approximation heuristic: greedy algorithm
  - Select documents successively in rank order
  - Hold previous documents fixed; optimize the objective at each rank
[Diagram, animated across slides 19-21: first choose d1 to maximize E[metric | d]; then hold d1 fixed and choose d2 to maximize E[metric | d, d1]; then hold d1 and d2 fixed and choose d3 to maximize E[metric | d, d1, d2].]
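The greedy procedure in the diagram is short to state in code. A generic sketch, where `expected_metric` is an assumed callback that evaluates E[metric] of a candidate prefix under the model:

```python
def greedy_emp(corpus, expected_metric, n):
    """Greedy approximation: at each rank, extend the fixed prefix with
    the document that maximizes the expected metric of the new prefix."""
    chosen = []
    remaining = set(corpus)
    for _ in range(n):
        best = max(remaining, key=lambda d: expected_metric(chosen + [d]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```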
22. Greedy on 1-call and n-call
- 1-greedy
  - The greedy algorithm reduces to ranking each successive document assuming all previous documents are irrelevant
  - The algorithm has discovered incremental negative pseudo-relevance feedback (see the sketch below)
- n-greedy: assume all previous documents are relevant
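Specializing the greedy loop to 1-call gives the reduction described above. A toy sketch, where `score_fn` and `term_freqs` are assumed hooks rather than the paper's actual implementation: after each pick, the chosen document's words are folded into the irrelevant model's counts, so later candidates that repeat those words score lower, which is the negative-feedback, diversity-promoting effect.

```python
def one_greedy(docs, term_freqs, score_fn, n):
    # score_fn(d, irrel_counts): assumed to compute
    # Pr(d relevant | query, counted documents irrelevant).
    chosen, irrel_counts = [], {}
    for _ in range(n):
        candidates = [d for d in docs if d not in chosen]
        best = max(candidates, key=lambda d: score_fn(d, irrel_counts))
        chosen.append(best)
        # Negative feedback: treat the chosen document as irrelevant evidence.
        for w, tf in term_freqs[best].items():
            irrel_counts[w] = irrel_counts.get(w, 0) + tf
    return chosen
```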
23. Greedy on Other Metrics
- Greedy with precision/recall ⇒ reduces to the PRP!
- Greedy on k-call for general k (k-greedy)
  - More complicated
- Greedy with MSL, MRR, or instance recall works out to the 1-greedy algorithm
  - Intuition: to make the first relevant document appear earlier, we want to hedge our bets as to the query interpretation (i.e., diversify)
24. Experiments: Overview
- Experiments verify that optimizing for a metric improves performance on that metric
  - They do not tell us which metrics to use
- Looked at ad hoc diversity examples
- TREC topics/queries
  - Tuned weights on a separate development set
- Tested on:
  - Standard ad hoc (robust track) topics
  - Topics with multiple annotators
  - Topics with multiple instances
25. Diversity on Google Results
- Task: reranking the top 1,000 Google results
- In optimizing 1-call, our algorithm finds more diverse results than the PRP or Google's own ranking
26. Experiments: Robust Track
- TREC 2003, 2004 robust tracks
  - 249 topics
  - 528,000 documents
- 1-call and 10-call results statistically significant
27. Experiments: Instance Retrieval
- TREC-6, 7, 8 interactive tracks
  - 20 topics
  - 210,000 documents
  - 7 to 56 instances per topic
- PRP baseline: instance recall @ 10 = 0.234
- Greedy 1-call: instance recall @ 10 = 0.315
28. Experiments: Multi-annotator
- TREC-4, 6 ad hoc retrieval
  - Independent annotators assessed the same topics
  - TREC-4: 49 topics, 568,000 documents, 3 annotators
  - TREC-6: 50 topics, 556,000 documents, 2 annotators
- ⇒ More annotators are satisfied using 1-greedy
29. Related Work
- Fits in the risk minimization framework (objective as a negative loss function)
- Other approaches look at optimizing for metrics directly, with training data
- Pseudo-relevance feedback
- Subtopic retrieval
- Maximal marginal relevance
- Clustering
- See the paper for references
30. Future Work
- General k-call (k = 2, etc.)
  - Determining whether this is what users want
- Better underlying probabilistic models
  - Our contribution is in the ranking objective, not the model ⇒ the model can be arbitrarily sophisticated
- Better optimization techniques
  - E.g., local search would differentiate the algorithms for MRR and 1-call
- Other metrics
  - Preliminary work on mean average precision and precision @ recall
  - (Perhaps) surprisingly, these metrics are not optimized by the PRP!
31. Conclusions
- EMP: the metric can motivate the model; choosing and believing in a metric already gives us a reasonable objective, E[metric]
- EMP can potentially be applied on top of a variety of different underlying probabilistic models
- Diversity is one practical example of a natural side effect of using EMP with the right metric
32. Acknowledgments
- Harr Chen is supported by the Office of Naval Research through a National Defense Science and Engineering Graduate Fellowship
- Jaime Teevan, Susan Dumais, and the anonymous reviewers provided constructive feedback
- ChengXiang Zhai, William Cohen, and Ellen Voorhees provided code and data