Title: Term Feedback for Information Retrieval with Language Models
1. Term Feedback for Information Retrieval with Language Models
- Bin Tan, Atulya Velivelli, Hui Fang, ChengXiang Zhai
- SIGIR '07
- University of Illinois at Urbana-Champaign
2. Outline
- Background on language models
- Term feedback vs. document feedback
- Probabilistic term feedback methods
- Evaluation
- Conclusions
3. Kullback-Leibler Divergence Retrieval Method (Lafferty & Zhai '01)
[Figure: a query q ("data mining") and a document d (a text-mining paper) are each represented by language models θ_q and θ_d; documents are ranked by -D(θ_q || θ_d).]
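The KL-divergence ranking just illustrated can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `eps` floor stands in for the proper smoothing (e.g., a Dirichlet prior) a real system would use, and all function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) over a shared vocabulary; eps guards against log(0)
    when a query word is absent from the document model."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)

def rank_documents(theta_q, doc_models):
    """Score each document model theta_d by -D(theta_q || theta_d)
    and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, -kl_divergence(theta_q, theta_d))
              for doc_id, theta_d in doc_models.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

A document whose model is close to θ_q gets a small divergence, hence a high score.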
4. Improve θ_q through Feedback (e.g., Zhai & Lafferty '01; Lavrenko & Croft '01)
[Figure: the user issues a query to the retrieval engine, which returns ranked results (d1 3.5, d2 2.4, ..., dk 0.5) from the document collection; feedback on these documents is used to update θ_q.]
5. Problems with Document-Based Feedback
- A relevant document may contain non-relevant parts
- Sometimes none of the top-ranked documents is relevant
- The user only indirectly controls the learned query model
6. What about Term Feedback?
- Present a list of terms to the user and ask for judgments
- More direct contribution to estimating θ_q
- Works even when no relevant document appears near the top
- Challenges
  - How do we select the terms to present to the user?
  - How do we exploit term feedback to improve our estimate of θ_q?
7. Improve θ_q with Term Feedback
[Figure: the retrieval engine returns ranked results (d1 3.5, d2 2.4, ...) from the document collection; the user judges a list of presented terms, and the judgments are used to update θ_q.]
8. Feedback Term Selection
- General (old) idea
  - The original query is used for an initial retrieval run
  - Feedback terms are selected from the top N documents
- New idea
  - Model subtopics
  - Select terms to represent every subtopic well
- Benefits
  - Avoids bias in term feedback
  - Infers relevant subtopics, thus achieving subtopic feedback
9. User-Guided Query Model Refinement
[Figure: the document space, with the area explored by the user highlighted.]
10. Collaborative Estimation of θ_q
[Figure: the original query model P(w|θ_q) is combined with two feedback models: TFB, which assigns probabilities directly to judged terms (e.g., P(t1|θ_TFB) = 0.2, P(t3|θ_TFB) = 0.1), and CFB, which mixes the subtopic cluster models (e.g., P(w|θ_CFB) = 0.2·P(w|θ_1) + 0.1·P(w|θ_2) + ...); the top N docs are ranked by D(θ_q || θ_d).]
11. Discovering Subtopic Clusters with PLSA (Hofmann '99; Zhai et al. '04)
[Figure: for the query "transportation tunnel disaster", each document d is modeled as a mixture of subtopic cluster models, estimated by a maximum-likelihood estimator (EM algorithm).]
12. Selecting Representative Terms
- Original query terms are excluded
- A term shared by several clusters is assigned to its most likely cluster
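The two rules above can be sketched as follows. This is an illustrative sketch with hypothetical names: each cluster model is assumed to be a word-probability dictionary (as produced by PLSA), and `L` is the number of terms presented per cluster.

```python
def select_feedback_terms(cluster_models, query_terms, L=16):
    """Pick the top-L terms per cluster by p(w|theta_i), excluding
    original query terms; a term shared by several clusters is
    assigned only to the cluster where it is most probable."""
    # Assign each candidate term to its most likely cluster.
    owner = {}
    for i, theta in enumerate(cluster_models):
        for w, p in theta.items():
            if w in query_terms:
                continue  # original query terms are excluded
            if w not in owner or p > owner[w][1]:
                owner[w] = (i, p)
    # Collect each cluster's own terms and keep the L most probable.
    per_cluster = [[] for _ in cluster_models]
    for w, (i, p) in owner.items():
        per_cluster[i].append((w, p))
    return [sorted(terms, key=lambda t: -t[1])[:L] for terms in per_cluster]
```

Assigning shared terms to a single cluster keeps the presented lists disjoint, so each checkbox judgment unambiguously supports one subtopic.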
13. User Interface for Term Feedback
[Figure: a clarification form presenting the selected terms as checkboxes, arranged in columns grouped by cluster (Cluster 1, Cluster 2, Cluster 3).]
14. Experiment Setup
- TREC 2005 HARD Track
- AQUAINT corpus (3 GB)
- 50 hard query topics
- NIST assessors spent up to 3 minutes per topic providing feedback via a Clarification Form (CF)
- Submitted CFs: 1x48, 3x16, 6x8 (clusters x terms per cluster)
- Baseline: KL-divergence retrieval with 5 pseudo-feedback docs
- 48 terms generated from the top 60 docs of the baseline run
15. Retrieval Accuracy Comparison
- 1C = 1x48, 3C = 3x16, 6C = 6x8
- All feedback configurations outperform the baseline (except CFB1C)
- CFB1C: with a single cluster, user feedback plays no role
16. MAP Variation with the Number of Presented Terms
- 12 terms: 1x12 / 3x4 / 6x2
17. Clarification Form Completion Time
- More than half were completed in just 1 minute
18. Term Relevance Judgment Quality
[Figure: term relevance of the presented terms, measured following Zaragoza et al. '04.]
19. Had the User Checked All Relevant Terms
20. Comparison to Relevance Feedback
- MAP equivalence: TCFB3C is comparable to relevance feedback with 5 judged docs
21. Term Feedback Helps Difficult Topics
- No relevant docs in the top 5
22. Related Work
- Early work: Harman '88; Spink '94; Koenemann & Belkin '96
- More recent: Ruthven '03; Anick '03
- Main differences
  - Language-modeling framework
  - Consistently effective
23. Conclusions and Future Work
- A novel way of improving query model estimation through term feedback
  - Active feedback based on subtopics
  - User-system collaboration
  - Achieves a large performance improvement over the non-feedback baseline with a small amount of user effort
  - Can compete with relevance feedback, especially when the latter is unable to help
- To explore more complex interaction processes
  - Combination of term feedback and relevance feedback
  - Incremental feedback
24. Contributions
- Feedback terms for user judgment are selected considering both
  - Single-term information value (relative frequency in top documents)
  - Relationships among terms (formation of topic clusters)
  - (neglected in most previous work)
- The query model is updated according to
  - Terms judged relevant by the user
  - Clusters containing relevant terms
  - (most previous work simply appends feedback terms to the original query, often without differential weighting)
- Constructing a refined query model requires careful term selection to maximize the information gained from feedback (active feedback)
25. Feedback Term Selection (cont.)
- Assume the documents are generated by a background model θ_B and K topic cluster models θ_1, ..., θ_K
- Use EM to estimate the cluster models by maximizing the likelihood

  log L(θ_B, {θ_i}, {π_d,i}) = Σ_d Σ_w c(w, d) · log[ λ_B · p(w|θ_B) + (1 - λ_B) · Σ_i π_d,i · p(w|θ_i) ]

  where θ_i is the i-th cluster model, θ_B is a background model that explains common words, and π_d,i is the mixing weight of the i-th cluster in document d
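The EM procedure for this mixture can be sketched as below. This is a simplified, illustrative version (fixed background weight λ_B, seeded random initialization); the function and variable names are ours, not the paper's.

```python
import random

def em_cluster_models(docs, K, lam_b=0.9, iters=50, seed=0):
    """EM for the mixture model: each word occurrence in document d is
    generated by the background theta_B (with probability lam_b) or by
    a cluster model theta_i chosen with document weight pi[d][i]."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    total = sum(sum(d.values()) for d in docs)
    # Fixed background model: collection frequencies (explains common words).
    theta_b = {w: sum(d.get(w, 0) for d in docs) / total for w in vocab}
    # Seeded random initialization of the K cluster models.
    theta = []
    for _ in range(K):
        raw = {w: rng.random() + 1e-3 for w in vocab}
        s = sum(raw.values())
        theta.append({w: v / s for w, v in raw.items()})
    pi = [[1.0 / K] * K for _ in docs]  # mixing weights pi[d][i]
    for _ in range(iters):
        new_theta = [dict.fromkeys(vocab, 1e-10) for _ in range(K)]
        new_pi = [[1e-10] * K for _ in docs]
        for di, d in enumerate(docs):
            for w, c in d.items():
                # E-step: posterior of each cluster generating w in d.
                mix = [(1 - lam_b) * pi[di][i] * theta[i][w] for i in range(K)]
                denom = lam_b * theta_b[w] + sum(mix)
                for i in range(K):
                    z = c * mix[i] / denom  # expected count for cluster i
                    new_theta[i][w] += z
                    new_pi[di][i] += z
        # M-step: renormalize cluster models and mixing weights.
        for i in range(K):
            s = sum(new_theta[i].values())
            theta[i] = {w: v / s for w, v in new_theta[i].items()}
        pi = [[v / sum(row) for v in row] for row in new_pi]
    return theta, pi
```

With a high λ_B, frequent collection-wide words are absorbed by the background, so the cluster models concentrate on discriminative subtopic terms.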
26. Feedback Term Selection (cont.)
- A cross-collection mixture model for comparative text mining --- C. Zhai et al., SIGKDD '04
28. Query Model Estimation from Feedback: TFB
- TFB (direct Term FeedBack)
- Terms judged as relevant (checked terms) are used for query expansion
- Original query terms have weight μ; relevant terms have weight 1; non-relevant terms have weight 0:

  p(w | θ_TFB) = (μ · c(w, q) + δ(w)) / (μ · |q| + R)

  where δ(w) is an indicator variable of whether w is judged as relevant, and R = Σ_w δ(w) is the total number of relevant terms
- For a long query, the original part is more important
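The TFB weighting just described is a one-function computation. This sketch uses hypothetical names; `mu` is the original-query weight μ.

```python
def tfb_model(query_counts, relevant_terms, mu=2.0):
    """p(w|theta_TFB): each original query occurrence is weighted mu,
    each checked (relevant) term is weighted 1, all other terms 0."""
    weights = {w: mu * c for w, c in query_counts.items()}
    for t in relevant_terms:
        weights[t] = weights.get(t, 0.0) + 1.0
    z = sum(weights.values())  # normalizer: mu*|q| + R
    return {w: v / z for w, v in weights.items()}
```

Raising `mu` shifts mass back toward the original query, which matters for long queries where the original part is more important.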
29. Query Model Estimation from Feedback: TFB (cont.)
- TFB trusts (only) terms that are judged as relevant
- Terms not presented to the user (below the top L in each cluster) and terms left unchecked by the user (possibly overlooked) are ignored
- But if a non-presented / unchecked term is in a cluster that has many relevant terms, it is likely to be relevant too
[Figure: a probably (5/9) relevant cluster, with some terms overlooked by the user and non-presented terms such as cab, transport, accid, disast, mile, italy.]
30. Query Model Estimation from Feedback: CFB
- CFB (Cluster FeedBack)
- Cluster models θ_k are mixed with weights proportional to their likelihood of being relevant, P(R | q, θ_k)
- It is often hard for a user to directly estimate cluster relevance (esp. if cluster quality is poor)
- Thus cluster relevance is inferred from term relevance
- Roughly proportional to the number of relevant terms in the cluster:

  p(w | θ_CFB) = λ · p(w | θ_q) + (1 - λ) · Σ_i (R_i / R) · p(w | θ_i)

  where θ_q is the original query model, θ_i is the i-th cluster model, R_i is the number of relevant terms in cluster i, and R = Σ_i R_i is the total number of relevant terms
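The CFB mixture can be sketched directly from that formula (hypothetical names; `lam` is the interpolation weight λ):

```python
def cfb_model(theta_q, cluster_models, relevant_counts, lam=0.5):
    """p(w|theta_CFB): interpolate the original query model with the
    cluster models, each weighted by its share R_i/R of relevant terms."""
    R = sum(relevant_counts)
    if R == 0:
        return dict(theta_q)  # no relevant terms: keep the query model
    out = {w: lam * p for w, p in theta_q.items()}
    for r_i, theta in zip(relevant_counts, cluster_models):
        for w, p in theta.items():
            out[w] = out.get(w, 0.0) + (1 - lam) * (r_i / R) * p
    return out
```

A cluster with many checked terms thus pulls the whole cluster model, including its unchecked and non-presented terms, into the refined query model.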
31. Query Model Estimation from Feedback: CFB (cont.)
- CFB ignores whether a term is judged as relevant or not (only its cluster membership matters)
- But if a term is explicitly indicated as relevant, it is more likely to be relevant than other terms (non-presented or unchecked)
[Figure: fire/truck/smoke/car/victim are more likely than the other terms in their cluster; relevant-term fractions: Cluster 1: 3/9, Cluster 2: 5/9, Cluster 3: 1/9.]
32. Query Model Estimation from Feedback: TCFB
- TCFB (Term-Cluster FeedBack): interpolation of the TFB and CFB models

  p(w | θ_TCFB) = α · p(w | θ_TFB) + (1 - α) · p(w | θ_CFB)
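The interpolation is a simple pointwise mix over the union of the two vocabularies (hypothetical names; `alpha` is the interpolation weight α):

```python
def tcfb_model(theta_tfb, theta_cfb, alpha=0.5):
    """p(w|theta_TCFB) = alpha*p(w|theta_TFB) + (1-alpha)*p(w|theta_CFB)."""
    vocab = set(theta_tfb) | set(theta_cfb)
    return {w: alpha * theta_tfb.get(w, 0.0)
               + (1 - alpha) * theta_cfb.get(w, 0.0) for w in vocab}
```

Since both inputs are probability distributions, the interpolated model sums to one as well, combining TFB's trust in explicitly checked terms with CFB's coverage of unchecked cluster terms.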