Title: Two-stage Language Models for Information Retrieval
1. Two-stage Language Models for Information Retrieval
- ChengXiang Zhai, John Lafferty
- School of Computer Science, Carnegie Mellon University
- (New address: Department of Computer Science, University of Illinois at Urbana-Champaign)
2. Motivation

- Retrieval parameters are needed to
  - model different user preferences
  - customize a retrieval model to different queries and documents
- So far, parameters have been set through empirical experimentation
- Can we set parameters automatically?
3. Parameters in Traditional Models

- EXTERNAL to the model, hard to interpret
  - Most parameters are introduced heuristically to implement our intuition
  - As a result, there is no principled way to quantify them
- Set through empirical experiments
  - Lots of experimentation
  - Optimality for new queries is not guaranteed
4. Example of Parameter Tuning (Okapi)

"k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database. k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous. In long queries k3 is often set to 7 or 1000 (effectively infinite)." (Robertson et al., 1999)
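For context, these parameters sit inside the standard Okapi BM25 weighting function (the formula itself is not on the slide; this is the usual formulation):

```latex
\mathrm{score}(q,d) = \sum_{t \in q}
  \log\frac{N - df_t + 0.5}{df_t + 0.5}
  \cdot \frac{(k_1+1)\,tf_{t,d}}{k_1\bigl((1-b) + b\,\frac{dl_d}{avdl}\bigr) + tf_{t,d}}
  \cdot \frac{(k_3+1)\,qtf_t}{k_3 + qtf_t}
```

Here k1 controls term-frequency saturation, b the strength of document-length normalization, and k3 the saturation of query-term frequency. None of the three falls out of the model itself, which is exactly the problem this talk addresses.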
5. The Way to Automatic Tuning ...

- Parameters must be PART of the model!
  - Query modeling (explains differences among queries)
  - Document modeling (explains differences among documents)
- De-couple the influence of the query on parameter setting from that of the documents
  - To achieve stable parameter settings
  - To pre-compute query-independent parameters
6. The Rest of the Talk
7. The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)
8. Parameter Setting in Risk Minimization

[Diagram: a user issues a query, modeled by a query language model; documents are modeled by document language models; a loss function defined over the two drives ranking.]
9. Two-stage Language Models

[Diagram: query → query language model θ_q; document → document language model θ_d, obtained by smoothing; both feed the loss function.]
10. Sensitivity in Traditional (One-stage) Smoothing

[Plots: retrieval performance as a function of the smoothing parameter, shown separately for keyword queries and for verbose (sentence-like) queries.]
11. The Need for Two-stage Smoothing (I): Accurate Estimation of the Document Model

Query: "data mining algorithms"

Document: a text mining paper, |d| = 500. Unsmoothed language model p(w|d) = c(w,d)/|d|:

  word         c(w,d)/|d|   p(w|d)
  text         10/500       0.02
  mining       3/500        0.006
  association  1/500        0.002
  algorithm    2/500        0.004
  data         0/500        0

p(q|d) = p(data|d) · p(mining|d) · p(algorithm|d) = 0 · 0.006 · 0.004 = 0!

So what should p(data|d) be? And p(unicorn|d)?
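Dirichlet-prior smoothing (stage one of the model) answers this by interpolating the document counts with the collection model p(w|C). As an illustrative calculation, with assumed values μ = 1000 and p(data|C) = 0.001:

```latex
p_{\mu}(\text{data} \mid d)
 = \frac{c(\text{data},d) + \mu\,p(\text{data} \mid C)}{|d| + \mu}
 = \frac{0 + 1000 \times 0.001}{500 + 1000} \approx 0.00067
```

Since p(unicorn|C) is far smaller than p(data|C), the unseen word "unicorn" correctly receives a far smaller probability than the unseen word "data".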
12. The Need for Two-stage Smoothing (II): Explanation of Noise in the Query

Query: "the algorithms for data mining"

p(algorithms|d1) = p(algorithms|d2)
p(data|d1) < p(data|d2)
p(mining|d1) < p(mining|d2)

But p(q|d1) > p(q|d2)! The common words dominate the score. We should make p(the|d) and p(for|d) less different across documents.
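Stage two damps exactly this noise by interpolating with a query-background model p(w|U). As an illustrative calculation, with assumed values λ = 0.7, p(the|U) = 0.05, p(the|d1) = 0.04, and p(the|d2) = 0.02:

```latex
\begin{aligned}
(1-\lambda)\,p(\text{the}\mid d_1) + \lambda\,p(\text{the}\mid U) &= 0.3 \times 0.04 + 0.7 \times 0.05 = 0.047\\
(1-\lambda)\,p(\text{the}\mid d_2) + \lambda\,p(\text{the}\mid U) &= 0.3 \times 0.02 + 0.7 \times 0.05 = 0.041
\end{aligned}
```

The 2:1 gap in p(the|·) shrinks to about 1.15:1, so common words stop dominating the ranking.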
13. Two-stage Dirichlet-Mixture Smoothing
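The two-stage Dirichlet-mixture estimate, as defined in the accompanying paper: stage one applies Dirichlet-prior smoothing to the document model (μ acts as an effective document sample size), stage two interpolates with a user/query background model (λ is the query-noise weight):

```latex
p_{\lambda,\mu}(w \mid d) = (1-\lambda)\,\frac{c(w,d) + \mu\,p(w \mid C)}{|d| + \mu} + \lambda\,p(w \mid U)
```

where c(w,d) is the count of w in d, p(w|C) is the collection language model, and p(w|U) is the user's query-background model.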
14. Estimating μ Using Leave-one-out
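μ is chosen to maximize the leave-one-out log-likelihood of the collection: each word occurrence is held out in turn, and the Dirichlet-smoothed model built from the rest of its document must predict it:

```latex
\hat{\mu} = \arg\max_{\mu} \sum_{d \in C} \sum_{w \in d} c(w,d)\,
\log\frac{c(w,d) - 1 + \mu\,p(w \mid C)}{|d| - 1 + \mu}
```

The paper solves this with Newton's method; the following is a minimal runnable sketch that substitutes a simple grid search and assumes documents are represented as word → count dicts:

```python
import math

def collection_model(docs):
    """Maximum-likelihood collection model p(w|C)."""
    counts, total = {}, 0
    for doc in docs:
        for w, c in doc.items():
            counts[w] = counts.get(w, 0) + c
            total += c
    return {w: c / total for w, c in counts.items()}

def loo_log_likelihood(mu, docs, p_c):
    """Leave-one-out log-likelihood of mu over the collection."""
    ll = 0.0
    for doc in docs:
        dlen = sum(doc.values())
        for w, c in doc.items():
            ll += c * math.log((c - 1 + mu * p_c[w]) / (dlen - 1 + mu))
    return ll

def estimate_mu(docs, grid=range(100, 5001, 100)):
    """Grid search for the mu maximizing leave-one-out likelihood
    (stands in for the Newton iteration used in the paper)."""
    p_c = collection_model(docs)
    return max(grid, key=lambda mu: loo_log_likelihood(mu, docs, p_c))

# Toy usage with made-up documents:
docs = [{"data": 2, "mining": 3, "text": 10},
        {"algorithm": 4, "search": 2, "text": 1}]
print(estimate_mu(docs))
```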
15. Estimating λ Using a Mixture Model
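λ is estimated by treating the query as a sample from a two-component mixture of the (μ-smoothed) document model and the query-background model p(w|U), and maximizing the query likelihood with EM. A minimal sketch under the simplifying assumption that p(w|U) is held fixed (the paper also re-estimates it inside EM); models are word → probability dicts and all numbers below are illustrative:

```python
def estimate_lambda(query, doc_models, p_u, n_iter=50):
    """EM for the query-noise weight lambda.

    query: list of query words
    doc_models: smoothed document models p_mu(w|d), one dict per document
    p_u: fixed query-background model p(w|U)
    """
    lam = 0.5  # initial guess
    n_terms = len(doc_models) * len(query)
    for _ in range(n_iter):
        z_total = 0.0
        for p_d in doc_models:
            for w in query:
                noise = lam * p_u.get(w, 1e-9)
                # E-step: posterior probability that w came from the
                # background (noise) component rather than the document
                z_total += noise / ((1 - lam) * p_d.get(w, 1e-9) + noise)
        lam = z_total / n_terms  # M-step: expected fraction of noise words
    return lam

# Toy usage with made-up models:
doc_models = [{"the": 0.04, "data": 0.02, "mining": 0.01},
              {"the": 0.05, "data": 0.001, "mining": 0.0005}]
p_u = {"the": 0.06, "data": 0.002, "mining": 0.001}
print(estimate_lambda(["the", "data", "mining"], doc_models, p_u))
```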
16. Effectiveness of Parameter Estimation

- Five databases
  - News articles (AP, WSJ, ZIFF, FBIS, FT, LA)
  - Government documents (Federal Register)
  - Web pages
- Four types of queries
  - Long vs. short
  - Verbose (sentence-like) vs. keyword
- Results: Automatic 2-stage ≈ Optimal 1-stage
17. Automatic 2-stage Results ≈ Optimal 1-stage Results

[Table: average precision on 3 databases × 4 query types, 150 topics.]
18. Automatic 2-stage Results ≈ Optimal 1-stage Results

[Table: average precision on 2 large databases × 2 query types, 50 topics.]
19. Conclusions

- Two-stage language models
  - Direct modeling of both queries and documents
  - Parameters are part of a probabilistic model
  - Parameters can be estimated using standard estimation techniques
- Two-stage Dirichlet-Mixture smoothing
  - Involves two meaningful parameters (i.e., document sample size and query noise)
  - Achieves very good performance with automatically set smoothing parameters
- It is possible to set parameters automatically!
20. Future Work

- Optimality analysis in the two-stage parameter space
- Offline vs. online estimation
- Alternative estimation methods
- Parameter estimation for more sophisticated language models (e.g., with feedback)
21. Thank you!