Title: Two-stage Language Models for Information Retrieval
1. Two-stage Language Models for Information Retrieval
- ChengXiang Zhai, John Lafferty
- School of Computer Science, Carnegie Mellon University
- (New address: Department of Computer Science, University of Illinois at Urbana-Champaign)
2. Motivation

- Retrieval parameters are needed to
  - model different user preferences
  - customize a retrieval model to different queries and documents
- So far, parameters have been set through empirical experimentation
- Can we set parameters automatically?
3. Parameters in Traditional Models

- EXTERNAL to the model, hard to interpret
  - Most parameters are introduced heuristically to implement our intuition
  - As a result, there is no principled way to quantify them
- Set through empirical experiments
  - Lots of experimentation
  - Optimality for new queries is not guaranteed
4. Example of Parameter Tuning (Okapi)

"k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database. k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous. In long queries k3 is often set to 7 or 1000 (effectively infinite)." (Robertson et al., 1999)
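For context, these parameters sit inside the standard Okapi BM25 weighting function (the formula itself is not on the slide; this is the usual formulation):

```latex
\mathrm{score}(q,d) = \sum_{t \in q}
  \log\frac{N - df_t + 0.5}{df_t + 0.5}
  \cdot \frac{(k_1+1)\,tf_{t,d}}{k_1\bigl((1-b) + b\,\frac{dl_d}{avdl}\bigr) + tf_{t,d}}
  \cdot \frac{(k_3+1)\,qtf_t}{k_3 + qtf_t}
```

Here k1 controls term-frequency saturation, b the strength of document-length normalization, and k3 the saturation of query-term frequency. None of the three falls out of the model itself, which is exactly the problem this talk addresses.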
5. The Way to Automatic Tuning ...

- Parameters must be PART of the model!
  - Query modeling (explains differences among queries)
  - Document modeling (explains differences among documents)
- De-couple the influence of the query on parameter setting from that of the documents
  - To achieve stable parameter settings
  - To pre-compute query-independent parameters
6. The Rest of the Talk
7. The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)
8. Parameter Setting in Risk Minimization

[Diagram: a user issues a query, modeled by a query language model; documents are modeled by document language models; a loss function defined over the two drives ranking.]
9. Two-stage Language Models

[Diagram: query → query language model θ_q; document → document language model θ_d, obtained by smoothing; both feed the loss function.]
10. Sensitivity in Traditional (One-stage) Smoothing

[Plots: retrieval performance as a function of the smoothing parameter, shown separately for keyword queries and for verbose (sentence-like) queries.]
11. The Need for Two-stage Smoothing (I): Accurate Estimation of the Document Model

Query: "data mining algorithms"

Document: a text mining paper, |d| = 500. Unsmoothed language model p(w|d) = c(w,d)/|d|:

  word         c(w,d)/|d|   p(w|d)
  text         10/500       0.02
  mining       3/500        0.006
  association  1/500        0.002
  algorithm    2/500        0.004
  data         0/500        0

p(q|d) = p(data|d) · p(mining|d) · p(algorithm|d) = 0 · 0.006 · 0.004 = 0!

So what should p(data|d) be? And p(unicorn|d)?
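Dirichlet-prior smoothing (stage one of the model) answers this by interpolating the document counts with the collection model p(w|C). As an illustrative calculation, with assumed values μ = 1000 and p(data|C) = 0.001:

```latex
p_{\mu}(\text{data} \mid d)
 = \frac{c(\text{data},d) + \mu\,p(\text{data} \mid C)}{|d| + \mu}
 = \frac{0 + 1000 \times 0.001}{500 + 1000} \approx 0.00067
```

Since p(unicorn|C) is far smaller than p(data|C), the unseen word "unicorn" correctly receives a far smaller probability than the unseen word "data".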
12. The Need for Two-stage Smoothing (II): Explanation of Noise in the Query

Query: "the algorithms for data mining"

p(algorithms|d1) = p(algorithms|d2)
p(data|d1) < p(data|d2)
p(mining|d1) < p(mining|d2)

But p(q|d1) > p(q|d2)! The common words dominate the score. We should make p(the|d) and p(for|d) less different across documents.
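Stage two damps exactly this noise by interpolating with a query-background model p(w|U). As an illustrative calculation, with assumed values λ = 0.7, p(the|U) = 0.05, p(the|d1) = 0.04, and p(the|d2) = 0.02:

```latex
\begin{aligned}
(1-\lambda)\,p(\text{the}\mid d_1) + \lambda\,p(\text{the}\mid U) &= 0.3 \times 0.04 + 0.7 \times 0.05 = 0.047\\
(1-\lambda)\,p(\text{the}\mid d_2) + \lambda\,p(\text{the}\mid U) &= 0.3 \times 0.02 + 0.7 \times 0.05 = 0.041
\end{aligned}
```

The 2:1 gap in p(the|·) shrinks to about 1.15:1, so common words stop dominating the ranking.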
13. Two-stage Dirichlet-Mixture Smoothing
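The two-stage Dirichlet-mixture estimate, as defined in the accompanying paper: stage one applies Dirichlet-prior smoothing to the document model (μ acts as an effective document sample size), stage two interpolates with a user/query background model (λ is the query-noise weight):

```latex
p_{\lambda,\mu}(w \mid d) = (1-\lambda)\,\frac{c(w,d) + \mu\,p(w \mid C)}{|d| + \mu} + \lambda\,p(w \mid U)
```

where c(w,d) is the count of w in d, p(w|C) is the collection language model, and p(w|U) is the user's query-background model.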
14. Estimating μ Using Leave-one-out
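μ is chosen to maximize the leave-one-out log-likelihood of the collection: each word occurrence is held out in turn, and the Dirichlet-smoothed model built from the rest of its document must predict it:

```latex
\hat{\mu} = \arg\max_{\mu} \sum_{d \in C} \sum_{w \in d} c(w,d)\,
\log\frac{c(w,d) - 1 + \mu\,p(w \mid C)}{|d| - 1 + \mu}
```

The paper solves this with Newton's method; the following is a minimal runnable sketch that substitutes a simple grid search and assumes documents are represented as word → count dicts:

```python
import math

def collection_model(docs):
    """Maximum-likelihood collection model p(w|C)."""
    counts, total = {}, 0
    for doc in docs:
        for w, c in doc.items():
            counts[w] = counts.get(w, 0) + c
            total += c
    return {w: c / total for w, c in counts.items()}

def loo_log_likelihood(mu, docs, p_c):
    """Leave-one-out log-likelihood of mu over the collection."""
    ll = 0.0
    for doc in docs:
        dlen = sum(doc.values())
        for w, c in doc.items():
            ll += c * math.log((c - 1 + mu * p_c[w]) / (dlen - 1 + mu))
    return ll

def estimate_mu(docs, grid=range(100, 5001, 100)):
    """Grid search for the mu maximizing leave-one-out likelihood
    (stands in for the Newton iteration used in the paper)."""
    p_c = collection_model(docs)
    return max(grid, key=lambda mu: loo_log_likelihood(mu, docs, p_c))

# Toy usage with made-up documents:
docs = [{"data": 2, "mining": 3, "text": 10},
        {"algorithm": 4, "search": 2, "text": 1}]
print(estimate_mu(docs))
```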
15. Estimating λ Using a Mixture Model
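λ is estimated by treating the query as a sample from a two-component mixture of the (μ-smoothed) document model and the query-background model p(w|U), and maximizing the query likelihood with EM. A minimal sketch under the simplifying assumption that p(w|U) is held fixed (the paper also re-estimates it inside EM); models are word → probability dicts and all numbers below are illustrative:

```python
def estimate_lambda(query, doc_models, p_u, n_iter=50):
    """EM for the query-noise weight lambda.

    query: list of query words
    doc_models: smoothed document models p_mu(w|d), one dict per document
    p_u: fixed query-background model p(w|U)
    """
    lam = 0.5  # initial guess
    n_terms = len(doc_models) * len(query)
    for _ in range(n_iter):
        z_total = 0.0
        for p_d in doc_models:
            for w in query:
                noise = lam * p_u.get(w, 1e-9)
                # E-step: posterior probability that w came from the
                # background (noise) component rather than the document
                z_total += noise / ((1 - lam) * p_d.get(w, 1e-9) + noise)
        lam = z_total / n_terms  # M-step: expected fraction of noise words
    return lam

# Toy usage with made-up models:
doc_models = [{"the": 0.04, "data": 0.02, "mining": 0.01},
              {"the": 0.05, "data": 0.001, "mining": 0.0005}]
p_u = {"the": 0.06, "data": 0.002, "mining": 0.001}
print(estimate_lambda(["the", "data", "mining"], doc_models, p_u))
```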
16. Effectiveness of Parameter Estimation

- Five databases
  - News articles (AP, WSJ, ZIFF, FBIS, FT, LA)
  - Government documents (Federal Register)
  - Web pages
- Four types of queries
  - Long vs. short
  - Verbose (sentence-like) vs. keyword
- Results: Automatic 2-stage ≈ Optimal 1-stage
17. Automatic 2-stage Results ≈ Optimal 1-stage Results

[Table: average precision on 3 databases × 4 query types, 150 topics.]
18. Automatic 2-stage Results ≈ Optimal 1-stage Results

[Table: average precision on 2 large databases × 2 query types, 50 topics.]
19. Conclusions

- Two-stage language models
  - Direct modeling of both queries and documents
  - Parameters are part of a probabilistic model
  - Parameters can be estimated using standard estimation techniques
- Two-stage Dirichlet-Mixture smoothing
  - Involves two meaningful parameters (i.e., document sample size and query noise)
  - Achieves very good performance with automatically set smoothing parameters
- It is possible to set parameters automatically!
20. Future Work

- Optimality analysis in the two-stage parameter space
- Offline vs. online estimation
- Alternative estimation methods
- Parameter estimation for more sophisticated language models (e.g., with feedback)
21. Thank you!