Two-stage Language Models for Information Retrieval - PowerPoint PPT Presentation

1
Two-stage Language Models for Information Retrieval
  • ChengXiang Zhai*, John Lafferty
  • School of Computer Science
  • Carnegie Mellon University

*New address: Department of Computer Science, University of Illinois, Urbana-Champaign
2
Motivation
  • Retrieval parameters are needed to
      • model different user preferences
      • customize a retrieval model according to
        different queries and documents
  • So far, parameters have been set through
    empirical experimentation
  • Can we set parameters automatically?

3
Parameters in Traditional Models
  • EXTERNAL to the model, hard to interpret
      • Most parameters are introduced heuristically to
        implement our intuition
      • As a result, no principles to quantify them
  • Set through empirical experiments
      • Lots of experimentation
      • Optimality for new queries is not guaranteed

4
Example of Parameter Tuning (Okapi)
"k1, b and k3 are parameters which depend on the
nature of the queries and possibly on the
database; k1 and b default to 1.2 and 0.75
respectively, but smaller values of b are
sometimes advantageous; in long queries k3 is
often set to 7 or 1000 (effectively infinite)."

(Robertson et al. 1999)
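To make the roles of k1, b and k3 concrete, here is a minimal Python sketch of a BM25 term weight. It assumes the commonly cited form of the Okapi formula (the formula shown on the original slide did not survive in this transcript); the function and argument names are illustrative and not taken from any particular library.

```python
import math


def bm25_term_weight(tf, qtf, doc_len, avg_doc_len, df, n_docs,
                     k1=1.2, b=0.75, k3=7.0):
    """Okapi BM25 weight of one query term in one document (sketch)."""
    # IDF component; negative when a term occurs in more than half the documents
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # b interpolates between no length normalization (b=0) and full normalization (b=1)
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # k1 controls how quickly the document term frequency tf saturates
    doc_part = (k1 + 1) * tf / (K + tf)
    # k3 controls how quickly the query term frequency qtf saturates
    query_part = (k3 + 1) * qtf / (k3 + qtf)
    return idf * doc_part * query_part


def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, doc_freq, n_docs, **params):
    """Sum of term weights over the query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[w], qtf, doc_len, avg_doc_len,
                         doc_freq[w], n_docs, **params)
        for w, qtf in query_tf.items()
        if doc_tf.get(w, 0) > 0
    )


# A document twice the average length, with and without length normalization
print(bm25_term_weight(tf=3, qtf=1, doc_len=200, avg_doc_len=100, df=10, n_docs=1000))
print(bm25_term_weight(tf=3, qtf=1, doc_len=200, avg_doc_len=100, df=10, n_docs=1000, b=0.0))
```

With qtf = 1 (each query word appears once, as in most short queries) the k3 factor equals 1 for any k3, which is why k3 only matters for long queries with repeated terms.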
5
The Way to Automatic Tuning ...
  • Parameters must be PART of the model!
      • Query modeling (explain difference in query)
      • Document modeling (explain difference in doc)
  • De-couple the influence of a query on parameter
    setting from that of documents
      • To achieve stable setting of parameters
      • To pre-compute query-independent parameters

6
The Rest of the Talk
7
The Risk Minimization Framework (Lafferty & Zhai 01, Zhai 02)
8
Parameter Setting in Risk Minimization
[Diagram: User → Query → Query Language Model; Documents → Document Language Models; the query and document models are connected through a Loss Function]
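The diagram does not say which loss function is used. One well-known choice in the risk minimization framework (Lafferty & Zhai 2001) leads to ranking documents by the negative KL divergence between the query language model and each document language model; with a maximum-likelihood query model, this ranks documents exactly as query likelihood does. A minimal Python sketch, in which the query and both document models are made up for illustration:

```python
import math
from collections import Counter


def ml_query_model(query_terms):
    """Maximum-likelihood query language model: p(w|Q) = c(w,q) / |q|."""
    counts = Counter(query_terms)
    return {w: c / len(query_terms) for w, c in counts.items()}


def neg_kl_score(query_model, doc_model):
    """Rank score -KL(theta_Q || theta_D) = sum_w p(w|Q) * log(p(w|D) / p(w|Q)).

    doc_model must already be smoothed: every word with p(w|Q) > 0 needs p(w|D) > 0.
    """
    return sum(pq * math.log(doc_model[w] / pq) for w, pq in query_model.items())


def log_query_likelihood(query_terms, doc_model):
    """log p(q|d) = sum over query tokens of log p(q_i|d)."""
    return sum(math.log(doc_model[w]) for w in query_terms)


query = ["data", "mining", "data"]
q_model = ml_query_model(query)
d1 = {"data": 0.01, "mining": 0.005}   # hypothetical smoothed document models
d2 = {"data": 0.002, "mining": 0.004}
print(neg_kl_score(q_model, d1), neg_kl_score(q_model, d2))
```

The difference between two documents' scores equals the difference of their log query likelihoods divided by the query length, because the query-entropy part of the KL divergence is the same for every document.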
9
Two-stage Language Models
[Diagram: Query → Query Language Model (θ_q); Doc → Document Language Model (θ_d), which needs Smoothing!; the two models are connected through a Loss Function]
10
Sensitivity in Traditional (one-stage) Smoothing
[Figure panels: Keyword queries; Verbose (sentence-like) queries]
11
The Need for Two-stage Smoothing (I): Accurate Estimation of Doc Model
Document: "Text mining paper" (500 words)
Query: "data mining algorithms"
Language model p(w|d):
  text         10/500 = 0.02
  mining        3/500 = 0.006
  association   1/500 = 0.002
  algorithm     2/500 = 0.004
  data          0/500 = 0
$p(q|d) = p(\text{data}|d)\,p(\text{mining}|d)\,p(\text{algorithm}|d) = 0 \times 0.006 \times 0.004 = 0$!
$p(\text{data}|d) = ?$
$p(\text{unicorn}|d) = ?$
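The slide's point can be checked numerically. The Python sketch below reuses the slide's counts for the 500-word document and assumes Dirichlet-prior smoothing with a made-up collection model p(w|C) and a made-up prior μ = 1000, so that the unseen words "data" and "unicorn" get small but different non-zero probabilities.

```python
import math

# Counts from the slide: a 500-word "text mining paper"
doc_counts = {"text": 10, "mining": 3, "association": 1, "algorithm": 2}
doc_len = 500

# Hypothetical collection language model p(w|C); these values are made up
collection = {"text": 0.001, "mining": 0.0005, "association": 0.0002,
              "algorithm": 0.0008, "data": 0.002, "unicorn": 1e-7}


def p_ml(w):
    """Maximum-likelihood estimate c(w,d) / |d|: zero for unseen words."""
    return doc_counts.get(w, 0) / doc_len


def p_dirichlet(w, mu=1000.0):
    """Dirichlet-smoothed estimate (c(w,d) + mu * p(w|C)) / (|d| + mu); mu is hypothetical."""
    return (doc_counts.get(w, 0) + mu * collection[w]) / (doc_len + mu)


query = ["data", "mining", "algorithm"]
print(math.prod(p_ml(w) for w in query))         # 0.0: one unseen word zeroes the product
print(math.prod(p_dirichlet(w) for w in query))  # small but non-zero
print(p_dirichlet("data") > p_dirichlet("unicorn"))  # True
```

Smoothing also discounts the seen words (here "mining" drops from 0.006 to about 0.0023), which is the price paid for giving some probability mass to unseen ones.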
12
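The Need for Two-stage Smoothing (II): Explanation of Noise in Query
$p(\text{algorithms}|d_1) = p(\text{algorithm}|d_2)$
$p(\text{data}|d_1) < p(\text{data}|d_2)$
$p(\text{mining}|d_1) < p(\text{mining}|d_2)$
But $p(q|d_1) > p(q|d_2)$!
Since d1 does no better on the content words, it can only win through common query words such as "the" and "for".
We should make p(the) and p(for) less different for all docs.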
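A hypothetical illustration of this second problem in Python. The query and all probabilities below are invented (the slide's own numbers did not survive): d1 wins only because it gives more probability to "the" and "for". Interpolating each document model with a query background model p(w|U), here with an assumed weight λ = 0.5, shrinks the gap on those common words, and d2, which is better on the content words, comes out ahead.

```python
import math

query = ["the", "algorithms", "for", "data", "mining"]  # hypothetical query

# Hypothetical (already smoothed) document models
d1 = {"the": 0.04, "for": 0.03, "algorithms": 0.001, "data": 0.002, "mining": 0.002}
d2 = {"the": 0.01, "for": 0.01, "algorithms": 0.001, "data": 0.003, "mining": 0.003}

# Hypothetical query background model p(w|U): common words dominate
background = {"the": 0.1, "for": 0.05, "algorithms": 0.0001,
              "data": 0.0001, "mining": 0.0001}


def log_likelihood(query, doc_model, lam):
    """log p(q|d) with p(w|d) replaced by (1 - lam) * p(w|d) + lam * p(w|U)."""
    return sum(math.log((1 - lam) * doc_model[w] + lam * background[w])
               for w in query)


for lam in (0.0, 0.5):
    winner = "d1" if log_likelihood(query, d1, lam) > log_likelihood(query, d2, lam) else "d2"
    print(lam, winner)  # prints "0.0 d1", then "0.5 d2"
```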
13
Two-stage Dirichlet-Mixture Smoothing
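For reference, the two-stage smoothed document model can be written as follows, assuming the form from Zhai and Lafferty's two-stage work: the first stage smooths the document with a Dirichlet prior based on the collection model p(w|C), controlled by μ (document sample size); the second interpolates with a query background model p(w|U), controlled by λ (query noise). Documents are then ranked by the query likelihood p(q|d).

```latex
p(w \mid d) = (1-\lambda)\,\frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu} + \lambda\, p(w \mid U),
\qquad
p(q \mid d) = \prod_{i=1}^{m} p(q_i \mid d)
```

With λ = 0 this reduces to ordinary Dirichlet smoothing, and with μ = 0 to a simple mixture of the maximum-likelihood document model and p(w|U).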
14
Estimating μ using leave-one-out
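A minimal Python sketch of the leave-one-out idea for the Dirichlet prior μ: each word occurrence is predicted by the smoothed model of its own document with that occurrence removed, giving the log-likelihood Σ_i Σ_w c(w,d_i) log[(c(w,d_i) - 1 + μ p(w|C)) / (|d_i| - 1 + μ)], and μ is chosen to maximize it. The toy documents are made up, p(w|C) is estimated from them by maximum likelihood, and a plain grid search stands in for a proper numerical optimizer.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood collection model p(w|C) from all documents pooled."""
    total = Counter()
    for counts in docs:
        total.update(counts)
    n = sum(total.values())
    return {w: c / n for w, c in total.items()}


def leave_one_out_loglik(mu, docs, p_c):
    """Sum over documents d and words w of
    c(w,d) * log((c(w,d) - 1 + mu * p(w|C)) / (|d| - 1 + mu)); requires mu > 0."""
    total = 0.0
    for counts in docs:
        doc_len = sum(counts.values())
        for w, c in counts.items():
            total += c * math.log((c - 1 + mu * p_c[w]) / (doc_len - 1 + mu))
    return total


def estimate_mu(docs, grid):
    """Pick the grid value of mu with the highest leave-one-out log-likelihood."""
    p_c = collection_model(docs)
    return max(grid, key=lambda mu: leave_one_out_loglik(mu, docs, p_c))


docs = [Counter("text mining text data mining paper".split()),
        Counter("data mining algorithms for data analysis".split()),
        Counter("the unicorn paper on text".split())]
grid = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
print(estimate_mu(docs, grid))
```

Note that this criterion uses only the documents, with no queries or relevance judgments, so μ can be computed ahead of time.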
15
Estimating λ using Mixture Model
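A minimal EM sketch for the query-noise weight λ in Python, assuming the query is modeled as generated by a mixture over documents, p(q) = Σ_i π_i Π_j [(1 - λ) p(q_j|d_i) + λ p(q_j|U)], where the p(q_j|d_i) are already-smoothed document models. The updates below are a standard EM derivation for this mixture written for illustration, and every model and query is made up.

```python
import math


def estimate_lambda(query, doc_models, background, n_iter=100, lam=0.5):
    """EM estimate of lambda for a mixture-of-documents query model (sketch)."""
    n_docs = len(doc_models)
    pi = [1.0 / n_docs] * n_docs  # mixture weights over documents
    m = len(query)
    for _ in range(n_iter):
        # E-step (documents): posterior that each document generated the query,
        # computed in log space for numerical stability
        log_post = []
        for p_i, d in zip(pi, doc_models):
            lp = math.log(p_i) if p_i > 0 else float("-inf")
            for w in query:
                lp += math.log((1 - lam) * d[w] + lam * background[w])
            log_post.append(lp)
        top = max(log_post)
        weights = [math.exp(lp - top) for lp in log_post]
        z = sum(weights)
        post = [x / z for x in weights]
        # E-step (terms): expected number of query tokens drawn from p(w|U)
        noise = 0.0
        for r, d in zip(post, doc_models):
            for w in query:
                u = lam * background[w]
                noise += r * u / ((1 - lam) * d[w] + u)
        # M-step: new document weights and new lambda
        pi = post
        lam = noise / m
    return lam


doc_models = [{"data": 0.01, "mining": 0.01, "the": 0.02},
              {"data": 0.005, "mining": 0.002, "the": 0.03}]
background = {"data": 0.001, "mining": 0.001, "the": 0.1}  # hypothetical p(w|U)

print(estimate_lambda(["data", "mining"], doc_models, background))                # close to 0
print(estimate_lambda(["the", "the", "data", "mining"], doc_models, background))  # clearly larger
```

Because "the" is far more probable under p(w|U) than under either document model, a query that repeats it pushes λ up, which is the kind of query noise the second stage is meant to absorb.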
16
Effectiveness of Parameter Estimation
  • Five databases
      • News articles (AP, WSJ, ZIFF, FBIS, FT, LA)
      • Government documents (Federal Register)
      • Web pages
  • Four types of queries
      • Long vs. short
      • Verbose (sentence-like) vs. keyword
  • Results: automatic 2-stage ≈ optimal 1-stage

17
Automatic 2-stage results ≈ Optimal 1-stage results
Average precision (3 DBs, 4 query types, 150 topics)
18
Automatic 2-stage results ≈ Optimal 1-stage results
Average precision (2 large DBs, 2 query types, 50 topics)
19
Conclusions
  • Two-stage language models
      • Direct modeling of both queries and documents
      • Parameters are part of a probabilistic model
      • Parameters can be estimated using standard
        estimation techniques
  • Two-stage Dirichlet-Mixture smoothing
      • Involves two meaningful parameters (i.e.,
        document sample size and query noise)
      • Achieves very good performance through
        automatically setting smoothing parameters
  • It is possible to set parameters automatically!

20
Future Work
  • Optimality analysis in the two-stage parameter
    space
  • Offline vs. online estimation
  • Alternative estimation methods
  • Parameter estimation for more sophisticated
    language models (e.g., with feedback)

21
Thank you!