Probabilistic Information Retrieval Part II: In Depth

Provided by: ale110
Learn more at: http://www.cs.umd.edu

1
Probabilistic Information Retrieval Part II: In Depth
  • Alexander Dekhtyar
  • Department of Computer Science
  • University of Maryland

2
In this part
  • Probability Ranking Principle
    • simple case
    • case with retrieval costs
  • Binary Independence Retrieval (BIR)
    • Estimating the probabilities
  • Binary Independence Indexing (BII)
    • dual to BIR

3
The Basics
  • Bayesian probability formulas
  • Odds
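
The formulas on this slide did not survive in the transcript. The standard identities these two bullets normally refer to are written out below; the event names a, b and y are placeholders chosen here, not taken from the slide.

```latex
\begin{align*}
p(a \mid b)\, p(b) &= p(a \cap b) = p(b \mid a)\, p(a) \\
p(a \mid b) &= \frac{p(b \mid a)\, p(a)}{p(b)} \\
O(y) &= \frac{p(y)}{p(\bar{y})} = \frac{p(y)}{1 - p(y)}
\end{align*}
```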

4
The Basics
  • Document Relevance
  • Note
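
The formulas for this slide are missing from the transcript. Assuming R and NR denote "relevant" and "non-relevant" and x is a document representation, as in the later slides, Bayes' rule applied to relevance would read as follows; the closing identity is the likely content of the "Note" bullet.

```latex
\begin{align*}
p(R \mid x) &= \frac{p(x \mid R)\, p(R)}{p(x)} \\
p(NR \mid x) &= \frac{p(x \mid NR)\, p(NR)}{p(x)} \\
p(R \mid x) + p(NR \mid x) &= 1
\end{align*}
```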

5
Probability Ranking Principle
  • Simple case: no selection costs.
  • x is relevant iff p(R|x) > p(NR|x)
  • (Bayes Decision Rule)
  • PRP in action: rank all documents by p(R|x).
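
A minimal sketch of the simple case, assuming the probabilities p(R|x) have already been estimated somehow. The function names and the example scores are hypothetical and only illustrate the decision rule and the ranking step.

```python
def is_relevant(p_r):
    """Bayes decision rule: relevant iff p(R|x) > p(NR|x) = 1 - p(R|x)."""
    return p_r > 1.0 - p_r


def rank_by_relevance(prob_relevant):
    """Return document ids sorted by decreasing p(R|x)."""
    return sorted(prob_relevant, key=prob_relevant.get, reverse=True)


# Hypothetical estimates of p(R|x) for three documents.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.55}
ranking = rank_by_relevance(scores)
```

Note that the ranking needs only the order of the probabilities, not their exact values, which is why the later slides can work with odds instead.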

6
Probability Ranking Principle
  • More complex case: retrieval costs.
  • C - cost of retrieval of relevant document
  • C' - cost of retrieval of non-relevant document
  • Let d be a document
  • Probability Ranking Principle: if
    C·p(R|d) + C'·(1 - p(R|d)) ≤ C·p(R|d') + C'·(1 - p(R|d'))
    for all d' not yet retrieved, then d is the next
    document to be retrieved
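
The cost-based rule can be sketched as picking the not-yet-retrieved document with the smallest expected retrieval cost. This assumes the usual formulation in which the expected cost of retrieving d is C·p(R|d) + C'·(1 - p(R|d)), with C the cost for a relevant document and C' the cost for a non-relevant one; the function names and numbers are illustrative only.

```python
def expected_cost(p_rel, cost_rel, cost_nonrel):
    """Expected cost of retrieving a document with relevance probability p_rel."""
    return cost_rel * p_rel + cost_nonrel * (1.0 - p_rel)


def next_document(candidates, cost_rel, cost_nonrel):
    """Pick the not-yet-retrieved document with minimal expected cost.

    candidates maps document id -> estimated p(R|d).
    """
    return min(
        candidates,
        key=lambda d: expected_cost(candidates[d], cost_rel, cost_nonrel),
    )


# Hypothetical probabilities and costs.
remaining = {"d1": 0.2, "d2": 0.9, "d3": 0.55}
best = next_document(remaining, cost_rel=0.0, cost_nonrel=1.0)
```

Whenever C < C', the expected cost decreases as p(R|d) grows, so this rule produces the same order as ranking by p(R|d) in the simple case.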

7
Next: Binary Independence Model

8
Binary Independence Model
  • Traditionally used in conjunction with PRP
  • "Binary" = Boolean: documents are represented as
    binary vectors of terms, x = (x1, ..., xn)
  • xi = 1 iff term i is present in document
    x.
  • "Independence": terms occur in documents
    independently
  • Different documents can be modeled as the same
    vector.
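
A small sketch of the binary representation, assuming a fixed, ordered vocabulary. The vocabulary and documents are made up; the point is that term frequency and term order are discarded, so two different documents can map to the same vector.

```python
def to_binary_vector(doc_terms, vocabulary):
    """Map a document (list of terms) to a 0/1 tuple over the vocabulary."""
    present = set(doc_terms)
    return tuple(1 if term in present else 0 for term in vocabulary)


# Hypothetical vocabulary and documents.
vocabulary = ["information", "probability", "retrieval"]
doc_a = ["probability", "retrieval", "retrieval"]
doc_b = ["retrieval", "probability"]

vec_a = to_binary_vector(doc_a, vocabulary)
vec_b = to_binary_vector(doc_b, vocabulary)
```

Here doc_a and doc_b differ in length and word order but share one representation, which is why the model ranks representations rather than documents.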

9
Binary Independence Model
  • Queries: binary vectors of terms
  • Given query q,
  • for each document d need to compute p(R|q,d).
  • replace with computing p(R|q,x) where x is vector
    representing d
  • Interested only in ranking
  • Will use odds
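
The formula that followed on this slide is missing from the transcript. Under the definitions above, writing the odds of relevance and applying Bayes' rule to numerator and denominator would give the following (a reconstruction, not the slide's verbatim content):

```latex
\begin{align*}
O(R \mid q, x) = \frac{p(R \mid q, x)}{p(NR \mid q, x)}
  = \frac{p(R \mid q)}{p(NR \mid q)} \cdot \frac{p(x \mid R, q)}{p(x \mid NR, q)}
\end{align*}
```

The first factor, O(R|q), is the same for every document under a given query, so only the second factor matters for ranking.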

10
Binary Independence Model
  • Using Independence Assumption
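
The equation for this step is not in the transcript. Assuming the terms x_1, ..., x_n occur independently given relevance (and given non-relevance), the document likelihood ratio would factor term by term:

```latex
\begin{align*}
\frac{p(x \mid R, q)}{p(x \mid NR, q)}
  &= \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)} \\
O(R \mid q, x)
  &= O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
\end{align*}
```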

11
Binary Independence Model
  • Since xi is either 0 or 1

Then...
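
The equations for this step and the next slide are missing. A reconstruction under stated assumptions: the symbols p_i and r_i are notation introduced here, and the condition p_i = r_i for terms outside the query corresponds to the assumption that terms not in the query do not affect the outcome.

```latex
\begin{align*}
O(R \mid q, x) &= O(R \mid q)
  \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)}
  \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)} \\
\text{Let } p_i &= p(x_i = 1 \mid R, q), \quad r_i = p(x_i = 1 \mid NR, q),
  \quad p_i = r_i \text{ whenever } q_i = 0. \\
O(R \mid q, x) &= O(R \mid q)
  \prod_{x_i = q_i = 1} \frac{p_i}{r_i}
  \prod_{x_i = 0,\; q_i = 1} \frac{1 - p_i}{1 - r_i} \\
&= O(R \mid q)
  \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
  \prod_{q_i = 1} \frac{1 - p_i}{1 - r_i}
\end{align*}
```

The last product runs over all query terms regardless of the document, so it is constant for a given query.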
12
Binary Independence Model

13
Binary Independence Model

14
Binary Independence Model
  • All boils down to computing RSV.
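
The RSV (retrieval status value) formula is missing from the transcript. In the usual formulation, with p_i = p(x_i = 1 | R, q) and r_i = p(x_i = 1 | NR, q) (notation assumed here), it is the logarithm of the only document-dependent factor in the odds:

```latex
\begin{align*}
RSV &= \log \prod_{x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
     = \sum_{x_i = q_i = 1} c_i, \\
c_i &= \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
\end{align*}
```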

So, how do we compute the coefficients ci from our data?

15
Binary Independence Model
  • Estimating RSV coefficients.
  • For each term i look at the following table
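
The table itself is missing from the transcript. A common way to set it up, assumed here, counts for each term i: N documents in total, S of them relevant, n containing the term, and s both relevant and containing the term. The sketch below estimates ci from those counts and sums the coefficients into an RSV; the count names and the optional smoothing constant k are assumptions, not taken from the slide.

```python
import math


def estimate_c(s, S, n, N, k=0.5):
    """Estimate c_i = log[p_i (1 - r_i) / (r_i (1 - p_i))] from counts.

    s: relevant documents containing the term
    S: relevant documents
    n: documents containing the term
    N: all documents
    k: smoothing constant (assumption; k=0 gives the raw ratios
       p_i = s/S and r_i = (n - s)/(N - S))
    """
    p = (s + k) / (S + 2 * k)
    r = (n - s + k) / (N - S + 2 * k)
    return math.log(p * (1 - r) / (r * (1 - p)))


def rsv(doc_vec, query_vec, coefficients):
    """Sum c_i over terms present in both document and query."""
    return sum(
        c for x, q, c in zip(doc_vec, query_vec, coefficients)
        if x == 1 and q == 1
    )


# Hypothetical counts: 100 documents, 10 relevant, term in 20 documents,
# 8 of which are relevant.
c_raw = estimate_c(s=8, S=10, n=20, N=100, k=0)
score = rsv((1, 0, 1), (1, 1, 0), [2.0, 3.0, 5.0])
```

With k = 0 the estimate fails when a count is zero (division by zero or log of zero), which is the usual motivation for adding a small constant.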

16
PRP and BIR: The lessons
  • Getting reasonable approximations of
    probabilities is possible.
  • Simple methods work only with restrictive
    assumptions
    • term independence
    • terms not in query do not affect the outcome
    • Boolean representation of documents/queries
    • document relevance values are independent
  • Some of these assumptions can be removed

17
Next: Binary Independence Indexing

18
Binary Independence Indexing vs. Binary Independence Retrieval
  • BIR
    • Many documents, one query
    • Bayesian probability
    • Varies: document representation
    • Constant: query (representation)
  • BII
    • One document, many queries
    • Bayesian probability
    • Varies: query
    • Constant: document

19
Binary Independence Indexing
  • Learning from queries
  • More queries: better results
  • p(q|x,R) - probability that, if document x had
    been deemed relevant, query q had been asked
  • The rest of the framework is similar to BIR
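
The slide's formula is not in the transcript. One way to see how p(q|x,R) would enter the framework is to apply Bayes' rule with the document representation x held fixed and the query q varying; this is a plain identity, offered as a sketch rather than as the slide's own derivation.

```latex
\begin{align*}
p(R \mid q, x) = \frac{p(q \mid x, R)\, p(R \mid x)}{p(q \mid x)}
\end{align*}
```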

20
Binary Independence Indexing: Key Assumptions
  • Term occurrence in queries is conditionally
    independent
  • Relevance of document representation x w.r.t.
    query q depends only on the terms present in the
    query (qi = 1)
  • For each term i not used in representation x of
    document d (xi = 0)
  • only positive occurrences of terms count

21
Binary Independence Indexing