Information Retrieval - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Information Retrieval
  • Chap. 02 Modeling - Part 2
  • Slides from the text book author, modified by L N
    Cassel
  • September 2003

2
Probabilistic Model
  • Objective: to capture the IR problem using a
    probabilistic framework
  • Given a user query, there is an ideal answer set
  • Querying is a specification of the properties of
    this ideal answer set (clustering)
  • But, what are these properties?
  • Guess at the beginning what they could be (i.e.,
    guess initial description of ideal answer set)
  • Improve by iteration

3
Probabilistic Model
  • An initial set of documents is retrieved somehow
  • User inspects these docs looking for the relevant
    ones (in truth, only top 10-20 need to be
    inspected)
  • IR system uses this information to refine
    description of ideal answer set
  • By repeating this process, it is expected that
    the description of the ideal answer set will
    improve
  • Keep in mind the need to guess, at the very
    beginning, the description of the ideal answer set
  • Description of ideal answer set is modeled in
    probabilistic terms

4
Probabilistic Ranking Principle
  • Given a user query q and a document dj, the
    probabilistic model tries to estimate the
    probability that the user will find the document
    dj interesting (i.e., relevant).
  • The model assumes that this probability of
    relevance depends on the query and the document
    representations only.
  • Ideal answer set is referred to as R and should
    maximize the probability of relevance. Documents
    in the set R are predicted to be relevant.
  • But,
  • how to compute probabilities?
  • what is the sample space?

5
The Ranking
  • Probabilistic ranking is computed as
  • sim(q,dj) = P(dj relevant-to q) / P(dj
    non-relevant-to q)
  • This is the odds of the document dj being
    relevant
  • Taking the odds minimizes the probability of an
    erroneous judgement
  • Definitions
  • wij ∈ {0,1}
  • P(R | dj) : probability that the given document
    is relevant
  • P(¬R | dj) : probability that the document is not
    relevant

6
The Ranking
  • sim(dj,q) = P(R | dj) / P(¬R | dj)
             = [ P(dj | R) · P(R) ] /
               [ P(dj | ¬R) · P(¬R) ]
             ~ P(dj | R) / P(dj | ¬R)
  • P(dj | R) : probability of randomly selecting
    the document dj from the set R of relevant
    documents
  • P(R) : probability that a document selected at
    random from the whole collection is relevant.
    P(R) and P(¬R) are the same for every document,
    so they can be ignored in the ranking.

7
The Ranking
  • sim(dj,q) ~ P(dj | R) / P(dj | ¬R)
             ~ [ ∏(gi(dj)=1) P(ki | R) · ∏(gi(dj)=0) P(¬ki | R) ] /
               [ ∏(gi(dj)=1) P(ki | ¬R) · ∏(gi(dj)=0) P(¬ki | ¬R) ]
    (assuming independence among index terms)
  • P(ki | R) : probability that the index term ki is
    present in a document randomly selected from the
    set R of relevant documents

8
The Ranking
  • sim(dj,q) ~ log [ ∏ P(ki | R) · ∏ P(¬ki | R) /
                      ( ∏ P(ki | ¬R) · ∏ P(¬ki | ¬R) ) ]
             ~ K + Σi [ log ( P(ki | R) / P(¬ki | R) ) +
                        log ( P(¬ki | ¬R) / P(ki | ¬R) ) ]
             ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
    where P(¬ki | R) = 1 - P(ki | R) and
          P(¬ki | ¬R) = 1 - P(ki | ¬R)

9
The Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • How to obtain the probabilities P(ki | R) and
    P(ki | ¬R)?
  • Estimates based on assumptions
  • P(ki | R) = 0.5
  • P(ki | ¬R) = ni / N, where ni is the number of
    docs that contain ki
  • Use this initial guess to retrieve an initial
    ranking
  • Improve upon this initial ranking

10
Improving the Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • Let
  • V : set of docs initially retrieved (and assumed
    relevant)
  • Vi : subset of the docs retrieved that contain ki
  • Reevaluate the estimates
  • P(ki | R) = Vi / V
  • P(ki | ¬R) = (ni - Vi) / (N - V)
  • Repeat recursively

11
Improving the Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • To avoid problems with V = 1 and Vi = 0, add
    adjustment factors
  • P(ki | R) = (Vi + 0.5) / (V + 1)
  • P(ki | ¬R) = (ni - Vi + 0.5) / (N - V + 1)
  • Also, with ni/N as the adjustment factor,
  • P(ki | R) = (Vi + ni/N) / (V + 1)
  • P(ki | ¬R) = (ni - Vi + ni/N) / (N - V + 1)
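The initial guesses and the smoothed re-estimates above can be put together in a short Python sketch; the function names (`bir_weights`, `sim`) and the dict-based data layout are illustrative assumptions, not from the chapter:

```python
from math import log

def bir_weights(N, n, V=None, Vi=None):
    """Log-odds term weights for the probabilistic model.

    N  : total number of documents
    n  : dict term -> ni, number of docs containing the term
    V  : size of the initially retrieved set assumed relevant (or None)
    Vi : dict term -> number of docs in V containing the term (or None)

    With V and Vi absent we use the initial guesses P(ki|R) = 0.5 and
    P(ki|~R) = ni/N; otherwise the smoothed re-estimates
    (Vi + 0.5)/(V + 1) and (ni - Vi + 0.5)/(N - V + 1).
    """
    w = {}
    for ki, ni in n.items():
        if V is None:
            p_r, p_nr = 0.5, ni / N
        else:
            vi = Vi.get(ki, 0)
            p_r = (vi + 0.5) / (V + 1)
            p_nr = (ni - vi + 0.5) / (N - V + 1)
        # log P(ki|R)/P(~ki|R) + log P(~ki|~R)/P(ki|~R)
        w[ki] = log(p_r / (1 - p_r)) + log((1 - p_nr) / p_nr)
    return w

def sim(doc_terms, query_terms, w):
    """sim(dj,q): since wij and wiq are binary, each term shared by
    the doc and the query contributes its log-odds weight."""
    return sum(w[t] for t in doc_terms & query_terms if t in w)
```

After the user (or the top of the ranking) identifies V and Vi, calling `bir_weights` again with those arguments realizes the iterative improvement described above.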

12
Pluses and Minuses
  • Advantages
  • Docs ranked in decreasing order of probability of
    relevance
  • Disadvantages
  • need to guess the initial estimates for P(ki | R)
  • method does not take into account tf and idf
    factors

13
Brief Comparison of Classic Models
  • Boolean model does not provide for partial
    matches and is considered to be the weakest
    classic model
  • Salton and Buckley did a series of experiments
    that indicate that, in general, the vector model
    outperforms the probabilistic model with general
    collections
  • This seems also to be the view of the research
    community

14
Set Theoretic Models
  • The Boolean model imposes a binary criterion for
    deciding relevance
  • The question of how to extend the Boolean model
    to accommodate partial matching and ranking has
    attracted considerable attention in the past
  • We discuss now two set theoretic models for this
  • Fuzzy Set Model
  • Extended Boolean Model (We will not discuss
    because of time limitations)

15
Fuzzy Set Model
  • Queries and docs are represented by sets of index
    terms; matching is approximate from the start
  • This vagueness can be modeled using a fuzzy
    framework, as follows
  • with each term is associated a fuzzy set
  • each doc has a degree of membership in this fuzzy
    set
  • This interpretation provides the foundation for
    many models for IR based on fuzzy theory
  • In here, we discuss the model proposed by Ogawa,
    Morita, and Kobayashi (1991)

16
Fuzzy Set Theory
  • Framework for representing classes whose
    boundaries are not well defined
  • Key idea is to introduce the notion of a degree
    of membership associated with the elements of a
    set
  • This degree of membership varies from 0 to 1 and
    allows modeling the notion of marginal membership
  • Thus, membership is now a gradual notion,
    contrary to the crisp notion enforced by classic
    Boolean logic

17
Fuzzy Set Theory
  • Definition
  • A fuzzy subset A of U is characterized by a
    membership function μ(A,u) : U → [0,1] which
    associates with each element u of U a number
    μ(A,u) in the interval [0,1]
  • Definition
  • Let A and B be two fuzzy subsets of U. Also, let
    ¬A be the complement of A. Then,
  • μ(¬A,u) = 1 - μ(A,u)
  • μ(A∪B,u) = max(μ(A,u), μ(B,u))
  • μ(A∩B,u) = min(μ(A,u), μ(B,u))
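A minimal sketch of the three fuzzy operators in Python (the function names are illustrative):

```python
def mu_not(mu_a):
    """Complement: mu(~A,u) = 1 - mu(A,u)."""
    return 1.0 - mu_a

def mu_union(mu_a, mu_b):
    """Union: mu(A u B, u) = max(mu(A,u), mu(B,u))."""
    return max(mu_a, mu_b)

def mu_intersect(mu_a, mu_b):
    """Intersection: mu(A n B, u) = min(mu(A,u), mu(B,u))."""
    return min(mu_a, mu_b)
```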

18
Fuzzy Information Retrieval
  • Fuzzy sets are modeled based on a thesaurus
  • This thesaurus is built as follows
  • Let c be a term-term correlation matrix
  • Let c(i,l) be a normalized correlation factor for
    (ki,kl): c(i,l) = n(i,l) / (ni + nl - n(i,l))
  • ni : number of docs which contain ki
  • nl : number of docs which contain kl
  • n(i,l) : number of docs which contain both ki and
    kl
  • We now have the notion of proximity among index
    terms.

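The correlation factors can be computed directly from documents represented as term sets; a sketch in Python, with the hypothetical helper name `term_correlations`:

```python
from itertools import combinations

def term_correlations(docs):
    """Normalized term-term correlation factors
    c(i,l) = n(i,l) / (ni + nl - n(i,l)), given a list of
    documents, each represented as a set of index terms."""
    terms = sorted(set().union(*docs))
    # ni: number of docs containing each term
    n = {t: sum(t in d for d in docs) for t in terms}
    c = {}
    for ti, tl in combinations(terms, 2):
        n_il = sum(ti in d and tl in d for d in docs)
        denom = n[ti] + n[tl] - n_il
        c[(ti, tl)] = c[(tl, ti)] = n_il / denom if denom else 0.0
    for t in terms:
        c[(t, t)] = 1.0  # a term is perfectly correlated with itself
    return c
```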
19
Fuzzy Information Retrieval
  • The correlation factor c(i,l) can be used to
    define fuzzy set membership for a document dj as
    follows:
    μ(i,j) = 1 - ∏(kl ∈ dj) (1 - c(i,l))
  • μ(i,j) : membership of doc dj in the fuzzy subset
    associated with ki
  • The above expression computes an algebraic sum
    over all terms in the doc dj (shown here as the
    complement of a negated algebraic product)
  • A doc dj belongs to the fuzzy set for ki if its
    own terms are associated with ki

20
Fuzzy Information Retrieval
  • μ(i,j) = 1 - ∏(kl ∈ dj) (1 - c(i,l))
  • μ(i,j) : membership of doc dj in the fuzzy subset
    associated with ki
  • If doc dj contains a term kl which is closely
    related to ki, we have
  • c(i,l) ~ 1 (correlation between terms ki and kl
    is high)
  • μ(i,j) ~ 1 (membership of doc dj in the fuzzy
    subset of ki is high)
  • index term ki is a good fuzzy index for document
    dj.
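The algebraic sum can be sketched as follows (`membership` is an illustrative name; `c` is a dict of correlation factors keyed by term pairs, as in the thesaurus above):

```python
def membership(doc_terms, ki, c):
    """mu(i,j) = 1 - prod over kl in dj of (1 - c(i,l)):
    the complement of the negated algebraic product."""
    prod = 1.0
    for kl in doc_terms:
        prod *= 1.0 - c.get((ki, kl), 0.0)
    return 1.0 - prod
```

Note that a single term kl with c(i,l) = 1 drives the whole product to zero, so the membership becomes 1, matching the intuition above.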

21
Fuzzy IR An Example
  • q = ka ∧ (kb ∨ ¬kc)
  • In disjunctive normal form,
    qdnf = (1,1,1) ∨ (1,1,0) ∨ (1,0,0)
         = cc1 ∨ cc2 ∨ cc3
  • μ(q,dj) = μ(cc1+cc2+cc3, j)
            = 1 - (1 - μ(a,j) μ(b,j) μ(c,j)) ·
                  (1 - μ(a,j) μ(b,j) (1 - μ(c,j))) ·
                  (1 - μ(a,j) (1 - μ(b,j)) (1 - μ(c,j)))
  • Exercise: put some numbers in the areas and
    calculate the actual value of μ(q,dj)
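One way to do the exercise numerically; `mu_q` is an illustrative helper that evaluates the DNF expression for q = ka ∧ (kb ∨ ¬kc):

```python
def mu_q(mu_a, mu_b, mu_c):
    """Membership of a doc in the fuzzy set of q = ka AND (kb OR NOT kc),
    via its disjunctive normal form cc1 + cc2 + cc3."""
    cc1 = mu_a * mu_b * mu_c            # state (1,1,1)
    cc2 = mu_a * mu_b * (1 - mu_c)      # state (1,1,0)
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)  # state (1,0,0)
    return 1 - (1 - cc1) * (1 - cc2) * (1 - cc3)
```

For example, a doc with no membership in ka (mu_a = 0) gets mu_q = 0, since ka appears in every conjunctive component.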
22
Fuzzy Information Retrieval
  • Fuzzy IR models have been discussed mainly in the
    literature associated with fuzzy theory
  • Experiments with standard test collections are
    not available
  • Difficult to compare at this time

23
Alternative Probabilistic Models
  • Probability Theory
  • Semantically clear
  • Computationally clumsy
  • Why Bayesian Networks?
  • Clear formalism to combine evidence
  • Modularize the world (dependencies)
  • Bayesian Network Models for IR
  • Inference Network (Turtle & Croft, 1991)
  • Belief Network (Ribeiro-Neto & Muntz, 1996)

24
Bayesian Inference
  • Basic Axioms
  • 0 ≤ P(A) ≤ 1
  • P(sure) = 1
  • P(A ∨ B) = P(A) + P(B), if A and B are mutually
    exclusive

25
Bayesian Inference
  • Other formulations
  • P(A) = P(A ∧ B) + P(A ∧ ¬B)
  • P(A) = Σi P(A ∧ Bi), where {Bi}, ∀i, is a set of
    exhaustive and mutually exclusive events
  • P(A) + P(¬A) = 1
  • P(A | K) : belief in A given the knowledge K
  • if P(A | B) = P(A), we say A and B are independent
  • if P(A | B ∧ C) = P(A | C), we say A and B are
    conditionally independent, given C
  • P(A ∧ B) = P(A | B) P(B)
  • P(A) = Σi P(A | Bi) P(Bi)

26
Bayesian Inference
  • Bayes' Rule: the heart of Bayesian techniques
  • P(H | e) = P(e | H) P(H) / P(e)
  • where H is a hypothesis and e is an evidence
  • P(H) : prior probability
  • P(H | e) : posterior probability
  • P(e | H) : probability of e if H is true
  • P(e) : a normalizing constant; up to it, we write
  • P(H | e) ~ P(e | H) P(H)
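A small Python sketch of Bayes' rule over a set of exhaustive, mutually exclusive hypotheses, where the normalizing constant P(e) is recovered as the sum of the joint probabilities (the `bayes` helper is an illustrative name):

```python
def bayes(priors, likelihoods):
    """Posteriors P(Hi|e) = P(e|Hi) P(Hi) / sum_j P(e|Hj) P(Hj),
    for exhaustive, mutually exclusive hypotheses Hi."""
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    p_e = sum(joint.values())  # the normalizing constant P(e)
    return {h: v / p_e for h, v in joint.items()}
```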

27
Bayesian Networks
  • Definition
  • Bayesian networks are directed acyclic graphs
    (DAGS) in which the nodes represent random
    variables, the arcs portray causal relationships
    between these variables, and the strengths of
    these causal influences are expressed by
    conditional probabilities.

28
Bayes - resource
  • Look at
  • http//members.aol.com/johnp71/bayes.html

29
Bayesian Networks
(figure: root nodes y1, y2, ..., yt with arcs into a
child node x)
  • yi : parent nodes (in this case, root nodes)
  • x : child node
  • the yi cause x
  • Y : the set of parents of x
  • The influence of Y on x can be quantified by any
    function F(x,Y) such that Σx F(x,Y) = 1 and
    0 ≤ F(x,Y) ≤ 1
  • For example, F(x,Y) = P(x | Y)

30
Bayesian Networks
(figure: DAG with root x1; children x2 and x3; x4
depends on x2 and x3; x5 depends on x3)
  • Given the dependencies declared in a Bayesian
    network, the expression for the joint probability
    can be computed as a product of local conditional
    probabilities, for example,
  • P(x1, x2, x3, x4, x5) =
    P(x1) P(x2 | x1) P(x3 | x1) P(x4 | x2, x3) P(x5 | x3)
  • P(x1) : prior probability of the root node

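The factorization above can be sketched in Python; the CPT layout and the uniform toy numbers are assumptions for illustration only:

```python
from itertools import product

def joint_prob(x, cpt):
    """Joint probability of binary (x1,...,x5) under the factorization
    P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x3).
    cpt maps each node name to a function of (value, parent values)."""
    x1, x2, x3, x4, x5 = x
    return (cpt['x1'](x1)
            * cpt['x2'](x2, x1)
            * cpt['x3'](x3, x1)
            * cpt['x4'](x4, x2, x3)
            * cpt['x5'](x5, x3))

def b(p):
    """A CPT entry: true with probability p, ignoring parent values."""
    return lambda v, *parents: p if v == 1 else 1 - p

# Toy network: every variable is true with probability 0.7,
# independent of its parents (illustrative numbers only).
toy_cpt = {'x1': b(0.7), 'x2': b(0.7), 'x3': b(0.7),
           'x4': b(0.7), 'x5': b(0.7)}
```

Summing `joint_prob` over all 2^5 states recovers 1, as any valid joint distribution must.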
31
Bayesian Networks
(figure: same DAG as on the previous slide)
  • In a Bayesian network each variable x is
    conditionally independent of all its
    non-descendants, given its parents.
  • For example
  • P(x4, x5 | x2, x3) = P(x4 | x2, x3) P(x5 | x3)

32
Inference Network Model
  • Epistemological view of the IR problem
  • Random variables associated with documents, index
    terms and queries
  • A random variable associated with a document dj
    represents the event of observing that document

33
Inference Network Model
  • Nodes
  • documents (dj)
  • index terms (ki)
  • queries (q, q1, and q2)
  • user information need (I)
  • Edges
  • from dj to its index term nodes ki, indicating
    that the observation of dj increases the belief
    in the variables ki

34
Inference Network Model
  • dj has index terms k2, ki, and kt
  • q has index terms k1, k2, and ki
  • q1 and q2 model a Boolean formulation
  • q1 = ((k1 ∧ k2) ∨ ki)
  • I = (q ∨ q1)

35
Inference Network Model
  • Definitions
  • k1, dj, and q : random variables
  • k = (k1, k2, ..., kt) : a t-dimensional vector
  • ki ∈ {0,1}, ∀i, so k has 2^t possible states
  • dj ∈ {0,1}, ∀j ; q ∈ {0,1}
  • The rank of a document dj is computed as
    P(q ∧ dj)
  • q and dj are short representations for q = 1 and
    dj = 1
  • (dj stands for a state where dj = 1 and
    ∀l≠j, dl = 0, because we observe one document at
    a time)

36
Inference Network Model
  • P(q ∧ dj) = Σk P(q ∧ dj | k) P(k)
             = Σk P(q ∧ dj ∧ k)
             = Σk P(q | dj ∧ k) P(dj ∧ k)
             = Σk P(q | k) P(k | dj) P(dj)
  • P(¬(q ∧ dj)) = 1 - P(q ∧ dj)

37
Inference Network Model
  • As the instantiation of dj makes all index term
    nodes mutually independent, P(k | dj) can be
    written as a product, then
  • P(q ∧ dj) = Σk P(q | k) ·
    ( ∏(i : gi(k)=1) P(ki | dj) ) ·
    ( ∏(i : gi(k)=0) P(¬ki | dj) ) · P(dj)
  • remember that gi(k) = 1 if ki = 1 in the vector
    k, and 0 otherwise
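This sum over the 2^t states of k can be computed by brute force for small t; a sketch (the `rank` helper is an illustrative name, feasible only for small vocabularies):

```python
from itertools import product

def rank(q_given_k, ki_given_dj, p_dj):
    """P(q ^ dj) = sum over states k of
       P(q|k) * prod_{i: ki=1} P(ki|dj) * prod_{i: ki=0} P(~ki|dj) * P(dj)

    q_given_k   : function mapping a state tuple k to P(q|k)
    ki_given_dj : tuple with P(ki|dj) for each index term
    p_dj        : prior probability of observing dj
    """
    t = len(ki_given_dj)
    total = 0.0
    for k in product((0, 1), repeat=t):
        p_k_dj = 1.0
        for ki, p in zip(k, ki_given_dj):
            p_k_dj *= p if ki == 1 else 1 - p
        total += q_given_k(k) * p_k_dj
    return total * p_dj
```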

38
Inference Network Model
  • The prior probability P(dj) reflects the
    probability associated with the event of
    observing a given document dj
  • Uniformly, for N documents
  • P(dj) = 1/N
  • P(¬dj) = 1 - 1/N
  • Based on the norm of the vector dj
  • P(dj) = 1/|dj|
  • P(¬dj) = 1 - 1/|dj|

39
Inference Network Model
  • For the Boolean Model
  • P(dj) = 1/N
  • P(ki | dj) = 1 if gi(dj) = 1; 0 otherwise
  • P(¬ki | dj) = 1 - P(ki | dj)
  • ⇒ only the nodes associated with the index terms
    of the document dj are activated

40
Inference Network Model
  • For the Boolean Model
  • P(q | k) = 1 if ∃ qcc ∈ qdnf such that
    ∀ki, gi(k) = gi(qcc); 0 otherwise
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ one of the conjunctive components of the query
    must be matched by the active index terms in k

41
Inference Network Model
  • For a tf-idf ranking strategy
  • P(dj) = 1 / |dj|
  • P(¬dj) = 1 - 1 / |dj|
  • ⇒ the prior probability reflects the importance
    of document normalization

42
Inference Network Model
  • For a tf-idf ranking strategy
  • P(ki | dj) = fi,j
  • P(¬ki | dj) = 1 - fi,j
  • ⇒ the relevance of the index term ki is
    determined by its normalized term-frequency
    factor fi,j = freqi,j / maxl freql,j

43
Inference Network Model
  • For a tf-idf ranking strategy
  • Define a vector ki given by
  • k = ki ⟺ (gi(k) = 1) ∧ (∀j≠i, gj(k) = 0)
  • ⇒ in the state ki only the node ki is active and
    all the others are inactive

44
Inference Network Model
  • For a tf-idf ranking strategy
  • P(q | k) = idfi if k = ki ∧ gi(q) = 1;
    0 if k ≠ ki ∨ gi(q) = 0
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ we can sum up the individual contributions of
    each index term, weighted by its normalized idf

45
Inference Network Model
  • For a tf-idf ranking strategy
  • As P(q | k) = 0 for all k ≠ ki, we can rewrite
    P(q ∧ dj) as
  • P(q ∧ dj) = Σ(ki) P(q | ki) P(ki | dj) ·
               ( ∏(l≠i) P(¬kl | dj) ) · P(dj)
             = ( ∏l P(¬kl | dj) ) · P(dj) ·
               Σ(ki) P(ki | dj) P(q | ki) / P(¬ki | dj)

46
Inference Network Model
  • For a tf-idf ranking strategy
  • Applying the previous probabilities we have
  • P(q ∧ dj) = Cj · (1/|dj|) · Σi fi,j · idfi ·
    (1/(1 - fi,j))
  • ⇒ Cj varies from document to document
  • ⇒ the ranking is distinct from the one provided
    by the vector model

47
Inference Network Model
  • Combining evidential sources
  • Let I = q ∨ q1
  • P(I ∧ dj) = Σk P(I | k) P(k | dj) P(dj)
             = Σk [1 - P(¬q | k) P(¬q1 | k)] P(k | dj) P(dj)
  • ⇒ this may yield a retrieval performance which
    surpasses that of the query nodes in isolation
    (Turtle & Croft)

48
Belief Network Model
  • Like the Inference Network Model
  • Epistemological view of the IR problem
  • Random variables associated with documents, index
    terms and queries
  • Contrary to the Inference Network Model
  • Clearly defined sample space
  • Set-theoretic view
  • Different network topology

49
Belief Network Model
  • The Probability Space
  • Define
  • K = {k1, k2, ..., kt} : the sample space (a
    concept space)
  • u ⊆ K : a subset of K (a concept)
  • ki : an index term (an elementary concept)
  • k = (k1, k2, ..., kt) : a vector associated with
    each u such that gi(k) = 1 ⟺ ki ∈ u
  • ki : a binary random variable associated with the
    index term ki (ki = 1 ⟺ gi(k) = 1 ⟺ ki ∈ u)

50
Belief Network Model
  • A Set-Theoretic View
  • Define
  • a document dj and a query q as concepts in K
  • a generic concept c in K
  • a probability distribution P over K, as
  • P(c) = Σu P(c | u) P(u)
  • P(u) = (1/2)^t
  • P(c) is the degree of coverage of the space K
    by c

51
Belief Network Model
  • Network topology
  • query side
  • document side

52
Belief Network Model
  • Assumption
  • P(dj | q) is adopted as the rank of the document
    dj with respect to the query q. It reflects the
    degree of coverage provided to the concept dj by
    the concept q.

53
Belief Network Model
  • The rank of dj
  • P(dj | q) = P(dj ∧ q) / P(q)
             ~ P(dj ∧ q)
  • P(dj ∧ q)
    = Σu P(dj ∧ q | u) P(u)
    = Σu P(dj | u) P(q | u) P(u)
    = Σk P(dj | k) P(q | k) P(k)
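A brute-force sketch of this rank computation with the uniform prior P(k) = (1/2)^t (the `belief_rank` helper is an illustrative name; the sum is exponential in t, so this is for illustration only):

```python
from itertools import product

def belief_rank(dj_given_k, q_given_k, t):
    """P(dj ^ q) = sum_k P(dj|k) P(q|k) P(k), with the uniform prior
    P(k) = (1/2)^t over the 2^t concepts of the space K."""
    p_k = 0.5 ** t
    return sum(dj_given_k(k) * q_given_k(k) * p_k
               for k in product((0, 1), repeat=t))
```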

54
Belief Network Model
  • For the vector model
  • Define a vector ki given by
  • k = ki ⟺ (gi(k) = 1) ∧ (∀j≠i, gj(k) = 0)
  • ⇒ in the state ki only the node ki is active and
    all the others are inactive

55
Belief Network Model
  • For the vector model
  • Define
  • P(q | k) = (wi,q / |q|) if k = ki ∧ gi(q) = 1;
    0 if k ≠ ki ∨ gi(q) = 0
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ (wi,q / |q|) is a normalized version of the
    weight of the index term ki in the query q

56
Belief Network Model
  • For the vector model
  • Define
  • P(dj | k) = (wi,j / |dj|) if k = ki ∧ gi(dj) = 1;
    0 if k ≠ ki ∨ gi(dj) = 0
  • P(¬dj | k) = 1 - P(dj | k)
  • ⇒ (wi,j / |dj|) is a normalized version of the
    weight of the index term ki in the document dj

57
Bayesian Network Models
  • Comparison
  • The Inference Network Model is the first and the
    best known
  • The Belief Network adopts a set-theoretic view
  • The Belief Network adopts a clearly defined
    sample space
  • The Belief Network provides a separation between
    the query and document portions of the network
  • The Belief Network is able to reproduce any
    ranking produced by the Inference Network, while
    the converse is not true (for example, the
    ranking of the standard vector model)

58
Bayesian Network Models
  • Computational costs
  • Inference Network Model: one document node at a
    time, so the cost is linear in the number of
    documents
  • Belief Network: only the states that activate
    each query term are considered
  • The networks do not impose additional costs
    because they do not include cycles.

59
Bayesian Network Models
  • Impact
  • The major strength is the combination of distinct
    evidential sources to support the rank of a given
    document.