Information Retrieval - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Information Retrieval
  • Chap. 02 Modeling - Part 2
  • Slides from the text book author, modified by L N
    Cassel
  • September 2003

2
Probabilistic Model
  • Objective: to capture the IR problem using a
    probabilistic framework
  • Given a user query, there is an ideal answer set
  • Querying is a specification of the properties of
    this ideal answer set (clustering)
  • But, what are these properties?
  • Guess at the beginning what they could be (i.e.,
    guess initial description of ideal answer set)
  • Improve by iteration

3
Probabilistic Model
  • An initial set of documents is retrieved somehow
  • User inspects these docs looking for the relevant
    ones (in truth, only top 10-20 need to be
    inspected)
  • IR system uses this information to refine
    description of ideal answer set
  • By repeating this process, it is expected that
    the description of the ideal answer set will
    improve
  • Keep in mind the need to guess, at the very
    beginning, the description of the ideal answer set
  • Description of ideal answer set is modeled in
    probabilistic terms

4
Probabilistic Ranking Principle
  • Given a user query q and a document dj, the
    probabilistic model tries to estimate the
    probability that the user will find the document
    dj interesting (i.e., relevant).
  • The model assumes that this probability of
    relevance depends on the query and the document
    representations only.
  • Ideal answer set is referred to as R and should
    maximize the probability of relevance. Documents
    in the set R are predicted to be relevant.
  • But,
  • how to compute probabilities?
  • what is the sample space?

5
The Ranking
  • Probabilistic ranking is computed as
  • sim(q,dj) = P(dj relevant-to q) / P(dj
    non-relevant-to q)
  • This is the odds of the document dj being
    relevant
  • Taking the odds minimizes the probability of an
    erroneous judgement
  • Definitions
  • wij ∈ {0,1}
  • P(R | dj) : probability that the given document
    is relevant
  • P(¬R | dj) : probability that the document is not
    relevant

6
The Ranking
  • sim(dj,q) = P(R | dj) / P(¬R | dj)
             = [ P(dj | R) · P(R) ] /
               [ P(dj | ¬R) · P(¬R) ]
             ~ P(dj | R) / P(dj | ¬R)
  • P(dj | R) : probability of randomly selecting
    the document dj from the set R of relevant
    documents
  • P(R) : probability that a document selected at
    random from the whole collection is relevant.
    P(R) and P(¬R) are the same for every document,
    so they can be ignored in the ranking.

7
The Ranking
  • sim(dj,q) ~ P(dj | R) / P(dj | ¬R)
             ~ [ ∏(gi(dj)=1) P(ki | R) · ∏(gi(dj)=0) P(¬ki | R) ] /
               [ ∏(gi(dj)=1) P(ki | ¬R) · ∏(gi(dj)=0) P(¬ki | ¬R) ]
    (assuming independence among index terms)
  • P(ki | R) : probability that the index term ki is
    present in a document randomly selected from the
    set R of relevant documents

8
The Ranking
  • sim(dj,q) ~ log [ ∏ P(ki | R) · ∏ P(¬ki | R) /
                      ( ∏ P(ki | ¬R) · ∏ P(¬ki | ¬R) ) ]
             ~ K + Σi [ log ( P(ki | R) / P(¬ki | R) ) +
                        log ( P(¬ki | ¬R) / P(ki | ¬R) ) ]
             ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
    where P(¬ki | R) = 1 - P(ki | R) and
          P(¬ki | ¬R) = 1 - P(ki | ¬R)

9
The Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • How to obtain the probabilities P(ki | R) and
    P(ki | ¬R)?
  • Estimates based on assumptions
  • P(ki | R) = 0.5
  • P(ki | ¬R) = ni / N, where ni is the number of
    docs that contain ki
  • Use this initial guess to retrieve an initial
    ranking
  • Improve upon this initial ranking

10
Improving the Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • Let
  • V : set of docs initially retrieved (and assumed
    relevant)
  • Vi : subset of the docs retrieved that contain ki
  • Reevaluate the estimates
  • P(ki | R) = Vi / V
  • P(ki | ¬R) = (ni - Vi) / (N - V)
  • Repeat recursively

11
Improving the Initial Ranking
  • sim(dj,q) ~ Σi wiq · wij · ( log ( P(ki | R) / P(¬ki | R) ) +
                                 log ( P(¬ki | ¬R) / P(ki | ¬R) ) )
  • To avoid problems with V = 1 and Vi = 0, add
    adjustment factors
  • P(ki | R) = (Vi + 0.5) / (V + 1)
  • P(ki | ¬R) = (ni - Vi + 0.5) / (N - V + 1)
  • Also, with ni/N as the adjustment factor,
  • P(ki | R) = (Vi + ni/N) / (V + 1)
  • P(ki | ¬R) = (ni - Vi + ni/N) / (N - V + 1)
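The initial guesses and the smoothed re-estimates above can be put together in a short Python sketch; the function names (`bir_weights`, `sim`) and the dict-based data layout are illustrative assumptions, not from the chapter:

```python
from math import log

def bir_weights(N, n, V=None, Vi=None):
    """Log-odds term weights for the probabilistic model.

    N  : total number of documents
    n  : dict term -> ni, number of docs containing the term
    V  : size of the initially retrieved set assumed relevant (or None)
    Vi : dict term -> number of docs in V containing the term (or None)

    With V and Vi absent we use the initial guesses P(ki|R) = 0.5 and
    P(ki|~R) = ni/N; otherwise the smoothed re-estimates
    (Vi + 0.5)/(V + 1) and (ni - Vi + 0.5)/(N - V + 1).
    """
    w = {}
    for ki, ni in n.items():
        if V is None:
            p_r, p_nr = 0.5, ni / N
        else:
            vi = Vi.get(ki, 0)
            p_r = (vi + 0.5) / (V + 1)
            p_nr = (ni - vi + 0.5) / (N - V + 1)
        # log P(ki|R)/P(~ki|R) + log P(~ki|~R)/P(ki|~R)
        w[ki] = log(p_r / (1 - p_r)) + log((1 - p_nr) / p_nr)
    return w

def sim(doc_terms, query_terms, w):
    """sim(dj,q): since wij and wiq are binary, each term shared by
    the doc and the query contributes its log-odds weight."""
    return sum(w[t] for t in doc_terms & query_terms if t in w)
```

After the user (or the top of the ranking) identifies V and Vi, calling `bir_weights` again with those arguments realizes the iterative improvement described above.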

12
Pluses and Minuses
  • Advantages
  • Docs ranked in decreasing order of probability of
    relevance
  • Disadvantages
  • need to guess the initial estimates for P(ki | R)
  • method does not take into account tf and idf
    factors

13
Brief Comparison of Classic Models
  • Boolean model does not provide for partial
    matches and is considered to be the weakest
    classic model
  • Salton and Buckley did a series of experiments
    that indicate that, in general, the vector model
    outperforms the probabilistic model with general
    collections
  • This seems also to be the view of the research
    community

14
Set Theoretic Models
  • The Boolean model imposes a binary criterion for
    deciding relevance
  • The question of how to extend the Boolean model
    to accommodate partial matching and ranking has
    attracted considerable attention in the past
  • We discuss now two set theoretic models for this
  • Fuzzy Set Model
  • Extended Boolean Model (We will not discuss
    because of time limitations)

15
Fuzzy Set Model
  • Queries and docs are represented by sets of index
    terms; matching is approximate from the start
  • This vagueness can be modeled using a fuzzy
    framework, as follows
  • with each term is associated a fuzzy set
  • each doc has a degree of membership in this fuzzy
    set
  • This interpretation provides the foundation for
    many models for IR based on fuzzy theory
  • In here, we discuss the model proposed by Ogawa,
    Morita, and Kobayashi (1991)

16
Fuzzy Set Theory
  • Framework for representing classes whose
    boundaries are not well defined
  • Key idea is to introduce the notion of a degree
    of membership associated with the elements of a
    set
  • This degree of membership varies from 0 to 1 and
    allows modeling the notion of marginal membership
  • Thus, membership is now a gradual notion,
    contrary to the crisp notion enforced by classic
    Boolean logic

17
Fuzzy Set Theory
  • Definition
  • A fuzzy subset A of U is characterized by a
    membership function μ(A,u) : U → [0,1] which
    associates with each element u of U a number
    μ(A,u) in the interval [0,1]
  • Definition
  • Let A and B be two fuzzy subsets of U. Also, let
    ¬A be the complement of A. Then,
  • μ(¬A,u) = 1 - μ(A,u)
  • μ(A∪B,u) = max(μ(A,u), μ(B,u))
  • μ(A∩B,u) = min(μ(A,u), μ(B,u))
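A minimal sketch of the three fuzzy operators in Python (the function names are illustrative):

```python
def mu_not(mu_a):
    """Complement: mu(~A,u) = 1 - mu(A,u)."""
    return 1.0 - mu_a

def mu_union(mu_a, mu_b):
    """Union: mu(A u B, u) = max(mu(A,u), mu(B,u))."""
    return max(mu_a, mu_b)

def mu_intersect(mu_a, mu_b):
    """Intersection: mu(A n B, u) = min(mu(A,u), mu(B,u))."""
    return min(mu_a, mu_b)
```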

18
Fuzzy Information Retrieval
  • Fuzzy sets are modeled based on a thesaurus
  • This thesaurus is built as follows
  • Let c be a term-term correlation matrix
  • Let c(i,l) be a normalized correlation factor for
    (ki,kl): c(i,l) = n(i,l) / (ni + nl - n(i,l))
  • ni : number of docs which contain ki
  • nl : number of docs which contain kl
  • n(i,l) : number of docs which contain both ki and
    kl
  • We now have the notion of proximity among index
    terms.

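The correlation factors can be computed directly from documents represented as term sets; a sketch in Python, with the hypothetical helper name `term_correlations`:

```python
from itertools import combinations

def term_correlations(docs):
    """Normalized term-term correlation factors
    c(i,l) = n(i,l) / (ni + nl - n(i,l)), given a list of
    documents, each represented as a set of index terms."""
    terms = sorted(set().union(*docs))
    # ni: number of docs containing each term
    n = {t: sum(t in d for d in docs) for t in terms}
    c = {}
    for ti, tl in combinations(terms, 2):
        n_il = sum(ti in d and tl in d for d in docs)
        denom = n[ti] + n[tl] - n_il
        c[(ti, tl)] = c[(tl, ti)] = n_il / denom if denom else 0.0
    for t in terms:
        c[(t, t)] = 1.0  # a term is perfectly correlated with itself
    return c
```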
19
Fuzzy Information Retrieval
  • The correlation factor c(i,l) can be used to
    define fuzzy set membership for a document dj as
    follows:
    μ(i,j) = 1 - ∏(kl ∈ dj) (1 - c(i,l))
  • μ(i,j) : membership of doc dj in the fuzzy subset
    associated with ki
  • The above expression computes an algebraic sum
    over all terms in the doc dj (shown here as the
    complement of a negated algebraic product)
  • A doc dj belongs to the fuzzy set for ki if its
    own terms are associated with ki

20
Fuzzy Information Retrieval
  • μ(i,j) = 1 - ∏(kl ∈ dj) (1 - c(i,l))
  • μ(i,j) : membership of doc dj in the fuzzy subset
    associated with ki
  • If doc dj contains a term kl which is closely
    related to ki, we have
  • c(i,l) ~ 1 (correlation between terms ki and kl
    is high)
  • μ(i,j) ~ 1 (membership of doc dj in the fuzzy
    subset of ki is high)
  • index term ki is a good fuzzy index for document
    dj.
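The algebraic sum can be sketched as follows (`membership` is an illustrative name; `c` is a dict of correlation factors keyed by term pairs, as in the thesaurus above):

```python
def membership(doc_terms, ki, c):
    """mu(i,j) = 1 - prod over kl in dj of (1 - c(i,l)):
    the complement of the negated algebraic product."""
    prod = 1.0
    for kl in doc_terms:
        prod *= 1.0 - c.get((ki, kl), 0.0)
    return 1.0 - prod
```

Note that a single term kl with c(i,l) = 1 drives the whole product to zero, so the membership becomes 1, matching the intuition above.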

21
Fuzzy IR An Example
  • q = ka ∧ (kb ∨ ¬kc)
  • In disjunctive normal form,
    qdnf = (1,1,1) ∨ (1,1,0) ∨ (1,0,0)
         = cc1 ∨ cc2 ∨ cc3
  • μ(q,dj) = μ(cc1+cc2+cc3, j)
            = 1 - (1 - μ(a,j) μ(b,j) μ(c,j)) ·
                  (1 - μ(a,j) μ(b,j) (1 - μ(c,j))) ·
                  (1 - μ(a,j) (1 - μ(b,j)) (1 - μ(c,j)))
  • Exercise: put some numbers in the areas and
    calculate the actual value of μ(q,dj)
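One way to do the exercise numerically; `mu_q` is an illustrative helper that evaluates the DNF expression for q = ka ∧ (kb ∨ ¬kc):

```python
def mu_q(mu_a, mu_b, mu_c):
    """Membership of a doc in the fuzzy set of q = ka AND (kb OR NOT kc),
    via its disjunctive normal form cc1 + cc2 + cc3."""
    cc1 = mu_a * mu_b * mu_c            # state (1,1,1)
    cc2 = mu_a * mu_b * (1 - mu_c)      # state (1,1,0)
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)  # state (1,0,0)
    return 1 - (1 - cc1) * (1 - cc2) * (1 - cc3)
```

For example, a doc with no membership in ka (mu_a = 0) gets mu_q = 0, since ka appears in every conjunctive component.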
22
Fuzzy Information Retrieval
  • Fuzzy IR models have been discussed mainly in the
    literature associated with fuzzy theory
  • Experiments with standard test collections are
    not available
  • Difficult to compare at this time

23
Alternative Probabilistic Models
  • Probability Theory
  • Semantically clear
  • Computationally clumsy
  • Why Bayesian Networks?
  • Clear formalism to combine evidence
  • Modularize the world (dependencies)
  • Bayesian Network Models for IR
  • Inference Network (Turtle & Croft, 1991)
  • Belief Network (Ribeiro-Neto & Muntz, 1996)

24
Bayesian Inference
  • Basic Axioms
  • 0 ≤ P(A) ≤ 1
  • P(sure) = 1
  • P(A ∨ B) = P(A) + P(B), if A and B are mutually
    exclusive

25
Bayesian Inference
  • Other formulations
  • P(A) = P(A ∧ B) + P(A ∧ ¬B)
  • P(A) = Σi P(A ∧ Bi), where {Bi}, ∀i, is a set of
    exhaustive and mutually exclusive events
  • P(A) + P(¬A) = 1
  • P(A | K) : belief in A given the knowledge K
  • if P(A | B) = P(A), we say A and B are independent
  • if P(A | B ∧ C) = P(A | C), we say A and B are
    conditionally independent, given C
  • P(A ∧ B) = P(A | B) P(B)
  • P(A) = Σi P(A | Bi) P(Bi)

26
Bayesian Inference
  • Bayes' Rule: the heart of Bayesian techniques
  • P(H | e) = P(e | H) P(H) / P(e)
  • where H is a hypothesis and e is an evidence
  • P(H) : prior probability
  • P(H | e) : posterior probability
  • P(e | H) : probability of e if H is true
  • P(e) : a normalizing constant; up to it, we write
  • P(H | e) ~ P(e | H) P(H)
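A small Python sketch of Bayes' rule over a set of exhaustive, mutually exclusive hypotheses, where the normalizing constant P(e) is recovered as the sum of the joint probabilities (the `bayes` helper is an illustrative name):

```python
def bayes(priors, likelihoods):
    """Posteriors P(Hi|e) = P(e|Hi) P(Hi) / sum_j P(e|Hj) P(Hj),
    for exhaustive, mutually exclusive hypotheses Hi."""
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    p_e = sum(joint.values())  # the normalizing constant P(e)
    return {h: v / p_e for h, v in joint.items()}
```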

27
Bayesian Networks
  • Definition
  • Bayesian networks are directed acyclic graphs
    (DAGS) in which the nodes represent random
    variables, the arcs portray causal relationships
    between these variables, and the strengths of
    these causal influences are expressed by
    conditional probabilities.

28
Bayes - resource
  • Look at
  • http//members.aol.com/johnp71/bayes.html

29
Bayesian Networks
(figure: root nodes y1, y2, ..., yt with arcs into a
child node x)
  • yi : parent nodes (in this case, root nodes)
  • x : child node
  • the yi cause x
  • Y : the set of parents of x
  • The influence of Y on x can be quantified by any
    function F(x,Y) such that Σx F(x,Y) = 1 and
    0 ≤ F(x,Y) ≤ 1
  • For example, F(x,Y) = P(x | Y)

30
Bayesian Networks
(figure: DAG with root x1; children x2 and x3; x4
depends on x2 and x3; x5 depends on x3)
  • Given the dependencies declared in a Bayesian
    network, the expression for the joint probability
    can be computed as a product of local conditional
    probabilities, for example,
  • P(x1, x2, x3, x4, x5) =
    P(x1) P(x2 | x1) P(x3 | x1) P(x4 | x2, x3) P(x5 | x3)
  • P(x1) : prior probability of the root node

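The factorization above can be sketched in Python; the CPT layout and the uniform toy numbers are assumptions for illustration only:

```python
from itertools import product

def joint_prob(x, cpt):
    """Joint probability of binary (x1,...,x5) under the factorization
    P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x3).
    cpt maps each node name to a function of (value, parent values)."""
    x1, x2, x3, x4, x5 = x
    return (cpt['x1'](x1)
            * cpt['x2'](x2, x1)
            * cpt['x3'](x3, x1)
            * cpt['x4'](x4, x2, x3)
            * cpt['x5'](x5, x3))

def b(p):
    """A CPT entry: true with probability p, ignoring parent values."""
    return lambda v, *parents: p if v == 1 else 1 - p

# Toy network: every variable is true with probability 0.7,
# independent of its parents (illustrative numbers only).
toy_cpt = {'x1': b(0.7), 'x2': b(0.7), 'x3': b(0.7),
           'x4': b(0.7), 'x5': b(0.7)}
```

Summing `joint_prob` over all 2^5 states recovers 1, as any valid joint distribution must.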
31
Bayesian Networks
(figure: same DAG as on the previous slide)
  • In a Bayesian network each variable x is
    conditionally independent of all its
    non-descendants, given its parents.
  • For example
  • P(x4, x5 | x2, x3) = P(x4 | x2, x3) P(x5 | x3)

32
Inference Network Model
  • Epistemological view of the IR problem
  • Random variables associated with documents, index
    terms and queries
  • A random variable associated with a document dj
    represents the event of observing that document

33
Inference Network Model
  • Nodes
  • documents (dj)
  • index terms (ki)
  • queries (q, q1, and q2)
  • user information need (I)
  • Edges
  • from dj to its index term nodes ki, indicating
    that the observation of dj increases the belief
    in the variables ki

34
Inference Network Model
  • dj has index terms k2, ki, and kt
  • q has index terms k1, k2, and ki
  • q1 and q2 model a Boolean formulation
  • q1 = ((k1 ∧ k2) ∨ ki)
  • I = (q ∨ q1)

35
Inference Network Model
  • Definitions
  • k1, dj, and q : random variables
  • k = (k1, k2, ..., kt) : a t-dimensional vector
  • ki ∈ {0,1}, ∀i, so k has 2^t possible states
  • dj ∈ {0,1}, ∀j ; q ∈ {0,1}
  • The rank of a document dj is computed as
    P(q ∧ dj)
  • q and dj are short representations for q = 1 and
    dj = 1
  • (dj stands for a state where dj = 1 and
    ∀l≠j, dl = 0, because we observe one document at
    a time)

36
Inference Network Model
  • P(q ∧ dj) = Σk P(q ∧ dj | k) P(k)
             = Σk P(q ∧ dj ∧ k)
             = Σk P(q | dj ∧ k) P(dj ∧ k)
             = Σk P(q | k) P(k | dj) P(dj)
  • P(¬(q ∧ dj)) = 1 - P(q ∧ dj)

37
Inference Network Model
  • As the instantiation of dj makes all index term
    nodes mutually independent, P(k | dj) can be
    written as a product, then
  • P(q ∧ dj) = Σk P(q | k) ·
    ( ∏(i : gi(k)=1) P(ki | dj) ) ·
    ( ∏(i : gi(k)=0) P(¬ki | dj) ) · P(dj)
  • remember that gi(k) = 1 if ki = 1 in the vector
    k, and 0 otherwise
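This sum over the 2^t states of k can be computed by brute force for small t; a sketch (the `rank` helper is an illustrative name, feasible only for small vocabularies):

```python
from itertools import product

def rank(q_given_k, ki_given_dj, p_dj):
    """P(q ^ dj) = sum over states k of
       P(q|k) * prod_{i: ki=1} P(ki|dj) * prod_{i: ki=0} P(~ki|dj) * P(dj)

    q_given_k   : function mapping a state tuple k to P(q|k)
    ki_given_dj : tuple with P(ki|dj) for each index term
    p_dj        : prior probability of observing dj
    """
    t = len(ki_given_dj)
    total = 0.0
    for k in product((0, 1), repeat=t):
        p_k_dj = 1.0
        for ki, p in zip(k, ki_given_dj):
            p_k_dj *= p if ki == 1 else 1 - p
        total += q_given_k(k) * p_k_dj
    return total * p_dj
```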

38
Inference Network Model
  • The prior probability P(dj) reflects the
    probability associated with the event of
    observing a given document dj
  • Uniformly, for N documents
  • P(dj) = 1/N
  • P(¬dj) = 1 - 1/N
  • Based on the norm of the vector dj
  • P(dj) = 1/|dj|
  • P(¬dj) = 1 - 1/|dj|

39
Inference Network Model
  • For the Boolean Model
  • P(dj) = 1/N
  • P(ki | dj) = 1 if gi(dj) = 1; 0 otherwise
  • P(¬ki | dj) = 1 - P(ki | dj)
  • ⇒ only the nodes associated with the index terms
    of the document dj are activated

40
Inference Network Model
  • For the Boolean Model
  • P(q | k) = 1 if ∃ qcc ∈ qdnf such that
    ∀ki, gi(k) = gi(qcc); 0 otherwise
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ one of the conjunctive components of the query
    must be matched by the active index terms in k

41
Inference Network Model
  • For a tf-idf ranking strategy
  • P(dj) = 1 / |dj|
  • P(¬dj) = 1 - 1 / |dj|
  • ⇒ the prior probability reflects the importance
    of document normalization

42
Inference Network Model
  • For a tf-idf ranking strategy
  • P(ki | dj) = fi,j
  • P(¬ki | dj) = 1 - fi,j
  • ⇒ the relevance of the index term ki is
    determined by its normalized term-frequency
    factor fi,j = freqi,j / maxl freql,j

43
Inference Network Model
  • For a tf-idf ranking strategy
  • Define a vector ki given by
  • k = ki ⟺ (gi(k) = 1) ∧ (∀j≠i, gj(k) = 0)
  • ⇒ in the state ki only the node ki is active and
    all the others are inactive

44
Inference Network Model
  • For a tf-idf ranking strategy
  • P(q | k) = idfi if k = ki ∧ gi(q) = 1;
    0 if k ≠ ki ∨ gi(q) = 0
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ we can sum up the individual contributions of
    each index term, weighted by its normalized idf

45
Inference Network Model
  • For a tf-idf ranking strategy
  • As P(q | k) = 0 for all k ≠ ki, we can rewrite
    P(q ∧ dj) as
  • P(q ∧ dj) = Σ(ki) P(q | ki) P(ki | dj) ·
               ( ∏(l≠i) P(¬kl | dj) ) · P(dj)
             = ( ∏l P(¬kl | dj) ) · P(dj) ·
               Σ(ki) P(ki | dj) P(q | ki) / P(¬ki | dj)

46
Inference Network Model
  • For a tf-idf ranking strategy
  • Applying the previous probabilities we have
  • P(q ∧ dj) = Cj · (1/|dj|) · Σi fi,j · idfi ·
    (1/(1 - fi,j))
  • ⇒ Cj varies from document to document
  • ⇒ the ranking is distinct from the one provided
    by the vector model

47
Inference Network Model
  • Combining evidential sources
  • Let I = q ∨ q1
  • P(I ∧ dj) = Σk P(I | k) P(k | dj) P(dj)
             = Σk [1 - P(¬q | k) P(¬q1 | k)] P(k | dj) P(dj)
  • ⇒ this may yield a retrieval performance which
    surpasses that of the query nodes in isolation
    (Turtle & Croft)

48
Belief Network Model
  • Like the Inference Network Model
  • Epistemological view of the IR problem
  • Random variables associated with documents, index
    terms and queries
  • Contrary to the Inference Network Model
  • Clearly defined sample space
  • Set-theoretic view
  • Different network topology

49
Belief Network Model
  • The Probability Space
  • Define
  • K = {k1, k2, ..., kt} : the sample space (a
    concept space)
  • u ⊆ K : a subset of K (a concept)
  • ki : an index term (an elementary concept)
  • k = (k1, k2, ..., kt) : a vector associated with
    each u such that gi(k) = 1 ⟺ ki ∈ u
  • ki : a binary random variable associated with the
    index term ki (ki = 1 ⟺ gi(k) = 1 ⟺ ki ∈ u)

50
Belief Network Model
  • A Set-Theoretic View
  • Define
  • a document dj and a query q as concepts in K
  • a generic concept c in K
  • a probability distribution P over K, as
  • P(c) = Σu P(c | u) P(u)
  • P(u) = (1/2)^t
  • P(c) is the degree of coverage of the space K
    by c

51
Belief Network Model
  • Network topology
  • query side
  • document side

52
Belief Network Model
  • Assumption
  • P(dj | q) is adopted as the rank of the document
    dj with respect to the query q. It reflects the
    degree of coverage provided to the concept dj by
    the concept q.

53
Belief Network Model
  • The rank of dj
  • P(dj | q) = P(dj ∧ q) / P(q)
             ~ P(dj ∧ q)
  • P(dj ∧ q)
    = Σu P(dj ∧ q | u) P(u)
    = Σu P(dj | u) P(q | u) P(u)
    = Σk P(dj | k) P(q | k) P(k)
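A brute-force sketch of this rank computation with the uniform prior P(k) = (1/2)^t (the `belief_rank` helper is an illustrative name; the sum is exponential in t, so this is for illustration only):

```python
from itertools import product

def belief_rank(dj_given_k, q_given_k, t):
    """P(dj ^ q) = sum_k P(dj|k) P(q|k) P(k), with the uniform prior
    P(k) = (1/2)^t over the 2^t concepts of the space K."""
    p_k = 0.5 ** t
    return sum(dj_given_k(k) * q_given_k(k) * p_k
               for k in product((0, 1), repeat=t))
```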

54
Belief Network Model
  • For the vector model
  • Define a vector ki given by
  • k = ki ⟺ (gi(k) = 1) ∧ (∀j≠i, gj(k) = 0)
  • ⇒ in the state ki only the node ki is active and
    all the others are inactive

55
Belief Network Model
  • For the vector model
  • Define
  • P(q | k) = (wi,q / |q|) if k = ki ∧ gi(q) = 1;
    0 if k ≠ ki ∨ gi(q) = 0
  • P(¬q | k) = 1 - P(q | k)
  • ⇒ (wi,q / |q|) is a normalized version of the
    weight of the index term ki in the query q

56
Belief Network Model
  • For the vector model
  • Define
  • P(dj | k) = (wi,j / |dj|) if k = ki ∧ gi(dj) = 1;
    0 if k ≠ ki ∨ gi(dj) = 0
  • P(¬dj | k) = 1 - P(dj | k)
  • ⇒ (wi,j / |dj|) is a normalized version of the
    weight of the index term ki in the document dj

57
Bayesian Network Models
  • Comparison
  • The Inference Network Model is the first and the
    best known
  • The Belief Network adopts a set-theoretic view
  • The Belief Network adopts a clearly defined
    sample space
  • The Belief Network provides a separation between
    the query and document portions of the network
  • The Belief Network is able to reproduce any
    ranking produced by the Inference Network, while
    the converse is not true (for example, the
    ranking of the standard vector model)

58
Bayesian Network Models
  • Computational costs
  • Inference Network Model: one document node at a
    time, so the cost is linear in the number of
    documents
  • Belief Network: only the states that activate
    each query term are considered
  • The networks do not impose additional costs
    because they do not include cycles.

59
Bayesian Network Models
  • Impact
  • The major strength is the combination of distinct
    evidential sources to support the rank of a given
    document.