Gravitation-Based Model For Information Retrieval

1
Gravitation-Based Model For Information Retrieval
  • Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song,
    Wei-Ying Ma
  • Microsoft Research Asia, 49 Zhichun Road,
    Beijing, 100080, P.R. China
  • Department of Computer Science, Beijing Institute
    of Technology, Beijing, P.R. China

2
  1. Introduction
  2. Background and related work
  3. Gravitation-based model
  4. Experiments

3
Introduction
  • There are some problems faced by information
    retrieval models:
  • Most IR models fail to satisfy even some basic
    intuitive heuristic constraints.
  • Many IR models lack intuitive interpretations,
    especially physical interpretations.

4
  • The relationship between a query and a document
    is modeled as the attractive force between them.

5
Background and related work
  • Newton's theory of gravitation
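
For reference, Newton's law of universal gravitation states that two point masses attract each other with a force proportional to the product of their masses and inversely proportional to the square of the distance between them. Here G is the gravitational constant, m_1 and m_2 are the two masses, and r is their distance:

```latex
F = G \, \frac{m_1 m_2}{r^2}
```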

6
Information retrieval perspectives and models
  • Vector space model
  • Pivoted normalization weighting model formula
  • s is between 0.0 and 1.0
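
The formula itself is not present in the transcript. As a minimal sketch, the code below uses the form in which pivoted normalization weighting is commonly stated in the IR literature; treating this as the slide's formula is an assumption, as are the function name, the argument names, and the default s = 0.2.

```python
import math
from collections import Counter


def pivoted_score(query_terms, doc_terms, df, n_docs, avg_doc_len, s=0.2):
    """Pivoted normalization score of one document for one query.

    Assumed form (not taken from the slide):
      sum over matching terms of
        (1 + ln(1 + ln(tf))) / ((1 - s) + s * dl / avdl) * qtf * ln((N + 1) / df)
    """
    qtf = Counter(query_terms)   # term frequencies in the query
    tf = Counter(doc_terms)      # term frequencies in the document
    # Length normalization: s is the slope, between 0.0 and 1.0.
    norm = (1.0 - s) + s * len(doc_terms) / avg_doc_len
    score = 0.0
    for term, q_count in qtf.items():
        if tf[term] == 0 or df.get(term, 0) == 0:
            continue  # only terms present in both query and document contribute
        tf_part = (1.0 + math.log(1.0 + math.log(tf[term]))) / norm
        idf_part = math.log((n_docs + 1) / df[term])
        score += tf_part * q_count * idf_part
    return score
```

With s = 0.0 document length has no effect on the score; with s = 1.0 the term-frequency part is divided by the document length relative to the average.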

7
  • Probability model
  • Okapi's BM25 formula
  • The original w(t) can be negative for very
    frequent terms (the negative IDF issue), so we
    will use ln((N + 1) / df(t)) instead.
  • Language model
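
The BM25 point can be illustrated with a short sketch. It uses the commonly cited form of Okapi BM25 and reads the replacement term weight as ln((N + 1) / df(t)); the plus sign is an assumption, since the operator is missing from the transcript. The function name and the defaults k1 = 1.2, b = 0.75, k3 = 1000 are illustrative choices, not values from the slides.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, df, n_docs, avg_doc_len,
               k1=1.2, b=0.75, k3=1000.0):
    """BM25 score of one document, with an IDF that cannot go negative."""
    qtf = Counter(query_terms)   # term frequencies in the query
    tf = Counter(doc_terms)      # term frequencies in the document
    # Document-length normalization shared by all terms of this document.
    length_norm = k1 * ((1.0 - b) + b * len(doc_terms) / avg_doc_len)
    score = 0.0
    for term, q_count in qtf.items():
        if tf[term] == 0 or df.get(term, 0) == 0:
            continue  # only terms present in both query and document contribute
        # Assumed replacement weight: ln((N + 1) / df) > 0 whenever df <= N.
        w = math.log((n_docs + 1) / df[term])
        doc_part = (k1 + 1.0) * tf[term] / (length_norm + tf[term])
        query_part = (k3 + 1.0) * q_count / (k3 + q_count)
        score += w * doc_part * query_part
    return score
```

For comparison, the classic weight ln((N - df + 0.5) / (df + 0.5)) turns negative as soon as a term occurs in more than half of the documents, which is the issue the replacement avoids.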

8
Structured document retrieval
  • A document is said to be structured when it
    contains multiple fields.
  • The most commonly used approach for structured
    document retrieval may be score/rank (linear)
    combination, which treats each field as a
    separate document and computes scores/ranks for
    them.
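
A minimal sketch of linear score combination follows. It assumes the per-field scores have already been computed by some retrieval function; the field names, weights, and function names are hypothetical.

```python
def combine_field_scores(field_scores, field_weights):
    """Linear combination of the per-field scores of one document."""
    return sum(field_weights[name] * score
               for name, score in field_scores.items())


def rank_documents(per_doc_field_scores, field_weights):
    """Return document ids sorted by combined score, best first."""
    combined = {
        doc_id: combine_field_scores(scores, field_weights)
        for doc_id, scores in per_doc_field_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical example: each field was scored as if it were a separate document.
weights = {"title": 2.0, "body": 1.0}
scores = {
    "d1": {"title": 0.5, "body": 1.0},
    "d2": {"title": 1.0, "body": 0.5},
}
ranking = rank_documents(scores, weights)
```

Here d2 ranks first because its title match is weighted more heavily than d1's body match.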

9
Gravitation-based model
  • The GBM tries to understand the IR problem within
    the physical framework.
  • Particle
  • A particle has two attributes: type and mass.
  • Two particles of the same type have some kind of
    gravitational force between them
  • ?

10
  • Term object
  • Two shapes are discussed in this paper: the
    sphere and the ideal cylinder

11
  • Document object
  • Explicit terms include all the terms in the
    content of the document. (D)
  • Implicit terms represent the hidden meaning of
    the document. (H(D))
  • The mass and diameter of a document are the mass
    and diameter of all its seen (explicit) and
    hidden (implicit) terms.

12
  • A query is modeled as an object composed of its
    terms.
  • Relevance is modeled as gravitational force.

13
The discrete version of the model

14
  • The relevance of a document given a query is
    defined by the attractive force between them when
    the document is in its optimal-term placement
    state.
  • ?
  • ?

15
  • We make some straightforward and natural
    simplifying assumptions on which the estimation
    is based.
  • m(t): the global importance of term t in the
    collection.

16
  • Combine assumption 1 and assumption 2
  • ?

17
  • ?
  • ?
  • ß1/(1?)

18
  • e(D) is the diameter per term when document D has
    average length.

m0
?
19
The continuous version of the model
  • The description of the discrete GBM also applies
    to the continuous version.

20
(No Transcript)

21
  • A family of effective ranking formulas

22
Exploiting document structures
  1. Terms compete for places, and the ones that have
    larger gravitational forces get nearer to the
    query terms.
  2. The relevance of a document given a query is
    defined by the maximal gravitational force
    between them.
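
The two points above can be illustrated with a toy sketch. It is not the paper's formula: it assumes each term contributes a force of m / d**2, where m stands for the product of the query-term and document-term masses, and that terms must be assigned to a fixed set of slots at known distances from the query. Under those assumptions, the maximal total force is obtained by giving the nearest slots to the terms with the largest m. All names are hypothetical.

```python
def max_total_force(pair_masses, slot_distances):
    """Toy model: assign each term to one slot so that sum(m / d**2) is maximal.

    Assumes non-negative masses and strictly positive distances.
    """
    if len(pair_masses) > len(slot_distances):
        raise ValueError("need at least one slot per term")
    masses = sorted(pair_masses, reverse=True)   # strongest attraction first
    distances = sorted(slot_distances)           # nearest slot first
    # Pairing the largest masses with the smallest distances maximizes the sum;
    # zip stops after the last term, so any surplus far slots stay empty.
    return sum(m / d ** 2 for m, d in zip(masses, distances))
```

Sorting is sufficient here because every term gains more from a nearer slot, and the gain grows with the term's mass.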

23
(No Transcript)

24
Experiments
  • We use mean average precision to evaluate the
    query results.
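
Mean average precision can be computed as in the sketch below. It assumes binary relevance judgments and ranked lists without duplicate document ids; the function names and data layout are illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant hit
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a ranking [a, b, c] with relevant documents a and c has average precision (1/1 + 2/3) / 2 = 5/6.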

25
Term weighting experiments

26
Field structure experiments

27
Conclusion and future work
  • In this paper, a gravitation-based IR model was
    proposed by adopting a physical perspective on
    information retrieval.
  • It is interesting to study the relationship and
    possible combination between GBM and existing
    models.
  • This model has the potential to include term
    proximity and static document ranking.