Vector Space Model in Information Retrieval and related problems - PowerPoint PPT Presentation

Slides: 11
Provided by: GK53

1
Vector Space Model in Information Retrieval and
related problems
  • 614 Fall/2004
  • YooJin Ha

2
What is the vector space model?
  • Index terms can be viewed as corresponding to
    the dimensions of a space, and documents as
    vectors in that space (Salton, 1975)
  • The various information retrieval objects
    (terms, documents, queries, concepts) are
    modelled as elements of a vector space
    (Wong & Raghavan)

3
Vector space of terms
  • Assumption 1: There is no relation between terms
    (orthogonality)
  • Each document is given a value (a position) in a
    vector space whose dimensions are the terms

[Figure: document1 located at (x, y) on axes term1 and term2]

4
Representation of documents in the vector space
d \ t  a  b  c  d  e  f
d1     1  1  0  0  0  1
d2     0  0  1  1  0  1
d3     1  0  1  0  1  0
(rows: documents d1 to d3; columns: terms a to f)
If a document contains a term (such as a), the
corresponding entry is 1; otherwise it is 0.
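
The binary representation above can be sketched in a few lines of Python. This is only an illustration: the names TERMS, DOCS, and to_binary_vector are hypothetical, and the term sets were chosen so that the output matches the table on this slide.

```python
# Vocabulary: one dimension of the vector space per index term.
TERMS = ["a", "b", "c", "d", "e", "f"]

# Hypothetical toy corpus: each document reduced to the set of terms it contains.
DOCS = {
    "d1": {"a", "b", "f"},
    "d2": {"c", "d", "f"},
    "d3": {"a", "c", "e"},
}

def to_binary_vector(doc_terms, vocabulary):
    """Return a 0/1 vector: 1 where the document contains the term, else 0."""
    return [1 if term in doc_terms else 0 for term in vocabulary]

# Term-document matrix, stored as one row per document.
MATRIX = {name: to_binary_vector(terms, TERMS) for name, terms in DOCS.items()}

for name, row in MATRIX.items():
    print(name, row)
```

Binary weights are the simplest choice; term frequencies or other weights could replace the 0/1 values without changing the structure.
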
5
Measurement of similarity between documents
  • Assumption 2: A document nearer to the query is
    more likely to satisfy the user's information need
  • Calculate the distance (similarity coefficient)
    between documents in a vector space spanned by
    n terms
  • Applying this model to IR yields ranking models
    for documents and queries
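
As a rough sketch of how Assumption 2 leads to a ranking, the snippet below orders documents by their Euclidean distance to a query vector. The slides do not fix a particular distance, so Euclidean distance is an assumption here, and QUERY, DOCUMENTS, and rank_by_distance are hypothetical names.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def rank_by_distance(query, documents):
    """Return (name, distance) pairs, nearest document first."""
    scored = [(name, euclidean_distance(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Binary document vectors over the terms a..f (toy data).
DOCUMENTS = {
    "d1": [1, 1, 0, 0, 0, 1],
    "d2": [0, 0, 1, 1, 0, 1],
    "d3": [1, 0, 1, 0, 1, 0],
}

# Hypothetical query containing the terms a and f.
QUERY = [1, 0, 0, 0, 0, 1]

RANKING = rank_by_distance(QUERY, DOCUMENTS)
for name, dist in RANKING:
    print(name, round(dist, 3))
```

With this toy data d1 is nearest to the query, since it shares both query terms and differs in only one dimension.
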

6
Measurement of similarity between documents (cont.)
  • Similarity coefficients: cosine, Dice, etc.

[Figure: document1 and document2 in the space spanned by term1 and term2]

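
The two coefficients named on this slide can be written out as follows. This is a minimal sketch using the usual textbook definitions for real-valued vectors; the function names and the sample vectors are illustrative assumptions, not taken from the slides.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine coefficient: dot(u, v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0

def dice(u, v):
    """Dice coefficient: 2 * dot(u, v) / (|u|^2 + |v|^2); 0.0 if both are all zeros."""
    denom = dot(u, u) + dot(v, v)
    return 2 * dot(u, v) / denom if denom else 0.0

# Two binary document vectors over six terms (toy data).
D1 = [1, 1, 0, 0, 0, 1]
D2 = [0, 0, 1, 1, 0, 1]

COS_12 = cosine(D1, D2)
DICE_12 = dice(D1, D2)
print(round(COS_12, 3), round(DICE_12, 3))
```

For binary vectors with the same number of nonzero entries, as here, the two coefficients coincide; they differ when the documents contain different numbers of terms.
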
7
Related problem 1: Orthogonality
  • Orthogonality: some terms are related, not
    orthogonal
  • Latent Semantic Indexing
  • Applying eigenvalue analysis binds related terms
    together (e.g., age, weight)
  • L1 L2

     a  b  c  d  e  f  g
d    0  0  1  1  1  0  0
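
The eigen analysis behind Latent Semantic Indexing is commonly realized as a truncated singular value decomposition of the term-document matrix. The formulation below is the standard one rather than something stated on the slide; the symbols X, U, Sigma, V, t, d, and k are introduced here for illustration.

```latex
% X: t x d term-document matrix (rows = terms, columns = documents)
X = U \Sigma V^{\mathsf{T}}

% Keep only the k largest singular values, k < \operatorname{rank}(X):
X_k = U_k \Sigma_k V_k^{\mathsf{T}}

% Link to eigen analysis: the columns of U are eigenvectors of X X^T
X X^{\mathsf{T}} = U \, \Sigma \Sigma^{\mathsf{T}} \, U^{\mathsf{T}}
```

Each of the k retained dimensions mixes several original terms, which is how related terms such as age and weight can end up bound together instead of being treated as orthogonal.
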
8
Related problem 2: Unit of analysis (word vs. phrase)
  • What should the unit of analysis be?
  • Word or phrase?
  • How should word, term, and phrase be defined?
  • Frequency
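
To make the word-versus-phrase question concrete, the sketch below counts frequencies under two candidate units: single words and two-word phrases. The definitions used (whitespace tokens and adjacent word pairs) are simplifying assumptions, and TEXT, word_units, and phrase_units are hypothetical names.

```python
from collections import Counter

def word_units(text):
    """One naive definition of a 'word': lowercase, split on whitespace."""
    return text.lower().split()

def phrase_units(text, n=2):
    """Treat every run of n adjacent words as a candidate 'phrase' (an n-gram)."""
    words = word_units(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical sample text.
TEXT = "vector space model and vector space retrieval"

WORD_FREQ = Counter(word_units(TEXT))
PHRASE_FREQ = Counter(phrase_units(TEXT))

print(WORD_FREQ.most_common(2))
print(PHRASE_FREQ.most_common(1))
```

The choice of unit changes both the dimensions of the vector space and the frequencies observed along them, which is why the definition matters.
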

9
Related problem 3: Polysemy and synonymy
  • Polysemy: one word has many different meanings
  • Synonymy: many different words have similar meanings
  • Latent Semantic Indexing can be applied to
    reduce this problem
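
A small illustration of why both effects hurt plain term matching: with synonymy, a relevant document shares no terms with the query; with polysemy, unrelated documents share a term. The documents and queries below are invented for the example, and overlap is a hypothetical helper.

```python
def overlap(query_terms, doc_terms):
    """Number of shared terms; term-matching similarity is zero when this is zero."""
    return len(set(query_terms) & set(doc_terms))

# Hypothetical documents, each reduced to a list of index terms.
DOCS = {
    "finance": ["bank", "loan", "interest"],
    "river": ["bank", "water", "fishing"],
    "vehicles": ["automobile", "engine", "repair"],
}

# Synonymy: the query says "car", the relevant document says "automobile".
SYNONYMY_MATCHES = overlap(["car"], DOCS["vehicles"])

# Polysemy: "bank" matches documents about two unrelated senses.
POLYSEMY_HITS = [name for name, terms in DOCS.items() if overlap(["bank"], terms) > 0]

print(SYNONYMY_MATCHES, POLYSEMY_HITS)
```

Synonymy therefore tends to lower recall and polysemy tends to lower precision when similarity is computed on raw terms.
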

10
References
  • Salton, G. (1975). A Theory of Indexing.
    Philadelphia, PA: Society for Industrial and
    Applied Mathematics.
  • Salton, G., Wong, A., & Yang, C. S. (1975). A vector
    space model for automatic indexing. Communications
    of the ACM, 18, 613-620. (p.273).
  • Raghavan, V. V., & Wong, S. K. M. (1986). A critical
    analysis of the vector space model for information
    retrieval. JASIS, 37, 279-287.