Discussion Class 4 - PowerPoint PPT Presentation

Slides: 10
Provided by: wya1
Transcript and Presenter's Notes

1
Discussion Class 4
  • Ranking

2
Discussion Classes
Format
  • Question
  • Ask a member of the class to answer
  • Provide opportunity for others to comment
When answering
  • Give your name. Make sure that the TA hears it.
  • Stand up
  • Speak clearly so that all the class can hear
3
Question 1: Inverted Document Frequency (IDF)
In class, we first introduced Salton's original term weighting, known as Inverted Document Frequency:
  $w_{ik} = f_{ik} \, N / d_k$
The reading gives Sparck Jones's term weighting, Inverted Document Frequency (IDF):
  $IDF_i = \log_2 (N / n_i) + 1$
or
  $IDF_i = \log_2 (maxn / n_i) + 1$
What is the relationship between these alternatives?

4
Q1 (continued): Definitions of Terms
  • $w_{ik}$ = weight given to term k in document i
  • $f_{ik}$ = frequency with which term k appears in document i
  • $d_k$ = number of documents that contain term k
  • $N$ = number of documents in the collection
  • $n_i$ = total number of occurrences of term i in the collection
  • $maxn$ = maximum frequency of any term in the collection
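As a rough illustration of the two IDF formulas from the reading, the sketch below evaluates both on made-up counts. The function names and the toy numbers are assumptions for illustration only; n_i follows the definition on the slide (total number of occurrences of term i in the collection).

```python
import math


def idf_n(num_docs, n_i):
    """IDF_i = log2(N / n_i) + 1, with N the number of documents."""
    return math.log2(num_docs / n_i) + 1


def idf_maxn(max_n, n_i):
    """IDF_i = log2(maxn / n_i) + 1, with maxn the largest term frequency."""
    return math.log2(max_n / n_i) + 1


# Hypothetical collection statistics (not taken from the slides).
N = 1024
occurrences = {"retrieval": 16, "the": 1024, "inverted": 4}
maxn = max(occurrences.values())

for term, n_i in occurrences.items():
    # Rare terms receive a larger weight under both variants.
    print(term, idf_n(N, n_i), idf_maxn(maxn, n_i))
```

In this toy data maxn happens to equal N, so the two variants agree; with other counts they differ by the constant log2(N / maxn).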

5
Question 2: Inverted Files
"The use of a ranking system instead of a Boolean retrieval system has several important implications for supporting inverted file systems."
Discuss the implications of:
(a) Adjacency operators
(b) Stemming and stoplists
6
Question 3: Operations on Inverted Files
Consider a search of a large set of documents with the query "vector space methods in information retrieval".
(a) What are the steps that the search process must go through?
(b) Where would you expect the computation impact to be greatest?
(c) How can the inverted file system be organized to minimize the computation?
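One way to make the search steps concrete is a minimal sketch of ranked retrieval over an in-memory inverted file. Everything here is an assumption for illustration: the names `build_index` and `search`, the toy documents, whitespace tokenization, and the choice to score with within-document frequency times IDF without any document-length normalization. It is not the organization the class or the reading prescribes.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to its postings: {doc_id: within-document frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(query, index, num_docs):
    """Return (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term)  # step 1: look the term up in the index
        if not postings:
            continue
        n_i = sum(postings.values())  # total occurrences in the collection
        idf = math.log2(num_docs / n_i) + 1
        for doc_id, freq in postings.items():  # step 2: walk the postings
            scores[doc_id] += freq * idf  # step 3: accumulate partial scores
    # step 4: sort the accumulated scores (ties broken by doc_id)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document collection.
docs = {
    1: "vector space methods in information retrieval",
    2: "inverted file systems for retrieval",
    3: "cooking with garlic",
}
index = build_index(docs)
results = search("vector space retrieval", index, len(docs))
print(results)
```

In this sketch the work is dominated by the loop over postings, so a frequent query term costs far more than a rare one.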
7
Question 4: Within-Document Frequency
(a) Why does term weighting using within-document frequency improve ranking?
(b) Why is it useful to normalize within-document frequency?
(c) Explain Croft's normalization:
  $cfreq_{ij} = K + (1 - K) \, freq_{ij} / maxfreq_j$
(d) How does Salton and Buckley's recommended term weighting fit with Croft's normalization?
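Croft's normalization, cfreq_ij = K + (1 - K) freq_ij / maxfreq_j, can be tried out with a small sketch. The function name `croft_normalize` and the default K = 0.5 are assumptions chosen for illustration; the slides do not fix a value of K.

```python
def croft_normalize(freq_ij, maxfreq_j, k=0.5):
    """cfreq_ij = K + (1 - K) * freq_ij / maxfreq_j.

    freq_ij   -- frequency of term i in document j
    maxfreq_j -- maximum frequency of any term in document j
    k         -- constant between 0 and 1 (hypothetical default)
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie between 0 and 1")
    if maxfreq_j <= 0:
        raise ValueError("maxfreq_j must be positive")
    return k + (1 - k) * freq_ij / maxfreq_j


# The most frequent term in a document always maps to 1.0,
# and any term that occurs at all scores above K.
print(croft_normalize(10, 10))  # most frequent term
print(croft_normalize(5, 10))   # half as frequent
print(croft_normalize(1, 10))   # rare within the document
```

Raising K compresses the range of values toward 1, so the raw frequency matters less; K = 0 reduces to plain freq_ij / maxfreq_j.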
8
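Question 4 (continued): Salton/Buckley Recommendation
[equation not preserved in transcript]
where
[equation not preserved in transcript]
and $w_{ij} = freq_{ij} \times IDF_i$
  • $freq_{iq}$ = frequency of term i in query q
  • $maxfreq_q$ = maximum frequency of any term in query q
  • $IDF_i$ = IDF of term i in entire collection
  • $freq_{ij}$ = frequency of term i in document j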
9
Question 5: tf.idf compared with Google PageRank
(a) tf.idf and PageRank are based on fundamentally different considerations. What are the fundamental differences?
(b) Under which circumstances would you expect each to excel?
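To ground the comparison, here is a minimal sketch of PageRank computed by simple iteration. It uses only the link graph and no page text, whereas tf.idf uses only term statistics. The function name, the damping factor of 0.85, the fixed iteration count, the handling of pages without out-links, and the four-page graph are all assumptions for illustration, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its rank equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Note that the result is the same for every query: the ranks depend on who links to whom, not on which terms a page contains.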