Keith van Rijsbergen - PowerPoint PPT Presentation

Commercial successes and failures. Caveats. Why we have survived. ... 1983: Okapi started. 1985: RIAO-1. 1986: CvR logic model. 1990: Deerwester et al., LSI paper ...




1
(No Transcript)
2
Landmarks in Information Retrieval
the message out of the bottle
Keith van Rijsbergen, Tampere, 12th August 2002
3
Introductory Remarks
  • Exclusions: IE, TM, ...
  • Commercial successes and failures
  • Caveats
  • Why we have survived
  • Where we were, where we are, where we are going.

4
Pre-history
  • Smee (1850)
  • Wells (1936)
  • Bush (1945)
  • Bagley (1951), MIT
  • Fairthorne (1945-52), RAE
  • Luhn (1958)
  • Mooers (1952)
5
Experimental Methodology
  • Cleverdon: Cranfield
  • Lancaster: MEDLARS
  • Keen: Cranfield/SMART
  • Saracevic: CWRU
  • Salton: SMART
  • Sparck Jones: Ideal Test Collection
  • Blair/Maron: STAIRS
  • Harman: TREC
6
Evaluation
  • ABNO/OBNA (Fairthorne)
  • Precision, Recall → trade-off (Cleverdon)
  • Probabilistic versions (Swets)
  • Measure-theoretic (Bollmann)
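Precision and recall, and the trade-off between them, can be made concrete in a few lines. This is a minimal sketch assuming binary relevance judgments and the usual set-based definitions; the function names and document identifiers are hypothetical. Computing the pair at every rank cut-off shows how recall rises while precision tends to fall as more documents are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranking, relevant):
    """(precision, recall) after each cut-off k = 1..len(ranking) of a ranked list."""
    return [precision_recall(ranking[:k], relevant) for k in range(1, len(ranking) + 1)]


# Hypothetical example: 4 documents retrieved, 3 of the 5 relevant ones among them.
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d4", "d7", "d9"]))  # (0.75, 0.6)
```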
7
The world in 1980 according to Belver Griffith
Who is missing?
8
Landmarks
  • Luhn's tf weighting
  • Architecture
  • Relevance feedback
  • Stemming
  • Poisson model → BM25
  • Statistical weighting: tf-idf
  • Various models
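Two of the weighting landmarks listed above, tf-idf and BM25, are compact enough to sketch. This assumes documents are already tokenised into Python lists; the function names are hypothetical, and k1 = 1.2, b = 0.75 are commonly used BM25 defaults rather than values given on the slide. The idf inside `bm25` is the Robertson/Sparck Jones weight with no relevance information, which goes negative for terms occurring in more than half of the documents; some implementations add 1 inside the logarithm to avoid this.

```python
import math
from collections import Counter


def tf_idf(term, doc, docs):
    """tf-idf weight of `term` in `doc`: raw term frequency times log(N / n_t)."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    return Counter(doc)[term] * math.log(len(docs) / df)


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query` (both lists of tokens)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # Robertson/Sparck Jones idf, no relevance information.
        w = math.log((n - df + 0.5) / (df + 0.5))
        # Saturating tf component with document-length normalisation.
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += w * norm
    return score


docs = [
    ["information", "retrieval", "model"],
    ["probabilistic", "retrieval"],
    ["boolean", "logic"],
    ["vector", "space", "model"],
]
print(round(tf_idf("logic", docs[2], docs), 3))  # 1.386, i.e. log(4 / 1)
```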
9
Luhn's curve
10
Fictive Objects
[Diagram: Information Problem, Representation, Representation, Indexed Objects, Query, Compare]
What about evaluation?
11
Architecture (Brenda Gerrie, 1983)
12
Time I (highlights for me)
13
Time II
14
Dimensions
Representation
a priori
a posteriori
Logical
Statistical
Language Models
15
Probabilistic Retrieval
  • Maron and Kuhns
  • Miller (following Goffman)
  • SER/KSJ (Robertson/Sparck Jones)
  • Croft
16
Vector Space Model
  • Salton
  • Murray
  • Rocchio
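The vector space model represents requests and documents as term-weight vectors and ranks documents by their similarity to the request. This minimal sketch uses raw term frequencies and the cosine measure, one common choice; any other weighting (tf-idf, for example) could be substituted. The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Document indices ordered by cosine similarity of raw tf vectors to the query."""
    q = Counter(query)
    scores = [cosine(q, Counter(d)) for d in docs]
    # Stable sort: documents with equal scores keep their original order.
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = [["vector", "space", "model"], ["boolean", "logic"], ["vector", "retrieval"]]
print(rank(["vector", "model"], docs))  # [0, 2, 1]
```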
17
Logical Model
For
  • Mooers/Fairthorne, 1960
  • Hillman, 1965
  • Cooper/Maron, 1970
  • CvR, 1986
  • Nie/Amati/Bruza/Huibers, 1990
Against
  • Bar-Hillel, 1950
  • Kasher, 1966
18
Buried Treasure
  • Dependence, e.g. C.T. Yu
  • Unified Probabilistic Model: Maron/Cooper/SER
  • Co-relevance: Ivie
  • Stochastic processes: Mandelbrot/Herdan
  • Brouwerian logics: Hillman
  • Error analysis: Hughes/Cover/Duda
19
Hypotheses/Principles
Items may be associated without apparent meaning, but exploiting their association may help retrieval.
  • P/R trade-off
  • ABNO/OBNA
  • Exhaustivity/Specificity
  • Cluster Hypothesis
  • Association Hypothesis
  • Probability Ranking Principle
  • Logical Uncertainty Principle
  • ASK
  • Polyrepresentation
20
Postulates of Impotence (according to Swanson, 1988)
  • An information need cannot be expressed
    independent of context
  • It is impossible to instruct a machine to
    translate a request into adequate search terms
  • A document's relevance depends on other seen
    documents
  • It is never possible to verify whether all
    relevant documents have been found
  • Machines cannot recognise meaning → can't beat
    human indexing, etc.

21
...more postulates
  • Word-occurrence statistics can neither represent
    meaning nor substitute for it
  • The ability of an IR system to support an
    iterative process cannot be evaluated in terms of
    single-iteration human relevance judgment
  • You can have either subtle relevance judgments or
    highly effective mechanised procedures, but not
    both
  • Thus, consistently effective fully automatic
    indexing and retrieval is not possible

22
Conclusions
?
23
Matching
"Co-ordination is positively correlated with external relevance." Jackson, 1969 - Association Hypothesis
"The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request." Sparck Jones, 1971 - Relevance Hypothesis
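Both hypotheses above justify co-ordination level matching: rank each document by how many request terms it shares. This is a minimal sketch assuming unweighted index terms, with each request and document treated as a set of terms; the function names and the toy collection are hypothetical.

```python
def coordination_level(query, doc):
    """Number of distinct request terms that also index the document."""
    return len(set(query) & set(doc))


def rank_by_coordination(query, docs):
    """Document indices in descending co-ordination level (ties keep input order)."""
    return sorted(range(len(docs)), key=lambda i: -coordination_level(query, docs[i]))


docs = [["boolean", "retrieval"], ["probabilistic", "retrieval", "model"], ["logic"]]
print(rank_by_coordination(["probabilistic", "retrieval", "model"], docs))  # [1, 0, 2]
```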
24
Inference
"It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design... The logic of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system." Mooers, 1961
Another one: Logical Uncertainty Principle, CvR, 1986
25
Classification
"Co-occurrence of terms as a basis for grouping makes for good swops, i.e. permits substitutions which retrieve relevant rather than irrelevant documents." Sparck Jones, 1971 - Classification Hypothesis
"If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this." CvR, 1979 - Association Hypothesis
"Closely associated documents tend to be relevant to the same requests." CvR, 1971 - Cluster Hypothesis
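The Classification and Association Hypotheses suggest using term co-occurrence to find substitutes ("swops") for request terms. This minimal sketch assumes co-occurrence is counted within whole documents and measured with the Dice coefficient, which is only one of several association measures; the 0.5 threshold is arbitrary and the function names are hypothetical.

```python
from itertools import combinations


def dice(postings_a, postings_b):
    """Dice coefficient of two terms, given the sets of documents containing each."""
    if not postings_a and not postings_b:
        return 0.0
    return 2 * len(postings_a & postings_b) / (len(postings_a) + len(postings_b))


def associations(docs, threshold=0.5):
    """Term pairs whose document co-occurrence association is at least `threshold`."""
    postings = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            postings.setdefault(term, set()).add(i)
    pairs = {}
    for a, b in combinations(sorted(postings), 2):
        score = dice(postings[a], postings[b])
        if score >= threshold:
            pairs[(a, b)] = score
    return pairs


def expand(query, docs, threshold=0.5):
    """Add to the request every term closely associated with one of its terms."""
    expanded = set(query)
    for a, b in associations(docs, threshold):
        if a in query:
            expanded.add(b)
        if b in query:
            expanded.add(a)
    return expanded


docs = [["cluster", "hypothesis"], ["cluster", "hypothesis", "retrieval"], ["boolean", "logic"]]
print(sorted(expand(["cluster"], docs)))  # ['cluster', 'hypothesis', 'retrieval']
```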
26
Models
  • Vector Space/LSI
  • Probabilistic
  • Logical
27
Query Language
  • Artificial/Natural
  • Multilingual/cross-lingual
  • Images
  • None at all
28
Query Definition
  • Complete/Incomplete
  • Independence/Dependence
  • Weighted/Unweighted
  • Query expansion/one-shot (feedback, web)
  • Sense disambiguation
  • Cross-lingual
29
Query Dependence
Relevance Feedback
Query Expansion
Ostensive Retrieval
Context
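Relevance feedback in Rocchio's vector-space formulation moves the query vector towards documents judged relevant and away from those judged non-relevant: q' = alpha*q + beta*mean(relevant) - gamma*mean(non-relevant). This is a minimal sketch over sparse term-weight dicts; the defaults alpha = 1.0, beta = 0.75, gamma = 0.15 are illustrative values suggested in some textbooks, not values from the slide, and dropping negative weights is a common convention, not a requirement.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query modification over sparse term-weight vectors (dicts).

    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant);
    terms whose resulting weight is not positive are dropped.
    """
    new = Counter()
    for t, w in query.items():
        new[t] += alpha * w
    for judged, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not judged:
            continue
        for d in judged:
            for t, w in d.items():
                # Each judged document contributes its share of the centroid.
                new[t] += coeff * w / len(judged)
    return {t: w for t, w in new.items() if w > 0}


print(rocchio({"retrieval": 1.0},
              [{"retrieval": 1.0, "feedback": 2.0}],
              [{"boolean": 1.0}]))  # {'retrieval': 1.75, 'feedback': 1.5}
```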
30
Items wanted
Relevance
ASK (Anomalous State of Knowledge)
Situated Relevance
31
Error response
Precision and Recall
32
Logic
  • Standard/non-standard
  • Probabilistic logic
  • Information flow/logic
33
Representation
Discrimination/Representation
Specificity/Exhaustivity
34
Language Models
NLP
Montague Semantics
Stochastic
35
(No Transcript)