1
Evaluation of Evaluation in Information Retrieval - Tefko Saracevic
  • Historical Approach to IR Evaluation.

2
Saracevic's Definition of Evaluation
  • Evaluation is assessing performance or value of a
    system, process, procedure, product, or policy.

3
Evaluation Requirements
  • A system or a prototype
  • A criterion or criteria (the objectives of the system)
  • Measures (e.g., recall and precision)
  • A measuring instrument (judgments by analysts/users)
  • A methodology (procedures, e.g., as in TREC)

4
Levels of Evaluation
  • Engineering level - hardware and software.
  • Input level - contents and coverage of the system.
  • Processing level - how inputs are processed; assessment of algorithms, techniques, and approaches.

5
Levels of Evaluation (cont.)
  • Output level - interactions with the system and the output obtained.
  • Use and user level - applications used for given tasks.
  • Social level - effects on research, productivity, and decision making.
  • Economic efficiency - questions of cost and efficiency to be determined at each level of analysis.

6
Two more classes of evaluation.
  • End-user performance and use (Meyer & Ruiz, 1990; others summarized in Dalrymple & Roderer, 1994).
  • Markets, products, and services from the information industry (Rapp et al., 1990). These evaluations appear regularly in trade magazines such as Online, Online Review, Searcher, etc.

7
Output and User/Use Level Evaluations
  • Fenichel (1981)
  • Borgman (1989)
  • Saracevic, Kantor, Chamis & Trivison (1990)
  • Haynes et al. (1990)
  • Fidel (1991)
  • Spink (1995)

8
Processing Level Approaches: Toy Collections
  • Cranfield (Cleverdon, Mills & Keen, 1966)
  • SMART (Salton, 1971, 1989)
  • TREC (Harman, 1995)

9
Studies conducted on the social level
Evaluating the impact of area-specific IR systems:
  • Impact of MEDLINE on clinical decision making
    (Lindberg et al., 1993)

10
Criteria in IR Evaluation
  • Relevance as the core criterion (Kent et al., 1955).
  • Other criteria, such as utility and search length, did not stick.
  • Cranfield, SMART, and TREC all revolved around the phenomenon of relevance.
  • Relevance keeps evaluation from being a purely engineering matter, because it carries implications of use.
  • Relevance is a complex human judgment, not binary in nature, and dependent on circumstances.

11
Output and User/Use Level Evaluations
  • Employ a multiplicity of criteria related to utility, success, completeness, worth, satisfaction, value, efficiency, cost, etc.
  • Place more emphasis on interaction.

12
Market, Business, Industry Evaluations
  • Similar to the user/use level.
  • TQM (Total Quality Management).
  • Cost-effectiveness.
  • The debate over relevance remains isolated within IR.

13
Isolation of studies within levels of origin.
  • Algorithms
  • Users and Uses
  • Market products/services
  • Social Impacts

14
Processing level measures of evaluation
  • Precision
  • The ratio of relevant items retrieved to total items retrieved, or the probability that a retrieved item is relevant.
  • Recall
  • The ratio of relevant items retrieved to all relevant items available in a particular file, or the probability that a relevant item will be retrieved.
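
As a concrete illustration (a minimal sketch, not part of the original slides; the document IDs are invented), both measures can be computed from a set of retrieved items and a set of judged-relevant items:

    # Minimal precision/recall sketch for a single query.
    # Item IDs are hypothetical placeholders.
    def precision_recall(retrieved, relevant):
        hits = retrieved & relevant                      # relevant items retrieved
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # 3 of the 4 retrieved items are relevant; the file holds 6 relevant items.
    p, r = precision_recall({"d1", "d2", "d3", "d4"},
                            {"d1", "d2", "d3", "d5", "d6", "d7"})
    print(p, r)  # 0.75 0.5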

15
Measures at the User/Use Level
  • Semantic differentials
  • Likert scales
  • Which measures to use? How do measures compare? How do they affect the results? (See Su, 1992.) A small sketch follows.
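
One way to see why the choice of measure matters (a sketch under assumed data, not from the slides): ratings of the same search session on a 5-point Likert scale and a 7-point semantic differential can be rescaled to [0, 1] before comparison. The response values below are invented.

    # Hypothetical ratings of one search session on two instruments.
    likert = [4, 5, 3, 4, 4]      # 1-5 Likert satisfaction items
    sem_diff = [6, 7, 4, 5, 6]    # 1-7 semantic differential items

    def rescale(scores, low, high):
        # Map the mean raw score onto [0, 1] so different scales are comparable.
        mean = sum(scores) / len(scores)
        return (mean - low) / (high - low)

    print(rescale(likert, 1, 5))     # 0.75
    print(rescale(sem_diff, 1, 7))   # ~0.77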

16
Measuring Instruments
  • People, mainly, are the instruments that determine the relevance of retrieved items.
  • Who are the judges? What affects their judgments? How do they affect the results? (A sketch of one agreement measure follows.)
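
A standard way to quantify how much the judges matter is inter-judge agreement; the sketch below (an illustration with invented binary judgments, not from the slides) computes Cohen's kappa, which corrects raw agreement for chance.

    # Invented binary relevance judgments (1 = relevant) from two judges.
    judge_a = [1, 1, 0, 1, 0, 0, 1, 0]
    judge_b = [1, 0, 0, 1, 0, 1, 1, 0]

    def cohens_kappa(a, b):
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n   # raw agreement
        p_a, p_b = sum(a) / n, sum(b) / n                  # rate of "relevant" per judge
        expected = p_a * p_b + (1 - p_a) * (1 - p_b)       # agreement expected by chance
        return (observed - expected) / (1 - expected)

    print(cohens_kappa(judge_a, judge_b))  # 0.5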

17
Methodological issues surrounding notions of
validity and reliability.
  • Collection - How are items selected?
  • Requests - How are they generated?
  • Searching - How is it conducted?
  • Results - How are they obtained?
  • Analysis - What comparisons are made?
  • Interpretation/Generalization - What are the conclusions? Are they warranted on the basis of the results? How generalizable are the findings?

18
Evaluation outside of traditional IR, e.g. digital libraries and the Internet
  • Evaluation is limited to the software and engineering levels.
  • Applications are evaluated on their own level.
  • Many applications are well received; however, on most output, user, and use levels these applications are found to be frustrating, unpredictable, wasteful, expensive, trivial, unreliable, and hard to use!

19
Don't throw the baby out with the bath water!
  • Dervin and Nilan, 1986
  • The article swung to the other end of the pendulum and called for a paradigmatic shift from system-centered to user-centered evaluations.
  • Both user- and system-centered approaches are needed.

20
Keep it realistic!
  • Possible solution: the integration of all levels of evaluation for a comprehensive, true-to-life analysis.