Olena Medelyan and Ian H. Witten - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1

Measuring inter-indexer consistency using a thesaurus
Olena Medelyan and Ian H. Witten
Digital Library Lab, Department of Computer Science
The University of Waikato, New Zealand
Agenda
  • Indexing task
  • Controlled vocabulary
  • Experiment with 6 professionals
  • Indexing quality
  • Inter-indexer Consistency
  • Conceptual Consistency
  • New Measure
  • Vector model
  • Weighting relations
  • Results

2
Indexing with a Controlled Vocabulary
An Experiment
  • 10 documents related to agricultural topics
  • 6 professional indexers from FAO

  • FAO's domain-specific thesaurus Agrovoc
  • 17,000 descriptors, i.e. allowed index terms
  • 11,000 non-descriptors that are linked to
    descriptors, e.g. Obesity → Overweight

FAO: United Nations Food and Agriculture Organization
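The descriptor / non-descriptor distinction above can be illustrated with a minimal Python sketch. Only the Obesity → Overweight link comes from the slide; the other descriptors, the dictionary layout and the function names are assumptions made for illustration, not the Agrovoc data format.

```python
# Toy excerpt of a controlled vocabulary. Only the obesity -> overweight
# link is taken from the slide; everything else is illustrative.
DESCRIPTORS = {"overweight", "diet", "public health"}
USE_LINKS = {"obesity": "overweight"}  # non-descriptor -> descriptor


def normalize(term):
    """Return the allowed descriptor for a term, or None if it is unknown."""
    key = term.strip().lower()
    if key in DESCRIPTORS:
        return key
    return USE_LINKS.get(key)


def to_index_terms(candidates):
    """Map candidate phrases to the set of allowed descriptors."""
    mapped = (normalize(c) for c in candidates)
    return {m for m in mapped if m is not None}


index_terms = to_index_terms(["Obesity", "diet", "junk food"])
print(sorted(index_terms))
```

Phrases that are neither descriptors nor linked non-descriptors (here "junk food") are simply dropped, which is what restricts indexers to the allowed terms.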
3
The Global Obesity Problem
Agrovoc terms
4
The Global Obesity Problem
Agrovoc terms
energy value
public health
nutritional disorders
regulations
weight reduction
nutrient excesses
developing countries
disease control
nutritional requirements
diet
dietary guidelines
nutrition status
nutrition programs
developed countries
feeding habits
meal patterns
nutrition surveillance
overweight
food policies
nutritional physiology
price formation
food intake
overeating
human nutrition
nutrition policies
price policies
foods
food consumption
fiscal policies
prices
direct taxation
urbanization
globalization
taxes
5
The Global Obesity Problem
Agrovoc terms
[Figure: the Agrovoc terms listed on the previous slide, shown against Indexers 1 to 6.]
6
Agenda
  • Indexing task
  • Controlled vocabulary
  • Experiment with 6 professionals
  • Indexing quality
  • Inter-indexer Consistency
  • Conceptual Consistency
  • New Measure
  • Vector model
  • Weighting relations
  • Results

7
Inter-Indexer Consistency
  • Degree of agreement between indexers in
    assigning index terms
  • Consistency → Retrieval Efficiency
  • Measuring consistency in library catalogues
  • Consistency (reported 10% - 80%) depends on:
  • size of the controlled vocabulary
  • measure: M2 < M1
  • matching policies

[Figure: the term sets of Indexer 1 and Indexer 2, with regions labelled A, C and B.]
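The transcript does not preserve the definitions of M1 and M2. As a hedged illustration, the sketch below implements two set-overlap measures that are widely used for inter-indexer consistency, Hooper's C / (A + B - C) and Rolling's 2C / (A + B), where A and B are the numbers of terms each indexer assigned and C is the number they share. Which of these, if either, corresponds to M1 or M2 on the slide is an assumption left open here; the example term sets are made up from terms on the earlier slides.

```python
def hooper(terms_1, terms_2):
    """Hooper's measure: shared terms divided by all distinct terms."""
    a, b = set(terms_1), set(terms_2)
    union = len(a | b)
    # Convention chosen here: two empty term sets score 0.0.
    return len(a & b) / union if union else 0.0


def rolling(terms_1, terms_2):
    """Rolling's measure: twice the shared terms over the total assigned."""
    a, b = set(terms_1), set(terms_2)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Illustrative term sets, not data from the experiment.
indexer_1 = {"overweight", "diet", "public health", "taxes"}
indexer_2 = {"overweight", "diet", "food policies"}

print(hooper(indexer_1, indexer_2))   # 2 shared / 5 distinct = 0.4
print(rolling(indexer_1, indexer_2))  # 4 / 7
```

For any pair of term sets Hooper's value never exceeds Rolling's, so the same indexing data can yield noticeably different consistency figures depending on the measure chosen.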
8
Conceptual Consistency
  • Conceptual consistency > terminological
    consistency
  • Experiment
  • 10 documents, 6 professionals
  • 5 - 16 terms/document,
  • 550 in total, 280 unique
  • 1 indexer vs. other 5 indexers
  • exact matching 71%
  • semantically related 18%
  • 29% non-matching terms, 10% non-matching concepts
  • New measure M3: consider semantic relations
    between terms!
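The exact / semantically related / non-matching breakdown can be sketched as a simple classification of one indexer's terms against the terms chosen by the others. The relation table, the term sets and the function names below are hypothetical; the experiment itself used the links recorded in Agrovoc.

```python
# Hypothetical thesaurus links, for illustration only.
RELATED = {
    "overweight": {"nutritional disorders", "overeating"},
    "taxes": {"fiscal policies"},
}


def is_related(term, other):
    """True if the two terms are linked in either direction."""
    return other in RELATED.get(term, set()) or term in RELATED.get(other, set())


def classify(term, other_terms):
    """Label a term as an exact match, a related match, or no match."""
    if term in other_terms:
        return "exact"
    if any(is_related(term, other) for other in other_terms):
        return "related"
    return "none"


def match_profile(own_terms, other_terms):
    """Fraction of own_terms (assumed non-empty) falling into each category."""
    counts = {"exact": 0, "related": 0, "none": 0}
    for term in own_terms:
        counts[classify(term, other_terms)] += 1
    return {label: n / len(own_terms) for label, n in counts.items()}


own = ["overweight", "fiscal policies", "urbanization", "diet"]
others = {"overweight", "taxes", "diet", "overeating"}
print(match_profile(own, others))
```

In this toy case "fiscal policies" fails an exact match but is caught through its link to "taxes", which is the kind of agreement on concepts that a purely terminological measure misses.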

9
Agenda
  • Indexing task
  • Controlled vocabulary
  • Experiment with 6 professionals
  • Indexing quality
  • Inter-indexer Consistency
  • Conceptual Consistency
  • New Measure
  • Vector model
  • Weighting relations
  • Results

10
Inter-Indexer Consistency
  • Set of index terms as a vector
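The formula on this slide did not survive in the transcript. As a sketch under stated assumptions, each indexer's term set can be written as a binary vector over the vocabulary; the dot product of two such vectors counts the shared terms, and the dot product of a vector with itself counts the terms assigned. Rolling's overlap measure 2C / (A + B) then becomes 2 u·v / (u·u + v·v), and cosine similarity is a closely related alternative. Whether the slide showed exactly one of these forms is an assumption.

```python
import math


def to_vector(terms, vocabulary):
    """Binary vector: 1.0 where the indexer assigned the vocabulary term."""
    chosen = set(terms)
    return [1.0 if term in chosen else 0.0 for term in vocabulary]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def rolling_vector(u, v):
    """Rolling-style overlap written with dot products."""
    denom = dot(u, u) + dot(v, v)
    return 2 * dot(u, v) / denom if denom else 0.0


def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Illustrative term sets, not data from the experiment.
indexer_1 = {"overweight", "diet", "public health", "taxes"}
indexer_2 = {"overweight", "diet", "food policies"}
vocabulary = sorted(indexer_1 | indexer_2)

u = to_vector(indexer_1, vocabulary)
v = to_vector(indexer_2, vocabulary)
print(rolling_vector(u, v), cosine(u, v))
```

The advantage of the vector form is that the entries need not stay binary, which leaves room for partial credit between related terms.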

11
Integrating Semantic Relations
  • Create two matrices
  • RT links
  • BT/NT links
  • Modified measure
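The modified measure itself is missing from the transcript. One plausible reading of "create two matrices", shown here purely as an assumption, is to build one symmetric link matrix for RT (related term) links and one for BT/NT (broader/narrower term) links, then spread a weighted share of each assigned term onto its neighbours before comparing vectors. The links, the weight values and their assignment to link types are all illustrative.

```python
def relation_matrix(vocabulary, links):
    """Symmetric 0/1 matrix with a 1 wherever two terms are linked."""
    index = {term: i for i, term in enumerate(vocabulary)}
    size = len(vocabulary)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b in links:
        matrix[index[a]][index[b]] = 1.0
        matrix[index[b]][index[a]] = 1.0
    return matrix


def matvec(matrix, v):
    return [sum(x * y for x, y in zip(row, v)) for row in matrix]


def expand(v, rt, bt_nt, w_rt, w_hier):
    """Add weighted mass on terms linked to the assigned ones."""
    related, hierarchical = matvec(rt, v), matvec(bt_nt, v)
    return [x + w_rt * r + w_hier * h
            for x, r, h in zip(v, related, hierarchical)]


def overlap(u, v):
    """Rolling-style overlap in dot-product form."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    denom = dot(u, u) + dot(v, v)
    return 2 * dot(u, v) / denom if denom else 0.0


vocabulary = ["diet", "foods", "overeating", "overweight"]
rt = relation_matrix(vocabulary, [("overeating", "overweight")])  # hypothetical
bt_nt = relation_matrix(vocabulary, [("foods", "diet")])          # hypothetical

u = [1.0, 0.0, 0.0, 1.0]  # indexer 1: diet, overweight
v = [1.0, 0.0, 1.0, 0.0]  # indexer 2: diet, overeating

plain = overlap(u, v)
relaxed = overlap(expand(u, rt, bt_nt, 0.2, 0.15),
                  expand(v, rt, bt_nt, 0.2, 0.15))
print(plain, relaxed)
```

With these toy inputs the two indexers share only "diet" under exact matching, but the RT link between "overeating" and "overweight" raises the relaxed score above the plain one.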

12
Consistency in a Group of Indexers
  • Overall consistency
  • vector of terms by indexer i for
    document D
  • Choosing weights to maximize M3'
  • weights: 0.2 and 0.15
  • Results
  • M1 38%, M3 49%, M3' 51%
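The overall-consistency formula is not preserved in the transcript. A common choice, assumed here, is to average a pairwise measure over all pairs of indexers for each document and then over documents; the weights can then be chosen by a plain grid search over candidate values. The function names, the toy corpus and the use of Rolling's measure as the default are illustrative assumptions.

```python
from itertools import combinations, product


def rolling(a, b):
    """Pairwise overlap: twice the shared terms over the total assigned."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def document_consistency(term_sets, measure=rolling):
    """Mean pairwise consistency for one document (needs >= 2 indexers)."""
    pairs = list(combinations(term_sets, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


def overall_consistency(corpus, measure=rolling):
    """Mean of the per-document consistencies."""
    scores = [document_consistency(sets, measure) for sets in corpus.values()]
    return sum(scores) / len(scores)


def grid_search(score, rt_weights, hier_weights):
    """Return the (w_rt, w_hier) pair that maximizes score(w_rt, w_hier)."""
    return max(product(rt_weights, hier_weights), key=lambda w: score(*w))


# Toy corpus: one document indexed by three people.
corpus = {
    "doc-1": [
        {"overweight", "diet"},
        {"overweight", "taxes"},
        {"diet", "overweight"},
    ],
}
print(overall_consistency(corpus))
```

In practice `score` would wrap `overall_consistency` with a weighted, relation-aware measure, so that each candidate weight pair is evaluated on the whole set of documents.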

13
Conclusions
  • Indexers seldom agree on index terms
  • Even professionals
  • Even the same person with themselves
  • Indexers agree on concepts
  • Consistency should consider semantic similarity
  • Vector model
  • elegant linear generalization of similarity
  • higher consistency values
  • easily extendable with semantic information
  • Larger testing data is required, but it is
    difficult to acquire