Distinción semántica de compuestos léxicos en Recuperación de Información (Semantic distinction of lexical compounds in Information Retrieval)

1
Distinción semántica de compuestos léxicos en
Recuperación de Información
SEPLN 2002, Valladolid
  • Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
  • Dpto. Lenguajes y Sistemas Informáticos, UNED

2
Content
  • Phrase indexing in IR
  • Types of lexical compounds
  • Automatic classification through WordNet
  • Retrieval experiments
  • Conclusions
  • Future work

3
Phrase indexing in IR
  • Literature
      • Phrase indexing does not improve retrieval
      • Give partial credit to components
      • Proximity is better than adjacency
  • Reason
      • Compositional meaning of phrases
      • No distinction between types of phrases
  • Our proposal
      • Semantic distinction of phrases

4
Types of lexical compounds (in English)
  • Endocentric compound
  • toothed whale
  • One component gives the nuclear meaning: whale
  • toothed whale is_a_type_of whale

5
Types of lexical compounds (in English)
  • Appositional compound
  • folk song
  • All components add nuclear meaning
  • folk song is_a_type_of song
  • folk song is_a_type_of folk

6
Types of lexical compounds (in English)
  • Exocentric compound
  • mentally retarded
  • No component gives the nuclear meaning
  • mentally retarded is_a_type_of people
  • However, they retain syntactic properties

7
Types of lexical compounds (in English)
  • Copulative compound
  • Miguel de Cervantes
  • No component gives the nuclear meaning
  • Refers to entities
  • Loss of syntactic properties

8
Automatic classification through WordNet
[Diagram: WordNet is_a links and the resulting classes. folk song is_a song and is_a folk (Appositional); toothed whale is_a whale (Endocentric); mentally retarded (Exocentric); Miguel de Cervantes (Copulative)]
  • Endocentric: one component is a hyperonym
  • Appositional: all components are hyperonyms
  • Exocentric: no components are hyperonyms
  • Copulative: not in WordNet (Entity Recognition)
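
The four rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `TOY_HYPERNYMS` is a hypothetical hand-built is_a table standing in for WordNet, `ancestors` and `classify_compound` are names invented here, and "not in the table" is used as a crude stand-in for the entity recognition step mentioned on the slide. For compounds with more than two components, "endocentric" is read as "some but not all components are hyperonyms".

```python
from typing import Dict, List, Set

# Hypothetical toy is_a table standing in for WordNet (an assumption made
# for this sketch). Each term maps to the set of its direct hyperonyms.
TOY_HYPERNYMS: Dict[str, Set[str]] = {
    "toothed whale": {"whale"},
    "folk song": {"folk", "song"},
    "mentally retarded": {"people"},
    "whale": {"mammal"},
    "song": {"music"},
}


def ancestors(term: str, hypernyms: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following is_a links."""
    seen: Set[str] = set()
    stack: List[str] = [term]
    while stack:
        current = stack.pop()
        for parent in hypernyms.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify_compound(compound: str, hypernyms: Dict[str, Set[str]]) -> str:
    """Classify a compound by how many of its components are its hyperonyms."""
    if compound not in hypernyms:
        # Simplification: absence from the table is treated as copulative.
        return "copulative"
    components = compound.split()
    above = ancestors(compound, hypernyms)
    hits = sum(1 for component in components if component in above)
    if hits == 0:
        return "exocentric"
    if hits == len(components):
        return "appositional"
    return "endocentric"


if __name__ == "__main__":
    for example in ["toothed whale", "folk song",
                    "mentally retarded", "Miguel de Cervantes"]:
        print(example, "->", classify_compound(example, TOY_HYPERNYMS))
```

With a real lexical database, the toy table would be replaced by lookups of the compound's entry and its hyperonym chain; the decision logic would stay the same.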

9
Effect on retrieval
  • Compare
      • Plain text
      • All compounds
      • Only exocentric compounds
  • Experiment conditions
      • INQUERY Search Engine
      • OHSUMED collection
          • 380 MB
          • 101 queries
          • medical domain
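
One way to picture the three compared conditions is as three rewritings of the same query. The sketch below is an assumption-heavy illustration: the slides do not say how compounds were represented for INQUERY, so joining the components with an underscore is a hypothetical convention chosen here, and `rewrite_query` and `COMPOUND_TYPES` are invented names. Only two-word compounds are handled.

```python
from typing import Dict, List

# Hypothetical output of a compound classifier (assumed for this sketch).
COMPOUND_TYPES: Dict[str, str] = {
    "mentally retarded": "exocentric",
    "toothed whale": "endocentric",
}


def rewrite_query(tokens: List[str],
                  compound_types: Dict[str, str],
                  condition: str) -> List[str]:
    """Rewrite a tokenised query under one of the three conditions.

    "plain"      -> every word stays a separate term
    "all"        -> every known compound becomes a single term
    "exocentric" -> only exocentric compounds become a single term
    """
    if condition not in ("plain", "all", "exocentric"):
        raise ValueError(f"unknown condition: {condition}")
    rewritten: List[str] = []
    i = 0
    while i < len(tokens):
        kind = None
        if i + 1 < len(tokens):
            kind = compound_types.get(f"{tokens[i]} {tokens[i + 1]}")
        join = (condition == "all" and kind is not None) or (
            condition == "exocentric" and kind == "exocentric")
        if join:
            # Underscore joining is a hypothetical convention, not INQUERY syntax.
            rewritten.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            rewritten.append(tokens[i])
            i += 1
    return rewritten


if __name__ == "__main__":
    query = ["mentally", "retarded", "toothed", "whale"]
    for condition in ("plain", "all", "exocentric"):
        print(condition, rewrite_query(query, COMPOUND_TYPES, condition))
```

In the "exocentric" condition the endocentric compound is left as separate words, so its components can still receive partial credit.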

11
Conclusions
  • Automatic classification of lexical compounds in
    WordNet with semantic criteria
  • Exocentric and endocentric compounds behave
    differently
  • Detection of exocentric compounds in queries
    seems to increase precision slightly
  • Results are not significant
  • Very few exocentric compounds in queries (7)

12
Future Work
  • Try a test collection with longer queries
    (narrative in TREC topics)
  • Detect exocentric compounds in a pseudo-relevance
    feedback framework

14
Phrase indexing
  • Texts
      • ...a guide for the fisher who...
      • ...information on cat care...
      • ...arboreal carnivorous called fisher cat...
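
The three fragments appear to illustrate what happens when a query for "fisher cat" is matched by its components rather than as a phrase. A minimal sketch of that contrast follows, assuming simple whitespace tokenisation; the strings are the slide's fragments with the ellipses dropped, and `matches_components` and `matches_phrase` are names invented for this example.

```python
from typing import List

# The slide's text fragments, with the ellipses dropped.
TEXTS: List[str] = [
    "a guide for the fisher who",
    "information on cat care",
    "arboreal carnivorous called fisher cat",
]


def matches_components(text: str, phrase: str) -> bool:
    """True if the text contains at least one word of the phrase."""
    words = text.lower().split()
    return any(word in words for word in phrase.lower().split())


def matches_phrase(text: str, phrase: str) -> bool:
    """True if the words of the phrase occur adjacently and in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


by_components = [t for t in TEXTS if matches_components(t, "fisher cat")]
by_phrase = [t for t in TEXTS if matches_phrase(t, "fisher cat")]

if __name__ == "__main__":
    print(len(by_components), "texts match by components")
    print(len(by_phrase), "text matches as a phrase:", by_phrase)
```

Component matching returns all three fragments, while phrase matching returns only the one that actually mentions the animal.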