Title: Distinción semántica de compuestos léxicos en Recuperación de Información
1. Distinción semántica de compuestos léxicos en Recuperación de Información
(Semantic distinction of lexical compounds in Information Retrieval)
SEPLN 2002, Valladolid
- Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
- Dpto. Lenguajes y Sistemas Informáticos, UNED
2. Content
- Phrase indexing in IR
- Types of lexical compounds
- Automatic classification through WordNet
- Retrieval experiments
- Conclusions
- Future work
3. Phrase indexing in IR
- Literature
  - Phrase indexing doesn't improve retrieval
  - Give partial credit to components
  - Proximity better than adjacency
- Reason
  - Compositional meaning of phrases
  - No phrase distinction
- Our proposal
  - Semantic distinction of phrases
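
The difference between adjacency and proximity matching mentioned under "Literature" can be illustrated with a minimal sketch. The function names, the example text and the window size of 5 tokens are assumptions made for illustration, not settings taken from the cited literature.

```python
def positions(tokens, word):
    """Return the token positions at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def adjacent_match(tokens, first, second):
    """True if `second` immediately follows `first` (strict phrase match)."""
    second_positions = set(positions(tokens, second))
    return any(i + 1 in second_positions for i in positions(tokens, first))


def proximity_match(tokens, first, second, window=5):
    """True if both words occur within `window` tokens of each other, in any order."""
    return any(
        abs(i - j) <= window
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


# Hypothetical document text: the two words are near each other but not adjacent.
doc = "the song was a traditional folk melody".split()
print(adjacent_match(doc, "folk", "song"))   # False
print(proximity_match(doc, "folk", "song"))  # True
```

Proximity matching still credits a document where the components occur close together, which is one way to read "give partial credit to components".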
4. Types of lexical compounds (in English)
- Endocentric compound
- toothed whale
- One component gives the nuclear meaning: whale
- toothed whale is_a_type_of whale
5. Types of lexical compounds (in English)
- Appositional compound
- folk song
- All components add nuclear meaning
- folk song is_a_type_of song
- folk song is_a_type_of folk
6. Types of lexical compounds (in English)
- Exocentric compound
- mentally retarded
- No components give nuclear meaning
- mentally retarded is_a_type_of people
- However, they retain syntactic properties
7. Types of lexical compounds (in English)
- Copulative compound
- Miguel de Cervantes
- No components give nuclear meaning
- Refers to entities
- Loss of syntactic properties
8. Automatic classification through WordNet
Diagram (WordNet is_a links):
  folk song is_a song, folk song is_a folk (Appositional)
  toothed whale is_a whale (Endocentric)
  mentally retarded (Exocentric)
  Miguel de Cervantes (Copulative)
- Endocentric: one component is a hyperonym
- Appositional: all components are hyperonyms
- Exocentric: no components are hyperonyms
- Copulative: not in WordNet (Entity Recognition)
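
The four rules above can be sketched as a small classifier. This is a minimal illustration, not the authors' implementation: `TOY_HYPERNYMS` is a hand-written stand-in for WordNet that only encodes the is_a links shown on these slides, and `classify_compound` is a hypothetical name.

```python
# Toy stand-in for WordNet: maps an entry to its hyperonyms.
# The entries only mirror the examples on the slides; they are not real WordNet data.
TOY_HYPERNYMS = {
    "toothed whale": {"whale"},
    "folk song": {"folk", "song"},
    "mentally retarded": {"people"},
}


def classify_compound(compound, hypernyms=TOY_HYPERNYMS):
    """Classify a compound by how many of its components are its hyperonyms."""
    if compound not in hypernyms:
        # Not in the lexicon: treated as copulative and left to entity recognition.
        return "copulative"
    components = compound.split()
    hits = [c for c in components if c in hypernyms[compound]]
    if not hits:
        return "exocentric"
    if len(hits) == len(components):
        return "appositional"
    # Some but not all components are hyperonyms (one, for two-word compounds).
    return "endocentric"


for example in ["toothed whale", "folk song", "mentally retarded", "Miguel de Cervantes"]:
    print(example, "->", classify_compound(example))
```

With a real lexicon, a phrase missing from WordNet is not necessarily an entity name, so the copulative branch would need the entity recognition step the slide mentions.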
9. Effect in retrieval
- Compare
  - Plain text
  - All compounds
  - Only exocentric compounds
- Experiment conditions
  - INQUERY Search Engine
  - OHSUMED collection
    - 380 MB
    - 101 queries
    - medical domain
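
The three compared conditions can be pictured as three ways of turning the same text into index terms. The sketch below is an assumption about how such a comparison might be set up: it handles two-word compounds only, joins them with an underscore, and does not use INQUERY's actual query syntax. The function name `index_terms` and the tiny compound table are hypothetical.

```python
def index_terms(tokens, compounds, mode):
    """Return index terms for one of the three compared conditions.

    mode: "plain" (single words only), "all" (join every known compound),
    or "exocentric" (join only compounds labelled exocentric).
    `compounds` maps a pair of words to its compound type.
    """
    terms, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        kind = compounds.get(pair)
        join = kind is not None and (
            mode == "all" or (mode == "exocentric" and kind == "exocentric")
        )
        if join:
            terms.append("_".join(pair))  # index the compound as a single unit
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms


# Compound labels taken from the slide examples.
compounds = {
    ("toothed", "whale"): "endocentric",
    ("mentally", "retarded"): "exocentric",
}
tokens = ["mentally", "retarded", "toothed", "whale"]
for mode in ("plain", "all", "exocentric"):
    print(mode, index_terms(tokens, compounds, mode))
```

Under the "exocentric" condition the endocentric compound stays split, so its head word can still match documents about the broader concept.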
11. Conclusions
- Automatic classification of lexical compounds in WordNet with semantic criteria
- Exocentric and endocentric compounds behave differently
- Detection of exocentric compounds in queries seems to increase precision slightly
- Results are not significant
- Very few exocentric compounds in queries (7)
12. Future Work
- Try a test collection with longer queries (narrative in TREC topics)
- Detect exocentric compounds in a pseudo-relevance feedback framework
14. Phrase indexing
Texts:
- ...a guide for the fisher who...
- ...information on cat care...
- ...arboreal carnivorous called fisher cat...
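
One reading of this example is that word-level matching on "fisher cat" retrieves all three texts, while treating the compound as a unit retrieves only the third. The sketch below illustrates that reading with the three snippets (ellipses dropped); the helper names are hypothetical.

```python
texts = [
    "a guide for the fisher who",
    "information on cat care",
    "arboreal carnivorous called fisher cat",
]


def word_matches(query, docs):
    """Indices of docs that contain at least one query word."""
    query_words = set(query.split())
    return [i for i, doc in enumerate(docs) if query_words & set(doc.split())]


def phrase_matches(query, docs):
    """Indices of docs that contain the query words as a contiguous phrase."""
    q = query.split()
    n = len(q)
    matches = []
    for i, doc in enumerate(docs):
        toks = doc.split()
        if any(toks[j:j + n] == q for j in range(len(toks) - n + 1)):
            matches.append(i)
    return matches


print(word_matches("fisher cat", texts))    # [0, 1, 2]
print(phrase_matches("fisher cat", texts))  # [2]
```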