Title: 20th Century Esfinge (Sphinx) solving the riddles at CLEF 2005
- Luís Costa
- Luis.costa_at_sintef.no
- Linguateca / SINTEF ICT
- PB 124, Blindern NO-0314 Oslo, Norway
- http://www.linguateca.pt
Esfinge on the Web: http://www.linguateca.pt/Esfinge/
- General-domain question answering system.
- The starting point was the architecture described in Brill, Eric. "Processing Natural Language without Natural Language Processing", in A. Gelbukh (ed.), CICLing 2003, LNCS 2588, Springer-Verlag, Berlin Heidelberg, 2003, pp. 360-369.
- Exploiting the redundancy of information on the Web.
- Exploiting the fact that Portuguese is one of the most used languages on the Web.
- Available on the Web.
- Participation at CLEF 2004 and 2005. Two strategies were tested:
  - Searching for the answers on the Web and using the CLEF document collection to confirm them (Strategy 1).
  - Searching for the answers only in the CLEF document collection (Strategy 2).
- Additional experiments using Strategy 1 were performed after error analysis and system debugging (Post-CLEF).
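The difference between the two strategies can be illustrated with a minimal Python sketch. It is not Esfinge's implementation: plain lists of strings stand in for the Google snippets and the CLEF collection, candidates are word n-grams counted across passages (the redundancy idea from Brill's architecture, with an assumed maximum length of 3 words), and "confirming" a Web candidate is simplified to checking that it occurs as whole words in at least one CLEF document. All function names are hypothetical.

```python
from collections import Counter


def ngrams(text, max_n=3):
    """All word n-grams of length 1..max_n in a passage (max_n is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def harvest(passages, max_n=3):
    """Count candidate n-grams across passages; high counts reflect redundancy."""
    counts = Counter()
    for passage in passages:
        counts.update(ngrams(passage, max_n))
    return counts


def occurs_in(candidate, doc):
    """Whole-word occurrence check on whitespace-tokenised text."""
    return f" {candidate} " in f" {doc} "


def strategy1(web_passages, clef_docs):
    """Hypothetical Strategy 1: harvest candidates from Web passages and keep
    only those confirmed by at least one CLEF document."""
    counts = harvest(web_passages)
    return Counter({c: k for c, k in counts.items()
                    if any(occurs_in(c, d) for d in clef_docs)})


def strategy2(clef_docs):
    """Hypothetical Strategy 2: harvest candidates from the CLEF collection only."""
    return harvest(clef_docs)
```

For example, with the Web passages "a capital de Portugal fica em Lisboa" and "Lisboa capital de Portugal" and a single CLEF document "o governo reuniu em Lisboa", Strategy 1 keeps "Lisboa" with a count of 2 but drops "capital", because no CLEF document contains it.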
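Esfinge overview
[Figure: flowchart of the system. Main stages: question reformulation module; Strategy 1 (submission of answer patterns to Google) or Strategy 2; passage extraction from the CLEF document collection, with stemmed patterns (stemmed pattern 1 ... stemmed pattern n) used when no documents are found; N-gram harvesting; the SIEMES NER when the question pattern enables its use; filters (ABCD or BCD); the answer is the best-scored N-gram, or NIL when no documents or no N-grams remain. Filters: A interesting PoS, B answer contained in question, C undesired answer, D supporting document.]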
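The last stages of the overview (filtering the harvested N-grams and answering NIL when nothing survives) can be sketched as follows, under simplifying assumptions: only filters B (answer contained in question) and C (undesired answer) are modelled, the undesired-answer list is a hypothetical toy set, and the score is simply the N-gram frequency, with longer N-grams winning ties. Filters A (interesting PoS) and D (supporting document) would need a tagger and the document collection, so they are left out. The function names are hypothetical.

```python
import re

# Hypothetical toy list standing in for Esfinge's list of undesired answers (filter C).
UNDESIRED = {"a", "o", "de", "em", "que"}


def words(text):
    """Lower-cased word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def contained_in_question(candidate, question):
    """Filter B (simplified): every word of the candidate occurs in the question."""
    q = set(words(question))
    return all(w in q for w in words(candidate))


def best_answer(question, counts):
    """Return the best-scored surviving N-gram, or "NIL" if none survives."""
    survivors = {
        cand: freq
        for cand, freq in counts.items()
        if cand.lower() not in UNDESIRED
        and not contained_in_question(cand, question)
    }
    if not survivors:
        return "NIL"
    # Score = frequency; ties go to the longer N-gram (an assumption).
    return max(survivors, key=lambda c: (survivors[c], len(words(c))))
```

With the question "Qual é a capital de Portugal?" and the counts {"de": 9, "capital": 7, "Lisboa": 5, "Porto": 2}, "de" is rejected as undesired and "capital" as contained in the question, so the answer is "Lisboa"; if only "capital" had been harvested, the answer would be NIL.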
What was new at CLEF 2005?
- Use of the named entity recognizer SIEMES (detection of humans, countries, settlements, geographical locations, dates and quantities).
- A list of uninteresting websites (jokes, blogs, etc.).
- A Brazilian Portuguese document collection became available.
- Use of the stemmer Lingua::PT::Stemmer to generalize the search patterns.
- Filtering of undesired answers. A list of these answers was built from the logs of last year's participation and from tests performed afterwards.
- Searching for longer answers: the system does not stop when it finds an acceptable answer. Instead, it keeps searching for longer acceptable answers that contain it.
- Participation in the EN-PT multilingual task.
- Correction of problems detected last year.
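The "searching for longer answers" behaviour can be sketched like this, assuming the acceptable answers are already available as a list of strings that passed the filters. The helper names are hypothetical, and containment is checked on contiguous whole words rather than raw substrings, which is a design choice of this sketch.

```python
def contains_words(longer, shorter):
    """True if the words of `shorter` appear contiguously in `longer`."""
    lw, sw = longer.split(), shorter.split()
    return any(lw[i:i + len(sw)] == sw for i in range(len(lw) - len(sw) + 1))


def extend_answer(best, acceptable):
    """Return the longest acceptable answer that contains `best`,
    or `best` itself if no acceptable answer contains it."""
    containing = [c for c in acceptable if contains_words(c, best)]
    return max(containing, key=lambda c: len(c.split()), default=best)
```

For instance, if "Annan" is the first acceptable answer found and "Kofi Annan" is also acceptable, the longer "Kofi Annan" is returned instead.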
Esfinge's performance
| Task | Experiment | Questions | Right answers | Right (%) |
| --- | --- | --- | --- | --- |
| CLEF 2005 PT-PT | Strategy 1 | 200 | 48 | 24 |
| CLEF 2005 PT-PT | Strategy 2 | 200 | 43 | 22 |
| CLEF 2005 PT-PT | Post-CLEF | 200 | 61 | 31 |
| CLEF 2005 EN-PT | Strategy 1 | 200 | 25 | 13 |
| CLEF 2004 PT-PT | Strategy 1 | 199 | 30 | 15 |
| CLEF 2004 PT-PT | Strategy 2 | 199 | 22 | 11 |
| CLEF 2004 PT-PT | Post-CLEF | 199 | 55 | 28 |
Two further right answers were found after the
official results were released.
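The percentage column appears to be the number of right answers divided by the number of questions, rounded half up (43/200 = 21.5% is reported as 22). The short check below recomputes it from the table's own figures. It uses integer arithmetic, since floor(100r/q + 1/2) = floor((200r + q) / 2q), which avoids both floating-point error and Python's round-half-to-even behaviour (round(30.5) gives 30, while the table reports 31).

```python
# (task, experiment, questions, right answers, reported right %) from the table above
RUNS = [
    ("CLEF 2005 PT-PT", "Strategy 1", 200, 48, 24),
    ("CLEF 2005 PT-PT", "Strategy 2", 200, 43, 22),
    ("CLEF 2005 PT-PT", "Post-CLEF", 200, 61, 31),
    ("CLEF 2005 EN-PT", "Strategy 1", 200, 25, 13),
    ("CLEF 2004 PT-PT", "Strategy 1", 199, 30, 15),
    ("CLEF 2004 PT-PT", "Strategy 2", 199, 22, 11),
    ("CLEF 2004 PT-PT", "Post-CLEF", 199, 55, 28),
]


def percent_right(questions, right):
    """Right answers as a whole percentage of the questions, rounded half up."""
    return (200 * right + questions) // (2 * questions)


for task, experiment, questions, right, reported in RUNS:
    print(f"{task:16} {experiment:11} {percent_right(questions, right):3d}%"
          f"  (reported {reported}%)")
```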
- The runs using the Web (Strategy 1) gave slightly better results than the runs using only the CLEF document collection in both participations.
- For questions of type People and Date, Strategy 2 gave better results than for the other question types, and better results than Strategy 1 on the same question types. This suggests that both strategies are still worth experimenting with and studying further.
- The analysis of the individual modules shows that the NER system helps mainly with questions of type People, Quantity and Date, while the morphological analyser has more influence on questions of type Which X, Who was <HUMAN> and What is.
- The results show that Esfinge improved compared with last year: they are better both with this year's and with last year's questions.