1
Answer Validation Exercise - AVE
QA subtrack at Cross-Language Evaluation Forum
  • UNED (coord.)
  • Anselmo Peñas
  • Álvaro Rodrigo
  • Valentín Sama
  • Felisa Verdejo

Thanks to Bernardo Magnini, Danilo Giampiccolo,
Pamela Forner, Petya Osenova, Christelle Ayache,
Bogdan Sacaleanu, Diana Santos, Juan Feu, Ido Dagan
2
What? Answer Validation Exercise
  • Validate the correctness of the answers given by
    real QA systems: the answers submitted by
    participants at CLEF QA 2006
  • Why? Give feedback on a single QA module,
    improve QA systems' performance, improve
    systems' self-scoring, help humans in the
    assessment of QA systems' output, develop
    criteria for collaborative QA systems, ...

3
How? Turning it into an RTE exercise
  • Question + Answer -> Hypothesis
  • Text: the supporting snippet returned by the
    system (several sentences, <500 bytes)
  • If the text semantically entails the hypothesis,
    then the answer is expected to be correct.

4
Example
  • Question
  • Who is the President of Mexico?
  • Answer (obsolete)
  • Vicente Fox
  • Hypothesis
  • Vicente Fox is the President of Mexico
  • Supporting Text
  • ...President Vicente Fox promises a more
    democratic Mexico...
  • Exercise
  • Does the text entail the hypothesis?
  • Answer: YES / NO
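
How a hypothesis like this can be produced is easy to sketch. The Python fragment below is illustrative only: the rewrite pattern and the fallback are assumptions for this one example, not the actual AVE generation code.

```python
import re

# Illustrative rewrite patterns; the real AVE hypotheses were built
# semi-automatically with language-specific rules.
PATTERNS = [
    # "Who/What is X?" + answer  ->  "<answer> is X"
    (re.compile(r"^(?:Who|What) is (.+)\?$", re.IGNORECASE), "{answer} is {rest}"),
]

def build_hypothesis(question: str, answer: str) -> str:
    """Turn a question and a candidate answer into a declarative hypothesis."""
    for pattern, template in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            return template.format(answer=answer, rest=match.group(1))
    # Fallback when no pattern matches: keep both parts.
    return f"{answer} ({question})"

print(build_hypothesis("Who is the President of Mexico?", "Vicente Fox"))
# -> Vicente Fox is the President of Mexico
```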

5
Looking for robust systems
  • Hypotheses are built semi-automatically from
    systems' answers
  • Some answers are correct and exact
  • Many are too long, too short, or plain wrong
  • Many hypotheses have
  • Wrong syntax but understandable
  • Wrong syntax and not understandable
  • Wrong semantics

6
So, the exercise
  • Return an entailment value (YES / NO) for each
    given text-hypothesis pair (a toy baseline is
    sketched below)
  • Results were evaluated against the human
    assessments of the QA Track
  • Subtasks
  • English, Spanish, Italian, Dutch, French, German,
    Portuguese and Bulgarian
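
As a hedged illustration of that interface only (not any participant's actual system), a naive lexical-overlap baseline deciding YES/NO per pair could look like this; the threshold and stopword list are arbitrary assumptions:

```python
import re

STOPWORDS = {"is", "the", "of", "a", "an"}

def entails(text: str, hypothesis: str, threshold: float = 0.7) -> str:
    """Answer YES when most content words of the hypothesis occur in the text."""
    text_words = set(re.findall(r"\w+", text.lower()))
    hyp_words = set(re.findall(r"\w+", hypothesis.lower())) - STOPWORDS
    overlap = len(hyp_words & text_words) / max(len(hyp_words), 1)
    return "YES" if overlap >= threshold else "NO"

text = "...President Vicente Fox promises a more democratic Mexico..."
hypothesis = "Vicente Fox is the President of Mexico"
print(entails(text, hypothesis))  # YES: all content words are covered
```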

7
Collections
  • Available for CLEF participants
    at nlp.uned.es/QA/ave/

8
Evaluation
  • Collections are not balanced
  • Approach: detect whether there is enough
    evidence to accept an answer
  • Measures: precision, recall and F over the YES
    pairs (those where the text entails the
    hypothesis); see the sketch below
  • Baseline system: accept all answers (always
    answer YES)
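
A minimal sketch of these measures and the accept-all baseline, assuming gold labels come from the QA human assessments (the function name and the toy data are illustrative):

```python
def yes_precision_recall_f(gold, predicted):
    """Precision, recall and F1 over the YES class only, since the
    collections are unbalanced and plain accuracy would be misleading."""
    true_yes = sum(g == "YES" and p == "YES" for g, p in zip(gold, predicted))
    pred_yes = sum(p == "YES" for p in predicted)
    gold_yes = sum(g == "YES" for g in gold)
    precision = true_yes / pred_yes if pred_yes else 0.0
    recall = true_yes / gold_yes if gold_yes else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = ["YES", "NO", "NO", "YES", "NO"]        # human QA assessments (toy data)
baseline = ["YES"] * len(gold)                 # accept-all baseline
print(yes_precision_recall_f(gold, baseline))  # (0.4, 1.0, 0.571...)
```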

9
Participants and runs
10
Results
11
Conclusions
  • Developed methodologies
  • Build collections from QA responses
  • Evaluate in a chain with the QA Track
  • New testing collections for the QA and RTE
    communities
  • In 7 languages, not only English
  • Evaluation in a real environment
  • Real systems' outputs -> AVE input

12
Conclusions
  • Reformulation of Answer Validation as a Textual
    Entailment problem is feasible
  • Introduces a 4% error (in the semi-automatic
    generation of the collection)
  • Good participation
  • 11 systems, 38 runs, 7 languages
  • Systems that reported the use of Logic obtained
    the best results