Title: Semantic Retrieval for Question Answering
1. Semantic Retrieval for Question Answering
- Student Research Symposium
- Language Technologies Institute
- Matthew W. Bilotti
- mbilotti_at_cs.cmu.edu
- September 23, 2005
2. Outline
- What is Question Answering?
- What is the cause of wrong answers?
- What is Semantic Retrieval, and can it help?
- What have other teams tried?
- How is JAVELIN using Semantic Retrieval?
- How can we evaluate the impact of Semantic Retrieval on Question Answering systems?
- Where can we go from here?
3. What is Question Answering?
- A process that finds succinct answers to
questions phrased in natural language
Q: Where is Carnegie Mellon? A: Pittsburgh, Pennsylvania, USA
Q: Who is Jared Cohon? A: ... is the current President of Carnegie Mellon University
Q: When was Herbert Simon born? A: 15 June 1916
Google. http://www.google.com
4. Classic Pipelined QA Architecture
(Diagram: Input Question → cascaded modules → Output Answers)
- A sequence of discrete modules cascaded such that
the output of the previous module is the input to
the next module.
5. Classic Pipelined QA Architecture
Where was Andy Warhol born?
6. Classic Pipelined QA Architecture
Where was Andy Warhol born?
Discover keywords in the question, generate
alternations, and determine answer type.
Keywords: Andy (Andrew), Warhol, born. Answer type: Location (City)
7. Classic Pipelined QA Architecture
Formulate IR queries using the keywords, and
retrieve answer-bearing documents
( Andy OR Andrew ) AND Warhol AND born
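A query like the one above could be produced by a formulation step along these lines. The function and the keyword-to-alternations mapping are hypothetical illustrations, not JAVELIN's actual module:

```python
def formulate_query(keywords):
    """Join each keyword's alternation set with OR, then AND the groups.

    `keywords` maps a surface keyword to its list of alternations
    (an assumed structure for this sketch).
    """
    groups = []
    for term, alternations in keywords.items():
        forms = [term] + alternations
        if len(forms) > 1:
            groups.append("( " + " OR ".join(forms) + " )")
        else:
            groups.append(term)
    return " AND ".join(groups)

query = formulate_query({"Andy": ["Andrew"], "Warhol": [], "born": []})
print(query)  # ( Andy OR Andrew ) AND Warhol AND born
```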
8. Classic Pipelined QA Architecture
Andy Warhol was born on August 6, 1928 in Pittsburgh and died February 22, 1987 in New York.
Extract answers of the expected type from
retrieved documents.
Andy Warhol was born to Slovak immigrants as
Andrew Warhola on August 6, 1928, on 73 Orr
Street in Soho, Pittsburgh, Pennsylvania.
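Extraction of answers of the expected type can be sketched as follows; the tiny gazetteer stands in for a real named-entity recognizer and is an assumption for illustration only:

```python
# Toy typed answer extraction: scan retrieved text for candidates of
# the expected answer type. A real system would run an NE recognizer;
# this stand-in uses a small, hand-written location gazetteer.
LOCATIONS = ("Pittsburgh, Pennsylvania", "Pittsburgh", "New York")

def extract_candidates(sentences, answer_type):
    if answer_type != "Location":   # the only type this sketch knows
        return []
    found = []
    for sent in sentences:
        for loc in LOCATIONS:
            if loc in sent and loc not in found:
                found.append(loc)
    return found

docs = ["Andy Warhol was born on August 6, 1928 in Pittsburgh ...",
        "... and died February 22, 1987 in New York."]
print(extract_candidates(docs, "Location"))  # ['Pittsburgh', 'New York']
```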
9. Classic Pipelined QA Architecture
Answer cleanup and merging, consistency or constraint checking, answer selection and presentation:
- merge: "Pittsburgh" and "73 Orr Street in Soho, Pittsburgh, Pennsylvania" are merged
- select appropriate granularity: Pittsburgh, Pennsylvania
- rank: 1. Pittsburgh, Pennsylvania  2. New York
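The merge step can be sketched with a containment heuristic; this is an illustrative assumption, not JAVELIN's actual merging logic:

```python
# Sketch of answer merging: when one candidate string is contained in
# a longer one, keep only the more specific (longer) form. Real answer
# merging would also use normalization and constraint checks.
def merge(candidates):
    kept = []
    for cand in sorted(candidates, key=len, reverse=True):
        if not any(cand in longer for longer in kept):
            kept.append(cand)
    return kept

cands = ["Pittsburgh",
         "73 Orr Street in Soho, Pittsburgh, Pennsylvania",
         "Pittsburgh, Pennsylvania"]
print(merge(cands))
# ['73 Orr Street in Soho, Pittsburgh, Pennsylvania']
```

A granularity-selection step would then trim the merged answer to the city level ("Pittsburgh, Pennsylvania") before ranking it against competing candidates such as "New York".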
10. What is the cause of wrong answers?
Failure point
- A pipelined QA system is only as good as its weakest module
- Poor retrieval and/or query formulation can result in low ranks for answer-bearing documents, or no answer-bearing documents retrieved
11. What is Semantic Retrieval, and can it help?
- Semantic Retrieval is a broad term for a document retrieval technique that makes use of semantic information and language understanding
- Hypothesis: use of Semantic Retrieval can improve performance, retrieving more, and more highly-ranked, relevant documents
12. What have other teams tried?
- LCC/SMU approach
  - Use an existing IR system as a black box; rich query expansion
- CL Research approach
  - Process top documents retrieved from an IR engine, extracting semantic relation triples; index and retrieve using an RDBMS
- IBM (Prager) Predictive Annotation
  - Store answer types (QA-Tokens) in the IR system's index, and retrieve on them
13. LCC/SMU Approach
- Syntactic relationships (controlled synonymy), morphological and derivational expansions for Boolean keywords
- Statistical passage extraction finds windows around keywords
- Semantic constraint check for filtering (unification)
- NE recognition and pattern matching as a third pass for answer extraction
- Ad hoc relevance scoring: term proximity, occurrence of answer in an apposition, etc.
(Pipeline diagram: keywords and alternations → Boolean query → IR → documents → passage extraction → passages → constraint checking → answer candidates, with Extended WordNet and Named Entity Extraction supporting the stages.)
Moldovan et al. Performance issues and error analysis in an open-domain QA system. ACM TOIS, vol. 21, no. 2, 2003.
14. Litkowski/CL Research Approach
- Relation triples: discourse entity (NP), semantic role or relation, governing word; essentially similar to our predicates
- Unranked XPath querying against an RDBMS; entity mention canonicalization
Example: "The quick brown fox jumped over the lazy dog." yields triples linking "quick brown fox" and "lazy dog" to the governing word "jumped".
(Pipeline: 10-20 top PRISE documents → sentences → semantic relationship triples → RDBMS, queried via XML/XPath.)
Litkowski, K.C. Question Answering Using
XML-Tagged Documents. TREC 2003
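A rough sketch of storing and querying relation triples in a relational store, in the spirit of the CL Research approach. The table layout, role labels, and SQL query are assumptions for illustration (SQL substitutes for Litkowski's actual XML/XPath scheme):

```python
# Store (entity, role, governor) triples in an in-memory SQLite table
# and answer a question with an unranked lookup.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (entity TEXT, role TEXT, governor TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("quick brown fox", "SUBJ", "jumped"),  # The quick brown fox jumped
    ("lazy dog",        "OVER", "jumped"),  # ... over the lazy dog.
])

# "What jumped over the lazy dog?" -> the entity governed by 'jumped'
# that fills the subject role.
rows = db.execute(
    "SELECT entity FROM triples WHERE governor = ? AND role = ?",
    ("jumped", "SUBJ")).fetchall()
print(rows)  # [('quick brown fox',)]
```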
15. Predictive Annotation
- Textract identifies candidate answers at indexing time
- QA-Tokens are indexed as text items along with actual doc tokens
- Passage retrieval, with a simple bag-of-words combo-match (heuristic) ranking formula

Original sentence: "Gasoline cost 0.78 per gallon in 1999."
Annotated: "Gasoline cost 0.78 MONEY per gallon VOLUME in 1999 YEAR."
(Pipeline: corpus → Textract (IE/NLP), guided by an answer type taxonomy → documents with QA-Tokens → IR index.)
Prager et al. Question-answering by predictive annotation. SIGIR 2000.
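The annotation step might look like the following sketch, where trivial patterns stand in for Textract; the regexes and the tiny QA-Token inventory are illustrative assumptions, not IBM's actual rules:

```python
# Sketch of predictive annotation: candidate answers are labeled with
# QA-Tokens, which are emitted alongside the ordinary document tokens
# so a standard IR engine can index and retrieve on them.
import re

def annotate(text):
    tokens = []
    for tok in text.split():
        tokens.append(tok)
        if re.fullmatch(r"\d+\.\d+", tok):          # crude money pattern
            tokens.append("MONEY")
        elif re.fullmatch(r"(19|20)\d\d\.?", tok):  # crude year pattern
            tokens.append("YEAR")
        elif tok in ("gallon", "gallons"):          # crude volume unit
            tokens.append("VOLUME")
    return tokens

print(" ".join(annotate("Gasoline cost 0.78 per gallon in 1999.")))
# Gasoline cost 0.78 MONEY per gallon VOLUME in 1999. YEAR
```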
16. How is JAVELIN using Semantic Retrieval?
- Annotate corpus with semantic content (e.g. predicates), and index this content
- At runtime, perform similar analysis on input questions to get predicate templates
- Maximal recall of documents that contain matching predicate instances
- Constraint checking at the answer extraction stage to filter out false positives and rank best matches
Nyberg et al. Extending the JAVELIN QA System with Domain Semantics, AAAI 2005.
17. Annotating and Indexing the Corpus
Predicate-argument structure: loves(ARG0: John, ARG1: Mary)
Actual index content: the predicate "loves" together with its role-labeled arguments (ARG0: John, ARG1: Mary)
(Diagram: text → annotation framework → indexer → IR index and RDBMS.)
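A minimal sketch of how a predicate instance such as loves(ARG0: John, ARG1: Mary) might be flattened into indexable terms. The encoding and function name are assumptions for illustration, not JAVELIN's actual index format:

```python
# Flatten a predicate instance into role-tagged terms so a standard
# IR engine (or an RDBMS row) can store and match them.
def index_terms(predicate, args):
    """loves(ARG0: John, ARG1: Mary) -> ['loves', 'ARG0/John', 'ARG1/Mary']"""
    return [predicate] + [f"{role}/{filler}"
                          for role, filler in sorted(args.items())]

print(index_terms("loves", {"ARG0": "John", "ARG1": "Mary"}))
# ['loves', 'ARG0/John', 'ARG1/Mary']
```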
18. Retrieval on Predicate-Argument Structure
"Who does John love?"
19. Retrieval on Predicate-Argument Structure
"Who does John love?"
Predicate-Argument Template: loves(ARG0: John, ARG1: ?x)
20. Retrieval on Predicate-Argument Structure
"Who does John love?"
What the IR engine sees: loves(ARG0: John, ARG1: ?x)
Some retrieved documents: "Frank loves Alice. John dislikes Bob." and "John loves Mary."
21. Retrieval on Predicate-Argument Structure
"Who does John love?"
Checking extracted predicate instances against the RDBMS:
- loves(ARG0: Frank, ARG1: Alice), from "Frank loves Alice.": no match
- dislikes(ARG0: John, ARG1: Bob), from "John dislikes Bob.": no match
- loves(ARG0: John, ARG1: Mary), from "John loves Mary.": matching predicate instance, binding the answer Mary
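The matching step can be sketched as template unification: constants must match exactly, and "?"-prefixed variables bind to whatever fills the role. The dictionary representation is an assumption for illustration:

```python
# Match a predicate-argument template against extracted instances,
# returning variable bindings on success and None on failure.
def match(template, instance):
    if template["pred"] != instance["pred"]:
        return None
    bindings = {}
    for role, value in template["args"].items():
        filler = instance["args"].get(role)
        if value.startswith("?"):
            bindings[value] = filler   # variable: bind the role filler
        elif filler != value:
            return None                # constant: must match exactly
    return bindings

template = {"pred": "loves", "args": {"ARG0": "John", "ARG1": "?x"}}
instances = [
    {"pred": "loves",    "args": {"ARG0": "Frank", "ARG1": "Alice"}},
    {"pred": "dislikes", "args": {"ARG0": "John",  "ARG1": "Bob"}},
    {"pred": "loves",    "args": {"ARG0": "John",  "ARG1": "Mary"}},
]
for inst in instances:
    print(match(template, inst))
# None, None, {'?x': 'Mary'}
```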
22. How can we evaluate the impact of Semantic Retrieval on QA systems?
- Performance can be indirectly evaluated by measuring the performance of the end-to-end QA system while varying the document retrieval strategy employed, in one of two ways:
  - NIST-style comparative evaluation
  - Absolute evaluation against new test sets
- Direct analysis of document retrieval performance
  - Requires an assumption such as: maximal recall of relevant documents translates to best end-to-end system performance
23. NIST-style Comparative Evaluation
- Answer keys developed by pooling
  - All answers gathered by all systems are checked by a human to develop the answer key
  - Voorhees showed that the comparative orderings between systems are stable regardless of exhaustiveness of judgments
- Answer keys from TREC evaluations are not suitable for post-hoc evaluation (nor were they intended to be), since they may penalize a new strategy for discovering good answers not in the original pool
- Manual scoring
  - Judging system output involves semantics (Voorhees 2003)
  - Abstract away from differences in vocabulary or syntax, and robustly handle paraphrase
  - This is the same methodology used in the Definition QA evaluation in TREC 2003 and 2004
24. Absolute Evaluation
- Requires building new test collections
  - Not dependent on pooled results from systems, so suitable for post-hoc experimentation
  - Human effort is required; a methodology is described in (Katz and Lin 2005), (Bilotti, Katz and Lin 2004) and (Bilotti 2004)
- Automatic scoring methods based on n-grams, or fuzzy unification on predicate-argument structure (Lin and Demner-Fushman 2005), (Vandurme et al. 2003), can be applied
- Can evaluate at the level of documents or passages retrieved, predicates matched, or answers extracted, depending on the level of detail in the test set
25. Preliminary Results: TREC 2005 Relationship QA Track
- 25 scenario-type questions; the first time such questions have occurred officially in the TREC QA track
- Semi-automatic runs were allowed; JAVELIN submitted a second run using manual question analysis
- Results (in MRR of relevant nuggets):
  - Run 1: 0.1356
  - Run 2: 0.5303
- Example on the next slide!
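Mean reciprocal rank (MRR), as reported above, is the mean over questions of 1/rank of the first relevant item returned (0 if none is returned). A minimal sketch with invented relevance judgments:

```python
# Compute MRR over a set of questions, where each question's result
# list is a sequence of booleans marking relevance at each rank.
def mrr(runs):
    total = 0.0
    for ranked_relevance in runs:
        for i, rel in enumerate(ranked_relevance, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(runs)

# Two questions: first relevant item at rank 2, then at rank 1.
print(mrr([[False, True, False], [True]]))  # 0.75
```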
26. Example Question Analysis
The analyst is interested in Iraqi oil smuggling.
Specifically, is Iraq smuggling oil to other
countries, and if so, which countries? In
addition, who is behind the Iraqi oil smuggling?
Predicate templates extracted from the scenario:
- interested(ARG0: The analyst, ARG1: Iraqi oil smuggling)
- smuggling(ARG0: Iraq, ARG1: oil, ARG2: which countries)
- smuggling(ARG0: Iraq, ARG1: oil, ARG2: other countries)
- is behind(ARG0: Who, ARG1: the Iraqi oil smuggling)
27. Example Results
The analyst is interested in Iraqi oil smuggling.
Specifically, is Iraq smuggling oil to other
countries, and if so, which countries? In
addition, who is behind the Iraqi oil smuggling?
1. The amount of oil smuggled out of Iraq has doubled since August last year, when oil prices began to increase, Gradeck said in a telephone interview Wednesday from Bahrain.
2. U.S.: Russian Tanker Had Iraqi Oil. By ROBERT BURNS, AP Military Writer. WASHINGTON (AP) Tests of oil samples taken from a Russian tanker suspected of violating the U.N. embargo on Iraq show that it was loaded with petroleum products derived from both Iranian and Iraqi crude, two senior defense officials said.
5. With no American or allied effort to impede the traffic, between 50,000 and 60,000 barrels of Iraqi oil and fuel products a day are now being smuggled along the Turkish route, Clinton administration officials estimate.

(7 of 15 relevant)
28. Where do we go from here?
- What to index and how to represent it
  - Moving to Indri [1] allows exact representation of our predicate structure in the index
- Building a Scenario QA test collection
- Query formulation and relaxation
- Learning or planning strategies
- Ranking retrieved predicate instances
- Aggregating information across documents
- Inference and evidence combination
- Extracting answers from predicate-argument structure
1. http://www.lemurproject.org
29. References
- Bilotti. Query Expansion Techniques for Question Answering. Master's Thesis, MIT. 2004.
- Bilotti et al. What Works Better for Question Answering: Stemming or Morphological Query Expansion? IR4QA workshop at SIGIR 2004.
- Lin and Demner-Fushman. Automatically Evaluating Answers to Definition Questions. HLT/EMNLP 2005.
- Litkowski, K.C. Question Answering Using XML-Tagged Documents. TREC 2003.
- Metzler and Croft. Combining the Language Model and Inference Network Approaches to Retrieval. Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004.
- Metzler et al. Indri at TREC 2004 Terabyte Track. TREC 2004.
- Moldovan et al. Performance issues and error analysis in an open-domain question answering system. ACM TOIS, vol. 21, no. 2. 2003.
- Nyberg et al. Extending the JAVELIN QA System with Domain Semantics. Proceedings of the 20th National Conference on Artificial Intelligence (AAAI 2005).
- Pradhan, S., et al. Shallow Semantic Parsing using Support Vector Machines. HLT/NAACL 2004.
- Prager et al. Question-answering by predictive annotation. SIGIR 2000.
- Vandurme, B., et al. Towards Light Semantic Processing for Question Answering. HLT/NAACL 2003.
- Voorhees, E. Overview of the TREC 2003 question answering track. TREC 2003.