What is the Jeopardy Model? A Quasi-Synchronous Grammar for Question Answering - PowerPoint PPT Presentation

1
What is the Jeopardy Model? A Quasi-Synchronous
Grammar for Question Answering
  • Mengqiu Wang, Noah A. Smith and Teruko Mitamura
  • Language Technology Institute
  • Carnegie Mellon University

2
The task
Who is the leader of France?
Candidate sentences:
1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, ...
3. ...
Re-ranked (as of May 16, 2007):
1. Henri Hadjenberg, who is the leader of France's Jewish community, ...
2. Bush later met with French president Jacques Chirac.
3. ...
High-efficiency document retrieval
High-precision answer ranking
3
Challenges
1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, ...
3. ...
Who is the leader of France?
High-efficiency document retrieval
High-precision answer ranking
4
Semantic Transformations
  • Q: Who is the leader of France?
  • A: Bush later met with French president Jacques
    Chirac.

5
Syntactic Transformations
(Figure: dependency trees of the question "Who is the leader of France?" and the answer "Bush later met with French president Jacques Chirac.", with modifier arcs marked.)
6
Syntactic Variations
(Figure: dependency trees of the question and of the sentence "Henri Hadjenberg, who is the leader of France's Jewish community, ...", with modifier arcs marked.)
7
Two key phenomena in QA
Q ↔ A
  • Semantic transformation
  • leader ↔ president
  • Syntactic transformation
  • leader of France ↔ French president

8
Existing work in QA
  • Semantics
  • Use WordNet as thesaurus for expansion
  • Syntax
  • Use dependency parse trees, but merely map the
    feature space into a dependency-parse feature
    space; no fundamental changes to the algorithms
    (edit distance, classifiers, similarity measures).

9
Where else have we seen these transformations?
  • Machine Translation (especially in syntax-based
    MT)
  • Paraphrasing
  • Sentence compression
  • Textual entailment

10
Noisy-channel
  • Machine Translation: F → E
  • decode via P(E|F) ∝ P(F|E) × P(E):
    translation model × language model
  • Question Answering: Q → A
  • rank via P(A|Q) ∝ P(Q|A) × P(A):
    Jeopardy model × retrieval model
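In code, the noisy-channel decomposition amounts to ranking candidates by a sum of two log-probabilities. A minimal sketch; the two scoring functions are placeholders, not the paper's actual models:

```python
def rank_answers(candidates, jeopardy_logprob, retrieval_logprob):
    """Noisy-channel ranking: score each candidate answer a for a fixed
    question Q by log P(Q|a) + log P(a), mirroring the translation-model
    plus language-model split in MT."""
    scored = [(jeopardy_logprob(a) + retrieval_logprob(a), a) for a in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [a for _, a in scored]
```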
11
What is Jeopardy! ?
  • From wikipedia.org
  • Jeopardy! is a popular international television
    quiz game show (#2 of the 50 Greatest Game Shows
    of All Time).
  • 3 contestants select clues in the form of an
    answer, to which they must supply correct
    responses in the form of a question.
  • The concept of "questioning answers" is original
    to Jeopardy!.

12
Jeopardy Model
  • We make use of a formalism called
    quasi-synchronous grammar [D. Smith & Eisner
    '06], originally developed for MT

13
Quasi-Synchronous Grammars
  • Based on key observations in MT
  • translated sentences often have some isomorphic
    syntactic structure, but not usually in entirety.
  • the strictness of the isomorphism may vary across
    words or syntactic rules.
  • Key idea
  • Unlike some synchronous grammars (e.g. SCFG,
    which is more strict and rigid), QG defines a
    monolingual grammar for the target tree,
    inspired by the source tree.

14
Quasi-Synchronous Grammars
  • In other words, we model the generation of the
    target tree, influenced by the source tree (and
    their alignment)
  • QA can be thought of as extremely free
    translation within the same language.
  • The linkage between question and answer trees in
    QA is looser than in MT, which gives a bigger
    edge to QG.

15
Jeopardy Model
  • Works on labeled dependency parse trees
  • Learn the hidden structure (alignment between Q
    and A trees) by summing out ALL possible
    alignments
  • One particular alignment tells us both the
    syntactic configurations and the word-to-word
    semantic correspondences
  • An example

(Figure: a question parse tree, an answer parse tree, and one alignment between them.)
16-17
(Figure: labeled dependency parse trees of the question "Who is the leader of France?" (who/WP tagged qword, France/NNP tagged location) and the answer "Bush later met with French president Jacques Chirac." (Bush/NNP and Jacques Chirac/NNP tagged person, French/JJ tagged location), shown side by side with an alignment between them.)
18
(Figure: the same trees, with one aligned word pair and its parents highlighted.)
Given its parent, a word is independent of all
other words (including siblings).
Our model makes local Markov assumptions to allow
efficient computation via dynamic programming
(details in paper)
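The independence assumption above yields an inside-style dynamic program over the answer tree. A minimal sketch: the `emit` (word-correspondence score) and `trans` (parent-child alignment score) functions are hypothetical stand-ins for the model's actual parameters:

```python
def sum_alignments(children, root, q_words, emit, trans):
    """Sum a locally factored score over ALL alignments of an answer
    dependency tree to question words. Because a word's alignment
    depends only on its parent's alignment, each child's choices can
    be summed out locally: an O(|A| * |Q|^2) tree DP."""
    def inside(node):
        # table[q] = total score of node's subtree if node aligns to q
        child_tables = [inside(c) for c in children.get(node, [])]
        table = {}
        for q in q_words:
            score = emit(node, q)  # word-to-word correspondence score
            for ct in child_tables:
                # sum out the child's alignment given the parent's
                score *= sum(trans(q, qc) * s for qc, s in ct.items())
            table[q] = score
        return table
    return sum(inside(root).values())
```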
19-22
(Animation: the alignment between the question and answer trees is built up one aligned pair at a time, e.g. who ↔ Jacques Chirac, leader ↔ president, until the full alignment is shown.)
23
6 types of syntactic configurations
  • Parent-child

24
(Figure: the aligned trees, with the pair illustrating the parent-child configuration highlighted.)
25
Parent-child configuration
26
6 types of syntactic configurations
  • Parent-child
  • Same-word

27
(Figure: the aligned trees, with the pair illustrating the same-word configuration highlighted.)
28
Same-word configuration
29
6 types of syntactic configurations
  • Parent-child
  • Same-word
  • Grandparent-child

30
(Figure: the aligned trees, with the pair illustrating the grandparent-child configuration highlighted.)
31
Grandparent-child configuration
32
6 types of syntactic configurations
  • Parent-child
  • Same-word
  • Grandparent-child
  • Child-parent
  • Siblings
  • C-command
  • (Same as [D. Smith & Eisner '06])
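Given an alignment, classifying a configuration only requires head lookups in the question tree. A sketch under simplifying assumptions: `q_parent` maps each question word to its head, and everything beyond the first five cases is lumped under c-command, which approximates rather than reproduces the paper's definition:

```python
def syntactic_config(q_parent, q_for_aparent, q_for_achild):
    """Classify the question-tree relation between the two question
    words aligned to an answer parent/child pair: one of the six
    syntactic configurations (c-command used here as a catch-all)."""
    p, c = q_for_aparent, q_for_achild
    if p == c:
        return "same-word"
    if q_parent.get(c) == p:
        return "parent-child"
    if q_parent.get(p) == c:
        return "child-parent"
    if q_parent.get(q_parent.get(c)) == p:
        return "grandparent-child"
    if q_parent.get(p) is not None and q_parent.get(p) == q_parent.get(c):
        return "siblings"
    return "c-command"
```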

33
(No Transcript)
34
Modeling alignment
  • Base model

35
(Slides 35-36: the example question/answer trees again, shown while stepping through the base alignment model.)
37
Modeling alignment cont.
  • Base model
  • Log-linear model
  • Lexical-semantic features from WordNet:
  • identity, hypernym, synonym, entailment, etc.
  • Mixture model
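The log-linear component scores a word pair by exp(w · f) over binary lexical-semantic features. A sketch with hand-supplied synonym/hypernym tables standing in for the WordNet lookups (the feature set and lookups here are illustrative, not the paper's exact ones):

```python
import math

def lexsem_features(q_word, a_word, synonyms, hypernyms):
    """Binary features for one aligned word pair. `synonyms` and
    `hypernyms` are illustrative stand-ins for WordNet queries."""
    return {
        "identity": float(q_word == a_word),
        "synonym": float(a_word in synonyms.get(q_word, ())),
        "hypernym": float(a_word in hypernyms.get(q_word, ())),
    }

def loglinear_score(feats, weights):
    """Unnormalized log-linear score exp(w . f)."""
    return math.exp(sum(weights.get(k, 0.0) * v for k, v in feats.items()))
```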

38
Parameter estimation
  • Things to be learnt
  • Multinomial distributions in base model
  • Log-linear model feature weights
  • Mixture coefficient
  • Training involves summing out hidden structures,
    so the objective is non-convex.
  • Solved using conditional Expectation-Maximization
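As one concrete piece of that training, the mixture coefficient between the two components has a closed-form EM update. A simplified sketch: the full conditional EM also re-estimates the multinomials and feature weights, which are held fixed here:

```python
def em_mixture_coefficient(base_scores, loglin_scores, iters=50, lam=0.5):
    """Alternate E-steps (per-example responsibility of the log-linear
    component) and M-steps (coefficient = mean responsibility)."""
    for _ in range(iters):
        resp = [lam * l / (lam * l + (1.0 - lam) * b)
                for b, l in zip(base_scores, loglin_scores)]
        lam = sum(resp) / len(resp)
    return lam
```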

39
Experiments
  • TREC 8-12 data sets for training
  • TREC 13 questions for development and testing

40
Candidate answer generation
  • For each question, we take all documents from the
    TREC document pool and extract sentences that
    contain at least one non-stopword keyword from
    the question.
  • For computational reasons (parsing speed, etc.),
    we only took answer sentences ...
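The filtering step above can be sketched in a few lines; the tokenizer and stopword list here are illustrative, not the ones used in the experiments:

```python
import re

STOPWORDS = {"the", "is", "of", "who", "a", "an", "to", "in", "and", "with"}

def candidate_sentences(question, sentences):
    """Keep sentences that share at least one non-stopword keyword
    with the question."""
    keywords = set(re.findall(r"\w+", question.lower())) - STOPWORDS
    return [s for s in sentences
            if keywords & (set(re.findall(r"\w+", s.lower())) - STOPWORDS)]
```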

41
Dataset statistics
  • Manually labeled 100 questions for training
  • Total 348 positive Q/A pairs
  • 84 questions for dev
  • Total 1415 Q/A pairs
  • 3.1+, 17.1- per question
  • 100 questions for testing
  • Total 1703 Q/A pairs
  • 3.6+, 20.0- per question
  • Automatically labeled another 2193 questions to
    create a noisy training set, for evaluating model
    robustness

42
Experiments cont.
  • Each question and answer sentence is tokenized,
    POS-tagged (MXPOST), parsed (MSTParser), and
    labeled with named-entity tags (IdentiFinder)

43
Baseline systems (replications)
  • Cui et al. [SIGIR '05]
  • The algorithm behind one of the best-performing
    systems in TREC evaluations.
  • Uses a mutual-information-inspired score computed
    over dependency trees and a single fixed
    alignment between them.
  • Punyakanok et al. [NLE '04]
  • Measures the similarity between Q and A by
    computing tree edit distance.
  • Both baselines are high-performing, syntax-based,
    and among the most straightforward to replicate.
  • We further enhanced both algorithms by augmenting
    them with WordNet.
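For intuition, here is the string version of the edit distance that the second baseline generalizes to trees (the baseline itself operates on dependency trees, which this sketch does not):

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences via the classic
    dynamic program (insert/delete/substitute, all unit cost)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # delete
                          d[i][j - 1] + 1,       # insert
                          d[i - 1][j - 1] + cost)  # substitute/match
    return d[m][n]
```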

44
Results
(Table: Mean Average Precision and Mean Reciprocal Rank of Top 1 for the two baselines and our model; the values 28.2, 41.2, 30.3, and 23.9 appear here, but the table's row/column layout was lost in the transcript.)
Statistically significantly better than the 2nd-best score in each column.
45
Summing vs. Max
46
Conclusion
  • We developed a probabilistic model for QA based
    on quasi-synchronous grammar
  • Experimental results showed that our model is
    more accurate and robust than state-of-the-art
    syntax-based QA models
  • The mixture model is shown to be powerful. The
    log-linear model allows us to use arbitrary
    features.
  • Provides a general framework for many other NLP
    applications (compression, textual entailment,
    paraphrasing, etc.)

47
Future Work
  • Higher-order Markovization, both horizontally and
    vertically, allows us to look at more context, at
    the expense of higher computational cost.
  • More features from external resources, e.g.
    paraphrasing database
  • Extending it for Cross-lingual QA
  • Avoids the paradigm of translation as pre- or
    post-processing
  • A lexical or phrase translation probability table
    fits naturally into the model, so translation is
    handled inherently
  • Taking into account parsing uncertainty

48
Thank you!
  • Questions?