Transcript and Presenter's Notes

Title: Answering Questions through Understanding


1
Answering Questions through Understanding/Analysis
BBN AQUA
  • Ralph Weischedel, Ana Licuanan, Scott Miller,
    Jinxi Xu

4 December 2003
2
Executive Summary of Accomplishments
  • Technical innovation
  • New hybrid approach to finding extended answers
    across documents
  • Answers questions regarding terms, organizations,
    and persons (biographies)
  • Performs very well in the NIST TREC QA evaluation
  • Automatic approach to evaluating extended answers
  • Collaborative contributions
  • Developed answer taxonomy and distributed trained
    name tagger to five other teams for their QA
    systems
  • Co-led pilot study with Dan Moldovan in
    definitional questions, which became part of TREC
    2003 QA evaluation
  • Question classification training data distributed
    to UMass

3
Outline
  • Approach
  • Component technologies
  • Factoid QA
  • Extended answers from multiple documents
  • Accomplishments

4
Approach
  • Overview
  • Key Components
  • Factoid/List Questions
  • Questions requiring Extended Answers

5
BBN's Hybrid Approach to QA
  • Theme: Extract features from questions and
    answers using various technologies (a toy
    end-to-end sketch follows this list)
  • Stems/words (from document retrieval)
  • Names and descriptions (from information
    extraction)
  • Parse trees
  • Entity recognition (from information extraction)
  • Proposition recognition (from information
    extraction)
  • Analyze the question
  • Reduce question to propositions and a bag of
    words
  • Predict the type of the answer
  • Finding answers
  • Rank candidate answers using passage retrieval
    from the primary corpus (the AQUAINT corpus)
  • Other knowledge sources (e.g., the Web,
    structured data) are optionally used to rerank
    answers
  • Re-rank candidates based on all features
    (propositions, patterns, etc.)
  • Eliminate redundancy (for list questions and
    extended answers)
  • Estimate confidence for answers
  • Presenting the answer
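
A toy, end-to-end sketch of this pipeline. Every component is a
deliberately crude stand-in based on bag-of-words overlap only; the
function names are invented and none of this is BBN's actual
retrieval, extraction, or reranking code.

def tokens(text: str) -> set[str]:
    """Reduce text to a bag of lower-cased content words."""
    stop = {"which", "what", "who", "the", "in", "of", "a", "is"}
    return {w.strip("?.,").lower() for w in text.split()} - stop

def passage_retrieval(bag: set[str], corpus: list[str]) -> list[str]:
    """Rank passages by word overlap with the question."""
    return sorted(corpus, key=lambda p: -len(bag & tokens(p)))

def remove_redundancy(passages: list[str]) -> list[str]:
    """Keep only passages that contribute new words."""
    kept, seen = [], set()
    for p in passages:
        words = tokens(p)
        if words - seen:
            kept.append(p)
            seen |= words
    return kept

def answer_question(question: str, corpus: list[str]) -> list[str]:
    return remove_redundancy(passage_retrieval(tokens(question), corpus))

corpus = ["Dell, beating Compaq, sold the most PCs in 2001.",
          "Compaq merged with HP in 2002."]
print(answer_question("Which company sold the most PCs in 2001?", corpus))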

6
Question Classification
  • A hybrid approach based on rules and statistical
    parsing
  • Match question templates against statistical
    parses
  • Back off to statistical bag-of-words
    classification
  • Example features used for classification
  • The type of WHNP starting the question (e.g.,
    Who, What, When)
  • The headword of the core NP
  • WordNet definition
  • Bag of words
  • Main verb of the question
  • Example: Which pianist won the last
    International Tchaikovsky Competition?
  • Headword of core NP: pianist
  • WordNet definition: person
  • Answer type: Person (see the classifier sketch
    below)
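
A minimal sketch of the rule-then-backoff logic above. The templates
and the headword table are invented stand-ins; the real system matches
templates against statistical parses and backs off to a trained
bag-of-words classifier rather than returning "Other".

import re

TEMPLATES = [
    (re.compile(r"^who\b", re.I), "Person"),
    (re.compile(r"^when\b", re.I), "Date"),
    (re.compile(r"^where\b", re.I), "Location"),
    (re.compile(r"^(which|what)\s+(\w+)", re.I), None),  # headword decides
]

# Toy stand-in for the WordNet lookup: headword of the core NP -> class.
HEADWORD_CLASS = {"pianist": "Person", "company": "Organization",
                  "city": "Location", "year": "Date"}

def classify(question: str) -> str:
    for pattern, answer_type in TEMPLATES:
        m = pattern.match(question)
        if not m:
            continue
        if answer_type:                      # template fixes the answer type
            return answer_type
        head = m.group(2).lower()            # headword of the core NP
        if head in HEADWORD_CLASS:           # WordNet-style definition lookup
            return HEADWORD_CLASS[head]
    return "Other"  # back off (a statistical bag-of-words model in the real system)

print(classify("Which pianist won the last "
               "International Tchaikovsky Competition?"))  # -> Person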

7
Question-Answer Types
Thanks to USC/ISI and IBM groups for sharing the
conclusions of their analyses.
8
Question Answer Types (cont'd)
9
Frequency of Q Types
10
Name Extraction via Hidden Markov Models (HMMs)
The delegation, which included the commander of
the U.N. troops in Bosnia, Lt. Gen. Sir Michael
Rose, went to the Serb stronghold of Pale, near
Sarajevo, for talks with Bosnian Serb leader
Radovan Karadzic.
[Diagram: training sentences and answer keys feed a training program
that produces an HMM; at run time, an extractor applies the HMM to
text to produce entities.]
  • Performance
  • Over 90 F on English newswire
  • 72 F on English broadcast news with 30% word
    error rate
  • 85 F on Chinese newswire
  • 76 F on Chinese OCR with 15% word error rate
  • 88 F on Arabic news
  • 90 F on Spanish news
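
To make the HMM decoding step concrete, here is a minimal Viterbi
tagger over a toy model. The states, probabilities, and
capitalization-only emission model are invented and far cruder than
IdentiFinder's word-feature HMM.

import math

STATES = ["PERSON", "LOCATION", "OTHER"]
START = {"PERSON": 0.2, "LOCATION": 0.2, "OTHER": 0.6}
TRANS = {s: {t: (0.6 if s == t else 0.2) for t in STATES} for s in STATES}

def emit(state: str, word: str) -> float:
    """Toy emission model: capitalized words favor the name states."""
    if word[0].isupper():
        return 0.4 if state != "OTHER" else 0.1
    return 0.05 if state != "OTHER" else 0.8

def viterbi(words: list[str]) -> list[str]:
    # best[s] = (log prob of best path ending in state s, that path)
    best = {s: (math.log(START[s] * emit(s, words[0])), [s]) for s in STATES}
    for w in words[1:]:
        best = {s: max(((lp + math.log(TRANS[prev][s] * emit(s, w)), path + [s])
                        for prev, (lp, path) in best.items()))
                for s in STATES}
    return max(best.values())[1]

words = "talks with Bosnian Serb leader Radovan Karadzic".split()
print(list(zip(words, viterbi(words))))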

11
Name Extraction to Pinpoint Candidate Short
Answers
  • IdentiFinder™ extracts names for 24 types
  • Current IdentiFinder performance on types
  • IdentiFinder easily trainable for other
    languages, e.g., Arabic and Chinese
  • Distributed to Carnegie-Mellon University,
    Columbia Univ., Univ. of Albany, Univ. of
    Colorado, MIT, USC/Information Sciences Institute

12
Parsing via Lexicalized Probabilistic CFGs
  • Performance
  • 88 F on English newswire with a 900,000-word
    training set
  • 81 F on English newswire with a 100,000-word
    training set
  • 80 F on Chinese newswire with a 100,000-word
    training set
  • 74 F on Arabic newswire with a 100,000-word
    training set

13
Proposition Indexing
  • A shallow semantic representation
  • Deeper than bags of words
  • But broad enough to cover all the text
  • Characterizes documents by
  • Entities they contain
  • Propositions (relations) involving those entities
  • Resolves all references to entities
  • Whether named, described, or pronominal (a toy
    index sketch follows below)
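
A sketch of what such an index could look like as a data structure.
The representation below is an assumption for illustration, not BBN's:
entities carry all their resolved mentions, and propositions map role
names to entity ids.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str
    mentions: list[str] = field(default_factory=list)  # named, described, pronominal

@dataclass
class Proposition:
    predicate: str
    args: dict[str, str]    # role -> entity id, e.g. {"subj": "e1", "obj": "e3"}

class PropositionIndex:
    """Index documents by the propositions (and hence entities) they contain."""
    def __init__(self):
        self.by_predicate = defaultdict(list)   # predicate -> [(doc id, prop)]

    def add(self, doc_id: str, prop: Proposition):
        self.by_predicate[prop.predicate].append((doc_id, prop))

    def lookup(self, predicate: str) -> list:
        return self.by_predicate.get(predicate, [])

e1 = Entity("e1", ["Dell", "the company", "it"])   # all references resolved to e1
index = PropositionIndex()
index.add("doc1", Proposition("sold", {"subj": "e1", "obj": "e3", "in": "e4"}))
print(index.lookup("sold"))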

14
Proposition Finding Example
  • Question: Which company sold the most PCs in
    2001?
  • Text: Dell, beating Compaq, sold the most PCs in
    2001.
  • Propositions
  • (e1 Dell)
  • (e2 Compaq)
  • (e3 the most PCs)
  • (e4 2001)
  • (sold subj:e1, obj:e3, in:e4)
  • (beating subj:e1, obj:e2)
  • Passage retrieval alone would select the wrong
    answer; matching the question's "sold" proposition
    identifies the answer, Dell (e1) (see the sketch
    below)
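
A toy version of the matching this example implies: the question
reduces to a proposition with one open argument, and only Dell (e1)
can fill it, while bag-of-words overlap cannot tell Dell from Compaq.
The code is an invented illustration.

# Entity table and text propositions from the slide.
ENTITIES = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
TEXT_PROPS = [("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
              ("beating", {"subj": "e1", "obj": "e2"})]

def match(question_prop, text_props):
    """Return the entity filling the question's open ("?") argument, if any."""
    pred, args = question_prop
    for t_pred, t_args in text_props:
        if t_pred != pred:
            continue
        # Every instantiated question argument must match the text proposition.
        if all(args[r] == "?" or ENTITIES.get(t_args.get(r)) == args[r]
               for r in args):
            open_role = next(r for r in args if args[r] == "?")
            return ENTITIES[t_args[open_role]]
    return None

# Which company sold the most PCs in 2001?  ->  sold(subj: ?, obj: ..., in: ...)
q_prop = ("sold", {"subj": "?", "obj": "the most PCs", "in": "2001"})
print(match(q_prop, TEXT_PROPS))  # -> Dell
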
15
Proposition Recognition Strategy
  • Start with a lexicalized probabilistic
    context-free grammar (LPCFG) parsing model
  • Distinguish names by replacing NP labels with NPP
  • Currently, rules normalize the parse tree to
    produce propositions
  • At a later date, extend the statistical model to
  • Predict argument labels for clauses
  • Resolve references to entities

16
Basic System for Factoid/List Questions
17
Confidence Estimation
  • Compute probability P(correct | Q, A) from the
    following features
  • P(correct | Q, A) ≈ P(correct | type(Q), ⟨m, n⟩,
    PropSat)
  • type(Q): question type
  • m: question length
  • n: number of matched question words in the answer
    context
  • PropSat: whether the answer satisfies the
    propositions in the question
  • Confidence for answers found on the Web
  • P(correct | Q, A) ≈ P(correct | Freq, InTrec)
  • Freq: number of Web hits, using Google
  • InTrec: whether the answer was also a top answer
    from the AQUAINT corpus (a toy estimator sketch
    follows below)
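
One simple way to realize such an estimator, sketched under the
assumption that the probabilities are estimated as smoothed relative
frequencies over held-out question/answer pairs; the binning, floor
value, and training data below are invented.

from collections import defaultdict

class ConfidenceEstimator:
    """P(correct | type(Q), n, PropSat) as smoothed table lookups."""
    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def train(self, examples):
        # examples: (question type, n matched words, prop_sat, was_correct 0/1)
        for qtype, n, prop_sat, was_correct in examples:
            key = (qtype, min(n, 5), prop_sat)   # bin n to keep the table small
            self.total[key] += 1
            self.correct[key] += was_correct

    def confidence(self, qtype, n, prop_sat) -> float:
        key = (qtype, min(n, 5), prop_sat)
        if not self.total[key]:
            return 0.1                # smoothed floor for unseen combinations
        return self.correct[key] / self.total[key]

est = ConfidenceEstimator()
est.train([("Person", 3, True, 1), ("Person", 3, True, 1),
           ("Person", 3, True, 0), ("Date", 1, False, 0)])
print(est.confidence("Person", 3, True))  # -> 0.666...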

18
Technique for Questions Requiring Extended
Answers
  • Select nuggets (phrases) by feature
  • Linguistic features
  • Appositives
  • Copula constructions
  • Surface structure patterns
  • Propositions
  • Semantic features from information extraction
  • Co-reference within document
  • Relations
  • Rank features via information retrieval
  • Remove redundancy
  • Cut off at target length of answer (a compact
    sketch of this loop follows below)
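
A compact sketch of this select/deduplicate/cut-off loop. The scores,
the word-overlap redundancy test, and the threshold are invented
stand-ins for the feature-based ranking detailed on the following
slides.

def select_nuggets(nuggets: list[tuple[str, float]], target_len: int,
                   overlap_threshold: float = 0.7) -> list[str]:
    """Greedily keep best-scoring, non-redundant nuggets up to target_len chars."""
    answer: list[str] = []
    kept_words: list[set[str]] = []
    used = 0
    for text, score in sorted(nuggets, key=lambda n: -n[1]):
        words = set(text.lower().split())
        # Remove redundancy: skip nuggets mostly covered by ones already kept.
        if any(len(words & kw) / len(words) > overlap_threshold
               for kw in kept_words):
            continue
        # Cut off at the target length of the answer.
        if used + len(text) > target_len:
            break
        answer.append(text)
        kept_words.append(words)
        used += len(text)
    return answer

nuggets = [("Blobel, a biologist at Rockefeller University", 0.9),
           ("a biologist at Rockefeller University", 0.8),
           ("won the Nobel Prize in Medicine", 0.7)]
print(select_nuggets(nuggets, target_len=120))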

19
Providing Extended Answers
20
Linguistic Features
  • Method
  • Good features include the target (QTERM) as an
    argument in
  • Propositions
  • Appositives
  • Copulas extracted from parse trees
  • Surface structure patterns
  • Example: Blobel, a biologist at Rockefeller
    University, won the Nobel Prize in Medicine.
  • Proposition: ⟨sub⟩ QTERM ⟨verb⟩ won ⟨obj⟩ prize
  • Appositive: ⟨appositive⟩ biologist
  • Example: The court, formally known as the
    International Court of Justice, is the judicial
    arm of the United Nations.
  • Copula: ⟨copula⟩ the judicial arm of the United
    Nations
  • Example: The International Court of Justice is
    composed of 15 Judges and has its headquarters at
    The Hague.
  • Surface structure pattern: "QTERM is composed
    of NP" (see the pattern-matching sketch below)
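
A minimal illustration of surface-structure pattern matching such as
"QTERM is composed of NP". The regular expressions are invented
examples, not BBN's actual pattern inventory.

import re

# Each pattern captures a candidate nugget; {q} is replaced by the target term.
PATTERNS = [
    r"{q}\s*,\s*(?:the|a|an)?\s*([^,]+),",                 # appositive-like
    r"{q}\s+is\s+composed\s+of\s+(.+?)(?:\s+and\b|[.;])",  # "composed of"
    r"{q}\s+is\s+(?:the|a|an)\s+(.+?)[.;]",                # copula-like
]

def match_patterns(qterm: str, sentence: str) -> list[str]:
    nuggets = []
    for p in PATTERNS:
        m = re.search(p.format(q=re.escape(qterm)), sentence, re.I)
        if m:
            nuggets.append(m.group(1).strip())
    return nuggets

s = ("The International Court of Justice is composed of 15 Judges "
     "and has its headquarters at The Hague.")
print(match_patterns("the International Court of Justice", s))  # -> ['15 Judges']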

21
Semantic Features
  • Method
  • SERIF, a state-of-the-art information extraction
    engine, was used
  • Co-reference used for name comparison, e.g.,
  • Depending on context, "he" and "Bush" may refer
    to the same person
  • Relations used as additional features for
    sentence selection. Types of relations include
  • Spouse-of (e.g., "Clinton", "Hillary")
  • Founder-of (e.g., "Gates", "Microsoft")
  • Management-of (e.g., "Welch", "GE")
  • Residence-of (e.g., "John Doe", "Boston")
  • Citizenship-of (e.g., "John Doe", "American")
  • Staff-of (e.g., "Weischedel", "BBN")

22
Relation Types (8/1/2003)
  • Person-Organization
  • Affiliation
  • Owner
  • Founder
  • Management
  • Client
  • General Staff
  • Located-at
  • Member
  • Other
  • Person-Location
  • Resident-of
  • Citizen-of
  • Located-at
  • Other
  • Organization-Location
  • Located-in
  • Organization-Organization
  • Subsidiary
  • Affiliate
  • Client
  • Member
  • Other
  • Person-Person
  • Parent
  • Sibling
  • Spouse
  • Grandparent
  • Client
  • Associate
  • Contact
  • Manager
  • Other-relative
  • Other-professional
  • Other

23
How to Extract Phrases for Features
  • Motivation
  • A good sentence may contain portions irrelevant
    to the question
  • Goal: extract only the pertinent parts of a
    sentence
  • Method
  • Operations are performed on parse trees
  • Find the smallest phrase that contains all the
    arguments of an important fact (i.e., a
    proposition, appositive, copula, relation, etc.)
  • Relative clauses not attached to the question
    term are trimmed from the phrase (see the sketch
    below)
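
A toy version of the parse-tree operation just described, on generic
bracketed trees with invented labels: find the smallest constituent
whose leaves cover all arguments of the fact.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    word: str | None = None

    def leaves(self) -> list[str]:
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]

def smallest_phrase(node: Node, targets: set[str]):
    """Smallest subtree whose leaves include every target word, else None."""
    if not targets <= set(node.leaves()):
        return None
    for child in node.children:
        found = smallest_phrase(child, targets)
        if found is not None:
            return found
    return node

leaf = lambda w: Node("TOK", word=w)
# (S (NP Dell) (VP (VP sold (NP the most PCs)) (PP in (NP 2001))))
tree = Node("S", [
    Node("NP", [leaf("Dell")]),
    Node("VP", [Node("VP", [leaf("sold"),
                            Node("NP", [leaf("the"), leaf("most"), leaf("PCs")])]),
                Node("PP", [leaf("in"), Node("NP", [leaf("2001")])])])])
print(" ".join(smallest_phrase(tree, {"sold", "PCs"}).leaves()))
# -> sold the most PCs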

24
How to Extract Phrases for Features
  • Examples
  • In 1971, Blobel and Dr. David D. Sabatini, who
    now heads cell biology at New York University
    School of Medicine, proposed a bold idea known as
    the signal hypothesis.
  • Proposition: ⟨verb⟩ proposed ⟨sub⟩ TERM ⟨obj⟩ idea
  • Phrase: "In 1971, Blobel and Dr. David D.
    Sabatini proposed a bold idea known as the
    signal hypothesis." (relative clause trimmed)
  • Though Warner - Lambert has been one of the drug
    industry's hottest performers -- routinely
    reporting quarterly earnings gains of more than
    30 percent -- analysts were concerned that its
    pipeline lacked the depth of those of some
    competitors.
  • Copula: Warner-Lambert has been
  • Phrase: "Warner-Lambert has been one of the
    drug industry's hottest performers -- routinely
    reporting quarterly earnings gains of more than
    30 percent"

25
Ranking of Features
  • Each feature is reduced to a bag of words
  • All features are ranked according to two factors
  • The type of the feature
  • Appositives/copulas > patterns > special
    propositions > relations > propositions/sentences
  • Similarity score (tf.idf) between the feature and
    the question profile
  • The question profile is a bag of words, which
    models the importance of different words in
    defining the question
  • Profile is compiled from
  • Existing definitions (e.g., Webster's dictionary,
    encyclopedias, etc.)
  • A collection of human-created biographies
  • The centroid of all features (see the ranking
    sketch below)
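
A minimal sketch of this two-factor ranking: a fixed precedence by
feature type, then tf.idf similarity between the feature's bag of
words and the question profile. The numeric ranks and corpus
statistics are invented.

import math
from collections import Counter

# Feature-type precedence: appositives/copulas > patterns > special
# propositions > relations > propositions/sentences.
TYPE_RANK = {"appositive": 5, "copula": 5, "pattern": 4,
             "special_prop": 3, "relation": 2, "proposition": 1, "sentence": 1}

def tfidf_sim(feature_words, profile, doc_freq, n_docs):
    """tf.idf-weighted overlap between a feature and the question profile."""
    tf = Counter(feature_words)
    return sum(tf[w] * profile[w] * math.log(n_docs / (1 + doc_freq[w]))
               for w in tf)

def rank(features, profile, doc_freq, n_docs):
    # features: (type, bag of words); sort by type rank, then by similarity.
    return sorted(features,
                  key=lambda f: (TYPE_RANK[f[0]],
                                 tfidf_sim(f[1], profile, doc_freq, n_docs)),
                  reverse=True)

profile = Counter({"biologist": 3, "nobel": 2, "prize": 2})      # from definitions,
doc_freq = Counter({"biologist": 10, "nobel": 5, "prize": 50})   # biographies, etc.
feats = [("sentence", ["the", "prize"]), ("appositive", ["biologist"])]
print([t for t, _ in rank(feats, profile, doc_freq, n_docs=1000)])
# -> ['appositive', 'sentence']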

26
Performance in TREC 2003
TREC 2003 Definitional Questions
27
TREC2003 Error Analysis
  • Of 50 definitional questions, 9 received a score
    of zero. Of those 9 questions:
  • 4 are due to faulty heuristics for question
    interpretation that could not deal with the
    question context
  • What is ETA in Spain? (The exact string "ETA in
    Spain" is assumed to be the question term.)
  • What is Ph in biology?
  • What is the medical condition shingles?
  • Who is Akbar the Great? ("Great" is assumed to be
    the last name of a person)
  • 1 is due to misspelling
  • Who is Charles Lindberg? (The more common
    spelling is "Lindbergh")

28
TREC 2003 Error Analysis (Continued)
  • When a question received a low F score, it was
    often because of low recall, not low precision
  • Average F score is 0.555 (BBN2003C)
  • Assuming perfect precision (1.0) for all
    questions, the score would be 0.614
  • Assuming perfect recall (1.0) for all questions,
    the score would be 0.797
  • The NIST F score is designed to favor recall
    (β = 5)
  • For some questions, low recall arises from
    aggressive / errorful redundancy removal
  • Example: Who is Ari Fleisher?
  • Ari Fleischer , Dole 's former spokesman who now
    works for Bush
  • Ari Fleischer , a Bush spokesman

29
Lessons Learned
  • Approach yields interesting performance by
    combining
  • Information retrieval
  • Linguistic analysis
  • Information extraction
  • Redundancy detection
  • Trainability
  • From examples of biographies, organizational
    profiles, and term dictionary/encyclopedia
  • Improves performance
  • Selects items like those seen in human-generated
    answers
  • Offers customizability

30
Towards Automatic Evaluation
  • Goal
  • A repeatable, automatic scorer to allow frequent
    experiments
  • Test: 26 biographical questions with human-
    created answers (3 human answers per question)
  • ¼ from the pilot corpus
  • ¾ created by BBN
  • For each question, the system produces the top N
    response items whose total size is less than or
    equal to the size of the manual answer
  • The BLEU metric from machine translation
    evaluations is used (sketched below)
  • Answer brevity, which should be rewarded for
    bio/def QA, is penalized by BLEU
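
A compact sketch of BLEU (modified n-gram precision with a brevity
penalty) to make the scoring concrete. This is the textbook
formulation, not NIST's exact script; the example strings are
invented.

import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, references: list[str], max_n: int = 4) -> float:
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            return 0.0
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(cand_ngrams.values())))
    # Brevity penalty: short candidates are penalized, which is why the
    # slide notes that BLEU punishes the brevity desirable in bio/def QA.
    closest_ref = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = min(1.0, math.exp(1 - len(closest_ref) / len(cand)))
    return bp * math.exp(sum(log_precisions) / max_n)

refs = ["Jordan was a Texas congresswoman born in Houston in 1936."]
print(round(bleu("Jordan was a Texas congresswoman.", refs), 3))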

31
BLEU vs. Human Judgments
  • Too few human judgments and too little data to
    draw firm conclusions
  • BLEU promising for automatic evaluation of
    progress in system development
  • May not be accurate enough for cross-system
    evaluation
  • Result: used in our development

32
Challenges (1)
  • Modeling context to better rank importance to
    the user
  • Distinguishing redundancy
  • Redundancy example
  • Jeff Bezos, the former stock trader who founded
    the company
  • Bezos was setting up Amazon.com
  • Jeff Bezos opened the Seattle-based company in
    1995
  • Jeff Bezos started the company in his garage in
    1994
  • Jeff Bezos, Amazon's founder
  • Organizing parts of answer by time, e.g.,
  • Positions in career as part of biography
  • Merger history of a company

33
Challenges (2)
  • Detecting Inconsistency, e.g.,
  • Barbara Jordan, a reporter and former news editor
    of The Chronicle of Willimantic, died Sunday of
    cancer. (1999-10-04)
  • Jordan would keep the extent of her health
    problems concealed until the day she died early
    in 1996.
  • BARBARA JORDAN (1936-1996) Jordan was born in
    Houston.
  • Jordan, 33, went on to become an all-conference
    outfielder at CSUN. (1999-09-26)
  • But Jordan, from Agoura, realized a different
    goal last week.
  • Detecting ambiguous questions, e.g.,
  • Who is Barbara Jordan? (3 distinct Barbara
    Jordans in the AQUAINT corpus)
  • A newspaper editor
  • A Texas congressperson
  • A softball coach
  • Automatic evaluation

34
Accomplishments
  • New hybrid approach to finding extended answers
    across documents
  • Answers questions regarding terms, organizations,
    and persons (biographies)
  • Emphasizes high recall, avoiding redundancy, and
    relative importance
  • Employs techniques from information retrieval,
    linguistic analysis, information extraction
  • Performs very well in the NIST evaluation
  • Automatic approach to evaluating extended answers
  • Developed answer taxonomy and distributed trained
    name tagger to five other teams for their QA
    systems (English, Arabic, and Chinese available)
  • Co-led pilot study with Dan Moldovan in
    definitional questions, which became part of TREC
    2003 QA evaluation