Transcript and Presenter's Notes

Title: Robust Local Textual Inference


1
Robust Local Textual Inference
  • Christopher Manning, Stanford University
  • Bill MacCartney, Marie-Catherine de Marneffe (U.C. de Louvain), Teg Grenager, Daniel Cer (U. Colorado), Rajat Raina, Christopher Cox, Anna Rafferty, Roger Grosse, Josh Ainslie, Aria Haghighi, Jenny Finkel, Jeff Michels, Kristina Toutanova, and Andrew Y. Ng

2
The backdrop
  • There is a long, sometimes successful history of writing by hand systems that understand more deeply
  • Using limited vocabulary and syntax in a literal
    way over limited domains. The TAUM-METEO system.
  • Recently, statistical/machine learning
    computational linguistics has provided tools for
    disambiguating natural language.
  • Parsers and annotators for any text from any
    domain
  • E.g., Named Entity Recognition: Person? Company?
  • In August 2004, Charles Schwab came to Arizona
    and opened a temporary location on 92nd Street

3
An external perspective on NLP
  • NLP has many successful tools with all sorts of
    uses
  • Part of speech tagging, named entity recognition,
    syntactic parsing, semantic role parsing,
    coreference determination
  • but they concentrate on structure, not meaning
  • By-and-large non-NLP people want systems for more
    holistic semantic tasks
  • Text categorization
  • Information retrieval/web search
  • The state-of-the-art in these areas is (slightly
    extended) bag-of-words models

4
The problem for NLP
  • Search engines actually work pretty well for people
  • But people would like to get more from
    text-processing applications
  • Information gathering is not just surface
    expression
  • Answers to many questions are a bit below the
    surface
  • This interpretation is the difference between
    data and knowledge
  • Challenge: a tool that works robustly on any text and understands a useful, greater amount of sentence meaning

5
Talk Outline
  • The NLP challenge: beyond the bag of words
  • The Pascal task of robust local textual inference
  • Deep logical approaches to NLP
  • Answering GRE analytic section logic puzzles [Lev, MacCartney, Levy, and Manning 2004]
  • Some first attempts
  • [Raina, Ng, and Manning 2005; Haghighi, Ng, and Manning 2005]
  • A second attempt
  • [MacCartney, de Marneffe, Grenager, Cer, and Manning 2006]

6
2. The PASCAL Textual Inference Task [Dagan, Glickman, and Magnini 2005]
  • The task: Can systems correctly perform local textual inferences (individual inference steps)?
  • On the assumption that some piece of text (T) is
    true, does this imply the truth of some other
    hypothesis text (H)?
  • Sydney was the host city of the 2000 Olympics. →
  • The Olympics have been held in Sydney. TRUE
  • The format could be used for evaluating extended
    inferential chains or knowledge
  • But, in practice, fairly direct, local stuff

7
The PASCAL Textual Inference Task
  • The task focuses on the variability of semantic
    expression in language
  • The reverse task of disambiguation
  • The Dow Jones Industrial Average closed up 255
  • The Dow climbed 255 points today
  • The Dow Jones Industrial Average gained over 250
    points
  • An abstraction from any particular application,
    but directly applicable to applications

8
Natural Examples Reading Comprehension
  • (CNN Student News) -- January 24, 2006
  • Answer the following questions about today's
    featured news stories. Write your answers in the
    space provided.
  • 1. Where is the country of Somalia located? What
    ocean borders this country?
  • 2. Why did crew members from the USS Winston S.
    Churchill recently stop a small vessel off the
    coast of Somalia? What action did the crew of the
    Churchill take?

9
Real Uses
  • Semantic search
  • Find documents about lobbyists attempting to
    bribe U.S. senators
  • (lobbyist attempted to bribe U.S. senator)
  • Question answering
  • Who acquired Overture?
  • Use to score candidate answers based on passage
    retrieval and named entity recognition
  • Customer email response
  • My Squeezebox regularly skips during music
    playback
  • → Sender can hear music through Squeezebox
  • Relation extraction (database building)
  • Document summarization

10
Verification of terms [Dan Roth]
  • Non-disclosure Agreement
  • WHEREAS Recipient is desirous of obtaining said
    confidential information for purposes of
    evaluation thereof and as a basis for further
    discussions with Owner regarding assistance with
    development of the confidential information for
    the benefit of Owner or for the mutual benefit of
    Owner and Recipient; THEREFORE, Recipient
    hereby agrees to receive the information in
    confidence and to treat it as confidential for
    all purposes. Recipient will not divulge or use
    in any manner any of said confidential
    information unless by written consent from Owner,
    and Recipient will use at least the same efforts
    it regularly employs for its own confidential
    information to avoid disclosure to others.
    Provided, however, that this obligation to
    treat information confidentially will not apply
    to any information already in Recipient's
    possession or to any information that is
    generally available to the public or becomes
    generally available through no act or influence
    of Recipient. Recipient will inform Owner of the
    public nature or Recipient's possession of the
    information without delay after Owner's
    disclosure thereof or will be estopped from
    asserting such as defense to remedy under this
    agreement.
  • Each party acknowledges that all of the
    disclosing party's Confidential Information is
    owned solely by the disclosing party (or its
    licensors and/or other vendors) and that the
    unauthorized disclosure or use of such
    Confidential Information would cause irreparable
    harm and significant injury, the degree of which
    may be difficult to ascertain. Accordingly, each
    party agrees that the disclosing party will have
    the right to obtain an immediate injunction
    enjoining any breach of this Agreement, as well
    as the right to pursue any and all other rights
    and remedies available at law or in equity for
    such a breach.
  • Recipient will exercise its best efforts to
    conduct its evaluation within a reasonable time
    after Owner's disclosure and will provide Owner
    with its assessment thereof without delay.
    Recipient will return all information, including
    all copies thereof, to Owner upon request. This
    agreement shall remain in effect for ten years
    after the date of its execution, and it shall be
    construed under the laws of the State of Texas.
  • Conditions I care about:
  • All information discussed is freely shareable
    unless other party indicates in advance that it
    is confidential
  • TRUE? FALSE?

11
PASCAL RTE Examples
Should be easy
  • T: iTunes software has seen strong sales in Europe.
  • H: Strong sales for iTunes in Europe. TRUE
  • T: The anti-terrorist court found two men guilty of murdering Shapour Bakhtiar and his secretary Sorush Katibeh, who were found with their throats cut in August 1991.
  • H: Shapour Bakhtiar died in 1991. TRUE
  • T: Like the United States, U.N. officials are also dismayed that Aristide killed a conference called by Prime Minister Robert Malval in Port-au-Prince in hopes of bringing all the feuding parties together.
  • H: Aristide had Prime Minister Robert Malval murdered in Port-au-Prince. FALSE

Note: not entailed!
They're allowed to try to trick you
12
Evaluation
  • The notion of inference is as would typically be
    interpreted by people, assuming common human
    understanding of language and common background
    knowledge.
  • Not entailment according to some linguistic
    theory
  • High agreement on this data: human accuracy is about 95%
  • Accuracy: you correctly say whether the hypothesis does or does not follow from the text
  • Confidence-weighted score (or average accuracy)
  • Rank all n pairs by system-supplied confidence
  • Use ranking to define a weighted average
  • Tests whether you know what you know

13
3. Logics: mapping from NL to reasoning (GRE/LSAT logic puzzles)
  • Six sculptures (C, D, E, F, G, and H) are to be exhibited in rooms 1, 2, and 3 of an art gallery.
  • Sculptures C and E may not be exhibited in the
    same room.
  • Sculptures D and G must be exhibited in the same
    room.
  • If sculptures E and F are exhibited in the same
    room, no other sculpture may be exhibited in that
    room.
  • At least one sculpture must be exhibited in each
    room, and no more than three sculptures may be
    exhibited in any room.
  • 4. If sculpture D is exhibited in room 1 and sculptures E and F are exhibited in room 2, which of the following must be true? (checked by brute force in the sketch below)
  • (A) Sculpture C must be exhibited in room 1.
  • (B) Sculpture H must be exhibited in room 3.
  • (C) Sculpture G must be exhibited in room 1.
  • (D) Sculpture H must be exhibited in room 2.
  • (E) Sculptures C and H must be exhibited in the
    same room.
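The puzzle is small enough to check by brute force. Here is a minimal sketch (my illustration, not part of the system described in the talk) that enumerates all assignments of the six sculptures to the three rooms, keeps those satisfying the constraints and the question's condition, and confirms that choice (C) holds in every surviving model:

```python
# Brute-force check of question 4 (illustrative sketch).
from itertools import product

SCULPTURES = "CDEFGH"

def satisfies_constraints(a):  # a: dict sculpture -> room in {1, 2, 3}
    rooms = {r: [s for s in SCULPTURES if a[s] == r] for r in (1, 2, 3)}
    if a["C"] == a["E"]:                 # C and E may not share a room
        return False
    if a["D"] != a["G"]:                 # D and G must share a room
        return False
    if a["E"] == a["F"] and len(rooms[a["E"]]) > 2:
        return False                     # E and F together => no other sculpture
    return all(1 <= len(rooms[r]) <= 3 for r in (1, 2, 3))

models = [dict(zip(SCULPTURES, rs)) for rs in product((1, 2, 3), repeat=6)]
# Question 4's condition: D in room 1, E and F in room 2.
models = [a for a in models
          if satisfies_constraints(a)
          and a["D"] == 1 and a["E"] == 2 and a["F"] == 2]

print(all(a["G"] == 1 for a in models))  # choice (C): True in every model
```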

14
The GRE logic puzzles domain
  • An English description of a constraint
    satisfaction problem, followed by questions about
    satisfying assignments
  • Answers cannot be found in the text by surface
    question answering methods (e.g., TREC QA)
  • Formalization and logical inference are necessary
  • Obtaining proper formalization requires
  • Accurate syntactic parsing
  • Resolving semantic ambiguities (scope,
    co-reference)
  • Discourse analysis
  • Easy to test (found test material)
  • If the formalization is right, the reasoning is
    easy
  • No ambiguity or subjectivity about the correct
    answer

15
Challenges
  • For most puzzles, the puzzle type, the variables,
    and values for assignments are not obvious
  • Mrs. Green wishes to renovate her cottage.
    She hires the services of a plumber, a carpenter,
    a painter, an electrician, and an interior
    decorator. The renovation is to be completed in a
    period of one working week, i.e., Monday to
    Friday. Every worker will be taking one complete
    day to do his job. Mrs. Green will allow just one
    person to work per day.
  • The painter will do his work only after the
    plumber and the carpenter have completed their
    jobs.
  • The interior decorator has to complete his job
    before that of the electrician.
  • The type of this puzzle is a constrained linear
    ordering of things (here, contractors)

16
Scope Needs to be Resolved!
  • At least one sculpture must be exhibited in each
    room.
  • The same sculpture in each room?
  • No more than three sculptures may be exhibited in
    any room.
  • Reading 1: For every room, there are no more than three sculptures exhibited in it.
  • Reading 2: Only three or fewer sculptures are exhibited (the rest are not shown).
  • Reading 3: Only a certain set of three or fewer sculptures may be exhibited in any room (for the other sculptures there are restrictions on allowable rooms).
  • Some readings will be ruled out by being
    uninformative or by contradicting other
    statements
  • Otherwise we must be content with probability distributions over scope-resolved semantic forms (the first two readings are written out below)
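For concreteness, Readings 1 and 2 can be written in first-order-style notation as follows (my gloss, not from the slides):

```latex
% Reading 1: the rooms take wide scope
\forall r\, \bigl(\mathrm{room}(r) \rightarrow \neg\exists^{>3} s\,
    (\mathrm{sculpture}(s) \wedge \mathrm{exhibited}(s, r))\bigr)

% Reading 2: the cardinality bound takes wide scope
\neg\exists^{>3} s\, \bigl(\mathrm{sculpture}(s) \wedge \exists r\,
    (\mathrm{room}(r) \wedge \mathrm{exhibited}(s, r))\bigr)
```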

17
System overview [Lev, MacCartney, Levy, and Manning 2004]
[Pipeline diagram: English text → parse trees → SL formulas → FOL formulas / URs / DL formulas → correct answer]
18
Semantic logic (SL)
  • Our goal is a translation to First Order Logic
    (FOL)
  • But FOL is ungainly, and far from NL
  • NL has events, plurals, modalities, complex
    quantifiers
  • Intermediate representation semantic logic (SL)
  • Event and group variables
  • Modal operators: □ (necessarily) and ◇ (possibly)
  • Generalized quantifiers: Q(type, var, restrictor, body)
  • Our example becomes:
  • □ Q(∀, x1, room(x1), Q(≥1, x2, sculpture(x2), ∃e exhibit(e) ∧ patient(e, x2) ∧ in(e, x1)))
  • ¬◇ Q(∃, y, room(y), Q(>3, g, sculpture(g), ∃e exhibit(e) ∧ patient(e, g) ∧ in(e, y)))
  • More compact, more natural

19
Combinatorial semantics
  • Aim is to assign a semantic representation
    (roughly, a lambda expression) to each semantic
    unit
  • The hope is to use a small lexicon for
    semantically potent words and to synthesize
    semantics for open class words

every dog barks (S): ∀x.(dog(x) → bark(x))
every dog (NP): λQ.∀x.(dog(x) → Q@x)
barks (VP): λx.bark(x)
every (Det): λP.λQ.∀x.(P@x → Q@x)
dog (Noun): λx.dog(x)
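The same composition can be emulated with ordinary closures over a tiny finite model; this sketch (my illustration, with a made-up domain) mirrors the table above:

```python
# Compositional semantics with Python closures over a finite model.
DOMAIN  = {"fido", "rex", "felix"}
DOGS    = {"fido", "rex"}
BARKERS = {"fido", "rex"}

dog   = lambda x: x in DOGS                # Noun: λx.dog(x)
barks = lambda x: x in BARKERS             # VP:   λx.bark(x)
every = lambda P: lambda Q: all(not P(x) or Q(x) for x in DOMAIN)
                                           # Det:  λP.λQ.∀x.(P@x → Q@x)
every_dog = every(dog)                     # NP:   λQ.∀x.(dog(x) → Q@x)
print(every_dog(barks))                    # S: "every dog barks" -> True
```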
20
FOL Reasoning module
  • Complementary reasoning engines
  • A theorem prover (TP) is used to show that a set
    of formulas is inconsistent (proof by
    contradiction)
  • A model builder (MB) is used to show that a set
    of formulas is consistent (proof by example)
  • Idea: harness TP and MB in tandem (control logic sketched below)
  • "Could" questions: examine each answer choice
  • MB says choice consistent ⇒ choice is correct
  • TP says choice inconsistent ⇒ choice is incorrect
  • "Must" questions: examine negation of each choice
  • MB says negation consistent ⇒ choice is incorrect
  • TP says negation inconsistent ⇒ choice is correct
  • Just a theorem prover is not enough
  • Can't handle "could be true" questions properly
  • Despite finite domain, some proofs too deep to
    find
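A sketch of that control logic (my illustration; the prover and model builder are abstracted as callables here, whereas the real system delegated to external FOL tools):

```python
def answer_could(premises, choice, is_consistent, is_inconsistent):
    """'Could' question: the choice is correct iff premises + choice
    are jointly satisfiable."""
    if is_consistent(premises + [choice]):        # model builder succeeds
        return True
    if is_inconsistent(premises + [choice]):      # theorem prover succeeds
        return False
    return None                                   # neither engine finished

def answer_must(premises, choice, is_consistent, is_inconsistent):
    """'Must' question: the choice is correct iff premises + NOT(choice)
    are unsatisfiable."""
    negated = ("not", choice)
    if is_inconsistent(premises + [negated]):
        return True
    if is_consistent(premises + [negated]):
        return False
    return None
```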

21
How far did we get?
  • Worked to be able to handle the sculptures
    example (set of 6 questions) completely
  • Worked to be able to do a second problem
  • What about new puzzle texts?
  • Statistical parse is correct (fully usable) in about 60% of cases
  • Main problem is unhandled semantic phenomena, e.g., "different", "except", "a complete list", VP ellipsis, …
  • Only 1 out of 21 questions actually doable start
    to end!

22
Pascal RTE comparison
  • Bos and Markert (2005) used a similar theorem
    prover/model builder combination as part of a two
    strategy entry in RTE1.
  • Indeed, our logic puzzles approach was strongly influenced by Bos's work
  • Coverage/correctness of approach
  • Found proof/contradiction for 30 pairs (3.75%)
  • Of these, 23 were correct (77%)
  • Example error:
  • T: Microsoft was established in Italy in 1985. →
  • H: Microsoft was established in 1985.

23
How real world textual inference differs from
logical semantics
  • Modals
  • Text: Researchers at the Harvard School of Public Health say that people who drink coffee may be doing a lot more than keeping themselves awake - this kind of consumption apparently also can help reduce the risk of diseases.
  • → Hypothesis: Coffee drinking has health benefits. (RTE1 ID 19)
  • "May" is a discourse hedge, not a possible-worlds modal
  • Reported views/speech
  • Text: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands.
  • → Hypothesis: 13,670 islands make up Indonesia. (RTE1 ID 605)
  • The source is cited for the information, not to suggest its truth is unknown

24
Speaker meaning
  • The Pascal RTE task can be taken as an applied
    test of human notions of speaker meaning
  • It clearly goes beyond the literal meaning of the
    text
  • Recanati (2004: 19) proposes regarding "what is said" as "what a normal interpreter would understand as being said, in the context at hand".
  • Pascal RTE could be viewed as operationalizing
    such a criterion.

25
4. Tackling robust textual inference: weighted abductive inference
  • Idea [Raina, Ng, and Manning 2005]
  • Represent text and hypothesis as logical
    formulae.
  • A hypothesis can be inferred from the text if and
    only if the hypothesis logic formula can be
    proved from the text logical formula (at some
    cost).
  • Toy example

T: Kidnappers released a Filipino hostage. H: A Filipino hostage was freed. TRUE
(∃A, B, E) Kidnappers(A) ∧ released(E, A, B) ∧ Filipino(B) ∧ hostage(B)
(∃X, F) Filipino(X) ∧ hostage(X) ∧ freed(F, X)
Prove?
Weighted abduction: allow assumptions at various costs, e.g. released(p, q, r) → freed(s, r) at cost 2 (Hobbs et al., 1993)
26
Representation example: Bill's mother walked to the grocery store
Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
  • Can and do make this representation richer
  • walked is a verb (VBD)
  • Bill is a PERSON (named entity)
  • Add semantic roles: store is the location/destination (ARGM-LOC) of walked

27
Linguistic preprocessing
  • High performance Named Entity Recognizers [Finkel et al. 2005]
  • Canonicalization of quantity, date, and money expressions (see the sketch below)
  • Normalized dates and relational expressions of amount: > 200
  • T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  • H: Kessler's team interviewed more than 60,000 adults in 14 countries.
  • Statistical parser
  • Update data a little for 2005: Al Qaeda, Aal-Qa'ieda
  • Collocations if appearing in WordNet (Bill hung_up the phone)
  • Semantic role identification: PropBank roles [Toutanova et al. 2005]
  • Coreference
  • T: Since its formation in 1948, Israel … H: Israel was established in 1948.
  • Heuristics to find event nouns (the murder of police commander)
  • Hand-built: acronyms, country and nationality, factive verbs
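A minimal sketch of the quantity side of this canonicalization (my illustration; real text needs far more cases): map surface expressions to (comparator, value) pairs and test compatibility:

```python
import re

def parse_amount(s):
    """Map e.g. 'more than 60,000 adults' to ('>', 60000)."""
    m = re.search(r"(more than|over|about)?\s*([\d,]+)", s)
    qualifier, value = m.group(1), int(m.group(2).replace(",", ""))
    return (">" if qualifier in ("more than", "over") else "=", value)

text_amt = parse_amount("60,643 face-to-face interviews")  # ('=', 60643)
hyp_amt  = parse_amount("more than 60,000 adults")         # ('>', 60000)

compatible = (hyp_amt[0] == ">" and text_amt[1] > hyp_amt[1]) or \
             (hyp_amt[0] == "=" and text_amt[1] == hyp_amt[1])
print(compatible)  # True: 60,643 satisfies "more than 60,000"
```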

28
How can we model abductive assumption costs in
proof?
  • Consider assumptions that unify pairs of terms.
  • Need to assign a cost C(A) to every such assumption A
  • Possible considerations:

Predicate match cost: Synonyms? Are they similar? Same named entity type?
Argument match cost: Same semantic role? Coreference for constants?
29
Abductive assumptions
  • Compute features f(A) = (f1(A), f2(A), …, fD(A)) of A.
  • Given feature weights w = (w1, w2, …, wD), define cost(A) = w · f(A) = Σi wi fi(A).
  • Each such assumption provides a potential proof step.
  • Can find a minimum cost complete proof by uniform cost search.
  • Output TRUE iff this proof has cost < a threshold wD+1 (decision rule sketched below).
  • Weak proof theory!!
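In code, the linear cost and the threshold decision look roughly like this (a sketch; the feature functions and weights are placeholders, not the paper's actual feature set):

```python
def assumption_cost(assumption, feature_fns, w):
    # cost(A) = sum_i w_i * f_i(A)
    return sum(wi * f(assumption) for wi, f in zip(w, feature_fns))

def predict_true(proof_assumptions, feature_fns, w, threshold):
    # TRUE iff the minimum-cost proof is cheap enough.
    total = sum(assumption_cost(a, feature_fns, w) for a in proof_assumptions)
    return total < threshold
```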

30
Can we learn the assumption costs?
  • Intuition: Given a data set, find assumptions that are used in the proofs for TRUE examples, and lower their costs.
  • The minimum cost proof Pmin consists of a sequence of assumptions A1, A2, …, AN.
  • Construct a feature vector for the proof: f(Pmin) = Σi f(Ai).
  • If Pmin is given, the final cost for an example
    is linear in w.
  • However, the overall feature vector is computed
    by abductive theorem proving, which uses w
    internally!
  • Solve by an iterative procedure guaranteed to converge to a local maximum of the (nonconvex) likelihood function (sketched below).
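One way to realize that alternation (a hypothetical sketch: find_min_cost_proof stands in for the abductive prover and is assumed to return the summed feature vector of the min-cost proof; the logistic cost model is my choice, not necessarily the paper's):

```python
import numpy as np

def learn_weights(examples, find_min_cost_proof, dim, n_iters=10, lr=0.1):
    """examples: list of (text, hypothesis, label), label 1 for TRUE."""
    w = np.ones(dim)
    for _ in range(n_iters):
        # Re-derive min-cost proofs under the current weights; the proof
        # found (and hence its features) depends on w.
        feats = np.array([find_min_cost_proof(t, h, w)
                          for t, h, _ in examples])           # (N, dim)
        labels = np.array([y for _, _, y in examples], float)
        # One ascent step on the log-likelihood of a logistic model
        # P(TRUE) = sigmoid(-cost) = sigmoid(-w.f): TRUE examples push
        # their proof costs down.
        p = 1.0 / (1.0 + np.exp(feats @ w))
        w -= lr * (feats.T @ (labels - p))
    return w
```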

31
Example features
  • Zero cost to match same item with same arguments
  • Low cost to unify things listed in WordNet as
    synonyms
  • Higher cost to match something with vague LSA
    similarity
  • Higher cost if arguments of verb mismatch
  • Antonyms/negation: high cost for unifying verbs, if they are antonyms or one is negated and the other not.
  • T: Stocks fell. H: Stocks rose. FALSE
  • T: Clinton's book was not a hit. H: Clinton's book was a hit. FALSE
  • Non-factive verbs: high cost for unifying verbs, if only one is modified by a non-factive verb.
  • T: John was charged for doing X. H: John did X. FALSE

32
Results
  • Evaluate on PASCAL RTE1 dataset.
  • Development set of 567 examples.
  • Test set of 800 examples.
  • Divided into 7 tasks (motivating applications)
  • Balanced number of true/false examples.
  • Output TRUE/FALSE, confidence value.
  • Empirically found to be tough.
  • Baselines
  • TF, TFIDF
  • Standard information retrieval algorithms.
  • Ignore natural language syntax.

33
RTE1 Results Raina et al. 2005
             Baselines               General           ByTask
             tf           tf.idf     Acc     CWS       Acc     CWS
DevSet 1     -            -          64.8    0.778     65.5    0.805
DevSet 2     -            -          52.1    0.578     55.7    0.661
DevSet 1+2   -            -          58.5    0.679     60.8    0.743
Test Set     49.5/0.548   51.8/0.56  56.3    0.620     55.3    0.686
(Baseline cells report Accuracy/CWS.)
  • Difficult! Best other results: Accuracy 58.6, CWS 0.617

34
Partial coverage accuracy results
Both know something, but task-specific
optimization is better!
[Figure: accuracy vs. coverage curves for ByTask, General, and Random systems]
35
5. New System Architecture [MacCartney, Grenager, de Marneffe, Cer, Manning, HLT-NAACL 2006]
[Pipeline diagram: an inference → Linguistic Preprocessing → Aligner → Inferer → Answer R ∈ {yes, no}]
36
Why the old approach was broken!
  • P: DeLay bought Enron stock and Clinton sold Enron stock
  • H: DeLay sold Enron stock

[Figure: candidate answers Yes / No / Probably, yes]
37
Why we need sloppy matching
  • Passage: Today's best estimate of giant panda numbers in the wild is about 1,100 individuals living in up to 32 separate populations mostly in China's Sichuan Province, but also in Shaanxi and Gansu provinces.
  • Hypothesis 1: There are 32 pandas in the wild in China. (FALSE)
  • Hypothesis 2: There are about 1,100 pandas in the wild in China. (TRUE)
  • We'd like to get this right, but we just don't have the technology to fully understand the step from "best estimate of giant panda numbers in the wild is about 1,100" to "there are about 1,100 pandas in the wild"

38
A solution: Align, then evaluate
  • P: DeLay bought Enron stock and Clinton sold Enron stock
  • H: DeLay sold Enron stock

39
Things we aim to fix [MacCartney, Grenager, de Marneffe, Cer, Manning, HLT-NAACL 2006]
  • Confounding of alignment and entailment
  • Assumption on monotonicity
  • Matching/embedding methods assume upward
    monotonicity
  • Sue saw Les Miserables in London →
    Sue saw Les Miserables
  • But:
  • Fedex began business in Zimbabwe in 2003 ⇏
    Fedex began business in 2003
  • Assumption/requirement of locality

40
Whether an alignment is good depends on non-local
factors
1. P: Some students came to school by car. Q: Did any students come to school? A: Yes
2. P: No students came to school by car. Q: Did any students come to school? A: Don't know
Context of monotonicity: whether it is okay to have "by car" as extra material in the hypothesis depends on the subject quantifier
3. P: It is not the case that Bin Laden was seen in Tora Bora. Q: Was Bin Laden seen in Tora Bora? A: No
It's difficult to see the non-factive context when aligning seen → seen
41
Representation/alignment example
  • T: Mitsubishi Motors Corp.'s new vehicle sales in the US fell 46 percent in June.
  • H: Mitsubishi sales rose 46 percent.
  • Answer: not entailed

Alignment from hypothesis to text:
rose → fell
sales → sales
Mitsubishi → Mitsubishi_Motors_Corp.
percent → percent
46 → 46

Alignment score: 0.89

Features: Aligned antonyms in pos/pos context; Structure: main predicate good match; Numeric quantity match; Date: text date deleted in hypothesis; Alignment good

Inference score: -5.42 → FALSE
42
Modal Inferer
  • Identify aligned roots
  • Determine modality of each root
  • Using linguistic features
  • e.g. can, perhaps, might → POSSIBLE
  • Six canonical modalities:
  • POSSIBLE, NOT_POSSIBLE, ACTUAL, NOT_ACTUAL, NECESSARY, NOT_NECESSARY
  • Look up judgment for modality pair (see the sketch below)
  • (POSSIBLE, ACTUAL) → don't know
  • (NECESSARY, NOT_ACTUAL) → no
  • (ACTUAL, POSSIBLE) → yes
  • P: The Scud C has a range of 500 kilometers and is manufactured in Syria with know-how from North Korea.
  • H: A Scud C can fly 500 kilometers.
  • (ACTUAL, POSSIBLE) → yes
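The pair lookup is essentially a table; a sketch with just the pairs shown above (names are mine; the full system covers all modality combinations):

```python
# Judgment for (text modality, hypothesis modality) pairs.
MODAL_JUDGMENT = {
    ("POSSIBLE",  "ACTUAL"):     "don't know",
    ("NECESSARY", "NOT_ACTUAL"): "no",
    ("ACTUAL",    "POSSIBLE"):   "yes",
}

def modal_inference(text_mod, hyp_mod):
    return MODAL_JUDGMENT.get((text_mod, hyp_mod), "don't know")

# "is manufactured" (ACTUAL) vs. "can fly" (POSSIBLE):
print(modal_inference("ACTUAL", "POSSIBLE"))  # yes
```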

43
Factives & other implicatives
  • T: Libya has tried, with limited success, to develop its own indigenous missile, and to extend the range of its aging SCUD force for many years under the Al Fatah and other missile programs.
  • H: Libya has developed its own domestic missile program.
  • Answer: not entailed. "Tried to X" does not entail X.
  • Evaluate governing verbs for implicativity (see the sketch below)
  • Unknown: say, tell, suspect, try, …
  • Fact: know, wonderful, …
  • True: manage to, …
  • False: doubtful, misbelieve, …
  • Need to check for negative context
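A sketch of that check (my illustration; verb lists abbreviated from the slide, and a real inventory is much larger):

```python
IMPLICATIVITY = {
    "say": "unknown", "tell": "unknown", "suspect": "unknown", "try": "unknown",
    "know": "fact",
    "manage": "true",
    "doubtful": "false", "misbelieve": "false",
}

def complement_truth(governing_verb, negated=False):
    cls = IMPLICATIVITY.get(governing_verb, "true")
    if cls == "unknown":
        return "unknown"     # "tried to X" says nothing about X
    if cls == "fact":
        return "true"        # factives survive negation: "didn't know that X"
    return ("false" if cls == "true" else "true") if negated else cls

print(complement_truth("try"))           # unknown -> H not entailed
print(complement_truth("manage"))        # true
print(complement_truth("manage", True))  # false ("didn't manage to X")
```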

44
Numeric Mismatches
  • Check alignment of number, date, money nodes
  • T: The Pew Internet & Life survey interviewed people in 26 countries.
  • H: The Pew Internet & Life study interviewed people in more than 20 countries
  • T: BioPort Corp. of Lansing, Michigan is the sole U.S. manufacturer of an anthrax vaccine.
  • H: There are three U.S. manufacturers of anthrax vaccine.
  • "three" is aligned to NO_WORD here; sole → 1

45
Restrictive adjuncts
  • We can check whether adding/dropping restrictive
    adjuncts is licensed relative to upward and
    downward entailing contexts
  • In all, Zerich bought $422 million worth of oil from Iraq, according to the Volcker committee
  • ⇏ Zerich bought oil from Iraq during the embargo
  • Zerich didn't buy any oil from Iraq, according to the Volcker committee
  • → Zerich didn't buy oil from Iraq during the embargo

46
What do we have?
  • Not full, deep semantics
  • But it still isn't possible to do logical inference for open domain robust textual inference (with real data)
  • We do inference-pattern matching
  • On semantic dependency graphs, not surface patterns
  • Calculate rich semantic features
  • Adjunct_deletion_licensed_relative_to_universal
  • Related to the notion of natural logic

47
Natural logic
  • A logic whose vehicle of inference is natural
    language (syntactic structures)
  • No translation into conventional logical notation
  • Aristotle's syllogisms → Leibniz (who coined the term) → Lakoff → van Benthem → Sánchez Valencia
  • Natural logic lets us sidestep having to fully
    translate sentences into an accurate semantic
    representation
  • Exercise: accurately translate into FOL:
  • According to Ruiz, police may have been reluctant to enter the building before they were convinced that most of the weapons had been found.

The police found few weapons.
All guns are weapons.
⊢ The police found few guns. ?
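Why no FOL translation is needed here: on a cardinal reading, "few" is downward monotone in its restrictor, so substituting a subset (guns ⊆ weapons) preserves truth. My gloss, not from the slides:

```latex
\frac{\mathrm{few}(W,\, P) \qquad G \subseteq W}
     {\mathrm{few}(G,\, P)}
\quad\text{with } W = \textit{weapons},\; G = \textit{guns},\;
P = \lambda x.\,\mathrm{found}(\textit{police},\, x)
```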
48
Our RTE2 Results
Learned accuracy:
Dev Set: 67.0
Test Set: 60.5
49
Most useful features
  • Positive:
  • Structural match
  • Good alignment score
  • Modal: yes
  • Polarity: text and hypothesis both negative polarity
  • Negative:
  • Date inserted/mismatched
  • Structure: clear mismatch
  • Quantifier mismatch
  • Bad alignment score
  • Different polarity
  • Modal: no / don't know

50
Things that it's hard to do
  • Non-entailment is easier than entailment
  • Good at finding knock-out features
  • Hard to be certain that we've considered everything
  • Dealing with dropping/adding modifiers vs. upward/downward entailing contexts is hard
  • Need to know which are restrictive/not/discourse
    items
  • Maurice was subsequently killed in Angola.
  • Multiword lexical semantics/world knowledge
  • We're pretty good at synonyms, hyponyms, antonyms
  • But we can't resolve a lot of multi-word equivalences
  • T: David McCool took the money and decided to start Muzzy Lane in 2002
  • H: David McCool is the founder of Muzzy Lane

51
Envoi
  • What I've shown
  • The beginnings of an ability to do robust textual
    inference
  • Still lots of room for improving/fixing
    everything!
  • Potential for applications like evidence
    extraction
  • Find passages suggesting price fixing by Enron
  • Moderate precision is still useful for reranking
    applications like semantic search/question
    answering
  • Meta question: the path to semantics for NLP
  • Hand-built logical methods still don't scale
  • Is continuing to annotate data the only answer?
  • How much can we do with unsupervised learning?
    With hand-built resources?