1
Three Stories on Automated Reasoning for
Natural Language Understanding
  • Johan Bos
  • University of Rome "La Sapienza"
  • Dipartimento di Informatica

2
Background
  • My work lies at the intersection of:
  • natural language processing
  • computational linguistics
  • formal and computational semantics
  • Aim of my work
  • implement linguistic theories
  • use automated reasoning in modeling natural
    language understanding

3
Applications
  • What kind of applications?
  • Human-machine dialogue systems
  • Question answering systems
  • Textual entailment systems
  • Use of logical inference
  • Off-the-shelf systems: FOL theorem provers and finite model builders
  • Empirically successful?

4
Surprise
  • Perhaps surprisingly, automated reasoning tools rarely make it into NLP applications
  • Why?
  • Requires interdisciplinary background
  • Gap between formal semantic theory and practical
    implementation
  • It is just not trendy: statistical approaches dominate the field

5
Three Stories
  • World Wide Computational Semantics
  • The world's first serious implementation of Discourse Representation Theory, with the help of the web and theorem proving
  • Godot, the talking robot
  • The first robot that computes semantic
    representations and performs inferences using
    theorem proving and model building
  • Recognising Textual Entailment
  • Automated deduction applied in wide-coverage natural language processing

6
The First Story
World Wide Computational Semantics. The first serious implementation of Discourse Representation Theory, with the help of the internet
1994-2001
7
How it started
  • Implementing tools for the semantic analysis of
    English
  • Follow linguistic theory as closely as possible
  • Discourse Representation Theory (DRT)
  • First-order logic
  • Presupposition projection
  • Computational Semantics

8
Computational Semantics
  • How can we automate the process of associating
    semantic representations with expressions of
    natural language?
  • How can we use logical representations of natural
    language expressions to automate the process of
    drawing inferences?

9
Basic idea
  • Text: Vincent loves Mia.

10
Basic idea
  • Text: Vincent loves Mia.
  • DRT:

11
Basic idea
  • Text: Vincent loves Mia.
  • DRT:
  • FOL: ∃x∃y(vincent(x) ∧ mia(y) ∧ love(x,y))

12
Basic idea
  • Text: Vincent loves Mia.
  • DRT:
  • FOL: ∃x∃y(vincent(x) ∧ mia(y) ∧ love(x,y))
  • BK: ∀x(vincent(x) → man(x)), ∀x(mia(x) → woman(x)), ∀x(man(x) → ¬woman(x))

13
Basic idea
  • Text: Vincent loves Mia.
  • DRT:
  • FOL: ∃x∃y(vincent(x) ∧ mia(y) ∧ love(x,y))
  • BK: ∀x(vincent(x) → man(x)), ∀x(mia(x) → woman(x)), ∀x(man(x) → ¬woman(x))
  • Model: D = {d1, d2}, F(vincent) = d1, F(mia) = d2, F(love) = {(d1, d2)}

14
Basic idea
  • Text: Vincent loves Mia.
  • DRT:

15
Compositional Semantics
  • The problem: Given a natural language expression, how do we convert it into a logical formula?
  • Frege's principle: The meaning of a compound expression is a function of the meanings of its parts.

16
Lexical semantics
17
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)

18
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • (λp.λq.∃x(p(x) ∧ q(x))) (λz.spokesman(z))

19
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • λq.∃x((λz.spokesman(z))(x) ∧ q(x))

20
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • λq.∃x(spokesman(x) ∧ q(x))

21
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • λq.∃x(spokesman(x) ∧ q(x))
  • -------------------------------------------------------- (BA)
  • S: a spokesman lied
  • λq.∃x(spokesman(x) ∧ q(x)) (λy.lie(y))

22
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • λq.∃x(spokesman(x) ∧ q(x))
  • -------------------------------------------------------- (BA)
  • S: a spokesman lied
  • ∃x(spokesman(x) ∧ (λy.lie(y))(x))


23
A derivation
  • NP/N: a    N: spokesman    S\NP: lied
  • λp.λq.∃x(p(x) ∧ q(x))    λz.spokesman(z)    λy.lie(y)
  • -------------------------------------------------------- (FA)
  • NP: a spokesman
  • λq.∃x(spokesman(x) ∧ q(x))
  • -------------------------------------------------------- (BA)
  • S: a spokesman lied
  • ∃x(spokesman(x) ∧ lie(x))
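(Aside: the derivation can be mimicked with Python lambdas; a toy sketch, with the domain and the extensions of spokesman and lie invented for illustration.)

    DOMAIN = ["e1", "e2"]
    # Lexical entries from the slides
    a = lambda p: lambda q: any(p(x) and q(x) for x in DOMAIN)   # λp.λq.∃x(p(x) ∧ q(x))
    spokesman = lambda z: z == "e1"                              # toy extension
    lied = lambda y: y == "e1"                                   # toy extension

    np = a(spokesman)   # (FA): λq.∃x(spokesman(x) ∧ q(x))
    s = np(lied)        # (BA): ∃x(spokesman(x) ∧ lie(x))
    print(s)            # True in this toy model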

24
The DORIS System
  • Reasonable grammar coverage
  • Parsed English sentences, followed by resolving
    ambiguities
  • Pronouns
  • Presupposition
  • Generated many different semantic representations for a text

25-28
Texts and Ambiguity
  • Usually, ambiguities cause many possible
    interpretations
  • Example: Butch walks into his modest kitchen. He opens the refrigerator. He takes out a milk and drinks it.


29
Basic idea of DORIS
  • Given a text, produce as many different DRSs (semantic interpretations) as possible
  • Filter out strange interpretations:
  • Inconsistent interpretations
  • Uninformative interpretations
  • Applying theorem proving (sketched below)
  • Use a general-purpose FOL theorem prover:
  • Bliksem (Hans de Nivelle)
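(A sketch of the filtering loop; prove() is a stub standing in for a call to an external prover such as Bliksem, not actual DORIS code.)

    def prove(formula):
        """Stub: hand a FOL formula to a theorem prover; True iff a proof is found."""
        raise NotImplementedError

    def filter_readings(context, readings):
        """Keep readings that are consistent and informative w.r.t. the context."""
        kept = []
        for phi in readings:
            if prove(f"~(({context}) & ({phi}))"):    # context ∧ φ is inconsistent
                continue
            if prove(f"({context}) => ({phi})"):      # φ adds nothing new
                continue
            kept.append(phi)
        return kept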

30
Screenshot
31
Consistency checking
  • Inconsistent text
  • Mia likes Vincent.
  • She does not like him.
  • Two interpretations, only one consistent:
  • Mia likes Jody.
  • She does not like her.

32
Informativity checking
  • Uninformative text
  • Mia likes Vincent.
  • She likes him.
  • Two interpretations, only one informative:
  • Mia likes Jody.
  • She likes her.

33
Local informativity
  • Example:
  • Mia is the wife of Marsellus.
  • If Mia is the wife of Marsellus, Vincent will be disappointed.
  • The second sentence is informative with respect to the first. But...

34-35
Local informativity
(DRS illustrations omitted)
36
Local consistency
  • Example:
  • Jules likes big kahuna burgers.
  • If Jules does not like big kahuna burgers, Vincent will order a whopper.
  • The second sentence is consistent with respect to the first. But...

37-38
Local consistency
(DRS illustrations omitted)
39
Studying Presupposition
  • The DORIS system allowed one to study the
    behaviour of presupposition
  • Examples such as:
  • If Mia has a husband, then her husband is out of
    town.
  • If Mia is married, then her husband is out of
    town.
  • If Mia is dating Vincent, then her husband is out
    of town.

40
Applying Theorem Proving
  • The first version of DORIS sort of worked, but:
  • Many readings to start with (combinatorial explosion)
  • The local constraints added a large number of inference tasks
  • It could take about 10 minutes for a complex sentence

41
MathWeb
  • MathWeb (by Kohlhase & Franke) came to the rescue
  • Theorem proving services via the internet
  • Interfacing DORIS with MathWeb
  • At the time this was a sensation!
  • What happened exactly?

42
World Wide Computational Semantics
  • Just to find out whether Mia was married or not, thousands of computers worldwide were used

43
What can we say about DORIS?
  • Demonstrated that FOL could play an interesting
    role in natural language understanding
  • Helped to develop presupposition theory
  • Generated a new application area for automated deduction (TPTP)
  • Limitations:
  • Scalability
  • Coverage

44
Six feet under
  • The DORIS system (1993-2001)
  • Why?
  • Limited grammatical coverage
  • Unfocussed application domain
  • It would take at least 20 years to develop a parser that was both robust and accurate; at least, that was my belief

45
The Second Story
Godot, the talking robot. Or how a mobile, talking robot engaged in conversation using theorem provers and model builders for first-order logic
2001-2004
46
Human-Computer Dialogue
  • Focus on small domains
  • Grammatical coverage ensured
  • Background knowledge encoding
  • Spoken dialogue systems: the killer app
  • Godot the robot
  • Speech recognition and synthesis
  • People could give Godot directions, ask it
    questions, tell it new information
  • Godot was a REAL robot

47
Godot the Robot
Godot with Tetsushi Oka
48
Typical conversation with Godot
  • Simple dialogues in the beginning
  • Human: Robot?
  • Robot: Yes?
  • Human: Where are you?
  • Robot: I am in the hallway.
  • Human: OK. Go to the rest room!

49
Adding inference to Godot
  • Using the components of DORIS, we added semantics
    and inference
  • Using DRT
  • Inference tasks
  • Consistency checking
  • Informativity checking
  • Using FOL theorem provers and finite model
    builders

50
Advanced conversation with Godot
  • Dealing with inconsistency and informativity
  • Human: Robot, where are you?
  • Robot: I am in the hallway.
  • Human: You are in my office.
  • Robot: No, that is not true.
  • Human: You are in the hallway.
  • Robot: Yes, I know!

51
Parallel inference
  • Given a text T and a translation into FOL φ:
  • If the theorem prover finds a proof for ¬φ, then T is inconsistent
  • If the model builder finds a model for φ, then T is consistent
  • In other words, this is the Yin and Yang of inference (sketched below)
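(A minimal sketch of this parallel setup; the two stubs stand in for calls to an external prover and model builder, and in practice the losing engine would be killed.)

    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    def prove_negation(phi):    # stub: theorem prover attempts a proof of ¬φ
        ...

    def find_model(phi):        # stub: finite model builder searches for a model of φ
        ...

    def consistent(phi, timeout=30):
        with ThreadPoolExecutor(max_workers=2) as ex:
            proof = ex.submit(prove_negation, phi)
            model = ex.submit(find_model, phi)
            done, _ = wait([proof, model], timeout=timeout, return_when=FIRST_COMPLETED)
            if proof in done and proof.result():
                return False    # ¬φ proved: T is inconsistent
            if model in done and model.result():
                return True     # model found: T is consistent
            return None         # undecided within the timeout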

52-55
Why is this relevant to natural language?
  • Testing a discourse for consistency

56-59
Why is this relevant to natural language?
  • Testing a discourse for informativity

60
Minimal Models
  • Model builders normally generate models by
    iteration over the domain size
  • As a side-effect, the output is a model with a
    minimal domain size
  • From a linguistic point of view, this is
    interesting, as there is no redundant information
  • Minimal in extensions (iteration sketched below)
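(The iteration can be sketched as follows; find_model() is a stub for an external builder such as Mace or Paradox run with a fixed domain size.)

    def find_model(phi, domain_size):
        """Stub: ask a model builder for a model of phi with exactly this domain size."""
        raise NotImplementedError

    def minimal_model(phi, max_size=10):
        """Return a model of phi that is minimal in domain size, if one exists."""
        for n in range(1, max_size + 1):
            model = find_model(phi, domain_size=n)
            if model is not None:
                return model    # the first hit has the smallest possible domain
        return None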

61
Using models
  • Examples:
  • Turn on a light.
  • Turn on every light.
  • Turn on everything except the radio.
  • Turn off the red light or the blue light.
  • Turn on another light.

62
Videos of Godot
Video 1: Godot in the basement of Buccleuch Place
Video 2: screenshot of the dialogue manager with DRSs and a camera view of Godot
63
What can we say about Godot?
  • Demonstrated that FOL could play an interesting role in human-machine dialogue systems
  • Also showed a new application of finite model building
  • Known domain means all background knowledge is known
  • Limitations:
  • Scalability: only small dialogues
  • Lack of incremental inference
  • Minimal models required

64
Godot the Robot later
Godot at the Scottish museum
65
The Third Story
Recognising Textual Entailment. Or how first-order automated deduction is applied to wide-coverage semantic processing of texts
2005-present
66
Recognising Textual Entailment
  • What is it?
  • A task for NLP systems: recognise entailment between two (short) texts
  • Proved to be a difficult but popular task
  • Organisation:
  • Introduced in 2004/2005 as part of the PASCAL Network of Excellence (RTE-1)
  • A second challenge (RTE-2) was held in 2005/2006
  • PASCAL provided development and test sets of several hundred examples

67
RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-------------------------------------------------
The charges were denied by his family.
68
RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-------------------------------------------------
Lyon is the capital of France.
69
Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal.
Socrates is a man.
-------------------------------
Socrates is mortal.
70
Five methods
  • Five different approaches to RTE
  • Ranging in sophistication from very basic to advanced

71
Recognising Textual Entailment
  • Method 1
  • Flipping a coin

72
Flipping a coin
  • Advantages
  • Easy to implement
  • Cheap
  • Disadvantages
  • Just 50% accuracy

73
Recognising Textual Entailment
  • Method 2
  • Calling a friend

74
Calling a friend
  • Advantages
  • High accuracy (95%)
  • Disadvantages
  • Lose friends
  • High phone bill

75
Recognising Textual Entailment
  • Method 3
  • Ask the audience

76
Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
--------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
77
Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
--------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
78
Recognising Textual Entailment
  • Method 4
  • Word Overlap

79
Word Overlap Approaches
  • Popular approach
  • Ranging in sophistication from a simple bag of words to the use of WordNet
  • Accuracy rates: ca. 55% (sketched below)
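(A bag-of-words baseline of the kind meant here; the threshold and tokenisation are illustrative choices, not those of any particular RTE system.)

    import string

    def overlap_entails(text, hypothesis, threshold=0.7):
        strip = str.maketrans("", "", string.punctuation)
        t = set(text.lower().translate(strip).split())
        h = set(hypothesis.lower().translate(strip).split())
        coverage = len(t & h) / len(h)   # fraction of hypothesis words found in the text
        return coverage >= threshold

    print(overlap_entails("His family has steadfastly denied the charges.",
                          "The charges were denied by his family."))
    # True: 5 of the 7 hypothesis words also occur in the text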

80
Word Overlap
  • Advantages
  • Relatively straightforward algorithm
  • Disadvantages
  • Hardly better than flipping a coin

81
RTE State-of-the-Art
  • PASCAL RTE challenge
  • Hard problem
  • Requires semantics

82
Recognising Textual Entailment
  • Method 5
  • Semantic Interpretation

83
Basic idea
  • Given a textual entailment pair T/H with text T and hypothesis H:
  • Produce DRSs for T and H
  • Translate these DRSs into FOL
  • Generate background knowledge in FOL
  • Use ATPs to determine the likelihood of entailment

84
Wait a minute
  • This requires that we have the means to produce semantic representations (DRSs) for any kind of English input
  • Recall the DORIS experience
  • Do we have English parsers at our disposal that do this?

85
Robust Parsing
  • Rapid developments in statistical parsing over the last decades
  • These parsers are trained on large annotated corpora (treebanks)
  • Yet most of these parsers produce syntactic analyses not suitable for systematic semantic work
  • This changed with the development of CCGbank and a fast CCG parser

86
Implementation CCG/DRT
  • Use standard statistical techniques
  • Robust wide-coverage parser
  • Clark & Curran (ACL 2004)
  • Grammar derived from CCGbank
  • 409 different categories
  • Hockenmaier & Steedman (ACL 2002)
  • Compositional Semantics, DRT
  • Wide-coverage semantics
  • Bos (IWCS 2005)

87
Example Output
  • Example: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
  • Semantic representation: DRT
  • Complete Wall Street Journal

88
Using Theorem Proving
  • Given a textual entailment pair T/H with text T and hypothesis H:
  • Produce DRSs for T and H
  • Translate these DRSs into FOL
  • Give this to the theorem prover:
  • T → H
  • If the theorem prover finds a proof, then we predict that T entails H (sketched below)
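(A sketch of wrapping the prover; it assumes a Vampire binary on the PATH, formulas already written as TPTP strings, and that a successful run reports "Refutation found".)

    import subprocess
    import tempfile

    def entails(t_fol, h_fol, background=()):
        """Predict entailment iff the prover proves (BK ∧ T) → H."""
        lines = [f"fof(bk{i}, axiom, {a})." for i, a in enumerate(background)]
        lines.append(f"fof(text, axiom, {t_fol}).")
        lines.append(f"fof(hyp, conjecture, {h_fol}).")
        with tempfile.NamedTemporaryFile("w", suffix=".p", delete=False) as f:
            f.write("\n".join(lines))
            problem = f.name
        result = subprocess.run(["vampire", problem], capture_output=True, text=True)
        return "Refutation found" in result.stdout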

89
Vampire (Riazanov & Voronkov 2002)
  • Let's try this. We will use the theorem prover Vampire
  • This gives us good results for:
  • apposition
  • relative clauses
  • coordination
  • intersective adjectives/complements
  • passive/active alternations

90
Example (Vampire proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
-----------------------------------------------------
A bomb exploded outside a mosque.
91
Example (Vampire proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
-----------------------------------------------------
The introduction of the euro has been opposed.
92
Background Knowledge
  • However, it doesn't give us good results for cases requiring additional knowledge:
  • Lexical knowledge
  • World knowledge
  • We will use WordNet as a start to get additional knowledge
  • All of WordNet is too much, so we create MiniWordNets

93
MiniWordNets
  • MiniWordNets
  • Use hyponym relations from WordNet to build an
    ontology
  • Do this only for the relevant symbols
  • Convert the ontology into first-order axioms (sketched below)
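(A sketch of extracting such axioms with NLTK's WordNet interface; it assumes nltk and its WordNet data are installed and keeps only the first lemma of each synset.)

    from nltk.corpus import wordnet as wn

    def hypernym_axioms(word):
        """One axiom per hypernym step, ∀x(word(x) → hypernym(x)), as a TPTP-style string."""
        axioms = []
        for synset in wn.synsets(word, pos=wn.NOUN):
            for hyper in synset.hypernyms():
                w, h = synset.lemma_names()[0], hyper.lemma_names()[0]
                axioms.append(f"! [X] : ({w}(X) => {h}(X))")
        return axioms

    print(hypernym_axioms("researcher"))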

94-95
MiniWordNet: an example
  • Example text:
  • There is no asbestos in our products now.
    Neither Lorillard nor the researchers who studied
    the workers were aware of any research on smokers
    of the Kent cigarettes.

96
(Slide shows the MiniWordNet ontology built for the example; no transcript)
97
∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))
98
∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
...
99
Using Background Knowledge
  • Given a textual entailment pair T/H with text T and hypothesis H:
  • Produce DRSs for T and H
  • Translate drs(T) and drs(H) into FOL
  • Create background knowledge for T and H
  • Give this to the theorem prover:
  • (BK ∧ T) → H

100
MiniWordNets at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
-----------------------------------------------------
Crude oil prices rise.
  • Background knowledge: ∀x(soar(x) → rise(x))

101
Troubles with theorem proving
  • Theorem provers are extremely precise.
  • They won't tell you when there is "almost" a proof.
  • Even if there is a little background knowledge missing, Vampire will say:
  • don't know

102
Vampire no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
-----------------------------------------------------
Four firefighters were killed in a car accident.
103
Using Model Building
  • Need a more robust form of inference
  • Use model builders: Mace, Paradox
  • McCune
  • Claessen & Sorensson (2003)
  • Use the size of the (minimal) model
  • Compare the size of the models of T and T ∧ H
  • If the difference is small, then it is likely that T entails H

104
Using Model Building
  • Given a textual entailment pair T/H with text T and hypothesis H:
  • Produce DRSs for T and H
  • Translate these DRSs into FOL
  • Generate background knowledge
  • Give this to the model builder:
  • i) BK ∧ T
  • ii) BK ∧ T ∧ H
  • If the models for i) and ii) are similar, then we predict that T entails H

105
Model similarity
  • When are two models similar?
  • Small difference in domain size
  • Small difference in predicate extensions (sketched below)
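(A sketch of the comparison; build_model() is a stub for a Paradox/Mace call, and comparing only domain sizes is deliberately crude.)

    def build_model(formulas):
        """Stub: return the domain (a set of entities) of a minimal model of the formulas."""
        raise NotImplementedError

    def predicts_entailment(bk, t, h, max_diff=0):
        m1 = build_model(bk + [t])        # minimal model of BK ∧ T
        m2 = build_model(bk + [t, h])     # minimal model of BK ∧ T ∧ H
        return (len(m2) - len(m1)) <= max_diff   # similar sizes: predict entailment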

106
Example 1
  • T: John met Mary in Rome
  • H: John met Mary
  • Model of T: 3 entities; model of T ∧ H: 3 entities
  • Model size difference: 0
  • Prediction: entailment

107
Example 2
  • T: John met Mary
  • H: John met Mary in Rome
  • Model of T: 2 entities; model of T ∧ H: 3 entities
  • Model size difference: 1
  • Prediction: no entailment

108
Model size differences
  • Of course this is a very rough approximation
  • But it turns out to be a useful one
  • Gives us a notion of robustness
  • Negation:
  • Give ¬T and ¬(T ∧ H) to the model builder
  • Disjunction:
  • Not necessarily one unique minimal model

109
How well does this work?
  • We tried this at RTE-1 and RTE-2
  • Using standard machine learning methods to build a decision tree over the following features (sketched below):
  • Proof found (yes/no)
  • Domain size difference
  • Model size difference
  • Better than the baseline, but still room for improvement
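(A sketch of the learning step with scikit-learn; the feature rows and labels below are invented placeholders, not RTE data.)

    from sklearn.tree import DecisionTreeClassifier

    # One row per T/H pair: [proof found (0/1), domain-size difference, model-size difference]
    X = [[1, 0, 0], [0, 1, 3], [0, 0, 1], [0, 4, 9]]
    y = [1, 0, 1, 0]   # gold entailment labels

    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(clf.predict([[0, 1, 2]]))   # predicted label for an unseen pair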

110
RTE Results 2004/5
Bos & Markert 2005
111
RTE State-of-the-Art
  • PASCAL RTE challenge
  • Hard problem
  • Requires semantics

112
What can we say about RTE?
  • We can use FOL inference techniques successfully
  • There might be an interesting role for model
    building
  • The bottleneck is getting the right background
    knowledge

113
Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
-----------------------------------------------------
There is a territorial waters dispute.
114
Winding up
  • Summary
  • Conclusion
  • Shameless Plug
  • Future

115
Summary
  • The use of first-order inference tools has had a major influence on how computational semantics is perceived today
  • Implementations used in pioneering work on first-order inference in NLP
  • Implementations used in spoken dialogue systems
  • Now also used in wide-coverage NLP systems

116
Conclusions
  • We have got the tools for doing computational
    semantics in a principled way using DRT
  • For many applications, success depends on the
    ability to systematically generate background
    knowledge
  • Small restricted domains (dialogue)
  • Open domain
  • Finite model building has potential
  • Incremental inference

117
Shameless Plug
  • For more on the basic architecture underlying this work on computational semantics, and in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see:
  • www.blackburnbos.org