Three Stories on Automated Reasoning for Natural Language Understanding

About This Presentation

Slides: 118

Provided by: joha167

Transcript and Presenter's Notes

1
Three Stories on Automated Reasoning for
Natural Language Understanding

Johan Bos
University of Rome "La Sapienza"
Dipartimento di Informatica

2
Background

My work is in between
natural language processing
computational linguistics
formal and computational semantics
Aim of my work
implement linguistic theories
use automated reasoning in modeling natural
language understanding

3
Applications

What kind of applications?
Human-machine dialogue systems
Question answering systems
Textual entailment systems
Use of logical inference
Off-the-shelf systems: FOL theorem provers and
finite model builders
Empirically successful?

4
Surprise

Perhaps surprisingly, automated reasoning tools
rarely make it into NLP applications
Why?
Requires interdisciplinary background
Gap between formal semantic theory and practical
implementation
It is just not trendy: statistical approaches
dominate the field

5
Three Stories

World Wide Computational Semantics
The world's first serious implementation of
Discourse Representation Theory, with the help
of the web and theorem proving
Godot, the talking robot
The first robot that computes semantic
representations and performs inferences using
theorem proving and model building
Recognising Textual Entailment
Automated deduction applied in wide-coverage natural
language processing

6
The First Story
World Wide Computational Semantics: the first
serious implementation of Discourse
Representation Theory, with the help of the
internet
1994-2001
7
How it started

Implementing tools for the semantic analysis of
English
Follow linguistic theory as closely as possible
Discourse Representation Theory (DRT)
First-order logic
Presupposition projection
Computational Semantics

8
Computational Semantics

How can we automate the process of associating
semantic representations with expressions of
natural language?
How can we use logical representations of natural
language expressions to automate the process of
drawing inferences?

9
Basic idea

Text: Vincent loves Mia.

10
Basic idea

Text: Vincent loves Mia.
DRT:

11
Basic idea

Text: Vincent loves Mia.
DRT:
FOL: $\exists x \exists y (vincent(x) \land mia(y) \land love(x,y))$

12
Basic idea

Text: Vincent loves Mia.
DRT:
FOL: $\exists x \exists y (vincent(x) \land mia(y) \land love(x,y))$
BK: $\forall x (vincent(x) \rightarrow man(x))$, $\forall x (mia(x) \rightarrow woman(x))$,
$\forall x (man(x) \rightarrow \neg woman(x))$

13
Basic idea

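Text: Vincent loves Mia.
DRT:
FOL: $\exists x \exists y (vincent(x) \land mia(y) \land love(x,y))$
BK: $\forall x (vincent(x) \rightarrow man(x))$, $\forall x (mia(x) \rightarrow woman(x))$,
$\forall x (man(x) \rightarrow \neg woman(x))$
Model: $D = \{d_1, d_2\}$, $F(vincent) = \{d_1\}$,
$F(mia) = \{d_2\}$,
$F(love) = \{(d_1, d_2)\}$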
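As an illustration only, the model on this slide can be checked against the FOL formula and the background knowledge by brute-force evaluation over its two-element domain. This Python sketch is not part of the original system. The extensions for man and woman are an assumption added here so that the BK axioms have something to check.

```python
# Illustrative model checking over the finite model from the slide.
DOMAIN = ["d1", "d2"]
EXT = {
    "vincent": {"d1"},
    "mia": {"d2"},
    "love": {("d1", "d2")},
    "man": {"d1"},    # assumption: not shown on the slide
    "woman": {"d2"},  # assumption: not shown on the slide
}

def holds(pred, *args):
    """True if the argument (or argument tuple) is in the predicate's extension."""
    key = args[0] if len(args) == 1 else args
    return key in EXT[pred]

# FOL: exists x exists y (vincent(x) & mia(y) & love(x, y))
text = any(holds("vincent", x) and holds("mia", y) and holds("love", x, y)
           for x in DOMAIN for y in DOMAIN)

# BK: all x (vincent(x) -> man(x)), all x (mia(x) -> woman(x)),
#     all x (man(x) -> not woman(x)); an implication a -> b is (not a) or b
bk = all((not holds("vincent", x) or holds("man", x))
         and (not holds("mia", x) or holds("woman", x))
         and (not holds("man", x) or not holds("woman", x))
         for x in DOMAIN)

print(text, bk)  # True True
```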

14
?

Text: Vincent loves Mia.
DRT:

15
Compositional Semantics

The problem: given a natural language expression,
how do we convert it into a logical formula?
Frege's principle: the meaning of a compound
expression is a function of the meanings of its
parts.

16
Lexical semantics
17
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λz. ...          λy. ...

18
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λz. ...          λy. ...
------------------------------------------ (FA)
NP: a spokesman
(λp.λq. p(x) q(x))(λz. ...)

19
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λz. ...          λy. ...
------------------------------------------ (FA)
NP: a spokesman
λq. ... q(x)

20
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λz. ...          λy. ...
------------------------------------------ (FA)
NP: a spokesman
λq. ... q(x)

21
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λx. ...          λy. ...
------------------------------------------ (FA)
NP: a spokesman
λq. ... q(x)
-------------------------------------------------------------- (BA)
S: a spokesman lied
(λq. ... q(x))(λy. ...)

22
A derivation

NP/N: a            N: spokesman     S\NP: lied
λp.λq. p(x) q(x)   λx. ...          λy. ...
------------------------------------------ (FA)
NP: a spokesman
λq. ... q(x)
-------------------------------------------------------------- (BA)
S: a spokesman lied
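The derivation can be mimicked with ordinary function application, one Python function per lexical entry. This sketch simplifies in several ways: it builds first-order formulas as strings instead of DRSs, the lexical entries are hypothetical FOL stand-ins for the DRT entries on the slides, and the fixed variable name x is not protected against variable capture.

```python
# Hypothetical FOL stand-ins for the DRT lexical entries on the slides.
def a(p):
    """Determiner 'a' (NP/N): takes a noun meaning p, then a VP meaning q."""
    return lambda q: f"exists x.({p('x')} & {q('x')})"

def spokesman(z):
    """Noun 'spokesman' (N)."""
    return f"spokesman({z})"

def lied(y):
    """Intransitive verb 'lied' (S\\NP)."""
    return f"lie({y})"

def forward_application(functor, argument):
    """(FA): a functor X/Y combines with a Y on its right to give an X."""
    return functor(argument)

def backward_application(np_meaning, vp_meaning):
    """(BA): an NP combines with an S\\NP on its right; as on the slides,
    the NP meaning is applied to the VP meaning."""
    return np_meaning(vp_meaning)

np_sem = forward_application(a, spokesman)   # NP: a spokesman
s_sem = backward_application(np_sem, lied)   # S: a spokesman lied
print(s_sem)  # exists x.(spokesman(x) & lie(x))
```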

24
The DORIS System

Reasonable grammar coverage
Parsed English sentences, followed by resolving
ambiguities
Pronouns
Presupposition
Generated many different semantic representations
for a text

25
Texts and Ambiguity

Usually, ambiguities cause many possible
interpretations
Example: Butch walks into his modest kitchen.
He opens the refrigerator. He takes out a milk
and drinks it.

29
Basic idea of DORIS

Given a text, produce as many different DRSs
(semantic interpretations) as possible
Filter out strange interpretations
Inconsistent interpretations
Uninformative interpretations
Applying theorem proving
Use general purpose FOL theorem prover
Bliksem (Hans de Nivelle)

30
Screenshot
31
Consistency checking

Inconsistent text
Mia likes Vincent.
She does not like him.
Two interpretations, only one consistent
Mia likes Jody.
She does not like her.
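Consistency checking of the two readings can be illustrated on a ground (propositional) abstraction, where trying every truth assignment is a complete decision procedure. DORIS itself used a first-order theorem prover. The atom names and the formula encoding below are assumptions made for this sketch, and reflexive readings are left out.

```python
from itertools import product

# Hypothetical ground atoms: like(subject,object)
ATOMS = ["like(mia,jody)", "like(jody,mia)"]

def consistent(formula):
    """Satisfiable iff some truth assignment to ATOMS makes formula true."""
    return any(formula(dict(zip(ATOMS, values)))
               for values in product([True, False], repeat=len(ATOMS)))

# "Mia likes Jody. She does not like her." under two pronoun resolutions
readings = {
    "she=Mia, her=Jody": lambda v: v["like(mia,jody)"] and not v["like(mia,jody)"],
    "she=Jody, her=Mia": lambda v: v["like(mia,jody)"] and not v["like(jody,mia)"],
}

for name, formula in readings.items():
    print(name, consistent(formula))
# she=Mia, her=Jody False
# she=Jody, her=Mia True
```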

32
Informativity checking

Uninformative text
Mia likes Vincent.
She likes him.
Two interpretations, only one informative
Mia likes Jody.
She likes her.
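Informativity can be tested in the same style: a new sentence is uninformative if the preceding text already entails it, that is, if no assignment makes the text true and the sentence false. This is again a propositional sketch with hypothetical atom names, not the DORIS implementation.

```python
from itertools import product

ATOMS = ["like(mia,jody)", "like(jody,mia)"]  # hypothetical ground atoms

def entails(context, sentence):
    """context |= sentence iff no assignment makes context true and sentence false."""
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if context(v) and not sentence(v):
            return False
    return True

def informative(context, sentence):
    return not entails(context, sentence)

def context(v):  # "Mia likes Jody."
    return v["like(mia,jody)"]

readings = {  # "She likes her." under two pronoun resolutions
    "she=Mia, her=Jody": lambda v: v["like(mia,jody)"],
    "she=Jody, her=Mia": lambda v: v["like(jody,mia)"],
}
for name, sentence in readings.items():
    print(name, informative(context, sentence))
# she=Mia, her=Jody False
# she=Jody, her=Mia True
```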

33
Local informativity

Example
Mia is the wife of Marsellus.
If Mia is the wife of Marsellus, Vincent will be
disappointed.
The second sentence is informative with respect
to the first. But

34
Local informativity
35
Local informativity
?
36
Local consistency

Example
Jules likes big kahuna burgers.
If Jules does not like big kahuna burgers,
Vincent will order a whopper.
The second sentence is consistent with respect to
the first. But

37
Local consistency
38
Local consistency
?
39
Studying Presupposition

The DORIS system allowed one to study the
behaviour of presupposition
Examples such as
If Mia has a husband, then her husband is out of
town.
If Mia is married, then her husband is out of
town.
If Mia is dating Vincent, then her husband is out
of town.

40
Applying Theorem Proving

The first version of DORIS sort of worked, but
Many readings to start with: a combinatorial explosion
The local constraints added a large number of
inference tasks
It could take about 10 minutes for a complex
sentence

41
MathWeb

MathWeb (Kohlhase & Franke) came to the rescue
Theorem proving services via the internet
Interfaced DORIS with MathWeb
At the time this was a sensation!
What happened exactly?

42
World Wide Computational Semantics

Just in order to find out whether Mia was married
or not, thousands of computers world wide were
used

43
What can we say about DORIS?

Demonstrated that FOL could play an interesting
role in natural language understanding
Helped to develop presupposition theory
Generated a new application area for automated
deduction (TPTP)
Limitations
Scalability
Coverage

44
Six feet under

The DORIS system 1993-2001
Why?
Limited grammatical coverage
Unfocussed application domain
It would take at least 20 years to develop a
parser that was both robust and accurate (at
least, that was my belief)

45
The Second Story
Godot, the talking robot. Or how a mobile,
talking robot engaged in conversation using
theorem provers and model builders for
first-order logic
2001-2004
46
Human-Computer Dialogue

Focus on small domains
Grammatical coverage ensured
Background knowledge encoding
Spoken dialogue systems: the killer app
Godot the robot
Speech recognition and synthesis
People could give Godot directions, ask it
questions, tell it new information
Godot was a REAL robot

47
Godot the Robot
Godot with Tetsushi Oka
48
Typical conversation with Godot

Simple dialogues in beginning
Human: Robot?
Robot: Yes?
Human: Where are you?
Robot: I am in the hallway.
Human: OK. Go to the rest room!

49
Adding inference to Godot

Using the components of DORIS, we added semantics
and inference
Using DRT
Inference tasks
Consistency checking
Informativity checking
Using FOL theorem provers and finite model
builders

50
Advanced conversation with Godot

Dealing with inconsistency and informativeness
Human: Robot, where are you?
Robot: I am in the hallway.
Human: You are in my office.
Robot: No, that is not true.
Human: You are in the hallway.
Robot: Yes I know!

51
Parallel inference

Given a text T and a translation into FOL $\varphi$
If a theorem prover finds a proof for $\neg\varphi$, then T
is inconsistent
If a model builder finds a model for $\varphi$, then T is
consistent
In other words, this is the Yin and Yang of
inference
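The control logic of running both tools side by side can be sketched as a race in which the first tool to return a definite answer decides. In this Python sketch the prover and the model builder are passed in as plain callables; the toy truth-table versions are hypothetical stand-ins for a real FOL theorem prover and finite model builder. In practice these would run as external processes that can be killed on timeout, whereas a Python thread keeps running in the background after `shutdown(wait=False)`.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed
from itertools import product

def race(prover, model_builder, phi, timeout=10.0):
    """Return 'inconsistent', 'consistent' or 'unknown' for formula phi.

    prover(phi) should return True when it finds a proof of not-phi;
    model_builder(phi) should return True when it finds a model of phi.
    Either returns False when it gives up.
    """
    pool = ThreadPoolExecutor(max_workers=2)
    futures = {pool.submit(prover, phi): "inconsistent",
               pool.submit(model_builder, phi): "consistent"}
    try:
        for future in as_completed(futures, timeout=timeout):
            if future.result():          # first definite answer wins
                return futures[future]
    except TimeoutError:
        pass                             # neither tool answered in time
    finally:
        pool.shutdown(wait=False)        # do not block on the slower tool
    return "unknown"

# Toy truth-table "tools" over one hypothetical ground atom.
ATOMS = ["like(mia,vincent)"]

def assignments():
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def toy_prover(phi):
    return all(not phi(v) for v in assignments())  # not-phi holds everywhere

def toy_model_builder(phi):
    return any(phi(v) for v in assignments())      # phi has a model

def inconsistent_text(v):  # "Mia likes Vincent. She does not like him."
    return v["like(mia,vincent)"] and not v["like(mia,vincent)"]

def consistent_text(v):    # "Mia likes Vincent."
    return v["like(mia,vincent)"]

print(race(toy_prover, toy_model_builder, inconsistent_text))  # inconsistent
print(race(toy_prover, toy_model_builder, consistent_text))    # consistent
```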

52
Why is this relevant to natural language?

Testing a discourse for consistency

56
Why is this relevant to natural language?

Testing a discourse for informativity

60
Minimal Models

Model builders normally generate models by
iteration over the domain size
As a side-effect, the output is a model with a
minimal domain size
From a linguistic point of view, this is
interesting, as there is no redundant information
Minimal in extensions
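A brute-force version of this iteration is sketched below for the Vincent/Mia example and its background knowledge. It tries domain sizes 1, 2, 3, and so on. Among the models of the smallest size, it keeps one with the fewest true ground atoms, which is one possible reading of "minimal in extensions". This is only an illustration; real model builders such as Mace or Paradox are far more efficient than exhaustive enumeration.

```python
from itertools import product

UNARY = ["vincent", "mia", "man", "woman"]

def subsets(items):
    """All subsets of items, as Python sets."""
    items = list(items)
    return [{x for i, x in enumerate(items) if mask >> i & 1}
            for mask in range(2 ** len(items))]

def satisfies(dom, ext):
    """FOL of 'Vincent loves Mia' plus the background knowledge."""
    text = any(x in ext["vincent"] and y in ext["mia"] and (x, y) in ext["love"]
               for x in dom for y in dom)
    bk = all((x not in ext["vincent"] or x in ext["man"])
             and (x not in ext["mia"] or x in ext["woman"])
             and not (x in ext["man"] and x in ext["woman"])
             for x in dom)
    return text and bk

def minimal_model(max_size=3):
    """Iterate over domain sizes; at the first size with a model, return
    a model with the fewest true ground atoms."""
    for n in range(1, max_size + 1):
        dom = [f"d{i}" for i in range(1, n + 1)]
        unary_choices = subsets(dom)
        love_choices = subsets(product(dom, dom))
        best = None
        for choice in product(unary_choices, repeat=len(UNARY)):
            for love in love_choices:
                ext = dict(zip(UNARY, choice), love=love)
                if satisfies(dom, ext):
                    size = sum(len(e) for e in ext.values())
                    if best is None or size < best[0]:
                        best = (size, ext)
        if best is not None:
            return dom, best[1]
    return None

dom, model = minimal_model()
print(len(dom), sum(len(e) for e in model.values()))  # 2 5
```

Domain size 1 fails because Vincent and Mia would have to be the same individual, who would then be both a man and a woman.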

61
Using models

Examples:
Turn on a light.
Turn on every light.
Turn on everything except the radio.
Turn off the red light or the blue light.
Turn on another light.

62
Videos of Godot
Video 1: Godot in the basement of Buccleuch Place
Video 2: Screenshot of dialogue manager with
DRSs and camera view of Godot
63
What can we say about Godot?

Demonstrated that FOL could play an interesting
role in human machine dialogue systems
Also showed a new application of finite model
building
Known domain means all background knowledge is known
Limitations
Scalability, only small dialogues
Lack of incremental inference
Minimal models required

64
Godot the Robot later
Godot at the Scottish museum
65
The Third Story
Recognising Textual Entailment. Or how
first-order automated deduction is applied to
wide-coverage semantic processing of texts
2005-present
66
Recognising Textual Entailment

What is it?
A task for NLP systems to recognise entailment
between two (short) texts
Proved to be a difficult but popular task.
Organisation
Introduced in 2004/2005 as part of the PASCAL
Network of Excellence, RTE-1
A second challenge (RTE-2) was held in 2005/2006
PASCAL provided a development and test set of
several hundred examples

67
RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-----------------------------------------------------
The charges were denied by his family.
68
RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-----------------------------------------------------
Lyon is the capital of France.
69
Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
-----------------------------------------------------
Socrates is mortal.
70
Five methods

Five different methods to RTE
Ranging in sophistication from very basic to
advanced

71
Recognising Textual Entailment

Method 1
Flipping a coin

72
Flipping a coin

Advantages
Easy to implement
Cheap
Disadvantages
Just 50% accuracy

73
Recognising Textual Entailment

Method 2
Calling a friend

74
Calling a friend

Advantages
High accuracy (95%)
Disadvantages
Lose friends
High phone bill

75
Recognising Textual Entailment

Method 3
Ask the audience

76
Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta
were established at the mouth of the Ciliwung,
perhaps as early as the 5th century AD.
-----------------------------------------------------
The first settlements on the site of Jakarta were
established as early as the 5th century AD.
77
Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta
were established at the mouth of the Ciliwung,
perhaps as early as the 5th century AD.
-----------------------------------------------------
The first settlements on the site of Jakarta were
established as early as the 5th century AD.
78
Recognising Textual Entailment

Method 4
Word Overlap

79
Word Overlap Approaches

Popular approach
Ranging in sophistication from simple bag of word
to use of WordNet
Accuracy rates ca. 55%
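A simple bag-of-words overlap baseline can be sketched in a few lines. The tokenizer and the 0.75 threshold are hypothetical choices, not taken from any RTE system. Run on the two RTE examples shown earlier, the sketch gets both wrong, which illustrates why such methods are hardly better than flipping a coin.

```python
import re

def words(text):
    """Lower-cased word types (a deliberately naive tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(t, h):
    """Fraction of the hypothesis word types that also occur in the text."""
    h_words = words(h)
    return len(h_words & words(t)) / len(h_words) if h_words else 0.0

def predict(t, h, threshold=0.75):  # hypothetical threshold
    return overlap(t, h) >= threshold

pairs = [  # (id, text, hypothesis, gold label) from the RTE examples
    ("RTE 2030", "Lyon is actually the gastronomical capital of France.",
     "Lyon is the capital of France.", False),
    ("RTE 1977", "His family has steadfastly denied the charges.",
     "The charges were denied by his family.", True),
]
for pid, t, h, gold in pairs:
    print(pid, round(overlap(t, h), 2), predict(t, h), gold)
# RTE 2030 1.0 True False
# RTE 1977 0.71 False True
```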

80
Word Overlap

Advantages
Relatively straightforward algorithm
Disadvantages
Hardly better than flipping a coin

81
RTE State-of-the-Art

Pascal RTE challenge
Hard problem
Requires semantics

82
Recognising Textual Entailment

Method 5
Semantic Interpretation

83
Basic idea

Given a textual entailment pair T/H with text T
and hypothesis H
Produce DRSs for T and H
Translate these DRSs into FOL
Generate Background Knowledge in FOL
Use ATPs to determine the likelihood of
entailment

84
Wait a minute

This requires that we have the means to produce
semantic representations (DRSs) for any kind of
English input
Recall the DORIS experience
Do we have English parsers at our disposal that
do this?

85
Robust Parsing

Rapid developments in statistical parsing over the
last decades
These parsers are trained on large annotated
corpora (treebanks)
Yet most of these parsers produced syntactic
analyses not suitable for systematic semantic
work
This changed with the development of CCGbank and
a fast CCG parser

86
Implementation CCG/DRT

Use standard statistical techniques
Robust wide-coverage parser
Clark & Curran (ACL 2004)
Grammar derived from CCGbank
409 different categories
Hockenmaier & Steedman (ACL 2002)
Compositional Semantics, DRT
Wide-coverage semantics
Bos (IWCS 2005)

87
Example Output

Example: Pierre Vinken, 61 years old, will join
the board as a nonexecutive director Nov. 29. Mr.
Vinken is chairman of Elsevier N.V., the Dutch
publishing group.
Semantic representation, DRT
Complete Wall Street Journal

88
Using Theorem Proving

Given a textual entailment pair T/H with text T
and hypothesis H
Produce DRSs for T and H
Translate these DRSs into FOL
Give this to the theorem prover
$T \rightarrow H$
If the theorem prover finds a proof, then we
predict that T entails H

89
Vampire (Riazanov & Voronkov 2002)

Let's try this. We will use the theorem prover
Vampire
This gives us good results for
apposition
relative clauses
coordination
intersective adjectives/complements
passive/active alternations

90
Example (Vampire proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a
Shiite mosque in Iskandariyah, 30 miles south of
the capital.
-----------------------------------------------------
A bomb exploded outside a mosque.
91
Example (Vampire proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the
introduction of the euro but was compelled to
accept it in light of the political pressure of
the capitalist politicians who supported its
introduction.
-----------------------------------------------------
The introduction of the euro has been opposed.
92
Background Knowledge

However, it doesn't give us good results for
cases requiring additional knowledge
Lexical knowledge
World knowledge
We will use WordNet as a start to get additional
knowledge
All of WordNet is too much, so we create
MiniWordNets

93
MiniWordNets

MiniWordNets
Use hyponym relations from WordNet to build an
ontology
Do this only for the relevant symbols
Convert the ontology into first-order axioms
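The MiniWordNet construction can be sketched as follows: starting from the symbols in the text, follow hypernym links upward, emit one hyponymy axiom per link, and, as an assumption of this sketch, treat distinct top-level concepts as disjoint, which yields axioms of the shape shown on slide 98. The hypernym table is a hypothetical stand-in for WordNet; a real system would query WordNet itself. Axioms are written in TPTP first-order syntax.

```python
def mini_wordnet(symbols, hypernym):
    """TPTP axioms for the part of a hypernym hierarchy reachable from symbols."""
    nodes = set()
    for s in symbols:
        while s is not None and s not in nodes:  # climb towards the top concept
            nodes.add(s)
            s = hypernym.get(s)
    # one hyponymy axiom per hypernym link: all X (child(X) -> parent(X))
    axioms = [f"![X]: ({c}(X) => {hypernym[c]}(X))"
              for c in sorted(nodes) if c in hypernym]
    # assumption of this sketch: distinct top-level concepts are disjoint
    roots = sorted(c for c in nodes if c not in hypernym)
    axioms += [f"![X]: ({a}(X) => ~{b}(X))"
               for i, a in enumerate(roots) for b in roots[i + 1:]]
    return axioms

# Hypothetical hypernym table standing in for WordNet (child -> parent).
HYPERNYM = {"user": "person", "worker": "person", "researcher": "person"}

axioms = mini_wordnet(["worker", "researcher", "user", "risk", "cigarette"], HYPERNYM)
for axiom in axioms:
    print(axiom)
# ![X]: (researcher(X) => person(X))
# ![X]: (user(X) => person(X))
# ![X]: (worker(X) => person(X))
# ![X]: (cigarette(X) => ~person(X))
# ![X]: (cigarette(X) => ~risk(X))
# ![X]: (person(X) => ~risk(X))
```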

94
MiniWordNet: an example

Example text
There is no asbestos in our products now.
Neither Lorillard nor the researchers who studied
the workers were aware of any research on smokers
of the Kent cigarettes.

96
(No Transcript)
97
$\forall x(user(x) \rightarrow person(x))$
$\forall x(worker(x) \rightarrow person(x))$
$\forall x(researcher(x) \rightarrow person(x))$
98
$\forall x(person(x) \rightarrow \neg risk(x))$
$\forall x(person(x) \rightarrow \neg cigarette(x))$
99
Using Background Knowledge

Given a textual entailment pair T/H with text T
and hypothesis H
Produce DRSs for T and H
Translate drs(T) and drs(H) into FOL
Create Background Knowledge for T and H
Give this to the theorem prover
$(BK \land T) \rightarrow H$
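One concrete way to hand $(BK \land T) \rightarrow H$ to a theorem prover is to write a problem in the TPTP first-order (FOF) format read by provers such as Vampire, with the background knowledge and the text as axioms and the hypothesis as the conjecture. This sketch assumes the formulas are already strings in TPTP syntax. The formulas for RTE 1952 are hypothetical, simplified representations, not the output of the DRT pipeline described here.

```python
def tptp_problem(text, hypothesis, background=()):
    """Encode (BK & T) -> H as a TPTP FOF problem.

    Background axioms and the text become axioms; the hypothesis becomes the
    conjecture. All formulas must already be strings in TPTP FOF syntax.
    """
    lines = [f"fof(bk{i}, axiom, {f})." for i, f in enumerate(background, 1)]
    lines.append(f"fof(text, axiom, {text}).")
    lines.append(f"fof(hypothesis, conjecture, {hypothesis}).")
    return "\n".join(lines)

# Hypothetical, simplified representations for RTE 1952 (not pipeline output).
problem = tptp_problem(
    text="?[X,E]: (price(X) & soar(E) & agent(E,X))",
    hypothesis="?[X,E]: (price(X) & rise(E) & agent(E,X))",
    background=["![X]: (soar(X) => rise(X))"],
)
print(problem)
```

The printed problem can be saved to a file and passed to a TPTP-aware prover. Provers that report SZS statuses answer Theorem when the conjecture follows from the axioms, which here would mean predicting entailment.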

100
MiniWordNets at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
-----------------------------------------------------
Crude oil prices rise.

Background Knowledge: $\forall x(soar(x) \rightarrow rise(x))$

101
Troubles with theorem proving

Theorem provers are extremely precise.
They won't tell you when there is almost a
proof.
Even if there is a little background knowledge
missing, Vampire will say
"don't know"

102
Vampire no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling
to a training course in Texas were killed when
their sport utility vehicle drifted onto the
shoulder of a Highway and struck a parked
truck.
-----------------------------------------------------
Four firefighters were killed in a car accident.
103
Using Model Building

Need a robust way of inference
Use model builders Mace and Paradox
McCune
Claessen & Sorensson (2003)
Use size of (minimal) model
Compare size of model of $T$ and of $T \land H$
If the difference is small, then it is likely
that T entails H

104
Using Model Building

Given a textual entailment pair T/H with text T
and hypothesis H
Produce DRSs for T and H
Translate these DRSs into FOL
Generate Background Knowledge
Give this to the Model Builder
i) $BK \land T$
ii) $BK \land T \land H$
If the models for i) and ii) are similar, then
we predict that T entails H

105
Model similarity

When are two models similar?
Small difference in domain size
Small difference in predicate extensions
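The comparison can be sketched as two features: the difference in domain size and the difference in model size, where model size is taken here as the number of true ground atoms (an assumption). The models below are hypothetical encodings of the two "John met Mary" examples, and the zero threshold on the domain difference is likewise an illustrative choice.

```python
def model_size(model):
    """Model size, taken here as the number of true ground atoms."""
    return sum(len(ext) for ext in model["extensions"].values())

def similarity_features(model_t, model_th):
    """Differences between the model of T and the model of T & H."""
    return {
        "domain_diff": len(model_th["domain"]) - len(model_t["domain"]),
        "model_diff": model_size(model_th) - model_size(model_t),
    }

def predict_entailment(model_t, model_th, max_domain_diff=0):  # hypothetical threshold
    return similarity_features(model_t, model_th)["domain_diff"] <= max_domain_diff

# Hypothetical minimal models (d1 = John, d2 = Mary, d3 = Rome).
rome_model = {"domain": {"d1", "d2", "d3"},
              "extensions": {"john": {"d1"}, "mary": {"d2"}, "rome": {"d3"},
                             "meet": {("d1", "d2")}, "in": {("d1", "d3")}}}
plain_model = {"domain": {"d1", "d2"},
               "extensions": {"john": {"d1"}, "mary": {"d2"},
                              "meet": {("d1", "d2")}}}

examples = {
    "Example 1": (rome_model, rome_model),   # T: met in Rome; T & H: same model
    "Example 2": (plain_model, rome_model),  # T: met; T & H adds Rome
}
for name, (m_t, m_th) in examples.items():
    print(name, similarity_features(m_t, m_th), predict_entailment(m_t, m_th))
# Example 1 {'domain_diff': 0, 'model_diff': 0} True
# Example 2 {'domain_diff': 1, 'model_diff': 2} False
```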

106
Example 1

T: John met Mary in Rome
H: John met Mary
Model T: 3 entities
Model $T \land H$: 3 entities
Model size difference: 0
Prediction: entailment

107
Example 2

T: John met Mary
H: John met Mary in Rome
Model T: 2 entities
Model $T \land H$: 3 entities
Model size difference: 1
Prediction: no entailment

108
Model size differences

Of course this is a very rough approximation
But it turns out to be a useful one
Gives us a notion of robustness
Negation
Give $\neg T$ and $\neg(T \land H)$ to the model builder
Disjunction
Not necessarily one unique minimal model

109
How well does this work?

We tried this at the RTE-1 and RTE-2
Using standard machine learning methods to build
a decision tree using features
Proof (yes/no)
Domain size difference
Model size difference
Better than baseline, still room for improvement

110
RTE Results 2004/5
Bos & Markert 2005
111
RTE State-of-the-Art

Pascal RTE challenge
Hard problem
Requires semantics

112
What can we say about RTE?

We can use FOL inference techniques successfully
There might be an interesting role for model
building
The bottleneck is getting the right background
knowledge

113
Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its
borders, as does Malaysia, which has also sent
warships to the area, claiming that its waters
and airspace have been violated.
-----------------------------------------------------
There is a territorial waters dispute.
114
Winding up

Summary
Conclusion
Shameless Plug
Future

115
Summary

Use of first order inference tools has a major
influence on how computational semantics is
perceived today
Implementations used in pioneering work of using
first-order inference in NLP
Implementations used in spoken dialogue systems
Now also used in wide-coverage NLP systems

116
Conclusions

We have got the tools for doing computational
semantics in a principled way using DRT
For many applications, success depends on the
ability to systematically generate background
knowledge
Small restricted domains dialogue
Open domain
Finite model building has potential
Incremental inference

117
Shameless Plug

For more on the basic architecture underlying
this work on computational semantics, and in
particular on implementations of the lambda
calculus and the parallel use of theorem provers and
model builders, see
www.blackburnbos.org
