1
  • Connectionist Sentence Comprehension
  • and Production System
  • A model by Dr. Douglas Rohde, MIT
  • Presented by Dave Cooke
  • Nov. 6, 2004

2
Overview
  • Introduction
  • A brief overview of Artificial Neural Networks
  • The basic architecture
  • Introduce Douglas Rohde's CSCP model
  • Overview
  • Penglish Language
  • Architecture
  • Semantic System
  • Comprehension, Prediction, and Production System
  • Training
  • Testing
  • Conclusions
  • Bibliography

3
A Brief Overview
  • Basic definition of an Artificial Neural Network
  • A network of interconnected neurons inspired by
    the biological nervous system.
  • The function of an Artificial Neural Network is
    to produce an output pattern from a given input.
  • First described by Warren McCulloch and Walter
    Pitts in 1943 in their seminal paper A logical
    calculus of the ideas immanent in nervous activity.

4
  • Artificial neurons are modeled after biological
    neurons
  • The architecture of an Artificial Neuron (a
    minimal code sketch follows)
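
As an illustration (not from the original slides), a single artificial neuron can be sketched in a few lines of Python: a weighted sum of the inputs plus a bias, passed through a squashing activation. The weights and inputs below are arbitrary example values.

```python
import numpy as np

def neuron(x, w, b):
    """A single artificial neuron: a weighted sum of the inputs plus a
    bias, squashed through a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.5, -1.0, 2.0])   # input pattern (arbitrary example values)
w = np.array([0.8, 0.2, -0.5])   # connection weights
b = 0.1                          # bias term
print(neuron(x, w, b))           # an activation in (0, 1)
```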

5
Architecture -- Structure
  • Network Structure
  • Many types of neural network structures
  • Ex: feedforward, recurrent
  • Feedforward
  • Can be single-layered or multi-layered
  • Inputs are propagated forward to the output
    layer, as in the sketch below
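
A minimal sketch of forward propagation through a multi-layered feedforward network, assuming sigmoid activations; the layer sizes and random weights are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    """Propagate an input pattern forward through each (W, b) layer in turn."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# A 4-3-2 network: one hidden layer and one output layer.
layers = [(rng.standard_normal((3, 4)), np.zeros(3)),
          (rng.standard_normal((2, 3)), np.zeros(2))]
print(feedforward(rng.standard_normal(4), layers))
```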

6
Architecture -- Recurrent NN
  • Recurrent Neural Networks
  • Operate on an input space and an internal state
    space; they have memory.
  • Primary types of Recurrent neural networks
  • simple recurrent
  • fully recurrent
  • Below is an example of a simple recurrent network
    (SRN)
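
A minimal Python sketch of one SRN time step, assuming tanh hidden units and a linear output; all sizes and weights are illustrative, not taken from the slides.

```python
import numpy as np

def srn_step(x, context, W_in, W_ctx, W_out):
    """One time step of an Elman-style SRN: the hidden layer sees the
    current input plus a copy of its own previous state (the context)."""
    h = np.tanh(W_in @ x + W_ctx @ context)    # new hidden state
    y = W_out @ h                              # output for this step
    return y, h                                # h becomes the next context

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 5, 8, 3
W_in = rng.standard_normal((n_hid, n_in)) * 0.1
W_ctx = rng.standard_normal((n_hid, n_hid)) * 0.1
W_out = rng.standard_normal((n_out, n_hid)) * 0.1

context = np.zeros(n_hid)                      # the memory starts empty
for x in rng.standard_normal((4, n_in)):       # a 4-step input sequence
    y, context = srn_step(x, context, W_in, W_ctx, W_out)
```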

7
Architecture -- Learning
  • Learning used in NNs
  • Learning = a change in connection weights
  • Supervised networks: the network is told the
    correct answer
  • e.g. backpropagation, backpropagation through
    time, reinforcement learning
  • Unsupervised networks: the network must find
    structure in the input on its own
  • e.g. competitive learning, self-organizing
    (Kohonen) maps

8
Architecture -- Learning (BPTT)
  • Backpropagation Through Time (BPTT) is used in
    the CSCP Model and SRNs
  • In BPTT the network runs ALL of its forward
    passes, then performs ALL of the backward passes.
  • Equivalent to unrolling the network through time
    and propagating the error backwards through the
    unrolled copies (sketched below)
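
A minimal sketch of BPTT for a bias-free SRN with tanh hidden units, a linear output layer, and squared error. It illustrates the "all forward passes, then all backward passes" scheme described above; it is not the CSCP model's actual training code.

```python
import numpy as np

def bptt(xs, ts, W_in, W_ctx, W_out):
    """Minimal BPTT: run ALL the forward passes first, storing every
    hidden state, then run ALL the backward passes through the
    unrolled network, accumulating weight gradients."""
    T = len(xs)
    hs = [np.zeros(W_ctx.shape[0])]            # hs[0] is the initial context
    ys = []
    for t in range(T):                         # forward through time
        hs.append(np.tanh(W_in @ xs[t] + W_ctx @ hs[-1]))
        ys.append(W_out @ hs[-1])

    gW_in, gW_ctx, gW_out = (np.zeros_like(W_in), np.zeros_like(W_ctx),
                             np.zeros_like(W_out))
    dh_next = np.zeros(W_ctx.shape[0])         # gradient arriving from the future
    for t in reversed(range(T)):               # backward through time
        dy = ys[t] - ts[t]                     # dE/dy for squared error
        gW_out += np.outer(dy, hs[t + 1])
        dh = W_out.T @ dy + dh_next            # from the output AND later steps
        dz = dh * (1.0 - hs[t + 1] ** 2)       # through the tanh derivative
        gW_in += np.outer(dz, xs[t])
        gW_ctx += np.outer(dz, hs[t])
        dh_next = W_ctx.T @ dz                 # pass the gradient one step back
    return gW_in, gW_ctx, gW_out

rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 4))               # a 5-step input sequence
ts = rng.standard_normal((5, 2))               # matching targets
W_in, W_ctx, W_out = (rng.standard_normal((6, 4)) * 0.1,
                      rng.standard_normal((6, 6)) * 0.1,
                      rng.standard_normal((2, 6)) * 0.1)
grads = bptt(xs, ts, W_in, W_ctx, W_out)       # apply as W -= lr * g
```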

9
The CSCP Model
  • Connectionist Sentence Comprehension and
    Production model
  • Primary goal: learn to comprehend and produce
    sentences in the Penglish (Pseudo-English)
    language.
  • Secondary goal: to construct a model that will
    account for a wide range of human sentence
    processing behaviours.

10
Basic Architecture
  • A Simple Recurrent NN is used
  • Penglish (Pseudo English) was used to train and
    test the model.
  • Consists of 2 separate parts connected by a
    message layer (sketched below)
  • Semantic System (Encoding/Decoding System)
  • CPP system
  • Backpropagation Through Time (BPTT) is the
    learning algorithm.
  • A method for learning temporal tasks
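
A structural sketch only, echoing the slide's description of two subnetworks joined by a message layer. The layer names, sizes, and random weights below are invented for illustration; this is not Rohde's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
dense = lambda n_out, n_in: rng.standard_normal((n_out, n_in)) * 0.1

# Illustrative sizes only; the real CSCP layers differ.
N_PROP, N_MSG, N_WORD, N_HID = 30, 40, 20, 50

W_enc = dense(N_MSG, N_PROP)    # Semantic System: propositions -> message
W_dec = dense(N_PROP, N_MSG)    # Semantic System: message -> propositions
W_hid = dense(N_HID, N_WORD)    # CPP side: word input -> hidden state
W_comp = dense(N_MSG, N_HID)    # CPP side: hidden state -> message

def encode(propositions):
    """Semantic half: compress a proposition pattern into the message layer."""
    return np.tanh(W_enc @ propositions)

def decode(message):
    """Semantic half: recover a proposition pattern from the message."""
    return np.tanh(W_dec @ message)

def comprehend(word):
    """CPP half: map a word input to a message-layer pattern."""
    return np.tanh(W_comp @ np.tanh(W_hid @ word))

print(decode(encode(rng.standard_normal(N_PROP))).shape)  # encode/decode path
print(decode(comprehend(rng.standard_normal(N_WORD))).shape)  # comprehension path
```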

11
Penglish
  • Goal: to produce only sentences that are
    reasonably valid in English
  • Built around the framework of a stochastic
    context-free grammar.
  • Given an SCFG it is easy to generate sentences,
    parse sentences, and perform optimal prediction
    (a toy generator is sketched below)
  • A subset of English; some grammatical structures
    used are:
  • 56 verb stems
  • 45 noun stems
  • adjectives, determiners, adverbs, subordinate
    clauses
  • several types of local ambiguity.
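
The Penglish grammar itself is not given in the slides; the toy stochastic CFG below uses made-up rules and probabilities, and only illustrates how an SCFG generates sentences by sampling one production per nonterminal.

```python
import random

# A toy stochastic CFG (made-up rules, NOT the actual Penglish grammar).
# Each nonterminal maps to a list of (probability, expansion) pairs.
GRAMMAR = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(0.6, ["Det", "N"]), (0.4, ["Det", "Adj", "N"])],
    "VP":  [(0.7, ["V", "NP"]), (0.3, ["V"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "Adj": [(0.5, ["nice"]), (0.5, ["new"])],
    "N":   [(0.4, ["teacher"]), (0.3, ["dog"]), (0.3, ["book"])],
    "V":   [(0.5, ["saw"]), (0.5, ["took"])],
}

def generate(symbol="S"):
    """Expand a symbol by sampling one production according to its probability."""
    if symbol not in GRAMMAR:            # terminal: emit the word itself
        return [symbol]
    r, acc = random.random(), 0.0
    for p, expansion in GRAMMAR[symbol]:
        acc += p
        if r <= acc:
            return [w for s in expansion for w in generate(s)]
    return [w for s in GRAMMAR[symbol][-1][1] for w in generate(s)]

print(" ".join(generate()))   # e.g. "the dog saw a new book"
```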

12
Penglish
  • Penglish sentences do not always sound entirely
    natural, even though constraints to avoid
    semantic violations were implemented
  • Example sentences:
  • (1) We had played a trumpet for you.
  • (2) A answer involves a nice school.
  • (3) The new teacher gave me a new book of
    baseball.
  • (4) Houses have had something the mother has
    forgotten.

13
The CSCP Model
[Diagram: overview of the CSCP model. The Semantic
System, which stores all propositions seen for the
current sentence, is connected to the CPP System.]
14
Semantic System
[Diagram: propositions are loaded sequentially into
the Semantic System and stored in its memory.]
15
Semantic System
[Diagram: where the error measure is taken in the
Semantic System.]
16
Training (SS)
  • Backpropagation
  • Trained separately from, and prior to, the rest
    of the model.
  • The decoder uses standard single-step
    backpropagation
  • The encoder is trained using BPTT.
  • The majority of the running time is in the
    decoding stage.

17
Training (SS)
[Diagram: the point in the Semantic System at which
error is assessed during training.]
18
CPP System
[Diagram: the CPP System, showing its error measure
and the phonologically encoded word input.]
19
CPP System (cont.)
[Diagram: the CPP System starts by trying to predict
the next word in the sentence; the goal is to produce
the next word and pass it to the Word Input Layer.]
20
The CPP System - Training
[Diagram: training the CPP System. (1) BPTT starts
here; (2) error is backpropagated to here; (3)
previously recorded output errors are injected here;
(4) BPTT continues backwards through time.]
21
Training
  • 16 Penglish training sets
  • Each set: 250,000 sentences; 4 million sentences
    in total
  • 50,000 weight updates per set = 1 epoch
  • Total of 16 epochs.
  • The learning rate started at 0.2 for the first
    epoch and was gradually reduced over the course
    of learning.
  • After the Semantic System, the CPP system was
    similarly trained
  • Training began with sentences of limited
    complexity, and complexity increased gradually.
  • Training a single network took about 2 days on a
    500 MHz Alpha. Total training time was about
    two months.
  • Overall, 3 networks were trained

22
Testing
  • 50,000 sentences
  • 33.8% of the test sentences also appeared in one
    of the training sets.
  • Nearly all of the sentences had 1 or 2
    propositions.
  • 3 forms of measurement are used in measuring
    comprehension:
  • multiple-choice measure
  • reading-time measure
  • grammaticality-rating measure

23
Testing (Multiple Choice)
  • Example: When the owner let go, the dog ran
    after the mailman.
  • Expressed as the query (ran after, theme, ?)
  • Possible answers:
  • mailman (correct answer)
  • owner, dog, girls, cats (distractors)
  • Error measure: with four distractors, chance
    performance is 20% correct (a scoring sketch
    follows below).
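
A sketch of the multiple-choice scoring idea: the candidate filler with the lowest error wins, and with one correct answer among five options, chance performance is 20%. The error values below are invented stand-ins for querying the trained network.

```python
def multiple_choice(query_error, candidates):
    """Pick the candidate filler whose decoded output error is lowest.
    `query_error` maps a candidate word to the model's error for it;
    here it is a stand-in for querying the trained Semantic System."""
    return min(candidates, key=query_error)

# Illustrative stand-in errors (a real run would query the network).
errors = {"mailman": 0.08, "owner": 0.31, "dog": 0.27,
          "girls": 0.55, "cats": 0.49}
answer = multiple_choice(errors.get, list(errors))
print(answer)                  # "mailman"
print(f"chance = {1/5:.0%}")   # 20% with one correct answer in five
```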

24
Testing (Reading Time)
  • Also known as Simulated Reading Time
  • It is a weighted average of 4 components:
  • (1) and (2): measure the degree to which the
    current word was expected
  • (3): the change in the message that occurred
    when the current word was read
  • (4): the average level of activation in the
    message layer
  • The four components are multiplied by scaling
    factors to achieve average values close to 1.0
    for each of them, and a weighted average is then
    taken (sketched below).
  • Ranges from 0.4 for easy words to 2.5 or more
    for very hard words.
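
A sketch of the weighted-average computation; the scaling factors and weights below are placeholders, not Rohde's fitted values.

```python
import numpy as np

def simulated_reading_time(components, scales, weights):
    """Weighted average of the four per-word components, after each is
    scaled so that its average value is close to 1.0 (scales and
    weights here are placeholders)."""
    c = np.asarray(components) * np.asarray(scales)
    w = np.asarray(weights)
    return float(np.dot(w, c) / w.sum())

# Two expectation-based components, message change, message activation.
print(simulated_reading_time(components=[0.9, 1.2, 1.5, 0.8],
                             scales=[1.0, 1.0, 1.0, 1.0],
                             weights=[0.3, 0.3, 0.2, 0.2]))
```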

25
Testing (Grammaticality)
  • The Grammaticality Method
  • (1) prediction error (PE)
  • An indicator of syntactic complexity
  • Involves the point in the sentence at which the
    worst two consecutive predictions occur.
  • (2) comprehension error (CE)
  • The average strict-criterion comprehension error
    rate on the sentence.
  • Intended to reflect the degree to which the
    sentence makes sense.
  • Simulated ungrammaticality rating (SUR)
  • SUR = (PE + 8) × (CE + 0.5)
  • Combines the two components into a single measure
    of ungrammaticality (computed in the sketch below)
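
A one-line implementation of the SUR formula as reconstructed above, assuming the operators stripped from the slide were plus signs.

```python
def simulated_ungrammaticality(pe, ce):
    """SUR = (PE + 8) * (CE + 0.5): combines prediction error and
    comprehension error into a single ungrammaticality score
    (assuming the slide's stripped operators were plus signs)."""
    return (pe + 8.0) * (ce + 0.5)

print(simulated_ungrammaticality(pe=0.4, ce=0.1))  # higher = less grammatical
```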

26
Conclusions
  • General comprehension results
  • The final networks are able to provide complete,
    accurate answers
  • Given NO choices: 77% correct
  • Given 5 choices: 92% correct
  • Sentential complement ambiguity
  • Strict-criterion error rate: 13.5%
  • Multiple choice: 2%
  • Subordinate clause ambiguity
  • Ex. Although the teacher saw a book was taken in
    the school.
  • The intransitive, weak bad, weak good, strong
    bad, and strong good conditions all had under
    20% error rates on multiple-choice questions.

27
Bibliography
  • Artificial Intelligence, 4th ed., Luger G.F.,
    Addison Wesley, 2002
  • Artificial Intelligence: A Modern Approach,
    2nd ed., Russell S. & Norvig P., Prentice Hall,
    2003
  • Neural Networks, 2nd ed., Picton P., Palgrave,
    2000
  • A Connectionist Model of Sentence Comprehension
    and Production, Rohde D., MIT, March 2, 2002
  • Finding Structure in Time, Elman J.L., UC San
    Diego, Cognitive Science, 14, 179-211, 1990
  • Fundamentals of Neural Networks, Fausett L.,
    Pearson, 1994