Title: Self-organizing word representations for fast sentence processing
1. Self-organizing word representations for fast sentence processing
Stefan Frank, Nijmegen Institute for Cognition and Information, Nijmegen, The Netherlands
2-3. Relations among words: syntagmatic and paradigmatic
- Horizontal: syntagmatic relations (between words that occur together in a sequence).
- Vertical: paradigmatic relations (between words that can substitute for one another).
4-6. Syntagmatic and paradigmatic relations
7. Representing words as vectors
- In several models, such as LSA (Landauer & Dumais, 1997) and HAL (Burgess, Livesay, & Lund, 1998), each word corresponds to a vector in a high-dimensional state space.
- Distances between vectors encode relations between the represented words (see the sketch below).
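To make the idea concrete, a minimal numpy sketch of distance-based word relations; the toy three-dimensional vectors and the Euclidean metric are illustrative assumptions, not values from LSA or HAL.

```python
import numpy as np

# Toy word vectors (real models use hundreds or thousands of dimensions).
words = {
    "dog": np.array([0.9, 0.1, 0.2]),
    "cat": np.array([0.8, 0.2, 0.1]),
    "ran": np.array([0.1, 0.9, 0.7]),
}

def distance(w1, w2):
    """Euclidean distance between word vectors: smaller means more closely related."""
    return np.linalg.norm(words[w1] - words[w2])

print(distance("dog", "cat"))   # small: related words lie near each other
print(distance("dog", "ran"))   # larger: unrelated words lie further apart
```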
8. Representing words as vectors
[Figure: projections of a small part of word space onto two dimensions (Burgess, Livesay, & Lund, 1998). The verbs listened, talked, crawled, ran, and walked cluster together, as do the animal nouns cat, wolf, and dog.]
9. Representing words as vectors
- Vectors are near each other in state space if the corresponding words belong to the same word class and/or have similar meaning.
- That is, words are organized paradigmatically.
- LSA and HAL account for some experimental findings, e.g., semantic priming and synonym judgement (and false memories?).
10. Paradigmatic versus syntagmatic organization
- Why would the mental lexicon be organized paradigmatically?
- It makes it easy to find similar words, because these are nearby in space.
- But we don't use words to do semantic priming or synonym judgement; we use words to make sentences.
- For fast speaking and understanding, we need fast access to words that are likely to occur next.
- This calls for a syntagmatic organization.
11. Linking the two types of organization
Hypothesis: a paradigmatic organization of words facilitates a syntagmatic organization of word sequences.
- Words are organized paradigmatically because this allows for fast sentence processing/production.
12. A computational modelling recipe
Ingredients: a simplified language, a measure of syntagmaticity, and a measure of paradigmaticity.
- Construct a dynamical system (or recurrent neural network).
- Feed the system sentences one word at a time. Its state at each moment represents the word sequence (the input) so far.
- Adjust the word representations to increase the syntagmatic organization of the system's states.
- Show that the resulting word representations are organized paradigmatically.
13. The language: lexicon (Farkaš & Crocker, 2006)
- Total: 72 words
- Nouns
  - Proper: John, Kate, Mary, Steve
  - Singular: boy, girl, cat, dog, ...
  - Plural: boys, girls, cats, dogs, ...
  - Mass: bread, meat, fish
- Verbs
  - Auxiliary: do(es), is, are, were
  - Transitive: eat(s), chase(s), like(s)
  - Intransitive: eat(s), bark(s), walk(s)
- Adjectives: crazy, good, happy, ...
- Function words: where, who, what, the, that, those, and the end-of-sentence marker "."
14. The language: example sentences (Farkaš & Crocker, 2006)
- Declaratives
- the good boy eats .
- smart Kate who eats meat feeds the dog .
- a girl is sleazy .
- those are women .
- Interrogatives
- where is the hungry cat .
- does Steve run .
- what does the man who the happy girls see do .
- does Mary wanna eat bread .
- Imperatives
- sing .
- walk the dog .
15. A dynamical connectionist model: network architecture
[Diagram: each input has connections to k < n of the network's units.]
16. A dynamical connectionist model: state-space trajectories
- A simple, discrete, linear dynamical system:
  x_{t+1} = W x_t + y_{t+1}
  where y_{t+1} is the input to the FRN at time step t+1, W holds the FRN connection weights, and x_t is the n-dimensional FRN state vector at time step t (x_0 = 1).
- The n × n matrix W has small random values.
- The sequence x_0, ..., x_t is the trajectory through state space resulting from the input sequence y_1, ..., y_t.
- Trajectories are syntagmatic if the distance between x_t and x_{t+1} reflects the unlikelihood of input y_{t+1} (see the sketch below).
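A minimal numpy sketch of this system, to make the recipe concrete. The weight range (±0.1), the random seed, and the reading of "x_0 = 1" as a vector of ones are assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                 # state dimensionality
W = rng.uniform(-0.1, 0.1, (n, n))      # small random, fixed connection weights

def trajectory(inputs, W):
    """Return the states x_0, ..., x_T of the system x_{t+1} = W x_t + y_{t+1}."""
    x = np.ones(W.shape[0])             # x_0 = 1
    states = [x]
    for y in inputs:                    # the input sequence y_1, ..., y_T
        x = W @ x + y
        states.append(x)
    return states
```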
17. A dynamical connectionist model: word representation
- Each word w is represented by a k-dimensional vector v_w (with k < n).
- If word w occurs at time step t, the input vector y_t equals v_w up to the k-th element; the rest is 0 (see the sketch below).
- The objective is to obtain a syntagmatic organization of the trajectories x_0, ..., x_t by adjusting the word vectors v.
- Matrix W is not adjusted.
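A sketch of this input scheme, continuing in the same hypothetical numpy style; the five-word lexicon stands in for the full 72 words.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 50                          # state and word-vector dimensionality, k < n

# One k-dimensional vector per word, initially random in [-1, 1] (see next slide).
lexicon = ["the", "girls", "are", "nice", "."]
v = {w: rng.uniform(-1, 1, k) for w in lexicon}

def input_vector(word):
    """Build y_t for a word: its vector v_w in the first k elements, zeros elsewhere."""
    y = np.zeros(n)
    y[:k] = v[word]
    return y
```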
18-20. A dynamical connectionist model: adapting word representations
- Initially, all vectors v are random between -1 and 1.
- 5000 training sentences, e.g., "The girls are nice ."
- A 2D example (n = 2, k = 1):
  - The system starts in state x_0 and receives y_1 = (v_the, 0)^T, giving x_1 = W x_0 + y_1.
  - v_the is adjusted (Δv_the) so that x_1 moves closer to x_0.
  - The next input is y_2 = (v_girls, 0)^T, giving x_2 = W x_1 + y_2, and v_girls is adjusted (Δv_girls) in the same way (see slide 31 for the exact update rule).
21. Measuring syntagmaticity
- Compare:
  - Grammatical test (i.e., non-training) sentences.
  - Pseudosentences: random word strings with the same length distribution, word frequencies, and number of word repetitions as the test sentences.
- If trajectories are organized syntagmatically, the total trajectory length for test sentences (L_test) should be shorter than for pseudosentences (L_pseudo).
- Syntagmaticity = L_pseudo / L_test (≈ 1 before training; see the sketch below).
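A sketch of this measure, assuming Euclidean distance in state space (the slides do not name the metric). Trajectories here are lists of state vectors, as produced by the trajectory sketch on slide 16.

```python
import numpy as np

def trajectory_length(states):
    """Total distance travelled through state space along x_0, ..., x_T."""
    return sum(np.linalg.norm(b - a) for a, b in zip(states, states[1:]))

def syntagmaticity(test_trajectories, pseudo_trajectories):
    """L_pseudo / L_test: about 1 before training, above 1 once grammatical
    test sentences yield shorter trajectories than matched pseudosentences."""
    L_test = sum(map(trajectory_length, test_trajectories))
    L_pseudo = sum(map(trajectory_length, pseudo_trajectories))
    return L_pseudo / L_test
```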
22. Measuring paradigmaticity
- If words are represented paradigmatically, similar words have similar vectors.
- Divide the words into groups according to word class and meaning:
  - Names: John, Kate, Mary, Steve
  - Mass nouns: bread, meat, fish
  - Articles: a, the
  - Singular auxiliary verbs: does, is, was
  - Plural auxiliary verbs: do, are, were
  - etcetera...
23. Measuring paradigmaticity
- If words are represented paradigmatically, similar words have similar vectors.
- Divide the words into groups according to word class and meaning.
- If words are organized paradigmatically, the average distance within groups (d_within) should be smaller than the average distance between groups (d_between).
- Paradigmaticity = d_between / d_within (≈ 1 before training; see the sketch below).
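A sketch of the paradigmaticity measure under the same Euclidean-distance assumption; `groups` maps a group label (e.g., "names") to the word vectors in that group.

```python
import numpy as np
from itertools import combinations

def paradigmaticity(groups):
    """d_between / d_within: about 1 before training, above 1 once words in
    the same group have more similar vectors than words in different groups."""
    within, between = [], []
    for g in groups:
        within += [np.linalg.norm(a - b) for a, b in combinations(groups[g], 2)]
    for g, h in combinations(groups, 2):
        between += [np.linalg.norm(a - b) for a in groups[g] for b in groups[h]]
    return np.mean(between) / np.mean(within)
```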
24. Results (n = 100, k = 50)
25. Word representations plotted by the first two principal components
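A sketch of how such a plot can be produced, assuming the trained word vectors are stacked as the rows of a matrix; PCA via numpy's SVD and matplotlib for plotting are my choices, not the slides'.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_first_two_pcs(vectors, labels):
    """Project word vectors onto their first two principal components and plot them."""
    X = vectors - vectors.mean(axis=0)          # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:2].T                       # coordinates on PC 1 and PC 2
    plt.scatter(coords[:, 0], coords[:, 1])
    for (px, py), word in zip(coords, labels):
        plt.annotate(word, (px, py))
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()
```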
27. Conclusion
- Word representations were adapted to decrease the lengths of the state-space trajectories resulting from training sentences.
- As a result, words that occur after similar contexts received similar representations.
- A paradigmatic organization of the mental lexicon can facilitate fast sentence processing.
28. Bonus slides: syntagmaticity and reading-time predictions
- In a really good syntagmatic organization, the distance travelled through state space at a time step should correspond to the improbability of that time step's input word.
- Word probabilities are known, because the sentences were produced by a known probabilistic grammar.
- The correlation between the amount of information (= -log_2 Pr) conveyed by the words of test sentences and the resulting state-space distances can serve as an alternative measure of syntagmaticity (see the sketch below).
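A sketch of this alternative measure; the function name and argument layout are assumptions. Word probabilities come from the known probabilistic grammar.

```python
import numpy as np

def information_correlation(step_distances, word_probs):
    """Correlate per-word state-space distances |x_{t+1} - x_t| with the
    information -log2(Pr) conveyed by the corresponding test-sentence words."""
    info = -np.log2(np.asarray(word_probs))
    return np.corrcoef(np.asarray(step_distances), info)[0, 1]
```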
29. Bonus slides: syntagmaticity and reading-time predictions
- Presumably, it takes more time to travel a larger distance through state space.
- More informative words take longer to process.
- Hale (2003): word-reading times reflect word information.
- If Hale is right, the model could make reading-time predictions when trained on a corpus of natural text.
30. Bonus slides: syntagmaticity and reading-time predictions
31. Bonus slides: adjusting word representations
- The current state vector is x_t and the next input is y_{t+1}.
- New state: x_{t+1} = W x_t + y_{t+1}.
- If there is nothing else to learn, the state vector can stay the same: x_{t+1} = x_t, which requires y_{t+1} = x_t - W x_t.
- But there are other things to learn, so take small steps: Δy_{t+1} = η (x_t - W x_t - y_{t+1}), with learning rate η = .001.
- If the input word was w, its representation v_w is adjusted to produce Δy_{t+1} (for the first k elements):
  Δv_w = η (x_t - W x_t - y_{t+1}),
  using only the first k elements of x_t and the first k rows of W (see the sketch below).
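A minimal numpy sketch of this update step under the slides' definitions; the function name and signature are mine.

```python
import numpy as np

def update_word_vector(v_w, x_t, y, W, k, eta=0.001):
    """One adjustment of word w's k-dimensional vector (x_t and y are
    n-dimensional; W, the fixed n-by-n weight matrix, is never adjusted).

    Delta v_w = eta * (x_t - W x_t - y), keeping only the first k elements,
    i.e., using only the first k elements of x_t and the first k rows of W.
    """
    residual = x_t - W @ x_t - y        # zero exactly when the state can stay put
    return v_w + eta * residual[:k]
```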