1
Self-organizing word representations for fast sentence processing
Stefan Frank, Nijmegen Institute for Cognition and Information, Nijmegen, The Netherlands
2-3
Relations among words: syntagmatic and paradigmatic
horizontal: syntagmatic relations
vertical: paradigmatic relations
4-6
Syntagmatic and paradigmatic relations
(figure-only slides; no transcript)
7
Representing words as vectors
  • In several models, like LSA (Landauer & Dumais, 1997) and HAL (Burgess, Livesay, & Lund, 1998), each word corresponds to a vector in a high-dimensional state space.
  • Distances between vectors encode relations
    between the represented words.

8
Representing words as vectors
Projections of a small part of word space onto two dimensions (Burgess, Livesay, & Lund, 1998)
(Figure: the verbs listened, talked, crawled, ran, and walked cluster together, separate from the animal nouns cat, wolf, and dog.)
9
Representing words as vectors
  • Vectors are near each other in state space if the
    corresponding words belong to the same word class
    and/or have similar meaning.
  • Words are organized paradigmatically.
  • LSA and HAL account for some experimental findings, e.g., semantic priming and synonym judgement (and false memories?).

10
Paradigmatic versus syntagmatic organization
  • Why would the mental lexicon be organized
    paradigmatically?
  • This makes it easy to find similar words, because
    these are nearby in space.
  • But we don't use words to do semantic priming or synonym judgement; we use words to make sentences.
  • For fast speaking and understanding, we need fast
    access to words that are likely to occur next.
  • This calls for a syntagmatic organization.

11
Linking the two types of organization
Hypothesis: A paradigmatic organization of words facilitates a syntagmatic organization of word sequences.
  • Words are organized paradigmatically because this allows for fast sentence processing/production.

12
A computational modelling recipe
Ingredients: a simplified language, a measure of syntagmaticity, and a measure of paradigmaticity.
  • Construct a dynamical system (or recurrent neural
    network)
  • Feed the system with sentences one word at a
    time. Its state at each moment represents the
    word sequence (input) so far.
  • Adjust word representations to increase the
    syntagmatic organization of the system states.
  • Show that the resulting word representations are
    organized paradigmatically.

13
The language: lexicon (Farkaš & Crocker, 2006)
  • Total: 72 words
  • Nouns
  • Proper: John, Kate, Mary, Steve
  • Singular: boy, girl, cat, dog, ...
  • Plural: boys, girls, cats, dogs, ...
  • Mass: bread, meat, fish
  • Verbs
  • Auxiliary: do(es), is, are, were
  • Transitive: eat(s), chase(s), like(s)
  • Intransitive: eat(s), bark(s), walk(s)
  • Adjectives: crazy, good, happy, ...
  • Function words: where, who, what, the, that, those, ...

14
The language: example sentences (Farkaš & Crocker, 2006)
  • Declaratives
  • the good boy eats .
  • smart Kate who eats meat feeds the dog .
  • a girl is sleazy .
  • those are women .
  • Interrogatives
  • where is the hungry cat .
  • does Steve run .
  • what does the man who the happy girls see do .
  • does Mary wanna eat bread .
  • Imperatives
  • sing .
  • walk the dog .

15
A dynamical connectionist model: Network architecture
(Figure: network diagram; input connects to k < n units.)
16
A dynamical connectionist model: State-space trajectories
  • A simple, discrete, linear dynamical system:
    x_{t+1} = W x_t + y_{t+1}
    where y_{t+1} is the input to the FRN at time step t+1, W holds the FRN connection weights, and x_t is the n-dimensional FRN state vector at time step t (x_0 = 1).
  • The n × n matrix W has small random values.
  • The sequence x_0, ..., x_t is the trajectory through state space resulting from the input sequence y_1, ..., y_t.
  • Trajectories are syntagmatic if the distance between x_t and x_{t+1} reflects the unlikelihood of input y_{t+1}.

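A minimal sketch of this dynamical system in Python/NumPy; the dimensionality n and the magnitude of the random weights are illustrative assumptions, not values from the slides:

import numpy as np

n = 100                                    # state dimensionality (assumed)
rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(n, n))    # small random connection weights

def step(x, y):
    # One step of the linear dynamical system: x_{t+1} = W x_t + y_{t+1}
    return W @ x + y

x = np.ones(n)                             # x_0 = 1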
17
A dynamical connectionist model: Word representation
  • Each word w is represented by a k-dimensional vector v_w (with k < n).
  • If word w occurs at time step t, the input vector y_t equals v_w up to the k-th element; the rest is 0.
  • The objective is to obtain a syntagmatic organisation of the trajectories x_0, ..., x_t by adjusting the word vectors v.
  • Matrix W is not adjusted.

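Continuing the sketch above, input vectors could be built as follows; the five-word lexicon is a stand-in for the full 72-word lexicon, and the vectors are initialized in the -1 to 1 range described on the next slide:

k = 50                                                 # word-vector dimensionality (k < n)
lexicon = ["the", "girls", "are", "nice", "."]         # stand-in for the full lexicon
v = {w: rng.uniform(-1, 1, size=k) for w in lexicon}   # k-dimensional word vectors

def input_vector(w):
    # y_t equals v_w up to the k-th element; the rest is 0
    y = np.zeros(n)
    y[:k] = v[w]
    return y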
18-20
A dynamical connectionist model: Adapting word representations (built up over three slides)
  • Initially, all vectors v are random between -1 and 1.
  • 5000 training sentences, e.g., "The girls are nice ."
  • A 2D example (n = 2, k = 1):
  • The system starts at x_0 and receives y_1 = (v_the, 0)^T, so x_1 = W x_0 + y_1.
  • v_the is adjusted (by Δv_the) so that x_1 moves closer to x_0.
  • The next input is y_2 = (v_girls, 0)^T, so x_2 = W x_1 + y_2; v_girls is adjusted (by Δv_girls) in the same way.
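Combining the pieces so far, one training pass might look like the following sketch; the exact update rule is spelled out on bonus slide 31, and `sentences` is a placeholder for the 5000 training sentences:

LAMBDA = 0.001                           # learning rate (bonus slide 31)

def train(sentences):
    for sentence in sentences:           # each sentence is a list of words
        x = np.ones(n)                   # reset to x_0 = 1
        for w in sentence:
            y = input_vector(w)
            x_new = W @ x + y            # next state
            # nudge v_w so the state moves less: Delta v_w = lambda (x - W x - y)[:k]
            v[w] += LAMBDA * (x[:k] - (W @ x)[:k] - y[:k])
            x = x_new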

21
Measuring syntagmaticity
  • Compare:
  • Grammatical test (i.e., non-training) sentences
  • Pseudosentences: random word strings with the same length distribution, word frequencies, and number of word repetitions as the test sentences.
  • If trajectories are organized syntagmatically, the total trajectory length for test sentences (L_test) should be shorter than for pseudosentences (L_pseudo).
  • Syntagmaticity = L_pseudo / L_test (≈ 1 before training).

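A sketch of this measure, reusing the names from the earlier examples; trajectory length is taken here as the summed Euclidean distances between successive states:

def trajectory_length(sentence):
    x = np.ones(n)
    total = 0.0
    for w in sentence:
        x_new = W @ x + input_vector(w)
        total += np.linalg.norm(x_new - x)    # distance travelled this step
        x = x_new
    return total

def syntagmaticity(test_sents, pseudo_sents):
    L_test = sum(map(trajectory_length, test_sents))
    L_pseudo = sum(map(trajectory_length, pseudo_sents))
    return L_pseudo / L_test                  # approx. 1 before training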
22-23
Measuring paradigmaticity
  • If words are represented paradigmatically, similar words have similar vectors.
  • Divide the words into groups according to word class and meaning:
  • Names: John, Kate, Mary, Steve
  • Mass nouns: bread, meat, fish
  • Articles: a, the
  • Singular auxiliary verbs: does, is, was
  • Plural auxiliary verbs: do, are, were
  • etcetera ...
  • If words are organized paradigmatically, the average distance within groups (d_within) should be smaller than between groups (d_between).
  • Paradigmaticity = d_between / d_within (≈ 1 before training).

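The paradigmaticity measure could be computed along these lines; a sketch in which `groups` is the word grouping listed above, and Euclidean distance between word vectors is an assumption:

from itertools import combinations

def paradigmaticity(groups):
    within, between = [], []
    for w1, w2 in combinations(v, 2):                    # all word pairs in the lexicon
        d = np.linalg.norm(v[w1] - v[w2])
        same_group = any(w1 in g and w2 in g for g in groups)
        (within if same_group else between).append(d)
    return np.mean(between) / np.mean(within)            # approx. 1 before training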
24
Results
n = 100, k = 50
(results figure; no transcript)
25
Word representations plotted by first two
principal components
26
(figure-only slide; no transcript)
27
Conclusion
  • Word representations were adapted to decrease the
    lengths of state-space trajectories resulting
    from training sentences.
  • As a result, words that occur after similar
    contexts received similar representations.
  • A paradigmatic organization of the mental lexicon
    can facilitate fast sentence processing.

28
Bonus slides: Syntagmaticity and reading-time predictions
  • In a really good syntagmatic organization, the distance travelled through state space at a time step should correspond to the improbability of the input word at that time step.
  • Word probabilities are known, because the sentences were produced by a known probabilistic grammar.
  • The correlation between the amount of information (= -log2(Pr)) conveyed by the words of test sentences and the resulting state-space distances can serve as an alternative measure of syntagmaticity.

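A sketch of this alternative measure; `prob` stands in for the known probabilistic grammar, which is not given on the slides:

def surprisal_correlation(test_sents, prob):
    infos, dists = [], []
    for sentence in test_sents:
        x = np.ones(n)
        for i, w in enumerate(sentence):
            x_new = W @ x + input_vector(w)
            infos.append(-np.log2(prob(w, sentence[:i])))   # word information
            dists.append(np.linalg.norm(x_new - x))         # distance travelled
            x = x_new
    return np.corrcoef(infos, dists)[0, 1]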
29
Bonus slides: Syntagmaticity and reading-time predictions
  • Presumably, it takes more time to travel a larger
    distance through state space.
  • More informative words take longer to process.
  • Hale (2003): word-reading times reflect word information.
  • If Hale is right, the model could make
    reading-time predictions when trained on a corpus
    of natural text.

30
Bonus slides: Syntagmaticity and reading-time predictions
(figure-only slide; no transcript)
31
Bonus slides: Adjusting word representations
  • The current state vector is x_t and the input is y_{t+1}.
  • New state: x_{t+1} = W x_t + y_{t+1}.
  • If there is nothing else to learn, the state vector can stay the same: x_{t+1} = x_t, so y_{t+1} = x_t - W x_t.
  • But there are other things to learn, so take small steps: Δy_{t+1} = λ (x_t - W x_t - y_{t+1}), with learning rate λ = 0.001.
  • If the input word was w, its representation v_w is adjusted to produce Δy_{t+1} (for the first k elements):
  • Δv_w = λ (x_t - W x_t - y_{t+1}),
  • using only the first k elements of x_t and the first k rows of W.
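The fixed-point step of this derivation can be checked numerically; a sketch reusing W, n, and rng from the earlier examples:

# if y = x - W x, the state indeed stays put: W x + y = x
x = rng.uniform(-1, 1, size=n)
y = x - W @ x
assert np.allclose(W @ x + y, x)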