Transcript and Presenter's Notes

Title: Sentence Processing using a Simple Recurrent Network


1
Sentence Processing using a Simple Recurrent
Network
  • EE 645 Final Project
  • Spring 2003
  • Dong-Wan Kang

5/14/2003
2
Contents
  1. Introduction - Motivations
  2. Previous Related Works: a) McClelland & Kawamoto (1986), b) Elman (1990, 1993, 1999), c) Miikkulainen (1996)
  3. Algorithms (Williams and Zipser, 1989) - Real
    Time Recurrent Learning
  4. Simulations
  5. Data Encoding schemes
  6. Results
  7. Discussion & Future Work

3
Motivations
  • Can the neural network recognize the lexical classes from the sentence and learn the various types of sentences?
  • From a cognitive science perspective:
    - comparison between human language learning and neural network learning patterns
    - (e.g.) learning the English past tense (Rumelhart & McClelland, 1986), grammaticality judgment (Allen & Seidenberg, 1999), embedded sentences (Elman, 1993; Miikkulainen, 1996, etc.)

4
Related Works
  • McClelland & Kawamoto (1986)
    - Sentences with case role assignments and semantic features, learned with the backpropagation algorithm
    - Output: 2,500 case role units for each sentence
    - (e.g.)
      input: the boy hit the wall with the ball.
      output: Agent, Verb, Patient, Instrument, other features
    - Limitation: poses a hard limit on the input size.
    - Alternative: instead of detecting input patterns displaced in space, detect patterns displaced in time (sequential inputs).

5
Related Works (continued)
  • Elman (1990, 1993, 1999)
    - Simple Recurrent Network: a partially recurrent network using context units
    - A network with a dynamic memory
    - Context units at time t hold a copy of the activations of the hidden units from the previous time step t-1.
    - The network can recognize sequences.
    - (e.g.) next-element prediction
      input:  Many years ago boy and girl ...
      output: years ago boy and girl ...
6
Related Works (continued)
  • Miikkulainen (1996)
    - SPEC architecture (Subsymbolic Parser for Embedded Clauses, a recurrent network)
    - Parser, Segmenter, and Stack
    - Processes center- and tail-embedded sentences: 98,100 sentences with 49 different sentence types, using case role assignments
    - (e.g.) Sequential inputs
      input:  the girl, who, liked, the dog, saw, the boy
      output: the girl, saw, the boy   -> case roles (agent, act, patient)
              the girl, liked, the dog -> case roles (agent, act, patient)
7
Algorithms
  • Recurrent Network
    - Unlike feedforward networks, recurrent networks allow connections both ways between a pair of units, and even from a unit to itself.
    - Backpropagation Through Time (BPTT) unfolds the temporal operation of the network into a layered feedforward network at every time step (Rumelhart et al., 1986).
    - Real Time Recurrent Learning (RTRL) comes in two versions (Williams and Zipser, 1989):
      1) update the weights after processing of the sequence is completed;
      2) on-line: update the weights while the sequence is being presented.
    - Simple Recurrent Network (SRN): a partially recurrent network in terms of time and space. It has context units which store the outputs of the hidden units (Elman, 1990). (It can be obtained as a modification of the RTRL algorithm.)

8
Real Time Recurrent Learning
  • Williams and Zipser (1989)
    - This algorithm computes the derivatives of the states and outputs with respect to all weights as the network processes the sequence.
  • Summary of the Algorithm
    - In a recurrent network in which any unit may be connected to any other, with the external input at node i at time t, the dynamic update rule is given below.
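In the standard Williams and Zipser (1989) notation (assumed here): let $U$ be the set of units, $I$ the set of external input lines, and let $z_l(t)$ denote either an input $x_l(t)$ (for $l \in I$) or a unit output $y_l(t)$ (for $l \in U$). The dynamic update rule is

\[
s_k(t) \;=\; \sum_{l \in U \cup I} w_{kl}\, z_l(t), \qquad
y_k(t+1) \;=\; f_k\big(s_k(t)\big),
\]

where $f_k$ is the activation function of unit $k$.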

9
RTRL (continued)
  • Error measure, with target outputs $d_k(t)$ defined only for some units $k$ and times $t$:
    \[
    e_k(t) =
    \begin{cases}
    d_k(t) - y_k(t) & \text{if } d_k(t) \text{ is defined at time } t \\
    0 & \text{otherwise}
    \end{cases}
    \]
  • Total cost function:
    \[
    E_{\mathrm{total}}(0,T) = \sum_{t=0}^{T} E(t), \quad t = 0, 1, \ldots, T,
    \qquad \text{where } E(t) = \tfrac{1}{2} \sum_{k \in U} \big[e_k(t)\big]^2 .
    \]

10
RTRL (continued)
  • The gradient of $E_{\mathrm{total}}$ separates in time, so to do gradient descent we define the sensitivities
    \[
    p^k_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}, \qquad
    \Delta w_{ij}(t) = \eta \sum_{k \in U} e_k(t)\, p^k_{ij}(t).
    \]
  • The derivative of the update rule gives the recursion
    \[
    p^k_{ij}(t+1) = f'_k\big(s_k(t)\big)\Big[\sum_{l \in U} w_{kl}\, p^l_{ij}(t) + \delta_{ki}\, z_j(t)\Big],
    \]
  • where the initial condition at $t = 0$ is $p^k_{ij}(0) = 0$.

11
RTRL (continued)
  • Depending on the way the weights are updated, there can be two versions of RTRL:
    1) Update the weights after the sequence is completed (at t = T).
    2) Update the weights after each time step (on-line), as sketched below.
  • Elman's tlearn simulator program for the Simple Recurrent Network (which I'm using for this project) is implemented based on the classical backpropagation algorithm and a modification of this RTRL algorithm.
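A minimal numpy sketch of the on-line version (2). The activation function (tanh), weight initialization, and array layout are illustrative assumptions; the sensitivities p^k_ij follow the recursion on the previous slides.

    import numpy as np

    def rtrl_online(inputs, targets, n_units, lr=0.1, seed=0):
        """On-line RTRL (Williams & Zipser, 1989): update the weights at every step.

        inputs  -- array of shape (T, n_in), the external input lines x(t)
        targets -- array of shape (T, n_units), np.nan where no target is defined
        """
        rng = np.random.default_rng(seed)
        n_in = inputs.shape[1]
        n_z = n_in + n_units                      # z(t) = [x(t), y(t)]
        W = rng.normal(0.0, 0.1, size=(n_units, n_z))
        y = np.zeros(n_units)                     # unit outputs y(t)
        p = np.zeros((n_units, n_units, n_z))     # sensitivities p^k_ij = dy_k/dw_ij

        for x, d in zip(inputs, targets):
            z = np.concatenate([x, y])            # combined input/state vector
            s = W @ z                             # net input s_k(t)
            y_new = np.tanh(s)                    # y_k(t+1) = f(s_k(t))

            # p^k_ij(t+1) = f'(s_k) [ sum_l w_kl p^l_ij(t) + delta_ki z_j(t) ]
            fprime = 1.0 - y_new ** 2
            recur = np.einsum('kl,lij->kij', W[:, n_in:], p)
            inject = np.einsum('ki,j->kij', np.eye(n_units), z)
            p = fprime[:, None, None] * (recur + inject)

            e = np.where(np.isnan(d), 0.0, d - y_new)   # e_k(t), 0 where undefined
            W += lr * np.einsum('k,kij->ij', e, p)      # delta w_ij = lr * sum_k e_k p^k_ij
            y = y_new
        return W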

12
Simulation
  • Based on Elman's data and Simple Recurrent Network (1990, 1993, 1999), simple sentences and embedded sentences are simulated using the tlearn neural network program (a BP-modified version of the RTRL algorithm), available at http://crl.ucsd.edu/innate/index.shtml.
  • Questions:
    1. Can the network discover the lexical classes from word order?
    2. Can the network recognize the relative pronouns and predict them?

13
Network Architecture
  • 31 input nodes
  • 31 output nodes
  • 150 hidden nodes
  • 150 context nodes
  • black arrow: distributed and learnable connections
  • dotted blue arrow: linear function, one-to-one connection with the hidden nodes (see the sketch below)
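A minimal sketch of this architecture's forward pass, using the node counts above. The weight initialization, activation functions, and class name are illustrative assumptions rather than the tlearn configuration.

    import numpy as np

    class SimpleRecurrentNetwork:
        """Elman-style SRN: the context units hold a copy of the previous hidden layer."""

        def __init__(self, n_in=31, n_hidden=150, n_out=31, seed=0):
            rng = np.random.default_rng(seed)
            self.W_ih = rng.normal(0, 0.1, (n_hidden, n_in))      # input   -> hidden (learnable)
            self.W_ch = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden (learnable)
            self.W_ho = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden  -> output (learnable)
            self.context = np.zeros(n_hidden)                     # context units

        def step(self, x):
            """Process one 31-bit word vector and return the prediction for the next word."""
            hidden = np.tanh(self.W_ih @ x + self.W_ch @ self.context)
            output = 1.0 / (1.0 + np.exp(-(self.W_ho @ hidden)))  # sigmoid outputs
            self.context = hidden.copy()   # the one-to-one copy connection (dotted arrow)
            return output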

14
Training Data
  • Lexicon (31 words)
  • Grammar (16 templates)

Lexicon:
  NOUN-HUM: man, woman, boy, girl
  NOUN-ANIM: cat, mouse, dog, lion
  NOUN-INANIM: book, rock, car
  NOUN-AGRESS: dragon, monster
  NOUN-FRAG: glass, plate
  NOUN-FOOD: cookie, bread, sandwich
  VERB-INTRAN: think, sleep, exist
  VERB-TRAN: see, chase, like
  VERB-AGPAT: move, break
  VERB-PERCEPT: smell, see
  VERB-DESTROY: break, smash
  VERB-EAT: eat
  RELAT-HUM: who
  RELAT-INHUM: which

Grammar templates:
  NOUN-HUM VERB-EAT NOUN-FOOD
  NOUN-HUM VERB-PERCEPT NOUN-INANIM
  NOUN-HUM VERB-DESTROY NOUN-FRAG
  NOUN-HUM VERB-INTRAN
  NOUN-HUM VERB-TRAN NOUN-HUM
  NOUN-HUM VERB-AGPAT NOUN-INANIM
  NOUN-HUM VERB-AGPAT
  NOUN-ANIM VERB-EAT NOUN-FOOD
  NOUN-ANIM VERB-TRAN NOUN-ANIM
  NOUN-ANIM VERB-AGPAT NOUN-INANIM
  NOUN-ANIM VERB-AGPAT
  NOUN-INANIM VERB-AGPAT
  NOUN-AGRESS VERB-DESTROY NOUN-FRAG
  NOUN-AGRESS VERB-EAT NOUN-HUM
  NOUN-AGRESS VERB-EAT NOUN-ANIM
  NOUN-AGRESS VERB-EAT NOUN-FOOD
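As a rough illustration of how training sentences can be produced from this lexicon and these templates, the sketch below samples one word per class. The function name and random sampling are assumptions, not the project's actual corpus generator, and only three templates are spelled out.

    import random

    LEXICON = {
        "NOUN-HUM": ["man", "woman", "boy", "girl"],
        "NOUN-ANIM": ["cat", "mouse", "dog", "lion"],
        "NOUN-INANIM": ["book", "rock", "car"],
        "NOUN-AGRESS": ["dragon", "monster"],
        "NOUN-FRAG": ["glass", "plate"],
        "NOUN-FOOD": ["cookie", "bread", "sandwich"],
        "VERB-INTRAN": ["think", "sleep", "exist"],
        "VERB-TRAN": ["see", "chase", "like"],
        "VERB-AGPAT": ["move", "break"],
        "VERB-PERCEPT": ["smell", "see"],
        "VERB-DESTROY": ["break", "smash"],
        "VERB-EAT": ["eat"],
    }

    TEMPLATES = [
        ["NOUN-HUM", "VERB-EAT", "NOUN-FOOD"],
        ["NOUN-HUM", "VERB-INTRAN"],
        ["NOUN-AGRESS", "VERB-DESTROY", "NOUN-FRAG"],
        # ... the remaining 13 templates listed above
    ]

    def generate_sentence():
        """Pick a template and fill each slot with a random word from its class."""
        template = random.choice(TEMPLATES)
        return [random.choice(LEXICON[word_class]) for word_class in template]

    print(" ".join(generate_sentence()))   # e.g. "monster smash plate"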
15
Sample Sentences & Mapping
  • Simple sentences - 2 types (2-word and 3-word):
    - man think (2 words)
    - girl see dog (3 words)
    - man break glass (3 words)
  • Embedded sentences - 3 types (RP = Relative Pronoun):
    1. monster eat man who sleep (RP-sub, VERB-INTRAN)
    2. dog see man who eat sandwich (RP-sub, VERB-TRAN)
    3. woman eat cookie which cat chase (RP-obj, VERB-TRAN)
  • Input-Output Mapping (predict the next item of the sequential input; a small sketch of this mapping follows the example):
    INPUT:  girl see dog man break glass cat
    OUTPUT: see dog man break glass cat
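A minimal sketch of this shift-by-one mapping over a concatenated word stream (an illustrative helper, not the format of the actual tlearn training files):

    def make_prediction_pairs(word_stream):
        """Pair each word with the next one: the network's target is simply
        the following word in the concatenated stream of sentences."""
        return list(zip(word_stream[:-1], word_stream[1:]))

    stream = "girl see dog man break glass cat".split()
    for current, nxt in make_prediction_pairs(stream):
        print(current, "->", nxt)   # girl -> see, see -> dog, dog -> man, ...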

16
Encoding scheme
  • Random word representation
    - A 31-bit vector for each lexical item; each lexical item is represented by a different, randomly assigned bit.
    - Not a semantic feature encoding.
    - (e.g.)
      sleep 0000000000000000000000000000001
      dog   0000100000000000000000000000000
      woman 0000000000000000000000000010000
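A minimal sketch of such a one-bit-per-word encoding. The vocabulary slice and bit positions below are illustrative assumptions, not the project's actual vectors.

    import numpy as np

    def build_codes(vocabulary, n_bits=31, seed=0):
        """Give each word its own randomly chosen bit in an n_bits-wide vector."""
        rng = np.random.default_rng(seed)
        positions = rng.permutation(n_bits)[:len(vocabulary)]   # one unique bit per word
        codes = {}
        for word, pos in zip(vocabulary, positions):
            vec = np.zeros(n_bits, dtype=int)
            vec[pos] = 1
            codes[word] = vec
        return codes

    codes = build_codes(["sleep", "dog", "woman", "man"])   # ... 31 words in total
    print(codes["dog"])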
17
Training a network
  • Incremental input (Elman, 1993): the "starting small" strategy
  • Phase I: simple sentences (Elman, 1990, used 10,000 sentences)
    - 1,564 sentences generated (4,636 31-bit vectors)
    - train on all patterns; learning rate 0.1, 23 epochs
  • Phase II: embedded sentences (Elman, 1993, used 7,500 sentences)
    - 5,976 sentences generated (35,688 31-bit vectors)
    - loaded with the weights from Phase I
    - train on (1,564 + 5,976) sentences together; learning rate 0.1, 4 epochs

18
Performance
  • Network performance was measured by the Root Mean Squared (RMS) error, computed over the number of input patterns from the target output vector and the actual output vector (a standard form of the measure is given below).
  • Phase I: after 23 epochs, RMS = 0.91
  • Phase II: after 4 epochs, RMS = 0.84
  • Why can the RMS not be lowered further? The prediction task is nondeterministic, so the network cannot produce a unique output for the corresponding input. For this simulation, RMS is NOT the best measure of performance.
  • Elman's simulations: RMS = 0.88 (1990), mean cosine = 0.852 (1993)
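A standard way to write this measure, assuming the usual convention of averaging the squared error over the $N$ input patterns (tlearn's exact normalization may differ):

\[
\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{p=1}^{N} \big\lVert \mathbf{d}_p - \mathbf{y}_p \big\rVert^2 },
\]

where $\mathbf{d}_p$ is the target output vector and $\mathbf{y}_p$ the actual output vector for pattern $p$. A minimal numpy equivalent:

    import numpy as np

    def rms_error(targets, outputs):
        """RMS error over N (target, output) vector pairs, averaging over patterns."""
        targets, outputs = np.asarray(targets), np.asarray(outputs)
        return np.sqrt(np.mean(np.sum((targets - outputs) ** 2, axis=1)))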

19
Phase I: RMS = 0.91 after 23 epochs
20
Phase II: RMS = 0.84 after 4 epochs
21
Results and Analysis
  • <Test results after Phase II>  Output (target):
      which (target: which)
      ?     (target: lion)
      ?     (target: see)
      ?     (target: boy)      -> start of a sentence
      ?     (target: move)
      ?     (target: sandwich)
      which (target: which)
      ?     (target: cat)
      ?     (target: see)
      ?     (target: book)
      which (target: which)
      ?     (target: man)
      see   (target: see)
      ?     (target: dog)      -> start of a sentence
  • The arrow indicates the start of a sentence.
  • In all positions the word "which" is predicted correctly! But most words are not predicted, including the word "who". Why? -> see the training data.
  • Since the prediction task is non-deterministic, predicting the exact next word cannot be the best performance measurement.
  • We need to look at the hidden unit activations for each input, since they reflect what the network has learned about the classes of inputs with regard to what they predict. -> Cluster Analysis, PCA

22
  • Cluster Analysis
    - The network successfully recognizes VERB, NOUN, and some of their subcategories.
    - WHO and WHICH have different distances.
    - VERB-INTRAN failed to fit in with VERB.
  • <Hierarchical cluster diagram of hidden unit activations>
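A minimal sketch of how such a hierarchical clustering of hidden unit activations can be reproduced with scipy. Averaging the hidden vectors per word follows Elman (1990); the linkage method, metric, and function name are assumptions.

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    def cluster_hidden_activations(hidden_by_word):
        """hidden_by_word maps each word to a list of 150-dim hidden-unit
        activation vectors collected while the trained network reads the corpus."""
        words = sorted(hidden_by_word)
        # Average the hidden activations per word, as in Elman (1990)
        means = np.array([np.mean(hidden_by_word[w], axis=0) for w in words])
        # Agglomerative clustering on the averaged vectors
        Z = linkage(means, method='average', metric='euclidean')
        dendrogram(Z, labels=words, orientation='right')  # drawing requires matplotlib
        return Z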

23
(No Transcript)
24
Discussion & Conclusion
  • 1. The network can discover the lexical classes from word order. NOUN and VERB form different classes, except for VERB-INTRAN. Also, the subclasses of NOUN are classified correctly, but some subclasses of VERB are mixed. This is related to the input examples.
  • 2. The network can recognize and predict the relative pronoun "which", but not "who". Why? Because the sentences containing "who" are not of the RP-obj type, so "who" is just treated as an ordinary subject in simple sentences.
  • 3. The organization of the input data is important, and the recurrent network is sensitive to it, since it processes the input sequentially and on-line.
  • 4. Generally, recurrent networks using RTRL recognized the sequential input, but RTRL requires more training time and computational resources.

25
Future Studies
  • Recurrent Least Squares Support Vector Machines (Suykens, J.A.K., & Vandewalle, J., 2000)
    - provides new perspectives for time-series prediction and nonlinear modeling
    - seems more efficient than BPTT, RTRL & SRN

26
References
  • Allen, J., & Seidenberg, M. (1999). The emergence of grammaticality in connectionist networks. In B. MacWhinney (Ed.), The emergence of language (pp. 115-151). Hillsdale, NJ: Lawrence Erlbaum.
  • Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
  • Elman, J. L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, 48, 71-99.
  • Elman, J. L. (1999). The emergence of language: A conspiracy theory. In B. MacWhinney (Ed.), The Emergence of Language. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.
  • McClelland, J. L., & Kawamoto, A. H. (1986). Mechanisms of sentence processing: Assigning roles to constituents of sentences (pp. 273-325). In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
  • Miikkulainen, R. (1996). Subsymbolic case-role analysis of sentences with embedded clauses. Cognitive Science, 20, 47-73.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1 (pp. 318-362). Cambridge, MA: MIT Press.
  • Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
  • Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270-280.