1
Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
  • William Gregory Sakas
  • Hunter College, Department of Computer Science
  • Graduate Center, PhD Programs in Computer Science and Linguistics
  • The City University of New York

2
Elman's Simple Recurrent Network (SRN)
[Network diagram: the input layer and the output layer each contain one unit per word (book, boy, dog, run, rock, see, eat). The hidden layer feeds the context layer through a 1-to-1 exact copy of activations; all other connections are "regular" trainable weight connections.]
1) Activate from input to output as usual (one input word at a time), but copy the hidden activations to the context layer.
2) Repeat step 1 over and over, but now activate the hidden layer from the input AND the context (copy) layer on the way to the output layer.
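A minimal sketch of this forward pass in Python, assuming a 7-word localist vocabulary and a 3-unit hidden layer; the weight shapes, the sigmoid nonlinearity, and all names here are illustrative choices, not Elman's exact configuration:

    import numpy as np

    rng = np.random.default_rng(0)
    V, H = 7, 3                                 # vocabulary size, hidden units
    W_ih = rng.normal(scale=0.1, size=(H, V))   # input -> hidden weights
    W_ch = rng.normal(scale=0.1, size=(H, H))   # context -> hidden weights
    W_ho = rng.normal(scale=0.1, size=(V, H))   # hidden -> output weights

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(word_indices):
        """Run a word sequence through the SRN, one word at a time."""
        context = np.zeros(H)                   # context layer starts empty
        outputs = []
        for i in word_indices:
            x = np.zeros(V)
            x[i] = 1.0                          # localist (one-hot) input
            # hidden is driven by the current input AND the copied context
            hidden = sigmoid(W_ih @ x + W_ch @ context)
            outputs.append(sigmoid(W_ho @ hidden))
            context = hidden.copy()             # 1-to-1 exact copy of activations
        return outputs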
3
From Elman (1990): Templates were set up and lexical items were chosen at random from "reasonable" categories.

Categories of lexical items:
NOUN-HUM: man, woman
NOUN-ANIM: cat, mouse
NOUN-INANIM: book, rock
NOUN-AGRESS: dragon, monster
NOUN-FRAG: glass, plate
NOUN-FOOD: cookie, sandwich
VERB-INTRAN: think, sleep
VERB-TRAN: see, chase
VERB-AGPAT: move, break
VERB-PERCEPT: smell, see
VERB-DESTROY: break, smash
VERB-EAT: eat

Templates for sentence generator:
NOUN-HUM VERB-EAT NOUN-FOOD
NOUN-HUM VERB-PERCEPT NOUN-INANIM
NOUN-HUM VERB-DESTROY NOUN-FRAG
NOUN-HUM VERB-INTRAN
NOUN-HUM VERB-TRAN NOUN-HUM
NOUN-HUM VERB-AGPAT NOUN-INANIM
NOUN-HUM VERB-AGPAT
NOUN-ANIM VERB-EAT NOUN-FOOD
NOUN-ANIM VERB-TRAN NOUN-ANIM
NOUN-ANIM VERB-AGPAT NOUN-INANIM
NOUN-ANIM VERB-AGPAT
NOUN-AGRESS VERB-DESTROY NOUN-FRAG
NOUN-AGRESS VERB-EAT NOUN-HUM
NOUN-AGRESS VERB-EAT NOUN-ANIM
NOUN-AGRESS VERB-EAT NOUN-FOOD
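A minimal sketch of this generator, using the categories and templates above; the function and variable names are my own:

    import random

    CATEGORIES = {
        "NOUN-HUM": ["man", "woman"],       "NOUN-ANIM": ["cat", "mouse"],
        "NOUN-INANIM": ["book", "rock"],    "NOUN-AGRESS": ["dragon", "monster"],
        "NOUN-FRAG": ["glass", "plate"],    "NOUN-FOOD": ["cookie", "sandwich"],
        "VERB-INTRAN": ["think", "sleep"],  "VERB-TRAN": ["see", "chase"],
        "VERB-AGPAT": ["move", "break"],    "VERB-PERCEPT": ["smell", "see"],
        "VERB-DESTROY": ["break", "smash"], "VERB-EAT": ["eat"],
    }

    TEMPLATES = [t.split() for t in [
        "NOUN-HUM VERB-EAT NOUN-FOOD",      "NOUN-HUM VERB-PERCEPT NOUN-INANIM",
        "NOUN-HUM VERB-DESTROY NOUN-FRAG",  "NOUN-HUM VERB-INTRAN",
        "NOUN-HUM VERB-TRAN NOUN-HUM",      "NOUN-HUM VERB-AGPAT NOUN-INANIM",
        "NOUN-HUM VERB-AGPAT",              "NOUN-ANIM VERB-EAT NOUN-FOOD",
        "NOUN-ANIM VERB-TRAN NOUN-ANIM",    "NOUN-ANIM VERB-AGPAT NOUN-INANIM",
        "NOUN-ANIM VERB-AGPAT",             "NOUN-AGRESS VERB-DESTROY NOUN-FRAG",
        "NOUN-AGRESS VERB-EAT NOUN-HUM",    "NOUN-AGRESS VERB-EAT NOUN-ANIM",
        "NOUN-AGRESS VERB-EAT NOUN-FOOD",
    ]]

    def generate_sentence():
        """Pick a template at random, then fill each slot from its category."""
        return [random.choice(CATEGORIES[cat]) for cat in random.choice(TEMPLATES)]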
4
Training data (input stream):
woman smash plate cat move man break car boy move girl eat bread dog ...

Supervisor's answers (target stream, the same words shifted one position left):
smash plate cat move man break car boy move girl eat bread dog move ...

The resulting training and supervisor files were 27,354 words long, made up of 10,000 two- and three-word "sentences."
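A minimal sketch of how the two files line up: the supervisor's answer at each step is simply the next word of the input stream. (generate_sentence() is the hypothetical helper sketched above.)

    words = []
    while len(words) < 27354:               # build a 27,354-word stream
        words.extend(generate_sentence())   # sentences concatenated, no boundaries

    inputs  = words[:-1]                    # word at time t   -> network input
    targets = words[1:]                     # word at time t+1 -> supervisor's answer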
5
Cluster (Similarity) analysis
The hidden activations for each word were averaged together.

For simplicity, assume only 3 hidden nodes (in fact there were 150). After the SRN was trained, the file was run through the network and the activations at the hidden nodes were recorded (I made up these numbers for the example):

boy    <.5 .3 .2>
smash  <.4 .4 .2>
plate  <.2 .3 .8>
...
dragon <.6 .1 .3>
eat    <.1 .2 .4>
boy    <.9 .9 .7>
...
boy    <.7 .6 .7>
eat    <.4 .3 .6>
cookie <.2 .3 .4>

Now the average was taken for every word:

boy    <.70 .60 .53>
smash  <.40 .40 .20>
plate  <.20 .30 .80>
dragon <.60 .10 .30>
eat    <.25 .25 .50>
cookie <.20 .30 .40>

Each of these vectors represents a point in 3-D space; some vectors are close together, some further apart.
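A minimal sketch of the averaging step, assuming stream is the word sequence and hiddens holds one recorded hidden-activation vector per word occurrence (the names are my own):

    import numpy as np
    from collections import defaultdict

    def average_by_word(stream, hiddens):
        """Average all hidden-activation vectors recorded for each word type."""
        groups = defaultdict(list)
        for word, h in zip(stream, hiddens):
            groups[word].append(h)
        return {w: np.mean(vs, axis=0) for w, vs in groups.items()}

    # The three occurrences of "boy" above, <.5 .3 .2>, <.9 .9 .7> and
    # <.7 .6 .7>, average to <.70 .60 .53>, matching the table.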
6
Each of these words represents a point in 150-dimensional space, averaged from all activations generated by the network when processing that word. [Hierarchical cluster diagram omitted.] Each joint (where there is a connection) represents the distance between clusters. So, for example, the distance between ANIMALS and HUMANS is approximately .85, and the distance between ANIMATES and INANIMATES is approximately 1.5.
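A minimal sketch of this cluster analysis using SciPy's hierarchical clustering on the made-up 3-D averages from the previous slide; the height of each joint in the resulting dendrogram is the distance between the clusters it connects:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    avg = {                        # the made-up averaged vectors from slide 5
        "boy":    [.70, .60, .53], "smash":  [.40, .40, .20],
        "plate":  [.20, .30, .80], "dragon": [.60, .10, .30],
        "eat":    [.25, .25, .50], "cookie": [.20, .30, .40],
    }
    words = list(avg)
    X = np.array([avg[w] for w in words])
    Z = linkage(X, method="average", metric="euclidean")
    dendrogram(Z, labels=words)    # each joint's height = inter-cluster distance
    plt.show()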
7
The network seems to correctly discover nouns vs. verbs, verb subcategorization, animates vs. inanimates, etc. Cool, eh?

Remarks:
No category information is represented in the input (the word encodings are localist and orthogonal).
There are no "rules" in the traditional sense. The categories are learned from statistical regularities in the sentences; there is no structure being provided to the network (more on this in a bit).
There are no "symbols" in the traditional sense. Classic symbol-manipulating systems use names for well-defined classes of entities (N, V, Adj, etc.). In an SRN the representation of the concept of, say, boy, is
1. distributed (as a vector of activations), and
2. represented over context, with respect to the words that come before. (E.g., boy is represented one way when used as an object and another when used in subject position.)
Note, though, that when a cluster analysis is performed on specific occurrences of a word, the cluster is very tight, but there is some variation based on a word's context.
8
From Elman (1991): constituency, long-distance relations, optionality.

A simple context-free grammar was used:

S     -> NP VP "."
NP    -> PropN | N | N RC
VP    -> V (NP)
RC    -> who NP VP | who VP (NP)
N     -> boy | girl | cat | dog | boys | girls | cats | dogs
V     -> chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives
PropN -> John | Mary

Plus constraints on number agreement and verb argument subcategorization.
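A minimal sketch of a generator for this grammar, with number agreement handled by carrying a singular/plural feature from each subject to its verb. The expansion probabilities, the depth limit, and the decision to leave the object slot empty inside an object relative are simplifications of my own, not Elman's exact procedure:

    import random

    SING_N = ["boy", "girl", "cat", "dog"]
    PLUR_N = ["boys", "girls", "cats", "dogs"]
    VERBS  = ["chase", "feed", "see", "hear", "walk", "live"]
    PROPN  = ["John", "Mary"]

    def verb(num):
        v = random.choice(VERBS)
        return v + "s" if num == "sing" else v      # chases/chase, etc.

    def np_(depth):
        """NP -> PropN | N | N RC; returns (words, number feature)."""
        r = random.random()
        if r < 0.3:
            return [random.choice(PROPN)], "sing"
        num = random.choice(["sing", "plur"])
        head = [random.choice(SING_N if num == "sing" else PLUR_N)]
        if r > 0.7 and depth < 2:                   # N RC
            head += rc(num, depth + 1)
        return head, num

    def vp(num, depth):
        """VP -> V (NP); the verb agrees with the subject's number."""
        out = [verb(num)]
        if random.random() < 0.5:
            out += np_(depth)[0]
        return out

    def rc(head_num, depth):
        """RC -> who NP VP | who VP (NP)."""
        if random.random() < 0.5:
            subj, num = np_(depth)                  # object relative: the head
            return ["who"] + subj + [verb(num)]     # noun fills the object slot
        return ["who"] + vp(head_num, depth)        # subject relative: the verb
                                                    # agrees with the head noun

    def sentence():
        subj, num = np_(0)
        return " ".join(subj + vp(num, 0) + ["."])

    print(sentence())   # e.g. "dog who cat chases sees girl ."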
9
This allows a variety of interesting sentences that were used for training. (Note: the *'d items were not used for training. For you CS people out there, * frequently means ungrammatical.)

Dogs live. *Dogs live cats.  [intransitive]
Boys see. Boys see dogs. Boys see dog.  [optionally transitive]
Boys hit dogs. *Boys hit.  [transitive]
Dog who chases cat sees girl. *Dog who chase cat sees girl. Dog who cat chases sees girl.  [long-distance number agreement]
Boys who girls who dogs chase see hear.
Boys see dogs who see girls who hear. Boys see dogs who see girls. Boys see dogs. Boys see.  [ambiguous sentence boundaries]
10
  • Boys who Mary chases feed cats.
  • This is much, much more difficult input than Elman (1990).
  • Long-distance agreement
  • feed agrees with Boys, but who Mary chases is in the way.
  • Subcategorization
  • chases is mandatorily transitive, but inside the relative clause the
    network has to NOT mistake it for the independent sentence Mary chases
    (which would be missing its object).

11
Analysis of results: Principal Component Analysis. Suppose you have 3 hidden nodes and four vectors of activation that correspond to boy-subj, boy-obj, girl-subj, girl-obj.
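A minimal sketch of the PCA step on four such vectors; the numbers are invented for illustration, just as on the earlier slide:

    import numpy as np

    labels = ["boy-subj", "boy-obj", "girl-subj", "girl-obj"]
    H = np.array([[.7, .6, .5],     # boy in subject position
                  [.6, .5, .6],     # boy in object position
                  [.2, .3, .8],     # girl in subject position
                  [.1, .3, .9]])    # girl in object position

    centered = H - H.mean(axis=0)
    # principal components = right singular vectors of the centered data
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ Vt[:2].T    # project onto the top two components

    for lab, (x, y) in zip(labels, coords):
        print(f"{lab}: PC1={x:+.2f}  PC2={y:+.2f}")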
And hierarchical clustering gives you this:
[Dendrogram: boy-subj joins boy-obj, and girl-subj joins girl-obj, before the boy and girl clusters merge.]
Adapted from Crocker (2001)