Title: A computational study of cross-situational techniques for learning word-to-meaning mappings
1. A computational study of cross-situational techniques for learning word-to-meaning mappings
- Jeffrey Mark Siskind
- Presented by David Goss-Grubbs
- March 5, 2006
2. The Problem: Mapping Words to Concepts
- Child hears "John went to school"
- Child sees GO(John, TO(school))
- Child must learn:
- John → John
- went → GO(x, y)
- to → TO(x)
- school → school
3. Two Problems
- Referential uncertainty: the scene supports other hypotheses too, e.g.
- MOVE(John, feet)
- WEAR(John, RED(shirt))
- Determining the correct alignment: ruling out misalignments such as
- John → TO(x)
- walked → school
- to → John
- school → GO(x, y)
4. Helpful Constraints
- Partial Knowledge
- Cross-situational inference
- Covering constraints
- Exclusivity
5. Partial Knowledge
- Child hears "Mary lifted the block"
- Child sees
- CAUSE(Mary, GO(block, UP))
- WANT(Mary, block)
- BE(block, ON(table))
- If the child already knows that lift contains CAUSE, the second and third hypotheses can be ruled out.
6. Cross-situational inference
- "John lifted the ball" → CAUSE(John, GO(ball, UP))
- "Mary lifted the block" → CAUSE(Mary, GO(block, UP))
- Thus, the candidates for lifted narrow to: UP, GO(x, y), GO(x, UP), CAUSE(x, y), CAUSE(x, GO(y, z)), CAUSE(x, GO(y, UP)) (see the sketch below)
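At the level of bare conceptual symbols, this narrowing is just a set intersection. A minimal sketch in Python (the names and data are illustrative, not Siskind's implementation; his algorithm also tracks whole expressions, as later slides show):

```python
# Cross-situational inference as set intersection: a word can only mean
# what is common to every situation in which it has been heard.
# Illustrative sketch; not Siskind's actual code.

def intersect_candidates(observations):
    """observations: one set of conceptual symbols per situation
    in which the word occurred."""
    candidates = None
    for symbols in observations:
        candidates = set(symbols) if candidates is None else candidates & symbols
    return candidates if candidates is not None else set()

# "lifted" heard in the two situations above:
sit1 = {"CAUSE", "GO", "UP", "John", "ball"}    # John lifted the ball
sit2 = {"CAUSE", "GO", "UP", "Mary", "block"}   # Mary lifted the block
print(intersect_candidates([sit1, sit2]))       # {'CAUSE', 'GO', 'UP'}
```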
7. Covering constraints
- Assume all components of an utterance's meaning come from the meanings of words in that utterance.
- If it is known that CAUSE is not part of the meaning of "John", "the", or "ball", it must be part of the meaning of "lifted".
- (But what about constructional meaning?)
8. Exclusivity
- Assume any portion of the meaning of an utterance comes from no more than one of its words.
- If "John walked" → WALK(John) and John → John, then walked can contribute no more than walked → WALK(x) (see the sketch below).
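A sketch of the exclusivity bound at the symbol level, again in hypothetical Python: whatever the other words already account for is subtracted from what this word may contribute.

```python
# Exclusivity: symbols contributed by other words are removed from the
# upper bound on this word's meaning. Hypothetical sketch.

def exclusivity_bound(utterance_symbols, other_words_symbols):
    """utterance_symbols: symbols of the whole utterance's meaning.
    other_words_symbols: one symbol set per other word in the utterance."""
    remaining = set(utterance_symbols)
    for symbols in other_words_symbols:
        remaining -= symbols
    return remaining

# "John walked" -> WALK(John), and John -> John is already known:
print(exclusivity_bound({"WALK", "John"}, [{"John"}]))   # {'WALK'}
```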
9. Three more problems
- Bootstrapping
- Noisy Input
- Homonymy
10. Bootstrapping
- Lexical acquisition is much easier if some of the language is already known
- Some of Siskind's strategies (e.g. cross-situational learning) work without such knowledge
- Others (e.g. exclusivity) require it.
- The algorithm starts off slow, then speeds up
11. Noise
- Only a subset of all possible meanings will be available to the algorithm
- If none of them contains the correct meaning, cross-situational learning would cause those words never to be acquired
- Some portion of the input must be ignored.
- (A statistical approach is rejected; it is not clear why)
12. Homonymy
- As with noisy input, cross-situational techniques would fail to find a consistent mapping for homonymous words.
- When an inconsistency is found, a split is made (see the sketch below).
- If the split is corroborated, a new sense is created; otherwise it is noise.
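One way to picture the split (a loose sketch, not Siskind's exact bookkeeping): when a new situation is inconsistent with every existing sense of a word, open a fresh sense rather than corrupt the old ones. Whether the fresh sense later counts as real homonymy or as noise depends on corroboration.

```python
# Hypothetical sense-splitting: each sense keeps its own set of possible
# symbols; an observation inconsistent with every sense spawns a new one.
# All names and data here are illustrative.

def assimilate(senses, observed_symbols):
    """senses: list of dicts, each with a 'possible' symbol set and a
    'confidence' count. Returns the sense that absorbed the observation."""
    for sense in senses:
        if sense["possible"] & observed_symbols:    # still consistent
            sense["possible"] &= observed_symbols
            sense["confidence"] += 1
            return sense
    fresh = {"possible": set(observed_symbols), "confidence": 1}
    senses.append(fresh)                            # split: a new sense
    return fresh

bank_senses = []
assimilate(bank_senses, {"MONEY", "BUILDING"})      # first exposure
assimilate(bank_senses, {"RIVER", "EDGE"})          # inconsistent: split
print(len(bank_senses))                             # 2
```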
13. The problem, formally stated
- From a sequence of utterances
- Each utterance is an unordered collection of words
- Each utterance is paired with a set of conceptual expressions
- To a lexicon
- The lexicon maps each word to a set of conceptual expressions, one for each sense of the word
14. Composition
- Select one sense for each word
- Find all ways of combining these conceptual expressions
- The meaning of an utterance is derived only from the meanings of its component words.
- Every conceptual expression in the meanings of the words must appear in the final conceptual expression (copies are possible)
15. The simplified algorithm: no noise or homonymy
- Two learning stages
- Stage 1: the set of conceptual symbols
- E.g. CAUSE, GO, UP
- Stage 2: the conceptual expression
- E.g. CAUSE(x, GO(y, UP))
16. Stage 1: Conceptual symbol set
- Maintain sets of necessary and possible conceptual symbols for each word
- Initialize the former to the empty set and the latter to the universal set
- Utterances will increase the necessary set and decrease the possible set, until they converge on the actual conceptual symbol set (see the sketch below)
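A minimal sketch of this bookkeeping in Python. The covering rule here ("a symbol only one word can still supply is necessary for that word") is a simplified stand-in for Siskind's full set of inference rules, which also exploit exclusivity; the universe and utterances are illustrative.

```python
from collections import defaultdict

# Stage 1 bookkeeping: necessary starts empty, possible starts universal.
UNIVERSE = frozenset({"CAUSE", "GO", "TO", "WANT", "UP",
                      "John", "Mary", "ball", "block"})
necessary = defaultdict(set)
possible = defaultdict(lambda: set(UNIVERSE))

def stage1_update(words, meaning_symbols):
    for w in words:
        # a word's symbols must occur in every utterance containing it
        possible[w] &= meaning_symbols
    for sym in meaning_symbols:
        # covering: if only one word can still supply a symbol, it must
        bearers = [w for w in words if sym in possible[w]]
        if len(bearers) == 1:
            necessary[bearers[0]].add(sym)

stage1_update(["John", "took", "the", "ball"],
              {"CAUSE", "GO", "TO", "John", "ball"})
stage1_update(["Mary", "took", "the", "block"],
              {"CAUSE", "GO", "TO", "Mary", "block"})
print(sorted(possible["took"]))   # ['CAUSE', 'GO', 'TO']
```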
17. Stage 2: Conceptual expression
- Maintain a set of possible conceptual expressions for each word
- Initialize it to the set of all expressions that can be composed from the actual conceptual symbol set
- New utterances will decrease the possible conceptual expression set until only one remains (see the sketch below)
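Expressions can be sketched as nested tuples; the candidate fragments of an observed meaning are its sub-expressions with arguments abstracted to variables. A rough sketch (the tuple encoding is mine, and it conflates distinct variables into one placeholder "?", which Siskind's algorithm does not):

```python
from itertools import product

VAR = "?"   # single anonymous variable; Siskind distinguishes x, y, z

def fragments(expr):
    """All candidate fragments of expr: every sub-expression, with each
    argument recursively either kept or abstracted to a variable."""
    if not isinstance(expr, tuple):          # a constant such as 'ball'
        return {expr}
    head, *args = expr
    out = set()
    arg_options = [fragments(a) | {VAR} for a in args]
    for combo in product(*arg_options):
        out.add((head, *combo))
    for a in args:                           # fragments from deeper down
        out |= fragments(a)
    return out

# Narrowing 'lifted' across the two situations from slide 6:
lifted = (fragments(("CAUSE", "John", ("GO", "ball", "UP")))
          & fragments(("CAUSE", "Mary", ("GO", "block", "UP"))))
print(("CAUSE", VAR, ("GO", VAR, "UP")) in lifted)   # True
```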
18. Example

Word   Necessary   Possible
John   John        John, ball
took   CAUSE       CAUSE, WANT, GO, TO, arm
the                WANT, arm
ball   ball        ball, arm
19. Selecting the meaning
- "John took the ball"
- CAUSE(John, GO(ball, TO(John)))
- WANT(John, ball)
- CAUSE(John, GO(PART-OF(LEFT(arm), John), TO(ball)))
- The second is eliminated because it lacks CAUSE, which "took" requires
- The third is eliminated because no word has LEFT or PART-OF among its possible symbols (see the sketch below)
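Both eliminations follow mechanically from the Stage 1 sets. A hypothetical sketch of the test, using symbol sets in the spirit of the neighboring tables:

```python
# A hypothesis survives only if (a) every symbol some word necessarily
# contributes appears in it, and (b) every symbol it contains is possible
# for at least one word. Hypothetical sketch of the consistency test.

def consistent(hypothesis_symbols, words, necessary, possible):
    needed = set().union(*(necessary[w] for w in words))
    if not needed <= hypothesis_symbols:
        return False     # WANT(John, ball) lacks CAUSE, which 'took' needs
    supplied = set().union(*(possible[w] for w in words))
    return hypothesis_symbols <= supplied   # LEFT, PART-OF: no supplier

necessary = {"John": {"John"}, "took": {"CAUSE"}, "the": set(), "ball": {"ball"}}
possible = {"John": {"John"}, "took": {"CAUSE", "GO", "TO"},
            "the": set(), "ball": {"ball"}}
words = ["John", "took", "the", "ball"]
print(consistent({"WANT", "John", "ball"}, words, necessary, possible))  # False
print(consistent({"CAUSE", "GO", "TO", "John", "ball"}, words,
                 necessary, possible))                                   # True
```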
20. Updated table

Word   Necessary       Possible
John   John            John
took   CAUSE, GO, TO   CAUSE, GO, TO
the
ball   ball            ball
21. Stage 2

Utterance meaning: CAUSE(John, GO(ball, TO(John)))

Word   Conceptual expression
John   John
took   CAUSE(x, GO(y, TO(x)))
the
ball   ball
22. Noise and Homonymy
- Noisy or homonymous data can corrupt the lexicon by
- Adding an incorrect element to the set of necessary elements
- Taking a correct element away from the set of possible elements
- This may or may not create an inconsistent entry
23. Extended algorithm
- Necessary and possible conceptual symbols are mapped to senses rather than words
- Words are mapped to their senses
- Each sense has a confidence factor
24. Sense assignment
- For each utterance, form the cross-product of all the senses
- Choose the best consistent sense assignment (see the sketch below)
- Update the entries for those senses as before
- Add to a sense's confidence factor each time it is used in a preferred assignment
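A hypothetical sketch of that loop; the consistency test is passed in as a parameter, and scoring an assignment by summed confidence is an assumed stand-in for Siskind's exact preference criterion.

```python
from itertools import product

def best_assignment(words, senses, confidence, is_consistent):
    """Try every combination of one sense per word; keep the consistent
    combination with the highest summed confidence, and reinforce it.
    senses: dict word -> list of sense ids; confidence: dict sense -> score."""
    best, best_score = None, float("-inf")
    for combo in product(*(senses[w] for w in words)):
        if not is_consistent(words, combo):
            continue
        score = sum(confidence[s] for s in combo)
        if score > best_score:
            best, best_score = combo, score
    if best is not None:
        for s in best:
            confidence[s] += 1   # reward senses used in the winning assignment
    return best
```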
25. Inconsistent utterances
- Add the minimal number of new senses until the utterance is no longer inconsistent; three possibilities:
- The current utterance is noise, and the new senses are bad (and will be ignored)
- There really are new senses
- The original senses were bad, and the right senses are only now being added.
- On occasion, remove senses with low confidence factors
26. Four simulations
- Vary the task along five parameters
- Vocabulary growth rate by size of corpus
- Number of required exposures to a word by size of corpus
- How high can it scale?
27. Method (1 of 2)
- Construct a random lexicon
- Vary it by three parameters
- Vocabulary size
- Homonymy rate
- Conceptual-symbol inventory size
28. Method (2 of 2)
- Construct a series of utterances, each paired with a set of meaning hypotheses
- Vary this by the following parameters:
- Noise rate
- Degree of referential uncertainty
- Cluster size (5)
- Similarity probability (.75)
29. Sensitivity analysis
(Slides 29-36 presented the simulation results as graphs; only the titles survive.)
30. Vocabulary size
31. Degree of referential uncertainty
32. Noise rate
33. Conceptual-symbol inventory size
34. Homonymy rate
35. Vocabulary Growth
36. Number of exposures