Preliminary Experiments in Morphological Evolution - PowerPoint PPT Presentation

1
(Preliminary) Experiments in Morphological
Evolution
  • Richard Sproat
  • University of Illinois at Urbana-Champaign
  • rws@uiuc.edu
  • 3rd Workshop on "Quantitative Investigations in
    Theoretical Linguistics" (QITL-3)
  • Helsinki, 2-4 June 2008

2
Overview
  • The explananda
  • Previous work on evolutionary modeling
  • Computational models and preliminary experiments

3
Phenomena
  • How do paradigms arise?
  • Why do words fall into different inflectional
    equivalence classes?
  • Why do stem alternations arise?
  • Why is there syncretism?
  • Why are there rules of referral?

4
Stem alternations in Sanskrit
[Table of stem grades: zero, guna]
Examples from Stump, Gregory (2001) Inflectional
Morphology: A Theory of Paradigm Structure.
Cambridge University Press.
5
Stem alternations in Sanskrit
[Table continued: morphomic and vrddhi grades;
lexeme-class-particular alternations]
morphomic: Aronoff, M. (1994) Morphology by
Itself. MIT Press.
6
Evolutionary Modeling (A tiny sample)
  • Hare, M. and Elman, J. L. (1995) Learning and
    morphological change. Cognition, 56(1): 61-98.
  • Kirby, S. (1999) Function, Selection, and
    Innateness: The Emergence of Language Universals.
    Oxford: Oxford University Press.
  • Nettle, D. (1999) Using Social Impact Theory to
    simulate language change. Lingua,
    108(2-3): 95-117.
  • de Boer, B. (2001) The Origins of Vowel Systems.
    Oxford: Oxford University Press.
  • Niyogi, P. (2006) The Computational Nature of
    Language Learning and Evolution. Cambridge, MA:
    MIT Press.

7
Experiment 1: Rules of Referral
8
Rules of referral
  • Stump, Gregory (1993) On rules of referral.
    Language, 69(3): 449-479.
  • (After Zwicky, Arnold (1985) How to describe
    inflection. Berkeley Linguistics Society, 11:
    372-386.)

9
Latin declensions
10
Are rules of referral interesting?
  • Are they useful for the learner?
  • Wouldn't the learner have heard instances of
    every paradigm?
  • Are they historically interesting?
  • Does morphological theory need mechanisms to
    explain why they occur?

11
Another example: Bögüstani nominal declension
[Table: three declension paradigms; columns Sg, Du,
Pl; rows Nom, Acc, Gen, Dat, Loc, Inst, Abl, Illat]
  • Bögüstani
  • A language of Uzbekistan
  • ISO 639-3: bgs
  • Population: 15,500 (1998 Durieux).
  • Comments: Capsicum chinense and Coffea arabica
    farmers

12
Monte Carlo simulation (generating Bögüstani)
  • Select a re-use bias B
  • For each language:
    • Generate a set of vowels, consonants and affix
      templates, e.g.:
      • Vowels: a, i, u, e
      • Consonants: n f r w B s x j D
      • Templates: V, C, CV, VC
    • Decide on p paradigms (minimum 3), r rows
      (minimum 2), c columns (minimum 2)

13
Monte Carlo simulation
  • For each paradigm in the language:
    • Iterate over cells (r, c):
      • Let a be the previous affix stored for r;
        with probability B, retain a in L
      • Let ß be the previous affix stored for c;
        with probability B, retain ß in L
      • If L is non-empty, set (r, c) to a random
        choice from L
      • Otherwise generate a new affix for (r, c)
      • Store (r, c)'s affix for r and c
  • Note that P(new-affix) = (1 - B)²
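The procedure above can be sketched in Python (a minimal reimplementation for illustration, not the original code; affix generation is simplified to CV syllables, and the row/column stores are shared across paradigms, which is what lets re-use produce syncretism):

```python
import random

def generate_language(B, p=3, r=4, c=3, seed=0):
    """Monte Carlo paradigm generator: with probability B the affix
    for cell (i, j) re-uses the affix previously stored for that row
    or column; otherwise a fresh affix is generated, so
    P(new affix) = (1 - B) ** 2."""
    rng = random.Random(seed)
    vowels, consonants = "aiue", "nfrwsxj"

    def new_affix():                        # templates simplified to CV
        return rng.choice(consonants) + rng.choice(vowels)

    row_store, col_store = {}, {}           # shared across paradigms
    paradigms = []
    for _ in range(p):
        cells = {}
        for i in range(r):
            for j in range(c):
                pool = []                   # the slide's list L
                if i in row_store and rng.random() < B:
                    pool.append(row_store[i])
                if j in col_store and rng.random() < B:
                    pool.append(col_store[j])
                affix = rng.choice(pool) if pool else new_affix()
                cells[i, j] = affix
                row_store[i], col_store[j] = affix, affix
        paradigms.append(cells)
    return paradigms

language = generate_language(B=0.04)
```

With B = 1 every cell after the first re-uses a stored affix, so the whole language collapses to a single exponent; with B near 0 almost every cell gets a fresh affix.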

14
Sample language, bias 0.04
Consonants: x n p w j B t r s S m; Vowels: a i u e;
Templates: V, C, CV, VC
15
Sample language, bias 0.04
Consonants: n f r w B s x j D; Vowels: a i u e;
Templates: V, C, CV, VC
16
Sample language, bias 0.04
Consonants: r p j d G D; Vowels: a i u e o y O;
Templates: V, C, CV, VC, CVC, VCV, CVCV, VCVC
17
Sample language, bias 0.04
Consonants: D k S n b s l t w j B g G d; Vowels:
a i u e; Templates: V, C, CV, VC
18
Results of Monte Carlo simulations (8000 runs,
5000 languages per run)
19
Interim conclusion
  • Syncretism, including rules of referral, may
    arise as a chance byproduct of tendencies to
    reuse inflectional exponents, and hence to reduce
    the number of exponents needed in the system.
  • Side question: is the amount of ambiguity among
    inflectional exponents statistically different
    from that among lexemes? (cf. Beard's
    Lexeme-Morpheme Base Morphology)
  • Probably not, since inflectional exponents tend to
    be shorter, so the chances of collisions are much
    higher

20
Experiment 2: Stabilizing Multiple Paradigms in a
Multiagent Network
21
Paradigm Reduction in Multi-agent Models with
Scale-Free Networks
  • Agents connected in a scale-free network
  • Only connected agents communicate
  • Agents are more likely to update forms from
    interlocutors they trust
  • Each individual agent has pressure to simplify
    its morphology by collapsing exponents
  • The exponent collapse is picked to minimize the
    increase in paradigm entropy
  • Paradigms may be simplified by removing
    distinctions, thus reducing paradigm entropy
  • As the number of exponents decreases, so does the
    pressure to reduce
  • Agents analogize paradigms to other words

22
Scale-free networks
23
Scale-free networks
  • Connection degrees follow the Yule-Simon
    distribution: p(k) = ρ B(k, ρ + 1), with B the
    Beta function
  • where, for sufficiently large k, p(k) ∝ k^-(ρ+1)
  • i.e. it reduces to Zipf's law (cf. Baayen, Harald
    (2000) Word Frequency Distributions. Springer.)
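Degree distributions of this shape arise from preferential attachment; the sketch below is a minimal Barabási-Albert-style generator (an illustration, not the presentation's code): sampling from a list that contains each node once per incident edge is sampling proportional to degree.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a scale-free network: each new node attaches m edges,
    preferring nodes that are already well connected."""
    rng = random.Random(seed)
    # start from a small complete core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # 'stubs' holds each node once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(1000, 2)
degree = Counter(v for e in edges for v in e)
# heavy-tailed: a few highly connected hubs, many low-degree nodes
print(max(degree.values()), min(degree.values()))
```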

24
Scale-free vs. Random, 1000 nodes
25
Relevance of scale-free networks
  • Social networks are scale-free
  • Nodes with multiple connections seem to be
    relevant for language change.
  • cf. James Milroy and Lesley Milroy (1985)
    Linguistic change, social network and speaker
    innovation. Journal of Linguistics, 21: 339-384.

26
Scale-free networks in the model
  • Agents communicate individual forms to other
    agents
  • When two agents differ on a form, one agent will
    update its form with a probability p proportional
    to how well connected the other agent is
  • p = MaxP × ConnectionDegree(agent) / MaxConnectionDegree
  • (Similar to PageRank)
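The update rule can be sketched as follows (hypothetical function name; max_p corresponds to the slide's MaxP):

```python
import random

def maybe_update(my_form, other_form, other_degree, max_degree,
                 max_p=0.2, rng=random):
    """Adopt the interlocutor's form with probability proportional to
    how well connected the interlocutor is, i.e.
    p = MaxP * ConnectionDegree(agent) / MaxConnectionDegree."""
    if my_form == other_form:
        return my_form
    p = max_p * other_degree / max_degree
    return other_form if rng.random() < p else my_form

# a maximally connected speaker is adopted with probability max_p
random.seed(1)
print(maybe_update("ako", "aGo", other_degree=50, max_degree=50))
```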

27
Paradigm entropy
  • For exponents f and morphological functions µ,
    define the Paradigm Entropy as
    H(µ | f) = - Σ_f p(f) Σ_µ p(µ | f) log p(µ | f)
  • (NB: this is really just the conditional
    entropy)
  • If each exponent is unambiguous, the paradigm
    entropy is 0
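Assuming uniform cell frequencies, the paradigm entropy of a single paradigm can be computed as below (a sketch, not the original implementation):

```python
import math
from collections import defaultdict

def paradigm_entropy(cells):
    """Conditional entropy H(mu | f) of morphological function mu
    given exponent f, for a paradigm given as {function: exponent}.
    Cells are assumed equiprobable (a simplifying assumption)."""
    total = len(cells)
    by_exponent = defaultdict(list)
    for func, exp in cells.items():
        by_exponent[exp].append(func)
    h = 0.0
    for funcs in by_exponent.values():
        p_f = len(funcs) / total            # p(f)
        for _ in funcs:
            p_mu = 1 / len(funcs)           # p(mu | f), uniform cells
            h -= p_f * p_mu * math.log2(p_mu)
    return h

# unambiguous paradigm: entropy 0
print(paradigm_entropy({"sg.nom": "-ko", "sg.acc": "-on"}))   # 0.0
# fully syncretic two-cell paradigm: 1 bit
print(paradigm_entropy({"sg.nom": "-a", "sg.acc": "-a"}))     # 1.0
```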

28
Example
29
Syncretism tends to be most common in rarer
parts of paradigm
30
Old Latin 1st/2nd Declensions
31
Simulation
  • 100 agents in a scale-free or random network
  • Roughly 250 connections in either case
  • 20 bases
  • 5 cases, 2 numbers; each slot associated with
    a probability
  • Max probability of updating one's form for a
    given slot, given what another agent has, is 0.2 or
    0.5
  • Probability of analogizing within one's own
    vocabulary is 0.01, 0.02 or 0.05
  • Also a mode where we force analogy every 50
    iterations
  • Analogize to words within the same analogy group (4
    such groups in the current simulation)
  • Winner-takes-all strategy
  • (Numbers in the titles of the ensuing plots are
    given as UpdateProb/AnalogyProb (e.g. 0.2/0.01))
  • Run for 1000 iterations

32
Features of simulation
  • At the nth iteration, compute:
  • The paradigm distribution over agents for each
    word
  • Paradigm purity: the proportion of agents holding
    the winning paradigm
  • The number of distinct winning paradigms
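These per-iteration statistics can be sketched as follows (hypothetical helper; assumes each agent's paradigm label per word is known):

```python
from collections import Counter

def purity_and_winners(agent_paradigms):
    """agent_paradigms: {word: [paradigm label held by each agent]}.
    Returns per-word purity (share of the winning paradigm) and the
    number of distinct winning paradigms across words."""
    purities, winners = {}, set()
    for word, labels in agent_paradigms.items():
        counts = Counter(labels)
        winner, n = counts.most_common(1)[0]
        purities[word] = n / len(labels)
        winners.add(winner)
    return purities, len(winners)

p, k = purity_and_winners({
    "abog":  ["A", "A", "A", "B"],
    "adgar": ["B", "B", "C", "B"],
})
print(p, k)   # {'abog': 0.75, 'adgar': 0.75} 2
```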

33
Scale-free Network 0.2/0.01
34
Scale-free network 0.5/0.05
35
Random network 0.5/0.05
36
Scale-free network 0.5/0.05, 5000 runs
37
Random network 0.5/0.05, 5000 runs
38
Scale-free network 0.5/0.00, 5000 runs, no analogy
39
Scale-free network 0.5/0.00, 30,000 runs, no
analogy
40
Sample final state
[Chart of final paradigm shares: 0.24, 0.21, 0.095,
0.095, 0.06, 0.12, 0.095, 0.048, 0.024, 0.012]
41
Adoption of acc/acc/acc/acc/acc/ACC/ACC/ACC/ACC/ACC
in a 0.5/0.05 run
42
Interim conclusions
  • Scale-free networks don't seem to matter:
    convergence behavior seems to be no different
    from that in a random network
  • Is that a big surprise?
  • Analogy matters
  • Paradigm entropy (conditional entropy) might be a
    model for paradigm simplification

43
Experiment 3: Large-scale multi-agent
evolutionary modeling with learning (work in
progress)
44
Synopsis
  • System is seeded with a grammar and small number
    of agents
  • Initial grammars all show an agglutinative
    pattern
  • Each agent randomly selects a set of phonetic
    rules to apply to forms
  • Agents are assigned to one of a small number of
    social groups
  • 2 parents beget child agents.
  • Children are exposed to a predetermined number of
    training forms combined from both parents
  • Forms are presented proportional to their
    underlying frequency
  • Children must learn to generalize to unseen slots
    for words
  • Learning algorithm similar to:
  • David Yarowsky and Richard Wicentowski (2000)
    "Minimally supervised morphological analysis by
    multimodal alignment." Proceedings of ACL-2000,
    Hong Kong, pages 207-216.
  • Features include the last n characters of the
    input form, plus semantic class
  • Learners select the optimal surface form to
    derive other forms from ("optimal" meaning the one
    requiring the simplest resulting ruleset: a
    Minimum Description Length criterion)
  • Forms are periodically pooled among all agents
    and the n best forms are kept for each word and
    each slot
  • Population grows, but is kept in check by
    natural disasters and a quasi-Malthusian model
    of resource limitations
  • Agents age and die according to reasonably
    realistic mortality statistics

45
Population growth, 300 years
46
Phonological rules
  • c_assimilation
  • c_lenition
  • degemination
  • final_cdel
  • n_assimilation
  • r_syllabification
  • umlaut
  • v_nasalization
  • voicing_assimilation
  • vowel_apocope
  • vowel_coalescence
  • vowel_syncope

Character classes:
K = [ptkbdgmnNfvTDszSZxGCJlrhX]  L = [wy]
V = [aeiouAEIOU@0âêîôûÂÊÎÔÛãõÕ]

Regressive voicing assimilation:
b -> p / _ [ptkfTsSxC]    d -> t / _ [ptkfTsSxC]
g -> k / _ [ptkfTsSxC]    D -> T / _ [ptkfTsSxC]
z -> s / _ [ptkfTsSxC]    Z -> S / _ [ptkfTsSxC]
G -> x / _ [ptkfTsSxC]    J -> C / _ [ptkfTsSxC]

Intervocalic lenition:
[td] -> D / [aeiouâêîôûã] _ [aeiouâêîôûã]
[pb] -> v / [aeiouâêîôûã] _ [aeiouâêîôûã]
[gk] -> G / [aeiouâêîôûã] _ [aeiouâêîôûã]
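One of these rules, regressive voicing assimilation, can be sketched as a regex rewrite (a reimplementation for illustration, not the system's rule engine):

```python
import re

# Voiceless obstruent class from the rules above
VOICELESS = "ptkfTsSxC"
PAIRS = {"b": "p", "d": "t", "g": "k", "D": "T",
         "z": "s", "Z": "S", "G": "x", "J": "C"}

def voicing_assimilation(form):
    """Regressive voicing assimilation: a voiced obstruent devoices
    before a voiceless one, e.g. b -> p / _ [ptkfTsSxC].
    The lookahead keeps the conditioning segment unconsumed."""
    pattern = re.compile("([%s])(?=[%s])" % ("".join(PAIRS), VOICELESS))
    return pattern.sub(lambda m: PAIRS[m.group(1)], form)

print(voicing_assimilation("Agsaf"))    # Aksaf
print(voicing_assimilation("Abogako"))  # Abogako (no change)
```

This reproduces the Agsaf -> Aksaf devoicing visible in the sample runs later in the deck.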
47
Example run
  • Initial paradigm
  • Abog placc Abogmeon
  • Abog pldat Abogmeke
  • Abog plgen Abogmei
  • Abog plnom Abogmeko
  • Abog sgacc Abogaon
  • Abog sgdat Abogake
  • Abog sggen Abogai
  • Abog sgnom Abogako
  • NUMBER: 'a' sg 0.7; 'me' pl 0.3
  • CASE: 'ko' nom 0.4; 'on' acc 0.3; 'i' gen
    0.2; 'ke' dat 0.1
  • PHONRULE_WEIGHTING = 0.60
  • NUM_TEACHING_FORMS = 1500
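Under these seed settings, a form is just base + number affix + case affix. A sketch of generating the paradigm and sampling frequency-weighted teaching forms (an illustration using the slide's affixes and probabilities, not the actual system):

```python
import random

# (affix, label, probability) triples from the initial grammar
NUMBER = [("a", "sg", 0.7), ("me", "pl", 0.3)]
CASE = [("ko", "nom", 0.4), ("on", "acc", 0.3),
        ("i", "gen", 0.2), ("ke", "dat", 0.1)]

def paradigm(base):
    """Agglutinative pattern: base + number affix + case affix."""
    return {(n, c): base + na + ca
            for na, n, _ in NUMBER for ca, c, _ in CASE}

def sample_teaching_forms(base, k, rng):
    """Forms are presented proportional to their slot frequency."""
    slots, weights = [], []
    for na, _, pn in NUMBER:
        for ca, _, pc in CASE:
            slots.append(base + na + ca)
            weights.append(pn * pc)
    return rng.choices(slots, weights=weights, k=k)

p = paradigm("Abog")
print(p[("sg", "nom")])   # Abogako
print(p[("pl", "acc")])   # Abogmeon
```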

48
Behavior of agent 4517 at 300 years
Abog placc Abogmeon Abog pldat
Abogmeke Abog plgen Abogmei Abog plnom
Abogmeko Abog sgacc Abogaon Abog
sgdat Abogake Abog sggen Abogai Abog
sgnom Abogako
Abog placc Abogmeô Abog pldat
Abogmeke Abog plgen Abogmei Abog plnom
Abogmeko Abog sgacc Abogaô Abog sgdat
Abogake Abog sggen Abogai Abog sgnom
Abogako
lArpux placc lArpuxmeô lArpux pldat
lArpuxmeGe lArpux plgen lArpuxmei lArpux
plnom lArpuxmeGo lArpux sgacc lArpuxaô
lArpux sgdat lArpuxaGe lArpux sggen
lArpuxai lArpux sgnom lArpuxaGo
lIdrab placc lIdravmeô lIdrab pldat
lIdrabmeke lIdrab plgen lIdravmei lIdrab
plnom lIdrabmeGo lIdrab sgacc
lIdravaô lIdrab sgdat lIdravaGe lIdrab
sggen lIdravai lIdrab sgnom lIdravaGo
59 paradigms covering 454 lexemes
49
Another run
50
Another run
  • Initial paradigm
  • Adgar placc Adgarmeon
  • Adgar pldat Adgarmeke
  • Adgar plgen Adgarmei
  • Adgar plnom Adgarmeko
  • Adgar sgacc Adgaraon
  • Adgar sgdat Adgarake
  • Adgar sggen Adgarai
  • Adgar sgnom Adgarako
  • PHONRULE_WEIGHTING = 0.80
  • NUM_TEACHING_FORMS = 1500

51
Behavior of agent 5061 at 300 years
Albir placc Elbirmen Albir pldat
ElbirmeGe Albir plgen Elbirm Albir plnom
ElbirmeGo Albir sgacc Elbiran Albir
sgdat Elbira Albir sggen Elbi Albir
sgnom Elbira
Abog placc Abogmeon Abog pldat
Abogmeke Abog plgen Abogmei Abog plnom
Abogmeko Abog sgacc Abogaon Abog
sgdat Abogake Abog sggen Abogai Abog
sgnom Abogako
rIsxuf placc rIsxufamen rIsxuf pldat
rIsxufamke rIsxuf plgen rIsxufme rIsxuf
plnom rIsxufmeGo rIsxuf sgacc
rIsxufan rIsxuf sgdat rIsxufaGe rIsxuf
sggen rIsxufa rIsxuf sgnom rIsxufaGo
Utber placc Ubbermen Utber pldat
UbbermeGe Utber plgen Ubberme Utber
plnom UbberameGo Utber sgacc
Ubberan Utber sgdat UbberaGe Utber sggen
Ubbera Utber sgnom UbberaGo
109 paradigms covering 397 lexemes
52
One more example
53
One more example
  • Initial paradigm as before
  • PHONRULE_WEIGHTING = 0.80
  • NUM_TEACHING_FORMS = 1000

54
Behavior of agent 4195 at 300 years
Abog placc Abogmeon Abog pldat
Abogmeke Abog plgen Abogmei Abog plnom
Abogmeko Abog sgacc Abogaon Abog
sgdat Abogake Abog sggen Abogai Abog
sgnom Abogako
Odeg placc Odm Odeg pldat Ô Odeg
plgen Odm Odeg plnom Oxm Odeg sgacc
O Odeg sgdat O Odeg sggen O Odeg
sgnom O
fApbof placc fAbofdm fApbof pldat
fAbofm fApbof plgen fAbofdm fApbof plnom
fAbofxm fApbof sgacc fAbof fApbof sgdat
fAbof fApbof sggen fAbof fApbof sgnom fAbof
dugfIp placc dikfIdm dugfIp pldat
dikfÎ dugfIp plgen dikfIdm dugfIp plnom
dikfIxm dugfIp sgacc dikfI dugfIp sgdat
dikfI dugfIp sggen dikfI dugfIp sgnom dikfI
unfEr placc ûfEdm unfEr pldat
ûfÊ unfEr plgen ûfEtm unfEr plnom
ûfExm unfEr sgacc ûfE unfEr sgdat
ûfE unfEr sggen ûfE unfEr sgnom ûfE
exgUp placc exgUdm exgUp pldat
exgÛ exgUp plgen exgUgm exgUp plnom
exgUxm exgUp sgacc exgU exgUp sgdat
exgU exgUp sggen exgU exgUp sgnom exgU
66 paradigms covering 250 lexemes
55
One final example
56
Final example
  • NUMBER: 'a' sg 0.6; 'tu' du 0.1; 'me' pl 0.3
  • CASE: 'ko' nom 0.4; 'on' acc 0.3; 'i' gen
    0.2; 'ke' dat 0.1
  • PHONRULE_WEIGHTING = 0.80
  • NUM_TEACHING_FORMS = 1000

57
Final example (some agent or other)
Abbus duacc Abbustuon Abbus dudat
Abbustuke Abbus dugen Abbustui Abbus dunom
Abbustuko Abbus placc Abbusmeon Abbus
pldat Abbusmeke Abbus plgen Abbusmei Abbus
plnom Abbusmeko Abbus sgacc Abbusaon Abbus
sgdat Abbusake Abbus sggen Abbusai Abbus
sgnom Abbusako
Agsaf duacc Aksaf Agsaf dudat
AkstuG Agsaf dugen Aksaf Agsaf dunom
Aksaf Agsaf placc Aksafm Agsaf pldat
Aksafm Agsaf plgen Aksafm Agsaf plnom
Aksafm Agsaf sgacc Aksaf Agsaf sgdat
Aksaf Agsaf sggen Aksaf Agsaf sgnom Aksaf
mampEl duacc mãpEl mampEl dudat
mãptuG mampEl dugen mãpEl mampEl dunom
mãpEl mampEl placc mãpElm mampEl pldat
mãpElrm mampEl plgen mãpElm mampEl plnom
mãpElm mampEl sgacc mãpEl mampEl sgdat
mãpEl mampEl sggen mãpEl mampEl sgnom mãpEl
odEs duacc odEs odEs dudat ottuG odEs
dugen odEs odEs dunom oktuG odEs
placc odEsm odEs pldat odEsrm odEs
plgen odEsm odEs plnom odEskm odEs
sgacc odEs odEs sgdat odEs odEs sggen
odEs odEs sgnom odEs
rIndar duacc rÎdar rIndar dudat
rÎttuG rIndar dugen rÎdar rIndar dunom
rÎktuG rIndar placc rÎdarm rIndar pldat
rÎdarm rIndar plgen rÎdarm rIndar plnom
rÎdarm rIndar sgacc rÎdar rIndar sgdat
rÎdar rIndar sggen rÎdar rIndar sgnom rÎdar
171 paradigms covering 228 lexemes
58
Questions
  • Are there too many paradigms?
  • Is there too much irregularity?

59
How many paradigms can there be?
  • Russian nouns belong to one of three declension
    patterns. (Wade, Terence (1992) A Comprehensive
    Russian Grammar. Oxford: Blackwell)
  • Wade discusses many subclasses
  • From Zaliznjak, A. (1987) Grammaticheskij slovar'
    russkogo jazyka. Moscow: Russkij jazyk:
  • at least 500 classes spread over 55,000 nouns

60
How irregular can things be? Hindi/Urdu Number
Names
61
Future work
  • More realistic learning
  • Incorporate paradigm reduction and analogy
    mechanisms from Experiment 2
  • Add other sources of variation, such as borrowing
    of other forms
  • Develop evaluation metrics
  • Can we go beyond "look Ma, it learns"?

62
Acknowledgments
  • Center for Advanced Studies for release time Fall
    2007
  • The National Science Foundation through TeraGrid
    resources provided by the National Center for
    Supercomputing Applications
  • Google Research grant (for infrastructure
    originally associated with another project)
  • For helpful discussion/suggestions
  • Chen Li
  • Shalom Lappin
  • Juliette Blevins
  • Les Gasser and the LEADS group
  • Audience at UIUC Linguistics Seminar