Title: Modelling language origins and evolution

Modelling the evolution of language for modellers
and non-modellers
  • Benefits of modelling
  • Pitfalls
  • How to communicate your results?

  • Computer simulations are a synthetic science
    (versus analytic science)
  • A theory is implemented as a model.
  • The model is simulated using a computer.

Advantages of computer modelling
  • CMs give us a view on processes that are difficult to
    study: old, complex or single-occurrence processes.
  • CMs allow us to study mathematically intractable,
    complex non-linear systems such as language.
  • CMs are explicit, detailed, consistent, and clear.
    But that is also their weak point. More on that later.
  • CMs, through their relative simplicity, allow experimental
    reproduction. Experimental reproduction is rare in other fields.

More advantages of computer modelling
  • CMs produce falsifiable claims.
  • This is really conducting science in the
    Popperian tradition.
  • CMs produce quantitative predictions.
  • Allowing clear and unambiguous comparison with
    real data.
  • CMs allow exploring different parameter settings
  • Evolutionary, environmental, individual and
    social factors can be easily varied.
  • CMs allow unethical experiments.
  • No permission is needed from your ethics
    committee to do language deprivation experiments
    on agents.

  • Of course… to balance all the advantages,
    computer modelling also has some disadvantages.
  • Being aware of possible problems might enable us
    to dodge them.

Caveat 1: CMs are explicit, detailed, consistent,
and clear
  • Computer models contain simplifications and
    abstractions which are immediately obvious
    because of their clear specification.
  • This makes models lightning rods for criticism.

Caveat 1: CMs are explicit, detailed, consistent,
and clear
  • Solutions
  • Obfuscate your model so everyone is awed by its
    complexity and dares not criticise it.
  • Or better, justify every choice made during the
    construction of your model and stress the
    relevance for linguistics.

Caveat 2: Too far from reality
  • We want computer models to explain cognitive or
    linguistic phenomena.
  • Examples
  • A grammar is a symbol G with a learning
  • An individual creates utterances consisting of
    strings drawn from an alphabet {a, b, c, …}
  • …
  • These abstractions make it hard for non-modellers
    to accept CM results.

Caveat 2: Too far from reality
  • The field should understand that abstraction is
    not necessarily bad.
  • Most scientific disciplines use abstraction.
    Think of physics or theoretical biology.
  • Verbal models and field research use abstraction
    and assumptions as well, but these are hardly
    ever doubted.

Caveat 3: CM is too much fun
  • Too often computer models are just run for the
    fun of it, and the goal of modelling is forgotten.
  • It is all too tempting to try yet another
    variation of a simulation or add yet another neat
    feature.
  • Eventually you end up with too much data, making
    a proper analysis impossible.

Caveat 3: CM is too much fun
  • Solution
  • Define a hypothesis which you will test using
    CM, and work towards testing this hypothesis.
  • Demonstration is good, understanding is better.
  • Do exploratory data analysis: look beneath the
    immediate results for explanations.
  • Look for variability: which parameters have an
    influence on the results? What you are looking
    for is a causal effect.

Caveat 4: CMs are not embedded in the field
  • Sometimes CMs and their results are solitary.
  • Models and results are not brought to bear on
    existing theories or existing empirical data.

Caveat 4: data should be related back to other work
  • Solution
  • Start from a claim, and look for existing
    theories in the field.
  • Empirical data is wonderful if you can lay your
    hands on it. But be aware that making the link
    between empirical data and your results is often
    very difficult.
  • Explain how your results might shed new light on
    existing theories, but don't be overconfident.

Caveat 5: magic numbers
  • When building models, one inescapably ends up
    introducing magic numbers.
  • Learning rate for a neural network, merging
    parameter for categories, number of possible
    grammars, …
  • Sometimes magic numbers are inherent to the
    phenomenon you're studying (like in physics).

Caveat 5: magic numbers
  • Solution
  • Try to avoid magic numbers (easier said than done).
  • Try to choose extreme values; this polarises your results.
  • Learning rate is either 0 for a memory-less
    learner, or 1 for a batch-learner (cf. Gold,
    1967; Nowak, 2001; Zuidema, 2003).
  • Find optimal values for magic numbers.
  • Using some kind of optimisation (e.g. K. Smith, …).
  • Justify the magic numbers as well as possible.
  • Could the magic numbers be the important result
    of your research?
  • Try to make your results insensitive to them
    (see the sketch below).
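A minimal sketch of such a sensitivity check, assuming Python;
run_simulation and the swept values are purely illustrative
placeholders for your own model and magic number:

    import random

    def run_simulation(learning_rate, seed):
        # Illustrative stand-in for the real model; it just returns a
        # noisy outcome that happens to depend on the magic number.
        random.seed(seed)
        return random.random() * (1 - abs(learning_rate - 0.5))

    # Sweep the magic number and repeat with several seeds to see
    # whether the outcome actually depends on it.
    for lr in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
        outcomes = [run_simulation(lr, seed) for seed in range(20)]
        print(lr, sum(outcomes) / len(outcomes))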

Caveat 6: reification
  • Your model is an abstraction of reality.
  • Even though it behaves as the real thing, are you
    allowed to make claims about the real thing based
    on an abstract model?
  • Are you sure that the dynamics of your model are
    similar to what goes on in the real world? Do
    submarines swim?

Caveat 6: reification
  • Solutions
  • Again, the field should understand that
    abstraction is not necessarily bad.
  • Make sure that you do not present simulation
    results as the truth and nothing but the truth.
    CMs do not provide proof!
  • CM is an exploratory tool, and should, if
    possible, be checked against hard data.

Some more practical advice
  • Good advice (that each of us has neglected at some
    point) for doing computational modelling.

  • A control is an experiment in which the
    hypothesized cause is left out
  • So the hypothesized effect should not occur
  • Be aware that placebo effects might occur,
    rendering your control experiment worthless.

  • Control experiments provide a baseline to check
    your results against.
  • How successful are agents at communicating if
    they randomly generate syntactic rules (instead
    of using grammatical induction)?
  • Are the results where agents use grammatical
    induction significantly better? (See the sketch below.)
  • Without a baseline, your results are meaningless.
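A minimal sketch of such a baseline comparison, assuming Python;
the two score lists stand in for per-run communicative-success
measurements and are invented purely for illustration:

    import random, statistics

    # Invented per-run communicative-success scores for the two conditions.
    baseline  = [random.gauss(0.45, 0.05) for _ in range(30)]   # random syntactic rules
    induction = [random.gauss(0.70, 0.05) for _ in range(30)]   # grammatical induction

    # Permutation test: how often does a random relabelling of the runs give
    # a difference in means at least as large as the one we observed?
    observed = statistics.mean(induction) - statistics.mean(baseline)
    pooled = baseline + induction
    extreme = 0
    for _ in range(10_000):
        random.shuffle(pooled)
        extreme += statistics.mean(pooled[30:]) - statistics.mean(pooled[:30]) >= observed
    print(observed, extreme / 10_000)   # a small p-value: induction beats the baseline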

Hypothesis testing
  • Different ways to interpret results
  • Exploratory data analysis: looking for patterns
    in the data, often after filtering the data with
    statistical methods.
  • Hypothesis testing, however, remains superior.

Hypothesis testing
  • Example: toss a coin ten times, observe eight
    heads. Is the coin fair (i.e., what is its long-run
    behavior?) and what is your residual uncertainty?
  • You say, "If the coin were fair, then eight or
    more heads is pretty unlikely, so I think the
    coin isn't fair."
  • Proof by contradiction: assert the opposite (the
    coin is fair), show that the sample result (≥ 8
    heads) has low probability p, reject the
    assertion, with residual uncertainty related to p.
  • Estimate p with a sampling distribution.

(From Cohen, Gent & Walsh)
Hypothesis testing
  • If the coin were fair (p = 0.5, the null
    hypothesis), what is the probability distribution
    of r, the number of heads, obtained in N tosses
    of a fair coin? Get it analytically or estimate
    it by simulation (on a computer):
  • Loop K times
  • r = 0          ; r is the number of heads in N tosses
  • Loop N times   ; simulate the tosses
  • Generate a random 0 ≤ x < 1.0
  • If x < p, increment r   ; p is the probability of a head
  • Push r onto sampling_distribution
  • Print sampling_distribution
  • (A runnable version of this pseudocode follows below.)
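A minimal runnable version of the pseudocode above, assuming Python
(the names K, N and p follow the slide):

    import random

    K = 10_000   # number of simulated experiments
    N = 10       # tosses per experiment
    p = 0.5      # probability of heads under the null hypothesis

    sampling_distribution = []
    for _ in range(K):
        r = sum(1 for _ in range(N) if random.random() < p)   # heads in N tosses
        sampling_distribution.append(r)

    # Estimated probability of 8 or more heads under H0
    print(sum(1 for r in sampling_distribution if r >= 8) / K)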

Hypothesis testing
  • 10,000 repetitions of 10 tosses produce this distribution
  • This is an estimated distribution, obtained using
    Monte Carlo sampling
  • The probability of 8 or more heads in N = 10 tosses
    is 0.057
  • As this probability is very low, we can reject
    the null hypothesis (H0: the coin is fair).
  • p = 0.057 is the residual uncertainty.
    (An analytic check follows below.)
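The same tail probability can also be computed analytically from the
binomial distribution; a small check in Python (standard library only):

    from math import comb

    N, p = 10, 0.5
    # P(r >= 8 heads) under H0 = sum of binomial probabilities for r = 8, 9, 10
    tail = sum(comb(N, r) * p**r * (1 - p)**(N - r) for r in range(8, N + 1))
    print(tail)   # 0.0546875, close to the Monte Carlo estimate of 0.057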

Dos and don'ts…
  • Don't throw away old code
  • When programming, keep a log of all program code
    and all parameter settings (see the sketch below).
  • Use version control.
  • Don't change two things at once in your experiments.
  • You will never know which parameter caused what.
  • Do collect all your data
  • But be reasonable about this. Gigabyte-sized data
    files are often of little use.
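A minimal sketch of such bookkeeping, assuming Python; the parameter
names, file name and result value are made up for illustration:

    import json, random, time

    params = {"learning_rate": 0.1, "population_size": 50, "seed": 12345}  # illustrative
    random.seed(params["seed"])        # fixing the seed keeps the run reproducible

    # ... run the simulation here and collect the results ...
    results = {"communicative_success": 0.82}   # placeholder outcome

    # Append one JSON line per run, so settings and results stay together.
    log = {"time": time.strftime("%Y-%m-%d %H:%M:%S"), "params": params, "results": results}
    with open("run_log.json", "a") as f:
        f.write(json.dumps(log) + "\n")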

Dos and don'ts…
  • Repeat your experiments
  • Using different settings, different random seeds, …
  • Make sure your experiments are reproducible
    (don't end up with a cold fusion experience).
  • Don't trust yourself on bugs
  • Time and time again, tiny bugs are discovered in
    code that was thought to be flawless.
  • Do look at the raw data (see the illustration below).
  • Statistical measures often obfuscate results
    (e.g. outliers are averaged away).
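A tiny illustration of how averaging can hide what the raw data shows,
assuming Python; the numbers are invented:

    # Two sets of runs with identical means but very different raw data.
    smooth = [0.50, 0.51, 0.49, 0.50, 0.50]
    spiky  = [0.10, 0.90, 0.50, 0.45, 0.55]
    print(sum(smooth) / len(smooth), sum(spiky) / len(spiky))  # both 0.5
    print(smooth, spiky)   # only the raw data reveals the outliers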

Dos and don'ts
  • Make a fast implementation
  • When your program runs faster, you will do more
    experiments and explore more parameter settings

  • Eventually you want to communicate your
    simulation results to others. How to do that?
  • Bridging the gap between modellers and
    non-modellers using communication.

Hallmarks of a good experimental paper
  • Clearly define your goals and claims
  • Perform a large scale test
  • Both in number and size of instances
  • Use a mixture of problems
  • Real-world, random, standard benchmarks, ...
  • Do a statistical analysis of results

(source: Bernard Moret & David Johnson)
Hallmarks continued
  • Place your work in context
  • Compare your work to other work in the field.
  • Mention work by others
  • Ensure reproducibility
  • Forces you to be clear.
  • Adds support to your claims.
  • Publish code and data on the web.
  • Ensure comparability
  • Makes it easier for others to check your results.
  • Report all experimental settings.
  • Do not hide anomalous results.

Pitfalls
  • Result could be predicted by a back-of-envelope
    calculation.
  • Bad experimental setup
  • Too few experiments.
  • Being happy with one lucky run.
  • Poor presentation of data
  • Lack of statistics.
  • No mention of a baseline.
  • Too much statistics, thus neglecting the raw data.

Pitfalls continued
  • Failing to report key implementation issues.
  • Extrapolating from tiny samples.
  • Drawing conclusions not supported by the data.
  • Ignoring the literature.

Resistance against modelling
  • Modellers often have to answer critical remarks
    from non-modellers.
  • A survey among 30 experienced researchers in the
    field has yielded the following themes.

How can you validate this model?
  • Often a mistaken assumption that simulation
    models must be realistic and hence calibrated
    against real data.
  • Or neglect on the part of the modeller to make
    the results falsifiable.

You've built in the result"
  • Show how there are parameter settings for the
    model where the particular result in question
    does not emerge.
  • Be clear about what hypotheses the model is
    testing and maintain a clear distinction
    between data, model and theory.

This model stands on its own and has no relation
with any linguistic phenomenon
  • This is only caused by neglecting the existing
    literature.
  • Always embed your model in the proper
    cognitive/linguistic context.
  • Often modellers do not start from empirical data.
  • An appeal for building models on existing research.

It is possible to build models which come up
with contrary results - how can you 'prove' which
is correct?
  • Every model hinges on its initial assumptions;
    these should be clearly defined and maintained
    throughout the model.
  • Your model is only as good as the initial
    assumptions it is based on.

Your model uses evolutionary computing
techniques, but language does not evolve - it is learned
  • There often is confusion between the techniques
    used and the phenomena which are studied.
  • It is not because some parameter is optimized
    using genetic algorithms that the phenomenon
    itself evolves.
  • One should also realize that genetic algorithms
    are by no means a model of evolution, but rather
    an optimization technique.

I liked your talk. I study Mayan grammatical
constructions, can you incorporate this in your model?
  • This is a misapprehension about simple idealistic
    models - they are not intended to be exhaustive,
    but instead directed at testing a specific
    hypothesis.

Where do modellers publish?
  • Journals sympathetic to computational modelling
  • Artificial Life.
  • Adaptive Behavior.
  • Journal of Artificial Societies and Social Simulation
  • Artificial Intelligence
  • Others
  • Complex Systems
  • Journal of Theoretical Biology.
  • Connection Science
  • Studies in Language
  • Advances in Complex Systems
  • Proceedings of the Royal Society of London,
    Series B
  • Brain and Language
  • Cognitive Science
  • Trends in Cognitive Sciences
  • Verbum
  • Language Typology
  • Sprachtypologie und Universalienforschung
  • Language and Cognitive Processes

Where do modellers gather?
  • Evolution of Language Conference
  • International Conference on Artificial Life
  • European Conference on Artificial Life
  • From Animals to Animats: the Simulation of Adaptive
    Behavior conference
  • Emergence and Evolution of Linguistic Communication
  • …

What tools do modellers use?
  • Programming languages
  • C, C++, Lisp, Objective CAML, Prolog, Scheme,
    Perl, Java, …
  • Mathematical packages
  • Matlab, Maxima, …
  • Visualization tools
  • GNUplot, xfig, Grace (open source and free tools)
  • MS Excel (for graph plotting)
  • Miscellaneous
  • Tlearn (neural net package), PHYLIP (phylogenetic
    tree reconstruction)
  • NSL simulation environment (neural networks)
  • SPSS (statistics)
  • Praat (phonetics simulator)

Take home messages
  • Non-modellers have a hard time understanding your
    terminology and techniques. Explain and justify
    anything you do.
  • Non-modellers often fail to see the usefulness of
    modelling. Place your model in a context and place
    your results in that context. Demonstrate how
    your results provide insights that could not be
    obtained from pen-and-paper analysis.
  • Don't do modelling for modelling's sake. Take a
    concrete problem and tackle it.

  • Evolution of language resources: http://www.isrl.u…
  • These slides, code and miscellaneous
    stuff: http://…/paulv/tutorial.html