Modelling language origins and evolution

1
Modelling the evolution of language
for modellers and non-modellers
  • Benefits of modelling
  • Pitfalls
  • How to communicate your results?

2
Recapitulation
  • Computer simulation is a synthetic science
    (as opposed to an analytic science).
  • A theory is implemented as a model.
  • The model is simulated using a computer.

3
Advantages of computer modelling
  • CMs give us a view of processes that are
    difficult to study directly:
  • old, complex, or single-occurrence processes.
  • CMs allow us to study mathematically intractable
    problems.
  • Complex non-linear systems such as language.
  • CMs are explicit, detailed, consistent, and
    clear.
  • But that is also their weak point. More on that
    later.
  • CMs, through their relative simplicity, allow
    verification.
  • Experimental reproduction is rare in other
    disciplines.

4
More advantages of computer modelling
  • CMs produce falsifiable claims.
  • This is really conducting science in the
    Popperian tradition.
  • CMs produce quantitative predictions.
  • Allowing clear and unambiguous comparison with
    real data.
  • CMs allow exploring different parameter settings.
  • Evolutionary, environmental, individual and
    social factors can be easily varied.
  • CMs allow experiments that would be unethical in
    reality.
  • No permission is needed from your ethics
    committee to run language-deprivation experiments
    on agents.

5
Caveats
  • Of course, to balance all the advantages,
    computer modelling also has some disadvantages.
  • Being aware of possible problems might enable us
    to avoid them.

6
Caveat 1: CMs are explicit, detailed, consistent,
and clear
  • Computer models contain simplifications and
    abstractions which are immediately obvious
    because of their clear specification.
  • This makes models lightning rods for criticism.

7
Caveat 1: CMs are explicit, detailed, consistent,
and clear
  • Solutions
  • Obfuscate your model so everyone is awed by its
    complexity and dares not criticise it.
  • Or better, justify every choice made during the
    construction of your model and stress the
    relevance for linguistics.

8
Caveat 2: Too far from reality
  • We want computer models to explain cognitive or
    linguistic phenomena.
  • Examples
  • A grammar is a symbol G with a learning
    probability.
  • An individual creates utterances consisting of
    strings drawn from an alphabet {a, b, c, ...}
    (see the sketch after this list).
  • These abstractions make it hard for non-modellers
    to accept CM results.
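
  • To make the second abstraction concrete, here is a
    minimal sketch of such an utterance generator
    (entirely illustrative, not from the original slides):

    import random

    ALPHABET = "abc"

    def utter(max_len=5):
        # an "utterance" is just a random string over the alphabet
        return "".join(random.choice(ALPHABET)
                       for _ in range(random.randint(1, max_len)))

    print([utter() for _ in range(3)])  # e.g. ['ba', 'acbca', 'c']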

9
Caveat 2: Too far from reality
  • The field should understand that abstraction is
    not necessarily bad.
  • Most scientific disciplines use abstraction.
    Think of physics or theoretical biology.
  • Verbal models and field research use abstraction
    and assumptions as well, but these are hardly
    ever doubted.

10
Caveat 3: CM is too much fun
  • Too often computer models are just run for the
    fun of it, and the goal of modelling is
    neglected.
  • It is all too tempting to try yet another
    variation of a simulation or add yet another neat
    feature.
  • Eventually you end up with too much data, making
    a proper analysis impossible.

11
Caveat 3: CM is too much fun
  • Solution
  • Define a hypothesis which you will test using
    CM, and work towards testing this hypothesis.
  • Demonstration is good, understanding is better.
  • Do exploratory data analysis: look beneath the
    immediate results for explanations.
  • Look for variability: which parameters influence
    the results? What you are looking for is a causal
    effect.

12
Caveat 4: CMs are not embedded in the field
  • Sometimes CMs and their results stand alone.
  • Models and results are not related to existing
    theories or existing empirical data.

13
Caveat 4: data should be related back to other
disciplines
  • Solution
  • Start from a claim, and look for existing
    theories in the field.
  • Empirical data is wonderful if you can lay your
    hands on it. But be aware that making the link
    between empirical data and your results is often
    very difficult.
  • Explain how your results might shed new light on
    existing theories, but don't be overconfident.

14
Caveat 5: magic numbers
  • When building models, one inescapably ends up
    introducing magic numbers.
  • Learning rate for a neural network, merging
    parameter for categories, number of possible
    grammars, ...
  • Sometimes magic numbers are inherent to the
    phenomenon you're studying (as in physics).

15
Caveat 5: magic numbers
  • Solution
  • Try to avoid magic numbers (easier said than
    done).
  • Try to choose extreme values; this polarises your
    argument.
  • The learning rate is either 0 for a memory-less
    learner, or 1 for a batch learner (cf. Gold,
    1967; Nowak, 2001; Zuidema, 2003).
  • Find optimal values for magic numbers.
  • Using some kind of optimisation (e.g. K. Smith,
    2003).
  • Justify the magic numbers as well as possible.
  • Could the magic numbers be the important result
    of your research?
  • Try to make your results insensitive to them, as
    in the sketch below.
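
  • A minimal sketch of such a sensitivity check,
    assuming a run_model function that takes the magic
    number as a parameter (the function and its values
    are illustrative):

    def run_model(learning_rate):
        # illustrative stand-in for a full simulation run;
        # returns some outcome measure, e.g. communicative success
        return 0.8 - 0.01 * abs(learning_rate - 0.5)

    # sweep the magic number and check that the outcome is stable
    for learning_rate in [0.0, 0.25, 0.5, 0.75, 1.0]:
        print(learning_rate, run_model(learning_rate))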

16
Caveat 6: reification
  • Your model is an abstraction of reality.
  • Even though it behaves as the real thing, are you
    allowed to make claims about the real thing based
    on an abstract model?
  • Are you sure that the dynamics of your model are
    similar to what goes on in the real world? Do
    submarines swim?

17
Caveat 6: reification
  • Solutions
  • Again, the field should understand that
    abstraction is not necessarily bad.
  • Make sure that you do not present simulation
    results as the truth and nothing but the truth.
    CMs do not provide proof!
  • CM is an exploratory tool, and should, if
    possible, be checked against hard data.

18
Some more practical advice
  • Good advice for doing computational modelling
    (advice that each of us neglected once upon a
    time).

19
Control
  • A control is an experiment in which the
    hypothesized cause is left out,
  • so the hypothesized effect should not occur
    either.
  • Be aware that placebo effects might occur,
    rendering your control experiment worthless.

20
Control
  • Control experiments provide a baseline to check
    your results against.
  • How successful are agents at communicating if
    they randomly generate syntactic rules (instead
    of using grammatical induction)?
  • Are the results where agents use grammatical
    induction significantly better? (See the sketch
    below.)
  • Without a baseline, your results are meaningless.
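
  • A minimal sketch of such a baseline comparison
    using a permutation test; the success scores and
    run counts are illustrative, not real results:

    import random

    induction = [0.82, 0.79, 0.85, 0.81, 0.84]  # illustrative scores, grammatical induction
    baseline = [0.55, 0.61, 0.48, 0.57, 0.52]   # illustrative scores, random rules (control)

    observed = sum(induction) / len(induction) - sum(baseline) / len(baseline)

    # how often does a random relabelling of runs produce a difference this large?
    pooled, n, count = induction + baseline, len(induction), 0
    for _ in range(10_000):
        random.shuffle(pooled)
        if sum(pooled[:n]) / n - sum(pooled[n:]) / n >= observed:
            count += 1
    print("difference:", observed, "p =", count / 10_000)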

21
Hypothesis testing
  • There are different ways to interpret results.
  • Exploratory data analysis: looking for patterns
    in the data, often after filtering the data with
    statistical methods.
  • Hypothesis testing, however, remains superior.

22
Hypothesis testing
  • Example: toss a coin ten times and observe eight
    heads. Is the coin fair (i.e., what is its
    long-run behavior?) and what is your residual
    uncertainty?
  • You say, "If the coin were fair, then eight or
    more heads would be pretty unlikely, so I think
    the coin isn't fair."
  • Proof by contradiction: assert the opposite (the
    coin is fair); show that the sample result (>= 8
    heads) has low probability p; reject the
    assertion, with residual uncertainty related to
    p.
  • Estimate p with a sampling distribution.

(From Cohen, Gent & Walsh)
23
Hypothesis testing
  • If the coin were fair (p = 0.5, the null
    hypothesis), what is the probability distribution
    of r, the number of heads obtained in N tosses
    of a fair coin? Get it analytically or estimate
    it by simulation (on a computer):
  • Loop K times:
  •   r = 0 (r is the number of heads in N tosses)
  •   Loop N times (simulate the tosses):
  •     Generate a random 0 <= x < 1.0
  •     If x < p, increment r (p is the probability
        of a head)
  •   Push r onto sampling_distribution
  • Print sampling_distribution
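
  • A runnable Python version of this pseudocode (a
    minimal sketch; variable names follow the slide):

    import random

    K = 10_000  # number of simulated experiments
    N = 10      # tosses per experiment
    p = 0.5     # probability of a head (null hypothesis: a fair coin)

    sampling_distribution = []
    for _ in range(K):
        r = sum(1 for _ in range(N) if random.random() < p)  # heads in N tosses
        sampling_distribution.append(r)
    print(sampling_distribution[:20])  # first few entries of the distribution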

24
Hypothesis testing
  • 10,000 runs of 10 tosses produce the sampling
    distribution (histogram not reproduced here).
  • This is an estimated distribution, obtained using
    Monte Carlo sampling.
  • The probability of 8 or more heads in N = 10
    tosses is 0.057.
  • As this probability is low, we can reject the
    null hypothesis (H0: the coin is fair).
  • p = 0.057 is the residual uncertainty.
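
  • The Monte Carlo estimate can be checked
    analytically with the binomial distribution (a
    minimal sketch; math.comb needs Python 3.8+):

    import math

    N, p = 10, 0.5
    # exact probability of 8 or more heads under the null hypothesis
    p_value = sum(math.comb(N, r) * p**r * (1 - p)**(N - r)
                  for r in range(8, N + 1))
    print(p_value)  # about 0.0547, matching the estimate of 0.057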

25
Dos and don'ts
  • Don't throw away old code.
  • When programming, keep a log of all program code
    and all parameter settings (a minimal logging
    sketch follows this list).
  • Use version control.
  • Don't change two things at once in your
    simulation:
  • you will never know which parameter caused what.
  • Do collect all your data.
  • But be reasonable about this. Gigabyte-sized data
    files are often of little use.
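
  • A minimal sketch of such a settings log, assuming
    a simple dictionary of parameters (the file name
    and structure are illustrative):

    import json
    import random
    import time

    params = {"K": 10_000, "N": 10, "p": 0.5, "seed": 42}  # illustrative settings
    random.seed(params["seed"])  # fix the seed so the run can be reproduced

    # append the settings of every run to a log file
    with open("experiment_log.jsonl", "a") as log:
        log.write(json.dumps({"time": time.time(), "params": params}) + "\n")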

26
Dos and don'ts
  • Repeat your experiments
  • using different settings, different random seeds,
    ... (see the sketch after this list).
  • Make sure your experiments are reproducible
    (don't end up with a cold-fusion experience).
  • Don't trust yourself on bugs:
  • time and time again, tiny bugs are discovered in
    code that was thought to be flawless.
  • Do look at the raw data.
  • Statistical measures often obfuscate results
    (e.g. outliers are averaged away).
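
  • A minimal sketch of repeating an experiment over
    several seeds; run_experiment is an illustrative
    stand-in for a full simulation run:

    import random

    def run_experiment(N=10, p=0.5):
        # illustrative stand-in: heads in N tosses
        return sum(1 for _ in range(N) if random.random() < p)

    for seed in [1, 2, 3, 4, 5]:  # repeat with different random seeds
        random.seed(seed)
        print(seed, run_experiment())  # each run is now reproducible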

27
Dos and don'ts
  • Make a fast implementation.
  • When your program runs faster, you will do more
    experiments and explore more parameter settings.

28
Communication
  • Eventually you want to communicate your
    simulation results to others. How to do that?
  • Bridging the gap between modellers and
    non-modellers through communication.

29
Hallmarks of a good experimental paper
  • Clearly define your goals and claims
  • Perform a large-scale test
  • Both in number and size of instances
  • Use a mixture of problems
  • Real-world, random, standard benchmarks, ...
  • Do a statistical analysis of results

(source: Bernard Moret & David Johnson)
30
Hallmarks continued
  • Place your work in context
  • Compare your work to other work in the field.
  • Mention work by others
  • Ensure reproducibility
  • Forces you to be clear.
  • Adds support to your claims.
  • Publish code and data on the web.
  • Ensure comparability
  • Makes it easier for others to check your results.
  • Report all experimental settings.
  • Do not hide anomalous results.

31
Pitfalls
  • Result could be predicted by a back-of-envelope
    calculation.
  • Bad experimental setup:
  • Too few experiments.
  • Being happy with one lucky run.
  • Poor presentation of data:
  • Lack of statistics.
  • No mention of a baseline.
  • Too much statistics, thus neglecting the raw data.

32
Pitfalls continued
  • Failing to report key implementation issues.
  • Extrapolating from tiny samples.
  • Drawing conclusions not supported by the data.
  • Ignoring the literature.

33
Resistance against modelling
  • Modellers often have to answer critical remarks
    from non-modellers.
  • A survey among 30 experienced researchers in the
    field has yielded the following themes.

34
How can you validate this model?
  • Often a mistaken assumption that simulation
    models must be realistic and hence calibrated
    against real data.
  • Or neglect on the part of the modeller to make
    the results falsifiable.

35
"You've built in the result"
  • Show how there are parameter settings for the
    model where the particular result in question
    does not emerge.
  • Be clear about which hypotheses the model is
    testing, and maintain a clear distinction between
    data, model, and theory.

36
This model stands on its own and has no relation
to any linguistic phenomenon
  • This is caused only by neglecting the existing
    literature.
  • Always embed your model in the proper
    cognitive/linguistic context.
  • Often modellers do not start from empirical data.
  • An appeal for building models on existing
    research.

37
It is possible to build models which come up
with contrary results - how can you 'prove' which
is correct?
  • Every model hinges on its initial assumptions;
    these should be clearly defined and maintained
    throughout the model.
  • Your model is only as good as the initial
    assumptions it is based on.

38
Your model uses evolutionary computing
techniques, but language does not evolve - it is
learned
  • There is often confusion between the techniques
    used and the phenomena being studied.
  • The fact that some parameter is optimized using
    genetic algorithms does not make the phenomenon
    evolutionary.
  • One should also realize that genetic algorithms
    are by no means a model of evolution, but rather
    an optimization technique.

39
I liked your talk. I study Mayan grammatical
constructions, can you incorporate this in your
model?
  • This is a misapprehension about simple idealized
    models - they are not intended to be exhaustive,
    but instead directed at testing a specific
    hypothesis.

40
Where do modellers publish?
  • Journals sympathetic to computational modelling
  • Artificial Life
  • Adaptive Behavior
  • Journal of Artificial Societies and Social
    Simulation
  • Artificial Intelligence
  • Others
  • Complex Systems
  • Journal of Theoretical Biology
  • Connection Science
  • Studies in Language
  • Advances in Complex Systems
  • Proceedings of the Royal Society of London,
    Series B
  • Brain and Language
  • Cognitive Science
  • Trends in Cognitive Sciences
  • Verbum
  • Language Typology
  • Sprachtypologie und Universalienforschung
  • Language and Cognitive Processes

41
Where do modellers gather?
  • Evolution of Language Conference
  • International Conference on Artificial Life
  • European Conference on Artificial Life
  • From Animals to Animats: the Simulation of
    Adaptive Behavior conference
  • Emergence and Evolution of Linguistic
    Communication

42
What tools do modellers use?
  • Programming languages
  • C, C++, Lisp, Objective Caml, Prolog, Scheme,
    Perl, Java, ...
  • Mathematical packages
  • Matlab, Maxima, ...
  • Visualization tools
  • GNUplot, xfig, Grace (open source and free tools)
  • MS Excel (for graph plotting)
  • Miscellaneous
  • Tlearn (neural net package), PHYLIP (phylogenetic
    tree reconstruction)
  • NSL simulation environment (neural networks)
  • SPSS (statistics)
  • Praat (phonetics simulator)

43
Take home messages
  • Non-modellers have a hard time understanding your
    terminology and techniques. Explain and justify
    everything you do.
  • Non-modellers often fail to see the usefulness of
    modelling. Place your model in a context and place
    your results in that context. Demonstrate how
    your results provide insights that could not be
    obtained from pen-and-paper analysis.
  • Don't do modelling for modelling's sake. Take a
    concrete problem and tackle it.

44
Resources
  • Evolution of language resources:
    http://www.isrl.uiuc.edu/amag/langev
  • These slides, code and miscellaneous stuff:
    http://www.ling.ed.ac.uk/paulv/tutorial.html