New approaches to language and prehistory from typology, genetics, and quantitative linguistics

1
New approaches to language and prehistory from
typology, genetics, and quantitative linguistics
Søren Wichmann, MPI-EVA and Leiden University
2
Lecture III: The utility of phylogenetic
algorithms and software
3
Software used
  • PHYLIP (http://evolution.genetics.washington.edu/phylip.html)
  • SplitsTree (www.splitstree.org)
  • PAUP (www.sinauer.com)
  • MrBayes (www.mrbayes.net)
  • TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
8
Selecting or weighting features
  • Assumption-1: The feature value which is most
    favored in a given genus is the one that should
    be reconstructed for the proto-language of the
    genus.
  • Assumption-2: The better represented the most
    favored feature value in a given genus is, the
    more stable that feature may be assumed to be.
  • Strategy: study the distribution of values of a
    given feature for each genus, then calculate the
    average, across all genera in the WALS sample,
    of how well represented the best represented
    value is.
  • Problem: how are we to compare the stability of
    features when three variables are involved: the
    number of occurrences of the best represented
    feature value, the number of possible feature
    values, and the number of languages for which the
    feature is attested in the WALS sample?

9
Selecting or weighting features (cont.)
  • Exemplification of the problem:
  • How do we compare the stability of the two
    following features in Germanic, given the
    variables indicated?

11
Solving such problems by hand: a simple example
  • k (number of possible values) = 2 (a and b)
  • n (number of languages) = 4
  • r (number of times that the best represented
    feature value occurs)
  • Distributional possibilities (2^4 = 16 in total):
      r = 4: aaaa, bbbb
      r = 3: aaab, aaba, abaa, baaa, abbb, babb, bbab, bbba
      r = 2: aabb, abab, abba, baab, baba, bbaa
  • Probabilities:
      r    probability
      4    2/16
      3    8/16
      2    6/16
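
The hand count above can be checked by brute force. The following is a minimal sketch, assuming (as the slide does implicitly) that each language takes one of the k values independently and with equal probability; the function name `r_distribution` is introduced here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def r_distribution(k, n):
    """Exact probability of each r (count of the best represented value)
    when n languages each take one of k equally likely values."""
    tally = Counter()
    for assignment in product(range(k), repeat=n):
        # r is the number of occurrences of the most frequent value
        r = max(Counter(assignment).values())
        tally[r] += 1
    total = k ** n
    return {r: Fraction(count, total) for r, count in tally.items()}


if __name__ == "__main__":
    for r, p in sorted(r_distribution(k=2, n=4).items(), reverse=True):
        print(r, p)
```

Enumeration visits all k^n assignments, so it is only practical for small n; it serves as a check on the table rather than as a general method.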

12
A formula for calculating the probability or
p-value for any set of (n, k, r)
  • k = number of possible values
  • n = number of languages
  • r = number of times that the best represented
    feature value occurs
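
The formula itself is not preserved in this transcript. As a stand-in, here is a sketch of one way to obtain such probabilities for larger n without enumerating every distribution. It assumes equally likely values, and it is not known whether the intended p-value is the probability of exactly r or of r or more, so both are returned; the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_max_at_most(k, n, m):
    """Number of ways to give n languages one of k values such that
    no single value occurs more than m times."""
    if n == 0:
        return 1
    if k == 0 or m <= 0:
        return 0
    # Choose which c languages take the first value, then recurse on the rest.
    return sum(
        comb(n, c) * count_max_at_most(k - 1, n - c, m)
        for c in range(min(m, n) + 1)
    )


def prob_exactly(k, n, r):
    """P(best represented value occurs exactly r times)."""
    hits = count_max_at_most(k, n, r) - count_max_at_most(k, n, r - 1)
    return Fraction(hits, k ** n)


def prob_at_least(k, n, r):
    """P(best represented value occurs r times or more): a tail p-value (assumed reading)."""
    return Fraction(k ** n - count_max_at_most(k, n, r - 1), k ** n)
```

For k = 2 and n = 4, `prob_exactly` reproduces the hand-computed values 2/16, 8/16 and 6/16.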
13-19
(No Transcript)
20
A sample of 5 pairs of related languages for
testing different methods (selection criterion:
best documented genealogical pairs in the WALS
dataset)
  • Athapaskan: Slave, Navajo
  • Chibchan: Ika, Rama
  • Uto-Aztecan: Yaqui, Comanche
  • Oto-Manguean: Chalcatongo Mixtec, Lealao Chinantec
  • Carib: Hixkaryana, Macushi
21
Neighbour-Joining using the 17 highest-ranking
features
22
Neighbour-Joining using the 17 lowest-ranking
features
23
A Neighbour Net representation (using SplitsTree)
24
Maximum-Parsimony analysis using PAUP
25
Bayesian analysis using MrBayes
26
Tree generated by (presumed) knowledge of
ancestral states, using 23 informative Native
American founder features
28
Effects of using p-values for weighting
Method: Branch and Bound Bootstrap Search
(Bootstrap 50% majority-rule consensus tree)

No Matrices, Equal Weighting
/------------------------------------------------------------------------ Slave(1)
|
|------------------------------------------------------------------------ Navajo(2)
|
|                                               /------------------------ Yaqui(3)
|                       /----------91-----------+
|                       |                       \------------------------ Comanche(4)
|                       |
|                       |------------------------------------------------ NezPerce(5)
|                       |
|                       |------------------------------------------------ HanisCoos(6)
\----------96-----------+
                        |                       /------------------------ ChalMixtec(7)
                        |----------85-----------+
                        |                       \------------------------ LealChinantec(8)
                        |
                        |                       /------------------------ Hixkaryana(9)
                        \----------54-----------+
                                                \------------------------ Macushi(10)

No Matrices, PValue
/------------------------------------------------------------------------ Slave(1)
|
|------------------------------------------------------------------------ Navajo(2)
|
|                                               /------------------------ Yaqui(3)
|                       /----------87-----------+
|                       |                       \------------------------ Comanche(4)
|                       |
|                       |------------------------------------------------ NezPerce(5)
|                       |
|                       |------------------------------------------------ HanisCoos(6)
\----------94-----------+
                        |                       /------------------------ ChalMixtec(7)
                        |----------90-----------+
                        |                       \------------------------ LealChinantec(8)
                        |
                        |                       /------------------------ Hixkaryana(9)
                        \----------60-----------+
                                                \------------------------ Macushi(10)
30
An additional method for enhancing phylogenetic
signals: step matrices
Step matrices specify how many steps a language
has to pass through to get from one feature value
to another. The number of steps feeds into the
calculation of the most parsimonious tree.
A simple example: THE VELAR NASAL (WALS feature
no. 9)
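
To make concrete how a step matrix feeds into a parsimony score, here is a minimal sketch of the Sankoff algorithm for a single feature on a fixed tree. It is a generic illustration, not the PAUP implementation used in the lecture. The two step matrices and the four languages A to D are hypothetical; the actual matrix for the velar nasal is not preserved in this transcript.

```python
INF = float("inf")


def sankoff_score(tree, leaf_states, steps):
    """Minimum total number of steps needed to explain the leaf states.

    tree        : nested tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    leaf_states : dict mapping leaf name -> index of its feature value
    steps       : steps[i][j] = cost of changing from value i to value j
    """
    n_states = len(steps)

    def costs(node):
        if isinstance(node, str):
            # A leaf can only be in its observed state.
            return [0 if s == leaf_states[node] else INF for s in range(n_states)]
        child_costs = [costs(child) for child in node]
        # For each candidate state s of this node, add the cheapest way
        # of reaching every child subtree from s.
        return [
            sum(min(steps[s][t] + cc[t] for t in range(n_states)) for cc in child_costs)
            for s in range(n_states)
        ]

    return min(costs(tree))


# Hypothetical coding: 0 = no velar nasal, 1 = non-initial only,
# 2 = initial and non-initial.
unordered = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # every change costs one step
ordered = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]     # 0 -> 2 must pass through 1

tree = (("A", "B"), ("C", "D"))                  # hypothetical languages
states = {"A": 0, "B": 0, "C": 2, "D": 2}

if __name__ == "__main__":
    print(sankoff_score(tree, states, unordered))
    print(sankoff_score(tree, states, ordered))
```

With the unordered matrix the single change between the two pairs costs one step; with the ordered matrix the same change costs two, so the choice of matrix alters which trees count as most parsimonious.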
35
  • A consideration: one step is harder to take if
    the value is universally rare, and easier if the
    value is universally common.
  • We stipulate for the extreme cases that
      • going to a feature value shared by 100% of all
        languages in the sample is a non-step, i.e. it
        should subtract one step from the step matrix;
      • going to a value shared by (100/v)% of all
        languages (where v = the number of values)
        should neither add to nor detract from the
        number of steps;
      • going to a value that no language in the sample
        has adds one extra step to the matrix.
  • These stipulations allow us to set up a formula
    to modify the steps in the matrix, taking into
    account world-wide distributions.

36
The formula is of a polynomial nature and has the
following shape:
  • s = steps added or detracted (max 1, min -1)
  • w = world-wide distribution (share of all
    languages in the sample, entered as a proportion,
    so 50% = 0.5)
  • v = number of feature values
$$ s = \frac{v(v-2)}{v-1}\,w^{2} - \frac{v^{2}-2}{v-1}\,w + 1 $$
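
The polynomial can be written as a small function and checked against the three stipulated cases (w = 1 gives -1, w = 1/v gives 0, w = 0 gives +1). This is a sketch; the function name `step_adjustment` is introduced here, and w is assumed to be a proportion between 0 and 1 rather than a percentage.

```python
def step_adjustment(w, v):
    """Steps added (positive) or detracted (negative) for a change into a
    value held by proportion w of the sample, for a feature with v values."""
    if v < 2:
        raise ValueError("a feature needs at least two values")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be a proportion between 0 and 1")
    quadratic = v * (v - 2) / (v - 1)
    linear = (v ** 2 - 2) / (v - 1)
    return quadratic * w ** 2 - linear * w + 1


if __name__ == "__main__":
    for v in (2, 3, 4):
        # The three anchor points fixed by the stipulations.
        print(v, step_adjustment(1.0, v), step_adjustment(1 / v, v), step_adjustment(0.0, v))
```

For v = 2 the quadratic term vanishes and the adjustment is simply the straight line s = 1 - 2w.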
38
Returning to the example of the velar nasal, we
find that the world-wide distribution in the WALS
sample is:
  Value                     Languages   w       s
  No velar nasal            234         50.0%   -0.38
  Non-initial only          88          18.8%    0.39
  Initial and non-initial   146         31.2%    0.05
Thus the step matrix should be modified as
follows:
Original
Modified
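
The two matrices are not preserved in this transcript. The sketch below recomputes the s values from the WALS counts given above and then shows one possible way of applying them. Two assumptions are made: the original matrix charges one step for every change (the real one may be ordered), and the adjustment for a value is added to every transition that goes to that value.

```python
def step_adjustment(w, v):
    """The lecture's polynomial, with w as a proportion (0 to 1)."""
    return (v * (v - 2) / (v - 1)) * w ** 2 - ((v ** 2 - 2) / (v - 1)) * w + 1


# Counts as given on the slide.
counts = {
    "No velar nasal": 234,
    "Non-initial only": 88,
    "Initial and non-initial": 146,
}
values = list(counts)
total = sum(counts.values())
adjust = {name: step_adjustment(counts[name] / total, len(values)) for name in values}

# ASSUMPTION: unordered original matrix, one step between any two values.
original = [[0 if i == j else 1 for j in range(len(values))] for i in range(len(values))]

# ASSUMPTION: the cost of going TO value j is shifted by that value's s.
modified = [
    [0.0 if i == j else original[i][j] + adjust[values[j]] for j in range(len(values))]
    for i in range(len(values))
]

if __name__ == "__main__":
    for name in values:
        print(f"{name}: s = {adjust[name]:.2f}")
    for row in modified:
        print([round(cost, 2) for cost in row])
```

Under these assumptions, acquiring the common value "no velar nasal" becomes cheaper than one step, while acquiring the rarer "non-initial only" value becomes more expensive.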
39
Effects of using step matrices for computing
genealogical trees
40
What happens when more data is added? (63
languages, 96 features)
41
Forcing known, shallow nodes by adding lexical
data
'Acoma      rrcaaaqanrarn????aca...rcrc??r?rnn?aaa???aa
'Apurina'   arnaaaaanaaaaanraaaa...rcrrdrarraeq???d??aa
'Araona'    cadddaarnraaanga?aa?...?aaar?arqrar???d????
'Arawak'    ????????n????rgaa???...rcrra?a?r??q???drr??
'AwaPit'    aannaaadnrana????aan...rdraa?rrqncarradrraa
'Aymara'    dacaadrrn?araqga?aan...rcdc??arrrra????ar??
'BarasanoA  ????????na?????????n...rrqan?rraaernra?rraa
'Cahuilla'  nadaadarrranargeaaa?...rcdn?drrr??qnaa?aa??
'CanelaKra' anaaaaara?anaega?aaa...?nraa?r?raeaaara??aa
'Carib'     rrrrraaanraraaqd?aa?...?crc??arararrrr?ar??
'Cayuvava'  rnrrdaaanaaaacgaaaad...rcnr??a?aae???????aa
'ChinantecB nrnrraara?arnaea?aa?...rrra??ar?aea????rr??
'Comanche'  araaaaaanrarargaaaa?...ncrnr?arrnna???arr??
'CoosHani'  crcrrdrdnranaanacaaa...rcccn?arraeaaaa???aa
'CreePlains aaraaaaanraracgaaaaa...nccnn?rrraeaaaa?rraa
'EpenaPedee rnrrraaanaaraanr?dar...caaaa?rrarcraara????
...
'UrubuKaap' rrraaaaaraaraegaraa?...rrnaa?rrraeaaarcra??
'Warao'     arraaaannaaaaqgaaaar...?naaadararar???arraa
'Wari'      rrraaaaanarraegacnaa...dcrrndandraq??????ar
'Wichi'     nrnaarqrnrarargaaaan...?ccrndrrrnranaarrrar
'Wichita'   nacaaaqanra?aaccnraa...rccr?arrrnnanra?rraa
'Yagua'     araaaaaanaarr????nac...ncrrndrrraearrra??aa
'Yaqui'     rrnraaarnrarrrgeaaan...qaaandarraeaaard??ar
'Yuchi'     crdrraqdnaa?aaeqdaaa...rcrd??rnaaer??????aa
'Yurok'     drdaaaqdnrana????aaa...rcrc??a?rnqaaar???an
'ZoqueCop'  rrraaaarr?anaqgaaaan...rcrc??r?nrdraaa??raa
'Zuni'      nrnaaardn?ararga?aan...?aaa??arq???????araa
42
Forcing known, shallow nodes by adding lexical
data
'Acoma      rrcaaaqanrarn????aca...rcrc??r?rnn?aaa???aa---------
'Apurina'   arnaaaaanaaaaanraaaa...rcrrdrarraeq???d??aa---------
'Araona'    cadddaarnraaanga?aa?...?aaar?arqrar???d????---------
'Arawak'    ????????n????rgaa???...rcrra?a?r??q???drr??---------
'AwaPit'    aannaaadnrana????aan...rdraa?rrqncarradrraa---------
'Aymara'    dacaadrrn?araqga?aan...rcdc??arrrra????ar??---------
'BarasanoA  ????????na?????????n...rrqan?rraaernra?rraa---------
'Cahuilla'  nadaadarrranargeaaa?...rcdn?drrr??qnaa?aa??aaaaaaaaa
'CanelaKra' anaaaaara?anaega?aaa...?nraa?r?raeaaara??aa---------
'Carib'     rrrrraaanraraaqd?aa?...?crc??arararrrr?ar??---------
'Cayuvava'  rnrrdaaanaaaacgaaaad...rcnr??a?aae???????aa---------
'ChinantecB nrnrraara?arnaea?aa?...rrra??ar?aea????rr??---------
'Comanche'  araaaaaanrarargaaaa?...ncrnr?arrnna???arr??aaaaaaaaa
'CoosHani'  crcrrdrdnranaanacaaa...rcccn?arraeaaaa???aa---------
'CreePlains aaraaaaanraracgaaaaa...nccnn?rrraeaaaa?rraa---------
'EpenaPedee rnrrraaanaaraanr?dar...caaaa?rrarcraara????---------
...
'UrubuKaap' rrraaaaaraaraegaraa?...rrnaa?rrraeaaarcra??---------
'Warao'     arraaaannaaaaqgaaaar...?naaadararar???arraa---------
'Wari'      rrraaaaanarraegacnaa...dcrrndandraq??????ar---------
'Wichi'     nrnaarqrnrarargaaaan...?ccrndrrrnranaarrrar---------
'Wichita'   nacaaaqanra?aaccnraa...rccr?arrrnnanra?rraa---------
'Yagua'     araaaaaanaarr????nac...ncrrndrrraearrra??aa---------
'Yaqui'     rrnraaarnrarrrgeaaan...qaaandarraeaaard??araaaaaaaaa
'Yuchi'     crdrraqdnaa?aaeqdaaa...rcrd??rnaaer??????aa---------
'Yurok'     drdaaaqdnrana????aaa...rcrc??a?rnqaaar???an---------
'ZoqueCop'  rrraaaarr?anaqgaaaan...rcrc??r?nrdraaa??raa---------
'Zuni'      nrnaaardn?ararga?aan...?aaa??arq???????araa---------
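
In the matrix above, the three Uto-Aztecan languages (Cahuilla, Comanche, Yaqui) receive nine extra characters coded "a" and every other language receives nine "-" characters. A sketch of that padding step follows; the function name and its parameters are hypothetical, and only the first 20 characters of four rows are used as sample data because the rows are elided on the slide.

```python
def force_node(matrix, group, width=9, shared="a", filler="-"):
    """Append `width` extra characters to every row: `shared` for members
    of the known group, `filler` for all other languages."""
    return {
        name: row + (shared if name in group else filler) * width
        for name, row in matrix.items()
    }


# First 20 characters of four rows as shown on the slide (rows are truncated there).
rows = {
    "Cahuilla": "nadaadarrranargeaaa?",
    "Comanche": "araaaaaanrarargaaaa?",
    "Yaqui": "rrnraaarnrarrrgeaaan",
    "Zuni": "nrnaaardn?ararga?aan",
}

forced = force_node(rows, group={"Cahuilla", "Comanche", "Yaqui"})

if __name__ == "__main__":
    for name, row in forced.items():
        print(f"{name:<10} {row}")
```

The number of appended characters acts as a weight: the more shared characters a known group receives, the harder it is for the typological characters to pull its members apart.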
43
The effect of forced nodes
Black dots: forced nodes; gray dots: correct
nodes emerging from the data
44
The effect of known ancestral states for a large
dataset
Ancestral states known
45
Ancestral states unknown
46
(No Transcript)
47
In the larger phylogenetic picture the use of
knowledge of founder effect values has a positive
effect on the classification of languages known
to be related
48
Next step: verifying hypotheses by traditional
methods
49
- Fin -
  • Tomorrow: a critical evaluation of Dunn et al.'s
    recent paper in Science on Austronesian/Papuan,
    and some vistas regarding the classification of
    the New World language family