New approaches to language and prehistory from typology, genetics, and quantitative linguistics

1
New approaches to language and prehistory from typology, genetics, and quantitative linguistics
Søren Wichmann, MPI-EVA & Leiden University
2
Lecture III: The utility of phylogenetic algorithms and software
3
Software used
PHYLIP (http://evolution.genetics.washington.edu/phylip.html)
SplitsTree (www.splitstree.org)
PAUP (www.sinauer.com)
MrBayes (www.mrbayes.net)
TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
4
Selecting or weighting features
8
Selecting or weighting features

Assumption 1: The feature value which is most favored in a given genus is the one that should be reconstructed for the proto-language of the genus.
Assumption 2: The better represented the most favored feature value in a given genus is, the more stable that feature may be assumed to be.
Strategy: study the distribution of values of a given feature for each genus, and then calculate an average of how well represented the best represented value is throughout all genera in the WALS sample.
Problem: how are we to compare the stability of features when three variables are involved: the number of occurrences of the best represented feature value, the number of possible feature values, and the number of languages for which the feature is attested in the WALS sample?
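
The strategy can be illustrated with a short Python sketch. It is only a minimal illustration, not the procedure used for the lecture: the function names and the toy data are hypothetical, and unattested values are assumed to be coded as None and skipped.

```python
from collections import Counter

def best_represented_share(values):
    """Share of attested languages in one genus that carry the most frequent value."""
    attested = [v for v in values if v is not None]  # assumption: None = not attested
    if not attested:
        return None
    r = Counter(attested).most_common(1)[0][1]  # count of the best represented value
    return r / len(attested)

def average_share(genera):
    """Average the best-represented share over all genera that have attested data."""
    shares = [best_represented_share(vals) for vals in genera.values()]
    shares = [s for s in shares if s is not None]
    return sum(shares) / len(shares)

# Hypothetical toy data: feature values per language, grouped by genus.
toy = {
    "GenusA": ["a", "a", "a", "b"],   # best value covers 3 of 4 languages
    "GenusB": ["b", "b", None, "b"],  # best value covers 3 of 3 attested languages
}
print(average_share(toy))  # 0.875
```

A raw share like this does not correct for the number of possible values or the number of attested languages, which is exactly the comparison problem raised on this slide.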

9
Selecting or weighting features (cont.)

Exemplification of problem
How do we compare the stability of the two
following features in Germanic given the
variables indicated?

11
Solving such problems by hand: a simple example

k (number of possible values) = 2 (a and b)
n (number of languages) = 4
r = number of times that the best represented feature value occurs

Distributional possibilities
distribution  r
aaaa          4
bbbb          4
aaab          3
aaba          3
abaa          3
baaa          3
abbb          3
babb          3
bbab          3
bbba          3
aabb          2
abab          2
abba          2
baab          2
baba          2
bbaa          2

Probabilities
r  probability
4  2/16
3  8/16
2  6/16
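
The hand count can be checked by brute force. The sketch below assumes, as the table does, that every one of the k^n assignments of values to languages is equally likely; the function name is hypothetical. Fractions are printed in lowest terms, so 2/16 appears as 1/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def r_distribution(n, k):
    """Probability of each r when all k**n assignments are equally likely."""
    tally = Counter()
    for assignment in product(range(k), repeat=n):
        r = max(Counter(assignment).values())  # count of the best represented value
        tally[r] += 1
    total = k ** n
    return {r: Fraction(count, total) for r, count in sorted(tally.items(), reverse=True)}

# The slide's example: two possible values, four languages.
for r, p in r_distribution(4, 2).items():
    print(r, p)
```

Enumeration grows as k^n, so it is only practical for small genera and few feature values.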

12
A formula for calculating the probability or p-value for any set of (n, k, r)
k = number of possible values
n = number of languages
r = number of times that the best represented feature value occurs
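
The formula itself is not preserved in this transcript. As a stand-in, the following sketch obtains the same probabilities by counting: it sums multinomial coefficients over all ways of splitting n languages among k values, again assuming that all values are equally likely. The function names are hypothetical, and reading the "p-value" as the probability of observing r or more is an assumption, not something the slide states.

```python
from fractions import Fraction
from math import factorial

def compositions(n, k):
    """Yield all k-tuples of non-negative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial(counts):
    """Number of distinct assignments that produce the given value counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def prob_best_count(n, k, r):
    """Probability that the best represented value occurs exactly r times."""
    favourable = sum(multinomial(c) for c in compositions(n, k) if max(c) == r)
    return Fraction(favourable, k ** n)

def p_value(n, k, r):
    """Assumed reading of 'p-value': probability of a best count of r or more."""
    return sum(prob_best_count(n, k, x) for x in range(r, n + 1))

print(prob_best_count(4, 2, 3))  # 1/2, i.e. the 8/16 of the hand example
print(p_value(4, 2, 3))          # 5/8, i.e. (8 + 2)/16
```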
20
A sample of 5 pairs of related languages for testing different methods (selection criterion: best documented genealogical pairs in WALS dataset)
Athapaskan: Slave, Navajo
Chibchan: Ika, Rama
Uto-Aztecan: Yaqui, Comanche
Oto-Manguean: Chalcatongo Mixtec, Lealao Chinantec
Carib: Hixkaryana, Macushi
21
Neighbour-Joining using the 17 highest-ranking
features
22
Neighbour-Joining using the 17 lowest-ranking
features
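
Neighbour-Joining works from a matrix of pairwise distances between languages. The slides do not say which distance measure was used, so the sketch below simply assumes a normalised mismatch count over features attested in both languages, computed on a chosen subset of feature columns. The profiles, the column ranking, and the function names are all hypothetical.

```python
def feature_distance(a, b, missing="?"):
    """Share of mismatching features among those attested in both languages."""
    shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not shared:
        return None  # no comparable features
    return sum(x != y for x, y in shared) / len(shared)

def select(profile, columns):
    """Keep only the chosen feature columns of a language profile."""
    return "".join(profile[i] for i in columns)

# Hypothetical profiles (one character per feature) and a hypothetical ranking.
profiles = {"Lang1": "aab?c", "Lang2": "abb?c", "Lang3": "bbbac"}
top_columns = [0, 1, 2]  # stand-in for "the highest-ranking features"

names = sorted(profiles)
matrix = {
    (p, q): feature_distance(select(profiles[p], top_columns),
                             select(profiles[q], top_columns))
    for p in names for q in names
}
print(matrix[("Lang1", "Lang2")])  # one mismatch in three comparable features
```

Swapping `top_columns` for the lowest-ranking columns gives the matrix for the contrasting run.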
23
A Neighbour Net representation (using SplitsTree)
24
Maximum-Parsimony analysis using PAUP
25
Bayesian analysis using MrBayes
26
Tree generated by (presumed) knowledge of
ancestral states, using 23 informative Native
American founder features
28
Effects of using p-values for weighting
Method: Branch and Bound Bootstrap Search (Bootstrap 50% majority-rule consensus tree)

No matrices, equal weighting
/----------------------------------------------------------------------- Slave(1)
|
+----------------------------------------------------------------------- Navajo(2)
|
|                                              /------------------------ Yaqui(3)
|                       /----------91----------+
|                       |                      \------------------------ Comanche(4)
|                       |
|                       +----------------------------------------------- NezPerce(5)
|                       |
|                       +----------------------------------------------- HanisCoos(6)
\----------96-----------+
                        |                      /------------------------ ChalMixtec(7)
                        +----------85----------+
                        |                      \------------------------ LealChinantec(8)
                        |
                        |                      /------------------------ Hixkaryana(9)
                        \----------54----------+
                                               \------------------------ Macushi(10)

No matrices, p-value weighting
/----------------------------------------------------------------------- Slave(1)
|
+----------------------------------------------------------------------- Navajo(2)
|
|                                              /------------------------ Yaqui(3)
|                       /----------87----------+
|                       |                      \------------------------ Comanche(4)
|                       |
|                       +----------------------------------------------- NezPerce(5)
|                       |
|                       +----------------------------------------------- HanisCoos(6)
\----------94-----------+
                        |                      /------------------------ ChalMixtec(7)
                        +----------90----------+
                        |                      \------------------------ LealChinantec(8)
                        |
                        |                      /------------------------ Hixkaryana(9)
                        \----------60----------+
                                               \------------------------ Macushi(10)
30
An additional method for enhancing phylogenetic signals: step matrices
Step matrices specify how many steps a language has to pass through to get from one feature value to another. The number of steps feeds into the calculation of the most parsimonious tree.
A simple example: THE VELAR NASAL (WALS feature no. 9)
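
How a step matrix feeds into a parsimony score can be shown with Sankoff's dynamic-programming algorithm for a single character on a fixed tree. This is a generic textbook sketch, not a description of what PAUP does internally; the trees, the leaf states, and both step matrices are hypothetical.

```python
import math

def sankoff_cost(tree, leaf_states, steps):
    """Minimum total step cost of one character on a fixed rooted tree.

    tree: nested tuples of leaf names; leaf_states: name -> state index;
    steps[i][j]: cost of changing from state i to state j.
    """
    n_states = len(steps)

    def costs(node):
        if isinstance(node, str):  # leaf: only the observed state is allowed
            return [0 if s == leaf_states[node] else math.inf for s in range(n_states)]
        child_costs = [costs(child) for child in node]
        return [
            sum(min(steps[s][t] + cc[t] for t in range(n_states)) for cc in child_costs)
            for s in range(n_states)
        ]

    return min(costs(tree))

# Hypothetical three-state character with two alternative step matrices.
unordered = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every change costs one step
ordered = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]    # state 0 -> 2 must pass through 1
states = {"A": 0, "B": 0, "C": 2, "D": 2}

print(sankoff_cost((("A", "B"), ("C", "D")), states, unordered))  # 1
print(sankoff_cost((("A", "B"), ("C", "D")), states, ordered))    # 2
print(sankoff_cost((("A", "C"), ("B", "D")), states, ordered))    # 4
```

Summing such costs over all characters gives the length of a tree, and the most parsimonious tree is the one with the smallest total; changing the step matrix changes which trees are preferred and by how much.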
35
A consideration: one step is harder to take if the value is universally rare, and easier if the value is universally common.
We stipulate for the extreme cases that
- going to a feature value shared by 100% of all languages in the sample is a non-step, i.e. it should subtract one step from the step matrix
- going to a value shared by (100/v)% of all languages (where v = the number of values) should neither add to nor detract from the number of steps
- going to a value that none of the languages have adds one extra step to the matrix
These stipulations allow us to set up a formula to modify the steps in the matrix, taking into account world-wide distributions.

36
The formula is of a polynomial nature and has the following shape:
s = steps added or detracted (max 1, min -1)
w = world-wide distribution (share of all languages in the sample; quoted in percent, but entered in the formula as a proportion between 0 and 1)
v = number of feature values
$$ s = \frac{v(v-2)}{v-1}\,w^{2} - \frac{v^{2}-2}{v-1}\,w + 1 $$
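
A small Python check of the formula follows. It reads the formula as s = a·w² − b·w + 1 with a = v(v − 2)/(v − 1) and b = (v² − 2)/(v − 1), with w entered as a proportion rather than a percentage. That reading is an assumption about the garbled slide text, but it is the quadratic that satisfies the three stipulations (s = 1 at w = 0, s = 0 at w = 1/v, s = -1 at w = 1). The function name is hypothetical.

```python
def step_adjustment(w, v):
    """Steps added (positive) or removed (negative) for moving to a value.

    w: world-wide share of the value as a proportion between 0 and 1
    v: number of possible feature values (v >= 2)
    """
    a = v * (v - 2) / (v - 1)
    b = (v * v - 2) / (v - 1)
    return a * w * w - b * w + 1

# Check the three stipulated anchor points for several numbers of values.
for v in (2, 3, 5):
    print(v, step_adjustment(0, v), step_adjustment(1 / v, v), step_adjustment(1, v))
```

For v = 2 the quadratic term vanishes and the adjustment is the straight line s = 1 - 2w.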
38
Returning to the example of the velar nasal, we find that the world-wide distribution in the WALS sample is:
No velar nasal: 234 → w = 50.0% → s = -0.38
Non-initial only: 88 → w = 18.8% → s = 0.39
Initial and non-initial: 146 → w = 31.2% → s = 0.05
Thus the step matrix should be modified as follows:
Original
Modified
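
The original and modified matrices are not preserved in this transcript. The sketch below only illustrates the mechanics under two assumptions: the adjustment s of the target value is added to every move into that value (rows = from, columns = to), and the original matrix is a placeholder with one step between any two values, which may well differ from the matrix on the slide. The s values are the ones quoted above; the function name is hypothetical.

```python
def modify_step_matrix(original, adjustments):
    """Add the adjustment of the target value to every off-diagonal step count."""
    size = len(original)
    return [
        [0 if i == j else round(original[i][j] + adjustments[j], 2) for j in range(size)]
        for i in range(size)
    ]

# Value order: no velar nasal, non-initial only, initial and non-initial.
s_values = [-0.38, 0.39, 0.05]   # taken from the slide
placeholder = [[0, 1, 1],        # hypothetical original matrix:
               [1, 0, 1],        # one step between any two values
               [1, 1, 0]]

for row in modify_step_matrix(placeholder, s_values):
    print(row)
```

With these inputs, moving into the common value "no velar nasal" costs 0.62 steps, while moving into the rarer "non-initial only" costs 1.39.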
39
Effects of using step matrices for computing
genealogical trees
40
What happens when more data is added? (63
languages, 96 features)
41
Forcing known, shallow nodes by adding lexical data
'Acoma       rrcaaaqanrarn????aca...rcrc??r?rnn?aaa???aa
'Apurina'    arnaaaaanaaaaanraaaa...rcrrdrarraeq???d??aa
'Araona'     cadddaarnraaanga?aa?...?aaar?arqrar???d????
'Arawak'     ????????n????rgaa???...rcrra?a?r??q???drr??
'AwaPit'     aannaaadnrana????aan...rdraa?rrqncarradrraa
'Aymara'     dacaadrrn?araqga?aan...rcdc??arrrra????ar??
'BarasanoA   ????????na?????????n...rrqan?rraaernra?rraa
'Cahuilla'   nadaadarrranargeaaa?...rcdn?drrr??qnaa?aa??
'CanelaKra'  anaaaaara?anaega?aaa...?nraa?r?raeaaara??aa
'Carib'      rrrrraaanraraaqd?aa?...?crc??arararrrr?ar??
'Cayuvava'   rnrrdaaanaaaacgaaaad...rcnr??a?aae???????aa
'ChinantecB  nrnrraara?arnaea?aa?...rrra??ar?aea????rr??
'Comanche'   araaaaaanrarargaaaa?...ncrnr?arrnna???arr??
'CoosHani'   crcrrdrdnranaanacaaa...rcccn?arraeaaaa???aa
'CreePlains  aaraaaaanraracgaaaaa...nccnn?rrraeaaaa?rraa
'EpenaPedee  rnrrraaanaaraanr?dar...caaaa?rrarcraara????
             ...........................................
'UrubuKaap'  rrraaaaaraaraegaraa?...rrnaa?rrraeaaarcra??
'Warao'      arraaaannaaaaqgaaaar...?naaadararar???arraa
'Wari'       rrraaaaanarraegacnaa...dcrrndandraq??????ar
'Wichi'      nrnaarqrnrarargaaaan...?ccrndrrrnranaarrrar
'Wichita'    nacaaaqanra?aaccnraa...rccr?arrrnnanra?rraa
'Yagua'      araaaaaanaarr????nac...ncrrndrrraearrra??aa
'Yaqui'      rrnraaarnrarrrgeaaan...qaaandarraeaaard??ar
'Yuchi'      crdrraqdnaa?aaeqdaaa...rcrd??rnaaer??????aa
'Yurok'      drdaaaqdnrana????aaa...rcrc??a?rnqaaar???an
'ZoqueCop'   rrraaaarr?anaqgaaaan...rcrc??r?nrdraaa??raa
'Zuni'       nrnaaardn?ararga?aan...?aaa??arq???????araa
42
Forcing known, shallow nodes by adding lexical data
'Acoma       rrcaaaqanrarn????aca...rcrc??r?rnn?aaa???aa---------
'Apurina'    arnaaaaanaaaaanraaaa...rcrrdrarraeq???d??aa---------
'Araona'     cadddaarnraaanga?aa?...?aaar?arqrar???d????---------
'Arawak'     ????????n????rgaa???...rcrra?a?r??q???drr??---------
'AwaPit'     aannaaadnrana????aan...rdraa?rrqncarradrraa---------
'Aymara'     dacaadrrn?araqga?aan...rcdc??arrrra????ar??---------
'BarasanoA   ????????na?????????n...rrqan?rraaernra?rraa---------
'Cahuilla'   nadaadarrranargeaaa?...rcdn?drrr??qnaa?aa??aaaaaaaaa
'CanelaKra'  anaaaaara?anaega?aaa...?nraa?r?raeaaara??aa---------
'Carib'      rrrrraaanraraaqd?aa?...?crc??arararrrr?ar??---------
'Cayuvava'   rnrrdaaanaaaacgaaaad...rcnr??a?aae???????aa---------
'ChinantecB  nrnrraara?arnaea?aa?...rrra??ar?aea????rr??---------
'Comanche'   araaaaaanrarargaaaa?...ncrnr?arrnna???arr??aaaaaaaaa
'CoosHani'   crcrrdrdnranaanacaaa...rcccn?arraeaaaa???aa---------
'CreePlains  aaraaaaanraracgaaaaa...nccnn?rrraeaaaa?rraa---------
'EpenaPedee  rnrrraaanaaraanr?dar...caaaa?rrarcraara????---------
             ....................................................
'UrubuKaap'  rrraaaaaraaraegaraa?...rrnaa?rrraeaaarcra??---------
'Warao'      arraaaannaaaaqgaaaar...?naaadararar???arraa---------
'Wari'       rrraaaaanarraegacnaa...dcrrndandraq??????ar---------
'Wichi'      nrnaarqrnrarargaaaan...?ccrndrrrnranaarrrar---------
'Wichita'    nacaaaqanra?aaccnraa...rccr?arrrnnanra?rraa---------
'Yagua'      araaaaaanaarr????nac...ncrrndrrraearrra??aa---------
'Yaqui'      rrnraaarnrarrrgeaaan...qaaandarraeaaard??araaaaaaaaa
'Yuchi'      crdrraqdnaa?aaeqdaaa...rcrd??rnaaer??????aa---------
'Yurok'      drdaaaqdnrana????aaa...rcrc??a?rnqaaar???an---------
'ZoqueCop'   rrraaaarr?anaqgaaaan...rcrc??r?nrdraaa??raa---------
'Zuni'       nrnaaardn?ararga?aan...?aaa??arq???????araa---------
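
In this matrix, three languages (Cahuilla, Comanche, Yaqui) carry nine extra "a" characters, and every other language carries nine "-" characters. One way to produce such a matrix is sketched below. Reading the extra columns as characters shared by one known group, with a gap symbol for everyone else, is an interpretation of the slide, and the function name and defaults are hypothetical.

```python
def add_forcing_characters(matrix, group, n_chars=9, shared="a", gap="-"):
    """Append n_chars extra characters: `shared` for group members, `gap` for the rest."""
    return {
        name: profile + (shared if name in group else gap) * n_chars
        for name, profile in matrix.items()
    }

# Three rows from the slide; "..." is the slide's own abbreviation, kept as-is.
typological = {
    "Cahuilla": "nadaadarrranargeaaa?...rcdn?drrr??qnaa?aa??",
    "Comanche": "araaaaaanrarargaaaa?...ncrnr?arrnna???arr??",
    "Carib":    "rrrrraaanraraaqd?aa?...?crc??arararrrr?ar??",
}
forced = add_forcing_characters(typological, group={"Cahuilla", "Comanche"})
for name, profile in forced.items():
    print(name.ljust(10), profile)
```

Forcing several known groups would mean calling the function once per group, so that each group gets its own block of columns.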
43
The effect of forced nodes
Black dots: forced nodes; gray dots: correct nodes emerging from the data
44
The effect of known ancestral states for a large
dataset
Ancestral states known
45
Ancestral states unknown
47
In the larger phylogenetic picture, the use of knowledge of founder effect values has a positive effect on the classification of languages known to be related.
48
Next step: verifying hypotheses by traditional methods
49
- Fin -

Tomorrow: a critical evaluation of Dunn et al.'s recent paper in Science on Austronesian/Papuan, and some vistas regarding the classification of the New World language family
