Title: Advances in Automated Language Classification ASJP Consortium Dik Bakker
1 Advances in Automated Language Classification
ASJP Consortium (Dik Bakker)
2 Overview
Project (May 2007 - ): ASJP (Automated Similarity Judgment Program)
[Diagram labels: NUMBERS, LANGUAGE; Data sources, Data bases, TOOLS, Results]
7 Overview
Project ASJP are:
- Sören Wichmann (Germany / Netherlands)
- Viveka Velupillai (Germany)
- André Müller (Germany)
- Robert Mailhammer (Germany)
- Hagen Jung (Germany)
- Eric Holman (US)
- Anthony Grant (UK)
- Dmitry Egorov (Russia)
- Pamela Brown (US)
- Cecil Brown (US)
- Dik Bakker (UK / Netherlands)
12 Overview
Project ASJP (Automated Similarity Judgment Program)
- Overall goal: automatic reconstruction of language relationships
- Basis: distance matrices between individual languages on the basis of linguistic features
- Method: lexicostatistics, i.e. mass comparison of basic lexical items, extended by all relevant data available
24 [Diagram labels]
Inputs: Swadesh (2440); ETHN, WALS, EXPRT; GEO GRAPH; HIST FACTS; PHON INVENT (Jeff Mielke 500); LOANS
Programs: ASJP1, ASJP2 (calibration)
Output: distance matrices
Tools: TREE SFTW, STAT SFTW, MAP SFTW
33 Overview
OVERALL GOAL: Reconstruction of Language Relationships
Derived goals:
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008, elbow phenomenon)
- Experimentally find the best/optimal dating method
- Automatically detect borrowings
Today ...
38 Overview
1. The list of basic lexical items
2. Comparing words and languages
3. Some results: genetic proximity
4. On Inheritance vs Borrowing
5. Immanent extensions
39 1. The list of basic lexical items
48 Lexical items
Word list: Swadesh 100 basic meanings
- Word coined in most languages
- Collected in field work (lexicon / grammar)
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms
LWT ?
50 Otomi from Spanish
55 Lexical items: further reduction
Early analyses have shown:
- Most stable 40/100 item subset gives same results
- -> Less work
- -> Less missing data
- Faster processing (combinatorial explosion): 40 vs. 100 items, 10^9 < 10^10 COMPARISONS
59 Lexical items: further reduction
Most stable: S_SM = (R - U) / (1 - U) (see references)
- R = mean proportion of same form for SM_i / genus
- U = mean proportion of same form for different SM_x / genus
N.B. S_SM: high correlation between families
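As a rough illustration of the stability measure, the following Python sketch computes S = (R - U) / (1 - U) and ranks meanings by it. The function name and the toy R and U values are invented for illustration; they are not ASJP data, and how R and U are estimated per genus is not shown here.

```python
def stability(r: float, u: float) -> float:
    """Chance-corrected stability S = (R - U) / (1 - U).

    r: mean proportion of same-form pairs for one meaning within a genus
    u: mean proportion of same-form pairs for different meanings (chance level)
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return (r - u) / (1.0 - u)


# Hypothetical (R, U) values per meaning, for illustration only.
toy_items = {"EYE": (0.55, 0.10), "NIGHT": (0.20, 0.10), "HORN": (0.25, 0.10)}

# Rank meanings from most to least stable.
ranking = sorted(toy_items, key=lambda m: stability(*toy_items[m]), reverse=True)
```

Subtracting U and rescaling by 1 - U means an item that matches no more often than chance scores 0, and an item that always matches scores 1.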
60 Ethnologue (Goodman-Kruskal)
WALS (Pearson)
< Stability > --
62 40 Most Stable
67 Lexical items: transcription
First phase of project (2007). Problems with full IPA representation of words:
- data entry via keyboard
- simple programming language (Fortran, Pascal)
-> Recoding to simplified ASJPcode (ASCII only)
72 Lexical items: transcription
ASJPcode:
- 7 vowels
- 34 consonants
- Closest sound
- Operators for: nasalization, labialization, palatalization, aspiration, glottalization
75 Abaza (Caucasian)
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw3Cw"yXw3s
LEAF      b???           bxy3
SKIN      ??az?          Cwazy
HORN      ?'????a        Cw"3Xwa
NOSE      p?n?'a         p3nc"a
TOOTH     p??            p3c
77 Lexical items
Collected to date:
- Close to 2500 languages (incl. dialects and proto-languages)
- Mean number of items/language: 35.8 (/40)
78 Lexical items
Areal distribution in % (not a sample!):
- Americas: 27
- Eurasia: 23
- Australia/PNG: 18
- Austronesia: 15
- Africa: 14
- Creoles: 2
- Artificial: 1
79 Languages currently sampled
80 2. Comparing words and languages
90 Comparing words
Two strategies. 1. ASJP context rules
a. between 2 words (C/V = general, c/v = specific, X)
Rule   SM_i WORD lg1    WORD lg2
R1     (V)cVcX          XcVcX
R2     Xc(V)c(V)cX      Xc(V)c(V)cX
R12    AVcvX            VcvX           (A = h, w, y)
R13    (V)ccVX          (V)ccVX
R22    cv               (CV)cv
pattern W_lg1 UNIFIES pattern W_lg2
Example: yapi ~ opi
93 Comparing words
ASJP context rules
a. between 2 words: value 0 or 1
b. between 2 languages:
- RELATEDNESS = (n of matching words / total pairs) × 100
- DISTANCE: LSP = 100 - ((matching words / total pairs) × 100)
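The two language-level scores can be sketched in a few lines of Python. This assumes the slide's formulas read as "percentage of matching pairs" and "100 minus that percentage"; the function names are illustrative, and the 0/1 judgments would come from the context rules, which are not implemented here.

```python
def relatedness(matches: list[int]) -> float:
    """Percentage of word pairs judged similar (each judgment is 0 or 1)."""
    if not matches:
        raise ValueError("need at least one compared word pair")
    return 100.0 * sum(matches) / len(matches)


def lsp_distance(matches: list[int]) -> float:
    """Distance reading of the slide: 100 minus the relatedness percentage."""
    return 100.0 - relatedness(matches)


# Toy judgments for five meanings (1 = some rule matched, 0 = no match).
toy_judgments = [1, 0, 1, 1, 0]
```

With the toy judgments, three of five pairs match, so the relatedness is 60 and the distance 40.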
96 Comparing words
2. Levenshtein Distance
a. between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions); min 0 / max = length of longest word
b. between 2 languages: mean LD for total number of pairs
101 Comparing words
Two problems with simple LD:
1. Value depends on length of longest word
   -> Normalize: LDN = LD / Lmax
2. Differences between lgs in phonological overlap
   -> Eliminate background noise: LDND = LDN / LDN(different pairs)
103 Comparing words
Levenshtein Distance
a. between 2 words: LDND, 0 - 100
b. between 2 languages: mean of all LDNDs of words in common
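The chain LD -> LDN -> LDND can be sketched in Python as follows. This is a minimal reading of the slides, not the ASJP software: it treats every ASCII character as one symbol, takes "different pairs" to mean all pairs of words with different meanings in the two lists, and scales by 100. The function names and the five-word toy lists are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(words1: dict[str, str], words2: dict[str, str]):
    """Per-meaning LDND values and their mean for two word lists."""
    shared = sorted(set(words1) & set(words2))
    # Background noise: mean LDN over word pairs with different meanings.
    diff = [ldn(words1[m], words2[n]) for m in shared for n in shared if m != n]
    baseline = sum(diff) / len(diff)
    per_word = {m: 100.0 * ldn(words1[m], words2[m]) / baseline for m in shared}
    return per_word, sum(per_word.values()) / len(per_word)


# Toy lists; the word forms are an assumed segmentation of a slide example.
lang1 = {"ONE": "xun", "TWO": "kob", "BONE": "baq", "EAR": "SCin", "WATER": "a7"}
lang2 = {"ONE": "hun", "TWO": "kabe7", "BONE": "baq", "EAR": "Cikin", "WATER": "ha7"}
per_word, mean_ldnd = ldnd(lang1, lang2)
```

Because the baseline here is computed from only five meanings, the resulting values will not reproduce figures obtained from full 40-item lists; only the relative pattern (identical words score 0, pairs with equal LDN score equally) carries over.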
106 Comparing languages
AGUACATEC (agu) <> MOCHO (mhc); MAYAN (45) > MAYAN; GeoD = 97, GenD = 1.86
Meaning   agu     mhc      Rule   LDND
ONE       xun     hun      -      37.4
TWO       kob     kabe7    R1     67.3
BONE      baq     baq      R3      0.0
EAR       SCin    Cikin    -      67.3
WATER     a7      ha7      R10    37.4
TOTAL: LSP = 58.14, LDND = 51.68 (n = 35)
109 Comparing languages
HIGH CORRELATION LSP ~ LDND:
- MAYA (n = 34): 0.93
- INDO-EUROPEAN (n = 129): 0.97
- AMERINDIAN (n = 511): 0.59
113 Comparing languages
BEST PERFORMERS
Within families           Across families
1. EYE     0.496          1. I       0.072
2. LOUSE   0.480          2. DIE     0.065
3. DIE     0.469          3. WE      0.061
4. BREAST  0.415          4. YOU     0.057
5. STONE   0.364          5. BREAST  0.057
- Shortness
- Sound symbolism?
116 Comparing languages
WORST PERFORMERS
Within families           Across families
36. HORN      0.107       36. NIGHT  0.028
37. SEE       0.099       37. HEAR   0.027
38. KNEE      0.095       38. HORN   0.027
39. NIGHT     0.079       39. STAR   0.024
40. MOUNTAIN  0.075       40. KNEE   0.023
117 For 2440 lgs: 3,000,000 language pairs (× 36^2 ≈ 3·10^9)
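The order of magnitude can be checked with a short calculation. The sketch assumes that the 3,000,000 figure counts unordered language pairs and that 36 is the rounded mean number of items per language, with every word of one language compared against every word of the other; the variable names are illustrative.

```python
n_languages = 2440
items_per_language = 36  # assumption: rounded mean of 35.8 attested items

# Unordered pairs of distinct languages: n * (n - 1) / 2.
language_pairs = n_languages * (n_languages - 1) // 2

# All-against-all word comparisons within each language pair.
word_comparisons = language_pairs * items_per_language ** 2
```

Under these assumptions the pair count is just under three million and the number of word comparisons is a few times 10^9.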
118 3. Genetic proximity
121 [Diagram: Swadesh (2440) -> ASJP2 -> distance matrices -> Neighbour Joining (Splits Tree, MEGA4)]
126 ASJP: LSP [tree figure]
- Correlation with ETHN: .325 (n 69) (n 34)
- More structure than ETHN
- Separation
129 Levenshtein: LDND [tree figure]
- Correlation with ETHN: .195 (LSP: .325)
136 ASJP / LDND [tree figures; labelled groups: cholan, tzeltalan, yucatecan]
138 all significant > 0.01
146 Improving the fit
- Enrich lexical with typological data
  [Diagram: Swadesh (2440) + WALS (2580) -> SWALSH (2440) -> ASJP -> distance matrices -> TREE SFTW]
- NOT 1:1 with ASJP languages
  [Diagram: SWALSH (550) -> ASJP -> distance matrices -> TREE SFTW]
- WALS variables very unevenly spread
- Maximum subset: 85 most stable
147 Most stable WALS variables
151 Improving the fit
- Enrich lexical with typological data
- Maximum subset: 85 most stable
- Correlation with Swadesh: 0.063 (> 0.001) ?
- Mantel Test, 10,000 simulations: best 0.050 <> -0.043 (mean 0.009)
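A Mantel test of this kind can be sketched in plain Python. This is a generic permutation version, not the project's own code: it correlates the upper triangles of two symmetric distance matrices with Pearson's r, then repeats the correlation after randomly relabelling the languages of one matrix. The function names, the seed and the toy matrix are illustrative.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long number sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def upper(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, simulations=10_000, seed=0):
    """Return (observed r, min, max and mean simulated r, one-sided p-value)."""
    ident = list(range(len(a)))
    base = upper(a, ident)
    observed = pearson(base, upper(b, ident))
    rng = random.Random(seed)
    sims = []
    for _ in range(simulations):
        perm = ident[:]
        rng.shuffle(perm)  # random relabelling of the languages in b
        sims.append(pearson(base, upper(b, perm)))
    p = (1 + sum(r >= observed for r in sims)) / (simulations + 1)
    return observed, min(sims), max(sims), mean(sims), p


# Toy symmetric distance matrix for four hypothetical languages.
toy = [[0, 1, 2, 3],
       [1, 0, 1, 2],
       [2, 1, 0, 1],
       [3, 2, 1, 0]]
observed, lowest, highest, average, p_value = mantel(toy, toy, simulations=500)
```

The observed correlation is then judged against the range of simulated values, in the same way the slide sets its observed figure against the best, worst and mean simulated correlations.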
153 Improving the fit
- Enrich lexical with typological data
- Database: 40 most stable Swadesh + 85 most stable WALS features
- Optimal weight of both?
157 4. On Inheritance vs Borrowing
161 Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
Meaning   AVA      AGL       LDND
I         dun      zun       36.6
YOU       mun      wun       36.6
HORN      tLar     k"arC     66.0
FIRE      c"a      c"a        0.0
FULL      c"ura    ac"uf     66.0
NEW       c"iya    c"EyEr    55.0
-> 6 items < 70.0
-> Genetically related !!
166 Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
Meaning   SPA        CHA         LDND
ONE       uno        unu         36.9
TWO       dos        dos          0.0
PERSON    persona    petsona     15.8
STAR      estreya    estrecas    27.6
NIGHT     noCe       noces       44.2
NEW       nuevo      nueba       44.2
-> 6 items < 70.0
NOT related. Chance? Or borrowing?
182 Inherited or borrowed?
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE (12) / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
STAR: estreya / estrecas, LDND = 27.6
- f/g: SPA 0.17/0.82 > CHA 0.00/0.00
- wwF: SPA 83-99 > CHA 102-85
  (SPA: mean LDND of estreya in IE, then in AU; CHA: mean of estrecas in AU, then in IE)
- phwF: SPA 100.00 > CHA 0.52
  (SPA: phon. estreya in IE / AU; CHA: phon. estrecas in AU / IE)
- SYN CHA: puti7on (f 1.00)
183-187 Inherited or borrowed?
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
All values are given as SPA > CHA.
- ONE: uno / unu, LDND = 36.9; f/g 0.24/0.82 > 0.03/0.00; wwF 97-106 > 110-97; phwF 12.00 > 0.44
- TWO: dos / dos, LDND = 0.0; f/g 0.62/1.00 > 0.12/0.00; wwF 78-99 > 102-78; phwF 100.00 > 0.22
- NIGHT: noCe / noces, LDND = 44.2; f/g 0.23/0.55 > 0.04/0.00; wwF 89-100 > 105-92; phwF 100.00 > 0.10
- NEW: nuevo / nueba, LDND = 44.2; f/g 0.50/0.64 > 0.04/0.00; wwF 68-104 > 105-80; phwF 4.27 > 0.03
- PERSON: persona / petsona, LDND = 15.8; f/g 0.20/0.64 > 0.01/0.00; wwF 89-98 > 98-90; phwF 32.40 > 0.13; SYN CHA: taotao (f 1.00)
191 Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
192 Inherited or borrowed?
SPANISH (spa): INDO-EUROPEAN (128) > ROMANCE (12), EURASIA, SPAIN
VS. CHAMORRO (cha): AUSTRONESIAN (678) > CHAMORRO, OCEANIA, GUAM
GEODIST = 13244, GENDIST = 3.00
193 Spaniards in Pacific since 16th century (HIST FACTS)
[Diagram labels: Swadesh (2440); ETHN, WALS, EXPRT; GEO GRAPH; ASJP1, ASJP2; distance matrices; TREE SFTW, STAT SFTW, MAP SFTW]
195 Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
4. Role of form and meaning (?) (LWT)
196 Borrowed!
BOR spa TO cha: 6 (15.0) LDND 76.63 (shared 40, crit 70.00 - U)
DATABASE: unu(spa), dos(spa), petsona(spa), estrecas(spa), noces(spa), nueba(spa)
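The first output filter (a minimum number of low-distance items between unrelated languages) can be sketched as follows. Only the criterion of 70.0 and the six word pairs come from the slides; the class, the function, the `min_n` default and the extra high-distance pair are hypothetical, and the direction, geography and frequency checks (f/g, wwF, phwF) are left out.

```python
from dataclasses import dataclass


@dataclass
class WordPair:
    meaning: str
    word_a: str
    word_b: str
    ldnd: float


def candidate_borrowings(pairs, crit=70.0, min_n=5):
    """Return the pairs below the LDND criterion if there are at least min_n of them."""
    hits = [p for p in pairs if p.ldnd < crit]
    return hits if len(hits) >= min_n else []


# Spanish / Chamorro pairs from the slides, plus one invented non-match.
spa_cha = [
    WordPair("ONE", "uno", "unu", 36.9),
    WordPair("TWO", "dos", "dos", 0.0),
    WordPair("PERSON", "persona", "petsona", 15.8),
    WordPair("STAR", "estreya", "estrecas", 27.6),
    WordPair("NIGHT", "noCe", "noces", 44.2),
    WordPair("NEW", "nuevo", "nueba", 44.2),
    WordPair("HYPOTHETICAL", "aaa", "zzz", 95.0),  # invented, above the criterion
]
hits = candidate_borrowings(spa_cha)
shared_items = 40
share = 100 * len(hits) / shared_items  # percentage of shared items flagged
```

For languages from different families, a cluster of such low-distance items is what prompts the question of chance versus borrowing; for related languages the same pattern would simply reflect inheritance.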
197 5. Immanent extensions
199 GARBAGE IN -> GARBAGE OUT
201 Lexical items: transcription
Second year of project (2008-9): replace ASJPcode by full IPA representations
Juliette, Jeff
205 Lexical items: transcription
Second year of project (2008-9): problems with full IPA representation solved
1. scan/download/ full IPA representations
2. automatic conversion IPA to integer (Python)
3. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
209 Lexical items: transcription
Abaza (Caucasian), meaning PERSON
IPA:      ʕʷɨʧʼʲʷʕʷɨs
Decimal:  661,695,616,679,700,690,695,661,695,616,115
ASJPcode: 88,119,126,51,67,34,121,119,126,88,119,126,51,115 ( Xw3Cw"yXw3s )
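The IPA-to-integer step corresponds to taking Unicode code points, which in Python is a one-liner. The sketch below reproduces the decimal sequence for Abaza PERSON and decodes the ASJPcode integers back to ASCII; the variable names are illustrative, and the grammar-based recoding from IPA to ASJPcode is not implemented.

```python
# IPA string for Abaza PERSON, written with escapes so the code points are explicit.
ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

# IPA characters -> integers (Unicode code points).
decimal = [ord(ch) for ch in ipa]

# The ASJPcode integers from the slide are plain ASCII code points.
asjp_codes = [88, 119, 126, 51, 67, 34, 121, 119, 126, 88, 119, 126, 51, 115]
asjp_ascii = "".join(chr(n) for n in asjp_codes)
```

The decoded integer sequence contains `~` characters (code 126) that the shorter ASCII form printed on the slide omits.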
212 Lexical items: transcription
Second year of project (2008-9)
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
Why not run on full IPA??
- Caucasian: correlations IPA ~ ASJP > 0.9
- but ASJP better fit with classifications -> IPA too specific
215 Lexical items: transcription
IPA:      ʕʷɨʧʼʲʷʕʷɨs
Decimal:  661,695,616,679,700,690,695,661,695,616,115
ASJPcode ( any unicode subset ), via formal grammar:
a <- 661, 895, 416,
C -V <- C V / -
C V <- C -V, PL / - C V
Optimal level of abstraction for historical phonological reconstruction?
216 [Diagram labels]
Inputs: Swadesh; Phon Invent; ETHN, WALS, EXP; GEO GRAPH; HIST FACTS
Programs: ASJP1, ASJP2
Output: distance matrices; Borrowing!
Tools: TREE SFTW, STAT SFTW, MAP SFTW
218 Lexical items: transcription
all significant > 0.01
219 References
- Holman, Eric et al. (2008). Advances in automated language classification. In Arppe, Antti, Kaius Sinnemäki and Urpo Nikanne (eds.), Quantitative Investigations in Theoretical Linguistics, 40-43. Helsinki: University of Helsinki.
- Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
- Brown et al. (forthc. 2008). Automated classification of the world's languages: a description of the method and preliminary results. Sprachtypologie und Universalienforschung.
- Bakker et al. (2009?). Using WALS for the ASJP project.
220 email.eva.mpg.de./wichmann/ASJPHomePage
221 ?