Advances in Automated Language Classification, ASJP Consortium (Dik Bakker)

1
Advances in Automated Language Classification
ASJP Consortium (Dik Bakker)
2
Overview
Project (MAY 2007 - ) ASJP (Automated
Similarity Judgment Program)
3
Overview
Project ASJP (Automated Similarity Judgment
Program)
NUMBERS
LANGUAGE
4
Overview
Project ASJP (Automated Similarity Judgment
Program)
Data sources
TOOLS
5
Overview
Project ASJP (Automated Similarity Judgment
Program)
Data bases
Data sources
Results
TOOLS
6
Overview
Project ASJP (Automated Similarity Judgment
Program)
7
Overview
Project ASJP members:
Sören Wichmann (Germany / Netherlands)
Viveka Velupillai (Germany)
André Müller (Germany)
Robert Mailhammer (Germany)
Hagen Jung (Germany)
Eric Holman (US)
Anthony Grant (UK)
Dmitry Egorov (Russia)
Pamela Brown (US)
Cecil Brown (US)
Dik Bakker (UK / Netherlands)
8
Overview
Project ASJP (Automated Similarity Judgment
Program)
9
Overview
Project ASJP (Automated Similarity Judgment
Program) Overall goal Automatic reconstruction
of language relationships
10
Overview
Project ASJP (Automated Similarity Judgment
Program) Overall goal Automatic reconstruction
of language relationships Basis Distance
matrices between individual languages on
the basis of linguistic features
11
Overview
Project ASJP (Automated Similarity Judgment
Program) Overall goal Automatic reconstruction
of language relationships Basis Distance
matrices between individual languages on
the basis of linguistic features Method
Lexicostatistics mass comparison of basic
lexical items,
12
Overview
Project ASJP (Automated Similarity Judgment
Program) Overall goal Automatic reconstruction
of language relationships Basis Distance
matrices between individual languages on
the basis of linguistic features Method
Lexicostatistics mass comparison of basic
lexical items, extended by all relevant
data available
13
Swadesh (2440)
14
Swadesh (2440)
ASJP software
15
Swadesh (2440)
ASJP software
distance matrices
16
Swadesh (2440)
ASJP1
ASJP2
distance matrices
17
Swadesh (2440)
ASJP1
ASJP2
distance matrices
TREE SFTW
18
Swadesh (2440)
ETHN WALS EXPRT
ASJP1
ASJP2
calibration
distance matrices
TREE SFTW
STAT SFTW
19
Swadesh (2440)
ETHN WALS EXPRT
ASJP1
ASJP2
calibration
distance matrices
TREE SFTW
STAT SFTW
20
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
ASJP1
ASJP2
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
21
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
HIST FACTS
ASJP1
ASJP2
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
22
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
HIST FACTS
PHON INVENT
ASJP1
ASJP2
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
23
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
HIST FACTS
PHON INVENT
ASJP1
ASJP2
Jeff Mielke 500
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
24
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
HIST FACTS
PHON INVENT
LOANS
ASJP1
ASJP2
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
25
Overview
OVERALL GOAL Reconstruction of Language
Relationships
26
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals
27
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications
28
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages
29
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages - Search for
(ir)regularities in phylogenies
30
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages - Search for
(ir)regularities in phylogenies - Test hypotheses
(e.g. Atkinson et al 2008 elbow phenomenon)
31
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages - Search for
(ir)regularities in phylogenies - Test hypotheses
(e.g. Atkinson et al 2008 elbow phenomenon) -
Experimentally find an optimal dating method
32
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages - Search for
(ir)regularities in phylogenies - Test hypotheses
(e.g. Atkinson et al 2008 elbow phenomenon) -
Experimentally find an optimal dating method -
Automatically detect borrowings
33
Overview
OVERALL GOAL Reconstruction of Language
Relationships Derived goals - Critical
assessment and refinement of existing
classifications - Classify newly described and
unclassified languages - Search for
(ir)regularities in phylogenies - Test hypotheses
(e.g. Atkinson et al 2008 elbow phenomenon) -
Experimentally find the best/optimal dating
method - Automatically detect borrowings
Today ...
34
Overview
1. The list of basic lexical items

35
Overview
1. The list of basic lexical items
2. Comparing words and languages
36
Overview
1. The list of basic lexical items
2. Comparing words and languages
3. Some results: genetic proximity
37
Overview
1. The list of basic lexical items
2. Comparing words and languages
3. Some results: genetic proximity
4. On Inheritance vs Borrowing
38
Overview
1. The list of basic lexical items
2. Comparing words and languages
3. Some results: genetic proximity
4. On Inheritance vs Borrowing
5. Immanent extensions
39
1. The list of basic lexical items
40
Lexical items
Word list Swadesh 100 basic meanings

41
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages

42
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar

43
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed

44
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed - Culturally independent

45
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed - Culturally independent - Stable over
time

46
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed - Culturally independent - Stable over
time - Few synonyms

47
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed - Culturally independent - Stable over
time - Few synonyms
?

48
Lexical items
Word list Swadesh 100 basic meanings - Word
coined in most languages - Collected in field
work lexicon / grammar - Inherited rather than
borrowed - Culturally independent - Stable over
time - Few synonyms
LWT
?

49
(No Transcript)
50
Otomi from Spanish
51
Lexical items further reduction
Early analyses have shown - Most stable 40/100
item subset gives same results

52
Lexical items further reduction
  • Early analyses have shown
  • - Most stable 40/100 item subset gives same
    results
  • → Less work


53
Lexical items further reduction
  • Early analyses have shown
  • - Most stable 40/100 item subset gives same
    results
  • → Less work
  • → Less missing data


54
Lexical items further reduction
  • Early analyses have shown
  • - Most stable 40/100 item subset gives same
    results
  • → Less work
  • → Less missing data
  • Faster processing: combinatorial explosion
  • 40 vs. 100 items: $10^{9} < 10^{10}$ COMPARISONS


56
Lexical items further reduction
Most stable: $S_{SM} = (R - U) / (1 - U)$

see references

57
Lexical items further reduction
Most stable: $S_{SM} = (R - U) / (1 - U)$
R = mean proportion same form for SMi / genus

58
Lexical items further reduction
Most stable: $S_{SM} = (R - U) / (1 - U)$
R = mean proportion same form for SMi / genus
U = mean proportion same form for different SMx / genus

59
Lexical items further reduction
Most stable: $S_{SM} = (R - U) / (1 - U)$
R = mean proportion same form for SMi / genus
U = mean proportion same form for different SMx / genus
N.B. $S_{SM}$: high correlation between families
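The stability measure can be sketched in a few lines of Python, taking the formula to be S = (R - U) / (1 - U) with R and U expressed as proportions. The function name and the sample values are hypothetical and are not taken from the ASJP data.

```python
def stability(r: float, u: float) -> float:
    """Chance-corrected stability S = (R - U) / (1 - U).

    r: mean proportion of same-form matches for the same meaning
    u: mean proportion of same-form matches across different meanings
    Both are assumed to be proportions, with u strictly below 1.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("U must lie in [0, 1)")
    return (r - u) / (1.0 - u)


# Hypothetical values, for illustration only.
example = stability(r=0.50, u=0.10)
```

When R equals U the measure is 0, meaning the item matches no more often than unrelated meanings do.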

60
[Figure: item stability compared with Ethnologue (Goodman-Kruskal) and WALS (Pearson)]
61
(No Transcript)
62
40 Most Stable
63
40 Most Stable
64
Lexical items transcription
First phase of project (2007) Problems with
full IPA representation of words

65
Lexical items transcription
First phase of project (2007) Problems with
full IPA representation of words - data entry
via keyboard

66
Lexical items transcription
First phase of project (2007) Problems with
full IPA representation of words - data entry
via keyboard - simple programming language
(Fortran Pascal)

67
Lexical items transcription
First phase of project (2007) Problems with
full IPA representation of words - data entry
via keyboard - simple programming language
(Fortran, Pascal)
→ Recoding to simplified ASJPcode (ASCII only)

68
Lexical items transcription
ASJPcode

69
Lexical items transcription
ASJPcode 7 Vowels

70
Lexical items transcription
ASJPcode 7 Vowels 34 Consonants

71
Lexical items transcription
ASJPcode 7 Vowels 34 Consonants
Closest sound

72
Lexical items transcription
ASJPcode
7 Vowels
34 Consonants
Operators for: Nasalization, Labialization, Palatalization, Aspiration, Glottalization

73
Abaza (Caucasian)
Meaning
PERSON
LEAF
SKIN
HORN
NOSE
TOOTH
74
Abaza (Caucasian)
Meaning  IPA
PERSON   ʕʷɨʧʼʲʷʕʷɨs
LEAF     b???
SKIN     ??az?
HORN     ?'????a
NOSE     p?n?'a
TOOTH    p??
75
Abaza (Caucasian)
Meaning  IPA           ASJPcode
PERSON   ʕʷɨʧʼʲʷʕʷɨs   Xw3Cw"yXw3s
LEAF     b???          bxy3
SKIN     ??az?         Cwazy
HORN     ?'????a       Cw"3Xwa
NOSE     p?n?'a        p3nc"a
TOOTH    p??           p3c
76
Lexical items
Collected to date - Close to 2500 languages
(incl. dialects and proto)

77
Lexical items
  • Collected to date
  • - Close to 2500 languages (incl. dialects and
    proto)
  • - Mean number of items/language 35.8 (/40)


78
Lexical items
Areal distribution (not a sample!)
Americas 27%
Eurasia 23%
Australia/PNG 18%
Austronesia 15%
Africa 14%
Creoles 2%
Artificial 1%

79
Languages currently sampled
80
2. Comparing words and languages
81
Comparing words
Two strategies

82
Comparing words
Two strategies 1. ASJP rules

83
Comparing words
1. ASJP context rules

84
Comparing words
ASJP context rules a. between 2 words

85
Comparing words
ASJP context rules
SMi WORDlg1
WORDlg2

86
Comparing words
ASJP context rules (C/V = general, c/v = specific, X)
SMi   WORDlg1        WORDlg2
R1    (V)cVcX        XcVcX
R2    Xc(V)c(V)cX    Xc(V)c(V)cX
R12   AVcvX          VcvX           (A = h, w, y)
R13   (V)ccVX        (V)ccVX
R22   cv             (CV)cv
87
Comparing words
ASJP context rules (C/V = general, c/v = specific, X)
SMi   WORDlg1        WORDlg2
R1    (V)cVcX        XcVcX
R2    Xc(V)c(V)cX    Xc(V)c(V)cX
R12   AVcvX          VcvX           (A = h, w, y)
R13   (V)ccVX        (V)ccVX
R22   cv             (CV)cv
pattern Wlg1 UNIFIES pattern Wlg2

90
Comparing words
ASJP context rules (C/V = general, c/v = specific, X)
R1    (V)cVcX        XcVcX
R2    Xc(V)c(V)cX    Xc(V)c(V)cX
R12   AVcvX          VcvX           (A = h, w, y)
R13   (V)ccVX        (V)ccVX
R22   cv             (CV)cv         yapi / opi

91
Comparing words
ASJP context rules a. between 2 words value 0
or 1

92
Comparing words
  • ASJP context rules
  • a. between 2 words
  • value 0 or 1
  • b. between 2 languages RELATEDNESS
  • (n of matching words / total pairs) × 100


93
Comparing words
  • ASJP context rules
  • a. between 2 words
  • value 0 or 1
  • b. between 2 languages DISTANCE
  • LSP = 100 - ((matching words / total pairs) × 100)
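A minimal sketch of the rule-based distance, assuming the garbled line reads LSP = 100 minus the percentage of matching word pairs. The 0/1 judgments would come from the ASJP context rules, which are not implemented here; the function name and the sample judgments are hypothetical.

```python
def lsp_distance(matches: list[bool]) -> float:
    """Distance in percent: 100 minus the percentage of matching word pairs.

    matches holds one True/False judgment per compared meaning.
    """
    if not matches:
        raise ValueError("need at least one compared word pair")
    similarity = 100.0 * sum(matches) / len(matches)
    return 100.0 - similarity


# Four hypothetical judgments: three matches out of four pairs.
toy_distance = lsp_distance([True, False, True, True])
```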


94
Comparing words
2. Levenshtein Distance

95
Comparing words
Levenshtein Distance
a. between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions)
min = 0 / max = length of longest word

96
Comparing words
Levenshtein Distance
a. between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions)
b. between 2 languages: mean LD for total number of pairs

97
Comparing words
Two problems with simple LD

98
Comparing words
  • Two problems
  • Value depends on length of longest word


99
Comparing words
  • Two problems
  • 1. Value depends on length of longest word
  • → Normalize: LDN = LD / Lmax


100
Comparing words
  • Two problems
  • 1. Value depends on length of longest word
  • → Normalize: LDN = LD / Lmax
  • 2. Differences between lgs in phonological overlap


101
Comparing words
  • Two problems
  • 1. Value depends on length of longest word
  • → Normalize: LDN = LD / Lmax
  • 2. Differences between lgs in phonological overlap
  • → Eliminate background noise:
  • LDND = LDN / LDN(different pairs)


102
Comparing words
Levenshtein Distance
a. between 2 words: LDND = 0 - 100 (%)

103
Comparing words
Levenshtein Distance
a. between 2 words: LDND = 0 - 100 (%)
b. between 2 languages: mean of all LDNDs of words in common
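The three measures (LD, LDN, LDND) can be sketched as follows. This is an illustration under stated assumptions, not the ASJP software: every character counts as one symbol, word lists are plain dictionaries keyed by meaning, and the background term is the mean LDN over all pairs of words with different meanings. All function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list1: dict[str, str], list2: dict[str, str]) -> float:
    """Mean LDN of same-meaning pairs divided by mean LDN of different-meaning pairs, times 100."""
    shared = sorted(set(list1) & set(list2))
    same = [ldn(list1[m], list2[m]) for m in shared]
    diff = [ldn(list1[m], list2[n]) for m in shared for n in shared if m != n]
    if not same or not diff:
        raise ValueError("need at least two shared meanings")
    background = sum(diff) / len(diff)
    if background == 0:
        raise ValueError("background distance is zero")
    return 100.0 * (sum(same) / len(same)) / background


# Toy two-item lists using forms that appear in the talk's Mayan example.
toy = ldnd({"ONE": "xun", "WATER": "a7"}, {"ONE": "hun", "WATER": "ha7"})
```

Dividing by the background term means that two languages whose sound inventories happen to overlap do not look closer than they are.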

104
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc)
MAYAN (45) > MAYAN   GeoD = 97   GenD = 1.86
ONE    xun / hun      -    LDND = 37.4
TWO    kob / kabe7    R1   LDND = 67.3
BONE   baq / baq      R3   LDND = 0.0
EAR    SCin / Cikin   -    LDND = 67.3
WATER  a7 / ha7       R10  LDND = 37.4

105
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc)
MAYAN (45) > MAYAN   GeoD = 97   GenD = 1.86
ONE    xun / hun      -    LDND = 37.4
TWO    kob / kabe7    R1   LDND = 67.3
BONE   baq / baq      R3   LDND = 0.0
EAR    SCin / Cikin   -    LDND = 67.3
WATER  a7 / ha7       R10  LDND = 37.4
TOTAL  LSP = 58.14

106
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc)
MAYAN (45) > MAYAN   GeoD = 97   GenD = 1.86
ONE    xun / hun      -    LDND = 37.4
TWO    kob / kabe7    R1   LDND = 67.3
BONE   baq / baq      R3   LDND = 0.0
EAR    SCin / Cikin   -    LDND = 67.3
WATER  a7 / ha7       R10  LDND = 37.4
TOTAL  LSP = 58.14   LDND = 51.68 (n = 35)

107
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc)
MAYAN (45) > MAYAN   GeoD = 97   GenD = 1.86
ONE    xun / hun      -    LDND = 37.4
TWO    kob / kabe7    R1   LDND = 67.3
BONE   baq / baq      R3   LDND = 0.0
EAR    SCin / Cikin   -    LDND = 67.3
WATER  a7 / ha7       R10  LDND = 37.4
HIGH CORRELATION  LSP = 58.14   LDND = 51.68 (n = 35)

108
Comparing languages
HIGH CORRELATION LSP LDND

109
Comparing languages
HIGH CORRELATION LSP / LDND
MAYA (n = 34)            0.93
INDO-EUROPEAN (n = 129)  0.97
AMERINDIAN (n = 511)     0.59

110
Comparing languages
BEST PERFORMERS
Within families
1. EYE     0.496
2. LOUSE   0.480
3. DIE     0.469
4. BREAST  0.415
5. STONE   0.364

111
Comparing languages
BEST PERFORMERS
Within families
1. EYE     0.496
2. LOUSE   0.480
3. DIE     0.469
4. BREAST  0.415
5. STONE   0.364
Across families
1. I       0.072
2. DIE     0.065
3. WE      0.061
4. YOU     0.057
5. BREAST  0.057

113
Comparing languages
BEST PERFORMERS
Within families
1. EYE     0.496
2. LOUSE   0.480
3. DIE     0.469
4. BREAST  0.415
5. STONE   0.364
Across families
1. I       0.072
2. DIE     0.065
3. WE      0.061
4. YOU     0.057
5. BREAST  0.057
- Shortness
- Sound Symbolism?

114
Comparing languages
WORST PERFORMERS
Within families
36. HORN      0.107
37. SEE       0.099
38. KNEE      0.095
39. NIGHT     0.079
40. MOUNTAIN  0.075

115
Comparing languages
WORST PERFORMERS
Within families
36. HORN      0.107
37. SEE       0.099
38. KNEE      0.095
39. NIGHT     0.079
40. MOUNTAIN  0.075
Across families
36. NIGHT     0.028
37. HEAR      0.027
38. HORN      0.027
39. STAR      0.024
40. KNEE      0.023

117
for 2440 lgs: 3,000,000 pairs ($\times\ 36^{2} \approx 3 \cdot 10^{9}$)
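The order of magnitude can be checked with a short calculation. It assumes that "362" on the slide stands for 36 squared, that 36 is the rounded mean number of items per language (35.8 on an earlier slide), and that every item of one list is compared with every item of the other. The variable names are hypothetical.

```python
from math import comb

n_languages = 2440
language_pairs = comb(n_languages, 2)      # unordered pairs of languages

items_per_language = 36                    # assumption: rounded mean of 35.8
word_comparisons = language_pairs * items_per_language ** 2

print(language_pairs)      # 2975580
print(word_comparisons)    # 3856351680
```

The product is a little under 4 billion, the same order of magnitude as the figure on the slide.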
118
3. Genetic proximity
119
Swadesh (2440)
ASJP2
distance matrices
Splits Tree
120
Swadesh (2440)
AJP2
distance matrices
Splits Tree
MEGA4
121
Swadesh (2440)
ASJP2
distance matrices
Neighbour Joining
Splits Tree
MEGA4
122
ASJP
LSP
123
Correlation ETHN .325
LSP
124
Correlation ETHN .325 (n 69)
(n 34)
LSP
125
Correlation ETHN .325
More structure than ETHN
LSP
126
Correlation ETHN .325
Separation
LSP
127
Levenshtein
LDND
128
Levenshtein
Correlation ETHN .195
LDND
129
Levenshtein
Correlation ETHN .195 (LSP .325)
LDND
130
ASJP
LDND
131
cholan
ASJP
LDND
132
cholan
tzeltalan
ASJP
LDND
133
cholan
tzeltalan
ASJP
LDND
134
yucatecan
ASJP
LDND
135
ASJP
LDND
136
ASJP
LDND
137

all significant > 0.01
139
Improving the fit
Enrich lexical with typological data

140
Swadesh (2440)
WALS (2580)

ASJP
distance matrices
TREE SFTW
141
SWALSH (2440)
ASJP
distance matrices
TREE SFTW
142
Improving the fit
Enrich lexical with typological data

143
Improving the fit
  • Enrich lexical with typological data
  • NOT 1:1 with ASJP languages


144
SWALSH (550)
ASJP
distance matrices
TREE SFTW
145
Improving the fit
  • Enrich lexical with typological data
  • NOT 1:1 with ASJP languages
  • WALS variables very unevenly spread


146
Improving the fit
  • Enrich lexical with typological data
  • NOT 1:1 with ASJP languages
  • WALS variables very unevenly spread
  • Maximum subset 85 most stable


147
Most stable WALS variables
148
Improving the fit
  • Enrich lexical with typological data
  • Maximum subset 85 most stable


149
Improving the fit
  • Enrich lexical with typological data
  • Maximum subset 85 most stable
  • Correlation with Swadesh: 0.063 (> 0.001)

?

150
Improving the fit
  • Enrich lexical with typological data
  • Maximum subset 85 most stable
  • Correlation with Swadesh: 0.063 (> 0.001)
  • Mantel Test: 10,000 simulations


151
Improving the fit
  • Enrich lexical with typological data
  • Maximum subset 85 most stable
  • Correlation with Swadesh: 0.063 (> 0.001)
  • Mantel Test: 10,000 simulations
  • best 0.050 <-> -0.043 (mean 0.009)
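A Mantel test of the kind mentioned here can be sketched as a permutation test on two distance matrices. This is a generic textbook version, not the project's own code: it assumes square symmetric matrices over the same languages, uses Pearson correlation, and computes a one-sided p-value. The function names and toy matrices are invented for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def upper_triangle(matrix, order):
    """Upper-triangle cells after reordering rows and columns by 'order'."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, simulations=10_000, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    identity = list(range(len(a)))
    fixed = upper_triangle(a, identity)
    observed = pearson(fixed, upper_triangle(b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(simulations):
        order = identity[:]
        rng.shuffle(order)  # relabel the languages of matrix b
        if pearson(fixed, upper_triangle(b, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (simulations + 1)


# Toy 4 x 4 distance matrices, invented for illustration.
lexical = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
typological = [[0, 2, 3, 5], [2, 0, 6, 7], [3, 6, 0, 9], [5, 7, 9, 0]]
r, p = mantel(lexical, typological, simulations=1000)
```

Permuting rows and columns together keeps each matrix internally consistent, which is why the test is preferred over correlating the cells as if they were independent observations.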


152
Improving the fit
  • Enrich lexical with typological data
  • Database 40 most stable Swadesh
  • 85 most stable WALS features


153
Improving the fit
  • Enrich lexical with typological data
  • Database 40 most stable Swadesh
  • 85 most stable WALS features
  • - Optimal weight of both?
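One simple way to frame the weighting question is a linear mix of the two distances. This is only an assumption for illustration: the slides do not say how the lexical and typological distances were combined, and both inputs are assumed to be on the same 0 to 1 scale. The function and parameter names are hypothetical.

```python
def combined_distance(lexical: float, typological: float, weight: float) -> float:
    """Linear mix of two distances; 'weight' is the share given to the lexical one."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * lexical + (1.0 - weight) * typological


# Hypothetical distances for one language pair, lexical data weighted 3:1.
mixed = combined_distance(lexical=0.8, typological=0.4, weight=0.75)
```

Searching for an optimal weight would then mean repeating the tree building for several values of `weight` and comparing each result with a reference classification.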


154
Improving the fit
155
Improving the fit
156
Improving the fit
157
4. On Inheritance vs Borrowing
158
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)

159
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I     dun / zun        LDND = 36.6
YOU   mun / wun        LDND = 36.6
HORN  tLar / k"arC     LDND = 66.0
FIRE  c"a / c"a        LDND = 0.0
FULL  c"ura / ac"uf    LDND = 66.0
NEW   c"iya / c"EyEr   LDND = 55.0

160
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I     dun / zun        LDND = 36.6
YOU   mun / wun        LDND = 36.6
HORN  tLar / k"arC     LDND = 66.0
FIRE  c"a / c"a        LDND = 0.0
FULL  c"ura / ac"uf    LDND = 66.0
NEW   c"iya / c"EyEr   LDND = 55.0
→ 6 items < 70.0

161
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I     dun / zun        LDND = 36.6
YOU   mun / wun        LDND = 36.6
HORN  tLar / k"arC     LDND = 66.0
FIRE  c"a / c"a        LDND = 0.0
FULL  c"ura / ac"uf    LDND = 66.0
NEW   c"iya / c"EyEr   LDND = 55.0
→ 6 items < 70.0
→ Genetically related!!

162
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)

163
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND = 36.9
TWO     dos / dos            LDND = 0.0
PERSON  persona / petsona    LDND = 15.8
STAR    estreya / estrecas   LDND = 27.6
NIGHT   noCe / noces         LDND = 44.2
NEW     nuevo / nueba        LDND = 44.2

164
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND = 36.9
TWO     dos / dos            LDND = 0.0
PERSON  persona / petsona    LDND = 15.8
STAR    estreya / estrecas   LDND = 27.6
NIGHT   noCe / noces         LDND = 44.2
NEW     nuevo / nueba        LDND = 44.2
→ 6 items < 70.0

165
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND = 36.9
TWO     dos / dos            LDND = 0.0
PERSON  persona / petsona    LDND = 15.8
STAR    estreya / estrecas   LDND = 27.6
NIGHT   noCe / noces         LDND = 44.2
NEW     nuevo / nueba        LDND = 44.2
NOT Related
Chance?

166
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND = 36.9
TWO     dos / dos            LDND = 0.0
PERSON  persona / petsona    LDND = 15.8
STAR    estreya / estrecas   LDND = 27.6
NIGHT   noCe / noces         LDND = 44.2
NEW     nuevo / nueba        LDND = 44.2
NOT Related
Chance? Or Borrowing?

167
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6

168
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA         f/g   0.17/0.82 (< 0.70)

169
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA <> CHA  f/g   0.17/0.82   0.00/0.00

170
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00

171
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA <> CHA  wwF

172
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA         wwF   83 (mean LDND estreya in IE)

173
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA         wwF   83-99 (mean estreya in AU)

174
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA <> CHA  wwF   83-99 <> 102 (mn estrecas / AU)

175
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA <> CHA  wwF   83-99 <> 102-85 (estrecas / IE)

176
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA <> CHA  wwF   83-99 <> 102-85

177
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85

178
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85
SPA <> CHA  phwF

179
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85
SPA         phwF  100.00 (phon estreya in IE / AU)

180
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85
SPA <> CHA  phwF  100.00 <> 0.52 (phon estrecas in AU / IE)

181
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85
SPA > CHA   phwF  100.00 > 0.52

182
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR  estreya / estrecas  LDND = 27.6
SPA > CHA   f/g   0.17/0.82 > 0.00/0.00
SPA > CHA   wwF   83-99 > 102-85
SPA > CHA   phwF  100.00 > 0.52
SYN CHA: puti7on (f 1.00)

183
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE  uno / unu  LDND = 36.9
SPA > CHA   f/g   0.24/0.82 > 0.03/0.00
            wwF   97-106 > 110-97
            phwF  12.00 > 0.44

184
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
TWO  dos / dos  LDND = 0.0
SPA > CHA   f/g   0.62/1.00 > 0.12/0.00
            wwF   78-99 > 102-78
            phwF  100.00 > 0.22


185
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NIGHT  noCe / noces  LDND = 44.2
SPA > CHA   f/g   0.23/0.55 > 0.04/0.00
            wwF   89-100 > 105-92
            phwF  100.00 > 0.10


186
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NEW  nuevo / nueba  LDND = 44.2
SPA > CHA   f/g   0.50/0.64 > 0.04/0.00
            wwF   68-104 > 105-80
            phwF  4.27 > 0.03

187
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
PERSON  persona / petsona  LDND = 15.8
SPA > CHA   f/g   0.20/0.64 > 0.01/0.00
            wwF   89-98 > 98-90
            phwF  32.40 > 0.13
SYN CHA: taotao (f 1.00)

188
Inherited or borrowed?
Further output filters

189
Inherited or borrowed?
Further output filters 1. Minimum N potential
borrowings

190
Inherited or borrowed?
Further output filters 1. Minimum N potential
borrowings 2. All in the same direction

191
Inherited or borrowed?
Further output filters 1. Minimum N potential
borrowings 2. All in the same direction 3.
Geographic information

192
Inherited or borrowed?
SPANISH (spa) INDO-EUROPEAN (128) > ROMANCE (12), EURASIA: SPAIN
VS.
CHAMORRO (cha) AUSTRONESIAN (678) > CHAMORRO, OCEANIA: GUAM
GEODIST = 13244   GENDIST = 3.00

193
Spaniards in Pacific since 16th century
HIST FACTS
Swadesh (2440)
ETHN WALS EXPRT
GEO GRAPH
ASJP1
ASJP2
distance matrices
TREE SFTW
STAT SFTW
MAP SFTW
194
Inherited or borrowed?
Further output filters 1. Minimum N potential
borrowings 2. All in the same direction 3.
Geographic information 4. Role of form and
meaning (?)

195
Inherited or borrowed?
Further output filters 1. Minimum N potential
borrowings 2. All in the same direction 3.
Geographic information 4. Role of form and
meaning (?)
LWT

196
Borrowed!
BOR spa TO cha: 6 (15.0%)   LDND = 76.63
(shared = 40, crit = 70.00 - U)
DATABASE: unu(spa) dos(spa) petsona(spa) estrecas(spa) noces(spa) nueba(spa)
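The selection step behind this output can be sketched as a threshold filter. The LDND values are the six Spanish/Chamorro items shown on the earlier slides and the criterion of 70.0 is the one the slides name. Reading "(15.0)" as a percentage of the 40 shared items is an assumption, although it agrees with 6 / 40. The variable names are hypothetical, and the further filters (direction, geography) are not modelled.

```python
CRITERION = 70.0     # LDND threshold named on the slides
SHARED_ITEMS = 40    # items available for both languages

# Per-item LDND values for Spanish / Chamorro, copied from the slides.
ldnd_by_item = {
    "ONE": 36.9,
    "TWO": 0.0,
    "PERSON": 15.8,
    "STAR": 27.6,
    "NIGHT": 44.2,
    "NEW": 44.2,
}

# Unrelated languages with items this similar are borrowing candidates.
candidates = sorted(m for m, d in ldnd_by_item.items() if d < CRITERION)
share = 100.0 * len(candidates) / SHARED_ITEMS

print(len(candidates), share)   # 6 15.0
```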

197
5. Immanent extensions
198
(No Transcript)
199
GARBAGE IN → GARBAGE OUT
200
Lexical items transcription
Second year of project (2008-9) Replace ASJP
code by full IPA representations

201
Lexical items transcription
Second year of project (2008-9) Replace ASJP
code by full IPA representations
Juliette Jeff

202
Lexical items transcription
Second year of project (2008-9) Problems with
full IPA representation solved

203
Lexical items transcription
Second year of project (2008-9) Problems with
full IPA representation solved 1.
scan/download/ full IPA representations

204
Lexical items transcription
Second year of project (2008-9) Problems with
full IPA representation solved 1.
scan/download/ full IPA representations 2.
automatic conversion IPA to integer (Python)

205
Lexical items transcription
Second year of project (2008-9) Problems with
full IPA representation solved 1.
scan/download/ full IPA representations 2.
automatic conversion IPA to integer (Python) 3.
(semi-)automatic recoding to ASJPcode
transduction on the basis of a formal grammar

206
Lexical items transcription
Abaza (Caucasian) Meaning PERSON

207
Lexical items transcription
Abaza (Caucasian)
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs

208
Lexical items transcription
Abaza (Caucasian)
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115

209
Lexical items transcription
Abaza (Caucasian)
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJPcode: 88,119,126,51,67,34,121,119,126,88,119,126,51,115 (Xw3Cw"yXw3s)
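The IPA-to-integer step corresponds to taking the Unicode code point of each character, which Python exposes as `ord`. The sketch below only reproduces the decimal sequence shown for PERSON; the project's actual script is not shown in the slides, and the variable names are hypothetical.

```python
# IPA string for Abaza PERSON, written with escapes so the code points are explicit.
ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

# One decimal code point per character.
codes = [ord(ch) for ch in ipa]
print(codes)
# [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]

# The conversion is lossless: the integers map back to the same string.
restored = "".join(chr(c) for c in codes)
```

Working on integers sidesteps the keyboard-entry and encoding problems listed for the first phase of the project, since any later recoding can be written as rules over numbers.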

210
Lexical items transcription
Second year of project (2008-9) 1. automatic
conversion IPA to integer (Python) 2.
(semi-)automatic recoding to ASJPcode
transduction on the basis of a formal grammar
Why not run on full IPA??

211
Lexical items transcription
Second year of project (2008) 1. automatic
conversion IPA to integer (Python) 2.
(semi-)automatic recoding to ASJPcode
transduction on the basis of a formal grammar
Caucasian: correlations IPA / ASJP > 0.9

212
Lexical items transcription
Second year of project (2008) 1. automatic
conversion IPA to integer (Python) 2.
(semi-)automatic recoding to ASJPcode
transduction on the basis of a formal grammar
- correlations IPA / ASJP > 0.9
- but ASJP better fit with classifications
→ IPA too specific

213
Lexical items transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJPcode (any unicode subset)
a <- 661, 895, 416,
formal grammar

214
Lexical items transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJPcode (any unicode subset)
a <- 661, 895, 416,
C -V <- C V / -
C V <- C -V, PL / - C V
formal grammar

215
Lexical items transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661,695,616,679,700,690,695,661,695,616,115
ASJPcode (any unicode subset)
optimal level of abstraction for historical phonological reconstruction?
a <- 661, 895, 416,
C -V <- C V / -
C V <- C -V, PL / - C V

216
Swadesh
Phon Invent
ETHN WALS EXP
GEO GRAPH
HIST FACTS
ASJP1
ASJP2
distance matrices
Borrowing!
TREE SFTW
STAT SFTW
MAP SFTW
217
Lexical items transcription

all significant > 0.01
218
Lexical items transcription

all significant > 0.01
219
  • Holman, Eric et al. (2008). Advances in automated language classification. In Arppe, Antti, Kaius Sinnemäki and Urpo Nikanne (eds.), Quantitative Investigations in Theoretical Linguistics, 40-43. Helsinki: University of Helsinki.
  • Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
  • Brown et al. (forthc. 2008). Automated classification of the world's languages: a description of the method and preliminary results. Sprachtypologie und Universalienforschung.
  • Bakker et al. (2009?). Using WALS for the ASJP project.

220
email.eva.mpg.de./wichmann/ASJPHomePage
221
?