Quantitative approaches to language change

1
Quantitative approaches to language change

Søren Wichmann
MPI-EVA & Leiden University

2
Overview

1. Automated lexicostatistics: tools and methods
2. Automated lexicostatistics: results
3. Using typological databases for historical linguistic research

3
Automated lexicostatistics: methods

Lexicostatistics invented in the early 1950s
Recent renaissance due to two new developments:
1. Phylogenies can more meaningfully be established using modern computational methods developed by bioinformaticians
2. Subjective determinations of cognacy can be replaced by an objective, automated method

4
Traditional lexicostatistics
1st step: determine cognates on a standard list

Meaning   Cocopa   Diegeño   Cognate?
fire      aʔá      ʔaw       yes
nose      ixú      xú        yes
one       ʔít      ʔaxínk    no
Etc.      ...      ...       ...
5
2nd step: build a matrix of percent similarities

            Cocopa   Diegeño   Hualapai   Havasupai   ...
Cocopa      0        90        77         80
Diegeño     90       0         87         75
Hualapai    77       87        0          72
Havasupai   80       75        72         0
(invented example)

3rd step: find a graphic way of expressing the similarities and interpret this as a phylogeny
6
Fragment of matrix of similarities among Salishan
languages from Swadesh (1950)
7
Salish relations, after Swadesh (1950)
8
UPGMA tree produced in SplitsTree
(UPGMA = Unweighted Pair Group Method with Arithmetic mean)
10
Tools for producing a tree from a similarity matrix

1. Convert the similarity matrix to a distance matrix using a spreadsheet such as Excel
2. Prepare an input file for your preferred phylogenetic software using an editor such as TextPad (free from www.textpad.com)
3. Run the data using phylogenetic software. SplitsTree can be recommended (free from www.splitstree.org)
4. Choose the most appropriate algorithm (Neighbour Joining recommended for distance data)
5. Prepare your tree for presentation using a tool such as the Tree Explorer of MEGA
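
The first step in the list above can also be done outside a spreadsheet. The sketch below is only an illustration: it assumes that a distance is simply 100 minus the percent similarity, with the diagonal set to 0, which the slides do not state explicitly. The function name `similarity_to_distance` is hypothetical, and the numbers are the invented Yuman example from the earlier slide.

```python
def similarity_to_distance(similarity, full=100.0):
    """Turn a square matrix of percent similarities into distances.

    Assumption (not from the slides): distance = full - similarity,
    and every language is at distance 0 from itself.
    """
    n = len(similarity)
    return [
        [0.0 if i == j else full - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


# Invented example matrix from the "2nd step" slide.
labels = ["Cocopa", "Diegeño", "Hualapai", "Havasupai"]
similarity = [
    [0, 90, 77, 80],
    [90, 0, 87, 75],
    [77, 87, 0, 72],
    [80, 75, 72, 0],
]

distance = similarity_to_distance(similarity)
for label, row in zip(labels, distance):
    print(label, row)
```

Other conversions are possible (for example a logarithmic one), so the subtraction here is a choice made for simplicity.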

11
Preparing the input file

Look at the example files that come with SplitsTree and imitate them. For instance, this:

13
#nexus
BEGIN Taxa;
DIMENSIONS ntax=30;
TAXLABELS
BellaCoola
Comox
etc. ...
;
END;
BEGIN distances;
DIMENSIONS ntax=30;
FORMAT
triangle=LOWER
diagonal
labels
missing=?
;
MATRIX
BellaCoola 0
Comox 80 0
etc. ...
;
END;
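
Instead of typing such a file by hand, it could be generated from a distance matrix. The following is a minimal sketch that assumes the file layout shown on this slide (a Taxa block followed by a lower-triangle distances block with diagonal and labels). The function name `nexus_distances` is hypothetical, and whether SplitsTree accepts the output unchanged should be checked against the example files shipped with the program.

```python
def nexus_distances(labels, dist):
    """Return NEXUS text with a Taxa block and a lower-triangle distances block.

    Assumptions: labels are single tokens without spaces, and dist is a
    square, symmetric matrix in the same order as labels.
    """
    n = len(labels)
    lines = ["#nexus", "BEGIN Taxa;", f"DIMENSIONS ntax={n};", "TAXLABELS"]
    lines += labels
    lines += [
        ";",
        "END;",
        "BEGIN distances;",
        f"DIMENSIONS ntax={n};",
        "FORMAT triangle=LOWER diagonal labels missing=?;",
        "MATRIX",
    ]
    for i, label in enumerate(labels):
        # Lower triangle including the diagonal: columns 0..i of row i.
        row = " ".join(str(dist[i][j]) for j in range(i + 1))
        lines.append(f"{label} {row}")
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"


# The two taxa and the value 80 are taken from the slide; the rest is elided there.
text = nexus_distances(["BellaCoola", "Comox"], [[0, 80], [80, 0]])
print(text)
```

The returned string can be written to a file with the usual `open(path, "w", encoding="utf-8")` call.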
14
Let's do this using TextPad
15

Now we produce a tree from the data
Let's do that using SplitsTree, and let's look at different algorithms and features of the program

16
Illustrating the difference between UPGMA and
Neighbour Joining
17
UPGMA assumes that all members of a cluster have undergone the same amount of change
18
Neighbour Joining doesn't make this assumption
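
A small numerical illustration of this difference, offered as a sketch rather than as what SplitsTree does internally: the four taxa and their distances below are invented, derived from a tree in which A and B are sisters and C and D are sisters, with short branches (length 1) leading to A and C, long branches (length 5) leading to B and D, and an internal branch of length 1. UPGMA joins the pair with the smallest raw distance first, whereas Neighbour Joining first corrects each distance by the taxa's total distances to everything else (the standard Q criterion). The function names are hypothetical.

```python
from itertools import combinations

# Invented pairwise distances for the unequal-rates tree described above.
D = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6,
}
taxa = ["A", "B", "C", "D"]


def d(x, y):
    """Symmetric lookup in the distance table."""
    if x == y:
        return 0
    return D[(x, y)] if (x, y) in D else D[(y, x)]


def upgma_first_pair(taxa):
    """UPGMA joins the pair with the smallest distance."""
    return min(combinations(taxa, 2), key=lambda p: d(*p))


def nj_first_pair(taxa):
    """Neighbour Joining joins the pair minimising Q = (n-2)*d(i,j) - r_i - r_j."""
    n = len(taxa)
    r = {x: sum(d(x, y) for y in taxa) for x in taxa}
    return min(
        combinations(taxa, 2),
        key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]],
    )


print("UPGMA joins first:", upgma_first_pair(taxa))
print("NJ joins first:   ", nj_first_pair(taxa))
```

With these numbers UPGMA groups the two slowly changing taxa A and C, although they are not sisters in the generating tree, while Neighbour Joining picks a true sister pair.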
19
Comparing the two trees
20

Now we prepare our tree for presentation
Let's do that using MEGA

21
Automating the similarity measure
Levenshtein distances: the minimum number of steps (substitutions, insertions or deletions) that it takes to get from one word to another
Germ. Zunge → Eng. tongue
tsuŋə → tuŋə (substitution) → tʌŋə (substitution) → tʌŋ (deletion)
Or tongue → Zunge
tʌŋ → tʌŋə (insertion) → tuŋə (substitution) → tsuŋə (substitution)
= 3 steps, so LD = 3
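
The definition above corresponds to the standard dynamic-programming computation of Levenshtein distance, sketched below. The transcriptions used for Zunge and tongue are an assumed reading of the slide's example, and the affricate ts is treated as a single segment, which is what counting ts → t as one substitution implies. The function works on any sequence, so words can be passed either as strings or as lists of segments.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the first i-1 items of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion of x
                curr[j - 1] + 1,     # insertion of y
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


# Assumed segmentations of the slide's example.
zunge = ["ts", "u", "ŋ", "ə"]
tongue = ["t", "ʌ", "ŋ"]
print(levenshtein(zunge, tongue))
```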

22

There are more sophisticated versions where the phonetic distance between segments is taken into account, but operating with such fine distinctions only becomes relevant for minute dialectology.
People who have been using the more refined approach:
John Nerbonne & Johan Heeringa (dialectologists, Groningen)
Michael Cysouw's course
People who have been using raw LDs:
Serva & Petroni (physicists, Italy)
Myself and colleagues

23
Weighting Levenshtein distances
Serva & Petroni (2008): divide by the lengths of the strings compared. Takes into account that LDs grow with word length
Colleagues and I: divide by the length of the longest string compared and then divide by the average of LDs among words in Swadesh lists with different meanings. Takes into account typical word lengths of the languages compared and accidental similarity due to similarities in phonological inventories
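
The two weightings can be sketched as follows. This is a hedged illustration, not the published implementation: the function names `ldn` and `ldnd` are hypothetical, the first weighting is read here as division by the length of the longer word, and it is an assumption that the second step compares the average over same-meaning pairs with the average over all different-meaning pairs of two aligned word lists. The toy word lists are invented.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ldn(a, b):
    """LD divided by the length of the longer of the two words."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list_a, list_b):
    """Average normalised LD for same-meaning pairs, divided by the average
    for different-meaning pairs (assumed reading of the second weighting).

    list_a[i] and list_b[i] must express the same meaning.
    """
    n = len(list_a)
    same = [ldn(list_a[i], list_b[i]) for i in range(n)]
    diff = [ldn(list_a[i], list_b[j]) for i in range(n) for j in range(n) if i != j]
    return (sum(same) / len(same)) / (sum(diff) / len(diff))


# Invented toy lists: two meanings, one word per meaning per language.
words_a = ["abc", "de"]
words_b = ["abd", "fg"]
print(ldn("abc", "abd"), ldnd(words_a, words_b))
```

A value well below 1 means that words with the same meaning are more alike than words picked across meanings, which is what one expects for related languages.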
24
Comparing results for a test set: Mixe-Zoquean languages (Mexico)
Tree based on shared phonological innovations
(data from Wichmann 1995)
Tree based on automated lexicostatistics (using
Levenshtein distances)
25
So results are similar