Title: The new book: Polyakov V.N., Solovyev V.D. COMPUTER METHODS AND MODELS IN TYPOLOGY AND HISTORICAL LIN
1. The new book
Polyakov V.N., Solovyev V.D.
COMPUTER METHODS AND MODELS IN TYPOLOGY AND HISTORICAL LINGUISTICS
Kazan: KSU, 2006.
Visit to Max Planck Institute of Evolutionary
Anthropology, Leipzig, May 2008.
2. Abstract
- The monograph systematically presents results obtained by applying computer and mathematical methods to the analysis of the Database "Languages of the World". The Database describes 315 languages, mainly of Europe and Northern and Central Asia, in terms of more than 3800 features. Computer methods for estimating the degree of structural similarity between languages are described. These methods can be applied to refine the genetic classification. A mathematical model of the distribution of language features is constructed. The monograph may be of interest both to linguists of various specialities (typology, historical linguistics) and to mathematicians and computer specialists interested in applying mathematical methods in the humanities.
3. Contents (part 1)
- Introduction
- Chapter 1. TYPOLOGY AND CLASSIFICATION OF LANGUAGES
- 1.1. Ways of classifying languages
- 1.2. Correlations between various classifications
- 1.3. Mathematical methods
- 1.4. Databases
- 1.5. Summary of chapter 1
- Chapter 2. THE DATABASE "LANGUAGES OF THE WORLD"
- 2.1. History of the creation of the DB
- 2.2. Characteristics of the content of the Database "Languages of the World"
- 2.3. The principles of the Database "Languages of the World"
- 2.3.1. The binary principle
- 2.3.2. The hierarchical principle
- 2.3.3. The paradigm principle
- 2.3.4. The data presentation format: pros and cons
- 2.4. Characteristics of the model of the language essay
- 2.5. Summary of chapter 2
- Chapter 3. THE DATABASE SOFTWARE
- 3.1. The main tasks solved by the kernel of the DB
- 3.2. The Windows version of the kernel of the DB
- 3.3. Installing the DB
- 3.4. Working with the Database "Languages of the World"
- 3.5. Pairwise comparison of languages at the level of classes and at the level of the whole essay
- 3.6. Searching for lines in an essay
- 3.7. Working with the model and the language essay
- 3.7.1. Commands of the main menu
- 3.7.2. Export of language essays into the DB from a file
- 3.7.3. Import of language essays into a file
- 3.8. Navigation within the model and within the essays
- 3.9. Development of the Web version of the DB
- 3.9.1. Viewing the site
- 3.9.2. Viewing language essays
- 3.9.3. Comparison of essays
- 3.10. DB examination
- 3.11. Purpose of the DB
4. Contents (part 2)
- Chapter 4. NEW POSSIBILITIES OF QUANTITATIVE RESEARCH
- 4.1. Calculation of measures of pairwise similarity of languages
- 4.1.1. Approaches to the calculation of similarity measures
- 4.1.2. Taking into account the structure and size of the feature space
- 4.1.3. A technique for evaluating the calculations
- 4.1.4. Some results of calculations of similarity measures
- 4.1.5. Preliminary discussion of the results
- 4.2. Clustering
- 4.2.1. General information on the clustering problem
- 4.2.2. Clustering of languages
- 4.2.3. Clustering of features
- 4.2.4. Evaluation of results by the data-splitting method
- 4.2.5. Architecture of the software system
- 4.2.6. Detection of possible errors in the data
- 4.2.7. Calculation of the frequencies of occurrence of features in a chosen group of languages
- 4.3. Data mining
- 4.4. Discussion of preliminary results in the field of clustering and data mining
- 4.5. Summary of chapter 4
- Chapter 5. MODEL OF STRUCTURAL EVOLUTION OF LANGUAGES
- 5.1. General considerations
- 5.2. Borrowings vs. parallel evolution
- 5.3. Diagrams of feature distribution
- 5.4. Analysis of the mathematical model of the dynamics of features
- 5.5. Summary of chapter 5
5. Contents (part 3)
- Chapter 6. DEVELOPING NEW METHODS OF VERIFICATION OF GENETIC HYPOTHESES
- 6.1. Calculation of quantitative portraits of language families and branches
- 6.2. Classification of features
- 6.3. An example of a quantitative portrait of the IE languages (section "Compound and Complex Sentence")
- 6.4. Automatic detection of genetically significant features
- 6.5. A technique for verifying genetic hypotheses
- 6.6. Summary of chapter 6
- Chapter 7. THE PHENOMENON OF TYPOLOGICAL SHIFT
- 7.1. The nature of universal language features
- 7.2. The nature of rare language features
- 7.3. The phenomenon of typological shift
- 7.4. Summary of chapter 7
- THE CONCLUSION
- Bibliography
- Appendix A. Table ?.1. Genetic affiliation of the languages of the DB
- Appendix ?. Table ?.1. The list of the classifying features presented in the DB
- Appendix B. Example of an essay (Swedish language)
- Appendix ?. Results of the comparison of the essays on Danish and Swedish
- Appendix ?. The special teaching course "Databases for Research in Language Typology and Historical Linguistics"
- Appendix E. Table ?.1. Quantitative card of features for section 2.5.4 "Compound and Complex Sentence" for the IE languages
- Appendix ?. Dynamics of statistical universal features
- Appendix ?. Distribution on the time axis of the 11 dead languages used in one of the calculations
- Appendix ?. The symbol index
- Appendix ?. The short glossary
- Appendix ?. The name index
6. Chapter 1. TYPOLOGY AND CLASSIFICATION OF LANGUAGES
- Chapter 1 of the monograph describes the main approaches to the classification of languages, mentions the main results of recent work that applies mathematical methods, and lists the principal open problems in this field.
Summary of chapter 1
- Languages can be classified on various grounds: by common origin (genetic classification), by territorial proximity of the areas where they are spoken (areal classification), and by similarity of structural properties (typological classification).
- There are significant correlations between these classifications, but no strict dependencies.
- In the genetic classification of languages the comparative-historical method has achieved considerable success: protolanguages have been reconstructed for the main language families.
- The comparative-historical method in its classical form has possibly almost exhausted its potential: reconstruction to a depth of more than 10 thousand years is considered impossible, and many subtle questions remain unanswered.
- As ever more linguistic data enter scientific circulation and extensive databases are created, essentially new possibilities have appeared for research using mathematical and computer methods.
7. Chapter 2. THE DATABASE "LANGUAGES OF THE WORLD" and Chapter 3. THE DATABASE SOFTWARE
- The second and third chapters give a detailed description of the structure of the Database "Languages of the World" and of the accompanying toolkit. Methods of working with the DB are described.
Summary of chapter 2
- The Database "Languages of the World" contains detailed descriptions of almost all the languages of Europe and Northern and Central Asia and is one of the two largest typological databases in the world.
- The architecture of the DB is oriented towards supporting mathematical research methods.
- Language typology and historical linguistics have a number of unresolved problems and questions. The Database and the associated mathematical models and methods could potentially be applied to resolve them.
Summary of chapter 3
- A software suite has been developed for entering and editing language descriptions in the adopted format and for searching and comparing languages.
- The DB is available in several variants: DOS, Windows, Web and Excel versions exist.
- The DB can be used for research, educational and reference purposes.
8. Chapter 4. NEW POSSIBILITIES OF QUANTITATIVE RESEARCH IN THE FIELD OF LANGUAGE TYPOLOGY AND HISTORICAL LINGUISTICS
- The fourth chapter describes the key mathematical apparatus of the approach developed for calculating measures of language similarity. In traditional typological research languages are classified on the basis of a small number of features. The proposed similarity measures use the whole available data set and a rigorous mathematical apparatus, which opens new prospects for the classification of languages. However, choosing an adequate similarity measure has proved to be a challenge in itself. The chapter describes the general theoretical apparatus and a technique for choosing a suitable similarity measure. A wide range of methods is used to calculate similarity measures and to classify languages. Ordination of languages by means of the R statistical package is also applied.
Summary of chapter 4
- The measure of similarity between languages is a basic parameter that can be useful in solving many classification problems. However, choosing from a continuum of possible measures one concrete similarity measure that is adequate to the object under study is a difficult problem in its own right.
- General principles for constructing the required similarity measures have been developed, and a group of typical measures has been analysed by means of numerical experiments.
- Software has been created that builds classifications of languages using rigorous methods of cluster analysis.
- The possibilities of applying other methods of mathematical analysis are shown. Of this group of methods, ordination by means of the R statistical package appears to be the most promising.
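The point that the choice of measure matters can be illustrated with a minimal sketch. It assumes that each language is represented by the set of binary features it possesses, and it uses two standard textbook measures (the Jaccard coefficient and the simple matching coefficient) as stand-ins; the slides do not say which measures the book actually uses. The language and feature names are hypothetical placeholders, not DB content.

```python
from itertools import combinations

# Toy data (hypothetical): each language is the set of binary features it has.
languages = {
    "lang_a": {"f1", "f2", "f3", "f4"},
    "lang_b": {"f1", "f2", "f3", "f5"},
    "lang_c": {"f1", "f6", "f7"},
}

# The feature space is taken to be every feature seen in the toy data.
all_features = set().union(*languages.values())


def jaccard(x, y):
    """Shared features divided by features present in at least one language."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def simple_matching(x, y):
    """Share of the feature space on which both languages agree
    (both have the feature or both lack it)."""
    agree = sum((f in x) == (f in y) for f in all_features)
    return agree / len(all_features)


def similarity_table(langs, measure):
    """Pairwise similarity for every unordered pair of languages."""
    return {
        (a, b): measure(langs[a], langs[b])
        for a, b in combinations(sorted(langs), 2)
    }


jaccard_table = similarity_table(languages, jaccard)
matching_table = similarity_table(languages, simple_matching)

for pair in jaccard_table:
    print(pair, round(jaccard_table[pair], 3), round(matching_table[pair], 3))
```

The two measures rank the pairs in the same order here but give different absolute values, because simple matching also rewards shared absences. With thousands of features and a hierarchical feature space, such differences are one reason why selecting a measure is a problem in its own right.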
9. Chapter 5. MODEL OF STRUCTURAL EVOLUTION OF LANGUAGES
- The fifth chapter describes a mathematical model of the distribution of language features. It rests on a number of postulates formulated after analysis of the DB. They correlate well with the postulates of the evolution of the lexicon formulated by Morris Swadesh. The new concept of LF-diagrams (reflecting the dependence of the number of features on the degree of their prevalence) is formulated. It has proved to be a powerful tool for analysing the mechanisms of borrowing.
Summary of chapter 5
- The values of the language features collected in the DB can be analysed in many different ways.
- Analysis of the distribution of the features that occur in exactly two languages has revealed the decisive role of borrowing in the evolution of languages.
- The proposed method makes it possible to study, by rigorous mathematical methods, the contacts between language groups that took place in the course of evolution.
- LF-diagrams, which describe the dependence of the number of features on the degree of their prevalence, are introduced. They are an entirely new and interesting linguistic object of study, which became available only after the creation of the DB.
- A mathematical model of the distribution of features that explains the shape of the LF-diagrams is constructed. The proposed model supports a recently stated hypothesis (Dahl Ö. 2004) about competition among young features.
- The model makes it possible to analyse the dynamics of features in the whole DB and within separate language groups. The results correlate well with those obtained earlier by classical methods.
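As a rough illustration of what an LF-diagram tabulates, the sketch below counts, for each prevalence k, how many features occur in exactly k languages, and then lists the language pairs that share a feature found in exactly two languages. The data layout (a set of features per language), the function names and the toy data are assumptions made for this example; the book's own construction may differ in detail.

```python
from collections import Counter

# Toy data (hypothetical): language -> set of features it has.
languages = {
    "lang_a": {"f1", "f2", "f3"},
    "lang_b": {"f1", "f2", "f4"},
    "lang_c": {"f1", "f5"},
    "lang_d": {"f1", "f3"},
}


def lf_diagram(langs):
    """Map prevalence k to the number of features found in exactly k languages."""
    # Prevalence of a feature = number of languages in which it occurs.
    prevalence = Counter(f for feats in langs.values() for f in feats)
    return dict(sorted(Counter(prevalence.values()).items()))


def two_language_pairs(langs):
    """Count, for each pair of languages, the features that occur
    in exactly those two languages and nowhere else."""
    holders = {}
    for name, feats in langs.items():
        for f in feats:
            holders.setdefault(f, []).append(name)
    return Counter(tuple(sorted(h)) for h in holders.values() if len(h) == 2)


diagram = lf_diagram(languages)
pairs = two_language_pairs(languages)
print(diagram)      # prevalence -> number of features
print(dict(pairs))  # language pair -> number of exclusively shared features
```

In this toy set two features occur in one language, two in exactly two languages and one in all four. Pairs with many exclusively shared features would be the natural candidates for a closer look at contact between the languages concerned.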
10. Chapter 6. DEVELOPING NEW METHODS OF VERIFICATION OF GENETIC HYPOTHESES
- The sixth chapter introduces a new methodology for analysing the genetic relationship of languages. It is based on the frequencies of features. Earlier works on typology used such terms as unique feature and statistical universal feature. The concept of a rare feature was introduced in E. Yaroslavceva's doctoral dissertation. These terms are refined for use with the DB and for the objectives of this research.
- The same chapter shows how the proposed methods are applied to testing hypotheses about the genetic relationship of languages.
Summary of chapter 6
- A classification of features by frequency of occurrence is proposed, refining the one offered earlier by E. Yaroslavceva.
- A technique is described for applying the introduced apparatus to the automatic detection of genetic markers (features that correlate strongly with the relationship of languages).
- A new method of establishing the common origin of languages, based on the comparison of rare features, is proposed.
- The method is verified by evaluating it on groups of languages whose relationship is firmly established (Romance and Germanic languages, the Altaic macrofamily).
- The method is applied to several language isolates (Sumerian, Nivkh). The results confirm some of the hypotheses stated earlier.
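The idea of classifying features by frequency and then comparing languages on their rare features can be sketched as follows. The thresholds (`rare_max`, `universal_min`), the label "intermediate" for the middle class, and the toy data are all assumptions for illustration; the slides do not give the book's actual class boundaries or its comparison statistic.

```python
from collections import Counter

# Toy data (hypothetical): language -> set of features it has.
languages = {
    "l1": {"u", "m", "r"},
    "l2": {"u", "m", "r"},
    "l3": {"u", "m", "s"},
    "l4": {"u", "q"},
    "l5": {"u"},
}


def classify_features(langs, rare_max=0.4, universal_min=0.8):
    """Label each feature by the share of languages in which it occurs.
    The thresholds are illustrative assumptions, not values from the book."""
    n = len(langs)
    counts = Counter(f for feats in langs.values() for f in feats)
    classes = {}
    for feature, count in counts.items():
        share = count / n
        if count == 1:
            classes[feature] = "unique"
        elif share <= rare_max:
            classes[feature] = "rare"
        elif share >= universal_min:
            classes[feature] = "statistical universal"
        else:
            classes[feature] = "intermediate"
    return classes


def shared_rare(langs, a, b, classes):
    """Rare features present in both languages a and b."""
    return {f for f in langs[a] & langs[b] if classes[f] == "rare"}


classes = classify_features(languages)
print(classes)
print(shared_rare(languages, "l1", "l2", classes))  # rare features shared by l1 and l2
print(shared_rare(languages, "l1", "l3", classes))  # none shared by l1 and l3
```

The intuition behind this kind of comparison is that a feature found almost everywhere says little about relationship, while a coincidence in rare features is harder to attribute to chance. A real test would also need a baseline for how many rare features unrelated languages share.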
11. Chapter 7. THE PHENOMENON OF TYPOLOGICAL SHIFT
- The seventh chapter compares numerical data on modern and extinct languages. The techniques developed above make it possible to reveal global tendencies of language development. In particular, the phenomenon of typological shift is revealed: the frequency of high-frequency features increases, while low-frequency features are washed out.
Summary of chapter 7
- Quantitative methods can be applied most effectively to the description of the macroevolution of languages and language features.
- The availability of sufficiently complete descriptions of 52 extinct languages makes possible a systematic comparison of the distribution of features in languages that existed on average 2 thousand years ago and in modern languages. This provides a new tool for analysing language evolution.
- The phenomenon named typological shift is described. It means that over the specified time interval frequent features (statistical universals) became even more frequent, while rare features were washed out.
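The comparison behind typological shift can be sketched as a difference of feature frequencies between an older and a modern sample. The two samples below are invented toy data, built only to show the direction of the effect described above; they do not reproduce any result from the DB, and the function names are hypothetical.

```python
from collections import Counter

# Toy samples (hypothetical): each language is a set of features.
extinct_sample = [{"hi", "lo"}, {"hi"}, {"hi"}, {"x"}]
modern_sample = [{"hi"}, {"hi"}, {"hi"}, {"hi"}]


def frequencies(sample):
    """Share of languages in the sample that have each feature."""
    n = len(sample)
    counts = Counter(f for feats in sample for f in feats)
    return {f: c / n for f, c in counts.items()}


def frequency_changes(old_sample, new_sample):
    """Change in frequency (new minus old) for every feature seen in either sample."""
    old = frequencies(old_sample)
    new = frequencies(new_sample)
    return {f: new.get(f, 0.0) - old.get(f, 0.0) for f in set(old) | set(new)}


changes = frequency_changes(extinct_sample, modern_sample)
for feature in sorted(changes):
    print(feature, changes[feature])
```

In the toy data the feature that was already frequent in the older sample gains frequency, and the two rare ones disappear. On real data one would also have to check that the two samples are comparable in genetic and areal composition before reading such differences as a shift.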
12. CONCLUSION
- The book considers new techniques of quantitative research based on the material of the DB "Languages of the World". It describes methods for calculating pairwise similarity measures, methods of cluster analysis and data mining.
- Quantitative portraits have been calculated according to the genetic index of the DB. These calculations made it possible to reveal new statistical regularities in the evolution of languages. The phenomenon of typological shift across Eurasia has been discovered. It also explains the presence of a strong typological background that hinders the detection of genetic similarity between languages by direct computer methods.
- The facts and regularities described have made it possible to take a fresh look at the phenomenon of language diversity and to construct a new model of the structural evolution of languages.
- A new technique for verifying genetic hypotheses, based on the use of rare features, has been proposed. The technique agrees well with traditional views for the languages of the Romance-Germanic group and the Altaic macrofamily. Data have been obtained in support of a genetic relationship between Sumerian and the Semitic languages and between Nivkh and Chukotko-Kamchatkan. The relationship of Old Japanese to the Altaic macrofamily was not confirmed.
- On the whole, it can be said that the Database "Languages of the World" is a valuable computational linguistic resource, and its value for the scientific community will only increase with time.
13. POST SCRIPTUM
Review of the new results obtained after the book was published
Content
- Descriptions of 3 Baltic languages and of Basque have been compiled, and results of the examination of 34 languages have been obtained.
Software products
- A new version of the site has been developed and placed on the Internet.
- A new interface concept for the DB has been developed, and a prototype of the reference and educational version of the DB has been built.
Quantitative research
- New similarity measures have been created whose quality indicator is higher by 10 than that of the measures described in the book.
- The technique of applying phylogenetic methods to the DB has been mastered, and genetic trees have been obtained.
- The parameters of typological shift have been refined.
- Genetic markers for the families of Eurasia have been identified.
14. PPS. FUTURE DIRECTIONS
- Cooperation with WALS techniques
- Release of a version of the DB for English-speaking scientists
- Improvement of all products
- Publication of a reference book on the content of the DB
- Verification of the main genetic hypotheses and of information about ancient areal contacts between the languages of Eurasia on the material of the DB
- Discovery of new similarity measures (based on separating areal and genetic markers)
- and so on.
15. Thank you for your attention!