1
The new book: Polyakov V.N., Solovyev V.D.
COMPUTER METHODS AND MODELS IN TYPOLOGY AND
HISTORICAL LINGUISTICS. Kazan, KSU, 2006.
Visit to Max Planck Institute of Evolutionary
Anthropology, Leipzig, May 2008.
2
Abstract
  • The monograph systematically presents results
    obtained by applying computer and mathematical
    methods to the analysis of the Database
    Languages of the World. The Database describes
    315 languages, mainly of Europe and Northern and
    Central Asia, in terms of more than 3800
    features. Computer methods for estimating the
    degree of structural similarity between
    languages are described. These methods can be
    applied to refine genetic classification. A
    mathematical model of the distribution of
    language features is constructed. The monograph
    may be of interest both to linguists of various
    specialities (typology, historical linguistics)
    and to mathematicians and computer specialists
    interested in applying mathematical methods in
    the humanities.

3
Contents (part 1)
  • Introduction
  • Chapter 1. TYPOLOGY AND CLASSIFICATION OF
    LANGUAGES
  • 1.1. Ways of classifying languages
  • 1.2. Correlations between various classifications
  • 1.3. Mathematical methods
  • 1.4. Databases
  • 1.5. Summary of chapter 1
  • Chapter 2. THE DATABASE LANGUAGES OF THE WORLD
  • 2.1. History of the creation of the DB
  • 2.2. Characteristics of the content of the
    Database Languages of the World
  • 2.3. The principles of the Database Languages of
    the World
  • 2.3.1. A binary principle
  • 2.3.2. A hierarchical principle
  • 2.3.3. A paradigm principle
  • 2.3.4. The data presentation format: pros and
    cons
  • 2.4. Characteristics of the model of the language
    essay
  • 2.5. Summary of chapter 2
  • Chapter 3. THE DATABASE SOFTWARE
  • 3.1. The primary tasks solved by means of the
    kernel of the DB
  • 3.2. The Windows version of the kernel of the DB
  • 3.3. Installation of the DB
  • 3.4. Working with the Database Languages of the
    World
  • 3.5. Pairwise comparison of languages at the
    level of classes and at the level of the whole
    essay
  • 3.6. Searching for strings in an essay
  • 3.7. How to work with the model and the language
    essay
  • 3.7.1. Commands of the main menu
  • 3.7.2. Export of language essays in a DB from a
    file
  • 3.7.3. Import of language essays in a file
  • 3.8. Navigation within the model and within the
    essays
  • 3.9. Development of the Web version of the DB
  • 3.9.1. Viewing the site
  • 3.9.2. Viewing the essays of a language
  • 3.9.3. Comparison of essays
  • 3.10. DB examination
  • 3.11. DB destination
4
Contents (part 2)
  • Chapter 4. NEW POSSIBILITIES OF QUANTITATIVE
    RESEARCHES
  • 4.1. Calculation of measures of pairwise
    similarity of languages
  • 4.1.1. Approaches to the calculation of
    similarity measures
  • 4.1.2. Taking into account the structure and
    volume of the feature space
  • 4.1.3. A technique for evaluating the
    calculations
  • 4.1.4. Some results of calculations of similarity
    measures
  • 4.1.5. Preliminary discussion of the results
  • 4.2. Clustering
  • 4.2.1. General information on the clustering
    problem
  • 4.2.2. Clustering of languages
  • 4.2.3. Clustering of features
  • 4.2.4. Evaluation of results by the Data
    Splitting method
  • 4.2.5. Architecture of the software system
  • 4.2.6. Detection of possible errors in the data
  • 4.2.7. Calculation of the frequencies of
    occurrence of features in a chosen group of
    languages
  • 4.3. Data mining
  • 4.4. Discussion of preliminary results in the
    field of clustering and data mining
  • 4.5. Summary of chapter 4

  • Chapter 5. MODEL OF STRUCTURAL EVOLUTION OF
    LANGUAGES
  • 5.1. General considerations
  • 5.2. Borrowings vs. parallel evolution
  • 5.3. Diagrams of feature distribution
  • 5.4. Analysis of the mathematical model of the
    dynamics of features
  • 5.5. Summary of chapter 5
5
Contents (part 3)
  • Chapter 6. DEVELOPMENT OF NEW METHODS OF
    VERIFICATION OF GENETIC HYPOTHESES
  • 6.1. Calculation of quantitative portraits of
    language families and branches
  • 6.2. Classification of features
  • 6.3. An example of a quantitative portrait of
    IE-languages (section "Compound and Complex
    Sentence")
  • 6.4. Automatic detection of genetically
    significant features
  • 6.5. A technique for the verification of genetic
    hypotheses
  • 6.6. Summary of chapter 6
  • Chapter 7. THE PHENOMENON OF TYPOLOGICAL SHIFT
  • 7.1. The nature of universal language features
  • 7.2. The nature of rare language features
  • 7.3. The phenomenon of typological shift
  • 7.4. Summary of chapter 7
  • THE CONCLUSION
  • Bibliography
  • Appendix A. Table ?.1. Genetic affiliation of the
    languages of the DB
  • Appendix ?. Table ?.1. List of the classifying
    features presented in the DB
  • Appendix B. Example of an essay (Swedish
    language)
  • Appendix ?. Results of comparison of the essays
    of the Danish and Swedish languages
  • Appendix ?. The special teaching course
    Databases For Researches In Language Typology
    And Historical Linguistics
  • Appendix E. Table ?.1. Quantitative card of
    features for section 2.5.4 Compound and Complex
    Sentence for IE-languages
  • Appendix ?. Dynamics of statistical universal
    features
  • Appendix ?. Distribution on the time axis of 11
    dead languages used in one of the calculations
  • Appendix ?. The symbol index
  • Appendix ?. The short glossary
  • Appendix ?. Name index

6
Chapter 1 TYPOLOGY AND CLASSIFICATION OF
LANGUAGES
  • Chapter 1 of the monograph describes the basic
    approaches to the classification of languages,
    mentions the main results obtained in recent
    works that apply mathematical methods, and lists
    the basic problems facing this field.
  • Summary of chapter 1
  • Languages can be classified on various grounds:
    by common origin (genetic classification), by
    territorial proximity of their areas of
    distribution (areal classification), and by
    similarity of structural properties (typological
    classification).
  • There are significant correlations between these
    classifications, but no rigid dependences.
  • In the genetic classification of languages
    considerable success has been achieved by
    applying the comparative-historical method;
    protolanguages have been reconstructed for the
    main language families.
  • The comparative-historical method in its
    classical form has possibly all but exhausted
    its possibilities: reconstruction to a depth of
    more than 10 thousand years is considered
    impossible, and many subtle questions remain
    unanswered.
  • As ever more linguistic data enter scientific
    circulation and extensive databases are created,
    essentially new possibilities have appeared for
    research using mathematical and computer
    methods.

7
Chapter 2 THE DATABASE LANGUAGES OF THE WORLD
and Chapter 3 THE DATABASE SOFTWARE
  • The second and third chapters give a detailed
    description of the structure of the Database
    Languages of the World and of its accompanying
    toolkit. Methods of working with the DB are
    described.
  • Summary of chapter 2
  • The Database Languages of the World contains
    detailed descriptions of almost all languages of
    Europe and Northern and Central Asia and is one
    of the two largest typological databases in the
    world.
  • The architecture of the DB is designed to support
    mathematical research methods.
  • Language typology and historical linguistics
    have a number of unresolved problems and
    questions. The Database and the associated
    mathematical models and methods could
    potentially be applied to resolve them.
  • Summary of chapter 3
  • A software suite has been developed for entering
    and editing language descriptions in the adopted
    format and for searching and comparing
    languages.
  • The DB is available in several versions: DOS,
    Windows, Web and Excel.
  • The DB can be used for research, educational and
    reference purposes.

8
Chapter 4 NEW POSSIBILITIES OF QUANTITATIVE
RESEARCHES IN THE FIELD OF LANGUAGE TYPOLOGY AND
HISTORICAL LINGUISTICS
  • The fourth chapter describes the key mathematical
    apparatus of the developed approach to
    calculating measures of language similarity. In
    traditional typological research, languages are
    classified on the basis of a small number of
    features. The proposed similarity measures use
    the whole available data set and a rigorous
    mathematical apparatus. This opens a new
    perspective on the classification of languages.
    However, choosing an adequate similarity measure
    has itself proved to be a challenge. The chapter
    describes the general theoretical apparatus and
    a technique for choosing a suitable similarity
    measure. A wide spectrum of methods is used to
    calculate similarity measures and to classify
    languages. Ordination of languages by means of
    the statistical package R is also applied.
  • Summary of chapter 4
  • The measure of similarity between languages is a
    basic parameter that can be useful for solving
    many classification problems. However, choosing
    from a continuum of possible measures one
    concrete similarity measure that is adequate to
    the object under study is a difficult problem in
    its own right.
  • General principles for constructing the required
    similarity measures are developed, and a group
    of typical measures is analysed by means of
    numerical experiments.
  • Software has been created that builds
    classifications of languages by means of
    rigorous methods of cluster analysis.
  • Possibilities for applying other methods of
    mathematical analysis are shown. Of this group
    of methods, ordination by means of the
    statistical package R appears to be the most
    promising.
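
As a rough illustration of the idea of pairwise similarity followed by cluster analysis, here is a minimal sketch. It assumes each language is reduced to the set of binary features it possesses. The Jaccard coefficient and single-linkage merging are stand-ins chosen for brevity, not the measures or the clustering software described in the book, and the language names, feature names and threshold are invented.

```python
from itertools import combinations

# Hypothetical toy data: each language is the set of binary features it has.
LANGUAGES = {
    "Lang A": {"f1", "f2", "f3", "f4"},
    "Lang B": {"f1", "f2", "f3", "f5"},
    "Lang C": {"f6", "f7", "f8"},
    "Lang D": {"f6", "f7", "f9"},
}


def jaccard(a, b):
    """Jaccard similarity: shared features divided by features in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(langs):
    """Return {(name1, name2): similarity} for every unordered pair."""
    return {
        (x, y): jaccard(langs[x], langs[y])
        for x, y in combinations(sorted(langs), 2)
    }


def single_linkage(langs, threshold):
    """Merge clusters while the closest pair of clusters is >= threshold."""
    clusters = [{name} for name in sorted(langs)]

    def link(c1, c2):
        # Single linkage: similarity of the most similar pair of members.
        return max(jaccard(langs[x], langs[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for pair, sim in similarity_matrix(LANGUAGES).items():
        print(pair, round(sim, 2))
    print(single_linkage(LANGUAGES, threshold=0.4))
```

Swapping `jaccard` for another function of two feature sets is enough to experiment with a different measure, which is the choice the chapter identifies as the hard part.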

9
Chapter 5 MODEL OF STRUCTURAL EVOLUTION OF
LANGUAGES
  • The fifth chapter describes a mathematical model
    of the distribution of language features. It
    rests on a number of postulates formulated after
    analysis of the DB. They correlate well with the
    postulates of lexical evolution formulated by
    Morris Swadesh. The new concept of LF-diagrams
    (which show how the number of features depends
    on their degree of prevalence) is formulated. It
    proved to be a powerful tool for analysing the
    mechanisms of borrowing.
  • Summary of chapter 5
  • The values of language features collected in the
    DB can be analysed in many different ways.
  • Analysis of the distribution of features that
    occur in exactly two languages has revealed the
    decisive role of borrowing in the evolution of
    languages.
  • The proposed method makes it possible to study,
    by rigorous mathematical methods, the contacts
    between language groups that took place during
    their evolution.
  • LF-diagrams, which describe the dependence of the
    number of features on their degree of
    prevalence, are introduced. They are an entirely
    new and interesting object of linguistic
    research that appeared only after the creation
    of the DB.
  • A mathematical model of the distribution of
    features is constructed that explains the shape
    of LF-diagrams. The proposed model confirms the
    recently stated hypothesis (Dahl Ö. 2004) about
    competition between young features.
  • The model makes it possible to analyse the
    dynamics of features in the whole DB and within
    separate language groups. The results obtained
    correlate well with those obtained earlier by
    classical methods.
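
The counting behind an LF-diagram can be sketched as follows, reading "degree of prevalence" as the number of languages in which a feature occurs. This is an assumption about the definition, and the data and function names are hypothetical. The second function picks out the features found in exactly two languages, the case the summary singles out for studying borrowing.

```python
from collections import Counter

# Hypothetical toy data: language name -> set of features present in it.
LANGS = {
    "L1": {"a", "b", "c"},
    "L2": {"a", "b", "d"},
    "L3": {"a", "e"},
}


def lf_diagram(langs):
    """Map k -> number of features that occur in exactly k languages."""
    prevalence = Counter(f for feats in langs.values() for f in feats)
    return dict(sorted(Counter(prevalence.values()).items()))


def features_in_exactly_two(langs):
    """Map each feature found in exactly two languages to that language pair."""
    holders = {}
    for name, feats in langs.items():
        for feature in feats:
            holders.setdefault(feature, []).append(name)
    return {f: tuple(sorted(h)) for f, h in holders.items() if len(h) == 2}


if __name__ == "__main__":
    print(lf_diagram(LANGS))
    print(features_in_exactly_two(LANGS))
```

Plotting the values of `lf_diagram` against k gives the diagram itself; the model in the chapter is what explains its shape, which this sketch does not attempt.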

10
Chapter 6 DEVELOPMENT OF NEW METHODS OF
VERIFICATION OF GENETIC HYPOTHESES
  • The sixth chapter introduces a new methodology
    for analysing the genetic relationship of
    languages. It is based on the frequencies of
    features. Earlier works on typology used such
    terms as unique feature and statistical
    universal feature. The doctoral thesis of E.
    Yaroslavceva introduced the concept of a rare
    feature. These terms are made more precise with
    reference to their use in the DB and for the
    purposes of this research.
  • The same chapter shows how the proposed methods
    are applied to testing hypotheses about the
    genetic relationship of languages.
  • Summary of chapter 6
  • A classification of features by frequency of
    occurrence is proposed that refines the one
    offered earlier by E. Yaroslavceva.
  • A technique is described for applying the
    introduced apparatus to the automatic detection
    of genetic markers (features that correlate
    strongly with the relationship of languages).
  • A new method of establishing the common origin of
    languages, based on comparison of rare features,
    is proposed.
  • The method is verified by evaluating it on groups
    of languages with a firmly established
    relationship (Romance and Germanic languages,
    the Altaic macro-family).
  • The method is applied to several language
    isolates (Sumerian, Nivkh). The results obtained
    confirm some of the hypotheses stated earlier.
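
A frequency-based classification of features and a comparison of two languages by their shared rare features could look like the sketch below. The thresholds (0.25 and 0.75), the category boundaries and the toy data are assumptions made purely for illustration; the book's actual definitions and verification technique are not reproduced here.

```python
# Hypothetical toy data: language name -> set of features present in it.
LANGS = {
    "A": {"u", "r1", "m"},
    "B": {"u", "r1", "m"},
    "C": {"u", "r2", "m"},
    "D": {"u", "m"},
    "E": {"u", "r2"},
    "F": {"u"},
    "G": {"u"},
    "H": {"u"},
}


def feature_counts(langs):
    """Number of languages in which each feature occurs."""
    counts = {}
    for feats in langs.values():
        for feature in feats:
            counts[feature] = counts.get(feature, 0) + 1
    return counts


def classify(count, total, rare_max=0.25, universal_min=0.75):
    """Label a feature by its frequency; thresholds are illustrative guesses."""
    if count == 1:
        return "unique"
    share = count / total
    if share >= universal_min:
        return "statistical universal"
    if share <= rare_max:
        return "rare"
    return "ordinary"


def shared_rare(langs, a, b):
    """Rare features (per classify) that languages a and b have in common."""
    counts = feature_counts(langs)
    total = len(langs)
    return {
        f for f in langs[a] & langs[b]
        if classify(counts[f], total) == "rare"
    }


if __name__ == "__main__":
    counts = feature_counts(LANGS)
    for feature in sorted(counts):
        print(feature, classify(counts[feature], len(LANGS)))
    print(shared_rare(LANGS, "A", "B"))
    print(shared_rare(LANGS, "A", "C"))
```

The intuition the sketch tries to capture is that two languages sharing a feature found almost everywhere says little, whereas sharing a feature found almost nowhere else is more suggestive of a common origin.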

11
Chapter 7 THE PHENOMENON OF TYPOLOGICAL SHIFT
  • The seventh chapter compares numerical data on
    modern and extinct languages. The techniques
    developed above make it possible to reveal
    global tendencies in language development. In
    particular, the phenomenon of typological shift
    is revealed: the frequency of high-frequency
    features increases, while low-frequency features
    are washed out.
  • Summary of chapter 7
  • Quantitative methods can be applied most
    effectively to the description of the
    macroevolution of languages and language
    features.
  • The availability of sufficiently complete
    descriptions of 52 extinct languages makes
    possible a systematic comparison of the
    distribution of features in languages that
    existed on average 2 thousand years ago and in
    modern languages. This provides a new tool for
    analysing language evolution.
  • The phenomenon called typological shift is
    described. It means that over the specified time
    interval frequent (statistically universal)
    features became even more frequent, while rare
    features were washed out.
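
One simple way to check a data set for the pattern described above is to compare each feature's share in an extinct sample with its share in a modern sample. The sketch below does this on invented data. The 0.5 midpoint separating "frequent" from "rare" features is an assumption, and the function names are hypothetical rather than taken from the book.

```python
# Hypothetical toy samples: language name -> set of features present in it.
EXTINCT = {
    "X1": {"common", "rare"},
    "X2": {"common"},
    "X3": {"common"},
    "X4": set(),
}
MODERN = {
    "M1": {"common"},
    "M2": {"common"},
    "M3": {"common"},
    "M4": {"common"},
}


def share(langs, feature):
    """Fraction of languages in the sample that have the feature."""
    return sum(feature in feats for feats in langs.values()) / len(langs)


def frequency_change(extinct, modern):
    """Map feature -> (share in extinct sample, share in modern sample)."""
    features = set().union(*extinct.values(), *modern.values())
    return {f: (share(extinct, f), share(modern, f)) for f in sorted(features)}


def consistent_with_shift(changes, midpoint=0.5):
    """True if no frequent feature lost ground and no rare feature gained."""
    for old, new in changes.values():
        if old > midpoint and new < old:
            return False
        if old < midpoint and new > old:
            return False
    return True


if __name__ == "__main__":
    changes = frequency_change(EXTINCT, MODERN)
    for feature, (old, new) in changes.items():
        print(feature, old, "->", new)
    print(consistent_with_shift(changes))
```

On real data a strict all-or-nothing check like `consistent_with_shift` would be too brittle; an aggregate statistic over all features would be the more natural next step.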

12
CONCLUSION
  • The book considers new techniques of quantitative
    research on the material of the DB Languages of
    the World. It describes methods for calculating
    pairwise similarity measures, methods of cluster
    analysis and methods of data mining.
  • Quantitative portraits have been calculated
    according to the genetic index of the DB. These
    calculations made it possible to reveal new
    statistical regularities in the evolution of
    languages. The phenomenon of typological shift
    across Eurasia has been discovered. It also
    explains the presence of a strong typological
    background that hinders the detection of genetic
    similarity between languages by direct computer
    methods.
  • The facts and regularities described have made it
    possible to look at the phenomenon of language
    diversity in a new way and to construct a new
    model of the structural evolution of languages.
  • A new technique for verifying genetic hypotheses,
    based on the use of rare features, has been
    proposed. The technique agrees well with
    traditional views for the languages of the
    Romance-Germanic group and the Altaic
    macro-family. Data have been obtained in support
    of a genetic relationship between Sumerian and
    the Semitic languages and between Nivkh and
    Chukotko-Kamchatkan. The relationship of Old
    Japanese to the Altaic macro-family was not
    confirmed.
  • On the whole, it can be said that the Database
    Languages of the World is a valuable computer
    linguistic resource, and its value for the
    scientific community will only increase over
    time.

13
POSTSCRIPT. Review of the new results obtained
after the book's publication
  • Regarding content
  • Descriptions of 3 Baltic languages and of Basque
    have been compiled, and the results of an
    examination of 34 languages have been obtained.
  • In the field of software products
  • A new version of the site has been developed and
    placed on the Internet.
  • A new interface concept for the DB has been
    developed, and a prototype of the reference and
    educational version of the DB has been built.
  • In the area of quantitative research
  • New similarity measures have been created whose
    quality indicator is higher by 10 than that of
    the measures described in the book.
  • The technique of applying phylogenetic methods to
    the DB has been mastered, and genetic trees have
    been obtained.
  • The parameters of typological shift have been
    refined.
  • Genetic markers for the families of Eurasia have
    been identified.

14
PPS. FUTURE DIRECTIONS
  • Cooperation with WALS techniques
  • Release of a version of the DB for
    English-speaking researchers
  • Improvement of all products
  • Publication of a reference book on the content of
    the DB
  • Verification of the main genetic hypotheses and
    of information about ancient areal contacts
    between the languages of Eurasia on the material
    of the DB
  • Discovery of new similarity measures (based on
    the separation of areal and genetic markers)
  • and so on.

15
  • Thank you for your attention!