Gramsci - PowerPoint PPT Presentation

1
Gramsci: authorship attribution of anonymous
newspaper articles
  • Maurizio Lana
  • Histoire et informatique
  • Textométrie des sources historiques
  • 6.6.2014

2
who we are
  • maurizio lana
  • mirko degli esposti
  • emanuele caglioti
  • dario benedetto
  • 1 scholar and 3 mathematical physicists

3
it's always data
  • the analysis of digitized physical-world
    phenomena works equally well on
  • CT (TAC) imaging,
  • songs,
  • ECG,
  • texts, ...

4
reason for the study
  • national edition of Gramsci's works, by the
    Ministero dei Beni Culturali
  • new work on the newspaper articles
  • many anonymous articles in the journals and
    newspapers Gramsci wrote for: Il Grido del
    Popolo, Avanti!, La Città Futura
  • request from the Fondazione Gramsci to start anew
    the study of the anonymous articles, to find new
    evidence of Gramsci's writings
  • this was in 2005

5
a little background
  • the start is in 1847: V.J. Bunjakovskij, On the
    possibility to apply determining measures of
    confidence to the results of some observing
    sciences, particularly statistics
  • 1897-98, W. Lutoslawski, On Stylometry and
    Principes de stylométrie
  • 1959, D. R. Cox and L. Brandwood, On a
    discriminatory problem connected with the works
    of Plato
  • 1962, A. Ellegard, Who was Junius?
  • 1964, F. Mosteller and D. Wallace, Inference and
    Disputed Authorship: The Federalist
  • 1978, A. Kenny, The Aristotelian Ethics: a study
    of the relationship between the Eudemian and
    Nicomachean Ethics of Aristotle
  • 1980, J.P. Benzécri, Pratique de l'analyse des
    données
  • 1987, J. F. Burrows, Word Patterns and Story
    Shapes: The Statistical Analysis of Narrative
    Style, LLC, 2, 1987, pp. 61-70

6
in common
  • they all work at the level of words

7
the turning point
  • G. Ledger, Re-counting Plato: A Computer Analysis
    of Plato's Style, Oxford, Clarendon Press, 1989
  • the units of analysis are
  • words containing a specified letter
  • words ending in a specified letter
  • words with a specified letter in penultimate
    position
  • that is, parts of words that are semantically
    and linguistically meaningless
  • "I have departed from the traditional approach of
    stylometry by ignoring entirely meanings and
    grammatical functions, measuring instead the
    frequencies of words according to their
    orthographic content"

8
today, for me (for us)
  • the key is a latent mathematical structure of
    the text
  • from L. Doležel, A note on quantification in
    text theory, in Text Processing, S. Allen
    ed., Stockholm, 1982, pp. 539-552
  • an expression of the idea: D. Khmelev, F.
    Tweedie, Using Markov chains for identification
    of writers, LLC, 16, 4, 2001, pp. 299-307

9
today, for me (for us)
  • another expression: D. Benedetto, E. Caglioti, V.
    Loreto, Language Trees and Zipping, Phys.
    Rev. Lett. 88, n. 4, 048702-1 to 048702-4 (2002)
  • take one text and compress it with Zip
  • then take another text and compress it with the
    compression dictionary of the first one
  • measure the difference in size: this is a
    measure of the relative entropy
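
The zipping steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: zlib's DEFLATE stands in for Zip, "compressing with the dictionary of the first text" is approximated by compressing the two texts concatenated, and the function names compressed_size and cross_size are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the DEFLATE-compressed data (maximum compression)."""
    return len(zlib.compress(data, 9))


def cross_size(reference: str, fragment: str) -> int:
    """Extra bytes needed to compress `fragment` once the compressor has
    already seen `reference`. Smaller values mean the fragment is
    statistically closer to the reference (assumption: concatenation
    approximates reusing the reference's compression dictionary)."""
    ref = reference.encode("utf-8")
    frag = fragment.encode("utf-8")
    return compressed_size(ref + frag) - compressed_size(ref)
```

A fragment that resembles the reference costs fewer extra bytes than an unrelated one. Note that DEFLATE only looks back 32 KB, so with a long reference only its tail influences the result, which is one reason to keep the appended fragment short.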

10
then came the AAAC
  • in 2004 the American mathematician Patrick Juola
    proposed the Ad-hoc Authorship Attribution
    Competition (AAAC) to find experimentally the
    best method to correctly attribute anonymous
    works:
    http://www.mathcs.duq.edu/juola/authorship_contest.html
  • second-best scorer: Vlado Keselj, with a method
    based on measurements of n-gram frequencies

11
the state of the QAA world in 2005
  • in 2002 Jack Grieve, in his thesis Quantitative
    Authorship Attribution: A History and an
    Evaluation of Techniques, counted at least 39
    known and used methods, with 93 variants, for
    quantitative AA
  • the aim of the AAAC: to prune the useless methods
  • nevertheless, this continues to be not science
    but craftsmanship

12
in 2005 we started
  • we had to prove to the Fondazione Gramsci that
    quantitative AA produced good results
  • we chose to use two QAA methods
  • relative entropy (already described)
  • n-gram distances (which gave Keselj 2nd place
    in the AAAC)

13
the protocol
  • phase 1: 50 certainly Gramscian texts and 50
    certainly non-Gramscian texts
  • do whatever you like to be able to recognize the
    Gramscian as Gramscian and the non-Gramscian as
    non-Gramscian
  • phase 2 (blind test): 40 unidentified texts, some
    Gramscian and some not; classify them correctly

14
text preparation
  • deletion of
  • citations of any length
  • proper nouns
  • numbers
  • no lemmatization: e.g. the choice of a given
    tense and person of a verb carries some amount
    of information that we cannot evaluate well
    enough to justify discarding it
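
A rough sketch of the preparation step above, under loud assumptions: citations are taken to be passages in quotation marks, proper nouns are approximated by capitalized words that do not start a sentence, and the function name prepare_text is hypothetical. The slides do not say how the deletions were actually carried out, and a real workflow would need manual checking.

```python
import re

# Assumption: a citation is any passage enclosed in quotation marks.
QUOTED = re.compile(r'«[^»]*»|"[^"]*"|“[^”]*”')
# Assumption: a capitalized word preceded by a lowercase letter or a
# comma/semicolon/colon plus a space is treated as a proper noun.
PROPER_NOUN = re.compile(r"(?<=[a-zàèéìòù,;:] )[A-Z]\w*")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def prepare_text(text: str) -> str:
    """Delete citations, (heuristic) proper nouns, and numbers.
    Word forms are left untouched: no lemmatization."""
    for pattern in (QUOTED, PROPER_NOUN, NUMBER):
        text = pattern.sub("", text)
    # Collapse the whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()
```

For example, prepare_text("Ieri a Torino 3000 operai hanno detto «basta» al padrone.") drops "Torino", "3000", and the quoted word, while leaving the verb forms as written.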

15
n-grams
  • sequences of n entities of a kind you must choose
    (we chose characters)
  • sliding n-grams: in "final", the 3-grams are fin,
    ina, nal
  • to find the right n you run tests
  • n-grams capture fragments of meaning, syntax,
    collocations/co-occurrences, etc.
  • you have a dictionary of Gramscian n-grams
  • you check the n-grams of your anonymous texts:
    you count the matches and the non-matches and
    take their algebraic sum; if it is positive the
    text is Gramscian, if negative it is not
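
The sliding n-grams and the match/non-match count described above can be sketched as follows. This is a simplified illustration of the slide's wording, assuming +1 per match and -1 per non-match; it is not the distance measure actually used in the study, and the names char_ngrams and attribution_score are hypothetical.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """All overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def attribution_score(text: str, reference_ngrams: set[str], n: int) -> int:
    """Algebraic sum over the n-grams of `text`: +1 for each n-gram found
    in the reference dictionary, -1 for each one that is not
    (assumption: equal weights for matches and non-matches)."""
    score = 0
    for gram in char_ngrams(text, n):
        score += 1 if gram in reference_ngrams else -1
    return score
```

With a reference dictionary built as set(char_ngrams(known_text, 3)), a positive score would count as Gramscian and a negative one as not, following the rule on the slide.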

16
strategy
  • maximize the correct attributions
  • at the same time, avoid false attributions
  • some missed attributions are OK if you don't
    produce false attributions
  • the people who commissioned the work must be
    able to trust you

17
strategy 2
  • we don't know whether, how, and how much an
    author's parole changes across subject matter,
    audience, genre, time, ...
  • so we decided to work on well-defined periods,
    leaving it to the Gramsci experts to decide
    their boundaries
  • 1st period: 1914-1921

18
a little of maths
  • having two methods at work, we could build a
    Cartesian plane on which the results of the
    measures were plotted, after normalizing them
    to the range -1 to 1
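
One way to picture this step, as a sketch only: the slides do not say how the normalization was done, so dividing by the largest absolute value is an assumption here, and the names normalize and to_plane are hypothetical.

```python
def normalize(scores: list[float]) -> list[float]:
    """Scale scores into [-1, 1] by dividing by the largest absolute value
    (assumed normalization; the sign of each score is preserved)."""
    peak = max(abs(s) for s in scores)
    return [s / peak if peak else 0.0 for s in scores]


def to_plane(entropy_scores: list[float],
             ngram_scores: list[float]) -> list[tuple[float, float]]:
    """One (x, y) point per text: x from the relative-entropy method,
    y from the n-gram method, both normalized to [-1, 1]."""
    return list(zip(normalize(entropy_scores), normalize(ngram_scores)))
```

Each text then becomes a point in the plane, and texts on which the two methods agree fall in the same quadrant.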

19
phase 1 - setup

20
phase 2 - blind test

21
the day after
  • we started to do the attributions - being paid by
    Fondazione Gramsci for it - without knowing
    anything about the texts, and giving periodic
    reports to the historians who were editing the
    various volumes of the national edition of
    Gramsci's works
  • we got the texts, normalized them, measured them,
    and produced a report that we sent to Fondazione
    Gramsci
  • the historians' evaluation of the QAA: no proposed
    attribution was unacceptable, even if not every
    proposed attribution was accepted
  • example of report

22
now we have stopped
  • due to the cuts to research funds, the national
    edition is currently on hold

23
some practical principles on AA
  • no tool can read a text and tell you "this text
    was written by Francesco Stella"
  • you can only classify the texts you chose to work
    on, as crunched by the tool you use
  • all of the texts will be connected to one
    another: you must interpret the results
  • you must mix anonymous or disputed works with
    control works: same period, same genre, same
    language, same author, similar authors, ...

24
be careful
  • when you have proper nouns in your works, it's
    easy to classify them
  • R. Clement and D. Sharp, Ngram and Bayesian
    Classification of Documents for Topic and
    Authorship, LLC, 2003, 18(4), 423-447
  • but you don't really classify the texts, you
    classify the collections of proper nouns they
    contain

25
why the Gramsci case was/is difficult and strange
  • articles are very short: between 300 and
    1000-1200 words
  • all of these articles share subject matter,
    ideology, context
  • there is no countercheck, and you work for a
    scholarly initiative that must deliver results
    (it's not simply an experiment)
  • the tables showing the matches are sparse tables;
    nevertheless these data work well

26
now what
  • Patrick Juola, the mathematician who proposed the
    AAAC, has released JGAAP, a package offering
    various tools for QAA
  • http://evllabs.com/jgaap/w/index.php/
  • the R package stylo is impressive and I wish
    we had had it when we started our work on the
    Gramsci texts

27
some references to start from
  • C. Basile, D. Benedetto, E. Caglioti, M. Degli
    Esposti, An example of mathematical authorship
    attribution, Journal of Mathematical Physics,
    2008, 49, pp. 1-20
  • C. Basile, D. Benedetto, E. Caglioti, M. Degli
    Esposti, L'attribuzione dei testi gramsciani:
    metodi e modelli matematici, La Matematica nella
    Società e nella Cultura, 2010, 3, pp. 235-269
  • M. Lana, Come scriveva Gramsci? Metodi matematici
    per riconoscere scritti gramsciani anonimi,
    Informatica Umanistica, 2010, 3, pp. 31-56

28
some references (2)
  • M. Lana, Individuare scritti gramsciani anonimi
    in un "corpus" giornalistico. Il ruolo dei metodi
    quantitativi, Studi storici: rivista trimestrale
    dell'Istituto Gramsci, 52 (4), pp. 859-880
  • P. Juola, Authorship Attribution, Foundations
    and Trends in Information Retrieval, Vol. 1, No.
    3 (2006), pp. 233-334,
    http://www.conll.org/walter/educational/material/fnt-aa.pdf
  • J. Grieve, Quantitative Authorship Attribution:
    An Evaluation of Techniques, LLC 22, pp. 251-270,
    http://dl.dropboxusercontent.com/u/99161057/Grieve_authorshipattribution.pdf

29
  • thanks!