English Corpus Linguistics: Introducing the Diachronic Corpus of Present-Day Spoken English (DCPSE)

1
English Corpus Linguistics: Introducing the
Diachronic Corpus of Present-Day Spoken English
(DCPSE)
  • Sean Wallis
  • UCL

2
Barber (1964): changes in English grammar
  • a. A tendency to regularize irregular morphology
    (e.g. dreamt → dreamed)
  • b. A revival of the mandative subjunctive,
    probably inspired by formal US usage (we demand
    that she take part in the meeting)
  • c. Elimination of shall as a future marker in
    the first person
  • d. Development of new, auxiliary-like uses of
    certain lexical verbs (e.g. get, want; cf. The
    way you look, you wanna / want to see a doctor
    soon)
  • e. Extension of the progressive to new
    constructions, e.g. modal, present perfect and
    past perfect passive progressive (the road would
    not be being built / has not been being built /
    had not been being built before the general
    elections)
  • f. Increase in the number and types of multi-word
    verbs (phrasal verbs, have/take/give a ride,
    etc.)
  • g. Placement of frequency adverbs before auxiliary
    verbs, even if no emphasis is intended (I never
    have said so)
  • h. Do-support for have (have you any money? and
    no, I haven't any money → do you have / have you
    got any money? and no, I don't have any money /
    haven't got any money)

3
The Diachronic Corpus of Present-Day Spoken
English (DCPSE)
  • Orthographically transcribed spoken BrE
  • Fully parsed
  • every sentence has a tree diagram
  • searchable with ICECUP and FTFs
  • 400,000 words each from:
  • London-Lund Corpus (LLC, aka the Survey Corpus)
  • ICE-GB
  • Balanced by text category
  • Not evenly distributed by year:
  • LLC samples from 1958-1977
  • ICE-GB samples from 1990-1992

4
Tree diagrams
  • A tree diagram for the sentence We're getting
    there.

5
Barber on shall and will
  • The distinctions formerly made between shall
    and will are being lost, and will is coming
    increasingly to be used instead of shall. One
    reason for this is that in speech we very often
    say neither will nor shall, but just 'll:
    I'll see you to-morrow, we'll meet you at the
    station, John'll get it for you. We cannot use
    this weak form in all positions (not at the end
    of a phrase, for example), but we use it very
    often and, whatever its historical origin may
    have been (probably from will), we now use it
    indiscriminately as a weak form for either shall
    or will, and very often the speaker could not
    tell you which he had intended. There is thus
    often a doubt in a speaker's mind whether will or
    shall is the appropriate form and, in this
    doubt, it is will that is spreading at the
    expense of shall, presumably because will is used
    more frequently than shall anyway, and so is
    likely to be the winner in a levelling process.
    So people nowadays commonly say or write I will
    be there, we will all die one day, and so on,
    when they intend to express simple futurity and
    not volition.
  • (Barber 1964: 134)

6
Denison on shall and will
  • During the latter part of our period
    [1776-present day] ... in the first person shall
    has increasingly been replaced by will even where
    there is no element of volition in the meaning.
  • (Denison 1998: 167)

7
The use of shall and will in written British and
American English from the 1960s and 1990s

BrE      LOB      FLOB     LL      diff (%)
will     2,798    2,723     1.2     -2.7
shall      355      200    44.3    -43.7

AmE      Brown    Frown    LL      diff (%)
will     2,702    2,402    17.3    -11.1
shall      267      150    33.1    -43.8

From Mair and Leech (2006: 327)
  • Figures are frequencies normalised per million
    words
  • Log-likelihood (LL) is calculated against the
    number of words
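
The diff column above can be recomputed from the two frequency columns, and a log-likelihood score can be sketched in the same way. The snippet below is a minimal illustration, not Mair and Leech's own procedure: the function names are made up for this example, and the log-likelihood assumes both corpora contain exactly one million words, so its results only land close to the published LL figures rather than matching them.

```python
import math

# Per-million-word frequencies copied from the table above:
# (1960s corpus, 1990s corpus).
FREQS = {
    ("BrE", "will"): (2798, 2723),
    ("BrE", "shall"): (355, 200),
    ("AmE", "will"): (2702, 2402),
    ("AmE", "shall"): (267, 150),
}


def pct_diff(f_old, f_new):
    """Percentage change from the earlier to the later frequency."""
    return 100.0 * (f_new - f_old) / f_old


def log_likelihood(f_old, f_new, n_old=1_000_000, n_new=1_000_000):
    """Two-corpus log-likelihood against corpus size in words.

    ASSUMPTION: n_old and n_new default to one million words each;
    the real corpus sizes are not given on the slide.
    """
    total = f_old + f_new
    e_old = total * n_old / (n_old + n_new)  # expected count, earlier corpus
    e_new = total * n_new / (n_old + n_new)  # expected count, later corpus
    return 2.0 * (f_old * math.log(f_old / e_old)
                  + f_new * math.log(f_new / e_new))


for (variety, word), (f_old, f_new) in FREQS.items():
    print(variety, word,
          round(pct_diff(f_old, f_new), 1),
          round(log_likelihood(f_old, f_new), 1))
```

Both measures use the number of words as the baseline, which is the design choice the following slides question.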

8
Mair and Leech's data
  • Simply counts tagged lexical tokens
  • will: auxiliary verb, includes 'll
  • shall: auxiliary verb
  • Includes negative forms
  • Does not distinguish by grammatical position or
    context
  • Does not ask whether the choice is available,
    e.g. limit to first person use
  • Does not consider subclasses separately
  • Negative cases: will not/won't vs. shall
    not/shan't?
  • Do interrogative cases behave differently?
  • Is written data only
  • Can we do better than this?

9
An FTF for first person declarative shall
  • This FTF is limited to first person cases
  • The FTF requires that the NP is realised by the
    pronoun I or we.
  • Interrogative cases have a different structure
  • We can subtract negative (shall not) cases to
    exclude them.

10
Shall vs. will
  • Does the proportion of cases of shall out of
    {shall, will} change over time?
  • χ² for first person subject shall vs. will
  • d%: percentage difference (30% fall in shall
    between LLC and ICE-GB)
  • an estimate of the size of the overall effect
    (a bit like d%)
  • χ²: 2x2 chi-square test: is this change
    statistically significant?
  • χ²(shall): 2x1 goodness of fit test: does shall
    behave differently to average?
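
The two tests named above can be sketched in a few lines. This is an illustration under stated assumptions rather than the calculation behind the slide: the counts are the shall and will row totals from the modal meaning table later in this presentation (107 and 79 in LLC, 37 and 56 in ICE-GB), and whether this slide used exactly those figures is an assumption. The function names are invented for the example, and no continuity correction is applied.

```python
CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def goodness_of_fit(observed, baseline):
    """Chi-square comparing observed counts with a baseline distribution."""
    scale = sum(observed) / sum(baseline)
    chi2 = 0.0
    for obs, base in zip(observed, baseline):
        expected = base * scale
        chi2 += (obs - expected) ** 2 / expected
    return chi2


# ASSUMED counts (see the note above): shall, will per corpus.
llc_shall, llc_will = 107, 79
ice_shall, ice_will = 37, 56

# 2x2 test: does the shall/will split differ between the two corpora?
change = chi_square_2x2(llc_shall, llc_will, ice_shall, ice_will)

# 2x1 test: is shall distributed across the corpora like shall + will overall?
shall_only = goodness_of_fit(
    (llc_shall, ice_shall),
    (llc_shall + llc_will, ice_shall + ice_will),
)

print(round(change, 2), change > CRITICAL_95_DF1)
print(round(shall_only, 2), shall_only > CRITICAL_95_DF1)
```

The 2x2 test asks whether the choice changes between corpora; the 2x1 test looks only at shall and compares it with the distribution of the whole set.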

11
Shall vs. will/'ll
  • Does the proportion of cases of shall out of
    {shall, will, 'll} change over time?
  • χ² for first person subject shall vs. will vs.
    'll
  • χ²(shall): 2x1 goodness of fit test: does shall
    behave differently to average?

12
Focusing on choice
  • We focused on the choice of shall vs. will
  • Mair and Leech simply said that total cases of
    shall fell
  • But this might have happened for other reasons
  • For example there may have been more
    opportunities to use shall in the LLC data
  • Examining choice is a more precise way of
    conducting experiments than counting frequencies
  • It allows us to consider what variables (time,
    genre, other choices) affect the probability of
    shall being chosen
  • Probability is a simple fraction from 0 to 1.
  • p(shall) = F(shall) / (F(shall) + F(will))
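
A small made-up example shows why the baseline matters. All counts below are hypothetical and chosen only for illustration: the per-million-word frequency of shall halves, yet the probability of choosing shall where the choice exists does not move, because the number of opportunities to use either form has halved too.

```python
def per_million(count, corpus_words):
    """Frequency normalised per million words."""
    return count * 1_000_000 / corpus_words


def p_shall(f_shall, f_will):
    """Probability of shall out of the set {shall, will}."""
    return f_shall / (f_shall + f_will)


# HYPOTHETICAL counts, not taken from DCPSE.
early = {"words": 400_000, "shall": 120, "will": 80}
late = {"words": 400_000, "shall": 60, "will": 40}

for label, sample in (("early", early), ("late", late)):
    print(label,
          per_million(sample["shall"], sample["words"]),
          p_shall(sample["shall"], sample["will"]))
```

A word-based count would report a fall in shall here, while a choice-based measure would report no change.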

13
Probability of shall vs. will over time

14
Probability of shall vs. will/'ll over time

15
Confidence intervals
  • Probability p(shall)
  • 0: no cases are of type shall
  • 1: all cases are of type shall
  • Our sample is a tiny subset of possible sentences
    from the same period
  • So we cannot say a particular observation is
    certain
  • Instead we try to estimate our confidence in an
    observation using error bars or confidence
    intervals
  • The more data we have supporting an observation
    p, the smaller the confidence interval around it
  • We set a confidence level, typically of 95%
  • we are 95% sure that the true value is within the
    interval
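
One way to compute such an interval for an observed proportion is the Wilson score interval. The slide does not say which formula was used, so treating Wilson as the method here is an assumption; the function name and the sample sizes are also made up for illustration.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases.

    z = 1.959964 corresponds to a 95% confidence level.
    ASSUMPTION: the Wilson formula is one common choice, not
    necessarily the one behind the error bars on the slides.
    """
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# HYPOTHETICAL samples: the same observed p(shall) with more and more data.
for n in (30, 100, 1000):
    lower, upper = wilson_interval(0.72, n)
    print(n, round(lower, 3), round(upper, 3))
```

Running the loop shows the interval narrowing as n grows, which is the point made in the bullet about having more data.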

16
Modal meaning
  • Remember Barber and Denison: not all cases of
    shall or will mean the same thing
  • Root (futurity)
  • I've got some at home so I shall take it home.
    DI-A18 30
  • I will answer you in a minute. DI-B30 293
  • Epistemic (volition)
  • So I shall have roughly from the twenty-ninth of
    June to the eighth of July on which I can spend
    the whole of that time on those two papers.
    DL-B01 62
  • It's certainly my long term hope that I will have
    some kind of companion... DI-B53 0257
  • We should examine these choices separately
  • Unfortunately this means classifying cases
    manually

17
Modal meaning statistics
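                 Root           Epistemic          Unclear      Total
shall  LLC       33 (30.84%)    72 (67.29%)        2 (1.87%)    107
       ICE-GB    22 (59.46%)    14 (37.84%) sig    1 (2.70%)     37
will   LLC       44 (55.70%)    28 (35.44%)        7 (8.86%)     79
       ICE-GB    37 (66.07%)    14 (25.00%)        5 (8.93%)     56
Total            136            128 sig            15           279

(Percentages are row percentages; sig marks cells flagged as
significant in the original table.)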
  • Root shall / will is stable: results are not
    significant
  • Epistemic shall / will falls (d% = -30% ±27%)
  • The fall in shall is not explained by the sharp
    fall in Epistemic modals overall, from 100
    (72+28) to 28 (14+14)
  • This is evidence that the shift in use in the
    20th century is concentrated within Epistemic
    meanings, from shall to will.
  • Barber and Denison: the earlier shift was in Root
    (future) meaning.
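
The per-meaning comparison can be checked with a two-proportion z test, whose squared statistic equals the 2x2 chi-square without continuity correction. The sketch below uses the Root and Epistemic counts from the table; the function names are invented for this example, and it does not claim to reproduce the exact procedure or the confidence interval reported on the slide.

```python
import math

# (shall, will) counts per corpus, copied from the table above.
COUNTS = {
    "Root": {"LLC": (33, 44), "ICE-GB": (22, 37)},
    "Epistemic": {"LLC": (72, 28), "ICE-GB": (14, 14)},
}

Z_CRITICAL_95 = 1.959964  # two-tailed critical value at the 5% level


def two_proportion_z(p1, n1, p2, n2):
    """z score for the change from p1 to p2, using a pooled variance."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p2 - p1) / std_err


def summarise(meaning):
    """Return p(shall) in each corpus, the percentage difference and z."""
    shall1, will1 = COUNTS[meaning]["LLC"]
    shall2, will2 = COUNTS[meaning]["ICE-GB"]
    n1, n2 = shall1 + will1, shall2 + will2
    p1, p2 = shall1 / n1, shall2 / n2
    d_pct = 100.0 * (p2 - p1) / p1
    return p1, p2, d_pct, two_proportion_z(p1, n1, p2, n2)


for meaning in COUNTS:
    p1, p2, d_pct, z = summarise(meaning)
    verdict = "significant" if abs(z) > Z_CRITICAL_95 else "not significant"
    print(meaning, round(p1, 3), round(p2, 3), round(d_pct, 1),
          round(z, 2), verdict)
```

With these counts the Epistemic change passes the 5% threshold and the Root change does not, in line with the first two bullets above.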

18
Modal meaning statistics
(Table as on the previous slide.)
  • Shall is losing its particular Epistemic meaning
    as a result
  • In the LLC data two thirds (67%) of shall uses
    were Epistemic.
  • This fell to 37% (just over one third) in ICE-GB.

19
Conclusions
  • DCPSE is
  • orthographically transcribed spoken English
  • mostly spontaneous
  • fully parsed and checked by linguists, uses
    phrase structure grammar based on Quirk et al.
  • searchable with ICECUP and FTFs
  • Even lexical studies benefit from parsing
  • allows us to focus on when a choice occurs
  • You can use DCPSE to carry out many different
    experiments on real English
  • we looked at change over (recent) time
  • we might also look at how decisions interact

20
Conclusions
  • Designing a Corpus Linguistic experiment means
    thinking carefully about your hypothesis and then
    attempting to test it against the corpus
  • We examined the shift from shall to will
  • We limited it to first person, declarative,
    positive cases
  • Changing baselines (including 'll) may lead to
    different conclusions
  • Many corpus studies only consider word baselines
    (frequency per million words, pmw)
  • But it is often better to consider proportions of
    types of clause or phrase, or list specific
    alternative choices
  • Alternation (choice) studies aim to hold meaning
    constant so the speaker/writer is free to choose
    between both cases
  • We focused further by subdividing data by modal
    meaning

21
Suggested further reading
  • On shall vs. will and the progressive:
  • Aarts, B., Close, J. and Wallis, S.A.
    (forthcoming) Choices over time: methodological
    issues in investigating current change. In
    B. Aarts et al. The changing Verb Phrase.
    Cambridge: CUP.
  • www.ucl.ac.uk/english-usage/projects/verb-phrase/book/aartsclosewallis.pdf
  • Barber, C. (1964) Linguistic Change in
    Present-Day English. Edinburgh and London:
    Oliver and Boyd.
  • Denison, D. (1998) Syntax. In S. Romaine (ed.)
    The Cambridge History of the English Language.
    Vol. IV: 1776-1997. Cambridge: Cambridge
    University Press. 92-329.
  • Mair, C. and Leech, G. (2006) Current changes in
    English syntax. In B. Aarts and A. McMahon (eds.)
    The Handbook of English Linguistics. Malden, MA:
    Blackwell Publishers. 318-342.
  • On statistical tests, confidence intervals and
    other methods:
  • Wallis, S.A. (2010) z-squared: the origin and use
    of χ². Survey of English Usage, UCL.
  • www.ucl.ac.uk/english-usage/statspapers/z-squared.pdf