Title: Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses
Thomas Hoffmann (University of Regensburg)
- Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, University of Tübingen, 02.02.-04.02.2006
1. Introduction: Corpus vs. Introspection
- "We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way." (Sampson 2001: 135)
- "You don't take a corpus, you ask questions. You can take as many texts as you like, you can take tape recordings, but you'll never get the answer." (Chomsky in Aarts 2000: 5-6)
- → Which type of data are we left with then?
1. Introduction: Corpus vs. Introspection
- "A corpus and an introspection-based approach to linguistics can be gainfully viewed as being complementary." (McEnery and Wilson 1996: 16)
- → corpus and introspection data: corroborating evidence
- → case study: P placement in English relative clauses
1. Introduction: What to Expect
- corpora vs. introspection?
- categorical corpus data (ICE-GB corpus)
- Magnitude Estimation experiment
- variable corpus data (ICE-GB corpus)
- conclusion
2. Corpora and Introspection
- Arguments against corpus data
- performance problem
- negative data problem
- homogeneity problem
- → only use introspection
2. Corpora and Introspection
- Arguments against corpus data ≠ no corpus
- performance problem, yet: performance = result of competence; modern corpora representative
- negative data problem, yet: only additional (different) data needed
- homogeneity problem, yet: empirical claim that needs to be investigated
- → use corpora + additional data type
2. Corpora and Introspection
- Arguments against introspection data
- unnatural data problem
- irrefutable data problem
- illusion problem
- stability problem
- → only use corpora
2. Corpora and Introspection
- Arguments against introspection data ≠ no introspection
- unnatural data problem, yet: only additional (context) data needed
- irrefutable data problem, yet: depends only on collection method
- illusion problem, yet: only additional (natural) data needed
- stability problem, yet: empirical claim that needs to be investigated
- → use corpora + additional data type
2. Corpora and Introspection
- Corpora and introspection are corroborating evidence
3. Case Study: Preposition Placement
- I want a data source ...
- (1) a. which I can rely on
- stranded preposition
- b. on which I can rely
- pied-piped preposition
- driving question
- data source for empirical analysis of (1a,b)?
4. Empirical Study I: Corpus Data
- Corpus used
- International Corpus of English ICE-GB (Nelson et al. 2002) (educated present-day BE, written + spoken)
- Analysis tool
- GOLDVARB computer programme (logistic regression; Robinson et al. 2001)
- relative influence of various contextual factors (weights: < 0.5 = inhibiting factors; > 0.5 = favouring factors)
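The weight scale above can be read as a probability scale centred on 0.5. The following is a minimal sketch, assuming the usual variable-rule reading in which a factor weight is the inverse logit of that factor's log-odds effect. The function names are hypothetical and not part of GOLDVARB, and the input values are illustrative only, not results from the study.

```python
import math


def weight_from_log_odds(log_odds: float) -> float:
    """Map a log-odds effect onto the 0-1 factor-weight scale (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def interpret_weight(weight: float) -> str:
    """Classify a factor weight relative to the neutral value 0.5."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("factor weights lie between 0 and 1")
    if weight > 0.5:
        return "favouring"
    if weight < 0.5:
        return "inhibiting"
    return "neutral"


# Illustrative log-odds effects (assumed values, not taken from the slides).
for effect in (-1.0, 0.0, 1.0):
    w = weight_from_log_odds(effect)
    print(f"log-odds {effect:+.1f} -> weight {w:.3f} ({interpret_weight(w)})")
```

A log-odds effect of zero maps to exactly 0.5, which is why 0.5 serves as the dividing line between inhibiting and favouring factors.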
4. Empirical Study I: Corpus Data I
- Pstrand/pied-piped tokens tested for
- finiteness
- restrictiveness
- relativizer
- XP contained in (V / N, e.g. entrance to sth. / Adj, e.g. afraid of sth.)
- level of formality
- X-PP relationship (Vprepositional, PPLoc_Adjunct, PPMan_Adjunct)
- except 2, all factors discussed in literature before, but not w.r.t. interdependence (e.g. Bergh and Seppänen 2000; Trotta 2000)
4.1 Categorical corpus data
- raw ICE-GB P-placement data
- 1074 finite relative clauses
- 659 (61.4%) tokens pied-piped
- 415 (38.6%) tokens stranded
- as expected: many categorical effects
- → accidental vs. systematic gaps?
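The percentages above follow directly from the raw counts. A short check, using only the figures given on the slide (the variable and function names are illustrative):

```python
# Raw counts reported on the slide.
TOTAL_CLAUSES = 1074
counts = {"pied-piped": 659, "stranded": 415}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# The two variants should exhaust the finite relative clauses.
assert sum(counts.values()) == TOTAL_CLAUSES

shares = {variant: share(n, TOTAL_CLAUSES) for variant, n in counts.items()}
for variant, pct in shares.items():
    print(f"{variant}: {counts[variant]} tokens ({pct}%)")
```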
4.2 Categorical corpus data: that/Ø vs. WH-relatives
- relativizer
- all that/Ø-tokens in ICE-GB stranded
- 176 that + Pstranded tokens
- (2) ?a data source on that I can rely
- 177 Ø + Pstranded tokens
- (3) ?a data source on Ø I can rely
- → ICE-GB result expected
- → implications: (2), (3)? / that vs. WH-
4.3 Categorical corpus data: Constraints on Pstrand
- 2. X-PP relationship
- Literature (e.g. Bergh and Seppänen 2000; Trotta 2000)
- Pstranding favoured with complement PP
- disfavoured with adjunct PP
- ICE-GB data
- Pstranding restricted to PPs which add thematic information to predicates/events
4.3 Categorical corpus data: Constraints on Pstrand
- 2. X-PP relationship
- categorical effect of WH-PPAdjuncts-tokens
- a) just P + WH / no that/Ø + P in ICE-GB
- manner, degree, frequency, respect PPs, e.g.
- a. the ways in which the satire is achieved <ICE-GB:S1B-014 #5:1:A>
- b. ? the ways which/that/Ø the satire is achieved in
4.3 Categorical corpus data: Constraints on Pstrand
- 2. X-PP relationship
- categorical effect of WH-PPAdjuncts-tokens
- b) just P + WH / but that/Ø + P in ICE-GB
- subcat. PP (put sth. in/into/under)
- locative, affected loc., direction PP adjuncts
- a. the world that I was working in and studying in <ICE-GB:S1A-001 #35:1:B>
- b. the world in which I was working and studying
4.3 Categorical corpus data: Constraints on Pstrand
- Claim: comparison of WH- vs. that/Ø shows
- P can only be stranded if PP adds thematic information to predicates/events
- manner and degree adjuncts: compare events to other possible events of V-ing (Ernst 2002: 59)
- frequency and respect adjuncts: have scope over temporal information (frequency) and truth value of entire clause (respect)
- → don't add thematic participant → Pstrand with these: systematic gap
4.3 Categorical corpus data: Constraints on Pstrand
- Claim: comparison of WH- vs. that/Ø shows
- P can only be stranded if PP adds thematic information to predicates/events
- subcat. PP; loc., affected loc., direction PP adjuncts
- → add thematic participant → WH + P with these: accidental gap
4.3 Categorical corpus data: Constraints on Pstrand
- Claim: comparison of WH- vs. that/Ø shows
- P can only be stranded if PP adds thematic information to predicates/events
- Comparison of WH- vs. that/Ø: good evidence, but
- still negative data problem
- further corroborating evidence needed
- Introspection: Magnitude Estimation study
5. Empirical Study II: Magnitude Estimation
- relative judgements (reference sentence)
- informal, restrictive RCs tested for
- P-PLACEMENT (Pstrand, Ppied-piped) × RELATIVIZER (WH-, that-, Ø-) × X-PP (VPrep, PPTemp/Loc_Adjunct, PPManner/Degree_Adjunct)
- tokens counterbalanced: 6 material groups of 18 tokens each + 36 fillers = 54 tokens
- tokens randomized (Web-Exp software)
- N = 36 BE native speakers (sex: 18 m, 18 f / age: 17-64)
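The token counts above are consistent with a fully crossed 2 × 3 × 3 design. The sketch below enumerates the conditions, assuming each material group contributes exactly one token per condition. That assumption is an inference from the numbers on the slide, not something the slide states, and the variable names are illustrative.

```python
from itertools import product

# Factor levels as named on the slide.
P_PLACEMENT = ("Pstrand", "Ppied-piped")
RELATIVIZER = ("WH-", "that-", "Ø-")
X_PP = ("VPrep", "PPTemp/Loc_Adjunct", "PPManner/Degree_Adjunct")

# Fully crossed design: one condition per combination of factor levels.
conditions = list(product(P_PLACEMENT, RELATIVIZER, X_PP))

N_FILLERS = 36  # filler count given on the slide

# Assumption: a participant sees one experimental token per condition plus all fillers.
items_per_subject = len(conditions) + N_FILLERS

print(len(conditions))    # number of experimental conditions
print(items_per_subject)  # items judged by each participant
```

Under this reading, the 18 tokens per material group correspond to the 18 cells of the design, and the 54 items per participant are those 18 tokens plus the 36 fillers.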
5. Empirical Study II: Magnitude Estimation
- 18 filler sentences ungrammatical
- a. That's a tape I sent them that done I've myself (word order violation; original source: <ICE-GB:S1A-033 #074>)
- b. There was lots of activity that goes on there (subject contact clause; original source: <ICE-GB:S1A-004 #067>)
- c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: <ICE-GB:S1A-003 #027>)
5. Empirical Study II: Magnitude Estimation
- ANOVA: significant effects
- P-PLACEMENT: F(1,33) = 4.536, p < 0.05
- RELATIVIZER: F(2,66) = 17.149, p < 0.001
- P-PLACEMENT × X-PP: F(2,66) = 9.740, p < 0.001
- P-PLACEMENT × RELATIVIZER: F(2,66) = 4.217, p < 0.02
5. Empirical Study II: Magnitude Estimation
- ANOVA: not significant
- AGE: F(1,33) = 2.760, p > 0.10
- GENDER: F(1,33) = 1.495, p > 0.20
- → indicates homogeneity of subjects
5. Empirical Study II: Magnitude Estimation
- Post-hoc Tukey test: P-PLACEMENT × RELATIVIZER
- Ppied-piped: WH- >> that (p < 0.001); WH- >> Ø (p < 0.001); that > Ø (p < 0.010)
- Pstrand: no difference, WH- = that = Ø (p >> 0.100)
5. Empirical Study II: Magnitude Estimation
- Post-hoc Tukey test: P-PLACEMENT × X-PP
- Ppied-piped: PPMan/Deg > VPrep (p < 0.010); PPMan/Deg = PPTemp/Loc (p = 0.100); VPrep = PPTemp/Loc (p > 0.100)
- Pstrand: VPrep > PPTemp/Loc > PPMan/Deg (p < 0.001)
Fig. 1: Magnitude estimation result for P + relativizer: P + WH >> P + that > P + Ø
Fig. 2: Magnitude estimation result for P + relativizer compared with fillers: P + that = P + Ø = ungrammatical fillers → violation of hard constraint (Sorace and Keller 2005)
Fig. 3: Magnitude estimation result for relativizer + P: WH + P = that + P = Ø + P; VPrep > PPTemp/Loc > PPMan/Deg
Fig. 3: Magnitude estimation result for relativizer + P: VPrep > PPTemp/Loc > PPMan/Deg >> ungrammatical filler → violation of soft constraint (Sorace and Keller 2005)
6. Corroborating Evidence
- Corroborating evidence
- corpus: man/deg PPs: no Pstranded (not even with that/Ø) → semantic constraint on Pstranded
- experiment: man/deg PPs worst environment for Pstranded, yet better than ungrammatical fillers (soft constraint violation)
7. Empirical Study III: Corpus Data II
- Constraints on variable corpus data (354 finite WH-tokens)
- Goldvarb identified 3 independent factors (Log likelihood = -88.437; Significance = 0.004; Fit: X-square(27) = 27.977, accepted, p = 0.2040)
- 1. level of formality (as expected)
- 2. type of PP contained in (as expected)
- 3. restrictiveness (unexpected)
- restrictive RCs favour pied piping (weight: 0.592)
- nonrestrictive RCs clearly inhibit pied piping (i.e. favour stranding; weight: 0.248)
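To get a feel for the size of the restrictiveness effect, the two reported weights can be converted to odds. This is a rough sketch, assuming the weights can be treated as logistic-regression probabilities with all other factors held constant. The helper name `odds` is hypothetical, and the resulting ratio is an illustration rather than a figure reported in the study.

```python
import math


def odds(weight: float) -> float:
    """Convert a factor weight (a probability-like value) to odds."""
    return weight / (1.0 - weight)


# Factor weights for pied piping reported on the slide.
RESTRICTIVE = 0.592
NONRESTRICTIVE = 0.248

# How much more strongly restrictive clauses favour pied piping, on the odds scale.
odds_ratio = odds(RESTRICTIVE) / odds(NONRESTRICTIVE)

print(f"restrictive log-odds:    {math.log(odds(RESTRICTIVE)):+.3f}")
print(f"nonrestrictive log-odds: {math.log(odds(NONRESTRICTIVE)):+.3f}")
print(f"odds ratio (restrictive vs. nonrestrictive): {odds_ratio:.2f}")
```

On this reading the restrictive weight sits slightly above the neutral point, while the nonrestrictive weight lies well below it, which matches the slide's wording of "favour" versus "clearly inhibit".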
7. Empirical Study III: Corpus Data II
- (6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do with to be totally honest <ICE-GB:S1B-049 #167:1:B>
- reasons for restrictiveness effect
- 1. weaker semantic ties of non-restrictive clause with antecedent (pause/comma)
- 2. Pied-piped P receives connective function
- → functionalisation of preposition placement in WH-relative clause
8. Conclusion
- corpus and introspection data: corroborating evidence
- corpora: frequency/context effects (e.g. level of formality); unexpected patterns (e.g. restrictiveness); categorical data → require further investigation
- introspection: differentiation of accidental gaps (WH + P with PPTemp/Loc) vs. systematic gaps (X + P with PPMan/Deg); detection of degrees of ungrammaticality
9. References
- Aarts, B. 2000. "Corpus linguistics, Chomsky and Fuzzy Tree Fragments". In Christian Mair and Marianne Hundt, eds. 2000. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi, 5-13.
- Bard, E.G. et al. 1996. "Magnitude Estimation of Linguistic acceptability". Language 72: 32-68.
- Bergh, G. and A. Seppänen. 2000. "Preposition stranding with wh-relatives: A historical survey". English Language and Linguistics 4: 295-316.
- Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
- Huddleston, R. et al. 2002. "Relative constructions and unbound dependencies". In G.K. Pullum and R. Huddleston, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1031-1096.
- Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
- Levine, R. and I.A. Sag. 2003. "WH-Nonmovement". <http://www-csli.stanford.edu/sag>, 04.07.2004.
- Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.
- McEnery, T. and A. Wilson. 1997. Corpus Linguistics. Edinburgh: Edinburgh University Press.
- Pesetsky, D. 1998. "Some principles of sentence production". In Pilar Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, 337-83.
- Penke, M. and A. Rosenbach. 2004. "What counts as evidence in linguistics? An introduction". Studies in Language 28,3: 480-526.
- Pickering, M. and G. Barry. 1991. "Sentence processing without empty categories". Language and Cognitive Processes 6: 229-259.
- Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.
- Robinson, J. et al. 2001. GOLDVARB 2001: A Multivariate Analysis Application for Windows. <http://www.york.ac.uk/depts/lang/webstuff/goldvarb/manualOct2001>
- Sag, I.A. 1997. "English relative constructions". Journal of Linguistics 33: 431-484.
- Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.
- Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: Chicago University Press.
- Sorace, Antonella and Frank Keller. 2005. "Gradience in linguistic data". Lingua 115,11: 1497-1525.
- Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.
- Van der Auwera, J. 1985. "Relative that: a centennial dispute". Journal of Linguistics 21: 149-179.