What cross-linguistic variation tells us about information density in on-line processing - PowerPoint PPT Presentation

Slides: 131
Provided by: JohnHa189

1
What cross-linguistic variation tells us about
information density in on-line processing
  • John A. Hawkins
  • UC Davis / University of Cambridge

2
  • Patterns of variation across languages provide
    relevant evidence for current issues in
    psychology on information density in on-line
    processing.

3
  • Some background, first of all.
  • I have argued (Hawkins 1994, 2004, 2009, to
    appear) for a Performance-Grammar Correspondence
    Hypothesis

4
  • Performance-Grammar Correspondence Hypothesis
    (PGCH)
  • Languages have conventionalized grammatical
    properties in proportion to their degree of
    preference in performance, as evidenced by
    patterns of selection in corpora and by ease of
    processing in psycholinguistic experiments.

5
  • I.e. languages have conventionalized or fixed
    in their grammars the same kinds of preferences
    and principles that we see in performance,
  • esp. in those languages in which speakers have
    alternatives to choose from in language use

6
  • E.g. between
  • alternative word orders
  • relative clauses with or without a relativizer,
  • with a gap or a resumptive pronoun
  • extraposed vs non-extraposed phrases
  • Heavy NP Shift or no shift
  • alternative ditransitive constructions
  • zero vs non-zero case markers
  • and so on

7
  • The patterns and principles found in these
    selections are, according to the PGCH, the same
    patterns and principles that we see in grammars
    in languages with fewer conventionalized options
    (more fixed orderings, gaps only in certain
    relativization environments, etc).

8
  • If so, linguists developing theories of grammar
    and of typological variation need to look
    seriously at theories of processing, in order to
    understand which structures are selected in
    performance, when, and why, with the result that
    grammars come to conventionalize these, and not
    other, patterns.
  • See Hawkins (2004, 2009, to appear)

9
  • Conversely, psychologists need to look at
    grammars and at cross-linguistic variation in
    order to see what they tell us about processing,
    since grammars are conventionalized processing
    preferences.

10
  • Alternative variants across grammars are also, by
    hypothesis, alternatives for efficient
    processing.
  • And the frequency with which these alternatives
    are conventionalized is, again by hypothesis,
    correlated with their degree of preference and
    efficiency in processing.

11
  • Looking at grammatical variation from a
    processing perspective can be revealing,
    therefore.

12
  • E.g. Japanese, Korean and the Dravidian languages
    do not move heavy and complex phrases to the end
    of their clauses, as English does; they move them
    to the beginning, in proportion to their
    (relative) complexity.
  • If your psychological model predicts that all
    languages should be like English, then you need
    to go back to the drawing board and look at these
    different grammars, and at their performance,
    before you define and test your model further.

13
  • Which brings me to today's topic:
  • What do grammars and typological variation tell
    us about information density in on-line
    processing?

14
  • Let us define Information as
  • the set of linguistic forms F (phonemes,
    morphemes, words, etc) and the set of
    properties P (ultimately semantic properties
    in a semantic representation) that are assigned
    to them by linguistic convention and in
    processing.

15
  • Let us define Density as
  • the number of these forms and properties that
    are assigned at a particular point in
    processing, i.e. the size of a given Fi-Pi
    pairing at point i in on-line comprehension
    or production.

16
  • I see evidence for two very general and
    complementary principles of information density
    in cross-linguistic patterns.

17
  • First, minimize Fi:
  • minimize the set Fi required for the
    assignment of a particular Pi or set {Pi}
  • I.e. minimize the number of linguistic forms that
    need to be processed at each point in order to
    assign a given morphological, syntactic or
    semantic property or set of properties to these
    forms on-line.

18
  • The conditions that determine the degree of
    permissible minimization can be inferred from the
    patterns themselves and essentially involve
    efficiency and ease of processing in the
    assignment of Pi to Fi.

19
  • Examples will be given from morphological
    hierarchies and from syntactic patterns such
    as word order and filler-gap dependencies.

20
  • Second, maximize Pi:
  • maximize the set Pi that can be assigned to a
    particular Fi or set {Fi}.
  • I.e. select and arrange linguistic forms so that
    as many as possible of their (correct) syntactic
    and semantic properties can be assigned to them
    at each point in on-line processing.

21
  • A set of linear ordering universals will be
    presented in which category A is systematically
    preferred before B regardless of language type,
    i.e. A before B. Positioning B first would always
    result in incomplete or incorrect assignments of
    properties to B on-line, whereas positioning it
    after A permits the full assignment of properties
    to B at the time it is processed.
  • These universals provide systematic evidence for
    maximize Pi.

22
  • Consider first some grammatical patterns from
    morphology that support the minimize Fi
    principle
  • minimize the set Fi required for the
    assignment of a particular Pi or set {Pi}

23
  • In Hawkins (2004) I formulated the following
    principle of form minimization based on parallel
    data from cross-linguistic variation and
    language-internal selection patterns.

24
  • Minimize Forms (MiF)
  • The human processor prefers to minimize the
    formal complexity of each linguistic form F (its
    phoneme, morpheme, word or phrasal units) and the
    number of forms with unique conventionalized
    property assignments, thereby assigning more
    properties to fewer forms. These minimizations
    apply in proportion to the ease with which a
    given property P can be assigned in processing to
    a given F.

25
  • The basic premise of MiF is that the processing
    of linguistic forms and their conventionalized
    property assignments requires effort. Minimizing
    the forms required for property assignments is
    efficient since it reduces that effort by
    fine-tuning it to information that is already
    active in processing through accessibility, high
    frequency, and inferencing strategies of various
    kinds.

26
  • MiF is visible in two sets of variation data
    across and within languages.
  • The first involves complexity differences between
    surface forms (morphology and syntax), with
    preferences for minimal expression (e.g. zero
    morphemes) in proportion to their frequency of
    occurrence and hence ease of processing through
    degree of expectedness (cf. Levy 2008, Jaeger
    2006).

27
  • E.g. singular number for nouns is much more
    frequent than plural, absolutive case is more
    frequent than ergative.
  • Correspondingly, singular number on nouns is expressed
    by shorter or equal morphemes, often zero (cf.
    English cat vs. cat-s), almost never by more.
    Similarly for absolutive and ergative case
    marking.

28
  • A second data pattern captured in MiF involves
    the number and nature of lexical and grammatical
    distinctions that languages conventionalize.
  • The preferences are again in proportion to their
    efficiency, including frequency of use.

29
  • There are preferred lexicalization patterns
    across languages.
  • Certain grammatical distinctions are
    cross-linguistically preferred
  • certain numbers on nouns
  • certain tenses
  • aspects
  • causativity
  • some basic speech act types
  • thematic roles like Agent, Patient
  • etc.

30
  • The result is numerous hierarchies of lexical
    and grammatical patterns
  • E.g. the famous color term hierarchy of Berlin
    and Kay (1969), and the Greenbergian morphological
    hierarchies

31
  • Where we have comparative performance and
    grammatical data for these hierarchies it is very
    clear that the grammatical rankings (e.g.
    Singular > Plural) correspond to a frequency/ease
    of processing ranking, with higher positions
    receiving less or equal formal marking and more
    or equal unique forms for the expression of that
    category alone.

32
  • Form Minimization Prediction 1
  • The formal complexity of each F is reduced in
    proportion to the frequency of that F and/or the
    processing ease of assigning a given P to a
    reduced F (e.g. to zero).

33
  • The cross-linguistic effects of this can be seen
    in the following Greenbergian (1966)
    morphological hierarchies (with reformulations
    and revisions by the authors shown)

34
  • Sing > Plur > Dual > Trial/Paucal (for
    number) (Greenberg 1966, Croft 2003)
  • Nom/Abs > Acc/Erg > Dat > Other (for case
    marking) (Primus 1999)
  • Masc, Fem > Neut (for gender) (Hawkins 2004)
  • Positive > Comparative > Superlative
    (Greenberg 1966)

35
  • Greenberg pointed out that these grammatical
    hierarchies define performance frequency rankings
    for the relevant properties in each domain.
  • The frequencies of number inflections on nouns in
    a corpus of Sanskrit, for example, were:
  • Singular 70.3%, Plural 25.1%, Dual 4.6%
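The claim that the hierarchy doubles as a frequency ranking can be checked mechanically. The following is a minimal Python sketch, assuming the Sanskrit figures above are percentages of noun tokens; the names SANSKRIT_NUMBER_FREQ and frequency_matches_hierarchy are illustrative, not taken from any existing tool.

```python
from __future__ import annotations

# Corpus frequencies (%) of number inflections on Sanskrit nouns, as cited above.
SANSKRIT_NUMBER_FREQ = {"Singular": 70.3, "Plural": 25.1, "Dual": 4.6}

# Number hierarchy restricted to the categories Sanskrit has, highest first.
NUMBER_HIERARCHY = ["Singular", "Plural", "Dual"]


def frequency_matches_hierarchy(freqs: dict[str, float], hierarchy: list[str]) -> bool:
    """True if frequency never increases from one hierarchy position to the next."""
    values = [freqs[category] for category in hierarchy]
    return all(higher >= lower for higher, lower in zip(values, values[1:]))


print(frequency_matches_hierarchy(SANSKRIT_NUMBER_FREQ, NUMBER_HIERARCHY))  # True
print(round(sum(SANSKRIT_NUMBER_FREQ.values()), 1))  # 100.0
```

Reversing the hierarchy (Dual first) makes the check fail, which is the point: the grammatical ranking and the usage ranking line up in one direction only.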

36
  • By MiF Prediction 1 we therefore expect
  • For each hierarchy H the amount of formal marking
    (i.e. phonological and morphological complexity)
    will be greater or equal down each hierarchy
    position.

37
  • E.g. in (Austronesian) Manam (Lichtenberk 1983):
  • 3rd Singular suffix on nouns: 0 (zero)
  • 3rd Plural suffix: -di
  • 3rd Dual suffix: -di-a-ru
  • 3rd Paucal suffix: -di-a-to
  • The amount of formal marking increases from
    singular to plural, and from plural to dual, and
    is equal from dual to paucal, in accordance with
    the hierarchy prediction.
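MiF Prediction 1 can likewise be sketched as a monotonicity check over the Manam suffixes. This Python sketch uses two crude proxies for "formal marking" that are assumptions of the example, not of the talk: the number of letters in each suffix (hyphens ignored) and the number of hyphen-separated morphs.

```python
from __future__ import annotations

# Manam 3rd person suffixes (Lichtenberk 1983), as listed above; "" = zero suffix.
MANAM_SUFFIXES = [
    ("Singular", ""),
    ("Plural", "-di"),
    ("Dual", "-di-a-ru"),
    ("Paucal", "-di-a-to"),
]


def segment_count(suffix: str) -> int:
    """Crude proxy for formal marking: letters in the suffix, ignoring hyphens."""
    return len(suffix.replace("-", ""))


def morpheme_count(suffix: str) -> int:
    """Number of hyphen-separated morphs; zero for an empty suffix."""
    return len([m for m in suffix.split("-") if m])


def marking_non_decreasing(counts: list[int]) -> bool:
    """MiF Prediction 1: marking is greater than or equal at each lower position."""
    return all(a <= b for a, b in zip(counts, counts[1:]))


segments = [segment_count(s) for _, s in MANAM_SUFFIXES]
morphs = [morpheme_count(s) for _, s in MANAM_SUFFIXES]
print(segments, marking_non_decreasing(segments))  # [0, 2, 5, 5] True
print(morphs, marking_non_decreasing(morphs))      # [0, 1, 3, 3] True
```

Both proxies give the pattern described in the text: increases from singular to plural and from plural to dual, equality from dual to paucal.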

38
  • Form Minimization Prediction 2
  • The number of unique F-P pairings in a language
    is reduced by grammaticalizing or lexicalizing a
    given F-P in proportion to the frequency and
    preferred expressiveness of that P in performance.

39
  • In the lexicon the property associated with
    'teacher' is frequently used in performance, that
    of 'teacher who is late for class' much less so.
    The event of X hitting Y is frequently selected,
    that of X hitting Y with X's right hand less so.
  • The more frequently selected properties are
    conventionalized in single lexemes or unique
    categories and constructions. Less frequently
    used properties must then be expressed through
    word and phrase combinations and their meanings
    must be derived by semantic composition.

40
  • This makes the expression of more frequently used
    meanings shorter, that of less frequently used
    meanings longer, and this pattern matches the
    first pattern of less versus more complexity in
    the surface forms themselves correlating with
    relative frequency.
  • Both patterns make utterances shorter and the
    communication of meanings more efficient overall,
    which is why I have collapsed them both into one
    common Minimize Forms principle.

41
  • By MiF Prediction 2 we expect:
  • For each hierarchy H (A > B > C), if a language
    assigns at least one morpheme uniquely to C, then
    it assigns at least one uniquely to B; if it
    assigns at least one uniquely to B, it does so to
    A.
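The implicational form of MiF Prediction 2 amounts to saying that the uniquely marked categories form an unbroken top segment of the hierarchy. A minimal Python sketch under that reading follows; the function name is illustrative, and the second inventory is hypothetical, included only to show what a violation looks like.

```python
from __future__ import annotations

# Number hierarchy from the slides, highest position first.
NUMBER_HIERARCHY = ["Singular", "Plural", "Dual", "Trial/Paucal"]


def respects_hierarchy(hierarchy: list[str], uniquely_marked: set[str]) -> bool:
    """MiF Prediction 2 as an implicational check: a category may have its own
    unique morpheme only if every category above it on the hierarchy has one too."""
    flags = [category in uniquely_marked for category in hierarchy]
    # Checking adjacent (higher, lower) pairs suffices: any unmarked position
    # above a marked one implies some adjacent unmarked-then-marked pair.
    return all(higher or not lower for higher, lower in zip(flags, flags[1:]))


# Sanskrit: distinct Singular, Plural and Dual, as cited in the slides.
print(respects_hierarchy(NUMBER_HIERARCHY, {"Singular", "Plural", "Dual"}))  # True
# Hypothetical inventory with a unique Dual but no unique Plural.
print(respects_hierarchy(NUMBER_HIERARCHY, {"Singular", "Dual"}))  # False
```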

42
  • E.g. a distinct Dual implies a distinct Plural and
    Singular in the grammar of Sanskrit.
  • A distinct Dative implies a distinct Accusative
    and Nominative in the case grammar of Latin and
    German
  • (or a distinct Ergative and Absolutive in Basque,
    cf. Primus 1999).

43
  • A unique number or case assignment low in the
    hierarchy implies unique and differentiated
    numbers and cases in all higher positions.

44
  • I.e. grammars prioritize categories for unique
    formal expression in each of these areas in
    proportion to their relative frequency and
    preferred expressiveness.
  • This results in these hierarchies for
    conventionalized categories whereby languages
    with fewer categories match the performance
    frequency rankings of languages with many.

45
  • By MiF Prediction 2 we also expect
  • For each hierarchy H any combinatorial features
    that partition references to a given position on
    H will result in fewer or equal morphological
    distinctions down each lower position of H.

46
  • E.g. when gender features combine with and
    partition number, unique gender-distinctive
    pronouns often exist for the singular and not for
    the plural
  • English he/she/it vs they
  • the reverse uniqueness is not found (i.e. with a
    gender-distinctive plural, but gender-neutral
    singular).

47
  • More generally, MiF Prediction 2 leads to a
    general principle of cross-linguistic morphology:
  • Morphologization
  • A morphological distinction will be
    grammaticalized in proportion to the performance
    frequency with which it can uniquely identify a
    given subset of entities E in a grammatical
    and/or semantic domain D.

48
  • This enables us to make sense of markedness
    reversals.
  • E.g. in certain nouns in Welsh whose referents
    are much more frequently plural than singular,
    like 'leaves' and 'beans', it is the singular
    form that is morphologically more complex than
    the plural:
  • deilen ("leaf") vs. dail ("leaves")
  • ffäen ("bean") vs. ffa ("beans")
  • Cf. Haspelmath (2002: 244).

49
  • All of these data provide support for our
    minimize Fi principle
  • minimize the set Fi required for the
    assignment of a particular Pi or set {Pi}
  • I.e. minimize the number of linguistic forms that
    need to be processed at each point in order to
    assign a given morphological, syntactic or
    semantic property or set of properties to these
    forms on-line.

50
  • Either the surface forms of the morphology are
    reduced, in proportion to frequency and/or ease
    of processing.
  • Or lexical and grammatical categories are given
    priority for unique formal expression, in
    proportion to frequency and/or preferred
    expression, resulting in reduced morpheme and
    word combinations for their expression.

51
  • The result of both is more minimal forms in
    proportion to frequency/ease of
    processing/preferred expressiveness, i.e. fewer
    and shorter forms for the expression of the
    speakers' preferred meanings in performance.

52
  • Consider now some patterns from syntax that
    support the minimize Fi principle:
  • minimize the set Fi required for the
    assignment of a particular Pi or set {Pi}

53
  • In Hawkins (2004) I formulated a second
    minimization principle for the combination of
    forms and dependencies between them based on
    parallel data from cross-linguistic variation and
    language-internal selection patterns: Minimize
    Domains (MiD).

54
  • Minimize Domains (MiD)
  • The human processor prefers to minimize the
    connected sequences of linguistic forms and their
    conventionally associated syntactic and semantic
    properties in which relations of combination
    and/or dependency are processed.

55
  • E.g. in order to recognize how the words of a
    sentence are grouped together into phrases and
    into a hierarchical tree structure the human
    parser prefers to access the smallest possible
    linear string of words that enable it to make
    each phrase structure decision:
  • the principle of Early Immediate Constituents
    (EIC) (Hawkins 1994).

56
  • More generally, the processing of all syntactic
    and semantic relations prefers minimal domains
    (Hawkins 2004).

57
  • Minimize Domains predicts that each Phrasal
    Combination Domain (PCD) should be as short as
    possible.
  • A PCD consists of the smallest amount of surface
    structure on the basis of which the human
    processor can recognize (and produce) a mother
    node M and assign the correct daughter ICs to it,
    i.e. on the basis of which phrase structure can
    be processed.

58
  • Some linear orderings reduce the number of words
    and their associated properties that need to be
    accessed for this purpose.
  • The degree of this preference is proportional to
    the minimization difference for the same PCDs in
    competing orderings.

59
  • I.e. linear orderings should be preferred that
    minimize PCDs by maximizing their IC-to-word
    ratios.
  • The result will be a preference for short before
    long phrases in head-initial languages like
    English.

60
  • (1) a. The man [VP waited [PP1 for his son]
    [PP2 in the cold but not unpleasant wind]]
    VP PCD: waited for his son in (5 words)
  • b. The man [VP waited [PP2 in the cold but not
    unpleasant wind] [PP1 for his son]]
    VP PCD: waited in the cold but not unpleasant
    wind for (9 words)
  • The three items, V, PP1, PP2 can be recognized
    and constructed on the basis of five words in
    (1a), compared with nine in (1b), assuming that
    (head) categories such as P immediately project
    to mother nodes such as PP, enabling the parser
    to construct them on-line.
  • (1a) VP PCD: IC-to-word ratio of 3/5 = 60%
  • (1b) VP PCD: IC-to-word ratio of 3/9 = 33%
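The IC-to-word ratios for (1a) and (1b) can be reproduced with a short Python sketch. It assumes, as the text does, that each IC is constructed by its first word (V for VP, P for PP), so the PCD ends at the first word of the last IC; the function name is illustrative.

```python
from __future__ import annotations

from fractions import Fraction


def ic_to_word_ratio_left_headed(ics: list[list[str]]) -> Fraction:
    """IC-to-word ratio of a phrasal combination domain (PCD), assuming every
    immediate constituent (IC) is constructed by its FIRST word (head-initial).
    The PCD covers all words of the non-final ICs plus the first word of the last."""
    pcd_length = sum(len(ic) for ic in ics[:-1]) + 1
    return Fraction(len(ics), pcd_length)


v = ["waited"]
pp1 = "for his son".split()
pp2 = "in the cold but not unpleasant wind".split()

ratio_1a = ic_to_word_ratio_left_headed([v, pp1, pp2])  # PCD: waited for his son in
ratio_1b = ic_to_word_ratio_left_headed([v, pp2, pp1])  # PCD: waited in ... wind for
print(ratio_1a, f"{float(ratio_1a):.0%}")  # 3/5 60%
print(ratio_1b, f"{float(ratio_1b):.0%}")  # 1/3 33% (Fraction reduces 3/9)
```

Because only the non-final ICs contribute their full length, the ratio is maximized by putting the shortest ICs first, which is the short-before-long preference for head-initial languages.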

61
  • For experimental support (in production and
    comprehension) for short before long effects in
    English, see e.g. Stallings (1998), Gibson
    (1998), Wasow (2002).

62
  • A Corpus Study Testing MiD in English
  • Structures like (1a, b) with [VP V, PP1, PP2] were
    examined (Hawkins 2000) in which the two PPs were
    permutable with truth-conditional equivalence
    (i.e. the speaker had a choice).
  • Only 15% (58/394) had long before short. Among
    those with at least a one-word weight difference,
    82% had short before long, and there was a
    gradual reduction in the long before short orders
    the bigger the weight difference (PPS = shorter
    PP, PPL = longer PP):

63
  • (2)            PPL > PPS by 1 word   by 2-4      by 5-6     by 7+
    [V PPS PPL]    60% (58)              86% (108)   94% (31)   99% (68)
    [V PPL PPS]    40% (38)              14% (17)    6% (2)     1% (1)
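The percentages in (2) and the overall figures quoted before it can be recomputed from the raw counts. In this Python sketch the last column is read as "7 or more words", which is an assumption. The long-before-short counts in the table sum to exactly 58, matching the 58/394 in the text; the remaining 71 of the 394 structures are presumably those with PPs of equal length, but that is an inference, not something the slides state.

```python
# Counts from table (2); columns are weight differences PPL > PPS.
BINS = ["1", "2-4", "5-6", "7+"]
SHORT_FIRST = [58, 108, 31, 68]  # [V PPS PPL]
LONG_FIRST = [38, 17, 2, 1]      # [V PPL PPS]
TOTAL_SAMPLE = 394               # all permutable [VP V PP PP] structures

short_first_share = [s / (s + l) for s, l in zip(SHORT_FIRST, LONG_FIRST)]
for label, share in zip(BINS, short_first_share):
    print(f"by {label:>3}: short-first {share:.0%}, long-first {1 - share:.0%}")

# Overall figures quoted in the text.
long_first_overall = sum(LONG_FIRST) / TOTAL_SAMPLE
with_difference = sum(SHORT_FIRST) + sum(LONG_FIRST)
short_first_with_difference = sum(SHORT_FIRST) / with_difference
print(f"long before short overall: {sum(LONG_FIRST)}/{TOTAL_SAMPLE} = {long_first_overall:.0%}")
print(f"short before long (difference >= 1 word): "
      f"{sum(SHORT_FIRST)}/{with_difference} = {short_first_with_difference:.0%}")
```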

64
  • For head-final languages long before short orders
    provide minimal domains for processing phrase
    structure
  • (3) a. Mary ga [VP [S kinoo John ga
    kekkonsi-ta to] it-ta]
  • Mary SU yesterday John SU
    married that said
  • 'Mary said that John got married yesterday'
  • b. [S kinoo John ga kekkonsi-ta to] Mary
    ga [VP it-ta]

65
  • Why?
  • Because placing longer before shorter phrases
    in Japanese positions constructing categories or
    heads (V, P, Comp, etc) close, or as close as
    possible, to each other, each being on the right
    of their respective phrasal sisters.
  • Result: PCDs are smaller.

66
  • (4) Some basic word orders of Japanese grammar
  • a. Taroo ga [VP tegami o kaita] NP-V
  • T. SU letter DO wrote
  • 'Taroo wrote a letter'
  • b. Taroo ga [PP Tokyo kara] ryokoosita NP-P
  • T. SU Tokyo from travelled
  • 'Taroo travelled from Tokyo'
  • c. [NP Taroo no ie] Gen-N
  • Taroo 's house
  • The heavier phrasal categories, e.g. NPs, occur
    to the left of their single-word (shorter) heads
    in Japanese, e.g. before V and P, and P and V are
    adjacent on the right of their respective sisters

67
  • For experimental and corpus support for long
    before short phrases in Japanese and Korean when
    there is a plurality of phrases before V, see
    Hawkins (1994, 2004), Yamashita and Chang (2001,
    2006), Choi (2007).

68
  • An early corpus study testing long before short
    in Japanese (Hawkins 1994)
  • NPo, PPm V
  • (5) a. (Tanaka ga) [VP [PP Hanako kara] [NP sono
    hon o] katta]
  • Tanaka SU Hanako from that
    book DO bought
  • 'Tanaka bought that book from Hanako'
  • b. (Tanaka ga) [VP [NP sono hon o] [PP Hanako
    kara] katta]
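The same IC-to-word arithmetic, mirrored for a head-final language, shows why (5b) should be preferred. This Python sketch assumes, for illustration only, that the postposition kara constructs the PP and the accusative particle o constructs the NP, so each IC is recognized at its last word, and that the subject (Tanaka ga) lies outside the VP domain being measured. The function name is hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def ic_to_word_ratio_right_headed(ics: list[list[str]]) -> Fraction:
    """IC-to-word ratio of a PCD assuming each IC is constructed by its LAST
    word (head-final). The PCD starts at the last word of the first IC and runs
    to the end of the final IC: one word plus every word of the remaining ICs."""
    pcd_length = 1 + sum(len(ic) for ic in ics[1:])
    return Fraction(len(ics), pcd_length)


pp = "Hanako kara".split()  # PP, assumed constructed by the postposition kara
np_ = "sono hon o".split()  # NP, assumed constructed by the particle o
v = ["katta"]

ratio_5a = ic_to_word_ratio_right_headed([pp, np_, v])  # PCD: kara sono hon o katta
ratio_5b = ic_to_word_ratio_right_headed([np_, pp, v])  # PCD: o Hanako kara katta
print(ratio_5a, f"{float(ratio_5a):.0%}")  # 3/5 60%
print(ratio_5b, f"{float(ratio_5b):.0%}")  # 3/4 75%
```

Here only the first IC is collapsed to a single word, so placing the longer IC first yields the shorter domain: the mirror image of the English calculation.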

69
  • ICS = shorter Immediate Constituent, ICL =
    longer Immediate Constituent, regardless of NP
    or PP status
  • (6)            ICL > ICS by 1-2 words   by 3-4     by 5-8     by 9+
    [ICS ICL V]    34% (30)                 28% (8)    17% (4)    9% (1)
    [ICL ICS V]    66% (59)                 72% (21)   83% (20)   91% (10)
  • Data from Hawkins (1994: 152), collected by Kaoru
    Horie.
  • I.e. the bigger the weight difference, the more
    the heavy phrase occurs to the left: the
    mirror image of English.

70
  • Given these data from performance, we can now
    better understand
  • (a) the Greenbergian word order correlations
  • (b) why there are two, and only two, productive
    word order types cross-linguistically,
    head-initial and head-final
  • (c) why and when there are exceptional
    departures from the expected head-initial and
    head-final orders

71
  • The "Greenbergian" word order correlations
    (Greenberg 1963, Dryer 1992)
  • (7) [VP V, [PP P, NP]]
  • a. [VP travels [PP to the city]]    PCD: travels to
  • b. [VP [PP the city to] travels]    PCD: to travels
  • c. [VP travels [PP the city to]]    PCD: travels the city to
  • d. [VP [PP to the city] travels]    PCD: to the city travels
  • The adjacency of V and P guarantees the smallest
    possible string of words for the recognition and
    construction of VP and its two constituents (V and
    PP): compare the PCDs of (7a, b) with those of (7c, d).

72
  • Language Quantities in Matthew Dryer's (1992)
    Cross-linguistic Sample
  • (8) a. [VP V [PP P NP]]   161 (41%)    b. [VP [PP NP P] V]   204 (52%)
  • c. [VP V [PP NP P]]   18 (5%)      d. [VP [PP P NP] V]   6 (2%)
  • Preferred (a)+(b), with consistent head ordering:
    365/389 (94%)
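The percentages in (8) follow directly from Dryer's counts. A short Python check, with the four order types keyed by the letters a-d used in the table:

```python
# Language counts from table (8) (Dryer 1992), keyed by the letters used there:
# a = [VP V [PP P NP]], b = [VP [PP NP P] V],
# c = [VP V [PP NP P]], d = [VP [PP P NP] V].
DRYER_COUNTS = {"a": 161, "b": 204, "c": 18, "d": 6}
CONSISTENT_HEAD_ORDER = ("a", "b")  # V and P adjacent

total = sum(DRYER_COUNTS.values())
for order, count in DRYER_COUNTS.items():
    print(f"({order}) {count} ({count / total:.0%})")

consistent = sum(DRYER_COUNTS[order] for order in CONSISTENT_HEAD_ORDER)
print(f"consistent head ordering: {consistent}/{total} = {consistent / total:.0%}")
```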

73
  • Both head-initial (English) and head-final
    (Japanese) orders can be equally efficient for
    processing whether heads are adjacent to one
    another on the left of their respective sisters
    (English), or on the right (Japanese),
  • hence two, and only two, highly productive word
    order types, as predicted by MiD.

74
  • MiD helps us to understand these
    cross-linguistic patterns and their frequencies.
    It also enables us to explain some systematic
    grammatical exceptions to these head-ordering
    universals.

75
  • Dryer (1992): there are exceptions to the
    preferred consistent head ordering when the
    category that modifies a head is a single-word
    item, e.g. an adjective modifying a noun (yellow
    book).

76
  • Many otherwise head-initial languages have
    non-initial heads with the adjective preceding
    the noun here (e.g. English); many otherwise
    head-final languages have noun before adjective
    (e.g. Basque).
  • BUT when the non-head is a branching phrasal
    category (e.g. adjective phrase, cf. English
    books yellow with age) there are good
    correlations with the predominant head ordering.
  • Why?

77
  • When heads are separated by a non-branching
    single word, then the difference between, say,
  • [VP V [NP Adj N]] and [VP V [NP N Adj]]
  • read yellow book     read book yellow
  • is short, only one word. Hence the MiD
    preference for noun initiality (and for
    noun-finality in postpositional languages) is
    significantly less than it is for intervening
    branching phrases, and either less head ordering
    consistency or no consistency is predicted.

78
  • English yellow book but book yellow with
    age
  • Romance languages have both prenominal and
    postnominal adjectives
  • French grand homme / homme grand
  • but only postnominal adjective phrases, like English

79
  • Similarly, when there is just a one-word
    difference between competing domains in
    performance, e.g. in the corpus data of English
    and Japanese above, both ordering options are
    generally productive, and so too in grammars.

80
  • Center embedding hierarchies and EIC
  • The more complex a center-embedded constituent
    and the longer the PCD for its containing phrase,
    the fewer languages.
  • E.g. in the environment [PP P [NP __ N]] we have a
    center-embedding hierarchy, cf. Hawkins (1983).
  • (9) Prep lgs: AdjN 32%, NAdj 68%
  • PosspN 12%, NPossp 88%
  • RelN 1%, NRel 99%
  • Mary traveled [PP to [NP interesting
    cities]] AdjN
  • [NP this country's cities] PosspN
  • [NP I already visited cities] RelN

81
  • I.e. the Greenbergian word order universals
    support domain minimization and locality (Hawkins
    2004, Gibson 1998).
  • There are minor and predicted departures from
    consistent ordering and head adjacency, as we
    have seen.
  • There are also certain conflicts between MiD and
    other ease of processing principles, e.g. Fillers
    before Gaps, which result in e.g. NRel in certain
    (non-rigid) OV languages (Hawkins 2004, to
    appear).

82
  • Apart from these, I see no evidence in grammars
    for any preference for non-locality of the kind
    that certain psycholinguists have argued for
    based on experimental evidence with head-final
    languages (e.g. Konieczny 2000, Vasishth and Lewis
    2006).
  • E.g. Konieczny showed in a self-paced reading
    experiment in German that the verb is read
    systematically faster when a NRel precedes it, in
    proportion to the length of Rel.

83
  • This finding makes sense in terms of expectedness
    and predictability (Levy 2008, Jaeger 2006): the
    longer you have to wait for a verb in a
    verb-final structure, the more you expect to find
    one, making verb recognition easier.
  • However, Konieczny found no evidence for this
    facilitation at the verb in his German corpus
    data (Uszkoreit et al. 1998). Instead the
    predictions made for the relevant structures by
    MiD and locality were strongly confirmed.

84
  • In fact, corpus studies quite generally do not
    support non-locality: none of the data from
    numerous typologically diverse language corpora
    reported in Hawkins (1994, 2004) support it.

85
  • Nor do word order universals support it. The
    Greenbergian correlations strongly support
    locality, and the exceptions to Greenberg involve
    either small single-word non-localities or
    competitions with independently motivated
    preferences that do produce some non-localities
    in certain language types but not because
    non-locality is a good thing!

86
  • The experimental evidence for greater ease of
    processing at the verb appears to be evidence,
    therefore, for a certain facilitation (arguably
    through predictability) at a single temporal
    point in sentence processing; it tells us nothing
    about processing load for the structure as a
    whole, and it does not implicate any preference
    for non-locality as such.

87
  • Corpus data appear to reflect these overall
    processing advantages for alternative structures
    within which the verb may appear early or late.
    The predictions for these alternations are based
    squarely on the preferred locality of phrasal
    daughters and these predictions are empirically
    correct (Konieczny 2000, Uszkoreit et al. 1998).
    Non-locality arises only when the locality
    demands of two phrases are in conflict and cannot
    be satisfied at the same time.
  • E.g. if N is adjacent to its Rel in German, then
    N is separated from a final V.

88
  • Grammars also support locality in word order
    universals and provide no evidence for
    non-locality as an independent factor.
  • Let us turn now to relative clauses and look at
    the cross-linguistic evidence for form and domain
    minimization in this area.

89
  • Relative clauses in many languages (e.g. Hebrew)
    exhibit both a 'gap' and a 'resumptive pronoun'
    structure
  • (10) a. the studentsi that I teach Oi Gap
  • b. the studentsi that I teach
    themi Resumptive Pronoun
  • In English we find relative clauses with and
    without a relative pronoun
  • (11) a. the studentsi whomi I teach
    Oi Relative Pronoun
  • b. the studentsi Oi I teach Oi Zero
    Relative

90
  • Patterns in Performance
  • The retention of the relative pronoun in English
    is correlated, inter alia, with the degree of
    separation of the relative clause from its head
    noun: the bigger the separation, the more
    relative pronouns are retained (Quirk 1957,
    Hawkins 2004: 153).

91
  • (12) a. the studentsi whomi I teach Oi
    visited me
  • b. the studentsi Oi I teach Oi
    visited me
  • (13) a. the studentsi (from Denmark) whomi I
    teach Oi visited me
  • b. the studentsi (from Denmark) Oi I
    teach Oi visited me
  • (14) a. the studentsi (from Denmark) visited
    me whomi I teach Oi
  • b. the studentsi (from Denmark)
    visited me Oi I teach Oi
  • (12a) Rel Pro 60%   (12b) Zero Rel 40%
  • (13a) Rel Pro 94%   (13b) Zero Rel 6%
  • (14a) Rel Pro 99%   (14b) Zero Rel 1%

92
  • The Hebrew gap is favored when the distance
    between head and gap is small, cf. Ariel (1999)
  • (15) a. Shoshana hi ha-ishai she-nili
    ohevet Oi   Gap
  • Shoshana is the-woman that-Nili loves
  • b. Shoshana hi ha-ishai she-nili
    ohevet otai   Res Pro
  • that-Nili loves her
  • (15a) Gap 91%   (15b) Res Pro 9%

93
  • Resumptive pronouns in Hebrew become more
    frequent in more complex relatives with bigger
    distances between the head and the position
    relativized on, as in (16b)
  • (16) a. Shoshana hi ha-ishai she-dani siper
    she-moshe rixel she-nili ohevet Oi
  • b. Shoshana hi ha-ishai she-dani siper
    she-moshe rixel she-nili ohevet otai
  • Shoshana is the-woman that-Danny said
    that-Moshe gossiped that-Nili loves (her)
  • With just 3 words separating the head and the
    position relativized on (i.e. gap or resumptive
    pronoun), there are many more pronouns (Ariel 1999):
  • (16a) Gap 58%   (16b) Res Pro 42%

94
  • Relative clauses with larger domains are more
    complex and harder to process. The harder to
    process relatives have the less minimal and more
    explicit form, in accordance with our minimize
    Fi principle above.

95
  • Specifically, the explicit resumptive pronoun
    makes the relative easier to process because the
    position relativized on is now explicitly
    signaled and flagged, in contrast to the zero
    gap, and because the explicit pronoun shortens
    various domains for processing combinatorial and
    dependency relations within the relative clause
    (these processes must otherwise access the head
    noun itself), cf. Hawkins (2004)

96
  • A Cross-linguistic Universal: the Accessibility
    Hierarchy
  • Keenan and Comrie (1977) proposed an Accessibility
    Hierarchy (AH) for universal rules of
    relativization on different structural positions
    within a clause:
  • Subjects > Direct Objects > Indirect
    Objects/Obliques > Genitives
  • (17) a. the professori that Oi/hei wrote the
    letter SU
  • b. the professori that the student knows
    Oi/himi DO
  • c. the professori that the student
    showed the book to Oi/himi IO/OBL
  • d. the professori that the student knows
    Oi/hisi son GEN

97
  • Relative clauses "cut off" (may cease to apply)
    down the AH, cf. (18): if a language can form a
    relative clause on any low position, it can
    (generally) relativize on all higher positions.
  • (18) SU only: Malagasy, Maori
  • SU, DO only: Kinyarwanda, Indonesian
  • SU, DO, IO/OBL only: Basque, Catalan
  • SU, DO, IO/OBL, GEN: English, Hausa
  • (19) ny mpianatra_i izay nahita ny vehivavy O_i
    (Malagasy)
  • the student that saw the woman
  • 'the student that saw the woman' (NOT 'the
    student that the woman saw')
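The cut-off generalization in (18) is an implicational claim that can be checked mechanically. A minimal Python sketch, assuming each language is represented simply as the set of AH positions it can relativize on; the encoding and `respects_cutoff` are illustrative, not from the source, and the check treats the generalization as exceptionless, whereas (18) says it holds only "generally".

```python
# Accessibility Hierarchy positions, highest (most accessible) first.
AH = ["SU", "DO", "IO/OBL", "GEN"]

# Relativizable positions for the languages listed in (18).
RELATIVIZABLE = {
    "Malagasy": {"SU"},
    "Maori": {"SU"},
    "Kinyarwanda": {"SU", "DO"},
    "Indonesian": {"SU", "DO"},
    "Basque": {"SU", "DO", "IO/OBL"},
    "Catalan": {"SU", "DO", "IO/OBL"},
    "English": {"SU", "DO", "IO/OBL", "GEN"},
    "Hausa": {"SU", "DO", "IO/OBL", "GEN"},
}


def respects_cutoff(positions, hierarchy=AH):
    """Return True if the relativizable positions form an unbroken top
    segment of the hierarchy, i.e. relativization on a lower position
    implies relativization on every higher position."""
    cut_off = False
    for position in hierarchy:
        if position not in positions:
            cut_off = True      # relativization has ceased to apply here
        elif cut_off:
            return False        # a lower position is available after a cut-off
    return True


for language, positions in RELATIVIZABLE.items():
    print(f"{language}: {respects_cutoff(positions)}")

# A hypothetical language relativizing SU and GEN only would violate (18).
print("hypothetical SU+GEN:", respects_cutoff({"SU", "GEN"}))
```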

98
  • The distribution of gaps relative to resumptive
    pronouns across languages also follows the AH,
    with gaps higher and pronouns lower:
  • if a gap occurs low on the hierarchy, it occurs
    all the way up; if a pronoun occurs high, it
    occurs all the way down.

99
  • Languages combining gaps with resumptive pronouns
    (data from Keenan & Comrie 1977)
  •                    SU    DO        IO/OBL   GEN
  • Aoban              gap   pro       pro      pro
  • Arabic             gap   pro       pro      pro
  • Gilbertese         gap   pro       pro      pro
  • Kera               gap   pro       pro      pro
  • Chinese (Peking)   gap   gap/pro   pro      pro
  • Genoese            gap   gap/pro   pro      pro
  • Hebrew             gap   gap/pro   pro      pro
  • Persian            gap   gap/pro   pro      pro
  • Tongan             gap   gap/pro   pro      pro
  • Fulani             gap   gap       pro      pro
  • Greek              gap   gap       pro      pro
  • Welsh              gap   gap       pro      pro
  • Zurich German      gap   gap       pro      pro
  • Toba Batak         gap             pro      pro
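The reverse implicational pattern can likewise be checked row by row against the table. A minimal Python sketch, assuming the transcription below (columns in AH order SU, DO, IO/OBL, GEN, with the empty Toba Batak DO cell treated as having no entry); `strategies` and `obeys_reverse_hierarchy` are illustrative names only.

```python
AH = ["SU", "DO", "IO/OBL", "GEN"]

# Cells transcribed from the table; "gap/pro" means both strategies occur.
# The empty string marks the Toba Batak DO cell, which has no entry.
TABLE = {
    "Aoban": ["gap", "pro", "pro", "pro"],
    "Arabic": ["gap", "pro", "pro", "pro"],
    "Gilbertese": ["gap", "pro", "pro", "pro"],
    "Kera": ["gap", "pro", "pro", "pro"],
    "Chinese (Peking)": ["gap", "gap/pro", "pro", "pro"],
    "Genoese": ["gap", "gap/pro", "pro", "pro"],
    "Hebrew": ["gap", "gap/pro", "pro", "pro"],
    "Persian": ["gap", "gap/pro", "pro", "pro"],
    "Tongan": ["gap", "gap/pro", "pro", "pro"],
    "Fulani": ["gap", "gap", "pro", "pro"],
    "Greek": ["gap", "gap", "pro", "pro"],
    "Welsh": ["gap", "gap", "pro", "pro"],
    "Zurich German": ["gap", "gap", "pro", "pro"],
    "Toba Batak": ["gap", "", "pro", "pro"],
}


def strategies(cells):
    """Turn each cell such as 'gap/pro' into a set; '' becomes an empty set."""
    if len(cells) != len(AH):
        raise ValueError("expected one cell per AH position")
    return [set(cell.split("/")) if cell else set() for cell in cells]


def obeys_reverse_hierarchy(cells):
    """A gap at position i requires gaps at every higher position;
    a pronoun at position i requires pronouns at every lower position."""
    rows = strategies(cells)
    n = len(rows)
    gaps_ok = all("gap" in rows[j]
                  for i in range(n) if "gap" in rows[i]
                  for j in range(i))
    pros_ok = all("pro" in rows[j]
                  for i in range(n) if "pro" in rows[i]
                  for j in range(i, n))
    return gaps_ok and pros_ok


for language, cells in TABLE.items():
    print(f"{language}: {obeys_reverse_hierarchy(cells)}")

# A hypothetical language with a pronoun on SU but gaps lower down would violate it.
print("hypothetical:", obeys_reverse_hierarchy(["pro", "gap", "gap", "gap"]))
```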

100
  • Keenan & Comrie argued that these grammatical
    patterns were ultimately explainable by declining
    ease of processing down the AH
  • They hypothesized that the AH was a complexity
    ranking
  • Cf. Hawkins (1999; 2004: 177-190; to appear) for
    elaboration in terms of Minimize Forms and
    Minimize Domains

101
  • Keenan (1987) gave data from English corpora
    showing declining frequencies of relative clause
    usage correlating with the AH positions
    relativized on

102
  • Experimental evidence for SU > (easier than) DO
    relativization (English):
  • Wanner & Maratsos (1978): first pointed to
    greater processing load for DO rels
  • Ford (1983): longer lexical decision times in DO
    rels
  • King & Just (1991): lower comprehension accuracy
    and longer lexical decision times in self-paced
    reading experiments
  • Pickering & Shillcock (1992): significant
    reaction time differences in self-paced reading
    experiments, both within and across clause
    boundaries (i.e. for embedded and non-embedded
    gap positions)
  • King & Kutas (1992, 1993): neurolinguistic
    support using ERPs
  • Traxler et al. (2002): eye-movement study
    controlling also for agency and animacy
  • Frauenfelder et al. (1980) and Holmes & O'Regan
    (1981): similar (SU > DO) results for French
  • See Kwon et al. (2010) for an eye-tracking study
    of Korean and a recent literature review of the
    SU/DO asymmetry in English and other lgs

103
  • Let us take stock.
  • We see in these studies a clear correlation
    between performance data (preferred selections
    in corpora and ease of processing in
    experiments), on the one hand, and the fixed
    conventions of grammars in languages with fewer
    options, on the other:
  • SU relatives have been shown to be easier to
    process than DO relatives in English and certain
    other lgs; correspondingly, lgs like Malagasy
    have only the SU option
  • the distribution of resumptive pronouns to gaps
    across grammars follows the AH ranking, with
    pronouns in the more difficult environments and
    gaps in the easier ones; this reverse
    implicational hierarchy appears to be structured
    by ease of processing

104
  • All of these data, morphological and syntactic,
    support minimize Fi, in proportion to the ease
    with which a given property Pi can be assigned in
    processing to a given Fi.

105
  • Let us turn now to our second principle of
    Information Density, maximize Pi:
  • maximize the set {Pi} that can be assigned to
    a particular Fi or {Fi}.

106
  • In Hawkins (2004) I argued for a further very
    general principle of efficiency, in addition to
    Minimize Forms and Minimize Domains: Maximize
    On-line Processing.
  • There is a clear preference for selecting and
    arranging linguistic forms so as to provide the
    earliest possible access to as much of the
    ultimate syntactic and semantic representation as
    possible.

107
  • This principle also results in a preference for
    error-free on-line processing since errors delay
    the assignment of intended properties and
    increase processing effort.

108
  • Maximize On-line Processing (MaOP)
  • The human processor prefers to maximize the set
    of properties that are assignable to each item X
    as X is processed, thereby increasing O(n-line)
    P(roperty) to U(ltimate) P(roperty) ratios. The
    maximization difference between competing orders
    and structures will be a function of the number
    of properties that are unassigned or misassigned
    to X in a structure/sequence S, compared with the
    number in an alternative.
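One way to make the OP-to-UP comparison concrete is sketched below in Python. Everything here is a labeled assumption: the `Word` record, the property counts for two orders S and S', and the choice to score an order by summing unassigned and misassigned properties per word are illustrative, not figures or a procedure taken from Hawkins (2004).

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    ultimate: int          # properties ultimately assigned to X (UP)
    online: int            # properties correctly assignable as X is processed (OP)
    misassigned: int = 0   # properties wrongly assigned on-line, later revised

    @property
    def unassigned(self) -> int:
        return self.ultimate - self.online


def op_to_up_ratios(sequence):
    """Per-word on-line to ultimate property ratios."""
    return [round(w.online / w.ultimate, 2) for w in sequence]


def maop_cost(sequence):
    """Properties unassigned or misassigned on-line; lower is preferred."""
    return sum(w.unassigned + w.misassigned for w in sequence)


# Hypothetical counts for two orders of the same items A and B,
# where B depends on A for some of its properties.
s = [Word("A", 2, 2), Word("B", 3, 3)]                     # A precedes B
s_alt = [Word("B", 3, 1, misassigned=1), Word("A", 2, 2)]  # B precedes A

print(op_to_up_ratios(s), maop_cost(s))          # [1.0, 1.0] 0
print(op_to_up_ratios(s_alt), maop_cost(s_alt))  # [0.33, 1.0] 3
print("MaOP difference:", maop_cost(s_alt) - maop_cost(s))  # 3
```

Scoring per word mirrors the definition's wording ("the number of properties that are unassigned or misassigned to X"); a cumulative, sentence-level ratio would be an equally plausible reading.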

109
  • Clear examples can be seen across languages when
    certain common categories A and B are ordered
    asymmetrically, A before B, regardless of the
    language type, in contrast to symmetries in which
    both orders are productive (AB/BA), e.g.
    Verb-Object (VO) and Object-Verb (OV).
  • Some examples of asymmetries are summarized
    below:

110
  • Some Asymmetries (Hawkins 2002, 2004)
  • (i) Displaced WH preposed to the left of its
    (gap-containing) clause: almost exceptionless
  • E.g. Who_i did you say O_i came to the party?
  • (ii) Head noun (filler) to the left of its
    (gap-containing) relative clause
  • E.g. the students_i that I teach O_i
  • If a lg has basic VO, then NRel; exceptions are
    rare (Hawkins 1983):
  •          VO               OV
  • NRel     English          Persian
  • RelN     (rare)           Japanese

111
  • (iii) Antecedent precedes anaphor: highly
    preferred cross-linguistically
  • E.g. John washed himself (SVO), Washed John
    himself (VSO), John himself washed (SOV) are
    highly preferred over e.g. Washed himself John
    (VOS)
  • (iv) Wide scope quantifier/operator precedes
    narrow scope Q/O: preferred
  • E.g. Every student a book read (SOV lgs): ∀∃
    preferred
  • A book every student read (SOV lgs): ∃∀
    preferred

112
  • In these examples there is an asymmetric
    dependency of B on A: the gap is dependent on
    the head-noun filler in (ii) (for gap-filling),
    the anaphor on its antecedent in (iii) (for
    co-indexation), and the narrow scope quantifier
    on the wide scope quantifier in (iv) (the number
    of books read depends on the quantifier in the
    subject NP in Every student read a book/Many
    students read a book/Three students read a book,
    etc.).

113
  • The assignment of dependent properties to B is
    more efficient when A precedes, since these
    properties can be assigned to B immediately in
    on-line processing. In the reverse order, B A,
    there will be delays in property assignment
    on-line ("unassignments") or misanalyses
    ("misassignments").
  • If the relative clause precedes the head noun,
    the gap is not immediately recognized and there
    are delays in argument structure assignment
    within the relative clause; if a narrow scope
    quantifier precedes a wide scope quantifier, a
    wide scope interpretation will generally be
    (mis)assigned on-line to the narrow scope
    quantifier; and so on.
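The "unassignment" delay can be approximated with a simple word count, under the assumption (made here for illustration only) that a dependent item B receives its dependent property only once A has been processed. The Python sketch below applies a hypothetical `resolution_delay` helper to the anaphor orders in (iii); it ignores misassignments and treats each order as a flat token list.

```python
def resolution_delay(tokens, dependent, antecedent):
    """How many words after `dependent` must be processed (up to and
    including `antecedent`) before the dependent property can be
    assigned; 0 means it is assignable immediately."""
    i_dep = tokens.index(dependent)
    i_ant = tokens.index(antecedent)
    return max(0, i_ant - i_dep)


# Word orders from (iii); only VOS places the anaphor before its antecedent.
ORDERS = {
    "SVO": ["John", "washed", "himself"],
    "VSO": ["washed", "John", "himself"],
    "SOV": ["John", "himself", "washed"],
    "VOS": ["washed", "himself", "John"],
}

for label, tokens in ORDERS.items():
    delay = resolution_delay(tokens, "himself", "John")
    print(f"{label}: anaphor waits {delay} word(s)")
```

Only VOS yields a non-zero delay, matching the dispreferred order in (iii).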

114
  • I have argued that MaOP (in the form of Fillers
    before Gaps) competes with Minimize Domains to
    give asymmetries in relative clause ordering:
  • a head-before-relative-clause preference is
    visible in both VO and OV languages, with only
    rigid V-final languages resisting this preference
    to any degree (Hawkins 2004: 203-10).

115
  •            MiD   MaOP
  • VO NRel    +     +
  • VO RelN    -     -
  • OV RelN    +     -
  • OV NRel    -     +

116
  • WALS data (Dryer 2005a, b): % (n)
  •                  Rel-Noun    Noun-Rel or Mixed/Other
  • Rigid SOV        50% (17)    50% (17)
  • Non-rigid SOV     0% (0)    100% (17)
  • VO                3% (3)     97% (116)
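The percentages in the WALS table follow directly from the counts in parentheses. A minimal Python check, assuming the counts are exactly those shown and that percentages are rounded to whole numbers (the `percentages` helper is illustrative):

```python
# Counts from the WALS table (Dryer 2005a, b):
# (Rel-Noun, Noun-Rel or Mixed/Other)
COUNTS = {
    "Rigid SOV": (17, 17),
    "Non-rigid SOV": (0, 17),
    "VO": (3, 116),
}


def percentages(rel_noun, other):
    """Round each count to a whole-number percentage of the row total."""
    total = rel_noun + other
    return round(100 * rel_noun / total), round(100 * other / total)


for group, (rel_noun, other) in COUNTS.items():
    print(group, percentages(rel_noun, other))
# Rigid SOV (50, 50)
# Non-rigid SOV (0, 100)
# VO (3, 97)
```

The VO row (3 of 119 languages with Rel-Noun) is the one where MiD and MaOP both favor NRel in the table of slide 115.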

117
  • Language Variation in Psycholinguistics
  • What this all means for psycholinguistics is
    that grammatical patterns and rules provide data
    that can inform language processing theories
    (Hawkins 2007; Jaeger & Norcliffe 2009).
  • Conversely, processing can help us understand
    grammars better.

118
  • We can now give an explanation for what has so
    far been simply observed and stipulated in
    grammatical models, e.g. the existence of a head
    ordering parameter, with head-initial (VO) and
    head-final (OV) lgs being roughly equally
    productive:
  • they are equally efficient for processing,
    whether adjacent heads occur on the left of their
    sisters (English) or on the right (Japanese).

119
  • Performance data motivate the Accessibility
    Hierarchy for relative clause formation, the
    cut-offs for relativization, the reverse
    implicational patterns for gaps and resumptive
    pronouns, and numerous other regularities and
    language-particular subtleties (Hawkins 1999,
    2004, to appear).

120
  • This approach helps us understand exceptions to
    proposed universals (involving e.g. differential
    ordering for single-word versus phrasal modifiers
    of heads).
  • I.e. linguists can benefit from the inclusion of
    processing ideas in their theories and
    descriptions.

121
  • On the other hand, the leftward versus rightward
    movement of heavy phrases in different language
    types is directly relevant for processing
    theories (cf. the theory of de Smedt 1994, which
    predicts only rightward movements).
  • So is the absence of any independent evidence for
    anti-locality in any word order universals.

122
  • For theories of information density, we have seen
    many cross-linguistic patterns and hierarchies
    in morphology and syntax that support two
    complementary principles:
  • minimize Fi and maximize Pi

123
  • Minimize Fi
  • Minimize the set {Fi} required for the
    assignment of a particular Pi or {Pi}, in
    proportion to the processing ease with which
    each Pi can be assigned.

124
  • Maximize Pi
  • Maximize the set {Pi} that can be assigned to
    a particular Fi or {Fi} at each point in
    on-line processing.

125
  • References
  • Ariel, M. (1999) 'Cognitive universals and
    linguistic conventions: The case of resumptive
    pronouns', Studies in Language 23: 217-269.
  • Choi, H.W. (2007) 'Length and order: A corpus
    study of Korean dative-accusative construction',
    Discourse and Cognition 14: 207-27.
  • Croft, W. (1990) Typology and Universals, CUP,
    Cambridge.
  • de Smedt, K.J.M.J. (1994) 'Parallelism in
    incremental sentence generation', in G. Adriaens
    & U. Hahn, eds., Parallelism in Natural Language
    Processing, Ablex, Norwood, NJ.
  • Dryer, M.S. (1992) 'The Greenbergian word order
    correlations', Language 68: 81-138.
  • Dryer, M.S. (2005a) 'Order of relative clause and
    noun', in M. Haspelmath, M.S. Dryer, D. Gil &
    B. Comrie, eds., The World Atlas of Language
    Structures, OUP, Oxford.
  • Dryer, M.S. (2005b) 'Relationship between the
    order of object and verb and the order of
    relative clause and noun', in M. Haspelmath,
    M.S. Dryer, D. Gil & B. Comrie, eds., The World
    Atlas of Language Structures, OUP, Oxford.
  • Ford, M. (1983) 'A method of obtaining measures
    of local parsing complexity throughout
    sentences', Journal of Verbal Learning and Verbal
    Behavior 22: 203-218.
  • Gibson, E. (1998) 'Linguistic complexity:
    Locality of syntactic dependencies', Cognition
    68: 1-76.
  • Greenberg, J.H. (1963) 'Some universals of
    grammar with particular reference to the order of
    meaningful elements', in J.H. Greenberg, ed.,
    Universals of Language, MIT Press, Cambridge,
    Mass.
  • Greenberg, J.H. (1966) Language Universals, with
    Special Reference to Feature Hierarchies, Mouton,
    The Hague.
  • Haspelmath, M. (2002) Morphology, Arnold, London.
  • Hawkins, J.A. (1983) Word Order Universals,
    Academic Press, New York.

126
  • Hawkins, J.A. (1994) A Performance Theory of
    Order and Constituency, CUP, Cambridge.
  • Hawkins, J.A. (1999) 'Processing complexity and
    filler-gap dependencies', Language 75: 244-285.
  • Hawkins, J.A. (2000) 'The relative ordering of
    prepositional phrases in English: Going beyond
    manner-place-time', Language Variation and Change
    11: 231-266.
  • Hawkins, J.A. (2004) Efficiency and Complexity in
    Grammars, OUP, Oxford.
  • Hawkins, J.A. (2007) 'Processing typology and why
    psychologists need to know about it', New Ideas
    in Psychology 25: 87-107.
  • Hawkins, J.A. (2009) 'Language universals and the
    performance-grammar correspondence hypothesis',
    in M.H. Christiansen, C. Collins & S. Edelman,
    eds., Language Universals, OUP, Oxford, 54-78.
  • Hawkins, J.A. (to appear) Cross-linguistic
    Variation and Efficiency, OUP, Oxford.
  • Holmes, V.M. & O'Regan, J.K. (1981) 'Eye fixation
    patterns during the reading of relative clause
    sentences', Journal of Verbal Learning and Verbal
    Behavior 20: 417-430.
  • Jaeger, T.F. (2006) Redundancy and Syntactic
    Reduction in Spontaneous Speech, unpublished PhD
    dissertation, Stanford University, Stanford, CA.
  • Jaeger, T.F. & Norcliffe, E. (2009) 'The
    cross-linguistic study of sentence production:
    State of the art and a call for action', Language
    and Linguistics Compass, Blackwell.
  • Just, M.A. & Carpenter, P.A. (1992) 'A capacity
    theory of comprehension: Individual differences
    in working memory', Psychological Review
    99: 122-49.
  • Keenan, E.L. (1987) 'Variation in Universal
    Grammar', in E.L. Keenan, Universal Grammar: 15
    Essays, Croom Helm, London, 46-59.

127
  • Keenan, E.L. & Comrie, B. (1977) 'Noun phrase
    accessibility and Universal Grammar', Linguistic
    Inquiry 8: 63-99.
  • Keenan, E.L. & Hawkins, S. (1987) 'The
    psychological validity of the Accessibility
    Hierarchy', in E.L. Keenan, Universal Grammar: 15
    Essays, Croom Helm, London.
  • King, J. & Just, M.A. (1991) 'Individual
    differences in syntactic processing: The role of
    working memory', Journal of Memory and Language
    30: 580-602.
  • King, J. & Kutas, M. (1992) 'ERP responses to
    sentences that vary in syntactic complexity:
    Differences between good and poor comprehenders',
    Poster, Annual Conference of the Society for
    Psychophysiological Research, San Diego, CA.
  • King, J. & Kutas, M. (1993) 'Bridging gaps with
    longer spans: Enhancing ERP studies of parsing',
    Poster presented at the Sixth Annual CUNY
    Sentence Processing Conference, University of
    Massachusetts, Amherst.
  • Konieczny, L. (2000) 'Locality and parsing
    complexity', Journal of Psycholinguistic Research
    29(6): 627-645.
  • Kwon, N., Gordon, P.C., Lee, Y., Kluender, R. &
    Polinsky, M. (2010) 'Cognitive and linguistic
    factors affecting subject/object asymmetry: An
    eye-tracking study of prenominal relative clauses
    in Korean', Language 86: 546-82.
  • Levy, R. (2008) 'Expectation-based syntactic
    comprehension', Cognition 106: 1126-1177.
  • Lichtenberk, F. (1983) A Grammar of Manam,
    University of Hawaii Press, Honolulu.
  • Primus, B. (1999) Cases and Thematic Roles, Max
    Niemeyer Verlag, Tuebingen.
  • Quirk, R. (1957) 'Relative clauses in educated
    spoken English', English Studies 38: 97-109.
  • Stallings, L.M. (1998) 'Evaluating Heaviness:
    Relative Weight in the Spoken Production of
    Heavy-NP Shift', Ph.D. dissertation, University
    of Southern California.
  • Traxler, M.J., Morris, R.K. & Seeley, R.E. (2002)
    'Processing subject and object relative clauses:
    Evidence from eye movements', Journal of Memory
    and Language 47: 69-90.

128
  • Uszkoreit, H., Brants, T., Duchier, D., Krenn,
    B., Konieczny, L., Oepen, S. & Skut, W. (1998)
    'Studien zur performanzorientierten Linguistik:
    Aspekte der Relativsatzextraposition im
    Deutschen', Kognitionswissenschaft 7: 129-133.
  • Vasishth, S. & Lewis, R. (2006) 'Argument-head
    distance and processing complexity: Explaining
    both locality and anti-locality effects',
    Language 82: 767-794.
  • Wanner, E. & Maratsos, M. (1978) 'An ATN approach
    to comprehension', in M. Halle, J. Bresnan &
    G.A. Miller, eds., Linguistic Theory and
    Psychological Reality, MIT Press, Cambridge,
    Mass., 119-161.
  • Wasow, T. (2002) Postverbal Behavior, CSLI
    Publications, Stanford University, Stanford.
  • Yamashita, H. & Chang, F. (2001) '"Long before
    short" preference in the production of a
    head-final language', Cognition 81: B45-B55.
  • Yamashita, H. & Chang, F. (2006) 'Sentence
    production in Japanese', in M. Nakayama, R.
    Mazuka & Y. Shirai, eds., Handbook of East Asian
    Psycholinguistics, Vol. 2, CUP, Cambridge.

129
  • Acknowledgements
  • Special thanks to the many collaborators and
    contributors to this research program as
    presented here, especially:
  • Gontzal Aldai, Bernard Comrie, Gisbert Fanselow,
    Luna Filipovic, Kaoru Horie, Barbara Jansing,
    Ed Keenan, Lewis Lawyer, Stephen Matthews,
    Fritz Newmeyer, Beatrice Primus, Anna Siewierska,
    Lynne Stallings, Tom Wasow

130
  • Financial Support
  • has been received from the following sources
    for the research reported here and is gratefully
    acknowledged:
  • German National Science Foundation fellowship
    (DFG grant INK 12/A1)
  • European Science Foundation small grant
  • Max Planck Institute for Evolutionary
    Anthropology (Leipzig) research fellowships
    2000-04
  • University of California Davis research funds
  • University of Cambridge Research Centre for
    English and Applied Linguistics research funds
    and UCD teaching buy-outs 2007-10