Title: Verb compounds within canonical typology: Chinese separable verb compounds
1Verb compounds within canonical typology
Chinese separable verb compounds
- Anna Siewierska
- Jiajin Xu
- Richard Xiao
2Overview of the talk
1. Separable verb compounds (SVCs)
2. Canonical typological strategy
3. A case study of SVCs in Mandarin
3Separable verb compounds
- Some languages have verb compounds which are made up of two parts: a verbal stem and a movable element standing before or after the verb, adjacent to it or in close proximity
- Different terms in the literature
- separable verb compounds, split words, separable verbs, ionised words, discontinuous / detachable / breakable / discrete words, etc.
4An example of Chinese SVC
- dan1xin1, lit. 'carry heart', 'to worry'
- dan1-le yi1 shang4wu3 xin1, lit. 'carry ASP one morning heart', 'to be worried the whole morning'
- xin1 yi4zhi2 dan1-zhe, lit. 'heart all the time carry ASP', 'to have been worried all the time'
5Sound similar?
- Derivation by infixing (e.g. abso-fucking-lutely) and syntactic interposing (e.g. of bloody course) in English
- Separable complex verbs in Dutch (aankomen 'arrive') and German (ankommen 'arrive')
- But Chinese SVCs are ...
6essentially different
- 1) Insertions in English infixing and interposing
- Almost exclusively restricted to expletives, euphemisms, and amplifiers
- Acting as an emotive intensifier
- In contrast, discontinuous use of Chinese SVCs has a greater variety of insertions and discourse / pragmatic functions
- Insertions as head / tail satellites: aspect markers, RVCs (resultative verb complements), quantifiers, classifiers, modifiers, etc.
- Providing extra information
- Acting as a mitigator / softener
- Showing casualness
- Expressing negative emotions such as disapproval
- Enhancing rhythm, which is important in a syllable-timed language like Chinese
7essentially different
- 2) A significant difference between SVCs in Mandarin and the split prefix phenomenon in Dutch (e.g. binnenkomen, 'to come in') and German (e.g. abfahren, 'to drive off / depart')
- Chinese SVCs are not words with a separable affix
- E.g. dan1xin1 'worry'
- dan1 (V) xin1 (O)
8essentially different
- 3) SVCs in Dutch and German can have a wide range of constituents of all types as insertions, including complex NGs and subordinate clauses, as in the example below
- A Dutch example: opbellen 'ring up'
- Ik bel op
- Ik bel hem op
- I ring him up
- Ik bel hem morgen op
- I ring him tomorrow up
- Ik bel de man waarvan ik houd op
- I ring the man that I love up
- ...which is completely impossible in Chinese
9Why are SVCs interesting?
- 1) SVCs are a large class of verbs in Chinese which cannot be marginalised
- 2) They satisfy none of the universal criteria for wordhood (Dixon and Aikhenvald 2002: 19-20)
- A grammatical word consists of a number of grammatical elements which (a) always occur together, rather than scattered through the clause (the criterion of cohesiveness); (b) occur in a fixed order; (c) have a conventionalised coherence and meaning
- Criterion (c) means that speakers of the language may talk about a word (but are unlikely to talk about a morpheme)
10Why are SVCs interesting?
- 3) SVCs violate one of the most fundamental principles of the theory of word formation
- The Principle of Lexical Integrity: word-internal structures are not accessible to rules of syntax (Booij 1990: 45)
- 4) SVCs are listed as words, but they clearly have some phrasal properties, thus straddling the boundary of morphology and syntax
- E.g. the analysable internal structures of Chinese SVCs
11Canonical typology
- To study such fuzzy and cross-border grammatical categories, canonical typology (CT) has proved to be a useful strategy (cf. Bond 2007; Corbett 2007; Nikolaeva 2008), e.g.
- Suppletive forms
- Agreement
- Negation
- Syncretism
12Standard strategy in typological research (Croft 2003: 14)
- Determine the particular structure or situation type of interest
- Examine the morpho-syntactic construction(s) or strategies used to encode that situation type
- Search for dependencies between the constructions used for that situation and other linguistic factors
- i.e. other structural features, external functions expressed by the structure, or both
13Canonical typological approach
- Start with a linguistic phenomenon
- Establish a general definition for identifying that linguistic category
- Construct a set of features or criteria for the typical (canonical) case of the category
- Use the criteria to investigate the relevant categories in languages
14Canonical typological approach
- Start with a linguistic phenomenon
- Establish a general definition for identifying the linguistic category in question
- Construct a set of features or criteria for the canonical case of the category
- Use the criteria to investigate the relevant categories in languages
15How can corpora inform CT?
- In CT, the features are usually collected from the literature
- The collection could be selective, subjective and arbitrary
- Can the selection of features be more objective and reliable?
- We seek to answer this question from the corpus linguistic perspective
- The corpus-based approach makes it possible for variational parameters of SVCs to be summarised exhaustively and more objectively by looking at a large amount of attested language use simultaneously
16A case study of Chinese SVCs
- What are common types of insertions and external patterns of discontinuous use of SVCs in Mandarin?
- How can canonical features be identified on the basis of frequency?
- How can the study of SVCs in Chinese contribute to the research of similar phenomena in other languages?
17Prevalence of SVCs in Mandarin
- The 2002 edition of the Modern Chinese Dictionary includes 3,236 types of SVCs (Zhu 2006: 29)
- Four categories: verb-object (97%), verb-complement, subject-predicate, and coordinative
- Given their prevalence, no grammar of Chinese can turn a blind eye to the verb-object paradox (Packard 2003: 108)
18Corpora
- Two corpora are used in this study
- The Lancaster Corpus of Mandarin Chinese (LCMC) for written Chinese
- The Lancaster Los Angeles Corpus of Spoken Chinese (LLSCC) for spoken Chinese
- The LCMC is a balanced corpus of written Chinese composed of one million words proportionally sampled from fifteen genres, ranging from news and fiction to academic prose, published in mainland China around 1991 (see McEnery, Xiao & Mo 2003)
19Corpora
- The LLSCC comprises one million words of dialogues (55%) and monologues (45%) in Chinese, covering both spontaneous (57%) and scripted (43%) speech in six spoken genres
- The two corpora are also tokenised and POS-tagged
- They provide an empirical basis for our quantitative and qualitative analysis of SVCs in Chinese
20Seed SVCs for data extraction
- A total of 1,738 commonly used SVCs listed in A Dictionary of Split Word Usage in Modern Chinese (Yang 1995) were used as seeds to extract, automatically and exhaustively, all possible instances of SVCs whose head and tail are separated, in either forward or backward direction, by a span of 1-10 words
- 2,793 raw concordance lines were extracted from the two corpora
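The seed-based extraction described above could be sketched roughly as follows. This is only an illustration, not the authors' actual tool: the function name find_split_svcs, the Hit record and the representation of a seed as a (head, tail) pair are all assumptions, and the input is assumed to be one tokenised sentence at a time.

```python
from typing import Iterable, Iterator, NamedTuple, Sequence, Tuple


class Hit(NamedTuple):
    head: str                   # verbal head of the seed SVC
    tail: str                   # tail (e.g. object) of the seed SVC
    direction: str              # "forward": head ... tail; "backward": tail ... head
    start: int                  # token index of the first element found
    end: int                    # token index of the second element found
    insertion: Tuple[str, ...]  # tokens standing between the two elements


def find_split_svcs(
    tokens: Sequence[str],
    seeds: Iterable[Tuple[str, str]],
    min_gap: int = 1,
    max_gap: int = 10,
) -> Iterator[Hit]:
    """Yield candidate discontinuous uses of seed SVCs in one tokenised sentence.

    Hypothetical helper: a seed is a (head, tail) pair, and the two parts must be
    separated by min_gap..max_gap tokens, in either order.
    """
    seed_list = list(seeds)
    for i, token in enumerate(tokens):
        for head, tail in seed_list:
            for first, second, direction in (
                (head, tail, "forward"),
                (tail, head, "backward"),
            ):
                if token != first:
                    continue
                # Look ahead for the partner element within the allowed span.
                for gap in range(min_gap, max_gap + 1):
                    j = i + gap + 1
                    if j >= len(tokens):
                        break
                    if tokens[j] == second:
                        yield Hit(head, tail, direction, i, j, tuple(tokens[i + 1:j]))


# Example with the pinyin tokens from the dan1xin1 example (aspect marker as its own token).
sentence = ["ta1", "dan1", "le", "yi1", "shang4wu3", "xin1"]
hits = list(find_split_svcs(sentence, [("dan1", "xin1")]))
print(hits)
```

A purely string-based search like this over-generates, which is in line with the need for manual filtering of the raw concordance lines.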
21Human evaluation and annotation
- Each concordance line was evaluated independently by two native Chinese speakers in order to remove noise from the automatically extracted results
- Only 565 true instances of discontinuous use of SVCs were retained for further annotation and analysis
- Annotation categories: type of insertion, direction of separation, word semantics, sentence semantics (i.e. pragmatic meaning), sentence type, genre
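One way to picture the evaluation and annotation step is as a small record type plus a filter, sketched below. The field names simply mirror the annotation categories listed above. The class SVCAnnotation and the helper keep_agreed are hypothetical, and the rule of keeping only lines that both evaluators accept is an assumption, since the slides do not say how disagreements were resolved.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SVCAnnotation:
    """One retained concordance line with its annotation categories (hypothetical schema)."""
    line_id: int
    insertion_type: str       # e.g. "ASP", "RVC", "CL", "MC", "MOD"
    direction: str            # "forward" or "backward"
    word_semantics: str
    sentence_semantics: str   # i.e. pragmatic meaning
    sentence_type: str
    genre: str


def keep_agreed(judgements: Dict[int, Tuple[bool, bool]]) -> List[int]:
    """Return ids of lines that both evaluators judged to be true SVC instances.

    Assumption: a line is kept only on unanimous acceptance.
    """
    return [line_id for line_id, (a, b) in sorted(judgements.items()) if a and b]


# Toy judgements for three raw concordance lines.
print(keep_agreed({1: (True, True), 2: (True, False), 3: (False, False)}))
```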
22Syntagmatic pattern of SVCs
23Head satellites of SVCs
- Aspect insertion
Pattern                   SVC types (%)   SVC tokens (%)
SVCH-le SVCT              42 (25)         74 (13)
SVCH-guo SVCT             15 (9)          22 (4)
SVCH-zhe SVCT             12 (7)          35 (6)
Total                     69 (42)         131 (23)
- Expanded aspect insertion
Pattern                   SVC types (%)   SVC tokens (%)
SVCH (?) ASP (?) SVCT     91 (55)         244 (43)
- Note: the (?) slot can be filled or left blank
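The bracketed token figures for aspect insertion can be recomputed from the raw counts, as in the short sketch below. It assumes the denominator is the 565 retained instances, which the slides do not state explicitly, and the helper name pct is illustrative only.

```python
TOTAL_TOKENS = 565  # assumption: all retained discontinuous instances

# Raw token counts for plain aspect insertion, taken from the table above.
aspect_tokens = {
    "SVCH-le SVCT": 74,
    "SVCH-guo SVCT": 22,
    "SVCH-zhe SVCT": 35,
}


def pct(count: int, total: int) -> int:
    """Share of `count` in `total`, rounded to a whole percentage."""
    return round(100 * count / total)


shares = {pattern: pct(n, TOTAL_TOKENS) for pattern, n in aspect_tokens.items()}
total_share = pct(sum(aspect_tokens.values()), TOTAL_TOKENS)

print(shares)       # {'SVCH-le SVCT': 13, 'SVCH-guo SVCT': 4, 'SVCH-zhe SVCT': 6}
print(total_share)  # 23
```

Under that assumption the recomputed shares match the figures given for tokens, including 43 for the 244 tokens of the expanded pattern.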
24Head satellites of SVCs
- RVC insertion
Pattern                   SVC types (%)   SVC tokens (%)
SVCH RVC SVCT             20 (12)         26 (5)
- Expanded RVC insertion
Pattern                   SVC types (%)   SVC tokens (%)
SVCH (?) RVC (?) SVCT     20 (12)         66 (12)
- Hardly surprising, given that RVCs can be analysed as markers of the completive aspect in Chinese (Xiao and McEnery 2004)
25Tail satellites of SVCs
- Classifier (CL)
- 21% (116 SVCs) contain a classifier
- Nominals in Mandarin are typically preceded by a classifier
- Quantifier (MC)
- 19% (108 SVCs) contain a quantifying construction
- Modifier (MOD), i.e. pre-modifiers of tails
- Possessive pronouns (64 times, 11%)
- Adjectival modifiers (63 times, 11%)
- Nominal items (59 times, 10%)
- Question word (i.e. shen2me 'what', 26 times, 5%)
- Also combinations of these elements
26SVC network: Lexical and grammatical patterning
27Words or phrases?
- Synchronically, located somewhere on the continuum between words and phrases (cf. Guo and Qian 2004)
- Continuum: words, SVCs, idioms, phrases
- Diachronically, wordhood is subject to language change
- Many compound words in current use have evolved from phrases (e.g. daoqian 'apologise', jugong 'bow')
- Givón (1971): "Today's morphology is yesterday's syntax."
- Two criteria, depending on the type and number of morpheme(s) in the insertion
- Over half of the discontinuous uses of SVCs in our data (i.e. 54% if RVCs are seen as quasi-aspect markers), together with their combined cognates, can be analysed as legitimate compound words
28Two overarching criteria
- Structural criteria
- Host dependency
- Head dependence enjoys priority over tail dependence
- Phonological criteria
- PrWd (prosodic word) restriction (Feng 2001, 2002)
- A disyllabic unit is the typical prosodic foot in Chinese
- A trisyllabic unit can also be a prosodic word
29Structural criteria
- According to the host dependency criteria of the canonical typological approach
- a) SVCs with a clitic-like aspect marker alone are compounds rather than phrases
- b) SVCs with an RVC attached to the head verb are quasi-compounds
- c) SVCs with other insertions (classifiers, modifiers, etc.) attached to the tail (typically an object or complement) are the least likely compounds
- Priority: a > b > c
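The three structural cases can be read as a simple decision rule over the tags of the inserted elements, sketched below. The tag names ASP, RVC, CL, MC and MOD follow the labels used on the earlier slides. The enum StructuralRank and the function structural_rank are hypothetical, and treating an aspect marker combined with an RVC as case (b) is an assumption rather than something the slides state.

```python
from enum import IntEnum
from typing import Iterable

HEAD_ASPECT = "ASP"
HEAD_RVC = "RVC"
TAIL_SATELLITES = frozenset({"CL", "MC", "MOD"})  # classifier, quantifier, modifier


class StructuralRank(IntEnum):
    """Lower value = closer to a canonical compound (priority a > b > c)."""
    COMPOUND = 1               # (a) aspect marker alone
    QUASI_COMPOUND = 2         # (b) RVC attached to the head
    LEAST_LIKELY_COMPOUND = 3  # (c) satellites attached to the tail


def structural_rank(insertion_tags: Iterable[str]) -> StructuralRank:
    """Rank one discontinuous SVC use by the tags of its inserted elements."""
    tags = set(insertion_tags)
    if not tags:
        raise ValueError("expected at least one inserted element")
    unknown = tags - TAIL_SATELLITES - {HEAD_ASPECT, HEAD_RVC}
    if unknown:
        raise ValueError(f"unknown insertion tags: {sorted(unknown)}")
    if tags & TAIL_SATELLITES:
        return StructuralRank.LEAST_LIKELY_COMPOUND
    if HEAD_RVC in tags:
        return StructuralRank.QUASI_COMPOUND
    return StructuralRank.COMPOUND


print(structural_rank(["ASP"]).name)
print(structural_rank(["ASP", "MC", "CL"]).name)
```

Using an IntEnum keeps the a > b > c ordering directly comparable, so annotated instances can be sorted by how compound-like they are.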
30Phonological criteria
- Various manifestations of SVCs define a continuum of phonological conditions which complement the structural criteria
- a) The combined uses of head and tail are disyllabic compounds
- b) SVCs in which the head and tail are separated by one single morpheme are possible compounds under the Trisyllabic Foot Rule (TFR) of prosodic morphology (McCarthy & Prince 1993, 1995)
- c) SVCs in which the head and tail are separated by polymorphemic insertions, such as quantifiers and adjectival modifiers, are phrases
- Priority: a > b > c
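The phonological cases likewise reduce to counting the morphemes in the insertion, as in the minimal sketch below. It assumes a monosyllabic head and tail and one syllable per inserted morpheme, so that a single inserted morpheme gives a trisyllabic unit. The function name phonological_class and its return labels are illustrative only.

```python
from typing import Sequence


def phonological_class(insertion_morphemes: Sequence[str]) -> str:
    """Classify an SVC use by the number of morphemes between head and tail.

    Assumption: head and tail are one syllable each, and every inserted
    morpheme is one syllable.
    """
    n = len(insertion_morphemes)
    if n == 0:
        return "compound"           # (a) head and tail combined: disyllabic
    if n == 1:
        return "possible compound"  # (b) trisyllabic unit
    return "phrase"                 # (c) polymorphemic insertion


print(phonological_class([]))                         # compound
print(phonological_class(["le"]))                     # possible compound
print(phonological_class(["yi1", "shang4", "wu3"]))   # phrase
```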
31Conclusions
- We have used the corpus-based approach to generalise canonical internal structures of Chinese SVCs
- The structural and phonological criteria we have proposed work well to define wordhood of SVCs in Mandarin
- The approach combining canonical typology and corpus methodology could also be useful in research of similar phenomena in other languages
32- Thank you!
- Richard.Xiao_at_edgehill.ac.uk