Title: LING 406 Intro to Computational Linguistics Morphology and Computational Morphology
1 LING 406 Intro to Computational Linguistics: Morphology and Computational Morphology
- Richard Sproat
- URL: http://catarina.ai.uiuc.edu/L406_08/
2 This Lecture
- Overview of morphological phenomena and their computational implementation
3 What is morphology?
- scripserunt is third person, plural, perfect, active of scribo (I write)
- Morphology relates word forms
- the lemma of scripserunt is scribo
- Morphology analyzes the structure of word forms
- scripserunt has the structure scrib+s+erunt
4 What is morphology useful for?
- Information retrieval
- Machine translation
- Text-to-speech synthesis
- Word pronunciation often depends upon structure
- Staatsprotokoll would be pronounced with /ʃ/ for the medial <s> if it were part of the same syllable as the following <p>
- But it isn't, because the <s> is a compound linking morpheme
5 Morphology is a relation
- Imagine you have a Latin morphological analyzer comprising
- D: a relation that maps between surface form and decomposed form
- L: a relation that maps between decomposed form and lemma
- Then
- scripserunt o D = scrib+s+erunt
- scripserunt o D o L = scribo
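The chain of relations can be sketched in a few lines of Python. This is only an illustration: relations are modeled as finite sets of (input, output) pairs, the helper name compose is made up for this sketch, the decomposed form is written with + as a morpheme boundary (an assumption), and D and L each contain a single toy entry rather than a real Latin analyzer.

```python
# Relations are modeled as finite sets of (input, output) string pairs.
def compose(r1, r2):
    """Relational composition: pair x with z whenever r1 maps x to y and r2 maps y to z."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# The surface word as an identity relation (an acceptor for one string).
surface = {("scripserunt", "scripserunt")}
# D: surface form -> decomposed form (toy relation, one entry, assumed notation).
D = {("scripserunt", "scrib+s+erunt")}
# L: decomposed form -> lemma (toy relation, one entry).
L = {("scrib+s+erunt", "scribo")}

decomposed = compose(surface, D)   # word composed with D
lemma = compose(decomposed, L)     # word composed with D, then with L

print(decomposed)
print(lemma)
```

A real system would use finite-state transducers instead of explicit pair sets, but the composition step plays the same role.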
6 Other issues
- Inflectional vs. Derivational morphology
- Syntagmatic combination of elements versus
paradigmatic relationship between words
7 String representation
8 Basic theme of today's lecture
- All morphological operations can be handled by a single finite-state operation: COMPOSITION
9 Simple concatenation
- For example: dog + s = dogs
- Typically this is dealt with using concatenation
- But it can also be handled using composition
- There is an efficiency difference, but these are all offline processes
10 Why do it this way?
- Concatenation is often accompanied by other changes, either to the stem or the affix or both, or imposes requirements on the stem
- Vowel harmony
- Stem prosody (see below)
- Other phonological changes
- These necessarily involve composition anyway (see previous lectures)
11 English regular plurals
- cat + s = cats /s/
- dog + s = dogs /z/
- spouse + s = spouses /əz/
- Assume one implements the relevant rule with a transducer T
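As a rough sketch of how affixation and the rule T chain together, the Python below treats each step as a string function and composes them. Everything here is an assumption made for illustration: the abstract suffix symbol Z, the rough phonemic spellings of the stems, the small segment classes, and the helper names add_plural and compose. It is not the transducer from the lecture.

```python
SIBILANTS = ("s", "z", "ʃ", "ʒ")        # affricates tʃ and dʒ also end in ʃ and ʒ
VOICELESS = ("p", "t", "k", "f", "θ")   # voiceless non-sibilant consonants

def add_plural(stem):
    """Affixation step: attach an abstract plural morpheme Z after a + boundary."""
    return stem + "+Z"

def T(form):
    """Toy plural rule: choose the allomorph of Z from the final segment of the stem."""
    stem, boundary, suffix = form.partition("+")
    if suffix != "Z":
        return form                     # no plural morpheme: leave the form unchanged
    if stem.endswith(SIBILANTS):
        return stem + "əz"
    if stem.endswith(VOICELESS):
        return stem + "s"
    return stem + "z"

def compose(*steps):
    """Chain string-to-string steps left to right, as a stand-in for transducer composition."""
    def run(form):
        for step in steps:
            form = step(form)
        return form
    return run

plural = compose(add_plural, T)
for stem in ("kæt", "dɔg", "spaʊs"):
    print(stem, "->", plural(stem))
```

Because T only looks at the stem-final segment and the boundary, the same rule could be composed with any other source of Z, not just this affixation step.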
12 Prosodic constraints: English comparatives
13 Comparative transducer
14 Templatic affixes in Yowlumne
15 CVC(C)
16 CVCVV(C)
17 CVCVV(C) continued
18 -inay and -?aa
19 CVCVV(C) transducer
20 Affix-induced phonological changes
21 Subsegmental morphology
22 Another example
23 Subtractive morphology
24 Koasati truncation transducer
25 Infixation: two kinds
- Extrametrical infixation
- Positively circumscribed infixation
26 Extrametrical infixation
- Infixation that can be viewed as ordinary affixation, except that you ignore a piece of the base you attach it to
- That piece is said to be extrametrical
27 Bontoc infixation
- Insert a marker > after the first consonant (if any)
- Change > into the infix -um-
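The two steps can be sketched as two small string rewrites applied in sequence. This is a minimal illustration, assuming a five-vowel inventory written a, e, i, o, u and using regular-expression substitution in place of transducers; the function names are made up for the sketch, and fikas is used only as a sample input.

```python
import re

VOWELS = "aeiou"   # assumed vowel inventory for this sketch

def insert_marker(word):
    """Step 1: insert the marker > after the first consonant, or word-initially if there is none."""
    return re.sub(rf"^([^{VOWELS}]?)", r"\1>", word, count=1)

def realize_marker(word):
    """Step 2: rewrite the marker > as the infix um."""
    return word.replace(">", "um")

def infix_um(word):
    """Composition of the two steps."""
    return realize_marker(insert_marker(word))

print(insert_marker("fikas"))
print(infix_um("fikas"))
```

Splitting the process this way keeps the prosodic question (where the affix goes) separate from the segmental question (what the affix is).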
28 Positively circumscribed infixation
29 Ulwa marker introduction
30 Another instance of prosodically circumscribed infixation
[Diagram: Kalama + fg + zoo]
31 Root & Pattern Morphology (McCarthy 1979)
k t b
32 Implementing root and pattern morphology
33 A note on practical implementations
- Practical implementations such as Buckwalter's (2002) analyzer usually sidestep all of this
- Precompile all of the alternations that any given root has
34 Karttunen & Beesley's Compile-Replace-Merge
- Malay reduplication (see below on reduplication)
- bagibagi 'bags'
- Lexical form bagi+Noun+Plural maps to
- a regular expression
- bagi^2
- Which is then replaced with the compilation of itself: bagibagi
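The two passes can be imitated in plain Python. This is a loose sketch under several assumptions: the lexical form is written with + between the stem and its tags, X^n is used as the notation for n copies of X, and the function names are invented here. It imitates the idea only and does not reproduce the actual compile-replace algorithm or any finite-state toolkit API.

```python
import re

def lexical_to_regex(lexical):
    """First pass: map a lexical form such as bagi+Noun+Plural to a regex-like string."""
    stem, *tags = lexical.split("+")
    return f"{stem}^2" if "Plural" in tags else stem

def compile_replace(expr):
    """Second pass: replace each X^n by its 'compiled' value, n copies of X."""
    return re.sub(r"(\w+)\^(\d+)", lambda m: m.group(1) * int(m.group(2)), expr)

intermediate = lexical_to_regex("bagi+Noun+Plural")
print(intermediate)
print(compile_replace(intermediate))
```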
35 Merge
- View root as filler for a template. So drs is filler for CVVCVC
- Merge operation walks down root and template in parallel, attempting to find a match between root and template
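A minimal sketch of the walk, assuming the template is a string over the slot symbols C and V and the root is a string of consonants. Only the C slots are filled here; the V slots are left for a separate vocalism step. The function name merge and the error handling are choices made for this illustration.

```python
def merge(root, template):
    """Walk root and template in parallel, filling each C slot with the next root consonant."""
    consonants = iter(root)
    out = []
    for slot in template:
        if slot == "C":
            consonant = next(consonants, None)
            if consonant is None:
                raise ValueError("root has too few consonants for this template")
            out.append(consonant)
        else:
            out.append(slot)           # V slots are kept for a later vocalism step
    if next(consonants, None) is not None:
        raise ValueError("root has too many consonants for this template")
    return "".join(out)

print(merge("drs", "CVVCVC"))
```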
36 Beesley & Karttunen's Solution
d V V r V S
d u u r i s
Surface form is a regular expression
37 But this is equivalent to a model based solely on composition
38 Other notable approaches
- Kiraz (2000) presents a multitape solution based
on earlier work of Kay (1988)
39 Morphomic relations
- Aronoff (1994) introduces the rather odd term "morphomic" to describe morphological phenomena that are purely morphological
40 The Latin 3rd stem
41 So?
- 3rd stem is not morphologically uniform
- It differs across different verb classes and some verbs have idiosyncratic third stems
- It is not semantically coherent
- The forms that require the 3rd stem are a motley crew
- Yet there is clearly a notion of 3rd stem
- If you tell me the 3rd stem of a verb, I can tell you how the agentive noun, the supine, the perfect participle are formed
- The 3rd stem has a purely morphological function
42 3rd stem is just prosodically induced affixation
- Assume we have a transducer T that forms the 3rd stem of a verb
- Of course, T will have to allow for a lot of idiosyncratic changes
S gt3ste S
43 Reduplication
- Skip the discussion of paradigmatic variation (see 2.3 in R&S)
- Reduplication
- Unbounded
- Bounded
44 Unbounded Bambara Reduplication (Culy, 1985)
This is apparently beyond the power
of finite-state methods.
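The construction Culy discusses has the shape Noun o Noun, with an exact copy of an arbitrarily long noun on each side of o. A sketch of a checker for that shape is below, assuming the phrase is written with single spaces around o and that the noun itself does not contain the sequence " o "; the function name is made up for this illustration. The check is trivial with unbounded memory, since it compares two whole strings, and that unbounded comparison is what a finite-state device lacks when the nouns can be of any length.

```python
def is_noun_o_noun(phrase):
    """Return True if the phrase has the form X o X with a non-empty X copied exactly."""
    left, separator, right = phrase.partition(" o ")
    return separator == " o " and left != "" and left == right

print(is_noun_o_noun("wulu o wulu"))
print(is_noun_o_noun("wulu o malo"))
```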
45 Reduplication: Gothic (Wright 1910)
- Prefix a syllable of the form (A)Cai to the stem, where C is a consonant position and A is an optional appendix
- Copy the onset of the stem to the C position. If there is a pre-onset appendix /s/, copy this to the appendix position
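The two statements above can be sketched directly on strings. The sketch makes assumptions that are not stated on the slide: stems are given in simplified ASCII spelling, s counts as a pre-onset appendix only when it precedes a stop (p, t, k), only the first onset consonant is copied, and vowel-initial stems copy nothing. The function names are invented for this illustration.

```python
VOWELS = "aeiou"
STOPS = "ptk"

def split_onset(stem):
    """Return (appendix, consonant): what goes in the A and C positions of (A)Cai."""
    if len(stem) >= 2 and stem[0] == "s" and stem[1] in STOPS:
        return "s", stem[1]        # appendix s plus the following onset stop
    if stem and stem[0] not in VOWELS:
        return "", stem[0]         # plain onset: copy its first consonant only
    return "", ""                  # vowel-initial stem: nothing to copy

def reduplicate(stem):
    """Prefix (A)Cai to the stem, with A and C copied from the stem."""
    appendix, consonant = split_onset(stem)
    return appendix + consonant + "ai" + stem

for stem in ("hald", "slep", "skaid", "auk"):
    print(stem, "->", reduplicate(stem))
```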
46 Factoring Reduplication
- Prosodic constraints
- Copy verification transducer C
47 Gothic Index Transducer
48 Factoring Reduplication
- Then reduplication in Gothic can be modeled as
- a o C
- More generally, one can model reduplication as the following composition, where P implements the prosodic constraints, C the copy constraints, and A optional phonological adjustments
- P o C o A
49 Other Approaches
- Walther (2000a, 2000b) proposes a special kind of transducer involving
- Repeat arcs: move backwards in a string and repeat
- Skip arcs: skip over portions of the string
- Cohen-Sygal & Wintner (forthcoming) introduce finite state registered automata, extending FSAs with registers
- These methods generally seem to presume exact copies
50 Non-Exact Copies
- Dakota (Inkelas & Zoll, 1999)
51 Non-Exact Copies
- Basic and modified stems in Sye (Inkelas & Zoll, 1999)
52 Morphological Doubling Theory (Inkelas & Zoll, 1999)
- In contradistinction to the more common correspondence theory
- Reduplication involves doubling at the morphosyntactic level
- Phonological doubling is thus expected, but not required
53 Gothic Reduplication under Morphological Doubling Theory
54 Summary
- All morphology seems to be implementable using composition
- This is even true of (bounded) reduplication
- Thus composition is the most general single operation that can implement morphology
55 Reading