LING 406 Intro to Computational Linguistics Morphology and Computational Morphology - PowerPoint PPT Presentation

Slides: 56
Provided by: richar781
Transcript and Presenter's Notes

1
LING 406: Intro to Computational Linguistics
Morphology and Computational Morphology
  • Richard Sproat
  • URL: http://catarina.ai.uiuc.edu/L406_08/

2
This Lecture
  • Overview of morphological phenomena
  • and their computational implementation

3
What is morphology?
  • scripserunt is the third person plural perfect
    active of scribo (I write)
  • Morphology relates word forms
  • the 'lemma' of scripserunt is scribo
  • Morphology analyzes the structure of word forms
  • scripserunt has the structure scrib+s+erunt

4
What is morphology useful for?
  • Information retrieval
  • Machine translation
  • Text-to-speech synthesis
  • Word pronunciation often depends upon structure
  • Staatsprotokoll would be pronounced with /ʃ/ for
    the <s> before the <p> if it were part of the same
    syllable as that <p>
  • But it isn't, because the <s> is a compound
    linking morpheme

5
Morphology is a relation
  • Imagine you have a Latin morphological analyzer
    comprising
  • D: a relation that maps between surface form and
    decomposed form
  • L: a relation that maps between decomposed form
    and lemma
  • Then
  • scripserunt ∘ D = scrib+s+erunt
  • scripserunt ∘ D ∘ L = scribo
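
A minimal sketch of this view, assuming each relation is simply a finite set of string pairs. The helper names `compose` and `apply`, the one-entry contents of `D` and `L`, and the `+` notation for the decomposed form are illustrative assumptions; a real analyzer would use compiled finite-state transducers.

```python
def compose(r1, r2):
    """Relational composition: (x, z) whenever (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def apply(word, *relations):
    """Treat the word as the identity relation {(word, word)} and compose
    it with each relation in turn; return the set of resulting outputs."""
    current = {(word, word)}
    for rel in relations:
        current = compose(current, rel)
    return {out for (_, out) in current}

# Toy one-entry fragments of D and L (assumed for illustration only).
D = {("scripserunt", "scrib+s+erunt")}   # surface form -> decomposed form
L = {("scrib+s+erunt", "scribo")}        # decomposed form -> lemma

print(apply("scripserunt", D))      # the decomposed form
print(apply("scripserunt", D, L))   # the lemma
```

A word that is missing from the toy relation simply yields the empty set, which is how a relational analyzer reports "no analysis".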

6
Other issues
  • Inflectional vs. Derivational morphology
  • Syntagmatic combination of elements versus
    paradigmatic relationship between words

7
String representation
8
Basic theme of today's lecture
  • All morphological operations can be handled by a
    single finite-state operation
  • COMPOSITION

9
Simple concatenation
  • For example: dog + s = dogs
  • Typically this is dealt with using concatenation
  • But it can also be handled using composition
  • There is an efficiency difference, but these are
    all offline processes
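
As a rough illustration of suffixation done by composition instead of concatenation, the sketch below models each relation as a Python function from a string to a set of strings. The names `compose`, `attach` and `erase_boundaries`, and the use of `+` as a boundary symbol, are assumptions made for this example.

```python
def compose(*relations):
    """Chain relations left to right; each maps a string to a set of strings."""
    def composed(s):
        current = {s}
        for rel in relations:
            current = {out for mid in current for out in rel(mid)}
        return current
    return composed

def attach(suffix):
    """Relation mapping any stem to stem+suffix, keeping a '+' boundary."""
    return lambda stem: {stem + "+" + suffix}

def erase_boundaries(form):
    """Relation that deletes morpheme boundaries."""
    return {form.replace("+", "")}

# Suffixation expressed as a composition of two relations.
plural = compose(attach("s"), erase_boundaries)

print(plural("dog"))                    # result of the composed relation
print(plural("dog") == {"dog" + "s"})   # agrees with plain concatenation
```

The composed version does more work than string concatenation, but further relations (for example phonological rules) can be added to the same chain without changing its shape.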

10
Why do it this way?
  • Concatenation is often accompanied by other
    changes either to the stem or the affix or both,
    or imposes requirements on the stem
  • Vowel harmony
  • Stem prosody (see below)
  • Other phonological changes
  • These necessarily involve composition anyway (see
    previous lectures)

11
English regular plurals
  • cat + s = cats /s/
  • dog + s = dogs /z/
  • spouse + s = spouses /əz/
  • Assume one implements the relevant rule with a
    transducer T
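
One way to picture T is as a function over a stem plus an underlying plural suffix. The sketch below is a simplification under stated assumptions: the suffix is taken to be underlyingly /z/, stems are written in a toy phonemic transcription, the syllabic allomorph is written /əz/, and affricates are left out of the sibilant set.

```python
SIBILANTS = set("szʃʒ")    # affricates omitted in this toy version
VOICELESS = set("ptkfθ")   # voiceless non-sibilant consonants

def T(form):
    """Rewrite stem+z with the plural allomorph selected by the stem-final
    segment; any other input just has its boundary removed."""
    stem, _, suffix = form.partition("+")
    if not stem or suffix != "z":
        return form.replace("+", "")
    last = stem[-1]
    if last in SIBILANTS:
        return stem + "əz"   # spouses
    if last in VOICELESS:
        return stem + "s"    # cats
    return stem + "z"        # dogs

for underlying in ["kæt+z", "dɔg+z", "spaʊs+z"]:
    print(underlying, "->", T(underlying))
```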

12
Prosodic constraints: English comparatives
13
Comparative transducer
14
Templatic affixes in Yowlumne
15
CVC(C)
16
CVCVV(C)
17
CVCVV(C) continued
18
-inay and -ʔaa
19
CVCVV(C) transducer
20
Affix-induced phonological changes
21
Subsegmental morphology
22
Another example
23
Subtractive morphology
24
Koasati truncation transducer
25
Infixation: two kinds
  • Extrametrical infixation
  • Positively circumscribed infixation

26
Extrametrical infixation
  • Infixation that can be viewed as ordinary
    affixation, except that you ignore a piece of the
    base you attach it to
  • That piece is said to be extrametrical

27
Bontoc infixation
  • Insert a marker > after the first consonant (if
    any)
  • Change > into the infix -um-
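
The two steps can be sketched as two small string functions applied in sequence. This is only an illustration: the vowel inventory, the function names, and the reading of "first consonant" as a stem-initial consonant are assumptions, and the vowel-initial stem in the usage lines is hypothetical.

```python
VOWELS = set("aeiou")   # assumed vowel inventory for this sketch

def insert_marker(stem):
    """Step 1: place '>' after a stem-initial consonant, or at the very
    start if the stem begins with a vowel."""
    if stem and stem[0] not in VOWELS:
        return stem[0] + ">" + stem[1:]
    return ">" + stem

def realize_marker(marked):
    """Step 2: spell out the marker as the infix um."""
    return marked.replace(">", "um")

def infix(stem):
    """Apply step 1, then step 2."""
    return realize_marker(insert_marker(stem))

print(insert_marker("fikas"))   # intermediate form with the marker
print(infix("fikas"))           # infixed form
print(infix("ikas"))            # hypothetical vowel-initial stem
```

Splitting the work this way keeps the prosodic question (where the infix goes) separate from the spell-out of the affix itself.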

28
Positively circumscribed infixation
29
Ulwa marker introduction
30
Another instance of prosodically circumscribed infixation
[Figure: Kalama + fg + zoo]
31
Root and Pattern Morphology (McCarthy 1979)
[Figure: the root k t b]
32
Implementing root and pattern morphology
33
A note on practical implementations
  • Practical implementations such as Buckwalter's
    (2002) analyzer usually sidestep all of this
  • Precompile all of the alternations that any
    given root has

34
Karttunen & Beesley's Compile-Replace-Merge
  • Malay reduplication (see below on reduplication)
  • bagibagi 'bags'
  • Lexical form bagi+Noun+Plural maps to
  • a regular expression
  • bagi^2
  • Which is then replaced with the compilation of
    itself: bagibagi
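
The idea can be imitated in a few lines: the lexicon first yields a string that is itself a tiny regular expression, and a second pass replaces that expression with what it denotes. The `X^n` notation, the `LEXICON` dictionary and the function names below are assumptions for illustration, not the actual syntax or API of any finite-state toolkit.

```python
import re

# Assumed toy lexicon: lexical form -> regular-expression-like string.
LEXICON = {"bagi+Noun+Plural": "bagi^2"}

def compile_replace(expr):
    """Replace every 'X^n' (X a run of lowercase letters, n an integer)
    with n concatenated copies of X."""
    return re.sub(r"([a-z]+)\^(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  expr)

def generate(lexical_form):
    """Look up the intermediate expression, then compile it in place."""
    return compile_replace(LEXICON[lexical_form])

print(LEXICON["bagi+Noun+Plural"])    # intermediate regular expression
print(generate("bagi+Noun+Plural"))   # surface form after replacement
```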

35
Merge
  • View root as filler for a template. So drs is
    filler for CVVCVC
  • Merge operation walks down root and template in
    parallel, attempting to find a match between root
    and template
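
A minimal sketch of such a merge, assuming the template is a string of C and V slots and the root is a string of consonants. The second function, which fills each run of V slots with one vowel, is an added assumption meant to reach the form d u u r i s shown on the following slide; the names `merge` and `fill_vowels` are hypothetical.

```python
import re

def merge(root, template):
    """Walk the template left to right; every C slot takes the next root
    consonant, every other slot is copied unchanged."""
    consonants = iter(root)
    out = []
    for slot in template:
        if slot == "C":
            try:
                out.append(next(consonants))
            except StopIteration:
                raise ValueError("template has more C slots than the root has consonants")
        else:
            out.append(slot)
    if next(consonants, None) is not None:
        raise ValueError("root has more consonants than the template has C slots")
    return "".join(out)

def fill_vowels(skeleton, vowels):
    """Assumed vocalism step: each maximal run of V slots is filled with
    one vowel, repeated to cover the whole run."""
    supply = iter(vowels)
    return re.sub(r"V+", lambda m: next(supply) * len(m.group(0)), skeleton)

skeleton = merge("drs", "CVVCVC")
print(skeleton)                      # root merged into the template
print(fill_vowels(skeleton, "ui"))   # vowels filled in
```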

36
Beesley & Karttunen's Solution
d V V r V S
d u u r i s
Surface form is a regular expression
37
But this is equivalent to a model based solely on
composition
38
Other notable approaches
  • Kiraz (2000) presents a multitape solution based
    on earlier work of Kay (1988)

39
Morphomic relations
  • Aronoff (1994) introduces the rather odd term
    morphomic to describe morphological phenomena
    that are purely morphological

40
The Latin 3rd stem
41
So?
  • 3rd stem is not morphologically uniform
  • It differs across different verb classes and some
    verbs have idiosyncratic third stems
  • It is not semantically coherent
  • The forms that require the 3rd stem are a motley
    crew
  • Yet there is clearly a notion of 3rd stem
  • If you tell me the 3rd stem of a verb, I can tell
    you how the agentive noun, the supine, the
    perfect participle are formed
  • The 3rd stem has a purely morphological function

42
3rd stem is just prosodically induced affixation
  • Assume we have a transducer T that forms the 3rd
    stem of a verb
  • of course, T will have to allow for a lot of
    idiosyncratic changes

S gt3ste S
43
Reduplication
  • Skip the discussion of paradigmatic variation
    (see 2.3 in R&S)
  • Reduplication
  • Unbounded
  • Bounded

44
Unbounded Bambara Reduplication (Culy, 1985)
This is apparently beyond the power
of finite-state methods.
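
To make the pattern concrete, the sketch below checks whether a phrase has the shape X o X, with the same word on both sides of o. The comparison is written directly in Python instead of as an automaton because the copied word can be arbitrarily long, and the set of strings of the form w o w over an unbounded vocabulary is not a regular language. The function name is hypothetical and the example phrases are only for illustration.

```python
def is_o_reduplication(phrase):
    """True if the phrase is 'X o X' with identical non-empty X on both sides."""
    left, sep, right = phrase.partition(" o ")
    return bool(sep) and left != "" and left == right

print(is_o_reduplication("wulu o wulu"))   # identical copies
print(is_o_reduplication("wulu o malo"))   # the two halves differ
```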
45
Reduplication: Gothic (Wright 1910)
  • Prefix a syllable of the form (A)Cai to the stem,
    where C is a consonant position and A is an
    optional appendix
  • Copy the onset of the stem to the C position. If
    there is a pre-onset appendix /s/, copy this to
    the appendix position

46
Factoring Reduplication
  • Prosodic constraints
  • Copy verification transducer C

47
Gothic Index Transducer
48
Factoring Reduplication
  • Then reduplication in Gothic can be modeled as
  • a ∘ C
  • More generally, one can model reduplication as
    the following composition, where P implements the
    prosodic constraints, C the copy constraints, and
    A optional phonological adjustments
  • P ∘ C ∘ A
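
The factoring can be imitated as generate-then-filter over sets of strings. The sketch below uses Gothic-like stems and rests on several assumptions: P overgenerates every (A)Cai prefix, C keeps only candidates whose prefix matches the stem's appendix and first onset consonant, an /s/ counts as an appendix only before p, t or k, and A merely erases the boundary. The consonant list and all function bodies are illustrative, not the transducers from the lecture.

```python
from itertools import product

CONSONANTS = "bdfghjklmnpqrstwzþ"   # assumed consonant inventory
STOPS = "ptk"                       # assumed: s + stop = appendix + onset

def P(stem):
    """Prosodic constraints: prefix a syllable (A)Cai whose A and C
    positions are not yet tied to the stem."""
    return {f"{a}{c}ai+{stem}" for a, c in product(["", "s"], CONSONANTS)}

def split_onset(stem):
    """Return (appendix, first onset consonant) of the stem."""
    if len(stem) > 1 and stem[0] == "s" and stem[1] in STOPS:
        return "s", stem[1]
    if stem and stem[0] in CONSONANTS:
        return "", stem[0]
    return "", ""

def C(candidates):
    """Copy verification: keep candidates whose prefix copies the stem."""
    kept = set()
    for cand in candidates:
        prefix, stem = cand.split("+", 1)
        appendix, onset = split_onset(stem)
        if prefix == appendix + onset + "ai":
            kept.add(cand)
    return kept

def A(candidates):
    """Phonological adjustments: here only boundary erasure."""
    return {cand.replace("+", "") for cand in candidates}

def reduplicate(stem):
    """Apply P, then C, then A."""
    return A(C(P(stem)))

print(reduplicate("hait"))    # plain onset
print(reduplicate("skaiþ"))   # onset preceded by appendix s
```

Because P always supplies a consonant in the C position, vowel-initial stems yield no output in this toy version.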

49
Other Approaches
  • Walther (2000a, 2000b) proposes a special kind of
    transducer involving
  • Repeat arcs: move backwards in a string and
    repeat
  • Skip arcs: skip over portions of the string
  • Cohen-Sygal & Wintner (forthcoming) introduce
    finite-state registered automata, extending FSAs
    with registers
  • These methods generally seem to presume exact
    copies

50
Non-Exact Copies
  • Dakota (Inkelas & Zoll, 1999)

51
Non-Exact Copies
  • Basic and modified stems in Sye (Inkelas & Zoll,
    1999)

52
Morphological Doubling Theory (Inkelas & Zoll,
1999)
  • In contradistinction to the more common
    correspondence theory
  • Reduplication involves doubling at the
    morphosyntactic level
  • Phonological doubling is thus expected, but not
    required

53
Gothic Reduplication under Morphological Doubling
Theory
54
Summary
  • All morphology seems to be implementable using
    composition
  • This is even true of (bounded) reduplication
  • Thus composition is the most general single
    operation that can implement morphology

55
Reading
  • R&S Ch. 2