CSA4050 Advanced Topics in NLP - PowerPoint PPT Presentation

Computational Morphology VI, November 2003

1
CSA4050 Advanced Topics in NLP
  • Non-Concatenative Morphology
  • Reduplication
  • Interdigitation

2
Reference
  • Ken Beesley and Lauri Karttunen, Finite-State
    Non-Concatenative Morphotactics, Proceedings of
    SIGPHON-2000

3
Koskenniemi 1983
  • "Only restricted infixation and reduplication
    can be handled adequately with the present
    system. Some extensions or revisions will be
    necessary for an adequate description of
    languages possessing extensive infixation or
    reduplication"

4
Non-Concatenative Languages
  • Most languages build words by stringing together
    morphemes like beads on a string.
  • The word-building processes of prefixation and
    suffixation can be straightforwardly modeled in
    finite state terms by concatenation.
  • But some languages also exhibit non-concatenative
    morphotactics.

5
Non-Concatenative Phenomena 1. Reduplication
  • In Malay: bagi (bag), bagi-bagi (bags).
  • Although this may appear concatenative, it does
    not involve concatenating a predictable morpheme
    like "s". Instead the entire stem is copied, no
    matter what its length.
  • In general, the language {ww | w ∈ L} is context
    sensitive, but if L is finite, we can construct
    a finite-state network that encodes it.

6
General Solution for Reduplication
  • Therefore, assuming the number of words subject
    to reduplication is finite, it is possible to
    construct a lexical transducer for languages like
    Malay.
  • To handle reduplication, a new operator ^n is
    introduced.
  • A^n denotes n concatenations of A.
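
The difference between concatenating a language with itself and copying each word can be sketched in Python over finite sets of strings. This is only an illustration, not the Xerox implementation: `language_power` and `reduplicate` are hypothetical helper names, and "buku" is an extra stem added for the example.

```python
from itertools import product


def language_power(language, n):
    """Return A^n: every concatenation of n strings drawn from `language`."""
    return {"".join(parts) for parts in product(sorted(language), repeat=n)}


def reduplicate(language):
    """Return {ww | w in L}: each word followed by a copy of itself."""
    return {w + w for w in language}


stems = {"bagi", "buku"}  # "buku" is an assumed extra stem, not from the slides

print(sorted(language_power({"bagi"}, 2)))  # a one-word language: A^2 is the copy
print(sorted(language_power(stems, 2)))     # also contains mixed pairs like "bagibuku"
print(sorted(reduplicate(stems)))           # only true copies
```

For a single-word language, A^2 is exactly the reduplicated form. For a larger language, A^2 over-generates, which is why the operator has to be applied to one stem at a time.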

7
Remarks from Beesley on Context Sensitivity
  • finite-state grammars (cannot handle unlimited
    nesting or non-nested terminal dependencies)
  • context-free (can handle unlimited nesting,
    such as matched parentheses in arithmetic
    expressions, but cannot handle non-nested
    dependencies between terminals)
  • context-sensitive (can also handle non-nested
    dependencies between terminals, as in dogdog,
    where terminal elements 1 and 4 have to be the
    same, 2 and 5 have to be the same, and 3 and 6
    have to be the same. These dependencies cross,
    so they're not nested.)
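
The two kinds of dependency can be contrasted with a pair of small Python checks. They are ordinary recognisers written for illustration, not grammars of the respective classes, and the function names are hypothetical.

```python
def balanced(text):
    """Nested dependencies: each ')' closes the most recently opened '('."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0


def is_copy(text):
    """Crossing dependencies: symbol i must equal symbol i + n (n = half the length)."""
    n, odd = divmod(len(text), 2)
    return not odd and all(text[i] == text[i + n] for i in range(n))


print(balanced("((1+2)*3)"), balanced("(()"))
print(is_copy("dogdog"), is_copy("dogcat"))
```

In `balanced` the most recent open item is always resolved first, so the dependencies nest. In `is_copy` the pairs (1,4), (2,5), (3,6) overlap each other, which is the crossing pattern described above.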

8
Non-Concatenation 2. Interdigitation
  • In Arabic and Maltese, prefixes and suffixes
    attach to stems in the usual concatenative way,
    but stems themselves are formed by a process
    known as interdigitation.
  • An example occurs with the Arabic stem "katab"
    (wrote).
  • This stem is composed of three elements:
  • the all-consonant root ktb
  • an abstract consonant-vowel template CVCVC
  • a vocalisation aa (in this case signifying
    perfect tense and active voice)

9
Interdigitation
  • The same root ktb can combine with the same
    template CVCVC and a different vocalism ui
    (signifying perfect aspect and passive voice)
    to produce "kutib" (was written).
  • The same root ktb can combine with a different
    template CVVCVC and the vocalism ui to produce
    "kuutib", another form of the verb.
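
The three stems above can be reproduced with a short Python sketch. The function name `interdigitate` is hypothetical, and the rule that the first vowel is repeated when the template has more V slots than the vocalism has vowels is a simplifying assumption made for this example.

```python
def interdigitate(root, template, vocalism):
    """Fill the C slots of `template` from `root` and the V slots from `vocalism`."""
    v_slots = template.count("V")
    if template.count("C") != len(root) or not 0 < len(vocalism) <= v_slots:
        raise ValueError("root or vocalism does not fit the template")
    # Assumed spreading rule: repeat the first vowel to fill any extra V slots.
    vowels = vocalism[0] * (v_slots - len(vocalism)) + vocalism
    consonants, vowel_iter = iter(root), iter(vowels)
    out = []
    for slot in template:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            out.append(next(vowel_iter))
        else:
            out.append(slot)  # any other symbol is copied unchanged
    return "".join(out)


print(interdigitate("ktb", "CVCVC", "aa"))   # katab
print(interdigitate("ktb", "CVCVC", "ui"))   # kutib
print(interdigitate("ktb", "CVVCVC", "ui"))  # kuutib
```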

10
Intermediate Result (Template merged with Root)
d V V r V s

11
Final Result (Intermediate Result merged with Vocalism)
d u u r i s

12
Merge
  • In this case the filler language contains an
    infinite set of strings (i, ui, uui, ...), but only
    one path can be constructed: the template has three
    V slots and all filler strings end in i, so the
    earlier vowels must be "u".
  • This need not always be the case (e.g. if the
    filler language were ui).
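
A brute-force Python sketch shows why the filler language picks out a single path here. It is not how the finite-state merge is computed: the helper `merge` is hypothetical, the intermediate result is written as the string dVVrVs, the filler language is written as the Python regex `u*i`, and the looser filler `[ui]*` is an assumed contrast case that is not taken from the slides.

```python
import re
from itertools import product


def merge(template, filler_regex, slot, alphabet):
    """List every way to fill the `slot` positions with a string of the filler language."""
    positions = [i for i, sym in enumerate(template) if sym == slot]
    results = []
    # Try every string over `alphabet` with one symbol per open slot.
    for candidate in product(alphabet, repeat=len(positions)):
        if re.fullmatch(filler_regex, "".join(candidate)):
            filled = list(template)
            for pos, sym in zip(positions, candidate):
                filled[pos] = sym
            results.append("".join(filled))
    return results


print(merge("dVVrVs", "u*i", "V", "ui"))         # only "uui" fits the three slots
print(len(merge("dVVrVs", "[ui]*", "V", "ui")))  # a looser filler allows several paths
```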

13
Merge Operators
  • To introduce the merge operation into the Xerox
    calculus, two new operators, .<m. and .m>., have
    been introduced.
  • These differ only in the order of arguments.
  • T .<m. F and F .m>. T represent the same
    merge operation with F and T as filler and
    template respectively.

14
The Composite Transducer
  • With these operators the network above can be
    compiled by using the following expression:
    d r s .m>. C V V C V C .<m. u* i

15
Merge
[Figure: the root d r s and the vocalism are merged into the template C V V C V C]

16
Compile-Replace
  • Regular expressions are compiled into networks as
    usual, but in addition, the compiler is then
    applied to its own output.
  • Central idea:
  • transduce to a language that has the format of
    regular expressions.
  • The compile-replace algorithm then replaces the
    regular expression with the result of its own
    compilation.

17
Compile-Replace Simple Example
This network maps the string a* to ^[ a * ^]
(i.e. the same RE but with special delimiters).
Application of compile-replace to the lower side of
the network eliminates the markers, compiles
the RE a* and maps the upper side to the
language resulting from the compilation.

18
The result of compiling a*
  • To answer the question "what does this network
    do?", figure out what it does in the upward and
    downward directions.

19
The result of compiling a*
When applied in the upward direction, this
transducer maps any string of the infinite a*
language into the regular expression from which
it was compiled.
When applied in the downward direction, it maps
from a* to all the strings in the language a*:
0 (the empty string), a, aa, ...
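
The two directions can be imitated in Python for this one example. The sketch is hard-wired to the expression a* and uses Python's `re` module in place of a network; `apply_up` and `apply_down` are hypothetical names, and the downward direction is cut off at a chosen limit because the language is infinite.

```python
import re

REGEX = "a*"  # the regular expression on the upper side


def apply_up(surface):
    """Upward: any string of the a* language maps to the expression it came from."""
    return REGEX if re.fullmatch(REGEX, surface) else None


def apply_down(expression, limit=4):
    """Downward: the expression maps to the strings of its language (first `limit`)."""
    if expression != REGEX:
        return []
    return ["a" * n for n in range(limit)]


print(apply_up("aaa"))   # the expression string
print(apply_up("ab"))    # not in the language
print(apply_down("a*"))  # the empty string, a, aa, aaa
```
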
20
Compile-Replace 1
  • Copy input path to output path until ^[ is
    encountered on the indicated (in our case lower)
    side of the network.
  • Extract path until closing delimiter ^].

21
Compile-Replace 2
  • Symbols along the indicated side are concatenated
    into a string and eliminated from the path,
    leaving just the symbols on the opposite side.
  • The extracted string is compiled into a second
    network using the standard network compiler.
22
Compile-Replace 3
  • The two networks are combined using the
    cross-product operator.
  • The result is spliced between the origin and
    destination states of the regular expression path.
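
The three steps can be sketched in Python on a single path, represented as a list of (upper, lower) symbol pairs. Several things here are assumptions made for illustration: the delimiters are written ^[ and ^], the alignment of symbols in the example path is invented, the tag spellings +Noun and +Plural are assumed, and `toy_compile` is a hypothetical stand-in for the network compiler that understands only a final ^n repetition.

```python
from itertools import product


def toy_compile(symbols):
    """Hypothetical stand-in for the regex compiler: returns a set of strings.

    A final '^n' symbol repeats the concatenation of all preceding symbols n times.
    """
    last = symbols[-1] if symbols else ""
    if last.startswith("^") and last[1:].isdigit():
        return {"".join(symbols[:-1]) * int(last[1:])}
    return {"".join(symbols)}


def compile_replace_lower(path, compile_fn=toy_compile):
    """Return the (upper, lower) string pairs encoded by `path` after compile-replace."""
    segments, i = [], 0
    while i < len(path):
        upper, lower = path[i]
        if lower != "^[":
            segments.append({(upper, lower)})  # step 1: copy the arc unchanged
            i += 1
            continue
        # Step 1 (continued): extract the sub-path up to the closing delimiter.
        close = next((j for j in range(i + 1, len(path)) if path[j][1] == "^]"), None)
        if close is None:
            raise ValueError("unclosed ^[ delimiter")
        span = path[i:close + 1]
        # Step 2: the lower symbols form the expression; only the upper side remains.
        upper_side = "".join(up for up, _ in span)
        language = compile_fn([low for _, low in span[1:-1]])
        # Step 3: cross product of the upper string with the compiled language.
        segments.append({(upper_side, low) for low in language})
        i = close + 1
    # Splice the segments back together in their original order.
    relation = {("", "")}
    for segment in segments:
        relation = {(u1 + u2, l1 + l2)
                    for (u1, l1), (u2, l2) in product(relation, segment)}
    return relation


path = [("", "^["), ("b", "b"), ("a", "a"), ("g", "g"), ("i", "i"),
        ("+Noun", "^2"), ("+Plural", "^]")]
print(compile_replace_lower(path))
```

A real implementation works on networks rather than on one path at a time, and splices the compiled network between two states instead of building string pairs.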

23
Reduplication Revisited
  • Applying compile-replace to this transducer:
  • Lexical: b a g i +Noun +Plural
  • Surface: ^[ { b a g i } ^2 ^]
  • yields this one:
  • Lexical: b a g i +Noun +Plural
  • Surface: b a g i b a g i

24
Interdigitation Revisited
  • Applying compile-replace to this transducer:
    Up:   k i t e b +Verb +Past +3Sg
    Down: ^[ k t b .m>. C V C V C .<m. i e ^]
  • yields this one:
    Up:   k i t e b +Verb +Past +3Sg
    Down: k i t e b

25
Remember Two Central Problems
  • Morphotactics: constraints on combinations of
    morphemes governing the formation of valid words,
    e.g. unbelievable vs. believeunable
  • Phonological/Orthographical Alternation (spelling
    rules): how morphemes are realised in particular
    environments, e.g. fly + s → flies

26
Xerox Perspective
  • Morphotactics: handle with lexc
  • Phonological/Orthographical Alternation (spelling
    rules): handle with xfst

lexc: Morphotactics → Lexicon FST
xfst: Alternations → Rules FST
Lexical Transducer = Lexicon FST .o. Rules FST
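
The division of labour can be mimicked in Python with two functions composed in sequence. This is only an analogy to composing a lexicon transducer with a rules transducer: the mini lexicon, the +Pl tag, the ^ morpheme boundary and all function names are assumptions made for the example.

```python
import re

STEMS = {"fly", "cat"}   # hypothetical mini lexicon
SUFFIXES = {"+Pl": "s"}  # hypothetical tag for the plural suffix


def lexicon(stem, tag):
    """Morphotactics: allow only stem + suffix, joined at a '^' morpheme boundary."""
    if stem not in STEMS or tag not in SUFFIXES:
        return None
    return f"{stem}^{SUFFIXES[tag]}"


def rules(intermediate):
    """Alternation: consonant + y becomes ie before the suffix s; then drop the boundary."""
    intermediate = re.sub(r"(?<=[^aeiou])y\^s$", "ie^s", intermediate)
    return intermediate.replace("^", "")


def lexical_transducer(stem, tag):
    """Composition: feed the lexicon output to the rules, in the spirit of .o."""
    intermediate = lexicon(stem, tag)
    return None if intermediate is None else rules(intermediate)


print(lexical_transducer("fly", "+Pl"))  # flies
print(lexical_transducer("cat", "+Pl"))  # cats
```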