Finite-State Methods in Natural Language Processing

Lauri Karttunen, LSA 2005 Summer Institute, August 3, 2005

1
Finite-State Methods in Natural Language
Processing
  • Lauri Karttunen
  • LSA 2005 Summer Institute
  • August 3, 2005

2
  • August 1
  • Non-concatenative morphotactics
  • Reduplication, interdigitation
  • Realizational morphology
  • Readings
  • Chapter 8. Non-Concatenative Morphotactics
  • Gregory T. Stump. Inflectional Morphology. A
    Theory of Paradigm Structure. Cambridge U. Press.
    2001. (An excerpt)
  • Lauri Karttunen, Computing with Realizational
    Morphology, Lecture Notes in Computer Science,
    Volume 2588, Alexander Gelbukh (ed.), 205-216,
    Springer Verlag. 2003.
  • August 3
  • Optimality theory
  • Readings
  • Paul Kiparsky. "Finnish Noun Inflection".
    Generative Approaches to Finnic and Saami
    Linguistics, Diane Nelson and Satu Manninen
    (eds.), pp. 109-161, CSLI Publications, 2003.
  • Nine Elenbaas and René Kager. "Ternary rhythm and
    the lapse constraint". Phonology 16. 273-329.

3
Background
  • Two old strains of finite-state (morpho)phonology
  • rewrite rules (Chomsky & Halle 1968)
  • two-level constraints (Koskenniemi 1983)
  • Optimality theory (Prince & Smolensky 1993)
  • two-level model with ranked, violable constraints
  • Formal Power
  • OT is not a finite-state system if it involves
    unlimited counting of constraint violations.
    (Ellison 1994, Eisner 1997, Frank & Satta 1998)
  • But a finite-state model can be useful for OT.

4
Optimality theory
  • Prince & Smolensky 1993
  • eliminate
  • rules
  • derivations
  • introduce
  • violable ranked constraints
  • Instant success!

5
Brief Introduction to OT
  • Input
  • A language of underlying lexical forms.
  • GEN
  • A function that generates alternate surface
    realizations for each input form, possibly an
    infinite set.
  • Constraints
  • A finite set of principles, preferably
    universal, that filter out unwanted realizations.
  • Ranking
  • A language-specific ordering of the constraints.

6
Computational perspective
  • Ellison 1994
  • OT deals with regular sets and relations: a
    finite-state system
  • constraint transducers mark violations; the marks
    are sorted and counted
  • Tesar 1995
  • dynamic algorithm for optimal path computations
  • Eisner 1996
  • two-level typology of optimality constraints:
    restrict, prohibit
  • "FootForm Decomposed", MIT Working Papers in
    Linguistics 31:115-143, proposes Primitive
    Optimality Theory (no generalized alignment)
  • Karttunen 1998
  • Introduces lenient composition
  • Frank & Satta 1998
  • Prove that OT is regular if the number of
    violations is bounded.

7
Comparisons

8
Finnish OT Prosody
  • Lauri Karttunen
  • CLS-41
  • April 7, 2005

9
Finnish Prosody: basic facts
  • The nucleus of a Finnish syllable must consist of
    a short vowel, a long vowel, or a diphthong.
  • Main stress is always on the first syllable,
    secondary stress occurs on non-initial syllables.
  • Adjacent syllables are never stressed.
  • The stressed syllable is initial in the foot.
  • ilmoittautuminen 'registering' (Nom Sg)
  • (íl.moit).(tàu.tu).(mì.nen)

10
Ternary feet in Finnish
  • Stress that would fall on a light syllable shifts
    to the following heavy syllable, creating a
    ternary foot.
  • (ká.las).te.(lèm.me) 'we are fishing'
  • (íl.moit).(tàu.tu).mi.(sès.ta) 'registering' (Ela
    Sg)
  • (rá.kas).ta.(jàt.ta).ri.(àn.sa) 'his mistresses'
    (Par Pl)
  • Can we get these facts to come out for free,
    from the interaction of independently motivated
    principles?
  • Yes!
  • Paul Kiparsky. "Finnish Noun Inflection".
    Generative Approaches to Finnic and Saami
    Linguistics, Diane Nelson and Satu Manninen
    (eds.), pp. 109-161, CSLI Publications, 2003.
  • Nine Elenbaas and René Kager. "Ternary rhythm and
    the lapse constraint". Phonology 16. 273-329.

11
Non-OT and OT solutions
  • It is possible to define a cascade of replace
    rules that produce the desired result.
  • http://www.stanford.edu/~laurik/fsmbook/examples/FinnishProsody.html
  • But, following Kiparsky, we are going to do OT
    today, and in a more elegant way than is shown
    at
  • http://www.stanford.edu/~laurik/fsmbook/examples/FinnishOTProsody.html

12
Prelude: Built-in Functions in fst
  • Case conversion
  • UpCase( OptUpCase(
  • DownCase( OptDownCase(
  • Cap( OptCap(
  • AnyCase(
  • Cap({hello}) is equivalent to {Hello}
  • OptUpCase({ab}, L) is equivalent to [{aB} | {ab}]
  • Symbol manipulation
  • Explode( Implode(
  • regex Explode("Test") is equivalent to regex
    {Test}

13
Functions: User-defined
  • The function definition is attached to a symbol
    ending with (
  • The definition is any regular expression.
  • There may be any number of arguments.
  • define Redup(X) [X X];
  • define Apply(X, Y) [X .o. Y].l;
  • When the function is used in a regular
    expression, the arguments are bound and the
    function is evaluated.
  • regex Apply({abc}, a -> x || _ b);
  • print words
  • xbc
  • The definition of a function may contain other
    functions.
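
As a rough analogue in Python, the two user-defined functions above can be mimicked on single strings. This is only a sketch: the fst versions operate on whole languages and relations, and the names `redup`, `apply_rule` and `a_to_x_before_b` are made up for the illustration.

```python
import re


def redup(x):
    """Analogue of Redup(X): concatenate X with itself."""
    return x + x


def a_to_x_before_b(s):
    """Analogue of the rule a -> x || _ b: rewrite every a directly followed by b."""
    return re.sub(r"a(?=b)", "x", s)


def apply_rule(x, rule):
    """Analogue of Apply(X, Y): pass X through the rule Y and keep the output side."""
    return rule(x)


print(apply_rule("abc", a_to_x_before_b))  # xbc
print(redup("abc"))                        # abcabc
```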

14
Pig Latin
  • This script creates a function for translating
    from English to Pig Latin
  • pig -> igpay, brown -> ownbray, script ->
    iptscray

define C [b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z];
define V [a|e|i|o|u];
define Redup(X) [X "." X];
define DelCons(X) [X .o. C+ @-> 0 || .#. _ ];
define TailToAy(X) [X .o. V ?* @-> {ay} || "." C* _ ];
define DelMiddle(X) [X .o. "." -> 0];
define Pig(X) DelMiddle(TailToAy(DelCons(Redup(X))));

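
The same four-step pipeline can be traced in Python on one word at a time. This is a sketch that mirrors the script's steps with ordinary regular-expression substitutions; it is not how fst compiles the functions, and the function names are adapted for Python.

```python
import re

CONSONANTS = "bcdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"


def redup(word):
    """Redup: copy the word, separated by a dot (pig -> pig.pig)."""
    return word + "." + word


def del_cons(s):
    """DelCons: delete the word-initial consonant cluster (pig.pig -> ig.pig)."""
    return re.sub(rf"^[{CONSONANTS}]+", "", s)


def tail_to_ay(s):
    """TailToAy: after the dot and its consonants, replace the rest with 'ay'."""
    return re.sub(rf"(\.[{CONSONANTS}]*)[{VOWELS}].*$", r"\1ay", s)


def del_middle(s):
    """DelMiddle: remove the dot (ig.pay -> igpay)."""
    return s.replace(".", "")


def pig(word):
    return del_middle(tail_to_ay(del_cons(redup(word))))


for w in ("pig", "brown", "script"):
    print(w, "->", pig(w))
```
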
15
Demo!
  • fst -l piglatin.script

16
Computing with OT
By what finite-state operation?
17
Priority union .P.
All pairs from R and those pairs from Q that do
not conflict with the mapping established by R.
R .P. Q = R | [~[R.u] .o. Q]
Kaplan 1987
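
For finite relations the definition can be tried out directly. The sketch below represents a relation as a Python set of (input, output) pairs; the sample relations `R` and `Q` are invented for the illustration.

```python
def upper(relation):
    """The input side of a relation, like R.u."""
    return {x for x, _ in relation}


def priority_union(r, q):
    """R .P. Q: all pairs of R, plus the pairs of Q whose input R does not map."""
    blocked = upper(r)
    return r | {(x, y) for x, y in q if x not in blocked}


R = {("a", "1"), ("b", "2")}
Q = {("a", "9"), ("c", "3")}
print(sorted(priority_union(R, Q)))
# [('a', '1'), ('b', '2'), ('c', '3')]
```
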
18
Lenient Composition .O.
  • Let R be a relation that maps each input string
    to one or more outputs.
  • Let C be a constraint that eliminates some
    outputs.
  • R .O. C is the relation that maps each input
    string that can meet the constraint C to the
    outputs that meet C and leaves the rest of the
    relation R unchanged. (Karttunen 1998)
  • R .O. C = [R .o. C] .P. R
  • Is constraint ranking rule ordering in disguise?
    Yes.
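
A small Python sketch makes the "leaves the rest unchanged" clause concrete. It assumes finite relations stored as sets of (input, output) pairs and a constraint given as a predicate on outputs; the sample relation and the constraint `no_a` are hypothetical.

```python
def priority_union(r, q):
    """R .P. Q: pairs of R, plus pairs of Q whose input is not covered by R."""
    covered = {x for x, _ in r}
    return r | {(x, y) for x, y in q if x not in covered}


def compose_with_constraint(r, constraint):
    """R .o. C for a constraint on outputs: keep pairs whose output satisfies C."""
    return {(x, y) for x, y in r if constraint(y)}


def lenient_compose(r, constraint):
    """R .O. C = [R .o. C] .P. R"""
    return priority_union(compose_with_constraint(r, constraint), r)


def no_a(output):
    """Hypothetical constraint: the output contains no 'a'."""
    return "a" not in output


R = {("in1", "ab"), ("in1", "b"), ("in2", "aa")}
print(sorted(lenient_compose(R, no_a)))
# [('in1', 'b'), ('in2', 'aa')]
```

Here `in1` loses the output that violates the constraint, while `in2`, which has no conforming output at all, keeps what it had.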

19
Need a prolific GEN
  • ka.la
  • ka.lá
  • ka.là
  • ka.(là)
  • ka.(lá)
  • ká.la
  • ká.lá
  • ká.là
  • ká.(là)
  • ká.(lá)
  • kà.la
  • (kà.la)
  • (ká).la
  • (ká).lá
  • (ká).là
  • (ká).(là)
  • (ká).(lá)
  • (ká.là)
  • (ká.lá)
  • (ká.la) ?
  • (ka.là)
  • (ka.lá)

  • kà.lá
  • kà.là
  • kà.(là)
  • kà.(lá)
  • (kà).la
  • (kà).lá
  • (kà).là
  • (kà).(là)
  • (kà).(lá)
  • (kà.là)
  • (kà.lá)
  • kala 'fish' (Nom Sg): 33 candidates

20
Basic definitions 1
  • Using Parc/XRCE regular expression syntax
  • define C [b | c | d | f | g | h | j | k | l | m |
    n | p | q | r | s | t | v | w | x | z];   # Consonant
  • define HighV [u | y | i];           # High vowel
  • define MidV [e | o | ö];            # Mid vowel
  • define LowV [a | ä];                # Low vowel
  • define USV [HighV | MidV | LowV];   # Unstressed vowel
  • define MSV [á | é | í | ó | ú | ý | ä | ö];
  • define SSV [à | è | ì | ò | ù | y | ä | ö];
  • define SV [MSV | SSV];              # Stressed vowel
  • define V [USV | SV];                # Vowel

21
Basic definitions 2
  • define P [V | C];            # Phone
  • define B [[\P+] | .#.];      # Boundary
  • define E [.#. | "."];        # Edge
  • define Light [C* V];         # Light syllable
  • define Heavy [Light P+];     # Heavy syllable
  • define S [Heavy | Light];    # Syllable
  • define SS [S & $SV];         # Stressed syllable
  • define US [S & ~$SV];        # Unstressed syllable
  • define MSS [S & $MSV];       # Syllable with main stress

22
GEN 1
  • define MarkNonDiphthongs [
  •   [. .] -> "." || [HighV | MidV] _ LowV,   # i.a, e.a
  •                   LowV _ MidV,             # a.e
  •                   i _ [MidV - e],          # i.o, i.ö
  •                   u _ [MidV - o],          # u.e
  •                   y _ [MidV - ö],          # y.e
  •                   $V i _ e,                # poiki.en
  •                   $V u _ o,
  •                   $V y _ ö ];
  • Insert a syllable boundary between vowels that
    cannot form a diphthong: i.a, e.a, a.e, i.o, u.e,
    y.e, etc.
  • define Syllabify C* V+ C* @-> ... "." || _ C V;
  • Insert a syllable boundary after a maximal
    C* V+ C* pattern that is followed by C V. For
    example, strukturalismi -> struk.tu.ra.lis.mi.
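
The syllabification step alone can be approximated with a Python regular expression. This is a sketch under simplifying assumptions: it covers only the Syllabify rule (not MarkNonDiphthongs), it works on unstressed input, and it uses a greedy pattern with a lookahead in place of the longest-match replace operator.

```python
import re

C = "bcdfghjklmnpqrstvwxz"   # consonants, as in the C definition
V = "aeiouyäö"               # unstressed vowels only

# A maximal consonants-vowels-consonants stretch that is followed by C V.
SYLLABLE_BEFORE_CV = re.compile(rf"([{C}]*[{V}]+[{C}]*)(?=[{C}][{V}])")


def syllabify(word):
    """Insert '.' after each matched stretch; the last syllable gets no dot."""
    return SYLLABLE_BEFORE_CV.sub(r"\1.", word)


print(syllabify("strukturalismi"))    # struk.tu.ra.lis.mi
print(syllabify("kalastelemme"))      # ka.las.te.lem.me
print(syllabify("ilmoittautuminen"))  # il.moit.tau.tu.mi.nen
```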

23
GEN 2
  • define Stress a (->) á|à, e (->) é|è, i (->) í|ì,
  •               o (->) ó|ò, u (->) ú|ù, y (->)
    "y"|"y",
  •               ä (->) "ä"|"ä", ö (->) "ö"|"ö";
  • Optionally stress any vowel with a primary or
    secondary stress.
  • define Scan [[S ("." S ("." S)) & $SS] (->) "("
    ... ")" || E _ E];
  • Optionally group syllables into unary, binary, or
    ternary feet when there is at least one stressed
    syllable.
  • define Gen [MarkNonDiphthongs .o. Syllabify .o.
  •             Stress .o. Scan];
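
The Stress and Scan steps can be imitated by brute-force enumeration in Python. The sketch below is not the transducer itself: it starts from an already syllabified word, marks at most one vowel per syllable, and only knows the accented forms of a, e, i, o, u. Under those assumptions it reproduces the 33 candidates listed for kala.

```python
from itertools import product

ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}   # primary stress
GRAVE = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù"}   # secondary stress


def stress(syllable, level):
    """Mark the first vowel: 0 = unstressed, 1 = primary, 2 = secondary."""
    if level == 0:
        return syllable
    table = ACUTE if level == 1 else GRAVE
    for i, ch in enumerate(syllable):
        if ch in table:
            return syllable[:i] + table[ch] + syllable[i + 1:]
    return syllable


def footings(sylls, levels):
    """Yield every parse into unfooted syllables and feet of 1 to 3 syllables.

    A foot must contain at least one stressed syllable, as in Scan.
    """
    if not sylls:
        yield []
        return
    for rest in footings(sylls[1:], levels[1:]):       # leave first syllable unfooted
        yield [sylls[0]] + rest
    for n in (1, 2, 3):                                # or open a foot of n syllables
        if n <= len(sylls) and any(levels[:n]):
            foot = "(" + ".".join(sylls[:n]) + ")"
            for rest in footings(sylls[n:], levels[n:]):
                yield [foot] + rest


def gen(syllables):
    """All stress assignments combined with all footings."""
    out = set()
    for levels in product((0, 1, 2), repeat=len(syllables)):
        marked = [stress(s, lv) for s, lv in zip(syllables, levels)]
        for parse in footings(marked, levels):
            out.add(".".join(parse))
    return out


candidates = gen(["ka", "la"])
print(len(candidates))           # 33
print("(ká.la)" in candidates)   # True
```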

24
Demo!
  • fst -utf8 -l gen.script
  • regex {kala} .o. Gen;  (compose)
  • print lower-words (show output candidates)
  • print size (count them)

25
Kiparsky's nine constraints
  • Clash
  • AlignLeft
  • MainStress
  • FootBin
  • Lapse
  • NonFinal
  • StressToWeight
  • Parse
  • AllFeetFirst

26
Counting constraint violations
  • We use asterisks to mark constraint violations.
    We need a way to prefer candidates with the least
    number of violation marks.
  • define Viol $[%*];
  • define Viol0 ~Viol;        # No violations
  • define Viol1 ~[Viol^2];    # At most one violation
  • define Viol2 ~[Viol^3];    # At most two violations
  • define Viol3 ~[Viol^4];
  • This eliminates the violation marks after the
    candidate set has been pruned by a constraint.
  • define Pardon %* -> 0;

27
Defining OT Constraints
  • Three types
  • Unviolable constraints
  • Primary stress in Finnish
  • Ordinary violable constraints
  • Lapse
  • Gradient alignment constraints
  • All-Feet-First
  • Strategy
  • We define an evaluation template for each of the
    three types and then define the individual
    constraints with the help of the templates.

28
Evaluation Template for Unviolable Constraints
  • define Unviolable(Candidates, Constraint) [
  •   Candidates
  •   .o.
  •   Constraint ];
  • Example
  • define MainStress(X) Unviolable(X, [B MSS ~$MSS]);
  • B is the left edge of the word or "(".
  • MSS is a syllable with a primary stress.

29
Evaluation Template for Ordinary Constraints
  • define Eval(Candidates, Violation, Left, Right) [
  •   Candidates
  •   .o.
  •   Violation -> ... %* || Left _ Right
  •   .O.
  •   Viol3 .O. Viol2 .O. Viol1 .O. Viol0
  •   .o.
  •   Pardon ];
  • where Viol0 admits no violation marks, Viol2 at
    most two, etc., and
  • Pardon is %* -> 0, deleting all violation marks.
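
The template's three stages (mark, prune leniently, pardon) can be sketched in Python for the candidate set of a single input. The marking function `mark_b` is a made-up stand-in for a real constraint; the point of the example is the bounded counting, which stops distinguishing candidates once all of them exceed three marks.

```python
def lenient_filter(candidates, accept):
    """Keep the candidates that pass; if none passes, keep them all."""
    kept = {c for c in candidates if accept(c)}
    return kept if kept else set(candidates)


def viol(k):
    """Viol_k: accept strings with at most k violation marks."""
    return lambda s: s.count("*") <= k


def eval_constraint(candidates, mark):
    """Mark violations, apply Viol3 .O. Viol2 .O. Viol1 .O. Viol0, then pardon."""
    marked = {mark(c) for c in candidates}
    for k in (3, 2, 1, 0):
        marked = lenient_filter(marked, viol(k))
    return {m.replace("*", "") for m in marked}


def mark_b(s):
    """Hypothetical constraint: every 'b' is a violation."""
    return s.replace("b", "b*")


print(eval_constraint({"ab", "abab"}, mark_b))              # {'ab'}
print(sorted(eval_constraint({"bbbb", "bbbbb"}, mark_b)))   # ['bbbb', 'bbbbb']
```

In the second call both candidates have more than three marks, so no filter can separate them and both survive.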

30
Evaluation Template for Left-Oriented Gradient
Alignment
  • define EvalGradientLeft(Candidates, Violation,
    Left, Right) [
  •   Candidates .o.
  •   Violation -> %* ... || .#. Left _ Right
  •   .o.
  •   Violation -> %*^2 ... || .#. Left^2 _ Right
  •   .o.
  •   Violation -> %*^3 ... || .#. Left^3 _ Right
  •   .o.
  •   Violation -> %*^4 ... || .#. Left^4 _ Right
  •   .o.
  •   Violation -> %*^5 ... || .#. Left^5 _ Right
  •   .o.
  •   Violation -> %*^6 ... || .#. Left^6 _ Right
  •   .o.
  •   Violation -> %*^7 ... || .#. Left^7 _ Right
  •   .o.
  •   Violation -> %*^8 ... || .#. Left^8 _ Right
  •   .O.
  •   Viol12 .O. Viol11 .O. Viol10 .O. Viol9 .O.
      Viol8 .O. Viol7 .O.

31
Clash, AlignLeft, MainStress
  • Clash
  • No stress on adjacent syllables.
  • define Clash(X) Eval(X, SS, SS B, ?*);
  • Align-Left
  • The stressed syllable is initial in the foot.
  • define AlignLeft(X) Eval(X, SV, .#. ~[?* "(" C*],
    ?*);
  • Main Stress
  • The primary stress in Finnish is on the first
    syllable.
  • define MainStress(X) Unviolable(X, [B MSS ~$MSS]);

32
FootBin, Lapse, NonFinal
  • Foot-Bin
  • Feet are minimally bimoraic and maximally
    bisyllabic.
  • define FootBin(X) Eval(X, ["(" Light ")" |
    "(" S ["." S]^>1], ?*, ?*);
  • Lapse
  • Every unstressed syllable must be adjacent to a
    stressed syllable or to the word edge.
  • define Lapse(X) Eval(X, US, [B US B], [B US B]);
  • Non-Final
  • The final syllable is not stressed.
  • define NonFinal(X) Eval(X, SS, ?*, ~$S .#.);

33
StressToWeight, Parse, AllFeetFirst
  • Stress-To-Weight
  • Stressed syllables are heavy.
  • define StressToWeight(X) Eval(X, [SS & Light], ?*,
    ")" | E);
  • License-σ
  • Syllables are parsed into feet.
  • define Parse(X) Eval(X, S, E, E);
  • All-Ft-Left
  • The left edge of every foot coincides with the
    left edge of some prosodic word.
  • define AllFeetFirst(X)
  •   EvalGradientLeft(X, "(", $".", ?*);

34
Finnish Prosody
  • Kiparsky 2003
  • define FinnishProsody(Input)
  • AllFeetFirst( Parse( StressToWeight(
  • NonFinal( Lapse( FootBin( MainStress(
  • AlignLeft( Clash( Input .o. Gen)))))))))
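
The nesting reads inside out: the innermost function is the highest-ranked constraint. A much reduced Python sketch of the same cascade is given below. It is an illustration under heavy assumptions: only four of the nine constraints are modelled (Clash, MainStress, NonFinal, Parse), each is simplified to a counting function over syllables, pruning uses a plain minimum instead of the bounded Viol filters, and the candidate list is a hand-picked subset of the 33 kala candidates.

```python
import re

PRIMARY, SECONDARY = "áéíóú", "àèìòù"


def syllables(candidate):
    """Split e.g. '(ká).la' into (stress, footed) pairs, one per syllable."""
    out, footed = [], False
    for token in re.findall(r"\(|\)|[^.()]+", candidate):
        if token == "(":
            footed = True
        elif token == ")":
            footed = False
        else:
            if any(ch in PRIMARY for ch in token):
                level = 1
            elif any(ch in SECONDARY for ch in token):
                level = 2
            else:
                level = 0
            out.append((level, footed))
    return out


def clash(c):
    """One violation per pair of adjacent stressed syllables."""
    levels = [lv for lv, _ in syllables(c)]
    return sum(1 for a, b in zip(levels, levels[1:]) if a and b)


def main_stress(c):
    """Unviolable: primary stress on the first syllable and nowhere else."""
    levels = [lv for lv, _ in syllables(c)]
    return levels[0] == 1 and 1 not in levels[1:]


def non_final(c):
    """One violation if the final syllable is stressed."""
    return 1 if syllables(c)[-1][0] else 0


def parse(c):
    """One violation per syllable outside a foot."""
    return sum(1 for _, footed in syllables(c) if not footed)


def prune(candidates, constraint):
    """Keep the candidates with the fewest violations."""
    fewest = min(constraint(c) for c in candidates)
    return [c for c in candidates if constraint(c) == fewest]


def prosody(candidates):
    survivors = prune(candidates, clash)                      # highest ranked
    survivors = [c for c in survivors if main_stress(c)]      # unviolable
    for constraint in (non_final, parse):                     # lower ranked
        survivors = prune(survivors, constraint)
    return survivors


CANDIDATES = ["ka.la", "ká.la", "(ká).la", "(ká.la)",
              "(ká.là)", "(ká).(là)", "(ka.lá)", "kà.la"]
print(prosody(CANDIDATES))  # ['(ká.la)']
```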

35
FinnWords
  • regex FinnishProsody(
  •   {kalastelet} | {kalasteleminen} |
  •   {ilmoittautuminen} | {järjestelmättömyydestänsä} |
  •   {kalastelemme} | {ilmoittautumisesta} |
  •   {järjestelmällisyydelläni} |
      {järjestelmällistämätöntä} |
  •   {voimisteluttelemasta} | {opiskelija} |
      {opettamassa} |
  •   {kalastelet} | {strukturalismi} |
      {onnittelemanikin} |
  •   {mäki} | {perijä} | {repeämä} | {ergonomia} |
  •   {puhelimellani} | {matematiikka} |
      {puhelimistani} |
  •   {rakastajattariansa} | {kuningas} |
      {kainostelijat} |
  •   {ravintolat} | {merkonomin} );
  • Demo!
  • Demo!

36
Result
  • (ér.go).(nò.mi).a
  • (íl.moit).(tàu.tu).mi.(sès.ta)
  • (íl.moit).(tàu.tu).(mì.nen)
  • (ón.nit).(tè.le).(mà.ni).kin
  • (ó.pis).(kè.li).ja
  • (ó.pet).ta.(màs.sa)
  • (vói.mis).te.(lùt.te).le.(màs.ta)
  • (strúk.tu).ra.(lìs.mi)
  • (rá.vin).(tò.lat)
  • (rá.kas).ta.(jàt.ta).ri.(àn.sa)
  • (ré.pe).(ä.mä)
  • (pé.ri).jä
  • (pú.he).li.(mèl.la).ni
  • (pú.he).li.(mìs.ta).ni
  • (mä.ki)
  • (má.te).ma.(tìik.ka)
  • (mér.ko).(nò.min)
  • (kái.nos).(tè.li).jat
  • (ká.las).te.(lèm.me)
  • (ká.las).te.(lè.mi).nen
  • (ká.las).(tè.let)
  • (kú.nin).gas
  • (jär.jes).tel.(mäl.li).syy.(dèl.lä).ni
  • (jär.jes).(tèl.mät).tö.(myy.des).(tän.sä)
  • (jär.jes).(tèl.mäl).(lìs.tä).mä.(tön.tä)

37
Two Errors
  • (ká.las).te.(lè.mi).nen
  • (jär.jes).tel.(mäl.li).syy.(dèl.lä).ni
  • The interaction of Lapse and StressToWeight does
    not produce the desired result in these cases.

38
What is wrong?
  • define Debug(Input)
  • DebugStressToWeight(
  • NonFinal( Lapse( FootBin( MainStress(
    AlignLeft(
  • Clash( Input .o. Gen)))))))
  • regex Debug({kalasteleminen});
  • (ká.las).te.(lè.mi).nen <-- actual winner
  • (ká.las).(tè.le).(mì.nen) <-- desired output
  • (jär.jes).tel.(mäl.li).syy.(dèl.lä).ni <--
    actual winner
  • (jär.jes).(tèl.mäl).li.(syy.del).(lä.ni) <--
    desired output
  • The StressToWeight constraint eliminates some of
    the desired winning candidates.

39
Nine Elenbaas
  • A unified account of binary and ternary stress.
    Ph.D. dissertation. University of Utrecht. 1999.
    Based on Kiparsky & Hanson 1996. The only
    difference is that Elenbaas has a special
    constraint, (L H) or AntiLStressH(, in place of
    Kiparsky's more general StressToWeight
    constraint.
  • define FinnishProsody(Input)
  • AllFeetFirst( Parse( AntiLStressH(
  • NonFinal( Lapse( AlignLeft( FootBin(
  • MainStress( Clash( Input .o. Gen)))))))))
  • define AntiLStressH(X) Eval(X, [SS & Light], "(",
    "." Heavy);

40
Result
  • (ér.go).(nò.mi).a
  • (íl.moit).(tàu.tu).mi.(sès.ta)
  • (íl.moit).(tàu.tu).(mì.nen)
  • (ón.nit).(tè.le).(mà.ni).kin
  • (ó.pis).(kè.li).ja
  • (ó.pet).ta.(màs.sa)
  • (vói.mis).te.(lùt.te).le.(màs.ta)
  • (strúk.tu).ra.(lìs.mi)
  • (rá.vin).(tò.lat)
  • (rá.kas).ta.(jàt.ta).ri.(àn.sa)
  • (ré.pe).(ä.mä)
  • (pé.ri).jä
  • (pú.he).li.(mèl.la).ni
  • (pú.he).li.(mìs.ta).ni
  • (mä.ki)
  • (má.te).ma.(tìik.ka)
  • (mér.ko).(nò.min)
  • (kái.nos).(tè.li).jat
  • (ká.las).te.(lèm.me)
  • (ká.las).te.(lè.mi).nen
  • (ká.las).(tè.let)
  • (kú.nin).gas
  • (jär.jes).(tèl.mäl).li.(syy.del).(lä.ni)
  • (jär.jes).(tèl.mät).tö.(myy.des).(tän.sä)
  • (jär.jes).(tèl.mäl).(lìs.tä).mä.(tön.tä)

41
Did She Know?
  • Six syllables (Appendix of Elenbaas thesis):
    X X L L L L
  • áterìanàni áteriànani 'meal (Ess 1SG)'
  • érgonòmiàna 'ergonomics (Ess)'
  • káinostèlijàna 'shy person (Ess)'
  • káinostèlijàni 'shy person (Nom 1SG)'
  • kúnnallìsenàni 'council (Ess 1SG)'
  • kúnnallìsiàni 'councils (Part 1SG)'
  • kúnnallìsinàni 'councils (Ess 1SG)'
  • mérkonòmiàni 'degree in economics (Part 1SG)'
  • mérkonòminàni 'degree in economics (Ess 1SG)'
  • ópiskèlijàni 'student (Nom 1SG)'
  • púhelìmenàni 'telephone (Ess 1SG)'
  • púhelìmiàni 'telephone (Part 1SG)'
  • Missing pattern: X X L L L H

42
Conclusion
  • Can we get ternary feet in Finnish for free,
    from the interaction of independently motivated
    principles?
  • We don't know.
  • We know that the Kiparsky and Elenbaas accounts
    fail.
  • Optimality Prosody is computationally very
    difficult.
  • The number of initial candidates is huge:
  • kalasteleminen: 70653
  • järjestelmällisyydelläni: 21767579
  • Simple tableau methods do not work.
  • Finite-state implementation guards against errors
    made by a human GEN and EVAL.
  • But even when an error can be pinpointed, the fix
    is not obvious.
  • Debugging OT constraints is as hard as debugging
    two-level rules, in practice more difficult than
    rewrite systems.
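
The two candidate counts can be checked with a short recurrence. The sketch assumes that every syllable has three stress options (unstressed, primary, secondary) and that a foot spans one to three syllables with at least one stress; under those assumptions it reproduces both figures, as well as the 33 candidates for kala. The function name `count_candidates` is made up for the illustration.

```python
def count_candidates(n_syllables):
    """Number of stress-and-footing candidates for a word of n syllables."""
    counts = [1]  # one way to parse zero syllables
    for n in range(1, n_syllables + 1):
        # Last syllable unfooted (3 stress options) or a unary foot (2 stressed options).
        total = 5 * counts[n - 1]
        if n >= 2:
            total += 8 * counts[n - 2]    # binary foot: 3*3 patterns minus the unstressed one
        if n >= 3:
            total += 26 * counts[n - 3]   # ternary foot: 3*3*3 patterns minus the unstressed one
        counts.append(total)
    return counts[n_syllables]


print(count_candidates(2))  # 33        ka.la
print(count_candidates(6))  # 70653     ka.las.te.le.mi.nen
print(count_candidates(9))  # 21767579  jär.jes.tel.mäl.li.syy.del.lä.ni
```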

43
Final Thoughts
  • Morphology is a regular relation.
  • The composition of words (morphosyntax),
    morphological alternations, and prosody can be
    described in finite-state terms.
  • A complex relation can be decomposed in different
    ways.
  • There are many flavors of finite-state
    morphology: Item-and-Arrangement, Rewrite rules,
    Two-level rules, Realizational Morphology,
    Classical optimality constraints.
  • Computing with finite-state tools is fun and
    easy.
  • We have a sophisticated formalism for describing
    regular relations, efficient compilers and
    runtime software.
  • Pencil-and-paper morphology badly needs
    computational support.
  • It is difficult to get globally correct results
    relying on a handful of interesting words, rules,
    and constraints.