Morphology from a computational point of view - PowerPoint PPT Presentation

1 / 65
About This Presentation
Title:

Morphology from a computational point of view

Description:

... addition sensation roy ration origin classic convention NULL.man sand news police states gross sun fresh sports boss sales 3- patrol bonds ed.er.ing slugg ... – PowerPoint PPT presentation

Number of Views:94
Avg rating:3.0/5.0
Slides: 66
Provided by: Prefer695
Category:

less

Transcript and Presenter's Notes

Title: Morphology from a computational point of view


1
Morphology from a computational point of view
  • March 2001

2
Today
  • Minimal Edit Distance, and Viterbi more
    generally
  • Letter to Sound
  • What is morphology?
  • Finite-state automata
  • Finite-state phonological rules

3
1. What is morphology?
  • Study of the internal structure of words
  • morph-ology word-s jump-ing
  • Why?
  • For some purposes, we need to know what the
    internal pieces are.
  • Knowledge of the words of a language cant be
    summarized in a finite list we need to know the
    principles of word-formation

4
Some resources
  • Richard Sproat Morphology and Computation (MIT
    Press, 1992)
  • Excellent overview of computational morphology
    and phonoloy by Harald Trost at
  • http//www.ai.univie.ac.at/harald/handbook.html

5
2. What applications need knowledge of words?
  • Any high-level linguistic analysis syntactic
    parser
  • machine translation
  • speech recognition, text-to-speech (TTS)
  • information retrieval (IR)
  • dictionary, spell-checker

6
3. A list is not enough
  • An empirical fact
  • AP newswire mid-Feb Dec 30 1988
  • Nearly 300,000 words.
  • New words that appeared on Dec 31 1988
  • compounds prenatal-care, publicly-funded,
    channel-switching, owner-president, logic-loving,
    part-Vulcan, signal-emitting, landsite,
    government-aligned, armhole, signal-emitting...

7
...new words...
  • dumbbells, groveled, fuzzier, oxidized
  • ex-presidency, puppetry, boulderlike,
    over-emphasized, hydrosulfite, outclassing,
    non-passengers, racialist, counterprograms,
    antiprejudice, re-unification, traumatological,
    refinancings, instrumenting, ex-critters,
    mega-lizard

8
  • ex-presidency prefix ex-
  • boulder-like suffix like
  • over-emphasized prefix over-
  • antiprejudice prefix anti
  • This is often called the OOV problem (out of
    vocabulary).

9
  • If we work out the principles of word-formation,
    we will simultaneously
  • compress the size of our internalized list of
    words
  • become able to deal with new words on the fly.

10
4. Overview of applications that need knowledge
of words
  • Speech generation text to speech (TTS)
  • Speech recognition

11
Text to speech
  • Problem take text, in standard spelling, and
    produce a sequence of phonemes which can be
    synthesized by the backend.
  • Severe problems Proper names (persons, places),
    OOV words
  • boathouse B OW1 T H AU2 S

12
Speech recognition
  • Take a sound file (e.g., .wav) and produce a
    list of words in standard orthography.
  • Bill Clinton is a recent ex-president.
  • If someone says it, we need to figure out what
    the word was.

13
Do we know what a word is?
  • This is actually not an easy question!
    especially if we turn to Asian languages, without
    a tradition of putting in white space between
    words, as we do in the West.
  • German writes more compounds without white space
    than English does.

14
Basic principles of morphology
  • For some purposes, we need to think about
    phonemes, while for others its more convenient
    to talk about letters.
  • For our purposes, Ill talk about letters
    whenever we dont need to specifically focus on
    phonemes.

15
Morpheme
  • It is convenient to be able to talk about the
    pieces into which words may be broken, and
    linguists call these pieces morphemes the
    smallest parts of a language that can be
    regularly assigned a meaning.

16
Morphemes
  • Uncontroversial morphemes
  • door, dog, jump, -ing, -s, to
  • More controversial morphemes
  • sing/sang s-ng i/a
  • cut/cut cut PAST

17
Classic distinctions in morphology
  • Analytic (isolating) languages
  • no morphology of derivational or inflectional
    sort.
  • Synthetic (inflecting) languages
  • Agglutinative 1 function per morpheme
  • Fusional gt 1 function per morpheme

18
AgglutinativeFinnish Nominal Declension
talo 'the-house' kaup-pa 'the-shop' talo-ni 'my
house' kaup-pa-ni 'my shop' talo-ssa 'in
the-house' kaup-a-ssa 'in the-shop' talo-ssa-ni
'in my house kaup-a-ssa-ni 'in my
shop' talo-i-ssa 'in the-houses kaup-o-i-ssa 'in
the-shops' talo-i-ssa-ni 'in my
houses kaup-o-i-ssa-ni 'in my shops'
Courtesy of Bucknell Univ. web page
19
Fusional LatinLatin Declension of hortus
'garden'
Singular Plural Nominative (Subject)
hort-us hort-i Genitive (of)
hort-i hort-rum Dative (for/to) hort-o
hort-is Accusative (Direct Obj) hort-um
hort-us Vocative (Call) hort-e hort-i Ablative
(from/with) hort-o hort-is
20
Morphemes vs. morphs
  • Some analysts distinguish between morphemes and
    morphs.
  • Morphemes are motivated by an analysis, and
    include plural and past
  • Morphs are strings of letters or phones that
    realize or manifest a morpheme.

21
Free and bound morphemes
  • Free morphemes can form (free-standing) words
    bound morphemes are only found in combination
    with other morphemes.
  • Examples?

22
Functions of morphology
  • Derivational morphology creates one lexeme from
    another
  • compute gt computer gt computerize gt
    computerization
  • Inflectional morphology creates the form of a
    lexeme thats right for a sentence
  • the nominative singular form of a noun or the
    past 3rd person singular form of a verb.

23
  • Word an identifiable string of letters (or
    phonemes) sing
  • Word-form a word with a specific set of
    syntactic and morphological features. The sing in
    I sing is 1st person sg, and is a distinct
    word-from from the sing in you sing.
  • Lexeme a complete set of inflectionally related
    word-forms, including sing, sings, and sang
  • Lemma a complete set of morphologically related
    lexemes sing, sings, song, sang.

24
A lexemes stem
  • In many languages (unlike English),
    constellations of word-forms forming a lexeme
    demand the recognition of a basic stem which does
    not stand freely as a word
  • Italian ragazzo, ragazzi (boy, girl)
  • ragazzi, ragazze (boys, girls)
  • ragazz-

25
Compounds
  • Compounds are composed of 2 (or more) words or
    stems
  • Compounds hot dog, White House, bookstore,
    cherry-covered

26
Languages vary in the amount of morphology they
have and use
  • English has a lot of derivational morphology and
    relatively little inflectional morphology
  • English verbs inflectional forms
  • bare stem, -s, -ed, -ing

27
European languages
  • Not uncommon for a verb to have 30 to 50 forms
  • marking tense, person and number of the subject

28
Derivation
  • Derivational morphology usually consists of
    adding a prefix or suffix to a base ( stem).
  • The base has a lexical category (it is a noun,
    verb, adjective), and the suffix typically
    assigns a different category to the whole word.

Noun
-ness suffix that takes an adjective, makes a
noun.
Adj
sad ness
29
Adj
Adj
Verb
un interest ing
30
Distinct from contractions
  • English (and some other languages) permit the
    collapsing together of common words. In some
    extremely rare cases, only the collapsed form
    exists (English possessive s).
  • He will arrive tonight gt hell arrive
  • The King of Englands children

31
Some basics of English morphology
  • Inflectional morphology
  • Nouns -NULL, -s, -s
  • Verbs -NULL, s, -ed, -ing
  • (so-called weak verbs)
  • Strong verbs 3 major groups
  • a. Internal verb change (sing/sang,
    drive/drove/driven, dive/dove)
  • b. t suffix, typically with vowel-shortening
    dream/dreamt, sleep/slept
  • c. aught replacement catch, teach, seek,

32
Derivational morphology in complex
  • This morphology creates new words, by adding
    prefixes or suffixes.
  • It is helpful to divide them into two groups,
    depending on whether they leave the pronunciation
    of the base unchanged or not.
  • There are, as always, some fuzzy cases.

33
  • Level 1
  • ize, ization, al, ity, al, ic, al, ity, ion, y
    (nominaliz-ing), al, ate, ous, ive, ation
  • Can attach to non-word stems (fratern-al,
    paternal parent-al)
  • Typically change stress and vowel quality of stem
  • Level 2
  • Never precede Level 1 suffixes
  • Never change stress pattern or vowel quality
  • Almost always attach to words that already exist
  • hood, ness, ly, s, ing, ish, ful, ly, ize, less,
    y (adj.)

34
Combinations of Class 1,2
  • Class 1 Class 1 histor-ic-al,
    illumina-at-tion, indetermin-at-y
  • Class 1 Class 2 frantern-al-ly,
    transform-ate-ion-less
  • Class 2 Class 2 weight-less-ness
  • ?? Class 2 Class 1 weight-less-ity,
    fatal-ism-al

35
Signature
  • Set of suffixes (or prefixes) that occurs in a
    corpus with a set of stems.

36
  • NULL.ed.ing.s
  • look interest add claim mark extend demand
    remain want succeed record offer represent cover
    return end explain follow help belong attempt
    talk fear happen assault account point award
    appeal train contract result request staff view
    fail kick visit confront attack comment sponsor

37
  • NULL.s
  • paper retain improvement missile song truth
    doctor indictment window conductor dick
    misunderstanding struggle stake tank belief
    cafeteria material mind operator bassi lot
    movement chain notion marriage dancer scholarship
    reservoir sweet right battalion hold mr shot
    cardinal athletic revenue duel confrontation solo
    talent guest shoe russian commitment average monk
    election street roger rifle worker area plane
    pinch-hitter dozen browning conclusion teacher
    narcotic appearance alternative dealer producer
    mile stock shrine sometime bag successor career
    mistake ankle weapon model front spotlight rhode
    pace debate payment requirement fairway
    consultation chip dollar employer thank mustang
    rocket-bomb hat string precinct robert employee
    action detective pressure measure spirit forbid
    hitter breast yankee partner floor member

38
  • NULL.d.s
  • increase tie hole associate reserve price fire
    receive challenge rate purchase propose feature
    celebrate decide suite single change sculpture
    combine privilege pledge issue frame indicate
    believe damage include use aide graduate surprise
    intervene practice trouble serve oppose promise
    charge note schedule continue raise decline cause
    operate emphasize relieve hope share judge birdie
    produce exchange

39
  • NULL.ed.er.ing.s report turn walk park pick flow
  • NULL.d.ment enforce announce engage arrange
    replace improve encourage
  • NULL.n.s rose low take law drive rise undertake
  • NULL.al intern profession logic fat tradition
    extern margin jurisdiction historic education
    promotion constitution addition sensation roy
    ration origin classic convention
  • NULL.man sand news police states gross sun fresh
    sports boss sales 3- patrol bonds
  • ed.er.ing slugg manag crush publish robb
  • NULL.ity.s major senior moral hospital
  • NULL.ry hung mason ave summit scene surge rival
    forest
  • NULL.a.s indian kind american

40
Finite state morphology
41
FSA finite-state automata
  • Consists of
  • a set of states
  • a starting state
  • a set of final or accepting states
  • a finite set of symbols
  • a set of transitions each is defined by a
    from-state, a to-state, and a symbol

42
  • Its natural to think of this as describing an
    annotated directed graph.

a
b
a
a
a
!
!
a
q0
q1
q2
q3
q3
43
  • An FSA can be thought of as judging (accepting) a
    string, or as generating one.
  • How does it judge? Find a start to finish path
    that matches the string.
  • How does it generate? Walk through any
    start-to-finish path.

44
Deterministic and Non-deterministic FSAs
  • Just a little difference
  • Deterministic case For every state, there is a
    maximum of one transition associated with any
    given symbol. You can say that theres a function
    from statesXsymbols ? states
  • Nondeterministic case There is no such
    restriction hence, given a state and a symbol,
    it is not necessarily certain which transition is
    to be taken.

45
a
b
a
!
a
q0
q1
q2
q3
q3
deterministic
46
a
b
a
!
a
q0
q1
q2
q3
q3
non-deterministic
The best things in life are non-deterministic.
47
Figure 3.4 p. 68
un-
adj-root
-er est -ly
q0
q1
q2
q3
e
Alternate notation
48
Yet a third way rows in an array(to-column can
consist of pointers)
From To Output
0 1 un
0 1 NULL
1 2 adj-root-list
2 3 erestly
Stop states 2,3
49
adj-root-1
un-
-er est -ly
q1
q2
q0
q5
adj-root-1
q3
q4
-er est
e
adj-root-2
Figure 3.5 p. 69
50
Yet a third way rows in an array(to-column can
consist of pointers)
Stop states 2,4,5
From To Output
0 1 un
0 3 NULL
1 2 adj-root-list-1
2 5 erestly
3 2 adj-root-list-1
3 4 adj-root-list-2
4 5 erest
51
0 1 noun 1
1 2 ize
2 3 ation
2 4 er
2 5 able
0 5 -al adj
0 1 adj-al
5 6 ityness
0 7 verbs1
7 8 ive adj
8 6 ness
8 9 ly
0 10 verb2
10 8 ative
0 11 noun2
11 8 ful

Figure 3.6, p. 70
52
We can simplify greatly (generating a bit more.)
53
Finite-State Transducers (FST)
  • The symbols of the FST are complex theyre
    really pairs of symbols, one for each of two
    tapes or levels.
  • Recognizer decides if a given pair of
    representations fits together OK
  • Generator generates pairs of representations
    that fit together
  • Translator takes a representation on one level
    and produces the appropriate representation on
    the other level

54
Finite state transducers
  • can be inverted, or
  • composed
  • and you get another FST.

55
Complex symbols
  • Usually of the form ab, which means a can appear
    on the upper tape when b appears on the lower
    tape.
  • So ab means thats a permissible pairing up of
    symbols.
  • a along means aa, etc.
  • epsilon e means null character.
  • Remember, other means any feasible pair that
    is not in this transducer (p. 78)

56
Using FSTs for orthographic rules
57
Using FSTs for orthographic rules
foxswe get to q1 with x
58
Using FSTs for orthographic rules
foxswe get to q2 with
59
Using FSTs for orthographic rules
foxswe can get to q3 with NULL
60
Using FSTs for orthographic rules
foxswe also get to q5 with s but we dont
want to!
61
So why is this transition there? ?friendship,
?foxss ( foxess)
foxswe also get to q5 with s but we dont
want to!
62
foxsq4 with s
63
foxsq0 with (accepting state)
64
Other transitions
arizona we leave q0 but return
65
Other transitions
m i s s s
Write a Comment
User Comments (0)
About PowerShow.com