Lecture 2 Finite State Machines - PowerPoint PPT Presentation

Provided by: mantonm5
1
Lecture 2: Finite State Machines, Morphology
CSCE 771 Natural Language Processing
  • Topics
  • Finite State Machines
  • Morphology
  • Readings: Chapter 3 (skim Chapter 2)

June 4, 2009
2
  • Last Time
  • Challenge of 2001's HAL
  • Areas of Research
  • Examples of Language Processing
  • Formal Languages
  • Alphabets, strings,
  • regular expressions denote languages
  • finite automata (DFA, NFA) accept languages
  • Today
  • Slides 30 onward from Lecture 1
  • Regular expressions in Perl, grep, vi, emacs, Word?
  • Eliza
  • Morphology

3
Python ??
  • Windows

4
Concepts on Regular Languages
  • If r is a regular expression, then there is a construction from r that yields an NFA M_r such that L(r) = L(M_r).
  • Every NFA can be converted into an equivalent DFA.
  • There are languages that are not regular.
  • Every regular language can be generated by a
    regular grammar.
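
The NFA-to-DFA claim can be made concrete with the subset construction. The following is a minimal sketch, not taken from the lecture: it assumes an NFA without epsilon moves, and the function names and the example NFA (for strings over {a, b} that end in "ab") are hypothetical.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A DFA state accepts if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def dfa_accepts(dfa, dfa_start, dfa_accepting, text):
    """Run the DFA over text (symbols must come from the alphabet)."""
    state = dfa_start
    for ch in text:
        state = dfa[(state, ch)]
    return state in dfa_accepting


# Hypothetical NFA for (a|b)*ab: state 0 loops, guesses the final "ab".
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, 0, {2}, "ab")
```

With this example, `dfa_accepts(DFA, DFA_START, DFA_ACCEPTING, "aab")` is true and the same call on `"aba"` is false. The resulting DFA has three states, each a set of NFA states.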

5
Grep family revisited
  • Global Regular Expression Print (grep), from the ed command g/re/p
  • grep '[uU]nix' f1 f2 ... fn
  • egrep pat files // extended grep: efficient NFA → DFA, then execute
  • fgrep pat files // fixed grep, for fixed strings
  • find, for searching directories (not really regular expressions)
  • find dir -name pat // search for files with name matching pat
  • find dir -exec grep pat {} \; // search in files for the pattern pat
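
The difference between regular-expression search and fixed-string search can be sketched in Python. This is an illustration only: the helper names and sample lines are hypothetical, and Python's `re` module stands in for the matching engines of the real tools.

```python
import re


def grep_lines(pattern, lines):
    """grep-like: keep lines containing a match for a regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def fgrep_lines(fixed, lines):
    """fgrep-like: keep lines containing the literal string, no regex."""
    return [line for line in lines if fixed in line]


# Hypothetical sample input.
SAMPLE = ["Unix is old", "I use unix daily", "UNIX in caps", "a.b", "axb"]
```

Here `grep_lines("[uU]nix", SAMPLE)` returns the first two lines, `grep_lines("a.b", SAMPLE)` returns both `"a.b"` and `"axb"` because `.` matches any character, and `fgrep_lines("a.b", SAMPLE)` returns only `"a.b"`.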

6
Editing scripts
  • Create a script of editing commands, then execute it with
  • ex file1 < edScript
  • Example
  • 1,$s/[uU]nix/UNIX/g
  • 1,$s/langauge/language/g
  • g/^$/d // delete empty lines (^ = start of line, $ = end of line)
  • w
  • q
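
For comparison, the same three edits can be sketched in Python. This assumes the script is meant to rewrite unix/Unix as UNIX, correct the misspelling "langauge", and delete empty lines. The function name is hypothetical, and it works on a string rather than editing a file in place.

```python
import re


def apply_edit_script(text):
    """Apply the three editing commands to every line of text."""
    lines = text.split("\n")
    # Rewrite "unix" or "Unix" as "UNIX" everywhere.
    lines = [re.sub(r"[uU]nix", "UNIX", line) for line in lines]
    # Correct the misspelling "langauge".
    lines = [re.sub(r"langauge", "language", line) for line in lines]
    # Delete empty lines (start of line immediately followed by end).
    lines = [line for line in lines if line != ""]
    return "\n".join(lines)
```

For example, `apply_edit_script("unix is a langauge\n\nUnix rules")` returns `"UNIX is a language\nUNIX rules"`.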

7
Other Unix Regular Expression Based Tools
  • sed (stream editor)
  • awk
  • Perl scripting language
  • Python, Ruby
  • NLTK, now in Python

8
Perl Regular Expressions
9
Eliza Substitutions
  • Eliza took the input and performed several transformations (substitutions) to produce the output.
  • First
  • s/[mM]y/YOUR/g     my → YOUR
  • s/I'm/YOU ARE/g     I'm → YOU ARE
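
A minimal Python sketch of this first substitution pass, assuming the patterns are meant to match "my"/"My" and "I'm". The function name is hypothetical, and the `\b` word-boundary anchors are an addition not on the slide, so that words such as "army" are left alone.

```python
import re


def first_pass(text):
    """Swap first-person words for second-person ones, Eliza style."""
    # my / My -> YOUR (whole words only; \b is an added assumption)
    text = re.sub(r"\b[mM]y\b", "YOUR", text)
    # I'm -> YOU ARE
    text = re.sub(r"\bI'm\b", "YOU ARE", text)
    return text
```

For example, `first_pass("I'm sad about my job")` returns `"YOU ARE sad about YOUR job"`.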

10
Eliza in Perl
  • print "Welcome to Elizalike. Talk to me! (Or type \"bye\" to quit.)\n";
  • # Start an infinite loop
  • while (1 == 1) {
  • # This line reads in user input and stores it in the special variable $_, which makes the regular expression statements below more succinct.
  • $_ = <STDIN>;
  • # Allow users to quit
  • if ($_ =~ /bye|Bye|BYE/) {
  • print "Elizalike: Well, it was nice talking with you!\n"; exit (0);
  • }

http://en.wikipedia.org/wiki/ELIZA
11
  • # Insert a tag at the beginning of the line to identify it as Eliza's response, and to make finding word boundaries easier.
  • s/^/Elizalike: /;
  • # Replace all instances of "you are" with "Eliza is".
  • # Note how (\W) and \1, etc. are used to mark word boundaries and keep whatever non-word character was in the input in the output.
  • s/(\W)(you|You) are(\W)/\1Eliza is\3/g;
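
The two steps on this slide can be sketched in Python to show why the tag helps. This assumes the intended patterns are "insert at start of line" and "you are" or "You are" between non-word characters; the function name is hypothetical.

```python
import re

# "you are" / "You are" surrounded by non-word characters, which are
# captured as groups 1 and 3 so they can be copied into the output.
YOU_ARE = r"(\W)(you|You) are(\W)"


def tag_and_rewrite(line):
    """Prefix the response tag, then rewrite 'you are' as 'Eliza is'."""
    # Insert the tag at the beginning of the line.
    line = re.sub(r"^", "Elizalike: ", line)
    # \1 and \3 restore the captured non-word characters.
    return re.sub(YOU_ARE, r"\1Eliza is\3", line)
```

For example, `tag_and_rewrite("you are nice.")` returns `"Elizalike: Eliza is nice."`. Without the tag, a line that starts with "you are" would not match, because `(\W)` needs a non-word character in front of "you".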

12
Transformations
  • x

13
  • # Print the result to STDOUT.
  • print STDOUT;
  • } # end of the while loop

14
Morphology
  • "A writer is someone who writes, a stinger is something that stings. But fingers don't fing, grocers don't groc, hammers don't ham and humdingers don't humding."
  • Richard Lederer, Crazy English
  • Consider mapping singular nouns to plurals
  • duck → ducks, cat → cats, book → books
  • fox → foxes, bush → bushes
  • goose → geese
  • fish → fish
  • Morphological parsing: recognizing plurals
  • Spelling rules
  • Morphological rules
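
The singular-to-plural mapping can be sketched as a lookup for irregular nouns plus one spelling rule. This is a minimal illustration covering only the examples on this slide; the names are hypothetical, and it is not a complete treatment of English plurals.

```python
import re

# Irregular nouns are listed explicitly (only the slide's examples).
IRREGULAR = {"goose": "geese", "fish": "fish"}


def pluralize(noun):
    """Return the plural of a singular noun under the rules above."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # Spelling rule: add -es after s, x, z, ch, sh.
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    # Default morphological rule: add -s.
    return noun + "s"
```

For example, `pluralize("duck")` returns `"ducks"`, `pluralize("fox")` returns `"foxes"`, and `pluralize("goose")` returns `"geese"`. The irregular table is checked first, so "fish" is not turned into "fishes" by the spelling rule.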

15
Morphology
  • Morpheme: minimal meaning-bearing unit
  • Stems and affixes
  • cat + -s
  • Affixes
  • Prefixes
  • Suffixes
  • Infixes
  • Circumfixes
  • Inflection: combination of a word stem with an affix, resulting in a word of the same class (noun → noun)
  • Derivation: combination of a word stem with a grammatical morpheme, resulting in a word of another class, verb → noun (e.g. walk + -ing → walking)

16
Turkish: Example of a Complex Language
  • The following is a Turkish word and its
    decomposition.

17
English Regular Nouns (Number)
  • In other languages you typically indicate other
    features such as gender.
  • Plurality
  • Possessive

SLP fig from 3.1.1
18
Regular Verbs
  • X

SLP fig from 3.1.1
19
Irregular Verbs
  • X

SLP fig from 3.1.1
20
To Love in Spanish
  • X

OLD SLP fig from 3.1.1
21
Nominalization
  • Nominalization: formation of nouns from verbs or adjectives

SLP fig from 3.1.2
22
Adjectives from Nouns and Verbs
  • X

23
Morphological Parsed Output
  • N = Noun, V = Verb
  • PL, SG, Present (default not shown), Past,
    Pres-Participle, Past-Participle

SLP fig 3.2
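
The kind of output this slide describes (a stem followed by feature tags) can be sketched with a tiny noun-only parser. Everything here is hypothetical: the lexicon, the `+N +SG` / `+N +PL` output strings, and the function name are assumptions for illustration, not the textbook's implementation.

```python
# Hypothetical toy lexicon.
REGULAR_NOUNS = {"cat", "duck", "book"}
IRREGULAR_PLURALS = {"geese": "goose", "mice": "mouse"}


def parse_noun(word):
    """Return every stem-plus-features analysis the toy lexicon allows."""
    parses = []
    # Singular: a known regular stem or the stem of an irregular plural.
    if word in REGULAR_NOUNS or word in IRREGULAR_PLURALS.values():
        parses.append(f"{word} +N +SG")
    # Irregular plural: look the stem up directly.
    if word in IRREGULAR_PLURALS:
        parses.append(f"{IRREGULAR_PLURALS[word]} +N +PL")
    # Regular plural: strip -s and check that the stem is known.
    if word.endswith("s") and word[:-1] in REGULAR_NOUNS:
        parses.append(f"{word[:-1]} +N +PL")
    return parses
```

For example, `parse_noun("cats")` returns `["cat +N +PL"]`, `parse_noun("geese")` returns `["goose +N +PL"]`, and an unknown word returns an empty list.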
24
Figure 3.3 FSA for nominal inflection
  • X
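
The figure itself is not reproduced in this transcript. As a rough sketch of what an FSA for nominal inflection might look like, the following assumes three states, arcs labeled by morpheme class, and input that has already been split into morphemes. The state names, arcs, and word lists are assumptions, not a copy of Figure 3.3.

```python
# Hypothetical morpheme classes.
REG_NOUN = {"cat", "fox", "dog"}
IRREG_SG_NOUN = {"goose", "mouse"}
IRREG_PL_NOUN = {"geese", "mice"}

# Assumed arcs: a regular noun may take plural -s; irregular forms
# go straight to a state with no outgoing arcs.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural-s"): "q2",
}
ACCEPTING = {"q1", "q2"}


def classify(morpheme):
    """Map a morpheme to its arc label, or None if it is unknown."""
    if morpheme in REG_NOUN:
        return "reg-noun"
    if morpheme in IRREG_SG_NOUN:
        return "irreg-sg-noun"
    if morpheme in IRREG_PL_NOUN:
        return "irreg-pl-noun"
    if morpheme == "-s":
        return "plural-s"
    return None


def accepts(morphemes):
    """Run the FSA over a list of morphemes such as ["cat", "-s"]."""
    state = "q0"
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, classify(morpheme)))
        if state is None:
            return False
    return state in ACCEPTING
```

Under these assumptions, `accepts(["cat", "-s"])` and `accepts(["geese"])` are true, while `accepts(["geese", "-s"])` is false because no arc leaves the final state.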

25
Figure 3.4 FSA for verbal inflection
26
Verb stem classes
  • X

27
Fig 3.5 FSA for Adjective morphology
  • Consider
  • Cool, happy, natural
  • Big, equal

29
Fig 3.6 FSA for another Fragment of English derivational morphology
  • X

30
3.7 English nouns with inflection
  • X

34
Fig 3.14 FSA for English nominalization
  • X
