Fall 2005 - PowerPoint PPT Presentation

Slides: 57
Provided by: rad75


1

EECS 595 / LING 541 / SI 661
Natural Language Processing
  • Fall 2005
  • Lecture Notes 4

2
Features and unification
3
Introduction
  • Grammatical categories have properties
  • Constraint-based formalisms
  • Example: "this flights" (agreement is difficult to
    handle at the level of grammatical categories)
  • Example: "many water" (count/mass nouns)
  • Sample rule that takes features into account: S →
    NP VP (but only if the number of the NP is equal
    to the number of the VP)

4
Feature structures
[CAT NP, NUMBER SINGULAR, PERSON 3]
[CAT NP, AGREEMENT [NUMBER SG, PERSON 3]]
Feature paths, e.g., ⟨AGREEMENT NUMBER⟩
5
Unification
  • [NUMBER SG] ⊔ [NUMBER SG] = [NUMBER SG]
  • [NUMBER SG] ⊔ [NUMBER PL] fails
  • [NUMBER SG] ⊔ [NUMBER [ ]] = [NUMBER SG]
  • [NUMBER SG] ⊔ [PERSON 3] = [NUMBER SG, PERSON 3]
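The four unification cases above can be reproduced with a small recursive function. This is a minimal sketch, assuming feature structures are encoded as nested Python dicts (atomic values as strings, `{}` for the empty structure [ ]); `unify` is our own name, and the sketch ignores reentrancy (shared substructures), which real unification grammars represent with DAGs.

```python
from typing import Optional, Union

# A feature structure is an atomic value (str) or a dict: feature -> structure.
FS = Union[str, dict]

def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they conflict."""
    # The empty structure [ ] is compatible with anything.
    if a == {}:
        return b
    if b == {}:
        return a
    # Two atomic values unify only if they are identical.
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None
    # An atom never unifies with a non-empty complex structure.
    if isinstance(a, str) or isinstance(b, str):
        return None
    # Two complex structures: merge them feature by feature.
    result = dict(a)
    for feature, value in b.items():
        if feature in result:
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        else:
            result[feature] = value
    return result

print(unify({"NUMBER": "SG"}, {"NUMBER": "SG"}))   # {'NUMBER': 'SG'}
print(unify({"NUMBER": "SG"}, {"NUMBER": "PL"}))   # None
print(unify({"NUMBER": "SG"}, {"NUMBER": {}}))     # {'NUMBER': 'SG'}
print(unify({"NUMBER": "SG"}, {"PERSON": "3"}))    # {'NUMBER': 'SG', 'PERSON': '3'}
```

The function builds new dicts and never modifies its arguments, so a failed unification leaves both inputs intact.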

6
Agreement
  • S → NP VP
    ⟨NP AGREEMENT⟩ = ⟨VP AGREEMENT⟩
  • Does this flight serve breakfast?
  • Do these flights serve breakfast?
  • S → Aux NP VP
    ⟨Aux AGREEMENT⟩ = ⟨NP AGREEMENT⟩

7
Agreement
  • These flights
  • This flight
  • NP → Det Nominal
    ⟨Det AGREEMENT⟩ = ⟨Nominal AGREEMENT⟩
  • Verb → serve
    ⟨Verb AGREEMENT NUMBER⟩ = PL
  • Verb → serves
    ⟨Verb AGREEMENT NUMBER⟩ = SG
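The constraint ⟨Det AGREEMENT⟩ = ⟨Nominal AGREEMENT⟩ can be checked with a flat version of unification. A minimal sketch, assuming a tiny hypothetical lexicon: `LEXICON`, `unify_flat`, and `np_agreement` are illustrative names, and treating "the" as unmarked for number is an assumption.

```python
from typing import Dict, Optional

# Hypothetical lexicon: AGREEMENT features of a few determiners and nominals.
LEXICON: Dict[str, Dict[str, str]] = {
    "this": {"NUMBER": "SG"},
    "these": {"NUMBER": "PL"},
    "the": {},                       # assumed unmarked for number
    "flight": {"NUMBER": "SG"},
    "flights": {"NUMBER": "PL"},
}

def unify_flat(a: Dict[str, str], b: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Unify two flat feature structures; None on a value clash."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result

def np_agreement(det: str, nominal: str) -> Optional[Dict[str, str]]:
    """Apply NP -> Det Nominal with <Det AGREEMENT> = <Nominal AGREEMENT>."""
    return unify_flat(LEXICON[det], LEXICON[nominal])

print(np_agreement("this", "flight"))    # {'NUMBER': 'SG'}
print(np_agreement("these", "flights"))  # {'NUMBER': 'PL'}
print(np_agreement("this", "flights"))   # None: the rule blocks "this flights"
print(np_agreement("the", "flights"))    # {'NUMBER': 'PL'}
```

The unified result is what the NP would pass up as its own AGREEMENT value, so "the flights" ends up plural even though "the" contributes no number.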

8
Subcategorization
  • VP → Verb
    ⟨VP HEAD⟩ = ⟨Verb HEAD⟩
    ⟨VP HEAD SUBCAT⟩ = INTRANS
  • VP → Verb NP
    ⟨VP HEAD⟩ = ⟨Verb HEAD⟩
    ⟨VP HEAD SUBCAT⟩ = TRANS
  • VP → Verb NP NP
    ⟨VP HEAD⟩ = ⟨Verb HEAD⟩
    ⟨VP HEAD SUBCAT⟩ = DITRANS
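The three VP rules above let a verb's SUBCAT value decide which rule may build its VP. A minimal sketch, assuming a hypothetical three-verb lexicon (`VERB_HEAD`) and a table (`RULE_SUBCAT`) that keys each rule by its number of NP complements; neither name comes from the slides.

```python
from typing import Dict, List, Optional

# Hypothetical lexicon: each verb's HEAD features include a SUBCAT value.
VERB_HEAD: Dict[str, Dict[str, str]] = {
    "sleep": {"SUBCAT": "INTRANS"},
    "serve": {"SUBCAT": "TRANS"},
    "give": {"SUBCAT": "DITRANS"},
}

# SUBCAT value demanded by each VP rule, keyed by its number of NP complements:
# VP -> Verb (0), VP -> Verb NP (1), VP -> Verb NP NP (2).
RULE_SUBCAT: Dict[int, str] = {0: "INTRANS", 1: "TRANS", 2: "DITRANS"}

def build_vp(verb: str, nps: List[str]) -> Optional[dict]:
    """Build a VP if the verb's SUBCAT matches the rule; otherwise None."""
    head = VERB_HEAD[verb]
    required = RULE_SUBCAT.get(len(nps))
    if required is None or head["SUBCAT"] != required:
        return None
    # <VP HEAD> = <Verb HEAD>: the VP shares the verb's head features.
    return {"CAT": "VP", "HEAD": head, "COMPLEMENTS": list(nps)}

print(build_vp("serve", ["breakfast"]) is not None)      # True
print(build_vp("sleep", ["breakfast"]) is not None)      # False
print(build_vp("give", ["me", "a flight"]) is not None)  # True
```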

9
Regular Expressions and Automata
10
Regular expressions
  • Searching for woodchuck
  • Searching for woodchucks with an optional final
    s
  • Regular expressions
  • Finite-state automata (singular automaton)

11
Regular expressions
  • Basic regular expression patterns
  • Perl-based syntax (slightly different from other
    notations for regular expressions)
  • Disjunctions: [abc]
  • Ranges: [A-Z]
  • Negations: [^Ss]
  • Optional characters: ? and *
  • Wild cards: .
  • Anchors: ^ and $, also \b and \B
  • Disjunction, grouping, and precedence
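Python's `re` module accepts the Perl-style patterns above as written, so one quick way to try them is a small helper, `first_match` (a name of our own), run over made-up strings:

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the leftmost substring of text that pattern matches, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

examples = [
    (r"woodchucks?", "two woodchucks"),   # optional final s -> 'woodchucks'
    (r"[wW]oodchuck", "Woodchuck"),       # disjunction -> 'Woodchuck'
    (r"[A-Z]", "we call it Drenched"),    # range -> 'D'
    (r"[^Ss]", "Sss!"),                   # negation -> '!'
    (r"colou?r", "color"),                # ? = zero or one -> 'color'
    (r"baa*!", "baaaa!"),                 # * = zero or more -> 'baaaa!'
    (r"beg.n", "begun"),                  # wild card -> 'begun'
    (r"^The", "The dog"),                 # start-of-line anchor -> 'The'
    (r"dog\.$", "The dog."),              # end-of-line anchor -> 'dog.'
    (r"\bthe\b", "other"),                # word boundary -> None
    (r"\Bthe", "other"),                  # non-boundary -> 'the'
    (r"gupp(y|ies)", "guppies"),          # grouping + disjunction -> 'guppies'
]

for pattern, text in examples:
    print(f"{pattern!r:16} {text!r:24} -> {first_match(pattern, text)!r}")
```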

12
Writing correct expressions
  • Exercise: write a Perl regular expression to
    match the English article "the"

/the/
/[tT]he/
/\b[tT]he\b/
/[^a-zA-Z][tT]he[^a-zA-Z]/
/(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)/
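Each refinement of the "the" pattern fixes a failure of the previous one. The loop below runs all five candidates in Python's `re` (same syntax here) against a few made-up strings and lists the strings each one gets wrong; the string "the25" is included because `\b` treats digits and underscores as word characters.

```python
import re

candidates = [
    r"the",
    r"[tT]he",
    r"\b[tT]he\b",
    r"[^a-zA-Z][tT]he[^a-zA-Z]",
    r"(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)",
]

# Made-up test strings: True means the article should be found.
tests = {
    "The cat": True,      # capitalized, at the start of the line
    "in the end": True,
    "the25": True,        # digits right after the word
    "other": False,       # "the" inside another word
    "theology": False,
}

for pattern in candidates:
    errors = [s for s, want in tests.items() if bool(re.search(pattern, s)) != want]
    print(f"{pattern:36} errors: {errors}")
```

Only the last pattern gets every string right: the `^` and `$` alternatives let it match "the" at the very start or end of a line, which the fourth pattern cannot.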
13
A more complex example
  • Exercise: write a regular expression that will
    match "any PC with more than 500 MHz and 32 Gb of
    disk space for less than $1000"

/\$[0-9]+/
/\$[0-9]+\.[0-9][0-9]/
/\$[0-9]+(\.[0-9][0-9])?\b/
/\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/
/\b[0-9]+ *(Mb|[Mm]egabytes?)\b/
/\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b/
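A quick check of the final price, speed, memory, and disk patterns on a made-up ad, using Python's `re`. Note that the price pattern has no `\b` before `\$`: "$" is not a word character, so `\b` could never match between a space and "$".

```python
import re

PATTERNS = {
    "price": r"\$[0-9]+(\.[0-9][0-9])?\b",
    "speed": r"\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b",
    "memory": r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b",
    "disk": r"\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b",
}

ad = "Desktop PC, 600 MHz, 64 Mb RAM, 40 Gb disk, only $999.99"  # made-up ad

for name, pattern in PATTERNS.items():
    m = re.search(pattern, ad)
    print(name, "->", m.group(0) if m else None)
```

These patterns only extract the figures; deciding "more than 500 MHz" or "less than $1000" still requires converting the matched numbers and comparing them.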
14
Advanced operators
15
Substitutions and memory
  • Substitutions

s/colour/color/
  • Memory (\1, \2, etc. refer back to matches)

s/([0-9]+)/<\1>/
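The same substitutions in Python's `re.sub`, plus a back-reference used inside a pattern. One difference to keep in mind: `re.sub` replaces every match, like Perl's `s///g`; pass `count=1` to get Perl's default of replacing only the first.

```python
import re

# s/colour/color/
print(re.sub(r"colour", "color", "the colour red"))      # the color red

# s/([0-9]+)/<\1>/ : \1 in the replacement is the first captured group
print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes"))       # the <35> boxes

# \1 inside the pattern: the second "...er" must repeat the first one
pattern = r"the (.*)er they were, the \1er they will be"
print(bool(re.search(pattern, "the bigger they were, the bigger they will be")))  # True
print(bool(re.search(pattern, "the bigger they were, the faster they will be")))  # False
```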
16
ELIZA (Weizenbaum, 1966)
  • User: Men are all alike
  • ELIZA: IN WHAT WAY
  • User: They're always bugging us about something
    or other
  • ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?
  • User: Well, my boyfriend made me come here
  • ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
  • User: He says I'm depressed much of the time
  • ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

17
Eliza-style regular expressions
Step 1: replace first-person references with
second-person references
Step 2: use additional regular expressions to generate replies
Step 3: use scores to rank possible transformations
  • s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
  • s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
  • s/.* all .*/IN WHAT WAY/
  • s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
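The three steps can be sketched in a few lines of Python. Only the four substitution patterns come from the slide; the pronoun table (`SWAPS`), the numeric scores, the extra "YOUR" echo rule, and the "PLEASE GO ON" fallback are hypothetical additions. Because every pattern has the form `.* ... .*` and covers the whole line, `re.fullmatch` followed by `Match.expand` behaves like the slide's whole-line `s///`.

```python
import re
from typing import List, Optional, Tuple

# Step 1: hypothetical first-person -> second-person swaps.
SWAPS = {"I": "YOU", "I'M": "YOU ARE", "ME": "YOU", "MY": "YOUR", "AM": "ARE"}

# Steps 2-3: (score, pattern, reply). The first four patterns are the slide's;
# the scores and the final YOUR echo rule are hypothetical.
RULES: List[Tuple[int, str, str]] = [
    (3, r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (2, r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
    (1, r".* all .*", "IN WHAT WAY"),
    (1, r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (0, r".* (YOUR .*)", r"\1"),
]

def swap_person(sentence: str) -> str:
    """Step 1: replace first-person words with second-person ones."""
    return " ".join(SWAPS.get(w.upper(), w) for w in sentence.split())

def eliza(sentence: str) -> str:
    # Pad with spaces so patterns like ".* all .*" also match at either end.
    text = " " + swap_person(sentence.rstrip(".!?")) + " "
    best: Optional[Tuple[int, str]] = None
    for score, pattern, reply in RULES:
        m = re.fullmatch(pattern, text)
        # Step 3: keep the highest-scoring transformation that applies.
        if m and (best is None or score > best[0]):
            best = (score, m.expand(reply))
    if best is None:
        return "PLEASE GO ON"  # hypothetical fallback
    return best[1].strip().upper()

for line in ["Men are all alike",
             "They're always bugging us about something or other",
             "Well, my boyfriend made me come here",
             "He says I'm depressed much of the time"]:
    print(eliza(line))
```

Following the slide's first rule, the last exchange comes out as I AM SORRY TO HEAR YOU ARE DEPRESSED, without the THAT of the original transcript.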

18
Finite-state automata
  • Finite-state automata (FSA)
  • Regular languages
  • Regular expressions

19
Finite-state automata (machines)
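baa! baaa! baaaa! baaaaa! ...
[Figure: FSA accepting this language (e.g., baa!): states q0 to q4, with q4 the final state; transitions q0 -b-> q1 -a-> q2 -a-> q3, a loop on a at q3, and q3 -!-> q4]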
20
Input tape
[Figure: input tape with cells a b a ! b, read by the machine starting in state q0]
21
Finite-state automata
  • Q: a finite set of N states q0, q1, …, qN−1
  • Σ: a finite input alphabet of symbols
  • q0: the start state
  • F: the set of final states, F ⊆ Q
  • δ(q, i): the transition function, which returns the
    next state given a state q and an input symbol i

22
State-transition tables
         Input
State    b    a    !
0        1    ∅    ∅
1        ∅    2    ∅
2        ∅    3    ∅
3        ∅    3    4
4        ∅    ∅    ∅
(∅ = no transition; 4 is the final state)
23
The FSM toolkit and friends
  • Developed at AT&T Research (Riley, Pereira,
    Mohri, Sproat)
  • Download: http://www.research.att.com/sw/tools/fsm/tech.html
    and http://www.research.att.com/sw/tools/lextools/
  • Tutorial available
  • 4 useful parts: FSM, Lextools, GRM, Dot
    (separate)
  • /data2/tools/fsm-3.6/bin
  • /data2/tools/lextools/bin
  • /data2/tools/dot/bin

24
D-RECOGNIZE
function D-RECOGNIZE(tape, machine) returns accept or reject
  index ← Beginning of tape
  current-state ← Initial state of machine
  loop
    if End of input has been reached then
      if current-state is an accept state then
        return accept
      else
        return reject
    elsif transition-table[current-state, tape[index]] is empty then
      return reject
    else
      current-state ← transition-table[current-state, tape[index]]
      index ← index + 1
  end
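D-RECOGNIZE translates almost line for line into Python. A minimal sketch, assuming the transition table is a dict keyed by (state, symbol), with a missing key standing in for an empty (∅) cell; the table encodes the sheep-language machine, and `d_recognize` is our own name.

```python
from typing import Dict, FrozenSet, Tuple

# Sheep-language DFSA: a missing key plays the role of an empty cell.
TABLE: Dict[Tuple[int, str], int] = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT: FrozenSet[int] = frozenset({4})

def d_recognize(tape: str,
                table: Dict[Tuple[int, str], int] = TABLE,
                start: int = 0,
                accept: FrozenSet[int] = ACCEPT) -> bool:
    """Return True (accept) or False (reject), following D-RECOGNIZE."""
    current_state = start
    for index in range(len(tape)):
        next_state = table.get((current_state, tape[index]))
        if next_state is None:          # empty table cell: reject
            return False
        current_state = next_state
    # End of input: accept only if we stopped in an accept state.
    return current_state in accept

for s in ["baa!", "baaaaa!", "ba!", "baa", "abab"]:
    print(s, d_recognize(s))
```

Because the loop is driven by the tape length, the "End of input has been reached" test becomes the code after the loop.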
25
Adding a failing state
[Figure: the sheep-language FSA (states q0 to q4) with an added fail state qF; the missing b, a, and ! transitions now lead to qF]
26
Languages and automata
  • Formal languages: regular languages, non-regular
    languages
  • Deterministic vs. non-deterministic FSAs
  • Epsilon (ε) transitions

27
Using NFSAs to accept strings
  • Backup: add markers at choice points, then
    possibly revisit unexplored markers
  • Look-ahead: look ahead in the input
  • Parallelism: look at alternatives in parallel

28
Using NFSAs
         Input
State    b    a     !    ε
0        1    ∅     ∅    ∅
1        ∅    2     ∅    ∅
2        ∅    2,3   ∅    ∅
3        ∅    ∅     4    ∅
4        ∅    ∅     ∅    ∅
(∅ = no transition; 4 is the final state)
29
More about FSAs
  • Transducers
  • Equivalence of DFSAs and NFSAs
  • Recognition as search: depth-first,
    breadth-first
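Recognition as search can be sketched with an agenda of (state, tape index) pairs: popping from the end of the agenda gives depth-first search, popping from the front gives breadth-first search. A minimal sketch, assuming the NFSA state-transition table for the sheep language (state 2 on "a" may go to 2 or 3) encoded as a dict of sets; `nd_recognize` and its `breadth_first` flag are our own names.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

EPSILON = "ε"

# NFSA for the sheep language; each cell holds a set of next states.
NTABLE: Dict[Tuple[int, str], Set[int]] = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
ACCEPT: FrozenSet[int] = frozenset({4})

def nd_recognize(tape: str,
                 table: Dict[Tuple[int, str], Set[int]] = NTABLE,
                 start: int = 0,
                 accept: FrozenSet[int] = ACCEPT,
                 breadth_first: bool = False) -> bool:
    """Search the space of (state, tape index) pairs for an accepting path."""
    agenda = deque([(start, 0)])
    visited = set()
    while agenda:
        # Stack discipline = depth-first; queue discipline = breadth-first.
        state, index = agenda.popleft() if breadth_first else agenda.pop()
        if (state, index) in visited:
            continue
        visited.add((state, index))
        if index == len(tape) and state in accept:
            return True
        # Epsilon transitions consume no input.
        for nxt in table.get((state, EPSILON), ()):
            agenda.append((nxt, index))
        if index < len(tape):
            for nxt in table.get((state, tape[index]), ()):
                agenda.append((nxt, index + 1))
    return False

print(nd_recognize("baaa!"), nd_recognize("baaa!", breadth_first=True))
```

The `visited` set keeps the search from looping forever on ε-cycles or revisiting the same choice point.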

30
Recognition using NFSAs
31
Regular languages
  • Operations on regular languages and FSAs:
    concatenation, closure, union
  • Properties of regular languages (closed under
    concatenation, union (disjunction), intersection,
    difference, complementation, reversal, Kleene
    closure)
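Closure under intersection has a constructive proof: run both automata in lockstep, so that each state of the new machine is a pair of states. A minimal sketch of this product construction for DFAs given as (table, start, finals) triples; the "even length" machine is a made-up second language, and `intersect` and `accepts` are our own names.

```python
from typing import Any, Dict, FrozenSet, Iterable, Tuple

DFA = Tuple[Dict[Tuple[Any, str], Any], Any, FrozenSet[Any]]

def intersect(d1: DFA, d2: DFA, alphabet: Iterable[str]) -> DFA:
    """Product construction: a DFA accepting exactly the strings both accept."""
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    symbols = list(alphabet)
    start = (s1, s2)
    table: Dict[Tuple[Any, str], Any] = {}
    finals = set()
    todo, seen = [start], {start}
    while todo:                                # explore only reachable pairs
        p, q = todo.pop()
        if p in f1 and q in f2:                # final only if both are final
            finals.add((p, q))
        for a in symbols:
            if (p, a) in t1 and (q, a) in t2:  # both machines must move
                nxt = (t1[(p, a)], t2[(q, a)])
                table[((p, q), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return table, start, frozenset(finals)

def accepts(dfa: DFA, s: str) -> bool:
    table, state, finals = dfa
    for ch in s:
        if (state, ch) not in table:
            return False
        state = table[(state, ch)]
    return state in finals

SHEEP: DFA = ({(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4},
              0, frozenset({4}))
EVEN: DFA = ({(q, a): 1 - q for q in (0, 1) for a in "ba!"}, 0, frozenset({0}))
BOTH = intersect(SHEEP, EVEN, "ba!")

for s in ["baa!", "baaa!", "baaaa!"]:
    print(s, accepts(BOTH, s))
```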

32
An exercise
  • JM 2.8. Write a regular expression for the
    language accepted by the NFSA in the Figure.

33
Morphology and Finite-State Transducers
34
Morphemes
  • Stems, affixes
  • Affixes: prefixes, suffixes, infixes (hingi
    'borrow' → humingi (agent) in Tagalog),
    circumfixes (sagen → gesagt in German)
  • Concatenative morphology
  • Templatic morphology (Semitic languages)
  • lmd (learn), lamad (he studied), limed (he
    taught), lumad (he was taught)

35
Morphological analysis
  • rewrites
  • unbelievably

36
Inflectional morphology
  • Tense, number, person, mood, aspect
  • Five verb forms in English
  • 40 forms in French
  • Six cases in Russian:
    http://www.departments.bucknell.edu/russian/language/case.html
  • Up to 40,000 forms in Turkish (you cause X to
    cause Y to do Z)

37
Derivational morphology
  • Nominalization: computerization, appointee,
    killer, fuzziness
  • Formation of adjectives: computational,
    embraceable, clueless

38
Finite-state morphological parsing
  • Cats: cat +N +PL
  • Cat: cat +N +SG
  • Cities: city +N +PL
  • Geese: goose +N +PL
  • Ducks: (duck +N +PL) or (duck +V +3SG)
  • Merging: merge +V +PRES-PART
  • Caught: (catch +V +PAST-PART) or (catch +V +PAST)

39
Principles of morphological parsing
  • Lexicon
  • Morphotactics (e.g., plural follows noun)
  • Orthography (easy → easier)
  • Irregular nouns: e.g., geese, sheep, mice
  • Irregular verbs: e.g., caught, ate, eaten

40
FSA for adjectives
  • Big, bigger, biggest
  • Cool, cooler, coolest, coolly
  • Red, redder, reddest
  • Clear, clearer, clearest, clearly, unclear,
    unclearly
  • Happy, happier, happiest, happily
  • Unhappy, unhappier, unhappiest, unhappily
  • What about unbig, redly, and realest?
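One simple adjective FSA, assumed here for illustration, is q0 -(un- or ε)-> q1 -adjective root-> q2 -(-er, -est, or -ly)-> q3, with q2 and q3 final. The sketch below runs it over pre-segmented morpheme lists (so spelling changes such as big → bigger are ignored); the root list is taken from the slide's examples. It answers the slide's question in one direction: this machine accepts unbig, redly, and realest, i.e., it over-generates.

```python
from typing import List

# Roots taken from the slide's examples; inputs are pre-segmented morphemes.
ADJ_ROOTS = {"big", "cool", "red", "clear", "happy", "real"}
SUFFIXES = {"er", "est", "ly"}
FINAL_STATES = {"q2", "q3"}

def adjective_fsa(morphemes: List[str]) -> bool:
    """Run q0 -(un | ε)-> q1 -adj-root-> q2 -(er | est | ly)-> q3."""
    state = "q0"
    for m in morphemes:
        if state == "q0" and m == "un":
            state = "q1"
        elif state in ("q0", "q1") and m in ADJ_ROOTS:  # q0 reaches q1 via ε
            state = "q2"
        elif state == "q2" and m in SUFFIXES:
            state = "q3"
        else:
            return False
    return state in FINAL_STATES

for word in [["un", "happy", "est"], ["un", "big"], ["red", "ly"],
             ["real", "est"], ["happy", "un"]]:
    print("+".join(word), adjective_fsa(word))
```

Blocking unbig or redly would require splitting the roots into classes that allow different affixes, which is the usual next refinement of such a machine.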

41
Using FSA for recognition
  • Is a string a legitimate word or not?
  • Two-level morphology: lexical level and surface
    level (Koskenniemi 1983)
  • Finite-state transducers (FSTs) are used for regular
    relations
  • Inversion and composition of FSTs

42
Orthographic rules
  • Beg/begging
  • Make/making
  • Watch/watches
  • Try/tries
  • Panic/panicked
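The five pairs above each illustrate one spelling rule: consonant doubling, e-deletion, e-insertion, y-replacement, and k-insertion. A minimal sketch of those rules as ordered string rewrites (not as FSTs); the conditions are simplified assumptions, e.g., doubling is only tried for one-syllable consonant-vowel-consonant stems and stress is ignored. `attach` is our own name.

```python
import re

VOWELS = "aeiou"

def attach(stem: str, suffix: str) -> str:
    """Join stem and suffix, applying a few simplified spelling rules."""
    # k-insertion: panic + ed -> panicked
    if stem.endswith("ic") and suffix[0] in "ei":
        return stem + "k" + suffix
    # e-insertion: watch + s -> watches (after s, z, x, ch, sh)
    if suffix == "s" and re.search(r"(s|z|x|ch|sh)$", stem):
        return stem + "es"
    # y-replacement after a consonant: try + s -> tries, happy + ly -> happily
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
        if suffix == "s":
            return stem[:-1] + "ies"
        if not suffix.startswith("i"):      # keep y before i: trying
            return stem[:-1] + "i" + suffix
    # e-deletion: make + ing -> making (silent e before a vowel)
    if stem.endswith("e") and not stem.endswith("ee") and suffix[0] in VOWELS:
        return stem[:-1] + suffix
    # consonant doubling: beg + ing -> begging (one-syllable CVC stems only)
    if suffix[0] in VOWELS and re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):
        return stem + stem[-1] + suffix
    return stem + suffix

for stem, suffix in [("beg", "ing"), ("make", "ing"), ("watch", "s"),
                     ("try", "s"), ("panic", "ed")]:
    print(f"{stem} + {suffix} -> {attach(stem, suffix)}")
```

The rule order matters: k-insertion and e-insertion are checked before the more general doubling rule so that, for example, panic + ed does not fall through to a plain concatenation.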

43
Combining FST lexicon and rules
  • Cascades of transducers: the output of one
    becomes the input of another
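A minimal sketch of such a cascade, with plain Python functions standing in for two transducers: a lexicon step that maps a lexical form like "fox +N +PL" to an intermediate form, and an e-insertion rule step that maps that to the surface form. The intermediate notation (^ for a morpheme boundary, # for a word boundary) and the tag-to-suffix table are assumptions for illustration; real FSTs would also run in reverse for parsing, which these functions do not.

```python
import re

# Step 1 (lexicon): lexical form -> intermediate form. Hypothetical tag table.
SUFFIX = {("+N", "+SG"): "#", ("+N", "+PL"): "^s#"}

def lexicon_fst(lexical: str) -> str:
    stem, *tags = lexical.split()
    return stem + SUFFIX[tuple(tags)]

# Step 2 (e-insertion): insert e after x, s, or z before ^s#,
# then erase the boundary symbols.
def e_insertion_fst(intermediate: str) -> str:
    with_e = re.sub(r"([xsz])\^s#", r"\1^es#", intermediate)
    return with_e.replace("^", "").replace("#", "")

def cascade(lexical: str) -> str:
    # The output of the first step is the input of the second.
    return e_insertion_fst(lexicon_fst(lexical))

for form in ["cat +N +SG", "cat +N +PL", "fox +N +PL"]:
    print(form, "->", lexicon_fst(form), "->", cascade(form))
```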

44
Weighted Automata
45
Phonetic symbols
  • IPA
  • Arpabet
  • Examples

46
Using WFST for language modeling
  • Phonetic representation
  • Part-of-speech tagging

47
Word Classes and Part-of-Speech Tagging
48
Some POS statistics
  • Preposition list from COBUILD
  • Single-word particles
  • Conjunctions
  • Pronouns
  • Modal verbs

49
Tagsets for English
  • Penn Treebank
  • Other tagsets (see Week 1 slides)

50
POS ambiguity
  • Degrees of ambiguity (DeRose 1988)
  • Rule-based POS tagging
  • ENGTWOL (Voutilainen et al.)
  • Sample rule
  • Adverbial-that rule ("it isn't that odd"):
    Given input: "that"
    if (+1 A/ADV/QUANT)
       (+2 SENT-LIM)
       (NOT -1 SVOC/A) (not a verb like "consider")
    then eliminate non-ADV tags
    else eliminate ADV tag
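The Adverbial-that rule is a constraint over neighboring tokens' candidate tags. A minimal sketch of applying it to one token, assuming each token carries a set of remaining candidate tags; the tag names (including the four candidates for "that") are simplified stand-ins rather than ENGTWOL's actual tag inventory, and `adverbial_that` is our own name.

```python
from typing import List, Set, Tuple

Token = Tuple[str, Set[str]]  # (word, remaining candidate tags)

def adverbial_that(tokens: List[Token], i: int) -> Set[str]:
    """Apply the Adverbial-that rule to tokens[i]; return its surviving tags."""
    word, tags = tokens[i]
    if word.lower() != "that":
        return tags

    def tags_at(j: int) -> Set[str]:
        return tokens[j][1] if 0 <= j < len(tokens) else set()

    if (tags_at(i + 1) & {"A", "ADV", "QUANT"}      # (+1 A/ADV/QUANT)
            and "SENT-LIM" in tags_at(i + 2)        # (+2 SENT-LIM)
            and "SVOC/A" not in tags_at(i - 1)):    # (NOT -1 SVOC/A)
        return tags & {"ADV"}                       # eliminate non-ADV tags
    return tags - {"ADV"}                           # eliminate ADV tag

THAT = {"ADV", "DET", "PRON", "CS"}  # simplified candidate tags for "that"
s1 = [("it", {"PRON"}), ("isn't", {"V"}), ("that", THAT),
      ("odd", {"A"}), (".", {"SENT-LIM"})]
s2 = [("I", {"PRON"}), ("consider", {"V", "SVOC/A"}), ("that", THAT),
      ("odd", {"A"}), (".", {"SENT-LIM"})]
print(adverbial_that(s1, 2))          # {'ADV'}
print(sorted(adverbial_that(s2, 2)))  # ['CS', 'DET', 'PRON']
```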

51
Evaluating POS taggers
  • Percent correct
  • What is the lower bound on a system's
    performance?
  • What about the upper bound?
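Percent correct is the fraction of tokens whose predicted tag matches the gold-standard tag. A common reference point when judging that number is a most-frequent-tag baseline, which tags every word with the tag it carried most often in training. A minimal sketch on tiny made-up data; the function names and the choice of NN for unseen words are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Percent correct, as a fraction of tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def train_most_frequent_tag(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each training word to the tag it carries most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words: List[str], model: Dict[str, str],
                 unknown: str = "NN") -> List[str]:
    """Tag each word with its most frequent training tag."""
    return [model.get(w.lower(), unknown) for w in words]

# Tiny made-up training data with Penn Treebank tags.
train = [("the", "DT"), ("race", "NN"), ("race", "NN"), ("race", "VB"), ("to", "TO")]
model = train_most_frequent_tag(train)
words, gold = ["to", "race"], ["TO", "VB"]
print(accuracy(gold, baseline_tag(words, model)))  # 0.5
```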

52
Kappa
  • N: number of items (index i)
  • n: number of categories (index j)
  • k: number of annotators
  • When κ > 0.8, agreement is considered high
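The N/n/k notation matches the multi-annotator (Fleiss-style) kappa, κ = (P(A) − P(E)) / (1 − P(E)), where P(A) is the mean observed agreement between pairs of annotators and P(E) is the agreement expected by chance from the overall category proportions. A minimal sketch, assuming every item is labeled by the same k ≥ 2 annotators and the input is an N × n matrix of counts; the function name and input layout are our own.

```python
from typing import Sequence

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = how many of the k annotators put item i in category j."""
    N = len(counts)              # number of items (index i)
    n = len(counts[0])           # number of categories (index j)
    k = sum(counts[0])           # number of annotators per item
    if k < 2 or any(sum(row) != k for row in counts):
        raise ValueError("each item needs the same number k >= 2 of labels")
    # p_j: share of all N*k labels that went to category j
    p = [sum(row[j] for row in counts) / (N * k) for j in range(n)]
    # P_i: fraction of annotator pairs that agree on item i
    P = [(sum(c * c for c in row) - k) / (k * (k - 1)) for row in counts]
    p_agree = sum(P) / N                 # P(A): observed agreement
    p_chance = sum(pj * pj for pj in p)  # P(E): agreement expected by chance
    return (p_agree - p_chance) / (1 - p_chance)

print(fleiss_kappa([[3, 0], [0, 3]]))   # 1.0
```

κ is undefined (division by zero) when every label falls into a single category, since P(E) is then 1.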

53
Midterm reading list
  • Chapter 1: Introduction
  • Chapter 2: Regular expressions and automata
  • Chapter 3: Morphology and finite-state
    transducers; FSM tutorial
  • Chapter 8: Word classes and POS tagging
  • Chapter 9: Context-free grammars for English
  • Chapter 10: Parsing with context-free grammars
  • Chapter 11: Features and unification

54
Syntaxscape
  • Written by Juno Suk of Lucent
  • http://www.cs.columbia.edu/radev/syntaxscape/

56
Read by yourselves
  • 9.9. Spoken language syntax
  • 9.10. Grammar equivalence
  • 9.11. Finite-state and context-free grammars