1
LING 438/538 Computational Linguistics
  • Sandiway Fong
  • Lecture 24 11/26

2
Lecture Schedule
  • Remaining Lectures
  • Nov 26th (today)
  • Grammar homework due today
  • Britney Spears homework returned later today
  • Nov 29th (Thursday) Ontologies/WordNet
  • Dec 4th (538 Presentations)
  • you must email me your slides (powerpoint or pdf)
    by midnight Dec 3rd
  • 8 mins total per presentation (WAS 10)
  • (you should practice/aim to finish in that time)
  • (no laptop switching to save time)
  • we will still run over class time...

3
Lecture Schedule
  • Final
  • prep session needed?
  • READING DAY (next Thursday) perhaps?
  • 11th December
  • 11am-1pm slot (here)
  • Topics (5 out of 7 - 538, 4 out of 7 - 438)
    tentatively
  • Finite state transducers (FST)
  • Porter stemmer
  • edit distance
  • natural language grammar (like the last homework)
  • parsing methods (today's lecture)
  • n-gram/smoothing computation
  • WordNet

4
Presentation Chapters
5
Chapters (2000 edition)
  • II Syntax
  • 11 Features and Unification ?
  • Jeff Berry
  • 12 Lexicalized and Probabilistic Parsing
  • Mary Dungan (human parsing)
  • 13 Language and Complexity
  • Roeland Hancock
  • III Semantics
  • 14 Representing Meaning
  • Mark Siner (FOPC)
  • 15 Semantic Analysis
  • Sean Humphreys (named entity recognition)
  • 16 Lexical Semantics
  • 17 Word Sense Disambiguation and Information
    Retrieval
  • Sunjing Ji
  • HsinMin Lu

6
Chapters (2000 edition)
  • IV Pragmatics
  • 18 Discourse
  • 19 Dialog and Conversational Agents
  • 20 Natural Language Generation
  • V Multilingual Processing
  • 21 Machine Translation
  • Dainon Woudstra
  • Other
  • Biomedical Information
  • Tara Paulsen

7
Parsing Methods
  • Background reading
  • Chapter 10
  • Parsing with Context-Free Grammars

8
Top-Down Parsing
  • we already know one top-down parsing algorithm
  • DCG rule system starting at the top node
  • using the Prolog computation rule
  • always try the first matching rule
  • expand x --> y, z.
  • top-down x then y and z
  • left-to-right do y then z
  • depth-first expands DCG rules for y before
    tackling z
  • problems
  • left-recursion
  • gives termination problems
  • no bottom-up filtering
  • inefficient
  • left-corner idea
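
The left-recursion problem can be seen on a small fragment of the lecture grammar. The following is a minimal sketch, assuming a Prolog with standard DCG translation and phrase/2 (for example SWI-Prolog); the fragment and the sample queries are chosen for illustration only.

```prolog
% Fragment of the lecture grammar, including its left-recursive nominal rule.
nominal(nom(N))    --> noun(N).
nominal(nom(N,PP)) --> nominal(N), pp(PP).   % left-recursive: nominal calls nominal first

pp(pp(P,NP)) --> preposition(P), np(NP).
np(np(D,N))  --> det(D), nominal(N).

det(det(a))             --> [a].
noun(noun(flight))      --> [flight].
noun(noun(meal))        --> [meal].
preposition(prep(from)) --> [from].

% Asking for the first solution only is safe here, because the
% non-recursive rule is listed first and is tried first:
%
%   ?- once(phrase(nominal(T), [flight])).
%   ?- once(phrase(nominal(T), [flight,from,a,meal])).
%
% Asking for further solutions with ; (or listing the left-recursive rule
% first) makes nominal//1 call itself on the same input without consuming
% a word, so the depth-first search does not terminate.
```

Rule order only postpones the problem: the first answer is found, but exhausting the search space still loops.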

9
Top-Down Parsing
  • Prolog computation rule
  • is equivalent to the stack-based algorithm shown
    in the textbook
  • (figure 10.6)
  • Prolog advantage
  • we don't need to implement this explicitly
  • this is default Prolog strategy

10
Top-Down Parsing
  • assume grammar

11
Top-Down Parsing
mismatch
  • example
  • does this flight include a meal?

12
Top-Down Parsing
  • example
  • does this flight include a meal?

13
Top-Down Parsing
  • example
  • does this flight include a meal?
  • query
  • ?- s(X,[does,this,flight,include,a,meal],[]).
  • X = s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal)))))

14
Top-Down Parsing
  • Prolog grammar
  • s(s(NP,VP)) --> np(NP), vp(VP).
  • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP).
  • s(s(VP)) --> vp(VP).
  • np(np(D,N)) --> det(D), nominal(N).
  • nominal(nom(N)) --> noun(N).
  • nominal(nom(N1,N)) --> noun(N1), nominal(N).
  • np(np(PN)) --> propernoun(PN).
  • vp(vp(V)) --> verb(V).
  • vp(vp(V,NP)) --> verb(V), np(NP).
  • det(det(that)) --> [that].
  • det(det(this)) --> [this].
  • det(det(a)) --> [a].
  • noun(noun(book)) --> [book].
  • noun(noun(flight)) --> [flight].
  • noun(noun(meal)) --> [meal].
  • noun(noun(money)) --> [money].
  • verb(verb(book)) --> [book].
  • verb(verb(include)) --> [include].
  • verb(verb(prefer)) --> [prefer].
  • aux(aux(does)) --> [does].
  • preposition(prep(from)) --> [from].
  • preposition(prep(to)) --> [to].
  • preposition(prep(on)) --> [on].
  • propernoun(propn(houston)) --> [houston].
  • propernoun(propn(twa)) --> [twa].
  • nominal(nom(N,PP)) --> nominal(N), pp(PP).
  • pp(pp(P,NP)) --> preposition(P), np(NP).

15
Top-Down Parsing
  • example
  • does this flight include a meal?
  • gain in efficiency
  • avoid computing the first row of Figure 10.7

16
Top-Down Parsing
  • no bottom-up filtering
  • left-corner idea
  • eliminate unnecessary top-down search
  • reduce the number of choice points (amount of
    branching)
  • example
  • does this flight include a meal?
  • computation
  • s --> np, vp.
  • s --> aux, np, vp.
  • s --> vp.
  • left-corner idea rules out 1 and 3

17
Left Corner Parsing
  • need bottom-up filtering
  • filter top-down rule expansion using bottom-up
    information
  • current input is the bottom-up information
  • left-corner idea
  • example
  • s(s(NP,VP)) --> np(NP), vp(VP).
  • what terminals can be used to begin this phrase?
  • answer: whatever can begin NP
  • np(np(D,N)) --> det(D), nominal(N).
  • np(np(PN)) --> propernoun(PN).
  • answer: whatever can begin Det or ProperNoun
  • det(det(that)) --> [that].
  • det(det(this)) --> [this].
  • det(det(a)) --> [a].
  • propernoun(propn(houston)) --> [houston].
  • propernoun(propn(twa)) --> [twa].
  • answer
  • {that,this,a,houston,twa} = Left Corner
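
The left-corner set of a rule can be computed by following the first symbol of each rule body down to the lexicon. The following is a minimal sketch, assuming SWI-Prolog; the rule/2 and word/2 encodings of the lecture grammar and the predicate names first_word and lc_set are hypothetical, introduced only for this illustration.

```prolog
:- use_module(library(lists)).

% rule(LHS, RHS): context-free skeleton of the lecture's phrase rules.
rule(s,  [np, vp]).
rule(s,  [aux, np, vp]).
rule(s,  [vp]).
rule(np, [det, nominal]).
rule(np, [propernoun]).
rule(vp, [verb]).
rule(vp, [verb, np]).
rule(nominal, [noun]).
rule(nominal, [nominal, pp]).          % left-recursive

% word(Category, Word): the relevant part of the lexicon.
word(det, that).  word(det, this).  word(det, a).
word(propernoun, houston).  word(propernoun, twa).
word(aux, does).
word(verb, book).  word(verb, include).  word(verb, prefer).
word(noun, book).  word(noun, flight).  word(noun, meal).  word(noun, money).

% first_word(Cat, Word): Word can begin a phrase of category Cat.
first_word(Cat, Word) :- first_word(Cat, [], Word).

first_word(Cat, _, Word) :- word(Cat, Word).
first_word(Cat, Seen, Word) :-
    rule(Cat, [First|_]),
    \+ memberchk(First, [Cat|Seen]),   % skip categories already being expanded
    first_word(First, [Cat|Seen], Word).

% lc_set(RHS, Words): sorted left-corner set of a rule body.
lc_set([First|_], Words) :-
    findall(W, first_word(First, W), Ws),
    sort(Ws, Words).
```

With these definitions, `?- lc_set([np,vp], Ws).` gives `Ws = [a,houston,that,this,twa]`, the same set as on the slide. The Seen list is what keeps the left-recursive nominal rule from looping.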

[Figure: tree with s branching to np and vp, and np branching to det nominal or to propernoun]
18
Left Corner Parsing
  • example
  • does this flight include a meal?
  • computation
  • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa}
  • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC: {does}
  • s(s(VP)) --> vp(VP). LC: {book,include,prefer}
  • only rule 2 is compatible with the input
  • match first input terminal against left-corner
    (LC) set for each possible matching rule
  • left-corner idea prunes away or rules out options
    1 and 3

19
Left Corner Parsing
  • DCG Rules
  • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa}
  • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC: {does}
  • s(s(VP)) --> vp(VP). LC: {book,include,prefer}
  • left-corner database facts
  • lc(rule,[word|_],[word|_]).
  • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]).
  • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]).
  • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]).
  • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]).
  • lc(1,[twa|L],[twa|L]).
  • rewrite Prolog rules to check input against lc
  • s(s(NP,VP)) --> lc(1), np(NP), vp(VP).
  • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP).
  • s(s(VP)) --> lc(3), vp(VP).

20
Left Corner Parsing
  • left-corner database facts
  • lc(rule,[word|_],[word|_]).
  • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]).
  • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]).
  • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]).
  • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]).
  • lc(1,[twa|L],[twa|L]).
  • rewrite DCG rules to check input against lc/3
  • s(s(NP,VP)) --> lc(1), np(NP), vp(VP).
  • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP).
  • s(s(VP)) --> lc(3), vp(VP).
  • DCG rules are translated into underlying Prolog
    rules
  • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D).
  • s(s(A,B,C), D, E) :- lc(2, D, F), aux(A, F, G), np(B, G, H), vp(C, H, E).
  • s(s(A), B, C) :- lc(3, B, D), vp(A, D, C).
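
Put together, the lc/3 facts and the rewritten s rules form a runnable program. The following is a minimal sketch, assuming a Prolog with standard DCG translation (for example SWI-Prolog); only the s rules are filtered, and the recursive nominal rules and the prepositions are left out to keep it short.

```prolog
% Left-corner facts: lc(RuleNumber, Input, Input).
% The input list is inspected but not consumed.
lc(1, [that|L],    [that|L]).
lc(1, [this|L],    [this|L]).
lc(1, [a|L],       [a|L]).
lc(1, [houston|L], [houston|L]).
lc(1, [twa|L],     [twa|L]).
lc(2, [does|L],    [does|L]).
lc(3, [book|L],    [book|L]).
lc(3, [include|L], [include|L]).
lc(3, [prefer|L],  [prefer|L]).

% s rules with the left-corner check placed first.
s(s(NP,VP))     --> lc(1), np(NP), vp(VP).
s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP).
s(s(VP))        --> lc(3), vp(VP).

% Remaining phrase rules, unfiltered in this sketch.
np(np(D,N))     --> det(D), nominal(N).
np(np(PN))      --> propernoun(PN).
nominal(nom(N)) --> noun(N).
vp(vp(V))       --> verb(V).
vp(vp(V,NP))    --> verb(V), np(NP).

% Lexicon rules need no lc call: the word itself is the left corner.
det(det(that)) --> [that].
det(det(this)) --> [this].
det(det(a))    --> [a].
noun(noun(book))   --> [book].
noun(noun(flight)) --> [flight].
noun(noun(meal))   --> [meal].
noun(noun(money))  --> [money].
verb(verb(book))    --> [book].
verb(verb(include)) --> [include].
verb(verb(prefer))  --> [prefer].
aux(aux(does)) --> [does].
propernoun(propn(houston)) --> [houston].
propernoun(propn(twa))     --> [twa].
```

The query `?- phrase(s(X), [does,this,flight,include,a,meal]).` fails at lc(1) before np is ever called, succeeds through rule 2, and returns the same tree as the unfiltered grammar.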

21
Left Corner Parsing
  • Summary
  • Given a context-free DCG
  • Generate left-corner database facts
  • lc(rule,[word|_],[word|_]).
  • Rewrite DCG rules to check input against lc
  • s(s(NP,VP)) --> lc(1), np(NP), vp(VP).
  • DCG rules are translated into underlying Prolog
    rules
  • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D).
  • This process can be done automatically (by
    program)
  • Note
  • not all rules need be rewritten
  • lexicon rules are direct left-corner rules
  • no filtering is necessary
  • det(det(a)) --> [a].
  • noun(noun(book)) --> [book].
  • i.e. no need to call lc as in
  • det(det(a)) --> lc(11), [a].
  • noun(noun(book)) --> lc(12), [book].

22
Left Corner Parsing
  • lc(1, [that|A], [that|A]).
  • lc(1, [this|A], [this|A]).
  • lc(1, [a|A], [a|A]).
  • lc(1, [houston|A], [houston|A]).
  • lc(1, [twa|A], [twa|A]).
  • lc(2, [does|A], [does|A]).
  • lc(3, [book|A], [book|A]).
  • lc(3, [include|A], [include|A]).
  • lc(3, [prefer|A], [prefer|A]).
  • lc(3, [book|A], [book|A]).
  • lc(3, [include|A], [include|A]).
  • lc(3, [prefer|A], [prefer|A]).
  • lc(4, [that|A], [that|A]).
  • lc(4, [this|A], [this|A]).
  • lc(4, [a|A], [a|A]).
  • lc(5, [book|A], [book|A]).
  • lc(5, [flight|A], [flight|A]).
  • lc(5, [meal|A], [meal|A]).
  • lc(5, [money|A], [money|A]).
  • s(s(_549,_550)) --> lc(1), np(_549), vp(_550).
  • s(s(_554,_555,_556)) --> lc(2), aux(_554), np(_555), vp(_556).
  • s(s(_544)) --> lc(3), vp(_544).
  • np(np(_549,_550)) --> lc(4), det(_549), nominal(_550).
  • nominal(nom(_544)) --> lc(5), noun(_544).
  • nominal(nom(_549,_550)) --> lc(6), noun(_549), nominal(_550).
  • np(np(_544)) --> lc(7), propernoun(_544).
  • vp(vp(_544)) --> lc(8), verb(_544).
  • vp(vp(_549,_550)) --> lc(9), verb(_549), np(_550).
  • nominal(nom(_549,_550)) --> lc(5), nominal(_549), pp(_550).
  • pp(pp(_549,_550)) --> lc(27), preposition(_549), np(_550).
  • det(det(that)) --> [that].
  • det(det(this)) --> [this].
  • det(det(a)) --> [a].
  • noun(noun(book)) --> [book].
  • noun(noun(flight)) --> [flight].
  • noun(noun(meal)) --> [meal].
  • noun(noun(money)) --> [money].

23
Left Corner Parsing
  • Prolog query
  • ?- s(X,[does,this,flight,include,a,meal],[]).
  • 1 1 Call: s(_430,[does,this,flight,include,a,meal],[]) ?
  • 2 2 Call: lc(1,[does,this,flight,include,a,meal],_1100) ?
  • 2 2 Fail: lc(1,[does,this,flight,include,a,meal],_1100) ?
  • 3 2 Call: lc(2,[does,this,flight,include,a,meal],_1107) ?
  • 3 2 Exit: lc(2,[does,this,flight,include,a,meal],[does,this,flight,include,a,meal]) ?
  • 4 2 Call: aux(_1112,[does,this,flight,include,a,meal],_1100) ?
  • 5 3 Call: 'C'([does,this,flight,include,a,meal],does,_1100) ? s
  • 5 3 Exit: 'C'([does,this,flight,include,a,meal],does,[this,flight,include,a,meal]) ?
  • 4 2 Exit: aux(aux(does),[does,this,flight,include,a,meal],[this,flight,include,a,meal]) ?
  • 6 2 Call: np(_1113,[this,flight,include,a,meal],_1093) ?
  • 7 3 Call: lc(4,[this,flight,include,a,meal],_3790) ?
  • ? 7 3 Exit: lc(4,[this,flight,include,a,meal],[this,flight,include,a,meal]) ?
  • 8 3 Call: det(_3795,[this,flight,include,a,meal],_3783) ? s
  • ? 8 3 Exit: det(det(this),[this,flight,include,a,meal],[flight,include,a,meal]) ?
  • 9 3 Call: nominal(_3796,[flight,include,a,meal],_1093) ?
  • 10 4 Call: lc(5,[flight,include,a,meal],_5740) ? s
  • ? 10 4 Exit: lc(5,[flight,include,a,meal],[flight,include,a,meal]) ?

24
Left Corner Parsing
  • Prolog query (contd.)
  • 12 2 Call: vp(_1114,[include,a,meal],[]) ?
  • 13 3 Call: lc(8,[include,a,meal],_8441) ? s
  • ? 13 3 Exit: lc(8,[include,a,meal],[include,a,meal]) ?
  • 14 3 Call: verb(_8446,[include,a,meal],[]) ? s
  • 14 3 Fail: verb(_8446,[include,a,meal],[]) ?
  • 13 3 Redo: lc(8,[include,a,meal],[include,a,meal]) ? s
  • 13 3 Fail: lc(8,[include,a,meal],_8441) ?
  • 15 3 Call: lc(9,[include,a,meal],_8448) ? s
  • ? 15 3 Exit: lc(9,[include,a,meal],[include,a,meal]) ?
  • 16 3 Call: verb(_8453,[include,a,meal],_8441) ? s
  • ? 16 3 Exit: verb(verb(include),[include,a,meal],[a,meal]) ?
  • 17 3 Call: np(_8454,[a,meal],[]) ?
  • 18 4 Call: lc(4,[a,meal],_10423) ? s
  • 18 4 Exit: lc(4,[a,meal],[a,meal]) ?
  • 19 4 Call: det(_10428,[a,meal],_10416) ? s
  • 19 4 Exit: det(det(a),[a,meal],[meal]) ?
  • 20 4 Call: nominal(_10429,[meal],[]) ?
  • 21 5 Call: lc(5,[meal],_12385) ? s

25
Bottom-Up Parsing
  • LR(0) parsing
  • An example of bottom-up tabular parsing
  • Similar to the top-down Earley algorithm
    described in the textbook in that it uses the
    idea of dotted rules

26
Tabular Parsing
  • e.g. LR(k) (Knuth, 1965)
  • invented for efficient parsing of programming
    languages
  • disadvantage: a potentially huge number of states
    can be generated when the number of rules in the
    grammar is large
  • can be applied to natural languages (Tomita 1985)
  • build a Finite State Automaton (FSA) from the
    grammar rules, then add a stack
  • tables encode the grammar (FSA)
  • grammar rules are compiled
  • no longer interpret the grammar rules directly
  • Parser = Table + Push-down Stack
  • table entries contain instruction(s) that tell
    what to do at a given state
  • possibly factoring in lookahead
  • stack data structure deals with maintaining the
    history of computation and recursion

27
Tabular Parsing
  • Shift-Reduce Parsing
  • example
  • LR(0)
  • left to right
  • bottom-up
  • (0) no lookahead (input word)
  • LR actions
  • Shift: read an input word
  • i.e. advance current input word pointer to the
    next word
  • Reduce: complete a nonterminal
  • i.e. complete parsing a grammar rule
  • Accept: complete the parse
  • i.e. start symbol (e.g. S) derives the terminal
    string

28
Tabular Parsing
  • LR(0) Parsing
  • L(G) = LR(0)
  • i.e. the language generated by grammar G is LR(0)
  • if there is a unique instruction per state
  • (or no instruction = error state)
  • the LR(0) languages are a proper subset of the
    context-free languages
  • note
  • human language tends to be ambiguous
  • there are likely to be multiple or conflicting
    actions per state
  • can let Prolog's computation rule handle it
  • i.e. use Prolog backtracking

29
Tabular Parsing
  • Dotted Rule Notation
  • dot used to indicate the progress of a parse
    through a phrase structure rule
  • examples
  • vp --> v . np
  • means we've seen v and predict np
  • np --> . d np
  • means we're predicting a d (followed by np)
  • vp --> vp pp .
  • means we've completed a vp
  • state
  • a set of dotted rules encodes the state of the
    parse
  • kernel
  • vp --> v . np
  • vp --> v .
  • completion (of predict NP)
  • np --> . d n
  • np --> . n
  • np --> . np cp
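
The completion step (adding a fresh dotted rule for every nonterminal that follows a dot) can be sketched in Prolog. This is a minimal sketch, assuming SWI-Prolog; the item(LHS, Seen, Rest) representation, in which the dot sits between Seen and Rest, and the predicate name closure/2 are hypothetical choices made for this illustration.

```prolog
:- use_module(library(lists)).

% rule(LHS, RHS): the toy rules used in the dotted-rule examples.
rule(vp, [v, np]).
rule(vp, [v]).
rule(np, [d, n]).
rule(np, [n]).
rule(np, [np, cp]).

% closure(Items, State): for every nonterminal NT directly after a dot,
% add item(NT, [], RHS) for each rule of NT, until nothing new is added.
closure(Items, State) :-
    member(item(_, _, [NT|_]), Items),
    rule(NT, RHS),
    New = item(NT, [], RHS),
    \+ memberchk(New, Items),
    !,
    append(Items, [New], Items1),
    closure(Items1, State).
closure(State, State).
```

Starting from the kernel above, `?- closure([item(vp,[v],[np]), item(vp,[v],[])], S).` adds the three np items with the dot at the left edge and then stops, because np --> . np cp predicts np again and nothing new can be added.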

30
Tabular Parsing
  • compute possible states by advancing the dot
  • example
  • (Assume d is next in the input)
  • vp --> v . np
  • vp --> v . (eliminated)
  • np --> d . n
  • np --> . n (eliminated)
  • np --> . np cp
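
One reading of this example is a filter over the dotted rules: a rule survives if the next input symbol either is the symbol after the dot (the dot advances) or can begin the nonterminal after the dot (the rule stays as it is). The following is a minimal sketch of that reading, assuming SWI-Prolog; item/3, step/3, step_item/3 and can_begin/2 are hypothetical names, and d, n and v are treated as terminal categories.

```prolog
:- use_module(library(lists)).

rule(vp, [v, np]).
rule(vp, [v]).
rule(np, [d, n]).
rule(np, [n]).
rule(np, [np, cp]).

% step(Sym, State, Next): items of State that survive when Sym is next.
step(Sym, State, Next) :-
    findall(Item1,
            ( member(Item, State), step_item(Sym, Item, Item1) ),
            Next).

% The dot is directly before Sym: advance it.
step_item(Sym, item(L, Seen, [Sym|Rest]), item(L, Seen1, Rest)) :-
    append(Seen, [Sym], Seen1).
% The dot is before a nonterminal that can begin with Sym: keep the item.
step_item(Sym, item(L, Seen, [NT|Rest]), item(L, Seen, [NT|Rest])) :-
    NT \== Sym,
    once(can_begin(NT, Sym)).

% can_begin(Cat, Sym): Sym is a left corner of Cat (cycle-safe).
can_begin(Cat, Sym) :- can_begin(Cat, Sym, [Cat]).

can_begin(Cat, Sym, _) :- rule(Cat, [Sym|_]).
can_begin(Cat, Sym, Seen) :-
    rule(Cat, [First|_]),
    \+ memberchk(First, Seen),
    can_begin(First, Sym, [First|Seen]).
```

Applied to the five items of the previous state with d as the next symbol, step/3 keeps vp --> v . np, np --> d . n and np --> . np cp, and drops the two items marked as eliminated. A standard LR(0) table construction would instead keep only the advanced item np --> d . n as the kernel of the next state.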

31
Tabular Parsing
  • Dotted rules
  • example
  • State 0
  • s -> . np vp
  • np -> . d np
  • np -> . n
  • np -> . np pp
  • possible actions
  • shift d and go to new state
  • shift n and go to new state
  • Creating new states

shift d
shift n
32
Tabular Parsing
  • State 1: Shift N, goto State 2

33
Tabular Parsing
shift n
shift d
  • Shift
  • take input word, and
  • place on stack

34
Tabular Parsing
  • State 2: Reduce action NP -> N .

35
Tabular Parsing
  • Reduce NP -> N .
  • pop [N milk] off the stack, and
  • replace with [NP [N milk]] on stack
  • State 2

[Figure: the input, with [V is] as the next word, and the stack, holding [N milk]]
36
Tabular Parsing
  • State 3: Reduce NP -> D N .

37
Tabular Parsing
  • Reduce NP -> D N .
  • pop [N man] and [D a] off the stack
  • replace with [NP [D a] [N man]]
  • State 3

[Figure: the input, with [V hit] as the next word, and the stack, holding [N man] on top of [D a]]
38
Tabular Parsing
  • State 0: Transition NP

39
Tabular Parsing
  • for both states 2 and 3
  • NP -> N . (reduce NP -> N)
  • NP -> D N . (reduce NP -> D N)
  • after Reduce NP operation
  • Goto state 4
  • notes
  • states are unique
  • grammar is finite
  • procedure generating states must terminate since
    the number of possible dotted rules is finite
40
Tabular Parsing
41
Tabular Parsing
  • Observations
  • table is sparse
  • example
  • State 0, Input V ..
  • parse fails immediately
  • in a given state, input may be irrelevant
  • example
  • State 2 (there is no shift operation)
  • there may be action conflicts
  • example
  • State 1 shift D, shift N
  • more interesting cases
  • shift-reduce and reduce-reduce conflicts

42
Tabular Parsing
  • finishing up
  • an extra initial rule is usually added to the
    grammar
  • SS --> S .
  • SS start symbol
  • end of sentence marker
  • input
  • milk is good for you
  • accept action
  • discard from input
  • return element at the top of stack as the parse
    tree

43
LR Parsing in Prolog
  • Recap
  • finite state machine
  • each state represents a set of dotted rules
  • example
  • S --> . NP VP
  • NP --> . D N
  • NP --> . N
  • NP --> . NP PP
  • we transition, i.e. move, from state to state by
    advancing the dot over terminal and nonterminal
    symbols

44
Build Actions
  • two main actions
  • Shift
  • move a word from the input onto the stack
  • Example
  • NP --> . D N
  • Reduce
  • build a new constituent
  • Example
  • NP --> D N .

45
Parser
  • Example
  • ?- parse([john,saw,the,man,with,a,telescope],X).
  • X = s(np(n(john)),vp(v(saw),np(np(d(the),n(man)),pp(p(with),np(d(a),n(telescope))))))
  • X = s(np(n(john)),vp(vp(v(saw),np(d(the),n(man))),pp(p(with),np(d(a),n(telescope)))))
  • no
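
The two answers correspond to the two attachments of the PP. The effect can be reproduced with a naive shift-reduce parser that lets Prolog backtracking choose between shift and reduce. The following is a minimal sketch, assuming SWI-Prolog; it is not the table-driven LR(0) parser from the lecture, and the grammar, the lexicon and the helper predicates sr/3 and pop/4 are assumptions made for this illustration.

```prolog
:- use_module(library(lists)).

% rule(LHS, RHS) and word(Word, Category): assumed toy grammar and lexicon.
rule(s,  [np, vp]).
rule(np, [n]).
rule(np, [d, n]).
rule(np, [np, pp]).
rule(vp, [v, np]).
rule(vp, [vp, pp]).
rule(pp, [p, np]).

word(john, n).  word(man, n).  word(telescope, n).
word(saw, v).   word(the, d).  word(a, d).  word(with, p).

% parse(Words, Tree): start with an empty stack.
parse(Words, Tree) :- sr(Words, [], Tree).

% Accept: input used up and a single s tree on the stack.
sr([], [Tree], Tree) :- functor(Tree, s, _).
% Reduce: the top of the stack matches a rule body; build a new constituent.
sr(Words, Stack, Tree) :-
    rule(LHS, RHS),
    pop(RHS, Stack, Kids, Rest),
    Node =.. [LHS|Kids],
    sr(Words, [Node|Rest], Tree).
% Shift: move the next word, tagged with its category, onto the stack.
sr([W|Words], Stack, Tree) :-
    word(W, Cat),
    Leaf =.. [Cat, W],
    sr(Words, [Leaf|Stack], Tree).

% pop(RHS, Stack, Kids, Rest): the stack top holds RHS in reverse order;
% Kids are returned in left-to-right order.
pop(RHS, Stack, Kids, Rest) :-
    reverse(RHS, Rev),
    pop_(Rev, Stack, [], Kids, Rest).

pop_([], Rest, Kids, Kids, Rest).
pop_([Cat|Cats], [T|Stack], Acc, Kids, Rest) :-
    functor(T, Cat, _),
    pop_(Cats, Stack, [T|Acc], Kids, Rest).
```

Running `?- parse([john,saw,the,man,with,a,telescope], X).` and asking for all solutions yields the two trees shown above, though not necessarily in the same order. Shift-reduce and reduce-reduce conflicts are resolved here by trying every applicable action in turn, which is what relying on Prolog's computation rule amounts to.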

46
LR(0) Goto Table
47
LR(0) Action Table
  • S = shift, R = reduce, A = accept
  • Empty cells = error states
  • Multiple actions = machine conflict
  • Prolog's computation rule: backtrack
48
LR(0) Conflict Statistics
  • Toy grammar
  • 14 states
  • 6 states
  • with 2 competing actions
  • states 11,10,8
  • shift-reduce conflict
  • 1 state
  • with 3 competing actions
  • State 7
  • shift(d) shift(n) reduce(vp->v)

49
LR Parsing
  • in fact
  • LR-parsers are generally acknowledged to be the
    fastest parsers
  • using lookahead (current terminal symbol)
  • and when combined with the chart technique
    (memorizing subphrases in a table - dynamic
    programming)
  • textbook
  • Earley's algorithm
  • uses chart
  • but builds dotted-rule configurations dynamically
    at parse-time
  • instead of ahead of time (so slower than LR)