1
Syntax
  • Sudeshna Sarkar
  • 25 Aug 2008

2
Top-Down and Bottom-Up
  • Top-down
  • Only searches for trees that can be answers (i.e.
    Ss)
  • But also suggests trees that are not consistent
    with any of the words
  • Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally

3
Problems
  • Even with the best filtering, backtracking
    methods are doomed if they don't address certain
    problems
  • Ambiguity
  • Shared subproblems

4
Ambiguity
5
Shared Sub-Problems
  • No matter what kind of search (top-down,
    bottom-up, or mixed) we choose,
  • We don't want to unnecessarily redo work we've
    already done.

6
Shared Sub-Problems
  • Consider
  • A flight from Indianapolis to Houston on TWA

7
Shared Sub-Problems
  • Assume a top-down parse making bad initial
    choices on the Nominal rule.
  • In particular
  • Nominal → Nominal Noun
  • Nominal → Nominal PP

8
Shared Sub-Problems
9
Shared Sub-Problems
10
Shared Sub-Problems
11
Shared Sub-Problems
12
Parsing
  • CKY
  • Earley
  • Both are dynamic programming solutions that run
    in O(n³) time.
  • CKY is bottom-up
  • Earley is top-down

13
Sample Grammar
14
Dynamic Programming
  • DP methods fill tables with partial results and
  • Do not do too much avoidable repeated work
  • Solve exponential problems in polynomial time
    (sort of)
  • Efficiently store ambiguous structures with
    shared sub-parts.

15
CKY Parsing
  • First we'll limit our grammar to epsilon-free,
    binary rules (more later)
  • Consider the rule A → B C
  • If there is an A in the input then there must be
    a B followed by a C in the input.
  • If the A spans from i to j in the input then
    there must be some k s.t. i < k < j
  • I.e., the B splits from the C someplace.

16
CKY
  • So let's build a table so that an A spanning from
    i to j in the input is placed in cell [i,j] in
    the table.
  • So a non-terminal spanning an entire string will
    sit in cell [0,n]
  • If we build the table bottom-up we'll know that
    the parts of the A must go from i to k and from k
    to j

17
CKY
  • Meaning that for a rule like A → B C we should
    look for a B in [i,k] and a C in [k,j].
  • In other words, if we think there might be an A
    spanning [i,j] in the input AND
  • A → B C is a rule in the grammar THEN
  • There must be a B in [i,k] and a C in [k,j] for
    some i < k < j

18
CKY
  • So to fill the table, loop over the cell [i,j]
    values in some systematic way
  • What constraint should we put on that?
  • For each cell loop over the appropriate k values
    to search for things to add.
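
To make the table-filling scheme concrete, here is a minimal CKY recognizer
sketch in Python (not the probabilistic parser developed later in these
slides); the grammar encoding, the dict names unary/binary and the toy rules
are illustrative assumptions:

from collections import defaultdict

def cky_recognize(words, unary, binary, start="S"):
    # unary:  word -> set of preterminals A with A -> word
    # binary: (B, C) -> set of nonterminals A with A -> B C
    n = len(words)
    table = defaultdict(set)            # table[(i, j)] = nonterminals spanning words i..j-1
    for i, w in enumerate(words):
        table[(i, i + 1)] |= unary.get(w, set())
    for span in range(2, n + 1):        # fill shorter spans before longer ones
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):   # split point: B in [i,k], C in [k,j]
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return start in table[(0, n)]

# illustrative toy grammar in CNF
unary = {"cats": {"NP"}, "scratch": {"V"}, "people": {"NP"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize("cats scratch people".split(), unary, binary))   # True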

19
CKY Table
20
CKY Algorithm
21
CKY Parsing
  • Is that really a parser?

22
Note
  • We arranged the loops to fill the table a column
    at a time, from left to right, bottom to top.
  • This assures us that whenever we're filling a
    cell, the parts needed to fill it are already in
    the table (to the left and below)

23
Example
24
Other Ways to Do It?
  • Are there any other sensible ways to fill the
    table that still guarantee that the cells we need
    are already filled?

25
Other Ways to Do It?
26
Sample Grammar
27
Problem
  • What if your grammar isn't binary?
  • As in the case of the Treebank grammar?
  • Convert it to binary: any arbitrary CFG can be
    rewritten into Chomsky Normal Form automatically.
  • What does this mean?
  • The resulting grammar accepts (and rejects) the
    same set of strings as the original grammar.
  • But the resulting derivations (trees) are
    different.

28
Problem
  • More specifically, rules have to be of the form
  • A → B C
  • Or
  • A → w
  • That is, rules can expand to either two
    non-terminals or to a single terminal.

29
Binarization Intuition
  • Eliminate chains of unit productions.
  • Introduce new intermediate non-terminals into the
    grammar that distribute rules with length > 2
    over several rules. So
  • S → A B C
  • Turns into
  • S → X C
  • X → A B
  • Where X is a symbol that doesn't occur anywhere
    else in the grammar.
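
A minimal sketch of this binarization step in Python, assuming rules are
given as (LHS, RHS-list) pairs; the fresh-symbol naming (X1, X2, …) is an
illustrative assumption:

def binarize(rules):
    # Split any rule with more than two RHS symbols into binary rules by
    # introducing fresh intermediate non-terminals, e.g. S -> A B C becomes
    # X1 -> A B and S -> X1 C.
    out, fresh = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_sym = "X%d" % fresh       # assumed not to occur elsewhere in the grammar
            out.append((new_sym, rhs[:2]))
            rhs = [new_sym] + rhs[2:]
        out.append((lhs, rhs))
    return out

print(binarize([("S", ["A", "B", "C"])]))
# [('X1', ['A', 'B']), ('S', ['X1', 'C'])]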

30
CNF Conversion
31
CKY Algorithm
32
Example
Filling column 5
33
Example
34
Example
35
Example
36
Example
37
END
38
Statistical parsing
  • Over the last 12 years statistical parsing has
    succeeded wonderfully!
  • NLP researchers have produced a range of (often
    free, open source) statistical parsers, which can
    parse any sentence and often get most of it
    correct
  • These parsers are now a commodity component
  • The parsers are still improving year-on-year.

39
Classical NLP Parsing
  • Wrote symbolic grammar and lexicon
  • S → NP VP           NN → interest
  • NP → (DT) NN        NNS → rates
  • NP → NN NNS         NNS → raises
  • NP → NNP            VBP → interest
  • VP → V NP           VBZ → rates
  • Used proof systems to prove parses from words
  • This scaled very badly and didn't give coverage
  • Minimal grammar on "Fed raises" sentence: 36
    parses
  • Simple 10-rule grammar: 592 parses
  • Real-size broad-coverage grammar: millions of
    parses

40
Classical NLP Parsing: The problem and its solution
  • Very constrained grammars attempt to limit
    unlikely/weird parses for sentences
  • But the attempt makes the grammars not robust:
    many sentences have no parse
  • A less constrained grammar can parse more
    sentences
  • But simple sentences end up with ever more parses
  • Solution: We need mechanisms that allow us to
    find the most likely parse(s)
  • Statistical parsing lets us work with very loose
    grammars that admit millions of parses for
    sentences but still quickly find the best
    parse(s)

41
The rise of annotated data: The Penn Treebank
( (S
    (NP-SBJ (DT The) (NN move))
    (VP (VBD followed)
      (NP
        (NP (DT a) (NN round))
        (PP (IN of)
          (NP
            (NP (JJ similar) (NNS increases))
            (PP (IN by)
              (NP (JJ other) (NNS lenders)))
            (PP (IN against)
              (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
      (, ,)
      (S-ADV
        (NP-SBJ (-NONE- ))
        (VP (VBG reflecting)
          (NP
            (NP (DT a) (VBG continuing) (NN decline))
            (PP-LOC (IN in)
42
The rise of annotated data
  • Going into it, building a treebank seems a lot
    slower and less useful than building a grammar
  • But a treebank gives us many things
  • Reusability of the labor
  • Broad coverage
  • Frequencies and distributional information
  • A way to evaluate systems

43
Human parsing
  • Humans often do ambiguity maintenance
  • Have the police … eaten their supper?
  •                 … come in and look around.
  •                 … taken out and shot.
  • But humans also commit early and are
    garden-pathed
  • The man who hunts ducks out on weekends.
  • The cotton shirts are made from grows in
    Mississippi.
  • The horse raced past the barn fell.

44
Phrase structure grammars = context-free grammars
  • G = (T, N, S, R)
  • T is set of terminals
  • N is set of nonterminals
  • For NLP, we usually distinguish out a set P ⊆ N
    of preterminals, which always rewrite as
    terminals
  • S is the start symbol (one of the nonterminals)
  • R is rules/productions of the form X → γ, where X
    is a nonterminal and γ is a sequence of terminals
    and nonterminals (possibly an empty sequence)
  • A grammar G generates a language L.

45
Probabilistic or stochastic context-free grammars
(PCFGs)
  • G = (T, N, S, R, P)
  • T is set of terminals
  • N is set of nonterminals
  • For NLP, we usually distinguish out a set P ⊆ N
    of preterminals, which always rewrite as
    terminals
  • S is the start symbol (one of the nonterminals)
  • R is rules/productions of the form X → γ, where X
    is a nonterminal and γ is a sequence of terminals
    and nonterminals (possibly an empty sequence)
  • P(R) gives the probability of each rule.
  • A grammar G generates a language model L.

46
Soundness and completeness
  • A parser is sound if every parse it returns is
    valid/correct
  • A parser terminates if it is guaranteed to not go
    off into an infinite loop
  • A parser is complete if for any given grammar and
    sentence, it is sound, produces every valid parse
    for that sentence, and terminates
  • (For many purposes, we settle for sound but
    incomplete parsers, e.g., probabilistic parsers
    that return a k-best list.)

47
Top-down parsing
  • Top-down parsing is goal-directed
  • A top-down parser starts with a list of
    constituents to be built. The top-down parser
    rewrites the goals in the goal list by matching
    one against the LHS of the grammar rules, and
    expanding it with the RHS, attempting to match
    the sentence to be derived.
  • If a goal can be rewritten in several ways, then
    there is a choice of which rule to apply (search
    problem)
  • Can use depth-first or breadth-first search, and
    goal ordering.
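
A minimal sketch of such a goal-directed, depth-first parser in Python; the
toy grammar (deliberately without left recursion) and the function name are
illustrative assumptions:

def parse_topdown(goals, words, grammar, lexicon):
    # Try to rewrite the goal list so that it derives exactly `words`.
    if not goals:
        return not words                       # success iff all words are consumed
    first, rest = goals[0], goals[1:]
    if first in lexicon:                       # preterminal: must match the next word
        return bool(words) and words[0] in lexicon[first] and \
               parse_topdown(rest, words[1:], grammar, lexicon)
    for rhs in grammar.get(first, []):         # choice point: which rule to expand with
        if parse_topdown(list(rhs) + rest, words, grammar, lexicon):
            return True
    return False

grammar = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]]}
lexicon = {"N": {"cats", "people"}, "V": {"scratch"}}
print(parse_topdown(["S"], "cats scratch people".split(), grammar, lexicon))   # True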

48
Top-down parsing
49
Bottom-up parsing
  • Bottom-up parsing is data-directed
  • The initial goal list of a bottom-up parser is
    the string to be parsed. If a sequence in the
    goal list matches the RHS of a rule, then this
    sequence may be replaced by the LHS of the rule.
  • Parsing is finished when the goal list contains
    just the start category.
  • If the RHS of several rules match the goal list,
    then there is a choice of which rule to apply
    (search problem)
  • Can use depth-first or breadth-first search, and
    goal ordering.
  • The standard presentation is as shift-reduce
    parsing.

50
Shift-reduce parsing: one path
  • cats scratch people with claws ('|' below
    separates the stack from the remaining input)
  • cats | scratch people with claws SHIFT
  • N | scratch people with claws REDUCE
  • NP | scratch people with claws REDUCE
  • NP scratch | people with claws SHIFT
  • NP V | people with claws REDUCE
  • NP V people | with claws SHIFT
  • NP V N | with claws REDUCE
  • NP V NP | with claws REDUCE
  • NP V NP with | claws SHIFT
  • NP V NP P | claws REDUCE
  • NP V NP P claws | SHIFT
  • NP V NP P N | REDUCE
  • NP V NP P NP | REDUCE
  • NP V NP PP | REDUCE
  • NP VP | REDUCE
  • S | REDUCE
  • What other search paths are there for parsing
    this sentence?
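
A minimal sketch of a shift-reduce loop in Python that replays one such path,
assuming the toy grammar behind the trace (N/V/P lexical rules plus NP → N,
PP → P NP, VP → V NP PP, S → NP VP) and a greedy reduce-before-shift strategy
with no backtracking; for this particular grammar and sentence the greedy
path happens to succeed:

def shift_reduce(words, rules):
    # rules: list of (LHS, RHS-tuple); try reductions (longest RHS first) before shifting.
    stack, buffer, trace = [], list(words), []
    while buffer or len(stack) > 1:
        for lhs, rhs in sorted(rules, key=lambda r: -len(r[1])):
            if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                trace.append((" ".join(stack) + " | " + " ".join(buffer), "REDUCE"))
                break
        else:                                  # no reduction applies: shift a word
            if not buffer:
                return None                    # stuck; a real parser would backtrack
            stack.append(buffer.pop(0))
            trace.append((" ".join(stack) + " | " + " ".join(buffer), "SHIFT"))
    return trace

rules = [("N", ("cats",)), ("V", ("scratch",)), ("N", ("people",)),
         ("P", ("with",)), ("N", ("claws",)), ("NP", ("N",)),
         ("PP", ("P", "NP")), ("VP", ("V", "NP", "PP")), ("S", ("NP", "VP"))]
for config, op in shift_reduce("cats scratch people with claws".split(), rules):
    print(config.ljust(32), op)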

51
Problems with top-down parsing
  • Left recursive rules
  • A top-down parser will do badly if there are many
    different rules for the same LHS. Consider if
    there are 600 rules for S, 599 of which start
    with NP, but one of which starts with V, and the
    sentence starts with V.
  • Useless work: expands things that are possible
    top-down but not there
  • Top-down parsers do well if there is useful
    grammar-driven control: search is directed by the
    grammar
  • Top-down is hopeless for rewriting parts of
    speech (preterminals) with words (terminals). In
    practice that is always done bottom-up as lexical
    lookup.
  • Repeated work: anywhere there is common
    substructure

52
Problems with bottom-up parsing
  • Unable to deal with empty categories: termination
    problem, unless rewriting empties as constituents
    is somehow restricted (but then it's generally
    incomplete)
  • Useless work: locally possible, but globally
    impossible.
  • Inefficient when there is great lexical ambiguity
    (grammar-driven control might help here)
  • Conversely, it is data-directed: it attempts to
    parse the words that are there.
  • Repeated work: anywhere there is common
    substructure

53
Repeated work
54
Principles for success: take 1
  • If you are going to do parsing-as-search with a
    grammar as-is
  • Left-recursive structures must be found, not
    predicted
  • Empty categories must be predicted, not found
  • Doing these things doesn't fix the repeated work
    problem
  • Both TD (LL) and BU (LR) parsers can (and
    frequently do) do work exponential in the
    sentence length on NLP problems.

55
Principles for success: take 2
  • Grammar transformations can fix both
    left-recursion and epsilon productions
  • Then you parse the same language but with
    different trees
  • Linguists tend to hate you
  • But this is a misconception: they shouldn't
  • You can fix the trees post hoc
  • The transform-parse-detransform paradigm
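
For instance, immediate left recursion can be transformed away by the
standard construction (not spelled out in the slides): A → A α | β becomes
A → β A', A' → α A' | ε. A minimal sketch in Python, with an illustrative
rule encoding:

def remove_immediate_left_recursion(A, rules):
    # rules: list of RHS lists for nonterminal A.
    # Returns (new rules for A, rules for a fresh A') with no immediate left recursion.
    A_prime = A + "'"
    alphas = [rhs[1:] for rhs in rules if rhs and rhs[0] == A]     # A -> A alpha
    betas = [rhs for rhs in rules if not rhs or rhs[0] != A]       # A -> beta
    if not alphas:
        return rules, []
    new_A = [beta + [A_prime] for beta in betas]                   # A  -> beta A'
    new_A_prime = [alpha + [A_prime] for alpha in alphas] + [[]]   # A' -> alpha A' | eps
    return new_A, new_A_prime                                      # (the eps rule then needs
                                                                   #  its own later treatment)

# NP -> NP PP | DT NN   becomes   NP -> DT NN NP',   NP' -> PP NP' | eps
print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["DT", "NN"]]))
# ([['DT', 'NN', "NP'"]], [['PP', "NP'"], []])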

56
Principles for success: take 3
  • Rather than doing parsing-as-search, we do
    parsing as dynamic programming
  • This is the most standard way to do things
  • Q.v. CKY parsing, next time
  • It solves the problem of doing repeated work
  • But there are also other ways of solving the
    problem of doing repeated work
  • Memoization (remembering solved subproblems)
  • Also, next time
  • Doing graph-search rather than tree-search.
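
A minimal sketch of the memoization idea in Python: caching a span-based
"can this category derive words i..j?" check means each subproblem is solved
only once. The toy grammar, lexicon and names are illustrative assumptions
(and the grammar is assumed not to be left-recursive through unary chains):

from functools import lru_cache

GRAMMAR = {"S": (("NP", "VP"),), "VP": (("V", "NP"),), "NP": (("N",),)}
LEXICON = {"N": {"cats", "people"}, "V": {"scratch"}}
WORDS = tuple("cats scratch people".split())

@lru_cache(maxsize=None)                 # remember solved (cat, i, j) subproblems
def derives(cat, i, j):
    # Can `cat` derive WORDS[i:j]?
    if cat in LEXICON:
        return j == i + 1 and WORDS[i] in LEXICON[cat]
    for rhs in GRAMMAR.get(cat, ()):
        if len(rhs) == 1 and derives(rhs[0], i, j):
            return True
        if len(rhs) == 2 and any(derives(rhs[0], i, k) and derives(rhs[1], k, j)
                                 for k in range(i + 1, j)):
            return True
    return False

print(derives("S", 0, len(WORDS)))       # True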

57
Probabilistic or stochastic context-free grammars
(PCFGs)
  • G = (T, N, S, R, P)
  • T is set of terminals
  • N is set of nonterminals
  • For NLP, we usually distinguish out a set P ⊆ N
    of preterminals, which always rewrite as
    terminals
  • S is the start symbol (one of the nonterminals)
  • R is rules/productions of the form X → γ, where X
    is a nonterminal and γ is a sequence of terminals
    and nonterminals (possibly an empty sequence)
  • P(R) gives the probability of each rule.
  • A grammar G generates a language model L.

58
PCFGs: Notation
  • w1n = w1 … wn = the word sequence from 1 to n
    (sentence of length n)
  • wab = the subsequence wa … wb
  • Nj_ab = the nonterminal Nj dominating wa … wb,
    i.e. the root of a subtree whose yield is
    wa … wb
  • We'll write P(Ni → ζj) to mean P(Ni → ζj | Ni)
  • We'll want to calculate max_t P(t ⇒ wab)
59
The probability of trees and strings
  • P(t) -- The probability of a tree is the product
    of the probabilities of the rules used to
    generate it.
  • P(w1n) -- The probability of the string is the
    sum of the probabilities of the trees which have
    that string as their yield
  • P(w1n) = Σj P(w1n, tj), where tj is a parse of
    w1n
  •        = Σj P(tj)

60
A Simple PCFG (in CNF)
S → NP VP    1.0      NP → NP PP        0.4
VP → V NP    0.7      NP → astronomers  0.1
VP → VP PP   0.3      NP → ears         0.18
PP → P NP    1.0      NP → saw          0.04
P → with     1.0      NP → stars        0.18
V → saw      1.0      NP → telescope    0.1
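
One way to write this grammar down as data, using plain Python dicts of rule
probabilities (an illustrative encoding, not from the slides; the same dict
shapes are reused in the CKY sketch further below):

# binary rules: (parent, left child, right child) -> probability
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("PP", "P", "NP"): 1.0,
          ("NP", "NP", "PP"): 0.4}
# lexical rules: (preterminal, word) -> probability
lex = {("P", "with"): 1.0, ("V", "saw"): 1.0,
       ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
       ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
       ("NP", "telescope"): 0.1}

# rule probabilities for each left-hand side sum to 1, e.g. for NP:
assert abs(sum(p for key, p in {**binary, **lex}.items() if key[0] == "NP") - 1.0) < 1e-9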
61
(No Transcript)
62
(No Transcript)
63
Tree and String Probabilities
  • w15 = astronomers saw stars with ears
  • P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18
            × 1.0 × 1.0 × 0.18
          = 0.0009072
  • P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18
            × 1.0 × 1.0 × 0.18
          = 0.0006804
  • P(w15) = P(t1) + P(t2)
           = 0.0009072 + 0.0006804
           = 0.0015876
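
The same arithmetic as a quick Python check (a sketch; the rule probabilities
come from the grammar on the previous slide, where t1 is the parse that
attaches "with ears" to the NP "stars" and t2 the parse that attaches it to
the VP):

from math import prod

p_t1 = prod([1.0, 0.1, 0.7, 1.0, 0.4, 0.18, 1.0, 1.0, 0.18])   # rules used in t1
p_t2 = prod([1.0, 0.1, 0.3, 0.7, 1.0, 0.18, 1.0, 1.0, 0.18])   # rules used in t2
print(round(p_t1, 7))           # 0.0009072
print(round(p_t2, 7))           # 0.0006804
print(round(p_t1 + p_t2, 7))    # 0.0015876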

64
Chomsky Normal Form
  • All rules are of the form X → Y Z or X → w.
  • This makes parsing easier/more efficient

65
Treebank binarization
N-ary Trees in Treebank → TreeAnnotations.annotateTree →
Binary Trees → Lexicon and Grammar → (TODO) CKY parsing →
Parsing
66
An example before binarization
(ROOT
  (S
    (NP (N cats))
    (VP (V scratch)
        (NP (N people))
        (PP (P with)
            (NP (N claws))))))
67
After binarization..
(ROOT
  (S
    (NP (N cats))
    (@S->_NP
      (VP (V scratch)
          (@VP->_V
            (NP (N people))
            (@VP->_V_NP
              (PP (P with)
                  (@PP->_P
                    (NP (N claws))))))))))
68
The CKY algorithm (1960/1965)
function CKY(words, grammar) returns most probable parse/prob

  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back  = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A → words[i] in grammar
        score[i][i+1][A] = P(A → words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A → B in grammar
          prob = P(A → B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true
69
The CKY algorithm (1960/1965)
  for span = 2 to #(words)
    for begin = 0 to #(words) - span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A → B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A → B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)
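
A compact runnable rendering of the same dynamic program in Python (a sketch,
not the course code): grammar rules are supplied as dicts of probabilities,
and the unary closure is applied to each cell in the same way as in the
pseudocode above; all names here are illustrative:

from collections import defaultdict

def cky(words, lex, binary, unary):
    # lex[(A, w)], binary[(A, B, C)], unary[(A, B)] give rule probabilities.
    n = len(words)
    score = defaultdict(float)                  # score[(i, j, A)] = best probability so far
    back = {}                                   # back-pointers, used by build_tree below
    def handle_unaries(i, j):
        added = True
        while added:
            added = False
            for (A, B), p in unary.items():
                prob = p * score[(i, j, B)]
                if prob > score[(i, j, A)]:
                    score[(i, j, A)], back[(i, j, A)] = prob, (None, B, None)
                    added = True
    for i, w in enumerate(words):               # lexical rules on the diagonal
        for (A, word), p in lex.items():
            if word == w and p > score[(i, i + 1, A)]:
                score[(i, i + 1, A)], back[(i, i + 1, A)] = p, w
        handle_unaries(i, i + 1)
    for span in range(2, n + 1):                # binary rules over longer spans
        for begin in range(0, n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, B, C), p in binary.items():
                    prob = score[(begin, split, B)] * score[(split, end, C)] * p
                    if prob > score[(begin, end, A)]:
                        score[(begin, end, A)], back[(begin, end, A)] = prob, (split, B, C)
            handle_unaries(begin, end)
    return score, back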
70
[CKY chart for the sentence "cats scratch walls with claws" (word positions
0-5): an empty upper-triangular table with one cell score[i][j] for every
0 ≤ i < j ≤ 5, from score[0][1] up to score[0][5].]
71
[The same chart after the lexical step: each diagonal cell score[i][i+1]
holds the preterminal entries for word i, e.g. N→cats, P→cats, V→cats in
cell [0,1], …, N→claws, P→claws, V→claws in cell [4,5].]

for i = 0; i < #(words); i++
  for A in nonterms
    if A → words[i] in grammar
      score[i][i+1][A] = P(A → words[i])
72
[The chart after the unary closure on the diagonal: each cell [i,i+1]
additionally holds NP→N, @VP->_V→NP and @PP->_P→NP.]

// handle unaries
73
[Filling the span-2 cells with binary rules: each cell [i,i+2] receives
entries such as PP→P @PP->_P and VP→V @VP->_V, built from its two sub-spans.]

prob = score[begin][split][B] * score[split][end][C] * P(A→B C)
e.g. prob = score[0][1][P] * score[1][2][@PP->_P] * P(PP→P @PP->_P)
For each A, only keep the A→B C with the highest prob.
74
[The chart after the unary closure on the span-2 cells: those cells
additionally hold @S->_NP→VP, @NP->_NP→PP and @VP->_V_NP→PP, and
probabilities now appear in the cells, e.g. cell [1,2]: N→scratch 0.0967,
P→scratch 0.0773, V→scratch 0.9285, NP→N 0.0859, @VP->_V→NP 0.0573,
@PP->_P→NP 0.0859.]

// handle unaries
75

76
[The completed chart for "cats scratch walls with claws": every cell holds,
for each nonterminal, the probability of the best way of building it over
that span, e.g. cell [0,1]: N→cats 0.5259, P→cats 0.0725, V→cats 0.0967,
NP→N 0.4675, @VP->_V→NP 0.3116, @PP->_P→NP 0.4675; the larger cells hold
entries such as S→NP @S->_NP, ROOT→S, VP→V @VP->_V and NP→NP @NP->_NP with
their probabilities.]

Call buildTree(score, back) to get the best parse
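
buildTree itself is not shown in the transcript; a sketch of what it might
look like for the score/back tables of the Python CKY sketch above (it simply
follows the back-pointers):

def build_tree(back, begin, end, A):
    # Recover the best tree for nonterminal A over [begin, end) as nested tuples.
    entry = back[(begin, end, A)]
    if isinstance(entry, str):                   # lexical rule A -> word
        return (A, entry)
    split, B, C = entry
    if split is None:                            # unary rule A -> B
        return (A, build_tree(back, begin, end, B))
    return (A, build_tree(back, begin, split, B),   # binary rule A -> B C
               build_tree(back, split, end, C))

# e.g. (with the cky sketch above and a suitable lex/binary/unary grammar):
# score, back = cky("cats scratch walls with claws".split(), lex, binary, unary)
# print(build_tree(back, 0, 5, "ROOT"))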