1
Probabilistic and Lexicalized Parsing
2
Probabilistic CFGs
  • Weighted CFGs
  • Attach weights to rules of CFG
  • Compute weights of derivations
  • Use weights to pick preferred parses
  • Utility: pruning and ordering the search space,
    disambiguation, language modeling for ASR
  • Parsing with weighted grammars (like Weighted FA)
  • T* = argmax_T W(T, S)
  • Probabilistic CFGs are one form of weighted CFGs.

3
Probability Model
  • Rule Probability
  • Attach probabilities to grammar rules
  • Expansions for a given non-terminal sum to 1
  • R1: VP → V        .55
  • R2: VP → V NP     .40
  • R3: VP → V NP NP  .05
  • Estimate the probabilities from annotated corpora:
    P(R1) = counts(R1) / counts(VP)
  • Derivation Probability
  • Derivation T = R1 … Rn
  • Probability of a derivation: P(T) = ∏i P(Ri)
  • Most likely parse: T* = argmax_T P(T)
  • Probability of a sentence: sum over all possible
    derivations of the sentence
  • Note the independence assumption: a rule's
    probability does not change based on where in the
    derivation it is expanded
    (see the estimation sketch below).
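
A minimal sketch of the estimation and scoring described above, in Python; the function and variable names are illustrative, not from the slides:

from collections import defaultdict

def estimate_rule_probs(treebank_rules):
    """MLE rule probabilities: P(R) = counts(R) / counts(LHS).
    treebank_rules: iterable of (lhs, rhs) pairs read off treebank parses,
    e.g. ("VP", ("V", "NP"))."""
    rule_counts = defaultdict(int)
    lhs_counts = defaultdict(int)
    for lhs, rhs in treebank_rules:
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

def derivation_prob(rules_used, rule_probs):
    """P(T) = product of P(Ri) over the rules in the derivation
    (the independence assumption noted above)."""
    p = 1.0
    for rule in rules_used:
        p *= rule_probs[rule]
    return p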

4
Structural ambiguity
  • S  → NP VP
  • VP → V NP
  • NP → NP PP
  • VP → VP PP
  • PP → P NP
  • NP → John | Mary | Denver
  • V  → called
  • P  → from

John called Mary from Denver
[Parse tree for "John called Mary from Denver", illustrating the PP-attachment ambiguity for "from Denver"]
5
Cocke-Younger-Kasami Parser
  • Bottom-up parser with top-down filtering
  • Start state(s): (A, i, i+1) for each A → w_{i+1}
  • End state: (S, 0, n), where n is the input size
  • Next-state rule:
  • (B, i, k), (C, k, j) ⇒ (A, i, j) if A → BC
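
A minimal CKY recognizer following these states and rules, assuming a grammar in Chomsky normal form; names are illustrative:

from collections import defaultdict

def cky_recognize(words, lexical_rules, binary_rules, start="S"):
    """CKY recognition.
    lexical_rules: set of (A, w) pairs for rules A -> w
    binary_rules:  set of (A, B, C) triples for rules A -> B C
    Returns True iff the state (start, 0, n) is derivable."""
    n = len(words)
    chart = defaultdict(set)                 # (i, j) -> non-terminals spanning words[i:j]
    for i, w in enumerate(words):            # base case: (A, i, i+1) for each A -> w_{i+1}
        for A, word in lexical_rules:
            if word == w:
                chart[(i, i + 1)].add(A)
    for span in range(2, n + 1):             # recursive case: (B,i,k), (C,k,j) => (A,i,j)
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for A, B, C in binary_rules:
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        chart[(i, j)].add(A)
    return start in chart[(0, n)]

With the grammar from the structural-ambiguity slide, this returns True for "John called Mary from Denver".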

6
Example
7
Base Case: A → w
8
Recursive Cases: A → BC
9–20
(No transcript)
21
Probabilistic CKY
  • Assign probabilities to constituents as they are
    completed and placed in the table
  • Computing the probability
  • Since we are interested in the maximum-probability
    (S, 0, n), use the max probability for each
    constituent
  • Maintain back-pointers to recover the parse
    (see the sketch below).
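
A sketch of probabilistic CKY along these lines, extending the recognizer above with max probabilities and back-pointers; rule probabilities are assumed to be given as dictionaries:

def pcky_parse(words, lexical_rules, binary_rules, start="S"):
    """Probabilistic CKY: keep the max-probability analysis per (span, non-terminal).
    lexical_rules: dict (A, w) -> P(A -> w)
    binary_rules:  dict (A, B, C) -> P(A -> B C)"""
    n = len(words)
    best = {}   # (i, j, A) -> max probability of A spanning words[i:j]
    back = {}   # (i, j, A) -> (k, B, C) split, or the word for lexical entries
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w and p > best.get((i, i + 1, A), 0.0):
                best[(i, i + 1, A)] = p
                back[(i, i + 1, A)] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p_rule in binary_rules.items():
                    p = p_rule * best.get((i, k, B), 0.0) * best.get((k, j, C), 0.0)
                    if p > best.get((i, j, A), 0.0):
                        best[(i, j, A)] = p
                        back[(i, j, A)] = (k, B, C)
    # Probability of the best parse, plus back-pointers to reconstruct it
    return best.get((0, n, start), 0.0), back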

22
Problems with PCFGs
  • The probability model we're using is based only
    on the rules in the derivation.
  • Lexical insensitivity
  • Doesn't use the words in any real way
  • Structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its
    object, and the preposition
  • I ate pickles with a fork.
  • I ate pickles with relish.
  • Context insensitivity of the derivation
  • Doesn't take into account where in the derivation
    a rule is used
  • Pronouns are more often subjects than objects
  • She hates Mary.
  • Mary hates her.
  • Solution: Lexicalization
  • Add lexical information to each rule

23
An example of lexical information Heads
  • Make use of notion of the head of a phrase
  • Head of an NP is a noun
  • Head of a VP is the main verb
  • Head of a PP is its preposition
  • Each LHS of a rule in the PCFG has a lexical item
  • Each RHS non-terminal has a lexical item
  • One of the lexical items is shared with the LHS
    (as in the head-percolation sketch below)
  • If R is the number of binary branching rules in the
    CFG and Σ is the set of lexical items, the
    lexicalized CFG has O(2·|Σ|·R) binary rules
  • Unary rules: O(|Σ|·R)
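
An illustrative head-percolation sketch based on the head notions above; real lexicalized parsers use richer head-finding tables, and the tree encoding here is an assumption for the example:

# Toy head-finding rules in the spirit of the bullets above
HEAD_CHILD = {
    "NP": "N",    # head of an NP is a noun
    "VP": "V",    # head of a VP is the main verb
    "PP": "P",    # head of a PP is its preposition
}

def lexicalize(tree):
    """tree: (label, children), where children is a list of subtrees or,
    for a pre-terminal, a word string. Returns (label, head_word, children)
    by percolating up the head child's word."""
    label, children = tree
    if isinstance(children, str):            # pre-terminal: its word is its head
        return (label, children, children)
    lex_children = [lexicalize(c) for c in children]
    head_label = HEAD_CHILD.get(label)
    head_word = lex_children[0][1]           # default: leftmost child's head
    for child_label, child_head, _ in lex_children:
        if child_label == head_label:
            head_word = child_head
            break
    return (label, head_word, lex_children)

For example, lexicalize(("VP", [("V", "called"), ("NP", [("N", "Mary")])])) yields a VP headed by "called".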

24
Example (correct parse)
Attribute grammar
25
Example (less preferred)
26
Computing Lexicalized Rule Probabilities
  • We started with rule probabilities
  • VP → V NP PP    P(rule | VP)
  • E.g., count of this rule divided by the number of
    VPs in a treebank
  • Now we want lexicalized probabilities
  • VP(dumped) → V(dumped) NP(sacks) PP(in)
  • P(rule | VP, dumped is the verb, sacks is the
    head of the NP, in is the head of the PP)
  • Not likely to have significant counts in any
    treebank
  • Back off to less specific contexts until the
    estimates are reliable (see the sketch below)
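
One simple way to realize the back-off idea, sketched with an assumed counts table and an arbitrary reliability threshold (neither is specified on the slide):

def lexicalized_rule_prob(counts, rule, heads, threshold=10):
    """Back off from the fully lexicalized context to less specific ones until
    a context has been seen often enough. `counts` maps a context key to
    (rule_count, context_count); the keys and threshold are illustrative."""
    contexts = [
        (rule, heads),        # e.g. VP(dumped) -> V(dumped) NP(sacks) PP(in)
        (rule, heads[:1]),    # condition only on the head of the LHS ("dumped")
        (rule, ()),           # plain, unlexicalized PCFG rule
    ]
    for key in contexts:
        rule_count, context_count = counts.get(key, (0, 0))
        if context_count >= threshold:
            return rule_count / context_count
    return 0.0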

27
Another Example
  • Consider the VPs
  • Ate spaghetti with gusto
  • Ate spaghetti with marinara
  • The relevant dependency is not between mother and child.

[Two lexicalized trees: in "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" it attaches to NP(spaghetti)]
28
Log-linear models for Parsing
  • Why restrict the conditioning to the elements
    of a rule?
  • Use even larger context
  • Word sequence, word types, sub-tree context etc.
  • In general, compute P(y|x) = exp(Σi λi fi(x, y)) / Z(x),
    where fi(x, y) tests a property of the context and
    λi is the weight of that feature
  • Use these as scores in the CKY algorithm to find
    the best-scoring parse (see the sketch below).
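
A sketch of the log-linear computation above; the feature function and weights are assumed inputs:

import math

def loglinear_prob(x, y, candidates, features, weights):
    """P(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), normalized over the
    candidate analyses of x. `features(x, y)` returns a dict of feature values;
    `weights` maps feature names to their lambdas."""
    def score(cand):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(x, cand).items())
    z = sum(math.exp(score(cand)) for cand in candidates)   # Z(x)
    return math.exp(score(y)) / z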

29
Parsing as sequential decision making process
  • Parsing: a series of decisions
  • Lexical category label, structural attachment,
    phrasal category label
  • Each decision is treated as a classification task,
    trained using some context (see the sketch after
    this list)
  • Classification techniques (SVM, MaxEnt, Decision
    Trees) can be used to train these decision
    classifiers.
  • Context could depend on previous decisions
    (CKY-style decoding)
  • CFGs can be recognized using Push Down Automata
    (PDA)
  • Probabilistic extensions of PDA
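
A minimal sketch of parsing as a sequence of classification decisions, here as a shift-reduce loop; the classifier `decide` and the action format are assumptions for illustration:

def parse_by_decisions(words, decide):
    """Parsing as sequential decisions. `decide(stack, buffer)` stands in for a
    trained classifier (SVM, MaxEnt, decision tree, ...) that returns either
    "shift" or ("reduce", label, arity)."""
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        action = decide(stack, buffer)
        if action == "shift":
            stack.append(("word", buffer.pop(0)))
        else:
            _, label, arity = action              # combine the top `arity` items
            children = [stack.pop() for _ in range(arity)][::-1]
            stack.append((label, children))
    return stack[0]                               # a tree covering the whole input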

30
Supertagging: Almost parsing
Poachers now control the underground trade

[Figure: candidate supertags (elementary trees) for the words of the sentence]
  • Selecting the correct supertag for a word is
    almost parsing
  • Use classifiers to select the correct supertag

31
Summary
  • Parsing context-free grammars
  • Top-down and Bottom-up parsers
  • Mixed approaches (CKY, Earley parsers)
  • Preferences over parses using probabilities
  • Parsing with PCFG and PCKY algorithms
  • Enriching the probability model
  • Lexicalization
  • Log-linear models for parsing
  • Classification techniques for parsing decisions