1
Probabilistic and Lexicalized Parsing
2
Probabilistic CFGs
  • Weighted CFGs
  • Attach weights to rules of CFG
  • Compute weights of derivations
  • Use weights to pick preferred parses
  • Utility: pruning and ordering the search space,
    disambiguation, language modeling for ASR
  • Parsing with weighted grammars (like Weighted FA)
  • T* = argmax_T W(T, S)
  • Probabilistic CFGs are one form of weighted CFGs.

3
Probability Model
  • Rule Probability
  • Attach probabilities to grammar rules
  • Expansions for a given non-terminal sum to 1
  • R1: VP → V        .55
  • R2: VP → V NP     .40
  • R3: VP → V NP NP  .05
  • Estimate the probabilities from annotated corpora:
    P(R1) = counts(R1) / counts(VP)
  • Derivation Probability
  • Derivation T = R1 … Rn
  • Probability of a derivation: P(T) = ∏i P(Ri)
  • Most likely parse: T* = argmax_T P(T)
  • Probability of a sentence: sum over all possible
    derivations of the sentence
  • Note the independence assumption: a rule's
    probability does not change based on where in the
    derivation it is expanded
    (see the estimation sketch below).
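
A minimal sketch of the estimation and scoring described above, in Python; the function and variable names are illustrative, not from the slides:

from collections import defaultdict

def estimate_rule_probs(treebank_rules):
    """MLE rule probabilities: P(R) = counts(R) / counts(LHS).
    treebank_rules: iterable of (lhs, rhs) pairs read off treebank parses,
    e.g. ("VP", ("V", "NP"))."""
    rule_counts = defaultdict(int)
    lhs_counts = defaultdict(int)
    for lhs, rhs in treebank_rules:
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

def derivation_prob(rules_used, rule_probs):
    """P(T) = product of P(Ri) over the rules in the derivation
    (the independence assumption noted above)."""
    p = 1.0
    for rule in rules_used:
        p *= rule_probs[rule]
    return p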

4
Structural ambiguity
  • S  → NP VP
  • VP → V NP
  • NP → NP PP
  • VP → VP PP
  • PP → P NP
  • NP → John | Mary | Denver
  • V  → called
  • P  → from

John called Mary from Denver
[Parse tree for "John called Mary from Denver", illustrating the PP-attachment ambiguity for "from Denver"]
5
Cocke-Younger-Kasami Parser
  • Bottom-up parser with top-down filtering
  • Start state(s): (A, i, i+1) for each A → w_{i+1}
  • End state: (S, 0, n), where n is the input size
  • Next-state rule:
  • (B, i, k), (C, k, j) ⇒ (A, i, j) if A → BC
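
A minimal CKY recognizer following these states and rules, assuming a grammar in Chomsky normal form; names are illustrative:

from collections import defaultdict

def cky_recognize(words, lexical_rules, binary_rules, start="S"):
    """CKY recognition.
    lexical_rules: set of (A, w) pairs for rules A -> w
    binary_rules:  set of (A, B, C) triples for rules A -> B C
    Returns True iff the state (start, 0, n) is derivable."""
    n = len(words)
    chart = defaultdict(set)                 # (i, j) -> non-terminals spanning words[i:j]
    for i, w in enumerate(words):            # base case: (A, i, i+1) for each A -> w_{i+1}
        for A, word in lexical_rules:
            if word == w:
                chart[(i, i + 1)].add(A)
    for span in range(2, n + 1):             # recursive case: (B,i,k), (C,k,j) => (A,i,j)
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for A, B, C in binary_rules:
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        chart[(i, j)].add(A)
    return start in chart[(0, n)]

With the grammar from the structural-ambiguity slide, this returns True for "John called Mary from Denver".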

6
Example
7
Base Case: A → w
8
Recursive Cases: A → BC
9–20
(No transcript)
21
Probabilistic CKY
  • Assign probabilities to constituents as they are
    completed and placed in the table
  • Computing the probability
  • Since we are interested in the maximum-probability
    (S, 0, n), use the max probability for each
    constituent
  • Maintain back-pointers to recover the parse
    (see the sketch below).
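
A sketch of probabilistic CKY along these lines, extending the recognizer above with max probabilities and back-pointers; rule probabilities are assumed to be given as dictionaries:

def pcky_parse(words, lexical_rules, binary_rules, start="S"):
    """Probabilistic CKY: keep the max-probability analysis per (span, non-terminal).
    lexical_rules: dict (A, w) -> P(A -> w)
    binary_rules:  dict (A, B, C) -> P(A -> B C)"""
    n = len(words)
    best = {}   # (i, j, A) -> max probability of A spanning words[i:j]
    back = {}   # (i, j, A) -> (k, B, C) split, or the word for lexical entries
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w and p > best.get((i, i + 1, A), 0.0):
                best[(i, i + 1, A)] = p
                back[(i, i + 1, A)] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p_rule in binary_rules.items():
                    p = p_rule * best.get((i, k, B), 0.0) * best.get((k, j, C), 0.0)
                    if p > best.get((i, j, A), 0.0):
                        best[(i, j, A)] = p
                        back[(i, j, A)] = (k, B, C)
    # Probability of the best parse, plus back-pointers to reconstruct it
    return best.get((0, n, start), 0.0), back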

22
Problems with PCFGs
  • The probability model we're using is based only
    on the rules in the derivation.
  • Lexical insensitivity
  • Doesn't use the words in any real way
  • Structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its
    object, and the preposition
  • I ate pickles with a fork.
  • I ate pickles with relish.
  • Context insensitivity of the derivation
  • Doesn't take into account where in the derivation
    a rule is used
  • Pronouns are more often subjects than objects
  • She hates Mary.
  • Mary hates her.
  • Solution: Lexicalization
  • Add lexical information to each rule

23
An example of lexical information Heads
  • Make use of notion of the head of a phrase
  • Head of an NP is a noun
  • Head of a VP is the main verb
  • Head of a PP is its preposition
  • Each LHS of a rule in the PCFG has a lexical item
  • Each RHS non-terminal has a lexical item
  • One of the lexical items is shared with the LHS
    (as in the head-percolation sketch below)
  • If R is the number of binary branching rules in the
    CFG and Σ is the set of lexical items, the
    lexicalized CFG has O(2·|Σ|·R) binary rules
  • Unary rules: O(|Σ|·R)
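
An illustrative head-percolation sketch based on the head notions above; real lexicalized parsers use richer head-finding tables, and the tree encoding here is an assumption for the example:

# Toy head-finding rules in the spirit of the bullets above
HEAD_CHILD = {
    "NP": "N",    # head of an NP is a noun
    "VP": "V",    # head of a VP is the main verb
    "PP": "P",    # head of a PP is its preposition
}

def lexicalize(tree):
    """tree: (label, children), where children is a list of subtrees or,
    for a pre-terminal, a word string. Returns (label, head_word, children)
    by percolating up the head child's word."""
    label, children = tree
    if isinstance(children, str):            # pre-terminal: its word is its head
        return (label, children, children)
    lex_children = [lexicalize(c) for c in children]
    head_label = HEAD_CHILD.get(label)
    head_word = lex_children[0][1]           # default: leftmost child's head
    for child_label, child_head, _ in lex_children:
        if child_label == head_label:
            head_word = child_head
            break
    return (label, head_word, lex_children)

For example, lexicalize(("VP", [("V", "called"), ("NP", [("N", "Mary")])])) yields a VP headed by "called".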

24
Example (correct parse)
Attribute grammar
25
Example (less preferred)
26
Computing Lexicalized Rule Probabilities
  • We started with rule probabilities
  • VP → V NP PP    P(rule | VP)
  • E.g., count of this rule divided by the number of
    VPs in a treebank
  • Now we want lexicalized probabilities
  • VP(dumped) → V(dumped) NP(sacks) PP(in)
  • P(rule | VP, dumped is the verb, sacks is the
    head of the NP, in is the head of the PP)
  • Not likely to have significant counts in any
    treebank
  • Back off to less specific contexts until the
    estimates are reliable (see the sketch below)
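
One simple way to realize the back-off idea, sketched with an assumed counts table and an arbitrary reliability threshold (neither is specified on the slide):

def lexicalized_rule_prob(counts, rule, heads, threshold=10):
    """Back off from the fully lexicalized context to less specific ones until
    a context has been seen often enough. `counts` maps a context key to
    (rule_count, context_count); the keys and threshold are illustrative."""
    contexts = [
        (rule, heads),        # e.g. VP(dumped) -> V(dumped) NP(sacks) PP(in)
        (rule, heads[:1]),    # condition only on the head of the LHS ("dumped")
        (rule, ()),           # plain, unlexicalized PCFG rule
    ]
    for key in contexts:
        rule_count, context_count = counts.get(key, (0, 0))
        if context_count >= threshold:
            return rule_count / context_count
    return 0.0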

27
Another Example
  • Consider the VPs
  • Ate spaghetti with gusto
  • Ate spaghetti with marinara
  • The relevant dependency is not between mother and child.

[Two lexicalized trees: in "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" it attaches to NP(spaghetti)]
28
Log-linear models for Parsing
  • Why restrict the conditioning to the elements
    of a rule?
  • Use even larger context
  • Word sequence, word types, sub-tree context etc.
  • In general, compute P(y|x) = exp(Σi λi fi(x, y)) / Z(x),
    where fi(x, y) tests a property of the context and
    λi is the weight of that feature
  • Use these as scores in the CKY algorithm to find
    the best-scoring parse (see the sketch below).
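
A sketch of the log-linear computation above; the feature function and weights are assumed inputs:

import math

def loglinear_prob(x, y, candidates, features, weights):
    """P(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), normalized over the
    candidate analyses of x. `features(x, y)` returns a dict of feature values;
    `weights` maps feature names to their lambdas."""
    def score(cand):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(x, cand).items())
    z = sum(math.exp(score(cand)) for cand in candidates)   # Z(x)
    return math.exp(score(y)) / z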

29
Parsing as sequential decision making process
  • Parsing: a series of decisions
  • Lexical category label, structural attachment,
    phrasal category label
  • Each decision is treated as a classification task,
    trained using some context (see the sketch after
    this list)
  • Classification techniques (SVM, MaxEnt, Decision
    Trees) can be used to train these decision
    classifiers.
  • Context could depend on previous decisions
    (CKY-style decoding)
  • CFGs can be recognized using Push Down Automata
    (PDA)
  • Probabilistic extensions of PDA
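
A minimal sketch of parsing as a sequence of classification decisions, here as a shift-reduce loop; the classifier `decide` and the action format are assumptions for illustration:

def parse_by_decisions(words, decide):
    """Parsing as sequential decisions. `decide(stack, buffer)` stands in for a
    trained classifier (SVM, MaxEnt, decision tree, ...) that returns either
    "shift" or ("reduce", label, arity)."""
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        action = decide(stack, buffer)
        if action == "shift":
            stack.append(("word", buffer.pop(0)))
        else:
            _, label, arity = action              # combine the top `arity` items
            children = [stack.pop() for _ in range(arity)][::-1]
            stack.append((label, children))
    return stack[0]                               # a tree covering the whole input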

30
Supertagging: Almost parsing
Poachers now control the underground trade

[Figure: candidate supertags (elementary trees) for the words of the sentence]
  • Selecting the correct supertag for a word is
    almost parsing
  • Use classifiers to select the correct supertag

31
Summary
  • Parsing context-free grammars
  • Top-down and Bottom-up parsers
  • Mixed approaches (CKY, Earley parsers)
  • Preferences over parses using probabilities
  • Parsing with PCFG and PCKY algorithms
  • Enriching the probability model
  • Lexicalization
  • Log-linear models for parsing
  • Classification techniques for parsing decisions