Title: Statistical Methods in Computational Linguistics (Statistische Methoden in der Computerlinguistik)
1 Statistical Methods in Computational Linguistics
- 12. Probabilistic Parsing
- Jonas Kuhn
- Universität Potsdam, 2007
2 Overview
- Probabilistic Context-Free Grammars (following Jurafsky/Martin, ch. 12)
- Definition, independence assumptions
- Determining the most likely reading of a sentence
- Training a PCFG on a treebank
- Summary: computational tasks for PCFGs
- Simple Prolog implementation, using DCGs
- Problems with simple PCFGs
- Approaches for dealing with them
- The Penn Treebank tree format
- Evaluation of probabilistic parsers
3 Probabilistic Context-Free Grammars
- Context-free grammar
- Set of non-terminal symbols N
- Set of terminal symbols Σ
- Set of productions of the form A → β (where A is a nonterminal, β a string of terminals/nonterminals)
- Start symbol (from N)
- Probabilistic CFG augments productions with a conditional probability
- A → β [p]
4 Probabilistic Context-Free Grammars
- S → NP VP .80
- S → Aux NP VP .15
- S → VP .05
- NP → Det Nom .20
- NP → PropN .35
- NP → Nom .05
- NP → Pronoun .40
- Nom → Noun .75
- Nom → Noun Nom .20
- Nom → PropN Nom .05
- VP → Verb .55
- VP → Verb NP .40
- VP → Verb NP NP .05
- Det → that .05
- Det → the .80
- Det → a .15
- Noun → book .10
- Noun → flights .50
- Noun → meal .40
- Verb → book .30
- Verb → include .30
- Verb → want .40
- Aux → can .40
- Aux → does .30
- Aux → do .30
- PropN → TWA .40
- PropN → Denver .40
- Pronoun → you .40
- Pronoun → I .60
5 Probabilistic Context-Free Grammars
- The probability p in A → β [p] encodes how likely it is that nonterminal A will be rewritten as β
- Typically written as P(A → β | A), or simply P(A → β)
- Conditional probability of a certain expansion given the left-hand side non-terminal A
- The probabilities of all expansions of a non-terminal have to sum to 1
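The normalization constraint above can be checked mechanically. A minimal sketch in Python, storing a fragment of the slides' toy grammar as a dict from (lhs, rhs) pairs to conditional probabilities (the dict layout is an assumption for illustration, not part of the slides):

```python
# Sketch: a PCFG fragment stored as {(lhs, rhs): P(lhs -> rhs | lhs)},
# plus a check that the expansions of each nonterminal sum to 1.
from collections import defaultdict

pcfg = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("Det", "Nom")): 0.20,
    ("NP", ("PropN",)): 0.35,
    ("NP", ("Nom",)): 0.05,
    ("NP", ("Pronoun",)): 0.40,
}

def is_normalized(grammar):
    """True iff, for every nonterminal, its expansion probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in grammar.items():
        totals[lhs] += p
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

print(is_normalized(pcfg))  # True
```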
6 Uses of PCFGs
- Computing the probability of a parse tree
- Parse tree T with nodes n, where r(n) is the grammar rule used to expand n
- Joint probability of T and sentence S: P(T, S) = P(T) · P(S | T) = P(T), since the words are part of the parse tree, so P(S | T) = 1
- P(T) = ∏_n p(r(n)), the product of the probabilities of the rules used at the nodes of T
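The product over rule probabilities can be sketched as a recursive walk over a tree. This is an illustrative implementation, not from the slides: trees are nested tuples whose leaves are words, and the tiny grammar below is a hypothetical fragment of the toy grammar:

```python
# Sketch: P(T) = product of the probabilities of the rules r(n) used
# at each node n.  A tree is (label, child, ...); leaves are word strings.
def tree_prob(tree, grammar):
    label, children = tree[0], tree[1:]
    # The rule at this node: lhs = label, rhs = child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c, grammar)
    return p

grammar = {
    ("S", ("NP", "VP")): 0.80,
    ("NP", ("Pronoun",)): 0.40,
    ("Pronoun", ("you",)): 0.40,
    ("VP", ("Verb",)): 0.55,
    ("Verb", ("book",)): 0.30,
}
t = ("S", ("NP", ("Pronoun", "you")), ("VP", ("Verb", "book")))
print(tree_prob(t, grammar))  # 0.8 * 0.4 * 0.4 * 0.55 * 0.3
```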
7 The probability of a parse tree
- Two example trees for "can you book TWA flights" (Jurafsky/Martin, p. 451)
- Note: mistake in Jurafsky/Martin; one extra NP → Pronoun rewrite for each tree
8 The probability of a parse tree
- Reading 1 (left tree):
  P(Tl) = 0.15 × 0.40 × 0.40 × 0.40 × 0.40 × 0.30 × 0.05 × 0.05 × 0.40 × 0.75 × 0.50
        = 4.32 × 10^-7
- Reading 2 (right tree):
  P(Tr) = 0.15 × 0.40 × 0.40 × 0.40 × 0.50 × 0.30 × 0.35 × 0.40 × 0.05 × 0.75 × 0.50
        = 3.78 × 10^-6
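The two products can be checked mechanically; written out as plain arithmetic they reproduce the slide's results:

```python
# The two rule-probability products from the slide, multiplied out
# explicitly so the arithmetic can be verified.
p_left = 0.15 * 0.40 * 0.40 * 0.40 * 0.40 * 0.30 * 0.05 * 0.05 * 0.40 * 0.75 * 0.50
p_right = 0.15 * 0.40 * 0.40 * 0.40 * 0.50 * 0.30 * 0.35 * 0.40 * 0.05 * 0.75 * 0.50
print(f"{p_left:.3g} {p_right:.3g}")  # 4.32e-07 3.78e-06
```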
9 The probability of a parse tree
- Example from Jurafsky/Martin, 2nd edition
11 Uses of PCFGs: Disambiguation
- Comparing the probabilities of all parse trees for a given sentence
12 Uses of PCFGs: Language Modeling
- Probability of a sentence
- Unambiguous sentence: P(S) = P(T, S) = P(T)
- Ambiguous sentence: P(S) = Σ_T P(T, S), the sum of the probabilities of all possible parse trees for that sentence
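For the ambiguous example sentence, the sum runs over the two readings computed on slide 8 (a trivial sketch, using those two probabilities):

```python
# P(S) for an ambiguous sentence: sum over the probabilities of its
# parses; here, the two readings of "can you book TWA flights" (slide 8).
p_parses = [4.32e-7, 3.78e-6]
p_sentence = sum(p_parses)
print(f"{p_sentence:.4g}")  # 4.212e-06
```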
13 Where do we get the probabilities from?
- Two possibilities
- Training the PCFG on a treebank
- Maximum likelihood estimate (relative frequency): P(A → β) = Count(A → β) / Count(A)
- Training the PCFG on an unannotated corpus
- Trees for unambiguous sentences can be counted directly
- For ambiguous sentences, the rule expansions in the various parses get partial counts, according to the probability of the parse they occur in: Expectation Maximization (EM) algorithm
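The relative-frequency estimate from a treebank can be sketched as simple counting. The mini-treebank below is hypothetical (each tree reduced to the list of rules it uses), purely to illustrate Count(A → β) / Count(A):

```python
# Sketch of maximum likelihood estimation of PCFG rule probabilities:
# P(A -> beta) = Count(A -> beta) / Count(A), counted over a treebank.
from collections import Counter

# Hypothetical mini-treebank: each "tree" is the list of rules used in it.
trees = [
    [("NP", ("Det", "Noun")), ("NP", ("Pronoun",))],
    [("NP", ("Det", "Noun"))],
    [("NP", ("Det", "Noun"))],
]
rule_counts = Counter(r for t in trees for r in t)
lhs_counts = Counter(lhs for t in trees for (lhs, _rhs) in t)

def mle(lhs, rhs):
    """Relative frequency of the rule lhs -> rhs among all expansions of lhs."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("NP", ("Det", "Noun")))  # 3 of the 4 NP expansions: 0.75
```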
14 Computational Tasks for PCFGs
- Computing the probability of a given string (summing over all readings/trees)
- Inside algorithm / outside algorithm (compare the forward and backward algorithms for HMMs)
- Finding the most likely tree for a sentence
- Viterbi algorithm (avoiding the re-computation of probabilities for subconstituents)
- Training a PCFG without having a treebank
- Inside-outside algorithm, an instance of the Expectation Maximization (EM) algorithm (compare the forward-backward algorithm for HMMs)
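The Viterbi idea for PCFGs can be sketched as CKY-style dynamic programming over spans. This is an illustrative sketch, assuming a grammar already in Chomsky normal form and returning only the probability of the best parse (the `lexical`/`binary` dict layout and the toy grammar are assumptions, not from the slides):

```python
# Sketch of Viterbi/CKY: best[(i, j, label)] is the probability of the
# most likely subtree with root `label` over words[i:j].  Assumes a
# grammar in Chomsky normal form (unary lexical + binary rules).
def viterbi_cky(words, lexical, binary, start="S"):
    n = len(words)
    best = {}                                    # (i, j, label) -> probability
    for i, w in enumerate(words):                # fill in the lexical cells
        for (label, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, label)] = p
    for span in range(2, n + 1):                 # combine smaller spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, (b, c)), p in binary.items():
                    if (i, k, b) in best and (k, j, c) in best:
                        q = p * best[(i, k, b)] * best[(k, j, c)]
                        if q > best.get((i, j, lhs), 0.0):
                            best[(i, j, lhs)] = q
    return best.get((0, n, start), 0.0)

# Toy grammar for the recap sentence "it works".
lexical = {("N", "it"): 1.0, ("V", "works"): 1.0}
binary = {("S", ("N", "V")): 0.9}
print(viterbi_cky(["it", "works"], lexical, binary))  # 0.9
```

Backpointers for recovering the tree itself are omitted here; a full implementation would store, for each cell, the split point and rule that achieved the maximum.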
15 A simple Prolog implementation
- Using Definite Clause Grammars (DCGs)
- Recap:
  s --> np, vp.
  np --> n.
  vp --> v.
  n --> [it].
  v --> [works].
- Internal difference-list representation:
  s(A, B) :-
      np(A, C),
      vp(C, B).
- Query:
  ?- s([it, works], []).
16 DCGs: Recap
- Additional arguments:
  s(s(NP,VP)) --> np(NP), vp(VP).
  np(np(N)) --> n(N).
  vp(vp(V)) --> v(V).
  n(it) --> [it].
  v(works) --> [works].
- Additional predicate calls:
  s --> np(NPagr), vp(non3sg),
      { NPagr \= 3sg }.
17 PCFGs as DCGs
- With DCGs we can implement PCFGs and the computation of a tree's probability straightforwardly (albeit inefficiently)
18 DCG implementation of PCFGs
- Example query:
  ?- s(P, [can, you, book, twa, flights], []).
  P = 4.32e-007 ;
  P = 3.78e-006 ;
  No
19 Probabilistic Context-Free Grammars
- S → NP VP .80
- S → Aux NP VP .15
- S → VP .05
- NP → Det Nom .20
- NP → PropN .35
- NP → Nom .05
- NP → Pronoun .40
- Nom → Noun .75
- Nom → Noun Nom .20
- Nom → PropN Nom .05
- VP → Verb .55
- VP → Verb NP .40
- VP → Verb NP NP .05
- Det → that .05
- Det → the .80
- Det → a .15
- Noun → book .10
- Noun → flights .50
- Noun → meal .40
- Verb → book .30
- Verb → include .30
- Verb → want .40
- Aux → can .40
- Aux → does .30
- Aux → do .30
- PropN → TWA .40
- PropN → Denver .40
- Pronoun → you .40
- Pronoun → I .60
20 DCG implementation of PCFGs
  s(P) --> np(P1), vp(P2), { P is P1 * P2 * 0.8 }.
  s(P) --> aux(P1), np(P2), vp(P3),
      { P is P1 * P2 * P3 * 0.15 }.
  s(P) --> vp(P1), { P is P1 * 0.05 }.
  np(P) --> det(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  np(P) --> pn(P1), { P is P1 * 0.35 }.
  np(P) --> nom(P1), { P is P1 * 0.05 }.
  np(P) --> pron(P1), { P is P1 * 0.4 }.
  nom(P) --> n(P1), { P is P1 * 0.75 }.
  nom(P) --> n(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  nom(P) --> pn(P1), nom(P2), { P is P1 * P2 * 0.05 }.
  vp(P) --> v(P1), { P is P1 * 0.55 }.
  vp(P) --> v(P1), np(P2), { P is P1 * P2 * 0.4 }.
  vp(P) --> v(P1), np(P2), np(P3),
      { P is P1 * P2 * P3 * 0.05 }.
  det(0.05) --> [that].
  det(0.8) --> [the].
  det(0.15) --> [a].
  n(0.1) --> [book].
  n(0.5) --> [flights].
  n(0.4) --> [meal].
  v(0.3) --> [book].
  v(0.3) --> [include].
  v(0.4) --> [want].
  aux(0.4) --> [can].
  aux(0.3) --> [does].
  aux(0.3) --> [do].
  pn(0.4) --> [twa].
  pn(0.4) --> [denver].
  pron(0.4) --> [you].
  pron(0.6) --> [i].
21 Overview
- Probabilistic Context-Free Grammars (following Jurafsky/Martin, ch. 12)
- Definition, independence assumptions
- Determining the most likely reading of a sentence
- Training a PCFG on a treebank
- Summary: computational tasks for PCFGs
- Simple Prolog implementation, using DCGs
- Problems with simple PCFGs
- Approaches for dealing with them
- The Penn Treebank tree format
- Evaluation of probabilistic parsers
22 Problems with simple PCFGs: Independence assumptions (1)
- Structural dependencies
- Each PCFG rule (application) is assumed to be independent of all other rules
- NP → Pronoun vs. NP → Det Noun: same probabilities for all occurrences of NP
- But subjects in declarative sentences: 91% pronouns / 9% lexical (objects: 34% pronouns / 66% lexical)
- She's able to take her baby to work with her.
- Uh, my wife worked until we had a family.
- Some laws absolutely prohibit it.
- All the people signed confessions.
23 Problems with simple PCFGs: (1) Structural dependencies
- Approaches dealing with structural dependencies
- Transform the CFG into a format that keeps track of the grandmother category (Johnson 1998)
- ROOT → S
- S → NP VP
- VP → V NP
- becomes
- S^ROOT → NP^S VP^S
- VP^S → V^VP NP^VP
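The category-splitting transform above can be sketched as a recursive relabeling of treebank trees before rule counting. A minimal sketch (trees as nested tuples, leaves as words; the `^` label format follows the slide):

```python
# Sketch of parent-annotation in the style of Johnson (1998): each
# nonterminal is split by the category above it, e.g. NP under S
# becomes NP^S, so the PCFG can distinguish subject and object NPs.
def annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # leaf: a word, left unchanged
        return tree
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}"
    # Children are annotated with this node's *original* label.
    return (new_label,) + tuple(annotate(c, label) for c in children)

t = ("S", ("NP", "it"), ("VP", ("V", "works")))
print(annotate(t))
# ('S^ROOT', ('NP^S', 'it'), ('VP^S', ('V^VP', 'works')))
```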
24 Problems with simple PCFGs: Independence assumptions (2)
- Lexical dependencies
- PCFGs are insensitive to lexical information
- Moscow sent more than 100,000 soldiers into Afghanistan.
- Choice between NP → NP PP and VP → VP PP is independent of lexical choice
- Overall: 67% NP attachment, 33% VP attachment
- Subcategorization of "send" is not taken into account
25 Problems with simple PCFGs: (2) Lexical dependencies
- Approaches dealing with lexical dependencies
- Statistical model keeping track of lexical dependencies: head-lexicalized PCFGs
- Example: "Workers dumped sacks into a bin"
- VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10^-10]
- Impossible to train such parameters directly (far too little data)
- Splitting this up into probabilities of subconfigurations:
- p(head(node) = into | node = PP, head(mother(node)) = dumped)
- abbreviated p(into | PP, dumped)
26 The Penn Treebank tree format
- Example (from the ATIS section: air travel information service)
  ( (SQ Does/VBZ
       (NP-SBJ this/DT flight/NN )
       (VP serve/VB
         (NP dinner/NN ))))
  ( END_OF_TEXT_UNIT )
- Category labels: SQ, NP, VP
- (Pre-)terminals: word forms with part-of-speech tags VBZ, DT, NN, VB
- Additional labels for grammatical relations: NP-SBJ, PP-DIR, PP-LOC
27 The Penn Treebank tree format
- Additional annotations: empty syntactic nodes, indices
  ( (S
      (NP-SBJ */XXX )
      (VP List/VB
        (NP
          (NP the/DT flights/NNS )
          (PP-DIR from/IN
            (NP Baltimore/NNP ))
          (PP-DIR to/TO
            (NP Seattle/NNP ))
          (SBAR
            (WHNP-1 that/WDT )
            (S
              (NP-SBJ *T*-1/XXX )
              (VP stop/VBP
                (PP-LOC in/IN
                  (NP Minneapolis/NNP )))))))))
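The bracketed format above is plain s-expressions, so a small recursive-descent reader suffices to load such trees. A minimal sketch (not a full PTB reader; `word/TAG` tokens are kept as opaque strings):

```python
# Minimal sketch of a reader for Penn-Treebank-style bracketings:
# turns "(S (NP it/PRP) (VP works/VBZ))" into nested tuples.
import re

def parse_ptb(s):
    # Tokens are "(", ")", or any run of non-space, non-paren characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1                      # consume ")"
        return (label,) + tuple(children)

    return node()

print(parse_ptb("(SQ Does/VBZ (NP-SBJ this/DT flight/NN) (VP serve/VB (NP dinner/NN)))"))
```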
28 Evaluation of probabilistic parsers
- Standard measures: PARSEVAL measures
- Compare Information Retrieval measures (for finding items of category X)
- Recall = (# items of category X correctly found) / (# items of category X in the gold standard)
- Precision = (# items of category X correctly found) / (# items the system labeled as category X)
29 Evaluation of probabilistic parsers
- Standard measures: PARSEVAL measures
- Labeled recall = (# correct constituents in the candidate parse) / (# constituents in the treebank tree)
- Labeled precision = (# correct constituents in the candidate parse) / (# constituents in the candidate parse)
- Crossing brackets: the number of crossed brackets, e.g. constituents for which the treebank has a bracketing such as ((A B) C) but the candidate parse has (A (B C))
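Labeled precision and recall reduce to comparing the sets of labeled spans (label, start, end) in the gold and candidate trees. A minimal sketch (trees as nested tuples with word leaves; crossing brackets omitted):

```python
# Sketch of PARSEVAL labeled precision/recall: a constituent is correct
# iff its (label, start, end) span also occurs in the treebank tree.
def spans(tree, start=0):
    """List (label, start, end) for each constituent; first entry is the root's."""
    label, children = tree[0], tree[1:]
    result = []
    end = start
    for c in children:
        if isinstance(c, str):
            end += 1                  # a word advances the span by one
        else:
            sub = spans(c, end)
            result += sub
            end = sub[0][2]           # sub[0] is the child's own span
    return [(label, start, end)] + result

def labeled_pr(gold, candidate):
    g, c = set(spans(gold)), set(spans(candidate))
    correct = len(g & c)
    return correct / len(c), correct / len(g)   # (precision, recall)

gold = ("S", ("NP", "it"), ("VP", ("V", "works")))
cand = ("S", ("NP", "it"), ("VP", "works"))      # flat VP, missing the V node
print(labeled_pr(gold, cand))  # (1.0, 0.75)
```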
30 Evaluation of probabilistic parsers
- PARSEVAL measures for state-of-the-art parsers (Charniak 1997, Collins 1999)
- labeled recall: c. 90%
- labeled precision: c. 90%
- crossing brackets: c. 1 per sentence