Title: Statistical Methods in Computational Linguistics (Statistische Methoden in der Computerlinguistik)
1 Statistical Methods in Computational Linguistics
- 12. Probabilistic Parsing
- Jonas Kuhn
- Universität Potsdam, 2007
2 Overview
- Probabilistic Context-Free Grammars (following Jurafsky/Martin, ch. 12)
- Definition, independence assumptions
- Determining the most likely reading of a sentence
- Training a PCFG on a treebank
- Summary: computational tasks for PCFGs
- Simple Prolog implementation, using DCGs
- Problems with simple PCFGs
- Approaches for dealing with them
- The Penn Treebank tree format
- Evaluation of probabilistic parsers
3 Probabilistic Context-Free Grammars
- Context-free grammar
- Set of non-terminal symbols N
- Set of terminal symbols Σ
- Set of productions of the form A → β (where A is a nonterminal, β a string of terminals/nonterminals)
- Start symbol (from N)
- Probabilistic CFG augments productions with a conditional probability
- A → β [p]
4 Probabilistic Context-Free Grammars
- S → NP VP .80
- S → Aux NP VP .15
- S → VP .05
- NP → Det Nom .20
- NP → PropN .35
- NP → Nom .05
- NP → Pronoun .40
- Nom → Noun .75
- Nom → Noun Nom .20
- Nom → PropN Nom .05
- VP → Verb .55
- VP → Verb NP .40
- VP → Verb NP NP .05
- Det → that .05
- Det → the .80
- Det → a .15
- Noun → book .10
- Noun → flights .50
- Noun → meal .40
- Verb → book .30
- Verb → include .30
- Verb → want .40
- Aux → can .40
- Aux → does .30
- Aux → do .30
- PropN → TWA .40
- PropN → Denver .40
- Pronoun → you .40
- Pronoun → I .60
5 Probabilistic Context-Free Grammars
- The probability p in A → β [p] encodes how likely it is that nonterminal A will be rewritten as β
- Typically written as P(A → β | A), or simply P(A → β)
- Conditional probability of a certain expansion given the left-hand side non-terminal A
- The probabilities of all expansions of a non-terminal have to sum to 1
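The normalization constraint above can be checked mechanically. A minimal sketch in Python, storing a fragment of the slides' toy grammar as a dict from (lhs, rhs) pairs to conditional probabilities (the dict layout is an assumption for illustration, not part of the slides):

```python
# Sketch: a PCFG fragment stored as {(lhs, rhs): P(lhs -> rhs | lhs)},
# plus a check that the expansions of each nonterminal sum to 1.
from collections import defaultdict

pcfg = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("Det", "Nom")): 0.20,
    ("NP", ("PropN",)): 0.35,
    ("NP", ("Nom",)): 0.05,
    ("NP", ("Pronoun",)): 0.40,
}

def is_normalized(grammar):
    """True iff, for every nonterminal, its expansion probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in grammar.items():
        totals[lhs] += p
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

print(is_normalized(pcfg))  # True
```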
6 Uses of PCFGs
- Computing the probability of a parse tree
- Parse tree T with nodes n, where r(n) is the grammar rule used to expand n
- Joint probability of T and sentence S: P(T, S) = P(T) · P(S | T) = P(T), since the words are part of the parse tree, so P(S | T) = 1
- P(T) = ∏_n p(r(n)), the product of the probabilities of the rules used at the nodes of T
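The product over rule probabilities can be sketched as a recursive walk over a tree. This is an illustrative implementation, not from the slides: trees are nested tuples whose leaves are words, and the tiny grammar below is a hypothetical fragment of the toy grammar:

```python
# Sketch: P(T) = product of the probabilities of the rules r(n) used
# at each node n.  A tree is (label, child, ...); leaves are word strings.
def tree_prob(tree, grammar):
    label, children = tree[0], tree[1:]
    # The rule at this node: lhs = label, rhs = child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c, grammar)
    return p

grammar = {
    ("S", ("NP", "VP")): 0.80,
    ("NP", ("Pronoun",)): 0.40,
    ("Pronoun", ("you",)): 0.40,
    ("VP", ("Verb",)): 0.55,
    ("Verb", ("book",)): 0.30,
}
t = ("S", ("NP", ("Pronoun", "you")), ("VP", ("Verb", "book")))
print(tree_prob(t, grammar))  # 0.8 * 0.4 * 0.4 * 0.55 * 0.3
```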
7 The probability of a parse tree
- Two example trees for "can you book TWA flights" (Jurafsky/Martin, p. 451)
- Note: mistake in Jurafsky/Martin; one extra NP → Pronoun rewrite for each tree
8 The probability of a parse tree
- Reading 1 (left tree):
  P(Tl) = 0.15 × 0.40 × 0.40 × 0.40 × 0.40 × 0.30 × 0.05 × 0.05 × 0.40 × 0.75 × 0.50
        = 4.32 × 10^-7
- Reading 2 (right tree):
  P(Tr) = 0.15 × 0.40 × 0.40 × 0.40 × 0.50 × 0.30 × 0.35 × 0.40 × 0.05 × 0.75 × 0.50
        = 3.78 × 10^-6
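The two products can be checked mechanically; written out as plain arithmetic they reproduce the slide's results:

```python
# The two rule-probability products from the slide, multiplied out
# explicitly so the arithmetic can be verified.
p_left = 0.15 * 0.40 * 0.40 * 0.40 * 0.40 * 0.30 * 0.05 * 0.05 * 0.40 * 0.75 * 0.50
p_right = 0.15 * 0.40 * 0.40 * 0.40 * 0.50 * 0.30 * 0.35 * 0.40 * 0.05 * 0.75 * 0.50
print(f"{p_left:.3g} {p_right:.3g}")  # 4.32e-07 3.78e-06
```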
9 The probability of a parse tree
- Example from Jurafsky/Martin, 2nd edition
11 Uses of PCFGs: Disambiguation
- Comparing the probabilities of all parse trees for a given sentence
12 Uses of PCFGs: Language Modeling
- Probability of a sentence
- Unambiguous sentence: P(S) = P(T, S) = P(T)
- Ambiguous sentence: P(S) = Σ_T P(T, S), the sum of the probabilities of all possible parse trees for that sentence
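For the ambiguous example sentence, the sum runs over the two readings computed on slide 8 (a trivial sketch, using those two probabilities):

```python
# P(S) for an ambiguous sentence: sum over the probabilities of its
# parses; here, the two readings of "can you book TWA flights" (slide 8).
p_parses = [4.32e-7, 3.78e-6]
p_sentence = sum(p_parses)
print(f"{p_sentence:.4g}")  # 4.212e-06
```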
13 Where do we get the probabilities from?
- Two possibilities
- Training the PCFG on a treebank
- Maximum likelihood estimate (relative frequency): P(A → β) = Count(A → β) / Count(A)
- Training the PCFG on an unannotated corpus
- Trees for unambiguous sentences can be counted directly
- For ambiguous sentences, the rule expansions in the various parses get partial counts, according to the probability of the parse they occur in: Expectation Maximization (EM) algorithm
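The relative-frequency estimate from a treebank can be sketched as simple counting. The mini-treebank below is hypothetical (each tree reduced to the list of rules it uses), purely to illustrate Count(A → β) / Count(A):

```python
# Sketch of maximum likelihood estimation of PCFG rule probabilities:
# P(A -> beta) = Count(A -> beta) / Count(A), counted over a treebank.
from collections import Counter

# Hypothetical mini-treebank: each "tree" is the list of rules used in it.
trees = [
    [("NP", ("Det", "Noun")), ("NP", ("Pronoun",))],
    [("NP", ("Det", "Noun"))],
    [("NP", ("Det", "Noun"))],
]
rule_counts = Counter(r for t in trees for r in t)
lhs_counts = Counter(lhs for t in trees for (lhs, _rhs) in t)

def mle(lhs, rhs):
    """Relative frequency of the rule lhs -> rhs among all expansions of lhs."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("NP", ("Det", "Noun")))  # 3 of the 4 NP expansions: 0.75
```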
14 Computational Tasks for PCFGs
- Computing the probability of a given string (summing over all readings/trees)
- Inside algorithm / outside algorithm (compare the forward and backward algorithms for HMMs)
- Finding the most likely tree for a sentence
- Viterbi algorithm (avoiding the re-computation of probabilities for subconstituents)
- Training a PCFG without having a treebank
- Inside-outside algorithm, an instance of the Expectation Maximization (EM) algorithm (compare the forward-backward algorithm for HMMs)
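The Viterbi idea for PCFGs can be sketched as CKY-style dynamic programming over spans. This is an illustrative sketch, assuming a grammar already in Chomsky normal form and returning only the probability of the best parse (the `lexical`/`binary` dict layout and the toy grammar are assumptions, not from the slides):

```python
# Sketch of Viterbi/CKY: best[(i, j, label)] is the probability of the
# most likely subtree with root `label` over words[i:j].  Assumes a
# grammar in Chomsky normal form (unary lexical + binary rules).
def viterbi_cky(words, lexical, binary, start="S"):
    n = len(words)
    best = {}                                    # (i, j, label) -> probability
    for i, w in enumerate(words):                # fill in the lexical cells
        for (label, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, label)] = p
    for span in range(2, n + 1):                 # combine smaller spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, (b, c)), p in binary.items():
                    if (i, k, b) in best and (k, j, c) in best:
                        q = p * best[(i, k, b)] * best[(k, j, c)]
                        if q > best.get((i, j, lhs), 0.0):
                            best[(i, j, lhs)] = q
    return best.get((0, n, start), 0.0)

# Toy grammar for the recap sentence "it works".
lexical = {("N", "it"): 1.0, ("V", "works"): 1.0}
binary = {("S", ("N", "V")): 0.9}
print(viterbi_cky(["it", "works"], lexical, binary))  # 0.9
```

Backpointers for recovering the tree itself are omitted here; a full implementation would store, for each cell, the split point and rule that achieved the maximum.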
15 A simple Prolog implementation
- Using Definite Clause Grammars (DCGs)
- Recap:
  s --> np, vp.
  np --> n.
  vp --> v.
  n --> [it].
  v --> [works].
- Internal difference-list representation:
  s(A, B) :-
      np(A, C),
      vp(C, B).
- Query:
  ?- s([it, works], []).
16 DCGs: Recap
- Additional arguments:
  s(s(NP,VP)) --> np(NP), vp(VP).
  np(np(N)) --> n(N).
  vp(vp(V)) --> v(V).
  n(it) --> [it].
  v(works) --> [works].
- Additional predicate calls:
  s --> np(NPagr), vp(non3sg),
      { NPagr \= 3sg }.
17 PCFGs as DCGs
- With DCGs we can implement PCFGs and the computation of a tree's probability straightforwardly (albeit inefficiently)
18 DCG implementation of PCFGs
- Example query:
  ?- s(P, [can, you, book, twa, flights], []).
  P = 4.32e-007 ;
  P = 3.78e-006 ;
  No
19 Probabilistic Context-Free Grammars
- S → NP VP .80
- S → Aux NP VP .15
- S → VP .05
- NP → Det Nom .20
- NP → PropN .35
- NP → Nom .05
- NP → Pronoun .40
- Nom → Noun .75
- Nom → Noun Nom .20
- Nom → PropN Nom .05
- VP → Verb .55
- VP → Verb NP .40
- VP → Verb NP NP .05
- Det → that .05
- Det → the .80
- Det → a .15
- Noun → book .10
- Noun → flights .50
- Noun → meal .40
- Verb → book .30
- Verb → include .30
- Verb → want .40
- Aux → can .40
- Aux → does .30
- Aux → do .30
- PropN → TWA .40
- PropN → Denver .40
- Pronoun → you .40
- Pronoun → I .60
20 DCG implementation of PCFGs
  s(P) --> np(P1), vp(P2), { P is P1 * P2 * 0.8 }.
  s(P) --> aux(P1), np(P2), vp(P3),
      { P is P1 * P2 * P3 * 0.15 }.
  s(P) --> vp(P1), { P is P1 * 0.05 }.
  np(P) --> det(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  np(P) --> pn(P1), { P is P1 * 0.35 }.
  np(P) --> nom(P1), { P is P1 * 0.05 }.
  np(P) --> pron(P1), { P is P1 * 0.4 }.
  nom(P) --> n(P1), { P is P1 * 0.75 }.
  nom(P) --> n(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  nom(P) --> pn(P1), nom(P2), { P is P1 * P2 * 0.05 }.
  vp(P) --> v(P1), { P is P1 * 0.55 }.
  vp(P) --> v(P1), np(P2), { P is P1 * P2 * 0.4 }.
  vp(P) --> v(P1), np(P2), np(P3),
      { P is P1 * P2 * P3 * 0.05 }.
  det(0.05) --> [that].
  det(0.8) --> [the].
  det(0.15) --> [a].
  n(0.1) --> [book].
  n(0.5) --> [flights].
  n(0.4) --> [meal].
  v(0.3) --> [book].
  v(0.3) --> [include].
  v(0.4) --> [want].
  aux(0.4) --> [can].
  aux(0.3) --> [does].
  aux(0.3) --> [do].
  pn(0.4) --> [twa].
  pn(0.4) --> [denver].
  pron(0.4) --> [you].
  pron(0.6) --> [i].
21 Overview
- Probabilistic Context-Free Grammars (following Jurafsky/Martin, ch. 12)
- Definition, independence assumptions
- Determining the most likely reading of a sentence
- Training a PCFG on a treebank
- Summary: computational tasks for PCFGs
- Simple Prolog implementation, using DCGs
- Problems with simple PCFGs
- Approaches for dealing with them
- The Penn Treebank tree format
- Evaluation of probabilistic parsers
22 Problems with simple PCFGs: Independence assumptions (1)
- Structural dependencies
- Each PCFG rule (application) is assumed to be independent of all other rules
- NP → Pronoun vs. NP → Det Noun: same probabilities for all occurrences of NP
- But subjects in declarative sentences: 91% pronouns / 9% lexical (objects: 34% pronouns / 66% lexical)
- She's able to take her baby to work with her.
- Uh, my wife worked until we had a family.
- Some laws absolutely prohibit it.
- All the people signed confessions.
23 Problems with simple PCFGs: (1) Structural dependencies
- Approaches dealing with structural dependencies
- Transform the CFG into a format that keeps track of the grandmother category (Johnson 1998)
- ROOT → S
- S → NP VP
- VP → V NP
- becomes
- S^ROOT → NP^S VP^S
- VP^S → V^VP NP^VP
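The category-splitting transform above can be sketched as a recursive relabeling of treebank trees before rule counting. A minimal sketch (trees as nested tuples, leaves as words; the `^` label format follows the slide):

```python
# Sketch of parent-annotation in the style of Johnson (1998): each
# nonterminal is split by the category above it, e.g. NP under S
# becomes NP^S, so the PCFG can distinguish subject and object NPs.
def annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # leaf: a word, left unchanged
        return tree
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}"
    # Children are annotated with this node's *original* label.
    return (new_label,) + tuple(annotate(c, label) for c in children)

t = ("S", ("NP", "it"), ("VP", ("V", "works")))
print(annotate(t))
# ('S^ROOT', ('NP^S', 'it'), ('VP^S', ('V^VP', 'works')))
```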
24 Problems with simple PCFGs: Independence assumptions (2)
- Lexical dependencies
- PCFGs are insensitive to lexical information
- Moscow sent more than 100,000 soldiers into Afghanistan.
- Choice between NP → NP PP and VP → VP PP is independent of lexical choice
- Overall: 67% NP attachment, 33% VP attachment
- Subcategorization of "send" is not taken into account
25 Problems with simple PCFGs: (2) Lexical dependencies
- Approaches dealing with lexical dependencies
- Statistical model keeping track of lexical dependencies: head-lexicalized PCFGs
- Example: "Workers dumped sacks into a bin"
- VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10^-10]
- Impossible to train such parameters directly (far too little data)
- Splitting this up into probabilities of subconfigurations:
- p(head(node) = into | node = PP, head(mother(node)) = dumped)
- abbreviated p(into | PP, dumped)
26 The Penn Treebank tree format
- Example (from the ATIS section: air travel information service)
  ( (SQ Does/VBZ
       (NP-SBJ this/DT flight/NN )
       (VP serve/VB
         (NP dinner/NN ))))
  ( END_OF_TEXT_UNIT )
- Category labels: SQ, NP, VP
- (Pre-)terminals: word forms with part-of-speech tags VBZ, DT, NN, VB
- Additional labels for grammatical relations: NP-SBJ, PP-DIR, PP-LOC
27 The Penn Treebank tree format
- Additional annotations: empty syntactic nodes, indices
  ( (S
      (NP-SBJ */XXX )
      (VP List/VB
        (NP
          (NP the/DT flights/NNS )
          (PP-DIR from/IN
            (NP Baltimore/NNP ))
          (PP-DIR to/TO
            (NP Seattle/NNP ))
          (SBAR
            (WHNP-1 that/WDT )
            (S
              (NP-SBJ *T*-1/XXX )
              (VP stop/VBP
                (PP-LOC in/IN
                  (NP Minneapolis/NNP )))))))))
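The bracketed format above is plain s-expressions, so a small recursive-descent reader suffices to load such trees. A minimal sketch (not a full PTB reader; `word/TAG` tokens are kept as opaque strings):

```python
# Minimal sketch of a reader for Penn-Treebank-style bracketings:
# turns "(S (NP it/PRP) (VP works/VBZ))" into nested tuples.
import re

def parse_ptb(s):
    # Tokens are "(", ")", or any run of non-space, non-paren characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1                      # consume ")"
        return (label,) + tuple(children)

    return node()

print(parse_ptb("(SQ Does/VBZ (NP-SBJ this/DT flight/NN) (VP serve/VB (NP dinner/NN)))"))
```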
28 Evaluation of probabilistic parsers
- Standard measures: PARSEVAL measures
- Compare Information Retrieval measures (for finding items of category X)
- Recall = (# items of category X correctly found) / (# items of category X in the gold standard)
- Precision = (# items of category X correctly found) / (# items the system labeled as category X)
29 Evaluation of probabilistic parsers
- Standard measures: PARSEVAL measures
- Labeled recall = (# correct constituents in the candidate parse) / (# constituents in the treebank tree)
- Labeled precision = (# correct constituents in the candidate parse) / (# constituents in the candidate parse)
- Crossing brackets: the number of crossed brackets, e.g. constituents for which the treebank has a bracketing such as ((A B) C) but the candidate parse has (A (B C))
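Labeled precision and recall reduce to comparing the sets of labeled spans (label, start, end) in the gold and candidate trees. A minimal sketch (trees as nested tuples with word leaves; crossing brackets omitted):

```python
# Sketch of PARSEVAL labeled precision/recall: a constituent is correct
# iff its (label, start, end) span also occurs in the treebank tree.
def spans(tree, start=0):
    """List (label, start, end) for each constituent; first entry is the root's."""
    label, children = tree[0], tree[1:]
    result = []
    end = start
    for c in children:
        if isinstance(c, str):
            end += 1                  # a word advances the span by one
        else:
            sub = spans(c, end)
            result += sub
            end = sub[0][2]           # sub[0] is the child's own span
    return [(label, start, end)] + result

def labeled_pr(gold, candidate):
    g, c = set(spans(gold)), set(spans(candidate))
    correct = len(g & c)
    return correct / len(c), correct / len(g)   # (precision, recall)

gold = ("S", ("NP", "it"), ("VP", ("V", "works")))
cand = ("S", ("NP", "it"), ("VP", "works"))      # flat VP, missing the V node
print(labeled_pr(gold, cand))  # (1.0, 0.75)
```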
30 Evaluation of probabilistic parsers
- PARSEVAL measures for state-of-the-art parsers (Charniak 1997, Collins 1999)
- labeled recall: c. 90%
- labeled precision: c. 90%
- crossing brackets: c. 1 per sentence