1
Statistische Methoden in der Computerlinguistik / Statistical Methods in Computational Linguistics
  • 12. Probabilistic Parsing
  • Jonas Kuhn
  • Universität Potsdam, 2007

2
Overview
  • Probabilistic Context-Free Grammars (following
    Jurafsky/Martin, ch. 12)
  • Definition, Independence assumptions
  • Determining the most likely reading of a sentence
  • Training a PCFG on a treebank
  • Summary: Computational Tasks for PCFGs
  • Simple Prolog implementation, using DCGs
  • Problems with simple PCFGs
  • Approaches for dealing with them
  • The Penn Treebank tree format
  • Evaluation of probabilistic parsers

3
Probabilistic Context-Free Grammars
  • Context-free grammar
  • Set of non-terminal symbols N
  • Set of terminal symbols Σ
  • Set of productions of the form A → β
  • (where A is a nonterminal, β a string of
    terminals/nonterminals)
  • Start symbol (from N)
  • Probabilistic CFG augments productions with a
    conditional probability
  • A → β [p]

4
Probabilistic Context-Free Grammars
  • S → NP VP .80
  • S → Aux NP VP .15
  • S → VP .05
  • NP → Det Nom .20
  • NP → PropN .35
  • NP → Nom .05
  • NP → Pronoun .40
  • Nom → Noun .75
  • Nom → Noun Nom .20
  • Nom → PropN Nom .05
  • VP → Verb .55
  • VP → Verb NP .40
  • VP → Verb NP NP .05
  • Det → that .05 | the .80 | a .15
  • Noun → book .10
  • Noun → flights .50
  • Noun → meal .40
  • Verb → book .30
  • Verb → include .30
  • Verb → want .40
  • Aux → can .40
  • Aux → does .30
  • Aux → do .30
  • PropN → TWA .40
  • PropN → Denver .40
  • Pronoun → you .40 | I .60

5
Probabilistic Context-Free Grammars
  • The probability p in
  • A → β [p]
  • encodes how likely it is that nonterminal A
  • will be rewritten as β.
  • Typically written as P(A → β | A),
  • or simply P(A → β)
  • Conditional probability of a certain expansion
    given the left-hand side non-terminal A
  • The probabilities of all expansions of a
    non-terminal have to sum to 1: Σ_β P(A → β | A) = 1

6
Uses of PCFGs
  • Computing the probability of a parse tree
  • Parse tree T with nodes n (r(n) is the grammar
    rule used to expand n)
  • P(T) = ∏_{n ∈ T} p(r(n))
  • Joint probability of T and sentence S:
  • P(T, S) = P(T) · P(S | T) = P(T),
  • since the words are part of the parse tree,
    so P(S | T) = 1

7
The probability of a parse tree
  • Two example trees for "can you book TWA flights"
    (Jurafsky/Martin, p. 451)
  • [two parse-tree diagrams, not reproduced in the
    transcript]
  • Note: mistake in Jurafsky/Martin: one extra
    NP → Pronoun rewrite for each tree

8
The probability of a parse tree
  • Reading 1 (left tree):
  • P(Tl) = 0.15 · 0.4 · 0.4 · 0.4 · 0.4 · 0.3 · 0.05
    · 0.05 · 0.4 · 0.75 · 0.5
  • = 4.32 × 10⁻⁷
  • Reading 2 (right tree):
  • P(Tr) = 0.15 · 0.4 · 0.4 · 0.4 · 0.5 · 0.3 · 0.35
    · 0.4 · 0.05 · 0.75 · 0.5
  • = 3.78 × 10⁻⁶
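As a sanity check, the two products can be multiplied out mechanically. A minimal Python sketch; the factor lists are transcribed directly from the two readings above:

```python
from math import prod

# Rule and lexical probabilities along each derivation,
# transcribed from the two readings above.
left_tree = [0.15, 0.4, 0.4, 0.4, 0.4, 0.3, 0.05,
             0.05, 0.4, 0.75, 0.5]
right_tree = [0.15, 0.4, 0.4, 0.4, 0.5, 0.3, 0.35,
              0.4, 0.05, 0.75, 0.5]

p_left, p_right = prod(left_tree), prod(right_tree)
print(p_left)   # ≈ 4.32e-07
print(p_right)  # ≈ 3.78e-06
```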

9
The probability of a parse tree
  • Example from Jurafsky/Martin, 2nd edition

10
(No Transcript)
11
Uses of PCFGs: Disambiguation
  • Comparing the probabilities of all parse trees
    for a given sentence

12
Uses of PCFGs: Language Modeling
  • Probability of a sentence
  • Unambiguous sentence: P(S) = P(T, S) = P(T)
  • Ambiguous sentence: P(S) = Σ_T P(T, S), the sum
    of the probabilities of all possible parse trees
    for that sentence

13
Where do we get the probabilities from?
  • Two possibilities
  • Training the PCFG on a treebank
  • Maximum likelihood estimate: relative frequency
  • Training the PCFG on an unannotated corpus
  • Trees for unambiguous sentences can be counted
    directly
  • For ambiguous sentences, the rule expansions in
    the various parses get partial counts, according
    to the probability of the parse they occur in:
    Expectation Maximization (EM) algorithm
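The maximum-likelihood estimate from a treebank is just relative frequency, count(A → β) / count(A). A minimal sketch; the two-tree toy treebank below is invented purely for illustration:

```python
from collections import Counter

# Trees as nested tuples: (label, child, ...); leaves are strings.
# A hypothetical two-tree toy treebank, just for illustration.
treebank = [
    ("S", ("NP", "i"), ("VP", ("V", "sleep"))),
    ("S", ("NP", "you"), ("VP", ("V", "book"), ("NP", "flights"))),
]

rule_counts, lhs_counts = Counter(), Counter()

def count_rules(node):
    if isinstance(node, str):  # leaf: no rule to count
        return
    lhs = node[0]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
    rule_counts[(lhs, rhs)] += 1
    lhs_counts[lhs] += 1
    for child in node[1:]:
        count_rules(child)

for tree in treebank:
    count_rules(tree)

# Relative-frequency estimate of P(A -> beta | A)
probs = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
print(probs[("VP", ("V",))])  # 0.5: one of the two observed VP expansions
```

The expansions of each nonterminal sum to 1 by construction, as required.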

14
Computational Tasks for PCFGs
  • Computing the probability of a given string
  • (summing over all readings/trees)
  • Inside algorithm / Outside algorithm
  • (compare forward and backward algorithm for
    HMMs)
  • Finding the most likely tree for a sentence
  • Viterbi algorithm
  • (avoiding the re-computation of
  • probabilities for subconstituents)
  • Training a PCFG without having a treebank
  • Inside-outside algorithm (an instance of the
    Expectation Maximization (EM) algorithm)
  • (compare forward-backward algorithm for HMMs)
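The first two tasks can be illustrated with a naive, memoized recursion over spans (no claim to the efficiency of the real inside or Viterbi algorithms): combining sub-span values with sum gives the string probability, with max the Viterbi (best-tree) probability. The rule and lexicon tables are transcribed from the grammar slide; since the deck itself flags an inconsistency in Jurafsky/Martin's figures, the values computed here follow the grammar as printed rather than the slide's reported numbers:

```python
from functools import lru_cache

# Grammar transcribed from the slide "Probabilistic Context-Free
# Grammars": (lhs, rhs) -> probability.
RULES = {
    ("S", ("NP", "VP")): 0.80, ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("Det", "Nom")): 0.20, ("NP", ("PropN",)): 0.35,
    ("NP", ("Nom",)): 0.05, ("NP", ("Pronoun",)): 0.40,
    ("Nom", ("Noun",)): 0.75, ("Nom", ("Noun", "Nom")): 0.20,
    ("Nom", ("PropN", "Nom")): 0.05,
    ("VP", ("Verb",)): 0.55, ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
}
LEXICON = {
    ("Det", "that"): 0.05, ("Det", "the"): 0.80, ("Det", "a"): 0.15,
    ("Noun", "book"): 0.10, ("Noun", "flights"): 0.50,
    ("Noun", "meal"): 0.40,
    ("Verb", "book"): 0.30, ("Verb", "include"): 0.30,
    ("Verb", "want"): 0.40,
    ("Aux", "can"): 0.40, ("Aux", "does"): 0.30, ("Aux", "do"): 0.30,
    ("PropN", "twa"): 0.40, ("PropN", "denver"): 0.40,
    ("Pronoun", "you"): 0.40, ("Pronoun", "i"): 0.60,
}

def segmentations(i, j, k):
    """All ways to split span (i, j) into k contiguous non-empty parts."""
    if k == 1:
        yield [(i, j)]
        return
    for m in range(i + 1, j - k + 2):
        for rest in segmentations(m, j, k - 1):
            yield [(i, m)] + rest

def string_probability(words, combine=sum):
    """combine=sum -> inside value P(S); combine=max -> Viterbi value."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def inside(cat, i, j):
        values = []
        if j - i == 1:  # preterminal rewriting a single word
            values.append(LEXICON.get((cat, words[i]), 0.0))
        for (lhs, rhs), p in RULES.items():
            if lhs != cat:
                continue
            for seg in segmentations(i, j, len(rhs)):
                v = p
                for sym, (a, b) in zip(rhs, seg):
                    v *= inside(sym, a, b)
                values.append(v)
        return combine(values) if values else 0.0

    return inside("S", 0, len(words))

ambiguous = "can you book twa flights".split()
print(string_probability(ambiguous))       # sum over all parse trees
print(string_probability(ambiguous, max))  # probability of the best tree
```

Replacing sum with max is the entire difference between the two computations, mirroring the forward vs. Viterbi relationship for HMMs; for an unambiguous sentence the two values coincide.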

15
A simple Prolog implementation
  • Using Definite Clause Grammars (DCGs)
  • Recap
  • s --> np, vp.
  • np --> n.
  • vp --> v.
  • n --> [it].
  • v --> [works].
  • Internal difference-list representation:
  • s(A, B) :-
    np(A, C),
    vp(C, B).
  • Query:
  • ?- s([it, works], []).

16
DCGs Recap
  • Additional arguments
  • s(s(NP,VP)) --> np(NP), vp(VP).
  • np(np(N)) --> n(N).
  • vp(vp(V)) --> v(V).
  • n(it) --> [it].
  • v(works) --> [works].
  • Additional predicate calls:
  • s --> np(NPagr), vp(non3sg),
    { NPagr \= 3sg }.

17
PCFGs as DCGs
  • With DCGs we can implement PCFGs and the
    computation of a tree's probability
    straightforwardly (albeit inefficiently)

18
DCG implementation of PCFGs
  • Example:
  • ?- s(P, [can, you, book, twa, flights], []).
  • P = 4.32e-007 ;
  • P = 3.78e-006 ;
  • No

19
Probabilistic Context-Free Grammars
  • S → NP VP .80
  • S → Aux NP VP .15
  • S → VP .05
  • NP → Det Nom .20
  • NP → PropN .35
  • NP → Nom .05
  • NP → Pronoun .40
  • Nom → Noun .75
  • Nom → Noun Nom .20
  • Nom → PropN Nom .05
  • VP → Verb .55
  • VP → Verb NP .40
  • VP → Verb NP NP .05
  • Det → that .05 | the .80 | a .15
  • Noun → book .10
  • Noun → flights .50
  • Noun → meal .40
  • Verb → book .30
  • Verb → include .30
  • Verb → want .40
  • Aux → can .40
  • Aux → does .30
  • Aux → do .30
  • PropN → TWA .40
  • PropN → Denver .40
  • Pronoun → you .40 | I .60

20
DCG implementation of PCFGs
  • s(P) --> np(P1), vp(P2), { P is P1 * P2 * 0.8 }.
  • s(P) --> aux(P1), np(P2), vp(P3),
    { P is P1 * P2 * P3 * 0.15 }.
  • s(P) --> vp(P1), { P is P1 * 0.05 }.
  • np(P) --> det(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  • np(P) --> pn(P1), { P is P1 * 0.35 }.
  • np(P) --> nom(P1), { P is P1 * 0.05 }.
  • np(P) --> pron(P1), { P is P1 * 0.4 }.
  • nom(P) --> n(P1), { P is P1 * 0.75 }.
  • nom(P) --> n(P1), nom(P2), { P is P1 * P2 * 0.2 }.
  • nom(P) --> pn(P1), nom(P2), { P is P1 * P2 * 0.05 }.
  • vp(P) --> v(P1), { P is P1 * 0.55 }.
  • vp(P) --> v(P1), np(P2), { P is P1 * P2 * 0.4 }.
  • vp(P) --> v(P1), np(P2), np(P3),
    { P is P1 * P2 * P3 * 0.05 }.
  • det(0.05) --> [that].
  • det(0.8) --> [the].
  • det(0.15) --> [a].
  • n(0.1) --> [book].
  • n(0.5) --> [flights].
  • n(0.4) --> [meal].
  • v(0.3) --> [book].
  • v(0.3) --> [include].
  • v(0.4) --> [want].
  • aux(0.4) --> [can].
  • aux(0.3) --> [does].
  • aux(0.3) --> [do].
  • pn(0.4) --> [twa].
  • pn(0.4) --> [denver].
  • pron(0.4) --> [you].
  • pron(0.6) --> [i].

21
Overview
  • Probabilistic Context-Free Grammars (following
    Jurafsky/Martin, ch. 12)
  • Definition, Independence assumptions
  • Determining the most likely reading of a sentence
  • Training a PCFG on a treebank
  • Summary: Computational Tasks for PCFGs
  • Simple Prolog implementation, using DCGs
  • Problems with simple PCFGs
  • Approaches for dealing with them
  • The Penn Treebank tree format
  • Evaluation of probabilistic parsers

22
Problems with simple PCFGs: Independence assumptions (1)
  • Structural dependencies
  • Each PCFG rule (application) is assumed to be
    independent of all other rules
  • NP → Pronoun vs. NP → Det Noun:
    the same probabilities hold for all occurrences
    of NP
  • But: subjects in declarative sentences are 91%
    pronouns / 9% lexical (objects: 34% pron. / 66%
    lexical)
  • She's able to take her baby to work with her.
  • Uh, my wife worked until we had a family.
  • Some laws absolutely prohibit it.
  • All the people signed confessions.

23
Problems with simple PCFGs: (1) Structural dependencies
  • Approaches dealing with structural dependencies
  • Transform the CFG into a format that keeps track
    of the grandmother category (Johnson 1998)
  • ROOT → S
  • S → NP VP
  • VP → V NP
  • becomes
  • S^ROOT → NP^S VP^S
  • VP^S → V^VP NP^VP
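The grandmother-category transform can be sketched as a one-pass relabeling; representing trees as nested tuples is an assumption of this sketch, not Johnson's own representation:

```python
# A sketch of Johnson-(1998)-style parent annotation: each nonterminal
# label is extended with its mother's category, so that e.g. an NP under
# S ("NP^S", typically a subject) and an NP under VP ("NP^VP", typically
# an object) can receive different expansion probabilities.

def annotate_parent(tree, parent="ROOT"):
    """tree = (label, child, ...); leaves are plain strings."""
    if isinstance(tree, str):
        return tree
    label = f"{tree[0]}^{parent}"
    return (label,) + tuple(annotate_parent(c, tree[0]) for c in tree[1:])

t = ("S", ("NP", "she"), ("VP", ("V", "works"), ("NP", "here")))
print(annotate_parent(t))
# ('S^ROOT', ('NP^S', 'she'), ('VP^S', ('V^VP', 'works'), ('NP^VP', 'here')))
```

Training a plain PCFG on trees relabeled this way yields exactly the structure-sensitive rules shown above.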

24
Problems with simple PCFGs: Independence assumptions (2)
  • Lexical dependencies
  • PCFGs are insensitive to lexical information
  • Moscow sent more than 100,000 soldiers into
    Afghanistan.
  • The choice between NP → NP PP and VP → VP PP is
    independent of the lexical choice
  • [two attachment diagrams for this sentence, not
    reproduced in the transcript]
  • 67% NP attachment, 33% VP attachment
  • The subcategorization of "send" is not taken into
    account

25
Problems with simple PCFGs: (2) Lexical dependencies
  • Approaches dealing with lexical dependencies
  • Statistical model keeping track of lexical
    dependencies
  • Head-lexicalized PCFGs
  • Example: "Workers dumped sacks into a bin"
  • [head-lexicalized parse tree, not reproduced in
    the transcript]
  • VP(dumped) → VBD(dumped) NP(sacks) PP(into)
    3 × 10⁻¹⁰
  • Impossible to train such parameters directly (far
    too little data)
  • Splitting this up into probabilities of
    subconfigurations:
  • p(head(node) = into | node = PP,
    head(mother(node)) = dumped)
  • in short: p(into | PP, dumped)
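Estimating such a subconfiguration probability is again relative-frequency counting. A minimal sketch; the observation triples (mother head, node category, node head) below are invented purely for illustration:

```python
from collections import Counter

# Relative-frequency estimate of a subconfiguration probability such as
# p(head(node)=into | node=PP, head(mother(node))=dumped).
# The (mother head, node category, node head) triples are made up.
observations = [
    ("dumped", "PP", "into"), ("dumped", "PP", "into"),
    ("dumped", "PP", "in"), ("sent", "PP", "into"),
]

context = Counter((m, c) for (m, c, _) in observations)
full = Counter(observations)

p = full[("dumped", "PP", "into")] / context[("dumped", "PP")]
print(p)  # 2/3: two of the three PPs under "dumped" are headed by "into"
```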

26
The Penn Treebank tree format
  • Example
  • (from ATIS section airport traffic information
    service)
  • ( (SQ Does/VBZ
  • (NP-SBJ this/DT flight/NN )
  • (VP serve/VB
  • (NP dinner/NN ))))
  • ( END_OF_TEXT_UNIT )
  • Category labels: SQ, NP, VP
  • (Pre-)terminals: word forms with part-of-speech
    tags VBZ, DT, NN, VB
  • Additional labels for grammatical relations:
    NP-SBJ, PP-DIR, PP-LOC

27
The Penn Treebank tree format
  • Additional annotations: empty syntactic nodes,
    indices
  • ( (S
  • (NP-SBJ */XXX )
  • (VP List/VB
  • (NP
  • (NP the/DT flights/NNS )
  • (PP-DIR from/IN
  • (NP Baltimore/NNP ))
  • (PP-DIR to/TO
  • (NP Seattle/NNP ))
  • (SBAR
  • (WHNP-1 that/WDT )
  • (S
  • (NP-SBJ *T*-1/XXX )
  • (VP stop/VBP
  • (PP-LOC in/IN
  • (NP Minneapolis/NNP )))))))))
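A minimal reader for this bracketed format can be sketched in a few lines; the function name `read_trees` and the (label, children) tuple representation are choices of this sketch, not Penn Treebank tooling:

```python
# Minimal reader for the bracketed format shown above: "(" opens a node
# whose first token is its label (possibly empty, as in the outermost
# "( ... )"), and "word/TAG" tokens are leaves.

def read_trees(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):  # optional label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # word/TAG leaf
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        trees.append(node())
    return trees

example = ("( (SQ Does/VBZ (NP-SBJ this/DT flight/NN) "
           "(VP serve/VB (NP dinner/NN))))")
[(outer_label, [sq])] = read_trees(example)
print(sq[0])  # 'SQ'
```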

28
Evaluation of probabilistic parsers
  • Standard measures: the PARSEVAL measures
  • Compare Information Retrieval measures (for
    finding items of category X):
  • Recall = (# items of category X correctly found) /
    (# items of category X in the gold standard)
  • Precision = (# items of category X correctly
    found) / (# items of category X proposed by the
    system)

29
Evaluation of probabilistic parsers
  • Standard measures: the PARSEVAL measures
  • Labeled Recall = (# correct constituents in the
    candidate parse) / (# constituents in the
    treebank parse)
  • Labeled Precision = (# correct constituents in
    the candidate parse) / (# constituents in the
    candidate parse)
  • Cross-brackets: the number of crossing brackets,
    e.g., the number of constituents for which the
    treebank has a bracketing such as ((A B) C) but
    the candidate parse has a bracketing (A (B C))
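A sketch of how these measures are computed, with constituents represented as (label, start, end) triples (a representation assumed here for illustration):

```python
# PARSEVAL sketch: labeled precision/recall compare candidate and
# treebank ("gold") constituent sets; cross-brackets counts candidate
# spans that overlap a gold span without nesting, i.e. the
# ((A B) C) vs. (A (B C)) situation.

def parseval(gold, cand):
    correct = len(gold & cand)
    precision = correct / len(cand)
    recall = correct / len(gold)
    crossing = sum(
        1
        for (_, ci, cj) in cand
        if any(gi < ci < gj < cj or ci < gi < cj < gj
               for (_, gi, gj) in gold)
    )
    return precision, recall, crossing

gold = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
cand = {("S", 0, 3), ("VP", 1, 3)}  # (A (B C)) instead of ((A B) C)
print(parseval(gold, cand))
```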

30
Evaluation of probabilistic parsers
  • PARSEVAL measures for state-of-the-art parsers
    (Charniak 1997, Collins 1999):
  • labeled recall: c. 90%
  • labeled precision: c. 90%
  • crossing brackets: c. 1 per sentence