Probabilistic Parsing - PowerPoint PPT Presentation

Provided by: aoifec

1
Probabilistic Parsing
  • Probabilistic Context Free Grammars
  • Reading
  • Jurafsky & Martin, Ch. 12, sections 12.1 and 12.2
  • In 12.1, exclude Probabilistic CYK Parsing

2
CFGs
  • A CFG is defined by 4 parameters: (N, Σ, P, S)
  • N: set of non-terminal symbols
  • Σ: set of terminal symbols, with N ∩ Σ = Ø
  • P: set of productions of the form A → β, where A ∈ N and β is a
    string of symbols Bi with Bi ∈ (N ∪ Σ)
  • S: designated start symbol

3
Two different parse trees for the sentence "the man saw the girl with
the telescope":
  • The girl has the telescope
  • The man has the telescope
4
Probabilistic CFGs
  • Probabilistic CFGs
  • PCFGs or SCFGs (stochastic CFGs)
  • A PCFG has an extra parameter: (N, Σ, P, S, D)
  • D: a function that assigns a probability to each rule in P
  • A → β [p]
  • p = P(A → β)
  • Note that the probabilities p for all expansions of a nonterminal A
    must sum to 1
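The sum-to-1 constraint can be checked mechanically. A minimal sketch in Python, assuming rules are stored as (left-hand side, right-hand side, probability) triples; the toy grammar is the one used on the following slides:

```python
# Sketch: a PCFG rule set as (lhs, rhs, probability) triples, with a
# check that the expansion probabilities of each nonterminal sum to 1.
# The triple representation is an illustrative assumption.
from collections import defaultdict

rules = [
    ("S",  ("NP", "VP"),       1.0),
    ("VP", ("V", "NP"),        0.7),
    ("VP", ("V", "NP", "PP"),  0.3),
    ("NP", ("Det", "n"),       0.4),
    ("NP", ("Det", "n", "PP"), 0.6),
    ("PP", ("p", "NP"),        1.0),
]

totals = defaultdict(float)
for lhs, rhs, p in rules:
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
print("OK: every nonterminal's expansions sum to 1")
```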

5
PCFGs
  • We use probabilities to disambiguate parses
  • Choose most likely parse from multiple parses

6
P(T,S)
  • n is a node in the parse tree T
  • r(n) is the rule that expands n
  • P(T,S) = P(T)·P(S|T) = ∏n P(r(n))
  • P(S|T) = 1, since a parse tree includes all the words of the
    sentence

7
PCFG
  S → NP VP [1.0]
  VP → V NP [0.7]
  VP → V NP PP [0.3]
  NP → Det n [0.4]
  NP → Det n PP [0.6]
  PP → p NP [1.0]
Lexicon
  det → the [1.0]
  n → man [0.4]
  n → girl [0.3]
  n → telescope [0.3]
  v → saw [1.0]
  p → with [1.0]
8
P(Tree1) = 1.0 × 0.6 × 1.0 × 0.4 × 0.7 × 1.0 × 0.4 × 1.0 × 0.3 × 1.0
× 1.0 × 0.6 × 1.0 × 0.3 = 0.0036288
9
P(Tree2) = 1.0 × 0.6 × 1.0 × 0.4 × 0.3 × 1.0 × 0.6 × 1.0 × 0.3 × 1.0
× 1.0 × 0.6 × 1.0 × 0.3 = 0.0023328
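The two products on slides 8 and 9 can be reproduced directly: P(T) is just the product of the probabilities of the rules used in the tree. A minimal sketch, with the factor lists transcribed from the slides:

```python
from math import prod

# Rule probabilities applied in each parse tree (slides 8 and 9).
tree1_factors = [1.0, 0.6, 1.0, 0.4, 0.7, 1.0, 0.4,
                 1.0, 0.3, 1.0, 1.0, 0.6, 1.0, 0.3]
tree2_factors = [1.0, 0.6, 1.0, 0.4, 0.3, 1.0, 0.6,
                 1.0, 0.3, 1.0, 1.0, 0.6, 1.0, 0.3]

p1, p2 = prod(tree1_factors), prod(tree2_factors)
print(round(p1, 7), round(p2, 7))
assert p1 > p2  # the parser returns the more probable tree 1
```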
10
Dynamic Programming
  • CYK (Cocke–Younger–Kasami) algorithm
  • Give the N non-terminals indices 1, 2, …, N
  • Start symbol S has index 1
  • n words w1 … wn
  • Array a[i, j, a] holds the maximum probability for a constituent
    with non-terminal index a spanning words wi … wj
  • The maximum-probability parse will be a[1, n, 1]
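The dynamic program described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not code from the slides: the grammar is in Chomsky normal form, binary rules are (A, B, C, p) triples meaning A → B C [p], and lexical rules are (A, w, p) meaning A → w [p]; the chart is keyed by (i, j, nonterminal) rather than numeric indices.

```python
# Sketch of probabilistic CYK over a PCFG in Chomsky normal form.
from collections import defaultdict

def cyk(words, binary_rules, lexical_rules, start="S"):
    n = len(words)
    # best[(i, j, A)] = maximum probability of A spanning words[i:j]
    best = defaultdict(float)
    for i, w in enumerate(words):                 # fill in the words
        for A, word, p in lexical_rules:
            if word == w and p > best[(i, i + 1, A)]:
                best[(i, i + 1, A)] = p
    for span in range(2, n + 1):                  # widths, smallest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for A, B, C, p in binary_rules:
                    cand = p * best[(i, k, B)] * best[(k, j, C)]
                    if cand > best[(i, j, A)]:
                        best[(i, j, A)] = cand
    return best[(0, n, start)]

# Toy grammar with hypothetical probabilities, for illustration only.
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
lexical = [("NP", "Moscow", 0.5), ("V", "sent", 1.0), ("NP", "troops", 0.5)]
print(cyk(["Moscow", "sent", "troops"], binary, lexical))  # 0.25
```

Keeping backpointers alongside `best` would recover the parse tree itself; the sketch returns only the probability a[1, n, 1].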

11
Problems with PCFGs
  • Most current PCFG models are not vanilla PCFGs
  • Usually augmented in some way
  • Vanilla PCFGs assume independence of
    non-terminal expansions
  • But statistical analysis shows this is not a
    valid assumption
  • Structural and lexical dependencies

12
Structural Dependencies
  • E.g. the syntactic subject of a sentence tends to be a pronoun:
  • The subject tends to realise the topic of a sentence
  • The topic is usually old information
  • Pronouns are usually used to refer to old information
  • So the subject tends to be a pronoun
  • In the Switchboard corpus:
  • 91% of subjects in declarative sentences are pronouns
  • 66% of direct objects are lexical (non-pronominal)

13
Lexical Dependencies
Two parse trees for the sentence "Moscow sent troops into Afghanistan"
Correct parse
Incorrect parse
14
Two lexicalised trees for the sentence "Moscow sent troops into
Afghanistan"
Correct parse
Incorrect parse
15
Lexicalised Grammar
  S(sent) → NP(Moscow) VP(sent)
  NP(Moscow) → NNP(Moscow)
  VP(sent) → VBD(sent) NP(troops) PP(into)
  NP(troops) → NNS(troops)
  PP(into) → IN(into) NP(Afghanistan)
  NP(Afghanistan) → NNP(Afghanistan)
Lexicon
  NNP → Moscow
  NNP → Afghanistan
  IN → into
  VBD → sent
  NNS → troops
16
Simplified Lexicalised Grammar
  S(sent) → NP VP
  NP(Moscow) → NNP
  VP(sent) → VBD NP PP
  NP(troops) → NNS
  PP(into) → IN NP
  NP(Afghanistan) → NNP
Lexicon
  NNP → Moscow
  NNP → Afghanistan
  IN → into
  VBD → sent
  NNS → troops
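One way to see what lexicalisation adds: each nonterminal becomes a (category, head word) pair, so rule probabilities can be conditioned on head words instead of being shared across all expansions of a category. A minimal sketch, assuming a dict representation that is not from the slides:

```python
# Sketch: the simplified lexicalised rules from slide 16, with each
# nonterminal represented as a (category, head-word) pair. The dict
# layout is an illustrative assumption.
rules = {
    ("S", "sent"):         [("NP", "VP")],
    ("NP", "Moscow"):      [("NNP",)],
    ("VP", "sent"):        [("VBD", "NP", "PP")],
    ("NP", "troops"):      [("NNS",)],
    ("PP", "into"):        [("IN", "NP")],
    ("NP", "Afghanistan"): [("NNP",)],
}
# Probabilities such as P(VP -> VBD NP PP | head = "sent") can now be
# estimated separately for each head word, capturing the lexical
# dependency that "sent" prefers an NP-plus-PP complement.
print(rules[("VP", "sent")])
```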