Beyond PCFGs

Transcript and Presenter's Notes

1
Beyond PCFGs
  • Chris Brew

Ohio State University
2
Beyond PCFGs
  • Shift-reduce parsers
  • Probabilistic LR parsers
  • Data-oriented parsers

3
Motivation
  • Get round the limitations of the PCFG model
  • Exploit knowledge about individual words
  • Build better language models

4
Shift-reduce
  • Simple version (sketched below)
  • either shift a word from the input list onto the
    parse stack
  • or reduce the top two elements of the parse stack
    to a single tree
  • Hermjakob and Mooney (cmp-lg/9706002)
  • structures rather than just trees and words
  • a more complex parse action language
  • not just binary rules
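To make the simple version concrete, here is a minimal shift-reduce loop in Python. The Tree class and the choose_action/choose_label callables are placeholders for the decision component, not Hermjakob and Mooney's richer action language:

```python
# Minimal sketch of the simple shift-reduce scheme: shift words from
# the input list onto the parse stack, or reduce the top two stack
# elements to a single tree. The decision callables are placeholders.

class Tree:
    def __init__(self, label, children):
        self.label = label
        self.children = children  # subtrees or word strings

def shift_reduce(words, choose_action, choose_label):
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        can_reduce = len(stack) >= 2
        if buffer and (not can_reduce or choose_action(stack, buffer) == "shift"):
            stack.append(buffer.pop(0))              # shift
        else:
            right, left = stack.pop(), stack.pop()   # reduce top two elements
            stack.append(Tree(choose_label(left, right), [left, right]))
    return stack[0] if stack else None
```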

5
Machine learning for shift-reduce
  • supervisor shows the system correct sequences of
    parsing actions
  • system tries to learn to predict correct actions
  • needs a feature language
  • as it learns, the supervisor has less need to
    override the actions chosen by the system.
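A sketch of how that supervision might be wired up; the learner interface, apply_action, and the override count are illustrative assumptions, not the paper's actual setup:

```python
# Sketch of supervised action learning: replay gold action sequences,
# train on (features, gold action) pairs, and count how often the
# supervisor would have had to override the learner's prediction.

def train(sentences, gold_action_seqs, extract_features, learner, apply_action):
    overrides = 0
    for words, actions in zip(sentences, gold_action_seqs):
        stack, buffer = [], list(words)
        for gold in actions:
            feats = extract_features(stack, buffer)
            if learner.predict(feats) != gold:
                overrides += 1                  # supervisor steps in
            learner.update(feats, gold)         # learn from the correction
            stack, buffer = apply_action(gold, stack, buffer)
    return overrides                            # shrinks as the system learns
```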

6
Examples of the feature language
  • broad syntactic class of the third element on the
    stack
  • the tense of the first element of the input list
  • Does top element of stack contain an object?
  • Could top frame be an adjectival degree adverb
    (e.g. very)?
  • Is frame1 a possible agent/patient of frame2?
  • Do frame1 and frame2 satisfy subject-verb
    agreement?
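A few of these queries rendered as code. The state layout, dict keys, and feature names are invented for illustration; the real system has 205 features over richer frame structures:

```python
# Illustrative feature extraction over a parse state. Frames are plain
# dicts here; the keys and feature names are assumptions.

def frame(seq, i):
    """i-th element of stack/buffer, or an empty frame if out of range."""
    return seq[i] if -len(seq) <= i < len(seq) else {}

def extract_features(stack, buffer):
    return {
        "stack3_class":    frame(stack, -3).get("class", "NONE"),   # third from top
        "input1_tense":    frame(buffer, 0).get("tense", "NONE"),   # first input word
        "top_has_object":  frame(stack, -1).get("has_object", False),
        "subj_verb_agree": frame(stack, -1).get("number")
                           == frame(stack, -2).get("number"),
    }
```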

7
Hand-crafted knowledge used
  • 205 features, all moderately local (no references
    to 1000th element of the stack or anything like
    that)
  • 4356-node lexical knowledge base
  • subcategorisation table for 242 verbs
  • But we learn the association between features and
    actions

8
Various different hybrid decision structures
  • best was a hierarchical list of decision trees
    which encoded information about the task.
    Schematically:
  • decide whether to do anything
  • if not, we are done
  • if so, decide whether to do a reduction
  • if so, decide which reduction
  • if not, decide what sort of shift to do
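The same schema in code, where each clf_* stands in for one learned decision tree in the hierarchy (all placeholder callables, not the paper's actual trees):

```python
# Sketch of the hierarchical decision structure as a cascade of
# learned decision trees, here reduced to placeholder callables.

def choose_parse_action(feats, clf_act, clf_is_reduce, clf_reduction, clf_shift):
    if not clf_act(feats):                       # decide whether to do anything
        return ("done",)                         # if not, we are done
    if clf_is_reduce(feats):                     # decide whether to reduce
        return ("reduce", clf_reduction(feats))  # if so, which reduction
    return ("shift", clf_shift(feats))           # if not, what sort of shift
```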

9
Evaluation
  • Corpus of 272 annotated sentences.
  • 17-fold cross-validation (17 blocks of 16
    sentences each)
  • Precision of 92.7%, recall of 92.8% (average
    length 17.1 words, with 43.5 parse actions per
    sentence). Parseval measures.
  • Correct structure and labelling 26.8% (i.e. 1 in
    4 sentences is completely correct)
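For reference, a minimal sketch of labelled Parseval precision and recall, reusing the Tree class from the shift-reduce sketch (leaves are word strings):

```python
# Minimal labelled Parseval sketch: compare the labelled constituent
# spans of a candidate tree against those of the gold tree.

from collections import Counter

def spans(tree, start=0):
    """Return ([(label, start, end), ...], end) for `tree`."""
    out, pos = [], start
    for child in tree.children:
        if isinstance(child, str):
            pos += 1
        else:
            child_spans, pos = spans(child, pos)
            out.extend(child_spans)
    out.append((tree.label, start, pos))
    return out, pos

def parseval(candidate, gold):
    cand, ref = Counter(spans(candidate)[0]), Counter(spans(gold)[0])
    matched = sum((cand & ref).values())          # multiset intersection
    return matched / sum(cand.values()), matched / sum(ref.values())
```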

10
Comments on Hermjakob and Mooney
  • A lot of grunt work needed, but not as much as a
    full rationalist NLP system
  • The knowledge used is micro-modular: very small
    pieces of highly independent knowledge
  • Test set is small, sentences short
  • Fairly robust
  • Good on small-scale tests in an English/German MT
    task

11
(No Transcript)
12
LR Parsing
  • Builds a parsing table which gives parse actions
    and gotos for possible combinations of parser
    state and input symbol
  • There may be parsing action conflicts, in which
    more than one action is available.
  • In programming language grammars, you almost
    never want conflicts.
  • In NL grammars, you have no escape!
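A toy rendering of the idea, with states, symbols, and entries invented for illustration; a table cell holding more than one action is a conflict:

```python
# Toy LR table fragment: ACTION maps (state, lookahead) to a list of
# actions, GOTO maps (state, nonterminal) to a state. All entries are
# invented; a cell with more than one action is a conflict.

ACTION = {
    (0, "Det"): [("shift", 2)],
    (3, "P"):   [("shift", 5),                  # start a PP ...
                 ("reduce", "NP -> Det N")],    # ... or close the NP: conflict
}
GOTO = {(0, "NP"): 1}

def lookup(state, lookahead):
    actions = ACTION.get((state, lookahead), [])
    if len(actions) > 1:
        print(f"conflict in state {state} on {lookahead!r}: {actions}")
    return actions
```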

13
Probabilistic LR
  • When there is a conflict, non-deterministically
    execute all possible actions.
  • But score them according to a probability
    distribution.
  • So where do the probabilities come from? And what
    do they mean? See the analysis in Stolcke's paper
    relating them to his forward and inner
    probabilities.
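One way to realise this, sketched below: explore parser configurations best-first, multiplying each branch's score by the probability of the chosen action given the parser state. The configuration type and p_action are assumptions for illustration:

```python
# Sketch of probabilistic LR search: on a conflict, follow every
# available action, weighting each branch by P(action | parser state),
# and pop configurations best-first by accumulated score.

import heapq

def best_first_parse(initial, actions_for, apply_action, p_action, is_final):
    heap = [(-1.0, 0, initial)]        # (negated score, tie-breaker, config)
    tick = 0
    while heap:
        neg_score, _, config = heapq.heappop(heap)
        if is_final(config):
            return config, -neg_score
        for action in actions_for(config):
            tick += 1
            p = p_action(config, action)   # distribution over the cell's actions
            heapq.heappush(heap, (neg_score * p, tick, apply_action(config, action)))
    return None, 0.0
```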

14
LR parsing using the Alvey Tools grammar
  • Wide-coverage unification grammar written by
    Claire Grover and Ted Briscoe
  • Build LR tables from CF backbone of this grammar
  • Interactively build disambiguated training corpus
    by supervising choice of parse actions

15
Evaluation
  • Very good performance on LDOCE noun definitions:
    76% correct structure and labelling
  • State-of-the-art results in later work on tag
    sequence grammars, where the available lexical
    information is more restricted (54% correct
    structure and labelling)
  • Work underway to bring this technique to Wall
    Street Journal data for comparison with other
    methods

16
Data-oriented parsing
  • Rens Bod, Enriching Linguistics with Statistics:
    Performance Models of Natural Language, Amsterdam
    Ph.D. thesis
  • Treebank data again (this time ATIS -- 600
    sentences)
  • Radical rejection of the context-free assumption
  • Count subtrees of arbitrary depth, not rule
    applications (sketched below)
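The combinatorics behind "subtrees of arbitrary depth": at every node a fragment either cuts below a child (leaving it as a frontier nonterminal) or keeps any fragment rooted at that child, so the counts multiply. A sketch over the earlier Tree class:

```python
# Count the DOP fragments rooted at each node: at every child, either
# cut there (the child becomes a frontier nonterminal) or substitute
# any one of the fragments rooted at that child.

def count_fragments(node):
    if isinstance(node, str):                # a word: no fragments rooted here
        return 0
    total = 1
    for child in node.children:
        total *= 1 + count_fragments(child)  # cut at child, or recurse into it
    return total

# e.g. Tree("S", [Tree("NP", ["John"]), Tree("VP", ["runs"])])
# has 2 * 2 = 4 fragments rooted at S.
```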

17
A corpus
18
(No Transcript)
19
The probability of a tree
  • The sum of the probabilities of all the ways of
    making it out of fragments
  • The probability of a fragment is the ratio
    between the frequency of the fragment and the
    total frequency of all fragments having that root
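The two definitions in code form; the hashable fragment encoding, counts, and root_totals are illustrative assumptions:

```python
# counts[frag] is a fragment's corpus frequency; root_totals[label] is
# the total frequency of all fragments whose root has that label.
# Assumes fragments are Tree-like objects with a hashable encoding.

from math import prod

def p_fragment(frag, counts, root_totals):
    return counts[frag] / root_totals[frag.label]   # relative frequency per root

def p_tree(derivations, counts, root_totals):
    """Each derivation is a list of fragments; its probability is the
    product of its fragments', and the tree sums over derivations."""
    return sum(prod(p_fragment(f, counts, root_totals) for f in d)
               for d in derivations)
```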

20
Complexity
  • It's hard to find efficient algorithms for sewing
    together DOP trees (cf. Sima'an for solutions)
  • Only very small corpora are feasible
  • In practice, depth may have to be limited
  • Many tree fragments are very rare, so there is an
    issue about smoothing

21
Evaluation
  • several variations studied; DOP4 gets parse
    accuracies around 80% without a hand-coded
    dictionary, DOP5 around 90% with
  • results to be interpreted with caution due to the
    small size of the corpus
  • Evaluation on the Dutch OVIS domain suggests that
    DOP is not competitive with Groningen's more
    labour-intensive system (but maybe that's not the
    point)

22
Where to find out more
  • Papers by Bod, Carroll, Hermjakob.
  • Manning and Schütze, ch. 12.
  • http://xxx.soton.ac.uk/archive/cs/intro.html
    (subarea Computation and Language)