Beyond PCFGs

Transcript and Presenter's Notes

1
Beyond PCFGs
  • Chris Brew

Ohio State University
2
Beyond PCFGs
  • Shift-reduce parsers
  • Probabilistic LR parsers
  • Data-oriented parsers

3
Motivation
  • Get round the limitations of the PCFG model
  • Exploit knowledge about individual words
  • Build better language models

4
Shift-reduce
  • Simple version (sketched below)
  • either shift a word from the input list onto the
    parse stack
  • or reduce the top two elements of the parse stack
    to a single tree
  • Hermjakob and Mooney (cmp-lg/9706002)
  • structures rather than just trees and words
  • a more complex parse action language
  • not just binary rules
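To make the simple version concrete, here is a minimal shift-reduce loop in Python. The Tree class and the choose_action/choose_label callables are placeholders for the decision component, not Hermjakob and Mooney's richer action language:

```python
# Minimal sketch of the simple shift-reduce scheme: shift words from
# the input list onto the parse stack, or reduce the top two stack
# elements to a single tree. The decision callables are placeholders.

class Tree:
    def __init__(self, label, children):
        self.label = label
        self.children = children  # subtrees or word strings

def shift_reduce(words, choose_action, choose_label):
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        can_reduce = len(stack) >= 2
        if buffer and (not can_reduce or choose_action(stack, buffer) == "shift"):
            stack.append(buffer.pop(0))              # shift
        else:
            right, left = stack.pop(), stack.pop()   # reduce top two elements
            stack.append(Tree(choose_label(left, right), [left, right]))
    return stack[0] if stack else None
```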

5
Machine learning for shift-reduce
  • supervisor shows the system correct sequences of
    parsing actions
  • system tries to learn to predict correct actions
  • needs a feature language
  • as it learns, the supervisor has less need to
    override the actions chosen by the system.
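A sketch of how that supervision might be wired up; the learner interface, apply_action, and the override count are illustrative assumptions, not the paper's actual setup:

```python
# Sketch of supervised action learning: replay gold action sequences,
# train on (features, gold action) pairs, and count how often the
# supervisor would have had to override the learner's prediction.

def train(sentences, gold_action_seqs, extract_features, learner, apply_action):
    overrides = 0
    for words, actions in zip(sentences, gold_action_seqs):
        stack, buffer = [], list(words)
        for gold in actions:
            feats = extract_features(stack, buffer)
            if learner.predict(feats) != gold:
                overrides += 1                  # supervisor steps in
            learner.update(feats, gold)         # learn from the correction
            stack, buffer = apply_action(gold, stack, buffer)
    return overrides                            # shrinks as the system learns
```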

6
Examples of the feature language
  • broad syntactic class of the third element on the
    stack
  • the tense of the first element of the input list
  • Does top element of stack contain an object?
  • Could top frame be an adjectival degree adverb
    (e.g. very)?
  • Is frame1 a possible agent/patient of frame2?
  • Do frame1 and frame2 satisfy subject-verb
    agreement?
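A few of these queries rendered as code. The state layout, dict keys, and feature names are invented for illustration; the real system has 205 features over richer frame structures:

```python
# Illustrative feature extraction over a parse state. Frames are plain
# dicts here; the keys and feature names are assumptions.

def frame(seq, i):
    """i-th element of stack/buffer, or an empty frame if out of range."""
    return seq[i] if -len(seq) <= i < len(seq) else {}

def extract_features(stack, buffer):
    return {
        "stack3_class":    frame(stack, -3).get("class", "NONE"),   # third from top
        "input1_tense":    frame(buffer, 0).get("tense", "NONE"),   # first input word
        "top_has_object":  frame(stack, -1).get("has_object", False),
        "subj_verb_agree": frame(stack, -1).get("number")
                           == frame(stack, -2).get("number"),
    }
```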

7
Hand-crafted knowledge used
  • 205 features, all moderately local (no references
    to 1000th element of the stack or anything like
    that)
  • 4356-node lexical knowledge base
  • subcategorisation table for 242 verbs
  • But we learn the association between features and
    actions

8
Various different hybrid decision structures
  • best was a hierarchical list of decision trees
    which encoded information about the task.
    Schematically:
  • decide whether to do anything
  • if not, we are done
  • if so, decide whether to do a reduction
  • if so, decide which reduction
  • if not, decide what sort of shift to do
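The same schema in code, where each clf_* stands in for one learned decision tree in the hierarchy (all placeholder callables, not the paper's actual trees):

```python
# Sketch of the hierarchical decision structure as a cascade of
# learned decision trees, here reduced to placeholder callables.

def choose_parse_action(feats, clf_act, clf_is_reduce, clf_reduction, clf_shift):
    if not clf_act(feats):                       # decide whether to do anything
        return ("done",)                         # if not, we are done
    if clf_is_reduce(feats):                     # decide whether to reduce
        return ("reduce", clf_reduction(feats))  # if so, which reduction
    return ("shift", clf_shift(feats))           # if not, what sort of shift
```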

9
Evaluation
  • Corpus of 272 annotated sentences.
  • 17-fold cross-validation (17 blocks of 16
    sentences each)
  • Precision of 92.7%, recall of 92.8% (average
    length 17.1 words, with 43.5 parse actions per
    sentence). Parseval measures.
  • Correct structure and labelling 26.8% (i.e. 1 in
    4 sentences is completely correct)
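For reference, a minimal sketch of labelled Parseval precision and recall, reusing the Tree class from the shift-reduce sketch (leaves are word strings):

```python
# Minimal labelled Parseval sketch: compare the labelled constituent
# spans of a candidate tree against those of the gold tree.

from collections import Counter

def spans(tree, start=0):
    """Return ([(label, start, end), ...], end) for `tree`."""
    out, pos = [], start
    for child in tree.children:
        if isinstance(child, str):
            pos += 1
        else:
            child_spans, pos = spans(child, pos)
            out.extend(child_spans)
    out.append((tree.label, start, pos))
    return out, pos

def parseval(candidate, gold):
    cand, ref = Counter(spans(candidate)[0]), Counter(spans(gold)[0])
    matched = sum((cand & ref).values())          # multiset intersection
    return matched / sum(cand.values()), matched / sum(ref.values())
```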

10
Comments on Hermjakob and Mooney
  • A lot of grunt work needed, but not as much as a
    full rationalist NLP system
  • The knowledge used is micro-modular: very small
    pieces of highly independent knowledge
  • Test set is small, sentences short
  • Fairly robust
  • Good on small-scale tests in an English/German MT
    task

11
(No Transcript)
12
LR Parsing
  • Builds a parsing table which gives parse actions
    and gotos for possible combinations of parser
    state and input symbol
  • There may be parsing action conflicts, in which
    more than one action is available.
  • In programming language grammars, you almost
    never want conflicts.
  • In NL grammars, you have no escape!
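A toy rendering of the idea, with states, symbols, and entries invented for illustration; a table cell holding more than one action is a conflict:

```python
# Toy LR table fragment: ACTION maps (state, lookahead) to a list of
# actions, GOTO maps (state, nonterminal) to a state. All entries are
# invented; a cell with more than one action is a conflict.

ACTION = {
    (0, "Det"): [("shift", 2)],
    (3, "P"):   [("shift", 5),                  # start a PP ...
                 ("reduce", "NP -> Det N")],    # ... or close the NP: conflict
}
GOTO = {(0, "NP"): 1}

def lookup(state, lookahead):
    actions = ACTION.get((state, lookahead), [])
    if len(actions) > 1:
        print(f"conflict in state {state} on {lookahead!r}: {actions}")
    return actions
```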

13
Probabilistic LR
  • When there is a conflict, non-deterministically
    execute all possible actions.
  • But score them according to a probability
    distribution.
  • So where do the probabilities come from? And what
    do they mean? See the analysis in Stolcke's paper
    relating them to his forward and inner
    probabilities.
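One way to realise this, sketched below: explore parser configurations best-first, multiplying each branch's score by the probability of the chosen action given the parser state. The configuration type and p_action are assumptions for illustration:

```python
# Sketch of probabilistic LR search: on a conflict, follow every
# available action, weighting each branch by P(action | parser state),
# and pop configurations best-first by accumulated score.

import heapq

def best_first_parse(initial, actions_for, apply_action, p_action, is_final):
    heap = [(-1.0, 0, initial)]        # (negated score, tie-breaker, config)
    tick = 0
    while heap:
        neg_score, _, config = heapq.heappop(heap)
        if is_final(config):
            return config, -neg_score
        for action in actions_for(config):
            tick += 1
            p = p_action(config, action)   # distribution over the cell's actions
            heapq.heappush(heap, (neg_score * p, tick, apply_action(config, action)))
    return None, 0.0
```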

14
LR parsing using the Alvey Tools grammar
  • Wide-coverage unification grammar written by
    Claire Grover and Ted Briscoe
  • Build LR tables from CF backbone of this grammar
  • Interactively build disambiguated training corpus
    by supervising choice of parse actions

15
Evaluation
  • Very good performance on LDOCE noun definitions:
    76% correct structure and labelling
  • State-of-the-art results in later work on tag
    sequence grammars, where the available lexical
    information is more restricted (54% correct
    structure and labelling)
  • Work underway to bring this technique to Wall
    Street Journal data for comparison with other
    methods

16
Data-oriented parsing
  • Rens Bod, Enriching Linguistics with Statistics:
    Performance Models of Natural Language, Amsterdam
    Ph.D. thesis
  • Treebank data again (this time ATIS -- 600
    sentences)
  • Radical rejection of the context-free assumption
  • Count subtrees of arbitrary depth, not rule
    applications (sketched below)
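The combinatorics behind "subtrees of arbitrary depth": at every node a fragment either cuts below a child (leaving it as a frontier nonterminal) or keeps any fragment rooted at that child, so the counts multiply. A sketch over the earlier Tree class:

```python
# Count the DOP fragments rooted at each node: at every child, either
# cut there (the child becomes a frontier nonterminal) or substitute
# any one of the fragments rooted at that child.

def count_fragments(node):
    if isinstance(node, str):                # a word: no fragments rooted here
        return 0
    total = 1
    for child in node.children:
        total *= 1 + count_fragments(child)  # cut at child, or recurse into it
    return total

# e.g. Tree("S", [Tree("NP", ["John"]), Tree("VP", ["runs"])])
# has 2 * 2 = 4 fragments rooted at S.
```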

17
A corpus
18
(No Transcript)
19
The probability of a tree
  • The sum of the probabilities of all the ways of
    making it out of fragments
  • The probability of a fragment is the ratio
    between the frequency of the fragment and the
    total frequency of all fragments having that root
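The two definitions in code form; the hashable fragment encoding, counts, and root_totals are illustrative assumptions:

```python
# counts[frag] is a fragment's corpus frequency; root_totals[label] is
# the total frequency of all fragments whose root has that label.
# Assumes fragments are Tree-like objects with a hashable encoding.

from math import prod

def p_fragment(frag, counts, root_totals):
    return counts[frag] / root_totals[frag.label]   # relative frequency per root

def p_tree(derivations, counts, root_totals):
    """Each derivation is a list of fragments; its probability is the
    product of its fragments', and the tree sums over derivations."""
    return sum(prod(p_fragment(f, counts, root_totals) for f in d)
               for d in derivations)
```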

20
Complexity
  • It's hard to find efficient algorithms for sewing
    together DOP trees (cf. Sima'an for solutions)
  • Only very small corpora are feasible
  • In practice, depth may have to be limited
  • Many tree fragments are very rare, so there is an
    issue about smoothing

21
Evaluation
  • several variations studied; DOP4 gets parse
    accuracies around 80% without a hand-coded
    dictionary, DOP5 around 90% with
  • results to be interpreted with caution due to the
    small size of the corpus
  • Evaluation on the Dutch OVIS domain suggests that
    DOP is not competitive with Groningen's more
    labour-intensive system (but maybe that's not the
    point)

22
Where to find out more
  • Papers by Bod, Carroll, Hermjakob.
  • Manning and Schütze, ch. 12.
  • http://xxx.soton.ac.uk/archive/cs/intro.html
    (subarea Computation and Language)