Treebanks as Training Data for Parsers

1
Treebanks as Training Data for Parsers
  • Joakim Nivre
  • Växjö University and Uppsala University
  • E-mail: nivre_at_msi.vxu.se

2
Q1: What do you really care about when you're building a parser?
  • For parsing unrestricted text, I care about the joint optimization of:
    • Robustness
    • Disambiguation
    • Accuracy
    • Efficiency
  • Requirement on syntactic annotation:
    • Balance between expressivity and complexity

3
Example: Mildly Non-Projective Dependency Structures
  • Dependency structures in two treebanks (PDT: Prague Dependency Treebank; DDT: Danish Dependency Treebank)
    • Strictly projective (efficiently parsable)
      • PDT: 75%
      • DDT: 85%
    • Unrestricted non-projective (often intractable)
      • PDT: 100%
      • DDT: 100%
    • Well-nested, gap degree 1
      • PDT: 99.5%
      • DDT: 99.7%
  • Design choice in treebank annotation?
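
The coverage figures above depend on a test for projectivity. A minimal sketch of such a test follows, assuming each sentence is given as a list of head indices (tokens numbered from 1, with 0 as an artificial root placed to the left of the sentence) and that the input is a well-formed tree. The function names and the two toy sentences are illustrative assumptions, not treebank data; the sketch does not cover gap degree or well-nestedness.

```python
def dominates(heads, ancestor, node):
    """Return True if `ancestor` is `node` itself or lies on its head chain.

    Assumes `heads` encodes a well-formed tree (no cycles).
    """
    while True:
        if node == ancestor:
            return True
        if node == 0:  # reached the artificial root without meeting `ancestor`
            return False
        node = heads[node - 1]


def is_projective(heads):
    """heads[i - 1] is the head of token i; 0 marks the artificial root.

    A tree is projective if, for every arc head -> dependent, every token
    strictly between the two is dominated by the head.
    """
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((head, dep))
        for between in range(lo + 1, hi):
            if not dominates(heads, head, between):
                return False
    return True


def projective_share(treebank):
    """Percentage of sentences in `treebank` that are projective."""
    return 100.0 * sum(is_projective(h) for h in treebank) / len(treebank)


# Toy input (not real treebank data): the second tree has an arc 4 -> 2
# that spans token 3, which is not dominated by token 4.
toy_treebank = [[2, 0, 2], [3, 4, 0, 1]]
print(projective_share(toy_treebank))
```

Run over a real treebank, `projective_share` would yield the kind of figure reported in the first row of the slide.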

4
Q2: What works, what doesn't?
  • Anything works?
    • Top systems in CoNLL 2006 shared task:
      • MSTParser: global, exhaustive, graph-based
      • MaltParser: local, greedy, stack-based
  • Features more important than parsers?
  • But not for all languages?
    • Results from CoNLL 2007 shared task:
      • Configurational languages: 85% LAS (Catalan, Chinese, English, Italian)
      • Richly inflected languages: 75% LAS (Arabic, Basque, Czech, Greek, Hungarian, Turkish)
  • Treebank problem or parser problem?
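
LAS (labeled attachment score) is the percentage of tokens that receive both the correct head and the correct dependency label. The sketch below computes it alongside the unlabeled variant (UAS), assuming each sentence is a list of (head, label) pairs. The function name and the toy labels are illustrative assumptions, and shared-task details such as the treatment of punctuation are left out.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent for two parallel lists of (head, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    total = len(gold)
    # UAS: the head alone must match.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: head and label must both match.
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct_both / total, 100.0 * correct_heads / total


# Toy example: token 3 gets the wrong label, token 4 the wrong head.
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "NMOD")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "ADV"), (2, "NMOD")]
las, uas = attachment_scores(gold, pred)
print(las, uas)
```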

5
Q3: What information is useful, what is not?
  • Word level
    • Morphological analysis (lemma, derivation, inflection)
    • Hierarchical parts-of-speech (incl. features)
  • Sentence level
    • Complete structural annotation (phrases, heads)
    • Complete functional annotation (syntactic relations)
    • Deep/non-local dependencies
  • Integrated morpho-syntactic annotation
    • The key to parsing richly inflected languages?
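
One way to picture word-level and sentence-level information in a single record is the ten-column, tab-separated token line used in the CoNLL-X shared task (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The sketch below reads such a line into a small data class. The class, the function name, and the sample line are illustrative assumptions rather than material from the slides, and the last two columns are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    id: int
    form: str
    lemma: str        # word level: morphological analysis
    cpostag: str      # coarse part-of-speech
    postag: str       # fine-grained part-of-speech
    feats: List[str]  # morphological features
    head: int         # sentence level: syntactic head (0 = root)
    deprel: str       # sentence level: syntactic relation to the head


def parse_token_line(line):
    """Parse one tab-separated, ten-column token line into a Token."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    # An underscore marks an empty feature set.
    feats = [] if cols[5] == "_" else cols[5].split("|")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 feats, int(cols[6]), cols[7])


# Made-up sample line, for illustration only.
sample = "2\tdogs\tdog\tN\tNNS\tnum=pl\t3\tSBJ\t_\t_"
print(parse_token_line(sample))
```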

6
Skipping a few questions
  • Q4: How does grammar writing interact with treebanking?
  • No idea. Not my cup of tea.
  • Q5: What methodological lessons can be drawn for treebanking?
  • Q6: What are advantages and disadvantages of preprocessing the data to be treebanked with an automatic parser?
  • Don't know. Never got funding to build a real treebank.

7
Q7: Advantages of a phrase structure and/or a dependency treebank?
  • Obvious answer
    • Phrase structure is good for phrase structure parsing.
    • Dependency is good for dependency parsing.
  • Methodological point
    • Parsing lossy conversions can be questionable.
  • Remedy
    • Make annotations (just) rich enough to support both.
  • Annotation scheme
    • Minimal source annotation
    • Well-defined conversions to target annotations
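
To make the idea of a conversion concrete, here is a minimal sketch of one common approach: deriving unlabeled dependencies from a phrase structure tree by picking a head child for every phrase. The head table, the tree encoding, and the function names are hypothetical and chosen only for this example; the slides do not specify a conversion procedure.

```python
# Hypothetical head rules: for each phrase label, the child labels to try in order.
HEAD_RULES = {"S": ["VP"], "VP": ["V"], "NP": ["N"]}


def pick_head_child(label, children):
    """Index of the head child; falls back to the leftmost child."""
    for wanted in HEAD_RULES.get(label, []):
        for i, (child_label, _) in enumerate(children):
            if child_label == wanted:
                return i
    return 0


def to_dependencies(tree):
    """Convert a tree of (label, children) / (tag, word) nodes to a head list.

    Returns heads[i - 1] for token i (1-based), with 0 marking the root.
    """
    arcs = {}
    count = 0

    def visit(node):
        nonlocal count
        label, content = node
        if isinstance(content, str):  # leaf: (tag, word)
            count += 1
            return count
        child_heads = [visit(child) for child in content]
        head = child_heads[pick_head_child(label, content)]
        for child_head in child_heads:
            if child_head != head:
                arcs[child_head] = head  # non-head children attach to the head
        return head

    arcs[visit(tree)] = 0
    return [arcs[i] for i in range(1, count + 1)]


tree = ("S", [("NP", [("D", "the"), ("N", "dog")]),
              ("VP", [("V", "barks")])])
print(to_dependencies(tree))  # [2, 3, 0]
```

The output keeps the attachments but drops the phrase labels, which illustrates one sense in which a conversion can be lossy.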