Grammar Development Platform - PowerPoint PPT Presentation

Slides: 19
Provided by: nlpFi

Transcript and Presenter's Notes


1
Grammar Development Platform
  • Miriam Butt
  • October 2002

2
Grammar Development
What is a Grammar Development Platform good for?
  • Information Retrieval/Extraction
  • Machine Translation (MT)

MT with XLE:
English: Anna sees the man.
German: Anna sieht den Mann.
Parser → English c-str and f-str → MT → German f-str → Generator
3
A Sample Development Platform
XLE (Xerox Linguistic Environment)
  • Main Developer: John Maxwell (PARC)
  • Software (Shareware): Emacs, Tcl/Tk
  • Platforms: Unix (Solaris), Linux, Mac OS X

4
A Sample Development Platform
XLE (Xerox Linguistic Environment)
  • Linguistic Theory: LFG (Lexical-Functional
    Grammar), originally developed by Ronald M. Kaplan
    (PARC) and Joan Bresnan (Stanford)
  • Parser: Bottom-Up, Left-to-Right
  • Performance: Worst-case exponential, polynomial
    in practice (makes broad-coverage grammars
    feasible)

5
The ParGram Project
  • Palo Alto Research Center (PARC): English Grammar
  • IMS, University of Stuttgart: German Grammar
  • Fuji Xerox: Japanese Grammar
  • University of Bergen: Norwegian Bokmål and Nynorsk
  • UMIST: Urdu Grammar
  • XRCE Grenoble: French Grammar
6
ParGram
Possible Applications
  • Machine Translation (French, English)
  • Tree Banking (English, German)
  • Smart Text Annotation (German)
  • Robust Parsing (English, German, French)
  • Information Extraction (English)
  • Teaching Tools (Urdu)

7
Grammar Components
Each Grammar Contains
  • Phrase Structure Rules (S → NP VP)
  • Lexicon (verb stems and functional elements)
  • Finite-State Morphological Analyzer

No Semantics
8
Phrase Structure Rules
The formulation used today goes back to Chomsky
(1957).
Sample Set for English
S → NP VP
VP → V NP
NP → D (ADJ) N
Why these kinds of rules?
  • Natural Language is recursive and potentially
    infinite.
  • Constituency, X-bar Theory
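
To make the sample rule set concrete, here is a minimal sketch of a top-down recognizer for exactly those three rules. It is not how XLE parses (the slides describe XLE's parser as bottom-up); the word list in LEXICON and the function names are assumptions made up for this illustration.

```python
# The three sample rules; the optional (ADJ) is expanded into two NP rules.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["D", "N"], ["D", "ADJ", "N"]],
}

# Hypothetical toy lexicon (not from the slides).
LEXICON = {
    "the": "D", "a": "D",
    "girl": "N", "monkey": "N", "man": "N",
    "sees": "V", "saw": "V",
    "small": "ADJ", "green": "ADJ",
}


def parse(symbol, tokens, start):
    """Yield every end position at which `symbol` can finish when it begins at `start`."""
    if symbol not in GRAMMAR:  # preterminal: match one word by its category
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        positions = [start]
        for child in rhs:
            # Advance every partial match through the next daughter category.
            positions = [end for pos in positions for end in parse(child, tokens, pos)]
        yield from positions


def recognize(sentence):
    """Return True if the whole sentence can be analyzed as an S."""
    tokens = sentence.lower().rstrip(".").split()
    return any(end == len(tokens) for end in parse("S", tokens, 0))
```

With this toy lexicon, recognize("The girl sees the small monkey.") succeeds, while "The girl sees." fails because the only VP rule requires an object NP.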

9
Phrase Structure Rules
The syntax of natural languages is context-free.
Colorless green ideas sleep furiously.
However, we must also deal with context-sensitive
information.
The monkey sleeps.
*The monkey sleep.
*The monkeys sleeps.
10
Features and Unifications
Context-sensitivity can be achieved in many ways.
XLE and LFG (like many other theories/platforms)
use phrase-structure annotation via
attribute-value pairs.
S → NP           VP
    (↑SUBJ)=↓    (↑SUBJ NUM)=(↓ NUM)
Features are checked via Unification.
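
As a rough illustration of feature checking by unification, the sketch below merges two attribute-value structures and fails on a clash such as singular versus plural number. It is a simplification and not XLE's implementation: structures are plain nested dictionaries without shared (re-entrant) values, and the names unify, MONKEY, MONKEYS and SLEEPS are assumptions for this example.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic string values).

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


# Hypothetical lexical contributions for the agreement examples.
MONKEY = {"SUBJ": {"PRED": "monkey", "NUM": "sg"}}
MONKEYS = {"SUBJ": {"PRED": "monkey", "NUM": "pl"}}
SLEEPS = {"PRED": "sleep", "SUBJ": {"NUM": "sg"}}  # 'sleeps' requires a singular subject
```

Here unify(MONKEY, SLEEPS) yields a single structure with PRED sleep and a singular SUBJ, while unify(MONKEYS, SLEEPS) returns None, mirroring the ungrammatical "The monkeys sleeps."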
11
The Ambiguity Problem
PP-Attachment
The girl saw the monkey with the telescope.
Categorial Ambiguity
Flying planes can be dangerous.
Time flies like an arrow.
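
The PP-attachment case can be made tangible by counting analyses. The sketch below is a plain CKY chart that counts the trees licensed by a small binary grammar; the PP rules (VP → VP PP, NP → NP PP, PP → P NP) and the lexicon are assumptions added for this illustration and are not part of the sample grammar on the earlier slide.

```python
from collections import defaultdict

# Binary rules as (parent, left daughter, right daughter).
# The three PP rules are hypothetical additions for this example.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "D", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon.
LEXICON = {
    "the": "D", "girl": "N", "monkey": "N", "telescope": "N",
    "saw": "V", "with": "P",
}


def count_parses(sentence, root="S"):
    """Count the distinct trees for `sentence` with a CKY chart."""
    tokens = sentence.lower().rstrip(".").split()
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, token in enumerate(tokens):
        category = LEXICON.get(token)
        if category is not None:
            chart[i, i + 1, category] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]
```

Under these assumptions the telescope sentence receives two analyses, one per attachment site, whereas the same sentence without the PP receives one.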
12
Lexicons
Typically Contain
  • Category Information (Terminal Node in Tree)
  • Context Sensitive Featural Information
  • Subcategorization Information
  • Semantics (sometimes)

13
Ambiguity in Large Grammars
Ambiguity is a serious problem even in simple
sentences:
  • PP-attachment (English)
  • Subject/Object Ambiguities (German)

Within XLE various techniques have been invented
to cut down on the explosion of parses.
  • Packed Representations
  • Optimality Marking

14
Morphologies and Tokenizers
Beyond the Word: Writing and adding in
Morphological Analysis and Tokenization
15
Parallel Analyses
Languages Differ on the Surface (c-structure)
English: Yassin was seen.
German: Yassin wurde gesehen.
Urdu: yassin dekha gaya
ParGram Goal: The same underlying f-structures
for all languages (modulo lexical semantics).
16
The Parallel in ParGram
Analyses at the level of f-structure are held as
parallel as possible across languages
(crosslinguistic invariance).
  • Theoretical Advantage: This models the idea of
    UG (Universal Grammar).
  • Applicational Advantage: Machine translation is
    made easier.

Analyses at the level of c-structure are allowed
to differ much more (variance across languages).
17
FST Morphological Analyzers
Kaplan and Butt (2002): this LFG
morphology-syntax interface is natural
surface form: calana 'to drive' (M.Sg)
  ↓ Sequence Relation
driveVerbInfMSg
  ↓ Lexical Relation
NUM sg, VFORM inf, GEND masc
  ↓ Satisfaction Relation
f-structure (m-structure)
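
The chain from surface form to features can be sketched in a few lines. This is only an illustration of the idea, not the finite-state analyzer itself: a dictionary stands in for the FST lookup, and splitting the analysis string into the tags drive, Verb, Inf, M and Sg is an assumption, as are the names ANALYSES, TAG_FEATURES and analyze.

```python
# Hypothetical stand-in for the finite-state analyzer:
# surface form -> stem, category, then morphological tags.
ANALYSES = {
    "calana": ["drive", "Verb", "Inf", "M", "Sg"],
}

# Each tag contributes one attribute-value pair (values taken from the slide).
TAG_FEATURES = {
    "Inf": ("VFORM", "inf"),
    "M": ("GEND", "masc"),
    "Sg": ("NUM", "sg"),
}


def analyze(surface):
    """Map a surface form to (stem, category, feature dictionary)."""
    stem, category, *tags = ANALYSES[surface]
    features = dict(TAG_FEATURES[tag] for tag in tags)
    return stem, category, features
```

Calling analyze("calana") gives the stem drive, the category Verb, and the three features listed on the slide.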
18
(No Transcript)