Title: The Rule-based Parser of the NLP Group of the University of Torino
1 The Rule-based Parser of the NLP Group of the
University of Torino
- Leonardo Lesmo
- Dipartimento di Informatica and Centro di Scienze Cognitive,
- Università di Torino,
- Italy
- Email: lesmo_at_di.unito.it
2 Goals
- Extensibility to semantics
Approach
- Two phases: chunking and subcategorization
- Procedural analysis of conjunctions and procedural
identification of verbal dependents
3 TULE (Turin University Linguistic Environment)
4 The grammar
- Rule-based dependency grammar
- Chunking (non-verbal groups) + verbal
subcategorization frames
- Output: a projective tree represented as pointers
to parents, including some null elements
(understood items, e.g. pro-drop, and traces)
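The output format described above (a tree stored as pointers to parents) can be illustrated with a small sketch. This is not the parser's own code: the list layout, the use of an empty string for a null element, the position of that element, and the choice of the main verb as head of the auxiliary are all assumptions made for the example.

```python
from typing import List, Optional

def is_projective(parents: List[Optional[int]]) -> bool:
    """parents[i] is the index of the parent of word i, or None for the root.

    A tree is projective if, for every arc between a head and a dependent,
    every word lying strictly between them is dominated by that head.
    The parent pointers are assumed to be acyclic.
    """
    def dominated_by(node: Optional[int], head: int) -> bool:
        # Walk up the parent pointers until we meet the head or pass the root.
        while node is not None:
            if node == head:
                return True
            node = parents[node]
        return False

    for dep, head in enumerate(parents):
        if head is None:
            continue
        lo, hi = sorted((dep, head))
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True

# Hypothetical encoding of "Ho incontrato Marco" with a null (pro-drop) subject.
words = ["", "Ho", "incontrato", "Marco"]   # "" stands for the understood subject
parents = [2, 2, None, 2]                   # every word points to "incontrato"
print(is_projective(parents))
```

Storing only the parent index keeps the structure compact; arc labels, when needed, could be kept in a parallel list.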
5 Parser Architecture
Processing steps, from lexical items to parse tree:
- CHUNKING: splits the text into groups of strictly connected
words
- ANALYSIS OF CONJUNCTIONS: connects chunks linked by conjunctions, to form
larger chunks
- SEGMENTATION: determines the dependents of verbs
- VERBAL ATTACHMENT: determines the roles (arc labels) of the verbal
dependents
Knowledge sources: chunking rules, procedural preference rules 1 and 2,
lexical items, verb classes, verbal caseframes
Data: lexical items, chunked text, parse tree
6 An example
7 Chunking
Example: Puoi dirmi che spettacoli di cabaret
posso vedere domani? (Can you
tell me what cabaret plays I can see tomorrow?)
Puoi/V-modal-2nd-sing-pres dir/V-inf
[mi/Pron-1st-dative]Pron
[che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group
posso/V-modal-1st-sing-pres vedere/V-inf
[domani/Adv]A-group ?
Chunking Rules
- Chunking rules are grouped in packets.
- Each packet is associated with a lexical
category, and describes the possible chunkable
dependents of words of that category.
- "Chunkable" refers to a dependent that is handled during
chunking (e.g. auxiliaries, but not arguments of
verbs)
8 A chunk rule
(NOUN common
   (precedes
      (ADJ qualif T (\- \' \"))
      (ADJ ((type qualif) (agree)))
      ADJCQUALIF-RMOD))
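A loose reading of the rule above is: a common noun may take a preceding qualifying adjective that agrees with it, and the resulting arc is labelled ADJCQUALIF-RMOD. The sketch below illustrates that reading only. The `Word` fields, the `PACKETS` table layout, the restriction to the immediately preceding word, and the gender/number notion of agreement are assumptions; the `T` flag and the punctuation list in the rule are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Word:
    form: str
    cat: str       # lexical category, e.g. "NOUN", "ADJ"
    subtype: str   # e.g. "common", "qualif"
    gender: str
    number: str

# One packet per head category (hypothetical layout). Each rule is
# (head subtype, dependent category, dependent subtype, arc label).
PACKETS: Dict[str, List[Tuple[str, str, str, str]]] = {
    "NOUN": [("common", "ADJ", "qualif", "ADJCQUALIF-RMOD")],
}

def agree(a: Word, b: Word) -> bool:
    # Simplified agreement: same gender and number.
    return a.gender == b.gender and a.number == b.number

def attach_preceding(words: List[Word]) -> List[Tuple[int, int, str]]:
    """Return (dependent index, head index, label) for each rule that fires."""
    arcs = []
    for i in range(1, len(words)):
        head, dep = words[i], words[i - 1]
        for head_sub, dep_cat, dep_sub, label in PACKETS.get(head.cat, []):
            if (head.subtype == head_sub and dep.cat == dep_cat
                    and dep.subtype == dep_sub and agree(dep, head)):
                arcs.append((i - 1, i, label))
    return arcs

phrase = [Word("bella", "ADJ", "qualif", "f", "sing"),
          Word("casa", "NOUN", "common", "f", "sing")]
print(attach_preceding(phrase))
```

Keying the table by the head's category mirrors the idea of one packet per lexical category, so only the rules relevant to the current head are tried.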
9 Conjunctions
- When a coordinating conjunction is found, all
following and preceding chunks are collected
- All pairs are built, and the best one is chosen
according to criteria based on structural
similarity and distance
- Special treatment for verbs
Example: Ho incontrato Marco e Lucia e li ho
salutati (I met Marco and Lucia
and I greeted them)
Ho/V-aux incontrato/V-main
[Marco/Noun-Proper]Noun e/Conj-coord
[Lucia/Noun-Proper]Noun e/Conj-coord [li/Pron-pers]Pron
ho/V-aux salutati/V-main
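The pair-selection step can be sketched as follows. The slide only says that the criteria are based on structural similarity and distance, so the scoring here is an assumption: similarity is reduced to "same chunk category", it takes priority over distance, and distance is counted in chunks. The chunk category names are also illustrative, and the special treatment for verbs is not modelled.

```python
from itertools import product
from typing import List, Tuple

Chunk = Tuple[str, str]   # (chunk category, text); categories are hypothetical

def best_conjunct_pair(chunks: List[Chunk], conj_index: int) -> Tuple[int, int]:
    """Pick the (left, right) chunk pair coordinated by the conjunction.

    Assumes at least one non-conjunction chunk on each side.
    """
    left = [i for i in range(conj_index) if chunks[i][0] != "Conj"]
    right = [i for i in range(conj_index + 1, len(chunks)) if chunks[i][0] != "Conj"]

    def score(pair: Tuple[int, int]) -> Tuple[int, int]:
        l, r = pair
        same_category = 1 if chunks[l][0] == chunks[r][0] else 0
        distance = (conj_index - l) + (r - conj_index)
        # Prefer similar conjuncts first, then closer ones.
        return (same_category, -distance)

    return max(product(left, right), key=score)

chunks = [("Verb", "Ho incontrato"), ("Noun", "Marco"), ("Conj", "e"),
          ("Noun", "Lucia"), ("Conj", "e"), ("Pron", "li"), ("Verb", "ho salutati")]
print(best_conjunct_pair(chunks, 2))   # (1, 3): Marco + Lucia
print(best_conjunct_pair(chunks, 4))   # (0, 6): the two verbal chunks
```

Under these assumptions the first "e" joins the two proper nouns and the second joins the two verbal chunks, which matches the English gloss of the example.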
10 Segmentation
- For each verb (going from left to right):
- Look for possible dependents (on its right and
left)
- On the left, the search is blocked by the
previous verb
- On the right, some barriers are defined to stop
the search (for instance, a subordinating
conjunction acts as a barrier)
Puoi/V-modal-2nd-sing-pres dir/V-inf
[mi/Pron-1st-dative]Pron
[che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group
posso/V-modal-1st-sing-pres
vedere/V-inf [domani/Adv]A-group ?
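The search for candidate dependents can be sketched as below. The category names and the contents of the barrier set are assumptions. Whether a following verb should also stop the rightward search is not stated on the slide, so the sketch leaves that to the caller: adding "Verb" to `barriers` would give that behaviour. Candidates found here are only possible dependents; accepting or rejecting them is left to later steps.

```python
from typing import Dict, List, Set

def candidate_dependents(cats: List[str], barriers: Set[str]) -> Dict[int, List[int]]:
    """Map each verb position to the positions of its possible dependents.

    cats holds one (hypothetical) category name per chunk, in text order.
    """
    result: Dict[int, List[int]] = {}
    for v, cat in enumerate(cats):
        if cat != "Verb":
            continue
        found: List[int] = []
        # Leftward search: blocked by the previous verb.
        i = v - 1
        while i >= 0 and cats[i] != "Verb":
            found.append(i)
            i -= 1
        found.reverse()
        # Rightward search: stopped by the first barrier.
        j = v + 1
        while j < len(cats) and cats[j] not in barriers:
            if cats[j] != "Verb":
                found.append(j)
            j += 1
        result[v] = found
    return result

# Hypothetical chunk sequence: Noun Verb Conj-subord Noun Verb Adv
cats = ["Noun", "Verb", "Conj-subord", "Noun", "Verb", "Adv"]
print(candidate_dependents(cats, {"Conj-subord"}))
# {1: [0], 4: [2, 3, 5]}
```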
11 Verbal Subcategorization
The subcategorization classes
12 Example subcategorization class definitions
(subj-verbs (intrans) (verbs)
   ; verbs with a subject. Definition of subject
   (verb-subj
      ((noun (agree))
       (art (agree))
       (pron (not (word quale) (type relat)) (case lsubj) (agree))
       (adj (type (indef demons deitt interr poss)) (agree))
       (num (agree))
       (prep (word in) (down (cat pron) (type indef)) (agree)))))
(ssubj-inf-verbs () (verbs)
   ; verbs with an inf-verb sentential subject
   (verb-subj
      ((verb (mood infinite) (agree)))))
(empty-modal () (no-subj-verbs)
   ; modals without subject
   (verb-indcompl-modal
      ((verb (mood infinite)))))
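One way to read a definition such as verb-subj is as a filter over candidate dependents: a candidate can fill the role if its category is listed and the attached conditions hold. The sketch below illustrates this reading for the three simplest entries of the subject definition (noun, art, num, each requiring agreement). The dictionary layout and the reduction of agreement to grammatical number are assumptions; the more complex pron, adj and prep entries are not modelled.

```python
from typing import Dict, List

# Simplified, partial rendering of the verb-subj definition of subj-verbs.
SUBJECT_DEFS: Dict[str, List[dict]] = {
    "subj-verbs": [
        {"cat": "noun", "agree": True},
        {"cat": "art", "agree": True},
        {"cat": "num", "agree": True},
    ],
}

def agrees(candidate: dict, verb: dict) -> bool:
    # Assumption: agreement is checked on grammatical number only.
    return candidate.get("number") == verb.get("number")

def can_be_subject(verb_class: str, candidate: dict, verb: dict) -> bool:
    """True if the candidate satisfies one entry of the class's subject definition."""
    for entry in SUBJECT_DEFS.get(verb_class, []):
        if candidate.get("cat") != entry["cat"]:
            continue
        if entry.get("agree") and not agrees(candidate, verb):
            continue
        return True
    return False

verb = {"cat": "verb", "number": "pl"}
print(can_be_subject("subj-verbs", {"cat": "noun", "number": "pl"}, verb))    # True
print(can_be_subject("subj-verbs", {"cat": "noun", "number": "sing"}, verb))  # False
```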
13 Transformations
- Basic class (e.g. trans)
- Transformed classes (e.g. trans,
trans+passivization, trans+infinitivization,
trans+prodrop, trans+passivization+infinitivization,
...)
Example transformation
(infinitivization replacing
   (subj-verbs)
   (is-inf-form tr-verb v-casefr)
   (cancel-case s-subj))
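The idea of deriving surface case frames from a basic class can be sketched as a function applied to a set of cases. Only the `cancel-case s-subj` part of the infinitivization example is modelled; the applicability test (`is-inf-form`) and the `replacing` keyword are left out. The frame contents are an assumption: `s-subj` comes from the slide, while `s-obj` is a hypothetical second case for a transitive verb.

```python
from typing import Callable, Dict, FrozenSet, Sequence

CaseFrame = FrozenSet[str]
Transformation = Callable[[CaseFrame], CaseFrame]

def cancel_case(case: str) -> Transformation:
    """Build a transformation that removes one case from a frame."""
    return lambda frame: frame - {case}

# Only infinitivization is defined, since it is the one shown on the slide.
TRANSFORMATIONS: Dict[str, Transformation] = {
    "infinitivization": cancel_case("s-subj"),
}

def apply_chain(frame: CaseFrame, names: Sequence[str]) -> CaseFrame:
    """Apply the named transformations in order to a basic case frame."""
    for name in names:
        frame = TRANSFORMATIONS[name](frame)
    return frame

trans = frozenset({"s-subj", "s-obj"})            # hypothetical basic frame
derived = apply_chain(trans, ["infinitivization"])
print(sorted(derived))                            # ['s-obj']
```

Chaining several names in `apply_chain` corresponds to the combined classes listed on the slide, provided each transformation has been added to the table.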
14
- Chunking rules
  - Total: 295 rules
  - Common: 250 rules
  - English: 34 rules
  - Italian: 7 rules
  - Spanish and Catalan: 4 rules
- Base subcategorization
  - Total: 118 classes
  - Abstract: 21 classes
  - plus verbal locutions
  - Italian: 40 classes
  - English: 1 class
- Derived surface case frames
  - 2653 case frames
15 Conclusions
- Test of the parser on other languages, using the
same grammar augmented with extra rules (see
previous slide)
- Partial use of semantic information (about 400
words classified according to a semantic taxonomy)
- The parser has been used in a project involving
spoken and written linguistic interaction with a
user. It has been interfaced with a repository
of semantic knowledge to build a meaning
representation.