Transcript and Presenter's Notes

Title: morphological tag of some word


1
Introduction
  • morphological tag of a word
  • a combination of values of morphological
    categories that makes sense for this word
  • morphological ambiguity
  • more than one tag is possible for some word forms
  • the word form own can represent either an
    adjective or a verb
  • It's my own car. × I own a car.
  • in Czech, e.g., je can be either a verb or a
    pronoun
  • Petr je nemocný. (Petr is ill.) × Už je vidím.
    (I can already see them.)
  • morphological analyzer
  • returns the morphological tag(s) for a given word
  • morphological disambiguation/tagging
  • removing the incorrect tags
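The analyzer step above can be sketched in a few lines of Python. The lexicon and the tag names here are invented for illustration; the actual analyzer used later (ajka) draws on a full Czech morphological database.

```python
# Toy morphological analyzer: look up all tags possible for a word form.
# LEXICON and the tag names (ADJ, VERB, ...) are hypothetical placeholders.
LEXICON = {
    "own": {"ADJ", "VERB"},   # It's my own car. / I own a car.
    "je":  {"VERB", "PRON"},  # Petr je nemocný. / Už je vidím.
    "car": {"NOUN"},
}

def analyze(word_form):
    """Return all morphological tags for a word form
    (empty set if the word is unknown to the analyzer)."""
    return LEXICON.get(word_form.lower(), set())

print(sorted(analyze("own")))  # ['ADJ', 'VERB'] -> ambiguous
print(sorted(analyze("car")))  # ['NOUN']        -> unambiguous
```

Disambiguation then amounts to removing all but one of the tags returned for an ambiguous form.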

2
Morphological tagging
  • still an open task for highly inflectional
    languages
  • statistical methods and/or rule-based techniques
  • advantages of the rule-based approach
  • transparency
  • linguistic interpretability
  • manual improvability
  • independence
  • the manual design of the rules is expensive and
    requires some linguistic knowledge
  • attempts to learn the rules automatically using
    ILP and active learning (Nepil, Popelínský 2001;
    Nepil et al., 2001)
  • manually annotated learning data were still
    needed
  • a new method of generating the disambiguation
    rules using only unannotated data

3
Basic Idea
  • homonymy often has a rather accidental nature
  • ženu: the accusative of the feminine noun žena,
    or the 1st person singular of the verb hnát
  • many Czech feminines in the accusative are not
    homonymous with the 1st sg of any verb, and
    conversely...
  • a step aside
  • grammatical meaning/morphological tag is
    connected with a function of the word in a
    sentence; words with the same grammatical
    meaning have the same or similar functions
  • words sharing the same function occur in contexts
    which have some common properties
  • these properties are manifested in morphological
    tags of the words constituting these contexts

4
Basic Idea
  • Back to our example
  • all feminines in the accusative occur in similar
    contexts, which differ from the contexts of
    verbs in the 1st sg
  • the word form ženu cannot be both at the same
    time
  • to resolve a particular ambiguity between the
    tags (or sets of tags) X and Y, we search a
    corpus for words unambiguously tagged with X and
    Y, respectively, and find some common properties
    of their contexts. Then we can examine the
    context of a word ambiguously tagged with both X
    and Y and determine which of the two can be
    removed
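The corpus search described above can be illustrated as follows. The data layout and tag names are hypothetical simplifications; the actual system encodes learning examples as Prolog facts.

```python
# Collect (left-neighbour tags, right-neighbour tags) patterns around words
# that are UNAMBIGUOUSLY tagged with a target tag. Comparing the profiles
# for tags X and Y reveals context properties that separate them.
from collections import Counter

def context_profile(corpus, target_tag):
    """corpus: list of sentences; a sentence is a list of (form, tags) pairs.
    Returns a Counter over (left_tags, right_tags) context patterns."""
    profile = Counter()
    for sentence in corpus:
        for i, (_form, tags) in enumerate(sentence):
            if tags == {target_tag}:  # only unambiguous occurrences count
                left = sentence[i - 1][1] if i > 0 else {"<S>"}
                right = sentence[i + 1][1] if i + 1 < len(sentence) else {"</S>"}
                profile[(frozenset(left), frozenset(right))] += 1
    return profile

# Tiny made-up corpus: accusative nouns after a verb, before punctuation.
corpus = [
    [("vidím", {"VERB"}), ("knihu", {"NOUN_ACC"}), (".", {"PUNCT"})],
    [("mám", {"VERB"}), ("tašku", {"NOUN_ACC"}), (".", {"PUNCT"})],
]
print(context_profile(corpus, "NOUN_ACC").most_common(1))
```

An ambiguous occurrence can then be scored against the profiles of both competing tags.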

5
Training Data
  • an unlimited amount of unannotated data
  • we can afford to select only good learning
    examples
  • quite strong restrictions
  • only whole sentences, no numbers, abbreviations,
    interjections, proper names, words unknown to a
    morphological analyzer, no very short sentences,
    sentences without a finite verb, nor sentences
    with an unclear punctuation
  • even with these restrictions we get many
    non-grammatical sentences and incorrectly tagged
    words
  • learning examples are annotated by the
    morphological analyzer ajka (Sedláček, 2001),
    each word is labeled with all tags offered by the
    analyzer
  • rare readings of some frequent words are removed
    by simple lexical filters
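A much-simplified sketch of the sentence filter described above, assuming a plain tokenized sentence and a lexicon mapping words to tag sets (both hypothetical; the tag name VERB_FIN marking a finite verb is likewise invented):

```python
# Reject sentences that violate (a toy version of) the restrictions:
# very short sentences, tokens with digits, abbreviations (trailing dot),
# words unknown to the analyzer, sentences without a finite verb.
import re

def is_fine_example(sentence, lexicon):
    """sentence: list of tokens; lexicon: word -> set of tags."""
    if len(sentence) < 4:                                # very short sentence
        return False
    for word in sentence:
        if re.search(r"\d", word) or word.endswith("."):  # numbers, abbreviations
            return False
        if word.lower() not in lexicon:                   # unknown to the analyzer
            return False
    return any("VERB_FIN" in lexicon[w.lower()] for w in sentence)

lexicon = {"petr": {"NOUN"}, "je": {"VERB_FIN", "PRON"},
           "velmi": {"ADV"}, "nemocný": {"ADJ"}}
print(is_fine_example(["Petr", "je", "velmi", "nemocný"], lexicon))  # True
print(is_fine_example(["Petr", "je"], lexicon))                      # False
```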

6
Learning Algorithm
  • positive examples, negative examples, domain
    knowledge → ILP system
  • generates (induces) rules covering the positive
    examples without covering the negative ones
  • refines a particular rule according to a given
    criterion
  • learning examples for resolving X/Y ambiguity
  • sentences with some word unambiguously tagged
    X or Y
  • learning examples are encoded into Prolog facts
  • to resolve a particular ambiguity X/Y, we learn
    two sets of disambiguation rules, one for each
    tag
  • during disambiguation, whenever all rules
    covering a certain word fall into the same set,
    the respective tag is retained and the other
    removed. If both sets contain some rule covering
    the word, the more probable tag is retained.
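The retain/remove decision described above might look like this in outline. Rules are modeled as simple predicates over a word's context; the actual system uses rules induced by ILP, and the probabilities would come from corpus counts.

```python
def disambiguate(context, rules_x, rules_y, p_x, p_y):
    """Retain tag 'X' or 'Y' for an ambiguous word.

    rules_x / rules_y: the two learned rule sets (predicates on context);
    p_x / p_y: the tags' corpus probabilities, used as a fallback."""
    fires_x = any(rule(context) for rule in rules_x)
    fires_y = any(rule(context) for rule in rules_y)
    if fires_x and not fires_y:
        return "X"                       # all covering rules agree: remove Y
    if fires_y and not fires_x:
        return "Y"                       # all covering rules agree: remove X
    return "X" if p_x >= p_y else "Y"    # conflict or no rule: more probable tag

# Hypothetical rules keyed on the neighbouring tags.
rules_x = [lambda c: c["left"] == "VERB"]
rules_y = [lambda c: c["right"] == "NOUN"]
print(disambiguate({"left": "VERB", "right": "ADJ"}, rules_x, rules_y, 0.4, 0.6))  # X
```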

7
Learning Algorithm
  • Construction of the set of rules RS covering as
    many positive examples as possible and none of
    the negative ones (P and N can be chosen
    arbitrarily)
  1. Set the rule set RS to the empty set
  2. Set the rule R to a trivial rule and choose at
     most P positive examples covered by R but not
     covered by any rule from RS. End if there are
     no such examples
  3. Choose at most N negative examples covered by
     R. If there are no such examples, add R to RS
     and continue with step 2
  4. Try to refine (specialize) the rule R to the
     best advantage according to the selected
     positive and negative examples. End if there is
     no possibility of refinement, otherwise
     continue with step 3
  • The utility of a possible refinement is measured
    by the formula
    Pcov / Pall − Ncov / Nall
  • where Pcov (Ncov) stands for the count of
    positive (negative) examples covered by the
    refined rule, and Pall (Nall) for the count of
    all positive (negative) examples selected in
    steps 2 and 3
  • ILP system INDEED (Nepil 2003) is used for
    refining the rules
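The utility measure and the greedy choice of the best refinement can be illustrated with made-up counts (the candidate rule names and numbers below are purely illustrative):

```python
def utility(p_cov, n_cov, p_all, n_all):
    """Utility of a refinement: Pcov/Pall - Ncov/Nall."""
    return p_cov / p_all - n_cov / n_all

# Hypothetical candidate refinements as (name, Pcov, Ncov), with
# Pall = 10 positive and Nall = 6 negative examples selected.
candidates = [("rule_a", 8, 3), ("rule_b", 6, 0), ("rule_c", 10, 6)]
best = max(candidates, key=lambda c: utility(c[1], c[2], 10, 6))
print(best[0])  # rule_b: 6/10 - 0/6 = 0.6 beats 0.3 (rule_a) and 0.0 (rule_c)
```

A rule covering all positives and all negatives (rule_c) scores zero: it separates nothing.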

8
Experiments
  • three experiments have been performed
  • the third and the fifth most frequent Czech word
    (se and je) and the subset of the most frequent
    POS ambiguity (words of type vedení) were chosen
  • se is either a reflexive pronoun or a vocalized
    form of the preposition s
  • je is either a personal pronoun or the 3rd sg of
    the verb být
  • words of the vedení type are forms of either
    nouns or adjectives
  • the evaluation of the generated rules has been
    performed on the manually annotated corpus DESAM
  • all occurrences of these three types have been
    used.
  • even the badly disambiguated words in
    non-grammatical, but human-parseable sentences
    have been counted as errors caused by the rules

9
Results
  • recall (left column) = correctly disambiguated /
    all ambiguous words
  • precision (right column) = correctly
    disambiguated / disambiguated words
  • baseline = default precision, i.e., selecting
    the more probable tag
  • frequency = portion among all words ambiguous in
    POS
  • HMM is the Czech HMM-based tagger (Krbec, Hajič
    2001)
  • EXP is the Czech feature-based tagger (Hajič
    2001)
  • it should be stated that the comparison with HMM
    and EXP is not quite fair, as they are not
    specialized in solving these three particular
    ambiguities
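With purely illustrative counts (not the actual DESAM figures), the two measures work out as:

```python
def recall(correct, all_ambiguous):
    """correctly disambiguated / all ambiguous words"""
    return correct / all_ambiguous

def precision(correct, disambiguated):
    """correctly disambiguated / disambiguated words"""
    return correct / disambiguated

# Made-up example: 100 ambiguous occurrences, the rules decide on 80
# of them and get 76 right -> high precision, partial coverage.
print(recall(76, 100))    # 0.76
print(precision(76, 80))  # 0.95
```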

10
Discussion
  • in all cases I had to relax some of the
    principles proposed earlier
  • allowing the coverage of a small amount of
    negative examples
  • not all unambiguous words can be used, words
    appearing in non-typical contexts have been
    discarded
  • je was substituted with words bearing slightly
    different tags
  • disadvantages
  • difficulty of searching for adequate unambiguous
    substitutes
  • it seems that many ambiguities will remain
    unresolvable with our method; e.g., the
    nominative, accusative and vocative cases have
    the same form for all Czech neuters
  • on the other hand, the results show that at least
    for some ambiguities quite accurate rules can be
    learned, which could be useful for a partial
    disambiguation or some preprocessing

11
Conclusions and Future Work
  • a new method of inducing rules for
    disambiguation
  • learning from raw, unannotated data
  • promising results, mainly in accuracy
  • limitations constraining a wider/general
    application
  • possible improvements
  • some kind of (semi?)automatic elimination of
    non-grammatical learning examples should be
    considered
  • some rearranging of the tagset
  • some rules are very similar to others; methods
    of detecting these similarities could be useful
  • the lexical filters can always be improved
  • some improving or lexicalization of the domain
    knowledge?