Title: Conceptual noun types: grammar and automatic classification
1 Conceptual noun types: grammar and automatic classification
- Christian Horn Christof Rumpf
- CTF 07, Düsseldorf
2 Structure
- The four conceptual noun types and their contextual properties
- Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus
- A framework for the automatic classification of concept types
- Conclusion
3 Conceptual noun types
1. The four conceptual noun types and their
contextual properties
Löbner (1979, 1985, 1998)
4 Grammatical uses of conceptual noun types
- Sortal concepts
  - A rose is a nice present.
  - Many roses are an even nicer present.
- Individual concepts
  - The sun is burning.
  - A sun is burning.
  - The suns are burning.
  - Many suns are burning.
  - My sun is burning. / The sun of mine is burning.
- use differing from underlying concept type
5 Grammatical uses of conceptual noun types
- Relational concepts
  - One of Mary's legs is too short.
  - Mary's leg is too short. / The leg of Mary is too short.
  - Many legs of Mary are too short.
- Functional concepts
  - Mary is Peter's mother. / Mary is the mother of Peter.
  - Mary is a mother of Peter.
  - Mary is the mother.
6 Contextual properties of conceptual noun types
- grammatical characteristics
  - possessive use: his mother / mother of him
  - definiteness: the sun
  - subcategorization: certain verbs require individual or functional concepts (IC/FC) as complements
  - morphological properties: certain nouns are often functional
    - deadjectival nouns (Intelligenz 'intelligence')
    - deverbal nouns (Krümmung 'bend', Dauer 'length')
    - compounds: -wert 'value' (Bestwert 'optimum value'), -grad 'degree' (Wirkungsgrad 'degree of efficiency'), -größe 'size' (Kleidergröße 'dress size')
7 2. Investigation of grammatical properties of the conceptual noun types on the basis of a German text corpus
2. Investigation of grammatical properties
- Goals
  - to identify the possible uses of the different concept types and their specific context features
  - to develop and implement a method for the automatic classification of concept types in texts based on morphosyntactic features
- Hybrid approach
  - semantic and grammatical analysis of the conceptual noun types
  - statistical investigation: automatic classification allows the processing of large amounts of data
  - the investigation is initially carried out on a German text corpus (108,000 words) that serves as a training corpus
  - outlook: further research is planned on English, French, and Japanese
8 Predictions
- Assumptions
  - The lexicalized concept type of a noun is the type in which that noun is most frequently used.
  - Conceptual noun types occur particularly often in grammatical uses that match their underlying conceptual properties.
    - sortal concepts (rose): singular, plural, with quantifiers, indefinite ...
    - individual concepts (sun): singular, definite
    - relational concepts (leg): indefinite, possessive
    - functional concepts (mother): singular, definite, possessive
  - Other uses (type shifts) are still possible. The conditions under which these type shifts occur still have to be investigated.
9 Counting
- (selection, definiteness)
1 definite: def. determiner, poss. pron., gen. pron., d-Prep, d-selb, d-einzig, genitive (deren/dessen), d-jen
2 quantifiers/indefinite: quantifiers, indefinite determiner, demonstratives, numbers, kein, d-beid, d-ord
3 null determiner
4 incl. -1
10 Results
(selection)
- Results so far confirm our predictions.
11 Tasks and challenges
- Type shifts in certain readings
  - The meaning of the word. (FC)
  - The word bottle has many meanings. (RC)
- Generic and anaphoric uses
  - The lightbulb was invented by Heinrich Göbel. (generic)
- Polysemy
- Analysis of possessive constructions, plurals, null determiner
12 3. A framework for the automatic classification of concept types
- Architecture
- Training corpus
- Morphosyntactic analysis
- Training sample
- Computing classifiers
- Maximum entropy models
- Conclusion
13 Architecture of the framework
[Figure: architecture diagram with a learning path and an application path.
Learning: manual annotation of concept types → training corpus → morphosyntactical analysis (msyn dependency grammar parser) → extraction of relevant context features → training sample → Generalized Iterative Scaling → maximum entropy model.
Application: test corpus → morphosyntactical analysis (msyn dependency grammar parser) → extraction of relevant context features → test sample → application of the classifier (maximum entropy model) → annotated test corpus.]
14 Training corpus
- Manually annotated version of Löbner (2003), Semantik
- Concept types of nouns marked with tags
Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> der <f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst. Diese <r2>Art</r2> von <f2>Definition</f2> mag vielleicht ihrem <r2>Freund</r2> genügen, der Sie zufällig mit diesem <so>Buch</so> in der <r2>Hand</r2> sieht und Sie fragt, was denn nun schon wieder sei, aber als <f2>Autor</f2> einer solchen <r2>Einführung</r2> muss ich natürlich präziser erklären, was der <f2>Gegenstand</f2> dieser <so>Wissenschaft</so> ist.
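The annotated corpus can be turned into (concept type, noun) pairs with a small script. The following is a minimal sketch in Python, assuming the annotation tags are written as angle-bracket pairs such as `<f1>…</f1>`; the function name `extract_annotations` and the regular expression are illustrative and not taken from the original work.

```python
import re

# Assumed tag format: <f1>Noun</f1>. Tag names follow the sample (f1, f2, r2, so).
TAG_PATTERN = re.compile(r"<(?P<tag>[a-z]+\d?)>\s*(?P<noun>[^<]+?)\s*</(?P=tag)>")


def extract_annotations(text):
    """Return (concept_type, noun) pairs in order of appearance."""
    return [(m.group("tag"), m.group("noun")) for m in TAG_PATTERN.finditer(text)]


sample = (
    "Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> "
    "der <f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst."
)
pairs = extract_annotations(sample)
print(pairs)
```

The pattern tolerates stray whitespace inside a tag pair, so a noun annotated as `<so>Buch </so>` is still returned as `Buch`.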
15 Morphosyntactical analysis
- We use Connexor's msyn to analyse German texts (www.connexor.com).
- Syntactic information consists of dependency trees.
- Morphological features include part of speech, gender, number, case, tense, mood, and some more.
- Some postprocessing is done by ourselves, i.e. to add definiteness markers.
16 Dependency tree
Die Semantik ist das Teilgebiet der Linguistik,
'Semantics is the branch of linguistics'
main - ist
  subj - Semantik
    det - Die (Def)
  comp - Teilgebiet
    det - das (Def)
    mod - Linguistik (Gen), possessor
      det - der (Def)
17 Output of Connexor's msyn
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE analysis SYSTEM "http://www.connexor.com/dtds/4.0/fdg3.dtd">
<analysis><sentence id="w1">
  <token id="w2"> <text>Die</text> <lemma>die</lemma> <depend head="w3">det</depend>
    <tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
  <token id="w3"> <text>Semantik</text> <lemma>semantik</lemma> <depend head="w4">subj</depend>
    <tags><syntax>NH</syntax> <morpho>N FEM SG NOM</morpho></tags></token>
  <token id="w4"> <text>ist</text> <lemma>sein</lemma> <depend head="w1">main</depend>
    <tags><syntax>MAIN</syntax> <morpho>V IND PRES SG P3</morpho></tags></token>
  <token id="w5"> <text>das</text> <lemma>das</lemma> <depend head="w6">det</depend>
    <tags><syntax>PREMOD</syntax> <morpho>DET Def NEU SG NOM</morpho></tags></token>
  <token id="w6"> <text>Teilgebiet</text> <lemma>teilgebiet</lemma> <depend head="w4">comp</depend>
    <tags><syntax>NH</syntax> <morpho>N NEU SG NOM</morpho></tags></token>
  <token id="w7"> <text>der</text> <lemma>die</lemma> <depend head="w8">det</depend>
    <tags><syntax>PREMOD</syntax> <morpho>DET Def FEM SG GEN</morpho></tags></token>
  <token id="w8"> <text>Linguistik</text> <lemma>linguistik</lemma> <depend head="w6">mod</depend>
    <tags><syntax>NH</syntax> <morpho>N FEM SG GEN</morpho></tags></token>
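Parser output of this kind can be read into a simple token table from which the dependency tree is recovered. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes the element and attribute names shown in the msyn output (`token`, `text`, `lemma`, `depend`, `head`, `morpho`), uses an abridged three-token sample with closing tags added so that it parses, and the helper names `read_tokens` and `dependents` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged sample; closing </sentence></analysis> tags added so it is well-formed.
SAMPLE = """<analysis><sentence id="w1">
<token id="w2"><text>Die</text><lemma>die</lemma><depend head="w3">det</depend>
<tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
<token id="w3"><text>Semantik</text><lemma>semantik</lemma><depend head="w4">subj</depend>
<tags><syntax>NH</syntax><morpho>N FEM SG NOM</morpho></tags></token>
<token id="w4"><text>ist</text><lemma>sein</lemma><depend head="w1">main</depend>
<tags><syntax>MAIN</syntax><morpho>V IND PRES SG P3</morpho></tags></token>
</sentence></analysis>"""


def read_tokens(xml_text):
    """Map token id -> dict with text, lemma, head id, relation and morpho tags."""
    tokens = {}
    for tok in ET.fromstring(xml_text).iter("token"):
        dep = tok.find("depend")
        tokens[tok.get("id")] = {
            "text": tok.findtext("text"),
            "lemma": tok.findtext("lemma"),
            "head": dep.get("head"),
            "rel": dep.text,
            "morpho": tok.findtext("tags/morpho").split(),
        }
    return tokens


def dependents(tokens, head_id):
    """Ids of all tokens attached directly to head_id."""
    return [tid for tid, t in tokens.items() if t["head"] == head_id]


tokens = read_tokens(SAMPLE)
print(dependents(tokens, "w3"))  # tokens attached to 'Semantik'
```

Storing the head id with each token keeps the structure flat; a tree view is obtained on demand by calling `dependents` recursively.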
18 Training sample
Extraction of relevant contextual features with regular expressions mapped onto dependency trees, implemented in the programming language Perl. This results in pairs (concept type, list of context features):
(f1, tnr=2, tok=semantik, suff=ik, num=sg, art=def)
(r2, tnr=5, tok=teilgebiet, num=sg, art=def, poss=rgen)
(f1, tnr=7, tok=linguistik, suff=ik, num=sg, art=def)
(f2, tnr=12, tok=bedeutung, suff=ung, num=sg, art=none)
(r2, tnr=16, tok=art, num=sg, art=indef, poss=von)
(f2, tnr=18, tok=definition, num=sg, art=none)
(r2, tnr=22, tok=freund, num=sg, art=def)
(so, tnr=30, tok=buch, num=sg, art=indef)
(r2, tnr=33, tok=hand, num=sg, art=def)
(f2, tnr=49, tok=autor, num=sg, art=none)
(r2, tnr=52, tok=einführung, suff=ung, num=sg, art=indef)
(f2, tnr=61, tok=gegenstand, num=sg, art=def)
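The extraction step can be illustrated with a small sketch. The original work uses Perl and regular expressions over dependency trees; the Python below is a simplified stand-in and not the authors' implementation. It assumes that tokens are available as dictionaries with `lemma`, `head`, `rel` and `morpho` fields, that the token number is the parser id minus one (the sentence itself carries id w1 in the sample output), that plural is tagged `PL`, and that only the suffixes `ik` and `ung` from the sample are of interest. Possessive features are left out.

```python
# Hypothetical suffix list; the sample only shows 'ik' and 'ung'.
SUFFIXES = ("ung", "ik")


def context_features(tokens, noun_id):
    """Build a feature list such as ['tnr=2', 'tok=semantik', ...] for one noun."""
    noun = tokens[noun_id]
    # Assumption: token number = numeric part of the id minus one.
    feats = [f"tnr={int(noun_id[1:]) - 1}", f"tok={noun['lemma']}"]
    for suffix in SUFFIXES:
        if noun["lemma"].endswith(suffix):
            feats.append(f"suff={suffix}")
            break
    if "SG" in noun["morpho"]:
        feats.append("num=sg")
    elif "PL" in noun["morpho"]:  # 'PL' is an assumed tag name
        feats.append("num=pl")
    # Article: inspect determiners that depend on the noun.
    art = "none"
    for tok in tokens.values():
        if tok["head"] == noun_id and tok["rel"] == "det":
            art = "def" if "Def" in tok["morpho"] else "indef"
    feats.append(f"art={art}")
    return feats


tokens = {
    "w2": {"lemma": "die", "head": "w3", "rel": "det",
           "morpho": ["DET", "Def", "FEM", "SG", "NOM"]},
    "w3": {"lemma": "semantik", "head": "w4", "rel": "subj",
           "morpho": ["N", "FEM", "SG", "NOM"]},
}
features = context_features(tokens, "w3")
print(features)
```

For the noun Semantik this yields the same features as the first pair of the training sample.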
19 Automatic classification
- given
  - training sample $t = (a_1, b_1), \ldots, (a_n, b_n)$
  - classes $a_i \in \{f1, f2, r1, r2\}$
  - contexts $b_i = \{m_1, \ldots, m_m\}$
  - features $m_i \in \{\text{art=def}, \text{art=indef}, \text{poss=lgen}, \ldots\}$
- searched
  - classifier $p(a \mid b)$: How probable is class $a$ given context $b$?
  - maximal argument $a^* = \arg\max_a p(a \mid b)$: Which is the most probable class $a$ given context $b$?
20 Computing a (bad) classifier
- simplest approach
  - Counting co-occurrences of classes and contexts
- shortcomings
  - Only the contexts in t are learned.
  - Varying degrees of evidence of single features are disregarded.
- way out
  - Computation of the classifier with a maximum entropy model.
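The counting approach and its first shortcoming can be made concrete with a small sketch. It estimates $p(a \mid b)$ as the relative frequency of class $a$ among all training instances with exactly the context $b$; this estimator is an assumption, since the slide does not give the formula. The toy sample and all function names are hypothetical.

```python
from collections import Counter


def train_counts(sample):
    """sample: list of (class, context) pairs; a context is a set of features."""
    joint = Counter((a, frozenset(b)) for a, b in sample)
    context = Counter(frozenset(b) for _, b in sample)
    return joint, context


def p(a, b, joint, context):
    """Relative frequency of class a given the whole context b."""
    b = frozenset(b)
    if context[b] == 0:
        return None  # context never seen in training: no estimate at all
    return joint[(a, b)] / context[b]


def classify(b, classes, joint, context):
    """Most frequent class for context b, or None for an unseen context."""
    b = frozenset(b)
    if context[b] == 0:
        return None
    return max(classes, key=lambda a: joint[(a, b)])


toy = [
    ("f1", {"num=sg", "art=def"}),
    ("f1", {"num=sg", "art=def"}),
    ("r2", {"num=sg", "art=def"}),
    ("so", {"num=sg", "art=indef"}),
]
joint, context = train_counts(toy)
print(p("f1", {"art=def", "num=sg"}, joint, context))
print(classify({"num=pl", "art=indef"}, ["f1", "r2", "so"], joint, context))
```

Because contexts are compared as wholes, a context that differs from the training data in a single feature gets no estimate, and the evidence of individual features is never separated out.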
21 Maximum entropy models
- Basics
  - Entropy: number of bits required to encode events of a particular type (tossing a fair coin: 1 bit; rolling a fair die: $\log_2 6 \approx 2.58$ bits).
  - Principle of maximum entropy: choose a model with maximum entropy, i.e. don't go beyond the data.
- Specific features
  - Decomposition of contexts into single context features or combinations of them.
  - Possibility to combine features from heterogeneous sources (e.g. syntax, semantics, morphology, ...).
  - Computation of the weights (evidence) of single features or their combinations for every class over all contexts.
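The entropy values for the coin and the die follow from the Shannon entropy $H = -\sum_i p_i \log_2 p_i$. A minimal Python sketch, with the function name `entropy_bits` chosen for illustration:

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = entropy_bits([0.5, 0.5])    # fair coin: 1 bit
die = entropy_bits([1 / 6] * 6)    # fair die: log2(6), about 2.58 bits
biased = entropy_bits([0.9, 0.1])  # a biased coin carries less than 1 bit
print(coin, die, biased)
```

The uniform distribution has the highest entropy among all distributions over the same outcomes, which is why a maximum entropy model stays as uniform as the training data allow.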
22 Contextual and binary features
- The weights for contextual features are determined indirectly with binary features. These relate classes and contextual features.
- simple binary features: example, instance
- complex binary features: example, instance
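The example instances are not preserved in this transcript. As an illustration only, the sketch below assumes the usual indicator form of binary features in maximum entropy models: a simple feature fires when a given class co-occurs with one context feature, a complex feature when a class co-occurs with a combination of context features. The factory functions and the feature strings are hypothetical.

```python
def make_simple_feature(cls, context_feature):
    """f(a, b) = 1 if a is cls and context_feature is in b, else 0."""
    def f(a, b):
        return 1 if a == cls and context_feature in b else 0
    return f


def make_complex_feature(cls, context_features):
    """f(a, b) = 1 if a is cls and all given context features are in b, else 0."""
    required = frozenset(context_features)

    def f(a, b):
        return 1 if a == cls and required <= set(b) else 0
    return f


f_simple = make_simple_feature("f1", "art=def")
f_complex = make_complex_feature("f1", ["art=def", "num=sg"])

context = {"art=def", "num=sg", "poss=rgen"}
print(f_simple("f1", context), f_simple("so", context), f_complex("f1", context))
```

Each binary feature later receives its own weight, so the evidence of a single context feature for a single class can be inspected in isolation.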
23 Maximum entropy framework
cf. Ratnaparkhi 1998
$$p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}, \qquad Z(b) = \sum_{a} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$$
where $\alpha_j > 0$ is a weight for feature $f_j$, $k$ is the total number of binary features, and $Z(b)$ is a normalization constant that ensures $\sum_a p(a \mid b) = 1$, i.e. 100%.
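How the weights and the normalization constant combine can be shown with a short sketch. It assumes the product form of the model, in which each active binary feature $f_j$ contributes its weight as a factor and the scores are divided by their sum over all classes. The classes, features and weight values are made up for illustration.

```python
def maxent_probabilities(b, classes, features, alphas):
    """Return p(a | b) for every class a, given binary features and their weights."""
    scores = {}
    for a in classes:
        score = 1.0
        for f, alpha in zip(features, alphas):
            score *= alpha ** f(a, b)  # a factor of alpha if f is active, else 1
        scores[a] = score
    z = sum(scores.values())  # normalization constant Z(b)
    return {a: s / z for a, s in scores.items()}


# Hypothetical classes, features and weights.
classes = ["f1", "so"]
features = [
    lambda a, b: 1 if a == "f1" and "art=def" in b else 0,
    lambda a, b: 1 if a == "so" and "art=indef" in b else 0,
]
alphas = [3.0, 2.0]

probs = maxent_probabilities({"art=def", "num=sg"}, classes, features, alphas)
print(probs)  # {'f1': 0.75, 'so': 0.25}
```

A weight above 1 raises the probability of the class its feature is tied to, and a weight below 1 lowers it; features that are inactive in a context leave the score unchanged.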
24 Generalized Iterative Scaling
Unfortunately, there is no analytical method to determine the weights $\alpha_j$. There are some iterative approximation algorithms that determine the $\alpha_j$, converge to a correct $p(a \mid b)$, and respect the principle of maximum entropy. We use Generalized Iterative Scaling (GIS).
initialization
$$\alpha_j^{(0)} = 1$$
iteration
$$\alpha_j^{(n+1)} = \alpha_j^{(n)} \left[ \frac{\tilde{E} f_j}{E^{(n)} f_j} \right]^{1/C}$$
$\tilde{E} f_j$ is the expectation value for feature $f_j$ in the training corpus, and $E^{(n)} f_j$ is the expectation value for feature $f_j$ in the previous iteration. The constant $C$ is the total number of active binary features over all contexts.
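The iteration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: every weight starts at 1 and is multiplied in each round by the ratio of observed to expected feature value, raised to the power 1/C; C is taken here as the largest number of features active for any (class, context) pair; the correction feature of the full algorithm is omitted. Classes, features and the toy sample are hypothetical.

```python
def model(b, classes, features, alphas):
    """p(a | b) for every class a under the product-form model."""
    scores = {}
    for a in classes:
        score = 1.0
        for f, alpha in zip(features, alphas):
            if f(a, b):
                score *= alpha
        scores[a] = score
    z = sum(scores.values())  # normalization constant Z(b)
    return {a: s / z for a, s in scores.items()}


def gis(sample, classes, features, iterations=100):
    """Estimate feature weights with Generalized Iterative Scaling."""
    n = len(sample)
    # Assumption: C = maximum number of active features per (class, context) pair.
    C = max(sum(f(a, b) for f in features) for _, b in sample for a in classes)
    # Observed expectation of each feature in the training sample.
    observed = [sum(f(a, b) for a, b in sample) / n for f in features]
    alphas = [1.0] * len(features)  # initialization
    for _ in range(iterations):
        # Expectation of each feature under the current model.
        expected = [0.0] * len(features)
        for _, b in sample:
            probs = model(b, classes, features, alphas)
            for j, f in enumerate(features):
                expected[j] += sum(probs[a] * f(a, b) for a in classes) / n
        alphas = [
            alpha * (obs / exp) ** (1.0 / C) if obs > 0 and exp > 0 else alpha
            for alpha, obs, exp in zip(alphas, observed, expected)
        ]
    return alphas


CLASSES = ["f", "s"]
FEATURES = [
    lambda a, b: 1 if a == "f" and "art=def" in b else 0,
    lambda a, b: 1 if a == "s" and "art=indef" in b else 0,
]
SAMPLE = (
    [("f", {"art=def"})] * 3 + [("s", {"art=def"})]
    + [("s", {"art=indef"})] * 3 + [("f", {"art=indef"})]
)

alphas = gis(SAMPLE, CLASSES, FEATURES)
probs = model({"art=def"}, CLASSES, FEATURES, alphas)
print(alphas, probs)
```

In the toy sample, class f occurs in three of the four definite contexts, so the weights converge towards 3 and the model reproduces $p(f \mid \text{art=def}) = 0.75$.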
25 Conclusion
4. Conclusion
- The investigations so far support the assumption that the referential properties of the concept types match their grammatical uses.
- The maximum entropy framework allows a fine-grained analysis of the evidence contributed by a single context feature to the classification.
- The selection of relevant features is essential for the success of the automatic classification. Our research objective consists to a large extent in the examination of these features.
- We are starting experiments with complex features to model the combined evidence of context features.