Conceptual noun types: grammar and automatic classification

Christian Horn & Christof Rumpf. CTF 07, Düsseldorf. Institute for ... rose, car, horse, house, table, noun. inherently unique. Löbner (1979, 1985, 1998) ...


1
Conceptual noun types: grammar and automatic
classification
  • Christian Horn & Christof Rumpf
  • CTF 07, Düsseldorf

2
Structure
  • The four conceptual noun types and their
    contextual properties
  • Investigation of grammatical properties of the
    conceptual noun types on the basis of a German
    text corpus
  • A framework for the automatic classification of
    concept types
  • Conclusion

3
Conceptual noun types
1. The four conceptual noun types and their
contextual properties
Löbner (1979, 1985, 1998)
4
Grammatical uses of conceptual noun types
  • Sortal concepts
      • A rose is a nice present.
      • Many roses are an even nicer present.
  • Individual concepts
      • The sun is burning.
      • A sun is burning.
      • The suns are burning.
      • Many suns are burning.
      • My sun is burning. / The sun of mine is
        burning.
  • use differing from underlying concept type

5
Grammatical uses of conceptual noun types
  • Relational concepts
      • One of Mary's legs is too short.
      • Mary's leg is too short. / The leg of Mary is
        too short.
      • Many legs of Mary are too short.
  • Functional concepts
      • Mary is Peter's mother. / Mary is the mother
        of Peter.
      • Mary is a mother of Peter.
      • Mary is the mother.

6
Contextual properties of conceptual noun types
  • grammatical characteristics
      • possessive use: his mother / mother of him
      • definiteness: the sun
      • subcategorization: certain verbs require IC/FC
        as complements
  • morphological properties: certain nouns are often
    functional
      • deadjectival nouns (Intelligenz 'intelligence')
      • deverbal nouns (Krümmung 'bend', Dauer 'length')
      • compounds
          -wert 'value': Bestwert 'optimum value'
          -grad 'degree': Wirkungsgrad 'degree of efficiency'
          -größe 'size': Kleidergröße 'dress size'

7
2. Investigation of grammatical properties of the
conceptual noun types on the basis of a German
text corpus
2. Investigation of grammatical properties
  • Goals
      • to identify the possible uses of the different
        concept types and their specific context features
      • to develop and implement a method for the
        automatic classification of concept types in
        texts based on morphosyntactic features
  • Hybrid approach
      • semantic and grammatical analysis of the
        conceptual noun types
      • statistical investigation: automatic
        classification allows the processing of large
        amounts of data
  • The investigation is initially carried out on a
    German text corpus (108,000 words) that serves as
    the training corpus.
  • Outlook: further research is planned on English,
    French, and Japanese.

8
Predictions
  • Assumptions
      • For each noun, the lexicalized concept type is
        the type in which the noun is most frequently
        used.
      • Conceptual noun types occur particularly often
        in grammatical uses that match their underlying
        conceptual properties:
          • sortal concepts (rose): singular, plural,
            with quantifiers, indefinite ...
          • individual concepts (sun): singular, definite
          • relational concepts (leg): indefinite,
            possessive
          • functional concepts (mother): singular,
            definite, possessive
  • Other uses (type shifts) are still possible.
    The conditions under which these type shifts
    occur still have to be investigated.

9
Counting
  • (selection, definiteness)

1 definite: def. determiner, poss. pron., gen. pron., d-Prep, d-selb, d-einzig, genitive (deren/dessen), d-jen
2 quantifiers/indefinite: quantifiers, indefinite determiner, demonstratives, numbers, kein, d-beid, d-ord
3 null determiner
4 incl. -1

10
Results
(selection)
Results so far confirm our predictions.

11
Tasks & Challenges
  • Type shifts in certain readings
      • The meaning of the word. (FC)
      • The word "bottle" has many meanings. (RC)
  • Generic and anaphoric uses
      • The lightbulb was invented by Heinrich Göbel.
        (generic)
  • Polysemy
  • Analysis of possessive constructions, plurals,
    null determiner

12
3. A framework for the automatic classification
of concept types
  • Architecture
  • Training corpus
  • Morphosyntactic analysis
  • Training sample
  • Computing classifiers
  • Maximum entropy models
  • Conclusion

13
Architecture of the framework
[Diagram: two parallel pipelines built from the same components.
Learning: training corpus (manual annotation of concept types) → morphosyntactic analysis (msyn dependency grammar parser) → extraction of relevant context features → training sample → Generalized Iterative Scaling → maximum entropy model.
Application: test corpus → morphosyntactic analysis → extraction of relevant context features → test sample → application of the classifier (maximum entropy model) → annotated test corpus.]

14
Training corpus
  • Manually annotated version of Löbner (2003)
    Semantik
  • Concept types of nouns marked with tags

Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2>
der <f2>Linguistik</f2>, das sich mit
<r2>Bedeutung</r2> befasst. Diese <r2>Art</r2>
von <f2>Definition</f2> mag vielleicht ihrem
<r2>Freund</r2> genügen, der Sie zufällig mit
diesem <so>Buch</so> in der <r2>Hand</r2> sieht
und Sie fragt, was denn nun schon wieder sei,
aber als <f2>Autor</f2> einer solchen
<r2>Einführung</r2> muss ich natürlich präziser
erklären, was der <f2>Gegenstand</f2> dieser
<so>Wissenschaft</so> ist.
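
As a rough illustration of how such an annotated corpus could be read, the following Python sketch pulls (concept type, noun) pairs out of tagged text. It assumes that every annotated noun is wrapped as `<tag>noun</tag>` with tag names like those on the slide (f1, f2, r2, so); the function name `extract_annotations` is hypothetical and is not part of the authors' tooling.

```python
import re

# Assumption: tags consist of lowercase letters plus an optional digit,
# e.g. f1, f2, r2, so, and the closing tag repeats the opening tag name.
TAGGED_NOUN = re.compile(r"<(?P<tag>[a-z]+\d?)>\s*(?P<noun>[^<]+?)\s*</(?P=tag)>")

def extract_annotations(text):
    """Return (concept_type_tag, noun) pairs in order of appearance."""
    return [(m.group("tag"), m.group("noun")) for m in TAGGED_NOUN.finditer(text)]

sample = ("Die <f1>Semantik</f1> ist das <r2>Teilgebiet</r2> der "
          "<f2>Linguistik</f2>, das sich mit <r2>Bedeutung</r2> befasst.")
pairs = extract_annotations(sample)
print(pairs)
```

The pairs only give the gold-standard labels; the context features still have to come from the morphosyntactic analysis.
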
15
Morphosyntactical analysis
  • We use Connexor's msyn to analyse German texts.
    www.connexor.com
  • Syntactic information consists of dependency
    trees.
  • Morphological features include part of speech,
    gender, number, case, tense, mood, and some more.
  • We do some postprocessing ourselves, e.g. to
    add definiteness markers.

16
Dependency tree
Die Semantik ist das Teilgebiet der Linguistik, ...
(word-by-word gloss: The semantics is that branch of linguistics)

main - ist
  subj - Semantik
    det - Die [Def]
  comp - Teilgebiet
    det - das [Def]
    mod - Linguistik [Gen] (possessor)
      det - der [Def]

17
Output of Connexor's msyn
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE analysis SYSTEM "http://www.connexor.com/dtds/4.0/fdg3.dtd">
<analysis><sentence id="w1">
<token id="w2"> <text>Die</text> <lemma>die</lemma> <depend head="w3">det</depend> <tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
<token id="w3"> <text>Semantik</text> <lemma>semantik</lemma> <depend head="w4">subj</depend> <tags><syntax>NH</syntax> <morpho>N FEM SG NOM</morpho></tags></token>
<token id="w4"> <text>ist</text> <lemma>sein</lemma> <depend head="w1">main</depend> <tags><syntax>MAIN</syntax> <morpho>V IND PRES SG P3</morpho></tags></token>
<token id="w5"> <text>das</text> <lemma>das</lemma> <depend head="w6">det</depend> <tags><syntax>PREMOD</syntax> <morpho>DET Def NEU SG NOM</morpho></tags></token>
<token id="w6"> <text>Teilgebiet</text> <lemma>teilgebiet</lemma> <depend head="w4">comp</depend> <tags><syntax>NH</syntax> <morpho>N NEU SG NOM</morpho></tags></token>
<token id="w7"> <text>der</text> <lemma>die</lemma> <depend head="w8">det</depend> <tags><syntax>PREMOD</syntax> <morpho>DET Def FEM SG GEN</morpho></tags></token>
<token id="w8"> <text>Linguistik</text> <lemma>linguistik</lemma> <depend head="w6">mod</depend> <tags><syntax>NH</syntax> <morpho>N FEM SG GEN</morpho></tags></token>
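
Parser output in this format can be loaded with a standard XML library. The Python sketch below uses `xml.etree.ElementTree` on a shortened excerpt; it assumes that the real output closes the `<sentence>` and `<analysis>` elements, which the slide cuts off, and the function name `read_tokens` and the dictionary layout are illustrative choices only.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt in the format shown on the slide. Assumption: the closing
# </sentence></analysis> tags are added by hand because the slide is truncated.
MSYN_XML = """<analysis><sentence id="w1">
<token id="w2"><text>Die</text><lemma>die</lemma><depend head="w3">det</depend>
<tags><syntax>PREMOD</syntax><morpho>DET Def FEM SG NOM</morpho></tags></token>
<token id="w3"><text>Semantik</text><lemma>semantik</lemma><depend head="w4">subj</depend>
<tags><syntax>NH</syntax><morpho>N FEM SG NOM</morpho></tags></token>
</sentence></analysis>"""

def read_tokens(xml_text):
    """Map token id -> dict with text, lemma, head id, relation and morpho tags."""
    tokens = {}
    for tok in ET.fromstring(xml_text).iter("token"):
        dep = tok.find("depend")
        tokens[tok.get("id")] = {
            "text": tok.findtext("text"),
            "lemma": tok.findtext("lemma"),
            "head": dep.get("head"),   # id of the governing token
            "rel": dep.text,           # dependency relation, e.g. det, subj
            "morpho": tok.findtext("tags/morpho").split(),
        }
    return tokens

tokens = read_tokens(MSYN_XML)
print(tokens["w3"]["rel"], tokens["w3"]["morpho"])
```

Keeping the head id and relation per token is enough to walk the dependency tree upwards or to collect the dependents of a noun.
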
18
Training sample
Relevant contextual features are extracted with
regular expressions mapped onto dependency trees,
implemented in the programming language Perl. The
result is a list of pairs (concept type, list of
context features):
(f1, tnr=2, tok=semantik, suff=ik, num=sg, art=def)
(r2, tnr=5, tok=teilgebiet, num=sg, art=def, poss=rgen)
(f1, tnr=7, tok=linguistik, suff=ik, num=sg, art=def)
(f2, tnr=12, tok=bedeutung, suff=ung, num=sg, art=none)
(r2, tnr=16, tok=art, num=sg, art=indef, poss=von)
(f2, tnr=18, tok=definition, num=sg, art=none)
(r2, tnr=22, tok=freund, num=sg, art=def)
(so, tnr=30, tok=buch, num=sg, art=indef)
(r2, tnr=33, tok=hand, num=sg, art=def)
(f2, tnr=49, tok=autor, num=sg, art=none)
(r2, tnr=52, tok=einführung, suff=ung, num=sg, art=indef)
(f2, tnr=61, tok=gegenstand, num=sg, art=def)
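
The slide names Perl and regular expressions for this step. Purely as an illustration in Python, the sketch below derives a few of the features seen above (token, suffix, number, article) for one noun. The token table layout, the suffix list, and the function name `context_features` are assumptions of this sketch, not the authors' implementation.

```python
# Hypothetical suffix list; the slide only shows suff=ik and suff=ung.
SUFFIXES = ("ung", "ik")

def context_features(noun_id, tokens):
    """Build a feature list for one noun from a token table.

    `tokens` maps token id -> {"lemma", "head", "rel", "morpho"}; this layout
    is an assumption of the sketch.
    """
    noun = tokens[noun_id]
    feats = ["tok=" + noun["lemma"]]
    for suffix in SUFFIXES:
        if noun["lemma"].endswith(suffix):
            feats.append("suff=" + suffix)
            break
    # Assumption: anything not marked SG counts as plural.
    feats.append("num=sg" if "SG" in noun["morpho"] else "num=pl")
    # Article: look for a determiner token that depends on the noun.
    art = "none"
    for tok in tokens.values():
        if tok["head"] == noun_id and tok["rel"] == "det":
            art = "def" if "Def" in tok["morpho"] else "indef"
    feats.append("art=" + art)
    return feats

tokens = {
    "w2": {"lemma": "die", "head": "w3", "rel": "det",
           "morpho": ["DET", "Def", "FEM", "SG", "NOM"]},
    "w3": {"lemma": "semantik", "head": "w4", "rel": "subj",
           "morpho": ["N", "FEM", "SG", "NOM"]},
}
features = context_features("w3", tokens)
print(features)
```

Possessive features such as poss=rgen would need an additional look at genitive modifiers of the noun, which this sketch leaves out.
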
19
Automatic classification
  • given
      • training sample $t = (a_1, b_1), \dots, (a_n, b_n)$
      • classes $a_i \in \{f1, f2, r1, r2\}$
      • contexts $b_i = \{m_1, \dots, m_m\}$
      • features $m_i \in \{$art=def, art=indef, poss=lgen, ...$\}$
  • searched
      • classifier $p(a \mid b)$:
        How probable is class a given context b?
      • maximal argument $a^* = \arg\max_a p(a \mid b)$:
        Which is the most probable class a given
        context b?

20
Computing a (bad) classifier
  • Simplest approach
      • Counting co-occurrences of classes and contexts
  • Shortcomings
      • Only the contexts in t are learned.
      • Varying degrees of evidence of single features
        are disregarded.
  • Way out
      • Computation of the classifier with a maximum
        entropy model.
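
To make the first shortcoming concrete, here is a minimal Python sketch of a pure counting classifier. The toy sample and the function names are invented for illustration and are not taken from the authors' corpus.

```python
from collections import Counter, defaultdict

def train_counts(sample):
    """Count how often each class occurs with each exact context."""
    counts = defaultdict(Counter)
    for cls, context in sample:
        counts[frozenset(context)][cls] += 1
    return counts

def classify(counts, context):
    """Return the most frequent class for an exact context, or None if unseen."""
    seen = counts.get(frozenset(context))
    return seen.most_common(1)[0][0] if seen else None

# Toy data (made up): three definite singular nouns, one indefinite one.
sample = [
    ("f1", ["num=sg", "art=def"]),
    ("f1", ["num=sg", "art=def"]),
    ("r2", ["num=sg", "art=def"]),
    ("so", ["num=sg", "art=indef"]),
]
counts = train_counts(sample)
known = classify(counts, ["num=sg", "art=def"])
unknown = classify(counts, ["num=pl", "art=def"])
print(known, unknown)
```

The second call fails to classify a context that shares `art=def` with the training data, because the context is treated as one indivisible unit rather than as a set of features that each carry evidence.
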

21
Maximum entropy models
  • Basics
      • Entropy: the average number of bits required to
        encode events of a particular type (tossing a
        coin: 1 bit; rolling a die: about 2 ½ bits).
      • Principle of maximum entropy: choose a model
        with maximum entropy, i.e. don't go beyond the
        data.
  • Specific features
      • Decomposition of contexts into single context
        features or their combinations.
      • Possibility to combine features from
        heterogeneous sources (e.g. syntax, semantics,
        morphology, ...).
      • Computation of the weights (evidence) of single
        features or their combinations for every class
        over all contexts.
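
The two entropy figures for the coin and the die can be checked with a few lines of Python. The sketch assumes a fair coin and a fair six-sided die; the function name `entropy_bits` is chosen for illustration.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

coin = entropy_bits([0.5, 0.5])      # fair coin
die = entropy_bits([1 / 6] * 6)      # fair six-sided die, equals log2(6)
biased = entropy_bits([0.9, 0.1])    # a biased coin is more predictable
print(coin, die, biased)
```

A uniform distribution has the highest entropy for a given number of outcomes, which is why the maximum entropy principle prefers the most uniform model that is still consistent with the observed data.
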

22
Contextual and binary features
  • The weights for contextual features are
    determined indirectly with binary features. These
    relate classes and contextual features.
  • simple binary features example instance
  • complex binary features example instance
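
One common way to write such binary features, following the general scheme in Ratnaparkhi's work, is as functions f(a, b) that return 1 when a particular class co-occurs with particular contextual features and 0 otherwise. The Python sketch below assumes that a "simple" feature tests one contextual feature and a "complex" feature tests a combination; the concrete class and feature pairings are invented examples, since the slide's own examples did not survive in this transcript.

```python
def make_binary_feature(cls, required):
    """Return f(a, b) = 1 if a == cls and all `required` features occur in b."""
    required = frozenset(required)

    def f(a, b):
        return 1 if a == cls and required <= set(b) else 0

    return f

# Simple binary feature: one class paired with one contextual feature.
simple = make_binary_feature("f1", ["art=def"])
# Complex binary feature: one class paired with a combination of features.
complex_ = make_binary_feature("r2", ["art=def", "poss=rgen"])

context = ["num=sg", "art=def", "poss=rgen"]
print(simple("f1", context), simple("r2", context), complex_("r2", context))
```

Each binary feature later receives its own weight, so the evidence that a contextual feature contributes is estimated separately for every class.
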

23
Maximum entropy framework
cf. Ratnaparkhi 1998
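\[ p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}, \qquad Z(b) = \sum_{a} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)} \]
where $\alpha_j > 0$ is a weight for feature $f_j$, $k$ is
the total number of binary features, and $Z(b)$ is
a normalization constant that ensures
$\sum_a p(a \mid b) = 1$ (i.e. 100%).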
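
A small Python sketch of this product form may help: each class gets a score that is the product of the weights of its active binary features, and the scores are normalized by Z(b). The two features and the weights 3.0 and 2.0 are made-up illustrative values, not estimates from the corpus.

```python
def maxent_probabilities(context, classes, features, alphas):
    """p(a|b) = (1 / Z(b)) * prod_j alpha_j ** f_j(a, b)."""
    scores = {}
    for a in classes:
        score = 1.0
        for f, alpha in zip(features, alphas):
            score *= alpha ** f(a, context)   # inactive features contribute 1
        scores[a] = score
    z = sum(scores.values())                  # normalization constant Z(b)
    return {a: s / z for a, s in scores.items()}

classes = ["f1", "r2"]
features = [
    lambda a, b: int(a == "f1" and "art=def" in b),
    lambda a, b: int(a == "r2" and "poss=rgen" in b),
]
alphas = [3.0, 2.0]  # hypothetical weights

p = maxent_probabilities(["art=def"], classes, features, alphas)
print(p)
```

For the context `["art=def"]` only the first feature fires, so the scores are 3 for f1 and 1 for r2, giving probabilities 0.75 and 0.25.
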
24
Generalized Iterative Scaling
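Unfortunately, there is no analytical method to
determine the weights $\alpha_j$. There are iterative
approximation algorithms that determine the
$\alpha_j$, converge to a correct $p(a \mid b)$, and
respect the principle of maximum entropy. We use
Generalized Iterative Scaling (GIS):
initialization
\[ \alpha_j^{(0)} = 1 \]
iteration
\[ \alpha_j^{(n+1)} = \alpha_j^{(n)} \left( \frac{E_{\tilde p} f_j}{E_{p^{(n)}} f_j} \right)^{1/C} \]
where $E_{\tilde p} f_j$ is the expectation value for
feature $f_j$ in the training corpus and
$E_{p^{(n)}} f_j$ is the expectation value for feature
$f_j$ in the previous iteration. The constant $C$ is
the number of binary features active for a
class-context pair; GIS requires this number to be
the same for every pair, so in practice the maximum
over all contexts is used.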
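
The following Python sketch shows one way GIS can be implemented for a tiny sample. It is a toy illustration under stated assumptions, not the authors' implementation: it adds the usual correction feature so that the active features sum to C for every class-context pair, it assumes every feature (including the correction feature) occurs at least once in the sample, and the data and feature definitions are invented.

```python
def gis(sample, classes, features, iterations=500):
    """Estimate weights alpha_j with Generalized Iterative Scaling.

    sample: list of (class, context); features: list of f(a, b) -> 0 or 1.
    Returns the weights and a function probs(b) giving p(a|b).
    """
    contexts = [b for _, b in sample]
    # C: largest number of features active for any (class, context) pair.
    C = max(sum(f(a, b) for f in features) for a in classes for b in contexts)

    def correction(a, b):
        # Pads every pair so that all feature values sum to exactly C.
        return C - sum(f(a, b) for f in features)

    fs = list(features) + [correction]
    n = len(sample)
    # Expectation of each feature in the training sample.
    observed = [sum(f(a, b) for a, b in sample) / n for f in fs]
    alphas = [1.0] * len(fs)  # initialization

    def probs(b):
        scores = {}
        for a in classes:
            score = 1.0
            for f, alpha in zip(fs, alphas):
                score *= alpha ** f(a, b)
            scores[a] = score
        z = sum(scores.values())
        return {a: s / z for a, s in scores.items()}

    for _ in range(iterations):
        # Expectation of each feature under the current model.
        expected = [0.0] * len(fs)
        for b in contexts:
            p = probs(b)
            for j, f in enumerate(fs):
                expected[j] += sum(p[a] * f(a, b) for a in classes) / n
        alphas = [alpha * (obs / exp) ** (1.0 / C)
                  for alpha, obs, exp in zip(alphas, observed, expected)]
    return alphas, probs

DEF, INDEF = ["art=def", "num=sg"], ["art=indef", "num=sg"]
sample = [("f1", DEF)] * 3 + [("so", DEF)] + [("so", INDEF)] * 3 + [("f1", INDEF)]
features = [
    lambda a, b: int(a == "f1" and "art=def" in b),
    lambda a, b: int(a == "so" and "art=indef" in b),
    lambda a, b: int(a == "f1" and "num=sg" in b),
]
alphas, probs = gis(sample, ["f1", "so"], features)
p_def = probs(DEF)
print(p_def)
```

In this toy sample three of the four definite contexts are labelled f1, and the fitted model's p(f1 | definite context) approaches that relative frequency of 0.75 as the iterations proceed.
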
25
Conclusion
4. Conclusion
  • The investigations so far support the assumption
    that the referential properties of the concept
    types match their grammatical uses.
  • The maximum entropy framework allows a
    fine-grained analysis of the evidence that a
    single context feature contributes to the
    classification.
  • The selection of relevant features is essential
    for the success of the automatic classification.
    A large part of our research therefore consists
    in examining these features.
  • We are starting experiments with complex features
    to model the combined evidence of context features.