1
Relations between multiple annotations
Representation, Inferences, Context
Specification, and Unification
  • Andreas Witt
  • Dieter Metzing
  • Jens Pönninghaus
  • Daniela Goecke

www.text-technology.de
2
Contents
  • Project description
  • Approaches to Multiple Annotations
  • multiple Levels
  • multiple Layers
  • Representation
  • Inferences
  • Context Specification
  • Unification

3
Project description
  • The project "Secondary information structuring and
    comparative discourse analysis" (Sekimo) is part
    of the DFG-Forschergruppe 437 "Text-technological
    modelling of information"
  • Within this project a corpus is annotated on
    different (linguistic) levels
  • Aim of the project: inferring, describing, and
    modelling relations between these levels

5
Standard Methodology
  • A corpus is annotated according to a given tag
    set
  • The tag set is defined in a document grammar
    (e.g. the TEI-DTD)
  • In general, different tag sets exist for
    annotating different kinds of documents (e.g.
    poems, encyclopedia) or different kinds of
    information (e.g. linguistic information)
  • In particular, a linguistic annotation can depend
    on
      • theoretical assumptions:
        constituent structure,
        functional structure, or
        a (more) specific theory
      • the language
      • research questions

6
Problems of the standard methodology
  • Levels of description are neglected, or
  • different levels of annotation are mixed up

Difficulties
  • Multiple hierarchies within one document

7
General Solutions (cf. TEI Guidelines)
  • CONCUR: an optional feature of SGML (not
    available in XML) which allows multiple
    hierarchies to be marked up concurrently in the
    same document
  • milestone elements: empty elements which mark the
    boundaries between elements in a non-nesting
    structure
  • fragmentation of an item: the division of what
    logically is a single element into two or more
    parts, each of which nests properly within its
    context
  • virtual joins: the recreation of a virtual
    element from fragments of text (requires a
    separate interpretation)
  • redundant encoding of information in multiple
    forms

8
Multiple hierarchies and language data
  • Hypertext linking techniques are used for
    connecting multiple layers of annotation, e.g.
      • Within the EU project NITE an annotation format
        has been developed which allows for specifying
        links between separate annotation layers
      • The annotation graphs (AGs) format uses a
        (possibly abstract) timeline as linking layer
  • Modified versions of the AGs are applied by
      • the TASX-Annotator
      • the EXMARaLDA project

9
Alternative Methodology
  • XML-based multi-layer annotation
  • Technically, each layer becomes a separate and
    independent XML-document
  • The same text is annotated several times
  • Advantages:
      • seems to be the only way to annotate multiple
        hierarchies without workarounds
      • each document instance uses its own DTD (or
        schema), i.e. annotation formats are not mixed up
      • at any time a new annotation can be produced
      • transformation tools to the NITE and the
        TASX format exist (master's thesis by Jan F. Maas)

11
Layer vs. Level
  • We distinguish annotation level vs. annotation
    layer
  • Annotation level refers to an abstract level of
    analysis
  • Annotation layer refers to the realisation of an
    annotation in e.g. XML
  • Examples of annotation levels: morphology in a
    linguistic grammar, text structure (sections,
    paragraphs, ...), layout (lines and pages),
    thematic structure, rhetorical structure
  • Sometimes one layer contains several levels (e.g.
    HTML), but a level can also be distributed over
    several layers

12
Annotation Process
  • Given:
      • the textual representation of language material
        (text)
      • the text is regarded as primary data
  • For each annotation layer the primary data is
    copied
  • The (copy of the) primary text is annotated
    according to a schema (e.g. a DTD)
  • Annotation can be prepared
      • in any XML editor (e.g. XMetaL, XML-Spy,
        psgml-emacs)
      • in a special-purpose annotation tool

13
Sample annotation with a web-based, special-purpose
annotation tool. This tool is used only
for flat XML structures, i.e. XML annotations
with non-nested elements
14
Example: XML annotation with the Emacs
editor (usable for deep and flat annotations)
15
Multi-layer annotation tool (master's thesis by
Stefan Michel; work in progress)
16
Multiple Annotations
  • Drawbacks:
      • redundant
      • the separate documents are independent (i.e. not
        connected)
  • But:
      • since the documents contain exactly the same
        text, the text can function as the link
  • Solution:
      • a common representation format for all separate
        XML documents

18
Prolog-Representation
  • The Prolog representation is based on work by
    Renear, Huitfeldt, Dubin and Sperberg-McQueen
  • Original representation for an XML element: node/2,
    i.e. the predicate node has two arguments
      • the position in the document tree
      • a value, e.g. element(corpus)
  • Extension: node/2 is replaced by node/5
  • The 3 new arguments:
      • annotation layer
      • starting point of the annotated text
      • end point of the annotated text

19
Conversion from XML to Prolog (xml2prolog)
  • Implemented in Python
  • Input: 1 or more XML documents
  • Result: collection of Prolog facts
  • Example:
      • the element <Root> is represented as the fact
        node(AnnotationLayer, 0, 42332, 1,
        element(Root)).
      • the attribute att="val" of the element <Root> is
        represented as the fact
        attr(AnnotationLayer, 0, 42332, 1, 'att',
        'val').
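
The conversion described above can be sketched in Python as follows. This is only an illustration, not the actual xml2prolog program: the function name xml_to_facts, the quoting of atoms, and the encoding of the tree position as a list of child indices are assumptions. The argument order (layer, start, end, position, value) follows the example facts on this slide.

```python
import xml.etree.ElementTree as ET


def xml_to_facts(xml_text, layer):
    """Return Prolog-style node/5 and attr/6 facts for one annotation layer.

    Start and end are character offsets into the primary data (the text
    content with all markup removed). The position is the path of child
    indices from the root (an assumed encoding). Atoms are quoted naively;
    embedded apostrophes are not escaped in this sketch.
    """
    facts = []

    def walk(elem, start, position):
        end = start + len("".join(elem.itertext()))
        pos = "[" + ",".join(map(str, position)) + "]"
        facts.append(
            f"node('{layer}', {start}, {end}, {pos}, element('{elem.tag}'))."
        )
        for name, value in elem.attrib.items():
            facts.append(
                f"attr('{layer}', {start}, {end}, {pos}, '{name}', '{value}')."
            )
        # Children start after the text that precedes them inside elem.
        offset = start + len(elem.text or "")
        for index, child in enumerate(elem, start=1):
            walk(child, offset, position + [index])
            offset += len("".join(child.itertext())) + len(child.tail or "")

    walk(ET.fromstring(xml_text), 0, [1])
    return facts


if __name__ == "__main__":
    for fact in xml_to_facts('<s><w pos="N">tree</w> <w>top</w></s>', "morph"):
        print(fact)
```

Because every layer annotates the same primary data, the offsets computed this way are comparable across layers.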

20
xml2prolog.py
  • Some options for the transformation process:
      • compare: the primary data of the XML files are
        compared; if the primary data is not identical,
        the first difference is shown
      • pcdata/pcdatanodes: character data can be
        included
      • aggressive: whitespace is added or removed
        anywhere in the document if whitespace is the
        reason for differences of the primary data
      • filter: some elements in some files should be
        filtered (including their textual content), e.g.
        <script> within HTML documents
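
As a rough illustration of the compare option, the following Python sketch extracts the primary data of two XML documents and reports the first position at which they differ. The function names are hypothetical. The whitespace normalisation shown here is only a crude stand-in for the aggressive option, whose exact behaviour is not described on the slide.

```python
import xml.etree.ElementTree as ET


def primary_data(xml_text):
    """Return the text content of an XML document with all markup removed."""
    return "".join(ET.fromstring(xml_text).itertext())


def first_difference(text_a, text_b):
    """Return the index of the first differing character, or None if equal."""
    for index, (char_a, char_b) in enumerate(zip(text_a, text_b)):
        if char_a != char_b:
            return index
    if len(text_a) != len(text_b):
        # One text is a proper prefix of the other.
        return min(len(text_a), len(text_b))
    return None


def normalise_whitespace(text):
    """Collapse every whitespace run to a single space (assumed behaviour)."""
    return " ".join(text.split())


if __name__ == "__main__":
    layer_a = primary_data("<s><w>tree</w> <w>top</w></s>")
    layer_b = primary_data("<p><syll>tree</syll>  <syll>top</syll></p>")
    print(first_difference(layer_a, layer_b))
    print(first_difference(normalise_whitespace(layer_a),
                           normalise_whitespace(layer_b)))
```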

21
Example
22
Example (Collection of Prolog-Facts)
23
Example (Collection of Prolog-Facts)
Labelled parts of the facts: annotation layer; start and end point;
nodes in the DOM tree; element names; attribute-value pair; data contents
24
Contents
  • Project description
  • Approaches to Multiple Annotations
  • multiple Levels
  • multiple Layers
  • Representation
  • Inferences
  • Context Specification
  • Unification

25
Relations between annotation Layers
  • Relations are inferred automatically
  • Special Prolog predicates have been implemented
    to compare the annotation layers
  • Example (identity):
      • <w>tree</w>
      • <m>tree</m>
      • <syll>tree</syll>

26
Relations between Annotations
Cf. Durusau & Brook O'Donnell (2002) and Durand
(1999)
Possible configurations of the ranges of two elements <a> and <b>
(diagram; horizontal offsets not preserved):
1. <a>....................</a>
   <b>......</b>
2. <a>....................</a>
   <b>.........</b>
3. <a>....................</a>
   <b>.....................</b>
4. <a>....................</a>
   <b>................................</b>
5. <a>....................</a>
   <b>..........................................</b>
6. <a>....................</a>
   <b>......</b>
7. <a>....................</a>
   <b>....................</b>
8. <a>....................</a>
   <b>...............................</b>
etc.
27
Relations between annotation layers: Visualisation
[Diagram showing the range of element a and the range of element b
for each relation]
  • identity
  • independence
  • inclusion
  • start point identity
  • end point identity
  • end point is starting point
  • overlap
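
A minimal Python sketch of how these relations could be decided from the start and end offsets of two elements. The relation names follow the list above. The order of the tests, and therefore the classification of borderline cases such as a shared start point, is an assumption and not taken from the Prolog implementation.

```python
def relation(a, b):
    """Classify the relation between two spans given as (start, end) offsets.

    Both spans are assumed to be non-empty (start < end).
    """
    a_start, a_end = a
    b_start, b_end = b
    if a == b:
        return "identity"
    if a_end == b_start or b_end == a_start:
        return "end_point_is_starting_point"
    if a_end < b_start or b_end < a_start:
        return "independence"
    if a_start == b_start:
        return "start_point_identity"
    if a_end == b_end:
        return "end_point_identity"
    if (a_start < b_start and b_end < a_end) or (
        b_start < a_start and a_end < b_end
    ):
        return "inclusion"
    return "overlap"


if __name__ == "__main__":
    print(relation((0, 5), (3, 8)))
```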
28
Comparison of annotation layers
  • We distinguish two kinds of relations between
    elements:
      • relations between single instances of an
        element (relations)
      • relations between all occurrences of instances of
        an element (meta-relations)
  • Prolog programs have been developed to infer both
    kinds of relations

29
Prolog Implementation
  • Aims
  • statistics on annotation layers
  • relations between occurrences of elements
  • meta-relations

30
Example deep annotation (HPSG)
31
Statistics of the annotation according to HPSG
?- get_statistics.
Please enter layer name or type "q" to exit, "h" for help
hpsg.
Statistics for hpsg
Number of Nodes: 14, Number of different Elements: 5
Number of Attributes: 1, Number of different A/V-pairs: 4
------------------------------------------
Different elements and their occurrences
hpsg                1
nodesAndLabels      3
nonannotated-text   4
phrase              2
punctuation         4
------------------------------------------
Attribute   occurrences   different values
type        5             4
For information on occurrences of Attribute-Value-Pairs
enter Attribute name or type q to quit.
type.
( edgeCOMP,1 ) , ( edgeHD,2 ) , ( np,1 ) , ( np-no,1 )
32
Relations between occurrences of elements
  • Query: How often does a certain relation between
    elements hold?
  • chk_relation(Relation,Element1,Layer1,Element2,Layer2,L).
      • Relation: a relation between elements (e.g.
        identity, overlap, or endA_is_starting_pointB)
      • Element1: element name of annotation Layer1
      • Element2: element name of annotation Layer2
      • L: result list
  • It is also possible to infer examples and
    counter-examples of a certain relation
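
The kind of query answered by chk_relation can be imitated in Python as sketched below. This is not the Prolog predicate itself: the Node tuple, the function check_relation, the three relation tests, and the element names turn and utterance are all hypothetical and serve only to show the idea of collecting every pair of elements for which a relation holds.

```python
from collections import namedtuple

# One element occurrence: layer name, character offsets, element name.
Node = namedtuple("Node", "layer start end name")

# Hypothetical relation tests; the names loosely follow those on the slides.
RELATIONS = {
    "identity": lambda a, b: (a.start, a.end) == (b.start, b.end),
    "included_A_in_B": lambda a, b: (
        b.start <= a.start
        and a.end <= b.end
        and (a.start, a.end) != (b.start, b.end)
    ),
    "overlap": lambda a, b: (
        a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end
    ),
}


def check_relation(nodes, relation, element1, layer1, element2, layer2):
    """Return all pairs (a, b) of occurrences for which the relation holds."""
    test = RELATIONS[relation]
    firsts = [n for n in nodes if (n.layer, n.name) == (layer1, element1)]
    seconds = [n for n in nodes if (n.layer, n.name) == (layer2, element2)]
    return [(a, b) for a in firsts for b in seconds if test(a, b)]


if __name__ == "__main__":
    nodes = [
        Node("hpsg", 0, 9, "phrase"),
        Node("hpsg", 10, 20, "phrase"),
        Node("dialogue", 0, 9, "turn"),
        Node("dialogue", 0, 20, "utterance"),
    ]
    hits = check_relation(nodes, "identity", "phrase", "hpsg", "turn", "dialogue")
    print(len(hits), hits)
```

The length of the returned list answers the question of how often the relation holds; the pairs themselves serve as examples.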

33
Example: Relations between elements of the HPSG
annotation and the elements of a
dialogue annotation
34
Ex. Relations between HPSG-phrases and X
?- chk_relation(Relation,phrase,hpsg,X,dialogue,L).
Relation identity X _G160 L
Relation included_B_in_A X _G160 L
Relation included_A_in_B X _G160 L
phrase, dialogue, 2, phrase, 2, dialogue, 1 ...
Relation overlap_A X _G160 L
Yes
35
Meta-relations
  • If a certain relation holds for all instances of
    an element, we call it a meta-relation
  • identity: at every occurrence of an element A in
    Layer1 an element B in Layer2 exists which spans
    the same range of characters
  • inclusion:
      • at every occurrence of an element A in Layer1 an
        element B in Layer2 exists which includes A or
        is identical to it
      • the meta-relation identity does not hold
  • overlap: at every occurrence of an element A in
    Layer1 an element B in Layer2 exists which
    overlaps with A
  • mixed: no meta-relations exist

36
Meta-relations (cntd.)
  • identity - For all occurrences, the following
    configuration can be found
      <a>....................</a>
      <b>....................</b>
  • inclusion - For all occurrences, one of the
    following configurations can be found
      <a>....................</a>
      <b>................................</b>

      <a>....................</a>
      <b>..........................................</b>

      <a>....................</a>
      <b>.......................................</b>

      <a>....................</a>
      <b>....................</b>
  • overlap - For all occurrences, the following
    configuration can be found
      <a>....................</a>
      <b>....................</b>
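
The definitions of the meta-relations can be restated as a short Python sketch. Elements are reduced to (start, end) spans; the function name meta_relation and the decision to report "mixed" for an element without occurrences are assumptions made for this illustration.

```python
def identical(a, b):
    """True if both spans cover exactly the same range of characters."""
    return a == b


def included_or_identical(a, b):
    """True if span b includes span a or is identical to it."""
    return b[0] <= a[0] and a[1] <= b[1]


def overlaps(a, b):
    """True if the spans overlap without one including the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def meta_relation(spans_a, spans_b):
    """Return the meta-relation holding between all A spans and the B spans."""
    if not spans_a:
        return "mixed"  # assumption: nothing to generalise over

    def holds_for_all(test):
        return all(any(test(a, b) for b in spans_b) for a in spans_a)

    if holds_for_all(identical):
        return "identity"
    # Inclusion is only reported when identity has already been ruled out.
    if holds_for_all(included_or_identical):
        return "inclusion"
    if holds_for_all(overlaps):
        return "overlap"
    return "mixed"


if __name__ == "__main__":
    print(meta_relation([(0, 4), (5, 8)], [(0, 4), (5, 10)]))
```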

38
Context specification 1: Motivation
  • Often, general meta-relations do not hold
  • In these cases, the elements can be classified
    according to structural properties within their
    layer
  • This allows specific meta-relations to be
    constructed
  • A format for expressing the structural properties,
    called Context Specification Document (CSD), has
    been developed

39
Context specification 2: Realization
  • Subclassification of element nodes via tree
    walking automata (TWA)
  • Underlying path language for the construction of
    TWA: caterpillar expressions (cf.
    Brüggemann-Klein and Wood, 2000)
      • moves: up, right, left, firstChild, lastChild
      • tests: isFirst, isLast, isLeaf, isRoot
      • test for element names
      • Kleene-star operator
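
To make the idea of walking a tree with moves, tests and a Kleene star concrete, here is a small Python sketch. It is not the CSD format or the project's automaton: the Node class, the encoding of an expression as a Python list with ("star", [...]) tuples, and the sample tree are assumptions. Any atom that is not a move or a test is treated as an element-name test.

```python
class Node:
    """A tree node with an element name, a parent and ordered children."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def siblings(self):
        return self.parent.children if self.parent else [self]


def step(atom, node):
    """Apply one move or test; return the resulting node, or None on failure."""
    sibs = node.siblings()
    i = sibs.index(node)
    if atom == "up":
        return node.parent
    if atom == "left":
        return sibs[i - 1] if i > 0 else None
    if atom == "right":
        return sibs[i + 1] if i + 1 < len(sibs) else None
    if atom == "firstChild":
        return node.children[0] if node.children else None
    if atom == "lastChild":
        return node.children[-1] if node.children else None
    tests = {
        "isFirst": i == 0,
        "isLast": i == len(sibs) - 1,
        "isLeaf": not node.children,
        "isRoot": node.parent is None,
    }
    if atom in tests:
        return node if tests[atom] else None
    return node if node.name == atom else None  # element-name test


def walk(expression, start_nodes):
    """Return the set of nodes reachable from start_nodes via the expression."""
    current = set(start_nodes)
    for item in expression:
        if isinstance(item, tuple) and item[0] == "star":
            # Kleene star: repeat the sub-expression until nothing new appears.
            reached, frontier = set(current), set(current)
            while frontier:
                frontier = walk(item[1], frontier) - reached
                reached |= frontier
            current = reached
        else:
            current = {
                n for n in (step(item, node) for node in current) if n is not None
            }
    return current


if __name__ == "__main__":
    inner_comp = Node("COMP")
    np_no = Node("NP.NO", [inner_comp, Node("HD")])
    outer_comp = Node("COMP", [np_no])
    root = Node("NP", [outer_comp, Node("HD")])
    # Only COMP elements directly below the root match this expression.
    print(bool(walk(["COMP", "up", "isRoot"], [outer_comp])))
    print(bool(walk(["COMP", "up", "isRoot"], [inner_comp])))
```

An element belongs to the subclass described by an expression if the walk starting at that element ends in a non-empty set of nodes.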

40
Sample application
[Tree diagram. Node labels: NP, COMP, HD, NF, NP.NO, COMP, HD, VN, PGen;
words: k e N, shucchou, no]
Context specification 3: Subclassification
[Tree diagram. Node labels: NP, COMP, HD, NF, NP.NO, COMP, HD, VN, PGen;
words: k e N, shucchou, no. Annotations in the diagram: "Relation holds
for all Comp Elements"; "Relation holds only for a subset"]
43
Unification of annotation layers I
  • Two document layers can be merged
  • This process has also been implemented in Prolog
  • The predicate (semt) receives four arguments:
      • layer1 (to be unified)
      • layer2 (to be unified)
      • list of elements which should be deleted in the
        process of unification
  • The result of the merger (again a collection of
    Prolog facts) is written to a new file specified
    in the fourth argument
  • The new database contains a copy of all layers in
    the input database plus the result layer
  • If the unification results in a layer where
    the elements would not be properly nested, a
    second result layer (a difference list) is
    created.
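
The merging step can be illustrated by the following Python sketch, which is not the semt predicate. It assumes that each layer is a list of (start, end, name) spans, that layer1 is properly nested in itself, and that every span of layer2 which would cross an element already in the result is moved to the difference list. The function names and the element names in the usage example are hypothetical.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def unify(layer1, layer2, delete=()):
    """Merge two layers of (start, end, name) spans.

    Returns (result, difference): result is a properly nested list of spans,
    difference holds the spans of layer2 that would break the nesting.
    Elements whose name is listed in delete are dropped from both layers.
    """
    result = [span for span in layer1 if span[2] not in delete]
    difference = []
    for span in layer2:
        if span[2] in delete:
            continue
        if any(crosses(span, other) for other in result):
            difference.append(span)
        else:
            result.append(span)
    # Document order: earlier start first, outer element before inner one.
    result.sort(key=lambda s: (s[0], -s[1]))
    return result, difference


if __name__ == "__main__":
    syntax = [(0, 20, "s"), (0, 9, "phrase"), (10, 20, "phrase")]
    dialogue = [(0, 20, "turn"), (5, 14, "pause"), (0, 4, "b")]
    merged, leftover = unify(syntax, dialogue, delete=["b"])
    print(merged)
    print(leftover)
```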

44
Unification of annotation layers II
  • The result database is re-converted to XML using
    a Python program
  • If no difference list exists, the result of the
    merging of two layers can be linearised as an XML
    document straightforwardly
  • In case the result fact base contains a
    difference list, two different linearisations can
    be generated:
      • the default processing uses milestone elements to
        mark the borders of incompatible elements
      • alternatively, the technique of fragmentation of
        elements can be invoked
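
A minimal Python sketch of the default linearisation with milestone elements. The function linearise, the span format (start, end, name), and the shape of the milestone elements (a milestone tag with unit and type attributes) are assumptions for illustration; the actual program may mark the borders differently. Fragmentation is not shown.

```python
from xml.sax.saxutils import escape


def linearise(text, nested, difference):
    """Serialise spans over text as XML.

    nested: properly nested (start, end, name) spans, written as elements.
    difference: spans that do not nest, written as pairs of empty milestones.
    All spans are assumed to be non-empty (start < end).
    """
    opens, closes = {}, {}
    # Outer elements open first and close last.
    for start, end, name in sorted(nested, key=lambda s: (s[0], -s[1])):
        opens.setdefault(start, []).append(f"<{name}>")
        closes.setdefault(end, []).insert(0, f"</{name}>")

    out = []
    for offset in range(len(text) + 1):
        out.extend(
            f'<milestone unit="{name}" type="end"/>'
            for _, end, name in difference
            if end == offset
        )
        out.extend(closes.get(offset, []))
        out.extend(opens.get(offset, []))
        out.extend(
            f'<milestone unit="{name}" type="start"/>'
            for start, _, name in difference
            if start == offset
        )
        if offset < len(text):
            out.append(escape(text[offset]))
    return "".join(out)


if __name__ == "__main__":
    print(linearise(
        "abcdefghij",
        nested=[(0, 10, "s"), (0, 4, "p"), (4, 10, "p")],
        difference=[(2, 6, "x")],
    ))
```

The element x starts in the first p and ends in the second, so it cannot be written as a regular element; the two empty milestones record its borders while the rest of the document stays well-formed.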

45
Architecture
[Diagram. Labels: XML documents; via Python; Inference / Query; Rules;
External information; Unification of annotation levels; Rules;
Generation of XML from the fact base; via Python; XML documents]