1
Relations between multiple annotations
Representation, Inferences, Context
Specification, and Unification

Andreas Witt
Dieter Metzing
Jens Pönninghaus
Daniela Goecke

www.text-technology.de
2
Contents

Project description
Approaches to Multiple Annotations
multiple Levels
multiple Layers
Representation
Inferences
Context Specification
Unification

3
Project description

The project "Secondary information structuring and comparative discourse analysis" (Sekimo) is part of the DFG-Forschergruppe 437 "Text-technological modelling of information".
Within this project, a corpus is annotated on different (linguistic) levels.
Aim of the project: inferring, describing, and modelling relations between these levels.

5
Standard Methodology

A corpus is annotated according to a given tag set.
The tag set is defined in a document grammar (e.g. the TEI DTD).
In general, different tag sets exist for annotating different kinds of documents (e.g. poems, encyclopedias) or different kinds of information (e.g. linguistic information).
In particular, a linguistic annotation can depend on:
theoretical assumptions (constituent structure, functional structure, or a (more) specific theory)
the language
research questions

6
Problems of the standard methodology

Levels of description are neglected
or
Different levels of annotation are mixed up

Difficulties

Multiple hierarchies within one document

7
General Solutions (cf. TEI Guidelines)

CONCUR: an optional feature of SGML (not available in XML) which allows multiple hierarchies to be marked up concurrently in the same document
milestone elements: empty elements which mark the boundaries between elements in a non-nesting structure
fragmentation of an item: the division of what is logically a single element into two or more parts, each of which nests properly within its context
virtual joins: the recreation of a virtual element from fragments of text (requires a separate interpretation)
redundant encoding of information in multiple forms

8
Multiple hierarchies and language data

Hypertext linking techniques are used for connecting multiple layers of annotation, e.g.:
Within the EU project NITE, an annotation format has been developed which allows links to be specified between separate annotation layers.
The annotation graphs (AGs) format uses a (possibly abstract) timeline as the linking layer.
Modified versions of the AGs are applied by the TASX-Annotator and the EXMARaLDA project.

9
Alternative Methodology

XML-based multi-layer annotation
Technically, each layer becomes a separate and independent XML document.
The same text is annotated several times.
Advantages:
it seems to be the only way to annotate multiple hierarchies without workarounds
each document instance uses its own DTD (or schema), i.e. annotation formats are not mixed up
a new annotation can be produced at any time
transformation tools to the NITE and TASX formats exist (master's thesis by Jan F. Maas)

11
Layer vs. Level

We distinguish annotation level vs. annotation layer.
Annotation level refers to an abstract level of analysis.
Annotation layer refers to the realisation of an annotation in e.g. XML.
Examples of annotation levels: morphology in a linguistic grammar, text structure (sections, paragraphs, ...), layout (lines and pages), thematic structure, rhetorical structure.
Sometimes one layer contains several levels (e.g. HTML), but a level can also be distributed over several layers.

12
Annotation Process

Given: the textual representation of language material (text)
The text is regarded as primary data.
For each annotation layer, the primary data is copied.
The (copy of the) primary text is annotated according to a schema (e.g. a DTD).
Annotation can be prepared
in any XML editor (e.g. XMetaL, XML-Spy, psgml-emacs)
with a special-purpose annotation tool

13
Sample annotation with a web-based, special-purpose annotation tool. This tool is used only for flat XML structures, i.e. XML annotations with non-nested elements.
14
Example: XML annotation with the Emacs editor (usable for deep and flat annotations)
15
Multi-layer annotation tool (master's thesis by Stefan Michel, work in progress)
16
Multiple Annotations

Drawbacks:
redundant
the separate documents are independent (i.e. not connected)
But: since the documents contain exactly the same text, the text can function as the link.
Solution: a common representation format for all separate XML documents

18
Prolog-Representation

The Prolog representation is based on work by Renear, Huitfeldt, Dubin and Sperberg-McQueen.
Original representation for an XML element: node/2, i.e. the predicate node has two arguments:
the position in the document tree
a value, e.g. element(corpus)
Extension: node/2 is replaced by node/5.
The 3 new arguments:
annotation layer
starting point of the annotated text
end point of the annotated text

19
Conversion from XML to Prolog (xml2prolog)

Implemented in Python
Input: 1 or more XML documents
Result: collection of Prolog facts
Example:
the element <Root> is represented as the fact
node(AnnotationLayer, 0, 42332, 1, element(Root)).
the attribute att="val" of the element <Root> is represented as the fact
attr(AnnotationLayer, 0, 42332, 1, 'att', 'val').
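
The conversion described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the xml2prolog program itself: the function name xml_to_facts is hypothetical, offsets are counted in characters of the primary text, and the tree position is written as a list of child indexes (the slide shows only the value 1 for the root).

```python
import xml.etree.ElementTree as ET


def xml_to_facts(xml_text, layer):
    """Return Prolog-style node/5 and attr/6 facts for one annotation layer."""

    def visit(elem, position, start):
        # Walk the element, counting the characters of primary text it spans.
        offset = start + len(elem.text or "")
        child_facts = []
        for index, child in enumerate(elem, start=1):
            offset, facts = visit(child, position + [index], offset)
            child_facts.extend(facts)
            offset += len(child.tail or "")  # text between this child and the next
        # Assumption: the position is rendered as a list of child indexes.
        pos = "[" + ",".join(str(step) for step in position) + "]"
        own = [f"node({layer}, {start}, {offset}, {pos}, element('{elem.tag}'))."]
        for name, value in elem.attrib.items():
            own.append(f"attr({layer}, {start}, {offset}, {pos}, '{name}', '{value}').")
        return offset, own + child_facts

    _, facts = visit(ET.fromstring(xml_text), [1], 0)
    return facts


facts = xml_to_facts('<Root att="val"><w>a tree</w> <w>falls</w></Root>', "words")
for fact in facts:
    print(fact)
```

Because every fact carries the start and end offsets of the text it spans, facts from different layers can later be compared without any explicit links between the XML documents.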

20
xml2prolog.py

Some options for the transformation process:
compare: the primary data of the XML files are compared; if the primary data is not identical, the first difference is shown
pcdata/pcdatanodes: character data can be included
aggressive: whitespace is added or removed anywhere in the document if whitespace is the reason for differences in the primary data
filter: some elements in some files should be filtered (including their textual content), e.g. <script> within HTML documents
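
The idea behind the compare option can be illustrated as follows. This is a minimal sketch, assuming that the primary data of a layer is simply the concatenation of its character data; the function names primary_data and first_difference and the sample layers are hypothetical and do not reproduce the actual option handling of xml2prolog.py.

```python
import xml.etree.ElementTree as ET


def primary_data(xml_text):
    """Concatenate all character data of an XML document, ignoring markup."""
    return "".join(ET.fromstring(xml_text).itertext())


def first_difference(text_a, text_b):
    """Return the index of the first differing character, or None if equal."""
    for index, (char_a, char_b) in enumerate(zip(text_a, text_b)):
        if char_a != char_b:
            return index
    if len(text_a) != len(text_b):
        return min(len(text_a), len(text_b))  # one text is a prefix of the other
    return None


# Hypothetical layers over the same sentence; the third one contains a typo.
syntax_layer = "<s><np>the tree</np> <vp>falls</vp></s>"
word_layer = "<text><w>the</w> <w>tree</w> <w>falls</w></text>"
broken_layer = "<text><w>the</w> <w>three</w> <w>falls</w></text>"

same = first_difference(primary_data(syntax_layer), primary_data(word_layer))
diff = first_difference(primary_data(syntax_layer), primary_data(broken_layer))
print(same, diff)
```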

21
Example
22
Example (Collection of Prolog-Facts)
23
Example (Collection of Prolog-Facts)
[Figure labels: annotation layer, start and end point, nodes in DOM tree, element names, attribute-value pair, data contents]
25
Relations between annotation Layers

Relations are inferred automatically.
Special Prolog predicates have been implemented to compare the annotation layers.
Example (identity):
<w>tree</w>
<m>tree</m>
<syll>tree</syll>

26
Relations between Annotations
Cf. Durusau and Brook O'Donnell (2002) and Durand (1999)
1. <a>....................</a>
   <b>......</b>
2. <a>....................</a>
   <b>.........</b>
3. <a>....................</a>
   <b>.....................</b>
4. <a>....................</a>
   <b>................................</b>
5. <a>....................</a>
   <b>..........................................</b>
6. <a>....................</a>
   <b>......</b>
7. <a>....................</a>
   <b>....................</b>
8. <a>....................</a>
   <b>...............................</b>
etc.
27
Relations between annotation layers
[Table: each relation with a visualisation of the range of element a against the range of element b]
Relation
identity
independence
inclusion
start point identity
end point identity
end point is starting point
overlap
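
The relations named on this slide can be decided from the start and end points of two elements alone. The following is a minimal sketch under stated assumptions: spans are (start, end) character offsets with start < end, the function name relation and the returned labels are hypothetical, and the order in which the cases are tested is an assumption of this sketch rather than the behaviour of the Prolog predicates.

```python
def relation(a, b):
    """Name the relation between span a and span b, both given as (start, end)."""
    a_start, a_end = a
    b_start, b_end = b
    if a == b:
        return "identity"
    if a_end == b_start:
        return "end_a_is_start_b"
    if b_end == a_start:
        return "end_b_is_start_a"
    if a_end < b_start or b_end < a_start:
        return "independence"
    if a_start == b_start:
        return "start_point_identity"
    if a_end == b_end:
        return "end_point_identity"
    if a_start < b_start and b_end < a_end:
        return "b_included_in_a"
    if b_start < a_start and a_end < b_end:
        return "a_included_in_b"
    return "overlap"  # the spans cross: neither contains the other


print(relation((0, 10), (2, 5)))
print(relation((0, 6), (4, 9)))
```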
28
Comparison of annotation layers

We distinguish two kinds of relations between elements:
relations between single instances of an element (relations)
relations between all occurrences of an element (meta-relations)
Prolog programs have been developed to infer both kinds of relations.

29
Prolog Implementation

Aims
statistics on annotation layers
relations between occurrences of elements
meta-relations

30
Example: deep annotation (HPSG)
31
Statistics of the annotation according to HPSG
?- get_statistics.
Please enter layer name or type "q" to exit, "h" for help
hpsg.
Statistics for hpsg
Number of Nodes 14, Number of different Elements 5
Number of Attributes 1, Number of different A/V-pairs 4
------------------------------------------
Different elements and their occurrences
hpsg 1
nodesAndLabels 3
nonannotated-text 4
phrase 2
punctuation 4
------------------------------------------
Attribute    occurrences    different values
type         5              4
For information on occurrences of Attribute-Value-Pairs enter Attribute name or type q to quit.
type.
( edgeCOMP,1 ) , ( edgeHD,2 ) , ( np,1 ) , ( np-no,1 )
32
Relations between occurrences of elements

Query: How often does a certain relation between elements hold?
chk_relation(Relation,Element1,Layer1,Element2,Layer2,L).
Relation: a relation between elements (e.g. identity, overlap, or endA_is_starting_pointB)
Element1: element name of annotation Layer1
Element2: element name of annotation Layer2
L: result list
It is also possible to infer examples and counter-examples of a certain relation.

33
Example: relations between elements of the HPSG annotation and the elements of a dialogue annotation
34
Ex. Relations between HPSG-phrases and X
?- chk_relation(Relation,phrase,hpsg,X,dialogue,L).
Relation identity X _G160 L
Relation included_B_in_A X _G160 L
Relation included_A_in_B X _G160 L phrase, dialogue, 2, phrase, 2, dialogue, 1 ...
Relation overlap_A X _G160 L
Yes
35
Meta-relations

If a certain relation holds for all instances of an element, we define a meta-relation:
identity: at every occurrence of an element A in Layer1, an element B in Layer2 exists which spans the same range of characters
inclusion: at every occurrence of an element A in Layer1, an element B in Layer2 exists which stands in an inclusion relation to A or is identical to it; the meta-relation identity does not hold
overlap: at every occurrence of an element A in Layer1, an element B in Layer2 exists which overlaps with A
mixed: no meta-relations exist
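
The definitions above amount to a "for all A there exists a B" check over the spans of two elements. A minimal sketch, assuming spans are (start, end) character offsets and reading inclusion as "A lies within B or is identical to it" (an assumption of this sketch); the function name meta_relation is hypothetical and the code is not the Prolog implementation.

```python
def meta_relation(spans_a, spans_b):
    """Classify how every occurrence of element A relates to the elements B."""

    def identical(a, b):
        return a == b

    def within(a, b):  # a lies inside b; identical spans also count
        return b[0] <= a[0] and a[1] <= b[1]

    def overlaps(a, b):  # the spans cross, neither contains the other
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]

    def holds_for_all(test):
        return all(any(test(a, b) for b in spans_b) for a in spans_a)

    if not spans_a:
        return "mixed"  # nothing to generalise over
    if holds_for_all(identical):
        return "identity"
    if holds_for_all(within):  # reached only if identity does not hold
        return "inclusion"
    if holds_for_all(overlaps):
        return "overlap"
    return "mixed"


# Hypothetical spans over the text "the tree falls".
words = [(0, 3), (4, 8), (9, 14)]
phrases = [(0, 8), (9, 14)]
print(meta_relation(words, words), meta_relation(words, phrases))
```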

36
Meta-relations (cont.)

identity - For all occurrences, the following configuration can be found:
<a>....................</a>
<b>....................</b>
inclusion - For all occurrences, one of the following configurations can be found:
<a>....................</a>
<b>................................</b>
<a>....................</a>
<b>..........................................</b>
<a>....................</a>
<b>.......................................</b>
<a>....................</a>
<b>....................</b>
overlap - For all occurrences, the following configuration can be found:
<a>....................</a>
<b>....................</b>

38
Context specification 1: Motivation

Often, general meta-relations do not hold.
In these cases, the elements can be classified according to structural properties within their layer.
This makes it possible to construct specific meta-relations.
A format for expressing these structural properties, called Context Specification Document (CSD), has been developed.

39
Context specification 2: Realization

Subclassification of element nodes via tree walking automata (TWA)
Underlying path language for the construction of TWA: caterpillar expressions (cf. Brüggemann-Klein and Wood, 2000)
moves: up, right, left, firstChild, lastChild
tests: isFirst, isLast, isLeaf, isRoot
test for element names
Kleene-star operator
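
To make the ingredients listed above concrete, here is a small evaluator for caterpillar-style expressions. It is a sketch under stated assumptions, not the CSD format or the project's TWA construction: the tuple encoding ("seq", ...) and ("star", ...), the class Node, and the functions evaluate and matches are all hypothetical, and the sample tree is invented (it only borrows the labels NP, COMP and HD).

```python
class Node:
    def __init__(self, name, *children):
        self.name, self.children, self.parent = name, list(children), None
        for child in self.children:
            child.parent = self

    def sibling(self, step):
        """Return the sibling 'step' positions away, or None if there is none."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        index = siblings.index(self) + step
        return siblings[index] if 0 <= index < len(siblings) else None


MOVES = {
    "up": lambda n: n.parent,
    "left": lambda n: n.sibling(-1),
    "right": lambda n: n.sibling(+1),
    "firstChild": lambda n: n.children[0] if n.children else None,
    "lastChild": lambda n: n.children[-1] if n.children else None,
}
TESTS = {
    "isRoot": lambda n: n.parent is None,
    "isLeaf": lambda n: not n.children,
    "isFirst": lambda n: n.sibling(-1) is None,
    "isLast": lambda n: n.sibling(+1) is None,
}


def evaluate(expr, nodes):
    """Return the set of nodes reachable from 'nodes' by walking 'expr'."""
    nodes = set(nodes)
    if isinstance(expr, str):
        if expr in MOVES:
            return {MOVES[expr](n) for n in nodes} - {None}
        if expr in TESTS:
            return {n for n in nodes if TESTS[expr](n)}
        return {n for n in nodes if n.name == expr}  # any other string: name test
    operator, *operands = expr
    if operator == "seq":
        for operand in operands:
            nodes = evaluate(operand, nodes)
        return nodes
    if operator == "star":  # Kleene star: repeat until no new node is reached
        reached, frontier = set(nodes), set(nodes)
        while frontier:
            frontier = evaluate(operands[0], frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown operator: {operator}")


def matches(node, expr):
    """A node belongs to a subclass if the walk starting at it succeeds."""
    return bool(evaluate(expr, {node}))


# Invented tree: NP(COMP(NP(COMP, HD)), HD)
inner_comp = Node("COMP")
outer_comp = Node("COMP", Node("NP", inner_comp, Node("HD")))
tree = Node("NP", outer_comp, Node("HD"))

every_node = ("star", ("seq", "firstChild", ("star", "right")))
top_level_comp = ("seq", "COMP", "up", "isRoot")
nested_in_comp = ("seq", "COMP", "up", ("star", "up"), "COMP")
print(len(evaluate(every_node, {tree})))
```

In this sketch the two COMP nodes fall into different subclasses: only the outer one matches top_level_comp, and only the inner one matches nested_in_comp. A relation that fails for COMP in general could then be tested separately for each subclass.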

40
Sample application
[Tree diagram; node labels: NP, COMP, HD, NF, NP.NO, COMP, HD, VN, PGen; terminals: k e N, shucchou, no]
41
Context specification 3: Subclassification
[Tree diagram; node labels: NP, COMP, HD, NF, NP.NO, COMP, HD, VN, PGen; terminals: k e N, shucchou, no]
Relation holds for all COMP elements
Relation holds only for a subset
43
Unification of annotation layers I

Two document layers can be merged.
This process has also been implemented in Prolog.
The predicate (semt) receives four arguments:
layer1 (to be unified)
layer2 (to be unified)
list of elements which should be deleted in the process of unification
the new file to which the result of the merger (again a collection of Prolog facts) is written
The new database contains a copy of all layers in the input database plus the result layer.
If the unification results in a layer whose elements would not be properly nested, a second result layer (a difference list) is created.
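
The role of the difference list can be illustrated with a small sketch. It is not the semt predicate: the function names unify and crosses, the (name, start, end) tuples, and the policy of keeping elements in the order layer1, layer2 (so that a later element which crosses an already kept one goes to the difference list) are assumptions made for illustration only.

```python
def crosses(a, b):
    """True if elements a and b overlap without one containing the other."""
    return a[1] < b[1] < a[2] < b[2] or b[1] < a[1] < b[2] < a[2]


def unify(layer1, layer2, delete=()):
    """Merge two lists of (name, start, end) elements over the same text."""
    merged, difference = [], []
    for element in list(layer1) + list(layer2):
        if element[0] in delete:
            continue  # element names listed in 'delete' are dropped
        if any(crosses(element, kept) for kept in merged):
            difference.append(element)  # would break proper nesting
        else:
            merged.append(element)
    # Outer elements first: sort by start, then by descending end.
    return sorted(merged, key=lambda e: (e[1], -e[2])), difference


# Hypothetical layers over the text "the tree falls".
syntax = [("s", 0, 14), ("np", 0, 8), ("vp", 9, 14)]
prosody = [("unit", 4, 14)]

merged, difference = unify(syntax, prosody)
print(merged, difference)
```

Deleting the conflicting element from one layer, for instance unify(syntax, prosody, delete=("np",)), leaves the difference list empty in this example, which corresponds to the case where the merged layer can be written out as ordinary nested XML.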

44
Unification of annotation layers II

The result database is re-converted to XML using a Python program.
If no difference list exists, the result of merging two layers can be linearised straightforwardly as an XML document.
If the result fact base contains a difference list, two different linearisations can be generated:
the default processing uses milestone elements to mark the borders of incompatible elements
alternatively, the technique of fragmentation of elements can be invoked
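
The milestone variant of the linearisation can be sketched as follows. This is an illustration under stated assumptions, not the project's Python program: the function name linearise, the (name, start, end) tuples, and the naming of the empty elements as name-start and name-end are hypothetical. Fragmentation is not shown.

```python
from xml.sax.saxutils import escape


def linearise(text, nested, difference=()):
    """Write properly nested elements as tags and the rest as milestones."""
    events = []
    ordered = sorted(nested, key=lambda e: (e[1], -e[2]))  # outer elements first
    for index, (name, start, end) in enumerate(ordered):
        events.append((start, 2, index, f"<{name}>"))
        events.append((end, 1, -index, f"</{name}>"))  # inner elements close first
    for index, (name, start, end) in enumerate(difference):
        # Empty elements mark the borders of elements that do not nest.
        events.append((start, 3, index, f"<{name}-start/>"))
        events.append((end, 0, index, f"<{name}-end/>"))
    output, position = [], 0
    # At one offset: end milestones, closing tags, opening tags, start milestones.
    for offset, _rank, _order, markup in sorted(events):
        output.append(escape(text[position:offset]))
        output.append(markup)
        position = offset
    output.append(escape(text[position:]))
    return "".join(output)


# Hypothetical result of a merger over the text "the tree falls".
nested = [("s", 0, 14), ("np", 0, 8), ("vp", 9, 14)]
difference = [("unit", 4, 14)]
xml = linearise("the tree falls", nested, difference)
print(xml)
```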

45
Architecture
[Diagram labels: Inference / Query; via Python; XML documents; Generation of XML from the fact base; Unification of annotation levels; via Python; External information; XML documents; Rules; Rules]