Temporal information extraction: Reasoning with events based on their descriptions

1
Temporal information extraction: Reasoning with events based on their descriptions

Leon Derczynski
University of Sheffield

2
Introduction

Background
Anchoring events
Reasoning about events
Representing temporal data
Evaluating annotations

3
Background

Why bother?
Temporal information affects everything described
by language.
The world is in a state that changes with time.
Not all assertions made in written text are true at the same time.
Temporal information shows which sets of data can
concurrently be true.

4
Tense and temporal models

Zeno Vendler (1957)
Verbs and Times
Hans Reichenbach (1947)
The tenses of verbs
James Allen (1983)
Maintaining knowledge about temporal intervals

5
Vendler

Vendler verb classification
Verb instances fall into one of four groups:
Stative: a persistent state (John sits)
Activity: lasts for a finite period (Bob ran for an hour)
Accomplishment: takes a finite period, and culminates (Kate climbed the hill in five minutes)
Achievement: instantaneous finishing events (Lucy reached the top of Everest)
Tests are provided to determine which group a verb sense belongs to.
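Later work often restates the four groups as combinations of three binary features (dynamic, durative, telic). The sketch below assumes that feature decomposition, which is not Vendler's own formulation or one of his tests; `Aspect` and `vendler_class` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aspect:
    """Hypothetical aspectual features of one verb sense in context."""
    dynamic: bool   # involves change; False for states
    durative: bool  # takes time rather than happening at an instant
    telic: bool     # has a natural culmination point


def vendler_class(a: Aspect) -> str:
    """Map a feature bundle onto one of Vendler's four groups."""
    if not a.dynamic:
        return "stative"
    if not a.durative:
        return "achievement"
    return "accomplishment" if a.telic else "activity"


examples = {
    "John sits": Aspect(dynamic=False, durative=True, telic=False),
    "Bob ran for an hour": Aspect(dynamic=True, durative=True, telic=False),
    "Kate climbed the hill in five minutes": Aspect(dynamic=True, durative=True, telic=True),
    "Lucy reached the top of Everest": Aspect(dynamic=True, durative=False, telic=True),
}

for sentence, features in examples.items():
    print(f"{sentence}: {vendler_class(features)}")
```

The adverbials in the examples hint at telicity: "for an hour" fits an atelic activity, "in five minutes" a telic accomplishment. Deciding the features for a real verb sense is the hard part, which is what the classification tests are for.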

6
Reichenbach

Reichenbach model of verb tenses
Speech time (ST): when the words were uttered.
Event time (ET): when the described event occurred.
Reference time (RT): like a viewpoint.
The cat will break the door: ST = RT, ET in the future
The cat will have broken the door: ST present, ET in the future, RT looks back onto ET
Allows a simple description of the tense of any verb phrase.
Tracking reference time is sometimes very helpful:
When John comes home, I will have gone
In this case, "when" describes a reference time for the whole sentence.
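One way to make the model concrete is to write each tense as an ordering of the three points. In this sketch the two "cat" patterns follow the slide, and the simple past and past perfect patterns are the usual Reichenbach orderings; the string notation ("E<R<S") and the function names are assumptions for illustration.

```python
def ranks(pattern: str) -> dict[str, int]:
    """Parse an ordering such as "E<R<S" into relative positions.

    '<' moves to a later position; '=' keeps the same position.
    """
    positions, rank = {}, 0
    for ch in pattern:
        if ch == "<":
            rank += 1
        elif ch in "ERS":
            positions[ch] = rank
        elif ch != "=":
            raise ValueError(f"unexpected symbol {ch!r} in {pattern!r}")
    return positions


def relation(pattern: str, a: str, b: str) -> str:
    """Relation of point a to point b under a tense pattern."""
    pos = ranks(pattern)
    if pos[a] < pos[b]:
        return "before"
    if pos[a] > pos[b]:
        return "after"
    return "simultaneous"


TENSES = {
    "The cat broke the door": "E=R<S",             # simple past
    "The cat had broken the door": "E<R<S",        # past perfect
    "The cat will break the door": "S=R<E",        # ST = RT, ET in the future
    "The cat will have broken the door": "S<E<R",  # RT looks back onto ET
}

for sentence, pattern in TENSES.items():
    print(f"{sentence}: ET {relation(pattern, 'E', 'S')} ST")
```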

7
Allen

Interval logic
All events are described as intervals, with start and end points.
Interval relation types are defined (e.g. before, includes, starts).
A transitivity table is given for inferring the relation between two intervals from their relations to a third.
E.g. A before B, A includes D
Before stipulates that A's endpoint is before B's start.
We can infer D before B.
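Entries of the transitivity table can be reproduced by brute force: represent intervals by integer endpoints, classify each pair, and collect every relation x?z that is consistent with the two given ones. This is an illustrative sketch, not how Allen derives the table; relation names are the usual ones, with "includes" used for the inverse of "during" as on this slide.

```python
from itertools import product


def allen(a, b):
    """Return the Allen relation holding from interval a to interval b."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met_by"
    if (s1, e1) == (s2, e2):
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started_by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished_by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "includes"  # inverse of "during"
    return "overlaps" if s1 < s2 else "overlapped_by"


# Six distinct endpoint values suffice to realise every ordering of three intervals.
INTERVALS = [(s, e) for s in range(6) for e in range(s + 1, 6)]


def compose(r1, r2):
    """All relations x?z possible when x r1 y and y r2 z (one table cell)."""
    return {
        allen(x, z)
        for x, y, z in product(INTERVALS, repeat=3)
        if allen(x, y) == r1 and allen(y, z) == r2
    }


# The slide's example: D during A (A includes D) and A before B.
print(compose("during", "before"))  # {'before'}
```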

8
Anchoring events

Introduction to event anchoring
Dealing with named weekdays
TEA an implemented anchoring system
Problems

9
Anchoring events

Fixing information from text to a timeline.
Calendrical time is a common reference, given a
calendar.
Expressions describing a time are sometimes
referred to as TIMEXs.
Once identified, a TIMEX may be normalised to a
fully specified date or interval.
Named entity recognition, finite state grammars
and machine learning have all been used to
identify these expressions.
An appropriate granularity (e.g. day, month or year) should be chosen for the normalised value.

10
Weekday references

The English week has seven day names.
A single day name is often deemed sufficient
reference for a human
I'll see you next Tuesday
Monday, and the markets are buzzing
To anchor a weekday, given ST and a day name, we need to choose a direction from ST, and optionally a distance.
Baldwin: an inclusive 7-day sliding window, centred on today.
Mani & Wilson: find the controlling verb's tense and use it to determine direction.
Tense estimation: check the PoS tags of sentence tokens for VBD; if found, assume backwards.
Dependency-based: use the Stanford parser to find the controlling verb.
Mazur & Dale, What's the Date? (2008)
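A minimal sketch of the window strategy and the tense-estimation strategy, assuming ST is a plain calendar date and that Penn Treebank PoS tags are already available from some tagger. The function names are hypothetical, and treating a day name equal to ST's own weekday as one week away is an assumption, not taken from the cited systems.

```python
from datetime import date, timedelta

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def anchor_window(st: date, day_name: str) -> date:
    """Window strategy: pick the named day within ST-3 .. ST+3 days."""
    target = DAYS.index(day_name.lower())
    for offset in range(-3, 4):
        candidate = st + timedelta(days=offset)
        if candidate.weekday() == target:
            return candidate
    raise AssertionError("a 7-day window always contains every weekday")


def anchor_by_tense(st: date, day_name: str, pos_tags: list[str]) -> date:
    """Tense strategy: look backwards if any token is tagged VBD, else forwards.

    Assumption: a day name equal to ST's own weekday means one week away.
    """
    target = DAYS.index(day_name.lower())
    if "VBD" in pos_tags:
        gap = (st.weekday() - target) % 7 or 7
        return st - timedelta(days=gap)
    gap = (target - st.weekday()) % 7 or 7
    return st + timedelta(days=gap)


st = date(2009, 5, 7)  # a Thursday
print(anchor_window(st, "Tuesday"))                         # 2009-05-05
print(anchor_by_tense(st, "Tuesday", ["PRP", "MD", "VB"]))  # 2009-05-12
print(anchor_by_tense(st, "Tuesday", ["PRP", "VBD"]))       # 2009-05-05
```

The window gives the same answer whatever the tense, which is exactly the weakness the tense-based methods try to address.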

11
Generic vs. specific

Some expressions that look like TIMEXs should not be normalised.
Today can mean
the 24-hour period containing ST and bounded by
midnights.
Modern times, or a change of frame of reference
In Victorian times, ladies wore long dresses.
Today, modern fashions do not dictate a single
length.
This second idea is not restricted to the period
from 0000 to 2359 GMT on Thursday 7th May 2009!
As 90% of uses in some texts are specific [1], some systems choose to treat every use as specific and accept a 10% error rate.
Features based on local words can help distinguish generic from specific uses, but classifiers using them score below this baseline accuracy [2].
[1] Han, Gates & Levin, From Language to Time: A Temporal Expression Anchorer (2006)
[2] Mani & Wilson, Robust Temporal Processing of News (2000)

12
TEA

Temporal Expression Anchorer: Han, Gates & Levin at CMU.
A calendar is used as the time ontology, dealing with various levels of granularity.
Processes TCNL (Time Calculus for Natural
Language).
Identifies temporal expressions in input, and
associates TIMEXs with their textually nearest
verb.
Absolute and relative expressions are evaluated using TCNL:
"Friday last week" is split into "Friday" and "last week"
fri now - 1week fri,now - 1week
now - 1fri
Constraint satisfaction based on a calendar model
narrows the possible set of absolute dates.
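The narrowing step can be pictured as intersecting sets of candidate days: "last week" contributes seven days, "Friday" keeps only one of them. This sketch assumes Monday-start weeks and uses Python's `datetime`; it illustrates the idea and is not TEA's implementation or a TCNL evaluator.

```python
from datetime import date, timedelta


def days_of_week(d: date) -> set[date]:
    """All seven days of the Monday-to-Sunday week containing d."""
    monday = d - timedelta(days=d.weekday())
    return {monday + timedelta(days=i) for i in range(7)}


def resolve(now: date, weekday: int, week_shift: int) -> list[date]:
    """Intersect 'the week now + week_shift weeks' with 'day is weekday'."""
    candidates = days_of_week(now + timedelta(weeks=week_shift))  # now - 1week
    return sorted(d for d in candidates if d.weekday() == weekday)  # fri


# "Friday last week", spoken on Thursday 7th May 2009 (Friday is weekday 4)
print(resolve(date(2009, 5, 7), weekday=4, week_shift=-1))  # [datetime.date(2009, 5, 1)]
```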

13
Determining event durations

Given some normalised expressions, knowing event
durations can greatly increase our reasoning
ability.
Data can be taken from human annotators.
Determining a typical event duration is
difficult
The dog ran up the hill
Linda had finished her cleaning
This results in low inter-annotator agreement.
A simplified approach allocates durations to two classes: shorter or longer than a day.
Events can be classified this simply with 76% accuracy, using hypernym, local word and PoS features.
Pan, Mulkar & Hobbs, Learning Event Durations from Event Descriptions (2006)

14
Reasoning about events

Introduction
Temporal closure
Minimal notations and temporal inference
Help from linguistic models

15
Reasoning about events

Annotations often describe only a subset of a document's temporal information, perhaps as a number of labelled events and times.
An annotation may also include some links between
pieces of temporal information.
It is possible to infer data about relations
between points, given a set of rules or logic,
and some existing relations.
It is also possible to add detail and boundaries
to an annotation based on linguistic features of
the source text.
This ability to reason about events saves human annotators work, and lets us get the most out of the descriptions their efforts produce.

16
Temporal closure

A temporal closure can be thought of as a graph: times and events are nodes, relations are edges.
Every time and event is connected to every other.
E.g.
t1 is Tuesday 5th May 2009
e1 is hearing this talk
We can say t1 before e1, thus giving a type to
this relation.
A temporal closure includes a relation between every pair of nodes in the graph.
This can lead to very large amounts of data for only a moderate-sized document: n nodes give n(n-1)/2 pairs, so 100 events and times already need 4,950 relations.

17
Minimal annotations

It is rare for every relation (graph edge) to be
annotated. We can infer some relations
(t1 before e1) and (e1 before e2) => (t1 before e2)
Inference can be used to complete a closure without specifying every relation's type.
When this applies, and no more relations can be
removed, we have a minimal annotation.
For example
Three nodes: e4, e5, e6
The closure has 3 relations, one for each pair of nodes
A minimal graph may just say
(e4 after e5)
(e5 simultaneous e6)
To infer the closure, we simply need to add
(e4 after e6), or (e6 before e4)

18
Relation inference

Allen's interval logic describes 13 relations, and provides a transitivity table for inferring a relation given two related ones.
Some inconsistent labellings are possible.
Backtracking over the initial graph should detect these cases.
A set of ten inference rules can be used:
Allen's 13 relations are reduced to just 3, including some reversal of parameters.
Only before, simultaneous and includes are used
e9 after e10 => e10 before e9
These rules can be iteratively added to an agenda and used to reason with a database of approved relations.
For small graphs (< 2000 edges) we can assign types to around 10% of relations, given a human annotation.
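A sketch of agenda-based inference, assuming strict interval readings of before, includes and simultaneous. It keeps the inverses "after" and "during" internally and reports only before, simultaneous and includes, mirroring the reduction above. The rule table is a hypothetical subset derived from interval semantics, not the ten rules referred to here, and a clash between two labels for one pair stands in for the backtracking check.

```python
from collections import deque

INVERSE = {"before": "after", "after": "before",
           "includes": "during", "during": "includes",
           "simultaneous": "simultaneous"}

# (r1, r2) -> r3: "x r1 y" and "y r2 z" entail "x r3 z", reading every
# relation strictly. Pairs without a single certain answer are left out.
COMPOSE = {
    ("before", "before"): "before", ("after", "after"): "after",
    ("includes", "includes"): "includes", ("during", "during"): "during",
    ("during", "before"): "before", ("during", "after"): "after",
    ("before", "includes"): "before", ("after", "includes"): "after",
}
for r in INVERSE:
    COMPOSE[("simultaneous", r)] = r
    COMPOSE[(r, "simultaneous")] = r


class Inconsistent(Exception):
    """The annotation entails two different relations for one pair."""


def close(facts):
    """Agenda-based closure of (x, relation, y) facts."""
    known = {}  # (x, y) -> relation, stored in both directions
    agenda = deque(facts)
    while agenda:
        x, rel, y = agenda.popleft()
        if x == y:
            if rel != "simultaneous":
                raise Inconsistent(f"{x} {rel} {x}")
            continue
        if (x, y) in known:
            if known[(x, y)] != rel:
                raise Inconsistent(f"{x} {known[(x, y)]} {y}, but also {rel}")
            continue
        known[(x, y)], known[(y, x)] = rel, INVERSE[rel]
        # Chain the new fact (both directions) with every fact leaving its target.
        for p, r1, q in ((x, rel, y), (y, INVERSE[rel], x)):
            for (a, z), r2 in list(known.items()):
                if a == q and (r1, r2) in COMPOSE:
                    agenda.append((p, COMPOSE[(r1, r2)], z))
    # Report with before / simultaneous / includes only, reversing the others.
    return {(x, r, y) for (x, y), r in known.items()
            if r in ("before", "includes") or (r == "simultaneous" and x < y)}


print(sorted(close([("A", "before", "B"), ("A", "includes", "D")])))
# [('A', 'before', 'B'), ('A', 'includes', 'D'), ('D', 'before', 'B')]
```

Scanning every known fact for each new one is quadratic per step; an index from each node to its outgoing facts would be the obvious refinement for larger graphs.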

19
Applying Reichenbach

Reference time can provide a boundary on an
event.
John had eaten all the pies
Event 1: eating
ET1 < RT < ST
John had eaten all the pies when Annika arrived
Event 2: arriving
Reference time is the same across the sentence.
ET1 < RT = ET2 < ST
Because we know that RT is after ET1 and equal to ET2, we can specify three temporal relations:
e1 before e2
e1 before ST
e2 before ST
Having a model for tenses allows us to
confidently add relations to a temporal graph of
a discourse.
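The inference above only needs each event's position relative to the shared RT, plus each clause's own ET/ST order. A minimal sketch, assuming the tense of each clause has already been mapped to those two relations; the function and dictionary names are illustrative.

```python
# Position of an event time relative to the reference time.
POSITION = {"before": -1, "simultaneous": 0, "after": 1}


def relate_via_rt(e1_to_rt, e2_to_rt):
    """Relation of event 1 to event 2 when both clauses share one RT.

    Returns None when RT alone does not fix the order (both events on
    the same side of RT).
    """
    a, b = POSITION[e1_to_rt], POSITION[e2_to_rt]
    if a == b:
        return "simultaneous" if a == 0 else None
    return "before" if a < b else "after"


# "John had eaten all the pies when Annika arrived"
eating = {"to_rt": "before", "to_st": "before"}          # past perfect: ET1 < RT < ST
arriving = {"to_rt": "simultaneous", "to_st": "before"}  # simple past: ET2 = RT < ST

relations = [
    ("e1", relate_via_rt(eating["to_rt"], arriving["to_rt"]), "e2"),
    ("e1", eating["to_st"], "ST"),
    ("e2", arriving["to_st"], "ST"),
]
print(relations)
# [('e1', 'before', 'e2'), ('e1', 'before', 'ST'), ('e2', 'before', 'ST')]
```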

20
Representing temporal data

Introduction
TIMEX and TimeML
TCNL
T-BOX

21
Representing temporal data

Once we can identify temporal information, we
need to store this information.
Temporal information is rich, and favours a
format that can capture it well.
Aspect, polarity, tense, part of speech
Event class, event frequency
Hints about reference, speech and event time
Notation languages are available both for storing
and working with this data.
These languages are new (under a decade old), and
possibly not yet mature.

22
TIMEX

Standard for describing a time-specific
expression.
Evolved through the MUC conferences and TERN,
through TIMEX, TIMEX2 and TIMEX3.
TIMEX3 is currently used as the means of
describing absolute times in TimeML.
<TIMEX3
tid="t43" type="DATE"
value="1989-10-30" temporalFunction="false"
functionInDocument="CREATION_TIME">
10/30/89
</TIMEX3>
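Because the annotation is XML, the normalised value can be read with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree`; it only handles full calendar dates, whereas TIMEX3 values can also be weeks, durations or other forms that `date.fromisoformat` would reject.

```python
from datetime import date
from xml.etree import ElementTree

snippet = (
    '<TIMEX3 tid="t43" type="DATE" value="1989-10-30" '
    'temporalFunction="false" functionInDocument="CREATION_TIME">'
    "10/30/89</TIMEX3>"
)

timex = ElementTree.fromstring(snippet)
# The normalised value, not the surface text, is what gets anchored.
anchored = date.fromisoformat(timex.get("value"))
print(timex.get("tid"), timex.text, "->", anchored)  # t43 10/30/89 -> 1989-10-30
```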

23
TimeML

SGML-based language for temporal annotation.
Allows identification of events and times.
Thorough provision of links between events and
times
TLINK: temporal, possibly pointing to a SIGNAL tag on a linking word
SLINK: subordinate
ALINK: aspectual
ISO standard.
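TLINKs refer to event instances and times by ID, so turning an annotation into a relation graph is mostly a matter of reading attributes. A sketch, assuming TimeML 1.2 attribute names (`eventInstanceID`, `timeID`, `relatedToEventInstance`, `relatedToTime`, `relType`) and a hand-written fragment rather than real corpus data.

```python
from xml.etree import ElementTree

# Hand-written fragment for illustration; not taken from TimeBank.
DOC = """<TimeML>
Annika <EVENT eid="e1" class="OCCURRENCE">arrived</EVENT>
<SIGNAL sid="s1">after</SIGNAL> John had
<EVENT eid="e2" class="OCCURRENCE">eaten</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" tense="PAST" aspect="PERFECTIVE"/>
<TLINK lid="l1" relType="AFTER" eventInstanceID="ei1"
       relatedToEventInstance="ei2" signalID="s1"/>
</TimeML>"""


def tlinks(root):
    """Yield (source, relType, target) for every TLINK in the tree."""
    for link in root.iter("TLINK"):
        # A TLINK starts from either an event instance or a time ...
        source = link.get("eventInstanceID") or link.get("timeID")
        # ... and points to either an event instance or a time.
        target = link.get("relatedToEventInstance") or link.get("relatedToTime")
        yield source, link.get("relType"), target


root = ElementTree.fromstring(DOC)
print(list(tlinks(root)))  # [('ei1', 'AFTER', 'ei2')]
```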

24
TimeML - TimeBank

Corpus of 181 newswire texts.
Temporal information annotated in TimeML:
6383 TLINKs,
7940 EVENTs,
3004 kB in size.
Tiny compared to some other types of corpus.
Involved a large human annotation effort and several versions.
The biggest temporally annotated corpus.

25
TCNL

Developed at CMU with L. Levin.
Useful for reasoning between events.
Captures intensional meanings of expressions.
"Yesterday" becomes now - 1day instead of something like 20090506
A set of operators is used to combine operands:
+/- for forward/reverse shifting
@ for "in": 2sun @ may is the second Sunday in May
for distribution
15hour wedfri is 3pm from Wednesday
to Friday
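Keeping the intensional form means it can be evaluated later against any anchor. A sketch of what evaluating shifting, "in" and distribution might look like with Python's `datetime`, assuming a day-level anchor for "now" (Thursday 7th May 2009); the function names are hypothetical and this is not the TCNL implementation.

```python
from datetime import date, datetime, time, timedelta


def shift(now: date, days: int) -> date:
    """'now - 1day' style shifting (whole days only, for simplicity)."""
    return now + timedelta(days=days)


def nth_weekday_in(year: int, month: int, weekday: int, n: int) -> date:
    """'2sun @ may': the n-th given weekday (Monday = 0) within a month.

    No check is made that the n-th occurrence falls inside the month.
    """
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def distribute(hour: int, days: list[date]) -> list[datetime]:
    """The same clock hour on each day of a range (3pm, Wednesday to Friday)."""
    return [datetime.combine(d, time(hour)) for d in days]


now = date(2009, 5, 7)
print(shift(now, -1))                 # 2009-05-06
print(nth_weekday_in(2009, 5, 6, 2))  # 2009-05-10
wed_to_fri = [date(2009, 5, 6) + timedelta(days=i) for i in range(3)]
print(distribute(15, wed_to_fri)[0])  # 2009-05-06 15:00:00
```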

26
T-BOX

Reading solid SGML is inconvenient for humans; a visual representation of events may be preferable.
Presenting events on a timeline may lead to unintentional over-specification:
A timeline suggests a distance between events.
Many intervals are left with one end open.
Plotting parts of a sentence in temporal order destroys word order, making it hard to read.
Annotating documents can be done more easily when events are grouped locally and visually connected.
T-BOX [1] from Brandeis specifies a set of rules for rendering events and their relations.
[1] Verhagen, Drawing TimeML Relations with T-BOX (2007)

27
T-BOX

Relations only exist between nodes that are
directly connected or contained.
This suggests
- X contains Y
- Y is before Z
Drawing a full temporal closure would produce a very cluttered graph.
A set of guidelines is provided for reducing graphs to something more readable:
Equivalence classes for some events.
Break cycles in graphs.
Remove derivable relations.
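For a graph that uses only "before", removing derivable relations amounts to a transitive reduction. A sketch, assuming cycles have already been broken and simultaneous events already merged into equivalence classes, as the two preceding guidelines suggest; it is not the T-BOX code.

```python
def reachable(edges, start, goal, skip):
    """Is goal reachable from start along 'before' edges, ignoring edge `skip`?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and (a, b) != skip and b not in seen:
                if b == goal:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def remove_derivable(edges):
    """Drop every 'before' edge implied by a longer path (graph must be acyclic)."""
    kept = set(edges)
    for edge in sorted(edges):
        if reachable(kept, *edge, skip=edge):
            kept.discard(edge)
    return kept


before = {("A", "B"), ("B", "C"), ("A", "C")}
print(sorted(remove_derivable(before)))  # [('A', 'B'), ('B', 'C')]
```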

28
Evaluating annotations

Typical annotation evaluation.
Graph-based evaluation.

29
Evaluating annotations

Annotations can be compared in different ways.
When evaluating automated TIMEX or relation
identification against a gold standard, we can
measure precision and recall.
TimeBank is often used as a gold standard for training and evaluating systems working in TimeML.
Evaluating TIMEX normalisation needs a different measure, as a normalised value can be partly correct (e.g. the right month but the wrong day).
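For relation identification, one simple reading of precision and recall treats each annotation as a set of (source, relation, target) triples and counts exact matches. A sketch under that assumption; the IDs are made up.

```python
def precision_recall(system: set, gold: set) -> tuple[float, float]:
    """Exact-match precision and recall over sets of relation triples."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {("e1", "BEFORE", "e2"), ("e2", "BEFORE", "t1"), ("e1", "INCLUDES", "e3")}
system = {("e1", "BEFORE", "e2"), ("e2", "AFTER", "t1")}
print(precision_recall(system, gold))  # (0.5, 0.3333333333333333)
```

Exact matching is strict: a system that outputs ("t1", "AFTER", "e2") instead of the equivalent ("e2", "BEFORE", "t1") gets no credit, which is one motivation for comparing graphs rather than individual links.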

30
Graph based evaluation

Based on the use of minimal temporal graphs.
Graphs between events (intervals) are converted
into graphs between points
Smaller set of relations, needing only = and <
Simpler algebra
Simultaneous points are grouped into nodes.
Graphs over the same set of points can then be
compared, based on the number of node splits and
merges needed to reach one from the other.
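A sketch of the interval-to-point step, assuming strict readings of the three relations used earlier (before, simultaneous, includes) and a union-find structure to group simultaneous points. Counting the splits and merges needed to turn one such graph into another is not shown.

```python
class UnionFind:
    """Minimal disjoint-set structure for grouping equal points."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def to_point_graph(relations):
    """Turn (x, rel, y) interval relations into a graph over grouped endpoints."""
    events = {x for x, _, _ in relations} | {y for _, _, y in relations}
    points = [(e, side) for e in sorted(events) for side in ("start", "end")]
    equal = []
    less = [((e, "start"), (e, "end")) for e in events]  # every interval has length
    for x, rel, y in relations:
        if rel == "before":
            less.append(((x, "end"), (y, "start")))
        elif rel == "simultaneous":
            equal += [((x, "start"), (y, "start")), ((x, "end"), (y, "end"))]
        elif rel == "includes":  # strict containment assumed
            less += [((x, "start"), (y, "start")), ((y, "end"), (x, "end"))]
        else:
            raise ValueError(f"unsupported relation: {rel}")

    groups = UnionFind()
    for a, b in equal:
        groups.union(a, b)
    members = {}
    for p in points:
        members.setdefault(groups.find(p), set()).add(p)
    node_of = {p: frozenset(members[groups.find(p)]) for p in points}
    nodes = set(node_of.values())
    edges = {(node_of[a], node_of[b]) for a, b in less}
    return nodes, edges


nodes, edges = to_point_graph([("A", "simultaneous", "B"), ("B", "before", "C")])
print(len(nodes), "nodes,", len(edges), "'<' edges")  # 4 nodes, 3 '<' edges
```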

31
Summary