Temporal information extraction: Reasoning with events based on their descriptions


1
Temporal information extraction: Reasoning with
events based on their descriptions
  • Leon Derczynski
  • University of Sheffield

2
Introduction
  • Background
  • Anchoring events
  • Reasoning about events
  • Representing temporal data
  • Evaluating annotations

3
Background
  • Why bother?
  • Temporal information affects everything described
    by language.
  • The world is in a state that changes with time.
  • Not all assertions made in written text are true
    together.
  • Temporal information shows which sets of data can
    concurrently be true.

4
Tense and temporal models
  • Zeno Vendler (1957)
  • Verbs and Times
  • Hans Reichenbach (1947)
  • The tenses of verbs
  • James Allen (1983)
  • Maintaining knowledge about temporal intervals

5
Vendler
  • Vendler verb classification
  • Verb instances fall into one of four groups:
  • Stative: a persistent state ("John sits")
  • Activity: lasts for a finite period ("Bob ran
    for an hour")
  • Accomplishment: takes a finite period, and
    culminates ("Kate climbed the hill in five
    minutes")
  • Achievement: instantaneous finishing events
    ("Lucy reached the top of Everest")
  • Tests are provided to see which group a verb
    sense fits in.

6
Reichenbach
  • Reichenbach model of verb tenses
  • Speech time (ST): when the words were uttered.
  • Event time (ET): when the described event occurred.
  • Reference time (RT): like a viewpoint.
  • "The cat will break the door": ST = RT, ET in the
    future
  • "The cat will have broken the door": ST =
    present, ET in the future, RT looks back onto ET
  • Allows a simple description of any phrase.
  • Tracking reference time is sometimes very
    helpful:
  • "When John comes home, I will have gone"
  • In this case, "when" describes a reference time
    for the whole sentence.

7
Allen
  • Interval logic
  • All events are described as intervals, with start
    and end points.
  • Interval relation types are defined (before,
    includes, starts).
  • A table for inferring about interval relations is
    given.
  • E.g. A before B, A includes D
  • "Before" stipulates that A's end point is before B's
    start.
  • D ends before A does, so we can infer D before B.
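
The inference above can be checked on concrete intervals. This is a minimal sketch, not Allen's transitivity table: the Interval class, the two predicate functions and the numeric end points are all assumptions made for illustration, and "includes" is taken to be strict containment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An event modelled as an interval with a start and an end point."""
    start: float
    end: float


def before(a: Interval, b: Interval) -> bool:
    """a before b: a's end point precedes b's start point."""
    return a.end < b.start


def includes(a: Interval, d: Interval) -> bool:
    """a includes d: d lies strictly inside a (an assumed reading)."""
    return a.start < d.start and d.end < a.end


# Hypothetical end points, chosen only so that the premises hold.
A = Interval(0, 10)
B = Interval(12, 20)
D = Interval(3, 7)

premises_hold = before(A, B) and includes(A, D)
# D ends before A ends, and A ends before B starts, so D is before B.
conclusion_holds = before(D, B)
```

Any intervals that satisfy the two premises will satisfy the conclusion, because D's end point is bounded by A's end point.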

8
Anchoring events
  • Introduction to event anchoring
  • Dealing with named weekdays
  • TEA: an implemented anchoring system
  • Problems

9
Anchoring events
  • Fixing information from text to a timeline.
  • Calendrical time is a common reference, given a
    calendar.
  • Expressions describing a time are sometimes
    referred to as TIMEXs.
  • Once identified, a TIMEX may be normalised to a
    fully specified date or interval.
  • Named entity recognition, finite state grammars
    and machine learning have all been used to
    identify these expressions.
  • Appropriate granularity should be chosen.

10
Weekday references
  • The English week has seven day names.
  • A single day name is often deemed sufficient
    reference for a human:
  • "I'll see you next Tuesday"
  • "Monday, and the markets are buzzing"
  • To anchor a weekday, given ST and a day name, we
    need to choose direction from ST, and optionally
    distance.
  • Baldwin: an inclusive 7-day sliding window,
    centred on today.
  • Mani & Wilson: find the controlling verb's tense and
    use this to determine direction.
  • Tense estimation: check the PoS of sentence tokens
    for VBD; if found, assume backwards.
  • Dependency-based: use the Stanford parser to find
    the controlling verb.
  • Mazur & Dale: What's the Date? (2008)
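
Two of the strategies above can be sketched in a few lines. This is an illustration under stated assumptions, not a reimplementation of the cited systems: the function names are hypothetical, the window is taken to run from three days before ST to three days after, and the tense heuristic is assumed to exclude ST itself and to receive PoS tags that have already been computed.

```python
from datetime import date, timedelta

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def anchor_by_window(st: date, day_name: str) -> date:
    """Return the named day inside a 7-day window centred on ST."""
    target = DAYS.index(day_name.lower())
    for offset in range(-3, 4):  # ST-3 .. ST+3 covers each weekday once
        candidate = st + timedelta(days=offset)
        if candidate.weekday() == target:
            return candidate
    raise AssertionError("unreachable: seven consecutive days cover every weekday")


def anchor_by_tense(st: date, day_name: str, pos_tags: list) -> date:
    """Look backwards from ST if any token is tagged VBD, otherwise forwards."""
    target = DAYS.index(day_name.lower())
    step = -1 if "VBD" in pos_tags else 1
    candidate = st + timedelta(days=step)  # assumption: ST itself is skipped
    while candidate.weekday() != target:
        candidate += timedelta(days=step)
    return candidate


st = date(2009, 5, 7)  # a Thursday
in_window = anchor_by_window(st, "Tuesday")
past = anchor_by_tense(st, "Tuesday", ["PRP", "VBD", "PRP"])
future = anchor_by_tense(st, "Tuesday", ["PRP", "MD", "VB", "PRP"])
```

The window approach needs no linguistic analysis but cannot reach a day more than three days away; the tense approach picks a direction first and then takes the nearest matching day.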

11
Generic vs. specific
  • Some expressions that look like TIMEXs should
    not be normalised.
  • "Today" can mean:
  • the 24-hour period containing ST and bounded by
    midnights.
  • Modern times, or a change of frame of reference:
  • "In Victorian times, ladies wore long dresses.
    Today, modern fashions do not dictate a single
    length."
  • This second sense is not restricted to the period
    from 00:00 to 23:59 GMT on Thursday 7th May 2009!
  • As 90% of uses in some texts are specific [1], some
    systems choose to accept a 10% error rate.
  • Features based on local words can help
    distinguish generic from specific, but with
    accuracy below this baseline. [2]
  • [1] Han, Gates & Levin: From Language to Time:
    A Temporal Expression Anchorer (2006)
  • [2] Mani & Wilson: Robust Temporal Processing of
    News (2000)

12
TEA
  • Temporal Expression Anchorer: Han, Gates & Levin
    at CMU.
  • Calendar used as time ontology, dealing with
    various levels of granularity.
  • Processes TCNL (Time Calculus for Natural
    Language).
  • Identifies temporal expressions in input, and
    associates TIMEXs with their textually nearest
    verb.
  • Absolute and relative expressions are evaluated
    using TCNL:
  • "Friday last week" is split into "Friday" and
    "last week"
  • fri now - 1week fri,now - 1week
    now - 1fri
  • Constraint satisfaction based on a calendar model
    narrows the possible set of absolute dates.

13
Determining event durations
  • Given some normalised expressions, knowing event
    durations can greatly increase our reasoning
    ability.
  • Data can be taken from human annotators.
  • Determining a typical event duration is
    difficult:
  • "The dog ran up the hill"
  • "Linda had finished her cleaning"
  • This results in low inter-annotator agreement.
  • A simplified approach would allocate durations
    into two classes: shorter or longer than a day.
  • Possible to classify events this simply with 76%
    accuracy, using hypernym and local word PoS
    features.
  • Pan, Mulkar & Hobbs: Learning Event Durations
    from Event Descriptions (2006)

14
Reasoning about events
  • Introduction
  • Temporal closure
  • Minimal notations and temporal inference
  • Help from linguistic models

15
Reasoning about events
  • Annotations often only describe a subset of a
    document's temporal information, perhaps as a
    number of labelled events and times.
  • An annotation may also include some links between
    pieces of temporal information.
  • It is possible to infer data about relations
    between points, given a set of rules or logic,
    and some existing relations.
  • It is also possible to add detail and boundaries
    to an annotation based on linguistic features of
    the source text.
  • This ability to reason about events saves human
    annotators work, and allows us to maximise the
    available descriptions from their efforts.

16
Temporal closure
  • A temporal closure can be thought of as a graph:
  • Times and events are nodes; relations are edges.
  • Every time and event is connected to every other.
  • E.g.
  • t1 is Tuesday 5th May 2009
  • e1 is hearing this talk
  • We can say t1 before e1, thus giving a type to
    this relation.
  • A temporal closure includes relations between
    every node in the graph.
  • This can lead to very large amounts of data for
    only a moderate-sized document.

17
Minimal annotations
  • It is rare for every relation (graph edge) to be
    annotated. We can infer some relations:
  • (t1 before e1) and (e1 before e2) => (t1 before e2)
  • Inference can be used to complete a closure
    without specifying every relation's type.
  • When this applies, and no more relations can be
    removed, we have a minimal annotation.
  • For example:
  • Three nodes: e4, e5, e6
  • Closure has 3 possible relations
  • A minimal graph may just say:
  • (e4 after e5)
  • (e5 simultaneous e6)
  • To infer the closure, we simply need to add:
  • (e4 after e6), or (e6 before e4)
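
The step from a minimal graph to its closure can be sketched as a fixed-point computation. The composition table below covers only "before" and "simultaneous" and is an assumption made for this example; the function names are hypothetical, and "after" is rewritten as "before" with the arguments swapped.

```python
from itertools import product

# Assumed composition table: (a r1 b) and (b r2 c) give (a r3 c).
COMPOSE = {
    ("before", "before"): "before",
    ("before", "simultaneous"): "before",
    ("simultaneous", "before"): "before",
    ("simultaneous", "simultaneous"): "simultaneous",
}


def normalise(a, rel, b):
    """Rewrite 'after' as 'before' with the arguments reversed."""
    return (b, "before", a) if rel == "after" else (a, rel, b)


def close(relations):
    """Add inferred relations until nothing new can be derived."""
    known = {normalise(*r) for r in relations}
    while True:
        new = set()
        for a, rel, b in known:
            if rel == "simultaneous":  # simultaneous is symmetric
                new.add((b, rel, a))
        for (a, r1, b), (c, r2, d) in product(known, repeat=2):
            if b == c and a != d and (r1, r2) in COMPOSE:
                new.add((a, COMPOSE[(r1, r2)], d))
        if new <= known:
            return known
        known |= new


minimal = [("e4", "after", "e5"), ("e5", "simultaneous", "e6")]
closure = close(minimal)
```

For the three-node example, the closure gains (e6 before e4), which is the relation the slide says must be added.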

18
Relation inference
  • Allen's interval logic describes 13
    relationships, and provides a transitivity table
    for inferring a relation given two related ones.
  • Some inconsistent labellings are possible.
  • Backtracking over the initial graph should detect
    these cases.
  • A set of ten inference rules can be used:
  • Allen's 13 relations are reduced to just 3,
    including some reversal of parameters.
  • Only before, simultaneous and includes are used.
  • e9 after e10 => e10 before e9
  • These rules can be iteratively added to an agenda
    and used to reason with a database of approved
    relations.
  • For small graphs (< 2000 edges) we can assign
    types to around 10% of relations, given a human
    annotation.
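
One simple kind of inconsistent labelling is a cycle of "before" relations. The sketch below detects that case with a depth-first search; it is a stand-in chosen for brevity, not the backtracking procedure the slide refers to, and the function name is hypothetical. It assumes "after" relations have already been rewritten as "before" with reversed arguments.

```python
def has_before_cycle(before_edges):
    """Return True if the (a, b) pairs, read as 'a before b', contain a cycle."""
    graph = {}
    for a, b in before_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node):
        state[node] = ACTIVE
        for successor in graph[node]:
            if state[successor] == ACTIVE:  # back edge: a cycle of 'before'
                return True
            if state[successor] == UNSEEN and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in graph)


consistent = [("e9", "e10"), ("e10", "e11")]
inconsistent = consistent + [("e11", "e9")]
```

A cycle means some event would have to precede itself, so at least one of the labels on the cycle must be wrong.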

19
Applying Reichenbach
  • Reference time can provide a boundary on an
    event.
  • "John had eaten all the pies"
  • Event 1: eating
  • ET < RT < ST
  • "John had eaten all the pies when Annika arrived"
  • Event 2: arriving
  • Reference time is the same across the sentence.
  • ET < RT = ET2 < ST
  • Because we know that RT is after ET and equal to
    ET2, we can specify three temporal relations:
  • e1 before e2
  • e1 before ST
  • e2 before ST
  • Having a model for tenses allows us to
    confidently add relations to a temporal graph of
    a discourse.
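
The three relations can be read off mechanically once the Reichenbach points are placed in order. A minimal sketch, assuming the ordering is supplied by hand as a list of groups of simultaneous points; the function name and the data layout are hypothetical.

```python
from itertools import combinations


def relations_from_ordering(ordering, targets):
    """Derive pairwise relations between targets from ordered point groups.

    ordering: list of sets, earliest first; points in one set are simultaneous.
    targets: the points to relate, in reporting order.
    """
    position = {p: i for i, group in enumerate(ordering) for p in group}
    relations = []
    for a, b in combinations(targets, 2):
        if position[a] < position[b]:
            relations.append((a, "before", b))
        elif position[a] > position[b]:
            relations.append((b, "before", a))
        else:
            relations.append((a, "simultaneous", b))
    return relations


# "John had eaten all the pies when Annika arrived":
# e1 is the eating event, e2 the arriving event, RT coincides with e2.
ordering = [{"e1"}, {"RT", "e2"}, {"ST"}]
derived = relations_from_ordering(ordering, ["e1", "e2", "ST"])
```

Here derived holds e1 before e2, e1 before ST and e2 before ST, matching the relations listed on the slide.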

20
Representing temporal data
  • Introduction
  • TIMEX and TimeML
  • TCNL
  • T-BOX

21
Representing temporal data
  • Once we can identify temporal information, we
    need to store this information.
  • Temporal information is rich, and favours a
    format that can capture it well.
  • Aspect, polarity, tense, part of speech
  • Event class, event frequency
  • Hints about reference, speech and event time
  • Notation languages are available both for storing
    and working with this data.
  • These languages are new (under a decade old), and
    possibly not yet mature.

22
TIMEX
  • Standard for describing a time-specific
    expression.
  • Evolved through the MUC conferences and TERN,
    through TIMEX, TIMEX2 and TIMEX3.
  • TIMEX3 is currently used as the means of
    describing absolute times in TimeML.
  • <TIMEX3 tid="t43" type="DATE"
    value="1989-10-30" temporalFunction="false"
    functionInDocument="CREATION_TIME">10/30/89</TIMEX3>
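
A TIMEX3 element of this kind can be read with Python's standard XML parser. The sketch below assumes the tag is well-formed XML and that its value attribute is a plain ISO calendar date; the function name parse_timex3 and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SNIPPET = (
    '<TIMEX3 tid="t43" type="DATE" value="1989-10-30" '
    'temporalFunction="false" functionInDocument="CREATION_TIME">'
    '10/30/89</TIMEX3>'
)


def parse_timex3(xml_text):
    """Return the id, type, surface text and normalised date of one TIMEX3 tag."""
    element = ET.fromstring(xml_text)
    return {
        "tid": element.get("tid"),
        "type": element.get("type"),
        "text": element.text,
        # Assumption: value is a full YYYY-MM-DD date, as in this example.
        "value": date.fromisoformat(element.get("value")),
    }


timex = parse_timex3(SNIPPET)
```

The surface text "10/30/89" is ambiguous on its own; the value attribute carries the normalised reading.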

23
TimeML
  • SGML-based language for temporal annotation.
  • Allows identification of events and times.
  • Thorough provision of links between events and
    times:
  • TLINK: temporal, possibly including a SIGNAL tag
    to a linking word
  • SLINK: subordinate
  • ALINK: aspectual
  • ISO standard.

24
TimeML - TimeBank
  • Corpus of 181 newswire texts.
  • Temporal information annotated in TimeML:
  • 6383 TLINKs,
  • 7940 EVENTs,
  • 3004 kB in size.
  • Tiny compared to some other types of corpus.
  • Involved a large human annotator effort and a few
    different versions.
  • Biggest temporally annotated corpus.

25
TCNL
  • Developed at CMU with L. Levin.
  • Useful for reasoning between events.
  • Captures intensional meanings of expressions.
  • "Yesterday" becomes now-1day instead of
    something like 20090506
  • A set of operators is used to reason between
    operands:
  • +/- for forward/reverse shifting
  • @ for "in"
  • 2sun @ may is the second Sunday in May
  • & for distribution
  • 15hour & [wed:fri] is 3pm from Wednesday
    to Friday
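
The intended denotations of two of these expressions can be worked out with ordinary calendar arithmetic. This is not a TCNL interpreter: the two helper functions are hypothetical, and the year must be supplied explicitly because the expression for the second Sunday in May does not fix one.

```python
from datetime import date, timedelta


def shift_days(now, days):
    """Shift a date forwards (positive) or backwards (negative) by whole days."""
    return now + timedelta(days=days)


def nth_weekday_in_month(year, month, weekday, n):
    """Return the n-th given weekday (Monday=0 .. Sunday=6) of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    result = first + timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError("the month has no such weekday")
    return result


now = date(2009, 5, 7)
yesterday = shift_days(now, -1)                        # "yesterday"
second_sunday = nth_weekday_in_month(2009, 5, 6, 2)    # second Sunday in May 2009
```

Keeping the expression relative to "now" until evaluation is what lets the same formula be anchored against different speech times.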

26
T-BOX
  • Reading solid SGML is inconvenient for humans; a
    visual representation of events may be
    preferable.
  • Presenting events on a timeline may lead to
    unintentional over-specification.
  • Suggests a distance.
  • Many intervals are left with one end open
  • Plotting parts of a sentence in temporal order
    will destroy word order, making it hard to read
  • Annotating documents can be done more easily when
    events are grouped locally and visually
    connected.
  • T-BOX [1] from Brandeis specifies a set of rules for
    rendering events and their relations.
  • [1] Verhagen: Drawing TimeML Relations with
    T-BOX (2007)

27
T-BOX
  • Relations only exist between nodes that are
    directly connected or contained.
  • This suggests
  • - X contains Y
  • - Y is before Z
  • Drawing a temporal closure could provide a very
    cluttered and messy graph.
  • A set of guidelines is provided for reducing
    graphs to something more visually appealing:
  • Equivalence classes for some events.
  • Break cycles in graphs.
  • Remove derivable relations.

28
Evaluating annotations
  • Typical annotation evaluation.
  • Graph-based evaluation.

29
Evaluating annotations
  • Annotations can be compared in different ways.
  • When evaluating automated TIMEX or relation
    identification against a gold standard, we can
    measure precision and recall.
  • TimeBank is often used as a gold standard for
    training and evaluation of systems working in
    TimeML.
  • Evaluating TIMEX normalisation needs a different
    measure, as there are varying degrees of
    correctness available.
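
Precision and recall against a gold standard reduce to set arithmetic once each annotation has a comparable identity. A minimal sketch, assuming exact-match comparison of hashable items; the identifiers below are made up for the example and are not drawn from TimeBank.

```python
def precision_recall(system, gold):
    """Return (precision, recall) for system output against gold annotations."""
    system, gold = set(system), set(gold)
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical TIMEX identifiers: the system finds four, two of them correct.
gold = {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"}
system = {"t1", "t2", "t9", "t10"}
precision, recall = precision_recall(system, gold)
```

Exact matching treats every disagreement as a full error, which is why it suits identification better than normalisation, where a value can be partly right.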

30
Graph based evaluation
  • Based on the use of minimal temporal graphs.
  • Graphs between events (intervals) are converted
    into graphs between points:
  • Smaller set of relations, needing only = and <
  • Simpler algebra
  • Simultaneous points are grouped into nodes.
  • Graphs over the same set of points can then be
    compared, based on the number of node splits and
    merges needed to reach one from the other.
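
The first two steps, converting interval relations to point relations and grouping simultaneous points, can be sketched as follows. The point encodings of "before", "simultaneous" and "includes" are assumptions (with "includes" read as strict containment), the function names are hypothetical, and the split-and-merge comparison itself is not implemented here.

```python
def to_point_relations(a, rel, b):
    """Translate one interval relation into relations between end points."""
    if rel == "before":
        return [(f"{a}.end", "<", f"{b}.start")]
    if rel == "simultaneous":
        return [(f"{a}.start", "=", f"{b}.start"),
                (f"{a}.end", "=", f"{b}.end")]
    if rel == "includes":
        return [(f"{a}.start", "<", f"{b}.start"),
                (f"{b}.end", "<", f"{a}.end")]
    raise ValueError(f"unsupported relation: {rel}")


def group_simultaneous(point_relations):
    """Merge points linked by '=' into shared nodes (a small union-find)."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            p = parent[p]
        return p

    for x, op, y in point_relations:
        root_x, root_y = find(x), find(y)
        if op == "=" and root_x != root_y:
            parent[root_x] = root_y

    groups = {}
    for p in list(parent):
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values()]


point_relations = (to_point_relations("e5", "simultaneous", "e6")
                   + to_point_relations("e5", "before", "e4"))
nodes = group_simultaneous(point_relations)
```

With simultaneous points collapsed into single nodes, two annotations of the same text can be compared node by node.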

31
Summary
  • Background and models useful for temporal
    information extraction.
  • Technical approaches to temporal IE.
  • How to reason about events.
  • Temporal closure and minimal annotations.
  • Notations for temporal information.
  • Evaluating temporal graphs and annotations.

32
Questions