1
Information Extraction from Single and Multiple Sentences
Mark Stevenson
Department of Computer Science, University of Sheffield, UK
2
Introduction
  • Information Extraction is often viewed as the
    process of identifying events described in text
  • Generally accepted that an event may be described
    across more than one sentence
  • Pace American Group Inc. said it notified
    two top executives it intends to dismiss them
    because an internal investigation found evidence
    of self-dealing and undisclosed financial
    relationships.
  • The executives are Don H. Pace, cofounder,
    president and chief executive officer and Greg
    S. Kaplan, senior vice president and chief
    financial officer.

3
Sentence-limited Approaches
  • Some approaches treat each sentence in isolation,
    extracting only the events described within it
  • Zelenko et al. (2003) - SVM
  • Soderland (1999) - rule generalisation
  • Chieu and Ng (2002) - maximum entropy
  • Yangarber et al. (2000) - pattern learning
  • This restriction often makes IE more tractable
    for machine learning, but raises two questions:
  • How can results be compared against systems which
    extract all events?
  • How much can be achieved by just analysing within
    sentences?

4
Experiment
  • Compare two alternative annotations of the same
    corpus
  • Complete annotation identifies all events
    described in a document
  • Within sentence annotation marks only events
    described within a sentence
  • The corpus used is the MUC-6 evaluation texts
  • Documents describe management succession events
  • Complete annotation produced as part of formal
    evaluation
  • Within sentence annotation due to Soderland
    (1999).

5
Event Definition
  • Two annotations of this corpus have different
    definitions of what constitutes an event
  • Events in both annotations are transformed into a
    common representation scheme
  • Contains information encoded by both schemes
  • Allows comparison
  • Provides method for defining what constitutes an
    event
  • Each event is stored as a database entry
    consisting of four fields
  • type: either person_in or person_out
  • person, post, organisation
  • Minimal event description: at least two elements
    (see the sketch below)

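A concrete way to picture this common representation: the
sketch below is an illustrative assumption in Python (the
Event class and its method names are invented here, not
taken from the paper).

from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Event:
    # "type" is person_in or person_out; the other three fields
    # each hold a set of alternative fillers (a name plus aliases)
    type: Optional[str] = None
    person: Optional[Set[str]] = None
    post: Optional[Set[str]] = None
    org: Optional[Set[str]] = None

    def filled_fields(self):
        return {f for f in ("type", "person", "post", "org")
                if getattr(self, f) is not None}

    def is_minimal(self):
        # minimal event description: at least two elements
        return len(self.filled_fields()) >= 2

For example, Event(type="person_in", post={"chairman"}) already
counts as a minimal event, while an event with only a post field
would not.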
6
MUC Annotation
  • Annotations stored in complex nested template
    structure
  • Core SUCCESSION_EVENT refers to a specific
    position
  • Contains IN_AND_OUT events, the movement of a
    single executive relative to that position
  • Aliases list alternative ways of referring to
    event objects
  • Representation does not directly link event
    objects to the text, so it is difficult to
    directly compute the proportion of events
    described within a sentence

7
<SUCCESSION_EVENT-2> :=
    SUCCESSION_ORG: <ORGANIZATION-1>
    POST: "chairman"
    IN_AND_OUT: <IN_AND_OUT-4>
<IN_AND_OUT-4> :=
    IO_PERSON: <PERSON-1>
    NEW_STATUS: IN
    OTHER_ORG: <ORGANIZATION-1>
<ORGANIZATION-1> :=
    ORG_NAME: "McCann-Erickson"
    ORG_ALIAS: "McCann"
<PERSON-1> :=
    PER_NAME: "John J. Dooner Jr."
    PER_ALIAS: "John Dooner", "Dooner"

type(person_in) post(chairman)
org(McCann-Erickson | McCann)
person(John J. Dooner Jr. | John Dooner | Dooner)
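The mapping from the nested template to this flat entry can be
sketched as follows, reusing the Event class from the earlier
sketch. This is a hypothetical reading, assuming the template has
already been parsed into Python dicts; none of the helper names
come from the paper.

def flatten(succession, in_and_out, objects):
    # "objects" maps template ids such as "<PERSON-1>" to their slots
    org = objects[succession["SUCCESSION_ORG"]]
    person = objects[in_and_out["IO_PERSON"]]
    return Event(
        type="person_in" if in_and_out["NEW_STATUS"] == "IN"
             else "person_out",
        person={person["PER_NAME"], *person.get("PER_ALIAS", [])},
        post={succession["POST"]},
        org={org["ORG_NAME"], *org.get("ORG_ALIAS", [])},
    )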
8
Within Sentence Annotation
  • Soderland (1999) produced an alternative
    annotation of the same corpus
  • Annotation is linked directly to source sentence
    so only events described within a single sentence
    are included
  • Annotations use a flat structure inspired by case
    frames

9
Within Sentence Annotation Example
  • Daniel Glass was named president and chief
    executive officer of EMI Record Group
  • @@TAGS Succession {PersonIn DANIEL GLASS}
    {Post PRESIDENT AND CHIEF EXECUTIVE OFFICER}
    {Org EMI RECORD GROUP}

event 1                  event 2
type(person_in)          type(person_in)
person(Daniel Glass)     person(Daniel Glass)
org(EMI Records Group)   org(EMI Records Group)
post(president)          post(chief executive officer)
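Note that the single conjoined post yields one event per position.
A sketch of that split, using the Event class above; the naive
split on "and" is an assumption for illustration only.

def events_from_frame(person, post, org):
    # one event per position named in a conjoined post
    for single_post in post.split(" and "):
        yield Event(type="person_in", person={person},
                    post={single_post}, org={org})

list(events_from_frame("Daniel Glass",
                       "president and chief executive officer",
                       "EMI Records Group"))
# -> two person_in events, one with post(president) and one
#    with post(chief executive officer)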
10
Matching
  • Allow two levels of matches between events in the
    two sets
  • Full match
  • Events contain the same fields and each field
    shares at least one filler
  • Partial match
  • Events share some fields and each of those fields
    shares at least one filler
  • Matching process compares each event in the within
    sentence set with each event in the MUC event set
  • Only one-to-one mappings are allowed for full
    matches, but many within sentence events can
    partially match onto a single MUC event
    (see the sketch below)

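The two match levels have a direct procedural reading. A minimal
sketch, assuming the Event representation from earlier (the
function names are invented):

def shares_filler(a, b, f):
    va, vb = getattr(a, f), getattr(b, f)
    if f == "type":
        return va == vb        # type holds a single value, not a set
    return bool(va & vb)       # any filler (alias) in common counts

def full_match(a, b):
    # same fields filled, and every field shares at least one filler
    return (a.filled_fields() == b.filled_fields()
            and all(shares_filler(a, b, f) for f in a.filled_fields()))

def partial_match(a, b):
    # some fields in common, and each common field shares a filler
    common = a.filled_fields() & b.filled_fields()
    return bool(common) and all(shares_filler(a, b, f) for f in common)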
11
Fully matching events
  Complete (MUC) event:  type(person_in)
                         person(R. Wayne Diesel | Diesel)
                         org(Mechanical Technology Inc. | Mechanical Technology)
                         post(chief executive officer)
  Within sentence event: type(person_in)
                         person(R. Wayne Diesel)
                         org(Mechanical Technology)
                         post(chief executive officer)

Partially matching events
  Complete (MUC) event:  type(person_in)
                         person(R. Wayne Diesel | Diesel)
                         org(Mechanical Technology Inc. | Mechanical Technology)
                         post(chief executive officer)
  Within sentence event: type(person_in)
                         person(R. Wayne Diesel)
                         post(chief executive officer)
12
Event Analysis
            All events    Within sentence events
Count       276           248
Full        40.6% (112)   45.2% (112)
Partial     39.1% (108)   47.6% (118)
No match    20.3% (56)     7.3% (18)
13
Mismatching Events
Breakdown of the 18 within sentence events with no match in
the complete (MUC) annotation:

Spurious events in limited annotation set, not matched
to any event in MUC corpus: 9
Events mentioned in limited annotation and text but not
in MUC data (strict guidelines): 8
Event mentioned in limited annotation and text but not
in MUC data: 1
14
Event Field Analysis
        Full match   Partial match   No match   TOTAL
Type    112/112      100/108         0/56       76.8%
Person  112/112      100/108         0/56       76.8%
Org     112/112        6/108         0/53       43.2%
Post    111/111       74/108         0/50       68.8%
Total   447/447      280/432         0/215      66.5%

(Each cell counts correctly extracted fillers for that field;
the TOTAL column is the percentage across all three match
categories, e.g. for Type: (112 + 100 + 0) / (112 + 108 + 56)
= 76.8%.)
15
Text Style
  • Variation between event fields can be explained by
    the structure of documents in this corpus
  • Succession event often introduced at start of
    document, and is generally complete
  • Washington Post Co. said Katherine Graham
    stepped down as chairman and will be succeeded by
    her son, Donald E. Graham, the company's chief
    executive.
  • Further events may not be described fully
  • Alan G. Spoon, 42, will succeed Mr. Graham as
    chief executive of the company.
  • Mr. Jones is succeeded by Mr. Green.
  • Mr. Smith assumed the role of CEO.

16
Conclusion
  • Analysis of a commonly used IE evaluation corpus
    showed that only 40.6% of events are fully
    described within a single sentence.
  • A larger proportion of the events are partially
    described, but there is wide variation between the
    event fields due to document style.
  • These results should be borne in mind during the
    design and evaluation of IE systems.
  • There are additional implications for
    summarisation systems which select sentences, and
    for question answering systems.