Proposition Bank: a resource of predicate-argument relations

Martha Palmer, Dan Gildea, Paul Kingsbury. University of Pennsylvania. February 26, 2002.

1
Proposition Bank: a resource of predicate-argument relations
Martha Palmer, Dan Gildea, Paul Kingsbury
University of Pennsylvania
February 26, 2002
ACE PI Meeting, Fairfield Inn, MD
2
Outline
  • Overview
  • Status Report
  • Outstanding Issues
  • Automatic Tagging: Dan Gildea
  • Details: Paul Kingsbury
  • Frames files
  • Annotator issues
  • Demo

3
Proposition Bank: Generalizing from Sentences to Propositions
meet(Somebody1, Somebody2)
. . .
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss(Powell, Zhu, return(X, plane))
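
The propositions above nest: the third argument of discuss is itself a proposition. A minimal sketch of one way to hold such structures in code follows. The `Proposition` class and its field names are illustrative assumptions, not part of PropBank.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Proposition:
    """A predicate applied to arguments; an argument may itself be a proposition."""
    predicate: str
    args: Tuple[Union[str, "Proposition"], ...]

    def __str__(self) -> str:
        # Render in the predicate(arg, arg, ...) notation used on the slide.
        return f"{self.predicate}({', '.join(str(a) for a in self.args)})"


meet = Proposition("meet", ("Powell", "Zhu"))
discuss = Proposition(
    "discuss",
    ("Powell", "Zhu", Proposition("return", ("X", "plane"))),
)

print(meet)
print(discuss)
```

Keeping arguments as a tuple preserves their order, which matters once positions are mapped to numbered labels such as Arg0 and Arg1.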
4
Penn English Treebank
  • 1.3 million words
  • Wall Street Journal and other sources
  • Tagged with Part-of-Speech
  • Syntactically Parsed
  • Widely used in NLP community
  • Available from Linguistic Data Consortium

5
A TreeBanked Sentence
(Figure: Treebank parse tree. Surviving node labels: S, NP-SBJ, VP, NP, PP-LOC. Surviving words: "Analysts", "T-1", "would".)
6
The same sentence, PropBanked
(Figure: the predicate "have been expecting" with Arg0 "Analysts" and an Arg1 whose text is not preserved.)
7
English PropBank
  • 1M words of Treebank over 2 years, May 2001 to May 2003
  • New semantic augmentations
  • Predicate-argument relations for verbs
  • Label arguments: Arg0, Arg1, Arg2, ...
  • First subtask: 300K-word financial subcorpus
  • (12K sentences, 29K predicates)
  • Spin-off: Guidelines (necessary for annotators)
  • English lexical resource: FRAMES FILES
  • 3500 verbs with labeled examples, rich semantics
  • http://www.cis.upenn.edu/ace/

8
English PropBank: Current Status
  • Frames files
  • 742 verb lemmas (932 including phrasal variants)
  • 363/899 VerbNet semi-automatic expansions
    (subtask/PB)
  • First subtask: 300K financial subcorpus
  • 22,595 unique predicates annotated out of 29K
    (80%)
  • 6K remaining (7 weeks at 1000/week, first pass)
  • 1005 verb lemmas out of 1700 (59%)
  • 700 remaining (3.5 months at 200/month)
  • PropBank (including some of Brown?)
  • 34,437 predicates annotated out of 118K (29%)
  • 1904 (1005 + 899) verb lemmas out of 3500 (54%)
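
The completion percentages and time estimates on this slide can be rechecked with a few lines of arithmetic. This is only a sketch: the helper name `pct` is an assumption, and the 29K and 118K totals are taken as exactly 29,000 and 118,000. Computed that way, the first ratio comes out slightly under the slide's rounded 80%.

```python
import math


def pct(done: int, total: int) -> int:
    """Percentage complete, rounded to the nearest whole number."""
    return round(100 * done / total)


# Financial subcorpus (first subtask)
predicates_left = 29_000 - 22_595               # the slide rounds this to 6K
weeks_left = math.ceil(predicates_left / 1000)  # at 1000 predicates per week
lemmas_left = 1700 - 1005                       # the slide rounds this to 700
months_left = round(lemmas_left / 200, 1)       # at 200 lemmas per month

print("subcorpus predicates:", pct(22_595, 29_000), "% done,", weeks_left, "weeks left")
print("subcorpus lemmas:", pct(1005, 1700), "% done,", months_left, "months left")

# Full PropBank
print("PropBank predicates:", pct(34_437, 118_000), "% done")
print("PropBank lemmas:", pct(1005 + 899, 3500), "% done")
```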

9
Projected delivery dates
  • Financial subcorpus
  • alpha release December, 2001
  • beta release June, 2002
  • adjudicated release Dec, 2002
  • Propbank
  • alpha release December, 2002
  • beta release Spring, 2003

10
English PropBank - Status
  • Sense tagging
  • 200 verbs with multiple rolesets
  • Sense tag this summer with undergrads using NSF
    funds
  • Still need to address
  • 3 usages of "have": imperative, possessive,
    auxiliary
  • "be", "become": predicate adjectives, predicate
    nominals

11
Automatic Labeling of Semantic Relations
  • Features
  • Predicate
  • Phrase Type
  • Parse Tree Path
  • Position (Before/after predicate)
  • Voice (active/passive)
  • Head Word
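
The six features listed above can be read off a parse tree. The sketch below is an illustration under stated assumptions, not the system presented in the talk: the `Node` class, the helper names, the VBP tag, and the hand-supplied head words and voice are all hypothetical. The parse tree path is built by climbing from the constituent to the lowest common ancestor and then descending to the predicate.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    head: str = ""  # head word, assumed to be supplied by a head finder
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def ancestors(node: Optional[Node]) -> List[Node]:
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def preorder(node: Node) -> Iterator[Node]:
    """Yield nodes in left-to-right, top-down order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def tree_path(constituent: Node, predicate: Node) -> str:
    """Labels up from the constituent to the common ancestor, then down to the predicate."""
    up, down = ancestors(constituent), ancestors(predicate)
    i = next(k for k, n in enumerate(up) if n in down)  # lowest common ancestor
    j = down.index(up[i])
    rising = "↑".join(n.label for n in up[: i + 1])
    falling = "".join("↓" + n.label for n in reversed(down[:j]))
    return rising + falling


def features(root: Node, constituent: Node, predicate: Node,
             lemma: str, voice: str) -> Dict[str, str]:
    order = list(preorder(root))
    before = order.index(constituent) < order.index(predicate)
    return {
        "predicate": lemma,
        "phrase_type": constituent.label,
        "path": tree_path(constituent, predicate),
        "position": "before" if before else "after",
        "voice": voice,  # assumed given, not detected here
        "head_word": constituent.head,
    }


# "Portfolio managers expect further declines in interest rates." (simplified tree)
subj = Node("NP", head="managers")
verb = Node("VBP", head="expect")
obj = Node("NP", head="declines")
root = Node("S", [subj, Node("VP", [verb, obj])])

print(features(root, subj, verb, "expect", "active"))
print(features(root, obj, verb, "expect", "active"))
```

The path is the only feature that needs tree traversal. The other five are direct lookups once the constituent and the predicate are identified.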

12
Example with Features
13
Labelling Accuracy: Known Boundaries
Accuracy of semantic role prediction for known
boundaries: the system is given the constituents
to classify. FrameNet examples (training/test)
are handpicked to be unambiguous.
14
Labelling Accuracy: Unknown Boundaries
Accuracy of semantic role prediction for unknown
boundaries: the system must identify the
constituents as arguments and give them the
correct roles.
15
Complete Sentence
Analysts have been expecting a GM-Jaguar pact
that *T*-1 would give the U.S. car maker an
eventual 30% stake in the British company and
create joint ventures that *T*-2 would produce an
executive-model range of cars.
expect(analysts, pact)
give(pact, car_maker, stake)
create(pact, joint_ventures)
produce(joint_ventures, range_of_cars)
16
Guidelines: Frames Files
  • Created manually: Paul Kingsbury
  • New framer: Olga Babko-Malaya (Ph.D., Rutgers,
    Linguistics)
  • Refer to VerbNet, WordNet and FrameNet
  • Currently in place for 787/986 verbs
  • Use "semantic role glosses" unique to each verb
    (map to Arg0, Arg1 labels appropriate to class)

17
Frames Example: expect
  • Roles
  • Arg0: expecter
  • Arg1: thing expected
  • Example: transitive, active
  • Portfolio managers expect further declines in
    interest rates.
  • Arg0: Portfolio managers
  • REL: expect
  • Arg1: further declines in interest rates
18
Frames Example: give
  • Roles
  • Arg0: giver
  • Arg1: thing given
  • Arg2: entity given to
  • Example: double object
  • The executives gave the chefs a standing
    ovation.
  • Arg0: The executives
  • REL: gave
  • Arg2: the chefs
  • Arg1: a standing ovation
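
A frames entry pairs each numbered label with a verb-specific gloss. As a rough sketch, the two entries shown for expect and give can be held in a dictionary and used to translate a labeled example into its glosses. The `FRAMES` layout and the `gloss` helper are assumptions for illustration; the real frames files have their own format.

```python
from typing import Dict

# Role glosses copied from the expect and give slides.
FRAMES: Dict[str, Dict[str, str]] = {
    "expect": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}


def gloss(verb: str, labeled: Dict[str, str]) -> Dict[str, str]:
    """Replace numbered labels with the verb's glosses; keep other labels (e.g. REL) as-is."""
    roles = FRAMES[verb]
    return {roles.get(label, label): text for label, text in labeled.items()}


example = {
    "Arg0": "The executives",
    "REL": "gave",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

for role, text in gloss("give", example).items():
    print(f"{role}: {text}")
```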

19
How are arguments numbered?
  • Examination of example sentences
  • Determination of required / highly preferred
    elements
  • Sequential numbering: Arg0 is typically the first
    argument, except for
  • ergative/unaccusative verbs (shake example)
  • Arguments mapped for "synonymous" verbs

20
Additional tags (arguments or adjuncts?)
  • Variety of ArgMs (Arg > 4)
  • TMP - when?
  • LOC - where at?
  • DIR - where to?
  • MNR - how?
  • PRP - why?
  • REC - himself, themselves, each other
  • PRD - this argument refers to or modifies another
  • ADV - others

21
Ergative/Unaccusative Verbs: rise
  • Roles
  • Arg1: logical subject, patient, thing rising
  • Arg2: EXT, amount risen
  • Arg3: start point
  • Arg4: end point
  • Sales rose 4% to $3.28 billion from $3.16 billion.

Note: Have to mention the preposition explicitly
(Arg3-from, Arg4-to), or could have used
ArgM-Source, ArgM-Goal. Arbitrary distinction.
22
Synonymous Verbs: add in the sense of rise
  • Roles
  • Arg1: logical subject, patient, thing
    rising/gaining/being added to
  • Arg2: EXT, amount risen
  • Arg4: end point
  • The Nasdaq composite index added 1.01 to
    456.6 on paltry volume.

23
Phrasal Verbs
  • Put together
  • Put in
  • Put off
  • Put on
  • Put out
  • Put up
  • ...
  • Phrasal variants account for an additional 200 "verbs"

24
Frames: Multiple Rolesets
  • Rolesets are not necessarily consistent between
    different senses of the same verb
  • Verb with multiple senses can have multiple
    frames, but not necessarily
  • Roles and mappings onto argument labels are
    consistent between different verbs that share
    similar argument structures (similar to FrameNet)
  • Levin / VerbNet classes
  • http://www.cis.upenn.edu/dgildea/Verbs/
  • Out of the 787 most frequent verbs
  • 1 Roleset - 521
  • 2 rolesets - 169
  • 3+ rolesets - 97 (includes light verbs)

25
Semi-automatic expansion of Frames
  • Experimenting with semi-automatic expansion
  • Find unframed members of Levin class in
    VerbNet--inherit frames from other member
  • 787 verbs manually framed
  • Can expand to 1200 using VerbNet
  • Will need hand correction
  • First experiment: automatic expansion provided
    90% coverage of data

26
More on Automatic Expansion
Destroy
  • Arg0: destroyer
  • Arg1: thing destroyed
  • Arg2: instrument of destruction
VerbNet class Destroy-44: annihilate, blitz, decimate,
demolish, destroy, devastate, exterminate,
extirpate, obliterate, ravage, raze, ruin, waste,
wreck
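
The expansion step can be pictured as copying a manually framed verb's roles to every unframed member of the same VerbNet class. The sketch below uses hypothetical names (`expand`, `inherited_from`, `needs_review`) and makes no claim about the actual tooling. Every proposed frame is flagged for review because, as the deck notes, inherited frames need hand correction.

```python
from typing import Dict, List

# Manually framed verbs (roles as on the Destroy slide).
frames: Dict[str, Dict[str, str]] = {
    "destroy": {
        "Arg0": "destroyer",
        "Arg1": "thing destroyed",
        "Arg2": "instrument of destruction",
    },
}

# VerbNet class membership as listed on the slide.
classes: Dict[str, List[str]] = {
    "Destroy-44": [
        "annihilate", "blitz", "decimate", "demolish", "destroy",
        "devastate", "exterminate", "extirpate", "obliterate",
        "ravage", "raze", "ruin", "waste", "wreck",
    ],
}


def expand(frames: Dict[str, Dict[str, str]],
           classes: Dict[str, List[str]]) -> Dict[str, dict]:
    """Propose frames for unframed verbs by copying a framed classmate's roles."""
    proposed = {}
    for members in classes.values():
        donor = next((v for v in members if v in frames), None)
        if donor is None:
            continue  # no framed member to inherit from
        for verb in members:
            if verb not in frames:
                proposed[verb] = {
                    "roles": dict(frames[donor]),
                    "inherited_from": donor,
                    "needs_review": True,  # inherited frames still need hand correction
                }
    return proposed


proposed = expand(frames, classes)
print(len(proposed), "verbs proposed")
print(proposed["waste"])
```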
27
What a Waste
  • Waste
  • Arg0: destroyer
  • Arg1: thing destroyed
  • Arg2: instrument of destruction
  • He didn't waste any time distancing himself from
    his former boss
  • Arg0: He
  • Arg1: any time
  • Arg2: ? distancing himself...

28
Trends in Argument Numbering
  • Arg0: agent
  • Arg1: direct object / theme / patient
  • Arg2: indirect object / benefactive / instrument
    / attribute / end state
  • Arg3: start point / benefactive / instrument /
    attribute
  • Arg4: end point

29
Morphology
  • Verbs also marked for tense/aspect/voice
  • Passive/Active
  • Perfect/Progressive
  • Third singular (is, has, does, was)
  • Present/Past/Future
  • Infinitives/Participles/Gerunds/Finites
  • Modals and negation marked as ArgMs

30
Annotation procedure
  • Extraction of all sentences with given verb
  • First pass: automatic tagging (Joseph
    Rosenzweig)
  • http://www.cis.upenn.edu/josephr/TIDES/index.html
    lexicon
  • Second pass: double-blind hand correction
  • Variety of backgrounds
  • Less syntactic training than for treebanking
  • Tagging tool highlights discrepancies
  • Third pass: Solomonization (adjudication)

31
Inter-Annotator Agreement
32
Annotator vs. Gold Standard
33
Financial Subcorpus Status
  • 1005 verbs framed (700 to go)
  • (742 + 363 VerbNet siblings)
  • 535 verbs first-passed
  • 22,595 unique tokens
  • Does not include 3000 tokens tagged for Senseval
  • 89 verbs second-passed
  • 7600 tokens
  • 42 verbs solomonized
  • 2890 tokens

34
Throughput
  • Framing: approximately 25 verbs/week
  • Olga will also start framing; jointly up to 50
    verbs/week
  • Annotation: approximately 50 predicates/hour
  • 20 hours of annotation a week, 1000 predicates/week
  • Solomonization: approximately 1 hour per verb,
    but will speed up with lower-frequency verbs.

35
Summary
  • Predicate-argument structure labels are arbitrary
    to a certain degree, but still consistent, and
    generic enough to be mappable to particular
    theoretical frameworks
  • Automatic tagging as a first pass makes the task
    feasible
  • Agreement and accuracy figures are reassuring
  • Financial subcorpus is 80% complete; beta release
    in June

36
Solomonization
  • Source tree: Intel told analysts that the company
    will resume shipments of the chips within two to
    three weeks.
  • Kate said:
  • arg0: Intel
  • arg1: the company will resume shipments of
    the chips within two to three weeks
  • arg2: analysts
  • Erwin said:
  • arg0: Intel
  • arg1: that the company will resume shipments
    of the chips within two to three weeks
  • arg2: analysts

37
Solomonization
  • Such loans to Argentina also remain classified as
    non-accruing, TRACE-1 costing the bank $ 10
    million TRACE-U of interest income in the
    third period.
  • Kate said:
  • arg1: TRACE-1
  • arg2: $ 10 million TRACE-U of interest
    income
  • arg3: the bank
  • argM-TMP: in the third period
  • Erwin said:
  • arg1: TRACE-1 -> Such loans to Argentina
  • arg2: $ 10 million TRACE-U of interest
    income
  • arg3: the bank
  • argM-TMP: in the third period

38
Solomonization
  • Also, substantially lower Dutch corporate tax
    rates helped the company keep its tax outlay flat
    relative to earnings growth.
  • Kate said:
  • arg0: the company
  • arg1: its tax outlay
  • arg3-PRD: flat
  • argM-MNR: relative to earnings growth
  • Katherine said:
  • arg0: the company
  • arg1: its tax outlay
  • arg3-PRD: flat
  • argM-ADV: relative to earnings growth
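
The disagreement in this last example, where the same phrase is tagged MNR by one annotator and ADV by the other, is the kind of discrepancy an adjudicator has to resolve. Below is a small sketch of how two annotations could be compared as sets of (label, text) pairs. The function name and data layout are assumptions, not the project's tagging tool.

```python
from typing import List, Tuple

Annotation = List[Tuple[str, str]]  # (label, text) pairs for one predicate


def discrepancies(first: Annotation, second: Annotation):
    """Return the pairs found only in the first annotation and only in the second."""
    a, b = set(first), set(second)
    return sorted(a - b), sorted(b - a)


kate: Annotation = [
    ("arg0", "the company"),
    ("arg1", "its tax outlay"),
    ("arg3-PRD", "flat"),
    ("argM-MNR", "relative to earnings growth"),
]
katherine: Annotation = [
    ("arg0", "the company"),
    ("arg1", "its tax outlay"),
    ("arg3-PRD", "flat"),
    ("argM-ADV", "relative to earnings growth"),
]

only_kate, only_katherine = discrepancies(kate, katherine)
print("Kate only:", only_kate)
print("Katherine only:", only_katherine)
```

Comparing whole (label, text) pairs catches both kinds of disagreement seen in the Solomonization slides: a different label on the same span, and a different span under the same label.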