Title: Proposition Bank: a resource of predicate-argument relations
1 Proposition Bank: a resource of predicate-argument relations
Martha Palmer, Dan Gildea, Paul Kingsbury
University of Pennsylvania
February 26, 2002
ACE PI Meeting, Fairfield Inn, MD
2 Outline
- Overview
- Status Report
- Outstanding Issues
- Automatic Tagging (Dan Gildea)
- Details (Paul Kingsbury)
- Frames files
- Annotator issues
- Demo
3 Proposition Bank: Generalizing from Sentences to Propositions
meet(Somebody1, Somebody2)
. . .
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss(Powell, Zhu, return(X, plane))
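One way to hold such propositions in a program is as a small recursive record, sketched below. The class name `Proposition` and its fields are illustrative assumptions, not part of PropBank; the point is only that an argument can itself be a proposition, as with return(X, plane).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Proposition:
    """Hypothetical container: a predicate applied to its arguments.

    An argument is either a plain string or another Proposition.
    """
    predicate: str
    args: Tuple[Union[str, "Proposition"], ...]

    def __str__(self) -> str:
        # Render in the slide's predicate(arg, arg, ...) notation.
        return f"{self.predicate}({', '.join(str(a) for a in self.args)})"


meet = Proposition("meet", ("Powell", "Zhu"))
discuss = Proposition(
    "discuss", ("Powell", "Zhu", Proposition("return", ("X", "plane")))
)

print(meet)     # meet(Powell, Zhu)
print(discuss)  # discuss(Powell, Zhu, return(X, plane))
```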
4 Penn English Treebank
- 1.3 million words
- Wall Street Journal and other sources
- Tagged with Part-of-Speech
- Syntactically Parsed
- Widely used in NLP community
- Available from Linguistic Data Consortium
5 A TreeBanked Sentence
[Figure: Treebank parse tree. Recoverable node labels: S, VP, NP-SBJ, NP, PP-LOC; recoverable leaves: Analysts, T-1, would.]
6 The same sentence, PropBanked
[Figure: the same tree with PropBank labels. Recoverable labels: predicate "have been expecting", Arg0 (Analysts), Arg1.]
7 English PropBank
- 1M words of Treebank over 2 years, May 2001 to May 2003
- New semantic augmentations
- Predicate-argument relations for verbs
- label arguments: Arg0, Arg1, Arg2, ...
- First subtask: 300K word financial subcorpus
- (12K sentences, 29K predicates)
- Spin-off: Guidelines (necessary for annotators)
- English lexical resource: FRAMES FILES
- 3500 verbs with labeled examples, rich semantics
- http://www.cis.upenn.edu/ace/
8 English PropBank: Current Status
- Frames files
- 742 verb lemmas (932 including phrasal variants)
- 363/899 VerbNet semi-automatic expansions (subtask/PB)
- First subtask: 300K financial subcorpus
- 22,595 unique predicates annotated out of 29K (80%)
- 6K remaining (7 weeks at 1,000 per week, first pass)
- 1005 verb lemmas out of 1700 (59%)
- 700 remaining (3.5 months at 200 per month)
- PropBank (including some of Brown?)
- 34,437 predicates annotated out of 118K (29%)
- 1904 (1005 + 899) verb lemmas out of 3500 (54%)
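The completion figures on this slide can be recomputed from the counts, as in the sketch below. It assumes the bare parenthesized numbers are percentages and that "29K" and "118K" stand for the rounded totals 29,000 and 118,000, so the results are approximate. The helper name `percent_done` is hypothetical.

```python
def percent_done(done: int, total: int) -> int:
    """Completion as a whole-number percentage."""
    return round(100 * done / total)


# (annotated or framed so far, total); totals are the slide's rounded figures.
status = {
    "subcorpus predicates": (22_595, 29_000),
    "subcorpus verb lemmas": (1_005, 1_700),
    "PropBank predicates": (34_437, 118_000),
    "PropBank verb lemmas": (1_005 + 899, 3_500),
}

for name, (done, total) in status.items():
    print(f"{name}: {percent_done(done, total)}%")
```

With the rounded 29K total the first ratio comes out at 78%, close to the 80 quoted on the slide; the other three reproduce 59, 29 and 54.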
9 Projected delivery dates
- Financial subcorpus
- alpha release December, 2001
- beta release June, 2002
- adjudicated release Dec, 2002
- Propbank
- alpha release December, 2002
- beta release Spring, 2003
10English PropBank - Status
- Sense tagging
- 200 verbs with multiple rolesets
- sense tag this summer with undergrads using NSF funds
- Still need to address
- 3 usages of "have": imperative, possessive, auxiliary
- be, become: predicate adjectives, predicate nominals
11 Automatic Labeling of Semantic Relations
- Features
- Predicate
- Phrase Type
- Parse Tree Path
- Position (Before/after predicate)
- Voice (active/passive)
- Head Word
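The feature list above can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Node` class, the `^`/`!` separators standing in for the up and down steps of the parse tree path, the hand-built tree for "Portfolio managers expect further declines in interest rates", the VBP tag, and the hand-supplied position, voice and head word. It is not the tagger described in the talk.

```python
class Node:
    """Minimal parse-tree node with parent links (hypothetical helper)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def tree_path(constituent, predicate):
    """Labels up from the constituent to the lowest common ancestor,
    then down to the predicate. '^' marks a step up, '!' a step down."""
    up_chain = ancestors(constituent)
    down_chain = ancestors(predicate)
    for i, node in enumerate(up_chain):
        for j, other in enumerate(down_chain):
            if node is other:  # lowest common ancestor found
                up = "^".join(n.label for n in up_chain[: i + 1])
                down = "".join("!" + n.label for n in reversed(down_chain[:j]))
                return up + down
    raise ValueError("nodes are not in the same tree")


def features(constituent, predicate, lemma, position, voice, head_word):
    """Bundle the six features from the slide for one candidate constituent."""
    return {
        "predicate": lemma,
        "phrase_type": constituent.label.split("-")[0],  # drop function tag
        "path": tree_path(constituent, predicate),
        "position": position,
        "voice": voice,
        "head_word": head_word,
    }


# Hand-built tree: (S (NP-SBJ ...) (VP (VBP expect) (NP ...)))
subj, verb, obj = Node("NP-SBJ"), Node("VBP"), Node("NP")
root = Node("S", [subj, Node("VP", [verb, obj])])

print(features(subj, verb, "expect", "before", "active", "managers"))
print(features(obj, verb, "expect", "after", "active", "declines"))
```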
12 Example with Features
13 Labelling Accuracy: Known Boundaries
Accuracy of semantic role prediction for known boundaries: the system is given the constituents to classify. FrameNet examples (training/test) are handpicked to be unambiguous.
14 Labelling Accuracy: Unknown Boundaries
Accuracy of semantic role prediction for unknown boundaries: the system must identify the constituents as arguments and give them the correct roles.
15 Complete Sentence
Analysts have been expecting a GM-Jaguar pact that T-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that T-2 would produce an executive-model range of cars.
expect(analysts, pact)
give(pact, car_maker, stake)
create(pact, joint_ventures)
produce(joint_ventures, range_of_cars)
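The predicate(argument, ...) notation used for these propositions is regular enough to read mechanically. Below is a minimal recursive-descent sketch; the function name `parse_term` and the (name, args) tuple output are assumptions for illustration, and the parser only handles names made of letters, digits and underscores.

```python
def parse_term(text, pos=0):
    """Parse an atom or name(arg, ...) starting at pos.

    Returns (term, next_pos), where term is a string for an atom
    or a (name, [args]) tuple for a predicate.
    """
    start = pos
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    name = text[start:pos]
    if not name:
        raise ValueError(f"expected a name at position {start}")
    if text[pos:pos + 1] != "(":
        return name, pos  # plain atom
    pos += 1  # skip "("
    args = []
    while True:
        while text[pos:pos + 1] == " ":
            pos += 1
        arg, pos = parse_term(text, pos)  # arguments may be nested
        args.append(arg)
        while text[pos:pos + 1] == " ":
            pos += 1
        if text[pos:pos + 1] == ",":
            pos += 1
        elif text[pos:pos + 1] == ")":
            return (name, args), pos + 1
        else:
            raise ValueError(f"expected ',' or ')' at position {pos}")


propositions = [
    "expect(analysts, pact)",
    "give(pact, car_maker, stake)",
    "create(pact, joint_ventures)",
    "produce(joint_ventures, range_of_cars)",
]
parsed = [parse_term(p)[0] for p in propositions]

# Which predicates does "pact" take part in?
with_pact = [name for name, args in parsed if "pact" in args]
print(with_pact)  # ['expect', 'give', 'create']
```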
16 Guidelines: Frames Files
- Created manually - Paul Kingsbury
- new framer: Olga Babko-Malaya (Ph.D., Rutgers, Linguistics)
- Refer to VerbNet, WordNet and FrameNet
- Currently in place for 787/986 verbs
- Use "semantic role glosses" unique to each verb (map to Arg0, Arg1 labels appropriate to class)
17 Frames Example: expect
- Roles
- Arg0: expecter
- Arg1: thing expected
- Example: Transitive, active
- Portfolio managers expect further declines in interest rates.
- Arg0: Portfolio managers
- REL: expect
- Arg1: further declines in interest rates
18 Frames Example: give
- Roles
- Arg0: giver
- Arg1: thing given
- Arg2: entity given to
- Example: double object
- The executives gave the chefs a standing ovation.
- Arg0: The executives
- REL: gave
- Arg2: the chefs
- Arg1: a standing ovation
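The two frames above can be written down as a lookup from verb to role glosses, which also lets an annotation be checked against its frame. This is a sketch only: the `FRAMES` dictionary layout and the function `gloss_annotation` are assumptions, not the actual frames-file format.

```python
# Role glosses copied from the "expect" and "give" slides.
FRAMES = {
    "expect": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}


def gloss_annotation(verb, annotation):
    """Pair each labeled span with the verb-specific role gloss.

    Raises KeyError if a label is not defined in the verb's frame.
    """
    roles = FRAMES[verb]
    glossed = []
    for label, span in annotation:
        if label == "REL":
            glossed.append((label, "predicate", span))
        elif label in roles:
            glossed.append((label, roles[label], span))
        else:
            raise KeyError(f"{label} is not a role of {verb!r}")
    return glossed


example = [
    ("Arg0", "The executives"),
    ("REL", "gave"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]
for label, gloss, span in gloss_annotation("give", example):
    print(f"{label} ({gloss}): {span}")
```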
19 How are arguments numbered?
- Examination of example sentences
- Determination of required / highly preferred elements
- Sequential numbering, Arg0 is typical first argument, except
- ergative/unaccusative verbs (shake example)
- Arguments mapped for "synonymous" verbs
20 Additional tags (arguments or adjuncts?)
- Variety of ArgMs (Arg > 4)
- TMP - when?
- LOC - where at?
- DIR - where to?
- MNR - how?
- PRP - why?
- REC - himself, themselves, each other
- PRD - this argument refers to or modifies another
- ADV - others
21 Ergative/Unaccusative Verbs: rise
- Roles
- Arg1: Logical subject, patient, thing rising
- Arg2: EXT, amount risen
- Arg3: start point
- Arg4: end point
- Sales rose 4% to $3.28 billion from $3.16 billion.
Note: Have to mention prep explicitly, Arg3-from, Arg4-to, or could have used ArgM-Source, ArgM-Goal. Arbitrary distinction.
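The note about mentioning the preposition explicitly can be illustrated with a small sketch that labels the prepositional phrases of the rise example. The mapping `PREP_TO_ARG`, the function `label_pp`, and the percent and dollar signs in the spans are assumptions for illustration.

```python
# Preposition-to-argument mapping for "rise", following the slide's note.
PREP_TO_ARG = {"from": "Arg3", "to": "Arg4"}


def label_pp(phrase):
    """Label a prepositional phrase as ArgN-prep based on its first word."""
    prep = phrase.split()[0].lower()
    return f"{PREP_TO_ARG[prep]}-{prep}"


annotation = [
    ("Arg1", "Sales"),
    ("REL", "rose"),
    ("Arg2-EXT", "4%"),
    (label_pp("to $3.28 billion"), "to $3.28 billion"),
    (label_pp("from $3.16 billion"), "from $3.16 billion"),
]
for label, span in annotation:
    print(f"{label}: {span}")
```

Under the alternative the slide mentions, the last two spans would carry ArgM-Goal and ArgM-Source instead of numbered labels.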
22 Synonymous Verbs: add in sense rise
- Roles
- Arg1: Logical subject, patient, thing rising/gaining/being added to
- Arg2: EXT, amount risen
- Arg4: end point
- The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
23 Phrasal Verbs
- Put together
- Put in
- Put off
- Put on
- Put out
- Put up
- ...
- Accounts for additional 200 "verbs"
24 Frames: Multiple Rolesets
- Rolesets are not necessarily consistent between different senses of the same verb
- Verb with multiple senses can have multiple frames, but not necessarily
- Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures, similar to FrameNet
- Levin / VerbNet classes
- http://www.cis.upenn.edu/dgildea/Verbs/
- Out of the 787 most frequent verbs
- 1 roleset - 521
- 2 rolesets - 169
- 3 rolesets - 97 (includes light verbs)
25 Semi-automatic expansion of Frames
- Experimenting with semi-automatic expansion
- Find unframed members of Levin class in VerbNet; inherit frames from other member
- 787 verbs manually framed
- Can expand to 1200 using VerbNet
- Will need hand correction
- First experiment, automatic expansion provided 90% coverage of data
26 More on Automatic Expansion
Destroy:
Arg0: destroyer
Arg1: thing destroyed
Arg2: instrument of destruction
VerbNet class Destroy-44: annihilate, blitz, decimate, demolish, destroy, devastate, exterminate, extirpate, obliterate, ravage, raze, ruin, waste, wreck
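The inheritance step can be sketched as follows: every unframed member of the class receives a copy of the roles of an already framed member, flagged for hand correction. The function `expand_frames` and the record layout are assumptions for illustration, not the project's actual expansion code.

```python
DESTROY_44 = [
    "annihilate", "blitz", "decimate", "demolish", "destroy", "devastate",
    "exterminate", "extirpate", "obliterate", "ravage", "raze", "ruin",
    "waste", "wreck",
]

# Manually framed verbs (only "destroy" here, as on the slide).
manual_frames = {
    "destroy": {
        "Arg0": "destroyer",
        "Arg1": "thing destroyed",
        "Arg2": "instrument of destruction",
    },
}


def expand_frames(manual, class_members):
    """Give each unframed class member the roles of a framed member."""
    framed = [verb for verb in class_members if verb in manual]
    if not framed:
        return {}  # nothing to inherit from
    donor = framed[0]
    return {
        verb: {
            "roles": dict(manual[donor]),  # copy, so later edits stay local
            "inherited_from": donor,
            "needs_review": True,          # hand correction still required
        }
        for verb in class_members
        if verb not in manual
    }


expanded = expand_frames(manual_frames, DESTROY_44)
print(len(expanded), "verbs inherit the frame of 'destroy'")
print(expanded["waste"]["roles"])
```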
27 What a Waste
- Waste
- Arg0: destroyer
- Arg1: thing destroyed
- Arg2: instrument of destruction
- He didn't waste any time distancing himself from his former boss
- Arg0: He
- Arg1: any time
- Arg2: ? distancing himself...
28 Trends in Argument Numbering
- Arg0: agent
- Arg1: direct object / theme / patient
- Arg2: indirect object / benefactive / instrument / attribute / end state
- Arg3: start point / benefactive / instrument / attribute
- Arg4: end point
29 Morphology
- Verbs also marked for tense/aspect/voice
- Passive/Active
- Perfect/Progressive
- Third singular (is has does was)
- Present/Past/Future
- Infinitives/Participles/Gerunds/Finites
- Modals and negation marked as ArgMs
30 Annotation procedure
- Extraction of all sentences with given verb
- First pass: Automatic tagging (Joseph Rosenzweig)
- http://www.cis.upenn.edu/josephr/TIDES/index.html lexicon
- Second pass: Double blind hand correction
- Variety of backgrounds
- Less syntactic training than for treebanking
- Tagging tool highlights discrepancies
- Third pass: Solomonization (adjudication)
31 Inter-Annotator Agreement
32 Annotator vs. Gold Standard
33 Financial Subcorpus Status
- 1005 verbs framed (700 to go)
- (742 + 363 VerbNet siblings)
- 535 verbs first-passed
- 22,595 unique tokens
- Does not include 3000 tokens tagged for Senseval
- 89 verbs second-passed
- 7600 tokens
- 42 verbs solomonized
- 2890 tokens
34 Throughput
- Framing: approximately 25 verbs/week
- Olga will also start framing; jointly, up to 50 verbs/wk
- Annotation: approximately 50 predicates/hour
- 20 hours of annotation a week, 1000 predicates/wk
- Solomonization: approximately 1 hour per verb, but will speed up with lower frequency verbs.
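The throughput figures translate into time estimates with simple arithmetic, sketched below. The remaining-work numbers are taken from the status slides (22,595 of roughly 29,000 predicates annotated, 700 verbs left to frame). The rounded 29,000 total and the helper name `weeks_needed` are assumptions.

```python
import math

PREDICATES_PER_HOUR = 50
ANNOTATION_HOURS_PER_WEEK = 20

predicates_per_week = PREDICATES_PER_HOUR * ANNOTATION_HOURS_PER_WEEK


def weeks_needed(remaining, per_week):
    """Whole weeks required to finish the remaining items at a steady rate."""
    return math.ceil(remaining / per_week)


remaining_predicates = 29_000 - 22_595  # first-pass annotation still to do
remaining_verbs = 700                   # verbs still to be framed

print(predicates_per_week)                                     # 1000
print(weeks_needed(remaining_predicates, predicates_per_week)) # 7
print(weeks_needed(remaining_verbs, 25))                       # 28
print(weeks_needed(remaining_verbs, 50))                       # 14
```

At the joint framing rate of 50 verbs a week, the 14 weeks come to about 3.5 months, in line with the estimate on the status slide.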
35 Summary
- Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks
- Automatic tagging as a first pass makes the task feasible
- Agreement and accuracy figures are reassuring
- Financial subcorpus is 80% complete, beta-release June
36 Solomonization
- Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks.
- Kate said
- arg0: Intel
- arg1: the company will resume shipments of the chips within two to three weeks
- arg2: analysts
- Erwin said
- arg0: Intel
- arg1: that the company will resume shipments of the chips within two to three weeks
- arg2: analysts
37 Solomonization
- Such loans to Argentina also remain classified as non-accruing, TRACE-1 costing the bank $10 million TRACE-U of interest income in the third period.
- Kate said
- arg1: TRACE-1
- arg2: $10 million TRACE-U of interest income
- arg3: the bank
- argM-TMP: in the third period
- Erwin said
- arg1: TRACE-1 -> Such loans to Argentina
- arg2: $10 million TRACE-U of interest income
- arg3: the bank
- argM-TMP: in the third period
38 Solomonization
- Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.
- Kate said
- arg0: the company
- arg1: its tax outlay
- arg3-PRD: flat
- argM-MNR: relative to earnings growth
- Katherine said
- arg0: the company
- arg1: its tax outlay
- arg3-PRD: flat
- argM-ADV: relative to earnings growth
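Finding the points that need adjudication amounts to comparing two annotators' labels span by span. The sketch below does this for the last example; the function `discrepancies` is a hypothetical illustration, not the tagging tool mentioned in the annotation procedure.

```python
def discrepancies(first, second):
    """Return {span: (label_first, label_second)} wherever the two
    annotations disagree; a missing span shows up with label None."""
    by_span_first = {span: label for label, span in first}
    by_span_second = {span: label for label, span in second}
    result = {}
    for span in by_span_first.keys() | by_span_second.keys():
        a = by_span_first.get(span)
        b = by_span_second.get(span)
        if a != b:
            result[span] = (a, b)
    return result


kate = [
    ("arg0", "the company"),
    ("arg1", "its tax outlay"),
    ("arg3-PRD", "flat"),
    ("argM-MNR", "relative to earnings growth"),
]
katherine = [
    ("arg0", "the company"),
    ("arg1", "its tax outlay"),
    ("arg3-PRD", "flat"),
    ("argM-ADV", "relative to earnings growth"),
]

print(discrepancies(kate, katherine))
# {'relative to earnings growth': ('argM-MNR', 'argM-ADV')}
```

Span-boundary disagreements, such as whether "that" belongs to arg1 in the Intel example, would surface here as two unmatched spans, each paired with None.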