Title: Human summary production operations for computer-aided summarisation
1Human summary production operations for
computer-aided summarisation
- Laura Hasler
- University of Wolverhampton
- 30 May 2007
2Overview
- Original contributions of my thesis
- Human summarisation (HS)
- Automatic summarisation (AS)
- Computer-aided summarisation (CAS)
- Classification of human summary production
operations
- Guidelines derived from the classification
- Evaluation of guidelines and classification
3Original contributions
- Reliable ways of creating abstracts from
extracts, improving coherence/readability
- Set of guidelines to annotate source texts for
important information, resulting in extracts for
corpus of extract/abstract pairs
- Corpus of extract/abstract pairs for analysis
- Corpus-based classification of human summary
production operations that successfully transform
extracts into abstracts by improving coherence
and readability
4Original contributions 2
- Set of summary production guidelines derived from
classification which can be issued to users of a
CAS system
- Development of Centering Theory (Grosz, Joshi &
Weinstein 1995) as evaluation metric due to
unsuitable existing methods
- Evaluation of coherence and readability of
abstracts produced using summary production
operations → therefore of guidelines and
operations themselves
5Human summarisation: 3 stages (Endres-Niggemeyer
1998)
- Document exploration: summariser explores layout
and organisation of document to identify position
of important information
- Relevance assessment: summariser assesses
information in document to see if it is relevant
to summary by recognising the theme (what it is
about)
- Summary production: summariser cuts and pastes
relevant information from document and edits it
to form a coherent summary
6Automatic summarisation
- Extracting:
  - Units extracted from source verbatim → problems
    with coherence, unnecessary info
  - Methods can be easily used across domains
  - Currently more popular (CAST)
- Abstracting:
  - Additional knowledge can be used → concepts
  - Not restricted to linguistic realisation of
    source → more coherent and concise
  - Needs knowledge base → domain dependent
7Computer-aided summarisation
- A feasible alternative to fully automatic
summarisation given current technology: problems
of coherence and readability with automatic
extracts
- Uses automatic summarisation methods to produce
an extract (stages 1 and 2), which is then
post-edited by human summariser/user (stage 3)
- Focus of this research on post-editing (extract →
abstract) to improve coherence/readability
8Aim of the research
- A) Chernobyl reactor number 4 was ripped apart by
an explosion on 26 April 1986. Last September,
the IAEA and the WHO released a report. Its
headline conclusion that radiation from the
accident would kill a total of 4000 people was
widely reported.
- B) Last September, the IAEA/WHO released a report
on the explosion of Chernobyl reactor number 4 on
26 April 1986, concluding that radiation from the
accident would kill a total of 4000 people.
(h03-ljh)
9How can we consistently transform extracts into
abstracts?
- Guidelines available for other aspects/types of
summarisation
- Investigation of what exactly a human summariser
does to get from an extract to an abstract (and
improve coherence)
- Corpus to allow analysis and classification
- Set of guidelines derived from classification
- Application and evaluation of classification/
guidelines to prove they work
10Corpus of extract/abstract pairs
- 43 pairs of news texts (extract, abstract)
- Source texts manually annotated for important
information - higher quality
- Annotated using adapted CAST guidelines (Hasler
et al. 2003) to produce 30% extracts
- Extracts transformed into 20% abstracts - no
guidelines given
11Classification of operations
- 5 general classes of operations
- Atomic and complex
- Atomic: deletion, insertion
- Complex: replacement, reordering, merging
- Each split into sub-operations (26 in total)
- Sub-operations linked to triggers, or
recognisable surface forms
- Function of units also important
12Classification
- Atomic operations and sub-operations:
  - Deletion: complete sentences, subordinate
    clauses, PPs, adverb phrases, reporting clauses,
    NPs, determiners, the verb be, specially
    formatted text, punctuation
  - Insertion: connectives, formulaic units,
    modifiers, punctuation
13Classification 2
- Complex operations and sub-operations:
  - Replacement: pronominalisation, lexical
    substitution, NP restructuring, nominalisation,
    referred sentences, VPs, passivisation,
    abbreviations
  - Reordering: emphasising, coherence
  - Merging: clause/sentence restructuring,
    punctuation/connectives
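
The classification above can be written down as a small data structure, which makes it easy to check the stated total of 26 sub-operations. This is only an illustrative sketch: the names `CLASSIFICATION`, `count_sub_operations` and `operation_type` are assumptions made for the example, not part of the original work.

```python
# Classes and sub-operations exactly as listed on the classification slides.
# The dictionary layout and all identifiers are illustrative assumptions.
CLASSIFICATION = {
    "atomic": {
        "deletion": [
            "complete sentences", "subordinate clauses", "PPs",
            "adverb phrases", "reporting clauses", "NPs", "determiners",
            "the verb be", "specially formatted text", "punctuation",
        ],
        "insertion": [
            "connectives", "formulaic units", "modifiers", "punctuation",
        ],
    },
    "complex": {
        "replacement": [
            "pronominalisation", "lexical substitution", "NP restructuring",
            "nominalisation", "referred sentences", "VPs", "passivisation",
            "abbreviations",
        ],
        "reordering": ["emphasising", "coherence"],
        "merging": [
            "clause/sentence restructuring", "punctuation/connectives",
        ],
    },
}


def count_sub_operations(classification):
    """Total number of sub-operations across all classes."""
    return sum(
        len(sub_ops)
        for classes in classification.values()
        for sub_ops in classes.values()
    )


def operation_type(operation, classification=CLASSIFICATION):
    """Return 'atomic' or 'complex' for a general class of operation."""
    for kind, classes in classification.items():
        if operation in classes:
            return kind
    raise KeyError(operation)


print(count_sub_operations(CLASSIFICATION))  # 10 + 4 + 8 + 2 + 2
print(operation_type("merging"))
```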
14Deletion
- The process of removing a unit from a certain
place in the extract so it does not appear in the
same place in the abstract
- Used alone or as part of complex operations
- Very useful for reducing text when used alone
- Deletes non-essential units, e.g. details,
repetitions
- Complete sentences, subordinate clauses, PPs,
reporting clauses, determiners, be
15Deletion examples
- I suspect that the set would be the ideal book
for a physicist to be cast away with on a desert
island. (new-sci-B7L-54-ljh)
- Three papers published recently in Science move
us a little closer to understanding the basis of
the disease, which turns out to be highly
complex. (sci04done-an)
- Britain is among the front runners as
tomorrow's supercomputers take shape.
(sci05done-an)
16Insertion
- The process of adding a unit which is not
present in the extract into the abstract
- Used alone or as part of complex operations
- Interesting because it adds text to something
which is supposed to be reduced
- Used to add coherence and to clarify whilst
saving space
- Connectives, modifiers, formulaic units,
punctuation
17Insertion examples
- He sees the need to raise public awareness and
demystify science and technology as a key point
(new-sci-B7L-75-ljh); pattern: X sees Y as Z
- The TV series Men of Science is now being shown
in a few other areas. (new-sci-B7L-69-ljh)
18Replacement
- The deletion of one unit and the insertion of a
different one in the same place in the text
- Complex operation, can be used in combination
with other complex operations
- Useful for avoiding repetition and saving space
- Pronominalisation, lexical substitution, NP
restructuring, nominalisation, VPs,
passivisation, abbreviations
19Replacement examples
- [Zhanat Carr, a radiation scientist with the WHO
in Geneva, → The WHO] [says → admits] the 5000 deaths
were omitted because the report was a "political
communication tool". (h03-ljh)
- All this is hardly Culver's fault. The same
difficulties are to be found in all other parts
of evolutionary ecology. → These general
difficulties of evolutionary ecology are hardly
Culver's fault. (new-sci-B7L-63-ljh)
20Reordering
- The deletion of a unit from one place in the
extract and its insertion in a different place in
the abstract
- Complex operation, can be used in combination
with other complex operations
- Sub-functions rather than operations: difficult
to sub-classify
- Emphasises information, improves coherence and
readability
21Reordering example
- Text about world's second face transplant; all
other sentences about a specific person/
operation: S2 → last sentence
- Experts predict the number of these operations
will rise rapidly as centres around the world
gear up to perform the procedure. (h01-ljh)
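
The definitions of deletion, insertion, replacement and reordering given so far suggest that the two complex operations can be composed from the two atomic ones. The sketch below illustrates this under a strong simplifying assumption: a text is modelled as a plain list of units, and all function names are invented for the example. Real operations act on linguistic units chosen by a human summariser.

```python
def delete(units, index):
    """Atomic: remove the unit at `index`."""
    return units[:index] + units[index + 1:]


def insert(units, index, unit):
    """Atomic: add `unit` so that it ends up at position `index`."""
    return units[:index] + [unit] + units[index:]


def replace(units, index, unit):
    """Complex: delete one unit and insert a different one in the same place."""
    return insert(delete(units, index), index, unit)


def reorder(units, src, dst):
    """Complex: delete the unit at `src` and insert it at position `dst`
    of the resulting text."""
    moved = units[src]
    return insert(delete(units, src), dst, moved)


extract = ["S1", "S2", "S3", "S4"]
# Move the second sentence to the end, as in the reordering example.
print(reorder(extract, 1, 3))  # ['S1', 'S3', 'S4', 'S2']
```

None of the functions modifies its input list, so an extract can be kept alongside each candidate abstract for comparison.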
22Merging
- Taking information from different units in the
extract and presenting them as one unit in the
abstract
- All other operations can be used
- Large class, most difficult to sub-classify:
anything (appropriate) goes!
- Best embodies abstracting as opposed to
extracting: conciseness
- Restructuring of clauses/sentences, punctuation/
connectives
23Merging example
- In October 1980 Zuccarelli filed an expensive
European patent application, covering nine
countries including Britain. The cost of
pushing a European patent through in nine
countries is around 10000. The cost of
application alone is around 2000 and Zuccarelli
has already paid an extra 500 for a further
stage of official examination. (new-sci-B7K-37)
24Evaluation
- Applied guidelines to a different set of extracts
- 25 human-produced extracts and corresponding
abstracts
- 25 automatically produced extracts and
corresponding abstracts
- Developed Centering Theory as an evaluation
method due to unsuitability of existing methods
25Centering Theory (CT) (Grosz, Joshi & Weinstein
1995)
- Theory of local coherence and salience
- Accounts for coherence using repetitions of
entities across consecutive utterances (Cfs, Cps,
Cbs)
- Uses the relationship between repetitions to
derive transitions (position in utterance)
- Transitions are ordered in preference from most
to least coherent (continue, retain, smooth
shift, rough shift, no transition/no Cb)
26Centering Theory: an example
- John[Cp] went to his favorite music store to buy
a piano.
- He[Cp, Cb] had frequented the store for many
years.
- He[Cp, Cb] was excited that he could finally
buy a piano.
- He[Cp, Cb] arrived just as the store was
closing for the day.
- Continue, continue, continue
- John[Cp] went to his favorite music store to buy
a piano.
- It[Cp] was a store John[Cb] had frequented for
many years.
- He[Cp, Cb] was excited that he could finally
buy a piano.
- It[Cp] was closing just as John[Cb] arrived.
- Retain, continue, retain
- (Grosz, Joshi & Weinstein 1995: 206)
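
The transition labels in the two versions of the example can be reproduced with a small classifier. The slides do not spell out the rules, so this sketch assumes the commonly used formulation: compare the backward-looking center (Cb) of an utterance with the Cb of the previous utterance and with its own preferred center (Cp), and treat an undefined previous Cb as a match. The function names and the (Cp, Cb) tuple encoding are invented for the example, and the "no transition (indirect)" case is not modelled.

```python
from typing import List, Optional, Tuple


def classify_transition(cb: Optional[str], cp: str,
                        prev_cb: Optional[str]) -> str:
    """Label the transition into an utterance from its Cb, its Cp and
    the Cb of the previous utterance."""
    if cb is None:
        return "no transition (no Cb)"
    # Assumption: an undefined previous Cb counts as "unchanged".
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


def transitions(utterances: List[Tuple[str, Optional[str]]]) -> List[str]:
    """`utterances` is a list of (Cp, Cb) pairs; the first has no Cb."""
    labels = []
    prev_cb = utterances[0][1]
    for cp, cb in utterances[1:]:
        labels.append(classify_transition(cb, cp, prev_cb))
        prev_cb = cb
    return labels


version_1 = [("John", None), ("John", "John"),
             ("John", "John"), ("John", "John")]
version_2 = [("John", None), ("store", "John"),
             ("John", "John"), ("store", "John")]

print(transitions(version_1))  # ['continue', 'continue', 'continue']
print(transitions(version_2))  # ['retain', 'continue', 'retain']
```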
27Centering Theory: a real example
- 1. (Everybody)[Cp] should be ready for
((Monday)'s national championship game), despite
(casualties in ((Saturday night)'s NCAA semifinal
battles)). → no transition (indirect)
- 2. (Jason Terry of (Arizona))[Cp, Cb] was
injured. → retain
- 3. (We)[Cp] were going to put (him)[Cb] in late
in (the game), said (Arizona coach (Lute
Olson)). → rough shift
- 4. (He)[Cp] had played a lot before (that), of
course, but when (we)'re protecting (a lead),
(we)[Cb] like getting (four perimeter guys) in
there and (that) gives (us) (another ball
handler), gives (us) (another free throw
shooter). → retain
- 5. (Kentucky coach (Rick Pitino))[Cp] predicted
that ((Monday)'s championship game) would also
be physical, in view of (((Kentucky)'s all-out
pressure defence) and ((Arizona)[Cb]'s blazing
speed)).
28CT evaluation metric
Transition                  Weight
Continue                     3
Retain                       2
No transition (indirect)     1
Smooth shift                -1
Rough shift                 -2
No transition (no Cb)       -5
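
As a rough illustration of how these weights might be applied, the sketch below scores a text from its sequence of transitions. The slides do not say how the weights are combined, so summing them is an assumption made here, and `WEIGHTS` and `ct_score` are names invented for the example.

```python
# Transition weights copied from the table above.
WEIGHTS = {
    "continue": 3,
    "retain": 2,
    "no transition (indirect)": 1,
    "smooth shift": -1,
    "rough shift": -2,
    "no transition (no Cb)": -5,
}


def ct_score(transitions):
    """Sum the weights of a text's transitions (assumed combination rule)."""
    return sum(WEIGHTS[t] for t in transitions)


# Transitions listed in the real example: 1 + 2 - 2 + 2
real_example = ["no transition (indirect)", "retain", "rough shift", "retain"]
print(ct_score(real_example))  # 3
```

Under this assumption an abstract would count as an improvement when its score is higher than that of the extract it was produced from; texts of different lengths may need a normalised score instead.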
29Evaluation 2
- Human judgment obtained to complement CT
- Overall, human summary production operations
improve texts: CT 78%, judge 82%
- Agreement between CT and judge: 70%
- Classification and resulting guidelines can be
reliably used during post-editing in CAS
- CT is useful as an evaluation method
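
Figures of this kind can be obtained from per-pair verdicts along the following lines. This is a sketch under stated assumptions: the slides do not define the agreement measure, so plain percentage agreement is assumed, and the verdict lists are made-up toy data, not results from the evaluation.

```python
def improvement_rate(verdicts):
    """Percentage of extract/abstract pairs judged to be improved."""
    return 100 * sum(verdicts) / len(verdicts)


def agreement(a, b):
    """Percentage of pairs on which two sets of verdicts coincide."""
    if len(a) != len(b):
        raise ValueError("verdict lists must have the same length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical verdicts: True means the abstract was rated better
# than the extract it was produced from.
ct_verdicts = [True, True, False, True, True]
judge_verdicts = [True, False, False, True, True]

print(improvement_rate(ct_verdicts))           # 80.0
print(improvement_rate(judge_verdicts))        # 60.0
print(agreement(ct_verdicts, judge_verdicts))  # 80.0
```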
30Directions for future work
- To use more human summarisers/judges to further
validate classification/guidelines
- To further explore/improve CT for evaluation
- To investigate the feasibility of automating
certain elements of summary production operations
for CAS
- To look at scientific texts (also popular in AS)