LexPageRank: Prestige in Multi-Document Text Summarization

1
LexPageRank: Prestige in Multi-Document Text
Summarization

Gunes Erkan and Dragomir R. Radev
Department of EECS, School of Information
University of Michigan
ACL 2004

2
Abstract

This paper considers an approach for computing
sentence importance based on the concept of
eigenvector centrality (prestige), which we call
LexPageRank
In this model, a sentence connectivity matrix is
constructed based on cosine similarity
Experimental results on DUC 2004 data show that
this approach outperforms centroid-based
summarization and is quite successful compared to
other summarization systems

3
Introduction

Text summarization is the process of
automatically creating a compressed version of a
given text that provides useful information for
the user
Our summarization approach assesses the
centrality of each sentence in a cluster and
includes the most important ones in the summary
We introduce two new measures of centrality,
Degree and LexPageRank, inspired by the concept
of prestige in social networks

4
Sentence centrality and centroid-based
summarization

Extractive summarization produces summaries by
choosing a subset of the sentences in the
original documents
Centrality of a sentence is often defined in
terms of the centrality of the words that it
contains
The centroid of a cluster is a pseudo-document
which consists of words that have frequency*IDF
scores above a predefined threshold
In centroid-based summarization (Radev et al.,
2000), the sentences that contain more words from
the centroid of the cluster are considered
central
Centroid-based summarization has given promising
results in the past
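
The centroid idea can be sketched in a few lines. This is a minimal illustration, not MEAD's implementation: it assumes whitespace tokenization, raw cluster-wide term counts as the frequency, and a hypothetical IDF table and threshold (`toy_idf`, `threshold=2.5`) invented for the example.

```python
from collections import Counter

def build_centroid(sentences, idf, threshold):
    """Return {word: frequency * IDF} for words scoring above the threshold."""
    tf = Counter(word for s in sentences for word in s.lower().split())
    scores = {w: count * idf.get(w, 0.0) for w, count in tf.items()}
    return {w: score for w, score in scores.items() if score > threshold}

def centroid_score(sentence, centroid):
    """Sum the centroid weights of the distinct centroid words in a sentence."""
    return sum(centroid.get(w, 0.0) for w in set(sentence.lower().split()))

# Toy cluster; the IDF values and the threshold are made up for illustration.
cluster = ["iraq invaded kuwait", "kuwait was invaded", "the weather was nice"]
toy_idf = {"iraq": 2.0, "invaded": 1.5, "kuwait": 1.5, "was": 0.1,
           "the": 0.1, "weather": 2.0, "nice": 2.0}

centroid = build_centroid(cluster, toy_idf, threshold=2.5)
ranked = sorted(cluster, key=lambda s: centroid_score(s, centroid), reverse=True)
```

Only "invaded" and "kuwait" clear the threshold here, so the two sentences that mention them rank above the off-topic one.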

5
Prestige-based sentence centrality

We hypothesize that the sentences that are
similar to many of the other sentences in a
cluster are more central (or prestigious) to the
topic
There are two issues
  How to define similarity between two sentences
    Cosine
  How to compute the overall prestige of a sentence
  given its similarity to other sentences
    Degree centrality
    Eigenvector centrality and LexPageRank

6
Prestige-based sentence centrality

A cluster may be represented by a cosine
similarity matrix
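
A minimal sketch of building such a matrix, assuming whitespace tokenization and raw term counts as the vector weights (the slides do not specify the weighting, so this is an assumption); the function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(sentences):
    """Symmetric matrix whose entry [i][j] is the cosine of sentences i and j."""
    bags = [Counter(s.lower().split()) for s in sentences]
    return [[cosine(x, y) for y in bags] for x in bags]

sim = similarity_matrix(["the cat sat", "the cat ran", "dogs bark"])
```

Sentences that share no words get a similarity of 0, and each sentence has similarity 1 with itself (up to floating-point rounding).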

7
Prestige-based sentence centrality

Most of the entries in the cosine similarity
matrix are nonzero

8
Prestige-based sentence centrality

Degree centrality
Since we are interested in significant
similarities in the matrix, we can eliminate some
low values by defining a threshold, so that the
cluster can be viewed as an undirected graph
We define degree centrality as the degree of each
node in the similarity graph
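
Degree centrality over a thresholded similarity matrix can be sketched as follows. The matrix values and thresholds are hypothetical, and excluding a sentence's similarity to itself from its degree is an assumption of this sketch.

```python
def degree_centrality(sim, threshold):
    """For each sentence, count the other sentences with similarity above the threshold."""
    n = len(sim)
    return [sum(1 for j in range(n) if j != i and sim[i][j] > threshold)
            for i in range(n)]

# Hypothetical 4-sentence similarity matrix (symmetric, 1.0 on the diagonal).
sim = [
    [1.0, 0.5, 0.3, 0.0],
    [0.5, 1.0, 0.2, 0.1],
    [0.3, 0.2, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]

loose = degree_centrality(sim, threshold=0.15)
strict = degree_centrality(sim, threshold=0.25)
```

Raising the threshold removes weak edges, so the degrees can only stay the same or drop.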

11
Prestige-based sentence centrality

Issue for degree centrality
Several unwanted sentences may vote for each other
and raise their prestige
This situation can be avoided by considering
where the votes come from and taking the prestige
of the voting node into account when weighting
each vote
Eigenvector centrality and LexPageRank
PageRank (Page et al., 1998) is a method proposed
for assigning a prestige score to each page on
the web, independent of a specific query
The score of a page depends on the number of
pages that link to it as well as the individual
scores of the linking pages

12
Prestige-based sentence centrality

The PageRank of page A is given by

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

where T_1, ..., T_n are the pages that link to
page A, d is the damping factor, and C(T_i) is the
number of outgoing links from page T_i
This recursively defined value can be computed by
forming the binary adjacency matrix of the web,
normalizing this matrix so that each row sums to
1, and finding the principal eigenvector of the
normalized matrix
The PageRank of the i-th page equals the i-th
entry in the eigenvector
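
The recursive definition can be sketched as a fixed-point iteration. This is a minimal illustration assuming the standard PageRank recurrence PR(A) = (1 - d) + d * (sum of PR(T_i) / C(T_i) over the pages T_i that link to A). The function name, the 3-page toy graph, the iteration count, and the default d = 0.85 are assumptions, not taken from the slides.

```python
def pagerank(adj, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    adj[i][j] == 1 means page i links to page j (binary adjacency matrix).
    """
    n = len(adj)
    out_links = [sum(row) for row in adj]  # C(T): outgoing links of each page
    pr = [1.0] * n
    for _ in range(iterations):
        # Dividing by C(T) is the row normalization of the adjacency matrix.
        pr = [(1 - d) + d * sum(pr[t] / out_links[t]
                                for t in range(n) if adj[t][a])
              for a in range(n)]
    return pr

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
adj = [[0, 1, 1],
       [0, 0, 1],
       [1, 0, 0]]
scores = pagerank(adj)
```

In this toy graph page 2 receives links from both other pages and ends up with the highest score, while page 1, linked only from page 0, ends up with the lowest.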
13
Prestige-based sentence centrality

This method can be easily applied to the cosine
similarity graph to find the most prestigious
sentences in a document
We call this new measure of sentence centrality
LexPageRank
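
A minimal sketch of applying a PageRank-style iteration to a thresholded similarity graph. It assumes a symmetric similarity matrix, treats each above-threshold pair as an undirected edge, and uses hypothetical values for the matrix, the threshold, and the damping factor d; it is an illustration, not the authors' implementation.

```python
def lexpagerank(sim, threshold, d=0.85, iterations=100):
    """PageRank-style prestige on the thresholded, undirected similarity graph."""
    n = len(sim)
    # Neighbours of i: other sentences whose similarity to i exceeds the threshold.
    # sim is assumed symmetric, so j in nbrs[i] implies i in nbrs[j].
    nbrs = [[j for j in range(n) if j != i and sim[i][j] > threshold]
            for i in range(n)]
    score = [1.0] * n
    for _ in range(iterations):
        # Each neighbour j splits its current score evenly among its own neighbours.
        score = [(1 - d) + d * sum(score[j] / len(nbrs[j]) for j in nbrs[i])
                 for i in range(n)]
    return score

# Hypothetical similarities: sentence 0 is similar to all others,
# which are not similar to each other.
sim = [
    [1.0, 0.4, 0.3, 0.5],
    [0.4, 1.0, 0.1, 0.0],
    [0.3, 0.1, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
]
scores = lexpagerank(sim, threshold=0.2)
```

Sentence 0 collects the full vote of each of its three neighbours and comes out as the most prestigious; a sentence with no neighbours would keep the baseline score 1 - d.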

14
Prestige-based sentence centrality
damping factor 1
15
Prestige-based sentence centrality

Advantages over Centroid
It accounts for information subsumption among
sentences
It prevents unnaturally high IDF scores from
boosting the score of a sentence that is
unrelated to the topic

16
Experiments on DUC 2004 data

DUC 2004 data was used in our experiments
Task 2 involves summarization of 50 TDT English
clusters
Task 4 is to produce summaries of machine
translation output (in English) of 24 Arabic TDT
documents
The recall-based measure ROUGE is adopted, and
665-byte summaries are produced for each cluster
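
To make the evaluation setup concrete, here is a simplified sketch of a unigram recall score in the spirit of ROUGE, together with a byte-length cutoff. It assumes whitespace tokenization and a single reference summary; the official ROUGE toolkit supports more options, and both function names are hypothetical.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams (counted with multiplicity) found in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def truncate_bytes(summary, limit=665):
    """Cut a summary to at most `limit` UTF-8 bytes, dropping a split trailing character."""
    return summary.encode("utf-8")[:limit].decode("utf-8", errors="ignore")

recall = rouge1_recall("the cat sat on the mat", "the cat lay on the mat")
short = truncate_bytes("a" * 700)
```

Here five of the six reference words are matched, so the recall is 5/6.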

17
Experiments on DUC 2004 data

MEAD summarization toolkit
Extractive multi-document summarization
Consists of three components
  Feature extractor (document -> feature vector)
    Centroid, Position and Length
  Combiner (feature vector -> scalar value)
  Reranker (the scores are adjusted upward or
  downward)
    MMR (Maximal Marginal Relevance), CSIS
    (Cross-Sentence Information Subsumption)

weight
Threshold
18
Experiments on DUC 2004 data
Centroid
22
Conclusions

We presented a novel approach to defining
sentence centrality based on graph-based prestige
scoring of sentences
We have introduced two different methods, Degree
and LexPageRank, for computing prestige in
similarity graphs
The experimental results are quite promising
Even the simplest approach, degree centrality, is
a good enough heuristic to perform better than
lead-based and centroid-based summaries