LexPageRank: Prestige in Multi-Document Text Summarization

1
LexPageRank: Prestige in Multi-Document Text
Summarization
  • Gunes Erkan and Dragomir R. Radev
  • Department of EECS, School of Information
  • University of Michigan
  • ACL 2004

2
Abstract
  • This paper considers an approach for computing
    sentence importance based on the concept of
    eigenvector centrality (prestige), called
    LexPageRank
  • In this model, a sentence connectivity matrix is
    constructed based on cosine similarity
  • The experimental results on DUC 2004 show that
    this approach outperforms centroid-based
    summarization and is quite successful compared to
    other summarization systems

3
Introduction
  • Text summarization is the process of
    automatically creating a compressed version of a
    given text that provides useful information for
    the user
  • The approach assesses the centrality of each
    sentence in a cluster and includes the most
    important ones in the summary
  • Two new centrality measures, Degree and
    LexPageRank, are introduced, inspired by the
    concept of prestige in social networks

4
Sentence centrality and centroid-based
summarization
  • Extractive summarization produces summaries by
    choosing a subset of the sentences in the
    original documents
  • Centrality of a sentence is often defined in
    terms of the centrality of the words that it
    contains
  • The centroid of a cluster is a pseudo-document
    which consists of words that have frequency*IDF
    scores above a predefined threshold
  • In centroid-based summarization (Radev et al.,
    2000), the sentences that contain more words from
    the centroid of the cluster are considered
    central
  • Centroid-based summarization has given promising
    results in the past
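
A minimal sketch of the centroid idea described above, assuming whitespace tokenization and a hand-made IDF table. The function names, the IDF values, and the threshold are illustrative assumptions, not part of MEAD or the original system.

```python
from collections import Counter

def build_centroid(sentences, idf, threshold):
    """Keep the words whose cluster frequency * IDF exceeds the threshold."""
    tf = Counter(w for s in sentences for w in s.lower().split())
    weights = {w: count * idf.get(w, 0.0) for w, count in tf.items()}
    return {w: v for w, v in weights.items() if v > threshold}

def centroid_scores(sentences, centroid):
    """Score each sentence by the summed weight of the centroid words it contains."""
    return [
        sum(centroid.get(w, 0.0) for w in set(s.lower().split()))
        for s in sentences
    ]

# Toy cluster; the IDF values are made up for illustration.
cluster = [
    "the storm hit the coast",
    "the storm damaged homes",
    "officials met today",
]
toy_idf = {"the": 0.1, "storm": 2.0, "hit": 1.5, "coast": 1.5,
           "damaged": 1.5, "homes": 1.5, "officials": 1.5,
           "met": 1.5, "today": 1.0}

centroid = build_centroid(cluster, toy_idf, threshold=2.0)
scores = centroid_scores(cluster, centroid)
```

In this toy cluster only "storm" survives the threshold, so the two sentences that mention it outrank the third.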

5
Prestige-based sentence centrality
  • We hypothesize that the sentences that are
    similar to many of the other sentences in a
    cluster are more central (or prestigious) to the
    topic
  • There are two issues
    • How to define similarity between two sentences
      • Cosine
    • How to compute the overall prestige of a
      sentence given its similarity to other sentences
      • Degree centrality
      • Eigenvector centrality and LexPageRank

6
Prestige-based sentence centrality
  • A cluster may be represented by a cosine
    similarity matrix
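
A small sketch of how such a matrix could be built, assuming plain bag-of-words term counts and whitespace tokenization. The slides do not say which term weighting was used, so that choice and the function names are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_matrix(sentences):
    """Pairwise cosine similarities for all sentences in a cluster."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]

matrix = similarity_matrix([
    "the storm hit the coast",
    "the storm damaged homes",
    "officials met today",
])
```

The matrix is symmetric, with values close to 1 on the diagonal and 0 for sentence pairs that share no words.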

7
Prestige-based sentence centrality
  • Most of the entries in the similarity matrix are
    nonzero

8
Prestige-based sentence centrality
  • Degree centrality
    • Since we are interested in significant
      similarities in the matrix, we can eliminate
      low values by defining a threshold, so that the
      cluster can be viewed as an undirected graph
    • We define degree centrality as the degree of
      each node in the similarity graph
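
Degree centrality as described above can be sketched as follows. The similarity values are invented, and leaving out a sentence's similarity to itself is an assumption of this sketch.

```python
def degree_centrality(sim, threshold):
    """Count, for each sentence, the other sentences whose similarity exceeds the threshold."""
    n = len(sim)
    return [
        sum(1 for j in range(n) if j != i and sim[i][j] > threshold)
        for i in range(n)
    ]

# Hypothetical 3-sentence similarity matrix.
toy_sim = [
    [1.00, 0.50, 0.05],
    [0.50, 1.00, 0.30],
    [0.05, 0.30, 1.00],
]

loose = degree_centrality(toy_sim, threshold=0.1)
strict = degree_centrality(toy_sim, threshold=0.4)
```

Raising the threshold removes edges, so the choice of threshold directly changes which sentences look central.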

9
Prestige-based sentence centrality

10
Prestige-based sentence centrality

11
Prestige-based sentence centrality
  • Issue with degree centrality
    • Several unwanted sentences may vote for each
      other and raise each other's prestige
    • This situation can be avoided by considering
      where the votes come from and taking the
      prestige of the voting node into account when
      weighting each node
  • Eigenvector centrality and LexPageRank
    • PageRank (Page et al., 1998) is a method
      proposed for assigning a prestige score to each
      page on the Web, independent of a specific query
    • The score of a page depends on the number of
      pages that link to it as well as the individual
      scores of the linking pages

12
Prestige-based sentence centrality
  • The PageRank of page A is given by
    $$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
    where $T_1, \ldots, T_n$ are the pages that link
    to page A, $d$ is the damping factor, and
    $C(T_i)$ is the number of outgoing links from
    page $T_i$
  • This recursively defined value can be computed by
    forming the binary adjacency matrix of the Web,
    normalizing this matrix so that each row sums to
    1, and finding the principal eigenvector of the
    normalized matrix
  • The PageRank of the i-th page equals the i-th
    entry of the eigenvector

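
A rough sketch of the recursive PageRank definition, assuming the commonly cited form PR(A) = (1 - d) + d * sum of PR(Ti)/C(Ti) over the pages Ti linking to A. It also assumes every page has at least one outgoing link. The three-page graph and the function name are made up for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the recursive PageRank definition on a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {}
        for a in pages:
            # Pages T_i that link to A; each passes on PR(T_i) / C(T_i).
            incoming = [t for t in pages if a in links[t]]
            new[a] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming)
        pr = new
    return pr

# Hypothetical web: A and B link to each other, C links to A.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

Page C has no incoming links, so its score settles at 1 - d, while A collects votes from both B and C.
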
13
Prestige-based sentence centrality
  • This method can be easily applied to the cosine
    similarity graph to find the most prestigious
    sentences in a document
  • We call this new measure of sentence centrality
    LexPageRank
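
One way this could look in code: a sketch that thresholds a cosine similarity matrix into a binary graph and runs a damped power iteration on it. The threshold, damping value, convergence tolerance, and the toy matrix are assumptions of this sketch, not values confirmed by the slides.

```python
def lexpagerank(sim, threshold=0.1, d=0.85, tol=1e-8, max_iter=1000):
    """PageRank-style prestige scores on a thresholded similarity graph."""
    n = len(sim)
    # Binary adjacency: keep only similarities above the threshold.
    # Each sentence has similarity 1 with itself, so no row is empty.
    adj = [[1.0 if sim[i][j] > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(adj[j][i] * scores[j] / degree[j]
                              for j in range(n) if degree[j])
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores

# Hypothetical similarity matrix: sentence 1 is similar to both others.
toy_sim = [
    [1.00, 0.50, 0.05],
    [0.50, 1.00, 0.30],
    [0.05, 0.30, 1.00],
]
prestige = lexpagerank(toy_sim)
```

Unlike degree centrality, a vote here is weighted by the score of the voting sentence, so links from well-connected sentences count for more.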

14
Prestige-based sentence centrality
damping factor 1
15
Prestige-based sentence centrality
  • Advantages over Centroid
    • It accounts for information subsumption among
      sentences
    • It prevents unnaturally high IDF scores from
      boosting the score of a sentence that is
      unrelated to the topic

16
Experiments on DUC 2004 data
  • DUC 2004 data was used in our experiments
  • Task 2 involves summarization of 50 TDT English
    clusters
  • Task 4 is to produce summaries of machine
    translation output (in English) of 24 Arabic TDT
    documents
  • The recall-based measure ROUGE is adopted, and
    665-byte summaries are produced for each cluster
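
To make the evaluation setup concrete, here is a simplified sketch of a unigram recall score and a byte-length cutoff. It is not the official ROUGE toolkit: it assumes a single reference, whitespace tokenization, and no stemming, and both function names are hypothetical.

```python
from collections import Counter

def truncate_bytes(text, limit=665):
    """Cut text to at most `limit` UTF-8 bytes, dropping any split character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")

def unigram_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

recall = unigram_recall("the storm hit the coast", "the storm hit homes")
```

Because the score is recall-based, longer summaries tend to score higher, which is why a fixed length limit matters for a fair comparison.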

17
Experiments on DUC 2004 data
  • MEAD summarization toolkit
    • Extractive multi-document summarization
    • Consists of three components
      • Feature extractor (document -> feature vector)
        • Centroid, Position and Length
      • Combiner (feature vector -> scalar value)
      • Reranker (the scores are adjusted upward or
        downward)
        • MMR (Maximal Marginal Relevance), CSIS
          (Cross-Sentence Informational Subsumption)
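
The combiner and reranker stages can be pictured with the sketch below. This is not MEAD's actual API: the function names, feature weights, similarity cutoff, and penalty are all hypothetical, and the reranker is a simple redundancy penalty rather than MMR or CSIS themselves.

```python
def combine(features, weights):
    """Combiner: turn one sentence's feature vector into a scalar score."""
    return sum(weights[name] * value for name, value in features.items())

def rerank(scores, sim, max_sim=0.7, penalty=1.0):
    """Reranker: lower the score of sentences that repeat a higher-scored one."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    adjusted = list(scores)
    kept = []
    for i in order:
        if any(sim[i][j] > max_sim for j in kept):
            adjusted[i] -= penalty  # redundant with a sentence already kept
        else:
            kept.append(i)
    return adjusted

# Made-up feature values for three sentences.
feature_vectors = [
    {"Centroid": 0.8, "Position": 1.0},
    {"Centroid": 0.7, "Position": 0.5},
    {"Centroid": 0.2, "Position": 0.3},
]
toy_weights = {"Centroid": 1.0, "Position": 1.0}
toy_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

raw = [combine(f, toy_weights) for f in feature_vectors]
final = rerank(raw, toy_sim)
```

Here the second sentence is nearly a duplicate of the first, so its score drops below that of the third sentence after reranking.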

18
Experiments on DUC 2004 data
Centroid

19
Experiments on DUC 2004 data

20
Experiments on DUC 2004 data

21
Experiments on DUC 2004 data

22
Conclusions
  • A novel approach to defining sentence centrality
    based on graph-based prestige scoring of
    sentences
  • We have introduced two different methods, Degree
    and LexPageRank, for computing prestige in
    similarity graphs
  • The experimental results are quite promising
  • Even the simplest approach, degree centrality, is
    a good enough heuristic to perform better than
    lead-based and centroid-based summaries