Title: The use of unlabeled data to improve supervised learning for text summarization
1. The use of unlabeled data to improve supervised learning for text summarization
- M.-R. Amini, P. Gallinari (SIGIR 2002)
- Slides prepared by Jon Elsas for the Semi-supervised NL Learning Reading Group
2. Presentation Outline
- Overview of document summarization
- Major contribution: semi-supervised logistic Classification EM (CEM) for maximum-likelihood summaries
- Evaluation
- Baseline systems
- Results
3. Document Summarization
- Motivation: text volume >> users' time
- Single-document summarization
  - Used for display of search results, automatic abstracting, browsing, etc.
- Multi-document summarization
  - Describe clusters / document collections, QA, etc.
- Problem: What is the summary used for? Does a generic summary exist?
4. Single-Document Summarization Example
5. Document Summarization
- Generative summaries
  - Synthetic text produced after analysis of high-level linguistic features (discourse, semantics, etc.)
  - Hard.
- Extract summaries
  - Text excerpts (usually sentences) composed together to create a summary
  - Boils down to a passage classification/ranking problem
6. Major Contribution
- Semi-supervised logistic Classifying Expectation Maximization (CEM) for passage classification
- Advantages over other methods
  - Works with a small set of labeled data plus a large set of unlabeled data
  - No modeling assumptions for density estimation
- Cons
  - (Probably) slow; no runtime numbers given
7. Expectation Maximization (EM)
- Finds maximum-likelihood estimates of parameters when the underlying distribution depends on unobserved latent variables
- Maximizes model fit to the data distribution
- Criterion function
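The criterion-function equation on this slide is an image that did not survive extraction. For a K-component mixture with proportions π_k and component densities f_k, the standard EM criterion (a reconstruction from the standard formulation, not the slide's exact rendering) is the incomplete-data log-likelihood:

```latex
L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, f_k(x_i \mid \theta_k)
```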
8. Classifying EM (CEM)
- Like EM, with the addition of an indicator variable for component membership
- Maximizes the quality of the clustering
- Criterion function
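This slide's equation image is also missing. CEM's standard criterion is the Classification Maximum Likelihood over a hard partition P = (P_1, …, P_K) of the data (reconstructed from the standard CEM literature, not the slide itself):

```latex
C(P, \theta) = \sum_{k=1}^{K} \sum_{x_i \in P_k} \log \bigl( \pi_k \, f_k(x_i \mid \theta_k) \bigr)
```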
9. Semi-supervised Generative-CEM
- Fix component membership for the labeled data
- Criterion function: a labeled-data term (memberships fixed) plus an unlabeled-data term (memberships estimated)
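The slide's equation image (annotated "Labeled Data" / "Unlabeled Data") is missing; it can be reconstructed in the same notation as the CEM criterion. With D_l the labeled passages (known classes y_i) and D_u the unlabeled ones (estimated hard memberships t_ik), the criterion splits into the two terms:

```latex
C = \underbrace{\sum_{x_i \in D_l} \log \bigl( \pi_{y_i} \, f_{y_i}(x_i \mid \theta_{y_i}) \bigr)}_{\text{labeled data}}
  \; + \; \underbrace{\sum_{x_i \in D_u} \sum_{k=1}^{K} t_{ik} \log \bigl( \pi_k \, f_k(x_i \mid \theta_k) \bigr)}_{\text{unlabeled data}}
```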
10. Semi-supervised Logistic-CEM
- Use a discriminative classifier (logistic regression) instead of a generative model
- M-step: need to re-do gradient descent to estimate the βs
- Criterion again combines a labeled-data term and an unlabeled-data term
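The loop can be sketched as follows. This is a minimal reading of the algorithm, not the paper's exact procedure: the C-step hard-assigns unlabeled passages with the current logistic model, and the M-step refits the βs by gradient ascent on labeled plus pseudo-labeled data, keeping the labeled assignments fixed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Fit logistic regression by gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        beta += lr * X.T @ (y - p) / len(y)
    return beta

def logistic_cem(X_lab, y_lab, X_unlab, n_rounds=10):
    """Semi-supervised logistic CEM, sketched as hard self-training."""
    beta = fit_logistic(X_lab, y_lab)            # initialize from labeled data only
    for _ in range(n_rounds):
        # C-step: hard component assignment for the unlabeled passages
        y_unlab = (sigmoid(X_unlab @ beta) >= 0.5).astype(float)
        # M-step: re-estimate the betas on all data (labeled memberships fixed)
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        beta = fit_logistic(X_all, y_all)
    return beta
```

Because the classifier is discriminative, no density f_k needs to be modeled; the M-step is a gradient-based refit rather than a closed-form parameter update, which is where the slide's "(probably) slow" caveat comes from.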
11. Evaluation
- Algorithm evaluated against 3 other single-document summarization algorithms
  - Non-trainable system: passage ranking
  - Trainable system: Naïve Bayes sentence classifier
  - Generative-CEM (using full Gaussians)
- Precision/recall with respect to gold-standard extract summaries
- The fine print
  - All systems used similar representation schemes, but not the same
12. Baseline System: Sentence Ranking
- Rank sentences using a TF-IDF similarity measure with query expansion (Sim2)
  - Blind relevance feedback from the top sentences
  - WordNet similarity thesaurus
- Generic query created from the most frequent words in the training set
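The core of this baseline can be sketched as plain TF-IDF cosine ranking. This omits the query-expansion parts of Sim2 (blind relevance feedback and the WordNet thesaurus), so it is an illustration of the ranking step only, not the paper's full measure:

```python
import math
from collections import Counter

def tfidf_rank(sentences, query_terms):
    """Rank sentence indices by tf-idf cosine similarity to a query."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # document frequency and idf over the sentence collection
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}

    def vec(terms):
        tf = Counter(terms)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cos(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec([t.lower() for t in query_terms])
    scores = [cos(d_vec, q) for d_vec in map(vec, docs)]
    return sorted(range(n), key=lambda i: -scores[i])
```

With the "generic query" of the slide, `query_terms` would be the most frequent content words of the training set rather than a user query.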
13. Naïve Bayes Model: Sentence Classification
- Simple Naïve Bayes classifier trained on 5 features
  - Sentence length < t_length → {0, 1}
  - Sentence contains cue words → {0, 1}
  - Sentence–query similarity (Sim2) > t_sim → {0, 1}
  - Upper-case/acronym features (count?)
  - Sentence/paragraph position in text → {1, 2, 3}
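A classifier of this kind over binary features can be sketched as Bernoulli Naïve Bayes with Laplace smoothing. This is a generic sketch, not the paper's exact model; the thresholds t_length and t_sim that binarize the features come from the slide above:

```python
import numpy as np

def train_bernoulli_nb(X, y, alpha=1.0):
    """Train Bernoulli Naive Bayes on binary features.

    X: (n, d) array of 0/1 features; y: (n,) array of 0/1 labels
    (1 = sentence belongs in the extract summary).
    """
    classes = [0, 1]
    priors = np.array([np.mean(y == c) for c in classes])
    # P(feature_j = 1 | class c), with Laplace smoothing
    cond = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                     for c in classes])
    return priors, cond

def predict_bernoulli_nb(model, X):
    priors, cond = model
    # log P(c) + sum_j [ x_j log p_jc + (1 - x_j) log(1 - p_jc) ]
    log_post = np.log(priors) + X @ np.log(cond).T + (1 - X) @ np.log(1 - cond).T
    return log_post.argmax(axis=1)
```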
14. Logistic-CEM: Sentence Representation Features
- Features used to train logistic-CEM
  - Normalized sentence length → [0, 1]
  - Normalized cue-word frequency → [0, 1]
  - Sentence–query similarity (Sim2) → [0, ∞)
  - Normalized acronym frequency → [0, 1]
  - Sentence/paragraph position in text → {1, 2, 3}
- (All of the binary features converted to continuous)
15. Results on Reuters dataset
16. Results on Reuters dataset (cont.)