1
The use of unlabeled data to improve supervised
learning for text summarization
  • M.-R. Amini and P. Gallinari (SIGIR 2002)

Slides prepared by Jon Elsas for the
Semi-supervised NL Learning Reading Group
2
Presentation Outline
  • Overview of Document Summarization
  • Major contribution: Semi-Supervised Logistic
    Classifying EM (CEM) for maximum-likelihood summaries
  • Evaluation
  • Baseline Systems
  • Results

3
Document Summarization
  • Motivation: text volume >> users' time
  • Single Document Summarization
  • Used for display of search results, automatic
    abstracting, browsing, etc.
  • Multi-Document Summarization
  • Describe document clusters/collections, QA, etc.
  • Problem: What is the summary used for? Does a
    generic summary exist?

4
Single Document Summarization example
5
Document Summarization
  • Generative Summaries
  • Synthetic text produced after analysis of
    high-level linguistic features: discourse,
    semantics, etc.
  • Hard.
  • Extract Summaries
  • Text excerpts (usually sentences) composed into a
    summary
  • Boils down to a passage classification/ranking
    problem (a minimal sketch follows)
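
A minimal sketch of this framing, in Python (all names are illustrative;
`score` stands in for any sentence-scoring function, not the paper's):

    # Extractive summarization as sentence ranking: score every sentence,
    # keep the top k, and emit them in their original document order.
    # Illustrative sketch only; names are not from the paper.
    def extract_summary(sentences, score, k=3):
        ranked = sorted(sentences, key=score, reverse=True)
        chosen = set(ranked[:k])
        return [s for s in sentences if s in chosen]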

6
Major Contribution
  • Semi-supervised Logistic Classifying Expectation
    Maximization (CEM) for passage classification
  • Advantages over other methods:
  • Works with a small set of labeled data plus a
    large set of unlabeled data
  • No modeling assumptions for density estimation
  • Cons:
  • (Probably) slow; no performance numbers are given

7
Expectation Maximization (EM)
  • Finds maximum likelihood estimates of parameters
    when underlying distribution depends on
    unobserved latent variables.
  • Maximizes model fit to data distribution
  • Criterion function
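
The formula itself did not survive the transcript; for a K-component
mixture, the standard criterion is the incomplete-data log-likelihood

    L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, f(x_i \mid \theta_k)

maximized over the mixing proportions \pi_k and component parameters \theta_k.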

8
Classifying EM (CEM)
  • Like EM, with the addition of an indicator
    variable for component membership.
  • Maximizes quality of clustering
  • Criterion function
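
The formula is again missing; in Celeux and Govaert's CEM, the standard
criterion is the classification (complete-data) log-likelihood, maximized
over both the parameters and a hard partition P = (P_1, ..., P_K):

    L_C(\theta, P) = \sum_{k=1}^{K} \sum_{x_i \in P_k} \log \bigl( \pi_k \, f(x_i \mid \theta_k) \bigr)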

9
Semi-supervised generative-CEM
  • Fix component membership for labeled data.
  • Criterion function: one term over the labeled
    data plus one term over the unlabeled data (a
    reconstruction is sketched below)
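
Only the equation's two annotations survive in the transcript; a plausible
reconstruction, with indicators t_{ik} fixed by the labels on D_l and
indicators \tilde{t}_{ik} estimated for the unlabeled set D_u, is

    L_C(\theta) = \sum_{x_i \in D_l} \sum_{k=1}^{K} t_{ik} \log \bigl( \pi_k f(x_i \mid \theta_k) \bigr)
                + \sum_{x_i \in D_u} \sum_{k=1}^{K} \tilde{t}_{ik} \log \bigl( \pi_k f(x_i \mid \theta_k) \bigr)
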
10
Semi-supervised logistic-CEM
  • Use a discriminative classifier (logistic) instead
    of a generative one.
  • In the M-step, gradient descent must be re-run to
    estimate the βs.
  • Criterion function: again one term over the labeled
    data and one over the unlabeled data (loop sketched
    below)
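
A minimal sketch of the logistic-CEM loop, using scikit-learn's
LogisticRegression as the discriminative model (variable names and the
stopping test are assumptions, not the paper's exact update rules):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Illustrative sketch; the paper runs gradient descent on the betas
    # directly, here delegated to LogisticRegression's solver.
    def logistic_cem(X_lab, y_lab, X_unlab, n_iter=20):
        # Initialize the betas on the labeled data alone.
        clf = LogisticRegression().fit(X_lab, y_lab)
        y_unlab = clf.predict(X_unlab)
        X_all = np.vstack([X_lab, X_unlab])
        for _ in range(n_iter):
            # M-step: re-fit the betas on labeled data plus the current
            # hard assignments of the unlabeled data.
            clf.fit(X_all, np.concatenate([y_lab, y_unlab]))
            # E/C-step: re-assign each unlabeled point to its most
            # probable class under the updated model.
            new_y = clf.predict(X_unlab)
            if np.array_equal(new_y, y_unlab):  # partition stabilized
                break
            y_unlab = new_y
        return clf
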
11
Evaluation
  • Algorithm evaluated against 3 other
    single-document summarization algorithms
  • Non-trainable system: passage ranking
  • Trainable system: Naïve Bayes sentence classifier
  • Generative-CEM (using full Gaussians)
  • Precision/Recall with respect to gold-standard
    extract summaries (definitions below)
  • The fine print:
  • All systems used similar, but not identical,
    representation schemes
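
With S the set of extracted sentences and G the gold-standard extract,
these are presumably the usual set-based definitions:

    \text{Precision} = \frac{|S \cap G|}{|S|}, \qquad
    \text{Recall} = \frac{|S \cap G|}{|G|}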

12
Baseline System Sentence Ranking
  • Rank sentences using a TF-IDF similarity measure
    with query expansion (Sim2); see the sketch after
    this list
  • Blind-relevance feedback from the top sentences
  • WordNet similarity thesaurus
  • Generic query created with the most frequent
    words in the training set.
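
A minimal sketch of the core ranking step, using scikit-learn's TF-IDF
vectorizer and cosine similarity (the query-expansion and relevance-feedback
refinements are omitted, and all names are illustrative):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Rank sentences by TF-IDF cosine similarity to a generic query
    # built from frequent training-set words. Illustrative sketch.
    def rank_sentences(sentences, query_words):
        vec = TfidfVectorizer()
        S = vec.fit_transform(sentences)
        q = vec.transform([" ".join(query_words)])
        sims = cosine_similarity(S, q).ravel()
        return sorted(zip(sentences, sims), key=lambda p: p[1], reverse=True)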

13
Naïve Bayes Model Sentence Classification
  • Simple Naïve Bayes classifier trained on 5
    features (extraction sketched after the list):
  • Sentence length < t_length → {0, 1}
  • Sentence contains cue words → {0, 1}
  • Sentence-query similarity (Sim2) > t_sim → {0, 1}
  • Upper-case/acronym features (count?)
  • Sentence/paragraph position in text → {1, 2, 3}
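
A sketch of this feature extraction (the thresholds t_length and t_sim,
the cue-word list, and all helper names are assumptions, not values from
the paper):

    # Map one sentence to the five Naive Bayes features.
    # Thresholds and names are placeholder assumptions.
    def binary_features(sentence, sim2, position, cue_words,
                        t_length=25, t_sim=0.1):
        tokens = sentence.split()
        return {
            "short":     int(len(tokens) < t_length),
            "has_cue":   int(any(w in cue_words for w in tokens)),
            "query_sim": int(sim2 > t_sim),
            "acronym":   int(any(w.isupper() and len(w) > 1 for w in tokens)),
            "position":  min(position, 3),  # bucketed as 1, 2, 3+
        }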

14
Logistic-CEM Sentence Representation Features
  • Features used to train Logistic-CEM:
  • Normalized sentence length → [0, 1]
  • Normalized cue word frequency → [0, 1]
  • Sentence-query similarity (Sim2) → [0, ∞)
  • Normalized acronym frequency → [0, 1]
  • Sentence/paragraph position in text → {1, 2, 3}
  • (All of the binary features converted to
    continuous; a sketch follows.)
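
A continuous counterpart of the same features, per the final bullet
(normalization constants are illustrative, not from the paper):

    # Continuous versions of the binary features above.
    # max_len and all names are placeholder assumptions.
    def continuous_features(sentence, sim2, position, cue_words, max_len=50):
        tokens = sentence.split()
        n = max(len(tokens), 1)
        return {
            "length":    min(len(tokens) / max_len, 1.0),           # [0, 1]
            "cue_freq":  sum(w in cue_words for w in tokens) / n,   # [0, 1]
            "query_sim": sim2,                                      # [0, inf)
            "acro_freq": sum(w.isupper() and len(w) > 1
                             for w in tokens) / n,                  # [0, 1]
            "position":  min(position, 3),
        }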

15
Results on Reuters dataset
16
Results on Reuters dataset