1
Text Classification from Labeled and Unlabeled Documents using EM
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell, 1999
  • Eleni Foteinopoulou s0969664
  • Efthymios Kouloumpis s0928744

2
Overview
  • Introduction
  • Motivation
  • Naïve Bayes Learning
  • Combination of NB and EM
  • EM Extensions
  • Experiments
  • Summary

3
Text Classification
Bag of Words
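The slide illustrates the bag-of-words representation: each document is reduced to unordered word counts. A minimal illustrative sketch in Python (the example documents and the whitespace tokenizer are made up, not from the paper):

    from collections import Counter

    def bag_of_words(text):
        # Reduce a document to unordered word counts (toy lowercase + whitespace tokenizer)
        return Counter(text.lower().split())

    docs = ["the team scored a late goal",
            "the court ruled on the appeal today"]
    print([bag_of_words(d) for d in docs])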
4
Need for an intermediate approach
  • Unsupervised vs. supervised learning
  • Unsupervised learning
  • collection of documents without any labels
  • easy to collect: free, inexpensive, available in large pools
  • Supervised learning
  • each document tagged with a class label
  • labeling is laborious and time-consuming
  • Semi-supervised learning
  • fits real-life applications, where labels are scarce

5
Challenges
  • How to reduce the number of labeled examples?
  • Can unlabeled examples increase the
    classification accuracy?
  • Any ideas...?
  • Semi-Supervised Learning

6
Motivation
  • Document collection D
  • A subset D^l ⊂ D has known labels
  • Goal: label the rest of the collection, D^u = D \ D^l
  • Approach
  • Train a supervised learner on D^l, the labeled subset → NB
  • Apply the trained learner to the remaining documents D^u → EM
  • Idea
  • Harness information from the unlabeled subset D^u

7
The Generative Model
  • Probabilistic generative model
  • Every document is generated by a probability distribution
  • Assumptions
  • Mixture model
  • One-to-one correspondence between mixture components and classes
  • Document length distribution (independent of the class)
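As a sketch of these assumptions in the paper's notation (θ the model parameters, c_j the mixture components, in one-to-one correspondence with the classes), the likelihood of a document is a mixture:

    P(d_i \mid \theta) = \sum_{j=1}^{|C|} P(c_j \mid \theta) \, P(d_i \mid c_j; \theta)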

8
Naïve Bayes Learning
  • Assign each document to a particular mixture component (one component per class)
  • The parameters of an individual mixture component form a multinomial distribution over words
  • Estimate model parameters θ → maximum a posteriori (MAP) estimation
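A sketch of the MAP estimate of the word probabilities under the multinomial model with a Laplacean (add-one) prior; here N(w_t, d_i) is the count of word w_t in document d_i, |V| is the vocabulary size, and P(y_i = c_j | d_i) is 1 or 0 for labeled documents:

    \hat{\theta}_{w_t \mid c_j} =
      \frac{1 + \sum_i N(w_t, d_i)\, P(y_i = c_j \mid d_i)}
           {|V| + \sum_{s=1}^{|V|} \sum_i N(w_s, d_i)\, P(y_i = c_j \mid d_i)}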

9
Naïve Bayes Learning
  • Maximum a posteriori estimate of the model parameters given a small set of labeled data → high variance
  • How to improve parameter estimates?
  • Incorporate unlabeled documents

10
EM Algorithm
  • Iterative algorithm for parameter estimation (maximum a posteriori)
  • Incomplete data → the missing labels of the unlabeled documents
  • Initialize: estimate parameters θ from the labeled subset D^l
  • Iterate
  • E step: calculate probabilistic labels for the unlabeled documents using the current parameter estimate θ
  • M step: maximize the complete likelihood → new maximum a posteriori estimate of θ using the current probabilistic labels
  • Continue until convergence → θ at a local maximum (a Python sketch of the loop follows below)
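A compact Python sketch of this NB + EM loop (illustrative only, not the authors' code; X_l and X_u are assumed to be word-count matrices for the labeled and unlabeled documents, y_l the integer class labels):

    import numpy as np

    def train_nb(counts, resp, vocab_size):
        # M step / MAP multinomial NB: word and class probabilities from word counts
        # weighted by per-class responsibilities, with a Laplacean (add-one) prior
        class_word = resp.T @ counts                                  # expected word counts per class
        log_theta = np.log((1 + class_word) /
                           (vocab_size + class_word.sum(axis=1, keepdims=True)))
        log_prior = np.log((1 + resp.sum(axis=0)) /
                           (resp.shape[1] + resp.sum()))
        return log_prior, log_theta

    def posteriors(counts, log_prior, log_theta):
        # E step: P(class | document) under the current model
        log_joint = counts @ log_theta.T + log_prior
        log_joint -= log_joint.max(axis=1, keepdims=True)             # numerical stability
        p = np.exp(log_joint)
        return p / p.sum(axis=1, keepdims=True)

    def nb_em(X_l, y_l, X_u, n_classes, n_iter=10):
        V = X_l.shape[1]
        resp_l = np.eye(n_classes)[y_l]                               # hard 0/1 labels for labeled docs
        log_prior, log_theta = train_nb(X_l, resp_l, V)               # initial NB model from D^l only
        for _ in range(n_iter):
            resp_u = posteriors(X_u, log_prior, log_theta)            # E step: probabilistic labels for D^u
            log_prior, log_theta = train_nb(np.vstack([X_l, X_u]),    # M step: MAP re-estimate on D^l ∪ D^u
                                            np.vstack([resp_l, resp_u]), V)
        return log_prior, log_theta

Each E step fills in probabilistic labels for D^u; each M step re-estimates the MAP parameters from the labeled and the probabilistically labeled documents together.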

11
EM Issues
  • Generative model vs. real-world text data
  • Mixture model: one-to-one correspondence between mixture components and classes
  • The same parametric model is used for classification → violations of the assumptions hurt accuracy
  • Word conditional independence (the NB assumption)
  • Extreme class probability estimates

12
EM Extensions
  • How to cope with real-world data?
  • A weighting factor for the unlabeled data
  • Multiple mixture components per class

13
EM: Reducing belief in unlabeled data
  • Problems due to unlabeled data
  • Noise in the term distribution of documents in D^u
  • Mistakes in the E-step
  • Solution
  • attenuate the contribution from documents in D^u
  • Add a damping factor λ ∈ [0, 1] to the E-step contribution from D^u (see the sketch below)
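In terms of the sketch on slide 10, the damping could be applied by scaling the unlabeled responsibilities before the M step (lam is a hypothetical value for λ; in practice it is chosen using the labeled data):

    # Weighted M step: unlabeled documents contribute with weight lam = λ
    lam = 0.1                                          # hypothetical setting, tuned on held-out labeled data
    resp_u = lam * posteriors(X_u, log_prior, log_theta)
    log_prior, log_theta = train_nb(np.vstack([X_l, X_u]),
                                    np.vstack([resp_l, resp_u]), V)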

14
EM: Modeling labels using many mixture components
  • Previous extension → reduces the effect of the mixture-model assumption
  • Goal: relax the assumption of one-to-one correspondence between mixture components and class labels
  • Introduce a many-to-one mapping → the component assignments become missing values (see the formula below)
  • E.g. for the two-class case football vs. not-football:
  • documents not about football are actually about a variety of other things
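A sketch of the resulting class posterior: with mixture components m_a and a many-to-one mapping class(m_a), the probability of a label sums over all components assigned to it:

    P(y_i = c_j \mid d_i; \theta) = \sum_{a:\ \mathrm{class}(m_a) = c_j} P(m_a \mid d_i; \theta)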

15
EM: Modeling labels using many mixture components
  • Lower accuracy with one mixture component per label → such classes are not naturally modeled by a single component
  • Higher accuracy with more mixture components per label → captures word dependencies within a label
  • Overfitting and poor performance with too many mixture components

16
Experiments
  • Unlabeled Data + EM (newsgroup articles)

17
Experiments
  • Unlabeled Data + EM (web pages)

18
Experiments
  • Varying the weights on Unlabeled Data

19
Summary - Conclusions
  • Labels are expensive
  • Unlabeled data supplements scarce labeled data
  • reduces classification error by up to 30%
  • Real data can be inconsistent with the generative model assumptions
  • Extensions of EM
  • Weighting the unlabeled data prevents a decrease in accuracy
  • Many-to-one mapping from mixture components to class labels
  • Future Work
  • Incremental learning algorithm that uses the unlabeled data of the test phase