1
BayesANIL: A Bayesian Model for Handling Approximate, Noisy or Incomplete Labeling in Text Classification
  • Ganesh Ramakrishnan (ganramkr@in.ibm.com)
  • Krishna Prasad Chitrapura (kchitrap@in.ibm.com)
  • Raghu Krishnapuram (kraghura@in.ibm.com)
  • Pushpak Bhattacharyya (pb@cse.iitb.ac.in)

2
Outline
  • Motivation
  • Related work
  • Role of BayesANIL in text classification setting
  • The BayesANIL model for learning
  • Use of BayesANIL parameters in classifiers
  • Experiments
  • Conclusions

3
Motivation - hurdles in supervised learning of text classifiers
  • Approximations involved in the manual labeling of documents
  • Noise in the labeling
    • In many scenarios it is easy to generate a labeled data set with some amount of noise in the labeling (e.g., by querying the Web)
  • Learning from unlabeled documents
    • Can be looked upon as learning with incomplete labeling

4
Related work
  • Learning from a mixture of positive and unlabeled examples (Lee & Liu, 2003)
    • Our proposed method outperforms this technique.
  • Countering class noise by iteratively removing training instances that can potentially be misclassified under many models (Brodley & Friedl, 1996)
    • Does not handle approximations in the labeling process.
  • Cost-sensitive learning algorithms (Domingos, 1999)
    • E.g., for data sets with imbalanced classes
    • The proposed method is complementary to this work.

5
Related work (contd.)
  • Generalization from few labeled examples
    • Learning with labeled and unlabeled data (Nigam et al., 2000; Ando & Zhang, 2004)
    • Feature smoothing techniques such as Laplace, Lidstone and Jeffreys-Perks smoothing (Griffiths & Tenenbaum, 2001)
    • These techniques do not account for the empirical distribution of features in unlabeled documents
  • Probabilistic latent semantic analysis (Hofmann, 1999)
    • More suited to information retrieval

6
What we propose
  • A model that estimates the degree to which each document d belongs to (or fits into) each class z, i.e., Pr(d,z).
  • This estimate of Pr(d,z) is used to help traditional text classifiers (NB, SVM) handle approximate, noisy or incomplete labeling of text documents.
  • Pr(d|z) can be used as a measure of support, while Pr(z|d) can be used as a measure of confidence (see the identities below).
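Both measures follow from Pr(d,z) by standard probability identities:

  \Pr(z \mid d) = \frac{\Pr(d,z)}{\sum_{z'} \Pr(d,z')}, \qquad \Pr(d \mid z) = \frac{\Pr(d,z)}{\Pr(z)}, \qquad \Pr(d) = \sum_{z} \Pr(d,z)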

7
Role of BayesANIL in text classification
8
The BayesANIL model: notations
  • A word w is independent of the document d, given the class z
  • A class generates document instances, each of which is a bag of words
  • Pr(w|d) is computed as the fraction of times word w occurs across all words in document d
  • Observables: the word counts n(w,d)
  • Parameters: the class priors Pr(z) and the class-conditional word distributions Pr(w|z)

9
The BayesANIL model: notations (contd.)
  • Scale each document to a common length, to avoid modeling document length.
  • The observations n(w,d) become Pr(w|d) when scaled to unit length.
  • Use the empirical distribution q(w,z) in place of n(w,z) (defined below).
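As a sketch in this notation, the per-document empirical distribution is the normalized count vector, and (my assumption about the aggregation) q(w,z) combines these under the empirical document-class distribution q(d,z) obtained from the given labels:

  q(w \mid d) = \frac{n(w,d)}{\sum_{w'} n(w',d)}, \qquad q(w,z) = \sum_{d} q(d,z)\, q(w \mid d)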

10
The BayesANIL model: objective function
  • Log-likelihood objective
  • A more general form of the log-likelihood objective (Amari, 1995); both are sketched below
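As a hedged reconstruction from the surrounding definitions (the paper's exact expressions may differ), the standard log-likelihood and its generalization over the empirical joint distribution q(d,z) take roughly the form:

  L(\Theta) = \sum_{d} \log \sum_{z} \Pr(z) \prod_{w} \Pr(w \mid z)^{n(w,d)}

  F(\Theta) = \sum_{d,z} q(d,z) \log \Pr(d,z \mid \Theta), \qquad \Pr(d,z \mid \Theta) = \Pr(z) \prod_{w} \Pr(w \mid z)^{q(w \mid d)}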

11
The BayesANIL model: E and M steps
  • The condition for the maximum value of the objective function is obtained by setting its derivatives to zero (with Lagrange multipliers for the normalization constraints), yielding updates of the form sketched below.
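As a hedged sketch, the textbook fixed-point updates for a model of this form are:

  E step: \Pr(z \mid d) \propto \Pr(z) \prod_{w} \Pr(w \mid z)^{q(w \mid d)}

  M step: \Pr(z) = \frac{1}{|D|} \sum_{d} \Pr(z \mid d), \qquad \Pr(w \mid z) = \frac{\sum_{d} \Pr(z \mid d)\, q(w \mid d)}{\sum_{d} \Pr(z \mid d)}

The paper's updates also involve the empirical joint q(d,z), so the exact forms may differ.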

12
The Algorithm
An EM iteration, restructured for efficient storage and computation (a plain, unrestructured sketch follows).
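A minimal, unrestructured sketch of such an EM loop in NumPy, assuming the quantities defined on the previous slides; the function name, initialization and array layout are my assumptions, not the paper's restructured algorithm:

  import numpy as np

  def bayesanil_em(q_wd, q_dz, max_iters=100, tol=0.01):
      # q_wd: (D, W) row-normalized empirical word distributions q(w|d)
      # q_dz: (D, Z) empirical document-class weights from the (noisy) labels
      # Initialize the parameters from the given labels.
      pr_z = q_dz.sum(axis=0) + 1e-12                # unnormalized Pr(z)
      pr_z /= pr_z.sum()
      pr_wz = q_dz.T @ q_wd + 1e-12                  # Pr(w|z), shape (Z, W)
      pr_wz /= pr_wz.sum(axis=1, keepdims=True)
      prev_ll = -np.inf
      for _ in range(max_iters):
          # E step: log Pr(z|d) = log Pr(z) + sum_w q(w|d) log Pr(w|z) + const
          log_post = np.log(pr_z) + q_wd @ np.log(pr_wz).T
          log_norm = np.logaddexp.reduce(log_post, axis=1, keepdims=True)
          post = np.exp(log_post - log_norm)         # Pr(z|d), shape (D, Z)
          # M step: re-estimate Pr(z) and Pr(w|z) from the posteriors.
          pr_z = post.mean(axis=0) + 1e-12
          pr_z /= pr_z.sum()
          pr_wz = post.T @ q_wd + 1e-12
          pr_wz /= pr_wz.sum(axis=1, keepdims=True)
          # Stop when the log-likelihood gain falls below tol (cf. slide 17).
          ll = float(log_norm.sum())
          if ll - prev_ll < tol:
              break
          prev_ll = ll
      return pr_z, pr_wz, post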
13
Re-estimating the empirical distribution
  • An optional E step that re-estimates the empirical distribution
  • Controlled by a smoothing parameter k (one plausible form of the update is sketched below)
  • When learning in the presence of classification noise, k serves as an estimate of the proportion of noise in the training data
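One plausible form of this re-estimation, consistent with the description above (a convex combination controlled by k; my assumption rather than the paper's exact rule):

  q^{\text{new}}(d,z) = (1-k)\, q(d,z) + \frac{k}{|D|}\, \Pr(z \mid d)

With k set to the estimated noise proportion, that fraction of the empirical labeling is replaced by the model's current posterior.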

14
Utilizing parameters of BayesANIL in NB
  • Improved estimation of the NB parameter Pr(w|z), based on the degree to which the training documents belong to each class (a posterior-weighted form is sketched below).
  • We call this WeightedNB.
  • No explicit feature smoothing is performed.
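As a sketch of the idea (a standard posterior-weighted form; the paper's exact equations may differ), WeightedNB can replace the hard label indicators of vanilla NB with the BayesANIL posteriors:

  \Pr(w \mid z) = \frac{\sum_{d} \Pr(z \mid d)\, q(w \mid d)}{\sum_{d} \Pr(z \mid d)} \quad \text{in place of} \quad \frac{\sum_{d\,:\,\ell(d)=z} q(w \mid d)}{|\{d : \ell(d) = z\}|}

where ℓ(d) denotes the given label of document d.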

15
Utilizing parameters of BayesANIL in SVM
  • Pr(d), computed from Pr(d,z), is a measure of support for how well d is labeled.
  • Cost-based SVM learners allow the cost of misclassification to be set for each document d.
  • We used a Matlab-based SVM learner: http://www.igi.tugraz.at/aschwaig/software.html
  • Error-correcting output codes are used for handling multiple classes.
  • We call the resultant classifier WeightedSVM (an illustrative analogue follows).
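For illustration only, a rough modern analogue of this cost assignment uses per-sample weights in scikit-learn; the function name, the Pr(d)-to-cost scaling and the linear kernel here are my assumptions, not the paper's setup:

  import numpy as np
  from sklearn.svm import SVC

  def weighted_svm(X, y, pr_dz, C=1.0):
      # pr_dz: (D, Z) matrix of Pr(d,z) estimates from BayesANIL.
      # Pr(d) = sum_z Pr(d,z) is the support for document d; documents with
      # low support receive a proportionally lower misclassification cost.
      pr_d = pr_dz.sum(axis=1)
      weights = pr_d / pr_d.mean()          # keep the average cost at C
      clf = SVC(C=C, kernel="linear")
      clf.fit(X, y, sample_weight=weights)  # per-example cost = C * weight
      return clf

Note that scikit-learn handles multiple classes internally (one-vs-one), rather than via the error-correcting output codes used in the paper.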

16
Experiments and Results
  • Four types of experimental setups:
    • Supervised learning
    • Access to unlabeled examples
    • Learning in the presence of noisy labels
    • Pr(d) as a measure of support
  • Two data sets:
    • 20 Newsgroups
    • WebKB
  • Data preparation:
    • Rainbow was used to parse, tokenize and index the documents
    • Stop words were not removed
    • No stemming was performed

17
Experiments and Results: Supervised
  • Accuracies on the two data sets, with and without the Pr(d,z) estimates from BayesANIL.
  • We stop the EM iterations when the change in log-likelihood between two successive iterations is less than 0.01.
  • The smoothing parameter was set to k = 0.001.
  • The train-to-test ratio was 60:40.
  • Results are reported over 20 random train-test splits.

18
Experiments and Results: Labeled-unlabeled
  • Setup similar to (Nigam et al., 2000)
  • We set aside 1% of the documents for training and 10% for testing; the unlabeled collection is built from the remaining documents.
  • We report accuracies on the test data while varying the number of unlabeled documents, for two values of k.

19
Experiments and Results: Access to unlabeled for WebKB
20
Experiments and Results: Access to unlabeled for 20 Newsgroups
21
Experiments and Results: Noisy Labels
  • Experimental setup as in (Lee & Liu, 2003): 50% training, 20% validation (stopping criterion) and 30% testing.
  • If the classification noise level is a, we set k = a to counter the noisy labels.
  • The results tabulated are for WeightedSVM.

22
Comparison with F1 results as reported by (Liu et al., 2003)
Our F1 results
F1 results reported by Liu et al.
23
Experiments and Results: Notion of Support
  • 10% labeled, rest unlabeled.
  • 30% classification noise in the labeled set.
  • Mean and standard deviation of Pr(d) for categories of training documents, based on the original label z:
    • Labeled Correct: d is labeled and argmax_z' Pr(z'|d) = z.
    • Labeled Wrong: d is labeled and argmax_z' Pr(z'|d) ≠ z.
    • Unlabeled Correct: d is unlabeled and argmax_z' Pr(z'|d) = z.
    • Unlabeled Wrong: d is unlabeled and argmax_z' Pr(z'|d) ≠ z.

24
Summary
  • An EM-based algorithm for estimating Pr(d,z):
    • provides measures of support and confidence
    • an effective way to assist the (re)labeling of documents
  • An intuitive modification to the E step that re-estimates the empirical distribution, an effective way to:
    • reinforce feature values in the unlabeled data, and
    • reduce the influence of noisily labeled examples
  • BayesANIL provides measures of confidence, Pr(z|d), and support, Pr(d).
  • The parameters of BayesANIL are shown to improve the classification accuracy of NB and SVM:
    • in the presence/absence of noise
    • with and without unlabeled documents

25
Future work
  • Handling multi-labeled documents
  • Extending to information retrieval.
  • Extending the implementation to handle multiple
    feature types such as links, titles, etc.

26
Thank you for your attention