Title: BayesANIL: A Bayesian Model for Handling Approximate, Noisy or Incomplete Labeling in Text Classification
1. BayesANIL: A Bayesian Model for Handling Approximate, Noisy or Incomplete Labeling in Text Classification
- Ganesh Ramakrishnan (ganramkr_at_in.ibm.com)
- Krishna Prasad Chitrapura (kchitrap_at_in.ibm.com)
- Raghu Krishnapuram (kraghura_at_in.ibm.com)
- Pushpak Bhattacharyya (pb_at_cse.iitb.ac.in)
2. Outline
- Motivation
- Related work
- Role of BayesANIL in text classification setting
- The BayesANIL model for learning
- Use of BayesANIL parameters in classifiers
- Experiments
- Conclusions
3. Motivation: hurdles in supervised learning of text classifiers
- Approximations involved in manual labeling of documents.
- Noise in the labeling.
  - In many scenarios, it is easy to generate a labeled data set with some amount of noise in the labeling (e.g., by querying the Web).
- Learning from unlabeled documents.
  - Can be looked upon as learning with incomplete labeling.
4. Related work
- Learning from a mixture of positive and unlabeled examples (Lee and Liu, 2003).
  - Our proposed method outperforms this technique.
- Countering class noise by iterative removal of training instances that can be potentially misclassified under many models (Brodley and Friedl, 1996).
  - Does not handle approximations in the labeling process.
- Cost-sensitive learning algorithm (Domingos, 1999), e.g., for data sets with imbalanced classes.
  - The proposed method is complementary to this work.
5. Related work (contd.)
- Generalization from few labeled examples.
- Learning with labeled and unlabeled data (Nigam et al., 2000; Ando and Zhang, 2004).
- Feature smoothing techniques such as Laplace, Lidstone and Jeffreys-Perks smoothing (Griffiths and Tenenbaum, 2001).
  - These techniques do not account for the empirical distribution of features in unlabeled documents.
- Probabilistic latent semantic analysis (Hofmann, 1999).
  - More suited for information retrieval.
6. What we propose
- A model that estimates the degree to which each document d belongs to (or fits into) each class z: Pr(d, z).
- Use this measure Pr(d, z) to aid traditional text classifiers (NB, SVM) in handling Approximate, Noisy or Incomplete labeling of text documents.
- Pr(d|z) can be used as a measure of support, while Pr(z|d) can be used as a measure of confidence.
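As a sketch, both measures, and the overall support Pr(d), follow from the usual decomposition of the joint Pr(d, z):

```latex
\Pr(z \mid d) = \frac{\Pr(d, z)}{\Pr(d)}, \qquad
\Pr(d \mid z) = \frac{\Pr(d, z)}{\Pr(z)}, \qquad
\Pr(d) = \sum_{z} \Pr(d, z).
```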
7. Role of BayesANIL in text classification
8. The BayesANIL model: notations
- w is independent of d given z.
- A class generates document instances, each of which is a bag of words.
- Pr(w|d) is computed as the fraction of times word w occurs across all words in document d.
- Observables: the word-document counts n(w, d).
- Parameters: Pr(z), Pr(d|z) and Pr(w|z).
9. The BayesANIL model: notations (contd.)
- Scale each document to a common length to avoid modeling document length.
- Observations n(w, d) become Pr(w|d) when scaled to unit length.
- Use the empirical distribution q(w, z) in place of n(w, z).
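The unit-length scaling above can be sketched in a few lines (function and variable names here are illustrative, not from the slides):

```python
from collections import Counter

def empirical_distribution(tokens):
    """Scale a document's word counts n(w, d) to unit length,
    yielding Pr(w|d): the fraction of times each word occurs
    among all words in the document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

# e.g. Pr("bayes"|d) = 2/4 for a 4-word document containing "bayes" twice
p = empirical_distribution(["bayes", "model", "bayes", "noise"])
```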
10. The BayesANIL model: Objective function
- A more general form of the log-likelihood objective (Amari, 1995).
11. The BayesANIL model: E and M steps
- The condition for the maximum value of the objective function yields the E and M step updates.
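As an illustrative sketch, a generic multinomial-mixture objective and its EM updates consistent with the notation used here (the exact BayesANIL equations may differ):

```latex
% Generalized log-likelihood over the empirical distribution:
\mathcal{L} \;=\; \sum_{d}\sum_{w} \hat{q}(w,d)\,
  \log \sum_{z} \Pr(z)\,\Pr(d \mid z)\,\Pr(w \mid z)

% E step: responsibility of class z for the pair (w, d)
\Pr(z \mid w, d) \;=\;
  \frac{\Pr(z)\,\Pr(d \mid z)\,\Pr(w \mid z)}
       {\sum_{z'} \Pr(z')\,\Pr(d \mid z')\,\Pr(w \mid z')}

% M step: re-estimate parameters from responsibility-weighted counts
\Pr(w \mid z) \;\propto\; \sum_{d} \hat{q}(w,d)\,\Pr(z \mid w, d)
```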
12. The Algorithm
- An EM iteration restructured for efficient storage and computation.
13. Re-estimating the empirical distribution
- An optional E step, with a smoothing parameter k.
- In the case of learning in the presence of classification noise, k serves as an estimate of the proportion of noise in the training data.
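One plausible form of this re-estimation, assuming k interpolates between the original empirical distribution and the model's expectation (an assumption, not necessarily the slide's exact update):

```latex
\hat{q}_{\text{new}}(w, d) \;=\; (1 - k)\,\hat{q}(w, d)
  \;+\; k \sum_{z} \Pr(z \mid d)\,\Pr(w \mid z)
```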
14. Utilizing parameters of BayesANIL in NB
- Improved estimation of the NB parameter Pr(w|z), based on the degree to which the training documents belong to each class.
- We call this WeightedNB.
- No explicit feature smoothing is performed.
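A minimal sketch of the WeightedNB idea, assuming Pr(z|d) weights from BayesANIL are available (all names are illustrative): each document contributes to Pr(w|z) in proportion to its class membership, with no explicit feature smoothing.

```python
from collections import defaultdict

def weighted_nb_word_probs(docs, class_weights, classes):
    """Estimate Pr(w|z) with each document's counts weighted by Pr(z|d).

    docs          -- list of {word: Pr(w|d)} empirical distributions
    class_weights -- list of {z: Pr(z|d)} dicts, one per document
    """
    totals = {z: defaultdict(float) for z in classes}
    for q_wd, p_zd in zip(docs, class_weights):
        for z in classes:
            for w, q in q_wd.items():
                totals[z][w] += p_zd.get(z, 0.0) * q
    # normalize each class's weighted counts into a word distribution
    probs = {}
    for z in classes:
        s = sum(totals[z].values())
        probs[z] = {w: v / s for w, v in totals[z].items()} if s else {}
    return probs
```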
15. Utilizing parameters of BayesANIL in SVM
- Pr(d), computed from Pr(d, z), is a measure of support for how well d is labeled.
- Cost-based SVM learners allow setting the cost of misclassification for each document d.
- Used a Matlab-based SVM learner: http://www.igi.tugraz.at/aschwaig/software.html
- Error-correcting output codes for handling multiple classes.
- We call the resultant classifier WeightedSVM.
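The mapping from support Pr(d) to a per-document misclassification cost can be sketched as follows; the linear scaling around the mean is a hypothetical choice, since the slides do not give the exact mapping.

```python
def misclassification_costs(support, base_cost=1.0):
    """Map per-document support scores Pr(d) to misclassification
    costs for a cost-based SVM learner: well-supported labels get
    proportionally higher cost (hypothetical linear scaling)."""
    mean = sum(support) / len(support)
    return [base_cost * s / mean for s in support]
```

In place of the Matlab learner mentioned above, such costs could be passed to any learner that accepts per-sample weights, e.g. scikit-learn's `SVC.fit(X, y, sample_weight=costs)`.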
16. Experiments and Results
- Four types of experimental setups:
  - Supervised learning
  - Access to unlabeled examples
  - Learning in presence of noisy labels
  - Pr(d) as a measure of support
- Two data sets:
  - 20 Newsgroups
  - WebKB
- Data preparation:
  - Rainbow to parse, tokenize and index the documents
  - Stop words were not removed
  - No stemming was performed
17. Experiments and Results: Supervised
- Accuracies on 2 data sets with and without Pr(d, z) estimates from BayesANIL.
- We stop the EM iterations when the change in log-likelihood between two successive iterations is less than 0.01.
- The smoothing parameter was set to k = 0.001.
- Train-to-test ratio was 60:40.
- Results reported on 20 random train-test splits.
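The stopping rule above can be sketched as a small EM driver; the actual E and M computations are stubbed out and all names are illustrative.

```python
def run_em(em_step, tol=0.01, max_iter=1000):
    """Iterate until the log-likelihood changes by less than `tol`
    between successive iterations, the stopping rule used in these
    experiments. `em_step` performs one E+M pass and returns the
    new log-likelihood."""
    prev = float("-inf")
    for it in range(1, max_iter + 1):
        ll = em_step()
        if ll - prev < tol:  # EM log-likelihood is non-decreasing
            return ll, it
        prev = ll
    return prev, max_iter
```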
18. Experiments and Results: Labeled-unlabeled
- Setup similar to (Nigam et al., 2000).
- We set aside 1% for training and 10% for test; the unlabeled collection is built from the remaining documents.
- We report accuracies on test data by varying the number of unlabeled documents, across two values of k.
19. Experiments and Results: Access to unlabeled for WebKB
20. Experiments and Results: Access to unlabeled for 20 Newsgroups
21. Experiments and Results: Noisy Labels
- Experimental setup as in (Lee and Liu, 2003): 50% training, 20% validation (stopping criterion) and 30% testing.
- When the classification noise is a, we set k = a to counter the noisy labels.
- The results tabulated are for WeightedSVM.
22. Comparison with results as reported by (Bing Liu et al., 2003)
Our results on F1
F1 Results reported by Bing Liu et al.
23. Experiments and Results: Notion of Support
- 10% labeled, rest unlabeled.
- 30% classification noise in the labeled set.
- Mean and standard deviation of Pr(d) for categories of training documents, based on the original label z:
  - Labeled Correct: d is in the labeled set and argmax_z' Pr(z', d) = z.
  - Labeled Wrong: d is in the labeled set and argmax_z' Pr(z', d) ≠ z.
  - Unlabeled Correct: d is unlabeled and argmax_z' Pr(z', d) = z.
  - Unlabeled Wrong: d is unlabeled and argmax_z' Pr(z', d) ≠ z.
24. Summary
- EM-based algorithm for estimating Pr(d, z):
  - provides measures of support and confidence
  - an effective way to assist (re)labeling of documents.
- An intuitive modification to the E step to re-estimate the empirical distribution, an effective way to:
  - reinforce feature values in the unlabeled data, and
  - reduce the influence of the noisily labeled examples.
- BayesANIL provides measures of confidence Pr(z|d) and support Pr(d).
- Parameters of BayesANIL shown to improve the classification accuracy of NB and SVM:
  - in presence/absence of noise
  - with and without unlabeled documents.
25. Future work
- Handling multi-labeled documents
- Extending to information retrieval.
- Extending the implementation to handle multiple
feature types such as links, titles, etc.
26. Thank you for your attention