Text Classification With Support Vector Machines

1
Text Classification With Support Vector Machines
Presenter: Aleksandar Milisic
Supervisor: Dr. David Albrecht
2
Overview
  • Text Classification: What and Why?
  • Text Clustering
  • Support Vector Machines
  • Current Techniques
  • Project Aim and Plan

3
Text Classification: What and Why?
  • Text classification: assigning documents to
    predefined classes (categories).
  • Example: Web pages can be assigned to politics,
    sport, business, entertainment, etc.
  • There are thousands of categories associated with
    web pages.
  • Labeling manually is time-consuming and sometimes
    impossible, so the process needs to be automated!

4
Text Classification: What and Why?
  • Automated text classifiers need to be able to
    learn from:
      • A small set of labeled documents
      • A large set of unlabeled documents
  • Otherwise a lot of labeling would have to be
    done by humans.
  • So how is it done?

5
Representing Text
Example text:
With paperless offices becoming more common,
companies start using document databases with
classification schemes
Feature Vector (term: count):
  Companies: 1
  Document: 3
  Distance: 0
  . . .
  Offices: 1
  Unix: 0
  Match: 0
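
The feature vector above can be sketched as a simple word-count (bag-of-words) representation. This is a minimal illustration, not the presenter's implementation: the function name `feature_vector`, the tokenization rule, and the six-term vocabulary are assumptions. Run on the one sentence shown, "document" is counted once rather than the slide's 3, presumably because the slide counted a longer text.

```python
import re
from collections import Counter

def feature_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    # Assumed tokenization: lowercase, then keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary built from the terms visible on the slide.
vocabulary = ["companies", "document", "distance", "offices", "unix", "match"]
text = ("With paperless offices becoming more common, companies start "
        "using document databases with classification schemes")

vector = feature_vector(text, vocabulary)
print(dict(zip(vocabulary, vector)))
```

Each document becomes one such vector, with one position per vocabulary term, which is why real collections end up with thousands of features.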
6
Clustering
Figure: feature vectors of labeled documents and unlabeled documents.

7
Support Vector Machines (SVM)
  • SVMs are binary classifiers.
  • An SVM maximizes the distance (margin) between two
    classes by finding the Optimal Separating Hyperplane (OSH).
  • Support vectors are the training points closest to the OSH.

Figure: the OSH separating Class 1 from Not Class 1, with the support vectors marked.
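
The geometry on this slide can be illustrated with a small sketch. It does not train an SVM: the four 2-D points and the hyperplane `w`, `b` are hand-picked assumptions (for this toy data, x1 + x2 = 4 is the perpendicular bisector of the closest opposite-class pair). The code only shows how a hyperplane classifies points and which points lie closest to it.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical training points: label +1 is "Class 1", -1 is "Not Class 1".
points = [((3.0, 3.0), 1), ((4.0, 4.0), 1), ((1.0, 1.0), -1), ((0.0, 0.0), -1)]
# Assumed separating hyperplane x1 + x2 - 4 = 0 (chosen by hand, not learned).
w, b = (1.0, 1.0), -4.0

predictions = [1 if decision_value(w, b, x) > 0 else -1 for x, _ in points]
distances = [distance_to_hyperplane(w, b, x) for x, _ in points]
margin = min(distances)
# The support vectors are the points that sit exactly on the margin.
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, margin)]
print(predictions, support_vectors)
```

Only the support vectors determine the hyperplane; moving the other points farther away would leave it unchanged.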
8
Current Techniques
  • Clustering Methods
      • Rasmussen's Single Pass Algorithm (as described
        by Raskutti et al. (2002))
      • Reallocation Method
      • Hierarchical Methods
  • Classification Methods
      • Support Vector Machines
      • Co-Training Algorithm (Blum and Mitchell, 1998)
  • Raskutti et al. (2002) describe an interesting
    approach combining SVMs with Rasmussen's
    clustering algorithm.
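
As a rough illustration of the single-pass idea, the sketch below reads each document once and either adds it to the most similar existing cluster or starts a new one. The slides do not give the details of the variant described by Raskutti et al. (2002), so cosine similarity, the running-mean centroid, the `threshold` parameter, and the sample vectors are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def single_pass_cluster(vectors, threshold):
    """Assign each vector to the most similar cluster, or open a new one."""
    centroids, members = [], []  # one centroid and one index list per cluster
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            n = len(members[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [c + (x - c) / n
                               for c, x in zip(centroids[best], vec)]
        else:
            centroids.append(list(vec))
            members.append([i])
    return members

# Hypothetical feature vectors; labeled and unlabeled documents are treated alike.
vectors = [[1, 0, 1], [2, 4, 0], [2, 0, 2], [1, 3, 0]]
clusters = single_pass_cluster(vectors, threshold=0.8)
print(clusters)
```

The result depends on the order in which documents arrive and on the threshold, which is the usual trade-off for making only one pass over the data.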

9
Combining SVM With Clustering
Figure: labeled documents (Class 1 and Not Class 1) and unlabeled documents with added features; the support vectors and the separating hyperplane are marked.
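
One way to picture the "added features" is to append cluster-derived values to each document's feature vector before the SVM sees it. The slides do not say which cluster parameters Raskutti et al. (2002) add, so the sketch below uses similarity to each cluster centroid as an assumed stand-in; `add_cluster_features` and the centroids are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def add_cluster_features(vector, centroids):
    """Return the original features followed by one similarity per cluster."""
    return list(vector) + [cosine(vector, c) for c in centroids]

# Hypothetical centroids, e.g. from clustering labeled and unlabeled documents.
centroids = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
augmented = add_cluster_features([2, 0, 2], centroids)
print(augmented)
```

Because the clusters are built from unlabeled as well as labeled documents, the extra features carry information from the unlabeled data into a classifier that is trained only on the labeled set.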

10
Project Aim
  • Resolve the following issues:
      • Can combining SVMs with other techniques
        improve performance?
      • Documents have thousands of features.
      • Can different feature representation (selection)
        techniques improve performance without affecting
        accuracy?
      • Documents can belong to multiple classes, but
        SVMs are binary classifiers!
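
A common way around the binary limitation is one-vs-rest: train one binary classifier per class and assign every class whose classifier fires. The slides do not say this is the approach the project will take, so treat the sketch as one possible option; the class names, feature order, and weights below are hand-picked assumptions, not trained values.

```python
def decision_value(w, b, x):
    """Signed score w.x + b of a linear binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_labels(classifiers, x):
    """Return every class whose binary classifier scores x as positive."""
    return [label for label, (w, b) in classifiers.items()
            if decision_value(w, b, x) > 0]

# Hypothetical classifiers over the features (goal, election, profit).
classifiers = {
    "sport": ((1.0, -0.5, 0.0), -0.5),
    "politics": ((-0.5, 1.0, 0.0), -0.5),
    "business": ((0.0, 0.2, 1.0), -0.5),
}

labels_a = predict_labels(classifiers, (0, 2, 1))  # election twice, profit once
labels_b = predict_labels(classifiers, (3, 0, 0))  # goal three times
print(labels_a, labels_b)
```

Since each classifier decides independently, a document can receive several labels or none at all.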

11
Project Plan
  • Currently implementing the clustering technique
    described in Raskutti et al. (2002)
  • Plan to implement other clustering techniques
  • Investigate different feature representation
    (selection) techniques
      • For example, different weights for words in
        different positions in the document
  • Investigate the multi-class problem
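
The position-weighting idea in the plan could look roughly like the sketch below, where a word counts for more when it appears in a more prominent part of the document. The title/body split, the weight values, and the function name `weighted_feature_vector` are assumptions for illustration only.

```python
import re
from collections import Counter

def weighted_feature_vector(sections, weights, vocabulary):
    """Sum term occurrences per section, scaled by that section's weight."""
    totals = Counter()
    for name, text in sections.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            # Sections without an explicit weight default to 1.0.
            totals[token] += weights.get(name, 1.0)
    return [totals[term] for term in vocabulary]

# Hypothetical document split into two positions.
sections = {
    "title": "Document databases",
    "body": "Companies use document databases in offices",
}
weights = {"title": 3.0, "body": 1.0}  # assumed: title words count triple
vocabulary = ["companies", "document", "offices", "unix"]

vector = weighted_feature_vector(sections, weights, vocabulary)
print(vector)
```

Whether such weights help is exactly the kind of question the plan proposes to investigate, so the values here should be read as placeholders.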

12
References
  • Blum, A. and T. Mitchell (1998). Combining
    labeled and unlabeled data with co-training.
    In COLT: Proceedings of the Workshop on
    Computational Learning Theory, Morgan Kaufmann
    Publishers.
  • Raskutti, B., H. Ferra, and A. Kowalczyk (2002).
    Using unlabeled data for text classification
    through addition of cluster parameters.
    In International Conference on Machine
    Learning (Accepted).