1
Semi-supervised Classification
Jieping Ye
Department of Computer Science and Engineering
Arizona State University
http://www.public.asu.edu/~jye02
2
Outline of lecture
  • Overview of semi-supervised clustering
  • What is semi-supervised classification?
  • Algorithms for semi-supervised classification
    • Graph mincuts
    • Harmonic approach
    • Consistency approach
    • Transductive SVM

3
Overview of semi-supervised clustering
  • Domain knowledge
    • Partial label information is given
    • Some constraints are applied (must-links and cannot-links)
  • Approaches
    • Search-based semi-supervised clustering: alter the clustering algorithm using the constraints
    • Similarity-based semi-supervised clustering: alter the similarity measure based on the constraints
    • Combination of both

4
What is semi-supervised classification?
  • Use a small number of labeled examples to label a large amount of unlabeled data.
    • Labeling is expensive.
  • Basic idea
    • Similar data should have the same class label.
  • Typical examples
    • Web page classification
    • Document classification
    • Protein classification

5
Problem setting
  • X = {x1, …, xl, xl+1, …, xn} ⊂ Rm
  • Label set L = {0, 1}
  • The first l points have been labeled: yi ∈ {0, 1}, i = 1, …, l.
  • For points with i > l, yi is unknown.
  • The error is checked on the unlabeled examples only.

6
Problem setting
[Figure: the data matrix X, split into labeled and unlabeled points, and the label vector y holding the 0/1 labels of the labeled portion. Goal: predict the labels of the unlabeled data.]
7
The cluster assumption
  • The basic assumption of most semi-supervised learning algorithms:
    • Nearby points are likely to have the same label.
    • Two points that are connected by a path going through high-density regions should have the same label.

8
Cluster Assumption Example
[Slides 8–10 showed figures illustrating the cluster assumption; the images are not reproduced in the transcript.]
11
Semi-supervised classification algorithms
  • Semi-supervised EM [Nigam, ML 2000]
  • Co-training [Blum, COLT 1998]
  • Graph-based algorithms [Blum, ICML 2001; Joachims, ICML 2003; Zhu, ICML 2003; Zhou, NIPS 2003]
  • Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

12
Graph based approaches
  • Construct the graph G = (V, E) corresponding to the n data points.
  • The n×n weight matrix W on this graph is given by Wij = exp(−‖xi − xj‖² / 2σ²) for i ≠ j, and Wii = 0.

13
Construct the graph
W is the similarity matrix
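
As a concrete illustration, here is a minimal NumPy sketch of this graph construction, assuming the Gaussian similarity above (the function name and the choice of σ are illustrative):

    import numpy as np

    def affinity_matrix(X, sigma):
        """Gaussian affinity: Wij = exp(-||xi - xj||^2 / (2 sigma^2)), Wii = 0."""
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)  # no self-edges: Wii = 0
        return W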
14
Graph based algorithms
  • Graph-based algorithms for semi-supervised classification
    • Graph mincuts
    • Harmonic approach
    • Consistency approach
    • Many others
  • Basics
    • Build the weighted graph
    • Solve an optimization problem
    • Use an objective function based on the cluster assumption

15
Graph mincuts
  • Paper: Learning from Labeled and Unlabeled Data using Graph Mincuts. Blum and Chawla, ICML 2001.
  • Build a weighted graph G = (V, W), where W is the similarity matrix.
  • Set a source v+ connected to the labeled points of class 1 and a sink v− connected to the labeled points of class 0, with infinite weights.
  • Determine a minimum (v+, v−) cut for the graph.
  • Use the max-flow algorithm, in which v+ is the source, v− is the sink, and the weights are treated as capacities (a runnable sketch follows).

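A sketch of this construction using networkx's max-flow based minimum cut (assuming binary 0/1 labels and the Gaussian affinity matrix W from above; all names are illustrative):

    import networkx as nx
    import numpy as np

    def mincut_labels(W, labeled, y):
        """Label all points by a minimum (v+, v-) cut of the similarity graph."""
        n = W.shape[0]
        G = nx.Graph()
        for i in range(n):
            for j in range(i + 1, n):
                if W[i, j] > 0:
                    G.add_edge(i, j, capacity=float(W[i, j]))  # weights as capacities
        # Tie class-1 points to the source and class-0 points to the sink with
        # infinite capacity, so the cut can never separate them from their class.
        for i, yi in zip(labeled, y):
            G.add_edge("v+" if yi == 1 else "v-", i, capacity=float("inf"))
        _, (source_side, _) = nx.minimum_cut(G, "v+", "v-")
        return np.array([1 if i in source_side else 0 for i in range(n)])
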
16
Graph mincuts
Solved by linear programming (equivalently, by the max-flow algorithm). [Slide figure not reproduced.]
17
Harmonic approach
  • Paper: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Zhu et al., ICML 2003.
  • Basics
    • Build the weighted graph.
    • The labels on the labeled data are fixed.
    • Determine the labels of the unlabeled data based on the cluster assumption.

18
Intuition
[Figure: contrast with graph mincuts, where f is discrete and the objective is non-differentiable. Here f is real-valued, the values of f on the labeled data are fixed, and the labels of the unlabeled points are determined via thresholding.]
19
Main idea
  • Define a real-valued function f: V → R on G with certain properties.
  • Goal: determine the labels of the unlabeled data by f.
  • Intuition: nearby points in the graph should have the same label.

Optimization problem: compute the optimal f minimizing the energy
E(f) = ½ Σi,j Wij (f(i) − f(j))²,
subject to the constraint that the values of f on the labeled data are fixed.
20
Harmonic function
  • The optimization problem: minimize E(f) with f fixed on the labeled points.
  • The optimal solution f is harmonic: on unlabeled points,
    f(i) = (1 / Dii) Σj Wij f(j),
    i.e., the value at each unlabeled point is the weighted average of its neighbors' values.
21
Optimal solution in matrix form
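The slide's formula did not survive extraction; in the Zhu et al. paper, with W, D, and f partitioned into labeled (l) and unlabeled (u) blocks, the harmonic solution is f_u = (D_uu − W_uu)^(−1) W_ul f_l. A minimal NumPy sketch (the function name is illustrative; assumes every unlabeled point is connected to the labeled data):

    import numpy as np

    def harmonic_solution(W, f_l):
        """f_u = (D_uu - W_uu)^{-1} W_ul f_l; the first len(f_l) points are labeled."""
        l = len(f_l)
        L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian D - W
        return np.linalg.solve(L[l:, l:], W[l:, :l] @ f_l)

Thresholding f_u at 0.5 then recovers the 0/1 labels of the unlabeled points.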
22
Comparison with graph mincuts
  • Label function f
    • Continuous versus discrete
  • Objective function
    • Similar cluster assumption
    • Difference: differentiable versus non-differentiable
  • Computation
    • Matrix computations versus linear programming (max-flow algorithm)

23
Consistency approach
  • Paper: Learning with Local and Global Consistency. Zhou et al., NIPS 2003.
  • Key ideas
    • Use the labeled points as sources that pump the labels of the different classes through the graph, and use the newly labeled points as additional sources, until a stable state is reached.
    • The label of each unlabeled point is set to the class from which it has received the most information during the iteration process.

24
Notations
  • X = {x1, …, xl, xl+1, …, xn} ⊂ Rm and label set L = {0, 1}
  • The first l points have been labeled: yi ∈ {0, 1}. For points with i > l, yi is unknown.
  • The classification is represented by a non-negative vector F:
    • yi = 1 if Fi > 0.5, and yi = 0 otherwise.
  • Let Y be the vector with elements Yi = 1 if point i has label yi = 1, and Yi = 0 otherwise.
  • For multi-class problems, the classification is represented by an n × k non-negative matrix F:
    • The classification of point xi is yi = argmaxj Fij.

25
Main idea
[Slide figure not reproduced in the transcript.]
26
The Main Algorithm
  • Form the affinity matrix W defined by Wij = exp(−‖xi − xj‖² / 2σ²) if i ≠ j, and Wii = 0.
  • Compute the matrix S = D^(−1/2) W D^(−1/2), where D is the diagonal matrix whose (i, i) element equals the sum of the i-th row of W. The spectrum of S reflects the cluster structure of the data.
  • Iterate F(t+1) = αSF(t) + (1 − α)Y until convergence, where α ∈ (0, 1).
  • Let F* denote the limit of the sequence {F(t)}.
  • Label the unlabeled point xi by yi = 1 if F*i > 0.5, and yi = 0 otherwise. (A sketch of the algorithm follows.)

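A minimal sketch of the iteration, assuming the Gaussian affinity matrix W from above and a 0/1 label vector Y (names and defaults are illustrative; assumes every point has at least one neighbor, so the row sums of W are positive):

    import numpy as np

    def consistency_iterate(W, Y, alpha=0.99, iters=1000):
        """Iterate F(t+1) = alpha * S F(t) + (1 - alpha) * Y, starting from F(0) = Y."""
        d = W.sum(axis=1)
        S = W / np.sqrt(np.outer(d, d))  # S = D^{-1/2} W D^{-1/2}
        F = np.asarray(Y, dtype=float)
        for _ in range(iters):
            F = alpha * (S @ F) + (1.0 - alpha) * Y
        return F  # threshold at 0.5 (binary) or take a row-wise argmax (multi-class)
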
27
Consistency Algorithm Convergence
  • Show that the algorithm converges to F* = (1 − α)(I − αS)^(−1) Y.
  • Without loss of generality, let F(0) = Y.
  • F(t+1) = αSF(t) + (1 − α)Y
  • Therefore F(t) = (αS)^t Y + (1 − α) Σ(i = 0 to t−1) (αS)^i Y.

28
Consistency Algorithm Convergence
  • Show that the algorithm converges to F* = (1 − α)(I − αS)^(−1) Y.
  • F(t) = (αS)^t Y + (1 − α) Σ(i = 0 to t−1) (αS)^i Y.
  • Since 0 < α < 1 and the eigenvalues of S lie in [−1, 1], the spectral radius of αS is less than 1:
    • lim(t→∞) (αS)^t = 0
    • lim(t→∞) Σ(i = 0 to t−1) (αS)^i = (I − αS)^(−1)
  • Hence F* = lim(t→∞) F(t) = (1 − α)(I − αS)^(−1) Y. (A numerical check follows.)

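As a quick sanity check of this derivation, the following sketch compares the iterate with the closed form on random data (all sizes and values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.random((20, 20))
    W = (W + W.T) / 2.0                       # symmetric nonnegative weights
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))           # eigenvalues of S lie in [-1, 1]
    Y = rng.integers(0, 2, 20).astype(float)
    alpha = 0.9

    F = Y.copy()
    for _ in range(2000):                     # F(t+1) = alpha S F(t) + (1 - alpha) Y
        F = alpha * (S @ F) + (1.0 - alpha) * Y

    F_star = (1.0 - alpha) * np.linalg.solve(np.eye(20) - alpha * S, Y)
    print(np.max(np.abs(F - F_star)))         # ~0: the limit matches the closed form
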
29
Regularization Framework
  • Define a cost function Q(F) for the iteration (its form is given under the regularization framework below).
  • The classifying function is F* = argmin_F Q(F).
  • Smoothness constraint: a good classifying function should not change too much between nearby points.

30
Regularization Framework
  • Fitting constraint: a good classifying function should not change too much from the initial label assignment.
  • μ > 0: the trade-off between the two constraints.

31
Regularization Framework
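The formulas on slides 29–31 were images; as given in the Zhou et al. paper, the cost function combining the smoothness and fitting constraints is

    Q(F) = ½ [ Σi,j Wij ‖ Fi/√Dii − Fj/√Djj ‖²  +  μ Σi ‖ Fi − Yi ‖² ],  μ > 0,

and the classifying function is F* = argmin_F Q(F). Setting α = 1/(1 + μ) and differentiating Q yields F* = (1 − α)(I − αS)^(−1) Y, the same limit the iteration converges to.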
32
Results: Two Moon Toy Problem
[Slides 32–35 showed the two-moon toy data results; the figures are not reproduced in the transcript.]
36
Experiments
[Experimental result figures not reproduced in the transcript.]
Source: Learning with Local and Global Consistency. Zhou et al.
37
Discussion
  • Methods compared: graph mincuts, harmonic approach, consistency approach
  • Objective function
    • Graph mincuts and the harmonic approach preserve the labels of the labeled data: labels cannot change.
    • The consistency approach applies a penalty term for the labeled data: labels may change.

38
Semi-supervised classification algorithms
  • Semi-supervised EM [Nigam, ML 2000]
  • Co-training [Blum, COLT 1998]
  • Graph-based algorithms [Blum, ICML 2001; Joachims, ICML 2003; Zhu, ICML 2003; Zhou, NIPS 2003]
  • Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

39
Transductive SVM Formulation
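The formulation on this slide was an image; the standard soft-margin transductive SVM from the cited Joachims paper optimizes over the unknown labels of the unlabeled points as well as over the hyperplane (labels in {−1, +1}):

    min over y*(l+1..n), w, b, ξ, ξ*:   ½‖w‖² + C Σ(i=1..l) ξi + C* Σ(j=l+1..n) ξ*j
    s.t.  yi (w·xi + b) ≥ 1 − ξi,    ξi ≥ 0                      (labeled points)
          y*j (w·xj + b) ≥ 1 − ξ*j,  ξ*j ≥ 0,  y*j ∈ {−1, +1}    (unlabeled points)

C and C* trade the margin off against errors on the labeled and unlabeled examples, respectively.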
40
Transductive SVM Intuition
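The figure on this slide is not in the transcript. The usual intuition: among all possible labelings of the unlabeled points, the transductive SVM prefers the one whose maximum-margin hyperplane passes through a low-density region of the data, which is the cluster assumption again.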
41
Transductive SVM An extension
42
Reference
  • Learning from Labeled and Unlabeled Data using Graph Mincuts
    http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/avrim/Papers/mincut.ps
  • Learning with Local and Global Consistency
    http://www.kyb.mpg.de/publications/pdfs/pdf2333.pdf
  • Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
    http://www.hpl.hp.com/conferences/icml2003/papers/132.pdf
  • Transductive Inference for Text Classification using Support Vector Machines
    http://www.cs.cornell.edu/People/tj/publications/joachims_99c.pdf
  • Semi-Supervised Classification by Low Density Separation
    http://eprints.pascal-network.org/archive/00000388/01/pdf2899.pdf

43
Next class
  • Topics
    • Feature reduction (PCA, CCA)
  • Readings
    • Geometric Methods for Feature Extraction and Dimensional Reduction
      http://www.public.asu.edu/~jye02/CLASSES/Fall-2005/PAPERS/Burge-featureextraction.pdf