
1
Semi-supervised Learning
  • Rong Jin

2
Semi-supervised learning
  • Label propagation
  • Transductive learning
  • Co-training
  • Active learning

3
Label Propagation
  • A toy problem
  • Each node in the graph is an example
  • Two examples are labeled
  • Most examples are unlabeled
  • Compute the similarity S_ij between examples
  • Connect examples to their most similar examples
  • How do we predict labels for the unlabeled nodes using this graph?

[Figure: a toy graph with two labeled examples and many unlabeled examples; edges carry similarity weights w_ij]
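
The slide leaves the graph construction implicit; below is a minimal numpy sketch of one common choice, a kNN graph with Gaussian edge weights. The function name and the defaults k = 5 and sigma = 1.0 are illustrative assumptions, not values from the deck.

    import numpy as np

    def knn_similarity_graph(X, k=5, sigma=1.0):
        """Symmetric kNN graph with Gaussian weights (illustrative)."""
        # Pairwise squared Euclidean distances.
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
        np.fill_diagonal(d2, np.inf)              # no self-similarity
        # Gaussian similarity S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        S_full = np.exp(-d2 / (2 * sigma ** 2))
        # Keep each node's k most similar neighbours, then symmetrize.
        S = np.zeros_like(S_full)
        nn = np.argsort(-S_full, axis=1)[:, :k]
        rows = np.repeat(np.arange(len(X)), k)
        S[rows, nn.ravel()] = S_full[rows, nn.ravel()]
        return np.maximum(S, S.T)                 # ensure S_ij = S_ji
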
4
Label Propagation
  • Forward propagation

5
Label Propagation
  • Forward propagation
  • Forward propagation

6
Label Propagation
  • Forward propagation
  • Forward propagation
  • Forward propagation
  • How do we resolve conflicting cases?

What label should be given to this node?
7
Label Propagation
  • Let S be the similarity matrix, S = [S_ij]_{n×n}
  • Let D be a diagonal matrix where D_ii = Σ_{j≠i} S_ij
  • Compute the normalized similarity matrix S̄
  • S̄ = D^{-1/2} S D^{-1/2}
  • Let Y be the initial assignment of class labels
  • Y_i = +1 when the i-th node is assigned to the positive class
  • Y_i = -1 when the i-th node is assigned to the negative class
  • Y_i = 0 when the i-th node is not initially labeled
  • Let F be the predicted class labels
  • The i-th node is assigned to the positive class if F_i > 0
  • The i-th node is assigned to the negative class if F_i < 0

9
Label Propagation
  • One iteration
  • F = Y + αS̄Y = (I + αS̄)Y
  • α weights the propagation values
  • Two iterations
  • F = Y + αS̄Y + α^2 S̄^2 Y = (I + αS̄ + α^2 S̄^2)Y
  • What about infinitely many iterations?
  • F = (Σ_{n=0}^∞ α^n S̄^n) Y = (I - αS̄)^{-1} Y
  • The series converges for 0 < α < 1, since the eigenvalues of S̄ lie in [-1, 1] (see the sketch below)
  • Any problems with such an approach?
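
A minimal numpy sketch of the closed-form limit above. It assumes a symmetric similarity matrix with zero diagonal and 0 < alpha < 1; the sign of F_i gives the predicted class, matching the decision rule from the earlier slide.

    import numpy as np

    def propagate_labels(S, Y, alpha=0.9):
        """Closed-form label propagation: F = (I - alpha * S_bar)^(-1) Y.

        S: symmetric (n, n) similarity matrix with zero diagonal.
        Y: length-n vector with entries +1 / -1 / 0 (0 = unlabeled).
        alpha in (0, 1) so the geometric series converges.
        """
        d = S.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S_bar = D_inv_sqrt @ S @ D_inv_sqrt       # D^{-1/2} S D^{-1/2}
        F = np.linalg.solve(np.eye(len(Y)) - alpha * S_bar, Y)
        return np.sign(F)                         # F_i > 0 -> positive class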

10
Label Consistency Problem
  • The predicted vector F may not be consistent with the initially assigned class labels Y

11
Energy Minimization
  • Using the same notation
  • S_ij: similarity between the i-th node and the j-th node
  • Y: initially assigned class labels
  • F: predicted class labels
  • Energy E(F) = Σ_{i,j} S_ij (F_i - F_j)^2
  • Goal: find a label assignment F that is consistent with the labeled examples Y and meanwhile minimizes the energy function E(F)

12
Harmonic Function
  • E(F) = Σ_{i,j} S_ij (F_i - F_j)^2 = F^T (D - S) F
  • Thus the minimizer of E(F) should satisfy (D - S)F = 0, while F remains consistent with Y
  • Partition F^T = (F_l^T, F_u^T) and Y^T = (Y_l^T, Y_u^T)
  • F_l = Y_l
  • Writing out (D - S)F = 0 on the unlabeled block gives F_u = (D_uu - S_uu)^{-1} S_ul Y_l (sketched below)
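
A minimal numpy sketch of the harmonic solution: clamp the labeled nodes and solve the linear system on the unlabeled block. The boolean-mask interface is an assumption for illustration.

    import numpy as np

    def harmonic_solution(S, Y, labeled):
        """Clamp F_l = Y_l and solve (D - S)F = 0 on the unlabeled nodes,
        i.e. F_u = (D_uu - S_uu)^{-1} S_ul Y_l.

        S: (n, n) similarity matrix; Y: length-n label vector;
        labeled: boolean mask marking the labeled nodes.
        """
        L = np.diag(S.sum(axis=1)) - S            # graph Laplacian D - S
        u, l = ~labeled, labeled
        F = Y.astype(float).copy()                # F_l = Y_l stays fixed
        F[u] = np.linalg.solve(L[np.ix_(u, u)], S[np.ix_(u, l)] @ Y[l])
        return F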

13
Optical Character Recognition
  • Given an image of a handwritten digit, determine its value

14
Optical Character Recognition
  • Labeled examples + unlabeled examples = 4000
  • CMN: label propagation with class mass normalization
  • 1NN: each unlabeled example takes the label of its closest labeled neighbor
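
For scale, the 1NN baseline in this comparison is only a few lines (a numpy sketch; the label-propagation side is the solver sketched earlier, with CMN additionally rescaling its outputs by estimated class proportions):

    import numpy as np

    def one_nn(X_l, y_l, X_u):
        """Each unlabeled example copies the label of its nearest
        labeled neighbour (Euclidean distance)."""
        d2 = ((X_u[:, None, :] - X_l[None, :, :]) ** 2).sum(axis=-1)
        return y_l[np.argmin(d2, axis=1)]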

15
Spectral Graph Transducer
  • A problem with the harmonic function
  • Why could this happen?
  • The condition (D - S)F = 0 does not hold at the constrained (labeled) nodes

17
Spectral Graph Transducer
  • min_F F^T L F + c (F - Y)^T C (F - Y), where L = D - S is the graph Laplacian
  • s.t. F^T F = n, F^T e = 0
  • C is a diagonal cost matrix with C_ii = 1 if the i-th node is initially labeled, and zero otherwise
  • The parameter c controls the balance between the consistency requirement and the energy-minimization requirement (see the relaxed sketch below)
  • Can be solved efficiently through the computation of eigenvectors
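
The slides do not show the eigenvector computation itself. As a hedged illustration of the role of c only: if the two constraints are dropped, setting the gradient of the objective to zero yields the linear system (L + cC)F = cCY. This relaxation is not Joachims' full spectral solution, which enforces the constraints via an eigen-decomposition of L.

    import numpy as np

    def relaxed_sgt(S, Y, labeled, c=1.0):
        """Unconstrained relaxation of the SGT objective (assumption:
        the constraints F^T F = n and F^T e = 0 are dropped here).
        Zero gradient of F^T L F + c (F - Y)^T C (F - Y) gives
        (L + c C) F = c C Y.
        """
        L = np.diag(S.sum(axis=1)) - S        # graph Laplacian
        C = np.diag(labeled.astype(float))    # unit cost on labeled nodes
        return np.linalg.solve(L + c * C, c * (C @ Y))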

18
Empirical Studies
19
Problems with Spectral Graph Transducer
  • min_F F^T L F + c (F - Y)^T C (F - Y)
  • s.t. F^T F = n, F^T e = 0
  • The obtained solution differs from the desirable one, namely minimizing the energy function while remaining consistent with the labeled examples Y
  • It is difficult to extend the approach to multi-class classification

20
Green's Function
  • The problem of minimizing the energy while staying consistent with the initially assigned class labels can be formulated as a Green's function problem
  • Minimizing E(F) = F^T L F  ⇒  LF = 0
  • It turns out that L can be viewed as the Laplacian operator in the discrete case
  • LF = 0  ⇔  ∇^2 F = 0
  • Thus, our problem is to find a solution F with
  • ∇^2 F = 0, s.t. F = Y for the labeled examples
  • We can treat the constraint F = Y on the labeled examples as a boundary condition (a Dirichlet boundary condition, since it fixes the values of F on the labeled nodes)
  • A standard Green's function problem

21
Why Energy Minimization?
Final classification results
22
Label Propagation
  • How do the unlabeled data help classification?

23
Label Propagation
  • How do the unlabeled data help classification?
  • Consider a smaller number of unlabeled examples
  • The classification results can be very different

24
Cluster Assumption
  • Cluster assumption
  • The decision boundary should pass through low-density areas
  • Unlabeled data provide a more accurate estimate of the local density

25
Cluster Assumption vs. Maximum Margin
  • Maximum margin classifier (e.g., SVM)

Decision boundary: w·x + b = 0
  • Maximum margin
  • ⇒ low density around the decision boundary
  • ⇒ cluster assumption
  • Any thoughts about utilizing the unlabeled data in a support vector machine?

26
Transductive SVM
  • Decision boundary given a small number of labeled
    examples

27
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • How will the decision boundary change given both
    labeled and unlabeled examples?

28
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a region of low local density

29
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a region of low local density
  • Classification results
  • How do we formulate this idea?

30
Transductive SVM Formulation
  • Labeled data L
  • Unlabeled data D
  • Maximum margin principle for the mixture of labeled and unlabeled data
  • For each label assignment of the unlabeled data, compute its maximum margin
  • Find the label assignment whose maximum margin is maximized (a brute-force sketch follows)
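
A brute-force sketch of this formulation, assuming scikit-learn's linear SVC (all names here are illustrative). It enumerates every label assignment of a tiny unlabeled set and keeps the one maximizing the margin 1 / ||w||, so it is exponential in the number of unlabeled points and purely didactic.

    import itertools
    import numpy as np
    from sklearn.svm import SVC

    def tsvm_by_enumeration(X_l, y_l, X_u, C=1.0):
        """Try every +/-1 assignment of a *tiny* unlabeled set, train a
        linear SVM on the combined data, and keep the assignment with the
        largest margin 1 / ||w||. Assumes y_l contains both classes.
        """
        X = np.vstack([X_l, X_u])
        best_margin, best_labels, best_clf = -np.inf, None, None
        for y_u in itertools.product([-1, 1], repeat=len(X_u)):
            y = np.concatenate([y_l, y_u])
            clf = SVC(kernel="linear", C=C).fit(X, y)
            margin = 1.0 / np.linalg.norm(clf.coef_)   # hard-margin width
            if margin > best_margin:
                best_margin, best_labels, best_clf = margin, np.array(y_u), clf
        return best_labels, best_margin, best_clf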

31
Transductive SVM
Different label assignments for the unlabeled data ⇒ different maximum margins
32
Transductive SVM Formulation
33
Computational Issue
  • No longer a convex optimization problem (why?)
  • How do we optimize the transductive SVM?
  • Alternating optimization

34
Alternating Optimization
  • Step 1: fix y_{n+1}, ..., y_{n+m} and learn the weights w
  • Step 2: fix the weights w and try to predict y_{n+1}, ..., y_{n+m} (how? see the sketch below)
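
A minimal sketch of the two alternating steps, again assuming scikit-learn. Practical TSVM solvers additionally anneal the weight placed on the unlabeled examples and constrain the class balance; this sketch omits both.

    import numpy as np
    from sklearn.svm import SVC

    def tsvm_alternating(X_l, y_l, X_u, C=1.0, max_iter=50):
        """Alternate the two steps until the guessed labels stabilize."""
        # Initial guess from an SVM trained on the labeled data alone.
        clf = SVC(kernel="linear", C=C).fit(X_l, y_l)
        y_u = np.where(clf.decision_function(X_u) >= 0, 1, -1)
        X = np.vstack([X_l, X_u])
        for _ in range(max_iter):
            # Step 1: fix y_{n+1}, ..., y_{n+m}; learn the weights w.
            clf = SVC(kernel="linear", C=C).fit(X, np.concatenate([y_l, y_u]))
            # Step 2: fix w; re-predict the labels of the unlabeled data.
            y_new = np.where(clf.decision_function(X_u) >= 0, 1, -1)
            if np.array_equal(y_new, y_u):
                break                             # guessed labels converged
            y_u = y_new
        return clf, y_u

Step 2 here is simply "label each unlabeled point by the side of the decision boundary it falls on", which is one plausible answer to the slide's "How?".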

35
Empirical Study with Transductive SVM
  • 10 categories from the Reuters collection
  • 3,299 test documents
  • 1,000 informative words selected using the mutual-information (MI) criterion

36
Co-training for Semi-supervised Learning
  • Consider the task of classifying web pages into two categories: a category for students and a category for professors
  • Two aspects of web pages should be considered
  • Content of web pages
  • "I am currently a second-year Ph.D. student"
  • Hyperlinks
  • "My advisor is ..."
  • "Students: ..."

37
Co-training for Semi-Supervised Learning
38
Co-training for Semi-Supervised Learning
It is easier to classify this web page using its hyperlinks
It is easy to classify this web page based on its content
39
Co-training
  • Two representations for each web page

Content representation: (doctoral, student, computer, university)
Hyperlink representation: Inlinks: Prof. Cheng; Outlinks: Prof. Cheng
40
Co-training Classification Scheme
  1. Train a content-based classifier using the labeled web pages
  2. Apply the content-based classifier to classify the unlabeled web pages
  3. Label the web pages that have been confidently classified
  4. Train a hyperlink-based classifier using the web pages that were initially labeled together with those labeled by the content-based classifier
  5. Apply the hyperlink-based classifier to classify the unlabeled web pages
  6. Label the web pages that have been confidently classified (a sketch of this loop follows)
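
A minimal sketch of the scheme above, with logistic regression standing in for both classifiers. The view names, the confidence threshold, and the round count are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_train(Xc, Xh, y, labeled, rounds=10, thresh=0.95):
        """Co-training over two views of the same pages: content
        features Xc and hyperlink features Xh. `labeled` is a boolean
        mask; pages confidently classified by one view join the labeled
        pool, so the other view learns from them in the next step."""
        y, labeled = y.copy(), labeled.copy()
        for _ in range(rounds):
            grew = False
            for X in (Xc, Xh):                      # alternate the two views
                if labeled.all():
                    return y, labeled
                clf = LogisticRegression().fit(X[labeled], y[labeled])
                proba = clf.predict_proba(X[~labeled])
                conf = proba.max(axis=1) >= thresh  # confident predictions
                if conf.any():
                    idx = np.flatnonzero(~labeled)[conf]
                    y[idx] = clf.classes_[proba.argmax(axis=1)][conf]
                    labeled[idx] = True
                    grew = True
            if not grew:
                break                               # nothing confident enough
        return y, labeled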

41
Co-training
  • Train a content-based classifier

42
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified

43
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to students

44
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to students
  • Label the unlabeled examples that are confidently
    classified

45
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to
  • Label the unlabeled examples that are confidently
    classified