Semi-supervised Learning

Slides: 33
Provided by: rong7
Learn more at: http://www.cse.msu.edu

Transcript and Presenter's Notes

1
Semi-supervised Learning
  • Rong Jin

2
Semi-supervised learning
  • Label propagation
  • Transductive learning
  • Co-training
  • Active learning

3
Label Propagation
Two labeled examples
  • A toy problem
  • Each node in the graph is an example
  • Two examples are labeled
  • Most examples are unlabeled
  • Compute the similarity w_ij between each pair of
    examples i and j
  • Connect each example to its most similar examples
  • How can we predict labels for unlabeled nodes
    using this graph?
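
The graph construction described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the Gaussian similarity, the parameter names `k` and `sigma`, and the toy points are all assumptions made for the example.

```python
import math

def gaussian_similarity(a, b, sigma=1.0):
    """w_ij = exp(-||a - b||^2 / (2 * sigma^2)). The Gaussian kernel is an assumed choice."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def build_knn_graph(points, k=2, sigma=1.0):
    """Return a symmetric weight matrix linking each point to its k most similar points."""
    n = len(points)
    sim = [[gaussian_similarity(points[i], points[j], sigma) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k most similar other points.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: sim[i][j], reverse=True)[:k]
        for j in neighbors:
            W[i][j] = sim[i][j]
            W[j][i] = sim[i][j]  # keep the graph undirected
    return W

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 3.0), (3.2, 3.1)]
W = build_knn_graph(points, k=2)
```

With these toy points every edge stays inside its own group, so labels placed on one node per group would spread only within that group.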

[Figure labels: edge weight w_ij; unlabeled example]
4
Label Propagation
  • Forward propagation

5
Label Propagation
  • Forward propagation
  • Forward propagation

6
Label Propagation
  • Forward propagation
  • Forward propagation
  • Forward propagation
  • How to resolve conflicting cases?

What label should be given to this node?
7
Energy Minimization
  • Labels: Y ∈ {0,1}^n
  • w_ij: similarity between the i-th example and the
    j-th example
  • Energy
  • Goal: find a label assignment Y that is consistent
    with the labeled examples while minimizing the
    energy function E(Y)
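
The energy formula itself did not survive in this transcript. A common choice, assumed here rather than taken from the slide, is E(Y) = sum over pairs i < j of w_ij (y_i - y_j)^2, which penalizes giving different labels to similar examples. The sketch below also assumes the usual relaxation of labels to real values in [0, 1]: labeled nodes are clamped, each unlabeled node is repeatedly set to the weighted average of its neighbors, and the result is thresholded at 0.5. The function names and the chain graph are hypothetical.

```python
def energy(W, y):
    """Assumed energy: sum over pairs i < j of w_ij * (y_i - y_j)^2."""
    n = len(y)
    return sum(W[i][j] * (y[i] - y[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def propagate_labels(W, labels, n_iter=200):
    """labels maps node index -> 0 or 1 for labeled nodes. Returns relaxed scores in [0, 1]."""
    n = len(W)
    y = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped to their given label
            degree = sum(W[i])
            if degree > 0:
                # Weighted average of neighbor scores minimizes the energy in y_i.
                y[i] = sum(W[i][j] * y[j] for j in range(n)) / degree
    return y

# Hypothetical chain graph 0 - 1 - 2 - 3 with unit weights; the end nodes are labeled.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
labels = {0: 0, 3: 1}
scores = propagate_labels(W, labels)
predicted = [1 if s > 0.5 else 0 for s in scores]
```

On this chain the relaxed scores settle at 0, 1/3, 2/3, 1, so each unlabeled node takes the label of the labeled node it is closer to.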

8
Energy Minimization
Final classification results
9
Label Propagation
  • How do the unlabeled data help classification?

10
Label Propagation
  • How do the unlabeled data help classification?
  • Consider a smaller number of unlabeled examples
  • Classification results can be very different

11
Cluster Assumption
  • Cluster assumption
  • The decision boundary should pass through
    low-density areas
  • Unlabeled data provide a more accurate estimate
    of local density

12
Optical Character Recognition
  • Given an image of a digit, determine its value

13
Optical Character Recognition
  • Labeled_Examples + Unlabeled_Examples = 4000
  • CMN: label propagation
  • 1NN: for each unlabeled example, use the label
    of its closest neighbor
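
The 1NN baseline on this slide can be sketched in a few lines. The squared Euclidean distance and the two-dimensional toy vectors are assumptions for illustration; the slide does not say how digit images were represented or compared.

```python
def one_nn_predict(labeled_points, labeled_targets, query):
    """Return the label of the labeled example closest to `query`.

    Distance is squared Euclidean, an assumed choice.
    """
    best = min(range(len(labeled_points)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(labeled_points[i], query)))
    return labeled_targets[best]

# Hypothetical feature vectors standing in for digit images.
labeled_points = [(0.0, 0.0), (5.0, 5.0)]
labeled_targets = [3, 8]
prediction = one_nn_predict(labeled_points, labeled_targets, (4.0, 4.5))
```

Unlike label propagation, this baseline ignores the other unlabeled examples entirely, which is what makes it a useful point of comparison.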

14
Cluster Assumption vs. Maximum Margin
  • Maximum margin classifier (e.g. SVM)

w·x + b
  • Maximum margin
  • ⇒ low density around the decision boundary
  • ⇒ cluster assumption
  • Any thoughts on utilizing the unlabeled data in a
    support vector machine?

15
Transductive SVM
  • Decision boundary given a small number of labeled
    examples

16
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • How will the decision boundary change given both
    labeled and unlabeled examples?

17
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a place with low
    local density

18
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a place with low
    local density
  • Classification results
  • How to formulate this idea?

19
Transductive SVM Formulation
  • Labeled data L
  • Unlabeled data D
  • Maximum margin principle for a mixture of labeled
    and unlabeled data
  • For each label assignment of unlabeled data,
    compute its maximum margin
  • Find the label assignment whose maximum margin is
    maximized
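
The two steps above can be illustrated by brute force on a one-dimensional toy problem. This is a sketch under strong simplifying assumptions, not a practical transductive SVM: the points lie on a line, the margin is hard (no slack), and every label assignment is enumerated, which costs 2^u evaluations for u unlabeled points. All names and data are hypothetical.

```python
from itertools import product

def max_margin_1d(xs, ys):
    """Hard-margin width for 1-D points: half the gap between the two classes.

    Returns None if only one class is present or no threshold separates them.
    """
    pos = [x for x, y in zip(xs, ys) if y == +1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    if not pos or not neg:
        return None
    gap = max(min(pos) - max(neg), min(neg) - max(pos))
    return gap / 2.0 if gap > 0 else None

def transductive_1d(labeled_x, labeled_y, unlabeled_x):
    """Try every labeling of the unlabeled points; keep the one with the widest margin."""
    best = None
    for assignment in product([-1, +1], repeat=len(unlabeled_x)):
        margin = max_margin_1d(labeled_x + unlabeled_x,
                               labeled_y + list(assignment))
        if margin is not None and (best is None or margin > best[0]):
            best = (margin, assignment)
    return best

labeled_x, labeled_y = [0.0, 6.0], [-1, +1]
unlabeled_x = [0.5, 1.0, 4.0, 4.5, 5.0]
best_margin, best_assignment = transductive_1d(labeled_x, labeled_y, unlabeled_x)
```

With only the two labeled points the maximum-margin threshold sits at 3.0. Once the unlabeled points are included, the widest gap lies between 1.0 and 4.0, so the threshold moves to 2.5, the low-density region of the data.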

20
Transductive SVM
Different label assignments for unlabeled data ⇒
different maximum margins
21
Transductive SVM Formulation
Another Quadratic Programming Problem
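
The formula on this slide did not survive in the transcript. A commonly used soft-margin formulation, given here as an assumption rather than as the slide's exact content, treats the labels of the unlabeled examples as additional optimization variables. The trade-off constants C and C* and the slack variables ξ are notation introduced for this sketch; L and D are the labeled and unlabeled sets from the earlier slide.

```latex
\begin{aligned}
\min_{\{y_j\}_{j \in D},\; w,\; b,\; \xi,\; \xi^*} \quad
  & \tfrac{1}{2}\lVert w \rVert^2
    + C \sum_{i \in L} \xi_i
    + C^* \sum_{j \in D} \xi_j^* \\
\text{subject to} \quad
  & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i \in L, \\
  & y_j \,(w \cdot x_j + b) \ge 1 - \xi_j^*, \quad \xi_j^* \ge 0, \quad
    y_j \in \{-1, +1\}, \quad j \in D.
\end{aligned}
```

For any fixed assignment of the labels y_j this reduces to an ordinary SVM quadratic program, which is consistent with the slide's title; the search over assignments is what makes the full problem hard.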
22
Empirical Study with Transductive SVM
  • 10 categories from the Reuters collection
  • 3299 test documents
  • 1000 informative words selected using the mutual
    information (MI) criterion
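
Word selection by mutual information can be sketched as below. The slide does not specify the exact MI variant, so this example assumes the MI between a binary "word occurs in document" indicator and a binary class label, measured in bits. The four toy documents and the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(term_present, labels):
    """MI in bits between a binary term-occurrence indicator and a class label."""
    n = len(labels)
    joint = Counter(zip(term_present, labels))
    count_t = Counter(term_present)
    count_c = Counter(labels)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log2(p_tc / ((count_t[t] / n) * (count_c[c] / n)))
    return mi

def select_top_words(docs, labels, k):
    """docs is a list of word sets. Return the k words with the highest MI with the label."""
    vocab = sorted(set().union(*docs))
    scored = [(mutual_information([w in d for d in docs], labels), w) for w in vocab]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest MI first, ties alphabetical
    return [w for _, w in scored[:k]]

# Hypothetical toy corpus: label 1 for one category, 0 for the other.
docs = [{"wheat", "price"}, {"wheat", "export"}, {"oil", "price"}, {"oil", "barrel"}]
labels = [1, 1, 0, 0]
top = select_top_words(docs, labels, k=2)
```

Here "wheat" and "oil" each determine the class completely and score 1 bit, while "price" appears equally in both classes and scores 0, so it would be dropped first.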

23
Co-training for Semi-supervised Learning
  • Consider the task of classifying web pages into
    two categories: a category for students and a
    category for professors
  • Two aspects of web pages should be considered
  • Content of web pages
  • I am currently a second-year Ph.D. student
  • Hyperlinks
  • My advisor is
  • Students

24
Co-training for Semi-Supervised Learning
25
Co-training for Semi-Supervised Learning
It is easier to classify this web page using
hyperlinks
It is easy to classify the type of this web page
based on its content
26
Co-training
  • Two representations for each web page

Content representation: (doctoral, student,
computer, university)
Hyperlink representation: Inlinks: Prof. Cheng;
Outlinks: Prof. Cheng
27
Co-training
  • Classification scheme
  • Train a content-based classifier using labeled
    web pages
  • Apply the content-based classifier to classify
    unlabeled web pages
  • Label the web pages that have been confidently
    classified
  • Train a hyperlink-based classifier using both the
    labeled web pages and the web pages labeled by
    the content-based classifier
  • Apply the hyperlink-based classifier to classify
    the unlabeled web pages
  • Label the web pages that have been confidently
    classified
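
The scheme above can be sketched as a loop that alternates between the two views. Everything concrete here is an assumption made for illustration: the nearest-centroid classifier stands in for whatever classifiers were actually used, the confidence score (gap between the two centroid distances) and the `threshold` value are hypothetical, and the two-dimensional feature vectors are toy data. The sketch also assumes both classes appear among the initially labeled pages.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train(view_rows, targets):
    """Nearest-centroid model for one view: maps each class to its centroid."""
    return {c: centroid([r for r, t in zip(view_rows, targets) if t == c])
            for c in set(targets)}

def predict_with_confidence(model, row):
    """Return (label, confidence). Confidence is the gap between the two centroid distances."""
    dists = sorted((sq_dist(center, row), c) for c, center in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def co_train(content, links, labels, rounds=4, threshold=1.0):
    """content and links hold the two views of every page; labels maps page index -> class."""
    labels = dict(labels)
    views = [content, links]
    for r in range(rounds):
        view = views[r % 2]  # alternate content-based and hyperlink-based classifiers
        idx = sorted(labels)
        model = train([view[i] for i in idx], [labels[i] for i in idx])
        newly_labeled = {}
        for i in range(len(view)):
            if i not in labels:
                label, confidence = predict_with_confidence(model, view[i])
                if confidence >= threshold:  # keep only confident predictions
                    newly_labeled[i] = label
        labels.update(newly_labeled)
    return labels

# Pages 0 and 1 are labeled. Pages 2 and 3 are clear from content; 4 and 5 only from links.
content = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9), (0.5, 0.5), (0.5, 0.5)]
links = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.2), (0.2, 0.8), (0.9, 0.0), (0.0, 0.9)]
result = co_train(content, links, {0: "student", 1: "professor"})
```

In this toy run the content-based classifier confidently labels pages 2 and 3 but not 4 and 5, whose content is ambiguous. The hyperlink-based classifier, trained on the enlarged labeled set, then labels pages 4 and 5.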

28
Co-training
  • Train a content-based classifier

29
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified

30
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Professors: outlinks to students and inlinks from
    students

31
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Professors: outlinks to students and inlinks from
    students
  • Label the unlabeled examples that are confidently
    classified

32
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Professors: outlinks to students and inlinks from
    students
  • Label the unlabeled examples that are confidently
    classified