Transcript and Presenter's Notes

Title: Learning from Labeled and Unlabeled Data using Graph Mincuts


1
Learning from Labeled and Unlabeled Data using
Graph Mincuts
  • Avrim Blum and Shuchi Chawla
  • May 24, 2001

2
Utilizing unlabeled data
  • Cheap and available in large amounts
  • Gives no obvious information about classification
  • Gives information about distribution of examples
  • Useful with a prior
  • Our prior: close examples have a similar
    classification

3
Classification using Graph Mincut
4
Why not nearest neighbor?
5
Why not nearest neighbor?
Classification by 1-nearest neighbor
6
Why not nearest neighbor?
Classification by Graph Mincut
7
Self-consistent classification
  • Mincut minimizes the leave-one-out
    cross-validation error of nearest neighbor
  • May not be the best classification
  • But, theoretically interesting!

8
Assigning edge weights
  • Several approaches
  • A decreasing function of distance
  • e.g. exponential decay with an appropriate slope
  • Unit weights but connect only nearby nodes
  • How near is near?
  • Connect every node to k-nearest nodes
  • What is a good value of k?
  • Need an appropriate distance metric
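The two weighting schemes on this slide can be sketched like this (helper names are hypothetical; the decay slope and k are exactly the tuning knobs the slide leaves open):

```python
import math

def exp_weight(dist, slope=1.0):
    # Edge weight as an exponentially decreasing function of distance.
    return math.exp(-slope * dist)

def knn_edges(points, k, metric):
    # Unit-weight edges connecting each node to its k nearest neighbors.
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: metric(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected, deduplicated
    return edges
```

On the points 0, 1, 2, 10 with k = 1 and absolute difference as the metric, this yields the chain edges (0,1), (1,2), (2,3), leaving the outlier connected only to its single nearest neighbor.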

9
How near is near?
  • All pairs within δ distance are connected
  • Need a method of finding a good δ
  • As δ increases, the cut value increases
  • Cut value 0 ⇒ supposedly no-error
    situation
  • (Mincut-δ0)
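One self-contained way to read off the δ0 threshold, under the observation that the mincut value stays 0 exactly while no path joins a positive and a negative labeled example. This is a sketch with hypothetical names; ties between equal distances are handled naively:

```python
def delta_zero(points, labels, metric):
    """Largest edge length that can be admitted before a positively and a
    negatively labeled example become connected (i.e. before the mincut
    value of the delta-graph leaves 0). labels[i] is '+', '-', or None."""
    n = len(points)
    pairs = sorted((metric(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    seen = [{labels[i]} - {None} for i in range(n)]  # labels per component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    best = 0.0
    for d, i, j in pairs:  # add edges in increasing order of length
        ri, rj = find(i), find(j)
        if ri != rj:
            if seen[ri] | seen[rj] == {'+', '-'}:
                return best  # the next edge would connect + to -
            parent[rj] = ri
            seen[ri] |= seen[rj]
        best = d
    return best
```

For instance, with points 0, 2, 5, 9, the first labeled '+', the last '-', and absolute difference as the metric, edges of length 2 and 3 are safe but length 4 would join the two labels, so the routine returns 3.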

10
  • Mincut-δ0 does not allow for noise in the
    dataset
  • Allow longer-distance dependencies
  • Grow δ till the graph becomes sufficiently well
    connected
  • Growing till the largest component contains half
    the nodes seems to work well (Mincut-δ½)
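The Mincut-δ½ rule above — grow δ until the largest component holds half the nodes — can be sketched as a union-find pass over edges sorted by length (a hypothetical helper; equal distances are broken arbitrarily):

```python
def delta_half(points, metric):
    """Smallest edge length at which the largest connected component of the
    delta-neighborhood graph contains at least half the nodes."""
    n = len(points)
    pairs = sorted((metric(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for d, i, j in pairs:  # add edges in increasing order of length
        ri, rj = find(i), find(j)
        if ri != rj:
            if size[ri] < size[rj]:  # union by size
                ri, rj = rj, ri
            parent[rj] = ri
            size[ri] += size[rj]
        if 2 * size[find(i)] >= n:
            return d
    return pairs[-1][0] if pairs else 0.0
```

On the points 0, 1, 5, 6 with absolute difference as the metric, the very first edge (length 1) already merges two of the four nodes into one component, so δ½ is 1.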

11
Other hacks
  • Weigh edges to labeled and unlabeled examples
    differently
  • Weigh different attributes differently
  • e.g. use information gain as in decision trees
  • Weigh edges to positive and negative examples
    differently for a more balanced cut
  • Use mincut value as an indicator of performance
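The information-gain idea can be sketched as follows: score each discrete attribute by how much knowing it reduces label entropy, as in decision trees, and use the scores as per-attribute weights in the distance metric. This is only a sketch of the standard quantity; the exact weighting used in the talk is not specified here:

```python
import math

def info_gain(values, labels):
    """Information gain of a discrete attribute (`values`, one per example)
    with respect to binary labels '+'/'-'."""
    def entropy(ls):
        n = len(ls)
        if n == 0:
            return 0.0
        p = sum(1 for l in ls if l == '+') / n
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    total = len(labels)
    remainder = 0.0  # expected entropy after splitting on the attribute
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

An attribute that perfectly separates the labels (e.g. values a, a, b, b against labels +, +, -, -) gets the maximal gain of 1 bit; an uninformative attribute gets 0.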

12
Some results
Dataset   Mincut-δopt   Mincut-δ0   Mincut-δ½   3-NN
MUSH         97.7          97.7        97.0      91.1
MUSH*        88.7          56.9        87.0      83.3
VOTING       91.3          66.1        83.3      89.6
PIMA         72.3          48.8        72.3      68.1