Metric Learning by Collapsing Classes - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Metric Learning by Collapsing Classes


1
Metric Learning by Collapsing Classes
  • Amir Globerson
  • Sam Roweis

Presented by Dumitru Erhan
2
Motivation
  • A good metric is the same thing as good features
  • Metrics should be problem specific
  • i.e., there is no one-size-fits-all Euclidean metric
  • For instance
  • Same features (images)
  • Two tasks: face recognition and gender
    identification
  • More insight into the structure of the data,
    better visualization, etc.
  • Apropos: feature extraction ≈ metric learning

3
What is a good metric?
  • Elements in the same class are close
  • Elements in different classes are far
  • So, why not make
  • the same class be at zero distance and
  • different classes be at infinite distance?
  • This is the ideal case of spectral clustering as well

4
Technically speaking
  • n examples (x_i, y_i), with x_i ∈ R^r and y_i ∈ {1, ..., k}
  • Ideally, we look for W such that after the mapping x → Wx the
    metric is good
  • d(x_i, x_j | A) = d_A(i, j) = (x_i - x_j)^T A (x_i - x_j), where
    A is PSD
  • A = W^T W
  • We define p_A(j | i) = exp(-d_A(i, j)) / Z_i
  • where Z_i = Σ_{k ≠ i} exp(-d_A(i, k))
  • Ideally, p_0(j | i) ∝ 1 if y_i = y_j, and 0 otherwise
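
A minimal Python sketch of the quantities defined on this slide (illustrative, not the authors' code): the Mahalanobis distance d_A(i, j) and the induced conditional distribution p_A(j | i).

```python
import numpy as np

def conditional_probs(X, A):
    """p_A(j | i) = exp(-d_A(i, j)) / Z_i, with d_A(i, j) = (x_i - x_j)^T A (x_i - x_j)."""
    diff = X[:, None, :] - X[None, :, :]            # (n, n, r) pairwise differences
    d = np.einsum('ijr,rs,ijs->ij', diff, A, diff)  # d_A(i, j) for all pairs
    np.fill_diagonal(d, np.inf)                     # exclude j = i from Z_i
    P = np.exp(-d)
    return P / P.sum(axis=1, keepdims=True)         # divide each row by Z_i

# toy usage: 5 points in R^3 under the Euclidean metric (A = I)
X = np.random.randn(5, 3)
print(conditional_probs(X, np.eye(3)).sum(axis=1))  # each row sums to 1
```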

5
Technically speaking, part II
  • So we want to minimize (w.r.t. A) the following:
  • f(A) = Σ_i KL[ p_0(· | i) || p_A(· | i) ], s.t. A is PSD
  • This is convex!
  • For PSD A_0, A_1, the combination A = λ A_0 + (1 - λ) A_1 is also
    PSD for any 0 ≤ λ ≤ 1, so the feasible set is convex
  • Objective function (sketch below):
  • f(A) = -Σ_{i, j: y_j = y_i} log p_A(j | i)
         = Σ_{i, j: y_j = y_i} [ d_A(i, j) + log Z_i ]
  • d_A(i, j) is linear in A and log Z_i is convex (log-sum-exp),
    so f(A) is convex
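
A sketch of the objective written on this slide, reusing conditional_probs from the previous snippet; it matches the displayed expression and equals the KL objective up to constants when p_0 is uniform over same-class neighbours.

```python
import numpy as np

def objective(X, y, A):
    """f(A) = -sum over same-class pairs (i, j), j != i, of log p_A(j | i)."""
    P = conditional_probs(X, A)            # from the previous sketch
    same = (y[:, None] == y[None, :])      # mask of same-class pairs
    np.fill_diagonal(same, False)          # exclude j = i
    return -np.log(P[same]).sum()
```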

6
Duality and Optimization
  • There is a dual form, but it's not useful:
  • an entropy maximization problem with O(n²) variables
  • Optimizing the primal involves only O(r²/2) variables (A is an
    r × r symmetric matrix)
  • Some optimization details (see the sketch below)
  • Initialize A to some random matrix
  • Take a small step in the direction of the negative gradient
  • Project back onto the PSD cone (remove negative
    eigenvalues)
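
A minimal projected-gradient sketch of the procedure described above (illustrative, not the authors' implementation; the gradient is taken numerically to keep the example short, and the step size and iteration count are arbitrary).

```python
import numpy as np

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by zeroing negative eigenvalues."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.maximum(w, 0)) @ V.T

def numerical_grad(f, A, eps=1e-5):
    """Central-difference gradient of f at A, entry by entry."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

def learn_metric(X, y, steps=100, lr=1e-2):
    r = X.shape[1]
    A = project_psd(np.random.randn(r, r))              # random start, projected to PSD
    f = lambda M: objective(X, y, M)                     # objective from the previous sketch
    for _ in range(steps):
        A = project_psd(A - lr * numerical_grad(f, A))   # gradient step, then project back
    return A
```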

7
Dimensionality Reduction and Kernels
  • Rank of A = dimension of the projection
  • Rank constraints are not convex
  • But we can find A as above and
  • keep the components with the q largest eigenvalues (sketch below)
  • Not the same result as an explicit rank constraint,
  • but they say it's still good
  • With kernels, the objective function gains a trace regularizer:
  • f_reg(A) = Σ_i KL[ p_0(· | i) || p_A(· | i) ] + λ Tr(A)
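
A sketch of the rank-q heuristic on this slide: eigendecompose the learned A and keep the q leading eigen-directions as a linear projection (function name is illustrative, not from the paper's code).

```python
import numpy as np

def low_rank_projection(A, q):
    """Return W_q (q x r) such that W_q^T W_q is the best rank-q approximation of A."""
    w, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:q]             # indices of the q largest eigenvalues
    return (V[:, top] * np.sqrt(np.maximum(w[top], 0))).T

# usage: project data to q = 2 dimensions with the learned metric A
# X_low = X @ low_rank_projection(A, 2).T
```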

8
Results I
  • Setup:
  • UCI datasets, USPS digits, YALE faces
  • Learn a metric, then classify with
  • 1-NN (see the sketch below)
  • Compared with:
  • Fisher's LDA
  • Xing et al.'s method (minimizes the mean within-class
    distance while keeping between-class distances
    larger than one)
  • PCA
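
A minimal sketch of the evaluation protocol listed above: 1-NN classification under the learned Mahalanobis metric (variable names are illustrative).

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test, A):
    """Label each test point with the label of its nearest training point under d_A."""
    diff = X_test[:, None, :] - X_train[None, :, :]
    d = np.einsum('ijr,rs,ijs->ij', diff, A, diff)   # d_A between test and training points
    return y_train[np.argmin(d, axis=1)]
```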

9
Results II
10
Results II
  • Non-Convex Variant:
  • optimize W, not A (see the sketch below)
  • Neighbourhood Components Analysis (NCA):
  • minimize the LOO error of k-NN
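
A sketch of the non-convex variant mentioned above: parametrize A = W^T W with a rectangular W and optimize W directly, which fixes the rank but loses convexity (this shows only the parametrization; NCA itself optimizes a different, leave-one-out-style objective).

```python
import numpy as np

def objective_W(X, y, W):
    """Same objective as before, evaluated at A = W^T W for a (q x r) matrix W."""
    return objective(X, y, W.T @ W)
```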

11
Results III
12
Discussion
  • Main idea
  • same class close, different classes far
  • Only suitable for uni-modal class distributions
  • Some sort of EM-like algo could help it out?
  • Suitable for
  • Dimensionality reduction (global, one comp/all
    dimensions)
  • Kernels

13
References
  • E. Xing, A. Ng, M. Jordan, and S. Russell.
    Distance metric learning, with application to
    clustering with side-information. In Advances in
    Neural Information Processing Systems (NIPS),
    2004.
  • J. Goldberger, S. Roweis, G. Hinton, and R.
    Salakhutdinov. Neighbourhood components analysis.
    In Advances in Neural Information Processing
    Systems (NIPS), 2004.