A Survey on Distance Metric Learning (Part 2)

1
A Survey on Distance Metric Learning (Part 2)
  • Gerry Tesauro
  • IBM T.J. Watson Research Center

2
Acknowledgement
  • Lecture material shamelessly adapted from the
    following sources:
  • Kilian Weinberger
  • Survey on Distance Metric Learning slides
  • IBM summer intern talk slides (Aug. 2006)
  • Sam Roweis slides (NIPS 2006 workshop on
    Learning to Compare Examples)
  • Yann LeCun talk slides (CVPR 2005, 2006)

3
Outline: Part 2
  • Neighbourhood Components Analysis (Goldberger
    et al.), Metric Learning by Collapsing Classes
    (Globerson & Roweis)
  • Metric Learning for Kernel Regression (Weinberger
    & Tesauro)
  • Metric learning for RL basis function
    construction (Keller et al.)
  • Similarity learning for image processing (LeCun
    et al.)

4
Neighbourhood Components Analysis
Distance metric for visualization and kNN
(Goldberger et al., 2004)
5
Metric Learning for Kernel Regression
Weinberger & Tesauro, AISTATS 2007
6
Killing three birds with one stone
We construct a method for
  • linear dimensionality reduction
  • that generates a meaningful distance metric
  • optimally tuned for distance-based kernel regression
7
Kernel Regression
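  • Given a training set $(x_j, y_j)$, $j = 1, \ldots, N$, where $x$
    is a $d$-dim vector and $y$ is real-valued, estimate the
    value of a test point $x_i$ by a weighted avg. of
    samples:
    $\hat{y}_i = \frac{\sum_{j \neq i} y_j k_{ij}}{\sum_{j \neq i} k_{ij}}$
  • where $k_{ij} = k_D(x_i, x_j)$ is a distance-based
    kernel function using distance metric $D$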
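A minimal sketch of the weighted-average estimate described on this slide, in Python with NumPy. The function names, the toy data, and the unit-width Gaussian kernel on squared Euclidean distance are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def kernel_regression(x_query, X_train, y_train, kernel):
    """Estimate y at x_query as a kernel-weighted average of training targets."""
    weights = np.array([kernel(x_query, x_j) for x_j in X_train])
    return float(weights @ y_train / weights.sum())

def gaussian_kernel(a, b):
    # Assumed kernel for illustration: exp(-squared Euclidean distance).
    return float(np.exp(-np.sum((a - b) ** 2)))

# Toy 1-D training set (hypothetical values).
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 4.0])

# The query coincides with the middle sample, so that sample gets weight 1
# and each of its neighbours gets weight exp(-1).
estimate = kernel_regression(np.array([1.0]), X_train, y_train, gaussian_kernel)
print(estimate)
```

Because the estimate is a convex combination of the training targets, it always lies between the smallest and largest target value.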

8
Choice of Kernel
  • Many functional forms for $k_{ij}$ can be used in
    MLKR; our empirical work uses the Gaussian
    kernel
    $k_{ij} \propto \exp\left(-D(x_i, x_j) / \sigma^2\right)$
  • where $\sigma$ is a kernel width parameter (can set $\sigma = 1$
    W.L.O.G. since we learn $D$)
  • This gives a softmax regression estimate, similar to Roweis'
    softmax classifier
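The two remarks on this slide can be checked numerically. The sketch below assumes the kernel has the form exp(-D / sigma^2) with D a squared distance. That form and all names in the code are assumptions made for illustration. It shows that normalized Gaussian kernel weights are a softmax over negative distances, and that changing the width is equivalent to rescaling the distances.

```python
import numpy as np

def gaussian_kernel_weights(sq_dists, sigma=1.0):
    """Normalized Gaussian kernel weights from squared distances to one query."""
    k = np.exp(-np.asarray(sq_dists, dtype=float) / sigma**2)
    return k / k.sum()

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

sq_dists = np.array([0.5, 2.0, 8.0])  # hypothetical squared distances

# Softmax view: the weights are softmax(-D).
w_kernel = gaussian_kernel_weights(sq_dists)
w_softmax = softmax(-sq_dists)

# Width absorbed into the metric: sigma = 2 with distances D gives the same
# weights as sigma = 1 with distances D / 4 (i.e. A rescaled by 1/2).
w_wide = gaussian_kernel_weights(sq_dists, sigma=2.0)
w_rescaled = gaussian_kernel_weights(sq_dists / 4.0)
```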

9
Distance Metric for Nearest Neighbor Regression
Learn a linear transformation that allows us to
estimate the value of a test point from its
nearest neighbors
10
Mahalanobis Metric
Distance function is a pseudo Mahalanobis metric
$D(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$, with $M = A^\top A$
(Generalizes Euclidean distance)
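A short sketch of the pseudo Mahalanobis distance, assuming the usual parameterization M = AᵀA so that the distance is the squared Euclidean distance after the linear map A. The helper name and the random data are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, A):
    """Squared distance ||A (x_i - x_j)||^2, i.e. (x_i - x_j)^T A^T A (x_i - x_j)."""
    diff = A @ (x_i - x_j)
    return float(diff @ diff)

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))  # rectangular A: M has rank <= 2, hence "pseudo" metric
M = A.T @ A
x_i, x_j = rng.normal(size=3), rng.normal(size=3)

d_A = mahalanobis_sq(x_i, x_j, A)                    # via the linear map A
d_M = float((x_i - x_j) @ M @ (x_i - x_j))           # via the quadratic form M
d_euclid = mahalanobis_sq(x_i, x_j, np.eye(3))       # A = I recovers Euclidean
```

With a rank-deficient M, two distinct points can be at distance zero, which is why the slide calls it a pseudo metric.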
11
General Metric Learning Objective
  • Find a parameterized distance function $D_\theta$ that
    minimizes the total leave-one-out cross-validation
    loss function, such as the squared error
    $\mathcal{L} = \sum_i (y_i - \hat{y}_i)^2$
  • e.g. params $\theta$ = elements $A_{ij}$ of the $A$ matrix
  • Since we're solving for $A$, not $M$, the optimization is
    non-convex ⇒ use gradient descent
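The leave-one-out objective can be sketched as follows. This assumes a squared-error loss and a unit-width Gaussian kernel on the distance ||A(x_i - x_j)||². Both are assumptions for illustration, as are the function names and the toy data.

```python
import numpy as np

def loo_predictions(A, X, y):
    """Leave-one-out kernel regression estimates under the metric induced by A."""
    Z = X @ A.T                                           # map every sample through A
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq)                                       # Gaussian kernel, sigma = 1
    np.fill_diagonal(K, 0.0)                              # a sample never predicts itself
    return K @ y / K.sum(axis=1)

def loo_loss(A, X, y):
    """Total squared leave-one-out error, the quantity minimized over A."""
    return float(np.sum((y - loo_predictions(A, X, y)) ** 2))

# Toy 1-D data with A = [[1.0]] (hypothetical values).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
A = np.array([[1.0]])

preds = loo_predictions(A, X, y)
loss = loo_loss(A, X, y)
```

The middle sample has two equally distant neighbours, so its leave-one-out estimate is simply the mean of their targets.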

12
Gradient Computation
  • The gradient of the loss w.r.t. $A$ is written in terms of
    the difference vectors $x_{ij} = x_i - x_j$
  • For fast implementation:
      • Don't sum over all i-j pairs, only go up to 1000
        nearest neighbors for each sample i
      • Maintain nearest neighbors in a heap-tree
        structure, update heap tree every 15 gradient
        steps
      • Ignore sufficiently small values of $k_{ij}$ ($< e^{-34}$)
      • Even better data structures: cover trees, k-d
        trees
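The gradient formula itself did not survive in this transcript, so the sketch below derives one under stated assumptions rather than reproducing the slide: squared leave-one-out loss, Gaussian kernel exp(-||A x_ij||²), and row-normalized weights w_ij. Under those assumptions the chain rule gives 4A Σ_i (ŷ_i - y_i) Σ_j (ŷ_i - y_j) w_ij x_ij x_ijᵀ. The code sums over all pairs (no nearest-neighbor truncation) and checks the result against finite differences. All names are hypothetical.

```python
import numpy as np

def loo_loss_and_grad(A, X, y):
    """Squared leave-one-out loss and its gradient with respect to A."""
    diffs = X[:, None, :] - X[None, :, :]            # x_ij = x_i - x_j, shape (n, n, d)
    proj = diffs @ A.T                               # A x_ij, shape (n, n, r)
    K = np.exp(-np.sum(proj ** 2, axis=-1))          # Gaussian kernel, sigma = 1
    np.fill_diagonal(K, 0.0)                         # leave-one-out
    W = K / K.sum(axis=1, keepdims=True)             # normalized weights w_ij
    y_hat = W @ y
    loss = float(np.sum((y - y_hat) ** 2))
    # c_ij = (y_hat_i - y_i) * (y_hat_i - y_j) * w_ij
    C = (y_hat - y)[:, None] * (y_hat[:, None] - y[None, :]) * W
    S = np.einsum("ij,ijk,ijl->kl", C, diffs, diffs)  # sum_ij c_ij x_ij x_ij^T
    return loss, 4.0 * A @ S

def numerical_grad(A, X, y, eps=1e-6):
    """Central finite differences, used only to sanity-check the formula."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        step = np.zeros_like(A)
        step[idx] = eps
        G[idx] = (loo_loss_and_grad(A + step, X, y)[0]
                  - loo_loss_and_grad(A - step, X, y)[0]) / (2.0 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0])                                   # toy targets
A = 0.5 * rng.normal(size=(2, 3))

loss, grad = loo_loss_and_grad(A, X, y)
grad_check = numerical_grad(A, X, y)
```

A plain gradient-descent step would then be `A -= learning_rate * grad`. The speed-ups listed on the slide amount to restricting the inner sum over j to a cached set of near neighbors.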

13
Learned Distance Metric example
orig. Euclidean D < 1
learned D < 1
14
Twin Peaks test
Training
n = 8000
we added 3 dimensions with 1000 noise
we rotated 5 dimensions randomly
15
Input Variance
Noise
Signal
16
Test data
17
Test data
18
Output Variance
Signal
Noise
19
DimReduction with MLKR
  • FG-NET face data: 82 persons, 984 face images
    w/ age

20
DimReduction with MLKR
  • FG-NET face data: 82 persons, 984 face images
    w/ age

21
DimReduction with MLKR
PowerManagement data (d = 21)
  • Force A to be rectangular
  • Project onto eigenvectors of A
  • Allows visualization of data
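A sketch of how a rectangular A yields a low-dimensional embedding for plotting. The data and A below are random placeholders, not the PowerManagement set or a learned matrix. Reading "eigenvectors of A" as the leading eigenvectors of M = AᵀA is an assumption, since a rectangular matrix has no eigenvectors of its own.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 21, 2, 100
X = rng.normal(size=(n, d))   # placeholder inputs with d = 21 features
A = rng.normal(size=(r, d))   # stand-in for a learned r x d matrix

# Direct embedding: each sample becomes an r-dim point, ready for a 2-D scatter plot.
Z = X @ A.T

# Eigenvector view (assumed interpretation): M = A^T A has rank <= r, so only
# its r leading eigenvectors carry any distance information.
M = A.T @ A
eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
top = eigvecs[:, -r:]                  # eigenvectors of the r largest eigenvalues
P = X @ top                            # coordinates along those eigenvectors

# Distances in the embedding equal the learned pseudo Mahalanobis distances.
d_embed = float(np.sum((Z[0] - Z[1]) ** 2))
d_metric = float((X[0] - X[1]) @ M @ (X[0] - X[1]))
```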

22
Robot arm results (8, 32 dim)
regression error
23
Unity Data Center Prototype
  • Objective: Learn long-range resource value
    estimates for each application manager
  • State Variables (48):
      • Arrival rate
      • ResponseTime
      • QueueLength
      • iatVariance
      • rtVariance
  • Action: # of servers allocated
    by Arbiter
  • Reward: SLA(Resp. Time)

[Figure: Unity data center diagram. Labels: Maximize Total SLA Revenue; 5 sec; Demand (HTTP req/sec); SLA; Value(srvrs); Value(RT); WebSphere 5.1; DB2; Trade3; Batch; 8 xSeries servers]
(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)
24
Power Performance Management
  • Objective: Managing systems to multi-discipline
    objectives: minimize Resp. Time and minimize
    Power Usage
  • State Variables (21):
      • Power Cap
      • Power Usage
      • CPU Utilization
      • Temperature
      • # of requests arrived
      • Workload intensity (# Clients)
      • Response Time
  • Action: Power Cap
  • Reward: SLA(Resp. Time), Power Usage

(Kephart et al., ICAC 2007)
25
IBM Regression Results: TEST ERROR
MLKR
14/47
3/5
10/22
26
IBM Regression Results: TRAINING ERROR
MLKR
27
Metric Learning for RL basis function
construction (Keller et al., ICML 2006)
  • RL: Dataset of state-action-reward tuples
    $(s_i, a_i, r_i)$, $i = 1, \ldots, N$

28
Value Iteration
  • Define an iterative bootstrap calculation
  • Each round of VI must iterate over all states in
    the state space
  • Try to speed this up using state aggregation
    (Bertsekas & Castanon, 1989)
  • Idea: Use NCA to aggregate states
      • project states into a lower-dim rep.; keep states
        with similar Bellman error close together
      • use projected states to define a set of basis
        functions $\phi_i$
      • learn a linear value function over basis functions:
        $V = \sum_i \theta_i \phi_i$
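The slide's update equation is not in the transcript. The sketch below uses the standard tabular Bellman backup, V(s) ← max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')], as an assumed stand-in. The two-state MDP, the array layout, and the function name are hypothetical. It illustrates why each round has to sweep the whole state space, which is the cost that state aggregation tries to reduce.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P[a, s, t] is the probability of moving from state s to state t under
    action a; R[s, a] is the expected immediate reward.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Bootstrap: back up every state from the current successor values.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Two-state toy MDP: state 1 pays reward 1 forever; state 0 pays nothing,
# but action 1 moves from state 0 to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
    [[0.0, 1.0], [0.0, 1.0]],   # action 1: go to (or stay in) state 1
])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

V = value_iteration(P, R, gamma=0.9)
```

For this toy problem the fixed point can be worked out by hand: V(1) = 1 / (1 - γ) and V(0) = γ V(1).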

29
Chopra et al., 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide if
they are from the same person.
30
Chopra et al., 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide if
they are from the same person.
Too difficult for linear mapping!