Advanced%20Machine%20Learning%20 - PowerPoint PPT Presentation

About This Presentation

Title:

Advanced%20Machine%20Learning%20

Description:

Advanced Machine Learning & Perception Instructor: Tony Jebara – PowerPoint PPT presentation

Number of Views:250

Avg rating:3.0/5.0

Slides: 35

Provided by: jeb93

Learn more at: http://www.cs.columbia.edu

Category:

more less

Transcript and Presenter's Notes

Title: Advanced%20Machine%20Learning%20

1
Advanced Machine Learning Perception
Instructor Tony Jebara
2
Topic 13

Manifolds Continued and Spectral Clustering
Convex Invariance Learning (CoIL)
Kernel PCA (KPCA)
Spectral Clustering N-Cuts

3
Manifolds Continued

PCA linear manifold
MDS get inter-point distances, find 2D data with
same
LLE mimic neighborhoods using low dimensional
vectors
GTM fit a grid of Gaussians to data via
nonlinear warp
Linear PCA after Nonlinear normalization/invarianc
e of data
Manifold is Linear PCA in Hilbert space (Kernels)
Spectral Clustering in Hilbert space

4
Convex Invariance Learning

PCA is appropriate for finding a linear manifold
Variation in data is only modeled linearly
But, many problems are nonlinear
However, the nonlinear variations may be
irrelevant
Images morph, rotate, translate, zoom
Audio pitch changes, ambient acoustics
Video motion, camera view, angles
Genomics proteins fold, insertions,
deletions
Databases fields swapped, formats, scaled
Imagine a Gremlin is corrupting your data by
multiplying
each input vector Xt by a type of matrix At to
give AtXt
Idea remove nonlinear irrelevant variations
before PCA
But, make this part of PCA optimization, not
pre-processing

5
Convex Invariance Learning

Example of irrelevant variation in our data
permutation in image data each image Xt is
multiplied
by a permutation matrix At by gremlin. Must
clean it.
When we convert images to a vector, we are
assuming
arbitrary meaningless ordering (like Gremlin
mixing order)
This arbitrary ordering causes wild
nonlinearities (manifold)
We should not trust ordering, assume gremlin has
permuted it with arbitrary permutation matrix

6
Permutation Invariance

Permutation is irrelevant variation in our data.
Gremlin is permuting fields in our input vectors
So, view a datum as Bag of Vectors instead
single vector
i.e. grayscale image Set of Vectors or Bag
of Pixels
N pixels, each is a D3 XYI tuple
matrix Ai by gremlin. Must clean it.
Treat each input as permutable Bag of Pixels

x
x
x
7
Optimal Permutation

Vectorization / Rasterization uses index in
image
to sort pixels into large vector.
If we knew optimal correspondence could fix
sorting pixels in bag into large vector more
appropriately
we dont know it, must learn it

8
PCA on Permutated Data

In non-permuted vector images, linear changes
eigenvectors are additions deletions of
intensities (bad!). Translating, raising
eyebrows, etc. erasing redrawing
In bag of pixels (vectorized only after knowing
optimal permutation) get linear changes
eigenvectors are morphings, warpings, jointly
spatial intensity change

9
Permutation as a Manifold

Assume order unknown. Set of Vectors or Bag of
Pixels
Get permutational invariance (order doesnt
matter)
Cant represent invariance by single X vector
point in DxN
space since we dont know the ordering
Get permutation invariance by X spanning all
possible
reorderings. Multiply X by unknown A matrix
(permutation
or doubly-stochastic)

x
x
x
x
x
10
Invariant Paths as Matrix Ops

Move vector along manifold by multiplying by
matrix
Restrict A to be permutation matrix (operator)
Resulting manifold of configurations is orbit
if A is group
Or, for smooth manifold, make A doubly-stochastic
matrix
Endow each image in dataset with own
transformation matrix
. Each is now a bag or manifold

11
A Dataset of Invariant Manifolds

E.g. assume model is PCA, learn 2D subspace of 3D
data
Permutation lets points move independently along
paths
Find PCA after moving to form tight 2D subspace
More generally, move along manifolds to improve
fit of any
model (PCA, SVM, probability density, etc.)

12
Optimizing the Permutations

Optimize modeling cost linear constraints on
matrices
Estimate transformation
parameters
and model parameters (PCA, Gaussian,
SVM)
Cost on matrices A emerges from modeling
criterion
Typically, get a Convex Cost with Convex Hull of
Constraints (Unique!)
Since A matrices are soft permutation
matrices (doubly-stochastic) we have

13
Example Cost Gaussian Mean

Maximum Likelihood Gaussian Mean Model
Theorem 1 C(A) is convex in A (Convex Program)
Can solve via a quadratic program on the A
matrices
Minimizing the trace of a covariance tries to
pull the data spherically towards a common mean

14
Example Cost Gaussian Cov

Theorem 2 Regularized log determinant of
covariance is
convex. Equivalently, minimize
Theorem 3 Cost non-quadratic but upper boundable
by
quad. Iteratively solve with QP with
variational bound
Mining determinant flattens data into low volume
pancake

15
Example Cost Fisher Discrimin.

Find linear Fisher Discriminant model w that
maximizes ratio of between within-class
scatter
For discriminative invariance, transformation
matrices
should increase between-class scatter
(numerator) and
should reduce within class scatter
(denominator)
Minimizing above permutes data to make
classification easy

x
x
x
x
x
x
x
x
x
x
16
Interpreting C(A)

Maximum Likelihood Mean
Permute data towards common mean
Maximum Likelihood Mean Covariance
Permute data towards flat subspace
Pushes energy into few eigenvectors
Great as pre-processing before PCA
Fisher Discriminant
Permute data towards two flat
subspaces while repelling away
from each others means

17
SMO Optimization of QP

Quadratic Programming used for all C(A) since
Gaussian Mean quadratic
Gaussian Covariance upper boundable by quadratic
Fisher Discriminant upper boundable by quadratic
Use Sequential Minimal Optimization
axis parallel optimization, pick axes to update,
ensure constraints not violated
Soft permutation matrix 4 constraints
or 4 entries at a time

18
XY Digits Permuted PCA
20 Images of 3 and 9 Each is 70 (x,y) dots No
order on the dots PCA compress with
same number of Eigenvectors Convex Program
first estimates the permutation ? better
reconstruction
Original
PCA
Permuted PCA
19
Interpolation
Intermediate images are smooth morphs Points
nicely corresponded Spatial morphing versus
redrawing No ghosting
20
XYI Faces Permuted PCA
Original PCA Permuted
Bag-of XYI Pixels
PCA
2000 XYI Pixels Compress to 20 dims Improve
squared error of PCA by Almost 3 orders of
magnitude x103
21
XYI Multi-Faces Permuted PCA
/- Scaling on Eigenvector
Top 5 Eigenvectors All just linear variations
in bag of XYI pixels Vectorization nonlinear need
s huge of eigenvectors
22
XYI Multi-Faces Permuted PCA
/- Scaling on Eigenvector
Next 5 Eigenvectors
23
Kernel PCA