Title: Learning structure of manifolds using random projections


1
Learning structure of manifolds using random
projections
  • Freund, Dasgupta, Kabra, Verma
  • UC San Diego
  • Presentation by Steven Bergner
  • Simon Fraser University

2
Structure
  • Definitions and problem setting
  • Related Work
  • Random projections trees
  • Results

3
Data
  • Data matrix X_{i,j} of coordinates
  • Row i = 1..N is a data sample
  • Column j = 1..D is an attribute or dimension
  • Challenges
  • Large N: storage, streaming, sampling
  • Large D: insufficient training data
  • Undefined fields: graphical models

4
Manifolds
  • Every point has a neighborhood that looks like R^n
  • Global structure may be different

[Figures: Chinese Swiss roll, Earth. Source: Wikipedia]
5
Dimension
  • Extrinsic
  • Number of measurements
  • (Non-)linear dependencies
  • Intrinsic
  • Data near a d-dimensional manifold, d < D
  • Independent, uncorrelated
  • E.g. doubling dimension

6
Distributions with low intrinsic d
  • Example: motion capture
  • D markers each with 3 coordinates
  • Body posture determined by joint angles

7
Related work
8
(Non-)parametric statistics
  • Parametric
  • E.g. fitting a Gaussian to observations
  • Needs a model
  • Non-parametric
  • E.g. estimating a histogram (density)
  • Bayesian statistics
  • Manifold learning
  • Needs lots of examples
  • Framework: approximation theory

9
Manifold learning
  • Incrementally grow neighborhoods
  • Locally linear embedding [Roweis & Saul 2001]
  • W_{i,j}: weights of local neighbors used to reconstruct
    point i
  • Embedding coordinates in first eigenvectors of
    W_{i,j}
  • ISOMAP [Tenenbaum et al. 2001]
  • Build k-nearest neighbor graph
  • Shortest path lengths between all points in
    matrix A
  • Eigenvectors of A provide embedding coords

10
Random projections
  • Johnson-Lindenstrauss lemma [83]
  • Classifier capacity with random projections
    [Garg 2002]
  • Compressive sensing [Candes 2006]

11
Johnson-Lindenstrauss Lemma
  • Target dimension k does not depend on original
    dimension d

[Dasgupta & Gupta 99]
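
The lemma says that n points can be mapped into k = O(log n / eps^2) dimensions while every pairwise distance is preserved up to a factor of 1 ± eps, with k independent of the original dimension. The following is a minimal sketch of an empirical check, assuming a Gaussian random matrix as the projection (one common construction, not necessarily the one on the slide). The sizes D, k, n and all function names are illustrative choices.

```python
import math
import random

def random_projection_matrix(k, D, rng):
    """k x D matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(D)] for _ in range(k)]

def project(R, x):
    """Apply the linear map R to the point x."""
    return [sum(r * c for r, c in zip(row, x)) for row in R]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def distortion_range(points, R):
    """Smallest and largest ratio of projected to original pairwise distance."""
    proj = [project(R, x) for x in points]
    ratios = [
        dist(proj[i], proj[j]) / dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    return min(ratios), max(ratios)

rng = random.Random(0)
D, k, n = 500, 200, 20  # illustrative sizes only
points = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]
R = random_projection_matrix(k, D, rng)
d_min, d_max = distortion_range(points, R)
print(f"distance ratios lie in [{d_min:.3f}, {d_max:.3f}]")
```

Both ratios should stay close to 1, and increasing k tightens the range regardless of D.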
12
Random projection trees
13
Kd-trees
  • BSP
  • Used for nearest neighbor queries
  • Associative memory

14
RP-trees
  • Split along random directions
  • Split point minimizes inner-cell variance

15
Algorithm: Make Tree
  • S is point set
  • Rule(x) divides the set
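
The recursion described here can be sketched as follows. The slide's pseudocode did not survive in this transcript, so this is a minimal sketch under stated assumptions: the stopping threshold `min_size`, the `Node` class, and the example rule (median split along a random direction) are all hypothetical names chosen for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class Node:
    """Tree cell: a leaf if rule is None, otherwise an internal node."""
    def __init__(self, points, rule=None, left=None, right=None):
        self.points = points
        self.rule = rule
        self.left = left
        self.right = right

def make_tree(S, choose_rule, min_size=10):
    """Recursively split the point set S with rules from choose_rule(S)."""
    if len(S) < min_size:
        return Node(S)
    rule = choose_rule(S)
    left = [x for x in S if rule(x)]
    right = [x for x in S if not rule(x)]
    if not left or not right:  # degenerate split: stop here
        return Node(S)
    return Node(S, rule,
                make_tree(left, choose_rule, min_size),
                make_tree(right, choose_rule, min_size))

def random_median_rule(S, rng):
    """Example rule (assumption): median split along a random direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(len(S[0]))]
    proj = sorted(dot(v, x) for x in S)
    theta = proj[len(proj) // 2]
    def rule(x):
        return dot(v, x) < theta
    return rule

def leaves(node):
    """Collect all leaf cells of the tree."""
    if node.rule is None:
        return [node]
    return leaves(node.left) + leaves(node.right)

rng = random.Random(1)
S = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
tree = make_tree(S, lambda cell: random_median_rule(cell, rng))
leaf_sizes = [len(leaf.points) for leaf in leaves(tree)]
print(len(leaf_sizes), "leaves, largest has", max(leaf_sizes), "points")
```

Passing the rule as a function keeps the tree builder unchanged when a different split rule is swapped in.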

16
Algorithm: PCA choose rule
  • Sorting along random direction v will give
    similar median

17
Point set diameters
  • Diameter of S
  • max ||x - y|| over all x, y in S
  • Average diameter
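
The formula for the average diameter was lost in this transcript. The sketch below assumes the usual definition from the RP-tree literature, the mean squared distance over all pairs in S, which equals twice the mean squared distance to the centroid. Function names are illustrative.

```python
import math

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def diameter(S):
    """Largest distance between any two points of S."""
    return max(math.sqrt(sq_dist(x, y)) for x in S for y in S)

def avg_sq_diameter(S):
    """Assumed definition: (1/|S|^2) * sum over all x, y in S of ||x - y||^2."""
    n = len(S)
    return sum(sq_dist(x, y) for x in S for y in S) / (n * n)

def avg_sq_diameter_fast(S):
    """Same quantity in linear time: (2/|S|) * sum of ||x - mean(S)||^2."""
    n = len(S)
    mean = [sum(col) / n for col in zip(*S)]
    return 2.0 * sum(sq_dist(x, mean) for x in S) / n

S = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(diameter(S), avg_sq_diameter(S), avg_sq_diameter_fast(S))
```

The linear-time form matters for large cells, since the pairwise sum grows quadratically in the number of points.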

18
Algorithm: RP tree choose rule
  • Split minimizes inner-class variance
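
One way to read this rule: project the cell onto a random unit direction, then pick the threshold that minimizes the summed squared deviation of the projections on the two sides. The slide's pseudocode is not preserved here, so the following is a sketch under that assumption, with illustrative names.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_unit_vector(D, rng):
    """Direction drawn uniformly from the unit sphere in R^D."""
    v = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def best_variance_split(values):
    """Threshold minimizing within-side sum of squared deviations (needs >= 2 values)."""
    a = sorted(values)
    n = len(a)
    total, total_sq = sum(a), sum(t * t for t in a)
    best_cost, best_theta = math.inf, None
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        left_sum += a[i]
        left_sq += a[i] * a[i]
        n_left, n_right = i + 1, n - i - 1
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        # sum of squared deviations from the mean on each side
        cost = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if cost < best_cost:
            best_cost, best_theta = cost, (a[i] + a[i + 1]) / 2.0
    return best_theta

def choose_rule_rp(S, rng):
    """Return Rule(x) that splits S along a random direction."""
    v = random_unit_vector(len(S[0]), rng)
    theta = best_variance_split([dot(v, x) for x in S])
    def rule(x):
        return dot(v, x) <= theta
    return rule

theta = best_variance_split([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
rng = random.Random(2)
S = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
rule = choose_rule_rp(S, rng)
n_left = sum(1 for x in S if rule(x))
print("threshold:", theta, "left/right:", n_left, len(S) - n_left)
```

For the six sample values the threshold falls in the gap between the two clusters, which is where the within-side variance is smallest.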

19
Building an RP tree
  • PCA Ellipsoid for comparison only
  • split now chosen via RP rule

20-24
Building an RP tree (continued)
Split diameter Theorem
  • Covariance dimension d(?) fulfils
  • For a cell C split into several C

26
Proof for doubling dimension d
  • A cell of diameter Δ may be covered by O(d log d)
    balls of radius < Δ/2
  • Those can be split with O(d log d) projections

27
Streaming implementation
  • Fixed set of random directions v chosen at
    beginning
  • Use v that minimizes avg. diameter
  • Both splits operate on projected pts.
  • Statistics updated for each node
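
The per-node bookkeeping can be sketched with running statistics over a fixed set of directions. This is an illustration only: the slide selects the direction that minimizes average diameter, whereas this sketch uses the largest projected variance as a simple stand-in, and the class and method names are hypothetical.

```python
import random

class StreamingNodeStats:
    """Running mean and variance of projections onto fixed directions."""

    def __init__(self, directions):
        self.directions = directions
        self.n = 0
        self.mean = [0.0] * len(directions)
        self.m2 = [0.0] * len(directions)  # summed squared deviations

    def update(self, x):
        """Fold one streamed point into the statistics (Welford's update)."""
        self.n += 1
        for j, v in enumerate(self.directions):
            p = sum(a * b for a, b in zip(v, x))
            delta = p - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (p - self.mean[j])

    def variance(self, j):
        """Population variance of the projections onto direction j."""
        return self.m2[j] / self.n if self.n else 0.0

    def best_direction(self):
        """Stand-in criterion (assumption): direction of largest variance."""
        return max(range(len(self.directions)), key=self.variance)

# Directions are drawn once, before any data arrives.
rng = random.Random(4)
directions = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
stats = StreamingNodeStats(directions)
for _ in range(1000):
    stats.update([rng.gauss(0.0, 1.0) for _ in range(3)])
print("chosen direction index:", stats.best_direction())
```

Each point is seen once and only a constant amount of state per direction is kept, which is what makes the scheme usable on a stream.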

28
Results
29
Results (synthetic data 1)
Data set 1: 10,000 points in 1000-dimensional
unit cube randomly perturbed by Gaussian noise
with sigma = 1
30
Results (synthetic data 2)
Data set 2: 10,000 points chosen equally from
two 1000-dimensional Gaussians at (1,..,1) and
(-1,..,-1)
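
A generator for data of this shape might look like the sketch below. The slide does not state the covariance of the two Gaussians, so unit variance per coordinate is an assumption, and the demo uses much smaller n and D than the slide to keep it quick. The function name is illustrative.

```python
import random

def two_gaussians(n, D, rng, sigma=1.0):
    """n points, alternating between Gaussians centered at (1,..,1) and (-1,..,-1).

    sigma is an assumed per-coordinate standard deviation.
    """
    points, labels = [], []
    for i in range(n):
        center = 1.0 if i % 2 == 0 else -1.0
        points.append([rng.gauss(center, sigma) for _ in range(D)])
        labels.append(center)
    return points, labels

rng = random.Random(5)
points, labels = two_gaussians(n=100, D=50, rng=rng)  # slide: n=10,000, D=1000
pos = [x for x, c in zip(points, labels) if c > 0]
pos_mean = sum(sum(x) for x in pos) / (len(pos) * len(pos[0]))
print(len(points), "points, mean coordinate of positive class:", round(pos_mean, 3))
```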
31-33
MNIST data: handwritten digits 1
34
Applications
  • Description of manifold may be used for
    classification, interpolation
  • Compression also possible when going to
    projection coordinates
  • My interest
  • VC-dimension and discrepancy

35
Thank you for your attention.
  • Questions?