1
The University of York
Recent Progress on Learning with Graph Representations
Edwin R. Hancock and Richard Wilson
With help from Bai Xiao, Bin Luo, Antonio Robles-Kelly and Andrea Torsello.
University of York, Computer Science Department, York YO10 5DD, UK.
erh_at_cs.york.ac.uk
2
Outline
  • Motivation and background
  • Graphs from images
  • Spectral invariants
  • Lifting cospectrality
  • Generative models and description length
  • Conclusions

3
Motivation
4
Problem
In computer vision, graph structures are used to
abstract image structure. However, the algorithms
used to segment the image primitives are not
reliable. As a result there are both additional
and missing nodes (due to segmentation error) and
variations in edge structure. Hence image
matching and recognition cannot be reduced to a
graph isomorphism or even a subgraph isomorphism
problem. Instead, inexact graph matching methods
are needed.
7
Measuring similarity of graphs
  • Early work on graph matching in vision (Barrow
    and Popplestone) introduced the association graph
    and showed how it could be used to locate the
    maximum common subgraph.
  • Work on syntactic and structural pattern
    recognition in the 1980s unearthed problems with
    inexact matching (Sanfeliu, Eshera and Fu,
    Haralick and Shapiro, Wong, etc.) and extended the
    concept of edit distance from strings to graphs.
  • Recent work has aimed to develop probability
    distributions for graph matching (Christmas,
    Kittler and Petrou, Wilson and Hancock, Serratosa
    and Sanfeliu) and to match using advanced
    optimisation methods (Simic, Gold and Rangarajan).
  • Renewed interest in placing classical methods
    such as edit distance (Bunke) and max-clique
    (Pelillo) on a more rigorous footing.

8
Viewed from the perspective of learning
This work has shown how to measure the similarity
of graphs. It can be used to locate inexact
matches when significant levels of structural
error are present. It may also provide a means by
which modes of structural variation can be
assessed.
9
Learning with graphs (circa 2000)
  • Learn class structure: assign graphs to classes.
    Needs a distance measure or a vector of graph
    characteristics. Central clustering is possible
    with characteristics but difficult when the number
    of nodes and edges varies and correspondences are
    not known. It is easier to perform pairwise
    clustering (Bunke, Buhmann).
  • Embed graphs in a low-dimensional space:
    correspondences are again needed, but spectral
    methods may offer a solution. Standard statistical
    and geometric learning methods can then be applied
    to graph-vectors.
  • Learn modes of structural variation: understand
    how edge (connectivity) structure varies for
    graphs belonging to the same class
    (Dickinson, Williams).
  • Build generative model: borrow ideas from
    graphical models (Langley, Friedman, Koller).

10
Why is structural learning difficult?
  • Graphs are not vectors: there is no natural
    ordering of nodes and edges. Correspondences must
    be used to establish order.
  • Structural variations: the numbers of nodes and
    edges are not fixed. They can vary due to
    segmentation error.
  • Not easily summarised: since graphs do not reside
    in a vector space, the mean and covariance are
    hard to characterise.

11
Structural Variations
12
Contributions
  • Permutation invariant graph characteristics from
    Laplacian spectrum (Wilson, Hancock, Luo PAMI
    2005).
  • Computation of edit distance between graphs and
    spectral clustering (Robles-Kelly, Torsello and
    Hancock IJCV 2007, Robles-Kelly and Hancock
    PAMI 2005).
  • Embedding based on properties of random walk and
    geometric characterisation of embedded nodes (Qiu
    and Hancock, PAMI 2007).
  • Spectral embedding of graphs (Luo, Wilson and
    Hancock Patt. Rec. 2004).
  • Learn generative model of tree structure using
    description length (Torsello and Hancock PAMI
    2006).

13
Spectral Methods
Use the eigenvalues and eigenvectors of the adjacency
matrix (or Laplacian matrix) - Biggs, Cvetkovic,
Fan Chung.
  • Singular value methods for exact graph matching
    and point-set alignment (Umeyama).
  • Singular value methods for point-set
    correspondence (Scott and Longuet-Higgins,
    Shapiro and Brady).
  • Use of eigenvalues for image segmentation (Shi
    and Malik) and for perceptual grouping (Freeman
    and Perona, Sarkar and Boyer).
  • Graph-spectral methods for indexing shock trees
    (Dickinson and Shokoufandeh).

14
Graph (structural) representations of shape
  • Region adjacency graphs (Popplestone, Worthington,
    Pizlo, Rosenfeld, etc.)
  • View graphs (Freeman, Ponce)
  • Aspect graphs (Dickinson)
  • Trees (Forsyth, Geiger).
  • Shock graphs (Siddiqi, Zucker, Kimia).

The idea is to segment shape primitives from image
data and abstract them using a graph. Shape
recognition then becomes a problem of graph matching.
However, statistical learning of modes of shape
variation is difficult because the available
methodology is limited.
15
Delaunay Graph
16
MOVI Sequence
17
Shock graphs
Type 1 shock (monotonically increasing radius)
Type 2 shock (minimum radius)
Type 3 shock (constant radius)
Type 4 shock (maximum radius)
18
Graph characteristics
  • The Laplacian spectrum provides a natural
    permutation invariant for a graph, but discards
    information in the eigensystem.
  • Symmetric polynomials over the spectral matrix
    give a rich family of invariants.
  • Can be extended to attributed graphs using
    complex number encoding and Hermitian extension
    of Laplacian.
  • Recent work has shown how invariants are linked
    to moments from Mellin transform of heat kernel.

19
Pairwise clustering
  • Compute tree/graph similarity using edit
    distance.
  • Simplifying structure can simplify process
    (convert graph to string).
  • Extract pairwise clusters using the EM algorithm and
    eigenvectors of an affinity matrix between graphs.
  • Applied to learn shape classes.

20
Embeddings
  • Embed nodes of a graph into a vector space so as
    to preserve node affinity properties of graph.
  • Examples include Laplacian eigenmap, diffusion
    map.
  • We have shown how commute time leads to an embedding
    that is robust to modifications in edge structure
    (a sketch follows below).
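A minimal numpy sketch of a commute-time style embedding (the function name and example graph are illustrative assumptions, not taken from the slides): node coordinates are read off the pseudo-inverse of the Laplacian, so that squared Euclidean distances between embedded nodes equal commute times.

import numpy as np

def commute_time_embedding(A):
    """Embed nodes so that squared embedding distances equal commute times."""
    d = A.sum(axis=1)
    vol = d.sum()                          # graph volume (sum of degrees)
    L = np.diag(d) - A
    evals, evecs = np.linalg.eigh(L)
    nz = ~np.isclose(evals, 0.0)           # drop the zero eigenvalue(s)
    # Coordinates: sqrt(vol) * Phi * Lambda^{-1/2}, restricted to nonzero eigenvalues.
    return np.sqrt(vol) * (evecs[:, nz] / np.sqrt(evals[nz]))

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = commute_time_embedding(A)              # row u = embedded position of node u
print(np.sum((Y[0] - Y[3]) ** 2))          # commute time between nodes 0 and 3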

21
Generative model
  • In the structural domain the model can be learned
    using the EM algorithm, fitting a mixture over
    classes to a sample of trees.
  • Each class is characterised by a prototype from
    which the trees belonging to the class can be
    obtained through tree edit operations.
  • Prototypes formed by merging trees. Merging
    criterion is description length.
  • Edit distance between trees is linked to
    description length advantage, and entropy
    associated with ML node probabilities.

22
Spectral Generative Model
  • Embed nodes of graph in vector space using
    heat-kernel.
  • Align embedded node positions using Procrustes
    alignment.
  • Compute covariance matrix for node positions.
  • Deform node positions in the directions of the
    eigenvectors of the covariance matrix (a sketch of
    these steps follows below).
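A heavily hedged sketch of these four steps, assuming graphs of equal size with known node correspondences, and assuming a heat-kernel embedding Y = exp(-t*Lambda/2) * Phi^T (so that Y^T Y equals the heat kernel); all function names are illustrative.

import numpy as np
from scipy.linalg import orthogonal_procrustes

def heat_kernel_embedding(A, t=1.0):
    """Node coordinates Y with Y.T @ Y equal to the heat kernel exp(-t L)."""
    L = np.diag(A.sum(axis=1)) - A
    evals, evecs = np.linalg.eigh(L)
    return np.diag(np.exp(-0.5 * t * evals)) @ evecs.T   # columns = node positions

def spectral_generative_model(adjacency_list, t=1.0):
    """Align the embeddings, then model variation via the covariance of node
    positions.  Assumes equal node counts and known node correspondences."""
    Ys = [heat_kernel_embedding(A, t) for A in adjacency_list]
    ref = Ys[0]
    aligned = []
    for Y in Ys:
        R, _ = orthogonal_procrustes(Y.T, ref.T)   # rotate this embedding onto the reference
        aligned.append((Y.T @ R).ravel())
    X = np.vstack(aligned)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Modes of structural variation: deform node positions along the leading
    # eigenvectors of the covariance matrix.
    eigvals, modes = np.linalg.eigh(cov)
    return mean, eigvals[::-1], modes[:, ::-1]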

23
Algebraic graph theory (PAMI 2005)
  • Use symmetric polynomials to construct
    permutation invariants from spectral matrix

24
...joint work with Richard Wilson
25
Spectral Representation
  • Compute the Laplacian matrix L = D - A, where A is
    the adjacency matrix and D is the diagonal matrix
    of node degrees.
  • Perform the spectral decomposition of the Laplacian
    matrix.
  • Construct the spectral matrix (a sketch follows
    below).
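A minimal numpy sketch of this construction, assuming the spectral matrix is formed by scaling each Laplacian eigenvector by the square root of its eigenvalue (the function name and example graph are illustrative):

import numpy as np

def spectral_matrix(A):
    """Spectral matrix of a graph from its adjacency matrix A."""
    D = np.diag(A.sum(axis=1))            # degree matrix
    L = D - A                             # Laplacian L = D - A
    evals, evecs = np.linalg.eigh(L)      # spectral decomposition (L is symmetric)
    # Scale each eigenvector by the square root of its (non-negative) eigenvalue.
    return evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None)))

# Example: a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Phi = spectral_matrix(A)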

26
Properties of the Laplacian
  • The eigenvalues are non-negative and the smallest
    eigenvalue is zero.
  • The multiplicity of the zero eigenvalue is the
    number of connected components of the graph.
  • The zero eigenvalue is associated with the
    all-ones vector.
  • The eigenvector associated with the second smallest
    eigenvalue is the Fiedler vector.
  • The Fiedler vector can be used to cluster the nodes
    of the graph by recursive bisection (see the
    demonstration below).
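A small numpy demonstration of these properties on a hypothetical example graph (two triangles joined by a bridge); the graph and tolerances are illustrative.

import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

evals, evecs = np.linalg.eigh(laplacian(A))
print(np.sum(np.isclose(evals, 0)))   # 1 zero eigenvalue: one connected component
fiedler = evecs[:, 1]                 # eigenvector of the second smallest eigenvalue
print(fiedler > 0)                    # its sign bisects the graph into the two triangles

# Deleting the bridge disconnects the graph, and the zero eigenvalue is repeated.
A[2, 3] = A[3, 2] = 0.0
evals2, _ = np.linalg.eigh(laplacian(A))
print(np.sum(np.isclose(evals2, 0)))  # 2 connected components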

27
Eigenvalue spectrum
The vector of ordered eigenvalues is a permutation
invariant.
28
Eigenvalues are invariant to permutations of the
Laplacian.
  • ...we would like to construct a family of permutation
    invariants from the full spectral matrix.

29
Why?
  • According to perturbation analysis eigenvalues
    are relatively stable to noise.
  • Eigenvectors are not stable to noise and undergo
    large rotations for small additions of noise.

30
Symmetric polynomials
31
Power symmetric polynomials
32
Symmetric polynomials on spectral matrix
  • Symmetric polynomials and power symmetric
    polynomials are related by the Newton-Girard
    formula (see the example below).
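A short Python example of the Newton-Girard recurrence, recovering the elementary symmetric polynomials e_1..e_n from the power sums p_r = sum_i x_i^r (function names are illustrative):

import numpy as np

def power_sums(x, n):
    """Power symmetric polynomials p_r = sum_i x_i**r for r = 1..n."""
    x = np.asarray(x, dtype=float)
    return [np.sum(x ** r) for r in range(1, n + 1)]

def elementary_from_power(p):
    """Elementary symmetric polynomials e_1..e_n via Newton-Girard:
       r * e_r = sum_{k=1..r} (-1)**(k-1) * e_{r-k} * p_k, with e_0 = 1."""
    e = [1.0]
    for r in range(1, len(p) + 1):
        e.append(sum((-1) ** (k - 1) * e[r - k] * p[k - 1] for k in range(1, r + 1)) / r)
    return e[1:]

x = [1.0, 2.0, 3.0]
print(elementary_from_power(power_sums(x, 3)))   # [6.0, 11.0, 6.0] = e1, e2, e3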

33
Spectral Feature Vector
  • Construct a matrix F of permutation invariants by
    applying the symmetric polynomials to the elements
    in the columns of the spectral matrix. Use an
    entropy measure to flatten the distribution of
    values.
  • Stack the columns of F to form a long-vector B.
  • A set of graphs is then represented by a
    data-matrix with one such long-vector per graph
    (a sketch follows below).
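A numpy sketch of the feature vector, with sign(x) * log(1 + |x|) standing in for the entropy-style flattening mentioned above (that choice, and the function names, are assumptions of this sketch rather than the authors' exact formulation):

import numpy as np

def elementary_sym(values, n_poly):
    """Elementary symmetric polynomials e_1..e_{n_poly} via Newton-Girard."""
    p = [np.sum(np.asarray(values, dtype=float) ** r) for r in range(1, n_poly + 1)]
    e = [1.0]
    for r in range(1, n_poly + 1):
        e.append(sum((-1) ** (k - 1) * e[r - k] * p[k - 1] for k in range(1, r + 1)) / r)
    return np.array(e[1:])

def spectral_feature_vector(A, n_poly=4):
    """Permutation-invariant long-vector B for a graph with adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A
    evals, evecs = np.linalg.eigh(L)
    Phi = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None)))    # spectral matrix
    F = np.column_stack([elementary_sym(Phi[:, j], n_poly)       # invariants per column
                         for j in range(A.shape[0])])
    F = np.sign(F) * np.log1p(np.abs(F))                         # flatten the distribution
    return F.ravel(order="F")                                    # stack columns into B

# A data-matrix for a set of graphs stacks one such long-vector per row.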

34
...extend to weighted attributed graphs.
35
Complex Representation
  • Encode attributes as complex numbers.
  • Off-diagonal elements: edge weights (W) as the
    modulus and normalised attributes as the phase (y).
  • Diagonal elements encode the node attributes (x)
    and ensure that H is positive semi-definite
    (a sketch follows below).
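A hedged numpy sketch of one such complex encoding. The off-diagonal form W * exp(i*y) with antisymmetric phases makes H Hermitian; padding the diagonal with the weighted degree plus a non-negative node attribute is one simple way of keeping H positive semi-definite, and is an assumption of this sketch rather than the authors' exact construction.

import numpy as np

def hermitian_property_matrix(W, phase, node_attr):
    """Complex (Hermitian) encoding of an attributed graph (sketch).
    W:         symmetric edge-weight matrix (modulus of off-diagonal entries)
    phase:     antisymmetric matrix of normalised edge attributes (the phase)
    node_attr: non-negative node attributes placed on the diagonal."""
    H = W * np.exp(1j * phase)                        # off-diagonal: modulus and phase
    np.fill_diagonal(H, node_attr + W.sum(axis=1))    # diagonal dominance keeps H PSD
    return H

W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
phase = np.array([[0.0, 0.3, -0.7],
                  [-0.3, 0.0, 0.0],
                  [0.7, 0.0, 0.0]])
H = hermitian_property_matrix(W, phase, np.array([0.2, 0.1, 0.4]))
evals, evecs = np.linalg.eigh(H)     # real eigenvalues, complex eigenvectors
print(np.allclose(H, H.conj().T), evals)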

36
Spectral analysis
  • Perform spectral analysis on H. Real eigenvalues
    and complex eigenvectors
  • Construct spectral matrix of scaled complex
    eigenvectors
  • Complex Laplacian

37
Pattern Spaces
  • PCA: project the long-vectors onto the leading
    eigenvectors of the covariance matrix.
  • MDS: embed the graphs in a low-dimensional space
    spanned by the eigenvectors of the distance matrix.
  • LLP: locally linear projection (Niyogi); perform an
    eigenvector analysis on a weighted covariance
    matrix, a PCA/MDS hybrid (sketches of the first two
    follow below).
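Minimal numpy sketches of the first two projections (PCA on the long-vectors and classical MDS on a pairwise distance matrix between graphs); the LLP hybrid is omitted and the function names are illustrative.

import numpy as np

def pca_embed(X, k=2):
    """Project rows of X (one long-vector per graph) onto the k leading
    eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ evecs[:, ::-1][:, :k]

def mds_embed(D, k=2):
    """Classical MDS: double-centre the squared distances and take the
    k leading eigenvectors, scaled by the square roots of the eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]
    return evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0.0, None))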

38
Manifold learning methods
  • ISOMAP: construct a neighbourhood graph and pairwise
    geodesic distances between data-points; obtain a
    low-distortion embedding by applying MDS to the
    weighted graph (Tenenbaum).
  • Locally linear embedding: apply a variant of PCA to
    the data (Roweis and Saul).
  • Locally linear projection: use interpoint distances
    to compute a weighted covariance matrix, and apply
    PCA (He and Niyogi).

39
Separation under structural error
Mahalanobis distance between feature vectors for
noise corrupted graph and remaining graphs
Distance between graph and edge-edited variants
Distance between graph and random graphs of same
size and edge density
40
Variation under structural error (MDS)
MDS applied to Mahalanobis distances between
feature vectors.
41
CMU Sequence
42
MOVI Sequence
43
YORK Sequence
44
Visualisation (LLP, Laplacian polynomials)
45
Cospectrality problem for trees
  • Classical random walks are determined by the
    spectrum of the Laplacian matrix, which gives the
    path-length distribution, hitting and commute times.
  • Non-isomorphic graphs can have the same spectra
    (co-spectrality). This problem is severe for
    trees.
  • Turn to quantum walks to overcome this problem
    and develop new algorithms for graph analysis
    based on random walks.

46
Cospectral trees
  • Nearly every tree has an (adjacency, Laplacian, ...)
    cospectral partner.
  • Such trees can be easily generated.
  • The spectrum of S(U^3) distinguished all such
    trees it was tested on.

pairs of cospectral trees
47
Overcome using quantum random walk
  • The unitary operator governing the evolution of
    the walk can be written in matrix form as
  • where the basis states are the set of all
    ordered pairs (i,j) such that
  • Eigenvalues of U are

48
The positive support of a matrix
  • For a real-valued matrix M, define its positive
    support S(M) by
  • S(U^r)_ij is non-zero if and only if the sum of
    all the paths of length r from state i to state j
    is positive.
  • Interference effects in the quantum walk ensure
    that S(U^r)_ij gives useful information about the
    graph when classical analogues do not (a sketch
    follows below).
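A numpy sketch of the pipeline on these slides, assuming a Grover-coined discrete-time walk on the ordered arcs of the graph with entries U[(j,k),(i,j)] = 2/d_j - delta_{ik}; that explicit form, and the function names, are assumptions of this sketch.

import numpy as np
from itertools import product

def quantum_walk_unitary(A):
    """Discrete-time quantum walk unitary on the ordered arcs (i, j) of a graph."""
    n = A.shape[0]
    arcs = [(i, j) for i, j in product(range(n), range(n)) if A[i, j] > 0]
    index = {a: t for t, a in enumerate(arcs)}
    d = A.sum(axis=1)
    U = np.zeros((len(arcs), len(arcs)))
    for (i, j) in arcs:                       # incoming arc (i, j)
        for k in range(n):
            if A[j, k] > 0:                   # outgoing arc (j, k)
                U[index[(j, k)], index[(i, j)]] = 2.0 / d[j] - (1.0 if k == i else 0.0)
    return U

def positive_support(M, tol=1e-12):
    """S(M): 1 where the entry is strictly positive, 0 elsewhere."""
    return (M > tol).astype(float)

def s_u3_spectrum(A):
    """Sorted eigenvalues of S(U^3), used as a graph signature."""
    U = quantum_walk_unitary(A)
    return np.sort(np.linalg.eigvals(positive_support(np.linalg.matrix_power(U, 3))))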

49
Cospectral Trees
The spectrum of the positive support of U^3 is not
determined by the spectrum of L, and so lifts the
cospectrality problem.
50
Strongly regular graphs
  • There is no method proven to be able to decide
    whether two SRGs are isomorphic in polynomial
    time.
  • There are large families of strongly regular
    graphs that we can test the method on.

MDS embeddings of the SRGs with parameters
(25,12,5,6) - red, (26,10,3,4) - blue,
(29,14,6,7) - black, (40,12,2,4) - green, using the
adjacency spectrum (top) and the spectrum of
S(U^3) (bottom).
51
Generative Tree Union Model
  • Probability distribution over the union tree

52
...work with Andrea Torsello
53
Ingredients
  • Set of tree unions
  • Set of node observation probabilities for each
    node (probability of observing ith
    node of union c).
  • Set of node correspondences

54
Illustration
55
Cluster structure
  • Cluster indicator
  • Number of trees assigned to cluster c
  • Number of nodes in union c

56
Model
  • Describe data using a mixture of tree unions
  • Where N is the node-set and O is the order
    relation of the tree union and is the set
    of node probabilities.

57
Union as tree distribution
  • For each node in the union we know how often it
    is encountered in the sampled trees.
  • We can generate new trees by sampling with the
    node probability equal to the normalised sample
    frequency.
  • The union represents a generative model for a
    distribution of trees (a toy sketch follows below).
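A toy Python sketch of sampling a tree from such a union. It assumes each union node is observed independently with its normalised sample frequency and that a node is kept only when its parent is kept (so the sample is a subtree of the union); this independence assumption and all names are illustrative, not necessarily the authors' exact scheme.

import random

def depth(i, parent):
    """Depth of node i in the union, following parent links to the root."""
    d = 0
    while parent[i] is not None:
        i, d = parent[i], d + 1
    return d

def sample_tree(parent, theta, rng=random):
    """Sample one tree from a tree-union generative model (toy sketch).
    parent[i] is the parent of union node i (None for the root) and theta[i]
    is its observation probability (the normalised sample frequency)."""
    kept = set()
    for i in sorted(theta, key=lambda i: depth(i, parent)):  # parents before children
        if (parent[i] is None or parent[i] in kept) and rng.random() < theta[i]:
            kept.add(i)
    return {i: parent[i] for i in kept}

# Hypothetical union: root 0 with children 1 and 2; node 3 is a child of node 2.
parent = {0: None, 1: 0, 2: 0, 3: 2}
theta = {0: 1.0, 1: 0.8, 2: 0.5, 3: 0.9}
print(sample_tree(parent, theta))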

58
Generative Model
  • The aim is to make a maximum likelihood estimate of
    the model.
  • Problem: we do not know how sample-nodes map to
    model-nodes.
  • Let the node observation probability depend on the
    correspondence map M (determined later).

59
Max-likelihood parameters
  • Log-likelihood
  • Given M, L is maximized by any T consistent with
    the hierarchies and by

60
Description length
  • Model coding cost of encoding k-dimensional
    parameterisation of an m-dimensional
    sample-vector is

Expected value of the data log-likelihood given the
best-fit model parameters
Cost of coding the model (parameters + structure)
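For reference, the standard asymptotic two-part code length that these two captions describe can be written as below; this is a hedged reconstruction, since the slide's own formula is not reproduced in this transcript.

\mathcal{L} \;\approx\; -\,\mathbb{E}\!\left[\log P(D \mid \hat{\theta})\right] \;+\; \frac{k}{2}\log m

The first term is the expected data log-likelihood at the best-fit parameters; the second is the cost of coding a k-parameter model for an m-dimensional sample-vector.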
61
The expectation of the observation density depends on
the node entropy.
62
Tree Union
  • Cost of describing tree union

Negative likelihood of data given model
Cost of encoding node probabilities
Cost of encoding mixture
Cost of encoding tree structure
63
Simplified Description Cost
  • Cost of describing tree union

64
Description Length Gain
  • Which nodes should be merged?
  • The description advantage obtained by merging
    nodes v and v'
  • The set of merges M that minimizes the description
    length maximizes
  • Edit distance linked to node entropy

65
Unattributed
Pairwise clustering of tree edit distance.
Mixture of tree unions
66
Future
  • Links between spectral geometry and
    graph-spectra.
  • MDL in spectral domain.