NHDC and PHDC: Local and Global Heat Diffusion Based Classifiers


Title: NHDC and PHDC: Local and Global Heat Diffusion Based Classifiers


1
NHDC and PHDC: Local and Global Heat Diffusion
Based Classifiers
  • Haixuan Yang
  • Group Meeting
  • Sep 26, 2005

2
Outline
  • Introduction
  • Graph Heat Diffusion Model
  • NHDC and PHDC algorithms
  • Connections with other models
  • Experiments
  • Conclusions and future work

3
Introduction
  • Kondor & Lafferty (NIPS2002)
  • Construct a diffusion kernel on a graph
  • Handle discrete attributes
  • Apply to a large margin classifier
  • Achieve good accuracy on 5 data sets from UCI
  • Lafferty & Lebanon (JMLR2005)
  • Construct a diffusion kernel on a special
    manifold
  • Handle continuous attributes
  • Restrict to text classification
  • Apply to SVM
  • Achieve good accuracy on WebKB and Reuters
  • Belkin & Niyogi (Neural Computation 2003)
  • Reduce dimension by heat kernel and local
    distance
  • Tenenbaum et al. (Science 2000)
  • Reduce dimension by local distance

4
Introduction
  • We inherit the ideas
  • Local information is relatively accurate in a
    nonlinear manifold.
  • The way heat diffuses on a manifold is related
    to the density of the data on the manifold: a
    point where heat diffuses rapidly is one that has
    high density.
  • For example, in the ideal case where the manifold
    is Euclidean space, heat diffuses in the same
    way as a Gaussian density.
  • The way heat diffuses on a manifold can be
    understood as a generalization of the Gaussian
    density from Euclidean space to manifold.
  • Learn local information by k nearest neighbors.
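
As a reference point for the Euclidean case mentioned above, the standard heat equation on R^m and its fundamental solution are written out here. This is the textbook formula rather than something taken from the slides; the dimension m and the unit diffusion constant are assumptions of this sketch.

```latex
\frac{\partial u}{\partial t} = \Delta u,
\qquad
K_t(x, y) = (4\pi t)^{-m/2}
  \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{4t} \right)
```

For fixed x, K_t(x, ·) is the density of a Gaussian with mean x and covariance 2tI, which is the sense in which heat diffusion generalizes the Gaussian density.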

5
Introduction
  • We think differently
  • The manifold is unknown in most cases.
  • Even for a known manifold, the solution of the
    heat equation is usually unknown.
  • The explicit form of the approximation to the
    solution in (Lafferty & Lebanon, JMLR2005) is a
    rare case.
  • Establish the heat diffusion equation directly on
    a graph that is formed by K nearest neighbors.
  • The solution always has an explicit form.
  • Form a classifier directly from the solution.

6
Illustration
The first heat diffusion
The second heat diffusion
7
Illustration
8
Illustration
9
Illustration
Heat received from class A: 0.018; heat received from class B: 0.016
Heat received from class A: 0.002; heat received from class B: 0.08
SVM
10
Graph Heat Diffusion Model
  • Given a directed weighted graph G = (V, E, W), where
  • V = {1, 2, ..., n} is the set of nodes,
  • E = {(i, j) : there is an edge from i to j} is
    the set of edges,
  • W = (w(i, j)) is the weight matrix.
  • The edge (i, j) is imagined as a pipe that
    connects i and j; w(i, j) is the pipe length.
  • Let f(i,t) be the heat at node i at time t.
  • At time t, i receives M(i,j,t,dt) amount of heat
    from its neighbor j during a period of dt.

11
Graph Heat Diffusion Model
  • Suppose that M(i,j,t,dt) is proportional to the
    time period dt.
  • Suppose that M(i,j,t,dt) is proportional to the
    heat difference f(j,t)-f(i,t).
  • Moreover, the heat flows from j to i through the
    pipe and therefore the heat diffuses in the pipe
    in the same way as it does in the Euclidean space
    as described before.
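
One way to write the assumptions above as a single formula is sketched below. The constant α (a thermal conductivity), the parameter β, and the Gaussian-type factor in the pipe length are assumptions of this sketch, suggested by the Euclidean analogy; they are not recovered from the slide.

```latex
M(i, j, t, dt) = \alpha \,
  \exp\!\left( -\frac{w(i,j)^{2}}{\beta} \right)
  \bigl( f(j,t) - f(i,t) \bigr) \, dt
```

The factor dt encodes the first assumption, the difference f(j,t) - f(i,t) the second, and the exponential the idea that a longer pipe delivers less heat.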

12
Graph Heat Diffusion Model
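  • The heat difference between f(i, t+dt) and f(i, t)
    is the total heat that i receives from its
    neighbors during dt, that is, the sum of
    M(i, j, t, dt) over the neighbors j of i.
  • This can be written in matrix form.
  • Letting dt tend to zero, the difference equation
    becomes a differential equation.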
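
The three steps named on this slide can be sketched as follows, assuming that node i receives α·exp(-w(i,j)²/β)·(f(j,t) - f(i,t))·dt from each neighbor j. The constants α and β, the shorthand γ = αt, and the exact form of H are assumptions of this reconstruction, not equations recovered from the slides.

```latex
% Heat change at node i over a short period dt
f(i, t + dt) - f(i, t)
  = \alpha \sum_{j : (j,i) \in E} e^{-w(i,j)^{2}/\beta}
    \bigl( f(j,t) - f(i,t) \bigr) \, dt

% Matrix form, with f(t) = (f(1,t), \dots, f(n,t))^{\top}
\frac{f(t + dt) - f(t)}{dt} = \alpha H f(t),
\qquad
H_{ij} =
\begin{cases}
  -\sum_{k : (k,i) \in E} e^{-w(i,k)^{2}/\beta}, & j = i, \\
  e^{-w(i,j)^{2}/\beta}, & (j,i) \in E, \\
  0, & \text{otherwise}
\end{cases}

% Limit dt -> 0 and its solution
\frac{d f(t)}{dt} = \alpha H f(t)
\quad \Longrightarrow \quad
f(t) = e^{\alpha t H} f(0) = e^{\gamma H} f(0),
\qquad \gamma = \alpha t
```

Because the system is linear with a constant matrix, the solution is a matrix exponential regardless of the graph, which matches the earlier claim that an explicit form is always available.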

13
NHDC and PHDC algorithm - Step 1
  • Construct neighborhood graph
  • Define graph G over all data points both in the
    training data set and in the test data set.
  • Add edge from j to i if j is one of the K
    nearest neighbors of i.
  • Set edge weight w(i, j) = d(i, j) if j is one of the
    K nearest neighbors of i, where d(i, j) is the
    Euclidean distance between point i and point j.
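
Step 1 can be sketched in a few lines of Python. The function name knn_graph and the dictionary layout are hypothetical choices for illustration; the brute-force neighbor search is a simplification that is fine only for small data sets.

```python
import math

def knn_graph(points, k):
    """Directed K-nearest-neighbor graph.

    Returns w with w[i][j] = d(i, j) whenever j is one of the k nearest
    neighbors of i (i.e. there is an edge from j to i).
    """
    n = len(points)
    w = {}
    for i in range(n):
        # Distances from i to every other point, closest first.
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        w[i] = {j: d for d, j in others[:k]}
    return w

# Training and test points go into the same graph.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
w = knn_graph(points, k=2)
```

In this example point 1 is a neighbor of point 3 but point 3 is not a neighbor of point 1, so the graph is directed.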

14
NHDC and PHDC algorithm - Step 2
  • Compute the Heat Kernel
  • Using equation
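
A minimal sketch of Step 2, assuming the kernel has the form exp(γH) where H is built from Gaussian-type pipe conductances exp(-w(i,j)²/β). The names diffusion_matrix and heat_kernel, the parameters beta and gamma, and the form of H are assumptions of this sketch. The truncated Taylor series is a simple stand-in for a proper matrix exponential such as scipy.linalg.expm.

```python
import math

def diffusion_matrix(w, n, beta):
    """Assumed H: H[i][j] = exp(-w(i,j)^2 / beta) for neighbors j of i,
    and H[i][i] = minus the sum of the off-diagonal entries in row i."""
    H = [[0.0] * n for _ in range(n)]
    for i, nbrs in w.items():
        for j, d in nbrs.items():
            H[i][j] = math.exp(-d * d / beta)
        H[i][i] = -sum(H[i][j] for j in nbrs)
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def heat_kernel(H, gamma, terms=30):
    """Approximate exp(gamma * H) by the first `terms` Taylor terms."""
    n = len(H)
    A = [[gamma * x for x in row] for row in H]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term becomes A^m / m!
        term = [[x / m for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Small neighborhood graph: w[i][j] is the pipe length from neighbor j to i.
w = {0: {1: 1.0}, 1: {0: 1.0, 2: 2.0}, 2: {1: 2.0}}
H = diffusion_matrix(w, 3, beta=2.0)
K = heat_kernel(H, gamma=1.0)
```

Each row of H sums to zero, so each row of the kernel sums to one: a uniform heat distribution stays uniform.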

15
NHDC and PHDC algorithm - Step 3
  • Compute the Heat Distribution
  • Set f(0): for each class c, every node labeled
    with class c has unit heat at time 0, and all
    other nodes have no heat at time 0.
  • In PHDC, use equation
  • to compute the heat distribution.
  • In NHDC, use equation
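
In symbols, and under the assumption that the heat kernel has the form exp(γH) for a diffusion matrix H and a scalar γ, the initial heat and the two variants can be sketched as below. The symbols H and γ and the first-order form used for NHDC are assumptions of this sketch, not equations recovered from the slides.

```latex
% Initial heat for class c
f^{c}_{i}(0) =
\begin{cases}
  1, & \text{node } i \text{ is labeled } c, \\
  0, & \text{otherwise}
\end{cases}

% PHDC: heat propagates through the whole graph
f^{c}(t) = e^{\gamma H} f^{c}(0)

% NHDC: heat moves one step only, to direct neighbors
f^{c}(t) = (I + \gamma H) f^{c}(0)
```

For an unlabeled node i the identity term contributes nothing, so its NHDC heat is γ times the i-th entry of H f^c(0).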

16
NHDC and PHDC algorithm - Step 4
  • Classify the nodes
  • For each node in the test data set, classify it
    to the class from which it receives most heat.
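
A minimal sketch of the decision rule for the one-step (NHDC-style) case, assuming a single test point whose K nearest neighbors are all labeled training points and assuming pipe conductance exp(-d²/β). The function name nhdc_classify and the parameter beta are hypothetical; a common scale factor on the heat is dropped because it does not change the winner.

```python
import math
from collections import defaultdict

def nhdc_classify(train, x, k, beta):
    """Return the class from which x receives the most heat in one step.

    train is a list of (point, label) pairs.
    """
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    heat = defaultdict(float)
    for point, label in nearest:
        d = math.dist(point, x)
        heat[label] += math.exp(-d * d / beta)  # shorter pipe, more heat
    return max(heat, key=heat.get)

train = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"),
    ((3.0, 3.0), "B"), ((3.0, 4.0), "B"),
]
label_near_a = nhdc_classify(train, (0.2, 0.3), k=3, beta=1.0)
label_near_b = nhdc_classify(train, (3.0, 3.5), k=3, beta=1.0)
```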

17
Connections with other models
  • The Parzen window approach (when the window
    function takes the normal form) is a special case
    of the NHDC.
  • It is a non-parametric method for probability
    density estimation

The class-conditional density for class k
Assign x to a class whose value is maximal.
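
For reference, the standard Parzen window estimate with a Gaussian window can be written as below. The symbols N_k (number of training points in class k), N (total number of training points), σ (window width) and m (dimension) are notation chosen for this sketch, and weighting by the class prior N_k/N in the decision rule is an assumption about which value is maximized.

```latex
p(x \mid C_k) = \frac{1}{N_k} \sum_{x_i \in C_k}
  \frac{1}{(2\pi\sigma^{2})^{m/2}}
  \exp\!\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right),
\qquad
\hat{k}(x) = \arg\max_{k} \; \frac{N_k}{N} \, p(x \mid C_k)
```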
18
Connections with other models
  • The Parzen window approach (when the window
    function takes the normal form) is a special case
    of the NHDC.
  • In our model, let K = n-1; then the graph
    constructed in Step 1 will be a complete graph.
    The matrix H will be

Heat that x_p receives from the data points in
class k
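
The claimed equivalence can be checked numerically in a small sketch. It assumes pipe conductance exp(-d²/β), a prior-weighted Parzen decision rule, and the identification β = 2σ²; the function names and the toy data are hypothetical.

```python
import math

def parzen_scores(train, x, sigma):
    """Prior-weighted Gaussian Parzen score (N_k / N) * p(x | C_k) per class."""
    n, m = len(train), len(x)
    norm = (2 * math.pi * sigma ** 2) ** (m / 2)
    scores = {}
    for point, label in train:
        g = math.exp(-math.dist(point, x) ** 2 / (2 * sigma ** 2)) / norm
        scores[label] = scores.get(label, 0.0) + g / n
    return scores

def nhdc_heat(train, x, k, beta):
    """One-step heat that x receives from each class among its k nearest neighbors."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    heat = {}
    for point, label in nearest:
        heat[label] = heat.get(label, 0.0) + math.exp(-math.dist(point, x) ** 2 / beta)
    return heat

train = [
    ((0.0, 0.0), "A"), ((0.5, 0.2), "A"),
    ((2.0, 2.0), "B"), ((2.5, 1.8), "B"), ((1.8, 2.4), "B"),
]
x = (0.9, 0.8)
sigma = 0.7

parzen = parzen_scores(train, x, sigma)
# Every training point is a neighbor of x: the complete-graph case.
heat = nhdc_heat(train, x, k=len(train), beta=2 * sigma ** 2)
```

The two score dictionaries differ only by a constant factor shared by all classes, so they always pick the same class.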
19
Connections with other models
  • KNN is a special case of the NHDC.
  • For each test point, assign it to the class with
    the most members among its K nearest neighbors.

20
Connections with other models
  • KNN is a special case of the NHDC.
  • In our model, let β tend to infinity; then the
    matrix H becomes

The number of cases of class q among its K
nearest neighbors.
Heat that x_p receives from the data points in
class k
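
The limit can be illustrated with a small sketch, assuming pipe conductance exp(-d²/β): as β grows, every neighbor contributes heat close to 1, so the heat per class approaches the KNN vote count. Function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def nhdc_heat(train, x, k, beta):
    """One-step heat that x receives from each class among its k nearest neighbors."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    heat = {}
    for point, label in nearest:
        heat[label] = heat.get(label, 0.0) + math.exp(-math.dist(point, x) ** 2 / beta)
    return heat

def knn_votes(train, x, k):
    """Number of points of each class among the k nearest neighbors of x."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    return Counter(label for _, label in nearest)

train = [((0.1, 0.0), "A"), ((1.0, 0.0), "B"), ((0.0, 1.0), "B"), ((5.0, 5.0), "A")]
x = (0.0, 0.0)

votes = knn_votes(train, x, k=3)
heat_small_beta = nhdc_heat(train, x, k=3, beta=0.1)   # distance matters a lot
heat_large_beta = nhdc_heat(train, x, k=3, beta=1e9)   # distance barely matters
```

With a small β the single very close point of class A dominates; with a large β the two more distant points of class B outvote it, exactly as KNN does.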
21
Connections with other models
  • PHDC can approximate NHDC.
  • If γ is small, then
  • Since the identity matrix has no effect on the
    heat distribution, PHDC and NHDC have similar
    classification accuracy when γ is small.
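
The approximation can be checked numerically, writing the small parameter as γ and assuming PHDC uses exp(γH) while NHDC uses the first-order form I + γH. The matrix H below is a hypothetical example with zero row sums, and the truncated Taylor series is a simple stand-in for a library matrix exponential.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm_taylor(A, terms=30):
    """Approximate exp(A) by the first `terms` Taylor terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def max_gap(H, gamma):
    """Largest entrywise difference between exp(gamma*H) and I + gamma*H."""
    n = len(H)
    gH = [[gamma * x for x in row] for row in H]
    phdc = expm_taylor(gH)
    nhdc = [[float(i == j) + gH[i][j] for j in range(n)] for i in range(n)]
    return max(abs(p - q) for pr, qr in zip(phdc, nhdc) for p, q in zip(pr, qr))

# Hypothetical diffusion matrix: nonnegative off-diagonals, rows sum to zero.
H = [
    [-0.6, 0.6, 0.0],
    [0.6, -0.8, 0.2],
    [0.0, 0.2, -0.2],
]
gap_coarse = max_gap(H, 0.1)
gap_fine = max_gap(H, 0.01)
```

The gap is of second order in γ, so shrinking γ by a factor of 10 shrinks the gap by roughly a factor of 100.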

22
Connections with other models
PHDC
NHDC
KNN
PWA
23
Experiments
  • 2 artificial data sets
  • Spiral-100
  • Spiral-1000
  • Compare with the Parzen window approach (PWA, with
    a normal window function), KNN and SVM.
  • Results are averages over ten-fold
    cross-validation.

24
Experiments
  • Results

Data set       NHDC    PHDC    KNN     PWA     SVM
Spiral-100     84      84      67      83      34
Spiral-1000    99.6    99.8    99.3    99.7    68.7
Credit-g       76.1    76.06   75.59   72.35   71.5
Diabetes       76.3    76.22   75.78   74.96   76.6
Glass          72.99   73.12   70.64   71.56   68.1
Iris           97.36   97.79   97.36   97.07   96
Sonar          88.75   89.07   82.86   88.28   84.8
Vehicle        72.90   72.93   71.41   72.45   88.5
25
Conclusions and future work
  • Avoid the difficulty of finding an explicit
    expression for the unknown geometry.
  • Avoid the difficulty of finding a closed-form heat
    kernel for some complicated geometries.
  • Both NHDC and PHDC achieve good accuracy.
  • There is room for further development.
  • The assumption in the local heat diffusion is not
    fully justified.
  • We currently use a directed graph. Converting it
    into an undirected graph may be more reasonable,
    because in reality heat diffuses symmetrically.
  • Apply it to SVM?