Transcript and Presenter's Notes

Title: Recursive Self-Organizing Networks


1
Recursive Self-Organizing Networks
  • Barbara Hammer,
  • AG LNM, Universität Osnabrück,
  • and Alessio Micheli,
  • Alessandro Sperduti, Marc Strickert

2
History
Symbolic systems
if red then apple, otherwise pear
if (on(apple1,pear) and free(apple2)) moveto(apple1,apple2)
Neural networks
finite-dimensional vectors
3
History
Recurrent networks for sequences
frec(x1,x2,x3,...) = f(x1, frec(x2,x3,...))
...
Recursive networks for trees
frec(a(t1,t2)) = f(a, frec(t1), frec(t2))
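A minimal sketch of the two recursion schemes above; the function names, the list encoding of sequences, and the (label, left, right) tuple encoding of binary trees are illustrative assumptions, not part of the slides:

  def f_rec_sequence(xs, f, init):
      # f_rec(x1, x2, x3, ...) = f(x1, f_rec(x2, x3, ...)); init encodes the empty sequence
      if not xs:
          return init
      return f(xs[0], f_rec_sequence(xs[1:], f, init))

  def f_rec_tree(tree, f, init):
      # f_rec(a(t1, t2)) = f(a, f_rec(t1), f_rec(t2)); a tree is (label, left, right) or None
      if tree is None:
          return init
      label, left, right = tree
      return f(label, f_rec_tree(left, f, init), f_rec_tree(right, f, init))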
4
History
  • well established models
  • Training gradient based learning, ...
  • Theory representation/approximation capability,
    learnability, complexity, ...
  • Applications for RNNs too many to be mentioned,
    for RecNNs
  • term classification Goller, Küchler, 1996
  • automated theorem proving Goller, 1997
  • learning tree automata Küchler, 1998
  • QSAR/QSPR problems Schmitt, Goller, 1998
    Bianucci, Micheli, Sperduti, Starita, 2000
    Vullo, Frasconi, 2003
  • logo and image recognition Costa, Frasconi,
    Soda, 1999
  • natural language parsing Costa, Frasconi, Sturt,
    Lombardo, Soda, 2000
  • document classification Diligenti, Frasconi,
    Gori, 2001
  • fingerprint classification Yao, Marcialis, Roli,
    Frasconi, Pontil, 2001
  • prediction of contact maps Baldi, Frasconi,
    Pollastri, Vullo, 2002

5
History
  • unsupervised methods
  • Visualize (noisy) fruits

representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) ∈ R^n
6
History
Unsupervised networks for data representation,
visualization, clustering, preprocessing, data
mining,
e.g. self-organizing map (SOM): given a lattice of neurons i = (i1, i2),
represent data via winner-takes-all classification f: R^n → I, x ↦ i where ||x - w_i||² is minimal
Hebbian learning based on examples x_i, topological mapping because of neighborhood cooperation:
w_j ← w_j + η · exp(-||j - j_w||² / σ²) · (x_i - w_j)
[Figure: input x is mapped to the winner i with ||x - w_i||² minimal; see the sketch below]
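A minimal sketch of one SOM step as described above, assuming a lattice array of neuron coordinates and a weight matrix; eta and sigma are illustrative parameter names:

  import numpy as np

  def som_step(x, weights, lattice, eta=0.1, sigma=1.0):
      # winner-takes-all: neuron i with ||x - w_i||^2 minimal
      dists = np.sum((weights - x) ** 2, axis=1)
      winner = int(np.argmin(dists))
      # neighborhood cooperation: exp(-||j - j_winner||^2 / sigma^2) on the lattice
      nhd = np.exp(-np.sum((lattice - lattice[winner]) ** 2, axis=1) / sigma ** 2)
      # Hebbian update: w_j += eta * nhd(j) * (x - w_j)
      weights += eta * nhd[:, None] * (x - weights)
      return winner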
7
History
8
History
  • Visualize (noisy) fruits

representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) ∈ R^n
... but what should we do with fruit salad?
9
Recursive self-organizing networks
  • Outline
  • Various approaches
  • A unifying framework
  • More approaches
  • Experiments
  • Theory
  • Conclusions

10
Various approaches
11
Various approaches
  • Standard SOM: lattice of neurons i = (i1, i2), input x mapped to the winner i with ||x - w_i||² minimal

sequence or structure?
flatten this one, or use an alternative metric
12
Various approaches
  • Temporal Kohonen Map (Chappell, Taylor)

lattice of neurons i = (i1, i2), standard Hebbian learning for w_i
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 - w_i||² minimal
i2: d2(i) = ||x2 - w_i||² + α·d1(i) minimal
i3: d3(i) = ||x3 - w_i||² + α·d2(i) minimal
leaky integration; also recurrent SOM (Varsta, Heikkonen, Milan)
recurrence (see the sketch below)
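A minimal sketch of the TKM recursive distance with leaky integration, d_t(i) = ||x_t - w_i||² + α·d_{t-1}(i); the names are illustrative assumptions:

  import numpy as np

  def tkm_winners(sequence, weights, alpha=0.5):
      d = np.zeros(len(weights))                      # d_0(i) = 0
      winners = []
      for x in sequence:
          d = np.sum((weights - x) ** 2, axis=1) + alpha * d
          winners.append(int(np.argmin(d)))           # winner i_t minimizes d_t(i)
      return winners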
13
Various approaches
  • Recursive SOM (Voegtlin)

Hebbian learning for w_i and c_i
lattice of neurons i = (i1, i2); neuron i stores (w_i, c_i) with c_i in R^N
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 - w_i||² minimal
i2: d2(i) = ||x2 - w_i||² + α·||(d1(1), ..., d1(N)) - c_i||² minimal
i3: d3(i) = ||x3 - w_i||² + α·||(d2(1), ..., d2(N)) - c_i||² minimal
... Voegtlin uses exp(-d_i(j)) ...
recurrence (see the sketch below)
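A minimal sketch of the RecSOM winner computation: each neuron stores (w_i, c_i) with c_i in R^N, and the context term compares c_i with the previous activation of the whole map; following Voegtlin, the activation is taken as exp(-d). Names are illustrative assumptions:

  import numpy as np

  def recsom_winners(sequence, weights, contexts, alpha=0.5):
      prev = np.zeros(len(weights))                   # previous map activation
      winners = []
      for x in sequence:
          d = (np.sum((weights - x) ** 2, axis=1)
               + alpha * np.sum((contexts - prev) ** 2, axis=1))
          winners.append(int(np.argmin(d)))
          prev = np.exp(-d)                           # rep(t) = (exp(-d(1)), ..., exp(-d(N)))
      return winners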
14
Various approaches
  • SOMSD (Hagenbuchner, Sperduti, Tsoi)

Hebbian learning for w_i, c_i1, and c_i2
lattice of neurons i = (i1, i2); neuron i stores (w_i, c_i1, c_i2) with c_i1, c_i2 in R^2
tree with root x1, children x2 and x3, and leaves x4, x5 below x3:
i2: ||x2 - w_i||², i4: ||x4 - w_i||², i5: ||x5 - w_i||² minimal
i3: ||x3 - w_i||² + α·||i4 - c_i1||² + α·||i5 - c_i2||² minimal
i1: ||x1 - w_i||² + α·||i2 - c_i1||² + α·||i3 - c_i2||² minimal
recurrence (see the sketch below)
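A minimal sketch of the SOMSD winner computation for binary trees: each neuron stores (w_i, c_i1, c_i2) where c_i1, c_i2 are lattice coordinates, and a subtree is represented by the lattice index of its winner. The (label, left, right) tree encoding and r_empty for missing children are illustrative assumptions:

  import numpy as np

  def somsd_rep(tree, weights, c1, c2, lattice, alpha=0.5, r_empty=None):
      # returns the lattice coordinates of the winner for 'tree'
      if tree is None:
          return np.zeros(lattice.shape[1]) if r_empty is None else r_empty
      label, left, right = tree
      r1 = somsd_rep(left, weights, c1, c2, lattice, alpha, r_empty)
      r2 = somsd_rep(right, weights, c1, c2, lattice, alpha, r_empty)
      d = (np.sum((weights - label) ** 2, axis=1)
           + alpha * np.sum((c1 - r1) ** 2, axis=1)
           + alpha * np.sum((c2 - r2) ** 2, axis=1))
      return lattice[int(np.argmin(d))]               # rep(t) = index of the winner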
15
Various approaches
  • Example: 42 → 33 → 33 → ...

[Figure: winner lattice coordinates (1,1), (1,3), (1,1), (2,3), (3,3), (3,1)]
16
A unifying framework
17
A unifying framework
[Figure: a tree a(t, t') with subtrees t and t' is compared with a neuron (w, r, r') via ||a - w||², ||rep(t) - r||², and ||rep(t') - r'||²]
18
A unifying framework
  • ingredients
  • binary trees with labels in (W, dW)
  • formal representation of trees (R, dR) = context!
  • neurons n with labeling (L0(n), L1(n), L2(n)) in W × R × R
  • a function rep: R^N → R
  • recursive distance of a(t1,t2) from n:
  • drec(a(t1,t2), n) = dW(a, L0(n)) + α·dR(R1, L1(n)) + α·dR(R2, L2(n))
  • where
  • Ri = r_∅ if ti = ∅, and
  • Ri = rep(drec(ti, n1), ..., drec(ti, nN)) otherwise
  • training (see the sketch below)
  • ΔL0(n) = η·nhd(n, n_winner)·(a - L0(n))
  • ΔL1(n) = η·nhd(n, n_winner)·(R1 - L1(n))
  • ΔL2(n) = η·nhd(n, n_winner)·(R2 - L2(n))
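A minimal sketch of the unified recursive distance and the Hebbian training step; rep, d_W, d_R, nhd, r_empty, eta and alpha are illustrative placeholders for the ingredients listed above:

  import numpy as np

  def d_rec(tree, L0, L1, L2, rep, d_W, d_R, nhd, alpha, r_empty, eta=0.0):
      # tree is (a, t1, t2) or None; returns the representation handed to the parent
      if tree is None:
          return r_empty                              # R_i = r_empty for the empty tree
      a, t1, t2 = tree
      R1 = d_rec(t1, L0, L1, L2, rep, d_W, d_R, nhd, alpha, r_empty, eta)
      R2 = d_rec(t2, L0, L1, L2, rep, d_W, d_R, nhd, alpha, r_empty, eta)
      d = np.array([d_W(a, L0[n]) + alpha * d_R(R1, L1[n]) + alpha * d_R(R2, L2[n])
                    for n in range(len(L0))])
      if eta > 0.0:                                   # training: Hebbian update at this node
          winner = int(np.argmin(d))
          for n in range(len(L0)):
              h = eta * nhd(n, winner)
              L0[n] += h * (a - L0[n])
              L1[n] += h * (R1 - L1[n])
              L2[n] += h * (R2 - L2[n])
      return rep(d)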

19
A unifying framework
SOMSD
context = index of the winner
neuron: (w, r, r') in W × R^d × R^d
distance: ||a - w||² + ||rep(t) - r||² + ||rep(t') - r'||²
rep(t) = index of the winner for t
rep(t') = index of the winner for t'
20
A unifying framework
RecSOM
context = the whole map's activation
neuron: (w, r) in W × R^N
distance: ||a - w||² + ||rep(t) - r||²
rep(t) = (exp(-drec(t,n1)), ..., exp(-drec(t,nN)))
21
A unifying framework
TKM
context = the neuron's own activation
neuron: (w, (0, ..., 1, ..., 0))
||a - w||²
drec(t, ni): (0, ..., 1, ..., 0)^T · (drec(t,n1), ..., drec(t,nN))
rep = id: (drec(t,n1), ..., drec(t,nN))
22
A unifying framework
recursive NN
context = activation of all neurons
neuron: (w, w1, w2)
activation: w·a + w1·rep(t) + w2·rep(t')
rep(t) = sgd(activation)
23
More approaches
24
More approaches
context model: winner content vs. structure
TKM: no explicit context, hence restricted
MSOM: winner content; the full (w, r) → dimensionality too high, so merge (1-γ)·w + γ·r (see the sketch below)
RecSOM: very high dimensional context
SOMSD: compressed information
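A minimal sketch of the MSOM context: instead of the full winner content, the context descriptor merges the previous winner's weight and context as (1-γ)·w + γ·c. Names (alpha, gamma) are illustrative assumptions:

  import numpy as np

  def msom_winners(sequence, weights, contexts, alpha=0.5, gamma=0.5):
      r = np.zeros(weights.shape[1])                  # initial context
      winners = []
      for x in sequence:
          d = (np.sum((weights - x) ** 2, axis=1)
               + alpha * np.sum((contexts - r) ** 2, axis=1))
          i = int(np.argmin(d))
          winners.append(i)
          r = (1 - gamma) * weights[i] + gamma * contexts[i]   # merge
      return winners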
25
More approaches
lattice variants: MNG, HSOMS
VQ: adapt only the winner
NG: adapt every neuron according to its rank, i.e. nhd(n, n_w) = h(rk(n, x)); no prior lattice! (see the sketch below)
HSOM: hyperbolic lattice structure
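A minimal sketch of the rank-based (neural gas) adaptation mentioned above, with nhd(n, n_w) = h(rk(n, x)) and no prior lattice; eta and lam are illustrative parameter names:

  import numpy as np

  def ng_step(x, weights, eta=0.1, lam=2.0):
      dists = np.sum((weights - x) ** 2, axis=1)
      ranks = np.argsort(np.argsort(dists))           # rk(n, x): 0 for the closest neuron
      h = np.exp(-ranks / lam)                        # nhd(n, n_w) = h(rk(n, x))
      weights += eta * h[:, None] * (x - weights)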
26
Experiments
27
Experiments
Mackey-Glass time series
temporal quantization error: take the winner for time t, compute the mean value of the input at each past step t-j over all patterns mapped to that winner, and compare it to the actual pattern (see the sketch below)
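A sketch of one possible reading of this error measure; for each lag j, the inputs x_{t-j} are averaged over all time steps t mapped to the same winner and compared with the actual values. All names are illustrative assumptions:

  import numpy as np

  def temporal_quantization_error(series, winners, max_lag=30):
      series = np.asarray(series, dtype=float)
      winners = np.asarray(winners)
      errors = []
      for j in range(max_lag + 1):
          ts = np.arange(j, len(series))              # time steps for which lag j exists
          means = {i: series[ts - j][winners[ts] == i].mean()
                   for i in np.unique(winners[ts])}
          pred = np.array([means[w] for w in winners[ts]])   # mean value for each t-j
          errors.append(float(np.mean((series[ts - j] - pred) ** 2)))
      return errors                                   # errors[j], index 0 = present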
28
Experiments
[Figure: temporal quantization error of SOMSD vs. index of past inputs (index 0 = present, up to 30 steps into the past); error axis from 0 to 0.2]
29
Experiments
[Figures: corresponding temporal quantization error curves for HSOMS and MNG]
30
Experiments
  • Reber grammar

reconstruct 3-gram probabilities from the neurons of an HSOMS by counting
31
Experiments
SOMSD and HSOMS
[Figure: maps labeled with the Reber grammar symbols B, E, P, S, T, V, X]
32
Theory
33
Theory
  • Cost function of training?
  • (approximate) SOM, VQ, NG training is a stochastic gradient descent on (some f)

  • Σ_i f(||x_i - w_1||², ..., ||x_i - w_N||²)
  • recursive (approximate) SOM, VQ, NG is in general no stochastic gradient descent, but it can be interpreted as a truncated stochastic gradient descent on

  • Σ_i f(drec(t_i, n_1), ..., drec(t_i, n_N))
  • (the same f) if dW, dR is the squared Euclidean distance (see the sketch below)
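A minimal sketch of the non-recursive cost, assuming f is taken as the minimum (the standard vector quantization error E = Σ_i min_j ||x_i - w_j||²); for SOM/NG, f would additionally weight the neurons by neighborhood or rank:

  import numpy as np

  def vq_cost(X, weights):
      d = np.sum((X[:, None, :] - weights[None, :, :]) ** 2, axis=2)   # ||x_i - w_j||^2
      return float(np.sum(np.min(d, axis=1)))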

34
Theory
[Figure: a neuron (w, r, r') compares the input label a and the subtree representations via ||a - w||², ||R(t) - r||², and ||R(t') - r'||²]
35
Theory
  • Representation of data?
  • SOMSD, RecSOM: given a finite set and enough neurons, every sequence/tree can be represented by a neuron,
  • context = location in the lattice
  • TKM, MSOM:
  • trees cannot be represented (commutativity),
  • the range of representation is restricted by the weight space,
  • optimum codes for a sequence (a0, a1, a2, ...) are
  • TKM: w = Σ_t α^t·a_t / Σ_t α^t   and
  • MSOM: w = a_0, c = Σ_t γ^(t-1)·a_t / Σ_t γ^(t-1)
  • context = fractal encoding
  • for MSOM this is a fixed point; uses the additional parameter γ
36
Theory
  • Topology?

induces d1(t, t') = distance of their winner indices and d2(t, t') = distance of their representations
[Figure: correspondence between trees and triples (w, c1, c2)]
37
Theory
  • Explicit metric for SOMSD
  • Assume granularity: the triples (w, c1, c2) form an ε-cover of the space w.r.t. dR, dW,
  • topological matching: for all neurons i1, i2 with weights (w1, c11, c12), (w2, c21, c22) it holds that |dR(i1, i2) - (α·dW(w1, w2) + β·dR(c11, c21) + β·dR(c12, c22))| < ε
  • then
  • |d1(t, t') - Drec(t, t')| ≤ (ε + ε·(α+β)·const) · (1 - (2β)^(H+1)) / (1 - 2β), H = height
  • where
  • Drec(t, ∅) = Drec(∅, t) = dR(winner(t), r_∅)
  • Drec(x(t1,t2), x'(t1',t2')) = α·dW(x, x') + β·Drec(t1, t1') + β·Drec(t2, t2')
Markovian
38
Conclusions
39
Conclusions
  • recursive self-organizing models expand supervised RecNNs; they differ with respect to the choice of context (= activation for RecNNs)
  • context, e.g.
  • neuron, map activation, winner index, winner content
  • complexity
  • lattice model
  • way of representation
  • models make sense, possibly (locally) Markovian
  • ... but many topics of ongoing research ...

40
(No Transcript)
41
(No Transcript)
42
Various approaches
  • SOMSD

SOMSD (sequences)
standard SOM
adaptation
43
Experiments
physiological data (heart rate, chest volume,
blood oxygen concentration, preprocessed), 617
neurons
44
(No Transcript)
45
Examples
[Figure: two-state generator over the symbols -1 and 1 with transition probabilities 0.4, 0.6, 0.7, 0.3]
P(-1) = 4/7
P(1) = 3/7
generator for words with two discrete states
specialization = unambiguous temporal context
46
Experiments
[Figure: the 100 most probable binary words (all nodes except the root node) and the MNG receptive fields for 100 neurons (bullets indicate specialized neurons), over the symbols 1 and -1]
47
Experiments
reconstruction by counting for HSOMS
48
Experiments
probabilistic 2-gram model, symbols a, b, c, subject to noise
SOMSD with 100 neurons; probability extraction by counting symbols?
49
Experiments
U-matrix on the weights; U-value = mean distance from neighbors; valleys = symbols
50
(No Transcript)
51
Experiments
U-matrix, context U-matrix, and mean previous
contexts
52
Speaker Identification for a Japanese Vowel
Three exemplary patterns of Æ articulations
from different speakers
53
  • More info on the data
  • 9 different speakers.
  • Training set: 30 articulations from each speaker.
  • Test set: varying number of articulations.
  • available from UCI Knowledge Discovery in Databases at http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html.

Aim: unsupervised speaker identification.
Class assignment to neurons: a posteriori, using the training set. The speaker for which a neuron is most active is taken as its target class.
54
Linear discriminant analysis (LDA) for MNG
24D → 2D projection of the 2×12D weight-context tuples of MNG neurons
All 150 neurons are displayed with a posteriori labels (colors).
  • Neurons separate well, already in the crude (!) 2D projection.
  • Neurons specialize on speakers.

55
Speaker Identification Results
Classification errors:
Training set: 2.96% (all 150 neurons used)
Test set: 4.86% (1 idle neuron, never selected)
Reference error: 5.9% (supervised, rule-based, Kudo et al. 1999)