Recent advances in LVQ
1
Recent advances in LVQ
  • Clausthal University of Technology
  • Institute of Informatics
  • Address:
  • Julius-Albert-Str. 4
  • 38678 Clausthal-Zellerfeld
  • Germany
  • Tel: 0049 5323 / 72-7100
  • Email: info_at_in.tu-clausthal.de

Barbara Hammer, Institute of Informatics, hammer_at_in.tu-clausthal.de
2
  • Introduction
  • AI and ML
  • LVQ
  • Mathematical background
  • Formal analysis
  • Foundation by means of a cost function
  • Metric adaptation
  • Relevance learning
  • Matrix LVQ
  • General metric

3
Introduction
4
AI and ML
5
AI and ML
6
AI and ML
Challenges
Evolution of Solutions
Machine Learning
Statistical ML / Pattern Recognition
Unsupervised
Supervised
7
LVQ
8
Prototype-based methods
  • LVQ - network
  • solution represented by prototypes within the
    data space
  • classification given by the receptive fields

9
Prototype-based methods
  • LVQ I training
  • init randomly
  • repeat
  • present training data
  • determine closest prototype
  • move it towards/away from the data depending on
    the class

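The LVQ1 loop above can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's code; the function name `lvq1_train` and all parameter defaults are assumptions:

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, eta=0.05, epochs=10, seed=0):
    """Minimal LVQ1 sketch: present each sample, find the closest
    prototype, and move it toward the sample if the classes agree,
    away from it otherwise."""
    rng = np.random.default_rng(seed)
    W = prototypes.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = np.sum((W - X[i]) ** 2, axis=1)   # squared Euclidean distances
            j = np.argmin(d)                      # winner prototype
            sign = 1.0 if proto_labels[j] == y[i] else -1.0
            W[j] += sign * eta * (X[i] - W[j])
    return W
```

On two well-separated clusters, each prototype drifts toward the cluster carrying its label.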
10
Prototype-based methods
  • LVQ 2.1 training
  • init randomly
  • repeat
  • present training data
  • determine closest correct/wrong prototype
  • move it towards/away from the data

11
LVQ
  • LVQ1
  • adapt the closest prototype wj by ±η(xi - wj),
    depending on the class
  • LVQ 2.1
  • adapt the closest correct prototype w+ by η(xi - w+)
  • and the closest wrong prototype w- by -η(xi - w-)
  • possibly restrict the adaptation to a window
    around the decision boundary
  • Learning from mistakes (LFM)
  • perform the LVQ2.1 update only if the data point
    is misclassified
  • and further variants

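One LVQ2.1 step, including the optional window rule, can be sketched as follows. The window test `min(d+/d-, d-/d+) > (1-w)/(1+w)` follows Kohonen's formulation; names and defaults are illustrative assumptions:

```python
import numpy as np

def lvq21_step(x, y, W, labels, eta=0.05, window=0.3):
    """One LVQ2.1 update (sketch): adapt the closest correct and the
    closest wrong prototype, but only if x falls into a window around
    the current decision boundary."""
    d = np.sum((W - x) ** 2, axis=1)
    correct = np.where(labels == y)[0]
    wrong = np.where(labels != y)[0]
    jp = correct[np.argmin(d[correct])]   # closest correct prototype w+
    jm = wrong[np.argmin(d[wrong])]       # closest wrong prototype w-
    dp, dm = d[jp], d[jm]
    s = (1 - window) / (1 + window)
    if min(dp / dm, dm / dp) > s:         # window rule
        W[jp] += eta * (x - W[jp])        # pull correct prototype toward x
        W[jm] -= eta * (x - W[jm])        # push wrong prototype away from x
    return W
```

Points far from the boundary fail the window test and cause no update, which is what stabilizes LVQ2.1 in practice.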
12
Mathematical background
13
Theory of online learning
14
Theory of online learning
  • The theory of online learning uses techniques from
    theoretical physics for
  • an exact investigation of the learning behavior
  • in terms of a few characteristic quantities
  • for typical model situations
  • in the limit of infinite data dimension N → ∞
    (because there the theory becomes tractable)

(priors p+ > p-)
15
Theory of online learning
  • Model situation
  • mixture of two N-dimensional Gaussians
  • data i.i.d.
  • orthonormal centers
  • priors p+ and p-
  • two prototypes

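The model situation can be sketched as a small data generator. All names are assumptions, and the first two unit vectors stand in for the orthonormal center directions:

```python
import numpy as np

def sample_model(P, N, p_plus=0.5, ell=1.0, var=1.0, seed=0):
    """Sketch of the model situation: mixture of two spherical Gaussians
    in N dimensions with orthonormal center directions (here the first
    two unit vectors), scaled by ell; class priors p_plus and 1 - p_plus."""
    rng = np.random.default_rng(seed)
    B = np.zeros((2, N))
    B[0, 0] = 1.0                       # center direction of class +1
    B[1, 1] = 1.0                       # center direction of class -1
    sigma = np.array([1, -1])[(rng.random(P) >= p_plus).astype(int)]
    centers = np.where(sigma[:, None] == 1, ell * B[0], ell * B[1])
    X = centers + np.sqrt(var) * rng.standard_normal((P, N))
    return X, sigma
```

The class-conditional means then sit at ell times the respective unit vector, which is easy to verify empirically.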
16
Theory of online learning
  • Strategy
  • describe the update rules in terms of a few
    characteristic quantities (here: projections of
    the prototypes onto the relevant two dimensions,
    and their correlations), such that the random
    data point occurs only within dot products
  • average over the data points; for N → ∞ the sums
    are completely characterized by mean and variance
  • self-averaging of the characteristic quantities:
    their variance vanishes for N → ∞
  • choose the learning rate as η/N with continuous
    learning time α; the dynamics are then described
    by deterministic ODEs
  • express the generalization error in terms of the
    characteristic quantities

17
Theory of online learning
  • LVQ1

18
Theory of online learning
learning curve (p = 0.2, ℓ = 1.2)
19
Theory of online learning
  • LVQ2.1

(priors p+ > p-)
20
Theory of online learning
LFM (learning from mistakes)
21
Theory of online learning
  • Comparison

Equal variance
Unequal variance
Dotted: optimal linear decision boundary. Dashed:
LVQ2.1 with idealized stopping. Solid: LVQ1.
Chain: LFM.
22
Foundation by means of a cost function
23
Cost function
  • function class F given by the possible LVQ networks
  • training data (xi, yi) → machine learner →
    LVQ function f in F
  • often f(xi) = yi for training points (i.e. small
    empirical error)
  • desired: P(f(x) = y) should be large (i.e. small
    real error)

24
Cost function
safe classification
insecure classification
  • (hypothesis) margin of xi: m(xi) = d- - d+, where
    d+ / d- is the squared distance to the closest
    correct / wrong prototype
  • mathematics: the generalization error is bounded by
  • E/m + O( p² B³ (ln 1/δ)^(1/2) / (ρ m^(1/2)) )
  • where E = number of misclassified training
    data with margin smaller than ρ (including
    errors)
  • δ = confidence
  • m = number of examples, B = bound on the
    support, p = number of prototypes

first term: data with (too) small margin
second term scales inversely with the margin ρ

the bound does not include the dimensionality;
good bounds for few training errors and a large
margin
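The hypothesis margin defined above is straightforward to compute directly; a minimal sketch (function name assumed):

```python
import numpy as np

def hypothesis_margin(x, y, W, labels):
    """Hypothesis margin m(x) = d- - d+ (sketch): the squared distance
    to the closest wrong prototype minus the squared distance to the
    closest correct one.  Positive margin means x is classified correctly."""
    d = np.sum((W - x) ** 2, axis=1)
    d_plus = d[labels == y].min()    # closest correct prototype
    d_minus = d[labels != y].min()   # closest wrong prototype
    return d_minus - d_plus
```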
25
Cost function
  • mathematical objective

maximize margin
26
Cost function
  • mathematical objective

maximize Σi (d-(xi) - d+(xi))
27
Cost function
  • mathematical objective

unbounded
minimize Σi (d+(xi) - d-(xi))
28
Cost function
  • mathematical objective

minimize Σi (d+(xi) - d-(xi)) / (d+(xi) + d-(xi))
29
Cost function
  • mathematical objective: min Σi (d+(xi) -
    d-(xi)) / (d+(xi) + d-(xi))

taking derivatives yields LVQ2.1-like updates with
additional scaling factors
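Differentiating the cost Σi (d+ - d-)/(d+ + d-) with respect to the two winning prototypes gives LVQ2.1-like updates with scaling factors 2d∓/(d+ + d-)². A sketch of one stochastic gradient step (names and the learning-rate default are assumptions):

```python
import numpy as np

def glvq_step(x, y, W, labels, eta=0.1):
    """One stochastic gradient step on the cost
    mu(x) = (d+ - d-) / (d+ + d-) (sketch).  The factors in front of
    the LVQ2.1-like moves are the derivatives of mu w.r.t. d+ and d-."""
    d = np.sum((W - x) ** 2, axis=1)
    jp = np.where(labels == y)[0][np.argmin(d[labels == y])]
    jm = np.where(labels != y)[0][np.argmin(d[labels != y])]
    dp, dm = d[jp], d[jm]
    denom = (dp + dm) ** 2
    W[jp] += eta * (2 * dm / denom) * (x - W[jp])   # pull correct prototype
    W[jm] -= eta * (2 * dp / denom) * (x - W[jm])   # push wrong prototype
    return W
```

Unlike plain LVQ2.1, the scaling factors shrink for points far from both prototypes, which keeps the cost bounded.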
30
Metric adaptation
31
Relevance learning
32
Relevance learning
  • mathematical objective

the Euclidean metric is sensitive to noise and scaling:
minimize Σi (d+(xi) - d-(xi)) / (d+(xi) + d-(xi))
33
Relevance learning
  • mathematical objective

minimize Σi (dλ+(xi) - dλ-(xi)) / (dλ+(xi) + dλ-(xi))
where dλ(x,y) = Σl λl (xl - yl)²
relevance learning
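A minimal sketch of the weighted metric dλ and the usual post-step normalization of the relevance vector λ (non-negative, summing to one); function names are illustrative assumptions:

```python
import numpy as np

def relevance_distance(x, w, lam):
    """Weighted squared distance d_lambda(x, w) = sum_l lambda_l (x_l - w_l)^2
    (sketch of the relevance-learning metric)."""
    return float(np.sum(lam * (x - w) ** 2))

def normalize_relevances(lam):
    """Clip negative relevances to zero and renormalize to sum 1, as is
    commonly done after each gradient step on lambda."""
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()
```

Dimensions whose relevance λl is driven to zero are effectively pruned from the classifier, which is what exposes the biomarkers in the proteomics application below.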
34
Relevance learning
  • mathematical objective: min Σi (dλ+(xi) -
    dλ-(xi)) / (dλ+(xi) + dλ-(xi))

taking derivatives yields the scaled LVQ2.1 update
plus a relevance update for λ
intuitive, fast, well-founded, flexible, suited
for large dimensions
35
Relevance learning
noise: 1N(0.05), 1N(0.1), 1N(0.2), 1N(0.5), U(0.5),
U(0.2), N(0.5), N(0.2)
36
Application clinical proteomics
Relevance learning
unhappy because possibly ill ...
take serum
put it into a mass spectrometer
observe a characteristic spectrum, which tells us
more about the molecules in the serum
37
Relevance learning
  • prostate cancer: National Cancer Institute,
    Prostate Cancer Dataset, www.cancer.gov, 2004
  • 318 examples, SELDI-TOF from blood serum, 130 dim
    after preprocessing (normalization, peak
    detection)
  • 2 classes (healthy versus cancer in different
    states)

potential biomarkers
38
Matrix learning
39
Matrix learning
GMLVQ can be applied locally (one matrix per
prototype) or globally (one matrix)
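The matrix distance underlying GMLVQ, d_Λ(x,w) = (x-w)ᵀ Λ (x-w) with Λ = Ωᵀ Ω, can be sketched as follows (function name assumed; the parametrization via Ω keeps Λ positive semi-definite):

```python
import numpy as np

def gmlvq_distance(x, w, Omega):
    """GMLVQ distance d_Lambda(x, w) = (x - w)^T Lambda (x - w) with
    Lambda = Omega^T Omega (sketch).  Omega is the adaptive matrix,
    shared globally or kept per prototype."""
    diff = Omega @ (x - w)       # (x - w)^T Omega^T Omega (x - w)
    return float(diff @ diff)
```

With Omega set to the identity this reduces to the squared Euclidean distance; a rank-deficient Omega projects irrelevant directions away entirely.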
40
Matrix learning
41
Matrix learning
42
General metrics
43
General metrics
  • mathematical objective

minimize Σi (d+(xi) - d-(xi)) / (d+(xi) + d-(xi))
where d(x,y) can be an arbitrary differentiable
dissimilarity
44
General metrics
  • Online-detection of faults for piston-engines

45
General metrics
  • Detection based on heterogeneous data

time-dependent signals from sensors measuring
pressure and oscillation, process characteristics,
characteristics of the pV diagram, ...
sensors
46
General metrics
  • Data
  • ca. 30 time series with 36 entries per series
  • ca. 20 values from a time interval
  • ca. 40 global features
  • ca. 15 classes, ca. 100 training patterns

similarity measure
47
General metrics
  • Splicing for higher eukaryotes

copy of DNA
branch site
donor site consensus: A64 G73 G100 T100 G62 A68 G84 T63
acceptor site consensus: C65 A100 G100
reading frames
18-40 bp pyrimidines, i.e. T, C
donor
acceptor
  • ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC

no
yes
48
General metrics
  • IPsplice (UCI): human DNA, 3 classes, ca. 3200
    points, window size 60, old
  • C.elegans (Sonnenburg et al.): only
    acceptor/decoys, 1000/10000 training examples,
    10000 test examples, window size 50, decoys are
    close to acceptors
  • GRLVQ with few (8 resp. 5 per class) prototypes
  • LIK-similarity

local correlations
49
General metrics
  • IPsplice

50
General metrics
  • C.elegans

SVM with a competitive kernel
... GRLVQ yields sparser solutions, is orders of
magnitude faster, and gives intuitive results
51
Handwritten digit recognition
General metrics
52
General metrics
  • USPS data set 9298 patterns, 256 dimensions
  • GRLVQ with correlation measure, 20 prototypes per
    class

53