DATA-MINING - PowerPoint PPT Presentation


About This Presentation



DATA-MINING Artificial Neural Networks Alexey Minin, Jass 2006 – PowerPoint PPT presentation

Slides: 21
Provided by: alexey


Transcript and Presenter's Notes


  • Artificial Neural Networks

Alexey Minin, Jass 2006
Teaching without a tutor (unsupervised learning): introduction
The ANN forms its output itself, according to the information presented at the input.
We first have to find a suitable functional, and then we have to minimize it.
This is the main task, and this functional determines how the input vector will be transformed.
In practice, adaptive networks code the input information in the most compact way,
subject to some predefined requirements.
Teaching without a tutor: redundancy of data
  • The length of the data description

Dimension of data: the number of components of the input vector
Capacity of data: the number of bits defining the possible variety of all values
Two ways of coding (reducing) the information:
Reducing the dimension of the data with minimal loss of information (finding independent features)
Reducing the variety of the data (clustering and quantization)
Two ways to reduce the data
Reducing the dimension allows us to describe the data with fewer components.
Clustering allows us to reduce the variety of the data, reducing the number of bits
we need to describe the data.
We can combine both types of algorithms. We can use Kohonen maps, in which the
prototypes are arranged in a space of low dimension. For example, the input data
can be mapped onto a 2-dimensional grid of prototypes; this way you can visualize
the data you have.
Main idea: the neuron as an indicator
The neuron has one output and is trained on d-dimensional data.
Let us say that the activation function is linear. The output is therefore a
linear combination of its inputs: y = sum_j w_j x_j.
After the training is finished, the output amplitude can serve as an indicator
for the data, showing whether the data correspond to the training patterns or not.
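A minimal sketch of the neuron-indicator idea, assuming a linear neuron y = sum_j w_j x_j. The weight values and the two test inputs are hypothetical and chosen only for illustration: an input that lies along the learned direction gives a large amplitude, an input orthogonal to it gives an amplitude near zero.

```python
def neuron_output(weights, x):
    """Output of a linear neuron: y = sum_j w_j * x_j."""
    return sum(w * xj for w, xj in zip(weights, x))

# Hypothetical trained weights pointing roughly along the direction (1, 1).
weights = [0.7071, 0.7071]

typical = [2.0, 2.0]     # lies along the learned direction
atypical = [2.0, -2.0]   # orthogonal to the learned direction

print(abs(neuron_output(weights, typical)))   # large amplitude
print(abs(neuron_output(weights, atypical)))  # amplitude close to zero
```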
Hebb training algorithm
According to Hebb, the change of a weight is proportional to the product of the
input and the output: delta w_j = eta * y * x_j, where eta is the training rate.
If we reformulate the task as an optimization task, we get the property of such
a neuron and the functional we have to minimize: E = -<y^2>/2.
NB! If we want to reach the minimum of E, the output amplitude will go to infinity.
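The unlimited growth can be seen in a small experiment. The sketch below assumes the usual form of the Hebb rule, delta w_j = eta * y * x_j with y = w . x; the 2-D data set, the training rate and the starting weights are hypothetical.

```python
import math
import random

def make_data(n, seed=0):
    """Hypothetical 2-D data stretched along the 30-degree direction."""
    rng = random.Random(seed)
    c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
    data = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)   # large spread along the main direction
        u = rng.gauss(0.0, 0.1)   # small spread across it
        data.append((t * c - u * s, t * s + u * c))
    return data

def hebb_train(data, w, eta=0.01, epochs=2):
    """Plain Hebb rule: delta w_j = eta * y * x_j, with y = w . x."""
    w = list(w)
    for _ in range(epochs):
        for x in data:
            y = w[0] * x[0] + w[1] * x[1]
            w[0] += eta * y * x[0]
            w[1] += eta * y * x[1]
    return w

w_start = [0.1, 0.1]
w_hebb = hebb_train(make_data(500), w_start)
norm_start = math.hypot(*w_start)
norm_hebb = math.hypot(*w_hebb)
print(norm_start, norm_hebb)  # the norm never shrinks and keeps growing with more epochs
```

Each update multiplies w by (I + eta * x x^T), so the norm can only grow; nothing in the rule limits it.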
Oja training rule
A decay term was added to stop the unlimited growth of the weights:
delta w_j = eta * y * (x_j - y * w_j).
The Oja rule maximizes the sensitivity of the output neuron at a limited amplitude
of the weights. It is easy to verify this by setting the average change of the
weights to zero and then multiplying the right-hand side of the equality by w.
We find that at equilibrium <y^2> (1 - |w|^2) = 0, i.e. |w| = 1.
Thus, the weights of the trained neuron are located on a hypersphere.
Under Oja training, the weight vector settles on the hypersphere, in the
direction that maximizes the projection of the input vectors.
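A minimal sketch of this behaviour, assuming the usual single-neuron form of the Oja rule, delta w_j = eta * y * (x_j - y * w_j). The 2-D data set (stretched along the 30-degree direction), the training rate and the starting weights are hypothetical. After training, the weight vector should have a norm close to 1 and point along the direction of largest spread.

```python
import math
import random

def make_data(n, seed=0):
    """Hypothetical 2-D data stretched along the 30-degree direction."""
    rng = random.Random(seed)
    c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
    data = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)   # large spread along the main direction
        u = rng.gauss(0.0, 0.1)   # small spread across it
        data.append((t * c - u * s, t * s + u * c))
    return data

def oja_train(data, w, eta=0.01, epochs=20):
    """Oja rule: delta w_j = eta * y * (x_j - y * w_j), with y = w . x."""
    w = list(w)
    for _ in range(epochs):
        for x in data:
            y = w[0] * x[0] + w[1] * x[1]
            w[0] += eta * y * (x[0] - y * w[0])
            w[1] += eta * y * (x[1] - y * w[1])
    return w

direction = (math.cos(math.radians(30)), math.sin(math.radians(30)))
w_oja = oja_train(make_data(500), [0.1, 0.1])
norm_oja = math.hypot(*w_oja)
# Cosine of the angle between w and the main direction (the sign is arbitrary).
alignment = abs(w_oja[0] * direction[0] + w_oja[1] * direction[1]) / norm_oja
print(norm_oja, alignment)
```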
Oja training rule
SUMMARY: The neuron is trying to reproduce the values of its inputs from its
known output. This means that it is trying to maximize the sensitivity of its
output (the neuron-indicator) to the multidimensional input information,
compressing the data in this way.
NB! The output of the Oja output layer is a linear combination of the main
(principal) components. If you want to obtain the main components themselves,
you should change the sum over all outputs in the training rule.
The analysis of main components
Let us say that we have d-dimensional data and
we are training m linear neurons.

We want the amplitudes of all output neurons to be independent indicators,
fully reflecting the information about the multidimensional data we have.
The requirement:
  • Neurons must interact somehow (if we train
    them independently, we will get the same
    result for all of them)

In the simple case
Let us take a perceptron with a linear hidden layer, in which the number of
inputs equals the number of outputs, and the weights with the same indexes in
both layers are the same. Let us try to teach the ANN to reproduce the input
at the output.
The resulting training rule looks like the Oja training rule!
Self-training layer
In our formulation, a separate neuron is trained to reproduce its inputs from
its output. Generalizing this observation, it is logical to suggest a rule
according to which the values of the inputs are restored from the whole output
information. In this way we get the Oja training rule for a one-layer network.
The hidden layer of such an ANN, like the Oja layer, performs optimal coding of
the input data and retains the maximum variety of the data under the existing
restrictions.
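A minimal sketch of such a layer, assuming the commonly used form of the one-layer rule (often called the Oja subspace rule): delta w_ij = eta * y_i * (x_j - sum_k y_k w_kj), where the sum over k restores the input from all outputs. The 3-D data set, the training rate and the random initial weights are hypothetical. With m = 2 neurons, the rows of the weight matrix should become approximately orthonormal and should ignore the third, low-variance coordinate.

```python
import random

def make_data(n, seed=1):
    """Hypothetical 3-D data: large spread in x0 and x1, small spread in x2."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1.0), rng.gauss(0, 0.7), rng.gauss(0, 0.1))
            for _ in range(n)]

def oja_layer_train(data, m, eta=0.01, epochs=30, seed=2):
    """One-layer rule: delta w_ij = eta * y_i * (x_j - sum_k y_k * w_kj)."""
    rng = random.Random(seed)
    d = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(m)]
    for _ in range(epochs):
        for x in data:
            y = [sum(W[i][j] * x[j] for j in range(d)) for i in range(m)]
            # Input restored from the whole output information.
            x_rec = [sum(y[k] * W[k][j] for k in range(m)) for j in range(d)]
            for i in range(m):
                for j in range(d):
                    W[i][j] += eta * y[i] * (x[j] - x_rec[j])
    return W

W = oja_layer_train(make_data(500), m=2)
# Gram matrix W W^T: close to the identity if the rows are orthonormal.
gram = [[sum(W[i][j] * W[k][j] for j in range(3)) for k in range(2)]
        for i in range(2)]
print(gram)
print([row[2] for row in W])  # weights on the low-variance coordinate
```

Note that the rows span the same subspace as the first two main components but need not coincide with them individually.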
Let us change the activation function to a sigmoid in the training rule.
This brings a new property (Oja et al., 1991). Such an algorithm, in
particular, was used for the separation of signals mixed in an unknown way
(i.e. blind signal separation). For example, we face this task when we want
to separate a human voice from noise.
Competition of neurons: the winner takes all
Basic algorithm: the training rule of the competitive layer remains unchanged,
but it is applied only to the weights of the winning neuron.
The winner
The winner is the neuron that has the maximum response.
Training of winner
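A minimal sketch of winner-takes-all training. Two assumptions are made here: the winner is found as the prototype closest to the input (for normalized weight vectors this coincides with the maximum response), and the winner is trained with the standard competitive rule delta w_win = eta * (x - w_win). The two-cluster data set and the starting prototypes are hypothetical.

```python
import random

def nearest(prototypes, x):
    """Index of the winner: the prototype closest to x."""
    dists = [sum((p - xi) ** 2 for p, xi in zip(proto, x))
             for proto in prototypes]
    return dists.index(min(dists))

def wta_train(data, prototypes, eta=0.05, epochs=10):
    """Only the winner is moved: delta w_win = eta * (x - w_win)."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in data:
            win = nearest(protos, x)
            for j in range(len(x)):
                protos[win][j] += eta * (x[j] - protos[win][j])
    return protos

rng = random.Random(0)
centers = [(0.0, 0.0), (1.0, 1.0)]
# Hypothetical data: 200 noisy points around each of the two centers.
data = [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
        for _ in range(200) for cx, cy in centers]
protos = wta_train(data, [(0.4, 0.5), (0.6, 0.5)])
print(protos)  # each prototype ends up near one cluster center
```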
The winner takes not all
One variant of modifying the basic training rule of a competitive layer
consists in training not only the winning neuron, but also its "neighbors",
though at a lower rate. This approach, "pulling up" the neurons nearest to
the winner, is applied in topographic Kohonen maps.
The neighborhood function is equal to one for the winning neuron and
gradually falls off with distance from it.
Training a Kohonen map resembles stretching an elastic grid of prototypes
over the training data set.
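A minimal sketch of Kohonen training on a 1-D chain of prototypes. The Gaussian shape of the neighborhood function, the shrinking schedule for its width, the training rate and the uniform 1-D data are all assumptions made for illustration; the slides only require that the function equals one for the winner and decreases with distance from it.

```python
import math
import random

def neighborhood(i, win, sigma):
    """Gaussian neighborhood: 1 for the winner, decaying with grid distance."""
    return math.exp(-((i - win) ** 2) / (2.0 * sigma ** 2))

def som_train(data, n_units, eta=0.2, sigma=2.0, epochs=20, seed=0):
    """Train a 1-D chain of scalar prototypes on scalar data."""
    rng = random.Random(seed)
    protos = [rng.random() for _ in range(n_units)]
    for epoch in range(epochs):
        # Shrink the neighborhood over time (a common heuristic, assumed here).
        s = max(0.5, sigma * (1.0 - epoch / epochs))
        for x in data:
            win = min(range(n_units), key=lambda i: abs(protos[i] - x))
            # The winner and, more weakly, its neighbors are pulled towards x.
            for i in range(n_units):
                protos[i] += eta * neighborhood(i, win, s) * (x - protos[i])
    return protos

rng = random.Random(1)
data = [rng.random() for _ in range(300)]
protos = som_train(data, n_units=10)
print([round(p, 2) for p in protos])
```

Because neighbors on the chain are pulled together, neighboring prototypes tend to end up with neighboring values, which is what makes the grid usable for visualization.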
Methodology of self-organizing maps
Schematic representation of a self-organizing map
Neurons in the output layer are ordered and correspond to the cells of a
two-dimensional map, which can be colored according to the similarity of
attributes.
Visualization: a topographic map induced by the i-th component of the input data
A convenient tool for data visualization is coloring topographic maps,
similar to how it is done on ordinary geographical maps.
Each attribute of the data generates its own coloring of the map cells,
according to the average value of this attribute over the data that fall
into a given cell.
Having collected together the maps of all the attributes of interest, we
obtain a topographic atlas, giving an integrated representation of the
structure of the multivariate data.
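The coloring described above can be sketched as follows. The function name `component_plane`, the sample values and the cell assignments are hypothetical; in practice the cell of each sample would be the index of its winning neuron on the trained map.

```python
def component_plane(samples, cells, n_cells, attribute):
    """Average value of one attribute over the samples falling into each cell."""
    sums = [0.0] * n_cells
    counts = [0] * n_cells
    for x, cell in zip(samples, cells):
        sums[cell] += x[attribute]
        counts[cell] += 1
    # Empty cells get None so they can be left uncolored.
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical data: four samples with two attributes, assigned to map cells.
samples = [(1.0, 10.0), (3.0, 20.0), (5.0, 30.0), (7.0, 40.0)]
cells = [0, 0, 2, 2]

# The "atlas": one map per attribute.
atlas = [component_plane(samples, cells, 3, a) for a in range(2)]
print(atlas)  # [[2.0, None, 6.0], [15.0, None, 35.0]]
```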
Methodology of self-organizing maps
Classified SOM for the NASDAQ100 index for the
period from 10-Nov-1997 to 27-Aug-2001
Complexity of the algorithm
When is it better to use dimension reduction, and when quantization of the
input information?
Number of training patterns
Number of synaptic weights of a 1-layer ANN with d inputs and m output neurons: d * m
Reducing the dimension: number of operations, compression coefficient (b is the capacity of the data)
Quantization: number of operations, compression coefficient
Comparison at the same compression coefficient
JPEG example
The image is divided into blocks of 8x8 pixels, which serve as the input
vectors we want to reduce. In our case d = 64, and the number of gray
gradations sets the accuracy of the represented data.
Let us suppose that the image contains
But if d = 64x64, then K > 10^3
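The two kinds of compression can be compared numerically. The sketch below is not the calculation from the slides, whose formulas are not preserved in this transcript. It assumes that the description length of one vector is d * b bits, that dimension reduction keeps m components of the same capacity (K = d / m), and that quantization replaces the vector by the index of one of m prototypes (K = d * b / log2(m)). The value b = 8 bits per pixel is also an assumption.

```python
import math

def description_length(d, b):
    """Bits needed for one vector: d components of b bits each."""
    return d * b

def compression_by_dimension(d, m):
    """K when d components are replaced by m components of the same capacity."""
    return d / m

def compression_by_clustering(d, b, m):
    """K when a d*b-bit vector is replaced by the index of one of m prototypes."""
    return (d * b) / math.log2(m)

d, b = 64, 8  # one 8x8 block, assuming 8 bits per pixel
print(description_length(d, b))               # 512
print(compression_by_dimension(d, 8))         # 8.0
print(compression_by_clustering(d, b, 256))   # 64.0
```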
Any questions?