COMP 578 Artificial Neural Networks for Data Mining


1
COMP 578: Artificial Neural Networks for Data Mining
  • Keith C.C. Chan
  • Department of Computing
  • The Hong Kong Polytechnic University

2
Human vs. Computer
  • Computers
  • Not good at tasks such as visual or audio
    processing/recognition.
  • Execute instructions one after another extremely
    rapidly.
  • Good at serial activities (e.g. counting,
    adding).
  • Human brain
  • Units respond only about 10-100 times per second
    (vs. a Pentium IV at 2.5 GHz).
  • Works on many different things at once.
  • Vision or speech recognition emerges from the
    interaction of many different pieces of information.

3
The brain
  • The human brain is complicated and poorly understood.
  • Contains approximately 10^10 basic units called
    neurons.
  • Each neuron is connected to about 10,000 others.

(Diagram: a neuron showing dendrites, soma (cell body), axon, and synapse.)
4
The Neuron
(Diagram: dendrites, soma, axon, synapse.)
  • A neuron accepts many inputs (through its dendrites).
  • The inputs are all added up in some fashion.
  • If enough active inputs are received at once, the
    neuron is activated and fires (through its axon).

5
The Synapse
  • The axon produces voltage pulses called action
    potentials (APs).
  • The arrival of more than one AP is needed to
    trigger the synapse.
  • When sufficiently stimulated, the synapse releases
    neurotransmitters.
  • The neurotransmitters diffuse across the gap,
    chemically activating dendrites on the other
    side.
  • Some synapses pass a large signal across, whilst
    others allow very little through.

6
Modeling the Single Neuron
  • n inputs.
  • The efficiency of the synapses is modeled by a
    multiplicative factor on each input to the
    neuron.
  • These multiplicative factors are the weights
    associated with the input lines.
  • The neuron's tasks:
  • Calculate the weighted sum of its inputs.
  • Compare the sum to some internal threshold.
  • Turn on if the threshold is exceeded.

(Diagram: inputs x1, ..., xn with weights w1, ..., wn feeding a summation unit that produces output y.)
7
A Mathematical Model of Neurons
  • The neuron computes the weighted sum
    SUM = Σi wi xi.
  • It fires if SUM exceeds a threshold θ (see the
    sketch below):
  • y = 1 if SUM > θ
  • y = 0 if SUM ≤ θ.
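A minimal sketch of such a threshold unit in Python (the inputs, weights, and threshold below are illustrative values, not taken from the slides):

```python
def threshold_unit(x, w, theta):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))  # SUM = sum_i wi * xi
    return 1 if weighted_sum > theta else 0

# Illustrative values only: three inputs, three weights, threshold 0.5.
print(threshold_unit([1, 0, 1], [0.4, 0.9, 0.3], 0.5))  # 0.7 > 0.5, so this prints 1
```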

8
Learning in Simple Neurons
  • Need to be able to determine connection weights.
  • Inspiration comes from looking at real neural
    systems.
  • Reinforce good behavior and reprimand bad.
  • E.g., train a NN to recognize the 2 characters H
    and F:
  • Output 1 when an H is presented and 0 when it sees
    an F.
  • If it produces an incorrect output, we want to
    reduce the chances of that happening again.
  • This is done by modifying the weights.

9
Learning in Simple Neurons (2)
  • Neuron given random initial weights.
  • At starting state, neuron knows nothing.
  • Present an H.
  • Neuron computes the weighted sum of inputs.
  • Compare weighted sum with threshold.
  • If it exceeds the threshold, output a 1; otherwise a 0.
  • If output is 1, neuron is correct.
  • Do nothing.
  • Otherwise, if the neuron produces a 0:
  • Increase the weights so that next time the sum will
    exceed the threshold and produce a 1.

10
A Simple Learning Rule
  • By how much should the weights be increased?
  • Can follow simple rule
  • Add the input values to the weights when we want
    the output to be on.
  • Subtract the input values from the weights when
    we want the output to be off.
  • This learning rule is called the Hebb rule
  • It is a variant on one proposed by Donald Hebb
    and is called Hebbian learning.
  • It is the earliest and simplest learning rule for
    a neuron.

11
The Hebb Net
  • Step 0. Initialize all weights:
  • wi = 0 (i = 1 to n).
  • Step 1. For each input training record s and its
    target output t, do Steps 2-4.
  • Step 2. Set the activations of the input units.
  • Step 3. Set the activation of the output unit.
  • Step 4. Adjust the weights and the bias:
  • wi(new) = wi(old) + xi*y (i = 1 to n) (note:
    Δwi = xi*y)
  • b(new) = b(old) + y.
  • The bias b is adjusted like a weight from a unit
    whose output signal is always 1. (A code sketch
    follows.)
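A short Python sketch of Steps 0-4, assuming bipolar inputs and targets and folding the bias in as an extra input that is always 1 (as the slides do); the function name is mine:

```python
def hebb_train(records, n):
    """One pass of the Hebb rule: wi(new) = wi(old) + xi * y."""
    w = [0.0] * n                  # Step 0: all weights (and the bias) start at 0
    for x, y in records:           # Steps 1-3: present each record with its target y
        for i in range(n):         # Step 4: delta wi = xi * y
            w[i] += x[i] * y
    return w
```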

12
A Hebb Net Example
13
The Data Set
  • Attributes
  • HS_Index Drop, Rise
  • Trading_Vol Small, Medium, Large
  • DJIA Drop, Rise
  • Class Label
  • Buy_Sell Buy, Sell

14
The Data Set
HS_Index Trading_Vol DJIA Buy_Sell
1 Drop Large Drop Buy
2 Rise Large Rise Sell
3 Rise Medium Drop Buy
4 Drop Small Drop Sell
5 Rise Small Drop Sell
6 Rise Large Drop Buy
7 Rise Small Rise Sell
8 Drop Large Rise Sell
15
Transformation
  • Input Features
  • HS_Index_Drop: -1, 1
  • HS_Index_Rise: -1, 1
  • Trading_Vol_Small: -1, 1
  • Trading_Vol_Medium: -1, 1
  • Trading_Vol_Large: -1, 1
  • DJIA_Drop: -1, 1
  • DJIA_Rise: -1, 1
  • Bias: 1
  • Output Feature
  • Buy_Sell: -1, 1

(Diagram: input nodes HSI_Drop, HSI_Rise, ..., DJIA_Drop, DJIA_Rise, and a bias node feeding the output node B/S.)
16
Transformed Data
Input Feature                      Output Feature
1 <1, -1, -1, -1, 1, 1, -1, 1>     <1>
2 <-1, 1, -1, -1, 1, -1, 1, 1>     <-1>
3 <-1, 1, -1, 1, -1, 1, -1, 1>     <1>
4 <1, -1, 1, -1, -1, 1, -1, 1>     <-1>
5 <-1, 1, 1, -1, -1, 1, -1, 1>     <-1>
6 <-1, 1, -1, -1, 1, 1, -1, 1>     <1>
7 <-1, 1, 1, -1, -1, -1, 1, 1>     <-1>
8 <1, -1, -1, -1, 1, -1, 1, 1>     <-1>
17
Record 1
  • Input Feature: <1, -1, -1, -1, 1, 1, -1, 1>
  • Output Feature: <1>
  • Original Weight: <0, 0, 0, 0, 0, 0, 0, 0>
  • Weight Change: <1, -1, -1, -1, 1, 1, -1, 1>
  • New Weight: <1, -1, -1, -1, 1, 1, -1, 1>

18
Record 2
  • Input Feature: <-1, 1, -1, -1, 1, -1, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <1, -1, -1, -1, 1, 1, -1, 1>
  • Weight Change: <1, -1, 1, 1, -1, 1, -1, -1>
  • New Weight: <2, -2, 0, 0, 0, 2, -2, 0>

19
Record 3
  • Input Feature: <-1, 1, -1, 1, -1, 1, -1, 1>
  • Output Feature: <1>
  • Original Weight: <2, -2, 0, 0, 0, 2, -2, 0>
  • Weight Change: <-1, 1, -1, 1, -1, 1, -1, 1>
  • New Weight: <1, -1, -1, 1, -1, 3, -3, 1>

20
Record 4
  • Input Feature: <1, -1, 1, -1, -1, 1, -1, 1>
  • Output Feature: <-1>
  • Original Weight: <1, -1, -1, 1, -1, 3, -3, 1>
  • Weight Change: <-1, 1, -1, 1, 1, -1, 1, -1>
  • New Weight: <0, 0, -2, 2, 0, 2, -2, 0>

21
Record 5
  • Input Feature: <-1, 1, 1, -1, -1, 1, -1, 1>
  • Output Feature: <-1>
  • Original Weight: <0, 0, -2, 2, 0, 2, -2, 0>
  • Weight Change: <1, -1, -1, 1, 1, -1, 1, -1>
  • New Weight: <1, -1, -3, 3, 1, 1, -1, -1>

22
Record 6
  • Input Feature: <-1, 1, -1, -1, 1, 1, -1, 1>
  • Output Feature: <1>
  • Original Weight: <1, -1, -3, 3, 1, 1, -1, -1>
  • Weight Change: <-1, 1, -1, -1, 1, 1, -1, 1>
  • New Weight: <0, 0, -4, 2, 2, 2, -2, 0>

23
Record 7
  • Input Feature: <-1, 1, 1, -1, -1, -1, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <0, 0, -4, 2, 2, 2, -2, 0>
  • Weight Change: <1, -1, -1, 1, 1, 1, -1, -1>
  • New Weight: <1, -1, -5, 3, 3, 3, -3, -1>

24
Record 8
  • Input Feature: <1, -1, -1, -1, 1, -1, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <1, -1, -5, 3, 3, 3, -3, -1>
  • Weight Change: <-1, 1, 1, 1, -1, 1, -1, -1>
  • New Weight: <0, 0, -4, 4, 2, 4, -4, -2>
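Running the Hebb update over the eight transformed records reproduces the weight trace above; a self-contained check in Python (feature order as on slide 15, bias last):

```python
records = [  # (bipolar input features, target) from slide 16
    ([ 1, -1, -1, -1,  1,  1, -1, 1],  1),
    ([-1,  1, -1, -1,  1, -1,  1, 1], -1),
    ([-1,  1, -1,  1, -1,  1, -1, 1],  1),
    ([ 1, -1,  1, -1, -1,  1, -1, 1], -1),
    ([-1,  1,  1, -1, -1,  1, -1, 1], -1),
    ([-1,  1, -1, -1,  1,  1, -1, 1],  1),
    ([-1,  1,  1, -1, -1, -1,  1, 1], -1),
    ([ 1, -1, -1, -1,  1, -1,  1, 1], -1),
]
w = [0] * 8
for x, y in records:
    w = [wi + xi * y for wi, xi in zip(w, x)]   # Hebb update: delta wi = xi * y
print(w)   # [0, 0, -4, 4, 2, 4, -4, -2], matching Record 8's new weight vector
```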

25
A Hebb Net Example 2
Input (x1 x2 1)    Target
(1 1 1)            1
(1 -1 1)           -1
(-1 1 1)           -1
(-1 -1 1)          -1
26
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
                                                         (0 0 0)
(1 1 1)           1        (1 1 1)                       (1 1 1)
The separating line becomes x2 = -x1 - 1.
27
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
                                                         (1 1 1)
(1 -1 1)          -1       (-1 1 -1)                     (0 2 0)
The separating line becomes x2 = 0.
28
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
                                                         (0 2 0)
(-1 1 1)          -1       (1 -1 -1)                     (1 1 -1)
The separating line becomes x2 = -x1 + 1.
(Graph: decision regions in the (x1, x2) plane.)
29
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
                                                         (1 1 -1)
(-1 -1 1)         -1       (1 1 -1)                      (2 2 -2)
Even though the weights have changed, the separating
line is still x2 = -x1 + 1. The graph of the decision
regions (the positive response and the negative
response) remains as shown.
30
A Hebb Net Example 3
Input (x1 x2 1)    Target
(1 1 1)            1
(1 0 1)            0
(0 1 1)            0
(0 0 1)            0
31
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
                                                         (0 0 0)
(1 1 1)           1        (1 1 1)                       (1 1 1)
The separating line becomes x2 = -x1 - 1.
32
Since the target value is 0, no learning occurs.
Using binary target values prevents the net from
learning any pattern for which the target is off.
Input (x1 x2 1)   Target   Weight Changes (Δw1 Δw2 Δb)   Weights (w1 w2 b)
(1 0 1)           0        (0 0 0)                       (1 1 1)
(0 1 1)           0        (0 0 0)                       (1 1 1)
(0 0 1)           0        (0 0 0)                       (1 1 1)
33
Characteristics of the Hebb Net
  • The choice of training records determines which
    problems can be solved.
  • Training records corresponding to the AND function
    can be learned if the inputs and targets are in
    bipolar form.
  • The bipolar representation allows a weight to be
    modified when the input and target are both on and
    when they are both off at the same time.

34
The Perceptron Learning Rule
  • More powerful than the Hebb rule.
  • The Perceptron learning rule convergence theorem
    states that:
  • If weights exist that allow the neuron to respond
    correctly to all training patterns, then the rule
    will find such weights.
  • The neuron will find these weights in a finite
    number of training steps.
  • Let SUM be the weighted sum; the output of the
    Perceptron, y = f(SUM), can be 1, 0, or -1.
  • The activation function is
  • f(SUM) = 1 if SUM > θ
  • f(SUM) = 0 if -θ ≤ SUM ≤ θ
  • f(SUM) = -1 if SUM < -θ.
35
Perceptron Learning
  • For each training record, the net calculates the
    response of the output unit.
  • The net determines whether an error occurred for
    this pattern (comparing the calculated value with
    the target value).
  • If an error occurred, the weights are changed
    according to wi(new) = wi(old) + α*t*xi, where
    t is +1 or -1 and α is the learning rate.
  • If an error did not occur, the weights are not
    changed.
  • Training continues until no errors occur.

36
Perceptron for classification
  • Step 0. Initialize all weights and the bias (for
    simplicity, set weights and bias to zero). Set the
    learning rate α (0 < α ≤ 1). (For simplicity,
    α can be set to 1.)
  • Step 1. While the stopping condition is false, do
    Steps 2-6.
  • Step 2. For each training pair, do Steps 3-5.
  • Step 3. Set the activations of the input units, xi.
  • Step 4. Compute the response of the output unit:
    SUM = Σi xi wi, y = f(SUM).
  • Step 5. Update weights and bias if an error
    occurred for this vector. If y ≠ t:
    wi(new) = wi(old) + α*t*xi; b(new) = b(old) + α*t;
    else wi(new) = wi(old); b(new) = b(old).
  • Step 6. If no weights changed in Step 2, stop;
    else continue. (A code sketch follows.)
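A Python sketch of Steps 0-6 using the three-valued activation with an undecided band of width θ; the bias is folded in as a constant input of 1, and the default α, θ, and epoch limit are placeholder choices:

```python
def activation(s, theta):
    """f(SUM): 1 above the band, -1 below it, 0 inside the undecided band."""
    if s > theta:
        return 1
    if s < -theta:
        return -1
    return 0

def perceptron_train(records, n, alpha=1.0, theta=0.2, max_epochs=100):
    w = [0.0] * n                                        # Step 0
    for _ in range(max_epochs):                          # Step 1
        changed = False
        for x, t in records:                             # Steps 2-3
            s = sum(xi * wi for xi, wi in zip(x, w))     # Step 4: SUM
            if activation(s, theta) != t:                # Step 5: update only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                                  # Step 6: stop when an epoch changes nothing
            break
    return w
```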

37
Perceptron for classification (2)
  • Only weights connecting active input units (xi ≠ 0)
    are updated.
  • Weights are updated only for patterns that do not
    produce the correct value of y.
  • Less learning occurs as more training patterns
    produce the correct response.
  • The threshold on the activation function for the
    response unit is a fixed, non-negative value θ.
  • The form of the activation function for the output
    unit gives an undecided band of fixed width,
    determined by θ, separating the region of positive
    response from that of negative response.

38
Perceptron for classification (3)
  • Instead of one separating line, we have a line
    separating the region of positive response from
    the region of zero response (the line bounding the
    inequality)
  • w1*x1 + w2*x2 + b > θ
  • and a line separating the region of zero response
    from the region of negative response (the line
    bounding the inequality)
  • w1*x1 + w2*x2 + b < -θ.
39
Perceptron
40
The Data Set (1)
  • Attributes
  • HS_Index Drop, Rise
  • Trading_Vol Small, Medium, Large
  • DJIA Drop, Rise
  • Class Label
  • Buy_Sell Buy, Sell

41
The Data Set (2)
HS_Index Trading_Vol DJIA Buy_Sell
1 Drop Large Drop Buy
2 Rise Large Rise Sell
3 Rise Medium Drop Buy
4 Drop Small Drop Sell
5 Rise Small Drop Sell
6 Rise Large Drop Buy
7 Rise Small Rise Sell
8 Drop Large Rise Sell
42
Transformation
  • Input Features
  • HS_Index_Drop: 0, 1
  • HS_Index_Rise: 0, 1
  • Trading_Vol_Small: 0, 1
  • Trading_Vol_Medium: 0, 1
  • Trading_Vol_Large: 0, 1
  • DJIA_Drop: 0, 1
  • DJIA_Rise: 0, 1
  • Bias: 1
  • Output Feature
  • Buy → 1
  • Sell → -1

43
Transformed Data
Input Feature                  Output Feature
1 <1, 0, 0, 0, 1, 1, 0, 1>     <1>
2 <0, 1, 0, 0, 1, 0, 1, 1>     <-1>
3 <0, 1, 0, 1, 0, 1, 0, 1>     <1>
4 <1, 0, 1, 0, 0, 1, 0, 1>     <-1>
5 <0, 1, 1, 0, 0, 1, 0, 1>     <-1>
6 <0, 1, 0, 0, 1, 1, 0, 1>     <1>
7 <0, 1, 1, 0, 0, 0, 1, 1>     <-1>
8 <1, 0, 0, 0, 1, 0, 1, 1>     <-1>
44
Record 1
  • Input Feature: <1, 0, 0, 0, 1, 1, 0, 1>
  • Output Feature: <1>
  • Original Weight: <0, 0, 0, 0, 0, 0, 0, 0>
  • Output: f(0) = 0
  • Weight Change: <1, 0, 0, 0, 1, 1, 0, 1>
  • New Weight: <1, 0, 0, 0, 1, 1, 0, 1>

45
Record 2
  • Input Feature: <0, 1, 0, 0, 1, 0, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <1, 0, 0, 0, 1, 1, 0, 1>
  • Output: f(2) = 1
  • Weight Change: <0, -1, 0, 0, -1, 0, -1, -1>
  • New Weight: <1, -1, 0, 0, 0, 1, -1, 0>

46
Record 3
  • Input Feature: <0, 1, 0, 1, 0, 1, 0, 1>
  • Output Feature: <1>
  • Original Weight: <1, -1, 0, 0, 0, 1, -1, 0>
  • Output: f(1) = 0
  • Weight Change: <0, 1, 0, 1, 0, 1, 0, 1>
  • New Weight: <1, 0, 0, 1, 0, 2, -1, 1>

47
Record 4
  • Input Feature: <1, 0, 1, 0, 0, 1, 0, 1>
  • Output Feature: <-1>
  • Original Weight: <1, 0, 0, 1, 0, 2, -1, 1>
  • Output: f(4) = 1
  • Weight Change: <-1, 0, -1, 0, 0, -1, 0, -1>
  • New Weight: <0, 0, -1, 1, 0, 1, -1, 0>

48
Record 5
  • Input Feature: <0, 1, 1, 0, 0, 1, 0, 1>
  • Output Feature: <-1>
  • Original Weight: <0, 0, -1, 1, 0, 1, -1, 0>
  • Output: f(0) = 0
  • Weight Change: <0, -1, -1, 0, 0, -1, 0, -1>
  • New Weight: <0, -1, -2, 1, 0, 0, -1, -1>

49
Record 6
  • Input Feature: <0, 1, 0, 0, 1, 1, 0, 1>
  • Output Feature: <1>
  • Original Weight: <0, -1, -2, 1, 0, 0, -1, -1>
  • Output: f(-2) = -1
  • Weight Change: <0, 1, 0, 0, 1, 1, 0, 1>
  • New Weight: <0, 0, -2, 1, 1, 1, -1, 0>

50
Record 7
  • Input Feature: <0, 1, 1, 0, 0, 0, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <0, 0, -2, 1, 1, 1, -1, 0>
  • Output: f(-3) = -1
  • Weight Change: <0, 0, 0, 0, 0, 0, 0, 0>
  • New Weight: <0, 0, -2, 1, 1, 1, -1, 0>

51
Record 8
  • Input Feature: <1, 0, 0, 0, 1, 0, 1, 1>
  • Output Feature: <-1>
  • Original Weight: <0, 0, -2, 1, 1, 1, -1, 0>
  • Output: f(0) = 0
  • Weight Change: <-1, 0, 0, 0, -1, 0, -1, -1>
  • New Weight: <-1, -1, -3, 1, 0, 1, -3, -2>

52
A Perceptron Example
Input (x1 x2 1)    Target
(1 1 1)            1
(1 0 1)            -1
(0 1 1)            -1
(0 0 1)            -1
53
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 0 0)
(1 1 1)           0     0     1        (1 1 1)          (1 1 1)
The separating lines become x1 + x2 + 1 = .2 and
x1 + x2 + 1 = -.2.
54
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (1 1 1)
(1 0 1)           2     1     -1       (-1 0 -1)        (0 1 0)
The separating lines become x2 = .2 and x2 = -.2.
55
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 1 0)
(0 1 1)           1     1     -1       (0 -1 -1)        (0 0 -1)
(0 0 1)           -1    -1    -1       (0 0 0)          (0 0 -1)
56
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 0 -1)
(1 1 1)           -1    -1    1        (1 1 1)          (1 1 0)
The separating lines become x1 + x2 = .2 and
x1 + x2 = -.2.
57
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (1 1 0)
(1 0 1)           1     1     -1       (-1 0 -1)        (0 1 -1)
The separating lines become x2 - 1 = .2 and
x2 - 1 = -.2.
58
Continuing the second epoch:
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 1 -1)
(0 1 1)           0     0     -1       (0 -1 -1)        (0 0 -2)
(0 0 1)           -2    -1    -1       (0 0 0)          (0 0 -2)
The results for the third epoch are:
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 0 -2)
(1 1 1)           -2    -1    1        (1 1 1)          (1 1 -1)
(1 0 1)           0     0     -1       (-1 0 -1)        (0 1 -2)
(0 1 1)           -1    -1    -1       (0 0 0)          (0 1 -2)
(0 0 1)           -2    -1    -1       (0 0 0)          (0 1 -2)
59
The results for the fourth epoch are:
(1 1 1)   -1   -1   1    (1 1 1)     (1 2 -1)
(1 0 1)   0    0    -1   (-1 0 -1)   (0 2 -2)
(0 1 1)   0    0    -1   (0 -1 -1)   (0 1 -3)
(0 0 1)   -3   -1   -1   (0 0 0)     (0 1 -3)
For the fifth epoch, we have:
(1 1 1)   -2   -1   1    (1 1 1)     (1 2 -2)
(1 0 1)   -1   -1   -1   (0 0 0)     (1 2 -2)
(0 1 1)   0    0    -1   (0 -1 -1)   (1 1 -3)
(0 0 1)   -3   -1   -1   (0 0 0)     (1 1 -3)
And for the sixth epoch:
(1 1 1)   -1   -1   1    (1 1 1)     (2 2 -2)
(1 0 1)   0    0    -1   (-1 0 -1)   (1 2 -3)
(0 1 1)   -1   -1   -1   (0 0 0)     (1 2 -3)
(0 0 1)   -3   -1   -1   (0 0 0)     (1 2 -3)
60
The results for the seventh epoch are:
(1 1 1)   0    0    1    (1 1 1)     (2 3 -2)
(1 0 1)   0    0    -1   (-1 0 -1)   (1 3 -3)
(0 1 1)   0    0    -1   (0 -1 -1)   (1 2 -4)
(0 0 1)   -4   -1   -1   (0 0 0)     (1 2 -4)
The eighth epoch yields:
(1 1 1)   -1   -1   1    (1 1 1)     (2 3 -3)
(1 0 1)   -1   -1   -1   (0 0 0)     (2 3 -3)
(0 1 1)   0    0    -1   (0 -1 -1)   (2 2 -4)
(0 0 1)   -4   -1   -1   (0 0 0)     (2 2 -4)
And the ninth:
(1 1 1)   0    0    1    (1 1 1)     (3 3 -3)
(1 0 1)   0    0    -1   (-1 0 -1)   (2 3 -4)
(0 1 1)   -1   -1   -1   (0 0 0)     (2 3 -4)
(0 0 1)   -4   -1   -1   (0 0 0)     (2 3 -4)
61
Finally, the results for the tenth epoch are:
(1 1 1)   1    1    1    (0 0 0)     (2 3 -4)
(1 0 1)   -2   -1   -1   (0 0 0)     (2 3 -4)
(0 1 1)   -1   -1   -1   (0 0 0)     (2 3 -4)
(0 0 1)   -4   -1   -1   (0 0 0)     (2 3 -4)
  • The positive response is given by
  • 2x1 + 3x2 - 4 > .2
  • with boundary line
  • x2 = -(2/3)x1 + 7/5
  • The negative response is given by
  • 2x1 + 3x2 - 4 < -.2
  • with boundary line
  • x2 = -(2/3)x1 + 19/15
62
The 2nd Perceptron Algorithm
Input (x1 x2 1)   Net   Out   Target   Weight Changes   Weights (w1 w2 b)
                                                        (0 0 0)
(1 1 1)           0     0     1        (1 1 1)          (1 1 1)
(1 -1 1)          1     1     -1       (-1 1 -1)        (0 2 0)
(-1 1 1)          2     1     -1       (1 -1 -1)        (1 1 -1)
(-1 -1 1)         -3    -1    -1       (0 0 0)          (1 1 -1)
63
In the second epoch of training, we have:
(1 1 1)    1    1    1    (0 0 0)   (1 1 -1)
(1 -1 1)   -1   -1   -1   (0 0 0)   (1 1 -1)
(-1 1 1)   -1   -1   -1   (0 0 0)   (1 1 -1)
(-1 -1 1)  -3   -1   -1   (0 0 0)   (1 1 -1)
Since all the Δw's are 0 in epoch 2, the system
was fully trained after the first epoch.
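A self-contained Python check of this bipolar example (α = 1, θ = 0.2, as in the tables above); it ends the first epoch at (1, 1, -1) and makes no changes in the second:

```python
def f(s, theta=0.2):
    """Three-valued perceptron activation with an undecided band of width theta."""
    return 1 if s > theta else (-1 if s < -theta else 0)

data = [([1, 1, 1], 1), ([1, -1, 1], -1), ([-1, 1, 1], -1), ([-1, -1, 1], -1)]
w = [0, 0, 0]                       # (w1, w2, b), bias as a constant third input
for epoch in range(1, 3):
    for x, t in data:
        if f(sum(xi * wi for xi, wi in zip(x, w))) != t:
            w = [wi + t * xi for wi, xi in zip(w, x)]   # alpha = 1
    print(epoch, w)                 # epoch 1: [1, 1, -1]; epoch 2: unchanged
```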
64
Limitations of Perceptrons
  • Perceptron finds a straight line that separates
    classes.
  • It cannot learn the exclusive-or (XOR) problem.
  • Such patterns are not linearly separable.
  • Not much work after Minsky and Papert published
    their book in 1969.
  • Rumelhart and McClelland produced an improvement
    in 1986.
  • They proposed adaptations to the Perceptron,
    called the multilayer Perceptron.

65
The Multilayer Perceptron
  • Overcoming linear inseparability:
  • Use more perceptrons.
  • Each is set up to identify small, linearly
    separable sections of the inputs.
  • Combine their outputs into another perceptron.
  • Each neuron still takes a weighted sum of its
    inputs, thresholds it, and outputs 1 or 0.
  • But how can we learn?

66
The Multilayer Perceptron (2)
  • Perceptrons in the 2nd layer do not know which of
    the real inputs were on or not.
  • The 2-state (on or off) output gives no indication
    of how much to adjust the weights.
  • Some weighted inputs turn a neuron on decisively.
  • Some weighted inputs only just turn a neuron on,
    and should not be altered to the same extent.
  • What changes will produce a better solution next
    time?
  • Which of the input weights should be increased and
    which should not?
  • We have no way of finding out (the credit
    assignment problem).

67
The Solution
  • We need a non-binary thresholding function.
  • Use a slightly different non-linearity, so that
    the unit more or less turns on or off.
  • A possible new thresholding function is the
    sigmoid function (see the sketch below).
  • The sigmoid thresholding function does not mask
    inputs from the outputs.
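A minimal sketch of the logistic sigmoid and its derivative (the form used in the backpropagation slides that follow):

```python
import math

def sigmoid(s):
    """Smooth threshold: rises gradually from 0 to 1 instead of switching abruptly."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(output):
    """Derivative written in terms of the unit's output: f'(s) = f(s) * (1 - f(s))."""
    return output * (1.0 - output)
```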

68
The Multi-layer Perceptron
  • An input layer, an output layer, and a hidden
    layer.
  • Each unit in the hidden and output layers is like
    a perceptron unit.
  • But the thresholding function is the sigmoid.
  • Units in the input layer serve to distribute the
    values they receive to the next layer.
  • Input units do not perform a weighted sum or
    threshold.

69
The Backpropagation Rule
  • The single-layer perceptron model is changed:
  • The thresholding function goes from a step to a
    sigmoid function.
  • A hidden layer is added.
  • The learning rule needs to be altered.
  • The new learning rule for the multilayer
    perceptron is called the generalized delta rule,
    or the backpropagation rule.
  • Show the NN a pattern and calculate its response.
  • Compare with the desired response.
  • Alter the weights so that the NN produces a more
    accurate output next time.
  • The learning rule provides the method for
    adjusting the weights so as to decrease the error
    next time.

70
Backpropagation Details
  • Define an error function to represent difference
    between NN's current output and the correct
    output.
  • The backpropagation rule aims to reduce the error
    by
  • Calculating the value of the error for a
    particular input.
  • Then back-propagates the error from one layer to
    the previous one.
  • Each unit in the net has its weights adjusted so
    that it reduces the value of the error function
  • For units in the output layer:
  • Their output and desired output are known, so
    adjusting the weights is relatively simple.
  • For units in the middle
  • Those that are connected to outputs with a large
    error should have their weights adjusted a lot.
  • Those that feed almost correct outputs should not
    be altered much.

71
The Detailed Algorithm
  • Step 0. Initialize weights (set to small random
    values).
  • Step 1. While the stopping condition is false, do
    Steps 2-9.
  • Step 2. For each training pair, do Steps 3-8.
  • Feedforward:
  • Step 3. Each input unit (Xi, i = 1, ..., n)
    receives input signal xi and broadcasts this
    signal to all units in the layer above (the
    hidden units).
  • Step 4. Each hidden unit (Zj, j = 1, ..., p) sums
    its weighted input signals,
  • z_inj = v0j + Σi xi vij,
  • applies its activation function to compute its
    output signal,
  • zj = f(z_inj),
  • and sends this signal to all units in the layer
    above (the output units).
  • Step 5. Each output unit (Yk, k = 1, ..., m) sums
    its weighted input signals,
  • y_ink = w0k + Σj zj wjk,
  • and applies its activation function to compute
    its output signal,
  • yk = f(y_ink).

72
The Detailed Algorithm (2)
  • Backpropagation of error:
  • Step 6. Each output unit (Yk, k = 1, ..., m)
    receives a target pattern corresponding to the
    input training pattern and computes its error
    information term,
  • δk = (tk - yk) f'(y_ink),
  • calculates its weight correction term (used to
    update wjk later),
  • Δwjk = α δk zj,
  • calculates its bias correction term (used to
    update w0k later),
  • Δw0k = α δk,
  • and sends δk to units in the layer below.
  • Step 7. Each hidden unit (Zj, j = 1, ..., p) sums
    its delta inputs (from units in the layer above),
  • δ_inj = Σk δk wjk,
  • multiplies by the derivative of its activation
    function to calculate its error information term,
  • δj = δ_inj f'(z_inj),
  • calculates its weight correction term (used to
    update vij later),
  • Δvij = α δj xi,
  • and calculates its bias correction term (used to
    update v0j later),
  • Δv0j = α δj.

73
The Detailed Algorithm (3)
  • Update weights and biases:
  • Step 8. Each output unit (Yk, k = 1, ..., m)
    updates its bias and weights (j = 0, ..., p):
  • wjk(new) = wjk(old) + Δwjk.
  • Each hidden unit (Zj, j = 1, ..., p) updates its
    bias and weights (i = 0, ..., n):
  • vij(new) = vij(old) + Δvij.
  • Step 9. Test the stopping condition. (A code
    sketch of Steps 3-8 follows.)
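A compact Python sketch of Steps 3-8 for a single training pair, in the same notation (v for input-to-hidden weights, w for hidden-to-output weights, index 0 for the biases); it is a simplified illustration and omits the outer loop and stopping condition:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def backprop_step(x, t, v, w, alpha):
    """One feedforward pass followed by one backpropagation update.
    v[i][j]: weight from input i to hidden unit j (row 0 holds the biases v0j).
    w[j][k]: weight from hidden unit j to output k (row 0 holds the biases w0k)."""
    n, p, m = len(x), len(v[0]), len(w[0])
    # Steps 3-4: feedforward into the hidden layer
    z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n)) for j in range(p)]
    z = [sigmoid(s) for s in z_in]
    # Step 5: feedforward into the output layer
    y_in = [w[0][k] + sum(z[j] * w[j + 1][k] for j in range(p)) for k in range(m)]
    y = [sigmoid(s) for s in y_in]
    # Step 6: output error terms, delta_k = (t_k - y_k) * f'(y_in_k)
    dk = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(m)]
    # Step 7: hidden error terms, delta_j = (sum_k delta_k * w_jk) * f'(z_in_j)
    dj = [sum(dk[k] * w[j + 1][k] for k in range(m)) * z[j] * (1 - z[j]) for j in range(p)]
    # Step 8: update output-layer weights and biases, then hidden-layer ones
    for k in range(m):
        w[0][k] += alpha * dk[k]
        for j in range(p):
            w[j + 1][k] += alpha * dk[k] * z[j]
    for j in range(p):
        v[0][j] += alpha * dj[j]
        for i in range(n):
            v[i + 1][j] += alpha * dj[j] * x[i]
    return y
```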

74
An Example: Multilayer Perceptron Network with
Backpropagation Training
(Diagram: a network with input nodes HSI_Rise, Vol_High, and DJIA_Drop, three hidden nodes, and two output nodes.)
75
Initial Weights and Bias Values
  • wij: weight between nodes i and j.
  • θi: bias value of node i.
  • For node 4:
  • w14 = 0.2, w24 = 0.4, w34 = -0.5, θ4 = -0.4
  • For node 5:
  • w15 = -0.3, w25 = 0.1, w35 = 0.2, θ5 = 0.2
  • For node 6:
  • w16 = 0.6, w26 = 0.7, w36 = 0.1, θ6 = -0.1
  • For node 7:
  • w47 = -0.3, w57 = -0.2, w67 = 0.1, θ7 = 0.6
  • For node 8:
  • w48 = -0.5, w58 = 0.1, w68 = -0.3, θ8 = 0.3

76
Training (1)
  • Learning rate = 0.9
  • Input: <1, 0, 1>
  • Target output: <1, 0>
  • For node 4:
  • Input = 0.2 + 0 - 0.5 - 0.4 = -0.7
  • Output = 1 / (1 + e^0.7) = 0.332
  • For node 5:
  • Input = -0.3 + 0 + 0.2 + 0.2 = 0.1
  • Output = 1 / (1 + e^-0.1) = 0.525
  • For node 6:
  • Input = 0.6 + 0 + 0.1 - 0.1 = 0.6
  • Output = 1 / (1 + e^-0.6) = 0.646
  • For node 7:
  • Input = 0.332 × (-0.3) + 0.525 × (-0.2) + 0.646 ×
    0.1 + 0.6 = 0.460
  • Output = 1 / (1 + e^-0.460) = 0.613
  • For node 8:
  • Input = 0.332 × (-0.5) + 0.525 × 0.1 + 0.646 ×
    (-0.3) + 0.3 = -0.007
  • Output = 1 / (1 + e^0.007) = 0.498

77
Training (2)
  • For node 7:
  • Error = 0.613 × (1 - 0.613) × (1 - 0.613) = 0.092
  • For node 8:
  • Error = 0.498 × (1 - 0.498) × (0 - 0.498) = -0.125
  • For node 4:
  • Error = 0.332 × (1 - 0.332) × (0.092 × (-0.3) +
    0.125 × (-0.5)) = 0.008
  • For node 5:
  • Error = 0.525 × (1 - 0.525) × (0.092 × (-0.2) +
    0.125 × 0.1) = 0.009
  • For node 6:
  • Error = 0.646 × (1 - 0.646) × (0.092 × 0.1 +
    0.125 × (-0.3)) = 0.008

78
Training (3)
  • For each weight, e.g.:
  • w14 = 0.2 + 0.9 × (0.008) × (0.332) = 0.202
  • w15 = -0.3 + 0.9 × (0.009) × (0.525) = -0.296
  • For each bias, e.g.:
  • θ4 = -0.4 + 0.9 × (0.008) = -0.393
  • θ5 = 0.2 + 0.9 × (0.009) = 0.208
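A quick Python check of the forward pass and the two output-layer error terms, using the initial weight and bias values listed on slide 75 (the split of the ±0.1 between w36 and θ6 on node 6 is an assumption; it does not change the result):

```python
import math

def sig(s):
    return 1.0 / (1.0 + math.exp(-s))

x1, x2, x3 = 1, 0, 1                                  # input <1, 0, 1>, target <1, 0>
o4 = sig(0.2 * x1 + 0.4 * x2 - 0.5 * x3 - 0.4)        # 0.332
o5 = sig(-0.3 * x1 + 0.1 * x2 + 0.2 * x3 + 0.2)       # 0.525
o6 = sig(0.6 * x1 + 0.7 * x2 + 0.1 * x3 - 0.1)        # 0.646
o7 = sig(-0.3 * o4 - 0.2 * o5 + 0.1 * o6 + 0.6)       # 0.613
o8 = sig(-0.5 * o4 + 0.1 * o5 - 0.3 * o6 + 0.3)       # 0.498
err7 = o7 * (1 - o7) * (1 - o7)                       # output error term, ~0.092
err8 = o8 * (1 - o8) * (0 - o8)                       # output error term, ~-0.125
print([round(val, 3) for val in (o4, o5, o6, o7, o8, err7, err8)])
```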

79
Using ANN for Data Mining
  • Constructing a network
  • input data representation
  • selection of the number of layers and the number
    of nodes in each layer
  • Training the network using training data
  • Pruning the network
  • Interpreting the results

80
Step 1 Constructing the Network
Multi-layer perceptron (MLP), feed-forward,
backpropagation
(Diagram: inputs x1 = # of Terms, x2 = GPA, x3 = Demographics, x4 = Courses, x5 = Fin Aid, with weighted links to a hidden layer and outputs o1 = Persist, o2 = Not-persist.)
81
Constructing the Network (2)
  • The number of input nodes corresponds to the
    dimensionality of the input tuples.
  • Thermometer coding (see the sketch below):
  • age 20-80, 6 intervals:
  • [20, 30) → 000001, [30, 40) → 000011, ...,
    [70, 80) → 111111
  • The number of hidden nodes is adjusted during
    training.
  • The number of output nodes = the number of classes.
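A sketch of thermometer coding for the age attribute (interval width and bit count as above; the function name is mine):

```python
def thermometer_code(age, low=20, width=10, bits=6):
    """[20, 30) -> 000001, [30, 40) -> 000011, ..., [70, 80) -> 111111."""
    k = min(max((age - low) // width, 0), bits - 1) + 1   # how many intervals are covered
    return '0' * (bits - k) + '1' * k

print(thermometer_code(35))   # 000011
print(thermometer_code(72))   # 111111
```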

82
Step 2 Network Training
  • The ultimate objective of training
  • obtain a set of weights that classifies almost all
    the tuples in the training data correctly
  • Steps
  • Initial weights are set randomly
  • Input tuples are fed into the network one by one
  • Activation values for the hidden nodes are
    computed
  • The output vector can be computed once the
    activation values of all hidden nodes are
    available
  • Weights are adjusted using error
  • (desired output - actual output)

83
Step 3 Network Pruning
  • A fully connected network is hard to articulate:
  • n input nodes, h hidden nodes and m output nodes
    lead to h(m+n) links (weights).
  • Pruning: remove some of the links without
    affecting the classification accuracy of the
    network.

84
Step 4 Extracting Rules from ANN
  • Discretize activation values: replace individual
    activation values by cluster averages while
    maintaining the network's accuracy.
  • Enumerate the output from the discretized
    activation values to find rules between activation
    values and output.
  • Find the relationship between the input and the
    activation values.
  • Combine the above two to obtain rules relating the
    output to the input.

85
An Example (I)
  • IBM synthetic data
  • nine attributes (age, salary, ...)
  • classification function (see the sketch below):
  • if ((age < 40) ∧ (50K ≤ salary ≤ 100K)) ∨
    ((40 ≤ age < 60) ∧ (75K ≤ salary ≤ 125K)) ∨
    ((age ≥ 60) ∧ (25K ≤ salary ≤ 75K))
  • then class A else class B
  • initial network:
  • 87 input nodes, 2 output nodes, 4 hidden nodes
  • trained network using 1000 tuples
  • pruned network:
  • 7 input nodes, 3 hidden nodes, 2 output nodes
  • 17 links
  • accuracy = 96.30%
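The classification function above written out as a Python predicate (salary in thousands; this is just a transcription of the rule, not part of the original network):

```python
def target_class(age, salary):
    """IBM synthetic-data rule: class A inside the three (age, salary) bands, else class B."""
    if ((age < 40 and 50 <= salary <= 100) or
            (40 <= age < 60 and 75 <= salary <= 125) or
            (age >= 60 and 25 <= salary <= 75)):
        return 'A'
    return 'B'

print(target_class(35, 80))   # 'A'
print(target_class(45, 60))   # 'B'
```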

86
An Example (II)
  • Hidden node value discretization:
  • a1 ∈ (-1, 0, 1)
  • a2 ∈ (0, 1)
  • a3 ∈ (-1, -0.24, 1)
  • Enumerate the output from the a's:
  • a2 = 0, a3 = -1
  • a1 = -1, a2 = 1, a3 = -1
  • a1 = -1, a2 = 0, a3 = -0.24
  • ⇒ C1 = 1, C2 = 0
  • otherwise C1 = 0, C2 = 1
87
An Example (III)
  • From input to hidden node:
  • I2 = I17 = 0 ⇒ a2 = 0
  • I5 = I15 = 1 ⇒ a3 = -1
  • I13 = 0 ⇒ a3 = -1
  • Obtain rules relating input and output:
  • I2 = I17 = 0, I5 = I15 = 1 ⇒ class 1
  • I2 = I17 = 0, I13 = 0 ⇒ class 1
  • Transform to the original input attributes:
  • I17 = 0 ⇒ age < 40, I2 = 0 ⇒ salary < 100K

88
ANN vs. Others for Data Mining
  • Advantages
  • prediction accuracy is generally high
  • robust, works when training examples contain
    errors
  • output may be discrete, real-valued, or a vector
    of several discrete or real-valued attributes
  • fast evaluation of the learned target function.
  • Criticism
  • long training time
  • difficult to understand the learned function
    (weights).
  • not easy to incorporate domain knowledge