Improved BP algorithms (first order gradient method)

About This Presentation

Title:

Improved BP algorithms (first order gradient method)

Description:

Improved BP algorithms (first order gradient method) BP with momentum Delta- bar- delta Decoupled momentum RProp Adaptive BP Trinary BP BP with adaptive gain – PowerPoint PPT presentation

Number of Views:52

Avg rating:3.0/5.0

Slides: 43

Provided by: Mah103

Category:

more less

Transcript and Presenter's Notes

Title: Improved BP algorithms (first order gradient method)

1
Improved BP algorithms(first order gradient
method)

BP with momentum
Delta- bar- delta
Decoupled momentum
RProp
Adaptive BP
Trinary BP
BP with adaptive gain
Extended BP

2
BP with momentum (BPM)

The basic improvement to BP (Rumelhart 1986)
Momentum factor alpha selected between zero and
one
Adding momentum improves the convergence speed
and helps network from being trapped in a local
minimum.

3
Modification form

Proposed by nagata 1990
Beta is a constant value decided by user
Nagata claimed that beta term reduce the
possibility of the network being trapped in the
local minimum
This seems beta repeating alpha rule again!!! But
not clear

4
Delta-bar-delta (DBD)

Use adaptive learning rate to speed up the
convergence
The adaptive rate adopted base on local
optimization
Use gradient descent for the search direction,
and use individual step sizes for each weight

5
RProp

Jervis and Fitzgerald (1993)
and limit the size of the step

6
Second order gradient methods

Newton
Gauss-Newton
Levenberg-Marquardt
Quickprop
Conjugate gradient descent
Broyde Fletcher-Goldfab-Shanno

7
Performance Surfaces

Taylor Series Expansion

8
Example
9
Plot of Approximations
10
Vector Case
11
Matrix Form
12
Performance Optimization

Steepest Descent

Basic Optimization Algorithm

13
Examples
14
Minimizing Along a Line
15
Example
16
Plot
17
Newtons Method
18
Example
19
Plot
20
Non-Quadratic Example
21
Different Initial Conditions
22

DIFFICULT
Inverse a singular matrix!!!
Complexity

23
Newtons Method
24
Matrix Form
25
Hessian
26
Gauss-Newton Method
27
Levenberg-Marquardt
28
Adjustment of mk
29
Application to Multilayer Network
30
Jacobian Matrix
31
Computing the Jacobian
32
Marquardt Sensitivity
33
Computing the Sensitivities
34
LMBP

Present all inputs to the network and compute the
corresponding network outputs and the errors.
Compute the sum of squared errors over all
inputs.
Compute the Jacobian matrix. Calculate the
sensitivities with the backpropagation algorithm,
after initializing. Augment the individual
matrices into the Marquardt sensitivities.
Compute the elements of the Jacobian matrix.
Solve to obtain the change in the weights.
Recompute the sum of squared errors with the new
weights. If this new sum of squares is smaller
than that computed in step 1, then divide mk by
u, update the weights and go back to step 1. If
the sum of squares is not reduced, then multiply
mk by u and go back to step 3.

35
Example LMBP Step
36
LMBP Trajectory
37
Conjugate Gradient
38
LRLS method

Recursive least square method based method, does
not need gradient descent.

39
LRLS method cont.
40
LRLS method cont.
41
Example for compare methods
Algorithms Average time steps (cut off at 0.15) Average time steps (cut off at 0.1) Average time steps (cut off at 0.05)
BPM 101 175 594
DBD 64 167 2382
LM 2 123 264
LRLS 3 9 69
42
IGLS (Integration of the Gradient and Least
Square)

The LRLS algorithm has faster convergence speed
than BPM algorithm. However, it is
computationally more complex and is not ideal for
very large network size. The learning strategy
describe here combines ideas from both the
algorithm to achieve the objectives of
maintaining complexity for large network size and
reasonably fast convergence.