Title: Interior Point Optimization Methods in Support Vector Machines Training
1. Interior Point Optimization Methods in Support Vector Machines Training
- Part 3: Primal-Dual Optimization Methods and Neural Network Training
- Theodore Trafalis
- E-mail: trafalis_at_ecn.ou.edu
- ANNIE'99, St. Louis, Missouri, U.S.A., Nov. 7, 1999
2. Outline
- Objectives
- Artificial Neural Networks
- Neural Network Training as a Mathematical Programming Problem
- A Nonlinear Primal-Dual Technique
- A Stochastic Variant
- An Incremental Primal-Dual Method
- Primal-Dual Path Following Algorithms for QP
3. Objectives
4. Artificial Neural Networks
[Figure: a three-layer feedforward network with inputs x_p1, ..., x_pn(1), input-to-hidden weights v_ij, hidden-to-output weights w_jk, outputs z_1, ..., z_n(3), and layer sizes n(1), n(2), n(3). Inset: a single neuron with inputs a, b, c and weights w1, w2, w3 computing o = f(w1*a + w2*b + w3*c), with activation f(x) = tanh(x).]
5. Neural Network Training as a Mathematical Programming Problem
6. Constraints on the Weights g(v,w)
- To avoid saturation of the neurons (network paralysis), we restrict the weights to a bounded region.
- Block constraints with respect to p.
- The error minimization problem can be decomposed.
7. A Nonlinear Primal-Dual Technique
- Consider the general nonlinear programming problem
    min f(x)
    s.t. h(x) = 0        (NLP)
         g(x) ≥ 0
  where f: ℝⁿ → ℝ, h: ℝⁿ → ℝᵐ, and g: ℝⁿ → ℝᵖ.
- The Lagrangian associated with (NLP) is
    L(x, y, z) = f(x) + yᵀh(x) − zᵀg(x)
  where y ∈ ℝᵐ and z ∈ ℝᵖ are the Lagrange multipliers.
8. KKT Optimality Conditions
- The Karush-Kuhn-Tucker (KKT) conditions are
    ∇f(x) + ∇h(x)y − ∇g(x)z = 0
    h(x) = 0
    g(x) − s = 0
    ZSe = 0,  s ≥ 0,  z ≥ 0
  where s is the vector of slacks for g(x) ≥ 0 and Z, S are the diagonal matrices formed from z and s.
- To ensure adherence to the central path, we use the perturbed KKT complementary slackness condition
    ZSe = μe
9. Adherence to the Central Trajectory
- When Newton's method is used to solve the KKT system, the complementarity equation is linearized as
    Z Δs + S Δz = −ZSe
- If a component z_i or s_i becomes zero, it remains at zero in the following iterations.
- If the current iterate approaches the boundary, it gets trapped by that boundary.
10. Solving the KKT Conditions: Algorithm
- Consider vᵏ = (xᵏ, yᵏ, sᵏ, zᵏ) and Δvᵏ = (Δxᵏ, Δyᵏ, Δsᵏ, Δzᵏ).
- Newton's method:
    J(vᵏ) Δvᵏ = −F_μ(vᵏ)        (S)
  where J is the Jacobian of the perturbed KKT system F_μ.
- NLPD Algorithm
  1. Initialization.
  2. Solve the linear system of equations (S).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise, update μ and go to step 2.
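The NLPD loop above can be illustrated on a toy problem. The Python sketch below is not the paper's implementation: the one-variable problem min (x − 2)² s.t. 1 − x ≥ 0, the starting point, and the μ reduction factor are all illustrative assumptions; it shows the Newton step on the perturbed KKT system, a fraction-to-boundary step length, and the μ update.

```python
# Toy NLPD sketch (illustrative, not the paper's code):
#   min (x - 2)^2   s.t.   g(x) = 1 - x >= 0, slack s, multiplier z.
# Perturbed KKT system F_mu(v) = 0 for v = (x, s, z):
#   2(x - 2) + z = 0,   1 - x - s = 0,   z*s = mu.

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (M[r][3] - sum(M[r][k] * sol[k] for k in range(r + 1, 3))) / M[r][r]
    return sol

def nlpd(x=0.0, s=1.0, z=1.0, mu=1.0, tol=1e-8):
    for _ in range(100):
        F = [2.0 * (x - 2.0) + z, 1.0 - x - s, z * s - mu]  # perturbed KKT residual
        if max(abs(f) for f in F) < tol and mu < tol:
            break
        J = [[2.0, 0.0, 1.0],        # Jacobian of F with respect to (x, s, z)
             [-1.0, -1.0, 0.0],
             [0.0, z, s]]
        dx, ds, dz = solve3(J, [-f for f in F])
        a = 1.0                      # fraction-to-boundary: keep s, z > 0
        if ds < 0.0:
            a = min(a, -0.995 * s / ds)
        if dz < 0.0:
            a = min(a, -0.995 * z / dz)
        x, s, z = x + a * dx, s + a * ds, z + a * dz
        mu *= 0.2                    # drive the barrier parameter to zero
    return x, s, z

x_opt, s_opt, z_opt = nlpd()  # expected: x -> 1, s -> 0, z -> 2
```

The active constraint traps the iterate at x = 1; the multiplier z converges to 2, the value for which the stationarity condition 2(x − 2) + z = 0 holds at the solution.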
11. Hessian Calculation
- For convex problems, ∇²ₓL(x,y,z) is positive definite and J(vᵏ) is nonsingular. ∇²ₓL(x,y,z) is calculated by central differences.
- For nonconvex problems, ∇²ₓL(x,y,z) is generally indefinite and J(vᵏ) might become singular. We approximate ∇²ₓL(x,y,z) by a positive definite matrix H using the recursive formula
    H⁽ᵏ⁺¹⁾ = λH⁽ᵏ⁾ + ∇ₓL(x,y,z)(∇ₓL(x,y,z))ᵀ
- The update is based on the Recursive Prediction Error Method (RPEM) (Söderström and Stoica, 1989; Davidon, 1976).
- Tested on the Hock and Schittkowski database of constrained nonlinear programming problems (Hock and Schittkowski, 1981).
- Comparisons with Breitfeld and Shanno's Modified Barrier Algorithm (Breitfeld and Shanno, 1994).
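A minimal sketch of the rank-one recursive update in Python. The forgetting factor λ = 0.9, the 2x2 dimensions, and the gradient values are illustrative assumptions; the point is that λH + ggᵀ stays positive definite whenever H is (a symmetric 2x2 matrix is positive definite iff its (1,1) entry and its determinant are positive).

```python
# Sketch of the recursive positive definite Hessian approximation
#   H(k+1) = lam * H(k) + grad * grad^T
# lam in (0, 1) is a forgetting factor; 0.9 is an illustrative choice.

def update_H(H, grad, lam=0.9):
    n = len(grad)
    return [[lam * H[i][j] + grad[i] * grad[j] for j in range(n)]
            for i in range(n)]

def is_pd_2x2(H):
    # Sylvester's criterion for a symmetric 2x2 matrix.
    return H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0

H = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity
for grad in ([1.0, 2.0], [-0.5, 1.0], [0.3, -0.7]):
    H = update_H(H, grad)      # H remains positive definite at every step
```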
12. A Stochastic Variant of NLPD
- We add random noise to the objective function.
- Resulting perturbation on the direction of move.
- Probability of accepting a bad move.
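The slides do not give the noise model or the acceptance rule, so the sketch below is an assumption: Gaussian noise added to the objective value and a Metropolis-style rule for accepting a move that worsens the objective. The noise level sigma and the temperature T are illustrative names, not parameters from the talk.

```python
# Hedged sketch of a stochastic perturbation of the objective (assumed model).
import math
import random

random.seed(0)

def noisy_objective(f, x, sigma=0.1):
    # Perturbed objective f(x) + eps, eps ~ N(0, sigma^2) (illustrative).
    return f(x) + random.gauss(0.0, sigma)

def accept(delta_f, T=1.0):
    # Always accept improvements; accept a "bad" move (delta_f > 0)
    # with probability exp(-delta_f / T), Metropolis style (assumption).
    return delta_f <= 0 or random.random() < math.exp(-delta_f / T)
```

The acceptance of occasional bad moves is what lets the stochastic variant escape shallow local minima that trap the deterministic iteration.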
13. An Incremental Primal-Dual Algorithm
- Consider problems whose objective and constraints decompose into components l = 1, ..., L (the objective is a sum of terms f_l).
- Applications:
  - General least squares problems
  - The artificial neural network training problem
14. Example: Unconstrained Case
- Consider the following unconstrained minimization problem
    min f(x) = f1(x) + f2(x) + f3(x)
  where x ∈ ℝ and
    f1(x) = x²
    f2(x) = (0.75x + 5)²
    f3(x) = (1.5x − 5)²
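For this decomposed objective, the incremental idea can be sketched with plain incremental gradient steps: update x after each component gradient f_l'(x) rather than after the full gradient, with a diminishing step size. (INCNLPD itself takes one Newton step per subproblem's KKT system, but the cycling over components is the same; the step-size schedule below is an illustrative choice.)

```python
# Incremental-gradient sketch for min f1 + f2 + f3 from the example above.

component_grads = [
    lambda x: 2.0 * x,                 # f1'(x), f1(x) = x^2
    lambda x: 1.5 * (0.75 * x + 5.0),  # f2'(x), f2(x) = (0.75x + 5)^2
    lambda x: 3.0 * (1.5 * x - 5.0),   # f3'(x), f3(x) = (1.5x - 5)^2
]

x = 0.0
for t in range(2000):
    step = 0.5 / (t + 10.0)            # diminishing step removes the cycling bias
    for grad in component_grads:       # one update per component, in order
        x -= step * grad(x)

# Full gradient: 2x + 1.125x + 7.5 + 4.5x - 15 = 7.625x - 7.5,
# so the minimizer is x* = 7.5 / 7.625, roughly 0.9836.
```

With a constant step the cyclic updates settle into a limit cycle offset from x*; the diminishing step drives that offset to zero.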
15. Example (cont'd)
16. The Algorithm
- From a point v₁⁰, the sequence (v₁ᵗ, ..., v_{L+1}ᵗ), t = 0, 1, ..., is generated, where Δv_lᵗ is calculated by performing one Newton step towards the solution of the KKT conditions of the following subproblem:
    min f_l(x)
    s.t. h_l(x) = 0
         g_l(x) ≥ 0
- INCNLPD Algorithm
  1. Initialization: l = 1, t = 0.
  2. Solve the linear system of equations (S_l).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise:
     - if l < L + 1, set l = l + 1 and go to step 2;
     - if l = L + 1, set t = t + 1, l = 1, v₁ᵗ = v_{L+1}ᵗ⁻¹, update μ and go to step 2.
17. Algorithm Convergence
- Local convergence of the algorithm can be shown (Trafalis and Couellan, 1997; paper submitted to SIAM Journal on Optimization, under revision).
- Starting from a neighborhood of the optimal solution, the sequence of iterates generated by INCNLPD converges q-linearly to that solution.
- Motivations:
  - The algorithm is well suited to online applications.
  - It leads to memory savings.
  - It leads to a better fit of the data for some applications.
18. Primal-Dual Path Following Algorithms for QP
- The problem we are concerned with is a convex quadratic program.
- Converting the inequalities into equalities introduces slack variables.
19-21.
- As μ goes to zero, the central path converges to an optimal solution of both the primal and the dual problems.
- A primal-dual path following algorithm is an iterative process that starts from a point in the feasible region; at each iteration it estimates a value of μ representing a point on the central path that is, in some sense, closer to the optimal solution than the current point, and then attempts to step toward this central-path point while making sure that the new point remains in the strict interior of the appropriate orthant (Vanderbei, 1998).
22.
- Suppose we have already decided on the value of μ. Let (x, y, z, s, g, t) be the current point in the orthant, and let (x + Δx, ..., t + Δt) denote the corresponding point on the central path. Then we have:
23.
- A predictor-corrector algorithm is used to solve this problem. First, we solve the above system after dropping μ and the Δ cross-terms from the right-hand side of the equations; then an estimate of the target value of μ is made.
- The μ and Δ terms are reinstated on the right-hand side of the above equation using the current estimates, and the resulting system is solved again for the Δ variables.
- This second step is called the corrector step, and the resulting step directions are used to move to a new point in the primal-dual space. As can be seen from this procedure, we need to solve the system of equations twice in each iteration.
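The two-solve procedure can be sketched on a toy bound-constrained QP. The Python sketch below is an illustration, not the algorithm's implementation: the separable problem min x₁² + x₂² − 2x₁ + x₂ s.t. x ≥ 0, the diagonal Q (which lets each Newton solve reduce to a 2x2 system per coordinate instead of one large reduced KKT system), and the starting point are all assumptions.

```python
# Predictor-corrector sketch on a separable toy QP:
#   min x1^2 + x2^2 - 2*x1 + x2   s.t.  x >= 0
# Optimality: Q*x + c - z = 0,  x_i * z_i = mu,  x >= 0, z >= 0.

Q = [2.0, 2.0]   # diagonal of the Hessian
c = [-2.0, 1.0]
n = 2

def newton_solve(x, z, r1, r2):
    # Per coordinate: [Q_i  -1; z_i  x_i] [dx_i; dz_i] = [r1_i; r2_i]
    dx = [(x[i] * r1[i] + r2[i]) / (x[i] * Q[i] + z[i]) for i in range(n)]
    dz = [Q[i] * dx[i] - r1[i] for i in range(n)]
    return dx, dz

def step_len(v, dv):
    # Largest step in (0, 1] keeping v strictly positive (fraction to boundary).
    a = 1.0
    for vi, dvi in zip(v, dv):
        if dvi < 0.0:
            a = min(a, -0.995 * vi / dvi)
    return a

x, z = [2.0] * n, [2.0] * n
for _ in range(50):
    r1 = [-(Q[i] * x[i] + c[i] - z[i]) for i in range(n)]  # dual residual
    # Predictor: drop mu and the Delta cross-terms from the right-hand side.
    r2 = [-x[i] * z[i] for i in range(n)]
    dxa, dza = newton_solve(x, z, r1, r2)
    a = min(step_len(x, dxa), step_len(z, dza))
    # Estimate the target mu from the affine (predictor) step.
    gap = sum(x[i] * z[i] for i in range(n)) / n
    gap_aff = sum((x[i] + a * dxa[i]) * (z[i] + a * dza[i]) for i in range(n)) / n
    mu = (gap_aff / gap) ** 2 * gap_aff
    # Corrector: reinstate mu and the second-order Delta terms, solve again.
    r2 = [mu - x[i] * z[i] - dxa[i] * dza[i] for i in range(n)]
    dx, dz = newton_solve(x, z, r1, r2)
    a = min(step_len(x, dx), step_len(z, dz))
    x = [x[i] + a * dx[i] for i in range(n)]
    z = [z[i] + a * dz[i] for i in range(n)]
# Solution: x = (1, 0) with dual z = (0, 1).
```

Each pass through the loop performs exactly the two solves described above: the predictor with μ dropped, then the corrector with μ and the Δ terms reinstated.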
24. Predictor-Corrector Method
[Figure: predictor-corrector step geometry. From the current point xᵏ, the predicting direction aims toward the path of centers; centering and correcting directions then yield xᵏ⁺¹ as μᵏ is reduced to μᵏ⁺¹.]
25.
- The drawback of this method is that the system of equations must be solved twice in each iteration. The system is a large, sparse, indefinite linear system. It can be converted into a symmetric system by negating certain rows and rearranging rows and columns (Vanderbei, 1998).
26.
- The following systematic elimination procedure is applied to the above system (Vanderbei, 1998). We use the pivot elements −ST⁻¹ and −G⁻¹Z to solve for Δt and Δg. After solving for Δt and Δg, we get the following system of equations.
27.
- By using S⁻¹T and GZ⁻¹ as pivot elements, we get the following system of equations, called the reduced KKT system.
28.
- In order to start the algorithm, we need to provide initial values for all the variables. Vanderbei (1998) recommends the following procedure. First, we solve the following system to find initial values of x and y.
- Then, the other variables are set as follows.
29.
- μ is updated by the following formula.
- α_p and α_d are the step lengths for the primal and dual variables; they are capped at 1. The following formulas are used to compute them.
30.
- At the end of each iteration, the current solution is updated by using the following formulas.
31. Conclusions
- An incremental primal-dual technique has been developed for problems with special decomposition properties. The algorithm, its implementation and its convergence results are provided in (Trafalis and Couellan, 1997).
- A stochastic primal-dual technique has been proposed (Trafalis and Couellan, 1997; paper submitted to Journal of Global Optimization, under revision). Results show that it achieves better results than the deterministic approach.
- A primal-dual path following algorithm for QP was developed.