Title: Interior Point Optimization Methods in Support Vector Machines Training
1. Interior Point Optimization Methods in Support Vector Machines Training
- Part 3: Primal-Dual Optimization Methods and Neural Network Training
- Theodore Trafalis
- E-mail: trafalis_at_ecn.ou.edu
- ANNIE'99, St. Louis, Missouri, U.S.A., Nov. 7, 1999
2. Outline
- Objectives
- Artificial Neural Networks
- Neural Network Training as a Mathematical Programming Problem
- A Nonlinear Primal-Dual Technique
- A Stochastic Variant
- An Incremental Primal-Dual Method
- Primal-Dual Path Following Algorithms for QP
3. Objectives
4. Artificial Neural Networks
[Figure: a three-layer feedforward network with inputs x_p1, ..., x_pn(1), input-to-hidden weights v_ij, hidden-to-output weights w_jk, outputs z_1, ..., z_n(3), and layer sizes n(1), n(2), n(3). Inset: a single neuron with inputs a, b, c and weights w1, w2, w3 computing o = f(w1*a + w2*b + w3*c), with activation f(x) = tanh(x).]
5. Neural Network Training as a Mathematical Programming Problem
6. Constraints on the Weights g(v,w)
- To avoid saturation of the neurons (network paralysis), we restrict the weights to a bounded region.
- Block constraints with respect to p.
- The error minimization problem can be decomposed.
7. A Nonlinear Primal-Dual Technique
- Consider the general nonlinear programming problem
    min f(x)
    s.t. h(x) = 0        (NLP)
         g(x) ≥ 0
  where f: ℝⁿ → ℝ, h: ℝⁿ → ℝᵐ, and g: ℝⁿ → ℝᵖ.
- The Lagrangian associated with (NLP) is
    L(x, y, z) = f(x) + yᵀh(x) − zᵀg(x)
  where y ∈ ℝᵐ and z ∈ ℝᵖ are the Lagrange multipliers.
8. KKT Optimality Conditions
- The Karush-Kuhn-Tucker (KKT) conditions are
    ∇f(x) + ∇h(x)y − ∇g(x)z = 0
    h(x) = 0
    g(x) − s = 0
    ZSe = 0,  s ≥ 0,  z ≥ 0
  where s is the vector of slacks for g(x) ≥ 0 and Z, S are the diagonal matrices formed from z and s.
- To ensure adherence to the central path, we use the perturbed KKT complementary slackness condition
    ZSe = μe
9. Adherence to the Central Trajectory
- When Newton's method is used to solve the KKT system, the complementarity equation is linearized as
    Z Δs + S Δz = −ZSe
- If a component z_i or s_i becomes zero, it remains at zero in the following iterations.
- If the current iterate approaches the boundary, it gets trapped by that boundary.
10. Solving the KKT Conditions: Algorithm
- Consider vᵏ = (xᵏ, yᵏ, sᵏ, zᵏ) and Δvᵏ = (Δxᵏ, Δyᵏ, Δsᵏ, Δzᵏ).
- Newton's method:
    J(vᵏ) Δvᵏ = −F_μ(vᵏ)        (S)
  where J is the Jacobian of the perturbed KKT system F_μ.
- NLPD Algorithm
  1. Initialization.
  2. Solve the linear system of equations (S).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise, update μ and go to step 2.
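The NLPD loop above can be illustrated on a toy problem. The Python sketch below is not the paper's implementation: the one-variable problem min (x − 2)² s.t. 1 − x ≥ 0, the starting point, and the μ reduction factor are all illustrative assumptions; it shows the Newton step on the perturbed KKT system, a fraction-to-boundary step length, and the μ update.

```python
# Toy NLPD sketch (illustrative, not the paper's code):
#   min (x - 2)^2   s.t.   g(x) = 1 - x >= 0, slack s, multiplier z.
# Perturbed KKT system F_mu(v) = 0 for v = (x, s, z):
#   2(x - 2) + z = 0,   1 - x - s = 0,   z*s = mu.

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (M[r][3] - sum(M[r][k] * sol[k] for k in range(r + 1, 3))) / M[r][r]
    return sol

def nlpd(x=0.0, s=1.0, z=1.0, mu=1.0, tol=1e-8):
    for _ in range(100):
        F = [2.0 * (x - 2.0) + z, 1.0 - x - s, z * s - mu]  # perturbed KKT residual
        if max(abs(f) for f in F) < tol and mu < tol:
            break
        J = [[2.0, 0.0, 1.0],        # Jacobian of F with respect to (x, s, z)
             [-1.0, -1.0, 0.0],
             [0.0, z, s]]
        dx, ds, dz = solve3(J, [-f for f in F])
        a = 1.0                      # fraction-to-boundary: keep s, z > 0
        if ds < 0.0:
            a = min(a, -0.995 * s / ds)
        if dz < 0.0:
            a = min(a, -0.995 * z / dz)
        x, s, z = x + a * dx, s + a * ds, z + a * dz
        mu *= 0.2                    # drive the barrier parameter to zero
    return x, s, z

x_opt, s_opt, z_opt = nlpd()  # expected: x -> 1, s -> 0, z -> 2
```

The active constraint traps the iterate at x = 1; the multiplier z converges to 2, the value for which the stationarity condition 2(x − 2) + z = 0 holds at the solution.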
11. Hessian Calculation
- For convex problems, ∇²ₓL(x,y,z) is positive definite and J(vᵏ) is nonsingular. ∇²ₓL(x,y,z) is calculated by central differences.
- For nonconvex problems, ∇²ₓL(x,y,z) is generally indefinite and J(vᵏ) might become singular. We approximate ∇²ₓL(x,y,z) by a positive definite matrix H using the recursive formula
    H⁽ᵏ⁺¹⁾ = λH⁽ᵏ⁾ + ∇ₓL(x,y,z)(∇ₓL(x,y,z))ᵀ
- The update is based on the Recursive Prediction Error Method (RPEM) (Söderström and Stoica, 1989; Davidon, 1976).
- Tested on the Hock and Schittkowski database of constrained nonlinear programming problems (Hock and Schittkowski, 1981).
- Comparisons with Breitfeld and Shanno's Modified Barrier Algorithm (Breitfeld and Shanno, 1994).
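A minimal sketch of the rank-one recursive update in Python. The forgetting factor λ = 0.9, the 2x2 dimensions, and the gradient values are illustrative assumptions; the point is that λH + ggᵀ stays positive definite whenever H is (a symmetric 2x2 matrix is positive definite iff its (1,1) entry and its determinant are positive).

```python
# Sketch of the recursive positive definite Hessian approximation
#   H(k+1) = lam * H(k) + grad * grad^T
# lam in (0, 1) is a forgetting factor; 0.9 is an illustrative choice.

def update_H(H, grad, lam=0.9):
    n = len(grad)
    return [[lam * H[i][j] + grad[i] * grad[j] for j in range(n)]
            for i in range(n)]

def is_pd_2x2(H):
    # Sylvester's criterion for a symmetric 2x2 matrix.
    return H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0

H = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity
for grad in ([1.0, 2.0], [-0.5, 1.0], [0.3, -0.7]):
    H = update_H(H, grad)      # H remains positive definite at every step
```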
12. A Stochastic Variant of NLPD
- We add random noise to the objective function.
- Resulting perturbation on the direction of move.
- Probability of accepting a bad move.
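The slides do not give the noise model or the acceptance rule, so the sketch below is an assumption: Gaussian noise added to the objective value and a Metropolis-style rule for accepting a move that worsens the objective. The noise level sigma and the temperature T are illustrative names, not parameters from the talk.

```python
# Hedged sketch of a stochastic perturbation of the objective (assumed model).
import math
import random

random.seed(0)

def noisy_objective(f, x, sigma=0.1):
    # Perturbed objective f(x) + eps, eps ~ N(0, sigma^2) (illustrative).
    return f(x) + random.gauss(0.0, sigma)

def accept(delta_f, T=1.0):
    # Always accept improvements; accept a "bad" move (delta_f > 0)
    # with probability exp(-delta_f / T), Metropolis style (assumption).
    return delta_f <= 0 or random.random() < math.exp(-delta_f / T)
```

The acceptance of occasional bad moves is what lets the stochastic variant escape shallow local minima that trap the deterministic iteration.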
13. An Incremental Primal-Dual Algorithm
- Consider problems whose objective and constraints decompose into components l = 1, ..., L (the objective is a sum of terms f_l).
- Applications:
  - General least squares problems
  - The artificial neural network training problem
14. Example: Unconstrained Case
- Consider the following unconstrained minimization problem
    min f(x) = f1(x) + f2(x) + f3(x)
  where x ∈ ℝ and
    f1(x) = x²
    f2(x) = (0.75x + 5)²
    f3(x) = (1.5x − 5)²
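For this decomposed objective, the incremental idea can be sketched with plain incremental gradient steps: update x after each component gradient f_l'(x) rather than after the full gradient, with a diminishing step size. (INCNLPD itself takes one Newton step per subproblem's KKT system, but the cycling over components is the same; the step-size schedule below is an illustrative choice.)

```python
# Incremental-gradient sketch for min f1 + f2 + f3 from the example above.

component_grads = [
    lambda x: 2.0 * x,                 # f1'(x), f1(x) = x^2
    lambda x: 1.5 * (0.75 * x + 5.0),  # f2'(x), f2(x) = (0.75x + 5)^2
    lambda x: 3.0 * (1.5 * x - 5.0),   # f3'(x), f3(x) = (1.5x - 5)^2
]

x = 0.0
for t in range(2000):
    step = 0.5 / (t + 10.0)            # diminishing step removes the cycling bias
    for grad in component_grads:       # one update per component, in order
        x -= step * grad(x)

# Full gradient: 2x + 1.125x + 7.5 + 4.5x - 15 = 7.625x - 7.5,
# so the minimizer is x* = 7.5 / 7.625, roughly 0.9836.
```

With a constant step the cyclic updates settle into a limit cycle offset from x*; the diminishing step drives that offset to zero.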
15. Example (cont'd)
16. The Algorithm
- From a point v₁⁰, the sequence (v₁ᵗ, ..., v_{L+1}ᵗ), t = 0, 1, ..., is generated, where Δv_lᵗ is calculated by performing one Newton step towards the solution of the KKT conditions of the following subproblem:
    min f_l(x)
    s.t. h_l(x) = 0
         g_l(x) ≥ 0
- INCNLPD Algorithm
  1. Initialization: l = 1, t = 0.
  2. Solve the linear system of equations (S_l).
  3. Calculate step lengths.
  4. Update the current point.
  5. If the stopping criterion is satisfied, STOP. Otherwise:
     - if l < L + 1, set l = l + 1 and go to step 2;
     - if l = L + 1, set t = t + 1, l = 1, v₁ᵗ = v_{L+1}ᵗ⁻¹, update μ and go to step 2.
17. Algorithm Convergence
- Local convergence of the algorithm can be shown (Trafalis and Couellan, 1997; paper submitted to SIAM Journal on Optimization, under revision).
- Starting from a neighborhood of the optimal solution, the sequence of iterates generated by INCNLPD converges q-linearly to that solution.
- Motivations:
  - The algorithm is well suited to online applications.
  - It leads to memory savings.
  - It leads to a better fit of the data for some applications.
18. Primal-Dual Path Following Algorithms for QP
- The problem we are concerned with is a convex quadratic program.
- Converting the inequalities into equalities introduces slack variables.
19-21.
- As μ goes to zero, the central path converges to an optimal solution of both the primal and the dual problems.
- A primal-dual path following algorithm is an iterative process that starts from a point in the feasible region; at each iteration it estimates a value of μ representing a point on the central path that is, in some sense, closer to the optimal solution than the current point, and then attempts to step toward this central-path point while making sure that the new point remains in the strict interior of the appropriate orthant (Vanderbei, 1998).
22.
- Suppose we have already decided on the value of μ. Let (x, y, z, s, g, t) be the current point in the orthant, and let (x + Δx, ..., t + Δt) denote the corresponding point on the central path. Then we have:
23.
- A predictor-corrector algorithm is used to solve this problem. First, we solve the above system after dropping μ and the Δ cross-terms from the right-hand side of the equations; then an estimate of the target value of μ is made.
- The μ and Δ terms are reinstated on the right-hand side of the above equation using the current estimates, and the resulting system is solved again for the Δ variables.
- This second step is called the corrector step, and the resulting step directions are used to move to a new point in the primal-dual space. As can be seen from this procedure, we need to solve the system of equations twice in each iteration.
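The two-solve procedure can be sketched on a toy bound-constrained QP. The Python sketch below is an illustration, not the algorithm's implementation: the separable problem min x₁² + x₂² − 2x₁ + x₂ s.t. x ≥ 0, the diagonal Q (which lets each Newton solve reduce to a 2x2 system per coordinate instead of one large reduced KKT system), and the starting point are all assumptions.

```python
# Predictor-corrector sketch on a separable toy QP:
#   min x1^2 + x2^2 - 2*x1 + x2   s.t.  x >= 0
# Optimality: Q*x + c - z = 0,  x_i * z_i = mu,  x >= 0, z >= 0.

Q = [2.0, 2.0]   # diagonal of the Hessian
c = [-2.0, 1.0]
n = 2

def newton_solve(x, z, r1, r2):
    # Per coordinate: [Q_i  -1; z_i  x_i] [dx_i; dz_i] = [r1_i; r2_i]
    dx = [(x[i] * r1[i] + r2[i]) / (x[i] * Q[i] + z[i]) for i in range(n)]
    dz = [Q[i] * dx[i] - r1[i] for i in range(n)]
    return dx, dz

def step_len(v, dv):
    # Largest step in (0, 1] keeping v strictly positive (fraction to boundary).
    a = 1.0
    for vi, dvi in zip(v, dv):
        if dvi < 0.0:
            a = min(a, -0.995 * vi / dvi)
    return a

x, z = [2.0] * n, [2.0] * n
for _ in range(50):
    r1 = [-(Q[i] * x[i] + c[i] - z[i]) for i in range(n)]  # dual residual
    # Predictor: drop mu and the Delta cross-terms from the right-hand side.
    r2 = [-x[i] * z[i] for i in range(n)]
    dxa, dza = newton_solve(x, z, r1, r2)
    a = min(step_len(x, dxa), step_len(z, dza))
    # Estimate the target mu from the affine (predictor) step.
    gap = sum(x[i] * z[i] for i in range(n)) / n
    gap_aff = sum((x[i] + a * dxa[i]) * (z[i] + a * dza[i]) for i in range(n)) / n
    mu = (gap_aff / gap) ** 2 * gap_aff
    # Corrector: reinstate mu and the second-order Delta terms, solve again.
    r2 = [mu - x[i] * z[i] - dxa[i] * dza[i] for i in range(n)]
    dx, dz = newton_solve(x, z, r1, r2)
    a = min(step_len(x, dx), step_len(z, dz))
    x = [x[i] + a * dx[i] for i in range(n)]
    z = [z[i] + a * dz[i] for i in range(n)]
# Solution: x = (1, 0) with dual z = (0, 1).
```

Each pass through the loop performs exactly the two solves described above: the predictor with μ dropped, then the corrector with μ and the Δ terms reinstated.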
24. Predictor-Corrector Method
[Figure: predictor-corrector step geometry. From the current point xᵏ, the predicting direction aims toward the path of centers; centering and correcting directions then yield xᵏ⁺¹ as μᵏ is reduced to μᵏ⁺¹.]
25.
- The drawback of this method is that the system of equations must be solved twice in each iteration. The system is a large, sparse, indefinite linear system. It can be converted into a symmetric system by negating certain rows and rearranging rows and columns (Vanderbei, 1998).
26.
- The following systematic elimination procedure is applied to the above system (Vanderbei, 1998). We use the pivot elements −ST⁻¹ and −G⁻¹Z to solve for Δt and Δg. After solving for Δt and Δg, we get the following system of equations.
27.
- By using S⁻¹T and GZ⁻¹ as pivot elements, we get the following system of equations, called the reduced KKT system.
28.
- In order to start the algorithm, we need to provide initial values for all the variables. Vanderbei (1998) recommends the following procedure. First, we solve the following system to find initial values of x and y.
- Then, the other variables are set as follows.
29.
- μ is updated by the following formula.
- α_p and α_d are the step lengths for the primal and dual variables; they are capped at 1. The following formulas are used to compute them.
30.
- At the end of each iteration, the current solution is updated by using the following formulas.
31. Conclusions
- An incremental primal-dual technique has been developed for problems with special decomposition properties. The algorithm, its implementation and its convergence results are provided in (Trafalis and Couellan, 1997).
- A stochastic primal-dual technique has been proposed (Trafalis and Couellan, 1997; paper submitted to Journal of Global Optimization, under revision). Results show that it achieves better results than the deterministic approach.
- A primal-dual path following algorithm for QP was developed.