Interior Point Optimization Methods in Support Vector Machines Training
1
Interior Point Optimization Methods in Support Vector Machines Training
  • Part 3: Primal-Dual Optimization Methods and Neural Network Training
  • Theodore Trafalis
  • E-mail: trafalis@ecn.ou.edu
  • ANNIE'99, St. Louis, Missouri, U.S.A., Nov. 7, 1999

2
Outline
  • Objectives
  • Artificial Neural Networks
  • Neural Network Training as a Mathematical
    Programming Problem
  • A Nonlinear Primal-Dual Technique
  • A Stochastic Variant
  • An Incremental Primal-Dual Method
  • Primal-Dual Path Following Algorithms for QP

3
Objectives
4
Artificial Neural Networks
[Figure: a feedforward network with inputs xp1, …, xpn(1), hidden units j = 1, …, n(2), outputs z1, …, zn(3), input-to-hidden weights vij, and hidden-to-output weights wjk. Inset: a single neuron with inputs a, b, c and weights w1, w2, w3 computes o = f(w1·a + w2·b + w3·c), with activation f(x) = tanh(x).]
5
Neural Network Training as a Mathematical
Programming Problem
6
Constraints on the Weights g(v,w)
  • To avoid saturation of the neurons (Network
    Paralysis), we restrict the weights in the region
    ??
  • Block constraints with respect to p.
  • The error minimization problem can be decomposed.

7
A Nonlinear Primal-Dual Technique
  • Consider the general nonlinear programming problem
  • min f(x)
  • s.t. h(x) = 0 (NLP)
  •      g(x) ≥ 0
  • where f: ℝ^n → ℝ, h: ℝ^n → ℝ^m, and g: ℝ^n → ℝ^p.
  • The Lagrangian associated with (NLP) is
  • L(x,y,z) = f(x) + yᵀh(x) − zᵀg(x)
  • where y ∈ ℝ^m and z ∈ ℝ^p are the Lagrange multipliers.
8
KKT Optimality Conditions
  • Introducing slacks s ≥ 0 with g(x) − s = 0, the Karush-Kuhn-Tucker (KKT) conditions are
  • ∇_x L(x,y,z) = ∇f(x) + ∇h(x)ᵀy − ∇g(x)ᵀz = 0
  • h(x) = 0, g(x) − s = 0
  • ZSe = 0, z ≥ 0, s ≥ 0
  • where Z = diag(z), S = diag(s), and e is the vector of ones.
  • To ensure adherence to the central path, we use the perturbed complementary slackness condition
  • ZSe = μe, with μ > 0.
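The effect of the perturbation can be seen on a one-variable toy problem (our own illustration, not from the slides): for min (x+1)² s.t. x ≥ 0, eliminating the multiplier from the perturbed KKT conditions gives a closed-form point on the central path for each μ.

```python
import math

# Central path for the toy problem: min (x+1)^2  s.t.  x >= 0.
# Stationarity: 2(x+1) - z = 0; perturbed complementarity: x*z = mu.
# Eliminating z gives 2x(x+1) = mu, so x(mu) = (-1 + sqrt(1 + 2*mu)) / 2.

def central_path_point(mu):
    x = (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0
    z = 2.0 * (x + 1.0)
    return x, z

for mu in [1.0, 0.1, 0.01, 1e-6]:
    x, z = central_path_point(mu)
    # x*z = mu by construction; as mu -> 0, x -> 0 and z -> 2,
    # which is the KKT point of the original problem.
    assert abs(x * z - mu) < 1e-12
```

As μ decreases, the iterate is allowed to approach the boundary x = 0 without ever sitting on it, which is exactly the behavior the perturbed condition is designed to produce.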

9
Adherence to the Central Trajectory
  • When Newton's method is used to solve the unperturbed KKT system,
  • ZΔs + SΔz = −ZSe.
  • If a product z_i s_i becomes zero, it remains zero in all following iterations.
  • If the current iterate approaches the boundary of the nonnegative orthant, it gets trapped by that boundary; the perturbation μ > 0 prevents this.

10
Solving the KKT Conditions Algorithm
  • Let v^k = (x^k, y^k, s^k, z^k) and Δv^k = (Δx^k, Δy^k, Δs^k, Δz^k).
  • Newton's method solves
  • J(v^k) Δv^k = −F_μ(v^k) (S)
  • where J is the Jacobian of the perturbed KKT system F_μ.
  • NLPD Algorithm
  • 1. Initialization.
  • 2. Solve the linear system of equations (S).
  • 3. Calculate step lengths.
  • 4. Update the current point.
  • 5. If the stopping criterion is satisfied, STOP.
  •    Otherwise, update μ and go to step 2.
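The steps above can be sketched on a one-variable toy problem (an illustration of the scheme, not the authors' implementation): min (x+1)² s.t. x ≥ 0, with slack s and multiplier z, where the perturbed KKT system is F_μ(x,z,s) = [2(x+1) − z, x − s, zs − μ].

```python
import numpy as np

# Minimal NLPD-style loop: damped Newton steps on the perturbed KKT system
# F_mu = [ 2(x+1) - z,  x - s,  z*s - mu ] for  min (x+1)^2  s.t.  x >= 0.

def nlpd(mu=1.0, iters=50):
    x, z, s = 1.0, 1.0, 1.0                    # strictly interior start
    for _ in range(iters):
        F = np.array([2*(x + 1) - z, x - s, z*s - mu])
        J = np.array([[2.0, -1.0,  0.0],
                      [1.0,  0.0, -1.0],
                      [0.0,  s,    z  ]])
        dx, dz, ds = np.linalg.solve(J, -F)    # Newton direction (system (S))
        # Step length: stay strictly inside z > 0, s > 0.
        alpha = 1.0
        for v, dv in ((z, dz), (s, ds)):
            if dv < 0:
                alpha = min(alpha, -0.95 * v / dv)
        x, z, s = x + alpha*dx, z + alpha*dz, s + alpha*ds
        mu *= 0.2                              # drive the perturbation to zero
    return x, z, s
```

Running `nlpd()` drives the iterate along the central path toward the KKT point x = 0, z = 2, s = 0; the damping factor 0.95 and the reduction factor 0.2 for μ are illustrative choices.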

11
Hessian Calculation
  • For convex problems, ∇²_x L(x,y,z) is positive definite and J(v^k) is nonsingular. ∇²_x L(x,y,z) is calculated by central differences.
  • For nonconvex problems, ∇²_x L(x,y,z) is generally indefinite and J(v^k) may become singular. We approximate ∇²_x L(x,y,z) by a positive definite matrix H using the recursive formula
  • H(k+1) = λH(k) + ∇_x L(x,y,z)(∇_x L(x,y,z))ᵀ
  • where 0 < λ ≤ 1 is a forgetting factor.
  • The update is based on the Recursive Prediction Error Method (RPEM) (Soderstrom and Stoica, 1989; Davidon, 1976).
  • Tested on the Hock and Schittkowski database of constrained nonlinear programming problems (Hock and Schittkowski, 1981).
  • Comparisons with Breitfeld and Shanno's Modified Barrier Algorithm (Breitfeld and Shanno, 1994).

12
A Stochastic Variant of NLPD
  • We add random noise to the objective function.
  • This induces a perturbation on the direction of move.
  • A bad move is accepted with a certain probability.
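The slide's exact formulas are not reproduced in the transcript; as a generic sketch of the idea, assume zero-mean Gaussian noise on the search direction and a simulated-annealing-style acceptance probability exp(−Δf/T). The names T0, sigma, and the cooling schedule below are all illustrative assumptions.

```python
import math
import random

# Generic stochastic-descent sketch (assumed form, not the paper's formulas):
# perturb the gradient direction with Gaussian noise and accept an uphill
# ("bad") move with probability exp(-delta/T), where T is cooled over time.

def stochastic_descent(f, grad, x0, T0=1.0, sigma=0.1, steps=200, lr=0.05):
    random.seed(0)
    x = x0
    for k in range(steps):
        T = T0 / (1 + k)                                      # cooling schedule
        cand = x - lr * (grad(x) + random.gauss(0.0, sigma))  # noisy direction
        delta = f(cand) - f(x)
        if delta <= 0 or random.random() < math.exp(-delta / max(T, 1e-12)):
            x = cand          # good moves always accepted; bad moves sometimes
    return x

# Example: minimize f(x) = (x - 2)^2 starting from x = 0.
x = stochastic_descent(lambda x: (x - 2)**2, lambda x: 2*(x - 2), x0=0.0)
```

The noise lets the iterate escape poor local minima early on, while the shrinking temperature makes the method behave deterministically near convergence.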

13
An Incremental Primal-Dual Algorithm
  • Consider problems of the form
  • min Σ_l f_l(x) s.t. h_l(x) = 0, g_l(x) ≥ 0, l = 1, …, L+1.
  • Applications:
  • General least squares problems.
  • The artificial neural network training problem.

14
Example Unconstrained Case
  • Consider the following unconstrained minimization problem
  • min f(x) = f1(x) + f2(x) + f3(x)
  • where x ∈ ℝ and
  • f1(x) = x²
  • f2(x) = (0.75x + 5)²
  • f3(x) = (1.5x − 5)²
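Because f decomposes into f1 + f2 + f3, x can be updated after visiting each component in turn. The sketch below uses plain incremental gradient steps with a diminishing stepsize (an illustration of the incremental idea, not the exact INCNLPD Newton updates).

```python
# Incremental treatment of the example: cycle through f1, f2, f3 and update x
# after each component, with a diminishing stepsize.

grads = [
    lambda x: 2.0 * x,                        # f1'(x),  f1 = x^2
    lambda x: 2.0 * 0.75 * (0.75 * x + 5),    # f2'(x),  f2 = (0.75x + 5)^2
    lambda x: 2.0 * 1.5 * (1.5 * x - 5),      # f3'(x),  f3 = (1.5x - 5)^2
]

x = 0.0
for t in range(5000):
    step = 1.0 / (7.625 * (t + 1))  # 7.625 = f''(x), the total curvature
    for g in grads:                 # one pass l = 1, 2, 3 per cycle
        x -= step * g(x)

# Setting f'(x) = 7.625x - 7.5 = 0 gives the minimizer x* = 7.5/7.625.
x_star = 7.5 / 7.625
```

The incremental iterates oscillate around x* because each component pulls toward its own minimizer, but the diminishing stepsize damps the oscillation and the sequence converges.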

15
Example (contd)
16
The Algorithm
  • From a point v_1^0, the sequence (v_1^t, …, v_{L+1}^t), t = 0, 1, …, is generated, where
  • Δv_l^t is calculated by performing one Newton step toward the solution of the KKT conditions of the following subproblem:
  • min f_l(x)
  • s.t. h_l(x) = 0
  •      g_l(x) ≥ 0
  • INCNLPD Algorithm
  • 1. Initialization: l = 1, t = 0.
  • 2. Solve the linear system of equations (S_l).
  • 3. Calculate step lengths.
  • 4. Update the current point.
  • 5. If the stopping criterion is satisfied, STOP.
  •    Otherwise,
  •    - if l < L+1, set l = l+1 and go to step 2;
  •    - if l = L+1, set t = t+1, l = 1, v_1^t = v_{L+1}^{t-1}, update μ, and go to step 2.

17
  • Algorithm Convergence
  • Local convergence of the algorithm can be shown (Trafalis and Couellan, 1997; paper submitted to SIAM Journal on Optimization, under revision).
  • Starting from a neighborhood of the optimal solution, the sequence of iterates generated by INCNLPD converges q-linearly to that solution.
  • Motivations
  • The algorithm is suitable for online applications.
  • It leads to memory space savings.
  • It leads to a better fit of the data for some applications.

18
Primal-Dual Path Following Algorithms for QP
  • The problem we are concerned with is a convex quadratic program.
  • Converting the inequalities into equalities introduces slack variables.

19
  • Dual of this problem is

20
  • Central Path

21
  • As μ goes to zero, the central path converges to an optimal solution of both the primal and the dual problems.
  • A primal-dual path following algorithm is an iterative process that starts from a point in the feasible region and, at each iteration, estimates a value of μ representing a point on the central path that is in some sense closer to the optimal solution than the current point;
  • it then attempts to step toward this central path point, making sure that the new point remains in the strict interior of the appropriate orthant (Vanderbei, 1998).

22
  • Suppose we have already decided on the value of μ. Let (x, y, z, s, g, t) be the current point in the orthant, and let (x+Δx, …, t+Δt) denote the point on the central path. Then we have

23
  • A predictor-corrector algorithm will be used to solve this problem. First, we solve the above system after dropping the μ and Δ terms from the right-hand side of the equations. Then an estimate of the target value of μ is made.
  • The μ and Δ terms are reinstated on the right-hand side of the above equations using the current estimates, and the resulting system is again solved for the delta variables.
  • The second step is called the corrector step, and the resulting step directions are used to move to a new point in the primal-dual space. As can be seen from this procedure, we need to solve the system of equations twice in each step.
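The two-solve structure described above can be sketched compactly for a linear program in standard form (the slides treat a QP with additional variables s, g, t; this stripped-down LP version, our own illustration, keeps only x, y, z and shows the predictor solve, the μ estimate, and the corrector solve).

```python
import numpy as np

# Predictor-corrector sketch for the standard-form LP
#   min c'x  s.t.  Ax = b, x >= 0    (dual: A'y + z = c, z >= 0).

def step_len(v, dv, damp=0.99995):
    """Largest step in (0, 1] keeping v + alpha*dv > 0 (ratio test)."""
    neg = dv < 0
    return min(1.0, damp * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0

def predictor_corrector_lp(A, b, c, iters=30):
    m, n = A.shape
    x, y, z = np.ones(n), np.zeros(m), np.ones(n)
    for _ in range(iters):
        rp, rd = b - A @ x, c - A.T @ y - z          # primal/dual residuals
        mu = x @ z / n
        if max(np.linalg.norm(rp), np.linalg.norm(rd), mu) < 1e-10:
            break
        def solve(rc):                # Newton step for the system:
            D = x / z                 #   A dx = rp
            M = A @ (D[:, None] * A.T)  #   A'dy + dz = rd
            dy = np.linalg.solve(M, rp + A @ (D * rd) - A @ (rc / z))
            dz = rd - A.T @ dy
            dx = (rc - x * dz) / z    #   Z dx + X dz = rc
            return dx, dy, dz
        # Predictor: drop mu and the Delta-product term (rc = -XZe).
        dxa, dya, dza = solve(-x * z)
        ap, ad = step_len(x, dxa), step_len(z, dza)
        mu_aff = (x + ap * dxa) @ (z + ad * dza) / n
        sigma = (mu_aff / mu) ** 3    # centering estimate for the target mu
        # Corrector: reinstate sigma*mu and the predictor's Delta products.
        dx, dy, dz = solve(sigma * mu * np.ones(n) - x * z - dxa * dza)
        ap, ad = step_len(x, dx), step_len(z, dz)
        x, y, z = x + ap * dx, y + ad * dy, z + ad * dz
    return x, y, z
```

For example, on min x1 + 2·x2 subject to x1 + x2 = 1, x ≥ 0, the iterates converge to x = (1, 0). Note that both solves share one factorization target M = A·D·Aᵀ, which is why the extra corrector solve is cheap relative to forming the system.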

24
Predictor Corrector Method
  • Predicting direction
  • Centering on x^k
  • Correcting direction
  • Path of centers
[Figure: from x^k, the predicting direction is combined with a correcting direction toward the path of centers, yielding x^{k+1}; the parameter is reduced from μ_k to μ_{k+1}.]
25
  • The drawback of this method is that the system of equations must be solved twice in each iteration. The system is large, sparse, indefinite, and linear. It can be converted into a symmetric system by negating certain rows and rearranging rows and columns (Vanderbei, 1998).

26
  • The following systematic elimination procedure is applied to the above system (Vanderbei, 1998). We use the pivot elements −ST⁻¹ and −G⁻¹Z to solve for Δt and Δg. After solving for Δt and Δg, we get the following system of equations.

27
  • By using S⁻¹T and GZ⁻¹ as pivot elements, we get the following system of equations, called the reduced KKT system.

28
  • In order to start the algorithm, we need to provide initial values for all the variables. Vanderbei (1998) recommends the following procedure. First, we solve the following system to find initial values of x and y.
  • Then, the other variables are set as follows.

29
  • μ is updated by the following formula.
  • α_p and α_d are the step lengths for the primal and dual variables. They must be at most 1. The following formulas are used to compute them.
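The slide's exact normalization formula is not reproduced in the transcript; a standard ratio test of the following form (our own sketch) produces a step length in (0, 1] that keeps the updated variables strictly positive.

```python
import numpy as np

# Ratio test for a step length: the largest alpha in (0, 1] such that
# v + alpha*dv stays nonnegative, shrunk by a damping factor so the new
# iterate remains strictly interior.

def ratio_test(v, dv, damp=0.9995):
    neg = dv < 0
    if not neg.any():
        return 1.0                   # no component decreases: full step
    return min(1.0, damp * np.min(-v[neg] / dv[neg]))

x  = np.array([1.0, 2.0, 0.5])
dx = np.array([-2.0, 1.0, -0.25])
alpha_p = ratio_test(x, dx)          # limited by component 0: -1.0/-2.0 = 0.5
```

The same test applied to (z, Δz) gives the dual step length α_d; primal and dual variables may safely use different step lengths in the linear case.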

30
  • At the end of each iteration, the current
    solution is updated by using the following
    formulas

31
Conclusions
  • An incremental primal-dual technique has been developed for problems with special decomposition properties. The algorithm, its implementation, and its convergence results are provided in (Trafalis and Couellan, 1997).
  • A stochastic primal-dual technique has been proposed (Trafalis and Couellan, 1997; paper submitted to Journal of Global Optimization, under revision). Results show that it achieves better results than the deterministic approach.
  • A primal-dual path following algorithm for QP was developed.