1
Model Characteristics And Practical Solvers
In Nonlinear Programming
  • Arne Stolbjerg Drud

2
General NLP Model
  • Minimize f(x)
  • Subject to
  • g(x) = b
  • lb ≤ x ≤ ub
  • Other forms can be useful for special cases, e.g.
    inequality formulations for convex models.
  • We assume that all nonlinear functions are
    sufficiently smooth and differentiable.
  • Notation: n = dim(x) and m = dim(g)

3
Modeling Environment
  • Models are formulated in a modeling system such
    as AMPL, AIMMS, GAMS, ...
  • Function values, first and second derivatives, and
    their sparsity patterns are easily available
    (without user intervention).
  • Functions and derivatives are fairly cheap to
    evaluate (a small modeling sketch follows below).
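To make this concrete, here is a minimal sketch of a tiny NLP written in a Python-based modeling system (Pyomo). The model, the variable names, and the choice of Ipopt as solver are illustrative assumptions, not part of the slides; the point is that exact derivatives and their sparsity are generated automatically from the algebraic model.

    # Hypothetical two-variable NLP written in Pyomo (illustrative only).
    # The modeling system supplies exact derivatives and sparsity to the solver.
    from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                               SolverFactory, minimize)

    m = ConcreteModel()
    m.x = Var(bounds=(0.0, 10.0), initialize=1.0)   # lb <= x <= ub
    m.y = Var(bounds=(0.0, 10.0), initialize=1.0)

    # Smooth nonlinear objective f(x) and equality constraint g(x) = b
    m.obj = Objective(expr=(m.x - 1.0)**2 + (m.y - 2.0)**4, sense=minimize)
    m.g = Constraint(expr=m.x * m.y == 2.0)

    # Any NLP solver registered with Pyomo could be used; Ipopt is assumed installed.
    SolverFactory('ipopt').solve(m)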

4
Optimality Conditions
  • Lagrange Function
  • L(x,y) = f(x) + yᵀ(g(x) - b)
  • Optimality Conditions
  • Primal: g(x) = b, lb ≤ x ≤ ub
  • Dual: df/dx + yᵀ·dg/dx ⊥ 0, -∞ ≤ y ≤ ∞
  • The orthogonality operator ⊥ is defined
    componentwise as
  • if xᵢ = lbᵢ then (df/dx + yᵀ·dg/dx)ᵢ ≥ 0,
    elseif xᵢ = ubᵢ then (df/dx + yᵀ·dg/dx)ᵢ ≤ 0,
    else (df/dx + yᵀ·dg/dx)ᵢ = 0
    (restated compactly below).
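The same conditions written out in standard notation (same content as the bullets above, collected for reference):

\[
\min_x \; f(x) \quad \text{s.t.} \quad g(x) = b,\;\; lb \le x \le ub,
\qquad
d_i = \Big(\frac{\partial f}{\partial x} + y^T \frac{\partial g}{\partial x}\Big)_i ,
\]
\[
d_i \ge 0 \ \text{if}\ x_i = lb_i, \qquad
d_i \le 0 \ \text{if}\ x_i = ub_i, \qquad
d_i = 0 \ \text{otherwise}, \qquad -\infty \le y \le \infty .
\]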

5
Perturbed Optimality Conditions
  • The Optimality Conditions are usually solved
    using some perturbation method related to Newton's
    Method.
  • Evaluate all nonlinear terms and derivatives at a
    base point x0, y0 and solve for Δx, Δy
  • Primal: g + dg/dx·Δx = b
  • Dual: df/dx + d²f/dx²·Δx + y0ᵀ·d²g/dx²·Δx
    + y0ᵀ·dg/dx + Δyᵀ·dg/dx ⊥ 0

6
The KKT Matrix
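The matrix on this slide is an image in the original deck. A standard reconstruction, consistent with the perturbed conditions on the previous slide (and ignoring the bound/orthogonality handling, which modifies the rows for variables at their bounds), is:

\[
\begin{pmatrix} W & J^T \\ J & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
= - \begin{pmatrix} \nabla f(x_0) + J^T y_0 \\ g(x_0) - b \end{pmatrix},
\qquad
W = \nabla^2 f(x_0) + \sum_i (y_0)_i \, \nabla^2 g_i(x_0), \quad
J = \frac{\partial g}{\partial x}(x_0).
\]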
7
Sources of Difficulty
  • Combinatorics
  • The combinatorial nature of the orthogonality
    operator: which bounds are active and which are
    not, and therefore which dual constraints are
    active and which are not.
  • Nonlinearity
  • The nonlinear nature of the objective and
    constraints: local properties will in general not
    hold globally.

8
Nonlinearity
9
Handling Combinatorics
  • Interior Point Methods
  • Bounds are penalized using a log-barrier
    function.
  • A large equality constrained problem is solved
    for a series of penalty values.
  • Active Set Methods
  • The set of active inequalities / bounds is guessed.
  • A small equality constrained sub-problem is
    solved to find an improving point.
  • The guessed set of active inequalities is updated
    iteratively.

10
Handling Nonlinearity
  • General Methods: iterate using Linear or
    Quadratic Approximations.
  • Special Methods: numerous methods tailored to
    take advantage of available information, special
    forms, etc., e.g.
  • Sum-of-squares models can use special
    approximations (Gauss-Newton, sketched below).
  • Models without bounds can avoid combinatorics.
  • Models without equality constraints can avoid
    projections.
  • QPs are different from general NLPs
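For the sum-of-squares case mentioned above, the Gauss-Newton idea (standard background, not spelled out on the slide) is:

\[
f(x) = \tfrac{1}{2}\sum_i r_i(x)^2, \qquad
\nabla^2 f(x) = J_r^T J_r + \sum_i r_i(x)\, \nabla^2 r_i(x)
\;\approx\; J_r^T J_r, \qquad
J_r = \frac{\partial r}{\partial x},
\]

i.e. the second-derivative terms are dropped, leaving a positive semidefinite approximation built from first derivatives only.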

11
Interior Point Methods
  • Lagrange Function
  • L(x,y) = f(x) + yᵀ(g(x) - b)
    - µ·Σᵢ(log(xᵢ - lbᵢ) + log(ubᵢ - xᵢ))
  • Optimality Conditions
  • Primal: g(x) = b, lb ≤ x ≤ ub
  • Dual: df/dxᵢ + yᵀ·dg/dxᵢ
    - µ((xᵢ - lbᵢ)⁻¹ - (ubᵢ - xᵢ)⁻¹) = 0, -∞ ≤ y ≤ ∞

12
Interior Point KKT Matrix
  • where Dᵢ = µ((xᵢ - lbᵢ)⁻² + (ubᵢ - xᵢ)⁻²) is a
    diagonal matrix with non-negative entries (a
    sketch of the matrix follows below).
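The matrix itself is an image in the original deck; a standard reconstruction consistent with the barrier conditions above is:

\[
\begin{pmatrix} H + D & J^T \\ J & 0 \end{pmatrix},
\qquad
H = \nabla^2 f + \sum_i y_i \nabla^2 g_i, \quad
J = \frac{\partial g}{\partial x}, \quad
D_i = \mu\big((x_i - lb_i)^{-2} + (ub_i - x_i)^{-2}\big).
\]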

13
Degree of Nonlinearity
(Figure: nested model classes, from the innermost LP through Convex QP and Convex NLP to the outermost Nonconvex NLP.)
14
Interior Point LP
  • Use the diagonal matrix D as block-pivot and we
    get the usual interior point matrix A·D⁻¹·Aᵀ in
    y-space (spelled out below).
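Spelled out (a standard elimination; the right-hand sides r_x and r_y denote the dual and primal residuals, which are not given on the slide):

\[
\begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
= \begin{pmatrix} r_x \\ r_y \end{pmatrix}
\;\Longrightarrow\;
A D^{-1} A^T \, \Delta y = A D^{-1} r_x - r_y,
\qquad
\Delta x = D^{-1}\big(r_x - A^T \Delta y\big).
\]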

15
Interior Point Convex QP
16
Interior Point Convex QP
  • A block-pivot on Q+D gives the reduced matrix
    M = A(Q+D)⁻¹Aᵀ in y-space.
  • In practice: compute a Cholesky factorization
    Q+D = L·Lᵀ (Q+D is positive definite) and
    compute M = (A·L⁻ᵀ)(A·L⁻ᵀ)ᵀ (see the sketch
    below).
  • The ordering for sparse Cholesky factorization of
    both Q+D and M can be done once.
  • The density of M will depend on the off-diagonal
    elements in Q.
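A dense NumPy/SciPy sketch of that computation (the data is made up for illustration; real solvers use sparse factorizations with a fixed ordering, as the slide notes):

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    # Hypothetical small dense convex-QP data (illustrative only).
    rng = np.random.default_rng(0)
    n, m = 5, 2
    Q = np.eye(n)                              # convex QP Hessian
    D = np.diag(rng.uniform(0.1, 1.0, n))      # barrier diagonal, non-negative
    A = rng.standard_normal((m, n))            # constraint Jacobian

    L = cholesky(Q + D, lower=True)            # Q + D = L L^T (positive definite)
    X = solve_triangular(L, A.T, lower=True)   # X = L^-1 A^T, i.e. X^T = A L^-T
    M = X.T @ X                                # M = (A L^-T)(A L^-T)^T = A (Q+D)^-1 A^T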

17
IP Separable Convex QP
  • If the QP is convex and separable, e.g.
    sum-of-squares models, then Q and Q+D are diagonal.
  • All numerical work is equivalent to the numerical
    work in LP.
  • Convergence is as fast as for LP.

18
Interior Point Convex NLP
19
Interior Point Convex NLP
  • The usual block-pivot matrix H+D is positive
    definite.
  • A block-pivot on H+D gives the reduced matrix
    M = J(H+D)⁻¹Jᵀ in y-space.
  • All work is exactly as in the convex QP case
    (except for repeated evaluation of J and H).
  • All sparse ordering can be done once.
  • Same convergence properties as for LP.

20
Interior Point Nonconvex NLP
21
Nonconvex NLP Technical Difficulties
  • The block-pivot matrix H+D is NOT necessarily
    positive definite, and a Cholesky factorization
    may not exist.
  • The factorization of the indefinite KKT system
    must be done as one system including both primal
    and dual parts.
  • The factorization results in an L·D₁·Lᵀ
    factorization where D₁ is block-diagonal with 1
    by 1 and 2 by 2 blocks (see the sketch below).
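SciPy exposes a symmetric indefinite factorization of exactly this kind; a small illustrative sketch (the matrix below is made up, not from the slides):

    import numpy as np
    from scipy.linalg import ldl

    # Hypothetical symmetric indefinite KKT-style matrix.
    K = np.array([[ 2.0,  0.0,  1.0],
                  [ 0.0, -1.0,  1.0],
                  [ 1.0,  1.0,  0.0]])

    # Bunch-Kaufman style factorization: K = lu @ d @ lu.T,
    # where d is block-diagonal with 1x1 and 2x2 blocks.
    lu, d, perm = ldl(K, lower=True)
    print(np.allclose(lu @ d @ lu.T, K))       # True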

22
Nonconvex NLP Mathematical Problems
  • The model may have several locally optimal
    solutions.
  • The constraints may have infeasible points with
    local minima for various measures of
    infeasibility, even if the model is feasible.
  • The direction derived from the KKT system may not
    be a descent direction.
  • No reason to expect fast convergence for interior
    point methods.

23
More Technical Difficulties
  • Extra proximity terms can be added to D to
    make the approximate model more convex and force
    descent directions.
  • Inertia methods can be used to test if H+D
    projected on the constraints is positive
    definite.
  • Changes in D require re-factorization.
  • Sparse ordering cannot be done a priori.
    Re-factorization is expensive.

24
Software based on IP Methods
  • For General NLP
  • LOQO: Shanno, Vanderbei, Benson, ...
  • IPOPT: Wächter, Biegler.
  • KNITRO: Byrd, Nocedal, Waltz, ...
  • For Convex NLP
  • Mosek
  • For Convex QP and Convex Quadratic Constraints
  • Cplex with barrier and QP option

25
IP Software LOQO
  • Adds a proximity term to the constraints, giving
    the KKT matrix an extra nonzero block in the lower
    right-hand corner. This should improve stability.
  • Factorization uses only 1 by 1 pivots (positive
    and negative).
  • User can select ordering as primal-first or
    dual-first. Standard ordering methods are used
    within each group.

26
IP Software IPOPT
  • Freely available via COIN-OR.
  • Based on 3rd party factorization software
    (Harwell Subroutine Library).

27
IP Software KNITRO
  • Based on 3rd party factorization software
    (Harwell Subroutine Library).
  • Commercially available from Ziena Optimization.
  • Available with AMPL, GAMS, and stand-alone.
  • Has alternative active set methods based on LP
    methods for guessing active constraints and
    equality constrained QPs for search directions.
  • Can use Conjugate Gradients instead of
    factorization for solving KKT system.

28
Comparison of IP Software
  • LOQO, IPOPT, and KNITRO are mostly used with
    AMPL.
  • Published comparisons based on the CUTEr set of
    models show the solvers to be similar; ratings
    change frequently.
  • Difficulties are related to:
  • Slow Analyze/Factorize for the KKT matrix.
  • Newton steps are not good => very many iterations
    with short steps on some non-convex models.
  • Several CUTEr models seem to be constructed to
    hurt the competition.

29
Active Set Methods
  • The variables are partitioned into 3 sets:
  • At Lower Bound (LB), At Upper Bound (UB), and
    Between Bounds (BB).
  • The dual constraints are as a consequence
    partitioned into:
  • binding as ≥ (for LB), binding as ≤ (for UB),
    and binding as = (for BB).
  • Fixed variables and non-binding constraints are
    temporarily ignored, a smaller model is used to
    determine a search direction, and the sets are
    updated.

30
Active Set Methods
31
Active Set Methods
32
Active Set Methods
33
Active Set Methods
34
Active Set Methods
  • The Active Set Optimality Conditions
  • Have Fewer Variables
  • Have Fewer Dual Constraints and only Equalities
  • Are Symmetric
  • A direction Δx, Δy is computed based on the
    smaller system (sketched below) and a step is made.
  • The sets BB, LB, and UB are updated if necessary.
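A sketch of that smaller system in standard active-set form (a reconstruction; the subscript BB marks the rows/columns belonging to the between-bound variables, J_BB the corresponding columns of the Jacobian, and the right-hand side residuals r_d, r_p are not given on the slides):

\[
\begin{pmatrix} H_{BB} & J_{BB}^T \\ J_{BB} & 0 \end{pmatrix}
\begin{pmatrix} \Delta x_{BB} \\ \Delta y \end{pmatrix}
= \begin{pmatrix} r_d \\ r_p \end{pmatrix},
\]

with the variables at LB/UB held fixed, so only equality rows remain and the system is symmetric.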

35
Active Set Strategies
  • Dual Equalities with Accurate or Approximate H?
  • Many active set methods date from a period when
    2nd derivatives were not easily available.
  • Reduced Hessians (see later) may be cheaper to
    work with than complete Hessians.
  • How are the Active Set Optimality Conditions
    solved?
  • Direct factorization or partitioning methods.
  • Updating or Refactorization.

36
Active Set Strategies
  • How is the step Δx, Δy used?
  • Minor iterations using the same linearized KKT
    system.
  • Major iterations using accumulated Δx, Δy from
    minor iterations.
  • Steps at Major iterations use some Merit Function
    (usually based on primal information only).
  • When are the sets BB, LB, and UB updated?
  • Variables must be moved from BB to LB or UB when
    they reach a bound.
  • Variables can be moved from LB or UB to BB when
    dual inequalities are violated. Number of
    variables to move and test frequency vary.

37
Active Set Optimality Conditions
  • Direct Factorization
  • The Symmetric Indefinite system is factorized as
    L·D·Lᵀ where D is block-diagonal with 1 by 1
    and 2 by 2 blocks and L is lower triangular.
  • Updates when LB, BB, and UB change.

38
Active Set Optimality Conditions
  • Partitioning
  • J is partitioned into B, a non-singular Basis,
    and S, the Superbasic part.
  • B and Bᵀ are used as Block Pivots (sketched below).
  • The B-parts of H are eliminated.
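With the partitioning J = [B S] (a standard sketch; r_p denotes the primal residual, which is not given on the slide), the block pivot on B works as:

\[
J \, \Delta x = r_p, \qquad J = [\, B \;\; S \,]
\;\Longrightarrow\;
\Delta x_B = B^{-1}\big(r_p - S \, \Delta x_S\big),
\]

so the basic components Δx_B are eliminated, leaving a problem in the superbasic components Δx_S only.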

39
Active Set Optimality Conditions
  • B⁻¹S is not formed explicitly.
  • The Reduced Hessian, RH_SS, is complicated,
    symmetric, and dense (see the sketch below).
  • RH_SS can be approximated using Quasi-Newton
    methods.
  • Simplex-like updates are used for B⁻¹, projection
    updates are used for RH_SS.
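In standard null-space notation (a reconstruction; Z is the usual null-space basis of the linearized constraints built from B and S):

\[
Z = \begin{pmatrix} -B^{-1}S \\ I \end{pmatrix},
\qquad
RH_{SS} = Z^T H Z
= H_{SS} - H_{SB} B^{-1} S - S^T B^{-T} H_{BS} + S^T B^{-T} H_{BB} B^{-1} S,
\]

which shows why RH_SS is dense even when H and B are sparse, and why a Quasi-Newton approximation is attractive.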

40
Software based on Active Sets
  • MINOS and SNOPT, Saunders, Gill, ...
  • CONOPT, Drud
  • Many others, especially based on dense linear
    algebra for small-scale models

41
SNOPT
  • How are the Active Set Optimality Conditions
    solved?
  • Partitioning Primal constraints are satisfied
    accurately using Basis and Simplex technology.
    Phase-1 procedure.
  • The overall Hessian is not used -- the Reduced
    Hessian is approximated using Quasi-Newton methods
    with updates between Major iterations.
  • The Reduced Hessian is stored as a dense lower
    triangular matrix => space can be a problem.
  • Minor/Major iterations.

42
SNOPT
  • How is the step Δx, Δy used?
  • Standard minor/major iteration approach.
  • The Merit function needs an accurate solution to
    the primal constraints to guarantee descent.
  • How are the sets BB, LB, and UB updated?
  • Standard use of Basis / Superbasis.
  • B is updated as in Simplex when the basis changes.
  • RH is projected when the basis changes.
  • A diagonal element is added when RH is enlarged.

43
CONOPT
  • How are the Reduced Optimality Conditions solved?
  • Partitioning Primal constraints are satisfied
    accurately using Basis and Simplex technology (as
    SNOPT).
  • A special fast SLP phase with H = 0 is used when
    it is considered acceptable, i.e. when bounds are
    more important than curvature.
  • Uses either one Hessian evaluation per major
    iteration or one Hessian-vector evaluation per
    minor iteration. Selection based on
    availability and expected work.

44
CONOPT
  • How are the Reduced Optimality Conditions solved
    (continued)?
  • Reduced Hessian approximated using Quasi-Newton
    methods (as SNOPT), but updated at appropriate
    minor iterations.
  • Conjugate Gradients are used in the x_S space when
    the Reduced Hessian becomes too large. Diagonal
    preconditioning is used if available.
  • Dual equalities are solved using minor
    iterations; these stop when the expected change in
    the objective is good enough.

45
CONOPT
  • How is Δx, Δy used?
  • Δx and Δy are accumulated over minor iterations
    (as in SNOPT).
  • Δx_S is used directly. Δx_B is used as the initial
    point in a Newton process with B⁻¹ to ensure
    feasibility of the actual Primal constraints (as
    opposed to the linearized Primal constraints);
    see the sketch below.
  • Newton Process requires many primal function
    evaluations.
  • Step is determined using actual objective
    function (Primal feasibility means that Merit
    function is not needed).
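A sketch of that restoration step (standard notation reconstructed from the slides; x_B are the basic and x_S the superbasic variables):

\[
x_S \leftarrow x_S + \Delta x_S \ \text{(held fixed)}, \qquad
x_B^{(k+1)} = x_B^{(k)} - B^{-1}\Big(g\big(x_B^{(k)}, x_S\big) - b\Big),
\]

iterated until the actual primal constraints g(x) = b are satisfied; each iteration needs a fresh evaluation of g, which is why the process requires many primal function evaluations.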

46
SNOPT
(Figure: SNOPT takes a step along the tangent plane from the initial point to the final point; a merit function weighs progress in the objective against violation of the constraint.)
47
CONOPT
(Figure: CONOPT moves along the tangent plane from the initial point, a Newton process restores feasibility with respect to the constraint, and the true objective is tested in the linesearch.)
48
CONOPT
  • How are the sets BB, LB, and UB updated?
  • Similar to SNOPT but details are different.
  • More aggressive movement of variables from LB/UB
    to BB, motivated by cheaper and more frequent
    updates of Reduced Hessian (every minor
    iteration).

49
Comparison SNOPT/CONOPT
  • Models with fairly linear constraints are good
    for SNOPT.
  • CONOPT's SLP procedure is good for easy models.
  • The Newton process in CONOPT is costly but useful
    for very nonlinear models => CONOPT is more
    reliable.
  • Many superbasics cannot be handled by SNOPT
    (space).
  • Large unpredictable variation in relative
    solution time.

50
Comparison IP/Active Set
  • Active Set Methods usually good for models with
    few degrees of freedom, Interior Point for models
    with many degrees of freedom.
  • Relative solution times can vary by more than a
    factor of 100 in both directions.
  • Our ability to predict which solver will be best
    is fairly limited.

51
GAMS/Globallib performance profile, ignoring the
objective value (figure)
52
GAMS/Globallib performance profile, best objective
only (figure)
53
Limitations for NLP models
  • Models with over 100,000 variables and equations
    have been solved, but models with 10,000 can be
    unsolvable.
  • Most models below 1000 variables/equations should
    be solvable unless poorly formulated.
  • We can usually iterate or move on very large
    models (500,000 variables/equations), at least for
    Active Set methods. Convergence is the problem.
  • We have no way to predict whether a model can be
    solved or not.

54
Personal Favorites
  • Do easy things with cheap algorithms.
  • Recognize when in trouble and switch algorithm.
  • Change from Active Set to Interior Point.
  • Easy use of multi-processors: run two different
    solvers in parallel.

55
Conclusions
  • We can solve many fairly large NLP models.
  • Solution times are often over 10 times larger
    than for comparable LPs.
  • Reliability is considerably lower than for LP.
  • Good model formulation is very important for
    reliability and speed.
  • Large variation between solvers. No good
    automatic solver selection.
  • Future: Linear Algebra, Numerical Analysis,
    Pre-processing, and better model formulation.