High-Performance Optimization - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
High-Performance Optimization
  • Mark P. Wachowiak, Ph.D.
  • Department of Computer Science and Mathematics
  • Nipissing University
  • June, 2007

2
Optimization
  • Find the 'best', or optimum, value of an objective (cost) function.
  • Very large research area.
  • Multitude of applications:
  • Economics and finance.
  • Engineering.
  • Manufacturing.
  • Management.
  • Chemistry.

3
Limited Impact of Optimization
  • Computational difficulties
  • High computational cost (especially G.O.)
  • Plethora of methods (which to use?)
  • Disconnect between CAD and CAE
  • Realism of the underlying model
  • Management resistance

http://www.cas.mcmaster.ca/mopta/mopta01/abstracts.html#jones
4
Speedup and Efficiency
5
Response Surface
  • Assume a deterministic function of D variables at
    n points.
  • Denote sampled point i by xi = (xi1, …, xiD), i = 1, …, n.
  • The associated function value is yi = y(xi).

Regis and Shoemaker, 2007
6
Response Surface
Linear regression model: normally distributed error terms N(0, σ²); unknown coefficients to be estimated; basis functions that are linear or nonlinear functions of x.
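A sketch of the regression model implied by these labels; the exact notation of the original slide is not preserved in the transcript, and the N(0, σ²) parametrization and basis functions f_j are assumed standard choices.

```latex
% Response-surface regression model (assumed standard form):
% f_j: basis functions (linear or nonlinear functions of x);
% beta_j: unknown coefficients to be estimated;
% eps_i: normally distributed error terms.
y(x_i) = \sum_{j=1}^{k} \beta_j \, f_j(x_i) + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^{2}), \qquad i = 1, \ldots, n.
```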
7
2D Response Surface
[Figure: response f(x) plotted against x for d = 4, 6, and 8.]
8
Radial Basis Functions
Space of polynomials in D variables of degree ≤ m.
9
Radial Basis Functions
[Equation: radial basis function definitions, valid for x ≥ 0.]
10
Radial Basis Functions
11
Radial Basis Functions
  • The RBF that interpolates (xi, f(xi)) for i = 1, …, n is obtained by solving the system sketched below.
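The linear system itself is not preserved in the transcript. Below is a sketch of the standard RBF interpolation system (as used, e.g., by Gutmann, 2001, and Regis and Shoemaker, 2007); the symbols Φ, P, λ, c, and F are introduced here for illustration.

```latex
% Sketch of the standard RBF interpolation system (assumed form).
% s(x): interpolant;  phi: radial function;  p: polynomial of degree <= m;
% Phi_{ij} = phi(||x_i - x_j||);  P: matrix of polynomial basis terms at the x_i;
% lambda: RBF coefficients;  c: polynomial coefficients;  F = (f(x_1), ..., f(x_n))^T.
s(x) = \sum_{i=1}^{n} \lambda_i \,\phi\bigl(\lVert x - x_i \rVert\bigr) + p(x),
\qquad
\begin{pmatrix} \Phi & P \\ P^{T} & 0 \end{pmatrix}
\begin{pmatrix} \lambda \\ c \end{pmatrix}
=
\begin{pmatrix} F \\ 0 \end{pmatrix}.
```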

12
Determining New Points
  • Resample in areas having promising function
    values as predicted by the model (Sóbester et
    al., 2004).
  • The next point selected maximizes the expected
    improvement in the objective function (Jones et
    al. 1998).
  • Search in areas of high predicted approximation
    error, using stochastic techniques (Sóbester et
    al., 2004).
  • Select a guess fn for the global minimum, and choose a new point xnew such that a function snew interpolating all (xi, fi) and (xnew, fn) optimizes some quality measure (e.g., minimizes the bumpiness of snew) (Gutmann, 2001; Regis and Shoemaker, 2007).

13
Constrained Optimization using Response Surfaces
  • Regis and Shoemaker, 2005.
  • Choose points so that they are at a maximum
    distance from previously-evaluated points.
  • Define Δn as the maximum distance between any unevaluated point and the set of previously-evaluated points.

14
Constrained Optimization using Response Surfaces
(2)
  • Let βn ∈ [0, 1] denote a user-defined distance factor.
  • The next point chosen is required to have a distance of at least βnΔn from all previously-evaluated points.
  • This point solves the constrained minimization subproblem sketched below.
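The subproblem itself is not preserved in the transcript; the following sketch shows its usual form in Regis and Shoemaker (2005), where sn denotes the current response-surface model, Δn the maximum distance defined above, and the domain symbol D is introduced here for illustration.

```latex
% Sketch of the auxiliary subproblem (assumed standard form).
% s_n: current response-surface model built from the n evaluated points;
% Delta_n = max_{x in D} min_i ||x - x_i||: maximum distance from the evaluated points.
\min_{x \in \mathcal{D}} \; s_n(x)
\quad \text{subject to} \quad
\lVert x - x_i \rVert \;\ge\; \beta_n \Delta_n, \qquad i = 1, \ldots, n.
```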
15
Auxiliary Subproblem
  • Auxiliary subproblem is generally convex.
  • Solved with
  • Gradient-based optimization software.
  • Multistart nonlinear programming solver.
  • Global optimization techniques, such as
    constrained DIRECT.

16
  • Suppose that xi, i = 1, …, n are previously evaluated points.

17
Unconstrained Nonlinear Optimization
  • Unconstrained minimization: find x that minimizes a function f(x).

18
Bound-Constrained Nonlinear Optimization
  • Find x that minimizes a function f(x),
  • subject to simple bounds on the variables (see below).
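The bounds themselves are missing from the transcript; in the usual formulation they are elementwise lower and upper bounds l and u (symbols assumed here):

```latex
% Bound-constrained minimization (assumed standard form):
\min_{x \in \mathbb{R}^{D}} f(x)
\quad \text{subject to} \quad
l_i \le x_i \le u_i, \qquad i = 1, \ldots, D.
```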

19
General Constrained Nonlinear Optimization
  • Constrained minimization: find x that minimizes f(x),
  • subject to equality and inequality constraints (see the sketch below).
  • f(x) and/or the constraint functions ci(x) are nonlinear.
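A sketch of the standard form implied by the equality/inequality constraint labels; the index sets E and I are introduced here for illustration.

```latex
% General constrained nonlinear program (assumed standard form):
\min_{x \in \mathbb{R}^{D}} f(x)
\quad \text{subject to} \quad
c_i(x) = 0, \; i \in \mathcal{E} \ \text{(equality constraints)}, \qquad
c_j(x) \le 0, \; j \in \mathcal{I} \ \text{(inequality constraints)}.
```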
20
Nonlinear Equations Problem
  • Determine a root of a system of D nonlinear
    equations in D variables
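The system itself is not shown in the transcript; its generic form is:

```latex
% System of D nonlinear equations in D variables:
F(x) = \bigl( f_1(x), f_2(x), \ldots, f_D(x) \bigr)^{T} = 0, \qquad x \in \mathbb{R}^{D}.
```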

21
Example
22
Definitions
  • Objective/cost function
  • Search space
  • Minimizer
  • Global optimization
  • Local optimization
  • Convergence criteria
  • Stationary point
  • Deterministic method
  • Stochastic method
  • Multiobjective optimization
  • Constrained
  • Unconstrained

23
Parallel Computing Terminology
  • Coarse-grained
  • Fine-grained
  • Distributed memory
  • Shared memory

24
Journals on Optimization
  • SIAM Journal on Optimization

25
Parallel Optimization
  • Schnabel: three levels for introducing parallelism:
  • Parallelize the function, gradient, and constraint evaluations.
  • Parallelize the numerical computations (linear algebra) and numerical libraries.
  • Parallelize the optimization algorithm at a high level, i.e., adopt new optimization approaches.

26
Combinatorial Optimization
  • A linear or nonlinear function is defined over a
    very large finite set of solutions.
  • Categories
  • Network problems.
  • Scheduling.
  • Transportation.

27
Combinatorial Optimization (2)
  • If the function is piecewise linear, it can be optimized exactly with mixed-integer programming methods using branch and bound.
  • Approximate solutions can be obtained with heuristic methods:
  • Simulated annealing.
  • Tabu search.
  • Genetic algorithms.

28
General Unconstrained Problems
  • A nonlinear function is defined over the real
    numbers.
  • The function is not subject to constraints, or
    has simple bound constraints.
  • Partitioning strategies utilize a priori knowledge of how rapidly the function varies (e.g., the Lipschitz constant).
  • Interval methods can be used if the objective
    function can be expressed analytically.

29
General Unconstrained Problems Statistical
Methods
  • Stochastic in nature.
  • Use partitioning to decompose the search space.
  • Use a priori information or assumptions to model
    the objective function.
  • Usually inexact.

30
General Unconstrained Problems Statistical
Methods Examples
  • Simulated annealing.
  • Genetic algorithms.
  • Continuation methods:
  • Transform the function into a smoother function with fewer local minimizers.
  • Apply a local minimization procedure to trace the minimizers back to the original function.

31
General Constrained Problems
  • A nonlinear function is defined over the real
    numbers.
  • The optimization is subject to constraints.
  • These problems are usually solved by adapting
    techniques for unconstrained problems to address
    constraints.

32
Steepest Descent
  • Derivative-based local optimization method.
  • Simple to implement.

Update: xk+1 = xk − αk∇f(xk), where xk is the current x, xk+1 is the x in the next iteration, αk is the step size, and ∇f(xk) is the gradient of f(x) at the current x.
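A minimal Python sketch of the update described above; the quadratic test function, fixed step size, and stopping tolerance are illustrative choices, not part of the original slide.

```python
import numpy as np

def steepest_descent(f_grad, x0, step=0.1, tol=1e-6, max_iter=1000):
    """Steepest descent: repeatedly step against the gradient of f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = f_grad(x)                  # gradient of f(x) at the current x
        if np.linalg.norm(g) < tol:    # stop when the gradient is (nearly) zero
            break
        x = x - step * g               # x in the next iteration
    return x

# Example: minimize f(x) = (x0 - 1)^2 + 2*(x1 + 3)^2 (gradient supplied analytically)
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])
print(steepest_descent(grad, x0=[0.0, 0.0]))   # approaches (1, -3)
```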
33
Steepest Descent Disadvantages
  • Unreliable convergence properties.
  • The new gradient at the minimum point of any line
    minimization is orthogonal to the direction just
    traversed.
  • ⇒ Potentially many small steps are taken, leading to slow convergence.

34
Conjugate Gradient Methods
  • Assume that both f(x) and ∇f(x) can be computed.
  • Assume that f can be approximated as a quadratic form.
  • ⇒ The optimality condition follows (see the sketch below).
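The quadratic form and the optimality condition are not preserved in the transcript; the standard statements (symmetric matrix A, vector b, and constant c assumed) are:

```latex
% Quadratic model of f (assumed standard form):
f(x) \approx \tfrac{1}{2}\, x^{T} A x \;-\; b^{T} x \;+\; c
% Optimality condition: the gradient vanishes, which reduces to a linear system.
\nabla f(x) = A x - b = 0 \quad \Longleftrightarrow \quad A x = b.
```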

35
Conjugates
  • Given that A is a symmetric matrix, two vectors g and h are said to be conjugate w.r.t. A if gᵀAh = 0.
  • Orthogonal vectors are a special case of conjugate vectors: gᵀh = 0.
  • Therefore, the solution to the n × n quadratic problem can be written as a combination of n mutually conjugate directions.

36
Conjugate Gradient Method
  • Select successive direction vectors as a
    conjugate version of the successive gradients
    obtained as the method progresses.
  • Conjugate directions are generated as the
    algorithm proceeds.

http://www.srl.gatech.edu/education/ME6103/NLP-Unconstrained-Multivariable.ppt
37
Conjugate Gradient Method (2)
Choose x0; compute g0 = ∇f(x0); set h0 = −g0.
Using a line search, find αk that minimizes f(xk + αkhk).
Set xk+1 = xk + αkhk.
Stopping criteria met? If YES, return x. If NO, compute gk+1 = ∇f(xk+1), βk (Fletcher-Reeves or Polak-Ribière formula), and the new conjugate direction hk+1 = −gk+1 + βkhk, then repeat from the line search step.
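A minimal Python sketch of the loop above, using scipy's scalar minimizer for the line search and the Fletcher-Reeves β (the Polak-Ribière variant is shown as a comment). The test function and tolerances are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, tol=1e-6, max_iter=200):
    """Nonlinear conjugate gradient with a Fletcher-Reeves beta update."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    h = -g                                           # h0 = -g0
    for _ in range(max_iter):
        # Line search: find alpha_k that minimizes f(x_k + alpha_k * h_k)
        alpha = minimize_scalar(lambda a: f(x + a * h)).x
        x = x + alpha * h                            # x_{k+1} = x_k + alpha_k h_k
        g_new = grad(x)
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ g_new) / (g @ g)             # Fletcher-Reeves
        # beta = g_new @ (g_new - g) / (g @ g)       # Polak-Ribiere alternative
        h = -g_new + beta * h                        # new conjugate direction
        g = g_new
    return x

# Example: Rosenbrock function in 2-D
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(conjugate_gradient(f, grad, x0=[-1.2, 1.0]))   # approaches (1, 1)
```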
38
Branch and Bound
39
Clustering Methods
  • Used for unconstrained real-valued functions.
  • Multistart procedure: local searches are performed from multiple points distributed over the search space.
  • The same local minimum may be identified by multiple points. Clustering methods attempt to avoid this inefficiency by careful selection of the points at which the local search is initiated.

40
Clustering Methods General Algorithm
  • Sample points in the search space.
  • Transform the sampled points to group them around
    the local minima.
  • Apply a clustering technique to identify groups
    representing convergence basins of local minima.

41
Clustering Methods Disadvantages
  • Because a potentially large number of points are
    randomly sampled to identify the clusters in
    neighborhoods of local minima, the objective
    function must be relatively inexpensive to
    compute.
  • Clustering methods are not suited to
    high-dimensional problems (more than a few
    hundred variables).
  • However, coarse-grained parallelism may be useful
    in improving efficiency.

42
Genetic Algorithms and Evolutionary Strategies

43
Genetic Algorithms
Initialize the population.
Evaluate the function values of the population.
Perform competitive selection.
Generate new solutions from genetic operators.
Stopping criteria met? If NO, repeat from the evaluation step; if YES, END.
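A minimal real-coded genetic algorithm sketch following the loop above; the population size, tournament selection, blend crossover, and Gaussian mutation are illustrative choices, not prescribed by the slide.

```python
import numpy as np

def genetic_algorithm(f, bounds, pop_size=40, generations=100,
                      mutation_rate=0.1, seed=0):
    """Minimize f over a box via selection, crossover, and mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))        # initialize the population
    for _ in range(generations):
        fitness = np.array([f(ind) for ind in pop])        # evaluate function values
        new_pop = []
        for _ in range(pop_size):
            # Competitive (tournament) selection of two parents
            i, j = rng.integers(pop_size, size=2)
            p1 = pop[i] if fitness[i] < fitness[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            p2 = pop[i] if fitness[i] < fitness[j] else pop[j]
            # Genetic operators: blend crossover followed by Gaussian mutation
            w = rng.uniform(size=dim)
            child = w * p1 + (1 - w) * p2
            mask = rng.uniform(size=dim) < mutation_rate
            child[mask] += rng.normal(scale=0.1 * (hi - lo), size=dim)[mask]
            new_pop.append(np.clip(child, lo, hi))
        pop = np.array(new_pop)
    fitness = np.array([f(ind) for ind in pop])
    return pop[np.argmin(fitness)]

# Example: minimize the sphere function on [-5, 5]^2
print(genetic_algorithm(lambda x: np.sum(x**2), bounds=[(-5, 5), (-5, 5)]))
```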
44
(No Transcript)
45
Applications of Optimization
Engineering design
Business and industry
Biology and medicine
Radiotherapy planning
Economics
Design of materials
Manufacturing design
Bioinformatics / Proteomics
Finance
Systems biology
Management
Simulation and modeling
Image registration
46
Optimization in Biology and Biomedicine
Engineering design
Fitting models to experimental data
Optimizing relevant functions
Radiotherapy planning
Design of biomaterials
Soft tissue biomechanics
Biomechanics
Bioinformatics / Proteomics
Systems biology
Biomedical Imaging
Biochemical pathways
47
Global and local optimization
48
Global and local optimization
49
Local Optimization
Start
50
Local Optimization
End
51
Global Optimization
Start
52
Global Optimization
End
53
Global Optimization
  • Relatively new vis-à-vis local methods.
  • Many important objective functions are:
  • Non-convex.
  • Irregular.
  • Such that derivatives are not available or cannot be easily computed.
  • The high computational cost of many global methods precludes their use in time-critical applications.

54
Optimization Techniques
Classification axes: Global vs. Local; Stochastic vs. Deterministic; Derivative-based vs. Derivative-free.
Methods: simulated annealing, gradient descent, Nelder-Mead simplex, interval analysis, genetic algorithms, trust region methods, Powell's direction set, homotopy, evolutionary strategies, Newton-based methods, pattern search, particle swarm, DIRECT, multidirectional search.
55
DIRECT
  • Relatively new method for bound-constrained
    global optimization (Jones et al., 1993).
  • Lipschitzian approach, but the Lipschitz constant
    need not be explicitly specified.
  • Balance of global and local search.
  • Inherently parallel.

Wachowiak and Peters, IEEE TITB 2006. Dru,
Wachowiak, and Peters, SPIE Medical Imaging 2006.
56
DIRECT Algorithm
Normalize the search space.
Sample at points around the centers, along dimensions of maximum side length.
Evaluate function values at the sampled points.
Divide rectangles according to their function values.
Group rectangles by their diameters.
Identify potentially optimal rectangles.
Convergence? If NO, repeat from the sampling step; if YES, END.
57
Potentially Optimal Hyperboxes
Potentially optimal HBs have centers that define
the bottom of the convex hull of a scatter plot
of rectangle diameters versus f(xi) for all HB
centers xi.
[Figures: potentially optimal hyperboxes identified with ε = 0.001 and with ε = 0.1.]
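A Python sketch of how potentially optimal hyperboxes could be identified from the scatter of (diameter, function value) pairs, using the two usual conditions from Jones et al. (existence of a positive rate-of-change constant K, plus the ε-based improvement over the current best). The function name and data layout are illustrative, not from the slides.

```python
import numpy as np

def potentially_optimal(diams, fvals, eps=1e-4):
    """Indices of potentially optimal hyperboxes in DIRECT.

    Box j is potentially optimal if some K > 0 satisfies
        f_j - K*d_j <= f_i - K*d_i  for every box i, and
        f_j - K*d_j <= f_min - eps*|f_min|   (sufficient improvement over the best).
    These centers lie on the lower-right convex hull of the (diameter, f) scatter plot.
    """
    diams = np.asarray(diams, dtype=float)
    fvals = np.asarray(fvals, dtype=float)
    f_min = fvals.min()
    selected = []
    for j in range(len(fvals)):
        smaller, larger = diams < diams[j], diams > diams[j]
        if fvals[j] > fvals[diams == diams[j]].min():
            continue                                   # not best among equal-size boxes
        k_lo = 0.0                                     # K must exceed slopes to smaller boxes
        if smaller.any():
            k_lo = np.max((fvals[j] - fvals[smaller]) / (diams[j] - diams[smaller]))
        k_hi = np.inf                                  # ...and stay below slopes to larger boxes
        if larger.any():
            k_hi = np.min((fvals[larger] - fvals[j]) / (diams[larger] - diams[j]))
        k_eps = (fvals[j] - f_min + eps * abs(f_min)) / diams[j]
        if max(k_lo, k_eps, 0.0) <= k_hi:
            selected.append(j)
    return selected

# Example: three diameter groups; a larger eps demands more improvement (cf. eps = 0.001 vs. 0.1)
print(potentially_optimal(diams=[0.1, 0.1, 0.3, 0.3, 0.5], fvals=[2.0, 1.0, 1.5, 3.0, 2.5]))
```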
58
DIRECT Step 1
Normalize the search space and evaluate the
center point of an n-dimensional rectangle.
59
DIRECT Step 2
Evaluate points around the center.
60
DIRECT Step 3
Divide the rectangle according to the function
values.
61
Division of the Search Space
Potentially optimal rectangles with these
centres are divided in the next iteration.
Annotations: estimate of the Lipschitz constant; locality parameter ε.
62
DIRECT Step 4
Use Lipschitz conditions and the rectangle
diameters to determine which rectangles should be
divided next.
63
DIRECT Iteration
Repeat steps 2-4 (evaluate, find potentially optimal rectangles, divide).
64
DIRECT Convergence
Based on rectangle clustering, number of
non-improving iterations, and function value
tolerance.
65
Particle Swarm Optimization (PSO)
  • Relatively new GO technique (Kennedy and Eberhart, 1995).
  • Iterative, population-based GO method.
  • Based on a co-operative model of individual
    (particle) interactions.
  • In contrast to the generally competitive model of
    genetic algorithms and evolutionary strategies.

66
Particle Swarm Optimization
  • Each individual, or particle, has memory of the
    best position it found so far (best response),
    and the best position of any particle.
  • The next move of the particle is determined by
    its velocity, v, which is a function of its
    personal best and global best locations.

67
Position Update in PSO
Velocity update for particle j, parameter i, at the (t+1)-th iteration:
vj,i(t+1) = w·vj,i(t) + C1·φ1·(pj,i − xj,i(t)) + C2·φ2·(gi − xj,i(t)),
where w·vj,i(t) carries the previous velocity, the C1 term is the effect of the best position found by particle j (personal best), and the C2 term is the effect of the best position found by any particle during the entire search (global best).
Position update: xj,i(t+1) = xj,i(t) + vj,i(t+1).
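A minimal Python sketch of the update equations above; the swarm size, parameter values, and test function are illustrative choices, not taken from the slides.

```python
import numpy as np

def particle_swarm(f, bounds, n_particles=30, iters=100, w=0.7,
                   c1=1.5, c2=1.5, seed=0):
    """Global minimization of f over a box using the basic PSO update."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))        # particle positions
    v = np.zeros_like(x)                                    # particle velocities
    v_max = 0.2 * (hi - lo)                                 # maximum velocity
    p_best, p_val = x.copy(), np.array([f(xi) for xi in x]) # personal bests
    g_best = p_best[p_val.argmin()].copy()                  # global best
    for _ in range(iters):
        phi1 = rng.uniform(size=(n_particles, dim))         # stochastic factors
        phi2 = rng.uniform(size=(n_particles, dim))
        # Velocity: inertia + personal-best term + global-best term
        v = w * v + c1 * phi1 * (p_best - x) + c2 * phi2 * (g_best - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lo, hi)                          # position update
        vals = np.array([f(xi) for xi in x])
        improved = vals < p_val                             # update personal bests
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()              # update global best
    return g_best, p_val.min()

# Example: minimize the Rastrigin function on [-5.12, 5.12]^2
rastrigin = lambda x: 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(particle_swarm(rastrigin, bounds=[(-5.12, 5.12)] * 2))
```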
68
PSO Parameters
  • Maximum velocity vmax
  • Prevents particles from moving outside the region
    of feasible solutions.
  • Inertial weight w
  • Determines the degree to which the particle
    should stay in the same direction as the last
    iteration.
  • C1, C2
  • Control constants
  • φ1, φ2
  • Random numbers that add stochasticity to the update.

69
Particle Swarm Example
[Figures: particle positions at iterations 1, 5, 10, 20, 30, and the last iteration (41).]
70
Proposed Adaptation of PSO to Biomedical Image
Registration
  • Incorporation of a constriction factor c (Clerc and Kennedy, 2002).
  • Registration functions using generalized similarity metrics and function stretching (Parsopoulos et al., 2001) in the global search.
  • Addition of a new memory term, that of the initial orientation, as the initial position is usually carefully chosen.

71
Proposed Adaptation of PSO to Biomedical Image
Registration
Terms in the modified velocity update: constriction factor; previous velocity; personal best term; global best term; initial orientation term (based on the initial orientation).
φ1, φ2, φ3: control parameters. u1, u2, u3: uniformly distributed random numbers.
72
Proposed Adaptation of PSO to Biomedical Image
Registration
73
Parameter Estimation
  • Given a set of experimental data, calibrate the
    model so as to reproduce the experimental results
    in the best possible way (Moles et al., 2003).
  • Inverse problem.
  • Solutions to these problems are instrumental in the development of dynamic models that promote functional understanding at the systems level.

74
Parameter Estimation (2)
  • Goal Minimize an objective function that
    measures the goodness of the fit of the model
    with respect to a given experimental data set,
    subject to the dynamics of the system
    (constraints).

http://www.genome.org/cgi/content/full/13/11/2467
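A sketch of a typical (e.g., least-squares) formulation of this inverse problem; the exact objective used on the original slide is not preserved in the transcript, and the symbols below are introduced for illustration.

```latex
% Parameter estimation as a dynamically constrained least-squares problem (assumed form).
% p: model parameters;  y(t; p): model prediction;  y~_i: experimental data at time t_i;
% the constraints are the system dynamics (e.g., an ODE model).
\min_{p} \; J(p) = \sum_{i=1}^{N} \bigl\lVert y(t_i; p) - \tilde{y}_i \bigr\rVert^{2}
\quad \text{subject to} \quad
\dot{x} = g\bigl(x(t), p, t\bigr), \qquad y = h\bigl(x(t), p\bigr).
```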
75
Example Biochemical Pathways
  • Nonlinear programming problem with differential
    algebraic constraints.
  • The problems are frequently nonconvex and
    multimodal. Therefore, global methods are
    required.
  • Multistart strategies do not work in this case: multiple starts converge to the same local minimum.

76
Biochemical Pathways (2)
77
Grid Computing
78
Parallel Optimization
Coarse-grained parallelism: DIRECT. Fine-grained parallelism: Powell, MDS (multidirectional search).
Steps in each similarity-metric evaluation: compute the transformation matrix; apply the transformation to all voxels; determine valid coordinates; interpolate image intensities; compute marginal and joint densities; calculate the similarity metric.
Wachowiak and Peters, IEEE TITB 2006. Wachowiak
and Peters, IEEE HPCS 2005, 50-56.
79
Optimization Software
  • OPT++ (http://hpcrd.lbl.gov/~meza/projects/opt++/OPT++doc/html/index.html)

80
References
  • http://www.cs.sandia.gov/opt/survey/