High-Performance and Parallel Optimization - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: High-Performance and Parallel Optimization

1
High-Performance and Parallel Optimization
  • Mark P. Wachowiak, Ph.D.
  • Department of Computer Science and Mathematics
  • Nipissing University
  • June 3, 2007

2
Objectives
  • Brief overview of optimization and applications.
  • Parallel computing and parallel optimization.
  • Specific examples of parallel optimization.
  • Future directions.

3
Optimization
  • Find the best, or optimum value of an objective
    (cost) function.
  • Very large research area.
  • Multitude of applications.

4
Journals on Optimization
  • SIAM Journal on Optimization

5
Applications of Optimization
Engineering design
Business and industry
Biology and medicine
Radiotherapy planning
Economics
Design of materials
Manufacturing design
Bioinformatics and proteomics
Finance
Systems biology
Management
Simulation and modeling
Image registration
6
Optimization in Biology and Biomedicine
Engineering design
Fitting models to experimental data
Optimizing relevant functions
Radiotherapy planning
Design of biomaterials
Soft tissue biomechanics
Biomechanics
Bioinformatics and proteomics
Systems biology
Biomedical Imaging
Biochemical pathways
7
Optimization in Tomotherapy and Intensity
Modulated Radiotherapy
  • Particle swarm (Li et al., 2005).
  • Cimmino algorithm (Xiao et al., 2004).
  • Exhaustive search (Wang et al., 2004).
  • Simulated annealing (Rosen et al., 2005).
  • Gradient descent (Rosen et al., 2005).
  • Costlets (Kessler et al., 2005).

http://www.lrcc.on.ca/research/tomo/index.xml
8
New Applications of Optimization
  • Simulation-based optimization
  • Objective function values are the results of
    experiments and/or simulations.
  • Analytic forms for the objective function are
    unavailable.
  • Black box cost function computation.
  • Evaluating f is extremely costly.
  • PDE-constrained optimization.
  • Very large, complex, and high-dimensional.
  • Requires massive computing resources.
  • Inverse problems.
  • Costly computation of the objective function.

9
Applications of PDE-Constrained Optimization
  • Electromagnetics.
  • Aerodynamics.
  • Cardiac electrophysiology.
  • Image processing.
  • Reservoir simulation.
  • Compressible and incompressible flows.
  • Earthquake modeling.

10
Example PDE-Constrained Optimization
Ground-truth model of distribution of wave speed
of an elastic earth medium (portion of the Los
Angeles Basin)
Inverted model from solving an optimization
problem with 17 million inversion
parameters. Solution required 24 hours using 2048
CPUs at the PSC.
O. Ghattas, 2003
11
Parameter Estimation
  • For phenomena modeled by systems of ODEs and
    PDEs.
  • Optimization used to find the system parameters
    that best fit experimental and/or observed data.
  • Applications
  • Biochemical pathways.
  • Environmental modeling.
  • Economics.

12
Definitions
  • Objective/cost function
  • Search space
  • Minimizer
  • Global optimization
  • Local optimization
  • Convergence criteria
  • Stationary point
  • Deterministic method
  • Stochastic method
  • Multiobjective optimization
  • Constrained
  • Unconstrained

13
Combinatorial Optimization
  • A linear or nonlinear function is defined over a
    very large finite set of solutions.
  • Categories
  • Network problems.
  • Scheduling.
  • Transportation.

14
Combinatorial Optimization (2)
  • If the function is piecewise linear, it can be
    optimized exactly with mixed-integer programming
    methods using branch and bound.
  • Approximate solutions can be obtained with
    heuristic methods
  • Simulated annealing.
  • Tabu search.
  • Genetic algorithms.

15
General Unconstrained Problems
  • A nonlinear function is defined over the real
    numbers.
  • The function is not subject to constraints, or
    has simple bound constraints.
  • Partitioning strategies utilize a priori
    knowledge of how rapidly the function varies
    (e.g. the Lipschitz constant).
  • Interval methods can be used if the objective
    function can be expressed analytically.

16
General Unconstrained Problems Statistical
Methods
  • Stochastic in nature.
  • Use partitioning to decompose the search space.
  • Use a priori information or assumptions to model
    the objective function.
  • Usually inexact.

17
General Unconstrained Problems Statistical
Methods Examples
  • Simulated annealing.
  • Genetic algorithms.
  • Continuation methods:
  • Transform the function into a smoother function
    with fewer local minimizers.
  • Apply a local minimization procedure to trace the
    minimizers back to the original function.

18
General Constrained Problems
  • A nonlinear function is defined over the real
    numbers.
  • The optimization is subject to constraints.
  • These problems are usually solved by adapting
    techniques for unconstrained problems to address
    constraints.

19
Unconstrained Nonlinear Optimization
  • Unconstrained minimization: find x that
    minimizes a function f(x).

Objective function
20
Bound-Constrained Nonlinear Optimization
  • Find x that minimizes a function f(x).
  • subject to

21
General Constrained Nonlinear Optimization
  • Constrained minimization
  • subject to
  • Either f(x) and/or the c(x) are nonlinear.

Equality constraints
Inequality constraints
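Written out (a standard formulation; the slides showed these problems as images, so the symbols below are assumptions consistent with the surrounding text):

    \min_{x \in \mathbb{R}^D} f(x)                                                    (unconstrained)
    \min_x f(x) \ \text{s.t.}\ l_i \le x_i \le u_i,\ i = 1,\dots,D                     (bound-constrained)
    \min_x f(x) \ \text{s.t.}\ c_i(x) = 0\ (i \in \mathcal{E}),\ c_j(x) \ge 0\ (j \in \mathcal{I})   (general constrained)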
22
General Constrained Nonlinear Optimization (2)
  • In general, the problem is to minimize f over a
    compact subset Ω of R^D.

23
Nonlinear Equations Problem
  • Determine a root of a system of D nonlinear
    equations in D variables
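In symbols (a standard statement; the slide's equation was an image):

    F(x) = (f_1(x), \dots, f_D(x))^T = 0, \qquad F : \mathbb{R}^D \to \mathbb{R}^D.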

24
Example
25
Global Optimization
  • Relatively new vis-à-vis local methods.
  • Many important objective functions are
  • Non-convex.
  • Irregular.
  • Derivatives are not available or cannot be easily
    computed.
  • The high computational cost of many global
    methods precludes their use in time-critical
    applications.

26
Local and Global Optimization
(Figure: optimization paths from a start point to local and global end points.)
27
Optimization Techniques
Classification axes: global vs. local, stochastic vs. deterministic, derivative-based vs. derivative-free.
Methods include: simulated annealing, gradient descent, Nelder-Mead simplex, interval analysis, genetic algorithms, trust region methods, Powell's direction set, homotopy methods, evolutionary strategies, Newton-based methods, pattern search, response surface techniques, particle swarm, DIRECT, and multidirectional search.
28
Derivative-Based Optimization
  • If derivatives can be computed or accurately
    estimated (e.g. finite differences), a
    derivative-based method is preferable.
  • If second derivatives are available, Newton-based
    approaches should be considered.
  • Generally, these are the most successful,
    efficient approaches.
  • The best known methods for local optimization.

29
Problems with Derivative-Based Approaches
  • Derivatives may not always be available.
  • Finite difference approximations are too
    expensive or inaccurate.
  • There is no a priori information about f.
  • High dimensional problems preclude accurate
    estimates of the gradient.

30
Automatic Differentiation
  • A technique to compute derivatives based on the
    computer code for f.
  • A new program is generated wherein analytical
    derivatives are computed based on the original
    code.
  • Utilizes the chain rule and differentials for
    generating the differentiation code.
  • Can be useful for gradient-based optimization of
    difficult functions.

31
Automatic Differentiation Disadvantages
  • Derivatives may not be accurate due to truncation
    error in functions involving solutions to PDEs.
  • Black-box function code contains branching.
  • Black-box code may not be available.
  • Usually requires the original program to be
    written in a specified manner.
  • Interfacing generated code with original program
    may be difficult.

32
Derivative-Free Optimization
  • Direct optimization
  • Relies only on the ability to compute function
    values.
  • Based on the relationship among values, not the
    actual numerical value itself.
  • Can be used for optimizing non-numeric functions.
  • Initial interest in the early 1960s–1970s, then a
    sharp decline.
  • Interest revived recently, with the advent of
    simulation-based optimization and other new
    applications (Wright, 1995; Kolda, 2003).

33
Derivative-Free Optimization Motivation
  • Computation of the objective function is very
    time-consuming (expensive).
  • Results of experiments.
  • Results of simulations.
  • Gradient computation is very expensive.
  • Gradients cannot be calculated.
  • Objective function extremely complex, making A.D.
    difficult or impossible.
  • May be discontinuous in places.
  • Objective function is very noisy
  • Calculated values rely on discretization.
  • Observed values are very noisy.

M. Wright, 1995
34
Limited Impact of Optimization
  • Computational difficulties
  • High computational cost (especially G.O.).
  • Plethora of methods (which to use?).
  • Disconnect between CAD and CAE.
  • Realism of the underlying model.

http://www.cas.mcmaster.ca/mopta/mopta01/abstracts.html#jones
35
Parallel and High-Performance Computing
  • Parallel computing
  • The use of multiple computer resources (usually
    processors or execution threads) to improve the
    efficiency of computationally expensive
    algorithms.
  • High-performance computing
  • Parallel computing, or
  • Optimizing algorithm performance by explicitly
    taking advantage of computational abilities
    inherent in the system hardware and/or software.

36
HPC and Parallel Computing
www.sharcnet.ca
37
Limits to Improvement via Parallelization
  • Not all useful algorithms are inherently
    parallel.
  • They do not have a natural parallel
    implementation.
  • Overhead required in parallelization negatively
    affects performance.
  • Cost of creating and joining threads.
  • Synchronization of threads and/or processes.
  • Communication overhead and/or network issues.
  • Inefficiencies resulting from introducing
    parallelism into an optimal serial algorithm.

38
Parallel Computing Terminology
  • Coarse-grained
  • Each processor has a large amount of work. A
    relatively small number of large computations is
    performed.
  • Fine-grained
  • Each processor works on a small subtask.
  • A relatively large number of small computations
    is performed.

39
Parallel Computing Terminology
  • Distributed memory
  • Processors each have their own memory.
  • Processors communicate through passing messages.
  • Useful for coarse-grained algorithms.
  • Relatively difficult to program (e.g. MPI, PVM).
  • Shared memory
  • All processors share one memory.
  • Inter-processor communication is less important.
  • Useful for fine-grained algorithms, as well as
    for some coarse-grained implementations.
  • Relatively simple to program (e.g. OpenMP,
    pThreads).

40
Speedup and Efficiency
P = number of processors
Scalability For a specified algorithm, the
increase in speedup as a function of the number
of processors P. In general, distributed memory
systems are more scalable than shared memory
systems.
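The usual definitions (the slide's formulas were images; T_1 and T_P denote the run times on one and on P processors):

    S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}.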
41
Amdahl's Law
  • Quantifies the efficiency benefits of
    parallelization for multiprocessing systems.
  • Assumption The problem size is fixed.
  • Notation
  • P = number of processors
  • s = fraction of intrinsically serial code (0 to 1)
  • 1 − s = fraction of code that can be parallelized
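In this notation, Amdahl's law reads (the slide's formula was an image):

    \text{Speedup}(P) = \frac{1}{s + (1 - s)/P}, \qquad \lim_{P \to \infty} \text{Speedup}(P) = \frac{1}{s}.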

42
Amdahl's Law (2)
(Figure: Speedup(P) plotted against P for serial fractions s = 0.01, 0.05, 0.2, and 0.5.)
43
Gustafson's Law
  • Quantifies massive parallelism.
  • Underlying principle the parallel or vector part
    of a program scales with the problem size.
  • Serial components, including start-up, program
    loading, serial bottlenecks, and I/O do not grow
    with problem size (are constant).
  • If a problem is run on n processors, speedup is
    linear
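Specifically, the scaled (Gustafson) speedup is usually written as follows (the slide's formula was an image; s is the serial fraction measured on the parallel system):

    S(n) = s + n(1 - s) = n - s(n - 1).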

44
Gustafson's Law (2)
  • The most efficient way to use a parallel computer
    is to have each processor perform similar work,
    but on different sections of the data (Hillis,
    1998).
  • In this way, the fixed, sequential portion of the
    code can be considered as constant.

45
Three Levels for Introducing Parallelism into
Optimization
  • Parallelize the function, gradient, and
    constraint evaluations.
  • Parallelize the numerical computations (linear
    algebra) and numerical libraries.
  • Parallelize the optimization algorithm at a high
    level, i.e., adopt new optimization approaches.

Schnabel, 1995
46
Example Image Registration
  • Aligning and combining images from the same or
    different imaging modalities.
  • Linear
  • Nonlinear/elastic
  • Useful in simulation, modeling, and in planning
    surgical procedures.
  • Employs concepts from probability theory,
    information theory, geometry, topology,
    optimization, parallel computing, and many other
    areas.
  • Once the images are registered, they are fused
    together to provide complementary information.

47
Registration and Fusion
MRI
MRI
Ultrasound
MRI Ultrasound
PET
Histology cryosection
48
Similarity Metric
(Figure: similarity metric surfaces for small x, y, and z translation errors; correct registration; rotation errors around all three axes; and large translation and rotation errors.)
Wachowiak et al., Proc. SPIE Medical Imaging, 2003
49
Entropy and Mutual Information
Entropy
Mutual Information (MI)
Normalized Mutual Information (NMI)
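The standard definitions behind these three quantities (shown as images on the slide; p_A and p_{A,B} denote the marginal and joint intensity distributions of images A and B):

    H(A) = -\sum_a p_A(a) \log p_A(a)
    MI(A, B) = H(A) + H(B) - H(A, B)
    NMI(A, B) = \frac{H(A) + H(B)}{H(A, B)}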
50
Parallel Optimization: Linear Registration
Coarse-grained: DIRECT.
Fine-grained: Powell, MDS.
Similarity metric pipeline: compute the transformation matrix; apply the transformation to all voxels; determine valid coordinates; interpolate image intensities; compute the marginal and joint densities; calculate the similarity metric.
Wachowiak and Peters, 2005 2006
51
Examples of Parallel Optimization Approaches
Deterministic Methods
  • DIRECT (Dividing Rectangles) (Jones et al., 1993;
    Wachowiak et al., 2006).
  • Parallel trust region methods (Hough and Meza,
    2002).
  • Asynchronous pattern search (local method)
    (Kolda, 2003).
  • Interval analysis (Eriksson and Lindström, 1995).
  • Parallel Newton-Krylov methods for
    PDE-constrained optimization (Biros and Ghattas,
    1999).
  • Various surface response techniques.

52
Examples of Parallel Optimization Approaches
Stochastic Methods
  • Various parallel simulated annealing approaches.
  • Parallel evolutionary strategies and genetic
    algorithms.
  • Parallel particle swarm optimization (Schutte et
    al., 2004).

53
Very Brief Overview of Gradient-Based Methods
  • Conjugate gradient.
  • Trust-region methods.
  • Can employ all three levels of parallelization
    (Schnabel, 1995), particularly
  • Parallelization of f and ∇f.
  • Parallelization of linear algebra and numerical
    libraries.

54
Steepest Descent
  • Derivative-based local optimization method.
  • Simple to implement.

x_{k+1} = x_k − α_k ∇f(x_k)
(x_{k+1}: x in the next iteration; x_k: the current x; α_k: step size; ∇f(x_k): the gradient of f(x) using the current x)
55
Steepest Descent Disadvantages
  • Unreliable convergence properties.
  • The new gradient at the minimum point of any line
    minimization is orthogonal to the direction just
    traversed.
  • ⇒ Potentially many small steps are taken, leading
    to slow convergence.

56
Conjugate Gradient Methods
  • Assume that both f(x) and ∇f(x) can be computed.
  • Assume that f can be approximated as a quadratic
    form.
  • The optimality condition then follows, as sketched
    below.
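One standard way to write the quadratic model and the resulting condition (the slide's formulas were images; the sign convention is an assumption):

    f(x) \approx c - b^T x + \tfrac{1}{2} x^T A x, \qquad \nabla f(x) = A x - b, \qquad \text{optimality: } A x = b.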

57
Conjugates
  • Given that A is a symmetric matrix, two vectors g
    and h are said to be conjugate w.r.t. A if
    gᵀAh = 0.
  • Orthogonal vectors are a special case of
    conjugate vectors: gᵀh = 0.
  • Therefore, the solution to the n × n quadratic
    problem can be expressed as a sum over n mutually
    conjugate directions.

58
Conjugate Gradient Method
  • Select successive direction vectors as a
    conjugate version of the successive gradients
    obtained as the method progresses.
  • Conjugate directions are generated as the
    algorithm proceeds.

http://www.srl.gatech.edu/education/ME6103/NLP-Unconstrained-Multivariable.ppt
59
Conjugate Gradient Method (2)
Choose x_0; compute g_0 = ∇f(x_0); set h_0 = −g_0.
Using a line search, find α_k that minimizes f(x_k + α_k h_k).
Set x_{k+1} = x_k + α_k h_k.
Compute β_k (Fletcher-Reeves or Polak-Ribière; see below).
Compute the new conjugate direction h_{k+1} = −g_{k+1} + β_k h_k.
If the stopping criteria are met, return x; otherwise repeat from the line search.
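The two β_k formulas named in the flow chart, in their standard forms (the slide showed them as images), with g_k = ∇f(x_k):

    \beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}, \qquad \beta_k^{PR} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}.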
60
Two Important Classes of Derivative-Free,
Parallel, Global Optimization
  • Response surface methods.
  • DIRECT (Dividing Rectangles).
  • There are many other promising techniques to
    solve these problems.

61
Response Surface Methods
  • Model the unknown function f as an analytic
    approximation.
  • Response surface.
  • Surrogate model.
  • The model is optimized over Ω.
  • Assume a deterministic function of D variables at
    n points.

Regis and Shoemaker, 2007
62
Response Surface
Linear regression with normally distributed error terms N(0, σ²); the unknown coefficients are to be estimated, and the basis functions are linear or nonlinear functions of x.
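The regression model these labels describe, in its standard form (the slide's equation was an image; the symbol names are assumptions):

    y_i = \sum_{j=1}^{k} \beta_j f_j(x_i) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2),

where the f_j are linear or nonlinear functions of x and the β_j are the unknown coefficients to be estimated.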
63
2D Response Surface Fitting Polynomials
(Figure: the response f(x) fitted by polynomials of degree d = 4, 6, and 8.)
64
Radial Basis Functions
  • Radially-symmetric basis functions for modeling
    scattered data.
  • Used in many response surface approaches.
  • RBFs have nice mathematical properties that
    facilitate analysis (Powell, 1992).

65
Scattered Point Interpolation with Radial Basis
Functions
Interpolate scattered points
Original point cloud from segmented contours in
CT volume.
Enhanced point cloud
Radial basis interpolation
Surface normals
Final RBF model
Courtesy Derek Cool, Robarts Research Imaging
Laboratories
66
Radial Basis Functions
Space of polynomials in D variables of degree ≤ m.
67
Examples of Radial Basis Functions
r ≥ 0
68
Compact Support RBFs
(Piecewise definition: one expression for r > 0, another otherwise.)
Compact support ⇒ positive definite matrices.
69
Examples RBFs
(Figure: φ(r) versus r for a Gaussian with γ = 2 and for 5th-, 6th-, and 8th-degree CSRBFs.)
70
Radial Basis Functions
Matrix of evaluated radial basis functions
71
Radial Basis Functions
  • The RBF that interpolates (x_i, f(x_i)) for i =
    1, …, n is obtained by solving the system below.

Polynomial not required if CSRBFs are used.
Coefficients λ and c can be found by inversion.
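The interpolation system referred to above, in its usual form (the slide's matrices were images; the notation is an assumption consistent with the annotations):

    s(x) = \sum_{i=1}^{n} \lambda_i \phi(\lVert x - x_i \rVert) + p(x), \qquad
    \begin{pmatrix} \Phi & P \\ P^T & 0 \end{pmatrix}
    \begin{pmatrix} \lambda \\ c \end{pmatrix} =
    \begin{pmatrix} F \\ 0 \end{pmatrix},

where Φ_ij = φ(‖x_i − x_j‖) is the matrix of evaluated radial basis functions, F = (f(x_1), …, f(x_n))^T, P contains the polynomial basis evaluated at the x_i, λ are the RBF coefficients, and c are the polynomial coefficients.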
72
Determining New Points
  • Resample in areas having promising function
    values as predicted by the model (Sóbester et
    al., 2004).
  • The next point selected maximizes the expected
    improvement in the objective function (Jones et
    al. 1998).
  • Search in areas of high predicted approximation
    error, using stochastic techniques (Sóbester et
    al., 2004).
  • Select a guess f_n for the global minimum, and
    choose a new point x_new such that some quality
    measure on the function s_new interpolating all
    (x_i, f_i) and (x_new, f_n) is optimized, e.g.
    minimizing the bumpiness of s_new (Gutmann,
    2001; Regis and Shoemaker, 2007).

73
Constrained Optimization using Response Surfaces
  • Choose points so that they are at a maximum
    distance from previously-evaluated points.
  • Define the maximum distance between any point not
    evaluated and evaluated points as

Maximin distance
Regis and Shoemaker, 2005.
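In symbols (the slide's formula was an image; the notation is an assumption):

    \Delta_i = \max_{x \in \Omega} \; \min_{1 \le j \le n} \lVert x - x_j \rVert,

where x_1, …, x_n are the previously evaluated points at iteration i.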
74
Constrained Optimization using Response Surfaces
(2)
  • Let β_i ∈ [0, 1] denote a user-defined distance
    factor.
  • The next point chosen is required to have a
    distance of at least β_iΔ_i from all
    previously-evaluated points.

75
Auxiliary Constrained Minimization Subproblem
  • The auxiliary problem of selecting the next point
    for evaluation is expressed as
  • k denotes the initial number of points
    evaluated.
  • Auxiliary subproblem is generally nonconvex.
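A sketch of the subproblem in the spirit of Regis and Shoemaker (the slide's formulation was an image, so the exact indexing here is an assumption):

    \min_{x \in \Omega} s(x) \quad \text{s.t.} \quad \lVert x - x_j \rVert \ge \beta_i \Delta_i, \; j = 1, \dots, n,

where s is the current response surface model, Δ_i is the maximin distance defined earlier, and n = k + i − 1 is the number of points evaluated so far.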

76
Solving the Auxiliary Subproblem
  • The β_i typically form a search pattern balancing
    global and local search:
  • β_1 = 1 → global exploratory search.
  • β_{Ns+1} = 0 → greedy local search.
  • The RBF model can be differentiated.
  • Gradient-based optimization software.
  • Multistart nonlinear programming solver.
  • Global optimization techniques, such as
    constrained DIRECT.

77
Parallelization
  • Each search pattern contains Ns + 1 values,
    where Ns is the cycle length.
  • If P ≤ Ns + 1, the search pattern is cycled
    through by the available processors.
  • If P > Ns + 1, more values are added to the
    search pattern:
  • E.g., with β = ⟨0.9, 0.75, 0.25, 0.05, 0.03, 0⟩
    (Ns = 5),
  • P = 8 → ⟨0.9, 0.9, 0.75, 0.25, 0.05, 0.03,
    0.03, 0⟩.

78
Full Algorithm
Select initial points S_1 (Latin hypercubes, etc.) and evaluate f at the points in S_1.
Fit or update the response surface model using the data points x ∈ S_i.
Select the next candidate point x_{k+i} by solving the constrained auxiliary subproblem; 0 ≤ β_i ≤ 1 is set by the user.
Evaluate f at x_{k+i} and update the best function value found so far.
Set S_{i+1} ← S_i ∪ {x_{k+i}}, D_{i+1} ← D_i ∪ {(x_{k+i}, f(x_{k+i}))}, and i ← i + 1.
If the convergence criteria are not met, repeat from the model-fitting step; otherwise, end.
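A minimal Python sketch of this loop, assuming a Gaussian RBF surrogate with no polynomial tail and a random candidate search in place of a nonlinear solver for the auxiliary subproblem; the function and parameter names (fit_rbf, surrogate_minimize, betas, etc.) are illustrative, not from the slides or the cited papers:

    import numpy as np

    def fit_rbf(X, y, gamma=1.0):
        # Gaussian RBF interpolation: solve Phi @ lam = y (small ridge for stability).
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        Phi = np.exp(-(gamma * r) ** 2)
        return np.linalg.solve(Phi + 1e-10 * np.eye(len(X)), y)

    def rbf_predict(X, lam, Xnew, gamma=1.0):
        # Evaluate the fitted surrogate at new points.
        r = np.linalg.norm(Xnew[:, None, :] - X[None, :, :], axis=-1)
        return np.exp(-(gamma * r) ** 2) @ lam

    def surrogate_minimize(objective, bounds, n_init=10, n_iter=30,
                           betas=(0.9, 0.75, 0.25, 0.05, 0.03, 0.0), seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds, dtype=float).T
        X = rng.uniform(lo, hi, size=(n_init, len(lo)))   # initial design (random here; LHS on the slide)
        y = np.array([objective(x) for x in X])
        for i in range(n_iter):
            lam = fit_rbf(X, y)                           # fit/update the surrogate
            beta = betas[i % len(betas)]                  # cycle the distance factors beta_i
            cand = rng.uniform(lo, hi, size=(2000, len(lo)))
            dmin = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=-1).min(axis=1)
            ok = dmin >= beta * dmin.max()                # distance constraint (approximate maximin)
            pred = rbf_predict(X, lam, cand[ok])
            x_new = cand[ok][np.argmin(pred)]             # candidate with the best predicted value
            y_new = objective(x_new)                      # the expensive evaluation
            X, y = np.vstack([X, x_new]), np.append(y, y_new)
        best = int(np.argmin(y))
        return X[best], y[best]

    if __name__ == "__main__":
        # Usage example on a simple quadratic over the unit square.
        f = lambda x: float(np.sum((x - 0.3) ** 2))
        print(surrogate_minimize(f, bounds=[(0.0, 1.0), (0.0, 1.0)]))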
79
Example
(Figure: the initial RBF surface over x and y, and the RBF surfaces after iterations 1, 2, and 3.)
80
DIRECT
  • Relatively new method for bound-constrained
    global optimization (Jones et al., 1993).
  • Lipschitzian approach, but the Lipschitz constant
    need not be explicitly specified.
  • Balance of global and local search.
  • Inherently parallel.

Wachowiak and Peters, 2006; Dru, Wachowiak, and
Peters, 2006.
81
DIRECT Algorithm
Normalize the search space.
Sample points around the centers of rectangles with dimensions of maximum side length.
Evaluate the function values at the sampled points.
Divide the rectangles according to the function values.
Group the rectangles by their diameters and identify the potentially optimal rectangles.
If the convergence criteria are not met, repeat from the sampling step; otherwise, end.
82
Potentially Optimal Hyperboxes
Potentially optimal HBs have centers that define
the bottom of the convex hull of a scatter plot
of rectangle diameters versus f(xi) for all HB
centers xi.
(Figure: the scatter plot and its lower convex hull for locality parameter ε = 0.001 and ε = 0.1.)
83
Example
84
Example
Normalize the search space and evaluate the
center point of a D-dimensional rectangle.
85
Example
Evaluate points around the center.
Iteration 1
86
Example
Divide the rectangle according to the function
values.
87
Example
Use Lipschitz conditions and the rectangle
diameters to determine which rectangles should be
divided next.
88
Division of the Search Space
Potentially optimal rectangles with these
centres are divided in the next iteration.
(The division criterion involves an estimate of the Lipschitz constant and a locality parameter ε; see below.)
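The selection criterion these labels refer to, as given by Jones et al. (1993) (reconstructed here; the slide showed it as an image): a hyperbox j with center c_j and diameter d_j is potentially optimal if there exists K > 0 such that

    f(c_j) - K d_j \le f(c_i) - K d_i \ \text{for all } i, \qquad f(c_j) - K d_j \le f_{\min} - \epsilon \lvert f_{\min} \rvert,

where K plays the role of the estimated Lipschitz constant, f_min is the best function value found so far, and ε is the locality parameter.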
89
Example
Evaluate points in the potentially optimal
rectangles.
Iteration 2
90
Example
Iterate (evaluate, divide, find potentially
optimal rectangles) until stopping criteria are
met.
91
Example
92
Example
Iteration 3
93
Example
(Figure: the division of the search space at iterations 4, 5, 10, 20, 50, and 110.)
94
Parallel DIRECT
  • Watson, 2001; Wachowiak, 2006.
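A minimal sketch of the coarse-grained parallelism typically used in parallel DIRECT-type methods: the centers of newly created hyperboxes are evaluated concurrently. This multiprocessing-based helper is illustrative only and is not the implementation cited above; expensive_objective is a stand-in for the real cost function.

    from multiprocessing import Pool

    def expensive_objective(x):
        # Stand-in for a costly black-box function (e.g. a similarity metric or a simulation).
        return sum((xi - 0.5) ** 2 for xi in x)

    def evaluate_centers_in_parallel(centers, n_workers=4):
        # Coarse-grained parallelism: one expensive function evaluation per task.
        with Pool(processes=n_workers) as pool:
            values = pool.map(expensive_objective, centers)
        return list(zip(centers, values))

    if __name__ == "__main__":
        new_centers = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.25), (0.5, 0.75)]
        for c, v in evaluate_centers_in_parallel(new_centers):
            print(c, v)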

95
Branch and Bound
96
Clustering Methods
  • Used for unconstrained real-valued functions.
  • Multistart procedure: local searches are
    performed from multiple points distributed over
    the search space.
  • The same local minimum may be identified by
    multiple points. Clustering methods attempt to
    avoid this inefficiency by careful selection of
    the points at which the local search is initiated.

97
Clustering Methods General Algorithm
  • Sample points in the search space.
  • Transform the sampled points to group them around
    the local minima.
  • Apply a clustering technique to identify groups
    representing convergence basins of local minima.

98
Clustering Methods Disadvantages
  • Because a potentially large number of points are
    randomly sampled to identify the clusters in
    neighborhoods of local minima, the objective
    function must be relatively inexpensive to
    compute.
  • Clustering methods are not suited to
    high-dimensional problems (more than a few
    hundred variables).
  • However, coarse-grained parallelism may be useful
    in improving efficiency.

99
Genetic Algorithms and Evolutionary Strategies

100
Genetic Algorithms
Initialize the population.
Evaluate the function values of the population.
Perform competitive selection.
Generate new solutions from genetic operators.
Repeat until the stopping criteria are met; then end.
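A compact, generic Python sketch of this loop for real-valued minimization; the operator choices (tournament selection, uniform crossover, Gaussian mutation) and parameters are illustrative and not taken from the slides:

    import numpy as np

    def genetic_minimize(f, bounds, pop_size=40, n_gen=100, mut_sigma=0.1, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds, dtype=float).T
        D = len(lo)
        pop = rng.uniform(lo, hi, size=(pop_size, D))      # initialize the population
        fit = np.array([f(ind) for ind in pop])            # evaluate function values
        for _ in range(n_gen):
            # Competitive (tournament) selection of parents.
            i, j = rng.integers(pop_size, size=(2, pop_size))
            parents = np.where((fit[i] < fit[j])[:, None], pop[i], pop[j])
            # Genetic operators: uniform crossover with a shuffled mate, then Gaussian mutation.
            mates = parents[rng.permutation(pop_size)]
            mask = rng.random((pop_size, D)) < 0.5
            children = np.where(mask, parents, mates)
            children = np.clip(children + rng.normal(0.0, mut_sigma, size=children.shape), lo, hi)
            child_fit = np.array([f(ind) for ind in children])
            # Replacement: keep a child when it beats the current individual in the same slot.
            better = child_fit < fit
            pop[better], fit[better] = children[better], child_fit[better]
        best = int(np.argmin(fit))
        return pop[best], fit[best]

    if __name__ == "__main__":
        sphere = lambda x: float(np.sum(x ** 2))
        print(genetic_minimize(sphere, bounds=[(-5.0, 5.0)] * 3))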
101
Parameter Estimation
  • Given a set of experimental data, calibrate the
    model so as to reproduce the experimental results
    in the best possible way (Moles et al., 2003).
  • Inverse problem.
  • Solutions to these problems are instrumental in
    the development of dynamic models that promote
    functional understanding at the systems level.

102
Parameter Estimation (2)
  • Goal Minimize an objective function that
    measures the goodness of the fit of the model
    with respect to a given experimental data set,
    subject to the dynamics of the system
    (constraints).

http://www.genome.org/cgi/content/full/13/11/2467
103
Example Biochemical Pathways
  • Nonlinear programming problem with differential
    algebraic constraints.
  • The problems are frequently nonconvex and
    multimodal. Therefore, global methods are
    required.
  • Multistart strategies do not work well in this
    case: multiple starts converge to the same local
    minimum.

104
Biochemical Pathways (2)
105
Grid Computing
  • Nebro et al., 2007.

106
Optimization Software
  • OPT (http://hpcrd.lbl.gov/meza/projects/opt/OPTdoc/html/index.html)

107
Conclusion
  • Many new applications of optimization need
    methods not relying on derivatives.
  • The cost of computing f is extremely high.
  • Parallel computation, particularly coarse-grained
    parallelism, provides the efficiency needed to
    solve large black-box problems.

108
Future Directions
  • The new focus is on deterministic global
    approaches.
  • New global optimization algorithms should be
    inherently parallel to maximize scalability.
  • Convergence proofs
  • Easier for derivative-based methods.
  • Especially important for stochastic methods.
  • Improvements and performance-tweaking of
    optimization software.
  • Hybrid fine-grained and coarse-grained
    parallelization.
  • Exploiting the three levels of parallelism for
    optimization.

109
Future Directions (2)
  • Load balancing
  • Asynchronous versions of parallel techniques.
  • Selecting the appropriate methods for specific
    problems.
  • If derivatives are available, use gradient-based
    methods.
  • Otherwise, parallel direct methods should be
    used.
  • If all else fails, consider parallel stochastic
    methods.

110
Thank you.
111
Radial Basis Functions
  • The RBF that interpolates (x_i, f(x_i)) for i =
    1, …, n is obtained by solving the RBF
    interpolation system.

Polynomial not required if CSRBFs are used.
Coefficients λ and c can be found by inversion.
112
Radial Basis Functions
113
Response Surface Methods
  • Model the unknown function f as an analytic
    approximation.
  • Response surface.
  • Surrogate model.
  • The model is optimized over Ω.
  • Assume a deterministic function of D variables at
    n points.
  • Denote sampled point i by x_i = (x_i1, …, x_iD),
    i = 1, …, n.
  • The associated function value is y_i = y(x_i).

Regis and Shoemaker, 2007
114
DIRECT Step 1
Normalize the search space and evaluate the
center point of an n-dimensional rectangle.
115
DIRECT Step 2
Evaluate points around the center.
116
DIRECT Step 3
Divide the rectangle according to the function
values.
117
Division of the Search Space
Potentially optimal rectangles with these
centres are divided in the next iteration.
(The division criterion involves an estimate of the Lipschitz constant and a locality parameter ε.)
118
DIRECT Step 4
Use Lipschitz conditions and the rectangle
diameters to determine which rectangles should be
divided next.
119
DIRECT Iteration
Repeat steps 2–4 (evaluate, find potentially
optimal rectangles, divide).
120
DIRECT Convergence
Based on rectangle clustering, number of
non-improving iterations, and function value
tolerance.
121
Global and local optimization
122
Global and local optimization
123
Local Optimization
Start
124
Local Optimization
End
125
Global Optimization
Start
126
Global Optimization
End
127
Particle Swarm Optimization (PSO)
  • Relatively new GO technique (Kennedy & Eberhart,
    1995).
  • Iterative, population-based GO method.
  • Based on a co-operative model of individual
    (particle) interactions.
  • In contrast to the generally competitive model of
    genetic algorithms and evolutionary strategies.

128
Particle Swarm Optimization
  • Each individual, or particle, has memory of the
    best position it found so far (best response),
    and the best position of any particle.
  • The next move of the particle is determined by
    its velocity, v, which is a function of its
    personal best and global best locations.

129
Position Update in PSO
Velocity for particle j, parameter i, (t+1)-th
iteration
Previous velocity
Effect of the best position found by particle j
(personal best)
Effect of the best position found by any particle
during the entire search (global best)
Position update
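The standard update these labels describe (the slide's equations were images; indexing follows the labels, with i the parameter and j the particle):

    v_{ij}(t+1) = w\, v_{ij}(t) + c_1 \varphi_1 \bigl(p_{ij} - x_{ij}(t)\bigr) + c_2 \varphi_2 \bigl(g_i - x_{ij}(t)\bigr)
    x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)

where p_{ij} is particle j's personal-best position, g_i is the global-best position, w is the inertial weight, c_1 and c_2 are control constants, and φ_1 and φ_2 are uniformly distributed random numbers.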
130
PSO Parameters
  • Maximum velocity vmax
  • Prevents particles from moving outside the region
    of feasible solutions.
  • Inertial weight w
  • Determines the degree to which the particle
    should stay in the same direction as the last
    iteration.
  • c1, c2
  • Control constants.
  • φ1, φ2
  • Random numbers that add stochasticity to the
    update.

131
Particle Swarm Example
Iteration 1
Iteration 5
Iteration 10
Iteration 20
Iteration 30
Last iteration (41)
132
Proposed Adaptation of PSO to Biomedical Image
Registration
  • Incorporation of a constriction factor (Clerc &
    Kennedy, 2002).
  • Registration functions using generalized
    similarity metrics and function stretching
    (Parsopoulos et al., 2001) in the global search.
  • Addition of a new memory term, that of the initial
    orientation, as the initial position is usually
    carefully chosen.

133
Proposed Adaptation of PSO to Biomedical Image
Registration
Constriction factor
Previous velocity
Personal best term
Global best term
Initial orientation term
Initial orientation
φ1, φ2, φ3: control parameters; u1, u2, u3:
uniformly distributed random numbers.
134
Proposed Adaptation of PSO to Biomedical Image
Registration
135
Parallel Optimization
Coarse-grained: DIRECT.
Fine-grained: Powell, MDS.
Similarity metric pipeline: compute the transformation matrix; apply the transformation to all voxels; determine valid coordinates; interpolate image intensities; compute the marginal and joint densities; calculate the similarity metric.
Wachowiak and Peters, IEEE TITB 2006. Wachowiak
and Peters, IEEE HPCS 2005, 50-56.
136
References
  • http://www.cs.sandia.gov/opt/survey/

137
Example
138
Steepest Descent
  • Derivative-based local optimization method.
  • Simple to implement.

x_{k+1} = x_k − α_k ∇f(x_k)
(x_{k+1}: x in the next iteration; x_k: the current x; α_k: step size; ∇f(x_k): the gradient of f(x) using the current x)
139
Steepest Descent Disadvantages
  • Unreliable convergence properties.
  • The new gradient at the minimum point of any line
    minimization is orthogonal to the direction just
    traversed.
  • ⇒ Potentially many small steps are taken, leading
    to slow convergence.

140
Conjugate Gradient Methods
  • Assume that both f(x) and ∇f(x) can be computed.
  • Assume that f can be approximated as a quadratic
    form.
  • The optimality condition then follows.

141
Conjugates
  • Given that A is a symmetric matrix, two vectors g
    and h are said to be conjugate w.r.t. A if
    gᵀAh = 0.
  • Orthogonal vectors are a special case of
    conjugate vectors: gᵀh = 0.
  • Therefore, the solution to the n × n quadratic
    problem can be expressed as a sum over n mutually
    conjugate directions.

142
Conjugate Gradient Method
  • Select successive direction vectors as a
    conjugate version of the successive gradients
    obtained as the method progresses.
  • Conjugate directions are generated as the
    algorithm proceeds.

http://www.srl.gatech.edu/education/ME6103/NLP-Unconstrained-Multivariable.ppt
143
Conjugate Gradient Method (2)
Choose x_0; compute g_0 = ∇f(x_0); set h_0 = −g_0.
Using a line search, find α_k that minimizes f(x_k + α_k h_k).
Set x_{k+1} = x_k + α_k h_k.
Compute β_k (Fletcher-Reeves or Polak-Ribière).
Compute the new conjugate direction h_{k+1} = −g_{k+1} + β_k h_k.
If the stopping criteria are met, return x; otherwise repeat from the line search.