Transcript and Presenter's Notes

Title: UNCONSTRAINED MINIMIZATION


1
UNCONSTRAINED MINIMIZATION
i.e. as before, but with several variables, e.g.
  • zero of a function
  • minimizing the energy in a system
  • least squares fitting
  • probability distributions, etc.
i.e. minimize f(x1, x2, ..., xn)
2
Zero of a function
  • The problem of finding the zero of a function
    f(x) is the same as finding x such that f(x) = 0
  • The problem of solving g(x) = a can be put in the
    form f(x) = g(x) - a = 0

3
Bisection approach
Bisection is the division of a given curve, figure, or interval into two equal parts (halves). A simple bisection procedure for iteratively converging on a solution which is known to lie inside some interval [a, b] proceeds by evaluating the function in question at the midpoint of the original interval, x = (a+b)/2, and testing to see in which of the subintervals [a, (a+b)/2] or [(a+b)/2, b] the solution lies. The procedure is then repeated with the new interval as often as needed to locate the solution to the desired accuracy.
We illustrate this method by considering the above-mentioned polynomial p(x) = x^7 + 9x^5 - 13x - 17. Note that p(0) = -17 and p(2) = 373. Therefore, since p(x) is a continuous function (i.e., its graph has no 'breaks'), we know that there must be a root, say r, in the interval (0, 2). To close in on r, we now evaluate p at the midpoint of (0, 2), which is 1: p(1) = -20. Now we see that r must actually lie in the interval (1, 2), since p switches sign from negative to positive as x ranges from 1 to 2. So we have reduced the interval under consideration from (0, 2) to (1, 2). We have cut the length of our interval in half, or bisected it. We look next at the midpoint of (1, 2), namely 1.5: p(1.5) ≈ 48.93. Thus, r must be in the interval (1, 1.5). We continue this procedure until a desired accuracy has been achieved. The following table summarizes the results of iterating this technique several times.
4
This is the Bisection Algorithm:
 1. Specify bounds x1 and x2
 2. Evaluate f1 = f(x1) and f2 = f(x2)
 3. Check that the sign of f1 is not equal to the sign of f2. Abandon with an error message if not so.
 4. Start iteration.
 5. Let xm = (x1 + x2)/2
 6. Evaluate fm = f(xm)
 7. If sign(fm) = sign(f1) then make x1 = xm and f1 = fm
 8. Else make x2 = xm and f2 = fm
 9. If fm is sufficiently close to zero then stop
10. Repeat from step 4
5
program simple
  ! Simplest bisection program with no checks; f(x) = x**7 + 9*x**5 - 13*x - 17
  real f1, x1, f2, x2, f, x
  integer tries, max
  max = 10
  x1 = 0.;  x2 = 2.
  f1 = x1**7 + 9.*x1**5 - 13.*x1 - 17.
  f2 = x2**7 + 9.*x2**5 - 13.*x2 - 17.
  ! Begin iteration....
  do tries = 1, max
    x = (x1 + x2)/2.0
    f = x**7 + 9.*x**5 - 13.*x - 17.
    if (f*f1 > 0.0) then        ! same sign as f1, which is thus discarded...
      f1 = f;  x1 = x
    else                        ! must have sign of f2
      f2 = f;  x2 = x
    end if
    print *, x, f               ! ... iteration finished
  end do
  print *, 'At x = ', x, ' function is ', f
end program simple
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
NEWTON'S METHOD for multiple variables: conceptually very strong; leads on to many other methods. Basis: the Hessian matrix, i.e. a second-order method; in theory convergence is much quicker (if convergence exists). IDEA: construct a quadratic approximation to f(x) and minimize the quadratic. At a current point xk, the quadratic approximation q(x) is
q(x) = f(xk) + ∇f(xk)T(x - xk) + 0.5(x - xk)T∇²f(xk)(x - xk)
22
Assuming ∇²f(xk) is positive definite, the minimum of q(x) is found by setting ∇q = 0, so
∇²f(xk) dk = -∇f(xk), where dk = xk+1 - xk
Solve for dk, then obtain xk+1; repeat to create a series of points xk. The limit point of this sequence is the optimum x, where ∇f(x) = 0. If f is quadratic and positive definite, convergence should be after 1 step! BUT not very accurate for highly non-linear functions.
23
Example: f(x) = x1 - x2 + 2x1x2 + 2x1² + x2² and x0 = (1, 5)T. Determine the Newton direction to minimize f at x0. Here ∇f = [1 + 2x2 + 4x1, -1 + 2x1 + 2x2]T, so ∇f(x0) = [15, 11]T, and the (constant) Hessian is ∇²f = [4 2; 2 2]. Thus, solving ∇²f d0 = -∇f(x0) gives d0 = (-2, -3.5)T.
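A minimal sketch (not part of the original slides; the variable names are illustrative) that solves this 2x2 Newton system H d0 = -g by Cramer's rule:

program newton_dir
  ! Sketch: Newton direction for f = x1 - x2 + 2*x1*x2 + 2*x1**2 + x2**2 at x0 = (1,5)
  real :: g(2), H(2,2), d(2), det
  g = (/ 15.0, 11.0 /)                                  ! gradient at x0
  H = reshape( (/ 4.0, 2.0, 2.0, 2.0 /), (/ 2, 2 /) )   ! constant Hessian
  det = H(1,1)*H(2,2) - H(1,2)*H(2,1)
  d(1) = (-g(1)*H(2,2) + g(2)*H(1,2))/det               ! Cramer's rule for H*d = -g
  d(2) = (-g(2)*H(1,1) + g(1)*H(2,1))/det
  print *, 'Newton direction d0 = ', d                  ! expect (-2.0, -3.5)
end program newton_dir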
24
Secant Method: While the Newton-Raphson method has many positive features, it does require the evaluation of two different functions on each iteration, f(x) and df(x)/dx. When f(x) is reasonably simple, it is easy to compute df(x)/dx, but when f(x) is a complicated function, the computation of the derivative can be tedious at best. Moreover, many functions have non-elementary forms (integrals, sums, etc.), and it is desirable to have a method that converges almost as fast as Newton's method yet involves only evaluations of f(x) and not of df(x)/dx. The secant method requires only one evaluation of f(x) per step, and at a simple root it has an order of convergence R ≈ 1.618033989. It is almost as fast as Newton's method, which has order 2.
25
The secant method for a simple function. We observe that, unlike the bisection method, the initial points need not bracket an interval in which a zero of f(x) is known to exist.
26
In the figure the secant method is illustrated, and we can see that the function f(x) is being approximated by a straight line which is an extrapolation based on the two points x0 and x1. The line passing through the points (x0, f(x0)) and (x1, f(x1)) can be seen to be given by
y = f(x1) + ((f(x1) - f(x0))/(x1 - x0))(x - x1)
so that, solving for the value of x for which y = 0 for this line, we have the two-point iteration formula
xk+1 = xk - f(xk)(xk - xk-1)/(f(xk) - f(xk-1))
The secant method is closely related to the Newton-Raphson method, and this relationship stems from the fact that the quantity
(f(xk) - f(xk-1))/(xk - xk-1)
27
becomes df(x)/dx in the limit xk → xk-1. In fact, from the Mean Value Theorem of calculus, we know that as long as f(x) is continuous on the interval [xk-1, xk], there is a point x = c in that interval for which
f'(c) = (f(xk) - f(xk-1))/(xk - xk-1)
The secant method does not require the evaluation of any formal derivative as Newton-Raphson does, and requires only one evaluation of f(x) per iteration. In many cases the absence of any requirement to compute a derivative is a significant advantage, not only because there is no need to perform the formal differentiation, but because frequently the derivative of a function is a significantly more complicated function than the original function.
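A minimal sketch of the two-point iteration. The polynomial p(x) = x**7 + 9*x**5 - 13*x - 17 and the starting points 1.0 and 2.0 are borrowed from the earlier bisection example purely for illustration; they are not specified on this slide.

program secant_demo
  ! Sketch of the secant iteration xk+1 = xk - f(xk)(xk - xk-1)/(f(xk) - f(xk-1))
  real :: xkm1, xk, xkp1, fkm1, fk
  integer :: k
  xkm1 = 1.0;  fkm1 = p(xkm1)
  xk   = 2.0
  do k = 1, 10
    fk = p(xk)
    if (abs(fk) < 1.0e-5 .or. fk == fkm1) exit   ! converged, or update would degenerate
    xkp1 = xk - fk*(xk - xkm1)/(fk - fkm1)       ! two-point iteration formula
    xkm1 = xk;   fkm1 = fk
    xk   = xkp1
    print *, k, xk, p(xk)
  end do
  print *, 'root approximately at x = ', xk
contains
  real function p(x)
    real, intent(in) :: x
    p = x**7 + 9.0*x**5 - 13.0*x - 17.0
  end function p
end program secant_demo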
28
(No Transcript)
29
There are three real roots. Starting with the values p1 = 3.0 and p2 = 2.8, use the secant method to find a numerical approximation to the root.
30
(No Transcript)
31
(No Transcript)
32
Example: minimizing the energy in a system. For a nonlinear spring system, the displacements Q1 and Q2 under the applied load are obtained by minimizing the potential energy Π, where the spring extensions ΔL1 and ΔL2 relate to Q1 and Q2.
33
Basically, minimize Π(Q1, Q2). When k1 = k2 = 1 lb/in and F1 = 0, F2 = 2, we obtain from the techniques in this chapter Q1 = 0 and Q2 = 2.55 in.
34
If f ∈ C¹ then ∇f(x) = 0 at a local minimum, i.e. a stationary point. This is a necessary condition. To be a strict local minimum, ∇²f(x) must be positive definite. This is a sufficient condition.
35
Example: An equation of the form y = a + b/x is used to provide a best fit, in the sense of least squares, for the following (x, y) points: (1,6), (3,10) and (6,2). Determine a and b using the necessary and sufficient conditions for least squares:
minimize f = Σi (a + b/xi - yi)²
Substituting for xi and yi, we get
minimize f = 3a² + 41/36 b² + 3ab - 58/3 b - 36a + 140
The necessary conditions require
∂f/∂a = 6a + 3b - 36 = 0  and  ∂f/∂b = 41/18 b + 3a - 58/3 = 0
giving a = 5.1428, b = 1.7143, and f = 30.86
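A minimal sketch (not from the slides; names are illustrative) that solves the two necessary conditions above as a 2x2 linear system by Cramer's rule:

program lsq_fit
  ! Sketch: solve  6a + 3b - 36 = 0  and  3a + (41/18)b - 58/3 = 0
  real :: A11, A12, A21, A22, r1, r2, det, a, b
  A11 = 6.0;  A12 = 3.0;        r1 = 36.0
  A21 = 3.0;  A22 = 41.0/18.0;  r2 = 58.0/3.0
  det = A11*A22 - A12*A21
  a = (r1*A22 - r2*A12)/det
  b = (r2*A11 - r1*A21)/det
  print *, 'a = ', a, '  b = ', b     ! expect a = 5.1428..., b = 1.7143...
end program lsq_fit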
36
CONVEXITY: the Hessian matrix ∇²f is related to the convexity of the function. If f ∈ C¹, f is convex over a convex set S if
f(y) ≥ f(x) + ∇f(x)T(y - x) for all x, y ∈ S
Example: consider the function f = x1x2 over the convex set S = {(x1, x2) ∈ R² : x1 > 0, x2 > 0}. Is f convex over the set S?
37
Since the function f is twice continuously differentiable, we can use the Hessian test. We have
∇²f = [0 1; 1 0]
Since det(∇²f) = -1, the Hessian is not positive definite (Sylvester's check) and f is thus not convex, as the diagram shows!
38
Most methods require: a start point, e.g. x0; a direction of travel, d0 (a direction vector); and a step size α0, such that x1 = x0 + α0 d0. The direction vector and step size differ between methods. For non-convex problems with multiple local minima, gradient methods only find the local minimum closest to the start point.
39
Example: A cantilever beam - design variables w and h. Thus, x = (w, h)T; currently x0 = (1, 3)T. Find the bending stress at the initial design. If d0 = (-1/√5, -2/√5)T and α0 = 0.2, find the updated design and the updated maximum bending stress.
40
Given the current value of the bending stress, the updated design is x1 = x0 + α0 d0, i.e. x1 ≈ (0.911, 2.821)T; the updated value of the bending stress is therefore 71,342 psi.
41
STEEPEST DESCENT METHOD: need a direction vector. If xk is the current point at the kth iteration, the direction vector is given as dk = -∇f(xk), known as the direction of steepest descent. More generally, if ∇f(xk)Td < 0, we have a descent direction.
42
Example: given f = x1x2², x0 = (1, 2)T, find the steepest descent direction at x0 and state whether d = (-1, 2)T is a direction of descent. Here the gradient vector is ∇f = (x2², 2x1x2); at x0 = (1, 2)T, -∇f(x0) = (-4, -4)T is the steepest descent direction at x0. Is d = (-1, 2)T a descent direction? It must obey ∇f(xk)Td < 0: ∇f(xk)Td = (4, 4)(-1, 2)T = 4 > 0, so d is not a descent direction.
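A minimal sketch (not from the slides) that performs this descent-direction check numerically:

program descent_check
  ! Sketch: check grad f(x0)T d < 0 for f = x1*x2**2 at x0 = (1, 2) with d = (-1, 2)
  real :: x0(2), g(2), d(2)
  x0 = (/ 1.0, 2.0 /)
  g  = (/ x0(2)**2, 2.0*x0(1)*x0(2) /)   ! gradient (x2**2, 2*x1*x2) = (4, 4)
  d  = (/ -1.0, 2.0 /)
  print *, 'grad.d = ', dot_product(g, d), ' (a descent direction only if negative)'
end program descent_check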
43
Now we have the direction along which we travel - but how far do we go? i.e. we need a step size α along dk such that xk + αdk ≡ x(α) and f(xk + αdk) ≡ f(α). Again, similar to experimental design: choose a positive step such that we minimize f(α), i.e. a line search.
44
The slope, or derivative, f'(α) = df/dα is called the directional derivative of f along the direction d and is given by the expression
f'(α) = ∇f(x(α))Td
In the steepest descent method, the direction vector is -∇f(xk), so the slope at the current point (α = 0) is f'(0) = -||∇f(xk)||² < 0, meaning that we move in a downhill direction.
45
Quadratic example: f(0) = f(xk); the slope at α = 0 is negative,
df(α)/dα|α=0 = ∇f(xk)Tdk = -||∇f(xk)||² < 0
so when α > 0 the function decreases. Now take steps along the function such that αi+1 = αi/τ (τ = 0.618034) until the function increases.
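A minimal sketch of this bracketing idea; the 1-D function and the initial step below are illustrative assumptions, not from the slides.

program bracket_demo
  ! Sketch: expand the step, alpha_{i+1} = alpha_i/0.618034, until f(alpha) increases,
  ! which brackets the minimum and gives the three-point pattern.
  real :: a_prev, a_curr, a_next, f_curr, f_next
  a_prev = 0.0
  a_curr = 0.05;  f_curr = f(a_curr)     ! small initial step
  do
    a_next = a_curr/0.618034
    f_next = f(a_next)
    if (f_next > f_curr) exit            ! function increased: bracket found
    a_prev = a_curr
    a_curr = a_next;  f_curr = f_next
  end do
  print *, 'three-point pattern: ', a_prev, a_curr, a_next
contains
  real function f(alpha)
    real, intent(in) :: alpha
    f = (alpha - 2.0)**2 + 1.0           ! toy line-search function, minimum at alpha = 2
  end function f
end program bracket_demo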
46
However, if the initial step α1 results in f(α1) > f(0), then 0, α3, α2 form the three-point pattern.
47
STOPPING CRITERIA: Assuming εg is a tolerance on the gradient, set by the user, then if ||∇f(xk)|| > εg we should perform a line search to gain optimality; otherwise an optimal solution has been reached (other criteria exist).
48
The steepest descent algorithm zig-zags towards the optimal point, e.g. for f = x1² + a·x2². Each step is orthogonal to the previous step; the larger the value of a, the slower the convergence rate.
49
Convergence speed is related to the condition number of the Hessian matrix, i.e. the ratio of the largest to the smallest eigenvalue - the smaller, the better. For f = x1² + a·x2², the best value is a = 1.
50
SCALING: Why not scale the design variables such that ∇²f is well-conditioned? e.g. if f = x1² + 9x2², let y1 = x1 and y2 = 3x2, so g = y1² + y2². i.e. with x = Ty, g(y) = f(Ty), with gradient and Hessian given as ∇g = TT∇f and ∇²g = TT(∇²f)T.
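A minimal sketch (not from the slides) that checks this transformation numerically for the example above:

program scaling_demo
  ! Sketch: with x = T*y and T = diag(1, 1/3), the Hessian of g(y) = f(Ty) is T'*(Hessian of f)*T
  real :: H(2,2), T(2,2), Hs(2,2)
  H  = reshape( (/ 2.0, 0.0, 0.0, 18.0 /), (/ 2, 2 /) )    ! Hessian of f = x1**2 + 9*x2**2, condition number 9
  T  = reshape( (/ 1.0, 0.0, 0.0, 1.0/3.0 /), (/ 2, 2 /) )
  Hs = matmul(transpose(T), matmul(H, T))                   ! Hessian of the scaled problem
  print *, 'scaled Hessian = ', Hs                          ! expect diag(2, 2), condition number 1
end program scaling_demo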
51
Example: let f = (x1 - 2)⁴ + (x1 - 2x2)², x0 = (0, 3)T. Perform one iteration of the steepest descent method. f(x0) = 52; ∇f(x) = [4(x1 - 2)³ + 2(x1 - 2x2), -4(x1 - 2x2)]T; thus d0 = -∇f(x0) = [44, -24]T. Normalizing the direction vector to make it a unit vector gives d0 = [0.8779, -0.4789]T. The solution to the line search problem, minimizing f(α) = f(x0 + αd0), yields α0 = 3.0841. The new point is x1 = x0 + α0 d0 = [2.707, 1.523]T with f(x1) = 0.365.
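A minimal sketch (not from the slides) of this single iteration, using a crude sampled line search over α in [0, 5] rather than an exact one:

program sd_step
  ! Sketch: one steepest-descent step for f = (x1-2)**4 + (x1-2*x2)**2 from x0 = (0,3)
  real :: x0(2), d(2), g(2), alpha, best_a, best_f, fa, xt(2)
  integer :: i
  x0 = (/ 0.0, 3.0 /)
  g  = (/ 4.0*(x0(1)-2.0)**3 + 2.0*(x0(1)-2.0*x0(2)), -4.0*(x0(1)-2.0*x0(2)) /)
  d  = -g/sqrt(g(1)**2 + g(2)**2)          ! normalized steepest-descent direction
  best_f = huge(1.0)
  do i = 0, 5000                           ! crude line search: sample f(x0 + alpha*d)
    alpha = 5.0*real(i)/5000.0
    xt = x0 + alpha*d
    fa = (xt(1)-2.0)**4 + (xt(1)-2.0*xt(2))**2
    if (fa < best_f) then
      best_f = fa;  best_a = alpha
    end if
  end do
  print *, 'alpha0 ~', best_a, '  x1 ~', x0 + best_a*d, '  f(x1) ~', best_f
end program sd_step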
52
Conjugate Gradient Method
  • Major improvement over steepest descent method
  • Minimum of quadratic functions of n variables
    found in n iterations
  • No such property exists for the steepest descent
    method
  • Powerful for general functions
  • First presented by Fletcher and Reeves

53
CONJUGATE GRADIENT METHOD: better still! Finds the minimum of a quadratic function of n variables in n iterations. As a starting point, consider minimizing the quadratic function
q(x) = 0.5 xTAx + cTx
Assume A is symmetric positive definite. First define conjugate directions (directions that are mutually conjugate w.r.t. A), i.e. diTAdj = 0, i ≠ j.
54
Mechanism: initial point x0; set of conjugate directions d0, d1, ..., dn-1. Minimize q(x) along d0 to obtain x1 (as before); then from x1, minimize q(x) along d1 to obtain x2, etc., until xn is reached. The point xn is the minimum solution, i.e. after n searches. In this method the gradients of q are used to generate the conjugate directions.
55
g = gradient of q: gk = ∇q(xk) = Axk + c, where xk is the current point (k = iteration index). The first direction d0 is chosen as the steepest descent direction, -g0. Minimize q(x) along dk to find xk+1:
xk+1 = xk + αk dk, with αk = -(dkTgk)/(dkTAdk)
For the exact line search condition, dq(α)/dα = 0 gives dkTgk+1 = 0.
56
Now let dk+1 = -gk+1 + βk dk, i.e. a deflection of the steepest descent direction -gk+1. To satisfy conjugacy, dk+1TAdk = 0, so following on,
βk = (gk+1T(gk+1 - gk))/(αk dkTAdk)
57
The Fletcher-Reeves algorithm: k = 0, initial point x0 and d0 = -∇q(x0). Line search: find αk from αk = -(dkTgk)/(dkTAdk); then xk+1 from xk+1 = xk + αk dk; then βk from βk = (gk+1T(gk+1 - gk))/(αk dkTAdk); and dk+1 from dk+1 = -gk+1 + βk dk.
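A minimal sketch of these steps on a quadratic; the 2x2 matrix A and vector c below are illustrative assumptions, not values from the slides.

program fr_cg
  ! Sketch of the Fletcher-Reeves steps for q(x) = 0.5*x'Ax + c'x
  real :: A(2,2), c(2), x(2), g(2), d(2), gnew(2), alpha, beta
  integer :: k
  A = reshape( (/ 4.0, 1.0, 1.0, 3.0 /), (/ 2, 2 /) )   ! symmetric positive definite
  c = (/ -1.0, -2.0 /)
  x = (/ 0.0, 0.0 /)
  g = matmul(A, x) + c          ! gradient of q at the start point
  d = -g                        ! first direction = steepest descent
  do k = 0, 1                   ! n = 2 variables -> at most 2 iterations
    alpha = -dot_product(d, g)/dot_product(d, matmul(A, d))
    x = x + alpha*d
    gnew = matmul(A, x) + c
    beta = dot_product(gnew, gnew - g)/(alpha*dot_product(d, matmul(A, d)))
    d = -gnew + beta*d
    g = gnew
    print *, 'k =', k, '  x =', x, '  |g| =', sqrt(dot_product(g, g))
  end do
end program fr_cg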
59
Example
60
OTHER TOPICS: Modified Newton methods, Quasi-Newton methods, Trust region methods - read up!