1. Evolutionary Algorithms (EVO): Introduction and Local Search (L1 and L2)
- John A. Clark
- Professor of Critical Systems
- Non-Standard Computation Group
2. Aim of the Module
- To familiarise you with a variety of nature-inspired problem solving techniques, in particular those inspired by concepts in evolution:
- Genetic algorithms (of various sorts)
- Simple GA
- More advanced GAs
- Estimation of distribution algorithms (EDAs)
- Multi-objective Genetic Algorithms (MOGAs)
- Genetic programming
- Grammatical evolution
- Evolutionary strategies
- Co-evolution.
- Will also give a variety of unusual applications towards the end of the module.
3. Presentational Strategy
- Generally aim to indicate why techniques emerged.
- Technique A will have its strengths.
- Technique A also has its limitations:
- too limited a domain of application, e.g. only applies to continuous functions.
- cannot cope practically with large problems, e.g. algorithms might take longer than the age of the universe to find a solution, or may require infeasible amounts of memory.
- basic operation seems ill suited to the problem at hand.
- Technique B is developed to fix some aspect of A's deficiencies.
- Technique B has its strengths.
- Technique B also has its limitations.
- And so on.
4. Lectures and Practicals
- Lectures in weeks 6, 7, 8, 9
- Monday 10.15-12.15, P/L/001 (Physics)
- Thursday 16.15-18.15, ATB/056 (Langwith)
- Practicals in weeks 7, 8, 9, 10
- Wednesday 9.15-11.15, CS/007 (Computer Science)
5. Orders Matter!
- Scenario 1
- "Please tell me what your problem is."
- Pause.
- "I think the answer to your problem is nature-inspired computation."
- Scenario 2
- "I think the answer to your problem is nature-inspired computation."
- Pause.
- "Please tell me what your problem is."
6. A World Outside NI-Computation
- Optimisation did not start with NI-computation.
- There is a great deal of research in mathematics and operations research aimed at finding optimal or indeed good solutions to problems.
- Before we decide "I think the answer to your problem is nature-inspired computation", let's take a brief look at a few of the techniques out there.
- Let me give you an excellent piece of advice...
7. Evolutionary Computation? Just say No!
8. Linear Programming
- Linear programming is an extremely powerful solution technique.
- Maximize f(x1,..,xn) = c1x1 + .. + cnxn
- Subject to the linear constraints:
- a11x1 + .. + a1nxn <= b1
- a21x1 + .. + a2nxn <= b2
- ...
- ak1x1 + .. + aknxn <= bk
9. LP Example (Taha)
- Suppose a paint manufacturer makes two types of paint (interior and exterior).
- Interior paint sells for 2000 per tonne and exterior paint sells for 3000 per tonne.
- Making each tonne of paint requires the indicated amounts of raw materials A and B.
- Demand for interior paint cannot exceed that for exterior by more than 1 tonne.
- Maximum demand for interior paint is 2 tonnes.
- How much of each type should be produced?
10. LP Example (Taha)
- Let x be the amount of exterior paint and y be the amount of interior paint produced.
- Maximise
- f(x,y) = 3000x + 2000y
- subject to
- x + 2y <= 6
- 2x + y <= 8
- y - x <= 1
- y <= 2
- x >= 0, y >= 0
[Figure: the feasible region in the (x, y) plane, x = exterior, y = interior. Dotted lines show contours f(x,y) = constant. The optimum occurs at a vertex of the region: x = 3 1/3, y = 1 1/3.]
For an excellent introduction to LP see Taha's Operations Research.
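The graphical solution above can be checked in code. Since an LP optimum (when one exists) lies at a vertex of the feasible polygon, a minimal sketch for this tiny two-variable problem is to enumerate all pairwise boundary intersections and keep the best feasible one. This brute force stands in for what real solvers (simplex, interior-point methods) do far more efficiently; the function name is illustrative.

```python
from itertools import combinations

# Constraints p*x + q*y <= r for the Taha paint example
constraints = [
    (1, 2, 6),    # raw material A: x + 2y <= 6
    (2, 1, 8),    # raw material B: 2x + y <= 8
    (-1, 1, 1),   # y - x <= 1 (demand gap)
    (0, 1, 2),    # y <= 2 (max interior demand)
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def lp_vertex_solve(constraints, objective):
    """Maximise objective over the feasible polygon by checking
    every vertex (intersection of two constraint boundaries)."""
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(constraints, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:          # parallel boundaries: no vertex
            continue
        x = (r1 * q2 - r2 * q1) / det # Cramer's rule
        y = (p1 * r2 - p2 * r1) / det
        if all(p * x + q * y <= r + 1e-9 for p, q, r in constraints):
            val = objective(x, y)
            if best is None or val > best[0]:
                best = (val, x, y)
    return best

value, x, y = lp_vertex_solve(constraints, lambda x, y: 3000 * x + 2000 * y)
```

This recovers the optimum at x = 3 1/3, y = 1 1/3 shown on the slide.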
11. Linear Programming
- A vast amount of work exists in the area of linear programming.
- Commercial and freeware tools are available.
- What if the variables are not continuous?
- Integer programming techniques exist.
- Mixed variable type techniques too.
- Don't blunder into using NI-computational techniques just because you don't know if there are any special purpose techniques for the problem at hand.
- Look for them!!!
12. Dijkstra's Shortest Path Algorithm
Dijkstra's Algorithm solves the single-source shortest path problem in weighted graphs with non-negative weights.

function Dijkstra(G, w, s)
    for each vertex v in V[G]         // Initialisations
        d[v] := infinity
        previous[v] := undefined
    d[s] := 0                         // Distance from s to s
    S := empty set
    Q := V[G]                         // Set of all vertices
    while Q is not an empty set       // The algorithm itself
        u := Extract_Min(Q)
        S := S union {u}
        for each edge (u,v) outgoing from u
            if d[u] + w(u,v) < d[v]   // Relax (u,v)
                d[v] := d[u] + w(u,v)
                previous[v] := u

http://en.wikipedia.org/wiki/Dijkstra's_algorithm
13. Dijkstra's Shortest Path Algorithm
14. Dijkstra's Shortest Path Algorithm
- With a simple implementation using linked lists this algorithm has complexity O(V^2).
- With sparse graphs and a smart implementation (e.g. using a Fibonacci heap) this can be improved to O(E + V log V).
- There are a few constraints (e.g. non-negative edge weights), but...
- if you have a shortest path problem like the one shown, use a shortest path algorithm.
- Note there are further algorithms that handle some limitations, e.g. the Bellman-Ford algorithm allows negative edge weights.
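The pseudocode above translates directly into a short runnable version. This sketch uses a binary heap (Python's heapq) rather than a Fibonacci heap, giving O((V + E) log V); the example graph is made up for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; graph maps u -> {v: weight}.
    Heap-based version of the pseudocode on slide 12."""
    dist = {v: float('inf') for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                   # stale heap entry: u already settled
            continue
        for v, w in graph[u].items():
            if dist[u] + w < dist[v]:     # relax (u, v)
                dist[v] = dist[u] + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

# Small example graph (weights assumed for illustration)
graph = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2, 'D': 6}, 'C': {'D': 3}, 'D': {}}
dist, prev = dijkstra(graph, 'A')
```

Here the shortest A-to-D path is A -> B -> C -> D with length 6, recoverable by following prev back from D.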
15. Dijkstra's Shortest Path Algorithm
[Figure: a five-node network (A, B, C, D, E) whose links are labelled with reliabilities between 0.97 and 0.99. Most reliable path from A to D is?]
Now consider a network where the probability of a message passing reliably across a link is as shown. The reliability of a path is now the product of the edge weights (probabilities) along that path.
Time to reach for evolutionary computation????? No! Take the negative logarithm of each weight (non-negative, since each probability is at most 1), so that maximising the product becomes minimising a sum. And now use Dijkstra's SPA.
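The transformation can be sketched as follows. The exact topology and figures on the slide are not fully recoverable, so the link reliabilities below are assumed for illustration; the point is only that -log turns a max-product problem into a min-sum problem Dijkstra can solve.

```python
import heapq, math

def most_reliable_path(links, source, target):
    """Maximise the product of link reliabilities by running a
    Dijkstra-style search on weights -log(p), non-negative for p <= 1."""
    graph = {}
    for u, v, p in links:
        graph.setdefault(u, {})[v] = -math.log(p)
        graph.setdefault(v, {})[u] = -math.log(p)   # undirected links
    dist = {v: float('inf') for v in graph}
    prev = {source: None}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path, node = [], target
    while node is not None:               # walk predecessors back to source
        path.append(node)
        node = prev[node]
    return path[::-1], math.exp(-dist[target])

# Hypothetical reliabilities (not the slide's exact figures)
links = [('A', 'B', 0.98), ('A', 'C', 0.97), ('B', 'D', 0.98),
         ('C', 'D', 0.99), ('C', 'E', 0.99), ('E', 'D', 0.99)]
path, reliability = most_reliable_path(links, 'A', 'D')
```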
16. Moral of the Tale
Even if your problem does not seem to be solvable by known efficient algorithms, a transformation of it might be.
17. Calculus: 1 Variable
- Calculus is a well known method of finding optima.
- Find the minimum of
- y(x) = 3x^2 + 2x + 1
- dy/dx = 6x + 2
- Set dy/dx = 0
- => x = -1/3, y = 2/3
- Strictly we should check d2y/dx2 = 6 > 0 for a minimum.
y(x) is a differentiable function. The same sorts of ideas apply in higher dimensions.
18. Finding Zeroes of a Polynomial Function
- Find the zeroes of
- y(x) = x^2 - 5x + 6
- Analytic solutions exist for quadratics
- y(x) = ax^2 + bx + c = 0
- And for cubics (Not pleasant! Look it up.)
- y(x) = ax^3 + bx^2 + cx + d = 0
- And for quartics (Not pleasant! Look it up.)
- y(x) = ax^4 + bx^3 + cx^2 + dx + e = 0
- And for quintics????? No general formulae exist.
19. Moral of the Tale
If your problem looks like a (special purpose) nail, use a (special purpose) hammer.
Available from all good mathematics and operations research departments at reasonable cost. Read the small print.
20. A Bit of Guidance Goes a Long Way
Newton-Raphson zero finding.
[Figure: a curve with successive tangent lines showing iterates x0, x1, x2, x3 converging towards a zero.]
The approach uses the gradient at the current xn to guide movement in the right direction to generate a better x(n+1).
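The iteration x(n+1) = xn - f(xn)/f'(xn) can be sketched in a few lines, applied here to the quadratic from slide 18 (roots at 2 and 3). The function name and tolerances are illustrative.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/df(x_n) until |f(x)| is tiny."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / df(x)        # follow the tangent to its zero crossing
    return x

# Zeroes of y(x) = x^2 - 5x + 6; which root you get depends on x0
f = lambda x: x * x - 5 * x + 6
df = lambda x: 2 * x - 5
root = newton_raphson(f, df, x0=5.0)    # converges to 3
root2 = newton_raphson(f, df, x0=0.0)   # converges to 2
```

Note the sensitivity to the starting point: different x0 lead to different roots, a theme that recurs with local search and starting states.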
21. Local Search
22. Local Search Procedure
- A local search comprises a trace of execution:
- Trace = (s0,r0), (s1,r1), (s2,r2), ..., (send,rend)
- The sk are members of the search space.
- The rk are the fitness values (evaluated in the solution space) of the corresponding sk.
- Consecutive sk are related in a particular way:
- for all k in 0..(end-1), s(k+1) is in Neighbourhood(sk)
- The neighbourhood function N(sk) = Neighbourhood(sk) defines a set of points that are somehow deemed to be "near to", "close to" or "in the locality of" sk.
23. Some Local Search Questions
- How do you determine the start state s0?
- How do you define the neighbourhood function N()?
- How do you determine which member of the neighbourhood is selected to be the next state?
24. Hill Climbing
- Let the current solution or point be x.
- Define the neighbourhood N(x) to be the set of solutions that are "close" to x.
- If possible, move to a neighbouring solution that improves the value of f(x), otherwise stop.
- Choose any y as next solution provided f(y) > f(x)
- weak hill-climbing (don't go down)
- Choose y as next solution such that f(y) = sup{ f(v) : v in N(x) }
- steepest gradient ascent (climb as fast as you can)
- For many purposes hill-climbing works very well, particularly when you climb the right hill - but there is a problem.
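Steepest-ascent hill climbing is a few lines of code. The two-peaked function below is made up to show both behaviours from slide 25: one starting point climbs to the global peak, the other gets stuck on a lower one.

```python
def hill_climb(f, x0, neighbours, max_steps=10000):
    """Steepest-ascent hill climbing: move to the best neighbour
    while it improves f, otherwise stop (we are at a local optimum)."""
    x = x0
    for _ in range(max_steps):
        best = max(neighbours(x), key=f)
        if f(best) <= f(x):
            return x              # no improving neighbour: stop
        x = best
    return x

# Hypothetical landscape with a global peak at x=7 and a local peak at x=-2
f = lambda x: -(x - 7) ** 2 if x > 3 else -(x + 2) ** 2 - 5
peak = hill_climb(f, 10, lambda x: [x - 1, x + 1])    # reaches 7
stuck = hill_climb(f, 0, lambda x: [x - 1, x + 1])    # trapped at -2
```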
25. Local Optimisation - Hill Climbing
The neighbourhood of a point x might be N(x) = {x+1, x-1}.
[Figure: a 1-D fitness landscape f(x) with two different starting points x0.]
The second (right hand) choice of x0 is much better! That search goes x0 -> x1 -> x2 -> xopt, since f(x0) < f(x1) < f(x2) < f(xopt) > f(xopt+1).
The first search goes x0 -> x1 -> x2 and stops, since f(x0) < f(x1) < f(x2) > f(x3).
26. Landscapes
- The hills, peaks and the like are what we generally refer to as the fitness landscape.
- We have laid out the solutions in an ordered way and can see how the fitness varies as we traverse the solution space.
- This is easy to visualise with 1- or 2-D solution spaces but the idea generalises.
27. Landscapes
- Things that may affect our searches:
- Fitness differences between neighbouring solution space points
- Generally referred to as ruggedness.
- Smooth landscapes, jagged/spiky landscapes, fractal landscapes.
- The number of local optima
- A single local optimum helps a lot!!!!!
- Many, many problems of interest have multiple local optima.
- The distribution of the local optima in the search space
- Are optima similar, e.g. do they all have particular important characteristics, or do optima occur in radically different parts of the search space?
- Can have implications if you try to "mate" solutions as part of the search process (e.g. genetic algorithms).
- Sometimes it is possible to construct the global optimum from many local optima.
- The topology of basins of attraction of the local optima
- Search techniques have their own characteristics; once the candidate solution falls into a local optimum's "territory" it may not be able to escape!
- Great if you fall into the territory of the solution you want. Not so good if you have been taken prisoner by mediocrity (stuck in a distinctly sub-optimal local optimum).
- Basins of attraction can be very complex (cf. the Mandelbrot set).
28. Measures
- Won't go into details here, but a variety of measures have been proposed, e.g.:
- Time series autocorrelation
- Go on a random walk around the search space and measure the correlation between f(x_t) and f(x_(t+k)).
- Fitness distance correlation
- Developed for use in genetic algorithms.
- Measures how much fitness increases as we approach a local optimum.
New Ideas in Optimisation. Editors: Corne, Dorigo and Glover.
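The random-walk autocorrelation measure can be sketched directly: walk the neighbourhood structure, record fitness along the way, and correlate the series with a lagged copy of itself. A value near 1 indicates a smooth landscape; near 0, a rugged one. The function name and the quadratic test landscape are assumptions for illustration.

```python
import random

def random_walk_autocorrelation(f, start, neighbour, steps=2000, lag=1):
    """Estimate landscape ruggedness: correlation between f(x_t)
    and f(x_{t+lag}) along a random walk (high => smooth)."""
    xs = [start]
    for _ in range(steps):
        xs.append(neighbour(xs[-1]))
    ys = [f(x) for x in xs]
    n = len(ys) - lag
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    cov = sum((ys[t] - mean) * (ys[t + lag] - mean) for t in range(n)) / n
    return cov / var

random.seed(0)
# A smooth quadratic landscape over the integers, neighbourhood {x-1, x+1}
smooth = random_walk_autocorrelation(
    lambda x: x * x, 0, lambda x: x + random.choice((-1, 1)))
```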
29. Nastier - Fractal Landscapes
Given 1000 function evaluations, what's the best you can achieve? Huygens Competition.
http://gungurru.csse.uwa.edu.au/cara/huygens/cec2006.php
30. Local Search Solution: No Pain, No Gain
[Figure: a 1-D landscape z(x).] Allow non-improving moves so that it is possible to "go down"... in order to rise again to reach the global optimum.
31. Simulated Annealing
- Inspired by physics.
- In condensed matter physics, annealing is known as a thermal process for obtaining low energy states of a solid in a heat bath.
- Two steps:
- increase the temperature of the heat bath until the solid metal melts, and
- decrease carefully the temperature of the heat bath until the particles arrange themselves in the ground state of the solid.
- In the liquid phase particles arrange themselves randomly. In the ground state the particles are arranged in a highly structured lattice and the energy of the system is minimal.
- Compare with quenching: very rapid lowering of temperature (e.g. by dropping into a bath of cold water).
32. Simulated Annealing
- Thermal equilibrium is characterised by the Boltzmann distribution: the probability of being in state i with energy Ei is proportional to exp(-Ei / (kB * T)), where kB is Boltzmann's constant and T the temperature.
- Simulated annealing mimics the trajectory of physical transitions between states of various energies as thermal equilibrium is achieved.
33. Simulated Annealing
- Candidate solutions in a combinatorial optimisation problem are equivalent to the states of a physical system.
- The cost of a solution is equivalent to the energy of a state.
- We know that if we cool metals carefully enough, we can achieve very low energy states.
- Why not ape that process for optimisation?
- That's the inspiration for simulated annealing. Transitions between states (candidate solutions) are carried out probabilistically, in an analogous manner to the distribution describing physical state transitions.
34. Simulated Annealing
- Improving moves are always accepted.
- Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely:
- the worse the move, the less likely it is to be accepted;
- a worsening move is less likely to be accepted the cooler the temperature.
- The temperature T starts high and is gradually cooled as the search progresses.
- Initially (when things are "hot") virtually anything is accepted; at the end (when things are "nearly frozen") only improving moves are allowed (and the search effectively reduces to hill-climbing).
35. Simulated Annealing (Minimisation)
At each temperature Tk consider Lk moves.
Always accept improving moves.
Accept worsening moves probabilistically. This gets harder to do the worse the move, and harder as the temperature decreases.
Then calculate the next number of trial moves L(k+1) and the next temperature T(k+1).
36. Acceptance Criterion
Let D = cost(Snew) - cost(Scurrent). If D < 0 then we clearly have an improvement and we move to the trial state Snew. If D > 0 then we clearly have a non-improving move. We also have -D/Tk < 0 and so 0 < exp(-D/Tk) < 1. Therefore exp(-D/Tk) > U(0,1) is a probabilistic test for acceptance.
37. Cooling the System
- It is possible to do this in various ways.
- Most common is geometric cooling. This simply reduces the temperature by some multiplicative factor a, where 0 < a < 1. Thus we have Tk = a * T(k-1).
- Cooling factors are most typically in the range 0.8 - 0.99 (with a bias towards the higher end).
- Other methods are possible, e.g. logarithmic cooling, but the rough rate of cooling is generally found to be more important than the precise means of reduction.
- THE RATE OF COOLING MATTERS A LOT.
- Also, if you think the search isn't going well (i.e. getting stuck) then you can reheat the system too.
38. Achieving Thermal Equilibrium
- At each temperature a number Lk of trial moves are investigated.
- How big should Lk be?
- Harder to say.
- There is some theoretical advice on how many moves you need to consider, but most people simply experiment.
- People want results in good time and so feel a need to take short cuts. If experiments are not giving good enough results then greater values of Lk will be used.
- Some researchers spend less time at higher temperatures.
- Many simply make Lk constant over all k.
39. Very Basic Simulated Annealing Example
Iteration 1: Do 400 trial moves.
Iteration 2: Do 400 trial moves.
Iteration 3: Do 400 trial moves.
Iteration 4: Do 400 trial moves.
...
Iteration m: Do 400 trial moves.
...
Iteration n: Do 400 trial moves.
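Putting slides 34-39 together gives a very basic annealer: a fixed number of trial moves per temperature, the exp(-D/T) acceptance test, and geometric cooling. The cost function, starting point, and parameter values below are made up for illustration.

```python
import math, random

def simulated_annealing(cost, x0, neighbour, t0=100.0, alpha=0.9,
                        n_temps=60, moves_per_temp=400):
    """Geometric-cooling SA for minimisation: always accept improving
    moves; accept a worsening move with probability exp(-delta/T)."""
    x, t = x0, t0
    best = x
    for _ in range(n_temps):
        for _ in range(moves_per_temp):      # Lk trial moves at this Tk
            y = neighbour(x)
            delta = cost(y) - cost(x)
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = y
                if cost(x) < cost(best):     # track best-so-far
                    best = x
        t *= alpha                           # Tk = a * T(k-1)
    return best

# Toy minimisation over the integers 0..100 (hypothetical cost function)
random.seed(1)
cost = lambda x: (x - 17) ** 2
step = lambda x: min(100, max(0, x + random.choice((-1, 1))))
best = simulated_annealing(cost, x0=90, neighbour=step)
```

At the final temperatures almost no worsening moves are accepted, so the run ends as pure hill-climbing, exactly as slide 34 describes.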
40. Simulated Annealing
- Simulated annealing is a tremendously simple form of search.
- The theory is based on Markov chains (Markov => "lack of memory" property).
- Once the search reaches a state, it doesn't matter how it got there. It effectively forgets its past.
- To move from sk to a neighbouring state s(k+1), that state must:
- be selected for consideration, with some probability p(k,k+1);
- pass the acceptance test, with some probability q(k,k+1).
- These probabilities may vary between temperatures; within a temperature cycle they are history independent.
41. Initial State
- The most common approach to initial state selection is random choice.
- Though to overcome some of the limitations of the technique, multiple runs with different starting states may be carried out.
42. Initial Temperature
- How do you choose the initial temperature T0?
- We want a temperature at which a lot of moves are accepted.
- One way is to progressively increase the temperature and execute an inner loop. When the acceptance rate reaches, say, 95%, we have an appropriate T0 and can begin the annealing proper.
- Some tools progressively double the temperature as the means of determining T0.
43. I Want to Stop!
- Usually time constrained.
- Various criteria:
- No state change for a long time (you decide).
- Temperature below some threshold. The following criterion (by Lundy and Mees) aims to provide a result within epsilon of the global optimum with probability q.
- A real solution has been detected.
44. Neighbourhoods
- One aspect of neighbourhood definition concerns rapid cost function evaluation.
- Sometimes it is possible to simply calculate the change in cost function. For example, in the Travelling Salesperson Problem.
[Figure: a tour before and after a move that replaces two edges (weights 8 and 9) with two new edges (weights 4 and 5); the remaining edges are unchanged.]
New = Old + ((4+5) - (8+9)), so Delta = (4+5) - (8+9) = -8: only the changed edges need to be examined, not the whole tour.
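For a 2-opt style TSP move the delta cost involves only the two removed and two added edges. A minimal sketch, with a made-up 4-city distance matrix:

```python
def two_opt_delta(tour, dist, i, j):
    """Cost change from reversing tour[i+1..j]: only the removed edges
    (a,b) and (c,d) and the added edges (a,c) and (b,d) matter."""
    n = len(tour)
    a, b = tour[i], tour[(i + 1) % n]
    c, d = tour[j], tour[(j + 1) % n]
    return (dist[a][c] + dist[b][d]) - (dist[a][b] + dist[c][d])

# Symmetric distance matrix, values assumed for illustration
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour = [0, 1, 2, 3]
# Replace edges (0,1) and (2,3) with (0,2) and (1,3)
delta = two_opt_delta(tour, dist, 0, 2)
```

Here the full tour lengths are 21 before and 29 after, so the O(1) delta of +8 agrees with a full O(n) re-evaluation.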
45. Neighbourhoods
- You are given a set of integers S = {S1, S2, ..., Sn}.
- Can you find a subset Q of these integers that sums to a particular value V?
S = Q u (S-Q). A move could put an element of (S-Q) into Q, or remove an element from Q.
In the subset Q: 43, 76, 96, 32, 86. Not in the subset: 40, 56, 13, 97.
Let the target sum V be 274. Then current cost = |(43+76+96+32+86) - 274| = |333 - 274| = 59.
The delta cost is easy to calculate too (maintain the current subset sum).
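The incremental evaluation is the point: flipping one element changes the subset sum by +/- that element, so the new cost follows from the running sum without re-adding the whole subset. A sketch using the slide's numbers (the function name is illustrative):

```python
def subset_sum_best_move(S, in_subset, current_sum, target):
    """Try flipping each element's in/out status; the delta cost needs
    only the running subset sum, not a full re-evaluation.
    Returns (element, delta_cost, new_sum) for the best single flip."""
    current_cost = abs(current_sum - target)
    best = None
    for x in S:
        # removing x subtracts it from the sum; adding x adds it
        new_sum = current_sum - x if x in in_subset else current_sum + x
        delta = abs(new_sum - target) - current_cost
        if best is None or delta < best[1]:
            best = (x, delta, new_sum)
    return best

# Slide's example: target 274, current subset sums to 333 (cost 59)
S = [43, 76, 96, 32, 86, 40, 56, 13, 97]
Q = {43, 76, 96, 32, 86}
move = subset_sum_best_move(S, Q, sum(Q), 274)
```

The best single flip removes 43, taking the sum to 290 and the cost from 59 down to 16.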
46. Neighbourhoods
- But the real point to note is: YOU DEFINE THE NEIGHBOURHOOD.
- Variations are clearly possible. For the subset problem:
- We changed the in/out status of a single element.
- But we could have changed the status of 2 (3, 4, ...) such elements.
- Of course, the number of elements in the neighbourhood gets larger with such k-element modification.
- Need to take into account the cost function too: the changes in cost should not be too radical; we should have some degree of "continuity" in the neighbourhood.
47. Lack of Memory May Be a Problem
- There is nothing in standard annealing to prevent you going back to previously visited states during the search.
- Despite its ability to accept worsening moves, you may still get stuck in local optima.
- Our next technique (tabu search) aims to incorporate memory as part of its search procedure. This promotes diversification.
48. Tabu Search
- Simulated annealing theory is based on Markov chains:
- No memory. Once the search reaches a point, it doesn't matter how it got there.
- Tabu search adds memory to the search with notions such as:
- Tabu list. When a move is taken (or a state visited), it is placed on the tabu list for some number L of moves. It usually cannot be retaken for the next L moves - it is "tabu". This helps avoid cycles in the search. It is a form of short term memory for the search. Promotes diversification.
- Aspiration. But if taking a tabu move would give the best result yet, then it may be taken! Promotes convergence.
- Frequency. Long-term memory. Can be used to ensure particular move types are not taken too frequently over the whole search. Again, promotes diversification.
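The tabu list and aspiration test above can be sketched for a permutation problem with pairwise-swap moves, in the spirit of the Glover and Laguna example that follows. The objective function and starting permutation below are made up for illustration; real tabu search adds candidate-list strategies and frequency memory on top of this skeleton.

```python
from collections import deque
from itertools import combinations

def tabu_search(f, state, n_iters=50, tenure=3):
    """Tabu search over permutations with pairwise-swap moves.
    A swap stays tabu for `tenure` iterations unless it passes the
    aspiration test (it would beat the best solution found so far)."""
    current = list(state)
    best, best_val = list(current), f(current)
    tabu = deque(maxlen=tenure)          # short-term memory of recent swaps
    for _ in range(n_iters):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            nxt = list(current)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            move = (min(current[i], current[j]), max(current[i], current[j]))
            val = f(nxt)
            # tabu moves are admissible only via aspiration
            if move not in tabu or val > best_val:
                candidates.append((val, move, nxt))
        if not candidates:
            break
        val, move, current = max(candidates, key=lambda c: c[0])
        tabu.append(move)                # this swap is now tabu for `tenure` iters
        if val > best_val:
            best, best_val = list(current), val
    return best, best_val

# Hypothetical objective: reward adjacent elements in increasing order
f = lambda p: sum(1 for a, b in zip(p, p[1:]) if a < b)
best, val = tabu_search(f, [4, 2, 7, 3, 6, 1, 5])
```

Note that unlike annealing, the best non-tabu move is taken even when it worsens the fitness; the tabu list is what stops the search immediately undoing it.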
49. Tabu Search - Example
Current fitness 10; after the best move, fitness 16. Tabu length 3 for this example.
The slide shows the current ordering (a permutation of 7 elements) and, for two consecutive iterations, the top 5 candidate swaps with their move values:
Iteration 1: (5,4) +6, (7,4) +4, (3,6) +2, (2,3) 0, (4,1) -1. The best swap (5,4) is taken, giving fitness 16.
Iteration 2: (3,1) +2, (2,3) +1, (3,6) -1, (7,1) -2, (6,1) -4.
The example is a permutation problem representing the order of filter applications (a move is simply a pairwise swap) and is taken from the Tabu Search tutorial by Glover and Laguna in "Modern Heuristic Techniques for Combinatorial Problems" (edited by Colin Reeves), an excellent introduction to search generally.
50. Tabu Search - Example
A later pair of iterations from the same example (fitness values 14 and 18 appear on the slide). Several of the top 5 candidate swaps are now marked Tabu. At one point the best available move is tabu, but it would give the best fitness seen so far: aspiration suggests we should take the tabu move anyhow.
51. Local Search Trajectories
- There may be a great deal of useful information in the trajectory (trace) of a search.
- We are doing guided search: each decision whether to move to a new state is based on information from the cost function landscape.
- Why throw all this away?
- The "final result" is only one of the outputs from a search.
- Sometimes analysis of the trajectory may provide information on the desired solution, even when the final result is not that desired solution.
- Failure may not be as bad as it seems!
52. Local Searches
- Why bother with anything else?
- The local optimum you end up in may depend very much on the initial starting state. If the search space is sufficiently large it may be very unlikely that a local search technique will find the global optimum (within any reasonable amount of time anyhow).
- Some population techniques have been found to sample search spaces more effectively and learn features of high performing solutions.
- A good deal of experimentation has shown that population based approaches may give better overall results, but often only when elements of local search are added to do the final tuning. The "reading spectacles" of local search are great. Other techniques have better "binoculars".
- Other approaches may get you near the summit of Everest very effectively. But local search then gives you the oxygen and the kit to ascend to the peak within view.
- A simple local search (e.g. annealing) is very cheap to implement. Try local search first, and then something more sophisticated if you need to.
53. Local Searches
- Local searches may be used in other ways too.
- Multiple runs may be employed.
- If we can identify several local optima, we may be able to search the space more effectively using this information.
54. Summary and Comments
- If your problem looks like a special purpose nail, use a special purpose hammer. Otherwise move on.
- Local search progressively investigates solutions close to the current solution. Effectively, the search trace is a walk around the search space, where only small steps can be taken.
- But "small step" is defined by you.
- Some steps you just want to take; others you take in the hope of reward later.
- Convergence is promoted in various ways (e.g. always accept improving moves, aspiration criterion).
- Diversification is promoted in various ways (e.g. probabilistically accept non-improving moves, tabu lists).
- Local search is a good place to start...