1. Evolutionary Algorithms (EVO): Introduction and Local Search (L1 and L2)
- John A. Clark
- Professor of Critical Systems
- Non-Standard Computation Group
2. Aim of the Module
- To familiarise you with a variety of nature-inspired problem solving techniques, in particular those inspired by concepts in evolution:
- Genetic algorithms (of various sorts)
- Simple GA
- More advanced GAs
- Estimation of distribution algorithms (EDAs)
- Multi-objective Genetic Algorithms (MOGAs)
- Genetic programming
- Grammatical evolution
- Evolutionary strategies
- Co-evolution.
- Will also give a variety of unusual applications towards the end of the module.
3. Presentational Strategy
- Generally aim to indicate why techniques emerged.
- Technique A will have its strengths.
- Technique A also has its limitations:
- too limited a domain of application, e.g. only applies to continuous functions.
- cannot cope practically with large problems, e.g. algorithms might take longer than the age of the universe to find a solution, or may require infeasible amounts of memory.
- basic operation seems ill suited to the problem at hand.
- Technique B is developed to fix some aspect of A's deficiencies.
- Technique B has its strengths.
- Technique B also has its limitations.
- And so on.
4. Lectures and Practicals
- Lectures in weeks 6, 7, 8, 9
- Monday 10.15-12.15, P/L/001 (Physics)
- Thursday 16.15-18.15, ATB/056 (Langwith)
- Practicals in weeks 7, 8, 9, 10
- Wednesday 9.15-11.15, CS/007 (Computer Science)
5. Orders Matter!
- Scenario 1
- "Please tell me what your problem is."
- Pause.
- "I think the answer to your problem is nature-inspired computation."
- Scenario 2
- "I think the answer to your problem is nature-inspired computation."
- Pause.
- "Please tell me what your problem is."
6. A World Outside NI-Computation
- Optimisation did not start with NI-computation.
- There is a great deal of research in mathematics and operations research aimed at finding optimal or indeed good solutions to problems.
- Before we decide "I think the answer to your problem is nature-inspired computation", let's take a brief look at a few of the techniques out there.
- Let me give you an excellent piece of advice...
7. Evolutionary Computation? Just say No!
8. Linear Programming
- Linear programming is an extremely powerful solution technique.
- Maximize f(x1,..,xn) = c1x1 + .. + cnxn
- Subject to the linear constraints:
- a11x1 + .. + a1nxn <= b1
- a21x1 + .. + a2nxn <= b2
- ...
- ak1x1 + .. + aknxn <= bk
9. LP Example (Taha)
- Suppose a paint manufacturer makes two types of paint (interior and exterior).
- Interior paint sells for 2000 per tonne and exterior paint sells for 3000 per tonne.
- Making each tonne of paint requires the indicated amounts of raw materials A and B.
- Demand for interior paint cannot exceed that for exterior by more than 1 tonne.
- Maximum demand for interior paint is 2 tonnes.
- How much of each type should be produced?
10. LP Example (Taha)
- Let x be the amount of exterior paint and y be the amount of interior paint produced.
- Maximise
- f(x,y) = 3000x + 2000y
- subject to
- x + 2y <= 6
- 2x + y <= 8
- y - x <= 1
- y <= 2
- x >= 0, y >= 0
[Figure: the feasible region in the (x, y) plane, x = exterior, y = interior. Dotted lines show contours f(x,y) = constant. The optimum occurs at a vertex of the region: x = 3 1/3, y = 1 1/3.]
For an excellent introduction to LP see Taha's Operations Research.
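The graphical solution above can be checked in code. Since an LP optimum (when one exists) lies at a vertex of the feasible polygon, a minimal sketch for this tiny two-variable problem is to enumerate all pairwise boundary intersections and keep the best feasible one. This brute force stands in for what real solvers (simplex, interior-point methods) do far more efficiently; the function name is illustrative.

```python
from itertools import combinations

# Constraints p*x + q*y <= r for the Taha paint example
constraints = [
    (1, 2, 6),    # raw material A: x + 2y <= 6
    (2, 1, 8),    # raw material B: 2x + y <= 8
    (-1, 1, 1),   # y - x <= 1 (demand gap)
    (0, 1, 2),    # y <= 2 (max interior demand)
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def lp_vertex_solve(constraints, objective):
    """Maximise objective over the feasible polygon by checking
    every vertex (intersection of two constraint boundaries)."""
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(constraints, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:          # parallel boundaries: no vertex
            continue
        x = (r1 * q2 - r2 * q1) / det # Cramer's rule
        y = (p1 * r2 - p2 * r1) / det
        if all(p * x + q * y <= r + 1e-9 for p, q, r in constraints):
            val = objective(x, y)
            if best is None or val > best[0]:
                best = (val, x, y)
    return best

value, x, y = lp_vertex_solve(constraints, lambda x, y: 3000 * x + 2000 * y)
```

This recovers the optimum at x = 3 1/3, y = 1 1/3 shown on the slide.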
11. Linear Programming
- A vast amount of work exists in the area of linear programming.
- Commercial and freeware tools are available.
- What if the variables are not continuous?
- Integer programming techniques exist.
- Mixed variable type techniques too.
- Don't blunder into using NI-computational techniques just because you don't know if there are any special purpose techniques for the problem at hand.
- Look for them!!!
12. Dijkstra's Shortest Path Algorithm
Dijkstra's Algorithm solves the single-source shortest path problem in weighted graphs with non-negative weights.

function Dijkstra(G, w, s)
    for each vertex v in V[G]         // Initialisations
        d[v] := infinity
        previous[v] := undefined
    d[s] := 0                         // Distance from s to s
    S := empty set
    Q := V[G]                         // Set of all vertices
    while Q is not an empty set       // The algorithm itself
        u := Extract_Min(Q)
        S := S union {u}
        for each edge (u,v) outgoing from u
            if d[u] + w(u,v) < d[v]   // Relax (u,v)
                d[v] := d[u] + w(u,v)
                previous[v] := u

http://en.wikipedia.org/wiki/Dijkstra's_algorithm
13. Dijkstra's Shortest Path Algorithm
14. Dijkstra's Shortest Path Algorithm
- With a simple implementation using linked lists this algorithm has complexity O(V^2).
- With sparse graphs and a smart implementation (e.g. using a Fibonacci heap) this can be improved to O(E + V log V).
- There are a few constraints (e.g. non-negative edge weights), but...
- if you have a shortest path problem like the one shown, use a shortest path algorithm.
- Note there are further algorithms that handle some limitations, e.g. the Bellman-Ford algorithm allows negative edge weights.
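The pseudocode above translates directly into a short runnable version. This sketch uses a binary heap (Python's heapq) rather than a Fibonacci heap, giving O((V + E) log V); the example graph is made up for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; graph maps u -> {v: weight}.
    Heap-based version of the pseudocode on slide 12."""
    dist = {v: float('inf') for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                   # stale heap entry: u already settled
            continue
        for v, w in graph[u].items():
            if dist[u] + w < dist[v]:     # relax (u, v)
                dist[v] = dist[u] + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

# Small example graph (weights assumed for illustration)
graph = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2, 'D': 6}, 'C': {'D': 3}, 'D': {}}
dist, prev = dijkstra(graph, 'A')
```

Here the shortest A-to-D path is A -> B -> C -> D with length 6, recoverable by following prev back from D.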
15. Dijkstra's Shortest Path Algorithm
[Figure: a five-node network (A, B, C, D, E) whose links are labelled with reliabilities between 0.97 and 0.99. Most reliable path from A to D is?]
Now consider a network where the probability of a message passing reliably across a link is as shown. The reliability of a path is now the product of the edge weights (probabilities) along that path.
Time to reach for evolutionary computation????? No! Take the negative logarithm of each weight (non-negative, since each probability is at most 1), so that maximising the product becomes minimising a sum. And now use Dijkstra's SPA.
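The transformation can be sketched as follows. The exact topology and figures on the slide are not fully recoverable, so the link reliabilities below are assumed for illustration; the point is only that -log turns a max-product problem into a min-sum problem Dijkstra can solve.

```python
import heapq, math

def most_reliable_path(links, source, target):
    """Maximise the product of link reliabilities by running a
    Dijkstra-style search on weights -log(p), non-negative for p <= 1."""
    graph = {}
    for u, v, p in links:
        graph.setdefault(u, {})[v] = -math.log(p)
        graph.setdefault(v, {})[u] = -math.log(p)   # undirected links
    dist = {v: float('inf') for v in graph}
    prev = {source: None}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path, node = [], target
    while node is not None:               # walk predecessors back to source
        path.append(node)
        node = prev[node]
    return path[::-1], math.exp(-dist[target])

# Hypothetical reliabilities (not the slide's exact figures)
links = [('A', 'B', 0.98), ('A', 'C', 0.97), ('B', 'D', 0.98),
         ('C', 'D', 0.99), ('C', 'E', 0.99), ('E', 'D', 0.99)]
path, reliability = most_reliable_path(links, 'A', 'D')
```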
16. Moral of the Tale
Even if your problem does not seem to be solvable by known efficient algorithms, a transformation of it might be.
17. Calculus: 1 Variable
- Calculus is a well known method of finding optima.
- Find the minimum of
- y(x) = 3x^2 + 2x + 1
- dy/dx = 6x + 2
- Set dy/dx = 0
- => x = -1/3, y = 2/3
- Strictly we should check d2y/dx2 = 6 > 0 for a minimum.
y(x) is a differentiable function. The same sorts of ideas apply in higher dimensions.
18. Finding Zeroes of a Polynomial Function
- Find the zeroes of
- y(x) = x^2 - 5x + 6
- Analytic solutions exist for quadratics
- y(x) = ax^2 + bx + c = 0
- And for cubics (Not pleasant! Look it up.)
- y(x) = ax^3 + bx^2 + cx + d = 0
- And for quartics (Not pleasant! Look it up.)
- y(x) = ax^4 + bx^3 + cx^2 + dx + e = 0
- And for quintics????? No general formulae exist.
19. Moral of the Tale
If your problem looks like a (special purpose) nail, use a (special purpose) hammer.
Available from all good mathematics and operations research departments at reasonable cost. Read the small print.
20. A Bit of Guidance Goes a Long Way
Newton-Raphson zero finding.
[Figure: a curve with successive tangent lines showing iterates x0, x1, x2, x3 converging towards a zero.]
The approach uses the gradient at the current xn to guide movement in the right direction to generate a better x(n+1).
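The iteration x(n+1) = xn - f(xn)/f'(xn) can be sketched in a few lines, applied here to the quadratic from slide 18 (roots at 2 and 3). The function name and tolerances are illustrative.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/df(x_n) until |f(x)| is tiny."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / df(x)        # follow the tangent to its zero crossing
    return x

# Zeroes of y(x) = x^2 - 5x + 6; which root you get depends on x0
f = lambda x: x * x - 5 * x + 6
df = lambda x: 2 * x - 5
root = newton_raphson(f, df, x0=5.0)    # converges to 3
root2 = newton_raphson(f, df, x0=0.0)   # converges to 2
```

Note the sensitivity to the starting point: different x0 lead to different roots, a theme that recurs with local search and starting states.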
21. Local Search
22. Local Search Procedure
- A local search comprises a trace of execution:
- Trace = (s0,r0), (s1,r1), (s2,r2), ..., (send,rend)
- The sk are members of the search space.
- The rk are the fitness values (evaluated in the solution space) of the corresponding sk.
- Consecutive sk are related in a particular way:
- for all k in 0..(end-1), s(k+1) is in Neighbourhood(sk)
- The neighbourhood function N(sk) = Neighbourhood(sk) defines a set of points that are somehow deemed to be "near to", "close to" or "in the locality of" sk.
23. Some Local Search Questions
- How do you determine the start state s0?
- How do you define the neighbourhood function N()?
- How do you determine which member of the neighbourhood is selected to be the next state?
24. Hill Climbing
- Let the current solution or point be x.
- Define the neighbourhood N(x) to be the set of solutions that are "close" to x.
- If possible, move to a neighbouring solution that improves the value of f(x), otherwise stop.
- Choose any y as next solution provided f(y) > f(x)
- weak hill-climbing (don't go down)
- Choose y as next solution such that f(y) = sup{ f(v) : v in N(x) }
- steepest gradient ascent (climb as fast as you can)
- For many purposes hill-climbing works very well, particularly when you climb the right hill - but there is a problem.
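Steepest-ascent hill climbing is a few lines of code. The two-peaked function below is made up to show both behaviours from slide 25: one starting point climbs to the global peak, the other gets stuck on a lower one.

```python
def hill_climb(f, x0, neighbours, max_steps=10000):
    """Steepest-ascent hill climbing: move to the best neighbour
    while it improves f, otherwise stop (we are at a local optimum)."""
    x = x0
    for _ in range(max_steps):
        best = max(neighbours(x), key=f)
        if f(best) <= f(x):
            return x              # no improving neighbour: stop
        x = best
    return x

# Hypothetical landscape with a global peak at x=7 and a local peak at x=-2
f = lambda x: -(x - 7) ** 2 if x > 3 else -(x + 2) ** 2 - 5
peak = hill_climb(f, 10, lambda x: [x - 1, x + 1])    # reaches 7
stuck = hill_climb(f, 0, lambda x: [x - 1, x + 1])    # trapped at -2
```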
25. Local Optimisation - Hill Climbing
The neighbourhood of a point x might be N(x) = {x+1, x-1}.
[Figure: a 1-D fitness landscape f(x) with two different starting points x0.]
The second (right hand) choice of x0 is much better! That search goes x0 -> x1 -> x2 -> xopt, since f(x0) < f(x1) < f(x2) < f(xopt) > f(xopt+1).
The first search goes x0 -> x1 -> x2 and stops, since f(x0) < f(x1) < f(x2) > f(x3).
26. Landscapes
- The hills, peaks and the like are what we generally refer to as the fitness landscape.
- We have laid out the solutions in an ordered way and can see how the fitness varies as we traverse the solution space.
- This is easy to visualise with 1- or 2-D solution spaces but the idea generalises.
27. Landscapes
- Things that may affect our searches:
- Fitness differences between neighbouring solution space points
- Generally referred to as ruggedness.
- Smooth landscapes, jagged/spiky landscapes, fractal landscapes.
- The number of local optima
- A single local optimum helps a lot!!!!!
- Many, many problems of interest have multiple local optima.
- The distribution of the local optima in the search space
- Are optima similar, e.g. do they all have particular important characteristics, or do optima occur in radically different parts of the search space?
- Can have implications if you try to "mate" solutions as part of the search process (e.g. genetic algorithms).
- Sometimes it is possible to construct the global optimum from many local optima.
- The topology of basins of attraction of the local optima
- Search techniques have their own characteristics; once the candidate solution falls into a local optimum's "territory" it may not be able to escape!
- Great if you fall into the territory of the solution you want. Not so good if you have been taken prisoner by mediocrity (stuck in a distinctly sub-optimal local optimum).
- Basins of attraction can be very complex (cf. the Mandelbrot set).
28. Measures
- Won't go into details here, but a variety of measures have been proposed, e.g.:
- Time series autocorrelation
- Go on a random walk around the search space and measure the correlation between f(x_t) and f(x_(t+k)).
- Fitness distance correlation
- Developed for use in genetic algorithms.
- Measures how much fitness increases as we approach a local optimum.
New Ideas in Optimisation. Editors: Corne, Dorigo and Glover.
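The random-walk autocorrelation measure can be sketched directly: walk the neighbourhood structure, record fitness along the way, and correlate the series with a lagged copy of itself. A value near 1 indicates a smooth landscape; near 0, a rugged one. The function name and the quadratic test landscape are assumptions for illustration.

```python
import random

def random_walk_autocorrelation(f, start, neighbour, steps=2000, lag=1):
    """Estimate landscape ruggedness: correlation between f(x_t)
    and f(x_{t+lag}) along a random walk (high => smooth)."""
    xs = [start]
    for _ in range(steps):
        xs.append(neighbour(xs[-1]))
    ys = [f(x) for x in xs]
    n = len(ys) - lag
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    cov = sum((ys[t] - mean) * (ys[t + lag] - mean) for t in range(n)) / n
    return cov / var

random.seed(0)
# A smooth quadratic landscape over the integers, neighbourhood {x-1, x+1}
smooth = random_walk_autocorrelation(
    lambda x: x * x, 0, lambda x: x + random.choice((-1, 1)))
```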
29. Nastier - Fractal Landscapes
Given 1000 function evaluations, what's the best you can achieve? Huygens Competition.
http://gungurru.csse.uwa.edu.au/cara/huygens/cec2006.php
30. Local Search Solution: No Pain, No Gain
[Figure: a 1-D landscape z(x).] Allow non-improving moves so that it is possible to "go down"... in order to rise again to reach the global optimum.
31. Simulated Annealing
- Inspired by physics.
- In condensed matter physics, annealing is known as a thermal process for obtaining low energy states of a solid in a heat bath.
- Two steps:
- increase the temperature of the heat bath until the solid metal melts, and
- decrease carefully the temperature of the heat bath until the particles arrange themselves in the ground state of the solid.
- In the liquid phase particles arrange themselves randomly. In the ground state the particles are arranged in a highly structured lattice and the energy of the system is minimal.
- Compare with quenching: very rapid lowering of temperature (e.g. by dropping into a bath of cold water).
32. Simulated Annealing
- Thermal equilibrium is characterised by the Boltzmann distribution: the probability of being in state i with energy Ei is proportional to exp(-Ei / (kB * T)), where kB is Boltzmann's constant and T the temperature.
- Simulated annealing mimics the trajectory of physical transitions between states of various energies as thermal equilibrium is achieved.
33. Simulated Annealing
- Candidate solutions in a combinatorial optimisation problem are equivalent to the states of a physical system.
- The cost of a solution is equivalent to the energy of a state.
- We know that if we cool metals carefully enough, we can achieve very low energy states.
- Why not ape that process for optimisation?
- That's the inspiration for simulated annealing. Transitions between states (candidate solutions) are carried out probabilistically, in an analogous manner to the distribution describing physical state transitions.
34. Simulated Annealing
- Improving moves are always accepted.
- Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely:
- the worse the move, the less likely it is to be accepted;
- a worsening move is less likely to be accepted the cooler the temperature.
- The temperature T starts high and is gradually cooled as the search progresses.
- Initially (when things are "hot") virtually anything is accepted; at the end (when things are "nearly frozen") only improving moves are allowed (and the search effectively reduces to hill-climbing).
35. Simulated Annealing (Minimisation)
At each temperature Tk consider Lk moves.
Always accept improving moves.
Accept worsening moves probabilistically. This gets harder to do the worse the move, and harder as the temperature decreases.
Then calculate the next number of trial moves L(k+1) and the next temperature T(k+1).
36. Acceptance Criterion
Let D = cost(Snew) - cost(Scurrent). If D < 0 then we clearly have an improvement and we move to the trial state Snew. If D > 0 then we clearly have a non-improving move. We also have -D/Tk < 0 and so 0 < exp(-D/Tk) < 1. Therefore exp(-D/Tk) > U(0,1) is a probabilistic test for acceptance.
37. Cooling the System
- It is possible to do this in various ways.
- Most common is geometric cooling. This simply reduces the temperature by some multiplicative factor a, where 0 < a < 1. Thus we have Tk = a * T(k-1).
- Cooling factors are most typically in the range 0.8 - 0.99 (with a bias towards the higher end).
- Other methods are possible, e.g. logarithmic cooling, but the rough rate of cooling is generally found to be more important than the precise means of reduction.
- THE RATE OF COOLING MATTERS A LOT.
- Also, if you think the search isn't going well (i.e. getting stuck) then you can reheat the system too.
38. Achieving Thermal Equilibrium
- At each temperature a number Lk of trial moves are investigated.
- How big should Lk be?
- Harder to say.
- There is some theoretical advice on how many moves you need to consider, but most people simply experiment.
- People want results in good time and so feel a need to take short cuts. If experiments are not giving good enough results then greater values of Lk will be used.
- Some researchers spend less time at higher temperatures.
- Many simply make Lk constant over all k.
39. Very Basic Simulated Annealing Example
Iteration 1: Do 400 trial moves.
Iteration 2: Do 400 trial moves.
Iteration 3: Do 400 trial moves.
Iteration 4: Do 400 trial moves.
...
Iteration m: Do 400 trial moves.
...
Iteration n: Do 400 trial moves.
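Putting slides 34-39 together gives a very basic annealer: a fixed number of trial moves per temperature, the exp(-D/T) acceptance test, and geometric cooling. The cost function, starting point, and parameter values below are made up for illustration.

```python
import math, random

def simulated_annealing(cost, x0, neighbour, t0=100.0, alpha=0.9,
                        n_temps=60, moves_per_temp=400):
    """Geometric-cooling SA for minimisation: always accept improving
    moves; accept a worsening move with probability exp(-delta/T)."""
    x, t = x0, t0
    best = x
    for _ in range(n_temps):
        for _ in range(moves_per_temp):      # Lk trial moves at this Tk
            y = neighbour(x)
            delta = cost(y) - cost(x)
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = y
                if cost(x) < cost(best):     # track best-so-far
                    best = x
        t *= alpha                           # Tk = a * T(k-1)
    return best

# Toy minimisation over the integers 0..100 (hypothetical cost function)
random.seed(1)
cost = lambda x: (x - 17) ** 2
step = lambda x: min(100, max(0, x + random.choice((-1, 1))))
best = simulated_annealing(cost, x0=90, neighbour=step)
```

At the final temperatures almost no worsening moves are accepted, so the run ends as pure hill-climbing, exactly as slide 34 describes.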
40. Simulated Annealing
- Simulated annealing is a tremendously simple form of search.
- The theory is based on Markov chains (Markov => "lack of memory" property).
- Once the search reaches a state, it doesn't matter how it got there. It effectively forgets its past.
- To move from sk to a neighbouring state s(k+1), that state must:
- be selected for consideration, with some probability p(k,k+1);
- pass the acceptance test, with some probability q(k,k+1).
- These probabilities may vary between temperatures; within a temperature cycle they are history independent.
41. Initial State
- The most common approach to initial state selection is random choice.
- Though to overcome some of the limitations of the technique, multiple runs with different starting states may be carried out.
42. Initial Temperature
- How do you choose the initial temperature T0?
- We want a temperature at which a lot of moves are accepted.
- One way is to progressively increase the temperature and execute an inner loop. When the acceptance rate reaches, say, 95%, we have an appropriate T0 and can begin the annealing proper.
- Some tools progressively double the temperature as the means of determining T0.
43. I Want to Stop!
- Usually time constrained.
- Various criteria:
- No state change for a long time (you decide).
- Temperature below some threshold. The following criterion (by Lundy and Mees) aims to provide a result within epsilon of the global optimum with probability q.
- A real solution has been detected.
44. Neighbourhoods
- One aspect of neighbourhood definition concerns rapid cost function evaluation.
- Sometimes it is possible to simply calculate the change in cost function. For example, in the Travelling Salesperson Problem.
[Figure: a tour before and after a move that replaces two edges (weights 8 and 9) with two new edges (weights 4 and 5); the remaining edges are unchanged.]
New = Old + ((4+5) - (8+9)), so Delta = (4+5) - (8+9) = -8: only the changed edges need to be examined, not the whole tour.
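For a 2-opt style TSP move the delta cost involves only the two removed and two added edges. A minimal sketch, with a made-up 4-city distance matrix:

```python
def two_opt_delta(tour, dist, i, j):
    """Cost change from reversing tour[i+1..j]: only the removed edges
    (a,b) and (c,d) and the added edges (a,c) and (b,d) matter."""
    n = len(tour)
    a, b = tour[i], tour[(i + 1) % n]
    c, d = tour[j], tour[(j + 1) % n]
    return (dist[a][c] + dist[b][d]) - (dist[a][b] + dist[c][d])

# Symmetric distance matrix, values assumed for illustration
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour = [0, 1, 2, 3]
# Replace edges (0,1) and (2,3) with (0,2) and (1,3)
delta = two_opt_delta(tour, dist, 0, 2)
```

Here the full tour lengths are 21 before and 29 after, so the O(1) delta of +8 agrees with a full O(n) re-evaluation.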
45. Neighbourhoods
- You are given a set of integers S = {S1, S2, ..., Sn}.
- Can you find a subset Q of these integers that sums to a particular value V?
S = Q u (S-Q). A move could put an element of (S-Q) into Q, or remove an element from Q.
In the subset Q: 43, 76, 96, 32, 86. Not in the subset: 40, 56, 13, 97.
Let the target sum V be 274. Then current cost = |(43+76+96+32+86) - 274| = |333 - 274| = 59.
The delta cost is easy to calculate too (maintain the current subset sum).
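The incremental evaluation is the point: flipping one element changes the subset sum by +/- that element, so the new cost follows from the running sum without re-adding the whole subset. A sketch using the slide's numbers (the function name is illustrative):

```python
def subset_sum_best_move(S, in_subset, current_sum, target):
    """Try flipping each element's in/out status; the delta cost needs
    only the running subset sum, not a full re-evaluation.
    Returns (element, delta_cost, new_sum) for the best single flip."""
    current_cost = abs(current_sum - target)
    best = None
    for x in S:
        # removing x subtracts it from the sum; adding x adds it
        new_sum = current_sum - x if x in in_subset else current_sum + x
        delta = abs(new_sum - target) - current_cost
        if best is None or delta < best[1]:
            best = (x, delta, new_sum)
    return best

# Slide's example: target 274, current subset sums to 333 (cost 59)
S = [43, 76, 96, 32, 86, 40, 56, 13, 97]
Q = {43, 76, 96, 32, 86}
move = subset_sum_best_move(S, Q, sum(Q), 274)
```

The best single flip removes 43, taking the sum to 290 and the cost from 59 down to 16.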
46. Neighbourhoods
- But the real point to note is: YOU DEFINE THE NEIGHBOURHOOD.
- Variations are clearly possible. For the subset problem:
- We changed the in/out status of a single element.
- But we could have changed the status of 2 (3, 4, ...) such elements.
- Of course, the number of elements in the neighbourhood gets larger with such k-element modification.
- Need to take into account the cost function too: the changes in cost should not be too radical; we should have some degree of "continuity" in the neighbourhood.
47. Lack of Memory May Be a Problem
- There is nothing in standard annealing to prevent you going back to previously visited states during the search.
- Despite its ability to accept worsening moves, you may still get stuck in local optima.
- Our next technique (tabu search) aims to incorporate memory as part of its search procedure. This promotes diversification.
48. Tabu Search
- Simulated annealing theory is based on Markov chains:
- No memory. Once the search reaches a point, it doesn't matter how it got there.
- Tabu search adds memory to the search with notions such as:
- Tabu list. When a move is taken (or a state visited), it is placed on the tabu list for some number L of moves. It usually cannot be retaken for the next L moves - it is "tabu". This helps avoid cycles in the search. It is a form of short term memory for the search. Promotes diversification.
- Aspiration. But if taking a tabu move would give the best result yet, then it may be taken! Promotes convergence.
- Frequency. Long-term memory. Can be used to ensure particular move types are not taken too frequently over the whole search. Again, promotes diversification.
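The tabu list and aspiration test above can be sketched for a permutation problem with pairwise-swap moves, in the spirit of the Glover and Laguna example that follows. The objective function and starting permutation below are made up for illustration; real tabu search adds candidate-list strategies and frequency memory on top of this skeleton.

```python
from collections import deque
from itertools import combinations

def tabu_search(f, state, n_iters=50, tenure=3):
    """Tabu search over permutations with pairwise-swap moves.
    A swap stays tabu for `tenure` iterations unless it passes the
    aspiration test (it would beat the best solution found so far)."""
    current = list(state)
    best, best_val = list(current), f(current)
    tabu = deque(maxlen=tenure)          # short-term memory of recent swaps
    for _ in range(n_iters):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            nxt = list(current)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            move = (min(current[i], current[j]), max(current[i], current[j]))
            val = f(nxt)
            # tabu moves are admissible only via aspiration
            if move not in tabu or val > best_val:
                candidates.append((val, move, nxt))
        if not candidates:
            break
        val, move, current = max(candidates, key=lambda c: c[0])
        tabu.append(move)                # this swap is now tabu for `tenure` iters
        if val > best_val:
            best, best_val = list(current), val
    return best, best_val

# Hypothetical objective: reward adjacent elements in increasing order
f = lambda p: sum(1 for a, b in zip(p, p[1:]) if a < b)
best, val = tabu_search(f, [4, 2, 7, 3, 6, 1, 5])
```

Note that unlike annealing, the best non-tabu move is taken even when it worsens the fitness; the tabu list is what stops the search immediately undoing it.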
49. Tabu Search - Example
Current fitness 10; after the best move, fitness 16. Tabu length 3 for this example.
The slide shows the current ordering (a permutation of 7 elements) and, for two consecutive iterations, the top 5 candidate swaps with their move values:
Iteration 1: (5,4) +6, (7,4) +4, (3,6) +2, (2,3) 0, (4,1) -1. The best swap (5,4) is taken, giving fitness 16.
Iteration 2: (3,1) +2, (2,3) +1, (3,6) -1, (7,1) -2, (6,1) -4.
The example is a permutation problem representing the order of filter applications (a move is simply a pairwise swap) and is taken from the Tabu Search tutorial by Glover and Laguna in "Modern Heuristic Techniques for Combinatorial Problems" (edited by Colin Reeves), an excellent introduction to search generally.
50. Tabu Search - Example
A later pair of iterations from the same example (fitness values 14 and 18 appear on the slide). Several of the top 5 candidate swaps are now marked Tabu. At one point the best available move is tabu, but it would give the best fitness seen so far: aspiration suggests we should take the tabu move anyhow.
51. Local Search Trajectories
- There may be a great deal of useful information in the trajectory (trace) of a search.
- We are doing guided search: each decision whether to move to a new state is based on information from the cost function landscape.
- Why throw all this away?
- The "final result" is only one of the outputs from a search.
- Sometimes analysis of the trajectory may provide information on the desired solution, even when the final result is not that desired solution.
- Failure may not be as bad as it seems!
52. Local Searches
- Why bother with anything else?
- The local optimum you end up in may depend very much on the initial starting state. If the search space is sufficiently large it may be very unlikely that a local search technique will find the global optimum (within any reasonable amount of time anyhow).
- Some population techniques have been found to sample search spaces more effectively and learn features of high performing solutions.
- A good deal of experimentation has shown that population based approaches may give better overall results, but often only when elements of local search are added to do the final tuning. The "reading spectacles" of local search are great. Other techniques have better "binoculars".
- Other approaches may get you near the summit of Everest very effectively. But local search then gives you the oxygen and the kit to ascend to the peak within view.
- A simple local search (e.g. annealing) is very cheap to implement. Try local search first, and then something more sophisticated if you need to.
53. Local Searches
- Local searches may be used in other ways too.
- Multiple runs may be employed.
- If we can identify several local optima, we may be able to search the space more effectively using this information.
54. Summary and Comments
- If your problem looks like a special purpose nail, use a special purpose hammer. Otherwise move on.
- Local search progressively investigates solutions close to the current solution. Effectively, the search trace is a walk around the search space, where only small steps can be taken.
- But "small step" is defined by you.
- Some steps you just want to take; others you take in the hope of reward later.
- Convergence is promoted in various ways (e.g. always accept improving moves, aspiration criterion).
- Diversification is promoted in various ways (e.g. probabilistically accept non-improving moves, tabu lists).
- Local search is a good place to start...