
1
Evolution strategies (ES)
  • Chapter 4

2
Evolution strategies
  • Overview of theoretical aspects
  • Algorithm
  • The general scheme
  • Representation and operators
  • Example
  • Properties
  • Applications

3
ES quick overview (I)
  • Developed in Germany in the 1970s
  • Early names: Ingo Rechenberg, Hans-Paul Schwefel
    and Peter Bienert (1965), TU Berlin
  • In the beginning, ESs were not devised to compute
    minima or maxima of real-valued static functions
    with fixed numbers of variables and without noise
    during their evaluation. Rather, they came to the
    fore as a set of rules for the automatic design
    and analysis of consecutive experiments with
    stepwise variable adjustments driving a suitably
    flexible object / system into its optimal state
    in spite of environmental noise.
  • Search strategy: concurrent, guided by the
    absolute quality of individuals

4
ES quick overview (II)
  • Typically applied to
  • shape optimization, e.g. evolving a slender 3D
    body in a wind tunnel flow into a shape with
    minimal drag per volume
  • numerical optimisation
  • continuous parameter optimisation
  • computational fluid dynamics, e.g. the design of
    a 3D convergent-divergent hot water flashing
    nozzle
  • ESs are closer to Lamarckian evolution (which
    states that acquired characteristics can be
    passed on to offspring).
  • The differences between GA and ES lie in the
    representation and the survivor selection
    mechanism, which lets part of the old population
    survive into the new one

5
ES quick overview (III)
  • Attributed features
  • fast
  • good optimizer for real-valued optimisation
    (real-valued vectors are used to represent
    individuals)
  • a relatively well-developed body of theory
  • Strong emphasis on mutation for creating
    offspring
  • Mutation is implemented by adding random noise
    drawn from a Gaussian distribution
  • Mutation parameters are changed during a run of
    the algorithm
  • In an ES the control parameters are included in
    the chromosomes and co-evolve with the solutions.
  • Special
  • self-adaptation of (mutation) parameters is
    standard

6
ES Algorithm - The general scheme
  • An example Evolution Strategy
  • Procedure ES
  •   t ← 0
  •   Initialize P(t)
  •   Evaluate P(t)
  •   While (Not Done)
  •     Parents(t) ← Select_Parents(P(t))
  •     Offspring(t) ← Procreate(Parents(t))
  •     Evaluate(Offspring(t))
  •     P(t+1) ← Select_Survivors(P(t), Offspring(t))
  •     t ← t + 1
  • The differences between GA and ES consist in
    representation and survivor selection (the best
    of parents and offspring survive into the new
    population, unlike generational genetic
    algorithms, where children replace the parents).
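The loop above can be sketched as a minimal Python program. This is an illustrative sketch, not code from the slides: the function name `evolve`, the parameter defaults, the (µ,λ)-survivor selection, and the sphere fitness in the usage line are all assumptions chosen to match the tableau later in the presentation.

```python
import random

def evolve(fitness, n=5, mu=3, lam=21, sigma=0.1, generations=50):
    """A minimal (mu, lambda)-ES loop mirroring the pseudocode above."""
    # Initialize P(t): mu random real-valued vectors
    pop = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Select_Parents: uniform random choice (ES parent selection)
            parent = random.choice(pop)
            # Procreate: Gaussian perturbation of every object variable
            offspring.append([x + random.gauss(0.0, sigma) for x in parent])
        # Select_Survivors: best mu of the lam children ((mu,lam)-selection)
        offspring.sort(key=fitness)
        pop = offspring[:mu]
    return min(pop, key=fitness)

best = evolve(lambda v: sum(x * x for x in v))
```

Here the step size is fixed; the self-adaptation of σ discussed later in the slides is deliberately left out to keep the skeleton close to the pseudocode.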

7
ES technical summary tableau
Representation: real-valued vectors, also encoding the mutation step sizes
Recombination: discrete or intermediary
Mutation: Gaussian perturbation
Parent selection: uniform random
Survivor selection: (µ,λ) or (µ+λ)
Specialty: self-adaptation of mutation step sizes
8
Evolution Strategies
  • There are basically 4 types of ESs
  • The simple (1+1)-ES (in this strategy the aspect
    of collective learning in a population is
    missing; the population is composed of a single
    individual)
  • The (µ+1)-ES (the first multimembered ES; µ
    parents give birth to 1 offspring)
  • For the next two ESs, µ parents give birth to λ
    offspring
  • The (µ+λ)-ES: P(t+1) = best µ of the µ+λ
    individuals
  • The (µ,λ)-ES: P(t+1) = best µ of the λ
    offspring

9
(1+1) - Evolution Strategies (two-membered
Evolution Strategy)
  • Before the (1+1)-ES there were no more than two
    rules
  • 1. Change all variables at a time, mostly
    slightly and at random.
  • 2. If the new set of variables does not diminish
    the goodness of the device, keep it, otherwise
    return to the old status.
  • In the simple (1+1)-ES the aspect of collective
    learning in a population is missing; the
    population is composed of a single individual.
  • The (1+1)-ES is a stochastic optimization method
    with similarities to Simulated Annealing.
  • It represents a local search strategy that
    exploits the current solution.

10
(1+1) - Evolution Strategies features
  • the convergence velocity, i.e. the expected
    distance traveled in the useful direction per
    iteration, is inversely proportional to the
    number of variables of the objective function
  • linear convergence order can be achieved if the
    mutation strength (the mean step size, or
    standard deviation of each component of the
    normally distributed mutation vector) is
    permanently adjusted to the proper order of
    magnitude
  • the optimal mutation strength corresponds to a
    certain success probability that is independent
    of the dimension of the search space and is in
    the range of one fifth for both model functions
    (sphere model and corridor model)
  • the convergence (velocity) rate of a (1+1)-ES is
    defined as the ratio of the Euclidean distance
    traveled towards the optimal point to the number
    of generations required for covering this
    distance

11
Introductory example
  • Task: minimise f : R^n → R
  • Algorithm: two-membered ES using
  • Vectors from R^n directly as chromosomes
  • Population size 1
  • Only mutation, creating one child
  • Greedy selection

12
Standard deviation. Normal distribution
  • Consider X = (x1, x2, ..., xn) an n-dimensional
    random variable.
  • The mean: µ = M(X) = (x1 + x2 + ... + xn)/n.
  • The square of the standard deviation (also
    called the variance):
  • σ² = M((X - M(X))²) = Σ(xk - M(X))²/n
  • Normal distribution:
  • N(µ,σ)

The distribution with µ = 0 and σ² = 1 is called
the standard normal.
13
Illustration of normal distribution
http://fooplot.com/
14
Introductory example pseudocode
Minimization problem
  • Set t ← 0
  • Create initial point x(t) = (x1(t), ..., xn(t))
  • REPEAT UNTIL (termination condition satisfied) DO
  • Draw zi from a normal distribution for all
    i = 1,...,n
  • yi(t) ← xi(t) + zi, i.e. yi(t) = xi(t) + N(0,σ)
  • IF f(x(t)) < f(y(t)) THEN x(t+1) ← x(t)
  • ELSE x(t+1) ← y(t)
  • endIF
  • Set t ← t + 1
  • endDO
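The pseudocode above translates almost line by line into Python. This is a sketch under assumptions: the function name, the fixed step size, and the iteration budget are illustrative, and the sphere function in the usage line is just a convenient test problem.

```python
import random

def one_plus_one_es(f, x, sigma=0.5, iterations=200):
    """Two-membered (1+1)-ES for minimisation, following the pseudocode."""
    for _ in range(iterations):
        # Draw z_i from N(0, sigma) for all i and build the child y
        y = [xi + random.gauss(0.0, sigma) for xi in x]
        # Greedy selection: keep x unless the child y is at least as good
        if f(x) < f(y):
            pass            # x(t+1) <- x(t)
        else:
            x = y           # x(t+1) <- y(t)
    return x

sphere = lambda v: sum(c * c for c in v)
result = one_plus_one_es(sphere, [5.0, -3.0])
```

Note that σ stays constant here; the next slides add the 1/5 success rule that adapts it during the run.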

15
Introductory example mutation mechanism
  • z values are drawn from a normal distribution
    N(µ,σ)
  • Mean µ is set to 0
  • Standard deviation σ is called the mutation step
    size
  • σ is varied on the fly by the 1/5 success rule
  • This rule resets σ after every k iterations by
  • σ ← σ / c if Ps > 1/5 (foot of a big hill:
    increase σ)
  • σ ← σ · c if Ps < 1/5 (near the top of the
    hill: decrease σ)
  • σ ← σ if Ps = 1/5
  • where Ps is the fraction of successful mutations
    (those in which the child is fitter than the
    parent), 0.8 ≤ c ≤ 1, usually c = 0.817
  • The mutation rule for the object variables (xi)
    is additive, while the mutation rule for the
    dispersion (σ) is multiplicative.

16
Rechenberg's 1/5th success rule
  • The 1/5th success rule is a mechanism that
    ensures efficient heuristic search at the price
    of decreased robustness.
  • The ratio of successful mutations to all
    mutations should be one fifth (1/5).
  • IF this ratio is greater than 1/5, the
    dispersion must be increased (this accelerates
    convergence).
  • ELSE IF this ratio is less than 1/5, the
    dispersion must be decreased.

17
The implementation of Rechenberg's 1/5th rule
1. Perform the (1+1)-ES for a number G of
   generations
   - keep σ constant during this period
   - count the number Gs of successful mutations
     during this period
2. Determine an estimate of the success
   probability Ps by Ps = Gs/G
3. Change σ according to
   σ ← σ / c, if Ps > 1/5
   σ ← σ · c, if Ps < 1/5
   σ ← σ,     if Ps = 1/5
4. Goto 1.
The optimal value of the factor c depends on the
objective function to be optimized, the
dimensionality N of the search space, and the
number G. If N is sufficiently large (N ≥ 30),
G = N is a reasonable choice. Under this condition
Schwefel (1975) recommended using 0.85 ≤ c < 1.
When we are not finding better solutions, we have
reached the top of the hill, so Rechenberg's 1/5
rule reduces the standard deviation σ whenever the
system was not very successful in finding better
solutions.
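Steps 1-4 above can be sketched directly in Python. The function name, the parameter defaults (G = 20 generations per period, c = 0.85 as recommended by Schwefel), and the sphere test function are illustrative assumptions, not taken from the slides.

```python
import random

def one_fifth_rule_es(f, x, sigma=1.0, G=20, periods=30, c=0.85):
    """(1+1)-ES with Rechenberg's 1/5th rule, following steps 1-4 above."""
    for _ in range(periods):
        gs = 0                              # Gs: successful mutations
        for _ in range(G):                  # keep sigma constant for G gens
            y = [xi + random.gauss(0.0, sigma) for xi in x]
            if f(y) <= f(x):                # success: the child is kept
                x, gs = y, gs + 1
        ps = gs / G                         # step 2: Ps = Gs / G
        if ps > 0.2:
            sigma = sigma / c               # step 3: widen the search
        elif ps < 0.2:
            sigma = sigma * c               # step 3: narrow the search
    return x, sigma

sphere = lambda v: sum(c2 * c2 for c2 in v)
x, s = one_fifth_rule_es(sphere, [10.0, 10.0])
```

Because the success rate near an optimum drops below 1/5, σ shrinks multiplicatively there, which is exactly the behaviour the slide describes.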
18
Another historical example: the jet nozzle
experiment
Task: to optimize the shape of a jet nozzle
Approach: random mutations to the shape plus
selection
19
Another historical example: the jet nozzle
experiment (cont'd)
In order to be able to vary the length of the
nozzle and the position of its throat, gene
duplication and gene deletion were mimicked to
evolve even the number of variables, i.e., the
nozzle diameters at fixed distances. The perhaps
optimal, at least unexpectedly good and so far
best-known shape of the nozzle was
counter-intuitively strange, and it took a while
until the one-component two-phase supersonic flow
phenomena far from thermodynamic equilibrium,
involved in achieving such a good result, were
understood.
20
The disadvantages of (1+1)-ES
  • The fragile nature of the point-by-point search
    based on the 1/5 success rule may lead to
    stagnation at a local minimum.
  • The dispersion (step size) is the same for each
    dimension (coordinate) of the search space.
  • It does not use recombination and does not use a
    real population.
  • There is no mechanism allowing individual
    adjustment of the step size for each coordinate
    axis of the search space; lacking such a
    mechanism, the procedure moves slowly towards
    the optimum point.

21
(µ+λ), (µ,λ) - multimembered Evolution
Strategies
µ parents give birth to λ offspring
22
Representation
  • Chromosomes consist of three parts
  • Object variables x1,...,xn
  • Strategy parameters
  • Mutation step sizes σ1,...,σnσ
  • Rotation angles α1,...,αnα
  • Not every component is always present
  • Full size: ⟨ x1,...,xn, σ1,...,σn, α1,...,αk ⟩
  • where k = n(n-1)/2 (the number of i,j pairs)

23
Mutation
  • Main mechanism: changing a value by adding
    random noise drawn from a normal distribution
  • xi' = xi + N(0,σ)
  • Key idea
  • σ is part of the chromosome ⟨ x1,...,xn, σ ⟩
  • σ is also mutated into σ' (see later how)
  • Thus the mutation step size σ coevolves with the
    solution x

24
Mutate σ first
  • Net mutation effect: ⟨ x, σ ⟩ → ⟨ x', σ' ⟩
  • Order is important
  • first σ → σ' (see later how)
  • then x → x' = x + N(0,σ')
  • Rationale: the new ⟨ x', σ' ⟩ is evaluated twice
  • Primary: x' is good if f(x') is good
  • Secondary: σ' is good if the x' it created is
    good
  • Reversing the mutation order would not work

25
Mutation case 1: uncorrelated mutation with one σ
  • Chromosomes: ⟨ x1,...,xn, σ ⟩
  • σ' = σ · exp(τ · N(0,1))
  • xi' = xi + σ' · N(0,1)
  • Typically the learning rate τ ∝ 1/√n
  • And we have a boundary rule: σ' < ε0 ⇒ σ' = ε0
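The three formulas above fit in a few lines of Python. A minimal sketch, assuming an illustrative function name and boundary value ε0 = 1e-8; note that σ is mutated first and the new σ' is then used for all object variables, as the previous slide requires.

```python
import math
import random

def mutate_one_sigma(x, sigma, eps0=1e-8):
    """Self-adaptive mutation with one step size: sigma' first, then x'."""
    tau = 1.0 / math.sqrt(len(x))             # learning rate, tau ~ 1/sqrt(n)
    # sigma' = sigma * exp(tau * N(0,1)), multiplicative and lognormal
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    sigma_new = max(sigma_new, eps0)          # boundary rule: sigma' >= eps0
    # xi' = xi + sigma' * N(0,1), a fresh draw for every coordinate
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new

child, s = mutate_one_sigma([1.0, 2.0, 3.0], 0.5)
```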

26
Mutants with equal likelihood
  • Circle: mutants having the same chance to be
    created lie on a circle around the parent

27
Mutation case 2: uncorrelated mutation with n σ's
  • Chromosomes: ⟨ x1,...,xn, σ1,...,σn ⟩
  • σi' = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
  • xi' = xi + σi' · Ni(0,1)
  • Two learning rate parameters
  • τ' overall learning rate
  • τ coordinate-wise learning rate
  • τ' ∝ 1/√(2n) and τ ∝ 1/√(2√n)
  • And σi' < ε0 ⇒ σi' = ε0
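A sketch of this mutation in Python, under the same illustrative assumptions as before (function name and ε0 are invented). The key detail is that the overall term τ'·N(0,1) is drawn once and shared by all step sizes, while the coordinate-wise term τ·Ni(0,1) is drawn afresh per coordinate:

```python
import math
import random

def mutate_n_sigmas(x, sigmas, eps0=1e-8):
    """Self-adaptive mutation with one step size per coordinate."""
    n = len(x)
    tau_overall = 1.0 / math.sqrt(2.0 * n)           # tau' ~ 1/sqrt(2n)
    tau_coord = 1.0 / math.sqrt(2.0 * math.sqrt(n))  # tau  ~ 1/sqrt(2*sqrt(n))
    shared = tau_overall * random.gauss(0.0, 1.0)    # one draw for all sigmas
    new_sigmas = [max(s * math.exp(shared + tau_coord * random.gauss(0.0, 1.0)),
                      eps0)                          # boundary rule per sigma
                  for s in sigmas]
    # xi' = xi + sigma_i' * Ni(0,1)
    new_x = [xi + s * random.gauss(0.0, 1.0) for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas

child, steps = mutate_n_sigmas([0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.3, 0.4])
```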

28
Mutants with equal likelihood
  • Ellipse: mutants having the same chance to be
    created lie on an axis-parallel ellipse

29
Mutation case 3: correlated mutations
  • Chromosomes: ⟨ x1,...,xn, σ1,...,σn, α1,...,αk ⟩
  • where k = n(n-1)/2
  • and the covariance matrix C is defined as
  • cii = σi²
  • cij = 0 if i and j are not correlated
  • cij = ½ (σi² - σj²) · tan(2αij) if i and j are
    correlated
  • Note the numbering / indices of the α's

30
Correlated mutations (cont'd)
  • The mutation mechanism is then
  • σi' = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
  • αj' = αj + β · N(0,1)
  • x' = x + N(0,C')
  • x stands for the vector ⟨ x1,...,xn ⟩
  • C' is the covariance matrix C after mutation of
    the σ and α values
  • τ' ∝ 1/√(2n), τ ∝ 1/√(2√n) and β ≈ 5°
  • σi' < ε0 ⇒ σi' = ε0 and
  • |αj'| > π ⇒ αj' = αj' - 2π · sign(αj')
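To make drawing from N(0,C') concrete, here is a sketch for the two-dimensional case. It is an assumption-laden illustration: instead of building C explicitly, it draws axis-aligned Gaussian steps with the individual step sizes and rotates them by the strategy angle α, which produces a step with exactly the tilted-ellipse covariance the formulas above encode.

```python
import math
import random

def correlated_mutation_2d(x, sigmas, alpha):
    """Correlated Gaussian step for n = 2: sample axis-aligned, then rotate."""
    # Axis-aligned draws with the individual step sizes sigma1, sigma2
    z1 = sigmas[0] * random.gauss(0.0, 1.0)
    z2 = sigmas[1] * random.gauss(0.0, 1.0)
    # Rotation by alpha couples the coordinates (off-diagonal covariance)
    dx1 = z1 * math.cos(alpha) - z2 * math.sin(alpha)
    dx2 = z1 * math.sin(alpha) + z2 * math.cos(alpha)
    return [x[0] + dx1, x[1] + dx2]

child = correlated_mutation_2d([0.0, 0.0], [1.0, 0.2], math.pi / 6)
```

For general n the same idea applies with one rotation per correlated (i,j) pair, which is where the k = n(n-1)/2 angles come from.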

31
Mutants with equal likelihood
  • Ellipse: mutants having the same chance to be
    created lie on a rotated ellipse

32
Recombination
  • Creates one child
  • Acts per variable / position by either
  • Averaging parental values, or
  • Selecting one of the parental values
  • From two or more parents by either
  • Using two selected parents to make a child
  • Selecting two parents for each position anew

33
Names of recombinations
                             Two fixed parents     Two parents selected for each i
zi = (xi + yi)/2             Local intermediary    Global intermediary
zi is xi or yi, at random    Local discrete        Global discrete
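The four variants in the table differ along two axes, which maps naturally onto two boolean flags. A minimal sketch (the function name and flag names are illustrative assumptions):

```python
import random

def recombine(parents, discrete=True, global_variant=False):
    """Create one child. Discrete picks one parental value per position,
    intermediary averages them; the 'global' variants reselect the two
    parents anew for every position."""
    n = len(parents[0])
    x, y = random.sample(parents, 2)              # two fixed parents
    child = []
    for i in range(n):
        if global_variant:
            x, y = random.sample(parents, 2)      # fresh pair per position
        if discrete:
            child.append(random.choice((x[i], y[i])))   # zi is xi or yi
        else:
            child.append((x[i] + y[i]) / 2.0)           # zi = (xi + yi) / 2
    return child

pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
child = recombine(pop, discrete=False)            # local intermediary
```

In practice ESs often use discrete recombination on the object variables and intermediary recombination on the strategy parameters.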
34
Parent selection
  • Parents are selected by uniform random
    distribution whenever an operator needs one/some
  • Thus ES parent selection is unbiased - every
    individual has the same probability of being
    selected
  • Note that in ES "parent" means a population
    member (in GAs it means a population member
    selected to undergo variation)

35
Survivor selection
  • Applied after creating λ children from the µ
    parents by mutation and recombination
  • Deterministically chops off the bad stuff
  • Basis of selection is either
  • The set of children only: (µ,λ)-selection
  • The set of parents and children: (µ+λ)-selection
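Both schemes reduce to one deterministic truncation over the right pool. A minimal sketch (function name and the toy 1-D fitness are illustrative assumptions):

```python
def survivor_select(parents, offspring, fitness, mu, plus=False):
    """Deterministic ES survivor selection for minimisation.
    plus=True : (mu + lambda), rank parents and children together (elitist).
    plus=False: (mu , lambda), rank the children only; parents are forgotten."""
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness)[:mu]

f = lambda v: abs(v)                      # toy 1-D fitness, minimise |v|
parents = [1, -2]
offspring = [3, -4, 0, 5, -1, 6, 2]
plus_best = survivor_select(parents, offspring, f, 2, plus=True)    # -> [0, 1]
comma_best = survivor_select(parents, offspring, f, 2, plus=False)  # -> [0, -1]
```

The example shows the difference: (µ+λ) keeps the good parent 1 alive, while (µ,λ) must pick both survivors from the children, even if a parent was better.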

36
Survivor selection contd
  • (µ+λ)-selection is an elitist strategy
  • (µ,λ)-selection can forget
  • Often (µ,λ)-selection is preferred because it is
  • Better at leaving local optima
  • Better at following moving optima
  • Using the + strategy, bad σ values can survive
    in ⟨x,σ⟩ too long if their host x is very fit
  • Selective pressure in ES is very high (λ ≈ 7µ
    is the common setting)

37
Self-adaptation illustrated
  • Given a dynamically changing fitness landscape
    (optimum location shifted every 200 generations)
  • Self-adaptive ES is able to
  • follow the optimum and
  • adjust the mutation step size after every shift !

38
Self-adaptation illustrated contd
Changes in the fitness values (left) and the
mutation step sizes (right)
39
Prerequisites for self-adaptation
  • µ > 1 to carry different strategies
  • λ > µ to generate an offspring surplus
  • Not too strong selection, e.g., λ ≈ 7µ
  • (µ,λ)-selection to get rid of misadapted σ's
  • Mixing strategy parameters by (intermediary)
    recombination on them

40
ES Applications
  • Lens shape optimization for the required light
    refraction
  • Distribution of fluid in a blood network
  • Brachistochrone curve
  • Solving the Rubik's Cube

41
Example application: the Ackley function (Bäck
et al. '93)
  • The Ackley function (here used with n = 30)
  • Evolution strategy
  • Representation
  • -30 < xi < 30 (coincidence of 30's!)
  • 30 step sizes
  • (30,200) selection
  • Termination after 200,000 fitness evaluations
  • Results: average best solution is 7.48 · 10⁻⁸
    (very good)