Structure of search space, complexity of stochastic combinatorial optimization algorithms and application to biological motifs discovery
Robin Gras, INRIA Rennes
Black box global combinatorial optimization
- Goal: search for the maxima of a function F (the fitness) of integer variables.
- Search space: C, the set of all legal instantiations.
- Global: the search space is the Cartesian product of the domains of the variables.
- Black box: the analytical form of F is not known, but its value can be computed at any point.
Many difficult bioinformatics problems (most of them NP-hard) can be represented as black box combinatorial optimization problems.
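To make the black box setting concrete, here is a minimal Python sketch, assuming the fitness is any callable that can only be evaluated point by point. The helper `exhaustive_search` and the toy fitness are hypothetical. Enumerating the Cartesian product C is exact but costs |D_1| × ... × |D_n| evaluations, which grows exponentially with n and is why heuristic exploration is needed at realistic sizes.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, ...]


def exhaustive_search(
    fitness: Callable[[Point], float], domains: Sequence[Sequence[int]]
) -> Tuple[Optional[Point], Optional[float], int]:
    """Enumerate C = D_1 x ... x D_n, using fitness only as a black box."""
    best_point: Optional[Point] = None
    best_value: Optional[float] = None
    evaluations = 0
    for point in itertools.product(*domains):  # every legal instantiation
        value = fitness(point)  # the only access we have to F
        evaluations += 1
        if best_value is None or value > best_value:
            best_point, best_value = point, value
    return best_point, best_value, evaluations


# Hypothetical toy fitness on 4 variables with domain {0, 1, 2}.
def toy_fitness(x: Point) -> float:
    return -sum((v - 1) ** 2 for v in x)


print(exhaustive_search(toy_fitness, [range(3)] * 4))  # ((1, 1, 1, 1), 0, 81)
```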
A few definitions
- Operator o: a move (exploration step) from a set of points E_c (a sample) to another set of points.
- Neighborhood V_o (for a given operator): the set of all points reachable from a set E_c by one application of the operator.
- Landscape: the triplet (C, F, V_o).
- Local maximum M_o (for a given operator): X is a local maximum for o iff $F(X) \geq F(Y)$ for all $Y \in V_o(\{X\})$.
- Metaheuristic: a heuristic exploration algorithm for black box combinatorial optimization problems.
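A minimal Python sketch of a neighborhood and of the local-maximum test, assuming binary variables and the one-bit-flip operator as o (the slides do not fix a particular operator). The names `bit_flip_neighborhood`, `is_local_maximum` and the 3-bit function `deceptive3` are hypothetical, chosen only for illustration.

```python
import itertools
from typing import Callable, List, Tuple

Point = Tuple[int, ...]


def bit_flip_neighborhood(x: Point) -> List[Point]:
    """V_o({X}) for the one-bit-flip operator o: all points at Hamming distance 1."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


def is_local_maximum(fitness: Callable[[Point], float], x: Point) -> bool:
    """X is a local maximum for o iff F(X) >= F(Y) for every Y in V_o({X})."""
    value = fitness(x)
    return all(value >= fitness(y) for y in bit_flip_neighborhood(x))


# Hypothetical 3-bit deceptive fitness used only for illustration.
def deceptive3(x: Point) -> float:
    u = sum(x)
    return 3 if u == 3 else 2 - u


local_maxima = [
    x for x in itertools.product((0, 1), repeat=3) if is_local_maximum(deceptive3, x)
]
print(local_maxima)  # [(0, 0, 0), (1, 1, 1)]
```

Changing the operator changes V_o and therefore which points count as local maxima, even though F itself is unchanged.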
What is complexity? Not all difficult problems are equal.
Influence of the maxima
- Basin of attraction of M_o: the set of points of C from which M_o is reachable by a sequence of applications of o.
- Study of the basins of attraction.
- Study of the fitness cloud (how the fitness values reached by the operator vary as a function of the current fitness value).
Results (empirical):
- Number of global maxima compared with the size of C.
- Number of global maxima compared with the number of local maxima.
- Size and overlap of the basins of attraction.
All of these are linked to the neighborhood!
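A brute-force sketch (Python, small n only) of how basins of attraction can be measured, assuming the operator is the one-bit flip and reading "a sequence of applications of o" as a chain of strictly improving flips. Under that reading a point may reach several local maxima, which is exactly where basins overlap. `basins_of_attraction` and `deceptive3` are hypothetical names.

```python
import itertools
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Point = Tuple[int, ...]


def basins_of_attraction(
    fitness: Callable[[Point], float], n: int
) -> Dict[Point, Set[Point]]:
    """Map each local maximum M_o to its basin: the points from which a chain of
    strictly improving one-bit flips can reach M_o. Basins may overlap."""

    def neighbors(p: Point) -> List[Point]:
        return [p[:i] + (1 - p[i],) + p[i + 1:] for i in range(n)]

    @lru_cache(maxsize=None)
    def reachable_maxima(p: Point) -> FrozenSet[Point]:
        value = fitness(p)
        better = [q for q in neighbors(p) if fitness(q) > value]
        if not better:  # no improving move: p is a local maximum
            return frozenset([p])
        return frozenset().union(*(reachable_maxima(q) for q in better))

    basins: Dict[Point, Set[Point]] = {}
    for p in itertools.product((0, 1), repeat=n):
        for m in reachable_maxima(p):
            basins.setdefault(m, set()).add(p)
    return basins


def deceptive3(x: Point) -> float:  # hypothetical example function
    u = sum(x)
    return 3 if u == 3 else 2 - u


b = basins_of_attraction(deceptive3, 3)
print(len(b[(1, 1, 1)]), len(b[(0, 0, 0)]), len(b[(1, 1, 1)] & b[(0, 0, 0)]))  # 4 7 3
```

Here the global maximum (1, 1, 1) has the smaller basin, and the three points with two ones belong to both basins.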
Problem decomposability: epistasis
- A problem of size n that decomposes into n/k independent sub-problems of size k has a complexity of order $2^k \cdot n/k$ (each sub-problem can be solved exhaustively).
- Epistasis: the maximum number of variables on which any one of the n variables depends, i.e. the level of non-linearity → measures of non-linearity.
- Example: spin glasses.
- Function with a tunable level of epistasis: the NK-landscape, with N the size of the problem and K the number of dependencies.
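A minimal Python sketch of an NK-landscape generator, assuming the common construction in which each of the N bits contributes a value read from its own random table indexed by that bit and its K epistatic partners, and F is the average contribution. Partners are drawn at random here; adjacent partners are another common choice. The name `make_nk_landscape` and the seeding scheme are hypothetical.

```python
import random
from typing import Callable, Sequence


def make_nk_landscape(n: int, k: int, seed: int = 0) -> Callable[[Sequence[int]], float]:
    """Build a random NK fitness function on n bits where each bit interacts
    with k other randomly chosen bits (requires 0 <= k < n)."""
    rng = random.Random(seed)
    # links[i]: bit i followed by the k bits it depends on (its epistatic partners).
    links = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One random contribution table with 2^(k+1) entries per bit.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x: Sequence[int]) -> float:
        total = 0.0
        for i in range(n):
            index = 0
            for j in links[i]:  # read the k+1 relevant bits as a binary number
                index = (index << 1) | x[j]
            total += tables[i][index]
        return total / n  # average contribution, in [0, 1)

    return fitness


F = make_nk_landscape(12, 3, seed=42)
print(F([0] * 12))
```

With K = 0 the function is fully decomposable and each bit can be optimized independently; increasing K makes the contributions overlap and raises the epistasis level.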
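Fitness and instantiation: deceptive functions
- Function built to be difficult for a hill-climber.
- Function that is almost linear except at a few points of the search space.
- Example: the trap5 function, where u(X) is the number of ones in a 5-bit block X:
  $$\mathrm{Trap5}(X) = \begin{cases} 5 & \text{if } u(X) = 5 \\ 4 - u(X) & \text{otherwise} \end{cases}$$
Independent of the neighborhood!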
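A minimal Python sketch of trap5 and of a concatenation of independent trap5 blocks, assuming the standard definition (value 5 when all five bits are 1, otherwise 4 minus the number of ones) and that the block positions are known. It illustrates two claims from these slides: a one-bit-flip hill-climber is pulled toward 00000 in each block, while solving each block exhaustively once the decomposition is known costs 2^5 evaluations per block, i.e. 2^5 · n/5 in total. The names `concatenated_trap5`, `hill_climb` and `solve_by_blocks` are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def trap5(block: Sequence[int]) -> int:
    """Deceptive trap on 5 bits: optimum 11111 (value 5), attractor 00000 (value 4)."""
    u = sum(block)  # number of ones in the block
    return 5 if u == 5 else 4 - u


def concatenated_trap5(x: Sequence[int]) -> int:
    """Sum of trap5 over consecutive 5-bit blocks (len(x) must be a multiple of 5)."""
    return sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))


def hill_climb(x: List[int]) -> List[int]:
    """First-improvement hill-climber using one-bit flips."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]
            if concatenated_trap5(y) > concatenated_trap5(x):
                x, improved = y, True
    return x


def solve_by_blocks(n: int) -> Tuple[List[int], int]:
    """Exploit a known decomposition: solve each 5-bit block exhaustively,
    costing 2**5 evaluations per block, i.e. 2**5 * n/5 in total."""
    solution: List[int] = []
    evaluations = 0
    for _ in range(n // 5):
        best_block = max(itertools.product((0, 1), repeat=5), key=trap5)
        evaluations += 2 ** 5
        solution.extend(best_block)
    return solution, evaluations


print(hill_climb([1, 1, 0, 0, 0] * 2))  # all zeros: value 8 instead of 10
print(solve_by_blocks(20))              # twenty ones, 128 evaluations
```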
Efficiency of evolutionary approaches
Two main strategies:
- A priori definition of the operators.
- Discovery of the operators.
Classic genetic algorithms (Holland 75)
- Exploration by sampling:
  - Population: a sample.
  - Evaluation and bias by selection.
  - Generation of a new population by application of operators.
- Experimental studies of efficiency and behavior → various conclusions.
- Theoretical studies:
  - Convergence proofs in simple cases (Goldberg 87).
  - Introduction of the notion of schemata (Bagley 67).
  - Computation of the efficiency on deceptive problems (Goldberg 93).
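The sampling loop of a classic GA (sample, bias by selection, apply operators) can be sketched as follows. This is a minimal Python illustration, not Holland's original formulation: the binary tournament, one-point crossover, 1/n mutation rate, the `simple_ga` name and the `onemax` test function are all assumptions made here.

```python
import random
from typing import Callable, List

Bits = List[int]


def simple_ga(fitness: Callable[[Bits], float], n: int, pop_size: int = 50,
              generations: int = 50, p_cross: float = 0.9, seed: int = 0) -> Bits:
    """Generational GA: binary tournament selection, one-point crossover,
    bit-flip mutation; returns the best point seen."""
    rng = random.Random(seed)
    p_mut = 1.0 / n
    # Initial sample: uniformly random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select() -> Bits:
        # Selection biases the sample toward fitter points.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring: List[Bits] = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_cross:
                cut = rng.randint(1, n - 1)  # one-point crossover
                children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                children = [p1[:], p2[:]]
            for child in children:
                # Flip each bit independently with probability 1/n.
                offspring.append([1 - g if rng.random() < p_mut else g for g in child])
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)
    return best


def onemax(x: Bits) -> float:  # hypothetical easy test function
    return sum(x)


print(sum(simple_ga(onemax, 20)))
```

One-point crossover keeps contiguous groups of bits together, so this kind of GA implicitly favors problems whose interacting variables are close to each other on the string.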
Probabilistic model building algorithms
- Principle: discover the dependencies and the structure from the sample.
- First model: univariate distribution of the variables (Mühlenbein and Voigt 96).
- BOA: Bayesian network (Pelikan and Goldberg 99).
- hBOA: Bayesian network with decision graphs and ecological niches (Pelikan and Goldberg 00).
  - Building of the network.
  - Population generation.
  - Population size.
  - Number of generations.
  - Global complexity.
- Validation on spin glasses, MAXSAT and deceptive functions.
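The univariate model can be sketched as a UMDA-style loop in Python: sample from independent per-variable probabilities, select, then re-estimate the probabilities from the selected points. This is an illustration under assumptions: truncation selection of the best half, bounds of 1/n and 1 - 1/n on the marginals, the names `umda` and `onemax`, and all parameter values are choices made here, not taken from the cited papers. BOA and hBOA replace the independent marginals with a learned Bayesian network, which is not reproduced here.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]


def umda(fitness: Callable[[Bits], float], n: int, pop_size: int = 100,
         n_selected: int = 50, generations: int = 30,
         seed: int = 0) -> Tuple[Bits, List[float]]:
    """Univariate model-building loop: sample, select, re-estimate one
    independent probability p[i] = P(x_i = 1) per variable."""
    rng = random.Random(seed)
    p = [0.5] * n  # start from the uniform distribution
    best: Bits = []
    for _ in range(generations):
        population = [[1 if rng.random() < p[i] else 0 for i in range(n)]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)  # truncation selection
        selected = population[:n_selected]
        if not best or fitness(selected[0]) > fitness(best):
            best = selected[0]
        # Re-estimate the marginals, kept away from 0 and 1 so no variable is frozen.
        p = [min(max(sum(x[i] for x in selected) / n_selected, 1.0 / n), 1.0 - 1.0 / n)
             for i in range(n)]
    return best, p


def onemax(x: Bits) -> float:  # hypothetical easy test function
    return sum(x)


best, model = umda(onemax, 20)
print(sum(best), [round(q, 2) for q in model])
```

Because each variable is modeled independently, this kind of model cannot capture the dependencies inside a deceptive block, which is the motivation for the Bayesian-network models listed above.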
Remaining limitations
- The convergence proofs do not take into account the strong (greedy) heuristic used to build the network.
- The quality measure is not computed at each step.
- High global computation cost.
- → What are the consequences for the real efficiency of hBOA?
Efficiency of evolutionary algorithms on deceptive problems with a high epistasis level
Problems of size 120 and 100, with sub-functions of size between 4 and 12.
Deceptive function
Non-deceptive function
With k the size of a sub-function and m = n/k the number of sub-functions:
Adjacent configuration: $X_i = (x_{(i-1)k+1}, x_{(i-1)k+2}, \ldots, x_{(i-1)k+k})$ with $i \in \{1, \ldots, m\}$
Non-adjacent configuration: $X_i = (x_i, x_{i+m}, \ldots, x_{i+(k-1)m})$ with $i \in \{1, \ldots, m\}$
Classical genetic algorithm, non-adjacent configuration: 100% failure!
Classical genetic algorithm, adjacent configuration
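One common explanation from schema theory (not stated on the slide) for the gap between the two configurations is that one-point crossover almost always cuts a sub-function whose variables are spread over the string. The Python sketch below builds both index layouts (0-based) and computes the probability that a uniformly chosen one-point cut falls inside a block. The function names are hypothetical; only the two layouts come from the slide.

```python
from typing import List


def adjacent_blocks(n: int, k: int) -> List[List[int]]:
    """0-based indices of X_i = (x_{(i-1)k+1}, ..., x_{(i-1)k+k}): contiguous blocks."""
    m = n // k
    return [[i * k + j for j in range(k)] for i in range(m)]


def non_adjacent_blocks(n: int, k: int) -> List[List[int]]:
    """0-based indices of X_i = (x_i, x_{i+m}, ..., x_{i+(k-1)m}): stride m."""
    m = n // k
    return [[i + j * m for j in range(k)] for i in range(m)]


def crossover_disruption(block: List[int], n: int) -> float:
    """Probability that a uniformly chosen one-point crossover cut (one of the
    n - 1 gaps between positions) falls inside the span of the block."""
    return (max(block) - min(block)) / (n - 1)


n, k = 120, 12
print(crossover_disruption(adjacent_blocks(n, k)[0], n))      # 11/119, about 0.09
print(crossover_disruption(non_adjacent_blocks(n, k)[0], n))  # 110/119, about 0.92
```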
hBOA behavior
- Adjacent configuration:
  - When the epistasis level is above 6, the global maximum is never reached within 100 generations.
  - Computation time becomes prohibitive: more than 60 hours with an epistasis level of 12.
  - Strong dependence on the structure of the deceptive function.
- Non-adjacent configuration:
  - Same results, so hBOA does not depend on the configuration.
  - → A real capacity to detect and handle the dependencies, but the heuristics used do not make it possible to obtain proven results.
Simple PMBGA algorithm dedicated to deceptive problems
- Test of several measures on bivariate frequencies: frequencies only, conditional probability, mutual information and statistical implication.
- Algorithm taking the deceptive property into account:
  - Random generation of the initial population.
  - Tournament selection.
  - Computation of the measures for each pair of variables.
  - Building of a solution.
Results: the solution is obtained directly in every case (adjacent or not) within a few minutes! With mixed sub-functions (deceptive and non-deceptive), it is impossible to discover the solution. hBOA: no difference with mixed sub-functions.
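Of the bivariate measures listed, mutual information is the easiest to show. The Python sketch below computes it from the pairwise frequencies of a (selected) population and flags strongly dependent pairs. The threshold, the names `mutual_information` and `linked_pairs`, and the toy sample are hypothetical; the author's solution-building step and the statistical implication measure are not reproduced.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def mutual_information(population: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Empirical mutual information (in nats) between binary variables i and j,
    computed from the bivariate frequencies in the population."""
    size = len(population)
    joint = [[0, 0], [0, 0]]
    for x in population:
        joint[x[i]][x[j]] += 1
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            if joint[a][b] == 0:
                continue  # 0 * log 0 is taken as 0
            p_ab = joint[a][b] / size
            p_a = (joint[a][0] + joint[a][1]) / size
            p_b = (joint[0][b] + joint[1][b]) / size
            mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def linked_pairs(population: Sequence[Sequence[int]],
                 threshold: float) -> List[Tuple[int, int]]:
    """Pairs of variables whose mutual information exceeds a (hypothetical) threshold."""
    n = len(population[0])
    return [(i, j) for i, j in combinations(range(n), 2)
            if mutual_information(population, i, j) > threshold]


# x0 and x1 are always equal; x2 is independent of both.
sample = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
print(mutual_information(sample, 0, 1))  # ln 2, about 0.693
print(linked_pairs(sample, 0.1))         # [(0, 1)]
```

Being purely pairwise, such measures can miss dependencies that only appear among three or more variables, so they are a cheap detector rather than a full model.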
Conclusions on black box problems
- Easy if there is no dependency.
- As soon as the epistasis level is 2 or above and the dependencies overlap, the problem is NP-hard.
- In general, the problem is NP-hard when the dependencies are not known.
- Two possible approaches:
  - Expert: use expert knowledge about the problem to build relevant neighborhoods and operators.
  - Automatic discovery of the dependencies by sampling and model building: limited by the number of dependencies and by the need for new heuristics.
Further work: new algorithms
- Definition of a more structured benchmark than the NK-landscape.
- New PMBGA algorithm:
  - New probabilistic model.
  - New quality measure for the model.
  - More efficient heuristics for model building.
  - Learning of the number of dependencies.
- PMBGP algorithm:
  - Specialization for non-linear regression.
  - Use of dependency discovery.