1
Structure of search space, complexity of
stochastic combinatorial optimization algorithms
and application to biological motifs discovery
Robin Gras INRIA Rennes
2
  • Black box global combinatorial optimization
  • Search for the maximums of a function F (fitness)
    with integer variables.
  • Search space C = set of all legal
    instantiations.
  • Global = the search space is the Cartesian
    product of the domains of the variables.
  • Black box = the analytical form of F is not
    known, but its value is computable at each point.

Many difficult bioinformatics problems (most of
them NP-hard) can be represented as black box
combinatorial optimization problems.
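
As a minimal sketch of the setting above: the search space is enumerated as the Cartesian product of the variable domains, and the fitness is only ever called, never inspected. The names `search_space`, `black_box_maximums` and `toy_fitness` are hypothetical, and exhaustive enumeration is only workable for tiny spaces.

```python
from itertools import product

def search_space(domains):
    """Iterate over C: the Cartesian product of the variable domains."""
    return product(*domains)

def black_box_maximums(fitness, domains):
    """Find all global maximums using only point evaluations of `fitness`."""
    best_value, best_points = None, []
    for point in search_space(domains):
        value = fitness(point)
        if best_value is None or value > best_value:
            best_value, best_points = value, [point]
        elif value == best_value:
            best_points.append(point)
    return best_value, best_points

# Hypothetical toy fitness with three integer variables (illustration only).
def toy_fitness(x):
    return -(x[0] - 2) ** 2 - (x[1] - 1) ** 2 + x[2]

domains = [range(4), range(3), range(2)]
best_value, best_points = black_box_maximums(toy_fitness, domains)
```

The cost of this loop grows with the product of the domain sizes, which is why the rest of the talk looks at sampling-based exploration instead.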
3
  • A few definitions
  • Operator o = move (exploration) from a set of
    points Ec (a sample) to another set of points.
  • Neighborhood Vo (given an operator) = set of all
    points reachable from a set Ec by one application
    of the operator.
  • Landscape = triplet (C, F, Vo).
  • Local maximum Mo (given an operator) = X is a
    local maximum given o iff [formula not preserved]
  • Metaheuristic = heuristic exploration algorithm
    for black box combinatorial optimization problems.
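
The definitions above can be sketched in code. The condition for a local maximum is not preserved in this transcript, so the sketch assumes the usual one: no point of the neighborhood has a strictly higher fitness. The one-bit-flip operator and `toy_fitness` are hypothetical choices for illustration.

```python
def one_flip_neighborhood(x):
    """Vo(x) for a hypothetical operator o that flips exactly one bit of x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def is_local_maximum(x, fitness, neighborhood):
    """Assumed condition: no neighbor of x has a strictly higher fitness."""
    return all(fitness(y) <= fitness(x) for y in neighborhood(x))

def hill_climb(x, fitness, neighborhood):
    """Move to the best neighbor until a local maximum is reached."""
    while not is_local_maximum(x, fitness, neighborhood):
        x = max(neighborhood(x), key=fitness)
    return x

# Hypothetical 3-bit fitness: 3 for all ones, otherwise 2 minus the number of ones.
def toy_fitness(x):
    u = sum(x)
    return 3 if u == 3 else 2 - u

start_a, start_b = (1, 0, 0), (1, 1, 0)
end_a = hill_climb(start_a, toy_fitness, one_flip_neighborhood)
end_b = hill_climb(start_b, toy_fitness, one_flip_neighborhood)
```

With a different operator the neighborhood changes, and so does the set of local maximums: the landscape depends on the triplet, not on F alone.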

4
What is complexity? Not all difficult problems
are equal.
5
  • Influence of the maximums
  • Basin of attraction of Mo = the set of points of
    C from which Mo is reachable by a sequence of
    applications of o.
  • Study of basins of attraction.
  • Study of the fitness cloud (fitness values of
    neighbors as a function of the fitness values of
    the current points).

Results (empirical):
  • Number of global maximums compared with the
    size of C.
  • Number of global maximums compared with the
    number of local maximums.
  • Size and overlap of the basins of attraction.
Linked to the neighborhood!
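
A small sketch of how basin size and overlap can be measured by enumeration. It assumes a one-bit-flip operator restricted to strictly improving moves, and reuses a hypothetical 3-bit fitness; none of this comes from the slides.

```python
from itertools import product

def neighbors(x):
    """One-bit-flip neighborhood (hypothetical operator)."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def reachable_maximums(x, fitness, cache):
    """Local maximums reachable from x by strictly improving moves."""
    if x not in cache:
        better = [y for y in neighbors(x) if fitness(y) > fitness(x)]
        if not better:
            cache[x] = {x}  # x is itself a local maximum
        else:
            cache[x] = set().union(
                *(reachable_maximums(y, fitness, cache) for y in better))
    return cache[x]

def basins(fitness, n):
    """Map each local maximum to its basin of attraction over {0,1}^n."""
    cache, result = {}, {}
    for x in product((0, 1), repeat=n):
        for m in reachable_maximums(x, fitness, cache):
            result.setdefault(m, set()).add(x)
    return result

# Hypothetical fitness: 3 for all ones, otherwise 2 minus the number of ones.
def toy_fitness(x):
    u = sum(x)
    return 3 if u == 3 else 2 - u

b = basins(toy_fitness, 3)
overlap = b[(0, 0, 0)] & b[(1, 1, 1)]
```

Here the basin of the local maximum (0, 0, 0) is larger than the basin of the global maximum (1, 1, 1), and the two overlap on the points with two ones.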
6
  • Problem decomposability: epistasis
  • A problem of size n which is decomposable into n/k
    independent sub-problems of size k has a
    complexity in $2^k \cdot n/k$.
  • Epistasis = maximum number of variables on which
    each of the n variables depends = level of
    non-linearity → measures of non-linearity.
  • Example: spin glasses.
  • Function with a tunable level of epistasis:
    NK-landscape,
  • with N the size of the problem and K the number
    of dependencies.
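
The complexity figure for a decomposable problem can be checked with a short sketch for binary variables. It assumes, for simplicity, that every block uses the same sub-function; `solve_separable` is a hypothetical name.

```python
from itertools import product

def solve_separable(sub_fitness, n, k):
    """Maximize a sum of n/k independent k-bit sub-functions, block by block."""
    assert n % k == 0
    evaluations, solution = 0, []
    for _ in range(n // k):
        best_block, best_value = None, None
        for block in product((0, 1), repeat=k):  # 2^k candidates per block
            evaluations += 1
            value = sub_fitness(block)
            if best_value is None or value > best_value:
                best_block, best_value = block, value
        solution.extend(best_block)
    return solution, evaluations

solution, evaluations = solve_separable(sum, n=20, k=5)
full_space = 2 ** 20  # size of the search space without decomposition
```

For n = 20 and k = 5 the block-wise search needs 2^5 · 4 = 128 evaluations, against 1,048,576 points in the full space.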

7
  • Fitness and instantiation: deceptive functions
  • Functions built to be difficult for a hill climber.
  • Functions that are almost linear except for a few
    points of the search space.
  • Example: trap5 function

Trap5(X)
Independent of the neighborhood!
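
The formula for Trap5(X) is not preserved in this transcript. The sketch below uses the form of the order-5 trap commonly found in the literature, which is an assumption about what the slide showed: each block of 5 bits scores 5 when all bits are 1, and 4 minus the number of ones otherwise.

```python
def trap5_block(u):
    """Order-5 trap on u = number of ones in the block (assumed form)."""
    return 5 if u == 5 else 4 - u

def trap5(x):
    """Sum of the trap over consecutive, non-overlapping blocks of 5 bits."""
    assert len(x) % 5 == 0
    return sum(trap5_block(sum(x[i:i + 5])) for i in range(0, len(x), 5))

all_ones = trap5([1] * 10)
all_zeros = trap5([0] * 10)
one_short = trap5([1, 1, 1, 1, 0] + [0] * 5)
```

Removing ones from a block raises its score everywhere except at the optimum, so the fitness signal points away from the global maximum.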
8
Efficiency of evolutionary approaches.
Two main strategies:
  • A priori definition of the operators
  • Discovery of the operators
9
  • Classic genetic algorithms (Holland 75)
  • Exploration by sampling.
  • Population = sample
  • Evaluation and bias by selection
  • Generation of a new population by application of
    operators
  • Experimental studies of the efficiency and the
    behavior → various conclusions.
  • Theoretical studies.
  • Convergence proofs in simple cases (Goldberg
    87).
  • Introduction of the notion of schemata (Bagley
    67)
  • Computation of the efficiency on deceptive
    problems (Goldberg 93)
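
The sampling loop above can be sketched as follows. This is a generic illustration, not the configuration used in the talk: binary tournament selection, one-point crossover, bit-flip mutation and all parameter values are assumptions.

```python
import random

def genetic_algorithm(fitness, n, pop_size=40, generations=30,
                      p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Population = sample of the search space.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select(pop):
        # Evaluation and bias by selection (binary tournament).
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = select(population), select(population)
            cut = rng.randint(1, n - 1)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_population.append(child)
        population = new_population
        best = max(population + [best], key=fitness)  # keep best seen so far
    return best

# Usage on a simple non-deceptive fitness (number of ones).
best = genetic_algorithm(sum, n=20)
```

One-point crossover tends to keep neighboring positions together, so the operators here are fixed a priori and implicitly assume that dependent variables sit next to each other.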

10
  • Probabilistic model building algorithms
  • Principle: discovering the dependencies and the
    structure using the sample.
  • First model: univariate distribution of the
    variables (Mühlenbein and Voigt 96).
  • BOA: Bayesian network (Pelikan and Goldberg 99).
  • hBOA: Bayesian network + decision graph +
    ecological niches (Pelikan and Goldberg 00).
  • Building of the network
  • Population generation
  • Population size
  • Number of generations
  • Global complexity
  • Validation on spin glasses, MAXSAT and deceptive
    functions.
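
A minimal sketch of the simplest model in the list above, the univariate distribution: each variable gets its own probability of being 1, estimated from the selected part of the sample. Truncation selection and the parameter values are assumptions, and this sketch does not represent BOA or hBOA, which learn a Bayesian network instead.

```python
import random

def univariate_eda(fitness, n, pop_size=100, n_selected=50,
                   generations=30, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n          # initial model: every bit is a fair coin
    best = None
    for _ in range(generations):
        # Population generation: sample each variable independently.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        selected = population[:n_selected]   # truncation selection (assumed)
        # Model building: marginal frequency of 1 for each variable.
        probs = [sum(ind[i] for ind in selected) / n_selected
                 for i in range(n)]
    return best, probs

best, probs = univariate_eda(sum, n=20)
```

Because the model has no dependencies between variables, it cannot represent the blocks of a deceptive function, which motivates the network-based models.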


11
  • Still limitations
  • Convergence proofs do not take into account the
    strong (greedy) heuristic used to build the
    network.
  • The quality measure is not computed at each
    step.
  • High global computation cost.
  • → What are the consequences for the real
    efficiency of hBOA?

12
Efficiency of evolutionary algorithms on deceptive
problems with a high epistasis level
Problems of size 120 and 100 with sub-functions of
size between 4 and 12
Deceptive function
Non-deceptive function
Adjacent configuration: $X_i = \{x_{(i-1)k+1}, x_{(i-1)k+2}, \ldots, x_{(i-1)k+k}\}$
with $i \in \{1, \ldots, m\}$
Non-adjacent configuration: $X_i = \{x_i, x_{i+m}, \ldots, x_{i+(k-1)m}\}$
with $i \in \{1, \ldots, m\}$
Classical genetic algorithm, non-adjacent
configuration: 100% failure!
Classical genetic algorithm with adjacent
configuration
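
The two index patterns on this slide can be generated explicitly. The sketch reads them as a contiguous layout, x_{(i-1)k+1} to x_{(i-1)k+k}, and a strided layout, x_i, x_{i+m}, up to x_{i+(k-1)m}, with m = n/k sub-functions; the "+" signs are missing from the transcript, so this reading is an assumption. Indices are 1-based as on the slide.

```python
def contiguous_blocks(n, k):
    """Block i holds k consecutive variables: (i-1)k+1, ..., (i-1)k+k."""
    m = n // k
    return [[(i - 1) * k + j for j in range(1, k + 1)]
            for i in range(1, m + 1)]

def strided_blocks(n, k):
    """Block i holds every m-th variable: i, i+m, ..., i+(k-1)m."""
    m = n // k
    return [[i + j * m for j in range(k)] for i in range(1, m + 1)]

contiguous = contiguous_blocks(12, 4)
strided = strided_blocks(12, 4)
```

For n = 12 and k = 4, the contiguous layout keeps the variables of each sub-function next to each other, while the strided layout spreads them 3 positions apart, so a one-point crossover almost always splits a block.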
13
hBOA behavior
  • Adjacent configuration
  • When the epistasis level is above 6, the global
    maximum is never reached in 100 generations.
  • Computation time becomes prohibitive: more than
    60 hours with an epistasis level of 12.
  • Strong dependence on the structure of the
    deceptive function.
  • Non-adjacent configuration
  • Same results, so hBOA does not depend on the
    configuration.
  • → Real capacity to detect and handle the
    dependencies, but the heuristics used do not allow
    the proven results to be obtained.

14
  • Simple PMBGA algorithm dedicated to deceptive
    problems
  • Test of several measures based on bivariate
    frequencies only: frequencies, conditional
    probability, mutual information and statistical
    implication.
  • Algorithm taking into account the deceptive
    property
  • Random generation of the initial population
  • Tournament selection
  • Computation of the measures for each pair of
    variables
  • Building of a solution
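
One of the measures listed above, mutual information, can be computed from bivariate frequencies as in the sketch below. This illustrates only the measurement step for each pair of variables, not the author's algorithm; the function names and the sample are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(sample, i, j):
    """Mutual information (in bits) between variables i and j of the sample."""
    n = len(sample)
    joint = Counter((x[i], x[j]) for x in sample)
    count_i = Counter(x[i] for x in sample)
    count_j = Counter(x[j] for x in sample)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in joint.items())

def pairwise_scores(sample):
    """Score every pair of variables; high scores suggest a dependency."""
    n_vars = len(sample[0])
    return {(i, j): mutual_information(sample, i, j)
            for i, j in combinations(range(n_vars), 2)}

# Hypothetical selected sample: variables 0 and 1 always agree, 2 is free.
sample = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
scores = pairwise_scores(sample)
```

In this sample the pair (0, 1) scores 1 bit and the pairs involving variable 2 score 0, so grouping variables by high pairwise scores would recover the block {0, 1}.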

The solution is obtained directly in every case
(adjacent or not) in a few minutes!
With mixed sub-functions (deceptive and
non-deceptive): impossible to discover the solution.
hBOA: no difference with mixed sub-functions.
15
  • Conclusions on black box problems
  • Easy if there is no dependency.
  • As soon as the epistasis level is 2 or above and
    there is overlap between dependencies, the problem
    is NP-hard.
  • In general the problem is NP-hard when the
    dependencies are not known.
  • Two possible approaches
  • Expert: use of expert knowledge about the
    problem to build a relevant neighborhood and
    operators.
  • Automatic: discovery of the dependencies by
    sampling and model building. Limited by the
    number of dependencies and the need for new
    heuristics.

16
  • Further work: new algorithms
  • Definition of a more structured benchmark than
    NK-landscape.
  • New PMBGA algorithm
  • New probabilistic model.
  • New quality measure for the model.
  • More efficient heuristics for the model
    building.
  • Learning of the number of dependencies.
  • PMBGP algorithm
  • Specialization for non-linear regression.
  • Use of dependencies discovery.