Coevolving Solutions to the Shortest Common Superstring Problem

Slides: 79
Provided by: cs189
1
Coevolving Solutions to the Shortest Common
Superstring Problem
  • Assaf Zaritsky Moshe Sipper
  • Ben-Gurion University, Israel
  • www.cs.bgu.ac.il/assafza

2
Outline
  • The Shortest Common Superstring problem.
  • DNA sequencing and the input domain.
  • Standard and cooperative coevolutionary genetic
    algorithms (GAs): experimental results.
  • The Puzzle approach: experimental results.
  • The Co-Puzzle algorithm: experimental results.
  • Conclusions and future work.

3
The Shortest Common Superstring Problem (SCS)
  • Let S = {s1, ..., sn} be a set of strings (blocks)
    over some alphabet Σ. A superstring of S is a
    string x such that each si in S is a substring of
    x.
  • Problem: find a shortest (common) superstring.
  • NP-hard (the decision version is NP-complete).
  • MAX-SNP hard.
  • Motivation: DNA sequencing, data compression.
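The definition above can be checked mechanically. The following is a minimal sketch; the function name `is_superstring` is illustrative and not taken from the slides, and the blocks are the ones used in the deck's own SCS example.

```python
def is_superstring(x: str, blocks: list[str]) -> bool:
    """Return True if every block occurs in x as a contiguous substring."""
    return all(block in x for block in blocks)


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]

# Concatenating all blocks always yields a (usually long) superstring.
print(is_superstring("".join(blocks), blocks))
# A much shorter string can also contain every block.
print(is_superstring("lethalphalfalfate", blocks))
# A string that misses some blocks is not a superstring.
print(is_superstring("lethalpha", blocks))
```

Verifying a candidate is cheap; the hard part of SCS is finding a shortest one.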

4
SCS Example
  • S = {ate, half, lethal, alpha, alfalfa}
  • A trivial superstring is atehalflethalalphaalfalfa
    of length 25 (a simple concatenation of all
    blocks).
  • A shortest common superstring is
    lethalphalfalfate of length 17.
  • Note that a compressed permutation of the
    blocks is actually a superstring.
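To illustrate what a "compressed permutation" means, here is a small sketch that merges the blocks of an ordering from left to right, each time reusing the longest suffix/prefix overlap. The helper names `overlap` and `compress` are assumptions made for this illustration; the slides do not show the authors' implementation.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(ordered_blocks: list[str]) -> str:
    """Merge blocks left to right, overlapping each with the string built so far."""
    result = ""
    for block in ordered_blocks:
        result += block[overlap(result, block):]
    return result


order = ["lethal", "alpha", "half", "alfalfa", "ate"]
superstring = compress(order)
print(superstring, len(superstring))  # lethalphalfalfate 17
```

Under this view, searching for a short superstring becomes a search over orderings of the blocks.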

5
Approximation Algorithms
  • Several linear approximation algorithms for SCS
    have been proposed, most of which rely on greedy
    approaches.
  • GREEDY:
  • The most widely used heuristic in DNA
    sequencing.
  • Conjecture (Blum 1994, Sweedyk 1999): the superstring
    produced by GREEDY is of length at most two times
    the optimal.
  • We are not aware of any previous evolutionary
    approach to the SCS problem.
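For reference, GREEDY is commonly described as repeatedly merging the two strings with the largest overlap until one string remains. The sketch below follows that textbook description; it is not the code used in the experiments, and the function names are illustrative.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(blocks: list[str]) -> str:
    """Repeatedly merge the pair of strings with maximum overlap."""
    unique = sorted(set(blocks))  # sorted for reproducible tie-breaking
    # Blocks contained in another block are covered automatically.
    strings = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(strings) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (best_i, best_j)] + [merged]
    return strings[0] if strings else ""


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
result = greedy_superstring(blocks)
print(result, len(result))
```

Ties between equally good pairs are broken here by iteration order; a different tie-breaking rule can give a different superstring.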

7
DNA Sequencing
The most common usage of the SCS problem.
8
DNA Sequencing (contd)
  • The problem: read a string of DNA.
  • Short DNA strands can be read in the laboratory.
  • To sequence a long DNA strand
    (the DNA sequence appears in many copies):
  • Cut the DNA into short fragments using restriction
    enzymes.
  • Sequence each of the resulting fragments.
  • Order those fragments using an SCS algorithm.
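The procedure above can be imitated in software to produce SCS inputs. This is only a rough sketch under stated assumptions: the copy count, fragment lengths, and random cutting below are hypothetical choices for illustration, not the generation parameters used for the experiments.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """A random string over the DNA alphabet."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fragment(dna: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut one copy of dna into consecutive fragments of random length.

    The last fragment may be shorter than min_len.
    """
    fragments, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments


def make_blocks(dna: str, copies: int, min_len: int, max_len: int,
                rng: random.Random) -> list[str]:
    """Cut several copies independently, so fragments of different copies overlap."""
    blocks = []
    for _ in range(copies):
        blocks.extend(fragment(dna, min_len, max_len, rng))
    return blocks


rng = random.Random(0)
dna = random_dna(200, rng)
blocks = make_blocks(dna, copies=3, min_len=20, max_len=30, rng=rng)
print(len(blocks), "fragments from 3 copies")
```

Because each copy is cut at different positions, the fragments overlap, and that overlap is what an SCS algorithm exploits to order them.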

9
The Input Domain
The input strings used in the experiments were
inspired by DNA sequencing
10
Input Generation Setup Parameters
N.B.: increasing the number of blocks results in
exponential growth of the problem's complexity.
12
Simple Genetic Algorithm
produce an initial population of individuals
evaluate fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine individuals
    mutate individuals
    evaluate fitness of modified individuals
    generate a new population
end while
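The loop above can be sketched in a few lines of Python. To keep the example self-contained it maximises a toy fitness (the number of 1-bits in a bit string) rather than solving SCS; the parameter values, one-point recombination, and bit-flip mutation are illustrative assumptions, not the settings used in the talk.

```python
import random


def simple_ga(fitness, genome_len, pop_size=30, generations=40,
              mutation_rate=0.02, seed=0):
    """Generational GA over bit strings; returns the best individual seen."""
    rng = random.Random(seed)
    # Produce an initial population and evaluate it.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination: fixed number of generations
        # Select fitter individuals (fitness-proportionate; +1 keeps weights positive).
        parents = rng.choices(population, weights=[s + 1 for s in scores],
                              k=pop_size)
        # Recombine pairs of parents with one-point crossover.
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, genome_len)
            offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutate: flip each bit with a small probability.
        population = [[bit ^ 1 if rng.random() < mutation_rate else bit
                       for bit in ind] for ind in offspring]
        # Evaluate the new population and remember the best individual so far.
        scores = [fitness(ind) for ind in population]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion
    return best


best = simple_ga(fitness=sum, genome_len=20)
print(sum(best), best)
```

The sketch assumes an even `pop_size` and a non-negative fitness; both are conveniences of this toy version.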
13
EA Success Stories
http://evonet.lri.fr/evoweb/resources/evolution_work/all.php
http://www.genetic-programming.com/humancompetitive.html
14
EA Success Stories
15
EA Success Stories
16
Simple GA for the SCS Problem
  • Given a set of strings as input, generate initial
    population of random candidate solutions.
  • The fitness of each individual depends on its
    length and accuracy.
  • The GA uses selection, recombination, and
    mutation to create the next generation, each
    individual of which is then evaluated.
  • These steps are repeated a predefined number of
    times or until the solution is deemed
    satisfactory.

17
Simple GA for the SCS Problem (contd)
  • Blocks of the input set are atomic components.
  • Representation: an individual's genome is
    represented as a sequence of blocks.
  • An individual may have missing blocks or contain
    duplicate copies of the same block.
  • Permutation representation: good or bad?

18
Simple GA for the SCS Problem (contd)
  • Evaluation: the fitness of an individual is the
    length of its compressed genome + the total
    length of the blocks that are not covered by the
    individual.
  • Genetic operators:
  • Fitness-proportionate selection.
  • Two-point recombination. Allows growth and
    reduction in genome length.
  • Block-change mutation.
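The evaluation rule can be written down directly. The sketch below assumes that "compressed genome" means merging consecutive blocks by their longest overlap and that a block counts as covered when it is a substring of the result; the function names are illustrative. The blocks and expected values come from the deck's own worked example (s1 = 0011, s2 = 1100, s3 = 1001, s4 = 111).

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(genome: list[str]) -> str:
    """Merge the genome's blocks left to right using maximal overlaps."""
    result = ""
    for block in genome:
        result += block[overlap(result, block):]
    return result


def scs_fitness(genome: list[str], blocks: list[str]) -> int:
    """Compressed length plus total length of uncovered blocks (lower is better)."""
    compressed = compress(genome)
    uncovered = sum(len(b) for b in blocks if b not in compressed)
    return len(compressed) + uncovered


s1, s2, s3, s4 = "0011", "1100", "1001", "111"
blocks = [s1, s2, s3, s4]
print(scs_fitness([s2, s1], blocks))          # 110011 covers s3 but not s4: 6 + 3 = 9
print(scs_fitness([s4, s2, s1, s4], blocks))  # 11100111 covers every block: 8
```

Note that s3 never appears in either genome yet counts as covered, because it occurs inside the compressed string.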

19
Simple GA for the SCS Problem (example)
  • S = {s1, s2, s3, s4}; s1 = 0011, s2 = 1100, s3 =
    1001, s4 = 111.
  • Fitness(<s2,s1>) = |110011| + |111| = 6 + 3 =
    9.
  • Fitness(<s4,s2,s1,s4>) = |11100111| = 8.
  • Recombination:
  • p1 = <s1,s2,s3,s4>
  • p2 = <s4,s1,s3,s2>
  • p3 = recombine1(p1,p2) = <s1,s1,s3,s2,s4>
  • p4 = recombine2(p1,p2) = <s4,s2,s3>
  • mutate(<s1,s2,s2>) = <s1,s4,s2>
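One reading of the recombination example is that each parent gets its own pair of cut points and the two middle segments are swapped, which is what lets offspring grow or shrink. The sketch below is written under that assumption; the cut positions are chosen by hand to reproduce p3 and p4, and the function names are illustrative.

```python
import random


def two_point_recombine(p1, p2, cuts1, cuts2):
    """Swap the middle segments of two parents; segments may differ in length."""
    a1, b1 = cuts1
    a2, b2 = cuts2
    child1 = p1[:a1] + p2[a2:b2] + p1[b1:]
    child2 = p2[:a2] + p1[a1:b1] + p2[b2:]
    return child1, child2


def block_change_mutation(genome, blocks, rng):
    """Replace one randomly chosen position with a randomly chosen input block."""
    mutant = list(genome)
    mutant[rng.randrange(len(mutant))] = rng.choice(blocks)
    return mutant


p1 = ["s1", "s2", "s3", "s4"]
p2 = ["s4", "s1", "s3", "s2"]
p3, p4 = two_point_recombine(p1, p2, cuts1=(1, 3), cuts2=(1, 4))
print(p3)  # ['s1', 's1', 's3', 's2', 's4']
print(p4)  # ['s4', 's2', 's3']

rng = random.Random(0)
print(block_change_mutation(["s1", "s2", "s2"], ["s1", "s2", "s3", "s4"], rng))
```

In a real run the cut points would be drawn at random; fixing them here only makes the correspondence with the example visible.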

20
Coevolution
  • Simultaneous evolution of two or more species
    with coupled fitness.
  • Coevolving species either compete or cooperate.
  • Competitive coevolution: fitness of an individual
    is based on direct competition with individuals of
    other species, which in turn evolve separately in
    their own populations (prey-predator).

21
Cooperative Coevolution
22
Cooperative Coevolution (contd)
  • Cooperative Coevolution involves a number of
    independently evolving species.
  • Interaction between species occurs via fitness
    function only.
  • The fitness of an individual depends on its
    ability to collaborate with individuals from
    other species.

23
Cooperative Coevolution (contd)
Source: Potter & DeJong (1997)
24
Cooperative Coevolutionary Algorithm for the SCS
Problem
  • Two species evolve simultaneously.
  • First species contains prefixes of candidate
    solutions to the SCS problem at hand.
  • Second species contains candidate suffixes.
  • Fitness of an individual in each species depends
    on how well it interacts with representatives
    from the other species to construct a global solution.
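A minimal sketch of how a prefix or suffix individual could be scored: merge it with a representative of the other species and evaluate the combined genome with the simple GA's fitness. Using a single representative per species, and the function names below, are assumptions for illustration; the transcript does not say how representatives were chosen.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def scs_fitness(genome: list[str], blocks: list[str]) -> int:
    """Compressed length plus total length of uncovered blocks (lower is better)."""
    compressed = ""
    for block in genome:
        compressed += block[overlap(compressed, block):]
    return len(compressed) + sum(len(b) for b in blocks if b not in compressed)


def cooperative_fitness(individual, representative, blocks, role):
    """Merge an individual with the other species' representative, then evaluate."""
    if role == "prefix":
        merged = individual + representative
    else:  # the individual is a suffix
        merged = representative + individual
    return scs_fitness(merged, blocks)


s1, s2, s3, s4 = "0011", "1100", "1001", "111"
blocks = [s1, s2, s3, s4]
print(cooperative_fitness([s4, s2], [s1, s4], blocks, role="prefix"))  # 8
print(cooperative_fitness([s1, s4], [s4, s2], blocks, role="suffix"))  # 8
print(cooperative_fitness([s2], [s1], blocks, role="prefix"))          # 9
```

The same prefix can score differently depending on which suffix it is paired with, which is how the two species' fitness values are coupled.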

25
Cooperative Coevolutionary Algorithm for the SCS
Problem (evaluation process)
Merge
26
Cooperative Coevolutionary Algorithm for the SCS
Problem (evaluation process)
Evaluate
27
Experiments
Compare GREEDY, Standard GA, Cooperative
Coevolution
28
Experimental Setup
Each type of GA was executed twice on each
problem instance; the better run of the two was
used for statistical purposes.
29
Results Experiment I (50 blocks)
30
Results Experiment II (80 blocks)
31
Results Summary
[Table: average of the best superstring lengths for GREEDY, Genetic, and Cooperative, at problem sizes of 50 and 80 blocks.]
32
Conclusion
The collaboration between the two populations
results in a good decomposition of the problem
into two smaller sub-problems, each of which is
solved using a standard GA.
34
The Puzzle Algorithm
35
The Schema Theorem
Short, low-order, above-average schemata receive
exponentially increasing trials in subsequent
generations of a genetic algorithm (Holland,
1975).
36
Building Blocks Hypothesis
A genetic algorithm seeks near-optimal
performance through the juxtaposition of short,
low-order, high-performance schemata, called the
building blocks.
37
Our Interpretation
The success of GAs stems from their ability to
combine quality sub-solutions (building blocks)
from separate individuals in order to form better
global solutions.
38
The Main Assumption
Problems in nature have an inherent structural
design. Even when the structure is not known
explicitly, GAs detect it implicitly and gradually
enhance good building blocks.
39
A Problem
Recombination may destroy quality building blocks
found by the GA.
40
The Preservation of Favoured Building Blocks in
the Struggle for Fitness: The Puzzle Algorithm
41
Puzzle Algorithm: The Idea
  • Improve the recombination operator.
  • Preserve good building blocks discovered by the GA
    by selecting recombination loci that do not
    destroy good building blocks.
  • Result: assembly of good building blocks to
    construct better solutions (as in a puzzle).

42
Puzzle Algorithm (contd)
  • Two populations:
  • 1. Candidate solutions: as in the simple GA.
  • 2. Building blocks: each individual is a
    sequence of blocks contained in at least one
    candidate solution.

43
Puzzle Algorithm (contd)
  • Candidate solutions affect building blocks
    through the fitness function.
  • Building blocks affect candidate solutions
    through constraints on recombination
    points.

Fitness evaluation
Crossover location
44
Puzzle Algorithm: Zoom In
45
Puzzle Algorithm: Zoom In
46
The Candidate Solutions Population
  • Representation, fitness evaluation, selection,
    and mutation are identical to the simple GA.
  • A recombination-aid vector aids in selecting the
    recombination loci.
  • The recombination-aid vector is updated by
    building-block individuals.
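The transcript does not preserve how the recombination-aid vector is laid out or updated, so the following is only one plausible sketch, under explicit assumptions: the vector holds one score per link between adjacent blocks of a candidate solution, each building block found inside the solution raises the score of the links it spans, and the cut is made at the weakest link with ties broken at random. The names and the `strength` values are hypothetical.

```python
import random


def find_occurrences(genome, building_block):
    """Start indices where building_block appears as a contiguous run in genome."""
    n, m = len(genome), len(building_block)
    return [i for i in range(n - m + 1) if genome[i:i + m] == building_block]


def recombination_aid_vector(genome, building_blocks):
    """aid[i] scores the link between genome[i] and genome[i + 1].

    building_blocks is a list of (blocks, strength) pairs, where a larger
    strength means the building block is more worth preserving (assumption).
    """
    aid = [0.0] * max(len(genome) - 1, 0)
    for bb, strength in building_blocks:
        for start in find_occurrences(genome, bb):
            for i in range(start, start + len(bb) - 1):
                aid[i] = max(aid[i], strength)
    return aid


def choose_locus(aid, rng):
    """Pick the weakest link as the recombination locus; ties broken at random."""
    weakest = min(aid)
    return rng.choice([i for i, v in enumerate(aid) if v == weakest])


genome = ["s1", "s2", "s3", "s4", "s5"]
building_blocks = [(["s2", "s3"], 5.0), (["s3", "s4"], 2.0)]
aid = recombination_aid_vector(genome, building_blocks)
print(aid)                                  # [0.0, 5.0, 2.0, 0.0]
print(choose_locus(aid, random.Random(1)))  # 0 or 3: links no building block spans
```

Cutting at the weakest link keeps the runs s2,s3 and s3,s4 intact, which is the behaviour the Puzzle idea asks of the recombination operator.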

47
The Building Blocks Population
  • An individual is represented as a sequence of
    blocks, contained in at least one candidate
    solution.
  • Fitness of an individual is the average of the
    fitness of candidate solutions containing it.
  • Fitness-proportionate selection.
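The fitness rule for building blocks is a plain average, sketched below. It assumes that "containing" means the building block appears as a contiguous run of blocks in the candidate's genome; the function names and the fitness numbers are made up for illustration.

```python
def contains(genome, building_block):
    """True if building_block occurs as a contiguous run of blocks in genome."""
    m = len(building_block)
    return any(genome[i:i + m] == building_block
               for i in range(len(genome) - m + 1))


def building_block_fitness(building_block, candidates, candidate_fitness):
    """Average fitness of the candidate solutions that contain the building block."""
    scores = [f for genome, f in zip(candidates, candidate_fitness)
              if contains(genome, building_block)]
    return sum(scores) / len(scores) if scores else None


candidates = [["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s4", "s1"]]
candidate_fitness = [10, 14, 20]  # hypothetical values
print(building_block_fitness(["s2", "s3"], candidates, candidate_fitness))  # 12.0
print(building_block_fitness(["s4", "s1"], candidates, candidate_fitness))  # 20.0
print(building_block_fitness(["s3", "s1"], candidates, candidate_fitness))  # None
```

Returning `None` for a building block that no candidate contains is a choice made for this sketch; the slides require every building block to be contained in at least one candidate solution.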

48
The Building Blocks Population (cont)
  • Unisex individuals.
  • Two modification operators:
  • Expansion: increase the genome by one block.
    Occurs with high probability.
  • Exploration: die, and start over as a new
    2-block individual. Occurs with low probability.
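The two operators might look as follows. The slides do not say which block is added during expansion or how a new 2-block individual is formed, so this sketch assumes both are read off an existing candidate solution, which keeps every building block contained in at least one candidate. The probability `p_explore=0.1` and all function names are hypothetical.

```python
import random


def explore(candidates, rng):
    """Start over as two adjacent blocks copied from a random candidate solution."""
    host = rng.choice([g for g in candidates if len(g) >= 2])
    i = rng.randrange(len(host) - 1)
    return host[i:i + 2]


def expand(bb, candidates, rng):
    """Grow by one block on either side, following a candidate that contains bb."""
    options = []
    m = len(bb)
    for g in candidates:
        for i in range(len(g) - m + 1):
            if g[i:i + m] == bb:
                if i > 0:
                    options.append(g[i - 1:i + m])   # extend to the left
                if i + m < len(g):
                    options.append(g[i:i + m + 1])   # extend to the right
    return rng.choice(options) if options else bb


def modify(bb, candidates, rng, p_explore=0.1):
    """Expansion with high probability, exploration with low probability."""
    if rng.random() < p_explore:
        return explore(candidates, rng)
    return expand(bb, candidates, rng)


rng = random.Random(0)
candidates = [["s1", "s2", "s3", "s4"]]
print(expand(["s2", "s3"], candidates, rng))
print(explore(candidates, rng))
print(modify(["s2", "s3"], candidates, rng))
```

A building block that already spans a whole candidate cannot grow further in this version and is returned unchanged.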

49
Building Blocks Candidate Solutions
[Figure: fitness evaluation, with fitness values f1, f2, f3, f4.]
50
Building Blocks Candidate Solutions
[Figure: fitness evaluation (f1, f2, f3, f4), followed by an update of the recombination-aid vector.]
51
Update Recombination-aid vector
52
Update Recombination-aid vector
53
Update Recombination-aid vector
54
Recombination-loci selection
Ties are broken arbitrarily
55
Experiments
Compare GREEDY, Standard GA, Puzzle
56
Building Blocks - Experimental Setup
57
Results Experiment III (50 blocks)
58
Results Experiment IV (80 blocks)
Did we lose to cooperative?
NO!
59
Results Summary
[Table: average of the best superstring lengths for GREEDY, Genetic, and Puzzle, at problem sizes of 50 and 80 blocks.]
61
Relations Between The Algorithms
Co-Puzzle
GA
62
The Co-Puzzle Algorithm
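[Diagram: a candidate prefixes population and a candidate suffixes population, each paired with its own population of possible building blocks. The prefix and suffix populations are linked by fitness evaluation; each building blocks population is linked to its candidate population by fitness evaluation and crossover location.]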
63
Experiments
Compare GREEDY, Cooperative Coevolution,
Co-Puzzle
64
Results Experiment V (80 blocks)
65
Results Experiment VI (50 blocks)
????
66
Results Summary
[Table: size of the shortest common superstring for GREEDY, Cooperative, and Co-puzzle, at problem sizes of 50 and 80 blocks.]
67
Proposal: The Messy Puzzle Algorithm
68
The Messy Puzzle Algorithm
  • The Linkage Problem.
  • Messy genes:
  • Variable-length genome.
  • A gene is an ordered pair <allele-locus, allele-value>.
  • Handling over- and under-specification.
  • Example

69
The Messy Puzzle Algorithm (cont)
A building block's genome is represented as a
sequence of messy genes.
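As background, the sketch below decodes a messy genome using the conventions commonly associated with messy GAs: when a locus is specified more than once, the first occurrence from the left wins, and loci that are never specified are filled from a template. Whether the proposed Messy Puzzle algorithm would use exactly these rules is not stated in the transcript; the function name `decode` is illustrative.

```python
def decode(messy_genome, template):
    """Express a messy genome as a full-length solution.

    messy_genome is a list of (locus, value) pairs.
    Over-specification: the first occurrence of a locus wins (left to right).
    Under-specification: unspecified loci keep the template's value.
    """
    solution = list(template)
    seen = set()
    for locus, value in messy_genome:
        if locus not in seen:
            seen.add(locus)
            solution[locus] = value
    return solution


# Locus 2 is specified twice (the first value, 1, wins); loci 1 and 3 are
# unspecified and are taken from the template.
print(decode([(2, 1), (0, 0), (2, 0)], template=[1, 1, 1, 1]))  # [0, 1, 1, 1]
```

Because the genes carry their own locus, related genes can sit next to each other in the genome regardless of where they act in the solution.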
70
Example The MAXCUT Problem
  • The MAXCUT problem.
  • The main difficulty: identifying the related
    vertices.
  • Possible solution:
  • Use some sort of messy genes to put related genes
    close together.
  • Use the Puzzle approach to keep them together.
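For readers unfamiliar with MAXCUT: the goal is to split a graph's vertices into two sides so that the total weight of edges crossing between the sides is as large as possible. The sketch below only evaluates a given split; the small graph and the function name are invented for illustration.

```python
def cut_value(edges, side):
    """Total weight of edges whose endpoints lie on different sides.

    edges is a list of (u, v, weight) triples; side[v] is 0 or 1 for vertex v.
    """
    return sum(w for u, v, w in edges if side[u] != side[v])


# A 4-cycle 0-1-2-3-0 with one chord 0-2, all weights 1.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]
print(cut_value(edges, side=[0, 1, 0, 1]))  # 4: every cycle edge crosses, the chord does not
print(cut_value(edges, side=[0, 0, 1, 1]))  # 3
```

In a GA encoding with one gene per vertex, vertices joined by heavy edges are the "related" genes that the slide suggests keeping close together.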

72
Results Summary
Size of shortest common superstring, by algorithm (GREEDY, Cooperative, Co-puzzle, Puzzle) and problem size:
83 better
50 blocks
42 better
80 blocks
20 problem instances per experiment
25 better
90 blocks
13 better
100 blocks
73
Larger Problems - Using More Species
[Table: size of the shortest common superstring for GREEDY, Co-puzzle, and 3-Co-puzzle, at problem sizes of 110 and 120 blocks.]
74
Conclusions
  • Cooperative coevolution might prove deleterious
    when too many species are used.
  • When a suitable number of species is used,
    cooperative coevolution improves performance by
    decomposing the problem into several easier
    subproblems.

75
(Conjectured) Scaling Analysis of Cooperative
Coevolution
76
Conclusions (cont)
  • Evolving a population of building blocks to aid
    in the selection of recombination loci drastically
    improves the performance of a standard GA.
  • Cooperation between cooperative coevolution and
    Puzzle ultimately improves global performance.

77
Future Work
  • The Messy Puzzle Algorithm.
  • Scaling analysis of cooperative coevolution.
  • Test the (Co-) Puzzle approach on other problem
    domains.
  • A hybrid GA.
  • Tackle larger problems.
  • Comparison to greedy-stochastically based
    local-search algorithms.
