Title: Combinatorial Library Design Using a Multiobjective Genetic Algorithm
1Combinatorial Library Design Using a
Multiobjective Genetic Algorithm
- Valerie J. Gillet, Wael Khatib, Peter Willett,
Peter J. Fleming, and Darren V. S. Green. J.
Chem. Inf. Comput. Sci. 2002, 42, 375-385
Krebs Institute for Biomolecular Research and
Department of Information Studies, University of
Sheffield, Western Bank, Sheffield S10 2TN,
United Kingdom, Department of Automatic Control
and Systems Engineering, University of Sheffield,
Western Bank, Sheffield S10 2TN, United Kingdom,
and GlaxoSmithKline, Gunnels Wood Road,
Stevenage, SG1 2NY, United Kingdom
Presented by Greg Goldgof
2Problem: How to Optimize Library Design?
Solution: A Multiobjective Genetic Algorithm-Based
Approach That Determines Pareto Frontiers
3Traditionally
- Library design algorithms focused on diversity.
- Failed to deliver sufficiently improved hit
rates.
- Generated chemicals that make undesirable lead
compounds.
4Single-Objective SELECT
5SELECT Extended for Multiple Objectives
6Limitations of MO Search
- The definition of the fitness function can be
difficult, especially with noncommensurable
objectives; for example, in library design it is
not obvious how diversity should be combined with
cost.
- The setting of the weights is nonintuitive; for
example, in the SELECT program several
trial-and-error experiments may be required to
choose appropriate weights [22].
- The fitness function determines the regions of
the search space that are explored, and combining
objectives via weights can result in some regions
being obscured.
- The progress of the search or optimization
process is not easy to follow since there are
many objectives to monitor simultaneously.
- The objectives may be coupled, thus implying
conflict and competition, which can make it more
difficult for the optimization process to achieve
reasonable or acceptable results.
7Limitations of MO Search cont.
- A single solution is found, which is typically one
among a family of solutions that are all
equivalent in terms of the overall fitness,
although they may have different values of the
individual objectives.
- For example, consider a two-objective problem
where the fitness function is defined as
$f(n) = w_1 x + w_2 y$, where $x$ and $y$ are hypothetical
objectives and $w_1$ and $w_2$ are both set to unity.
The solution $x = 0.4$, $y = 0.5$ has the same
fitness (0.9) as the potential solution $x = 0.5$,
$y = 0.4$, and thus both solutions can be
considered as equivalent; typically, however,
only one of them will be found.
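The two-objective example above can be checked numerically. The following is a minimal sketch, not code from SELECT; the function name `weighted_sum_fitness` is a hypothetical helper introduced only for illustration.

```python
def weighted_sum_fitness(objectives, weights):
    """Collapse several objective values into one score via a weighted sum.

    Hypothetical helper for illustration; not taken from the SELECT program.
    """
    return sum(w * v for w, v in zip(weights, objectives))


# Both weights set to unity, as in the slide's example.
weights = (1.0, 1.0)

solution_a = (0.4, 0.5)  # x = 0.4, y = 0.5
solution_b = (0.5, 0.4)  # x = 0.5, y = 0.4

fitness_a = weighted_sum_fitness(solution_a, weights)
fitness_b = weighted_sum_fitness(solution_b, weights)

# The two solutions differ in their individual objectives but are
# indistinguishable once the objectives are summed.
print(fitness_a == fitness_b)
```

Because the scalar score hides the trade-off between the two objectives, a search driven only by this score has no reason to retain both solutions.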
8Pareto Frontier and Dominated versus Nondominated
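As a rough illustration of the dominated versus nondominated distinction, the sketch below uses the standard Pareto dominance test. It assumes that every objective is to be minimized, and the function names and sample points are made up for the example.

```python
def dominates(a, b):
    """Return True if a dominates b, assuming all objectives are minimized.

    a dominates b when a is no worse than b in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that no other point dominates (the Pareto frontier)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative objective vectors (hypothetical values).
points = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.4), (0.6, 0.6), (0.9, 0.2)]
frontier = nondominated(points)

# (0.6, 0.6) is dominated by both (0.4, 0.5) and (0.5, 0.4);
# the other four points are mutually nondominated.
print(frontier)
```

Note that (0.4, 0.5) and (0.5, 0.4) both survive here, whereas a weighted sum with unit weights would treat them as interchangeable.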
9MoSELECT
Evolutionary algorithms, however, operate with a
population of individuals and are thus
well suited to search for multiple solutions in
parallel; hence they can be readily adapted to
deal with multiobjective search and optimization.
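One way a population-based search can favor many trade-off solutions at once is to rank each individual by how many others dominate it. The sketch below shows such a dominance-count ranking. It is an assumption that this resembles the ranking used in MoSELECT, since the slides do not spell out the scheme; minimization of all objectives is also assumed, and the names are hypothetical.

```python
def dominates(a, b):
    """True if a dominates b, assuming all objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def dominance_ranks(population):
    """Rank each individual by the number of individuals that dominate it.

    Rank 0 means nondominated. This is an assumed ranking scheme for
    illustration, not one confirmed by the slides.
    """
    return [
        sum(dominates(other, individual) for other in population)
        for individual in population
    ]


# Hypothetical objective vectors for a population of five individuals.
population = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
ranks = dominance_ranks(population)

# The first three are nondominated (rank 0); (3, 3) is dominated only
# by (2, 2); (4, 4) is dominated by all four others.
print(ranks)
```

Selection can then prefer low ranks, so every nondominated individual is equally fit and no weights need to be chosen.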
10Charting the results of MoSELECT with two
parameters
12SELECT Versus MoSELECT
13Convergence Criteria
The second convergence criterion that was
investigated involves calculating the percentage
of nondominated solutions in the Pareto set as
the search progresses. This method, however,
did not prove to be effective since there was
no clear trend to indicate what a valid threshold
should be.
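The quantity tracked by this criterion could be computed roughly as follows. This sketch assumes that the percentage is taken over the current population and that all objectives are minimized; the function names and sample values are hypothetical.

```python
def dominates(a, b):
    """True if a dominates b, assuming all objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def nondominated_percentage(population):
    """Percentage of individuals in the population that are nondominated."""
    nondominated_count = sum(
        not any(dominates(other, individual) for other in population)
        for individual in population
    )
    return 100.0 * nondominated_count / len(population)


# Hypothetical population: three of the five individuals are nondominated.
population = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
percentage = nondominated_percentage(population)
print(percentage)
```

A stopping rule would compare this value against a threshold each generation, which is where the difficulty noted above arises: without a clear trend there is no principled threshold to choose.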
14Niche Induction
- Genetic Drift or Speciation
- If the (absolute) difference in the objectives of
the next solution and the objectives of any
solution that already forms the center of a niche
is within a given threshold, for all objectives,
the fitness (or dominance) of the current
solution is penalized; otherwise it forms the
center of a new niche. The threshold is also
known as the niche radius.
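The niching rule described above can be sketched as a single pass over the solutions. The slide does not say how large the penalty is or in what order solutions are visited, so the fixed additive `penalty` on the dominance rank and the list order used here are assumptions, and the names are hypothetical.

```python
def apply_niching(solutions, ranks, niche_radius, penalty=1):
    """Penalize solutions that fall inside an existing niche.

    solutions:    list of objective vectors, visited in list order (assumed)
    ranks:        dominance rank of each solution (lower is better)
    niche_radius: threshold applied to every objective
    penalty:      amount added to the rank of a crowded solution (assumed)
    """
    centers = []
    new_ranks = []
    for objectives, rank in zip(solutions, ranks):
        # A solution lies in a niche when its absolute difference from
        # the niche center is within the radius for ALL objectives.
        in_existing_niche = any(
            all(abs(o - c) <= niche_radius for o, c in zip(objectives, center))
            for center in centers
        )
        if in_existing_niche:
            new_ranks.append(rank + penalty)
        else:
            centers.append(objectives)  # becomes the center of a new niche
            new_ranks.append(rank)
    return new_ranks, centers


# Hypothetical nondominated solutions, all starting at rank 0.
solutions = [(0.10, 0.90), (0.12, 0.88), (0.50, 0.50), (0.52, 0.70)]
new_ranks, centers = apply_niching(solutions, [0, 0, 0, 0], niche_radius=0.05)

# Only the second solution is close to an existing center in every
# objective; the fourth is close to (0.50, 0.50) in one objective only.
print(new_ranks, len(centers))
```

Penalizing crowded solutions pushes the search to spread along the Pareto frontier rather than drift toward one region of it.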
15Increasing the Number of Objectives
- Diversity
- Cost
- Molecular weight (MW)
- Occurrence of rotatable bonds (RB)
- Occurrence of hydrogen bond donors (HBD)
- Occurrence of hydrogen bond acceptors (HBA)
- etc.
16No Niching
17With Niching
21Future Work
- Future work will investigate the possibility of
interacting with the search process so that the
relationships between objectives are explored
during the search. This will allow the user to
observe which objectives are relatively hard to
improve, which are more easily optimized, and
which objectives are in competition. The search
process itself could then be altered to take
account of these characteristics.
22Discussion Questions
- MoSELECT allows the user to determine the
importance of each parameter. Is this reasonable?
- How does the Future Work present solutions to
this problem? Does it solve it?
- Why is there no formal analysis of runtime?