Discrete and Genetic Algorithms in Bioinformatics

Provided by: Hsu

1
Discrete and Genetic Algorithms in Bioinformatics

2
Discrete Algorithms
• Discrete mathematics lies at the foundation of
modern computer science
• Most algorithms taught in computer science are
discrete
• Discrete algorithms emphasize worst-case
analysis
• Many sequence manipulation algorithms in
bioinformatics are discrete

3
Natural Problems (1)
• Natural problems: problems arising from nature,
which are guaranteed to have feasible solutions
if the data are collected accurately.
• But because of noise in the sampled data, such
solutions are hard to come by.
• To tackle these problems, one should focus on real
data rather than worst-case analysis.

4
Natural Problems (2)
• Techniques that exploit the natural
constraints of these problems do not necessarily
work for general data (especially the worst
case), but can perform very well on such
well-structured problems.
• Examples:
  • many computational problems arising from biology,
  speech recognition, and image processing

5
Constraints with Errors
• In ordinary constrained optimization problems, one
naturally assumes that the constraints are
correct.
• What if the constraints are inconsistent?
  • Then no feasible solution satisfies all of them
• What if every constraint is only partially
correct?

6
Explicit Solution Candidates
• In ordinary optimization problems, most
algorithms do not generate plausible solutions
along the way
• However, having some solution candidates is an
advantage when the constraints contain errors.

7
Plausible Solution Candidates
• For some optimization problems, machine learning
approaches generate plausible solutions along the
way.
• Solutions improve as the machine learning
approach iteratively refines solution patterns.
• A better solution emerges from the cooperation of
plausible solution candidates.

8
Fitness Landscape
• Each solution candidate has a fitness score for
the optimization problem.
• A fitness landscape shows how fitness is
distributed over the whole search space.
• Solution candidates are ranked by their fitness
scores.

9
Genetic Algorithm
• A search technique for finding exact or
approximate solutions to optimization problems.
• It is based on the principle of evolution
  • Survival of the fittest in natural selection
• Two basic processes from evolution
  • Inheritance (passing of features from one
  generation to the next)
  • Competition (survival of the fittest)

10
Basic description of GA
• The algorithm starts with a set of solutions
(represented by chromosomes) called the population.
• Solutions from one population are taken and used
to form a new population.
• The new population (offspring) is expected to be
better than the old one (parents).
• Solutions are selected to form new solutions
according to their fitness: the more suitable
they are, the more chances they have to
reproduce.
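
The last bullet describes selection in which fitter solutions get more chances to reproduce. A minimal Python sketch of one common way to do this (fitness-proportionate, or "roulette wheel", selection) follows. The function name `select_parents` and the assumption that fitness scores are non-negative are illustrative and not taken from the slides.

```python
import random

def select_parents(population, fitness, k, rng=random):
    """Pick k parents, each with probability proportional to its fitness.

    Assumes fitness(ind) >= 0 for every individual (an assumption of this sketch).
    """
    weights = [fitness(ind) for ind in population]
    if sum(weights) <= 0:
        # Degenerate case: no individual has positive fitness, choose uniformly.
        return rng.choices(population, k=k)
    return rng.choices(population, weights=weights, k=k)
```

An individual with fitness 0 is never chosen here, and one with three times the fitness of another is picked about three times as often.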

11
GA in Pseudo-code
• Choose an initial population
• Evaluate the fitness of each individual in the
population
• Repeat
  • Select the best-ranking individuals to reproduce
  • Breed a new generation through crossover and
  mutation (genetic operations), producing
  offspring
  • Evaluate the fitness of each offspring
  • Replace the worst-ranked part of the population
  with the offspring
• Until termination
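
The loop above can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the function `genetic_algorithm`, its parameters, the fixed generation count used as the termination test, and the toy bit-string problem (maximize the number of ones) are all assumptions made for the example.

```python
import random

def genetic_algorithm(init, fitness, crossover, mutate,
                      pop_size=30, n_parents=10, generations=100, seed=0):
    rng = random.Random(seed)
    # Choose the initial population and rank it by fitness.
    population = [init(rng) for _ in range(pop_size)]
    population.sort(key=fitness, reverse=True)
    for _ in range(generations):            # Repeat ... until termination
        parents = population[:n_parents]    # select best-ranking individuals
        offspring = []
        for _ in range(n_parents):          # breed via crossover and mutation
            a, b = rng.sample(parents, 2)
            offspring.append(mutate(crossover(a, b, rng), rng))
        # Replace the worst-ranked part of the population with the offspring.
        population = population[:pop_size - n_parents] + offspring
        population.sort(key=fitness, reverse=True)
    return population[0]

# Toy problem (hypothetical): maximize the number of 1s in a 20-bit string.
def init(rng):
    return [rng.randint(0, 1) for _ in range(20)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip(ind, rng):
    i = rng.randrange(len(ind))
    return ind[:i] + [1 - ind[i]] + ind[i + 1:]

best = genetic_algorithm(init, sum, one_point, flip)
```

Because the top-ranked individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.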

12
Building Block Hypothesis
• Building block: a short, highly fit schema
that benefits the solution.
• The globally optimal solution is made up of
building blocks.
• Identify, recombine, and resample small building
blocks to form a new solution with potentially
higher fitness.
• By working with these particular building blocks,
we reduce the complexity of the problem.

13
The Fitness Function
• Plays the role of a judge
• Awards a higher score to individuals that contain
more building blocks
• Is refined based on the evolution results

14
Physical Mapping
15
Cutting and Reassembling a DNA Sequence
• Cut a DNA sequence into small pieces in different
ways and reassemble them
• the small pieces (called clones) are still too
large for their complete sequences to be found
• biologically, probes are used to mark the clones
• each probe can mark several clones, and each clone
can contain several probes

16
The Physical Mapping Problem with Noisy Genomic
Data (Journal of Computational Biology 10(5),
709-735, 2003)
• Each row represents a clone; each column
represents a probe
• Diagram on the left: the input clone-probe matrix
• Diagram on the right: after the probes are
arranged, the clones are put in their correct
positions
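
To make the clone-probe matrix concrete, here is a small Python sketch that reorders the probe columns and checks whether every clone row then has its ones in one contiguous block (the consecutive ones property named on the next slide). The 3 x 4 matrix and the helper names are hypothetical and chosen only for illustration.

```python
def has_consecutive_ones(row):
    """True if all 1s in the row form a single contiguous block."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def reorder_columns(matrix, probe_order):
    """Return the matrix with its columns permuted into probe_order."""
    return [[row[j] for j in probe_order] for row in matrix]

# Hypothetical input: rows are clones, columns are probes 0..3.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order = [0, 2, 1, 3]  # a candidate probe arrangement
arranged = reorder_columns(matrix, order)
```

In the input order the first and third clones have a gap between their ones; under the arrangement `[0, 2, 1, 3]` every row becomes a single run of ones.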

17
Consecutive Ones with Errors
18
False Positives and False Negatives
19
A genetic algorithm for physical mapping
• A two-stage genetic algorithm
• First stage: generate the neighborhood
information among probes
• Second stage: generate the longest sequence of
connected probes

20
The first stage of GA (GA1)
• Purpose: find a probe ordering with the highest
fitness score for each clone.
• Pseudocode
  • Randomly generate a population of probe
  permutations
  • Evaluate the fitness of each individual in the
  population
  • Repeat
    • Select the best-ranking individuals to reproduce
    • Breed a new generation through crossover and
    mutation (genetic operations), producing
    offspring
    • Evaluate the fitness of each offspring
    • Replace the worst-ranked part of the population
    with the offspring
  • Until termination

21
The first stage of GA (GA1)
[Figure: clone-probe matrix with columns in the probe order 4 1 2 3 5 8 6 9 11 12 13 14 15 17 18, highlighting two building blocks that make partial consecutive ones]
22
Crossover Operation
P1:    2 3 6 8 1 9 10 12 13 5 11 14 15 17 18
P2:    9 10 11 12 13 14 8 18 17 6 5 3 2 1 15
Child: 2 3 6 8 1 9 10 11 12 13 14 18 17 5 15
Child built step by step:
2 3 6 8 1 9
2 3 6 8 1 9 10
2 3 6 8 1 9 10 11
2 3 6 8 1 9 10 11 12
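
The slide does not state the crossover rule, but its example is consistent with the following reading: copy a prefix of P1, then append the remaining probes in the order they appear in P2. The Python sketch below implements that reading and reproduces the child shown. The function name `prefix_crossover` and the cut position 6 are assumptions inferred from the example, not something the slide specifies.

```python
def prefix_crossover(p1, p2, cut):
    """Keep p1[:cut], then add the unused probes in the order they occur in p2."""
    head = list(p1[:cut])
    used = set(head)
    tail = [probe for probe in p2 if probe not in used]
    return head + tail

P1 = [2, 3, 6, 8, 1, 9, 10, 12, 13, 5, 11, 14, 15, 17, 18]
P2 = [9, 10, 11, 12, 13, 14, 8, 18, 17, 6, 5, 3, 2, 1, 15]
child = prefix_crossover(P1, P2, cut=6)
# child == [2, 3, 6, 8, 1, 9, 10, 11, 12, 13, 14, 18, 17, 5, 15]
```

Because each probe is taken exactly once, the child is always a valid permutation, which plain one-point crossover on permutations does not guarantee.
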
23
Mutations
Before: 2 3 6 8 1 9 10 12 13 5 11 14 15 17 18
After:  2 3 6 8 5 9 10 12 13 1 11 14 15 17 18
(probes 1 and 5 are swapped)
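
The example exchanges two probes in the ordering, which is usually called a swap mutation. A minimal Python sketch follows, using parent P1 from the crossover slide as input. The helper names `swap` and `swap_mutation` are illustrative, and choosing the two positions uniformly at random is an assumption of this sketch.

```python
import random

def swap(perm, i, j):
    """Return a copy of perm with positions i and j exchanged."""
    mutated = list(perm)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

def swap_mutation(perm, rng=random):
    """Swap two distinct, randomly chosen positions (assumed mutation rule)."""
    i, j = rng.sample(range(len(perm)), 2)
    return swap(perm, i, j)

before = [2, 3, 6, 8, 1, 9, 10, 12, 13, 5, 11, 14, 15, 17, 18]
after = swap(before, 4, 9)  # exchanges probes 1 and 5
```

A swap keeps the result a permutation of the same probes, so no repair step is needed after mutation.
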
24
Detection of False Negatives
[Figure: clone-probe matrix with columns in the probe order 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
25
The first stage of GA (GA1)
• Construct the probe neighboring information
according to the GA1 results

Probe ordering result for probe segment 1:
1 2 3 5 6 8 9 10 11 12 13 14 15 17 18
Probe ordering result for probe segment 2:
5 6 7 8 9 10 11 13 14 15 16 17 18 19 20
...
Probe ordering result for probe segment 20:
83 85 86 87 88 89 90 91 92 93 95 96 97 98 99

A neighboring probe list (probe: neighbors), one per ordering result:
5: 3, 6   6: 5, 8   8: 6, 9   ...   18: 17
5: 6   6: 5, 7   7: 6, 8   ...   20: 19

Probe neighboring information (lists combined):
5: 3, 6   6: 5, 7, 8   7: 6, 8, 9   ...   20: 19
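
One way to derive neighboring information from the GA1 orderings is to record, for every probe, the probes that sit next to it in any segment's ordering. The Python sketch below does this for the first two segment orderings from the slide. The function name `neighbor_info` and the rule of taking the union over segments are assumptions of this sketch, so its output need not match every entry on the slide.

```python
from collections import defaultdict

def neighbor_info(orderings):
    """Map each probe to the sorted list of probes adjacent to it in any ordering."""
    neighbors = defaultdict(set)
    for order in orderings:
        for left, right in zip(order, order[1:]):
            neighbors[left].add(right)
            neighbors[right].add(left)
    return {probe: sorted(adj) for probe, adj in neighbors.items()}

segment_1 = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18]
segment_2 = [5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20]
info = neighbor_info([segment_1, segment_2])
# info[5] == [3, 6] and info[6] == [5, 7, 8]
```
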
26
The second stage of GA (GA2)
• Purpose: find the longest connected probe
sequence according to the probe neighboring
information.
• Pseudocode
  • Randomly generate a population of probe
  permutations
  • Evaluate the fitness of each individual in the
  population
  • Repeat
    • Select the best-ranking individuals to reproduce
    • Breed a new generation through crossover and
    mutation (genetic operations), producing
    offspring
    • Evaluate the fitness of each offspring
    • Replace the worst-ranked part of the population
    with the offspring
  • Until termination

27
The second stage of GA (GA2)
• Generate a probe ordering according to the probe
neighboring information

Probe neighboring information (probe: neighbors):
1: 2   2: 1, 3   3: 2, 4, 5   4: 3, 5   5: 3, 4, 6   6: 5, 7, 8   7: 6, 8, 9   ...   99: 97, 98

Probe orderings:
2 3 5 4 ... 71 72 73 ... 55 56 57 ... 99 98 97 96
1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99
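
The slides do not give the GA2 fitness function. One plausible choice, given the stated purpose, is the length of the longest run of probes in an ordering where each adjacent pair are neighbors according to the neighboring information. The Python sketch below computes that quantity. The function name `longest_connected_run`, the scoring rule itself, and the short 7-probe orderings are assumptions made for illustration; only the neighbor lists for probes 1 to 7 come from the slide.

```python
def longest_connected_run(order, neighbors):
    """Length of the longest stretch of order in which adjacent probes are neighbors."""
    best = current = 1 if order else 0
    for prev, cur in zip(order, order[1:]):
        if cur in neighbors.get(prev, ()):
            current += 1      # the run continues
        else:
            current = 1       # the run is broken, start a new one at cur
        best = max(best, current)
    return best

# Neighbor lists for probes 1..7 as listed on the slide.
neighbors = {
    1: [2], 2: [1, 3], 3: [2, 4, 5], 4: [3, 5],
    5: [3, 4, 6], 6: [5, 7, 8], 7: [6, 8, 9],
}
good = longest_connected_run([1, 2, 3, 4, 5, 6, 7], neighbors)
mixed = longest_connected_run([2, 3, 5, 4, 7, 6, 1], neighbors)
```

Here `good` is 7 because every adjacent pair is connected, while `mixed` is 4 because the run 2 3 5 4 is broken between probes 4 and 7.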