Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Slides: 43
Provided by: robvoge
Transcript and Presenter's Notes


1
Sequence Optimization For Synthetic Genes Using Genetic Algorithms
  • David Sigfredo Angulo (1)
  • Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2)
  • (1) School of Computer Science, Telecommunications and Information Systems, DePaul University
  • (2) Department of Biochemistry and Molecular Biology, The University of Chicago

2
Introduction
  • Genetic Algorithms
  • Using ideas based on the biology of genes
  • Create software that uses such stochastic means to
    search through large search spaces
  • Resulting algorithm has nothing to do with genes
  • Designing Genes
  • This search space is huge
  • REALLY NOVEL IDEA
  • Use Genetic Algorithms based on genes to design
    genes!!

3
Outline
  • Short biology Tutorial
  • DNA Sequence Generation
  • Why is the problem difficult?
  • IBG Gene Designer
  • Genetic Algorithm (GA) solution
  • Heuristics and Fitness Evaluation

4
First
  • Before the problem can be described
  • Must give some background biochemistry principles
  • Tutorial outline
  • DNA
  • Codons
  • Protein
  • Synthetic genes
  • What are they and what are they used for?
  • Restriction Enzymes
  • Expressing Proteins using Vectors

5
Transcription/Translation
  • DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein

6
DNA
  • Deoxyribonucleic acid
  • Strand backbone is made of sugar phosphate
    molecules
  • Strands connected by nitrogen containing
    nucleotide bases
  • Two strands join making a double helix
  • Each strand is made of nucleotides joined together

7
  • Short region of DNA double helix (2 nm)
  • "Beads on a string" form of chromatin (11 nm)
  • 30 nm chromatin fiber of packed nucleosomes (30 nm)
  • Section of chromosome in an extended form (300 nm)
  • Condensed section of chromosome (700 nm)
  • Entire mitotic chromosome (1100 nm)

8
DNA
Four Nucleotides: A, G, T, C
9
DNA Base Pairing
10
Short Biology Tutorial
  • Tutorial outline
  • DNA
  • Codons
  • Protein
  • Restriction Enzymes
  • Expressing Proteins using Vectors

11
DNA Sequence Generation: Codon to Amino Acid Translation
  • http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg
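
The codon table referenced on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code written in DNA letters; the names `CODON_TABLE` and `translate` are hypothetical and not part of IBG GeneDesigner.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon (*)."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# ATG GAA TTC TGA reads as Met, Glu, Phe, stop.
print(translate("ATGGAATTCTGA"))  # MEF
```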

12
Short Biology Tutorial
  • Tutorial outline
  • DNA
  • Codons
  • Protein
  • Restriction Enzymes
  • Expressing Proteins using Vectors

13
Proteins: AA Chains
14
Proteins
  • Amino acid chains fold into complex 3D structures
  • Functional properties depend on 3D structure
  • Usefulness depends on functional properties
  • E.g. designing drugs

15
Designed/Expressed Proteins Extremely Useful
  • Designed Proteins
  • Can be used to study protein structure
  • Can be used to study effects of other proteins
  • Can be designed to knock out other proteins
  • Can be designed to block the action of other
    proteins
  • Expressed proteins
  • Expressed in cow's milk or chicken eggs
  • Can manufacture drugs on large scales in this way
  • E.g. insulin

16
Synthetic Genes
  • DNA sequences
  • backtranslated from a novel Protein or Amino
    Acid sequence

  • DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein
  • We'll put the DNA for our designed protein into
    an organism (a vector)
  • Then that vector will make (express) our protein
  • But, how do we get the DNA into an organism???

17
Short Biology Tutorial
  • Tutorial outline
  • DNA
  • Codons
  • Protein
  • Restriction Enzymes
  • Expressing Proteins using Vectors

18
Restriction Enzyme Digests
  • Watson and Crick, 1953
  • Took 20 years to be able to do anything with DNA
  • H. Smith (and others) made a discovery that
    allowed manipulation and deciphering of DNA
  • Discovery was that bacteria produced enzymes that
    introduce breaks in double stranded DNA molecules
    whenever they encountered a specific string of
    nucleotides
  • These enzymes are called Restriction Enzymes
  • Restriction Enzymes can be used as precise
    scissors
  • They let biologists cut (and paste) portions of
    DNA

19
EcoRI
  • EcoRI was one of the first Restriction Enzymes
    discovered
  • "Eco" because it was isolated from E. coli
    (Escherichia coli)
  • "R" because it came from E. coli strain RY13
  • "I" because it was the first Restriction Enzyme
    identified in that strain
  • Now over 300 Restriction Enzymes known
  • EcoRI cleaves (restricts, digests) DNA
  • Between the G and A nucleotides
  • Only when it encounters them in the string
    5'-GAATTC-3'
  • This is called the restriction site
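
The cutting rule above can be sketched as a string search. This is a minimal illustration on the top strand only; the function names `find_sites` and `ecori_digest` are hypothetical, and a real digest acts on both strands.

```python
def find_sites(dna, site="GAATTC"):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

def ecori_digest(dna):
    """Split the top strand between the G and the first A of each GAATTC."""
    fragments, previous = [], 0
    for pos in find_sites(dna):
        cut = pos + 1  # EcoRI cuts right after the G
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments

seq = "ATGGAATTCCTGAAAGAATTCTAA"
print(find_sites(seq))    # [3, 15]
print(ecori_digest(seq))  # ['ATGG', 'AATTCCTGAAAG', 'AATTCTAA']
```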

20
Sticky Ends
  • Many restriction enzymes cut in such a way that some
    single stranded DNA is left at both ends
  • These nucleotide sequences
  • Are complementary to each other
  • Are 5'-AATT-3' in the case of EcoRI
  • Can base pair with other nucleotides in a
    sequence
  • Thus, are called "sticky ends"
  • Can temporarily hold two DNA strands together
  • The enzyme ligase will permanently join those
    strands
  • This is called ligation
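
Why the two EcoRI overhangs stick to each other can be checked with a reverse complement. This is a small sketch; `reverse_complement` and `can_anneal` are hypothetical helper names, and the check ignores everything except exact Watson-Crick pairing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def can_anneal(overhang_a, overhang_b):
    """Two 5' overhangs pair if one is the reverse complement of the other."""
    return reverse_complement(overhang_a) == overhang_b

# The EcoRI site is palindromic: both strands read GAATTC 5' to 3'.
print(reverse_complement("GAATTC"))  # GAATTC
# So both cut ends carry the same AATT overhang, which pairs with itself.
print(can_anneal("AATT", "AATT"))    # True
print(can_anneal("AATT", "GATC"))    # False
```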

21
Short Biology Tutorial
  • Tutorial outline
  • DNA
  • Codons
  • Protein
  • Restriction Enzymes
  • Expressing Proteins using Vectors

22
Gene Synthesis: On the Lab Bench
  • Initial Sequence Construction
  • Oligonucleotides (short strands of DNA) are
    defined with complementary overlapping sites
  • The sticky ends
  • Assembly PCR
  • Oligonucleotides and polymerase are mixed and
    placed in a thermocycler
  • Creates contiguous DNA sequence from component
    oligos

23
Gene Synthesis: On the Lab Bench (cont.)
  • After PCR, generated DNA sequence cut with
    restriction enzymes
  • Expression host's plasmid cut with restriction
    enzymes
  • Synthetic gene inserted into plasmid and plasmid
    repaired
  • Expression Vectors
  • Host organisms used to express the synthetic
    genes (make the protein)
  • Typically E. coli
  • Possibly Chickens or Cows
  • Expression vector can now express protein coded
    for by synthetic gene
  • A bit more complicated than described above!!!

24
DNA Sequence Generation: Gene Insertion
25
Outline
  • Short biology Tutorial
  • DNA Sequence Generation
  • Why is the problem difficult?
  • IBG Gene Designer
  • Genetic Algorithm (GA) solution
  • Heuristics and Fitness Evaluation

26
DNA Sequence Generation: The Computational Problem
  • Why is the problem difficult?
  • Conflicting goals
  • Avoid restriction sites
  • Maximize codon preference
  • Thus, cannot use deterministic algorithm
  • Degeneracy (redundancy) of the DNA code: 64
    codons, 20 (21) amino acids (see next slide)
  • Several synonymous codons are translated into the
    same amino acid
  • Synonymous codons per AA vary from one to six
    (61 sense codons for 20 AAs, so about three per AA on average)
  • Huge number of possible DNA Sequences
  • On average about 2^N for a protein of N amino acids
  • Codon Preference
  • Varying levels of tRNA assembly components in
    organisms
  • Codon usage for a particular AA greatly influences
    protein expression
  • (continued)
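
The size of this search space can be counted exactly for any given protein. The sketch below assumes the standard genetic code; `DEGENERACY` and `count_encodings` are hypothetical names used only for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons for each amino acid ("*" marks stop codons).
DEGENERACY = Counter(CODON_TABLE.values())

def count_encodings(protein):
    """Number of distinct DNA sequences that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)

print(DEGENERACY["M"], DEGENERACY["L"])  # 1 6
print(count_encodings("LSR"))            # 216 (6 * 6 * 6)
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive enumeration is impractical for real genes.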

27
DNA Sequence Generation: Codon to Amino Acid Translation
  • http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg

28
DNA Sequence Generation: The Computational Problem (cont.)
  • Why is the problem difficult?
  • (continued)
  • Restriction Enzymes
  • The vector will contain many restriction enzymes
  • If these cut up our DNA, we won't express our
    proteins
  • We must design the DNA string using synonymous
    codons so that there are no restriction sites
  • Helpful to include some other restriction sites
  • We must design the DNA string using synonymous
    codons so that these are included
  • (continued)
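
Removing an unwanted restriction site with a synonymous codon can be sketched as a greedy search. This is an illustrative stand-in, not the GeneDesigner method (which uses a genetic algorithm); `count_sites` and `remove_sites` are hypothetical names, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def count_sites(dna, site):
    """Count occurrences of `site`, including overlapping ones."""
    return sum(dna.startswith(site, i) for i in range(len(dna)))

def remove_sites(dna, site="GAATTC"):
    """Greedily swap single codons for synonyms while that lowers the site count."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    improved = True
    while improved and count_sites("".join(codons), site) > 0:
        improved = False
        current = count_sites("".join(codons), site)
        for index, codon in enumerate(codons):
            for alt in SYNONYMS[CODON_TABLE[codon]]:
                trial = codons[:index] + [alt] + codons[index + 1:]
                if count_sites("".join(trial), site) < current:
                    codons, improved = trial, True
                    break
            if improved:
                break
    return "".join(codons)

# ATG GAA TTC TGG (M E F W) contains GAATTC; GAA -> GAG still encodes E.
print(remove_sites("ATGGAATTCTGG"))  # ATGGAGTTCTGG
```

A greedy pass like this can get stuck or trade away codon preference, which is one reason a stochastic search over whole sequences is attractive.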

29
DNA Sequence Generation: The Computational Problem (cont.)
  • Why is the problem difficult?
  • (continued)
  • mRNA Secondary Structure
  • In prokaryotes, mRNA can fold into complex shapes
  • This inhibits protein creation
  • Oligonucleotide generation
  • Want a specific melting temperature so that the
    complex folding doesn't take place
  • The sticky ends must have the same melting
    temperature so that they will bind together.
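
A rough way to compare melting temperatures of overlap regions is sketched below. The slides do not say which melting-temperature model GeneDesigner uses; this sketch assumes the simple Wallace rule for short oligonucleotides, Tm ≈ 2 °C per A or T plus 4 °C per G or C. The names `wallace_tm` and `tm_spread` are hypothetical.

```python
def wallace_tm(oligo):
    """Wallace-rule estimate in degrees C: 2 per A/T plus 4 per G/C."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def tm_spread(overlaps):
    """Difference between the highest and lowest estimated Tm."""
    temps = [wallace_tm(o) for o in overlaps]
    return max(temps) - min(temps)

print(wallace_tm("ATGC"))           # 12
print(tm_spread(["AAAA", "GGGG"]))  # 8
```

A design tool could treat a small `tm_spread` across all overlap regions as a goal, so that the oligonucleotides anneal under the same thermocycler conditions.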

30
Outline
  • Short biology Tutorial
  • DNA Sequence Generation
  • Why is the problem difficult?
  • IBG Gene Designer
  • Genetic Algorithm (GA) solution
  • Heuristics and Fitness Evaluation

31
IBG GeneDesigner: Our Solution
  • IBG GeneDesigner

32
IBG GeneDesigner: Genetic Algorithm
  • Uses a Genetic Algorithm for sequence
    optimization
  • Tournament selection model
  • Uniform and single-point crossover (chosen behind the
    scenes; not user selectable at present)
  • Mutation causes codon wobbling
  • Sequence fitness determined by heuristic
    evaluation
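
The ingredients listed above can be combined into a small genetic algorithm. This is a sketch under stated assumptions, not the GeneDesigner implementation: individuals are lists of codons, "codon wobbling" is read as swapping a codon for a synonymous one, the `fitness` function is a placeholder that only penalizes EcoRI sites, and all names and parameter values are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)

def fitness(codons, forbidden="GAATTC"):
    """Placeholder score (an assumption): minus the number of forbidden sites."""
    dna = "".join(codons)
    return -sum(dna.startswith(forbidden, i) for i in range(len(dna)))

def tournament(population, rng, size=3):
    """Pick `size` random individuals and return the fittest."""
    return max(rng.sample(population, size), key=fitness)

def single_point_crossover(a, b, rng):
    point = rng.randrange(1, len(a))  # needs at least two codons
    return a[:point] + b[point:]

def uniform_crossover(a, b, rng):
    return [rng.choice(pair) for pair in zip(a, b)]

def mutate(codons, rng, rate=0.05):
    """Swap each codon for a random synonym with probability `rate`."""
    return [
        rng.choice(SYNONYMS[CODON_TABLE[c]]) if rng.random() < rate else c
        for c in codons
    ]

def evolve(protein, generations=50, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(SYNONYMS[aa]) for aa in protein]
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = tournament(population, rng), tournament(population, rng)
            cross = rng.choice([single_point_crossover, uniform_crossover])
            offspring.append(mutate(cross(a, b, rng), rng))
        population = offspring
    return "".join(max(population, key=fitness))

print(evolve("MEFEFW"))
```

Because crossover and mutation both operate on whole codons, every individual in every generation still encodes the input protein; only the choice among synonymous codons changes.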

33
IBG GeneDesigner: Fitness Evaluation
  • GeneDesigner heuristics
  • Manipulation of nucleotide percentages/ratios to
    reduce mRNA secondary structure formation
  • Inclusion and Exclusion of restriction sites
  • Restriction sites requested for inclusion should
    only occur once
  • Matching of codon preference
  • Oligonucleotide generation
  • Fitness determined by melting points, start and
    end nucleotide
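
One way to fold several of these heuristics into a single score is a weighted sum. The sketch below is an assumption about how such a score might look, not the weighting GeneDesigner uses: the function name, the weights, the G/C target, and the `preferred` codon set are all hypothetical, and the oligonucleotide terms (melting points, start and end nucleotide) are left out.

```python
def count_sites(dna, site):
    """Count occurrences of `site`, including overlapping ones."""
    return sum(dna.startswith(site, i) for i in range(len(dna)))

def gc_fraction(dna):
    return (dna.count("G") + dna.count("C")) / len(dna)

def fitness(dna, exclude, include, preferred,
            gc_target=0.5, weights=(1.0, 1.0, 1.0, 1.0)):
    """Higher is better. `preferred` is a caller-supplied set of favored codons."""
    w_gc, w_exclude, w_include, w_pref = weights
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    gc_penalty = abs(gc_fraction(dna) - gc_target)
    # Every occurrence of an excluded site costs one point.
    exclude_penalty = sum(count_sites(dna, s) for s in exclude)
    # A requested site should occur exactly once; zero or two or more is penalized.
    include_penalty = sum(abs(count_sites(dna, s) - 1) for s in include)
    preference = sum(c in preferred for c in codons) / len(codons)
    return (w_pref * preference - w_gc * gc_penalty
            - w_exclude * exclude_penalty - w_include * include_penalty)

with_site = "ATGGAATTCTGG"     # contains one EcoRI site
without_site = "ATGGAGTTCTGG"  # same protein, site removed
print(fitness(without_site, ["GAATTC"], [], set())
      > fitness(with_site, ["GAATTC"], [], set()))  # True
```

The same two sequences rank the other way round when `GAATTC` is passed in `include` instead of `exclude`, which illustrates how the inclusion and exclusion goals pull against each other.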

34
IBG GeneDesigner: Future Work
  • Algorithm parameters
  • Systematically manipulate GA parameters to
    identify default values for sequence optimization
  • Population size
  • Number of generations
  • Mutation rate
  • Convergence criteria
  • Modify heuristic weighting scheme
  • Selection models
  • Experiment with alternative selection models
    (roulette wheel, elitism, limited population
    replacement)
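
Two of the alternative selection models named above can be sketched briefly. This is a generic illustration, not planned GeneDesigner code; the function names are hypothetical, and fitness values are shifted so that the weights passed to `random.choices` are never negative.

```python
import random

def roulette_select(population, fitnesses, rng, k=1):
    """Pick k individuals with probability proportional to shifted fitness."""
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]  # keep every weight positive
    return rng.choices(population, weights=weights, k=k)

def with_elitism(population, fitnesses, offspring, n_elite=1):
    """Carry the n_elite best parents over unchanged, fill the rest with offspring."""
    ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
    elite = [individual for _, individual in ranked[:n_elite]]
    return elite + offspring[:len(population) - n_elite]

rng = random.Random(0)
picks = roulette_select(["weak", "strong"], [0.0, 10.0], rng, k=100)
print(picks.count("strong") > picks.count("weak"))  # True
print(with_elitism(["x", "y", "z"], [1, 3, 2], ["p", "q", "r"]))  # ['y', 'p', 'q']
```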

35
IBG GeneDesigner: Future Work (cont.)
  • Move algorithm to ECJ architecture
  • Use the Strength-Pareto multi-objective
    optimization algorithm
  • Create web-based version of application
  • Explore island model effects on optimization
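
Multi-objective methods such as Strength Pareto rest on the idea of Pareto dominance, which can be sketched on its own. The code below shows only the dominance test and the non-dominated front, not the Strength Pareto algorithm itself; the names are hypothetical and every objective is assumed to be maximized.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores):
    """Keep the score tuples that no other tuple dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

# Each tuple could be, for example, (codon preference score, site score).
scores = [(1, 5), (3, 3), (2, 2), (5, 1)]
print(pareto_front(scores))  # [(1, 5), (3, 3), (5, 1)]
```

Keeping a front of trade-offs avoids having to fix heuristic weights in advance, which is the appeal of a multi-objective approach for conflicting goals like those described earlier.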

36
Results
  • IBG GeneDesigner was used to generate a
    nucleotide sequence for the SH3 domain of
    α-spectrin [1].
  • The codon optimization option was set for
    expression in E. coli with a 40% G/C bias
  • We also used the application to generate four
    assembly PCR template oligonucleotide sequences
    to produce the protein coding sequence flanked by
    desired restriction enzyme recognition sites.
  • The calculated Tm values of the three overlapping
    regions were within 1.6 °C of each other,
    promoting similar annealing behavior between
    strands.
  • Success of the reaction was confirmed by DNA
    sequencing of a pUC19 expression vector
    containing the PCR product cloned between
    restriction sites included in the gene design.
  • Summary: Protein Made!!!

37
Input: Protein Sequence, Vector, Restriction Enzymes
38
Input: Flanking Sequences
39
Input: Algorithm Parameters and Fitness Scores
40
Output: Generation of Oligonucleotides
42
Acknowledgements
  • Graduate student who did much of the coding
  • Rob Vogelbacher
  • University of Chicago undergraduate who used it
    to build a protein
  • Benjamin R. Capraro
  • His advisor
  • Tobin Sosnick
  • Our collaborator at the University of Chicago
  • Shohei Koide