Oligonucleotide Design: Principles and Application in Osprey - Dr. Christoph Sensen

Transcript and Presenter's Notes

1
Oligonucleotide Design: Principles and
Application in Osprey
Dr. Christoph Sensen
2
Overview
  • Review of the Biochemistry
  • Review of existing techniques and component tools
  • How we've improved calculations in Osprey
  • How to access Osprey
  • Goals
  • See existing analysis tools in a larger context
  • Realize the advantages of computationally
    expensive dynamic methods

3
Oligo Usage
  • Osprey focuses on applications that require the
    design of large numbers of oligos, i.e. those
    that benefit from automation.
  • Microarrays
  • Large clone and directed genome sequencing
  • But it can equally be used for a few sequences

4
Thermodynamic Parameters
  • We want
  • Binding to target sequence that has the right
    duplex melting temperature for the experiment
  • Distinguish between similar genes
  • We want to avoid
  • Oligos that fold back on themselves (hairpins)
  • Oligos that bind to other copies of themselves
    (dimers)
  • Oligos that bind well to more than one potential
    location in the DNA sample (secondary or
    off-target binding)

5
Melting Temperatures
  • Oligo duplexes are not two state, therefore the
    melting temperature (Tm) is considered the point
    at which half the duplexes are denatured.
  • Simple models include
  • Wallace: Tm = 2(A+T) + 4(G+C) (in 1 M NaCl)
  • Tm = 100.5 + 41(G+C)/(A+T+G+C) - 820/(A+T+G+C)
    + 16.6 log10([Na+])
  • There are many models for determining melting
    temperature, but they share a common theme...
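The two simple models above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the signs in the second formula assume the commonly quoted form Tm = 100.5 + 41(G+C)/N - 820/N + 16.6 log10([Na+]), where N is the oligo length.

```python
import math


def wallace_tm(seq: str) -> float:
    """Wallace rule: 2 degrees per A/T plus 4 degrees per G/C (1 M NaCl)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def gc_content_tm(seq: str, na_molar: float = 1.0) -> float:
    """GC-content model with a salt term; na_molar is [Na+] in mol/L."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    n = sum(seq.count(base) for base in "ACGT")
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.5 + 41.0 * gc / n - 820.0 / n + 16.6 * math.log10(na_molar)
```

Both functions only count bases, which is why they ignore sequence order. That is the limitation the nearest-neighbour models address.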

6
Oligonucleotide Length
  • Melting temperature increases with oligo duplex
    length, and depends on the base composition of
    the sequence.
  • These two factors create a sweet spot for
    finding oligos of the appropriate melting
    temperature, based on oligo length and base
    composition probabilities in the input sequence
  • e.g. In Desulfovibrio vulgaris (65% GC) the
    average 70mer melts at about 100 °C in 1 M NaCl
  • In Sulfolobus solfataricus (37% GC) it's about
    65 °C

7
Melting Temperature: Complex
  • More thorough models are based on physics:
    melting temperature is determined by the enthalpy
    and the entropy of the oligo duplex
  • Tm = ΔH°/(ΔS° + R ln(C/4))
  • Where
  • ΔH° = enthalpy (order) for the whole duplex
  • ΔS° = entropy (disorder) for the whole duplex
  • R = molar gas constant, 1.987 cal/(K·mol)
  • C = concentration of DNA
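As a worked illustration of the thermodynamic formula, Tm = ΔH°/(ΔS° + R ln(C/4)), here is a minimal Python sketch. It assumes ΔH° is given in cal/mol, ΔS° in cal/(K·mol), C in mol/L, and that the formula yields Kelvin. The function name is hypothetical.

```python
import math

R_CAL = 1.987  # molar gas constant in cal/(K*mol), as on the slide


def duplex_tm_celsius(delta_h_cal: float, delta_s_cal: float,
                      conc_molar: float) -> float:
    """Tm = dH / (dS + R ln(C/4)), converted from Kelvin to Celsius."""
    tm_kelvin = delta_h_cal / (delta_s_cal + R_CAL * math.log(conc_molar / 4.0))
    return tm_kelvin - 273.15
```

Because ln(C/4) is negative at typical oligo concentrations, lowering the DNA concentration lowers the predicted Tm for a duplex with negative ΔH° and ΔS°.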

8
DNA Duplex Energy
  • There are several models to determine enthalpy
    and entropy, all based on the accepted Nearest
    Neighbour (NN) concept: the overall energy of the
    duplex can be predicted by summing the
    interaction of adjacent basepairs. e.g.
  • Enthalpy (H) of GCCCTA =
  • H(GC) + 2H(CC) + H(CT) + H(TA)
  • Neighbour energies are experimentally derived
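The nearest-neighbour sum can be sketched as follows. The neighbour table is supplied by the caller, since no experimental values are given here. Function names are hypothetical, and this sketch leaves out the initiation and terminal corrections that published NN models add.

```python
from collections import Counter


def nearest_neighbours(seq: str) -> Counter:
    """Count each overlapping dinucleotide (neighbour pair) in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def duplex_enthalpy(seq: str, nn_table: dict) -> float:
    """Sum the caller-supplied neighbour energies over all adjacent pairs."""
    return sum(nn_table[pair] * count
               for pair, count in nearest_neighbours(seq).items())
```

For GCCCTA, `nearest_neighbours` yields GC once, CC twice, CT once and TA once, matching the sum on the slide.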

9
NN Techniques
  • Models derived from empirical data include
    Gotoh, Vologodskii, Breslauer, Benight, Sugimoto
  • SantaLucia created a unified model from the
    combination of these datasets and his own data,
    generally considered the best model yet.

10
Popular Oligo Tools
  • Primer3 (Uses Breslauer model, good for PCR
    primers, interactive usage)
  • HyTher (SantaLucia's own system, accurate,
    interactive usage)
  • No product has dominated the microarray design
    market yet, most chips are designed using
    commercial software
  • Free software includes OligoArray, OligoChecker,
    PrimeGens

11
Why develop new tools?
  • Many programs require intimate knowledge of
    parameters or cast a wide net to find oligos
  • For large scale projects such as microarrays, the
    accuracy of the oligos can greatly affect their
    usability (e.g. ensuring even melting
    temperatures)
  • Many tools can't deal with large secondary
    binding data (i.e., the known transcriptome) well.

12
Osprey Goals
  • Require minimal user parameterization and data
    preprocessing
  • Deal nicely with large datasets
  • Be as accurate as possible
  • Be fast

13
General Procedure for Design (Sequencing or
Microarray)
  • Read in sequence
  • Filter repetitive elements
  • Determine optimal melting temperature/ oligo
    length
  • Find candidate duplexes
  • Check for undesirable oligo configurations
  • Check for undesirable binding to the potential
    sample

14
Primers for sequencing reactions
  • Target site T1: primer binding range is
    constrained 5'-ward by oligo length (L) and
    unreadable bases (U) after priming site, then
    3'-ward by the minimum bases post-T1 we wish to
    elucidate
  • The expected read length (R) gives us a range
    within those limits, the top rectangle being the
    most 5' option, the bottom rectangle being the
    most 3' (desirable) option

15
Minimizing sequencing primers
  • If multiple sites need to be elucidated, and they
    are close together, we may be able to maximize
    coverage while minimizing reactions by biasing
    towards 3' priming locations (e.g. cover both T1
    and T2 in the previous diagram).

BAC clone pre and post application of polishing
primers
16
PCR product microarrays
  • Using available clones, or using ESTs/genome,
    design PCR primers to generate amplicons from DNA
  • Primers should not be more than about 800-1000 bp
    apart due to limits in PCR product length
  • For eukaryotes, probes work better if at the 3' end
    of the coding sequence, since reverse transcription
    is usually primed off the poly(A) tail of mRNA in
    the experiments
  • For prokaryotes, since random hexamers are used
    for reverse transcription, the 5' area of the
    sequence is slightly more likely to be reverse
    transcribed.

[Figure: coding sequence drawn 5' to 3' with poly(A) tail; Primer1 and Primer2 flank a PCR product < 1000 bp; RT cDNA shown for eukaryotic and prokaryotic priming]
17
Oligo Microarray Search Techniques
  • Microarrays of 70mer oligos complementary to cDNA
    sequences, all should have similar melting temp.,
    etc.
  • Run the program iteratively, starting with strict
    constraints, and automatically relax them as
    needed, optimizing oligo similarity of length and
    melting temperature
  • Why reinvent the wheel? There are techniques and
    tools that can be used and adapted to satisfy the
    oligo selection criteria
  • These include heuristic methods such as BLAST
    for filtering, and dynamic methods where utmost
    accuracy is required
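The iterative relaxation idea can be sketched as a loop that widens the allowed deviation from the target melting temperature and length until some candidate qualifies. This is not Osprey's actual procedure: the function name, the tuple layout `(name, tm, length)` and the step sizes are assumptions made for illustration.

```python
def select_with_relaxation(candidates, target_tm, target_len,
                           tm_step=1.0, len_step=1, max_rounds=10):
    """Return (best_candidate, rounds_of_relaxation), or (None, max_rounds).

    candidates: iterable of (name, tm, length) tuples.
    Round 0 is the strictest (exact match); each later round widens the
    tolerances by one step.
    """
    candidates = list(candidates)
    for round_no in range(max_rounds + 1):
        tm_tol = round_no * tm_step
        len_tol = round_no * len_step
        hits = [c for c in candidates
                if abs(c[1] - target_tm) <= tm_tol
                and abs(c[2] - target_len) <= len_tol]
        if hits:
            # Prefer the candidate closest to the target Tm, then length.
            best = min(hits, key=lambda c: (abs(c[1] - target_tm),
                                           abs(c[2] - target_len)))
            return best, round_no
    return None, max_rounds
```

Returning the round number alongside the chosen oligo makes it easy to see how far the constraints had to be relaxed for each gene.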

18
Repetitive Analysis Tools
  • MEGABLAST compares two large DNA sequences
    against each other. If the query and the subject
    are the same, you find repeats!
  • We use this in Osprey, iteratively, to filter out
    repeats, leaving one copy of each repeat.
  • The key parameters to tweak in MEGABLAST are word
    length (min. exact match), % ID cutoff, and filter
    disabling (will ignore repeats instead of marking
    them)
  • Also useful for cross-genome comparisons

19
DNA Folding
  • The most commonly used tool for checking the folding
    conformation of nucleic acids is the DNA-FOLD
    family of programs from Zuker. We use this to
    check hairpins.
  • Rule of thumb (in kcal/mol) 10(GC-35)/5(oligo
    length)/20
  • A member of the package useful for large scale
    analysis (many small seqs) is quikfold [sic],
    which produces only thermodynamic statistics
    rather than the pretty fold images available from
    the other programs

20
DNA Folding (cont'd)
Copy 1: 5' CGAACGTTGAACGTT 3'
Copy 2: 3' TTGCAAGTTGCAAGC 5'
  • Sequences that are largely self-complementary may
    form dimers, and should be avoided with the same
    guidelines as hairpins.
  • Both the HyTher and Mfold Web servers can be used
    to calculate dimer energy.
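A rough way to screen for self-dimers before running an energy calculation is to slide one copy of the oligo along an antiparallel second copy and look for the longest run of Watson-Crick pairs. The sketch below does only that. It does not compute dimer energy, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_self_dimer_run(seq: str) -> int:
    """Longest contiguous run of Watson-Crick pairs between two copies.

    Copy 1 is read 5'->3'; copy 2 is the same sequence written 3'->5'
    (i.e. reversed), shifted by every possible offset.
    """
    seq = seq.upper()
    rev = seq[::-1]
    n = len(seq)
    best = 0
    for offset in range(-(n - 1), n):
        run = 0
        for i in range(n):
            j = i - offset
            if 0 <= j < n and COMPLEMENT.get(seq[i]) == rev[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

The example oligo CGAACGTTGAACGTT contains the palindromic stretch AACGTT twice, so it scores a run of at least six paired bases.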

21
Secondary Binding
  • The dataset to check against for a microarray
    would be all the ORFs (prokaryotes) or all of
    the known cDNA (eukaryotes)
  • For sequencing it would be the vector and any
    known sequence from the clone
  • Most software uses a % ID cutoff, but the location
    of the mismatches can greatly change the effect
    on melting temperature!

22
Accurate Secondary Binding Checks
  • Free energy (G) is derived from the same enthalpy
    and entropy values as temperature, so why not use
    a dynamic method to optimize it in sequence
    searches? ΔG° = ΔH° - TΔS°
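The free energy relation, ΔG° = ΔH° - TΔS°, takes one line of Python. The sketch assumes ΔH° in cal/mol, ΔS° in cal/(K·mol), and a temperature supplied in degrees Celsius. The function name is hypothetical.

```python
def delta_g(delta_h_cal: float, delta_s_cal: float,
            temp_celsius: float) -> float:
    """Free energy in cal/mol: dG = dH - T*dS, with T converted to Kelvin."""
    return delta_h_cal - (temp_celsius + 273.15) * delta_s_cal
```

ΔG° reaches zero at T = ΔH°/ΔS°, which shows how closely free energy and melting temperature are tied to the same two quantities.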

[Figure: free energy (G) vs. Tm for (left to right) random 50-, 25-, 20-, 15-, and 10-mers]
23
Profile Alignments
  • Hidden Markov Models such as found in the Pfam
    database, and Profiles such as found in ProSite
    are types of Position Specific Scoring Matrices
    (PSSMs).
  • At each model position, a score is given to each
    possible match/mismatch. An A-G mismatch in
    different positions may score differently,
    usually based on frequencies in MSAs. Below is
    the start of a Pfam protein model match

24
Profile Alignments
  • We can also use the PSSM to encode nearest
    neighbour free energy (G) thermodynamics
  • Note that the first A matches with a score of
    580, the second with a score of 1000. This is a
    position-specific difference, because they have
    different neighbours (TA vs. AA)
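One way to picture this encoding: the match score at each oligo position is looked up from the dinucleotide that the base forms with its 5' neighbour, so the same base can score differently at different positions. The sketch below is only an illustration. How Osprey actually converts free energies into profile scores is not given here, the function name is hypothetical, and the score table must be supplied by the caller.

```python
def position_match_scores(oligo: str, nn_score: dict) -> list:
    """Return [(base, match_score), ...] for each position of the oligo.

    The first base has no 5' neighbour, so it gets a score of 0 in this
    sketch; every other base is scored by its (previous base, base) pair.
    """
    oligo = oligo.upper()
    scores = []
    for i, base in enumerate(oligo):
        if i == 0:
            scores.append((base, 0))
        else:
            scores.append((base, nn_score[oligo[i - 1:i + 1]]))
    return scores
```

With a table containing the two values quoted on the slide (TA: 580, AA: 1000), the oligo fragment TAA scores its first A at 580 and its second A at 1000.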

25
Caveats
  • Profiles, like Smith-Waterman alignments, use
    dynamic methods, therefore the optimal solution
    will always be found
  • Using BLAST to find secondary binding, you may
    miss matches with good thermodynamics, because of
    the shortcuts used to make BLAST fast.
  • At the very least use a dynamic method to check
    oligo secondary matches. Profiles will
    additionally tell you not only the % ID, but also
    the free energy
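To show what a dynamic method does differently from a heuristic one, here is a minimal Smith-Waterman local alignment scorer. The match, mismatch and gap values are arbitrary illustrative choices, not thermodynamic parameters. For a binding check the caller would pass the appropriate strand (e.g. the reverse complement of the oligo).

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Every cell considers all three moves, so no match is skipped.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The full table is filled in for every pair of positions, which is what guarantees the optimum and also what makes the method expensive on large datasets.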

26
Results Interpretation
  • Given a free energy cutoff, we can check all
    secondary results for their melting temperature,
    and ensure it is outside the margin desired by
    the user (e.g. more than 10 degrees below the
    melting temperature of the target duplex)
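The final check can be sketched as a simple filter: an oligo is kept only if every secondary hit melts more than the chosen margin below the target duplex. The function name and the default margin of 10 degrees (taken from the example above) are assumptions.

```python
def passes_secondary_check(target_tm: float, secondary_tms,
                           margin: float = 10.0) -> bool:
    """True if all secondary Tm values are more than `margin` below target."""
    return all(tm < target_tm - margin for tm in secondary_tms)
```

An oligo with no secondary hits passes trivially, since there is nothing to violate the margin.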

27
Hardware Acceleration
  • Dynamic methods such as PSSMs are expensive, but
    we have hardware accelerators.
  • Decypher runs these searches at least 50 times
    faster than software, so we can use dynamic
    methods on the Sulfolobus genome and design a
    chip of 3456 optimized oligos in 1.5 hours

28
Web Interface
  • A Web interface using the hardware acceleration
    for the design of microarrays is available at
  • http://osprey.ucalgary.ca/
  • Your assignment
  • Try the Osprey interface for microarray and PCR
    primer design