Oligonucleotide Design: Principles and Application in Osprey, Dr. Christoph Sensen


Slides: 29

Provided by: umani


Transcript and Presenter's Notes

1
Oligonucleotide Design: Principles and
Application in Osprey
Dr. Christoph Sensen
2
Overview

Review of the Biochemistry
Review of existing techniques and component tools
How we've improved calculations in Osprey
How to access Osprey
Goals
See existing analysis tools in a larger context
Realize the advantages of computationally
expensive dynamic methods

3
Oligo Usage

Osprey focuses on applications that require the
design of large numbers of oligos, i.e. those
that benefit from automation.
Microarrays
Large clone and directed genome sequencing
But it can equally be used for a few sequences

4
Thermodynamic Parameters

We want
Binding to target sequence that has the right
duplex melting temperature for the experiment
Distinguish between similar genes
We want to avoid
Oligos that fold back on themselves (hairpins)
Oligos that bind to other copies of themselves
(dimers)
Oligos that bind well to more than one potential
location in the DNA sample (secondary or
off-target binding)

5
Melting Temperatures

Oligo duplexes are not two state, therefore the
melting temperature (Tm) is considered the point
at which half the duplexes are denatured.
Simple models include
Wallace: $T_m = 2(A+T) + 4(G+C)$ (in 1 M NaCl)
$T_m = 100.5 + 41(G+C)/(A+T+G+C) - 820/(A+T+G+C) + 16.6\log_{10}[\mathrm{Na}^+]$
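
As a rough illustration, the two simple models on this slide can be sketched in Python. This assumes the garbled formulas read as Tm = 2(A+T) + 4(G+C) and Tm = 100.5 + 41(G+C)/N - 820/N + 16.6 log10[Na+], where N is the total base count; the function names are hypothetical and not part of Osprey.

```python
import math


def wallace_tm(seq: str) -> float:
    """Wallace rule: 2 degrees C per A/T and 4 degrees C per G/C (1 M NaCl)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def salt_adjusted_tm(seq: str, na_molar: float = 1.0) -> float:
    """GC-fraction model with a sodium correction (assumed reading of the slide)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    n = gc + s.count("A") + s.count("T")  # total number of bases counted
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.5 + 41.0 * gc / n - 820.0 / n + 16.6 * math.log10(na_molar)
```

At 1 M Na+ the salt term is zero, so the second model depends only on GC fraction and length.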
There are many models for determining melting
temperature, but they share a common theme...

6
Oligonucleotide Length

Melting temperature increases with oligo duplex
length, and depends on the base composition of
the sequence.
These two factors create a sweet spot for
finding oligos of the appropriate melting
temperature, based on oligo length and base
composition probabilities in the input sequence
e.g. In Desulfovibrio vulgaris (65% GC) the
average 70mer melts at about 100 °C in 1 M NaCl
In Sulfolobus solfataricus (37% GC) it's about
65 °C

7
Melting Temperature Complex

More thorough models are based on physics:
melting temperature is determined by the enthalpy
and the entropy of the oligo duplex
$T_m = \Delta H^\circ / (\Delta S^\circ + R \ln(C/4))$
Where
$\Delta H^\circ$ = enthalpy (order) for the whole duplex
$\Delta S^\circ$ = entropy (disorder) for the whole duplex
$R$ = molar gas constant, 1.987 cal/(K·mol)
$C$ = concentration of DNA
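
A minimal sketch of this calculation, assuming the slide's formula reads Tm = ΔH°/(ΔS° + R ln(C/4)) with ΔH° in cal/mol, ΔS° in cal/(K·mol), C in mol/L, and the result in kelvin. The function names are illustrative only.

```python
import math

R_CAL = 1.987  # molar gas constant in cal/(K*mol), as given on the slide


def duplex_tm_kelvin(dh_cal: float, ds_cal: float, conc_molar: float) -> float:
    """Melting temperature in kelvin from whole-duplex enthalpy and entropy."""
    return dh_cal / (ds_cal + R_CAL * math.log(conc_molar / 4.0))


def duplex_tm_celsius(dh_cal: float, ds_cal: float, conc_molar: float) -> float:
    """Same value converted to degrees Celsius."""
    return duplex_tm_kelvin(dh_cal, ds_cal, conc_molar) - 273.15
```

Because ln(C/4) is negative at realistic oligo concentrations, lowering the DNA concentration makes the denominator more negative and lowers the predicted Tm.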

8
DNA Duplex Energy

There are several models to determine enthalpy
and entropy, all based on the accepted Nearest
Neighbour (NN) concept: the overall energy of the
duplex can be predicted by summing the
interactions of adjacent base pairs. e.g.
Enthalpy of GCCCTA:
$\Delta H(\mathrm{GCCCTA}) = \Delta H(\mathrm{GC}) + 2\,\Delta H(\mathrm{CC}) + \Delta H(\mathrm{CT}) + \Delta H(\mathrm{TA})$
Neighbour energies are experimentally derived
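
The nearest-neighbour sum can be sketched as follows. The parameter table is passed in by the caller; no published values are embedded here, and the helper names are hypothetical. Published models typically add further terms (for example duplex initiation), which this sketch omits.

```python
from collections import Counter
from typing import Mapping


def neighbour_counts(seq: str) -> Counter:
    """Count each overlapping dinucleotide step along one strand."""
    s = seq.upper()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def nn_sum(seq: str, table: Mapping[str, float]) -> float:
    """Sum a per-neighbour quantity (enthalpy or entropy) over the sequence.

    `table` maps a dinucleotide such as "GC" to its experimentally derived
    value; a missing entry raises KeyError rather than being guessed.
    """
    return sum(table[pair] * n for pair, n in neighbour_counts(seq).items())
```

For GCCCTA, `neighbour_counts` yields one GC, two CC, one CT and one TA step, matching the sum on the slide.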

9
NN Techniques

Models derived from empirical data include
Gotoh, Vologodskii, Breslauer, Benight, Sugimoto
SantaLucia created a unified model from the
combination of these datasets and his own data,
generally considered the best model yet.

10
Popular Oligo Tools

Primer3 (Uses Breslauer model, good for PCR
primers, interactive usage)
HyTher (SantaLucia's own system, accurate,
interactive usage)
No product has dominated the microarray design
market yet; most chips are designed using
commercial software
Free software includes OligoArray, OligoChecker,
PrimeGens

11
Why develop new tools?

Many programs require intimate knowledge of
parameters or cast a wide net to find oligos
For large scale projects such as microarrays, the
accuracy of the oligos can greatly affect their
usability (e.g. ensuring even melting
temperatures)
Many tools can't deal with large secondary
binding data (i.e., the known transcriptome) well.

12
Osprey Goals

Require minimal user parameterization and data
preprocessing
Deal nicely with large datasets
Be as accurate as possible
Be fast

13
General Procedure for Design (Sequencing or
Microarray)

Read in sequence
Filter repetitive elements
Determine optimal melting temperature/ oligo
length
Find candidate duplexes
Check for undesirable oligo configurations
Check for undesirable binding to the potential
sample

14
Primers for sequencing reactions

Target site T1: primer binding range is
constrained 5'-ward by oligo length (L) and
unreadable bases (U) after priming site, then
3'-ward by the minimum bases post-T1 we wish to
elucidate
The expected read length (R) gives us a range
within those limits, the top rectangle being the
most 5' option, the bottom rectangle being the
most 3' (desirable) option

15
Minimizing sequencing primers

If multiple sites need to be elucidated, and they
are close together, we may be able to maximize
coverage while minimizing reactions by biasing
towards 3' priming locations (e.g. cover both T1
and T2 in the previous diagram).

BAC clone pre and post application of polishing
primers
16
PCR products microarrays

Using available clones, or using ESTs/genome,
design PCR primers to generate amplicons from DNA
Primers should not be more than about 800-1000 bp
apart due to limits in PCR product length
For eukaryotes, probes work better if at the 3' end
of the coding sequence, since reverse transcription
is usually primed off the poly(A) tail of mRNA in
the experiments
For prokaryotes, since random hexamers are used
for reverse transcription, the 5' area of the sequence is
slightly more likely to be reverse transcribed.

[Figure: coding sequence drawn 5' to 3' with poly(A) tail (AAAA); Primer1 and Primer2 flank a PCR product < 1000 bp; RT cDNA shown for eukaryotic and prokaryotic priming]
17
Oligo Microarray Search Techniques

Microarrays of 70mer oligos complementary to cDNA
sequences, all should have similar melting temp.,
etc.
Run the program iteratively, starting with strict
constraints, and automatically relax them as
needed, optimizing oligo similarity of length and
melting temperature
Why reinvent the wheel? There are techniques and
tools that can be used and adapted to satisfy the
oligo selection criteria
These include heuristic methods such as BLAST
for filtering, and dynamic methods where utmost
accuracy is required
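
The iterative relaxation described on this slide might look like the sketch below. The candidate representation, tolerance names, step sizes and round limit are all assumptions made for illustration; this is not Osprey's actual search.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float, int]  # (oligo id, melting temperature, length)


def select_with_relaxation(
    candidates: List[Candidate],
    target_tm: float,
    target_len: int,
    tm_tol: float = 1.0,
    len_tol: int = 0,
    tm_step: float = 1.0,
    len_step: int = 2,
    max_rounds: int = 10,
) -> Tuple[Optional[Candidate], int]:
    """Start strict, then widen the Tm and length windows until an oligo passes."""
    for round_no in range(max_rounds):
        hits = [
            c for c in candidates
            if abs(c[1] - target_tm) <= tm_tol and abs(c[2] - target_len) <= len_tol
        ]
        if hits:
            # Among passing oligos, prefer the closest Tm, then the closest length.
            best = min(hits, key=lambda c: (abs(c[1] - target_tm), abs(c[2] - target_len)))
            return best, round_no
        tm_tol += tm_step    # relax the melting temperature window
        len_tol += len_step  # relax the length window
    return None, max_rounds
```

Returning the round number alongside the oligo records how far the constraints had to be relaxed for each target, which is useful when judging the uniformity of a whole chip.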

18
Repetitive Analysis Tools

MEGABLAST compares two large DNA sequences
against each other. If the query and the subject
are the same, you find repeats!
We use this in Osprey, iteratively, to filter out
repeats, leaving one copy of each repeat.
The key parameters to tweak in MEGABLAST are word
length (min. exact match), % ID cutoff, and filter
disabling (will ignore repeats instead of marking
them)
Also useful for cross-genome comparisons

19
DNA Folding

The most commonly used tool for checking the folding
conformation of nucleic acids is the DNA-FOLD
family of programs from Zuker. We use this to
check hairpins.
Rule of thumb (in kcal/mol) 10(GC-35)/5(oligo
length)/20
A member of the package useful for large scale
analysis (many small seqs) is quikfold [sic],
which produces only thermodynamic statistics
rather than the pretty fold images available from
the other programs

20
DNA Folding (contd)
Copy 1: 5' CGAACGTTGAACGTT 3'
Copy 2: 3' TTGCAAGTTGCAAGC 5'

Sequences that are largely self-complementary may
form dimers, and should be avoided with the same
guidelines as hairpins.
Both the HyTher and Mfold Web servers can be used
to calculate dimer energy.
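
As a quick pre-screen before running a thermodynamic tool, self-complementarity can be estimated by sliding one copy of the oligo antiparallel against another and recording the longest uninterrupted stretch of Watson-Crick pairs. This is a crude stand-in that ignores gaps, bulges and actual energies; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_self_dimer_run(seq: str) -> int:
    """Longest run of consecutive base pairs between two antiparallel copies."""
    s = seq.upper()
    n = len(s)
    best = 0
    # In an ungapped antiparallel alignment, base i of copy 1 faces base j of
    # copy 2 with i + j == k; each k is one relative offset of the two strands.
    for k in range(2 * n - 1):
        run = 0
        for i in range(max(0, k - n + 1), min(n - 1, k) + 1):
            j = k - i
            if COMPLEMENT.get(s[i]) == s[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

For the slide's example CGAACGTTGAACGTT, the palindromic AACGTT block gives a run of at least six pairs, which is why the oligo is flagged as a dimer risk.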

21
Secondary Binding

The dataset to check against for a microarray
would be all the ORFs (prokaryotes) or all of
the known cDNA (eukaryotes)
For sequencing it would be the vector and any
known sequence from the clone
Most software uses a % ID cutoff, but the location
of the mismatches can greatly change the effect
on melting temperature!

22
Accurate Secondary Binding Checks

Free energy ($\Delta G$) is derived from the same enthalpy
and entropy values as temperature, so why not use
a dynamic method to optimize it in sequence
searches? $\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ$

Free energy ($\Delta G$) vs. Tm for (left to right) random
50-, 25-, 20-, 15-, and 10-mers
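
A small worked sketch of the free energy relation ΔG° = ΔH° - TΔS°. The unit handling is an assumption: enthalpy in kcal/mol, entropy in cal/(K·mol), and temperature supplied in degrees Celsius and converted to kelvin.

```python
def free_energy_kcal(dh_kcal: float, ds_cal: float, temp_c: float = 37.0) -> float:
    """Duplex free energy in kcal/mol at the given temperature."""
    t_kelvin = temp_c + 273.15
    # Entropy is commonly tabulated in cal/(K*mol); divide by 1000 for kcal.
    return dh_kcal - t_kelvin * ds_cal / 1000.0
```

With ΔH° = -50 kcal/mol and ΔS° = -140 cal/(K·mol) at 37 °C, this gives -50 + 310.15 × 0.140 = -6.579 kcal/mol.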
23
Profile Alignments

Hidden Markov Models such as found in the Pfam
database, and Profiles such as found in ProSite
are types of Position Specific Scoring Matrices
(PSSMs).
At each model position, a score is given to each
possible match/mismatch. An A-G mismatch in
different positions may score differently,
usually based on frequencies in MSAs. Below is
the start of a Pfam protein model match

24
Profile Alignments

We can also use the PSSM to encode nearest
neighbour free energy ($\Delta G$) thermodynamics
Note that the first A matches with a score of
580, the second with a score of 1000; this is a
position-specific difference because they have
different neighbours (TA vs. AA)

25
Caveats

Profiles, like Smith-Waterman alignments, use
dynamic methods, therefore the optimal solution
will always be found
Using BLAST to find secondary binding, you may
miss matches with good thermodynamics, because of
the shortcuts used to make BLAST fast.
At the very least use a dynamic method to check
oligo secondary matches. Profiles will
additionally tell you not only a % ID, but also the
free energy
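
To show what a dynamic method does, here is a minimal Smith-Waterman local alignment score. The fixed match, mismatch and gap scores are placeholders chosen for illustration; the profile approach described in these slides would instead use position-specific thermodynamic scores.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Every cell is evaluated, so no candidate alignment is skipped;
            # the floor of 0 lets a local alignment start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

The cost is proportional to the product of the two sequence lengths, which is the expense that heuristic tools such as BLAST avoid by seeding on exact word matches.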

26
Results Interpretation

Given a free energy cutoff, we can check all
secondary results for their melting temperature,
and ensure it is outside the margin desired by
the user (e.g. more than 10 degrees below the
melting temperature of the target duplex)
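
The margin check might be sketched like this, assuming melting temperatures for the secondary hits have already been computed. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, List


def flag_secondary_hits(target_tm: float, secondary_tms: Dict[str, float],
                        margin: float = 10.0) -> List[str]:
    """Return ids of secondary hits that are NOT more than `margin` below target."""
    threshold = target_tm - margin
    # A hit exactly at the threshold is not "more than margin below", so flag it.
    return [name for name, tm in secondary_tms.items() if tm >= threshold]
```

An oligo would be accepted only when this list is empty, meaning every secondary duplex melts well below the intended one.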

27
Hardware Acceleration

Dynamic methods such as PSSMs are expensive, but
we have hardware accelerators.
Decypher runs these searches at least 50 times
faster than software, so we can use dynamic
methods on the Sulfolobus genome and design a
chip of 3456 optimized oligos in 1.5 hours

28
Web Interface

A Web interface using the hardware acceleration
for the design of microarrays is available at
http://osprey.ucalgary.ca/
Your assignment
Try the Osprey interface for microarray and PCR
primer design
