Title: Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignmen
1. Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation
Xingyuan Li, Zhili He and Jizhong Zhou
Nucleic Acids Research, 2005, Vol. 33, No. 19, 6114-6123.
Presented by Deepti Malhotra, Biological Sequence Analysis
2. MICROARRAY - What is it?
Analysis of the relative expression level of
hundreds or thousands of genes simultaneously by
determining the amount of messenger RNA (mRNA)
that is present in a single experiment.
Figure labels: Labeled Target; Probe (gene of interest); matrix
3. cDNA Microarray: NIEHS ToxChip
Nuwaysir E, et al., Molecular Carcinogenesis 24:153-159 (1999)
4. GeneChip Probe Arrays
Hybridized Probe Cell
GeneChip Probe Array
Single-stranded, fluorescently labeled DNA target
Oligonucleotide probe
24 µm
Each probe cell or feature contains millions of
copies of a specific oligonucleotide probe
1.28 cm
Over 200,000 different probes complementary to
genetic information of interest
Courtesy Affymetrix
Image of Hybridized Probe Array
5. GeneChip® Probe Arrays
GeneChip® Probe Array
Probe Pair
Probe Set
PM
MM
Hybridized Probe Cell
Probe Cell (feature)
Image of Hybridized Probe Array
6. Multiple Specific Probe Pairs per Gene
(25-mers)
(25-mer)
Nature Genetics Supplement, Volume 21, January 1999
7. Detection of genes using Oligos
Oligo arrays
cDNA arrays
Gene on
150 µm
24 µm
Gene off
Detection Pattern
Single Spot
8. Synthesis of Ordered Oligonucleotide Arrays
9. What's the complexity?
- More genes
- More information per experiment

Feature Size    Features/Chip    Genes/Chip
100 µm          16,384           409
50 µm           65,536           1,638
20 µm           409,600          10,240
10 µm           1,638,400        40,960

Genes/Chip assumes 20 probe pairs per gene.
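The table values can be reproduced with a little arithmetic. The sketch below assumes a square array with the 1.28 cm edge shown on the GeneChip slide and two features (PM and MM) per probe pair; neither assumption is stated on this slide, and the constant and function names are illustrative only. Under these assumptions the 50 µm row works out to 256 × 256 = 65,536 features.

```python
CHIP_EDGE_UM = 12_800   # assumption: 1.28 cm array edge, from the GeneChip slide
FEATURES_PER_PAIR = 2   # assumption: one PM and one MM feature per probe pair
PAIRS_PER_GENE = 20     # stated on the slide


def chip_capacity(feature_size_um: int) -> tuple[int, int]:
    """Return (features per chip, genes per chip) for a square array."""
    per_side = CHIP_EDGE_UM // feature_size_um
    features = per_side * per_side
    genes = features // (FEATURES_PER_PAIR * PAIRS_PER_GENE)
    return features, genes


for size in (100, 50, 20, 10):
    features, genes = chip_capacity(size)
    print(f"{size:>3} µm: {features:>9,} features, {genes:>6,} genes")
```

Halving the feature size quadruples the number of features, which is why the gene count grows so quickly as features shrink.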
10. Qualitative and Quantitative Scoring: importance of probe design
RNA scored as Absent: signal 1,270
RNA scored as Present: signal 1,250
11. Procedures for Target Preparation
Figure (workflow; L = label): Cells; Poly(A) RNA (AAAA); cDNA; IVT (Biotin-UTP, Biotin-CTP); Labeled transcript; Fragment (heat, Mg2+); Labeled fragments; Hybridize (16 hours); Wash and Stain; Scan
12. Expression Analysis: Hybridization and Staining
Array
Hybridized Array
cRNA Target
Streptavidin-phycoerythrin conjugate
13. Creating cRNA from Original RNA Sample
- Isolated mRNA is reverse transcribed into cDNA using a T7 primer, which contains a poly-T site to bind and select mRNA for amplification.
- E. coli RNase H digests the original RNA, leaving the cDNA behind.
- E. coli Polymerase I is added to synthesize a complementary strand of cDNA.
- The two strands of cDNA are denatured and T7 RNA polymerase transcribes cRNA while incorporating biotinylated nucleotide bases.
14. Why So Many Probe Pairs?
Probe Pairs
Gene of Interest
- Point mutations, deletions, or insertions will not affect the detection of the gene of interest.
- The bioinformatics algorithm combines the signal across 11 different probe pairs to calculate the expression of the gene.
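The slide does not say how the 11 probe pairs are combined. As a rough illustration only, one simple option is to average the PM minus MM intensity differences over all pairs of a gene; the function name and the choice of a plain mean are assumptions, not the method the slide refers to.

```python
from statistics import mean


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of PM - MM over all probe pairs of one gene (illustrative only)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need one MM intensity for every PM intensity")
    return mean(p - m for p, m in zip(pm, mm))
```

Because the value is an average over many pairs, a single probe disrupted by a point mutation shifts the estimate only slightly, which is the redundancy argument made on the slide.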
15. Redundancy of probe synthesis
- Multiple indicators for the same gene ensure:
  - Quantitative accuracy
  - High sensitivity
- Indicators of oligonucleotide specificity:
  - Sequence identity to non-targets
  - Continuous stretch shared with non-targets
  - Free energy of binding to non-targets
- All three criteria are important for the selection of optimal probes.
16. Problems with probe synthesis addressed by CommOligo
- Representation of each sequence in a genome-wide search
- Liberal cut-offs and fewer non-specifics
- Existing tools generally use BLAST for local alignment or suffix arrays for exact string search
- Homologous sequence studies versus whole-genome arrays → applicability to experiments
- Experimental threshold determination
- Inherent variability
17. CommOligo - Algorithm
- A series of filters checks the oligos
- Probe optimization is iterative
- Takes into account all three main criteria
- Thresholds and parameters are user adjustable
- Cutoffs for identity, stretch and free energy → CommOligo_PE → applicable to both whole-genome sequences and highly homologous sequences.
18. Series of filters checking Oligos
Cutoffs based on CommOligo_PE
Parameters and thresholds are user adjustable
Iterative probe optimization
All three criteria included
19. Filter 1
- Continuous stretch search: all sequences are scanned in words of length 10, indexed in a table of size 4^10 → stretches longer than the user-specified value are masked → selected oligos are scored for matches to non-target sequences → self-annealing is measured.
- The sequence identity filter removes oligonucleotides whose identity to non-targets is higher than a user-specified threshold. It uses optimal gapped alignment, not BLAST.
- The binding free energy of an oligo is calculated as the minimal free energy of binding to its non-targets.
- Global alignment algorithms → dynamic programming matrices with mismatch/gap scores.
- Binding free energy is calculated using parameters from MFOLD, which is used for RNA structure determination. However, this free energy is calculated at 37 °C, not at the hybridization temperature.
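The continuous stretch search above can be sketched as a lookup table over all 4^10 = 1,048,576 possible 10-mers. This is a minimal illustration, not CommOligo's implementation: the function names are hypothetical, and the result is only an upper bound because consecutive 10-mer hits may come from different non-target sequences and would need to be verified.

```python
K = 10  # word length from the slide; the table has 4**K entries
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int = K):
    """Yield (position, base-4 code) for every k-mer made only of A/C/G/T."""
    for i in range(len(seq) - k + 1):
        code = 0
        for base in seq[i:i + k]:
            if base not in CODE:
                break                      # skip words with ambiguous bases
            code = code * 4 + CODE[base]
        else:
            yield i, code


def build_table(non_targets, k: int = K) -> bytearray:
    """Mark every k-mer that occurs in any non-target sequence."""
    table = bytearray(4 ** k)
    for seq in non_targets:
        for _, code in kmer_codes(seq.upper(), k):
            table[code] = 1
    return table


def longest_candidate_stretch(oligo: str, table: bytearray, k: int = K) -> int:
    """Upper bound on the longest stretch shared with a non-target.

    A run of r consecutive k-mer hits suggests a shared stretch of
    k + r - 1 bases; shorter shared stretches are reported as 0.
    """
    hits = {i for i, code in kmer_codes(oligo.upper(), k) if table[code]}
    best = run = 0
    for i in range(len(oligo) - k + 1):
        run = run + 1 if i in hits else 0
        best = max(best, run)
    return best + k - 1 if best else 0
```

An oligo would then be masked or rejected when the returned length exceeds the user-specified stretch cutoff.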
20. Sequence alignment strategy
Dynamic Programming Matrix
- Uses bit scores from Myers' algorithm during identity calculation
- An alignment corresponds to the path from the bottom-row cell with a high identity/score to the top row.
- Traverse path / last path
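The path search from the bottom row to the top row can be illustrated with a plain dynamic programming matrix. This is a sketch under stated assumptions: it uses an ordinary quadratic-time recurrence rather than Myers' bit-vector algorithm, the match/mismatch/gap scores are placeholder values, and identity is taken here as matches divided by oligo length, which may differ from the definition used by CommOligo.

```python
def best_identity(oligo: str, target: str,
                  match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Align the whole oligo against any window of target; return identity."""
    if not oligo:
        raise ValueError("oligo must not be empty")
    n, m = len(oligo), len(target)
    # score[i][j]: best score for oligo[:i] aligned so that it ends at target[:j].
    # The top row stays 0, so the alignment may start anywhere in the target.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if oligo[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Start at the best cell of the bottom row and walk back to the top row.
    j = max(range(m + 1), key=lambda col: score[n][col])
    i, matches = n, 0
    while i > 0:
        if j > 0:
            same = oligo[i - 1] == target[j - 1]
            step = match if same else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                matches += same
                i, j = i - 1, j - 1
                continue
            if score[i][j] == score[i][j - 1] + gap:
                j -= 1                      # gap in the oligo
                continue
        i -= 1                              # gap in the target
    return matches / n
```

An oligo would be removed when `best_identity` against any non-target exceeds the user-specified identity threshold.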
21. Best alignment path search
22. Filter 2
- Tm is calculated with the nearest-neighbor model, using a predefined algorithm and a fixed DNA concentration of 10 µM.
- Tm interval → the one with the maximum number of sequences that have probes.
- Optimize and choose the best probes.
- Probes with minimal cross-hybridization that are located in different regions of the target are selected.
- For the same target, the identity between the two probes must be less than a user-defined threshold.
- Probes are scored between 0 and 1.
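Choosing the Tm interval that leaves the most sequences with at least one probe can be sketched as a sliding-window search. The interval width, the input layout and the function name below are assumptions made for illustration; the slide does not specify them.

```python
def best_tm_interval(candidate_tms: dict[str, list[float]], width: float):
    """Return (low, high, covered) for the window covering the most sequences.

    candidate_tms maps a sequence name to the Tm values of its candidate probes.
    """
    # Some optimal window can always be shifted to start at an observed Tm.
    starts = sorted({tm for tms in candidate_tms.values() for tm in tms})
    best = (0.0, 0.0, 0)
    for low in starts:
        high = low + width
        covered = sum(
            any(low <= tm <= high for tm in tms)
            for tms in candidate_tms.values()
        )
        if covered > best[2]:
            best = (low, high, covered)
    return best


tms = {"geneA": [60.0, 71.5], "geneB": [70.0, 74.0],
       "geneC": [72.5], "geneD": [55.0]}
print(best_tm_interval(tms, width=5.0))
```

Only probes that passed the earlier filters would be fed into this step, so the interval reflects usable probes rather than every possible oligo.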
23. Final optimization and scoring
- A quality score is calculated for each probe (formula not transcribed).
- CommOligo_PE is used to determine the thresholds, and the probes are optimized for maximum coverage and correctness (formulas not transcribed).
- The goal is to maximize NPV and C.
- Cross-validation: the data are randomly divided into 10 subsets, one of which is used as the test set; calibration is run 10 times.
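The cross-validation step can be sketched as follows. This assumes NPV denotes the usual negative predictive value, TN / (TN + FN), and it deliberately leaves C undefined because its formula is not given in the transcript. The function names and the seed parameter are illustrative.

```python
import random


def npv(observed_positive: list[bool], predicted_positive: list[bool]) -> float:
    """Negative predictive value: TN / (TN + FN)."""
    pairs = list(zip(observed_positive, predicted_positive))
    tn = sum(not o and not p for o, p in pairs)
    fn = sum(o and not p for o, p in pairs)
    return tn / (tn + fn) if tn + fn else float("nan")


def ten_fold_splits(n_items: int, seed: int = 0):
    """Yield (train_indices, test_indices) for 10 random, disjoint test subsets."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::10] for i in range(10)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

In each of the 10 runs, thresholds would be calibrated on the training indices and the resulting NPV checked on the held-out test indices.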
24. Results
Training sets
26. Genome-wide analysis
Homologous sequence searches
28. Take-home message
- CommOligo works well with homologous sequences → three stringent criteria → cDNA
- Still works well at the same thresholds for genome-wide searches → oligo chip
- Actual hybridization data is used
- Better identity and minimum energy filters
- The optimal Tm for the hybridization reaction is based on the oligos selected after passing all the filters, not on all possible oligos
- Iterative threshold optimization