Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation

1
Selection of optimal oligonucleotide probes
for microarrays using multiple criteria,
global alignment and parameter estimation
Xingyuan Li, Zhili He and Jizhong Zhou
Nucleic Acids Research, 2005, Vol. 33, No. 19, 6114-6123
Presented by Deepti Malhotra, Biological Sequence Analysis
2
MICROARRAY - What is it?
Simultaneous analysis of the relative expression levels of
hundreds or thousands of genes in a single experiment, by
measuring the amount of each messenger RNA (mRNA) present.
[Figure: a labeled target hybridizing to a probe (gene of interest) fixed on a matrix]
3
cDNA Microarray: NIEHS Tox Chip
Nuwaysir E, et al., Molecular Carcinogenesis
24:153-159 (1999)
4
GeneChip Probe Arrays
[Figure: image of a hybridized GeneChip probe array (1.28 cm), zoomed in to a single hybridized probe cell (24 µm) where a single-stranded, fluorescently labeled DNA target binds an oligonucleotide probe. Courtesy Affymetrix]
  • Each probe cell, or feature, contains millions of
    copies of a specific oligonucleotide probe
  • Over 200,000 different probes complementary to
    genetic information of interest
5
GeneChip® Probe Arrays
[Figure: image of a hybridized GeneChip® probe array, showing a probe cell (feature), a hybridized probe cell, a probe pair made of a perfect-match (PM) and a mismatch (MM) probe, and a probe set]
6
Multiple Specific Probe Pairs per Gene (25-mers)
Source: Nature Genetics supplement, Vol. 21, January 1999
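On Affymetrix expression arrays, the two 25-mers of a probe pair are identical except at the central (13th) base, which is complemented in the MM probe. The sketch below derives an MM probe from a PM sequence. The function name `mismatch_probe` and the example sequence are hypothetical and used only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm: str) -> str:
    """Return the MM partner of a 25-mer PM probe by complementing base 13."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe")
    middle = 12  # 0-based index of the 13th (central) base
    return pm[:middle] + COMPLEMENT[pm[middle]] + pm[middle + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # arbitrary example PM sequence
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because only the centre base changes, any signal difference between PM and MM reflects how specifically the target binds that one position.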
7
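Detection of Genes Using Oligos
[Figure: oligo arrays (24 µm features) vs. cDNA arrays (150 µm spots), each shown for a gene that is on and a gene that is off. Oligo arrays report a detection pattern; cDNA arrays report a single spot.]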
8
Synthesis of Ordered Oligonucleotide Arrays
9
What's the complexity?
  • More genes
  • More information per experiment

Feature size   Features/chip   Genes/chip
100 µm         16,384          409
50 µm          65,536          1,638
20 µm          409,600         10,240
10 µm          1,638,400       40,960
Genes/chip assumes 20 probe pairs (40 features) per gene.
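The table can be reproduced with simple arithmetic, assuming the 1.28 cm square chip from the GeneChip slide. Features per side is 12,800 µm divided by the feature size. Features per chip is that value squared. Genes per chip is features divided by 40 (20 PM/MM pairs), rounded down. The function name `chip_capacity` is hypothetical.

```python
CHIP_SIDE_UM = 12_800        # assumed 1.28 cm square chip, in micrometres
FEATURES_PER_GENE = 2 * 20   # 20 probe pairs, each a PM and an MM feature


def chip_capacity(feature_um: int) -> tuple[int, int]:
    """Return (features per chip, genes per chip) for a square feature size."""
    per_side = CHIP_SIDE_UM // feature_um
    features = per_side ** 2
    return features, features // FEATURES_PER_GENE


for size in (100, 50, 20, 10):
    features, genes = chip_capacity(size)
    print(f"{size:>3} um  {features:>9,}  {genes:>6,}")
```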
10
Importance of Probe Design for Qualitative and
Quantitative Scoring
RNA scored as Absent: signal 1,270
RNA scored as Present: signal 1,250
11
Procedures for Target Preparation
  • Cells → poly(A) RNA (AAAA) → cDNA
  • In vitro transcription (IVT) with Biotin-UTP and
    Biotin-CTP → biotin-labeled transcript
  • Fragment (heat, Mg2+) → labeled fragments
  • Hybridize (16 hours) → wash and stain → scan
12
Expression Analysis: Hybridization and Staining
[Figure: the cRNA target hybridizes to the array, and the hybridized array is stained with streptavidin-phycoerythrin conjugate]
13
Creating cRNA from Original RNA Sample
  • Isolated mRNA is reverse transcribed into cDNA
    using a T7 primer, which contains a poly-T site
    that binds the poly(A) tail and so selects mRNA
    for amplification.
  • E. coli RNase H digests the original RNA, leaving
    the cDNA behind.
  • E. coli DNA polymerase I is added to synthesize
    the complementary (second) strand of cDNA.
  • The double-stranded cDNA serves as the template
    for T7 RNA polymerase, which transcribes cRNA
    while incorporating biotinylated nucleotides.
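The strand relationships in these steps can be tracked in a few lines. This sketch assumes an idealized process with no primer or T7 promoter sequence, no poly(A) tail and no biotin labels. Under those assumptions, the first-strand cDNA is the reverse complement of the mRNA and the second strand has the mRNA sequence in DNA form. The cRNA is antisense: it is the reverse complement of the mRNA, written in RNA bases. The function names are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def prepare_crna(mrna: str) -> dict[str, str]:
    """Idealized mRNA -> double-stranded cDNA -> cRNA strand relationships."""
    mrna_as_dna = mrna.replace("U", "T")
    first_strand = reverse_complement(mrna_as_dna)    # reverse transcription
    second_strand = reverse_complement(first_strand)  # DNA polymerase I
    # T7 RNA polymerase copies the second (sense) strand, so the cRNA has the
    # first-strand (antisense) sequence, written with U in place of T.
    crna = first_strand.replace("T", "U")
    return {"first_strand": first_strand,
            "second_strand": second_strand,
            "crna": crna}


print(prepare_crna("AUGGCCUAA"))
```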

14
Why So Many Probe Pairs?
[Figure: probe pairs distributed along the gene of interest]
  • A point mutation, deletion, or insertion in the
    region covered by one probe will not prevent
    detection of the gene of interest.
  • The analysis algorithm combines the signals from
    11 different probe pairs to calculate the
    expression of the gene.
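The following illustrative summary shows why redundancy helps. It is not the algorithm used by Affymetrix software. It takes the median of the per-pair log2(PM/MM) ratios, so a single probe pair knocked out by a mutation barely moves the gene-level estimate. The intensities are made-up numbers, and `gene_signal` is a hypothetical name.

```python
import math
import statistics


def gene_signal(pm: list[float], mm: list[float]) -> float:
    """Median log2(PM/MM) across probe pairs; robust to a few outlier pairs."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need matching, non-empty PM and MM lists")
    ratios = [math.log2(p / m) for p, m in zip(pm, mm)]
    return statistics.median(ratios)


# 11 probe pairs with PM = 4 x MM, i.e. a log2 ratio of 2 for every pair.
mm = [100.0] * 11
pm = [400.0] * 11
intact = gene_signal(pm, mm)

# Suppose a point mutation under one probe abolishes its specific PM signal.
mutated_pm = pm.copy()
mutated_pm[5] = 100.0
print(intact, gene_signal(mutated_pm, mm))
```

A plain mean would drop from 2.0 to about 1.82 in the mutated case, while the median stays at 2.0.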

15
Redundancy of probe synthesis
  • Multiple indicators for the same gene ensure:
      • Quantitative accuracy
      • High sensitivity
  • Indicators of oligonucleotide specificity:
      • Sequence identity to non-targets
      • Continuous stretch matching non-targets
      • Free energy of binding to non-targets
  • All three criteria are important for the
    selection of optimal probes

16
Problems with probe synthesis addressed by
CommOligo
  • Representation of each sequence in a genome-wide
    search
  • Liberal cut-offs and fewer non-specifics
  • Existing tools generally use BLAST for local
    alignment or suffix arrays for exact string search
  • Homologous-sequence studies versus whole-genome
    arrays → applicability to experiments
  • Experimental threshold determination
  • Inherent variability

17
CommOligo - Algorithm
  • A series of filters checks oligos
  • Probe optimization is iterative
  • Takes into account all three main criteria
  • Thresholds and parameters are user adjustable
  • Cut-offs for identity, stretch and free energy →
    CommOligo_PE → applicable to both whole-genome
    sequences and highly homologous sequences.
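A series of user-adjustable filters can be modeled as a chain of threshold checks over per-oligo scores. In this sketch, the `OligoScores` and `Thresholds` fields and the default cut-offs are hypothetical placeholders chosen for illustration. They are not CommOligo's defaults. The free-energy cut-off keeps an oligo only if its strongest binding to any non-target is weaker (less negative) than the limit.

```python
from dataclasses import dataclass


@dataclass
class OligoScores:
    name: str
    max_identity: float     # highest % identity to any non-target
    max_stretch: int        # longest continuous match to any non-target (bases)
    min_free_energy: float  # most negative binding ΔG to any non-target (kcal/mol)


@dataclass
class Thresholds:           # hypothetical, user-adjustable cut-offs
    identity: float = 85.0
    stretch: int = 20
    free_energy: float = -35.0


def passes_filters(o: OligoScores, t: Thresholds) -> bool:
    """An oligo survives only if every specificity criterion is within its cut-off."""
    return (o.max_identity <= t.identity
            and o.max_stretch <= t.stretch
            and o.min_free_energy >= t.free_energy)


candidates = [
    OligoScores("oligo_1", 70.0, 12, -20.5),
    OligoScores("oligo_2", 92.0, 14, -25.0),  # too similar to a non-target
    OligoScores("oligo_3", 80.0, 25, -30.0),  # long continuous stretch
    OligoScores("oligo_4", 75.0, 15, -40.0),  # binds a non-target too strongly
]
kept = [o.name for o in candidates if passes_filters(o, Thresholds())]
print(kept)
```

Loosening a single threshold, for example `Thresholds(identity=95.0)`, lets `oligo_2` through as well. This is the kind of adjustment an iterative threshold search would explore.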

18
Series of filters checking oligos
Cut-offs based on CommOligo_PE
Parameters and thresholds are user adjustable
Iterative probe optimization
All three criteria included
19
Filter 1
  • Continuous stretch search: all sequences are
    scanned in words of length 10, indexed in a table
    of size 4^10 → stretches longer than the
    user-specified value are masked → selected oligos
    are scored for matches to non-target sequences →
    self-annealing is measured
  • The sequence identity filter removes
    oligonucleotides whose identity to non-targets is
    higher than a user-specified threshold. It uses
    optimal gapped alignment, not BLAST.
  • The binding free energy of an oligo is calculated
    as the minimal free energy of binding to its
    non-targets.
  • Global alignment algorithms → dynamic programming
    matrices with mismatch/gap scores.
  • Binding free energy is calculated using parameters
    from MFOLD, which is used for RNA secondary-structure
    prediction. This free energy is calculated at
    37 °C, not at the hybridization temperature.
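The continuous-stretch search can be illustrated with a hash-based index in place of the 4^10 lookup table. Every 10-mer of the non-target sequences is recorded. A candidate oligo's longest exact stretch shared with any non-target is then found by extending each seed hit to the right. This is a minimal sketch with several assumptions: the function names, the use of a Python dict rather than a 4^10 array, and the example sequences are all illustrative. Reverse-strand matches and self-annealing are not handled.

```python
from collections import defaultdict

K = 10  # word length of the seed table (4**10 possible words)


def build_index(non_targets: list[str], k: int = K) -> dict[str, list[tuple[int, int]]]:
    """Map every k-mer to the (sequence index, offset) pairs where it occurs."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for s, seq in enumerate(non_targets):
        for j in range(len(seq) - k + 1):
            index[seq[j:j + k]].append((s, j))
    return index


def longest_stretch(oligo: str, non_targets: list[str],
                    index: dict[str, list[tuple[int, int]]], k: int = K) -> int:
    """Longest exact run shared with any non-target; 0 if no run reaches k bases."""
    best = 0
    for i in range(len(oligo) - k + 1):
        for s, j in index.get(oligo[i:i + k], []):
            target = non_targets[s]
            length = k
            # Extend the seed hit to the right while the bases keep matching.
            while (i + length < len(oligo) and j + length < len(target)
                   and oligo[i + length] == target[j + length]):
                length += 1
            best = max(best, length)
    return best


def mask_long_stretches(oligos: list[str], non_targets: list[str],
                        max_stretch: int) -> list[str]:
    """Keep only oligos whose longest shared stretch is within the user limit."""
    index = build_index(non_targets)
    return [o for o in oligos if longest_stretch(o, non_targets, index) <= max_stretch]


non_targets = ["GGGGG" + "ACGTTAGCATCGATCCAG" + "GGGGG"]
candidates = ["TTT" + "ACGTTAGCATCGATCCAG" + "TTTT", "C" * 25]
print(mask_long_stretches(candidates, non_targets, max_stretch=15))
```

Rightward extension from every seed is enough here, because any maximal shared run of 10 or more bases is fully recovered from its leftmost seed.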

20
Sequence alignment strategy
Dynamic Programming Matrix
  • Uses bit scores from Myers' algorithm during the
    identity calculation
  • An alignment corresponds to a path from a cell in
    the bottom row with a high identity/score back to
    the top row.
  • Traverse path / last path
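A path of this kind comes from an alignment that must consume the whole oligo (top row to bottom row) but may start and end anywhere in the longer non-target. This is a minimal sketch of that dynamic program. The scores (match = +1, mismatch = -1, gap = -2) are illustrative assumptions, not CommOligo's. Identity is reported as matched bases divided by alignment columns, and Myers' bit-parallel speedup is not reproduced. `fit_oligo` is a hypothetical name.

```python
def fit_oligo(oligo: str, target: str, match: int = 1, mismatch: int = -1,
              gap: int = -2) -> tuple[int, float]:
    """Align all of `oligo` to the best-scoring substring of `target`.

    Returns (score, identity), where identity = matched bases / alignment columns.
    """
    m, n = len(oligo), len(target)

    def sub(i: int, j: int) -> int:
        return match if oligo[i - 1] == target[j - 1] else mismatch

    # score[i][j]: best score of oligo[:i] aligned to a target substring ending at j.
    # Row 0 is all zeros, so the alignment may start anywhere in the target.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Start from the best cell of the bottom row and trace back to the top row.
    j = max(range(n + 1), key=lambda col: score[m][col])
    best, i = score[m][j], m
    matches = columns = 0
    while i > 0:
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            if oligo[i - 1] == target[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return best, matches / columns


print(fit_oligo("ACGTACGT", "TTTTACGAACGTTTTT"))
```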

21
Best alignment path search
22
Filter 2
  • Tm is calculated with the nearest-neighbor model,
    using a predefined algorithm and a fixed DNA
    concentration of 10 µM.
  • Tm interval → the one with the maximum number of
    sequences that have probes.
  • Optimize and choose the best probes.
  • Selected probes have minimal cross-hybridization
    and are located in different regions of the target.
  • For the same target, the identity between the 2
    probes must be less than a user-defined
    threshold.
  • Probes are scored between 0 and 1.
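The two Tm steps above can be sketched together. The Tm function rests on several assumptions that are not details from the slides: the SantaLucia (1998) unified DNA/DNA nearest-neighbor parameters, a perfectly matched non-self-complementary duplex, 1 M Na+ with no salt correction, and a total strand concentration of 10 µM entered as C_T/4. The same table also gives a 37 °C free energy. The interval search slides a window of a chosen width over the candidate Tm values. It keeps the window that gives the most targets at least one probe. All function names are hypothetical.

```python
import math

# SantaLucia (1998) unified nearest-neighbor parameters, keyed by the 5'->3'
# dinucleotide on one strand: (ΔH kcal/mol, ΔS cal/(K·mol)).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
INIT = {"GC": (0.1, -2.8), "AT": (2.3, 4.1)}  # initiation, per terminal base pair
R = 1.987  # gas constant, cal/(K·mol)


def thermo(seq: str) -> tuple[float, float]:
    """Total ΔH (kcal/mol) and ΔS (cal/(K·mol)) of a perfect-match duplex."""
    dh = ds = 0.0
    for end in (seq[0], seq[-1]):
        h, s = INIT["GC" if end in "GC" else "AT"]
        dh, ds = dh + h, ds + s
    for i in range(len(seq) - 1):
        h, s = NN[seq[i:i + 2]]
        dh, ds = dh + h, ds + s
    return dh, ds


def delta_g37(seq: str) -> float:
    """Duplex free energy at 37 °C (kcal/mol)."""
    dh, ds = thermo(seq)
    return dh - 310.15 * ds / 1000.0


def tm_celsius(seq: str, total_conc: float = 10e-6) -> float:
    """Melting temperature (°C) at 1 M Na+ for a non-self-complementary duplex."""
    dh, ds = thermo(seq)
    return 1000.0 * dh / (ds + R * math.log(total_conc / 4)) - 273.15


def best_tm_window(probe_tms: dict[str, list[float]], width: float) -> tuple[float, int]:
    """Lower bound of the window [lo, lo + width] covering the most targets,
    plus the number of targets with at least one probe inside it."""
    starts = sorted({tm for tms in probe_tms.values() for tm in tms})
    best_lo, best_count = starts[0], -1
    for lo in starts:  # an optimal window can always start at some probe's Tm
        covered = sum(any(lo <= tm <= lo + width for tm in tms)
                      for tms in probe_tms.values())
        if covered > best_count:
            best_lo, best_count = lo, covered
    return best_lo, best_count


print(round(tm_celsius("ACGTTAGCATCGATCCAGTGACGTA"), 1))
print(best_tm_window({"g1": [60.0, 71.0], "g2": [62.5], "g3": [70.0, 80.0],
                      "g4": [61.0]}, width=3.0))
```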

23
Final optimization and scoring
  • A quality score is calculated for each probe.
  • CommOligo_PE is used to determine the thresholds,
    and the probes are optimized for maximum coverage
    and correctness.
  • The goal is to maximize NPV and C.
  • Cross-validation: the data are randomly divided
    into 10 subsets, one subset is held out as the
    test set, and the calibration is run 10 times.
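The cross-validation loop can be sketched generically. The `calibrate` and `evaluate` callables are hypothetical stand-ins for CommOligo_PE's threshold estimation and scoring, which the slides do not spell out. In the toy usage, calibration picks the mean training value and evaluation reports the fraction of held-out values at or below it.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def cross_validate(data: Sequence[T],
                   calibrate: Callable[[list[T]], float],
                   evaluate: Callable[[float, list[T]], float],
                   folds: int = 10, seed: int = 0) -> list[float]:
    """Randomly split `data` into `folds` subsets and hold each out once as the test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    subsets = [items[k::folds] for k in range(folds)]
    scores = []
    for k in range(folds):
        # Calibrate on the other subsets, then score on the held-out one.
        train = [x for j, subset in enumerate(subsets) if j != k for x in subset]
        threshold = calibrate(train)
        scores.append(evaluate(threshold, subsets[k]))
    return scores


values = [float(v) for v in range(100)]
scores = cross_validate(values,
                        calibrate=lambda xs: sum(xs) / len(xs),
                        evaluate=lambda t, xs: sum(x <= t for x in xs) / len(xs))
print(len(scores), sum(scores) / len(scores))
```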

24
Results
Training sets
26
Genome-wide analysis
Homologous sequence searches
28
Take home message
  • CommOligo works well with homologous sequences →
    3 stringent criteria → cDNA
  • It still works well at the same thresholds for
    genome-wide searches → oligo chip
  • Actual hybridization data are used
  • Better identity and minimum-energy filters
  • The optimal Tm for the hybridization reaction is
    based on the oligos selected after passing all
    the filters, not on all possible oligos
  • Iterative threshold optimization