Computational modeling of microarrays - PowerPoint PPT Presentation

Slides: 32
Provided by: zhan6

1
Computational modeling of microarrays
  • Li Zhang
  • Department of Biostatistics and Applied
    Mathematics
  • The University of Texas MD Anderson Cancer Center
  • URL: http://odin.mdacc.tmc.edu/zhangli

2
Part I
  • Introduction to Bioinformatics

3
Bio-nanotechnology: miniaturization and automation
[Figure: three panels, with scale bars of 100 mm and 200 nm.]
A. Multilayer elastomer microfluidics for cell
sorting and single-cell gene expression
profiling.
B. Nanomechanical sensor for DNA binding.
C. Semiconductor nanowire sensor for protein
binding.
(Hood et al., Science 306:640, 2004.)
4
High-throughput technologies lead to data
explosion
  • High throughput
    • Microarray: 0.1 mg sample on a 2 cm x 2 cm
      chip with 10^6 probe features, giving 10^6
      measurements.
  • Data explosion
    • DNA: Sequence. Mutation. Copy number.
      Methylation. DNA-protein binding.
    • RNA: Dynamic abundance.
    • Protein: Dynamic abundance. Chemical
      modification. Protein-protein interaction.

5
New opportunities
  • Global view: Characterize cellular life on a
    systemic level.
  • Systems biology: Integration of high-throughput
    data to build and characterize gene networks.
  • Biomarkers: Diagnosis of diseases. Identify risk
    factors for prevention. Treatment response
    markers for personalized medicine.

6
Network model of galactose utilization in yeast
Hood et al., Science. Vol 306. p640. (2004)
7
Bioinformatics in MD Anderson Cancer Center
  • An award-winning team
    • Microarray: CAMDA 2002, 2003, 2004.
    • Proteomics: PAMDA 2003.
    • MDACC Faculty Scholar Award 2002, 2003, 2004.
    • Mitchell Prize 2003.
  • Web site: http://bioinformatics.mdanderson.org
  • Graduate School (GSBS): http://gsbs.gs.uth.tmc.edu/

8
Challenges
  • Data quality: Noise and technical bias.
  • Complex data structure: Heterogeneity.
  • Biomarkers: Multiple testing problem.
  • Network: Curse of dimensionality.
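
The multiple testing problem above can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the number of tests (one million, roughly one per probe feature) and the 0.05 significance level are assumptions, not figures from the talk.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_threshold(num_tests, alpha):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / num_tests


# Assumed values for illustration: one test per probe feature.
NUM_TESTS = 1_000_000
ALPHA = 0.05

print(expected_false_positives(NUM_TESTS, ALPHA))
print(bonferroni_threshold(NUM_TESTS, ALPHA))
```

Testing every probe at the usual 0.05 level would flag tens of thousands of probes by chance alone, which is why a corrected threshold (Bonferroni here, as the simplest choice) is needed.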

9
Part II
  • Modeling Microarrays

10
Microarray Platforms
  • Spotted arrays
    • Inserts from cDNA libraries, PCR products, or
      oligonucleotides
    • Probed with labeled RNA or cDNA from 2 samples
  • Affymetrix GeneChip arrays
    • 25-mer oligonucleotides synthesized on a glass
      wafer
    • Probed with labeled RNA or cDNA from a single
      sample

11
Protocol of a microarray experiment
12
Affymetrix GeneChip Probe Arrays
[Figure: a GeneChip probe array (1.28 cm), an
enlarged hybridized probe cell (24 µm) with a
single-stranded, fluorescently labeled DNA target
on an oligonucleotide probe, and an image of a
hybridized probe array.]
  • Each probe cell or feature contains millions of
    copies of a specific oligonucleotide probe.
  • Over 250,000 different probes complementary to
    genetic information of interest.
13
Double helix on microarrays
  • The probe is a 25-mer DNA oligo

ATCAGCATACGAGAGAATGATGGAT

ATCAGCATACGACAGAATGATGGAT
AAUAGUCGUAUGCUCUCUUACUACCUAGC
cRNA fragment from solution
Average distance between probes is 80 Å
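
The sequences on this slide can be checked directly. The Python sketch below assumes that the first 25-mer is a perfect-match (PM) probe and the second its mismatch (MM) partner; the slide does not label them, and the helper names are made up for illustration. It finds where the two probes differ and confirms that the cRNA fragment, read in the opposite direction, contains the base-by-base complement of the first probe.

```python
# Sequences copied from the slide. The PM/MM labels are an assumption.
PM = "ATCAGCATACGAGAGAATGATGGAT"
MM = "ATCAGCATACGACAGAATGATGGAT"
CRNA = "AAUAGUCGUAUGCUCUCUUACUACCUAGC"

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(a, b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def make_mismatch(probe, position=13):
    """Replace the base at a 1-based position with its complement."""
    i = position - 1
    return probe[:i] + DNA_COMPLEMENT[probe[i]] + probe[i + 1:]


def pairing_rna(probe):
    """RNA bases that pair with the probe, listed in the probe's own order."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in probe)


diff = mismatch_positions(PM, MM)       # where the two probes differ
offset = CRNA.find(pairing_rna(PM))     # where the probe pairs on the fragment
print(diff, offset)
```

Under these assumptions the two probes differ only at the middle base, where the second probe carries the complement of the first, and the first probe pairs with the cRNA fragment without any mismatch.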
14
Technical factors affecting gene expression
measurements
  • Interaction between base pairs (stacking)
  • Interaction with microarray surface
  • Interaction with unintended targets (cross
    hybridization)
  • Kinetic process (equilibration and washing)
  • Physical properties of RNA sample
    • Degradation (missing 5' ends)
    • Alternative splicing (missing exons)
    • Secondary structure (RNA hairpins and loops)
    • Biotinylation

15
Technical factors affecting gene expression
measurements
  • Interaction between base pairs (stacking)
    • Nearest-neighbor model
  • Interaction with microarray surface
    • Position-dependent weights for stacking
      energies
  • Interaction with unintended targets (cross
    hybridization)
    • PDNN mean field theory
  • Kinetic process (equilibration and washing)
    • Langmuir and Sips models
  • Physical properties of RNA sample
    • Degradation (missing 5' ends)
    • Alternative splicing (missing exons)
    • Secondary structure (RNA hairpins and loops)
    • Biotinylation
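
The slide names the Langmuir and Sips models without giving formulas. The Python sketch below uses their textbook isotherm forms as an assumption about what is meant: the fraction of occupied probe sites as a function of target concentration `c`, binding constant `K`, and, for Sips, a heterogeneity exponent `a`. The parameter names are illustrative.

```python
def langmuir(c, K):
    """Langmuir isotherm: fraction of occupied sites, K*c / (1 + K*c)."""
    x = K * c
    return x / (1.0 + x)


def sips(c, K, a):
    """Sips isotherm: (K*c)**a / (1 + (K*c)**a).

    The exponent a (0 < a <= 1) broadens the curve to mimic a spread of
    binding energies; a = 1 recovers the Langmuir isotherm.
    """
    x = (K * c) ** a
    return x / (1.0 + x)


for conc in (0.1, 1.0, 10.0):
    print(conc, langmuir(conc, 1.0), sips(conc, 1.0, 0.5))
```

Both curves saturate at high concentration, which is one reason probe signal cannot be treated as simply proportional to target abundance.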

16
Assumption: two types of binding
  • Gene-specific binding: 25 nt of exactly
    complementary sequence (binding with the
    intended target).
  • Non-specific binding: Many (>5) mismatches or
    short stretches (binding with unintended
    targets).

17
Positional-Dependent Nearest-Neighbor (PDNN)
model of molecular interactions
Weighted sum of base-pair stacking energies
  • Gene-specific binding energy
  • Non-specific binding energy
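
The weighted sum of stacking energies can be sketched in a few lines of Python. Everything numeric here is a placeholder: the stacking energies and the triangular weight profile are made-up values chosen only to show the bookkeeping, not fitted PDNN parameters, and the function name is hypothetical. In the model, the gene-specific and non-specific energies would each use their own weights and stacking terms.

```python
from itertools import product


def binding_energy(probe, weights, stacking):
    """Weighted sum of nearest-neighbor stacking energies along a probe.

    probe    : DNA sequence, for example a 25-mer
    weights  : one weight per adjacent base pair (len(probe) - 1 values)
    stacking : dict mapping each dinucleotide, such as "AT", to an energy
    """
    if len(weights) != len(probe) - 1:
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * stacking[probe[k:k + 2]] for k, w in enumerate(weights))


# Placeholder parameters for illustration only, not fitted values.
stacking = {a + b: -1.0 for a, b in product("ACGT", repeat=2)}
n_pairs = 24
# Triangular profile: largest weight mid-probe, smaller towards both ends.
weights = [min(k + 1, n_pairs - k) / 12.0 for k in range(n_pairs)]

probe = "ATCAGCATACGAGAGAATGATGGAT"
energy = binding_energy(probe, weights, stacking)
print(energy)
```

A 25-mer has 24 adjacent base pairs, so each energy needs 24 weights and 16 dinucleotide stacking terms.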
18
PDNN model of probe signals
Probe Signal
Fitness
Constraints
  • N and B are the same on a microarray.
  • Nj is the same in a probe set.

Minimization of T
  • Energy parameters
  • B, N, Nj
Software available at
http://odin.mdacc.tmc.edu/zhangli/PerfectMatch
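
The slide's formulas for probe signal and fitness are not preserved in this transcript. The Python sketch below assumes one plausible form consistent with the labels above: the signal is a gene-specific term scaled by Nj, plus a non-specific term scaled by N, plus a background B, and the fitness is the mean squared difference between observed and predicted log signals. The function names and the exact functional form are assumptions, not the released PerfectMatch code.

```python
import math


def pdnn_signal(e_specific, e_nonspecific, nj, n, b):
    """Predicted probe signal under the assumed model form.

    e_specific    : gene-specific binding energy of the probe
    e_nonspecific : non-specific binding energy of the probe
    nj            : amount of the target gene (shared within a probe set)
    n             : amount of non-specific RNA (shared across the array)
    b             : background (shared across the array)
    """
    specific = nj / (1.0 + math.exp(e_specific))
    nonspecific = n / (1.0 + math.exp(e_nonspecific))
    return specific + nonspecific + b


def log_fitness(observed, predicted):
    """Mean squared difference between ln(observed) and ln(predicted)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    total = sum((math.log(o) - math.log(p)) ** 2
                for o, p in zip(observed, predicted))
    return total / len(observed)


# Toy numbers, for illustration only.
predicted = [pdnn_signal(e, 0.0, 2.0, 4.0, 1.0) for e in (-1.0, 0.0, 1.0)]
print(predicted)
print(log_fitness(predicted, predicted))
```

Fitting would then search over the energy parameters and over B, N, and Nj for the values that minimize this quantity, for example with a general-purpose numerical optimizer.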
19
Fitting PDNN model
[Figure axes: ln(signal), probe index.]
20
Energy parameters in PDNN model
[Figure panels: weight factors; stacking energy terms.]
21
Baseline of non-specific binding
Non-specific binding energy
22
Effects of Mismatches
  • A mismatch disrupts double helix formation.
  • Energetically, it is unfavorable for binding.
  • Its effect depends on the DNA sequence context.

23
Effect of mismatch at base 13 depends on the
nearest neighbors
[Figure labels: C, T, A, A, G.]
24
Sequence dependence of free energy cost of single
mismatch in DNA duplexes
25
Pattern of cross hybridization: MM and PM probes
bind to different molecules
[Figure axes: Var(ln MM), Var(ln PM).]
Data source: Affymetrix HG-U133 spike-in data
set. Large variation indicates response to
spike-ins. Number of arrays: 42. Number of probes
on an array: 0.5 million.
26
Microarray surface effects
  • DNA and RNA are negatively charged.
  • The glass surface is also charged.
  • Repulsion

27
Pattern of cross hybridization: bias towards the
5' end
[Figure label: 5' end.]
28
Sense and antisense
  • Upon binding, sense and antisense probes form the
    same double helix structure.
  • The same interactions should lead to the same
    binding energy.
  • The observed data contradict this prediction.

29
Contrast of sense and antisense probe signals
Model fitted
  • Y = -0.17 + 0.05 Nt + 0.05 Na + 0.02 Ng
  • R^2 = 0.67. Sample size = 875.

[Figure axis: ln(sense probe signal / antisense probe signal).]
30
Summary
  • Binding on array surface: Probe binding free
    energy can be approximated by a weighted sum of
    base-pair stacking energies, with the probe ends
    contributing less.
  • Mismatches: Mismatches disrupt hybridization,
    especially in cross hybridization. The effects of
    mismatches depend on sequence. The surface also
    has an effect.
  • Surface effects: Cross hybridization is biased
    towards the 5' end of the probes. Repulsion by
    the surface depends on nucleotides.

31
Acknowledgements
Ken Hess, Keith A. Baggerly, Kevin R. Coombes,
James Mitchell, Norris Clift, Lianchun Xiao,
Roberto Carta, Chunlei Wu, Haitao Zhao,
Kenneth D Aldape, Michael F Miles