The physics and calibration of Affymetrix microarrays - PowerPoint PPT Presentation

Provided by: chris520
1
The physics and calibration of Affymetrix microarrays
harry_at_biochem.ucl.ac.uk
Following discussions with Caroline, Christine,
Danielle, Eric, Hugh, Ilhem, Kevin, Lucy, Martin,
Martino, Michael, Mike and Paul
2
Animal Biology
Operation and tissue extraction
mRNA preparation
Molecular Biology
Chips are run
Chip calibration
Computational Biology
Differentially expressed genes are identified
3
Affymetrix scanners transform message into light
The Enigma scanner has low noise: SVDFGS is
definitely SVDFGS.
Would you prefer to listen to Radio Glasgow with
poor reception or Radio Budapest with clear
reception?
4
Affymetrix microarrays
5'-GGTGGGAATTGGGTCAGAAGGACTGTGGCTAGGCGC-3'
GGAATTGGGTCAGAAGGACTGTGGC  perfect match probe cells
GGAATTGGGTCACAAGGACTGTGGC  mismatch probe cells (central base 13 changed from G to C)
The probe cells are actually scattered across the chip.
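The mismatch probe on this slide differs from the perfect match only at the central (13th) base, which is replaced by its complement (G to C here). A minimal sketch of that rule, using the sequences shown on the slide; as on the slide, the probe is written in the same orientation as the transcript, so finding it is a plain substring test. The function name `make_mismatch` is illustrative only.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_mismatch(pm: str) -> str:
    """Return the mismatch probe: the perfect-match 25-mer with its
    central (13th) base replaced by the complementary base."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe")
    centre = 12  # 0-based index of base 13
    return pm[:centre] + COMPLEMENT[pm[centre]] + pm[centre + 1:]


transcript = "GGTGGGAATTGGGTCAGAAGGACTGTGGCTAGGCGC"
pm = "GGAATTGGGTCAGAAGGACTGTGGC"
mm = make_mismatch(pm)
print(pm in transcript)  # True
print(mm)                # GGAATTGGGTCACAAGGACTGTGGC
```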
5
Probe cells of an Affymetrix Gene chip contain
millions of identical 25-mers.
6
Affymetrix Gene chip-Hybridization
7
Affymetrix Gene chip-Fluorescence
8
Affymetrix probe set
9
(No Transcript)
10
Each gene is represented by 16 probe pairs (for
chip rgu34a). Each pair has a perfect match (a
25-base oligonucleotide that binds to the gene of
interest) and a mismatch (identical except that the
central base is changed).
Outliers?
11
Chip calibration
Correct Background, Normalise, Correct for Cross
Hybridisation, Expression Measure
High-level analysis, biological interpretation
12
Background Fluorescence needs to be corrected
e.g. MAS and RMA algorithms
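As an example of background correction, RMA is usually described as modelling each observed PM as signal plus background, PM = S + B, with S exponential (rate alpha) and B normal (mean mu, sd sigma), and replacing PM by the posterior mean of S. A minimal sketch of that closed form, assuming the three parameters have already been estimated from the chip (the estimation step is not shown, and the values in the test are hypothetical):

```python
import math

SQRT2 = math.sqrt(2.0)


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal CDF, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-x / SQRT2)


def _sf(x: float) -> float:
    """Standard normal upper tail, 1 - CDF(x)."""
    return 0.5 * math.erfc(x / SQRT2)


def rma_background(pm: float, mu: float, sigma: float, alpha: float) -> float:
    """Posterior mean E[S | PM] in the normal + exponential convolution model.
    The result lies strictly between 0 and pm for pm > 0."""
    a = pm - mu - sigma * sigma * alpha
    b = sigma
    num = _pdf(a / b) - _pdf((pm - a) / b)
    den = _cdf(a / b) - _sf((pm - a) / b)  # = Phi(a/b) + Phi((pm-a)/b) - 1
    return a + b * num / den
```

Bright probes are shifted down by roughly mu + sigma^2 * alpha, while dim probes are shrunk towards zero but never made negative.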
13
Camel distributions suggest that there are two
populations (detected and not detected?).
14
Chips need to be normalised against each other.
(In the figure, each chip is a different colour.)
e.g. invariant genes, lowess, quantiles
15
RMA uses Quantile normalisation at the probe level
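Quantile normalisation forces every chip to share one intensity distribution: sort each chip, average the k-th smallest values across chips, and give each probe the average for its rank. A minimal pure-Python sketch (ties are broken by probe position, a simplification):

```python
def quantile_normalise(chips):
    """chips: list of equal-length lists, one list of probe intensities per chip.
    Returns new lists in which every chip has the same sorted values."""
    n = len(chips[0])
    if any(len(c) != n for c in chips):
        raise ValueError("all chips must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_chips = [sorted(c) for c in chips]
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    result = []
    for c in chips:
        # Rank of each probe within its own chip (ties broken by position).
        order = sorted(range(n), key=lambda i: c[i])
        normalised = [0.0] * n
        for rank, i in enumerate(order):
            normalised[i] = reference[rank]
        result.append(normalised)
    return result


print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```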
16
Cross Hybridisation
MAS 5.0 (Affymetrix) corrects for
cross-hybridisation by subtracting the MisMatch
signal from the Perfect-Match.
RMA ignores the mismatches because they also hybridise
to the perfect-match target, so they contain real signal.
17
Expression Measure
The intensities of the multiple probes within a
probeset are combined into ONE measure of
expression
18
MAS 5.0 (Signal) takes the Tukey biweight mean
of log(PM - IM), where IM, the "ideal mismatch", is the MM
value, adjusted downwards whenever MM exceeds PM.
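The Tukey biweight is a robust average: values far from the median (measured in units of the median absolute deviation) get zero weight, so one bad probe pair barely moves the result. A minimal sketch, assuming the tuning constants c = 5 and eps = 1e-4 and, as a simplification, using MM directly with a small positive floor instead of Affymetrix's full ideal-mismatch rule. `mas5_like_signal` is a hypothetical name, not the Affymetrix implementation.

```python
import math
from statistics import median


def tukey_biweight(x, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean of the values in x."""
    m = median(x)
    mad = median(abs(v - m) for v in x)
    weights = []
    for v in x:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight: 1 at the median, falling to 0 at |u| = 1.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, x)) / sum(weights)


def mas5_like_signal(pm, mm, floor=2.0 ** -20):
    """Simplified MAS 5.0-style signal for one probe set (hypothetical sketch):
    biweight mean of log2(PM - MM), with PM - MM clamped at a small floor."""
    logs = [math.log2(max(p - q, floor)) for p, q in zip(pm, mm)]
    return 2 ** tukey_biweight(logs)


print(tukey_biweight([1, 2, 3, 100]))  # about 2.05: the outlier gets zero weight
```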
19
1-9 are different chips.
dChip and RMA model the systematic
hybridisation patterns when calibrating an
expression measure.
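RMA's summarisation treats a probe set's log2 intensities as a probes-by-chips table, log2 PM = overall + probe effect + chip effect + residual, and fits it robustly with Tukey's median polish; the expression value for each chip is overall + chip effect. A minimal sketch of that step only (background correction and normalisation are assumed done; dChip's multiplicative model is fitted differently and is not shown). The iteration limit and tolerance mirror common defaults but are assumptions here.

```python
from statistics import median


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey median polish of y[probe][chip] (log2 intensities).
    Returns (overall, probe_effects, chip_effects, residuals)."""
    n_probe, n_chip = len(y), len(y[0])
    z = [list(map(float, row)) for row in y]
    probe = [0.0] * n_probe
    chip = [0.0] * n_chip
    overall, old_sum = 0.0, 0.0
    for _ in range(max_iter):
        # Sweep out probe (row) medians.
        for i in range(n_probe):
            d = median(z[i])
            z[i] = [v - d for v in z[i]]
            probe[i] += d
        d = median(chip)
        chip = [v - d for v in chip]
        overall += d
        # Sweep out chip (column) medians.
        for j in range(n_chip):
            d = median(z[i][j] for i in range(n_probe))
            for i in range(n_probe):
                z[i][j] -= d
            chip[j] += d
        d = median(probe)
        probe = [v - d for v in probe]
        overall += d
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, probe, chip, z


def rma_expression(y):
    """Expression measure per chip: overall level plus chip effect."""
    overall, _, chip, _ = median_polish(y)
    return [overall + c for c in chip]


# Three probes with affinities 0, 1, 3 measured on three chips.
print(rma_expression([[5, 6, 8], [6, 7, 9], [8, 9, 11]]))  # [6.0, 7.0, 9.0]
```

The probe effects absorb the systematic "some probes are always brighter" pattern, which is exactly what a simple average across probes would mix into the expression value.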
20
Once chips have gone through the calibration
process, changes in gene expression between
conditions or over time can be observed.
m = log2(Fold Change), a = log2(Average Intensity)
The change in expression between two conditions
for all the genes on an array can be viewed on an
MA plot
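For two chips with intensities x and y for the same gene, m is the log2 ratio and a is commonly computed as the mean of the two log2 intensities (the log2 of their geometric mean). A minimal sketch under that convention:

```python
import math


def ma_values(x, y):
    """Per-gene M (log2 fold change) and A (mean log2 intensity) for two chips."""
    m = [math.log2(xi / yi) for xi, yi in zip(x, y)]
    a = [0.5 * math.log2(xi * yi) for xi, yi in zip(x, y)]
    return m, a


print(ma_values([8.0], [2.0]))  # ([2.0], [2.0]): a 4-fold change at intensity 4
```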
21
Sliding Z (Quackenbush 2002)
Z = (m - mean(m)) / standard deviation(m)
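"Sliding" means the mean and standard deviation of m are taken over genes of similar intensity rather than over the whole chip, so the Z score adapts to the intensity-dependent spread. A minimal sketch, assuming a window of neighbouring genes by intensity rank; the window size is a hypothetical parameter, not necessarily Quackenbush's choice.

```python
from statistics import mean, pstdev


def sliding_z(m, a, window=101):
    """Z = (m - mean(m)) / sd(m), with mean and sd computed over the `window`
    genes closest in intensity rank (a) to each gene."""
    n = len(m)
    order = sorted(range(n), key=lambda i: a[i])
    half = window // 2
    z = [0.0] * n
    for rank, i in enumerate(order):
        lo = max(0, min(rank - half, n - window))
        hi = min(n, lo + window)
        local = [m[order[k]] for k in range(lo, hi)]
        sd = pstdev(local)
        z[i] = (m[i] - mean(local)) / sd if sd > 0 else 0.0
    return z


print(sliding_z([0, 0, 0, 0, 10], [1, 2, 3, 4, 5], window=5))
# [-0.5, -0.5, -0.5, -0.5, 2.0]
```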
22
(Figure: signal and background (bg))
At low intensities, the standard deviation is too low.
23
Barenco 2003
Spike-in measurements show there remains
considerable signal at low concentrations.
24
The non-linearity means that Fold Change
(Intensity) is NOT the same as Fold Change
(Transcript)
This causes complications when comparing chips
against mathematical models of changes in gene
expression
It is difficult to establish when a gene is NOT
expressed
The statistical space is also non-linear
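Even the simplest departure from proportionality, a constant background added to a linear response, is enough to make the two fold changes differ. A worked example, assuming a hypothetical response I = background + gain * concentration (the numbers are illustrative, not measured):

```python
def intensity(conc, background=10.0, gain=1.0):
    """Hypothetical linear-plus-background response: I = background + gain * conc."""
    return background + gain * conc


c1, c2 = 1.0, 2.0
fc_transcript = c2 / c1                        # 2.0
fc_intensity = intensity(c2) / intensity(c1)   # 12 / 11, about 1.09
```

At low concentrations a true doubling of transcript shows up as only about a 9% change in light; at high concentrations (say 1000 versus 2000) the intensity ratio approaches 2 again.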
25
Cross Hybridisation
MAS 5.0 (Affymetrix) corrects for
cross-hybridisation by subtracting the MisMatch
signal from the Perfect-Match.
RMA ignores the mismatches because they also hybridise
to the perfect-match target, so they contain real signal.
How can you measure cross-hybridisation without
using the MisMatch signal?
There is a need for a model of the physics of
hybridisation (Naef and Magnasco 2003)
26
GC content is important.
A-T pairs form two hydrogen bonds; G-C pairs form
three hydrogen bonds.
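A quick way to see the effect of GC content is to count the hydrogen bonds a probe would form if fully paired, using the perfect-match 25-mer from the probe-set slide as an example. This is only a crude proxy for binding strength; it ignores stacking, position and labelling effects.

```python
# Hydrogen bonds contributed by each base when Watson-Crick paired.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(b in "GC" for b in seq) / len(seq)


def hydrogen_bonds(seq):
    """Total hydrogen bonds if seq is fully paired with its complement."""
    return sum(H_BONDS[b] for b in seq)


pm = "GGAATTGGGTCAGAAGGACTGTGGC"
print(gc_fraction(pm), hydrogen_bonds(pm))  # 0.56 64
```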
27
(No Transcript)
28
The fraction of overlap between transcript and
probe depends upon the position along the probe
(Maibaum and SantaLucia).
Imagine that all your fragments were of length 20,
and imagine dropping them randomly along a line of 25:
positions near the centre of the probe are covered
much more often than positions at the ends.
There will also be duplex breathing and a torque
between the duplex and the unbound fragment.
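The fragment-dropping thought experiment can be computed exactly. The slide does not say how fragments are placed, so this sketch assumes each 20-base fragment lands uniformly at random entirely within the 25-base probe; the function name and that placement rule are illustrative assumptions.

```python
from fractions import Fraction


def coverage_profile(probe_len=25, frag_len=20):
    """Fraction of random placements covering each probe position, assuming a
    fragment of length frag_len lies uniformly at random within the probe."""
    starts = range(probe_len - frag_len + 1)  # 6 possible starts for 25 and 20
    n = len(starts)
    return [Fraction(sum(s <= p < s + frag_len for s in starts), n)
            for p in range(probe_len)]


profile = coverage_profile()
print([str(f) for f in profile[:6]])  # ['1/6', '1/3', '1/2', '2/3', '5/6', '1']
```

The profile is a trapezoid: the middle 15 positions are always covered, while the end bases are covered in only one placement out of six.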
29
Biotin labelling interferes with the hybridisation.
C and T (pyrimidines) in the target are labelled. So a
probe G binds a labelled target C less strongly than a
probe C binds a target G, and probe A : target T binding
is weaker than probe T : target A.
If the target region complementary to the probe contains
no C or T, it will hybridise well but give no fluorescence.
If it is all C and T, it will have difficulty hybridising.
C and T within your mRNA fragment but immediately
outside the probe-binding region will fluoresce and not
interfere with hybridisation.
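The trade-off can be made concrete by splitting a labelled fragment's pyrimidines into those inside the probe-binding window (which interfere with hybridisation) and those in the flanks (which fluoresce freely). A minimal sketch; the fragment and window in the example are made up, and U is counted alongside T for RNA targets.

```python
# Pyrimidines carry the biotin label (T in DNA notation, U in RNA).
LABELLED = set("CTU")


def labelled_counts(target, start, length):
    """Return (inside, outside): labelled bases within the probe-binding window
    target[start:start + length] and in the rest of the fragment."""
    inside = sum(b in LABELLED for b in target[start:start + length])
    outside = sum(b in LABELLED for b in target) - inside
    return inside, outside


print(labelled_counts("CTAGGACTTC", 2, 5))  # (1, 5)
```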
30
Naef and Magnasco 2003 - a key paper
31
e.g. if perfect match base 13 is A, mismatch base 13 is T,
and the complementary base in the mRNA is also T/U.
Size is important:
Pyrimidines (C, T) are small. There will be no steric
hindrance between the pyrimidine in the mismatch and
the pyrimidine in the mRNA of interest.
Purines (G, A) are large. There will be a large steric
hindrance between the purine in the mismatch and the
purine in the mRNA of interest.
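Because the mismatch base is the complement of the perfect-match base, and the target base opposite it is also that complement, the central mismatch always pairs a base with a copy of itself. A small sketch that classifies the resulting clash (T stands for T/U in the mRNA):

```python
PURINES = set("AG")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(pm_base):
    """For a perfect-match central base, return
    (mismatch base, target base, clash type)."""
    mm_base = COMPLEMENT[pm_base]
    target_base = COMPLEMENT[pm_base]  # the mRNA base opposite this position
    kind = "purine-purine" if mm_base in PURINES else "pyrimidine-pyrimidine"
    return mm_base, target_base, kind


print(central_mismatch("A"))  # ('T', 'T', 'pyrimidine-pyrimidine'): little hindrance
print(central_mismatch("T"))  # ('A', 'A', 'purine-purine'): large hindrance
```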
32
Naef and Magnasco (2003)
33
From Mei et al. (2003, PNAS): hybridisation
relative to A (C is red, G is green, T is yellow).
Affymetrix design their arrays using increasingly
sophisticated models of the physical chemistry of
hybridisation
34
Zhang, Miles and Aldape (2003)
Their model is named Position Dependent Nearest
Neighbour (PDNN)
PDNN has 24 weight factors for Gene Specific
Binding, 24 factors for Non-Specific Binding and
16 stacking energy parameters
They fit their model with a dataset of 5,000,000
probe measurements (40 chips)
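The parameter counts follow from the 25-mer: there are 24 neighbouring base pairs along the probe (one weight per pair position) and 16 possible dinucleotides (one stacking energy each). A minimal sketch of the position-weighted nearest-neighbour sum, E = sum over k of w_k * eps(b_k, b_{k+1}); the weights and energies are placeholders, not the fitted PDNN values, and the full PDNN intensity formula built on this energy is not shown.

```python
from itertools import product

BASES = "ACGT"
# The 16 dinucleotides that each get a stacking energy.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def pdnn_energy(probe, weights, stacking):
    """Position-weighted nearest-neighbour energy of a probe:
    sum over k of weights[k] * stacking[probe[k:k + 2]]."""
    if len(weights) != len(probe) - 1:
        raise ValueError("need one weight per neighbouring pair")
    return sum(w * stacking[probe[k:k + 2]] for k, w in enumerate(weights))


# Placeholder parameters: uniform weights and energies.
weights = [1.0] * 24
stacking = {d: 1.0 for d in DINUCLEOTIDES}
print(len(DINUCLEOTIDES), pdnn_energy("A" * 25, weights, stacking))  # 16 24.0
```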
35
Naef and Magnasco (2003)
The model contains only position specific
affinities for each base (fitted using 80 chips)
A low order function can be fitted to the
hybridisation for a given base at a given
position. The total hybridisation for the 25 base
sequence is then the sum of the local
hybridisations.
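The additive model can be written as affinity = sum over positions k of mu_b(k), where b is the base at position k and each mu_b is a low-order function of position. A minimal sketch, assuming each mu_b is a polynomial in k (k = 1..25) with hypothetical coefficients; fitted Naef and Magnasco values are not reproduced here.

```python
def position_affinity(seq, coeffs):
    """Sum of position-specific base affinities.
    coeffs[b] = [c0, c1, c2, ...] defines mu_b(k) = c0 + c1*k + c2*k**2 + ...
    for position k = 1..len(seq)."""
    total = 0.0
    for k, base in enumerate(seq, start=1):
        total += sum(c * k ** d for d, c in enumerate(coeffs[base]))
    return total


pm = "GGAATTGGGTCAGAAGGACTGTGGC"
# Toy coefficients: every G adds 1, all other bases add nothing.
print(position_affinity(pm, {"A": [0.0], "C": [0.0], "G": [1.0], "T": [0.0]}))  # 11.0
```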
36
If your probe contains lots of As in the centre:
The complementary sequence will contain lots of
Ts (biotin interference).
There will be lots of A-T pairs, which means weak
2-hydrogen bonds.
37
If your probe contains lots of Cs in the centre:
The complementary sequence will contain lots of
Gs (no biotin interference).
There will be lots of G-C pairs, which means strong
3-hydrogen bonds.
38
Wu and Irizarry report spike-in yeast controls on
a human chip.
This measures non-specific hybridisation directly
Theory is comparable to experiment
Not as clean as Naef
Many unchanging genes do not express!
39
Wu and Irizarry (2004) have written GCRMA (which
is available now in Bioconductor)
As theory is comparable to experiment, it can be
used to estimate the intrinsic stochastic
uncertainty of the hybridisation process.
Lots of close sequences will hybridise to a given
probe. Wu and Irizarry model the variation in
hybridisation of these similar processes using a
statistical model.
GCRMA determines the contribution to the PM from
Signal and from Non-Specific Hybridisation
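The GCRMA decomposition can be pictured as PM = O + N + S (optical noise, non-specific hybridisation, signal), with log N drawn around a mean set by the probe's sequence affinity. The sketch below only illustrates that structure: all parameter values are hypothetical, and the naive "subtract the expected N" estimate is a stand-in for illustration, not GCRMA's actual estimator.

```python
import math
import random


def simulate_pm(signal, affinity, optical=30.0, h0=2.0, h1=1.0, sigma=0.5, rng=None):
    """Draw PM = O + N + S with log N ~ Normal(h0 + h1 * affinity, sigma^2).
    All parameter values are hypothetical."""
    rng = rng or random.Random(0)
    nsb = math.exp(rng.gauss(h0 + h1 * affinity, sigma))
    return optical + nsb + signal


def naive_signal(pm, affinity, optical=30.0, h0=2.0, h1=1.0, sigma=0.5, floor=1.0):
    """Subtract optical noise and the expected non-specific binding
    (the mean of a log-normal, exp(mu + sigma^2 / 2)); clamp at a floor."""
    expected_nsb = math.exp(h0 + h1 * affinity + sigma ** 2 / 2)
    return max(pm - optical - expected_nsb, floor)


print(round(naive_signal(1000.0, 0.0), 1))  # about 961.6
```

Because the actual N of each probe scatters around its expected value, every such subtraction carries its own error, which is one way to see why per-probe cross-hybridisation correction adds noise.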
40
GCRMA suggests that many probes on the chip do
not detect signal.
41
GCRMA produces a good linear relationship between
intensity and concentration
42
Standard deviation of fold change as a function
of intensity
GCRMA is noisier than RMA because each PM value
undergoes a noisy cross-hybridisation subtraction.
(Figure legend: GCRMA, RMA, MAS)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
GCRMA makes the global properties of chips much
more comparable. In particular, it is much better
than RMA at removing genes with little emission
over and above the non-specific hybridisation.
GCRMA produces a linear relationship between
light and transcript to much lower concentrations.
The subtraction of cross-hybridisation adds to
the noise. However, this noise is much lower than
that of MAS at low-to-middle concentrations.
47
Can the algorithms be improved further?
48
Hekstra et al. (2003) show that Affymetrix chips
follow Langmuir adsorption isotherms i.e. they
chemically saturate at large concentrations in a
well understood manner.
The affinities show a slight kink, suggesting
they can be improved by including saturation
effects
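A Langmuir isotherm makes intensity rise linearly at low concentration and level off at a saturation value, I = background + I_max * c / (c + K). A minimal sketch of the isotherm and its inverse, with hypothetical parameter values (Hekstra et al. fit sequence-dependent affinities, which are not reproduced here):

```python
def langmuir_intensity(conc, i_max, k_d, background=0.0):
    """Langmuir isotherm: I = background + i_max * c / (c + k_d)."""
    return background + i_max * conc / (conc + k_d)


def invert_langmuir(intensity, i_max, k_d, background=0.0):
    """Concentration implied by an intensity below saturation."""
    bound = intensity - background
    if not 0 <= bound < i_max:
        raise ValueError("intensity outside the invertible range")
    return k_d * bound / (i_max - bound)


print(langmuir_intensity(1.0, 100.0, 1.0))  # 50.0: half-saturated at c = k_d
```

Near saturation small intensity changes correspond to large concentration changes, so inverting the isotherm rather than assuming linearity matters most for the brightest probes.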
49
The corrections are for non-specific
hybridisation, yet some probes will be prone to
specific cross hybridisation from other genes -
see talk by Eric
Outliers will need to be found and removed
50
Comparing the probes in a biological replicate
Even after using GCRMA the variation does not
look random at low intensities. It looks like
there is still a systematic bias, or there
remains a background contribution to the PM signal
51
There appear to be two populations of probes
D-detected
U-undetected
At present, expression measures (GCRMA, RMA, MAS)
combine all the probes within a probeset
Should all the probes below the peak in variance
be ignored?
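One way to act on this question is to bin probes by intensity, compute the replicate variance in each bin, and keep only probes brighter than the bin where the variance peaks. A minimal sketch of that hypothetical rule; the bin count and the choice of the peak bin's upper edge as the cut-off are assumptions, not a published method.

```python
from statistics import pvariance


def detected_mask(a, m, n_bins=10):
    """Flag probes brighter than the intensity bin with the largest replicate variance.
    a: average log intensity per probe; m: log ratio between two replicates."""
    if len(a) < n_bins:
        raise ValueError("need at least one probe per bin")
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = len(a) // n_bins
    bins = [order[k * size:(k + 1) * size] for k in range(n_bins - 1)]
    bins.append(order[(n_bins - 1) * size:])  # last bin takes any remainder
    variances = [pvariance([m[i] for i in b]) for b in bins]
    peak = max(range(n_bins), key=lambda k: variances[k])
    threshold = max(a[i] for i in bins[peak])
    return [ai > threshold for ai in a]


a = [1, 2, 3, 4, 5, 6, 7, 8]
m = [0, 0, 5, -5, 1, -1, 0.1, -0.1]
print(detected_mask(a, m, n_bins=4))  # [False, False, False, False, True, True, True, True]
```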
52
Can we do better on the image processing?
Affymetrix data
53
Gung-Ho Conclusions
The calibration of Affymetrix chips is a very
active and quickly evolving research area. Nearly all
the references in my talk are from 2003 or later!
GCRMA seems to have all the properties you would
expect from a correct calibration protocol. It is
available NOW in Bioconductor for FREE and will
help biologists and analysts.
54
Affymetrix calibration requires bioinformatix,
physix and statistix to work (and live) in
harmony.
55
Computer Scientists
56
Dadda (yesterday)
57
Quantile normalisation assumes the chips have the
same underlying distribution of intensities. For
some experiments, this is not the case (and what
if you wish to compare 1000 chips?)