Title: Signal Detection, Estimate of Fold Change
1Signal Detection, Estimate of Fold Change
Differential Expression in Affymetrix GeneChips
2- Each GeneChip measures 12 000 genes, each gene represented by 11 to 20 pairs of PM (perfect match) and MM (mismatch) oligo probes scattered across the chip.
- Intensity data from each chip is stored in a .cel file containing an intensity value at each site
- Bioconductor or MAS software uses the information in the .cdf file to interpret this as an intensity value for each PM and each MM probe.
3- Given a set of typically 16 PM and MM intensity values (× number of replicate chips in the experiment), how can we obtain a measure of mRNA expression for a given gene?
- Either as an absolute mRNA concentration
- Or as a relative change in mRNA concentration between treatments
4- (Absolute concentration) No easy answer
- (Relative expression between treatments) Expression measures such as MAS5, RMA and Li-Wong can be useful.
- Bioconductor provides built-in functions for these measures
5MAS5 (MicroArray Suite v5)
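1. Background correction using an ideal mismatch (IM)
$V = PM - IM$
where $IM = MM$ if $MM < PM$, and something less than $PM$ otherwise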
2. Tukey biweight average of logged Vs within
probeset (summarisation)
$\text{SignalLogValue} = \text{TukeyBiweight}(\log_2 V_1, \ldots, \log_2 V_n)$
6- 3. Optional scaling factor $sf$
4. Final output is
$\text{ReportedValue}(i) = sf \times 2^{\text{SignalLogValue}_i}$
the reported value of the $i$th probeset
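The summarisation step can be sketched in Python as follows. This is a minimal illustration, not the Affymetrix implementation: the tuning constants `c` and `eps`, the floor `delta`, the use of base-2 logs, and the names `tukey_biweight` and `mas5_signal` are assumptions made for the example. The background value subtracted from each PM (called `im` here) is taken as a given input rather than computed.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (a robust mean) of a list of numbers."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)  # scaled distance from the median
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # far outliers get weight 0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total

def mas5_signal(pm, im, sf=1.0, delta=2.0 ** -20):
    """MAS5-style signal for one probeset (hypothetical helper).

    pm, im: PM intensities and the background values subtracted from them.
    sf: optional scaling factor; delta: small floor that keeps the log defined.
    """
    v = [max(p - i, delta) for p, i in zip(pm, im)]
    signal_log_value = tukey_biweight([math.log2(x) for x in v])
    return sf * 2.0 ** signal_log_value
```

Because probes far from the median receive zero weight, a single aberrant probe has little effect on the probeset value.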
7RMA (Robust Multi-array Average)
Irizarry et al. Biostatistics, 4 (2003) 249-264
1. Background correction
Subtract from the PMs a probe-specific background correction, using a model in which the observed intensity is the sum of (exponential) signal and (normal) noise.
2. Quantile normalisation
Assuming multiple replicates of each experiment, this adjusts intensities so that the distribution of intensities is the same for all chips within a set of replicates.
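A minimal Python sketch of quantile normalisation, assuming each chip is given as a plain list of intensities of equal length. The function name `quantile_normalise` is hypothetical, and ties are broken by probe order rather than averaged, which is a simplification.

```python
def quantile_normalise(chips):
    """Force every chip to share the same intensity distribution.

    chips: list of equal-length lists, one list of intensities per chip.
    Returns new lists; the inputs are not modified.
    """
    n_chips = len(chips)
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_chips) / n_chips for k in range(n_probes)]
    normalised = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference quantile
        normalised.append(out)
    return normalised
```

After this step every chip contains exactly the same set of values; only the assignment of values to probes differs between chips.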
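8- 4. Average across the 16 probes in the probeset using median polish summarisation,
i.e., fit to the model
$\log_2(PM_{ij}) = \mu_i + \alpha_j + \epsilon_{ij}$
for probe $j$ on chip $i$ (PM values background-corrected and normalised), where $\mu_i$ is the required measure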
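Median polish can be sketched in Python as below. This follows Tukey's standard alternating row/column median sweep and is not taken from the RMA source code. The matrix layout (rows are probes, columns are chips), the iteration limit and the name `median_polish` are assumptions for the example.

```python
from statistics import median

def median_polish(y, n_iter=10, tol=1e-8):
    """Fit y[j][i] ~ overall + row_eff[j] + col_eff[i] by median polish.

    y: list of rows (probes), each a list of log2 intensities per chip.
    Returns (overall, row_eff, col_eff).
    """
    n_rows, n_cols = len(y), len(y[0])
    resid = [list(row) for row in y]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row (probe) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for j in range(n_rows):
            resid[j] = [v - row_med[j] for v in resid[j]]
            row_eff[j] += row_med[j]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column (chip) medians out of the residuals.
        col_med = [median([resid[j][i] for j in range(n_rows)]) for i in range(n_cols)]
        for j in range(n_rows):
            resid[j] = [resid[j][i] - col_med[i] for i in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff

# Log-scale expression for chip i is overall + col_eff[i].
overall, probe_eff, chip_eff = median_polish([[5.0, 7.0], [6.0, 8.0], [4.0, 6.0]])
expression = [overall + c for c in chip_eff]
```

In this toy input the data are exactly additive (chip levels 5 and 7, probe effects 0, 1 and -1), so the fit recovers the chip levels.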
9Affymetrix Latin Square experiment
- 14 genes spiked at cyclic permutations of the 14 concentrations (0, 0.25, 0.5, 1, ..., 1024) pM into a background of human pancreas cRNA
- Hybridised onto 14 arrays
- 3 replicates of experiment
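The cyclic design can be illustrated with a short Python sketch. The doubling series between 1 and 1024 pM and the direction of the cyclic shift are assumptions, and `latin_square` is a hypothetical helper; the actual gene-to-chip assignment in the Affymetrix experiment is not reproduced here.

```python
def latin_square(concentrations):
    """Row g, column c: concentration of gene g on chip c (cyclic shift)."""
    n = len(concentrations)
    return [[concentrations[(g + c) % n] for c in range(n)] for g in range(n)]

# 0 pM followed by 0.25, 0.5, 1, 2, ..., 1024 pM: 14 levels in total.
concs = [0.0] + [2.0 ** k for k in range(-2, 11)]
design = latin_square(concs)
```

Each gene sees every concentration exactly once across the 14 chips, and each chip carries every concentration exactly once across the 14 genes.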
10[Figure: GENES vs. CHIPS]
11Raw data from .cel files
Affy spike-in experiment, gene 37777_at (red: PM; black: MM)
12[Figure: rma, mas5 and conc (log scale); labels a-n, q]
13[Figure: rma, log2(mas5) and conc (log scale); labels a-n, q]
14[Figure: rma, log2(mas5) and conc (log scale); labels a-n, q]
15[Figure: Gene 37777_at, annotated with Background, 64 pM, Saturation and 1 pM]
17Lesson: Expression measures underestimate fold change!
18[Figure: rma, mas5 and conc (log scale); labels a-n, q]
19[Figure: GENES vs. CHIPS]
20[Figure: rma, mas5 and conc (log scale)]
21[Figure: rma, mas5 and conc (log scale)]
22GeneLogic Dilution/Mixture study
- 20 µg/200 ml solutions of liver and central nervous system cell line cRNA diluted to samples of (20, 10, 7.5, 5, 2.5, 1.25) µg/200 ml
- Hybridised onto Affymetrix HG_U95av2 chips
- 5 replicates of each dilution/mixture
24- M = log(liver 20 µg) − log(liver 1.25 µg)
- A = [log(liver 20 µg) + log(liver 1.25 µg)] / 2
25M = log(CNS 10 µg) − log(liver 10 µg)
A = [log(CNS 10 µg) + log(liver 10 µg)] / 2
26Pairwise comparisons of M = log(CNS 10 µg) − log(liver 10 µg)
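M and A values for a pair of samples can be computed as in the sketch below. It assumes base-2 logs and unlogged, strictly positive expression values; `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(x, y):
    """Return (M, A) lists for paired expression values x and y.

    M is the log ratio (difference of logs); A is the mean log intensity.
    """
    m = [math.log2(a) - math.log2(b) for a, b in zip(x, y)]
    a_mean = [(math.log2(a) + math.log2(b)) / 2.0 for a, b in zip(x, y)]
    return m, a_mean

# A gene measured at 8 in one sample and 2 in the other: M = 2, A = 2.
m, a = ma_values([8.0], [2.0])
```

Plotting M against A shows whether the estimated log fold change depends on overall intensity.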
27Langmuir Adsorption Model
- Assume
- (Adsorption) Target mRNA attaches to probes at a rate proportional to the concentration of specific target mRNA and the fraction of unoccupied probes
- (Desorption) Target mRNA detaches from probes at a rate proportional to the fraction of occupied probes
- ⇒ At equilibrium, intensity I(x) at target concentration x follows the Langmuir isotherm
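A worked sketch of these assumptions: writing θ for the fraction of occupied probes, setting the adsorption rate k_a x (1 − θ) equal to the desorption rate k_d θ gives θ = x / (x + K) with K = k_d / k_a. If intensity is taken to be linear in θ (an assumption), then I(x) = a + b x / (x + K), where a is a background level and a + b is the saturation level. The Python below uses hypothetical parameter values, not fitted ones, to show how this shape compresses a true two-fold change.

```python
def langmuir_intensity(x, a=100.0, b=10000.0, K=50.0):
    """Intensity at target concentration x (pM) under a Langmuir isotherm.

    a: background intensity, b: saturation amplitude, K: concentration at
    half saturation. All three values are illustrative assumptions.
    """
    return a + b * x / (x + K)

def observed_fold_change(x_low, x_high):
    """Ratio of intensities for a given pair of true concentrations."""
    return langmuir_intensity(x_high) / langmuir_intensity(x_low)

# True fold change is 2 in each case; the intensity ratio is smaller.
low = observed_fold_change(0.25, 0.5)      # dominated by background
mid = observed_fold_change(8.0, 16.0)      # closest to the true value
high = observed_fold_change(512.0, 1024.0) # dominated by saturation
```

With these parameters the intensity ratio is below 2 throughout, and falls furthest below it at the lowest and highest concentrations.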
29Raw data from .cel files
Affy spike-in experiment, gene 37777_at (red: PM; black: MM)
32- Individual probes have very different responses depending on their nucleotide sequence!
- Temperature, pH, wafer effects, time to reach equilibrium etc. are also important
- Role (and usefulness) of MMs is not clear
- The problem of extracting absolute concentration values from .cel file data is still not solved
38Summary
- Expression measures such as MAS5, RMA, Li-Wong provide measures of relative concentrations of the same gene under different treatments
- But not of different genes within the same treatment
- Fold change is underestimated at high concentrations (saturation) or low concentrations (background and random noise)
- Normalisation (e.g. quantile normalisation in RMA) should only be used within a set of replicates
- Scaling (e.g. MAS5) assumes the same total target concentration between replicates/treatments
- There is no reliable measure of absolute target concentration from intensity data