Title: Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li
1Bias, Variance, and Fit for Three Measures of
Expression: AvDiff, Li and Wong's, and AvLog(PM-BG)
- Rafael A. Irizarry
- Department of Biostatistics, JHU
- (joint work with Bridget Hobbs and Terry Speed,
Walter and Eliza Hall Institute of Medical Research)
2Summary
- Summarize the expression level of a probe set by
Average Log2(PM-BG)
- PMs need to be normalized
- Background makes no use of probe-specific MM
- Evaluate and compare through bias, variance and
model fit to AvDiff and the Li Wong algorithm
- Use Gene Logic spike-in and dilution study
- All three expression measures performed well
- AvLog(PM-BG) is arguably the best of the three
3SD vs. Avg of Defective Probes
4Normalization at Probe Level
5Spike-In Experiments
- Add concentrations (0.5 pM to 100 pM) of 11 foreign
species cRNAs to hybridization mixture
- Set A: 11 control cRNAs were spiked in, all at
the same concentration, which varied across
chips.
- Set B: 11 control cRNAs were spiked in, all at
different concentrations, which varied across
chips. The concentrations were arranged in a 12x12
cyclic Latin square (with 3 replicates)
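As a rough illustration of the Set B layout, a cyclic Latin square can be built by shifting one list of concentration levels by one position per row. This is only a sketch: the level labels below are hypothetical placeholders (the slides do not list the 12 concentrations), and which dimension corresponds to chips and which to transcripts is an assumption.

```python
def cyclic_latin_square(levels):
    """Return an n x n square where row r is `levels` shifted left by r."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels standing in for the 12 concentration levels.
levels = [f"level_{i}" for i in range(12)]
square = cyclic_latin_square(levels)

# Each level appears exactly once in every row and every column.
for row in square:
    assert sorted(row) == sorted(levels)
for col in zip(*square):
    assert sorted(col) == sorted(levels)
```

The cyclic shift guarantees that every level is paired once with every row and once with every column, which is the property that makes the design balanced.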
6Set A Probe Level Data (12 chips)
7What Did We Learn?
- Don't subtract or divide by MM
- Probe effect is additive on log scale
- Take logs
8Why Remove Background?
9Background Distribution
10Average Log2(PM-BG)
- Normalize probe level data
- Compute BG (background mean) by estimating the
mode of the MM distribution
- Subtract BG from each PM
- If PM-BG < 0, use minimum of positives divided by
2
- Take average of the log2 values
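The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the mode is estimated with a crude histogram (the slides do not say which mode estimator was used), the "minimum of positives" is assumed to be taken within the probe set, and differences equal to zero are treated like negative ones. Normalization is assumed to have been done already.

```python
import math


def estimate_mode(values, n_bins=50):
    """Crude mode estimate: midpoint of the fullest histogram bin (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return float(lo)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


def av_log_pm_bg(pm, bg):
    """Average log2(PM - BG) over one probe set, given a global background bg."""
    diffs = [p - bg for p in pm]
    positives = [d for d in diffs if d > 0]
    if not positives:
        raise ValueError("no PM value exceeds the background")
    # Non-positive differences are replaced by half the smallest positive one.
    floor = min(positives) / 2.0
    adjusted = [d if d > 0 else floor for d in diffs]
    return sum(math.log2(d) for d in adjusted) / len(adjusted)


# Toy probe set: differences are 32, 64, -4, 128; the -4 becomes 32 / 2 = 16.
print(av_log_pm_bg([132, 164, 96, 228], bg=100))  # 5.5
```

In practice `bg` would come from something like `estimate_mode(all_mm_values)` over the whole array, so the background is global rather than probe-specific.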
11Expression after Normalization
12Expression Level Comparison
13Spike-In B
Probe Set   Conc 1 (pM)   Conc 2 (pM)   Rank
BioB-5      100           0.5           1
BioB-3      0.5           25.0          2
BioC-5      2.0           75.0          4
BioB-M      1.0           37.5          4
BioDn-3     1.5           50.0          5
DapX-3      35.7          3.0           6
CreX-3      50.0          5.0           7
CreX-5      12.5          2.0           8
BioC-3      25.0          100           9
DapX-5      5.0           1.5           10
DapX-M      3.0           1.0           11
Later we consider 23 different combinations of
concentrations
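The Rank column in the Spike-In B table appears to order probe sets by the size of the fold change between the two concentrations, with tied probe sets sharing the larger rank number. Assuming that reading (the slides do not state the rule), the sketch below reproduces the column from the two concentration columns; the function name is hypothetical.

```python
import math

# Concentrations (Conc 1, Conc 2) copied from the Spike-In B table.
spike_in_b = {
    "BioB-5": (100.0, 0.5),
    "BioB-3": (0.5, 25.0),
    "BioC-5": (2.0, 75.0),
    "BioB-M": (1.0, 37.5),
    "BioDn-3": (1.5, 50.0),
    "DapX-3": (35.7, 3.0),
    "CreX-3": (50.0, 5.0),
    "CreX-5": (12.5, 2.0),
    "BioC-3": (25.0, 100.0),
    "DapX-5": (5.0, 1.5),
    "DapX-M": (3.0, 1.0),
}


def fold_change_ranks(concs, tol=1e-12):
    """Rank by |log2(conc1 / conc2)|, largest first; ties share the larger rank."""
    score = {name: abs(math.log2(c1 / c2)) for name, (c1, c2) in concs.items()}
    # A probe set's rank is the number of probe sets with a score at least as large.
    return {
        name: sum(1 for other in score.values() if other >= s - tol)
        for name, s in score.items()
    }


for name, rank in fold_change_ranks(spike_in_b).items():
    print(f"{name:8s} {rank}")
```

Under this rule BioC-5 (2.0 vs 75.0) and BioB-M (1.0 vs 37.5) have the same 37.5-fold change, which would explain why both carry rank 4 and no probe set carries rank 3.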
14Differential Expression
15Differential Expression
16Differential Expression
17Differential Expression
18Observed Ranks
Gene      AvDiff   MAS 5.0   LiWong   AvLog(PM-BG)
BioB-5    6        2         1        1
BioB-3    16       1         3        2
BioC-5    74       6         2        5
BioB-M    30       3         7        3
BioDn-3   44       5         6        4
DapX-3    239      24        24       7
CreX-3    333      73        36       9
CreX-5    3276     33        3128     8
BioC-3    2709     8572      681      6431
DapX-5    2709     102       12203    10
DapX-M    165      19        13       6
Top 15    1        5         6        10
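The "Top 15" row reads as the number of spiked-in probe sets that each method places among its 15 highest-ranked genes. Assuming that interpretation, the counts can be recomputed from the observed ranks; the helper name below is hypothetical and the numbers are copied from the table.

```python
methods = ("AvDiff", "MAS 5.0", "LiWong", "AvLog(PM-BG)")

# Observed ranks per spiked-in probe set, in the same column order as `methods`.
observed_ranks = {
    "BioB-5": (6, 2, 1, 1),
    "BioB-3": (16, 1, 3, 2),
    "BioC-5": (74, 6, 2, 5),
    "BioB-M": (30, 3, 7, 3),
    "BioDn-3": (44, 5, 6, 4),
    "DapX-3": (239, 24, 24, 7),
    "CreX-3": (333, 73, 36, 9),
    "CreX-5": (3276, 33, 3128, 8),
    "BioC-3": (2709, 8572, 681, 6431),
    "DapX-5": (2709, 102, 12203, 10),
    "DapX-M": (165, 19, 13, 6),
}


def count_in_top(ranks, method_names, k=15):
    """For each method, count spiked-in probe sets ranked k or better."""
    return {
        m: sum(1 for r in ranks.values() if r[i] <= k)
        for i, m in enumerate(method_names)
    }


print(count_in_top(observed_ranks, methods))
# {'AvDiff': 1, 'MAS 5.0': 5, 'LiWong': 6, 'AvLog(PM-BG)': 10}
```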
19Observed vs True Ratio
20Dilution Experiment
- cRNA hybridized to human chip (HGU95) in range of
proportions and dilutions
- Dilution series begins at 1.25 µg cRNA per
GeneChip array, and rises through 2.5, 5.0, 7.5,
10.0, to 20.0 µg per array. 5 replicate chips
were used at each dilution
- Normalize just within each set of 5 replicates
- For each probe set compute expression, average
and SD over replicates, and fit a line to
log expression vs. log concentration
- Regression line should have slope 1 and high R²
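The line fit described above is an ordinary least-squares regression of log expression on log concentration. The sketch below computes the slope, intercept, and R² for one probe set; the expression values are synthetic (made exactly proportional to the cRNA amount, purely to show the ideal case) and the function name is hypothetical. Only the six dilution amounts come from the slide.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# cRNA amounts per array in micrograms, from the dilution series.
amounts_ug = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]
log_conc = [math.log2(a) for a in amounts_ug]

# Synthetic log2 expression averaged over replicates (hypothetical values):
# perfectly proportional signal, so slope and R^2 should both be 1.
log_expr = [3.0 + lc for lc in log_conc]

slope, intercept, r_squared = fit_line(log_conc, log_expr)
print(slope, intercept, r_squared)
```

A slope below 1 on real data would indicate that the expression measure compresses true differences in concentration, which is one way bias shows up in this study design.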
21Dilution Experiment Data
22Expression and SD
23Slope Estimates and R²
24Model check
- Compute observed SD of 5 replicate expression
estimates
- Compute RMS of 5 nominal SDs
- Compare by taking the log ratio
- Closeness of observed and nominal SD taken as a
measure of goodness of fit of the model
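The model check can be written out directly. In the sketch below the log base (2) and the direction of the ratio (observed over nominal) are assumptions, since the slide only says "log ratio"; the function name and the input numbers are hypothetical.

```python
import math
import statistics


def log_sd_ratio(estimates, nominal_sds):
    """log2 of (observed SD across replicates) / (RMS of the model's nominal SDs)."""
    observed_sd = statistics.stdev(estimates)  # sample SD of the replicate estimates
    rms_nominal = math.sqrt(sum(s * s for s in nominal_sds) / len(nominal_sds))
    return math.log2(observed_sd / rms_nominal)


# Hypothetical values for one probe set: 5 replicate expression estimates
# and the 5 standard errors reported by the model for those estimates.
estimates = [8.0, 8.1, 7.9, 8.2, 7.8]
nominal_sds = [0.15, 0.16, 0.14, 0.17, 0.15]

print(log_sd_ratio(estimates, nominal_sds))
```

A value near 0 means the model's stated uncertainty matches the replicate-to-replicate variation; positive values mean the model understates the variability, and negative values mean it overstates it.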
25Observed vs. Model SE
26Observed vs. Model SE
27Conclusion
- Take logs
- PMs need to be normalized
- Using global background improves on use of
probe-specific MM
- Gene Logic spike-in and dilution study show all
three expression measures performed very well
- AvLog(PM-BG) is arguably the best in terms of
bias, variance and model fit
- Future: better BG; robust/resistant summaries
28Acknowledgements
- Gene Brown's group at Wyeth/Genetics Institute,
and Uwe Scherf's Genomics Research & Development
Group at Gene Logic, for generating the spike-in
and dilution data
- Gene Logic for permission to use these data
- Francois Collin (Gene Logic)
- Ben Bolstad (UC Berkeley)
- Magnus Åstrand (Astra Zeneca Mölndal)