Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li

1
Bias, Variance, and Fit for Three Measures of
Expression: AvDiff, Li & Wong's, and AvLog(PM-BG)
  • Rafael A. Irizarry
  • Department of Biostatistics, JHU
  • (joint work with Bridget Hobbs and Terry Speed,
    Walter & Eliza Hall Institute of Medical Research)

2
Summary
  • Summarize the expression level of a probe set by
    Average Log2 (PM-BG)
  • PMs need to be normalized
  • Background makes no use of probe-specific MM
  • Evaluate against AvDiff and the Li & Wong algorithm
    in terms of bias, variance and model fit
  • Use Gene Logic spike-in and dilution study
  • All three expression measures performed well
  • AvLog(PM-BG) is arguably the best of the three

3
SD vs. Avg of Defective Probes
4
Normalization at Probe Level
5
Expression after Normalization
6
Background Distribution
7
Average Log2(PM-BG)
  • Normalize probe level data
  • Compute BG, the background mean, by estimating the
    mode of the MM distribution
  • Subtract BG from each PM
  • If PM-BG < 0, use the minimum of the positive
    values divided by 2
  • Take the average of log2(PM-BG) over the probe set
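
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how the mode of the MM distribution is estimated, so the hypothetical helper `estimate_mode` simply takes the centre of the fullest fixed-width histogram bin, and the "minimum of positives" is taken within the probe set. Input values are assumed to be already normalized.

```python
import math
from collections import Counter


def estimate_mode(values, bin_width=10.0):
    """Crude mode estimate (an assumption): centre of the fullest fixed-width bin."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    # Ties are broken in favour of the lower bin.
    top_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return (top_bin + 0.5) * bin_width


def avlog_pm_bg(pm, mm_all, bin_width=10.0):
    """Average log2(PM - BG) for one probe set on one chip."""
    bg = estimate_mode(mm_all, bin_width)
    diffs = [p - bg for p in pm]
    positives = [d for d in diffs if d > 0]
    if not positives:
        raise ValueError("no PM value exceeds the background estimate")
    # Non-positive differences are replaced by half the smallest positive one.
    floor = min(positives) / 2.0
    adjusted = [d if d > 0 else floor for d in diffs]
    return sum(math.log2(d) for d in adjusted) / len(adjusted)


# Made-up intensities, for illustration only.
mm_values = [95, 102, 104, 108, 150, 300, 101]
pm_values = [233, 169, 137, 100]
expression = avlog_pm_bg(pm_values, mm_values)
```

A kernel density estimate would be a smoother way to locate the mode; the histogram is used here only to keep the sketch dependency-free.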

8
Spike-In Experiments
  • Add concentrations (0.5 pM to 100 pM) of 11 foreign
    species cRNAs to hybridization mixture
  • Set A: 11 control cRNAs were spiked in, all at
    the same concentration, which varied across
    chips.
  • Set B: 11 control cRNAs were spiked in, all at
    different concentrations, which varied across
    chips. The concentrations were arranged in a 12x12
    cyclic Latin square (with 3 replicates)
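
For readers unfamiliar with the design, a cyclic Latin square can be built by shifting one list of levels by one position per row. The sketch below is illustrative only: the twelve levels are the distinct concentrations that appear in the Spike-In B table on slide 13, their ordering here is an assumption, and the slides do not give the actual assignment of rows and columns to chips and cRNAs.

```python
# Twelve distinct concentrations (pM) seen on slide 13; the sorted order is an assumption.
CONCENTRATIONS_PM = [0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 12.5, 25.0, 35.7, 50.0, 75.0, 100.0]


def cyclic_latin_square(levels):
    """Row r is the list of levels rotated left by r positions."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]


square = cyclic_latin_square(CONCENTRATIONS_PM)

# Each level occurs exactly once in every row and every column.
for row in square:
    assert sorted(row) == sorted(CONCENTRATIONS_PM)
for col in zip(*square):
    assert sorted(col) == sorted(CONCENTRATIONS_PM)
```

With 11 cRNAs and 12 levels, one column of such a square would go unused; how the study handled this is not stated on the slide.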

9
Why Remove Background?
10
Probe Level Data (12 chips)
11
What Did We Learn?
  • Don't subtract or divide by MM
  • Probe effect is additive on log scale
  • Take logs
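
One way to write the additive statement above is the following sketch. The notation is assumed rather than taken from the slides: i indexes chips, j indexes the probes of one probe set, θ is the log-scale expression, φ the probe effect and ε an error term.

```latex
\log_2(\mathrm{PM}_{ij} - \mathrm{BG}) = \theta_i + \phi_j + \varepsilon_{ij}
```

Under this form, averaging over the probes j on a given chip gives θ_i plus the mean probe effect. That mean is the same on every chip, so it cancels when chips are compared.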

12
Expression Level
13
Spike-In B
Gene      Conc 1 (pM)   Conc 2 (pM)   Rank
BioB-5    100           0.5           1
BioB-3    0.5           25.0          2
BioC-5    2.0           75.0          3
BioB-M    1.0           35.7          4
BioDn-3   1.5           50.0          5
DapX-3    35.7          3.0           6
CreX-3    50.0          5.0           7
CreX-5    12.5          2.0           8
BioC-3    25.0          100           9
DapX-5    5.0           1.5           10
DapX-M    3.0           1.0           11
Later we consider 24 different combinations of
concentrations.
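
The slide does not explain the Rank column. It is consistent with ordering the genes by the size of the nominal fold change between the two concentrations, which the short sketch below checks. The interpretation and the function name are assumptions; the numbers are copied from the table.

```python
import math

# (gene, concentration 1, concentration 2), in the order listed on the slide.
SPIKE_IN_B = [
    ("BioB-5", 100.0, 0.5),
    ("BioB-3", 0.5, 25.0),
    ("BioC-5", 2.0, 75.0),
    ("BioB-M", 1.0, 35.7),
    ("BioDn-3", 1.5, 50.0),
    ("DapX-3", 35.7, 3.0),
    ("CreX-3", 50.0, 5.0),
    ("CreX-5", 12.5, 2.0),
    ("BioC-3", 25.0, 100.0),
    ("DapX-5", 5.0, 1.5),
    ("DapX-M", 3.0, 1.0),
]


def rank_by_abs_log_ratio(rows):
    """Return (rank, gene, log2 ratio) sorted by decreasing |log2(conc2 / conc1)|."""
    scored = [(gene, math.log2(c2 / c1)) for gene, c1, c2 in rows]
    ordered = sorted(scored, key=lambda item: abs(item[1]), reverse=True)
    return [(rank, gene, ratio) for rank, (gene, ratio) in enumerate(ordered, start=1)]


ranked = rank_by_abs_log_ratio(SPIKE_IN_B)
for rank, gene, ratio in ranked:
    print(f"{rank:2d}  {gene:8s} {ratio:+.2f}")
```

The resulting order matches the order of the rows in the table, so the true log ratios span from roughly 7.6 in absolute value (BioB-5) down to about 1.6 (DapX-M).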
14
Differential Expression
15
Observed vs True Ratio
16
Dilution Experiment
  • cRNA hybridized to human chip (HGU_95) in a range
    of proportions and dilutions
  • Dilution series begins at 1.25 µg cRNA per
    GeneChip array, and rises through 2.5, 5.0, 7.5,
    10.0, to 20.0 µg per array. 5 replicate chips
    were used at each dilution
  • Normalize just within each set of 5 replicates
  • For each probe set compute expression, average
    and SD over replicates, and fit a line to
    log expression vs. log concentration
  • Regression line should have slope 1 and high R²
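
The line fit described above can be sketched with ordinary least squares. The amounts are the six dilution levels from the slide; the expression values are hypothetical, generated so that expression is exactly proportional to concentration, which is the ideal case the slide describes (slope 1, R² of 1). The function name `fit_line` is an assumption.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


# cRNA amounts per array (micrograms), as listed on the slide.
AMOUNTS_UG = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]
log_conc = [math.log2(a) for a in AMOUNTS_UG]

# Hypothetical replicate-averaged log2 expression for one probe set (ideal case).
log_expr = [3.0 + lc for lc in log_conc]

slope, intercept, r_squared = fit_line(log_conc, log_expr)
```

With real data, a slope below 1 would suggest that the measure compresses differences in concentration, and a low R² would suggest a noisy or nonlinear response.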

17
Dilution Experiment Data
18
Expression and SD
19
Slope Estimates and R²
20
Model check
  • Compute observed SD of 5 replicate expression
    estimates
  • Compute RMS of 5 nominal SDs
  • Compare by taking the log ratio
  • Closeness of observed and nominal SD taken as a
    measure of goodness of fit of the model
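
The comparison above can be written out as follows. This is a sketch under assumptions: the slide does not state the base of the logarithm (base 2 is used here), the function name is hypothetical, and the numbers are made up. A log ratio near 0 means the model's nominal SD agrees with the spread actually observed across replicates.

```python
import math
import statistics


def log_ratio_observed_to_nominal(estimates, nominal_sds):
    """log2 of (observed SD of replicate estimates) / (RMS of the nominal SDs)."""
    observed_sd = statistics.stdev(estimates)  # sample SD across replicates
    rms_nominal = math.sqrt(sum(s * s for s in nominal_sds) / len(nominal_sds))
    return math.log2(observed_sd / rms_nominal)


# Hypothetical expression estimates from 5 replicate chips and their model SDs.
replicate_estimates = [7.8, 8.1, 8.0, 7.9, 8.2]
model_sds = [0.15, 0.16, 0.14, 0.15, 0.17]
log_ratio = log_ratio_observed_to_nominal(replicate_estimates, model_sds)
```

A positive value would indicate that the model understates the replicate-to-replicate variability, and a negative value that it overstates it.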

21
Observed vs. Model SE
22
Observed vs. Model SE
23
Conclusion
  • Take logs
  • PMs need to be normalized
  • Using global background improves on use of
    probe-specific MM
  • Gene Logic spike-in and dilution study show all
    three expression measures performed very well
  • AvLog(PM-BG) is arguably the best in terms of
    bias, variance and model fit
  • Future: better BG; robust/resistant summaries

24
Acknowledgements
  • Gene Brown's group at Wyeth/Genetics Institute,
    and Uwe Scherf's Genomics Research & Development
    Group at Gene Logic, for generating the spike-in
    and dilution data
  • Gene Logic for permission to use these data
  • Francois Collin (Gene Logic)
  • Ben Bolstad (UC Berkeley)
  • Magnus Åstrand (Astra Zeneca Mölndal)