Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data

About This Presentation



Slides: 38

Provided by: iriz

Learn more at: https://www.biostat.jhsph.edu


Transcript and Presenter's Notes

Title: Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data

1
Exploration, Normalization, and Summaries of High
Density Oligonucleotide Array Probe Level Data

Rafael A. Irizarry
Department of Biostatistics, JHU
(joint work with Leslie Cope, Ben Bolstad,
Francois Collin, Bridget Hobbs, and Terry Speed)
http://biosun01.biostat.jhsph.edu/ririzarr

2
Summary

Review of technology
Probe level summaries
Normalization
Assess technology and expression measures
Conclusion/future work

3
Probe Arrays
Hybridized Probe Cell
GeneChip Probe Array
Single stranded, labeled RNA target
Oligonucleotide probe
24 µm
Millions of copies of a specific oligonucleotide probe
1.28 cm
>200,000 different complementary probes
Image of Hybridized Probe Array
Compliments of D. Gerhold
4
PM MM
5
Data and Notation

PM_ijn, MM_ijn: intensity for perfect-match/mismatch
probe cell j, in chip i, in gene n
i = 1, ..., I (ranging from 1 to hundreds)
j = 1, ..., J (usually 16 or 20)
n = 1, ..., N (between 8,000 and 12,000)

6
The Big Picture

Summarize 20 PM,MM pairs (probe level data) into
one number for each gene
We call this number an expression measure
Affymetrix GeneChips Software has defaults.
Does it work? Can it be improved?

7
What is the evidence?

Lockhart et al., Nature
Biotechnology 14 (1996)

8
Competing Measures of Expression

GeneChip software uses Avg.diff = (1/|A|) Σ_{j in A} (PM_j - MM_j)
with A a set of suitable pairs chosen by
software.
Log ratio version is also used.
For differential expression Avg.diffs are
compared between chips.
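
A minimal sketch of Avg.diff, assuming the commonly described definition (the mean of PM minus MM over the selected pair set A). The function name, the intensity values, and the choice of A are hypothetical; the GeneChip software's own rule for choosing A is not reproduced here.

```python
def avg_diff(pm, mm, pairs=None):
    """Mean of PM - MM over the probe pairs in A (all pairs if A is not given)."""
    if pairs is None:
        pairs = range(len(pm))  # assumption: every pair is "suitable"
    pairs = list(pairs)
    return sum(pm[j] - mm[j] for j in pairs) / len(pairs)


# Hypothetical intensities for one probe set on one chip.
pm = [1200.0, 950.0, 400.0, 2100.0]
mm = [300.0, 500.0, 450.0, 600.0]

print(avg_diff(pm, mm))             # uses all four pairs
print(avg_diff(pm, mm, [0, 1, 3]))  # drops pair 2, where MM exceeds PM
```

Because the differences are averaged on the raw scale, a single pair with MM larger than PM can pull the summary down or even make it negative.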

9
Competing Measures of Expression

GeneChip new version (MAS 5.0) uses signal = Tukey biweight{ log(PM_j - MM*_j) }
with MM* a version of MM that is never bigger
than PM.

10
Competing Measures of Expression

Li and Wong fit a model: PM_ij - MM_ij = θ_i φ_j + ε_ij
Consider θ_i the expression in chip i
Efron et al. consider log PM - 0.5 log MM
Another is the second largest PM

11
Competing Measures of Expression

Why not stick to what has worked for cDNA?
AvLog(PM - BG) = (1/|A|) Σ_{j in A} log(PM_j - BG)
with A a set of suitable pairs.

12
Features of Probe Level Data
13
SD vs. Avg
14
ANOVA: strong probe effect, 5 times bigger than
gene effect
15
Normalization at Probe Level
16
Spike-In Experiments

Set A: 11 control cRNAs were spiked in, all at
the same concentration, which varied across
chips.
Set B: 11 control cRNAs were spiked in, all at
different concentrations, which varied across
chips. The concentrations were arranged in a 12x12
cyclic Latin square (with 3 replicates)
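
To make the design concrete, here is a small sketch of how a cyclic Latin square can be generated: each row is the previous row rotated by one position, so every level appears exactly once in every row and every column. The level values below are placeholders (the slide does not list the actual concentrations), and reading rows as hybridization conditions and columns as transcript slots is an assumption.

```python
def cyclic_latin_square(levels):
    """Row i is the list of levels rotated left by i positions."""
    n = len(levels)
    return [[levels[(i + j) % n] for j in range(n)] for i in range(n)]


# Placeholder level labels 0..11; NOT the concentrations used in the study.
levels = list(range(12))
square = cyclic_latin_square(levels)

for row in square[:3]:
    print(row)
```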

17
Set A Probe Level Data
18
What Did We Learn?

Don't subtract or divide by MM
Probe effect is additive on log scale
Take logs

19
Why Remove Background?
20
Background Distribution
21
RMA

Background correct PM
Normalize (quantile normalization)
Assume additive model: log2(normalized, background-corrected PM_ij) = a_i + b_j + e_ij
Estimate a_i using robust method
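
The last three steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the additive model is taken to be log2(PM_ij) = a_i + b_j + error with b_j a probe effect, median polish is used as one possible robust fit (the slide only says "robust method"), the background-correction step is skipped, and all data values are hypothetical.

```python
from statistics import median


def quantile_normalize(chips):
    """Force every chip to share one distribution: the mean of the sorted chips."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    target = [sum(s[r] for s in sorted_chips) / len(chips) for r in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda k: chip[k])  # ties broken by position
        out = [0.0] * n
        for rank, k in enumerate(order):
            out[k] = target[rank]
        normalized.append(out)
    return normalized


def median_polish(y, n_iter=10):
    """Fit y[i][j] ~ overall + chip[i] + probe[j] by alternating median sweeps."""
    n_chips, n_probes = len(y), len(y[0])
    resid = [row[:] for row in y]
    overall = 0.0
    chip_eff = [0.0] * n_chips
    probe_eff = [0.0] * n_probes
    for _ in range(n_iter):  # fixed iteration count instead of a convergence test
        # Sweep chip (row) medians out of the residuals.
        for i in range(n_chips):
            m = median(resid[i])
            chip_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(probe_eff)
        overall += m
        probe_eff = [p - m for p in probe_eff]
        # Sweep probe (column) medians out of the residuals.
        for j in range(n_probes):
            m = median(resid[r][j] for r in range(n_chips))
            probe_eff[j] += m
            for r in range(n_chips):
                resid[r][j] -= m
        m = median(chip_eff)
        overall += m
        chip_eff = [c - m for c in chip_eff]
    # a_i in the slide's notation: log-scale expression for chip i.
    return [overall + c for c in chip_eff]


# Two tiny hypothetical chips, normalized across all of their probes.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))

# Hypothetical log2 PM values for one probe set (3 chips x 5 probes),
# assumed to be already background corrected and normalized.
log_pm = [
    [7.0, 9.5, 8.2, 6.6, 8.7],
    [7.3, 9.9, 8.4, 7.0, 9.0],
    [8.0, 10.7, 9.3, 7.6, 9.8],
]
print(median_polish(log_pm))
```

Normalization is applied across all probes on each chip, whereas the additive fit is done one probe set at a time, which is why the two functions are demonstrated on separate inputs.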

22
Spike-In B
Probe Set   Conc 1   Conc 2   Rank
BioB-5      100      0.5      1
BioB-3      0.5      25.0     2
BioC-5      2.0      75.0     4
BioB-M      1.0      37.5     4
BioDn-3     1.5      50.0     5
DapX-3      35.7     3.0      6
CreX-3      50.0     5.0      7
CreX-5      12.5     2.0      8
BioC-3      25.0     100      9
DapX-5      5.0      1.5      10
DapX-M      3.0      1.0      11
Later we consider 23 different combinations of
concentrations
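
The Rank column appears to order the probe sets by the size of the true fold change between the two concentrations, with tied probe sets sharing the larger rank. That reading is an inference from the numbers, not something the slide states. A short check under that assumption:

```python
# (probe set, concentration 1, concentration 2), copied from the slide.
spike_in_b = [
    ("BioB-5", 100.0, 0.5),
    ("BioB-3", 0.5, 25.0),
    ("BioC-5", 2.0, 75.0),
    ("BioB-M", 1.0, 37.5),
    ("BioDn-3", 1.5, 50.0),
    ("DapX-3", 35.7, 3.0),
    ("CreX-3", 50.0, 5.0),
    ("CreX-5", 12.5, 2.0),
    ("BioC-3", 25.0, 100.0),
    ("DapX-5", 5.0, 1.5),
    ("DapX-M", 3.0, 1.0),
]


def fold_change(c1, c2):
    """Fold change regardless of direction (always >= 1)."""
    return max(c1, c2) / min(c1, c2)


folds = {name: fold_change(c1, c2) for name, c1, c2 in spike_in_b}

# Rank 1 = largest fold change; ties share the larger rank
# (BioC-5 and BioB-M are both 37.5-fold, so both get rank 4).
ranks = {
    name: sum(1 for other in folds.values() if other >= f)
    for name, f in folds.items()
}

for name, _, _ in spike_in_b:
    print(name, round(folds[name], 2), ranks[name])
```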
23
Differential Expression
24
Differential Expression
25
Differential Expression
26
Differential Expression
27
Observed Ranks
Gene      AvDiff   MAS 5.0   LiWong   AvLog(PM-BG)
BioB-5    6        2         1        1
BioB-3    16       1         3        2
BioC-5    74       6         2        5
BioB-M    30       3         7        3
BioDn-3   44       5         6        4
DapX-3    239      24        24       7
CreX-3    333      73        36       9
CreX-5    3276     33        3128     8
BioC-3    2709     8572      681      6431
DapX-5    2709     102       12203    10
DapX-M    165      19        13       6
Top 15    1        5         6        10
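
The "Top 15" row is consistent with counting, for each expression measure, how many of the 11 spiked-in probe sets received an observed rank of 15 or better. That interpretation is an assumption, but it reproduces the row from the ranks above:

```python
# Observed ranks per measure, in the row order of the table above.
observed_ranks = {
    "AvDiff": [6, 16, 74, 30, 44, 239, 333, 3276, 2709, 2709, 165],
    "MAS 5.0": [2, 1, 6, 3, 5, 24, 73, 33, 8572, 102, 19],
    "LiWong": [1, 3, 2, 7, 6, 24, 36, 3128, 681, 12203, 13],
    "AvLog(PM-BG)": [1, 2, 5, 3, 4, 7, 9, 8, 6431, 10, 6],
}

# Number of spiked-in probe sets ranked within the top 15 by each measure.
top15 = {
    measure: sum(1 for r in ranks if r <= 15)
    for measure, ranks in observed_ranks.items()
}
print(top15)
```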
28
Observed vs True Ratio
29
Dilution Experiment

cRNA hybridized to human chip (HGU95) in a range of
proportions and dilutions
Dilution series begins at 1.25 µg cRNA per
GeneChip array, and rises through 2.5, 5.0, 7.5,
10.0, to 20.0 µg per array. 5 replicate chips
were used at each dilution
Normalize just within each set of 5 replicates
For each probe set compute expression, average
and SD over replicates

30
Dilution Experiment Data
31
Expression
32
SD
33
Log Scale SD
34
Model check

Compute observed SD of 5 replicate expression
estimates
Compute RMS of 5 nominal SDs
Compare by taking the log ratio
Closeness of observed and nominal SD taken as a
measure of goodness of fit of the model
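
A sketch of this check for a single probe set, assuming the sample SD is used for the observed value, the ratio is observed over nominal, and the log is base 2 (the slide specifies none of these). The function name and the numbers are hypothetical.

```python
import math
from statistics import stdev


def model_check(estimates, nominal_sds):
    """Log2 ratio of observed SD to the RMS of the model-based (nominal) SDs."""
    observed_sd = stdev(estimates)  # sample SD over the replicate estimates
    rms_nominal = math.sqrt(sum(s * s for s in nominal_sds) / len(nominal_sds))
    return math.log2(observed_sd / rms_nominal)


# Hypothetical expression estimates and nominal SDs from 5 replicate chips.
estimates = [8.10, 8.25, 7.95, 8.05, 8.20]
nominal_sds = [0.10, 0.12, 0.11, 0.09, 0.13]

print(model_check(estimates, nominal_sds))
```

A value near 0 means the model's nominal SD matches the replicate-to-replicate variation; positive values mean the model understates the variability.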

35
Observed vs. Model SE
36
Conclusion

Take logs
PMs need to be normalized
Using global background improves on use of
probe-specific MM
Gene Logic spike-in and dilution study show
technology works well
RMA is arguably the best summary in terms of
bias, variance and model fit
Future: What statistic should we use to rank?

37
Acknowledgements

Gene Brown's group at Wyeth/Genetics Institute,
and Uwe Scherf's Genomics Research & Development
Group at Gene Logic, for generating the spike-in
and dilution data
Gene Logic for permission to use these data
Magnus Åstrand (Astra Zeneca Mölndal)
Skip Garcia, Tom Cappola, and Joshua Hare (JHU)
