Statistical Analyses of High Density Oligonucleotide Arrays presentation

1
Statistical Analyses of High Density
Oligonucleotide Arrays

Rafael A. Irizarry
Department of Biostatistics, JHU
(joint work with Bridget Hobbs and Terry Speed,
Walter and Eliza Hall Institute of Medical Research
and Francois Collin, Gene Logic)
http://biosun01.biostat.jhsph.edu/ririzarr

2
Summary

Review of technology
Data exploration
Probe level summaries (expression measures)
Normalization
Evaluate and compare 4 expression measures
through bias, variance and model fit
Use Gene Logic spike-in and dilution study
Conclusion/future work

3
Probe Arrays
Hybridized Probe Cell
GeneChip Probe Array
Single stranded, labeled RNA target
Oligonucleotide probe
24µm
Millions of copies of a specific oligonucleotide
probe
1.28cm
>200,000 different complementary probes
Image of Hybridized Probe Array
Compliments of D. Gerhold
4
PM MM
5
Data and Notation

PM_{ijn}, MM_{ijn}: intensity for perfect-match/mismatch
probe cell j, in chip i, in gene n
i = 1, ..., I (ranging from 1 to hundreds)
j = 1, ..., J (usually 16 or 20)
n = 1, ..., N (between 8,000 and 12,000)

6
The Big Picture

Summarize 20 (PM, MM) pairs (probe level data) into
one number for each gene
We call this number an expression measure
Affymetrix GeneChip software uses AvDiff as
expression measure
Does it work? Can it be improved?

7
What is the evidence?

Lockhart et al., Nature
Biotechnology 14 (1996)

8
Competing Measures of Expression

GeneChip software uses Avg.diff
Avg.diff = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)
with A a set of suitable pairs chosen by
software.
Log ratio version is also used.
For differential expression Avg.diffs are
compared between chips.
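
A minimal sketch of the average difference for one probe set follows. The function name `avg_diff` and the `keep` argument (a stand-in for the set A of suitable pairs, whose selection rule is not given on the slide) are illustrative assumptions, not the GeneChip software's implementation, and the intensities are made up.

```python
from statistics import mean

def avg_diff(pm, mm, keep=None):
    """Average of PM - MM over the probe pairs listed in `keep`.

    `keep` plays the role of the set A; by default every pair is used.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    idx = range(len(pm)) if keep is None else keep
    return mean(pm[j] - mm[j] for j in idx)

# Hypothetical intensities for the same probe set on two chips.
chip1 = avg_diff([120.0, 300.0, 210.0, 95.0], [100.0, 150.0, 200.0, 110.0])
chip2 = avg_diff([240.0, 610.0, 400.0, 180.0], [105.0, 160.0, 190.0, 120.0])

# Differential expression then compares the two summaries between chips.
difference = chip2 - chip1
```

Note that a pair with MM larger than PM contributes a negative term, so the summary itself can be negative and its log is then undefined.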

9
Competing Measures of Expression

GeneChip new version uses something else
signal = TukeyBiweight{ \log(PM_j - MM_j^*) }
with MM^* a version of MM that is never bigger
than PM.

10
Competing Measures of Expression

Li and Wong fit a model
PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}
Consider \theta_i as the expression in chip i
Efron et al. consider log PM - 0.5 log MM
Another is second largest PM
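
The Li and Wong model treats PM - MM as a chip effect multiplied by a probe effect plus noise. One way to fit such a multiplicative model is alternating least squares, sketched below for a single probe set. The function name `fit_li_wong`, the iteration count, and the data are assumptions for illustration; this is not Li and Wong's own software, which also handles outlier detection.

```python
import math

def fit_li_wong(diff, n_iter=50):
    """Fit diff[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diff holds PM - MM for I chips (rows) and J probe pairs (columns).
    phi is rescaled so that sum(phi_j ** 2) == J, which fixes the scale
    that the product theta * phi would otherwise leave undetermined.
    """
    n_chips, n_probes = len(diff), len(diff[0])
    phi = [1.0] * n_probes
    for _ in range(n_iter):
        # Least-squares chip effects given the current probe effects.
        denom = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / denom
                 for row in diff]
        # Least-squares probe effects given the chip effects.
        denom = sum(t * t for t in theta)
        phi = [sum(diff[i][j] * theta[i] for i in range(n_chips)) / denom
               for j in range(n_probes)]
        # Re-impose the scale constraint on phi.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final chip effects consistent with the rescaled phi.
    denom = sum(p * p for p in phi)
    theta = [sum(row[j] * phi[j] for j in range(n_probes)) / denom
             for row in diff]
    return theta, phi

# Hypothetical noise-free data: 3 chips, 3 probe pairs.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
diff = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_li_wong(diff)
```

The fitted `theta[i]` is the expression measure for chip i; only ratios of `theta` between chips are meaningful without the scale constraint on `phi`.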

11
Competing Measures of Expression

Why not stick to what has worked for cDNA?
\frac{1}{|A|} \sum_{j \in A} \log(PM_j - BG)
with A a set of suitable pairs.

12
Features of Probe Level Data
13
SD vs. Avg of Defective Probes
14
ANOVA: strong probe effect, 5 times bigger than
gene effect
15
Histograms of log2(PM/MM) stratified by
log2(PM×MM)/2 for mouse chip, for defective and
normal probes
16
Normalization at Probe Level
17
Spike-In Experiments

Set A: 11 control cRNAs were spiked in, all at
the same concentration, which varied across
chips.
Set B: 11 control cRNAs were spiked in, all at
different concentrations, which varied across
chips. The concentrations were arranged in a 12x12
cyclic Latin square (with 3 replicates)
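
For readers unfamiliar with the design, a cyclic Latin square shifts one list of levels by one position per row, so every level appears exactly once in each row and each column. The sketch below builds such a square. The labels `c0` to `c11` are placeholders, since the actual concentrations are not listed on the slide, and the function name is hypothetical.

```python
def cyclic_latin_square(levels):
    """Return an n x n square whose row r is `levels` rotated left by r."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Placeholder concentration labels (the real values are not given here).
levels = [f"c{k}" for k in range(12)]
design = cyclic_latin_square(levels)

# Each row would describe one hybridization condition: entry [r][c] is the
# concentration level assigned to spiked-in transcript slot c in condition r.
first_two_rows = design[:2]
```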

18
Set A Probe Level Data (12 chips)
19
What Did We Learn?

Don't subtract or divide by MM
Probe effect is additive on log scale
Take logs

20
Why Remove Background?
21
Background Distribution
22
Average Log2(PM-BG)

Normalize probe level data
Compute BG background mean by estimating the
mode of the MM distribution
Subtract BG from each PM
If PM-BG < 0, use minimum of positives divided by
2
Take average
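
The steps above can be sketched as follows, assuming the probe level data have already been normalized. The slide does not say how the mode of the MM distribution is estimated, so the histogram-based `estimate_mode` (with its `bin_width`) is an assumed stand-in, as are the function names and the example intensities.

```python
import math
from collections import Counter
from statistics import mean

def estimate_mode(values, bin_width=10.0):
    """Crude mode estimate: midpoint of the fullest fixed-width bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    # Highest count wins; ties go to the lower bin.
    fullest = max(counts, key=lambda b: (counts[b], -b))
    return (fullest + 0.5) * bin_width

def av_log_pm_bg(pm_by_probe_set, mm_all):
    """Average log2(PM - BG) per probe set on one chip."""
    bg = estimate_mode(mm_all)                      # global background
    adjusted = {g: [p - bg for p in pm]
                for g, pm in pm_by_probe_set.items()}
    positives = [v for vals in adjusted.values() for v in vals if v > 0]
    floor_value = min(positives) / 2                # replaces non-positive values
    return {g: mean(math.log2(v if v > 0 else floor_value) for v in vals)
            for g, vals in adjusted.items()}

# Hypothetical normalized intensities for one chip.
mm_all = [92.0, 95.0, 97.0, 98.0, 101.0, 130.0, 250.0, 400.0]
pm = {"gene_a": [223.0, 351.0, 1119.0], "gene_b": [90.0, 99.0, 111.0]}
expression = av_log_pm_bg(pm, mm_all)
```

Here the estimated background is 95, so `gene_b` has one PM below background; that value is replaced by half the smallest positive PM - BG on the chip before taking logs.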

23
Expression after Normalization
24
Expression Level Comparison
25
Spike-In B
Later we consider 23 different combinations of
concentrations
26
Differential Expression
27
Differential Expression
28
Differential Expression
29
Differential Expression
30
Observed Ranks
31
Observed vs True Ratio
32
Dilution Experiment

cRNA hybridized to human chip (HGU95) in a range of
proportions and dilutions
Dilution series begins at 1.25 µg cRNA per
GeneChip array, and rises through 2.5, 5.0, 7.5,
10.0, to 20.0 µg per array. 5 replicate chips
were used at each dilution
Normalize just within each set of 5 replicates
For each probe set compute expression, average
and SD over replicates, and fit a line to
log expression vs. log concentration
Regression line should have slope 1 and high R²
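
The line fit for one probe set can be sketched with ordinary least squares. The concentrations are those listed above; the average log2 expression values are invented for illustration, and `slope_and_r2` is a hypothetical helper name.

```python
import math

def slope_and_r2(x, y):
    """Least-squares slope and R^2 for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Dilution series in micrograms of cRNA per array (from the slide).
concentrations = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]
log_conc = [math.log2(c) for c in concentrations]

# Hypothetical average log2 expression over the 5 replicates at each dilution.
mean_log_expr = [6.1, 7.0, 8.1, 8.6, 9.1, 10.0]

slope, r2 = slope_and_r2(log_conc, mean_log_expr)
```

A slope near 1 means doubling the amount of cRNA doubles the expression measure; a slope well below 1 would indicate that the measure compresses real changes.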

33
Dilution Experiment Data
34
Expression and SD
35
Slope Estimates and R²
36
Model check

Compute observed SD of 5 replicate expression
estimates
Compute RMS of 5 nominal SDs
Compare by taking the log ratio
Closeness of observed and nominal SD taken as a
measure of goodness of fit of the model
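
A minimal sketch of this check for one probe set at one dilution follows. The slide does not state the log base or which SD goes in the numerator, so base 2 and observed over nominal are assumptions, as are the function name and the numbers.

```python
import math
from statistics import stdev

def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD of replicate estimates) / (RMS of nominal SDs)."""
    observed = stdev(estimates)  # sample SD across the replicates
    rms_nominal = math.sqrt(sum(s * s for s in nominal_sds) / len(nominal_sds))
    return math.log2(observed / rms_nominal)

# Hypothetical expression estimates from 5 replicate chips, with the
# standard error the model reports for each estimate.
estimates = [8.0, 8.2, 7.9, 8.1, 8.3]
nominal_sds = [0.15, 0.17, 0.16, 0.14, 0.18]

ratio = sd_log_ratio(estimates, nominal_sds)
```

Values near 0 mean the model's standard errors agree with the replicate-to-replicate variation; large positive values mean the model understates the variability.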

37
Observed vs. Model SE
38
Observed vs. Model SE
39
Conclusion

Take logs
PMs need to be normalized
Using global background improves on use of
probe-specific MM
Gene Logic spike-in and dilution study show all
four expression measures performed very well
AvLog(PM-BG) is arguably the best in terms of
bias, variance and model fit
Future work: better BG estimates and robust/resistant summaries

40
Acknowledgements

Gene Brown's group at Wyeth/Genetics Institute,
and Uwe Scherf's Genomics Research & Development
Group at Gene Logic, for generating the spike-in
and dilution data
Gene Logic for permission to use these data
Ben Bolstad (UC Berkeley)
Magnus Åstrand (Astra Zeneca Mölndal)
Skip Garcia, Tom Cappola, and Joshua Hare (JHU)
