Statistical Analyses of High Density Oligonucleotide Arrays

Slides: 41
Provided by: biosun01B
1
Statistical Analyses of High Density
Oligonucleotide Arrays
  • Rafael A. Irizarry
  • Department of Biostatistics, JHU
  • (joint work with Bridget Hobbs and Terry Speed,
    Walter & Eliza Hall Institute of Medical Research,
    and Francois Collin, Gene Logic)
  • http://biosun01.biostat.jhsph.edu/ririzarr

2
Summary
  • Review of technology
  • Data exploration
  • Probe level summaries (expression measures)
  • Normalization
  • Evaluate and compare 4 expression measures through
    bias, variance and model fit
  • Use Gene Logic spike-in and dilution study
  • Conclusion/future work

3
Probe Arrays
  • Hybridized Probe Cell
  • GeneChip Probe Array
  • Single stranded, labeled RNA target
  • Oligonucleotide probe
  • 24µm
  • Millions of copies of a specific oligonucleotide probe
  • 1.28cm
  • >200,000 different complementary probes
  • Image of Hybridized Probe Array
  • Compliments of D. Gerhold
4
PM MM
5
Data and Notation
  • $PM_{ijn}$, $MM_{ijn}$: intensity for the perfect-match/mismatch
    probe cell j, in chip i, in gene n
  • $i = 1, \dots, I$ (ranging from 1 to hundreds)
  • $j = 1, \dots, J$ (usually 16 or 20)
  • $n = 1, \dots, N$ (between 8,000 and 12,000)

6
The Big Picture
  • Summarize the 20 (PM, MM) pairs (probe level data) into
    one number for each gene
  • We call this number an expression measure
  • Affymetrix GeneChip software uses Avg.diff as its
    expression measure
  • Does it work? Can it be improved?

7
What is the evidence?
  • Lockhart et al., Nature
    Biotechnology 14 (1996)

8
Competing Measures of Expression
  • GeneChip software uses Avg.diff:
    $\text{Avg.diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$
  • with A a set of suitable pairs chosen by the
    software.
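A minimal sketch of an Avg.diff-style measure, assuming it is the mean of PM - MM over the pairs in A. How the software chooses A is not specified here, so the `keep` mask below is a hypothetical stand-in for that choice, and the intensities are made-up illustration values.

```python
import numpy as np

def avg_diff(pm, mm, keep=None):
    """Mean of PM - MM over a chosen subset A of probe pairs.

    pm, mm : intensities of the J probe pairs of one probe set on one chip.
    keep   : optional boolean mask selecting the set A (hypothetical; the
             rule the software uses to pick A is not reproduced here).
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if keep is None:
        keep = np.ones(pm.shape, dtype=bool)  # default: use every pair
    keep = np.asarray(keep, dtype=bool)
    return float(np.mean(pm[keep] - mm[keep]))

# Made-up intensities for a 4-pair probe set.
pm = [1200.0, 950.0, 400.0, 3000.0]
mm = [300.0, 450.0, 500.0, 1000.0]
print(avg_diff(pm, mm))                                   # all pairs
print(avg_diff(pm, mm, keep=[True, True, False, True]))   # drop pair 3
```

Note that a pair with MM larger than PM contributes a negative difference, so the average can itself be negative for weakly expressed genes.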
  • Log ratio version is also used.
  • For differential expression Avg.diffs are
    compared between chips.

9
Competing Measures of Expression
  • The new version of GeneChip software uses something else:
    $\text{signal} = \text{Tukey biweight}\{\log(PM_j - MM_j^*)\}$
  • with $MM^*$ a version of MM that is never bigger
    than PM.

10
Competing Measures of Expression
  • Li and Wong fit a model:
    $PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}$
  • Consider $\theta_i$ the expression in chip i
  • Efron et al. consider $\log PM_j - 0.5 \log MM_j$
  • Another is the second largest PM

11
Competing Measures of Expression
  • Why not stick to what has worked for cDNA?
  • with A a set of suitable pairs.

12
Features of Probe Level Data
13
SD vs. Avg of Defective Probes
14
ANOVA: Strong probe effect, 5 times bigger than
gene effect
15
Histograms of log2(PM/MM) stratified by
log2(PM×MM)/2 for mouse chip, for defective and
normal probes
16
Normalization at Probe Level
17
Spike-In Experiments
  • Set A: 11 control cRNAs were spiked in, all at
    the same concentration, which varied across
    chips.
  • Set B: 11 control cRNAs were spiked in, all at
    different concentrations, which varied across
    chips. The concentrations were arranged in a 12x12
    cyclic Latin square (with 3 replicates).
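A minimal sketch of how a cyclic Latin square can be laid out, assuming rows stand for chips (or groups of replicate chips) and columns for spiked-in cRNAs. The level labels 0 to 11 are placeholders, not the concentrations used in the study.

```python
def cyclic_latin_square(levels):
    """Build an n x n cyclic Latin square from n distinct levels.

    Row r is the list of levels shifted left by r positions, so every
    level appears exactly once in each row and once in each column.
    """
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

# Placeholder labels for 12 concentration levels (hypothetical).
levels = list(range(12))
square = cyclic_latin_square(levels)

for row in square[:3]:
    print(row)
```

With this layout each column takes every level exactly once across the rows, and each row contains every level exactly once across the columns.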

18
Set A Probe Level Data (12 chips)
19
What Did We Learn?
  • Don't subtract or divide by MM
  • Probe effect is additive on log scale
  • Take logs

20
Why Remove Background?
21
Background Distribution
22
Average Log2(PM-BG)
  • Normalize probe level data
  • Compute BG = background mean, by estimating the
    mode of the MM distribution
  • Subtract BG from each PM
  • If PM-BG < 0 use the minimum of the positives divided by
    2
  • Take the average of log2(PM-BG)
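The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the mode is estimated with a plain histogram (`estimate_mode` is a hypothetical helper), the "minimum of positives" is taken over the PM values passed in, a difference of exactly zero is treated like a negative one, and the normalization step is assumed to have been done already.

```python
import numpy as np

def estimate_mode(values, bins=100):
    """Crude mode estimate: centre of the most populated histogram bin."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

def avlog_pm_bg(pm, bg):
    """Average of log2(PM - BG) for one probe set on one chip.

    Non-positive differences are replaced by half of the smallest
    positive difference before taking logs (assumption: the minimum is
    taken over the PM values passed in).
    """
    adj = np.asarray(pm, dtype=float) - bg
    positive = adj[adj > 0]
    if positive.size == 0:
        raise ValueError("no PM value exceeds the background")
    floor = positive.min() / 2.0
    adj = np.where(adj > 0, adj, floor)
    return float(np.mean(np.log2(adj)))

# Made-up, already-normalized intensities.
mm_all = np.concatenate([np.full(50, 150.0), np.linspace(0.0, 1000.0, 20)])
bg = estimate_mode(mm_all, bins=10)
print(bg)
print(avlog_pm_bg([132.0, 164.0, 100.0, 228.0], 100.0))
```

A smoother mode estimate (for example from a kernel density estimate) could replace the histogram without changing the rest of the sketch.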

23
Expression after Normalization
24
Expression Level Comparison
25
Spike-In B
Later we consider 23 different combinations of
concentrations
26
Differential Expression
27
Differential Expression
28
Differential Expression
29
Differential Expression
30
Observed Ranks
31
Observed vs True Ratio
32
Dilution Experiment
  • cRNA hybridized to human chip (HGU95) in range of
    proportions and dilutions
  • Dilution series begins at 1.25 µg cRNA per
    GeneChip array, and rises through 2.5, 5.0, 7.5,
    10.0, to 20.0 µg per array. 5 replicate chips
    were used at each dilution
  • Normalize just within each set of 5 replicates
  • For each probe set compute expression, average
    and SD over replicates, and fit a line to
    log expression vs. log concentration
  • Regression line should have slope 1 and high $R^2$
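A minimal sketch of the line fit described above, assuming expression values on the raw (unlogged) scale and base-2 logs. The concentrations are the six dilution levels listed above; the expression values are synthetic, chosen to be exactly proportional to concentration so that the ideal outcome is visible.

```python
import numpy as np

def loglog_fit(concentration, expression):
    """Least-squares line of log2(expression) on log2(concentration).

    Returns (slope, intercept, r_squared).
    """
    x = np.log2(np.asarray(concentration, dtype=float))
    y = np.log2(np.asarray(expression, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)

conc = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]   # micrograms of cRNA per array
expr = [40.0 * c for c in conc]            # synthetic, perfectly proportional
slope, intercept, r2 = loglog_fit(conc, expr)
print(slope, r2)
```

If the expression measure is already on the log scale, as with an average of logs, the second `np.log2` call should be dropped.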

33
Dilution Experiment Data
34
Expression and SD
35
Slope Estimates and R2
36
Model check
  • Compute observed SD of 5 replicate expression
    estimates
  • Compute RMS of 5 nominal SDs
  • Compare by taking the log ratio
  • Closeness of observed and nominal SD taken as a
    measure of goodness of fit of the model
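The check above can be sketched as follows, assuming the ratio is taken as observed over nominal, that logs are base 2, and that the observed SD uses the n-1 denominator. None of these conventions is stated on the slide, and the numbers are made up.

```python
import numpy as np

def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD of the replicate estimates) over
    (root mean square of the model-based nominal SDs)."""
    estimates = np.asarray(estimates, dtype=float)
    nominal_sds = np.asarray(nominal_sds, dtype=float)
    observed = np.std(estimates, ddof=1)            # sample SD across replicates
    nominal = np.sqrt(np.mean(nominal_sds ** 2))    # RMS of the nominal SDs
    return float(np.log2(observed / nominal))

# Five replicate expression estimates and their nominal SDs (made up).
estimates = [8.0, 8.2, 7.9, 8.1, 7.8]
nominal_sds = [0.15, 0.17, 0.16, 0.14, 0.18]
print(sd_log_ratio(estimates, nominal_sds))
```

Under these conventions a value near 0 means the model's nominal SDs agree with the replicate-to-replicate variation, a positive value means the model understates the variability, and a negative value means it overstates it.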

37
Observed vs. Model SE
38
Observed vs. Model SE
39
Conclusion
  • Take logs
  • PMs need to be normalized
  • Using global background improves on use of
    probe-specific MM
  • Gene Logic spike-in and dilution study show all
    four expression measures performed very well
  • AvLog(PM-BG) is arguably the best in terms of
    bias, variance and model fit
  • Future: better BG, robust/resistant summaries

40
Acknowledgements
  • Gene Brown's group at Wyeth/Genetics Institute,
    and Uwe Scherf's Genomics Research & Development
    Group at Gene Logic, for generating the spike-in
    and dilution data
  • Gene Logic for permission to use these data
  • Ben Bolstad (UC Berkeley)
  • Magnus Åstrand (Astra Zeneca Mölndal)
  • Skip Garcia, Tom Cappola, and Joshua Hare (JHU)