Assessing Differential Expression in Mixtures of Cell Types - PowerPoint PPT Presentation


Provided by: jrgenmar


1
Assessing Differential Expression in Mixtures of
Cell Types

Achim Tresch

2
Statistical Testing in a Nutshell
Question / Hypothesis: Is the expression of gene g in cell type 1 higher than in cell type 2?
Data: Expression of gene g in several samples (absolute scale)
3
What is a good Statistic?
[Figure: two panels, "good statistic" and "poor statistic". Each shows the probability density of the statistic for genes with expression class 1 = class 2 and for genes with expression class 1 > class 2.]
4
What is a good Statistic?
Sensitivity: proportion of the truly higher expressed genes that are found.
Type I error: proportion of the non-higher expressed genes that are erroneously found.
[Figure: genes found by the statistic at a given threshold]
5
What is a good Statistic?
[Figure: genes found by the statistic at a given threshold, annotated with sensitivity and type I error]
6
What is a good Statistic?
[Figure: sensitivity and type I error, each on a scale from 0 to 1, as the threshold varies]
7
What is a good Statistic?
[Figure: sensitivity and type I error, each on a scale from 0 to 1, as the threshold varies]
8
Receiver Operating Characteristic (ROC)
[Figure: two ROC curves, one labelled "good" and one labelled "poor". Axes: sensitivity versus type I error, both from 0 to 1.]
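The construction of a ROC curve from the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `roc_points` and `area_under_curve`, the toy scores, and the truth labels are all made up for the example, and ties between scores are not treated specially.

```python
def roc_points(scores, is_higher):
    """Return (type I error, sensitivity) pairs, one per threshold.

    scores    -- one test statistic per gene (larger = more evidence)
    is_higher -- True where the gene is truly higher expressed
    """
    n_pos = sum(is_higher)
    n_neg = len(is_higher) - n_pos
    # Lowering the threshold adds genes in order of decreasing statistic.
    ranked = sorted(zip(scores, is_higher), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, truth in ranked:
        if truth:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (1.0 is perfect, 0.5 is chance)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: six genes, three of them truly higher expressed.
scores = [3.1, 2.4, 1.9, 0.7, 0.2, -0.5]
truth = [True, True, False, True, False, False]
curve = roc_points(scores, truth)
auc = area_under_curve(curve)
```

A good statistic gives a curve that rises steeply towards sensitivity 1 at small type I error; a poor statistic stays close to the diagonal.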
9
Statistics for the Detection of Differential Gene
Expression
Bad idea: Subtract the estimated sample means to obtain d.
Problem: d is not scale invariant.
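The scale problem can be seen in a small numerical sketch. The pooled two-sample t-statistic is used here only as one familiar scale-invariant alternative; that choice, the function names, and the toy measurements are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference(a, b):
    """Difference of the sample means (the 'bad idea' statistic d)."""
    return mean(a) - mean(b)


def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical expression values of one gene in two classes.
class1 = [5.1, 4.8, 5.6, 5.3]
class2 = [4.0, 4.4, 3.9, 4.3]

# The same measurements expressed in units that are ten times smaller.
scaled1 = [10 * v for v in class1]
scaled2 = [10 * v for v in class2]

d_original = mean_difference(class1, class2)
d_scaled = mean_difference(scaled1, scaled2)    # ten times larger
t_original = t_statistic(class1, class2)
t_scaled = t_statistic(scaled1, scaled2)        # unchanged
```

Rescaling multiplies d by the scale factor, so a fixed threshold on d selects different genes depending on the measurement units, while the t-statistic is unaffected.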
10
Problems in Mixed Cell Samples
Varying cell proportions lead to varying observations.
Examples:

Tumor biopsies with varying proportions of tumor tissue in each sample. An estimate of the tumor tissue proportion is provided by the pathologist.
RNAi experiments in which transfection efficiency (resp. knockdown efficiency) is substantially below 100%. The proportion of cells in which RNAi works is estimated via TaqMan analysis of the target RNA.

11
Problems in Mixed Cell Samples
Let S be a collection of samples. For each sample s in S, let x_s be the relative proportion of cells of type 1 in s.
Then,
[Figure: axis label "Expression"]
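The formula that followed "Then," is not preserved in the transcript. One plausible reading, given that the next slide estimates d as the slope of a linear fit, is a linear mixture model. The symbols y_s (measured expression in sample s) and \mu_1, \mu_2 (expression in pure cell type 1 and 2) are assumptions introduced here, not notation taken from the slides:

```latex
\[
  \mathbb{E}[y_s]
    = x_s\,\mu_1 + (1 - x_s)\,\mu_2
    = \mu_2 + x_s\,(\mu_1 - \mu_2)
    = \mu_2 + x_s\, d ,
  \qquad d := \mu_1 - \mu_2 .
\]
```

Under this reading the expected expression is a straight line in x_s with intercept \mu_2 and slope d.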
12
Connections to Linear Regression
Idea: Observe that d can be estimated as the slope of the linear fit of the data.
The resulting statistic is formally identical to the t-statistic; the difference lies in the calculation of d.
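A minimal sketch of the slope estimate, assuming an ordinary least-squares fit of expression against the mixture proportion. The function name `ols_fit`, the proportions, and the noise-free toy values are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical gene: expression 9.0 in pure type 1 cells, 6.0 in pure type 2.
mu1, mu2 = 9.0, 6.0
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]   # share of type 1 cells per sample
expression = [mu2 + p * (mu1 - mu2) for p in proportions]

intercept, d_hat = ols_fit(proportions, expression)
# Without noise the slope recovers mu1 - mu2 and the intercept recovers mu2.
```

With two pure classes (all proportions equal to 0 or 1) the slope reduces to the difference of the class means, which is why the statistic keeps the form of a t-statistic.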
13
Ingredients for the Calculation of d: Shrinkage estimation of d using penalized regression
The standard regression estimate for d is the minimizer of the quadratic loss between the measurements and the linear fit.
A way to make linear regression more robust is to add a linear penalty term for d (Lasso, R. Tibshirani 1996).
In the penalty term, λ is the so-called shrinkage parameter. It is comparable to s_0 in the SAM statistic and avoids overfitting.
How do we choose λ?
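For a single slope the penalized fit has a closed form, sketched below. The objective is written as sum_s (y_s - a - d*x_s)^2 + lam*|d| with an unpenalized intercept a. This normalisation is an assumption, since the slide's formula is not preserved, so `lam` here need not correspond one-to-one to the λ of the slides. Function names and data are illustrative.

```python
from statistics import mean


def soft_threshold(z, t):
    """Shrink z towards zero by t, and set it to zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_slope(x, y, lam):
    """Minimise sum_s (y_s - a - d*x_s)^2 + lam*|d| over a and d.

    The intercept a is not penalised, so it can be eliminated by centring
    x and y; the remaining one-dimensional problem is solved exactly by
    soft-thresholding.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    d = soft_threshold(sxy, lam / 2) / sxx
    return y_bar - d * x_bar, d


# Hypothetical noisy measurements of one gene.
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = [6.2, 6.6, 7.4, 8.5, 8.8]

_, d_ols = lasso_slope(proportions, expression, lam=0.0)      # plain least squares
_, d_shrunk = lasso_slope(proportions, expression, lam=1.0)   # pulled towards 0
_, d_zero = lasso_slope(proportions, expression, lam=10.0)    # shrunk to exactly 0
```

Small slopes are set to exactly zero and large ones are reduced by a fixed amount, which is the stabilising effect the slide compares to s_0 in SAM.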
14
Ingredients for the Calculation of d: Selection of the shrinkage parameter λ
Idea: Taking a Bayesian view, d can be derived as the mode of a posterior distribution.
15
Ingredients for the Calculation of d: Selection of the shrinkage parameter λ
If we assume that the entries in d follow a Laplace(w, 0) distribution, then the shrinkage parameter λ can be derived from the shape parameter of this distribution as λ = 1/(2w).
Apply a Kolmogorov-Smirnov test to justify this assumption and fit the shape parameter w.
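The fit and the goodness-of-fit check can be sketched with the standard library alone. Assumptions in this sketch: w is treated as the scale parameter of a Laplace distribution centred at 0, it is fitted by maximum likelihood (the mean absolute value), and only the Kolmogorov-Smirnov distance is computed, not a p-value. The simulated slopes stand in for real per-gene estimates of d.

```python
import math
import random


def laplace_cdf(v, w):
    """CDF of a Laplace distribution with location 0 and scale w."""
    if v < 0:
        return 0.5 * math.exp(v / w)
    return 1.0 - 0.5 * math.exp(-v / w)


def fit_laplace_scale(values):
    """Maximum-likelihood scale for location 0: the mean absolute value."""
    return sum(abs(v) for v in values) / len(values)


def ks_statistic(values, w):
    """Largest gap between the empirical CDF and the fitted Laplace CDF."""
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, v in enumerate(ordered):
        f = laplace_cdf(v, w)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


# Stand-in for per-gene slope estimates: the difference of two independent
# exponential draws with mean w follows a Laplace(0, w) distribution.
rng = random.Random(0)
true_w = 0.5
slopes = [rng.expovariate(1 / true_w) - rng.expovariate(1 / true_w)
          for _ in range(2000)]

w_hat = fit_laplace_scale(slopes)
lam = 1.0 / (2.0 * w_hat)            # shrinkage parameter as stated on the slide
ks = ks_statistic(slopes, w_hat)     # small values support the Laplace assumption
```

Because w is estimated from the same data that are tested, standard Kolmogorov-Smirnov critical values are conservative; the distance is best read as a descriptive check.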
16
Ingredients for the Calculation of d: Adjustment to heteroscedasticity of the measurements
Variance depends upon expression intensity.
Use the error model proposed in vsn, and use vsn to estimate the individual variances.
[Figure: axis label "Expression"]
Perform a weighted penalized linear regression instead of a simple linear regression for d.
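A weighted variant of the penalized fit might look as follows. The weights are assumed to be inverse variances supplied from elsewhere (vsn is not reimplemented here), and the objective sum_s w_s*(y_s - a - d*x_s)^2 + lam*|d| is an assumed normalisation. Names and numbers are illustrative.

```python
import math


def weighted_lasso_slope(x, y, weights, lam):
    """Minimise sum_s w_s*(y_s - a - d*x_s)^2 + lam*|d| over a and d."""
    total = sum(weights)
    # Weighted means eliminate the unpenalised intercept.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    # Soft-threshold the weighted cross-product, then rescale.
    d = math.copysign(max(abs(sxy) - lam / 2, 0.0), sxy) / sxx
    return y_bar - d * x_bar, d


# Hypothetical gene whose last measurement is very noisy (variance 100).
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = [6.0, 6.75, 7.5, 8.25, 20.0]
variances = [1.0, 1.0, 1.0, 1.0, 100.0]
weights = [1.0 / v for v in variances]

_, d_plain = weighted_lasso_slope(proportions, expression, [1.0] * 5, lam=0.0)
_, d_weighted = weighted_lasso_slope(proportions, expression, weights, lam=0.0)
# The first four points lie on a line with slope 3; down-weighting the noisy
# fifth point keeps the estimate close to that value.
```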
17
Ingredients for the Calculation of d: The Two-Component Model
(Taken from W. Huber)
measured intensity = offset + gain × true abundance
A robust fitting method for the estimation of the parameters a_i, b_i, s_1, s_2 has been developed by W. Huber and A. von Heydebreck. It has been implemented in the R package vsn.
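The effect of a two-component error model on the variance can be illustrated by simulation. The parametrisation below (additive normal noise on the offset, multiplicative log-normal noise on the gain) and all parameter values are assumptions made for this sketch; the exact form used in vsn may differ.

```python
import math
import random
from statistics import variance


def measure(true_abundance, rng, offset=50.0, gain=2.0, s_add=10.0, s_mult=0.2):
    """One simulated intensity: (offset + eps) + gain * exp(eta) * abundance.

    eps ~ N(0, s_add^2) is the additive error component,
    eta ~ N(0, s_mult^2) is the multiplicative error component.
    """
    eps = rng.gauss(0.0, s_add)
    eta = rng.gauss(0.0, s_mult)
    return offset + eps + gain * math.exp(eta) * true_abundance


rng = random.Random(1)
low = [measure(10.0, rng) for _ in range(5000)]        # weakly expressed gene
high = [measure(10000.0, rng) for _ in range(5000)]    # strongly expressed gene

var_low = variance(low)     # dominated by the additive component
var_high = variance(high)   # dominated by the multiplicative component
```

At low abundance the variance is roughly constant, while at high abundance it grows with the square of the intensity, which is why the regression for d is weighted.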
18
The final algorithm

Estimate the empirical distribution of the entries of d by a simple linear regression.
Fit a Laplace distribution to obtain the shrinkage parameter λ.
Estimate the individual measurement variances by the vsn procedure and determine the regression weights w_s.
For each gene, calculate the t-statistic.

19
Validation by Simulation (1)

Data generation:
Fix all gene expression values in class 1 to one value.
Alter half of the genes in class 2 by some constant value.
Add normally distributed noise.
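The data generation steps above can be sketched as follows. The number of genes, samples per class, baseline value, shift, and noise level are not given on the slide; the defaults below are placeholders, and `simulate_two_class` is a hypothetical name.

```python
import random


def simulate_two_class(n_genes=1000, n_per_class=4, base=8.0, shift=-1.0,
                       noise_sd=0.5, seed=0):
    """Simulate two pure classes; the first half of the genes is altered."""
    rng = random.Random(seed)
    altered = [g < n_genes // 2 for g in range(n_genes)]
    class1, class2 = [], []
    for g in range(n_genes):
        mu1 = base                                   # same value for every gene
        mu2 = base + shift if altered[g] else base   # constant change in class 2
        class1.append([mu1 + rng.gauss(0.0, noise_sd) for _ in range(n_per_class)])
        class2.append([mu2 + rng.gauss(0.0, noise_sd) for _ in range(n_per_class)])
    return class1, class2, altered


class1, class2, altered = simulate_two_class()
# 'altered' is the ground truth against which a statistic can be scored.
```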

20
Validation by Simulation (2)

Data generation:
Draw the expression values of one class from a log-normal distribution.
Draw a fold change vector f.
Calculate the expression values of the other class as the product of these values with the fold change vector.
Generate 8 samples with mixture proportions evenly distributed in [0, 1].
Add normally distributed noise.
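A sketch of this mixture simulation is given below. Several details are not preserved in the transcript and are filled with labelled assumptions: class 1 is the class drawn from the log-normal distribution, the fold change is 2 for half of the genes and 1 for the rest, "evenly distributed" is read as evenly spaced, and the log-normal parameters and noise level are arbitrary.

```python
import random


def simulate_mixtures(n_genes=500, n_samples=8, noise_sd=0.1, seed=0):
    """Simulate mixed samples: y = x*mu1 + (1 - x)*mu2 + noise."""
    rng = random.Random(seed)
    # Assumption: proportions of type 1 cells evenly spaced from 0 to 1.
    proportions = [s / (n_samples - 1) for s in range(n_samples)]
    # Assumption: pure class 1 expression is log-normal with these parameters.
    mu1 = [rng.lognormvariate(2.0, 0.5) for _ in range(n_genes)]
    # Assumption: the first half of the genes has fold change 2, the rest 1.
    fold = [2.0 if g < n_genes // 2 else 1.0 for g in range(n_genes)]
    mu2 = [m * f for m, f in zip(mu1, fold)]
    data = [[x * m1 + (1.0 - x) * m2 + rng.gauss(0.0, noise_sd)
             for x in proportions]
            for m1, m2 in zip(mu1, mu2)]
    return proportions, mu1, mu2, data


proportions, mu1, mu2, data = simulate_mixtures()
# data[g][s] is the simulated expression of gene g in mixture sample s.
```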

21
Validation by Simulation (2)
[Figure: two panels with curves labelled "t" and "SAM". Panel "ROC curve": sensitivity versus type I error. Panel "Predictive power": positive predictive value (PPV, the fraction of truly higher expressed genes among found genes) versus found genes.]
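The predictive-power curve can be computed from ranked statistics in the same way as a ROC curve. The function name `ppv_curve` and the toy data below are hypothetical.

```python
def ppv_curve(scores, is_higher):
    """Return (number of found genes, PPV) for every cut-off in the ranking."""
    ranked = sorted(zip(scores, is_higher), key=lambda p: p[0], reverse=True)
    curve = []
    true_pos = 0
    for k, (_, truth) in enumerate(ranked, start=1):
        if truth:
            true_pos += 1
        # PPV: fraction of truly higher expressed genes among the k found genes.
        curve.append((k, true_pos / k))
    return curve


# Hypothetical toy data: six genes, three of them truly higher expressed.
scores = [3.1, 2.4, 1.9, 0.7, 0.2, -0.5]
truth = [True, True, False, True, False, False]
ppv = ppv_curve(scores, truth)
```

Unlike sensitivity and type I error, the PPV depends on how many genes are truly higher expressed, so it answers the practical question of how trustworthy a list of found genes is.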
22
Validation by simulated data (3)

Generate 16 mixture samples as in example (2)
Calculate the statistics for 4 configurations

23
Outlook

Prove superiority over the method proposed by Ghosh
Complete the paper
Write R software package
Apply methodology to RNAi data

Thanks to ...

Tim Beissbarth for data acquisition and preprocessing
Andreas Buness, Markus Ruschhaupt for helpful discussions
Gordon Smyth, Dileepa Diyagama, Andrew Holloway from the WEHI (Melbourne) for the data
Annemarie Poustka, Holger Sültmann
