Assessing Differential Expression in Mixtures of Cell Types - PowerPoint PPT Presentation

Slides: 24
Provided by: jrgenmar


1
Assessing Differential Expression in Mixtures of Cell Types

Achim Tresch

2
Statistical Testing in a Nutshell
Question / hypothesis: Is the expression of gene g
in cell type 1 higher than in cell type 2?
Data: Expression of gene g in several
samples (absolute scale)
3
What is a good Statistic?
[Figure: probability densities of the statistic for genes whose expression in class 1 = class 2 and for genes whose expression in class 1 > class 2; panels: "good statistic" and "poor statistic"]
4
What is a good Statistic?
Sensitivity
(Proportion of truly higher expressed genes that
are found)
Type I error
(Proportion of non-higher expressed genes that
are erroneously found)
[Figure: the genes found by the statistic are those above a threshold]
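To make the two definitions concrete, here is a minimal sketch, assuming the rule "call a gene if its statistic exceeds the threshold" and a known truth vector (as in a simulation). The function name `sensitivity_and_type1` and the six toy genes are hypothetical.

```python
import numpy as np

def sensitivity_and_type1(stat, truly_higher, threshold):
    """Sensitivity and type I error of the rule "call gene g if stat[g] > threshold".

    stat: one test statistic per gene.
    truly_higher: True where the gene is truly higher expressed in class 1.
    """
    stat = np.asarray(stat, dtype=float)
    truly_higher = np.asarray(truly_higher, dtype=bool)
    found = stat > threshold
    # proportion of truly higher expressed genes that are found
    sensitivity = found[truly_higher].mean()
    # proportion of non-higher expressed genes that are erroneously found
    type1_error = found[~truly_higher].mean()
    return float(sensitivity), float(type1_error)

# toy data: six hypothetical genes, the first three truly higher expressed
stat = np.array([3.1, 2.4, 0.2, 1.8, -0.5, 0.9])
truth = np.array([True, True, True, False, False, False])
sens, t1 = sensitivity_and_type1(stat, truth, threshold=1.0)
print(sens, t1)   # 2 of 3 true genes found, 1 of 3 null genes found
```

Raising the threshold lowers both numbers at once; the trade-off between them is what the following slides plot.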
5
What is a good Statistic?
[Figure: sensitivity, type I error and the genes found by the statistic at a given threshold]
6
What is a good Statistic?
[Figure: sensitivity (0 to 1) plotted against type I error (0 to 1) as the threshold varies]
7
What is a good Statistic?
[Figure: sensitivity (0 to 1) plotted against type I error (0 to 1) as the threshold varies]
8
Receiver Operating Characteristic (ROC)
[Figure: ROC curves, sensitivity (0 to 1) against type I error (0 to 1); panels: "good" and "poor"]
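An ROC curve is obtained by sweeping the threshold over all values of the statistic. The sketch below, assuming no tied statistics, computes the curve and the area under it (AUC) with the trapezoidal rule; the function names `roc_curve` and `auc` are hypothetical, not taken from a particular library.

```python
import numpy as np

def roc_curve(stat, truly_higher):
    """Type I error and sensitivity for every threshold, from +inf down to -inf.

    Assumes no tied statistics (ties would need to be grouped).
    """
    stat = np.asarray(stat, dtype=float)
    truly_higher = np.asarray(truly_higher, dtype=bool)
    hits = truly_higher[np.argsort(-stat)]          # genes in order of decreasing statistic
    tp = np.concatenate([[0], np.cumsum(hits)])     # true genes found so far
    fp = np.concatenate([[0], np.cumsum(~hits)])    # null genes found so far
    return fp / (~hits).sum(), tp / hits.sum()

def auc(type1_error, sensitivity):
    """Area under the ROC curve (trapezoidal rule); 1 is perfect, 0.5 is random."""
    widths = np.diff(type1_error)
    heights = (sensitivity[1:] + sensitivity[:-1]) / 2.0
    return float(np.sum(widths * heights))

stat = np.array([3.1, 2.4, 0.2, 1.8, -0.5, 0.9])
truth = np.array([True, True, True, False, False, False])
fpr, tpr = roc_curve(stat, truth)
print(auc(fpr, tpr))
```

A "good" statistic pushes the curve towards the top-left corner (AUC near 1); a "poor" one stays near the diagonal.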
9
Statistics for the Detection of Differential Gene Expression
Bad idea: subtract the estimated sample means, $d = \bar{y}_1 - \bar{y}_2$.
Problem: d is not scale invariant.
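The surviving slide text names the problem but not the remedy. A standard remedy, assumed here, is to divide d by its estimated standard error (the pooled two-sample t-statistic); SAM additionally adds a constant s0 to the denominator. The sketch shows that rescaling the data (e.g. changing units) rescales d but leaves t unchanged; the data values are hypothetical.

```python
import numpy as np

def diff_of_means(y1, y2):
    """The 'bad idea' statistic d: difference of the sample means."""
    return float(np.mean(y1) - np.mean(y2))

def t_statistic(y1, y2, s0=0.0):
    """Pooled two-sample t-statistic; s0 > 0 gives a SAM-style moderated statistic."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, n2 = len(y1), len(y2)
    pooled_var = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return float((y1.mean() - y2.mean()) / (se + s0))

y1 = np.array([5.0, 6.0, 7.0])   # hypothetical measurements, cell type 1
y2 = np.array([4.0, 5.0, 3.0])   # hypothetical measurements, cell type 2
for scale in (1.0, 10.0):        # e.g. a change of measurement units
    print(scale, diff_of_means(scale * y1, scale * y2), t_statistic(scale * y1, scale * y2))
```

With s0 > 0 the statistic stays scale invariant only if s0 is rescaled together with the data.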
10
Problems in Mixed Cell Samples
Varying cell proportions lead to varying
observations.
Examples:
  • Tumor biopsies with varying proportions of tumor
    tissue in each sample. An estimate of the tumor
    tissue proportion is provided by the pathologist.
  • RNAi experiments in which the transfection efficiency
    (or knockdown efficiency) is substantially
    below 100%. The proportion of cells in which RNAi
    works is estimated by TaqMan analysis of the
    target RNA.

11
Problems in Mixed Cell Samples
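Let S be a collection of samples, and let $x_s$ be the
relative proportion of cells of type 1 in
sample s.
Then, on the absolute scale, the expression of a gene in sample s is
$y_s = x_s \mu_1 + (1 - x_s) \mu_2 + \varepsilon_s = \mu_2 + x_s d + \varepsilon_s$, with $d = \mu_1 - \mu_2$,
where $\mu_1$ and $\mu_2$ are the expression levels in pure cell types 1 and 2.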
12
Connections to Linear Regression
Idea: observe that d can be estimated as the
slope of the linear fit of the data against the proportions $x_s$.
The resulting statistic is formally identical to the t-statistic; the
difference lies in how d is calculated.
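A minimal sketch of the regression view, assuming ordinary least squares and the usual slope standard error with n - 2 degrees of freedom. With pure samples ($x_s \in \{0, 1\}$) the slope t-statistic coincides with the pooled two-sample t-statistic, which is the sense in which the two are "formally identical". The function name and all data values are hypothetical.

```python
import numpy as np

def slope_t_statistic(x, y):
    """Least-squares fit y_s = a + d * x_s; returns (d_hat, d_hat / se(d_hat))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    d_hat = np.sum(xc * (y - y.mean())) / sxx       # slope of the linear fit
    a_hat = y.mean() - d_hat * x.mean()
    resid = y - a_hat - d_hat * x
    sigma2 = np.sum(resid ** 2) / (len(x) - 2)      # residual variance, n - 2 df
    return float(d_hat), float(d_hat / np.sqrt(sigma2 / sxx))

# Pure samples (x_s in {0, 1}): the slope t equals the pooled two-sample t-statistic.
x_pure = np.array([1, 1, 1, 0, 0, 0])
y_pure = np.array([5.0, 6.0, 7.0, 4.0, 5.0, 3.0])
print(slope_t_statistic(x_pure, y_pure))   # d_hat = 2.0, t = 2 / sqrt(2/3)

# Mixed samples: hypothetical proportions and expression values
rng = np.random.default_rng(0)
x_mix = np.linspace(0.0, 1.0, 8)
y_mix = 5.0 + 3.0 * x_mix + rng.normal(0.0, 0.5, x_mix.size)   # mu2 = 5, d = 3 (assumed)
print(slope_t_statistic(x_mix, y_mix))
```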
13
Ingredients for the Calculation of d: Shrinkage
estimation of d using penalized regression
The standard regression estimate for d is the
minimizer of the quadratic loss
$\sum_{s \in S} (y_s - a - x_s d)^2$,
where $y_s$ is the measurement in sample s and $a + x_s d$ is the linear fit.
A way to make linear regression more robust is to
add a penalty term that is linear in $|d|$ (Lasso,
R. Tibshirani 1996):
$\sum_{s \in S} (y_s - a - x_s d)^2 + \lambda |d|$
Here, $\lambda$ is the so-called shrinkage parameter. It
is comparable to $s_0$ in the SAM statistic and
avoids overfitting.
How do we choose $\lambda$?
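Because only the single slope d is penalized (the intercept a is not), the penalized loss has a closed-form minimizer: profile out a by centring, then soft-threshold the least-squares numerator at $\lambda/2$. The sketch below assumes exactly that loss; `lasso_slope` and the data are hypothetical.

```python
import numpy as np

def lasso_slope(x, y, lam):
    """Minimise sum_s (y_s - a - x_s d)^2 + lam * |d| over a (not penalised) and d."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()     # the optimal intercept centres both variables
    sxx, sxy = np.sum(xc ** 2), np.sum(xc * yc)
    # Soft thresholding: the least-squares slope sxy / sxx is shrunk towards 0,
    # and set exactly to 0 once lam / 2 >= |sxy|.
    d = np.sign(sxy) * max(abs(sxy) - lam / 2.0, 0.0) / sxx
    a = y.mean() - d * x.mean()
    return float(a), float(d)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # hypothetical proportions
y = np.array([1.0, 2.0, 2.0, 4.0, 5.0])     # hypothetical measurements
for lam in (0.0, 1.0, 10.0):
    print(lam, lasso_slope(x, y, lam))       # slope shrinks from 4 towards 0
```

The closed form exists only because a single coefficient is penalized; lasso problems with many penalized coefficients are usually solved iteratively (e.g. by coordinate descent).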
14
Ingredients for the Calculation of d: Selection
of the shrinkage parameter $\lambda$
Idea: taking a Bayesian view, d can be derived as
the mode of a posterior distribution.
15
Ingredients for the Calculation of d: Selection
of the shrinkage parameter $\lambda$
If we assume that the entries of d follow a
Laplace(w, 0) distribution, then the shrinkage
parameter $\lambda$ can be derived from the shape
parameter of this distribution as $\lambda = 1/(2w)$.
Apply a Kolmogorov-Smirnov test to justify
this assumption, and fit the shape parameter w.
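A sketch of the fit-and-test step, assuming that w is the usual Laplace scale (density $\frac{1}{2w} e^{-|d|/w}$, SciPy's `laplace` parametrization) and that the location is fixed at 0, so the maximum-likelihood estimate is the mean absolute value. The simulated d values stand in for per-gene slope estimates; the mapping $\lambda = 1/(2w)$ is copied from the slide, not derived here.

```python
import numpy as np
from scipy import stats

def fit_laplace_scale(d):
    """Maximum-likelihood scale w of a Laplace distribution with location 0."""
    return float(np.mean(np.abs(d)))

rng = np.random.default_rng(0)
d = rng.laplace(loc=0.0, scale=0.5, size=5000)   # stand-in for per-gene slope estimates
w = fit_laplace_scale(d)
# KS test against Laplace(loc=0, scale=w). Because w is estimated from the same
# data, the nominal p-value is conservative (tends to be too large).
ks = stats.kstest(d, "laplace", args=(0.0, w))
lam = 1.0 / (2.0 * w)                            # mapping as stated on the slide
print(round(w, 3), round(float(ks.pvalue), 3), round(lam, 3))
```

A small p-value would argue against the Laplace assumption; a large one only says the data are compatible with it.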
16
Ingredients for the Calculation of d: Adjustment
to heteroscedasticity of the measurements
The variance depends on the expression intensity.
[Figure: measurement variance as a function of expression]
Use the error model proposed in vsn, and use vsn to
estimate the individual variances.
Perform a weighted penalized linear regression
instead of a simple linear regression for d.
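A minimal sketch of a weighted penalized regression, assuming the loss $\sum_s w_s (y_s - a - x_s d)^2 + \lambda |d|$ with inverse-variance weights $w_s = 1/\sigma_s^2$. The variances are supplied by hand here as a stand-in for vsn's estimates (vsn itself is an R package and is not called); the function name and all numbers are hypothetical.

```python
import numpy as np

def weighted_lasso_slope(x, y, weights, lam):
    """Minimise sum_s w_s (y_s - a - x_s d)^2 + lam * |d| over a (not penalised) and d."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, weights))
    xbar = np.sum(w * x) / np.sum(w)          # weighted means give the optimal intercept
    ybar = np.sum(w * y) / np.sum(w)
    xc, yc = x - xbar, y - ybar
    sxx, sxy = np.sum(w * xc ** 2), np.sum(w * xc * yc)
    d = np.sign(sxy) * max(abs(sxy) - lam / 2.0, 0.0) / sxx   # soft thresholding
    return float(ybar - d * xbar), float(d)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])          # hypothetical proportions
y = np.array([1.0, 2.0, 2.0, 4.0, 5.0])            # hypothetical measurements
variances = np.array([0.5, 0.6, 0.8, 1.5, 2.0])    # stand-in for vsn variance estimates
weights = 1.0 / variances                          # precise measurements count more
print(weighted_lasso_slope(x, y, weights, lam=0.5))
```

With all weights equal this reduces to the unweighted penalized fit; with $\lambda = 0$ it is ordinary weighted least squares.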
17
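Ingredients for the Calculation of d: The
Two-Component Model
(Taken from W. Huber)
measured intensity = offset + gain × true abundance,
where the offset carries an additive error and the gain a multiplicative error:
$y_i = a_i + \varepsilon_i + b_i e^{\eta_i} \mu$, with normally distributed noise terms $\varepsilon_i$ and $\eta_i$.
A robust method for estimating the
parameters $a_i, b_i, s_1, s_2$ has been developed by
W. Huber and A. v. Heydebreck. It is
implemented in the R package vsn.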
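The two-component model explains why the variance depends on intensity. Assuming $\eta \sim N(0, s_\eta^2)$ and $\varepsilon \sim N(0, s_\varepsilon^2)$ independent, $\mathrm{Var}(y) = b^2 \mu^2 e^{s_\eta^2}(e^{s_\eta^2} - 1) + s_\varepsilon^2$: additive noise dominates at low abundance, multiplicative noise at high abundance. The names `s_eps` and `s_eta` are hypothetical (which of the slide's $s_1, s_2$ corresponds to which is not recoverable), and the parameter values are made up. The sketch checks the formula by simulation.

```python
import numpy as np

def two_component_variance(mu, b, s_eps, s_eta):
    """Var(y) for y = a + b * mu * exp(eta) + eps, eta ~ N(0, s_eta^2), eps ~ N(0, s_eps^2).

    The offset a does not affect the variance; exp(eta) is log-normal with
    variance exp(s_eta^2) * (exp(s_eta^2) - 1).
    """
    return b ** 2 * mu ** 2 * np.exp(s_eta ** 2) * (np.exp(s_eta ** 2) - 1.0) + s_eps ** 2

rng = np.random.default_rng(1)
a, b, s_eps, s_eta = 50.0, 2.0, 20.0, 0.2          # hypothetical parameter values
results = {}
for mu in (10.0, 1000.0):                          # low and high true abundance
    eta = rng.normal(0.0, s_eta, 200_000)
    eps = rng.normal(0.0, s_eps, 200_000)
    y = a + b * mu * np.exp(eta) + eps             # simulated measured intensities
    results[mu] = (float(y.var()), float(two_component_variance(mu, b, s_eps, s_eta)))
print(results)   # additive noise dominates at low mu, multiplicative noise at high mu
```

The inverse of such a variance is a natural candidate for the regression weights $w_s$.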
18
The final algorithm
  • Estimate the empirical distribution of the
    entries of d by a simple linear regression
  • Fit a Laplace distribution to obtain the
    shrinkage parameter $\lambda$
  • Estimate the individual measurement variances by
    the vsn procedure, and determine the regression weights
    $w_s$
  • For each gene, calculate the t-statistic
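The four steps can be strung together as follows. This is a hypothetical sketch, not the author's implementation: the variances are passed in instead of being estimated by vsn, the Laplace scale is fitted by maximum likelihood with location 0, $\lambda = 1/(2w)$ is taken from the slide, and the final statistic divides the penalized slope by the weighted least-squares standard error $1/\sqrt{\sum_s w_s (x_s - \bar{x}_w)^2}$, which is exact only for true inverse-variance weights. The function name and the simulated data are assumptions.

```python
import numpy as np

def penalized_t_statistics(x, Y, variances):
    """Hypothetical sketch of the four steps; returns one statistic per gene.

    x: mixture proportions x_s, shape (n_samples,)
    Y: expression values on the absolute scale, shape (n_genes, n_samples)
    variances: per-measurement variances (vsn would estimate these), same shape as Y
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = 1.0 / np.asarray(variances, dtype=float)     # step 3: weights w_s = 1 / variance

    # Step 1: empirical distribution of d from simple (unweighted) linear regression
    xc = x - x.mean()
    d_simple = (Y - Y.mean(axis=1, keepdims=True)) @ xc / np.sum(xc ** 2)

    # Step 2: Laplace(w, 0) fit by maximum likelihood, lambda = 1 / (2w) as on the slide
    lam = 1.0 / (2.0 * np.mean(np.abs(d_simple)))

    # Step 4: weighted penalized slope per gene, scaled by its standard error
    out = np.empty(Y.shape[0])
    for g in range(Y.shape[0]):
        wg = W[g]
        xbar = np.sum(wg * x) / np.sum(wg)
        ybar = np.sum(wg * Y[g]) / np.sum(wg)
        xcg = x - xbar
        sxx = np.sum(wg * xcg ** 2)
        sxy = np.sum(wg * xcg * (Y[g] - ybar))
        d_pen = np.sign(sxy) * max(abs(sxy) - lam / 2.0, 0.0) / sxx
        # with exact inverse-variance weights the slope's standard error is 1 / sqrt(sxx)
        out[g] = d_pen * np.sqrt(sxx)
    return out

# Usage on hypothetical simulated data: first half of the genes changed (fold change 2)
rng = np.random.default_rng(2)
n_genes = 200
x = np.linspace(0.0, 1.0, 8)
mu2 = rng.lognormal(mean=5.0, sigma=1.0, size=n_genes)
mu1 = mu2 * np.where(np.arange(n_genes) < n_genes // 2, 2.0, 1.0)
expected = np.outer(mu1, x) + np.outer(mu2, 1.0 - x)
sd = 0.1 * expected + 5.0                       # hypothetical intensity-dependent noise
Y = expected + rng.normal(0.0, sd)
t = penalized_t_statistics(x, Y, sd ** 2)
print(t[:100].mean(), t[100:].mean())           # changed genes score much higher
```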

19
Validation by Simulation (1)
Data generation:
  • Fix all gene expression values in class 1 to one
    value
  • Alter half of the genes in class 2 by some
    constant value
  • Add normally distributed noise.
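A sketch of this first simulation design. Every number (gene count, samples per class, the fixed value, the shift, the noise level) is a hypothetical choice, and so is the direction of the shift: class 2 is lowered so that the altered genes are higher expressed in class 1, matching the question posed at the start.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_per_class = 1000, 4               # hypothetical sizes
value, shift, noise_sd = 10.0, 1.0, 0.5      # hypothetical constants

altered = np.arange(n_genes) < n_genes // 2  # half of the genes are altered
class1 = np.full((n_genes, n_per_class), value)      # every class-1 value fixed to one value
class2 = np.full((n_genes, n_per_class), value)
class2[altered] -= shift                     # alter half the genes in class 2 by a constant
class1 += rng.normal(0.0, noise_sd, class1.shape)    # add normally distributed noise
class2 += rng.normal(0.0, noise_sd, class2.shape)
```

Since `altered` records the truth, any statistic computed from `class1` and `class2` can be scored by sensitivity and type I error.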

20
Validation by Simulation (2)
Data generation:
  • Draw expression values from a log-normal distribution
  • Draw a fold change vector f
  • Calculate the expression values of the other cell type as
    the product of these values with the fold change vector f
  • Generate 8 samples with mixture proportions
    evenly distributed in [0, 1]
  • Add normally distributed noise
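A sketch of the second simulation design. The slide's distribution for the fold change vector f is not recoverable, so a hypothetical choice is made (f = 2 for a random half of the genes, f = 1 otherwise); the log-normal parameters, the noise level, and reading "evenly distributed" as equally spaced proportions are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes, n_samples, noise_sd = 1000, 8, 20.0              # hypothetical sizes and noise level
mu2 = rng.lognormal(mean=5.0, sigma=1.0, size=n_genes)    # hypothetical log-normal parameters
f = np.where(rng.random(n_genes) < 0.5, 2.0, 1.0)         # hypothetical fold change distribution
mu1 = mu2 * f                                             # product with the fold change vector
x = np.linspace(0.0, 1.0, n_samples)                      # 8 proportions evenly spread over [0, 1]
Y = np.outer(mu1, x) + np.outer(mu2, 1.0 - x)             # mixture model, absolute scale
Y += rng.normal(0.0, noise_sd, Y.shape)                   # add normally distributed noise
```

Genes with `f > 1` are the truly higher expressed ones against which the statistics are scored.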

21
Validation by Simulation (2)
[Figure: panels "ROC curve" (sensitivity against type I error) and "Predictive power" (positive predictive value PPV, the fraction of truly higher expressed genes among the found genes, against the number of found genes); curves labelled t and SAM]
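The predictive-power panel can be computed by calling the k top-ranked genes for every k. The sketch assumes the truth is known (as in simulation); `ppv_curve` and the toy data are hypothetical.

```python
import numpy as np

def ppv_curve(stat, truly_higher):
    """PPV when the k genes with the largest statistic are called, for k = 1..n."""
    order = np.argsort(-np.asarray(stat, dtype=float))
    hits = np.asarray(truly_higher, dtype=bool)[order]
    n_found = np.arange(1, hits.size + 1)
    return n_found, np.cumsum(hits) / n_found   # fraction of truly higher genes among found

stat = np.array([3.1, 2.4, 0.2, 1.8, -0.5, 0.9])
truth = np.array([True, True, True, False, False, False])
n_found, ppv = ppv_curve(stat, truth)
print(ppv)
```

Unlike the ROC curve, the PPV depends on the proportion of truly changed genes, so it is the more direct measure of how trustworthy a gene list of a given length is.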
22
Validation by Simulation (3)
  • Generate 16 mixture samples as in example (2)
  • Calculate the statistics for 4 configurations

23
Outlook
  • Prove superiority over the method proposed by Ghosh
  • Complete the paper
  • Write an R software package
  • Apply the methodology to RNAi data

Thanks to ...
  • Tim Beissbarth for data acquisition and
    preprocessing
  • Andreas Buness, Markus Ruschhaupt for helpful
    discussions
  • Gordon Smyth, Dileepa Diyagama, Andrew Holloway
    from the WEHI (Melbourne) for the data
  • Annemarie Poustka, Holger Sültmann