Statistical Modeling of Time Course Gene Expressions
Ping Ma
Department of Statistics
Center for Advanced Study
Institute for Genomic Biology
mRNA Expression
- Microarray: a snapshot of the mRNA expression levels of thousands of genes
- Expression profiles: expression measurements under different conditions and for different types of cells
mRNA Expression Pattern
Time Course Microarray
- A series of microarray experiments conducted sequentially during a biological process
- Provides insight into the underlying biology
- Helps decipher the dynamic gene regulatory network
Experiment 1: Comparative Genomics
- The worm and the fruit fly had their last common ancestor one billion years ago
Life Cycle Microarray of Worm and Fruit Fly
- Worm: cDNA array, 17,871 genes at 6 time points from egg to adulthood (Jiang et al. 2000, PNAS)
- Fruit fly: cDNA array, expression of 4,028 genes at 67 time points (Arbeitman et al. 2002, Science)
Experiment 2: Anaerobic and Aerobic Microarray in Yeast
Experiment 3: Factorial Microarray on Zebrafish Retina Development
Leung, Y. F., Ma, P., Link, B. A. and Dowling, J. (2008) PNAS, 105, 12909-12914.
Objective
- Analyze time course gene expression while taking into account time dependence and biological conditions
- Establish a flexible framework to facilitate information extraction
Challenges
- Both continuous and discrete factors
- Correlation induced by time dependence
- Different sampling strategies
  - separate sampling of adult males and females in Arbeitman (2002)
  - a break point in oxygen in Lai (2006)
Current Research
- Conventional methods, e.g. k-means and hierarchical clustering, ignore these factors
- Multivariate Gaussian models do not account for the time intervals
- Time series models require stationarity and the Markov property
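To see why conventional clustering ignores time dependence, note that the Euclidean distance used by k-means (and commonly by hierarchical clustering) treats the time points as an unordered set of coordinates: applying the same shuffle of time points to two profiles leaves their distance unchanged, and the spacing between time points never enters. A minimal sketch with two made-up profiles (illustrative values only, not data from the experiments above):

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two hypothetical profiles measured at the same 6 time points
# (illustrative values, not taken from any data set in these slides).
gene1 = [0.1, 0.8, 1.5, 1.2, 0.4, -0.2]
gene2 = [0.0, 0.5, 1.1, 1.4, 0.9, 0.3]

# Apply the same random permutation of time points to both profiles.
order = list(range(len(gene1)))
random.seed(0)
random.shuffle(order)
gene1_shuffled = [gene1[j] for j in order]
gene2_shuffled = [gene2[j] for j in order]

d_original = euclidean(gene1, gene2)
d_shuffled = euclidean(gene1_shuffled, gene2_shuffled)

# Equal up to floating-point rounding: the temporal order carries no information.
print(d_original, d_shuffled)
```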
Functional Data Approach
- The true expression profiles are modeled as curves, which are treated mathematically as functions
Mixed-Effect Representation
- The expression profile of the ith gene is y_i
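As a hedged illustration only, one common generic way to write a nonparametric mixed-effect model for time course profiles is shown below; this is an assumed textbook form, not necessarily the exact model on the slide. Here j indexes the time points t_{ij} of gene i, \mu is a smooth mean curve, b_i is a gene-specific random shift, and \epsilon_{ij} is measurement noise.

```latex
y_{ij} = \mu(t_{ij}) + b_i + \epsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \qquad
\epsilon_{ij} \sim N(0, \sigma^2)
```

Because \mu is a function rather than a vector of fixed coordinates, genes sampled at different or unequally spaced time points can share the same model.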
Illustration
Functional ANOVA
- Decomposition
- More generally
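For a mean expression function f(t, c) of time t and a discrete condition c, a standard two-way functional ANOVA decomposition is f(t, c) = mu + f_1(t) + f_2(c) + f_12(t, c), made unique by side conditions requiring each main effect and the interaction to average to zero over its own arguments. The sketch below is the discrete analogue on a finite grid of time points and conditions, assuming plain averaging as the side condition (other averaging operators are possible); the grid values are made up:

```python
def functional_anova(f):
    """Decompose f[t][c] on a grid into grand mean, time main effect,
    condition main effect and time-by-condition interaction, using
    plain averages over the grid as the side conditions."""
    n_t, n_c = len(f), len(f[0])
    mu = sum(sum(row) for row in f) / (n_t * n_c)
    # Time main effect: average over conditions, minus the grand mean.
    f_time = [sum(f[i]) / n_c - mu for i in range(n_t)]
    # Condition main effect: average over time points, minus the grand mean.
    f_cond = [sum(f[i][k] for i in range(n_t)) / n_t - mu for k in range(n_c)]
    # Interaction: whatever the additive part does not explain.
    f_inter = [[f[i][k] - mu - f_time[i] - f_cond[k] for k in range(n_c)]
               for i in range(n_t)]
    return mu, f_time, f_cond, f_inter

# Hypothetical mean expression at 4 time points under 2 conditions
# (e.g. aerobic vs anaerobic); values are made up for illustration.
grid = [[1.0, 1.2],
        [2.0, 2.5],
        [3.0, 2.4],
        [2.5, 1.1]]
mu, f_time, f_cond, f_inter = functional_anova(grid)
print(mu, f_time, f_cond)
```

A nonzero interaction term is what signals that the time course differs between conditions, which is the quantity of interest in factorial time course designs.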
Branching Spline
Penalized Henderson's Likelihood
Matrix Representation
Smoothing Parameter Selection
- How to choose the smoothing parameters?
- Generalized cross-validation (GCV)
- The asymptotic optimality of GCV was shown by Gu and Ma (2005, Ann. Stat.) in a decision-theoretic framework
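GCV picks the smoothing parameter lambda that minimizes V(lambda) = n^{-1} ||(I - A(lambda)) y||^2 / [n^{-1} tr(I - A(lambda))]^2, where A(lambda) is the smoother (hat) matrix mapping observations to fitted values. The sketch below illustrates the criterion with a simple discrete smoother, a second-difference roughness penalty on equally spaced time points (a Whittaker-type smoother chosen for brevity), standing in for the full penalized mixed-effect fit; the data are simulated:

```python
import numpy as np

def hat_matrix(n, lam):
    """Smoother matrix A(lam) = (I + lam * D'D)^{-1} for a
    second-difference roughness penalty on n equally spaced points."""
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, lam):
    """GCV score V(lam) = n^{-1}||(I - A)y||^2 / [n^{-1} tr(I - A)]^2."""
    n = len(y)
    A = hat_matrix(n, lam)
    resid = y - A @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - A) / n) ** 2

# Simulated profile: smooth trend plus noise at 30 time points.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

# Search a log-spaced grid of candidate smoothing parameters.
lambdas = 10.0 ** np.arange(-3, 5)
scores = [gcv_score(y, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]
fitted = hat_matrix(len(y), best_lam) @ y
print("lambda chosen by GCV:", best_lam)
```

The denominator tr(I - A) acts as the residual degrees of freedom, so GCV trades goodness of fit against the effective number of parameters without refitting the model n times as leave-one-out cross-validation would.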
Clustering Analysis
- The expression profile of the ith gene is y_i
EM Algorithm
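As a simplified illustration of the E-step/M-step structure only, the sketch below runs EM for a K-component Gaussian mixture in which each cluster has an unrestricted mean profile and all clusters share one spherical noise variance. It omits the smoothing and random effects of the mixed-effect model, so it is a stand-in under those simplifying assumptions, not the method of the slides; the data are simulated:

```python
import numpy as np

def em_mixture(Y, K, n_iter=100):
    """EM for a K-component Gaussian mixture of expression profiles.
    Y: (n_genes, n_times). Each cluster k has a mean profile mu[k];
    all clusters share one spherical variance sigma2 (a simplification)."""
    n, T = Y.shape
    # Initialize means by farthest-point selection (deterministic).
    idx = [0]
    for _ in range(1, K):
        d = ((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    mu = Y[idx].copy()
    pi = np.full(K, 1.0 / K)
    sigma2 = Y.var()
    for _ in range(n_iter):
        # E-step: log of pi_k * N(y_i | mu_k, sigma2 I), up to a shared constant.
        sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, K)
        log_w = np.log(pi) - sq / (2 * sigma2)
        log_w -= log_w.max(axis=1, keepdims=True)                  # avoid underflow
        W = np.exp(log_w)
        W /= W.sum(axis=1, keepdims=True)                          # posteriors
        # M-step: update mixing weights, mean profiles and shared variance.
        Nk = W.sum(axis=0)
        pi = Nk / n
        mu = (W.T @ Y) / Nk[:, None]
        sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (W * sq).sum() / (n * T)
    return pi, mu, sigma2, W

# Simulated data: 20 genes with a rising-then-falling pattern, 20 mirrored.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 8)
up = np.sin(np.pi * t)
down = -np.sin(np.pi * t)
Y = np.vstack([up + rng.normal(scale=0.2, size=(20, t.size)),
               down + rng.normal(scale=0.2, size=(20, t.size))])
pi, mu, sigma2, W = em_mixture(Y, K=2)
labels = W.argmax(axis=1)
print(pi, labels)
```

Each E-step touches every gene-cluster pair, which is what makes plain EM expensive when both the number of genes and the number of clusters are large.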
Rejection-Controlled EM
- Standard EM is computationally infeasible for large-scale data
- A rejection-control step alleviates the computational cost
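One way to read the rejection-control step, following the generic rejection-control idea from Monte Carlo methods, is to thin small posterior membership probabilities in the E-step: a probability at or above a threshold c is kept, while a smaller probability p is set to c with probability p / c and to zero otherwise. Each weight stays unbiased in expectation (c * p / c = p) while most small weights become exactly zero, so the M-step only needs to visit a few clusters per gene. The threshold c and this exact rule are assumptions for illustration, not necessarily the rule used in the slides:

```python
import random

def rejection_control(probs, c, rng):
    """Thin a vector of posterior membership probabilities.
    Entries >= c are kept as is; an entry p < c becomes c with
    probability p / c and 0 otherwise, so E[new entry] = p."""
    thinned = []
    for p in probs:
        if p >= c:
            thinned.append(p)
        elif rng.random() < p / c:
            thinned.append(c)
        else:
            thinned.append(0.0)
    return thinned

# Hypothetical E-step output for one gene over 4 clusters.
rng = random.Random(42)
posterior = [0.90, 0.06, 0.03, 0.01]
print(rejection_control(posterior, c=0.1, rng=rng))
```

The thinned weights no longer sum exactly to one; whether to renormalize them before the M-step is a further design choice, traded off against keeping each weight unbiased.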
Model Selection
- Assessing the number of components in the mixture model
- Bayes factor
- Bayesian information criterion (BIC) as an approximation
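BIC approximates -2 times the log marginal likelihood that enters the Bayes factor: BIC = -2 log L + p log n, where L is the maximized likelihood, p the number of free parameters and n the number of observations, and the number of components K with the smallest BIC is preferred (some authors flip the sign and maximize instead). The log-likelihoods and parameter counts below are made-up placeholders, not results from the slides:

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fits of mixtures with K = 1..4 components to 500 genes:
# K -> (maximized log-likelihood, number of free parameters).
n_genes = 500
fits = {1: (-5200.0, 7), 2: (-4700.0, 15), 3: (-4650.0, 23), 4: (-4640.0, 31)}

scores = {K: bic(ll, p, n_genes) for K, (ll, p) in fits.items()}
best_K = min(scores, key=scores.get)
print(scores, best_K)
```

Going from K = 3 to K = 4 raises the log-likelihood by 10 but adds 8 parameters, whose penalty 8 log 500 is about 50, so the smaller model wins.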
Comparative Genomics
- 808 ortholog genes
- We identified 34 clusters
- Functions annotated using Gene Ontology analysis
- 21 of the 34 clusters are enriched for biological functions
(Figure: Gene A of the ancestor and its orthologs, Gene A1 of species 1 and Gene A2 of species 2)
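The slides do not state which test produced the enrichment P-values listed for the clusters. A common choice for Gene Ontology enrichment is the one-sided hypergeometric (Fisher) test: the probability of seeing at least k annotated genes in a cluster of n genes drawn from N genes of which K carry the annotation. A minimal sketch, assuming that test; the counts are hypothetical:

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N total genes, K annotated,
    n genes in the cluster): the one-sided enrichment P-value."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Hypothetical counts: 808 genes in total, 40 annotated with a GO term,
# and a cluster of 25 genes of which 8 carry the annotation.
p = hypergeom_enrichment_p(k=8, n=25, K=40, N=808)
print(f"enrichment P-value: {p:.2e}")
```

Only about 1.2 annotated genes would be expected in such a cluster by chance, so observing 8 gives a very small P-value; with many GO terms and clusters tested, a multiple-testing correction would normally be applied as well.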
Pattern Formation
- embryonic development (P-value 0.0003)
- post-embryonic body morphogenesis (P-value 0.007)
- mRNA processing (P-value 0.002)
Development
- larval development (P-value 0.008)
- growth regulation (P-value < 10^-6)
Reproduction
- embryonic development (P-value < 10^-6)
- reproduction (P-value < 10^-7)
Software
- SSClust: http://www.stat.uiuc.edu/pingma/research/software/SSClust.html
- MFDA: http://cran.r-project.org/
Time Course Microarray Database
- http://error.stat.uiuc.edu/timeseries/pradoproject/index.php
References
- Ma, P. and Zhong, W. (2008) JASA
- Leung, Y. F., Ma, P., Link, B. A. and Dowling, J. (2008) PNAS
- Ma, P., Castillo-Davis, C., Zhong, W. and Liu, J. S. (2006) NAR
Acknowledgements
- Yuk Fai Leung (Purdue), Wenxuan Zhong (UIUC), Jun S. Liu (Harvard)
- Leonid Zamdborg, Ji Young Kim
- NSF DMS