Title: Kinetics analysis of microarray data: modelling and clustering of gene expression profiles
1Kinetics analysis of microarray data: modelling and clustering of gene expression profiles
Robert-Granié C., INRA-Station d'Amélioration Génétique des Animaux
San Cristobal M., Liaubet L., INRA-Laboratoire de Génétique Cellulaire
Martin P.G.P., INRA-Laboratoire de Pharmacologie et Toxicologie
Déjean S., Institut de Mathématiques, Université Paul Sabatier
2Skeletal muscle gene expression after local injury
- 10 male piglets (ranging from 23 to 32 kg body weight)
- 4 IM injections of propylene glycol in the Longissimus dorsi (LD) muscle
- 5 random sites in the left and right LD muscles at different time intervals
Timeline (figure labels): control, acclimatization period, 21 days, 7 days, 48 h, 6 h, euthanasia
3cDNA microarrays
- 50 muscle samples for RNA isolation
- 3456 pig cDNA clones, spotted in duplicate on two separate fields of the same membrane
- Nylon cDNA microarrays
Figure labels: 2.5 cm, 7.5 cm, Muscle Lesion 1, Muscle Lesion 2
4Mean profiles of the normalized intensities
- 10 pigs x 5 time points x 2 replicates = 100 observations/clone
- 3456 clones
Sample of 50 gene profiles
5Hepatic gene expression profiles
- Temporal gene expression profiles during a fasting period in mice
- 88 mice
  - 2 genotypes (wild-type and knockout PPARα-/-)
  - 11 time points between 0 and 72 hours (0h-3h-6h-9h-12h-18h-24h-36h-48h-60h-72h)
  - 4 mice / time point / genotype
  - RNA from liver
- 200 genes (dedicated cDNA chip) on nylon membranes
6Means of the observed values by genotype
7Biological questions
- To fit the kinetics of gene expression
- To identify the genes differentially expressed over time (pig) / both over time and between genotypes (mice)
- To identify homogeneous clusters of genes with similar profiles, whatever the level of expression
8The simplest semi parametric model
Each gene is fitted by a penalized linear spline model (Ruppert, Wand, Carroll, 2003), i.e. a smooth function of time
The penalized spline smoother corresponds exactly to the optimal predictor in a mixed model framework:
- easy implementation in standard statistical software
- estimation and inference about parameters are available
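The model equation did not survive on this slide. As a minimal sketch of the general technique, assuming the usual truncated-line basis f(t) = b0 + b1 t + sum_k u_k (t - kappa_k)+, a penalized linear spline can be fitted by ridge-penalizing only the knot coefficients u_k. The function names, the toy data and the fixed smoothing parameter `lam` are illustrative assumptions. In the mixed model formulation, `lam` would instead be estimated as a ratio of variance components.

```python
def truncated_line_basis(t, knots):
    """Design row [1, t, (t - k1)+, ..., (t - kK)+] for one time point."""
    return [1.0, float(t)] + [max(t - k, 0.0) for k in knots]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_penalized_spline(times, values, knots, lam):
    """Minimize the residual sum of squares plus lam * sum(u_k ** 2)."""
    rows = [truncated_line_basis(t, knots) for t in times]
    p = 2 + len(knots)
    # Normal equations (C'C + lam * D) theta = C'y with D = diag(0, 0, 1, ..., 1):
    # the intercept and slope are left unpenalized.
    ctc = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    cty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    for k in range(2, p):
        ctc[k][k] += lam
    return solve(ctc, cty)


def predict(theta, t, knots):
    """Evaluate the fitted spline at time t."""
    return sum(c * b for c, b in zip(theta, truncated_line_basis(t, knots)))


# Toy profile (made up, not from the study): rises until t = 2, then declines.
times = [0, 1, 2, 3, 4, 5, 6]
values = [0.0, 1.0, 2.0, 1.5, 1.0, 0.5, 0.0]
knots = [2.0, 4.0]

theta = fit_penalized_spline(times, values, knots, lam=0.1)
fitted = [predict(theta, t, knots) for t in times]
```

With `lam = 0` the fit reduces to an ordinary least-squares broken line. Larger values of `lam` shrink the slope changes at the knots toward zero, so the fit approaches a single straight line.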
9Semi parametric mixed model using penalized splines
To take into account the differences between individuals
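The equation shown on this slide was lost in extraction. For orientation only, one common way to write a subject-specific penalized spline model is sketched below. Whether the authors used exactly this specification is an assumption. Here y_ij is the expression of a given gene for individual i at time t_ij, kappa_1, ..., kappa_K are the knots, (x)_+ = max(x, 0), f is the mean curve and g_i is the deviation of individual i from it.

```latex
\begin{align*}
y_{ij} &= f(t_{ij}) + g_i(t_{ij}) + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim N(0, \sigma_\varepsilon^2) \\
f(t) &= \beta_0 + \beta_1 t + \sum_{k=1}^{K} u_k (t - \kappa_k)_+,
  & u_k &\sim N(0, \sigma_u^2) \\
g_i(t) &= a_{i0} + a_{i1} t + \sum_{k=1}^{K} v_{ik} (t - \kappa_k)_+,
  & (a_{i0}, a_{i1})^\top &\sim N(0, \Sigma), \quad v_{ik} \sim N(0, \sigma_v^2)
\end{align*}
```

In this form, setting the variance components of g_i to zero gives a model with no individual variability, which is the kind of nested comparison a likelihood ratio test can address.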
10How do we select the best model?
- Choice of the degree of the polynomial (linear, quadratic, cubic)
- Choice of the number of knots and their locations
- Selection of genes differentially expressed over time / both over time and between genotypes
11The best model (mice data)
A linear penalized spline for each genotype, with 5 knots (12, 24, 36, 48, 60 hours)
Selection of genes based on a significant time x genotype interaction: 23 genes are declared differentially expressed both over time and between genotypes
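To make "a linear penalized spline for each genotype" concrete, the sketch below builds a design matrix with one block of truncated-line columns per genotype, using the knots and sampling times given on the slides. The genotype labels `"WT"` and `"KO"`, the function names and the column ordering are illustrative assumptions, not the authors' implementation.

```python
KNOTS_H = [12, 24, 36, 48, 60]                       # knot locations in hours (from the slide)
TIMES_H = [0, 3, 6, 9, 12, 18, 24, 36, 48, 60, 72]   # sampling times in hours
GENOTYPES = ("WT", "KO")                             # hypothetical labels: wild-type, knockout
MICE_PER_CELL = 4                                    # mice per time point per genotype


def spline_row(t, knots=KNOTS_H):
    """Truncated-line basis [1, t, (t - k1)+, ..., (t - k5)+] at time t."""
    return [1.0, float(t)] + [max(t - k, 0.0) for k in knots]


def design_row(t, genotype):
    """One spline block per genotype; only the observed genotype's block is non-zero."""
    row = []
    for g in GENOTYPES:
        block = spline_row(t)
        row.extend(block if g == genotype else [0.0] * len(block))
    return row


# One row per mouse: 2 genotypes x 11 time points x 4 mice = 88 rows.
design = [
    design_row(t, g)
    for g in GENOTYPES
    for t in TIMES_H
    for _ in range(MICE_PER_CELL)
]
```

A time x genotype interaction test then amounts to comparing this model with one in which both genotypes share a single set of time-dependent columns and differ at most by a constant shift.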
12Selection of differentially expressed genes over time (pig data)
- Knots located at time points 6 h, 2, 7 and 21 days
- Model comparison (RLT) to detect groups of genes with no, low or high individual variability
- Selection of genes varying over time
- 37 genes are declared differentially expressed over time in the group with low individual variability
13Raw and fitted gene expression profiles
Figure panels: raw data, fitted individual profiles, mean profile
14Clustering of gene expression profiles
- On the fitted curves of the 37 genes
- The curves are summarized by the values of the derivative of the fitted expression profiles at discretization points (20 points equally spaced between 0 and 21 days)
- 37 individual-genes in rows x 20 variables-dates in columns
- Hierarchical clustering (Euclidean distance, Ward criterion) and partitioning method (k-means)
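The sketch below illustrates the idea of clustering on derivatives: each fitted curve is summarized by its slope at 20 equally spaced dates between 0 and 21 days, and the resulting rows are partitioned with a plain k-means. Only the k-means step is shown; the Ward hierarchical step is not implemented here. The toy curves, the central-difference derivative and the hand-picked starting centers are all assumptions made for illustration.

```python
def derivative_features(curve, n_points=20, t_max=21.0, h=1e-3):
    """Central-difference slope of a fitted curve at equally spaced dates in [0, t_max]."""
    step = t_max / (n_points - 1)
    grid = [i * step for i in range(n_points)]
    return [(curve(t + h) - curve(t - h)) / (2 * h) for t in grid]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, centers, n_iter=50):
    """Lloyd iterations starting from the given centers; returns labels and centers."""
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centers)), key=lambda c: sq_dist(r, centers[c]))
            for r in rows
        ]
        new_centers = []
        for c in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Made-up fitted curves: two increasing and two decreasing profiles at different levels.
curves = [
    lambda t: 5.0 + 0.10 * t,
    lambda t: 1.0 + 0.12 * t,
    lambda t: 3.0 - 0.10 * t,
    lambda t: 8.0 - 0.09 * t,
]
features = [derivative_features(c) for c in curves]   # 4 rows x 20 columns
labels, centers = kmeans(features, centers=[features[0], features[2]])
```

Because only slopes enter the distance, the first two curves fall in the same cluster even though their expression levels differ, which matches the stated goal of grouping profiles whatever the level of expression.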
15Profile clustering
Inflammation process; cellular movement; cellular growth and proliferation
Cellular metabolism; cytoskeleton
Signal transduction; protein interaction
Protein synthesis
16Conclusion
- A flexible and simple method of fitting curves in kinetic studies of microarray data
- Computations are easily performed, thanks to the mixed model packages available in many standard statistical software programs
- Results are in accordance with our current knowledge of the biological processes underlying muscular repair (pig data) / modulated during fasting (mice data)
- The models can be easily extended to more general models:
  - polynomials of degree p > 1
  - correlation among errors
  - heterogeneity of variances
17Thank You For Your Attention
Questions !!!