1
Analyzing Time Series Gene Expression Data
Ziv Bar-Joseph
Center for Automated Learning and Discovery, Carnegie Mellon University
2
Expression Experiments
Time series: multiple arrays at various temporal
intervals
Static: snapshot of the activity in the cell
3
Abundance of time series expression datasets
  • Over 30 of the 170 papers perform time series
    experiments.
  • A total of 220 time series datasets.
  • More arrays are used for time series than for static
    expression experiments.

4
(No Transcript)
5
Unique features of time series expression experiments
  • Autocorrelation between successive points.
  • Can identify the complete set of acting genes.
  • Allow inference of causality.

6
Time Series Examples: Development
Development of fruit flies (Arbeitman et al., Science 2002)
7
Time Series Examples (cont)
Function: infectious diseases, response to external stimuli
Interactions and systems: transcription factor knockouts

8
Time Series Examples: Systems
The cell cycle system in yeast (Simon et al., Cell 2001)
9
Computational challenges
[Figure: computational and biological challenges]
10
Sampling Rates
  • Non-uniform
  • Differ between experiments

11
Cell Cycle Datasets
12
[Figure: overview with levels Networks, Pattern Recognition, Data Analysis]
13
Representing time series expression data
  • We are capturing a continuous process with a few
    samples.
  • We need a way to convert our samples for each
    gene to an expression profile.
  • Some simple techniques:
      - Linear interpolation
      - Spline interpolation
      - Functional assignment
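A minimal sketch of the first technique, linear interpolation, for one gene sampled at non-uniform times. The sample times and values are hypothetical. A missing measurement is marked with NaN and the gap is bridged by a straight line between its neighbours.

```python
import numpy as np

# Hypothetical gene sampled at non-uniform times (minutes); 30-minute value missing.
times = np.array([0.0, 10.0, 20.0, 30.0, 60.0, 90.0])
values = np.array([0.1, 0.8, 1.2, np.nan, -0.4, 0.3])

def linear_profile(times, values, query_times):
    """Piecewise-linear profile through the observed (non-NaN) samples."""
    observed = ~np.isnan(values)
    # np.interp needs increasing sample times; missing points are skipped,
    # so each gap is filled by the line between the neighbouring samples.
    return np.interp(query_times, times[observed], values[observed])

grid = np.linspace(0.0, 90.0, 10)
profile = linear_profile(times, values, grid)
# The missing 30-minute point lies on the line from 1.2 (20 min) to -0.4 (60 min).
estimate_30 = linear_profile(times, values, np.array([30.0]))[0]
```

Nothing here smooths noise: every observed value, noisy or not, is reproduced exactly.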

14
Standard interpolation
With missing values and noise, linear interpolation
fails to reproduce an accurate representation.
15
Splines
  • Instead of linear interpolation, we can use
    splines (piecewise polynomials).
  • Still, splines will overfit when faced with missing
    values and noise.
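The slide does not name a particular spline family. As an assumption, this sketch uses a natural cubic interpolating spline (second derivative zero at both ends) built with plain NumPy. It shows the overfitting problem directly: the curve passes through every noisy sample. The data values are hypothetical.

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Return a callable natural cubic spline that interpolates (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - 1                      # number of intervals
    h = np.diff(x)
    # Tridiagonal system for second derivatives M_0..M_n with M_0 = M_n = 0.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def evaluate(t):
        t = np.asarray(t, dtype=float)
        # Interval index for each query point (clipped to the end intervals).
        i = np.clip(np.searchsorted(x, t) - 1, 0, n - 1)
        hi, dl, dr = h[i], t - x[i], x[i + 1] - t
        return (M[i] * dr**3 / (6 * hi) + M[i + 1] * dl**3 / (6 * hi)
                + (y[i] / hi - M[i] * hi / 6) * dr
                + (y[i + 1] / hi - M[i + 1] * hi / 6) * dl)
    return evaluate

# Hypothetical noisy samples at non-uniform times.
times = [0, 10, 20, 30, 60, 90]
noisy = [0.1, 0.8, 1.2, 0.3, -0.4, 0.3]
spline = natural_cubic_spline(times, noisy)
print(np.allclose(spline(times), noisy))  # → True: the noise is reproduced too
```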

16
The power of co-expression
  • We can modify our splines to take into account
    the fact that many genes are co-expressed.

17
Avoiding overfitting
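  • Require that, for each gene $i$ in class $j$, the deviation $\gamma_i$ of its
    spline coefficients from the class mean satisfies $\gamma_i \sim N(0, \Gamma_j)$.
  • Add a noise term $\epsilon$.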
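A minimal sketch of how co-expression counters overfitting. It assumes a mixed-effects model of the form $y_i = S_i(\mu_j + \gamma_i) + \epsilon_i$ with $\gamma_i \sim N(0, \Gamma_j)$ and $\epsilon_i \sim N(0, \sigma^2 I)$, and it treats the class mean $\mu_j$, the covariance $\Gamma_j$ and $\sigma^2$ as already known. In practice these would have to be estimated from all genes in the class. For brevity a quadratic polynomial basis (`np.vander`) stands in for a spline basis; the function name and data are hypothetical.

```python
import numpy as np

def gene_coefficients(S, y, mu, Gamma, sigma2):
    """Posterior mean of a gene's coefficients, mu + gamma_hat (assumed model).

    Rows of S and y for missing time points are simply left out.
    """
    precision = S.T @ S / sigma2 + np.linalg.inv(Gamma)
    gamma_hat = np.linalg.solve(precision, S.T @ (y - S @ mu) / sigma2)
    return mu + gamma_hat

t_obs = np.array([0.0, 0.5])               # only two time points observed
S = np.vander(t_obs, 3, increasing=True)   # stand-in basis: 1, t, t^2
y = np.array([1.0, 2.0])
mu = np.array([0.0, 1.0, 0.0])             # hypothetical class-mean coefficients
tight = gene_coefficients(S, y, mu, 0.01 * np.eye(3), sigma2=0.1)
loose = gene_coefficients(S, y, mu, 100.0 * np.eye(3), sigma2=0.1)
```

With a small $\Gamma_j$ the gene's curve stays close to its class mean even though three coefficients are fitted from two points. With a large $\Gamma_j$ it fits the observed values almost exactly.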

18
Class Assignment
  • In some cases the biological classes are known
    in advance.

Otherwise, the algorithm can be modified and combined with a
Gaussian mixture algorithm to cluster the continuous
representations of the expression data.
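The slide combines the mixture model with the spline fit itself. As a simplification, this sketch clusters already-estimated coefficient vectors (one row per gene) with a plain EM for spherical Gaussians with a fixed, shared variance. The function name, variance and data are hypothetical, and empty components are not handled.

```python
import numpy as np

def gmm_cluster(X, init_means, var=1.0, n_iter=50):
    """EM for a mixture of spherical Gaussians with a fixed, shared variance.

    X: (genes x coefficients) matrix of continuous representations.
    Returns hard labels (argmax responsibility) and the fitted means.
    """
    X = np.asarray(X, dtype=float)
    means = np.array(init_means, dtype=float)
    weights = np.full(len(means), 1.0 / len(means))
    for _ in range(n_iter):
        # E-step: log responsibilities, shifted per row for numerical stability.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = np.log(weights) - d2 / (2.0 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and component means.
        counts = resp.sum(axis=0)
        weights = counts / len(X)
        means = resp.T @ X / counts[:, None]
    return resp.argmax(axis=1), means

X = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
labels, means = gmm_cluster(X, init_means=[[1.0, 1.0], [4.0, 4.0]])
```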
19
Missing values
20
Interpolation
21
Alignment
[Figure: FKH1]
  • Differences in the timing of similar biological
    processes

22
Continuous Alignment
Using the estimated splines, we continuously
align two expression datasets by minimizing a
global error function (RECOMB 2002).
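A simplified sketch of continuous alignment, under assumptions that are not taken from the slides. The warp is linear, $s \mapsto a s + b$. The error is the mean squared difference between the two continuous profiles over their overlap, and the search is a plain grid search. A minimum-overlap requirement keeps near-empty overlaps from scoring well. Function names and test profiles are hypothetical.

```python
import numpy as np

def alignment_error(f1, f2, a, b, range1, range2, min_overlap=0.5, n=200):
    """Mean squared difference between f1(s) and f2(a*s + b) on their overlap.

    Returns inf when the overlap covers less than min_overlap of range1.
    """
    lo = max(range1[0], (range2[0] - b) / a)
    hi = min(range1[1], (range2[1] - b) / a)
    if hi - lo < min_overlap * (range1[1] - range1[0]):
        return np.inf
    s = np.linspace(lo, hi, n)
    return float(np.mean((f1(s) - f2(a * s + b)) ** 2))

def align(f1, f2, range1, range2, stretches, shifts):
    """Grid search for the warp s -> a*s + b (a > 0) with the smallest error."""
    err, a, b = min((alignment_error(f1, f2, a, b, range1, range2), a, b)
                    for a in stretches for b in shifts)
    return a, b, err

def f2(t):
    # Hypothetical second dataset: same process, twice as slow, offset by 5.
    return np.sin((t - 5.0) / 2.0)

a, b, err = align(np.sin, f2, (0.0, 10.0), (0.0, 30.0),
                  stretches=np.linspace(1.0, 3.0, 21), shifts=np.arange(0, 11))
```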
23
Identifying differentially expressed genes
[Figure: wild type vs. knockout]
  • Manual comparison is hard.
  • Different sampling rates and timing prevent
    direct comparison.

(Zhu et al., Nature 2000)
24
Using Global Error to Determine Significance
Key idea: combine an individual noise model with a
global error (the area between the curves) that
captures the temporal difference between the two
profiles.
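A minimal sketch of the global error and one way to attach a significance level to it. The area between two continuous profiles is computed with the trapezoid rule on $|f - g|$. Piecewise-linear curves stand in for the fitted splines. For the noise model this sketch swaps in a simple Monte Carlo null: both series are taken to be the wild-type values plus independent Gaussian noise of an assumed standard deviation. The function names, noise level and data are hypothetical.

```python
import numpy as np

def area_between(f, g, t0, t1, n=1001):
    """Area between two profiles on [t0, t1] (trapezoid rule on |f - g|)."""
    t = np.linspace(t0, t1, n)
    d = np.abs(f(t) - g(t))
    return float(np.sum((d[1:] + d[:-1]) * np.diff(t)) / 2.0)

def curve(times, values):
    """Piecewise-linear stand-in for a fitted continuous profile."""
    return lambda t: np.interp(t, times, values)

def area_p_value(times, wt, ko, noise_sd, n_null=1000, seed=0):
    """Monte Carlo p-value of the WT-vs-knockout area under the assumed null."""
    rng = np.random.default_rng(seed)
    t0, t1 = times[0], times[-1]
    observed = area_between(curve(times, wt), curve(times, ko), t0, t1)
    null = np.array([
        area_between(curve(times, wt + rng.normal(0.0, noise_sd, len(wt))),
                     curve(times, wt + rng.normal(0.0, noise_sd, len(wt))), t0, t1)
        for _ in range(n_null)])
    return observed, (1 + np.sum(null >= observed)) / (n_null + 1)

times = np.array([0.0, 10.0, 20.0, 30.0])
wt = np.array([0.0, 1.0, 0.0, -1.0])
observed, p = area_p_value(times, wt, wt + 2.0, noise_sd=0.1)
```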
25
Comparing the continuous representation
[Figure: WT vs. knockout]
26
Enrichment for the Cell Cycle Factors
27
Overcoming population effects
[Figure: Smc3 observed values]
  • Microarray experiments profile a population of
    cells.
  • Initially the cells are synchronized, but they lose
    their synchronization over time.
  • We need to compensate for this loss of synchronization
    in order to recover single-cell values.
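The slides do not give the population model. As an illustrative assumption, this forward model treats each cell's position in the cycle as a phase (in cycles) drawn from a normal distribution whose spread grows linearly with time. The observed value is the average of a single-cell profile over that distribution. It shows why a periodic single-cell signal is damped in population data. Recovering single-cell values would mean inverting such a model, which is not shown. All names and parameters are hypothetical.

```python
import numpy as np

def population_signal(single_cell, t, period, sd0, sd_rate, n_grid=2001):
    """Expected population value at time t under a hypothetical phase-spread model.

    Cell phases ~ Normal(t / period, sd(t)) with sd(t) = sd0 + sd_rate * t,
    so the spread grows as synchronization is lost.
    """
    mean = t / period
    sd = sd0 + sd_rate * t
    phi = np.linspace(mean - 6 * sd, mean + 6 * sd, n_grid)
    w = np.exp(-0.5 * ((phi - mean) / sd) ** 2)
    w /= w.sum()                       # discrete normal weights over the grid
    return float(np.sum(w * single_cell(phi)))
```

For a cosine single-cell profile the result can be checked in closed form: if $\varphi \sim N(m, s^2)$, then $E[\cos 2\pi\varphi] = \cos(2\pi m)\, e^{-2\pi^2 s^2}$. The amplitude therefore shrinks as $s$ grows.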

28
[Figure: overview with levels Networks, Pattern Recognition, Individual Gene]
29
Pattern recognition and clustering
  • Identifying relationships between genes based on
    expression profiles.
  • Handling non-uniform sampling rates.
  • Determining relationships between clusters.

30
Time Shifted and Inverted Profiles
(Qian et al., Journal of Molecular Biology 2001)
31
Results
Simultaneous expression profile relationships
Inverted expression profile relationships
Time delayed expression profile relationships
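Qian et al. detect these relationships with a local alignment of profiles. The sketch below deliberately substitutes something simpler: a lagged Pearson correlation. A lag of 0 with $r > 0$ indicates a simultaneous relationship, $r < 0$ an inverted one, and a nonzero lag a time-delayed one. The function name and test profiles are hypothetical.

```python
import numpy as np

def best_lagged_relation(x, y, max_lag):
    """Return (lag, r) maximizing |Pearson r| between x[t] and y[t + lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 3:                 # too few overlapping points
            continue
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

t = np.arange(20)
x = np.sin(0.5 * t)
y = -np.sin(0.5 * (t - 2))             # inverted copy of x, delayed by 2 samples
lag, r = best_lagged_relation(x, y, max_lag=3)
```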
32
Hierarchical clustering
  • For n leaves there are n-1 internal nodes.
  • Each flip of an internal node creates a new
    linear ordering.
  • There are $2^{n-1}$ possible linear orderings of the
    leaves of the tree.
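The count of $2^{n-1}$ can be checked by enumerating every flip. In this sketch a tree is either a leaf label or a pair `(left, right)`. The representation and the example tree are hypothetical.

```python
from itertools import product

def leaf_orderings(tree):
    """All leaf orders obtainable by flipping internal nodes of a binary tree."""
    if not isinstance(tree, tuple):
        return [[tree]]
    left, right = tree
    orders = []
    for l, r in product(leaf_orderings(left), leaf_orderings(right)):
        orders.append(l + r)   # node not flipped
        orders.append(r + l)   # node flipped
    return orders

tree = ((("a", "b"), "c"), ("d", "e"))   # hypothetical tree with 5 leaves
orders = leaf_orderings(tree)
print(len(orders))  # → 16, i.e. 2 ** (5 - 1)
```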

33
Determine Relations Between Clusters
Optimal leaf ordering selects the ordering that
maximizes the sum of the similarities of adjacent
leaves in the clustering tree.
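Optimal leaf ordering can be solved by dynamic programming over the leftmost and rightmost leaves of each subtree. The best ordering of a node with outer leaves $(i, j)$ joins the best orderings of its two children at the inner leaves $k, m$ that maximize $M(\text{first}, i, k) + S(k, m) + M(\text{second}, m, j)$. This is a straightforward, unoptimized sketch of that recurrence, not the authors' faster implementation. It assumes a symmetric similarity matrix, and the tree and matrix below are hypothetical.

```python
def optimal_leaf_order(tree, S):
    """Leaf order of a binary tree maximizing the sum of adjacent similarities.

    tree: leaf index (int) or pair (left, right); S: symmetric similarity matrix.
    Returns (best score, leaf order).
    """
    def best(node):
        # Maps (leftmost leaf, rightmost leaf) -> (best score, order).
        if not isinstance(node, tuple):
            return {(node, node): (0.0, [node])}
        left, right = best(node[0]), best(node[1])
        table = {}
        for first, second in ((left, right), (right, left)):   # unflipped, flipped
            for (i, k), (s1, o1) in first.items():
                for (m, j), (s2, o2) in second.items():
                    score = s1 + S[k][m] + s2
                    if (i, j) not in table or score > table[(i, j)][0]:
                        table[(i, j)] = (score, o1 + o2)
        return table
    return max(best(tree).values(), key=lambda entry: entry[0])

S = [[0, 1, 5, 0, 2],
     [1, 0, 0, 3, 0],
     [5, 0, 0, 1, 4],
     [0, 3, 1, 0, 0],
     [2, 0, 4, 0, 0]]
score, order = optimal_leaf_order((((0, 1), 2), (3, 4)), S)
```

For this matrix the unflipped order `[0, 1, 2, 3, 4]` scores 2, while the optimal order scores 10.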
34
Results: Synthetic Data
[Figure panels: input, hierarchical clustering, optimal ordering]
35
24 cell cycle experiments
36
Short Time Series
  • 60 of the time series datasets are short (< 7 time points).
  • Over 40,000 signals are measured, the data are very
    noisy, and experiments are compared across all
    time points.
  • Most clustering algorithms miss small sets of genes
    and, in addition, cannot be used to compare
    datasets.

37
Taking advantage of the small number of points
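The slide gives only the idea. As an assumption, this sketch shows one way to exploit a small number of time points. Every candidate profile that starts at 0 and changes by at most one unit per step is enumerated ($3^{n-1} - 1$ non-flat profiles for $n$ points). Each gene is then assigned to the candidate it correlates with best. Function names and data are hypothetical.

```python
import itertools
import numpy as np

def model_profiles(n_points, max_step=1):
    """All non-flat profiles starting at 0 with per-step change in [-max_step, max_step]."""
    steps = range(-max_step, max_step + 1)
    profiles = []
    for deltas in itertools.product(steps, repeat=n_points - 1):
        p = np.concatenate([[0], np.cumsum(deltas)])
        if np.any(p != 0):            # the flat profile has no defined correlation
            profiles.append(p)
    return np.array(profiles, dtype=float)

def assign_to_profiles(genes, profiles):
    """Index of the best-correlated model profile for each gene (one row per gene).

    Assumes no gene profile is constant (its correlation would be undefined).
    """
    def center_and_scale(a):
        a = a - a.mean(axis=1, keepdims=True)
        return a / np.linalg.norm(a, axis=1, keepdims=True)
    corr = center_and_scale(np.asarray(genes, dtype=float)) @ center_and_scale(profiles).T
    return corr.argmax(axis=1)

P = model_profiles(4)
idx = assign_to_profiles([[0.1, 1.0, 2.1, 2.9]], P)
```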
38
[Figure: overview with levels Networks, Pattern Recognition, Individual Gene]
39
Systems Biology
  • Different types of data provide partial
    information about the activity in the cell.
  • By integrating these data sources we can obtain a
    better picture of the activity in the cell.
  • There is a lot of current interest, though relatively few
    methods construct temporal models.

40
Dynamic Bayesian Networks
  • Bayesian networks are graphical models that can
    account for the stochasticity in the data.
  • They can be extended to handle time series data
    (dynamic Bayesian networks).
  • So far they have been used for small-scale modeling.
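In a two-slice dynamic Bayesian network, the transition model factorizes into one conditional distribution per gene, $P(X^g_{t+1} \mid \text{parents}_t)$. This sketch estimates one such factor by counting, with Laplace smoothing. It assumes binarized expression and a known parent set; structure learning is not shown, and the gene names and series are hypothetical.

```python
from collections import Counter

def fit_transition_cpd(series, child, parents, alpha=1.0):
    """Estimate P(child_{t+1} = 1 | parents_t) from a binary time series.

    series: list of dicts (one per time point) mapping gene -> 0/1.
    alpha: Laplace pseudo-count added to both outcomes.
    """
    ones, totals = Counter(), Counter()
    for now, nxt in zip(series, series[1:]):
        key = tuple(now[p] for p in parents)
        totals[key] += 1
        ones[key] += nxt[child]
    return {key: (ones[key] + alpha) / (totals[key] + 2 * alpha) for key in totals}

# Hypothetical series in which B at time t+1 copies A at time t.
series = [{"A": 1, "B": 0}, {"A": 0, "B": 1}, {"A": 1, "B": 0},
          {"A": 0, "B": 1}, {"A": 1, "B": 0}]
cpd = fit_transition_cpd(series, child="B", parents=["A"])
```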

41
Modeling tryptophan metabolism in E. coli
(Ong et al., Bioinformatics 2002)
42
Genetic RegulAtory Modules (GRAM)
  • Gene module: a set of genes that are co-regulated and
    co-expressed.
  • Functional module: a collection of gene modules with
    related function.
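This is not the GRAM algorithm itself. It is a simplified, hypothetical check of the two criteria in the gene-module definition. A gene counts as "co-regulated" if every factor in a given set binds it below a p-value threshold. It counts as "co-expressed" if its profile correlates with the mean profile of the bound genes. Thresholds, names and data are assumptions.

```python
import numpy as np

def gene_module(pvals, expr, tf_set, p_thresh=0.001, r_min=0.7):
    """Genes bound by every TF in tf_set and correlated with their mean profile.

    pvals: dict gene -> dict TF -> binding p-value
    expr:  dict gene -> expression profile (sequence of floats)
    """
    bound = [g for g in pvals
             if all(pvals[g].get(tf, 1.0) < p_thresh for tf in tf_set)]
    if not bound:
        return []
    core = np.mean([expr[g] for g in bound], axis=0)
    return [g for g in bound if np.corrcoef(expr[g], core)[0, 1] >= r_min]

pvals = {"g1": {"TF1": 1e-4, "TF2": 1e-5}, "g2": {"TF1": 1e-4, "TF2": 1e-4},
         "g3": {"TF1": 1e-5, "TF2": 1e-4}, "g4": {"TF1": 0.2, "TF2": 1e-4}}
expr = {"g1": [0, 1, 2, 3], "g2": [0, 1.1, 2, 2.9],
        "g3": [3, 2, 1, 0], "g4": [0, 1, 2, 3]}
module = gene_module(pvals, expr, tf_set={"TF1", "TF2"})
```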

43
Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules
We combine GRAM with our continuous alignment
algorithms to construct a dynamic model for a
sub-network.
44
Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules
Individual regulators: ovals, connected to their
modules. Dashed line: extends from the module
encoding a regulator to the regulator's protein oval.
45
Comparing the Continuous Representation
[Figure: WT vs. knockout]
46
Assembly of the Cell Cycle Transcriptional Regulatory Network
Blue boxes: gene modules
Individual regulators: ovals, connected to their
modules. Dashed line: extends from the module
encoding a regulator to the regulator's protein oval.
47
Summary
  • Time series expression data can be used to answer
    important biological questions.
  • Pros: autocorrelation, allows for causal
    inference, provides a better view of cellular
    activity.
  • Cons: large number of signals but small number of
    time points, noise, lack of repeats.
  • By using methods developed specifically for this
    data, we can overcome the above problems and take
    advantage of its unique properties.

48
Want to know more?
  • Z. Bar-Joseph, "Analyzing time series gene
    expression data," Bioinformatics, in press.
  • www.cs.cmu.edu/zivbj
  • zivbj_at_cs.cmu.edu