Title: Course Module: Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics Un
1Course Module: Brief Introduction to Systems Biology
Sven Bergmann
Department of Medical Genetics
University of Lausanne
Rue de Bugnon 27 - DGM 328
CH-1005 Lausanne, Switzerland
work: 41-21-692-5452, cell: 41-78-663-4980
http://serverdgm.unil.ch/bergmann
2Recap, Part 2: Standard Analysis Tools
- Motivation
- Why study a large heterogeneous set of expression data?
- What biological questions can we ask?
- Supervised vs. unsupervised approaches
- Practical Part
- K-means clustering
- Coupled two-way clustering (CTWC)
- Principal component analysis (PCA)
- Singular value decomposition (SVD)
3Part 3: Advanced Analysis Tools
- Motivation
- What are the limitations of standard tools?
- What is a modular approach?
- How can we integrate different datasets?
- Practical Part
- Overview of Advanced Analysis Tools
- The (Iterative) Signature Algorithm
- The Ping-Pong Algorithm
4How to make sense of millions of numbers?
Hundreds of samples
Thousands of genes
New Analysis and Visualization Tools are needed!
5Pooling genome-wide expression measurements from many experiments
[Figure labels: cell cycle; sets of specific conditions]
6The challenge of many datasets: How to integrate all the information?
- Protein expression
- Tissue specific expression
- Interaction data
- Localization data
7How to extract biological information from large-scale expression data?
- Clusters cannot overlap!
- Clustering is based on correlations over all conditions: sensitive to noise, computation intensive
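The noise sensitivity of correlations taken over all conditions can be illustrated with a toy example. The sketch below uses synthetic data only; the sizes, the random seed and all variable names are assumptions made for illustration and are not taken from the course material. Two genes share a signal in just 20 of 400 conditions, and their correlation is computed once over all conditions and once over the relevant subset.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
n_conditions = 400
relevant = np.arange(20)                # conditions in which both genes respond

# Shared regulatory signal, present only in the relevant conditions.
signal = np.zeros(n_conditions)
signal[relevant] = rng.normal(0.0, 2.0, size=relevant.size)

# Each gene = shared signal + independent measurement noise.
gene_a = signal + rng.normal(0.0, 1.0, size=n_conditions)
gene_b = signal + rng.normal(0.0, 1.0, size=n_conditions)

corr_all = np.corrcoef(gene_a, gene_b)[0, 1]
corr_relevant = np.corrcoef(gene_a[relevant], gene_b[relevant])[0, 1]

print(f"correlation over all conditions:      {corr_all:.2f}")
print(f"correlation over relevant conditions: {corr_relevant:.2f}")
```

In data of this kind the correlation restricted to the relevant conditions is typically much higher than the one over all conditions, which is one motivation for methods that select genes and conditions together.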
8How to extract biological information from
large-scale expression data?
9Overview of modular analysis tools
- Cheng Y and Church GM. Biclustering of expression data. (Proc Int Conf Intell Syst Mol Biol. 2000;8:93-103)
- Getz G, Levine E, Domany E. Coupled two-way clustering analysis of gene microarray data. (Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):12079-84)
- Tanay A, Sharan R, Kupiec M, Shamir R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. (Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2981-6)
- Sheng Q, Moreau Y, De Moor B. Biclustering microarray data by Gibbs sampling. (Bioinformatics. 2003 Oct;19 Suppl 2:ii196-205)
- Gasch AP and Eisen MB. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. (Genome Biol. 2002 Oct 10;3(11):RESEARCH0059)
- Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown P. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. (Genome Biol. 2000;1(2):RESEARCH0003)
- and many more!
http://serverdgm.unil.ch/bergmann/Publications/review.pdf
10One example in more detail: The (Iterative) Signature Algorithm
- No need for correlations!
- decomposes data into transcription modules
- integrates external information
- allows for interspecies comparative analysis
J Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai, Nature Genetics (2002)
11Trip to the Amazon
12How to find related items?
[Figure: items (axis ticks 10-100) vs. customers (axis ticks 5-50)]
13How to find related genes?
[Figure: genes (axis ticks 10-100) vs. conditions (axis ticks 5-50)]
J Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai, Nature Genetics (2002)
14Signature Algorithm: Score definitions
15How to find related genes? Scores and thresholds!
condition scores
initial guesses (genes)
16How to find related genes? Scores and thresholds!
condition scores
gene scores
thresholding
17How to find related genes? Scores and thresholds!
condition scores
gene scores
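The sequence on these slides (initial gene guess, condition scores, thresholding, gene scores, thresholding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published implementation: the two normalizations, the use of standard-deviation thresholds, the default threshold values and the names `standardize` and `signature_step` are all choices made here for the example.

```python
import numpy as np

def standardize(a, axis):
    """Shift to zero mean and scale to unit variance along `axis`."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    return (a - mean) / np.where(std == 0, 1.0, std)

def signature_step(expr, gene_mask, t_cond=1.0, t_gene=2.0):
    """One pass: input genes -> condition scores -> gene scores.

    expr      : array (n_genes, n_conditions) of expression values
    gene_mask : boolean array marking the initial guess (input genes)
    t_cond, t_gene : thresholds in standard deviations (assumed values)
    """
    per_condition = standardize(expr, axis=0)   # genes compared within a condition
    per_gene = standardize(expr, axis=1)        # conditions compared within a gene

    # Condition score: average expression of the input genes in that condition.
    # Conditions with strongly positive or strongly negative scores are kept.
    cond_scores = per_condition[gene_mask].mean(axis=0)
    cond_mask = np.abs(standardize(cond_scores, axis=0)) > t_cond

    # Gene score: expression summed over the selected conditions,
    # each condition weighted by its score.
    weights = np.where(cond_mask, cond_scores, 0.0)
    gene_scores = per_gene @ weights
    new_gene_mask = standardize(gene_scores, axis=0) > t_gene
    return new_gene_mask, cond_mask

# Toy data with one planted module (hypothetical sizes and effect strength).
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 50))
expr[:20, :10] += 5.0                 # genes 0-19 induced in conditions 0-9

seed = np.zeros(200, dtype=bool)
seed[:10] = True                      # initial guess: half of the planted module
genes, conds = signature_step(expr, seed)
print("genes selected:     ", np.flatnonzero(genes))
print("conditions selected:", np.flatnonzero(conds))
```

Starting from only half of the planted genes, a single pass already returns most of the module, because the remaining module genes score highly under the conditions picked out by the initial guess.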
18Iterative Signature Algorithm
OUTPUT
SB, J Ihmels & N Barkai, Physical Review E (2003)
19Identification of transcription modules using many random seeds
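Iterating the score-and-threshold step until the gene set no longer changes, and repeating this from many random seeds, can be sketched as follows. This is a simplified illustration and not the published algorithm: the convergence test (identical gene sets in two consecutive passes), the thresholds, the seed size and the function names are assumptions made for this example.

```python
import numpy as np

def standardize(a, axis):
    """Shift to zero mean and scale to unit variance along `axis`."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    return (a - mean) / np.where(std == 0, 1.0, std)

def iterate_signature(expr, gene_mask, t_cond=1.0, t_gene=2.0, max_iter=50):
    """Re-apply the signature step until the gene set stops changing."""
    per_condition = standardize(expr, axis=0)
    per_gene = standardize(expr, axis=1)
    cond_mask = np.zeros(expr.shape[1], dtype=bool)
    for _ in range(max_iter):
        if not gene_mask.any():                  # seed dissolved: no module
            return gene_mask, np.zeros_like(cond_mask)
        cond_scores = per_condition[gene_mask].mean(axis=0)
        cond_mask = np.abs(standardize(cond_scores, axis=0)) > t_cond
        weights = np.where(cond_mask, cond_scores, 0.0)
        new_mask = standardize(per_gene @ weights, axis=0) > t_gene
        if np.array_equal(new_mask, gene_mask):  # fixed point reached
            break
        gene_mask = new_mask
    return gene_mask, cond_mask

def modules_from_random_seeds(expr, n_seeds=50, seed_size=10, rng=None):
    """Run the iteration from random gene sets; keep distinct non-empty results."""
    rng = np.random.default_rng(rng)
    found = {}
    for _ in range(n_seeds):
        seed = np.zeros(expr.shape[0], dtype=bool)
        seed[rng.choice(expr.shape[0], size=seed_size, replace=False)] = True
        genes, conds = iterate_signature(expr, seed)
        if genes.any():
            found[genes.tobytes()] = (genes, conds)  # identical gene sets collapse
    return list(found.values())

# Toy data with one planted module (hypothetical sizes and effect strength).
rng = np.random.default_rng(2)
expr = rng.normal(size=(200, 50))
expr[:20, :10] += 5.0

noisy_seed = np.zeros(200, dtype=bool)
noisy_seed[:10] = True                # 10 module genes ...
noisy_seed[100:110] = True            # ... mixed with 10 unrelated genes
genes, conds = iterate_signature(expr, noisy_seed)
modules = modules_from_random_seeds(expr, rng=3)
print("from noisy seed:", np.flatnonzero(genes))
print("distinct gene sets from random seeds:", len(modules))
```

The noisy seed shows the refinement: unrelated genes drop out and missing module genes are pulled in. Whether a purely random seed ends up at the planted module, at a spurious small set, or at nothing depends on the data and on the thresholds.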
20New Tools: Module Visualization
http://serverdgm.unil.ch/bergmann/Fibroblasts/visualiser.html
21Gene enrichment analysis
- The hypergeometric distribution f(M,A,K,T) gives the probability that K out of A genes with a particular annotation match with a module having M genes if there are T genes in total.
http://en.wikipedia.org/wiki/Hypergeometric_distribution
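The probability described above can be written down directly with binomial coefficients. The sketch below keeps the argument order f(M, A, K, T) from the slide; the function names and the tail sum used as an enrichment p-value are assumptions added for illustration.

```python
from math import comb

def hypergeom_pmf(M, A, K, T):
    """P(exactly K of the A annotated genes lie in a module of M genes),
    given T genes in total."""
    return comb(M, K) * comb(T - M, A - K) / comb(T, A)

def enrichment_p_value(M, A, K, T):
    """P(overlap of K or more), i.e. the upper tail of the distribution."""
    return sum(hypergeom_pmf(M, A, k, T) for k in range(K, min(M, A) + 1))

# Small example that can be checked by hand:
# C(5,2) * C(5,2) / C(10,4) = 10 * 10 / 210
print(round(hypergeom_pmf(M=5, A=4, K=2, T=10), 3))   # 0.476

# A larger, made-up case: module of 50 genes, 100 annotated genes,
# 10 in common, 6000 genes in total.
print(enrichment_p_value(M=50, A=100, K=10, T=6000))
```

A small tail probability means that the overlap between module and annotation is unlikely to arise by chance; when many annotations are tested, a correction for multiple testing would also be needed.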
22Decomposing expression data into annotated transcriptional modules
identified >100 transcriptional modules in yeast
high functional consistency!
many functional links waiting to be verified experimentally
J Ihmels, SB & N Barkai, Bioinformatics 2005
23Module hierarchies and networks
24Higher-order structure
[Figure labels: correlated, anti-correlated]
25Mapping Transcription Modules
BLAST
26For distant organisms, correlation patterns are generally distinct
SB, J Ihmels & N Barkai, PLoS Biology (2004)
27What about related organisms?
[Figure: pairwise correlation between genes (over all arrays)]
J Ihmels, SB, J Berman & N Barkai, Science (2005)
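One simple way to ask whether two organisms share a correlation pattern is to compute the gene-gene correlation matrix in each and then correlate the two matrices entry by entry. The sketch below is an illustration under stated assumptions and not the analysis of the cited papers: it assumes the rows of both matrices are already matched ortholog pairs in the same order, and the function name and toy data are hypothetical.

```python
import numpy as np

def pattern_similarity(expr_a, expr_b):
    """Correlation between the pairwise gene-gene correlations of two organisms.

    expr_a, expr_b : arrays (n_genes, n_arrays); row i of both matrices must
    refer to the same orthologous gene. Array counts may differ.
    """
    corr_a = np.corrcoef(expr_a)              # genes x genes, over all arrays
    corr_b = np.corrcoef(expr_b)
    upper = np.triu_indices_from(corr_a, k=1)  # each gene pair once, no diagonal
    return np.corrcoef(corr_a[upper], corr_b[upper])[0, 1]

rng = np.random.default_rng(4)

def toy_expression(n_arrays):
    """30 genes; genes 0-9 share a common factor (a co-regulated group)."""
    factor = rng.normal(0.0, 2.0, size=n_arrays)
    expr = rng.normal(size=(30, n_arrays))
    expr[:10] += factor
    return expr

organism_a = toy_expression(60)
organism_b = toy_expression(80)     # same co-regulated group, different arrays
rewired = organism_b[::-1]          # the group now sits at genes 20-29

sim_conserved = pattern_similarity(organism_a, organism_b)
sim_rewired = pattern_similarity(organism_a, rewired)
print(f"conserved pattern: {sim_conserved:.2f}")
print(f"rewired pattern:   {sim_rewired:.2f}")
```

A high value indicates that gene pairs correlated in one organism are also correlated in the other; a value near zero or below indicates distinct patterns.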
28Promoter analysis: The Rapid Growth Element
AATTTT
29Data Integration Example: NCI60
60 cancer cell lines (9 tissue types)
30Modules and Co-modules
Z Kutalik, J Beckmann & SB, Nature Biotechnology (2008)
31How to identify Co-modules?
Iteratively refine genes, cell lines and drugs to get co-modules
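The iterative refinement can be sketched for two data matrices that share the cell-line dimension: gene expression (genes x cell lines) and drug response (drugs x cell lines). This is a simplified illustration, not the published Ping-Pong Algorithm: the one-sided thresholds, their values, the fixed number of iterations and the names `project` and `ping_pong` are assumptions made for this example.

```python
import numpy as np

def standardize(a, axis):
    """Shift to zero mean and scale to unit variance along `axis`."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    return (a - mean) / np.where(std == 0, 1.0, std)

def select(scores, t):
    """Keep entries scoring more than t standard deviations above the mean."""
    return standardize(scores, axis=0) > t

def project(source, target, source_mask, t_line, t_target):
    """Map a set of rows of `source` to a set of rows of `target`
    through the cell lines (columns) that both matrices share."""
    line_scores = standardize(source, axis=0)[source_mask].mean(axis=0)
    line_mask = select(line_scores, t_line)
    weights = np.where(line_mask, line_scores, 0.0)
    target_scores = standardize(target, axis=1) @ weights
    return select(target_scores, t_target), line_mask

def ping_pong(expr, resp, gene_mask, t_gene=2.0, t_line=1.0, t_drug=1.0, n_iter=10):
    """Alternate genes -> cell lines -> drugs and drugs -> cell lines -> genes."""
    drug_mask = np.zeros(resp.shape[0], dtype=bool)
    line_mask = np.zeros(expr.shape[1], dtype=bool)
    for _ in range(n_iter):
        if not gene_mask.any():
            break
        drug_mask, _ = project(expr, resp, gene_mask, t_line, t_drug)          # "ping"
        if not drug_mask.any():
            break
        gene_mask, line_mask = project(resp, expr, drug_mask, t_line, t_gene)  # "pong"
    return gene_mask, line_mask, drug_mask

# Toy data with one planted co-module (hypothetical sizes and effect strength).
rng = np.random.default_rng(5)
expr = rng.normal(size=(100, 40))     # genes x cell lines
resp = rng.normal(size=(20, 40))      # drugs x cell lines
expr[:10, :10] += 6.0                 # genes 0-9 high in cell lines 0-9
resp[:4, :10] += 6.0                  # drugs 0-3 high in the same cell lines

seed = np.zeros(100, dtype=bool)
seed[:5] = True                       # initial guess: half of the module genes
genes, lines, drugs = ping_pong(expr, resp, seed)
print("genes:     ", np.flatnonzero(genes))
print("cell lines:", np.flatnonzero(lines))
print("drugs:     ", np.flatnonzero(drugs))
```

The cell lines act as the bridge between the two datasets: a gene set and a drug set end up in the same co-module when they single out the same cell lines.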
32Gene (g)-drug (d) associations
Association score = sum over all modules m of (drug module score x gene module score)
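An association score built as a sum over modules of the product of a gene's module score and a drug's module score amounts to a matrix product. The sketch below assumes the module scores are stored as two arrays with one row per module; this layout, the function name and the numbers are made up for illustration, and any normalization used in the original work is not reproduced here.

```python
import numpy as np

def association_scores(gene_scores, drug_scores):
    """Gene-drug association as a sum over modules.

    gene_scores : array (n_modules, n_genes), score of each gene in each module
    drug_scores : array (n_modules, n_drugs), score of each drug in each module
    Returns an array (n_genes, n_drugs) whose entry [g, d] is the
    sum over modules m of gene_scores[m, g] * drug_scores[m, d].
    """
    return gene_scores.T @ drug_scores

# Two modules, three genes, two drugs (made-up scores; 0 = not in the module).
gene_scores = np.array([[1.0, 0.0, 0.5],
                        [0.0, 2.0, 0.0]])
drug_scores = np.array([[1.0, 0.0],
                        [0.0, 3.0]])

assoc = association_scores(gene_scores, drug_scores)
print(assoc)
# [[1.  0. ]
#  [0.  6. ]
#  [0.5 0. ]]
```

A gene and a drug get a non-zero score only if they appear together in at least one module, and a pair that shares several modules accumulates a contribution from each of them.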
33Co-modules predict drug-gene associations recorded in DrugBank
[Figure: ROC curve, True Positive Rate vs. False Positive Rate]
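True and false positive rates of this kind are obtained by ranking all candidate pairs by their predicted score and sweeping a cutoff down the ranking. The sketch below is a generic illustration with made-up scores and labels; it assumes that both known and unknown pairs are present and it does not treat tied scores specially.

```python
import numpy as np

def roc_curve(scores, labels):
    """False and true positive rates as the score cutoff is lowered.

    scores : predicted association scores, higher = more confident
    labels : True where the association is a known one (e.g. in a database)
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels, dtype=bool)[order]
    tp = np.concatenate(([0], np.cumsum(ranked)))    # known pairs recovered
    fp = np.concatenate(([0], np.cumsum(~ranked)))   # unknown pairs called
    return fp / fp[-1], tp / tp[-1]

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [True, True, False, True, False, False]

fpr, tpr = roc_curve(scores, labels)
area = area_under_curve(fpr, tpr)
print(round(area, 3))   # 0.889
```

An area of 0.5 corresponds to random ranking and 1.0 to a ranking that places every known association above every unknown one; in the example, 8 of the 9 possible (known, unknown) pairs are ordered correctly.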
34Geneset (G)-drug (d) associations
Association score = sum over all modules m of (drug module score x gene set module score)
35Predictive power for drug-geneset associations
is even stronger
36Take-home Messages
- Analysis of large-scale expression data has great potential for understanding global transcription programs and their evolution
- Innovative analysis tools are needed to extract information from such data
- The (Iterative) Signature and Ping-Pong Algorithms:
- decompose data into transcription modules
- integrate external information
- allow for interspecies comparative analysis