Title: HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E.
1. HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES
Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P. and Di Cunto F.
ugo.ala_at_unito.it
Molecular Biotechnology Center, University of Torino
2. Introduction
- Massive repositories of gene expression data obtained with microarray technology represent an extremely rich source of biological information.
- Since genes involved in the same functions tend to show very similar expression profiles, co-expression analysis of these datasets could be a very powerful approach for inferring functional relationships among genes and for predicting the involvement of specific sequences in human genetic diseases.
- However, so far gene co-expression has not proved to be a particularly useful criterion for disease gene identification.
3. Reasons for this limited success
- Microarray data are noisy
- Many genes showing very similar expression profiles are not functionally related (Spellman et al., 2002)
4. A powerful help: phylogenetic conservation
Since gene regulatory regions evolve faster than coding regions, if the co-expression of two genes is evolutionarily conserved, it is much more likely that the genes are functionally related. The confidence level increases with the phylogenetic distance between the species compared.
Figure: a gene co-expression network constructed with expression data from distant species (H. sapiens, C. elegans, D. melanogaster, S. cerevisiae) (Stuart et al., 2003)
5. A powerful help: phylogenetic conservation
- Human-mouse conserved co-expression represents an excellent compromise between sensitivity and specificity for predicting functional relationships among mammalian genes (Pellegrino et al., 2004)
6. Construction of human-mouse conserved co-expression networks for disease gene prediction
Step one: single-species networks (Homo sapiens and Mus musculus)
- Start from single-species datasets of microarray experiments, based on probes that can be linked to Entrez Gene IDs
- Compute the correlation between the expression profiles of all pairs of probes with Pearson's coefficient
- Link every probe with the probes that fall in the first percentile (top 1%) of its ranked correlation list
- Merge the links between probes by Entrez Gene identifiers
Output: human gene co-expression network (H-GCN) and mouse gene co-expression network (M-GCN)
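The Step one pipeline can be sketched in Python as follows. This is a minimal, illustrative sketch, not the authors' code: the input format (a dict of probe profiles plus a probe-to-Entrez Gene map), the function names, and the choice to link probe p to probe q whenever q is in p's top percentile (treating the link as undirected) are all assumptions. With only a handful of probes the first percentile would be empty, so the sketch keeps at least one neighbor per probe.

```python
from itertools import combinations
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def single_species_network(profiles, probe_to_gene, fraction=0.01):
    """Gene co-expression network (e.g. H-GCN or M-GCN) from probe profiles.

    profiles: probe ID -> expression values over the same experiments
    probe_to_gene: probe ID -> Entrez Gene ID (hypothetical input format)
    fraction: share of each probe's ranked list that is linked (0.01 = first percentile)
    """
    probes = list(profiles)
    corr = {}
    for p, q in combinations(probes, 2):  # every probe pair, computed once
        corr[(p, q)] = corr[(q, p)] = pearson(profiles[p], profiles[q])

    probe_links = set()
    for p in probes:
        # Rank all other probes by decreasing correlation with p.
        ranked = sorted((q for q in probes if q != p),
                        key=lambda q: corr[(p, q)], reverse=True)
        k = max(1, ceil(fraction * len(ranked)))  # keep at least one neighbor
        probe_links.update(frozenset((p, q)) for q in ranked[:k])

    # Merge probe-level links into gene-level links; drop links within one gene.
    gene_edges = set()
    for link in probe_links:
        g1, g2 = (probe_to_gene[p] for p in link)
        if g1 != g2:
            gene_edges.add(frozenset((g1, g2)))
    return gene_edges


# Toy example: p1 and p4 are two probes for the same (made-up) gene "100".
profiles = {
    "p1": [1, 2, 3, 4],
    "p2": [2, 4, 6, 8.5],
    "p3": [4, 3, 2, 1],
    "p4": [1, 3, 2, 4],
}
probe_to_gene = {"p1": "100", "p2": "200", "p3": "300", "p4": "100"}
h_gcn = single_species_network(profiles, probe_to_gene)
print(sorted(sorted(e) for e in h_gcn))  # [['100', '200'], ['100', '300']]
```

Note that p3 is anti-correlated with everything, yet still gets its single least-negative neighbor; a real implementation would run on thousands of probes, where the first percentile is large enough that this corner case matters much less.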
7. Construction of human-mouse conserved co-expression networks for disease gene prediction
Step two: human-mouse networks
- Start from the human (H-GCN) and mouse (M-GCN) gene co-expression networks
- Select the links found in both co-expression networks, matching mouse and human genes through HomoloGene orthology
Output: human-mouse conserved co-expression network
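Step two amounts to an edge intersection after mapping mouse genes onto their human orthologs. The sketch below assumes a simple one-to-one mouse-to-human dictionary standing in for HomoloGene groups; real HomoloGene groups can contain several genes per species, which this toy version ignores. Function and variable names are illustrative.

```python
def conserved_network(human_edges, mouse_edges, mouse_to_human):
    """Human gene pairs linked in H-GCN whose mouse orthologs are linked in M-GCN.

    Edges are frozensets of two gene IDs. mouse_to_human maps each mouse gene to
    a single human ortholog (a simplifying assumption standing in for HomoloGene
    groups). Mouse genes without an ortholog are ignored.
    """
    mouse_in_human_ids = set()
    for edge in mouse_edges:
        a, b = tuple(edge)
        if a in mouse_to_human and b in mouse_to_human:
            ha, hb = mouse_to_human[a], mouse_to_human[b]
            if ha != hb:
                mouse_in_human_ids.add(frozenset((ha, hb)))
    # Conserved links are those present in both species.
    return human_edges & mouse_in_human_ids


h_gcn = {frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"C", "D"})}
m_gcn = {frozenset({"a", "b"}), frozenset({"c", "d"}), frozenset({"a", "e"})}
orthologs = {"a": "A", "b": "B", "c": "C", "d": "D"}  # toy HomoloGene-like map
hm_network = conserved_network(h_gcn, m_gcn, orthologs)
print(sorted(sorted(e) for e in hm_network))  # [['A', 'B'], ['C', 'D']]
```

The human link B-C is dropped because its mouse counterpart is not co-expressed, and the mouse link a-e is dropped because "e" has no human ortholog in the toy map.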
8. Conserved co-expression networks: data retrieval
Experiments based on cDNA platforms and performed mostly on tumor cell lines:
- 4129 experiments for 102296 EST probes for human
- 467 experiments for 80595 EST probes for mouse
Experiments based on Affymetrix platforms and performed on normal tissues:
- 353 experiments for 46241 probesets for human (Roth et al., 2006)
- 122 experiments for 19692 probesets for mouse (Su et al., 2004)
9. Conserved co-expression networks: results
- 8512 nodes (genes) and 56397 edges
- 12766 nodes (genes) and 155403 edges
We focus our network analysis on co-expression clusters (CCs), each defined as the set of nearest neighbors of a node in the network, so that every gene has its own CC.
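Given an edge list, each gene's CC is simply its neighborhood. In this sketch the seed gene is not included in its own CC, following the definition above; the function name and the edge representation (frozensets of two gene IDs) are illustrative choices.

```python
from collections import defaultdict


def coexpression_clusters(edges):
    """Map each gene to its CC: the set of its direct neighbors in the network."""
    clusters = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        clusters[a].add(b)
        clusters[b].add(a)
    return dict(clusters)


network = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
ccs = coexpression_clusters(network)
print(sorted(ccs["C"]))  # ['A', 'B', 'D']
print(sorted(ccs["D"]))  # ['C']
```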
10. Conserved co-expression networks: comparison with other networks
Good predictors of protein-protein interactions
Both networks exhibit a highly significant overlap with the protein-protein interactions reported in the Human Protein Reference Database (HPRD).
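The slides do not state which statistic was used to assess the overlap with HPRD. One common choice, shown here purely as an assumed example, is a one-sided hypergeometric test that takes every possible pair of genes as the universe and asks how likely an overlap at least this large would be by chance.

```python
from math import comb


def overlap_pvalue(genes, network_edges, ppi_edges):
    """Observed overlap and one-sided hypergeometric p-value P(X >= overlap).

    All edges are frozensets of two genes, and both edge sets are assumed to be
    restricted to `genes`, so the universe is every unordered pair of them.
    """
    total = comb(len(genes), 2)             # all possible gene pairs
    n = len(network_edges)                  # pairs "drawn" by the network
    K = len(ppi_edges)                      # pairs that are known interactions
    k = len(network_edges & ppi_edges)      # observed overlap
    tail = sum(comb(K, i) * comb(total - K, n - i)
               for i in range(k, min(n, K) + 1))
    return k, tail / comb(total, n)


genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
network = {frozenset(p) for p in [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]}
ppi = {frozenset(p) for p in [("G1", "G2"), ("G2", "G3"), ("G5", "G6")]}
k, p = overlap_pvalue(genes, network, ppi)
print(k, round(p, 4))  # 2 0.0813
```

With six genes there are only 15 possible pairs, so even 2 shared links out of 3 are not significant here; the same computation on networks with tens of thousands of edges is what can yield very small p-values.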
11. Conserved co-expression networks: GO analysis
A good criterion to identify functionally related genes
A-CCN and S-CCN show a strong enrichment for shared Gene Ontology (GO) functional annotation, compared with random permutations.
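A comparison against random permutations could look roughly like the sketch below. The score (fraction of gene/CC-member pairs sharing at least one GO term) and the randomization (refilling each CC with random genes, keeping its size) are assumptions for illustration, not necessarily the procedure used in this work; the GO term labels are placeholders.

```python
import random


def shared_go_fraction(ccs, go_terms):
    """Fraction of (gene, CC member) pairs that share at least one GO term."""
    pairs = shared = 0
    for gene, members in ccs.items():
        for member in members:
            pairs += 1
            if go_terms.get(gene, set()) & go_terms.get(member, set()):
                shared += 1
    return shared / pairs if pairs else 0.0


def go_permutation_test(ccs, go_terms, n_perm=1000, seed=0):
    """Compare the observed score with CCs of the same sizes filled at random."""
    rng = random.Random(seed)
    genes = sorted(ccs)
    observed = shared_go_fraction(ccs, go_terms)
    at_least_as_high = 0
    for _ in range(n_perm):
        shuffled = {
            g: set(rng.sample([x for x in genes if x != g], len(members)))
            for g, members in ccs.items()
        }
        if shared_go_fraction(shuffled, go_terms) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_high + 1) / (n_perm + 1)


# Two toy modules whose members share a (placeholder) annotation.
ccs = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
       "D": {"E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
go_terms = {g: {"term_x"} for g in "ABC"} | {g: {"term_y"} for g in "DEF"}
observed, p = go_permutation_test(ccs, go_terms)
print(observed, p)  # observed is 1.0; p is small for these clearly modular data
```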
12. Predicting human disease genes
- MimMiner (van Driel et al., 2006), a text-mining database of phenotype similarity relationships, offers a very useful way to merge co-expression data with disease information.
A-CCN and S-CCN also show a strong enrichment for OMIM IDs characterizing disease phenotypes.
13. How the algorithm works (1)
Figure: CCs (conserved co-expression clusters); OMIM locus (phenotype description)
14. How the algorithm works (2)
Figure: CCs (conserved co-expression clusters); OMIM locus (phenotype description); DRCCs (disease-related co-expression clusters)
15. How the algorithm works (3)
Figure: DRCCs (disease-related co-expression clusters); OMIM locus (phenotype description)
These genes become our candidate disease genes
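The slides present the algorithm only as figures, so the following is a hypothetical, simplified reading of it: a gene in the OMIM locus becomes a candidate when a large enough share of its CC members are associated with phenotypes similar to the query phenotype. The fixed `threshold` and `min_fraction` parameters stand in for whatever statistical enrichment test the authors used, and `toy_similarity` is a made-up stand-in for MimMiner scores.

```python
def candidate_genes(query, locus_genes, ccs, gene_phenotypes, similarity,
                    threshold=0.5, min_fraction=0.5):
    """Locus genes whose CC looks disease-related for `query` (simplified rule)."""
    candidates = []
    for gene in sorted(locus_genes):
        members = ccs.get(gene, set())
        if not members:
            continue  # no CC, no evidence
        # A member supports the query if one of its phenotypes is similar enough.
        supporting = sum(
            1 for m in members
            if any(similarity(query, p) >= threshold
                   for p in gene_phenotypes.get(m, ()))
        )
        # Hypothetical DRCC criterion: enough supporting members in the CC.
        if supporting / len(members) >= min_fraction:
            candidates.append(gene)
    return candidates


SCORES = {("Q", "P1"): 0.8, ("Q", "P2"): 0.1}  # toy stand-in for MimMiner scores


def toy_similarity(query, phenotype):
    return SCORES.get((query, phenotype), 0.0)


ccs = {"G1": {"A", "B"}, "G2": {"C", "D"}}
gene_phenotypes = {"A": {"P1"}, "B": {"P1"}, "C": {"P2"}}
result = candidate_genes("Q", {"G1", "G2", "G3"}, ccs, gene_phenotypes, toy_similarity)
print(result)  # ['G1']
```

G1 is selected because both of its CC members carry a phenotype similar to the query; G2's CC contains only a dissimilar phenotype, and G3 has no CC at all.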
16. Leave-one-out
- Leave-one-out cross-validation tests over all known disease genes showed good performance
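A leave-one-out evaluation of this kind can be sketched as a loop that hides one known gene-phenotype association at a time and checks whether the gene is predicted again from the remaining data. The predictor, the toy loci, and the recovery-rate metric below are illustrative assumptions, not the exact protocol behind the slide.

```python
import copy


def leave_one_out(associations, loci, gene_phenotypes, predict):
    """Fraction of known disease genes recovered with their own annotation hidden.

    associations: (gene, phenotype) pairs for known disease genes
    loci: gene -> genes of a (hypothetical) locus that contains it
    predict: callable(phenotype, locus_genes, gene_phenotypes) -> candidate list
    """
    recovered = 0
    for gene, phenotype in associations:
        masked = copy.deepcopy(gene_phenotypes)
        masked[gene].discard(phenotype)  # hide the association under test
        if gene in predict(phenotype, loci[gene], masked):
            recovered += 1
    return recovered / len(associations)


ccs = {"G1": {"G2"}, "G2": {"G1"}, "G3": {"G4"}, "G4": {"G3"}}


def toy_predict(phenotype, locus, gene_phenotypes):
    # Candidate if any CC neighbor is annotated with the same phenotype.
    return [g for g in locus
            if any(phenotype in gene_phenotypes.get(m, set()) for m in ccs.get(g, ()))]


gene_phenotypes = {"G1": {"P"}, "G2": {"P"}, "G3": {"R"}, "G4": set()}
associations = [("G1", "P"), ("G2", "P"), ("G3", "R")]
loci = {"G1": ["G1", "G3"], "G2": ["G2", "G4"], "G3": ["G3", "G1"]}
rate = leave_one_out(associations, loci, gene_phenotypes, toy_predict)
print(round(rate, 3))  # 0.667
```

Deep-copying the annotations at each step ensures that hiding one association never leaks into the next test case.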
17. Predicting human disease genes: results
- We applied our procedure to 850 OMIM phenotype entries with unknown molecular basis (but mapped to one or more genetic loci).
- We obtained 321 candidates, covering a set of 81 loci (65 from A-CCN, 6 from S-CCN and 10 from both networks).
18. Examples and discussion of some candidates
19. Conclusions
- Our approach, based on conserved co-expression analysis, has proved particularly successful in providing reliable predictions of potential disease-causing genes, thanks to two main factors:
  - the phylogenetic filter
  - the integration with quantitative phenotype correlation data
- In conclusion, we propose that our method and our list of candidates will provide useful support for the identification of new disease-causing genes.
20. Our real network
Damasco C.
Di Cunto F.
Piro R.
Ala U.
Brunner H.
Grassi E.
Provero P.
Silengo L.