Title: Evidence networks for the analysis of biological systems
1Evidence networks for the analysis of biological
systems
- Rainer Breitling
- IBLS Molecular Plant Science group
- Bioinformatics Research Centre
- University of Glasgow, Scotland, UK
2Background
Datasets and evidence networks in post-genomic
biology
3Genomics
Fully sequenced genomes (1995-2004):
- 18 archaea
- 163 bacteria
- 3 protozoa
- 24 yeast species and fungi
- 2 plants (Arabidopsis, rice)
- 2 insects (flies, honey bee)
- 2 worms (C. elegans, C. briggsae)
- 3 fish (fugu, puffer, zebrafish)
- chicken, cow, dog, mouse, rat, chimp, human
-> lots of lists of genes
4Transcriptomics
- microarrays measure gene expression levels (mRNA concentrations)
- relative or absolute values
- in organisms, tissues, cells
- produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)
5Proteomics
- 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations
- in tissues, cells, organelles
- detect chemical modifications and processing of proteins
- produce lists of protein variants that differ among conditions
6Metabolomics
- chromatography and mass spectrometry measure metabolite concentrations
- in tissues, cells, body fluids, cell culture medium
- produce lists of affected metabolites
7Evidence networks
- relate items (genes, proteins, metabolites) that have something to do with each other
- relationship is based on objective evidence
- represented as bipartite graphs
- two classes of nodes: items and evidence
- automated analysis of results possible
- intuitive visualization and links to literature
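As a minimal sketch of the bipartite representation described above, the following Python snippet stores evidence nodes together with the items they connect and then projects them onto an item-item graph. The evidence labels and the helper name `project_items` are hypothetical and chosen only for illustration; the gene symbols are yeast genes used as placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical evidence nodes, each listing the item nodes it is attached to.
evidence = {
    "pathway:TCA cycle": ["CIT1", "ACO1", "IDH1"],
    "literature:hypothetical-paper-1": ["CIT1", "ICL1"],
    "complex:succinate dehydrogenase": ["SDH1", "SDH2"],
}


def project_items(evidence):
    """Collapse the bipartite graph into an item-item graph.

    Two items become neighbours if they share at least one evidence node.
    The shared evidence is kept on the edge so it can be traced back.
    """
    edges = defaultdict(set)
    for ev, items in evidence.items():
        for a, b in combinations(sorted(set(items)), 2):
            edges[(a, b)].add(ev)
    return dict(edges)


item_graph = project_items(evidence)
for (a, b), support in sorted(item_graph.items()):
    print(a, b, sorted(support))
```

Keeping the evidence on each edge is one possible design choice: it preserves the link back to the annotation or publication that justifies the relationship.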
8Types of evidence networks
- Relationship can be based on
- physical neighborhood
- phyletic pattern similarity
- expressional correlation
- biophysical similarity
- chemical transformation
- functional co-operation
- literature co-citations
9Types of evidence networks
Example: phyletic patterns
genome codes: A O M P K Z Y Q V D R L B C E F G H S N U J X I T W
phy: a o m p k z y - - d - l - - - - - - - - - - - i t
22  aompkzy--d-l-----------it-  NtpA  C  H-ATPase subunit A
17  aompkzy--d-l-----------it-  NtpB  C  H-ATPase subunit B
17  aompkzy--d-l-----------it-  NtpD  C  H-ATPase subunit D
18  aompkzy--d-l-----------it-  NtpI  C  H-ATPase subunit I
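One simple way to turn phyletic patterns like these into evidence is to count the genomes in which exactly one of two genes is present. The sketch below assumes that a letter means "present" and "-" means "absent"; the function name `pattern_distance` and the contrasting pattern `other` are hypothetical and not taken from the slides.

```python
def pattern_distance(p, q):
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("patterns must cover the same genomes")
    return sum((a == "-") != (b == "-") for a, b in zip(p, q))


ntp_a = "aompkzy--d-l-----------it-"  # pattern shown for NtpA
ntp_b = "aompkzy--d-l-----------it-"  # pattern shown for NtpB
# Hypothetical pattern for contrast: same as NtpA but missing in genomes d and i.
other = ntp_a.replace("d", "-").replace("i", "-")

print(pattern_distance(ntp_a, ntp_b))
print(pattern_distance(ntp_a, other))
```

Genes with a distance of zero, such as the ATPase subunits listed here, would be linked in a phyletic-pattern evidence network.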
15What is the big picture?
Graph-based iterative Group Analysis for the automated interpretation of biological datasets
lists -> graphs -> understanding
16What does this list mean?
Rank | Fold-Change | Gene Symbol | Gene Title
1 | 26.45 | TNFAIP6 | tumor necrosis factor, alpha-induced protein 6
2 | 25.79 | THBS1 | thrombospondin 1
3 | 23.08 | SERPINE2 | serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2
4 | 21.5 | PTX3 | pentaxin-related gene, rapidly induced by IL-1 beta
5 | 18.82 | THBS1 | thrombospondin 1
6 | 16.68 | CXCL10 | chemokine (C-X-C motif) ligand 10
7 | 18.23 | CCL4 | chemokine (C-C motif) ligand 4
8 | 14.85 | SOD2 | superoxide dismutase 2, mitochondrial
9 | 13.62 | IL1B | interleukin 1, beta
10 | 11.53 | CCL20 | chemokine (C-C motif) ligand 20
11 | 11.82 | CCL3 | chemokine (C-C motif) ligand 3
12 | 11.27 | SOD2 | superoxide dismutase 2, mitochondrial
13 | 10.89 | GCH1 | GTP cyclohydrolase 1 (dopa-responsive dystonia)
14 | 10.73 | IL8 | interleukin 8
15 | 9.98 | ICAM1 | intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
16 | 9.97 | SLC2A6 | solute carrier family 2 (facilitated glucose transporter), member 6
17 | 8.36 | BCL2A1 | BCL2-related protein A1
18 | 7.33 | TNFAIP2 | tumor necrosis factor, alpha-induced protein 2
19 | 6.97 | SERPINB2 | serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2
20 | 6.69 | MAFB | v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (avian)
17iterative Group Analysis (iGA)
iGA uses a simple hypergeometric distribution to obtain p-values
Breitling et al., BMC Bioinformatics, 2004, 5:34
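A rough sketch of how a hypergeometric p-value can be obtained for a group of genes in a ranked list: for each group member, ask how likely it is to find at least that many members this high in the list by chance, and keep the smallest value. The function names and the exact iteration scheme are assumptions made for illustration; refer to the cited paper for the published procedure.

```python
from math import comb


def hypergeom_tail(n_total, n_group, n_top, k):
    """P(at least k group members among the n_top top-ranked genes)."""
    upper = min(n_group, n_top)
    total = comb(n_total, n_top)
    hits = sum(
        comb(n_group, i) * comb(n_total - n_group, n_top - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def iga_p_value(member_ranks, n_total):
    """Smallest tail probability over cut-offs placed at each member's rank."""
    ranks = sorted(member_ranks)
    return min(
        hypergeom_tail(n_total, len(ranks), rank, k)
        for k, rank in enumerate(ranks, start=1)
    )


# Hypothetical example: a group of three genes at ranks 1, 2 and 3 of 20.
print(iga_p_value([1, 2, 3], 20))
```

In this hypothetical case the minimum is reached at the third member and equals 1/C(20,3) = 1/1140. Because the value is a minimum over several cut-offs, it is best read as a score for ranking groups rather than as a corrected p-value.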
18Graph-based iGA
Breitling et al., BMC Bioinformatics, 2004, 5:100
19Graph-based iGA
Step 1: build the network
20Graph-based iGA
Step 2: assign ranks to genes
21Graph-based iGA
Step 3: find local minima
p = 1/8 = 0.125
p = 6/8 = 0.75
p = 2/8 = 0.25
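The values on this slide are consistent with assigning a single gene of rank r among N genes the p-value r/N. Under that assumption, the sketch below finds local minima (genes ranked better than all of their neighbours) in a small network. The eight-gene network and its ranks are hypothetical, chosen so that the minima have ranks 1, 2 and 6 as in the example.

```python
# Hypothetical 8-gene evidence network, given as an undirected edge list.
edges = [
    ("A", "C"), ("A", "D"), ("C", "D"), ("B", "E"),
    ("D", "E"), ("E", "G"), ("F", "G"), ("F", "H"),
]
# Hypothetical ranks: 1 = most strongly changed gene.
rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8}

# Build a symmetric adjacency map from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def local_minima(graph, rank):
    """Genes whose rank is lower than that of every neighbour."""
    return [n for n in graph if all(rank[n] < rank[m] for m in graph[n])]


def single_gene_p(gene, rank):
    """p-value of a one-gene subgraph, assumed here to be rank / N."""
    return rank[gene] / len(rank)


minima = sorted(local_minima(graph, rank), key=rank.get)
for gene in minima:
    print(gene, single_gene_p(gene, rank))
```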
22Graph-based iGA
Step 4: extend subgraph from minima
p = 0.014
p = 0.018
p = 1
p = 0.125
23Graph-based iGA
Step 5: select p-value minimum
p = 0.018
p = 0.014
p = 1
p = 0.125
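Steps 3 to 5 can be sketched as follows. This is a simplified reading rather than the published implementation. It assumes that a subgraph with m members and maximum rank r among N genes gets p = C(r, m) / C(N, m), which reproduces values such as 1/56 ≈ 0.018 and 1/70 ≈ 0.014 for N = 8. It also assumes that each extension adds the best-ranked neighbour plus everything reachable at or below that rank. The network, ranks and function names are hypothetical.

```python
from math import comb

edges = [
    ("A", "C"), ("A", "D"), ("C", "D"), ("B", "E"),
    ("D", "E"), ("E", "G"), ("F", "G"), ("F", "H"),
]
rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8}

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def subgraph_p(n_members, max_rank, n_total):
    """Chance that n_members genes all fall within the top max_rank ranks."""
    return comb(max_rank, n_members) / comb(n_total, n_members)


def local_minima(graph, rank):
    """Genes whose rank is lower than that of every neighbour."""
    return [n for n in graph if all(rank[n] < rank[m] for m in graph[n])]


def extend_from(seed, graph, rank):
    """Grow a subgraph from seed; return (best p-value, members at that point)."""
    n_total = len(rank)
    members = {seed}
    best = (subgraph_p(1, rank[seed], n_total), set(members))
    while True:
        frontier = {m for n in members for m in graph[n]} - members
        if not frontier:
            break
        limit = min(rank[m] for m in frontier)
        # Add the best-ranked neighbour, then all genes reachable with rank <= limit.
        grew = True
        while grew:
            new = {m for n in members for m in graph[n] if rank[m] <= limit}
            new -= members
            grew = bool(new)
            members |= new
        max_rank = max(rank[n] for n in members)
        p = subgraph_p(len(members), max_rank, n_total)
        if p < best[0]:
            best = (p, set(members))
    return best


results = [extend_from(seed, graph, rank) for seed in local_minima(graph, rank)]
best_p, best_members = min(results, key=lambda result: result[0])
print(best_p, sorted(best_members))
```

On this hypothetical network the selected subgraph contains genes A to E with p = 1/56 (about 0.018). A practical implementation would probably also cap the subgraph size, which this sketch omits.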
24Advantages of GiGA
- fast, unbiased and comprehensive analysis
- assignment of statistical significance values to interpretation
- detection of significant changes even if data are too noisy to reliably detect changed genes
- statistically meaningful interpretation even without replicate experiments
- detection of patterns even for small absolute changes
- flexible use of annotations
- intuitive visualization
25Example 1
Microarrays: gene expression changes during the yeast diauxic shift
26Yeast diauxic shift study
DeRisi et al. (1997) Science 278: 680-6
27Yeast diauxic shift study
0h 9.5h 11.5h 13.5h 15.5h 18.5h 20.5h
UP 6144 - purine base metabolism 6099 - tricarboxylic acid cycle 6099 - tricarboxylic acid cycle 3773 - heat shock protein activity 6099 - tricarboxylic acid cycle
9277 - cell wall (sensu Fungi) 3773 - heat shock protein activity 5749 - respiratory chain complex II (sensu Eukarya) 6099 - tricarboxylic acid cycle 3773 - heat shock protein activity
297 - spermine transporter activity 6950 - response to stress 6121 - oxidative phosphorylation, succinate to ubiquinone 5977 - glycogen metabolism 5749 - respiratory chain complex II (sensu Eukarya)
15846 - polyamine transport 297 - spermine transporter activity 8177 - succinate dehydrogenase (ubiquinone) activity 6950 - response to stress 6121 - oxidative phosphorylation, succinate to ubiquinone
4373 - glycogen (starch) synthase activity 3773 - heat shock protein activity 4373 - glycogen (starch) synthase activity 8177 - succinate dehydrogenase (ubiquinone) activity
15846 - polyamine transport 4373 - glycogen (starch) synthase activity 4129 - cytochrome c oxidase activity 6537 - glutamate biosynthesis
5353 - fructose transporter activity 7039 - vacuolar protein catabolism 5751 - respiratory chain complex IV (sensu Eukarya) 6097 - glyoxylate cycle
15578 - mannose transporter activity 6950 - response to stress 5749 - respiratory chain complex II (sensu Eukarya) 5750 - respiratory chain complex III (sensu Eukarya)
7039 - vacuolar protein catabolism 4129 - cytochrome c oxidase activity 6121 - oxidative phosphorylation, succinate to ubiquinone 9060 - aerobic respiration
8645 - hexose transport 5751 - respiratory chain complex IV (sensu Eukarya) 8177 - succinate dehydrogenase (ubiquinone) activity 4129 - cytochrome c oxidase activity
28GiGA results: diauxic shift
Down-regulated genes using GeneOntology-based network
locus | gene description ("anchor gene") | p-value | members | max. rank
YHL015W | ribosomal protein S20 | 5.87E-86 | 39 | 48
YMR217W | GMP synthase | 3.38E-13 | 9 | 172
YDR144C | aspartyl protease related to Yap3p | 4.06E-08 | 6 | 242
YNL065W | multidrug resistance transporter | 4.02E-05 | 3 | 141
YLR062C | | 6.41E-05 | 4 | 367
YGL225W | May regulate Golgi function and glycosylation in Golgi | 1.12E-04 | 4 | 422
YPR074C | transketolase 1 | 1.44E-04 | 4 | 449
Total genes measured in network: 4087.
29small ribosomal subunit
large ribosomal subunit
nucleolar rRNA processing
translational elongation
30GiGA case study: diauxic shift
Up-regulated genes using metabolic network
locus | gene description | p-value | members | max. rank
YER065C | isocitrate lyase | 4.96E-53 | 39 | 54
YGR088W | catalase T | 3.09E-10 | 11 | 106
YFR015C | glycogen synthase (UDP-glucose-starch glucosyltransferase) | 2.08E-04 | 3 | 45
YJR073C | unsaturated phospholipid N-methyltransferase | 3.85E-04 | 5 | 156
YDR001C | neutral trehalase | 5.01E-04 | 3 | 60
YCR014C | DNA polymerase IV | 5.44E-04 | 17 | 481
YIR038C | glutathione transferase | 8.64E-04 | 5 | 183
Total genes measured in network: 744.
31respiratory chain complex II
glyoxylate cycle
citrate (TCA) cycle
oxidative phosphorylation (complex V)
respiratory chain complex III
32respiratory chain complex IV
33Example 2
Metabolomics: changes in metabolic profiles in drug-treated trypanosomes
34GiGA applied to metabolomics data
- Challenge: no annotation available
- Solution: build evidence network based on hypothetical reactions between observed masses (mass differences)
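A minimal sketch of the mass-difference idea: two observed masses are linked whenever their difference matches the mass shift of a plausible biochemical transformation within a tolerance. The transformation list, the tolerance and all masses except 257.1028 are illustrative assumptions, not the values used in the talk.

```python
from itertools import combinations

# Illustrative transformations with monoisotopic mass shifts in Da (assumed list).
TRANSFORMATIONS = {
    "hydrogenation (H2)": 2.01565,
    "methylation (CH2)": 14.01565,
    "hydroxylation (O)": 15.99491,
    "hydration (H2O)": 18.01056,
    "phosphorylation (HPO3)": 79.96633,
}


def mass_difference_edges(masses, tolerance=0.002):
    """Link pairs of masses whose difference matches a known transformation."""
    found = []
    for low, high in combinations(sorted(masses), 2):
        diff = high - low
        for name, shift in TRANSFORMATIONS.items():
            if abs(diff - shift) <= tolerance:
                found.append((low, high, name))
    return found


# 257.1028 is the mass discussed in the talk; the others are hypothetical.
observed = [257.1028, 271.1185, 177.1365, 300.0000]
mass_edges = mass_difference_edges(observed)
for low, high, name in mass_edges:
    print(low, high, name)
```

The resulting edges play the role of evidence in the network, so the graph analysis can run without any prior annotation of the metabolites.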
35Metabolite tree of mass 257.1028
(glycerylphosphorylcholine)
6 generations
36Metabolite tree of mass 257.1028
4 generations
37Metabolite tree of mass 257.1028
2 generations
38Metabolite tree of mass 257.1028
colors indicate changes of metabolite signals compared to untreated samples after 60 min of pentamidine treatment (red: down, green: up)
39GiGA metabolite trees for one experimental example
40Choline tree found by GiGA (most significant subgraph, p < 10^-13)
extracted from
41Summary
- post-genomic technologies produce lists
- neighborhood relationships yield evidence networks (graphs)
- lists -> graphs -> biological insights
- GiGA graph analysis highlights and connects relevant areas in the evidence network
42Acknowledgements
- Pawel Herzyk, Sir Henry Wellcome Functional Genomics Facility
- Anna Amtmann, Patrick Armengaud, IBLS Molecular Plant Science group
- Mike Barrett, IBLS Parasitology Research group
- FGF academic users: Wilhelmina Behan, Simone Boldt, Anna Casburn-Jones, Gillian Douce, Paul Everest, Michael Farthing, Heather Johnston, Walter Kolch, Peter O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys Williams
43Contact
- Rainer Breitling
- Bioinformatics Research Centre
- Davidson Building A416
- University of Glasgow, Scotland, UK
- R.Breitling_at_bio.gla.ac.uk
- http://www.brc.dcs.gla.ac.uk/rb106x