Title: Biomolecular Networks Charlie Hodgman Univ. Leeds GSK December 2003
1 Biomolecular Networks
Charlie Hodgman (Univ. Leeds / GSK), December 2003
Contents
- Information resources on biomolecular interactions
- Problems with these resources
- Introduction to graph theory
- Applications of graph theory to biological research
- A sniff of systems biology
2 There are three classes of interaction; they can be used to build interaction networks that represent physiological processes within and between cells
- Enzyme-ligand (substrate, product, inhibitor, activator): metabolic pathways and networks
- Gene products (RNA, proteins) with each other:
  - signal networks, machinery for cellular processes
  - protein (/product) interaction networks
  - natural RNA interference mechanisms
- Gene regulatory elements with gene products: genetic networks
3 The three networks are actually interconnected
[Figure: the gene product interaction network, the genetic network and the metabolic network drawn as linked layers; labels include STIMULUS, mRNA, etc.]
4 Putting the networks together
- Integrated: combinations of metabolic, genetic and product-interaction networks, a.k.a. gene networks (Institute of Cytology and Genetics, Novosibirsk)
- Holistic: integrated networks that capture (all) interactions within a cell or multicellular system, a.k.a. virtual cells, e-cells
5 Public and proprietary enzyme databases
- ENZYME (http://ca.expasy.org/enzyme/): E.C. reactions; 1 record = 1 E.C. code
- KEGG (http://www.genome.ad.jp/kegg/): enzymes-ligands; 1 record = 1 E.C. code
- BRENDA (http://www.brenda.uni-koeln.de/; ScienceFactory site down): all enzymological data; 1 record = 1 E.C. code
- EMP (http://www.empproject.com/, http://emp.mcs.anl.gov/): all enzymological data; 1 record = 1 enzyme from 1 paper
- UMBBD (http://umbbd.ahc.umn.edu/): enzymes-metabolites; 1 record = 1 recorded biodegradative or xenobiotic reaction
- Beilstein
6 Enzyme record
    ID   4.1.1.77
    DE   4-oxalocrotonate decarboxylase.
    CA   4-oxalocrotonate = 2-oxopent-4-enoate + CO(2).
    CC   -!- Involved in the meta-cleavage pathway for the
    CC       degradation of phenols, cresols and catechols.
    DR   P49156, DMPH_PSESP;  P49155, XYLI_PSEPU;
    //
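Records like this are easy to read by program. The sketch below assumes the ENZYME flat-file layout shown above (a two-letter line code, data starting in column 6, continuation lines repeating the code, and `//` ending the entry); the function name `parse_enzyme_record` is hypothetical, not part of any ExPASy tool.

```python
from collections import defaultdict

# The record from the slide, in the assumed flat-file layout.
RECORD = """\
ID   4.1.1.77
DE   4-oxalocrotonate decarboxylase.
CA   4-oxalocrotonate = 2-oxopent-4-enoate + CO(2).
CC   -!- Involved in the meta-cleavage pathway for the
CC       degradation of phenols, cresols and catechols.
DR   P49156, DMPH_PSESP;  P49155, XYLI_PSEPU;
//
"""


def parse_enzyme_record(text):
    """Group continuation lines by line code, stopping at the '//' terminator."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break
        code, data = line[:2], line[5:].strip()
        fields[code].append(data)
    record = {code: " ".join(parts) for code, parts in fields.items()}
    # DR holds 'accession, entry name;' pairs pointing at protein entries.
    record["DR"] = [
        tuple(part.strip() for part in pair.split(","))
        for pair in record.get("DR", "").split(";")
        if pair.strip()
    ]
    return record
```

Joining continuation lines means the two CC lines come back as one sentence, and the DR cross-references become (accession, entry name) pairs that can be used to link enzymes to protein records.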
7 Problems with these databases
- EC code assignment is idiosyncratic (e.g. 19 codes for ATP hydrolysis), slow (it can take years, depending on how often the EC defers its decision) and represents allostery poorly
- The EC code structure (i.e. a number-based hierarchy) is a poor model for classifying enzyme activity, but it is the ONLY system at present
- Non-standard terminology for metabolite and protein names, both within and between databases
- ...
8 The metabolite-naming problem
Same compound, different names: (S)-malate, also known as L-2-Hydroxybutanedioic acid, L-Apple acid, L-malate, L-Malic acid, (S)-Malate, S-malate
Different compounds, similar names: phenylacetate, C(c1ccccc1)C(=O)[O-], versus phenyl acetate, c1ccccc1OC(=O)C
Generic-specific relationships: alanine is-a alpha-amino acid is-a amino acid; an amino acid is-a amine and is-a carboxylic acid
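Both halves of the naming problem can be sketched with small lookup tables: a synonym table that maps every recorded name onto one preferred name, and an is-a table in which a class may have several parents. The tables and function names below are hypothetical illustrations built from the examples on this slide, not the content of any particular database.

```python
# Hypothetical synonym table: every recorded name maps to one preferred name.
MALATE_NAMES = [
    "(S)-malate", "L-2-Hydroxybutanedioic acid", "L-Apple acid",
    "L-malate", "L-Malic acid", "(S)-Malate", "S-malate",
]
SYNONYMS = {name: "(S)-malate" for name in MALATE_NAMES}

# Hypothetical is-a edges; one class may have several more-generic parents.
IS_A = {
    "alanine": ["alpha-amino acid"],
    "alpha-amino acid": ["amino acid"],
    "amino acid": ["amine", "carboxylic acid"],
}


def canonical_name(name):
    """Map a recorded synonym to the preferred name (exact match only)."""
    return SYNONYMS.get(name, name)


def ancestors(term):
    """Every more-generic class reachable from term through is-a edges."""
    found, stack = [], list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(IS_A.get(parent, []))
    return found
```

A name table alone cannot catch the second problem: "phenylacetate" and "phenyl acetate" stay distinct only because nobody merged them, so comparing structures rather than names is the safer check.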
9 Protein interactions/associations have physical and functional attributes
[Figure: interactions classified as directed or undirected and as stoichiometric or non-stoichiometric; examples shown are enzyme/substrate relationship, regulatory subunits, multi-subunit complexes and filaments]
Directed: one molecule exerts a biological effect upon the other. Stoichiometric: the numbers of molecules involved are physically defined. These attributes need to be assigned when studying cause-effect relationships.
10 Information resources: protein interactions
- BIND http://www.bind.ca/
- BRITE http://www.genome.ad.jp/brite/
- CSNDB http://geo.nihs.go.jp/csndb/
- DIP http://dip.doe-mbi.ucla.edu/
- HPRD http://www.hprd.org/
- MINT http://cbm.bio.uniroma2.it/mint/ (other interactions)
- TRANSPATH http://193.175.244.148/ (duplicates CSNDB data)
- The Proteomics Standards Initiative is now working on a data exchange format; the EBI is the European coordinator (http://www.ebi.ac.uk/intact/)
11 Information resources: gene regulation
- TRRD http://www.bionet.nsc.ru/trrd/
- TRANSFAC http://transfac.gbf.de/TRANSFAC/ (accessible only through the proprietary www.gene-regulation.com/)
- PRONIT http://www.rtc.riken.go.jp/jouhou/pronit/pronit.html (see also http://gibk26.bse.hyutech.ac.jp/jouhou/jouhoubank.html)
- RegulonDB http://www.cifn.unam.mx/Computational_Genomics/regulondb/ (data mostly pertain to E. coli)
- EPD http://www.epd.isb-sib.ch/ (focus on human data; largely only promoter locations)
13 Network databases
- Metabolism
- Signal transduction, integrated networks
14 Metabolic pathway databases
- MPW http://www.empproject.com/, http://emp.mcs.anl.gov/
- KEGG http://www.genome.ad.jp/kegg/
- BioCyc http://biocyc.org/ brings together EcoCyc, HumCyc, MetaCyc etc.; based on Lambert's chart and E.C. reaction equations
- Boehringer http://www.expasy.org/cgi-bin/search-biochem-index, based on the Boehringer chart
15 [Figure: MPW chart]
16 [Figure: a KEGG chart]
17 Problems with these databases
- Pathways are arbitrary entities:
  - where to start/stop?
  - gaps in the picture (KEGG) or pathways shrinking in length (MPW)
- Metabolite-centric: suboptimal for functional genomics and proteomics, because enzymes (and especially multifunctional gene products) occur in multiple places
- Selective ignorance (only isolated pathways are captured)
- Possibly expensive or restricted access (e.g. Curagen, TRANSPATH)
19 Other pathway databases
- CSNDB http://geo.nihs.go.jp/csndb.html
- SPAD http://www.grt.kyushu-u.ac.jp/spad/
- STKE http://stke.sciencemag.org/
- GeNet
- GenomeKnowledgeBase http://www.genomeknowledge.org/
Proprietary
- TRANSPATH http://193.175.244.148/ (content mostly CSNDB)
- BioCarta http://www.biocarta.com/
- GeneGo http://www.genego.com/
- (Curagen)
20 Gene networks supporting homeostasis of an organism
[Figure: gene network controlling intracellular cholesterol concentration in mammalian cells (Ignatieva E.V.), with the principal scheme of a regulatory circuit with negative feedback]
21 Graphs in the mathematical sense (Gross & Yellen)
22 Graph terminology
Connected
Node/vertex
Edge
Unconnected
23 Graph terminology
Graph
(unconnected) subgraphs, clusters
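Splitting a graph into its unconnected clusters is a simple breadth-first sweep. This sketch assumes an undirected graph stored as a dict mapping each node to the set of its neighbours (every neighbour also appearing as a key); the example graph in the test is hypothetical.

```python
from collections import deque


def connected_components(adjacency):
    """Split an undirected graph (node -> set of neighbours) into clusters."""
    unseen = set(adjacency)
    components = []
    while unseen:
        start = unseen.pop()
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour in unseen:
                    unseen.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components
```

Each node is visited once, so the cost grows with the number of nodes plus edges; an isolated node comes back as a cluster of one.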
24 Graph terminology
(connected) subgraphs
strongly connected nodes/clusters
25 Graph terminology
Cliques
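A clique is a set of nodes that are all joined to one another. One standard way to list the maximal cliques is the Bron-Kerbosch algorithm; the version sketched here is the basic one without pivoting, chosen for readability, and assumes the same node-to-neighbour-set representation as before. The test graph is hypothetical.

```python
def maximal_cliques(adjacency):
    """Basic Bron-Kerbosch (no pivoting): yield every maximal clique as a set."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique                      # nothing can be added: maximal
            return
        for node in list(candidates):
            yield from expand(clique | {node},
                              candidates & adjacency[node],
                              excluded & adjacency[node])
            candidates = candidates - {node}  # node's cliques are now done
            excluded = excluded | {node}
    yield from expand(set(), set(adjacency), set())
```

The number of maximal cliques can grow exponentially, so on large interaction networks a pivoting variant (or a size limit) is usually preferred.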
26 Graph terminology
Undirected graph
Directed graph (digraph)
Directed acyclic graph (DAG)
Partially directed graph
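Whether a digraph is a DAG can be tested by trying to put its nodes in topological order. This sketch uses Kahn's algorithm on a dict mapping each node to the list of nodes its edges point to; if some nodes can never reach in-degree zero, they lie on a cycle. The example graphs are hypothetical.

```python
from collections import deque


def topological_order(digraph):
    """Kahn's algorithm: a topological order of the nodes, or None if there is a cycle."""
    indegree = {node: 0 for node in digraph}
    for targets in digraph.values():
        for target in targets:
            indegree[target] += 1
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in digraph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return order if len(order) == len(digraph) else None
```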
27 Graph terminology
Bridge
Span
Articulation point
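A bridge is an edge, and an articulation point a node, whose removal splits a cluster in two. The sketch below uses the simplest possible test (delete each candidate in turn and recount the clusters), which is easy to check but slow; Tarjan's linear-time depth-first method is the usual choice for large graphs. Function names and the test graph (two triangles joined by one edge) are hypothetical.

```python
def count_clusters(adjacency, skip_node=None, skip_edge=None):
    """Number of connected clusters after (optionally) deleting one node or one edge."""
    seen, clusters = set(), 0
    for start in adjacency:
        if start == skip_node or start in seen:
            continue
        clusters += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adjacency[node]:
                if nb == skip_node or nb in seen or {node, nb} == skip_edge:
                    continue
                seen.add(nb)
                stack.append(nb)
    return clusters


def articulation_points(adjacency):
    base = count_clusters(adjacency)
    return {n for n in adjacency if count_clusters(adjacency, skip_node=n) > base}


def bridges(adjacency):
    base = count_clusters(adjacency)
    edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    return {e for e in edges if count_clusters(adjacency, skip_edge=set(e)) > base}
```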
28 Graph terminology
Tree
Leaf
Root
Node
29 Graph terminology
Forest
30 Graph terminology
Pruning
- Different approaches
- back one node
31 Simplifying yeast two-hybrid output
32 Graph terminology
Pruning
- Different approaches
- back one node
- back to node with >1 edge
- (trees pruned back to root)
- Reveals cycles
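Pruning back to nodes with more than one edge can be done by repeatedly deleting nodes that have exactly one edge. In this sketch (undirected graph, node-to-neighbour-set dict) a separate tree shrinks to a single edge-less node, its "root", while anything lying on a cycle survives; which node of a tree ends up as the root depends on processing order. The function name and test graph are hypothetical.

```python
def prune_to_cycles(adjacency):
    """Repeatedly remove nodes with exactly one edge; return the pruned copy."""
    graph = {node: set(nbs) for node, nbs in adjacency.items()}
    leaves = [n for n, nbs in graph.items() if len(nbs) == 1]
    while leaves:
        leaf = leaves.pop()
        if leaf not in graph or len(graph[leaf]) != 1:
            continue                      # already removed, or no longer a leaf
        (nb,) = graph.pop(leaf)
        graph[nb].discard(leaf)
        if len(graph[nb]) == 1:           # the neighbour has become a leaf
            leaves.append(nb)
    return graph
```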
33 Graph terminology
Pruning
- Different approaches
- back one node
- back to node with >1 connection
- (trees pruned back to root)
- by given distance from a given node
- Shows the structure of a node's locality
Distance 2
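Keeping only the nodes within a given distance of a chosen node is a breadth-first search that stops expanding at that depth. A minimal sketch, assuming an undirected node-to-neighbour-set dict; the path graph in the test is hypothetical.

```python
from collections import deque


def neighbourhood(adjacency, centre, max_distance):
    """Nodes within max_distance edges of centre, mapped to their distance."""
    distance = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue                      # do not look beyond the limit
        for nb in adjacency[node]:
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)
    return distance
```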
34 Graph terminology
Pruning
- Different approaches
- back one node
- back to node with >1 connection
- (trees pruned back to root)
- by given distance from a given node
- by given distance up/down digraph
- Shows what generates, or results from, a node
Distance 2
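In a digraph the same distance limit can be applied in either direction: following edges forwards gives what results from a node, following them backwards gives what generates it. This sketch builds the reversed graph for the upstream case; the digraph is a dict from each node to the list of nodes it points to, and the example is hypothetical.

```python
from collections import deque


def within(digraph, start, max_distance, upstream=False):
    """Nodes reachable from start in at most max_distance steps, following edges
    downstream (what results from start) or against them (what generates start)."""
    if upstream:
        reversed_graph = {n: [] for n in digraph}
        for source, targets in digraph.items():
            for target in targets:
                reversed_graph[target].append(source)
        digraph = reversed_graph
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] < max_distance:
            for nb in digraph[node]:
                if nb not in distance:
                    distance[nb] = distance[node] + 1
                    queue.append(nb)
    return distance
```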
35 Graph terminology
In a Petri net, the nodes alternate between passive (e.g. metabolite) and active (e.g. enzyme) states. N.B. it is usual to represent active nodes by squares. Coloured graphs/Petri nets have nodes and edges with multiple attributes or properties (e.g. Km, Vmax, compartment, charge, hydrophobicity).
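The passive/active alternation can be sketched with places (metabolites) holding tokens and transitions (enzymes) that fire when every input place holds enough tokens. The net below is hypothetical: it assumes E1 turns M4 + M5 into M1 and E3 turns M2 into M3 + M4, loosely following the process model used later in this deck.

```python
# Hypothetical transitions: name -> (tokens consumed, tokens produced).
TRANSITIONS = {
    "E1": ({"M4": 1, "M5": 1}, {"M1": 1}),
    "E3": ({"M2": 1}, {"M3": 1, "M4": 1}),
}


def enabled(marking, transition):
    """A transition may fire only if each input place has enough tokens."""
    inputs, _ = TRANSITIONS[transition]
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError(f"{transition} is not enabled")
    inputs, outputs = TRANSITIONS[transition]
    new = dict(marking)
    for place, n in inputs.items():
        new[place] -= n
    for place, n in outputs.items():
        new[place] = new.get(place, 0) + n
    return new
```

A coloured net would attach extra attributes (Km, Vmax, compartment, ...) to these places and transitions rather than just token counts.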
36 Graph terminology
Vertex degree: the number of edges from a given node.
Min. path length: the least number of edges to cross between 2 nodes, also sometimes called the degrees of separation.
Network centre: the node with the lowest average minimum path length (usually w.r.t. undirected graphs).
Distance matrix: a matrix containing the minimum path length between every pair of nodes.
Aver. path length: the average of all the minimum path lengths in a (sub)graph.
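All five quantities follow from one breadth-first search per node. The sketch assumes an undirected, unweighted graph (node-to-neighbour-set dict); averages only count pairs that are actually connected, and ties for the centre go to whichever node comes first. The path graph in the test is hypothetical.

```python
from collections import deque


def shortest_lengths(adjacency, source):
    """Minimum path length (edge count) from source to each node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def summarise(adjacency):
    """Vertex degrees, distances, network centre and average path length."""
    degree = {n: len(nbs) for n, nbs in adjacency.items()}
    distances = {n: shortest_lengths(adjacency, n) for n in adjacency}
    # Mean minimum path length from each node to the other nodes it can reach.
    mean_from = {n: sum(d.values()) / (len(d) - 1)
                 for n, d in distances.items() if len(d) > 1}
    centre = min(mean_from, key=mean_from.get)
    lengths = [length for n, d in distances.items()
               for m, length in d.items() if m != n]
    average_path_length = sum(lengths) / len(lengths)
    return degree, distances, centre, average_path_length
```

For the path a-b-c-d this gives b or c as the centre (both average 4/3) and an average path length of 5/3.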
37 Graph terminology
Networks
- random: generated by random addition of new nodes and edges
- small-world: has a small number of nodes with high vertex degree, resulting in a low average path length
- scale-free: the distribution of vertex degrees follows a power law (i.e. the distribution follows a straight line on log-log plots). These networks are stable to perturbation, but can exhibit complex behaviour. Jeong, H., et al. (2000) The large-scale organization of metabolic networks. Nature 407, 651-654.
These nodes of high vertex degree are known as hubs. In metabolic networks, they correspond to water, ATP, NADH etc. Computational navigation of metabolic networks that include hubs shows that glycolysis is only one of 500 000 pathways of the same length from glucose to pyruvate!! Kuffner, R. et al. (2000) Pathway analysis in metabolic databases via differential metabolic display. Bioinformatics 16, 825-836.
38 Graph terminology
More pruning
- Different approaches
- node with highest vertex degree
The adjacency matrix (see below) gives the vertex degree.
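With an adjacency matrix, row sums give each node's out-degree and column sums its in-degree, so the node of highest degree can be found and pruned by deleting its row and column. The matrix below is the process-model adjacency matrix from slide 48 (rows are taken as "from", columns as "to", which is an assumption); `remove_node` is a hypothetical helper name.

```python
NODES = ["M1", "M2", "M3", "M4", "M5"]
ADJ = [
    [0, 1, 0, 1, 1],  # M1
    [1, 0, 1, 1, 0],  # M2
    [0, 0, 0, 0, 0],  # M3
    [1, 0, 0, 0, 0],  # M4
    [1, 0, 0, 0, 0],  # M5
]

out_degree = {n: sum(row) for n, row in zip(NODES, ADJ)}
in_degree = {n: sum(col) for n, col in zip(NODES, zip(*ADJ))}
total = {n: out_degree[n] + in_degree[n] for n in NODES}
hub = max(total, key=total.get)


def remove_node(nodes, matrix, name):
    """Prune one node by deleting its row and its column."""
    k = nodes.index(name)
    kept = [i for i in range(len(nodes)) if i != k]
    return [nodes[i] for i in kept], [[matrix[i][j] for j in kept] for i in kept]
```

Here M1 has the highest total degree (6), so it is the first candidate for pruning.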
39 Graph terminology
More pruning
- Different approaches
- node with highest vertex degree
- node causing greatest disconnection
For every node, count up the number of minimum paths (between every pair of nodes) that go through it. Then rank these. High-ranking nodes are likely to be hubs.
40 Graph terminology
More pruning
- Different approaches
- node with highest vertex degree
- node causing greatest disconnection
- edge causing greatest disconnection
For every edge, count up the number of minimum paths (between every pair of nodes) that go through it. Then rank these.
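The counts described on the last two slides can be computed from one breadth-first search per node that records both distances and the number of distinct minimum paths. A minimum s-t path passes through node v exactly when d(s,v) + d(v,t) = d(s,t), and through edge u-v when d(s,u) + 1 + d(v,t) = d(s,t). This sketch keeps the raw counts as the slides describe; the usual betweenness centrality instead divides each pair's contribution by its total number of minimum paths, and Brandes' algorithm computes it much faster. Names and the test graph (two triangles joined by one edge) are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Distances from source and the number of distinct minimum paths to each node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                paths[nb] += paths[node]
    return dist, paths


def path_counts(adjacency):
    """For every node and every edge, count the minimum paths between
    other node pairs that pass through it (undirected graph)."""
    info = {n: bfs_counts(adjacency, n) for n in adjacency}
    node_count = {n: 0 for n in adjacency}
    edge_count = {frozenset((u, v)): 0 for u in adjacency for v in adjacency[u]}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different clusters
        for v in adjacency:
            dist_v, paths_v = info[v]
            if v not in (s, t) and v in dist_s and dist_s[v] + dist_v[t] == dist_s[t]:
                node_count[v] += paths_s[v] * paths_v[t]
        for edge in edge_count:
            for a, b in (tuple(edge), tuple(edge)[::-1]):  # cross the edge either way
                dist_b, paths_b = info[b]
                if a in dist_s and t in dist_b and dist_s[a] + 1 + dist_b[t] == dist_s[t]:
                    edge_count[edge] += paths_s[a] * paths_b[t]
    return node_count, edge_count
```

Ranking `edge_count` puts the single link between the two triangles first, since all nine cross-pairs must use it.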
41 Network navigation
Depth first: node names are placed in a stack (array). On reaching the next node, check that it has not been visited already by seeing whether it is in the stack. The depth of search may be limited (min. path length).
Breadth first: use an adjacency list and exhaust the node lists at each depth before progressing to the next depth. Better for studying the chronology of flow through the network.
Depth-first search is naturally written with recursion; breadth-first search is usually driven by a queue of nodes still to visit.
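Both navigation styles can be sketched on the process-model digraph read off the adjacency matrix on slide 48 (treating rows as "from", an assumption). The depth-first version keeps the current path as its stack and refuses to step onto a node already on it; the breadth-first version finishes each depth before moving on. Function names are hypothetical.

```python
# Process model as a digraph (from the adjacency matrix on slide 48).
G = {"M1": ["M2", "M4", "M5"], "M2": ["M1", "M3", "M4"],
     "M3": [], "M4": ["M1"], "M5": ["M1"]}


def depth_first_paths(digraph, start, max_depth):
    """All paths from start of up to max_depth edges, never revisiting a node on the path."""
    paths = []

    def visit(stack):
        paths.append(list(stack))
        if len(stack) - 1 == max_depth:
            return
        for nb in digraph[stack[-1]]:
            if nb not in stack:           # already on the current path?
                stack.append(nb)
                visit(stack)
                stack.pop()

    visit([start])
    return paths


def breadth_first_layers(digraph, start):
    """Group nodes by depth, exhausting each depth before moving to the next."""
    layers, seen = [[start]], {start}
    while True:
        next_layer = []
        for node in layers[-1]:
            for nb in digraph[node]:
                if nb not in seen:
                    seen.add(nb)
                    next_layer.append(nb)
        if not next_layer:
            return layers
        layers.append(next_layer)
```

Starting from M5, the breadth-first layers are M5, then M1, then M2 and M4, then M3, which is the "chronology" of flow the slide refers to.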
42 Computational representation of graphs
As objects: being object-oriented, Java has some useful intrinsic properties. However, O'Reilly have developed a special-purpose Perl module (http://search.cpan.org/author/JHI/Graph-0.20101/).
43 Computational representation of graphs
44 Computational representation of graphs
- As pictures...
- Algorithms to represent graphs (in 2 and 3 dimensions) are a major area of development in computer science, called graph layout, where the aim is to make the resulting pictures as clear as possible. This can be done by:
  - reducing to a minimum the number of edges that cross each other,
  - separating nodes so that edges do not cross over them (e.g. spring embedding),
  - ensuring that edges cross in areas that do not matter to the observer,
  - using non-linear scales (e.g. fish-eye lens).
- Public domain tools include:
  - Graphviz (AT&T) (http://www.graphviz.org/)
  - Pajek (Univ. Ljubljana, Slovenia)
  - ProViz (Univ. Toulouse)
  - BioLayout (EBI)
  - Cytoscape
  - AGLO
- The main commercial package is from Tom Sawyer.
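Spring embedding treats edges as springs and all node pairs as mutually repelling charges, then moves nodes step by step until the forces roughly balance. The sketch below follows the Fruchterman-Reingold force laws (repulsion k²/d, attraction d²/k along edges) with a shrinking step size; the parameter values and the function name are assumptions chosen for a small example, not settings from any of the tools listed above.

```python
import math
import random


def spring_layout(adjacency, iterations=200, seed=1):
    """Force-directed 2D layout in the Fruchterman-Reingold style (a sketch)."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))          # ideal edge length
    step = 0.1                                # maximum move ("temperature")
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d              # every pair repels
                if b in adjacency[a]:
                    push -= d * d / k         # linked pairs also attract
                fx, fy = dx / d * push, dy / d * push
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy) or 1e-9
            move = min(length, step)          # cap each move
            pos[n][0] += fx / length * move
            pos[n][1] += fy / length * move
        step *= 0.98                          # cool down gradually
    return {n: tuple(p) for n, p in pos.items()}
```

This all-pairs version costs O(n²) per iteration, which is fine for a few hundred nodes; larger graphs usually need spatial approximations for the repulsive forces.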
45 Layout views: Graphviz
47 Layout views: spring-embedded
Figure 6.3. The main component of interactions in yeast, represented as a straight-line drawing in 2D. There are 525 vertices and 724 edges.
48 Computational representation of graphs
Process model: [Figure: metabolites M1-M5 linked by enzymes E1, E2 and E3; the E3 reaction is irreversible]
Adjacency matrix (1 = an edge leads from the row metabolite to the column metabolite):
          M1  M2  M3  M4  M5
    M1     0   1   0   1   1
    M2     1   0   1   1   0
    M3     0   0   0   0   0
    M4     1   0   0   0   0
    M5     1   0   0   0   0
Distance matrices are calculated from the adjacency matrix.
49 Computational representation of graphs
Process model: [Figure: metabolites M1-M5 linked by enzymes E1, E2 and E3; the E3 reaction is irreversible]
Distance matrix:
          M1  M2  M3  M4  M5
    M1     1   1   2   1   1
    M2     1   1   1   1   2
    M3     0   0   0   0   0
    M4     1   2   3   1   2
    M5     1   2   3   3   1
Distance matrices are calculated recursively from the adjacency matrix. The figures are the minimum path lengths; the largest number is the network diameter.
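One way to build the distance matrix "recursively" from the adjacency matrix is to extend every walk by one edge at a time (a Boolean matrix product with the adjacency matrix) and record the first length at which each pair becomes connected. This sketch assumes rows are "from", columns "to", and 0 means "no path". On the diagonal it returns the shortest cycle back to the node (2 for M1), and for M5 to M4 it returns 2 (M5 -> M1 -> M4 through the reverse of E1); the slide's entries differ at those positions, so its conventions may not match the ones assumed here. The diameter comes out as 3 either way.

```python
NODES = ["M1", "M2", "M3", "M4", "M5"]
ADJ = [
    [0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]


def distance_matrix(adj):
    """dist[i][j] = fewest edges on a path from i to j (0 = no path), built up
    one path length at a time from the adjacency matrix."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    reach = [row[:] for row in adj]           # pairs joined by exactly 1 edge
    for length in range(1, n + 1):
        for i in range(n):
            for j in range(n):
                if reach[i][j] and not dist[i][j]:
                    dist[i][j] = length
        # extend every walk by one more edge (Boolean matrix product with adj)
        reach = [[int(any(reach[i][m] and adj[m][j] for m in range(n)))
                  for j in range(n)] for i in range(n)]
    return dist


diameter = max(max(row) for row in distance_matrix(ADJ))
```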
50 Elementary modes analysis
Elementary modes are individual pathways through the network. Finding them involves network navigation (see Network navigation, above). Such analysis would show that there is only one mode connecting M5 to M3. Elementary modes are not restricted to minimum-length paths. Analysis of holistic networks can also reveal minor or aberrant pathways found in disease states.
[Figure: process model with enzymes E1-E3 and metabolites M1-M5]
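A crude stand-in for this idea is to enumerate every simple path (no repeated node) between two metabolites in the process-model digraph. Note that this is plain path enumeration, a simplification: genuine elementary-mode analysis works on the stoichiometric matrix and respects reaction reversibility and steady state. The graph is read off the slide 48 adjacency matrix, assuming rows are "from"; the function name is hypothetical.

```python
PROCESS_MODEL = {"M1": ["M2", "M4", "M5"], "M2": ["M1", "M3", "M4"],
                 "M3": [], "M4": ["M1"], "M5": ["M1"]}


def simple_paths(graph, source, target, path=None):
    """Every path from source to target that never revisits a node."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for nb in graph[source]:
        if nb not in path:
            found.extend(simple_paths(graph, nb, target, path))
    return found
```

From M5 to M3 this finds a single route, M5 -> M1 -> M2 -> M3, while from M1 to M4 it finds both the direct step and the longer route through M2, illustrating that non-minimal paths are kept.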
51 Computational representation of graphs
[Figure: process model with enzymes E1-E3 and metabolites M1-M5]
Stoichiometric matrix: one row per reaction, with minus and plus signs for the left- and right-hand sides of the equations
          M1  M2  M3  M4  M5
    R1     1   0   0  -1  -1
    R2    -1   0   0   1   1
    R3    -1   1   0   0   0
    R4     1  -1   0   0   0
    R5     0  -1   1   1   0
    Rnet   0  -1   1   1   0
Sum the columns to discover the net reaction of the system.
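Summing the columns is a one-liner once the matrix is stored row by row. The sketch uses the R1-R5 rows reconstructed above; the enzyme labels in the comments (E1 and E2 reversible, E3 irreversible) are an interpretation of the process model, not stated on the slide.

```python
METABOLITES = ["M1", "M2", "M3", "M4", "M5"]
S = {
    "R1": [1, 0, 0, -1, -1],   # M4 + M5 -> M1   (assumed: E1 forward)
    "R2": [-1, 0, 0, 1, 1],    # M1 -> M4 + M5   (assumed: E1 reverse)
    "R3": [-1, 1, 0, 0, 0],    # M1 -> M2        (assumed: E2 forward)
    "R4": [1, -1, 0, 0, 0],    # M2 -> M1        (assumed: E2 reverse)
    "R5": [0, -1, 1, 1, 0],    # M2 -> M3 + M4   (assumed: E3, irreversible)
}


def net_reaction(matrix, names):
    """Sum each metabolite column over all reactions and write it as an equation."""
    totals = [sum(column) for column in zip(*matrix.values())]
    lhs = [f"{-t} {m}" if t < -1 else m for m, t in zip(names, totals) if t < 0]
    rhs = [f"{t} {m}" if t > 1 else m for m, t in zip(names, totals) if t > 0]
    return totals, " + ".join(lhs) + " -> " + " + ".join(rhs)
```

Here the forward and reverse steps of E1 and E2 cancel, leaving M2 -> M3 + M4 as the net reaction.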
52 Computational representation of graphs
[Figure: process model with enzymes E1-E3 and metabolites M1-M5]
          M1  M2  M3  M4  M5
    R1     1   0   0  -1  -1
    R2    -1   0   0   1   1
    R3    -1   1   0   0   0
    R4     1  -1   0   0   0
    R5     0  -1   1   1   0
Columns containing only -ve numbers = system substrates. Columns containing only +ve numbers = system products. System substrates may be essential metabolites; system products may be fermentation products.
53 Computational representation of graphs
[Figure: process model with enzymes E1-E3 and metabolites M1-M5]
          M1  M2  M3  M4  M5
    R1     1   0   0  -1  -1
    R3    -1   1   0   0   0
    R4     1  -1   0   0   0
    R5     0  -1   1   1   0
Columns containing only -ve numbers = system substrates. Columns containing only +ve numbers = system products. System substrates may be essential metabolites (M5); system products may be fermentation products (M3).
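The column test on the last two slides can be sketched directly: a column whose non-zero entries are all negative marks a system substrate, all positive a system product. The matrix is the slide 53 one (R2 absent), with columns placed as in the reconstruction above; `system_boundary` is a hypothetical name.

```python
NAMES = ["M1", "M2", "M3", "M4", "M5"]
REDUCED = {
    "R1": [1, 0, 0, -1, -1],
    "R3": [-1, 1, 0, 0, 0],
    "R4": [1, -1, 0, 0, 0],
    "R5": [0, -1, 1, 1, 0],
}


def system_boundary(matrix, names):
    """Metabolites that are only ever consumed (substrates) or only produced (products)."""
    columns = dict(zip(names, zip(*matrix.values())))
    substrates = [m for m, col in columns.items()
                  if any(col) and all(c <= 0 for c in col)]
    products = [m for m, col in columns.items()
                if any(col) and all(c >= 0 for c in col)]
    return substrates, products
```

Applied to the full five-row matrix of slide 52 the same test finds M3 as a product but no substrate, because R2 also produces M5.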
54 Dynamics: how things change with time, with respect to each other and to the environment
55 Dynamics (a.k.a. Systems Biology)
The stoichiometric matrices can be converted into groups (a.k.a. systems) of differential equations. For example, row R1 (M1: 1, M4: -1, M5: -1), considered on its own, gives
$$\frac{d[\mathrm{M1}]}{dt} = k_1\,[\mathrm{M4}]\,[\mathrm{M5}]$$
where k1 is the reaction rate constant. The rate of appearance of M1 is equal to the reaction rate constant (a feature of the enzyme) multiplied by the concentrations of M4 and M5. For some types of enzyme, the equations are more complex (e.g. allosterically regulated or multi-subunit co-operative enzymes).
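Extending the R1 equation to every row gives d[Mi]/dt = sum over reactions r of S[r][i] times v_r. This sketch assumes simple mass-action rates for all five reactions (each rate is k times the product of its substrate concentrations), uses hypothetical rate constants, and integrates with forward Euler steps purely for transparency; real models would use a proper ODE solver and, for allosteric or co-operative enzymes, different rate laws.

```python
S = [
    [1, 0, 0, -1, -1],   # R1: M4 + M5 -> M1
    [-1, 0, 0, 1, 1],    # R2: M1 -> M4 + M5
    [-1, 1, 0, 0, 0],    # R3: M1 -> M2
    [1, -1, 0, 0, 0],    # R4: M2 -> M1
    [0, -1, 1, 1, 0],    # R5: M2 -> M3 + M4
]
K = [1.0, 0.5, 1.0, 0.5, 0.2]  # hypothetical rate constants k1..k5


def rates(conc):
    """Mass action: k times each substrate concentration raised to its coefficient."""
    v = []
    for k, row in zip(K, S):
        rate = k
        for c, coeff in zip(conc, row):
            if coeff < 0:
                rate *= c ** -coeff
        v.append(rate)
    return v


def simulate(conc, dt=0.01, steps=1000):
    """Forward-Euler integration of d[Mi]/dt = sum over r of S[r][i] * v_r."""
    conc = list(conc)
    for _ in range(steps):
        v = rates(conc)
        deriv = [sum(S[r][i] * v[r] for r in range(len(S))) for i in range(len(conc))]
        conc = [c + dt * d for c, d in zip(conc, deriv)]
    return conc
```

Starting from [M4] = [M5] = 1, the first rate is k1 [M4][M5] = 1.0, matching the equation above, and [M1] + [M2] + [M3] + [M5] stays constant because every reaction conserves it.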
56 Dynamics (a.k.a. Systems Biology)
- The networks and their dynamics can be represented in many other ways, involving:
  - algebra (partial differential equations, non-linear equations, metabolic control analysis, bifurcation theory)
  - Boolean analysis and logic programming
  - cellular automata
  - concurrency theory
  - time-delay Petri nets
  - stochastic modelling
  - Bayesian networks and other forms of statistics
  - and any combination of the above!!
- These can all get rather complicated.
57 Thank you for listening! You can give your brain a rest now.