Title: Biomolecular Networks BBSRC Summer School Dr. Charlie Hodgman Leeds Dec. 2002
1 Biomolecular Networks
BBSRC Summer School
Dr. Charlie Hodgman
Leeds - Dec. 2002
- Biomolecular interactions
- Information resources
- Graphs
- Representation in the computer
- Dynamics
2 Biomolecular interactions can be used to build interaction networks
- Enzyme - ligand (substrate, product, inhibitor, activator)
  - metabolic pathways and networks
- Gene products (RNA, proteins) with each other
  - signal networks, machinery for cellular processes
  - protein(/product) interaction networks
- Gene - gene products
  - genetic networks
3 The 3 networks are actually interconnected
[Figure: the gene product interaction network, the genetic network and the metabolic network drawn as one connected system; diagram labels: STIMULUS, mRNA, etc.]
4 Putting the networks together
- Integrated - combinations of metabolic, genetic and product-interaction networks
  - a.k.a. gene networks in Inst. Cytology & Genetics, Novosibirsk
- Holistic - integrated networks that capture (all) interactions within a cell or multi-cellular system
  - a.k.a. virtual cells, e-cells
5 Information resources - Enzymes
EMP: http://www.empproject.com/ - all enzymological data; record = 1 enzyme from 1 paper
BRENDA: http://www.brenda.uni-koeln.de/ - all enzymological data; record = 1 E.C. code
KEGG: http://www.genome.ad.jp/kegg/kegg2.html - enzymes - ligands; record = 1 E.C. code
UMBBD: http://umbbd.ahc.umn.edu/ - enzymes - metabolites; record = 1 recorded bio-degradative or xenobiotic reaction
ENZYME: http://ca.expasy.org/enzyme/ - E.C. reactions; record = 1 E.C. code
6 Problems with these databases
- Non-standard terminology of metabolite and protein names, both within and between databases
  - can't retrieve synonyms
- EC code assignment is idiosyncratic (e.g. 19 codes for ATP hydrolysis), slow (can take years, depending on how often the EC defers a decision) and represents allostery poorly
- EC code structure (i.e. number-based hierarchy) is a poor model of classification for enzyme activity, but is the ONLY system at present
7 Info resources - Protein interactions
For example:
BIND: http://www.bind.ca/
BRITE: http://www.genome.ad.jp/brite/
CSNDB: http://geo.nihs.go.jp/csndb/
DIP: http://dip.doe-mbi.ucla.edu/
MINT: http://cbm.bio.uniroma2.it/mint/ - other interactions
TRANSPATH: http://193.175.244.148/ - duplicates CSNDB data
- Proteomics Standards Initiative now working on data exchange format
  - EBI is European coordinator (http://www.ebi.ac.uk/intact/)
8 Interactions/associations have physical and functional attributes
[Figure: grid classifying interactions as stoichiometric or non-stoichiometric and as directed or undirected; examples shown: enzyme/substrate relationship, regulatory subunits, multi-subunit complexes, filaments]
Directed: one molecule exerting a biological effect upon the other.
Stoichiometry: the numbers of molecules involved are physically defined.
These attributes need to be assigned for studying cause-effect relationships.
9 Problems with these databases
- Non-standard terminology of protein names, though not as severe for some because associations to Swiss-Prot are available
- Non-standard (even conflicting) data models, some accessed through proprietary interfaces (e.g. Curagen, YPD)
- Often no direction to the biological relationship because yeast two-hybrid data used
- Cellular location of a complex may determine its activity, e.g. AKAPs.
  - Bauman, A.L. & Scott, J.D. (2002) Nature Cell Biol. 4, E203-E206
10 Information resources - gene regulation
TRANSFAC: http://transfac.gbf.de/TRANSFAC/ - accessible only through proprietary web site
TRRD: http://www.bionet.nsc.ru/trrd/
EPD: http://www.epd.isb-sib.ch/ - focus on human data
RegulonDB: http://www.cifn.unam.mx/Computational_Genomics/regulondb/ - data mostly pertain to E. coli.
11 Problems with these resources
- Non-standard terminology (especially of protein names and binding sites), though TRRD has a thesaurus close to publication
- Low coverage of what is actually happening in cells (or even of what is published)
- TRANSFAC matrices are inconsistent in definition and number, and often too vague for diagnostic purposes, but are currently the only option
- EPD and RegulonDB have merits but are virtually organism-specific
12 Metabolic pathway databases
MPW: http://www.empproject.com/
KEGG: http://www.genome.ad.jp/kegg/kegg2.html
Biocarta: http://www.biocarta.com/ - proprietary web interface
BioCyc: http://www.biocyc.org/ - brings together Ecocyc, Humcyc, Metacyc etc.; based on Lamberts chart
Boehringer: http://www.expasy.org/cgi-bin/search-biochem-index - based on Boehringer chart
13 MPW chart
14 A KEGG chart
15 Other pathway databases
CSNDB: http://geo.nihs.go.jp/csndb/
SPAD: http://www.grt.kyushu-u.ac.jp/spad/
Biocarta: http://www.biocarta.com/
Transpath: http://193.175.244.148/
Gene Networks: http://wwwmgs.bionet.nsc.ru/mgs/gnw
16 GeneNetworks
18 Problems with these databases
- Pathways are arbitrary entities
  - where to start/stop
  - gaps in picture (KEGG) or pathways shrinking in length (MPW)
- Metabolite-centric, suboptimal for functional genomics & proteomics because enzymes (and especially multifunctional gene products) occur in multiple places
- Selective ignorance (only capture isolated pathways)
- Possibly expensive or restricted access (e.g. Curagen, Transpath)
19 Graphs in the mathematical sense
20 Graph terminology
Connected
Node/vertex
Edge
Unconnected
21 Graph terminology
Graph
(unconnected) subgraphs, clusters
22 Graph terminology
(connected) subgraphs
strongly connected nodes/clusters
23 Graph terminology
Undirected graph
Directed graph (digraph)
Directed acyclic graph (DAG)
24 Graph terminology
Bridge
Span
Articulation point
25 Graph terminology
26 Graph terminology
Tree
Leaf
Root
Node
27 Graph terminology
Forest
28 Graph terminology
Pruning
- Different approaches
  - back one node
29 Graph terminology
Pruning
- Different approaches
  - back one node
  - back to node with >1 edge
  - (trees pruned back to root)
30 Graph terminology
Pruning
- Different approaches
  - back one node
  - back to node with >1 connection
  - (trees pruned back to root)
  - by given distance from a given node
Distance = 2
31 Graph terminology
Pruning
- Different approaches
  - back one node
  - back to node with >1 connection
  - (trees pruned back to root)
  - by given distance from a given node
  - by given distance up/down digraph
Distance = 2
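The pruning approaches listed above can be sketched for an undirected graph stored as a dictionary of neighbour sets. This is only one possible reading of the slides: "back one node" is taken to mean removing every current leaf once, "back to node with >1 connection" is taken to mean repeating that until no leaves remain, and pruning by distance is taken to mean keeping only the nodes within the given distance of the chosen node. The graph `GRAPH` and all function names are hypothetical.

```python
from collections import deque

# Hypothetical example: a triangle A-B-C with a tail C-D-E hanging off it.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def prune_leaves_once(adj):
    """Back one node: remove every node that has exactly one neighbour."""
    leaves = {n for n, nbrs in adj.items() if len(nbrs) == 1}
    return {n: nbrs - leaves for n, nbrs in adj.items() if n not in leaves}


def prune_repeatedly(adj):
    """Repeat leaf removal until nothing changes, so each dangling chain is
    cut back to the first node that keeps more than one connection."""
    current = adj
    while True:
        pruned = prune_leaves_once(current)
        if len(pruned) == len(current):
            return pruned
        current = pruned


def within_distance(adj, start, max_dist):
    """Keep only the nodes at most max_dist edges away from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance limit
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    keep = set(dist)
    return {n: adj[n] & keep for n in keep}
```

With this reading, `prune_repeatedly(GRAPH)` leaves only the triangle, and `within_distance(GRAPH, "A", 2)` drops node E. Pruning up or down a digraph would need separate predecessor and successor sets, which this sketch does not cover.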
32 Graph terminology
In a Petri net, the nodes alternate between passive (e.g. metabolite) and active (e.g. enzyme) states. N.B. it is usual to represent active nodes by squares.
Coloured graphs/Petri nets have nodes & edges with multiple attributes or properties (e.g. Km, Vmax, compartment, charge, hydrophobicity).
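As a rough illustration of the two ideas above, a coloured Petri net can be held as two dictionaries of attributed nodes plus a list of directed edges, with a check that every edge joins a passive node to an active one. The node names, attribute values and the function `alternates` are all hypothetical and chosen only for the example.

```python
# Hypothetical toy net: names and attribute values are illustrative only.
PASSIVE_NODES = {  # e.g. metabolites
    "M1": {"compartment": "cytosol", "charge": 0},
    "M2": {"compartment": "cytosol", "charge": -1},
}
ACTIVE_NODES = {  # e.g. enzymes
    "E1": {"Km": 0.5, "Vmax": 10.0},
}
EDGES = [("M1", "E1"), ("E1", "M2")]  # M1 is consumed by E1, which produces M2


def alternates(passive, active, edges):
    """Return True if every edge links a passive node to an active node."""
    kind = {name: "passive" for name in passive}
    kind.update({name: "active" for name in active})
    return all(
        src in kind and dst in kind and kind[src] != kind[dst]
        for src, dst in edges
    )
```

A direct metabolite-to-metabolite edge such as `("M1", "M2")` would make `alternates` return `False`, which is the property that separates a Petri net from an ordinary graph.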
33 Graph terminology
Vertex degree: number of edges from a given node
Min. path length: the least number of edges to cross between 2 nodes, also sometimes called the degrees of separation
Network centre: the node which has the lowest average minimum path length
Distance matrix: matrix containing the minimum path length between every pair of nodes
Aver. path length: the average of all the minimum path lengths in a (sub)graph
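These definitions translate fairly directly into code. The sketch below assumes a connected, undirected graph stored as a dictionary of neighbour sets and uses breadth-first search for the minimum path lengths; the example graph `GRAPH` and the function names are hypothetical.

```python
from collections import deque

# Hypothetical example: hub H joined to A, B and C, with D hanging off C.
GRAPH = {
    "H": {"A", "B", "C"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
}


def vertex_degree(adj, node):
    """Number of edges from the given node."""
    return len(adj[node])


def min_path_lengths(adj, source):
    """Minimum number of edges from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def distance_matrix(adj):
    """Minimum path length between every pair of nodes, as nested dicts."""
    return {node: min_path_lengths(adj, node) for node in adj}


def network_centre(adj):
    """Node with the lowest average minimum path length to all other nodes."""
    dm = distance_matrix(adj)

    def mean_distance(node):
        others = [d for other, d in dm[node].items() if other != node]
        return sum(others) / len(others)

    return min(sorted(adj), key=mean_distance)


def average_path_length(adj):
    """Average of the minimum path lengths over all unordered node pairs."""
    dm = distance_matrix(adj)
    nodes = sorted(adj)
    lengths = [dm[a][b] for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    return sum(lengths) / len(lengths)
```

For the example graph, `network_centre(GRAPH)` returns `"H"`, the node that also has the highest vertex degree.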
34 Graph terminology
Networks
random: generated by random addition of new nodes & edges
small-world: has a small number of nodes with high vertex degree, resulting in a low average path length
scale-free: distribution of vertex degrees follows a power law (i.e. the distribution follows a straight line on log-log plots). These are stable to perturbation, but can exhibit complex behaviour.
Jeong, H., et al. (2000) The large scale organisation of metabolic networks. Nature 407, 651-654.
Nodes of high vertex degree are known as hubs. In metabolic networks, they correspond to water, ATP, NADH etc.
Computational navigation of metabolic networks that include hubs shows that glycolysis is only one of 500 000 pathways of the same length from glucose to pyruvate!
Kuffner, R. et al. (2000) Pathway analysis in metabolic databases via differential metabolic display. Bioinformatics 16, 825-836.
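The "straight line on log-log plots" criterion can be checked roughly by tabulating the vertex-degree distribution and fitting a line to log P(k) against log k. The sketch below uses an ordinary least-squares fit, which is a crude estimator chosen here for brevity and not necessarily the method used in the cited papers; the degree list `DEGREES` is synthetic and the function names are hypothetical.

```python
import math
from collections import Counter

# Synthetic degree list built so that P(k) is proportional to k**-2.
DEGREES = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1


def degree_distribution(degrees):
    """Fraction of nodes having each vertex degree k."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: count / total for k, count in sorted(counts.items())}


def loglog_slope(distribution):
    """Least-squares slope of log P(k) against log k (requires every k >= 1)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(p) for p in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

For the synthetic list the fitted slope is close to -2, i.e. a power law with exponent 2. On real data the points scatter around the line, and a high count at large k marks the hubs.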
35 Computational representation of graphs
Process model
[Figure: process model linking metabolites M1-M5 through enzymes E1, E2 and E3; E3 reaction irreversible]
Adjacency matrix
     M1  M2  M3  M4  M5
M1    0   1   0   1   1
M2    1   0   1   1   0
M3    0   0   0   0   0
M4    1   0   0   0   0
M5    1   0   0   0   0
Distance matrices calculated from adjacency matrix
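One way to calculate a distance matrix from an adjacency matrix is the Floyd-Warshall algorithm, sketched below on the slide's matrix read row by row (M1 to M5, rows as sources, columns as targets). The choice of algorithm is an assumption made for illustration; the slide does not say which method was intended. Unreachable pairs are marked with infinity.

```python
INF = float("inf")

NODES = ["M1", "M2", "M3", "M4", "M5"]

# Adjacency matrix from the slide: ADJACENCY[i][j] == 1 means an edge i -> j.
ADJACENCY = [
    [0, 1, 0, 1, 1],  # M1
    [1, 0, 1, 1, 0],  # M2
    [0, 0, 0, 0, 0],  # M3 (no outgoing edges)
    [1, 0, 0, 0, 0],  # M4
    [1, 0, 0, 0, 0],  # M5
]


def distance_matrix(adjacency):
    """Minimum path length between every ordered pair of nodes (Floyd-Warshall)."""
    n = len(adjacency)
    dist = [
        [0 if i == j else (1 if adjacency[i][j] else INF) for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

Because the matrix is directed, the result is not symmetric: M3 can be reached from every other node, but nothing can be reached from M3.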
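36 Computational representation of graphs
[Figure: process model with enzymes E1, E2 and E3, metabolites M1 and M2, and the metabolite pairs M4 + M5 and M3 + M4]
Stoichiometric matrix: one row per reaction, minus & plus signs respectively for LHS and RHS of equations
     M1  M2  M3  M4  M5
R1    1   0   0  -1  -1
R2   -1   0   0   1   1
R3   -1   1   0   0   0
R4    1  -1   0   0   0
R5    0  -1   1   1   0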
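A stoichiometric matrix of this kind can be built mechanically from a list of reactions. The sketch below assumes the five reactions are R1: M4 + M5 -> M1, R2: M1 -> M4 + M5, R3: M1 -> M2, R4: M2 -> M1 and R5: M2 -> M3 + M4, which is one reading of the slide that agrees with its adjacency matrix; the data layout and function name are hypothetical.

```python
METABOLITES = ["M1", "M2", "M3", "M4", "M5"]

# Each reaction is (LHS, RHS), mapping metabolite name to number of molecules.
# The reaction list is an assumed reading of the slide, not a confirmed one.
REACTIONS = {
    "R1": ({"M4": 1, "M5": 1}, {"M1": 1}),
    "R2": ({"M1": 1}, {"M4": 1, "M5": 1}),
    "R3": ({"M1": 1}, {"M2": 1}),
    "R4": ({"M2": 1}, {"M1": 1}),
    "R5": ({"M2": 1}, {"M3": 1, "M4": 1}),
}


def stoichiometric_matrix(reactions, metabolites):
    """One row per reaction: negative for LHS molecules, positive for RHS."""
    matrix = []
    for name in sorted(reactions):
        lhs, rhs = reactions[name]
        matrix.append([rhs.get(m, 0) - lhs.get(m, 0) for m in metabolites])
    return matrix
```

Reversible steps appear as two rows with opposite signs (R1/R2 and R3/R4), while the irreversible E3 step contributes the single row R5.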
37 Dynamics (a.k.a. Systems Biology)
- The stoichiometric matrices can be
  - converted into groups of connected (i.e. systems of) differential equations
         M1  M4  M5
    R1    1  -1  -1
    d[M1]/dt = r1 [M4] [M5]    (r1 = reaction rate constant)
  - converted and subjected to stochastic methods
  - converted and subjected to (probabilistic) logic programming
  - subjected to a range of other statistical techniques
- For holistic virtual cells, this means that we can model the behaviour of living systems, linking genotype to phenotype in the computer.
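The conversion of a stoichiometric matrix into a system of differential equations can be sketched as follows, assuming mass-action kinetics (each reaction's rate is its rate constant times the concentrations of its LHS metabolites) and a simple forward-Euler step. The matrix is the five-reaction example read as R1: M4 + M5 -> M1, R2: M1 -> M4 + M5, R3: M1 -> M2, R4: M2 -> M1, R5: M2 -> M3 + M4; the rate constants, concentrations and function names are hypothetical.

```python
METABOLITES = ["M1", "M2", "M3", "M4", "M5"]

# Rows are reactions R1..R5, columns are M1..M5 (assumed reading of the slides).
STOICH = [
    [1, 0, 0, -1, -1],   # R1: M4 + M5 -> M1
    [-1, 0, 0, 1, 1],    # R2: M1 -> M4 + M5
    [-1, 1, 0, 0, 0],    # R3: M1 -> M2
    [1, -1, 0, 0, 0],    # R4: M2 -> M1
    [0, -1, 1, 1, 0],    # R5: M2 -> M3 + M4
]


def reaction_rates(conc, rate_constants, stoich=STOICH):
    """Mass-action rate of each reaction: constant times LHS concentrations."""
    rates = []
    for constant, row in zip(rate_constants, stoich):
        rate = constant
        for c, coeff in zip(conc, row):
            if coeff < 0:               # metabolite is on the LHS
                rate *= c ** (-coeff)
        rates.append(rate)
    return rates


def derivatives(conc, rate_constants, stoich=STOICH):
    """d[Mi]/dt = sum over reactions of (rate * stoichiometric coefficient)."""
    rates = reaction_rates(conc, rate_constants, stoich)
    return [
        sum(rates[j] * stoich[j][i] for j in range(len(stoich)))
        for i in range(len(conc))
    ]


def euler_step(conc, rate_constants, dt, stoich=STOICH):
    """Advance all concentrations by one forward-Euler step of size dt."""
    change = derivatives(conc, rate_constants, stoich)
    return [c + dt * dc for c, dc in zip(conc, change)]
```

With only R1 switched on (rate constants `[2, 0, 0, 0, 0]`) and concentrations `[0, 0, 0, 3, 5]`, the derivative of M1 is 2 x 3 x 5 = 30, matching d[M1]/dt = r1 [M4] [M5]. A production model would normally use a proper ODE solver in place of the Euler step.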
38 Dynamics (a.k.a. Systems Biology)
- Systems Biology Markup Language (SBML) is the emerging data standard for such models (Hucka et al., Bioinformatics, in press).
- A broad range of generic and specific modeling tools are available, including
  - DBsolve
  - Gepasi
  - Metatools
  - Stochsim
  - E-cell
  - Ecocyc
  - GeneNet
- See A.P. Arkin (2001) Synthetic cell biology. Curr. Opin. Biotech. 12, 638-644.
39 Thank you for listening!
You can give your brain a rest now