Biomolecular Networks Charlie Hodgman Univ. Leeds GSK December 2003 - PowerPoint PPT Presentation

1
Biomolecular Networks
Charlie Hodgman (Univ. Leeds / GSK)
December 2003
  • Composition
  • Information resources on biomolecular
    interactions
  • Problems with these resources
  • Introduction to Graph Theory
  • Applications of Graph Theory to biological
    research
  • A sniff of systems biology

2
There are three classes of interactions. They can
be used to build interaction networks that
represent physiological processes within and
between cells
  • Enzyme - ligand (substrate, product, inhibitor,
    activator)
      • metabolic pathways and networks
  • Gene products (RNA, proteins) with each other
      • signal networks, machinery for cellular processes
      • protein(/product) interaction networks
      • natural RNA interference mechanisms
  • Gene regulatory elements - gene products
      • genetic networks

3
The 3 networks are actually interconnected
[Figure: the gene product interaction network, the genetic network and the metabolic network drawn as interconnected; other labels: STIMULUS, mRNA, etc.]
4
Putting the networks together
  • Integrated - combinations of metabolic, genetic
    and product-interaction networks
  • a.k.a. gene networks of Inst. Cytology &
    Genetics, Novosibirsk
  • Holistic - integrated networks that capture
    (all) interactions within a cell or
    multi-cellular system
  • a.k.a. virtual cells, e-cells

5
Public & proprietary enzyme databases
ENZYME (http://ca.expasy.org/enzyme/): E.C. reactions; record = 1 E.C. code
KEGG (http://www.genome.ad.jp/kegg/): enzymes - ligands; record = 1 E.C. code
BRENDA (http://www.brenda.uni-koeln.de/; ScienceFactory site down): all enzymological data; record = 1 E.C. code
EMP (http://www.empproject.com/, http://emp.mcs.anl.gov/): all enzymological data; record = 1 enzyme from 1 paper
UMBBD (http://umbbd.ahc.umn.edu/): enzymes - metabolites; record = 1 recorded bio-degradative or xenobiotic reaction
Beilstein
6
Enzyme record
ID   4.1.1.77
DE   4-oxalocrotonate decarboxylase.
CA   4-oxalocrotonate = 2-oxopent-4-enoate + CO(2).
CC   -!- Involved in the meta-cleavage pathway for the
CC       degradation of phenols, cresols and catechols.
DR   P49156, DMPH_PSESP;  P49155, XYLI_PSEPU;
//
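A flat-file record of this kind can be read by keying on the two-letter line codes. The following is only a minimal sketch: the function name parse_enzyme_record is hypothetical, and it assumes the reaction (CA) line separates the two sides with " = " and the compounds with " + ", with "//" closing the record.

```python
def parse_enzyme_record(text):
    """Collect the lines of one ENZYME-style record by their two-letter code."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if len(line) < 2:
            continue
        code, value = line[:2], line[2:].strip()
        record.setdefault(code, []).append(value)
    return record


# Example text typed in by hand for illustration (assumed layout).
example = """\
ID   4.1.1.77
DE   4-oxalocrotonate decarboxylase.
CA   4-oxalocrotonate = 2-oxopent-4-enoate + CO(2).
DR   P49156, DMPH_PSESP;  P49155, XYLI_PSEPU;
//
"""

rec = parse_enzyme_record(example)
# Split the reaction line into its left- and right-hand sides.
lhs, rhs = [side.strip().rstrip(".") for side in rec["CA"][0].split(" = ")]
substrates = lhs.split(" + ")
products = rhs.split(" + ")
```

Splitting the CA line this way gives the enzyme - ligand pairs from which a metabolic network can be assembled.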
7
Problems with these databases
  • EC code assignment is idiosyncratic (e.g. 19 codes
    for ATP hydrolysis) and slow (can take years
    depending on how often EC defers decision), with poor
    representation of allostery
  • EC code structure (i.e. number-based hierarchy)
    is a poor model of classification for enzyme
    activity, but is the ONLY system at present
  • Non-standard terminology of metabolite and
    protein names, both within and between databases

8
The metabolite-naming problem
Different compounds, similar names
  phenylacetate: C(C1CCCCC1)C(O)O-
  Phenyl acetate: C1CCCCC1OC(O)C
Same compound, different names
  (S)-malate, also known as: L-2-Hydroxybutanedioic acid, L-Apple acid, L-malate, L-Malic acid, (S)-Malate, S-malate
Generic-specific relationships
  Alanine is-a alpha-amino acid is-a amino-acid, which is-a amine and is-a carboxylic acid
9
Protein interactions/associations have physical
and functional attributes
Attributes: Stoichiometric / Non-stoichiometric; Directed / Undirected
Examples: Enzyme/substrate relationship, Regulatory subunits, Multi-subunit complexes, Filaments
Directed: one molecule exerting a biological
effect upon the other.
Stoichiometry: the numbers of molecules involved are physically
defined.
These attributes need to be assigned
for studying cause-effect relationships
10
Info resources - Protein interactions
BIND http://www.bind.ca/
BRITE http://www.genome.ad.jp/brite/
CSNDB http://geo.nihs.go.jp/csndb/
DIP http://dip.doe-mbi.ucla.edu/
HPRD http://www.hprd.org/
MINT http://cbm.bio.uniroma2.it/mint/ (other interactions)
TRANSPATH http://193.175.244.148/ (duplicates CSNDB data)
  • Proteomics Standards Initiative now working on data
    exchange format
  • EBI is European coordinator (http://www.ebi.ac.uk/intact/)

11
Information resources - gene regulation
TRRD http://www.bionet.nsc.ru/trrd/
TRANSFAC http://transfac.gbf.de/TRANSFAC/ (accessible only through proprietary http://www.gene-regulation.com/)
PRONIT http://www.rtc.riken.go.jp/jouhou/pronit/pronit.html (see also http://gibk26.bse.hyutech.ac.jp/jouhou/jouhoubank.html)
RegulonDB http://www.cifn.unam.mx/Computational_Genomics/regulondb/ (data mostly pertain to E. coli)
EPD http://www.epd.isb-sib.ch/ (focus on human data; largely only promoter locations)
13
Networks databases
  • Metabolism
  • Signal transduction, integrated networks

14
Metabolic pathway databases
MPW http://www.empproject.com/, http://emp.mcs.anl.gov/
KEGG http://www.genome.ad.jp/kegg/
BioCyc http://biocyc.org/ (brings together Ecocyc, Humcyc, Metacyc etc.; based on Lamberts chart and E.C. reaction equations)
Boehringer http://www.expasy.org/cgi-bin/search-biochem-index (based on Boehringer chart)
15
MPW chart
16
A KEGG chart
17
Problems with these databases
  • Pathways are arbitrary entities
      • where to start/stop,
      • gaps in picture (KEGG) or pathways shrinking in
        length (MPW)
  • metabolite centric, suboptimal for functional
    genomics & proteomics because enzymes (and
    especially multifunctional gene products) occur
    in multiple places
  • selective ignorance (only capture isolated
    pathways)
  • possibly expensive or restricted access (e.g.
    Curagen, Transpath)

19
Other pathway databases
CSNDB http://geo.nihs.go.jp/csndb.html
SPAD http://www.grt.kyushu-u.ac.jp/spad/
STKE http://stke.sciencemag.org/
GeNet
GenomeKnowledgeBase http://www.genomeknowledge.org/
Proprietary
Transpath http://193.175.244.148/ (content mostly CSNDB)
Biocarta http://www.biocarta.com/
GeneGo http://www.genego.com/ (Curagen)
20
Gene networks supporting homeostasis of an
organism
Gene network controlling intracellular
cholesterol concentration in mammalian cells
(Ignatieva E.V.)
Principal scheme of a regulatory circuit (contour) with
negative feedback
21
Graphs in the mathematical sense (Gross &
Yellen)
22
Graph terminology
Connected
Node/vertex
Edge
Unconnected
23
Graph terminology
Graph
(unconnected) subgraphs, clusters
24
Graph terminology
(connected) subgraphs
strongly connected nodes/clusters
25
Graph terminology
Cliques
26
Graph terminology
Undirected graph
Directed graph (digraph)
Directed acyclic graph (DAG)
Partially directed graph
27
Graph terminology
Bridge
Span
Articulation point
28
Graph terminology
Tree
Leaf
Root
Node
29
Graph terminology
Forest
30
Graph terminology
Pruning
  • Different approaches
  • back one node

31
Simplifying yeast two-hybrid output
32
Graph terminology
Pruning
  • Different approaches
  • back one node
  • back to node with >1 edge
  • (trees pruned back to root)
  • Reveals cycles
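One way to picture how pruning reveals cycles: keep deleting nodes that have at most one edge until none are left. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the function name prune_leaves and the example graph are hypothetical. Note that in this variant a pure tree disappears entirely rather than being pruned back to its root.

```python
def prune_leaves(adjacency):
    """Repeatedly remove nodes with fewer than two edges (undirected graph)."""
    graph = {node: set(neigh) for node, neigh in adjacency.items()}
    leaves = [n for n, neigh in graph.items() if len(neigh) <= 1]
    while leaves:
        node = leaves.pop()
        if node not in graph:            # already removed earlier
            continue
        for neighbour in graph.pop(node):
            graph[neighbour].discard(node)
            if len(graph[neighbour]) <= 1:
                leaves.append(neighbour)  # neighbour has become a leaf
    return graph


# Hypothetical example: a triangle A-B-C with a tail C-D-E hanging off it.
example = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
core = prune_leaves(example)
```

Here the tail D-E is stripped away and only the cycle A-B-C remains in core.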

33
Graph terminology
Pruning
  • Different approaches
  • back one node
  • back to node with >1 connection
  • (trees pruned back to root)
  • by given distance from a given node
  • Shows structure of a node's locality

Distance 2
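Pruning by distance amounts to keeping only the nodes that can be reached within a set number of edges. A minimal sketch, assuming an adjacency list held in a dictionary; the name neighbourhood and the chain used as an example are illustrative only.

```python
from collections import deque


def neighbourhood(adjacency, start, max_distance):
    """Return {node: distance} for nodes within max_distance edges of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue                      # do not expand beyond the limit
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical chain A - B - C - D - E.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
local = neighbourhood(chain, "A", 2)
```

For a digraph, passing the adjacency list of outgoing (or incoming) edges restricts the result to what lies downstream (or upstream) of the chosen node.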
34
Graph terminology
Pruning
  • Different approaches
  • back one node
  • back to node with >1 connection
  • (trees pruned back to root)
  • by given distance from a given node
  • by given distance up/down digraph
  • Shows what generates or results from a node

Distance 2
35
Graph terminology
In a petrinet, the nodes alternate between
passive (e.g. metabolite) and active (e.g.
enzyme) states. N.B. it is usual to represent
active nodes by squares.
Coloured graphs/petrinets have nodes & edges with multiple
attributes or properties (e.g. Km, Vmax,
compartment, charge, hydrophobicity)
36
Graph terminology
Vertex degree: number of edges from a given node
Min. path length: the least number of edges to cross between 2 nodes, also sometimes called the degrees of separation
Network centre: the node which has the lowest average minimum path length (usually w.r.t. undirected graphs)
Distance matrix: matrix containing the minimum path length between every pair of nodes
Aver. path length: the average of all the minimum path lengths in a (sub)graph
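These definitions can be checked on a toy example. The sketch below is an illustration only: the five-node undirected graph and all names are hypothetical, and the graph is assumed to be connected so that every minimum path length exists.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Minimum path length from start to every reachable node (breadth-first)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical undirected graph: B is joined to A, C and D; D is joined to E.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B", "E"], "E": ["D"]}

# Vertex degree: number of edges from each node.
degree = {node: len(neigh) for node, neigh in graph.items()}

# Distance matrix as a dictionary of dictionaries.
distance = {node: shortest_path_lengths(graph, node) for node in graph}

# Network centre: lowest average minimum path length to the other nodes.
mean_distance = {n: sum(d.values()) / (len(d) - 1) for n, d in distance.items()}
centre = min(mean_distance, key=mean_distance.get)

# Average path length over all ordered pairs of distinct nodes.
pairs = [(a, b) for a in graph for b in graph if a != b]
average_path_length = sum(distance[a][b] for a, b in pairs) / len(pairs)
```

In this example B has the highest vertex degree and is also the network centre, but the two measures need not coincide in general.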
37
Graph terminology
Networks
random: generated by random addition of new nodes & edges
small-world: has a small number of nodes with high vertex degree, resulting in a low average path length
scale-free: the distribution of vertex degrees follows a power law (i.e. the distribution follows a straight line on log-log plots). These networks are stable to perturbation, but can exhibit complex behaviour.
Jeong, H., et al. (2000) The large-scale organization of metabolic networks. Nature, 407, 651-654.
Nodes of high vertex degree are known as hubs. In metabolic networks, they correspond to water, ATP, NADH etc.
Computational navigation of metabolic networks that include hubs shows that glycolysis is only one of 500 000 pathways of the same length from glucose to pyruvate!!
Kuffner, R. et al. (2000) Pathway analysis in metabolic databases via differential metabolic display. Bioinformatics 16, 825-836.
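The straight line on a log-log plot follows directly from the power law. As a short worked step, writing P(k) for the fraction of nodes with vertex degree k, and using the symbols \gamma (exponent) and c (constant), which are assumed here rather than taken from the slides:

```latex
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = -\gamma \,\log k + c
```

so the slope of the line on the log-log plot is -\gamma.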
38
Graph terminology
More pruning
  • Different approaches
  • node with highest vertex degree
  • Adjacency matrix (see below) gives vertex degree.

39
Graph terminology
More pruning
  • Different approaches
  • node with highest vertex degree
  • node causing greatest disconnection
  • For every node, count up the number of minimum paths
    (between every pair of nodes) that go through it.
    Then rank these. High ranking nodes are likely
    to be hubs.
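The counting step can be sketched as follows. This is an illustrative implementation, not necessarily the one used for the slides: it counts raw numbers of minimum paths through each node (every ordered pair of end nodes is visited, so in an undirected graph each path is counted once per direction), and the function names and the star-shaped example are hypothetical.

```python
from collections import deque
from itertools import permutations


def count_shortest_paths(adjacency, start):
    """Return (distance, number of minimum paths) from start to reachable nodes."""
    distance, count = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                count[neighbour] = 0
                queue.append(neighbour)
            if distance[neighbour] == distance[node] + 1:
                count[neighbour] += count[node]
    return distance, count


def paths_through(adjacency):
    """For every node, count the minimum paths between other nodes that cross it."""
    info = {n: count_shortest_paths(adjacency, n) for n in adjacency}
    score = {n: 0 for n in adjacency}
    for s, t in permutations(adjacency, 2):
        dist_s, count_s = info[s]
        if t not in dist_s:
            continue                      # t cannot be reached from s
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, count_v = info[v]
            # v lies on a minimum path if the two legs add up exactly.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += count_s[v] * count_v[t]
    return score


# Hypothetical star: hub H joined to A, B and C.
star = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
score = paths_through(star)
ranking = sorted(score, key=score.get, reverse=True)
```

Removing the top-ranked node (H here) disconnects the most pairs of nodes.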

40
Graph terminology
More pruning
  • Different approaches
  • node with highest vertex degree
  • node causing greatest disconnection
  • edge causing greatest disconnection
  • For every edge, count up the number of minimum paths
    (between every pair of nodes) that go through it.
    Then rank these.

41
Network navigation
Depth first
Node names are placed in a stack (array). When
reaching the next node, need to check that it
has not been visited already by seeing if it
is in the stack. Depth of search may be limited
(e.g. to the min. path length).
Breadth first
Use adjacency list, exhaust node
lists at each depth before progressing to
next depth. Better for studying chronology of
flow through network.
Both approaches can require heavy use of recursion.
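The two navigation strategies can be sketched as follows. The function names are hypothetical, and the small directed adjacency list is an assumption chosen to match the five-metabolite process model used in the later slides. The depth-first version is recursive and keeps the current path in a stack; the breadth-first version is written here as a loop over successive depths rather than recursively.

```python
def depth_first_paths(adjacency, start, goal, max_depth=None):
    """Enumerate paths from start to goal that never revisit a node."""
    paths, stack = [], [start]           # stack holds the names on the current path

    def visit(node):
        if node == goal:
            paths.append(list(stack))
            return
        if max_depth is not None and len(stack) - 1 >= max_depth:
            return                        # depth limit reached
        for neighbour in adjacency[node]:
            if neighbour not in stack:    # already visited on this path?
                stack.append(neighbour)
                visit(neighbour)
                stack.pop()

    visit(start)
    return paths


def breadth_first_layers(adjacency, start):
    """Group nodes by depth, exhausting each depth before moving to the next."""
    seen, layer, layers = {start}, [start], []
    while layer:
        layers.append(layer)
        next_layer = []
        for node in layer:
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_layer.append(neighbour)
        layer = next_layer
    return layers


# Assumed directed adjacency list (M2 -> M3 and M2 -> M4 are one-way).
model = {
    "M1": ["M2", "M4", "M5"],
    "M2": ["M1", "M3", "M4"],
    "M3": [],
    "M4": ["M1"],
    "M5": ["M1"],
}
routes = depth_first_paths(model, "M5", "M3")
layers = breadth_first_layers(model, "M5")
```

The layers returned by the breadth-first search give the order in which nodes are reached, which is why it suits studies of the chronology of flow.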
42
Computational representation of graphs
As objects.. Being OO, Java has some useful
intrinsic properties. However, O'Reilly have
developed a special-purpose Perl
module (http://search.cpan.org/author/JHI/Graph-0.20101/)
43
Computational representation of graphs
44
Computational representation of graphs
  • As pictures..
  • Algorithms to represent graphs (in 2 and 3
    dimensions) are a major area of development in
    computer science, entitled Graph layout, where
    the aim is to make the resulting pictures as
    clear as possible. This can be done by
      • reducing to a minimum the number of edges that
        cross each other,
      • separating nodes so that edges do not cross over
        them (e.g. spring embedding),
      • ensuring that edges cross in areas that do not
        matter to the observer,
      • using non-linear scales (e.g. fish-eye lens)
  • Public domain tools include
      • Graphviz (AT&T) (http://www.graphviz.org/)
      • Pajek (Univ. Ljubljana, Slovenia)
      • ProViz (Univ. Toulouse)
      • BioLayout (EBI)
      • Cytoscape
      • AGLO
  • The main commercial package is from Tom Sawyer.

45
Layout views Graphviz
47
Layout views spring-embedded
Figure 6.3. The main component of interactions in
yeast, represented as a straight line drawing in
2D. There are 525 vertices and 724 edges.
48
Computational representation of graphs
Process model
[Figure: process model with metabolites M1-M5 and enzymes E1, E2, E3]
Adjacency matrix
      M1  M2  M3  M4  M5
M1     0   1   0   1   1
M2     1   0   1   1   0
M3     0   0   0   0   0
M4     1   0   0   0   0
M5     1   0   0   0   0
E3 reaction irreversible
Distance matrices calculated from adjacency matrix
49
Computational representation of graphs
Process model
[Figure: process model with metabolites M1-M5 and enzymes E1, E2, E3]
Distance matrix
      M1  M2  M3  M4  M5
M1     1   1   2   1   1
M2     1   1   1   1   2
M3     0   0   0   0   0
M4     1   2   3   1   2
M5     1   2   3   3   1
E3 reaction irreversible
Distance matrices calculated recursively from
adjacency matrix. Figures are the minimum path
lengths. Largest number = network diameter
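As an illustration of deriving minimum path lengths from an adjacency matrix, the sketch below uses the Floyd-Warshall algorithm, which is a swapped-in iterative technique and not the recursive calculation mentioned on the slide. The adjacency matrix is the one shown for the process model; the conventions are assumptions of this sketch (0 on the diagonal, infinity where no path exists), so individual entries may differ from the slide where its conventions differ.

```python
INF = float("inf")

names = ["M1", "M2", "M3", "M4", "M5"]
adjacency = [
    [0, 1, 0, 1, 1],   # M1 -> M2, M4, M5
    [1, 0, 1, 1, 0],   # M2 -> M1, M3, M4
    [0, 0, 0, 0, 0],   # M3 has no outgoing edges
    [1, 0, 0, 0, 0],   # M4 -> M1
    [1, 0, 0, 0, 0],   # M5 -> M1
]


def distance_matrix(adj):
    """Minimum path lengths between all pairs of nodes (Floyd-Warshall)."""
    n = len(adj)
    dist = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
            for i in range(n)]
    for k in range(n):                 # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


dist = distance_matrix(adjacency)
# Network diameter: the largest finite minimum path length.
diameter = max(d for row in dist for d in row if d != INF)
```

Rows are "from" nodes and columns are "to" nodes, following the order in names.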
50
Elementary modes analysis
Elementary modes are individual pathways
through the network. This involves network
navigation (see Network navigation). Such analysis would show
that there is only one mode connecting M5 to
M3. Elementary modes are not restricted to
minimum length paths. Analysis of
holistic networks can also reveal minor or
aberrant pathways found in disease states.
Process model
[Figure: process model with metabolites M1-M5 and enzymes E1, E2, E3]
51
Computational representation of graphs
[Figure: reaction scheme with enzymes E1, E2, E3 and metabolites M1, M2, M4 + M5, M3 + M4]
Stoichiometric matrix: one row/reaction, minus &
plus signs for LHS and RHS of equations
        M1   M2   M3   M4   M5
R1       1    0    0   -1   -1
R2      -1    0    0    1    1
R3      -1    1    0    0    0
R4       1   -1    0    0    0
R5       0   -1    1    1    0
Rnet     0   -1    1    1    0
Sum columns to discover net
reaction of system
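Summing the columns is a one-line operation once the matrix is held as rows of coefficients. A minimal sketch, assuming the five reactions R1-R5 and metabolites M1-M5 of the stoichiometric matrix above, with empty cells treated as 0; the variable names are illustrative.

```python
metabolites = ["M1", "M2", "M3", "M4", "M5"]

# One row per reaction: negative = consumed (LHS), positive = produced (RHS).
stoichiometry = {
    "R1": [1, 0, 0, -1, -1],
    "R2": [-1, 0, 0, 1, 1],
    "R3": [-1, 1, 0, 0, 0],
    "R4": [1, -1, 0, 0, 0],
    "R5": [0, -1, 1, 1, 0],
}

# zip(*rows) turns the rows into columns, which are then summed.
net = [sum(column) for column in zip(*stoichiometry.values())]
net_reaction = {m: v for m, v in zip(metabolites, net) if v != 0}
```

The result reads as the net reaction M2 -> M3 + M4: the reversible steps cancel each other out.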
52
Computational representation of graphs
[Figure: reaction scheme with enzymes E1, E2, E3 and metabolites M1, M2, M4 + M5, M3 + M4]
        M1   M2   M3   M4   M5
R1       1    0    0   -1   -1
R2      -1    0    0    1    1
R3      -1    1    0    0    0
R4       1   -1    0    0    0
R5       0   -1    1    1    0
Columns containing only -ve numbers (ignoring zeros): system
substrates
Columns containing only +ve numbers (ignoring zeros):
system products
System substrates can be
essential metabolites
System products can be
fermentation products
53
Computational representation of graphs
[Figure: reaction scheme with enzymes E1, E2, E3 and metabolites M1, M2, M4 + M5, M3 + M4]
        M1   M2   M3   M4   M5
R1       1    0    0   -1   -1
R3      -1    1    0    0    0
R4       1   -1    0    0    0
R5       0   -1    1    1    0

Columns containing only -ve numbers (ignoring zeros): system
substrates
Columns containing only +ve numbers (ignoring zeros):
system products
System substrates can be
essential metabolites (M5)
System products
can be fermentation products (M3)
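The column test can be automated. A minimal sketch, assuming the four-reaction matrix of this slide (R2 omitted) with empty cells treated as 0; the function name classify is hypothetical.

```python
def classify(matrix, metabolites):
    """Split metabolites into system substrates and system products."""
    substrates, products = [], []
    for name, column in zip(metabolites, zip(*matrix.values())):
        nonzero = [v for v in column if v != 0]
        if nonzero and all(v < 0 for v in nonzero):
            substrates.append(name)      # only ever consumed
        elif nonzero and all(v > 0 for v in nonzero):
            products.append(name)        # only ever produced
    return substrates, products


metabolites = ["M1", "M2", "M3", "M4", "M5"]
without_r2 = {
    "R1": [1, 0, 0, -1, -1],
    "R3": [-1, 1, 0, 0, 0],
    "R4": [1, -1, 0, 0, 0],
    "R5": [0, -1, 1, 1, 0],
}
substrates, products = classify(without_r2, metabolites)
```

M4 is both consumed (R1) and produced (R5), so it counts as neither a system substrate nor a system product.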
54
Dynamics - how things change with time and
with respect to each other and to the
environment
55
Dynamics (a.k.a. Systems Biology)
The stoichiometric matrices can be converted into
groups of (a.k.a. systems of) differential
equations
        M1   M4   M5
R1       1   -1   -1
$$\frac{d[\mathrm{M1}]}{dt} = k_1\,[\mathrm{M4}]\,[\mathrm{M5}]$$
where k1 is the reaction rate constant.
The rate of appearance of M1 is equal to the reaction
rate constant (a feature of the enzyme)
multiplied by the concentrations of M4 and M5.
For some types of enzymes, the equations are
more complex (e.g. allosterically regulated or
multi-subunit co-operative enzymes).
[Figure: reaction M4 + M5 -> M1]
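To see how such an equation behaves over time, it can be stepped forward numerically. The sketch below uses simple forward Euler integration for the single reaction M4 + M5 -> M1; the starting concentrations, the value of k1, the step size and the function name simulate are all arbitrary assumptions for illustration. The -1 entries in the R1 row give the matching equations for M4 and M5.

```python
def simulate(m1, m4, m5, k1, dt, steps):
    """Forward-Euler integration of M4 + M5 -> M1 with mass-action kinetics."""
    for _ in range(steps):
        rate = k1 * m4 * m5      # d[M1]/dt = k1 [M4][M5]
        m1 += rate * dt          # stoichiometric coefficient +1
        m4 -= rate * dt          # stoichiometric coefficient -1
        m5 -= rate * dt          # stoichiometric coefficient -1
    return m1, m4, m5


# Arbitrary illustrative values: no M1 at the start, M5 in excess.
m1, m4, m5 = simulate(0.0, 1.0, 2.0, k1=0.5, dt=0.01, steps=1000)
```

Because each molecule of M1 is made from one M4, the sum M1 + M4 stays constant throughout the run, which is a handy sanity check on the bookkeeping.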
56
Dynamics (a.k.a. Systems Biology)
  • The networks and their dynamics can be
    represented in many other ways involving
  • algebra (partial differential equations,
    non-linear equations, metabolic control analysis,
    bifurcation theory)
  • boolean analysis and logic programming
  • cellular automata
  • concurrency theory
  • time-delay petrinets
  • stochastic modelling
  • bayesian networks and other forms of statistics
  • and any combination of the above!!
  • These can all get rather complicated.

57
Thank you for listening! You can give your
brain a rest now