Introduction to Bioinformatics

Centre for Integrative Bioinformatics VU. Lecture 20: Global network behaviour; protein interaction ...

1
Introduction to Bioinformatics
Lecture 20 Global network behaviour
2
Networks
"The thousands of components of a living cell are
dynamically interconnected, so that the cell's
functional properties are ultimately encoded into
a complex intracellular web of molecular
interactions." "This is perhaps most evident with
cellular metabolism, a fully connected biochemical
network in which hundreds of metabolic substrates
are densely integrated through biochemical
reactions." (Ravasz E, et al.)
3
(No Transcript)
4
TF
Ribosomal proteins
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
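Clustering coefficient
How linked up are the direct neighbours of the
node considered?
Example: a node with 4 neighbours that share 1
link among themselves, out of 4(4-1)/2 = 6
possible links, has C = 1/(4(4-1)/2) = 1/6.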
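The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the toy graph, the node names and the convention of returning 0 for nodes with fewer than two neighbours are all assumptions made here.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of the possible links among the neighbours of `node` that exist."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs means C = 0
    # Count neighbour pairs that are themselves directly linked.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Hypothetical toy graph: node "x" has 4 neighbours, and only a-b are linked.
adj = {
    "x": {"a", "b", "c", "d"},
    "a": {"x", "b"},
    "b": {"x", "a"},
    "c": {"x"},
    "d": {"x"},
}
print(clustering_coefficient(adj, "x"))  # 1/6, about 0.167
```

Averaging this value over all nodes gives the network-level clustering coefficient C used on the later slides.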
14
(No Transcript)
15
Small-world networks
A seminal paper, "Collective dynamics of
'small-world' networks", by Duncan J. Watts and
Steven H. Strogatz, which appeared in Nature
volume 393, pp. 440-442 (4 June 1998), has
attracted considerable attention. One can
consider two extremes of networks. The first are
regular networks, where "nearby" nodes have large
numbers of interconnections, but "distant" nodes
have few. The second are random networks, where
the nodes are connected at random. Regular
networks are highly clustered, i.e., there is a
high density of connections between nearby nodes,
but have long path lengths, i.e., to go from one
distant node to another one must pass through
many intermediate nodes. Random networks are
highly un-clustered but have short path lengths.
This is because the randomness makes it less
likely that nearby nodes will have lots of
connections, but introduces more links that
connect one part of the network to another.
16
Regular and random networks
(Figure panels: random, regular, regular complete)
17
Making a small world
A small-world network can be generated from a
regular one by randomly disconnecting a few
points and randomly reconnecting them elsewhere.
Another way to think of a small-world network
is that some so-called 'shortcut' links are added
to a regular network, as shown here.
The added links are shortcuts because they allow
travel from node (a) to node (b) to occur in
only 3 steps, instead of 5 without the shortcuts.
18
Regular, small-world and random networks
Rewiring experiments (Watts and Strogatz, 1998)
p is the probability that a randomly chosen
connection will be randomly redirected elsewhere
(i.e., p = 0 means nothing is changed, leaving the
network regular; p = 1 means every connection is
changed and randomly reconnected, yielding
complete randomness). For example, for p = 0.01
(so that only 1% of the edges in the graph have
been randomly changed), the "clustering
coefficient" is over 95% of what it would be for
a regular graph, but the "characteristic path
length" is less than 20% of what it would be for
a regular graph.
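The rewiring experiment can be imitated with a small simulation. The sketch below is only an illustration under stated assumptions: the network size (n = 200), the number of neighbours (k = 8), the random seed and all function names are choices made here, not values from the slides, and a network this small will not reproduce the published ratios exactly.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, k, p, rng):
    """With probability p, redirect each lattice edge to a random new endpoint."""
    n = len(adj)
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                choices = [w for w in range(n) if w != i and w not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def mean_clustering(adj):
    """Clustering coefficient C averaged over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

def mean_path_length(adj):
    """Characteristic path length L: mean BFS distance over reachable pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)          # arbitrary seed, for repeatability only
n, k = 200, 8                   # assumed sizes, chosen to run quickly
regular = ring_lattice(n, k)
C0, L0 = mean_clustering(regular), mean_path_length(regular)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(regular, k, p, rng)
    print(p, round(mean_clustering(g) / C0, 2), round(mean_path_length(g) / L0, 2))
```

Each printed row gives p, C(p)/C(0) and L(p)/L(0). The expected qualitative pattern is that L drops quickly at small p while C stays close to its regular-lattice value.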
19
Small-world networks

Network characterisation
L = characteristic path length
C = clustering coefficient
A small-world network is much more highly
clustered than an equally sparse random graph
(C >> C_random), and its characteristic path
length L is close to the theoretical minimum shown
by a random graph (L ≈ L_random).
The reason a graph can have small L despite being
highly clustered is that a few nodes connecting
distant clusters are sufficient to lower L.
Because C changes little as small-worldliness
develops, it follows that small-worldliness is a
global graph property that cannot be found by
studying local graph properties.

20
Small-world networks

A network of intermediate order (0 < p < 1, as in
earlier slides) can be characterized by the
average shortest path length L(p) between any two
points, and a clustering coefficient C(p) that
measures the cliquishness of a typical
neighbourhood (a local property).

These can be calculated from mathematical
simulations and yield the following behavior
(Watts and Strogatz):

21
(No Transcript)
22
Small-world networks
Part of the reason for the interest in the
results of Watts and Strogatz is that small-world
networks seem to be good models for a wide
variety of physical situations. They showed that
the power grid for the western U.S. (nodes are
power stations, and there is an edge joining two
nodes if the power stations are joined by
high-voltage transmission lines), the neural
network of a nematode worm (nodes are neurons and
there is an edge joining two nodes if the neurons
are joined by a synapse or gap junction), and the
Internet Movie Database (nodes are actors and
there is an edge joining two nodes if the actors
have appeared in the same movie) all have the
characteristics (high clustering coefficient but
low characteristic path length) of small-world
networks. Intuitively, one can see why
small-world networks might provide a good model
for a number of situations. For example, people
tend to form tight clusters of friends and
colleagues (a regular network), but then one
person might move from New York to Los Angeles,
say, introducing a random edge. The results of
Watts and Strogatz then provide an explanation
for the empirically observed phenomenon that
there often seem to be surprisingly short
connections between unrelated people (e.g., you
meet a complete stranger on an airplane and soon
discover that your sister's best friend went to
college with his boss's wife).
23
Small world example: metabolism

Wagner and Fell (2001) modeled the known
reactions of 287 substrates that represent the
central routes of energy metabolism and
small-molecule building block synthesis in E.
coli. This included metabolic sub-pathways such
as
glycolysis
pentose phosphate and Entner-Doudoroff pathways
glycogen metabolism
acetate production
glyoxalate and anaplerotic reactions
tricarboxylic acid cycle
oxidative phosphorylation
amino acid and polyamine biosynthesis
nucleotide and nucleoside biosynthesis
folate synthesis and 1-carbon metabolism
glycerol 3-phosphate and membrane lipids
riboflavin
coenzyme A
NAD(P)
porphyrins, haem and sirohaem
lipopolysaccharides and murein
pyrophosphate metabolism

These sub-pathways form a network because some
compounds are part of more than one pathway and
because most of them include common components
such as ATP and NADP.
The graphs on the left show that, considering
either reactants or substrates, the clustering
coefficient C >> C_random, and the characteristic
path length L is near L_random, characteristics
of a small-world system.

Wagner A, Fell D (2001) The small world inside
large metabolic networks. Proc. R. Soc. London
Ser. B 268, 1803-1810.
24
Scale-free Networks
Using a Web crawler, physicist Albert-Laszlo
Barabasi and his colleagues at the University of
Notre Dame in Indiana in 1998 mapped the
connectedness of the Web. They were surprised to
find that the structure of the Web didn't conform
to the then-accepted model of random
connectivity. Instead, their experiment yielded a
connectivity map that they christened
"scale-free."

Often small-world networks are also scale-free.
In a scale-free network the characteristic
clustering is maintained even as the networks
themselves grow arbitrarily large.

25
Scale-free Networks

In any real network some nodes are more highly
connected than others.
P(k) is the proportion of nodes that have
k links.
For large, random graphs only a few nodes have a
very small k and only very few have a very large
k, leading to a bell-shaped Poisson distribution.

Scale-free networks fall off more slowly and are
more highly skewed than random ones due to the
combination of small-world local highly connected
neighborhoods and more 'shortcuts' than would be
expected by chance.
Scale-free networks are governed by a power law
of the form P(k) ~ k^(-γ).
26
Scale-free Networks
Because of the P(k) ~ k^(-γ) power law relationship,
a log-log plot of P(k) versus k gives a straight
line of slope -γ.
Some networks, especially small-world networks of
modest size, do not follow a power law, but are
exponential. This point can be significant when
trying to understand the rules that underlie the
network.
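The log-log relationship can be checked numerically. The sketch below computes P(k) from a list of node degrees and fits a straight line to log P(k) against log k by ordinary least squares. The function names and the idealised test data are assumptions made for illustration. A plain least-squares fit is also a simplification: on noisy real degree data it is known to give biased exponents.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Return {k: P(k)}, the proportion of nodes that have k links."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(pk):
    """Least-squares slope of log P(k) against log k (uses k > 0, P(k) > 0)."""
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Idealised check: an exact power law with exponent 2.5 (an arbitrary choice).
ideal = {k: k ** -2.5 for k in range(1, 101)}
print(round(loglog_slope(ideal), 3))  # -2.5
```

For an exponential distribution the same plot bends downward instead of following a straight line, which is one quick way to tell the two cases apart.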
27
(No Transcript)
28
(No Transcript)
29
Comparing Random and Scale-Free Distribution
In the random network (right), the five nodes with
the most links (in red) are connected to only 27%
of all nodes (green). In the scale-free network
(left), the five most connected nodes (red),
often called hubs, are connected to 60% of all
nodes (green).
30
Scale-free Networks

Barabasi and his team first studied the internet
and discovered scale-free network behaviour.
Since then, this has been observed, for example,
for power grids, stock market, cancerous cells,
and sexually transmitted diseases.
From random network models, the idea was that
large networks would hardly have any
well-connected nodes. Although not all nodes in a
random network would be connected to the same
degree, most would have a number of connections
hovering around a small, average value. Also, as
a randomly distributed network grows, the
relative number of very connected nodes
decreases.

31
Scale-free Networks

Scale-free networks include many "very connected"
nodes, hubs of connectivity that shape the way
the network operates. The ratio of very connected
nodes to the number of nodes in the rest of the
network remains constant as the network changes
in size.
Because of these differences, random and
scale-free networks behave differently as they
break down. The connectedness of a randomly
distributed network decays steadily as nodes
fail, slowly breaking into smaller, separate
domains that are unable to communicate.
Scale-free networks are more robust, but in a
special way.
Scale-free networks can have small-world
characteristics, as can randomly connected
networks (but see the earlier experiment for
small-world networks).

32
Scale-free network
Hubs, highly connected nodes, bring together
different parts of the network
Robustness: removing random nodes has little
effect
Low attack resistance: removing a hub is lethal
Random network
No hubs
Low robustness
Low attack resistance
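The difference between random failure and a targeted attack can be made concrete by measuring the largest connected component that survives node removal. The sketch below is an illustration only: the hub-and-spoke toy network and the function names are assumptions, and real networks would need many repeated random trials.

```python
def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first walk over the surviving nodes
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def hubs(adj, count):
    """The `count` most connected nodes."""
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count]

# Hypothetical hub-and-spoke network: node 0 is the only hub, nodes 1-10 are spokes.
star = {0: set(range(1, 11))}
star.update({i: {0} for i in range(1, 11)})

print(largest_component(star, removed=[5]))            # failure of one spoke: 10
print(largest_component(star, removed=hubs(star, 1)))  # attack on the hub: 1
```

Losing a spoke leaves the rest of the network connected, while losing the hub splits it into isolated nodes.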
33
Scale-free Networks
Epidemiologists are also pondering the
significance of scale-free connectivity. Until
now, it has been accepted that stopping sexually
transmitted diseases requires reaching or
immunizing a large proportion of the population,
so that most contacts will be safe and the disease
will no longer spread. But if societies of people
include the very connected individuals of
scale-free networks (individuals who have sex
lives that are quantitatively different from
those of their peers), then health offensives will
fail unless they target these individuals. These
individuals will propagate the disease no matter
how many of their more subdued neighbors are
immunized. Now consider the following:
geographic connectivity of Internet nodes is
scale-free, the number of links on Web pages is
scale-free, Web users belong to interest groups
that are connected in a scale-free way, and
e-mails propagate in a scale-free way. Barabasi's
model of the Internet tells us that stopping a
computer virus from spreading requires that we
focus on protecting the hubs.
34
(No Transcript)
35
(No Transcript)
36
14-3-3 subtypes (paralogs)
14-3-3 paralogs (black) have evolved to bind
different partners (grey) but still share MARK3
as a binding partner.
Schematic representation of co-immunoprecipitation
studies performed with anti-MARK (microtubule
affinity-regulating kinase) antibodies. The
strength of the interactions is indicated by
the thickness of the arrows (after (2)).
37
connect preferentially to a hub
38
Preferential attachment

Hub protein characteristics
Multiple binding sites
Promiscuous binding
Non-specific binding

connect preferentially to a hub
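Preferential attachment can be simulated with a simple growth rule: each new node links to existing nodes with probability proportional to their current number of links. The sketch below is a simplified version of the Barabasi-Albert growth model. The parameters (500 nodes, 2 links per new node), the seed and the function name are assumptions made here.

```python
import random

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):  # seed: a small fully connected core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per link it has, so a uniform draw
    # from `ends` is a degree-proportional draw over nodes.
    ends = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj

g = preferential_attachment(500, 2, random.Random(42))
degrees = sorted((len(v) for v in g.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # largest and smallest degrees
```

Early nodes tend to accumulate links and become hubs, while most late arrivals keep close to the minimum of m links.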
39
(No Transcript)
40
Network motifs

Different motifs in different processes
Observation: more interconnected motifs are
more conserved

41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Robustness of the biodegradation network against
perturbations is tested here by removing 200
edges randomly (each removal simulating a
mutation in the enzyme catalysing that reaction
step). (A) For each connection lost (red line),
1.6 compounds lose their pathway to the Central
Metabolism (CM). (B) However, the increase in
the average pathway length to the CM for the
remaining compounds is small.
The biodegradation network appears to be less
tolerant to perturbations than metabolic networks
(Jeong et al., 2000).
47
Preferential attachment in biodegradation networks
New degradable compounds are observed to attach
preferentially to hubs close to (or in) the Central
Metabolism
48
The Matchmaker 14-3-3 family

Massively interacting protein family (the PPI
champions) by means of various binding modes
Involved in many essential cell processes
Occurs throughout the kingdoms of life
Various numbers of isoforms in different
organisms (7 in human)

49
14-3-3 dimer structure
50
14-3-3 network (hub?) promotion by binding and
bringing together two different proteins
51
Janus-faced character of 14-3-3s
Identified (co-)targets fall in opposing classes:
they seem to both cause and work against cancer...
Clear color: actin growth, pro-apoptotic,
stimulation of transcription, nuclear import,
neuron development. Hatched: opposing functions.
100% = 56 proteins (De Boer & Jimenez, unpubl.
data).
52
Targets of 14-3-3 proteins implicated in tumor
development. Arrows indicate positive effects
while sticks represent inhibitory effects.
Targets involved in primary apoptosis and cell
cycle control are not shown due to space
limitations.
53
Role of 14-3-3 proteins in apoptosis
14-3-3 proteins inhibit apoptosis through multiple
mechanisms: sequestration and control of
subcellular localization of phosphorylated and
nonphosphorylated pro- and anti-apoptotic
proteins.
What is the role of the subtypes? Modularity?
54
14-3-3 subtypes (paralogs)
Different subtypes display different binding
modes, reflecting pronounced divergent evolution
after duplication
14-3-3- subtypes ?,?,? and ?
Schematic representation of co-immunoprecipitation
studies performed with anti-MARK (microtubule
affinity-regulating kinase) antibodies. The
strength of the interactions is indicated by
the thickness of the arrows.
55
Protein Interaction Prediction

How can we get the edges (connections) of the
cellular networks?
We can predict functions of genes or proteins so
we know where they would fit in a metabolic
network
There are also techniques to predict whether two
proteins interact, either functionally (e.g. they
are involved in a two-step metabolic process) or
directly physically (e.g. are together in a
protein complex)

56
Protein Function Prediction
The state of the art: it's not complete. Many
genes are not annotated, and many more are
partially or erroneously annotated. Given a
genome which is partially annotated at best, how
do we fill in the blanks? For each sequenced
genome, the functions of 20-50% of the encoded
proteins remain unknown! How then do we build a
reasonably complete network when the parts list
is so incomplete?
57
Protein interaction prediction through
co-evolution

FALSE NEGATIVES
need many organisms
relies on known orthologous relationships
FALSE POSITIVES
Phylogenetic signals at the organism level
Functional interaction may not mean physical
interaction

58
Phylogenetic profile analysis (recap)

Function prediction of genes based on
"guilt by association": a non-homologous
approach
The phylogenetic profile of a protein is a string
that encodes the presence or absence of the
protein in every sequenced genome
Because proteins that participate in a common
structural complex or metabolic pathway are
likely to co-evolve, the phylogenetic profiles of
such proteins are often similar
This means that such proteins have a good chance
of being physically or metabolically connected

59
Phylogenetic profile analysis (Recap)

Phylogenetic profile (against N genomes)
For each gene X in a target genome (e.g., E.
coli), build a phylogenetic profile as follows:
If gene X has a homolog in genome i, the ith bit
of X's phylogenetic profile is 1; otherwise it
is 0.
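The construction rule above can be sketched directly. In this illustration the gene names, the genome names and the homology table are hypothetical. In practice the table would come from a sequence search with a chosen significance threshold, which is outside this sketch.

```python
def phylogenetic_profile(gene, genomes, has_homolog):
    """Bit string whose ith character is '1' if `gene` has a homolog in genome i."""
    return "".join("1" if has_homolog(gene, g) else "0" for g in genomes)

# Hypothetical homology table: gene -> set of genomes with a detected homolog.
homologs = {
    "geneX": {"genome1", "genome3", "genome4"},
    "geneY": {"genome1", "genome3", "genome4"},
    "geneZ": {"genome2"},
}
genomes = ["genome1", "genome2", "genome3", "genome4"]

def lookup(gene, genome):
    """Presence test against the hypothetical table above."""
    return genome in homologs.get(gene, set())

profiles = {g: phylogenetic_profile(g, genomes, lookup) for g in homologs}
print(profiles)  # {'geneX': '1011', 'geneY': '1011', 'geneZ': '0100'}
```

Here geneX and geneY share the same profile, which is the kind of signal the following slides interpret as a possible functional link.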

60
Phylogenetic profile analysis (recap)

Example phylogenetic profiles based on 60
genomes

(rows: genes; columns: genomes)
orf1034 1110110110010111110100010100000000111100011111110110111010101
orf1036 1011110001000001010000010010000000010111101110011011010000101
orf1037 1101100110000001110010000111111001101111101011101111000010100
orf1038 1110100110010010110010011100000101110101101111111111110000101
orf1039 1111111111111111111111111111111111111111101111111111111111101
orf104  1000101000000000000000101000000000110000000000000100101000100
orf1040 1110111111111101111101111100000111111100111111110110111111101
orf1041 1111111111111111110111111111111101111111101111111111111111101
orf1042 1110100101010010010110000100001001111110111110101101100010101
orf1043 1110100110010000010100111100100001111110101111011101000010101
orf1044 1111100111110010010111010111111001111111111111101101100010101
orf1045 1111110110110011111111111111111101111111101111111111110010101
orf1046 0101100000010001011000000111110000010100000001010010100000000
orf1047 0000000000000001000010000001000100000000000000010000000000000
orf105  0110110110100010111101101010111001101100101111100010000010001
orf1054 0100100110000001100001000100000000100100100001000100100000000
By correlating the rows (open reading frames
(ORFs) or genes) you find out about joint presence
or absence of genes: this is a signal for a
functional connection.
Genes with similar phylogenetic profiles have
related functions or are functionally linked (D.
Eisenberg and colleagues, 1999).
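One simple way to correlate rows is to count the positions in which two profiles disagree (the Hamming distance) and to report pairs that differ in at most a few genomes. This is a sketch under stated assumptions: the distance measure, the threshold `max_diff` and the short example profiles are choices made here, not the method used in the cited work.

```python
from itertools import combinations

def hamming(a, b):
    """Number of genomes in which the two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))

def linked_pairs(profiles, max_diff=1):
    """Pairs of genes whose profiles differ in at most `max_diff` positions."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if hamming(profiles[g1], profiles[g2]) <= max_diff
    ]

# Hypothetical 8-genome profiles (not taken from the table above).
profiles = {
    "orfA": "11010011",
    "orfB": "11010111",
    "orfC": "00101100",
    "orfD": "11111111",
}
print(linked_pairs(profiles))  # [('orfA', 'orfB')]
```

Profiles that are almost all 1s or all 0s carry little information, so a fuller analysis would probably weight matches by how surprising they are.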
61
Phylogenetic profile analysis

Evolution suppresses unnecessary proteins
Once a member of an interaction is lost, the
partner is likely to be lost as well

62
Phylogenetic profile analysis (recap)

Phylogenetic profiles contain a great amount of
functional information
Phylogenetic profile analysis can be used to
distinguish orthologous genes from paralogous
genes
Subcellular localization: 361 yeast
nucleus-encoded mitochondrial proteins are
identified at 50% accuracy with 58% coverage
through phylogenetic profile analysis
Functional complementarity: by examining inverse
phylogenetic profiles, one can find functionally
complementary genes that have evolved through one
of several mechanisms of convergent evolution.

63
Prediction of protein-protein interactions
(recap): Rosetta stone method

Gene fusion is an effective method for
prediction of protein-protein interactions
If proteins A and B are homologous to two domains
of a protein C, A and B are predicted to
interact

A
B
C
Though gene fusion has low prediction coverage,
its false-positive rate is low (high specificity)
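The Rosetta stone rule can be written as a search over alignment hits: two different proteins are linked if they match non-overlapping regions of the same third protein. The sketch below assumes the hits are already available as (query, target, start, end) tuples. That input format, the protein names and the coordinates are all hypothetical.

```python
from itertools import combinations

def rosetta_stone_pairs(hits):
    """Predict A-B links when A and B match non-overlapping regions of one protein C.

    `hits` is a list of (query, target, start, end) tuples, meaning that
    `query` is homologous to residues start..end of `target`.
    """
    by_target = {}
    for query, target, start, end in hits:
        by_target.setdefault(target, []).append((query, start, end))
    predictions = set()
    for target, regions in by_target.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            if a == b:
                continue
            overlap = a_start <= b_end and b_start <= a_end
            if not overlap:
                predictions.add((min(a, b), max(a, b), target))
    return sorted(predictions)

# Hypothetical alignment hits; names and coordinates are illustrative only.
hits = [
    ("A", "C", 1, 180),    # A aligns to the N-terminal part of C
    ("B", "C", 200, 420),  # B aligns to the C-terminal part of C
    ("D", "C", 150, 300),  # D overlaps both regions, so it supports no link
    ("A", "E", 1, 170),    # E has no second partner
]
print(rosetta_stone_pairs(hits))  # [('A', 'B', 'C')]
```

Requiring non-overlapping regions is what separates a fusion of two different domains from two proteins that are simply homologous to the same domain.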
64
Gene (domain) fusion example

Vertebrates have a multi-enzyme protein
(GARs-AIRs-GARt) comprising the enzymes GAR
synthetase (GARs), AIR synthetase (AIRs), and GAR
transformylase (GARt).
In insects, the polypeptide appears as
GARs-(AIRs)2-GARt.
In yeast, GARs-AIRs is encoded separately from
GARt
In bacteria each domain is encoded separately
(Henikoff et al., 1997).
GAR = glycinamide ribonucleotide
AIR = aminoimidazole ribonucleotide

65
Protein interaction database (recap)

There are numerous databases of protein-protein
interactions
DIP is a popular protein-protein interaction
database

The DIP database catalogs experimentally
determined interactions between proteins. It
combines information from a variety of sources to
create a single, consistent set of
protein-protein interactions.
66
Protein interaction databases (Recap)

BIND - Biomolecular Interaction Network Database
DIP - Database of Interacting Proteins
PIM Hybrigenics
PathCalling Yeast Interaction Database
MINT - a Molecular Interactions Database
GRID - The General Repository for Interaction
Datasets
InterPreTS - protein interaction prediction
through tertiary structure
STRING - predicted functional associations among
genes/proteins
Mammalian protein-protein interaction database
(PPI)
InterDom - database of putative interacting
protein domains
FusionDB - database of bacterial and archaeal
gene fusion events
IntAct Project
The Human Protein Interaction Database (HPID)
ADVICE - Automated Detection and Validation of
Interaction by Co-evolution
InterWeaver - protein interaction reports with
online evidence
PathBLAST - alignment of protein interaction
networks
ClusPro - a fully automated algorithm for
protein-protein docking
HPRD - Human Protein Reference Database

67
Protein interaction database (recap)
68
Recap
Network of protein interactions and predicted
functional links involving silencing information
regulator (SIR) proteins. Filled circles
represent proteins of known function; open
circles represent proteins of unknown function,
represented only by their Saccharomyces genome
sequence numbers
(http://genome-www.stanford.edu/Saccharomyces).
Solid lines show experimentally determined
interactions, as summarized in the Database of
Interacting Proteins (ref. 19;
http://dip.doe-mbi.ucla.edu). Dashed lines show
functional links predicted by the Rosetta Stone
method (ref. 12). Dotted lines show functional
links predicted by phylogenetic profiles (ref.
16). Some predicted links are omitted for clarity.
69
Recap
Network of predicted functional linkages
involving the yeast prion protein Sup35 (ref.
20). The dashed line shows the only experimentally
determined interaction. The other functional
links were calculated from genome and expression
data (ref. 11) by a combination of methods,
including phylogenetic profiles, Rosetta stone
linkages and mRNA expression. Linkages predicted
by more than one method, and hence particularly
reliable, are shown by heavy lines. Adapted from
ref. 11.
70
STRING - predicted functional associations among
genes/proteins
Recap

STRING is a database of predicted functional
associations among genes/proteins.
Genes of similar function tend to be maintained
in close neighborhood, tend to be present or
absent together, i.e. to have the same
phylogenetic occurrence, and can sometimes be
found fused into a single gene encoding a
combined polypeptide.
STRING integrates this information from as many
genomes as possible to predict functional links
between proteins.

Berend Snel and Martijn Huynen (RUN) and the group
of Peer Bork (EMBL, Heidelberg)
71
STRING - predicted functional associations among
genes/proteins
Recap

STRING is a database of known and predicted
protein-protein interactions. The interactions
include direct (physical) and indirect
(functional) associations; they are derived from
four sources:
Genomic Context (Synteny)
High-throughput Experiments
(Conserved) Co-expression
Previous Knowledge
STRING quantitatively integrates interaction
data from these sources for a large number of
organisms, and transfers information between
these organisms where applicable. The database
currently contains 736429 proteins in 179 species

72
STRING - predicted functional associations among
genes/proteins
Recap
Conserved Neighborhood: This view shows
runs of genes that occur repeatedly in close
neighborhood in (prokaryotic) genomes. Genes
located together in a run are linked with a black
line (maximum allowed intergenic distance is 300
bp). Note that if there are multiple runs for a
given species, these are separated by white
space. If there are other genes in the run that
are below the current score threshold, they are
drawn as small white triangles. Gene fusion
occurrences are also drawn, but only if they are
present in a run.
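The distance rule for runs can be sketched as a single pass over genes sorted by position. This illustration covers only the within-genome step, grouping neighbours separated by at most 300 bp. It does not check that a run recurs across genomes, and the gene names and coordinates are hypothetical.

```python
def gene_runs(genes, max_gap=300):
    """Group genes into runs whose consecutive intergenic distances are <= max_gap bp.

    `genes` is a list of (name, start, end) tuples on one chromosome or contig.
    Runs with a single gene are dropped.
    """
    runs, current = [], []
    prev_end = None
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        if prev_end is not None and start - prev_end > max_gap:
            runs.append(current)  # gap too large: close the current run
            current = []
        current.append(name)
        prev_end = end if prev_end is None else max(prev_end, end)
    if current:
        runs.append(current)
    return [run for run in runs if len(run) > 1]

# Hypothetical gene coordinates (bp).
genes = [
    ("g1", 100, 900),
    ("g2", 950, 1800),    # 50 bp after g1
    ("g3", 2050, 2900),   # 250 bp after g2
    ("g4", 4000, 4800),   # 1100 bp after g3: starts a new run
    ("g5", 4900, 5600),   # 100 bp after g4
    ("g6", 7000, 7600),   # isolated gene, not reported
]
print(gene_runs(genes))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```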
73
STRING - predicted functional associations among
genes/proteins
Recap

Genes clustered in a genomic region are likely to
interact
co-ordinated expression
co-ordinated gene gains/losses

74
Wrapping up

Understand regular, random, small-world and
scale-free networks
Know and understand observations on path length,
clustering coefficients, etc.
Know and understand interaction prediction using
phylogenetic co-evolution, phylogenetic
profiling, Rosetta stone methods and the STRING
server
Comparing and overlaying various networks (e.g.
regulation, signalling, metabolic, PPI) and
studying evolutionary conservation at these
network levels is one of the current grand
challenges, and will be crucially important for a
systems-based approach to (intra)cellular
behaviour.