Statistical physics of complex networks presentation

1
Statistical physics of complex networks

Sergei Maslov
Brookhaven National Laboratory

2
Short history: complex systems before and after
networks

Statistical physics of complex systems was active
in the 80s-90s (following the chaos boom of the 70s)
Fractals (Mandelbrot and many others)
Self-Organized Criticality (Per Bak and
co-authors) → sandpiles → granular systems
Complex = multiple time and length scales (e.g.
avalanches) → cult of power laws
Cellular automata (mostly in real spacetime)
Examples:
earthquakes
disordered moving interfaces
(co)-evolution of species
agent-based modeling (ants)
By the end of the 90s: breakup of the community and
specialization:
Biology
Economics and finance
Internet
Social sciences

3
Networks in complex systems

Complex systems
Large number of components interacting with each
other
All components and/or interactions are different
from each other (unlike in traditional physics,
where $10^{23}$ electrons are all the same!)
Paradigms:
$10^4$ types of proteins in an organism
$10^6$ routers in the Internet
$10^9$ web pages in the WWW
$10^{11}$ neurons in a human brain
The simplest property, "who interacts with whom?",
can be visualized as a network
Complex networks are just a backbone for complex
dynamical processes

4
Why study the topology of complex networks?

Lots of easily available data: that's where the
state-of-the-art information is (at least in
biology)
Large networks may contain information about
basic design principles and/or evolutionary
history of the complex system
This is similar to paleontology: learning about
an animal from its backbone

Inside single cells

A small part of a metabolic network: the citric
acid cycle

7
Metabolic pathway chart by ExPASy
8
Protein binding networks
Baker's yeast S. cerevisiae (only nuclear
proteins shown)
Nematode worm C. elegans
9
Transcription regulatory networks
Single-celled eukaryote S. cerevisiae
Bacterium E. coli
10
GENOME
protein-gene interactions
PROTEOME
protein-protein interactions
METABOLISM
bio-chemical reactions
slide after Reka Albert
11

Between cells in a multi-cellular organism

12
Sea urchin embryonic development (endomesoderm up
to 30 hours) by Davidson's lab
13
C. elegans neurons
14

Between organisms

15
Freshwater food web by Neo Martinez and Richard
Williams
16
Sexual contacts: M. E. J. Newman, The structure
and function of complex networks, SIAM Review 45,
167-256 (2003).
17

Social

18
High school dating. Data drawn from Peter S.
Bearman, James Moody, and Katherine Stovel;
visualized by Mark Newman
19
Network of actors co-starring in movies
20
Network of scientists: co-authorship of papers
21
Webpages connected by hyperlinks on the AT&T
website circa 1996, visualized by Mark Newman
Citation networks are similar to the WWW
but time-ordered
22

Technological

23
Internet as measured by Hal Burch and Bill
Cheswick's Internet Mapping Project.
24
(No Transcript)
25
Transportation networks: airlines
26
Transportation networks: railway maps
Tokyo rail map
27

Lecture 1: General introduction to networks
Node degrees, their distribution, and correlations
Simple models:
preferential attachment and Simon model
Growth model for protein families
Percolation transition on networks
Clustering coefficient
Lectures 2-3: Biomolecular (mostly protein)
networks
Regulatory and signaling networks
How many regulators? Bureaucratic collapse
Network motifs in directed (e.g. regulatory)
networks
Protein binding networks
Broad degree distributions in protein binding
networks and possible explanations:
Evolutionary (duplication-divergence)
Biophysical (stickiness)
Functional
Beyond degree distributions: how is it all wired
together? Correlations in degrees
Randomization of networks
Law of Mass Action and propagation of
perturbations

28
Degree (or connectivity) of a node: the number of
neighbors
Degree K = 2
Degree K = 4
29
Directed networks have in- and out-degrees
In-degree $K_{in} = 2$
Out-degree $K_{out} = 5$
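
The degree definitions above can be made concrete with a small counting sketch. The edge lists and function names below are made up for illustration; a directed edge is assumed to be stored as a (source, target) pair.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def in_out_degrees(directed_edges):
    """In- and out-degree of each node; each edge is (source, target)."""
    k_in, k_out = Counter(), Counter()
    for src, dst in directed_edges:
        k_out[src] += 1
        k_in[dst] += 1
    return k_in, k_out

# Hypothetical toy networks.
undirected = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
directed = [("x", "y"), ("z", "y"), ("y", "u"), ("y", "v")]

deg = degrees(undirected)
k_in, k_out = in_out_degrees(directed)
print(deg["a"], deg["b"])        # degrees of nodes a and b
print(k_in["y"], k_out["y"])     # in- and out-degree of node y
```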
30

Degree distributions in random and real networks

31
Degree distribution in a random network

Randomly throw E edges among N nodes
Solomonoff, Rapoport, Bull. Math. Biophysics
(1951); Erdős-Rényi (1960)
Degree distribution: binomial → Poisson
K stays close to its mean $\langle K \rangle$, with no hubs (fast decay of N(K))
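
A quick numerical check of the claim above, as a sketch only: throw E distinct edges at random among N nodes and compare the degree histogram with a Poisson distribution of the same mean. The sizes (N = 2000, E = 4000) and the choice to forbid self-loops and duplicate edges are assumptions made for this example.

```python
import math
import random
from collections import Counter

def random_graph_degrees(n, e, seed=0):
    """Throw e distinct edges at random among n nodes; return node degrees."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < e:
        a, b = rng.sample(range(n), 2)       # two different nodes
        edges.add((min(a, b), max(a, b)))    # the set ignores duplicates
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def poisson(k, mean):
    """Poisson probability of observing the value k."""
    return math.exp(-mean) * mean**k / math.factorial(k)

n, e = 2000, 4000
deg = random_graph_degrees(n, e)
mean_k = 2 * e / n
hist = Counter(deg)
for k in range(11):
    # columns: degree, measured fraction of nodes, Poisson prediction
    print(k, hist[k] / n, round(poisson(k, mean_k), 4))
```

The largest degree in such a graph stays within a few standard deviations of the mean, which is what "no hubs" means here.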

32
Degree distribution in real protein binding
network

Histogram N(K) is broad: most nodes have low
degree ~1, few nodes have high degree ~100
Can be approximately fitted with the $N(K) \sim K^{-\gamma}$
functional form with $\gamma \approx 2.5$

33
Many real-world networks have broad degree
distributions
34
Basic BA-model

Very simple algorithm to implement:
start with an initial set of $m_0$ fully connected
nodes
e.g. $m_0 = 3$
now add new vertices one by one, each one with
exactly m edges
each new edge connects to an existing vertex in
proportion to the number of edges that vertex
already has → preferential attachment
easiest if you keep track of edge endpoints in
one large array and select an element from this
array at random
the probability of selecting any one vertex will
be proportional to the number of times it appears
in the array, which corresponds to its degree

1 1 2 2 2 3 3 4 5 6 6 7 8 ...
35
Generating BA graphs, cont'd

To start, each vertex has an equal number of
edges (2)
the probability of choosing any vertex is 1/3
We add a new vertex, and it will have m edges;
here take m = 2
draw 2 random elements from the array; suppose
they are 2 and 3
Now the probabilities of selecting 1, 2, 3, or 4 are
1/5, 3/10, 3/10, 1/5
Add a new vertex, draw a vertex for it to connect
to from the array
etc.
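
The endpoint-array recipe from these two slides can be sketched in a few lines. This is an illustration rather than a reference implementation: the function name is made up, vertices are numbered from 0, and the m targets of a new vertex are forced to be distinct (an assumption that avoids multiple edges). The last lines reproduce the worked example with the slide's 1-based labels.

```python
import random
from collections import Counter
from fractions import Fraction

def ba_graph(n, m, m0=3, seed=0):
    """Grow a BA graph; returns the edge list and the endpoint array."""
    rng = random.Random(seed)
    # Start with m0 fully connected nodes.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each vertex appears in this array once per edge it has (its degree).
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:              # m distinct existing vertices
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges, endpoints

edges, endpoints = ba_graph(200, 2)

# Worked example: 3 connected vertices, then vertex 4 attaches to 2 and 3.
array = [1, 1, 2, 2, 3, 3] + [4, 2, 4, 3]
probs = {v: Fraction(c, len(array)) for v, c in Counter(array).items()}
print(probs)   # selection probability of each vertex
```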

36
The tale of linear vs exponential growth

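Linear growth: the Barabasi-Albert model with $\gamma = 3$ is
a version of Simon's word usage model with $\gamma = 2 + \alpha$
$dn_k/dt = (k-1)\,n_{k-1}/(t + \alpha t) - k\,n_k/(t + \alpha t)$
Exponential growth: protein duplication-deletion
model with $\gamma = 2 + \alpha/(\lambda_{dup} - \lambda_{del})$
$dn_k/dt = \lambda_{dup}(k-1)\,n_{k-1} - (\lambda_{dup} + \lambda_{del})\,k\,n_k + \lambda_{del}(k+1)\,n_{k+1}$
$N_F = \sum_k n_k$ also grows exponentially:
$dN_F/dt = \alpha N_G = \alpha \sum_k k\,n_k$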
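
The exponent quoted for the linear-growth case can be checked with a short calculation. The sketch assumes the rate equation reads $dn_k/dt = [(k-1)n_{k-1} - k n_k]/(t + \alpha t)$ and that $n_k$ grows linearly in time, $n_k(t) = t\,p_k$; the symbols $\alpha$, $\gamma$ and $p_k$ are one reading of the slide, and the source term at $k = 1$ is ignored.

```latex
\begin{align*}
n_k(t) = t\,p_k
  &\;\Rightarrow\; p_k = \frac{(k-1)\,p_{k-1} - k\,p_k}{1+\alpha} \\
  &\;\Rightarrow\; \frac{p_k}{p_{k-1}} = \frac{k-1}{k+1+\alpha} \\
  &\;\Rightarrow\; p_k \propto \frac{\Gamma(k)}{\Gamma(k+2+\alpha)}
     \sim k^{-(2+\alpha)} \qquad (k \gg 1)
\end{align*}
```

Under these assumptions $\gamma = 2 + \alpha$, and $\alpha = 1$ recovers the Barabasi-Albert value $\gamma = 3$.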

37
Preferential attachment with fitness

Bianconi-Barabasi (2001)
Attractiveness of a node to new edges is given by
$f_i k_i / \sum_r f_r k_r$
For uniform $\rho(f)$: $P(k) \sim k^{-(1+C)}/\ln k$, where
$C = 1.255$
Generally C depends on $\rho(f)$
Some $\rho(f)$ result in Bose-Einstein condensation,
in which super-hubs emerge

Percolation transition in networks

39
Why should we care?

The most important property of a network: it
quantifies how broken up the network is
Below the percolation threshold: many small
components
At the percolation threshold: scale-free
distribution of component sizes, $P(S) \sim S^{-2.5}$
Above the percolation threshold: a giant connected
component and a few small ones
Determines the propagation of perturbations which
affect neighbors with probability p (e.g.
infections)

40
Naïve (and wrong) argument

An average node has $\langle K \rangle$ first neighbors, $\langle K \rangle \langle K-1 \rangle$
second neighbors, $\langle K \rangle \langle K-1 \rangle^2$ third neighbors
We neglect overlap between e.g. second and first
neighbors: in random networks it is a small effect, ~1/N
If $\langle K-1 \rangle \ge 1$, a single node is connected to a
finite fraction of all nodes in the network

41
Where is it wrong?

The probability to arrive at a node with K neighbors
is proportional to K!
All averages have to be modified: $\langle F(K) \rangle \to \langle F(K)\,K \rangle / \langle K \rangle$
The right answer: if $\langle K(K-1) \rangle / \langle K \rangle \ge 1$, a
perturbation would spread
In directed networks it is $\langle K_{in} K_{out} \rangle / \langle K_{in} \rangle \ge 1$
Correlations between degrees of neighbors and an
abnormally large number of triangles (clustering)
would affect the answer
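
The difference between the naive and the corrected criterion is easy to see numerically. The sketch below uses a made-up degree sequence (900 nodes of degree 1 and 100 nodes of degree 5); the function names are illustrative.

```python
def naive_ratio(degrees):
    """<K-1>: the naive estimate, which ignores the bias toward high-degree nodes."""
    return sum(degrees) / len(degrees) - 1

def branching_ratio(degrees):
    """<K(K-1)>/<K>: mean number of new neighbors found by following an edge."""
    mean_k = sum(degrees) / len(degrees)
    mean_kk1 = sum(k * (k - 1) for k in degrees) / len(degrees)
    return mean_kk1 / mean_k

# Hypothetical degree sequence: mostly degree-1 nodes plus some degree-5 nodes.
degrees = [1] * 900 + [5] * 100

print(naive_ratio(degrees))      # below 1: naive argument predicts no spreading
print(branching_ratio(degrees))  # above 1: corrected argument predicts spreading
```

For this sequence the two criteria disagree, because edges lead preferentially to the degree-5 nodes.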

42
How many clusters?

If $\langle K(K-1) \rangle / \langle K \rangle \ll 1$, there are only small
clusters
If $\langle K(K-1) \rangle / \langle K \rangle \approx 1$, cluster sizes S have a
scale-free distribution $P(S) \sim S^{-2.5}$
If $\langle K(K-1) \rangle / \langle K \rangle \gg 1$, there is one giant
cluster and a few small ones
A perturbation which affects neighbors with
probability p propagates if $p\,\langle K(K-1) \rangle / \langle K \rangle \ge 1$
For scale-free networks $P(K) \sim K^{-\gamma}$ with $\gamma < 3$,
$\langle K^2 \rangle \to \infty$ ⇒ a perturbation always spreads in a large
enough network

43
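Diameter and mean cluster size are determined by
$\langle k(k-1) \rangle / \langle k \rangle$

Mean diameter L: $1 + \langle k \rangle + \langle k \rangle \frac{\langle k(k-1) \rangle}{\langle k \rangle} + \dots + \langle k \rangle \left( \frac{\langle k(k-1) \rangle}{\langle k \rangle} \right)^{L-1} = N$
⇒ $L \approx \frac{\log(N/\langle k \rangle)}{\log(\langle k(k-1) \rangle / \langle k \rangle)} + 1$
Mean cluster size below $p_c$: $\langle S \rangle = 1 + \frac{\langle k \rangle}{1 - \langle k(k-1) \rangle / \langle k \rangle}$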
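
Both estimates on this slide are simple to evaluate once the two moments are known. The sketch below assumes the diameter formula is read as L ≈ log(N/⟨k⟩)/log(⟨k(k-1)⟩/⟨k⟩) + 1 and the cluster-size formula as ⟨S⟩ = 1 + ⟨k⟩/(1 - ⟨k(k-1)⟩/⟨k⟩); the function names and the Poisson example values are illustrative.

```python
import math

def estimated_diameter(n, mean_k, mean_kk1):
    """L ~ log(N/<k>) / log(<k(k-1)>/<k>) + 1, meaningful above the threshold."""
    ratio = mean_kk1 / mean_k
    if ratio <= 1:
        raise ValueError("no giant component: <k(k-1)>/<k> must exceed 1")
    return math.log(n / mean_k) / math.log(ratio) + 1

def mean_cluster_size(mean_k, mean_kk1):
    """<S> = 1 + <k> / (1 - <k(k-1)>/<k>), meaningful below the threshold."""
    ratio = mean_kk1 / mean_k
    if ratio >= 1:
        raise ValueError("at or above threshold: mean cluster size diverges")
    return 1 + mean_k / (1 - ratio)

# Poisson degrees with mean c have <k(k-1)> = c**2, so the ratio is just c.
print(estimated_diameter(10**6, 10.0, 100.0))  # a million nodes, <k> = 10
print(mean_cluster_size(0.5, 0.25))            # sparse graph with <k> = 0.5
```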

44
Amplification ratios

A(dir): 1.08 (E. coli), 0.58 (yeast)
A(undir): 10.5 (E. coli), 13.4 (yeast)
A(PPI): ? (E. coli), 26.3 (yeast)

45
Clustering coefficient $C_\triangle$

$C_\triangle = 3 N_\triangle / \left[ \sum_k n_k\, k(k-1)/2 \right]$
Could be defined for individual nodes or as a
function of k: $C_\triangle(k) = 3 N_\triangle(k) / \left[ n_k\, k(k-1)/2 \right]$
$C_\triangle = 1$ cannot be realized if k is heterogeneous
Needs to be compared to its value in randomized
networks with the same degree sequence
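
A minimal sketch of the global clustering coefficient, computed as three times the number of triangles divided by the number of connected triples (pairs of neighbors of the same node). It assumes a simple undirected graph given as an edge list; the function name and the toy graph are made up.

```python
from itertools import combinations

def clustering_coefficient(edges):
    """3 * (number of triangles) / (number of connected triples)."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    # Each triangle is seen once from each of its three corners.
    corners = sum(1 for v in nbrs
                  for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
    triangles = corners // 3
    # A node of degree k is the center of k(k-1)/2 connected triples.
    triples = sum(len(s) * (len(s) - 1) // 2 for s in nbrs.values())
    return 3 * triangles / triples if triples else 0.0

# Triangle 1-2-3 with a pendant node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(clustering_coefficient(edges))
```

As the slide notes, the number is only informative when compared with the same quantity in degree-preserving randomized networks.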

46
End lecture 1
47
Lecture 2
48

Protein networks

49
Places to learn molecular biology

Molecular Biology of the Cell. Fourth Edition.
Bruce Alberts, Alexander Johnson, Julian Lewis,
Martin Raff, Keith Roberts, Peter Walter. Garland
Science. 2002.
DNA from the beginning. http://www.dnaftb.org/
Online Biology Book. http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html
Kimball's Biology Pages. http://www.ultranet.com/jkimball/BiologyPages/
Gene expression. http://vlib.org/Science/Cell_Biology/gene_expression.shtml
Human Genome Project. http://www.ornl.gov/hgmis/
Microarrays. http://www.gene-chips.com/

From Prof. Michael Hallett's (McGill) online
lectures
50
Protein networks

Nodes: proteins
Edges: interactions between proteins
Metabolic (enzymes sharing common
metabolites are connected)
Physical (binding interactions)
Regulatory and signaling (transcriptional
regulation, protein modifications)
Co-expression networks from microarray data
(connect genes with similar expression
(abundance) patterns under many conditions)
Genetic interactions, e.g. synthetic lethal
protein pairs (removal of either one of the two
proteins doesn't kill the cell, but removal of
both proteins does)
Etc., etc., etc.

51
Sources of data on protein networks

Genome-wide experiments
Binding: two-hybrid (Y2H) and mass-spec (MS)
high-throughput techniques
Transcriptional regulation: ChIP-on-chip, or
ChIP-then-SAGE
Expression, disruption networks: microarrays
Lethality of genes (including synthetic lethals):
Gene knockout: yeast
RNAi: worm, fly
Many small or intermediate-scale experiments
All stored in public databases: BIOGRID, DIP,
BIND, YPD (no longer public), SGD, Flybase,
Ecocyc, etc.

52
Pathway → network paradigm shift
53
Images from ResNet3.0 by Ariadne Genomics
MAPK signaling
Inhibition of apoptosis
54

Transcription regulatory networks

55
Transcription factors bind DNA
56
Activators and repressors

Depending on the position of the binding site
(operator) with respect to the RNA-polymerase
binding site (promoter), transcription factors
can either activate or repress the production
of mRNA from a given gene (transcription) and
thus affect the abundance of its protein product

57
Transcription regulatory networks
58
Sea urchin embryonic development (endomesoderm up
to 30 hours) by Davidson's lab
59

How many transcriptional regulators are out
there?

60
Fraction of transcriptional regulators in bacteria
61
Figure from Erik van Nimwegen, TIG 2003
62
Complexity of regulation grows with complexity of
organism

$N_R \langle K_{out} \rangle = N \langle K_{in} \rangle$ = number of edges
$N_R/N = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with N
$\langle K_{in} \rangle$ grows with N
In bacteria $N_R \sim N^2$ (Stover et al., 2000)
In eukaryotes $N_R \sim N^{1.3}$ (van Nimwegen, 2002)
Networks in more complex organisms are more
interconnected than in simpler ones

63
Complexity is manifested in the $K_{in}$ distribution:
E. coli vs H. sapiens
64
Table from Erik van Nimwegen, TIG 2003
65
Toolbox model

$N_{TF} = A N^2$ ⇒ $dN_{TF} = 2AN\,dN$ ⇒ $dN/dN_{TF} = 1/(2AN)$
In small genomes: 100 genes per TF. In large ones:
only 4!
A toolbox (e.g. metabolic network) grows linearly
with N. To handle a new condition ($N_{TF} \to N_{TF} + 1$) one
needs fewer and fewer new tools.
S. Maslov, S. Krishna, K. Sneppen, in preparation

66
How is it all connected? (beyond degree
distribution)
67
What is unusual about the topology of a given network?

Look for the number of occurrences of a certain
topological pattern
Compare with a randomized network
What patterns to look for?
Number of edges connecting nodes with given
degrees (degree-degree correlations)
Motifs: small subgraphs of 3-4 nodes (in
undirected networks: clustering, or triangles)
Overrepresentation: nature needs them for some
function
Underrepresentation: they are detrimental and
nature avoids them

How to construct a proper random network?

69
Randomization of a network
70
Stub reconnection algorithm

Break every edge into two halves (stubs)
Randomly reconnect stubs
Watch for multiple edges!
For example, in the AS-Internet the two largest hubs
would end up being connected by 50 edges (sic!)
Not adaptable to conserve other low-level
topological properties of the network
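
A sketch of the stub reconnection step, with a helper that counts the defects the slide warns about. It assumes integer node labels and an undirected edge list; the function names and the example graph are made up.

```python
import random

def stub_reconnect(edges, seed=0):
    """Break edges into stubs and pair the stubs at random (degrees are preserved)."""
    rng = random.Random(seed)
    stubs = [v for edge in edges for v in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def count_defects(edges):
    """Number of self-loops and of repeated (multiple) edges in an edge list."""
    loops = sum(1 for a, b in edges if a == b)
    seen, multi = set(), 0
    for a, b in edges:
        key = (min(a, b), max(a, b))
        multi += key in seen          # True counts as 1
        seen.add(key)
    return loops, multi

# Hypothetical network: a hub (node 0) plus two extra edges.
edges = [(0, i) for i in range(1, 6)] + [(1, 2), (3, 4)]
randomized = stub_reconnect(edges)
print(count_defects(randomized))      # (self-loops, multiple edges)
```

Every node keeps its degree, but nothing prevents self-loops or multiple edges, which is why hubs tend to end up multiply connected.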

71
Local rewiring algorithm

R. Kannan, P. Tetali, and S. Vempala, Random
Structures and Algorithms (1999)
SM, K. Sneppen, Science (2002)

Randomly select and rewire two edges
Repeat many times
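
A sketch of the local rewiring step: pick two edges a-b and c-d and swap partners to get a-d and c-b, rejecting moves that would create a self-loop or a multiple edge. It assumes a simple undirected input graph; the function name, the attempt cap and the example graph are illustrative.

```python
import random

def local_rewiring(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated swaps of edge partners."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                       # try both orientations
        # Proposed swap: a-b, c-d  ->  a-d, c-b
        if a == d or c == b:
            continue                          # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                          # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Hypothetical network: a ring of 20 nodes with chords, every node has degree 4.
ring = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]
randomized = local_rewiring(ring, 200)
```

Unlike stub reconnection, each accepted move keeps the graph simple, so extra constraints can be added as further rejection tests.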

72
Metropolis rewiring algorithm
energy $E$
energy $E + \Delta E$
SM, K. Sneppen, cond-mat preprint
(2002), Physica A (2004)

Randomly select two edges
Calculate the change $\Delta E$ in the energy function
$E = (N_{actual} - N_{desired})^2 / N_{desired}$
Rewire with probability $p = \exp(-\Delta E / T)$
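
A sketch of the acceptance step only, assuming the energy is built from a single pattern count N (for example, the number of triangles) and that moves lowering the energy are always accepted. The function names are illustrative, and the edge selection and pattern counting are not shown.

```python
import math
import random

def energy(n_actual, n_desired):
    """E = (N_actual - N_desired)^2 / N_desired."""
    return (n_actual - n_desired) ** 2 / n_desired

def accept_move(e_old, e_new, temperature, rng=random):
    """Metropolis rule: accept downhill moves, uphill ones with exp(-dE/T)."""
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

# Hypothetical move: the pattern count goes from 14 to 12 with a target of 10.
e_old, e_new = energy(14, 10), energy(12, 10)
print(e_old, e_new, accept_move(e_old, e_new, temperature=1.0))
```

At high T almost every move is accepted and the procedure reduces to plain local rewiring; at low T the network is driven toward the desired count.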

Degree-degree correlations

74
Central vs peripheral network architecture
random
A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys.
Rev. Lett. 92, 178702 (2004)
75
What is the case for the protein interaction network?
SM, K. Sneppen, Science 296, 910 (2002)
76
Correlation profile

Count $N(k_0, k_1)$: the number of links between
nodes with connectivities $k_0$ and $k_1$
Compare it to $N_r(k_0, k_1)$: the same property in a
random network
Qualitative features are very noise-tolerant with
respect to both false positives and false
negatives

77
(No Transcript)
78
Correlation profile of the protein interaction
network
$R(k_0, k_1) = N(k_0, k_1) / N_r(k_0, k_1)$
$Z(k_0, k_1) = (N(k_0, k_1) - N_r(k_0, k_1)) / \Delta N_r(k_0, k_1)$
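
A sketch of how the two profiles could be computed from a real edge list and a set of randomized versions of it. It assumes that N_r is the mean over the randomized networks and that the denominator of Z is the standard deviation of the randomized counts; the function names and toy edge lists are made up.

```python
from collections import Counter
from statistics import mean, pstdev

def degree_pair_counts(edges):
    """N(k0, k1): number of edges joining nodes of degrees k0 <= k1."""
    deg = Counter(v for edge in edges for v in edge)
    return Counter(tuple(sorted((deg[a], deg[b]))) for a, b in edges)

def correlation_profile(real_edges, randomized_edge_lists):
    """(R, Z) for every degree pair seen in the real network."""
    real = degree_pair_counts(real_edges)
    rand = [degree_pair_counts(e) for e in randomized_edge_lists]
    profile = {}
    for pair, n in real.items():
        samples = [r[pair] for r in rand]      # missing pairs count as 0
        mu, sigma = mean(samples), pstdev(samples)
        ratio = n / mu if mu else float("inf")
        z = (n - mu) / sigma if sigma else float("nan")
        profile[pair] = (ratio, z)
    return profile

# Toy example: a 5-node path and two hand-made graphs with the same degrees.
real = [("a", "c"), ("c", "d"), ("d", "e"), ("e", "b")]
randomized = [real, [("a", "b"), ("c", "d"), ("d", "e"), ("e", "c")]]
profile = correlation_profile(real, randomized)
print(profile)
```

In practice the randomized edge lists would come from many runs of a degree-preserving rewiring procedure rather than being written by hand.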
Similar profile is seen in the yeast regulatory
network
79
Hubs may act within a module, or connect modules

Party hub
simultaneous interactions
tends to be within the same module
Date hub
sequential interactions
connect different modules

Han et al., Nature 430, 88 (2004)
80
(No Transcript)
81
Correlation profile of the yeast regulatory
network
$R(k_{out}, k_{in}) = N(k_{out}, k_{in}) / N_r(k_{out}, k_{in})$
$Z(k_{out}, k_{in}) = (N(k_{out}, k_{in}) - N_r(k_{out}, k_{in})) / \Delta N_r(k_{out}, k_{in})$
82
Some scale-free networks may appear similar
In both networks the degree distribution is
scale-free: $P(k) \sim k^{-\gamma}$ with $\gamma \approx 2.2$-$2.5$
83
But correlation profiles give them unique
identities
Internet
Protein interactions
84

Small network motifs (Uri Alon and his group)

85
All 3 node motifs
86
Motifs can overlap in the network
motif to be found
graph
motif matches in the target graph
http://mavisto.ipk-gatersleben.de/frequency_concepts.html
87
Detection of important network motifs

Technique:
construct many random graphs with the same number
of nodes and degree distribution
count the number of motifs in those graphs
calculate the Z score, which measures how unlikely it is that
the same or larger number of motifs as in the real-world
network could have occurred in a random one
Software available:
http://www.weizmann.ac.il/mcb/UriAlon/

88
What the Z score means
$z_x = (x - \mu_x) / \sigma_x$
x: number of times the motif appeared in the real network
$\mu_x$: mean number of times the motif appeared in
the random graphs
$\sigma_x$: standard deviation of the number
of times the motif appeared in the random graphs
The probability of observing a Z score of 2 or more is
0.02275
In the context of motifs:
Z > 0: motif occurs more often than in random graphs
Z < 0: motif occurs less often than in random graphs
Z > 1.65: only a 5% chance of random occurrence
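
The numbers quoted on this slide can be reproduced with the standard normal tail probability. A minimal sketch, assuming the motif counts in random graphs are approximately normally distributed; the function names and the example counts are made up.

```python
import math
from statistics import mean, stdev

def z_score(real_count, random_counts):
    """z = (x - mu) / sigma, with mu and sigma taken from the random graphs."""
    return (real_count - mean(random_counts)) / stdev(random_counts)

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(upper_tail_probability(2.0), 5))    # tail beyond Z = 2
print(round(upper_tail_probability(1.65), 3))   # tail beyond Z = 1.65

# Hypothetical counts: 20 motifs in the real network, 8-12 in random graphs.
print(z_score(20, [8, 10, 12]))
```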
89
Examples of network motifs (3 nodes)

Feed forward loop
Found in many transcriptional regulatory
networks
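
Counting feed-forward loops in a directed network takes only a few lines. The sketch below counts every node triple (x, y, z) with edges x→y, y→z and x→z; it does not check that the triple is free of additional edges, and the function name and toy edge lists are illustrative.

```python
def count_feed_forward_loops(edges):
    """Count node triples (x, y, z) with edges x->y, y->z and x->z."""
    out = {}
    for src, dst in edges:
        if src != dst:                       # ignore self-regulation
            out.setdefault(src, set()).add(dst)
    count = 0
    for x, targets in out.items():
        for y in targets:
            for z in out.get(y, ()):
                if z != x and z in targets:  # x regulates z directly as well
                    count += 1
    return count

ffl = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
print(count_feed_forward_loops(ffl), count_feed_forward_loops(cycle))
```

Running the same count on degree-preserving randomized networks gives the baseline against which over-representation is judged.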

90
Possible functional role of a coherent
feed-forward loop

Noise filtering: short pulses in the input do not
result in turning on the output Z
To function, it needs a time delay (about 0.5 hrs for
bacterial transcription)

91
All 4 node subgraphs (computational expense
increases with the size of the graph!)
92
Higher-order motifs

4-node motifs contain some 3-node motifs
One needs to be careful when calculating
over-representation
Alon and co-authors use our Metropolis algorithm to
generate networks with a given number of
low-level motifs

93
Table 1 from R. Milo, S. Shen-Orr, S. Itzkovitz, N.
Kashtan, D. Chklovskii, U. Alon, Network Motifs:
Simple Building Blocks of Complex Networks,
Science 298, 824-827 (2002)
94
Examples of network motifs (4 nodes)
