Title: The Fractal Properties of Growing Networks
1Data and networks. GIACS Conference, Palermo, 9-4-08
2 3- Networks as an instrument of Data Filtering
Correlation-based Minimal Spanning Tree of 1071 stocks traded at the NYSE between 1987 and 1998. Different colours refer to different SIC sectors.
Correlation-based Minimal Spanning Tree of an artificial market of 1071 stocks generated according to the one-factor model. Different colours refer to different SIC sectors.
Topology of correlation-based minimal spanning trees in real and model markets. G. Bonanno, G. Caldarelli, F. Lillo, R. Mantegna, Physical Review E 68, 046130 (2003).
Networks of equities in financial markets. G. Bonanno, GC, F. Lillo, S. Miccichè, N. Vandewalle, R. N. Mantegna, European Physical Journal B 38, 363-372 (2004).
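As an illustration of the filtering step, here is a minimal sketch of a correlation-based minimal spanning tree. It assumes the distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$ between stocks, which is a common choice in this literature but is not stated on the slide; the tickers and return series are invented toy data, and `correlation_mst` is a hypothetical helper name.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_mst(returns):
    """Kruskal's algorithm on the distances d = sqrt(2 * (1 - rho)).

    `returns` maps a ticker to its list of returns (assumed input layout).
    Returns the tree edges as (distance, ticker_a, ticker_b).
    """
    names = sorted(returns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(returns[a], returns[b])
            edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()

    parent = {name: name for name in names}  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two different components
            parent[ra] = rb
            tree.append((d, a, b))
    return tree

# Invented toy data, for illustration only.
toy_returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, -0.01],
    "BBB": [0.02, -0.03, 0.04, 0.01, -0.02],
    "CCC": [-0.01, 0.02, -0.02, 0.00, 0.01],
    "DDD": [0.00, 0.01, -0.01, 0.02, -0.03],
}
mst = correlation_mst(toy_returns)
```

A tree on N stocks keeps only N - 1 of the N(N - 1)/2 pairwise correlations, which is what makes it usable as a filter.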
4COSIN (official number IST-2001-33555) was a Research Project financed by the European Commission through the Fifth Framework Programme. COSIN is part of the actions taken by the Future and Emerging Technologies (FET) initiative in the priority research area of Information Society Technologies (IST) (http://www.cordis.lu/IST/FET). Documents at http://www.cosinproject.org
5- COSIN involves
- 7 different nodes in 5 countries
- (Ph CS) Roma, Italy
- (Ph) Barcelona, Spain
- (Ph) Lausanne, Switzerland
- (Ph) Ens, Paris, France
- (CS) Karlsruhe, Germany
- (Ph) Upsud, Paris, France
[Map legend: EU countries 2001; non-EU countries 2001; EU COSIN participant; non-EU COSIN participant]
6G. Bonanno, G. Caldarelli, F.Colaiori, G. Di
Battista, D. Donato, S. Leonardi, R. Mantegna, A.
Marchetti-Spaccamela, M. Patrignani, L.
Pietronero, V. Servedio
A. Arenas, M. Boguña, A. Díaz-Guilera, R. Ferrer
i Cancho, M.A. Muñoz, M.A Serrano, R.
Pastor-Satorras
G. Bianconi, A. Capocci, P. De Los Rios, T.
Erlebach, T. Petermann, Y.-C. Zhang
A. Barrat, S. Battiston, P. Nadal, A. Vespignani, G. Weisbuch
U. Brandes, M. Gaertler, M. Kaufmann, D. Wagner
7- To develop a unified set of Complex Systems theoretical methodologies for the characterization of Complex Networks.
- To develop statistical models for network growth and evolution.
- To collect data, mainly for the Internet and the World Wide Web.
- To extend the analysis to social and economic networks.
- To develop visualization tools for large-scale systems.
- To disseminate results through publications, conferences and the project web site.
8- After three years of activity we have a common ground of methodologies and tools, at least between computer scientists and physicists (and some economists). Some more effort would be necessary to integrate social scientists.
- We provided a class of models for network growth and evolution; moreover, we addressed the study of the statistical properties of weighted networks.
- Data collection for the Internet and the World Wide Web proved much more difficult than expected. In fact, larger consortia have been funded specifically for this task in the meantime. Thanks to external collaborations we still found the data to validate the models we produced.
9- In economic and financial networks, COSIN people are on the front line of this very new field of research. This new approach attracted the interest of the community at the level of Nobel laureates. The impact on social science has been less successful. The impact on biology (botany, zoology) has been unexpected and very successful.
- The standard visualization problem is to keep all the graph structure and present it suitably. On this point some progress has been made; it is worth mentioning that several ideas are now under consideration for the visualization of simplified graphs.
- The project had a considerable impact on the scientific community in terms of citations, visibility, conferences, schools, books and data downloads from the site. Some more work could perhaps be done for the general public.
10The graph of scientific collaborations on scale-free networks in statistical physics. M.E.J. Newman, PRE 69, 026113 (2004)
11- More than 150 refereed papers (some of them in Nature, PNAS, PRL, LNCS)
- Lectures and talks at various international conferences (for physics: STATPHYS, APS Meetings) and invited talks at various institutions
- Books
12The Sitges Conference published the proceedings of the most interesting talks in a special volume:
Statistical Mechanics of Complex Networks. Series: Lecture Notes in Physics, Vol. 625. Pastor-Satorras, Romualdo; Rubi, Miguel; Diaz-Guilera, Albert (Eds.), 2003, XII, 206 p., Hardcover, ISBN 3-540-40372-8
The Rome Conference published its proceedings in a special issue of the European Physical Journal B
13 14- Trivially, access to data was crucial for the project.
- In some cases we found very nice datasets and could work on them:
- Internet (AS topology)
- Wikipedia
- In the presence of poor or no data we obtained (of course) only partial results:
- Liquidity shocks
- River networks
15STATISTICAL PROPERTIES OF THE WIKIGRAPH
L.S. Buriol, A. Capocci, F. Colaiori, D. Donato, S. Leonardi, F. Rao, V. Servedio, GC
- Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia. A. Capocci, F. Rao, GC, Europhysics Letters 81, 28008 (2008) (arXiv:0710.3058)
- Preferential attachment in the growth of social networks: the Internet encyclopedia Wikipedia. A. Capocci, V.D.P. Servedio, F. Colaiori, L.S. Buriol, D. Donato, S. Leonardi, GC, Physical Review E 74, 036116 (2006).
Centro E. Fermi
16 17Wikipedia in other languages. You may read and edit articles in many different languages.
Wikipedia encyclopedia languages with over 100,000 articles: Deutsch (German), Français (French), Italiano (Italian), 日本語 (Japanese), Nederlands (Dutch), Polski (Polish), Português (Portuguese), Svenska (Swedish).
Wikipedia encyclopedia languages with over 10,000 articles: العربية (Arabic), Български (Bulgarian), Català (Catalan), Česky (Czech), Dansk (Danish), Eesti (Estonian), Español (Spanish), Esperanto, Galego (Galician), עברית (Hebrew), Hrvatski (Croatian), Ido, Bahasa Indonesia (Indonesian), 한국어 (Korean), Lietuvių (Lithuanian), Magyar (Hungarian), Bahasa Melayu (Malay), Norsk bokmål (Norwegian), Norsk nynorsk (Norwegian), Română (Romanian), Русский (Russian), Slovenčina (Slovak), Slovenščina (Slovenian), Српски (Serbian), Suomi (Finnish), Türkçe (Turkish), Українська (Ukrainian), 中文 (Chinese).
Wikipedia encyclopedia languages with over 1,000 articles: Alemannisch (Alemannic), Afrikaans, Aragonés (Aragonese), Asturianu (Asturian), Azərbaycan (Azerbaijani), Bân-lâm-gú (Min Nan), Беларуская (Belarusian), Bosanski (Bosnian), Brezhoneg (Breton), Чăваш чĕлхи (Chuvash), Corsu (Corsican), Cymraeg (Welsh), Ελληνικά (Greek), Euskara (Basque), فارسی (Persian), Føroyskt (Faroese), Frysk (Western Frisian), Gaeilge (Irish), Gàidhlig (Scots Gaelic), हिन्दी (Hindi), Interlingua, Íslenska (Icelandic), Basa Jawa (Javanese), ქართული (Georgian), ಕನ್ನಡ (Kannada), Kurdî / كوردی (Kurdish), Latina (Latin), Latviešu (Latvian), Lëtzebuergesch (Luxembourgish), Limburgs (Limburgish), Македонски (Macedonian), मराठी (Marathi), Napulitana (Neapolitan), Occitan, Ирон (Ossetic), Plattdüütsch (Low Saxon), Scots, Sicilianu (Sicilian), Simple English, Shqip (Albanian), Sinugboanon (Cebuano), Srpskohrvatski/Српскохрватски (Serbo-Croatian), தமிழ் (Tamil), Tagalog, ภาษาไทย (Thai), Tatarça (Tatar), తెలుగు (Telugu), Tiếng Việt (Vietnamese), Walon (Walloon).
18The datasets of each language are available as two self-extracting files for a MySQL database. The table cur contains the current on-line articles, whereas the table old contains all previous versions of each current article. Old versions of an article are identified by sharing the same title, not the same id. The dataset dumps are updated almost weekly, so the current graph is usually not more than a week old.
To generate a graph from the link structure of a dataset, each article is considered a node and each hyperlink between articles is a link in the graph. In the Wikipedia datasets, each web page is a single article. An article might also contain some external links that point to pages outside the dataset. Wikipedia articles usually have no external links, or just a few of them. These links are not considered when generating the wikigraphs, since we want to restrict the graph to pages within the set being analyzed.
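The construction described above can be sketched in a few lines. The input layout (a dictionary from article title to the list of link targets found in its text) is an assumption made for illustration; it is not the format of the MySQL dump, and `build_wikigraph` is a hypothetical helper name.

```python
def build_wikigraph(articles):
    """Build a directed graph from article hyperlinks.

    `articles` maps an article title to the list of link targets found in
    its text (assumed, already-parsed form of the dump). Targets that are
    not article titles in the dataset are treated as external and dropped.
    """
    graph = {title: set() for title in articles}
    for title, targets in articles.items():
        for target in targets:
            if target in graph:  # keep internal links only
                graph[title].add(target)
    return graph

# Invented toy data: the URL plays the role of an external link.
toy_articles = {
    "Rome": ["Italy", "Tiber", "http://example.org/rome"],
    "Italy": ["Rome"],
    "Tiber": [],
}
wikigraph = build_wikigraph(toy_articles)
n_edges = sum(len(targets) for targets in wikigraph.values())
```

Storing the targets in a set collapses repeated links between the same two articles into a single edge; whether the original analysis did the same is not stated.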
19- Sociological reasons: the encyclopedia collects pages written by a number of independent and heterogeneous individuals. Each of them autonomously decides the content of the articles, with the only constraint of a prefixed layout. This autonomy is a common feature of content creation on the Web. The Wikipedia author community is formed by members whose only wish is to make available to the world the concepts and topics that they consider meaningful. In some sense, tracing the evolution of the Wikipedia subsets should mirror the development of significant trends within each linguistic community.
- Generation in time: Wikipedia provides time information associated with nodes. Moreover, it provides the time of creation and of each modification for every page in the dataset.
- Independence from external links: Wikipedia articles link mainly to articles in the same dataset.
- Variety of graph sizes: one graph can be collected per language, and the graph sizes vary from a few hundred pages up to half a million pages.
20- Summarizing
- We have available the whole history of growth, so that we can study the evolution
- We have an example of a social network of huge size
- We can compare the systems produced by users of different languages, thereby measuring the effect of different cultures
- We can study Wikipedia as a case study for the World Wide Web
WE RECOVER A PREFERENTIAL ATTACHMENT MECHANISM FROM THE DATA. DIFFERENT LANGUAGES PRODUCE SIMILAR STRUCTURES. WE FIND A SYSTEM SIMILAR TO THE WWW EVEN IF THE MICROSCOPIC RULE OF GROWTH IS VERY DIFFERENT.
21We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data due to disk space restrictions: the English dataset of June 2005 is more than 36 GB compressed, that is about 200 GB expanded.
The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the datasets wikiIT and wikiPT no visits were associated with the pages.
22- SCC (Strongly Connected Component) includes pages that are mutually reachable by traveling on the graph.
- IN component is the region from which one can reach the SCC.
- OUT component encompasses the pages reached from the SCC.
- TENDRILS are pages reachable from the IN component and not pointing to the SCC or OUT region. TENDRILS also include those pages that point to the OUT region and do not belong to any of the other defined regions.
- TUBES directly connect the IN and OUT regions.
- DISCONNECTED regions are those isolated from the rest.
The bow-tie structure, found in the WWW (Broder et al., Comp. Net. 33, 309, 2000)
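A minimal sketch of how the main bow-tie regions can be extracted from a directed graph. It is not the procedure used for the wikigraphs, which is not described here: tendrils, tubes and disconnected pages are lumped together as OTHER, and the SCC search is a simple reachability-based one that is only adequate for small graphs (a linear-time algorithm such as Tarjan's would be needed at Wikipedia scale). The function names are hypothetical.

```python
from collections import deque

def reachable(graph, sources):
    """All nodes reachable from `sources` by following edges of `graph` (BFS)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(graph):
    """Split nodes into SCC, IN, OUT and OTHER (tendrils, tubes, disconnected).

    `graph` maps every node to the set of nodes it links to.
    """
    reverse = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)

    # The strongly connected component of a seed is the intersection of its
    # forward and backward reachable sets; keep the largest one found.
    unassigned, scc = set(graph), set()
    while unassigned:
        seed = next(iter(unassigned))
        component = reachable(graph, [seed]) & reachable(reverse, [seed])
        unassigned -= component
        if len(component) > len(scc):
            scc = component

    out_part = reachable(graph, list(scc)) - scc    # reached from the SCC
    in_part = reachable(reverse, list(scc)) - scc   # can reach the SCC
    other = set(graph) - scc - in_part - out_part
    return {"SCC": scc, "IN": in_part, "OUT": out_part, "OTHER": other}

# Toy graph: a-b-c form a cycle, i points into it, o is pointed to by it,
# t is a tendril hanging from i, x is isolated.
toy = {"a": {"b"}, "b": {"c"}, "c": {"a", "o"}, "i": {"a", "t"},
       "o": set(), "t": set(), "x": set()}
parts = bow_tie(toy)
```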
23The measure/size of the Wikigraph for the various
languages.
The percentage of the various components of the
Wikigraph for the various languages.
24- Power laws (what else?)
The degree shows fat tails that can be approximated by a power-law function of the kind $P(k) \propto k^{-\gamma}$, where the exponent is the same for both in-degree and out-degree.
In the case of the WWW, $2 \le \gamma_{in} \le 2.1$.
In-degree (empty symbols) and out-degree (filled symbols) occurrence distributions for the Wikigraph in English and Portuguese.
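One common way to estimate the exponent of a fat-tailed degree sequence is the maximum-likelihood formula $\hat\gamma = 1 + n / \sum_i \ln(k_i / k_{min})$, valid in the continuous approximation. The slides do not say how the exponents were fitted, so the sketch below is an assumption about method, tested on synthetic data rather than on Wikipedia degrees.

```python
import math
import random

def powerlaw_exponent(values, k_min):
    """Maximum-likelihood exponent of P(k) ~ k^(-gamma) for k >= k_min
    (continuous approximation)."""
    tail = [k for k in values if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)

rng = random.Random(0)
gamma_true, k_min = 2.1, 1.0
# Inverse-transform sampling from a continuous power law with exponent
# gamma_true: synthetic data, for illustration only.
sample = [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
          for _ in range(20000)]
gamma_hat = powerlaw_exponent(sample, k_min)
```

With 20000 points the statistical error on the estimate is roughly $(\gamma - 1)/\sqrt{n} \approx 0.008$, so `gamma_hat` should land close to 2.1.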
25As regards the assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
The average neighbour in-degree, computed along incoming edges, as a function of the in-degree for the English and Portuguese Wikigraphs.
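The quantity in the caption can be computed as follows: for every vertex, average the in-degree of the vertices that point to it, then average the result over all vertices with the same in-degree. This is a sketch of that definition on an edge list; the function name and the toy edges are illustrative.

```python
from collections import defaultdict

def average_neighbor_indegree(edges):
    """Map k_in -> mean in-degree of the in-neighbours of vertices with
    in-degree k_in. `edges` is a list of (source, target) pairs."""
    indeg = defaultdict(int)
    preds = defaultdict(list)
    for s, t in edges:
        indeg[t] += 1
        preds[t].append(s)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, sources in preds.items():
        k = indeg[v]
        # Average over the incoming edges of v.
        sums[k] += sum(indeg[s] for s in sources) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

toy_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
knn = average_neighbor_indegree(toy_edges)
```

A flat curve as a function of the in-degree is what "no evidence of assortative behaviour" means in this representation.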
26The PageRank distribution for wikiEN is a power-law function with an exponent of about 2.1. Previous measurements on webgraphs exhibit the same behaviour for the PageRank distribution. We list the number of visits of the top-ranked pages just to show that this value is not related to the PageRank values. We confirm that very little correlation was found between the link-analysis characteristics and the actual number of visits.
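For reference, PageRank can be computed by plain power iteration. The sketch below assumes the conventional damping factor 0.85 and spreads the rank of pages without out-links uniformly; neither choice is specified in the slides.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration for PageRank; `graph` maps node -> set of link targets.
    The rank of dangling nodes (no out-links) is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new[v] += damping * rank[u] / len(graph[u])
        rank = new
    return rank

# Toy graph: three pages point to "hub", which points back to "a".
toy = {"a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "hub": {"a"}}
ranks = pagerank(toy)
top_page = max(ranks, key=ranks.get)
```

Comparing such a ranking with an independent list of visit counts is the kind of check the slide refers to.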
27Given the history of growth, one can verify the hypothesis of preferential attachment. This is done by means of the histogram P(k), which gives the number of vertices of degree k acquiring new connections at time t. This quantity is weighted by the factor N(t)/n(k,t).
We find preferential attachment for both in-degree and out-degree.
English and Portuguese Wikigraphs. White symbols: in-degree. Filled symbols: out-degree.
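The weighted histogram can be sketched as below for the in-degree. The reading of the symbols is an assumption: N(t) is taken as the number of vertices present at time t and n(k,t) as the number of them with in-degree k. The input format (a time-ordered edge list) is also hypothetical.

```python
from collections import defaultdict

def attachment_kernel(edge_history):
    """Weighted histogram of the in-degree of vertices receiving new edges.

    `edge_history` is a time-ordered list of (source, target) pairs. Each
    event on a vertex of in-degree k adds N(t) / n(k, t), where N(t) is the
    number of vertices seen so far and n(k, t) the number of them with
    in-degree k (assumed meaning of the symbols).
    """
    indeg = {}               # vertex -> current in-degree
    n_k = defaultdict(int)   # in-degree -> number of vertices
    kernel = defaultdict(float)
    for source, target in edge_history:
        for v in (source, target):  # register vertices on first appearance
            if v not in indeg:
                indeg[v] = 0
                n_k[0] += 1
        k = indeg[target]
        kernel[k] += len(indeg) / n_k[k]
        n_k[k] -= 1
        n_k[k + 1] += 1
        indeg[target] = k + 1
    return dict(kernel)

toy_history = [("a", "b"), ("c", "b"), ("d", "a")]
kernel = attachment_kernel(toy_history)
```

The weighting corrects for the fact that low-degree vertices are far more numerous: without it, the histogram would mostly reflect the degree distribution itself. A kernel growing linearly with k signals linear preferential attachment.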
28In our opinion this preferential attachment is an effective description rather than the real driving force of the phenomenon.
In other words, the linear preferential attachment can originate from a copying procedure (new vertices are introduced by copying old ones and keeping most of the edges). We could also have a sort of fitness for the various entries (but in this case one has a multidimensional series of quantities describing the importance of one page).
Apart from the interpretation, the data show a rather clear LINEAR PREFERENTIAL ATTACHMENT
29Other power laws related to the dynamics need to be explained. For example, the number of updates also follows a power law.
Each point represents the number of nodes (y axis) that were updated exactly x times.
30- We introduced an evolution rule, similar to other rewiring models already considered.
- At each time step, a vertex is added to the network. It is connected to the existing vertices by M oriented edges; the direction of each edge is drawn at random:
- with probability R1 the edge leaves the new vertex, pointing to an existing one chosen with probability proportional to its in-degree;
- with probability R2 the edge points to the new vertex, and the source vertex is chosen with probability proportional to its out-degree;
- finally, with probability R3 = 1 - R1 - R2 the edge is added between existing vertices: the source vertex is chosen with probability proportional to its out-degree, while the destination vertex is chosen with probability proportional to its in-degree.
See for example Krapivsky, Rodgers and Redner, PRL 86, 5401 (2001)
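The rule above can be simulated directly. The sketch below makes two assumptions that the slide does not fix: the initial condition (two vertices linked in both directions) and the treatment of self-loops and repeated edges, which are simply allowed. Degree-proportional choices are implemented by picking the endpoint of a uniformly random existing edge.

```python
import random

def grow_network(n_vertices, m, r1, r2, seed=0):
    """Simulate the growth rule; returns the edge list [(source, target), ...]."""
    rng = random.Random(seed)
    # Seed graph (an assumption): two vertices linked in both directions, so
    # that degree-proportional choices are well defined from the start.
    edges = [(0, 1), (1, 0)]
    for new in range(2, n_vertices):
        n_old = len(edges)  # edges that existed before this time step
        for _ in range(m):
            # The endpoint of a uniformly random old edge is a vertex chosen
            # with probability proportional to its in- or out-degree.
            by_indegree = edges[rng.randrange(n_old)][1]
            by_outdegree = edges[rng.randrange(n_old)][0]
            x = rng.random()
            if x < r1:
                edges.append((new, by_indegree))           # leaves the new vertex
            elif x < r1 + r2:
                edges.append((by_outdegree, new))          # points to the new vertex
            else:
                edges.append((by_outdegree, by_indegree))  # between existing vertices
    return edges

edges = grow_network(n_vertices=2000, m=3, r1=0.026, r2=0.091)
```

The value m=3 and the network size are arbitrary illustration values; only R1 and R2 are taken from the measurements quoted for the English Wikigraph.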
31From these data it seems that a model in the spirit of BA could reproduce most of the features of the system.
- Actually:
- This network is oriented.
- The preferential attachment in Wikipedia has a somewhat different nature. Here, most of the time, the edges are added between existing vertices, differently from the BA model. For instance, in the English version of Wikipedia a largely dominant fraction, 0.883, of new edges is created between two existing pages, while a smaller fraction of edges points to or leaves a newly added vertex (0.026 and 0.091 respectively).
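32The model can be solved analytically
We can use for the model the empirical values $R_1 = 0.026$, $R_2 = 0.091$, $R_3 = 0.883$, already measured for the English version of the Wikigraph:
$P(k_{in}) \propto k_{in}^{-\gamma_{in}}$, $\gamma_{in} = 1 + 1/(1 - R_2)$
$P(k_{out}) \propto k_{out}^{-\gamma_{out}}$, $\gamma_{out} = 1 + 1/(1 - R_1)$
$\gamma_{in} \simeq 2.100$, $\gamma_{out} \simeq 2.027$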
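As a quick arithmetic check, reading the exponents as $\gamma_{in} = 1 + 1/(1 - R_2)$ and $\gamma_{out} = 1 + 1/(1 - R_1)$ reproduces the quoted values from the measured probabilities:

```python
r1, r2 = 0.026, 0.091
r3 = 1.0 - r1 - r2
gamma_in = 1.0 + 1.0 / (1.0 - r2)
gamma_out = 1.0 + 1.0 / (1.0 - r1)
print(f"R3 = {r3:.3f}, gamma_in = {gamma_in:.3f}, gamma_out = {gamma_out:.3f}")
# prints: R3 = 0.883, gamma_in = 2.100, gamma_out = 2.027
```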
33The model can be solved analytically
$K_{nn}^{in}(k_{in}) \sim M N^{1-R_1} R_1 R_2 / R_3$ for $R_3 \neq 0$
$K_{nn}^{in}(k_{in}) \sim M R_1 R_2 \ln(N)$ for $R_3 = 0$
In both cases it is constant in $k_{in}$.
The value of the constant also depends on the initial conditions. The two lines refer to two realizations of the model, where in one case the first 0.5% of the vertices has been removed.
34- We have a structure that resembles the bow-tie of the WWW
- We have a power-law decay for the degree distributions and also a power-law decay for the number of updates of a page
- Preferential attachment in the rewiring seems to be the driving force in the evolution of the system
- The microscopic structure of the rewiring is very different from that of the WWW
- In principle a user can change any series of edges and add as many pages as wanted. Still, most of the quantities are similar
35The finding that the PageRank of the pages is not related to the number of visits opens a very interesting scenario for further research. Since, by definition, PageRank should give us the visit time of a page, and since it is actually completely independent of the number of visits, we wonder whether PageRank is a good measure of the authoritativeness of the pages in wikigraphs and which modifications should be introduced in order to tune its performance.
36 37 38 39 40From satellite images one gets Digital Elevation
Models (DEM)
From DEM a spanning tree is computed (via
steepest descent)
From the spanning tree, the number of points
uphill is computed
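The three steps above can be sketched on a small elevation grid. The sketch simplifies steepest descent to "drain to the lowest of the 8 neighbouring cells", ignoring the longer diagonal distance, and counts each cell in its own drainage area; both are assumptions, as is the toy grid.

```python
def drainage_areas(dem):
    """Drainage area (number of uphill cells, itself included) of each cell.

    `dem` is a list of rows of elevations. Every cell drains to its lowest
    strictly lower neighbour among the 8 surrounding cells.
    """
    rows, cols = len(dem), len(dem[0])
    downstream = {}
    for r in range(rows):
        for c in range(cols):
            best, best_h = None, dem[r][c]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if (dr or dc) and inside and dem[rr][cc] < best_h:
                        best, best_h = (rr, cc), dem[rr][cc]
            downstream[(r, c)] = best  # None marks an outlet or a pit
    area = {cell: 1 for cell in downstream}
    # Visit cells from highest to lowest, so that every upstream area is
    # final before it is passed downstream.
    for cell in sorted(downstream, key=lambda rc: -dem[rc[0]][rc[1]]):
        if downstream[cell] is not None:
            area[downstream[cell]] += area[cell]
    return area

# Toy landscape sloping towards the bottom-right corner.
toy_dem = [[9, 8, 7],
           [8, 6, 5],
           [7, 5, 1]]
areas = drainage_areas(toy_dem)
```

Since each cell has at most one downstream cell, the drainage directions form a spanning tree (or a forest, if there are several outlets or pits), and the distribution of `areas` values is the P(A) studied for river networks.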
41HACK'S LAW $L_{\parallel} \propto A^{h}$
42 43Data on Mars topography were collected through the Mars Orbiter Laser Altimeter (MOLA)
44 45 46 The results show that we can distinguish regions whose DEM networks have properties similar to river networks on Earth.
For rivers on Earth $P(A) \propto A^{-1.43}$
47THE LIQUIDITY MARKET
Monetary Policy
ECB
Reserves
Banks get liquidity from the ECB through auctions. Monetary policy is carried out by the ECB to control interest rates. BANKS MANAGE THEIR LIQUIDITY IN THE INTERBANK MARKET
48The Market
Money Market
- The EUROPEAN CENTRAL BANK provides LIQUIDITY to European banks through weekly auctions.
- EVERY BANK must DEPOSIT with its NATIONAL CENTRAL BANK 2% of all deposits and debts issued in the last two years. These reserves are supposed to help in the case of liquidity shocks.
- The 2% value fluctuates in time and is recomputed every month.
Banks sell and buy liquidity to adjust to their liquidity needs and at the same time tend to reduce the value of the reserve.
49The Market
Market Data
The interbank markets are basically managed by each European country. In almost all cases these markets are phone-based, meaning that each bank has some brokers doing its transactions by phone. The only exception is the Italian market, which is totally screen-based, implying that each bank's operator can see real-time quotes of all other banks and carry out transactions.
The recent paper by Boss et al. investigates the network of overall credit relationships in the Austrian interbank market. In their study the authors analyze all the liabilities for ten quarterly single-month periods, between 2000 and 2003, among 900 banks. They find a power-law distribution of contract sizes, and a power-law decay of the distribution of incoming and outgoing links (a link between two banks exists if the banks have an overall exposure with each other). Furthermore, they show that the most vulnerable vertices are those with the highest centrality (measured by the number of paths that go through them).
A different issue has been explored by Cocco et al., who investigated the nature of lending relationships in the fragmented Portuguese interbank market over the period 1997-2001. In fragmented markets the amount and the interest rate of each loan are agreed on a one-to-one basis between borrowing and lending institutions. Other banks do not have access to the same terms, and no public information regarding the loan is available. The authors showed that frequent and repeated interactions between the same banks appear with a probability higher than that expected for random matching. In addition, they found that during illiquid periods, and in particular during the Russian financial crisis, preferential lending relationships increased.
50The Market
Market Data
Italian Interbank Money Market: banks operating on the Italian market. This market has been fully electronic for interbank deposits since 1990 (e-Mid).
- Daily volume: 18 billion Euros
- 200 participants
We report here the analysis of 196 Italian banks (plus 18 banks from abroad that interact with them), which made 85202 transactions in 2000.
51INTRODUCTION
Time activity
Two time scales: one day and the one-month maintenance period
52Statistical Properties
Market Data
The network shows a rather peculiar architecture. The banks form a disassortative network where large banks interact mostly with small ones.
53Statistical Properties
Market Data
Actually the banks form different groups roughly
related to their size when considering the
average volume of money exchanged.
54Statistical Properties
Degree Distributions
Using the latter quantity we can divide the banks into four groups (the same number of classes as in the Bank of Italy classification): Group 1 with volume in the range 0-23 million Euro per day, Group 2 in the range 23-70 million Euro per day, Group 3 in the range 70-165 million Euro per day, and Group 4 over 165 million Euro per day. In this way we find an overlap of more than 90% between the two classifications.
55Communities
Separation of business
Two main communities emerge: many small banks and few little banks.
Second eigenvector of the normal matrix
56Modelling
Model of bank network
We assign to each of the N nodes (N is the size of the system) a value drawn from the previous distribution. The origin and destination vertices of an edge are chosen with a probability $p_{ij}$ proportional to the sum of the respective sizes $v_i$ and $v_j$. In formulas: $p_{ij} \propto v_i + v_j$
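A minimal sketch of the link-drawing rule, with the probability of a pair proportional to $v_i + v_j$. The size values are invented, links are treated as undirected and distinct, and the number of links is a free parameter; all three are assumptions for illustration, as is the function name.

```python
import random

def fitness_network(sizes, n_edges, seed=0):
    """Draw `n_edges` distinct links; pair (i, j) is chosen with
    probability proportional to sizes[i] + sizes[j]."""
    rng = random.Random(seed)
    n = len(sizes)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if n_edges > len(pairs):
        raise ValueError("more edges requested than distinct pairs")
    weights = [sizes[i] + sizes[j] for i, j in pairs]
    links = set()
    while len(links) < n_edges:
        links.add(rng.choices(pairs, weights=weights, k=1)[0])
    return links

# Invented sizes: two large banks and four small ones.
toy_sizes = [100.0, 50.0, 5.0, 2.0, 1.0, 1.0]
links = fitness_network(toy_sizes, n_edges=6)
```

Because the weight of a pair is dominated by its larger member, small banks end up linked mostly to large ones, which is the disassortative pattern reported for the data.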
57Modelling
Market Data
58MODELLING
Model and clustering
To quantify the agreement between experimental and simulated networks we also define an overlap parameter m, specifying how well the model reproduces the observed clustering. We proceed in the following way. We define a matrix E, that is a weighted $4 \times 4$ matrix, where the weights represent the number of connections between groups. In order to measure the overlap between the matrices obtained from the data and from the computer model, we define a distance based on the differences between the elements of the matrices.
59MODELLING
Model and clustering
We can define a distance d between the numbers of intergroup edges in the experimental data and in the numerical simulation.
The sum of all elements is equal to $E_{tot}$ in both cases. Therefore the maximum possible difference is $2E_{tot}$. This happens when all the links are between two groups in one case and between two other groups in the other. We use this maximum value to normalize the above expression, and we then define the overlap parameter m: $m = 1 - d/(2E_{tot})$
WE HAVE AN OVERLAP m = 98%
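The overlap parameter can be sketched as follows. The explicit form of the distance is not given in the text, so the sketch assumes the sum of absolute differences between corresponding matrix elements, counting each unordered pair of groups once; this choice is consistent with the stated maximum of $2E_{tot}$. The matrices are toy examples with two groups.

```python
def overlap(e_data, e_model):
    """Overlap m = 1 - d / (2 * E_tot) between two symmetric matrices of
    group-to-group link counts with the same total E_tot."""
    groups = range(len(e_data))
    # Count each unordered pair of groups once (upper triangle and diagonal).
    cells = [(g, k) for g in groups for k in groups if g <= k]
    e_tot = sum(e_data[g][k] for g, k in cells)
    # Assumed distance: sum of absolute element-wise differences.
    d = sum(abs(e_data[g][k] - e_model[g][k]) for g, k in cells)
    return 1.0 - d / (2.0 * e_tot)

same = overlap([[2, 3], [3, 5]], [[2, 3], [3, 5]])
# All links inside group 1 in one case, all inside group 2 in the other.
opposite = overlap([[10, 0], [0, 0]], [[0, 0], [0, 10]])
```

Identical matrices give m = 1 and completely disjoint ones give m = 0, so m = 98% indicates that the simulated group-to-group link counts are very close to the observed ones.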
60MODELLING
Model and clustering
To evaluate the relevance of the division in classes, we have to compare the value of $E_{g,k}$ with the corresponding quantity $E^{null}_{g,k}$ for a network where there is no division in classes (null hypothesis). The analytical expression for the null case is $E^{null}_{g,k} = E_{tot}/10$, where 10 is the number of possible couplings between the 4 groups. The comparison between the two networks shows that the division in groups emerges in the real case. The table reports the value $E_{g,k}/E_{tot}$ (in %) for each possible combination of groups. In the null case, each element of the same matrix should be equal to 10%.
Group   1   2   3   4
1       0   6   4   8
2       6   3   8  17
3       4   8   5  27
4       8  17  27  22
61CONCLUSIONS
Market Data
- Financial networks can help
- In distinguishing the behaviour of different markets
- In visualizing important features such as the business role
- In testing the validity of market models
- They might be an example of scale-free networks even more general than those described by growth and preferential attachment.
62CONCLUSIONS
Thanks to Giulie
Giulia Iori, Department of Economics, School of
Social Science City University, London UK
Giulia De Masi, Dep. Economics Università delle
Marche Italy