The Fractal Properties of Growing Networks - PowerPoint PPT Presentation

Slides: 62
Provided by: guidocal

Title: The Fractal Properties of Growing Networks


1
Data and networks
GIACS Conference, Palermo, 9-4-08
2
  • Networks

3
  • Networks as an instrument of Data Filtering

Correlation-based Minimal Spanning Tree of 1071 stocks traded at the NYSE between 1987 and 1998. Different colours refer to different SIC sectors.
Correlation-based Minimal Spanning Tree of an artificial market of 1071 stocks, built according to the one-factor model. Different colours refer to different SIC sectors.
Topology of correlation based minimal spanning trees in real and model markets, G. Bonanno, G. Caldarelli, F. Lillo, R. Mantegna, Physical Review E 68 046130 (2003).
Networks of equities in financial markets, G. Bonanno, GC, F. Lillo, S. Miccichè, N. Vandewalle, R. N. Mantegna, European Physical Journal B 38 363-372 (2004).
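A minimal sketch of how a correlation-based minimal spanning tree can be built. The slides do not state the metric, so this assumes the commonly used distance d_ij = sqrt(2 (1 - rho_ij)) and Kruskal's algorithm. The function name and the toy correlation values are illustrative and are not data from the talk.

```python
import math

def correlation_mst(labels, corr):
    """Return the MST edges (a, b, distance) of a correlation matrix.

    Assumption: distance d_ij = sqrt(2 * (1 - rho_ij)), so that highly
    correlated stocks are close to each other.
    """
    n = len(labels)
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest used by Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different components
            parent[root_i] = root_j
            tree.append((labels[i], labels[j], dist))
    return tree

# Toy example: two strongly correlated pairs, weakly linked to each other.
toy_labels = ["A", "B", "C", "D"]
toy_corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
toy_mst = correlation_mst(toy_labels, toy_corr)
```

A tree on N stocks always has N - 1 edges, which is what makes the MST a filter: of the N(N - 1)/2 correlations only the strongest backbone is retained.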
4
  • The Cosin project

COSIN (official number IST-2001-33555) was a Research Project financed by the European Commission through the Fifth Framework Programme. COSIN is part of the actions taken by the Future and Emerging Technologies (FET) in the priority area of research of Information Society Technologies (IST) (http://www.cordis.lu/IST/FET). Documents at http://www.cosinproject.org
5
  • The Cosin project
  • COSIN involves
  • 7 different nodes in 5 countries
  • (Ph CS) Roma, Italy
  • (Ph) Barcelona, Spain
  • (Ph) Lausanne, Switzerland
  • (Ph) Ens, Paris, France
  • (CS) Karlsruhe, Germany
  • (Ph) Upsud, Paris, France

Map legend: EU countries 2001; non-EU countries 2001; EU COSIN participant; non-EU COSIN participant
6
  • Some of the Cosin people

G. Bonanno, G. Caldarelli, F. Colaiori, G. Di Battista, D. Donato, S. Leonardi, R. Mantegna, A. Marchetti-Spaccamela, M. Patrignani, L. Pietronero, V. Servedio
A. Arenas, M. Boguña, A. Díaz-Guilera, R. Ferrer i Cancho, M.A. Muñoz, M.A. Serrano, R. Pastor-Satorras
G. Bianconi, A. Capocci, P. De Los Rios, T. Erlebach, T. Petermann, Y.-C. Zhang
A. Barrat, S. Battiston, P. Nadal, A. Vespignani, G. Weisbuch
U. Brandes, M. Gaertler, M. Kaufmann, D. Wagner
7
  • The Cosin project
  • To develop a unified set of Complex Systems
    theoretical methodologies for
    the characterization of Complex Networks,
  • To develop statistical models for network
    growth and evolution.
  • To collect data mainly for Internet and World
    Wide Web
  • To extend analysis to social and economic
    networks
  • To develop visualization tools for large scale
    systems
  • To disseminate results through publication,
    conferences and project web site.

8
  • A Cosin summary
  1. After three years of activity we have a common
    ground of methodologies and tools at least
    between computer scientists and physicists (also
    some economists). Some more effort would be
    necessary to integrate social scientists.
  2. We provided a class of models for network growth
    and evolution. Moreover, we addressed the study of
    the statistical properties of weighted networks.
  3. Data collection for the Internet and the World Wide Web
    proved much more difficult than expected.
    In the meantime, larger consortia have been funded
    specifically for this task.
    Thanks to external collaborations we still found
    the data to validate the models we produced.

9
  1. In economic and financial networks, COSIN people
    are on the frontline of this very new field of
    research. This new approach attracted the
    interest of the community at the level of Nobel
    laureates. The impact on social science has been
    less successful. The impact on biology (botany, zoology)
    has been unexpected and very successful.
  2. The standard visualization problem is to keep the whole
    graph structure and present it suitably. On
    this point some progress has been made. It is
    worth mentioning that several ideas are now under
    consideration for the visualization of
    simplified graphs.
  3. The project had a considerable impact on the
    scientific community in terms of citations,
    visibility, conferences, schools, books and data
    downloads from the site. Maybe some more work could be
    done for the general public.

10
The graph of scientific collaborations on scale-free networks in statistical physics (M.E.J. Newman, PRE 69 026113 (2004))
11
  • Dissemination
  • More than 150 refereed papers (some of them in
    Nature, PNAS, PRL, LNCS)
  • Lectures and talks at various international
    conferences (for physics: STATPHYS, APS Meetings)
    and invited talks at various institutions
  • Books

12
The Sitges Conference published the proceedings of the most interesting talks in a special volume: Statistical Mechanics of Complex Networks, Series: Lecture Notes in Physics, Vol. 625, Pastor-Satorras, Romualdo; Rubi, Miguel; Diaz-Guilera, Albert (Eds.), 2003, XII, 206 p., Hardcover, ISBN 3-540-40372-8
The Rome Conference published the proceedings in a special issue of the European Physical Journal B
13
  • Web site

14
  • What about data?
  • Trivially, access to data was crucial for
    the project
  • In some cases we found very nice
    datasets and could work on them:
  • Internet (AS topology)
  • Wikipedia
  • In the presence of poor or no data, we obtained (of
    course) only partial results:
  • Liquidity shocks
  • River networks

15
STATISTICAL PROPERTIES OF THE WIKIGRAPH
L.S. Buriol, A. Capocci, F. Colaiori, D. Donato, S. Leonardi, F. Rao, V. Servedio, GC
  • Taxonomy and clustering in collaborative systems:
    the case of the on-line encyclopedia Wikipedia,
    A. Capocci, F. Rao, GC, Europhysics
    Letters 81 28008 (2008) (arXiv:0710.3058)
  • Preferential attachment in the growth of social
    networks: the Internet encyclopedia Wikipedia,
    A. Capocci, V.D.P. Servedio, F. Colaiori, L.S.
    Buriol, D. Donato, S. Leonardi, GC,
    Physical Review E 74 036116 (2006).

Centro E. Fermi
16
  • Wikipedia intro

17
  • Wikipedia intro

Wikipedia in other languages: you may read and edit articles in many different languages.
Wikipedia encyclopedia languages with over 100,000 articles: Deutsch (German), Français (French), Italiano (Italian), Japanese, Nederlands (Dutch), Polski (Polish), Português (Portuguese), Svenska (Swedish).
Wikipedia encyclopedia languages with over 10,000 articles: Arabic, Bulgarian, Català (Catalan), Česky (Czech), Dansk (Danish), Eesti (Estonian), Español (Spanish), Esperanto, Galego (Galician), Hebrew, Hrvatski (Croatian), Ido, Bahasa Indonesia (Indonesian), Korean, Lietuvių (Lithuanian), Magyar (Hungarian), Bahasa Melayu (Malay), Norsk bokmål (Norwegian), Norsk nynorsk (Norwegian), Română (Romanian), Russian, Slovenčina (Slovak), Slovenščina (Slovenian), Serbian, Suomi (Finnish), Türkçe (Turkish), Ukrainian, Chinese.
Wikipedia encyclopedia languages with over 1,000 articles: Alemannisch (Alemannic), Afrikaans, Aragonés (Aragonese), Asturianu (Asturian), Azərbaycan (Azerbaijani), Bân-lâm-gú (Min Nan), Belarusian, Bosanski (Bosnian), Brezhoneg (Breton), Chuvash, Corsu (Corsican), Cymraeg (Welsh), Greek, Euskara (Basque), Persian, Føroyskt (Faroese), Frysk (Western Frisian), Gaeilge (Irish), Gàidhlig (Scots Gaelic), Hindi, Interlingua, Íslenska (Icelandic), Basa Jawa (Javanese), Georgian, Kannada, Kurdî (Kurdish), Latina (Latin), Latviešu (Latvian), Lëtzebuergesch (Luxembourgish), Limburgs (Limburgish), Macedonian, Marathi, Napulitana (Neapolitan), Occitan, Ossetic, Plattdüütsch (Low Saxon), Scots, Sicilianu (Sicilian), Simple English, Shqip (Albanian), Sinugboanon (Cebuano), Srpskohrvatski (Serbo-Croatian), Tamil, Tagalog, Thai, Tatarça (Tatar), Telugu, Tiếng Việt (Vietnamese), Walon (Walloon).
18
  • Wikipedia intro

The datasets of each language are available in two self-extracting files for a MySQL database. The table cur contains the current on-line articles, whereas the table old contains all previous versions of each current article. Old versions of an article are identified by having the same title, not the same id. The dataset dumps are updated almost weekly, so the current graph is usually not more than a week old.
To generate a graph from the link structure of a dataset, each article is considered a node and each hyperlink between articles is a link in this graph. In the Wikipedia datasets, each webpage is a single article. An article might also contain some external links that point to pages outside the dataset. Usually Wikipedia articles have no external links, or just a few of them. These links are not considered when generating the wikigraphs, since we want to restrict the graph to pages in the set being analyzed.
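The construction just described can be sketched as follows. This is only an illustration: it assumes the hyperlinks of every article have already been extracted from the database dump into a plain dictionary, and the function name and toy pages are hypothetical.

```python
def build_wikigraph(articles):
    """Build a directed graph from article hyperlinks.

    articles: dict mapping an article title to the list of link targets
    found in its text (assumed to be extracted beforehand).
    Returns a dict mapping each title to the set of titles it links to.
    Links whose target is not an article of the dataset (external links)
    are dropped, so the graph is restricted to the analysed set.
    """
    titles = set(articles)
    graph = {title: set() for title in titles}
    for title, links in articles.items():
        for target in links:
            if target in titles:  # keep internal links only
                graph[title].add(target)
    return graph

toy_articles = {
    "Rome": ["Italy", "http://example.org/external"],
    "Italy": ["Rome", "Europe"],
    "Europe": [],
}
toy_graph = build_wikigraph(toy_articles)
```

Using a set per node collapses repeated links between the same pair of articles into a single edge; whether the original analysis counted multiple links is not stated in the slides.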
19
  • Wikipedia interests
  • sociological reasons: the encyclopedia collects
    pages written by a number of independent and
    heterogeneous individuals. Each of them
    autonomously decides on the content of the
    articles, with the only constraint of a predefined
    layout. This autonomy is a common feature of
    content creation on the Web. The Wikipedia
    authors' community is formed by members whose
    only wish is to make available to the world
    concepts and topics that they consider
    meaningful. In some sense, tracing the evolution
    of the Wikipedia subsets should mirror the
    development of significant trends within each
    linguistic community.
  • generation in time: Wikipedia provides time
    information associated with nodes. Moreover, it
    provides old information: time information for
    the creation and the modifications of each page
    in the dataset.
  • independence from external links: Wikipedia
    articles link mainly to articles in the same
    dataset.
  • variety of graph sizes: one graph can be collected
    per language, and the graph sizes vary
    from a few hundred pages up to half a million pages.

20
  • Results
  • Summarizing
  • We have available all the history of growth, so
    that we can study the evolution
  • We have an example of a social network of huge
    size
  • We can compare the systems produced by users of
    different languages, thereby
    measuring the effect of different cultures.
  • We can study Wikipedia as a case study for the
    World Wide Web

WE RECOVER A PREFERENTIAL ATTACHMENT MECHANISM FROM THE DATA.
DIFFERENT LANGUAGES PRODUCE SIMILAR STRUCTURES.
WE FIND A SYSTEM SIMILAR TO THE WWW EVEN IF THE MICROSCOPIC RULE OF GROWTH IS VERY DIFFERENT.
21
  • The Wiki graphs

We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data due to disk space restrictions. The English dataset of June 2005 takes more than 36 GB compressed, that is about 200 GB expanded.
The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the wikiIT and wikiPT datasets there were no visits associated with the pages.
22
  • SCC (Strongly Connected Component) includes
    pages that are mutually reachable by traveling on
    the graph
  • IN component is the region from which one can
    reach the SCC
  • OUT component encompasses the pages reached from
    the SCC
  • TENDRILS are pages reachable from the IN
    component and not pointing to the SCC or OUT region.
    TENDRILS also include those pages that point to
    the OUT region without belonging to any of the other
    defined regions
  • TUBES directly connect the IN and OUT regions
  • DISCONNECTED regions are those isolated from the
    rest

The Bow-tie structure, found in the WWW (Broder
et al. Comp. Net. 33, 309, 2000)
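A minimal sketch of how the main bow-tie regions can be extracted with two breadth-first searches. It assumes that a seed page known to belong to the largest SCC is given; tendrils, tubes and disconnected pages are lumped together in a single OTHER set. Function names and the toy graph are illustrative.

```python
from collections import deque

def reachable(start, adjacency):
    """Return the set of nodes reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(graph, seed):
    """Split a directed graph into SCC, IN, OUT and OTHER.

    graph: dict node -> iterable of successors.
    seed: a node assumed to lie in the strongly connected core.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reverse = {node: set() for node in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)
    forward = reachable(seed, graph)     # SCC plus OUT
    backward = reachable(seed, reverse)  # SCC plus IN
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": nodes - forward - backward,  # tendrils, tubes, disconnected
    }

toy_web = {"a": ["b", "t"], "b": ["c"], "c": ["b", "d"], "x": ["y"]}
toy_regions = bow_tie(toy_web, seed="b")
```

In the toy graph, "b" and "c" form the core, "a" is in IN, "d" is in OUT, "t" is a tendril hanging from IN, and "x", "y" are disconnected.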
23
  • The Wikigraphs

The measure/size of the Wikigraph for the various
languages.
The percentage of the various components of the
Wikigraph for the various languages.
24
  • Power laws (what else?)

The degree distribution shows fat tails that can be approximated by a power-law function of the kind $P(k) \propto k^{-\gamma}$, where the exponent is the same for both in-degree and out-degree.
In the case of the WWW, $2 \le \gamma_{in} \le 2.1$.
In-degree (empty symbols) and out-degree (filled symbols) occurrence distributions for the Wikigraph in English and Portuguese.
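As an illustration, the in-degree and out-degree distributions of a directed graph can be tabulated as below. This is a sketch on a toy graph, not the code used for the plots; the function name is hypothetical.

```python
from collections import Counter

def degree_distributions(graph):
    """Return (P_in, P_out): fraction of nodes with each in/out-degree.

    graph: dict node -> iterable of successors.
    """
    out_degree = {u: len(set(targets)) for u, targets in graph.items()}
    in_degree = Counter()
    for u, targets in graph.items():
        for v in set(targets):
            in_degree[v] += 1
    nodes = set(graph) | set(in_degree)
    total = len(nodes)
    in_hist = Counter(in_degree.get(node, 0) for node in nodes)
    out_hist = Counter(out_degree.get(node, 0) for node in nodes)
    p_in = {k: count / total for k, count in in_hist.items()}
    p_out = {k: count / total for k, count in out_hist.items()}
    return p_in, p_out

toy_p_in, toy_p_out = degree_distributions({"a": ["b", "c"], "b": ["c"], "c": []})
```

On real data the histogram is usually plotted on doubly logarithmic axes, where a power law appears as a straight line of slope equal to minus the exponent.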
25
  • Correlations

As regards the assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
The average in-degree of the neighbours, computed along incoming edges, as a function of the in-degree for the English and Portuguese Wikigraphs.
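A sketch of the quantity in the caption, under the assumption that "computed along incoming edges" means averaging the in-degree of the pages that point to a given page, and then averaging over all pages with the same in-degree. Names and the toy graph are illustrative.

```python
from collections import defaultdict

def average_neighbour_indegree(graph):
    """Return {k_in: average in-degree of the predecessors of nodes with in-degree k_in}."""
    predecessors = defaultdict(set)
    for u, targets in graph.items():
        for v in targets:
            predecessors[v].add(u)

    def indegree(node):
        return len(predecessors.get(node, ()))

    per_degree = defaultdict(list)
    for node, preds in predecessors.items():
        k = len(preds)  # k >= 1 here, nodes without incoming edges are skipped
        per_degree[k].append(sum(indegree(u) for u in preds) / k)
    return {k: sum(values) / len(values) for k, values in per_degree.items()}

toy_knn = average_neighbour_indegree({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

A curve that stays flat as k_in grows corresponds to the absence of assortative behaviour reported above.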
26
  • PageRank

The PageRank distribution for wikiEN is a power-law function with exponent $\gamma \approx 2.1$. Previous measurements on webgraphs exhibit the same behaviour for the PageRank distribution. We list the number of visits of the top-ranked pages just to show that this value is not related to the PageRank values. We confirm that very little correlation was found between the link analysis characteristics and the actual number of visits.
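For reference, PageRank can be computed by power iteration as in the sketch below. The damping factor 0.85 and the uniform redistribution of the rank of pages without outgoing links are common conventions assumed here; the slides do not say which settings were used.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict node -> iterable of successors."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for u, targets in graph.items():
            targets = list(targets)
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank

toy_rank = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

The ranks sum to one and can be read as the stationary visiting probabilities of a random surfer, which is why a lack of correlation with the measured number of visits is noteworthy.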
27
  • Preferential attachment

Given the history of growth, one can verify the hypothesis of preferential attachment. This is done by means of the histogram $\Pi(k)$, which gives the number of vertices (whose degree is k) acquiring new connections at time t. This quantity is weighted by the factor $N(t)/n(k,t)$.
We find preferential attachment for both in-degree and out-degree.
English and Portuguese Wikigraphs. Empty symbols: in-degree. Filled symbols: out-degree.
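A sketch of the weighted histogram for the in-degree case. It assumes that N(t) is the number of vertices present when an edge is added and n(k,t) is the number of vertices with in-degree k at that moment; the edge list is taken to be ordered in time. Names and the toy history are illustrative.

```python
from collections import Counter, defaultdict

def attachment_histogram(edge_events):
    """Accumulate N(t)/n(k,t) for the in-degree k of every edge target.

    edge_events: time-ordered list of (source, target) pairs.
    """
    indegree = {}               # current in-degree of every vertex seen so far
    n_with_degree = Counter()   # n(k, t): number of vertices with in-degree k
    histogram = defaultdict(float)

    def ensure(node):
        if node not in indegree:
            indegree[node] = 0
            n_with_degree[0] += 1

    for source, target in edge_events:
        ensure(source)
        ensure(target)
        k = indegree[target]    # in-degree just before the new edge arrives
        histogram[k] += len(indegree) / n_with_degree[k]
        n_with_degree[k] -= 1
        n_with_degree[k + 1] += 1
        indegree[target] = k + 1
    return dict(histogram)

toy_history = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c")]
toy_histogram = attachment_histogram(toy_history)
```

The weight removes the bias due to how many vertices of each degree exist: if targets were chosen uniformly at random the histogram would be flat in k, whereas linear preferential attachment makes it grow linearly with k.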
28
  • Preferential attachment

In our opinion the nature of this preferential attachment is effective rather than the real driving force of the phenomenon.
In other words, the linear preferential attachment can originate from a copying procedure (new vertices are introduced by copying old ones and keeping most of the edges). We could also have a sort of fitness for the various entries (but in this case one has a multidimensional series of quantities describing the importance of one page).
Apart from the interpretation, the data show a rather clear LINEAR PREFERENTIAL ATTACHMENT
29
  • Updates statistics

Other power laws related to the dynamics need to be explained. For example, the number of updates also follows a power law.
Each point represents the number of nodes (y axis) that were updated exactly x times.
30
  • Wikipedia growth model
  • We introduced an evolution rule, similar to other
    rewiring models already considered.
  • At each time step, a vertex is added to the
    network. It is connected to the existing
    vertices by M oriented edges; the direction of
    each edge is drawn at random:
  • with probability $R_1$ the edge leaves the new
    vertex, pointing to an existing one chosen with
    probability proportional to its in-degree;
  • with probability $R_2$ the edge points to the new
    vertex, and the source vertex is chosen with
    probability proportional to its out-degree;
  • finally, with probability $R_3 = 1 - R_1 - R_2$ the
    edge is added between existing vertices: the
    source vertex is chosen with probability
    proportional to the out-degree, while the
    destination vertex is chosen with probability
    proportional to the in-degree.

See for example Krapivsky, Rodgers and Redner,
PRL 86 5401 (2001)
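The evolution rule can be simulated along the following lines. This is a sketch under stated assumptions: the two-vertex seed graph is arbitrary, the preferential choices within one time step use the edges present at the beginning of that step, and self-loops are not excluded. Picking the target (or source) of a uniformly chosen edge selects a vertex with probability proportional to its in-degree (or out-degree).

```python
import random

def grow_network(steps, m, r1, r2, seed=0):
    """Simulate the growth rule; returns (number_of_vertices, edge_list)."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 0)]  # arbitrary seed graph (an assumption)
    n_vertices = 2
    for _step in range(steps):
        new_vertex = n_vertices
        n_vertices += 1
        old_count = len(edges)  # only pre-existing edges drive the choices
        for _edge in range(m):
            by_indegree = edges[rng.randrange(old_count)][1]
            by_outdegree = edges[rng.randrange(old_count)][0]
            draw = rng.random()
            if draw < r1:
                # Edge leaves the new vertex towards a high in-degree vertex.
                edges.append((new_vertex, by_indegree))
            elif draw < r1 + r2:
                # Edge points to the new vertex from a high out-degree vertex.
                edges.append((by_outdegree, new_vertex))
            else:
                # Edge between two existing vertices (probability R3).
                edges.append((by_outdegree, by_indegree))
    return n_vertices, edges

toy_n, toy_edges = grow_network(steps=1000, m=3, r1=0.026, r2=0.091)
```

Note that with strictly proportional rules a vertex of zero in-degree (or out-degree) can never be selected through that channel, so in this sketch it can only gain edges when it is the newly added vertex.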
31
  • Wikipedia growth model

From these data it seems that a model in the
spirit of BA could reproduce most of the features
of the system.
  • Actually
  • This network is oriented.
  • The preferential attachment in Wikipedia has a
    somewhat different nature. Here, most of the
    time, the edges are added between existing
    vertices, differently from the BA model. For
    instance, in the English version of Wikipedia a
    largely dominant fraction 0.883 of new edges is
    created between two existing pages, while a
    smaller fraction of edges points to or leaves a
    newly added vertex (0.026 and 0.091 respectively).

32
  • Wikipedia growth model

The model can be solved analytically
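We can use for the model the empirical values $R_1 = 0.026$, $R_2 = 0.091$, $R_3 = 0.883$, already measured for the English version of the Wikigraph.
$P(k_{in}) \propto k_{in}^{-\gamma_{in}}$, with $\gamma_{in} = 1 + \frac{1}{1 - R_2}$
$P(k_{out}) \propto k_{out}^{-\gamma_{out}}$, with $\gamma_{out} = 1 + \frac{1}{1 - R_1}$
$\gamma_{in} \simeq 2.100$, $\gamma_{out} \simeq 2.027$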
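As a quick arithmetic check, reading the exponents as gamma_in = 1 + 1/(1 - R2) and gamma_out = 1 + 1/(1 - R1) reproduces the quoted values from the empirical R1 and R2. The function name is illustrative.

```python
def degree_exponents(r1, r2):
    """Return (gamma_in, gamma_out) for the growth model."""
    gamma_in = 1.0 + 1.0 / (1.0 - r2)
    gamma_out = 1.0 + 1.0 / (1.0 - r1)
    return gamma_in, gamma_out

gamma_in, gamma_out = degree_exponents(r1=0.026, r2=0.091)
print(f"{gamma_in:.3f} {gamma_out:.3f}")  # prints "2.100 2.027"
```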
33
  • Wikipedia growth model

The model can be solved analytically
$K_{nn}^{in}(k_{in}) \sim M N^{1-R_1} \frac{R_1 R_2}{R_3}$ for $R_3 \neq 0$
$K_{nn}^{in}(k_{in}) \sim M R_1 R_2 \ln(N)$ for $R_3 = 0$
In both cases it is constant in $k_{in}$.
The value of the constant also depends upon the initial conditions. The two lines refer to two realizations of the model, where in one case the first 0.5% of the vertices has been removed.
34
  • Wikipedia growth model
  • We have a structure that resembles the bow-tie
    of the WWW
  • We have a power-law decay for the degree
    distributions and also
    a power-law decay for the number of updates
    of a page
  • Preferential attachment in the rewiring seems to
    be the driving force
    in the evolution of the system
  • The microscopic structure of rewiring is very
    different from that of the WWW
  • In principle a user can change any series of
    edges and add as many
    pages as wanted. Still, most of the quantities
    are similar

35
  • Wikipedia growth model

The finding that the PageRank of the pages is not related to the number of visits opens a very interesting scenario for further research work. Since, by definition, PageRank should give us the visit time of a page, and since it actually is completely independent of the number of visits, we wonder whether PageRank is a good measure of the authoritativeness of the pages in wikigraphs, and which modifications should be introduced in order to tune its performance.
36
  • River Networks

37
  • River Networks

38
  • River Networks

39
  • River Networks

40
  • River Networks

From satellite images one gets Digital Elevation
Models (DEM)
From DEM a spanning tree is computed (via
steepest descent)
From the spanning tree, the number of points
uphill is computed
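The three steps above can be sketched as follows on a small elevation grid. The sketch assumes 8 neighbours per cell, slopes measured as elevation drop divided by distance, and a drained area that counts the cell itself plus every cell uphill of it. The function name and the toy DEM are illustrative.

```python
import math

def drainage_areas(dem):
    """Return (downstream, area) for a DEM given as a list of rows.

    downstream maps each cell to the neighbour of steepest descent
    (None for an outlet or pit); area counts the cells draining through it.
    """
    rows, cols = len(dem), len(dem[0])
    downstream = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                        if slope > best_slope:
                            best, best_slope = (rr, cc), slope
            downstream[(r, c)] = best
    area = {cell: 1 for cell in downstream}
    # Visit cells from the highest to the lowest so that every cell has
    # collected all its uphill contributions before passing them on.
    by_height = sorted(downstream, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for cell in by_height:
        target = downstream[cell]
        if target is not None:
            area[target] += area[cell]
    return downstream, area

toy_dem = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 1],
]
toy_downstream, toy_area = drainage_areas(toy_dem)
```

Since every cell has at most one outgoing link and water only flows downhill, the links form a spanning forest, one tree per outlet; in the toy DEM all nine cells drain to the bottom-right corner.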
41
  • River Networks

HACK'S LAW $L_{\parallel} \propto A^{h}$
42
  • River Networks

43
  • River Networks

Data on Mars topography were collected through
the Mars Orbiter Laser Altimeter (MOLA)
44
  • River Networks

45
  • River Networks

46
  • River Networks

The result is that we can distinguish regions whose DEM networks have properties similar to river networks on Earth.
For rivers on Earth, $P(A) \propto A^{-1.43}$
47
THE LIQUIDITY MARKET
Monetary Policy
ECB
Reserves
Banks get liquidity from the ECB through auctions.
Monetary policy is carried out by the ECB to control interest rates.
BANKS MANAGE THEIR LIQUIDITY IN THE INTERBANK MARKET
48
The Market
Money Market
  • The EUROPEAN CENTRAL BANK provides LIQUIDITY to
    European banks through weekly auctions.
  • EVERY BANK must DEPOSIT with its NATIONAL CENTRAL BANK
    2% of all deposits and debts issued in the
    last two years. These reserves are supposed to
    help in the case of liquidity shocks.
  • The 2% value fluctuates in time and is recomputed
    every month.

Banks sell and buy liquidity to adjust their
liquidity needs and at the same time tend to
reduce the value of the reserve.
49
The Market
Market Data
The interbank markets are basically managed by each European country. These markets are in almost all cases phone-based, which means that each bank has some brokers doing their transactions by phone. The only exception is the Italian market, which is totally screen-based, implying that each bank's operator can see real-time quotes of all other banks and make transactions.
The recent paper by Boss et al. investigates the network of overall credit relationships in the Austrian interbank market. In their study the authors analyze all the liabilities for ten quarterly single-month periods, between 2000 and 2003, among 900 banks. They find a power-law distribution of contract sizes, and a power-law decay of the distribution of incoming and outgoing links (a link between two banks exists if the banks have an overall exposure with each other). Furthermore, they show that the most vulnerable vertices are those with the highest centrality (measured by the number of paths that go through them).
A different issue has been explored by Cocco et al., who have investigated the nature of lending relationships in the fragmented Portuguese interbank market over the period 1997-2001. In fragmented markets the amount and the interest rate of each loan are agreed on a one-to-one basis between borrowing and lending institutions. Other banks do not have access to the same terms, and no public information regarding the loan is available. The authors showed that frequent and repeated interactions between the same banks appear with a probability higher than that expected for random matching. In addition, they found that during illiquid periods, and in particular during the Russian financial crisis, preferential lending relationships increased.
50
The Market
Market Data
Italian Interbank Money Market: banks operating on the Italian market. This market has been fully electronic for interbank deposits since 1990 (e-Mid).
  • Daily volume: 18 billion euros
  • 200 participants
We report here the analysis of 196 Italian banks (plus 18 banks from abroad that interact with them), which made 85202 transactions in 2000.
51
INTRODUCTION
Time activity
Two time scales: the day and the one-month maintenance period
52
Statistical Properties
Market Data
The network shows a rather peculiar architecture: the banks form a disassortative network where large banks interact mostly with small ones.
53
Statistical Properties
Market Data
Actually the banks form different groups roughly
related to their size when considering the
average volume of money exchanged.
54
Statistical Properties
Degree Distributions
Using the latter quantity we can divide banks into four groups (the same number of classes as in the Bank of Italy classification): Group 1 with volume in the range 0-23 million euros per day, Group 2 in the range 23-70 million euros per day, Group 3 in the range 70-165 million euros per day, Group 4 over 165 million euros per day. In this way we find an overlap of more than 90% between the two classifications.
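The grouping by average daily volume amounts to a simple threshold lookup, sketched below. Whether a bank sitting exactly on a threshold belongs to the lower or the upper group is not stated; the sketch assigns it to the upper group. Names are illustrative.

```python
import bisect

# Thresholds in million euros per day, as quoted in the text.
GROUP_BOUNDS = (23.0, 70.0, 165.0)

def volume_group(daily_volume_meur):
    """Return the group (1 to 4) of a bank from its average daily volume."""
    return bisect.bisect_right(GROUP_BOUNDS, daily_volume_meur) + 1

toy_groups = [volume_group(v) for v in (10.0, 23.0, 100.0, 500.0)]
```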
55
Communities
Separation of business
Two main communities emerge: many small banks
and a few large banks.
Second eigenvector of the normal matrix
56
Modelling
Model of bank network
We assign to each of the N nodes (N is the size of the system) a value drawn from the previous distribution. The origin and destination vertices of an edge are chosen with a probability $p_{ij}$ proportional to the sum of their respective sizes $v_i$ and $v_j$. In formulas, $p_{ij} \propto v_i + v_j$.
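A minimal sketch of this rule, in which every ordered pair of distinct banks (i, j) is drawn with probability proportional to v_i + v_j. The sizes in the example are made up, repeated edges between the same pair are allowed, and the normalization over all ordered pairs is an assumption of the sketch.

```python
import random

def sample_bank_network(sizes, n_edges, seed=0):
    """Draw n_edges directed edges with p_ij proportional to sizes[i] + sizes[j]."""
    rng = random.Random(seed)
    pairs = [
        (i, j)
        for i in range(len(sizes))
        for j in range(len(sizes))
        if i != j
    ]
    weights = [sizes[i] + sizes[j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=n_edges)

toy_sizes = [1.0, 2.0, 50.0]  # hypothetical bank sizes
toy_links = sample_bank_network(toy_sizes, n_edges=1000)
share_with_large_bank = sum(1 for i, j in toy_links if 2 in (i, j)) / len(toy_links)
```

Because the weight is a sum rather than a product, a large bank attracts links from partners of any size, which is in line with the disassortative pattern where large banks trade mostly with small ones.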
57
Modelling
Market Data
58
MODELLING
Model and clustering
To quantify the agreement between experimental and simulated networks, we define an overlap parameter m specifying how well the model reproduces the observed clustering. We proceed in the following way. We define a matrix E, a weighted 4 × 4 matrix whose weights represent the number of connections between groups. In order to measure the overlap between the matrices obtained from the data and from the computer model, we define a distance based on the differences between the elements of the matrices.
59
MODELLING
Model and clustering
We can define a distance d between the number of intergroup edges in the experimental data and in the numerical simulation.
The sum of all elements is equal to $E_{tot}$ in both cases. Therefore the maximum possible difference is $2E_{tot}$. This happens when all the links are between two groups in one case and between two other groups in the other. We use this maximum value to normalize the above expression, and we then define the overlap parameter $m = 1 - d/(2E_{tot})$.
WE HAVE AN OVERLAP $m = 98\%$
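A sketch of the overlap computation. The explicit expression of the distance is not reproduced in this transcript, so the sketch assumes that d is the sum of the absolute differences between corresponding elements, counted once per pair of groups; this choice is consistent with the stated maximum of 2 Etot. The matrices in the example are made up.

```python
def overlap(e_data, e_model):
    """Return m = 1 - d / (2 * Etot) for two symmetric group matrices.

    Assumption: d = sum over group pairs g <= k of |E_data - E_model|.
    """
    size = len(e_data)
    pairs = [(g, k) for g in range(size) for k in range(g, size)]
    total_data = sum(e_data[g][k] for g, k in pairs)
    total_model = sum(e_model[g][k] for g, k in pairs)
    if total_data != total_model:
        raise ValueError("both networks must have the same number of edges")
    distance = sum(abs(e_data[g][k] - e_model[g][k]) for g, k in pairs)
    return 1.0 - distance / (2.0 * total_data)

same = overlap([[2, 3], [3, 5]], [[2, 3], [3, 5]])
opposite = overlap([[10, 0], [0, 0]], [[0, 0], [0, 10]])
```

Identical matrices give m = 1, while placing all edges in one pair of groups in the data and in a different pair in the model gives m = 0.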
60
MODELLING
Model and clustering
To evaluate the relevance of the division in classes, we have to compare the value of $E_{g,k}$ with the corresponding quantity $E^{null}_{g,k}$ for a network where there is no division in classes (null hypothesis). The analytical expression for the null case is $E^{null}_{g,k} = E_{tot}/10$, where 10 is the number of possible couplings between the 4 groups. The comparison between the two networks shows that a division in groups emerges in the real case. The table reports, for each possible combination of groups, the value $E_{g,k}/E_{tot}$ as a percentage. In the null case, each element of the same matrix should be equal to 10%.
Group    1    2    3    4
1        0    6    4    8
2        6    3    8   17
3        4    8    5   27
4        8   17   27   22
61
CONCLUSIONS
Market Data
  • Financial networks can help:
  • in distinguishing the behaviour of different markets
  • in visualizing important features such as the business
    role
  • in testing the validity of market models
  • They might be an example of scale-free networks
    even more general than those described by growth
    and preferential attachment.

62
CONCLUSIONS
Thanks to Giulie
Giulia Iori, Department of Economics, School of Social Science, City University, London, UK
Giulia De Masi, Dep. Economics, Università delle Marche, Italy