Power-Laws in Distributed Systems

Provided by: mikv

1
Power-Laws in Distributed Systems
  • Mikko Vapa
  • mikko.vapa_at_jyu.fi
  • TIES427 Distributed Systems

2
Contents
  • Network models
  • Stanley Milgram's studies on social networks
  • Normal distribution and random graphs
  • Power-law distribution and scale-free graphs
  • Systems with power-law properties
  • Fault-tolerance characteristics of random and
    scale-free distributed systems

3
Network Models: Why?
  • Network models make it possible to understand
    and predict the structure and behavior of a
    specific network
  • How do the distances between nodes grow as the
    network grows? Is it possible to keep the
    distance low in a large network by adding links
    at strategic points? Where?
  • How resilient is the network to failures of
    nodes and links? How many nodes can be removed
    from a connected network before it breaks down
    into isolated clusters?
  • What would be the best way to route messages from
    one node to another, if the structure of the
    network is known only locally?
  • Is it possible to identify, using only neighbor
    information, those nodes in a network (for
    example the WWW) that have important content or
    belong to a related group of nodes?
  • Orponen, P., Internet ja muut
    informaatioverkostot, Tieteen päivät 2005

4
Research Experiment on Social Networks
  • In 1967 Stanley Milgram conducted a social
    experiment to find out the distance between any
    two people in the United States
  • 160 people around the United States were
    selected as starting points and 2 people as
    destinations for letters (each destination
    identified by the person's name, photo and
    address)
  • The letters could only be passed from hand to
    hand between acquaintances

5
Six Degrees of Separation
  • 42 of the 160 letters arrived at the destination,
    and the median number of intermediate persons
    was 5.5
  • The result came to be known as Six Degrees of
    Separation
  • Considering the number of people in the USA
    (about 200 million at the time), the distance
    was considered very short
  • Hence the saying "It's a small world!"
  • Milgram S., The Small World Problem, Psychology
    Today 1(1), 60-67 (1967)

6
Six Degrees of Separation
  • The results of the study raised a fundamental
    question
  • Why is the distance so short?
  • The question was left unanswered for years
  • In the meantime, graph theory was making
    progress

7
Normal Distribution
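  • Traditional graph theory has been based on
    graphs whose link counts are normally
    distributed (also called bell curve graphs)
    (Erdős and Rényi, 1959; Solomonoff and
    Rapoport, 1951)

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$, where $\mu$ = mean and $\sigma$ = standard deviation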
8
Random Graphs
  • Bell curve graphs can be generated by connecting
    nodes with links at random
  • If the number of links per node in a graph
    follows a bell curve distribution, then
    • each node has nearly the same number of links
      (mean ± standard deviation)
    • the number of nodes in the network can be
      approximated by $n \approx k^d$, where $n$ =
      number of nodes, $k$ = average number of links
      per node and $d$ = network distance
    • thus the network distance between any two
      nodes is approximated by
      $d \approx \log n / \log k$
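
The distance approximation on this slide can be tried out numerically. The following is a minimal sketch, not part of the original slides; the function name `estimate_distance` and the sample values of n and k are assumptions chosen only for illustration.

```python
import math


def estimate_distance(n: int, k: float) -> float:
    """Approximate the network distance d from n ≈ k**d, i.e. d ≈ log n / log k."""
    if n < 1 or k <= 1:
        raise ValueError("need n >= 1 and k > 1")
    return math.log(n) / math.log(k)


# Hypothetical network sizes and average link counts, for illustration only.
for n, k in [(1_000, 10), (1_000_000, 10), (1_000_000, 100)]:
    print(f"n={n:>9,} k={k:>3} -> d ≈ {estimate_distance(n, k):.2f}")
```

Because the distance grows only with the logarithm of n, multiplying the number of nodes by a thousand adds just a few steps.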

9
Random Graphs
  • Many networks were modelled as random graphs
    • social networks
    • distributed systems (Internet, WWW)
    • virus infection pathways etc.
  • But the model was found to be inadequate for
    describing real-world networks, where links are
    not distributed evenly between nodes

10
Indications of Power-Law Structure in WWW
  • Albert, Jeong and Barabási found in 1999 that
    World Wide Web links do not follow a normal
    distribution
  • There are hubs that gather many links, and there
    are many web pages that are linked to only a few
    pages
  • The power-law structure of the WWW was
    discovered

11
Power-Law Distribution
  • The number of web pages with exactly $k$ incoming
    links, denoted by $N(k)$, follows
    $N(k) \sim k^{-\gamma}$, where the parameter
    $\gamma$ is the degree exponent
  • For WWW incoming links, $\gamma$ was found to be
    2.1
  • For outgoing links, $\gamma = 2.5$
  • To illustrate the difference between the bell
    curve and power-law distributions, let's compare
    them using highway and airport maps
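
To get a feeling for what the degree exponent means, the relation $N(k) \sim k^{-\gamma}$ can be evaluated for two degrees and compared. This is a small sketch under stated assumptions: the function names are hypothetical, the weights are left unnormalized, and only the exponents 2.1 and 2.5 come from the slide.

```python
def power_law_weight(k: int, gamma: float) -> float:
    """Unnormalized power-law weight k**(-gamma) for a degree k >= 1."""
    if k < 1:
        raise ValueError("degree k must be at least 1")
    return k ** (-gamma)


def relative_frequency(k_small: int, k_large: int, gamma: float) -> float:
    """How many times more pages have k_small links than k_large links."""
    return power_law_weight(k_small, gamma) / power_law_weight(k_large, gamma)


GAMMA_IN = 2.1   # degree exponent for incoming links (from the slide)
GAMMA_OUT = 2.5  # degree exponent for outgoing links (from the slide)

print("incoming, 10 vs 100 links:", relative_frequency(10, 100, GAMMA_IN))
print("outgoing, 10 vs 100 links:", relative_frequency(10, 100, GAMMA_OUT))
```

With an exponent of 2.1, pages with 10 incoming links are roughly 126 times as common as pages with 100 incoming links. Heavily linked pages are rare, but far less rare than a bell curve would allow.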

12
Bell Curve and Power-Law Distributions
13
Bell Curve and Power-Law Distributions
  • A power-law distribution has many nodes with
    only a few links and a few nodes with many links
  • This characteristic is also called the 80/20
    rule (for example, 20% of customers bring 80% of
    net sales)

14
Scale-Free Model
  • To understand how these kinds of networks form
    and behave, a scale-free network model was
    developed
  • Note that already in 1955 Simon described the
    Matthew effect as a rule that scientific credit
    does not go to the person who proposes the new
    results but to the person who has the most
    influence in the network, and in 1965 De Solla
    Price interpreted this as a cumulative advantage
    principle
  • The scale-free model now provided a tool for
    analysing such behavior

15
Scale-Free Model
  • A scale-free link distribution follows a power
    law
    • the proportion of nodes having a given number
      of links $n$ is $P(n) \propto 1/n^k$, where
      $k$ is the degree exponent
    • it has no term related to the size of the
      network (no characteristic scale, unlike a
      random graph)
    • hence the name scale-free
    • most nodes have only a few connections
    • some have a lot of links
    • important for binding disparate regions
      together
    • guarantees short paths between nodes in the
      network
    • guarantees multiple paths between any two
      nodes

16
Scale-Free Model
  • The model uses growth and preferential attachment
    to generate the network
  • Each new node is connected to two existing nodes
  • The two nodes are selected based on their number
    of links: node $i$ with $k_i$ links is chosen
    with probability $p_i = k_i / \sum_j k_j$
  • The nodes that are early in the network acquire
    most of the neighbours (the rich-get-richer
    principle)
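
The growth and preferential attachment steps above can be sketched in a few lines. This is an illustrative sketch, not the model's reference implementation: the function name `grow_scale_free`, the starting pair of linked nodes and the seed value are assumptions, and drawing repeatedly until two distinct targets are found is a simplification of the selection rule.

```python
import random


def grow_scale_free(num_nodes: int, seed=None) -> dict:
    """Grow a network in which every new node links to two existing nodes.

    Targets are drawn from a list holding one entry per link endpoint, so
    a node is picked with probability proportional to its number of links.
    Returns a dict mapping each node to the set of its neighbours.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    adjacency = {0: {1}, 1: {0}}  # assumption: start from two linked nodes
    endpoints = [0, 1]            # one entry per link endpoint
    for new_node in range(2, num_nodes):
        targets = set()
        while len(targets) < 2:   # redraw until two distinct targets are found
            targets.add(rng.choice(endpoints))
        adjacency[new_node] = set()
        for target in sorted(targets):
            adjacency[new_node].add(target)
            adjacency[target].add(new_node)
            endpoints.extend([new_node, target])
    return adjacency


network = grow_scale_free(50, seed=1)
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
print("five highest degrees:", degrees[:5])
print("nodes with exactly two links:", degrees.count(2))
```

Running it with different seeds typically shows a handful of early nodes collecting many links while most later nodes keep only the two they arrived with.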

17
Scale-Free Model
  • First there are two nodes (A and B), and the
    third one (C) connects to both of them
  • The fourth one (D) connects to each existing node
    with a probability of (endpoints of the
    node)/(all endpoints) = 2/6
  • The fifth one (E) connects with a probability of
    3/10 to A and to B and with 2/10 to C and to D,
    and so on
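
The fractions in this example can be checked by counting link endpoints. The sketch below assumes that A and B start out linked to each other and that D happens to attach to A and B; both assumptions are inferred from the probabilities 2/6 and 3/10 rather than stated in the transcript, and the function name is hypothetical.

```python
from fractions import Fraction


def attachment_probabilities(edges):
    """Map each node to (its link endpoints) / (all link endpoints)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    total = sum(degree.values())
    return {node: Fraction(count, total) for node, count in degree.items()}


# Assumption: A and B are linked, and C has joined both of them.
edges = [("A", "B"), ("C", "A"), ("C", "B")]
print("seen by D:", attachment_probabilities(edges))

# Assumption: D attached to A and B.
edges += [("D", "A"), ("D", "B")]
print("seen by E:", attachment_probabilities(edges))
```

`Fraction` prints values in lowest terms, so 2/6 appears as 1/3 and 2/10 as 1/5.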

[Figure: successive snapshots of the growing network with nodes A to E]
18
Scale-Free Network
  • Scale-free network of 50 nodes (1, 2, 3 and 4 are
    the rich ones)

19
Systems with Power-Law Properties
  • Surprisingly many systems follow a power law, for
    example
    • Internet (intra-domain routing and inter-domain
      routing topologies)
    • World Wide Web
    • Peer-to-Peer networks (Gnutella, Freenet)
    • E-mail users
    • Telephone call graphs
    • Molecules and chemical reactions in living
      organisms (H2O, ATP, ADP and CO2 molecules as
      hubs)

20
Internet
  • The power-law characteristics of the Internet
    were found in 1999
  • The power-law topology applies at both the
    router level and the autonomous system level
  • Faloutsos M., Faloutsos P. and Faloutsos C., On
    power-law relationships of the Internet
    topology, Computer Communication Review 29(4),
    251-262 (1999)

21
Internet
  • Router level and inter-domain level (autonomous
    systems of the Border Gateway routing protocol)
  • Knowing the structure of the Internet, the
    average number of links could be estimated for
    the router level ($\langle k \rangle = 3.5$) and
    the inter-domain level
    ($\langle k \rangle = 2.6$), as well as the
    diameter (router level $d = 9$, autonomous
    systems $d = 4$)

22
Internet
  • The degree distribution of the Internet's
    autonomous systems follows a power law with
    exponent -2.16 (and the router level with
    exponent -2.48)
  • The number of nodes having degree $k$ is
    proportional to $1/k^{2.16}$
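
On a log-log plot a power law becomes a straight line whose slope is the exponent. The sketch below estimates that slope with an ordinary least-squares fit; the function name and the synthetic counts are assumptions made for illustration, and a plain log-log fit is only a rough estimator when applied to noisy measured data.

```python
import math


def fit_degree_exponent(degree_counts: dict) -> float:
    """Least-squares slope of log(count) against log(degree)."""
    points = [(math.log(k), math.log(c))
              for k, c in degree_counts.items() if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two (degree, count) pairs")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct degrees")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic, made-up counts that follow 1/k^2.16 exactly (not measured data).
synthetic = {k: 10_000 * k ** -2.16 for k in range(1, 101)}
print("fitted slope:", round(fit_degree_exponent(synthetic), 2))
```

On these synthetic counts the fit recovers the slope the data was generated with, which is all the example is meant to show.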

[Figure: log(number of nodes) plotted against log(degree), slope -2.16. Orponen, 2005]
24
Web
  • Because WWW hyperlinks also form a directed
    power-law network, a diameter of 19 hops between
    any two documents could be calculated and the
    average number of outgoing links estimated as
    $\langle k \rangle = 7$
  • Albert R., Jeong H., Barabási A.-L., Diameter
    of the World Wide Web, Nature 401, 130-131
    (1999)

25
Web
  • The number of web pages with $k$ outgoing links
    is proportional to $1/k^{2.4}$
  • The number of web pages with $k$ web pages
    pointing at them is proportional to $1/k^{2.1}$

Orponen, 2005
26
Web
  • Note that because of the directed nature of web
    links, the system also has IN and OUT continents
    and a Central Core, the Strongly Connected
    Component (SCC)
  • The IN continent is hard for search engines to
    index
  • Broder et al., 1999
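
The IN, OUT and SCC parts can be separated with two reachability searches from a page known to lie in the central core. The following is a minimal sketch on a made-up toy graph; the function names, the page names and the choice of starting page are all assumptions for illustration.

```python
from collections import deque


def reachable(graph: dict, start) -> set:
    """All nodes that can be reached from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph: dict) -> dict:
    """The same graph with every link pointing the other way."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, set())
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def bow_tie(graph: dict, core_node) -> dict:
    """Split the graph into SCC, IN and OUT relative to core_node's component."""
    forward = reachable(graph, core_node)             # pages the core can reach
    backward = reachable(reverse(graph), core_node)   # pages that reach the core
    scc = forward & backward
    return {"SCC": scc, "IN": backward - scc, "OUT": forward - scc}


# Hypothetical toy web: page -> pages it links to.
web = {
    "in1": {"a"},
    "a": {"b"},
    "b": {"c"},
    "c": {"a", "out1"},
    "out1": set(),
}
print(bow_tie(web, "a"))
```

Nothing links to the page `in1`, so a crawler that starts inside the core and follows links forward never reaches it, which illustrates why the IN continent is hard to index.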

27
Web
28
Web
  • Network growth models (for example the
    scale-free model) explain well the average
    degree, degree distribution and diameter
  • However, they do not yet explain clustering:
    neighbor nodes are usually also neighbors of
    each other
  • No simple model yet exists for explaining this
    behavior
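
Clustering in this sense is commonly measured with the local clustering coefficient: the share of a node's neighbour pairs that are themselves linked. The sketch below uses that standard measure on two made-up toy networks; it is not taken from the slides, and the function names are assumptions.

```python
from itertools import combinations


def local_clustering(adjacency: dict, node) -> float:
    """Share of the node's neighbour pairs that are linked to each other."""
    neighbours = adjacency[node]
    if len(neighbours) < 2:
        return 0.0  # convention chosen here for nodes with fewer than two neighbours
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2)
                       if b in adjacency[a])
    possible_pairs = len(neighbours) * (len(neighbours) - 1) / 2
    return linked_pairs / possible_pairs


def average_clustering(adjacency: dict) -> float:
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Hypothetical toy networks.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print("triangle:", average_clustering(triangle))
print("star:", average_clustering(star))
```

In the triangle every pair of neighbours is linked, while in the star none of the hub's neighbours know each other, so the two networks sit at opposite ends of the scale.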

Orponen, 2005
29
Systems and Their Degree Exponents
  • Different kinds of systems can be compared using
    their degree exponents

30
Other Characteristics
  • In addition to being searchable (Milgram's
    experiment) and having a low network diameter,
    a power-law structure also has fault-tolerance
    benefits
  • For example, resilient systems should remain
    stable even when a high number of connections
    are broken
  • This is a useful property in distributed systems
31
Fault-Tolerance
  • Two types of failure scenarios
    • random failures, where the destroyed node is
      selected randomly
    • targeted attacks, where the most highly
      connected node is selected for destruction
  • Scale-free networks are resilient to random
    failures but fragile under attacks (the system
    has an Achilles' heel)
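
The two failure scenarios can be compared on a deliberately extreme toy network, a single hub with nine leaves. This is an illustrative sketch only: the function names, the star-shaped network and the seed are assumptions, and a real study would repeat the removal over many nodes on a large scale-free graph.

```python
import random


def remove_node(adjacency: dict, node) -> dict:
    """Return a copy of the network without the node and its links."""
    return {n: nbrs - {node} for n, nbrs in adjacency.items() if n != node}


def largest_cluster_size(adjacency: dict) -> int:
    """Number of nodes in the largest connected cluster."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def random_failure(adjacency: dict, rng: random.Random) -> dict:
    """Destroy one node chosen uniformly at random."""
    return remove_node(adjacency, rng.choice(sorted(adjacency)))


def targeted_attack(adjacency: dict) -> dict:
    """Destroy the node with the most links."""
    hub = max(adjacency, key=lambda n: len(adjacency[n]))
    return remove_node(adjacency, hub)


# Hypothetical toy network: hub 0 linked to nine otherwise unconnected nodes.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
rng = random.Random(0)
print("after random failure:", largest_cluster_size(random_failure(star, rng)))
print("after targeted attack:", largest_cluster_size(targeted_attack(star)))
```

A random failure hits a leaf nine times out of ten and leaves the rest connected, whereas removing the hub splits the network into isolated nodes.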

32
Fault-Tolerance
  • In scale-free networks, the few hubs play an
    important role in communication
  • Albert et al. (2000): the average distance
    between nodes under random failures and targeted
    attacks

[Figure: average distance between nodes against the amount of removed nodes, for targeted attacks and random failures. Orponen, 2005]
33
Random Failures and Attacks
  • Random and scale-free networks under different
    failure scenarios

[Figure: y = network diameter, x = fraction of nodes destroyed. Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks]
34
Random Failures and Attacks
  • A random network has a critical point ($f_c$)
    under both failure scenarios
  • A scale-free network has one only under attacks
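
The two quantities plotted for this slide, the relative size S of the largest cluster and the average size ⟨s⟩ of the isolated clusters, can be computed from the cluster sizes of a damaged network. The sketch below is an assumption-laden illustration: the function names are hypothetical, the toy chain is made up, and returning 0.0 for ⟨s⟩ when no isolated clusters exist is a convention chosen here.

```python
def cluster_sizes(adjacency: dict) -> list:
    """Sizes of all connected clusters, largest first."""
    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def fragmentation(adjacency: dict, original_size: int) -> tuple:
    """Return (S, <s>): relative largest cluster and mean isolated cluster size."""
    sizes = cluster_sizes(adjacency)
    if not sizes:
        return 0.0, 0.0
    isolated = sizes[1:]  # every cluster except the largest one
    relative_largest = sizes[0] / original_size
    average_isolated = sum(isolated) / len(isolated) if isolated else 0.0
    return relative_largest, average_isolated


# Hypothetical toy network: the chain 0-1-2-3-4-5 after node 2 was destroyed.
damaged_chain = {0: {1}, 1: {0}, 3: {4}, 4: {3, 5}, 5: {4}}
print(fragmentation(damaged_chain, original_size=6))
```

Tracking these two values while removing a growing fraction of nodes is one way to look for the critical point: S collapses and ⟨s⟩ peaks around it.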

[Figure: y1 = average size of the isolated clusters ⟨s⟩, y2 = relative size of the largest cluster to all nodes S, x = fraction of nodes destroyed. Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, 2000]
35
Fault-Tolerance: Internet and WWW
  • Internet and WWW failures follow the same pattern
    as scale-free networks

[Figure: y = network diameter, x = fraction of nodes destroyed. Source: R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks]
36
Summary
  • Many phenomena have been found to follow a
    power-law curve
  • Power laws are common in distributed systems
    (the result of natural processes: growth and
    preferential attachment)
  • Work is ongoing to develop algorithms that
    utilize these properties
  • Reference: Barabási, A.-L., Linked: The New
    Science of Networks, Perseus Publishing, 2002