Title: RTG: A Recursive Realistic Graph Generator using Random Typing
1RTG: A Recursive Realistic Graph Generator using Random Typing
- Leman Akoglu and Christos Faloutsos
- Carnegie Mellon University
2Outline
- Motivation
- Problem Definition
- Related Work
- A Little History
- Proposed Model
- Experimental Results
- Conclusion
3Motivation - 1
- Complex graphs (WWW, computer, biological, social networks, etc.) exhibit many common properties:
- - power laws
- - small and shrinking diameter
- - community structure
- How can we produce synthetic but realistic graphs?
Image source: http://www.aharef.info/static/htmlgraph/
4Motivation - 2
- Why do we need synthetic graphs?
- Simulation
- Sampling/Extrapolation
- Summarization/Compression
- Motivation to understand pattern-generating processes
5Problem Definition
- Discover a graph generator that is:
- G1. simple: the more intuitive, the better!
- G2. realistic: outputs graphs that obey all "laws"
- G3. parsimonious: requires few parameters
- G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs
- G5. fast: generation should take time linear in the size of the output graph
7Related Work
- 1. Graph Properties
- → What we want to match
- 2. Graph Generators
- → What has been proposed earlier
8Related Work 1: Graph Properties
9Related Work 2: Graph Generators
- Erdős-Rényi (ER) model [Erdős, Rényi '60]
- Small-world model [Watts, Strogatz '98]
- Preferential Attachment [Barabási, Albert '99]
- Winners don't take all [Pennock et al. '02]
- Forest Fire model [Leskovec, Faloutsos '05]
- Butterfly model [McGlohon et al. '08]
10Related Work 2: Graph Generators
- Model some static graph property
- Neglect dynamic properties
- Cannot produce weighted graphs
- Erdős-Rényi (ER) model [Erdős, Rényi '60]
- Small-world model [Watts, Strogatz '98]
- Preferential Attachment [Barabási, Albert '99]
- Winners don't take all [Pennock et al. '02]
- Forest Fire model [Leskovec, Faloutsos '05]
- Butterfly model [McGlohon et al. '08]
11Related Work 2: Graph Generators
- Random dot-product graphs [Kraetzl, Nickel '05; Young, Scheinerman '07]
- Utility-based models [Fabrikant et al. '02; Even-Bar et al. '07; Laoutaris '08]
- Kronecker graphs [Leskovec et al. '07; Akoglu et al. '08]
12Related Work 2: Graph Generators
- Produces only undirected graphs
- Cannot produce weighted graphs
- Requires quadratic time
- Random dot-product graphs [Kraetzl, Nickel '05; Young, Scheinerman '07]
- Utility-based models [Fabrikant et al. '02; Even-Bar et al. '07; Laoutaris '08]
- Kronecker graphs [Leskovec et al. '07; Akoglu et al. '08]
14Related Work 2: Graph Generators
- Produces only undirected graphs
- Cannot produce weighted graphs
- Requires quadratic time
- Random dot-product graphs [Kraetzl, Nickel '05; Young, Scheinerman '07]
- Utility-based models [Fabrikant et al. '02; Even-Bar et al. '07; Laoutaris '08]
- Kronecker graphs [Leskovec et al. '07; Akoglu et al. '08]
- Multinomial/lognormal distributions
- Fixed number of nodes
16A Little History - 1
- Zipf, 1932
- In many natural languages, the rank r and the frequency f_r of words follow a power law: f_r ∝ 1/r
17A Little History - 2
- Mandelbrot, 1953
- Humans optimize the average information per unit transmission cost.
18A Little History - 2
- Miller, 1957
- A monkey types randomly on a keyboard
- → The distribution of words follows a power law.
19A Little History - 2
- Conrad and Mitzenmacher, 2004
- The same relation still holds when keys have unequal probabilities.
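The random-typing idea on the preceding history slides can be tried out with a few lines of Python. This is a minimal sketch, not code from the talk: the function names `type_words` and `rank_frequency`, the two-letter keyboard, and the choice to discard empty words (a space typed first) are all assumptions made for illustration.

```python
import random
from collections import Counter

def type_words(num_words, keys="ab", q=0.2, key_probs=None, seed=0):
    """Simulate random typing and return a list of non-empty words.

    Each keystroke is the space key with probability q; otherwise it is one
    of `keys`, drawn uniformly or with the relative weights in `key_probs`.
    """
    rng = random.Random(seed)
    letters = list(keys)
    weights = key_probs if key_probs is not None else [1.0] * len(letters)
    words = []
    while len(words) < num_words:
        chars = []
        while rng.random() >= q:  # keep typing until the space key is hit
            chars.append(rng.choices(letters, weights=weights)[0])
        if chars:  # assumption: a space typed first produces no word
            words.append("".join(chars))
    return words

def rank_frequency(words):
    """Return word frequencies sorted by rank (most frequent first)."""
    return sorted(Counter(words).values(), reverse=True)

if __name__ == "__main__":
    # Unequal key probabilities, as in the Conrad and Mitzenmacher setting.
    freqs = rank_frequency(type_words(20_000, key_probs=[0.7, 0.3]))
    for rank in (1, 2, 4, 8, 16, 32):
        print(rank, freqs[rank - 1])
```

Plotting the printed rank and frequency pairs on log-log axes is one way to eyeball whether the relation looks like a power law for a given choice of q and key probabilities.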
21Preliminary Model 1: RTG-IE (RTG with Independent Equiprobable keys)
22Preliminary Model 1: RTG-IE (RTG with Independent Equiprobable keys)
- Lemma 1. W is super-linear on N (power law)
- Lemma 2. W is super-linear on E (power law)
- Lemma 3. In(out)-weight W_n of node n is super-linear on in(out)-degree d_n (power law)
Please find the proofs in the paper.
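A small sketch of how RTG-IE could be simulated, so that N, E and W in the lemmas can be measured empirically. It assumes each multi-edge is produced by typing two independent words (source label, then destination label) on a keyboard with k equiprobable keys and a space key hit with probability q. The names `random_label`, `rtg_ie` and `graph_stats` are hypothetical, and keeping the empty label as an ordinary node is an assumption of this sketch.

```python
import random
from collections import Counter

def random_label(rng, k, q):
    """Type one word: stop at the space key (prob. q), else pick 1 of k keys."""
    label = []
    while rng.random() >= q:
        label.append(rng.randrange(k))  # equiprobable keys 0..k-1
    return tuple(label)  # assumption: the empty label () is a valid node

def rtg_ie(W, k=5, q=0.2, seed=0):
    """Generate W multi-edges; return a Counter {(src, dst): weight}."""
    rng = random.Random(seed)
    edges = Counter()
    for _ in range(W):
        src = random_label(rng, k, q)
        dst = random_label(rng, k, q)
        edges[(src, dst)] += 1  # repeated edges accumulate weight
    return edges

def graph_stats(edges):
    """Return (N, E, W): distinct nodes, distinct edges, total weight."""
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    return len(nodes), len(edges), sum(edges.values())

if __name__ == "__main__":
    for W in (1_000, 10_000, 100_000):
        print(W, graph_stats(rtg_ie(W)))
```

Running the loop at increasing W gives (N, E, W) triples that can be plotted against each other on log-log axes to compare with the relations stated in the lemmas.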
23Graph Properties
24Preliminary Model 1: RTG-IE (RTG with Independent Equiprobable keys)
- Lemma 1. W is super-linear on N (power law)
- Lemma 2. W is super-linear on E (power law)
- Lemma 3. In(out)-weight W_n of node n is super-linear on in(out)-degree d_n (power law)
- Corresponding laws: L05. Densification PL; L11. Weight PL; L10. Snapshot PL
Please find the proofs in the paper.
25Advantages of the Preliminary Model 1
- G1 - Intuitive
- G1 - Easy to implement
- G2 - Realistic: provably follows several rules
- G3 - Handful of parameters: k, q, W
- G5 - Fast: generating a random sequence of characters
26Problems of the Preliminary Model 1
- 1. Multinomial degree distributions
27Problems of the Preliminary Model 1
- 2. No homophily, no community structure
- → Node i connects to any node j with prob. d_i d_j independently, rather than connecting to similar nodes.
28Preliminary Model 2: RTG-IU (RTG with Independent Un-equiprobable keys)
Solution to Problem 1 [Conrad and Mitzenmacher, 2004]
29Proposed Model: RTG (Random Typing Graphs)
- Solution to Problem 2: 2D keyboard
- Generate source-destination labels in one shot.
- Pick one of the nine keys randomly.
30Proposed Model: RTG (Random Typing Graphs)
- Solution to Problem 2: 2D keyboard
- Repeat recursively.
- Terminate each label when the space key is typed on each dimension (dark blue).
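31Proposed Model: RTG (Random Typing Graphs)
- Solution to Problem 2: 2D keyboard
- How do we choose the keys?
- Independent model does not yield community structure!
Key probabilities under the independent model (keys a, b and space S on each dimension):
        a          b          S
a    p_a p_a    p_a p_b    p_a q
b    p_b p_a    p_b p_b    p_b q
S    q p_a      q p_b      q q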
32Proposed Model: RTG (Random Typing Graphs)
- Solution to Problem 2: 2D keyboard
- Boost the probability of diagonal keys and decrease the probability of off-diagonal ones (0 < β < 1: imbalance factor)
33Proposed Model: RTG (Random Typing Graphs)
- Solution to Problem 2: 2D keyboard
- Boost the probability of diagonal keys and decrease the probability of off-diagonal ones (0 < β < 1: imbalance factor)
- Favoring of diagonal keys creates homophily.
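The 2D-keyboard idea can be sketched as follows. The slides do not give the exact formula for the boosted and reduced probabilities, so this sketch assumes one natural choice: off-diagonal cells get β times their independent-model probability, and each diagonal cell absorbs the remainder so that every row still sums to the original key probability. It also assumes equiprobable letter keys and that, once one label has hit the space key, further keystrokes on that dimension are ignored until the other label terminates. The names `keyboard_2d` and `rtg_edge` are hypothetical.

```python
import random

def keyboard_2d(k, q, beta):
    """Build a (k+1)x(k+1) matrix of key-pair probabilities.

    Index k is the space key. Off-diagonal cells are beta * p_i * p_j
    (assumed form); diagonal cells take the rest of each row's mass p_i.
    """
    p = [(1.0 - q) / k] * k + [q]
    n = k + 1
    P = [[beta * p[i] * p[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        P[i][i] = p[i] - sum(P[i])  # boosted diagonal keeps row sum at p_i
    return P

def rtg_edge(rng, P):
    """Type one (source, destination) label pair on the 2D keyboard."""
    n = len(P)
    space = n - 1
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [P[i][j] for i, j in cells]
    src, dst = [], []
    src_done = dst_done = False
    while not (src_done and dst_done):
        i, j = rng.choices(cells, weights=weights)[0]
        if not src_done:
            if i == space:
                src_done = True
            else:
                src.append(i)
        if not dst_done:
            if j == space:
                dst_done = True
            else:
                dst.append(j)
    return tuple(src), tuple(dst)

if __name__ == "__main__":
    rng = random.Random(0)
    P = keyboard_2d(k=2, q=0.2, beta=0.3)
    print([rtg_edge(rng, P) for _ in range(5)])
```

Under this assumed form, β = 1 gives back the independent model, and smaller β pushes more probability onto the diagonal, so source and destination labels tend to share prefixes.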
34Proposed Model
- Parameters
- k: Number of keys
- q: Probability of hitting the space key S
- W: Number of multi-edges in output graph G
- β: Imbalance factor
35Proposed Model
Up to this point, we discussed directed, weighted and unipartite graphs. Generalizations:
- Undirected graphs: ignore edge directions; edge generation is symmetric.
- Unweighted graphs: ignore duplicate edges.
- Bipartite graphs: use different key sets for source and destination labels.
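The generalizations above amount to simple post-processing of a directed, weighted edge list. The sketch below assumes the graph is stored as a `Counter` mapping (source, destination) label pairs to weights. The helper names are hypothetical, and `to_bipartite` tags each label with its side, which is an assumed stand-in for typing source and destination labels on disjoint key sets.

```python
from collections import Counter

def to_undirected(edges):
    """Ignore edge directions: merge (u, v) and (v, u) into one edge."""
    out = Counter()
    for (u, v), w in edges.items():
        key = (u, v) if u <= v else (v, u)  # canonical endpoint order
        out[key] += w
    return out

def to_unweighted(edges):
    """Ignore duplicate edges: keep only the set of distinct edges."""
    return set(edges)

def to_bipartite(edges):
    """Keep source and destination nodes in separate namespaces
    (assumption: equivalent in effect to disjoint key sets)."""
    return Counter({(("src", u), ("dst", v)): w
                    for (u, v), w in edges.items()})

if __name__ == "__main__":
    g = Counter({("a", "b"): 2, ("b", "a"): 3, ("a", "a"): 1})
    print(to_undirected(g))
    print(to_unweighted(g))
    print(to_bipartite(g))
```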
37Experimental Results
- How does RTG model real graphs?
- Blognet: a social network of blogs based on citations
- → undirected, unweighted and unipartite
- → N = 27,726, E = 126,227, over 80 time ticks
- Com2Cand: the U.S. electoral campaign donations network from organizations to candidates
- → directed, weighted ($ amounts) and bipartite
- → N = 23,191, E = 877,721, W = 4,383,105,580, over 29 time ticks
38Experimental Results
[Plots: count vs. degree]
L01. Power-law degree distribution [Faloutsos et al. '99; Kleinberg et al. '99; Chakrabarti et al. '04; Newman '04]
39Experimental Results
[Plots: count vs. triangles]
L02. Triangle Power Law (TPL) [Tsourakakis '08]
40Experimental Results 1
[Plots: λ_rank vs. rank]
L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
41Graph Properties
42Experimental Results 1
[Plots: edges vs. nodes]
L05. Densification Power Law (DPL) [Leskovec et al. '05]
43Experimental Results
L06. Small and shrinking diameter [Albert and Barabási '99; Leskovec et al. '05]
44Experimental Results
[Plots: size vs. time]
L07. Constant size 2nd and 3rd connected components [McGlohon et al. '08]
45Experimental Results 1
[Plots: λ1 vs. edges]
L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. '08]
46Experimental Results 1
[Plots: entropy vs. resolution]
L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98; Gribble et al. '98; Crovella and Bestavros '99; McGlohon et al. '08]
47Graph Properties
48Experimental Results 2
[Plots: diameter vs. time; size vs. time]
49Experimental Results 2
[Plots: λ1 vs. edges; λ_rank vs. rank]
50Experimental Results 2
[Plots: count vs. in-degree; entropy vs. resolution]
51Experimental Results 2
[Plots: in-weight ($ amount) vs. in-degree (checks)]
L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
52Experimental Results 2
[Plots: total weight vs. edges]
L11. Weight Power Law (WPL) [McGlohon et al. '08]
53Graph Properties
54Experimental Results
- On modularity [Girvan and Newman '02]
- Modularity decreases with increasing β, so smaller β gives more community structure
- No significant modularity for RTG-IE
55Graph Properties
56Experimental Results
- Computation time grows linearly with increasing W
- 2M multi-edges in 7 seconds
[Plot: time (ms) vs. multi-edges]
58Conclusion 1
- Our model is:
- G1. simple and intuitive: few lines of code
- G2. realistic: generates graphs that obey all eleven properties of real graphs
- G3. parsimonious: only a handful of parameters
- G4. flexible: can generate weighted/unweighted, directed/undirected, unipartite/bipartite graphs and any combination of those
- G5. fast: linear in the size of the output graph
59Conclusion 2
- We showed that RTG mimics real graphs well.
60Contact
Leman Akoglu: www.cs.cmu.edu/lakoglu, lakoglu_at_cs.cmu.edu
Christos Faloutsos: www.cs.cmu.edu/christos, christos_at_cs.cmu.edu
61A Little History - 3
- The infinite monkey theorem
- A monkey typing randomly on a keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.
62Proposed Model
- Burstiness and Self-similarity
- If each step is a time tick, weight additions are uniform!
- Start with a uniform interval
- Recursively subdivide weight additions to each half, quarter, and so on, according to the bias b > 0.5
- A b-fraction of the additions happens in one half and the remainder in the other.
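The recursive subdivision described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `b_model`, the rounding of counts to integers, and the random choice of which half receives the b-fraction are all choices made here.

```python
import random

def b_model(total, levels, b=0.8, seed=0):
    """Split `total` additions over 2**levels time ticks.

    At each level every interval is halved; a b-fraction of its additions
    goes to one half (chosen at random, an assumption) and the rest to the
    other half.
    """
    rng = random.Random(seed)
    counts = [total]
    for _ in range(levels):
        nxt = []
        for c in counts:
            big = round(c * b)   # b-fraction of this interval's additions
            small = c - big      # remainder, so the total is preserved
            if rng.random() < 0.5:
                nxt.extend([big, small])
            else:
                nxt.extend([small, big])
        counts = nxt
    return counts

if __name__ == "__main__":
    ticks = b_model(total=1000, levels=4, b=0.8)
    print(ticks, sum(ticks))
```

With b = 0.5 the additions stay close to uniform across ticks; as b approaches 1 they concentrate in a few ticks, which is the bursty behavior the slide refers to.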
63Related Work: Graph Properties
Static, unweighted:
- L01. Power-law degree distribution [Faloutsos et al. '99; Kleinberg et al. '99; Chakrabarti et al. '04; Newman '04]
- L02. Triangle Power Law (TPL) [Tsourakakis '08]
- L03. Eigenvalue Power Law (EPL) [Siganos et al. '03]
- L04. Community structure [Flake et al. '02; Girvan and Newman '02]
Static, weighted:
- L10. Snapshot Power Law (SPL) [McGlohon et al. '08]
Dynamic, unweighted:
- L05. Densification Power Law (DPL) [Leskovec et al. '05]
- L06. Small and shrinking diameter [Albert and Barabási '99; Leskovec et al. '05]
- L07. Constant size 2nd and 3rd connected components [McGlohon et al. '08]
- L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. '08]
- L09. Bursty/self-similar edge/weight additions [Gomez and Santonja '98; Gribble et al. '98; Crovella and Bestavros '99; McGlohon et al. '08]
Dynamic, weighted:
- L11. Weight Power Law (WPL) [McGlohon et al. '08]