A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee

1
Measurement and Analysis of Online Social Networks

A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee

Presentation by Shahan Khatchadourian
Supervisor: Prof. Mariano P. Consens
2
Focus

graphs of online social networks
how they were obtained
how they were verified
how measurement and analysis was performed
properties of obtained graphs
why these properties are relevant

3
Why study the graphs?

important for improving existing systems and developing
new applications
information search
trusted users
what is the structure of online social networks?
what are different ways to examine a social
network when complete data is not available?
how do they compare with each other and with the
Web?

4
Which graphs?

Flickr, YouTube, LiveJournal, and Orkut
All are directed except for Orkut
Weakly Connected Component (WCC)
Strongly Connected Component (SCC)
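
To make the two component notions on this slide concrete, here is a minimal sketch that computes weakly and strongly connected components of a small directed graph. The edge-list input, the function names, and the toy graph are assumptions made for illustration; Kosaraju's algorithm is used for the SCCs simply because it is compact, not because the slides name it.

```python
from collections import defaultdict

def weakly_connected_components(edges):
    """Components when link direction is ignored (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

def strongly_connected_components(edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            comp.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(comp)
    return components

# Toy graph (hypothetical): a 3-cycle with a tail, plus a separate pair.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
```

On the toy graph, ignoring direction merges the tail node into the cycle's component, while the SCC view keeps only the cycle together.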

5
How are the graphs obtained?

API
users
groups
forward/backward links
HTML Screen Scraping

6
Summary of graph properties

small-world
scale-free
correlation between indegree and outdegree
large strongly connected core of high-degree
nodes surrounded by small clusters of low-degree
nodes

7
Crawling Concerns - Algorithms

BFS and DFS
Snowball method underestimates the number of
low-degree nodes. In social networks, snowball samples
underestimate the power-law coefficient, but
closely match other metrics such as the overall
clustering coefficient.
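
A partial BFS (snowball) crawl can be sketched as below. This is an illustration under assumptions, not the crawler used in the study: `get_friends` is a hypothetical callable standing in for an API call or a scraper, and `max_users` is an assumed budget that cuts the crawl short.

```python
from collections import deque

def snowball_crawl(seed, get_friends, max_users):
    """BFS crawl from `seed`, stopping after `max_users` users are fetched.

    `get_friends(user)` is a hypothetical callable returning the users that
    `user` links to (forward links only).
    Returns (crawled, discovered, edges):
      crawled    - users whose link lists were actually fetched, in BFS order
      discovered - every user seen so far, fetched or not
      edges      - directed links observed during the crawl
    """
    discovered = {seed}
    queue = deque([seed])
    crawled, edges = [], []
    while queue and len(crawled) < max_users:
        user = queue.popleft()
        crawled.append(user)
        for friend in get_friends(user):
            edges.append((user, friend))
            if friend not in discovered:
                discovered.add(friend)
                queue.append(friend)
    return crawled, discovered, edges

# Toy usage with an in-memory graph (hypothetical data).
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
crawled, discovered, edges = snowball_crawl("a", lambda u: toy.get(u, []), 2)
```

When the budget runs out, users that were discovered but never fetched appear only as link targets, which is one way a partial crawl can distort the observed degrees.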

8
Crawling Concerns - FW links

cannot reach entire WCC

9
How to Verify Samples

Obtain a random user sample
LJ: feature which returns 5,000 random users
Flickr: random 8-digit user ID generation
Conduct a crawl using these random users as seeds
See if these random nodes connect to the original
WCC
See how the structure of the newly crawled
graph compares to the original
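
The verification steps above can be sketched roughly as follows. Everything named here is hypothetical: `get_friends` stands in for whatever fetches a user's forward links, `crawled_users` is the set of users in the original crawl, and `max_fetches` is an assumed safety budget.

```python
from collections import deque

def seed_reaches_crawl(seed, get_friends, crawled_users, max_fetches=1000):
    """Follow forward links from `seed`; True if a crawled user is reached."""
    if seed in crawled_users:
        return True
    visited, queue, fetches = {seed}, deque([seed]), 0
    while queue and fetches < max_fetches:
        user = queue.popleft()
        fetches += 1
        for friend in get_friends(user):
            if friend in crawled_users:
                return True
            if friend not in visited:
                visited.add(friend)
                queue.append(friend)
    return False

def fraction_connected(random_seeds, get_friends, crawled_users):
    """Share of random seeds that are in, or link into, the original crawl."""
    hits = sum(
        seed_reaches_crawl(s, get_friends, crawled_users) for s in random_seeds
    )
    return hits / len(random_seeds)
```

A high fraction would suggest the original crawl covers most of the large WCC; seeds that never reach it point to users the crawl could not have found.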

10
Crawling Concerns - FW links

no effect on largest WCC

11
Crawling Concerns - FW links

increasing the size of the WCC by starting at a
different seed

12
Site            YT      Flickr   LJ      Orkut
Users (mill)    1.1     1.8      5.2     3
Links (mill)    4.9     22       72      223
Symmetry (%)    79.1    62.0     73.5    100.0

Access (FW: forward-only, BW: backward, SS: HTML screen-scraping)
YT: API (users only), FW, SS for group info
Flickr: API (users, groups), FW
LJ: API (users, groups), FW, BW
Orkut: SS for users, groups
13
Link Symmetry

even with directed links, there is a high level
of symmetry
possibly encouraged by sites informing users of new
incoming links
makes it harder to identify reputable sources due
to dilution
possible solution: who initiated the link?
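
Link symmetry can be measured as the share of directed links that are reciprocated. The sketch below assumes an edge list of `(source, target)` pairs and ignores self-links and duplicate links; those choices are assumptions, not details from the slides.

```python
def link_symmetry(edges):
    """Fraction of directed links (u, v) whose reverse (v, u) also exists."""
    links = {(u, v) for u, v in edges if u != v}  # drop self-links, duplicates
    if not links:
        return 0.0
    reciprocated = sum(1 for u, v in links if (v, u) in links)
    return reciprocated / len(links)

# Toy usage: two of the four links form a reciprocated pair.
toy_edges = [(1, 2), (2, 1), (1, 3), (3, 4)]
symmetry = link_symmetry(toy_edges)
```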

14
Power-law node degrees

Orkut deviates
only 11.3% of network reached (effect of partial
BFS crawl, Snowball method)
artificial cap on a user's number of outgoing
links leads to a distortion in the distribution of
high degrees
differs from Web
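
To check whether a degree sequence looks power-law-like, one common approach is to plot its complementary cumulative distribution on log-log axes and estimate the exponent by maximum likelihood. The sketch below uses the continuous-approximation estimator alpha = 1 + n / sum(ln(d / d_min)); this is a standard choice and an assumption here, not necessarily the fitting procedure behind the slides. The `d_min` cutoff is also an assumed parameter.

```python
import math
from collections import Counter

def degree_ccdf(degrees):
    """Return sorted (degree, P[D >= degree]) pairs."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf, remaining = [], n
    for d in sorted(counts):
        ccdf.append((d, remaining / n))
        remaining -= counts[d]
    return ccdf

def powerlaw_alpha_mle(degrees, d_min=1):
    """Continuous-approximation MLE of the exponent for degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("no tail values above d_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum
```

A capped or undersampled tail shows up as a CCDF that bends away from a straight line at high degrees, which is the kind of distortion noted for Orkut.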

15
Power-law node degrees
16
Power-law node degrees
e.g. analysis of top keywords
17
Spread of Information
18
Power Law affectors

services, accessibility, features

mobile users
[Figure: two plots; only axis tick labels survived extraction]
19
Correlation of indegree and outdegree

over 50% of nodes have indegree within 20% of
their outdegree
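
A rough way to compute this statistic from an edge list is shown below. Reading "within 20%" as |indegree - outdegree| <= 0.2 * outdegree is an assumption about the intended definition, and the function name is made up for this sketch.

```python
from collections import Counter

def fraction_with_similar_degrees(edges, tolerance=0.20):
    """Share of nodes whose indegree is within `tolerance` of their outdegree."""
    outdeg, indeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(outdeg) | set(indeg)
    if not nodes:
        return 0.0
    close = sum(
        1 for n in nodes
        if abs(indeg[n] - outdeg[n]) <= tolerance * outdeg[n]
    )
    return close / len(nodes)
```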

20
Path lengths and diameter

all four networks have short path length
Broder et al. noted that if the Web were treated as an
undirected graph, path length would drop from 16
to 7; so what?
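
The effect of ignoring link direction on path length can be illustrated with a small sketch. It runs a BFS from every node, which is only feasible for tiny graphs; on graphs of the size discussed here one would sample sources instead. The function name and the `directed` flag are assumptions for illustration.

```python
from collections import defaultdict, deque

def average_path_length(edges, directed=True):
    """Mean shortest-path length over all reachable ordered pairs."""
    adj, nodes = defaultdict(set), set()
    for u, v in edges:
        adj[u].add(v)
        if not directed:
            adj[v].add(u)
        nodes.update((u, v))

    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy usage: a directed 3-cycle.
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
directed_avg = average_path_length(cycle, directed=True)
undirected_avg = average_path_length(cycle, directed=False)
```

On the directed cycle, half of the ordered pairs need two hops; once direction is ignored every pair is one hop apart, so the average drops.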

21
Link degree correlations

JDD: joint degree distribution
mapping between outdegree and average indegree of
all nodes connected to nodes of that outdegree
YouTube differs due to extremely popular users
being linked to by many unpopular users
Orkut shows a bump due to undersampling
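
The mapping described above can be sketched as follows. Here "connected to" is read as "linked to by" (the targets of a node's outgoing links), which is an assumption about the intended definition; the function name is also made up for this sketch.

```python
from collections import Counter, defaultdict

def avg_neighbor_indegree_by_outdegree(edges):
    """Map each outdegree k to the mean indegree of nodes that
    outdegree-k nodes link to."""
    outdeg, indeg = Counter(), Counter()
    targets = defaultdict(list)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        targets[u].append(v)

    sums, counts = defaultdict(int), defaultdict(int)
    for u, linked in targets.items():
        k = outdeg[u]
        for v in linked:
            sums[k] += indeg[v]
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

A curve that rises with outdegree means high-degree nodes tend to link to other high-degree nodes; a falling curve would reflect the celebrity-driven pattern mentioned for YouTube.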

22
Joint degree distribution and Scale-free behaviour
cap on links
undersampling of low-degree nodes
celebrity-driven nature
23
Densely connected core

removing 10% of core nodes results in breaking up
the graph into millions of very small SCCs
why an SCC? directed links matter for actual
communication
graphs below show results as nodes are removed
starting with highest-degree nodes (left) and
path length as graph is constructed beginning
with highest-degree nodes (right)

Sub-logarithmic growth
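
The node-removal experiment can be approximated with the sketch below: drop a fraction of the highest-degree nodes, then list the sizes of the SCCs that remain. Ranking by total degree (in plus out), the tie-breaking rule, and the function names are assumptions; the slides do not specify them.

```python
from collections import Counter, defaultdict

def scc_sizes(nodes, edges):
    """Sizes of strongly connected components (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: DFS completion order on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    sizes, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def scc_sizes_after_removal(edges, fraction):
    """Remove the top `fraction` of nodes by total degree, return SCC sizes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    removed = set(ranked[:int(len(ranked) * fraction)])
    kept_nodes = [n for n in ranked if n not in removed]
    kept_edges = [
        (u, v) for u, v in edges if u not in removed and v not in removed
    ]
    return scc_sizes(kept_nodes, kept_edges)
```

In a hub-and-spoke toy graph with reciprocated links, removing only the hub leaves every remaining node in its own SCC, a small-scale version of the fragmentation described above.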
24
Tightly clustered fringe

based on clustering coefficient
social network graphs show stronger clustering,
most likely due to mutual friends

Possibly because personal content is not shared
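
A clustering coefficient can be computed as sketched below. For simplicity the sketch treats links as undirected and averages the local coefficient over all nodes, counting nodes with fewer than two neighbours as 0; these are simplifying assumptions, and the definition behind the slides may differ (for example, by counting directed links among neighbours).

```python
from collections import defaultdict
from itertools import combinations

def average_clustering(edges):
    """Mean local clustering coefficient on the undirected view of the graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)

    coeffs = []
    for node, friends in nbrs.items():
        k = len(friends)
        if k < 2:
            coeffs.append(0.0)  # no neighbour pairs to check
            continue
        linked = sum(1 for a, b in combinations(friends, 2) if b in nbrs[a])
        coeffs.append(linked / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0
```

The local coefficient asks how many of a user's friends are friends with each other, which is why mutual friendships push it up.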
25
Groups

group sizes follow power-law distribution
represent tightly clustered communities

26
Groups

Orkut special case maybe because of partial crawl

27
Node Value Determination

Directed Graph, current model
nodes with many incoming links (hubs) have value
due to their connection to many users
it becomes easy to spread important information
to the other nodes, e.g. DNS
unhealthy in case of spam or viruses
in order for a user to send spam, they have to
become a more important node, i.e. amass friends

28
Node Value Determination

Link Initiator, requires temporal information
if user A requests a link with user B, does that
mean that user B is more important?
even though graphs have a high level of link
symmetry, this additional information can offset
this symmetry
unfortunately, the examined graphs do not have
temporal information

29
Trust

lendingclub.com, Facebook application
people are more willing to lend money to friends
who are linked through a short path
people are more willing to pay back those who are
linked through a short path
no indication of whether this actually works
does trust increase as degree increases?
what credit rating and JDD does a person have to
get a good interest rate?

30
Thank you
shahan_at_cs
