Defending against large-scale crawls in online social networks

1
Defending against large-scale crawls in online
social networks
  • Mainack Mondal, Bimal Viswanath, Allen Clement,
    Peter Druschel, Krishna Gummadi, Alan Mislove,
    Ansley Post
  • MPI-SWS; Northeastern University; now at Google
  • CoNEXT, December 2012

2
Lots of personal data on Online Social Networks
(OSNs)
3
What is the concern with large-scale aggregation
of this data?
  • Aggregators can mine this data to infer
    attributes missing from it, e.g. sexual
    orientation
  • Aggregators can republish this data in an easily
    accessible form
  • Neither the user nor the OSN has control over how
    crawled data is used
  • Problem for OSN operators:
  • User data is a valuable asset to OSN operators
  • OSN operators are blamed for misuse of user data
    [NYTimes '10]
  • OSNs need to limit large-scale aggregation of
    user data

In 2010, the data of 171 M Facebook users was
published on BitTorrent
4
Challenge
  • We are defending against a crawler who:
  • Wants to crawl as many accounts as possible
  • Wants to crawl as fast as possible
  • Our goal is to:
  • Limit the rate of crawling
  • Make the crawlers as slow as possible

5
Existing solution: simple rate-limiting
  • OSNs rate-limit on a per-account or per-IP-address
    basis
  • Crawlers can defeat the rate limit by using
    multiple accounts

Crawlers can create multiple fake accounts
(Sybils), or they can use compromised accounts
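To illustrate why per-account limits are easy to defeat, here is a minimal sketch. The class `PerAccountRateLimiter`, the function `crawl`, and the limit of 10 views per window are all hypothetical and chosen only for illustration; they are not taken from any real OSN.

```python
from collections import defaultdict


class PerAccountRateLimiter:
    """Allow each account at most `limit` profile views per time window."""

    def __init__(self, limit):
        self.limit = limit
        self.used = defaultdict(int)  # account id -> views used in this window

    def allow(self, account):
        if self.used[account] >= self.limit:
            return False
        self.used[account] += 1
        return True


def crawl(limiter, accounts, num_targets):
    """Spread views over the crawler's accounts; return how many are allowed."""
    allowed = 0
    remaining = num_targets
    for account in accounts:
        # Use each account until the limiter refuses it, then switch accounts.
        while remaining > 0 and limiter.allow(account):
            allowed += 1
            remaining -= 1
    return allowed


one_account = crawl(PerAccountRateLimiter(10), ["crawler"], 1000)
fifty_sybils = crawl(PerAccountRateLimiter(10), [f"sybil{i}" for i in range(50)], 1000)
print(one_account, fifty_sybils)
```

In this toy model the number of allowed views grows linearly with the number of accounts the crawler controls, which is the weakness the slide describes.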
6
Our solution: Genie
  • Assumption: social links to good users are harder
    to get than accounts
  • Replace user-account-based rate-limiting with
    link-based rate-limiting

7
Outline
  • Background and key idea
  • Genie design
  • Credit networks
  • How to use credit networks to defend against
    crawlers
  • Using difference between user and crawler
    activity
  • Genie evaluation

8
Credit Networks [EC '11]
  • Nodes trust each other by providing pair-wise
    credit
  • Credit is used to pay for the services received

[Figure: nodes A and B joined by a link with pair-wise credit]
9
Credit Networks [EC '11]
  • Nodes trust each other by providing pair-wise
    credit
  • Credit is used to pay for the services received
  • To obtain a service, find path(s) with sufficient
    credits

[Figure: nodes A, B, and C with pair-wise credit on each link]
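The path-finding and payment step can be sketched as follows. This is a simplified illustration, not the implementation behind the slides: the `CreditNetwork` class and its method names are hypothetical, it looks for a single path only (the slide allows several), and it assumes the usual credit-network convention that paying along a link lowers the credit in the paying direction and raises it in the reverse direction.

```python
from collections import deque


class CreditNetwork:
    def __init__(self):
        self.credit = {}  # (u, v) -> credit currently available for u to pay v

    def add_link(self, u, v, u_to_v, v_to_u):
        self.credit[(u, v)] = u_to_v
        self.credit[(v, u)] = v_to_u

    def find_path(self, src, dst, amount):
        """BFS for a path on which every hop has at least `amount` credit."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for (u, v), available in self.credit.items():
                if u == node and v not in parent and available >= amount:
                    parent[v] = node
                    queue.append(v)
        return None

    def pay(self, src, dst, amount):
        """Pay `amount` from src to dst if a path with enough credit exists."""
        path = self.find_path(src, dst, amount)
        if path is None:
            return False
        for u, v in zip(path, path[1:]):
            self.credit[(u, v)] -= amount  # credit used up in the paying direction
            self.credit[(v, u)] += amount  # and gained in the reverse direction
        return True


net = CreditNetwork()
net.add_link("A", "B", 2, 5)
net.add_link("B", "C", 3, 3)
print(net.pay("A", "C", 2))  # A has enough credit on A-B and B-C
print(net.pay("A", "C", 1))  # A-B is now exhausted in the A -> B direction
```

Scanning every link at each BFS step keeps the sketch short; an adjacency list would be the obvious choice for a graph of realistic size.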
10
How can we map an OSN to a credit network?
  • The OSN operator forms a credit network from the
    social network
  • The operator replenishes credit on each link at a
    fixed rate
  • Credit is deducted from links to view another
    user's profile

[Figure: credit network over users A, B, C, and D with per-link credit values]
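A minimal sketch of this mapping, under several assumptions that the slide does not state: every friendship link starts with the same credit in both directions, a view costs a flat one credit per link on the path, deducted credit reappears in the reverse direction, and accumulated credit is not capped. The class `OsnCreditNetwork` and its parameters are hypothetical names.

```python
from collections import deque


class OsnCreditNetwork:
    def __init__(self, friendships, initial_credit, replenish_rate):
        self.replenish_rate = replenish_rate
        self.credit = {}     # (u, v) -> credit available in direction u -> v
        self.neighbors = {}  # user -> list of friends
        for u, v in friendships:
            self.credit[(u, v)] = initial_credit
            self.credit[(v, u)] = initial_credit
            self.neighbors.setdefault(u, []).append(v)
            self.neighbors.setdefault(v, []).append(u)

    def replenish(self):
        """Operator adds a fixed amount of credit to every link direction."""
        for link in self.credit:
            self.credit[link] += self.replenish_rate

    def _path_with_credit(self, src, dst, amount):
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                break
            for nxt in self.neighbors.get(node, []):
                if nxt not in parent and self.credit[(node, nxt)] >= amount:
                    parent[nxt] = node
                    queue.append(nxt)
        if dst not in parent:
            return None
        path = [dst]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path[::-1]

    def view(self, viewer, target, cost=1):
        """Allow the profile view only if some path can pay `cost` per link."""
        path = self._path_with_credit(viewer, target, cost)
        if path is None:
            return False
        for u, v in zip(path, path[1:]):
            self.credit[(u, v)] -= cost
            self.credit[(v, u)] += cost
        return True


osn = OsnCreditNetwork([("A", "B"), ("B", "C")], initial_credit=1, replenish_rate=1)
print(osn.view("A", "C"))  # allowed, uses up the credit on A->B and B->C
print(osn.view("A", "C"))  # refused until the operator replenishes
osn.replenish()
print(osn.view("A", "C"))  # allowed again
```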
11
How do credit networks defend against crawlers?
  • Amount of crawling is proportional to attack cut

[Figure: attack cut between the crawler's accounts and the rest of the network (normal users)]
Sybil accounts: attack cut is small
Compromised accounts: attack cut may be larger
(SybilRank, NSDI 2012)
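The proportionality claim can be made concrete with a back-of-the-envelope bound. The symbols and assumptions here are illustrative and not from the slides: let $C$ be the set of links in the attack cut, $c_0$ the largest credit initially available on any cut link, and $r$ the replenishment rate per link. Assume every view of a profile outside the crawler's region consumes at least one credit on some cut link, and ignore credit that flows back across the cut from normal users' own views.

```latex
% N(T): number of profiles outside the crawler's region viewed up to time T
N(T) \;\le\; |C|\,\bigl(c_0 + r\,T\bigr)
\qquad\Longrightarrow\qquad
\lim_{T\to\infty} \frac{N(T)}{T} \;\le\; r\,|C|
```

Under these assumptions the long-run crawl rate is bounded by the replenishment rate times the size of the attack cut, so creating more accounts does not help the crawler unless it also creates more links to normal users.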
12
Difference between normal users and crawlers
  • Reciprocity in profile views: normal users are
    more reciprocal than crawlers
  • Repeated profile views: normal users repeatedly
    visit the same set of profiles
  • Locality of views

13
Difference in locality between normal users and
crawlers
  • Renren graph and user browsing trace [IMC '10]
  • 33 K users, 96 K activities (2 weeks)
  • Most of the normal views are local

[Figure: share of views for normal users, with crawler activity shown for comparison]
Flickr [Mislove et al., WOSN '08]; Orkut [Cha et al., IMC '09]
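One way to measure locality is to compute the hop distance in the social graph between viewer and viewed user for each entry in the trace. The sketch below does this on a toy graph; the function names and the 2-hop threshold for "local" are assumptions for illustration, not the definition used in the talk.

```python
from collections import deque


def hop_distance(graph, src, dst):
    """BFS shortest-path length between src and dst; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def fraction_local(graph, views, max_hops=2):
    """Fraction of (viewer, target) views within `max_hops` of the viewer."""
    if not views:
        return 0.0
    local = 0
    for viewer, target in views:
        dist = hop_distance(graph, viewer, target)
        if dist is not None and dist <= max_hops:
            local += 1
    return local / len(views)


# Toy line graph a - b - c - d - e, stored as adjacency lists.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
views = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "d")]
print(fraction_local(graph, views))
```

A crawler that visits profiles uniformly across a large graph would be expected to score much lower on this measure than a user who mostly visits friends and friends of friends.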
14
Genie design principles
  • Use a credit network to rate-limit links
  • Exploit difference between normal and crawler
    activity to discriminate crawlers
  • Charge more for views further away

15
Genie design
  • New charging model: pay more to view profiles
    far away
  • Credit charged per link = (shortest-path distance
    between the two nodes) - 1
  • Rate of crawling decreases with increased path
    length

[Figure: credit network over users A, B, C, and D, with 2 credits deducted from each link along a path]
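The distance-based charging rule can be sketched as below. This is a simplified reading of the slide, not Genie's actual code: the function names are hypothetical, only one shortest path is tried (no fallback to other paths with spare credit), and deducted credit is assumed to reappear in the reverse direction as in a standard credit network.

```python
from collections import deque


def shortest_path(neighbors, src, dst):
    """Return one shortest path from src to dst as a list of nodes, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def genie_view(neighbors, credit, viewer, target):
    """Charge (distance - 1) credits on every link of a shortest path."""
    path = shortest_path(neighbors, viewer, target)
    if path is None:
        return False
    charge = len(path) - 2  # number of hops minus one
    if charge <= 0:
        return True  # own profile or a direct friend: no charge
    links = list(zip(path, path[1:]))
    if any(credit[link] < charge for link in links):
        return False  # not enough credit: the view is flagged
    for u, v in links:
        credit[(u, v)] -= charge
        credit[(v, u)] += charge
    return True


# Chain A - B - C - D with 2 credits in each direction on every link.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
credit = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "D")]:
    credit[(u, v)] = 2
    credit[(v, u)] = 2

print(genie_view(neighbors, credit, "A", "B"))  # friend: free
print(genie_view(neighbors, credit, "A", "D"))  # 3 hops: 2 credits per link
print(genie_view(neighbors, credit, "A", "C"))  # A->B is now empty: flagged
```

Because the per-link charge grows with distance and is paid on every link of the path, the total cost of a view grows faster than linearly with distance in this model, which is how far-away views end up much more expensive than local ones.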
16
Outline
  • Background and key idea
  • Genie design
  • Credit networks
  • How to use credit networks to defend against
    crawlers
  • Using difference between user and crawler
    activity
  • Genie evaluation

17
Genie evaluation
  • Does Genie limit attackers while allowing normal
    users?
  • The parameter to tweak: credit replenishment rate
    per link
  • Replenishment rate too high: crawlers will be
    allowed
  • Replenishment rate too low: users will be
    heavily penalized

18
Experimental setup
  • Genie simulator written in C
  • Input: social graph and user activity trace
  • Output: allowed/flagged for each activity
  • Normal user activity trace from Renren
  • Generated multiple synthetic traces for other
    graphs
  • We model a strong and efficient crawler
  • Crawler controls compromised user accounts
  • Each good user profile is crawled once
  • Crawlers try to crawl as many profiles as
    possible

19
Does Genie limit crawlers?
[Figure: % of users crawled per week vs. credits/week per link]
Only 2.7% of the network is crawled in 1 week
The crawlers are slowed down 3000 times
20
Does Genie penalize good users?
[Figure: % of user activity flagged vs. credits/week per link]
2.6% of total activities, from 0.8% of users, are flagged
21
Does Genie penalize good users?
[Figure: % of user activity flagged and % of users crawled per week vs. credits/week per link, with the trade-off point marked]
22
Who are these flagged users?
  • 3 users with a very high number of random profile
    views
  • Show crawler-like behavior
  • 70% of the flagged activity is by these users
  • Users with a normal number of profile views but
    very few friends
  • 99% of flagged users have fewer than 5 friends
  • Adding 4 more friends unflags 97% of these users

23
Efficiency of Genie
  • In our Genie simulator:
  • To scale up Genie we used the Canal library
    [EuroSys '12]
  • Multithreaded implementation
  • Used a machine with 24 cores and 48 GB of physical
    memory for evaluation
  • For a million-node social graph:
  • Memory overhead: 5 GB
  • Each view request is processed in 0.65 ms on
    average

24
Summary
  • We propose rate-limiting links to defend against
    crawlers
  • We strengthen our defense using difference
    between normal user and crawler activities
  • We evaluated Genie on a real-world user activity
    trace

25
  • Thank you