Title: Data and Applications Security Security and Privacy in Online Social Networks
1Data and Applications Security: Security and Privacy in Online Social Networks
- Murat Kantarcioglu
- Bhavani Thuraisingham
- Thanks to Raymond Heatherly and Barbara Carminati for helping in slide preparations
- April 2012
2Outline
- Introduction to Social Networks
- Properties of Social Networks
- Social Network Analysis Basics
- Data Privacy Basics
- Privacy and Social Networks
- Access control issues for Online Social Networks
3Social Networks
- Social networks have important implications for our daily lives.
- Spread of Information
- Spread of Disease
- Economics
- Marketing
- Social network analysis could be used for many activities related to information and security informatics.
- Terrorist network analysis
4Enron Social Graph
http://jheer.org/enron/
5Romantic Relations at Jefferson High School
6Emergence of Online Social Networks
- Online social networks have become increasingly popular.
- Example: Facebook
- Facebook has more than 200 million active users.
- More than 100 million users log on to Facebook at least once each day.
- More than two-thirds of Facebook users are outside of college.
- The fastest growing demographic is those 35 years old and older.
http://www.facebook.com/press/info.php?statistics
7Properties of Social Networks
- Small-world phenomenon
- Milgram asked participants to pass a letter to one of their close contacts in order to get it to an assigned individual
- Most of the letters were lost (75% of the letters)
- The letters that reached their destination had passed through only about six people.
- Origin of "six degrees"
- Mean geodesic distance l of graphs grows logarithmically or even slower with the network size (d_ij is the shortest distance between nodes i and j).
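A minimal sketch of how the mean geodesic distance could be computed on a small undirected graph. It assumes the graph is connected and averages d_ij over unordered pairs of distinct nodes (other conventions also count i = j); the function names and the toy graph are illustrative only.

```python
from collections import deque
from itertools import combinations

def shortest_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_geodesic_distance(adj):
    """Average of d_ij over unordered pairs of distinct nodes (assumes a connected graph)."""
    all_dist = {u: shortest_distances(adj, u) for u in adj}
    pairs = list(combinations(adj, 2))
    return sum(all_dist[i][j] for i, j in pairs) / len(pairs)

# Toy path graph a - b - c - d (hypothetical data)
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(mean_geodesic_distance(path))
```

On a path graph the value grows linearly with the number of nodes; the small-world claim is that real social graphs grow much more slowly than that.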
8Small-World Example Six Degrees of Kevin Bacon
9Properties of Social Networks
- Degree Distribution
- Clustering
- Other important properties
- Community Structure
- Assortativity
- Clustering Patterns
- Homophily
- .
- Many of these properties could be used for
analyzing social networks.
10Social Network Mining
- Social network data is represented as a graph
- Individuals are represented as nodes
- Nodes may have attributes to represent personal traits
- Relationships are represented as edges
- Edges may have attributes to represent relationship types
- Edges may be directed
- Common Social Network Mining tasks
- Node classification
- Link Prediction
11Data Privacy Basics
- How to share data without violating privacy?
- Meaning of privacy?
- Identity disclosure
- Sensitive Attribute disclosure
- Current techniques for structured data
- K-anonymity
- L-diversity
- Secure multi-party computation
- Problem: Publishing private data while, at the same time, protecting individual privacy
- Challenges
- How to quantify privacy protection?
- How to maximize the usefulness of published data?
- How to minimize the risk of disclosure?
12Sanitization and Anonymization
- Automated de-identification of private data with certain privacy guarantees
- As opposed to the "formal determination by statisticians" requirement of HIPAA
- Two major research directions
- Perturbation (e.g., random noise addition)
- Anonymization (e.g., k-anonymization)
- Removing unique identifiers is not sufficient
- Quasi-identifier (QI)
- Maximal set of attributes that could help identify individuals
- Assumed to be publicly available (e.g., voter registration lists)
- As a process
- Remove all unique identifiers
- Identify QI attributes, model the adversary's background knowledge
- Enforce some privacy definition (e.g., k-anonymity)
13Re-identifying anonymous data (Sweeney 01)
- 37 US states mandate collection of information
- She purchased the voter registration list for Cambridge, Massachusetts
- 54,805 people
- 69% unique on postal code and birth date
- 87% US-wide with all three (postal code, birth date, and sex)
- Solution: k-anonymity
- Any combination of values appears at least k times
- Developed systems that guarantee k-anonymity
- Minimize distortion of results
14k-Anonymity
- Each released record should be indistinguishable from at least (k-1) others on its QI attributes
- Alternatively: the cardinality of any query result on released data should be at least k
- k-anonymity is (the first) one of many privacy definitions in this line of work
- l-diversity, t-closeness, m-invariance, delta-presence...
- Complementary Release Attack
- Different releases can be linked together to compromise k-anonymity.
- Solution
- Consider all of the released tables before releasing a new one, and try to avoid linking.
- Other data holders may release some data that can be used in this kind of attack. Generally, this kind of attack is hard to prevent completely.
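A minimal sketch of a k-anonymity check, assuming a table is a list of dictionaries and the quasi-identifier attributes are already generalized. The table contents, attribute names, and the function name are hypothetical, not taken from the slides.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values appears in at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical generalized table: ZIP codes truncated, ages bucketed
table = [
    {"zip": "750**", "age": "20-29", "disease": "flu"},
    {"zip": "750**", "age": "20-29", "disease": "cold"},
    {"zip": "752**", "age": "30-39", "disease": "flu"},
    {"zip": "752**", "age": "30-39", "disease": "asthma"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))
print(is_k_anonymous(table, ["zip", "age"], 3))
```

The check only looks at one release; it says nothing about the complementary release attack, where several individually k-anonymous tables are linked.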
15L-diversity principles
- L-diversity principle: A q-block is l-diverse if it contains at least l well-represented values for the sensitive attribute S. A table is l-diverse if every q-block is l-diverse
- l-diversity may be difficult and unnecessary to achieve.
- A single sensitive attribute
- Two values: HIV positive (1%) and HIV negative (99%)
- Very different degrees of sensitivity
- l-diversity is unnecessary to achieve
- 2-diversity is unnecessary for an equivalence class that contains only negative records
- l-diversity is difficult to achieve
- Suppose there are 10000 records in total
- To have distinct 2-diversity, there can be at most 10000 × 1% = 100 equivalence classes
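A minimal sketch of a distinct l-diversity check, together with the counting argument from the HIV example: every equivalence class needs at least one record of each value, so the rarer value bounds the number of classes. The function names and the toy table are assumptions made for illustration.

```python
from collections import defaultdict

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

def max_distinct_2_diverse_classes(n_positive, n_negative):
    """Each class needs one positive and one negative record, so the rarer value is the limit."""
    return min(n_positive, n_negative)

# Hypothetical table with a single quasi-identifier
table = [
    {"zip": "750**", "hiv": "negative"},
    {"zip": "750**", "hiv": "positive"},
    {"zip": "752**", "hiv": "negative"},
    {"zip": "752**", "hiv": "negative"},
]

print(is_distinct_l_diverse(table, ["zip"], "hiv", 2))
# 10000 records with 1% positive: 100 positive, 9900 negative
print(max_distinct_2_diverse_classes(100, 9900))
```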
16Privacy Preserving Distributed Data Mining
- Goal of data mining is summary results
- Association rules
- Classifiers
- Clusters
- The results alone need not violate privacy
- Contain no individually identifiable values
- Reflect overall results, not individual organizations
- The problem is computing the results without access to the data!
- Data needed for data mining may be distributed among parties
- Credit card fraud data
- Inability to share data due to privacy reasons
- HIPAA
- Even partial results may need to be kept private
17Secure Multi-Party Computation (SMC)
- The goal is computing a function f(x_1, ..., x_n) without revealing x_i
- Semi-Honest Model
- Parties follow the protocol
- Malicious Model
- Parties may or may not follow the protocol
- We cannot do better than the situation in which a trusted third party exists
- Generic SMC is too inefficient for PPDDM
- Enhancements being explored
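As one concrete instance of the idea, the following sketch simulates a secure sum with additive secret sharing in the semi-honest model. This protocol is chosen here for illustration and is not taken from the slides. All parties are simulated in a single process, and `MODULUS`, `make_shares`, and `secure_sum` are hypothetical names.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus, assumed larger than any possible sum

def make_shares(value, n_parties):
    """Split value into n random shares that add up to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs):
    """Compute the sum of all inputs while each party only ever sees random-looking shares."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j
    shares = [make_shares(x, n) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum
    partial_sums = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

print(secure_sum([12, 30, 7]))
```

The result equals what a trusted third party would compute, which is the benchmark the slide refers to. A malicious party could still submit a wrong partial sum, so this sketch covers only the semi-honest case.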
18Preventing Private Information Inference Attacks
on Social Networks
- Raymond Heatherly, Murat Kantarcioglu, and Bhavani Thuraisingham, The University of Texas at Dallas
- Jack Lindamood, Facebook
19Graph Model
Lindamood et al. 09 Heatherly et al. 09
- Graph represented by a set of homogeneous vertices and a set of homogeneous edges
- Each node also has a set of Details, one of which is considered private.
20Naïve Bayes Classification
- Classification based only on specified attributes
in the node
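A minimal sketch of a Naïve Bayes classifier over node details, assuming each profile is a set of detail strings plus a class label. It scores only the details that are present and uses add-one smoothing; both are simplifications chosen here, and the trait names and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(profiles):
    """profiles: list of (set_of_details, class_label) pairs."""
    class_counts = Counter(label for _, label in profiles)
    detail_counts = defaultdict(Counter)
    for details, label in profiles:
        detail_counts[label].update(details)
    return class_counts, detail_counts

def classify(details, class_counts, detail_counts):
    """Return the label maximizing log P(label) + sum of log P(detail | label)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for d in details:
            # Add-one smoothing so unseen details do not zero out a class
            score += math.log((detail_counts[label][d] + 1) / (n_label + 2))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training profiles
training = [
    ({"group:A", "music:jazz"}, "liberal"),
    ({"group:A"}, "liberal"),
    ({"group:B", "music:jazz"}, "conservative"),
    ({"group:B"}, "conservative"),
]
class_counts, detail_counts = train_naive_bayes(training)
print(classify({"group:A"}, class_counts, detail_counts))
```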
21Naïve Bayes with Links
- Rather than calculate the probability of a link from person n_x to n_y, we calculate the probability of a link from n_x to a person with n_y's traits
22Link Weights
- Links also have associated weights
- Represents how close a friendship is suspected
to be using the following formula
23Collective Inference
- Collection of techniques that use node attributes and the link structure to refine classifications.
- Uses local classifiers to establish a set of priors for each node
- Uses traditional relational classifiers as the iterative step in classification
24Relational Classifiers
- Class Distribution Relational Neighbor
- Weighted-Vote Relational Neighbor
- Network-only Bayes Classifier
- Network-only Link-based Classification
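A simplified sketch of the weighted-vote relational neighbor idea used inside an iterative collective inference loop: each unlabeled node repeatedly takes the weighted mean of its neighbors' current estimates. It assumes a binary class (so one probability per node) and plain synchronous updates rather than a full relaxation labeling schedule; names, weights, and the toy graph are hypothetical.

```python
def weighted_vote_step(probs, neighbors, known):
    """One synchronous update: unlabeled nodes take the weighted mean of neighbor estimates."""
    updated = dict(probs)
    for node, nbrs in neighbors.items():
        if node in known or not nbrs:
            continue  # labeled nodes stay fixed; isolated nodes keep their prior
        total_weight = sum(w for _, w in nbrs)
        updated[node] = sum(w * probs[other] for other, w in nbrs) / total_weight
    return updated

def collective_inference(priors, neighbors, known, iterations=20):
    """Start from local-classifier priors and refine them with repeated weighted votes."""
    probs = dict(priors)
    for _ in range(iterations):
        probs = weighted_vote_step(probs, neighbors, known)
    return probs

# Hypothetical graph: value = P(class 1); nodes a and b are labeled, c is not
neighbors = {"a": [("c", 1.0)], "b": [("c", 1.0)], "c": [("a", 2.0), ("b", 1.0)]}
priors = {"a": 1.0, "b": 0.0, "c": 0.5}
result = collective_inference(priors, neighbors, known={"a", "b"})
print(result["c"])
```

Here the prior for `c` would come from a local classifier such as the details-only Naïve Bayes; the link weights then pull the estimate toward the more strongly connected neighbor.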
25Experimental Data
- 167,000 profiles from the Facebook online social network
- Restricted to public profiles in the Dallas/Fort Worth network
- Over 3 million links
26General Data Properties
Diameter of the largest component: 16
Number of nodes: 167,390
Number of friendship links: 3,342,009
Total number of listed traits: 4,493,436
Total number of unique traits: 110,407
Number of components: 18
Probability Liberal: 0.45
Probability Conservative: 0.55
27Inference Methods
- Details only: Uses Naïve Bayes classifier to predict attribute
- Links only: Uses only the link structure to predict attribute
- Average: Classifies based on an average of the probabilities computed by Details and Links
28Predicting Private Details
- Attempt to predict the value of the political affiliation attribute
- Three inference methods used as the local classifier
- Relaxation labeling used as the collective inference method
29Removing Details
- Ensures that no false information is added to the network; all details in the released graph were entered by the user
- Details that have the highest global probability of indicating political affiliation are removed from the network
30Removing Links
- Ensures that the link structure of the released graph is a subset of the original graph
- Removes from each node the links to the nodes that are most like the current node
31Most Liberal Traits
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
Group | buck fush | 27.05782866
32Most Conservative Traits
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
33Most Liberal Traits per Trait Name
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
34Experiments
- Conducted on 35,000 nodes which recorded political affiliation
- Tests removing 0 details and 0 links, 10 details and 0 links, 0 details and 10 links, and 10 details and 10 links
- Varied training set size from 10% of available nodes to 90%
35Local Classifier Results
36Collective Inference Results
37Access Control for Social Networks
38Online Social Networks Access Control Issues
- Current access control systems for online social networks are either too restrictive or too loose
- selected friends
- Bebo, Facebook, and Multiply
- neighbors (i.e., the set of users having musical preferences and tastes similar to mine)
- Last.fm
- friends of friends
- Facebook, Friendster, Orkut
- contacts of my contacts (2nd degree contacts), 3rd and 4th degree contacts
- Xing
39Challenges
I want only my family and close friends to see
this picture.
40Requirements
- Many different online social networks with different terminology
- Facebook vs. LinkedIn
- We need to have flexible models that can represent
- User profiles
- Relationships among users
- (e.g., Bob is Alice's close friend)
- Resources
- (e.g., online photo albums)
- Relationships among users and resources
- (e.g., Bob is the owner of the photo album and Alice is tagged in this photo)
- Actions (e.g., post a message on someone's wall)
41Overview of the Solution
- We use semantic web technologies (e.g., OWL) to represent the social network knowledge base.
- We use the Semantic Web Rule Language (SWRL) to represent various security, admin and filter policies.
42Modeling User Profiles and Resources
- Existing ontologies such as FOAF could be extended to capture user profiles.
- Relationships among resources could be captured by using OWL concepts
- PhotoAlbum rdfs:subClassOf Resource
- PhotoAlbum consistsOf Photos
43Modeling Relationships Among Users
- We model relationships among users by defining an N-ary relationship
- :Christine a :Person ; :has_friend :_Friendship_Relation_1 .
- :_Friendship_Relation_1 a :Friendship_Relation ; :Friendship_trust :HIGH ; :Friendship_value :Mike .
- OWL reasoners cannot be used to infer some relationships, such as Christine being a third-degree friend of John.
- Such computations need to be done separately and represented by using a new class.
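A minimal sketch of how the degree of a friendship could be computed outside the OWL reasoner, using breadth-first search over an adjacency list. The graph below reuses the names Christine, Mike, and John from the slide, while "Sara", the links between them, and the function name are hypothetical.

```python
from collections import deque

def friendship_degrees(friends, start, max_degree):
    """Map each user within max_degree hops of start to the shortest friendship path length."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if degree[user] == max_degree:
            continue  # do not expand beyond the requested degree
        for other in friends.get(user, ()):
            if other not in degree:
                degree[other] = degree[user] + 1
                queue.append(other)
    del degree[start]
    return degree

# Hypothetical friendship graph
friends = {
    "Christine": ["Mike"],
    "Mike": ["Christine", "Sara"],
    "Sara": ["Mike", "John"],
    "John": ["Sara"],
}
print(friendship_degrees(friends, "Christine", 3))
```

The resulting degrees could then be written back into the knowledge base, for example as membership in a class for third-degree friends, so that policies can refer to them.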
44Specifying Policies Using OSN Knowledge Base
- Most of the OSN information could be captured using OWL to represent a rich set of concepts
- This makes it possible to specify very flexible access control policies
- "Photos could be accessed by friends only" automatically implies that a closeFriend can access the photos too.
- Policies could be defined based on user-resource relationships easily.
45Security Policies for OSNs
- Access control policies
- Filtering policies
- Could be specified by user
- Could be specified by authorized user
- Admin policies
- Security admin specifies who is authorized to specify filtering and access control policies
- Example: if U1 isParentOf U2 and U2 is a child, then U1 can specify filtering policies for U2.
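The admin policy example can be read as a rule with two conditions and one conclusion. The sketch below expresses that rule in plain Python rather than SWRL, assuming the knowledge base is reduced to a set of parent pairs and a set of users classified as children; all names are hypothetical.

```python
def can_specify_filtering_policies(supervisor, target, parent_of, children):
    """Rule: isParentOf(U1, U2) and Child(U2) implies U1 may specify filtering policies for U2."""
    return (supervisor, target) in parent_of and target in children

# Hypothetical knowledge base facts
parent_of = {("alice", "bob"), ("carol", "dave")}
children = {"bob"}  # dave is an adult, so carol gets no admin right over him

print(can_specify_filtering_policies("alice", "bob", parent_of, children))
print(can_specify_filtering_policies("carol", "dave", parent_of, children))
```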
46Security Policy Specification (using semantic web
technologies)
- Semantic Web Rule Language (SWRL) is used for specifying access control, filtering and authorization policies.
- SWRL is based on OWL
- All rules are expressed in terms of OWL concepts (classes, properties, individuals, literals).
- Using SWRL, subject, object and actions are specified
- Rules can have different authorizations that state the subject's rights on the target object.
47Knowledge Base for Authorizations and Prohibitions
- Authorizations/prohibitions need to be specified using OWL
- A different object property for each action supported by the OSN.
- Authorizations/prohibitions could automatically propagate based on action hierarchies
- Assume post is a subproperty of write
- If a user is given post permission, then the user will have write permission as well
- Admin prohibitions need to be specified slightly differently. (Supervisor, Target, Object, Privilege)
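A minimal sketch of authorization propagation along an action hierarchy, following the slide's example that post is a subproperty of write. It models only authorizations (not prohibitions) and stores the hierarchy as a plain dictionary; the data and function names are hypothetical, and an OWL reasoner would normally derive this closure.

```python
def implied_actions(action, sub_property_of):
    """Return the action plus every super-action reachable through the hierarchy."""
    implied = {action}
    frontier = [action]
    while frontier:
        current = frontier.pop()
        for parent in sub_property_of.get(current, ()):
            if parent not in implied:
                implied.add(parent)
                frontier.append(parent)
    return implied

def is_authorized(user, action, obj, grants, sub_property_of):
    """True if the user was granted the action on obj directly or via a sub-action."""
    return any(
        g_user == user and g_obj == obj
        and action in implied_actions(g_action, sub_property_of)
        for g_user, g_action, g_obj in grants
    )

sub_property_of = {"post": ["write"]}   # post is a subproperty of write
grants = {("alice", "post", "wall")}    # hypothetical authorization

print(is_authorized("alice", "write", "wall", grants, sub_property_of))
```

Propagation runs in one direction only: a grant of write would not imply post under this hierarchy.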
48Security Rule Examples
- SWRL rule specification does depend on the authorization and OSN knowledge bases.
- It is not possible to specify generic rules
- Examples
49Security Rule Enforcement
- A reference monitor evaluates the requests.
- Admin requests for access control could be evaluated by rule rewriting
- Example: Assume Bob submits the following admin request
- Rewrite as the following rule
50Security Rule Enforcement
- Admin requests for prohibitions could be rewritten as well.
- Example: Bob issues the following prohibition request
- Rewritten version
- Access control requests need to consider both filter and access control policies
51Framework Architecture
52Conclusions
- Various attacks exist to
- Identify nodes in anonymized data
- Infer private details
- Recent attempts to increase social network access control to limit some of the attacks
- Balancing privacy, security and usability on online social networks will be an important challenge
- Directions
- Scalability
- We are currently implementing such a system to test its scalability.
- Usability
- Create techniques to automatically learn rules
- Create simple user interfaces so that users can easily specify these rules.