Data and Applications Security: Security and Privacy in Online Social Networks


1
Data and Applications Security: Security and
Privacy in Online Social Networks
  • Murat Kantarcioglu
  • Bhavani Thuraisingham
  • Thanks to Raymond Heatherly and Barbara
    Carminati for helping with slide preparation
  • April 2012

2
Outline
  • Introduction to Social Networks
  • Properties of Social Networks
  • Social Network Analysis Basics
  • Data Privacy Basics
  • Privacy and Social Networks
  • Access control issues for Online Social Networks

3
Social Networks
  • Social networks have important implications for
    our daily lives.
  • Spread of Information
  • Spread of Disease
  • Economics
  • Marketing
  • Social network analysis could be used for many
    activities related to information and security
    informatics.
  • Terrorist network analysis

4
Enron Social Graph
http://jheer.org/enron/
5
Romantic Relations at Jefferson High School
6
Emergence of Online Social Networks
  • Online social networks have become increasingly
    popular.
  • Example: Facebook
  • Facebook has more than 200 million active users.
  • More than 100 million users log on to Facebook at
    least once each day
  • More than two-thirds of Facebook users are
    outside of college
  • The fastest growing demographic is those 35 years
    old and older

http://www.facebook.com/press/info.php?statistics
7
Properties of Social Networks
  • Small-world phenomenon
  • Milgram asked participants to pass a letter to
    one of their close contacts in order to get it to
    an assigned individual
  • Most of the letters were lost (75% of the
    letters)
  • The letters that reached their destination had
    passed through only about six people.
  • Origin of the phrase "six degrees"
  • The mean geodesic distance l of such graphs grows
    logarithmically or even more slowly with network
    size (d_ij is the shortest distance between nodes
    i and j).
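The mean geodesic distance can be computed straight from its definition by running a breadth-first search from every node and averaging d_ij over the pairs found. Below is a minimal Python sketch, assuming an unweighted, undirected graph stored as an adjacency dict; it averages only over distinct, reachable pairs (some definitions also count i = j or treat disconnected pairs differently), and the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances d_ij from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """Average d_ij over ordered pairs of distinct nodes i, j with j reachable from i."""
    total, pairs = 0, 0
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Path graph a - b - c - d stored as an adjacency dict
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(mean_geodesic_distance(path))  # (1+2+3+1+2+1)/6, about 1.667
```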

8
Small-World Example: Six Degrees of Kevin Bacon
9
Properties of Social Networks
  • Degree Distribution
  • Clustering
  • Other important properties
  • Community Structure
  • Assortativity
  • Clustering Patterns
  • Homophily
  • ...
  • Many of these properties could be used for
    analyzing social networks.
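Two of the properties above, the degree distribution and clustering, are easy to measure on a small graph. The sketch below is a minimal illustration, assuming an undirected graph given as a dict of neighbor sets; it reports the empirical P(k) and the average of the local clustering coefficients, scoring nodes with fewer than two neighbors as 0 (one common convention, not the only one).

```python
from collections import Counter
from itertools import combinations


def degree_distribution(adj):
    """Empirical P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle a-b-c plus a pendant node d attached to c
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(degree_distribution(g))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(average_clustering(g))   # (1 + 1 + 1/3 + 0) / 4
```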

10
Social Network Mining
  • Social network data is represented as a graph
  • Individuals are represented as nodes
  • Nodes may have attributes to represent personal
    traits
  • Relationships are represented as edges
  • Edges may have attributes to represent
    relationship types
  • Edges may be directed
  • Common Social Network Mining tasks
  • Node classification
  • Link Prediction
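A minimal sketch of the graph model just described, with attribute dicts on nodes and on directed, typed edges. The `common_neighbor_scores` method adds a simple common-neighbors baseline for link prediction; that is a standard heuristic chosen for illustration, not a method from these slides, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Nodes carry trait dicts; directed edges carry relationship-attribute dicts."""
    node_attrs: dict = field(default_factory=dict)  # node -> {trait: value}
    edges: dict = field(default_factory=dict)       # (src, dst) -> {attr: value}

    def add_node(self, node, **attrs):
        self.node_attrs[node] = attrs

    def add_edge(self, src, dst, **attrs):
        self.edges[(src, dst)] = attrs

    def neighbors(self, node):
        """Nodes linked to `node` in either direction."""
        out_nbrs = {d for (s, d) in self.edges if s == node}
        in_nbrs = {s for (s, d) in self.edges if d == node}
        return out_nbrs | in_nbrs

    def common_neighbor_scores(self, node):
        """Score each node not yet linked to `node` by the number of shared neighbors."""
        mine = self.neighbors(node)
        scores = {}
        for other in self.node_attrs:
            if other == node or other in mine:
                continue
            shared = len(mine & self.neighbors(other))
            if shared:
                scores[other] = shared
        return scores


net = SocialGraph()
for name in ["alice", "bob", "carol", "dave"]:
    net.add_node(name, city="Dallas")
net.add_edge("alice", "bob", type="friend")
net.add_edge("alice", "carol", type="coworker")
net.add_edge("dave", "bob", type="friend")
net.add_edge("dave", "carol", type="friend")
print(net.common_neighbor_scores("alice"))  # {'dave': 2}
```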

11
Data Privacy Basics
  • How to share data without violating privacy?
  • Meaning of privacy?
  • Identity disclosure
  • Sensitive Attribute disclosure
  • Current techniques for structured data
  • K-anonymity
  • L-diversity
  • Secure multi-party computation
  • Problem: publishing private data while, at the
    same time, protecting individual privacy
  • Challenges
  • How to quantify privacy protection?
  • How to maximize the usefulness of published data?
  • How to minimize the risk of disclosure?

12
Sanitization and Anonymization
  • Automated de-identification of private data with
    certain privacy guarantees
  • As opposed to formal determination by
    statisticians, as required by HIPAA
  • Two major research directions
  • Perturbation (e.g. random noise addition)
  • Anonymization (e.g. k-anonymization)
  • Removing unique identifiers is not sufficient
  • Quasi-identifier (QI)
  • Maximal set of attributes that could help
    identify individuals
  • Assumed to be publicly available (e.g., voter
    registration lists)
  • As a process
  • Remove all unique identifiers
  • Identify QI attributes and model the adversary's
    background knowledge
  • Enforce some privacy definition (e.g.
    k-anonymity)
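The process above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `IDENTIFIERS` and `QUASI_IDENTIFIERS` sets stand in for the modeled adversary knowledge, and the fixed generalization (3-digit ZIP prefix, 10-year age band) replaces the search over generalization levels that real k-anonymization algorithms perform.

```python
from collections import Counter

# Hypothetical schema: direct identifiers and quasi-identifiers
IDENTIFIERS = {"name", "ssn"}
QUASI_IDENTIFIERS = ("zip", "age")


def generalize(record):
    """Coarsen the QIs: keep a 3-digit ZIP prefix and a 10-year age band."""
    low = record["age"] // 10 * 10
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["age"] = f"{low}-{low + 9}"
    return out


def anonymize(records, k):
    """Remove identifiers, generalize QIs, then enforce k-anonymity on the QIs."""
    stripped = [{a: v for a, v in r.items() if a not in IDENTIFIERS} for r in records]
    released = [generalize(r) for r in stripped]
    class_sizes = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in released)
    if min(class_sizes.values()) < k:
        raise ValueError("generalization is too weak for the requested k")
    return released


patients = [
    {"name": "A", "ssn": "111", "zip": "75080", "age": 34, "disease": "flu"},
    {"name": "B", "ssn": "222", "zip": "75082", "age": 37, "disease": "cold"},
    {"name": "C", "ssn": "333", "zip": "75024", "age": 31, "disease": "flu"},
]
print(anonymize(patients, k=3)[0])  # {'zip': '750**', 'age': '30-39', 'disease': 'flu'}
```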

13
Re-identifying anonymous data (Sweeney 01)
  • 37 US states mandate the collection of
    hospital-level data
  • She purchased the voter registration list for
    Cambridge, Massachusetts
  • 54,805 people
  • 69% unique on postal code and birth date
  • 87% US-wide with all three (postal code, birth
    date, and gender)
  • Solution: k-anonymity
  • Any combination of values appears at least k
    times
  • Developed systems that guarantee k-anonymity
  • Minimize distortion of results

14
k-Anonymity
  • Each released record should be indistinguishable
    from at least (k-1) others on its QI attributes
  • Alternatively, the cardinality of any query result
    on released data should be at least k
  • k-anonymity is (the first) one of many privacy
    definitions in this line of work
  • l-diversity, t-closeness, m-invariance,
    delta-presence...
  • Complementary Release Attack
  • Different releases can be linked together to
    compromise k-anonymity.
  • Solution
  • Consider all of the released tables before
    releasing a new one, and try to avoid linking.
  • Other data holders may release data that can be
    used in this kind of attack, so this kind of
    attack is generally hard to prevent completely.
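The complementary release attack can be illustrated with two made-up releases of the same four-record table. Each release is 2-anonymous on (zip, age) by itself, but they generalize different attributes, so an adversary who knows a target's exact ZIP code and age can intersect the candidate sensitive values from both. This is a simplified sketch; the data, attribute names, and helper functions are hypothetical.

```python
from collections import Counter


def is_k_anonymous(release, qis, k):
    """True if every combination of QI values occurs at least k times."""
    sizes = Counter(tuple(r[q] for q in qis) for r in release)
    return min(sizes.values()) >= k


def candidate_values(release, matches, sensitive="disease"):
    """Sensitive values of the records consistent with what the adversary knows."""
    return {r[sensitive] for r in release if matches(r)}


# Two releases of the same table, generalized in complementary ways
release_a = [  # ZIP generalized, exact age
    {"zip": "7508*", "age": 30, "disease": "flu"},
    {"zip": "7508*", "age": 35, "disease": "hiv"},
    {"zip": "7508*", "age": 30, "disease": "cancer"},
    {"zip": "7508*", "age": 35, "disease": "cold"},
]
release_b = [  # exact ZIP, age generalized
    {"zip": "75080", "age": "30-39", "disease": "flu"},
    {"zip": "75080", "age": "30-39", "disease": "hiv"},
    {"zip": "75082", "age": "30-39", "disease": "cancer"},
    {"zip": "75082", "age": "30-39", "disease": "cold"},
]

# The adversary knows the target lives in ZIP 75080 and is 30 years old
from_a = candidate_values(release_a, lambda r: r["zip"] == "7508*" and r["age"] == 30)
from_b = candidate_values(release_b, lambda r: r["zip"] == "75080" and r["age"] == "30-39")
print(from_a & from_b)  # {'flu'}
```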

15
L-diversity principles
  • L-diversity principle: a q-block is l-diverse if it
    contains at least l well-represented values for
    the sensitive attribute S. A table is l-diverse
    if every q-block is l-diverse.
  • l-diversity may be difficult and unnecessary to
    achieve.
  • A single sensitive attribute
  • Two values: HIV positive (1%) and HIV negative
    (99%)
  • Very different degrees of sensitivity
  • l-diversity is unnecessary to achieve
  • 2-diversity is unnecessary for an equivalence
    class that contains only negative records
  • l-diversity is difficult to achieve
  • Suppose there are 10000 records in total
  • To have distinct 2-diversity, there can be at
    most 10000 × 1% = 100 equivalence classes
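A minimal check of distinct l-diversity, the simplest reading of "well represented" (entropy and recursive l-diversity are stricter variants), together with the arithmetic behind the last bullet. The table and attribute names are hypothetical, and `ell` stands for l.

```python
from collections import defaultdict


def equivalence_classes(table, qis):
    """Group rows into q-blocks by their (generalized) QI values."""
    blocks = defaultdict(list)
    for row in table:
        blocks[tuple(row[q] for q in qis)].append(row)
    return blocks


def is_distinct_l_diverse(table, qis, sensitive, ell):
    """Distinct l-diversity: every q-block holds at least ell distinct sensitive values."""
    return all(
        len({row[sensitive] for row in rows}) >= ell
        for rows in equivalence_classes(table, qis).values()
    )


def max_classes_distinct_2_diverse(n_records, positive_rate):
    """Each 2-diverse block needs at least one positive record, so #blocks <= #positives."""
    return round(n_records * positive_rate)


table = [
    {"age": "20-29", "hiv": "negative"},
    {"age": "20-29", "hiv": "positive"},
    {"age": "30-39", "hiv": "negative"},
    {"age": "30-39", "hiv": "negative"},
]
print(is_distinct_l_diverse(table, ("age",), "hiv", 2))  # False: the 30-39 block is all negative
print(max_classes_distinct_2_diverse(10000, 0.01))        # 100
```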

16
Privacy Preserving Distributed Data Mining
  • The goal of data mining is to produce summary results
  • Association rules
  • Classifiers
  • Clusters
  • The results alone need not violate privacy
  • Contain no individually identifiable values
  • Reflect overall results, not individual
    organizations
  • The problem is computing the results without
    access to the data!
  • Data needed for data mining may be distributed
    among parties
  • Credit card fraud data
  • Inability to share data due to privacy reasons
  • HIPAA
  • Even partial results may need to be kept private

17
Secure Multi-Party Computation (SMC)
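  • The goal is computing a function f(x1, x2, ..., xn)
  • without revealing any party's private input xi
  • Semi-Honest Model
  • Parties follow the protocol (but may try to learn
    more from the messages they receive)
  • Malicious Model
  • Parties may or may not follow the protocol
  • We cannot do better than the ideal setting in
    which a trusted third party computes the result
  • Generic SMC is too inefficient for PPDDM
    (privacy-preserving distributed data mining)
  • More efficient enhancements are being explored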
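As one example of a building block that is much cheaper than generic SMC, the sketch below simulates the classic secure sum protocol in the semi-honest model within a single process. It is a hedged illustration under stated assumptions: non-negative integer inputs whose true sum is below the modulus, parties arranged in a ring, and no collusion (the two parties adjacent to party i on the ring can jointly recover xi).

```python
import secrets


def secure_sum(private_values, modulus=2**64):
    """Simulate ring-based secure sum: party 0 masks its value with a random R,
    every other party adds its own value to the masked running total,
    and party 0 finally subtracts R. Assumes the true sum is < modulus."""
    mask = secrets.randbelow(modulus)              # known only to party 0
    running = (private_values[0] + mask) % modulus
    for x in private_values[1:]:                   # each party sees only a uniformly masked total
        running = (running + x) % modulus
    return (running - mask) % modulus              # party 0 removes the mask


print(secure_sum([5, 12, 30]))  # 47
```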

18
Preventing Private Information Inference Attacks
on Social Networks
  • Raymond Heatherly, Murat Kantarcioglu, and
    Bhavani Thuraisingham, The University of Texas at
    Dallas
  • Jack Lindamood, Facebook

19
Graph Model
Lindamood et al. 09 Heatherly et al. 09
  • Graph represented by a set of homogeneous vertices
    and a set of homogeneous edges
  • Each node also has a set of Details, one of which
    is considered private.

20
Naïve Bayes Classification
Lindamood et al. 09 Heatherly et al. 09
  • Classification based only on specified attributes
    in the node
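The slide's exact formula is not reproduced in this transcript, so the following is a generic multinomial Naïve Bayes over a node's details: a minimal sketch, assuming details are strings such as "group:college republicans" and that Laplace (add-one) smoothing is acceptable. The training profiles are made up.

```python
import math
from collections import Counter, defaultdict


def train_details_nb(labeled):
    """labeled: list of (set_of_details, class_label) pairs from public profiles."""
    class_counts = Counter(label for _, label in labeled)
    detail_counts = defaultdict(Counter)  # class -> Counter of details
    vocab = set()
    for details, label in labeled:
        detail_counts[label].update(details)
        vocab |= details
    return class_counts, detail_counts, vocab


def predict_details_nb(model, details):
    """Posterior P(class | details) under the naive independence assumption."""
    class_counts, detail_counts, vocab = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        total = sum(detail_counts[c].values())
        score = math.log(n_c / n)
        for d in details:
            # Laplace (add-one) smoothing keeps unseen details from zeroing a class
            score += math.log((detail_counts[c][d] + 1) / (total + len(vocab)))
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max before exp for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


training = [
    ({"group:college republicans", "religious views:christian"}, "conservative"),
    ({"group:texas conservatives"}, "conservative"),
    ({"group:equal rights for gays", "religious views:agnostic"}, "liberal"),
    ({"group:the democratic party"}, "liberal"),
]
model = train_details_nb(training)
print(predict_details_nb(model, {"group:college republicans"}))
```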

21
Naïve Bayes with Links
Lindamood et al. 09 Heatherly et al. 09
  • Rather than calculating the probability of a link
    from person nx to person ny, we calculate the
    probability of a link from nx to a person with
    ny's traits
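One simplified way to read the link-based model: treat each friend's known class label as the "trait" of the person at the other end of the link, and estimate P(link to a person with label t | C) from links between labeled nodes. The sketch below illustrates that idea under those assumptions; it is not the authors' exact formulation, and the function names and toy graph are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_link_nb(labels, edges):
    """labels: node -> class; edges: undirected (u, v) pairs.
    Counts, per class C, how often a C-labeled node links to each neighbor label."""
    link_counts = defaultdict(Counter)
    for u, v in edges:
        if u in labels and v in labels:
            link_counts[labels[u]][labels[v]] += 1
            link_counts[labels[v]][labels[u]] += 1
    priors = Counter(labels.values())
    return priors, link_counts


def predict_link_nb(model, neighbor_labels):
    """P(C | labels of the node's friends), Laplace-smoothed over the class labels."""
    priors, link_counts = model
    classes = list(priors)
    n = sum(priors.values())
    log_scores = {}
    for c in classes:
        total = sum(link_counts[c].values())
        s = math.log(priors[c] / n)
        for t in neighbor_labels:
            s += math.log((link_counts[c][t] + 1) / (total + len(classes)))
        log_scores[c] = s
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


labels = {"a": "L", "b": "L", "c": "C", "d": "C"}
edges = [("a", "b"), ("c", "d"), ("a", "c")]
model = train_link_nb(labels, edges)
print(predict_link_nb(model, ["L", "L"]))  # a node whose two friends are both labeled L
```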

22
Link Weights
Lindamood et al. 09 Heatherly et al. 09
  • Links also have associated weights
  • Represents how close a friendship is suspected
    to be using the following formula

23
Collective Inference
Lindamood et al. 09 Heatherly et al. 09
  • Collection of techniques that use node attributes
    and the link structure to refine classifications.
  • Uses local classifiers to establish a set of
    priors for each node
  • Uses traditional relational classifiers as the
    iterative step in classification

24
Relational Classifiers
Lindamood et al. 09 Heatherly et al. 09
  • Class Distribution Relational Neighbor
  • Weighted-Vote Relational Neighbor
  • Network-only Bayes Classifier
  • Network-only Link-based Classification
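Of the relational classifiers listed, the weighted-vote relational neighbor (wvRN) is the simplest: P(c | v) is the weight-normalized sum of the neighbors' class probabilities. The sketch below plugs it into a relaxation-labeling loop that starts from local-classifier priors and blends in the wvRN estimate with a shrinking factor. The edge weights, `beta`, `decay`, and the iteration count are illustrative assumptions.

```python
def wvrn_estimate(adj, probs, v):
    """wvRN: P(c | v) is the weight-normalized sum of neighbors' class probabilities."""
    agg = {}
    for u, w in adj[v].items():
        for c, p in probs[u].items():
            agg[c] = agg.get(c, 0.0) + w * p
    z = sum(agg.values())
    return {c: s / z for c, s in agg.items()} if z > 0 else dict(probs[v])


def relaxation_labeling(adj, priors, known, iterations=20, beta=0.9, decay=0.95):
    """Start from local-classifier priors, then repeatedly blend each unknown
    node's distribution with its wvRN estimate (synchronous updates)."""
    probs = {v: dict(p) for v, p in priors.items()}
    for _ in range(iterations):
        # compute all estimates from the current round before updating anything
        estimates = {v: wvrn_estimate(adj, probs, v) for v in adj if v not in known}
        for v, est in estimates.items():
            classes = set(est) | set(probs[v])
            probs[v] = {c: beta * est.get(c, 0.0) + (1 - beta) * probs[v].get(c, 0.0)
                        for c in classes}
        beta *= decay  # shrink the step size so the labels settle
    return probs


# Node x has two liberal friends and one conservative friend; friends' labels are known
adj = {"x": {"a": 1.0, "b": 1.0, "c": 1.0}, "a": {"x": 1.0}, "b": {"x": 1.0}, "c": {"x": 1.0}}
priors = {"x": {"L": 0.5, "C": 0.5}, "a": {"L": 1.0, "C": 0.0},
          "b": {"L": 1.0, "C": 0.0}, "c": {"L": 0.0, "C": 1.0}}
result = relaxation_labeling(adj, priors, known={"a", "b", "c"})
print(result["x"])  # L moves from the 0.5 prior toward the wvRN estimate of 2/3
```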

25
Experimental Data
Lindamood et al. 09 Heatherly et al. 09
  • 167,000 profiles from the Facebook online social
    network
  • Restricted to public profiles in the Dallas/Fort
    Worth network
  • Over 3 million links

26
General Data Properties
Lindamood et al. 09 Heatherly et al. 09
Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | 0.45
Probability Conservative | 0.55
27
Inference Methods
Lindamood et al. 09 Heatherly et al. 09
  • Details only: uses a Naïve Bayes classifier to
    predict the attribute
  • Links only: uses only the link structure to
    predict the attribute
  • Average: classifies based on an average of the
    probabilities computed by Details and Links
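The Average method combines the two local classifiers' outputs. A minimal sketch, assuming each classifier returns a dict of class probabilities and that the two are weighted equally:

```python
def average_method(p_details, p_links):
    """Equal-weight average of the Details-only and Links-only class distributions."""
    classes = set(p_details) | set(p_links)
    return {c: (p_details.get(c, 0.0) + p_links.get(c, 0.0)) / 2 for c in classes}


def predict(distribution):
    """Pick the most probable class."""
    return max(distribution, key=distribution.get)


avg = average_method({"liberal": 0.3, "conservative": 0.7},
                     {"liberal": 0.8, "conservative": 0.2})
print(predict(avg))  # liberal (0.55 vs 0.45)
```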

28
Predicting Private Details
Lindamood et al. 09 Heatherly et al. 09
  • Attempt to predict the value of the political
    affiliation attribute
  • Three Inference Methods used as the local
    classifier
  • Relaxation labeling used as the Collective
    Inference method

29
Removing Details
Lindamood et al. 09 Heatherly et al. 09
  • Ensures that no false information is added to
    the network: all details in the released graph
    were entered by the user
  • Details that have the highest global probability
    of indicating political affiliation are removed
    from the network

30
Removing Links
Lindamood et al. 09 Heatherly et al. 09
  • Ensures that the link structure of the released
    graph is a subset of the original graph
  • From each node, removes the links to the friends
    that are most similar to that node
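The similarity measure used to decide which friends are "most like" a node is not given in this transcript, so the sketch below substitutes Jaccard overlap of detail sets as a hypothetical stand-in; `k` (links removed per node) is likewise an assumption. Because a link can be chosen by either endpoint, a node may lose more than k links.

```python
def jaccard(a, b):
    """Overlap of two detail sets (a stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def remove_most_similar_links(adj, details, k):
    """For each node, drop its links to the k most similar friends.
    adj: node -> set of friends (undirected); details: node -> set of details.
    Only original links are kept, so the result is a subgraph of the input."""
    dropped = set()
    for v, friends in adj.items():
        ranked = sorted(friends, key=lambda u: jaccard(details[v], details[u]), reverse=True)
        for u in ranked[:k]:
            dropped.add(frozenset((v, u)))
    return {v: {u for u in friends if frozenset((v, u)) not in dropped}
            for v, friends in adj.items()}


adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
details = {"a": {"x", "y"}, "b": {"x", "y", "q"}, "c": {"y", "z"}}
print(remove_most_similar_links(adj, details, k=1))  # only the b-c link survives
```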

31
Most Liberal Traits
Lindamood et al. 09 Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
Group | buck fush | 27.05782866
32
Most Conservative Traits
Lindamood et al. 09 Heatherly et al. 09
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
33
Most Liberal Traits per Trait Name
Lindamood et al. 09 Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
34
Experiments
Lindamood et al. 09 Heatherly et al. 09
  • Conducted on 35,000 nodes that listed a
    political affiliation
  • Tests removing 0 details and 0 links, 10 details
    and 0 links, 0 details and 10 links, and 10
    details and 10 links
  • Varied the training set size from 10% of the
    available nodes to 90%

35
Local Classifier Results
Lindamood et al. 09 Heatherly et al. 09
36
Collective Inference Results
Lindamood et al. 09 Heatherly et al. 09
37
Access Control for Social Networks
  • Murat Kantarcioglu

38
Online Social Networks Access Control Issues
  • Current access control systems for online social
    networks are either too restrictive or too loose
  • Selected friends: Bebo, Facebook, and Multiply
  • Neighbors (i.e., the set of users having musical
    preferences and tastes similar to mine): Last.fm
  • Friends of friends: Facebook, Friendster, Orkut
  • Contacts of my contacts (2nd-degree contacts),
    3rd- and 4th-degree contacts: Xing

39
Challenges
I want only my family and close friends to see
this picture.
40
Requirements
  • Many different online social networks with
    different terminology
  • Facebook vs. LinkedIn
  • We need to have flexible models that can
    represent
  • User profiles
  • Relationships among users
  • (e.g., Bob is Alice's close friend)
  • Resources
  • (e.g., online photo albums)
  • Relationships among users and resources
  • (e.g., Bob is the owner of the photo album and
    Alice is tagged in this photo)
  • Actions (e.g., post a message on someone's wall).

41
Overview of the Solution
  • We use semantic web technologies (e.g., OWL) to
    represent the social network knowledge base.
  • We use the Semantic Web Rule Language (SWRL) to
    represent various security, admin, and filtering
    policies.

42
Modeling User Profiles and Resources
  • Existing ontologies such as FOAF could be
    extended to capture user profiles.
  • Relationships among resources could be captured
    using OWL concepts
  • PhotoAlbum rdfs:subClassOf Resource
  • PhotoAlbum consistsOf Photos

43
Modeling Relationships Among Users
  • We model relationships among users by defining
    N-ary relationships
  • Example (Turtle notation):
        :Christine a :Person ;
            :has_friend :_Friendship_Relation_1 .
        :_Friendship_Relation_1 a :Friendship_Relation ;
            :Friendship_trust :HIGH ;
            :Friendship_value :Mike .
  • OWL reasoners cannot be used to infer some
    relationships, such as "Christine is a third-degree
    friend of John".
  • Such computations need to be done separately, and
    their results represented using a new class.
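Such a degree-of-friendship computation can run outside the reasoner as a breadth-first search, and its results can then be asserted as class membership. The sketch below assumes a friend list per person and emits triples for a hypothetical class name such as `Degree3FriendOf_John`; triples are plain Python tuples rather than a real RDF library's API.

```python
from collections import deque


def nth_degree_contacts(friends, person, n):
    """People whose shortest friendship distance from `person` is exactly n."""
    dist = {person: 0}
    queue = deque([person])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue  # no need to expand beyond distance n
        for v in friends.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == n}


def materialize_degree_class(friends, person, n):
    """(subject, 'rdf:type', class) triples for a hypothetical degree-n class."""
    cls = f"Degree{n}FriendOf_{person}"
    return {(v, "rdf:type", cls) for v in nth_degree_contacts(friends, person, n)}


friends = {"John": ["Ann"], "Ann": ["John", "Bob"],
           "Bob": ["Ann", "Christine"], "Christine": ["Bob"]}
print(materialize_degree_class(friends, "John", 3))
# {('Christine', 'rdf:type', 'Degree3FriendOf_John')}
```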

44
Specifying Policies Using OSN Knowledge Base
  • Most OSN information can be captured using OWL,
    which can represent a rich set of concepts
  • This makes it possible to specify very flexible
    access control policies
  • A policy stating that photos can be accessed by
    friends only automatically implies that a
    closeFriend can access the photos too.
  • Policies can easily be defined based on
    user-resource relationships.
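The closeFriend example works because the more specific relationship implies the more general one. A minimal sketch of that inference in plain Python, assuming a hypothetical relationship hierarchy (closeFriend implies friend, friend implies knows) in place of an actual OWL reasoner:

```python
# Hypothetical relation hierarchy: closeFriend specializes friend, which specializes knows
SUPER_RELATION = {"closeFriend": "friend", "friend": "knows"}


def implied_relations(relation):
    """The relation itself plus every more general relation it implies."""
    implied = [relation]
    while relation in SUPER_RELATION:
        relation = SUPER_RELATION[relation]
        implied.append(relation)
    return implied


def can_access(relations_with_owner, viewer, required):
    """relations_with_owner: viewer -> set of relations the owner has asserted with them."""
    return any(required in implied_relations(r)
               for r in relations_with_owner.get(viewer, set()))


relations = {"Bob": {"closeFriend"}, "Eve": {"knows"}}
print(can_access(relations, "Bob", "friend"))  # True: closeFriend implies friend
print(can_access(relations, "Eve", "friend"))  # False
```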

45
Security Policies for OSNs
  • Access control policies
  • Filtering policies
  • Could be specified by user
  • Could be specified by authorized user
  • Admin policies
  • The security admin specifies who is authorized to
    specify filtering and access control policies
  • Example: if U1 isParentOf U2 and U2 is a child, then
    U1 can specify filtering policies
    for U2.

46
Security Policy Specification (using semantic web
technologies)
  • Semantic Web Rule Language (SWRL) is used for
    specifying access control, filtering and
    authorization policies.
  • SWRL is based on OWL
  • all rules are expressed in terms of OWL concepts
    (classes, properties, individuals, literals).
  • Using SWRL, the subject, object, and actions are
    specified
  • Rules can grant different authorizations, each
    stating the subject's rights on a target object.

47
Knowledge Base for Authorizations and
Prohibitions
  • Authorizations/prohibitions need to be specified
    using OWL
  • A different object property for each action
    supported by the OSN
  • Authorizations/prohibitions could automatically
    propagate based on action hierarchies
  • Assume post is a subproperty of write
  • If a user is given post permission, then the user
    will have write permission as well
  • Admin prohibitions need to be specified slightly
    differently (Supervisor, Target, Object,
    Privilege)
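The propagation rule for post and write can be sketched as a closure over (subject, action, object) authorizations. The `SUPER_ACTION` table is a hypothetical stand-in for the OWL subproperty declarations; a real system would rely on the reasoner instead.

```python
# Hypothetical action hierarchy mirroring "post is a subproperty of write"
SUPER_ACTION = {"post": "write"}


def propagate(authorizations):
    """Close (subject, action, object) authorizations under the action hierarchy."""
    closed = set(authorizations)
    frontier = list(closed)
    while frontier:
        subject, action, obj = frontier.pop()
        parent = SUPER_ACTION.get(action)
        if parent is not None and (subject, parent, obj) not in closed:
            closed.add((subject, parent, obj))
            frontier.append((subject, parent, obj))
    return closed


print(sorted(propagate({("Alice", "post", "BobWall")})))
# [('Alice', 'post', 'BobWall'), ('Alice', 'write', 'BobWall')]
```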

48
Security Rule Examples
  • SWRL rule specifications depend on the
    authorization and OSN knowledge bases.
  • It is not possible to specify generic rules
  • Examples

49
Security Rule Enforcement
  • A reference monitor evaluates the requests.
  • Admin requests for access control can be
    evaluated by rule rewriting
  • Example: assume Bob submits the following admin
    request
  • It is rewritten as the following rule

50
Security Rule Enforcement
  • Admin requests for Prohibitions could be
    rewritten as well.
  • Example: Bob issues the following prohibition
    request
  • Rewritten version
  • Access control requests need to consider both
    filtering and access control policies
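A reference monitor that considers both kinds of policies can be sketched as follows. The deny-overrides combination (any matching prohibition or filter blocks an otherwise authorized request), the representation of rules as Python predicates, and the small knowledge base are all assumptions for illustration; in the framework described here the rules would be SWRL evaluated over the OWL knowledge base.

```python
from typing import Callable, List, NamedTuple


class Request(NamedTuple):
    subject: str
    action: str
    obj: str


Rule = Callable[[Request], bool]


def reference_monitor(request: Request, access_rules: List[Rule],
                      prohibitions: List[Rule], filters: List[Rule]) -> bool:
    """Grant only if some access rule authorizes the request and
    no prohibition or filtering rule matches it (deny overrides)."""
    if not any(rule(request) for rule in access_rules):
        return False
    if any(rule(request) for rule in prohibitions):
        return False
    return not any(rule(request) for rule in filters)


# Hypothetical knowledge base
friend_of = {("Bob", "Alice")}  # Alice is Bob's friend
owner = {"photo1": "Bob", "photo2": "Bob"}
tags = {"photo1": {"beach"}, "photo2": {"violent"}}

access_rules = [lambda r: r.action == "view" and (owner.get(r.obj), r.subject) in friend_of]
prohibitions: List[Rule] = []
# e.g., a filter set by Alice's parent: Alice must not see photos tagged "violent"
filters = [lambda r: r.subject == "Alice" and "violent" in tags.get(r.obj, set())]

print(reference_monitor(Request("Alice", "view", "photo1"), access_rules, prohibitions, filters))  # True
print(reference_monitor(Request("Alice", "view", "photo2"), access_rules, prohibitions, filters))  # False
```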

51
Framework Architecture
52
Conclusions
  • Various attacks exist to
  • Identify nodes in anonymized data
  • Infer private details
  • Recent attempts to strengthen social network
    access control can limit some of these attacks
  • Balancing privacy, security and usability on
    online social networks will be an important
    challenge
  • Directions
  • Scalability
  • We are currently implementing such a system to
    test its scalability.
  • Usability
  • Create techniques to automatically learn rules
  • Create simple user interfaces so that users can
    easily specify these rules.