Title: Information Revelation and Privacy in Online Social Networks
1Information Revelation and Privacy in Online
Social Networks
- Ralph Gross and Alessandro Acquisti
- rgross_at_cs.cmu.edu acquisti_at_andrew.cmu.edu
- Heinz Seminars, October 3rd, 2005
2Information revelation and privacy in online social networks
- Online social networks (OSN): sites that facilitate interaction between members through their self-published personal profiles
- How much do users of OSN reveal about themselves online?
- A lot
- To whom?
- Friends and strangers
- Why?
3Why?
- Rationality hypothesis: signaling
- Low privacy sensitivity
- Herding behavior
- Peer pressure
- Myopic discounting
- Incomplete information
4Privacy, economics, and rationality
- Incomplete information
- Bounded rationality
- Affective processes, psychological/behavioral
deviations from pure rationality model
5Our study
- Starts research on privacy implications of OSN
- Provides first quantification of observed behavior
- Studies actual usage data
- Discusses trade-offs and incentives and advances behavioral hypotheses
- Yet, still preliminary
- Implications extend beyond OSN domain
6Agenda
- Online social networks
- The Facebook
- CMU students and the Facebook
- Usage data
- Patterns of information revelation
- Inferred privacy preferences
- Risks and trade-offs
- User survey (pilot)
- Users' knowledge and expectations
- Drivers and incentives
- Next step
- Experiments
7Online Social Networks
8What are online social networks?
- Sites that facilitate interaction between members through their self-published personal profiles
- Common core
- Through the site, individuals offer representations of their selves to others to peruse, with the intention of contacting or being contacted by others, to meet new friends or dates, find new jobs, receive or provide recommendations, ...
- Progressive diversification and sophistication of purposes and usage patterns
- Social Software Weblog groups hundreds of social networking sites in nine categories (business, common interests, dating, face-to-face facilitation, friends, pets, photos, ...)
- Classifieds <> OSN <> blogs
9A history of online social networks
- 1960s: Plato (University of Illinois)
- 1997: SixDegrees.com
- After 2002: commercial explosion
- Friendster, Orkut, LinkedIn, ...
- Viral growth, with participation expanding at rates topping 20% a month
- 7 million Friendster users; 2 million MySpace users; 16 million registered on Tickle to take personality test (Leonard 2004)
- Revenues: advertising, data trading, subscriptions
- Media attention: Salon, NYT, Wired, ...
10Research on online social networks
- boyd (2003): trust and intimacy on OSN
- Donath and boyd (2004): representation of self on OSN
- Liu and Maes (2005): harvesting OSN for recommender systems
- (some additional research uses OSN data for other purposes)
11From (social) network theory to online networks
- Milgram (1967): the small world problem
- Watts (2003): six degrees
- Granovetter (1973, 1983): weak and strong ties
- Milgram (1977): the familiar stranger
- What about the unknown buddy?
12Social network theory and privacy
- Strahilevitz (2005)
- Discourse about privacy should be based on what the parties should have expected to follow the initial disclosure of information by someone other than the defendant
- Consideration of expected information flows within/outside somebody's social network should inform that person's expectations for privacy
- However, application to online social networks reveals challenges
13Online vs offline social networks
- Offline: extremely diverse ties. Online: simplistic binary relations (boyd 2004)
- Number of strong ties not significantly increased, but number of weak ties can increase substantially (Donath and boyd 2004)
- From a dozen intimate ties plus 1000 to 1700 acquaintances, to hundreds of direct friends and hundreds of thousands of relations
14Hence
- Online social networks are vaster and have more weak ties than offline social networks
- An imagined community?
- Anderson (1991)
- Intimacy and trust
- Sharing same personal information with a large and potentially unknown number of friends and strangers
- Intimate with everybody? (Gerstein 1984)
- Ability to meaningfully interact with others is mildly augmented, while ability of others to access the person is significantly enlarged
15Online social networks and personal information
- Pretense of identifiability changes across different types of sites
- Anonymous <> Pseudonymous <> Fully identified
- Type of information revealed or elicited often orbits around hobbies and interests, but can stride from there in different directions
- From classified to journals
- Visibility of information is highly variable
- Members only
- Everybody
16Online social networks and privacy
- Privacy implications of OSN depend on the level of identifiability of the information provided, its possible recipients, and its possible uses
- Re-identification
- Two directions: known > additional information; unknown > known
- To whom may identifiable information be made available?
- Site, third parties (hackers, government), users (little control on social network and its expansion)
- Risks
- From identity theft to online and physical stalking; from embarrassment and blackmailing to spam and price discrimination
17Online social networks and privacy
- And yet
- OSN can also offer tools to address online privacy problems
- "Social networking has the potential to create an intelligent order in the current chaos by letting you manage how public you make yourself and why and who can contact you." Tribe.net CEO Mark Pincus
- Is that true?
18The Facebook
19The Facebook
- www.facebook.com
- Started February 2004
- Attracted Silicon Valley funding
- Has spread to 2000 schools and 4.2 million users
- Typically attracts 80 percent of a school's undergraduate population
- Also gets graduate students, faculty members, staff, and alumni
- Now targeting high schools
- Growing media attention
21Facebook's privacy policy
- ... is lax, but straightforwardly so
- "Facebook also collects information about you from other sources, such as newspapers and instant messaging services. This information is gathered regardless of your use of the Web Site."
- "We use the information about you that we have collected from other sources to supplement your profile unless you specify in your privacy settings that you do not want this to be done."
- "In connection with these offerings and business operations, our service providers may have access to your personal information for use in connection with these business activities."
22Facebook and unique privacy issues
- Unique data
- Includes home location, current location (from IP address), etc.
- Uniquely identified
- College email account
- Contact information
- Ostensibly bounded community
- Shared real space
- ... or imagined community?
23CMU students and the Facebook: usage data
24Studies
- Gross and Acquisti, Proceedings of WPES 2005
- Acquisti and Gross, Proceedings of PET 2006
25Data gathering
- In June 2005, we created Facebook profiles with different characteristics
- E.g., degree of connectedness, geographical location, ...
- We searched for CMU Facebook members' profiles using advanced search feature and extracted profile IDs
- Downloaded profiles
- Inferred additional information not immediately visible from profiles
26Demographics
27Demographics
28Demographics
29Information revelation
30Information revelation
- Male users 63% more likely to leave phone number than female users
- Single male users tend to report their phone numbers in even higher frequencies
31Data verifiability
32Data verifiability
33Privacy risks
- Stalking
- Re-identification
- Digital dossier
34Privacy risks: Stalking
- Real-World Stalking
- College life centers around class attendance
- Facebook users put home address and class list on their profiles; whereabouts are known for large portions of the day
- Online stalking
- Facebook profiles list AIM screennames
- AIM lets users add buddies without notification
- Unless AIM privacy settings have been changed, adversary can track when user is online
35Privacy risks: Re-identification
- Demographics re-identification
- 87% of US population is uniquely identified by gender, ZIP, date of birth (Sweeney, 2001)
- Facebook users that put this information up on their profile could be linked to outside, de-identified data sources
- Face re-identification
- Facebook profiles often show high quality facial images
- Images can be linked to de-identified profiles on e.g. Match.com or Friendster.com using face recognition
- Social Security Number re-identification
- Anatomy of a social security number: xxx yy zzzz
- Based on hometown and date of birth, xxx and yy can be narrowed down substantially
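The demographic re-identification point can be made concrete with a small sketch. It counts how many records in a dataset are the only ones carrying a given (gender, ZIP, date of birth) combination. The records and the function name `unique_fraction` are hypothetical and chosen for illustration only; they are not the study's data or method.

```python
from collections import Counter

# Hypothetical toy records: (gender, ZIP code, date of birth). Not real data.
records = [
    ("F", "15213", "1985-03-14"),
    ("M", "15213", "1985-03-14"),
    ("M", "15217", "1984-11-02"),
    ("M", "15217", "1984-11-02"),  # shares all three fields with the previous record
    ("F", "15232", "1986-07-30"),
]


def unique_fraction(rows):
    """Return the share of rows whose quasi-identifier tuple occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)


print(f"{unique_fraction(records):.0%} of the toy records are unique")
```

A record that is unique on these three fields can, in principle, be joined against any outside dataset that carries the same fields, which is the linkage risk described above.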
36Privacy risks: Digital Dossier
- Users reveal sensitive information (e.g. current partners, political views) in profiles
- Simple script programs allow adversaries to continuously retrieve and save all profile information
- Cheap hard drives enable essentially indefinite storage
37Privacy risks
38Data accessibility
39Data accessibility
40Data accessibility
- Profile Searchability
- We measured the percentage of users that changed search default setting away from being searchable to everyone on the Facebook to only being searchable to CMU users
- 1.2% of users (18 female, 45 male) made use of this privacy setting
- Profile Visibility
- We evaluated the number of CMU users that changed profile visibility by restricting access from unconnected users
- Only 3 profiles (0.06%) in total fall into this category
- Caveat: We would not detect users who had made themselves both unsearchable and invisible within CMU network (safe to assume their number is very low)
41Data accessibility
42Actual data accessibility: An imagined community?
- Extensive, uncontrolled social networks
- Fragile protection
- Fake email addresses
- Manipulating users
- Geographical location
- Advanced search features
- Using advanced search features, various profile information can be searched for, e.g. relationship status, phone number, sexual preferences, political views and (college) residence
- By keeping track of the profile IDs returned in the different searches, a significant portion of the previously inaccessible information can be reconstructed
- AIM
- Facebook profiles are, effectively, public data
43Actual data accessibility: An imagined community
- "What a great illustration of how things you might not mind being public in one context can cause all sorts of problems when they wind up globally public."
- CMU student
44Initial hypotheses
- Default settings (Mackay 1991) / Myopic discounting?
- Less than 2% make their profiles less searchable
- Less than 1% make their profiles less visible
- Peer pressure
- Incomplete information and biased perspectives
- An imagined community
- Or simply
- Low privacy concerns
- Signaling
- Single males list their phone number far more frequently than females, a highly significant difference
45User survey (pilot)
46(Pilot) Survey
- Goals
- Understand CMU Facebook users' degree of awareness about the site and its information revelation patterns; understand their privacy attitudes and expectations
- Thirty-six online questions
- Anonymous, paid
- Pilot
- 50 subjects
- Focused on Facebook users
- Survey link
47CAVEAT: The following results are based on our pilot test (50 subjects). Hence they must only be considered suggestive trends rather than robust evidence. We are now exploring the same questions in the full survey; please contact us for the most recent results: acquisti_at_andrew.cmu.edu.
48Generic concerns (7-point Likert scale)
49Specific concerns (7-point Likert scale)
50Attitudes vs. behavior
- Share of users with high sensitivity (Likert > 5) to partner/sexual orientation information who provide it on Facebook: 70%
- Share of users with high sensitivity (Likert > 5) to home location and class schedule information who provide it on Facebook: 32%
- Share of users with high sensitivity (Likert > 5) to contact information who provide it on Facebook: 42%
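A minimal sketch of how such a share could be computed from survey rows: keep only respondents whose sensitivity rating exceeds the threshold, then take the fraction of them who still publish the item. The response data and the function name are hypothetical and serve only to illustrate the arithmetic; they are not the pilot survey's data.

```python
# Hypothetical toy responses: (Likert sensitivity on a 1-7 scale, provides the item on profile).
responses = [
    (7, True), (6, True), (6, False), (4, True),
    (2, True), (7, False), (3, False), (6, True),
]


def share_providing_among_sensitive(rows, threshold=5):
    """Among respondents with sensitivity above the threshold, return the share who provide the item."""
    sensitive = [provides for score, provides in rows if score > threshold]
    if not sensitive:
        return None  # no highly sensitive respondents, so the share is undefined
    return sum(sensitive) / len(sensitive)


share = share_providing_among_sensitive(responses)
print(f"{share:.0%} of highly sensitive toy respondents still provide the item")
```

The denominator is the highly sensitive group, not the full sample, so a high share signals a gap between stated attitude and observed behavior.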
51Awareness: visibility and searchability
- 21% incorrectly believe only CMU users can search their profiles
- 71% do not realize that everybody at UPitt can search their profiles
- 40% do not realize that anybody on Facebook can search their profiles
- 31% do not realize that everybody at CMU can read their profiles
- On the other side, 23% incorrectly believe that everybody on Facebook can read their profiles
52Facebook's privacy policy, revisited
- "Facebook also collects information about you from other sources, such as newspapers and instant messaging services. This information is gathered regardless of your use of the Web Site."
- 85% believe that is not the case
- "We use the information about you that we have collected from other sources to supplement your profile unless you specify in your privacy settings that you do not want this to be done."
- 87% believe that is not the case
- "In connection with these offerings and business operations, our service providers may have access to your personal information for use in connection with these business activities."
- 60% believe that is not the case
- Control: perusal of privacy policy does not improve awareness
53Privacy concerns
- 69% believe that the information other Facebook users reveal may create privacy risks for those users
- But
54Information revelation
- Reasons to provide more personal information (in order of importance)
- No factor in particular, it's just fun
- No factor in particular, but the amount of information I reveal is necessary to me and other users to benefit from the FaceBook
- No factor in particular, rather I am following the norms and habits common on the site
- Quite simply, expressing myself and defining my online persona
- Showing more information about me to "advertise" myself
- ..
- Getting more potential dates
55Other privacy concerns
- Reasons for low privacy concerns (in order of importance)
- Control on information
- Control on access
- CMU environment
- Student environment
56Other privacy concerns
- Does your Facebook profile contain information that you might not mind being "public" within your Facebook or CMU network, but that would indeed bother you if other people could access it (e.g., family, interviewers, etc.)?
- 50% answer yes
57Is it possible/likely?
58Next steps
59Next steps
- Full survey
- Users and non-users: different privacy sensitivities?
- Experiments
- Control for initial privacy settings
- Control for perception of other users' information patterns
- Control for perception of other users' information revelation
- Other scripts
- Study evolution of a new network
- Study dynamics of information revelation
60Conclusions
- OSN offer exciting ground for privacy research
- Plenty of information revelation
- Alternative explanations
- Actual usage data
- The unknown buddy?
- An imagined community?
61Conclusions
- Facebook users claim, in general, to be concerned about their privacy but
- Publish plenty of personal information
- Do not use privacy enhancing features
- However, they are both
- uninformed about specific information revelation patterns
- aware of generic possibilities
- Suggestive evidence pointing towards
- Signaling, but also
- Myopic discounting
- Incomplete information