Title: A Study of social influence in diffusion of innovation over Facebook
1A Study of social influence in diffusion of
innovation over Facebook
- Shaomei Wu
- sw475_at_cornell.edu
- Information Science
- Cornell University
- Information Science Breakfast Dec 5 2008
2Diffusion of Innovation
- "Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system." - Everett M. Rogers
- innovation: Friendship Quiz, a Facebook application
- communicated: invitations among Facebook friends
- time: September 25, 2008 to now
- social system: Facebook
Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6.
3Basic Diffusion Models
Threshold Model
Cascade Model
Statistically Equivalent
David Kempe, Jon Kleinberg, Eva Tardos. Maximizing the Spread of Influence through a Social Network. KDD 2003.
4Cascade Model
- Each recommendation succeeds with a certain probability.
[Figure: example network of nodes a to l. Edge labels: pab, pac, pad, pae, paf, pag, pdi, pdj, pgk, pgl. Legend: non-adopter, adopter, social link, recommendation.]
Question: how to estimate puv?
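A minimal Python sketch of the cascade process on this slide, assuming the standard independent-cascade formulation: each new adopter u gets one chance to recommend to each non-adopting neighbor v, and the recommendation succeeds with probability puv. The function name `simulate_cascade`, the toy edges, and the probabilities are illustrative assumptions, not data from the study.

```python
import random

def simulate_cascade(edge_prob, seeds, rng=None):
    """Run one independent-cascade simulation.

    edge_prob maps (u, v) -> p_uv, the chance that u's recommendation
    to v succeeds. Each new adopter gets a single attempt per neighbor.
    Returns the set of adopters when the process stops.
    """
    rng = rng or random.Random()
    # Build adjacency lists from the directed edge keys.
    neighbors = {}
    for (u, v) in edge_prob:
        neighbors.setdefault(u, []).append(v)

    adopters = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors.get(u, []):
                if v in adopters:
                    continue  # already adopted, nothing to do
                if rng.random() < edge_prob[(u, v)]:
                    adopters.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return adopters

# Hypothetical edge probabilities (1.0 always succeeds, 0.0 never does).
edge_prob = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "d"): 1.0, ("d", "c"): 0.0}
result = simulate_cascade(edge_prob, seeds=["a"], rng=random.Random(0))
```

With these extreme probabilities the outcome is deterministic: a infects b, b infects d, and c is never reached. Estimating realistic values of puv is the open question the rest of the talk addresses.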
5Question: how to estimate puv?
- Current practice
- Constant [1]
- Based ONLY on network structure (e.g. in/out-degree) [2]
Do individuals and the social relationships among them matter?
[1] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
[2] Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC) 2006.
6Theories from Empirical Diffusion Research
- Opinion leaders have greater exposure to mass media than their followers, are more cosmopolite, have greater social participation, have higher socioeconomic status, and are more innovative (Rogers 2003, pp. 316-318).
- Heterophily between participants on certain attributes (i.e. education and socioeconomic status) is important in determining the efficiency of diffusion, even though more effective communication occurs when two or more individuals are homophilous (Rogers 2003, p. 19).
7This project is to
- Model puv for the cascade model
- Identify the most influential factors in determining puv
- Predict the success of contagion
- Exploit Facebook data
- A real-world, ongoing diffusion instance
- Rich and (most of the time) trustworthy profile information of individuals and their social connections/activities
- Precisely timestamped diffusion process: a complete log of events
8Status
- Launched Sep 25 2008.
- Data used so far runs through Nov 25 2008.
- 216 adopters
- 375 individuals
- 737 edges between 266 pairs of people
- 90 successful infections
- 178 failed infections
- Network Evolution (in the first month after
release)
10Predict the success of invitation with SVM
- A binary classifier
- Each invitation either succeeds or fails.
- Features
- Individual features
- Pair features (homophily/heterophily)
11Individual Features
Number of events attended/invited, number of photos tagged, number of wall posts, number of networks, number of groups participated in, number of notes, Religion, Political View, Gender, Age, Culture Background, Relationship Status, Work Info, Education Info
Feature categories: Social Activeness, Innovativeness, Socioeconomics, Education
12Pair-wise Features
Age difference, Same gender, Same political view, Same religion, Same culture background, number of same networks, number of photos both tagged in, number of groups both participated in, number of events both attended, Same education level, Same high school, Same college, Same workplace, Same current city
Feature categories: Biological traits, Belief, Socioeconomics, Proximity
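As a rough illustration of how pair-wise features like these could be derived for one invitation, here is a small Python sketch. The profile dictionaries and their keys (`gender`, `groups`, `networks`, and so on) are hypothetical; the slides do not describe the actual data schema.

```python
def pair_features(sender, receiver):
    """Compute a few homophily/heterophily features for one invitation.

    Both arguments are dicts with hypothetical keys. A missing value
    never counts as a match for the "same_*" flags.
    """
    def same(key):
        a, b = sender.get(key), receiver.get(key)
        return int(a is not None and a == b)

    def shared(key):
        return len(set(sender.get(key, [])) & set(receiver.get(key, [])))

    features = {
        "same_gender": same("gender"),
        "same_political_view": same("political_view"),
        "same_religion": same("religion"),
        "same_current_city": same("current_city"),
        "common_network_count": shared("networks"),
        "common_group_count": shared("groups"),
    }
    # Age difference is only defined when both ages are known.
    if sender.get("age") is not None and receiver.get("age") is not None:
        features["age_difference"] = abs(sender["age"] - receiver["age"])
    return features

# Made-up profiles for illustration only.
sender = {"age": 24, "gender": "female", "groups": ["g1", "g2"], "networks": ["Cornell"]}
receiver = {"age": 27, "gender": "male", "groups": ["g2", "g3"], "networks": ["Cornell"]}
feats = pair_features(sender, receiver)
```

Treating a missing profile field as "not the same" is one possible design choice; another would be to emit a separate "unknown" indicator so the classifier can tell the two cases apart.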
13Each invitation is a training example for machine learning.
Training Data
All numerical features are normalized across examples.
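The slide does not say which normalization was used. The sketch below assumes min-max scaling of each numeric feature to the range [0, 1] across all examples; the function name and the sample values are illustrative only.

```python
def min_max_normalize(rows):
    """Scale each numeric column of rows (a list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        scaled = []
        for x, lo, hi in zip(row, lows, highs):
            # A constant column carries no information; map it to 0.0.
            scaled.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(scaled)
    return normalized

# Hypothetical columns: wall post count, group count, a constant feature.
rows = [[10, 2, 1], [30, 4, 1], [20, 8, 1]]
scaled = min_max_normalize(rows)
```

Normalizing across examples keeps features with large raw ranges (such as wall post counts) from dominating features with small ranges when the classifier computes its weights.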
14AdaBoost (with DecisionStump): a popular way to do feature selection.
- Selected Features
- sender wall post count
- sender group count
- sender network count
- receiver age
- receiver group count
- sender-receiver common group count
- Performance (10-fold cross-validation)
- Accuracy: 83.6%
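To show why boosting over decision stumps doubles as feature selection, here is a compact pure-Python sketch of discrete AdaBoost with single-feature threshold stumps. It is a generic textbook version written for illustration, not the toolkit or configuration used in the study; the toy data and all function names are assumptions.

```python
import math

def best_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) stump with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] > t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost_select(X, y, rounds=5):
    """Run AdaBoost over stumps; return the stumps and the feature indices they use."""
    n = len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        err, j, t, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        stumps.append((alpha, j, t, polarity))
        # Raise the weight of misclassified examples, lower the rest.
        for i, row in enumerate(X):
            pred = polarity if row[j] > t else -polarity
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    selected = sorted({j for _, j, _, _ in stumps})
    return stumps, selected

def predict(stumps, row):
    """Weighted vote of all stumps; labels are +1 (success) and -1 (failure)."""
    score = sum(a * (p if row[j] > t else -p) for a, j, t, p in stumps)
    return 1 if score >= 0 else -1

# Toy data: feature 1 determines the label, feature 0 is noise.
X = [[0.3, 0.1], [0.9, 0.2], [0.1, 0.8], [0.7, 0.9]]
y = [-1, -1, 1, 1]
stumps, selected = adaboost_select(X, y, rounds=3)
```

Each stump splits on exactly one feature, so the set of features that appear in the boosted ensemble is a natural shortlist. On the toy data only feature 1 is ever chosen.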
15SVM performance
- SVM-light (10-fold cross-validation)
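For readers unfamiliar with k-fold cross-validation, the following sketch shows one way to split example indices into folds. It assumes the examples have already been shuffled; the helper name and the example count of 25 are hypothetical and unrelated to the study's data.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Examples are assigned to folds in order; the first n_examples % k
    folds get one extra example so fold sizes differ by at most one.
    """
    base, extra = divmod(n_examples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_examples))
        yield train, test
        start += size

# Hypothetical example: 25 examples split into 10 folds.
fold_sizes = [len(test) for _, test in k_fold_splits(25, 10)]
```

Every example lands in exactly one test fold, and the reported performance is typically the average over the k train/test rounds.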
16Weights from SVM
17Result
- SVM-light performance
- 209 records split into 5 folds: 4 for training, 1 for testing.
- Performance on the testing set
- Accuracy: 71.43% (30 correct, 12 incorrect, 42 total)
- Precision/recall: 55.56%/38.46%
- Feature weights distribution
Top weighted features:
- 8 sender_events_invited
- 4 sender_friend_count
- 11 sender_gender
- 35 receiver_is_Its Complicated
- 5 sender_wall_post_count
- 9 sender_note_count
- 27 sender_is_In a Relationship
So the story can be: when a sender who has been invited to a greater number of Facebook events, has more friends, has written more Facebook notes (blog entries), is female, has fewer wall posts, and is in a relationship tries to infect a person whose relationship status is "It's Complicated", the infection is more likely to happen than in other cases.
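The reported test-set figures can be cross-checked with a few lines of arithmetic. The slides do not give a confusion matrix; the counts below (5 true positives, 4 false positives, 8 false negatives, 25 true negatives) are an inference, chosen because they reproduce 30 correct out of 42 together with the stated precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Inferred counts, not taken from the slides.
metrics = classification_metrics(tp=5, fp=4, fn=8, tn=25)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
print(percentages)
# {'accuracy': 71.43, 'precision': 55.56, 'recall': 38.46}
```

Read this way, the classifier flagged 9 invitations as successful and was right on 5 of them, while 13 invitations in the test fold actually succeeded.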
18SVM with features selected by AdaBoost
19Background
- Diffusion of Innovation
- Question
- How does it work in large online social networks?
- What are the key factors in determining the success of infection?
- Can we predict the propagation path?
20Hypothesis
- Social influence depends on 5 dimensions of similarity
- geographical distance
- current location (country/state/city), current school, current major, year of class, current workplace, current courses enrolled
- background similarity
- sex, sexual preference, dating interest, relationship interest, relationship status, birthday, political view, religious view, hometown address, previous school, previous workplace
- social similarity
- number of mutual networks they belong to, number of mutual friends
- interest similarity
- activities, favorite books, favorite music, favorite movies, favorite TV shows, favorite quotes
- social status distance
- difference in number of friends, difference in wall post counts, difference in counts of messages sent and received, difference in counts of notes
21Project Description
- Objectives
- Identify the key factors for social influence
- Predict occurrence of adoption based on the key factors.
- Friendship Quiz
- A Facebook application we developed
- Enables users to make quizzes and send them to their friends (take a peek!)
- We track the spread of the application.
22Highlights
- A real-world diffusion of innovation
- Rich and (most of the time) trustworthy profile information of individuals and their social connections/activities
- Precisely timestamped diffusion process: a complete log of events
- Ongoing diffusion process
23Backup Threshold Model