Tutorial: Analyzing real network data

Provided by: James9
Learn more at: https://people.duke.edu

1
Tutorial: Analyzing real network data: 1) Creating data from survey
  • You can download all of the needed files from here:
    http://www.soc.duke.edu/jmoody77/rwj/wsfiles.htm
  • This is (modified) data from one of the Add Health schools. I've
    changed the data some for security reasons. We'll walk through some
    of the data coding issues, create measures and figures, and then run
    peer influence and structural models on the network.
  • Outline
    • From survey to analysis files
    • Exploring the network: visualization
    • Network Behavior: Peer Influence Models
      • Network structure as an independent variable
      • Peer influence models
      • Dyad similarity models
    • Network Structure analyses
      • Clustering for peer groups
      • Block models
    • Statistical Models for Networks (STATNET)

2
1) Creating data from survey
This is what students filled out in the Add Health in-school survey:
one set for male friends, another for female friends. This is the
foundation of our data.
3
1) Creating data from survey
The survey responses result in a nomination data file that looks
something like this (actual numbers changed). We want to turn this
file into something PAJEK, UCINET, etc. can read. Open netcreate.sas
and walk through the logic of the file.
4
1) Creating data from survey
Netcreate.sas uses files from SPAN to create PAJEK files. PAJEK files
have a fixed structure that is easy to program for. See the PAJEK
support files for details. There are programs that convert Excel or
text to PAJEK format, and UCINET (and STATNET, sort of) can read
PAJEK .NET files.
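The fixed structure mentioned above can be sketched in a few lines. This is not netcreate.sas, only a minimal Python illustration of the common Pajek .net layout (a `*Vertices` block followed by an `*Arcs` block). The function name `to_pajek_net` and the example nomination pairs are hypothetical.

```python
def to_pajek_net(nominations):
    """Return Pajek .net text for a list of (ego_id, alter_id) nominations."""
    # Pajek numbers vertices 1..n, so map each survey ID to a position.
    ids = sorted({node for pair in nominations for node in pair})
    index = {node_id: i for i, node_id in enumerate(ids, start=1)}

    lines = [f"*Vertices {len(ids)}"]
    for node_id in ids:
        # Keep the original survey ID as the vertex label.
        lines.append(f'{index[node_id]} "{node_id}"')

    # *Arcs marks directed ties: sender, receiver, weight.
    lines.append("*Arcs")
    for ego, alter in nominations:
        lines.append(f"{index[ego]} {index[alter]} 1")
    return "\n".join(lines) + "\n"


# Hypothetical nomination records (ego, alter); not the tutorial data.
example = [(101, 205), (205, 101), (101, 317), (317, 205)]
net_text = to_pajek_net(example)
print(net_text)
```

Because the vertex labels carry the survey IDs, the position-to-ID mapping can be recovered later when measures are read back from PAJEK.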
5
2) Exploring the network graphically
I think it's extremely useful to simply play with the network in
various ways and get a sense of the shape of the network. This is
perhaps PAJEK's most useful feature. Load a network and work through
good/bad plots.
6
2) Exploring the network graphically
  • Once you have a network, how do you create a print-ready image?
    • Screen shots (good for .ppt)
    • Export to .ps or FLASH and edit in Illustrator

7
3) Network Behavior: Peer Influence
We often want to know how some simple features of network position
affect students. These are network behavior models, where some
indicator of network position is used to predict an outcome. One
should think carefully about a theoretical model here, since cause is
often very difficult to disentangle. Here we'll leave those questions
aside and simply look for correlates of network position in behavior.
We'll look at:
a) network volume (degree)
b) centrality (closeness)
c) local reciprocity (the proportion of ties ego sends that are
   reciprocated)
We can get most of these from either SAS or PAJEK, though I'm not
sure PAJEK can give you node-level reciprocity rates.
Paj_nodestatread.sas is the SAS file.
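As a rough illustration of the three measures listed above, here is a minimal Python sketch. It is not the tutorial's SAS code. The closeness convention (reachable nodes divided by summed geodesic distance along outgoing ties) is one of several in use and is an assumption here, as are the function name `node_stats` and the example arcs.

```python
from collections import deque


def node_stats(arcs):
    """Degree, closeness, and local reciprocity for a directed arc list."""
    nodes = sorted({n for arc in arcs for n in arc})
    out_nbrs = {n: set() for n in nodes}
    in_nbrs = {n: set() for n in nodes}
    for ego, alter in arcs:
        out_nbrs[ego].add(alter)
        in_nbrs[alter].add(ego)

    stats = {}
    for ego in nodes:
        # Breadth-first search over outgoing ties gives geodesic distances.
        dist = {ego: 0}
        queue = deque([ego])
        while queue:
            current = queue.popleft()
            for nbr in out_nbrs[current]:
                if nbr not in dist:
                    dist[nbr] = dist[current] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        # Closeness over the nodes ego can reach (an assumed convention).
        closeness = (len(dist) - 1) / total if total else 0.0

        sent = out_nbrs[ego]
        returned = sent & in_nbrs[ego]
        stats[ego] = {
            "outdegree": len(sent),
            "indegree": len(in_nbrs[ego]),
            "closeness": closeness,
            # Share of ego's sent ties that the alter sends back.
            "reciprocity": len(returned) / len(sent) if sent else None,
        }
    return stats


# Hypothetical arcs (sender, receiver); not the tutorial data.
arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b")]
stats = node_stats(arcs)
```

Nodes that send no ties get `None` for reciprocity rather than 0, so they can be dropped or handled explicitly in a later regression.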
8
3) Network Behavior: Peer Influence
Paj_nodestatread.sas is the SAS file. After running some models we
get:
9
3) Network Behavior: Peer Influence
Open nodestats1.sas to see how to code these same stats, plus a few
more, in SAS.
10
3) Network Behavior: Peer Influence
QAP is an alternative method that doesn't make as many strong
assumptions about the model. To use QAP, we can run it in SAS (but
it's slow and basic), or export to UCINET (which is fast,
sophisticated and all that jazz). The qapstats.sas file moves the
data for us.
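The core idea of QAP can be sketched briefly: correlate two matrices cell by cell, then relabel the nodes of one matrix at random (permuting rows and columns together) many times to build a reference distribution. The following Python is a minimal sketch of that idea, not the SAS or UCINET routine. The function names and both example matrices are hypothetical.

```python
import random


def offdiag_corr(a, b):
    """Pearson correlation over the off-diagonal cells of two square matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def qap_test(a, b, n_perm=1000, seed=0):
    """Observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(a)
    observed = offdiag_corr(a, b)
    at_least = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Relabel nodes: permute rows and columns of one matrix together.
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if offdiag_corr(permuted, b) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical 4-node matrices: friendship ties and shared club membership.
friend = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
same_club = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
observed, p_value = qap_test(friend, same_club, n_perm=200)
```

Permuting rows and columns together keeps each node's pattern of ties intact, which is why the test avoids the independence assumption a cell-level regression would make.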
11
3) Network Behavior: Peer Influence
We can also estimate the network autocorrelation model directly. We
can get quick-and-dirty (QAD) estimates just by adding the WY term to
the base model, which typically performs fairly well. Open
peerinfl1.sas to see this routine. Alternatively, UCINET calculates a
simple network correlation between any vector (Nx1) and any matrix
(NxN) to estimate the bivariate peer effect, and Carter Butts's LNAM
routine in R (part of SNA) lets you run a full linear network
autocorrelation model.
For stats details:
Leenders, R.Th.A.J. (2002) "Modeling Social Influence Through Network
Autocorrelation: Constructing the Weight Matrix." Social Networks,
24(1), 21-47.
Anselin, L. (1988) Spatial Econometrics: Methods and Models. Norwell,
MA: Kluwer.
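To make the WY term concrete: with a row-normalized weight matrix W, each element of WY is the average outcome among ego's nominated friends, and that column is what gets added to the base regression. The sketch below is a minimal Python illustration, not peerinfl1.sas. Row normalization and the handling of isolates are assumptions, and the names and data are hypothetical.

```python
def peer_term(w, y):
    """Row-normalize W and return WY: the mean outcome among each ego's alters."""
    wy = []
    for row in w:
        total = sum(row)
        if total == 0:
            # Isolates get 0 here; this choice is an assumption.
            wy.append(0.0)
        else:
            wy.append(sum(w_ij * y_j for w_ij, y_j in zip(row, y)) / total)
    return wy


# Hypothetical 3-node example: node 0 names nodes 1 and 2, node 1 names node 0.
w = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
y = [2.0, 4.0, 0.0]
wy = peer_term(w, y)
```

The resulting `wy` list would then enter the outcome model as one more predictor alongside the individual-level covariates.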
12
3) Network Behavior: Peer Influence
To run the R version, we need to export the data. We can get started
using the send2r.mac routine and reshape some of the files. The SAS
program sas2r_peerinfl.sas creates the needed external files. The R
script lname_example.r is the needed R script. Run the example
models.

Call: lnam(y = fights, x = cv, W1 = w1, W2 = clbs)

Residuals:
    Min      1Q  Median      3Q     Max
-1.3138 -0.7955 -0.3844  0.3147  3.6792

Coefficients:
         Estimate  Std. Error  Z value  Pr(>|z|)
FEMALE  -0.292433    0.144148   -2.029  0.042489
WHITE    0.160314    0.149228    1.074  0.282692
S3       0.061595    0.014843    4.150  3.33e-05
rho1.1   0.379421    0.103426    3.669  0.000244
rho2.1   0.001573    0.003954    0.398  0.690870
---
Results with fights as Y, friendship as W1, and club overlap as W2.
13
3) Network Behavior: Peer Influence
Getting measures from PAJEK: PAJEK has no direct ID link to files.
These are simply text files, so sort order matters. The basic routine
to get any measure in PAJEK is to create the measure using the
dropdown menus, then save the files and read them into SAS, SPSS or
whatever stats program you use. Open the PAJEK files and create
in-degree, out-degree, closeness centrality, and reciprocity.
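Because the saved files carry no IDs, the link back to respondents depends entirely on row position. The following is a minimal Python sketch of that step, assuming the common Pajek vector layout (a `*Vertices n` header followed by one value per line). The function names, IDs, and values are hypothetical.

```python
def read_pajek_vec(text):
    """Parse Pajek vector (.vec) text into a list of floats, in file order."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a *Vertices header line")
    n = int(header[1])
    values = [float(v) for v in lines[1:1 + n]]
    if len(values) != n:
        raise ValueError("fewer values than the header promises")
    return values


def attach_ids(ids_in_net_order, values):
    """Pair each value with the ID at the same position in the .net file."""
    if len(ids_in_net_order) != len(values):
        raise ValueError("ID list and vector have different lengths")
    return dict(zip(ids_in_net_order, values))


# Hypothetical in-degree vector saved from PAJEK, plus the ID order
# used when the .net file was written.
vec_text = "*Vertices 3\n2\n0\n1\n"
indegree = attach_ids([101, 205, 317], read_pajek_vec(vec_text))
```

The length checks are the only protection against a silent misalignment, so it is worth keeping the ID list from the same run that produced the .net file.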
14
4) Network Structure: Clustering the network
As part of the description, we often want to identify significant
clusters in the network. There are lots of ways to do this; we'll
sample a few:
a) Using UCINET's routines
b) Clustering a distance matrix (SAS)
c) The Jiggle routine (SAS, Moody)
d) The Crowds algorithm
e) Using PAJEK's blockmodel routine to fine-tune a peer group model
15
4) Network Structure: Clustering the network
  • Clustering in UCINET
    - I find it simplest to read PAJEK files. The best general routine
      is then FACTIONS, though it is slow for large (100s of nodes)
      nets. It is very effective for small nets.
    - In a pinch, CONCOR will often yield reasonable peer groups, and
      it's faster in UCINET.
  • Clustering in SAS
    - We can often get a quick starting point by simply using a
      hierarchical clustering on the distance matrix. This is a fair
      place to start for nets in the 100s of nodes size.
    - Two algorithms that work fairly well are Jiggle for large nets
      and Crowds for smaller nets. Both work by extending the RNM
      approach of Moody (2001), but Jiggle is faster for large nets;
      Crowds includes more checks for particular structures (like
      biconnected sets) and thus is slower.
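The "hierarchical clustering on the distance matrix" starting point can be sketched as follows. This is a minimal Python illustration, not the SAS routine: it treats ties as undirected, uses geodesic distance, and merges clusters by average linkage. Those choices, the cap on unreachable pairs, and the example network are all assumptions.

```python
from collections import deque


def geodesic_distances(nodes, edges):
    """All-pairs shortest path lengths, treating ties as undirected."""
    nbrs = {n: set() for n in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    unreachable = len(nodes)  # Assumption: cap unreachable pairs at n.
    dist = {}
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nbr in nbrs[current]:
                if nbr not in seen:
                    seen[nbr] = seen[current] + 1
                    queue.append(nbr)
        for target in nodes:
            dist[source, target] = seen.get(target, unreachable)
    return dist


def average_linkage(nodes, dist, k):
    """Merge the two closest clusters until k clusters remain."""
    clusters = [[n] for n in nodes]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical net: two triangles joined by a single bridge (3-4).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
groups = average_linkage(nodes, geodesic_distances(nodes, edges), k=2)
```

The pairwise search makes this sketch slow beyond a few hundred nodes, which matches the slide's advice to treat it as a quick starting point rather than a final peer-group solution.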

16
4) Network Structure: Clustering the network
Clustering in PAJEK: PAJEK doesn't have a dedicated clustering routine
for finding peer groups in nets. But you can coerce the blockmodel
routine to find block-diagonal structures (slow) or use some of its
neighboring partitions. Keep an eye on this, as I bet they implement
Newman's algorithm soon. Let's try running some of these.
17
4) Network Structure: Clustering the network
Sample results: This is the resulting graph from a Jiggle run on the
school net. Note that this is a randomized algorithm, so you will get
different results from different runs.
18
4) Network Structure: Clustering the network
Sample results: This is the resulting graph from a Crowds run on the
school net. We end up with smaller clusters, and a larger
'background' set. By construction, the clusters must be biconnected,
so they are rounder than in the prior algorithm.
21
4) Network Structure: Block modeling a network
Split 1
Sample results: The most commonly used blockmodel routine is CONCOR,
which is simple and fast. The result is a set of nested splits to
some pre-specified depth. Here I apply that routine to the school
net, working to a depth of 3 splits.
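A single CONCOR split can be sketched as follows: correlate the nodes' tie profiles, then repeatedly correlate the rows of the resulting correlation matrix until the entries settle near +1 or -1, and split on the sign. This is a minimal Python illustration of that idea, not the UCINET routine. The profile definition, the zero-variance fallback, and the example network are assumptions.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # Assumption: constant profiles count as uncorrelated.
    return cov / (vx * vy) ** 0.5


def concor_split(adj, max_iter=100, tol=1e-6):
    """One CONCOR split: returns two lists of node positions."""
    n = len(adj)
    # Profile = ties sent (row) followed by ties received (column).
    profiles = [adj[i] + [adj[j][i] for j in range(n)] for i in range(n)]
    corr = [[pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    for _ in range(max_iter):
        if all(abs(abs(c) - 1.0) < tol for row in corr for c in row):
            break
        # Correlate the rows of the correlation matrix with each other.
        corr = [[pearson(corr[i], corr[j]) for j in range(n)]
                for i in range(n)]
    # Nodes positively correlated with node 0 form the first block.
    first = [i for i in range(n) if corr[0][i] > 0]
    second = [i for i in range(n) if corr[0][i] <= 0]
    return first, second


# Hypothetical net: two separate 3-person cliques.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
block_a, block_b = concor_split(adj)
```

Nested splits to a given depth would come from calling the same function again on the submatrix for each block.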
22
4) Network Structure: Block modeling a network
Split 2
23
4) Network Structure: Block modeling a network
Split 3
24
4) Network Structure: Block modeling a network
More in keeping with the spirit of the original block modeling
papers, regular equivalence models are less likely to generate
block-diagonal models. A simple positional model is the
core-periphery model, which searches for a single core in the net.
Since we know this net is split into two wings, we'll just look within
one of them.
25
4) Network Structure: Block modeling a network
Another simple way to get at positions in a network is to compare
nodes across a vector of triad positions. In a directed network, the
vector giving the count of the positions an actor occupies nicely
summarizes the type of role the actor plays in the net.
27
5) Statistical Models for Networks
The exponential random graph model (ERGM) class is designed to let
you model an observed network as a function of local-network, node,
and dyad-level features. These models take the form:
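For reference, a sketch of the standard ERGM form in the notation common in the statnet literature. The symbols are assumptions, not taken from the slide: $y$ is the observed network, $g(y)$ a vector of network statistics, $\theta$ the coefficients, and $\kappa(\theta)$ the normalizing constant summed over all possible networks $\mathcal{Y}$.

```latex
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```

Each element of $g(y)$ corresponds to one model term, such as an edge count, a count of mutual dyads, or a count of ties between nodes that match on an attribute.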
28
Statistical Models for Networks
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviintroduction.ppt
30
Statistical Models for Networks
From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
31
Statistical Models for Networks
Note: this is a very simple dyad independence model.
From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
32
Statistical Models for Networks
The dyad-independence model has been extended to other node features.
From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
33
Statistical Models for Networks
Lots of other structural features can be included, though not all
imply reasonable models.
From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
34
Statistical Models for Networks
From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
35
Statistical Models for Networks
  • The STATNET statistical package in R is the best way to estimate
    these models.
  • We will:
    • walk through exporting our school friendship data from SAS and
      bringing it into R
    • specify some simple models
    • demonstrate getting goodness-of-fit stats on these models
    • demonstrate simulating from a model
  • The set of stats one can add to a model is growing quickly.
  • Open statnet_datawrite.sas to see how to create data for export.

From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
36
Statistical Models for Networks
Results from a model (takes too long to run in real time!)

Summary of model fit

Formula: s_friends ~ edges + mutual + ttriad + nodematch("S3") +
         nodematch("WHITE") + edgecov(s_clubs, "ovlpec")

Newton-Raphson iterations: 87
MCMC sample of size 10000

Monte Carlo MLE Results:
                        estimate       s.e.  p-value  MCMC s.e.
edges                    -6.0927  0.1590376  < 1e-04   3.054007
mutual                    1.7009  0.3217789  < 1e-04   0.716237
ttriad                    0.4666  0.0003942  < 1e-04   0.006069
nodematch.S3              1.4469  0.1719817  < 1e-04   0.597009
nodematch.WHITE           0.9567  0.2931915  0.00110   2.890984
edgecov.s_clubs.ovlpec    0.2689  0.1585942  0.09001   0.555580

Null Deviance:      85606.4 on 61752 degrees of freedom
Residual Deviance:   6867.4 on 61746 degrees of freedom
Deviance:           78739.0 on     6 degrees of freedom

AIC: 6879.4    BIC: 6933.6

From Handcock (2006):
http://csde.washington.edu/statnet/Sunbelt2006/ergmssunbeltxxviergmclass.pdf
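The fit statistics reported above can be cross-checked with a little arithmetic. The Python sketch below uses only numbers from the output. Two things are assumptions, not statements from the slide: that the null degrees of freedom count the directed dyads n(n-1), which would imply the network size computed here, and that reading exp(coefficient) as a conditional odds ratio is the standard interpretation.

```python
import math

# Values copied from the reported model summary.
null_deviance, null_df = 85606.4, 61752
residual_deviance, residual_df = 6867.4, 61746
coefficients = {
    "edges": -6.0927,
    "mutual": 1.7009,
    "ttriad": 0.4666,
    "nodematch.S3": 1.4469,
    "nodematch.WHITE": 0.9567,
    "edgecov.s_clubs.ovlpec": 0.2689,
}

# Number of estimated terms and the deviance explained by the model.
n_params = null_df - residual_df
explained = null_deviance - residual_deviance

# Information criteria recomputed from the residual deviance.
aic = residual_deviance + 2 * n_params
bic = residual_deviance + n_params * math.log(null_df)

# Assumption: null df = n(n-1) directed dyads, so solve for n.
n_nodes = (1 + math.isqrt(1 + 4 * null_df)) // 2

# exp(coefficient) read as a conditional odds ratio for a tie.
odds_ratios = {term: math.exp(value) for term, value in coefficients.items()}

print(n_params, round(explained, 1), round(aic, 1), round(bic, 1), n_nodes)
```

The recomputed AIC and BIC agree with the reported values to rounding, which is a quick way to confirm that the table was read off correctly.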