Title: Social Sub-groups II
 1Social Sub-groups II
- Outline 
 -  How? 
 -  - Review group-finding strategies 
 -  - Evade  PCA (SVD for the 
math-oriented!)  -  - Theory Problem What should 
group-structure be?  -  
 -  Why? 
 -  Wayne Baker 
 - Social structure in a place where there should be 
none  -  Scott Feld 
 - What causes clustering in a network? Opportunity 
and interests  - Examples from Add Health  Prosper 
 -  Practical 
 - Software  Program examples. 
 - Next week Roles  Blockmodels
 
  2Methods How do we identify primary groups in a 
network?
Strategies for identifying primary groups 
 Search 1) Fit Measure Identify a measure of 
groupness (usually a function of the number of 
ties that fall within group compared to the 
number of ties that fall between group). 2) 
Algorithm to maximize fit. Once we have the 
index, we need a clever method for searching 
through the network to maximize the fit. See 
Jiggle, Factions etc. Destroy Break 
apart the network in strategic ways, removing the 
weakest parts first, whats left are your primary 
groups. See edge betweeness MCL 
Evade Dont look directly, instead find a 
simpler problem that correlates Examples 
Generalized cluster analysis, Factor Analysis, RM. 
 3Strategies for identifying primary groups 
 Search - UCINETs Factions - Rs 
FastGreedy - PAJEKs Generalized 
block-modeling - Franks KliqueFinder 
Destroy Edge-betweenness reduction MCL Flow 
model Evade Leading Eigenvector 
model Clustering Distance (or other) 
matrix Principle Component / Factor / SVD 
methods RNM Hybrids Use a 
simple evade technique for starting values and 
then use a search technique. (CROWDS, JIGGLE) 
 4Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
IQ
SES
1.0
1.0
Math Score
Income
0.0
0.0
d
d
We often use simple indicators and assume they 
measure our concepts 
 5Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
IQ
SES
Income
Reading Score
Occupation
Highest Degree
House Size
Languages Spoken
Math Score
d
d
d
d
d
d
d
But we dont have to! We can imagine that each 
latent concept causes our indicators, and build a 
measurement model. 
 6Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
But we dont have to! We can imagine that each 
latent concept causes our indicators, and build a 
measurement model. 
 7Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
In a network, we assume that the tie pattern is 
an imperfect measure of an underlying latent 
structure that we can explain with similar 
factors. Instead of lots of measurements we 
have many columns in the adjacency (sim) matrix, 
and we can summarize that with factor scores. -- 
works best if the similarity matrix has more 
information  so multiple account data are 
perfect.  or you can transform the data in some 
way to more information (like use a distance 
matrix. 
 8Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
Here is code I used in the PROSPER data
/ this section builds info on how to weight 
dyads for in-group, out-group. / twostp((adjma
tadjmat)gt0)adjmat / make it either direction 
w. the first term / ttieadjmattwostp /1 if 
tie contributes to a transitive triple 
/ ttie((ttiettie)) adjrawadjmat 
 adjmat(adjmatadjmat) / force it to be 
symetric, 1asym 2reciped / adjmatadjmat-diag(
adjmat) / remove any self ties 
/ d2reachlim((adjmatgt0),3) / re-weight to 
bias toward recip ties / wm_4  
(d21)(adjmat2)8 / recip direct ties 
/ wm_2a  (d21)(adjmat1)4 / unrecip 
direct ties / wm_1  2(d22)/ ties 2-steps 
out / wm_p5  0(d23) / ties 3-steps out - 
note it's zeroed out here/ wmwm_4wm_2awm_1w
m_p5(3(ttie/(max(ttie)))) / transitivity is 
at the end/ wmwm-diag(wm) 
 9Strategies for identifying primary groups  
Evade
Factor Analysis Treat the adjacency/similarity 
matrix as a set of N variables and look for 
latent factors that explain the variance in the 
data.
Here is code I used in the PROSPER data
 / run factor analysis. Note nfactors is a high 
value, should only take those w. EV gt 2, but 
this gives us room... / proc factor 
rotatevarimax minminev outfactset 
datasymmat nfactors175 outstatfscores 
noprint run quit  
 10Strategies for identifying primary groups  
Evade
Result 
 11Strategies for identifying primary groups  
Evade
Result
Each column is a person, these are the factor 
loadings for each person on each retained factor. 
 12Strategies for identifying primary groups  
Evade
Result
Sociogram for a single school 
 13Strategies for identifying primary groups  
Evade
Result
- Sociogram for a single school. 
 - Problem is that there are no necessary 
connectivity checks  you can get groups that 
are disconnected.  - Biggest strengths are 
 - Really fast 
 - Allows for overlapping groups 
 - Gives you embeddedness scores based on factor 
loadigs 
  14Strategies for identifying primary groups  
Hybrid
The Crowds Algorithm 1. Identify members of 
network bicomponents, remove people not included. 
 2. Cluster the reduced network. - Identify 
optimal number of groups (TREEWALK) - 
For each level of the cluster partition tree do 
(BFS) -Move up the tree from smaller to 
larger groups. -If the fit for both groups 
is improved by joining them then do so. -If 
not, then identify group at that level. 
-End TREEWALK. Do until all groups are 
identified (GLOBAL LOOP) 3. Evaluate node 
fit. Do until nodes cannot be moved 
 For each identified cluster do 
(GRPCHECK) - Ensure group is a 
bi-component. -Calculate effect on 
group a of moving node j to group a. 
 -Calculate effect on j's present group of 
removing j. - If there is a 
positive net gain to moving j from own group to 
a, then do so. End. 4. Identify 
Bridging members. -If removing j from group a 
would improve the fit of group a, AND assigning j 
to any other group would lower the fit for that 
group, then j is considered a bridge. Place all 
bridges in separate class. 5. Group Check. Check 
returns to combining groups. IF merging groups 
would improve the fit of all groups to be merged, 
then do so. - Evaluate bridges, to be sure that 
they are not bridging two groups that have now 
merged. End Global loop.    
 15Return to first question What is a group?  
- The simple notions of a complete clique are 
difficult to square w. real-world data.  - Density is an indicator, but subject to 
over-grouping (no connectivity) and 
star-patterns.  - Groups are likely internally differentiated  
with core vs. periphery members  - Most sociological theories of groups rest on 
transitive closure and short distances  - Theres a sense that members are equal  a 
tight-knit group  - The group should be fairly small  face-to-face 
scale  - The social processes underlying the group turn on 
reciprocity, trust, communication, homogeneity of 
norms  beliefs.  - Almost all require a comparative set in-group to 
out-group. It is relational not essential.  - Cross-cutting social circles  would lead us to 
expect overlapping groups, but in practice most 
methods do not do that, as its analytically too 
cumbersome.  - Practically, group detection is hard and most 
methods will give you (slightly) different 
results. You can compare results using a Rand 
statistic (proportion of pairs similarly 
categorized in two partitions), but for small 
settings these differences can matter. 
  16Social Sub-groups why look?
Wayne Baker The Social Structure of a National 
Securities Market 1) Behavioral assumptions of 
economic actors 2) Micro-structure of 
networks 3) Macro-structure of networks 4) 
Price Consequences
Under standard economic assumptions, people 
should act rationally and act only on price. 
This would result in expansive and homogeneous 
(I.e. random) networks. It is, in fact, this 
structure that allows microeconomic theory to 
predict that prices will settle to an optimal 
equilibrium 
 17Bakers Model 
 18Bakers Model
He makes two assumptions in contrast to standard 
economic assumptions a) that people do not have 
access to perfect information and b) that some 
people act opportunistically 
He then shows how these assumptions change the 
underlying mechanisms in the market, focusing on 
price volatility as a marker for 
uncertainty. The key on the exchange floor is 
market makers people who will keep the process 
active, keep trading alive, and thus not hoard 
(and lower profits system wide) 
 19Bakers Model
Micronetworks Actors should trade extensively 
and widely. Why might they not? A) Physical 
factors (noise and distance) B) Avoid risk and 
build trust 
Macro-Networks Should be undifferentiated. Why 
not? A) Large crowds should be more 
differentiated than small crowds. Why?
Price consequences Markets should clear. They 
often dont. Why? Network differentiation 
reduces economic efficiency, leading to less 
information and more volatile prices  
 20Baker Use frequency of exchange to identify the 
network, resulting in
Baker finds that the structure of this network 
significantly (and differentially) affects the 
price volatility of the network Groups found 
w. NEGOPY 
 21The one other program you should know about is 
NEGOPY. Negopy is a program that combines 
elements of the density based approach and the 
graph theoretic approach to find groups and 
positions. Like CROWDS, NEGOPY assigns people 
both to groups and to outsider or between 
group positions. It also tells you how many 
groups are in the network. Its a DOS based 
program, and a little clunky to use, but 
NEGWRITE.MOD will translate your data into NEGOPY 
format if you want to use it. There are many 
other approaches. If youre interested in some 
specifically designed for very large networks 
(10,000 nodes), Ive developed something I call 
Recursive Neighborhood Means that seems to work 
fairly well. 
 22Baker Because size is the primary determinant of 
clustering in this setting, he concludes that the 
standard economic assumption of large market  
efficient is unwarranted.  
 23Scott Feld Focal Organization of Social Ties
Feld wants to look at the effects of constraint  
opportunity for mixing, to situate relational 
activity within a wider context. The contexts 
form Foci, A social, psychological, legal or 
physical entity around which joint activities are 
organized (p.1016) People with similar foci 
will be clustered together. He contrasts this 
with social balance theory. Claim that much of 
the clustering attributed to interpersonal 
balance processes are really due to focal 
clustering. (note that this is not theoretically 
fair critique -- given that balance theory can 
easily accommodate non-personal balance factors 
(like smoking or group membership) but is a good 
empirical critique -- most researchers havent 
properly accounted for foci.) 
 24Observed Clustering within Adolescent Social 
Networks
Network Characteristics of Sub Groups
- On average, 65 of a schools adolescents are in 
 cohesive sub-groups.  - 87 of all relations are within sub-groups. 
 - The average sub-group has 22 members. 
 - The average diameter for a sub-group is 3 steps. 
 - The mean segregation index is .96 (1Complete, 
0Random) 
  25Observed Clustering within Adolescent Social 
Networks
Distribution of Characteristic within groups, 
relative to school distribution 
 26Group Data in Add Health 
 27Group data in Add Health
Inter-Group Relations 
 28Group data in Prosper
- We have 368 network observations based on 2 
cohorts observed over 5 waves in 2 states. Using 
a variant of the CROWDs algorithm, I identified 
groups in every network.  - Results in about 4500 groups averaging in size of 
about 10 kids, though some settings are really 
too cohesive to break into small bits, resulting 
peer groups of 40ish kids.  
Network Group Characteristics 
 29Group data in Prosper
- We have 368 network observations based on 2 
cohorts observed over 5 waves in 2 states. Using 
a variant of the CROWDs algorithm, I identified 
groups in every network.  - Results in about 4500 groups averaging in size of 
about 10 kids, though some settings are really 
too cohesive to break into small bits, resulting 
peer groups of 40ish kids.  
  30Group data in Prosper
- We have 368 network observations based on 2 
cohorts observed over 5 waves in 2 states. Using 
a variant of the CROWDs algorithm, I identified 
groups in every network.  - Results in about 4500 groups averaging in size of 
about 10 kids, though some settings are really 
too cohesive to break into small bits, resulting 
peer groups of 40ish kids.  
  31Group data in Prosper