Title: Social Network Inspired Models of NLP and Language Evolution
1Social Network Inspired Models of NLP and
Language Evolution
Monojit Choudhury (Microsoft Research India), Animesh Mukherjee (IIT Kharagpur), Niloy Ganguly (IIT Kharagpur)
2What is a Social Network?
- Nodes: Social entities (people, organizations, etc.)
- Edges: Interaction/relationship between entities (friendship, collaboration, sex)
Courtesy: http://blogs.clickz.com
3Social Network Inspired Computing
- Society and the nature of human interaction form a complex system
- Complex Network: A generic tool to model complex systems
- There is a growing body of work on Complex Network Theory (CNT)
- Applied to a variety of fields: Social, Biological, Physical & Cognitive sciences, Engineering & Technology
- Language is a complex system
4Objective of this Tutorial
- To show that SNIC (Social Network Inspired Computing) is an emerging and promising technique
- Apply it to model Natural Languages
- NLP, Quantitative Linguistics, Language Evolution, Historical Linguistics, Language Acquisition
- Familiarize with tools and techniques in SNIC
- Compare it with other standard approaches to NLP
5Outline of the Tutorial
- Part I: Background
- Introduction (25 min)
- Network Analysis Techniques (25 min)
- Network Synthesis Techniques (25 min)
- Break (3:20pm to 3:40pm)
- Part II: Case Studies
- Self-organization of Sound Systems (20 min)
- Modeling the Lexicon (20 min)
- Unsupervised Labeling (Syntax & Semantics) (20 min)
- Conclusion and Discussions (20 min)
6Complex System
- Non-trivial properties and patterns emerging from the interaction of a large number of simple entities
- Self-organization: The process through which these patterns evolve without any external intervention or central control
- Emergent Property or Emergent Behavior: The pattern that emerges due to self-organization
7Emergence of a networked life
[Figure: levels of organization: atom, molecule, cell, tissue, organs, organisms, communities]
8Language a complex system
- Language: a medium for communication through an arbitrary set of symbols
- Constantly evolving
- An outcome of self-organization at many levels
- Neurons
- Speakers and listeners
- Phonemes, morphemes, words
- 80-20 Rule in every level of structure
9Syntactic Network of Words
[Figure: weighted network over the words color, sky, weight, light, blue, blood, heavy, red; edge weights shown include 1, 20 and 100]
10Complex Network Theory
- Handy toolbox for modeling complex systems
- Marriage of Graph theory and Statistics
- Complex because
- Non-trivial topology
- Difficult to specify completely
- Usually large (in terms of nodes and edges)
- Provides insight into the nature and evolution of
the system being modeled
11Internet
129-11 Terrorist Network
Social Network Analysis is a mathematical methodology for connecting the dots, using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization.
13What Questions can be asked
- Do these networks display some symmetry?
- Are these networks the creation of intelligent objects, or have they emerged?
- How have these networks emerged?
- What are the underlying simple rules leading to their complex formation?
14Bi-directional Approach
- Analysis of the real-world networks
- Global topological properties
- Community structure
- Node-level properties
- Synthesis of the network by means of some simple rules
- Small-world models
- Preferential attachment models
15Application of CNT in Linguistics - I
- Quantitative linguistics
- Invariance and typology (Zipf's law, syntactic dependencies)
- Natural Language Processing
- Unsupervised methods for text labeling (POS tagging, NER, WSD, etc.)
- Textual similarity (automatic evaluation, document clustering)
- Evolutionary Models (NER, multi-document summarization)
16Application of CNT in Linguistics - II
- Language Evolution
- How did sound systems evolve?
- Development of syntax
- Language Change
- Innovation diffusion over social networks
- Language as an evolving network
- Language Acquisition
- Phonological acquisition
- Evolution of the mental lexicon of the child
17Linguistic Networks
18Summarizing
- SNIC and CNT are emerging techniques for modeling complex systems at the mesoscopic level
- Applied to Physics, Biology, Sociology, Economics, Logistics
- Language: an ideal application domain for SNIC
- SNIC models in NLP, Quantitative linguistics, language change, evolution and acquisition
19Topological Characterization of Networks
20Types Of Networks and Representation
- Representation
- Adjacency Matrix
- Adjacency List
21Characterization of Complex N/ws??
- They have a non-trivial topological structure
- Properties
- Heavy tail in the degree distribution (non-negligible probability mass towards the tail, more than in the case of an exponential distribution)
- High clustering coefficient
- Centrality Properties
- Social Roles & Equivalence
- Assortativity
- Community Structure
- Random Graphs: Small avg. path length
- Preferential attachment
- Small World Properties
22Degree Distribution (DD)
- Let $p_k$ be the fraction of vertices in the network that have degree $k$.
- The plot of $k$ versus $p_k$ is defined as the degree distribution of a network
- For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: $p_k$ varies as $k^{-\alpha}$
- Due to noisy and insufficient data, the definition is sometimes slightly modified
- The cumulative degree distribution is plotted instead
- It gives the probability that the degree of a node is greater than or equal to $k$
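The two definitions on this slide (the fraction of vertices with degree k, and the probability of a degree of at least k) can be sketched in a few lines. This is only an illustration: the adjacency-list representation and the toy graph `toy` are assumptions made for the example, not data from the tutorial.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: p_k}, the fraction of vertices that have degree k."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

def cumulative_degree_distribution(adj):
    """Return {k: P(degree >= k)} for every degree k that occurs."""
    pk = degree_distribution(adj)
    cumulative, running = {}, 0.0
    for k in sorted(pk, reverse=True):   # accumulate from the tail inwards
        running += pk[k]
        cumulative[k] = running
    return dict(sorted(cumulative.items()))

# Hypothetical undirected graph stored as an adjacency list.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    print(degree_distribution(toy))
    print(cumulative_degree_distribution(toy))
```

The cumulative form is smoother than the raw distribution because every point aggregates the whole tail, which is why it is preferred for small or noisy data sets.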
23A Few Examples
Power law: $P_k \sim k^{-\alpha}$
24Friend of Friends
- Consider the following scenario
- Sourish and Ravi are friends
- Sourish and Shaunak are friends
- Are Shaunak and Ravi friends?
- If so then
- This property is known as transitivity
25Measuring Transitivity Clustering Coefficient
- The clustering coefficient for a vertex v in a network is defined as the ratio of the total number of connections among the neighbors of v to the total number of possible connections between those neighbors
- A high clustering coefficient means my friends know each other with high probability: a typical property of social networks
26Mathematically
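- The clustering coefficient of a vertex $i$ with $k_i$ neighbors and $e_i$ connections among those neighbors is $C_i = \frac{2 e_i}{k_i (k_i - 1)}$
- The clustering coefficient of the whole network is the average, $C = \frac{1}{n} \sum_i C_i$
- Alternatively, $C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}}$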
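The local clustering coefficient (connections among a vertex's neighbors divided by the number of possible connections) and its network average can be sketched as below. The friendship graph is hypothetical; it reuses the names from the friend-of-friends slide and adds one made-up person, `Asha`. Returning 0 for vertices with fewer than two neighbors is a convention assumed here, not something the slides specify.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Links among the neighbors of v divided by the possible links."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Clustering coefficient of the whole network as the vertex average."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical friendship network (undirected adjacency list).
friends = {
    "Sourish": {"Ravi", "Shaunak", "Asha"},
    "Ravi": {"Sourish", "Shaunak"},
    "Shaunak": {"Sourish", "Ravi"},
    "Asha": {"Sourish"},
}

if __name__ == "__main__":
    for person in friends:
        print(person, local_clustering(friends, person))
    print("average:", average_clustering(friends))
```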
27Centrality
- Centrality measures are commonly described as indices of the 4 Ps: prestige, prominence, importance, and power
- Degree: Count of immediate neighbors
- Betweenness: Nodes that form a bridge between two regions of the n/w, $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
- where $\sigma_{st}$ is the total number of shortest paths between s and t, and $\sigma_{st}(v)$ is the total number of shortest paths from s to t via v
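Betweenness, the share of shortest paths between s and t that pass through v summed over all pairs, can be computed with one breadth-first search per source node. The sketch below follows the well-known accumulation scheme due to Brandes for unweighted, undirected graphs; it is one possible implementation, not necessarily the one used in the tutorial. Halving the final scores so that each unordered pair counts once is an assumed convention.

```python
from collections import deque

def betweenness(adj):
    """Betweenness of every node in an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair {s, t} was visited from both ends.
    return {v: b / 2.0 for v, b in score.items()}

# Hypothetical example: "b" bridges "a" and "c".
bridge = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

if __name__ == "__main__":
    print(betweenness(bridge))
```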
28Eigenvector centrality Bonacich (1972)
- It is not just how many people know me that counts towards my popularity (or power), but how many people know the people who know me; this is recursive!
- In the context of HIV transmission: A person x with one sex partner is less prone to the disease than a person y with multiple partners
- But imagine what happens if the partner of x has multiple partners
- This is the basic idea of eigenvector centrality
29Definition
- Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix
- An eigenvector of a symmetric matrix $A = [a_{ij}]$ is any vector $e$ such that $\lambda e_i = \sum_j a_{ij} e_j$
- where $\lambda$ is a constant and $e_i$ is the centrality of node $i$
- What does it imply? The centrality of a node is proportional to the centrality of the nodes it is connected to (recursively)
- Practical Example: Google PageRank
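The recursive idea (a node is central if its neighbors are central) suggests a simple power iteration towards the principal eigenvector. The sketch below is a minimal illustration under two assumptions: the graph is connected and undirected, and the iteration uses A + I instead of A, which has the same eigenvectors but avoids oscillation on bipartite graphs. The star graph is made-up example data.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the principal eigenvector of the adjacency matrix."""
    e = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score = own score + sum of neighbor scores, i.e. (A + I) e.
        new = {v: e[v] + sum(e[u] for u in adj[v]) for v in adj}
        norm = max(new.values())
        new = {v: x / norm for v, x in new.items()}  # scale the maximum to 1
        if max(abs(new[v] - e[v]) for v in adj) < tol:
            return new
        e = new
    return e

# Hypothetical star graph: one hub with three leaves.
star = {
    "hub": {"x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}

if __name__ == "__main__":
    print(eigenvector_centrality(star))
```

PageRank applies the same principle to a directed graph with a damping term; that variant is not shown here.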
30Assortativity (homophily)
- The rich go with the rich (selective linking)
- A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with some other famous actor (e.g., Rani Mukherjee) in a movie rather than a newcomer in the film industry.
Assortative Scale-free network
Disassortative Scale-free network
31Measures of Assortativity
- ANND (Average nearest neighbor degree)
- Find the average degree of the neighbors of each node i with degree k
- Find the Pearson correlation (r) between the degree of i and the average degree of its neighbors
- For further reference see the supplementary material
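Both measures can be sketched directly from their descriptions: average the neighbor degrees per node, group them by degree for ANND, and correlate node degree with average neighbor degree for r. The helper names and the star graph below are assumptions for illustration. Note that `pearson` is undefined (division by zero) on a regular graph, where all degrees are equal.

```python
from collections import defaultdict
from statistics import mean

def average_neighbor_degree(adj):
    """Map each non-isolated node to the mean degree of its neighbors."""
    return {v: mean(len(adj[u]) for u in adj[v]) for v in adj if adj[v]}

def annd_by_degree(adj):
    """ANND: mean of the average neighbor degree over nodes of degree k."""
    groups = defaultdict(list)
    for v, value in average_neighbor_degree(adj).items():
        groups[len(adj[v])].append(value)
    return {k: mean(values) for k, values in sorted(groups.items())}

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def degree_correlation(adj):
    """r between a node's degree and the average degree of its neighbors."""
    avg = average_neighbor_degree(adj)
    nodes = list(avg)
    return pearson([len(adj[v]) for v in nodes], [avg[v] for v in nodes])

# Hypothetical star graph: the hub only links to low-degree leaves,
# so the network is strongly disassortative.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

if __name__ == "__main__":
    print(annd_by_degree(star))
    print(degree_correlation(star))
```

A positive r indicates assortative mixing (high-degree nodes link to high-degree nodes), a negative r disassortative mixing.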
32Community structure
- Community structure: a group of vertices that have a high density of edges within them and a low density of edges in between groups
- Example
- Friendship n/w of children
- Citation n/ws: research interest
- World Wide Web: subject matter of pages
- Metabolic networks: Functional units
- Linguistic n/ws: similar linguistic categories
33Some Examples
Community Structure in Political Books
Community structure in a Social n/w of Students
(American High School)
34Community Identification Algorithms
- Hierarchical
- Girvan-Newman
- Radicchi et al.
- Chinese Whispers
- Spectral Bisection
- See (Newman 2004) for a comprehensive survey (you will find the ref. in the supplementary material)
35Evolution of Networks / Processes on Networks
36The World is Small!
- Registration fees for IJCNLP 2008 are being waived for all participants; get them collected from the registration counter
- How long do you think the above information will take to spread among yourselves?
- Experiments say it will spread very fast: within 6 hops from the initiator it would reach all
- This is Milgram's famous six degrees of separation
37The Small World Effect
- Even in very large social networks, the average distance between nodes is usually quite short.
- Milgram's small world experiment
- Target individual in Boston
- Initial senders in Omaha, Nebraska
- Each sender was asked to forward a packet to a friend who was closer to the target
- Friends asked to do the same
- Result: Average of six degrees of separation.
- S. Milgram, The small world problem, Psych. Today, 2 (1967), pp. 60-67.
38Measure of Small-Worldness
- Low average geodesic path length
- High clustering coefficient
- Geodesic path: Shortest path through the network from one vertex to another
- Mean path length
- $\ell = \frac{2 \sum_{i \geq j} d_{ij}}{n(n+1)}$, where $d_{ij}$ is the geodesic distance from vertex i to vertex j
- Most of the networks observed in the real world have $\ell \leq 6$
- Film actors: 3.48
- Company Directors: 4.60
- Emails: 4.95
- Internet: 3.33
- Electronic circuits: 4.34
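The mean geodesic path length can be computed with one breadth-first search per vertex. The sketch below assumes a connected, unweighted, undirected graph and the normalisation n(n+1)/2, which counts each vertex's zero distance to itself among the pairs; other texts divide by n(n-1)/2 instead. The path graph is made-up example data.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def mean_path_length(adj):
    """Mean geodesic distance over all pairs i >= j (self-pairs included)."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    # total visits each unordered pair twice; self-distances add nothing.
    return (total / 2) / (n * (n + 1) / 2)

# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

if __name__ == "__main__":
    print(mean_path_length(path))
```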
39Random Graphs Small Average Path Length
Q: What do we mean by a random graph?
A: Erdős-Rényi random graph model: For every pair of nodes, draw an edge between them with equal probability p.
Degrees of Separation in a Random Graph
- N nodes
- z neighbors per node, on average, $z = \langle k \rangle$
- D degrees of separation
$P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$
40Clustering
C: Probability that two of a node's neighbors are themselves connected
In a random graph, $C_{rand} \sim 1/N$ (if the average degree is held constant)
41Watts-Strogatz Small World Model
Watts and Strogatz introduced this simple model to show how networks can have both short path lengths and high clustering.
D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks, Nature, 393 (1998), pp. 440-442.
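The slide does not spell the model out, so the following is a sketch of the construction as it is commonly described for the cited paper: start from a ring lattice in which every node links to its k nearest neighbors, then rewire each edge with a small probability. The function name, the parameter names (`p_rewire`, `seed`) and the choice to forbid self-loops and duplicate edges are assumptions for illustration.

```python
import random

def watts_strogatz(n, k, p_rewire, seed=None):
    """Ring of n nodes with k nearest-neighbor links each (k even, k < n);
    every lattice edge is rewired with probability p_rewire."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Regular ring lattice: high clustering, long paths.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewire the far end of each lattice edge: shortcuts shrink path lengths.
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p_rewire:
                candidates = [c for c in range(n)
                              if c != i and c not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

if __name__ == "__main__":
    g = watts_strogatz(20, 4, 0.1, seed=1)
    print(sorted(len(neighbors) for neighbors in g.values()))
```

With `p_rewire = 0` the lattice is unchanged; with `p_rewire = 1` the result resembles a random graph. The small-world regime lies at small intermediate values.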
42Power Law
43Degree distributions for various networks
- World-Wide Web
- Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics
- Power grid of the western United States and Canada
- Social network of 43 Mormons in Utah
44How do Power law DDs arise?
Barabási-Albert Model of Preferential Attachment (Rich gets Richer)
(1) GROWTH: Starting with a small number of nodes ($m_0$), at every timestep we add a new node with $m$ ($< m_0$) edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability $\Pi$ that a new node will be connected to node $i$ depends on the connectivity $k_i$ of that node: $\Pi(k_i) = k_i / \sum_j k_j$
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
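The growth and preferential-attachment steps can be sketched as follows. The sketch makes simplifying assumptions that are not in the slides: the seed graph has exactly m isolated nodes and the first new node links to all of them, and the m targets of a new node are made distinct by redrawing duplicates, which only approximates strictly degree-proportional choice.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node brings m edges that attach
    to existing nodes with probability proportional to their degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m)}
    targets = list(range(m))   # the first new node links to every seed node
    endpoints = []             # each node appears once per incident edge
    for new in range(m, n):
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Drawing uniformly from the endpoint list is degree-proportional.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return adj

if __name__ == "__main__":
    g = barabasi_albert(1000, 2, seed=42)
    degrees = sorted((len(nb) for nb in g.values()), reverse=True)
    print("largest degrees:", degrees[:5])
```

Storing one list entry per edge endpoint turns degree-proportional sampling into uniform sampling, which keeps each step cheap.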
45Growth analysis: Markov chain representation
Probability that the new edge is attached to any of the vertices of degree k
where total number of edges
46Growth analysis: Markov chain representation
- Growth dynamics at time (t+1)
Number of nodes of degree (k-1) at t
Number of nodes of degree k at t
Number of nodes of degree k at t+1
47Growth analysis: Markov chain representation
The net change in $n p_k$ per vertex added
for $k > m$
for $k = m$
In the stationary solution, we find
Which results in
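The steps named on these three slides match the standard master-equation treatment of preferential attachment. The block below is a sketch of that textbook derivation, not a transcription of the slides: it assumes each new vertex brings m edges, that $p_{k,n}$ is the fraction of vertices of degree k when the network has n vertices, and that the mean degree is 2m. The symbols on the original slides may differ.

```latex
% Probability that a new edge attaches to some vertex of degree k
% (mean degree of the network is 2m):
\frac{k\,p_k}{\sum_{k'} k'\,p_{k'}} = \frac{k\,p_k}{2m}

% Net change in n p_k per vertex added (each new vertex brings m edges):
(n+1)\,p_{k,n+1} - n\,p_{k,n} =
\begin{cases}
  \tfrac{1}{2}(k-1)\,p_{k-1,n} - \tfrac{1}{2}k\,p_{k,n} & \text{for } k > m,\\[4pt]
  1 - \tfrac{1}{2}m\,p_{m,n} & \text{for } k = m.
\end{cases}

% Stationary solution, p_{k,n+1} = p_{k,n} = p_k:
p_k =
\begin{cases}
  \tfrac{1}{2}(k-1)\,p_{k-1} - \tfrac{1}{2}k\,p_k & \text{for } k > m,\\[4pt]
  1 - \tfrac{1}{2}m\,p_m & \text{for } k = m,
\end{cases}

% which results in
p_m = \frac{2}{m+2}, \qquad
p_k = \frac{k-1}{k+2}\,p_{k-1}
\;\Longrightarrow\;
p_k = \frac{2m(m+1)}{k(k+1)(k+2)} \sim k^{-3}.
```

The exponent 3 is independent of m, consistent with the power-law degree distributions discussed earlier.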
48CASE STUDY I: Self-Organization of the Sound Inventories
49Human Speech Sounds
- Human speech sounds are called phonemes: the smallest units of a language
- Phonemes are characterized by certain distinctive features, such as place of articulation, manner of articulation and phonation
50Types of Phonemes
Consonants: /p/, /t/, /k/
Vowels: /a/, /i/, /u/
Diphthongs: /ai/
51Choice of Phonemes
- How does a language choose a set of phonemes in order to build its sound inventory?
- Is the process arbitrary?
- Certainly Not!
- What are the forces affecting this choice?
52Vowels A (Partially) Solved Mystery
- Languages choose vowels based on maximal perceptual contrast.
- For instance, if a language has three vowels, then in more than 95% of the cases they are /a/, /i/, and /u/.
53Consonants A puzzle
- Research: from 1929 to date
- No single satisfactory explanation of the organization of the consonant inventories
- The set of features that characterize consonants is much larger than that of vowels
- No single force is sufficient to explain this organization
- Rather, a complex interplay of forces goes on in shaping these inventories
54Principle of Occurrence
- PlaNet: The Phoneme-Language Network
- A bipartite network $N = (V_L, V_C, E)$
- $V_L$: Nodes representing languages of the world
- $V_C$: Nodes representing consonants
- $E$: Set of edges which run between $V_L$ and $V_C$
- There is an edge $e \in E$ between two nodes $v_l \in V_L$ and $v_c \in V_C$ if the consonant c occurs in the language l.
- Data Source: UPSID (317 languages)
Choudhury et al. 2006, ACL; Mukherjee et al. 2007, Int. Jnl. of Modern Physics C
The Structure of PlaNet
55Degree Distribution of PlaNet
DD of the language nodes follows a β-distribution
DD of the consonant nodes follows a power-law with an exponential cut-off
Distribution of consonants over languages follows a power-law
56Synthesis of PlaNet
- Non-linear preferential attachment
- Iteratively construct the language inventories given their inventory sizes
$\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V} (d_x^{\alpha} + \epsilon)}$
57Simulation Result
The parameters $\alpha$ and $\epsilon$ are 1.44 and 0.5 respectively. The results are averaged over 100 runs
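The synthesis procedure can be sketched as follows: each language, given its inventory size, picks consonants one at a time with probability proportional to (degree^alpha + epsilon), where the degree of a consonant is the number of languages that already contain it. This is an illustrative sketch only. The function name, the toy consonant list and the inventory sizes are made up, and restricting the choice to consonants the language has not yet picked is an assumption about how the sum in the denominator is meant.

```python
import random

def synthesize_planet(inventory_sizes, consonants,
                      alpha=1.44, epsilon=0.5, seed=None):
    """Build one consonant inventory per requested size using
    non-linear preferential attachment over consonant degrees."""
    rng = random.Random(seed)
    degree = {c: 0 for c in consonants}   # languages containing c so far
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        for _ in range(size):
            # Assumption: a language never picks the same consonant twice.
            available = [c for c in consonants if c not in chosen]
            weights = [degree[c] ** alpha + epsilon for c in available]
            chosen.add(rng.choices(available, weights=weights, k=1)[0])
        for c in chosen:
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree

if __name__ == "__main__":
    toy_consonants = ["p", "t", "k", "m", "n", "s", "b", "d", "g", "l"]
    toy_sizes = [3, 5, 4, 6, 3]           # hypothetical inventory sizes
    invs, deg = synthesize_planet(toy_sizes, toy_consonants, seed=7)
    print(sorted(deg.items(), key=lambda item: -item[1]))
```

The epsilon term gives consonants that no language has used yet a non-zero chance of being picked; without it, a consonant with degree 0 could never enter any inventory.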
58Principle of Co-occurrence
- Consonants tend to co-occur in groups or communities
- These groups tend to be organized around a few distinctive features (based on manner of articulation, place of articulation & phonation)
Principle of feature economy
If a language has
in its inventory
then it will also tend to have
59How to Capture these Co-occurrences?
- PhoNet: Phoneme-Phoneme Network
- A weighted network $N = (V_C, E)$
- $V_C$: Nodes representing consonants
- $E$: Set of edges which run between the nodes in $V_C$
- There is an edge $e \in E$ between two nodes $v_{c_1}, v_{c_2} \in V_C$ if the consonants $c_1$ and $c_2$ co-occur in a language. The number of languages in which $c_1$ and $c_2$ co-occur defines the edge-weight of $e$. The number of languages in which $c_1$ occurs defines the node-weight of $v_{c_1}$.
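The definition translates almost literally into code: count, for every consonant, the languages containing it (node weight) and, for every pair, the languages containing both (edge weight). The three inventories below are invented for illustration and are not taken from UPSID.

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """inventories: {language: set of consonants}.
    Returns (node_weight, edge_weight) of the co-occurrence network."""
    node_weight = Counter()
    edge_weight = Counter()
    for consonants in inventories.values():
        node_weight.update(consonants)
        # Sort so that each unordered pair has a single canonical key.
        for c1, c2 in combinations(sorted(consonants), 2):
            edge_weight[(c1, c2)] += 1
    return node_weight, edge_weight

# Hypothetical consonant inventories of three languages.
toy_inventories = {
    "L1": {"p", "t", "k"},
    "L2": {"p", "t", "m"},
    "L3": {"t", "k", "n"},
}

if __name__ == "__main__":
    nodes, edges = build_phonet(toy_inventories)
    print(nodes.most_common())
    print(edges.most_common(3))
```

The same input dictionary is also a direct encoding of the bipartite language-consonant network: its keys are the language nodes and each set lists the consonant nodes a language is linked to.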
60Construction of PhoNet
- Data Source UPSID
- Number of nodes in VC is 541
- Number of edges is 34012
PhoNet
61Community Formation
Radicchi et al Algorithm
S
For different values of η we get different sets of communities
62Consonant Societies!
η = 0.35
η = 0.60
η = 0.72
η = 1.25
The fact that the communities are good can be shown quantitatively by measuring the feature entropy
63Problems to ponder on
- Physical significance of PA
- Functional forces
- Historical/Evolutionary process
- Labeled synthesis of PlaNet and PhoNet
- Language diversity vs. Preferential attachment
64CASE STUDY II: Modeling the Mental Lexicon
65Mental Lexicon (ML) Basics
- It refers to the repository of the word forms that resides in the human brain
- Two Questions
- How are words stored in the long-term memory, i.e., the organization of the ML?
- How are words retrieved from the ML (lexical access)?
- The above questions are highly inter-related: to predict the organization one can investigate how words are retrieved, and vice versa.
66Ways of Organization of Mental Lexicon
- Un-organized (a bag full of words) or,
- Organized
- By sound (phonological similarity)
- E.g., start the same: banana, bear, bean
- End the same: look, took, book
- Number of phonological segments they share
- By Meaning (semantic similarity)
- Banana, apple, pear, orange
- By age at which the word is acquired
- By frequency of usage
- By POS
- Orthographically
67Some Unsolved Mysteries: You can Give it a Try
- What can be a model for the evolution of the ML?
- How is the ML acquired by a child learner?
- Is there a single optimal structure for the ML, or is it organized based on multiple criteria (i.e., a combination of the different n/ws)?
Towards a single framework for studying ML!!!
68CASE STUDY III: Syntax, Unsupervised POS Tagging
69Labeling of Text
- Lexical Category (POS tags)
- Syntactic Category (Phrases, chunks)
- Semantic Role (Agent, theme, ...)
- Sense
- Domain dependent labeling (genes, proteins, ...)
- How to define the set of labels?
- How to (learn to) predict them automatically?
70Nothing makes sense, unless in context
- Distribution-based definition of
- Lexical category
- Sense (meaning)
- The X is ...
- If you X then I shall ...
- looking at the star PP
71General Approach
- Represent the context of a word (token)
- Define some notion of similarity between the contexts
- Cluster the contexts of the tokens
- Get the label of the tokens
[Figure: tokens w1, w2, w3, w4]
72Issues
- How to define the context?
- How to define similarity
- How to Cluster?
- How to evaluate?
73Syntactic Network of Words
[Figure: weighted word network over color, sky, weight, light, blue, blood, heavy, red; edge weights shown include 1, 20 and 100; annotation involving cos(red, blue)]
74The Chinese Whisper Algorithm
[Figure: word graph over color, sky, weight, light, blue, blood, heavy, red with edge weights 0.9, 0.8, -0.5, 0.7, 0.9, 0.5]
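Chinese Whispers is usually described as a randomized label-propagation scheme: every node starts in its own class, and nodes are then visited in random order, each adopting the class with the largest total edge weight among its neighbors. The sketch below follows that description. The toy edge weights are hypothetical (the figure does not say which weight belongs to which edge), and the rule that a node only switches when the winning class has positive total weight is an assumption made to handle negative similarities.

```python
import random
from collections import defaultdict

def chinese_whispers(weights, iterations=20, seed=None):
    """weights: {(u, v): w} for an undirected weighted graph.
    Returns {node: class label}."""
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    label = {node: node for node in adj}   # every node starts as its own class
    nodes = sorted(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            votes = defaultdict(float)
            for neighbor, w in adj[node].items():
                votes[label[neighbor]] += w
            best = max(sorted(votes), key=votes.get)   # ties: smallest label
            # Assumption: only adopt a class with positive total support.
            if votes[best] > 0 and best != label[node]:
                label[node] = best
                changed = True
        if not changed:
            break
    return label

# Hypothetical similarities between four words.
toy_weights = {
    ("blue", "red"): 0.9,
    ("heavy", "light"): 0.8,
    ("blue", "light"): 0.7,
    ("heavy", "red"): -0.5,
}

if __name__ == "__main__":
    print(chinese_whispers(toy_weights, seed=0))
```

Because labels are updated in place during a pass, the result can depend on the visiting order for less clear-cut graphs; the algorithm is non-deterministic by design, and the number of classes is not fixed in advance.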
77Word Sense Disambiguation
- Véronis, J. 2004. HyperLex: lexical cartography for information retrieval. Computer Speech & Language 18(3):223-252.
- Let the word to be disambiguated be light
- Select a subcorpus of paragraphs which have at least one occurrence of light
- Construct the word co-occurrence graph
78HyperLex
- A beam of white light is dispersed into its component colors by its passage through a prism.
- Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps
- What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives?
[Figure: co-occurrence graph over the words prism, beam, dispersed, white, colors, shades, energy, fixtures, efficient, lamps]
79Hub Detection and MST
[Figure: hub detection and minimum spanning tree over the words light, prism, beam, dispersed, white, colors, shades, lamps, fixtures, energy, efficient]
White fluorescent lights consume less energy than incandescent lamps
80Other Related Works
- Solan, Z., Horn, D., Ruppin, E. and Edelman, S. 2005. Unsupervised learning of natural languages. PNAS, 102(33):11629-11634
- Ferrer i Cancho, R. 2007. Why do syntactic links not cross? Europhysics Letters
- Also applied to IR, summarization, sentiment detection and categorization, script evaluation, author detection, ...
81Discussions & Conclusions
- What we learnt
- Advantages of SNIC in NLP
- Comparison to standard techniques
- Open problems
- Concluding remarks and Q&A
82What we learnt
- What is SNIC and Complex Networks
- Analytical tools for SNIC
- Applications to human languages
- Three Case-studies
83Insights
- Language features complex structure at every level of organization
- Linguistic networks have non-trivial properties: scale-free & small-world
- Therefore, Language and Engineering systems involving language should be studied within the framework of complex systems, esp. CNT
84Advantages of SNIC
- Fully Unsupervised techniques
- No labeled data required: A good solution to resource scarcity
- Problem of evaluation circumvented by semi-supervised techniques
- Ease of computation
- Simple and scalable
- Distributed and parallel computable
- Holistic treatment
- Language evolution & psycho-linguistic theories
85Comparison to Standard Techniques
- Rule-based vs. Statistical NLP
- Graphical Models
- Generative models in machine learning
- HMM, CRF, Bayesian belief networks
JJ
NN
RB
VF
86Graphical Models vs. SNIC
Graphical Models
- Principled: based on Bayesian Theory
- Structure is assumed and parameters are learnt
- Focus: Decoding & parameter estimation
- Data-driven or computationally intensive
- The generative process is easy to visualize, but no visualization of the data
SNIC
- Heuristic, but underlying principles of linear algebra
- Structure is discovered and studied
- Focus: Topology and evolutionary dynamics
- Unsupervised and computationally easy
- Easy visualization of the data
87Language Modeling
- A network of words as a model of language vs. n-gram models
- Hierarchical, hyper-graph based models
- Smoothing through holistic analysis of the network topology
- Jedynak, B. and Karakos, D. 2007. Unigram Language Models using Diffusion Smoothing over Graphs. Proc. of TextGraphs-2
88Open Problems
- Universals and variables of linguistic networks
- Superimposition of networks: phonetic, syntactic, semantic
- Which clustering algorithm for which topology?
- Metrics for network comparison: important for language modeling
- Unsupervised dependency parsing using networks
- Mining translation equivalents
89Resources
- Conferences
- TextGraphs, Sunbelt, EvoLang, ECCS
- Journals
- PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks
- Tools
- Pajek, CUNG, http://www.insna.org/INSNA/soft_inf.html
- Online Resources
- Bibliographies, courses on CNT
90Contact
- Monojit Choudhury
- monojitc_at_microsoft.com
- http://www.cel.iitkgp.ernet.in/monojit/
- Animesh Mukherjee
- animeshm_at_cse.iitkgp.ernet.in
- http://www.cel.iitkgp.ernet.in/animesh/
- Niloy Ganguly
- niloy_at_cse.iitkgp.ernet.in
- http://www.facweb.iitkgp.ernet.in/niloy/
91Thank you!!
Book Volume on Dynamics on and of Complex Networks. To be published by May 2008 from Birkhauser, Springer. http://www.cel.iitkgp.ernet.in/eccs07/