Social Network Inspired Models of NLP and Language Evolution - PowerPoint PPT Presentation

Slides: 92

Provided by: facwebIit

Transcript and Presenter's Notes

Title: Social Network Inspired Models of NLP and Language Evolution

1
Social Network Inspired Models of NLP and
Language Evolution
Monojit Choudhury (Microsoft Research India)
Animesh Mukherjee (IIT Kharagpur)
Niloy Ganguly (IIT Kharagpur)
2
What is a Social Network?

Nodes: Social entities (people, organizations, etc.)
Edges: Interaction/relationship between entities
(friendship, collaboration, sex)

Courtesy: http://blogs.clickz.com
3
Social Network Inspired Computing

Society and the nature of human interaction form a
Complex System
Complex Network: A generic tool to model complex
systems
There is a growing body of work on Complex Network
Theory (CNT)
Applied to a variety of fields: Social,
Biological, Physical & Cognitive sciences,
Engineering & Technology
Language is a complex system

4
Objective of this Tutorial

To show that SNIC (Social Network Inspired Computing) is
an emerging and promising technique
To apply it to model Natural Languages:
NLP, Quantitative Linguistics, Language
Evolution, Historical Linguistics, Language
Acquisition
To familiarize you with tools and techniques in SNIC
To compare it with other standard approaches to NLP

5
Outline of the Tutorial

Part I: Background
Introduction (25 min)
Network Analysis Techniques (25 min)
Network Synthesis Techniques (25 min)
Break (3:20pm - 3:40pm)
Part II: Case Studies
Self-organization of Sound Systems (20 min)
Modeling the Lexicon (20 min)
Unsupervised Labeling (Syntax & Semantics) (20 min)
Conclusion and Discussions (20 min)

6
Complex System

Non-trivial properties and patterns emerging from
the interaction of a large number of simple
entities
Self-organization: The process through which
these patterns evolve without any external
intervention or central control
Emergent Property or Emergent Behavior: The
pattern that emerges due to self-organization

7
Emergence of a networked life
[Figure: levels of organization, from Atom, Molecule, Cell, Tissue, Organs and Organisms up to Communities]
8
Language: a complex system

Language: a medium for communication through an
arbitrary set of symbols
Constantly evolving
An outcome of self-organization at many levels:
Neurons
Speakers and listeners
Phonemes, morphemes, words
80-20 Rule at every level of structure

9
Syntactic Network of Words
[Figure: network of the words color, sky, weight, light, blue, blood, heavy and red, with edge weights 1, 20 and 100]
10
Complex Network Theory

Handy toolbox for modeling complex systems
Marriage of Graph theory and Statistics
Complex because
Non-trivial topology
Difficult to specify completely
Usually large (in terms of nodes and edges)
Provides insight into the nature and evolution of
the system being modeled

11
Internet
12
9-11 Terrorist Network
Social Network Analysis is a mathematical methodology for connecting the
dots -- using science to fight terrorism.
Connecting multiple pairs of dots soon reveals an
emergent network of organization.
13
What Questions Can Be Asked?

Do these networks display some symmetry?
Are these networks the creation of intelligent
objects, or have they emerged?
How have these networks emerged?
What are the underlying simple rules leading
to their complex formation?

14
Bi-directional Approach

Analysis of the real-world networks
Global topological properties
Community structure
Node-level properties
Synthesis of the network by means of some simple
rules
Small-world models
Preferential attachment models

15
Application of CNT in Linguistics - I

Quantitative linguistics
Invariance and typology (Zipf's law, syntactic
dependencies)
Natural Language Processing
Unsupervised methods for text labeling (POS
tagging, NER, WSD, etc.)
Textual similarity (automatic evaluation,
document clustering)
Evolutionary Models (NER, multi-document
summarization)

16
Application of CNT in Linguistics - II

Language Evolution
How did sound systems evolve?
Development of syntax
Language Change
Innovation diffusion over social networks
Language as an evolving network
Language Acquisition
Phonological acquisition
Evolution of the mental lexicon of the child

17
Linguistic Networks
18
Summarizing

SNIC and CNT are emerging techniques for modeling
complex systems at mesoscopic level
Applied to Physics, Biology, Sociology,
Economics, Logistics
Language - an ideal application domain for SNIC
SNIC models in NLP, Quantitative linguistics,
language change, evolution and acquisition

19
Topological Characterization of Networks
20
Types Of Networks and Representation

Representation
Adjacency Matrix
Adjacency List

21
Characterization of Complex N/ws??

They have a non-trivial topological structure
Properties:
Heavy tail in the degree distribution
(non-negligible probability mass towards the
tail, more than in the case of an exponential
distribution)
High clustering coefficient
Centrality Properties
Social Roles & Equivalence
Assortativity
Community Structure
Random Graphs & Small avg. path length
Preferential attachment
Small World Properties

22
Degree Distribution (DD)

Let $p_k$ be the fraction of vertices in the network
that have degree k.
The plot of k versus $p_k$ is defined as the degree
distribution of a network.
For most real-world networks these
distributions are right-skewed, with a long right
tail of values far above the mean: $p_k$
varies as $k^{-\alpha}$.
Because data are often noisy and insufficient, the
definition is sometimes slightly modified:
the cumulative degree distribution is plotted, i.e. the
probability that the degree of a node is greater
than or equal to k.
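
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dictionary representation, the function names and the toy star graph are assumptions made here, not part of the tutorial.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: p_k}, the fraction of vertices that have degree k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

def cumulative_degree_distribution(adj):
    """Return {k: P(degree >= k)} for every degree k that occurs."""
    pk = degree_distribution(adj)
    cumulative, running = {}, 0.0
    for k in sorted(pk, reverse=True):  # accumulate from the tail inwards
        running += pk[k]
        cumulative[k] = running
    return cumulative

# Hypothetical toy graph: one hub connected to four leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
pk = degree_distribution(star)             # p_1 = 0.8, p_4 = 0.2
ck = cumulative_degree_distribution(star)  # P(>=1) = 1.0, P(>=4) = 0.2
```

Summing from the tail is what makes the cumulative plot less sensitive to sparsely populated high-degree bins.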

23
A Few Examples
Power law: $P_k \sim k^{-\alpha}$
24
Friend of Friends

Consider the following scenario:
Sourish and Ravi are friends
Sourish and Shaunak are friends
Are Shaunak and Ravi friends?
If so, then this property is known as transitivity

25
Measuring Transitivity Clustering Coefficient

The clustering coefficient for a vertex v in a
network is defined as the ratio of the
number of connections among the neighbors of v
to the total number of possible connections
between those neighbors
A high clustering coefficient means my friends know
each other with high probability, a typical
property of social networks

26
Mathematically

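The clustering coefficient of a vertex i with $k_i$ neighbors and $E_i$ edges among those neighbors is $C_i = \frac{2E_i}{k_i(k_i-1)}$
The clustering coefficient of the whole network
is the average $C = \frac{1}{n}\sum_i C_i$
Alternatively, $C = \frac{3 \times \text{number of triangles}}{\text{number of connected triples of vertices}}$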
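
A minimal sketch of both ways of measuring clustering, assuming the standard definitions (per-vertex ratio averaged over the network, and the triangle-to-triple ratio as the alternative). The function names and the fourth person, Amit, are hypothetical additions for illustration.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Edges among the neighbours of v, divided by the possible number."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-vertex clustering coefficients."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def transitivity(adj):
    """3 x triangles / connected triples (each triangle closes 3 triples)."""
    closed = triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0

# Sourish, Ravi and Shaunak form a triangle; Amit (hypothetical) knows only Sourish.
friends = {"Sourish": {"Ravi", "Shaunak", "Amit"},
           "Ravi": {"Sourish", "Shaunak"},
           "Shaunak": {"Sourish", "Ravi"},
           "Amit": {"Sourish"}}
c_sourish = local_clustering(friends, "Sourish")  # 1 of 3 possible links
c_mean = average_clustering(friends)
c_global = transitivity(friends)
```

The two network-level numbers generally differ: the average gives every vertex equal weight, while the triangle ratio gives more weight to high-degree vertices.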

27
Centrality

Centrality measures are commonly described as
indices of the 4 Ps -- prestige, prominence,
importance, and power
Degree: Count of immediate neighbors
Betweenness: Nodes that form a bridge between
two regions of the n/w
$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
where $\sigma_{st}$ is the total number of shortest paths
between s and t and $\sigma_{st}(v)$ is the total number
of shortest paths from s to t via v
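
Betweenness can be computed with one breadth-first search per source vertex. The sketch below uses Brandes' accumulation scheme, which is a choice made here for illustration (the slides do not name an algorithm); it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the toy graph is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Sum over pairs {s, t} of sigma_st(v) / sigma_st, for every vertex v."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair {s, t} was visited from both ends.
    return {v: b / 2.0 for v, b in score.items()}

# Hypothetical graph: two triangles that share the bridging vertex "c".
bridge = {"a": {"b", "c"}, "b": {"a", "c"},
          "c": {"a", "b", "d", "e"},
          "d": {"c", "e"}, "e": {"c", "d"}}
bc = betweenness(bridge)  # only "c" lies on shortest paths between the two sides
```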

28
Eigenvector centrality: Bonacich (1972)

It is not just how many people know me that counts towards
my popularity (or power), but also how many people
know the people who know me: this is recursive!
In the context of HIV transmission: a person x with
one sex partner is less prone to the disease than
a person y with multiple partners
But imagine what happens if the partner of x has
multiple partners
This is the basic idea of eigenvector centrality

29
Definition

Eigenvector centrality is defined as the
principal eigenvector of the adjacency matrix
An eigenvector of any symmetric matrix $A = \{a_{ij}\}$ is
any vector e such that $\lambda e_i = \sum_j a_{ij} e_j$,
where $\lambda$ is a constant and $e_i$ is the centrality of
the node i
What does it imply? The centrality of a node is
proportional to the centrality of the nodes it is
connected to (recursively)
Practical example: Google PageRank
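
The recursive definition suggests a simple way to compute the principal eigenvector: repeatedly replace each node's score by the sum of its neighbours' scores and renormalise. The sketch below is one such power iteration under stated assumptions: it iterates on A + I rather than A (same eigenvectors, but it avoids oscillation on bipartite graphs), and the contact network and function name are hypothetical.

```python
import math

def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on A + I for a connected, undirected graph."""
    e = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # One multiplication by (A + I): own score plus neighbours' scores.
        new = {v: e[v] + sum(e[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - e[v]) for v in adj) < tol:
            return new
        e = new
    return e

# Hypothetical contact network: x and y each have exactly one partner,
# but x's partner p has many partners while y's partner q1 has few.
contacts = {"p": {"x", "q1", "q2", "q3"},
            "x": {"p"}, "q2": {"p"}, "q3": {"p"},
            "q1": {"p", "y"}, "y": {"q1"}}
centrality = eigenvector_centrality(contacts)
```

Here x and y have the same degree, yet x ends up with the higher score because its only neighbour is the best-connected node; that is the point of the HIV example.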

30
Assortativity (homophily)

Rich goes with the rich (selective linking)
A famous actor (e.g., Shah Rukh Khan) would
prefer to pair up with some other famous actor
(e.g., Rani Mukherjee) in a movie rather than with a
newcomer to the film industry.

Assortative Scale-free network
Disassortative Scale-free network
31
Measures of Assortativity

ANND (Average nearest neighbor degree)
Find the average degree of the neighbors of each
node i with degree k
Find the Pearson correlation (r) between the
degree of i and the average degree of its
neighbors
For further reference see the supplementary
material
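
Both steps can be sketched directly: compute each node's average neighbor degree, then correlate it with the node's own degree. This is an illustrative reading of the slide, not the tutorial's code; the function name and the toy star graph are assumptions.

```python
import math

def neighbor_degree_correlation(adj):
    """Return (ANND per node, Pearson r between degree and ANND)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items() if nbrs}  # skip isolated nodes
    annd = {v: sum(len(adj[w]) for w in adj[v]) / deg[v] for v in deg}
    xs = [deg[v] for v in deg]
    ys = [annd[v] for v in deg]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return annd, float("nan")  # correlation undefined for constant values
    return annd, cov / (sx * sy)

# Hypothetical star graph: the hub only meets leaves and vice versa.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
annd, r = neighbor_degree_correlation(star)
```

A positive r indicates assortative mixing and a negative r disassortative mixing; the star is the extreme disassortative case.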

32
Community structure

Community structure: a group of vertices that
have a high density of edges within them and a
low density of edges between groups
Examples:

Friendship n/w of children
Citation n/ws: research interest
World Wide Web: subject matter of pages
Metabolic networks: functional units
Linguistic n/ws: similar linguistic categories

33
Some Examples
Community Structure in Political Books
Community structure in a Social n/w of Students
(American High School)
34
Community Identification Algorithms

Hierarchical
Girvan-Newman
Radicchi et al.
Chinese Whispers
Spectral Bisection
See (Newman 2004) for a comprehensive
survey (you will find the ref. in the
supplementary material)

35
Evolution of Networks
Processes on Networks
36
The World is Small!

"The registration fee for IJCNLP 2008 is being
waived for all participants; collect it
from the registration counter"
How long do you think the above information will
take to spread among yourselves?
Experiments say it will spread very fast: within
6 hops from the initiator it would reach everyone
This is the famous Milgram's six degrees of
separation

37
The Small World Effect

Even in very large social networks, the average
distance between nodes is usually quite short.
Milgram's small world experiment:
Target individual in Boston
Initial senders in Omaha, Nebraska
Each sender was asked to forward a packet to a
friend who was closer to the target
Friends were asked to do the same
Result: Average of six degrees of separation.
S. Milgram, The small world problem, Psych.
Today, 2 (1967), pp. 60-67.

38
Measure of Small-Worldness

Low average geodesic path length
High clustering coefficient
Geodesic path: Shortest path through the network
from one vertex to another
Mean path length:
$\ell = \frac{2 \sum_{i \geq j} d_{ij}}{n(n+1)}$, where $d_{ij}$ is the geodesic
distance from vertex i to vertex j
Most of the networks observed in the real world have
$\ell \leq 6$:
Film actors: 3.48
Company directors: 4.60
Emails: 4.95
Internet: 3.33
Electronic circuits: 4.34
39
Random Graphs & Small Average Path Length
Q: What do we mean by a random graph?
A: The Erdős-Rényi random graph model: for every pair of
nodes, draw an edge between them with equal
probability p.
Degrees of Separation in a Random Graph

N nodes
z neighbors per node, on average: $z = \langle k \rangle$
D degrees of separation

$P(k) = \frac{e^{-\langle k \rangle} \langle k \rangle^{k}}{k!}$
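
The relation between N, z and D can be sketched with the usual back-of-the-envelope argument. It assumes the neighborhood of a node grows like a tree, so that roughly $z^d$ nodes sit at distance d, and it ignores overlaps between neighborhoods:

```latex
z^{D} \approx N
\quad\Longrightarrow\quad
D \approx \frac{\ln N}{\ln z}
```

For example, under these assumptions a network with $N = 10^{6}$ nodes and $z = 10$ neighbors per node gives $D \approx 6$, so path lengths grow only logarithmically with network size.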
40
Clustering
C: Probability that two of a node's neighbors
are themselves connected
In a random graph, $C_{rand} \sim 1/N$ (if the average degree is held
constant)
41
Watts-Strogatz Small World Model
Watts and Strogatz introduced this simple model
to show how networks can have both short path
lengths and high clustering.
D. J. Watts and S. H. Strogatz, Collective
dynamics of 'small-world' networks, Nature, 393
(1998), pp. 440-442.
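
A minimal sketch of the model, assuming the usual construction from the cited paper: start from a ring in which every node is joined to its k nearest neighbors, then rewire each edge with probability p to a uniformly chosen new endpoint. The function name and parameter values are illustrative choices.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n nodes (k even, k < n) with random rewiring."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit every lattice edge (v, v + j) once and rewire it with probability p.
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

ring = watts_strogatz(20, 4, 0.0, seed=1)         # p = 0: regular lattice
small_world = watts_strogatz(20, 4, 0.3, seed=1)  # some shortcuts added
```

With p = 0 the graph is a highly clustered lattice with long paths; a small p adds a few shortcuts that shorten paths while most of the local clustering survives.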
42
Power Law
43
Degree distributions for various networks

World-Wide Web
Coauthorship networks: computer science, high
energy physics, condensed matter physics,
astrophysics
Power grid of the western United States and
Canada
Social network of 43 Mormons in Utah

44
How do Power law DDs arise?
Barabási-Albert Model of Preferential Attachment
(Rich gets Richer)
(1) GROWTH: Starting with a small number of
nodes ($m_0$), at every timestep we add a new node
with m (< $m_0$) edges (connected to the nodes
already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability $\Pi$ that a new node
will be connected to node i depends on the
connectivity $k_i$ of that node
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
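
The two rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the seed network is taken to be a complete graph on $m_0 = m + 1$ nodes, the attachment probability is taken to be linear in degree, and the m targets of a new node are forced to be distinct by redrawing duplicates.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    # Assumed seed network: complete graph on m + 1 nodes.
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # Each node appears in 'stubs' once per incident edge, so a uniform
    # draw from the list picks node i with probability k_i / sum_j k_j.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:  # redraw until m distinct targets are found
            chosen.add(rng.choice(stubs))
        adj[new] = set(chosen)
        for target in chosen:
            adj[target].add(new)
            stubs.extend([target, new])
    return adj

network = barabasi_albert(200, 2, seed=7)
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
```

Sorting the degrees typically shows a few early nodes with far more links than the rest, which is the rich-get-richer effect the model is meant to capture.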
45
Growth analysis: Markov chain representation
Probability that the new edge is attached to any
of the vertices of degree k
where total number of edges
46
Growth analysis: Markov chain representation

Growth dynamics at time (t + 1)

Number of nodes of degree (k - 1) at t
Number of nodes of degree k at t
Number of nodes of degree k at t + 1
47
Growth analysis: Markov chain representation
The net change in $n p_k$ per vertex added:
for k > m
for k = m
In the stationary solution, we find
Which results in
CASE STUDY I: Self-Organization of the Sound
Inventories
49
Human Speech Sounds

Human speech sounds are called phonemes, the
smallest units of a language
Phonemes are characterized by certain distinctive
features like

50
Types of Phonemes
Consonants: /t/, /p/, /k/
Vowels: /i/, /a/, /u/
Diphthongs: /ai/
51
Choice of Phonemes

How does a language choose a set of phonemes in order
to build its sound inventory?
Is the process arbitrary?
Certainly not!
What are the forces affecting this choice?

52
Vowels: A (Partially) Solved Mystery

Languages choose vowels based on maximal
perceptual contrast.
For instance, if a language has three vowels, then
in more than 95% of the cases they are /a/, /i/,
and /u/.

53
Consonants: A Puzzle

Research: from 1929 to date
No single satisfactory explanation of the
organization of the consonant inventories
The set of features that characterize consonants
is much larger than that of vowels
No single force is sufficient to explain this
organization
Rather, a complex interplay of forces goes on in
shaping these inventories

54
Principle of Occurrence

PlaNet: The Phoneme-Language Network
A bipartite network $N = (V_L, V_C, E)$
$V_L$: Nodes representing languages of the world
$V_C$: Nodes representing consonants
E: Set of edges which run between $V_L$ and $V_C$
There is an edge $e \in E$ between two nodes
$v_l \in V_L$ and $v_c \in V_C$ if the consonant c occurs
in the language l.
Data Source: UPSID (317 languages)

Choudhury et al. 2006, ACL; Mukherjee et al. 2007,
Int. Jnl of Modern Physics C
The Structure of PlaNet
55
Degree Distribution of PlaNet
DD of the language nodes follows a β-distribution
DD of the consonant nodes follows a
power-law with an exponential cut-off
Distribution of consonants over languages follows
a power-law
56
Synthesis of PlaNet

Non-linear preferential attachment
Iteratively construct the language inventories
given their inventory sizes

$\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V} (d_x^{\alpha} + \epsilon)}$
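
The synthesis procedure can be sketched as below. Several details are assumptions made for this illustration rather than facts from the slides: the sum in the denominator is taken over the consonants a language has not yet chosen (so an inventory never contains duplicates), a consonant's degree is updated as soon as it is picked, and the function name and toy inventory sizes are hypothetical.

```python
import random

def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=None):
    """Build one consonant inventory per language by preferential attachment."""
    rng = random.Random(seed)
    degree = [0] * n_consonants  # number of languages using each consonant
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        while len(chosen) < size:
            available = [c for c in range(n_consonants) if c not in chosen]
            # Non-linear kernel: weight proportional to degree^alpha + eps.
            weights = [degree[c] ** alpha + eps for c in available]
            pick = rng.choices(available, weights=weights, k=1)[0]
            chosen.add(pick)
            degree[pick] += 1
        inventories.append(chosen)
    return inventories, degree

# Hypothetical toy run: three languages choosing among ten consonants.
inventories, degree = synthesize_planet([3, 5, 4], n_consonants=10, seed=0)
```

The constant eps keeps unused consonants selectable, while alpha > 1 makes already popular consonants disproportionately likely to be chosen again.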
57
Simulation Result
The parameters $\alpha$ and $\epsilon$ are 1.44 and 0.5,
respectively. The results are averaged over 100
runs
58
Principle of Co-occurrence

Consonants tend to co-occur in groups or
communities
These groups tend to be organized around a few
distinctive features (based on manner of
articulation, place of articulation & phonation)
Principle of feature economy

If a language has
in its inventory
then it will also tend to have
59
How to Capture these Co-occurrences?

PhoNet: The Phoneme-Phoneme Network
A weighted network $N = (V_C, E)$
$V_C$: Nodes representing consonants
E: Set of edges which run between the nodes in
$V_C$
There is an edge $e \in E$ between two nodes
$v_{c_1}, v_{c_2} \in V_C$ if the consonants $c_1$ and $c_2$ co-occur in a
language. The number of languages in which $c_1$ and
$c_2$ co-occur defines the edge-weight of e. The
number of languages in which $c_1$ occurs defines
the node-weight of $v_{c_1}$.
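
The construction described above amounts to counting, over all languages, how often each consonant and each pair of consonants occurs. A minimal sketch, with a hypothetical function name and three made-up toy inventories standing in for the UPSID data:

```python
from collections import Counter
from itertools import combinations

def build_phonet(inventories):
    """Return (node weights, edge weights) from {language: consonants}."""
    node_weight = Counter()
    edge_weight = Counter()
    for consonants in inventories.values():
        unique = sorted(set(consonants))  # sorted, so each pair has one key
        node_weight.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_weight, edge_weight

# Hypothetical toy inventories (not taken from UPSID).
toy = {"L1": ["p", "t", "k"],
       "L2": ["p", "t", "m"],
       "L3": ["p", "k"]}
node_weight, edge_weight = build_phonet(toy)
```

In this toy example /p/ and /t/ co-occur in two languages, so the edge between them gets weight 2, and /p/ itself gets node-weight 3.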

60
Construction of PhoNet

Data Source: UPSID
Number of nodes in $V_C$ is 541
Number of edges is 34012

PhoNet
61
Community Formation
Radicchi et al. Algorithm
S
For different values of η we get different sets
of communities
62
Consonant Societies!
η = 0.35
η = 0.60
η = 0.72
η = 1.25
The fact that the communities are good can be
quantitatively shown by measuring the feature
entropy
63
Problems to ponder on

Physical significance of PA
Functional forces
Historical/Evolutionary process
Labeled synthesis of PlaNet and PhoNet
Language diversity vs. Preferential attachment

64
CASE STUDY II: Modeling the Mental Lexicon
65
Mental Lexicon (ML): Basics

It refers to the repository of word forms
that resides in the human brain
Two questions:
How are words stored in long-term memory,
i.e., what is the organization of the ML?
How are words retrieved from the ML (lexical
access)?
The above questions are highly inter-related:
to predict the organization one can investigate
how words are retrieved, and vice versa.

66
Ways of Organization of Mental Lexicon

Un-organized (a bag full of words), or
Organized
By sound (phonological similarity)
E.g., start the same: banana, bear, bean
End the same: look, took, book
Number of phonological segments they share
By meaning (semantic similarity)
Banana, apple, pear, orange
By age at which the word is acquired
By frequency of usage
By POS
Orthographically

67
Some Unsolved Mysteries: You Can Give It a Try

What can be a model for the evolution of the ML?
How is the ML acquired by a child learner?
Is there a single optimal structure for the ML,
or is it organized based on multiple criteria
(i.e., a combination of the different n/ws)?
Towards a single framework for studying the ML!

68
CASE STUDY III: Syntax
Unsupervised POS Tagging
69
Labeling of Text

Lexical Category (POS tags)
Syntactic Category (Phrases, chunks)
Semantic Role (Agent, theme, ...)
Sense
Domain dependent labeling (genes, proteins, ...)
How to define the set of labels?
How to (learn to) predict them automatically?

70
Nothing makes sense, unless in context

Distribution-based definition of
Lexical category
Sense (meaning)
The X is
If you X then I shall
looking at the star PP

71
General Approach

Represent the context of a word (token)
Define some notion of similarity between the
contexts
Cluster the contexts of the tokens
Get the label of the tokens

w1 w2 w3 w4

w1
w3
w2
w4
72
Issues

How to define the context?
How to define similarity?
How to Cluster?
How to evaluate?

73
Syntactic Network of Words
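[Figure: network of the words color, sky, weight, light, blue, blood, heavy and red, with edge weights 1, 20 and 100; one edge weight is expressed in terms of cos(red, blue)]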
74
The Chinese Whisper Algorithm
[Figure: word graph over color, sky, weight, light, blue, blood, heavy and red, with edge weights 0.9, 0.8, -0.5, 0.7, 0.9 and 0.5]
75
The Chinese Whisper Algorithm
[Figure: the same word graph as on the previous slide]
76
The Chinese Whisper Algorithm
[Figure: the same word graph as on the previous slide]
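
A minimal sketch of Chinese Whispers label propagation on a weighted word graph, assuming the commonly described form of the algorithm: every node starts in its own class, nodes are visited in random order, and each node adopts the class with the largest total edge weight among its neighbors. The edge weights and the weak "light"-"color" link below are hypothetical and do not reproduce the slide figure.

```python
import random

def chinese_whispers(weights, iterations=20, seed=None):
    """weights: {node: {neighbor: weight}}, symmetric. Returns {node: label}."""
    rng = random.Random(seed)
    label = {v: v for v in weights}  # every node starts in its own class
    nodes = list(weights)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for v in nodes:
            votes = {}
            for w, wt in weights[v].items():
                votes[label[w]] = votes.get(label[w], 0.0) + wt
            if votes:
                best = max(votes.values())
                # Ties between equally strong classes are broken at random.
                label[v] = rng.choice(sorted(l for l, s in votes.items() if s == best))
    return label

def add_edge(graph, a, b, wt):
    graph.setdefault(a, {})[b] = wt
    graph.setdefault(b, {})[a] = wt

# Hypothetical weights: two tight groups joined by one weak edge.
graph = {}
for a, b, wt in [("red", "blue", 0.9), ("red", "color", 0.8), ("blue", "color", 0.7),
                 ("heavy", "light", 0.9), ("heavy", "weight", 0.8),
                 ("light", "weight", 0.7), ("light", "color", 0.1)]:
    add_edge(graph, a, b, wt)
labels = chinese_whispers(graph, seed=3)
```

The algorithm has no fixed number of clusters and each sweep is linear in the number of edges; the outcome can vary between runs because of the random visiting order and tie-breaking.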
77
Word Sense Disambiguation

Véronis, J. 2004. HyperLex: lexical cartography
for information retrieval. Computer Speech &
Language 18(3): 223-252.
Let the word to be disambiguated be "light"
Select a subcorpus of paragraphs which have at
least one occurrence of "light"
Construct the word co-occurrence graph
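
The first two steps can be sketched as below. This is a simplification for illustration only: it counts raw paragraph-level co-occurrences and leaves out the frequency filtering and edge weighting used in the cited paper; the function name, tokenizer and example paragraphs are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_graph(paragraphs, target, stopwords=frozenset()):
    """Count co-occurring word pairs in paragraphs that mention the target."""
    edges = Counter()
    for text in paragraphs:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if target not in words:
            continue  # paragraph is not part of the subcorpus
        context = sorted(words - {target} - stopwords)
        edges.update(combinations(context, 2))
    return edges

# Hypothetical mini-corpus.
paragraphs = ["white light beam",
              "light lamps energy",
              "heavy weight"]
edges = cooccurrence_graph(paragraphs, "light")
```

Words from different senses of the target rarely share a paragraph, so the resulting graph tends to fall into loosely connected regions, one per sense.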

78
HyperLex

A beam of white light is dispersed into its
component colors by its passage through a prism.
Energy efficient light fixtures including
solar lights, night lights, energy star lighting,
ceiling lighting, wall lighting, lamps
What enables us to see the light and
experience such wonderful shades of colors during
the course of our everyday lives?

[Figure: co-occurrence graph over the words prism, beam, dispersed, white, colors, shades, energy, fixtures, efficient and lamps]
79
Hub Detection and MST
[Figure: hub detection and MST for "light" over the words prism, beam, dispersed, white, colors, shades, energy, fixtures, efficient and lamps]
Example: White fluorescent lights consume less energy than
incandescent lamps
80
Other Related Works

Solan, Z., Horn, D., Ruppin, E. and Edelman, S.
2005. Unsupervised learning of natural languages.
PNAS, 102(33): 11629-11634
Ferrer i Cancho, R. 2007. Why do syntactic links
not cross? Europhysics Letters
Also applied to IR, summarization, sentiment
detection and categorization, script evaluation,
author detection, ...

81
Discussions & Conclusions

What we learnt
Advantages of SNIC in NLP
Comparison to standard techniques
Open problems
Concluding remarks and Q&A

82
What we learnt

What is SNIC and Complex Networks
Analytical tools for SNIC
Applications to human languages
Three Case-studies

83
Insights

Language features complex structure at every
level of organization
Linguistic networks have non-trivial properties:
scale-free & small-world
Therefore, language and engineering systems
involving language should be studied within the
framework of complex systems, esp. CNT

84
Advantages of SNIC

Fully unsupervised techniques
No labeled data required: a good solution to
resource scarcity
Problem of evaluation circumvented by
semi-supervised techniques
Ease of computation
Simple and scalable
Distributed and parallel computable
Holistic treatment
Language evolution & psycho-linguistic theories

85
Comparison to Standard Techniques

Rule-based vs. Statistical NLP
Graphical Models
Generative models in machine learning
HMM, CRF, Bayesian belief networks

[Figure: graphical model over the POS tags JJ, NN, RB and VF]
86
Graphical Models vs. SNIC

GRAPHICAL MODEL

Principled: based on Bayesian theory
Structure is assumed and parameters are learnt
Focus: decoding & parameter estimation
Data-driven or computationally intensive
The generative process is easy to visualize, but
no visualization of the data

COMPLEX NETWORK

Heuristic, but with underlying principles of linear
algebra
Structure is discovered and studied
Focus: topology and evolutionary dynamics
Unsupervised and computationally easy
Easy visualization of the data

87
Language Modeling

A network of words as a model of language vs.
n-gram models
Hierarchical, hyper-graph based models
Smoothing through holistic analysis of the
network topology
Jedynak, B. and Karakos, D. 2007. Unigram
Language Models using Diffusion Smoothing over
Graphs. Proc. of TextGraphs - 2

88
Open Problems

Universals and variables of linguistic networks
Superimposition of networks: phonetic, syntactic,
semantic
Which clustering algorithm for which topology?
Metrics for network comparison: important for
language modeling
Unsupervised dependency parsing using networks
Mining translation equivalents

89
Resources

Conferences
TextGraphs, Sunbelt, EvoLang, ECCS
Journals
PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS,
Complexity, Social Networks
Tools
Pajek, CUNG, http://www.insna.org/INSNA/soft_inf.html
Online Resources
Bibliographies, courses on CNT

90
Contact

Monojit Choudhury
monojitc_at_microsoft.com
http://www.cel.iitkgp.ernet.in/monojit/
Animesh Mukherjee
animeshm_at_cse.iitkgp.ernet.in
http://www.cel.iitkgp.ernet.in/animesh/
Niloy Ganguly
niloy_at_cse.iitkgp.ernet.in
http://www.facweb.iitkgp.ernet.in/niloy/

91
Thank you!!
Book Volume on Dynamics on and of Complex
Networks: to be published by May 2008 by
Birkhauser, Springer
http://www.cel.iitkgp.ernet.in/eccs07/
