Title: Examining Higher Order Transformations for Scale-free Small World Graphs
1. Examining Higher Order Transformations for Scale-free Small World Graphs
- Uwe Quasthoff, Chris Biemann
- Universität Leipzig, Institut für Informatik
- quasthoff,biemann_at_informatik.uni-leipzig.de
2. Background: Word co-occurrences
- Given a lot of sentences in one language (typically, millions of sentences), we ask:
  - Which words appear significantly often together within a sentence? (Examples: Dresden and Semper, dog and cat)
  - Which words appear significantly often as next neighbors? (Examples: Semper and Opera, hot and dog)
  - Significance is measured using the log-likelihood ratio.
- Size of the German corpus:
  - Sentences: 50 M
  - Words: 11 M (nodes for both graphs)
  - Sentence co-occurrences: 180 M (edges for the sentence co-occurrence graph)
  - NN co-occurrences: 34 M (edges for the NN co-occurrence graph)
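The log-likelihood significance mentioned above can be computed from a 2x2 contingency table of sentence counts. The following is a minimal sketch of Dunning's log-likelihood ratio; the function name and count layout are illustrative, not taken from the authors' implementation:

```python
import math

def xlogx(x):
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table:
    k11 = sentences containing both words,
    k12 = sentences containing word A but not B,
    k21 = sentences containing word B but not A,
    k22 = sentences containing neither word."""
    n = k11 + k12 + k21 + k22
    return 2.0 * (xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
                  + xlogx(n)
                  - xlogx(k11 + k12) - xlogx(k21 + k22)   # row sums
                  - xlogx(k11 + k21) - xlogx(k12 + k22))  # column sums
```

If the two words are statistically independent the score is near zero; strong association yields large positive values, which serve as edge weights in the co-occurrence graph.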
3. Sample word co-occurrences: space
- Significant co-occurrences of space within a sentence:
  - disk (2629), shuttle (2618), square (1163),
station (991), NASA (920), feet (822), memory
(718), address (653), Space (602), leased (567),
launch (505), storage (479), astronauts (473),
Challenger (420), represented (412), manned
(406), lessor (390), mission (385), office (382),
Discovery (341), hard (336), Mir (335), rocket
(329), orbit (326), program (308), RAM (307),
free (300), NASA's (297), flight (293), Atlantis
(291), cosmonauts (275), files (261), Earth
(239), satellite (238), amount (230), into (226),
requires (223)
- Significant left neighbors of space:
- disk (4073), address (1157), office (953),
storage (685), desk (323), manned (306), outer
(305), free (293), shelf (257), floor (230),
memory (229), into (219), hard-disk (208), phase
(198), breathing (194), presentation (179), white
(149), Mir (145), open (140), Soviet (130),
parking (122), retail (122), tuple (120),
industrial (109), air (105), warehouse (104),
extra (103), empty (102), save (96), less (82),
NASA's (79), parameter (78), blank (77), moduli
(77), much (74), orbiting (70), crawl (69),
enough (63), Hilbert (62), more (60), swap (59)
- Significant right neighbors of space:
- shuttle (2967), station (1516), agency
(385), program (312), heating (258), for (229),
exploration (217), between (183), flight (181),
at (179), shuttles (167), probe (145), bar (139),
is (125), available (108), telescope (105), on
(98), center (96), missions (79), requirements
(76), charge (74), agency's (73), probes (63),
shuttle's (57), heaters (48), mission (47),
required (45), limitations (44), science (39),
than (37), walk (37), capsule (35), travel (33),
constraints (32), allocation (31), heater (30),
endurance (27), character (26)
4. The co-occurrence graph for space
- Local sentence co-occurrence graph: distance 1 from space
- The different meanings are clearly visible.
5. http://corpora.informatik.uni-leipzig.de/
6. http://corpora.informatik.uni-leipzig.de/
7. Pruning of our graphs
- By construction, all edges are weighted by significance.
- We apply the following additional pruning:
  - Remove all edges.
  - For each node, re-insert the N (here, N = 3 or N = 10) strongest edges (if not yet inserted).
- Note that the degree of a node is not bounded by this pruning.
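This pruning step can be sketched as follows; the representation of the graph as a dict of significance-weighted edges is an assumption for illustration, not the authors' implementation:

```python
from collections import defaultdict

def prune(edges, n=3):
    """Remove all edges, then re-insert each node's n strongest edges.
    edges: dict mapping (u, v) -> significance weight, with u != v.
    An edge survives if it is among the top-n edges of u OR of v,
    so the resulting node degrees are not bounded by n."""
    incident = defaultdict(list)
    for (u, v), w in edges.items():
        incident[u].append(((u, v), w))
        incident[v].append(((u, v), w))
    kept = set()
    for node, inc in incident.items():
        inc.sort(key=lambda e: -e[1])       # strongest first
        for edge, _ in inc[:n]:
            kept.add(edge)                  # "if not yet inserted"
    return {e: edges[e] for e in kept}
```

Because every endpoint re-inserts its own top-n list, a hub node can keep far more than n incident edges, exactly as the slide notes.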
8. Statistical properties
9. Comparison with random graphs
- The following random graph models are usually associated with natural language:
- Barabási-Albert (1999)
  - In the BA model, a graph is constructed by preferential attachment: a new vertex connects to existing vertices with probability proportional to their degree.
- Dorogovtsev-Mendes (2001)
  - A new vertex is connected to a preferentially chosen existing vertex, but additionally edges among the existing vertices are introduced with probability proportional to the product of their degrees.
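A minimal sketch of the BA preferential-attachment process (the DM model would additionally add edges among existing vertices with probability proportional to the product of their degrees); parameters and the seed-graph choice are illustrative:

```python
import random

def barabasi_albert(n, m=2, seed=0):
    """Grow a graph of n vertices by preferential attachment:
    each new vertex connects to m existing vertices, chosen with
    probability proportional to their current degree."""
    rng = random.Random(seed)
    # start from a small complete core of m + 1 vertices
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # every edge contributes both endpoints; sampling uniformly from
    # this list is equivalent to degree-proportional sampling
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # m distinct attachment targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges
```

The repeated-endpoints trick keeps the sampling degree-proportional without maintaining explicit degree counts; rejecting duplicate targets introduces a slight bias that is irrelevant for this sketch.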
10. Degree distribution for BA and DM (left) and sentence-based word co-occurrences (right)
11. Searching for similar words
- Word co-occurrences represent all kinds of semantic relations.
- Sometimes we find similar words among the strongest sentence co-occurrences, but mostly we do not.
- Good example: significant co-occurrences of zinc
- copper (369), lead (323), cadmium (212),
nickel (145), iron (94), metals (89), silver
(79), manganese (76), tonnes (71), oxide (67),
Dollars (55), chromium (54), Cominco (53), mine
(52), chloride (48), ...
- Typical example: significant co-occurrences of refrigerator
  - kitchen (41), freezer (38), magnets (24),
cold (24), magnet (23), helium (23), stove (19),
heat (19), compressors (18), door (18), oven
(17), her (15), cooling (14), Store (14),
microwave (14), stored (14), food (14), water
(13), ice (13), ...
12. Co-occurrences of higher order
- Idea: if the process of calculating significant co-occurrences gives us some similar words, iterating the process will give us more similar words.
- But: how do we iterate?
- Answer: the co-occurrence sets produced in the first step replace the sentences.
- Sample "sentences" are:
  - copper lead cadmium nickel iron metals silver manganese tonnes oxide Dollars chromium Cominco mine chloride ...
  - kitchen freezer magnets cold magnet helium stove heat compressors door oven her cooling Store microwave stored food water ice ...
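The iteration can be sketched as a two-function pipeline: count co-occurrences over units, then turn each word's co-occurrence set into a pseudo-sentence and count again. For brevity this sketch uses raw pair counts with a frequency threshold instead of the log-likelihood significance used on the real data:

```python
from collections import Counter

def cooccurrences(units, min_count=2):
    """First-order step: count how often two words share a unit
    (a sentence, or in later iterations a co-occurrence set)."""
    counts = Counter()
    for unit in units:
        words = sorted(set(unit))
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

def neighbor_sets(pairs):
    """Turn co-occurrence pairs back into pseudo-sentences:
    each word together with all of its co-occurrents."""
    nbrs = {}
    for a, b in pairs:
        nbrs.setdefault(a, {a}).add(b)
        nbrs.setdefault(b, {b}).add(a)
    return list(nbrs.values())
```

Second-order co-occurrences are then simply `cooccurrences(neighbor_sets(cooccurrences(sentences)))`, and the construction iterates in the obvious way.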
13. The usual co-occurrences for Auto, using sentences
fahren (1396), Wagen (979), prallte (914), Fahrer (809), seinem (723), fuhr (709), fährt (638), Polizei (609), erfaßt (587), gefahren (485)
14. Co-occurrences of second order for Auto, using sentence co-occurrences
Wagen (114), Fahrzeug (54), Fahrer (41), Fahrbahn (35), prallte (35), Polizei (28), verletzt (27), Schleudern (24), fuhr (24), Richtung (21), ...
15. Co-occurrences of second order for Auto, using NN co-occurrences
Wagen (35), Lastwagen (14), Fahrzeug (13), Autos (9), Personenwagen (9), Bus (8), Zug (7), Haus (5), Lkw (5), Pkw (5)
16. First iteration step
- The two black nodes A and B get connected in this step if there are many nodes C that are connected to both A and B.
- The more such Cs, the higher the weight of the new edge.
- Figure legend: existing connection; new connection.
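One transformation step can be sketched directly on an adjacency-set representation: connect A and B with a weight equal to their number of common neighbors, keeping only edges above a threshold. The representation and threshold parameter are assumptions for illustration:

```python
from itertools import combinations

def higher_order_step(adj, min_shared=2):
    """Connect A and B with weight |N(A) & N(B)| (number of common
    neighbors C); keep edges with at least min_shared shared neighbors.
    adj: dict mapping node -> set of neighbor nodes."""
    new_adj = {v: set() for v in adj}
    weights = {}
    for a, b in combinations(sorted(adj), 2):
        shared = len(adj[a] & adj[b])
        if shared >= min_shared:
            new_adj[a].add(b)
            new_adj[b].add(a)
            weights[(a, b)] = shared
    return new_adj, weights
```

Note that this is the graph-theoretic view of the same operation as replacing sentences by co-occurrence sets: two words become neighbors when their neighbor sets overlap strongly.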
17. Second iteration step
- The two black nodes A and B get connected in this step if there are many (dark gray) nodes D that are connected to both A and B.
- The connections between the D nodes and the nodes A and B were themselves constructed because of (light gray) nodes E and F, respectively.
- Figure legend: former connection; existing connection; new connection.
18. Collapsing bridging nodes
- An upper bound for the path length in iteration n is 2^n.
- However, some of the bridging nodes collapse, giving rise to self-sustaining clusters of arbitrary path length, which are invariant under iteration.
- Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster.
19. Examples of Iterated Co-occurrences
20. Where are the fixed points?
- As expected, the dynamics often lead to fixed points, i.e. sets of nodes invariant under iteration. Usually there are several strongly attracting fixed points. One can also observe attracting cycles.
- In the case of words, the words in the fixed point are often not semantically related to the starting point; hence the first steps seem more interesting.
- Stronger thresholds lead to fewer fixed points and cycles; the empty set may then be the only attractor.
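The fixed-point and cycle behavior can be probed with a small driver that applies the transformation until the edge structure repeats. Here `step` stands for whichever higher-order transformation is used; the driver itself is a generic sketch, not the authors' experimental setup:

```python
def iterate_to_fixed_point(adj, step, max_iter=20):
    """Apply the higher-order transformation `step` repeatedly.
    Returns (final_graph, outcome) where outcome is 'fixed point'
    if the graph stops changing, 'cycle' if a previously seen graph
    recurs, or 'cap reached' after max_iter iterations.
    adj: dict mapping node -> set of neighbor nodes."""
    seen = []
    current = adj
    for _ in range(max_iter):
        frozen = {v: frozenset(ns) for v, ns in current.items()}
        if seen and frozen == seen[-1]:
            return current, "fixed point"     # invariant under iteration
        if frozen in seen:
            return current, "cycle"           # attracting cycle
        seen.append(frozen)
        current = step(current)
    return current, "cap reached"
```

With strong thresholds in `step`, the empty graph is an obvious fixed point, matching the observation that the empty set may be the only attractor.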
21. Generalization to arbitrary networks: the co-occurrence mapping
- Given a large set of nodes (here: words) and some connected subgraphs, shown in bold. The upper graph represents a sentence, the lower one the collocation set corresponding to the central element.
- The subgraphs are completed, producing collocation edges. Both nodes and edges are weighted.
- The collocation mapping removes most of the edges because they are considered noise.
- The result is another graph, which can be used to iterate the process.
22. Orders 2 and 3 for the random graph models and for word co-occurrences
23. Conclusions
- Natural-language word co-occurrence networks differ from networks created by the BA and DM models.
- The difference is due to longer-range dependencies in language, given by syntax and semantics.
- The higher order transformation discussed here
  - shows interesting dynamics and
  - maps similar nodes onto next neighbors.
- Future work is necessary to
  - investigate hyperbolic fixed points and
  - understand the dynamics.
24. Thank you!