1
Examining Higher Order Transformations for
Scale-free Small World Graphs
  • Uwe Quasthoff, Chris Biemann
  • Universität Leipzig, Institut für Informatik
    {quasthoff,biemann}@informatik.uni-leipzig.de

2
Background: Word co-occurrences
  • Given a lot of sentences in one language
    (typically millions of sentences), we ask:
  • Which words appear together significantly often
    within a sentence? (Examples: Dresden - Semper,
    dog - cat)
  • Which words appear significantly often as next
    neighbors? (Examples: Semper - Opera, hot dog)
  • Significance is measured using the log-likelihood
    ratio.
  • Size of the German corpus:
  • Sentences: 50 M
  • Words: 11 M → nodes for both graphs
  • Sentence co-occurrences: 180 M → edges of the
    sentence co-occurrence graph
  • NN co-occurrences: 34 M → edges of the NN
    co-occurrence graph
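The log-likelihood significance used above can be sketched as a 2×2 contingency test over sentence counts (the standard formulation; function name and table layout here are illustrative, not the authors' actual code):

```python
import math

def ll_term(k, n):
    # k * log(k / n), with the convention 0 * log(0) = 0
    return k * math.log(k / n) if k > 0 else 0.0

def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table:
    k11 = sentences containing both words,
    k12 = sentences with word A only,
    k21 = sentences with word B only,
    k22 = sentences with neither word."""
    n = k11 + k12 + k21 + k22
    cells = (k11, k12, k21, k22)
    margins = (k11 + k12, k21 + k22, k11 + k21, k12 + k22)
    return 2 * (sum(ll_term(k, n) for k in cells)
                - sum(ll_term(m, n) for m in margins))
```

An independent table scores near zero, while a strong association scores high, so co-occurrence edges can be kept whenever the score exceeds a threshold.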

3
Sample word co-occurrences: space
  • Significant co-occurrences of space within a
    sentence:
  • disk (2629), shuttle (2618), square (1163),
    station (991), NASA (920), feet (822), memory
    (718), address (653), Space (602), leased (567),
    launch (505), storage (479), astronauts (473),
    Challenger (420), represented (412), manned
    (406), lessor (390), mission (385), office (382),
    Discovery (341), hard (336), Mir (335), rocket
    (329), orbit (326), program (308), RAM (307),
    free (300), NASA's (297), flight (293), Atlantis
    (291), cosmonauts (275), files (261), Earth
    (239), satellite (238), amount (230), into (226),
    requires (223)
  • Significant left neighbors of space:
  • disk (4073), address (1157), office (953),
    storage (685), desk (323), manned (306), outer
    (305), free (293), shelf (257), floor (230),
    memory (229), into (219), hard-disk (208), phase
    (198), breathing (194), presentation (179), white
    (149), Mir (145), open (140), Soviet (130),
    parking (122), retail (122), tuple (120),
    industrial (109), air (105), warehouse (104),
    extra (103), empty (102), save (96), less (82),
    NASA's (79), parameter (78), blank (77), moduli
    (77), much (74), orbiting (70), crawl (69),
    enough (63), Hilbert (62), more (60), swap (59)
  • Significant right neighbors of space:
  • shuttle (2967), station (1516), agency
    (385), program (312), heating (258), for (229),
    exploration (217), between (183), flight (181),
    at (179), shuttles (167), probe (145), bar (139),
    is (125), available (108), telescope (105), on
    (98), center (96), missions (79), requirements
    (76), charge (74), agency's (73), probes (63),
    shuttle's (57), heaters (48), mission (47),
    required (45), limitations (44), science (39),
    than (37), walk (37), capsule (35), travel (33),
    constraints (32), allocation (31), heater (30),
    endurance (27), character (26)

4
The co-occurrence graph for space
  • Local sentence co-occurrence graph: distance 1
    from space
  • Different meanings are clearly visible.

5
http://corpora.informatik.uni-leipzig.de/
6
http://corpora.informatik.uni-leipzig.de/
7
Pruning of our graphs
  • By construction, all edges are weighted by
    significance.
  • We apply the following additional pruning:
  • Remove all edges.
  • For each node, re-insert its N (here, N=3 or
    N=10) strongest edges (if not yet re-inserted).
  • Note that the degree of a node is not bounded by
    this pruning.
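A minimal sketch of this pruning rule (names are illustrative; the edge weights stand for the significance values):

```python
from collections import defaultdict

def prune(edges, n_keep=3):
    """Keep, for each node, only its n_keep strongest incident
    edges: conceptually, remove all edges and re-insert each
    node's strongest ones (if not yet re-inserted)."""
    incident = defaultdict(list)
    for (u, v), w in edges.items():
        incident[u].append((w, u, v))
        incident[v].append((w, u, v))
    kept = set()
    for node, lst in incident.items():
        for w, u, v in sorted(lst, reverse=True)[:n_keep]:
            kept.add((u, v))  # a set, so re-insertion is a no-op
    return {e: edges[e] for e in kept}
```

Note how degrees stay unbounded: even with small n_keep, a hub can be re-inserted by many of its neighbors.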

8
Statistical properties
9
Comparison with random graphs
  • The following random graph models are usually
    associated with natural language:
  • Barabási - Albert (1999)
  • In the BA model, a graph is constructed by
    preferential attachment: a new vertex connects to
    existing vertices with a probability proportional
    to their degree.
  • Dorogovtsev - Mendes (2001)
  • A new vertex is connected to a preferentially
    chosen existing vertex, but additionally edges
    between existing vertices are introduced with a
    probability proportional to the product of their
    degrees.
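The BA construction can be sketched as follows, using the standard trick of sampling from a list in which every node appears once per unit of degree (function name and parameters are illustrative):

```python
import random

def barabasi_albert(n, m=2, seed=0):
    """Grow a graph by preferential attachment: each new vertex
    attaches to m existing vertices chosen with probability
    proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    repeated = []              # each node appears once per unit of degree
    targets = list(range(m))   # the m seed vertices
    for new in range(m, n):
        for t in set(targets):          # avoid parallel edges
            edges.append((new, t))
            repeated += [new, t]        # both endpoints gain one degree
        # preferential choice of the next vertex's targets
        targets = [rng.choice(repeated) for _ in range(m)]
    return edges
```

Because targets are drawn degree-proportionally, early high-degree vertices keep attracting new edges, which yields the power-law degree distribution shown on the next slide.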

10
Degree distribution for BA and DM (left) and
sentence-based word co-occurrences (right)
11
Searching for similar words
  • Word co-occurrences represent all kinds of
    semantic relations.
  • Sometimes we find similar words among the
    strongest sentence co-occurrences, but mostly we
    do not.
  • Good example: significant co-occurrences of zinc:
  • copper (369), lead (323), cadmium (212),
    nickel (145), iron (94), metals (89), silver
    (79), manganese (76), tonnes (71), oxide (67),
    Dollars (55), chromium (54), Cominco (53), mine
    (52), chloride (48), ...
  • Typical example: significant co-occurrences of
    refrigerator:
  • kitchen (41), freezer (38), magnets (24),
    cold (24), magnet (23), helium (23), stove (19),
    heat (19), compressors (18), door (18), oven
    (17), her (15), cooling (14), Store (14),
    microwave (14), stored (14), food (14), water
    (13), ice (13), ...

12
Co-occurrences of higher order
  • Idea:
  • If the process of calculating significant
    co-occurrences gives us some similar words, then
    iterating the process will give us more similar
    words.
  • But how do we iterate?
  • Answer: the co-occurrence sets produced in the
    first step replace the sentences.
  • Sample "sentences" are:
  • copper lead cadmium nickel iron metals silver
    manganese tonnes oxide Dollars chromium Cominco
    mine chloride ...
  • kitchen freezer magnets cold magnet helium stove
    heat compressors door oven her cooling Store
    microwave stored food water ice ...

13
The usual co-occurrences for Auto: using sentences
fahren (1396), Wagen (979), prallte (914), Fahrer
(809), seinem (723), fuhr (709), fährt (638),
Polizei (609), erfaßt(587), gefahren (485)
14
Co-occurrences of second order for Auto: using
sentence co-occurrences
Wagen (114), Fahrzeug (54), Fahrer (41), Fahrbahn
(35), prallte (35), Polizei (28), verletzt (27),
Schleudern (24), fuhr (24), Richtung (21),
15
Co-occurrences of second order for Auto: using
NN co-occurrences
Wagen (35), Lastwagen (14), Fahrzeug (13), Autos
(9), Personenwagen (9), Bus (8), Zug (7), Haus
(5), Lkw (5), Pkw (5)
16
First Iteration Step
  • The two black nodes A and B get connected in this
    step if there are many nodes C that are connected
    to both A and B.
  • The more such nodes C, the higher the weight of
    the new edge.

(Figure legend: existing connection, new connection)
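The rule on this slide can be sketched directly: a candidate edge A-B gets a weight equal to the number of shared neighbors C (function name and threshold are illustrative):

```python
def second_order_edges(adj, min_common=2):
    """adj maps each node to its set of neighbors.
    Connect two nodes iff enough nodes C link to both;
    the new edge's weight is the number of such Cs."""
    nodes = sorted(adj)
    new_edges = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            common = len(adj[a] & adj[b])
            if common >= min_common:
                new_edges[(a, b)] = common
    return new_edges
```

Applied to a word co-occurrence graph, this connects words that share many co-occurrence partners, e.g. Auto and Wagen, even if they rarely co-occur directly.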
17
Second Iteration Step
  • The two black nodes A and B get connected in this
    step if there are many (dark gray) nodes D that
    are connected to both A and B.
  • The connections between the nodes D and the
    nodes A and B were constructed because of the
    (light gray) nodes E and F, respectively.

(Figure: light gray nodes Es and Fs, dark gray nodes
Ds, black nodes A and B; legend: former connection,
existing connection, new connection)
18
Collapsing bridging nodes
  • The upper bound for the path length in iteration
    n is 2^n.
  • However, some of the bridging nodes collapse,
    giving rise to self-sustaining clusters of
    arbitrary path length, which are invariant under
    iteration.

(Figure: the upper five nodes form an invariant
cluster; A and B are being absorbed by this cluster)
19
Examples of Iterated Co-occurrences
20
Where are the fixed points?
  • As expected, the dynamics often converge to fixed
    points, i.e. sets of nodes invariant under
    iteration. Usually there are several strongly
    attracting fixed points. One can also observe
    attracting cycles.
  • In the case of words, the words in the fixed
    point are often not semantically related to the
    starting point, hence the first steps seem more
    interesting.
  • Stronger thresholds lead to fewer fixed points
    and cycles; the empty set may be the only
    attractor.
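The fixed-point behavior can be illustrated with an unweighted toy version of the iteration (thresholds and edge weights omitted; names are illustrative):

```python
def step(adj):
    """One unweighted iteration: connect two nodes iff they
    currently share at least one neighbor."""
    nodes = sorted(adj)
    nxt = {v: set() for v in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if adj[a] & adj[b]:
                nxt[a].add(b)
                nxt[b].add(a)
    return nxt

def find_fixed_point(adj, max_iter=20):
    """Iterate until the graph stops changing (a fixed point);
    give up after max_iter steps, since the dynamics may cycle."""
    for _ in range(max_iter):
        nxt = step(adj)
        if nxt == adj:
            return adj
        adj = nxt
    return None
```

A triangle is invariant under this step, while a sparse path quickly collapses to the empty graph, illustrating the empty-set attractor.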

21
Generalization to arbitrary networks: the
co-occurrence mapping
Given a large set of nodes (here: words) and some
connected subgraphs, shown in bold. The upper
graph represents a sentence, the lower one a
collocation set corresponding to the central
element. The graphs are completed, representing
collocation edges. Both nodes and edges are
weighted. The collocation mapping removes most of
the edges because they are considered noise.
The result is another graph, which can be used
to iterate the process.
22
Orders 2 and 3 for the random graph models and
for word co-occurrences
23
Conclusions
  • Natural language word co-occurrence networks
    differ from networks created by the BA and DM
    models.
  • The difference is due to longer-range
    dependencies in language, given by syntax and
    semantics.
  • The higher order transformation discussed here
  • shows interesting dynamics and
  • maps similar nodes onto next neighbors.
  • Future work is necessary to
  • investigate hyperbolic fixed points and
  • understand the dynamics.

24
Thank you!