Title: Universal Information Graphs via Hierarchical Graph Maps and Graph Fusion
1 Universal Information Graphs via Hierarchical Graph Maps and Graph Fusion
- James Abello, DyDAn, Rutgers University
- Portions of this work were done jointly with J. Crobak (Rutgers), R. Dementiev (U. Karlsruhe), I. Khan (Rutgers), and H. Schulz (U. Rostock)
- Partially supported by LLNL (in consultation with Scott Kohn)
2 Universal Information Graphs
- Encode multiple relations among a set of entities. Usually, each entity has an associated set of possibly labeled attributes.
- Entities: people, organizations, places, events, documents, telecommunication activities, computer addresses, web page descriptors, images, videos, parts of speech, etc.
- Entities are associated when they co-occur in a logical unit of interest. Associated entity pairs are tagged with a vector of labels and with a weight vector that measures the strength of the associations.
- Patterns correspond to special subsets of entities and their interrelations.
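One way to picture this encoding is the toy structure below. It is only a sketch: the class name `InfoGraph`, its methods, and the sample entities are assumptions made for illustration, not part of the system described in these slides. Each logical unit adds to the weight of every entity pair it contains, under the label of the unit's relation.

```python
from collections import defaultdict
from itertools import combinations


class InfoGraph:
    """Toy multi-relational graph: attributed entities, labeled weighted edges."""

    def __init__(self):
        # entity -> {attribute name: value}
        self.attributes = {}
        # (a, b) with a <= b -> {relation label: accumulated weight}
        self.edge_weights = defaultdict(lambda: defaultdict(float))

    def add_entity(self, entity, **attrs):
        self.attributes.setdefault(entity, {}).update(attrs)

    def add_unit(self, label, entities, weight=1.0):
        """Record one logical unit (e.g. a document) in which entities co-occur."""
        for a, b in combinations(sorted(set(entities)), 2):
            self.edge_weights[(a, b)][label] += weight

    def edge(self, a, b):
        """Label -> weight vector for an entity pair (empty if unassociated)."""
        key = (a, b) if a <= b else (b, a)
        return dict(self.edge_weights.get(key, {}))


g = InfoGraph()
g.add_entity("alice", kind="person")
g.add_entity("acme", kind="organization")
g.add_entity("paris", kind="place")
g.add_unit("document", ["alice", "acme", "paris"])
g.add_unit("email", ["alice", "acme"])
g.add_unit("email", ["acme", "alice"])
print(g.edge("alice", "acme"))
```

Here the weight is a plain co-occurrence count per relation; any other strength measure could be accumulated the same way.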
3 Overall Goal
- To efficiently uncover and classify patterns that can be used as triggers for preventive action against potential threats to society.
- How?
- 1. Design similarity measures among data entities via a semantic dot product. This amounts to quantifying, via some weighting mechanism, the set of attributes shared by a set of entities.
- The main question is how to learn these weights and how to determine the levels of agreement that correspond to cluster pattern formation in the data.
- Instead of some form of relational database, use vertex- and edge-weighted labeled multigraphs.
- 2. Since scale and interactivity are essential, we opt for Hierarchical Graph Map methods that are
- a. I/O efficient and
- b. parameterized by the amount of RAM and screen real estate available to an analyst posing queries (semi-external algorithms).
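A minimal sketch of the semantic dot product idea, assuming attributes are plain labels and the weights are already given (the slides leave open how they are learned). The function names, sample entities, weights, and threshold are all hypothetical.

```python
def semantic_dot(attrs_a, attrs_b, weights):
    """Weighted count of the attribute labels two entities share.

    attrs_a, attrs_b: sets of attribute labels.
    weights: label -> non-negative weight; labels without a weight count as 0.
    """
    return sum(weights.get(attr, 0.0) for attr in attrs_a & attrs_b)


def similar_pairs(entities, weights, threshold):
    """All entity pairs whose similarity reaches the agreement threshold."""
    names = sorted(entities)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = semantic_dot(entities[a], entities[b], weights)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


entities = {
    "e1": {"city:paris", "topic:finance", "year:2006"},
    "e2": {"city:paris", "topic:finance"},
    "e3": {"city:rome", "year:2006"},
}
weights = {"city:paris": 2.0, "topic:finance": 1.5, "year:2006": 0.5}
pairs = similar_pairs(entities, weights, threshold=1.0)
print(pairs)
```

The all-pairs loop is quadratic in the number of entities, which is one reason the scale concerns in item 2 matter.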
4 What have we done? A Client-Server System
- The server uses our C library hgv. It creates hierarchical maps of semi-external graphs (up to a billion edges on a commodity machine with 8 GB of RAM). It provides parameterized abstractions of the data.
- The client is used to interactively navigate the graph answers returned by the server. It builds on our previous work on graph visualization systems (GraphView).
5 Major Computations: Structural Groupings
- Connected components
- Peripheral Trees
- Biconnected components
- Clusters based on topology and labeling information
- Graph Abstractions (via Sparse Cuts)
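As an illustration of the first grouping, connected components can be computed in a semi-external style: one array indexed by vertex stays in memory while the edges are read once as a stream. This is a standard union-find sketch, not the hgv implementation; the function name and the sample edge stream are assumptions.

```python
def connected_components(num_vertices, edge_stream):
    """Label each vertex with the smallest vertex id in its component.

    Only the parent array (one entry per vertex) is held in memory;
    edges are consumed one at a time from an iterator.
    """
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Attach the larger root under the smaller one.
            parent[max(ru, rv)] = min(ru, rv)
    return [find(x) for x in range(num_vertices)]


def edges():
    # Stand-in for edges read sequentially from disk.
    yield from [(0, 1), (1, 2), (3, 4), (2, 0)]


labels = connected_components(6, edges())
print(labels)
```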
6 Server Side Hierarchy Creation
- INPUT: Weighted simple undirected graph G = (V, E)
- E is possibly larger than will fit in RAM.
- Apply an external-memory weighted contraction algorithm (MST or Matching) [1].
- Find an antichain/slice/cut in the hierarchy that just fits in RAM by iteratively contracting edges.
- Figures: Binary hierarchy on semi-external graph; binary hierarchy on internal graph
[1] J. Abello, Hierarchical Graph Maps, Computers & Graphics, 2004
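The two steps can be sketched in memory as follows. This is an assumption-laden illustration, not the external-memory algorithm of [1]: edges are contracted in increasing weight order (Kruskal's MST order), each contraction creates one internal node of a binary hierarchy, and the antichain is found by greedily splitting the largest node while a node budget (a stand-in for the RAM limit) allows it. All function names and the sample graph are hypothetical.

```python
def build_hierarchy(num_vertices, weighted_edges):
    """Contract edges in MST (Kruskal) order; every merge adds an internal node.

    weighted_edges: iterable of (weight, u, v).
    Returns (children, roots). Leaves are the vertices 0..num_vertices-1;
    children maps each internal node id to its pair of child ids.
    """
    parent = list(range(num_vertices))  # union-find over original vertices
    top = list(range(num_vertices))     # component root -> its hierarchy node
    children = {}
    next_id = num_vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            children[next_id] = (top[ru], top[rv])
            parent[rv] = ru
            top[ru] = next_id
            next_id += 1
    roots = sorted({top[find(x)] for x in range(num_vertices)})
    return children, roots


def antichain(children, roots, budget):
    """Split the largest node repeatedly while the cut stays within budget."""
    size = {}
    for node in sorted(children):  # child ids are always smaller than the parent id
        a, b = children[node]
        size[node] = size.get(a, 1) + size.get(b, 1)
    cut = list(roots)
    while True:
        splittable = [n for n in cut if n in children]
        if not splittable or len(cut) + 1 > budget:
            return sorted(cut)
        biggest = max(splittable, key=lambda n: size[n])
        cut.remove(biggest)
        cut.extend(children[biggest])


edges = [(1, 0, 1), (2, 2, 3), (3, 1, 2), (4, 3, 4), (5, 0, 4)]
children, roots = build_hierarchy(5, edges)
print(antichain(children, roots, budget=3))
```

If the graph already has more components than the budget, this sketch simply returns the roots; a real system would need a coarser grouping at that point.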
7 Visualization Client Interactivity
- Process each chunk of IH edges using the hierarchical map obtained via our topological grouping/clustering algorithms.
- Starting from the higher-level groups/clusters, lay out each subgraph on demand.
- If we hit the leaf nodes of the current chunk, load the next chunk into memory, process it in the same way, and hook it to the hierarchy.
8 Visualization Samples
- Wordnet, Sample of General Queries, Terrorist Incidents, From an SQL query to its graph (via PubMed Descriptors).
9 Global Terrorist Data Base Picks
- Terrorist Incidents in Country_Name perpetrated by Group_Name
- Sample 1: Terrorist Incidents in Cyprus perpetrated by Lebanese.
- Comment: The answer set is very focused.
- Terrorist Incidents in <Country>
- Sample 2: Terrorist Incidents in Bolivia.
- Comment: The answer set is split into several connected pieces, each with a distinctive characteristic.
10 Global Terrorist Data Base Picks (cont)
- Sample 2: Terrorist Incidents in Bolivia (continued)
- Comment: Notice the connection with the pro-Palestinian group.
11 Global Terrorist Data Base Picks (cont)
- Terrorist Incidents with <Attack_Type> and <Fatalities>
- Terrorist Incidents with Kidnapping and Fatalities.
- Comment: A more varied answer set. Notice the several connected pieces with distinctive characteristics.
15 Kidnappings with fatalities (cont)
16 Graph Fusion (supported by LLNL)
- Problem Statement: Given a collection of entities, each with an associated set of attributes, the corresponding Universal Graph presupposes that we can efficiently compute a notion of semantic similarity between every pair of entities (a quadratic computation). Graph Fusion is the reverse process: how much of the Universal Graph can we efficiently recover from its collection of projections onto selected subsets of attributes?
- Typical scenario: an entity may have identities in email, call detail, web pages, blogs, instant messaging, chat rooms, etc. As such, it has its own graph neighborhood in each of these networks. The question is how to approximate, from these local neighborhoods, the neighborhood of this persona in the Universal Graph.
- Our current approach deals with the restricted case in which there is a known taxonomy for the set of entity attributes of the Universal Graph.
- We have developed some graph fusion mechanisms for this case and have applied them to the PubMed database (research supported by LLNL). These methods are based on Spanning Trees and Matchings.
17 PubMed Data Base Samples
- KeyPhrases: eyes OR vitamin
19 Graph Fusion via a DAG Taxonomy
- Fuse multirelational data statistics on the vertices and edges of a DAG Taxonomy.
- For a vertex v in the DAG and an entity type EType in the Collection:
- Let EType(v, Collection) = number of entities of EType with label(v)
- Example: In PubMed, the number of papers with a given MeSH term.
- Similarly, the number of authors with a given MeSH term, etc.
- For an edge (u,v) in the DAG and an entity type EType in the Collection:
- Let EType((u,v), Collection) = SimilarityOf(EType entities with labels label(u) and label(v))
- Example: In the PubMed DAG,
- PaperWeight(u,v) = JaccardCoefficient(Papers with label(u) or label(v))
- AuthorWeight(u,v) = JaccardCoefficient(Authors with label(u) or label(v))
- Example: The DAG Taxonomy on MeSH terms from PubMed (demo samples).
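A small sketch of these vertex and edge statistics, assuming the Jaccard coefficient of two labels is the number of entities carrying both labels divided by the number carrying at least one. The function names, the taxonomy labels, and the paper ids are illustrative placeholders, not actual PubMed data.

```python
def jaccard(a, b):
    """|a & b| / |a | b| for two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_on_dag(dag_edges, labeled_entities):
    """Compute per-vertex counts and per-edge Jaccard weights.

    dag_edges: iterable of (u, v) taxonomy edges, given as label pairs.
    labeled_entities: entity id -> set of taxonomy labels it carries.
    """
    by_label = {}
    for entity, labels in labeled_entities.items():
        for label in labels:
            by_label.setdefault(label, set()).add(entity)
    vertex_stat = {label: len(ents) for label, ents in by_label.items()}
    edge_stat = {
        (u, v): jaccard(by_label.get(u, set()), by_label.get(v, set()))
        for u, v in dag_edges
    }
    return vertex_stat, edge_stat


# Illustrative labels and paper ids only.
dag_edges = [("Eye Diseases", "Cataract"), ("Eye Diseases", "Glaucoma")]
papers = {
    "p1": {"Eye Diseases", "Cataract"},
    "p2": {"Eye Diseases", "Glaucoma"},
    "p3": {"Eye Diseases", "Cataract"},
    "p4": {"Cataract"},
}
paper_count, paper_weight = fuse_on_dag(dag_edges, papers)
print(paper_count, paper_weight)
```

Running the same function on an author-to-labels mapping would give the corresponding author statistics on the same DAG.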
20 How good is a partition (i.e. a MAC)? Use Information Loss
- General View: Each point has a certain probability of being a member of one of the sets in the partition. If a random variable X takes values in Range(X), it is natural to partition Range(X) so that a random variable P over the partition representatives becomes a reasonable quantization of X.
- The quality of P can be measured as $I(X) - I(P)$ (the information loss).
- The expectation of X is the unique optimal vector s that achieves the minimal expected distortion $E_\nu[\mathrm{divergence}(X, s)]$. This suggests defining a divergence-based measure of information $I(X) = E_\nu[\mathrm{divergence}(X, E_\nu[X])]$.
- Recent advances characterize the class of distortion functions that admit k-means-style iterative relocation schemes in which a corresponding distortion-based global objective function is progressively decreased. These include squared loss, KL divergence, logistic loss, Mahalanobis distance, and the Itakura-Saito distance used in signal processing.
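For the squared-loss case the information loss can be worked out directly. The sketch below assumes squared Euclidean distance as the divergence, strictly positive point probabilities, and the probability-weighted mean of each block as its representative; the function names and the four sample points are made up for illustration.

```python
def mean(points, probs):
    """Probability-weighted mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p * x[d] for x, p in zip(points, probs)) for d in range(dim)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def information(points, probs):
    """I(X) = E[divergence(X, E[X])] with squared Euclidean loss."""
    mu = mean(points, probs)
    return sum(p * sq_dist(x, mu) for x, p in zip(points, probs))


def information_loss(points, probs, assignment):
    """I(X) - I(P), where P takes each block's mean with the block's total mass."""
    blocks = {}
    for x, p, b in zip(points, probs, assignment):
        blocks.setdefault(b, []).append((x, p))
    reps, rep_probs = [], []
    for members in blocks.values():
        mass = sum(p for _, p in members)
        xs = [x for x, _ in members]
        cond = [p / mass for _, p in members]  # probabilities within the block
        reps.append(mean(xs, cond))
        rep_probs.append(mass)
    return information(points, probs) - information(reps, rep_probs)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
probs = [0.25, 0.25, 0.25, 0.25]
good = information_loss(points, probs, [0, 0, 1, 1])
bad = information_loss(points, probs, [0, 1, 0, 1])
print(good, bad)
```

Grouping the two nearby pairs loses far less information than mixing them. With squared loss the difference I(X) - I(P) equals the expected within-block distortion, which is the quantity k-means decreases at every relocation step.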
21 What About Multiple Hierarchies (Taxonomies)?
- For two hierarchies the approach can be generalized via restricted cross products.
- As an example, see the video demo of Matrix Zoom. One hierarchy is the geography and the other is application dependent.
- Go to Video of Matrix Zoom.
22 Some Interactivity Tricks
- Since views get cluttered, some efforts have been made to alleviate this.
- Video demo of Fisheye Views and Edge Lenses.
23 Conclusions
- Advantages
- With parameter tuning and adequate RAM resources, these types of systems remain interactive even when dealing with very large graphs, via graph maps derived from hierarchy trees. We are currently extending the approach to graph maps derived from hierarchy DAGs.
- The approach is applicable to almost any type of graph, regardless of density. It turns out that large degree, which is usually considered just a nuisance, now becomes a delicate issue.
- Parameterized interactivity allows it to run on less powerful systems.
- Structural grouping/clustering algorithms perform well (removal of subtrees helps a lot).
- Needed Improvements/Extensions
- Extend the system to directed multigraphs (currently it handles undirected graphs).
- Add data streaming capabilities.
24 Conclusions (cont)
- Needed Improvements/Extensions (cont)
- Improved ways to effectively summarize the contents of a group/cluster. Currently, we use a hierarchical frequency-based group labeling algorithm.
- Better ways to guide users to potentially interesting pieces of the data.
25 Questions?
- Contact Info
- James Abello, abello_at_dimacs.rutgers.edu
- Related Publications
- Semi-External Induced Subgraphs, J. Abello and R. Dementiev, in preparation.
- HGV: A C Library to Compute Hierarchical Graph Views, J. Abello and J. Crobak, in preparation.
- Name That Cluster, J. Abello, H. Schulz, B. Gaudin, C. Tominski, in Infovis 2006, IEEE, Sacramento, CA.
- CVG: Coordinate Graph Visualizations, J. Abello, C. Tominski, H. Schumann, in Infovis 2006, IEEE, Sacramento, CA.
26
attackid  attacklabel       shortlabel
3         Assassination     ass
4         Bombing           bom
5         Facility Attack   fac
6         Hijacking         hij
7         Kidnapping        kid
8         Maiming           mai
9         Assault           alt
10        Mass Disruption   md
11        Arson             ars
27 TargetType Table
targetid  targetlabel           shortlabel
0         -                     -
1         US Diplomat           udi
2         US Police/Military    upm
3         US Other              uot
4         US Unknown            uun
5         US Government         ugo
6         US Political Parties  upp
7         US Media              ume
8         US Business           ubu
9         US Transportation     utr
10        US Utilities          uut
11        Foreign Business      fbu
12        Domestic Business     dbu
13        Transportation        tra
14        Utilities             uti
15        Media                 med
16        Diplomat              dip
17        Government            gov