
1
Universal Information Graphs
via Hierarchical Graph Maps
and Graph Fusion

James Abello, DyDAn, Rutgers University
Portions of this work have been done jointly with
J. Crobak (Rutgers), R. Dementiev (U. Karlsruhe), I. Khan (Rutgers), H. Schulz (U. Rostock)
Partially supported by LLNL (in consultation with Scott Kohn)

2
Universal Information Graphs
Encode multiple relations among a set of entities. Usually, each entity has an associated set of possibly labeled attributes.
Entities: people, organizations, places, events, documents, telecommunication activities, computer addresses, web page descriptors, images, videos, parts of speech, etc.
Entities are associated when they co-occur in a logical unit of interest. Associated entity pairs are tagged with a vector of labels and with a weight vector that measures the strength of the associations.
Patterns correspond to special subsets of entities and their interrelations.
3

Overall Goal
To efficiently uncover and classify patterns that can be used as triggers for preventive action against potential threats to society.
How?
1. Design similarity measures among data entities via a semantic dot product. This amounts to quantifying, via some weighting mechanism, the set of attributes shared by a set of entities.
The main questions are how to learn these weights and how to determine the levels of agreement that correspond to cluster pattern formation in the data.
Instead of some form of relational database, use vertex- and edge-weighted labeled multigraphs.
2. Since scale and interactivity are essential, we opt for Hierarchical Graph Map methods that are
a. I/O efficient and
b. parameterized by the amount of RAM and screen real estate available to an analyst posing queries (semi-external algorithms).
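
One way to read the semantic dot product in item 1 is as a weighted count of shared attributes. The following is a minimal sketch under that reading, restricted to pairs of entities; the function name, the attribute strings, and the hand-picked weights are all hypothetical, and the slides leave open how the weights would actually be learned.

```python
from typing import Dict, Set


def semantic_dot_product(attrs_a: Set[str], attrs_b: Set[str],
                         weights: Dict[str, float]) -> float:
    """Sum the weights of the attributes shared by two entities.

    Attributes without an entry in `weights` contribute nothing.
    """
    shared = attrs_a & attrs_b
    return sum(weights.get(attr, 0.0) for attr in shared)


# Hypothetical entities and weights, for illustration only.
entity_a = {"city:Newark", "org:Rutgers", "topic:graphs"}
entity_b = {"city:Newark", "org:LLNL", "topic:graphs"}
weights = {"city:Newark": 0.5, "org:Rutgers": 2.0, "topic:graphs": 1.5}

# The two entities share "city:Newark" and "topic:graphs".
similarity = semantic_dot_product(entity_a, entity_b, weights)
```

In a vertex- and edge-weighted multigraph, a value like `similarity` could serve as one coordinate of the weight vector on the edge between the two entities.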

4
What have we done? A Client-Server System
The server uses our C library hgv. It creates hierarchical maps of semi-external graphs (up to a billion edges on a commodity machine with 8 GB of RAM). It provides parameterized abstractions of the data.
The client is used to interactively navigate the graph answers returned by the server. It builds on our previous work on graph visualization systems (GraphView).
5
Major Computations: Structural Groupings

Connected components
Peripheral trees
Biconnected components
Clusters based on topology and labeling information
Graph abstractions (via sparse cuts)
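
Two of the groupings listed above can be sketched in a few lines of in-memory Python. This is an illustration only, not the hgv implementation: it assumes the whole graph fits in RAM, and it assumes that "peripheral trees" means the tree-like fringes obtained by repeatedly stripping degree-1 vertices. All function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Return a list of vertex sets, one per connected component (BFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components


def peel_peripheral_trees(adj):
    """Repeatedly strip degree-1 vertices; return (removed, remaining core)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    leaves = deque(v for v, d in degree.items() if d == 1)
    removed = set()
    while leaves:
        v = leaves.popleft()
        if v in removed or degree[v] != 1:
            continue
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] == 1:
                    leaves.append(w)
    return removed, set(adj) - removed


# Toy graph: a triangle 1-2-3 with a tail 3-4-5, plus a separate cycle 6-7-8.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (6, 7), (7, 8), (6, 8)]
adj = build_adjacency(edges)
components = connected_components(adj)
removed, core = peel_peripheral_trees(adj)
```

On the toy graph the tail 4-5 is peeled away and both cycles survive as the core, which is the kind of subtree removal the Conclusions slide credits with helping performance.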

6
Server Side: Hierarchy Creation

INPUT: Weighted simple undirected graph G = (V, E), with E possibly larger than will fit in RAM.
Apply an external-memory weighted contraction algorithm (MST or Matching) [1].
Find an antichain/slice/cut in the hierarchy that just fits in RAM by iteratively contracting edges.

Figure panels: binary hierarchy on the semi-external graph; binary hierarchy on the internal graph.
[1] J. Abello, Hierarchical Graph Maps, Computer Graphics, 2004
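
The contraction step can be illustrated with an in-memory, Kruskal-style sketch: edges are contracted in increasing weight order, each contraction adds one internal node with two children to a binary hierarchy, and the current super-vertices are recorded as the antichain the first time their count drops to a budget. This is not the external-memory algorithm of [1]; the node-count `budget` is an assumed stand-in for "just fits in RAM", and every name below is hypothetical.

```python
def build_hierarchy(num_vertices, edges, budget):
    """Contract edges by increasing weight and record a binary hierarchy.

    edges: iterable of (weight, u, v) with vertices numbered 0..num_vertices-1.
    Returns (children, antichain). `children` maps each internal node id to the
    pair of hierarchy nodes it merged. `antichain` is the set of super-vertices
    present when their count first dropped to `budget` (or the final set if the
    budget was never reached).
    """
    parent = list(range(num_vertices))   # union-find forest over the leaves
    top = list(range(num_vertices))      # hierarchy node for each union-find root
    children = {}
    live = set(range(num_vertices))      # current super-vertices
    antichain = set(live) if len(live) <= budget else None
    next_id = num_vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _weight, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                      # edge lies inside one super-vertex
        children[next_id] = (top[ru], top[rv])
        live -= {top[ru], top[rv]}
        live.add(next_id)
        parent[ru] = rv
        top[rv] = next_id
        next_id += 1
        if antichain is None and len(live) <= budget:
            antichain = set(live)

    if antichain is None:
        antichain = set(live)
    return children, antichain


# Toy input: 5 vertices, hypothetical weights, and room for 3 super-vertices.
edges = [(1.0, 0, 1), (2.0, 2, 3), (3.0, 1, 2), (4.0, 3, 4), (5.0, 0, 4)]
children, antichain = build_hierarchy(5, edges, budget=3)
```

Here the antichain consists of the untouched vertex 4 and the two internal nodes created by the first two contractions; a client could start from that cut and expand nodes downward on demand.
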
7
Visualization Client: Interactivity

Process each chunk of IH edges using the hierarchical map obtained via our topological grouping/clustering algorithms.
Starting from the higher-level groups/clusters, lay out each subgraph on demand.
If we hit the leaf nodes of the current chunk, load the next chunk into memory, process it in the same way, and hook it to the hierarchy.

8
Visualization Samples: WordNet, Sample of General Queries, Terrorist Incidents, From an SQL query to its graph (via PubMed Descriptors).
9
Global Terrorist Data Base Picks

Terrorist Incidents in Country_Name perpetrated by Group_Name
Sample 1: Terrorist Incidents in Cyprus perpetrated by Lebanese.
Comment: The answer set is very focused.
Terrorist Incidents in <Country>
Sample 2: Terrorist Incidents in Bolivia.
Comment: The answer set is split into several connected pieces, each with a distinctive characteristic.

10
Global Terrorist Data Base Picks (cont)

Sample 2: Terrorist Incidents in Bolivia (continued)
Comment: Notice the connection with the pro-Palestinian group.

11
Global Terrorist Data Base Picks (cont)

Terrorist Incidents with <Attack_Type> and <Fatalities>
Terrorist Incidents with Kidnapping and Fatalities.
Comment: A more varied answer set. Notice the several connected pieces with distinctive characteristics.

15
Kidnappings with fatalities (cont)
16
Graph Fusion (supported by LLNL)

Problem Statement: Given a collection of entities, each with an associated set of attributes, the corresponding Universal Graph presupposes that we can efficiently compute a notion of semantic similarity between every pair of entities (a quadratic computation). Graph Fusion is the reverse process: how much of the Universal Graph can we efficiently recover from its collection of projections onto selected subsets of attributes?
Typical scenario: An entity may have identities in email, call detail, web pages, blogs, instant messaging, chat rooms, etc. As such, it has its own graph neighborhood on each of these networks. The question is how to approximate, from these local neighborhoods, the neighborhood of this persona in the Universal Graph.
Our current approach deals with the restricted case in which there is a known taxonomy for the set of entity attributes of the Universal Graph. We have developed some graph fusion mechanisms for this case and have applied them to the PubMed database (research supported by LLNL). These methods are based on spanning trees and matchings.
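
To make the typical scenario concrete, here is a deliberately naive baseline: it approximates a persona's neighborhood by a weighted sum of its neighbor strengths across networks. This is not the spanning-tree and matching approach mentioned above, which the slides do not detail; the network names, strengths, and per-network weights are all made up for illustration.

```python
from collections import defaultdict


def fuse_neighborhoods(projections, network_weights):
    """Combine per-network neighbor strengths into one score per neighbor.

    projections: {network: {neighbor: strength}} for a single persona.
    network_weights: {network: scale}; networks not listed default to 1.0.
    """
    fused = defaultdict(float)
    for network, neighbors in projections.items():
        scale = network_weights.get(network, 1.0)
        for neighbor, strength in neighbors.items():
            fused[neighbor] += scale * strength
    return dict(fused)


# Hypothetical local neighborhoods of one persona on two networks.
projections = {
    "email": {"bob": 1.0, "carol": 0.5},
    "calls": {"bob": 0.5, "dave": 1.0},
}
network_weights = {"email": 1.0, "calls": 2.0}

fused = fuse_neighborhoods(projections, network_weights)
# Rank neighbors by fused score, breaking ties alphabetically.
ranked = sorted(fused, key=lambda name: (-fused[name], name))
```

A baseline like this ignores the taxonomy entirely, which is one reason the restricted, taxonomy-based case is a more tractable starting point.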

17
PubMed Data Base Samples: KeyPhrases eyes OR vitamin
19
Graph Fusion via a DAG Taxonomy

Fuse multirelational data stats on the vertices and edges of a DAG taxonomy.
For a vertex v in the DAG and an entity type EType in the Collection:
Let EType(v, Collection) = number of entities of type EType with label(v).
Example: In PubMed, the number of papers with a given MeSH term.
Similarly: the number of authors with a given MeSH term, etc.
For an edge (u, v) in the DAG and an entity type EType in the Collection:
Let EType((u, v), Collection) = SimilarityOf(EType entities with labels label(u) and label(v)).
Example: In the PubMed DAG,
Paperweight(u, v) = JaccardCoefficient(Papers with label(u) or label(v))
Authorweight(u, v) = JaccardCoefficient(Authors with label(u) or label(v))
Example: The DAG taxonomy on MeSH terms from PubMed (demo samples).
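
The vertex and edge statistics above can be sketched as follows. The Jaccard coefficient is the standard one (size of the intersection over size of the union); the labels, paper ids, author ids, and helper names are hypothetical toy data rather than real PubMed records.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def vertex_stat(label, index):
    """Number of entities of one type that carry `label`."""
    return len(index.get(label, set()))


def edge_weight(u_label, v_label, index):
    """Jaccard similarity between the entity sets of the two edge endpoints."""
    return jaccard(index.get(u_label, set()), index.get(v_label, set()))


# Hypothetical collection: taxonomy label -> ids of entities carrying it.
papers_with = {"Eye": {"p1", "p2", "p3"}, "Vitamin A": {"p2", "p3", "p4"}}
authors_with = {"Eye": {"a1", "a2"}, "Vitamin A": {"a2"}}

paper_count = vertex_stat("Eye", papers_with)
paper_weight = edge_weight("Eye", "Vitamin A", papers_with)     # 2 shared of 4
author_weight = edge_weight("Eye", "Vitamin A", authors_with)   # 1 shared of 2
```

Each entity type yields its own weight on the same taxonomy edge, so an edge ends up carrying a vector of weights, one per type.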

20
How good is a partition (i.e. a MAC)? Use Information Loss

General View: Each point has a certain probability of being a member of one of the sets in the partition. If a random variable X takes values in Range(X), it is natural to partition Range(X) in such a manner that a random variable P over the partition representatives becomes a reasonable quantization of X.
The quality of P can be measured as I(X) - I(P) /* Information Loss */
The expectation of X is the unique optimal vector s that achieves the minimal expected distortion E_v[divergence(X, s)]. This suggests defining a divergence-based measure of information I(X) as E_v[divergence(X, E_v[X])].
Recent advances characterize the class of functions that admit k-means iterative relocation schemes in which a corresponding distortion-based global objective function is progressively decreased. These include Square Loss, KL Divergence, Logistic Loss, Mahalanobis Distance, and the Itakura-Saito distance used in signal processing.
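
For the squared-loss case, the information loss I(X) - I(P) can be computed directly. The sketch below assumes a uniform distribution over a small, made-up set of points and uses cluster means as partition representatives; under those assumptions the loss coincides with the average within-cluster distortion, which is the quantity k-means decreases. Function names are hypothetical.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n, dim = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def squared_loss(p, q):
    """Squared Euclidean distance, the divergence used in this sketch."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def information(points):
    """I(Y) = E[divergence(Y, E[Y])] under the uniform distribution."""
    center = mean(points)
    return sum(squared_loss(p, center) for p in points) / len(points)


def quantize(points, labels):
    """Replace every point by the mean of its cluster (the variable P)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    representative = {label: mean(members) for label, members in clusters.items()}
    return [representative[label] for label in labels]


def information_loss(points, labels):
    """I(X) - I(P) for the partition encoded by `labels`."""
    return information(points) - information(quantize(points, labels))


# Made-up one-dimensional data with two obvious clusters.
points = [[0.0], [2.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]

loss = information_loss(points, labels)
within = sum(squared_loss(p, r)
             for p, r in zip(points, quantize(points, labels))) / len(points)
```

A lower loss means the partition representatives retain more of the spread of X; comparing `loss` across candidate cuts of a hierarchy is one way to rank them.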

21
What About Multiple Hierarchies (Taxonomies)?

For two hierarchies the approach can be generalized via restricted cross products.
As an example, a video demo of Matrix Zoom: one hierarchy is the geography and the other is application dependent.
Go to Video of Matrix Zoom.

22
Some Interactivity Tricks

Since views get cluttered, some efforts have been made to alleviate this.
Video demo of Fisheye Views and Edge Lenses.

23
Conclusions

Advantages
With parameter tuning and adequate RAM resources, these types of systems remain interactive even when dealing with very large graphs, via graph maps derived from hierarchy trees. We are currently extending this to graph maps derived from hierarchy DAGs.
The approach is applicable to almost any type of graph, regardless of density. It turns out that large degree, which is usually considered just a nuisance, now becomes a delicate issue.
Parameterized interactivity allows it to run on less powerful systems.
Structural grouping/clustering algorithms perform well (removal of subtrees helps a lot).
Needed Improvements/Extensions
Extend the system to directed multigraphs (currently it handles undirected graphs).
Add data streaming capabilities.

24
Conclusions (cont)

Needed Improvements/Extensions
Improved ways to effectively summarize the contents of a group/cluster. Currently, we use a hierarchical frequency-based group labeling algorithm.
Better ways to guide users to potentially interesting pieces of the data.

25
Questions?

Contact Info
James Abello, abello_at_dimacs.rutgers.edu
Related Publications
Semi-External Induced Subgraphs, J. Abello and R. Dementiev, in preparation.
HGV: A C Library to Compute Hierarchical Graph Views, J. Abello and J. Crobak, in preparation.
Name That Cluster, J. Abello, H. Schulz, B. Gaudin, C. Tominski, in InfoVis 2006, IEEE, Sacramento, CA.
CVG: Coordinate Graph Visualizations, J. Abello, C. Tominski, H. Schumann, in InfoVis 2006, IEEE, Sacramento, CA.