Universal Information Graphs via Hierarchical Graph Maps and Graph Fusion

1
Universal Information Graphs
via Hierarchical Graph Maps
and Graph Fusion
  • James Abello, DyDAn, Rutgers University
  • Portions of this work have been done jointly
    with J. Crobak (Rutgers), R. Dementiev
    (U. Karlsruhe), I. Khan (Rutgers), H. Schulz
    (U. Rostock)
  • Partially supported by LLNL (in
    consultation with Scott Kohn)

2
Universal Information Graphs
  • Encode multiple relations among a set of
    entities. Usually, each entity has associated
    with it a set of possibly labeled attributes.
  • Entities: people, organizations, places, events,
    documents, telecommunication activities,
    computer addresses, web page descriptors,
    images, videos, parts of speech, etc.
  • Entities are associated when they co-occur in
    a logical unit of interest. Associated entity
    pairs get tagged by a vector of labels and by a
    weight vector that measures the strength of the
    associations.
  • Patterns correspond to special subsets of
    entities and their interrelations.
3
  • Overall Goal
  • To efficiently uncover and classify patterns
    that can be used as triggers to take preventive
    actions against potential threats to society.
  • How?
  • 1. Design similarity measures among data
    entities via a semantic dot product. This
    amounts to quantifying, via some weighting
    mechanism, the set of attributes shared by a
    set of entities.
  • The main question is how to learn these
    weights and how to determine the levels of
    agreement that correspond to cluster pattern
    formation in the data.
  • Instead of some form of relational data base,
    use vertex- and edge-weighted labeled
    multi-graphs.
  • 2. Since scale and interactivity are essential,
    we opt for Hierarchical Graph Map methods that
    are
  • a. I/O efficient and
  • b. parameterized by the amount of RAM and
    screen real estate available to an analyst
    posing queries (semi-external algorithms).
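
One way to read the "semantic dot product" above is as a weighted sum over the attributes two entities share. The sketch below is only an illustration under that assumption: the function name `semantic_dot`, the attribute strings, the weights and the threshold are all hypothetical, and the slides do not say how the weights are learned.

```python
def semantic_dot(attrs_a, attrs_b, weights, default_weight=1.0):
    """Sum the weights of the attributes that both entities share."""
    shared = attrs_a & attrs_b
    return sum(weights.get(attr, default_weight) for attr in shared)


# Hypothetical attribute weights and entities, chosen by hand for illustration.
weights = {"org:acme": 2.0, "place:trenton": 1.5, "year:2005": 0.5}
entity_a = {"org:acme", "place:trenton", "year:2005"}
entity_b = {"org:acme", "place:trenton", "year:2006"}

score = semantic_dot(entity_a, entity_b, weights)

# Hypothetical "level of agreement": pairs scoring at least this much
# would be linked by an edge in the weighted multi-graph.
threshold = 3.0
associated = score >= threshold
```

Here the two entities share two attributes, and the sum of their weights becomes the edge weight. Choosing `threshold` is the "level of agreement" question raised in the bullet above.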

4
What have we done? A Client-Server System
The server uses our C library hgv. It creates
hierarchical maps of semi-external graphs (up to
a billion edges on a commodity machine with 8 GB
of RAM). It provides parameterized abstractions
of the data.
The client is used to interactively navigate
server graph answers. It builds on our previous
work on graph visualization systems (GraphView).
5
Major Computations: Structural Groupings
  • Connected components
  • Peripheral Trees
  • Biconnected components
  • Clusters based on topology and labeling
    information
  • Graph Abstractions ( via Sparse Cuts )
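
Two of the groupings listed above can be sketched in a few lines. This is an in-memory illustration, not the hgv implementation. It assumes that "peripheral trees" are the tree-like parts that disappear when degree-1 vertices are deleted repeatedly. The function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Breadth-first search from every vertex that has not been seen yet."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    return components


def peel_peripheral_trees(adj):
    """Vertices removed by repeatedly deleting degree-1 vertices."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    queue = deque(u for u, d in degree.items() if d == 1)
    removed = set()
    while queue:
        u = queue.popleft()
        removed.add(u)
        for w in adj[u]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] == 1:
                    queue.append(w)
    return removed


# Toy graph: a triangle 1-2-3 with a tail 3-4-5, plus an isolated edge 6-7.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (6, 7)])
components = connected_components(adj)
peripheral = peel_peripheral_trees(adj)
core = set(adj) - peripheral
```

On the toy graph the tail and the isolated edge are peeled away and only the triangle remains as `core`. This matches the later remark in the conclusions that removing subtrees helps a lot.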

6
Server Side: Hierarchy Creation
  • Input: weighted simple undirected graph
    G = (V, E).
  • E is possibly larger than will fit in RAM.
  • Apply an external-memory weighted contraction
    algorithm (MST or Matching) [1].
  • Find an antichain/slice/cut in the hierarchy that
    just fits in RAM by iteratively contracting edges.
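
The steps above can be sketched in memory as follows. This is not the external-memory algorithm of [1]. It is a plain Kruskal-style contraction with union-find, in which every accepted MST edge merges two groups under a new parent node, giving a binary hierarchy. The "fits in RAM" test is replaced by a hypothetical `budget` on the number of nodes in the slice. All function names are illustrative.

```python
def build_binary_hierarchy(num_vertices, weighted_edges):
    """Contract edges in order of increasing weight (Kruskal order).

    Leaves are 0..num_vertices-1 and each merge creates a fresh internal id.
    Returns (children, roots), where children maps internal id -> (left, right).
    """
    parent = list(range(num_vertices))  # union-find parent pointers
    top = list(range(num_vertices))     # hierarchy node representing each set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    children = {}
    next_id = num_vertices
    for _weight, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge lies inside an already contracted group
        children[next_id] = (top[ru], top[rv])
        parent[ru] = rv
        top[rv] = next_id
        next_id += 1
    roots = sorted({top[find(x)] for x in range(num_vertices)})
    return children, roots


def antichain_within_budget(children, roots, budget):
    """Refine the slice from the roots down while it still fits the budget."""
    slice_nodes = list(roots)
    while True:
        internal = [n for n in slice_nodes if n in children]
        # Splitting one node replaces it by its two children (size + 1).
        if not internal or len(slice_nodes) + 1 > budget:
            return sorted(slice_nodes)
        node = max(internal)  # most recent merge, i.e. heaviest contracted edge
        slice_nodes.remove(node)
        slice_nodes.extend(children[node])


# Toy input: edges are (weight, u, v) on vertices 0..4.
edges = [(1, 0, 1), (2, 1, 2), (5, 2, 3), (1, 3, 4), (4, 0, 2)]
children, roots = build_binary_hierarchy(5, edges)
slice_nodes = antichain_within_budget(children, roots, budget=3)
```

Every node in the returned slice stands for a contracted group of original vertices and no node in it is an ancestor of another, so the slice is an antichain of the hierarchy tree. A larger `budget` gives a finer slice.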

[Figures: binary hierarchy on semi-external graph; binary hierarchy on internal graph]
[1] J. Abello, Hierarchical Graph Maps, Computers & Graphics, 2004
7
Visualization Client Interactivity
  • Process each chunk of IH edges using the
    hierarchical map obtained via our topological
    grouping/clustering algorithms.
  • Starting from the higher-level groups/clusters,
    lay out each subgraph on demand.
  • If we hit the leaf nodes of the current chunk,
    load the next chunk into memory, process it in
    the same way, and hook it to the hierarchy.

8
Visualization Samples: WordNet, Sample of General
Queries, Terrorist Incidents, From an SQL query
to its graph (via PubMed Descriptors).
9
Global Terrorist Data Base Picks
  • Terrorist Incidents in Country_Name
    perpetrated by Group_Name
  • Sample 1: Terrorist Incidents in Cyprus
    perpetrated by Lebanese.
  • Comment: The answer set is very focused.
  • Terrorist Incidents in <Country>
  • Sample 2: Terrorist Incidents in Bolivia.
  • Comment: The answer set is split into several
    connected pieces, each with a distinctive
    characteristic.

10
Global Terrorist Data Base Picks (cont)
  • Sample 2: Terrorist Incidents in Bolivia
    (continued)
  • Comment: Notice the connection with the
    pro-Palestinian group.

11
Global Terrorist Data Base Picks (cont)
  • Terrorist Incidents with <Attack_Type> and
    <Fatalities>
  • Sample: Terrorist Incidents with Kidnapping
    and Fatalities.
  • Comment: A more varied answer set. Notice the
    several connected pieces with distinctive
    characteristics.

15
Kidnappings with fatalities (cont)
16
Graph Fusion (supported by LLNL)
  • Problem Statement: Given a collection of
    entities, each with an associated set of
    attributes, the corresponding Universal Graph
    presupposes that we can efficiently compute a
    notion of semantic similarity between every pair
    of entities (a quadratic computation). Graph
    Fusion is the reverse process, i.e., how much of
    the Universal Graph can we efficiently recover
    from its collection of projections onto selected
    subsets of attributes?
  • Typical scenario: an entity may have identities
    in email, call detail, web pages, blogs, instant
    messaging, chat rooms, etc. As such, it has its
    own graph neighborhood on each of these networks.
    The question is how to approximate, from these
    local neighborhoods, the neighborhood of this
    persona in the Universal Graph.
  • Our current approach deals with the restricted
    case in which there is a known Taxonomy for the
    set of entity attributes of the Universal Graph.
  • We have developed some graph fusion mechanisms
    for this case and have applied them to the PubMed
    data base (research supported by LLNL). These
    methods are based on Spanning Trees and
    Matchings.

17
PubMed Data Base Samples
KeyPhrases: eyes OR vitamin
19
Graph Fusion via a DAG Taxonomy
  • Fuse multirelational data stats on the vertices
    and edges of a DAG Taxonomy.
  • For a vertex v in the DAG and an entity type
    EType in the Collection:
  • Let EType(v, Collection) = number of entities of
    EType with label(v).
  • Example: In PubMed, the number of papers with a
    given MeSH term.
  • Similarly: the number of authors with a given
    MeSH term, etc.
  • For an edge (u,v) in the DAG and an entity
    type EType in the Collection:
  • Let EType((u,v), Collection) = SimilarityOf(
    EType entities with label(u), EType entities
    with label(v)).
  • Example: In the PubMed DAG,
  • PaperWeight(u,v) = JaccardCoefficient(papers
    with label(u), papers with label(v))
  • AuthorWeight(u,v) = JaccardCoefficient(authors
    with label(u), authors with label(v))
  • Example: The DAG Taxonomy on MeSH terms from
    PubMed (demo samples).
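
The vertex and edge statistics above can be illustrated on a toy collection. The labels, paper ids and taxonomy edges below are hypothetical and are not taken from MeSH. The helper names are assumptions made for this sketch. The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets; defined here as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entities_with_label(labels_by_entity, label):
    """All entities (e.g. papers) that carry the given taxonomy label."""
    return {e for e, labels in labels_by_entity.items() if label in labels}


# Hypothetical toy collection: paper id -> set of taxonomy labels.
paper_labels = {
    "p1": {"Eye", "Retina"},
    "p2": {"Eye", "Cornea"},
    "p3": {"Retina"},
    "p4": {"Eye", "Retina", "Cornea"},
}
# Hypothetical DAG taxonomy edges (u, v).
dag_edges = [("Eye", "Retina"), ("Eye", "Cornea")]

# Vertex statistic: number of papers with label(v).
vertex_stat = {
    v: len(entities_with_label(paper_labels, v))
    for v in ("Eye", "Retina", "Cornea")
}

# Edge statistic: Jaccard coefficient of the two paper sets.
paper_weight = {
    (u, v): jaccard(
        entities_with_label(paper_labels, u),
        entities_with_label(paper_labels, v),
    )
    for u, v in dag_edges
}
```

An author weight would be computed the same way from a mapping of author ids to labels.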

20
How good is a partition (i.e. a MAC)? Use
Information Loss
  • General View: Each point has a certain
    probability of being a member of one of the sets
    in the partition. If a random variable X takes
    values on Range(X), it is then natural to
    partition Range(X) in such a manner that a
    random variable P over the partition
    representatives becomes a reasonable
    quantization of X.
  • The quality of P can be measured as
    $I(X) - I(P)$ (the information loss).
  • The expectation of X is the unique optimal vector
    s that achieves the minimal expected distortion
    $E_v[\mathrm{divergence}(X, s)]$. This suggests
    defining a divergence-based measure of
    information $I(X)$ as
    $E_v[\mathrm{divergence}(X, E_v[X])]$.
  • Recent advances characterize the class of
    functions that admit k-means iterative relocation
    schemes in which a corresponding distortion-based
    global objective function is progressively
    decreased. These include squared loss, KL
    divergence, logistic loss, Mahalanobis distance
    and the Itakura-Saito distance used in signal
    processing.
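
For the squared-loss case, the information loss above can be worked through by hand. The sketch below assumes one-dimensional data, uniform probabilities, and the block mean as each block's representative. Under these assumptions I(X) is the variance of X and I(X) - I(P) is the within-block variance. The function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def information(values):
    """I(X) = E[(X - E[X])^2] under squared loss and uniform weights."""
    m = mean(values)
    return mean([(x - m) ** 2 for x in values])


def information_loss(values, partition):
    """I(X) - I(P), where P replaces each point by the mean of its block."""
    representative = {}
    for block in partition:
        block_mean = mean([values[i] for i in block])
        for i in block:
            representative[i] = block_mean
    quantized = [representative[i] for i in range(len(values))]
    return information(values) - information(quantized)


xs = [1.0, 2.0, 9.0, 10.0]
# Partitions are lists of blocks of point indices.
loss_natural = information_loss(xs, [[0, 1], [2, 3]])  # groups nearby points
loss_mixed = information_loss(xs, [[0, 2], [1, 3]])    # mixes far-apart points
```

The partition that keeps nearby points together loses far less information than the one that mixes them. This is the sense in which information loss ranks candidate partitions.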

21
What About Multiple Hierarchies(Taxonomies) ?
  • For two hierarchies the approach can be
    generalized via restricted cross products.
  • As an example: video demo of Matrix Zoom. One
    hierarchy is the geography and the other is
    application dependent.
  • Go to Video of Matrix Zoom.

22
Some Interactivity Tricks
  • Since views get cluttered, some efforts have been
    made to alleviate this.
  • Video demo of Fisheye Views and Edge Lenses.

23
Conclusions
  • Advantages
  • With parameter tuning and adequate RAM
    resources, this type of system remains
    interactive even when dealing with very large
    graphs, via graph maps derived from hierarchy
    trees. We are currently extending it to graph
    maps derived from hierarchy DAGs.
  • The approach is applicable to almost any type of
    graph, regardless of density. It turns out that
    large vertex degree, which is usually considered
    just a nuisance, now becomes a delicate issue.
  • Parameterized interactivity allows it to run on
    less powerful systems.
  • Structural grouping/clustering algorithms
    perform well (removal of subtrees helps a lot).
  • Needed Improvements/Extensions
  • Extend the system to directed multi-graphs
    (currently it handles undirected graphs).
  • Add Data Streaming Capabilities

24
Conclusions (cont)
  • Needed Improvements/Extensions
  • Improved ways to effectively summarize the
    contents of a group/cluster. Currently, we use a
    hierarchical frequency-based group labeling
    algorithm.
  • Need ways to better guide users to potentially
    interesting pieces of the data.

25
Questions?
  • Contact Info
  • James Abello, abello_at_dimacs.rutgers.edu
  • Related Publications
  • Semi-External Induced Subgraphs, J. Abello and
    R. Dementiev, in preparation.
  • HGV: A C Library to compute Hierarchical Graph
    Views, J. Abello and J. Crobak, in preparation.
  • Name That Cluster, J. Abello, H. Schulz, B.
    Gaudin, C. Tominski, in InfoVis 2006, IEEE,
    Sacramento, CA.
  • CVG: Coordinate Graph Visualizations, J.
    Abello, C. Tominski, H. Schumann, in InfoVis
    2006, IEEE, Sacramento, CA.

26
attackid  attacklabel      shortlabel
3         Assassination    ass
4         Bombing          bom
5         Facility Attack  fac
6         Hijacking        hij
7         Kidnapping       kid
8         Maiming          mai
9         Assault          alt
10        Mass Disruption  md
11        Arson            ars

27
TargetType Table
targetid  targetlabel           shortlabel
0         -                     -
1         US Diplomat           udi
2         US Police/Military    upm
3         US Other              uot
4         US Unknown            uun
5         US Government         ugo
6         US Political Parties  upp
7         US Media              ume
8         US Business           ubu
9         US Transportation     utr
10        US Utilities          uut
11        Foreign Business      fbu
12        Domestic Business     dbu
13        Transportation        tra
14        Utilities             uti
15        Media                 med
16        Diplomat              dip
17        Government            gov