Title: Mining Networks through Visual Analytics


1
Mining Networks through Visual Analytics
  • Incremental Hypothesis Building and Validation

David Auber, Romain Bourqui, Guy Melançon
CNRS LaBRI UMR 5800 & INRIA Futurs GRAVITÉ
Bordeaux, France
2
peacokmaps.com
3
InfoVis CyberInfraStructure Pajek
  • A picture is worth a thousand words
  • Chinese proverb (?)

4
Tulip BubbleTree
5
Graph Viz Framework Tulip
  • It's all visual
  • R. Feynman (Nobel prize in Physics)

6
Internet traffic
7
Voronoï Treemaps
  • The purpose of computing is insight, not numbers
  • R. Hamming (1973)

8
Cushion Treemaps
9
  • Visualization uses computer graphics to help
    provide insight on complicated problems, models
    or systems
  • Scientific visualization is exploring data and
    information graphically, gaining understanding
    and insights into the data
  • R.A. Earnshaw (a pioneer in computer graphics,
    1973)

Munzner's Hyperbolic Browser
10
Tulip Sugiyama Layout
11
Visualize?
  • Inselberg, creator of parallel coordinates:
  • Insight through images
  • Goal: Visual Model to Help our Intuition
  • Involves Geometry, Cognition, Art?

12
Visualize?
13
Visual graph mining related to security issues
  • Recognize structural properties
  • Identify key actors
  • Identify their neighborhood
  • Community structure
  • Connectivity between communities

Chess players recognize patterns
14
Example from NCTC data
  • Extracted about 8000 incidents from WITS
  • Identified terrorist groups when possible
    (directly or through AFP)
  • Identified countries where incidents took place
  • Added territorial information (continents, world
    regions) to help organize the overall map

15
Example from NCTC data
  • About 8000 incidents
  • 9419 nodes
  • 18486 edges
  • Layout is time consuming
  • Does not provide any clue about the structure
  • Filter out incidents with no identified group
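
A minimal sketch of the filtering step above, assuming each incident is a record with hypothetical `id`, `group` and `country` fields (the actual WITS schema and the way the authors built the graph are not given in the slides). Incidents without an identified group are skipped before incident-group and incident-country edges are produced.

```python
# Hypothetical incident records; field names and values are assumptions for illustration.
incidents = [
    {"id": "i1", "group": "G1", "country": "X"},
    {"id": "i2", "group": None, "country": "X"},  # no identified group
    {"id": "i3", "group": "G2", "country": "Y"},
]


def build_edges(records):
    """Return incident-group and incident-country edges.

    Incidents with no identified group are filtered out entirely.
    """
    edges = []
    for rec in records:
        if rec["group"] is None:
            continue  # drop the incident and all of its edges
        edges.append((rec["id"], rec["group"]))
        edges.append((rec["id"], rec["country"]))
    return edges


edges = build_edges(incidents)
```

Filtering before the layout step keeps the number of nodes and edges down, which matters when the layout itself is the time-consuming part.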

16
Example from NCTC data
  • Interactivity
  • Play with network
  • Apply various metrics
  • Attribute-based node filtering
  • Tulip Graph Viz Framework
  • Open source
  • Plug-in architecture
  • www.tulip-software.org

17
Massive data
  • Information big bang: "How Much Information?"
    project, UC Berkeley
  • In 2001, about 1 exabyte (1 million terabytes) of
    data was generated annually worldwide, including
    99.997% available only in digital form
  • In 2003, each individual produced about 800
    megabytes per year

18
Massive data
  • 100 million FedEx transactions / day
  • 150 million VISA transactions / day
  • 300 million long-distance calls / day over AT&T's
    network
  • 35 billion e-mails / day worldwide
  • 600 billion IP packets / day over DE-CIX backbone

Keim, VIEW Workshop 2006
19
Visualization and Moore's law
Daniel Keim - Keynote Address, VIEW 2006
20
Visualization and Moore's law
  • Issues that won't be solved by hardware only
  • Design interaction together with visualization
  • Understand how and why visualization pays
  • Collaborate with other fields
  • Integrate visualization together with other
    technology

NIH-NSF Visualization Research Challenges Report,
2006
21
Added value of visual and interactive mining
  • KDD Panel "The Perfect Data Mining Tool"
    (Ankerst 2002)
  • The human eye is an excellent tool for spotting
    natural patterns
  • Getting rid of the human in the loop? Wrong
    decision!
  • Increase human participation through
    visualization in the data exploration and
    knowledge discovery processes

22
Sense-making loop
J. Thomas, Visual Analytics Initiative
23
Visualization mantras
  • Visual Information Seeking Mantra
  • Overview, Zoom-in / Filter, and Details on Demand
    (Shneiderman, 1996)
  • Visual Analytics Mantra
  • Analyse first; show the important; zoom, filter
    and analyse; details on demand (Keim 2006)

24
Visualization pipeline
  • A designer's view on the visualization process

25
Visualize?
Protein interaction network (yeast), Barabási 2000
26
Organize data prior to visualization
  • Layer or hierarchize data based on:
  • node/edge metrics (eigenvalues, centralities, ...)
  • topological feature detection
  • Use relevant drawing methods
  • Combine with interaction

27
Case study: ITA 2000 passenger air traffic
  • Cities connect through direct flights
  • Edge weights: number of passengers
  • Questions:
  • Can we read the carriers' motivations in the
    organization of the network?
  • Territorial logic?
  • Political? Economic?

29
TopoLayout: (Topological) Feature-based Hierarchization
  • Search the graph for components of growing
    complexity:
  • Subtrees
  • Biconnected components ("blocks")
  • Grid-like
  • Clusters

32
TopoLayout: (Topological) Feature-based Hierarchization
  • Search the graph for components of growing
    complexity:
  • Subtrees
  • Biconnected components
  • Grid-like
  • Clusters
  • Need to identify articulation points (pivots)
  • The graph builds into a tree of biconnected
    components
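
As a hedged illustration of the articulation-point step, here is a minimal sketch of the classic depth-first low-link technique. It is not taken from TopoLayout; the function name and the example graph are assumptions, and the graph is assumed simple and undirected.

```python
def articulation_points(adj):
    """Return the set of articulation points (pivots) of an undirected graph.

    adj maps each node to a collection of its neighbours.
    """
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a pivot if nothing in v's subtree
                # reaches strictly above u through a back edge.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a pivot only if it has two or more DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points


# Hypothetical example: two triangles sharing the single node "c".
two_triangles = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
pivots = articulation_points(two_triangles)
```

Removing "c" separates the two triangles, so each triangle is a block and the blocks hang off the pivot as a tree. The recursive form is only suitable for small graphs; an iterative DFS would be needed for large ones.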

33
TopoLayout: (Topological) Feature-based Hierarchization
  • Search the graph for components of growing
    complexity:
  • Subtrees
  • Biconnected components ("blocks")
  • Grid-like (eigenvalues)
  • Clusters

35
TopoLayout
  • Components naturally organize as a hierarchy
    through the search process

36
TopoLayout interaction: Grouse
  • Explore the graph by unfolding/folding the
    hierarchy
  • The user's navigation triggers layout of
    components
  • Higher level graphs (quotient graphs) are built
    from metanodes
  • Improved readability / fewer visual elements
  • Faster layout, based on topology of quotient
    graph
  • Grouse
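
To make the notion of a quotient graph concrete, here is a minimal sketch, assuming the clustering is given as a node-to-cluster mapping. The function name and data layout are hypothetical and do not reflect the Grouse or Tulip APIs.

```python
def quotient_graph(edges, cluster_of):
    """Collapse every cluster into a single metanode.

    edges: iterable of (u, v) pairs of the original graph.
    cluster_of: dict mapping each node to its cluster identifier.
    Returns a dict mapping each pair of distinct metanodes to the number
    of original edges running between them.
    """
    meta_edges = {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # intra-cluster edges stay hidden inside the metanode
        key = tuple(sorted((cu, cv)))
        meta_edges[key] = meta_edges.get(key, 0) + 1
    return meta_edges


# Hypothetical example: a 4-cycle split into two clusters "A" and "B".
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
cluster_of = {"a": "A", "b": "A", "c": "B", "d": "B"}
meta = quotient_graph(edges, cluster_of)
```

The four original edges reduce to a single weighted metaedge, which is why laying out the quotient graph is cheaper and easier to read than laying out the full graph.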

37
TopoLayout interaction: Grouse
  • Multilevel hierarchy: recursive grouping of
    metanodes

39
TopoLayout interaction: Grouse
  • Multilevel Hierarchy for Abstraction Cut

40
Multilevel navigation of small world networks
  • Small world networks: social networks, web
    graphs, transportation networks (ITA), ...
  • Small world networks organize into several levels
    (hierarchy) (Adamic, Huberman)
  • Idea: capture the hierarchy and use it as a
    navigation paradigm

41
Small world networks
  • Centralities
  • Bottleneck passageways
  • The network organizes around those pivot nodes

42
Small world networks
  • Centralities
  • Betweenness centrality has high computational
    cost (global)
  • Betweenness centrality
  • Eigenvalue centrality
  • Prefer a local index
  • Degree
  • Edge strength

43
Small world networks
  • Edge strength: proportion of cycles (of length 3
    and 4) containing an edge

(Jaccard 1912) (Tanimoto 1958) (Auber et al. 2003)
(Radicchi et al. 2004)
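
A simplified sketch of a local edge index in the spirit of the Jaccard coefficient cited above. It is an assumption for illustration, not the authors' metric: only cycles of length 3 are counted, and the 4-cycle term mentioned on the slide is left out. The function name and example graph are hypothetical.

```python
def edge_strength_3(adj, u, v):
    """Jaccard-style share of triangles through the edge (u, v).

    adj maps each node to the set of its neighbours (simple undirected graph).
    Each common neighbour of u and v closes one 3-cycle through the edge;
    the count is normalised by the number of nodes adjacent to u or v.
    """
    nu = adj[u] - {v}  # neighbours of u other than v
    nv = adj[v] - {u}  # neighbours of v other than u
    union = nu | nv
    if not union:  # an isolated edge lies on no cycle at all
        return 0.0
    return len(nu & nv) / len(union)


# Hypothetical example: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
bridge = edge_strength_3(adj, "c", "d")
inside = edge_strength_3(adj, "a", "b")
```

The bridge scores lowest and edges inside a triangle score higher, which is the property that lets weak edges be filtered out to reveal communities. The index only looks at the two endpoint neighbourhoods, so it is local.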
44
Small world networks
  • Edge strength
  • Costs linear time if degree is bounded, otherwise
    quadratic

45
Small world networks
  • Edge strength
  • Cost is still lower than for most centralities
    (local versus global indices)
  • Incremental, local modifications of the graph
    only require local recomputation

46
Community structure of small world networks
  • Filter out weak edges
  • Capture components
  • Infer quotient graph (metanodes)
  • Recurse over each component
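
The first two steps above can be sketched as follows, assuming edge strengths have already been computed and are passed as (u, v, strength) triples. The function name, the threshold value and the example strengths are assumptions for illustration.

```python
def components_after_filter(nodes, weighted_edges, threshold):
    """Drop edges weaker than `threshold`, then return connected components.

    nodes: list of node identifiers.
    weighted_edges: iterable of (u, v, strength) triples.
    Returns a list of sets, one per connected component.
    """
    adj = {n: set() for n in nodes}
    for u, v, strength in weighted_edges:
        if strength >= threshold:  # keep only the strong edges
            adj[u].add(v)
            adj[v].add(u)

    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first traversal
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps


# Hypothetical example: two triangles joined by a weak bridge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
weighted_edges = [
    ("a", "b", 1.0), ("a", "c", 0.5), ("b", "c", 0.5),
    ("c", "d", 0.0),
    ("d", "e", 0.5), ("d", "f", 0.5), ("e", "f", 1.0),
]
communities = components_after_filter(nodes, weighted_edges, threshold=0.25)
```

Each resulting component can then become a metanode of the quotient graph, and the same procedure can be applied again inside each component.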

48
Community structure of small world networks
  • Filter out weak edges
  • Q. What threshold to choose?
  • A. Best possible one (!)
  • Use quality criteria
  • MQ (modularity quality)

49
Quality criterion: MQ
  • C = (C1, C2, ..., Cp) is a clustering of a graph G
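
The formula on this slide did not survive in the transcript. The sketch below assumes a commonly used definition of MQ from the clustering literature: the average edge density inside clusters minus the average edge density between pairs of clusters. It may differ in detail from the authors' version, and the function name and the handling of singleton clusters are assumptions.

```python
from itertools import combinations


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density.

    clusters: list of disjoint sets of nodes covering the graph.
    edges: iterable of (u, v) pairs of a simple undirected graph.
    """
    p = len(clusters)
    index = {n: i for i, c in enumerate(clusters) for n in c}
    intra = [0] * p
    inter = {}
    for u, v in edges:
        i, j = index[u], index[v]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1

    def intra_density(i):
        n = len(clusters[i])
        pairs = n * (n - 1) / 2
        # Assumption: a singleton cluster contributes a density of 0.
        return intra[i] / pairs if pairs else 0.0

    cohesion = sum(intra_density(i) for i in range(p)) / p
    if p < 2:
        return cohesion  # no pair of clusters, so no coupling term
    coupling = sum(
        inter.get((i, j), 0) / (len(clusters[i]) * len(clusters[j]))
        for i, j in combinations(range(p), 2)
    ) / (p * (p - 1) / 2)
    return cohesion - coupling


# Hypothetical example: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
good = mq([{"a", "b", "c"}, {"d", "e", "f"}], edges)
flat = mq([{"a", "b", "c", "d", "e", "f"}], edges)
```

Under this definition both averages lie between 0 and 1, so their difference lies between -1 and 1. Splitting the example graph into its two triangles scores higher than keeping everything in one cluster.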

50
MQ / Nice properties
  • MQ varies over a bounded interval [-1, 1]
  • MQ behaves like a Gaussian distribution

52
Challenge: find the best possible clustering (according to MQ)
  • Exhaustive search: intractable
  • Optimization, search algorithms (hill climbing,
    genetic algorithms, bio-mimetics, ...): costly
  • Heuristic: exploit node/edge centralities
  • Filter out weak edges
  • Tickmark possible values for edges
  • Find threshold with best MQ
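
A minimal sketch of the heuristic above, under stated assumptions: every distinct edge strength is tried as a candidate threshold, the surviving edges define connected components, and the clustering with the highest MQ wins. The MQ definition used here (mean intra-cluster density minus mean inter-cluster density) is an assumed, commonly used form; all function names and the example strengths are hypothetical.

```python
from itertools import combinations


def components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for m in adj[stack.pop()]:
                if m not in comp:
                    comp.add(m)
                    stack.append(m)
        seen |= comp
        comps.append(comp)
    return comps


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density."""
    p = len(clusters)
    index = {n: i for i, c in enumerate(clusters) for n in c}
    intra, inter = [0] * p, {}
    for u, v in edges:
        i, j = sorted((index[u], index[v]))
        if i == j:
            intra[i] += 1
        else:
            inter[(i, j)] = inter.get((i, j), 0) + 1
    sizes = [len(c) for c in clusters]
    # Singleton clusters contribute a density of 0 (an assumption).
    cohesion = sum(
        intra[i] / (s * (s - 1) / 2) if s > 1 else 0.0
        for i, s in enumerate(sizes)
    ) / p
    if p < 2:
        return cohesion
    coupling = sum(
        inter.get((i, j), 0) / (sizes[i] * sizes[j])
        for i, j in combinations(range(p), 2)
    ) / (p * (p - 1) / 2)
    return cohesion - coupling


def best_threshold(nodes, strength):
    """Try each distinct strength as a threshold; return (threshold, MQ, clusters)."""
    all_edges = list(strength)
    best = None
    for t in sorted(set(strength.values())):
        kept = [e for e, s in strength.items() if s >= t]
        clusters = components(nodes, kept)
        score = mq(clusters, all_edges)  # MQ is measured on the full graph
        if best is None or score > best[1]:
            best = (t, score, clusters)
    return best


# Hypothetical example: two triangles joined by a weak bridge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
strength = {
    ("a", "b"): 1.0, ("a", "c"): 0.5, ("b", "c"): 0.5,
    ("c", "d"): 0.0,
    ("d", "e"): 0.5, ("d", "f"): 0.5, ("e", "f"): 1.0,
}
threshold, score, clusters = best_threshold(nodes, strength)
```

Only as many clusterings are evaluated as there are candidate thresholds, which is what makes this cheaper than a general search over all clusterings.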

53
Filter / Threshold
56
Hierarchical organization of the network
  • The procedure can be iterated to produce a
    hierarchy of clusters
  • Strength of edges is recomputed at each stage
  • Threshold is locally chosen for each component

57
MQ / Extension
  • To take into account the relative size of
    clusters
  • (MQ also naturally extends to fuzzy clustering)

58
MQ / Extension
  • Extend to various classes of graphs (where F
    stands for any adequate edge density function)

59
Conclusion / Future work
  • MQ / Extension to graph hierarchies

60
MQ / Extension to graph hierarchies
  • Inspired from attribute grammars

61
Conclusion / Future work
  • Study dynamic networks
  • Streamed / time-stamped networks
  • Incremental/local computation/adjustment of:
  • edge metrics (local metrics)
  • MQ (or other possible quality criteria)

62
Conclusion
  • Interaction is the real added value of
    visualization
  • Must combine with other mining techniques
  • Insert the combination into the sense-making loop

63
Conclusion
  • We are open to, and interested in, collaborating
    with colleagues from other areas who adopt
    different perspectives
  • Learning / Mining / ...
  • Experts / Corporate organizations / End users
  • Any ideas for a different multilevel clustering
    criterion/approach?

64
Conclusion
  • We are open to, and interested in, collaborating
    with colleagues from other areas, with other
    perspectives
  • Learning / Mining / ...
  • Experts / Corporate organizations / End users
  • Go visit Tulip's website and download the
    software (I'm here until Friday if you need a
    coach!)
  • www.tulip-software.org
  • Guy.Melancon_at_labri.fr

65
Credits
  • LaBRI UMR 5800, Bordeaux -- GRAVITÉ team /
    INRIA Futurs
  • Guy Melançon
  • Maylis Delest
  • David Auber
  • Patrick Mary
  • Tulip Graph Viz Framework
  • www.tulip-software.org
  • R. Bourqui, U Bx, FR
  • D. Archambault, UBC, CA
  • T. Munzner, UBC, CA
