INARC I3.1 Mid-Year Report I3.1: QoI Mining of Noisy, Volatile, Uncertain, and Incomplete Heterogeneous Information Networks

Description:

First data cleaning and data fusion by information network analysis, and then mine the cleansed data in information networks

Date added: 5 March 2020
Slides: 31
Provided by: webEngrIl
Transcript and Presenter's Notes



1
INARC I3.1 Mid-Year Report I3.1: QoI Mining of
Noisy, Volatile, Uncertain, and Incomplete
Heterogeneous Information Networks
  • Jiawei Han (Task Lead)
  • Christos Faloutsos (CMU), Xifeng Yan (UCSB)
  • University of Illinois at Urbana-Champaign
  • NS-CTA INARC

2
I3.1: QoI Mining of Noisy, Volatile, Uncertain,
and Incomplete Heterogeneous Information Networks
  • Key Objectives
  • Develop robust and quality mining methods for
    noisy and inaccurate heterogeneous information
    networks
  • Design substantially enhanced data mining methods
    to uncover hidden patterns and knowledge
  • Deliverables
  • Q1: Methodology design for (i) two-stage mining
    and (ii) noise-aware mining in heterogeneous
    information networks
  • Q2: Algorithm development for the two approaches
  • Q3: Algorithm test and refinement for the two
    approaches
  • Q4: System prototype demo of the two approaches
  • Impact
  • Enable tools to uncover hidden patterns and
    knowledge from information networks despite noise
    and uncertainty in the networks

[Figure: Dirty Information Network; Cleaned/Inferred Adversarial Network]

Role          Researchers
Lead          J. Han, UIUC (INARC)
Primary       C. Faloutsos, CMU (INARC)
Primary       X. Yan, UCSB (INARC)
Collaborator  T. La Porta, Penn State (CNARC) (linked with E2.2)
Collaborator  C.-Y. Lin, IBM (SCNARC) (linked with S1.1)
Collaborator  M. Magdon-Ismail, RPI (SCNARC) (linked with S2.1)
Total         324K
  • Key Technical Innovations
  • Noise-aware mining model of incomplete and noisy
    network data by incorporating the uncertainty of
    the node attributes and network structure to
    discover hidden relationships

3
Overall Task Organization
  • Subtask 1: Two-Stage Mining. First perform data
    cleaning and data fusion by information network
    analysis, and then mine the cleansed data in
    information networks
  • Subtask 2: Noise-Aware Mining. Directly mine the
    networked data with the consideration that a
    certain portion of the data may be noisy,
    incomplete, or unreliable
  • Subtask 3: Exploring QoI Mining Applications.
    Explore QoI network mining methodology in various
    applications

4
Subtask 1: Two-Stage Mining
  • Two-Stage Mining
  • First, data cleaning and data fusion by
    information network analysis, and
  • then mine the cleansed data in information
    networks
  • Role and relationship discovery: Uncovering
    hierarchical relationships among linked objects
    (KDD'11 sub)
  • Data cleaning by trust analysis: Cluster-Based
    Trustworthiness Analysis (WWW'11 poster, KDD'11
    sub)
  • Network Denoising and Sampling by Active Learning
    (ICML'11 sub)
  • Clustering Heterogeneous Information Networks
    with Incomplete Attributes (KDD'11 sub)
  • Assessing and Ranking Structural Correlations in
    Graphs (SIGMOD'11)
  • Differentially Private Data Cubes: Optimizing
    Noise Source and Consistency (SIGMOD'11)

5
Uncovering Hierarchical Relationships among
Linked Objects
  • Parent-child, manager-subordinate,
    organizational, initiator-follower
  • DAG → underlying tree
  • Data: nodes, links, labeled trees
  • Jointly learn the importance of features and
    rules (challenge: joint learning)
  • Infer the tree structures of unlabeled data
    (challenge: model and feature design)
  • Develop a general model: summarize typical
    features with uncertain importance
  • Local feature (singleton potential)
  • Dependency rule (pairwise potential)
  • Test on two tasks
  • Uncover family tree structure
  • Uncover online discussion structure

[Figure: Examples of features and rules]

Inference performance in different measures        Practical usefulness and generality
Using state-of-the-art text mining method (2-3X)   Does not require many labels for training
Joint model > two-stage model (21% - 36% higher)   Good adaptability for generalization

(UIUC, CUNY) Chi Wang, Jiawei Han, Xiang Li, Qi
Li, Wen-Pin Lin, Adam Lee, Hao Li, Heng Ji,
"Uncovering Hierarchical Relationships among
Linked Objects: A Probabilistic Modeling
Approach", KDD'11 (sub)
6
Cluster-Based Trustworthiness Analysis
  • Iteratively perform trust analysis and clustering
    of objects to obtain highly accurate trust
    rankings of providers and confidence rankings of
    facts
  • Smoothing with the global scores
  • Several algorithms
  • Basic TruthFinder Algorithm (Basic TF)
  • Basic Cluster-based Fact Finder (BCFF)
  • Advanced Cluster-based Fact Finder (ACFF)
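
The iterative flavor of a basic fact finder can be sketched as below. This is only an illustration, not the Basic TF, BCFF, or ACFF algorithm from the paper: the update rules (a fact's confidence is one minus the product of its providers' distrust; a provider's trust is the mean confidence of its facts), the function name, and the toy data are all assumptions, and neither clustering nor smoothing with global scores is modeled.

```python
def basic_fact_finder(claims, iterations=20, initial_trust=0.5):
    """Alternate between scoring facts and scoring providers.

    claims: dict mapping provider -> set of facts it asserts (hypothetical input shape).
    Returns (trust per provider, confidence per fact).
    """
    trust = {p: initial_trust for p in claims}
    facts = set().union(*claims.values())
    confidence = {}
    for _ in range(iterations):
        # A fact is believed unless every provider asserting it is wrong.
        for f in facts:
            disbelief = 1.0
            for p, asserted in claims.items():
                if f in asserted:
                    disbelief *= 1.0 - trust[p]
            confidence[f] = 1.0 - disbelief
        # A provider is as trustworthy as the facts it asserts, on average.
        for p, asserted in claims.items():
            if asserted:
                trust[p] = sum(confidence[f] for f in asserted) / len(asserted)
    return trust, confidence


# Toy data: providers A and B corroborate each other; C stands alone.
trust, confidence = basic_fact_finder(
    {"A": {"f1", "f2"}, "B": {"f1", "f2"}, "C": {"f3"}}
)
```

In this toy run the corroborated providers end up with higher trust than the isolated one, which is the ranking signal the slide refers to.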

(UIUC) Manish Gupta, Jiawei Han, and Yizhou Sun,
"Cluster-Based Analysis of Information
Trustworthiness", KDD'11 (sub) (an earlier
version appeared as a WWW'11 poster)
7
Network Denoising and Sampling by Active Learning
  • Problem: Which nodes are the most important in a
    network from a learning point of view? That is,
    which nodes, if labeled, yield the
    best-performing trained classifier?
  • These nodes are more important, and less likely
    to contain noise
  • Methodology: Select the data to label so as to
    minimize the variance of an unbiased classifier,
    which in turn minimizes the expected error
  • Significance: Theoretically minimizes the expected
    error of a given classifier

[Figure: classification accuracy vs. the number of labeled nodes used in the co-author network; highlighted curve: our algorithm]
(UIUC) Ming Ji, Xiaofei He, and Jiawei Han, "A
Variance Minimization Criterion to Active
Learning on Graphs", ICML'11 (submission)
8
Correlation Metric in Information Networks
Question: Is the distribution of events (blue
nodes) influenced by the network links or not?
If it is, to what degree?
  • A novel metric, Decayed Hitting Time, is proposed
    to assess and rank structural correlations in
    graphs; it aggregates the proximity among nodes
    sharing the same event
  • SIGMOD reviewer: "Interesting problem that I
    haven't seen before"
  • Structural correlation: the first of its kind
    defined for networks
  • Able to test whether the distribution of events
    is related to (or influenced by) the underlying
    network structure, and if so, how strongly
  • Our sampling algorithm is 10-20 times faster than
    the iterative multiplication algorithm
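
A sampling-based estimate of a decayed hitting time can be sketched as follows. The slide does not give the formula, so this assumes a decay of e^{-(t-1)} applied to the step t at which a random walk first reaches the target set; the function name, the adjacency-list input, and the truncation at `max_steps` are likewise assumptions rather than the paper's algorithm.

```python
import math
import random


def decayed_hitting_time(graph, start, targets, walks=2000, max_steps=30, seed=0):
    """Monte Carlo estimate of E[exp(-(T - 1))], T = first step a walk hits `targets`.

    graph: dict mapping node -> list of neighbor nodes (hypothetical input shape).
    Walks that never reach a target within max_steps contribute 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        node = start
        for t in range(1, max_steps + 1):
            neighbors = graph[node]
            if not neighbors:
                break  # dead end: this walk can never hit a target
            node = rng.choice(neighbors)
            if node in targets:
                total += math.exp(-(t - 1))
                break
    return total / walks


# Path graph a - b - c - d with a single event node d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
near = decayed_hitting_time(path, "c", {"d"})
far = decayed_hitting_time(path, "a", {"d"})
```

Nodes closer to the event set score higher; averaging such scores over all nodes that share an event gives one way to aggregate proximity into a single correlation value.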

(UCSB) Z. Guan, J. Wu, Z. Yun, A. Singh, X. Yan,
"Assessing and Ranking Structural Correlations in
Graphs", Proc. 2011 Int. Conf. on Management of
Data (SIGMOD'11), 2011
9
Differentially Private Data Cubes: Optimizing
Noise Source and Consistency
  • Motivation
  • Concern: disclosure of sensitive information on
    data publication
  • Explore differential privacy to provide provable
    privacy guarantees for individuals in
    multi-dimensional data space (data cube)
  • Approach: adding noise to query answers
  • Choose an initial subset of cuboids to compute
    directly from the fact table, injecting DP noise
    as usual
  • Compute the remaining cuboids directly from the
    initial set
  • An efficient procedure, with running time
    polynomial in the number of cuboids, selects the
    initial set of cuboids such that the maximal
    noise in all published cuboids is within a
    factor of the optimal
  • Result: enforce consistency in the published
    cuboids while simultaneously improving their
    utility (i.e., reducing error)
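
The consistency idea can be sketched on a two-dimensional count cube. This example assumes the initial set contains only the finest cuboid, adds Laplace noise with scale 1/epsilon to each of its cells, and derives every coarser cuboid by summing the noisy cells; the paper's contribution, choosing the initial set to bound the maximal noise, is not reproduced here, and all names are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_cuboids(rows, domain_a, domain_b, epsilon, seed=0):
    """Noise the base cuboid once, then derive coarser cuboids from it.

    rows: list of (a, b) pairs from the fact table (hypothetical schema).
    """
    rng = random.Random(seed)
    counts = Counter(rows)
    # Only the base cuboid touches the fact table, so only it receives noise.
    base = {cell: counts[cell] + laplace(1.0 / epsilon, rng)
            for cell in product(domain_a, domain_b)}
    # Coarser cuboids are sums of noisy base cells, hence mutually consistent.
    by_a = {a: sum(base[(a, b)] for b in domain_b) for a in domain_a}
    by_b = {b: sum(base[(a, b)] for a in domain_a) for b in domain_b}
    apex = sum(base.values())
    return base, by_a, by_b, apex


rows = [("M", "CA"), ("F", "CA"), ("F", "IL"), ("F", "IL")]
base, by_a, by_b, apex = publish_cuboids(rows, ["M", "F"], ["CA", "IL"], epsilon=1.0)
```

Because every published cuboid is derived from the same noisy cells, the marginals always add up to the same grand total, which would not hold if each cuboid were noised independently.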

(UIUC) Bolin Ding, Marianne Winslett, Jiawei Han,
and Zhenhui Li, "Differentially Private Data
Cubes: Optimizing Noise Source and Consistency",
SIGMOD'11, Athens, Greece, June 2011 (accepted)
10
Subtask 2: Noise-Aware Mining
  • Noise-aware mining
  • Directly mine the networked data with the
    consideration that a certain portion of the data
    may be noisy, incomplete, or unreliable
  • RankClass: Ranking-Based Classification of
    Information Networks (KDD'11 sub)
  • Apolo: Making Sense of Large Network Data by
    Combining Rich User Interaction and Machine
    Learning (CHI'11)
  • Event Detection in Time Series of Mobile
    Communication Graphs (Army Science Conference'10)
  • PathSim: Meta Path-Based Top-K Similarity Search
    in Heterogeneous Info. Networks (VLDB'11 sub)
  • Towards Iceberg Analysis in Graph OLAP (in
    preparation, VLDB J.)
  • Graph Cube: On Warehousing and OLAP
    Multidimensional Networks (SIGMOD'11)

11
RankClass: Ranking-Based Classification of
Information Networks
  • Output: the classification results and a ranking
    list of objects within each class
  • For each class, objects ranked low are more
    likely to contain noise
  • Methodology: iteratively use the current ranking
    results to extract the sub-network corresponding
    to each class, on which the within-class ranking
    algorithm is run
  • Iteratively use the current ranking results to
    remove noise from other classes

[Figure: the original network and the clean sub-networks extracted for classes 1, 2, and 3]
(UIUC) Ming Ji and Jiawei Han, "Ranking-Based
Classification of Heterogeneous Information
Networks", KDD'11 (sub)
12
Apolo: Making Sense of Large Network Data by
Combining Rich User
Interaction and Machine Learning
  • Provides a mixed-initiative approach (ML + HCI)
    to help users interactively explore large graphs
  • Users start with a small sub-graph, then
    iteratively expand it
  • The user specifies exemplars
  • Belief Propagation finds other relevant nodes
  • A user study showed Apolo outperformed Google
    Scholar in making sense of citation network data

(CMU) Chau et al., "Apolo: Making Sense of Large
Network Data", CHI'11
13
Event Detection in Time Series of Mobile
Communication Graphs
  • Problem: Given a graph that changes over time,
    perform
  • 1) change detection: find time points at which
    many nodes change their behavior significantly
  • 2) attribution: find the top nodes which
    contribute the most to the change in behavior

  • Main idea
  • Extract features for nodes
  • Derive the typical behavior (eigen-behavior) of
    nodes
  • Compare eigen-behaviors over time
  • Detect important events and festivals in our
    data, and spot nodes that change behavior over
    time
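
The compare-eigen-behaviors step can be sketched with plain power iteration. This is an illustration under stated assumptions, not the authors' pipeline: each window is taken to be a small non-negative matrix (rows are time ticks, columns are nodes, entries are one extracted feature such as call volume), the eigen-behavior is its dominant right singular vector, and the change score is one minus the cosine between consecutive eigen-behaviors. All names and the toy data are hypothetical.

```python
import math


def eigen_behavior(window, iterations=100):
    """Dominant eigenvector of M^T M via power iteration (M = window)."""
    n = len(window[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        proj = [sum(row[j] * v[j] for j in range(n)) for row in window]  # M v
        w = [sum(window[i][j] * proj[i] for i in range(len(window)))     # M^T (M v)
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def change_score(prev_window, curr_window):
    """Near 0 when behavior is stable, approaching 1 when it shifts."""
    u = eigen_behavior(prev_window)
    r = eigen_behavior(curr_window)
    return 1.0 - abs(sum(a * b for a, b in zip(u, r)))


def attribution(prev_window, curr_window, k=2):
    """Indices of the k nodes whose eigen-behavior component changed most."""
    u = eigen_behavior(prev_window)
    r = eigen_behavior(curr_window)
    return sorted(range(len(u)), key=lambda j: abs(u[j] - r[j]), reverse=True)[:k]


# Three windows over three nodes; activity moves from node 0 to node 2 in the last one.
w1 = [[5, 1, 1], [6, 1, 2]]
w2 = [[5, 1, 1], [6, 2, 1]]
w3 = [[1, 1, 9], [1, 2, 8]]
stable, shifted = change_score(w1, w2), change_score(w2, w3)
```

A time point is flagged when its score stands out from recent scores, and `attribution` then names the nodes driving the change.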

(CMU) Akoglu et al., "Event Detection", Army
Science Conference'10
14
Noise-Aware Mining: Graph Iceberg
  • R1 has a high concentration of black vertices, but
    low connectivity
  • R2, by contrast, has few black vertices, but is
    well connected
  • R3 is an anomaly region with a high density of
    black vertices and high connectivity

Scalable: gIceberg is 10-50 times faster than the
existing algorithm
  • Graph Iceberg: a novel graph iceberg mining
    framework to find anomaly regions in large
    heterogeneous information networks
  • Graph Iceberg: the first of its kind in network
    science
  • gIceberg identifies promising vertices to avoid
    costly candidate region enumeration (efficient:
    10-50 times faster)
  • Able to find abnormal concentrations of events in
    information networks, intensive attacks in
    intrusion networks, and special communities in
    social networks

Xifeng Yan, et al., "Towards Iceberg Analysis in
Graph OLAP", in preparation for VLDB Journal, 2011
15
PathSim: Meta Path-Based Top-K Similarity Search
in Heterogeneous Info. Networks
  • Problem: Study similarity between objects of the
    same type in heterogeneous information networks
  • Solution
  • Define a meta path-based similarity framework
  • Propose a new measure called PathSim, which is
    able to detect peer objects for the given meta
    path
  • Propose a co-clustering-based efficient online
    search algorithm to support top-k search
  • Results

(UIUC, UCSB) Yizhou Sun, Jiawei Han, Xifeng Yan,
Philip S. Yu, Tianyi Wu, "PathSim: Meta Path-Based
Top-K Similarity Search in Heterogeneous
Information Networks", submitted to VLDB'11
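
For a symmetric meta path, PathSim scores two objects as twice the number of path instances between them, divided by the sum of each object's path instances back to itself. The sketch below applies that formula to a hypothetical author-venue count table (so the implied meta path is author-paper-venue-paper-author); the data and function names are illustrative, and the top-k search is a brute-force scan rather than the co-clustering-based algorithm mentioned on the slide.

```python
def pathsim(w, x, y):
    """PathSim of x and y, given w[obj][venue] = number of papers (toy input)."""
    venues = set(w[x]) | set(w[y])

    def paths(a, b):
        # Path instances a -> venue -> b, counted through shared venues.
        return sum(w[a].get(v, 0) * w[b].get(v, 0) for v in venues)

    denom = paths(x, x) + paths(y, y)
    return 2.0 * paths(x, y) / denom if denom else 0.0


def top_k(w, x, k):
    """Brute-force top-k peers of x by PathSim."""
    scored = [(pathsim(w, x, y), y) for y in w if y != x]
    return sorted(scored, reverse=True)[:k]


# Hypothetical publication counts per venue.
counts = {
    "mike": {"SIGMOD": 2, "VLDB": 1},
    "bob": {"SIGMOD": 1, "VLDB": 2},
    "jim": {"SIGMOD": 50, "VLDB": 20},
}
peers = top_k(counts, "mike", 1)
```

The normalization is what favors peers: "jim" shares far more path instances with "mike" than "bob" does, yet scores lower because his own visibility is so much larger.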
16
Graph Cube: On Warehousing and OLAP
Multidimensional Networks
  • Multidimensional networks
  • Topological graph structure comprising entities
    and relationships
  • Multidimensional attributes associated with
    entities
  • Graph cube: Extend decision support facilities on
    large multidimensional networks
  • A multidimensional network is summarized to a set
    of semantically meaningful and structure-enriched
    aggregate networks in coarser levels of
    granularity within different multidimensional
    spaces
  • Different query models and OLAP solutions
  • Cuboid queries
  • Crossboid queries: straddling multiple cuboids
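
A cuboid query can be pictured as collapsing the network along a chosen set of attribute dimensions. The sketch below assumes an undirected network held as a node-attribute dictionary plus an edge list, and uses plain counts as the aggregate; the function name, schema, and data are hypothetical, and crossboid queries are not covered.

```python
from collections import Counter


def cuboid(nodes, edges, dims):
    """Aggregate network for the cuboid defined by `dims`.

    nodes: dict node_id -> dict of attribute values (hypothetical schema).
    edges: list of undirected (u, v) pairs.
    Returns (nodes per group, edges per unordered pair of groups).
    """
    def group(n):
        return tuple(nodes[n][d] for d in dims)

    node_counts = Counter(group(n) for n in nodes)
    # Sort each group pair so (g1, g2) and (g2, g1) count as the same aggregate edge.
    edge_counts = Counter(tuple(sorted((group(u), group(v)))) for u, v in edges)
    return node_counts, edge_counts


people = {
    1: {"gender": "M", "loc": "CA"},
    2: {"gender": "F", "loc": "CA"},
    3: {"gender": "F", "loc": "IL"},
}
links = [(1, 2), (2, 3), (1, 3)]
by_gender_nodes, by_gender_edges = cuboid(people, links, ["gender"])
```

Passing `["gender", "loc"]` yields the finer cuboid, and an empty list of dimensions collapses the whole network into a single aggregate node.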

Peixiang Zhao (UIUC), Xiaolei Li (Microsoft),
Dong Xin (Google), and Jiawei Han (UIUC), "Graph
Cube: On Warehousing and OLAP Multidimensional
Networks", SIGMOD'11 (accepted)
17
Subtask 3: QoI Mining Applications
  • Exploring QoI Mining Applications
  • Assume the network is noisy, incomplete, and
    unreliable
  • Explore network mining methodology in various
    applications
  • Polonium: Tera-Scale Graph Mining and Inference
    for Malware Detection (SDM'11)
  • ValuePick: Towards Dual-Goal, Value-Oriented
    Recommendations (ICDM'10 Workshop on Emerging
    Applications)
  • Reciprocity in Human Communication Networks
    (KDD'11 sub)
  • Noise-Aware Mining: Collective Classification of
    Information Networks for Web Search (SIGIR'11
    sub)
  • Patent Value Estimation and Maintenance
    Recommendation with Patent Information Network
    Model (KDD'11 sub)

18
Polonium: Tera-Scale Graph Mining and Inference
for Malware Detection
[Figure labels: malware, good file, binaries, machines]
  • 60 terabytes of data anonymously contributed
    by participants of the worldwide Norton Community
    Watch program (Symantec)
  • 50 million machines
  • 900 million executable files
  • A file-in-machine bipartite graph (0.2 TB)
  • 1 billion nodes (machines and files)
  • 37 billion edges
  • Contributions: malware detection (bad files),
    scalability
  • Polonium is a new and effective reputation-based
    malware detection technology adapting the Belief
    Propagation algorithm: 87% TPR at 1% FPR

(CMU) Chau et al., "Polonium: Tera-Scale Graph
Mining and Inference for Malware Detection", SDM'11
19
ValuePick: Towards Dual-Goal, Value-Oriented
Recommendations
  • Problem
  • Given a graph with node attributes (value) and a
    query node q,
  • find other nodes that are both (1) close by (high
    proximity) and (2) valuable, to recommend to q.

  • Main idea
  • Carefully change (perturb) the order of nodes by
    proximity such that the total expected value is
    maximized.
  • Q: How to perturb? How to pick the best k nodes?
  • A: Formulation as an optimization problem
  • Makes dual-goal recommendations by integrating
    value

[Figure: candidate nodes (e.g., v253, v162, v261, v327) around the query node, arranged by proximity and value; darker color = higher proximity; value is, e.g., centrality]
(CMU) Akoglu et al., "ValuePick", ICDM
Workshop on Optimization Based Methods for
Emerging Data Mining Problems'10
20
Reciprocity in Human Communication Networks
  • Motivation: Reciprocity is often treated as a
    global, unweighted quantity.
  • Problem: How reciprocal are human relations?
  • Given nST (calls from S (Silent) to T
    (Talkative)) and similarly nTS,
  • quantify the degree of reciprocity between S and T
  • Does reciprocity depend on T's and S's topological
    features, e.g., degree similarity?
  • Problem: If T calls S nTS times, what can we say
    about how many times S calls T?
  • Approach: Model Prob(nST, nTS) with 3PL

[Figure: S and T linked by nST and nTS; panels comparing Pareto, Yule, 3PL, and real data against nTS; annotation: higher likelihood]
3PL spots anomalous mutual interactions (low
data likelihood points)
(CMU) Akoglu et al., "Reciprocity", submitted to
KDD'11
21
Noise-Aware Mining: Collective Classification of
Information Networks for Web Search
  • Extension of our work: Ming Ji, et al., "Graph
    Regularized Transductive Classification on
    Heterogeneous Information Networks", ECMLPKDD
    2010
  • Collective classification: learning from both the
    network structure and the numerical features of
    nodes
  • Links and numerical features complement each
    other, so combining them provides more robust
    results against noise
  • Fully exploit all the information available in
    the network
  • Can predict the labels of new data that are not
    seen in the training phase, as long as they have
    features
  • Methodology: unify the feature information into a
    feature graph

[Figure: the unified network structure]
(UIUC) Ming Ji, Jun Yan, Siyu Gu, and Jiawei Han,
"Learning Search Tasks in Queries and Web
Pages via Graph Regularization", SIGIR'11
(submission)
22
Patent Value Estimation and Maintenance
Recommendation with Patent Information Network
Model
  • A granted U.S. patent can be held for up to 20
    years; however, large maintenance fees need to be
    paid to keep it valid
  • For large companies/organizations, making such
    decisions is difficult because too many patents
    need to be investigated
  • Model the patents as a heterogeneous
    time-evolving information network and propose new
    patent quality features and a network-based
    optimization model to rank the patents
  • Experiments on a U.S. patent database with
    millions of patents show high accuracy of our
    approach

(UIUC, IBM) Xin Jin, Scott Spangler, Ying Chen,
and Jiawei Han, "Patent Value Estimation and
Maintenance Recommendation with Patent
Information Network Model", KDD'11 (sub)
23
Advancing the State-of-the-Art of Network Science
  • Discover hidden relationships in noisy,
    incomplete, dynamic, and heterogeneous information
    networks
  • Focused on large heterogeneous information
    networks
  • Collections of information objects in diverse
    forms and from diverse resources
  • Developed state-of-the-art algorithmic tools
  • Supporting data cleaning, information trust
    analysis, network modeling, and integrated
    information structure discovery
  • Utilizing in-depth data analysis and statistical
    modeling approaches over the content and the
    structure of the network
  • Making use of both explicit network structure and
    hidden information structure
  • Advanced our understanding of how to
  • Perform two-stage mining and noise-aware mining
    on heterogeneous information networks when data
    is noisy, volatile, uncertain, and incomplete
  • Explore various kinds of large-scale, new
    applications

24
Military Relevance
  • Subtask 1: Two-Stage Mining
  • Military networks are inherently noisy,
    incomplete, and unreliable, and draw on multiple
    sources (some of them non-trustable)
  • Two-stage mining provides a systematic way to
    derive trustable information from multi-sourced,
    inconsistent networks
  • Subtask 2: Noise-Aware Mining
  • Noise-aware mining aims to mine successfully
    in the presence of various kinds of noisy data
  • Military networks likely need such robust
    mining methodologies badly
  • Subtask 3: QoI Mining Applications
  • Many diverse applications under this QoI mining
    framework are explored
  • Such explorations will help understand how
    diverse military applications may explore
    different genres of networks effectively

25
Collaborations within NSCTA
[Figure: collaboration diagram centered on I3.1 QoI Mining of Information Networks]
Linked tasks shown in the diagram:
  • T2.4 Network Behavior Based on Trust
  • I1.1 Context-Aware Data Fusion
  • T1.2 Large-Scale Info. Network Processing
  • IRC Data Experiments
  • E2.2 Tactic Mobility Models
  • I1.2 QoI Sensor Data Collection Fusion
  • I3.2 Modeling and Mining of Text-Rich Information
    Networks
  • E2.3 Co-Evolution of Composite Networks
  • I2.2 Large-Scale Info. Network Processing
  • S1.1, T1.5
Collaborator labels shown on the links: Tarek, Lei, Huang; Pirolli; Adali; Leung; Tarek, Charu; La Porta; Chawla; Heng Yan, Roth; Yan, Charu; Zen, Tong
  • Weekly/monthly meetings or teleconfs among
    collaborators, joint research papers, proposals,
    etc.

26
Next Six Months and Path Ahead to 2012
  • Continue research on QoI mining of information
    networks
  • Research on three frontiers: (1) integrated
    classification and clustering in network mining,
    (2) building a theory of link/relationship
    analysis in heterogeneous networks, and (3)
    exploring military applications
  • Exploration and consolidation of cross-center
    collaborations
  • Work with Nitesh Chawla and Bolek on evaluation
    of mining methods for clustering and
    classification of heterogeneous networks
  • Work with Tom La Porta on mining
    communication/information networks
  • Research planned for next year, if funded
  • Effective theory and methods for mining
    heterogeneous networks involving social and
    communication networks
  • Network fusion: integration and modeling of
    multiple heterogeneous networks of multiple genres
  • Data fusion: exploration of information
    enhancement across multiple networks of multiple
    genres
  • Application of role discovery, network
    classification, anomaly detection, and network
    fusion methods in military applications

27
Research Papers (Accepted/Published) (2011)
  • (UIUC, Microsoft, Google) Peixiang Zhao,
    Xiaolei Li, Dong Xin, and Jiawei Han, "Graph
    Cube: On Warehousing and OLAP Multidimensional
    Networks", Proc. of 2011 ACM SIGMOD Int. Conf. on
    Management of Data (SIGMOD'11), Athens, Greece,
    June 2011.
  • (UCSB) Z. Guan, J. Wu, Z. Yun, A. Singh, X. Yan,
    "Assessing and Ranking Structural Correlations in
    Graphs", Proc. 2011 Int. Conf. on Management of
    Data (SIGMOD'11), 2011.
  • (CMU) U Kang, Duen Horng Chau, and Christos
    Faloutsos, "Mining Large Graphs: Algorithms,
    Inference, and Discoveries", IEEE Int. Conf. on
    Data Engineering (ICDE) 2011, Hannover, Germany.
  • (SMU, UCSB, UIUC) Qiang Qu, Feida Zhu, Xifeng
    Yan, Jiawei Han, Philip S. Yu, and Hongyan Li,
    "Efficient Topological OLAP on Information
    Networks", Proc. of 2011 Int. Conf. on Database
    Systems for Advanced Applications (DASFAA'11),
    Hong Kong, Apr. 2011.
  • (UIUC, IBM) Jing Gao, Wei Fan, Deepak S. Turaga,
    Olivier Verscheure, Xiaoqiao Meng, Lu Su, Jiawei
    Han, "Consensus Extraction from Heterogeneous
    Detectors to Improve Performance over Network
    Traffic Anomaly Detection", Proc. of 2011 IEEE
    INFOCOM Mini-Conf. (INFOCOM-Mini'11), Shanghai,
    China, Apr. 2011.
  • (CMU) Duen Horng Chau, Carey Nachenberg, Jeffrey
    Wilhelm, Adam Wright, Christos Faloutsos,
    "Polonium: Tera-Scale Graph Mining and Inference
    for Malware Detection", SIAM Int. Conf. on Data
    Mining (SDM) 2011.
  • (CMU) Duen Horng (Polo) Chau, Aniket Kittur,
    Jason I. Hong, Christos Faloutsos, "Apolo: Making
    Sense of Large Network Data by Combining Rich
    User Interaction and Machine Learning", ACM Conf.
    on Human Factors in Computing Systems (CHI 2011).
  • (UIUC) Bolin Ding, Marianne Winslett, Jiawei Han,
    and Zhenhui Li, "Differentially Private Data
    Cubes: Optimizing Noise Source and Consistency",
    Proc. of 2011 ACM SIGMOD Int. Conf. on Management
    of Data (SIGMOD'11), Athens, Greece, June 2011.
  • (UIUC) Manish Gupta, Yizhou Sun, and Jiawei Han,
    "Trust Analysis with Clustering", Proc. of 2011
    Int. World Wide Web Conf. (WWW'11), Hyderabad,
    India, March 2011.

28
Research Papers (Published) (Sept.-Dec. 2010)
  • (CMU) Leman Akoglu and Christos Faloutsos, "Event
    Detection in Time Series of Mobile Communication
    Graphs", 27th Army Science Conference, Orlando,
    Florida, Dec. 2010.
  • (CMU) Leman Akoglu and Christos Faloutsos,
    "ValuePick: Towards a Value-Oriented Dual-Goal
    Recommender System", ICDM Workshop on
    Optimization Based Methods for Emerging Data
    Mining Problems, Sydney, Australia, Dec. 2010.
  • (CMU) Pedro Olmo Vaz de Melo, Leman Akoglu,
    Christos Faloutsos, Antonio Loureiro, "Surprising
    Patterns for the Call Duration Distribution of
    Mobile Phone Users", ECML PKDD, Barcelona, Spain,
    Sep. 2010.
  • (Kodak, UIUC) Jie Yu, Xin Jin, Jiawei Han, Jiebo
    Luo, "Collection-based Sparse Label Propagation
    and Its Application on Social Group Suggestion
    from Photos", ACM Transactions on Intelligent
    Systems and Technology (TIST), 2(2), 2011.
  • (UIUC) Xin Jin, Sangkyum Kim, Jiawei Han,
    Liangliang Cao, and Zhijun Yin, "A General
    Framework for Efficient Clustering of Large
    Datasets based on Activity Detection",
    Statistical Analysis and Data Mining, accepted
    Sept. 2010.
  • (UIUC) Heli Sun, Jianbin Huang, Jiawei Han,
    Hongbo Deng, Peixiang Zhao, and Boqin Feng,
    "gSkeleton-Clu: Density-based Network Clustering
    via Structure-Connected Tree Division or
    Agglomeration", Proc. of 2010 Int. Conf. on Data
    Mining (ICDM'10), Sydney, Australia, Dec. 2010.
  • (UTD, UIUC) Mohammad Masud, Qing Chen, Latifur
    Khan, Charu Aggarwal, Jing Gao, Jiawei Han, and
    Bhavani Thuraisingham, "Addressing
    Concept-Evolution in Concept-Drifting Data
    Streams", Proc. of 2010 Int. Conf. on Data Mining
    (ICDM'10), Sydney, Australia, Dec. 2010.
  • (UIUC) Jianbin Huang, Heli Sun, Jiawei Han,
    Hongbo Deng, Yizhou Sun, and Yaguang Liu,
    "SHRINK: A Structural Clustering Algorithm for
    Detecting Hierarchical Communities in Networks",
    Proc. 2010 ACM Int. Conf. on Information and
    Knowledge Management (CIKM'10), Toronto, Canada,
    Oct. 2010.
  • (UIUC) Xin Jin, Jiawei Han, Liangliang Cao, Jiebo
    Luo, Bolin Ding, Cindy Xide Lin, "Visual Cube and
    On-Line Analytical Processing of Images", Proc.
    2010 ACM Int. Conf. on Information and Knowledge
    Management (CIKM'10), Toronto, Canada, Oct. 2010.
  • (UIUC) Ming Ji, Yizhou Sun, Marina Danilevsky,
    Jiawei Han, and Jing Gao, "Graph Regularized
    Transductive Classification on Heterogeneous
    Information Networks", Proc. 2010 European Conf.
    on Machine Learning and Principles and Practice
    of Knowledge Discovery in Databases
    (ECMLPKDD'10), Barcelona, Spain, Sept. 2010.
  • (UIUC) Hyung Sul Kim, Sangkyum Kim, Tim Weninger,
    Jiawei Han, and Tarek Abdelzaher, "NDPMine:
    Efficiently Mining Discriminative Numerical
    Features for Pattern-Based Classification", Proc.
    2010 European Conf. on Machine Learning and
    Principles and Practice of Knowledge Discovery in
    Databases (ECMLPKDD'10), Barcelona, Spain, Sept.
    2010.
  • (UTD, UIUC) Mohammad M. Masud, Qing Chen, Jing
    Gao, Latifur Khan, Jiawei Han, and Bhavani
    Thuraisingham, "Classification and Novel Class
    Detection of Data Streams in a Dynamic Feature
    Space", Proc. 2010 European Conf. on Machine
    Learning and Principles and Practice of Knowledge
    Discovery in Databases (ECMLPKDD'10), Barcelona,
    Spain, Sept. 2010.
  • (UIUC, New York State Museum) Zhenhui Li, Bolin
    Ding, Jiawei Han, and Roland Kays, "Swarm: Mining
    Relaxed Temporal Moving Object Clusters", Proc.
    2010 Int. Conf. on Very Large Data Bases
    (VLDB'10), Singapore, Sept. 2010.
  • (UIUC) Peixiang Zhao and
    Jiawei Han, "On Graph Query Optimization in Large
    Networks", Proc. 2010 Int. Conf. on Very Large
    Data Bases (VLDB'10), Singapore, Sept. 2010.

29
Research Papers (Submitted, 2011)
  • (UIUC, CUNY) Chi Wang, Jiawei Han, Xiang Li, Qi
    Li, Wen-Pin Lin, Adam Lee, Hao Li, Heng Ji,
    "Uncovering Hierarchical Relationships among
    Linked Objects: A Probabilistic Modeling
    Approach", KDD'11 (sub)
  • (UIUC, IBM) Jing Gao, Wei Fan, Deepak Turaga,
    Srinivasan Parthasarathy, and Jiawei Han, "A
    Spectral Framework for Detecting Inconsistency
    across Multi-Source Object Relationships", KDD'11
    (sub)
  • (UIUC) Manish Gupta, Jiawei Han, and Yizhou Sun,
    "Cluster-Based Analysis of Information
    Trustworthiness", KDD'11 (sub) (also for Trust as
    collab.)
  • (UIUC) Ming Ji and Jiawei Han, "Ranking-Based
    Classification of Heterogeneous Information
    Networks", KDD'11 (sub)
  • (UIUC, IBM) Xin Jin, Scott Spangler, Ying Chen,
    and Jiawei Han, "Patent Value Estimation and
    Maintenance Recommendation with Patent
    Information Network Model", KDD'11 (sub)
  • (UIUC, Tsinghua U) Yizhou Sun, Jie Tang, Jiawei
    Han, Cheng Chen, Manish Gupta, "Studying
    Co-Evolution of Multi-Typed Objects in Dynamic
    Heterogeneous Information Networks", KDD'11 (sub)
  • (UIUC, IBM) Yizhou Sun, Charu Aggarwal, Jiawei
    Han, "A Framework for Clustering Heterogeneous
    Information Networks with Incomplete Attributes",
    KDD'11 (sub)
  • (UIUC, UCSB, UIC) "PathSim: Meta Path-Based Top-K
    Similarity Search in Heterogeneous Information
    Networks", VLDB'11 (sub)
  • (UCSB, UIUC) Xifeng Yan, et al., "Towards Iceberg
    Analysis in Graph OLAP", in preparation for VLDB
    Journal.
  • (UIUC) Ming Ji, Xiaofei He, and Jiawei Han, "A
    Variance Minimization Criterion to Active
    Learning on Graphs", submitted to Int. Conf. on
    Machine Learning (ICML'11), June 2011
  • (UIUC) Ming Ji, Jun Yan, Siyu Gu, and Jiawei Han,
    "Learning Search Tasks in Queries and Web Pages
    via Graph Regularization", submitted to Int. ACM
    SIGIR Conf. (SIGIR'11), July 2011.
  • (CMU) Leman Akoglu, Pedro Olmo Vaz de Melo,
    Christos Faloutsos, "Reciprocity in Human
    Communication Networks", KDD'11 (sub)

30
Other Technical Contributions
  • (Book; UIC, UIUC, CMU) Philip S. Yu, Jiawei
    Han, and Christos Faloutsos (Editors), "Link
    Mining: Models, Algorithms and Applications",
    Springer, 2010.
  • (UCSB) Xifeng Yan, invited talk, "Graph Pattern
    Mining and System", Microsoft Research Asia,
    Beijing, Nov. 2010
  • (UIUC) INARC Ph.D. student Mr. Chi Wang, CS,
    UIUC, has received a 2011 Microsoft Research Ph.D.
    Fellowship (a highly competitive award, since
    there are only 12 Ph.D. Fellowship awardees in
    total across all research fields in the
    U.S. in 2011). Chi Wang is supervised by Jiawei
    Han at INARC.
  • (UIUC) Ms. Jing Gao, who was partially supported
    by the INARC program, has received an IBM Ph.D.
    Fellowship for 2011-2012. She first received an
    IBM Ph.D. Fellowship for the academic year
    2010-2011; this is her second-year award. Jing
    Gao is supervised by Jiawei Han at INARC.
  • (UIUC) Jiawei Han has received the Daniel C.
    Drucker Eminent Faculty Award at UIUC
  • Jiawei Han, "Towards Integrated Mining of
    Multiple Social and Information Networks"
    (keynote speech), The 2011 Int. Conf. on Advances
    in Social Network Analysis and Mining
    (ASONAM'11), July 2011.
  • Jiawei Han, "Exploring the Power of Heterogeneous
    Information Networks in Data Mining" (keynote
    speech), The 2011 Int. SIAM Data Mining Conf.
    (SDM'11), April 2011.
  • Jiawei Han, "Construction and Analysis of
    Web-Based Computer Science Information Networks"
    (keynote speech), The 2011 Int. Conf. on Rough
    Sets, Fuzzy Sets, Data Mining and Granular
    Computing (RSFDGrC'11), June 2011.
  • Latifur Khan, Wei Fan, Jiawei Han, Jing Gao,
    Mohammad Mehedy Masud, "Data Stream Mining:
    Challenges and Techniques" (tutorial), The 15th
    Pacific-Asia Conference on Knowledge Discovery
    and Data Mining (PAKDD 2011), May 2011
  • Jiawei Han, "Web Structure Mining and Information
    Network Analysis: An Integrated Approach", invited
    speech at the Third International Workshop on
    Network Theory: Web Science Meets Network
    Science, March 2011.