Communities in Social Media An eyepiece into User Intentions and Context Akshay Java eBiquity Resear - PowerPoint PPT Presentation

Slides: 67
Provided by: aksha3
Learn more at: http://ebiquity.umbc.edu
Transcript and Presenter's Notes



1

Mining Social Media Communities and Content
Akshay Java
Ph.D. Dissertation Defense
October 16th, 2008
2
Thesis Statement
  • It is possible to develop effective algorithms
    to detect Web-scale communities using their
    inherent properties, structure, content.

3
Key Observations
  • Understanding communication in social media
    requires identifying and modeling communities
  • Communities are a result of collective, social
    interactions and usage.

4
Contributions
  • Developed and evaluated innovative approaches for
    community detection
  • A new algorithm for finding communities in social
    datasets
  • SimCUT, a novel algorithm for combining
    structural and semantic information
  • First to comprehensively analyze two important,
    new social media forms
  • Feed Readership
  • Microblogging Usage and Communities
  • Built systems, infrastructure and datasets for
    the social media research community

5
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

7
Social Media
"Describes the online technologies and practices that people use to
share opinions, insights, experiences, and perspectives and engage
with each other." (Wikipedia)
UGC, Social Network
8
What you...
  • Think: blogs
  • Say: podcasts
  • See: Flickr, YouTube
  • Hear: Pandora, Last.fm
  • Do: Twitter, Jaiku, Pownce
It's about YOU!
9
Who are our...
  • Friends: Facebook
  • Colleagues: LinkedIn
  • Virtual Avatars: Second Life
Also about US
10
What we share
  • Knowledge: Wikipedia
  • Links: del.icio.us, StumbleUpon
  • Love/Hate: Yelp, Upcoming
  • Location: FireEagle, BrightKite
  • Spaces: Ustream, Qik
How We Share
11

Communities
  • Social interactions build communities
  • Shared Interests
  • Common Beliefs
  • Events
  • Organization/Location

12
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

13
What is a Community
Political Blogs
  • A community in the real world is represented in a
    graph as a set of nodes that have more links
    within the set than outside it.
  • Graph
  • Citation Network
  • Affiliation Network
  • Sentiment Information
  • Shared Resource (tags, videos..)

Twitter Network
Facebook Network
14
Existing Approaches
  • Clustering Approach
  • Agglomerative/Hierarchical
  • Incrementally, group similar nodes to form
    clusters

Communities in Football League (Hierarchical
Clustering)
Football Teams
15
Existing Approaches
  • Clustering Approach
  • Agglomerative/Hierarchical
  • Topological Overlap: similarity is measured in
    terms of the number of nodes that both i and j
    link to (Ravasz et al.)
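As a rough illustration of the overlap idea, the sketch below counts the nodes that two nodes i and j both link to. The adjacency dictionary and the function name are invented for this example, and the published measure also normalizes the count by node degree, which is left out here.

```python
def shared_neighbors(adj, i, j):
    """Count the nodes that both i and j link to."""
    return len(adj[i] & adj[j])

# Hypothetical undirected graph stored as node -> set of neighbors.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"f"},
    "f": {"e"},
}

print(shared_neighbors(adj, "a", "b"))  # 2 (both link to c and d)
print(shared_neighbors(adj, "a", "e"))  # 0
```

Pairs with a high count are the ones an agglomerative method would merge first.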

16
Existing Approaches
  • Clustering Approach
  • Agglomerative/Hierarchical
  • Divisive/Partition based (Girvan Newman)
  • Normalized Cut (NCut) (Shi, Malik)

Political Books
17
Existing Approaches
Graph Laplacian
  • The graph is partitioned using the eigenspectrum
    of the Laplacian. (Shi and Malik)
  • The eigenvector for the second smallest eigenvalue
    of the graph Laplacian is the Fiedler vector.
  • The graph can be recursively partitioned using
    the sign of the values in its Fiedler vector.
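A minimal sketch of one partitioning step, assuming NumPy and the unnormalized Laplacian L = D - A (Shi and Malik work with a normalized variant, so this is a simplification). The function name and the toy graph are illustrative only.

```python
import numpy as np

def fiedler_bipartition(adjacency):
    """Split a graph into two parts by the sign of its Fiedler vector."""
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A      # L = D - A
    _, eigvecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                     # second smallest eigenvalue
    return fiedler >= 0

# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

side = fiedler_bipartition(A)
print(side)  # the triangles fall on opposite sides; which one is True may vary
```

Applying the same function again to each side gives the recursive partitioning described above.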

Normalized Cuts
  • cut(A, B): cost of edges deleted to disconnect the graph
  • assoc(B, V): total cost of all edges that start from B
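The two descriptions above annotate the normalized cut criterion of Shi and Malik. For a partition of the node set V into A and B it is usually written as below; the symbols cut, assoc and w follow the standard notation of that work rather than anything recoverable from the slide.

```latex
\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                   + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)},
\qquad
\mathrm{cut}(A,B) = \sum_{u \in A,\; v \in B} w(u,v),
\qquad
\mathrm{assoc}(B,V) = \sum_{u \in B,\; t \in V} w(u,t)
```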
18
Existing Approaches
  • Modularity Score (Newman et al.)
  • Measure of quality of clustering
  • e_ii: fraction of intra-community edges
  • a_i^2: expected value of e_ii disregarding
    communities
  • Q = 0: communities are random
  • Q > 0: higher values are better
  • Optimizing modularity is NP-Hard (Brandes et al.)
  • Spectral Methods
  • Heuristics
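For reference, the modularity score is commonly written as Q = sum over communities i of (e_ii - a_i^2), where a_i is the fraction of edge ends attached to community i. The sketch below computes it for an undirected, unweighted edge list; the function name and the toy graph are assumptions made for the example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both ends in community i
    ends = defaultdict(int)    # edge ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)

# Two triangles joined by one edge, split into their natural communities.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

print(round(modularity(edges, two_groups), 3))  # 0.357
print(modularity(edges, one_group))             # 0.0
```

Putting every node in one community gives Q = 0, which matches the reading that only positive scores indicate community structure.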
19
Limitations
  • Existing methods
  • Do not scale well for Web graphs
  • Fail to exploit the underlying graph's
    distributions
  • Unable to use available meta-data and semantic
    features.

20
Thesis Statement
  • It is possible to develop effective algorithms
    to detect Web-scale communities using their
    inherent properties, structure, content.

21
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

22
Special Properties of Social Datasets
  • The Long Tail
  • 80/20 Rule or Pareto distribution
  • Few blogs get most attention/links
  • Most are sparsely connected
  • Motivation
  • Web graphs are large, but sparse
  • Expensive to compute community structure over the
    entire graph
  • Goal
  • Approximate the membership of the nodes using
    only a small portion of the entire graph.

23
Special Properties of Social Datasets
  • Intuition
  • communities are defined by the core (A) and the
    membership of the rest of the network (B) can be
    approximated by how they link to the core.
  • Direct Method
  • NCut (Baseline)
  • Approximation
  • Singular Value Decomposition (SVD)
  • Sampling
  • Heuristic

24
Approximating Communities
ICWSM 08
Nodes ordered by degree
  • SVD (low rank)
  • Sampling based Approach
  • Communities can be extracted by sampling only
    columns from the head (Drineas et al.)
  • Heuristic: cluster the head to find initial
    communities. Assign each tail node to the cluster
    that it most frequently links to.
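The heuristic can be sketched roughly as follows. This is not the implementation from the talk: the head is clustered here with plain connected components as a stand-in for NCut, and all function names and the toy graph are made up for illustration.

```python
from collections import Counter

def connected_components(nodes, adj):
    """Stand-in head clustering: connected components of the head subgraph."""
    nodes = set(nodes)
    labels, next_label = {}, 0
    for start in sorted(nodes):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u] & nodes:
                if v not in labels:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels

def approximate_communities(adj, r, cluster_head=connected_components):
    """Cluster the r highest-degree nodes, then label the tail by majority link."""
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))  # nodes ordered by degree
    head, tail = ranked[:r], ranked[r:]
    head_labels = cluster_head(head, adj)
    labels = dict(head_labels)
    for node in tail:
        votes = Counter(head_labels[v] for v in adj[node] if v in head_labels)
        labels[node] = votes.most_common(1)[0][0] if votes else None
    return labels

# Hypothetical graph: two well-connected cores (a*, b*) and four tail nodes (t*).
adj = {
    "a1": {"a2", "a3", "t1", "t2"}, "a2": {"a1", "a3", "t1"}, "a3": {"a1", "a2", "t2"},
    "b1": {"b2", "b3", "t3", "t4"}, "b2": {"b1", "b3", "t3"}, "b3": {"b1", "b2", "t4"},
    "t1": {"a1", "a2"}, "t2": {"a1", "a3"}, "t3": {"b1", "b2"}, "t4": {"b1", "b3"},
}
result = approximate_communities(adj, r=6)
print(result)
```

Only the r head nodes go through the expensive clustering step; each tail node costs a single pass over its links.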

25
Approximating Communities
ICWSM 08
  • Dataset: a blog dataset of 6000 blogs.

Original Adjacency
Heuristic Approximation
Modularity 0.51
26
Approximating Communities
ICWSM 08
Similar Modularity
Lower Time
More Time
Low Modularity
  • Advantage: faster detection using a small portion
    of the graph, less memory.
  • SVD O(n^3), NCut O(nk), Sampling O(r^3), Heuristic
    O(rk)
  • n = number of blogs, k = number of clusters, r =
    number of columns

27
Approximating Communities
ICWSM 08
  • Blog Dataset
  • Social network datasets

Additional evaluations using Variation of
Information score
28
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

29
  • Tags are free meta-data!
  • Other semantic features
  • Sentiments
  • Named Entities
  • Readership information
  • Geolocation information
  • etc.
  • How to combine this for detecting communities?

30
Social Media Graphs
Links Between Nodes and Tags
Links Between Nodes
Simultaneous Cuts
31
Communities in Social Media
A community in the real world is identified in a
graph as a set of nodes that have more links
within the set than outside it and share similar
tags.
32
SimCUT Simultaneously Clustering Tags and Graphs
WebKDD 08
[Figure: block matrix with Nodes and Tags as rows and columns, combining
node-node links and node-tag links; Fiedler vector polarity]
  • β = 0: entirely ignore link information
  • β = 1: equal importance to blog-blog and blog-tag links
  • β >> 1: NCut
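One way to picture the role of the scaling parameter is the sketch below. It assumes the node-node matrix G and the node-tag matrix C are stacked into a single block matrix W = [[βG, C], [Cᵀ, 0]] and then bipartitioned with the Fiedler vector of the unnormalized Laplacian. That layout, the function name and the toy data are assumptions for illustration, not a statement of the exact SimCUT formulation.

```python
import numpy as np

def simcut_bipartition(G, C, beta=1.0):
    """Jointly split nodes and tags using one combined graph (illustrative)."""
    G = np.asarray(G, dtype=float)
    C = np.asarray(C, dtype=float)
    n_nodes, n_tags = C.shape
    # Assumed layout: scaled node-node links plus node-tag links, no tag-tag links.
    W = np.block([[beta * G, C],
                  [C.T, np.zeros((n_tags, n_tags))]])
    laplacian = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    side = eigvecs[:, 1] >= 0                   # sign of the Fiedler vector
    return side[:n_nodes], side[n_nodes:]

# Four blogs linked in a chain 0-1-2-3; blogs 0,1 use tag 0 and blogs 2,3 use tag 1.
G = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

node_side, tag_side = simcut_bipartition(G, C, beta=1.0)
print(node_side, tag_side)  # blogs 0,1 with tag 0 on one side; blogs 2,3 with tag 1 on the other
```

With a large β the node-node block dominates and the split is driven by links alone, which is the sense in which the method approaches NCut.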
33
SimCUT Simultaneously Clustering Tags and Graphs
WebKDD 08
Clustering Only Links
Clustering Links Tags
  • β = 0: entirely ignore link information
  • β = 1: equal importance to blog-blog and blog-tag links
  • β >> 1: NCut
34
Datasets
  • Citeseer (Getoor et al.)
  • Agents, AI, DB, HCI, IR, ML
  • Words used in place of tags
  • Blog data
  • derived from the WWE/Buzzmetrics dataset
  • Tags associated with Blogs derived from
    del.icio.us
  • For dimensionality reduction, 100 topics derived
    from blog homepages using LDA (Latent Dirichlet
    Allocation)
  • Pairwise similarity computed
  • RBF Kernel for Citeseer
  • Cosine for blogs
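The two pairwise similarity measures named above can be sketched in a few lines of plain Python. The `gamma` width parameter and the example vectors are placeholders; the values used in the experiments are not given on the slide.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

topics_a = [0.7, 0.2, 0.1]   # hypothetical topic weights for one blog
topics_b = [0.6, 0.3, 0.1]
topics_c = [0.0, 0.1, 0.9]

print(cosine_similarity(topics_a, topics_b) > cosine_similarity(topics_a, topics_c))  # True
print(rbf_kernel(topics_a, topics_a))  # 1.0
```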

35
Clustering Tags and Graphs
Clustering Only Links
Clustering Links Tags
36
Clustering Tags and Graphs
Accuracy 36%
Accuracy 62%
Higher accuracy by adding tag information
37
Varying Scaling Parameter β
  • β >> 1 (only graph): accuracy 36%
  • β = 1 (graph + tags): accuracy 62%
  • β = 0 (only tags): accuracy 39%
Higher accuracy by adding tag information
Simple K-means: 23% (content only, binary). Content only: 52%
(Getoor et al. 2004)
38
Effect of Number of Tags, Clusters
  • Mutual Information
  • Measures the dependence between two random
    variables.
  • Compares results with ground truth
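As an illustration of the measure, the sketch below computes the mutual information (in bits) between a predicted clustering and ground-truth labels from their joint counts. The function name and label lists are made up for the example.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information, in bits, between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum((c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())

truth = [0, 0, 1, 1]
perfect = [1, 1, 0, 0]      # same grouping, different label names
unrelated = [0, 1, 0, 1]    # carries no information about the truth

print(mutual_information(truth, perfect))    # 1.0
print(mutual_information(truth, unrelated))  # 0.0
```

The score depends only on how items are grouped, not on the label names, which is why it suits comparing clusterings with ground truth.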

Link only has lower MI
More Semantics helps
Citeseer
Similar results for real, blog datasets
39
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

40
  • Tags are one type of meta-data!
  • Other semantic information
  • Sentiments
  • Named Entities
  • Readership information
  • Geolocation information
  • etc.
  • How do we get additional semantics?

41
Additional Semantics
  • BlogVox (TREC 06, IJCAI/AND 07)
  • Sentiments and Opinions
  • SemNews (AAAI SS 05, HICS 06, IJSWIS)
  • Named Entities, beliefs, facts
  • Link Polarity (ICWSM 07)
  • Sentiment from anchor text
  • Readership (ICWSM 07)
  • Feed subscriptions and usage
42
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

43
Key Observations
  • Understanding communication in social media
    requires identifying and modeling communities
  • Communities are a result of collective, social
    interactions and usage.

44
Feeds Readership
http://ftm.umbc.edu
ICWSM 07
Folders
Use folder label as topics/tags. Group similar
folders together. Rank Feeds under a topic
45
Feed Subscription Statistics
ICWSM 07
  • 83K publicly listed subscribers
  • 2.8M feeds, of which 500K are unique
  • 26K users (35%) use folders to organize
    subscriptions
  • Data collected in May 2006

Although there may be 50M Blogs, only a small
fraction get continued user attention in the form
of subscriptions
46
Feeds That Matter
http://ftm.umbc.edu
ICWSM 07
  • Communities from Feed Subscriptions
  • A Common vocabulary emerges from folder names
  • Folder names are used as topics. Lower-ranked
    folders are merged into a higher-ranked folder if
    there is an overlap and a high cosine similarity
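The merging rule can be sketched as below, under several assumptions: folders are ranked by how many feeds they contain, each folder is treated as a binary vector over feeds, and "high" similarity means at least 0.5. The threshold, the function names and the sample folders are illustrative, not taken from the study.

```python
import math

def cosine(a, b):
    """Cosine similarity of two feed sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def merge_folders(folders, threshold=0.5):
    """Map each folder name to the higher-ranked folder it is merged into."""
    # Rank folders by number of feeds, largest first (ties broken by name).
    ranked = sorted(folders, key=lambda name: (-len(folders[name]), name))
    kept = {}          # surviving folder -> accumulated feeds
    merged_into = {}
    for name in ranked:
        feeds = folders[name]
        target = next((k for k, f in kept.items()
                       if feeds & f and cosine(feeds, f) >= threshold), None)
        if target is None:
            kept[name] = set(feeds)
            merged_into[name] = name
        else:
            kept[target] |= feeds
            merged_into[name] = target
    return merged_into

folders = {
    "politics": {"Daily Kos", "Eschaton", "Power Line", "Informed Comment"},
    "political blogs": {"Daily Kos", "Eschaton", "Power Line"},
    "knitting": {"knitting feed A", "knitting feed B"},   # placeholder feeds
}
mapping = merge_folders(folders)
print(mapping)
# {'politics': 'politics', 'political blogs': 'politics', 'knitting': 'knitting'}
```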

[Figure: Folder Usage. Number of users using a folder vs. rank of a
folder (by number of feeds in it)]
47
Tag Cloud After Merging
Folder names are used as topics. Lower-ranked
folders are merged into a higher-ranked folder if
there is an overlap and a high cosine similarity.
48
Feed Recommendations
http://ftm.umbc.edu
ICWSM 07
  • Two feeds are similar if they are categorized
    under similar folders
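One possible reading of this rule, sketched under assumptions: each feed is described by how many subscribers filed it under each folder name, and two feeds are compared with cosine similarity over those counts. The counts and feed variables below are invented for the example.

```python
import math
from collections import Counter

def feed_similarity(folders_a, folders_b):
    """Cosine similarity between two feeds' folder-name count vectors."""
    dot = sum(count * folders_b[name] for name, count in folders_a.items())
    norm_a = math.sqrt(sum(v * v for v in folders_a.values()))
    norm_b = math.sqrt(sum(v * v for v in folders_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical counts: folder name -> number of users filing the feed there.
feed_x = Counter({"politics": 3, "news": 1})
feed_y = Counter({"politics": 2, "news": 2})
feed_z = Counter({"knitting": 4})

print(round(feed_similarity(feed_x, feed_y), 3))  # 0.894
print(feed_similarity(feed_x, feed_z))            # 0.0
```

Ranking the other feeds by this score for a given feed X yields an "if you like X" list.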

If you like X you will like..
  • Feed Distillation for Politics
  • Merged folders: political, political blogs
  • Talking Points Memo by Joshua Micah Marshall
  • Daily Kos State of the Nation
  • Eschaton
  • The Washington Monthly
  • Wonkette, Politics for People with Dirty Minds
  • http://instapundit.com/
  • Informed Comment
  • Power Line
  • AMERICAblog Because a great nation deserves the
    truth
  • Crooks and Liars

Tech
Knitting
49
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

50
Wikipedia is our collective wisdom.
Twitter is our collective consciousness.
51
Microblogging
SNAKDD 07
Current Status
Twitter post
Friends
Easily share status messages
52
Twitterment
http://twitterment.umbc.edu
  • First twitter search engine
  • Uses Lucene to index public timeline
  • Provides search and analytics
  • Built a social network of users
  • 1.3 M Tweets
  • 83 K Users
  • Two months of data

53
Microblogging Trend Analytics
Search and Trend analytics on Microblogs
http://twitterment.umbc.edu
lunch
dinner
work
coffee
54
Microblogging Communities
SNAKDD 07
Gaming Community
  • Clique Percolation Method (CPM)
  • Two nodes belong to the same community if they
    can be connected through adjacent k-cliques.
    (Palla et al.)
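A small, unoptimized sketch of the clique percolation idea for illustration: enumerate all k-cliques, treat two cliques as adjacent when they share k - 1 nodes, and take the union of each connected group of cliques as a community. The brute-force enumeration only suits tiny graphs, and the function name and example graph are assumptions.

```python
from itertools import combinations

def k_clique_communities(adj, k=3):
    """Overlapping communities as unions of adjacent k-cliques (brute force)."""
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))          # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:   # adjacent k-cliques
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Triangles 1-2-3 and 2-3-4 share an edge; triangle 4-5-6 shares only node 4.
adj = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
communities = k_clique_communities(adj, k=3)
print(communities)  # [{1, 2, 3, 4}, {4, 5, 6}]
```

Node 4 appears in both communities, which is the overlapping behaviour the method is known for.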

Finds overlapping communities
A Community is a union of all k-clique subgraphs
3 Clique
55
INFORMATION HUB
Information Source: communities connected via
Robert Scoble, an A-list blogger
56
INFORMATION BRIDGE
Information Source, Information Seeker: different
roles in different communities
57
STAR NETWORKS / SMALL CLIQUES
Friendship-relation: small groups among
friends/co-workers
58
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

59
Thesis Statement
  • It is possible to develop effective algorithms
    to detect Web-scale communities using their
    inherent properties, structure, content.
  • Observations
  • Understanding communication in social media
    requires identifying and modeling communities
  • Communities are a result of collective, social
    interactions and usage.

60
Future Work
  • Social media content is challenging; many
    improvements are needed in textual analysis,
    sentiment detection, named entity detection and
    language understanding in such systems.
  • Temporal analysis of community structures
  • Feed distillation and ranking in blog search
  • Index quality vs. index freshness
  • User intention and personalization

61
Outline
  • Introduction
  • Detecting Communities in Social Media
  • Combining Semantic Information
  • Case Studies
  • Feed Usage and Distillation
  • Microblogging Communities
  • Future Work
  • Conclusions

62
Conclusions
  • Demonstrated a fast community detection
    algorithm well suited for social datasets.
  • Implemented SimCUT, a technique that outperforms
    simple graph-based approaches for community
    detection.
  • Evaluated and tested proposed algorithms on real
    social media datasets and benchmark datasets.
  • Conducted the first comprehensive study of feed
    readership and microblogging usage.
  • Built systems, infrastructure and datasets for
    the social media research community.

63
Conclusions
  • We have presented a framework for analyzing
    social media content and structure making use of
    certain special properties and features in such
    systems.
  • We study the Social Web from a user perspective
    and analyze not just how people are using these
    systems but also why.
  • Social Media is connecting people and building
    communities by bridging the gap between content
    production and consumption.

64
Thanks!
65
The Future.
  • Location
  • Social, mobile applications
  • Geographically relevant, query(less) search
  • Social Advertising and Personalization
  • Role of influence and communities in advertising
  • Real-Time, Social Information Streams
  • Event detection/ Breaking News
  • How effective is the advertising?
  • Social Web to solve challenging AI problems
  • Just as tagging has helped image search
  • Availability of social tools and Wikipedia
    provide opportunities to work on difficult AI
    problems like disambiguation and common sense
    reasoning.

66
http://ebiquity.umbc.edu
http://socialmedia.typepad.com