EVENT IDENTIFICATION IN SOCIAL MEDIA

1
EVENT IDENTIFICATION IN SOCIAL MEDIA
  • Hila Becker, Luis Gravano (Columbia University)
  • Mor Naaman (Rutgers University)

2
Social Media Sites Host Many Event Documents
  • Event: something that occurs at a certain time
    in a certain place [Yang et al. 99]
  • Popular, widely known events: Presidential
    Inauguration, Thanksgiving Day Parade
  • Smaller events, without traditional news
    coverage: local food drive, street fair

  • Photo-sharing: Flickr
  • Video-sharing: YouTube
  • Social networking: Facebook

[Figure: social media documents for the All Points West festival, Liberty State Park, New Jersey, 8/8/08]

3
Identifying Events and Associated Social Media
Documents
  • Applications: event search and browsing, local search
  • General approach: group similar documents via
    clustering. Each cluster corresponds to one event
    and its associated social media documents

4
Event Identification Challenges
  • Uneven data quality
  • Missing, short, or uninformative text
  • ... but revealing structured context is available:
    tags, date/time, geo-coordinates
  • Scalability
  • Dynamic data stream of event information
  • Unknown number of events
  • The number of events is a required input for many
    clustering algorithms
  • Difficult to estimate

5
Clustering Social Media Documents
  • Social media document representation
  • Social media document similarity
  • Social media document clustering
  • Clustering task definition
  • Ensemble algorithm combining multiple clustering
    results
  • Preliminary evaluation

6
Social Media Document Representation
  • Title
  • Description
  • Tags
  • Date/Time
  • Location
  • All-Text

7
Social Media Document Similarity
  • Text: tf-idf weights, cosine similarity
  • Time: proximity in minutes
  • Location: geo-coordinate proximity

[Figure: documents A and B compared feature by feature: Title, Description, Tags, Date/Time-Keywords, Location-Keywords, Date/Time-Proximity, Location-Proximity, All-Text]

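The three similarity types on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoothed idf formula, the one-year time window (`window_minutes`), and the scaling of haversine distance by half the Earth's circumference are all choices made here so that each score falls in [0, 1].

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption) keeps weights positive for common terms.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def time_similarity(minutes_a, minutes_b, window_minutes=365 * 24 * 60):
    """Map a time gap in minutes to [0, 1]; gaps beyond the window score 0."""
    gap = abs(minutes_a - minutes_b)
    return max(0.0, 1.0 - gap / window_minutes)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def location_similarity(point_a, point_b):
    """Map geo-coordinate proximity to [0, 1]; antipodal points score 0."""
    max_km = math.pi * EARTH_RADIUS_KM  # half the Earth's circumference
    return max(0.0, 1.0 - haversine_km(*point_a, *point_b) / max_km)


# Hypothetical tag lists for three photos.
tag_vectors = tfidf_vectors([["apw", "festival"], ["apw", "festival"], ["parade"]])
print(cosine_similarity(tag_vectors[0], tag_vectors[1]))
print(cosine_similarity(tag_vectors[0], tag_vectors[2]))
```

Each feature (title, description, tags, all-text, time, location) would get its own score this way, so a document pair is described by several numbers rather than one.
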
8
Social Media Document Clustering Framework
[Figure: pipeline from social media documents, through document feature representation, to event clusters]

9
Clustering Ensemble Algorithm
  • Clusterers: C_title, C_tags, C_time
  • Weights: W_title, W_tags, W_time (learned in a training step)
  • Consensus function f(C, W): combines ensemble similarities
  • Output: ensemble clustering solution

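One way to read the consensus function f(C, W) is as a weighted vote over the individual clusterers. The sketch below assumes each clusterer contributes a binary prediction for a document pair (1 if it places both documents in the same cluster, 0 otherwise) and that f is a weighted average of those predictions. The function names and the example weights are hypothetical and chosen only for illustration.

```python
def binary_prediction(assignment, doc_a, doc_b):
    """Return 1 if one clusterer puts both documents in the same cluster."""
    return 1 if assignment[doc_a] == assignment[doc_b] else 0


def consensus_similarity(predictions, weights):
    """Weighted average of per-clusterer binary predictions for one document pair.

    predictions: dict mapping clusterer name -> 0 or 1
    weights: dict mapping clusterer name -> non-negative weight
    """
    total = sum(weights[name] for name in predictions)
    if total == 0:
        return 0.0
    return sum(weights[name] * predictions[name] for name in predictions) / total


# Hypothetical cluster assignments from three clusterers (document id -> cluster id).
c_title = {"photo1": 0, "photo2": 1}
c_tags = {"photo1": 4, "photo2": 4}
c_time = {"photo1": 2, "photo2": 2}

# Hypothetical weights; in the slides these are learned in a training step.
weights = {"title": 0.25, "tags": 0.5, "time": 0.25}

predictions = {
    "title": binary_prediction(c_title, "photo1", "photo2"),
    "tags": binary_prediction(c_tags, "photo1", "photo2"),
    "time": binary_prediction(c_time, "photo1", "photo2"),
}
print(consensus_similarity(predictions, weights))  # 0.75
```

The resulting score in [0, 1] can then serve as the pairwise similarity for a final clustering step.
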
10
Clustering: Measuring Quality
  • Homogeneous clusters
  • Complete clusters
  • Metric: Normalized Mutual Information (NMI), the
    shared information between the clustering
    solution and the ground truth
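NMI can be computed directly from two label lists. Several normalizations exist; the sketch below assumes the variant that divides mutual information by the arithmetic mean of the two entropies, which may differ from the exact variant used in the evaluation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred, truth):
    """Mutual information between predicted clusters and ground-truth events."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    pred_counts = Counter(pred)
    truth_counts = Counter(truth)
    return sum(
        (c / n) * math.log(c * n / (pred_counts[p] * truth_counts[t]))
        for (p, t), c in joint.items()
    )


def nmi(pred, truth):
    """NMI normalized by the mean of the two entropies (an assumed variant)."""
    h_pred, h_truth = entropy(pred), entropy(truth)
    if h_pred + h_truth == 0:
        return 1.0  # both sides are a single cluster, so the partitions agree
    return 2 * mutual_information(pred, truth) / (h_pred + h_truth)


# Cluster ids are arbitrary: a relabeled copy of the truth still scores 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
# A clustering that is independent of the truth scores 0.
print(nmi([0, 1, 0, 1], [0, 0, 1, 1]))
```

A score near 1 means the clusters are both homogeneous and complete with respect to the labeled events; splitting an event across clusters or merging events lowers it.
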

11
Experimental Setup
  • Data: >270K Flickr photos
  • Event labels from Yahoo!'s Upcoming event
    database
  • Split into 3 parts for training/validation/testing
  • Clusterers: single-pass algorithm with centroid
    similarity
  • Weighting scheme: Normalized Mutual Information
    (NMI) scores on validation set
  • Consensus function: weighted average of
    clusterers' binary predictions
  • Final prediction step: single-pass clustering
    algorithm
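A single-pass clusterer with centroid similarity can be sketched as below. It processes documents once, in order, and never needs the number of clusters in advance. The `threshold` parameter, the running-mean centroid update, and the use of cosine similarity on sparse dict vectors are assumptions made for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold, similarity=cosine_similarity):
    """Assign each vector to its most similar centroid, or open a new cluster.

    Returns a list with one cluster id per input vector.
    """
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    assignments = []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = similarity(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No centroid is similar enough: start a new cluster.
            centroids.append(dict(vec))
            sizes.append(1)
            assignments.append(len(centroids) - 1)
        else:
            # Fold the new vector into the cluster's running mean.
            size, centroid = sizes[best], centroids[best]
            for term in set(centroid) | set(vec):
                centroid[term] = (centroid.get(term, 0.0) * size + vec.get(term, 0.0)) / (size + 1)
            sizes[best] = size + 1
            assignments.append(best)
    return assignments


# Hypothetical tag vectors for four photos from two events.
photos = [{"apw": 1.0}, {"apw": 1.0, "stage": 0.1}, {"parade": 1.0}, {"parade": 2.0}]
print(single_pass_cluster(photos, threshold=0.5))  # [0, 0, 1, 1]
```

The threshold controls granularity: a higher value yields more, tighter clusters, and it would typically be tuned on held-out data.
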

12
Preliminary Evaluation Results
  • Individual clusterer performance
  • Highest NMI: Tags, All-Text
  • Lowest NMI: Description, Title
  • Ensemble performance, compared against all
    individual clusterers
  • Highest overall performance in terms of NMI
  • More homogeneous clusters; each event is spread
    over fewer clusters

Details in paper
13
Future Work: Alternative Choices
  • Document similarity metric
  • Ensemble approach
  • Weight assignment
  • Choice of clusterers
  • Train a classifier to predict document similarity
  • Features correspond to similarity scores
  • All-text, title, tags, time, location, etc.
  • Numeric values in [0,1]
  • State-of-the-art classifiers: SVM, Logistic
    Regression, ...
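The classifier idea listed here can be illustrated with a small logistic regression trained on per-feature similarity scores. This is a sketch of the general technique only: the plain gradient-descent trainer, its hyperparameters, and the toy training pairs are all made up for illustration and are not results or settings from the presentation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_same_event(weights, bias, x):
    """Probability that a document pair belongs to the same event."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy pairs (made up): [all-text, tags, time] similarity scores in [0, 1].
pairs = [
    [0.9, 0.8, 0.95], [0.7, 0.9, 0.9], [0.8, 0.6, 0.99],  # same event
    [0.1, 0.2, 0.3], [0.2, 0.1, 0.5], [0.3, 0.0, 0.1],    # different events
]
same_event = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(pairs, same_event)
print(predict_same_event(weights, bias, [0.85, 0.85, 0.9]))
print(predict_same_event(weights, bias, [0.1, 0.1, 0.2]))
```

The predicted probability plays the same role as the ensemble's consensus score: a learned pairwise similarity that a final clustering step can consume.
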

14
Future Work: Alternative Choices
  • Final clustering step
  • Apply graph partitioning algorithms
  • Requires estimating the number of clusters
  • Evaluation metrics beyond NMI
  • Datasets
  • Flickr, LastFM, YouTube
  • Exploit social network connections

15
Conclusions
  • Identified events and their corresponding social
    media documents
  • Proposed a clustering solution
  • Leveraged different representations of social
    media documents
  • Employed various social media similarity metrics
  • Developed a weighted ensemble clustering approach
  • Reported preliminary results of our event
    identification approach on a large-scale dataset
    of Flickr photographs