A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs

1
A Probabilistic Approach to Spatiotemporal Theme
Pattern Mining on Weblogs
  • Qiaozhu Mei, Chao Liu, Hang Su, and ChengXiang
    Zhai
  • University of Illinois at Urbana-Champaign
  • Vanderbilt University

2
Motivation
  • Text collections carry rich contextual information,
    with time and location among the most important
    • Weblogs, news articles, customer reviews, etc.
  • Different weblogs focus on different aspects of
    the same event, depending strongly on the time
    and location at which the authors wrote them
  • In this work, we study weblogs:
    • Modeling the mixture of subtopics (themes)
    • Spatiotemporal content analysis of the themes

3
Existing Work on Weblog Analysis
  • Interlinking and community analysis
    • Identifying communities
    • Monitoring the evolution and bursting of
      communities

  [Figure, Kumar et al. 2003: number of nodes in
  communities; number of communities]
  • Content analysis
    • Blog-level topic analysis
    • Information diffusion through blogspace
    • Using topic bursts to predict sales spikes

  [Figure, Gruhl et al. 2005: blog mentions and
  sales rank]
4
Characteristics of Weblogs
  • Content is highly personal and evolves quickly
  • Usually rich in public opinions
  • A mixture of topics
  • Fast response to events
  • Associated with time and location information
  • Interlinking and community structure

  [Diagram: Weblogs, Time, Location, Personal Opinions]
5
Spatiotemporal theme patterns
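  [Diagram: themes, locations, time]
  [Figure: theme life cycles. Theme "Discussion about
  the Release of iPod Nano"; strength over time in
  the United States, China, and the UK]
  [Figure: theme snapshot. Theme "Discussion about
  Government Response in Hurricane Katrina" across
  locations; dates shown: 09/20/05 and 09/26/05]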
6
Applications
  • Answer questions such as:
    • What features of the iPod do people in the US
      like, compared with people in China? How
      strongly are people in Illinois concerned about
      gas prices during Hurricane Katrina?
  • Potentially useful for
    • Search result summarization
    • Public opinion monitoring
    • Web content analysis
    • Business intelligence

7
Problem Definition
  • Spatiotemporal theme patterns
    • Theme life cycle
      • Strength of a theme over time at a given
        location
      • Formally, $p(t \mid \theta, l)$
    • Theme snapshot
      • Distribution of themes over a set of locations
        at a given time
      • Formally, $p(\theta, l \mid t)$
  • Spatiotemporal theme pattern (STTP) discovery
    • Automatically extract a set of major themes
    • Compute theme life cycles and theme snapshots
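To make the two definitions concrete, here is a minimal sketch assuming we already have a hypothetical table of theme "strength" indexed by (theme, time, location), for example expected word counts per theme. A life cycle $p(t \mid \theta, l)$ and a snapshot $p(\theta, l \mid t)$ are then just two different conditional normalizations of that table. The table values and function names are made up for illustration.

```python
# Hypothetical strength table: (theme, time, location) -> unnormalized strength,
# e.g. the number of words attributed to each theme. Values are made up.
strength = {
    ("oil_price", "09/01", "IL"): 6.0,
    ("oil_price", "09/02", "IL"): 2.0,
    ("oil_price", "09/01", "TX"): 3.0,
    ("relief",    "09/01", "IL"): 2.0,
    ("relief",    "09/02", "IL"): 4.0,
    ("relief",    "09/01", "TX"): 1.0,
}

def life_cycle(strength, theme, location):
    """p(t | theme, location): strength of one theme over time at one location."""
    sub = {t: s for (th, t, l), s in strength.items() if th == theme and l == location}
    total = sum(sub.values())
    return {t: s / total for t, s in sub.items()}

def snapshot(strength, time):
    """p(theme, location | time): distribution of themes over locations at one time."""
    sub = {(th, l): s for (th, t, l), s in strength.items() if t == time}
    total = sum(sub.values())
    return {key: s / total for key, s in sub.items()}

print(life_cycle(strength, "oil_price", "IL"))  # {'09/01': 0.75, '09/02': 0.25}
```

The life cycle fixes the theme and location and normalizes over time; the snapshot fixes the time and normalizes jointly over themes and locations.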

8
Challenging Questions
  • How to represent a theme?
  • How to model the themes in a collection?
  • How to model their dependency on time and
    location?
  • How to compute the theme life cycles and theme
    snapshots?

9
Our Approach
  • Probabilistic spatiotemporal theme analysis
    • Each theme is represented as a multinomial
      distribution over the vocabulary (a language model)
    • Consider the collection as generated from a
      mixture of these theme models
    • Introduce a spatiotemporal theme model to explain
      this generation process
    • Spatiotemporal theme patterns can then be
      computed analytically from the estimated model
      parameters
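As a small illustration of "a theme is a multinomial distribution over the vocabulary", the sketch below stores a theme as a word-to-probability map, checks that it is a valid distribution, lists its top words (the way themes are displayed on the result slides), and scores a bag of words under it. The words, probabilities, and helper names are hypothetical.

```python
import math

# A theme as a unigram language model: word -> p(w | theme). Toy values.
theme = {"price": 0.4, "oil": 0.3, "gas": 0.2, "fuel": 0.1}

def is_distribution(model, tol=1e-9):
    """True if all probabilities are non-negative and they sum to 1."""
    return all(p >= 0 for p in model.values()) and abs(sum(model.values()) - 1.0) < tol

def top_words(model, k):
    """The k most probable words, highest first."""
    return sorted(model.items(), key=lambda kv: kv[1], reverse=True)[:k]

def log_likelihood(model, words):
    """Log-probability of a bag of words drawn independently from this one theme."""
    return sum(math.log(model[w]) for w in words)

print(top_words(theme, 2))
```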

10
The Generating Process
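  [Figure: the generating process]
  • Each word w in document d (written at time t in
    location l) is generated as follows:
    • With probability $\lambda_B$, w is drawn from
      the background model B
      (e.g., is 0.05, the 0.04, a 0.03, ...)
    • With probability $1 - \lambda_B$, a theme
      $\theta_j$ is chosen and w is drawn from it:
      • with probability $\lambda_{TL}$, $\theta_j$ is
        chosen by the spatiotemporal context,
        $P(\theta_j \mid t, l)$
      • with probability $1 - \lambda_{TL}$, $\theta_j$
        is chosen by the spatiotemporally independent
        topic coverage, $\pi_{d,j} = P(\theta_j \mid d)$
  • Example themes:
    • $\theta_1$: price 0.3, oil 0.2, ...
    • $\theta_2$: donate 0.1, relief 0.05, help 0.02, ...
    • $\theta_k$: city 0.2, new 0.1, orleans 0.05, ...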
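The generating process can be sketched in code. Assuming $\lambda_{TL}$ weights the spatiotemporal branch (our reading of the slide figure), the probability of a word is $p(w \mid d, t, l) = \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_j [(1-\lambda_{TL})\, \pi_{d,j} + \lambda_{TL}\, p(\theta_j \mid t, l)]\, p(w \mid \theta_j)$. The function names and toy distributions below are hypothetical; `word_prob` evaluates that mixture and `sample_word` follows the same branches to draw one word.

```python
import random

def word_prob(w, background, themes, pi_d, p_theme_tl, lambda_b, lambda_tl):
    """p(w | d, t, l) under the assumed spatiotemporal theme mixture."""
    theme_part = 0.0
    for j, theta in enumerate(themes):
        # Weight of theme j blends document coverage and (t, l) coverage.
        weight = (1 - lambda_tl) * pi_d[j] + lambda_tl * p_theme_tl[j]
        theme_part += weight * theta.get(w, 0.0)
    return lambda_b * background.get(w, 0.0) + (1 - lambda_b) * theme_part

def sample_word(rng, background, themes, pi_d, p_theme_tl, lambda_b, lambda_tl):
    """Draw one word by following the generating process branch by branch."""
    if rng.random() < lambda_b:
        dist = background                     # background model B
    else:
        # Choose which coverage picks the theme: (t, l) context or document.
        coverage = p_theme_tl if rng.random() < lambda_tl else pi_d
        j = rng.choices(range(len(themes)), weights=coverage)[0]
        dist = themes[j]
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words])[0]

# Toy, made-up parameters for one document at one (t, l).
background = {"the": 0.6, "a": 0.4}
themes = [{"price": 0.6, "oil": 0.4}, {"donate": 0.5, "relief": 0.5}]
pi_d = [0.5, 0.5]          # p(theta_j | d)
p_theme_tl = [1.0, 0.0]    # p(theta_j | t, l)
print(word_prob("price", background, themes, pi_d, p_theme_tl, 0.5, 0.5))
```

Summing `word_prob` over the whole vocabulary gives 1, which is a quick sanity check that the mixture weights are wired correctly.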
11
The Spatiotemporal Theme Model
  • General mixture model
  • Log likelihood after decomposition
  • Computing theme life cycles
  • Computing theme snapshots
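The four formulas on this slide did not survive extraction. Below is a hedged reconstruction, not the authors' exact notation: the mixture and its (undecomposed) log likelihood follow the generating process, assuming $\lambda_{TL}$ weights the spatiotemporal coverage and $c(w, d)$ is the count of $w$ in $d$; the life cycle and snapshot follow from Bayes' rule, assuming $p(t, l)$ is estimated as the fraction of words written at time $t$ in location $l$.

```latex
% General mixture model for word w in document d (time t_d, location l_d)
p(w \mid d, t_d, l_d) = \lambda_B\, p(w \mid B)
  + (1-\lambda_B) \sum_{j=1}^{k} \Big[ (1-\lambda_{TL})\, \pi_{d,j}
  + \lambda_{TL}\, p(\theta_j \mid t_d, l_d) \Big]\, p(w \mid \theta_j)

% Log likelihood of the collection C over vocabulary V
\log p(C) = \sum_{d \in C} \sum_{w \in V} c(w, d)\, \log p(w \mid d, t_d, l_d)

% Theme life cycle
p(t \mid \theta, l) = \frac{p(\theta \mid t, l)\, p(t, l)}
                           {\sum_{t'} p(\theta \mid t', l)\, p(t', l)}

% Theme snapshot (uses \sum_{\theta} p(\theta \mid t, l) = 1)
p(\theta, l \mid t) = \frac{p(\theta \mid t, l)\, p(t, l)}{\sum_{l'} p(t, l')}
```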

12
Parameter Estimation
  • $P(w \mid B)$ is estimated by the collection word
    probability
  • Use the EM algorithm to estimate the remaining
    model parameters
    • E step
    • M step
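The E-step and M-step formulas are not recoverable from the slide. The sketch below is a generic EM for the assumed mixture (background weight $\lambda_B$ and $\lambda_{TL}$ held fixed, $p(w \mid B)$ from collection word frequencies, as stated above). The E step splits each word occurrence among the background and each (theme, coverage source) pair; the M step renormalizes the expected counts. All function names and the toy documents are hypothetical; this is not the authors' implementation.

```python
import math
import random

def _normalize_list(xs):
    s = sum(xs)
    return [x / s for x in xs] if s > 0 else [1.0 / len(xs)] * len(xs)

def _normalize_dict(d):
    s = sum(d.values())
    return {w: v / s for w, v in d.items()} if s > 0 else {w: 1.0 / len(d) for w in d}

def collection_background(docs):
    """p(w | B) estimated by collection word frequency."""
    totals = {}
    for counts, _ in docs:
        for w, c in counts.items():
            totals[w] = totals.get(w, 0) + c
    return _normalize_dict(totals)

def em(docs, background, vocab, k, lambda_b, lambda_tl, iters=50, seed=0):
    """docs: list of (counts, tl); counts maps word -> c(w, d), tl is a (time, location) key.
    Returns themes[j][w] = p(w|theta_j), pi[d][j] = p(theta_j|d), phi[tl][j] = p(theta_j|t,l)."""
    rng = random.Random(seed)
    themes = [_normalize_dict({w: rng.random() + 0.01 for w in vocab}) for _ in range(k)]
    pi = [[1.0 / k] * k for _ in docs]
    phi = {tl: [1.0 / k] * k for _, tl in docs}
    for _ in range(iters):
        exp_theme = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        exp_pi = [[0.0] * k for _ in docs]
        exp_phi = {tl: [0.0] * k for tl in phi}
        for d, (counts, tl) in enumerate(docs):
            for w, c in counts.items():
                # E step: posterior of background vs. (theme j, coverage source).
                from_doc = [(1 - lambda_b) * (1 - lambda_tl) * pi[d][j] * themes[j][w] for j in range(k)]
                from_tl = [(1 - lambda_b) * lambda_tl * phi[tl][j] * themes[j][w] for j in range(k)]
                total = lambda_b * background[w] + sum(from_doc) + sum(from_tl)
                for j in range(k):
                    a = c * from_doc[j] / total  # expected count of theme j via p(theta_j|d)
                    b = c * from_tl[j] / total   # expected count of theme j via p(theta_j|t,l)
                    exp_theme[j][w] += a + b
                    exp_pi[d][j] += a
                    exp_phi[tl][j] += b
        # M step: renormalize expected counts into distributions.
        themes = [_normalize_dict(t) for t in exp_theme]
        pi = [_normalize_list(p) for p in exp_pi]
        phi = {tl: _normalize_list(p) for tl, p in exp_phi.items()}
    return themes, pi, phi

def log_likelihood(docs, background, themes, pi, phi, lambda_b, lambda_tl):
    """Collection log likelihood; EM should never decrease it."""
    ll = 0.0
    for d, (counts, tl) in enumerate(docs):
        for w, c in counts.items():
            mix = sum(((1 - lambda_tl) * pi[d][j] + lambda_tl * phi[tl][j]) * themes[j][w]
                      for j in range(len(themes)))
            ll += c * math.log(lambda_b * background[w] + (1 - lambda_b) * mix)
    return ll

# Toy, made-up documents keyed by (day, location).
docs = [
    ({"oil": 3, "price": 2, "the": 4}, ("d1", "TX")),
    ({"donate": 2, "relief": 3, "the": 3}, ("d1", "IL")),
    ({"oil": 2, "gas": 2, "the": 2}, ("d2", "TX")),
]
background = collection_background(docs)
themes, pi, phi = em(docs, background, sorted(background), k=2, lambda_b=0.3, lambda_tl=0.5)
```

Tracking `log_likelihood` across iterations is a cheap way to catch bugs: with the same initialization, it should be non-decreasing.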
13
Generality of the Model
  [Figure: the spatiotemporal theme model]
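One way to see the model's generality, assuming the mixture $p(w \mid d) = \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_j [(1-\lambda_{TL})\, \pi_{d,j} + \lambda_{TL}\, p(\theta_j \mid t_d, l_d)]\, p(w \mid \theta_j)$, is to look at its boundary settings. This is a derivation from that assumed form, not a transcription of the slide's figure.

```latex
% lambda_TL = 0: time and location are ignored; each document keeps its own
% theme coverage (a PLSA-style mixture with a fixed background model)
p(w \mid d) = \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)

% lambda_TL = 1: theme coverage depends only on the document's (t_d, l_d)
p(w \mid d) = \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_{j=1}^{k} p(\theta_j \mid t_d, l_d)\, p(w \mid \theta_j)
```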
14
Experiments and Results
  • Three weblog data sets, each about one event
    (broad topic)
  • Extract location information from author profiles
  • On each data set, we extract a set of salient
    themes and compute their life cycles and theme
    snapshots
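Before fitting the model, each post has to be assigned a (time, location) cell. A minimal sketch, assuming day-level time stamps and a hypothetical `profile_location` field standing in for the location extracted from the author profile, groups posts into cells and estimates $p(t, l)$ as the fraction of words in each cell (the quantity the life cycle and snapshot formulas normalize by). The post records are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post records; "profile_location" stands in for the location
# taken from the author's profile.
posts = [
    {"time": "2005-09-01T10:15:00", "profile_location": "Illinois", "text": "gas price went up again"},
    {"time": "2005-09-01T18:40:00", "profile_location": "Texas", "text": "oil price gas price"},
    {"time": "2005-09-02T08:05:00", "profile_location": "Illinois", "text": "donate to relief"},
]

def cell_counts(posts):
    """Word counts c(t, l) per (day, location) cell."""
    counts = Counter()
    for post in posts:
        day = datetime.fromisoformat(post["time"]).date().isoformat()
        counts[(day, post["profile_location"])] += len(post["text"].split())
    return counts

def p_time_location(counts):
    """p(t, l): fraction of all words written in each (day, location) cell."""
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

print(p_time_location(cell_counts(posts)))
```

The day granularity is an assumption for illustration; coarser or finer buckets only change how `day` is computed.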

15
An Example Weblog Article
  [Figure: blog contents]
16
Theme life cycles for Hurricane Katrina
  • Oil Price: price 0.0772, oil 0.0643, gas 0.0454,
    increase 0.0210, product 0.0203, fuel 0.0188,
    company 0.0182
  • New Orleans: city 0.0634, orleans 0.0541,
    new 0.0342, louisiana 0.0235, flood 0.0227,
    evacuate 0.0211, storm 0.0177
17
Theme Snapshots for Hurricane Katrina
18
Theme life cycles for Hurricane Rita
  [Figure: life cycles of the themes "Hurricane
  Katrina Government Response", "Hurricane Rita
  Government Response", and "Hurricane Rita Storms"]
  • A Hurricane Katrina theme is triggered again by
    Hurricane Rita
19
Theme Snapshots for Hurricane Rita
  • Hurricane Katrina and Hurricane Rita share the
    theme Oil Price
  • The spatiotemporal patterns of this theme over the
    same time period are similar
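The slide reports the two Oil Price snapshots as similar without stating how similarity was judged. One way to quantify it, offered here as an assumption rather than the authors' measure, is the Jensen-Shannon divergence between the two snapshot distributions over locations (0 for identical, 1 for disjoint with base-2 logs). The location names and probabilities below are made up.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two distributions
    given as dicts; a key missing from one dict counts as probability 0."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # KL(a || m); terms with a(k) = 0 contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Hypothetical Oil Price snapshots over locations for the two hurricanes.
katrina = {"TX": 0.40, "IL": 0.30, "FL": 0.30}
rita = {"TX": 0.45, "IL": 0.30, "FL": 0.25}
print(js_divergence(katrina, rita))
```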
20
Theme life cycles for iPod Nano
  [Figure: theme life cycles in the United States,
  China, Canada, and the United Kingdom]
  • Release of Nano: ipod 0.2875, nano 0.1646,
    apple 0.0813, september 0.0510, mini 0.0442,
    screen 0.0242, new 0.0200
21
Summary and Future Work
  • We defined a new spatiotemporal text mining
    problem: discovering spatiotemporal theme
    patterns
  • We proposed a mixture model of themes that
    depends on time and location information, and
    designed methods to compute theme life cycles
    and theme snapshots
  • We applied this technique to weblog mining; the
    experiments show that the model is effective at
    discovering spatiotemporal theme patterns
  • Future work
    • Consider adjacency between time stamps and
      locations
    • Study granularity selection in spatiotemporal
      text mining
    • Model content evolution over locations

22
  • Thanks!