Thesis Proposal: Prediction of popular social annotations - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Thesis Proposal: Prediction of popular social annotations
  • Abon

2
Outline
  • Background
  • Related Work
  • Problem Definition
  • Possible Solution
  • Experiment Plan
  • Evaluation Plan

3
Background
  • Prevalence of social web services
  • What do they have in common?
  • Tags: user-generated content
4
Background: What are tags for?
  • According to the del.icio.us founder:
  • Tags are one-word descriptors that you can assign
    to your bookmarks on del.icio.us to help you
    organize and remember them. Tags are a little bit
    like keywords, but they're chosen by you, and
    they do not form a hierarchy. You can assign as
    many tags to a bookmark as you like and rename or
    delete the tags later. So, tagging can be a lot
    easier and more flexible than fitting your
    information into preconceived categories or
    folders.

6
Background: A usage example
7
Why TAGs are useful
  • In the information retrieval field, query
    expansion is a common technique for retrieving
    more related data.
  • Tags act like human-expanded index terms.

9
Query expansion here
10
Why TAGs are useful
  • Traditional term expansion schemes rely on
    term-document relations, and each term's
    importance to a document is often determined by
    tf-idf.
  • Each tag a user applies is like a vote for
    which tag should go with a document. Thus,
    term-document relations can be measured by tag
    applications.
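
The voting idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the input format (one tag set per user) and the function name `tag_vote_weights` are hypothetical, and normalizing the votes to sum to 1 is one possible choice, not something the slides specify.

```python
from collections import Counter


def tag_vote_weights(tag_applications):
    """Turn per-user tag applications for one document into weights.

    tag_applications: list of tag collections, one per user
    (hypothetical input format). Each user casts at most one
    vote per tag; weights are vote shares that sum to 1.
    """
    votes = Counter()
    for user_tags in tag_applications:
        votes.update(set(user_tags))  # de-duplicate within one user
    total = sum(votes.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in votes.items()}


# Three users tag the same document.
applications = [{"python", "tutorial"}, {"python", "programming"}, {"python"}]
weights = tag_vote_weights(applications)
```

Here "python" receives three of the five votes, so it gets the largest weight for this document.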

11
Why TAGs are useful
  • Tags are a human-expanded query set, which
    enables more complete concept mapping.
  • As more and more people apply tags, the
    popularity of tags reaches a stable pattern,
    and the top tags could be used as weighting
    parameters for search optimization.

12
Related Work
  • S. A. Golder and B. A. Huberman, Usage patterns
    of collaborative tagging systems, J. Inf. Sci.,
    Vol. 32, No. 2 (April 2006), pp. 198-208.
  • After 100 users, a stable pattern appears
  • Urn model

13
Stable pattern: the top 7 tags remain for one year
14
Related Work
  • Collaborative Tagging and Semiotic Dynamics
  • C. Cattuto, V. Loreto, L. Pietronero
  • Long-term memory version of the classic
    Yule-Simon process
  • Memory model based on a cognitive model

15
Yule-Simon process
$Q_t(x) = \dfrac{a(t)}{x + \tau}$, where $a(t)$ is a normalizing
factor and $\tau$ is the memory parameter
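
A minimal simulation sketch of a Yule-Simon process with this kind of memory kernel follows. It is an illustration, not the authors' code. The assumptions are: `x` counts how many steps back in the stream a tag is copied from, `p_new` is the probability of introducing a brand-new tag (the slide does not give it), and the function names are hypothetical.

```python
import random


def memory_kernel(t, tau):
    """Return [Q_t(1), ..., Q_t(t)] with Q_t(x) = a(t) / (x + tau).

    a(t) is chosen so that the probabilities sum to 1.
    """
    raw = [1.0 / (x + tau) for x in range(1, t + 1)]
    a_t = 1.0 / sum(raw)
    return [a_t * r for r in raw]


def simulate_tag_stream(steps, p_new, tau, seed=0):
    """Simulate a tag stream; tags are integers in order of first use."""
    rng = random.Random(seed)
    stream = [0]  # the stream starts with one tag
    next_tag = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            # Assumed step: invent a new tag with probability p_new.
            stream.append(next_tag)
            next_tag += 1
        else:
            # Copy the tag used x steps ago, with x drawn from Q_t(x).
            t = len(stream)
            x = rng.choices(range(1, t + 1), weights=memory_kernel(t, tau))[0]
            stream.append(stream[t - x])
    return stream


stream = simulate_tag_stream(steps=200, p_new=0.1, tau=5.0)
```

Because the kernel decreases with `x`, recently used tags are more likely to be copied than old ones. A larger `tau` flattens the kernel, so recent tags have more nearly equal chances.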
16
Related work
  • The Complex Dynamics of Collaborative Tagging
  • H. Halpin, V. Robu, H. Shepherd, in Proceedings
    of WWW 2007

17
Empirical Results for Power Law Regression for
Popular Sites
18
P(x): the tag probability distribution at each time
point
Q(x): the final tag probability distribution
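
One way to compare P(x) against Q(x) is the Kullback-Leibler divergence. The slide itself does not name the measure, so treating it as KL divergence is an assumption here, as are the dictionary input format and the `eps` floor for tags missing from Q.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum over tags x of P(x) * log(P(x) / Q(x)).

    p, q: dicts mapping tag -> probability (hypothetical format).
    eps guards against division by zero when a tag is absent from q.
    """
    total = 0.0
    for tag, p_x in p.items():
        if p_x > 0:
            total += p_x * math.log(p_x / max(q.get(tag, 0.0), eps))
    return total


final = {"web": 0.5, "design": 0.3, "css": 0.2}   # Q(x), illustrative values
early = {"web": 0.7, "design": 0.3}               # P(x) at an early time point
distance = kl_divergence(early, final)
```

A divergence that shrinks toward zero over successive time points would indicate that the tag distribution is converging to its final shape.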
19
Problem definition
  • In the initial stage, a URL has not been
    sufficiently annotated by people, so it is hard
    to retrieve at this time.
  • For an immature URL, predicting its future
    popular tags could provide a better retrieval
    experience.
  • Mature URL: a definition borrowed from Halpin's
    empirical results on tag dynamics. Mature URLs
    are defined as URLs with 3 or more years of
    history on del.icio.us.

20
Expanding tag set
  • $T_i$: the tag set applied by the $i$-th user to
    a URL.
  • $ET_i$: the expanded tag set after the $i$-th user.
  • $T_0$: the tag set suggested by tf-idf term
    extraction. STiT0
  • $ET_i = ET_{i-1} \cup \mathrm{relevant}_n(T_i)$
  • $\mathrm{relevant}_n(T_i)$: the $n$ tags with the
    highest mutual information with each tag in $T_i$
  • Mutual information: $\dfrac{f(t_i, t_j)}{f(t_i)\, f(t_j)}$
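
The expansion step can be sketched as follows. This is a rough illustration under assumptions the slides do not confirm: f(t) is taken to be the number of posts carrying tag t, f(ti, tj) the number of posts carrying both, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_tags(posts):
    """posts: list of tag sets. Returns (f(t) counts, f(ti, tj) counts)."""
    single, pair = Counter(), Counter()
    for tags in posts:
        tags = set(tags)
        single.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair[(a, b)] += 1  # keys are alphabetically ordered pairs
    return single, pair


def mutual_information(a, b, single, pair):
    """f(a, b) / (f(a) * f(b)), the ratio given on the slide."""
    key = (a, b) if a < b else (b, a)
    denom = single[a] * single[b]
    return pair[key] / denom if denom else 0.0


def relevant_n(tag_set, n, single, pair):
    """For each tag in tag_set, collect its n highest-scoring partner tags."""
    related = set()
    for t in tag_set:
        scored = [(mutual_information(t, c, single, pair), c)
                  for c in single if c != t]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: (-sc[0], sc[1]))  # ties broken by name
        related.update(c for _, c in scored[:n])
    return related


def expand(prev_expanded, user_tags, n, single, pair):
    """ET_i = ET_{i-1} union relevant_n(T_i)."""
    return set(prev_expanded) | relevant_n(user_tags, n, single, pair)


posts = [{"python", "code"}, {"python", "code"},
         {"python", "snake"}, {"code", "java"}]
single, pair = count_tags(posts)
expanded = expand({"tutorial"}, {"python"}, 1, single, pair)
```

Note that this ratio favors rare tags: "snake" co-occurs with "python" only once, yet it outranks "code" because its own count is small.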

21
Cohesivity
  • Each tag in $ET_i$ has a score that indicates
    its cohesivity to $ET_i$
  • Cohesivity of $t_j$ to $ET_i$:
    $\sum_{t_k \in ET_i} \dfrac{f(t_k, t_j)}{f(t_j)\, f(t_k)}$
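
A small sketch of the cohesivity score is given below. It assumes f counts are available as plain dictionaries (pair counts keyed by `frozenset`), and it skips the term where t_k equals t_j, since the slide does not say how f(t_j, t_j) would be defined. Both choices are assumptions made for illustration.

```python
def cohesivity(tag, expanded, single, pair):
    """Sum of f(tk, tag) / (f(tag) * f(tk)) over the other tags tk in expanded.

    single: dict tag -> count; pair: dict frozenset({a, b}) -> count.
    The self term (tk == tag) is skipped; this is an assumption.
    """
    score = 0.0
    for other in expanded:
        if other == tag:
            continue
        denom = single.get(tag, 0) * single.get(other, 0)
        if denom:
            score += pair.get(frozenset((tag, other)), 0) / denom
    return score


# Illustrative counts, not taken from the dataset.
single = {"python": 3, "code": 3, "snake": 1}
pair = {frozenset(("python", "code")): 2, frozenset(("python", "snake")): 1}
expanded = {"python", "code", "snake"}
scores = {t: cohesivity(t, expanded, single, pair) for t in expanded}
```

In this toy example "python" co-occurs with both other tags, so it scores highest, while "code" and "snake" never co-occur with each other and score lower.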

22
Pruning $ET_i$
  1. Sort the tags in $ET_i$ by popularity and take
    the top 7 as the suggested tag set $ST_i$
  2. Sort the tags in $ET_i$ by popularity combined
    with cohesivity and take the top 7 as the
    suggested tag set $ST_i$
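
Both pruning variants can be sketched with one function. The transcript does not show how popularity and cohesivity are combined, so the product used here is an assumption, and the function name and input dictionaries are hypothetical.

```python
def prune(popularity, cohesivity=None, k=7):
    """Return the top-k tags as the suggested tag set.

    popularity: dict tag -> popularity score.
    cohesivity: optional dict tag -> cohesivity score. When given,
    tags are ranked by popularity * cohesivity (an assumed combination).
    """
    def score(tag):
        if cohesivity is None:
            return popularity[tag]
        return popularity[tag] * cohesivity.get(tag, 0.0)

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(popularity, key=lambda t: (-score(t), t))[:k]


popularity = {"a": 10, "b": 5, "c": 1}          # illustrative values
cohesion = {"a": 0.1, "b": 0.5, "c": 0.2}
by_popularity = prune(popularity, k=2)
by_both = prune(popularity, cohesion, k=2)
```

With these toy numbers the two variants rank the tags differently: popularity alone puts "a" first, while the combined score puts "b" first.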

23
Experiment Plan
  • Dataset from the del.icio.us RSS API, Mar 28 to
    April 19: 30,000 URLs, 234,982 tag applications,
    8,392 users
  • 1. del.icio.us/rss/popular every 30 min
  • del.icio.us/rss/recent every 2 min
  • 2. del.icio.us/rss/url?url=xxx.com
  • Suggest tags from no user up to the 10th user.

24
Evaluation Plan
  • For each URL, we have the mature tags and the
    suggested tags at each iteration.
  • Recall and precision can then be calculated.
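
As a sketch of this evaluation, precision and recall can be computed per URL by treating the mature tags as the ground truth. The function name and the example tags are illustrative only.

```python
def precision_recall(suggested, mature):
    """Compare suggested tags against the mature tags of one URL.

    precision = |suggested ∩ mature| / |suggested|
    recall    = |suggested ∩ mature| / |mature|
    """
    suggested, mature = set(suggested), set(mature)
    hits = len(suggested & mature)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(mature) if mature else 0.0
    return precision, recall


precision, recall = precision_recall(
    suggested=["web", "css", "blog"],
    mature=["web", "css", "design", "html"],
)
```

Two of the three suggested tags appear among the four mature tags, giving a precision of 2/3 and a recall of 1/2 in this example.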

                                       Pruning with cohesivity
                                       with    without
Expanding with relevant tags  with     4.      2.
                              without  3.      1. (Baseline)