Blog Spam Panel AIRWeb 2006


1
Blog Spam Panel AIRWeb 2006
  • Dennis Fetterly
  • Microsoft Research, Silicon Valley

2
Scope of the problem
  • Random sample of 2694 blog pages judged by MSN
  • Low cost of creation
  • 39% of spam blog pages were from 4 popular blog
    hosting sites

3
Blog Spam
  • Content-based techniques are effective at finding
    some portion of splog pages
  • Identification of re-purposed content is an open
    problem
      • Whole document
      • Just sentences or phrases
  • Several software packages will automate this
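
One common way to look for re-purposed text, at either whole-document or phrase level, is to compare sets of overlapping word windows (shingles). The sketch below is only an illustration under that assumption; the slides do not say which technique was used, and the function names, the window size w=5, and the sample spam text are all hypothetical.

```python
import re


def shingles(text, w=5):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < w:
        # Short texts yield a single shingle, or nothing if there are no words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a, b):
    """Whole-document similarity: shared shingles over all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(a, b):
    """Phrase-level reuse: fraction of a's shingles that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


# Hypothetical example: a legitimate passage embedded in a spam post.
source_passage = (
    "There's one problem with adopting a dog from an animal shelter: "
    "The selection of available canine companions can overwhelm you"
)
suspect_post = "Buy cheap pills now. " + source_passage + " Click here for more."

src = shingles(source_passage)
sus = shingles(suspect_post)
print("containment:", containment(src, sus))
print("jaccard:", round(jaccard(src, sus), 3))
```

Containment stays high when a passage is lifted into a longer page, while Jaccard drops as unrelated text is added, so the two scores roughly correspond to the "just sentences or phrases" and "whole document" cases.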

4
Detectable example

5
Harder example
  • 234 results for phrase query
  • "There's one problem with adopting a dog from an
    animal shelter: The selection of available canine
    companions can overwhelm you"
  • Some are legitimate
  • Others aren't
  • Other posts in these blogs create optimal link
    structure for PageRank as identified by Bianchini
    et al.
  • Also significant overlap with other pages
  • 12 minor differences in first 776 words
  • Missing final 617 words
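
Overlap figures like the last two bullets (a count of small differences plus a truncated ending) can be approximated with a word-level diff. The following is a minimal sketch using Python's standard difflib; it is not the method behind the numbers on the slide, and the function name and sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def compare_to_source(source_text, suspect_text):
    """Return (differences, missing_tail) for a suspect copy of a source.

    differences  : number of changed regions (insert/delete/replace)
                   before any trailing cut
    missing_tail : number of words at the end of the source that are
                   absent from the suspect copy
    """
    a = source_text.split()
    b = suspect_text.split()
    ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()

    missing_tail = 0
    # A final "delete" opcode means the copy stops before the source does.
    if ops and ops[-1][0] == "delete":
        missing_tail = ops[-1][2] - ops[-1][1]
        ops = ops[:-1]

    differences = sum(1 for tag, *_ in ops if tag != "equal")
    return differences, missing_tail


# Hypothetical texts: one word changed, last five words dropped.
source_text = "the quick brown fox jumps over the lazy dog and runs far away today"
suspect_text = "the quick red fox jumps over the lazy dog"

print(compare_to_source(source_text, suspect_text))
```

Each contiguous changed region counts as one difference regardless of its length, which is one reasonable reading of "minor differences"; a per-word count would need the opcode spans summed instead.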

6
Things to Consider
  • Once you have identified re-purposed content,
    what do you do?

7
  • Thank You!
  • http://research.microsoft.com/research/sv/web-group/