The use of Optical Character Recognition (OCR) software in spam filtering - PowerPoint PPT Presentation

Slides: 11
Provided by: NCF1
Learn more at: http://www.cs.ucf.edu

1
The use of Optical Character Recognition (OCR) software in spam filtering
  • By Scott Conrad

2
Spam is changing from text-only to multimedia-enhanced
  • Legitimate message senders have added multimedia
    content, particularly images, to text-based
    emails
  • Source: Using Visual Features for Anti-Spam
    Filtering, 2005

3
Instances of spam/phishing

4
Instances of spam/phishing
Source: Spam Filtering Based On The Analysis Of
Text Information Embedded Into Images, 2006

5
Optical Character Recognition (OCR)
  • Uses pattern recognition to convert images of
    text into machine-readable text

Source: Using Visual Features for Anti-Spam
Filtering, 2005

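As a rough illustration of the pattern-recognition idea on this slide, the sketch below classifies each character bitmap by comparing it with stored templates and picking the closest match. It is a toy template matcher under stated assumptions: the 3x5 font in `TEMPLATES` and all function names are hypothetical, characters are assumed to be already segmented into equal-sized binary grids, and real OCR engines use far more robust features and classifiers.

```python
# Toy OCR by template matching (hypothetical font and names, for illustration only).
# Each glyph is a list of strings; "#" is an ink pixel, "." is background.

TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def to_bits(rows):
    """Flatten a glyph into a list of booleans (True = ink)."""
    return [c == "#" for row in rows for c in row]


def hamming(a, b):
    """Count differing pixels between two equal-sized bitmaps."""
    if len(a) != len(b):
        raise ValueError("glyphs must have the same size")
    return sum(x != y for x, y in zip(a, b))


def recognize_glyph(rows):
    """Return the template character with the fewest differing pixels."""
    bits = to_bits(rows)
    return min(TEMPLATES, key=lambda ch: hamming(bits, to_bits(TEMPLATES[ch])))


def recognize_line(glyphs):
    """Recognize a sequence of pre-segmented glyphs as a string."""
    return "".join(recognize_glyph(g) for g in glyphs)


# Usage: a "T" with one flipped pixel still matches the "T" template best.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognize_glyph(noisy_t))  # T
print(recognize_line([TEMPLATES["L"], TEMPLATES["I"], TEMPLATES["T"]]))  # LIT
```

The noisy `T` differs from the `T` template by one pixel but from `I` and `L` by three and nine, so it is still read correctly. Pixel-level matching like this degrades quickly under rotation, scaling, or added noise, which is one reason obscured text in spam images can defeat OCR.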
6
OCR papers
  • Using Visual Features for Anti-Spam Filtering
  • by Ching-Tung Wu, Kwang-Ting Cheng, Qiang Zhu,
    and Yi-Leh Wu
  • Spam Filtering Based On The Analysis Of Text
    Information Embedded Into Images
  • by Giorgio Fumera, Ignazio Pillai, and Fabio
    Roli
  • Learning Fast Classifiers for Image Spam
  • by Mark Dredze, Reuven Gevaryahu, and Ari
    Elias-Bachrach
  • Image Analysis for Efficient Categorization of
    Image-based Spam E-mail
  • by Hrishikesh B. Aradhye, Gregory K. Myers, and
    James A. Herson

7
General Methodology
  • Paper: Using Visual Features for Anti-Spam Filtering
  • Created a Bayesian spam filter for Thunderbird
  • Ran the filter against a spam archive
  • Added OCR capabilities to the filter
  • Ran the filter against the same spam archive again
  • The detection rate rose from 47.7% to 84.6%
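The before-and-after experiment on this slide can be sketched as a small multinomial naive Bayes classifier whose input is the message body plus any text an OCR step extracted from the attached images. This is a minimal sketch, not the paper's or Thunderbird's actual filter: the class and function names are hypothetical, `ocr_text` is assumed to be supplied by some OCR engine, and the detection rate is assumed to be the percentage of spam messages the filter flags.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text):
        """log P(spam | text) - log P(ham | text), up to a shared constant."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total = sum(counts.values())
            logp = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokenize(text):
                logp += math.log((counts[tok] + 1) / (total + len(vocab)))
            score[label] = logp
        return score["spam"] - score["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


def message_text(body, ocr_text=""):
    """Pool body text with OCR output (hypothetical: ocr_text comes from an OCR engine)."""
    return body + " " + ocr_text


def detection_rate(flags):
    """Percentage of spam messages flagged as spam."""
    if not flags:
        raise ValueError("no messages")
    return 100.0 * sum(flags) / len(flags)


# Usage with a tiny made-up training set.
f = NaiveBayesFilter()
for text in ("cheap meds buy now", "win money now click"):
    f.train(text, "spam")
for text in ("meeting notes attached", "lunch tomorrow at noon"):
    f.train(text, "ham")

body = "see attached image"  # image-only spam: little text in the body
ocr = "buy cheap meds now"   # hypothetical text read from the image
print(f.is_spam(message_text(body)))       # False
print(f.is_spam(message_text(body, ocr)))  # True
```

Pooling OCR tokens into the same token stream as the body is the simplest design: the classifier itself needs no changes, and an image spam with an innocuous body can still be caught by the words in its image. Running `detection_rate` over a spam archive once with `ocr_text` left empty and once with it filled in mirrors the comparison on the slide.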

8
Countermeasures to OCR
  • Image Spam Filtering by Content Obscuring
    Detection
  • by Battista Biggio, Giorgio Fumera, Ignazio Pillai,
    and Fabio Roli
  • Filtering Image Spam with Near-Duplicate
    Detection
  • by Zhe Wang, William Josephson, Qin Lv, Moses
    Charikar, and Kai Li
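Near-duplicate detection targets the fact that spam campaigns send many slightly altered copies of the same image. One common way to compare images cheaply, swapped in here for illustration and not the specific method of Wang et al., is average hashing: shrink the image to an 8x8 grid, set one bit per cell depending on whether the cell is brighter than the grid mean, and compare hashes by Hamming distance. The function names and the threshold of 5 differing bits are hypothetical, and images are assumed to be grayscale grids (lists of rows) of at least 8x8 pixels.

```python
def downsample(img, size=8):
    """Average a grayscale image (list of rows) down to a size x size grid."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size")
    grid = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(img, size=8):
    """Pack one bit per cell: 1 if the cell is brighter than the grid mean."""
    cells = [v for row in downsample(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(img_a, img_b, threshold=5):
    # threshold is a hypothetical tuning parameter
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold


# Usage: a 16x16 gradient, a brightened copy, and an inverted (unrelated) image.
base = [[r * 16 + c for c in range(16)] for r in range(16)]
brighter = [[v + 10 for v in row] for row in base]
inverted = [[255 - v for v in row] for row in base]
print(is_near_duplicate(base, brighter))  # True
print(is_near_duplicate(base, inverted))  # False
```

Because each bit compares a cell with the image's own mean, a uniform brightness change leaves the hash unchanged, while the inverted image flips every bit.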

9
Images from paper 2

10
Project Goals
  • Research different multimedia-based spam filters
    and any countermeasures that spammers have
    developed against these filters
  • Attempt to recreate one of these spam filters to
    verify its published results