Fighting Spam: An Innovative Enhancement to Outlook Express - PowerPoint PPT Presentation


Slides: 14
Provided by: zpan7
Transcript and Presenter's Notes

Title: Fighting Spam: An Innovative Enhancement to Outlook Express


1
Fighting Spam: An Innovative Enhancement to Outlook Express
  • Zhengxiang Pan
  • Yuanbo Guo

2
Target: Outlook Express
  • Current anti-spam functionalities in OE
  • Blocked senders list
  • Mail rules
  • Limitations: limited rule-based filter
  • Difficulty in generating rules
  • Lack of flexibility
  • Not adaptive: spam mutates!
  • Free -> F r e e -> Free
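The "spam mutates" point can be illustrated with a small sketch. The `literal_rule` function below is a hypothetical stand-in for a keyword-based mail rule, not Outlook Express's actual rule engine; it only shows why a fixed literal match misses a spaced-out variant of the same word.

```python
def literal_rule(subject: str, keyword: str = "free") -> bool:
    """Hypothetical mail rule: flag the message if the keyword appears verbatim."""
    return keyword in subject.lower()


subjects = ["Free offer inside", "F r e e offer inside"]

# The second subject carries the same word, but the rule no longer matches it.
flags = [literal_rule(s) for s in subjects]
print(flags)
```

Each new obfuscation would need another hand-written rule, which is the maintenance burden the slide describes.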

3
What did we design?
  • An Intelligent Spam Identification Component
    (ISIC) that uses IDSS techniques, specifically
    case-based reasoning (CBR).
  • Absorbs ideas from rule-based and statistical
    filters
  • Features dynamic attribute selection and
    heuristic-guided case base maintenance

4
Case Representation
  • Attribute-value pairs
  • Possible values: Yes and No
  • Two sets of attributes
  • 51 predefined attributes
  • About specific properties of an email
  • Selected from http://www.spamassassin.org
  • 100 dynamically determined attributes
  • About word occurrences in the email

5
Predefined Attributes - Examples

6
Dynamically Determined Attributes
  • Attribute selection
  • Use the odds ratio as the indicator of the predictive
    power of a word for the categories (spam,
    non-spam) and rank the words by it
  • Select the top 50 words from each vocabulary of
    spam emails and non-spam emails as the attributes
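The selection step above can be sketched as follows. The slides do not give the exact odds-ratio formula, so this sketch assumes the common form OR(w) = P(w|c)(1 - P(w|other)) / ((1 - P(w|c)) P(w|other)), computed on document frequencies with a small smoothing term. The function names, the whitespace tokenizer, and the smoothing constants are all illustrative assumptions.

```python
import math
from collections import Counter


def doc_freq(emails):
    """Count, for each word, how many emails contain it at least once."""
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))
    return counts


def log_odds_ratio(word, pos_df, neg_df, n_pos, n_neg):
    # Smoothing (an assumption) keeps both probabilities strictly between 0 and 1.
    p = (pos_df[word] + 0.5) / (n_pos + 1.0)
    q = (neg_df[word] + 0.5) / (n_neg + 1.0)
    return math.log(p * (1 - q) / ((1 - p) * q))


def top_words(pos_df, neg_df, n_pos, n_neg, per_class):
    """Rank the words of one class's vocabulary by odds ratio, ties alphabetically."""
    ranked = sorted(
        pos_df,
        key=lambda w: (-log_odds_ratio(w, pos_df, neg_df, n_pos, n_neg), w),
    )
    return ranked[:per_class]


def select_attributes(spam, ham, per_class=50):
    """Return the top words for the spam vocabulary and for the non-spam vocabulary."""
    spam_df, ham_df = doc_freq(spam), doc_freq(ham)
    spam_words = top_words(spam_df, ham_df, len(spam), len(ham), per_class)
    ham_words = top_words(ham_df, spam_df, len(ham), len(spam), per_class)
    return spam_words, ham_words


# Toy corpus, invented for illustration only.
spam = ["free debt relief guaranteed", "free money guaranteed", "hello free offer"]
ham = ["hello meeting tomorrow", "hello project notes", "lunch tomorrow"]
spam_words, ham_words = select_attributes(spam, ham, per_class=2)
print(spam_words, ham_words)
```

With `per_class=50`, the two returned lists together would give the 100 dynamically determined attributes.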

(More details are in the paper.)

7
An Example Case
  • Case 1
  • (predefined attributes)
  • CHARSET_FARAWAY: No
  • TO_EMPTY: Yes
  • FROM_AND_TO_SAME: Yes
  • LOTS_OF_CC_LINE: Yes
  • MISSING_HEADERS: Yes
  • (dynamically selected attributes)
  • Free: Yes
  • Guaranteed: Yes
  • Debt: Yes
  • Hello: No
  • (solution)
  • Spam: Yes
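One possible in-memory form of this case is sketched below. The `Case` class and `to_vector` helper are hypothetical names chosen for illustration; only the attribute names and Yes/No values come from the slide, with Yes mapped to `True` and No to `False`.

```python
from dataclasses import dataclass


@dataclass
class Case:
    attributes: dict  # attribute name -> bool (Yes = True, No = False)
    is_spam: bool     # the case's solution


case_1 = Case(
    attributes={
        # predefined attributes
        "CHARSET_FARAWAY": False,
        "TO_EMPTY": True,
        "FROM_AND_TO_SAME": True,
        "LOTS_OF_CC_LINE": True,
        "MISSING_HEADERS": True,
        # dynamically selected attributes (word occurrences)
        "Free": True,
        "Guaranteed": True,
        "Debt": True,
        "Hello": False,
    },
    is_spam=True,
)


def to_vector(case, attribute_order):
    """Flatten a case into a fixed-order list of booleans for comparison."""
    return [case.attributes.get(name, False) for name in attribute_order]


order = ["CHARSET_FARAWAY", "TO_EMPTY", "Free", "Hello"]
vector = to_vector(case_1, order)
print(vector)
```

A fixed attribute order makes two cases comparable position by position.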

8
Similarity Measurement
  • Simple Matching Coefficient (SMC) based on
    Hamming distance

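$$\mathrm{SIM}_H(P, C) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{EQ}(X_i, Y_i), \qquad \mathrm{EQ}(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i = Y_i \\ 0 & \text{otherwise.} \end{cases}$$

9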
Case Retrieval
  • K-nearest-neighbor-like algorithm
  • For a new email P, calculate its similarity SIMH
    to each case in the case base, and pick out the
    top K cases with the largest SIMH values.
  • If the majority of those chosen cases are labeled
    as spam, the new email is classified as spam
    too; otherwise, it is classified as non-spam
  • e.g., K = 5
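The similarity measure and the retrieval step can be sketched together. This is a minimal illustration that assumes each case is stored as a pair of a boolean attribute vector and a spam label; the function names and the toy case base are invented for the example.

```python
def simh(p, c):
    """Simple matching coefficient: fraction of attributes with equal values."""
    if len(p) != len(c):
        raise ValueError("vectors must have the same number of attributes")
    return sum(x == y for x, y in zip(p, c)) / len(p)


def classify(new_email, case_base, k=5):
    """Return True (spam) if most of the k most similar cases are spam."""
    ranked = sorted(case_base, key=lambda case: simh(new_email, case[0]), reverse=True)
    top_k = ranked[:k]
    spam_votes = sum(1 for _, is_spam in top_k if is_spam)
    return spam_votes > len(top_k) / 2


# Toy case base: (attribute vector, is_spam)
case_base = [
    ([True, True, True, True], True),
    ([True, True, False, False], True),
    ([True, False, True, False], True),
    ([False, False, False, True], False),
    ([False, False, False, False], False),
    ([False, True, False, True], False),
]
new_email = [True, True, True, False]

similarity = simh(new_email, case_base[0][0])
is_spam = classify(new_email, case_base, k=5)
print(similarity, is_spam)
```

An odd K, such as the K = 5 from the slide, avoids tied votes between the two classes.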

10
Case Base Maintenance
  • Initially, the spam and non-spam case bases each
    have 200 cases
  • When the case base size reaches 300:
  • Restore the original case base size using a mechanism
    that removes cases that are
  • Old (to keep the cases fresh so that they
    reflect the trend)
  • Close to the Center Case (in an attempt to boost
    the variety of cases)
  • Introduces a new concept, the Center Case, which is
    defined in the paper.
  • Redo attribute selection based on the current cases
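The shrinking step might look like the sketch below. The slides defer the Center Case definition and the exact removal heuristic to the paper, so everything here is an assumption: the center is approximated by the per-attribute majority value, and age and closeness to the center are combined into one hypothetical `removal_score` with an arbitrary weight.

```python
from dataclasses import dataclass


@dataclass
class StoredCase:
    vector: list   # Yes/No attribute values as booleans
    is_spam: bool
    added_at: int  # larger means more recently added


def similarity(a, b):
    """Fraction of attributes with equal values."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def center_vector(cases):
    """Assumed stand-in for the Center Case: the majority value per attribute."""
    n = len(cases)
    width = len(cases[0].vector)
    return [sum(c.vector[i] for c in cases) * 2 > n for i in range(width)]


def shrink(cases, target_size, age_weight=0.5):
    """Drop the oldest and most central cases until target_size remain."""
    if len(cases) <= target_size:
        return list(cases)
    center = center_vector(cases)
    newest = max(c.added_at for c in cases)
    oldest = min(c.added_at for c in cases)
    span = (newest - oldest) or 1  # avoid division by zero

    def removal_score(c):
        age = (newest - c.added_at) / span  # 1.0 for the oldest case
        return age_weight * age + (1 - age_weight) * similarity(c.vector, center)

    keep = sorted(cases, key=removal_score)[:target_size]
    return sorted(keep, key=lambda c: c.added_at)


cases = [
    StoredCase([True, True], True, 0),
    StoredCase([True, True], True, 1),
    StoredCase([True, False], True, 2),
    StoredCase([False, False], False, 3),
]
kept = shrink(cases, target_size=2)
print([c.added_at for c in kept])
```

In the setting described on the slide, `shrink` would be triggered when a case base reaches 300 cases, and attribute selection would be redone on the cases that remain.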

11
Architecture

12
Use enhanced Outlook Express
  • Same UI as OE

13
Conclusion
  • Highlights
  • Localized: easy to construct
  • Personalized
  • Easy to use
  • Adaptive
  • Limitations
  • Initial cases limit personalization
  • Not for standalone use: runs on top of current OE