A False Positive Safe Neural Network for Spam Detection

About This Presentation

Slides: 17

Provided by: nsi12

Learn more at: http://projects.csail.mit.edu

Transcript and Presenter's Notes

1

A False Positive Safe Neural Network for Spam Detection

Alexandru Catalin Cosoi acosoi_at_bitdefender.com
2
Does this look familiar?
3
Anatrim
4
Oh boy, it's getting worse!!!
5
Oh boy, it's getting worse!!!
6
Bad Bad Spammer!!!

Databases
D: Random legitimate text
D1: Different rephrasings of a certain spam phrase
D2: Different rephrasings of another spam phrase
Dn: Different rephrasings of another spam phrase
Create spam message script
Choose a random phrase from D1
Choose random text from D
Choose a random phrase from D2
Choose random text from D
...
Choose a random phrase from Dn
Send message.

Appeared as a consequence of botnets

40 samples of different subjects
50 samples of different titles
30 samples of different titles (part II)
40 x 50 x 30 = 60000 different combinations
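
The script on this slide amounts to picking one entry per phrase database and interleaving random filler text. The following is only a minimal sketch under assumptions: the databases are tiny placeholders, and the names filler, phrase_dbs, build_message and count_variants are illustrative and do not come from the slides. It also shows why the number of variants is the product of the database sizes.

```python
import math
import random

# Hypothetical placeholder databases; the real ones are not given in the slides.
filler = ["filler sentence A", "filler sentence B", "filler sentence C"]  # D
phrase_dbs = [
    ["phrase 1a", "phrase 1b"],  # D1: rephrasings of one phrase
    ["phrase 2a", "phrase 2b"],  # D2: rephrasings of another phrase
    ["phrase 3a", "phrase 3b"],  # Dn
]

def build_message(phrase_dbs, filler, rng):
    """Pick one phrase per database, with random filler between phrases."""
    parts = []
    for i, db in enumerate(phrase_dbs):
        parts.append(rng.choice(db))
        if i < len(phrase_dbs) - 1:
            parts.append(rng.choice(filler))
    return parts

def count_variants(sizes):
    """Number of distinct phrase selections: the product of the database sizes."""
    return math.prod(sizes)

message = build_message(phrase_dbs, filler, random.Random(0))
variants = count_variants([40, 50, 30])
```

With database sizes 40, 50 and 30, count_variants gives the 60000 figure from the slide. The random filler text multiplies the number of distinct messages further, which is presumably why plain keyword or signature matching struggles with this kind of spam.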

7
Features

Larger time frame KeyWord!!!!
Weak features
Words like Anatrim, Viagra, Xanax, Stock
Simple word combinations like "Stock alert", "Strong buy"
Simple header heuristics (for both spam and ham) like valid reply, weird message ID, forged headers
Example
Top 500 spammy words from a Bayesian dictionary
Some simple header heuristics from SpamAssassin's SARE Ninjas
Trainer's personal flavour
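
One way to picture these weak features is as a binary vector with one bit per keyword, word combination and header heuristic. The sketch below is an assumption about the encoding, not the presenter's implementation: the keyword and phrase lists are taken from the examples on this slide, while the two header checks (has_reply_to, odd_message_id) are hypothetical stand-ins for the "valid reply" and "weird message ID" heuristics.

```python
import re

# Illustrative weak features; the actual feature list is not given in the slides.
KEYWORDS = ["anatrim", "viagra", "xanax", "stock"]
PHRASES = ["stock alert", "strong buy"]

def has_reply_to(headers):
    """Hypothetical header heuristic: the message carries a Reply-To header."""
    return "reply-to" in {name.lower() for name in headers}

def odd_message_id(headers):
    """Hypothetical header heuristic: Message-ID is missing or not '<local@domain>'."""
    msg_id = headers.get("Message-ID", "")
    return re.fullmatch(r"<[^<>@\s]+@[^<>@\s]+>", msg_id) is None

HEADER_CHECKS = [has_reply_to, odd_message_id]

def extract_features(headers, body):
    """Return a binary vector: one bit per keyword, phrase and header check."""
    text = body.lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    bits = [int(k in words) for k in KEYWORDS]
    bits += [int(p in text) for p in PHRASES]
    bits += [int(check(headers)) for check in HEADER_CHECKS]
    return bits

spam_bits = extract_features({"Message-ID": "garbage"}, "Stock alert: strong buy today")
ham_bits = extract_features(
    {"Message-ID": "<abc@example.com>", "Reply-To": "a@example.com"}, "Hello"
)
```

No single bit is decisive on its own; the idea on the slide is that many weak signals are combined by the network.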

8
Why ART?

Training occurs by modifying the weights of each
neuron
For large amounts of data, important details may actually be forgotten
ART solves the stability-plasticity dilemma
Based on template detection
An unlimited number of templates means an unlimited number of patterns can be learned
2 self-organizing neural networks + a mapping module = a supervised self-organizing neural network

9
Adaptive Resonance Theory

Similar to a clustering algorithm (with as many clusters as needed)
ARTMAP = ARTa + ARTb + MapField

10
ART Vigilance

A big value: accepts small errors; many small clusters; high precision
A small value: accepts large errors; a few big clusters; errors can appear
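
The effect of vigilance can be illustrated with a simplified ART1-style clusterer over binary vectors. This is a sketch under assumptions: the slides do not say which ART variant or parameters were used, the function name art1_cluster and the beta parameter are illustrative, and the supervised ARTMAP map field is left out entirely.

```python
def art1_cluster(patterns, vigilance, beta=1.0):
    """Simplified ART1 (fast learning) over binary tuples; returns (templates, labels)."""
    templates = []  # one binary template per cluster
    labels = []
    for pattern in patterns:
        norm = sum(pattern)
        overlaps = [sum(p & w for p, w in zip(pattern, t)) for t in templates]
        # Rank existing templates by the choice function |I AND w| / (beta + |w|).
        order = sorted(
            range(len(templates)),
            key=lambda j: overlaps[j] / (beta + sum(templates[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            # Vigilance test: fraction of the input's active bits covered by the template.
            if norm > 0 and overlaps[j] / norm >= vigilance:
                chosen = j
                break
        if chosen is None:
            templates.append(tuple(pattern))  # no resonance: open a new cluster
            chosen = len(templates) - 1
        else:
            # Resonance: shrink the template to the bits shared with the input.
            templates[chosen] = tuple(p & w for p, w in zip(pattern, templates[chosen]))
        labels.append(chosen)
    return templates, labels

patterns = [(1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 1, 1)]
strict_templates, strict_labels = art1_cluster(patterns, vigilance=0.9)
loose_templates, loose_labels = art1_cluster(patterns, vigilance=0.5)
```

On these four toy patterns the strict setting opens three clusters and the loose setting two, matching the trade-off on the slide: a higher vigilance gives more, tighter clusters, while a lower one merges inputs that only partly match a template.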

11
ART
12
Algorithm
13
Corpus

2.5 million spam messages (sampled in waves with a high degree of variation) and around 1000 simple, low-relevance text heuristics (not counting the standard header heuristics).
The first 1000 words (ordered by discrimination, but with a minimum of 10-30 hundred occurrences) from a Bayesian dictionary trained on this corpus, and also standard header heuristics.
Almost 1 million legitimate email messages
75% of the message corpus was used for training the neural network, and
25% was used for testing it.
1.5 days to train!!!!
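
A 75/25 split like the one described can be sketched as follows. The slides do not say how messages were assigned to the two sets, so the seeded random shuffle and the name split_corpus are assumptions made for illustration.

```python
import random

def split_corpus(messages, train_fraction=0.75, seed=0):
    """Shuffle a copy of the corpus and split it into (train, test) lists."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for messages here.
train, test = split_corpus(range(1000))
```

For spam that arrives in waves, a split by time or by wave might be more realistic than a random shuffle, since messages from the same wave would otherwise land in both sets.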

14
Results

FP 1 0.0001
FN 4 20
On some corpora (TREC 2006) our results were not so great (but with current heuristics)
FN 35 (?)
FP 2 email messages! (?)
At least, just a few false positives!
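
For reference, false positive (FP) and false negative (FN) rates can be computed as below. The slides do not state the denominators, so this sketch assumes the usual convention: FP rate is the share of legitimate (ham) messages flagged as spam, and FN rate is the share of spam messages that get through. The labels and counts are made up for illustration and are not the presenter's results.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false_positive_rate, false_negative_rate); labels are 'spam' or 'ham'."""
    pairs = list(zip(true_labels, predicted_labels))
    ham_total = sum(1 for t, _ in pairs if t == "ham")
    spam_total = sum(1 for t, _ in pairs if t == "spam")
    false_pos = sum(1 for t, p in pairs if t == "ham" and p == "spam")
    false_neg = sum(1 for t, p in pairs if t == "spam" and p == "ham")
    fp_rate = false_pos / ham_total if ham_total else 0.0
    fn_rate = false_neg / spam_total if spam_total else 0.0
    return fp_rate, fn_rate

# Made-up toy labels, not data from the presentation.
truth = ["ham"] * 4 + ["spam"] * 4
predicted = ["ham", "ham", "ham", "spam", "spam", "spam", "ham", "ham"]
fp_rate, fn_rate = error_rates(truth, predicted)
```

A filter tuned to be "false positive safe" accepts a higher FN rate in exchange for keeping the FP rate near zero.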

15
Conclusions

ART Simple Features Spam Love
ART False Positives Spam = OMG!!!
(ART) = Heuristic Filter ARTMAP
Must use a lot of email messages. It is very difficult to find representative samples for individual waves.
Can also be applied to other neural networks
Interesting PowerPoint template

16
Thanks

QUESTIONS?
