Ensemble-based Adaptive Intrusion Detection

1
Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
2
Data Mining for Intrusion Detection
Connection Records -> Feature Construction -> feature vectors such as (telnet, 10, 3, ...) and (ftp, 10, 20, ...)
Label Existing Connections -> Training Data
Training Data -> Inductive Learner -> Intrusion Detection Model
3
Some interesting requirements ...
  • New types of intrusions are constantly invented
    by hackers.
  • Example: the coordinated attacks on many e-business
    websites in 2000.
  • Hackers tend to use new types of intrusions that
    the intrusion detection system is unaware of or
    weak at detecting.
  • Data mining for intrusion detection is a very
    data-intensive process.
  • very large data
  • evolving patterns
  • real-time detection

4
Question
  • When new types of intrusions are invented, can we
    quickly adapt our existing model so that it
    detects these new intrusions before they cause
    more damage?
  • If we don't have a solution, the new attack will
    cause significant damage.
  • For this kind of problem, having a solution that
    is not completely satisfactory is better than
    having no solution.

5
Naive Approach - Complete Re-training
Existing Training Data + New Data -> Merged Training Data
Merged Training Data -> Inductive Learner -> NEW Intrusion Detection Model
6
Problem with the Naive Approach
  • Since the data (existing plus new) will be very
    large, it takes a long time to compute a
    detection model.
  • By the time the model is constructed, the new
    attack will probably have already done
    considerable damage to our system.

7
New Approach
New Data -> Learner -> NEW Model
Existing Model + NEW Model -> Combined Model
Key point: we compute the new model only from the
data on new types of intrusions.
8
How do we label connections?
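A new connection is first given to the existing model,
which labels it as normal, as a previously known
intrusion type, or as unrecognized.
An unrecognized connection is passed to the NEW model,
which labels it as normal or as a new intrusion type.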
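The routing on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `existing_model` and `new_model` are hypothetical callables that map a connection record to a label, the label strings ("normal", "known_intrusion", "anomaly", "new_intrusion") are illustrative, and the toy rules inside the two stand-in models are invented only to make the example runnable.

```python
def classify(conn, existing_model, new_model, recheck=frozenset({"anomaly"})):
    """Two-step labeling: ask the existing model first and consult the new
    model only when the existing model's label is in `recheck`."""
    first = existing_model(conn)
    if first not in recheck:
        return first          # existing model recognized the connection
    return new_model(conn)    # unrecognized: let the new model decide


# Toy stand-ins (hypothetical rules, for illustration only).
def existing_model(conn):
    service = conn[0]
    if service == "telnet":
        return "normal"
    if service == "ftp":
        return "known_intrusion"
    return "anomaly"          # neither normal nor a known intrusion


def new_model(conn):
    # Assumed rule: a large second feature marks the new intrusion type.
    return "new_intrusion" if conn[1] > 100 else "normal"


labels = [
    classify(c, existing_model, new_model)
    for c in [("telnet", 10, 3), ("ftp", 10, 20), ("smtp", 500, 1)]
]
```

Passing `recheck={"normal", "anomaly"}` instead forwards connections that the existing model calls normal as well, which is one way to express a configuration in which the new model re-examines more of the existing model's predictions.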
9
Basic Idea
  • Existing model is built to identify THREE classes:
  • normal
  • some known type of intrusion
  • anomaly: some connection that is neither
    normal nor a known type of intrusion.
  • Anomaly detection: we use the artificial
    anomaly generation method (Fan et al., ICDM 2001).

10
Anomaly Detection
  • Generate "artificial anomalies" from the training
    data, similar to "near misses".
  • Artificial anomalies are data points that are
    different from the training data.
  • The algorithm concentrates on feature values that
    are infrequent in the training data.
  • Distribution-based Artificial Anomaly (Fan et al.,
    ICDM 2001)
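To make the idea of near misses concrete, here is a simplified Python illustration. It is not the published Distribution-based Artificial Anomaly algorithm: the number of attempts per value and the way a record is perturbed are assumptions chosen only to reflect the two properties stated above (generated points differ from the training data, and infrequent feature values get more attention). The function name and the toy records are hypothetical.

```python
import random
from collections import Counter


def artificial_anomalies(records, seed=0):
    """Return near-miss records that do not occur in `records`.

    Assumed scheme: for every feature, each value rarer than the most
    frequent one gets (max_count - count) attempts. An attempt copies a
    random training record and overwrites that one feature with the
    rare value.
    """
    rng = random.Random(seed)
    known = set(records)
    anomalies = []
    for f in range(len(records[0])):
        counts = Counter(r[f] for r in records)
        max_count = max(counts.values())
        for value, count in counts.items():
            for _ in range(max_count - count):
                base = rng.choice(records)
                candidate = base[:f] + (value,) + base[f + 1:]
                if candidate not in known:  # must differ from training data
                    anomalies.append(candidate)
    return anomalies


# Toy training data: (protocol, service) tuples.
training = [("tcp", "http"), ("tcp", "http"), ("tcp", "ftp"), ("udp", "http")]
anomalies = artificial_anomalies(training)
```

In this toy data the only combination absent from the training set is ("udp", "ftp"), so every generated anomaly is that record. Each anomaly differs from some real record in a single feature, which is what makes it a near miss.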

11
Four Configurations
  • H1(x): existing model.
  • H2(x): new model.
  • The configurations differ in how H2(x) is computed,
  • how H1(x) and H2(x) are combined,
  • and how a connection is processed and classified.

12
Configuration I
13
Configuration II
14
Configuration III
15
Configuration IV
16
Experiment
  • 1998 DARPA Intrusion Detection Evaluation Dataset
  • 22 different types of intrusions.

17
Experiment
  • Intrusions are introduced into the training data
    in sequence, to simulate new intrusions being
    invented and launched by hackers.
  • 22! unique sequences
  • We randomly chose 3 unique sequences.
  • The results are averaged.
  • Learner: RIPPER
  • unordered rulesets
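The size of the sequence space and the sampling step can be sketched as follows. This is a hedged illustration: the intrusion names are placeholders, and the seed and sampling routine are assumptions, not the procedure used in the experiment.

```python
import math
import random

N_TYPES = 22
# Placeholder names; the slides do not list the 22 intrusion types.
intrusion_types = [f"intrusion_{i:02d}" for i in range(1, N_TYPES + 1)]

# Number of distinct orders in which the 22 types could be introduced.
total_orderings = math.factorial(N_TYPES)

# Draw 3 distinct random orderings (fixed seed for repeatability).
rng = random.Random(0)
sequences = set()
while len(sequences) < 3:
    sequences.add(tuple(rng.sample(intrusion_types, k=N_TYPES)))
```

Since 22! is about 1.12 × 10^21, enumerating every ordering is out of the question, which is why only a few random sequences are evaluated and their results averaged.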

18
3 Unique Sequences
19
Measurements
  • All results are measured on the new intrusion types.
  • Precision
  • If I catch a potential thief, what is the
    probability that it is a real thief?
  • Recall
  • What is the probability that real thieves are
    detected?
  • Anomaly Detection Rate: the fraction classified
    as anomaly.
  • Other: the fraction classified as other types of
    intrusions.
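The four measurements can be computed from true and predicted labels as sketched below. This is one reading of the slide, not a confirmed definition: it assumes that the anomaly and "other" rates are taken over the connections that truly belong to the new intrusion type. The function name and label strings are hypothetical.

```python
def new_intrusion_metrics(actual, predicted, new_type,
                          normal="normal", anomaly="anomaly"):
    """Precision, recall, anomaly rate and 'other' rate for one new type."""
    pairs = list(zip(actual, predicted))
    # True labels of everything the model flagged as the new type.
    flagged = [a for a, p in pairs if p == new_type]
    # Predictions for everything that really is the new type.
    real = [p for a, p in pairs if a == new_type]
    true_pos = sum(1 for a in flagged if a == new_type)

    def ratio(count, total):
        return count / total if total else 0.0

    return {
        "precision": ratio(true_pos, len(flagged)),
        "recall": ratio(true_pos, len(real)),
        "anomaly_rate": ratio(sum(p == anomaly for p in real), len(real)),
        "other_rate": ratio(
            sum(p not in (new_type, normal, anomaly) for p in real),
            len(real),
        ),
    }


actual = ["new", "new", "new", "new", "normal", "normal", "old"]
predicted = ["new", "new", "anomaly", "old", "new", "normal", "old"]
metrics = new_intrusion_metrics(actual, predicted, new_type="new")
```

In this toy example three connections are flagged as the new type and two of them really are, so precision is 2/3. Of the four real new intrusions, two are caught (recall 0.5), one is labeled an anomaly (0.25), and one is mistaken for an older intrusion type (0.25).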

20
Precision Results
21
Recall Results
22
Anomaly Detection Rate
23
Other Detection Rate Results
24
Summary of results
  • The most accurate is Configuration I, where
  • the new model is trained from normal data and the
    new intrusion type
  • all connections predicted as normal or anomaly by
    the old model are examined by the new model.
  • Reason
  • The existing model's precision in detecting normal
    connections influences the combined model's
    accuracy.
  • New data is limited in amount. Artificial
    anomalies generated from new data are limited as
    well.

25
Training Efficiency
26
Related Work (incomplete list)
  • Anomaly Detection
  • SRI's IDES uses the probability distribution of
    past activities to measure the abnormality of
    host events. We measure network events.
  • Forrest et al. use the absence of subsequences to
    measure abnormality.
  • Lane and Brodley employ a similar approach but
    use incremental learning to update stored
    sequences of UNIX shell commands.
  • Ghosh and Schwartzbard use a neural network to
    learn a profile of normality and a distance
    function to detect abnormality.
  • Generating Artificial Data
  • Nigam et al. assign labels to unlabeled data using
    a classifier trained from labeled data.
  • Chang and Lippmann applied voice transformation
    techniques to add artificial training talkers to
    increase variability.
  • Multiple classifiers
  • Asker and Maclin, "Ensembles as a sequence of
    classifiers"

27
Summary and Future Work
  • Proposed a two-step, two-classifier approach for
    efficient training and fast model deployment.
  • Empirically tested in the intrusion detection
    domain.
  • Need to test whether it works well in other
    domains.