1
Using Artificial Anomalies to Detect Known and
Unknown Network Intrusions
Wei Fan (IBM Research); Matt Miller, Sal Stolfo
(Columbia University); Wenke Lee (Georgia Tech);
Philip Chan (Florida Tech)
December 1, 2001
2
Anomaly Detection and Classification
  • Differences
  • A classification system builds models to detect
    repeated patterns of known event types.
  • Anomaly detection tracks inconsistencies that
    deviate from the "known" and "expected".
  • Example: Intrusion Detection Systems
  • Misuse detection detects known intrusion types.
  • Anomaly detection detects network events
    different from normal events and known
    intrusions. Anomalies are likely to be newly
    launched intrusions.
  • Training Data
  • Classification: clearly labeled examples.
  • Anomaly detection: no labeled anomalous data;
    otherwise, they would not be anomalies.

3
Problem
  • Problem: How can inductive learning be used for
    anomaly detection?
  • A wide range of inductive learners is available.
  • They produce comprehensible models.
  • Solution: Compute artificial anomaly data from
    the classification training data, converting
    anomaly detection into classification.
  • All artificial anomalies are assigned the label
    "anomaly".
  • For example, use normal and known intrusion data
    to compute artificial anomalies.

4
Some Observations on Inductive Learning Algorithms
  • Inductive learners only discover boundaries that
    separate data with different given labels.
  • Data of unknown types will always be
    misclassified as one of the given classes.
  • Example
  • How do we distinguish a bear from a cat?
  • An inductive model might be:
  • IF Weight(x) < 5 lb, x is a cat; otherwise, it
    is a bear.
  • However,
  • if x is a horse, the model will mistakenly
    predict it to be a bear.
  • The ideal answer would be "I don't know; it is
    neither bear nor cat." (See the sketch below.)
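
A minimal sketch of this example (all names are illustrative): a model trained only on cats and bears is forced to place every input, including a horse, into one of those two classes.

def classify(weight_lb: float) -> str:
    # Learned boundary from the slide: IF Weight(x) < 5 lb THEN cat, ELSE bear.
    return "cat" if weight_lb < 5 else "bear"

print(classify(4))     # "cat"  -- correct
print(classify(300))   # "bear" -- correct
print(classify(1000))  # "bear" -- a horse is forced into a known class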

5
Solution Summary
  • Generate artificial anomalies, labeled
    "anomaly", to delineate the boundary between the
    known and the unknown.
  • How to generate artificial anomalies? Compute
    examples that are close to but different from
    those with given labels.
  • Where to place the artificial anomalies?
  • Put more artificial anomalies around infrequent
    examples or sparse regions in the training data.

6
Digging into the Dataset
  • Assume that the boundary between the known and
    the unknown is close to the known data.
  • Randomly change the value of one feature of a
    given datum while leaving the other features
    unaltered.
  • Concentrate on areas in the training data that
    are "sparse".
  • Sparse regions are characterized by infrequent
    feature values.
  • Example
  • Panda is a very "sparse" bear in the bear class.
  • Panda has white body, black eye shades and black
    legs.
  • Generate more artificial anomalies around sparse
    regions.
  • Something with a white body, white eye shades,
    black legs, and a weight above 200 lb is an
    anomaly.
  • Based on the frequency of feature values, we
    compensate for sparse regions by filling in more
    artificial anomalies. (See the sketch below.)
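
A minimal sketch of this perturbation step, assuming each datum is a dict of categorical features and feature_values maps each feature to its possible values (both names are illustrative):

import random

def perturb_one_feature(datum, feature_values):
    """Randomly change the value of one feature while
    leaving all other features unaltered."""
    feature = random.choice(list(datum))
    alternatives = [v for v in feature_values[feature] if v != datum[feature]]
    anomaly = dict(datum)
    anomaly[feature] = random.choice(alternatives)
    return anomaly

# Toy usage with the panda example from this slide.
panda = {"body": "white", "eye_shades": "black", "legs": "black"}
domains = {"body": ["white", "black", "brown"],
           "eye_shades": ["black", "white"],
           "legs": ["black", "brown", "white"]}
print(perturb_one_feature(panda, domains))  # e.g. white body, WHITE eye shades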

7
Overall Effect
  • Sparse regions will be focused on, and very
    specific rules will be generated to cover them.
  • For example,
  • IF white body, black eye shades, black legs, and
    weight > 200 lb, THEN it is a (panda) bear
  • ELSE try other rules
  • ELSE IF no rule is satisfied,
  • THEN predict "We don't know what it is based on
    our limited knowledge."
  • Without the artificial anomalies, the animal
    with "white body, white eye shades and black
    legs" will, at best, be misclassified as a bear.

8
Distribution-based Artificial Anomaly Generation
Algorithm
  • Iterate through every feature value:
  • fmax is the most frequent value of feature F.
  • count(fmax) is its frequency count.
  • fi is another value of the same feature F.
  • count(fi) is its frequency count.
  • countdiff = count(fmax) - count(fi).
  • Generate countdiff artificial anomalies for
    feature value fi:
  • For a datum whose feature F has value fi,
  • change its value fi to any value that is not fi,
    leaving all other features unchanged,
  • and change its label to "anomaly". (See the
    sketch below.)
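
A minimal sketch of the algorithm as described on this slide, assuming data are dicts of categorical features (the representation is an assumption for illustration):

import random
from collections import Counter

def generate_artificial_anomalies(data):
    """For each feature F, generate count(fmax) - count(fi) anomalies
    per value fi by changing fi to a different value and relabeling."""
    anomalies = []
    for f in data[0]:                          # iterate over features
        counts = Counter(d[f] for d in data)
        values = list(counts)
        max_count = max(counts.values())       # count(fmax)
        for fi, c in counts.items():
            candidates = [d for d in data if d[f] == fi]
            for _ in range(max_count - c):     # countdiff
                datum = dict(random.choice(candidates))
                # Change fi to another observed value; keep all other features.
                datum[f] = random.choice([v for v in values if v != fi])
                anomalies.append((datum, "anomaly"))
    return anomalies

# Note: a generated point may coincide with real training data;
# this sketch does not filter out such collisions.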

9
Applications of Artificial Anomalies
  • Pure Anomaly Detection
  • Training data have only one class, such as
    normal.
  • Detect any data that are different from the given
    single class.
  • Artificial anomalies are computed from this
    single class.
  • Combined Misuse and Anomaly Detection
  • Classification and anomaly detection are
    performed at the same time.
  • For example, detect bear, cat, and
    neither-bear-nor-cat at the same time.
  • Efficient, since both classification and anomaly
    detection are done at the same time. (See the
    sketch below.)
  • One single module.
  • Efficient Model Deployment.
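
A minimal sketch of how a single combined training set could be assembled (names are illustrative); any inductive learner is then fit once on the result:

def build_combined_training_set(normal, intrusion_clusters,
                                artificial_anomalies):
    """Labels cover normal traffic, every known intrusion cluster,
    and the artificial anomalies, so one model handles all cases."""
    examples = [(x, "normal") for x in normal]
    for name, cluster in intrusion_clusters.items():
        examples += [(x, name) for x in cluster]   # e.g. "smurf", "portsweep"
    examples += [(x, "anomaly") for x in artificial_anomalies]
    return examples

A single prediction then returns "normal", the name of a known intrusion, or "anomaly".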

10
Experiment on Pure Anomaly Detection
  • Measurements
  • False alarm rate: predicted anomalies that are
    actually normal.
  • Detection rate: true anomalies correctly
    detected. (Both are sketched in code below.)
  • Original Dataset
  • 1998 DARPA Intrusion Detection Evaluation Dataset
    (also 1999 KDDCUP dataset).
  • Original dataset contains both normal data and
    intrusion data.
  • There are 4 basic types of intrusions. Each type
    has a few subclasses.
  • U2R: User to Root
  • R2L: Remote to Local
  • DOS: Denial of Service
  • PRB: Probing
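
The two measurements written out as a small sketch (the label conventions are assumptions for illustration):

def false_alarm_rate(y_true, y_pred):
    """Fraction of truly normal events predicted as anomalies."""
    normal = [p for t, p in zip(y_true, y_pred) if t == "normal"]
    return sum(p == "anomaly" for p in normal) / len(normal)

def detection_rate(y_true, y_pred):
    """Fraction of true anomalies (here: intrusions) detected as anomalies."""
    intrusions = [p for t, p in zip(y_true, y_pred) if t != "normal"]
    return sum(p == "anomaly" for p in intrusions) / len(intrusions)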

11
Intrusions and Categories
12
Experiment Setup
  • Training set: normal data and artificial
    anomalies computed from the normal data. No
    intrusions are included.
  • Test set: both normal and all intrusion data.
  • Goal: can we detect all intrusion data as
    "anomalies" without having them in the training
    data?
  • Learner
  • RIPPER, an inductive rule learner. (A toy
    version of this setup follows below.)
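
A toy version of this setup. RIPPER itself is not assumed to be available; scikit-learn's decision tree stands in as the inductive learner, and the feature values are made up:

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Training: normal records plus artificial anomalies computed from them.
normal_train = [{"service": "http", "flag": "SF"},
                {"service": "smtp", "flag": "SF"}] * 10
artificial = [{"service": "http", "flag": "REJ"},
              {"service": "telnet", "flag": "SF"}] * 10
rows = normal_train + artificial
labels = ["normal"] * len(normal_train) + ["anomaly"] * len(artificial)

vec = DictVectorizer(sparse=False)
model = DecisionTreeClassifier().fit(vec.fit_transform(rows), labels)

# Test: an intrusion-like record never seen in training should be
# flagged as "anomaly" rather than forced into the "normal" class.
probe = [{"service": "telnet", "flag": "REJ"}]
print(model.predict(vec.transform(probe)))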

13
Pure Anomaly Result
  • False alarm rate: 2%
  • Anomaly detection rate: presented per intrusion
    category in a chart (not preserved in this
    transcript).

14
Experiment on Combined Misuse and Anomaly
Detection
  • A single module that detects both known
    intrusions and unknown events that are neither
    normal nor known intrusions.
  • Group different types of intrusions into 13
    clusters.
  • Similar intrusions are grouped together.
  • Knowledge of intrusions of one cluster may not
    help detect intrusions of another cluster.

15
Experiment Setup
  • Training set:
  • normal data, plus a few clusters of intrusions,
    plus artificial anomalies.
  • Test set:
  • all data (normal and all intrusions).
  • Goals:
  • Can we detect unseen intrusions (excluded
    intrusion clusters) as anomalies?
  • Do we have to compromise performance on known
    intrusions (included intrusion clusters)?
  • There are 13! possible orderings in which to
    introduce the clusters into the training set.
  • We use 3 fixed sequences to introduce intrusion
    clusters.
  • Add one cluster at a time:
  • Training: normal data + clusters 1 to i.
  • Testing: normal data and all types of
    intrusions. (The protocol is sketched below.)
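
The protocol sketched as a loop; fit, evaluate, and make_anomalies are placeholders for any learner, any scoring routine, and the generation algorithm from slide 8:

def incremental_experiment(normal, clusters, make_anomalies, fit, evaluate):
    """Add one intrusion cluster at a time; always test on normal
    data plus every type of intrusion."""
    results = []
    for i in range(1, len(clusters) + 1):
        known = [x for cluster in clusters[:i] for x in cluster]
        train = normal + known + make_anomalies(normal + known)
        results.append(evaluate(fit(train)))
    return results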

16
Measurements
  • True Class Detection Rate
  • Anomaly Detection Rate

17
True Class Detection Rate
  • Percentage of intrusions of class i correctly
    detected as class i.

18
Anomaly Detection Rate
  • Percentage of anomalous or unknown intrusions
    correctly detected as anomalies.

19
Efficient Model Deployment (summary)
  • Efficient learning and deployment of models to
    detect new attacks.
  • When data about new attacks are collected, we do
    not want to retrain the model for all intrusions
    from scratch.
  • Instead, we only train a lightweight model to
    detect the new attacks.
  • Using artificial anomalies, the older model can
    already detect anomalies.
  • The new lightweight model and the older model
    are combined to detect both new and existing
    attacks.
  • When an event is detected as an anomaly, it is
    sent to the new lightweight model to check
    whether it is the new attack or just an anomaly.
    (See the sketch below.)
  • Experiments show that accuracy remains
    unchanged, while this approach is 150 times
    faster.
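
A minimal sketch of the two-stage deployment; the model objects and label names are illustrative:

def two_stage_detect(old_model, new_model, event):
    """The existing model handles known classes; anything it flags
    as "anomaly" goes to the lightweight model trained only on the
    newly collected attack, avoiding retraining from scratch."""
    label = old_model.predict(event)
    if label != "anomaly":
        return label                    # "normal" or a known intrusion
    return new_model.predict(event)     # the new attack, or still "anomaly"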

20
Related Work (an incomplete list)
  • Anomaly Detection
  • SRI's IDES uses probability distributions of
    past activities to measure the abnormality of
    host events; we measure network events.
  • Forrest et al. use the absence of subsequences
    to measure abnormality.
  • Lane and Brodley employ a similar approach but
    use incremental learning to update stored
    sequences of UNIX shell commands.
  • Ghosh and Schwartzbard use a neural network to
    learn a profile of normality and a distance
    function to detect abnormality.
  • Generating Artificial Data
  • Nigam et al. assign labels to unlabeled data
    using a classifier trained on labeled data.
  • Chang and Lippmann applied voice transformation
    techniques to add artificial training talkers
    and increase variability.

21
Summary and Future Work
  • Proposed a feature value distribution-based
    artificial anomaly generation algorithm.
  • Applied this algorithm for both pure anomaly and
    combined misuse and anomaly detection for
    intrusion detection.
  • It remains to be seen whether the same approach
    works for other domains.

22
Distribution Based Artificial Anomaly