1
A Data Mining Approach for Building Cost-Sensitive
and Light Intrusion Detection Models
Quarterly Review, November 2000
North Carolina State University, Columbia
University, Florida Institute of Technology
2
Outline
  • Project description
  • Progress report
  • Cost-sensitive modeling (NCSU/Columbia/FIT).
  • Automated feature and model construction (NCSU).
  • Anomaly detection (NCSU/Columbia/FIT).
  • Attack clustering and light modeling (FIT).
  • Real-time architecture and systems
    (NCSU/Columbia).
  • Correlation (NCSU).
  • Collaboration with industry (NCSU/Columbia).
  • Publications and software distribution.
  • Effort and budget.
  • Plan of work for next quarter

3
New Ideas and Hypotheses (1/2)
  • High-volume automated attacks can overwhelm a
    real-time IDS and its staff
  • IDS needs to consider cost factors
  • Damage cost, response cost, operational cost,
    etc.
  • Pure statistical accuracy is not an ideal metric
  • Base-rate fallacy of anomaly detection.
  • Alternative: the cost (saving) of an IDS.

4
New Ideas and Hypotheses (2/2)
  • Thorough analysis cannot always be done in
    real-time by one sensor
  • Correlation of multiple sensor outputs.
  • Trend or scenario analysis.
  • Need better theories and tools for building
    misuse and anomaly detection models
  • Characteristics of normal data and attack
    signatures can be measured and utilized.

5
Main Approaches (1/2)
  • Cost-sensitive models and architecture
  • Optimized for the cost metrics defined by users.
  • Cost-sensitive machine learning algorithms.
  • Multiple specialized and light sensors
    dynamically activated/configured in run-time.
  • Load balancing of models and data
  • Aggregation and correlation.
  • Cost-effectiveness as the guiding principle and
    multi-model correlation as the architectural
    approach.

6
Main Approaches (2/2)
  • Theories and tools for more effective anomaly and
    misuse detection
  • Information-theoretic measures for anomaly
    detection
  • Regularity of normal data is used to build the
    model.
  • New algorithms, e.g.
  • Unsupervised learning using noisy data.
  • Using artificial anomalies
  • An automated system that integrates all of these
    algorithms/tools.

7
Project Impacts (1/2)
  • A better understanding of the cost factors, cost
    models, and cost metrics related to intrusion
    detection.
  • Modeling techniques and deployment strategies for
    cost-effective IDSs
  • Provide the best-valued protection.
  • Clustering techniques for grouping intrusions
    and building specialized and light sensors.
  • An architecture for dynamically activating,
    configuring, and correlating sensors.

8
Project Impacts (2/2)
  • More effective misuse and anomaly detection
    models
  • With sound theoretical foundations and automation
    tools.
  • Analysis/correlation techniques for
    understanding/recognizing and predicting complex
    attack scenarios.

9
Cost-Sensitive Modeling
  • In previous quarters
  • Cost factors and metrics definition and analysis.
  • Cost model definition.
  • Cost-sensitive modeling with machine learning.
  • Evaluation using DARPA off-line data.
  • Current quarter
  • Real-time architecture.
  • Dynamic cost-sensitive deployment and correlation
    of sensors.

10
A Multi Layer/Component Architecture
(Diagram: components include a Remote IDS/Sensor, a
firewall (FW), a Real-time IDS, a Backend IDS, an ID
Model Builder, and Dynamic Cost-Sensitive Decision
Making, exchanging models.)
11
Next Steps
  • Study realistic cost metrics in the real world.
  • Implement a prototype system
  • Demonstrate the advantage of cost-sensitive
    modeling and dynamic cost-effective deployment
  • Use representative scenarios for evaluation.

12
An Automated System for Feature and Model
Construction
13
The Data Mining Process of Building ID Models
(Diagram: raw audit data → packets/events (ASCII) →
connection/session records → patterns → features →
models.)
14
Feature Construction From Patterns
(Diagram: normal and historical intrusion records
and new intrusion records are each mined for
patterns; the two sets of patterns are compared to
isolate intrusion patterns, from which features are
constructed; the features are applied to training
data to learn detection models.)
15
Status and Next Steps
  • The effectiveness of the algorithms/tools
    (process steps) has been validated
  • 1998 DARPA Evaluation.
  • Automating the process
  • Process steps chained together.
  • Process iteration under development.
  • Field test
  • Advanced Technology Systems, General Dynamics.
  • Planned public release 2Q-2001.
  • Dealing with unlabeled data
  • Integrate the anomaly detection over noisy data
    algorithms (Columbia).

16
Information-Theoretic Measures for Anomaly
Detection
  • Motivations
  • Need formal understandings.
  • Hypothesis
  • Anomaly detection is based on regularity of
    normal data.
  • Approach
  • Entropy and conditional entropy measure regularity
  • Determine how to build a model.
  • Relative (conditional) entropy measures how the
    regularities of the training and test datasets
    relate
  • Determine the performance of a model on test data.

17
Case Studies
  • Anomaly detection for Unix processes
  • Short sequences as normal profile.
  • A classification approach
  • Given the first k system calls, predict the
    (k+1)st system call
  • How to determine the sequence length k? Will
    including other information help?
  • UNM sendmail system call traces.
  • MIT Lincoln Lab BSM data.
  • Anomaly detection for network
  • How to partition the data to refine this complex
    subject.
  • MIT Lincoln Lab tcpdump data.

18
Entropy and Conditional Entropy
  • Entropy: impurity of the dataset
  • the smaller (the more regular), the better.
  • Conditional entropy: irregularity of sequential
    dependencies
  • uncertainty of the next event of a sequence after
    seeing its prefix (subsequences)
  • the smaller (the more regular), the better.

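As an illustration (a sketch, not the project's tooling), both quantities can be estimated from empirical counts over a trace of events; the toy trace and window length below are made up:

```python
from collections import Counter
from math import log2

def entropy(items):
    """Empirical entropy H(X): impurity of a dataset (smaller = more regular)."""
    counts = Counter(items)
    n = len(items)
    return -sum(c / n * log2(c / n) for c in counts.values())

def conditional_entropy(seq, k):
    """H(X | Y): uncertainty of the next event given the k preceding events,
    computed as H(prefix, next) - H(prefix) over all length-(k+1) windows."""
    windows = [tuple(seq[i:i + k + 1]) for i in range(len(seq) - k)]
    prefixes = [w[:-1] for w in windows]
    return entropy(windows) - entropy(prefixes)

# a highly regular toy system-call trace: conditional entropy is ~0
calls = ["open", "read", "read", "close", "open", "read", "read", "close"]
print(conditional_entropy(calls, 2))
```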
19
Relative (Conditional) Entropy
  • How different is p from q?
  • how different is the regularity of the test data
    from that of the training data
  • the smaller, the better.

20
Information Gain and Classification
  • How much can attribute/feature A contribute to
    the classification process?
  • the reduction of entropy when the dataset is
    partitioned according to the values of A.
  • the larger, the better.
  • if A = the first k events in a sequence (i.e.,
    Y) and the class label is the (k+1)st event
  • conditional entropy H(X|Y) is just the second
    term of Gain(X, A)
  • the smaller the conditional entropy, the better
    the performance of the classifier.

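The relationship above can be sketched in code (illustrative only): Gain(X, A) = H(X) − H(X | A), the drop in label entropy after partitioning the dataset by the values of A.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attr_values):
    """Gain(X, A): reduction of entropy when the dataset is partitioned
    according to the values of attribute A (larger = better feature)."""
    n = len(labels)
    parts = defaultdict(list)
    for label, value in zip(labels, attr_values):
        parts[value].append(label)
    h_cond = sum(len(p) / n * entropy(p) for p in parts.values())  # H(X | A)
    return entropy(labels) - h_cond

labels = ["normal", "normal", "attack", "attack"]
attr = ["tcp", "tcp", "icmp", "icmp"]   # attribute perfectly predicts the label
print(information_gain(labels, attr))   # → 1.0
```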
21
Conditional Entropy of Training Data (UNM)
22
Misclassification Rate of Training Data
23
Conditional Entropy vs. Misclassification Rate
24
Misclassification Rate of Testing Data and
Intrusion Data
25
Relative Conditional Entropy btw. Training and
Testing Normal Data
26
(Real and Estimated) Accuracy/Cost (Time)
Trade-off
27
Conditional Entropy of In- and Out-bound Email
(MIT/LL BSM)
28
Relative Conditional Entropy
29
Misclassification Rate of in-bound Email
30
Misclassification Rate of out-bound Email
31
Accuracy/cost Trade-off
32
Estimated Accuracy/cost Trade-off
33
Key Findings
  • Regularity of data can guide how to build a
    model
  • For sequential data, conditional entropy directly
    influences the detection performance
  • Determines the (best) sequence length and whether
    to include more information, before building a
    model.
  • When cost is also considered, this determines the
    optimal model.
  • Detection performance on test data can be
    attained only if its regularity is similar to
    that of the training data.

34
Next Steps
  • Study how to measure more complex environments
  • Network topology/configuration/traffic, etc.
  • Extend the principle/approach for misuse
    detection
  • Measure normal, attack, and their relationship
  • Parameter adjustment, performance prediction.

35
New Anomaly Detection Approaches
  • Unsupervised training methods
  • Build models over noisy (not clean) data
  • Artificial anomalies
  • Improves performance of misuse and anomaly
    detection methods.
  • Network traffic anomaly detection

36
AD over Noisy Data
  • Builds normal models over data containing some
    anomalies.
  • Motivating assumptions
  • Intrusions are extremely rare compared to
    normal data.
  • Intrusions are quantitatively different.

37
Approach Overview
  • Mixture model
  • Normal component
  • Anomalous component
  • Build probabilistic model of data
  • Max likelihood test for detection.

38
Mixture Model of Anomalies
  • Assume a generative model
  • The data is generated with a probability
    distribution D.
  • Each element originates from one of two
    components
  • M, the Majority Distribution (x ∈ M).
  • A, the Anomalous Distribution (x ∈ A).
  • Thus D = (1 − λ)M + λA.

39
Modeling Probability Distributions
  • Train Probability Distributions over current sets
    of M and A.
  • PM(X): probability distribution for the Majority.
  • PA(X): probability distribution for the Anomaly.
  • Any probability modeling method can be used
  • Naïve Bayes, Max Entropy, etc.

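A simplified sketch of the max-likelihood test (assumptions: categorical data, smoothed frequency models for PM and PA, one tentative move per distinct value, and made-up λ and threshold): an element is kept in the anomalous set A only when moving it there raises the log-likelihood of the whole partition.

```python
from collections import Counter
from math import log

def partition_loglik(M, A, lam, vocab, alpha=1.0):
    """Log-likelihood of a partition under D = (1 - lam)*PM + lam*PA,
    where PM and PA are smoothed frequency models of the sets M and A."""
    def component(counts, log_weight):
        total = sum(counts.values())
        ll = 0.0
        for c in counts.values():
            if c > 0:
                p = (c + alpha) / (total + alpha * vocab)
                ll += c * (log_weight + log(p))
        return ll
    return component(M, log(1 - lam)) + component(A, log(lam))

def detect_anomalies(data, lam=0.1, threshold=0.0):
    """Greedy max-likelihood test: tentatively move one occurrence of each
    distinct value from the majority set M to the anomaly set A, and keep
    the move only if the partition's log-likelihood improves."""
    vocab = len(set(data))
    M, A = Counter(data), Counter()
    anomalies = set()
    for x in sorted(set(data)):
        before = partition_loglik(M, A, lam, vocab)
        M[x] -= 1
        A[x] += 1
        if partition_loglik(M, A, lam, vocab) - before > threshold:
            anomalies.add(x)          # likelihood rose: treat x as anomalous
        else:
            M[x] += 1
            A[x] -= 1                 # revert the move
    return anomalies

data = ["GET"] * 50 + ["POST"] * 45 + ["0xdeadbeef"]  # one rare, odd element
print(detect_anomalies(data))  # → {'0xdeadbeef'}
```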
40
Experiments
  • Two sets of experiments
  • Measured performance against comparison methods
    over noisy data.
  • Measured performance trained over noisy data
    against comparison methods trained over clean
    data.
  • The method was robust in both comparisons.

41
AD Using Artificial Anomalies
  • Generate abnormal behavior artificially
  • Assume the given normal data are representative.
  • "Near misses" of normal behavior are considered
    abnormal.
  • Change the value of only one feature in an
    instance of normal behavior.
  • Sparsely represented values are sampled more
    frequently.
  • "Near misses" help define a tight boundary
    enclosing the normal behavior.

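A sketch of the generation step (data and names are illustrative; the inverse-frequency weighting is one plausible reading of "sparsely represented values are sampled more frequently"):

```python
import random
from collections import Counter

def artificial_anomalies(normal, n_samples, seed=0):
    """Generate "near misses": copy a normal record and change exactly one
    feature to a different observed value, preferring rare values."""
    rng = random.Random(seed)
    n_features = len(normal[0])
    # per-feature frequency of each observed value
    freq = [Counter(rec[i] for rec in normal) for i in range(n_features)]
    out = []
    while len(out) < n_samples:
        rec = list(rng.choice(normal))
        i = rng.randrange(n_features)
        candidates = [v for v in freq[i] if v != rec[i]]
        if not candidates:
            continue                  # this feature has only one value
        # weight inversely to frequency: sparse values are sampled more
        # often, tightening the boundary around normal behavior
        weights = [1.0 / freq[i][v] for v in candidates]
        rec[i] = rng.choices(candidates, weights=weights)[0]
        out.append(tuple(rec))
    return out

normal = [("tcp", "http", "SF"), ("tcp", "smtp", "SF"), ("udp", "dns", "SF")]
print(artificial_anomalies(normal, 3))
```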
42
Experimental Results
  • Learning algorithm: RIPPER
  • Data: 1998 DARPA evaluation
  • U2R, R2L, DOS, PRB: 22 clusters
  • Training data: normal and artificial anomalies
  • Results
  • Overall detection rate: 94.26%
  • Overall false alarm rate: 2.02%
  • 100% detection: buffer_overflow, guess_passwd,
    phf, back
  • 0% detection: perl, spy, teardrop, ipsweep, nmap
  • 50% detection for 13 out of 22 intrusion
    subclasses

43
Combining Anomaly and Misuse Detection
  • Training data normal data, artificially
    generated anomalies, known intrusion data
  • The learned model can predict normal, anomaly, or
    known intrusion subclass
  • Experiments were performed on increasing subsets
    of known intrusion subclasses in the training
    data (simulates identified intrusions over time).

44
Combining Anomaly and Misuse Detection (continued)
  • Consider phf, pod, teardrop, spy, and smurf as
    unknown (absent from the training data)
  • Anomaly detection rate: phf 25%, pod 100%,
    teardrop 93.91%, spy 50%, smurf 100%
  • Overall false alarm rate: 0.20%
  • The false alarm rate dropped from 2.02% to
    0.20% when some known attacks are included for
    training

45
Adaptive Combined Anomaly and Misuse Detection
  • Completely re-training the model whenever a new
    intrusion is found is an expensive and slow
    process.
  • An effective and fast remedy is very important to
    thwart these attacks.
  • Re-training is still necessary when time and
    resources allow.

46
Multiple Model Adaptive Approach
  • Generate an additional detection module that is
    only good at detecting the newly discovered
    intrusion.
  • Method 1: trained from normal and new intrusion
    data
  • Method 2: trained from new intrusion data and
    artificial anomalies
  • When the old classifier predicts anomaly, the
    record is further examined by the new classifier
    to check whether it is the new intrusion.

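The two-stage prediction can be sketched as follows (the toy stand-in models and field names are invented for illustration):

```python
def ensemble_predict(old_model, new_model, record):
    """The old model handles known classes; any record it labels
    'anomaly' is re-examined by the lightweight new-intrusion model."""
    label = old_model(record)
    if label == "anomaly":
        return new_model(record)   # 'new_intrusion' or still 'anomaly'
    return label

# toy stand-in models (illustrative only)
def old_model(r):
    return "normal" if r["service"] in {"http", "smtp"} else "anomaly"

def new_model(r):
    return "new_intrusion" if r["flag"] == "S0" else "anomaly"

print(ensemble_predict(old_model, new_model,
                       {"service": "finger", "flag": "S0"}))  # → new_intrusion
```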
47
Multiple Model Adaptive Experiment
  • The old model is trained from n intrusions.
  • A lightweight model is trained from one new
    intrusion type.
  • They are combined as an ensemble.
  • The accuracy and training time are compared with
    those of one model trained from n + 1 intrusions.

48
Multiple Model Adaptive Experiment Result
  • The accuracy difference is very small
  • recall: +3.4%
  • precision: −16%
  • In other words, the ensemble approach detects more
    of the new intrusion, but also misidentifies more
    anomalies as the new intrusion.
  • Training time: a 150× difference, or a cup of
    coffee versus one or two days.

49
Detecting Anomalies in Network Traffic (1/2)
  • Can we detect intrusions by identifying novel
    values in network packets?
  • Anomaly detection is potentially useful in
    detecting novel attacks.
  • Our model is trained on attack-free tcpdump data.
  • Fields in the Transport layer or below are
    considered.

50
Detecting Anomalies in Network Traffic (2/2)
  • Normal field values are learned.
  • During evaluation, a function scores a packet
    based on the likelihood of encountering novel
    field values.
  • Initial results indicate our learned model
    compares favorably with other systems on the 1999
    DARPA evaluation data.

51
Packet Fields
  • Fields in Data-link, Network, and Transport
    layers.
  • (Application layer will be considered later)
  • Ethernet source, destination, protocol.
  • IP header length, TOS, fragment ID, TTL,
    transport protocol
  • TCP header length, UAPRSF flags, URG pointer
  • UDP length
  • ICMP type, code

52
Anomaly Scoring Function (1/2)
  • N1: number of unique values of a field in the
    training data
  • N: number of packets in the training data
  • Likelihood of observing a novel value in a field:
  • N1 / N
  • (escape probability, Witten and Bell, 1991)

53
Anomaly Scoring Function (2/2)
  • Non-stationary model: consider the last
    occurrence of novel values
  • t: number of seconds since the last novel value
    in the same field
  • Likelihood of observing an anomaly:
  • P = (N1 / N) × (1 / t)
  • Field anomaly score: Sf = 1 / P
  • Packet anomaly score: Σf Sf

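A minimal sketch of the scoring function (the training and timing details here are simplifying assumptions, not the actual system):

```python
class FieldModel:
    """Per-field novelty model: P = (N1 / N) * (1 / t), score Sf = 1 / P."""

    def __init__(self):
        self.values = set()       # unique values seen in training (N1)
        self.n = 0                # packets seen in training (N)
        self.last_novel = 0.0     # time of the last novel value

    def train(self, value):
        self.n += 1
        self.values.add(value)

    def score(self, value, now):
        if value in self.values:
            return 0.0                       # known value: not anomalous
        t = max(now - self.last_novel, 1.0)  # seconds since last novelty
        self.last_novel = now
        p = (len(self.values) / self.n) * (1.0 / t)
        return 1.0 / p                       # rarer novelty -> higher score

def packet_score(models, packet, now):
    """Packet anomaly score: sum of the per-field scores."""
    return sum(models[f].score(v, now) for f, v in packet.items())

ttl = FieldModel()
for v in [64, 64, 128, 64]:      # training: N = 4 packets, N1 = 2 values
    ttl.train(v)
print(ttl.score(1, now=10.0))    # novel TTL, t = 10 -> score 1 / (0.5 * 0.1)
```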
54
Experiments
  • 1999 DARPA evaluation data (from Lincoln Lab).
  • Same mechanism as DARPA in determining detection
    (correct IP address of the victim, 60 seconds
    before and after an attack).
  • Score thresholds of our system and others are
    lowered to produce no more than 100 false alarms.
  • Some of the other systems use binary scoring.

55
Initial Results
IDS          TP/FP (All)   TP/FP (Network)   IDS Type
Oracle       200/0         72/0              ideal
FIT          64/100        51/100            anomaly
GMU          51/22         27/22             anomaly + signature
NYU          20/80         14/80             signature
SUNY         24/9          19/9              signature
NetSTAT      70/995        35/995            signature
Emerald-TCP  83/23         35/23             signature
56
Discussion
  • All attacks: more detections with 100 or fewer
    false alarms than most systems, except Emerald
    and NetSTAT.
  • Our initial experiments did not look at fields in
    the Application protocol layer.
  • Network attacks: more detections with 100 or
    fewer false alarms than the other systems.
  • 57 out of 72 attacks were detected with 100 false
    alarms.

57
Summary of Progress
  • Florida Tech's official start date: August 30,
    2000.
  • Near-term objective: use learning techniques to
    build anomaly detection models that can identify
    intrusions.
  • Progress: initial experimental results on the
    1999 DARPA evaluation data indicate that our
    techniques compare favorably with the other
    systems in detecting network attacks.

58
Plans for the Next Quarter
  • Investigate an entropy approach to detecting
    anomalies.
  • Study methods that incorporate more information
    from packets prior to the current packet.
  • Examine how effective our techniques are with
    respect to individual attack types.
  • Devise techniques to catch attack types that are
    undetected.
  • Incorporate fields in the Application protocol
    layer into our model.

59
Anomaly Detection Summary and Plans
  • Anomaly detection is a main focus.
  • Both theories and new approaches.
  • Will integrate
  • Theories applied to develop new AD sensors.
  • Incorporate cost-sensitive measures.
  • Study real-time architecture/performance.
  • Automated feature and model construction system.

60
Correlation Analysis of Attack Scenario
  • Motivations
  • Detecting individual attack actions not adequate
  • Damage assessment, trend prediction, etc.
  • Hypothesis
  • Attacks are related and such correlation can be
    learned.
  • Approach
  • Start with crude knowledge models.
  • Use data mining to validate/refine the models.
  • An IETF/IDWG architecture/system.

61
Objectives (1/2)
  • Local/low layer correlations in an IDS
  • Multiple sources of raw (audit) data
  • Raw information: tcpdump data, BSM records
  • Based on specific attack signatures and
    system/user normal profiles
  • Benefits
  • Better accuracy: higher TP, lower FP
  • More alarm information for higher-level and
    global analysis

62
Objectives (2/2)
  • Global / High Layer Correlations
  • Multiple sources of alarms by IDSs
  • The bigger picture
  • What really happened in our networks?
  • What can we learn from these cases?
  • Benefits
  • What is the intention of the attacks?
  • What will happen next? When? Where?
  • What can we do to prevent it from happening?

63
Architecture of Global Correlation System
(Diagram: IDSs, an Alarm Collection Center, a Report
Center, and a Knowledge Controller.)
64
Correlation Techniques from Network Management
System (1/2)
  • Rule-Based Reasoning (RBR)
  • If-then rules based on domain knowledge and
    expertise.
  • Sufficient for small, non-changing, and
    well-understood systems.
  • Model-Based Reasoning (MBR)
  • Models both physical and logical entities, such
    as hubs and routers
  • Correlation is a result of the collaboration
    among models.

65
Correlation Techniques from Network Management
Systems (2/2)
  • State-Transition Graph (STG)
  • Logical connections via state-transition.
  • May lead to unexpected behavior if the
    collaborating STGs are not carefully defined.
  • Case-Based Reasoning (CBR)
  • Learns from experience and offers solutions to
    novel problems based on past cases.
  • Need to develop a similarity metric to retrieve
    useful cases from the library.

66
Correlation Techniques for IDS
  • Combination of different correlation techniques
  • Network complexity.
  • Wide variety of attack motives and tools.
  • Adaptation of different correlation techniques
  • Different perspectives between NMS and IDS.

67
Challenges of Correlation (1/2)
  • Knowledge representation
  • How to represent the objects such as alarms, log
    files, network entities?
  • How to model the knowledge such as network
    topology, network history, intrusion library,
    previous cases?

68
Challenges of Correlation (2/2)
  • Knowledge base construction
  • What kind of knowledge base do we need?
  • How to construct the knowledge base?
  • Case library
  • Network Knowledge
  • Intrusion Knowledge
  • Pattern discovery (domain knowledge/expert
    systems, data mining ...)

69
A Case Study DDoS
  • An attack scenario from MIT/LL
  • Phase 1: IP sweep of the AFB from a remote site.
  • Phase 2: Probe of live IPs to look for the
    sadmind daemon running on Solaris hosts.
  • Phase 3: Break-ins via the sadmind
    vulnerability.
  • Phase 4: Installation of the trojan mstream DDoS
    software on three hosts at the AFB.
  • Phase 5: Launching the DDoS.

70
Alarm Model
  • Object-oriented
  • Alarm = (feature1, feature2, ...)
  • Features of an alarm
  • Attack type
  • Time stamp
  • Service
  • Source IP / domain
  • Target IP / domain
  • Target number
  • Source type (router, host, server)
  • Target type (router, host, server)
  • Duration
  • Frequency within time window

71
Alarm Model
  • Example
  • IP sweep, 09:51:51, ICMP, ppp5-23.iawhk.com,
    172.16.115.x, 20, hosts and servers, 9, 1
  • Attack type: IP sweep
  • Time stamp: 09:51:51
  • Service: ICMP
  • Source IP: ppp5-23.iawhk.com
  • Target IP: 172.16.115.x
  • Target number: 20
  • Source type: n/a
  • Target type: hosts and servers
  • Duration: 9 seconds
  • Frequency: 1

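The object-oriented alarm above could be represented as follows (an illustrative sketch; the field names paraphrase the feature list):

```python
from dataclasses import dataclass

@dataclass
class Alarm:
    """Alarm = (feature1, feature2, ...): the features of the alarm model."""
    attack_type: str
    timestamp: str
    service: str
    source_ip: str
    target_ip: str
    target_number: int
    source_type: str      # router, host, or server ("n/a" if unknown)
    target_type: str
    duration_s: int
    frequency: int        # within time window

# the IP-sweep example above
ip_sweep = Alarm("IP sweep", "09:51:51", "ICMP", "ppp5-23.iawhk.com",
                 "172.16.115.x", 20, "n/a", "hosts and servers", 9, 1)
print(ip_sweep.attack_type, ip_sweep.duration_s)
```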
72
Scenario Representation (1/2)
  • Attack scenario graph
  • Constructed by domain knowledge
  • Can be validated/augmented via data mining.
  • Describing attack scenarios via state transition.
  • Each transition with probability P.
  • Modifiable by experts.
  • Adaptive to new cases.

73
Scenario Representation (2/2)
  • Example of attack scenario graph
  • (Diagram: a state-transition graph over IP Sweep,
    Trojan Installation, and TFN2K, Trinoo, and
    Mstream DDoS.)
74
Correlation Rule Sets
  • Based on
  • Attack scenario graph.
  • Domain knowledge and expertise.
  • Case library.
  • Two Layers of Rule Sets
  • Lower layer for matching/correlating specific
    alarms.
  • Higher layer for trend prediction.
  • Probability assigned.

75
Correlation Rule Sets
  • Example of low layer rule sets
  • If (A1.type IP Sweep A2.type Port Scan
    ) (A1.time lt A2.time) (A1.domain A2.domain)
    ( A2.target gt 10 ), then A1A2
  • .
  • If (A2.type Port Scan A3.type Buffer
    Overflow) (A2.time lt A3.time) (A3.DestIP
    belongs to A2.domain) (A3.target gt2), then A2
    A3

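A low-layer rule like the first one above can be sketched as a predicate over alarm pairs (the `AlarmRec` tuple is a minimal stand-in for the full alarm model, invented for illustration):

```python
from collections import namedtuple

# minimal stand-in for the alarm model: only the fields this rule needs
AlarmRec = namedtuple("AlarmRec", "attack_type timestamp domain target_number")

def rule_sweep_then_scan(a1, a2):
    """If (A1.type = IP Sweep and A2.type = Port Scan) and (A1.time < A2.time)
    and (A1.domain = A2.domain) and (A2.target > 10), then A1 -> A2."""
    return (a1.attack_type == "IP Sweep"
            and a2.attack_type == "Port Scan"
            and a1.timestamp < a2.timestamp   # HH:MM:SS strings sort in time order
            and a1.domain == a2.domain
            and a2.target_number > 10)

a1 = AlarmRec("IP Sweep", "09:51:51", "172.16.115.x", 20)
a2 = AlarmRec("Port Scan", "09:58:02", "172.16.115.x", 14)
print(rule_sweep_then_scan(a1, a2))  # → True
```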
76
Correlation Rule Set
  • Example of high-layer rule sets
  • If (A1 → A2, A2 → A3), then A1 → A2 → A3
  • If (A1 → A2 → A3), then the attack scenario
  • is A1 → A2 → A3 → A4 with probability P1
  • A1 → A2 → A3 → A4 → A5 with probability P2
  • E.g.,
  • If (IP Sweep → Port Scan → Buffer Overflow)
  • Then next1 = Trojan Installation with P1
  • next2 = DDoS with P2

77
Status and Next Steps
  • At the very beginning of this research.
  • Attack Scenario Graph
  • How to construct it automatically?
  • How to model the statistical properties of attack
    scenario state transition?
  • How to automatically generate the correlation
    rule sets?
  • Collaboration with other groups
  • Alarm formats, architecture, IETF/IDWG.

78
Real-time System Implementation
  • Motivations
  • Validate our algorithms and models in the real
    world.
  • Faster technology transfer and greater impact.
  • Approach
  • Collaboration with industry.
  • Reuse available building blocks as much as
    possible.

79
Conceptual Architecture
(Diagram: Sensors send data to a Data Warehouse; an
Adaptive Model Generator builds models from the
warehouse data and distributes them to Detectors.)
80
System Architecture
(Diagram: an Adaptive Model Generation subsystem
(Unsupervised Machine Learning, Supervised Machine
Learning, Model Generation, Real-Time Data Mining)
connects through a Data Warehouse and Meta IDS to
the sensors: NT, Linux, and Solaris host-based IDSs,
an NFR network-based IDS, a Malicious Email Filter,
File System Wrappers, and Software Wrappers.)
81
Sensor Host Based IDS System
  • Generic interface to sensors
  • BAM (Basic Auditing Module)
  • Sends data to the data warehouse
  • Receives models from the data warehouse
  • NT system
  • Fully operational
  • Linux system / BSM (Solaris) system
  • Sensor operational
  • Under construction
  • Plan to finish construction by end of semester

82
(No Transcript)
83
Sensor Network IDS System
  • NFR Based Sensor
  • Data Mining based
  • Efficient Evaluation Architecture
  • Multiple Models
  • System operational and integrated with larger
    system

84
Sensor Malicious Email Filter
  • Monitors Email (sendmail)
  • Detects malicious emails entering domain
  • Key Features
  • Model Based
  • Generalizes to unknown malicious attachments
  • Models distributed automatically to filters
  • Status
  • Prototype operational
  • Open source release by end of semester

85
Sensor Advanced IDS Sensors
  • File Wrappers
  • Software Wrappers
  • Monitor other aspects of system
  • Status
  • File Wrappers almost finished
  • Software Wrappers under development

86
Data Warehouse
  • Stores data collected from sensors
  • Generic IDS data format
  • Data can be manipulated in database
  • Cross reference data from attacks
  • Stores generated models
  • Status
  • Currently Operational
  • Refining Interface and Data Transfer Protocol
  • Completed by end of Semester

87
Adaptive Model Generator
  • Builds models from data in data warehouse
  • Uses both supervised and unsupervised data
  • Can build models based on data collected
  • XML Based Data Exchange Format
  • Status
  • Exchange Formats defined
  • Prototype developed
  • Completion by end of semester

88
Collaboration with Industries
  • NFR.
  • Cigital (RST).
  • SAS.
  • General Dynamics.
  • Aprisma/Cabletron.
  • HRL.

89
Publications and Software, etc.
  • 4 journal and 10 conference papers
  • One best paper and two runners-up.
  • JAM.
  • MADAM ID.
  • PhDs: two graduated, one graduating, five in the
    pipeline
  • More to come

90
Efforts Current Tasks
  • Cost-sensitive modeling (NCSU/Columbia/FIT).
  • Automated feature and model construction
    (NCSU/Columbia/FIT)
  • Integration of all algorithms and tools.
  • Anomaly detection (NCSU/Columbia/FIT).
  • Attack clustering and light modeling (FIT).
  • Real-time architecture and systems
    (NCSU/Columbia).
  • Correlation (NCSU).
  • Collaboration with industry (NCSU/Columbia/FIT).