Data Mining for Network Intrusion Detection

About This Presentation


Slides: 38

Provided by: aleksandar

Learn more at: http://minds.cs.umn.edu


Transcript and Presenter's Notes

Title: Data Mining for Network Intrusion Detection

1
Data Mining for Network Intrusion Detection
Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science, University of Minnesota
http://www.cs.umn.edu/~kumar
Project Participants: V. Kumar, A. Lazarevic, J. Srivastava, P. Dokas, E. Eilertson, L. Ertoz, S. Iyer, S. Ketkar, P. Tan
Research supported by AHPCRC/ARL
2
Cyber Threat Analysis

As the cost of information processing and
Internet accessibility falls, organizations are
becoming increasingly vulnerable to potential
cyber threats such as network intrusions

Intrusions are actions that attempt to bypass
security mechanisms of computer systems
Intrusions are caused by:
Attackers accessing the system from the Internet
Insider attackers - authorized users attempting
to gain and misuse non-authorized privileges

3
Intrusion Detection

Intrusion Detection System (IDS)
A combination of software and hardware that
attempts to perform intrusion detection
Raises an alarm when a possible intrusion occurs
Traditional IDS tools (e.g. SNORT) are based on
signatures of known attacks
Limitations
The signature database has to be manually revised
for each newly discovered type of intrusion
They cannot detect emerging cyber threats
Substantial latency in deploying newly created
signatures across the computer system

www.snort.org
4
Data Mining for Intrusion Detection

Increased interest in data mining based IDS for
detecting
Attacks for which it is difficult to build
signatures
Unforeseen/Unknown attacks
Emerging Threats
Data mining approaches for intrusion detection
Misuse detection
Building predictive models from labeled data
sets (instances are labeled as normal or
intrusive)
Can only detect known attacks and their
variations
High accuracy in detecting many kinds of known
attacks
Anomaly detection
Able to detect novel attacks as deviations from
normal behavior
Potential high false alarm rate - previously
unseen (yet legitimate) system behaviors may also
be recognized as anomalies

5
Misuse Detection

Classification of intrusions
RIPPER [Madam ID @ Columbia U], Bayesian
classifier [ADAM @ George Mason U], fuzzy
association rules [Bridges00], decision trees
[ARL U Texas, Sinclair99], neural networks
[Lippmann00, Ghosh99, Canady98], genetic
algorithms [Bridges00, Sinclair99]
Association pattern analysis
Building normal profile [Barbara01,
Manganaris99], frequent episodes for constructing
features [Madam ID @ Columbia U]
Cost sensitive modeling
AdaCost [Fan99], MetaCost [Domingos99], [Ting00],
[Karakoulas95]
Learning from rare class
[Kubat97, Fawcett97, Ling98, Provost01,
Japkowicz01, Chawla01, Joshi01]

6
Anomaly Detection

Statistical approaches
Finite mixture model [Yamanishi00], χ² based
[Ye01]
Various anomaly detection
Temporal sequence learning [Lane98], neural
networks [Ryan98], similarity tree [Kokkinaki97],
generating artificial anomalies [Fan01],
clustering [Madam ID, Eskin02], unsupervised SVM
[Madam ID, Eskin02]
Outlier detection schemes
Nearest neighbor approaches [Knorr98, Jin01,
Ramaswamy00, Aggarwal01], density based
[Breunig00], connectivity based [Tang01],
clustering based [Yu99]

7
Key Technical Challenges

Large data size
Millions of network connections are common for
commercial network sites
High dimensionality
Hundreds of dimensions are possible
Temporal nature of the data
Data points close in time are highly correlated
Skewed class distribution
Interesting events are very rare: looking for
the needle in a haystack
Data Preprocessing
Converting network traffic into data
High Performance Computing (HPC) can be critical
for on-line analysis and scalability to very
large data sets

8
MINDS Project - Recent Accomplishments

MINDS: MINnesota INtrusion Detection System
Learning from Rare Class: building rare class
prediction models
Anomaly/outlier detection
Summarization of attacks using association
pattern analysis

9
MINDS - Learning from Rare Class

Problem: Building models for rare network attacks
(mining a needle in a haystack)
Standard data mining models are not suitable for
rare classes
Models must be able to handle skewed class
distributions
Learning from data streams - intrusions are
sequences of events
Key results
PNrule and related work [Joshi, Agarwal, Kumar,
SIAM 2001, SIGMOD 2001, ICDM 2001, KDD 2002]
SMOTEBoost algorithm [Lazarevic, in review]
CREDOS algorithm [Joshi, Kumar, in review]
Classification based on association - add
frequent items as meta-features to original
data set

10
MINDS - Anomaly and Outlier Detection

Approach
Detecting novel attacks/intrusions by identifying
them as deviations from normal behavior
Goals
Construct useful set of features for data mining
algorithms
Identify novel intrusions using outlier detection
schemes
Distance based techniques
Nearest neighbor approach
Mahalanobis-distance approach
Clustering based approaches
Density based schemes
Unsupervised Support Vector Machines (SVM)

11
Experimental Evaluation

Publicly available data set
DARPA 1998 Intrusion Detection Evaluation Data
Set
Real network data from University of Minnesota
Anomaly detection is applied 4 times a day, on a
10-minute time window

[Diagram labels: network; net-flow data using
CISCO routers (10-minute cycle, 2 million
connections); data preprocessing; MINDS anomaly
detection; anomaly scores; association pattern
analysis; open source signature-based network
IDS (www.snort.org)]
12
DARPA 1998 Data Set

DARPA 1998 data set (prepared and managed by MIT
Lincoln Lab) includes a wide variety of
intrusions simulated in a military network
environment
9 weeks of raw TCP dump data
7 weeks for training (5 million connection
records)
2 weeks for testing (2 million connection
records)
Connections are labeled as normal or attacks (4
main categories of attacks, 38 attack types)
DOS - Denial Of Service
Probe - e.g. port scanning
U2R - unauthorized access to gain root
privileges
R2L - unauthorized remote login to machine
Two types of attacks
Bursty attacks - involve multiple network
connections
Non-bursty attacks - involve single network
connections

13
Feature construction

Three groups of features
Basic features of individual TCP connections
source and destination IP/port, protocol, number
of bytes, duration, number of packets (used in
SNORT only in stream builder)
Time based features
For the same source (destination) IP address,
number of unique destination (source) IP
addresses inside the network in last T seconds
Number of connections from source (destination)
IP to the same destination (source) port in last
T seconds
Connection based features
For the same source (destination) IP address,
number of unique destination (source) IP
addresses inside the network in last N
connections
Number of connections from source (destination)
IP to the same destination (source) port in last
N connections
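
The time based and connection based features above can be sketched in a few lines of Python. This is a minimal illustration, not the MINDS implementation: the `Conn` record layout, the function name `window_features`, and the default values of `T` and `N` are all assumptions made for the example. It also assumes that "last N connections" means the last N connections overall (then filtered by source), and it omits the "inside the network" filter and the destination-side variants for brevity.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are illustrative, not from MINDS.
Conn = namedtuple("Conn", "ts src_ip src_port dst_ip dst_port")


def window_features(conns, T=5.0, N=3):
    """For each connection, derive source-side features over two windows:
    the last T seconds and the last N connections (current one included)."""
    conns = sorted(conns, key=lambda c: c.ts)
    out = []
    for i, c in enumerate(conns):
        # Connections from the same source within the last T seconds.
        recent_t = [p for p in conns[: i + 1]
                    if c.ts - p.ts <= T and p.src_ip == c.src_ip]
        # Connections from the same source among the last N connections.
        recent_n = [p for p in conns[max(0, i + 1 - N): i + 1]
                    if p.src_ip == c.src_ip]
        out.append({
            "uniq_dst_T": len({p.dst_ip for p in recent_t}),
            "same_port_T": sum(p.dst_port == c.dst_port for p in recent_t),
            "uniq_dst_N": len({p.dst_ip for p in recent_n}),
            "same_port_N": sum(p.dst_port == c.dst_port for p in recent_n),
        })
    return out
```

The quadratic scan keeps the sketch short; a sliding window would be needed for the traffic volumes mentioned elsewhere in the slides. A slow scan illustrates why both windows are useful: its connections fall outside any short time window but still show up in the last-N-connections counts.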

14
MINDS Outlier Detection on DARPA98 Data
ROC curves for bursty attacks
LOF approach is consistently better than other
approaches
Unsupervised SVMs are good, but only at high
false alarm (FA) rates
NN approach is comparable to LOF for low FA
rates, but its detection rate decreases for high
FA rates
Mahalanobis-distance approach performs poorly
due to multimodal normal behavior
ROC curves for single-connection attacks
LOF approach is superior to other outlier
detection schemes
The majority of single-connection attacks are
probably located close to the dense regions of
the normal data
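
For readers unfamiliar with how such ROC curves are produced from anomaly scores, the following is a small sketch. The function name `roc_points` and the label encoding (1 for attack, 0 for normal) are assumptions for illustration; tied scores are not treated specially, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (false alarm rate, detection rate) pairs obtained by lowering
    a threshold over the anomaly scores, one connection at a time."""
    pos = sum(labels)              # number of attack connections
    neg = len(labels) - pos        # number of normal connections
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, y in ranked:
        if y == 1:
            tp += 1                # an attack is now above the threshold
        else:
            fp += 1                # a normal connection raises a false alarm
        points.append((fp / neg, tp / pos))
    return points
```

A curve that rises quickly at low false alarm rates corresponds to a scheme that ranks attacks above normal traffic, which is the property the comparison above is about.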
15
Outlier Detection Recent Results (on DARPA98
data)

Analyzing multi-connection attacks using the
score values assigned to network connections
Detection rate is measured through number of
connections that have score higher than 0.5

Low peaks due to occasional reset value for the
feature called connection status
16
Recently Detected Real-life Attacks

During the past few months various
intrusive/suspicious activities were detected at
the AHPCRC and at the U of Minnesota using MINDS
A sample of top ranked anomalies/attacks picked
by MINDS
August 13, 2002
Detected scanning for Microsoft DS service on
port 445/TCP (Ranked 1)
Reported by CERT as recent DoS attacks that need
further analysis (CERT August 9, 2002)
Undetected by SNORT since the scanning was
non-sequential (very slow)

Number of scanning activities on Microsoft DS
service on port 445/TCP reported in the World
(Source www.incidents.org)
17
Recently Detected Real-life Attacks (ctd)

A sample of top ranked anomalies/attacks picked
by MINDS
August 13, 2002
Detected scanning for Oracle server (Ranked 2)
Reported by CERT, June 13, 2002
First detection of this attack type by our
University
Undetected by SNORT because the scanning was
hidden within another Web scanning
August 8, 2002
Identified machine that was running Microsoft
PPTP VPN server on non-standard ports, which is a
policy violation (Ranked 1)
Undetected by SNORT since the collected GRE
traffic was part of the normal traffic
October 30, 2002
Identified compromised machines that were running
FTP servers on non-standard ports, which is a
policy violation (Ranked 1)
Anomaly detection identified this due to huge
file transfer on a non-standard port
Undetectable by SNORT due to the fact there are
no signatures for these activities

18
Recently Detected Real-life Attacks (ctd)

A sample of top ranked anomalies/attacks picked
by MINDS
October 10, 2002
Detected several instances of the slapper worm
that were not identified by SNORT since they were
variations of existing worm code
Detected by the MINDS anomaly detection algorithm
since source and destination ports are the same
but non-standard, and there is slow scan-like
behavior for the source port
Potentially detectable by SNORT using more
general rules, but the false alarm rate would be
too high

Number of slapper worms on port 2002 reported in
the World (Source www.incidents.org)

19
Recently Detected Real-life Attacks (ctd)

Top ranked anomalies/attacks picked by MINDS
October 10, 200
Detected a distributed windows networking scan
from two different source locations (Ranked 1)
Similar distributed scan from 100 machines
scattered around the World happened at University
of Auckland, New Zealand, on August 8, 2002 and
it was reported by CERT, Insecure.org and other
security organizations

[Figure labels: attack sources; destination IPs;
distributed scanning activity]
20
SNORT vs. MINDS Anomaly/Outlier

SNORT has static knowledge manually updated by
human analysts
MINDS anomaly/outlier detection algorithms are
adaptive in nature and include an infinite number
of rules
MINDS anomaly/outlier detection algorithms can
also be effective in detecting anomalous behavior
originating from a compromised machine

21
SNORT vs. MINDS Anomaly/Outlier

Content-based attacks (e.g. content of the
packet)
SNORT is able to detect only those attacks with
known signatures
Out of scope for MINDS anomaly/outlier detection
algorithms, since they do not use the content of
the packets
Scanning activities
Same source sequential destination scans
SNORT is better than MINDS anomaly/outlier
detection in identifying these attacks, since it
is specifically designed for their detection
Scans with random destinations
MINDS anomaly/outlier detection algorithms
discover them more quickly than SNORT, since
SNORT has to increase its time window (which
specifies the scanning threshold), resulting in
large memory requirements
Slow scans
MINDS anomaly/outlier detection identifies them
better than SNORT, since SNORT has to increase
its time window, which increases processing
requirements

22
SNORT vs. MINDS Anomaly/Outlier

Policy violations (e.g. rogue and unauthorized
services)
MINDS anomaly/outlier detection algorithms are
successful in detecting policy violations, since
they are looking for unusual and suspicious
network behavior
To detect these attacks SNORT has to have a rule
for each specific unauthorized activity, which
causes increase in the number of rules and
therefore the memory requirements

23
MINDS - Framework for Mining Associations
[Diagram labels: anomaly detection system; ranked
connections (attack, normal); discriminating
association pattern generator; knowledge base;
update]

Build normal profile
Study changes in normal behavior
Create attack summary
Detect misuse behavior
Understand nature of the attack

R1: TCP, DstPort=1863 → Attack
R100: TCP, DstPort=80 → Normal
24
Discovered Real-life Association Patterns

Rule 1: SrcIP=XXXX, DstPort=80, Protocol=TCP,
Flag=SYN, NoPackets=3, NoBytes=120...180
(c1=256, c2=1)
Rule 2: SrcIP=XXXX, DstIP=YYYY, DstPort=80,
Protocol=TCP, Flag=SYN, NoPackets=3,
NoBytes=120...180 (c1=177, c2=0)

At first glance, Rule 1 appears to describe a Web
scan
Rule 2 indicates an attack on a specific machine
Both rules together indicate that a scan is
performed first, followed by an attack on a
specific machine identified as vulnerable by the
attacker
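
The counts attached to each rule can be illustrated with a short sketch. It assumes that c1 is the number of connections ranked as anomalous that match the pattern and c2 is the number of normal connections that match it; the slides do not define these symbols, so this reading is an assumption. The function name `pattern_counts` and the dictionary keys are hypothetical.

```python
def pattern_counts(pattern, anomalous, normal):
    """Count how many connections in each group match every
    attribute=value pair of the pattern. Returns (c1, c2)."""
    def matches(conn):
        return all(conn.get(k) == v for k, v in pattern.items())

    c1 = sum(matches(c) for c in anomalous)   # matches among anomalous connections
    c2 = sum(matches(c) for c in normal)      # matches among normal connections
    return c1, c2
```

Under this reading, a pattern with a large c1 and a c2 near zero covers many anomalous connections and almost no normal ones, which is what makes it useful as an attack summary or a candidate signature.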

25
Discovered Real-life Association Patterns(ctd)
DstIP=ZZZZ, DstPort=8888, Protocol=TCP (c1=369,
c2=0)
DstIP=ZZZZ, DstPort=8888, Protocol=TCP,
Flag=SYN (c1=291, c2=0)

This pattern indicates an anomalously high number
of TCP connections on port 8888 involving machine
ZZZZ
Follow-up analysis of connections covered by the
pattern indicates that this could be a machine
running a variation of the Kazaa file-sharing
protocol
Having an unauthorized application increases the
vulnerability of the system

26
Discovered Real-life Association Patterns(ctd)
SrcIP=XXXX, DstPort=27374, Protocol=TCP,
Flag=SYN, NoPackets=4, NoBytes=189...200
(c1=582, c2=2)
SrcIP=XXXX, DstPort=12345, NoPackets=4,
NoBytes=189...200 (c1=580, c2=3)
SrcIP=YYYY, DstPort=27374, Protocol=TCP,
Flag=SYN, NoPackets=3, NoBytes=144 (c1=694, c2=3)

This pattern indicates a large number of scans on
ports 27374 (which is a signature for the
SubSeven worm) and 12345 (which is a signature
for NetBus worm)
Further analysis showed that no fewer than five
machines were scanning for one or both of these
ports in any time window

27
Discovered Real-life Association Patterns(ctd)
DstPort=6667, Protocol=TCP (c1=254, c2=1)

This pattern indicates an unusually large number
of connections on port 6667 detected by the
anomaly detector
Port 6667 is where IRC (Internet Relay Chat) is
typically run
Further analysis reveals that there are many
small packets from/to various IRC servers around
the world
Although IRC traffic is not unusual, the fact
that it is flagged as anomalous is interesting
This might indicate that the IRC server has been
taken down (by a DOS attack for example) or it is
a rogue IRC server (it could be involved in some
hacking activity)

28
Discovered Real-life Association Patterns(ctd)
DstPort=1863, Protocol=TCP, Flag=0, NoPackets=1,
NoBytes<139 (c1=498, c2=6)
DstPort=1863, Protocol=TCP, Flag=0 (c1=587, c2=6)
DstPort=1863, Protocol=TCP (c1=606, c2=8)

This pattern indicates a large number of
anomalous TCP connections on port 1863
Further analysis reveals that the remote IP block
is owned by Hotmail
Flag0 is unusual for TCP traffic

29
Conclusions

Rare class predictive models improve the
detection of infrequent attack types
MINDS anomaly/outlier detection algorithms are
successful in detecting intrusions that could
not be picked up by state-of-the-art
signature-based IDS tools (SNORT)
Slow scans and random scans
Policy violations and unauthorized activities
MINDS association patterns can be useful in
creating summaries of detected attacks and
suggesting new signatures

30
Future Work

On-line detection algorithms
Better characterization of normal behavior
Detection of distributed attacks
Insider attacks
Other applications of anomaly detection
Credit card fraud detection
Insurance fraud detection
Transient fault detection for industrial process
control
Detecting individuals with rare medical syndromes
(e.g. cardiac arrhythmia)

Questions?

32
Distance based Outlier Detection Schemes

Nearest Neighbor (NN) approach
For each point, compute the distance d_k to its
k-th nearest neighbor
Outliers are points that have a larger distance
d_k and are therefore located in sparser
neighborhoods
Mahalanobis-distance based approach
Mahalanobis distance is more appropriate for
computing distances with skewed distributions
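
The nearest neighbor scheme above can be sketched directly. This is a brute-force illustration using Euclidean distance; the function name `kth_nn_distance` is an assumption, and a real system would need an index structure or sampling to handle millions of connections.

```python
import math


def kth_nn_distance(points, k):
    """Outlier score for each point: the Euclidean distance to its
    k-th nearest neighbor (larger means more isolated)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores
```

Points are then ranked by score, and the highest-ranked ones are reported as outliers.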

33
Density based Outlier Detection Schemes

Local Outlier Factor (LOF) approach
For each point compute the density of local
neighborhood
Compute LOF of example p as the average of the
ratios of the density of p's nearest neighbors to
the density of example p
Outliers are points with the largest LOF value

In the NN approach, p2 is not considered an
outlier, while the LOF approach finds both p1 and
p2 as outliers
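
A compact sketch of the LOF computation follows, using the reachability-distance definition from the density based work cited earlier in the slides [Breunig00]. The function name `lof_scores` is an assumption, each point is given exactly k neighbors (ties broken by index), and duplicate points are assumed absent, since they would make a local density infinite.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of each point; values near 1 mean the point is
    about as dense as its neighbors, larger values mean it is sparser."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i),
                  key=lambda j: dist[i][j])[:k] for i in range(n)]
    # Distance from each point to its k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def local_density(i):
        # Inverse of the mean reachability distance from i to its neighbors.
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(reach) / sum(reach)

    dens = [local_density(i) for i in range(n)]
    # Average ratio of the neighbors' density to the point's own density.
    return [sum(dens[j] for j in knn[i]) / (k * dens[i]) for i in range(n)]
```

Because the score is relative to the neighbors' density, a point just outside a tight cluster can score high even when its absolute distance to its neighbors is small, which is the situation of p2 in the comparison above.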
34
Unsupervised Support Vector Machines for Outlier
Detection

Unsupervised SVMs attempt to separate the entire
set of training data from the origin, i.e. to
find a small region where most of the data lies
and label data points in this region as one class
Parameters
Expected number of outliers
Variance of the RBF kernel
As the variance of the RBF kernel gets smaller,
the separating surface gets more complex

push the hyper plane away from origin as much as
possible
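
The description above matches the one-class SVM formulation of Schölkopf et al. The slides do not name the variant, so treating it as that formulation is an assumption, and the symbols below (ν, ξ_i, ρ, Φ, σ, n) are notation introduced for this sketch only.

```latex
\min_{w,\;\xi,\;\rho}\quad
  \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i
  - \rho
\qquad\text{subject to}\qquad
  \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\quad
  \xi_i \ge 0,\quad i = 1,\dots,n

k(x, y) = \langle \Phi(x), \Phi(y)\rangle
        = \exp\!\left(-\frac{\lVert x - y\rVert^{2}}{2\sigma^{2}}\right)
```

In this notation, ν in (0, 1] bounds the fraction of training points allowed to fall on the origin side of the hyperplane, which plays the role of the expected number of outliers. Maximizing ρ pushes the hyperplane away from the origin, and σ² is the variance of the RBF kernel.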
35
SNORT signature based Network IDS

SNORT (www.snort.org) is an open source Network
Intrusion Detection System (IDS) based on
signatures
SNORT contains the anomaly detector SPADE
(Statistical Packet Anomaly Detection Engine),
which is usually turned off due to its high false
alarm rate
SNORT may be configured in one of the following
modes
Sniffer mode - reads packets from the network
and displays them in a continuous stream on the
console
Packet logger mode - logs packets to disk
Intrusion detection mode - analyzes network
traffic for matches against a user-defined rule
set and performs several actions based upon what
it sees

36
SPADE SNORT Anomaly Detection

SPADE is a SNORT preprocessor plugin that sends
alerts about anomalous packets through the
standard SNORT reporting mechanisms (the fewer
times a particular kind of packet has occurred in
the past, the higher its anomaly score will be)
It is part of the SPICE (Stealthy Probing and
Intrusion Correlation Engine) project at
www.silicondefense.com
SPICE consists of two parts
SPADE, which acts as an anomaly sensor engine and
reports anomalous events to the event correlator
An event correlator, which groups these events
together and sends out reports of unusual
activity (e.g., portscans)
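
The "rarer in the past means higher score" idea can be sketched as a negative log of the observed relative frequency. Whether SPADE computes its score exactly this way is an assumption here; the class name `RarityScorer`, the add-one smoothing, and the choice of (protocol, port) tuples as the packet "kind" are all illustrative.

```python
import math
from collections import Counter


class RarityScorer:
    """Scores a packet kind by how rarely it has been seen so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, kind):
        # Record one more occurrence of this kind of packet.
        self.counts[kind] += 1
        self.total += 1

    def score(self, kind):
        # Add-one smoothing keeps the score finite for unseen kinds.
        p = (self.counts[kind] + 1) / (self.total + 1)
        return -math.log2(p)
```

For example, after observing many packets to port 80 and a single packet to an unusual port, the unusual port receives the higher score, and a port never seen before scores higher still.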

37
Recently detected real-life attacks

http://www.cert.org/current/current_activity.html
Microsoft-DS (445/tcp) Activity
updated August 9, added August 9
We have received reports of widespread scanning
and possible denial of service activity targeted
at the Microsoft-DS service on port 445/tcp. We
are interested in receiving reports of this
activity from sites with detailed logs and
evidence of an attack. Please send all reports to
cert@cert.org

