MINDS: Data Mining Based Network Intrusion Detection System

Slides: 31
Provided by: bradbl2
Transcript and Presenter's Notes

1
MINDS: Data Mining Based Network Intrusion Detection System
Vipin Kumar (KUMAR_at_cs.umn.edu)
Army High Performance Computing Research Center, University of Minnesota
http://www.cs.umn.edu/research/minds/
Team Members: Eric Eilertson, Paul Dokas, Levent Ertoz, Ben Mayer,
Aleksandar Lazarevic, Michael Steinbach, George Simon, Varun Chandola,
Mark Shaneck, Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim, Vipin Kumar
2
Information Assurance
  • The sophistication and severity of cyber attacks
    are increasing
  • ARL, the Army, DoD, and other U.S. Government
    agencies are major targets for sophisticated
    state-sponsored cyber terrorists
  • Cyber strategies can be a major force multiplier
    and equalizer
  • Across DoD, computer assets have been
    compromised and information has been stolen,
    putting technological advantage and battlefield
    superiority at risk
  • Security mechanisms always have inevitable
    vulnerabilities
  • Firewalls are not sufficient to ensure security
    in computer networks
  • Insider attacks

Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC)
Figure: Spread of the SQL Slammer worm 10 minutes after its deployment
3
Information Assurance
  • Intrusion Detection System
  • Combination of software and hardware that
    attempts to perform intrusion detection
  • Raises an alarm when a possible intrusion occurs
  • Traditional intrusion detection system (IDS) tools
    are based on signatures of known attacks
  • Limitations
  • Signature database has to be manually revised
    for each new type of discovered intrusion
  • Substantial latency in deployment of newly
    created signatures across the computer system
  • They cannot detect emerging cyber threats
  • Not suitable for detecting policy violations and
    insider abuse
  • Do not provide understanding of network traffic
  • Generate too many false alarms

Example of SNORT rule (MS-SQL Slammer worm):
any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")
www.snort.org
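
To make the signature idea concrete, here is a minimal Python sketch of what the three content checks in the rule above amount to. It is only an illustration under stated assumptions: the function name and the tuple-free interface are hypothetical, and real SNORT matching has many more options (offsets, depth, ordering) that are ignored here.

```python
# Byte patterns taken from the rule on the slide; everything else is illustrative.
SLAMMER_CONTENTS = [
    bytes.fromhex("81 F1 03 01 04 9B 81 F1 01"),
    b"sock",
    b"send",
]


def matches_slammer_signature(protocol: str, dst_port: int, payload: bytes) -> bool:
    """Return True when a packet is UDP to port 1434 and contains every pattern."""
    if protocol.lower() != "udp" or dst_port != 1434:
        return False
    # A signature only fires when all content patterns are present.
    return all(pattern in payload for pattern in SLAMMER_CONTENTS)
```

A payload that changes even one of these byte sequences no longer matches, which is the limitation listed above: each new variant needs a manually revised signature.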
4
Data Mining for Intrusion Detection
  • Increased interest in data mining based intrusion
    detection
  • Attacks for which it is difficult to build
    signatures
  • Unforeseen/Unknown/Emerging attacks
  • Misuse detection
  • Building predictive models from labeled
    data sets (instances are labeled as normal or
    intrusive) to identify known intrusions
  • High accuracy in detecting many kinds of known
    attacks
  • Cannot detect unknown and emerging attacks
  • Anomaly detection
  • Detect novel attacks as deviations from normal
    behavior
  • Potential high false alarm rate - previously
    unseen (yet legitimate) system behaviors may also
    be recognized as anomalies
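
As a rough illustration of "deviation from normal behavior", the sketch below scores each record by its distance to its k-th nearest neighbor, so isolated records get high scores. This is a generic stand-in chosen for brevity, not necessarily the scoring used in MINDS; the function name and the choice of Euclidean distance are assumptions.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    points: list of equal-length numeric tuples (needs more than k points).
    Larger scores mean the point lies farther from the bulk of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores
```

Note that a legitimate but previously unseen behavior is also far from everything else, which is exactly the false alarm risk mentioned in the last bullet.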

5
Data Mining for Intrusion Detection
Figure: Misuse detection by building predictive models. A classifier is learned from a training set (continuous, categorical, and temporal attributes plus a class label) and applied to a test set.
Figure: Anomaly detection, followed by summarization of attacks using association rules.
Rules discovered: {Src IP = 206.163.37.95, Dest Port = 139, Bytes ∈ [150, 200]} --> {ATTACK}
  • Key Technical Challenges
  • Large data size
  • High dimensionality
  • Temporal nature of the data
  • Skewed class distribution
  • Data preprocessing
  • On-line analysis
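
A discovered rule of this kind can be applied to new flows as a simple predicate. The sketch below is illustrative only: the `Flow` and `Rule` classes, their field names, and the inclusive byte range are assumptions, not part of MINDS.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_port: int
    num_bytes: int


@dataclass(frozen=True)
class Rule:
    # A field left as None means "any value".
    src_ip: Optional[str] = None
    dst_port: Optional[int] = None
    bytes_range: Optional[Tuple[int, int]] = None  # assumed inclusive
    label: str = "ATTACK"

    def matches(self, flow: Flow) -> bool:
        if self.src_ip is not None and flow.src_ip != self.src_ip:
            return False
        if self.dst_port is not None and flow.dst_port != self.dst_port:
            return False
        if self.bytes_range is not None:
            low, high = self.bytes_range
            if not (low <= flow.num_bytes <= high):
                return False
        return True


def classify(flow: Flow, rules, default: str = "NORMAL") -> str:
    """Return the label of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(flow):
            return rule.label
    return default


# The rule shown on the slide, expressed with the hypothetical classes above.
slide_rule = Rule(src_ip="206.163.37.95", dst_port=139, bytes_range=(150, 200))
```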
7
MINDS: Minnesota INtrusion Detection System
Figure: MINDS system. Traffic from the network is collected by a data capturing device (net flow tools, tcpdump), then passes through filtering and feature extraction. Known attack detection reports detected known attacks; anomaly detection produces anomaly scores; association pattern analysis produces a summary and characterization of attacks. A human analyst reviews the output, identifies detected novel attacks, and supplies labels.
  • Data mining based intrusion detection system
  • Incorporated into Interrogator architecture at
    ARL Center for Intrusion Monitoring and
    Protection (CIMP)
  • Helps analyze data from multiple sensors at DoD
    sites around the country
  • MINDS anomalies are used as the primary key when
    viewing related alerts from other tools (SNORT,
    Jids, etc.)
  • MINDS is the first effective anomaly intrusion
    detection system used by ARL
  • Routinely detects attacks and intrusive behavior
    not detected by widely used intrusion detection
    systems
  • Insider Abuse / Policy Violations / Worms / Scans

8
Feature Extraction Module
  • Three groups of features
  • Basic features of individual TCP connections
  • Source and destination IP: Features 1 and 2
  • Source and destination port: Features 3 and 4
  • Protocol: Feature 5
  • Duration: Feature 6
  • Bytes per packet: Feature 7
  • Number of bytes: Feature 8
  • Time-based features
  • For the same source (destination) IP address,
    number of unique destination (source) IP
    addresses inside the network in the last T
    seconds: Features 9 (13)
  • Number of connections from the source (destination)
    IP to the same destination (source) port in the
    last T seconds: Features 11 (15)
  • Connection-based features
  • For the same source (destination) IP address,
    number of unique destination (source) IP
    addresses inside the network in the last N
    connections: Features 10 (14)
  • Number of connections from the source (destination)
    IP to the same destination (source) port in the
    last N connections: Features 12 (16)
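
The two window types can be sketched as follows for one feature, the number of unique destination IPs per source. This is a simplified illustration, not the MINDS implementation: the function names and the `(timestamp, src_ip, dst_ip)` tuple layout are hypothetical, the current flow is counted in its own window, the "inside the network" filter is omitted, and the last-N window is assumed to be kept per source.

```python
from collections import deque


def unique_dsts_last_t_seconds(flows, t_seconds):
    """Time-based window. flows must be sorted by timestamp."""
    history = {}  # src_ip -> deque of (timestamp, dst_ip)
    counts = []
    for ts, src, dst in flows:
        window = history.setdefault(src, deque())
        window.append((ts, dst))
        # Drop entries older than T seconds relative to the current flow.
        while window[0][0] < ts - t_seconds:
            window.popleft()
        counts.append(len({d for _, d in window}))
    return counts


def unique_dsts_last_n_connections(flows, n):
    """Connection-based window over the last n flows of each source."""
    history = {}  # src_ip -> deque of the last n dst_ip values
    counts = []
    for _, src, dst in flows:
        window = history.setdefault(src, deque(maxlen=n))
        window.append(dst)  # deque(maxlen=n) discards the oldest entry itself
        counts.append(len(set(window)))
    return counts
```

The connection-based variant matters for slow activity: a source that contacts one new host every few minutes never accumulates within a short T-second window, but it still stands out across its last N connections.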

9
Detection of Anomalies on Real Network Data
  • Anomalies/attacks picked by MINDS include
    scanning activities, worms, and non-standard
    behavior such as policy violations and insider
    attacks. Many of the attacks detected by MINDS
    have already appeared on the CERT/CC list of
    recent advisories and incident notes.
  • Some illustrative examples of intrusive behavior
    detected using MINDS at U of M
  • Scans
  • Detected scanning for Microsoft DS service on
    port 445/TCP
  • Undetected by SNORT since the scanning was
    non-sequential (very slow). Rule added to SNORT
    in September 2002
  • Detected scanning for Oracle server
  • Undetected by SNORT because the scanning was
    hidden within another Web scanning
  • Detected a distributed Windows networking scan
    from multiple source locations
  • Policy Violations
  • Identified machine running Microsoft PPTP VPN
    server on non-standard ports
  • Undetected by SNORT since the collected GRE
    traffic was part of the normal traffic
  • Identified compromised machines running FTP
    servers on non-standard ports, which is a policy
    violation
  • Example of anomalous behavior following a
    successful Trojan horse attack
  • Detected computers on the network apparently
    communicating with outside computers over a VPN
    or on IPv6
  • Worms
  • Detected several instances of the Slapper worm that
    were not identified by SNORT since they were
    variations of existing worm code
  • Detected unsolicited ICMP ECHOREPLY messages to a
    computer previously infected with the Stacheldraht
    worm (a DDoS agent)

10
Typical Anomaly Detection Output
  • January 26, 2003 (48 hours after the Slammer
    worm)
  • Anomalous connections that correspond to the
    Slammer worm
  • Anomalous connections that correspond to the ping
    scan
  • Connections corresponding to UM machines
    connecting to Half-Life game servers

11
Summarization Using Association Patterns
Figure: The anomaly detection system outputs ranked connections, labeled attack or normal, which feed the discriminating association pattern generator; its rules update the knowledge base.
  1. Build normal profile
  2. Study changes in normal behavior
  3. Create attack summary
  4. Detect misuse behavior
  5. Understand nature of the attack

Example rules:
R1: TCP, DstPort=1863 --> Attack
...
R100: TCP, DstPort=80 --> Normal
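
One way to picture a discriminating pattern is as a set of attribute values that is frequent among connections ranked as attack and infrequent among those ranked as normal. The sketch below enumerates small patterns by brute force and ranks them by the difference in support. It is an assumption-laden illustration: the function name, the dict-based records, and the support-difference ranking are not taken from MINDS, and both input lists are assumed to be non-empty.

```python
from collections import Counter
from itertools import combinations


def discriminating_patterns(attack, normal, min_support=0.5, max_len=2):
    """Rank attribute-value patterns by support(attack) - support(normal).

    attack, normal: non-empty lists of dicts such as
    {"proto": "TCP", "dst_port": 1863}.
    Returns a list of (pattern, attack_support, normal_support) tuples.
    """
    def supports(records):
        counts = Counter()
        for rec in records:
            items = sorted(rec.items())  # keys are unique, so order is by key
            for size in range(1, max_len + 1):
                for combo in combinations(items, size):
                    counts[combo] += 1
        return {pattern: c / len(records) for pattern, c in counts.items()}

    attack_support = supports(attack)
    normal_support = supports(normal)
    ranked = [
        (pattern, s, normal_support.get(pattern, 0.0))
        for pattern, s in attack_support.items()
        if s >= min_support
    ]
    # Stable sort: patterns with equal difference keep their discovery order.
    ranked.sort(key=lambda r: r[1] - r[2], reverse=True)
    return ranked
```

With the two example rules, `DstPort=1863` would rank at the top because it appears only in the attack set, while `TCP` alone would rank last because it is equally common in both.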
12
Typical MINDS Output
  • UM computer connecting to a remote FTP server,
    running on port 5002
  • Summarized TCP reset packets received from
    64.156.X.74, which is a victim of DoS attack, and
    we were observing backscatter, i.e. replies to
    spoofed packets
  • Summarization of an FTP scan from a computer in
    Colombia, 200.75.X.2
  • Summary of IDENT lookups, where a remote computer
    tries to get user name
  • Summarization of a USENET server transferring a
    large amount of data

13
Typical MINDS Output
  • UM computers doing bulk transfers
  • Attack on Real-Media server (Reported by CERT on
    September 9, 2003, RealNetworks media server
    RTSP protocol parser buffer overflow)
  • 8200/tcp traffic related to gotomypc.com which
    allows users to remotely control a desktop
    (involves a third party)
  • Mysterious traffic currently being investigated

14
Typical MINDS Output
  • UMN computers doing bulk transfers
  • 160.94.122.142 is running a rogue FTP server on
    60000/TCP
  • UMN computers doing large transfers via
    BitTorrent to many outside hosts
  • This computer is scanning for computers on port
    139/TCP. The majority of the packets are 192 bytes
    or 144 bytes, except for the second summary (score
    88.2)
  • UMN computer running a RealMedia server, that was
    not known to the analyst
  • Odd looking P2P traffic to/from a UMN computer
    (potentially KaZaA or Gnutella)
  • The remote computer was scanning for 57/TCP,
    where RESET packets are sent back from computers
    that do not have 57/TCP open.

15
Scan Detection
  • Despite the importance of scan detection, its
    value is often overlooked
  • Lack of good tools for scan detection
  • Existing methods either miss stealth scans or
    give too many false alarms
  • Fast scans are easy to catch using existing
    schemes but stealth scans are very difficult to
    recognize
  • MINDS employs our new methodology for detecting
    network scans
  • Makes use of powerful new heuristics
  • Only considers flows with a small number of
    packets
  • Only considers scans in a subnet (not the whole
    internet)
  • Makes effective use of usage information
  • Touches to rare IP / port combinations are more
    suspicious than others
  • A scanner will hit machines where the service is
    not available resulting in a low count
  • Very low False Alarm rate
  • Evaluation of 36 million flows over a 30-minute
    window at the University of Minnesota showed 2583
    alarms but only 22 false alarms
  • Evaluation on an hour of data at the ARL showed
    1150 reported scans, but only 5 false alarms
  • Routinely finds compromised machines at ARL-CIMP
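
The heuristics listed above (small flows only, and extra suspicion for touches to rarely used IP/port combinations) can be sketched roughly as below. This is not the MINDS scan detector: the function name, the thresholds, and the `(src_ip, dst_ip, dst_port, packets)` tuple layout are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def scan_suspects(flows, max_packets=3, rare_threshold=1, min_rare_touches=3):
    """Flag sources that touch many rarely used (dst_ip, dst_port) pairs.

    flows: iterable of (src_ip, dst_ip, dst_port, packets).
    Returns {src_ip: number of distinct rare pairs touched}.
    """
    flows = list(flows)
    # Usage information: how often each destination IP/port pair is touched.
    usage = Counter((dst, port) for _, dst, port, _ in flows)

    rare_touches = defaultdict(set)
    for src, dst, port, packets in flows:
        # Only flows with a small number of packets are considered.
        if packets <= max_packets and usage[(dst, port)] <= rare_threshold:
            rare_touches[src].add((dst, port))

    return {
        src: len(pairs)
        for src, pairs in rare_touches.items()
        if len(pairs) >= min_rare_touches
    }
```

A popular web server accumulates a high usage count and is ignored, while a scanner probing hosts where the service is not offered produces many low-count pairs.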

16
Detecting Suspicious Ports for Possible Worm
Activity
  • We find destinations located within the network
    for which there is a high connection failure rate
    on specific ports for inbound, non-scan
    connections
  • Then we find ports on which there are many such
    destinations
  • The existence of these ports indicates a
    potential worm or slow scan
  • This warrants targeted and more detailed data
    collection and analysis that cannot be done
    easily on the entire data
  • Packet content analysis
  • Signature generation
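
The two steps described above can be sketched as: compute the failure rate for each destination IP/port pair, then keep the ports on which many destinations show a high failure rate. The function name, thresholds, and input layout are assumptions; the input is taken to be already restricted to inbound, non-scan connections.

```python
from collections import defaultdict


def suspicious_ports(connections, min_failure_rate=0.8, min_destinations=3):
    """Return {port: number of destinations with a high failure rate}.

    connections: iterable of (dst_ip, dst_port, succeeded) tuples.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for dst, port, succeeded in connections:
        totals[(dst, port)] += 1
        if not succeeded:
            failures[(dst, port)] += 1

    # Step 1: destinations with a high connection failure rate on a port.
    failing_destinations = defaultdict(set)
    for (dst, port), total in totals.items():
        if failures[(dst, port)] / total >= min_failure_rate:
            failing_destinations[port].add(dst)

    # Step 2: ports on which there are many such destinations.
    return {
        port: len(dsts)
        for port, dsts in failing_destinations.items()
        if len(dsts) >= min_destinations
    }
```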

17
IP / port pairs for which a large percentage of
connections failed
18
IP / port pairs for which a large percentage of
connections failed (only for ports with many hits)
19
(No Transcript)
20
  • 999 unique sources (Min 1, Max 28, Avg 1)
  • 1126 unique destinations (Min 1, Max 55, Avg 1)
  • 1516 total flows involved
  • 1472 scan flows on port 80 (found by scan detector)
21
(No Transcript)
22
  • 7982 unique sources (Min 1, Max 16, Avg 1)
  • 6184 unique destinations (Min 1, Max 28, Avg 1)
  • 9930 total flows involved
  • 9406 scan flows on port 445 (found by scan detector)
23
(No Transcript)
24
Clustering
  • Useful for detecting modes of behavior
  • Shared Nearest Neighbor (SNN) clustering works
    quite well at determining modes of behavior
  • Not distracted by noise in the data
  • SNN is CPU intensive, O(N^2)
  • Requires storing an N x K matrix
  • K (number of neighbors) is typically between 10
    and 20
  • K should be about the size of the smallest
    expected mode
  • Clustered 850,000 connections collected over one
    hour at one US Army Fort
  • Took 10 hours using 3 quad 2.8 GHz servers and 4
    2 GHz workstations (total of 16 CPUs)
  • Required around 100 MB of memory per PE for the
    distance calculations
  • 500 MB of memory for the final clustering step
    on a single PE
  • Found 3135 clusters
  • Largest clusters around 500 records, smallest
    cluster 10 records
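
The quadratic cost and the N x K storage both come from the first stage of SNN: finding each record's K nearest neighbors and counting how many neighbors two records share. The sketch below shows only that similarity step, in its common form where two records are compared only if each is in the other's neighbor list. The function name and the use of Euclidean distance are assumptions, and the later clustering stages are left out.

```python
import math


def snn_similarity(points, k):
    """Shared nearest neighbor similarity for every pair of points.

    points: list of equal-length numeric tuples.
    Returns an n x n matrix where entry [i][j] is the number of neighbors
    shared by i and j, or 0 unless each is among the other's k nearest.
    """
    n = len(points)
    # k-nearest-neighbor lists: n sets of size k (the N x K structure).
    neighbors = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        neighbors.append(set(order[:k]))

    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if i in neighbors[j] and j in neighbors[i]:
                sim[i][j] = sim[j][i] = len(neighbors[i] & neighbors[j])
    return sim
```

Because similarity depends on shared neighbors and not on raw distance, a noise record that sits between two modes shares few neighbors with either and ends up weakly connected to both.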

25
Detecting Large Modes of Network Traffic Using
Clustering
  • Large clusters of VPN traffic (hundreds of
    connections)
  • Used between forts for secure sharing of data and
    working remotely

26
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters Involving GoToMyPC.com (Army Data)
  • Policy violation, allows remote control of a
    desktop

27
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters involving mysterious ping and SNMP
    traffic

28
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters involving unusual repeated FTP sessions
  • Further investigation revealed that a misconfigured
    Army computer was trying to contact Microsoft

29
MINDS CRITICAL TO COMPLETE FUNCTIONALITY
Figure labels: Packet-Based Signature Detection, Session-Based Signature Detection, Header Analysis, Behavior Analysis (MINDS), Viruses and Worms, Simple Scans, Scans with Automatic Virus Attacks, Scans with Target Responses, New and Variant Attacks, Anomaly Detection and New Attacks, Compromises
Army Research Laboratory (ARL), supported by the
AHPCRC and the MINDS initiative, successfully
monitors and analyzes network data to protect ARL
and its Army and DoD customer infospace
30
Current MINDS Research and Development Work
  • Correlation of suspicious events across network
    sites
  • Helps detect sophisticated attacks not
    identifiable by single site analyses
  • Scalable anomaly detection
  • Distributed correlation algorithms
  • Grids middleware
  • Analysis of long term data (months/years)
  • Uncover suspicious stealth activities (e.g.
    insiders leaking/modifying information)