Network Intrusion Detection Using Random Forests - PowerPoint PPT Presentation

About This Presentation
Title:

Network Intrusion Detection Using Random Forests

Description:

Total losses of 2004 (reported): $141,496,560. Source: FBI survey for Year 2004. 50% of security breaches are undetected. Source: FBI Statistics for Year 2000. PST2005 ...

Provided by: jiong

Transcript and Presenter's Notes


1
Network Intrusion Detection Using Random Forests
  • Jiong Zhang
  • Mohammad Zulkernine
  • School of Computing
  • Queen's University
  • Kingston, Ontario, Canada

2
Outline
  • Motivation
  • Intrusion detection system
  • Data mining meets intrusion detection
  • Proposed architecture
  • Challenges and solutions
  • Experimental results
  • Conclusion and future work

3
Motivation
  • An intrusion prevention system (such as a firewall)
    cannot prevent all attacks.

[Figure: intruders, a firewall, the Internet, and a victim]
4
Motivation (contd.)
  • Statistical data for intrusions
  • Total reported losses in 2004: $141,496,560.
  • Source: FBI survey for Year 2004
  • 50% of security breaches are undetected.
  • Source: FBI Statistics for Year 2000

5
Intrusion Detection Techniques
  • Misuse Detection
  • Extracts patterns of known intrusions
  • Cannot detect novel intrusions
  • Has low false positive rate
  • Anomaly Detection
  • Builds profiles for normal activities
  • Uses the deviations from the profiles to detect
    attacks
  • Can detect unknown attacks
  • Has high false positive rate
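The contrast between the two techniques can be sketched in a few lines of Python. This is only an illustration, not the presenters' method: the signature strings, the connection-count feature, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks (misuse detection).
KNOWN_SIGNATURES = {"GET /etc/passwd", "' OR 1=1 --"}

def misuse_detect(payload: str) -> bool:
    """Flag a payload only if it contains a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def build_profile(normal_values):
    """Summarize normal activity as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, profile, threshold=3.0):
    """Flag a value that deviates too far from the normal profile."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma

# Made-up connection counts observed during normal operation.
normal_counts = [10, 12, 9, 11, 10, 13, 8, 11]
profile = build_profile(normal_counts)
```

A payload that matches no signature passes misuse detection untouched, while a connection count far outside the profile is flagged by anomaly detection even though no signature exists for it. A tighter threshold would catch more attacks at the price of more false positives.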

6
Network Intrusion Detection System (NIDS)
  • Monitors network traffic to detect intrusions
  • Monitors more targets on a network
  • Detects some attacks that host-based systems miss
  • Does not affect network operations

7
Current NIDS
  • Many current NIDSs (such as Snort) are
  • Rule-based
  • Unable to detect novel attacks
  • Costly to maintain

8
Rule Based vs. Data Mining
  • Rule based systems
  • Data mining based systems

[Figure: rule-based systems (Intrusion Data, Security Experts, Rules) compared with data mining based systems (Labeled Data, Data Mining Engine, Patterns)]
9
Data Mining Meets Intrusion Detection
  • Extract patterns of intrusions for misuse
    detection
  • Build profiles of normal activities for anomaly
    detection
  • Build classifiers to detect attacks
  • Some IDSs have successfully applied data mining
    techniques in intrusion detection

10
Proposed Architecture
[Figure: Architecture of the proposed NIDS]
On-line components: Networks, Sensors, On-line Pre-processors, Detector, Alarmer, Database (on line). Data exchanged: packets, audited data, feature vectors, alarms.
Off-line components: Data Set, Off-line Pre-processor, Pattern Builder, Database (off line). Data exchanged: feature vectors, training data, patterns.
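The on-line half of the architecture (sensors, pre-processor, detector, alarmer) can be sketched as a small pipeline. Everything below is hypothetical: the `Packet` fields, the three summary features, and the `toy_patterns` rule stand in for whatever the real sensors and the off-line pattern builder would produce.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Packet:
    src: str
    dst: str
    service: str
    n_bytes: int

FeatureVector = List[float]

def online_preprocess(packets: Sequence[Packet]) -> FeatureVector:
    """On-line pre-processor: summarize captured packets as one feature vector.
    These three features are illustrative, not the ones used in the slides."""
    count = len(packets)
    total_bytes = sum(p.n_bytes for p in packets)
    n_services = len({p.service for p in packets})
    return [float(count), float(total_bytes), float(n_services)]

def detect(features: FeatureVector,
           patterns: Callable[[FeatureVector], str]) -> str:
    """Detector: apply the patterns built off line to a feature vector."""
    return patterns(features)

def alarm(label: str, alarms: List[str]) -> None:
    """Alarmer: record an alarm for anything not classified as normal."""
    if label != "normal":
        alarms.append(f"ALERT: {label}")

def toy_patterns(features: FeatureVector) -> str:
    """Stand-in for patterns produced by the off-line pattern builder."""
    return "dos" if features[0] > 100 else "normal"

alarms: List[str] = []
burst = [Packet("10.0.0.9", "10.0.0.1", "http", 60) for _ in range(500)]
alarm(detect(online_preprocess(burst), toy_patterns), alarms)
```

Passing the patterns in as a plain callable mirrors the split in the diagram: the off-line side can rebuild them without any change to the on-line code.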
11
Random Forests
  • Unsurpassed in accuracy among current data
    mining algorithms
  • Runs efficiently on large data sets with many
    features
  • Gives estimates of which features are
    important
  • No nominal data problem
  • No over-fitting

12
Imbalanced Intrusion
  • Problems
  • Higher error rate for minority intrusions
  • Some minority intrusions are more dangerous
  • Need to improve the performance for the minority
    intrusions
  • Proposed Solution
  • Down-sample the majority intrusions and
    over-sample the minority intrusions

13
Feature Selection
  • Essential for improving detection rate
  • Reduces the computational cost
  • Many NIDSs select features by intuition or
    domain knowledge

14
Feature Selection over the KDD99 Dataset
  • Calculate variable importance using random
    forests.
  • Select the 38 most important features in
    detection.
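Random forests estimate variable importance by permuting one feature at a time and measuring how much accuracy drops. The sketch below shows that idea in simplified form: it permutes over a whole data set rather than each tree's out-of-bag samples, and it uses a stand-in `predict` function instead of a trained forest. The data and the helper names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Importance of feature j = accuracy lost when column j is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances

def select_top(importances, k):
    """Indices of the k most important features, in ascending index order."""
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])

# Toy data: only feature 0 determines the label.
rng = random.Random(1)
rows = [tuple(rng.random() for _ in range(3)) for _ in range(200)]
labels = [r[0] > 0.5 for r in rows]

def predict(row):
    """Stand-in model that uses only feature 0."""
    return row[0] > 0.5

imp = permutation_importance(predict, rows, labels, n_features=3)
top = select_top(imp, k=1)
```

Features the model never uses get an importance of exactly zero here, which is the signal used to drop them.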

15
Some Features
  • The two most important features
  • Feature 3. service: network service type, such as
    http, telnet, and ftp
  • Feature 23. count: number of connections to the same
    host as the current one during the past two seconds
  • The three least important features
  • Feature 7. land: 1 if the connection is from/to the
    same host/port; 0 otherwise
  • Feature 20. num_outbound_cmds: number of outbound
    commands in an ftp session
  • Feature 21. is_hot_login: 1 if the login belongs
    to the hot list; 0 otherwise

16
Parameter Optimization for Random Forests
  • Optimize the parameter Mtry of random forests (the
    number of features randomly sampled at each split)
    to improve the detection rate.
  • Choose 15 as the optimal value, since it minimizes
    the out-of-bag (oob) error rate.
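The tuning loop can be sketched with a toy forest of one-split trees (stumps), where sampling `mtry` features per tree is the same as sampling them per split. This is a simplified stand-in for a real random forest: the data, the candidate values, and the function names are assumptions, and it will not reproduce the value 15 reported above.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Best single-feature threshold rule among the candidate features."""
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errs = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errs < best[0]:
                best = (errs, j, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        maj = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def stump_predict(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def oob_error(rows, labels, mtry, n_trees=30, seed=0):
    """Out-of-bag error: each row is scored only by trees that did not
    see it in their bootstrap sample."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        features = rng.sample(range(n_features), mtry)     # random subset
        stump = fit_stump([rows[i] for i in in_bag],
                          [labels[i] for i in in_bag], features)
        for i in set(range(n)) - set(in_bag):              # out-of-bag rows
            votes[i][stump_predict(stump, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], y) for v, y in zip(votes, labels) if v]
    return sum(p != y for p, y in scored) / len(scored)

def tune_mtry(rows, labels, candidates, **kwargs):
    """Return the candidate with the lowest oob error, plus all errors."""
    errors = {m: oob_error(rows, labels, m, **kwargs) for m in candidates}
    return min(errors, key=errors.get), errors

# Toy data: four features, label determined by feature 2 only.
rng = random.Random(7)
rows = [tuple(rng.random() for _ in range(4)) for _ in range(60)]
labels = [r[2] > 0.5 for r in rows]
best_mtry, errors = tune_mtry(rows, labels, candidates=[1, 2, 4])
```

Because the oob error comes from rows each tree never saw, it gives an error estimate without holding out a separate validation set.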

17
Performance Comparison on the KDD99 Dataset
  • Our approach provides a lower overall error rate
    and cost than the best KDD99 result.
  • Feature selection can improve the performance of
    intrusion detection.
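The two metrics compared here can be computed from a confusion matrix. The sketch below assumes that "cost" means the average misclassification cost per test record, weighted by a cost matrix. The three-class confusion matrix and the cost values are made up for illustration and are not the presentation's results.

```python
def overall_error_rate(confusion):
    """Share of records off the diagonal (rows = actual, cols = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def average_cost(confusion, cost):
    """Mean cost per record: each cell count times its misclassification cost."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    weighted = sum(confusion[i][j] * cost[i][j]
                   for i in range(n) for j in range(n))
    return weighted / total

# Hypothetical counts for three classes, e.g. normal, probe, u2r.
confusion = [
    [90, 10, 0],
    [5, 40, 5],
    [2, 3, 45],
]
# Hypothetical costs: missing the rare class (last row) is penalized most.
cost = [
    [0, 1, 2],
    [1, 0, 2],
    [3, 2, 0],
]

error = overall_error_rate(confusion)
avg_cost = average_cost(confusion, cost)
```

Two classifiers with the same error rate can have different costs, because the cost matrix penalizes some mistakes more heavily than others.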

18
Conclusion and Future Work
  • The random forests algorithm can help improve
    detection performance and select features.
  • Sampling techniques can reduce the time to build
    patterns and increase the detection rate of
    minority intrusions.
  • In the future, we will focus on anomaly detection
    and a multiple classifier architecture.
