
1
Testing Intrusion Detection Systems: A Critique
of the 1998 and 1999 DARPA Intrusion Detection
System Evaluations as Performed by Lincoln
Laboratory
  • By John McHugh
  • Presented by Hongyu Gao
  • Feb. 5, 2009

2
Outline
  • Lincoln Lab's evaluation in 1998
  • Critique of data generation
  • Critique of the taxonomy
  • Critique of the evaluation process
  • Brief discussion on 1999 evaluation
  • Conclusion

3
The 1998 evaluation
  • The most comprehensive evaluation of research on
    intrusion detection systems that has been
    performed to date

4
The 1998 evaluation, cont'd
  • Objectives
  • To provide an unbiased measurement of current
    performance levels
  • To provide a common, shared corpus of
    experimental data that is available to a wide
    range of researchers

5
The 1998 evaluation, cont'd
  • Simulated a typical air force base network

6
The 1998 evaluation, cont'd
  • Collected synthetic traffic data

7
The 1998 evaluation, cont'd
  • Researchers tested their systems using the traffic
  • Receiver Operating Characteristic (ROC) curves
    were used to present the results (a minimal
    sketch of such a curve follows)
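Below is a minimal sketch, not part of the original slides, of how an ROC-style curve (detection rate versus false-positive rate) can be computed by sweeping a detection threshold over per-session scores. The scores, labels, and function name are illustrative assumptions, not the evaluation's actual scoring code.

    # Minimal sketch: build ROC-style points by sweeping a score threshold.
    # Scores and labels are illustrative; this is not Lincoln Lab's scoring code.
    def roc_points(scores, labels):
        """Return (false_positive_rate, detection_rate) pairs, one per threshold."""
        positives = sum(labels)               # attack sessions (label 1)
        negatives = len(labels) - positives   # normal sessions (label 0)
        points = []
        for t in sorted(set(scores), reverse=True):
            tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
            fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
            points.append((fp / negatives, tp / positives))
        return points

    # Example: four sessions, two of which are attacks.
    print(roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))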

8
1. Critique of data generation
  • Both background (normal) and attack data are
    synthesized.
  • Said to represent traffic to and from a typical
    air force base.
  • Such synthesized data is expected to predict
    system performance in realistic scenarios.

9
Critique of the background data
  • Counterpoint 1
  • Real traffic is not well-behaved.
  • E.g., spontaneous packet storms that are
    indistinguishable from malicious flooding
    attempts.
  • Not considered in the background traffic

10
Critique of the background data, cont'd
  • Counterpoint 2
  • Low average data rate

11
Critique of the background data, cont'd
  • Possible negative consequences
  • The system may produce a larger number of false
    positives (FP) in a realistic scenario.
  • The system may drop packets in a realistic
    scenario.

12
Critique of the attack data
  • The distribution of attacks is not realistic
  • The numbers of U2R, R2L, DoS, and Probing attacks
    are all of the same order

  Attack type   U2R   R2L   DoS   Probing
  Count         114    34    99        64
13
Critique of the attack data, cont'd
  • Possible negative consequence
  • The aggregate detection rate does not reflect the
    detection rate on real traffic (illustrated by
    the sketch below)
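To make the point concrete, here is a small illustrative sketch that is not part of the original slides: the per-category detection rates are made-up numbers, the "evaluation mix" uses the counts from the previous slide's table, and the "operational mix" is a hypothetical traffic profile dominated by DoS and probing. The same detector yields very different aggregate rates under the two mixes.

    # Illustrative only: how the aggregate detection rate shifts with the attack mix.
    per_category_rate = {"U2R": 0.8, "R2L": 0.7, "DoS": 0.3, "Probing": 0.5}  # made-up rates

    evaluation_mix = {"U2R": 114, "R2L": 34, "DoS": 99, "Probing": 64}   # counts from the table above
    operational_mix = {"U2R": 2, "R2L": 10, "DoS": 500, "Probing": 300}  # hypothetical real-world mix

    def aggregate_rate(mix):
        total = sum(mix.values())
        return sum(per_category_rate[c] * n for c, n in mix.items()) / total

    print("evaluation mix :", round(aggregate_rate(evaluation_mix), 3))   # ~0.57
    print("operational mix:", round(aggregate_rate(operational_mix), 3))  # ~0.38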

14
Critique of the simulated AFB network
  • Not likely to be realistic
  • 4 real machines
  • 3 fixed attack targets
  • Flat architecture
  • Possible negative consequences
  • An IDS can be tuned to only look at traffic
    targeting certain hosts
  • Precludes the execution of smurf or ICMP echo
    attacks

15
2. Critique of the taxonomy
  • Based on the attacker's point of view
  • Denial of service (DoS)
  • Remote to local (R2L)
  • User to root (U2R)
  • Probing
  • Not useful for describing what an IDS might see

16
Critique of the taxonomy, cont'd
  • Alternative taxonomy
  • Classify by protocol layer
  • Classify by whether a completed protocol
    handshake is necessary
  • Classify by severity of attack
  • Many others

17
3. Critique of the evaluation
  • The unit of evaluation
  • The session is used
  • Some traffic (e.g., messages originating from
    Ethernet hubs) is not part of any session
  • Is the session an appropriate unit?

18
3. Critique of the evaluation, cont'd
  • Scoring and the ROC
  • What should the denominator of the false alarm
    rate be?

19
Critique of the evaluation, cont'd
  • A non-standard variation of the ROC is used
  • -- the x-axis is substituted with false alarms
    per day
  • Possible problem
  • The number of false alarms per unit time may
    increase significantly as the data rate increases
    (see the sketch after this list)
  • Suggested alternatives
  • Use the total number of alerts (both TP and FP)
  • Use the standard ROC
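As a rough illustration with assumed numbers (not figures from the evaluation), the sketch below shows why "false alarms per day" is tied to traffic volume: at a fixed per-session false-alarm probability, the daily alarm count grows linearly with the number of background sessions.

    # Illustrative: alarms/day scale linearly with traffic at a fixed per-session rate.
    false_alarm_probability = 0.001   # assumed per-session false-alarm probability

    for sessions_per_day in (10_000, 100_000, 1_000_000):
        alarms_per_day = false_alarm_probability * sessions_per_day
        print(f"{sessions_per_day:>9} sessions/day -> {alarms_per_day:.0f} false alarms/day")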

20
Evaluation of Snort
21
Evaluation of Snort, cont'd
  • Poor performance on DoS and Probing
  • Good performance on R2L and U2R
  • Conclusion on Snort
  • Not sufficient to draw any conclusion

22
Critique of the evaluation, cont'd
  • False alarm rate
  • A crucial concern
  • The designated maximum value (0.1) is
    inconsistent with the maximum operator load set
    by Lincoln Lab (100/day); see the
    back-of-the-envelope check below
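A back-of-the-envelope check of that inconsistency, assuming for illustration that the quoted 0.1 denotes a 0.1% per-session false-alarm rate, shows the two limits only agree below a particular daily traffic volume:

    # Assumption: interpret the quoted 0.1 as a 0.1% per-session false-alarm rate.
    max_rate = 0.001          # 0.1% false alarms per background session (assumed reading)
    max_operator_load = 100   # alarms per day the operator is assumed to handle

    break_even = max_operator_load / max_rate
    print(f"A {max_rate:.1%} rate stays under {max_operator_load} alarms/day only "
          f"below {break_even:.0f} background sessions per day")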

23
Critique of the evaluation, cont'd
  • Does the evaluation result really mean something?
  • The ROC curve reflects the ability to detect
    attacks against normal traffic
  • What does a good IDS consist of?
  • Algorithm
  • Reliability
  • Good signatures

24
Brief discussion on 1999 evaluation
  • Has some superficial improvements
  • Additional hosts and host types are added
  • New attacks are added
  • None of these addresses the flaws listed above

25
Brief discussion on 1999 evaluation, cont'd
  • The security policy is not clear
  • What is an attack, and what is not?
  • E.g., scans and probes

26
Conclusion
  • The Lincoln Lab evaluation is a major and
    impressive effort.
  • This paper criticizes the evaluation from
    several different angles.

27
Follow-up Work
  • DETER - Testbed for network security technology.
  • Public facility for medium-scale repeatable
    experiments in computer security
  • Located at USC ISI and UC Berkeley.
  • 300 PC systems running Utah's Emulab software.
  • Experimenters can access DETER remotely to
    develop, configure, and manipulate collections of
    nodes and links with arbitrary network
    topologies.
  • A current problem is that there is no realistic
    attack module or background-noise generator
    plugin for the framework. The attack distribution
    is also a problem.
  • PREDICT - A huge trace repository. It is not
    public, and there are several legal issues in
    working with it.

28
Follow-up Work
  • KDD Cup - Its goal is to provide datasets from
    real-world problems to demonstrate the
    applicability of different knowledge discovery
    and machine learning techniques.
  • The 1999 KDD intrusion detection contest uses a
    labelled version of the 1998 DARPA dataset,
  • annotated with connection features.
  • There are several problems with the KDD Cup data.
    Recently, people have found that average TCP
    packet size is among the features best correlated
    with attacks, which clearly points out its
    inefficacy.

29
Discussion
  • Can the aforementioned problems be addressed?
  • Dataset
  • Taxonomy
  • Unit for analysis
  • Approach to comparing IDSes

30
The End
  • Thank you