1
Benjamin Mayer (bmayer@cs.umn.edu)
Analyzing Long-term Network Data for Cyber
Attacks using HPC - A Comparison of MPI and UPC
Implementations
  • Army High Performance Computing Research Center
  • University of Minnesota
  • http://www.cs.umn.edu/research/minds/
  • Collaborators: Eric Eilertson, Levent Ertoz,
    Vipin Kumar

2
Overview
  • Problem Description
    • End Goal
    • Network Intrusion Detection
    • Shared Nearest Neighbor
  • Solution Technology Description
    • MPI
    • UPC
  • Implementation Discussion
  • Results
    • Time to Completion
    • Performance of Implementations
    • Problems / Oddities

3
End Goal
  • We want solutions to our problems as quickly as
    possible
  • What are the things that take time?
    • Programming
    • Computing
  • The decision to develop serial or parallel code
    (or to extend serial to parallel) depends on the
    time spent in each phase

4
Information Assurance
  • The sophistication of cyber attacks and their
    severity are increasing
  • ARL, the Army, DoD, and other U.S. government
    agencies are major targets for sophisticated,
    state-sponsored cyber terrorists
  • Cyber strategies can be a major force multiplier
    and equalizer
  • Across DoD, computer assets have been compromised
    and information has been stolen, putting
    technological advantage and battlefield
    superiority at risk
  • Security mechanisms always have inevitable
    vulnerabilities
  • Firewalls are not sufficient to ensure security
    in computer networks
    • Insider attacks

(Figure: Incidents reported to the Computer Emergency
Response Team/Coordination Center)
(Figure: Spread of the SQL Slammer worm 10 minutes
after its deployment)
5
Information Assurance
  • Intrusion Detection System
    • A combination of software and hardware that
      attempts to perform intrusion detection
    • Raises an alarm when a possible intrusion happens
  • Traditional intrusion detection system (IDS) tools
    are based on signatures of known attacks
  • Limitations
    • The signature database has to be manually revised
      for each newly discovered type of intrusion
    • Substantial latency in deploying newly created
      signatures across the computer system
    • Cannot detect emerging cyber threats
    • Not suitable for detecting policy violations and
      insider abuse
    • Do not provide an understanding of network traffic
    • Generate too many false alarms

Example of a SNORT rule (MS-SQL Slammer worm):
any -> udp port 1434 (content:"|81 F1 03 01
04 9B 81 F1 01|"; content:"sock"; content:"send")
www.snort.org
6
Data Mining for Intrusion Detection
  • Increased interest in data mining based intrusion
    detection
    • Attacks for which it is difficult to build
      signatures
    • Unforeseen/unknown/emerging attacks
  • Misuse detection
    • Building predictive models from labeled data sets
      (instances are labeled as normal or intrusive)
      to identify known intrusions
    • High accuracy in detecting many kinds of known
      attacks
    • Cannot detect unknown and emerging attacks
  • Anomaly detection
    • Detects novel attacks as deviations from normal
      behavior
    • Potentially high false alarm rate - previously
      unseen (yet legitimate) system behaviors may also
      be recognized as anomalies

7
Modes of Behavior
  • We want to detect people using the network in
    inappropriate ways. Finding different modes of
    behavior is the approach we will take.
  • Clustering is good for grouping things which are
    similar, i.e., modes of behavior.
  • Being able to deal with noise is also a desirable
    attribute.
  • We chose the Shared Nearest Neighbor (SNN)
    clustering algorithm.

8
Computing SNN
  • There are two computational components of the SNN
    algorithm:
    • Compute distances from all points to all other
      points. O(N²)
    • Use the similarity scores from the previous step
      to find how many neighbors two nodes share;
      clusters form based on the number of shared
      neighbors. O(N)
  • We concentrate on making the first step run in
    parallel because it is more computationally
    intense; a serial sketch of it appears below.
  • Data storage only requires an N × K matrix
    • K (number of neighbors) is typically between 10
      and 20.
    • K should be about the size of the smallest
      expected mode.
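A minimal serial sketch of the O(N²) step, assuming a toy
similarity() function and fixed NPOINTS/K constants; all names here
are illustrative, not the project's code. The nbr_id/nbr_sim arrays
are exactly the N × K storage mentioned above.

  #define NPOINTS 1000   /* number of data points (assumed) */
  #define K 15           /* neighbors kept per point (10-20 per the slide) */

  /* stand-in for the real network-feature similarity; higher = closer */
  static float similarity(int a, int b) { return (float)((a * b) % 97); }

  static int   nbr_id[NPOINTS][K];    /* ids of the K best matches */
  static float nbr_sim[NPOINTS][K];   /* their similarity scores   */

  void knn_serial(void)
  {
      for (int i = 0; i < NPOINTS; i++) {
          for (int k = 0; k < K; k++) nbr_sim[i][k] = -1.0f;  /* empty */
          for (int j = 0; j < NPOINTS; j++) {
              if (j == i) continue;
              float s = similarity(i, j);
              /* replace the weakest of the current K if s beats it */
              int w = 0;
              for (int k = 1; k < K; k++)
                  if (nbr_sim[i][k] < nbr_sim[i][w]) w = k;
              if (s > nbr_sim[i][w]) { nbr_sim[i][w] = s; nbr_id[i][w] = j; }
          }
      }
  }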

9
SNN Clustering
10
SNN Clustering
11
SNN Clustering
12
Parallel SNN Algorithm Overview
  • Very similar to a parallel matrix multiply
  • Read and distribute the data one chunk at a time
  • Loop (to calculate the similarity matrix; see the
    sketch below):
    • Calculate similarity for the data on the current
      processor
    • Keep the k most similar items from all iterations
      of the loop on each local processor
    • Shift the data
  • Collect each set of top-k items to the root
    processor
  • Run the clustering algorithm
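A hedged sketch of this loop in MPI C, assuming one chunk per rank;
CHUNK, update_topk(), and the buffer layout are illustrative
assumptions, not the talk's actual code.

  #include <mpi.h>
  #include <string.h>

  #define CHUNK 1024   /* points per processor (assumed) */

  /* stand-in: compare local chunk to visiting chunk, keep k best */
  static void update_topk(const float *mine, const float *visiting)
  { (void)mine; (void)visiting; }

  void snn_ring(float *mine, float *visiting, int rank, int size)
  {
      int left  = (rank - 1 + size) % size;
      int right = (rank + 1) % size;

      memcpy(visiting, mine, CHUNK * sizeof(float));
      for (int step = 0; step < size; step++) {
          update_topk(mine, visiting);   /* similarity vs. visiting chunk */
          /* shift the visiting chunk one hop around the ring */
          MPI_Sendrecv_replace(visiting, CHUNK, MPI_FLOAT,
                               right, 0, left, 0,
                               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }
      /* the per-rank top-k lists are then gathered at the root,
         where the clustering step runs */
  }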

13
Other Applications
  • NASA Earth Science Data
  • Used to find spatial patterns in climate data
  • Document clustering

14
Overview
  • Problem Description
    • End Goal
    • Intrusion Detection
    • Shared Nearest Neighbor
  • Solution Technology Description
    • MPI
    • UPC
  • Implementation Discussion
  • Results
    • Time to Completion
    • Performance of Implementations
    • Problems / Oddities

15
MPI Overview
  • An industry standard for 12 years
  • Supported on every major platform
  • A generation of parallel programmers know it
  • Only 6 functions are needed to write an MPI
    program (a minimal example follows below)
    • 4 of these are setup/teardown
    • The other 2 are Send() and Recv()
  • Non-blocking variants and scatter/gather-type
    routines are useful, as are barriers
  • Uses two-sided communication
    • Requires a send-recv function pair to move data
      (message passing) from one processor to another
    • The memory space of each process is separate
      from all the others
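For reference, the classic six functions in one minimal, illustrative
C program (not from the deck): rank 0 sends one integer to rank 1.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, msg = 42;
      MPI_Init(&argc, &argv);                    /* setup    */
      MPI_Comm_size(MPI_COMM_WORLD, &size);      /* setup    */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* setup    */
      if (rank == 0 && size > 1)
          MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1) {
          MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("rank 1 got %d\n", msg);
      }
      MPI_Finalize();                            /* teardown */
      return 0;
  }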

16
MPI Communication
  MPI::Init(argc, argv);
  size = MPI::COMM_WORLD.Get_size();
  rank = MPI::COMM_WORLD.Get_rank();

  /* Server: send the element count, then the payload */
  err = MPI_Send(&size, 1, MPI_INT, dest_id, tag, MPI_COMM_WORLD);
  err = MPI_Send(data, size, MPI_INT, dest_id, tag, MPI_COMM_WORLD);

  /* Client: receive the count, allocate, then receive the payload */
  err = MPI_Recv(&size, 1, MPI_INT, MPI_ANY_SOURCE, tag,
                 MPI_COMM_WORLD, &status);
  data = (int *)malloc(size * sizeof(int));
  if (NULL == data) exit(-1);
  err = MPI_Recv(data, size, MPI_INT, MPI_ANY_SOURCE, tag,
                 MPI_COMM_WORLD, &status);

17
UPC Overview
  • Tech report in 1999
  • Compilers from several vendors
  • Implementations on SSI machines and clusters
  • Few programmers with significant experience
  • Only need to know how to declare a shared
    variable; for higher performance, memory copies
    and barriers are needed
  • Ability to just access memory; no message passing
    needed
  • Uses a single memory space. Made for SSI
    machines, but works on clusters.

18
UPC Syntax
  shared [block-size] type array[number-of-elements];
  shared [2] int array[8];
  shared int array[6];
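How these declarations lay data out across threads (a sketch assuming
THREADS == 2): blocks of block-size elements are dealt round-robin,
and the default block size is 1.

  shared [2] int array[8];  /* elems 0,1->T0  2,3->T1  4,5->T0  6,7->T1 */
  shared int array[6];      /* elems 0,2,4->T0  1,3,5->T1  (cyclic)     */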

19
Example UPC Vector Addition
  /* From the GWU UPC manual */
  #include <upc.h>
  #define N 100

  shared int v1[N], v2[N], v1plusv2[N];

  void main()
  {
      int i;
      for (i = 0; i < N; i++)
          if (MYTHREAD == i % THREADS)
              v1plusv2[i] = v1[i] + v2[i];
  }

20
Example UPC Vector Addition
  /* From the GWU UPC manual */
  #include <upc.h>
  #include <stdio.h>   /* for printf */
  #define N 100

  shared int v1[N], v2[N], v1plusv2[N];

  void main()
  {
      int i;
      upc_forall (i = 0; i < N; i++; i)
          v1plusv2[i] = v1[i] + v2[i];
      if (MYTHREAD == 0)
          for (i = 0; i < N; i++)
              printf("%d ", v1plusv2[i]);
  }
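The fourth expression in upc_forall is the affinity field: with an
integer expression i, iteration i executes on thread i % THREADS, so
the compiler distributes the loop without the explicit MYTHREAD test
used on the previous slide.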

21
Example UPC Memory Copy
  int moveLocal(int offset)
  {
      int i;
      shared [SIZE/THREADS] long *p;

      i = (MYTHREAD + offset) % THREADS;
      p = &elements[i * (SIZE/THREADS)];
      upc_memget(localBuffer, p, sizeof(long) * (SIZE/THREADS));
      return 0;
  }

22
Overview
  • Problem Description
    • End Goal
    • Intrusion Detection
    • Shared Nearest Neighbor
  • Solution Technology Description
    • MPI
    • UPC
  • Implementation Discussion
  • Results
    • Time to Completion
    • Performance of Implementations
    • Problems / Oddities

23
SNN Results
  • Common behaviors group together:
    • VPN
    • Web/HTTP
    • FTP
  • These clusters are large, consisting of thousands
    of connections.
  • Things which are uncommon (attacks) form small
    clusters or are removed as noise.

24
Detecting Large Modes of Network Traffic Using
Clustering
  • Large clusters of VPN traffic (hundreds of
    connections)
  • Used between forts for secure sharing of data and
    working remotely

25
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters Involving GoToMyPC.com (Army Data)
  • Policy violation, allows remote control of a
    desktop

26
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters involving mysterious ping and SNMP
    traffic

27
Detecting Unusual Modes of Network Traffic Using
Clustering
  • Clusters involving unusual repeated ftp sessions
  • Further investigation revealed that a misconfigured
    Army computer was trying to contact Microsoft

28
Implementation: MPI and UPC
  • MPI
    • Approximately 3 uninterrupted weeks
    • 430 lines of code
    • Benchmarked on a Cray T3E, a cluster, and a Cray X1
    • Memory scales O(NK)
  • UPC
    • Two weeks: one week during classes, another week
      2 months after initial development, while working
    • 240 lines of code
    • Benchmarked on the Cray X1; the cluster compiler
      failed (since fixed); compiled on an Apple G5
    • Memory scales O(NK/P)

29
Cray X1
  • Complex architecture
  • Memory is Globally Accessible
  • Vector supercomputer, vector length 64
  • Single Streaming Processor (SSP)
  • A Multi-Streaming Processor (MSP) is made up of 4
    SSPs. The compiler defaults to MSP mode; streaming
    directives can make execution of a code more
    parallel.

30
SNN Performance MPI vs. UPC
31
MPI Problems
  • Issues making the code properly shift data; this
    would have achieved the O(NK/P) memory requirement
  • Three days to port the code from the cluster to
    the T3E
  • Compilers do not optimize across function calls
  • Deadlocks due to incorrect ordering of send/recv
    (see the sketch below)
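One common fix for the send/recv ordering problem, sketched under
assumed buffer names (not the project's code): pair the two
operations in a single MPI_Sendrecv so no global ordering is needed.

  #include <mpi.h>

  /* If every rank posts a blocking MPI_Send before its MPI_Recv, all
     ranks can stall once messages exceed MPI's internal buffering.
     Combining the two lets the library schedule them safely. */
  void shift_safe(float *out, float *in, int n, int left, int right)
  {
      MPI_Sendrecv(out, n, MPI_FLOAT, right, 0,
                   in,  n, MPI_FLOAT, left,  0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }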

32
UPC Problems
  • Difficulty porting code to other machines
  • Compilers
    • Not installed by default like MPI
  • Global memory and bad indexing
    • Did not give a core dump (there was still valid
      memory to write to)
    • Output very close to expected
    • Resulted in misdirected debugging effort
  • Explicit copying of data to local memory is needed
    for performance
    • Less of a feel for how much bandwidth I am using
  • Dynamic memory allocation with a block size other
    than 1 (see the sketch below)
  • Need to specify data structure size and number of
    processors at compile time
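A hedged sketch of the dynamic-allocation limitation: upc_all_alloc
lays memory out with a block size equal to its byte argument, so a
blocking factor other than 1 has to be baked into the byte count and
recovered with a cast. B is an illustrative compile-time constant,
mirroring the compile-time-size restriction noted above.

  #include <upc.h>

  #define B 4   /* desired block size in ints (assumed) */

  shared [B] int *make_blocked(void)
  {
      /* one block of B ints per thread; the cast reinterprets the
         layout, which is why this is awkward compared to a static
         declaration like "shared [B] int a[B*THREADS]" */
      return (shared [B] int *)upc_all_alloc(THREADS, B * sizeof(int));
  }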

33
Conclusions
  • MPI is good for portability and maintainability
    • Install base
    • Maturity
  • UPC is good for quickly writing simple,
    computationally intense code
  • UPC has a smaller learning curve

34
Other Work Comparing MPI and UPC
  • A study comparing how students learned UPC and
    MPI while implementing tasks.

35
References
  • Book Chapters
    • Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P.,
      Srivastava, J., Kumar, V., Dokas, P., "The MINDS -
      Minnesota Intrusion Detection System," Next
      Generation Data Mining, MIT Press, 2004.
  • Conference Papers
    • Ertoz, L., Lazarevic, A., Eilertson, E., Tan, P.,
      Dokas, P., Kumar, V., Srivastava, J., "Protecting
      Against Cyber Threats in Networked Information
      Systems," SPIE Annual Symposium on AeroSense,
      Battlespace Digitization and Network Centric
      Systems III, Orlando, FL, April 2003.
    • Lazarevic, A., Ertoz, L., Ozgur, A., Srivastava, J.,
      Kumar, V., "A Comparative Study of Anomaly Detection
      Schemes in Network Intrusion Detection," Proceedings
      of the Third SIAM Conference on Data Mining, San
      Francisco, May 2003.
    • Dokas, P., Ertoz, L., Kumar, V., Lazarevic, A.,
      Srivastava, J., Tan, P., "Data Mining for Network
      Intrusion Detection," Proc. NSF Workshop on Next
      Generation Data Mining, Baltimore, MD, November 2002.
    • Lazarevic, A., Dokas, P., Ertoz, L., Kumar, V.,
      Srivastava, J., Tan, P., "Cyber Threat Analysis -
      A Key Enabling Technology for the Objective Force
      (A Case Study in Network Intrusion Detection),"
      Proceedings of the 23rd Army Science Conference,
      Orlando, FL, December 2002.
    • Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P.,
      Dokas, P., Srivastava, J., Kumar, V., "Detection
      and Summarization of Novel Network Attacks Using
      Data Mining," Technical Report, 2003.

36
Acknowledgements
  • University of Minnesota
  • AHPCRC/NCS
  • ARL