Title: Gnort: High Performance Intrusion Detection Using Graphics Processors
1 Gnort: High Performance Intrusion Detection Using Graphics Processors
- Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis
- Institute of Computer Science
- Foundation for Research and Technology Hellas
2 General Idea
- How to speed up the processing throughput of
intrusion detection systems by offloading the
pattern matching operations to the GPU.
3 Introduction
- The problem
- Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks
- String matching accounts for up to 75% of the total CPU processing
- String Matching Algorithms
- Aho-Corasick
- Specialized hardware devices (NP, FPGAs, ASICs)
- Complex to modify and program
- Poor flexibility
- Graphics Cards
- Easy to program
- Powerful and ubiquitous
- Researchers have begun exploring ways to tap their
power for non-graphics applications
4 Why use the GPU?
- The GPU is specialized for compute-intensive,
highly parallel computation
5 NVIDIA GeForce SIMD Architecture
- Many Multiprocessors
- Each multiprocessor contains many Stream Processors
- Memory model
- Shared On-Chip Memory: 1 cycle
- Constant Memory: 400-600 cycles (1 cycle if cached)
- Texture Memory: 400-600 cycles (1 cycle if cached)
- Global Device Memory: 400-600 cycles
GPU can be used as a general purpose processor,
capable of executing many threads in parallel
6 The Aho-Corasick Algorithm
- Used in most modern NIDSes
- Scans for multiple patterns simultaneously
- Preprocesses all patterns to build a state machine
- The state machine is used to scan for multiple patterns simultaneously in linear time
- Complexity is independent of the number of patterns
Example: P = {he, she, his, hers}
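
The preprocessing and scanning steps above can be sketched in a few lines of Python. This is a minimal textbook-style illustration, not the Snort implementation; the function names `build_automaton` and `search` are hypothetical and chosen only for this example.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]        # goto[state][char] -> next state (trie edges)
    output = [set()]   # output[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Breadth-first pass: compute the failure link of every state.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Scan text once; return (start_offset, pattern) for every match."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in sorted(output[state]):
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

Calling `search("ushers", build_automaton(["he", "she", "his", "hers"]))` reports all three patterns that occur in the text in one pass; the automaton is built once for the whole pattern set and the input is then read a single time.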
7 Mapping Aho-Corasick on GPU
- How to represent the State Machine?
- Snort represents each state as an array of pointers
- It is difficult to map them on the GPU memory
- Transform to a 2D array
- Can easily bind to Texture Memory
- Texture fetches are cached
- Aho-Corasick exhibits strong locality of reference
- Random access memory read
- The usage of Texture Memory improves GPU execution
time by about 19%
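
The pointer-to-array transformation can be illustrated as follows: a minimal sketch, assuming one row per state and one column per byte value, with failure transitions folded into the table so that scanning needs a single lookup per input byte. The names `build_table` and `match_offsets` are hypothetical, and binding the table to texture memory is not modeled here.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value

def build_table(patterns):
    """Return (table, accepting) where table[state][byte] -> next state."""
    table = [[-1] * ALPHABET]   # -1 marks "no trie edge yet"
    accepting = [False]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if table[state][byte] == -1:
                table.append([-1] * ALPHABET)
                accepting.append(False)
                table[state][byte] = len(table) - 1
            state = table[state][byte]
        accepting[state] = True

    # Fill every missing entry so the table has no pointers and no gaps.
    fail = [0] * len(table)
    queue = deque()
    for byte in range(ALPHABET):
        nxt = table[0][byte]
        if nxt == -1:
            table[0][byte] = 0          # root loops on unknown bytes
        else:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        accepting[state] = accepting[state] or accepting[fail[state]]
        for byte in range(ALPHABET):
            nxt = table[state][byte]
            if nxt == -1:
                table[state][byte] = table[fail[state]][byte]
            else:
                fail[nxt] = table[fail[state]][byte]
                queue.append(nxt)
    return table, accepting

def match_offsets(payload, table, accepting):
    """Return the offsets at which at least one pattern ends."""
    state, ends = 0, []
    for offset, byte in enumerate(payload):
        state = table[state][byte]      # one array lookup per input byte
        if accepting[state]:
            ends.append(offset)
    return ends
```

Because every row has the same width, the whole machine is a rectangular block of integers, which is the kind of layout that maps naturally onto a 2D texture.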
8 Parallelizing Packet Searching (1/2)
- Assigning a Single Packet to each Multiprocessor
- Each packet is copied to the shared memory of the Multiprocessor
- Stream Processors search different parts of the packet concurrently
- Overlapping computation
- Matching patterns may span consecutive chunks of the packet
- Same amount of work per Stream Processor
- Stream Processors will be synchronized
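
The need for overlapping computation can be shown with a small sequential simulation. This is a sketch under stated assumptions: each chunk is extended by (longest pattern length - 1) bytes so that a match starting in one chunk and ending in the next is still seen, and Python's `bytes.find` stands in for the per-chunk matcher instead of Aho-Corasick. The names `split_with_overlap` and `search_chunked` are hypothetical.

```python
def split_with_overlap(packet_len, num_chunks, max_pattern_len):
    """Return one (start, end) byte range per worker."""
    chunk = max(1, -(-packet_len // num_chunks))   # ceiling division
    overlap = max_pattern_len - 1                  # bytes shared with the next chunk
    ranges = []
    for start in range(0, packet_len, chunk):
        end = min(start + chunk + overlap, packet_len)
        ranges.append((start, end))
    return ranges

def search_chunked(packet, patterns, num_chunks):
    """Search each chunk independently; assumes patterns is non-empty."""
    max_len = max(len(p) for p in patterns)
    found = set()   # a set removes matches seen twice in an overlap
    for start, end in split_with_overlap(len(packet), num_chunks, max_len):
        window = packet[start:end]
        for pattern in patterns:
            pos = window.find(pattern)
            while pos != -1:
                found.add((start + pos, pattern))
                pos = window.find(pattern, pos + 1)
    return sorted(found)
```

With a 20-byte packet split into four chunks and a 6-byte pattern, every chunk except the last reads 5 extra bytes, so a pattern that straddles a chunk boundary is still found by the chunk it starts in.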
9 Parallelizing Packet Searching (2/2)
- Assigning a Single Packet to each Stream Processor
- Each packet is processed by a different Stream Processor
- No overlapping computation
- Different amount of work per Stream Processor
- Stream processors of the same Multiprocessor will
have to wait until all have finished
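
The cost of waiting can be estimated with a simplified model. The sketch below assumes that search time is proportional to packet length and that a group of Stream Processors finishes only when its longest packet is done; both are illustrative assumptions, and `lockstep_utilization` is a hypothetical name.

```python
def lockstep_utilization(packet_lengths):
    """Fraction of processor time spent on useful work in one group.

    Assumes cost is proportional to packet length and that every
    processor stays occupied until the longest packet is finished.
    """
    longest = max(packet_lengths)
    useful = sum(packet_lengths)
    return useful / (longest * len(packet_lengths))
```

Under this model, a group with packets of 1500 and 500 bytes is busy for two thirds of the time, while a group of equal-sized packets is fully utilized.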
10 Software Mapping
- Packets are transferred to the GPU in batches
- Performs much better than making each transfer separately
- Packets are stored in a buffer that is copied to the GPU when it gets full
- Use page-locked memory to store the packets
- Higher transfer throughput from host to device
- Copies are performed using DMA, without occupying the CPU
- CPU and GPU execution can overlap
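
The batching scheme can be sketched on the host side as follows. This is a minimal illustration, assuming a fixed-capacity buffer and a caller-supplied `flush` callback that stands in for the host-to-device copy; page-locked memory and DMA are not modeled, and the class name `PacketBatcher` is hypothetical.

```python
class PacketBatcher:
    """Accumulate packet payloads and hand them over in one batch."""

    def __init__(self, capacity, flush):
        self.capacity = capacity   # buffer size in bytes
        self.flush = flush         # stand-in for the copy to the GPU
        self.buffer = bytearray()
        self.offsets = []          # (offset, length) of each packet

    def add(self, payload):
        # Transfer the current batch first if this payload would not fit.
        # A payload larger than the capacity is still sent, alone.
        if self.buffer and len(self.buffer) + len(payload) > self.capacity:
            self.drain()
        self.offsets.append((len(self.buffer), len(payload)))
        self.buffer += payload

    def drain(self):
        """Send whatever is buffered, then start an empty batch."""
        if self.offsets:
            self.flush(bytes(self.buffer), list(self.offsets))
            self.buffer.clear()
            self.offsets.clear()
```

The offset list travels with the buffer so that the receiving side can locate each packet inside the batch without per-packet transfers.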
11 Evaluation (1/2)
- Scalability as a function of the number of patterns
- We ran Snort using randomly generated patterns
- All patterns are matched against every packet
- Payload trace contained 800-byte UDP packets with random payload
- Throughput remains constant as the number of patterns increases
- 2.4x faster than the CPU
12 Evaluation (2/2)
- Throughput as a function of the packet size
- Ran Snort using 1000 random patterns
- All patterns are matched against every packet
- 2.3 Gbit/s for full packets
- 3.2x faster compared to the CPU
- The two GPU implementations show no significant
difference in performance
13 Evaluation with real input and rules
- Experimental setup
- Two PCs connected via a 1 Gbit/s Ethernet switch
- To directly compare with prior work (Jacob et
al.), we re-implemented the Knuth-Morris-Pratt
(KMP) and Boyer-Moore (BM) algorithms on the GPU.
14 Evaluation with real input and rules
- Snort loaded about 8000 patterns
- Preprocessors and PCRE were disabled
- Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s
- GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s
- 200% improvement
- The KMP and BM algorithms used by Jacob et al.
perform worse in all cases
15 Conclusion
- Graphics cards can be used effectively to speed up Network Intrusion Detection Systems
- Low-cost
- Easy programming
- Future work includes
- Transfer the packets directly from the NIC to the GPU
- Utilize multiple GPUs on multi-slot motherboards
16 Thank you
- Any questions?
- gvasil_at_ics.forth.gr